SNPmasker 1.1

SNPmasker Help

Overview of the program

SNPmasker is a program that masks all SNPs in a given sequence using information from dbSNP. Additionally, it can mask all non-unique words using the GenomeMasker module. This allows the entire template DNA to be masked before primer design, so that poor primer candidates are not considered. GenomeMasker can identify and mask repeated words that are not included in current repeat libraries. The GenomeMasker program records the location and abundance of all overlapping 16-nucleotide motifs (also called 16-nt words) in the genome. Motifs that occur frequently in the genome can be called repeats, and they are masked. A classical repeat (such as an Alu repeat) consists of many repeated 16-nucleotide motifs. This, combined with a specific 3'-end masking technique, allowed us to achieve more sensitive masking than existing approaches.

The index file for masking repeats is created with word length 16 and binding cutoff 10 (all 16-bp words that appear more than 10 times in the genome are masked by GenomeMasker). Stand-alone GenomeMasker binaries, example files and a README are available here.
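The word-counting idea described above can be illustrated with a short Python sketch. It is not the GenomeMasker implementation: the function names `count_words` and `mask_frequent_words` are hypothetical, the genome is assumed to fit in memory as a single string, and every base covered by a frequent word is simply lowercased, which is a simplification that does not reproduce the 3'-end masking technique.

```python
from collections import Counter


def count_words(genome: str, word_len: int = 16) -> Counter:
    """Count every overlapping word of length word_len in the genome string."""
    genome = genome.upper()
    return Counter(
        genome[i:i + word_len] for i in range(len(genome) - word_len + 1)
    )


def mask_frequent_words(seq: str, counts: Counter,
                        word_len: int = 16, cutoff: int = 10) -> str:
    """Lowercase each base covered by a word that occurs more than cutoff times.

    Simplified illustration only; the real 3'-end masking is not modelled.
    """
    upper = seq.upper()
    covered = [False] * len(upper)
    for i in range(len(upper) - word_len + 1):
        if counts[upper[i:i + word_len]] > cutoff:
            for j in range(i, i + word_len):
                covered[j] = True
    return "".join(
        base.lower() if hit else base for base, hit in zip(upper, covered)
    )
```

With the defaults (`word_len=16`, `cutoff=10`) the function follows the settings quoted above; smaller values are convenient for trying the idea on toy sequences.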

Input data

The user can enter the coordinates of the desired region (chromosome name, start position and end position) or paste a sequence directly into the text box.

Alternatively, the user can upload a sequence file, which must be in FastA format.

    >sequence_id
    TGCACAATTTGATGCCGGTTTAGTATTTGTTGGTGGCTGCGTGCATAATAGC
    TTAAAATGCAGATGCTGAACTGGGAATTGCTGTTTGATGGTGAATTAGGGAA
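Before uploading, it can be useful to check that a file really follows the layout shown above. The following Python sketch is one way to do that; the function name `read_fasta` is hypothetical and is not part of SNPmasker. It assumes the sequence identifier is the first whitespace-delimited token after `>`.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FastA-formatted lines into a {sequence_id: sequence} mapping."""
    parts: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("header line has no sequence identifier")
            current = tokens[0]
            parts[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            parts[current].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in parts.items()}
```

Because the function accepts any iterable of lines, it works both on an open file object and on pasted text split with `splitlines()`.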

SNPmasker parameters

E-mail sending

Enter your e-mail address to be informed when the job is finished. A link to the results will be sent by e-mail. Alternatively, you can bookmark the results page and come back later to check it.

Output of the program

The likelihood of getting many similar hits from the genome is strongly reduced by using a longer query sequence for masking. We expect that a typical user will mask sequences of 100-1000 bp, which can be localized without problems. In other cases, the genomic coordinates can easily be retrieved by manual inspection of the MEGABLAST alignment file. These coordinates can then be used to submit the desired DNA region to SNPmasker.

masked_sequence.fas:

This file contains the masked sequence in FastA format. SNPs are masked with a user-defined symbol. If the repeat-masking option was selected, repeats are masked too.
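The effect of SNP masking on a sequence can be sketched in a few lines of Python. This is only an illustration, not SNPmasker's code: the function name `mask_snps` is hypothetical, and the SNP positions are assumed to be 1-based offsets within the submitted sequence.

```python
from typing import Iterable


def mask_snps(seq: str, snp_positions: Iterable[int], symbol: str = "N") -> str:
    """Replace the base at each 1-based SNP position with the masking symbol."""
    if len(symbol) != 1:
        raise ValueError("masking symbol must be a single character")
    bases = list(seq)
    for pos in snp_positions:
        if not 1 <= pos <= len(bases):
            raise IndexError(f"SNP position {pos} is outside the sequence")
        bases[pos - 1] = symbol
    return "".join(bases)
```

For example, `mask_snps("ACGTACGT", [2, 5])` yields `ANGTNCGT`.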

info.txt:

This file describes all masked SNPs in the given sequence. Columns include: reference SNP ID (rs), chromosome name, position in chromosome, strand and possible alleles. If "Major allele masking" is selected, a more detailed info file is created. The additional columns are: most common allele in the given population, frequency of that allele, call rate, name of the population, and a note on whether the nucleotide at the given position has been changed.
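For downstream processing, the basic columns can be read with a small Python sketch like the one below. The exact file layout is not specified here, so the sketch rests on assumptions: columns are tab-separated, they appear in the order listed above, and any line whose position field is not a number (for example a header line) is skipped. The names `SnpRecord` and `parse_info` are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple


class SnpRecord(NamedTuple):
    rs_id: str
    chromosome: str
    position: int
    strand: str
    alleles: str


def parse_info(text: str) -> List[SnpRecord]:
    """Parse the first five columns of an info.txt-style table.

    Assumes tab-separated columns in the documented order (an assumption,
    not a confirmed file specification).
    """
    records = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 5:
            continue  # blank or incomplete line
        rs_id, chrom, pos, strand, alleles = row[:5]
        if not pos.isdigit():
            continue  # probably a header line
        records.append(SnpRecord(rs_id, chrom, int(pos), strand, alleles))
    return records
```

Any extra columns written when "Major allele masking" is selected are ignored by this sketch; they could be added to `SnpRecord` once the actual layout is confirmed.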

results.zip:

This file contains all result files, packed as a zip archive.

alignments.html:

If SNPmasker cannot map the user-supplied sequence uniquely to the genomic DNA, an alignments file is presented. It is HTML-formatted MEGABLAST output. You can refine your search by selecting the coordinates of the desired region only.

For further information please contact: reidar.andreson [at] ut.ee