Data and software for downloading | Department of Bioinformatics

KATK package version 4.2

K-mer databases, user manuals and support.
Source and precompiled binaries of KATK (gmer_counter and gassembler) are available in a subfolder of the GenomeTester4 project on GitHub.

KATK is a toolkit for discovering and calling SNV and indel genotypes from human personal genomes. Unlike FastGT, KATK can detect all variants, even rare and de novo variants. It can process a deep-sequenced human genome in 1-3 hours.

Citation: Kaplinski L, Möls M, Puurand T, Pajuste FD, Remm M (2021). KATK: Fast genotyping of rare variants directly from unmapped sequencing reads. Human Mutation, 42(6):777-786. doi: 10.1002/humu.24197.

AluMine package

Source and precompiled binaries of AluMine from GitHub.
Sequences of 13,396 potentially polymorphic Alu elements. FASTA file with full-length sequences of potentially polymorphic REF+ elements. (REF+ are those elements that are present in the reference genome).
List of 13,396 potentially polymorphic Alu elements. Coordinates and signature sequences of discovered REF+ Alu element insertions.
List of potentially polymorphic Alu elements. Coordinates and signature sequences of 23,108 discovered REF+ and REF- Alu element insertions.
Combined set of Alu and SNP marker k-mers for genotyping. K-mers for simultaneous genotyping of 23,108 polymorphic Alu element insertions and 30 million SNVs. FastGT format.
Stats of markers. Based on genotyping these markers on 2200 individual genomes. Contains allele frequencies, fraction of expected genotypes, fraction of markers violating HWE, etc. for each marker. See also Table S2 in Additional_File_2.xlsx.
Stats of tested individuals. Based on genotyping all Alu markers on 2200 individual genomes. Contains depth of coverage (based on 25-mer coverage), number of discovered REF+ and REF- elements per individual, and other information.
pan_troglodytes_32.list Chimp 32-mer list used during the REF-plus element discovery.
human_37_25.index Human 25-mer list used during the REF-minus element discovery to localize potential novel elements in the human reference genome.
human_37.names Additional file required for gtester. This is used during the REF-minus element discovery.
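The marker statistics above mention markers violating HWE (Hardy-Weinberg equilibrium). The sketch below shows one common way such a check can be done from genotype counts, using a chi-square goodness-of-fit test. It is an illustration only: the function name, the example counts and the 3.841 cutoff (chi-square, 1 degree of freedom, p = 0.05) are assumptions, and the test actually used to produce the published statistics is not stated here.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes
    of a biallelic marker (hypothetical input format).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Genotype counts expected under HWE: p^2, 2pq, q^2.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expectation (monomorphic markers) are skipped.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Assumed cutoff: chi-square critical value for 1 degree of freedom, p = 0.05.
CUTOFF = 3.841

balanced = hwe_chi_square(25, 50, 25)        # matches HWE expectations exactly
no_heterozygotes = hwe_chi_square(50, 0, 50)  # strong heterozygote deficit
print(balanced > CUTOFF, no_heterozygotes > CUTOFF)
```

A marker with a statistic above the cutoff would be flagged as deviating from HWE under these assumptions.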

AluMine is a toolkit for the discovery and genotyping of polymorphic Alu element insertions in personal genomes. Starting from the FASTQ files of a 30x personal genome, discovery takes about 4 hours and genotyping about 20 minutes.

Citation: Puurand T, Kukuškina V, Pajuste F-D, Remm M. (2019). AluMine: alignment-free method for the discovery of polymorphic Alu element insertions. Mobile DNA, 10:31. doi: 10.1186/s13100-019-0174-3.

FastGT package version 4.2

K-mer databases, user manuals and support.
Source and precompiled binaries of FastGT are available in a subfolder of the GenomeTester4 project on GitHub.

FastGT is a toolkit for counting k-mers from FASTQ files and calling 30 million human SNV genotypes. It can process a 30x human genome in less than an hour.

Citation: Pajuste F-D, Kaplinski L, Möls M, Puurand T, Lepamets M, Remm M. (2017). FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads. Scientific Reports, 7:2537.

GenomeTester package version 4.2

Source and precompiled binaries

GenomeTester4 is a toolkit for counting k-mers from nucleic acid sequences and performing basic set operations (union, intersection and difference) on k-mer lists. It can be used in many bioinformatic analyses.
GenomeTester4 is released under GNU General Public License version 3.
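The k-mer counting and set operations described above can be illustrated with a short Python sketch. This is not the GenomeTester4 implementation or its command-line interface: the function names and example sequences are hypothetical, and collapsing each k-mer with its reverse complement (canonical k-mers) is an assumption made for the illustration.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in a sequence, skipping words with non-ACGT letters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


# Two toy "k-mer lists" built from made-up sequences.
list_a = count_kmers("ACGTACGTTT", 4)
list_b = count_kmers("TTACGTAAAC", 4)

# Basic set operations on the k-mers present in each list.
union = set(list_a) | set(list_b)
intersection = set(list_a) & set(list_b)
difference = set(list_a) - set(list_b)

print(sorted(intersection))
print(sorted(difference))
```

Plain Python sets are convenient for a toy example; lists built from whole genomes need the compact, sorted on-disk representation that a dedicated tool provides.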

Citation: Kaplinski L, Lepamets M, Remm M. (2015). GenomeTester4: a toolkit for performing basic set operations – union, intersection and complement on k-mer lists. GigaScience; 4:58

GenomeMasker package version 1.3

GenomeMasker (gmasker) masks over-represented words in a FASTA file, preventing the design of primers in repeated regions. GenomeTester (gtester) tests 1) whether PCR primers have an excessive number of binding sites on the template sequence and 2) how many PCR products would be amplified from the template DNA and where they are located. The package also contains a modified version of the PRIMER3 program (gm_primer3) that can design primers from lowercase-masked sequences.
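The idea of lowercase-masking over-represented words can be sketched as follows. This is a simplified illustration, not the gmasker algorithm: the function name and parameters are hypothetical, words are counted within the input sequence itself rather than against a genome-wide word list, and every position of an over-represented word is masked.

```python
from collections import Counter


def mask_overrepresented(seq: str, k: int, max_count: int) -> str:
    """Lowercase every k-letter word that occurs more than max_count times."""
    seq = seq.upper()
    # Count all overlapping words of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] > max_count:
            # Mask all positions covered by this over-represented word.
            for j in range(i, i + k):
                masked[j] = masked[j].lower()
    return "".join(masked)


# "ACG" occurs three times, so it is masked when max_count is 2.
print(mask_overrepresented("ACGACGACGTTTGCA", k=3, max_count=2))
```

A primer design program that treats lowercase letters as excluded regions would then avoid placing primers on the masked stretch.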

Citation: Andreson R, Reppo E, Kaplinski L, Remm M. (2006). GENOMEMASKER package for designing unique genomic PCR primers. BMC Bioinformatics; 7:172.

StrainSeeker

StrainSeeker binaries and databases

StrainSeeker is a program for identifying bacterial strains from raw sequencing reads. The StrainSeeker database contains a custom-built phylogenetic tree of all bacterial strains and k-mers specific to each node of the tree.

Citation: Roosaare M, Vaher M, Kaplinski L, Möls M, Andreson R, Lepamets M, Koressaar T, Naaber P, Koljalg S, Remm M. (2017). StrainSeeker: fast identification of bacterial strains from unassembled sequencing reads using user-provided guide trees. PeerJ 5:e3353. doi: 10.7717/peerj.3353.

Primer3_masker
K-mer lists for masking genomic sequences with Primer3_masker are available from the Primer3_masker webpage at http://primer3.ut.ee/lists.htm.

Citation: Kõressaar T, Lepamets M, Kaplinski L, Raime K, Andreson R, Remm M. (2018). Primer3_masker: integrating masking of template sequence with primer design software. Bioinformatics 34(11):1937-1938. doi: 10.1093/bioinformatics/bty036.

MultiPLX version 2.0

MultiPLX is a tool for analyzing PCR primer compatibility and automatically finding an optimal multiplexing (grouping) solution. It uses state-of-the-art nearest-neighbour DNA binding thermodynamics to estimate possible unwanted pairings between PCR primers.

Citation: Kaplinski L, Andreson R, Puurand T, Remm M. (2005). MultiPLX: automatic grouping and evaluation of PCR primers. Bioinformatics, 21(8):1701-1702.

SLICSel version 1.1

SLICSel is a program for designing specific oligonucleotide probes for microbial detection and identification. To obtain maximal specificity of designed oligonucleotides, SLICSel uses the Nearest-Neighbor thermodynamics-based approach for probe design.

Citation: Scheler O, Kaplinski L, Glynn B, Palta P, Parkel S, Toome K, Maher M, Barry T, Remm M, Kurg A. (2011). Detection of NASBA amplified bacterial tmRNA molecules on SLICSel designed microarray probes. BMC Biotechnology, 11:17.

FastaGrep version 2.0

FastaGrep is a tool for searching for oligonucleotide binding sites in FASTA genomic sequences. It supports both match/mismatch-based searches and thermodynamic binding energy searches.
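A match/mismatch-based search of the kind described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FastaGrep's implementation or interface: the function name and sequences are hypothetical, only the forward strand is scanned, and the thermodynamic search mode is not covered.

```python
def find_binding_sites(template: str, oligo: str, max_mismatches: int):
    """Return (start, mismatches) for every window within the mismatch limit."""
    template = template.upper()
    oligo = oligo.upper()
    hits = []
    length = len(oligo)
    for start in range(len(template) - length + 1):
        mismatches = 0
        for base, expected in zip(template[start:start + length], oligo):
            if base != expected:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, abandon this window
        else:
            # Loop finished without exceeding the limit: record the site.
            hits.append((start, mismatches))
    return hits


# One exact site and one site with a single mismatch (0-based positions).
sites = find_binding_sites("TTACGTAATTACCTAA", "ACGTA", max_mismatches=1)
print(sites)
```

To cover the reverse strand as well, the same function could be called a second time with the reverse complement of the oligonucleotide.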