Manual for the downloadable version of StrainSeeker

Contents

Installation instructions

Dependencies

System requirements

Programs required to run StrainSeeker

Database

Structure and size

Building the database required about 200 GB of disk space in total, but 300 GB is recommended because some large temporary files are created during the build. Structural information (the parent/child relationships between nodes and the k-mer counts) is stored in a small text file, info.txt, which resides in the database directory.
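
Before starting a build, it can be useful to confirm that the target filesystem has enough free space. The following is a minimal Python sketch, not part of StrainSeeker; the function names are hypothetical, and the 300 GB threshold is simply the recommendation given above.

```python
import shutil

# Recommended free space for building the database, as stated in this manual.
RECOMMENDED_GB = 300


def free_gigabytes(path="."):
    """Return the free space, in GB (10^9 bytes), on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path=".", required_gb=RECOMMENDED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    return free_gigabytes(path) >= required_gb


if __name__ == "__main__":
    free = free_gigabytes(".")
    status = "OK" if free >= RECOMMENDED_GB else "below the recommended amount"
    print(f"{free:.0f} GB free: {status}")
```

Run the script from the directory in which the database will be built, or pass that directory as `path`.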

Creating database

EXAMPLE COMMAND LINE: perl builder.pl -n refseq_guide_tree.nwk -d strain_fasta_directory -w 32 -b ss_blacklist_w32.list -o my_database

-n is the guide tree in Newick format, describing the relationships between given strains.
-d is a directory containing all the .fna files for strains used in the Newick file.
-b is the path to the blacklist (the blacklist must use the same k-mer length as given with -w).
-w is the k-mer length.
-o is the user-defined database name.
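
Because a build is long-running, a quick pre-flight check that every strain in the guide tree has a FASTA file can save time. The sketch below is an illustration only and is not part of StrainSeeker. It assumes that each leaf name in the Newick file matches a file called <leaf name>.fna in the -d directory; that naming rule is an assumption, not something this manual specifies. The simple pattern used here also does not handle quoted Newick labels.

```python
import re
from pathlib import Path

# A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
# Internal node labels follow ")" and are therefore not matched.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;]+)")


def leaf_names(newick_text):
    """Return the leaf labels of a Newick tree (unquoted labels only)."""
    return [name.strip() for name in LEAF_PATTERN.findall(newick_text) if name.strip()]


def missing_fasta_files(newick_text, fasta_dir):
    """Return leaf names that have no <leaf>.fna file in `fasta_dir`.

    ASSUMPTION: file names are the leaf names plus the .fna extension.
    """
    available = {path.stem for path in Path(fasta_dir).glob("*.fna")}
    return [leaf for leaf in leaf_names(newick_text) if leaf not in available]


if __name__ == "__main__":
    # File and directory names are taken from the example command line.
    tree = Path("refseq_guide_tree.nwk").read_text()
    for leaf in missing_fasta_files(tree, "strain_fasta_directory"):
        print(f"no .fna file found for: {leaf}")
```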

Additional parameters are described below and can also be listed with the help flag: perl builder.pl -h

Required files

Database contents

Subtrees

Depending on the size of the tree and the diversity of the strains used, some nodes (including the root) may contain no k-mers. Therefore, multiple subtrees are produced automatically at nodes where the total number of unique k-mers exceeds a given cutoff (Builder's -m or --min parameter). A tree is also split into subtrees if the number of k-mers in a node exceeds this cutoff but is still considerably smaller than the number in one of its subnodes (the required difference can be set with Builder's -g or --greater parameter). Builder and Seeker take subtrees into account automatically.
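
The two splitting rules can be illustrated with a toy model. The Python sketch below is not Builder's actual implementation: the dictionary-based tree, the function name and the example k-mer counts are all hypothetical, and treating the -g difference as a ratio is an assumption made only for this illustration.

```python
def subtree_roots(node, min_kmers, greater):
    """Return the names of the nodes at which subtrees would start (toy model).

    A node is kept as a subtree root when its k-mer count exceeds `min_kmers`
    and no child has more than `greater` times as many k-mers. Otherwise the
    tree is split and each child is examined in turn.
    ASSUMPTION: `greater` is modelled as a ratio; the real -g semantics may differ.
    """
    count = node["kmers"]
    children = node.get("children", [])
    exceeds_cutoff = count > min_kmers
    overshadowed = any(child["kmers"] > greater * count for child in children)
    if children and (not exceeds_cutoff or overshadowed):
        roots = []
        for child in children:
            roots.extend(subtree_roots(child, min_kmers, greater))
        return roots
    return [node["name"]]


# Hypothetical tree: the root has no k-mers, X is a usable subtree root,
# and Y exceeds the cutoff but is dwarfed by its child Y1.
example_tree = {
    "name": "root", "kmers": 0, "children": [
        {"name": "X", "kmers": 500, "children": [
            {"name": "X1", "kmers": 800},
            {"name": "X2", "kmers": 900},
        ]},
        {"name": "Y", "kmers": 150, "children": [
            {"name": "Y1", "kmers": 20000},
            {"name": "Y2", "kmers": 700},
        ]},
    ],
}

if __name__ == "__main__":
    print(subtree_roots(example_tree, min_kmers=100, greater=10))
    # ['X', 'Y1', 'Y2']
```

In this toy example the empty root is split by the first rule (too few k-mers), while Y is split by the second rule (it exceeds the cutoff but one subnode has far more k-mers).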

Builder parameters and their effect

Search

EXAMPLE COMMAND LINE: perl seeker.pl -i sample_file.fastq -d ss_db_w32 -o sample_result.txt
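
When several samples need to be searched against the same database, the command above can be scripted. The following Python sketch is a hypothetical wrapper, not part of StrainSeeker. It uses only the -i, -d and -o flags shown in the example command line; the output naming scheme (<sample name>_result.txt) and the function names are assumptions.

```python
import subprocess
from pathlib import Path


def seeker_command(fastq, database, output, script="seeker.pl"):
    """Build the argument list for one Seeker run, mirroring the example command."""
    return ["perl", script, "-i", str(fastq), "-d", str(database), "-o", str(output)]


def run_all(fastq_dir, database, result_dir, script="seeker.pl"):
    """Run Seeker once for every .fastq file in `fastq_dir`.

    ASSUMPTION: results are written to <result_dir>/<sample name>_result.txt.
    """
    result_dir = Path(result_dir)
    result_dir.mkdir(parents=True, exist_ok=True)
    for fastq in sorted(Path(fastq_dir).glob("*.fastq")):
        output = result_dir / f"{fastq.stem}_result.txt"
        # check=True stops the batch if a run exits with an error.
        subprocess.run(seeker_command(fastq, database, output, script), check=True)


if __name__ == "__main__":
    run_all("samples", "ss_db_w32", "results")
```

Passing the arguments as a list rather than as a single shell string avoids quoting problems with file names that contain spaces.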

Search process

Seeker parameters and their effect

Creating blacklist

File descriptions