This is the KATK package for calling rare mutations directly from raw sequencing reads.

Copyright Tartu University 2020
All programs are distributed under the terms of the GNU GPL version 3.

The source code is available in the GenomeTester4 package on GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
Precompiled binaries and database files can be downloaded from: http://bioinfo.ut.ee/KATK/

Quick usage

  katk.pl FASTQ_FILES...

Generates the index and calls variants. The output file name is derived from the name of the first FASTQ file, and the output is written to the working directory.
The database files cmd_20190410.dbb and cmd_20191031.txt should be in the working directory.

Full usage

Step 1: Indexing

  gmer_counter ARGUMENTS SEQUENCES...

Arguments:
  -v | --version - print version information and exit
  -db DATABASE - SNP/KMER database file
  -dbb DBBINARY - binary database file
  -w FILENAME - write binary database to file
  -32 - use 32-bit integers for counts (default 16-bit)
  --max_kmers NUM - maximum number of kmers per node
  --silent - do not output kmer counts (useful if only compiling the database or index is needed)
  --header - print header row
  --total - print the total number of kmers per node
  --unique - print the number of nonzero kmers per node
  --kmers - print individual kmer counts (default if no other output is selected)
  --compile_index FILENAME - add read index to database and write it to file
  --distribution NUM - print kmer distribution (up to given number)
  --num_threads - number of worker threads (default 24)
  --prefetch - prefetch memory-mapped files (faster on high-memory systems)
  --recover - recover from FastA/FastQ errors (useful for corrupted streams)
  --stats - print some statistics about sequences and kmers
  -D - increase debug level
  -DDB - increase database debug level

To generate an index for subsequent calling, use the argument --compile_index INDEX_NAME.

Step 2: Calling

  gassembler [OPTIONS] [KMERS...]

Arguments:
  -v, --version - print version information and exit
  -h, --help - print this usage screen and exit
  --dbi FILENAME - index of sequenced reads
  --seq_dir DIRECTORY - directory of FASTQ files (overrides the location stored in the index)
  --region CHR START END SEQ - reference region to be called
  --region_file FILENAME - read reference regions and kmers from file (one line at a time)
  --min_coverage INTEGER - minimum coverage for a call (default 4)
  --sex male|female|auto - sex of the individual (default auto)
  --coverage FLOAT | median | local | ignore - average sequencing depth (default median; local - use the local number of reads)
  --num_threads - number of threads to use (default 24)
  --min_p FLOAT - minimum call quality (default 0.95)
  --min_pmut FLOAT - minimum reference call quality (default 0.50)
  --exome - disable quality models (needed if coverage variability is high)
  --advanced - print advanced usage options

Advanced arguments:
  --error_prob FLOAT - probability of error (default 0.001)
  --min_confirming INTEGER - minimum confirming nucleotide count for a call (default 2)
  --min_group_coverage INTEGER - minimum coverage of group (default 1)
  --max_divergent INTEGER - maximum number of mismatches per read (default 4)
  --min_align_len INTEGER - minimum alignment length (default 25)
  --min_group_size INTEGER - minimum group size (default 3)
  --min_group_rsize FLOAT - minimum relative group size (default 0.00)
  --max_group_divergence INTEGER - maximum divergence in group (default 3)
  --max_group_rdivergence INTEGER - maximum relative divergence in group (default 3)
  --skip_end_align INTEGER - skip nucleotides at region ends during alignment (default 10)
  --skip_end_call INTEGER - skip nucleotides at alignment ends (default 10)
  --allow_one_dir - allow calling if all confirming reads have the same direction
  --output poly | best | all - output type: only polymorphisms, best call for each position, or all calls (default poly)
  --counts - output nucleotide counts
  --extra - output extra information about the call
  --alternatives - also output the homozygous variant for each heterozygous position
  -D - increase debug level
  -DG - increase group debug level
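
The two steps above can be chained by hand instead of using katk.pl. The following is a minimal sketch, not part of KATK, and it rests on several assumptions: that cmd_20190410.dbb is the binary database passed with -dbb, that cmd_20191031.txt is the region/kmer file passed with --region_file, and that gassembler writes its calls to standard output. The output names (<stem>.index, <stem>.calls.txt) and the helper functions are hypothetical; only flags listed in the usage above are used.

```python
"""Hypothetical two-step KATK wrapper (sketch only, not the katk.pl implementation)."""
import subprocess
import sys
from pathlib import Path


def output_stem(first_fastq):
    """Derive an output name from the first FASTQ file (hypothetical naming rule)."""
    return Path(first_fastq).name.split(".")[0]


def index_command(fastq_files, index_name, db_binary="cmd_20190410.dbb"):
    """Step 1: gmer_counter argv that compiles a read index.

    Assumes the .dbb file from the KATK download is the binary database (-dbb).
    """
    return [
        "gmer_counter",
        "-dbb", db_binary,
        "--compile_index", index_name,
        "--silent",  # only the index is wanted, not kmer counts
        *fastq_files,
    ]


def call_command(index_name, region_file="cmd_20191031.txt", min_coverage=4):
    """Step 2: gassembler argv that calls variants from the index.

    Assumes the .txt file from the KATK download is the --region_file input.
    """
    return [
        "gassembler",
        "--dbi", index_name,
        "--region_file", region_file,
        "--min_coverage", str(min_coverage),
    ]


def run_pipeline(fastq_files, workdir="."):
    """Run both steps; assumes gassembler prints calls to stdout."""
    stem = output_stem(fastq_files[0])
    index_name = str(Path(workdir) / f"{stem}.index")
    calls_path = Path(workdir) / f"{stem}.calls.txt"
    subprocess.run(index_command(fastq_files, index_name), check=True)
    with open(calls_path, "w") as out:
        subprocess.run(call_command(index_name), stdout=out, check=True)
    return calls_path


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: katk_sketch.py FASTQ_FILES...")
    print(run_pipeline(sys.argv[1:]))
```

Keeping the argument lists in separate functions makes it easy to inspect or adjust the commands (for example, adding --sex or --coverage) before anything is executed; subprocess.run with check=True stops the pipeline if indexing fails.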