Y-mer


Y-mer uses Y chromosome-specific k-mers and distance-based models to predict Y chromosome haplogroups (Yhg). Users can upload their own data as a FASTQ file, and Y-mer determines the closest Yhg for the sample in the chosen model, i.e. the haplogroup with the highest similarity. An example input file is available for download: DA189, an ancient genome from the Caspian Steppe region (Damgaard et al. 2018, Nature 557:369-374), 83 MB in size with 0.0055x chrY coverage.
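To illustrate the "closest haplogroup by highest similarity" idea, the sketch below reads sequences from FASTQ text, collects canonical (strand-independent) k-mers, and scores each haplogroup by the fraction of its k-mers observed in the sample. The scoring rule, the function names and the `models` dictionary are assumptions made for illustration; the actual distance-based models used by Y-mer are described in Puurand et al. 2025.

```python
# Illustrative sketch only: the scoring rule below is an assumption,
# not Y-mer's published distance model.
COMP = str.maketrans("ACGT", "TGCA")


def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def sample_kmers(reads, k):
    """Collect canonical k-mers (lexicographic min of k-mer and its reverse complement)."""
    found = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                found.add(min(kmer, kmer.translate(COMP)[::-1]))
    return found


def closest_haplogroup(reads, models, k):
    """Return (best haplogroup, scores).

    models: hypothetical dict of haplogroup -> set of canonical haplogroup k-mers.
    Score = fraction of a haplogroup's k-mers seen in the sample (assumed measure).
    """
    observed = sample_kmers(reads, k)
    scores = {hg: len(observed & ks) / len(ks) for hg, ks in models.items() if ks}
    best = max(scores, key=scores.get)
    return best, scores


# Usage with toy data (k=3 only to keep the example readable)
fastq = ["@read1", "ACGTAC", "+", "IIIIII"]
models = {"A": {"ACG", "GTA", "AAA"}, "B": {"CCC", "GGA"}}
print(closest_haplogroup(fastq_sequences(fastq), models, k=3))
```

Real Y-specific k-mers are much longer than 3 bp; the toy value only keeps the example small enough to check by hand.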





The currently available models cover
11 basic haplogroups (AB, C, E, G, H, IJ, LT, N, O, Q, R) that are common at the World (W) level,
22 haplogroups (AB, C, E1, E2, E4, G, H, I1, I2, J1, J2, LT, N3, N4, O1, O2'5, O3, O6, Q, R1a, R1b, R2) at the European (E) level, and
23 haplogroups (E2a, G2a, I1a, I1c, I1d, I1i, I1m, I2, J1, J2a, J2b, LT, N3a3, N3a4, Q, R1a1, R1a2, R1b1, R1b11, R1b2, R1b3, R1b6, R1b8) at the Northeast European (NE) level.
The k-mers used in the models were extracted from sets of 21, 110, 213 and 222 Y chromosomes, and the models were trained on subsets of individuals from the 1000 Genomes (1000G) and EGC project data. The I1 and R1 models predict only the specified subclades of these haplogroups.
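The k-mer extraction step can be pictured roughly as follows. This is a hypothetical simplification, not the procedure from Puurand et al. 2025: for each haplogroup it keeps the canonical k-mers present in all of that haplogroup's Y chromosomes and absent from a background set of non-Y sequences. The input structures, the function names and the "shared by all chromosomes" rule are assumptions.

```python
# Hypothetical sketch of building per-haplogroup Y-specific k-mer profiles.
COMP = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Canonical (strand-independent) k-mers of seq; k-mers with non-ACGT symbols are skipped."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            out.add(min(kmer, kmer.translate(COMP)[::-1]))
    return out


def build_profiles(y_by_haplogroup, background_seqs, k):
    """Map each haplogroup to Y-specific k-mers shared by all of its chromosomes.

    y_by_haplogroup: dict haplogroup -> non-empty list of Y sequences (hypothetical input)
    background_seqs: non-Y sequences whose k-mers are excluded (assumption)
    """
    background = set()
    for s in background_seqs:
        background |= canonical_kmers(s, k)
    profiles = {}
    for hg, seqs in y_by_haplogroup.items():
        shared = set.intersection(*(canonical_kmers(s, k) for s in seqs))
        profiles[hg] = shared - background
    return profiles


# Toy usage: "TTTT" contributes canonical k-mer "AAA", which is then excluded
print(build_profiles({"X": ["AAAC", "AAACG"], "Y": ["GGGA"]}, ["TTTT"], k=3))
```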

Further details of each model are described in Puurand et al. 2025.
The available models are M21W, M21E, M21NE, M110W, M213E, M222NE, M43I1 and M80R1.


The key results are reported in the last line of the output in four or more columns:

sample - sample ID
coverage - Y chromosome coverage estimated from exact k-mer matches (expected to be lower than mapping-based coverage, which tolerates mismatches)
haplogroup - predicted most likely haplogroup
pvalue - estimated from the distances of the target sample to the competing haplogroups in the model
alternatives - if the p-value of the primary haplogroup is higher than 0.05, alternative haplogroups are reported in increasing order of their p-values.
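A minimal sketch for reading the key result line programmatically, assuming the columns are whitespace-separated in the order listed above and that any columns after the p-value are alternative haplogroups. The class and function names are hypothetical, and the exact Y-mer output format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class YmerResult:
    sample: str
    coverage: float
    haplogroup: str
    pvalue: float
    alternatives: list = field(default_factory=list)

    def is_resolved(self, alpha=0.05):
        """True when the primary haplogroup's p-value does not exceed alpha,
        i.e. the case in which no alternatives are expected to be listed."""
        return self.pvalue <= alpha


def parse_last_line(output):
    """Parse the last non-empty line of Y-mer output (assumed whitespace-separated)."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty output")
    cols = lines[-1].split()
    if len(cols) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(cols)}")
    sample, coverage, haplogroup, pvalue, *alternatives = cols
    # Tolerate an optional trailing "x" on coverage (an assumption about formatting).
    return YmerResult(sample, float(coverage.rstrip("x")), haplogroup,
                      float(pvalue), alternatives)


# Hypothetical output line, not a real Y-mer result
result = parse_last_line("sample1\t0.01\tR1a\t0.2\tR1b\tN3\n")
print(result.haplogroup, result.is_resolved(), result.alternatives)
```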

Department of Bioinformatics, University of Tartu