OrthologID identifies orthologous groups and classifies query sequences by screening them for diagnostic characters using the CAOS (Characteristic Attribute Organization System) algorithm (Sarkar et al. 2002). It uses two types of diagnostic characters: pure and private. The following figure illustrates the difference between the two.
Pure diagnostic characters are shared by all members of a clade and are absent from the members of the other clades at a particular node. For example, amino acid “A” at position #4 is a pure diagnostic character that defines node A because it is shared by all members of clade A but is found in no member of clade B.
Private diagnostic characters are present in one or more members of a clade, but are absent from the members of the other clades at a particular node. For example, amino acid residues “E” and “Q” at position #6 are both private diagnostic characters that define node A, since each is shared by multiple members of clade A but is found in no member of clade B. Other examples of private characters that define node A include amino acid “L” at position #2, and amino acids “E” and “T” at position #7.
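
The distinction between the two types can be sketched in a few lines of Python. This is only an illustration of the definitions above, not the CAOS or OrthologID implementation. The function name `diagnostic_characters` and the toy alignment are hypothetical and are not taken from the figure. The sketch also assumes that sequences are already aligned to equal length, that only two clades meet at the node, and that a residue found in every member of the clade is reported as pure rather than private.

```python
from typing import Dict, List, Set


def diagnostic_characters(
    clade: List[str], others: List[str]
) -> Dict[int, Dict[str, Set[str]]]:
    """Return pure and private diagnostic residues of `clade`, keyed by 1-based position.

    Hypothetical helper: `clade` and `others` are aligned sequences of equal length.
    """
    result: Dict[int, Dict[str, Set[str]]] = {}
    for pos in range(len(clade[0])):
        in_clade = [seq[pos] for seq in clade]
        in_others = {seq[pos] for seq in others}
        pure: Set[str] = set()
        private: Set[str] = set()
        for residue in set(in_clade):
            if residue in in_others:
                continue  # also seen in the other clade, so not diagnostic
            if in_clade.count(residue) == len(clade):
                pure.add(residue)      # shared by every member of the clade
            else:
                private.add(residue)   # found in only some members of the clade
        if pure or private:
            result[pos + 1] = {"pure": pure, "private": private}
    return result


# Toy alignment (made up for illustration; not the alignment in the figure).
clade_a = ["MLKAVEE", "MLKAVQT", "MIKAVEE"]
clade_b = ["MIKGVKS", "MIKGVKS"]

result = diagnostic_characters(clade_a, clade_b)
for position, kinds in sorted(result.items()):
    print(position, "pure:", sorted(kinds["pure"]), "private:", sorted(kinds["private"]))
```

With this toy alignment, position 4 yields the pure character “A”, while positions 2, 6, and 7 yield only private characters, which mirrors the pattern described in the text.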