Diagnostic Characters: Pure vs. Private

OrthologID identifies orthologous groups and classifies query sequences by screening them for diagnostic characters using the CAOS algorithm (Sarkar et al. 2002). It uses two types of diagnostic characters: pure and private. The following figure illustrates the difference between the two.

Pure diagnostic characters are shared by all members of a clade and are absent from the members of the other clades at a particular node. For example, amino acid “A” at position #4 is a pure diagnostic character that defines node A because it is shared by all members of clade A but is not found in any member of clade B.
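The definition of a pure character can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the CAOS implementation: the function name pure_characters and the toy alignment are hypothetical, the sequences are not the ones in the figure, and the sketch assumes equal-length aligned sequences with no gap handling.

```python
def pure_characters(clade, others):
    """Return {position: residue} for alignment columns where every member
    of `clade` carries the same residue and no member of `others` has it.
    Positions are 1-based, as in the text."""
    pure = {}
    for pos in range(len(clade[0])):
        in_clade = {seq[pos] for seq in clade}    # residues seen inside the clade
        out_clade = {seq[pos] for seq in others}  # residues seen outside it
        if len(in_clade) == 1 and not (in_clade & out_clade):
            pure[pos + 1] = next(iter(in_clade))
    return pure


# Hypothetical toy alignment (NOT the figure's data).
clade_a = ["MKLAVE", "MRLAVQ", "MKLAIE"]
clade_b = ["MKISVD", "MRISVD"]

print(pure_characters(clade_a, clade_b))
```

In this toy data, positions 3 and 4 are uniform within clade_a and differ from everything in clade_b, so they are the only columns reported.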

Private diagnostic characters are present in one or more members of a clade but are absent from the members of the other clades at a particular node. For example, amino acid residues “E” and “Q” at position #6 are both private diagnostic characters that define node A, since each occurs in members of clade A but in no member of clade B. Other examples of private characters that define node A include amino acid “L” at position #2 and amino acids “E” and “T” at position #7.
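A private character can be checked in a similar way. The sketch below is a hedged illustration rather than the CAOS implementation: the function name private_characters and the toy alignment are hypothetical, and it assumes that a residue shared by every member of the clade is counted as pure rather than private. That assumption is an interpretation of the definitions above, not something stated by the source.

```python
def private_characters(clade, others):
    """Return {position: [residues]} for residues found in at least one,
    but not all, members of `clade` and in no member of `others`.
    Positions are 1-based, as in the text."""
    private = {}
    for pos in range(len(clade[0])):
        in_clade = [seq[pos] for seq in clade]    # one residue per clade member
        out_clade = {seq[pos] for seq in others}  # residues seen outside the clade
        residues = sorted(
            r for r in set(in_clade)
            # Assumption: residues shared by ALL members are pure, so skip them.
            if r not in out_clade and in_clade.count(r) < len(clade)
        )
        if residues:
            private[pos + 1] = residues
    return private


# Hypothetical toy alignment (NOT the figure's data).
clade_a = ["MKLAVE", "MRLAVQ", "MKLAIE"]
clade_b = ["MKISVD", "MRISVD"]

print(private_characters(clade_a, clade_b))
```

Here position 6 yields two private residues for clade_a, which mirrors the situation described for “E” and “Q” above: neither is shared by the whole clade, but both are missing from the other clade.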