One important distinction between mFASTA-represented alignments and the alignment representation in a CD file should be kept in mind when using this converter. In general, an mFASTA alignment may have a residue in the first (i.e., reference) sequence aligned to a gap in one or more of the other sequences in the alignment. All conserved domains in the CDD, however, have a 'block structure' in which every aligned residue in the reference sequence is aligned to a residue on each sequence in the multiple alignment. An algorithm termed 'intersection by master' (IBM) can generate a multiple alignment obeying the CDD constraints from one that does not by simply removing every alignment column that contains a gap. (Clearly, the reverse transformation is not possible, because the removed columns cannot be recovered from the truncated alignment.)
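The column-removal idea described above can be illustrated with a minimal Python sketch. This is not the fa2cd implementation: the function name intersect_by_master, the GAP_CHARS set, and the treatment of both '-' and '.' as gap characters are assumptions made for illustration only. The sketch operates on already-aligned rows of equal length, with the reference sequence first, and does not attempt to build the block representation stored in a CD file.

```python
# Hypothetical illustration of 'intersection by master' as column filtering.
# Assumption: '-' and '.' both denote gaps in the aligned rows.
GAP_CHARS = frozenset("-.")


def intersect_by_master(rows):
    """Return the aligned rows with every gap-containing column removed."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")
    # Keep a column only if no sequence has a gap at that position.
    keep = [
        col for col in range(width)
        if all(row[col] not in GAP_CHARS for row in rows)
    ]
    return ["".join(row[col] for col in keep) for row in rows]


# The first row plays the role of the reference (master) sequence.
aligned = ["MKT-LLV", "MRTALI-", "M-SALLV"]
truncated = intersect_by_master(aligned)
print(truncated)  # ['MTLL', 'MTLI', 'MSLL']
```

Only columns 1, 3, 5 and 6 of the toy alignment survive, since each of the other columns has a gap in at least one row. Every remaining residue in the reference is then aligned to a residue in each of the other sequences, which is the block-structure property described above.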
The fa2cd converter can output a CD file from the input mFASTA with or without running the IBM algorithm. By default, IBM is not run and the input alignment is preserved without truncation. The '-ibm' flag truncates the input alignment to remove gap-containing columns, consistent with CDD conventions. Neither CDTree nor Cn3D requires that an input multiple alignment have a valid block structure in the sense described. Note, however, that a) Cn3D's alignment viewer displays alignments as if IBM had been run, and b) those CDTree functions that assume a CDD-style alignment may fail and generate warning messages.