Imports and Conservation in Cn3D

Cn3D macromolecular structure viewer

	This chapter illustrates some more advanced alignment features in Cn3D: importing sequences and structures, and visualizing sequence conservation.
	Importing sequences and structures
	Cn3D is the perfect tool for viewing precomputed alignments - like those from VAST as described in the previous chapter. But it is also capable of importing user-selected sequences and structures, in order to display the mapping of a sequence onto the current master structure, or to align new structures.
	Let's start with a simple example. In an earlier section, we pointed out that when the 1D5R structure (Lee et al., 1999) is first loaded into Cn3D, the sequence numbering of the residues in the sequence window doesn't match the numbering of the residues in the PDB file, because of the residues missing in the refined structure. While not always true, in this case the PDB residues are numbered sequentially according to the "true" sequence of the natural protein. This is better illustrated by an alignment of the 1D5R sequence with the natural sequence, where the missing residues in the structure are made apparent.
	So, let's use Entrez to find the natural sequence: select "Protein" in the "Search" menu on the left, type in "human PTEN" in the query box, and hit "Go". Look through the list to find SwissProt's PTEN_HUMAN entry. Click on the accession "P60484" link to go to the GenPept report. For file import, Cn3D only understands FASTA, so select "FASTA" to the right of the "Display" button, then hit the "Display" button to see the FASTA-formatted sequence, which we will use later.
	First, load the 1D5R structure into Cn3D (see the MMDB instructions for this). Then in the sequence window, do Imports:Show Imports, which will bring up the (empty) import window.
	There are then two ways to import a sequence into Cn3D, either from a local file or by network download. To use a local file, cut and paste the FASTA-formatted sequence into a text file on your computer. Then do Edit:Import Sequences:From FASTA File and select in the file dialog the file you just saved. If you're not behind a firewall, you can also have Cn3D download the sequence directly from NCBI.
To do this, do Edit:Import Sequence:Network via GI/Accession, then type in the above accession code "P60484" and hit "Ok."
	Note that at this point, all the residues are in lower case, which means there's no alignment. So you still need to align the two sequences using some alignment algorithm. Cn3D has a basic BLAST interface built in, so do Algorithms:BLAST Single and then click once anywhere on one of the two sequences in the import window. This will align the two sequences using default gapped BLAST. Scrolling through the import window, one can now clearly see the small N- and large C-terminal tails missing in 1D5R, as well as the excised loop from residues 286 to 308. Notice also that the PDB numbering of the 1D5R sequence matches the sequence numbering of the natural sequence.
	Finally, to make this alignment into the "multiple" alignment in the sequence window, do Alignments:Merge All. This will move the aligned pair into the sequence window, and cause the structure to be re-colored in the usual alignment coloring scheme.
	Importing structures is accomplished in much the same way. For example, say we wanted to re-create the VAST alignment of 1D5R and 1VHR that was an example in the previous chapter. First, go to the MMDB summary of 1VHR. As with sequences, you can download the MMDB file yourself (select "save file" in the MMDB page), or have Cn3D download it for you. So, load the 1D5R structure into Cn3D again, go to the import window, and do Edit:Import Structure. If you do "from file", then point the dialog to the MMDB file you've saved on your computer. If you do "via network", then type in the MMDB id of 1VHR, which is 4625. Whichever way you initiate the import, Cn3D will then ask which chain of 1VHR you want to align. In this case, we want to align chain A, so select that. Cn3D will then connect to the VAST database at NCBI and try to find a VAST alignment for these two sequences.
If the import is successful, you will see the sequence alignment that corresponds to the VAST alignment we saw in the previous chapter. Do Alignments:Merge All to move the import into the alignment window. Notice that there's no structure visible yet; Cn3D only shows structures that are present in the multiple alignment when a file is first opened. So, do File:Save to export to a local file (answer "Yes" to any questions it asks), then do File:Load and point it to the file you just saved. You should now see the two aligned structures, showing also the structure alignment imported from VAST.
	If there is no VAST alignment, or you're behind a firewall, or for some reason Cn3D can't connect to NCBI, then you'll need to align the sequences and structures manually. Do Algorithms:BLAST Single and then click on one of the sequences to initiate a BLAST alignment. Then do Alignments:Merge All to move the import into the alignment window. Save and re-load the file, then do File:Realign Structures to calculate the structure alignment based on the alpha carbons of the aligned residues.
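The numbering idea from the sequence import example (structure residues numbered by their position in the natural sequence, with the missing tails and excised loop showing up as gaps) can be sketched in a few lines of Python. This is only an illustration, not how Cn3D does it internally: the function names `number_by_reference` and `missing_ranges` are hypothetical, and the two short aligned strings are made-up toy data, not the real PTEN or 1D5R sequences.

```python
def number_by_reference(aligned_ref, aligned_query, gap="-"):
    """Give each query residue its 1-based position in the reference.

    Both strings must come from the same pairwise alignment, so they have
    equal length. A query residue facing a reference gap gets None.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned strings must have the same length")
    ref_pos = 0
    numbering = []
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != gap:
            ref_pos += 1
        if query_char != gap:
            numbering.append(ref_pos if ref_char != gap else None)
    return numbering


def missing_ranges(numbering, ref_length):
    """List (start, end) reference ranges with no query residue."""
    present = {n for n in numbering if n is not None}
    ranges = []
    start = None
    for pos in range(1, ref_length + 1):
        if pos not in present:
            if start is None:
                start = pos
        elif start is not None:
            ranges.append((start, pos - 1))
            start = None
    if start is not None:
        ranges.append((start, ref_length))
    return ranges


# Toy data (hypothetical): a 13-residue "natural" sequence and a
# "structure" sequence missing both tails and an internal loop.
natural = "ACDEFGHIKLMNP"
structure = "--DEFG---LMN-"

numbering = number_by_reference(natural, structure)
print(numbering)
print(missing_ranges(numbering, len(natural)))
```

With real data, the same bookkeeping would report the N- and C-terminal tails and the excised loop as missing ranges.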
	Visualizing sequence conservation
	Cn3D has the ability to calculate various color schemes based on sequence conservation in the columns of an alignment. The location of conserved residues in the structure is thus made directly visible by applying "conserved" color to both the structure and the sequence displays. Cn3D uses red to indicate highly conserved positions, and blue for highly variable positions. Just how the composition of a column of letters in an alignment should be mapped to a color is nontrivial; Cn3D uses a variety of simple algorithms to color by conservation. These are the options in the Style:Coloring Shortcuts:Sequence Conservation submenu:
	Identity - colors columns of identical residues red, otherwise blue.
	Variety - scales from red to blue depending on the number of different residues in the column, regardless of residue type. For a column of four residues, for example, most conserved is of course a single type (e.g. "AAAA"), then two types ("AAAV"), then three ("AAVW"), and the least conserved is four ("AVIW"). So there are four colors used: variety of 1 = red, 2 = bluish-red, 3 = reddish-blue, 4 = blue. Each gap is considered an additional type.
	Weighted Variety - similar, except the "variety" is weighted by a substitution probability matrix (BLOSUM62). Thus a column of similar residues (e.g. "VVIL") will be counted as more conserved (and made more red) than one of dissimilar residues ("AAWK"), despite the fact that the "variety" is the same (in this case, 3).
	Information Content - also uses a BLOSUM62 matrix to calculate a conservation score, using a log ratio of observed versus expected frequency of residues in each column.
	Fit - a little different, in that it shows residues in a more blue color to indicate that a residue does not fit well with the conservation in the rest of the column (e.g., a lysine in a column otherwise composed of valines).
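The Identity and Variety schemes are simple enough to sketch in Python. The counting rule (number of distinct residue types, with each gap counted as an additional type) follows the description above. The linear red-to-blue interpolation and the RGB tuples are assumptions made for illustration, since the exact color mapping Cn3D uses is not specified here, and the function names are hypothetical.

```python
RED = (1.0, 0.0, 0.0)
BLUE = (0.0, 0.0, 1.0)


def column_variety(column, gap="-"):
    """Count distinct residue types; each gap counts as one more type."""
    residues = [c.upper() for c in column if c != gap]
    gaps = len(column) - len(residues)
    return len(set(residues)) + gaps


def identity_color(column):
    """Red if every row has the same residue, otherwise blue."""
    return RED if column_variety(column) == 1 else BLUE


def variety_color(column):
    """Interpolate from red (variety 1) to blue (variety = column size).

    The linear blend is an assumption, not Cn3D's documented mapping.
    """
    rows = len(column)
    if rows <= 1:
        return RED
    t = (column_variety(column) - 1) / (rows - 1)
    return (1.0 - t, 0.0, t)


for col in ["AAAA", "AAAV", "AAVW", "AVIW"]:
    print(col, column_variety(col), variety_color(col))
```

For the four example columns this gives varieties 1 through 4, with colors stepping evenly from pure red to pure blue.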
Using these color schemes, one can quickly spot where the more conserved (more red) residues are located on the structures and in the alignment. Often the very conserved residues around an active site, for example, will be clearly seen this way.
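The "log ratio of observed versus expected frequency" behind the Information Content scheme can also be sketched. The version below is a deliberate simplification: it assumes a uniform expected frequency of 1/20 for every amino acid, whereas Cn3D is described as deriving its score with a BLOSUM62 matrix. The function name and the uniform background are assumptions for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: uniform expected frequencies. Cn3D itself is described as
# using BLOSUM62, which this sketch does not reproduce.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def information_content(column, background=UNIFORM_BACKGROUND):
    """Sum over residue types of observed * log2(observed / expected).

    Gaps and letters missing from the background are ignored. Higher
    values mean a more conserved column.
    """
    residues = [c for c in column.upper() if c in background]
    if not residues:
        return 0.0
    total = len(residues)
    score = 0.0
    for aa, count in Counter(residues).items():
        observed = count / total
        score += observed * math.log2(observed / background[aa])
    return score


for col in ["AAAA", "AAVV", "AVIW"]:
    print(col, round(information_content(col), 3))
```

Under this uniform background, a fully conserved column scores log2(20) bits and the score drops as the column becomes more mixed, which is the ordering a red-to-blue scale would then display.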

Revised 20 September 2016