VAST: Vector Alignment Search Tool

This VAST Search help document describes how to use the VAST Search page, which allows you to enter a query structure as a PDB-formatted file. The search results will be returned in original VAST output format, which is described in the VAST help document.
(The new VAST+ output format is currently available only for structures that are publicly available in MMDB. The VAST+ help document provides illustrated examples of VAST+ search results and describes the difference between VAST and VAST+.)

VAST Search Help
What is VAST Search?
  VAST Search is a World Wide Web service for "on the fly" searches to compare the 3D structure of a query protein against the 3D structures of proteins in NCBI's Molecular Modeling Database (MMDB). It finds similar structures (sometimes referred to as "neighbors") using the Vector Alignment Search Tool (VAST) algorithm.

The VAST Search service accepts input in PDB file format. It identifies the 3D domains in each protein molecule of your query structure and finds other structures that are similar in 3D to the individual protein molecules, and to the individual 3D domains, in your query.

This "on the fly" search service is typically used if your query structure is not yet publicly available (for example, if you have a newly resolved structure) and you want to identify other structures that might have a similar 3D shape. On the other hand, if your query structure is publicly available in MMDB, you can easily retrieve pre-computed VAST results by following the "Similar Structures" link near the upper right corner of an MMDB structure summary page.

At this time, the "on the fly" VAST Search service still returns results in the original-style VAST display, which lists structures that have similarities to individual protein molecules and 3D domains in your query structure, and allows you to view their sequence alignments and 3D superpositions. The VAST help document provides illustrated examples of the original VAST display, alignment footprints and a 3D superposition.

The new VAST+ display, which ranks similar structures based on the 3D similarity of their macromolecular complex (biological unit) to the query structure, and provides the ability to view sequence alignments and superpositions of the biological units, is currently available only for structures that are available in the public database. If you retrieve pre-computed VAST results for a record that's publicly available in MMDB, the VAST+ display will be shown by default. The VAST+ help document provides illustrated examples of the new VAST+ results display and describes the difference between VAST and VAST+.

How to submit a file to the VAST Search service?
  The submitted file should be plain text (ASCII) in PDB file format. (Important: Word documents will not work!) The file needs to contain only ATOM records for the chain(s) and HETATM records for any heterogens. It does not need SEQRES records or any other records that lack atomic coordinates.
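
As an illustration of the file requirements above, the following Python sketch reduces a PDB-formatted text to its ATOM and HETATM records and rejects non-ASCII input. It is not part of VAST Search; the function name keep_coordinate_records and the toy sample lines are made up for this example.

```python
def keep_coordinate_records(pdb_text: str) -> str:
    """Return only the ATOM and HETATM lines of a PDB-formatted string."""
    # Raises UnicodeEncodeError if the text is not plain ASCII,
    # e.g. when a word-processor document was pasted in by mistake.
    pdb_text.encode("ascii")
    # PDB record names occupy the first six columns of each line.
    kept = [
        line for line in pdb_text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]
    return "".join(line + "\n" for line in kept)


# Toy input with made-up coordinates, for illustration only.
sample = (
    "HEADER    EXAMPLE\n"
    "SEQRES   1 A    1  GLY\n"
    "ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "HETATM    2  O   HOH A 101       1.000   2.000   3.000  1.00  0.00           O\n"
    "END\n"
)
filtered = keep_coordinate_records(sample)
```

Here filtered holds only the ATOM and HETATM lines; the HEADER, SEQRES and END lines are dropped, since the text above says they are not needed.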

To submit a data file to VAST Search:
  • Open the VAST Search home page.
  • The first section of the page allows you to "Input a PDB-formatted file directly into VAST Search."
  • Use the [Choose File] button to select the desired PDB-formatted data file from your computer.
  • Select the data set (all or medium redundancy) against which you want to search.
    The (default) "Medium-redundancy Subset of PDB" is about 1/10 the size of "All of PDB" and therefore the search will be much faster.
  • Click on the [Submit] button to begin the search. This step converts the PDB-formatted file into an ASN.1-formatted file, which is used for the VAST Search.
  • A long, random numerical request identifier is assigned to each search. It can be used for one week to retrieve the search results without repeating the search.
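
Because results are kept for one week, it can help to note when a request identifier will stop working. The Python sketch below is only a bookkeeping aid under stated assumptions: the function names are hypothetical, "one week" is taken as exactly seven days, and the last day is assumed to be inclusive, which the text does not specify.

```python
from datetime import date, timedelta

# "One week", per the text above; the exact server-side cutoff is an assumption.
RETENTION = timedelta(days=7)


def results_expire_on(submitted: date) -> date:
    """Return the last day on which results are assumed to be retrievable."""
    return submitted + RETENTION


def still_available(submitted: date, today: date) -> bool:
    """True if 'today' falls within the assumed retention window."""
    return submitted <= today <= results_expire_on(submitted)


expiry = results_expire_on(date(2016, 9, 26))
```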
How to select a database search set?
  You can search against either a non-redundant subset of the PDB or all of the PDB. The non-redundant subset was selected by clustering the protein sequences at a BLAST P-value cutoff of 10e-40. The resulting set of representatives is about one tenth the size of the full database, so the search time should be proportionally shorter. (Additional details about the non-redundant data sets are provided in the VAST help document.)
Where to see the 3D structure of your query protein and how to launch a VAST Search job?
  After the input file is processed, you will be taken to the structure summary page and informed "Data Parsing Done". Your request identifier is displayed; copy and save it, because you will need it to see the final results!

The structure summary of your query is displayed both in a table and as graphics. The graphic "rulers" correspond to the chains in your input file. If protein chains with 3D domains are present, the domains are also displayed graphically, showing the location and boundaries of each 3D domain on its protein chain.

You can also use the [View 3D Structure] button to verify that your file has been input correctly. The option "Display" shows the structure in the Cn3D graphics viewer. The options "See File" and "Save File" let you view the ASN.1-format file directly or save it to your local disk. If you are satisfied that your file has been converted correctly, click on the [Start] button to begin the actual VAST Search. This takes you to the status monitoring page, which tells you that the "Search [is] in Progress" and informs you of the running time of your search job.
How long might it take to complete a VAST Search job?
  Three main factors influence the running time of a VAST Search. First, if you choose to search the non-redundant subset (which is recommended), the search will run about 10 times faster than a search of the entire database. Second, multiple protein chains in your file, or large chains that split into many domains, will lengthen the time needed to complete the search. Third, the type of fold also has an influence. A VAST Search with a structure such as a TIM barrel, for which there are very many other TIM barrels in the database (even the non-redundant one), will take much longer than a search with a fold that is less populated in the database.
How to view results of VAST search?
  The status monitoring page will take you to the result summary page automatically after a search job has completed. Alternatively, on the VAST Search home page, enter your Request Id in the box adjacent to the [Show] button and click on [Show]. If the VAST Search job has not yet completed, you will be informed of its status; if it has, you will see the result summary page, which tells you "Search Job Completed".

The summary information is presented in a summary table. Besides the structure summary for the query chains and domains, the table shows how many structures in the database search set you chose when you submitted your file are significantly similar to each query chain or domain, as found by VAST. Follow the links in the table to see the detailed footprints of the alignments between each query chain or domain and its neighbors. Important: Since VAST returns neighbors that are deemed significant according to its E-value, and this E-value is related to the size of the structure, it is not unusual for the neighbor lists for the entire chain and for the domains to be different.

If you forget what your protein looks like and want to view its 3D structure, click on the [View] button for "View the Summary (graphical)" on the summary page. In addition to viewing the structure in the graphics, you can click on the ruler for an entire chain to see the VAST structure neighbors for that chain, or click on the bars for the domains to see the VAST structure neighbors for those domains.
Where to see individual VAST structure neighbors?
  To see the VAST structure neighbors, click the links in the VAST Search result summary page. The neighbor page lists every neighbor by name and shows its alignment footprint in the graphics. The page also offers several options for viewing the alignments, selecting a subset to display, sorting neighbors by various measures, and looking for candidate similar structures:
  • [View 3D Structure]: This button can be used to view structural superpositions in Cn3D, or to output the superpositions into a file. Select the neighbors you wish to view by clicking the checkboxes to the left of the PDB codes in the neighbor list. Up to 10 neighbors may be selected. If you wish to See or Save the file, see the options under "Display".
  • [View Alignment]: This may be used to obtain a text version of the sequence alignments corresponding to the structural superposition of selected (check-boxed) neighbors.
  • [List]: for selecting subsets, sorting by alignment quantities such as alignment length, root mean square deviation, percent identical residues, etc.
  • [Find]: for selecting or looking for neighbors by PDB/MMDB/or domain identifiers.
The VAST Help document provides additional information and illustrated examples of the initial VAST results page and the graphic display of alignment footprints.
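
If you have copied the neighbor measures into your own records, the kind of sorting offered by [List] can be reproduced locally. The Python sketch below is a minimal illustration and not the server's method: the Neighbor class, its field names, and the placeholder values are all hypothetical and do not come from a real search.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Neighbor:
    identifier: str          # placeholder label, not a real PDB code
    aligned_length: int      # number of aligned residues
    rmsd: float              # root mean square deviation
    percent_identity: float  # percent identical residues in the alignment


def sort_neighbors(neighbors, field, descending=True):
    """Return a new list of neighbors sorted by the named field."""
    return sorted(neighbors, key=attrgetter(field), reverse=descending)


# Made-up values, for illustration only.
neighbors = [
    Neighbor("nbr_1", 120, 2.1, 18.0),
    Neighbor("nbr_2", 95, 1.4, 35.0),
    Neighbor("nbr_3", 150, 3.0, 12.5),
]

# Longest alignments first; smallest RMSD first.
by_length = sort_neighbors(neighbors, "aligned_length")
by_rmsd = sort_neighbors(neighbors, "rmsd", descending=False)
```

Sorting by RMSD is done in ascending order here because a smaller deviation indicates a closer superposition, whereas longer alignments and higher percent identity are listed first.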
How can I save the alignments?
  Search results will remain on the server for a week to allow you time to save the hit lists and the alignments. The HTML page containing the list of neighbors can be saved locally and reloaded into your browser. Alignments can be saved in either binary or text ASN.1 format to be viewed locally with Cn3D. For any hit list of five or more neighbors, a maximum of five neighbors can be viewed with Cn3D and saved in ASN.1. Alternatively, you can save the alignments of the query protein and a single hit as a kinemage file to read with the program MAGE or save the rotated PDB file of each neighbor, which should allow you to view the alignments with a program that reads PDB files.

What is a Biostruc and what does it mean when the error message "Missing Biostruc" appears?
  Biostruc is a file format specified in a data description language called ASN.1. It includes a complete chemical description of a molecule, which is not present in PDB-formatted files. The Biostruc format allows the submitter to view the structure of the query protein and its alignment superpositions with its neighbors in Cn3D, and it is needed to browse the list of neighbors. When the error message "Missing Biostruc" is returned, it means that this file was not created and the search for neighbors cannot begin.
Do I need Cn3D to examine the list of structure neighbors?
  No. You have the option to view the alignments with any graphics program of your choice, but we recommend you view the structural alignments with Cn3D, which we distribute for free with VAST Search.  
What is the VAST Search privacy policy?
  When you submit a data file to VAST Search, a long, random numerical request identifier is assigned to each search. The request identifier is presented to you when the VAST Search is initiated. The structures of the query protein and results of the search can be viewed only if the request identifier is specified, either by bookmarking the "Data Parsing Done" or "Search In Progress" page, or recording the identifier. This ensures that the original data and the search results are confidential. Note that results are maintained on the server for one week only. After that time please submit a new request.  
Revised 26 September 2016