NCBI Inferred Biomolecular Interactions Server (IBIS) Help

What is IBIS?

IBIS is the Inferred Biomolecular Interactions Server (/Structure/ibis/ibis.cgi), developed at NCBI to analyze and predict interactions between proteins and other biomolecules. Its distinguishing feature is that it identifies protein interaction partners together with the locations of their binding sites on the query sequence/structure. IBIS groups protein binding sites into five categories: protein, small molecule, nucleic acid, peptide and ion binding sites. For a given sequence/structure query with unknown binding sites, IBIS first reports the physical interactions made by the query itself. These so-called observed interactions have been seen directly in experimentally determined structures and are marked by a red letter "o" at the beginning of the row. The full list of observed interactions for all PDB queries can be downloaded from ftp://ftp.ncbi.nih.gov/pub/mmdb/ibis/.

Second, IBIS can infer binding sites by homology, by inspecting the protein complexes formed by close homologs of a given query. To ensure the biological relevance of inferred binding sites, IBIS clusters them based on their sequence and structure conservation. Only binding sites that are evolutionarily conserved among non-redundant homologous proteins are considered in the prediction. Additionally, binding site clusters are verified by comparing them with the curated binding site annotations from the Conserved Domain Database (CDD⁽⁴⁾), if present, and only biological assemblies from the PDB/MMDB databases are used. After binding sites are clustered, position-specific scoring matrices (PSSMs) are constructed from the binding site alignments. The PSSMs are subsequently used (together with other measures) to rank binding sites with respect to their closeness to the query and their biological relevance.
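To make the PSSM step more concrete, here is a minimal sketch in Python of how a position-specific scoring matrix could be built from a gap-free binding site alignment and used to score a candidate site. It is not the IBIS implementation: the uniform background frequencies, the pseudocount of 1, and the names build_pssm and score_site are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (not taken from IBIS): uniform background frequencies
    and a simple additive pseudocount.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count only standard residues; gap characters are ignored.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_site(pssm, site):
    """Sum the per-column log-odds scores for the residues of a candidate site."""
    if len(site) != len(pssm):
        raise ValueError("site length must match the number of PSSM columns")
    # Unknown characters contribute 0.0 (a neutral score).
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, site))


# Three aligned binding sites from hypothetical cluster members.
pssm = build_pssm(["HDS", "HDS", "HES"])
print(score_site(pssm, "HDS"), score_site(pssm, "AAA"))
```

A site that matches the residues seen in the cluster scores higher than an unrelated one, which is the property the ranking relies on.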

How do I search IBIS?

IBIS can be searched by a four letter PDB code (Ex: 1XBB).

If known, a single-letter chain identifier can also be supplied along with the PDB code (Ex: 1XBBA).

IBIS can also be searched using the GI (GenBank identifier) of any protein sequence (Ex: 1668706) or a protein sequence accession (Ex: O36006).
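The query forms listed above can be told apart mechanically. The following Python sketch classifies a query string as a PDB code, a PDB code with chain, a GI, or an accession. The regular expressions and the function name classify_query are assumptions for illustration; IBIS's own input validation is not described in this help page. Note that a purely numeric string of four or five characters is ambiguous between a PDB code and a GI, and this sketch treats it as a PDB code.

```python
import re
from typing import Tuple

# Illustrative patterns (assumptions, not the server's actual rules).
PDB_CODE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")                    # e.g. 1XBB
PDB_CHAIN = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])$")    # e.g. 1XBBA
GI_NUMBER = re.compile(r"^[0-9]+$")                                # e.g. 1668706


def classify_query(query: str) -> Tuple[str, str]:
    """Return (query_type, normalized_query) for an IBIS-style search string."""
    q = query.strip()
    if PDB_CODE.match(q):
        return ("pdb", q.upper())
    match = PDB_CHAIN.match(q)
    if match:
        # The PDB code is upper-cased; the chain identifier keeps its case.
        return ("pdb_chain", match.group(1).upper() + match.group(2))
    if GI_NUMBER.match(q):
        return ("gi", q)
    # Anything else is treated as a protein sequence accession.
    return ("accession", q)


for example in ("1XBB", "1xbbA", "1668706", "O36006"):
    print(example, classify_query(example))
```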

How does IBIS find inferred interactions for protein sequences without a known structure?

IBIS can also be searched for sequences with no known structure, using their NCBI GenBank identifier (GI) or protein sequence accession. In the current version, for a given protein sequence query, a BLAST search is performed against the sequences of all structures in PDB/MMDB⁽³⁾ to find the closest homolog. The interactions for the closest homolog are then displayed.

Why are there no results reported for my query?

If the query is a PDB identifier and does not have any homologs with more than 25% sequence identity over the structurally superimposed region, IBIS will not report any interactions.

If the query is a protein sequence that does not have any close homologs, IBIS will not report any interactions.

How are the different types of interactions defined, and how do I navigate between them?

Five different types of interactions/binding sites are currently presented in IBIS: protein-protein, protein-chemical, protein-nucleic acid, protein-peptide and protein-ion. The tabs "Protein", "Chemical", "DNA/RNA", "Peptide" and "Ion" in the top left corner show the interactions/binding sites for the respective type. The user may navigate between the different types of interactions by clicking on these tabs.

Protein-chemical, protein-nucleic acid, protein-peptide and protein-ion interactions are based on the full chain sequence of the query.

Protein-protein interactions are annotated and inferred for each domain of the query. Domains on the query are mapped using the CDD⁽⁴⁾ database and the CD-Search method⁽⁵⁾. If no domains are mapped to the query, the full chain sequence of the query is used. Different query domains can be selected by clicking on the corresponding domain bubble below the query in the graphic at the top of the page.

How is the IBIS web page organized?

Each web page pertains to a single query sequence/structure or protein domain.
There are three main parts of the web page display.

A summary graphic is at the top of the page. The query sequence is represented by a "ruler". Bubbles below the sequence show the footprints of any conserved domains detected for the query. Underneath is a list of binding site clusters (see below), with locations of binding site residues indicated by triangles. Each cluster is labeled by the name of the interacting partner. Clicking on the partner name expands the corresponding row in the detailed table.

There is a table listing the details of each binding site cluster for the selected type of interaction (Protein, Chemical, DNA/RNA, Peptide, and Ion). Click on the [+] button at the left side of the cluster to see the cluster features. To close the cluster click on the [-] button.

On the left, there are Search tools for exploring the interaction data of a given query protein.

What is a "binding site cluster"?

A cluster is a collection of structures that are related to the query. All members of the cluster should contain similar binding sites. Homology is inferred by comparing the query with similar structures identified by the VAST algorithm for structure-structure superposition. Currently a 25% identity threshold is used for all interactions except homooligomers (protein-protein interactions between two domains belonging to the same CDD family). For homooligomers, a more stringent threshold of 50% identity between the query and all cluster members is used. On the main page, IBIS displays only those clusters whose binding sites are evolutionarily conserved among non-redundant homologous proteins. Similarity between binding sites is measured in terms of sequence similarity, and positions that structurally overlap are assigned a higher weight. A "singleton" cluster has only one non-redundant member (after members with more than 90% identity are purged); singleton clusters are displayed at the bottom of the list.
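The thresholds in this answer can be expressed as a short Python sketch. It is an illustration only: the greedy redundancy purge, the strict "greater than" comparison at the threshold boundary, and all function names are assumptions, and pairwise_identity stands in for whatever identity measure the caller has available.

```python
def identity_threshold(is_homooligomer: bool) -> float:
    """Percent identity threshold: 50% for homooligomers, 25% otherwise."""
    return 50.0 if is_homooligomer else 25.0


def passes_threshold(percent_identity: float, is_homooligomer: bool = False) -> bool:
    # Whether the boundary value itself passes is an assumption here.
    return percent_identity > identity_threshold(is_homooligomer)


def nonredundant(members, pairwise_identity, cutoff=90.0):
    """Greedily keep members that share at most `cutoff` % identity with those kept.

    The greedy order-dependent purge is an assumption; the help text only
    states that members with more than 90% identity are purged.
    """
    kept = []
    for member in members:
        if all(pairwise_identity(member, other) <= cutoff for other in kept):
            kept.append(member)
    return kept


def is_singleton(members, pairwise_identity) -> bool:
    """A singleton cluster has exactly one non-redundant member."""
    return len(nonredundant(members, pairwise_identity)) == 1


# Hypothetical pairwise identities between three cluster members.
identities = {
    frozenset(("a", "b")): 95.0,
    frozenset(("a", "c")): 40.0,
    frozenset(("b", "c")): 42.0,
}


def example_identity(x, y):
    return 100.0 if x == y else identities[frozenset((x, y))]


print(nonredundant(["a", "b", "c"], example_identity))
print(is_singleton(["a", "b"], example_identity))
```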

For each cluster, binding site residues are illustrated in the summary graphic and further details are provided in the corresponding row of the table. Clusters that contain an interaction observed in the query structure are marked by the letter "o" at the beginning of the row. Expanding the cluster row shows additional information about its members.

What do the columns in the interaction summary tables mean?

Each row in the table corresponds to a binding site cluster. Each column in the table is defined as follows.

Interaction partner - the name of a representative interaction partner which interacts with the actual query or with its homolog using the same binding site. For protein-protein interactions, the CDD⁽⁴⁾ domain name of the partner is listed. If there are no domain assignments on the interacting chain, "No domain assigned" is displayed. For protein-chemical interactions, this column reports the name of the small molecule bound to the representative member of the cluster. For protein-nucleic acid and protein-peptide interactions, the column reports the first 20 nucleotides/residues of the interaction partner of the representative cluster member.

Ranking score - a score that ranks binding site clusters by their biological relevance and similarity to the query. Its components include the sequence-PSSM score; the average sequence identity between the query and cluster members, calculated over the whole structure-structure alignment; the number of interfacial contacts; and the fraction of conserved columns (conservation calculated as entropy) in the binding site alignment. All components of the ranking score are normalized (Z-score) and all clusters are ranked with respect to this Z-score. The ranking score is not defined for "singleton" clusters. Clusters with CDD annotations are displayed at the top of the table.

Number of cluster members - the number of homologs in the cluster.

Average percent identity to query - the average sequence identity between the query and cluster members, calculated over the whole structure-structure alignment.

Number of binding site residues - the number of residues in the union of binding sites mapped from all members of the cluster onto the query.

Number of chemicals (for protein-chemical interactions) - the number of different chemicals present in a given binding site cluster.

Curator annotation - binding site annotation(s) from CDD⁽⁴⁾ that overlap by more than 50% with the sites annotated by IBIS.

Taxonomic diversity - the last common ancestor of the proteins in a given cluster.
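The ranking score described in the column list above can be sketched as follows. The help text states that the components are Z-score normalized and that clusters are ranked by the result, but it does not say how the components are combined; summing the per-component Z-scores with equal weight is an assumption made here, as are the dictionary keys and the function name rank_clusters.

```python
from statistics import mean, pstdev

# Component names are illustrative labels for the four measures in the text.
COMPONENTS = ("pssm_score", "avg_identity", "n_contacts", "conserved_fraction")


def rank_clusters(clusters):
    """Return [(name, score)] with CDD-annotated clusters first, singletons last.

    Score = sum of per-component Z-scores (equal weighting is an assumption).
    Singleton clusters get a score of None, since none is defined for them.
    """
    scored = [c for c in clusters if not c.get("singleton", False)]
    singletons = [c for c in clusters if c.get("singleton", False)]
    totals = {c["name"]: 0.0 for c in scored}

    if scored:
        for key in COMPONENTS:
            values = [c[key] for c in scored]
            mu, sigma = mean(values), pstdev(values)
            for c in scored:
                # A component with no spread carries no ranking information.
                if sigma > 0:
                    totals[c["name"]] += (c[key] - mu) / sigma

    ranked = sorted(
        scored,
        key=lambda c: (not c.get("cdd_annotated", False), -totals[c["name"]]),
    )
    return [(c["name"], totals[c["name"]]) for c in ranked] + [
        (c["name"], None) for c in singletons
    ]


# Hypothetical clusters for one query.
example_clusters = [
    {"name": "B", "pssm_score": 5, "avg_identity": 40, "n_contacts": 20, "conserved_fraction": 0.5},
    {"name": "S", "singleton": True},
    {"name": "A", "pssm_score": 10, "avg_identity": 60, "n_contacts": 30, "conserved_fraction": 0.8},
    {"name": "C", "pssm_score": 1, "avg_identity": 30, "n_contacts": 10, "conserved_fraction": 0.2},
]
for name, score in rank_clusters(example_clusters):
    print(name, score)
```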

What do the columns mean if the binding site cluster is expanded?

Each binding site cluster can be expanded by clicking the [+] button. Up to ten non-redundant members are displayed (non-redundancy is defined with respect to 90% sequence identity). The complete list of members is displayed by clicking the "See all members" link at the bottom of the list of cluster members. Details of the columns are as follows.
The first column lists the type of interaction partner. The other columns include:

Homologous complex - the PDB code of a complex which is homologous to the query.

Homolog - a PDB chain identifier for a protein homologous to the query. Chain identifiers of type "A_1" represent chains generated by applying the crystallographic symmetry operations. All biological assemblies are taken from PDB/MMDB.

Interaction partner - the chain identifier of a protein, peptide or DNA/RNA which interacts with "Homolog" (see previous column) in the homologous complex. If this is an observed interaction, both chains are from the query structure complex. For "chemical" and "ion" types of interactions, only one chain is listed, showing which chain in the structure homolog aligns with the query protein.

% identity to query - an average sequence identity between the query and the cluster members calculated over the whole structure-structure alignment.

Curator annotation (if available) is the same as in the summary table. Cluster members with annotations are elevated to the top of the table irrespective of their ranking score.

Binding site - an alignment of the binding site residues of cluster members, with the query shown at the top. Residues are numbered with respect to the full-length protein sequence (as used in the GenBank database). Residue conservation is calculated in terms of relative entropy, and colors depict the degree of conservation: highly conserved residues are shown in red, moderately conserved residues in blue, and non-conserved residues in black. All singletons are shown in grey.

View Binding Sites - clicking this button launches the NCBI graphical viewer Cn3D to view binding site residues. See "How do I view binding sites in protein structures?" for more information.
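As an illustration of the conservation coloring in the "Binding site" column above, the Python sketch below computes the relative entropy of one alignment column and maps it to a color. The uniform background over 20 amino acids and the two cutoff values are assumptions (the help text does not give the thresholds IBIS uses), and the function names are hypothetical. No pseudocounts are applied, so columns from very small alignments will look more conserved than they really are.

```python
import math
from collections import Counter


def column_relative_entropy(column, alphabet_size=20):
    """Relative entropy (in bits) of a column against a uniform background."""
    residues = [r for r in column if r.isalpha()]  # skip gap characters
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    q = 1.0 / alphabet_size  # uniform background: an assumption
    return sum((c / n) * math.log2((c / n) / q) for c in counts.values())


def conservation_color(relative_entropy, high=4.0, medium=3.0):
    """Map a relative entropy to a color; both cutoffs are hypothetical."""
    if relative_entropy >= high:
        return "red"    # highly conserved
    if relative_entropy >= medium:
        return "blue"   # moderately conserved
    return "black"      # non-conserved


for column in ("HHHH", "HHHD", "ACDE"):
    value = column_relative_entropy(column)
    print(column, round(value, 2), conservation_color(value))
```

A fully conserved column reaches the maximum of log2(20) bits under these assumptions, which is why the "high" cutoff is placed just below that value.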

How can I access the IBIS web page for another query structure?

Enter the PDB code or PDB code along with chain identifier (case sensitive) or a GI in the Search PDB ID/GI box on the upper right of the web page. Ex: 1xbb, 1xbbA, 1668706.

What is a singleton cluster?

"Singleton" means a cluster which has only one non-redundant member (after members with more than 90% identity are purged).

How do you "expand a cluster"?

The table contains a list of clusters. Click on the [+] button at the left of a cluster to see its features. When you are done browsing, click on the [-] button to close the cluster.

How do I view binding sites in protein structures?

Structures can be viewed with the Cn3D software (/Structure/CN3D/cn3d.shtml). First expand a binding site cluster, then use the checkboxes to select specific structures or all structures in the table. Clicking the [Show Binding Sites] button then shows the query structure superimposed with the selected structures, with the binding sites depicted with side chains. The sequence viewer window of Cn3D also highlights the binding site residues according to the sequence conservation in the aligned column.

How do you use the various "Advanced Search" features?

The search features enable you to filter the interactions based on various criteria. For example, one might be interested in the Chemical binding sites and want to know if there is a structure with a particular chemical bound to it. Importantly, the search results apply only to the IBIS data corresponding to the current query structure.

What are non-biological chemical sites?

Non-biological chemical sites are binding sites formed by non-biological small molecules (such as buffers, salts, detergents, solvents and ions added for purification and/or crystallization process). These sites are hidden by default, but can be viewed by selecting the option Non-Biological Chemical Sites:Show on the left side bar.

References:

1) Shoemaker BA, Zhang D, Thangudu RR, Tyagi M, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR: Inferred Biomolecular Interaction Server - a web server to analyze and predict protein interacting partners and binding sites. Nucleic acids research 2010, 38(Database issue):D518-524.

2) Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T: Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010, 11:365.

3) Chen J, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ et al: MMDB: Entrez's 3D-structure database. Nucleic acids research 2003, 31(1):474-477.

4) Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M et al: CDD: specific functional annotation with the Conserved Domain Database. Nucleic acids research 2009, 37(Database issue):D205-210.

5) Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic acids research 2004, 32(Web Server issue):W327-331.

6) Wang Y, Geer LY, Chappey C, Kans JA, Bryant SH: Cn3D: sequence and structure views for Entrez. Trends in biochemical sciences 2000, 25(6):300-302.

7) Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic acids research 2009, 37(Web Server issue):W623-633.
