What is IBIS?
IBIS is the Inferred Biomolecular Interactions Server (/Structure/ibis/ibis.cgi),
which was developed at NCBI to analyze and predict interactions between
proteins and other biomolecules. The unique feature of IBIS is that it
identifies protein interaction partners together with the
locations of their binding sites on the query sequence/structure. IBIS
categorizes protein binding sites into five categories: protein, small molecule,
nucleic acid, peptide and ion binding sites. For a given sequence/structure
query with unknown binding sites, IBIS first reports the physical interactions
made by this query. These so-called
observed interactions have been directly observed in experimentally determined
structures and are denoted by a red letter "o" at the beginning of the
row. The whole list of observed interactions for all PDB queries can be
downloaded from ftp://ftp.ncbi.nih.gov/pub/mmdb/ibis/.
Second, IBIS can infer binding sites by homology, by inspecting
protein complexes formed by close homologs of a given query. To ensure
biological relevance of inferred binding sites, IBIS clusters them based on
their sequence and structure
conservation. Only those binding sites that are evolutionarily conserved
among non-redundant homologous proteins are considered in the prediction.
Additionally, binding site clusters are verified by comparing them with the curated
binding site annotations from the Conserved Domain Database (CDD(4)), if
present, and only biological
assemblies from the PDB/MMDB databases are used. After binding sites are
clustered, position specific scoring matrices (PSSMs) are constructed based
on binding site alignments. PSSMs are subsequently used (together with
other measures) to rank binding sites with respect to their closeness to the
query and their biological relevance.
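The PSSM step described above can be illustrated with a minimal sketch. It assumes a uniform amino acid background, a simple additive pseudocount, and gap-free scoring; none of these choices is stated in this document, and the function names build_pssm and score_site are hypothetical, not part of IBIS.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sites, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length aligned binding-site strings.

    Assumptions (illustrative only): uniform background of 1/20 per residue,
    additive pseudocounts, and gap characters ('-') are ignored.
    """
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all aligned binding sites must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_sites if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_site(pssm, site):
    """Sum the per-position log-odds scores of a candidate binding site."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, site))
```

For example, a PSSM built from the aligned sites "HDS", "HDT" and "HES" scores the query site "HDS" higher than an unrelated site such as "AAA".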
How do I search IBIS?
IBIS can be searched by a four-character PDB code (Ex: 1XBB).
If known, a single-letter chain identifier can also be supplied along with the PDB code
(Ex: 1XBBA).
IBIS can also be searched using the GI (GenBank identifier) of any
protein sequence (Ex: 1668706) or a protein sequence accession (Ex: O36006).
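If you prepare queries in a script, the accepted query forms can be told apart with a small helper. The sketch below is only an illustration of the formats listed above: the function classify_query and its rules are assumptions, not the logic IBIS itself uses. In particular, it treats any all-digit string as a GI and anything unrecognized as a sequence accession.

```python
import re


def classify_query(query):
    """Guess which kind of IBIS query a string represents (hypothetical helper).

    Returns a tuple whose first element is one of
    "gi", "pdb", "pdb_chain" or "accession".
    """
    q = query.strip()
    # Assumption: an all-digit string is a GI, even if it has four characters.
    if re.fullmatch(r"[0-9]+", q):
        return ("gi", q)
    # PDB code: a digit followed by three alphanumeric characters.
    if re.fullmatch(r"[0-9][A-Za-z0-9]{3}", q):
        return ("pdb", q.upper())
    # PDB code plus a one-character chain identifier; chain case is preserved.
    if re.fullmatch(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9]", q):
        return ("pdb_chain", q[:4].upper(), q[4])
    # Fallback assumption: everything else is a protein sequence accession.
    return ("accession", q)
```

The chain identifier is left untouched because it is case sensitive, while the PDB code itself is normalized to upper case.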
How does IBIS find inferred interactions for protein
sequences without a known structure?
Sequences with no known structure can be submitted using their NCBI GenBank identifier (GI)
or protein sequence accession.
In the current version, for a given protein sequence query, a BLAST search is
performed against the sequences of all structures in PDB/MMDB(3) to find the closest
homolog. The interactions of the closest homolog are then displayed.
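As a rough illustration of the "closest homolog" step, the sketch below picks one hit from BLAST tabular output (the standard 12-column format: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). This document does not say how IBIS defines "closest"; choosing the highest bit score is an assumption, and closest_homolog is a hypothetical helper.

```python
def closest_homolog(blast_tabular):
    """Return the hit with the highest bit score from BLAST tabular text.

    Assumption: "closest" means highest bit score; IBIS may use another
    criterion. Returns None when there are no hits.
    """
    best = None
    for line in blast_tabular.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = {
            "subject": fields[1],
            "pident": float(fields[2]),
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        }
        if best is None or hit["bitscore"] > best["bitscore"]:
            best = hit
    return best
```

A query with no hits returns None, which corresponds to the case where IBIS reports no interactions.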
Why are there no results reported for my query?
If the query is a PDB identifier and does not have any homologs with more
than 25% sequence identity over the structurally superimposed region, IBIS will
not report any interactions.
If the query is a protein sequence that does not have any close homologs, IBIS will not report any interactions.
How are different types of interactions defined and how
do I navigate between them?
Five different types of interactions/binding sites are currently presented in IBIS: protein-protein, protein-chemical, protein-nucleic acid, protein-peptide and protein-ion. The tabs "Protein", "Chemical", "DNA/RNA", "Peptide" and "Ion" in the top left corner show the interactions/binding sites of the respective type. The user may navigate between different types of interactions by clicking on these tabs.
Protein-chemical, protein-nucleic acid, protein-peptide and protein-ion interactions are based on the full chain sequence of the query.
Protein-protein interactions are annotated and inferred for each domain of the
query. Domains are mapped onto the query using the CDD(4) database and the CD-Search method(5). If no domains
are mapped to the query, the full chain sequence of the query is used. Different query domains can be selected by clicking on the
corresponding domain bubble below the query in the graphic
at the top of the page.
How is the IBIS web page organized?
Each web page pertains to a single query sequence/structure or protein domain.
There are three main parts of the web page display.
A summary graphic is at the top of the page. The query sequence is represented
by a "ruler". Bubbles below the sequence show the footprints of any conserved
domains detected for the query. Underneath is a list of binding site clusters
(see below), with locations of binding site residues indicated by triangles.
Each cluster is labeled by the name of the interacting partner. Clicking on the
partner name expands the corresponding row in the detailed table.
There is a table listing the details of each binding site cluster for the selected type of interaction (Protein, Chemical, DNA/RNA, Peptide, and Ion). Click on the [+] button at the left side of the cluster to see the cluster features. To close the cluster click on the [-] button.
On the left, there are Search tools for exploring the interaction data of a
given query protein.
What is a "binding site cluster"?
A cluster consists of a collection of structures that are
related to the query. All members of the cluster should contain similar
binding sites. Homology is inferred by comparing the query with similar
structures determined using the VAST algorithm for structure-structure superposition.
Currently, a 25% identity threshold is used for all interactions except homooligomer interactions (protein-protein interactions between two domains belonging to the same CDD family).
For homooligomers, a more stringent threshold of 50% identity between the query and all cluster members is used. On the main page IBIS displays only those
clusters that contain evolutionarily-conserved binding sites among non-redundant
homologous proteins. Similarity between binding sites is measured in terms of
sequence similarity and those positions which structurally overlap are assigned
a higher weight. A "singleton" cluster has only one non-redundant member (after
members with more than 90% identity are purged) and singleton clusters are displayed at the bottom of the list.
For each cluster, binding site residues are illustrated in
the summary graphic and further details are provided in the corresponding row of
the table. Clusters that contain an interaction observed in the query structure are marked by the letter
"o" at the beginning of the row. By expanding the cluster row, one can see
additional information about its members.
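The identity thresholds and the singleton rule above can be sketched as follows. The greedy purge order and the helper names (identity_threshold, nonredundant_members, is_singleton) are assumptions made for illustration; this document states only the 25%, 50% and 90% cutoffs, not the procedure IBIS uses.

```python
def identity_threshold(is_homooligomer):
    """Percent identity required between the query and cluster members."""
    return 50.0 if is_homooligomer else 25.0


def nonredundant_members(members, pairwise_identity, cutoff=90.0):
    """Greedily drop members more than `cutoff` percent identical to a kept one.

    `pairwise_identity(a, b)` is a caller-supplied function returning percent
    identity. The greedy, order-dependent purge is an assumption.
    """
    kept = []
    for member in members:
        if all(pairwise_identity(member, k) <= cutoff for k in kept):
            kept.append(member)
    return kept


def is_singleton(members, pairwise_identity):
    """A cluster is a singleton if only one non-redundant member remains."""
    return len(nonredundant_members(members, pairwise_identity)) == 1
```

With three members where the first two are 95% identical to each other, the purge keeps the first and third, so the cluster is not a singleton.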
What do the columns in the interaction summary tables
mean?
Each row in the table corresponds to a binding site cluster. Each column in
the table is defined as follows.
Interaction partner - the name of a representative interaction partner that
interacts with the query itself or with its homolog using the same binding site. For protein-protein
interactions, the CDD(4) domain name of the partner is listed. If
there are no domain assignments on the interacting chain, "No domain assigned"
is displayed. For protein-chemical interactions, this column reports the name of
the small molecule bound to the representative member of the cluster. For
protein-nucleic acid and protein-peptide interactions, the column reports the
first 20 nucleotides/residues of the interaction partner of the representative
cluster member.
Ranking score - score which ranks binding site
clusters in terms of their biological relevance and similarity to the query.
Components of the ranking score include the sequence PSSM score; the average
sequence identity between the query and cluster members, calculated over the
whole structure-structure alignment; the number of interfacial contacts; and
the fraction of conserved columns (calculated from entropy) in the binding site
alignment. All components of the ranking score are then normalized
(Z-score) and all clusters are ranked with respect to this Z-score. The ranking score is
not defined for "singleton" clusters. Clusters with CDD annotations are
displayed at the top of the table.
Number of cluster members - the number of homologs in the cluster.
Average percent identity to query - the average sequence identity between the query and cluster members, calculated over the whole structure-structure alignment.
Number of binding site residues - the number of residues in the union of binding sites mapped from all members of the cluster onto the query.
Number of chemicals (for protein-chemical interactions)
- the number of different chemicals present in a given binding site cluster.
Curator annotation - binding site annotation(s) from CDD(4)
that overlap by more than 50% with the sites annotated by IBIS.
Taxonomic diversity - the last common ancestor of
the proteins in a given cluster.
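To make the "Ranking score" column more concrete, here is a minimal sketch of Z-score normalization across clusters. How IBIS weights and combines the normalized components is not described here; summing the per-component Z-scores with equal weight is an assumption, and the field names (pssm_score, avg_identity, n_contacts, conserved_fraction) are hypothetical.

```python
from statistics import mean, pstdev

COMPONENTS = ("pssm_score", "avg_identity", "n_contacts", "conserved_fraction")


def z_scores(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all clusters tie on this component
    return [(v - mu) / sigma for v in values]


def rank_clusters(clusters, components=COMPONENTS):
    """Rank clusters by the sum of their per-component Z-scores (assumption)."""
    totals = {c["name"]: 0.0 for c in clusters}
    for comp in components:
        normalized = z_scores([c[comp] for c in clusters])
        for cluster, z in zip(clusters, normalized):
            totals[cluster["name"]] += z
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Singleton clusters would be left out of the input, since the ranking score is not defined for them.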
What do the columns mean if the binding site cluster is
expanded?
Each binding site cluster can be expanded by clicking the [+] button. Up to ten non-redundant members are displayed (non-redundancy is defined with respect to 90% sequence identity). The complete list of members is displayed by clicking the "See all members" link at the bottom of the list of cluster members. Details of the columns are as follows.
The first column lists the type of interaction partner. Other columns include:
Homologous complex - the PDB code of the
complex that is homologous to the given query.
Homolog - the PDB chain identifier of a protein
homologous to the query. Chain identifiers of type "A_1" represent chains
generated by applying crystallographic symmetry operations. All biological
assemblies are taken from PDB/MMDB.
Interaction partner - the chain identifier of the
protein, peptide or DNA/RNA that interacts with the "Homolog" (see previous column)
in the homologous complex. If this is an observed interaction, both chains are
from the query structure complex. For "chemical" and "ion" types of interactions, only one chain is listed, showing which chain in the structure homolog aligns with the query protein.
% identity to query - an average sequence identity between the query and the cluster members calculated over the whole structure-structure alignment.
Curator annotation - (if available) the same as in the summary table. Cluster members with annotations are moved to the top of the table irrespective of their ranking score.
Binding site - an alignment of the binding site
residues of cluster members, with the
query shown at the top. Residues are numbered with respect to the full-length
protein sequence (as used in the GenBank database). Residue conservation is calculated in terms of relative entropy, and
different colors depict the degree of conservation: red indicates highly conserved residues, blue indicates medium conservation, and non-conserved residues are shown in black. All singletons are shown in grey.
View Binding Sites - clicking this button launches
the NCBI graphical viewer Cn3D to view binding site residues. See "How do I view binding sites in protein structures?" for more information.
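The relative-entropy coloring in the "Binding site" column can be sketched as below. The uniform background frequency and the two cutoffs that separate red, blue and black are assumptions chosen for illustration; this document does not give the background distribution or the actual cutoffs used by IBIS.

```python
import math
from collections import Counter


def relative_entropy(column, background=0.05):
    """Relative entropy (in bits) of one alignment column.

    Assumption: a uniform background frequency of 1/20 for every amino acid.
    Gap characters ('-') are ignored.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return sum((c / n) * math.log2((c / n) / background) for c in counts.values())


def conservation_color(score, high=4.0, medium=2.5):
    """Map a relative-entropy score to a display color (hypothetical cutoffs)."""
    if score >= high:
        return "red"    # highly conserved
    if score >= medium:
        return "blue"   # medium conservation
    return "black"      # non-conserved
```

Under these assumptions, a fully conserved column reaches the maximum of log2(20) bits and is shown in red, while a column with four different residues falls below the medium cutoff.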
How can I access the IBIS web page for another query structure?
Enter a PDB code, a PDB code followed by a chain identifier (case sensitive), or a GI in the Search PDB ID/GI box at the upper right of the web page. Ex: 1xbb, 1xbbA, 1668706.
What is a singleton cluster?
"Singleton" means a cluster which has only one non-redundant member (after members with more than 90% identity are purged).
How do you "expand a cluster"?
The table contains a list of clusters. Click on the [+] button at
the left to see the cluster features. When you are
done browsing, to close the cluster click on the [-] button.
How do I view binding sites in protein
structures?
Structures can be viewed with the Cn3D software (/Structure/CN3D/cn3d.shtml). First, expand a binding site cluster and select specific structures, or all structures, in the table using the checkboxes. Then click the [Show Binding Sites] button to display the query structure superimposed with the selected structures, with the binding sites depicted with side chains. The sequence viewer window of Cn3D also highlights the binding site residues based on sequence conservation in the aligned column.
How do you use the various "Advanced Search" features?
The search features enable you to filter the interactions based on various
criteria. For example, one might be interested in the Chemical binding sites and
want to know if there is a structure with a particular chemical bound to it.
Importantly, the search results apply only to the IBIS data corresponding to the
current query structure.
What are non-biological chemical sites?
Non-biological chemical sites are binding sites formed by non-biological
small molecules (such as buffers, salts, detergents, solvents and ions added
for purification and/or crystallization process). These sites are hidden
by default, but can be viewed by selecting the option Non-Biological
Chemical Sites:Show on the
left side bar.
References:
1) Shoemaker BA, Zhang D, Thangudu RR, Tyagi M, Fong JH, Marchler-Bauer A, Bryant SH, Madej T, Panchenko AR: Inferred Biomolecular Interaction Server--a web server to analyze and predict protein interacting partners and binding sites. Nucleic Acids Res 2010, 38(Database issue):D518-524.
2) Thangudu RR, Tyagi M, Shoemaker BA, Bryant SH, Panchenko AR, Madej T: Knowledge-based annotation of small molecule binding sites in proteins. BMC Bioinformatics 2010, 11:365.
3) Chen J, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ et al: MMDB: Entrez's 3D-structure database. Nucleic Acids Res 2003, 31(1):474-477.
4) Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M et al: CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Res 2009, 37(Database issue):D205-210.
5) Marchler-Bauer A, Bryant SH: CD-Search: protein domain annotations on the fly. Nucleic Acids Res 2004, 32(Web Server issue):W327-331.
6) Wang Y, Geer LY, Chappey C, Kans JA, Bryant SH: Cn3D: sequence and structure views for Entrez. Trends Biochem Sci 2000, 25(6):300-302.
7) Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009, 37(Web Server issue):W623-633.