CDART Overview

About CDART

The Conserved Domain Architecture Retrieval Tool (CDART) finds protein similarities across significant evolutionary distances using sensitive domain profiles rather than direct sequence similarity. A domain architecture is defined as the sequential order of conserved domains (functional units) in a protein sequence.
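The definition above can be sketched in a few lines of Python. This is only an illustration, not CDART's internal representation: the `DomainHit` type, the `architecture` function, and the domain labels and coordinates are all hypothetical. It shows that an architecture is the list of domain labels read in order along the sequence.

```python
from typing import Iterable, NamedTuple, Tuple


class DomainHit(NamedTuple):
    """One conserved-domain hit on a protein (hypothetical structure)."""
    domain: str  # illustrative domain label, not a real accession
    start: int   # first residue covered by the hit
    end: int     # last residue covered by the hit


def architecture(hits: Iterable[DomainHit]) -> Tuple[str, ...]:
    """Return the domain labels ordered by start position along the sequence."""
    return tuple(h.domain for h in sorted(hits, key=lambda h: h.start))


# Made-up hits, deliberately listed out of order.
hits = [
    DomainHit("domC", 300, 420),
    DomainHit("domA", 10, 120),
    DomainHit("domB", 150, 280),
]
print(architecture(hits))
```

Sorting by start position is the simplest way to recover sequential order. Overlapping or nested hits would need extra handling that this sketch leaves out.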

Given a protein query sequence, CDART shows the conserved domains that make up the protein, as identified by RPS-BLAST, and then lists proteins with a similar conserved domain architecture, as shown in the illustration below. Relying on domain profiles makes CDART fast, and because those profiles correspond to annotated functional domains, its results are also informative.
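The second step, listing proteins with a similar architecture, might look like the sketch below. This page does not say how CDART scores similarity, so the measure here (the number of distinct domains shared with the query) is an assumed stand-in. The function names, protein identifiers, and domain labels are hypothetical.

```python
from typing import Dict, List, Tuple

Architecture = Tuple[str, ...]


def shared_domains(query: Architecture, candidate: Architecture) -> int:
    """Count the distinct domain labels present in both architectures."""
    return len(set(query) & set(candidate))


def rank_by_architecture(
    query: Architecture, database: Dict[str, Architecture]
) -> List[Tuple[str, int]]:
    """Return (protein_id, shared_count) pairs, most shared domains first.

    Proteins sharing no domain with the query are dropped; ties are broken
    by protein identifier so the ordering is deterministic.
    """
    scored = ((pid, shared_domains(query, arch)) for pid, arch in database.items())
    hits = [(pid, n) for pid, n in scored if n > 0]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Made-up architectures keyed by made-up protein identifiers.
query = ("domA", "domB", "domC")
database = {
    "protein_1": ("domA", "domB", "domC"),
    "protein_2": ("domB", "domC"),
    "protein_3": ("domX",),
    "protein_4": ("domC", "domY"),
}
ranked = rank_by_architecture(query, database)
print(ranked)
```

Because only precomputed domain labels are compared, no sequence alignment runs at query time. This illustrates the speed argument made above.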

A query can be submitted as (a) a protein sequence (either a sequence identifier or sequence data), (b) a set of conserved domains (as superfamily cluster IDs, conserved domain accession numbers, or PSSM IDs), or (c) multiple queries. Alternatively, you can retrieve a protein sequence record from the Entrez Protein database and follow the link for "Related information: Domain Relatives." The help document provides a quick start guide, as well as details about the required input, the output display, and the program's features and functions.

Illustration: CDART search results, which list proteins that have domain architectures similar to that of your query protein sequence (NP_081086, mouse DNA mismatch repair protein, in this example).

(A related tool, SPARCLE, the Subfamily Protein Architecture Labeling Engine, is a resource for the functional characterization and labeling of protein sequences that have been grouped by their characteristic conserved domain architecture.)

Revised 29 November 2016