DOOR (Database of prOkaryotic OpeRons) (version 1.0) (version 2.0)


DOOR (Database of prOkaryotic OpeRons) is an operon database developed by the Computational Systems Biology Lab (CSBL) at the University of Georgia. The operons in the database are computationally predicted. The key facts are listed below:

  1. The prediction algorithm
    The algorithm is a data-mining classifier. Its features include intergenic distance, neighborhood conservation, phylogenetic distance, information from short DNA motifs, the similarity score between the GO terms of a gene pair, and the length ratio between the two genes of a pair. The classifier is a decision tree trained on data from E. coli and B. subtilis. Please read the paper below for details.

    P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu. Operon prediction using both genome-specific and general genomic information. Nucleic Acids Res., 35(1):288-98, 2007.

  2. The data quality
    According to the algorithm paper, the accuracy reaches 90.2% and 93.7% on the B. subtilis and E. coli genomes, respectively.
    According to an independent evaluation (see the reference below) published in Briefings in Bioinformatics by Brouwer RW et al., this algorithm is consistently the best in all aspects, including sensitivity and specificity for both true positives and true negatives, and its overall accuracy reaches ~90%.

    Brouwer, R.W., Kuipers, O.P. and van Hijum, S.A. (2008) The relative value of operon predictions. Briefings in Bioinformatics, 9, 367-375.

  3. The database size
    Currently DOOR has operons for 675 prokaryotic genomes.
    Some other research groups also provide operons (predicted or collected from the literature), such as OperonDB from Steven L. Salzberg's group at the University of Maryland, the predicted operons in MicrobesOnline at VIMSS (Virtual Institute of Microbial Stress and Survival), ODB at Kyoto University in Japan, DBTBS in Japan, and RegulonDB in Mexico.
    At the time this database was developed, MicrobesOnline provided operons for 620 genomes, OperonDB for 550 genomes, and ODB for 203 genomes; RegulonDB essentially covers E. coli only, and DBTBS covers B. subtilis only. All operons in OperonDB and MicrobesOnline are predicted, and most operons in ODB are also predicted. RegulonDB and DBTBS operons are based on experiments and literature. We will keep updating the database as new prokaryotic genomes become available.

  4. Experiment/literature information
    Although most operons in DOOR have not been verified experimentally, we also provide some limited literature information, which is extracted from ODB. We believe the operon data in DOOR will be useful for analyses such as studies of operon evolution and operon transfer.
    We want to emphasize that users looking for strictly experimentally verified operons should consult DBTBS and RegulonDB first.

  5. OperonWiki
    To facilitate interaction between users and the database, we provide an operon wiki that users can edit to contribute their own information.

  6. RNA genes
    We provide operons that include RNA genes, which are rarely found in other operon databases, especially databases of predicted operons.

  7. Query capability
    We provide powerful query capabilities so that users can easily find the information they want. Please see sections 3, 4, and 6 for a detailed description.

  8. Operon similarity
    We define similarity scores between operons based on weighted maximum matching between operons. Groups of similar operons can be used to predict orthologous genes accurately, and their upstream regions can be used to find consensus binding motifs.

  9. Integrated motif finding programs
    We integrated two motif-finding programs into the database: the popular MEME and our in-house program CUBIC. We integrated MEME because it is widely used; we also integrated CUBIC because, in our experience, it outperforms MEME in many aspects.

  10. Operon selection
    A convenient operon selection function makes it easy to feed the operons you are interested in to MEME or CUBIC.
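
As an illustration of the operon similarity idea in item 8, the following is a minimal sketch of scoring two operons by weighted maximum matching. It is not the DOOR implementation: the function names, the dictionary of gene-to-gene weights, and the use of the raw matching total as the score (with no normalization) are all assumptions made for this example. It also assumes non-negative weights, so that pairing a gene at weight 0 is equivalent to leaving it unmatched.

```python
from itertools import permutations


def pair_weight(weights, g, h):
    """Look up a gene-to-gene weight in either key order; unknown pairs score 0.

    `weights` is a hypothetical dict {(gene_x, gene_y): non-negative float}.
    """
    return weights.get((g, h), weights.get((h, g), 0.0))


def operon_similarity(operon_a, operon_b, weights):
    """Return the best total weight over one-to-one gene matchings.

    Brute force over all assignments, so only suitable for small operons;
    a real system would use a polynomial-time assignment algorithm.
    """
    # Put the shorter operon first so each of its genes gets a distinct partner.
    small, large = sorted((operon_a, operon_b), key=len)
    best = 0.0
    for partners in permutations(large, len(small)):
        total = sum(pair_weight(weights, g, h) for g, h in zip(small, partners))
        best = max(best, total)
    return best


# Hypothetical genes and weights, for illustration only.
example_weights = {
    ("a1", "b1"): 0.75,
    ("a2", "b1"): 0.5,
    ("a2", "b2"): 0.5,
    ("a3", "b2"): 0.25,
}
score = operon_similarity(["a1", "a2", "a3"], ["b1", "b2"], example_weights)
print(score)  # 1.25 (a1-b1 plus a2-b2)
```

In this toy case the best matching pairs a1 with b1 and a2 with b2; the greedy alternative of pairing a2 with b1 would leave only the weaker a3-b2 pair for b2.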

 

 
The distribution of known operonic pairs and boundary pairs according to prediction score. The x-axis indicates the prediction score while the y-axis indicates the percentage of operonic or boundary pairs among all gene pairs with the same scores. Data from E. coli are shown as dashed lines while data from B. subtilis are shown as solid lines. Data from operonic pairs are shown in pink while data from boundary pairs are shown in blue.
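
The quantity plotted in the figure can be sketched as follows: for each prediction score, the percentage of known gene pairs that are operonic versus boundary pairs. This is only an illustration under stated assumptions: the input format (a list of score and label tuples), the rounding of scores to one decimal place to group "same" scores, and the function name are hypothetical, and the data below are made up.

```python
from collections import Counter


def percent_by_score(labeled_pairs, ndigits=1):
    """Map each (rounded) score to (% operonic, % boundary) among known pairs.

    `labeled_pairs` is an iterable of (score, is_operonic) tuples, where
    is_operonic is True for a known operonic pair and False for a boundary pair.
    """
    totals = Counter()
    operonic = Counter()
    for score, is_operonic in labeled_pairs:
        key = round(score, ndigits)  # assumed grouping of equal scores
        totals[key] += 1
        if is_operonic:
            operonic[key] += 1
    result = {}
    for key in sorted(totals):
        pct_operonic = 100.0 * operonic[key] / totals[key]
        result[key] = (pct_operonic, 100.0 - pct_operonic)
    return result


# Made-up labeled gene pairs, for illustration only.
pairs = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.1, False), (0.1, False), (0.1, True), (0.1, False),
]
distribution = percent_by_score(pairs)
print(distribution)  # {0.1: (25.0, 75.0), 0.9: (75.0, 25.0)}
```

Running this separately on E. coli and B. subtilis pairs would give the two pairs of curves that the caption describes.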

Statistics:
The operon database covers 675 complete archaeal and bacterial genomes, including both chromosomal and plasmid gene pairs. The figure on the right shows the distribution of the gene pair scores.

References:
Please cite the following papers if you use results from the operon database:
F. Mao, P. Dam, J. Chou, V. Olman, Y. Xu. DOOR: a Database of prOkaryotic OpeRons. Nucleic Acids Res., 37:D459-D463, 2009.
P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu. Operon prediction using both genome-specific and general genomic information. Nucleic Acids Res., 35(1):288-98, 2007.

Contact information:
Questions about the algorithm: (phd@csbl.bmb.uga.edu)
Questions about the database: (fenglou@csbl.bmb.uga.edu)
Suggestions and other inquiries: (fenglou@csbl.bmb.uga.edu)