(Database of prOkaryotic
OpeRons) (version 1.0) (version 2.0)
DOOR (Database of prOkaryotic OpeRons) is
an operon database developed by the Computational Systems Biology Lab (CSBL) at
the University of Georgia. The operons in the database are computationally predicted.
The key facts are listed below:
- The prediction algorithm
The algorithm is a data-mining classifier. Its features include intergenic
distance, neighborhood conservation, phylogenetic distance, information from
short DNA motifs, the similarity score between GO terms of gene pairs, and the
length ratio between a pair of genes. The classifier is a decision tree
trained on data from E. coli and B. subtilis.
Please read the paper below for details.
P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu. Operon
prediction using both genome-specific and general genomic information.
Nucleic Acids Res., 35(1):288-98, 2007.
- The data quality
According to the algorithm paper, the accuracy reaches 90.2%
and 93.7% on the B. subtilis and E. coli genomes, respectively.
An independent evaluation published in Briefings in Bioinformatics by
Brouwer RW et al. (see reference below) found this algorithm to be consistently
the best in all aspects, including sensitivity and specificity for both
true positives and true negatives, with an overall accuracy of ~90%.
Brouwer, R.W., Kuipers, O.P. and van Hijum, S.A. (2008) The
relative value of operon predictions. Briefings in bioinformatics,
- The database size
Currently DOOR has operons for 675 prokaryotic genomes.
Other research groups also provide operons (predicted or collected from the
literature), such as OperonDB from Steven L. Salzberg's group at the
University of Maryland, the predicted operons in MicrobesOnline at VIMSS (Virtual
Institute of Microbial Stress and Survival), ODB at Kyoto University in Japan,
DBTBS in Japan, and RegulonDB in Mexico.
At the time this database was developed, MicrobesOnline
provided operons for 620 genomes, OperonDB for 550 genomes, and ODB
for 203 genomes, while RegulonDB essentially covered E. coli only and DBTBS
covered B. subtilis only. All operons in OperonDB and MicrobesOnline
are predicted, and most operons in ODB are also predicted. RegulonDB and DBTBS
operons are based on experiments and the literature. We will keep updating
the database as new prokaryotic genomes become available.
- Experiment/literature information
Although most operons in DOOR are not experimentally verified, we also
provide some limited literature information extracted from ODB. We believe
the operon data in DOOR will be useful for analyses such as operon evolution
and operon transfer. We want to emphasize that users looking for strictly
experimentally verified operons should consult DBTBS and RegulonDB first.
To facilitate interaction between users and the database,
we provide an operon wiki that users can edit to contribute
their own information.
- RNA genes
We provide operons that include RNA genes, which are rarely seen in other
operon databases, especially those based on prediction.
- Query capability
We provide powerful query capabilities that help users find the information
they need. Please see sections 3, 4, and 6 for a detailed description.
- Operon similarity
We define similarity scores between operons based on weighted
maximum matching. Groups of similar operons can be used to predict
orthologous genes accurately, and their upstream regions can be used to find
consensus binding motifs.
- Integrated motif finding programs
We integrated two motif-finding programs into the database: the popular MEME
and our in-house program CUBIC. We integrated MEME because it is widely used,
and CUBIC because, in our experience, it outperforms MEME in many aspects.
- Operon selection
A convenient operon selection function makes it easy to feed operons of
interest to MEME or CUBIC.
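The weighted maximum matching mentioned under "Operon similarity" can be illustrated with a small sketch. This is not DOOR's implementation. It assumes that the matching pairs each gene of one operon with at most one gene of the other, that a gene-to-gene similarity in [0, 1] is available (the `lookup` function and the `SIM` table below are made up), and that the total weight is normalized by the size of the larger operon. The page does not state any of these details.

```python
from itertools import permutations


def operon_similarity(operon_a, operon_b, gene_sim):
    """Score two operons by the best one-to-one pairing of their genes.

    operon_a, operon_b: lists of gene identifiers.
    gene_sim: function (gene_a, gene_b) -> float in [0, 1] (hypothetical).
    Returns (score, pairs): the best total weight divided by the size of
    the larger operon (an assumed normalization) and the matched pairs.
    """
    # Make `small` the shorter operon so each of its genes gets a partner.
    swapped = len(operon_a) > len(operon_b)
    small, large = (operon_b, operon_a) if swapped else (operon_a, operon_b)
    if not small:
        return 0.0, []

    best_weight, best_pairs = -1.0, []
    # Brute force over every one-to-one assignment of `small` into `large`.
    # Adequate for operons of a few genes; larger inputs would call for a
    # Hungarian-algorithm solver instead.
    for chosen in permutations(large, len(small)):
        # Keep each pair in (gene from operon_a, gene from operon_b) order.
        pairs = [(l, s) if swapped else (s, l) for s, l in zip(small, chosen)]
        weight = sum(gene_sim(a, b) for a, b in pairs)
        if weight > best_weight:
            best_weight, best_pairs = weight, pairs
    return best_weight / len(large), best_pairs


# Made-up gene similarities, for illustration only.
SIM = {("a1", "b2"): 0.9, ("a2", "b1"): 0.8, ("a1", "b1"): 0.5, ("a2", "b3"): 0.4}


def lookup(gene_a, gene_b):
    return SIM.get((gene_a, gene_b), 0.0)


score, pairs = operon_similarity(["a1", "a2"], ["b1", "b2", "b3"], lookup)
print(pairs, round(score, 3))
```

In this toy case the best matching pairs a1 with b2 and a2 with b1, and the gene pairs it returns are the kind of candidates that could then be examined as putative orthologs.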
The distribution of known operonic pairs
and boundary pairs according to prediction score. The x-axis indicates
the prediction score while the y-axis indicates the percentage
of operonic or boundary pairs among all gene pairs with the same
scores. Data from E. coli are shown as dashed lines while
data from B. subtilis are shown as solid lines. Data
from operonic pairs are shown in pink while data from boundary
pairs are shown in blue.
The operon database covers 675 complete archaeal and bacterial genomes, including
both chromosomal and plasmid gene pairs. The figure on the right shows the
distribution of the gene pair scores.
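As a rough illustration of how the percentages in such a figure could be computed, the sketch below groups labelled gene pairs by prediction score and reports the share of operonic and boundary pairs in each group. The scores, the labels, and the rounding to one decimal place are assumptions for the example. The page does not say how scores were grouped or what range they take.

```python
from collections import Counter


def pair_percentages(labelled_pairs, decimals=1):
    """Return {score: (pct_operonic, pct_boundary)} for labelled gene pairs.

    labelled_pairs: iterable of (score, label), where label is
                    "operonic" or "boundary".
    decimals: scores are rounded to this many places before grouping
              (an assumed grouping, not taken from the source).
    """
    operonic, total = Counter(), Counter()
    for score, label in labelled_pairs:
        key = round(score, decimals)
        total[key] += 1
        if label == "operonic":
            operonic[key] += 1

    result = {}
    for key in sorted(total):
        pct_operonic = 100.0 * operonic[key] / total[key]
        # Every pair is either operonic or boundary, so the rest is boundary.
        result[key] = (pct_operonic, 100.0 - pct_operonic)
    return result


# Made-up labelled pairs, for illustration only.
example_pairs = [
    (0.9, "operonic"), (0.9, "operonic"), (0.9, "boundary"),
    (0.1, "boundary"), (0.1, "boundary"), (0.1, "operonic"), (0.1, "boundary"),
]
distribution = pair_percentages(example_pairs)
for score_value, (pct_op, pct_bd) in distribution.items():
    print(f"score {score_value}: {pct_op:.1f}% operonic, {pct_bd:.1f}% boundary")
```

Plotting the two percentages against the score, once per genome, would give curves of the kind described in the caption.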
Please cite the following papers if you use results from the operon database:
F. Mao, P. Dam, J. Chou, V. Olman, Y. Xu, DOOR: a Database of prOkaryotic OpeRons. Nucl. Acids Res. 37: D459-D463, 2009
P. Dam, V. Olman, K. Harris, Z. Su, Y. Xu, Operon prediction using both genome-specific
and general genomic information. Nucl. Acids Res. 35: 288-298, 2007
Questions about the algorithm: (firstname.lastname@example.org)
Questions about the database: (email@example.com)
Suggestions and other matters: (firstname.lastname@example.org)