
 Data in the Plant-Specific Database.




Click for the list of 3848 Arabidopsis Plant-Specific Proteins.


All other organisms (2436).
Eukaryota only (1152).
Bacteria but not in Archaea or Eukaryota (576).
Bacteria and Eukaryota (249).
Cyanobacteria (316).
Multicellular eukaryotes only (163).
Archaea and Eukaryota (141).
Bacteria and Archaea but not Eukaryota (89).
Archaea only (10).




National Science Foundation U.S. Department of Energy

Below is a list of all the data compiled and integrated into the Plant-Specific Database.
For further details about this work see Gutierrez, RA, et al. (2004).



[Description]

Description field from The Institute for Genomic Research (TIGR) Arabidopsis thaliana genome.

[Comment]

Comment field from The Institute for Genomic Research (TIGR) Arabidopsis thaliana genome.

[Classification]

Classification based on the pattern of sequence similarity to proteins in other organisms. For further details see the Introduction.

[TIGR ID]

Pub locus identifier.

[GB]

GenBank accession number.

[Gene Family based on BLASTCLUST]

BLASTCLUST systematically clusters protein or DNA sequences based on pairwise matches found using the BLAST algorithm.

  • The number of genes included in a gene family depends on the criteria for inclusion. For this "Plant Specific" database, we have arbitrarily defined a gene family as those members whose proteins are clustered by BLASTCLUST using the parameters L (length) = 0.6 and S (similarity) = 0.8. The list below indicates other Arabidopsis proteins that are clustered with protein AtXgXXXXX using these parameters.
  • If you would like to explore the gene family using less or more stringent clustering parameters, click on the "Gene Family" link below. This will return a graphical display of how L and S parameters influence the number of members of the family and will provide links to clusters that are formed with other parameters. However, this analysis will require one minute or more.
  • For more information on BLASTCLUST: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blastclust.txt
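
The effect of the L and S thresholds on family membership can be illustrated with a minimal sketch. It assumes that clustering is single-linkage (a sequence joins a cluster if it matches at least one member), which is our understanding of BLASTCLUST but is not stated on this page. The function name, the tuple layout of `hits`, and the way coverage and score are represented are hypothetical and are not BLASTCLUST's own input or output format; see the BLASTCLUST documentation for how the program actually computes these values.

```python
def cluster_single_linkage(ids, hits, min_coverage=0.6, min_score=0.8):
    """Group sequence ids into families by single-linkage clustering.

    ids  : iterable of sequence identifiers
    hits : iterable of (id_a, id_b, coverage, score) tuples (hypothetical layout)
    A pair is linked only if both thresholds are met (L and S in the text).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent pointers to the cluster representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, coverage, score in hits:
        if coverage >= min_coverage and score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest families first, members sorted for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical pairwise matches between four proteins.
example_ids = ["A", "B", "C", "D"]
example_hits = [
    ("A", "B", 0.9, 1.2),   # passes both thresholds
    ("B", "C", 0.7, 0.85),  # passes both thresholds
    ("C", "D", 0.5, 1.5),   # coverage below 0.6, so not linked
]
families = cluster_single_linkage(example_ids, example_hits)
```

With these hypothetical hits, A, B and C end up in one family even though A and C have no direct match, and D stays on its own. Raising either threshold can only split families, which is why family size depends on the chosen parameters.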

[Expert gene family]

The classification indicated in this field is based on the work of scientists who are experts in the particular gene family. Most of this information was obtained from the Arabidopsis Information Resource (http://www.arabidopsis.org). See the Gene Family Information webpage for further details. The gene families related to Lipid Metabolism were obtained from the "Arabidopsis Lipid Gene Database". A description of this database can be found in Beisson, F. et al. (2003). The gene families related to RNA metabolism were kindly provided by Dr. Vivek Anantharaman and Dr. Eugene V. Koonin. For a description of the study of proteins involved in RNA metabolism see Anantharaman, V. et al. (2002).

[MW (TIGR)]

Protein molecular weight is based on the prediction in the TIGR genome.

[pI (TIGR)]

Protein isoelectric point is based on the prediction in the TIGR genome.

[EC analysis (KEGG)]

Enzyme Commission number as annotated by the Kyoto Encyclopedia of Genes and Genomes.

[External Links]

Links to external databases:

[Subcellular Localization (TargetP Prediction)]

Predictions were performed using the TargetP program (Emanuelsson, O. et al., 2000), available at the Center for Biological Sequence Analysis (http://www.cbs.dtu.dk/services/TargetP). TargetP looks for N-terminal sorting signals by feeding the outputs from SignalP, ChloroP and an analogous mitochondrial predictor into a neural network that makes the final choice between the different compartments. It provides a score and a reliability class (a measure of the difference between the winner and runner-up models) to evaluate the significance of the prediction. The TargetP web server size cutoff of 4000 aa precluded analysis of the complete sequence of four Arabidopsis protein-coding genes (At1g48090.1, At1g67120.1, At3g02260.1 and At5g23110.1). In these cases, only the N-terminal portion of the protein was used for the prediction.
Caution should be exercised when looking at the individual predictions. The TargetP program can yield false positives and false negatives (Emanuelsson et al., 2000). TargetP correctly discriminates between chloroplast, mitochondria, secretory pathway and other locations 85% of the time when analyzing Arabidopsis proteins (Emanuelsson et al., 2000). To facilitate correct interpretation of individual predictions, we provide the complete output of the programs.
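
To make the reliability class concrete, the following is a minimal sketch of how a class could be derived from the gap between the highest and second-highest scores. The thresholds (0.8, 0.6, 0.4, 0.2) are an assumption recalled from the TargetP documentation and are not stated on this page; the function names and the dictionary keys are hypothetical. Check the complete program output provided in the database rather than relying on this sketch.

```python
def reliability_class(scores):
    """Return a reliability class from 1 (most reliable) to 5 (least).

    scores: dict mapping a location label to its network score.
    The class depends only on the difference between the best and
    second-best score. Thresholds are an assumption (see lead-in).
    """
    ranked = sorted(scores.values(), reverse=True)
    diff = ranked[0] - ranked[1]
    for rc, threshold in enumerate((0.8, 0.6, 0.4, 0.2), start=1):
        if diff > threshold:
            return rc
    return 5


def predict_location(scores):
    """Return (winning location, reliability class)."""
    winner = max(scores, key=scores.get)
    return winner, reliability_class(scores)


# Hypothetical score sets for two proteins.
confident = predict_location({"cTP": 0.92, "mTP": 0.05, "SP": 0.02, "other": 0.04})
ambiguous = predict_location({"cTP": 0.45, "mTP": 0.40, "SP": 0.05, "other": 0.30})
```

In this sketch the first protein is assigned to the chloroplast with the most reliable class, while the second is also assigned to the chloroplast but with the least reliable class, because the mitochondrial score is almost as high.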

[Gene Expression in different plant organs]

[EST data]

EST data was prepared as described in Beisson, F. et al (2003). The composition of the synthetic cDNA libraries used in this study is available from the Arabidopsis Lipid Gene database.

[AFGC microarray data]

To analyze the expression of genes in organs, we used a highly filtered dataset prepared from the publicly available two-color microarray experiments performed by the Arabidopsis Functional Genomics Consortium (AFGC). Briefly, all microarray hybridizations comparing gene expression in organs were considered for the analysis. These include the following SMD identifiers: 7197, 7199, 7200, 7201, 7203, 7205, 21096, 21097, 21098, 21099, 2370, 2371.

Spot quality parameters were applied to each hybridization to filter out sub-optimal data points:

  1. The sum of raw channel intensities had to be >= 1000.
  2. Channel intensity values could not be saturated in more than 1 channel per hybridization.
  3. 50% or more of the pixels in the spot had to be greater than 1.5 times the background (in at least one channel per hybridization).
  4. Flag = 0.
  5. Only spots printed with DNA from good PCR reactions (SMD codes 0, 5 and 7) were included.

The lowess method by sector was then used to normalize each hybridization (Yang et al., 2002). All the organ hybridizations used in this study passed slide quality parameters:

  1. Hybridizations did not have a strong gradient in the ratios after normalization (Gutierrez et al., 2002).
  2. Data in replicate hybridizations were reproducible. Reproducibility was qualitatively assessed by scatter plots of the replicates.

EST clones that had been printed several times were averaged and a final data table was generated by calculating the median of redundant EST clones (those that represent the same gene). The mean was calculated for the replicates.
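
The spot filters and the per-gene summary described above can be sketched as follows. This is an illustration only, not the code used to build the database: every field name in the spot dictionaries is hypothetical, the values are assumed to be already normalized (the lowess step is omitted), and the use of a log ratio as the expression value is an assumption.

```python
from collections import defaultdict
from statistics import mean, median

GOOD_PCR_CODES = {0, 5, 7}  # SMD codes for good PCR reactions (from the text)


def passes_spot_filters(spot):
    """Apply the five spot quality criteria to one spot (hypothetical fields)."""
    return (
        spot["ch1_raw"] + spot["ch2_raw"] >= 1000             # (1) summed intensity
        and spot["saturated_channels"] <= 1                    # (2) saturation
        and max(spot["ch1_frac_above_bg"],                     # (3) >= 50% of pixels
                spot["ch2_frac_above_bg"]) >= 0.5              #     above 1.5x background
        and spot["flag"] == 0                                  # (4) not flagged
        and spot["pcr_code"] in GOOD_PCR_CODES                 # (5) good PCR
    )


def summarize_by_gene(spots):
    """Average repeated prints of a clone, then take the median across
    the clones that represent the same gene."""
    per_clone = defaultdict(list)
    for spot in spots:
        if passes_spot_filters(spot):
            per_clone[(spot["gene"], spot["clone"])].append(spot["log_ratio"])

    per_gene = defaultdict(list)
    for (gene, _clone), values in per_clone.items():
        per_gene[gene].append(mean(values))

    return {gene: median(values) for gene, values in per_gene.items()}


def make_spot(clone, log_ratio, **overrides):
    """Build a hypothetical spot that passes all filters unless overridden."""
    spot = {
        "gene": "At1g01010", "clone": clone, "log_ratio": log_ratio,
        "ch1_raw": 800, "ch2_raw": 700, "saturated_channels": 0,
        "ch1_frac_above_bg": 0.9, "ch2_frac_above_bg": 0.3,
        "flag": 0, "pcr_code": 0,
    }
    spot.update(overrides)
    return spot


example_spots = [
    make_spot("c1", 1.0),
    make_spot("c1", 3.0),            # second print of clone c1
    make_spot("c2", 4.0),
    make_spot("c3", 10.0, flag=1),   # flagged spot, filtered out
]
gene_table = summarize_by_gene(example_spots)
```

Here the two prints of clone c1 average to 2.0, and the median of the two remaining clones (2.0 and 4.0) gives the gene a value of 3.0. The flagged spot does not contribute. The final mean across replicate hybridizations would be applied to tables like this one, one per replicate.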

[Membrane/Soluble (TMHMM Prediction)]

Putative transmembrane helices were predicted using TMHMM (Krogh et al., 2001) through the web server available at http://www.cbs.dtu.dk/services/TMHMM/. TMHMM uses a hidden Markov model to predict transmembrane helices from the primary protein structure.
Caution should be exercised when looking at predictions for individual proteins. The TMHMM program can yield false positives and false negatives (Krogh et al., 2001). The TMHMM success rate in discriminating soluble from membrane proteins is claimed to be higher than 99% in proteins without a signal peptide (Krogh et al., 2001). To facilitate correct interpretation of individual predictions, we provide the complete output of the programs.

[MIPS Automatic Functional Classification]

The automatic functional assignments for the Arabidopsis genes represented in PLASdb were obtained from the MIPS Arabidopsis thaliana database (http://mips.gsf.de/proj/thal/db/index.html).

[Gene Ontology]

Ontology assignments were obtained from the TIGR database.

[Protein Domain Analysis]

Protein domain analysis for the Arabidopsis proteins in PLASdb was obtained from the TIGR Arabidopsis thaliana database.

[Signal-P Analysis]

Signal-P analysis for the Arabidopsis proteins in PLASdb was obtained from the TIGR Arabidopsis thaliana database.

References:

  1. Anantharaman V, Koonin EV, Aravind L (2002) Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res 30: 1427-1464
  2. Beisson F, Koo AJ, Ruuska S, Schwender J, Pollard M, Thelen JJ, Paddock T, Salas JJ, Savage L, Milcamps A, Mhaske VB, Cho Y, Ohlrogge JB (2003) Arabidopsis genes involved in acyl lipid metabolism. A 2003 census of the candidates, a study of the distribution of expressed sequence tags in organs, and a web-based database. Plant Physiol 132: 681-697
  3. Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300: 1005-1016
  4. Gutierrez RA, Ewing RM, Cherry JM, Green PJ (2002) Identification of unstable transcripts in Arabidopsis by cDNA microarray analysis: rapid decay is associated with a group of touch- and specific clock-controlled genes. Proc Natl Acad Sci U S A 99: 11513-11518
  5. Gutierrez RA, Larson MD, Wilkerson C (2004) The Plant-Specific Database: classification of Arabidopsis proteins based on their phylogenetic profile. Plant Physiol (submitted)
  6. Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 305: 567-580
  7. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30: e15