About LUCApedia

Why use LUCApedia

Thanks to the growth of genomics, proteomics, and metabolomics, it is now possible to investigate early life through a range of methods. LUCApedia was established to aggregate and unify the results of studies that describe early life through a variety of bioinformatics approaches, and to pair them with enzymological characteristics that previous studies have proposed as features of catalysts important in the early evolution of life. Users may query the webserver for individual proteins to rapidly identify evidence of deep ancestry. Advanced users may download the database as a series of flat files and use it to discover trends in early evolution or to test hypotheses about ancient life.

Underlying database framework

Datasets from studies predicting characteristics of the Last Universal Common Ancestor (LUCA) or of earlier forms of life consist of different data types: protein structures, protein domain folds, clusters of orthologous genes, enzyme functions, and so on. To use these data in concert, they must be organized into a common framework. We achieve this by mapping each dataset to UniProt IDs (also called "entry names") and then using these UniProt IDs to group predictions into EggNOG clusters. UniProt entries represent individual proteins, while EggNOG clusters represent families of proteins.
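The two-step unification described above (native identifiers to UniProt entries, then entries to clusters) can be sketched in Python. This is a minimal illustration under stated assumptions, not LUCApedia's actual pipeline: the dataset names, the native identifiers (`COG_X`, `EC_1`, ...), the entry names ending in `_TOY`, the cluster labels, and both lookup tables are hypothetical toy data. Real mappings would come from UniProt and EggNOG annotations rather than hand-written dictionaries.

```python
from collections import defaultdict

# Hypothetical toy data, NOT real LUCApedia content or file formats.
# Each dataset is a set of identifiers in its own native scheme (COGs, EC numbers, ...).
DATASETS = {
    "cog_based_dataset": {"COG_X", "COG_Y"},
    "ec_based_dataset": {"EC_1"},
}

# Hypothetical lookup: native identifier -> UniProt entry names annotated with it.
NATIVE_TO_UNIPROT = {
    "COG_X": {"PROTA_TOY", "PROTB_TOY"},
    "COG_Y": {"PROTC_TOY"},
    "EC_1": {"PROTB_TOY", "PROTD_TOY"},
}

# Hypothetical lookup: UniProt entry name -> EggNOG cluster label.
UNIPROT_TO_CLUSTER = {
    "PROTA_TOY": "CLUSTER_1",
    "PROTB_TOY": "CLUSTER_1",
    "PROTC_TOY": "CLUSTER_2",
    "PROTD_TOY": "CLUSTER_3",
}


def map_to_uniprot(datasets, native_to_uniprot):
    """Step 1: re-express every dataset as a set of UniProt entry names."""
    return {
        name: {entry for ident in idents for entry in native_to_uniprot.get(ident, ())}
        for name, idents in datasets.items()
    }


def group_by_cluster(uniprot_sets, uniprot_to_cluster):
    """Step 2: for each cluster, collect the datasets that include any of its members."""
    support = defaultdict(set)
    for name, entries in uniprot_sets.items():
        for entry in entries:
            cluster = uniprot_to_cluster.get(entry)
            if cluster is not None:  # entries without a cluster assignment are skipped
                support[cluster].add(name)
    return dict(support)


uniprot_sets = map_to_uniprot(DATASETS, NATIVE_TO_UNIPROT)
cluster_support = group_by_cluster(uniprot_sets, UNIPROT_TO_CLUSTER)
for cluster in sorted(cluster_support):
    print(cluster, sorted(cluster_support[cluster]))
```

Grouping at the cluster level lets a protein family be supported by a dataset even when that dataset names a different member protein, as `CLUSTER_1` is here by both toy datasets.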

More information about each specific dataset

Proteins that use iron-sulfur cluster cofactors (e.g., Wächtershäuser. 1990. PNAS, 87:200-204)
Proteins that use zinc as a cofactor (Mulkidjanian. 2009. Biol Direct. 4:26)
Enzymes in a core phosphate-free metabolic network without thioester-driven biosynthesis (Goldford et al. 2017. Cell, 168:1126–1134)
Enzymes in a core phosphate-free metabolic network with thioester-driven biosynthesis (Goldford et al. 2017. Cell, 168:1126–1134)
Proteins that use nucleotide-derived cofactors (White. 1976. J Mol Evol, 7:101-104)
Proteins that use amino acid-derived cofactors (Szathmáry. 1999. Trends Genet. 15:223-9)
Proteins that can bind artificial RNA or DNA aptamers (Blanco et al. 2018. Curr Biol. 28:526-537)
399 KEGG Orthology (KO) groups predicted to have been present in the LUCA through gene tree-species tree reconciliation (Moody et al. 2024. Nature Ecol Evol, 8:1654–1666)
Proteins that perform functions similar to natural or artificial ribozymes (Gilbert. 1986. Nature 319:618)
Proteins that contain at least one of 66 universal protein SCOP superfamilies (Yang et al. 2005. PNAS, 102:373-378)
Proteins that contain at least one of 165 deeper branching SCOP folds (Wang et al. 2007. Genome Res. 17:1572-1585)
Proteins that contain at least one of 115 common Pfam domains (Delaye et al. 2005. Orig Life Evol Biosph. 35:537-554)
Proteins that contain at least one of 114 CATH domain superfamilies (Ranea et al. 2006. J Mol Evol. 63(4):513-25)
Proteins belonging to one of 80 universally distributed COGs in the original COG database (Harris et al. 2003. Genome Res, 13:407)
Proteins belonging to one of 571 COGs in the original COG database that were determined to be present in the LUCA by early gene tree-species tree reconciliation (Mirkin et al. 2003. BMC Evol Biol, 3:2)
Enzymes performing one of 286 reactions (represented as EC numbers) identified in taxonomically broad metabolic pathways (Srinivasan and Morowitz. 2009. Biol Bull, 216:126-130)
Proteins belonging to one of 355 taxonomically broad COGs from the EggNOG database (Weiss et al. 2016. Nature Microbiol, 1:16116)
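A single-protein query over datasets like these can report evidence at two levels: datasets that list the protein itself, and datasets that list only another member of its EggNOG cluster. The Python sketch below is one plausible way to do this on a local copy of the data, assuming the datasets have already been reduced to UniProt entry names and cluster assignments. The dataset labels, entry names, cluster labels, and memberships are hypothetical toy values, not LUCApedia content, and `query_protein` is an illustrative function rather than part of any LUCApedia interface.

```python
from typing import Dict, Set

# Hypothetical toy data: dataset label -> UniProt entry names it covers.
DATASET_ENTRIES: Dict[str, Set[str]] = {
    "iron_sulfur_cofactor": {"PROTA_TOY"},
    "universal_scop_superfamilies": {"PROTB_TOY", "PROTC_TOY"},
    "weiss_2016_cogs": {"PROTC_TOY"},
}

# Hypothetical toy data: UniProt entry name -> EggNOG cluster label.
ENTRY_TO_CLUSTER: Dict[str, str] = {
    "PROTA_TOY": "CLUSTER_1",
    "PROTB_TOY": "CLUSTER_1",
    "PROTC_TOY": "CLUSTER_2",
}


def query_protein(entry: str) -> Dict[str, Set[str]]:
    """Return datasets that include `entry` directly, and datasets that
    include only some other member of the same cluster (family-level evidence)."""
    cluster = ENTRY_TO_CLUSTER.get(entry)
    # An entry with no cluster assignment forms a family of one.
    family = {e for e, c in ENTRY_TO_CLUSTER.items() if c == cluster} if cluster else {entry}
    direct = {name for name, entries in DATASET_ENTRIES.items() if entry in entries}
    via_family = {
        name
        for name, entries in DATASET_ENTRIES.items()
        if name not in direct and entries & family
    }
    return {"direct": direct, "via_family": via_family}


print(query_protein("PROTA_TOY"))
```

Keeping direct and family-level hits separate lets a user see whether support for deep ancestry comes from the protein itself or only from its relatives.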

More information

For more information, please consult our complete documentation, available on the download page.

If you have questions or comments about the LUCApedia database, please contact Aaron Goldman at agoldman@oberlin.edu

Goldman Lab Web Page

Faculty Page at Oberlin College

If you use LUCApedia, please cite:
Goldman AD, Bernhard TM, Dolzhenko E, Landweber LF (2013) LUCApedia: a database for the study of ancient life. Nucleic Acids Res. 41:D1079-82. doi: 10.1093/nar/gks1217.