AHA Approved Data Repositories

Subject-focused repositories, when available, are preferred over general repositories.

ArrayExpress: The ArrayExpress Archive is a database of functional genomics experiments including gene expression where you can query and download data collected to MIAME and MINSEQE standards. Gene Expression Atlas contains a subset of curated and re-annotated Archive data which can be queried for individual gene expression under different biological conditions across experiments.
BioModels: BioModels Database is a repository of computational models of biological processes. Models described from literature are manually created and enriched with cross-references.
CellML: The purpose of CellML is to store and exchange computer-based mathematical models. CellML allows scientists to share models even if they are using different modeling tools. It also enables them to reuse components from one model in another, thus accelerating model development.
ClinicalTrials.gov: ClinicalTrials.gov is a Web-based resource that provides patients, their family members, health care professionals, researchers, and the public with easy access to information on publicly and privately supported clinical studies on a wide range of diseases and conditions.
ClinVar: ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation.
COSMIC: COSMIC is designed to store and display somatic mutation information and related details and contains information relating to human cancers.
Dash: DataONE Dash is a self-service tool for researchers to describe, upload, and share their research data via ONEShare, a member repository of the DataONE network.
Database of Genomic Variants (DGV): Aims to provide a comprehensive summary of structural variation in the human genome and provides a useful catalog of control data for studies aiming to correlate genomic variation with phenotypic data. The database is continuously updated with new data from peer-reviewed research studies.
Dataverse (generic): The Dataverse Network is an open source application to publish, share, reference, extract and analyze research data.
dbGaP: The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that investigate the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, and molecular diagnostic assays, as well as associations between genotype and non-clinical traits.
dbSNP: In collaboration with the National Human Genome Research Institute, the National Center for Biotechnology Information has established the dbSNP database to serve as a central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms.
EGA: The European Genome-phenome Archive (EGA) is designed to be a repository for all types of genotype experiments, including case control, population, and family studies. It includes SNP and CNV genotypes from array based methods and genotyping done with re-sequencing methods. This data may be either publicly available or limited access, depending on the design of the study.
Electron Microscopy DataBank: EMDataBank is a unified global portal for deposition and retrieval of 3DEM density maps, atomic models, and associated metadata, as well as a resource for news, events, software tools, data standards, validation methods for the 3DEM community.
EMBL Nucleotide Sequence Database: Europe's primary nucleotide sequence resource. The main sources for DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications. It is one part of the European Nucleotide Archive (ENA).
exRNA: The goals of the Extracellular RNA Communication (ERC) consortium are to discover fundamental biological principles about the mechanisms of exRNA generation, secretion, and transport; to identify and develop a catalog of exRNA found in normal human body fluids; and to investigate the potential for using exRNAs in the clinic as therapeutic molecules or biomarkers of disease.
figshare: figshare allows users to upload any file format to be made visualisable in the browser so that figures, datasets, media, papers, posters, presentations and file sets can be disseminated in a way that the current scholarly publishing model does not allow.
FlyBase: FlyBase is a database of genetic and molecular data for D. melanogaster and other Drosophila species, targeted to an audience of research professionals.
GenBank: GenBank is an annotated collection of publicly available DNA sequences available through the National Center for Biotechnology Information databases. GenBank contains over 135,000,000 sequence records and is updated every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration along with the DNA DataBank of Japan and the European Molecular Biology Laboratory.
GitHub (source code): Repository for open source code.
IntAct: IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.
International Mouse Phenotyping Consortium: The International Mouse Phenotyping Consortium (IMPC) is an international scientific endeavor to create and characterize the phenotype of 20,000 knockout mouse strains.
MetaboLights: MetaboLights is a database for Metabolomics experiments and derived information. The database is cross-species, cross-technique and covers metabolite structures and their reference spectra as well as their biological roles, locations and concentrations, and experimental data from metabolic experiments.
Nanomaterial Registry: The Nanomaterial Registry is an authoritative, fully curated resource that archives research data on nanomaterials and their biological and environmental implications.
National Collection of Type Cultures: The National Collection of Type Cultures (NCTC) is a specialized laboratory located in the Central Public Health Laboratory, Colindale. It accesses, preserves and supplies authentic cultures of bacteria and mycoplasmas that are pathogenic to man or other animals, that may occur in food or water and in hospital or health-related environments, and that can be preserved by freeze-drying.
National Collection of Pathogenic Viruses: A wide-ranging archive of well-characterized, authenticated human pathogens which will resource the supply of viruses, and materials derived from them, to the scientific community.
NCBI BioProject: The BioProject repository collects projects with biological data that relates to a single initiative that originates from a single entity or consortium. Records provide users with a single location for the links to diverse data types generated for those projects.
NCBI BioSample: The BioSample database contains descriptions of biological source materials used in experimental assays.
NCBI Conserved Domains Database: The Conserved Domains Database (CDD) contains annotations of functional units in proteins, including multiple sequence alignment models for ancient domains and full-length proteins. This collection of models includes 3D structures that display the sequence/structure/function relationships in proteins. Users can identify amino acids in protein sequences with the resources available through CDD as well as view single sequences embedded within multiple sequence alignments.
NCBI dbMHC: The dbMHC database provides an open, publicly accessible platform for DNA and clinical data related to the human Major Histocompatibility Complex (MHC). It provides access to human leukocyte antigen (HLA) sequences, HLA allele and haplotype frequencies, and clinical datasets.
NCBI dbVar: dbVar is a database of genomic structural variation containing data from multiple gene studies. Users can browse data containing the number of variant calls from each study, and filter studies by organism, study type, method and genomic variant. Organisms include human, mouse, cattle and several additional animals.
NCBI Epigenomics: The Epigenomics database provides genomics maps of stable and reprogrammable nuclear changes that control gene expression and influence health. Users can browse current epigenomic experiments as well as search, compare and browse samples from multiple biological sources in gene-specific contexts. Many epigenomes contain modifications with histone marks, DNA methylation and chromatin structure activity. NCBI Epigenomics database contains datasets from the NIH Roadmap Epigenomics Project.
NCBI EST: EST (Expressed Sequence Tag) collects short, single-read transcript sequences from GenBank, which serve as a resource to evaluate gene expression, find potential variation, and annotate genes. It contains nucleic acid sequences and uncharacterized, short cDNA sequences.
NCBI GEO DataSets: An international public repository, GEO (Gene Expression Omnibus) DataSets archives and distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data. The records include original submitter-supplied records (Series, Samples and Platforms) and curated DataSets. GEO aims to provide a database that efficiently stores these data; offer simple submission procedures and formats that support complete and well-annotated data deposits from the research community; and provide user-friendly mechanisms for users to find and use studies and gene expression profiles of interest. GEO DataSets provides tools to identify differences in gene expression levels and cluster heatmaps.
NCBI GEO Profiles: The Gene Expression Omnibus (GEO) database stores individual gene expression profiles from NCBI databases and is searchable by gene annotation as well as gene profile characteristics. GEO archives microarray and next-generation sequencing as well as other forms of genomic data submitted by researchers within the scientific community.
NCBI Gene: Integrating information from a variety of species, records in Gene include nomenclature, Reference Sequences, maps, pathways, variations, phenotypes, and links to genome-specific, phenotype-specific, and locus-specific resources.
NCBI Genome: The Genome database contains annotations and analysis of eukaryotic and prokaryotic genomes, as well as tools that allow users to compare genomes and gene sequences from humans, microbes, plants, viruses and organelles. Users can browse by organism, and view genome maps and protein clusters.
NCBI GSS: The GSS database collects unannotated, short, single-read, primary genomic sequences from GenBank and contains nucleic acid sequences. These sequences include random survey sequences, clone-end sequences, and exon-trapped sequences.
NCBI HomoloGene: The HomoloGene database provides a system for the automated detection of homologs among annotated genes of genomes across multiple species. These homologs are fully documented and organized by homology group. HomoloGene processing uses proteins from input organisms to compare and sequence homologs, mapping back to corresponding DNA sequences.
NCBI Nucleotide: The NCBI Nucleotide database collects sequences from such sources as GenBank, RefSeq, TPA, and PDB. Sequences collected relate to genome, gene, and transcript sequence data, and provide a foundation for research related to the biomedical field.
NCBI PopSet: NCBI PopSet collects DNA sequences to analyze the ways that populations are related by evolution. Such sequences indicate if populations originate from different members of the same species or from organisms of different species entirely.
NCBI Probe: Probe database provides a public registry of nucleic acid reagents as well as information on reagent distributors, sequence similarities and probe effectiveness. Database users have access to applications of gene expression, gene silencing and mapping, as well as reagent variation analysis and projects based on probe-generated data. The Probe database is constantly updated, with over 11,000,000 probes available.
NCBI Protein: The Protein database collects protein sequences related to biological structure and function. The sequences in NCBI Protein come from the translations from annotated coding regions in GenBank, RefSeq, and TPA, and records from SwissProt, PIR, PRF, and PDB.
NCBI Protein Clusters: The Entrez Protein Clusters database contains annotation information, publications, structures and analysis tools for related protein sequences encoded by complete genomes. The data available in the Protein Clusters Database is generated from prokaryotic genomic studies and is intended to assist researchers studying micro-organism evolution as well as other biological sciences. Available genomes include plants and viruses as well as organelles and microbial genomes.
NCBI Reference Sequence: The Reference Sequence database provides explicitly linked nucleotide and protein sequences, as well as comprehensive and annotated sequence sets with genomic DNA, proteins and transcripts. Users have access to a wealth of resources for gene identification, comparative analysis and genome research. Reference Sequences are available for naturally occurring DNA, RNA and protein sequences in organic species worldwide.
NCBI Structure: The Structure database provides three-dimensional structures of macromolecules for a variety of research purposes and allows the user to retrieve structures for specific molecule types as well as structures for genes and proteins of interest. Structure comprises three main databases: the Molecular Modeling Database; Conserved Domains and Protein Classification; and the BioSystems Database. Structure also links to the PubChem databases to connect biological activity data to the macromolecular structures. Users can locate structural templates for proteins and interactively view structures and sequence data to closely examine sequence-structure relationships.
NCBI Taxonomy: Currently covering about 10 percent of the described species on the planet and more than 175,000 taxa, Taxonomy is a curated classification and nomenclature for all organisms in the public sequence databases. Taxonomy gives species names and higher-level classifications of the organisms represented in the Entrez sequence databases. It maintains a phylogenetic classification (containing only monophyletic groups if possible). Most species are represented only by a small piece of sequence data that's insufficient to construct a full phylogeny, but some species contain complete genomes.
NCBI Trace Archive: The Trace Assembly Archive stores pairwise alignment and multiple alignment of sequencing reads, linking basic trace data with finished genomic sequence.
National Institutes of Health Blueprint for Neuroscience Research (NITRC): NITRC facilitates finding and comparing neuroimaging resources for functional and structural neuroimaging analyses.
Online Mendelian Inheritance in Animals (OMIA): Online Mendelian Inheritance in Animals contains textual information, references, links, and relevant records related to genes, traits, and inherited disorders in animals.
Online Mendelian Inheritance in Man (OMIM): OMIM contains authoritative medical data on all known Mendelian disorders as well as full-text and referenced overviews on the relationship between phenotype and genotype. Users can search the OMIM database by chromosome and narrow their results by known gene sequences, phenotypes and gene map locus, or search using only clinical synopses containing any combination of 22 specified criteria. The information contained in OMIM is available to download for personal, educational and research uses.
Open Science Framework: Open Science Framework is a repository hosted by Center for Open Science, which is a non-profit technology company providing free and open services to increase inclusivity and transparency of research.
PRIDE: The PRIDE PRoteomics IDEntifications database at EMBL-EBI is a centralised, standards compliant, public data repository for proteomics data. It has been developed to provide the proteomics community with a public repository for protein and peptide identifications together with the evidence supporting these identifications. PRIDE is also able to capture details of post-translational modifications.
Protein Data Bank in Europe (PDBe): The EBI Protein Structure Database in Europe is a project for the collection, management and distribution of data about macromolecular structures, derived from the Protein Data Bank (PDB). It is one of the founding members of Worldwide Protein Data Bank (wwPDB).
PubChem: PubChem provides information on the biological activities of small molecules.
Rat Genome Database: The Rat Genome Database houses genomic, genetic, functional, physiological, pathway and disease data for the laboratory rat as well as comparative genomics between rat, human and mouse.
RCSB Protein Data Bank: Protein Data Bank (PDB) archive is the single worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.
Sequence Read Archive: The Sequence Read Archive stores the raw sequencing data from such sequencing platforms as the Roche 454 GS System, the Illumina Genome Analyzer, the Applied Biosystems SOLiD System, the Helicos Heliscope, and Complete Genomics. It archives the sequencing data associated with RNA-Seq, ChIP-Seq, Genomic and Transcriptomic assemblies, and 16S ribosomal RNA data.
TriTrypDB: TriTrypDB is an integrated genomic and functional genomic database for pathogens of the family Trypanosomatidae, including organisms in both Leishmania and Trypanosoma genera.
UK Data Archive: The UK Data Archive (UKDA) is a centre of expertise in data acquisition, preservation, dissemination and promotion and is curator of the largest collection of digital data in the social sciences and humanities in the UK.
United States Renal Data System: The United States Renal Data System (USRDS) is a national data system that collects, analyzes, and distributes information about chronic kidney disease (CKD) and end-stage renal disease (ESRD) in the United States.
WormBase: WormBase is an online biological database about the biology and genome of the nematode model organism Caenorhabditis elegans and contains information about other related nematodes.
ZENODO: ZENODO builds and operates a simple and innovative service that enables researchers, scientists, EU projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of the existing institutional or subject-based repositories of the research communities.
ZFIN: ZFIN serves as the zebrafish model organism database, an online database of information for zebrafish researchers.