Motif Databases available for search


The databases that can be searched are grouped into categories:


Motif Databases


JASPAR Vertebrates and UniPROBE Mouse

Searches JASPAR CORE vertebrates and Mouse (UniPROBE).

JASPAR CORE and UniPROBE Mouse

Searches JASPAR CORE and Mouse (UniPROBE).

All Vertebrates

Searches JASPAR CORE vertebrates, Mouse (UniPROBE), and Human and Mouse (Jolma2013).

Human and Mouse (Jolma2013)

Human and Mouse high-throughput SELEX motifs from Cell 2013. 152(1-2):327-339.

RNA-binding motifs (Ray2013)

A compendium of 244 RNA-binding motifs from Nature 2013. 11;499(7457):172-7. They come from in vitro experiments using RNAcompete, a method for the rapid and systematic analysis of the RNA sequence preferences of RNA-binding proteins (RBPs). The motifs were converted from the data in the archive file titled "Top10align PFMs learned from all data", which Ray et al. supply on their website.
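The conversion step mentioned above can be illustrated with a minimal sketch. It assumes (the source does not say this) that each PFM is a tab-separated table with a header row `Pos A C G U` followed by one row of per-letter weights per position, and that converting means rescaling each position so its weights sum to 1. The helper names `parse_pfm_text` and `pfm_to_probabilities` are hypothetical, and the optional pseudocount is an illustrative choice, not necessarily what was used to build this collection.

```python
from typing import List, Sequence

RNA_ALPHABET = "ACGU"  # column order assumed for this sketch


def parse_pfm_text(text: str) -> List[List[float]]:
    """Parse a hypothetical tab-separated PFM: a 'Pos A C G U' header, then one row per position."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # Skip the header line and drop the leading position index on each row.
    return [[float(x) for x in line.split()[1:]] for line in lines[1:]]


def pfm_to_probabilities(pfm: Sequence[Sequence[float]],
                         pseudocount: float = 0.0) -> List[List[float]]:
    """Rescale each position so its letter weights sum to 1.

    The optional pseudocount is added to every entry first, which keeps
    letters with zero weight from receiving probability 0.
    """
    probs = []
    for position, row in enumerate(pfm, start=1):
        if len(row) != len(RNA_ALPHABET):
            raise ValueError(f"position {position}: expected {len(RNA_ALPHABET)} values")
        adjusted = [value + pseudocount for value in row]
        total = sum(adjusted)
        if total <= 0:
            raise ValueError(f"position {position}: weights sum to {total}")
        probs.append([value / total for value in adjusted])
    return probs


if __name__ == "__main__":
    # Made-up two-position example in the assumed layout.
    example = "Pos\tA\tC\tG\tU\n1\t2\t2\t2\t14\n2\t0\t0\t20\t0\n"
    for row in pfm_to_probabilities(parse_pfm_text(example)):
        print(" ".join(f"{p:.3f}" for p in row))
```

Normalizing per position, rather than over the whole matrix, matches how letter-probability matrices are read by scanners: each row is a distribution over the four bases at one motif position.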

All Drosophila

Searches OnTheFly_2014, Fly Factor Survey, FLYREG, iDMMPMM and DMMPMM.

All Yeast

Searches SCPD and MacIsaac v1.

JASPAR CORE (2014)

The 5th major release (2014) of the JASPAR CORE database contains a curated non-redundant set of 593 profiles derived from published collections of experimentally defined transcription factor binding sites for multicellular eukaryotes. This release includes 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. The main differences from similar resources (TRANSFAC, TESS, etc.) are open data access, non-redundancy and quality: JASPAR CORE is a smaller set that is non-redundant and curated.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR CORE (2014) vertebrates

The 5th major release (2014) of the JASPAR CORE database contains a curated non-redundant set of 205 profiles derived from published collections of experimentally defined transcription factor binding sites for vertebrates. This release includes 74 new curated profiles and 36 older updated profiles. The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR CORE (2014) fungi

The 5th major release (2014) of the JASPAR CORE database contains a curated non-redundant set of 177 profiles derived from published collections of experimentally defined transcription factor binding sites for fungi.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR CORE (2014) insects

The 5th major release of the JASPAR CORE database contains a curated non-redundant set of 131 profiles derived from published collections of experimentally defined transcription factor binding sites for insects. This release included 8 new curated profiles and 3 older updated profiles for Drosophila melanogaster. The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR CORE (2014) nematodes

The 5th major release of the JASPAR CORE database contains a curated non-redundant set of 15 profiles derived from published collections of experimentally defined transcription factor binding sites for nematodes. This release included 10 new curated profiles for Caenorhabditis elegans. The new profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR CORE (2014) plants

The 5th major release of the JASPAR CORE database contains a curated non-redundant set of 64 profiles derived from published collections of experimentally defined transcription factor binding sites for plants. This release included 43 new curated profiles and 4 older updated profiles for Arabidopsis thaliana. The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR CORE (2014) urochordates

The 5th major release of the JASPAR CORE database contains a single curated profile derived from published collections of experimentally defined transcription factor binding sites for urochordates.
When should it be used? When seeking models for specific factors or structural classes or if experimental evidence is paramount.

JASPAR PHYLOFACTS

The JASPAR PHYLOFACTS motifs database consists of 174 profiles that were extracted from phylogenetically conserved gene upstream elements. See Xie et al., "Systematic discovery of regulatory motifs in human promoters and 3-prime UTRs by comparison of several mammals", Nature 434:338-345 (2005), and its supplementary material for details.
When should it be used? The JASPAR PHYLOFACTS matrices are a mix of known and as yet undefined motifs. They are useful when one expects that other factors might determine promoter characteristics such as structural aspects and tissue specificity. They are highly complementary to the JASPAR CORE matrices, so they are best used in combination with that matrix set.

JASPAR FAM

The JASPAR FAM motifs database consists of models describing shared binding properties of structural classes of transcription factors. These models can be called familial profiles, consensus matrices or metamodels. They have two main benefits: 1) Since many factors have similar target sequences, searches often yield multiple predictions at the same location that correspond to the same site; familial models reduce this redundancy and so simplify the results. 2) The models can be used to classify newly derived profiles (that is, to predict which structural class the cognate transcription factor belongs to).
When should it be used? When searching large genomic sequences with no prior knowledge, and for classifying new user-supplied profiles.
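The two uses of familial profiles described above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used to build JASPAR FAM: it assumes the member matrices are already aligned, ungapped, of equal width and on the same strand; it forms a familial profile by averaging the members position by position; and it assigns a new profile to the family with the highest mean per-position Pearson correlation. All function names and the toy matrices are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Rows are motif positions; columns are A, C, G, T probabilities (assumed order).
Matrix = List[List[float]]


def familial_profile(members: Sequence[Matrix]) -> Matrix:
    """Average already-aligned, equal-width member matrices position by position."""
    width = len(members[0])
    if any(len(m) != width for m in members):
        raise ValueError("this sketch assumes members are pre-aligned to one width")
    return [[mean(m[pos][letter] for m in members) for letter in range(4)]
            for pos in range(width)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def similarity(a: Matrix, b: Matrix) -> float:
    """Mean per-position Pearson correlation; ungapped, same strand, equal width only."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares matrices of equal width")
    return mean(_pearson(ra, rb) for ra, rb in zip(a, b))


def classify(profile: Matrix, families: Dict[str, Matrix]) -> str:
    """Name of the familial profile most similar to `profile`."""
    return max(families, key=lambda name: similarity(profile, families[name]))


if __name__ == "__main__":
    # Toy, made-up matrices: two GAT-like members and one CCG-like member.
    gat_1 = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
    gat_2 = [[0.1, 0, 0.9, 0], [0.8, 0.1, 0, 0.1], [0, 0, 0.1, 0.9]]
    ccg_1 = [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    families = {"GAT-like": familial_profile([gat_1, gat_2]),
                "CCG-like": familial_profile([ccg_1])}
    new_profile = [[0.05, 0.05, 0.85, 0.05], [0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
    print(classify(new_profile, families))
```

Real matrix-comparison tools also search over offsets and both strands; this sketch omits that to keep the averaging and classification steps visible.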

JASPAR POLII

The JASPAR POLII motifs database consists of models describing patterns found in RNA Polymerase II (Pol II) promoters. Some of these correspond to a known protein (like the TATA box), while others have no specific interactor (like the DPE). Models are taken from published literature or public databases.
When should it be used? When investigating core promoters from multicellular eukaryotes.

JASPAR CNE

The JASPAR CNE motifs database is a collection of 233 matrix profiles derived by clustering overrepresented motifs from human conserved non-coding elements. While the biochemical and biological roles of most of these patterns are still unknown, Xie et al. have shown that the most abundant ones correspond to known DNA-binding proteins, most notably the insulator-binding protein CTCF.
When should it be used? For characterizing regulatory inputs in long-range developmental gene regulation in vertebrates.

JASPAR SPLICE

The JASPAR SPLICE motifs database contains matrix profiles of human canonical and non-canonical splice sites as matching donor:acceptor pairs. It currently contains only 6 highly reliable profiles obtained from the human genome.
When should it be used? When investigating splice sites.

UniPROBE/BEEML-PBM (Zhao and Stormo 2011)

From Nature Biotechnology 29:480-483; 2011. New motifs derived from UniPROBE PBM data by the BEEML-PBM algorithm. Motifs are for mouse.

Human and Mouse (Jolma2010)

Human and Mouse high-throughput SELEX motifs from Genome Research 20(6):861-873, 2010.

Human and Mouse (MacIsaac THEME)

Human and mouse motifs from ChIP-chip (restricted profiles downloaded from http://fraenkel.mit.edu/THEME).

Prokaryotes (Prodoric Release 8.9)

Prokaryotic motifs from PRODORIC downloaded from http://prodoric.tu-bs.de/msearch.php.

Prokaryotes (RegTransBase v4)

Manually curated prokaryotic motifs from RegTransBase; downloaded from http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=downloads

Drosophila (OnTheFly_2014)

The OnTheFly database is a systematic collection of Drosophila melanogaster transcription factors and their DNA-binding sites. Its authors annotated and classified all transcription factors (TFs) predicted in the Drosophila melanogaster genome and collected the known preferred DNA-binding sites of these TFs from bacterial one-hybrid (B1H), DNase I footprinting and SELEX experiments. OnTheFly houses DNA recognition motifs for 387 different genes encoding TFs (more than 50% of the Drosophila melanogaster genes encoding TFs). Reference: "OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites" (NAR 42:D167-171).

Drosophila (Fly Factor Survey)

The FlyFactorSurvey database summarizes a project using the bacterial one-hybrid method to systematically describe the binding site preferences of transcription factors in Drosophila melanogaster. This effort is a collaboration between the laboratories of Michael Brodsky and Scott Wolfe at the University of Massachusetts Medical School and has been funded by the National Human Genome Research Institute. http://pgfe.umassmed.edu/TFDBS.

Drosophila (FLYREG; Bergman & Pollard v2)

Based on FlyReg Drosophila DNase I Footprint Database (v2.0); downloaded from http://www.danielpollard.com/matrices.html.

Drosophila (DMMPMM; Kulakovskiy et al. 2009)

The collection "Drosophila Melanogaster Major Position Matrix Motifs" was created through realignment of genome-mapped DNase footprint data.
Kulakovskiy I.V., Favorov A.F., Makeev V.J.
(2009) Motif discovery and motif finding from genome-mapped DNase footprint data.
Bioinformatics 25(18): 2318-2325.

When should it be used? DMMPMM is typically more useful for motifs whose only experimental data source is DNase footprinting (so that the corresponding motif is absent from iDMMPMM).
Additional information is available from the official website.

Drosophila (iDMMPMM; Kulakovskiy et al. 2009)

The collection "improved Drosophila Melanogaster Major Position Matrix Motifs" was created through data integration of different experimental sources.
Kulakovskiy I.V., Makeev V.J.
(2009) Discovery of DNA motifs recognized by transcription factors through integration of different experimental sources.
Biophysics 54(6): 667-674.

Additional information is available from the official website.

Homeodomains (Berger et al.)

Eukaryotic homeodomain motifs from Berger et al. Cell 133:7 2008 downloaded from http://hugheslab.ccbr.utoronto.ca/supplementary-data/homeodomains1/pwm_all_102107.txt

Mouse (UniPROBE)

Mouse TF motifs downloaded from UniPROBE.

Mouse (Chen2008)

Mouse ES cell TF motifs downloaded from Cell 2008. Jun 13;133(6):1106-17.

ETS factors (Wei et al. 2010)

From EMBO J 2010. Jun 1;29:2147-2160.
Combination of 3 datasets:
wei2010_human_mws.meme - human microwell SELEX
wei2010_mouse_mws.meme - mouse microwell SELEX
wei2010_mouse_pbm.meme - mouse protein-binding microarrays
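Judging by the `.meme` extension, the three files listed above use the MEME motif format. As a rough sketch of how such files might be combined into one searchable collection, the code below reads each `MOTIF` line and the `letter-probability matrix:` block that follows it, then merges the files, prefixing each motif ID with its file name so identical IDs from different files do not overwrite each other. It assumes local copies of the files and ignores the header sections (alphabet, strands, background frequencies); `parse_meme` and `load_collection` are hypothetical names, and the MEME Suite's own tools remain the authoritative readers of this format.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

Matrix = List[List[float]]  # one row of letter probabilities per motif position

# Width field on the 'letter-probability matrix:' line, e.g. "alength= 4 w= 8 ..."
_WIDTH = re.compile(r"\bw=\s*(\d+)")


def parse_meme(text: str) -> Dict[str, Matrix]:
    """Map each MOTIF identifier to its letter-probability matrix.

    Sketch only: header sections (alphabet, strands, background) and
    URL lines are ignored rather than validated.
    """
    motifs: Dict[str, Matrix] = {}
    lines = iter(text.splitlines())
    current = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("MOTIF"):
            parts = stripped.split()
            # Keep the identifier; an optional alternate name is ignored.
            current = parts[1] if len(parts) > 1 else None
        elif stripped.startswith("letter-probability matrix") and current:
            match = _WIDTH.search(stripped)
            if match is None:
                raise ValueError(f"motif {current}: no w= field")
            rows: Matrix = []
            while len(rows) < int(match.group(1)):
                row_line = next(lines, None)
                if row_line is None:
                    raise ValueError(f"motif {current}: matrix ended early")
                if row_line.strip():  # tolerate blank lines inside the block
                    rows.append([float(x) for x in row_line.split()])
            motifs[current] = rows
            current = None
    return motifs


def load_collection(paths: Iterable[str]) -> Dict[str, Matrix]:
    """Merge several MEME files; prefix ids with the file stem to avoid clashes."""
    combined: Dict[str, Matrix] = {}
    for path in map(Path, paths):
        for motif_id, matrix in parse_meme(path.read_text()).items():
            combined[f"{path.stem}:{motif_id}"] = matrix
    return combined
```

With local copies of the three files, `load_collection(["wei2010_human_mws.meme", "wei2010_mouse_mws.meme", "wei2010_mouse_pbm.meme"])` would return a single dictionary whose keys look like `wei2010_mouse_pbm:<motif id>`.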

E. coli (DPINTERACT)

Footprinting databases based on DPINTERACT; downloaded from http://arep.med.harvard.edu/ecoli_matrices.

GLIs and TCF7L2/TCF4 (Hallikas et al. 2006)

From Cell 2006. Jan 13;124(1):47-59.

Worm (UniPROBE)

C. elegans bHLH transcription factor motifs from Grove et al. Cell 138:314-327; 2009.

Malaria (Campbell et al. 2010)

Plasmodium falciparum ApiAP2 protein family DNA-binding motifs from Campbell et al. PLoS Pathog 6:e1001165; 2010.

Yeast (MacIsaac v1)

Yeast motifs from computationally derived map (v1.tamo downloaded from http://fraenkel.mit.edu/improved_map).

Yeast (SCPD)

Yeast motifs from the Saccharomyces cerevisiae Promoter Database, downloaded from http://rulai.cshl.edu.