dbCoRC is the first interactive database presenting models of core transcriptional regulatory circuitries (CRCs) for 188 human and 50 murine cell line/tissue samples. Alongside the CRC models, the database provides the following downloadable information: 1) the super enhancer (SE), enhancer, and H3K27ac landscape of each sample; 2) putative binding sites of each core transcription factor (TF) across the SE regions within the CRC. In addition, dbCoRC incorporates general descriptions of CRC TFs, together with their expression data in normal or cancer cells/tissues. dbCoRC therefore serves as a resource for studying transcriptional networks and regulatory circuitries in both physiological (non-disease) and diseased conditions.
dbCoRC compiles CRC models for 188 human and 50 murine cell line/tissue samples. Of the 188 human samples, 79 are cancer cells/tissues and 109 are non-tumor cells/tissues. The murine dataset contains 1 malignant cell line and 49 normal cells/tissues. Detailed sample information can be found on the Browse page, and overall summaries of sample and CRC information are displayed on the Statistics page.
Studies of embryonic stem cells and other cellular models have revealed that a small group of cell-type-specific or lineage-specific transcription factors (TFs) forms an interconnected autoregulatory loop that governs the transcriptional program of a particular cell type [1-3]. These core TFs and their interconnected autoregulatory loop, which together constitute the core transcriptional regulatory circuitry (CRC), are critical for maintaining cell identity and cellular state [4].
On the dbCoRC home page, users can enter a TF gene ID or symbol to run a quick search for a gene of interest. A fuzzy search function helps identify potentially relevant genes whose names contain the same characters. Quick search results are shown on the Search page.
On the Browse page, users can click filters or enter search terms to narrow down the samples included in the database. The left panel of the Browse page filters samples by species, biosample type, tissue type, and cell type. The “Search” box can also be used to look for samples of interest. Clicking a sample name displays the CRC model for that sample as an interactive image of the interconnected loops and a list of core TFs. The expression pattern of an individual core TF can be viewed by clicking the corresponding button. Putative binding sites of core TFs within the SE region of a particular core TF can be visualized in the UCSC Genome Browser by clicking the corresponding button. Clicking a core TF of interest opens a new tab showing sample information, a general description of the TF gene, its downstream targets and upstream regulators within the CRC model, and its expression in normal or cancer cells/tissues.
Users can click Browse → human → Embryonic Stem Cell, or simply enter "H1" in the "Search" box. After "H1" is selected, the following information on the CRC model for this cell line is displayed.
On the Search page, potential core TFs can be explored across samples. Users can select "genome" or "cell/tissue type" to restrict the search, and enter one or more TFs of interest (separated by commas) into the "Gene" box. Search hits are displayed as a data sheet (see Example 2). The expression pattern of an individual core TF can be viewed by clicking the corresponding button. Putative binding sites of other core TFs inside the SE region of a particular hit can be visualized in the UCSC Genome Browser by clicking the corresponding button. Clicking a hit of interest opens a new tab showing sample information, a general description of the TF gene, its downstream targets and upstream regulators within the CRC model, and its expression in normal or cancer cells/tissues.
On the Download page, we provide all downloadable files for each sample, including Zip files of core TF binding sites (.bed), SE annotation (.bed), processed H3K27ac ChIP-seq signals (.bw), and peak annotation (.xlsx). Users can search for and download these data for in-house and/or in-depth bioinformatics analysis.
We envision several potential applications of these data in biological studies, including but not limited to super enhancers (Example 3), master TF switches during lineage specification, and disease-specific transcriptional networks (Example 4).
Data related to H1 cells were first selected and downloaded from the Download page. Here, "SE.bed" contains the locations of all annotated super enhancers and their assigned closest genes. "All CRC models" is a file of all possible binding sites of core TFs that may form a CRC. "H3K27ac.bw" contains processed, background-subtracted H3K27ac ChIP-seq signals in H1 cells.
Users can import these data into IGV, together with other bw files (e.g., NANOG and SOX2 ChIP-seq data), to explore TF binding features across the SE regions.
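For in-house analysis outside IGV, the SE annotation can also be read programmatically. The Python sketch below groups SE intervals by their assigned gene. It assumes a tab-separated BED layout with chromosome, start, and end in the first three columns and the assigned gene symbol in the fourth; the actual column order of the downloaded "SE.bed" file should be checked first. The function name `read_se_bed` and the toy coordinates are hypothetical.

```python
import io
from collections import defaultdict


def read_se_bed(handle, gene_column=3):
    """Group super-enhancer intervals by assigned gene.

    Assumption: tab-separated lines with chrom, start, end in columns 0-2
    and the gene symbol in column `gene_column` (0-based).
    """
    se_by_gene = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        se_by_gene[fields[gene_column]].append((chrom, start, end))
    return dict(se_by_gene)


# Toy input with made-up coordinates, only to show the expected shape.
toy_bed = io.StringIO(
    "chr1\t1000\t5000\tGENE_A\n"
    "chr1\t9000\t12000\tGENE_B\n"
    "chr2\t300\t4200\tGENE_A\n"
)
se_by_gene = read_se_bed(toy_bed)
print(se_by_gene["GENE_A"])
```

To use a real file, replace `toy_bed` with an open file handle, for example `open("SE.bed")`, and adjust `gene_column` if the gene symbol sits in a different column.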
By default, dbCoRC displays the CRC model for each sample. Users can compare the status of core TFs between non-tumor and tumor samples of the same tissue origin. Below is an example exploring the core TFs in gastric cancer tissue and the adjacent normal gastric tissue from the same patient. MEIS1 was present exclusively in tumor samples, while FOSL2 and FOS were selectively observed in normal gastric tissues. SMAD3, IRF2, ELF3, TCF7L2, and IRF1 were core TFs common to both tumor and normal gastric tissues. These observations may provide novel insights into gastric tissue homeostasis and gastric tumor development, and encourage follow-up biological and functional investigations.
|N2000085||N2000639||N20020720||N2001206||T2000085||T2000639||T20020720||T2001206||No.normal||No.tumor||TFs of CRC|
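The comparison described above amounts to counting, for each TF, how many normal and how many tumor CRC models contain it. A minimal Python sketch of that bookkeeping follows. The function name `count_by_group` is hypothetical, and the two input sets simply pool the TFs named in the text; they do not reproduce the per-sample results.

```python
from collections import Counter


def count_by_group(normal_crcs, tumor_crcs):
    """Return {TF: (n_normal, n_tumor)} for lists of per-sample core TF sets."""
    normal = Counter(tf for crc in normal_crcs for tf in set(crc))
    tumor = Counter(tf for crc in tumor_crcs for tf in set(crc))
    # A Counter returns 0 for TFs it has never seen.
    return {tf: (normal[tf], tumor[tf]) for tf in sorted(set(normal) | set(tumor))}


# Pooled TF sets taken from the example in the text (one set per group).
shared = {"SMAD3", "IRF2", "ELF3", "TCF7L2", "IRF1"}
normal_crcs = [shared | {"FOSL2", "FOS"}]
tumor_crcs = [shared | {"MEIS1"}]

counts = count_by_group(normal_crcs, tumor_crcs)
tumor_only = [tf for tf, (n, t) in counts.items() if n == 0]
normal_only = [tf for tf, (n, t) in counts.items() if t == 0]
common = [tf for tf, (n, t) in counts.items() if n > 0 and t > 0]
print(tumor_only, normal_only, common)
```

With one set per sample instead of one pooled set per group, the same function yields the number of normal and tumor samples in which each TF appears.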
The CRC model was initially proposed in ESC studies based on ChIP-on-chip results. Recent studies have also suggested that CRC models can be computationally inferred from H3K27ac ChIP-seq data. The algorithms for CRC modeling in dbCoRC were adapted, with slight modifications, from a seminal study (Young RA, 2016. Genome Res) and rely on super enhancer (SE) mapping and the prediction of TF binding sites across SE regions (summarized below in a flowchart).
① Raw H3K27ac ChIP-seq data were downloaded from GEO.
② Bowtie1 was used to map reads to the genome (parameters: -m 1 -k 1 --best).
③ MACS 1.4 was used for peak calling (parameters: -p 1e-9).
④ H3K27ac load was calculated and ranked by the ROSE algorithm to define super enhancers (SEs) and typical enhancers. Each SE was then assigned to its closest gene. When multiple closest genes were associated with the same SE, the SE was assigned preferentially to the TF gene.
⑤ H3K27ac read counts within the promoter region (±1 kb around the TSS) of each gene/transcript were ranked in each sample. Transcripts ranked in the top two-thirds were considered actively expressed.
⑥ In total, 1,253 TFs were retrieved from the intersection of the AnimalTFDB and TcoF databases. CTCF, GTF2I, and GTF2IRD1 were excluded from this analysis.
⑦ Super enhancer-associated active TFs were identified by overlapping the gene lists from ④, ⑤, and ⑥.
⑧ In total, 3,160 DNA binding motifs for 695 TFs were compiled from the TRANSFAC database and the MEME suite.
⑨ ROSE-defined SE regions were extended by 500 bp on each side, followed by motif scanning with FIMO. An SE-associated TF was considered auto-regulated if it had at least three binding motifs within its own extended SE region.
⑩ Within the same sample, motif scanning was further applied to identify potential binding sites of all auto-regulated TFs in each other's extended SE regions. Regulatory circuitries were then constructed from all possible fully interconnected autoregulatory loops.
⑪ When multiple possible regulatory circuitries could be computed, the one containing the TFs with the highest frequency of occurrence across all possible loops was selected as the CRC model for the individual sample.
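The last three steps (auto-regulation test, loop construction, and model selection) can be illustrated with a small Python sketch. This is not the CRCmapper implementation. It assumes motif-hit counts are already available as a nested dictionary `hits[regulator][target]`, applies the same threshold of three motifs to cross-regulation as to auto-regulation (the text states the threshold only for auto-regulation), enumerates loops by brute force (practical only for small TF sets), and scores a loop by the mean occurrence of its TFs across all maximal loops, which is one possible reading of "highest frequency of occurrence". All function names and the toy counts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def autoregulated_tfs(hits, min_sites=3):
    """TFs with at least `min_sites` motif hits in their own extended SE."""
    return sorted(tf for tf in hits if hits[tf].get(tf, 0) >= min_sites)


def binds(hits, regulator, target, min_sites):
    """True if `regulator` has enough motif hits in the SE of `target`."""
    return hits.get(regulator, {}).get(target, 0) >= min_sites


def interconnected_loops(hits, tfs, min_sites=3):
    """Maximal TF sets in which every pair regulates each other (assumed threshold)."""
    loops = []
    for size in range(2, len(tfs) + 1):
        for group in combinations(tfs, size):
            if all(binds(hits, a, b, min_sites) and binds(hits, b, a, min_sites)
                   for a, b in combinations(group, 2)):
                loops.append(frozenset(group))
    # Drop loops that are strict subsets of a larger loop.
    return [loop for loop in loops if not any(loop < other for other in loops)]


def select_crc(loops):
    """Pick the loop whose TFs occur most often, on average, across all loops."""
    if not loops:
        return None
    freq = Counter(tf for loop in loops for tf in loop)
    return max(loops, key=lambda loop: sum(freq[tf] for tf in loop) / len(loop))


# Toy motif-hit counts (made up): hits[regulator][target SE].
hits = {
    "TF_A": {"TF_A": 3, "TF_B": 3, "TF_C": 4, "TF_D": 3},
    "TF_B": {"TF_A": 5, "TF_B": 4, "TF_C": 3, "TF_D": 3},
    "TF_C": {"TF_A": 3, "TF_B": 3, "TF_C": 6, "TF_D": 1},
    "TF_D": {"TF_A": 3, "TF_B": 4, "TF_C": 3, "TF_D": 3, "TF_E": 3},
    "TF_E": {"TF_D": 3, "TF_E": 5},
    "TF_F": {"TF_A": 7, "TF_F": 2},  # fails the auto-regulation test
}

core_candidates = autoregulated_tfs(hits)
loops = interconnected_loops(hits, core_candidates)
crc = select_crc(loops)
print(sorted(crc))
```

In this toy input, three maximal loops exist, and the loop of TF_A, TF_B, and TF_D is selected because its members recur most often across the loops.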
|Data type||Source||URL|
|DNA sequence motif||TRANSFAC||http://gene-regulation.com/pub/databases.html|
|DNA sequence motif||MEME||http://meme-suite.org/|
|Gene expression in human cancers||TCGA||http://cancergenome.nih.gov/|
|Gene expression in human cell lines||EMBL-EBI Encode Cell Lines||http://www.ebi.ac.uk/gxa|
|Gene expression in normal human tissues||EMBL-EBI Illumina Body Map||http://www.ebi.ac.uk/gxa|
|Gene expression in normal murine tissues||RhesusBase||http://www.rhesusbase.org/|
|Tool||Purpose||URL|
|SAMtools||Sort and index||http://www.htslib.org/|
|MACS 1.4.2||Identify H3K27ac enriched region||http://liulab.dfci.harvard.edu/MACS/Download.html|
|ROSE||Super enhancer identification||https://bitbucket.org/young_computation/rose|
|CRCmapper||Map core regulator circuitry||https://bitbucket.org/young_computation/crcmapper|