FAQs

1. What information is available in the dbCoRC?

dbCoRC is the first interactive database presenting models of core transcriptional regulatory circuitries (CRCs) for 188 human and 50 murine cell line/tissue samples. Alongside the CRC models, the database provides the following downloadable information: 1) the super enhancer (SE), enhancer, and H3K27ac landscape for individual samples; 2) putative binding sites of each core transcription factor (TF) across the SE regions within a CRC. In addition, dbCoRC incorporates general descriptions of CRC TFs, together with their expression data in normal or cancer cells/tissues. dbCoRC therefore serves as a valuable resource for studies of transcriptional networks and regulatory circuitries in both physiological (non-disease) and diseased conditions.

2. What samples are included in the dbCoRC?

dbCoRC compiles CRC models for 188 human and 50 murine cell line/tissue samples. Of the 188 human samples, 79 are cancer cells/tissues and 109 are non-tumor cells/tissues. The murine dataset contains 1 malignant cell line and 49 normal cells/tissues. Detailed sample information can be found on the Browse page, and overall summaries of sample and CRC information are displayed on the Statistics page.

3. What is core transcriptional regulatory circuitry (CRC)?

Studies of embryonic stem cells and other cellular models have revealed that a small group of cell-type-specific or lineage-specific transcription factors (TFs) forms an interconnected autoregulatory loop that governs transcriptional programs in particular cell types [1-3]. These core TFs and their interconnected autoregulatory loop, which together constitute the core transcriptional regulatory circuitry (CRC) [4], are critical for maintaining cell identity and cellular state.

References

  1. Boyer, L.A., Lee, T.I., Cole, M.F., Johnstone, S.E., Levine, S.S., Zucker, J.P., Guenther, M.G., Kumar, R.M., Murray, H.L., Jenner, R.G. et al. (2005) Core transcriptional regulatory circuitry in human embryonic stem cells. Cell, 122, 947-956.
  2. Odom, D.T., Dowell, R.D., Jacobsen, E.S., Nekludova, L., Rolfe, P.A., Danford, T.W., Gifford, D.K., Fraenkel, E., Bell, G.I. and Young, R.A. (2006) Core transcriptional regulatory circuitry in human hepatocytes. Mol Syst Biol, 2, 2006 0017.
  3. Sanda, T., Lawton, L.N., Barrasa, M.I., Fan, Z.P., Kohlhammer, H., Gutierrez, A., Ma, W., Tatarek, J., Ahn, Y., Kelliher, M.A. et al. (2012) Core transcriptional regulatory circuit controlled by the TAL1 complex in human T cell acute lymphoblastic leukemia. Cancer Cell, 22, 209-221.
  4. Saint-Andre, V., Federation, A.J., Lin, C.Y., Abraham, B.J., Reddy, J., Lee, T.I., Bradner, J.E. and Young, R.A. (2016) Models of human core transcriptional regulatory circuitries. Genome Res, 26, 385-396.

4. How to use the dbCoRC?

4.1 Quick Search

On the dbCoRC home page, users can enter a TF gene ID or symbol to perform a quick search for a gene of interest. A fuzzy search function helps identify potentially relevant genes whose names contain the same characters. Quick search results are displayed on the Search page.

4.2 Browse

On the Browse page, users can click or enter search terms to filter the samples included in the database. The left panel of the Browse page is organized for sample filtering by species, biosample type, tissue type, and cell type. The "Search" box can also be used to look for samples of interest. Clicking a sample name displays the CRC model for that sample as an interactive image of the interconnected loops and a list of core TFs. The expression patterns of individual core TFs can be viewed by clicking the corresponding button. Putative binding sites of core TFs within the SE region of a particular core TF can be visualized in the UCSC Genome Browser by clicking the corresponding button. Clicking a core TF of interest opens a new tab showing sample information, a general description of the TF gene, its downstream targets and upstream regulators within the CRC model, and its gene expression in normal or cancer cells/tissues.

Example 1: searching the CRC model for H1 human embryonic stem cells

Users can click Browse → human → Embryonic Stem Cell, or simply enter "H1" in the "Search" box. After "H1" is chosen, the CRC model information for this cell line will be displayed.

4.3 Search

On the Search page, potential core TFs across various samples can be explored. Users can select "genome" or "cell/tissue type" to restrict the search fields, and enter one or more TFs of interest (separated by commas) into the "Gene" box. Search hits are displayed as a data sheet (see Example 2). The expression patterns of individual core TFs can be viewed by clicking the corresponding button. Putative binding sites of other core TFs inside the SE region of a particular hit can be visualized in the UCSC Genome Browser by clicking the corresponding button. Clicking a hit of interest opens a new tab showing sample information, a general description of the TF gene, its downstream targets and upstream regulators within the CRC model, and its gene expression in normal or cancer cells/tissues.

Example 2: search results for SOX2, NANOG

4.4 Download

On the Download page, we provide all downloadable files for each sample, including Zip files of core TF binding sites (.bed), SE annotation (.bed), processed H3K27ac ChIP-seq signals (.bw), and peak annotation (.xlsx). Users can easily search for and download these data for in-house and/or in-depth bioinformatics analysis.

We envision several potential applications of these data in biological studies, including but not limited to super enhancers (Example 3), master TF switches during lineage specification, and disease-specific transcriptional networks (Example 4).

Example 3: Data download for customized evaluation of super enhancer and CRC in H1 cells

Data related to H1 cells were first selected and downloaded from the Download page. Here, "SE.bed" contains the locations of all annotated super enhancers and their assigned closest genes. "All CRC models" is a file of all possible binding sites of core TFs that may form a CRC. "H3K27ac.bw" contains processed, background-subtracted H3K27ac ChIP-seq signals in H1 cells.

Users can import these data into IGV, together with other .bw files (e.g. NANOG or SOX2 ChIP-seq data), to explore TF binding features across the SE regions.
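
As a complement to visual inspection in IGV, the downloaded BED files can also be examined programmatically. The following is a minimal sketch only. It assumes that both files are tab-separated BED files whose first three columns are chromosome, start, and end (the standard BED layout) and whose fourth column holds the assigned gene (SE file) or the TF name (binding-site file). The fourth-column layout is an assumption and should be checked against the actual downloaded files. The coordinates and names in the toy data are hypothetical.

```python
import io
from collections import Counter

def read_bed(handle):
    """Yield (chrom, start, end, name) tuples from tab-separated BED lines."""
    for line in handle:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        yield fields[0], int(fields[1]), int(fields[2]), name

def sites_per_se(se_records, site_records):
    """Count, for each SE, how many binding sites of each TF overlap it."""
    se_list = list(se_records)
    counts = {name: Counter() for _, _, _, name in se_list}
    for chrom, start, end, tf in site_records:
        for se_chrom, se_start, se_end, se_name in se_list:
            # BED intervals are half-open: [start, end)
            if chrom == se_chrom and start < se_end and se_start < end:
                counts[se_name][tf] += 1
    return counts

# Hypothetical toy data standing in for the downloaded files.
se_text = "chr1\t1000\t5000\tGENE_A\nchr2\t200\t900\tGENE_B\n"
site_text = (
    "chr1\t1200\t1210\tTF1\n"
    "chr1\t4990\t5005\tTF2\n"
    "chr1\t5000\t5010\tTF1\n"   # starts exactly at the SE end: no overlap
    "chr2\t300\t310\tTF1\n"
    "chr3\t10\t20\tTF2\n"       # no SE on this chromosome
)

counts = sites_per_se(read_bed(io.StringIO(se_text)),
                      read_bed(io.StringIO(site_text)))
for se_name, tf_counts in counts.items():
    print(se_name, dict(tf_counts))
```

For real files, `io.StringIO(...)` would be replaced by `open(path)`. The nested loop is adequate for a single sample; an interval index would be preferable for genome-wide comparisons across many samples.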

Example 4: Data download for in-depth analysis of cancer-specific core TFs

By default, dbCoRC displays the CRC model for each sample. Users can compare the status of core TFs between non-tumor and tumor samples of the same tissue origin. Below is an example exploring the core TFs in gastric cancer tissue and the adjacent normal gastric tissue from the same patient. MEIS1 was present exclusively in tumor samples, while FOSL2 and FOS were observed selectively in normal gastric tissues. SMAD3, IRF2, ELF3, TCF7L2, and IRF1 were core TFs common to both tumor and normal gastric tissues. These observations may provide novel insights into gastric tissue homeostasis and gastric tumor development, and they encourage follow-up biological and functional investigations.

N2000085  N2000639  N20020720  N2001206  T2000085  T2000639  T20020720  T2001206  No.normal  No.tumor  TFs of CRC
1         1         1          1         1         1         1          1         4          4         SMAD3
1         1         1          1         1         0         1          1         4          3         IRF2
0         1         1          1         0         1         1          1         3          3         ELF3
0         1         1          1         0         1         1          1         3          3         TCF7L2
1         1         1          0         0         1         1          1         3          3         IRF1
1         1         1          1         0         0         0          0         4          0         FOSL2
0         1         1          1         0         0         0          0         3          0         FOS
0         0         0          0         1         1         1          1         0          4         MEIS1
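
The comparison in the table above can be reproduced with a few lines of code once the core TF lists of the matched samples have been collected. The sketch below copies the presence/absence values from the table and assumes that the "N" or "T" prefix of a sample name marks normal or tumor tissue. The function names are illustrative and are not part of dbCoRC.

```python
# Presence (1) or absence (0) of each TF in the CRC model of each sample,
# copied from the table above. Assumption: the "N"/"T" prefix of a sample
# name marks normal/tumor tissue.
SAMPLES = ["N2000085", "N2000639", "N20020720", "N2001206",
           "T2000085", "T2000639", "T20020720", "T2001206"]
PRESENCE = {
    "SMAD3":  [1, 1, 1, 1, 1, 1, 1, 1],
    "IRF2":   [1, 1, 1, 1, 1, 0, 1, 1],
    "ELF3":   [0, 1, 1, 1, 0, 1, 1, 1],
    "TCF7L2": [0, 1, 1, 1, 0, 1, 1, 1],
    "IRF1":   [1, 1, 1, 0, 0, 1, 1, 1],
    "FOSL2":  [1, 1, 1, 1, 0, 0, 0, 0],
    "FOS":    [0, 1, 1, 1, 0, 0, 0, 0],
    "MEIS1":  [0, 0, 0, 0, 1, 1, 1, 1],
}

def count_by_group(calls):
    """Return (normal samples with the TF, tumor samples with the TF)."""
    normal = sum(c for s, c in zip(SAMPLES, calls) if s.startswith("N"))
    tumor = sum(c for s, c in zip(SAMPLES, calls) if s.startswith("T"))
    return normal, tumor

def classify(calls):
    """Label a TF by the sample group(s) in which it is a core TF."""
    normal, tumor = count_by_group(calls)
    if normal and tumor:
        return "shared"
    if normal:
        return "normal-specific"
    if tumor:
        return "tumor-specific"
    return "absent"

groups = {tf: classify(calls) for tf, calls in PRESENCE.items()}
for tf, label in groups.items():
    print(tf, *count_by_group(PRESENCE[tf]), label)
```

The per-group counts returned by `count_by_group` correspond to the No.normal and No.tumor columns of the table.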

5. Pipeline to model the core transcriptional regulatory circuitry

The CRC model was initially proposed in an ESC study based on ChIP-on-chip results. Recent studies have also suggested that CRC models can be computationally inferred from H3K27ac ChIP-seq data. The algorithms for CRC modeling in dbCoRC were adapted, with slight modifications, from a seminal study (Young RA, 2016. Genome Res) and rely on super enhancer (SE) mapping and the prediction of TF binding sites across SE regions (summarized below in a flowchart).

Description

  1. H3K27ac ChIP-Seq

    Raw H3K27ac ChIP-seq data were downloaded from GEO.

  2. Mapped reads

    Bowtie1 was used to map reads to the genome (parameters, -m 1 -k 1 --best).

  3. Peaks

    MACS 1.4 was used for peak calling (parameters, -p 1e-9).

  4. Super enhancer & genes

    The H3K27ac load was calculated and ranked by the ROSE algorithm to define super enhancers (SEs) and typical enhancers. Each SE was then assigned to its closest genes. When multiple closest genes were associated with the same SE, the SE was assigned preferentially to the TF gene.

  5. Expressed genes

    H3K27ac read counts within the promoter region (± 1 kb from the TSS) of each gene/transcript were ranked in each sample. Transcripts ranked in the top 2/3 were considered actively expressed.

  6. Transcription factors

    In total, 1,253 TFs were retrieved from the intersection of the AnimalTFDB and TcoF databases. CTCF, GTF2I, and GTF2IRD1 were excluded from this analysis.

  7. Super enhancers & expressed TF

    Super enhancer-associated active TFs were identified by overlapping the gene lists from steps 4, 5, and 6.

  8. DNA binding motifs

    In total, 3,160 DNA binding motifs for 695 TFs were compiled from the TRANSFAC database and MEME suite.

  9. Auto-regulated TFs

    ROSE-defined SE regions were extended by 500 bp on each side, followed by motif scanning with FIMO. A TF was identified as auto-regulated if the SE-associated TF had at least three binding motifs within its own extended SE region.

  10. Interconnected auto-regulatory loops of TFs

    Within the same sample, motif scanning was applied further to identify potential binding sites of all auto-regulated TFs in their extended SE regions. Regulatory circuitries were then constructed based on all possible fully interconnected autoregulatory loops.

  11. CRC

    When multiple possible regulatory circuitries could be computed, the one containing the TFs with the highest frequency of occurrence across all possible loops was selected as the CRC model for the individual sample.
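
The selection logic of steps 5, 7, and 9 to 11 above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not the CRCmapper implementation. It assumes that motif scanning has already produced a table `motif_hits[(tf, target)]` giving the number of motif instances of `tf` within the extended SE of `target`. It also assumes that the "at least three" threshold given for auto-regulation is applied to cross-regulation as well, and that "highest frequency of occurrence" is scored as the mean number of loops in which the TFs of a circuitry appear. All gene names and values are hypothetical.

```python
from collections import Counter

MIN_SITES = 3  # threshold stated in step 9 for auto-regulation

def expressed_genes(promoter_signal):
    """Step 5: keep genes ranked in the top 2/3 by promoter H3K27ac signal."""
    ranked = sorted(promoter_signal, key=promoter_signal.get, reverse=True)
    return set(ranked[: len(ranked) * 2 // 3])

def autoregulated_tfs(candidates, motif_hits, min_sites=MIN_SITES):
    """Step 9: TFs with >= min_sites own motifs in their own extended SE."""
    return {tf for tf in candidates if motif_hits.get((tf, tf), 0) >= min_sites}

def mutual_partners(tfs, motif_hits, min_sites=MIN_SITES):
    """Step 10: link two TFs only if each is predicted to bind the other's SE."""
    def binds(a, b):
        return motif_hits.get((a, b), 0) >= min_sites
    return {a: {b for b in tfs if b != a and binds(a, b) and binds(b, a)}
            for a in tfs}

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (no pivoting) of fully interconnected sets."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            if r:
                found.append(frozenset(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return found

def select_crc(cliques):
    """Step 11: pick the loop whose TFs occur most often across all loops."""
    freq = Counter(tf for clique in cliques for tf in clique)
    def key(clique):
        mean_freq = sum(freq[tf] for tf in clique) / len(clique)
        return mean_freq, sorted(clique)  # name order breaks ties
    return max(cliques, key=key, default=frozenset())

# Hypothetical toy inputs.
promoter_signal = {"TFA": 90, "TFB": 85, "TFC": 80, "TFD": 75, "TFE": 70,
                   "TFF": 65, "GENE1": 50, "TFG": 2, "GENE2": 1}
se_genes = {"TFA", "TFB", "TFC", "TFD", "TFE", "TFF", "TFG", "GENE1"}
known_tfs = {"TFA", "TFB", "TFC", "TFD", "TFE", "TFF", "TFG"}

motif_hits = {(tf, tf): 4 for tf in ("TFA", "TFB", "TFC", "TFD", "TFE")}
motif_hits[("TFF", "TFF")] = 2          # too few sites: not auto-regulated
for a, b in [("TFA", "TFB"), ("TFA", "TFC"), ("TFB", "TFC"),
             ("TFB", "TFD"), ("TFC", "TFD"), ("TFD", "TFE")]:
    motif_hits[(a, b)] = motif_hits[(b, a)] = 3
motif_hits[("TFA", "TFE")] = 5          # one-way binding only: no link

# Step 7: SE-associated, expressed TFs.
candidates = se_genes & expressed_genes(promoter_signal) & known_tfs
auto = autoregulated_tfs(candidates, motif_hits)
cliques = maximal_cliques(mutual_partners(auto, motif_hits))
crc = select_crc(cliques)
print(sorted(crc))
```

In this toy example the candidate loops are {TFA, TFB, TFC}, {TFB, TFC, TFD}, and {TFD, TFE}; the second is selected because its members appear in the largest number of loops on average. Other scoring rules are compatible with the description in step 11, so the mean-frequency score should be read as one possible interpretation.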

6. Data source and bioinformatic tools

Data                                      Source                      URL
ChIP-seq                                  GEO                         https://www.ncbi.nlm.nih.gov
Transcription factors                     AnimalTFDB                  http://www.bioguo.org/AnimalTFDB/
Transcription factors                     TcoF-DB                     http://www.cbrc.kaust.edu.sa/tcof/
DNA sequence motif                        TRANSFAC                    http://gene-regulation.com/pub/databases.html
DNA sequence motif                        MEME                        http://meme-suite.org/
Gene expression in human cancers          TCGA                        http://cancergenome.nih.gov/
Gene expression in human cell lines       EMBL-EBI Encode Cell Lines  http://www.ebi.ac.uk/gxa
Gene expression in normal human tissues   EMBL-EBI Illumina Body Map  http://www.ebi.ac.uk/gxa
Gene expression in normal murine tissues  RhesusBase                  http://www.rhesusbase.org/
Hg19/MM9                                  UCSC                        http://genome.ucsc.edu/

Tool          Usage                             URL
Bowtie 1.2.0  Read alignment                    https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.2.0
SAMtools      Sort and index                    http://www.htslib.org/
MACS 1.4.2    Identify H3K27ac enriched region  http://liulab.dfci.harvard.edu/MACS/Download.html
ROSE          Super enhancer identification     https://bitbucket.org/young_computation/rose
FIMO          Motif search                      http://meme-suite.org/tools/fimo
CRCmapper     Map core regulatory circuitry     https://bitbucket.org/young_computation/crcmapper