This page summarizes the different attribute groups included in EpiGRAPH and provides references to the sources from which the datasets were obtained. Further information can be obtained from the EpiGRAPH Background page and from the EpiGRAPH attribute reference sheet.
Attributes that describe the DNA sequence itself, including base composition and oligonucleotide patterns
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
Base_composition | Strand-specific frequency of occurrence for each nucleotide (A, C, G and T) | Calculated directly from the DNA sequence | |||
All_2mers | Frequency of occurrence, calculated separately for each oligonucleotide of size two that does not include any Ns (not strand-specific) | Calculated directly from the DNA sequence | |||
All_4mers | Frequency of occurrence, calculated separately for each oligonucleotide of size four that does not include any Ns (not strand-specific) | Calculated directly from the DNA sequence |
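The oligonucleotide attributes above can be illustrated with a minimal Python sketch. It is not EpiGRAPH's own code: the function names are hypothetical, and two details are assumptions because the page does not specify them. First, "not strand-specific" is interpreted as merging each k-mer with its reverse complement. Second, frequencies are normalized by the number of windows that contain no N.

```python
from collections import Counter

# Translation table for complementing A/C/G/T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of an A/C/G/T string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def kmer_frequencies(sequence, k, strand_specific=False):
    """Relative frequency of each k-mer in `sequence`.

    Windows containing anything other than A, C, G or T (e.g. N) are skipped.
    ASSUMPTION: when strand_specific is False, a k-mer and its reverse
    complement are counted under one key (the lexicographically smaller one).
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # window contains N or another ambiguity code
        if not strand_specific:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

Calling `kmer_frequencies(seq, 1, strand_specific=True)` corresponds to a strand-specific base composition, while `kmer_frequencies(seq, 2)` and `kmer_frequencies(seq, 4)` correspond to the 2-mer and 4-mer attributes under the assumptions stated above.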
Attributes that describe the DNA structure (as inferred from the DNA sequence), such as distortions of the DNA helix and predicted solvent accessibility
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
Predicted_Helix_Structure | Helix structure of naked DNA as predicted from octamers with known structure | Calculated by a simple sliding window approach using the simulation data reported in Gardiner et al. (2003) J Mol Biol | twist roll tilt rise slide shift | ||
Predicted_Solvent_Accessible_Surface | Solvent accessible surface area of naked DNA as predicted from trimers with known values | Calculated similarly to the UCSC Genome Browser Boston University ORChID track (http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=78806550&c=chr7&g=encodeBu_ORChID1) | pk1_mean pk2_mean pk3_mean |
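The sliding window approach mentioned in the table can be sketched as follows. This is only an illustration of the general idea: the function name is hypothetical, the lookup values in the usage note are placeholders rather than data from Gardiner et al. (2003) or ORChID, and averaging the per-window values over the region is an assumption about how a single score per region might be derived.

```python
def sliding_window_mean(sequence, lookup):
    """Average a per-oligomer structural value over all windows of a sequence.

    `lookup` maps oligomers of one fixed length (e.g. octamers or trimers)
    to a numeric value such as twist or roll. Windows that are missing from
    the table (for example because they contain N) are skipped.
    ASSUMPTION: the region score is the plain mean of the window values.
    """
    seq = sequence.upper()
    width = len(next(iter(lookup)))  # window size is taken from the table keys
    values = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window in lookup:
            values.append(lookup[window])
    if not values:
        return None  # no window could be scored
    return sum(values) / len(values)
```

For example, with a toy two-entry table of made-up values, `sliding_window_mean("ACGN", {"AC": 1.0, "CG": 3.0})` scores the windows `AC` and `CG`, skips `GN`, and returns their mean. A real table would hold one entry per octamer (helix structure) or trimer (solvent accessibility).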
Attributes that describe repetition within the DNA, including transposable elements, tandem repeats and segmental duplications
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
RepeatMasker | Repeats as detected by RepeatMasker. See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr7&g=rmsk for details. | UCSC Genome Browser, tables chr1_rmsk to chrY_rmsk | swScore repStart repLeft | repClass repFamily | |
Simple_Repeats | Tandem repeats as detected by Tandem Repeats Finder. See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr7&g=simpleRepeat for details. | UCSC Genome Browser, table simpleRepeat | period copyNum score entropy |
Attributes that describe the large-scale functional organisation of the chromosomes, including isochores and special-interest regions
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
Attributes that describe the evolutionary history of the genome, including conservation and local recombination rates
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
Attributes that describe the variability among today's individuals, including SNPs and microdeletions
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
Attributes that describe the distribution of known and predicted protein-coding genes, pseudogenes and non-coding genes within the genome
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
RefSeq_Genes | Known protein-coding genes taken from the NCBI mRNA reference sequences collection (RefSeq). See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr2&g=refGene for details. | UCSC Genome Browser, table refGene |
Attributes that describe predicted regulatory regions and elements of the genome
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
CpG_Islands | CpG islands according to a UCSC Genome Browser detection algorithm. See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr7&g=cpgIslandExt for details. | UCSC Genome Browser, table cpgIslandExt | perGc obsExp |
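To make the two score columns concrete, the following Python sketch computes comparable values for a single sequence. It follows the commonly used definitions (GC percentage, and observed CpG count divided by the count expected from the C and G content); the function name is hypothetical, and the sketch does not reproduce the UCSC island detection algorithm itself, only the per-island statistics.

```python
def cpg_stats(sequence):
    """Return (per_gc, obs_exp) for a DNA sequence.

    per_gc  : percentage of bases that are C or G
    obs_exp : observed CpG count * length / (C count * G count)
    Both values fall back to 0.0 when they are undefined.
    """
    seq = sequence.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    per_gc = 100.0 * (c_count + g_count) / n if n else 0.0
    if c_count and g_count:
        obs_exp = cpg_count * n / (c_count * g_count)
    else:
        obs_exp = 0.0
    return per_gc, obs_exp
```

An observed/expected ratio well above the genome-wide average, combined with a high GC percentage, is what distinguishes CpG islands from the surrounding CpG-depleted sequence.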
Attributes that describe the transcriptional activity, including non-genic transcription and promoter activity
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |
Species_mRNAs | Annotation of alignments between species-specific mRNAs in GenBank and the genome. See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr2&g=mrna for details. | UCSC Genome Browser, table all_mrna | |||
Spliced_ESTs | Annotation of alignments between species-specific spliced ESTs in GenBank and the genome. See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr2&g=intronEst for details. | UCSC Genome Browser, tables chr*_intronEst | |||
Species_ESTs | Annotation of alignments between species-specific ESTs in GenBank and the genome. See http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=panTro2&c=chr2&g=est for details. | UCSC Genome Browser, table all_est |
Attributes that describe the chromatin structure and epigenetic modifications, including histone modifications and protein binding
Attribute name | Description | Data source for attribute | Score columns | Class columns | Category columns |