EpiGRAPH: A user-friendly software for advanced (epi-) genome analysis and prediction (introduction page)

Welcome to EpiGRAPH!

EpiGRAPH is a software tool for genome and epigenome analysis. It was developed to help biomedical researchers make sense of large-scale datasets, which are nowadays routinely generated with technologies such as ChIP-on-chip, tiling microarrays and resequencing.

EpiGRAPH is both simple and powerful. For occasional users, the EpiGRAPH website provides a default analysis workflow that is applicable to most datasets. To find out more about any dataset of genomic regions, EpiGRAPH performs statistical analyses and applies powerful machine learning algorithms, based on a huge database of genome and epigenome information. For advanced users, EpiGRAPH allows full access to its standardized XML-based analysis and documentation system.

EpiGRAPH has been decommissioned and replaced by the LOLA software and web service:

LOLA: R/Bioconductor package for performing enrichment analysis on genomic regions https://bioconductor.org/packages/release/bioc/html/LOLA.html
Sheffield NC, Bock C (2016). LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor. Bioinformatics 32, 587-589. https://academic.oup.com/bioinformatics/article/32/4/587/1743969

LOLAweb: Interactive web-based frontend for running LOLA analyses http://lolaweb.databio.org/
Nagraj VP, Magee NE, Sheffield NC (2018). LOLAweb: a containerized web server for interactive genomic locus overlap enrichment analysis. Nucleic Acids Research 46, W194-W199. https://academic.oup.com/nar/article/46/W1/W194/5033529
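
To illustrate what "enrichment analysis on genomic regions" means in practice, here is a minimal Python sketch. It is not LOLA's API (LOLA is an R package), and all function and variable names below are hypothetical. The sketch assumes regions are (chromosome, start, end) tuples with half-open coordinates, that the query set is a subset of a background "universe" of regions, and that enrichment is scored with a one-sided Fisher's exact test.

```python
from math import comb

def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def count_overlapping(regions, annotation):
    """Number of regions that overlap at least one annotation interval."""
    return sum(any(overlaps(r, a) for a in annotation) for r in regions)

def fisher_enrichment(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) for a hypergeometric X.

    N: number of universe regions
    K: universe regions that overlap the annotation
    n: number of query regions (a subset of the universe)
    k: query regions that overlap the annotation
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Toy data (invented for illustration): ten background regions on chr1,
# of which the first four form the query set.
universe = [("chr1", i * 100, i * 100 + 50) for i in range(10)]
query = universe[:4]
annotation = [("chr1", 10, 20), ("chr1", 120, 130), ("chr1", 240, 260)]

K = count_overlapping(universe, annotation)
k = count_overlapping(query, annotation)
p_value = fisher_enrichment(k, len(query), K, len(universe))
print(f"{k}/{len(query)} query regions overlap the annotation; p = {p_value:.4f}")
```

The nested loop in count_overlapping is quadratic and only suitable for small examples; real region sets would call for an interval index.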

Things You Can Do with EpiGRAPH

Whenever you have a set of genomic regions, EpiGRAPH can help you to find out more about these regions and predict other regions of the same type in the genome.

For your inspiration, here are a few ideas for using EpiGRAPH that we find interesting (please also see the section below on the history of EpiGRAPH):

Epigenomics: Is it possible to predict and explain which genomic regions are subject to tissue-specific methylation, based on DNA sequence and structure?
-> This analysis requires large-scale DNA methylation data for multiple tissues.
Retrovirology: Which genomic regions are preferential targets of integration for retroviruses and transposable elements? What role does the local chromatin structure play?
-> This analysis requires sequence data for a few hundred retroviral integration sites.
Developmental Epigenetics: What are the characteristics of Polycomb Response Elements in mammals as compared to Drosophila? Is it possible to predict their location genome-wide?
-> This analysis requires ChIP-on-chip data for Polycomb repression complex proteins in several mammals.
Cancer Genomics: To what degree do factors like gene richness, local recombination rates and chromatin structure influence cancer-specific microdeletions and other structural variations? And what role does tumor evolution play (i.e. do we see different determinants of cancer-specific microdeletions in early-stage vs. late-stage tumors)?
-> This analysis requires large-scale resequencing data for a number of tumors.

There will be a lot of other applications of EpiGRAPH, many of which we may not even have dreamt of.

Using, Extending, and Citing EpiGRAPH
EpiGRAPH is freely available to the scientific community (*). This includes not only the web service, but also the source code - in case you plan to set up a local copy, extend and tailor EpiGRAPH to your needs, or integrate parts of EpiGRAPH into your own software. Please write us an e-mail and we will provide you not only with the URL from where you can download the most recent source code release, but also with hints and help to get started adapting the source code to your objectives.

In order to support future work on extending and improving EpiGRAPH, it is important that you cite it whenever you use it. There are three references that are directly related to EpiGRAPH:

[1] Bock, C., K. Halachev, J. Büch and T. Lengauer (2008). "EpiGRAPH: Searching genomes and epigenomes with machine learning technology." http://epigraph.mpi-inf.mpg.de.
[2] Bock, C., M. Paulsen, S. Tierling, T. Mikeska, T. Lengauer and J. Walter (2006). "CpG island methylation in human lymphocytes is highly correlated with DNA sequence, repeats, and predicted DNA structure." PLoS Genetics 2(3): e26.
[3] Bock, C., J. Walter, M. Paulsen and T. Lengauer (2007). "CpG island mapping by epigenome prediction." PLoS Computational Biology 3(6): e110.

The first citation refers to the web service and the public version of EpiGRAPH, and the other two citations describe the underlying concepts and methods. Typically, you would cite EpiGRAPH as follows: In the Results section, you could write: “We performed an EpiGRAPH [1] analysis on our dataset”, and in the Methods section, you could elaborate: “EpiGRAPH is a software for genome and epigenome analysis that was originally developed for predicting epigenetic information from the DNA [2, 3]. Here, we use the EpiGRAPH web service in order to analyze…”.

History and Future

EpiGRAPH has been applied in a number of projects. Below, we give four examples that highlight different ways in which EpiGRAPH was used (and can be used).

The development of EpiGRAPH was initiated with the goal of predicting DNA methylation from properties of the genome, such as DNA sequence and structure, gene and repeat distribution, SNPs and transcription factor binding sites. As it turned out, the support vector machines that are now at the heart of EpiGRAPH could predict DNA methylation with high accuracy (Bock et al. 2006, PLoS Genetics). This finding was validated experimentally, and in the meantime also by three independent papers published by other research groups. A special focus of this project was to understand which genomic attributes are directly predictive of DNA methylation and which are only predictive via indirect correlations. Based on a strategy developed in this project, EpiGRAPH can be used to identify direct, possibly causal, associations inside a large set of genomic attributes.
In a follow-up project to our DNA methylation prediction, we used EpiGRAPH to derive genome-wide predictions for a number of epigenetic attributes and merged them into an improved definition of bona fide CpG islands (Bock et al. 2007, PLoS Computational Biology). The main contribution of this project to EpiGRAPH was the proof-of-principle that genome-wide predictive analysis across a number of different datasets, cell lines and tissues is technically feasible and can be biologically productive.
A group from the Norwegian Radium Hospital were the first external users of EpiGRAPH, applying it in order to better understand a map of DNA melting profiles that they had calculated for the human genome (Liu et al. 2007, PLoS Computational Biology). Their study showed that even previous versions of EpiGRAPH were already usable out-of-the-box by an independent group, with very little technical help from our side.
In an upcoming EU project on cancer epigenetics, EpiGRAPH will be used to identify promising cancer biomarkers from genome-wide DNA methylation profiles in a large patient sample. This project will prompt further improvements of EpiGRAPH as a pre-screening and hypothesis generation / prioritization tool.

Several other projects that involve EpiGRAPH analyses are currently in progress and will be added to this website in due time. In the future, we believe that EpiGRAPH may converge with other web services into a loosely coupled network of (epi-)genome analysis and data mining tools. Such a network would accept standardized XML-based analysis requests centrally and process them in a decentralized manner, with each web service contributing a specific analysis or access to a particular database. A descriptive term for this vision could be Statistical Genome Browser Network, and the standardized and extensible XML data format specified for EpiGRAPH may provide an appropriate internal language for data exchange within this network.

Developers and Contact
Like any complex software package, EpiGRAPH is the product of many people who share the vision of a comprehensive and easy-to-use (epi-)genome analysis and prediction toolkit. Most of the development is currently performed by three people:

Name	Main responsibilities	Contact
Christoph Bock	Project leader, software architecture, database and middleware programming, frontend development	http://www.mpi-inf.mpg.de/~cbock
Konstantin Halachev	Backend and database programming, machine learning components
Joachim Büch	IT infrastructure, deployment, web server administration	http://www.mpi-inf.mpg.de/~buech

Please feel free to contact any of us if you have questions, bug reports, ideas for future extensions or an interesting analysis that we should help you with. Typically, Christoph Bock will handle most requests.

(*) Free availability applies to universities and non-profit research institutes only. Companies should contact us and will receive permission to use EpiGRAPH freely during an extended test phase.