New IntOGen Somatic Mutations Analysis version available

We are proud to announce the brand new version of the IntOGen Somatic Mutations Analysis (IntOGen SM) pipeline. We call it version 2.0.0 as it has been completely rewritten from scratch with a strong focus on quality, efficiency and scalability.

The IntOGen SM pipeline addresses the challenge of identifying which somatic mutations are important for the development of tumors. The input for the analysis is a list of somatic mutations detected in a cohort of tumors.

The analysis follows several steps. First, the list of mutations has to be read and parsed, since several files and formats are supported. Next, the Variant Effect Predictor (VEP) identifies the effect that each mutation may have on transcripts and regulatory regions, and the functional impact of the mutations is assessed from the scores computed by SIFT, PolyPhen2 and Mutation Assessor (MA), transformed with TransFIC. The following steps calculate the recurrence of mutations, genes and pathways across tumors, identify cancer driver genes and pathways using OncodriveFM, and use OncodriveCLUST to identify clusters of mutations in genes that may confer an adaptive advantage to the cancer cells. Finally, several datasets are generated with all the results.
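
As an illustration of the recurrence step, the following Python sketch counts, for each gene, how many distinct tumors in the cohort carry at least one mutation in it. It is not taken from the IntOGen SM code base: the `Mutation` record and the `gene_recurrence` function are hypothetical names, and the sketch assumes the mutations have already been parsed and assigned to a gene (for example, after the VEP step).

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Mutation(NamedTuple):
    """Hypothetical parsed mutation record (not the IntOGen SM data model)."""
    sample: str  # tumor/sample identifier
    gene: str    # gene symbol the mutation was mapped to


def gene_recurrence(mutations: Iterable[Mutation]) -> dict[str, int]:
    """Count the distinct samples with at least one mutation in each gene.

    Several mutations in the same gene and sample count only once, so the
    result measures how recurrent a gene is across the cohort.
    """
    samples_per_gene: dict[str, set[str]] = defaultdict(set)
    for m in mutations:
        samples_per_gene[m.gene].add(m.sample)
    return {gene: len(samples) for gene, samples in samples_per_gene.items()}


if __name__ == "__main__":
    muts = [
        Mutation("T1", "TP53"),
        Mutation("T1", "TP53"),  # second hit in the same tumor: not double-counted
        Mutation("T2", "TP53"),
        Mutation("T2", "KRAS"),
    ]
    print(gene_recurrence(muts))  # {'TP53': 2, 'KRAS': 1}
```

Counting distinct samples rather than raw mutations keeps a single hypermutated tumor from inflating a gene's recurrence; the same idea extends to pathways by mapping each gene to its pathways before counting.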

The source code is freely available under the Affero GPL 3.0 license. We encourage people to test it and to report suggestions and failures as issues on the Bitbucket project site, where you can also download it.

It is compatible with Mac OS X and Linux, and it can be run in three different ways:

  • By using the online demo: we provide an online demo for fast evaluation without having to install anything (just click on the Analysis tab). The documentation is also available there. If you find the pipeline useful, we encourage you to install it on your own servers.
  • By running a local web interface: the best solution for new users and for institutions wanting to provide a common service for all their researchers.
  • By running the command in a Unix terminal: for advanced users and people wanting to embed the pipeline within their own workflows.

We hope you find it a useful tool for your daily analyses, but you can also become an indirect user without having to execute anything, simply by browsing the data available in the IntOGen Web. The data there come from 26 cancer somatic mutation projects obtained from different sources, such as the ICGC, TCGA and the literature, and have been analysed with the IntOGen SM pipeline.

Thanks to the rapid advances in sequencing technologies, cancer research projects are now able to sequence the genomes (or exomes) of thousands of tumors and rapidly obtain large amounts of data, including the catalogs of somatic mutations in those tumors. Our aim is to be able to analyse hundreds of thousands of tumors in the near future at the lowest possible cost in infrastructure and time.

The implementation of these steps is considerably more complex than the simplified schema described above and requires some kind of orchestra conductor. We use Wok for that purpose. Wok is a workflow management system under development in our group. It is based on message passing and separates the logic of the analyses from the complexity of their execution. It can run on platforms as simple as a laptop or as complex as a cluster of thousands of computers. Its main features are a powerful configuration system, automatic and transparent partitioning of data and parallelization of execution, and a web interface for management and monitoring.
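
The idea behind partitioning data and parallelizing execution can be sketched in a few lines of Python. This is not Wok's API, which is not described here: `partition` and `run_partitioned` are hypothetical helpers, and a local thread pool stands in for the workers that a workflow manager would distribute across the machines of a cluster.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Split the input into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_partitioned(task: Callable[[Sequence[T]], R], items: Sequence[T],
                    size: int, workers: int = 4) -> list[R]:
    """Apply `task` to every partition in parallel.

    Results are returned in partition order, so a later step can merge them
    without knowing how the work was split.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, partition(items, size)))


if __name__ == "__main__":
    # Hypothetical tab-separated mutation lines: chromosome, position, ref, alt.
    lines = [f"chr1\t{pos}\tA\tG" for pos in range(10)]
    counts = run_partitioned(len, lines, size=4)
    print(counts)       # [4, 4, 2]
    print(sum(counts))  # 10 (merge step)
```

The analysis logic (`task`) never deals with how the data is split or where it runs, which is the separation between analysis and execution described above.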

We are very happy with this release and expect to publish new versions in the near future, so stay in touch with us through any of the available channels: the blog, our Twitter accounts, the web …

See you soon.