Oncodrive-fm is an approach to uncover driver genes or gene modules. It computes a metric of functional impact using three well-known methods (SIFT, PolyPhen2 and MutationAssessor) and assesses how the functional impact of variants found in a gene across several tumor samples deviates from a null distribution. It rests on the assumption that any bias towards the accumulation of variants with high functional impact is an indication of positive selection and can therefore be used to detect candidate driver genes or gene modules.
How it works
Oncodrive-fm starts by computing three metrics of the functional impact of each non-synonymous SNV (nsSNV) found in genes across a list of tumor samples. Any measure of the impact of nsSNVs on protein function (or FI score) could in principle be used here. We have chosen three well-known methods whose scores can be obtained in a high-throughput manner, so that hundreds of nsSNVs can be evaluated in a few minutes. Stop-gain SNVs (stSNVs) and frameshift-causing indels (fsindels) are incorporated into the bias analysis by assigning them scores comparable to those of the highest-ranking tier of nsSNVs. Finally, synonymous SNVs (sSNVs) are taken into account with scores equal to those of the bottom-ranking nsSNVs.
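The scoring rule described above can be illustrated with a minimal Python sketch. This is not the distributed Perl code: the function names, the consequence labels and the tier_fraction parameter are hypothetical, and defining a "tier" as the mean of the top or bottom fraction of nsSNV scores is an assumption made only for illustration.

```python
from statistics import mean


def tier_scores(ns_scores, tier_fraction=0.25):
    """Return (bottom_tier_mean, top_tier_mean) of the nsSNV FI scores.

    Assumption: a tier is the lowest or highest `tier_fraction` of the scores.
    """
    if not ns_scores:
        raise ValueError("need at least one nsSNV score")
    ordered = sorted(ns_scores)
    k = max(1, int(len(ordered) * tier_fraction))
    return mean(ordered[:k]), mean(ordered[-k:])


def assign_fi_scores(variants, tier_fraction=0.25):
    """Give every variant an FI score.

    `variants` is a list of (consequence, score) pairs. nsSNVs keep their own
    score; stSNVs and fsindels receive the top-tier value; sSNVs receive the
    bottom-tier value. The score of a non-nsSNV entry is ignored.
    """
    ns_scores = [score for cons, score in variants if cons == "nsSNV"]
    low, high = tier_scores(ns_scores, tier_fraction)
    surrogate = {"stSNV": high, "fsindel": high, "sSNV": low}
    return [score if cons == "nsSNV" else surrogate[cons]
            for cons, score in variants]


example = [("nsSNV", 0.1), ("nsSNV", 0.2), ("nsSNV", 0.8), ("nsSNV", 0.9),
           ("stSNV", None), ("sSNV", None), ("fsindel", None)]
scored = assign_fi_scores(example)
```

In this sketch the truncating variants end up with the same score as the most damaging nsSNVs and the synonymous variants with the same score as the least damaging ones, so all variant types can enter the same bias analysis.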
The second step starts by averaging the FI scores of variants per gene and comparing them to the distribution of scores of variants in functionally similar genes. If somatic SNVs were obtained using a whole-genome or whole-exome sequencing approach, the null distribution contains all SNVs and fsindels detected across tumor samples. We call this the internal null distribution. On the other hand, if only a limited number of genes have been sequenced, the null distribution of each gene is composed of nsSNVs that occur naturally in human populations. We call this the external null distribution. The mean FI of each gene across all tumor samples is then tested for significance using a permutation strategy.
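A permutation test of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation: the function name fi_bias_pvalue, the number of permutations, sampling with replacement from the null distribution and the +1 pseudocount in the p-value are all choices made here for the example.

```python
import random


def fi_bias_pvalue(gene_scores, null_scores, n_permutations=2000, seed=0):
    """Empirical p-value for the mean FI score of one gene.

    Draws random sets of scores of the same size as `gene_scores` from the
    null distribution and counts how often their mean is at least as high as
    the observed mean. Sampling with replacement and the +1 pseudocount are
    assumptions of this sketch.
    """
    if not gene_scores or not null_scores:
        raise ValueError("both score lists must be non-empty")
    rng = random.Random(seed)
    n = len(gene_scores)
    observed = sum(gene_scores) / n
    hits = 0
    for _ in range(n_permutations):
        sample = rng.choices(null_scores, k=n)
        if sum(sample) / n >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical null distribution: scores spread evenly between 0 and 0.99.
null = [i / 100 for i in range(100)]
p_high = fi_bias_pvalue([0.90, 0.95, 0.92], null)  # gene biased towards high FI
p_low = fi_bias_pvalue([0.10, 0.05], null)         # gene with low-FI variants
```

With the internal null distribution, null_scores would hold the scores of all variants detected across the tumor samples; with the external one, it would hold the scores of nsSNVs observed in human populations for the gene in question.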
How it performs
We have applied the Oncodrive-fm approach to three datasets of genes with SNVs and fsindels in samples of different tumor types: glioblastoma multiforme (gbm) and serous ovarian carcinoma (soc), produced within The Cancer Genome Atlas (TCGA) project, and chronic lymphocytic leukemia (cll), produced within the International Cancer Genome Consortium (ICGC) initiative. We were able to detect most genes also pinpointed by MutSig (a method that searches for recurrently mutated genes) as significantly biased in gbm and soc. Moreover, we were able to detect recurrently mutated genes with low functional impact, which may not constitute true drivers, and we uncovered other top-ranking functionally affected genes, some of which could be lowly recurrent drivers.
How to install and run
How to cite
If you use OncodriveFM, please cite it as Gonzalez-Perez A and Lopez-Bigas N. 2012. Functional impact bias reveals cancer drivers. Nucleic Acids Res., 10.1093/nar/gks743.
For any comments or feedback, please contact:
Abel González Pérez, PhD
Bioinformatician, Postdoctoral Researcher
Research Unit on Biomedical Informatics - GRIB
Parc de Recerca Biomèdica de Barcelona (PRBB)
We distribute the original Perl implementation of OncodriveFM as a tarball below. You will need the Perl interpreter installed on your computer, as well as the Statistics::Descriptive CPAN package in your PERL5LIB directory. You will also need an R installation, because the functional_impact_analysis.pl and pathways_functional_impact_analysis.pl scripts use it. If your R executable cannot be invoked directly, please make a shortcut or edit these two scripts accordingly. You can run the examples provided (gbm and cll) by doing:
>./pipeline_launcher.pl ../config/glioblastoma.config
from the bin directory of the installation.
You may open and check the config files for an explanation of all configuration arguments.