OncodriveCLUST

Warning: OncodriveCLUST is outdated!

We have developed a new clustering algorithm, OncodriveCLUSTL, that outperforms OncodriveCLUST in the detection of protein-coding cancer driver genes and can also be applied to non-coding regions of the genome. The complete benchmark and further details are given in the OncodriveCLUSTL publication: Arnedo-Pac C, et al. OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers. Bioinformatics. 2019;35(22):4788–4790. doi:10.1093/bioinformatics/btz501

OncodriveCLUSTL is available as an installable Python 3.5 package through pip and conda. The source code and running examples are freely available at bitbucket.org/bbglab/oncodriveclustl under the GNU Affero General Public License.

There is a web version at bbglab.irbbarcelona.org/oncodriveclustl

Description

OncodriveCLUST is a method that aims to identify genes whose mutations show a strong bias towards spatial clustering. It exploits the observation that mutations in cancer genes, especially oncogenes, often cluster at particular positions of the protein. We take this as a sign that mutations in these regions change the function of the protein in a way that gives cancer cells an adaptive advantage, and that they are consequently positively selected during the clonal evolution of tumours. This property can thus be used to nominate novel candidate driver genes.

The method does not assume that the baseline mutation probability is homogeneous across all gene positions; instead, it builds a background model from silent mutations. Coding silent mutations are assumed to be under no positive selection and so may reflect the baseline clustering of somatic mutations. Given recent evidence of non-random mutation processes along the genome, assuming homogeneous mutation probabilities is likely an oversimplification that introduces bias in the detection of meaningful events.
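As a rough illustration of how a background model built from silent mutations could be used, the Python sketch below compares an observed gene clustering score with scores computed from coding silent mutations. The function name background_p_value, the use of a z-score and the normal approximation are assumptions made for this example; they are not taken from the OncodriveCLUST source code.

```python
from statistics import NormalDist, mean, stdev


def background_p_value(observed_score, silent_scores):
    """Compare one gene clustering score with a silent-mutation background.

    silent_scores: clustering scores obtained from coding silent mutations
    (hypothetical input; at least two distinct values are needed).
    Returns (z, p), where p is a one-sided upper-tail p-value under a
    normal approximation. The choice of statistic is an assumption.
    """
    mu = mean(silent_scores)
    sigma = stdev(silent_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    z = (observed_score - mu) / sigma
    p = 1.0 - NormalDist().cdf(z)
    return z, p


# Made-up numbers, for illustration only.
silent = [0.0, 0.01, 0.02, 0.0, 0.03, 0.01]
z, p = background_p_value(0.38, silent)
print(f"z = {z:.2f}, one-sided p = {p:.3g}")
```

A score far above the silent-mutation scores gives a large z and a small p-value, while a score equal to the background mean gives z = 0 and p = 0.5.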

How it works

A detailed description is given in the main manuscript. Briefly, the following steps are performed. First, the protein-affecting mutations of each gene across a cohort of tumors are evaluated to find the protein residues that carry more mutations than would be expected by chance. Second, these positions are grouped to form mutation clusters. Third, each cluster is assigned a score that is proportional to the percentage of the gene's mutations enclosed within that cluster and inversely related to the cluster's length. The gene clustering score is the sum of the scores of all clusters (if any) found in that gene. Finally, each gene clustering score is compared with the background model to obtain a significance value. The background model is obtained by performing the same steps as above, but assessing only coding silent mutations.

How it performs

We analysed the entries of the COSMIC database annotated as whole gene screen, as well as data from four projects of The Cancer Genome Atlas. The resulting list of candidate drivers is strongly enriched in known cancer driver genes, particularly oncogenes, supporting the idea that this approach can nominate novel driver candidates. In addition, comparison with methods based on other criteria (namely, functional impact and mutation recurrence across the tumor cohort) showed that the clustering approach identifies known cancer drivers not detected by either of the other two methods, which indicates that combining methods is beneficial for identifying cancer drivers. We conclude that OncodriveCLUST can be useful for identifying cancer drivers by assessing mutation clustering, and that it is complementary to other methods aimed at identifying genes involved in the disease.

How to install and run it

You may download OncodriveCLUST as a standalone program or use it in combination with other tools within the IntOGen-pipeline.

You will find detailed information on how to install OncodriveCLUST and run some examples at Bitbucket.

OncodriveCLUST 0.4.1 is the version used for the submitted paper and can be downloaded from here, and the supplementary datasets that were analysed from here.

How to cite

If you use OncodriveCLUST, please cite it as: Tamborero D, Gonzalez-Perez A and Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics. 2013; doi: 10.1093/bioinformatics/btt395

For any comments or feedback, please contact:

David Tamborero, PhD
Bioinformatician, Postdoctoral Researcher
Research Unit on Biomedical Informatics - GRIB
Parc de Recerca Biomèdica de Barcelona (PRBB)
david.tamborero@upf.edu