Oncodrive-CIS

Description

Oncodrive-CIS is a method that aims to identify the copy number alterations (CNAs) leading to larger in cis expression changes, which may help elucidate the role of these aberrations in cancer. It is based on the hypothesis that a gene driving oncogenesis through copy number changes is more prone to a bias towards overexpression (or underexpression) than bystander genes. The effect of gene dosage is assessed by observing expression changes not only among tumors but also relative to normal samples, when available.

Oncodrive-CIS has several potential benefits. First, it does not examine the frequency of CNAs across samples, so the detection of low-recurrence driver alterations is not impaired. Second, amplifications and deletions are evaluated separately to obtain a fair ranking of genes, because the expression change measured in deletions is lower than that obtained from multi-copy amplifications. Third, the expression of genes in tumor samples is analyzed according to copy number status and is also compared to normal samples, thus better revealing the gene misregulation role of CNAs in cancer cells. Finally, it should be emphasized that the relationship between expression changes and their functional impact is complex; Oncodrive-CIS is therefore proposed as a method to elucidate the role of CNAs in cancer that may be complementary to analyses based on other criteria.

How it works

Details of Oncodrive-CIS are given in the Methods section of the main manuscript. Briefly, it calculates two standard scores per gene, ZNORMAL and ZTUMOR, which measure the expression change bias in samples with CNAs relative to normal samples and to diploid tumor samples, respectively. Genes are then ranked according to ZCOMB, which combines ZNORMAL and ZTUMOR using Stouffer's method. Therefore, the higher a gene ranks, the larger its bias towards misregulation caused by the copy number abnormality.
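
As an illustration of the combination step, the following is a minimal Python sketch of Stouffer's method applied to two standard scores. It assumes the unweighted form (sum of the scores divided by the square root of their number); whether Oncodrive-CIS applies weights is not stated here, so refer to the main manuscript for the exact definition. The function name stouffer_combine and the example values are hypothetical and are not taken from the distributed code.

```python
import math


def stouffer_combine(z_scores):
    """Unweighted Stouffer combination: sum of z-scores divided by sqrt(k).

    Assumption: equal weights for all scores (illustrative only).
    """
    z_scores = list(z_scores)
    if not z_scores:
        raise ValueError("at least one z-score is required")
    return sum(z_scores) / math.sqrt(len(z_scores))


# Hypothetical scores for a single gene (not real results).
z_normal = 2.0  # bias relative to normal samples
z_tumor = 3.0   # bias relative to diploid tumor samples

z_comb = stouffer_combine([z_normal, z_tumor])
print(round(z_comb, 3))  # 5 / sqrt(2) -> 3.536
```

Under this form, a gene scores highly only when both comparisons point in the same direction; a large ZNORMAL paired with a ZTUMOR near zero yields a more modest combined score.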

How it performs

We have benchmarked Oncodrive-CIS using the data simulator published by Louhimo et al. (Nature Methods 2012, 9: 351), which randomly generates CNAs of different patterns and expression values under different models of dependence on gene dosage. In this benchmark, Oncodrive-CIS performed better at detecting connected copy number and expression abnormalities than other methods designed for integrating gene expression and dosage data.

In addition, we have used Oncodrive-CIS to analyze data from glioblastoma multiforme (GBM) and ovarian serous carcinoma (OSC) retrieved from the Cancer Genome Atlas Data Portal. The top-ranking list retrieved by Oncodrive-CIS contained several well-known cancer genes, as well as other likely driver candidates that have already been related to other tumor types. Several of these alterations have low recurrence. A comparison with the results of GISTIC, which assesses the frequency of the alteration across tumor samples, showed several genes identified by both methods, whereas other genes were supported by only one of them. This suggests that the two criteria may be usefully combined.

How to install and run

We distribute a Python implementation of Oncodrive-CIS in a compressed file below. Oncodrive-CIS requires three input files containing:

  1. expression values per sample and per gene
  2. copy number status per sample and per gene
  3. a samples file stating whether each sample identifier corresponds to a normal or a tumor sample

Oncodrive-CIS is executed by the oncodrivecis.py script. It requires several arguments (some of them optional), which are displayed by typing -h (or --help):

$ python src/oncodrivecis.py -h 
Usage: oncodrivecis.py [options] 
Options: 
  -h, --help            show this help message and exit 
  -e PATH, --expression=PATH 
                        Specifies the path to the exp file 
  -c PATH, --cnv=PATH   Specifies the path to the CNA file 
  -s PATH, --samples=PATH 
                        Specifies the path to the samples file 
  -o PATH, --output=PATH 
                        Specifies the output folder (by default the same than 
                        the samples file one) 
  -i PATH, --identifier=PATH 
                        Specifies the gene id conversion file 
                        (optional) 
  -n INT, --nsampling=INT 
                        Sampling number per gene (optional, 10000 by default) 
  -a INT, --alterations=INT 
                        Minimum number of alterations per gene (2 by default) 

Among the downloadable files we have included the glioblastoma multiforme data set (see the main manuscript for further details about these data), already formatted to be processed by Oncodrive-CIS. To use it, type the following:

$ python src/oncodrivecis.py \
 -e gbm_data/expression.per.gene.ens.gbm.tsv \
 -c gbm_data/cnv.rae.ens.gbm.tsv  \
 -s gbm_data/samples_to_process.tsv \
 -o output -i gbm_data/ensembl63_ensembl2hugo.tsv

The execution time for this example can be decreased by lowering the number of permutations performed to retrieve the Z score values, using the -n (--nsampling) argument, or by reducing the number of processed samples by modifying the 'samples_to_process.tsv' file.
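
To show why the sampling number affects run time, here is a minimal Python sketch of a generic sampling-based standard score. It is an assumption made for illustration, not the code shipped in oncodrivecis.py: the function sampling_z_score, its parameters and the toy data are all hypothetical, and the actual procedure is the one described in the main manuscript and the User Manual. The point is only that the work per gene grows linearly with the number of random draws.

```python
import random
import statistics


def sampling_z_score(observed_mean, background, group_size,
                     n_sampling=10000, seed=0):
    """Standard score of observed_mean against a sampled null distribution.

    Hypothetical helper: draws n_sampling random groups of group_size values
    from background and compares observed_mean to the means of those groups.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    null_means = [
        statistics.mean(rng.sample(background, group_size))
        for _ in range(n_sampling)  # cost is proportional to n_sampling
    ]
    mu = statistics.mean(null_means)
    sigma = statistics.pstdev(null_means)
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (observed_mean - mu) / sigma


# Toy data (illustrative only): 100 background expression values.
background = [float(i) for i in range(100)]

# Fewer draws run faster but give a noisier estimate of the score.
z_fast = sampling_z_score(150.0, background, group_size=5, n_sampling=200)
z_slow = sampling_z_score(150.0, background, group_size=5, n_sampling=5000)
print(z_fast, z_slow)
```

In this sketch, lowering n_sampling trades precision of the estimated score for speed, which is the same trade-off to keep in mind when choosing a value for -n.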

Note that further details about Oncodrive-CIS execution, input files and produced output are contained in a User Manual which is available among the downloadable files.

Download

oncodrivecis-1.1.0.tar.gz

How to cite

If you use Oncodrive-CIS, please cite it as Tamborero D, Lopez-Bigas N and Gonzalez-Perez A. Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression. PLoS ONE 8(2): e55489. doi:10.1371/journal.pone.0055489

For comments or feedback, please contact

David Tamborero, PhD
Bioinformatician, Postdoctoral Researcher
Research Unit on Biomedical Informatics - GRIB
Parc de Recerca Biomèdica de Barcelona (PRBB)
david.tamborero@upf.edu