Molecular classification of breast cancer into clinically relevant subtypes helps improve prognosis and adjuvant-treatment decisions. The aim of this study is to provide a better characterization of the molecular subtypes by providing a comprehensive landscape of subtype-specific isoforms including coding, long non-coding RNA and microRNA transcripts. Isoform-level expression of all coding and non-coding RNAs is estimated from RNA-sequence data of 1168 breast samples obtained from The Cancer Genome Atlas (TCGA) project. We then search the whole transcriptome systematically for subtype-specific isoforms using a novel algorithm based on a robust quasi-Poisson model. We discover 5451 isoforms specific to single subtypes. A total of 27% of the subtype-specific isoforms have better accuracy in classifying the intrinsic subtypes than that of their corresponding genes. We find three subtype-specific miRNA and 707 subtype-specific long non-coding RNAs. The isoforms from long non-coding RNAs also show high performance for separation between Luminal A and Luminal B subtypes with an AUC of 0.97 in the discovery set and 0.90 in the validation set. In addition, we discover 1500 isoforms preferentially co-expressed in two subtypes, including 369 isoforms co-expressed in both Normal-like and Basal subtypes, which are commonly considered to have distinct ER-receptor status. Finally, analyses at protein level reveal four subtype-specific proteins and two subtype co-expression proteins that successfully validate results from the isoform level.
Comprehensive landscape of subtype-specific coding and non-coding RNA transcripts in breast cancer
CALZA, Stefano;
2016-01-01
Abstract
Molecular classification of breast cancer into clinically relevant subtypes helps improve prognosis and adjuvant-treatment decisions. The aim of this study is to provide a better characterization of the molecular subtypes by providing a comprehensive landscape of subtype-specific isoforms including coding, long non-coding RNA and microRNA transcripts. Isoform-level expression of all coding and non-coding RNAs is estimated from RNA-sequence data of 1168 breast samples obtained from The Cancer Genome Atlas (TCGA) project. We then search the whole transcriptome systematically for subtype-specific isoforms using a novel algorithm based on a robust quasi-Poisson model. We discover 5451 isoforms specific to single subtypes. A total of 27% of the subtype-specific isoforms have better accuracy in classifying the intrinsic subtypes than that of their corresponding genes. We find three subtype-specific miRNA and 707 subtype-specific long non-coding RNAs. The isoforms from long non-coding RNAs also show high performance for separation between Luminal A and Luminal B subtypes with an AUC of 0.97 in the discovery set and 0.90 in the validation set. In addition, we discover 1500 isoforms preferentially co-expressed in two subtypes, including 369 isoforms co-expressed in both Normal-like and Basal subtypes, which are commonly considered to have distinct ER-receptor status. Finally, analyses at protein level reveal four subtype-specific proteins and two subtype co-expression proteins that successfully validate results from the isoform level.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.