GARFIELD-NGS: Genomic vARiants FIltering by dEep Learning moDels in NGS

Ravasio V.;Ritelli M.;Giacopuzzi E.
2018-01-01

Abstract

Exome sequencing is extensively used in research and diagnostic laboratories to discover pathological variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false positive calls, and this poses a serious challenge for variant interpretation. Here, we propose a new tool named Genomic vARiants FIltering by dEep Learning moDels in NGS (GARFIELD-NGS), which relies on deep learning models to distinguish false from true variants in exome sequencing experiments performed with Illumina or ION platforms. GARFIELD-NGS showed strong performance for both SNP and INDEL variants (AUC 0.71-0.98) and outperformed established hard filters. The method is also robust at low coverage, down to 30X, and can be applied to data generated with the recent Illumina two-colour chemistry. GARFIELD-NGS processes a standard VCF file and produces a regular VCF output. Thus, it can be easily integrated into existing analysis pipelines, allowing application of different thresholds based on the desired level of sensitivity and specificity. Availability and implementation: GARFIELD-NGS is available at https://github.com/gedoardo83/GARFIELD-NGS.
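The threshold-based filtering mentioned in the abstract can be illustrated with a minimal sketch. It assumes the output VCF carries a per-variant confidence score in the INFO column under a key called `SCORE`; that key name, the helper functions, and the example records are hypothetical and are not taken from the GARFIELD-NGS documentation, so the actual tag should be checked in the header of a real output file.

```python
from typing import Iterable, Iterator, Optional

SCORE_TAG = "SCORE"  # hypothetical INFO key; check the header of the real output VCF


def info_score(info: str, tag: str = SCORE_TAG) -> Optional[float]:
    """Return the numeric value of `tag` in a VCF INFO string, or None if absent."""
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == tag and sep:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def filter_vcf_lines(
    lines: Iterable[str], threshold: float, tag: str = SCORE_TAG
) -> Iterator[str]:
    """Yield header lines unchanged and keep records whose score is >= threshold."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record: fewer than the 8 fixed VCF columns
        score = info_score(fields[7], tag)
        if score is not None and score >= threshold:
            yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tA\tG\t50\t.\tDP=40;SCORE=0.95\n",
    "1\t2000\t.\tC\tT\t12\t.\tDP=31;SCORE=0.20\n",
]
kept = list(filter_vcf_lines(example, threshold=0.5))
```

Raising the threshold trades sensitivity for specificity, and lowering it does the opposite, so the cut-off would normally be chosen against a validated truth set for the platform in use.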
2018
Affiliated university
LS2_1 Genomics, comparative genomics, functional genomics
LS2_10 Bioinformatics
LS7_2 Diagnostic tools (e.g. genetic, imaging)
Anonymous experts
English
International
PRINT
34
17
3038
3040
3
High-Throughput Nucleotide Sequencing; INDEL Mutation; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Deep Learning; Genomics
no
4
info:eu-repo/semantics/article
262
Ravasio, V.; Ritelli, M.; Legati, A.; Giacopuzzi, E.
1 Journal Contribution::1.1 Journal article
open
Files in this record:
File: Garfield.pdf (open access)
Description: full text
Type: Full Text
License: DRM not defined
Size: 158.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/540132
Warning: the data displayed have not been validated by the university.

Citations
  • PubMed Central 15
  • Scopus 27
  • Web of Science 26