Distributed learning to protect privacy in multi-centric clinical studies

Damiani, A.; Vallati, M.; Gatta, R.; Dinapoli, N.; Jochems, A.; Deist, T.; van Soest, J.; Dekker, A.; Valentini, V.

doi:10.1007/978-3-319-19551-3_8

Research in medicine has to deal with a growing amount of patient data made available by modern technologies. These data could support statistical studies and the identification of causal relations. However, because they are spread across hospitals, using them requires efficient merging techniques as well as policies for handling sensitive information. In this paper we introduce and empirically test a distributed learning approach for training Support Vector Machines (SVMs) that overcomes the problems of privacy and of data being scattered across sites. The technique trains algorithms without sharing any patient-related information, thereby ensuring privacy and avoiding the need to develop merging tools. We tested this approach on a large dataset and describe the results in terms of convergence and performance; we also discuss the features of an IT architecture designed to support distributed learning computations.
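To illustrate the general idea of training an SVM without moving patient records, here is a minimal sketch that assumes a linear SVM trained by plain distributed subgradient descent on the primal objective. The abstract does not specify the optimisation scheme, so this is not necessarily the authors' method. Each hospital computes its share of the hinge-loss subgradient locally, and a coordinator only receives those aggregated numbers. All function names (`local_subgradient`, `train_distributed_svm`, `make_toy_site`) and the hyperparameters `lam`, `lr`, and `epochs` are hypothetical, and the data are synthetic.

```python
import math
import random


def local_subgradient(w, b, X, y, n_total):
    """Runs inside one hospital: returns this site's share of the hinge-loss
    subgradient. Only these summed numbers leave the site, never X or y."""
    gw = [0.0] * len(w)
    gb = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        if margin < 1.0:  # margin violated: the hinge term is active
            for j, xj in enumerate(xi):
                gw[j] -= yi * xj / n_total
            gb -= yi / n_total
    return gw, gb


def train_distributed_svm(sites, lam=0.01, lr=0.5, epochs=200):
    """Coordinator loop (hypothetical): minimises lam/2 * ||w||^2 plus the mean
    hinge loss over the union of all sites' data, while each site reports only
    its partial subgradient."""
    dim = len(sites[0][0][0])
    n_total = sum(len(y) for _, y in sites)  # assumes sites share record counts
    w, b = [0.0] * dim, 0.0
    for t in range(epochs):
        gw = [lam * wj for wj in w]  # regulariser gradient: needs no patient data
        gb = 0.0
        for X, y in sites:  # in a real deployment: one network round-trip per site
            sgw, sgb = local_subgradient(w, b, X, y, n_total)
            gw = [g + s for g, s in zip(gw, sgw)]
            gb += sgb
        step = lr / math.sqrt(t + 1)  # diminishing step size
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def make_toy_site(rng, n_per_class):
    """Hypothetical synthetic 'hospital': two separable 2-D clusters, not real data."""
    X, y = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_class):
            X.append([centre + rng.uniform(-1, 1), centre + rng.uniform(-1, 1)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_toy_site(rng, 20) for _ in range(3)]  # three simulated hospitals
    w, b = train_distributed_svm(sites)
    records = [(x, label) for X, y in sites for x, label in zip(X, y)]
    acc = sum(predict(w, b, x) == label for x, label in records) / len(records)
    print(f"w={w}, b={b:.3f}, training accuracy={acc:.2f}")
```

Because the global subgradient is exactly the sum of the per-site contributions, this scheme produces the same iterates (up to floating-point rounding) as training on the pooled data. Only the communication pattern changes. Note that shared gradients are aggregates rather than records, but whether they are sufficiently privacy-preserving is a separate question that this sketch does not address.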