On Clustering Daily Mobile Phone Density Profiles
Metulini R.; Carpita M.
2018-01-01
Abstract
In the context of smart cities, local institutions increasingly need to monitor the dynamics of people's presence in urban areas in order to plan the improvement and maintenance of urban infrastructure. Rectangular grid polygons reporting the density of people using mobile phones (Carpita, Simonetto, 2014) are a source of very large data. Telecom Italia Mobile (TIM), currently the largest operator in Italy in this sector, thanks to a research agreement with the Statistical Office of the Municipality of Brescia, provided us with about two years (April 2014 to June 2016, n about 700) of Daily Mobile Phone Density Profiles (DMPDPs) for the Province of Brescia, in the form of a regular grid of 923 x 607 cells recorded every 15 minutes. To find regularities and detect anomalies in the flow of people, this work aims to cluster similar DMPDPs, where each DMPDP is characterized by both a 2-D spatial component (i.e. 923 x 607 dimensions, one for each cell of the grid) and a temporal component (i.e. each cell has repeated values over time, for a total of 96 daily dimensions per cell). So, while each DMPDP accounts for p about 50 million (923 x 607 x 96) space-time dimensions, time and economic constraints prevent us from obtaining a longer time series of DMPDPs. In these terms, grouping DMPDPs is a High Dimensional Low Sample Size (HDLSS) problem, since p >> n. We propose a mixed-approach procedure, which we apply to the city of Brescia. First, borrowing the Histogram of Oriented Gradients (HOG) method from the image clustering literature (Tomasi, 2012), we reduce the dimensionality of the DMPDPs by extracting their features. In doing so, we tune the HOG parameters to reduce the dimensionality of the DMPDPs as much as possible while preserving as much as possible of the information contained in the extracted features.
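The dimension count and the idea behind the HOG reduction can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified, pure-Python HOG (per-cell histograms of unsigned gradient orientation weighted by gradient magnitude, without block normalization), applied to a single synthetic 2-D snapshot. The names `hog_features` and `hog_length`, the cell size, the number of bins, and the synthetic grid are all assumptions made for illustration; only the grid size (923 x 607) and the 96 daily time slots come from the abstract. How the temporal component enters the feature extraction is not specified in the abstract, so the sketch covers one snapshot only.

```python
import math

# Figures taken from the abstract: grid size and 15-minute slots per day.
GRID_ROWS, GRID_COLS, SLOTS_PER_DAY = 923, 607, 96

# Raw space-time dimensionality of one daily profile (the "p" of the abstract).
p = GRID_ROWS * GRID_COLS * SLOTS_PER_DAY


def hog_length(rows, cols, cell_size, n_bins):
    """Length of the simplified HOG vector for one 2-D snapshot."""
    return (rows // cell_size) * (cols // cell_size) * n_bins


def hog_features(grid, cell_size=4, n_bins=9):
    """Simplified HOG (illustrative assumption, no block normalization).

    For every non-overlapping cell, build a histogram of unsigned gradient
    orientations (0 to 180 degrees), each pixel voting with its gradient
    magnitude. Histograms are concatenated into one feature vector.
    """
    rows, cols = len(grid), len(grid[0])
    bin_width = 180.0 / n_bins
    features = []
    for r0 in range(0, rows - cell_size + 1, cell_size):
        for c0 in range(0, cols - cell_size + 1, cell_size):
            hist = [0.0] * n_bins
            for r in range(r0, r0 + cell_size):
                for c in range(c0, c0 + cell_size):
                    # Finite differences, clamped at the grid borders.
                    gx = grid[r][min(c + 1, cols - 1)] - grid[r][max(c - 1, 0)]
                    gy = grid[min(r + 1, rows - 1)][c] - grid[max(r - 1, 0)][c]
                    magnitude = math.hypot(gx, gy)
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    bin_index = min(int(angle / bin_width), n_bins - 1)
                    hist[bin_index] += magnitude
            features.extend(hist)
    return features


# Synthetic 12 x 16 snapshot: density increases from left to right,
# so every gradient is horizontal and falls into the first orientation bin.
grid = [[float(c) for c in range(16)] for _ in range(12)]
features = hog_features(grid, cell_size=4, n_bins=9)

print("raw space-time dimensions p:", p)
print("HOG length for the toy snapshot:", len(features))
print("HOG length for one 923 x 607 snapshot (cell_size=16, 9 bins):",
      hog_length(GRID_ROWS, GRID_COLS, 16, 9))
```

Larger cells and fewer orientation bins give shorter feature vectors at the cost of spatial and directional detail, which is the trade-off the parameter tuning mentioned above has to balance.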
With this approach we preserve both the spatial and the temporal components of the DMPDPs. Then, using the extracted HOG features, we group the DMPDPs by applying, and testing the feasibility of, different clustering approaches for large data (Kaufman, Rousseeuw, 2009).

File | Size | Format |
---|---|---|
Matulini & Carpita (2018) On Clustering Daily Mobile Phone Density Profiles Poster.pdf (open access; description: Poster, Workshop on High Dimensional Small Data; type: post-print document; license: public domain) | 1.11 MB | Adobe PDF |