An Empirical Approach for Clustering-Based Time Series Summarisation Assessment
Bianchini D.;Garda M.
2024-01-01
Abstract
In recent decades, the rise of Big Data solutions has significantly advanced the analysis of time series data as a representation of dynamic phenomena through sequences of observations. Recent research efforts have advocated the adoption of data summarisation techniques, such as incremental clustering, to promptly capture data evolution, thus helping domain experts make informed and proactive decisions based on a compact representation of time series. Nevertheless, although incremental clustering effectively reduces data volume while preserving relevant statistical information, it is crucial to estimate the degree of approximation between the original time series data and its summarised version. This evaluation is pivotal whenever the summarisation output is the starting point for setting up complex analytical pipelines (e.g., for pattern recognition and anomaly detection). Building on practical and empirical considerations drawn from both a synthetic and a real-world dataset, we propose in this paper a variant of a renowned quality metric for incremental clustering, to assess the extent to which the time series summary accurately captures the dynamics of the original data.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.