A multilingual BERT-based classification of reviews for enhanced visitors’ experience analysis

Ricciardi R.; Manisera M.
2025-01-01

Abstract

Cultural organizations can now draw on online platforms to study users’ opinions and the most discussed topics concerning both general and specific cultural offerings. Although tools for data acquisition are available, managing the resulting unstructured data remains a hurdle. To overcome this, we propose a classification model that transforms unorganized data into a structured thematic database. The case study concerns the Italian city of Brescia. We build a language model that classifies online reviews into four semantic areas defined by the key attractions of the city. We fine-tuned the pre-trained Multilingual BERT, XLM-RoBERTa, and AlBERTo models on a multiclass classification task, with promising results (average F1: 0.72, 0.73, and 0.7, respectively; average AUC: 0.91, 0.91, and 0.92, respectively). In addition, clusters of reviews were detected by applying the HDBSCAN algorithm to the vector representations produced by the model. The Keyness statistic, a transformation of the chi-square statistic, was used to extract cluster-specific keywords, which proved highly consistent with the characteristics and offerings of the key cultural attractions, further supporting the model’s good performance. The results suggest that policymakers and managers of cultural tourism institutions can use the proposed model to derive relevant insights from textual data about visitors’ experience at specific attractions of interest.
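
The abstract describes the Keyness statistic only as a transformation of the chi-square statistic and does not give the exact formula. As an illustration, the sketch below assumes one common variant: a signed chi-square computed on a 2x2 table of word counts (target cluster versus all other reviews), without continuity correction. The function name `keyness_chi2` and the toy token lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def keyness_chi2(target_tokens, reference_tokens):
    """Signed chi-square keyness for each word in the target cluster.

    Assumption: 2x2 contingency table per word, no Yates correction.
    Positive scores mark words over-represented in the target cluster.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target = sum(target.values())
    n_reference = sum(reference.values())

    scores = {}
    for word, a in target.items():
        b = reference.get(word, 0)   # count of word in the reference texts
        c = n_target - a             # other words in the target cluster
        d = n_reference - b          # other words in the reference texts
        n = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            # Degenerate table (e.g. empty reference): no evidence either way.
            scores[word] = 0.0
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        sign = 1.0 if a * d >= b * c else -1.0
        scores[word] = sign * chi2
    return scores


# Toy example: tokens from one cluster versus tokens from all other reviews.
cluster_tokens = ["museum", "museum", "art", "view"]
other_tokens = ["castle", "castle", "view", "tower"]
scores = keyness_chi2(cluster_tokens, other_tokens)
top_keywords = sorted(scores, key=scores.get, reverse=True)
```

Sorting words by score in descending order gives candidate keywords for the cluster. A word that occurs at the same relative frequency in both groups, such as "view" in the toy data, scores zero.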


Use this identifier to cite or link to this document: https://hdl.handle.net/11379/632607