Beyond literacy: Predicting interpretation correctness of visualizations with user traits, item difficulty, and Rasch scores

Silvia Golia; Angela Locoro
2027-01-01

Abstract

Data Visualization Literacy assessments are often administered via fixed sets of Data Visualization (DV) items, despite heterogeneity in how different people interpret the same DV. In this study, we predict Human Interpretation Correctness, i.e., whether a specific person will answer a DV item correctly, prior to their exposure to the target DV. We operationalize this as Predicting Human Interpretation Correctness (P-HIC), a binary classification task using 22 pre-exposure features spanning Human Profile, Human Performance, and Item Difficulty (i.e., experts’ ratings and Rasch model estimates). In an online survey, 1083 participants answered 32 DV items (eight DVs × four items), yielding 34,656 responses. Across 32 item-specific datasets, 10 × 10-fold cross-validation shows that a Bagging ensemble of J48 decision trees, combined with feature selection, performs best, achieving a median AUC of 0.73 and a median kappa of 0.33. Feature analyses indicate that item difficulty estimated by the Rasch model dominates prediction, followed by experts’ ratings and prior correctness (which increases in importance across sessions), while profile features contribute little. These results suggest that pre-exposure misinterpretation risk can be estimated above chance, and they warrant future evaluation of assistive item-ranking strategies in simulated and real adaptive assessment settings.
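
The quantities named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline (the abstract reports a Bagging ensemble of J48 trees, which is not reproduced here). It only shows the standard Rasch (one-parameter logistic) success probability, Cohen's kappa, and a rank-based AUC. The function names, the toy labels, and the toy scores are all hypothetical and chosen for illustration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Standard Rasch (1PL) probability that a person answers an item correctly.

    `ability` and `difficulty` are on the same logit scale (hypothetical inputs).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def cohen_kappa(y_true: list, y_pred: list) -> float:
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if expected == 1.0:
        # Degenerate case (only one class present): kappa is undefined; return 0.0 here.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def auc(y_true: list, scores: list) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Response count reported in the abstract: participants x items.
total_responses = 1083 * 32

# Toy data (hypothetical), not taken from the study.
toy_kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under the Rasch model, a person whose ability equals the item's difficulty has a 0.5 probability of answering correctly, and the probability falls as the difficulty rises.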

Files in this record:
File: IPM2026.pdf
Embargo until 22/06/2029
Type: Full Text
License: PUBLIC - Public with Copyright
Size: 1.35 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/647925