Beyond literacy: Predicting interpretation correctness of visualizations with user traits, item difficulty, and Rasch scores
Silvia Golia; Angela Locoro
2027-01-01
Abstract
Data Visualization Literacy assessments are often administered via fixed sets of Data Visualization (DV) items, despite heterogeneity in how different people interpret the same DV. In this study, we predict Human Interpretation Correctness, i.e., whether a specific person will answer a DV item correctly, prior to their exposure to the target DV. We operationalize this as Predicting Human Interpretation Correctness (P-HIC), a binary classification task using 22 pre-exposure features spanning Human Profile, Human Performance, and Item difficulty (i.e., experts’ ratings and Rasch model). In an online survey, 1083 participants answered 32 DV items (eight DVs × four items), yielding 34,656 responses. Across 32 item-specific datasets, 10 × 10-fold cross-validation shows that a Bagging ensemble of J48 decision trees, combined with feature selection, performs best, achieving a median AUC of 0.73 and a median kappa of 0.33. Feature analyses indicate that item difficulty estimated by the Rasch model dominates prediction, followed by experts’ ratings and prior correctness (increasing in importance across sessions), while profile features contribute little. These results suggest that pre-exposure misinterpretation risk can be estimated above chance and warrant future evaluation of assistive item-ranking strategies in simulated and real adaptive assessment settings.
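To make two quantities named in the abstract concrete, the sketch below shows the standard dichotomous Rasch formula (probability of a correct answer given a person's ability and an item's difficulty, both in logits) and Cohen's kappa for binary predictions. It is an illustration only, not the authors' pipeline: the study uses Rasch difficulty as one feature in a Bagging ensemble of J48 trees, whereas the function names and the ability and difficulty values below are hypothetical and chosen for the example.

```python
import math


def rasch_p_correct(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length sequences of binary labels (0 or 1)."""
    n = len(y_true)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal rates of label 1 in each sequence.
    rate_true = sum(y_true) / n
    rate_pred = sum(y_pred) / n
    expected = rate_true * rate_pred + (1 - rate_true) * (1 - rate_pred)
    if expected == 1.0:
        # Degenerate case: both sequences are constant and identical.
        return 0.0
    return (observed - expected) / (1 - expected)


# Hypothetical values (not from the study): one item, three people.
item_difficulty = 0.5
abilities = [-1.0, 0.5, 2.0]
probabilities = [rasch_p_correct(a, item_difficulty) for a in abilities]
# Predict "correct" when the modelled probability is at least 0.5.
predictions = [1 if p >= 0.5 else 0 for p in probabilities]
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside AUC for a binary task such as P-HIC, where one class may be much more frequent than the other.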