A Methodological Approach for Data-Intensive Web Application Design on Top of Data Lakes

Bianchini D.; Garda M.
2023-01-01

Abstract

Data exploration and decision making may benefit from data-intensive web applications, which enable domain experts to navigate massive, dynamic and heterogeneous data sources stored in so-called Data Lakes. However, traditional design strategies for this kind of application require well-defined, cleaned data structures in the background. Conceptual modelling may be fruitfully employed to give web developers a comprehensive view of the Data Lake sources on which web applications are designed. Nevertheless, the cumbersome nature of Data Lakes turns the conceptual model into a dynamic entity, which must be properly managed. In this paper, we propose a methodological approach to designing data-intensive web applications on top of a Data Lake. A conceptual data model, woven over the Data Lake sources, is leveraged to identify the relevant information to be included in the web application. The methodology makes the model evolve in two directions: bottom-up, with new data source content emerging from the Data Lake through a zone-based operations pipeline that prepares a curated version of the raw data, and top-down, with additional domain knowledge provided by web developers and derived from the design of the data-intensive web application. The approach, which is independent of any specific implementation technology, is applied to a real case study from an ongoing research project in the cultural heritage domain.
2023
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
MIUR (including PRIN, FIRB, FISR)
PE6_10 Web and information systems, database systems, information retrieval and digital libraries
Anonymous experts
English
no
24th International Conference on Web Information Systems Engineering, WISE 2023
2023
aus
International
Electronic
14306
349
359
11
978-981-99-7253-1
978-981-99-7254-8
Springer Science and Business Media Deutschland GmbH
conceptual model; Data Lakes; Data-intensive web applications; methodological approach; zone-based architecture
no
Goal 9: Industry, Innovation, and Infrastructure
none
Bianchini, D.; Garda, M.
273
info:eu-repo/semantics/conferenceObject
2
4 Contribution in Conference Proceedings::4.1 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/590411