A Methodological Approach for Data-Intensive Web Application Design on Top of Data Lakes
Bianchini D.;Garda M.
2023-01-01
Abstract
Data exploration and decision making may benefit from the availability of data-intensive web applications that enable domain experts to navigate across massive, dynamic and heterogeneous data sources stored in so-called Data Lakes. However, traditional design strategies for this kind of application require well-defined, cleaned data structures in the background. Conceptual modelling may be fruitfully employed to provide web developers with a comprehensive view of the Data Lake sources on which web applications are designed. Nevertheless, the cumbersome nature of Data Lakes turns the conceptual model into a dynamic entity, which must be properly managed. In this paper, we propose a methodological approach to design data-intensive web applications on top of a Data Lake. A conceptual data model, woven over the Data Lake sources, is leveraged to identify the relevant information to be included in the web application. The methodology makes the model evolve in two directions: bottom-up, with new data source content emerging from the Data Lake through a zone-based operations pipeline that prepares a curated version of the raw data, and top-down, with additional domain knowledge that web developers provide while designing the data-intensive web application. The approach, which is independent of any specific implementation technology, is applied to a real case study from an ongoing research project in the cultural heritage domain.