Relevant Official Standards for Interoperability
- The Generic Statistical Business Process Model (GSBPM) is the reference framework for describing and designing statistical production processes. GSBPM identifies the individual steps of the statistical business process and the connections between them.
- SDMX: Statistical Data and Metadata eXchange is a standard for the exchange of statistical data and metadata among national and international organisations.
- DCAT-AP/statDCAT-AP: the DCAT Application Profile for data portals in Europe is a specification based on the W3C Data Catalog Vocabulary (DCAT) for describing public sector datasets in Europe. statDCAT-AP is a DCAT-AP extension for the exchange of metadata about statistical datasets (the first sketch after this list shows an SDMX retrieval described as a DCAT-AP catalogue entry).
- ETSI NGSI-LD: the Context Information Management API allows users to provide, consume and subscribe to context information in multiple scenarios involving multiple stakeholders (a minimal client sketch follows this list).
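As an illustration of how the exchange and catalogue standards can be combined in code, the sketch below retrieves a dataset over an SDMX 2.1 REST API and describes it as a DCAT-AP/statDCAT-AP catalogue entry with rdflib. It is a minimal sketch, not part of the INTERSTAT pipelines: the Eurostat endpoint pattern, the une_rt_m dataset code and the example.org URIs are illustrative assumptions.

```python
import requests
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# SDMX 2.1 REST request (URL pattern and dataset code are illustrative assumptions).
SDMX_URL = ("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"
            "une_rt_m?format=SDMX-CSV&startPeriod=2023-01")
response = requests.get(SDMX_URL, timeout=60)
response.raise_for_status()
observations = response.text  # SDMX-CSV payload, one observation per row

# Describe the same dataset as a DCAT-AP catalogue entry (URIs are hypothetical);
# statDCAT-AP properties could be added to the same graph for statistical metadata.
g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

dataset = URIRef("https://example.org/dataset/une_rt_m")
distribution = URIRef("https://example.org/dataset/une_rt_m/sdmx-csv")

g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Monthly unemployment rate", lang="en")))
g.add((dataset, DCTERMS.publisher, URIRef("https://example.org/org/statistical-office")))
g.add((dataset, DCAT.distribution, distribution))
g.add((distribution, RDF.type, DCAT.Distribution))
g.add((distribution, DCAT.accessURL, URIRef(SDMX_URL)))

print(g.serialize(format="turtle"))
```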
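For NGSI-LD, the following minimal client sketch provides an entity to a context broker, reads it back, and registers a subscription. The /ngsi-ld/v1 paths and the core @context URL come from the ETSI NGSI-LD specification; the broker address, entity type, attribute names and callback URL are illustrative assumptions.

```python
import json
import requests

# Base URL of an NGSI-LD context broker; host and port are assumptions
# (e.g. a local Orion-LD instance), the /ngsi-ld/v1 paths come from the ETSI API.
BROKER = "http://localhost:1026/ngsi-ld/v1"
HEADERS = {"Content-Type": "application/ld+json"}
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"

# Provide: create a (hypothetical) entity on the broker.
entity = {
    "id": "urn:ngsi-ld:CivicFacility:001",
    "type": "CivicFacility",
    "name": {"type": "Property", "value": "Public library"},
    "location": {
        "type": "GeoProperty",
        "value": {"type": "Point", "coordinates": [12.4964, 41.9028]},
    },
    "@context": CORE_CONTEXT,
}
requests.post(f"{BROKER}/entities", data=json.dumps(entity),
              headers=HEADERS).raise_for_status()

# Consume: read the entity back.
r = requests.get(f"{BROKER}/entities/{entity['id']}",
                 headers={"Accept": "application/ld+json"})
print(r.json())

# Subscribe: ask the broker to notify a (hypothetical) callback URL on changes.
subscription = {
    "id": "urn:ngsi-ld:Subscription:001",
    "type": "Subscription",
    "entities": [{"type": "CivicFacility"}],
    "notification": {"endpoint": {"uri": "http://my-app.example/notify",
                                  "accept": "application/json"}},
    "@context": CORE_CONTEXT,
}
requests.post(f"{BROKER}/subscriptions", data=json.dumps(subscription),
              headers=HEADERS).raise_for_status()
```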
Data Pipeline ETL Approach
- Openness: the code, developed using open tools, is available in the INTERSTAT GitHub repository.
- Maximal automation, to avoid manual processing steps, save time and improve traceability.
- Reproducibility, resulting from automation and code documentation.
- Efficiency, increased by executing the pipeline in a distributed environment (a minimal pipeline sketch follows this list).
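The sketch below illustrates these principles in miniature, assuming a simple Python pipeline: each step is a logged function (traceability), the whole run is triggered from a single entry point (automation, reproducibility), and sources are processed by parallel workers as a stand-in for the distributed execution used in the actual pipelines. The source URLs and step bodies are placeholders, not INTERSTAT code.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pipeline")

# Hypothetical source list; in a real pipeline these would be the open datasets of a pilot.
SOURCES = ["https://example.org/source_a.csv", "https://example.org/source_b.csv"]

def extract(url: str) -> str:
    log.info("downloading %s", url)   # every run is logged for traceability
    return url                        # download step stubbed for the sketch

def transform(raw: str) -> str:
    log.info("harmonising %s", raw)
    return raw

def load(clean: str) -> None:
    log.info("loading %s", clean)

def run(url: str) -> None:
    # Single entry point per source: no manual treatment between the steps.
    load(transform(extract(url)))

# Sources are processed by parallel workers; the same entry point can be
# scheduled by a workflow engine for fully automated, reproducible runs.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(run, SOURCES))
```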
Data Pipeline Domain-Knowledge Approach
- Description of the domain of interest through an ontology representing its core concepts.
- Definition of a logical Common Data Model to link heterogeneous data sources with ontology concepts.
- Data Acquisition: the datasets to link are downloaded from their sources and loaded into a DBMS.
- Data Processing: data from the different sources are loaded into the DBMS according to the Common Data Model. In this step the data are not integrated, but harmonized and federated (see the loading sketch after these steps).
- Conceptual Integration: ontology concepts are linked to the data in the Common Data Model using the Monolith tool, so that data can be queried through ontology concepts (an illustrative query sketch follows these steps).
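A minimal sketch of the acquisition and processing steps, assuming two CSV sources and SQLite standing in for the DBMS; the source URLs, column mappings and the Common Data Model table are hypothetical.

```python
import pandas as pd
from sqlalchemy import create_engine

# DBMS holding the Common Data Model (SQLite stands in for the real DBMS here).
engine = create_engine("sqlite:///cdm.db")

# Hypothetical heterogeneous sources and their mappings to Common Data Model columns.
SOURCES = {
    "https://example.org/facilities_it.csv": {"COMUNE": "municipality", "TIPO": "facility_type"},
    "https://example.org/facilities_fr.csv": {"commune": "municipality", "type": "facility_type"},
}

frames = []
for url, column_map in SOURCES.items():
    df = pd.read_csv(url)                 # data acquisition: download the dataset
    df = df.rename(columns=column_map)    # harmonization: align to the Common Data Model
    df = df[list(column_map.values())]    # keep only the modelled columns
    df["source"] = url                    # keep provenance: data stay federated, not merged
    frames.append(df)

# Load the harmonized, source-tagged data into one Common Data Model table.
pd.concat(frames).to_sql("facility", engine, if_exists="replace", index=False)
```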
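Once the mappings between the ontology and the Common Data Model are in place, data can be queried through ontology concepts rather than table names. The sketch below is not the Monolith interface itself: it is an illustrative SPARQL query sent with SPARQLWrapper to a hypothetical endpoint exposed by the ontology-based data access layer, with an assumed ontology namespace.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical SPARQL endpoint exposed by the ontology-based data access layer.
endpoint = SPARQLWrapper("http://localhost:8080/obda/sparql")
endpoint.setReturnFormat(JSON)

# The query is expressed against ontology concepts (namespace is an assumption),
# not against the physical tables of the Common Data Model.
endpoint.setQuery("""
    PREFIX ex: <https://example.org/ontology/>
    SELECT ?facility ?municipality WHERE {
        ?facility a ex:CivicFacility ;
                  ex:locatedIn ?municipality .
    }
    LIMIT 10
""")

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["facility"]["value"], row["municipality"]["value"])
```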
Benefits of the Domain-Knowledge Approach
- Formal and clear definition of target concepts and related metadata
- Automatic reasoning
- Cross-domain interoperability
- Data linkage by design
- Decoupling between data structure and data semantics
- Incremental approach and cheaper change management (data and metadata changes)
- Easier linkage of new external data sources