Open Statistical Data Interoperability Framework

Inventory of tools to enable and increase semantic and technical interoperability

Combining tools and Official Standards to implement interoperable Data Pipelines
Tools for Data Publication

Idra - Open Data Federation Platform

Tool for Data Publication

Idra is a web application that federates existing Open Data Management Systems (ODMS) based on different technologies, providing a single access point to search and discover open datasets.
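The federation idea can be illustrated with a minimal Python sketch. It is not Idra's code or API: the `Catalogue` connector, the `federated_search` function, the portal names and the in-memory records are hypothetical stand-ins for ODMS back-ends whose native records use different field names. Each connector adapts its records to one common shape, and a single search function queries all of them and merges the hits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Dataset:
    """Normalised record returned by the federation layer."""
    title: str
    source: str
    url: str


@dataclass
class Catalogue:
    """Hypothetical connector for one ODMS: its native records plus an
    adapter that maps a native record to the common Dataset shape."""
    name: str
    records: List[dict]
    to_dataset: Callable[[str, dict], Dataset]


def federated_search(catalogues: List[Catalogue], keyword: str) -> List[Dataset]:
    """Single access point: search every catalogue and merge the hits."""
    hits = []
    for catalogue in catalogues:
        for record in catalogue.records:
            dataset = catalogue.to_dataset(catalogue.name, record)
            if keyword.lower() in dataset.title.lower():
                hits.append(dataset)
    # Deterministic, case-insensitive ordering of the merged result list.
    return sorted(hits, key=lambda d: (d.title.lower(), d.source))


if __name__ == "__main__":
    # Two hypothetical portals whose records use different field names.
    portal_a = Catalogue(
        "portal-a",
        [{"title": "Air quality 2021", "url": "https://a.example/aq"}],
        lambda src, r: Dataset(r["title"], src, r["url"]),
    )
    portal_b = Catalogue(
        "portal-b",
        [{"name": "Air Quality Stations", "link": "https://b.example/st"},
         {"name": "Population", "link": "https://b.example/pop"}],
        lambda src, r: Dataset(r["name"], src, r["link"]),
    )
    for hit in federated_search([portal_a, portal_b], "air"):
        print(hit.source, hit.title, hit.url)
```

In a real federation the connectors would call each portal's remote API instead of reading local lists; keeping the adapter per source is what lets heterogeneous technologies sit behind one search interface.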

GraphDB

Tool for Data Publication

The tool manages the data repositories produced by the project's data pipelines. It exposes the data for querying through an HTTP service endpoint, lets users monitor the queries, and offers a dedicated editor to test queries and request data.
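As a hedged illustration of querying the data through an HTTP endpoint, the sketch below builds a SPARQL 1.1 Protocol "query via GET" request with the Python standard library. The endpoint `http://localhost:7200/repositories/interstat` is an assumption modelled on GraphDB's usual `/repositories/<repository-id>` layout (host, port and repository id are hypothetical), and the query is only an example. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: GraphDB usually exposes each repository at
# /repositories/<repository-id>; host, port and id are hypothetical.
ENDPOINT = "http://localhost:7200/repositories/interstat"

QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?obs WHERE { ?obs a qb:Observation } LIMIT 10
"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol request that sends the query via GET."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_get_request(ENDPOINT, QUERY)
    print(req.get_method(), req.full_url)
    # Against a running server the request could be executed with:
    # import json, urllib.request
    # with urllib.request.urlopen(req) as resp:
    #     for row in json.load(resp)["results"]["bindings"]:
    #         print(row["obs"]["value"])
```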

CKAN Portal

Tool for Data Publication

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers hundreds of data portals worldwide.
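CKAN exposes its catalogue through its Action API. A minimal sketch, assuming a portal at the hypothetical base URL `https://ckan.example.org`, that builds a `package_search` call and extracts dataset titles from a response with the documented envelope (`success`, `result.count`, `result.results`). The sample response is invented for illustration and is not real portal output.

```python
import json
from urllib.parse import urlencode

# Hypothetical portal base URL; replace with a real CKAN instance.
BASE_URL = "https://ckan.example.org"


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API (v3) package_search URL for a GET request."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def dataset_titles(response_body: str) -> list:
    """Return the dataset titles contained in a package_search response."""
    payload = json.loads(response_body)
    if not payload.get("success"):
        raise RuntimeError("CKAN action failed")
    return [pkg["title"] for pkg in payload["result"]["results"]]


# Invented response with the same envelope as a real package_search reply.
SAMPLE = json.dumps({
    "success": True,
    "result": {"count": 1,
               "results": [{"name": "air-quality", "title": "Air quality"}]},
})

if __name__ == "__main__":
    print(package_search_url(BASE_URL, "air quality"))
    print(dataset_titles(SAMPLE))
```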


Tools for Data Dissemination

CEF Orion Context Broker

Tool for Data Dissemination

The tool manages context information and serves requests for it, based on Linked Data (LD) standards and following the ETSI NGSI-LD specification. It consists of a set of APIs for context data management.
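To make the NGSI-LD API more concrete, here is a sketch that builds, but does not send, an entity-creation request (`POST /ngsi-ld/v1/entities`) as defined by ETSI NGSI-LD. The broker address `http://localhost:1026` (the port Orion-LD commonly listens on), the entity id, the entity type and the attribute names are assumptions for illustration.

```python
import json
from urllib.request import Request

# Assumed broker address; 1026 is the port Orion-LD commonly listens on.
BROKER = "http://localhost:1026"
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def make_entity(entity_id: str, entity_type: str, properties: dict) -> dict:
    """Build an NGSI-LD entity whose attributes are all plain Properties."""
    entity = {"id": entity_id, "type": entity_type}
    for name, value in properties.items():
        entity[name] = {"type": "Property", "value": value}
    # With Content-Type application/ld+json the @context travels in the body.
    entity["@context"] = [CORE_CONTEXT]
    return entity


def create_request(entity: dict) -> Request:
    """Prepare POST /ngsi-ld/v1/entities carrying the given entity."""
    return Request(
        f"{BROKER}/ngsi-ld/v1/entities",
        data=json.dumps(entity).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical entity describing a population observation.
    obs = make_entity("urn:ngsi-ld:Observation:pop-2021-R1", "Observation",
                      {"populationTotal": 1234, "refPeriod": "2021"})
    req = create_request(obs)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running broker,
    # which answers 201 Created on success.
```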

Data Browser

Tool for Data Dissemination

The Data Browser interacts with SDMX web services, allowing data users to browse, present and visualize datasets. It can be used to disseminate datasets stored in one or more databases.

Eurostat NSI Web service

Tool for Data Dissemination

The web service improves the semantic interoperability of statistical data made available in SDMX and is intended to enable Member States to expose their dissemination databases.
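SDMX web services are typically queried through the SDMX RESTful API. The sketch builds a data query of the form `{base}/data/{flow}/{key}`, where the series key holds one position per dimension separated by dots, `+` combines alternative codes and an empty position acts as a wildcard. The base URL, the dataflow reference `IT1,DF_POP,1.0` and the dimension codes are hypothetical; which SDMX REST version a given NSI installation supports should be checked in its own documentation.

```python
from urllib.parse import urlencode

# Hypothetical NSI Web service base URL.
BASE_URL = "https://nsi.example.org/rest"


def series_key(dimensions: list) -> str:
    """Build an SDMX series key: one position per dimension, in DSD order.

    Each position is a list of codes joined with '+'; an empty list leaves
    the position blank, i.e. a wildcard for that dimension.
    """
    return ".".join("+".join(codes) for codes in dimensions)


def data_query(flow: str, dimensions: list, **params: str) -> str:
    """Build an SDMX REST data query URL with optional query parameters."""
    url = f"{BASE_URL}/data/{flow}/{series_key(dimensions)}"
    return url + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    # Hypothetical dataflow with dimensions FREQ, REF_AREA and INDICATOR:
    # annual data, two areas, every indicator.
    print(data_query("IT1,DF_POP,1.0", [["A"], ["IT", "FR"], []],
                     startPeriod="2015", endPeriod="2020"))
```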


Tools for Data or Metadata Management

Adminer MySQL

Tool for Data or Metadata Management

The tool (formerly phpMinAdmin) is a full-featured database management tool written in PHP. Adminer is available for MySQL, MariaDB, PostgreSQL, SQLite, MS SQL, Oracle, Elasticsearch and MongoDB.

OBDA Monolith

Tool for Data or Metadata Management

The tool supports Ontology-Based Data Access (OBDA): it links the concepts of an ontology to the data stored in the underlying sources, so that the data can be queried through ontology concepts.

Meta and Data Manager

Tool for Data or Metadata Management

The tool publishes data and structural metadata through Eurostat's SDMX-RI NSI Web service, manages structural metadata and allows users to create an SDMX-compliant dissemination/reporting database.

SparQling

Tool for Data or Metadata Management

The tool is a graphical query editor that lets users build SPARQL queries starting from an ontology expressed in Graphol.


Tools for Data Visualization

Cube Visualizer

Tool for Data Visualization

A data analysis and visualization tool that creates graphical views of one-dimensional slices of an RDF data cube. Users can choose between different chart types.

Olap Browser

Tool for Data Visualization

A data analysis and visualisation tool that presents a two-dimensional slice of an RDF data cube in table format; users can select the desired fields.
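Both visualisation tools work on slices of an RDF data cube: a one-dimensional slice fixes every dimension except one (suited to a chart), while a two-dimensional slice leaves two dimensions free (suited to a table). The sketch below shows that idea on plain Python dictionaries standing in for `qb:Observation` resources; the dimension names, codes and values are invented, and this is not how Cube Visualizer or Olap Browser are implemented.

```python
# Invented observations standing in for qb:Observation resources:
# three dimensions (area, year, sex) and one measure (value).
OBSERVATIONS = [
    {"area": "R1", "year": "2020", "sex": "F", "value": 10},
    {"area": "R1", "year": "2021", "sex": "F", "value": 12},
    {"area": "R2", "year": "2020", "sex": "F", "value": 7},
    {"area": "R2", "year": "2021", "sex": "F", "value": 9},
    {"area": "R1", "year": "2020", "sex": "M", "value": 11},
]


def matches(obs, fixed):
    """True if the observation has every fixed dimension value."""
    return all(obs[dim] == val for dim, val in fixed.items())


def slice_1d(observations, free, fixed):
    """One free dimension (all others fixed): {code of free dim: measure}."""
    return {o[free]: o["value"] for o in observations if matches(o, fixed)}


def slice_2d(observations, row_dim, col_dim, fixed):
    """Two free dimensions: a header row plus one row per row_dim code."""
    selected = [o for o in observations if matches(o, fixed)]
    row_keys = sorted({o[row_dim] for o in selected})
    col_keys = sorted({o[col_dim] for o in selected})
    cells = {(o[row_dim], o[col_dim]): o["value"] for o in selected}
    table = [[row_dim] + col_keys]
    for r in row_keys:
        # Combinations without an observation are left empty (None).
        table.append([r] + [cells.get((r, c)) for c in col_keys])
    return table


if __name__ == "__main__":
    print(slice_1d(OBSERVATIONS, "year", {"area": "R1", "sex": "F"}))  # chart data
    for line in slice_2d(OBSERVATIONS, "area", "sex", {"year": "2020"}):  # table
        print(line)
```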


Tools for Data Transformation

Excel or CSV to NGSI-LD

Tool for Data Transformation

The application transforms Excel or CSV files into a format compliant with the ETSI NGSI-LD standard and uploads the result directly into the CEF Context Broker. It does not provide a user interface; it consists of two REST APIs for file conversion.
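The conversion step can be sketched as follows, assuming one entity per CSV row, an `id` column used to build the entity URN and every other column mapped to an NGSI-LD `Property`. The column names, the `urn:ngsi-ld:<Type>:<id>` naming and the numeric conversion rule are assumptions; the tool's actual REST APIs and mapping rules are not reproduced here, and reading Excel files would additionally require a spreadsheet library.

```python
import csv
import io
import json

CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def parse_value(text: str):
    """Keep numbers numeric so that NGSI-LD values are not all strings."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def csv_to_ngsi_ld(csv_text: str, entity_type: str, id_column: str = "id") -> list:
    """Convert each CSV row into an NGSI-LD entity (normalized representation)."""
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = {
            "id": f"urn:ngsi-ld:{entity_type}:{row[id_column]}",
            "type": entity_type,
        }
        for column, value in row.items():
            if column != id_column:
                entity[column] = {"type": "Property", "value": parse_value(value)}
        entity["@context"] = [CORE_CONTEXT]
        entities.append(entity)
    return entities


if __name__ == "__main__":
    # Hypothetical input file content.
    sample = "id,name,population\nR1,North,1200\nR2,South,850.5\n"
    print(json.dumps(csv_to_ngsi_ld(sample, "Region"), indent=2))
```

Each resulting entity could then be sent to the broker with a `POST /ngsi-ld/v1/entities` request.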

SDMX/NGSI-LD Parser

Tool for Data Transformation

The tool automatically translates RDF in Turtle (Terse RDF Triple Language) format into JSON-LD compatible with the ETSI NGSI-LD v1.4.1 common information model. It is integrated with the FIWARE Context Broker (Orion-LD).

Eddy Tool

Tool for Data Transformation

Eddy is a graphical editor for specifying Graphol/OWL ontologies. Its drawing features let designers edit ontologies in a central viewport, while two side panels contain specific widgets for editing open diagrams.

Juma Editor

Tool for Data Transformation

It is a CSV-to-RDF mapping and conversion tool built on top of R2RML, a W3C standard for converting tabular (relational) data to RDF. Users create their mappings by assembling blocks.

Relevant Official Standards for interoperability


Standards

  • The Generic Statistical Business Process Model (GSBPM) is the reference framework for describing and designing the statistical process. GSBPM identifies the steps of the statistical business process and the connections between them.

  • SDMX: Statistical Data and Metadata eXchange is a standard for the exchange of statistical data and metadata among international organisations.

  • DCAT-AP/statDCAT-AP: the DCAT Application Profile for data portals in Europe is a specification based on W3C's Data Catalog Vocabulary (DCAT) for describing public sector datasets in Europe. statDCAT-AP is a DCAT-AP extension for exchanging metadata about statistical datasets.

  • ETSI NGSI-LD: The Context Information Management API allows users to provide, consume and subscribe to context information in multiple scenarios and involving multiple stakeholders.
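For illustration, a DCAT-AP-style description of one dataset with one distribution can be produced as JSON-LD. The sketch uses only well-known DCAT and Dublin Core terms (`dcat:Dataset`, `dct:title`, `dct:description`, `dcat:keyword`, `dcat:distribution`, `dcat:accessURL`, `dcat:mediaType`); the function name, URIs and values are hypothetical, and a complete DCAT-AP or statDCAT-AP record would normally carry further properties (publisher, themes, licence and so on) as described in those specifications.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
IANA_MEDIA_TYPES = "http://www.iana.org/assignments/media-types/"


def dataset_record(uri, title, description, keywords, access_url, media_type):
    """Return a DCAT-AP-style Dataset with one Distribution as a JSON-LD dict."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": keywords,
        "dcat:distribution": [{
            "@id": uri + "/distribution/1",
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": access_url},
            # Media type expressed as an IANA media-type URI.
            "dcat:mediaType": {"@id": IANA_MEDIA_TYPES + media_type},
        }],
    }


if __name__ == "__main__":
    # Hypothetical dataset and file locations.
    record = dataset_record(
        "https://data.example.org/dataset/air-quality",
        "Air quality",
        "Hourly pollutant concentrations by monitoring station",
        ["air", "environment"],
        "https://data.example.org/files/aq.csv",
        "text/csv",
    )
    print(json.dumps(record, indent=2))
```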




Data Pipeline: ETL Approach


The ETL approach follows a generalised ETL (Extract, Transform, Load) pattern that generates RDF triples from a CSV dataset through Python procedures.

The main advantages of this approach are:

  • Openness: the code, developed using open tools, is available in the INTERSTAT GitHub repository.

  • Maximal automation, to avoid manual processing, save time and improve traceability.

  • Reproducibility, resulting from automation and code documentation.

  • Efficiency, increased by the execution of the pipeline in a distributed environment.
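The Extract, Transform, Load pattern described above can be sketched in a few standard-library Python functions. This is not the code in the INTERSTAT repository: the base namespace `http://example.org/interstat/`, the one-`qb:Observation`-per-row rule, the use of untyped string literals and the choice to "load" by returning N-Triples text (rather than writing to a file or a triple store) are all simplifying assumptions.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for generated resources (not the project's URI scheme).
BASE = "http://example.org/interstat/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_OBSERVATION = "http://purl.org/linked-data/cube#Observation"


def extract(csv_text: str) -> list:
    """Extract: read the CSV rows as dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def literal(value: str) -> str:
    """Serialise a plain string literal with N-Triples escaping."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def transform(rows: list, id_column: str) -> list:
    """Transform: one qb:Observation per row, one triple per other column."""
    triples = []
    for row in rows:
        subject = f"<{BASE}obs/{quote(row[id_column], safe='')}>"
        triples.append((subject, f"<{RDF_TYPE}>", f"<{QB_OBSERVATION}>"))
        for column, value in row.items():
            if column != id_column:
                predicate = f"<{BASE}prop/{quote(column, safe='')}>"
                triples.append((subject, predicate, literal(value)))
    return triples


def load(triples: list) -> str:
    """Load: serialise the triples as an N-Triples document."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


if __name__ == "__main__":
    sample = "id,country,value\n1,IT,42\n2,FR,17\n"
    print(load(transform(extract(sample), "id")), end="")
```

Keeping the three stages as separate functions is what makes such a pipeline easy to automate, document and rerun, which is the basis of the advantages listed above.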




Data Pipeline: Domain-Knowledge Approach


The Domain Knowledge approach is based on:

  • Description of the domain of interest through an ontology, representing the core concepts.

  • Definition of a logical Common Data Model to link heterogeneous data sources with ontology concepts.

  • Reproducibility, resulting from automation and code documentation.

  • Efficiency, increased by the execution of the pipeline in a distributed environment.

The main steps of this data pipeline are:

  • Data Acquisition: the datasets to be linked are downloaded from their sources and uploaded into a DBMS.

  • Data Processing: data from the different sources are loaded into a DBMS structured according to the Common Data Model. In this step the data are not integrated, but are harmonized and federated.

  • Conceptual Integration: ontology concepts are linked to the data in the Common Data Model using the Monolith tool, so that the data can be queried through ontology concepts.
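The three steps can be illustrated with a toy ontology-based data access flow in Python using SQLite: harmonised tables play the role of the Common Data Model, and a mapping assigns each ontology concept or property a SQL query over those tables, so that users query concepts rather than tables. The table names, concept names and mapping format are hypothetical; Monolith has its own mapping and query facilities, which this sketch does not reproduce.

```python
import sqlite3

# Data Processing: a hypothetical Common Data Model with harmonised tables.
SCHEMA = """
CREATE TABLE cdm_region (region_code TEXT PRIMARY KEY, region_name TEXT);
CREATE TABLE cdm_population (region_code TEXT, ref_year INTEGER, residents INTEGER);
"""

# Conceptual integration: each (hypothetical) ontology concept or property
# is defined by a SQL query over the Common Data Model.
MAPPINGS = {
    "Region": "SELECT region_code FROM cdm_region",
    "hasName": "SELECT region_code, region_name FROM cdm_region",
    "hasPopulation": (
        "SELECT region_code, residents FROM cdm_population WHERE ref_year = ?"
    ),
}


def query_concept(conn: sqlite3.Connection, concept: str, *params) -> list:
    """Answer a query on an ontology concept by unfolding its mapping into SQL."""
    return sorted(conn.execute(MAPPINGS[concept], params).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Data Acquisition: rows that a real pipeline would take from external sources.
    conn.executemany("INSERT INTO cdm_region VALUES (?, ?)",
                     [("R1", "North"), ("R2", "South")])
    conn.executemany("INSERT INTO cdm_population VALUES (?, ?, ?)",
                     [("R1", 2021, 1200), ("R2", 2021, 850)])
    print(query_concept(conn, "hasPopulation", 2021))
```

If the Common Data Model changes, only the entries in `MAPPINGS` need updating while queries on concepts stay the same, which is the decoupling between data structure and data semantics listed among the advantages of this approach.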

The main advantages of describing the domain of interest through ontologies are:

  • Formal and clear definition of target concepts and related metadata

  • Automatic reasoning

  • Cross domain interoperability

  • Data linkage by design

  • Decoupling between data structure and data semantics

  • Incremental approach and cheaper change management (data and metadata changes)

  • Easier linkage of new external data sources
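As a small example of automatic reasoning, the sketch applies two RDFS entailment rules, transitivity of `rdfs:subClassOf` (rdfs11) and propagation of `rdf:type` to superclasses (rdfs9), to a set of triples until no new triple appears. Prefixed names are kept as plain strings and the class and instance names (`ex:Municipality`, `ex:Rome`, ...) are invented; production reasoners support far richer rule sets.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples: set) -> set:
    """Apply rdfs11 (subClassOf is transitive) and rdfs9 (instances of a
    class are instances of its superclasses) until a fixpoint is reached."""
    inferred = set(triples)
    while True:
        new = set()
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for sub, sup in subclass_pairs:
            for s, p, o in inferred:
                # rdfs11: sub subClassOf sup, sup subClassOf o => sub subClassOf o
                if p == SUBCLASS and s == sup:
                    new.add((sub, SUBCLASS, o))
                # rdfs9: s type sub, sub subClassOf sup => s type sup
                if p == TYPE and o == sub:
                    new.add((s, TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new


if __name__ == "__main__":
    facts = {
        ("ex:Municipality", SUBCLASS, "ex:AdministrativeArea"),
        ("ex:AdministrativeArea", SUBCLASS, "ex:Territory"),
        ("ex:Rome", TYPE, "ex:Municipality"),
    }
    for triple in sorted(rdfs_closure(facts) - facts):
        print(triple)
```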





Explore a Data Pipeline

Analyze in detail an example of a Data Pipeline and the Tools executing each step.

List of Tools:




Inventory of Tools and Generic Statistical Business Process Model (GSBPM)
Name | Main feature | Version | Main GSBPM sub-process | Owner
CKAN Portal | CKAN portal for the publication of data produced within the project. Data will be entered under the INTERSTAT organization in the portal. | 2.9.4 | 7.2. Produce dissemination products | CKAN
GraphDB | Graph database with RDF and SPARQL support to store the data produced within the project. It also features a SPARQL endpoint for querying data. | 9.10 | 7.2. Produce dissemination products | Ontotext
Eurostat NSI Web Service | Improves the semantic interoperability of statistical data made available in SDMX. | 1.5.6 | 6.1. Prepare draft outputs | ISTAT
Meta and Data Manager | Publishes data and structural metadata through Eurostat's SDMX-RI NSI Web service and manages structural metadata | 1.5.6 | 4.4. Finalise collection | ISTAT
Data Browser | Interacts with SDMX web services, allowing data users to browse, present and visualize datasets | 1.1.1 | 7.2. Produce dissemination products | ISTAT
Juma Editor | Visual Ontology Mapping tool | Jun 2019 | 5.1. Integrate data | Derilinx
Juma API | Visual Ontology Mapping tool | Jun 2019 | 5.1. Integrate data | Derilinx
Cube Visualizer | Data analysis and visualisation tool that creates graphical representations of an RDF data cube's one-dimensional slices | Apr 2019 | 6.3. Interpret and explain outputs | Derilinx
Olap Browser | Data analysis and visualisation tool that presents a two-dimensional slice of an RDF data cube in the form of a table | Apr 2019 | 6.3. Interpret and explain outputs | Derilinx
SparQLing | SPARQL graphical query editor | Oct 2018 | 5.5. Derive new variables and units | OBDA
Sparql React | Collection of demo applets showing a JavaScript front end on RDF data | Jun 2021 | 3.6. Test statistical business process | Insee
Idra | Federates existing ODMS based on different technologies, providing a single access point to search and discover open datasets | 2.3.1 | 4.2. Set up collection | ENG
CEF Context Broker | Manages context information and serves requests for it in a structured manner, based on LD standards and following the ETSI NGSI-LD specification | Jan 2021 | 7.2. Produce dissemination products | FIWARE
Excel/CSV to NGSI-LD | Transforms Excel or CSV files into ETSI NGSI-LD and uploads them into the CEF Context Broker | Jan 2021 | 4.4. Finalise collection | FIWARE