Over the last 36 months, the HOBBIT consortium has worked towards the aforementioned objectives by implementing the plan of action illustrated in Figure 2.
- Data and measure collection: We gathered input on relevant datasets and quality measures from members of the European industry landscape through surveys. To this end, we (1) joined the EU project DataBench in the creation of the HOBBIT association and (2) co-organized and participated in meetups around Europe (e.g. EBDVF 2018). During these meetups, we presented the idea behind HOBBIT and engaged with the participants to gather their requirements for a Big Linked Data benchmarking platform. The main results of HOBBIT’s dissemination and engagement were (1) the creation of a HOBBIT association as Special Group 7 of Task Force 6 of the Big Data Value Association, (2) surveys to gather information from European companies and academia pertaining to their use and evaluation of Big Linked Data and corresponding platforms and (3) datasets for the HOBBIT data repository. Overall, HOBBIT compiled a contact list with more than 300 members. The 25 datasets and dataset generators available through the HOBBIT CKAN repository at https://hobbit.ilabt.imec.be/ encompass industrially relevant datasets partly provided by HOBBIT.
- Benchmark creation: The measures and the datasets collected formed the basis for the 8 HOBBIT benchmarks, which were made available in 2 versions over the project. Each benchmark comprises the following three components: a deterministic data source, a number of tasks and a set of KPIs. In addition, 5 scalable mimicking algorithms (which generate data of industrial relevance) were created in the project to ensure that the benchmarks reflect realistic use cases as well as to circumvent the problem of not being given access to real datasets from industry. A number of evaluations showed that the mimicking algorithms provided by the project generate synthetic data close to real data w.r.t. features such as temporal and spatial distribution.
- The HOBBIT evaluation platform (see Figure 3) is the third core result of HOBBIT. It is built to support the benchmarking of Big Linked Data solutions at both small and large scale. The platform is developed as an open-source solution (see https://github.com/hobbit-project/platform) and supported 14 challenges over the project runtime. Extensions to remote computation facilities such as AWS and an SDK complete the package. A mix of contributions from HOBBIT and from external users has now led to the platform containing 52 benchmarks and more than 300 Docker images. The more than 200 users and 12,600 experiments run over the runtime of the project suggest that the HOBBIT platform is turning into a crystallization point for benchmarking Big Linked Data.
- Evaluation campaigns: HOBBIT ran evaluation campaigns for all benchmarks within 14 challenges, including the Mighty Storage Challenge (MOCHA), the Question Answering on Linked Data Challenge (QALD) and the Open Knowledge Extraction Challenge (OKE) at ESWC 2017 as well as the DEBS Grand Challenges 2017 and 2018. The results show that the HOBBIT benchmarking platform scales to the requirements of large-scale benchmarking. The scalable benchmarking provided by HOBBIT revealed limitations of existing solutions at scale (e.g. completeness for storage, recall for question answering, F-measure for machine learning). Moreover, the lack of scalability of a large number of Linked Data solutions was made evident.
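
The three benchmark components named above (a deterministic data source, a number of tasks and a set of KPIs) can be illustrated with a minimal Python sketch. It is not taken from the HOBBIT platform or its SDK: all names (`Task`, `generate_tasks`, `kpis`, `run_benchmark`, `incomplete_system`) and the toy gold standard are assumptions made purely for illustration. Determinism is approximated here by seeding a random generator, and the KPIs are limited to precision, recall and F-measure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One benchmark task: a query plus the answers expected for it."""
    query: int
    expected: frozenset


def generate_tasks(seed: int, n_tasks: int) -> list:
    """Deterministic data source: the same seed always yields the same tasks."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        query = rng.randrange(1000)
        # Toy gold standard (assumption): the three integers after the query.
        expected = frozenset(range(query + 1, query + 4))
        tasks.append(Task(query, expected))
    return tasks


def kpis(expected: frozenset, returned: frozenset) -> dict:
    """Compute precision, recall and F-measure (F1) for one task."""
    true_positives = len(expected & returned)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def run_benchmark(system, seed: int = 42, n_tasks: int = 5) -> dict:
    """Run every task against `system` and macro-average the KPIs."""
    tasks = generate_tasks(seed, n_tasks)
    scores = [kpis(t.expected, frozenset(system(t.query))) for t in tasks]
    return {k: sum(s[k] for s in scores) / len(scores) for k in scores[0]}


def incomplete_system(query: int) -> list:
    """Hypothetical system under test: two correct answers and one wrong one."""
    return [query + 1, query + 2, query + 100]


if __name__ == "__main__":
    print(run_benchmark(incomplete_system))
```

Because the task generator is seeded, two systems evaluated with the same seed face identical tasks, which is what makes their KPI values comparable. A system that misses one of three expected answers, such as the hypothetical `incomplete_system`, shows up directly as reduced recall.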