Big Data technologies allow businesses to share and analyze large amounts of data in order to improve their services. The growing use of such data, however, raises serious privacy concerns. While the General Data Protection Regulation (GDPR) helps protect individuals' privacy rights, complying with it has created serious challenges for businesses. Unfortunately, traditional privacy solutions are not compatible with the underlying data analytics technologies.
The main goal of PAPAYA is to develop a platform for privacy-preserving (pp) analytics. The project pursues the following objectives:
- Design efficient pp data analytics techniques;
- Explore different settings involving one or more data sources and third-party queriers;
- Enable risk management and user control of data disclosure;
- Design and develop an integrated platform;
- Conduct an end-to-end analysis of different use cases;
- Disseminate and exploit PAPAYA results to maximize the visibility and sustainability of the project outcomes.
The project, which concluded on July 31st, 2021, developed the PAPAYA framework. It enables data analytics operations to be executed without disclosing the underlying data and, whenever possible, gives data subjects some control over these operations. The platform brings together the following services and tools:
- 4 pp data analytics modules: neural networks (classification and training), trajectory clustering, counting, and basic statistics. These modules leverage privacy-enhancing techniques such as homomorphic encryption, secure multi-party computation, differential privacy, and functional encryption;
- security and transparency services, including the identity and access management (IAM) service, auditing services, and the key management service;
- the Platform and Agent dashboards for configuration, monitoring and visualization;
- data subject tools that (i) present risk management artefacts, (ii) illustrate the PAPAYA pp analytics, and (iii) enable data subjects to express their privacy preferences and exercise their rights.
The platform was demonstrated through 5 use cases (UCs) grouped into 2 families, healthcare and telecom:
- The arrhythmia detection UC uses the platform to run pp neural network classification over a patient's ECG data and determine the patient's arrhythmia type;
- The stress detection UC implements pp collaborative training, in which each data source keeps its own private health dataset, to automatically detect workers' stress conditions;
- The mobility analytics UC allows stakeholders running the PAPAYA platform to measure audiences in given areas or to extract mobility patterns in a pp manner;
- The mobile usage analytics UC extracts analytics on individuals' mobile phone usage through pp counting;
- The threat detection UC executes pp neural networks to detect system threats originating from several sources.
The project identified 12 exploitable assets, which can be grouped into three categories:
- The platform for pp data analytics;
- The individual modules: 4 pp analytics and 2 GDPR compliance modules;
- The 5 UCs of the project.
One patent related to pp mobility analytics is under submission.
The project produced 21 publications and participated in various events where its results were presented and demonstrated.