Data analysis method and data analysis system
January 4, 2021
A system that can integrate data held by a plurality of institutes without fear of private information leakage
Keywords: Data analysis, Privacy-preserving Distributed Data Integration, information leakage, data collation, plurality of institutes
Background / Context / Abstract:
In general, each institute collects and stores its data independently. If these data could be collated with related data held by other institutes, new knowledge could be obtained. To collate data, however, not only must the personal information contained in the data be protected, but any information that is sensitive for an institute itself (e.g., the presence of a patient with a particular disease, or the distribution of academic results at a certain school) must also be kept secret.
Conventional approaches to database collation require a trusted third-party institution to control the database; collation with the information of the relevant institutes kept completely secret has not yet been achieved. Moreover, conventional systems can handle only one characteristic (attribute) of the data held by each institute. In practice, the attributes collected differ from institute to institute, so a plurality of attributes must be handled, and such systems are not suitable for real use.
Building on a multi-institute PSI (Private Set Intersection) system, the inventor proposed a Privacy-preserving Distributed Data Integration (PDDI) system that can integrate data held by a plurality of institutes without fear of information leakage, and demonstrated an algorithm that solves the above problems of conventional database collation methods.
A system for collating the data held by a plurality of data-possessing institutes while keeping the information of the relevant institutes secret, and outputting the collated data to the client, characterized by comprising the following steps:
 A step of entering the attribute which the client wants to analyze,
 A step of converting the entered attribute into encrypted data at each data possessing institute and transmitting the encrypted data [Enc(BF(Si)-1) is transmitted],
 A step of integrating and transmitting the transmitted encrypted data [Enc(IBF(Si)-n) is transmitted], and
 A step of processing the transmitted integrated encrypted data by the data possessing institute and outputting the analysis result to the client.
The step of integrating the encrypted data can be performed by an external computer while the data remain private.
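The four steps above can be illustrated with a minimal Python sketch. It rests on several assumptions that the summary does not confirm: that BF denotes a Bloom filter, that the integration step combines the institutes' filters by bitwise AND, and that matching is done on shared record identifiers. The encryption layer Enc(...) is deliberately omitted, so this sketch shows only the data flow and offers no privacy protection. All function names and parameters (M, K, collate, and so on) are hypothetical.

```python
import hashlib

M = 1024  # filter length in bits (assumed value)
K = 4     # number of hash functions (assumed value)

def positions(item, m=M, k=K):
    """Derive k bit positions for an item from salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]

def bloom_filter(items, m=M, k=K):
    """Encode a set of items as an m-bit Bloom filter stored in a Python int."""
    bits = 0
    for item in items:
        for p in positions(item, m, k):
            bits |= 1 << p
    return bits

def integrate(filters, m=M):
    """Bitwise AND of all filters: a bit survives only if every institute set it."""
    result = (1 << m) - 1
    for f in filters:
        result &= f
    return result

def probably_contains(bits, item, m=M, k=K):
    """True if all k positions are set (Bloom filters allow false positives)."""
    return all((bits >> p) & 1 for p in positions(item, m, k))

def collate(institute_sets):
    # Step 2: each institute encodes its own records (encryption omitted here).
    filters = [bloom_filter(s) for s in institute_sets]
    # Step 3: an integrator combines the transmitted filters.
    integrated = integrate(filters)
    # Step 4: each institute tests only its own records against the result.
    return [sorted(x for x in s if probably_contains(integrated, x))
            for s in institute_sets]

institutes = [
    {"p01", "p02", "p03"},
    {"p02", "p03", "p04"},
    {"p03", "p02", "p05"},
]
matches = collate(institutes)
```

Records present at every institute always pass the final test, because each of their bit positions is set in every filter. Other records pass only through a Bloom-filter false positive, whose likelihood depends on the chosen M and K.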
・Data can be collated without any third-party institution while keeping the information of both the institutes and individual persons secret.
・Two or more items can be aggregated (cross-aggregated) simultaneously.
・The number of attributes in the data held by each institute need not be defined, and data with a plurality of attributes can be analyzed.
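As a rough illustration of what cross-aggregating two items means, the sketch below counts combinations of two attributes over the records that two institutes have in common. It works on plaintext only and assumes that the institutes share a record identifier; the system described here is stated to obtain such results with the data kept secret, which this sketch does not attempt. The function name, the tables, and the attribute values are all hypothetical.

```python
from collections import Counter

def cross_aggregate(table_a, table_b):
    """Count combinations of two attributes over record IDs present in both tables."""
    shared = table_a.keys() & table_b.keys()
    return Counter((table_a[i], table_b[i]) for i in shared)

# Hypothetical data: each institute holds a different attribute per record ID.
hospital = {"id1": "diabetes", "id2": "none", "id3": "diabetes"}
school = {"id1": "grade A", "id3": "grade B", "id4": "grade A"}

counts = cross_aggregate(hospital, school)
```

Only records known to both institutes (id1 and id3 in this example) contribute to the counts.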
Potential Applications / Potential Markets:
Software for integrating data possessed by a plurality of institutes
State of Development / Opportunity / Seeking:
・Available for exclusive and non-exclusive licensing
・Exclusive/non-exclusive evaluation for a defined period (set-up for options)
WO2018/124104 (National phase; JP,US,EP)