ZERO-CHECK, A ZERO-KNOWLEDGE PROTOCOL FOR RECONCILING PATIENT IDENTITIES ACROSS INSTITUTIONS
http://65.222.228.150/jjb/zero.txt
Jules J. Berman, PhD, MD
National Institutes of Health
Rockville, MD USA
Background: Large clinical studies collect patient records from multiple institutions. Unless patient identities can be reconciled across institutions, individuals with records held in different institutions will be falsely "counted" as multiple persons when databases are merged.
Technology: The purpose of this study is to create a safe, zero-knowledge protocol that can reconcile individuals with records in multiple institutions without exchanging or comparing confidential or identifying patient information.
Design: The steps of the protocol are as follows:
1. Institution A and Institution B each create a random character string and send it to the other institution.
2. Each institution sums the received random character string with its own random character string, producing a random character string common to both institutions (RandA+B).
3. Each institution takes a patient identifier (a name, a social security number, a birth date, or some combination of identifiers) and sums it with RandA+B. The result is a patient random character string that is identical across institutions when the patient is identical in both institutions.
Optional implementation: At this point, RandA+B can be destroyed at both institutions, and RandA and RandB can be destroyed by Institutions A and B respectively, leaving only the patient random character string at each institution. The destruction of these random character strings makes it impossible to recompute the original identifier from the patient random character string.
Optional implementation: At this point, institutions may provide the patient random character string to a data broker. Having only the patient random character strings, the broker has zero patient-related information.
4. Institutions A and B compare their patient random character strings.
Optional implementation: Institution A sends the first character of the patient random character string to Institution B. If the first character is not identical in both institutions, the protocol ends. The two patients are not the same person. If the first character is identical in both institutions, Institution B sends the second character. The process is repeated until a sufficient number of transactions have occurred to convince the institutions that they have the same patient random character string. This strategy ensures that the patient random character strings are never actually exchanged between institutions.
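Illustration: The steps above can be sketched in Python under stated assumptions. The abstract does not define how two character strings are "summed"; this sketch assumes a position-wise sum of byte values modulo 256, with the shorter string repeated to the length of the longer one. The function names, the 32-byte string length, the identifier normalization, and the sample identifiers are all hypothetical and are not taken from zero.pl. Both institutions are simulated in one process, so the character-by-character exchange is shown as a loop rather than as network messages.

```python
import secrets


def make_random_string(length: int = 32) -> bytes:
    """Step 1: an institution creates its own random character string."""
    return secrets.token_bytes(length)


def sum_strings(a: bytes, b: bytes) -> bytes:
    """Assumed meaning of "sum": position-wise addition modulo 256.

    The shorter input is repeated to the length of the longer one.
    The operation is commutative, so both institutions get the same result.
    """
    if not a or not b:
        raise ValueError("both strings must be non-empty")
    n = max(len(a), len(b))
    return bytes((a[i % len(a)] + b[i % len(b)]) % 256 for i in range(n))


def patient_string(identifier: str, shared: bytes) -> bytes:
    """Step 3: sum a normalized patient identifier with RandA+B."""
    # Normalization (lower case, single spaces) is an assumption of this sketch.
    normalized = " ".join(identifier.lower().split())
    return sum_strings(normalized.encode("utf-8"), shared)


def incremental_match(mine: bytes, theirs: bytes, rounds: int = 8) -> bool:
    """Optional step 4: compare one character per round, stop at first mismatch."""
    for i in range(min(rounds, len(mine), len(theirs))):
        if mine[i] != theirs[i]:
            return False  # the two patients are not the same person
    return True


# Steps 1 and 2: exchange random strings and derive the common RandA+B.
rand_a = make_random_string()
rand_b = make_random_string()
shared_at_a = sum_strings(rand_a, rand_b)
shared_at_b = sum_strings(rand_b, rand_a)

# Step 3: each institution builds patient random character strings.
p_a = patient_string("Jane Q. Doe 1960-01-31", shared_at_a)
p_b = patient_string("jane q. doe   1960-01-31", shared_at_b)
p_other = patient_string("John Roe 1955-07-04", shared_at_b)

# Step 4: compare the strings one character at a time.
print(incremental_match(p_a, p_b))      # True: same patient at both institutions
print(incremental_match(p_a, p_other))  # False: mismatch at the second character
```

The `rounds` parameter stands for the "sufficient number of transactions" in the optional comparison; a larger value lowers the chance that two different patients are accepted as a match.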
Results: The protocol can be implemented so that no information about the patient is transmitted across institutions. The protocol can be executed at high computational speed. A Perl script (zero.pl) compared the speed of creating de-identified information with the zero-check protocol and with the MD5 one-way hash. There was no significant difference in computational speed. The Perl script is available from http://65.222.228.150/jjb/zero.txt
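Illustration: A speed comparison of this kind could be set up as in the following Python sketch. It is not the zero.pl script and does not reproduce its measurements. The position-wise sum modulo 256, the sample identifier, and the number of runs are assumptions, and the timings obtained will depend on the language and machine used.

```python
import hashlib
import secrets
import timeit

shared = secrets.token_bytes(32)        # stands in for RandA+B
identifier = b"jane q. doe 1960-01-31"  # hypothetical patient identifier


def zero_check(ident: bytes = identifier, key: bytes = shared) -> bytes:
    """Assumed zero-check step: position-wise sum modulo 256."""
    n = max(len(ident), len(key))
    return bytes((ident[i % len(ident)] + key[i % len(key)]) % 256 for i in range(n))


def md5_hash(ident: bytes = identifier) -> bytes:
    """One-way hash of the same identifier with MD5."""
    return hashlib.md5(ident).digest()


runs = 10_000
t_zero = timeit.timeit(zero_check, number=runs)
t_md5 = timeit.timeit(md5_hash, number=runs)
print(f"zero-check: {t_zero:.4f} s, MD5: {t_md5:.4f} s for {runs} runs each")
```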
Conclusion: A zero-knowledge protocol for reconciling patients across institutions is described. This protocol is an example of a computational approach to data sharing designed to help medical researchers comply with newly enacted U.S. medical privacy regulations.