Data Sharing and Participant Privacy
The goal of this Silent Spring Institute project is to investigate ways to publicly share environmental health data while protecting the privacy of study participants.
Demand for data has increased in recent years, as part of an effort to improve research outcomes and democratize knowledge. To contribute to this effort, Silent Spring Institute began developing an online tool to publicly share the Institute's household exposure data, but temporarily suspended this effort to investigate the privacy implications for study participants.
Although publicly shared data is "anonymized" through the removal of names, addresses, and other personally identifiable information, computer scientists have demonstrated that individuals in these datasets can often still be re-identified through data linking and statistical methods. Using these approaches, children have been re-identified from a cancer registry, and methods have been introduced to re-identify people by the combination of their home and work locations, their movie reviews, and their pharmacy records.
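The data-linking idea described above can be illustrated with a minimal sketch. It assumes two entirely hypothetical tables: a study dataset with names removed, and a separate public list that still carries names. The field names (zip, birth_year, sex) and the function link_records are illustrative assumptions, not taken from any Silent Spring Institute dataset or tool.

```python
from collections import defaultdict

# Hypothetical "anonymized" study records: names removed, other fields kept.
study = [
    {"zip": "02460", "birth_year": 1961, "sex": "F", "measurement": 4.2},
    {"zip": "02460", "birth_year": 1975, "sex": "M", "measurement": 1.1},
    {"zip": "02139", "birth_year": 1975, "sex": "M", "measurement": 2.7},
]

# Hypothetical public list that includes names alongside the same fields.
public = [
    {"name": "Person A", "zip": "02460", "birth_year": 1961, "sex": "F"},
    {"name": "Person B", "zip": "02139", "birth_year": 1975, "sex": "M"},
    {"name": "Person C", "zip": "02139", "birth_year": 1975, "sex": "M"},
]

# Fields assumed (for illustration) to be shared by both tables.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(study_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Return (study_row_index, name) pairs where exactly one named
    person in the public list matches the study row on every key."""
    index = defaultdict(list)
    for row in public_rows:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = []
    for i, row in enumerate(study_rows):
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match is a candidate re-identification
            matches.append((i, names[0]))
    return matches


if __name__ == "__main__":
    print(link_records(study, public))
```

In this toy data only the first study record links to a single named person. The second has no match and the third matches two people, so neither is counted as re-identified.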
Researchers are beginning to propose privacy solutions to ensure study participants are treated ethically during data sharing. Many geneticists are embracing the open consent model, where prospective study participants are repeatedly cautioned that their data will be openly shared with no guarantee of privacy or compensation for harm. Computer scientists have proposed technically focused solutions, such as differential privacy, where data is perturbed to decrease the likelihood of participant re-identification. Both approaches have drawn criticism: some consider the first approach insufficiently rigorous ethically, while others are dissatisfied with the loss of data utility caused by some technical solutions.
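As a rough illustration of how perturbation trades privacy against utility, the sketch below applies the Laplace mechanism, one standard way of achieving differential privacy, to a simple counting query. The function names, the threshold, and the sample values are hypothetical, and this is not presented as the method used by the project team.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.
    The difference of two independent exponential draws with
    rate 1/scale follows this distribution."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, threshold, epsilon, rng):
    """Noisy count of values above a threshold.
    Adding or removing one participant changes the true count by at
    most 1 (sensitivity 1), so the noise scale is 1 / epsilon."""
    true_count = sum(1 for v in values if v > threshold)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # Hypothetical exposure measurements, one per household.
    measurements = [0.4, 1.9, 3.2, 0.7, 5.1, 2.8, 0.2, 4.4]
    for epsilon in (0.1, 1.0, 10.0):
        noisy = private_count(measurements, 2.0, epsilon, rng)
        print(f"epsilon={epsilon}: noisy count = {noisy:.2f} (true count = 4)")
```

Smaller values of epsilon give stronger privacy but noisier answers, which is the decrease in data utility that critics of technical solutions point to.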
Our interdisciplinary research team, which includes Harvard University computer scientists, Brown University sociologists, and environmental health researchers from Silent Spring Institute, seeks to develop best practices for data sharing in environmental health that:
Our team will achieve this by developing computational tools to quantify privacy risks in environmental health data, measuring the data utility of redacted environmental health datasets, and interviewing re-identified study participants to solicit their values.
Role: Lead in developing and advocating project concept. Build relationships with computer science collaborators and translate information within interdisciplinary team. Key contributor to proposal writing.