There is a fine line between obtaining data that may be in the public interest and outright theft of data just because you can. And just because the data is there – having been stolen by online intruders and then leaked – doesn’t mean it’s fair to use it.
An article published in Nature Machine Intelligence this week is an effort to help guide data scientists and researchers through the ethical dilemmas that arise when considering using information obtained from data breaches.
To begin with, Marcello Ienca, a researcher at the Swiss Federal Institute of Technology, and Effy Vayena, deputy director of the Swiss Institute of Translational Medicine, proposed defining "hacked" data as "data obtained through unauthorized access to a computer or a computer network." They note that such data is increasingly used in scientific research, for example in conflict-modeling studies based on WikiLeaks datasets and in studies of sexual behavior based on data leaked from Ashley Madison, a dating site whose database was stolen in 2015 by a group of attackers calling themselves The Impact Team.
But basing studies on such ill-gotten datasets raises problems analogous to those in earlier debates about research that relies on unethically sourced data, such as data obtained from Nazi medical "experiments."
"Even though it may be lawful for researchers to use hacked data if it is publicly available, responsible research practices still require a clear ethical rationale for doing so," the paper argues.
Researchers could argue that using publicly available stolen datasets is justified because such data offers public value, saves resources, is unique, and may exhibit consistency across domains. On the other hand, the people described in the data have not consented to its use, using it could cause secondary harm, it could amount to an invasion of privacy, and it could erode scientific standards.
The authors propose six ethical and procedural requirements that must be met before stolen or leaked data is used in a project.
First, they encourage researchers to consider uniqueness: can they demonstrate that the leaked data could not have been collected by conventional methods? Second, can they show that the research they are considering has high social value and that its benefits clearly outweigh the possible harms? Third, if the hacked data is personally identifiable, researchers must obtain the explicit, informed consent of the individuals concerned.
If that is not possible, the research should proceed only if the risk is minimal and the benefits are obvious. Fourth, researchers should keep a record of how and where all the data was obtained. Fifth, they should state clearly when they accessed identifiable data without the subjects' consent, and what they did to protect the privacy and security of those affected. These five conditions lead to a sixth: that Institutional Review Boards (IRBs), or similar bodies such as research ethics committees, review the work.
Datasets made public through WikiLeaks or the Panama Papers may offer insight for the public good, but using illegally accessed datasets also carries risks and unintended consequences. Ienca and Vayena propose an approach for capturing some of these benefits while minimizing the potential harm. ®