Data Ethics
Investing Wisely in Data at Scale
David Robinson and Miranda Bogen
Executive Summary
“Data at scale” — digital information collected, stored, and used in ways that are newly feasible — opens new avenues for philanthropic investment. At the same time, projects that leverage data at scale create new risks that are not addressed by existing regulatory, legal, and best-practice frameworks. Data-oriented projects funded by major foundations are a natural proving ground for the principles and controls that should guide the ethical treatment of data in the social sector and beyond.
This project is an initial effort to map, for grantmakers at major foundations, the ways that data at scale may pose risks to philanthropic priorities and beneficiaries. It draws on desk research and unstructured interviews with key individuals involved in grantmaking at major U.S. foundations. The resulting report was prepared at the joint request of the MacArthur and Ford Foundations.
Grantmakers are exploring data at scale, but currently have poor visibility into its benefits and risks. Rapid technological change, the scarcity of data science expertise, limited training and resources, and a lack of clear guideposts around emergent risks all contribute to this problem.
Funders have important opportunities to invest in, learn from, and innovate around data-intensive projects, in concert with their grantees. Grantmakers should not treat the new ethical risks of data at scale as a barrier to investment, but these risks also must not become a blind spot that threatens the success and effectiveness of philanthropic projects. Those working with data at scale in the philanthropic context have much to learn: throughout our conversations with stakeholders, we heard consistently that grantmakers and grantees lack baseline knowledge on using data at scale, and many said that they are unsure how to make better informed decisions, both about data’s benefits and about its risks. Existing frameworks address many risks introduced by data-intensive grantmaking, but leave some major gaps. In particular, we found that:
Some new data-intensive research projects involve meaningful risk to vulnerable populations but are not covered by existing human subjects regimes, and there is no structured way to consider these risks. In the philanthropic and public sectors, human subjects review is not always required, and program officers, researchers, and implementers do not yet share a standard for evaluating the ethical implications of using public or existing data, which is often exempt from such review.
Social sector projects often depend on data that reflects patterns of bias or discrimination against vulnerable groups, and face the challenge of avoiding the reinforcement of existing disparities. Automated decisions can absorb and sanitize bias from input data, and responsibly funding or evaluating the statistical models in data-intensive projects increasingly demands advanced mathematical literacy that foundations lack.
Both data and the capacity to analyze it are being concentrated in the private sector, which could marginalize academic and civil society actors. Some individuals and organizations have begun to call attention to these issues and create their own trainings, guidelines, and policies — but ad hoc solutions can only accomplish so much.
To address these and other challenges, we’ve identified eight key questions that program staff and grantees need to consider in data-intensive work:
For a given project, what data should be collected, and who should have access to it?
How can projects decide when more data will help — and when it won’t?
How can grantmakers best manage the reputational risk of data-oriented projects that may be at a frontier of social acceptability?
When concerns are recognized with respect to a data-intensive grant, how will those concerns get aired and addressed?
How can funders and grantees gain the insight they need in order to critique other institutions’ use of data at scale?
How can the social sector respond to the unique leverage and power that large technology companies are developing through their accumulation of data and data-related expertise?
How should foundations and nonprofits handle their own data?
How can foundations begin to make the needed long-term investments in training and capacity?
Newly emergent ethical issues inherent in using data at scale point to the need both for a broader understanding of the possibilities and challenges of using data in the philanthropic context and for conscientious treatment of data ethics issues. Major foundations can play a meaningful role in building that broader understanding, and they can set a positive example by creating space for open and candid reflection on these issues. To those ends, we recommend that funders:
Include data ethics as an element of larger efforts to build data literacy among grantmakers and grantees. Create spaces for conversation and reflection for funders in order to promote data literacy and sensitivity, and invest in education on data-related topics for current and future staff.
Incorporate data ethics in the grantmaking process. Create an internal “data ethics point of contact” who can facilitate access to relevant expertise and keep an eye out for latent data ethics risk in projects, and consider changing grant applicant procedures to encourage applicants to prospectively consider data-related issues. Larger foundations should support a central resource to address data ethics concerns for the philanthropic community.
Create a data ethics checklist for grantees and program staff. Even without introducing any new requirements or policies, equipping staff with guiding questions they can ask about new, data-oriented projects can help funders identify and address areas of ethical concern.
Read our full report here.
Related Work
In The Atlantic, we argue that digital platforms — which deliver vastly more ads than their newsprint predecessors did — are making core civil rights laws increasingly challenging to enforce.
This brief argues that the Computer Fraud and Abuse Act should not criminalize violations of computer use policies, such as terms of service.
How and where, exactly, does big data become a civil rights issue? This report begins to answer that question, highlighting key instances where big data and civil rights intersect.
In a paper presented at the 2020 ACM Conference on Fairness, Accountability, and Transparency, we describe how and when private companies collect or infer sensitive attribute data, such as a person’s race or ethnicity, for antidiscrimination purposes.