It’s no secret that machine learning systems can exhibit the same bad behaviors, such as racism and sexism, as the humans whose data was used to train them. Garbage in, garbage out, right? It’s not quite that simple, though. The “garbage” is just one subtle component of the system’s complete output, alongside powerful and valuable insights into the data. That is to say, with some exceptions, the bias isn’t always easy to see.
Machine learning biases can have deep and lasting effects on individuals and society at large.
Criminal judges rely in part on ML risk assessment systems when making decisions about recidivism, bail, and sentencing for persons accused or convicted of a crime.
Police use software to guide the areas they patrol. The software attempts to predict where crime is likely to occur. In practice, this type of software often predicts arrests as a proxy for crime, resulting in overpolicing and a self-reinforcing loop: police use the software to focus on certain areas, which produces more arrests in those areas; those arrests feed back into the software, which then predicts still more arrests in the same areas. Several companies already produce predictive policing software, and because it is a lucrative business, many more are entering the market.
Financial institutions use machine learning to decide whether or not to issue credit to an applicant, and at what rates.
Many companies are using machine learning to evaluate employment applications and resumes.
The biases aren’t intentional. The agencies and companies producing and using this software simply don’t consider the social effects of the technology, or lack the resources to do so effectively. Data wrangling and machine learning are difficult. This naturally leads to a focus on the technology and the data. The social implications are often an afterthought.
One way to mitigate the social and ethical harms of bias in machine learning systems would be to audit the systems to test for bias and identify the points at which the bias was introduced. The audit team should, of course, include a technologist who understands the data and machine learning components, but it should also include someone trained in the social implications of the software. A team might comprise, for example, a data scientist, a data engineer, and a sociologist.
Such teams could be assembled within an organization of sufficient size with good hiring practices. Alternatively, outside consultants could be engaged to carry out these audits.
The auditors would examine the product from data collection and transformation, to model training and testing, to output and application. This would either result in a certification that the system is ethically sound, or a list of concerns and steps to address them, including qualitative and quantitative approaches to improvement.
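To make the quantitative side concrete, here is a minimal sketch of one check an audit team might run on a system’s outputs: the “four-fifths rule” for disparate impact, which flags large gaps in selection rates between groups. The function names and toy data below are illustrative assumptions, not part of any particular audit standard or product.

```python
def selection_rate(outcomes):
    """Fraction of positive outcomes (e.g., loans approved, resumes advanced)
    for one group. Outcomes are 1 (selected) or 0 (not selected)."""
    return sum(outcomes) / len(outcomes)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower group selection rate to the higher one.
    Under the four-fifths rule of thumb, ratios below 0.8 are
    commonly flagged for further review."""
    rate_a = selection_rate(group_a)
    rate_b = selection_rate(group_b)
    return min(rate_a, rate_b) / max(rate_a, rate_b)

# Toy example: 1 = approved, 0 = denied
group_a = [1, 1, 1, 0, 1, 1, 0, 1]  # 75% approval rate
group_b = [1, 0, 0, 1, 0, 0, 1, 0]  # 37.5% approval rate

ratio = disparate_impact_ratio(group_a, group_b)
print(f"Disparate impact ratio: {ratio:.2f}")  # 0.375 / 0.75 = 0.50
```

A ratio of 0.50, well below the 0.8 threshold, would prompt the auditors to trace back through the pipeline, from data collection through model training, to find where the disparity was introduced. Checks like this are only one quantitative tool; they complement, rather than replace, the qualitative review described above.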
Incidentally, such audits would (or should) also cover regulatory compliance. They would help companies demonstrate compliance to regulators, and would reduce exposure to civil rights lawsuits based on bias.
Local and federal government agencies should create standards for fairness in these systems and require periodic auditing. Companies that produce such software should also have incentives to audit and certify their products. For example, government customers might be required to purchase only audited and certified products in these areas, and private companies might prefer certified products as well.
We’ve only found a few groups who appear to be working in this space so far: Ethical Resolve, Rocky Coast Research, O’Neil Risk Consulting, and Probable Models. In the coming months and years, we hope to see broader adoption of auditors like these.
We’re hiring! The Cloudera Fast Forward Labs Team is hiring a Data Strategist. Please help us spread the word!
And, as always, thank you for reading! We welcome your comments and feedback; please reach out anytime to email@example.com.
All the best,
The Cloudera Fast Forward Labs Team