A study published in the Journal of Financial Economics found that borrowers from minority groups are charged interest rates nearly 8% higher and are rejected for loans 14% more often than borrowers from privileged groups. This is despite the U.S. Equal Credit Opportunity Act, which prohibits discrimination in mortgage lending.
These biases often leak into the machine learning models that lenders use to streamline decision-making, and they may even contribute to widening the racial wealth gap.
If these models are trained on a discriminatory dataset, for instance one in which Black borrowers are denied loans more often than white borrowers with the same income and credit score, those biases will carry over into the model’s predictions when it is applied to real-life situations. To address this, MIT researchers created a process that removes bias from the data used to train these machine learning models.
The researchers’ method is novel because it removes bias from a dataset with multiple sensitive attributes, such as race and ethnicity. Sensitive attributes are features like race or sex, and sensitive options are the values those attributes can take; together they differentiate a privileged group from an underprivileged group.
The researchers used their method, referred to as ‘Dualfair’, to train a machine learning classifier that makes fair predictions about which borrowers should receive mortgage loans. When applied to mortgage lending data from several U.S. states, the method significantly decreased bias in the predictions while maintaining high accuracy.
‘As Sikh Americans, we deal with bias on a frequent basis and we think it is unacceptable to see that transform to algorithms in real-world applications. For things like mortgage lending and financial systems, it is very important that bias not infiltrate these systems because it can emphasize the gaps that are already in place against certain groups,’ said Jashandeep Singh, co-lead author of the paper with his twin brother, Arashdeep. They were recently accepted into MIT.
The research was recently published and will appear in a special issue of Machine Learning and Knowledge Extraction.
Dualfair handles both label bias and selection bias in a mortgage lending dataset. Label bias occurs when the balance of favorable or unfavorable outcomes for a particular group, such as a racial group, is unfair. Selection bias occurs when the data is not representative of the larger population, for example when it only includes people from a neighborhood where incomes are generally low.
The Dualfair process removes label bias by subdividing a dataset into the largest possible number of subgroups based on combinations of sensitive attributes and options, such as Black women who are Hispanic or Latino, and so on. By working at this level of detail, Dualfair can tackle discrimination on multiple attributes simultaneously.
‘Researchers have mostly tried to classify biased cases as binary so far. There are multiple parameters to bias, and these multiple parameters have their own impact in different cases. They are not equally weighed. Our method is able to calibrate it much better,’ says Gupta.
Once the subgroups are generated, Dualfair evens out the number of borrowers in each subgroup by duplicating people from minority groups and removing people from the majority group. It then balances the proportion of loan acceptances and rejections in each subgroup so they match the median in the original dataset before recombining the subgroups.
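The rebalancing step can be pictured with a short sketch. The Python/pandas snippet below is only a minimal illustration, assuming hypothetical column names (race, sex, ethnicity, approved) and plain random over- and under-sampling in place of Dualfair’s actual balancing procedure, which is spelled out in the paper:

```python
import pandas as pd

def debias_sketch(df, sensitive_cols=("race", "sex", "ethnicity"),
                  label_col="approved", seed=0):
    """Toy stand-in for Dualfair's rebalancing step: split the data into one
    subgroup per combination of sensitive options, resample every subgroup to
    a common size, and push each subgroup's acceptance rate toward the median
    rate observed across subgroups in the original dataset."""
    groups = {key: g for key, g in df.groupby(list(sensitive_cols))}

    # Common subgroup size and target acceptance rate (medians over subgroups).
    target_size = int(pd.Series({k: len(g) for k, g in groups.items()}).median())
    target_rate = pd.Series({k: g[label_col].mean() for k, g in groups.items()}).median()

    balanced = []
    for g in groups.values():
        pos = g[g[label_col] == 1]   # accepted applications
        neg = g[g[label_col] == 0]   # rejected applications
        if pos.empty or neg.empty:
            continue  # skip degenerate subgroups in this toy sketch

        # Oversample small subgroups / undersample large ones, while rebalancing
        # accepted vs. rejected labels toward the target acceptance rate.
        n_pos = max(1, int(round(target_rate * target_size)))
        n_neg = max(1, target_size - n_pos)
        pos = pos.sample(n=n_pos, replace=len(pos) < n_pos, random_state=seed)
        neg = neg.sample(n=n_neg, replace=len(neg) < n_neg, random_state=seed)
        balanced.append(pd.concat([pos, neg]))

    # Recombine the balanced subgroups into a single de-biased training set.
    return pd.concat(balanced).reset_index(drop=True)
```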
To test their method, the researchers used the publicly available Home Mortgage Disclosure Act dataset, which covers about 88% of all mortgage loans in the U.S. in 2019 and includes 21 features such as race, ethnicity, and sex. They used Dualfair to ‘de-bias’ the complete dataset as well as smaller datasets for six states, then trained a machine learning model to predict loan acceptances and rejections.
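Downstream, the de-biased table can be fed to an ordinary classifier. A minimal sketch, assuming scikit-learn, the output of the rebalancing sketch above, and a hypothetical ‘approved’ label column (the researchers’ actual model and feature preparation are not specified here):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# `data` is the de-biased frame; assume the features have already been
# numerically encoded (e.g. one-hot) and `approved` is the 0/1 label.
X = data.drop(columns=["approved"])
y = data["approved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Fit a standard classifier on the de-biased training split and check accuracy.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```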
After Dualfair was applied, bias in the predictions decreased while accuracy remained high. The researchers measured this with an existing fairness metric called average odds difference, which can only measure fairness in one sensitive attribute at a time.
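Average odds difference is a standard single-attribute metric: the average of the gaps in true-positive and false-positive rates between the unprivileged and privileged groups. A small sketch of how it can be computed, assuming 0/1 NumPy arrays for labels, predictions, and group membership (this is an illustration, not the researchers’ code):

```python
import numpy as np

def average_odds_difference(y_true, y_pred, privileged):
    """Mean of the FPR gap and the TPR gap between the unprivileged
    (privileged == 0) and privileged (privileged == 1) groups.
    A value near zero indicates similar error rates for both groups."""
    def rates(mask):
        yt, yp = y_true[mask], y_pred[mask]
        tpr = yp[yt == 1].mean() if (yt == 1).any() else 0.0  # true-positive rate
        fpr = yp[yt == 0].mean() if (yt == 0).any() else 0.0  # false-positive rate
        return tpr, fpr

    tpr_u, fpr_u = rates(privileged == 0)
    tpr_p, fpr_p = rates(privileged == 1)
    return 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))
```

Because the metric compares exactly two groups on a single attribute, it cannot capture bias across race, sex, and ethnicity at once.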
So the researchers also created their own fairness metric, called alternate world index, which considers bias from multiple sensitive attributes and options as a whole, and applied it as well.
‘It is the common belief that if you want to be accurate, you have to give up on fairness, or if you want to be fair, you have to give up on accuracy. We show that we can make strides toward lessening that gap,’ Khan said.
The researchers now plan to apply their method to de-bias other types of datasets, such as those covering car insurance rates, job applications, and healthcare outcomes. They also plan to address a limitation of Dualfair: its instability when there are small amounts of data with multiple sensitive attributes and options.
‘Technology, very bluntly, works only for a certain group of people. In the mortgage loan domain in particular, African American women have been historically discriminated against. We feel passionate about making sure that systemic racism does not extend to algorithmic models. There is no point in making an algorithm that can automate a process if it doesn’t work for everyone equally,’ says Khan.
By Marvellous Iwendi.
Source: MIT News