Rethink Credit Scores: Ensuring Fair Lending through NLP for Transaction Categorization
- Tech Stack: Python, HTML, CSS, Pandas, Scikit-learn, NumPy, NLTK
- Mentorship Company: Petal
- Mentors: Brian Duke, Berk Ustun
- Project date: October 2022 - March 2023
- Project URL: https://dsc180a.github.io/rethink_creditscore.github.io/
Abstract
For decades, the banking industry has assessed creditworthiness the same way: lenders use large amounts of data, such as an applicant's current debt, the accounts they have opened, and the ratio of money owed to available credit, to determine loan qualification. This system has generally worked well, but it is unfair to people with little or no credit history, such as immigrants or young adults who are trying to build credit. These applicants, often called “credit invisible”, can find it extremely difficult to be approved for loans or other forms of credit. In this paper, we introduce a more comprehensive assessment framework that lets individuals submit their past banking history as supplementary material for assessing creditworthiness.
Achievements
- Led a team of 4 to develop a comprehensive method for credit risk assessment in the banking sector, using Natural Language Processing techniques such as TF-IDF and BERT to analyze banking history.
- Conducted data analysis to evaluate data integrity, then applied feature transformations to categorize each transaction using Pandas and Scikit-learn.
- Achieved 88% accuracy by using TF-IDF to categorize 500,000 transactions into 8 categories.
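
The TF-IDF categorization step can be illustrated with a minimal sketch. It is not the project's actual pipeline. The transaction memos, the three category names, and the choice of logistic regression as the classifier are all assumptions made for illustration; the project itself used 8 categories and roughly 500,000 transactions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical transaction memos and labels (not the project's data).
memos = [
    "STARBUCKS COFFEE 1234",
    "MCDONALDS RESTAURANT",
    "SHELL GAS STATION",
    "CHEVRON GAS",
    "NETFLIX SUBSCRIPTION",
    "SPOTIFY SUBSCRIPTION",
]
labels = [
    "Food",
    "Food",
    "Gas",
    "Gas",
    "Entertainment",
    "Entertainment",
]


def build_categorizer():
    """Build a TF-IDF + logistic regression pipeline.

    The classifier is an assumption; the source only states that
    TF-IDF was used to represent the transaction text.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True),     # memo text -> TF-IDF vector
        LogisticRegression(max_iter=1000),   # TF-IDF vector -> category
    )


categorizer = build_categorizer()
categorizer.fit(memos, labels)

# Categorize an unseen (hypothetical) memo.
new_memos = ["CHEVRON GAS STATION 42"]
predicted = categorizer.predict(new_memos)
print(predicted[0])
```

On real data, accuracy would be measured on a held-out split (for example with `sklearn.model_selection.train_test_split`) rather than on the memos used for fitting.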