Channel: Planet Python

ListenData: Datasets for Credit Risk Modeling

This tutorial outlines several free, publicly available datasets that can be used for credit risk modeling. In the banking world, credit risk is a critical business vertical that makes sure the bank has sufficient capital to protect depositors from credit, market and operational risks. In doing so, its role is to keep the bank in compliance with central bank regulations.

Important Credit Risk Modeling Projects

  1. Probability of Default (PD) is the likelihood that a borrower will default on a debt (a loan or credit card). In simple words, it is the expected probability that a customer fails to repay the loan.
  2. Loss Given Default (LGD) is the proportion of the total exposure that is lost when a borrower defaults. It is calculated as (1 - Recovery Rate). For example, someone takes a $200,000 loan from a bank to purchase a flat. The borrower pays some installments and then stops paying. At the time of default, the loan has an outstanding balance of $100,000. The bank takes possession of the flat and sells it for $90,000. The net loss to the bank is $10,000 ($100,000 - $90,000), and the LGD is 10%, i.e. $10,000 / $100,000.
  3. Exposure at Default (EAD) is the amount that the borrower owes the bank at the time of default. In the LGD example above, the outstanding balance of $100,000 is the EAD.
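The LGD arithmetic from the example above can be checked with a few lines of Python. This is only a sketch: the function name `loss_given_default` and the variable names are illustrative assumptions, and the figures are the ones from the flat-purchase example.

```python
def loss_given_default(exposure_at_default: float, recovered_amount: float) -> float:
    """Return LGD as a fraction of the exposure at default (EAD).

    Hypothetical helper for illustration: LGD = 1 - Recovery Rate,
    where Recovery Rate = recovered amount / EAD.
    """
    if exposure_at_default <= 0:
        raise ValueError("exposure_at_default must be positive")
    recovery_rate = recovered_amount / exposure_at_default
    return 1.0 - recovery_rate


ead = 100_000        # outstanding balance at the time of default
recovered = 90_000   # proceeds from selling the flat
net_loss = ead - recovered
lgd = loss_given_default(ead, recovered)

print(f"Net loss: ${net_loss:,}  LGD: {lgd:.0%}")
```

Note that the original loan amount ($200,000) does not enter the calculation; only the balance outstanding at default and the amount recovered matter.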

Datasets for Credit Risk Modeling Projects

We have gathered data from several sources, listed below. The following websites own the copyright on these data and authorize their reproduction.
  1. Kaggle
  2. UCI Machine Learning Repository
  3. Econometric Analysis Book by William H. Greene
  4. Credit scoring and its applications Book by Lyn C. Thomas
  5. Credit Risk Analytics Book by Harald, Daniel and Bart
  6. Lending Club
  7. PAKDD 2009 Data Mining Competition, organized by NeuroTech Ltd. and Center for Informatics of the Federal University of Pernambuco
Kaggle : Home Credit Default Risk
It includes variables from different sources that are required to build a robust and accurate probability of default model.
  • Credit bureau variables, which contain details about the borrower's previous credits provided by other banks
  • Previous Loans that the applicant had with Home Credit
  • Previous Point of sales and cash loans that the applicant had with Home Credit
  • Previous Credit Cards that the applicant had with Home Credit
Download data and data dictionary
Kaggle : Give Me Some Credit
Kaggle organised a competition a few years ago with the following problem statement: build a probability of default model that predicts which borrowers will default in the next two years. Download the data by visiting the website. See the data dictionary below:
Variable Name | Description
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit except real estate and no installment debt like car loans divided by the sum of credit limits
age | Age of borrower in years
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years
DebtRatio | Monthly debt payments, alimony, living costs divided by monthly gross income
MonthlyIncome | Monthly income
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment like car loan or mortgage) and lines of credit (e.g. credit cards)
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans including home equity lines of credit
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years
NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.)
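As a first look at data shaped like this dictionary, one might compute the share of bad accounts and count missing incomes. The sketch below uses only the standard library and a handful of made-up rows. The rows, the `summarize` helper and the use of `NA` as the missing-value marker are assumptions for illustration, not taken from the Kaggle file.

```python
import csv
import io

# Made-up rows for illustration only; they are NOT taken from the Kaggle file.
# Column names follow the data dictionary above.
SAMPLE_CSV = """SeriousDlqin2yrs,age,DebtRatio,MonthlyIncome,NumberOfDependents
1,45,0.80,9120,2
0,40,0.12,2600,1
0,38,0.09,3042,0
0,30,0.04,NA,0
"""


def summarize(csv_text: str) -> dict:
    """Return row count, bad rate and number of missing incomes."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    n = len(rows)
    # SeriousDlqin2yrs is the binary target: 1 = 90+ days past due or worse.
    bad = sum(int(r["SeriousDlqin2yrs"]) for r in rows)
    # Assumption: missing incomes appear as an empty string or "NA".
    missing_income = sum(1 for r in rows if r["MonthlyIncome"] in ("", "NA"))
    return {"rows": n, "bad_rate": bad / n, "missing_income": missing_income}


summary = summarize(SAMPLE_CSV)
print(summary)
```

To run this against the downloaded data, replace `io.StringIO(csv_text)` with an open file handle and check how missing values are actually encoded in that file.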
Econometric Analysis Book by William H. Greene
This book includes credit card data consisting of a binary target variable (1 if the credit card application was accepted, 0 if not) and a few independent variables covering the demographics and credit history of the applicants.

You can download data and its description from this link

UCI Machine Learning Repository
This repository contains sample credit application data from several countries.
The dataset on credit card defaults in Taiwan contains several attributes or characteristics that can be used to test various machine learning algorithms for building a credit scorecard.
Note: The Poland dataset contains attributes of companies rather than retail customers.
PAKDD 2009 Data Mining Competition
It is credit card application data for Brazilian customers. It has a labeled dataset from a one-year period for training a credit scoring model. You can then score the leaderboard dataset, which is from one year later. To download the data, click on this link, Download Data, and then click on the Download button.
Credit Risk Analytics Book
Lending Club
It contains peer-to-peer lending data for issued loans, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. Check out this link - Download Data
Credit scoring and its applications (Lyn C. Thomas, David B. Edelman, Jonathan N. Crook)
Download Data
Data Description is shown below -

Bad Good/bad indicator
1 = Bad
0 = Good

yob Year of birth (If unknown the year will be 99)
nkid Number of children
dep Number of other dependents
phon Is there a home phone (1=yes, 0 = no)
sinc Spouse's income

aes Applicant's employment status
V = Government
W = housewife
M = military
P = private sector
B = public sector
R = retired
E = self employed
T = student
U = unemployed
N = others
Z = no response


dainc Applicant's income
res Residential status
O = Owner
F = tenant furnished
U = Tenant Unfurnished
P = With parents
N = Other
Z = No response

dhval Value of Home
0 = no response or not owner
000001 = zero value
blank = no response

dmort Mortgage balance outstanding
0 = no response or not owner
000001 = zero balance
blank = no response

doutm Outgoings on mortgage or rent
doutl Outgoings on Loans
douthp Outgoings on Hire Purchase
doutcc Outgoings on credit cards
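
The coded fields in the description above (aes, res and the yob value 99) usually need decoding before modeling. A minimal sketch follows, assuming each record is available as a Python dict keyed by the variable names. The `decode_applicant` helper and the example record are hypothetical; the code-to-label mappings are copied from the description.

```python
# Code-to-label mappings copied from the data description above.
AES_LABELS = {
    "V": "Government", "W": "housewife", "M": "military",
    "P": "private sector", "B": "public sector", "R": "retired",
    "E": "self employed", "T": "student", "U": "unemployed",
    "N": "others", "Z": "no response",
}
RES_LABELS = {
    "O": "Owner", "F": "tenant furnished", "U": "Tenant Unfurnished",
    "P": "With parents", "N": "Other", "Z": "No response",
}


def decode_applicant(record: dict) -> dict:
    """Return a copy of the record with coded fields made readable.

    Hypothetical helper: unknown codes are kept visible rather than dropped.
    """
    decoded = dict(record)
    decoded["aes"] = AES_LABELS.get(record.get("aes"), "unknown code")
    decoded["res"] = RES_LABELS.get(record.get("res"), "unknown code")
    # Per the description, a year of birth of 99 means "unknown".
    decoded["yob"] = None if record.get("yob") == 99 else record.get("yob")
    return decoded


example = {"yob": 99, "aes": "P", "res": "O"}  # made-up record for illustration
decoded = decode_applicant(example)
print(decoded)
```

Note that the code "U" means unemployed in aes but Tenant Unfurnished in res, so the two fields need separate mappings.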
