# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline


# read data set into pandas DataFrame
prosper_dataset = pd.read_csv("prosperLoanData.csv")

# get a general overview of the data set.
prosper_dataset.shape

(113937, 81)


# get information on the dataset's features and completeness

prosper_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113937 entries, 0 to 113936
Data columns (total 81 columns):
 #   Column                               Non-Null Count   Dtype  
---  ------                               --------------   -----  
 0   ListingKey                           113937 non-null  object 
 1   ListingNumber                        113937 non-null  int64  
 2   ListingCreationDate                  113937 non-null  object 
 3   CreditGrade                          28953 non-null   object 
 4   Term                                 113937 non-null  int64  
 5   LoanStatus                           113937 non-null  object 
 6   ClosedDate                           55089 non-null   object 
 7   BorrowerAPR                          113912 non-null  float64
 8   BorrowerRate                         113937 non-null  float64
 9   LenderYield                          113937 non-null  float64
 10  EstimatedEffectiveYield              84853 non-null   float64
 11  EstimatedLoss                        84853 non-null   float64
 12  EstimatedReturn                      84853 non-null   float64
 13  ProsperRating (numeric)              84853 non-null   float64
 14  ProsperRating (Alpha)                84853 non-null   object 
 15  ProsperScore                         84853 non-null   float64
 16  ListingCategory (numeric)            113937 non-null  int64  
 17  BorrowerState                        108422 non-null  object 
 18  Occupation                           110349 non-null  object 
 19  EmploymentStatus                     111682 non-null  object 
 20  EmploymentStatusDuration             106312 non-null  float64
 21  IsBorrowerHomeowner                  113937 non-null  bool   
 22  CurrentlyInGroup                     113937 non-null  bool   
 23  GroupKey                             13341 non-null   object 
 24  DateCreditPulled                     113937 non-null  object 
 25  CreditScoreRangeLower                113346 non-null  float64
 26  CreditScoreRangeUpper                113346 non-null  float64
 27  FirstRecordedCreditLine              113240 non-null  object 
 28  CurrentCreditLines                   106333 non-null  float64
 29  OpenCreditLines                      106333 non-null  float64
 30  TotalCreditLinespast7years           113240 non-null  float64
 31  OpenRevolvingAccounts                113937 non-null  int64  
 32  OpenRevolvingMonthlyPayment          113937 non-null  float64
 33  InquiriesLast6Months                 113240 non-null  float64
 34  TotalInquiries                       112778 non-null  float64
 35  CurrentDelinquencies                 113240 non-null  float64
 36  AmountDelinquent                     106315 non-null  float64
 37  DelinquenciesLast7Years              112947 non-null  float64
 38  PublicRecordsLast10Years             113240 non-null  float64
 39  PublicRecordsLast12Months            106333 non-null  float64
 40  RevolvingCreditBalance               106333 non-null  float64
 41  BankcardUtilization                  106333 non-null  float64
 42  AvailableBankcardCredit              106393 non-null  float64
 43  TotalTrades                          106393 non-null  float64
 44  TradesNeverDelinquent (percentage)   106393 non-null  float64
 45  TradesOpenedLast6Months              106393 non-null  float64
 46  DebtToIncomeRatio                    105383 non-null  float64
 47  IncomeRange                          113937 non-null  object 
 48  IncomeVerifiable                     113937 non-null  bool   
 49  StatedMonthlyIncome                  113937 non-null  float64
 50  LoanKey                              113937 non-null  object 
 51  TotalProsperLoans                    22085 non-null   float64
 52  TotalProsperPaymentsBilled           22085 non-null   float64
 53  OnTimeProsperPayments                22085 non-null   float64
 54  ProsperPaymentsLessThanOneMonthLate  22085 non-null   float64
 55  ProsperPaymentsOneMonthPlusLate      22085 non-null   float64
 56  ProsperPrincipalBorrowed             22085 non-null   float64
 57  ProsperPrincipalOutstanding          22085 non-null   float64
 58  ScorexChangeAtTimeOfListing          18928 non-null   float64
 59  LoanCurrentDaysDelinquent            113937 non-null  int64  
 60  LoanFirstDefaultedCycleNumber        16952 non-null   float64
 61  LoanMonthsSinceOrigination           113937 non-null  int64  
 62  LoanNumber                           113937 non-null  int64  
 63  LoanOriginalAmount                   113937 non-null  int64  
 64  LoanOriginationDate                  113937 non-null  object 
 65  LoanOriginationQuarter               113937 non-null  object 
 66  MemberKey                            113937 non-null  object 
 67  MonthlyLoanPayment                   113937 non-null  float64
 68  LP_CustomerPayments                  113937 non-null  float64
 69  LP_CustomerPrincipalPayments         113937 non-null  float64
 70  LP_InterestandFees                   113937 non-null  float64
 71  LP_ServiceFees                       113937 non-null  float64
 72  LP_CollectionFees                    113937 non-null  float64
 73  LP_GrossPrincipalLoss                113937 non-null  float64
 74  LP_NetPrincipalLoss                  113937 non-null  float64
 75  LP_NonPrincipalRecoverypayments      113937 non-null  float64
 76  PercentFunded                        113937 non-null  float64
 77  Recommendations                      113937 non-null  int64  
 78  InvestmentFromFriendsCount           113937 non-null  int64  
 79  InvestmentFromFriendsAmount          113937 non-null  float64
 80  Investors                            113937 non-null  int64  
dtypes: bool(3), float64(50), int64(11), object(17)
memory usage: 68.1+ MB


# get random sampe of dataset

prosper_dataset.sample(10)


# Generate random dataset and seed for reproducibility.
np.random.seed(5000)
sample = np.random.choice(prosper_dataset.shape[0], 2000, replace=False)
prosper_data_rand = prosper_dataset.loc[sample]


#  summary statistics of numerical variables

prosper_dataset.describe().style.format("{0:,.3f}")


# convert the ListingCreationDate Series to the datetime data type

prosper_dataset["ListingCreationDate"] = pd.to_datetime(prosper_dataset["ListingCreationDate"])


# Confirm the new data type of the column

prosper_dataset["ListingCreationDate"].info()

<class 'pandas.core.series.Series'>
RangeIndex: 113937 entries, 0 to 113936
Series name: ListingCreationDate
Non-Null Count   Dtype         
--------------   -----         
113937 non-null  datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 890.3 KB


# Check for the start date for the data set

minimum_date = prosper_dataset["ListingCreationDate"].min()

print(f"The start date for the dataset is {minimum_date.day_name()}, {minimum_date.month_name()}, {minimum_date.day}, {minimum_date.year}")

The start date for the dataset is Wednesday, November, 9, 2005


# Check for the end date for the data set

maximum_date = prosper_dataset["ListingCreationDate"].max()

print(f"The end date for the dataset is {maximum_date.day_name()}, {maximum_date.month_name()}, {maximum_date.day}, {maximum_date.year}")

The end date for the dataset is Monday, March, 10, 2014


# Get a count of the number of listings by year
listing_year_count = prosper_dataset["ListingCreationDate"].dt.strftime('%Y').value_counts().to_frame().rename_axis("Year")

# Add percentages to the above dataframe
listing_year_count["percentage"] = listing_year_count["ListingCreationDate"] / len(prosper_dataset)*100

listing_year_count


# Get a count of the number of listings by month
listing_month_count = prosper_dataset["ListingCreationDate"].dt.strftime('%B').value_counts().to_frame().rename_axis("Month")

# Add percentages to the above dataframe
listing_month_count["percentage"] = listing_month_count["ListingCreationDate"] / len(prosper_dataset)*100

listing_month_count


# Get a count of the number of listings by day of the week
listing_day_count = prosper_dataset["ListingCreationDate"].dt.strftime('%A').value_counts().to_frame().rename_axis("Weekday")

# Add percentages to the above dataframe
listing_day_count["percentage"] = listing_day_count["ListingCreationDate"] / len(prosper_dataset)*100

listing_day_count


# Define a function that creates a simple bar chart from a value count dataframe
def simple_bar_plot(df, xlabel, ylabel, title):
    """
    This simple function creates a bar chart from a value_counts dataframe.
    It takes four parameters:
    df (DataFrame): the value_counts dataframe
    xlabel (str): the x label for the plot
    ylabel (str): the y label for the plot
    title (str): the title of the plot
    """
    fig, ax = plt.subplots(figsize=(10, 6))
    plot = df.plot(kind="bar", ax=ax)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.legend()
    ax.get_legend().remove()
    return plot


# plot bar graph for months count

simple_bar_plot(listing_year_count, "Listing Year",
                "Number of borrowers",
                "Bar plot for borrower listings by year");


# plot bar graph for months count
simple_bar_plot(listing_month_count, "Listing Month",
                "Number of borrowers",
                "Bar plot for borrower listings by month");


# plot bar graph for days count
simple_bar_plot(listing_day_count, "Listing Day",
                "Number of borrowers",
                "Bar plot for bumber of borrower listings by day");


prosper_dataset["Term"].describe()

count    113937.000000
mean         40.830248
std          10.436212
min          12.000000
25%          36.000000
50%          36.000000
75%          36.000000
max          60.000000
Name: Term, dtype: float64


unique_vals = prosper_dataset["Term"].unique()

unique_vals

array([36, 60, 12], dtype=int64)


#  Get the count of each unique value in the Term column

unique_counts = prosper_dataset["Term"].value_counts()

unique_counts

36    87778
60    24545
12     1614
Name: Term, dtype: int64


# Define a function for plotting bar charts using seaborn.

def bar_chart_plotter(df, x, title):
    """
    This function will be used to plot bar graphs using seaborn's countplot. 
    The parameters are the dataframe (df), Series to plot(x),
    title of plot (title), xlabel (xlabel), and ylabel (ylabel)
    """
    sns.color_palette()
    base_color = sns.color_palette()[0]
    sns.set_theme(style="darkgrid")
    plt.title(title)
    return sns.countplot(data=df, x=x, color=base_color);


# Define a function for plotting pie charts using pyplot.

def pie_plotter(x, labels, title, textprops, autopct, explode):
    """
    x (list or series): The cound of categories to be plotted
    labels (list or series): The category labels to be plotted
    title (string): title of the pie plot
    textprops (dictionary): text customizations
    autopct (string): formating for the text
    explode (list): list representing the pie offset. Should be the same length as the number of categories
    """
    
    plt.title(title)
    return plt.pie(x=x, labels=labels, textprops=textprops, autopct=autopct, explode=explode)
    plt.legend()


# Plot a bar graph and pie chart of the loan term for the dataset in one plot
plt.figure(figsize = [20, 6])

plt.subplot(1, 2, 1)
bar_chart_plotter(prosper_dataset,
                  prosper_dataset["Term"],
                  "Plot of Prosper Loan Term in months")
plt.xlabel("Term (months)")
plt.ylabel("Number of listings")

plt.subplot(1, 2, 2)
pie_plotter(unique_counts,
            unique_vals,
            "Pie chart showing Term of Prosper Loan Dataset in months",
            {"fontsize":15},
            "%0.1f%%",
            [0.01,0.01,0.2])
plt.legend();


# get an idea of the values in out LoanStatus categories
print(prosper_dataset["LoanStatus"].nunique())

prosper_dataset["LoanStatus"].unique()

12

array(['Completed', 'Current', 'Past Due (1-15 days)', 'Defaulted',
       'Chargedoff', 'Past Due (16-30 days)', 'Cancelled',
       'Past Due (61-90 days)', 'Past Due (31-60 days)',
       'Past Due (91-120 days)', 'FinalPaymentInProgress',
       'Past Due (>120 days)'], dtype=object)


def plt_bar_plotter(x, y, xlabel, title, rotation):
    """
    This function plots a bar chart using pyplot and takes the following parameters::
    x (list or series): category to be plotted on the x-axis
    y (list or series): height of the categories plotted on the x-axis
    xlabel (string): label for the x-axis
    title (string): title of the plot
    rotation (int): xticks the rotation on the x-axis
    """

    # add text to bar chart using for loop
    for i, g in enumerate(y):
        plt.text(i, g, f"{g}", ha="center", fontsize=16, fontweight="bold")

    # add labels to plot 
    plt.xlabel(xlabel) # add x-label
    plt.xticks(rotation=rotation) # rotate the x-ticks
    plt.yticks([]) # remove y-ticks
    sns.despine(left=True) # remove left spine
    plt.title(title) # add title to plot
    return plt.bar(x, y, edgecolor="black", color="xkcd:light gray blue");


fig, ax = plt.subplots(figsize=(15, 5)) # create plot canvas
# save the categories and their counts into variables for plotting
loan_status = list(prosper_dataset["LoanStatus"].unique())
loan_status_counts = prosper_dataset["LoanStatus"].value_counts(sort=True).to_list()

# plot bar chart of loan statuses using pyplot
plt_bar_plotter(loan_status, # loan status categories
                loan_status_counts, # count of categories
                "Loan Status", # x label
                "Current Status of Loans", # title
                90); # xticks rotation


prosper_dataset["BorrowerAPR"].describe()

count    113912.000000
mean          0.218828
std           0.080364
min           0.006530
25%           0.156290
50%           0.209760
75%           0.283810
max           0.512290
Name: BorrowerAPR, dtype: float64


# Function that plots a histogran
def hist_plotter(df, x, bins, xlabel, ylabel, title):
    """
    This function plots a histogram using the following parameters:
    df (DataFrame): The dataframe object 
    x (series): The series to be plotted on the histogram
    bin (list): the number of bins to plot
    xlabel (string): label on the x-axis
    ylabel (string): label on the y-axis
    title (string): title of the plot
    """
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    return plt.hist(data=df, x=x, bins=bins)


fig, ax = plt.subplots(figsize=(10, 5)) # create plot canvas
# customize bins
bins = np.arange(0.00, 0.5+0.1, 0.03)

# plot histogram using borrower's APR
hist_plotter(prosper_dataset,
             "BorrowerAPR",
             bins, 
             "Annual Percentage Rate (APR)","Number of Listings",
             "Borrower's Annual Percentage Rate (APR) Distribution");


# Get the number of states with borrowers 
print(prosper_dataset["BorrowerState"].nunique())

# Print out the list of all the states with borrowers listed
prosper_dataset["BorrowerState"].unique()

51

array(['CO', 'GA', 'MN', 'NM', 'KS', 'CA', 'IL', 'MD', nan, 'AL', 'AZ',
       'VA', 'FL', 'PA', 'OR', 'MI', 'NY', 'LA', 'WI', 'OH', 'NC', 'WA',
       'NV', 'NJ', 'TX', 'AR', 'SC', 'DE', 'MO', 'NE', 'UT', 'DC', 'MA',
       'CT', 'IN', 'KY', 'OK', 'MS', 'WV', 'RI', 'TN', 'ID', 'MT', 'HI',
       'NH', 'VT', 'WY', 'ME', 'AK', 'IA', 'SD', 'ND'], dtype=object)


# Get the states with the highest number of borrowers.

print(prosper_dataset["BorrowerState"].value_counts().head(10))

prosper_dataset["BorrowerState"].value_counts().tail(10)

CA    14717
TX     6842
NY     6729
FL     6720
IL     5921
GA     5008
OH     4197
MI     3593
VA     3278
NJ     3097
Name: BorrowerState, dtype: int64

DC    382
MT    330
DE    300
VT    207
AK    200
SD    189
IA    186
WY    150
ME    101
ND     52
Name: BorrowerState, dtype: int64


# save variables for plotting
# highest number of borrowers
state_list_top = prosper_dataset["BorrowerState"].value_counts(sort=True).head(10).index
state_counts_top = prosper_dataset["BorrowerState"].value_counts(sort=True).head(10).to_list()
# least number of borrowers
state_list_bottom = prosper_dataset["BorrowerState"].value_counts(sort=True).tail(10).index
state_counts_bottom = prosper_dataset["BorrowerState"].value_counts(sort=True).tail(10).to_list()

fig, ax = plt.subplots(figsize=(20, 8)) # create plot canvas

plt.subplot(1, 2, 1)
# plot bar chart of borrower states
plt_bar_plotter(state_list_top, # borrower state
                state_counts_top, # count of borrowers
                "State", # x-label
                "Highest Number of borrowers by State", # title
                90); # xticks rotation

plt.subplot(1, 2, 2)
# plot bar chart of borrower states
plt_bar_plotter(state_list_bottom, # borrower state
                state_counts_bottom, # count of borrowers
                "State", # x-label
                "Lowest Number of borrowers by State", # title
                90); # xticks rotation


employ_status = prosper_dataset.groupby("EmploymentStatus").size().reset_index(name="counts")

employ_status


fig, ax = plt.subplots(figsize=(10, 6)) # create plot canvas

plt_bar_plotter(employ_status["EmploymentStatus"], # employment status
                employ_status["counts"], # count of status
                "Employment Status", # x-label
                "Number of Borrowers by Employment Status as at listing time", # title
                45); # xticks rotation


homeowner_status = prosper_dataset["IsBorrowerHomeowner"].unique()

homeowner_status

array([ True, False])


homeowner_counts = prosper_dataset["IsBorrowerHomeowner"].value_counts().to_list()

homeowner_counts

[57478, 56459]


# Plot pie chart of homeonwer status

fig, ax = plt.subplots(figsize=(10, 6)) # create plot canvas

pie_plotter(homeowner_counts, # count of homeowners
            homeowner_status, # labels
            "Home Owner Status among listed borrowers", # title of plot
            {"fontsize":15, "fontweight":"bold"}, # text customization
            "%0.1f%%", # Value formatting
            [0.001,0.001]) # explode pie slices
plt.legend();


income_range_counts = prosper_dataset["IncomeRange"].value_counts().to_frame()

income_range_counts["percentage"] = income_range_counts["IncomeRange"] / len(prosper_dataset) * 100

income_range_counts


# plot Income range distribution on bar chart

fig, ax = plt.subplots(figsize=(12, 6)) # create plot canvas

plt_bar_plotter(income_range_counts.index, # values to plot
                income_range_counts["IncomeRange"].to_list(), # count of values
                "Income Range", # x-label
                "Income Range of listed borrowers", # title
                45); # xticks rotation


# Generate random dataset and seed for reproducibility.
np.random.seed(5000)
sample = np.random.choice(prosper_dataset.shape[0], 2000, replace=False)
prosper_data_rand = prosper_dataset.loc[sample]


# set display format for floats
pd.options.display.float_format = '{:.4f}'.format


# Get summary statistics for income distribution
prosper_dataset["StatedMonthlyIncome"].describe()

count    113937.0000
mean       5608.0256
std        7478.4973
min           0.0000
25%        3200.3333
50%        4666.6667
75%        6825.0000
max     1750002.9167
Name: StatedMonthlyIncome, dtype: float64


# Plot histogram from the stated monthly income column to see the distribution
fig, ax = plt.subplots(figsize=(10, 5)) # create plot canvas
# customize bins
bins = np.arange(0.00, 1750000, 100000)

# plot histogram of stated income
hist_plotter(prosper_dataset,
             "StatedMonthlyIncome",
             bins, 
             "Stated Income",
             "Number of Listings",
             "Income distibution for borrowers");


# subset data to only those with income above zero. 
# We will continue working with this new dataset henceforth.
prosper_dataset = prosper_dataset[prosper_dataset["StatedMonthlyIncome"] > 0]

print(prosper_dataset["StatedMonthlyIncome"].info())

prosper_dataset["StatedMonthlyIncome"].describe()

<class 'pandas.core.series.Series'>
Int64Index: 112543 entries, 0 to 113936
Series name: StatedMonthlyIncome
Non-Null Count   Dtype  
--------------   -----  
112543 non-null  float64
dtypes: float64(1)
memory usage: 1.7 MB
None

count    112543.0000
mean       5677.4887
std        7498.4193
min           0.0833
25%        3293.3333
50%        4703.0000
75%        6833.3333
max     1750002.9167
Name: StatedMonthlyIncome, dtype: float64


# Get idea of bins to plot from log transform
np.log(prosper_dataset["StatedMonthlyIncome"].describe())

count   11.6311
mean     8.6443
std      8.9224
min     -2.4849
25%      8.0997
50%      8.4560
75%      8.8296
max     14.3751
Name: StatedMonthlyIncome, dtype: float64


# Plot histogram with the log transform
fig, ax = plt.subplots(figsize=(15, 6)) # create plot canvas
# customize bins
bins = 10 ** np.arange(3, 5, 0.03)

# Generate the x-ticks you want to apply 
ticks = [1000, 3000, 5000, 10000, 30000, 50000, 100000]

# Convert ticks into string values, to be displayed along the x-axis
labels = ['{}'.format(v) for v in ticks]

# plot histogram of stated income
hist_plotter(prosper_dataset,
             "StatedMonthlyIncome",
             bins, 
             "Stated Monthly Income",
             "Number of Borrowers",
             "Income distibution for borrowers")
# Apply log scale
plt.xscale("log")
# Apply x-ticks
plt.xticks(ticks, labels);


# Get summary statistics for the ProsperScore variable
prosper_dataset["ProsperScore"].describe()

count   84160.0000
mean        5.9558
std         2.3788
min         1.0000
25%         4.0000
50%         6.0000
75%         8.0000
max        11.0000
Name: ProsperScore, dtype: float64


# Get the value count for each rating
prosper_dataset["ProsperScore"].value_counts()

4.0000     12477
6.0000     12155
8.0000     11980
7.0000     10506
5.0000      9698
3.0000      7563
9.0000      6883
2.0000      5710
10.0000     4747
11.0000     1456
1.0000       985
Name: ProsperScore, dtype: int64


# subset dataframe to only listings with ratings between 1 and 10.
prosper_ratings_normal = prosper_dataset[prosper_dataset["ProsperScore"] < 11]

# Group the new subset by score
prosper_ratings_normal_group = prosper_ratings_normal.groupby("ProsperScore").size().reset_index(name="counts")

# describe the subset prosper score
print(prosper_ratings_normal["ProsperScore"].describe())

# view the new dataframe summary
prosper_ratings_normal_group

count   82704.0000
mean        5.8670
std         2.3027
min         1.0000
25%         4.0000
50%         6.0000
75%         8.0000
max        10.0000
Name: ProsperScore, dtype: float64


# Plot bar graph
fig, ax = plt.subplots(figsize=(10, 6)) # create plot canvas

bar_chart_plotter(prosper_ratings_normal,
                  prosper_ratings_normal["ProsperScore"],
                  "Prosper Score rating for listed borrowers")
plt.xlabel("ProsperScore")
plt.ylabel("Number of listings");


# Print out the list of categories
print(prosper_dataset["ListingCategory (numeric)"].nunique())

cats = list(prosper_dataset["ListingCategory (numeric)"].unique())

cats

21

[0, 2, 16, 1, 7, 13, 6, 15, 20, 19, 18, 3, 8, 4, 11, 14, 5, 9, 17, 10, 12]


# Create dataframe with full listing category

listing_cats = prosper_dataset.groupby("ListingCategory (numeric)").size().reset_index(name="counts")

listing_cats["Listing Category"] = ["Not Available", "Debt Consolidation", "Home Improvement",
                                    "Business", "Personal Loan", "Student Use", "Auto", "Other",
                                    "Baby&Adoption", "Boat", "Cosmetic Procedure", "Engagement Ring",
                                    "Green Loans", "Household Expenses", "Large Purchases",
                                    "Medical/Dental", "Motorcycle", "RV", "Taxes", "Vacation", "Wedding Loans"]

listing_cats


fig, ax = plt.subplots(figsize=(20, 6)) # create plot canvas
plt_bar_plotter(listing_cats["Listing Category"],
                listing_cats["counts"],
                "Listing Category",
                "Prosper Loan Data Listing Category by number of Borrowers",
                75);


prosper_dataset["LoanOriginalAmount"].describe()

count   112543.0000
mean      8364.7371
std       6249.2212
min       1000.0000
25%       4000.0000
50%       6500.0000
75%      12000.0000
max      35000.0000
Name: LoanOriginalAmount, dtype: float64


# function definition for plotting a box plot
def plot_a_box(data, series, color, title):
    """
    This function will plot a box and whiskers plot using the following arguments:
    data (DataFrame): the DataFrame of interest
    series (Series or list): the series or list to be visualized
    color (string): color of the box
    title (string): title to be used for the plot.
    """
    fig, ax = plt.subplots(figsize=(8, 5))
    box = data[series].plot(ax=ax, kind="box", vert=False, color=color, title=title)
    return box


# Plot box plot for loan amount data
plot_a_box(prosper_dataset,
           "LoanOriginalAmount",
           "#800080",
           "Box and whiskers plot for Prosper Data Original Amount Borrowed");


prosper_dataset["BorrowerRate"].describe()

count   112543.0000
mean         0.1924
std          0.0746
min          0.0000
25%          0.1334
50%          0.1830
75%          0.2500
max          0.4975
Name: BorrowerRate, dtype: float64


# Plot box plot for loan amount data
plot_a_box(prosper_dataset,
           "BorrowerRate",
           "#800080",
           "Box and whiskers plot for Prosper Data Borrower Rate");


prosper_ratings_normal["LoanStatus"].unique()

array(['Current', 'Past Due (1-15 days)', 'Defaulted', 'Completed',
       'Chargedoff', 'Past Due (16-30 days)', 'Past Due (61-90 days)',
       'Past Due (31-60 days)', 'Past Due (91-120 days)',
       'FinalPaymentInProgress', 'Past Due (>120 days)'], dtype=object)


# Define a function to plot a violin plot and 2D box plot
def violin_chart(data, x, y, xlabel, ylabel, title):
    """
    Function plots a violin plot using the parameters:
    data (DataFrame): DataFrame containing the data to be plotted.
    x (series): categorical series to plot on the x axis
    y (series): quantitavie series to be plotted on the y axis
    xlabel (str): x label for the plot
    ylabel (str): y label for the plot
    title (title): title of the plot
    """
    fig, ax = plt.subplots(figsize=(18, 6))
    base_color = sns.color_palette()[0]
    violin = sns.violinplot(data=data, x=x, y=y, ax=ax, color=base_color)
    plt.xticks(rotation=90)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title);
    return violin

def box_chart_2D(data, x, y, xlabel, ylabel, title):
    """
    Function plots a 2D box plot using the parameters:
    data (DataFrame): DataFrame containing the data to be plotted.
    x (series): categorical series to plot on the x axis
    y (series): quantitavie series to be plotted on the y axis
    xlabel (str): x label for the plot
    ylabel (str): y label for the plot
    title (title): title of the plot
    """
    fig, ax = plt.subplots(figsize=(18, 6))
    base_color = sns.color_palette()[0]
    box_chart = sns.boxplot(data=data, x=x, y=y, ax=ax, color=base_color)
    plt.xticks(rotation=90)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title);
    return box_chart


# Plot violin plot of Prosper score and Loan Status

violin_chart(prosper_ratings_normal, "LoanStatus", "ProsperScore", "Loan Status", "Prosper Score", "Distribution of Loan Status by Prosper Score");


# Plot box plot of Prosper score and Loan Status

box_chart_2D(prosper_ratings_normal, "LoanStatus", "ProsperScore", "Loan Status", "Prosper Score", "Distribution of Loan Status by Prosper Score");


# Define a function to plot scatter plots. 

def plot_scatter(data, x, y, xlabel, ylabel, a_x, a_y, color, title):
    """
    A bivariate scatter plotting function. Parameters are:
    data (DataFrame): The DataFrame that contains observations to be plotted
    x (str): series or column to plot on x-axis
    y (str): series or column to plot on y-axis
    xlabel (str): x axis label
    ylabel (str): y axis label
    a_x (int): length of x axis
    a_y (int): length of y axis
    color (str or hex): color of plot
    title (str): plot title
    """
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.plot([0, a_x], [0, a_y], linestyle="--", color="#800080")
    scatter = data.plot(x, y, kind="scatter", color=color, title=title, ax=ax, alpha=0.9)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel);
    return scatter


# get correlation coefficient of borrower monthly income and loan amount
prosper_dataset["StatedMonthlyIncome"].corr(prosper_dataset["LoanOriginalAmount"])

0.19974080694069674


# plot scatter plot of borrower monthly income and original loan amount.
plot_scatter(prosper_dataset, "LoanOriginalAmount", "StatedMonthlyIncome",
             "Original Loan Amount", "Borrower's Monthly Income",
             36000, 36000, "#00FFFF",
             "Relationship between borrower monthly income and original amount borrowed");


# plot scatter plot of borrower monthly income and original loan amount.
plot_scatter(prosper_data_rand, "LoanOriginalAmount", "StatedMonthlyIncome",
             "Original Loan Amount", "Borrower's Monthly Income",
             30000, 30000, "#00FFFF",
             "Relationship between borrower monthly income and original amount borrowed");


# get correlation coefficient of borrower monthly income and loan amount
prosper_data_rand["StatedMonthlyIncome"].corr(prosper_data_rand["LoanOriginalAmount"])

0.4189600151580486


prosper_dataset["LoanOriginalAmount"].corr(prosper_dataset["BorrowerAPR"])

-0.3241576725500873


prosper_dataset["LoanOriginalAmount"].corr(prosper_dataset["BorrowerRate"])

-0.33062595925551863


# plot scatter plot of borrower original loan amount and Annual Percentage rate
plot_scatter(prosper_dataset, "LoanOriginalAmount", "BorrowerRate",
             "Original Loan Amount", "Borrower Interest Rate",
             0, 0, "#3EB4E7",
             "Relationship between amount borrowed and interest rate");


# plot scatter plot of borrower original loan amount and Annual Percentage rate
plot_scatter(prosper_dataset, "LoanOriginalAmount", "BorrowerAPR",
             "Original Loan Amount", "Borrower Annual Percentage Rate",
             0, 0, "#3EB4E7",
             "Relationship between amount borrowed and APR");


# group relationship between LoanStatus and APR
status_apr_rel = prosper_dataset.groupby("LoanStatus")["BorrowerAPR"].mean().to_frame()

status_apr_rel


# group relationship between LoanStatus and BorrowerRate
status_rate_rel = prosper_dataset.groupby("LoanStatus")["BorrowerRate"].mean().to_frame()

status_rate_rel


# Define a function that plots line graphs from groupby dataframe
def line_graph_plot(data, xlabel, ylabel, title, rotation):
    """
    Line graph plotter
    """
    fig, ax = plt.subplots(figsize=(12, 5))
    line = data.plot(kind="line", ax=ax)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xticks(rotation=rotation)
    return line


# Plot line graph for BorrowerAPR and LoanStatus
line_graph_plot(status_apr_rel,
                "Loan Status", "Borrower APR",
                "Relationship between Borrower APR and Loan Status",
                45);


# Plot line graph for BorrowerRate and LoanStatus
line_graph_plot(status_rate_rel,
                "Loan Status", "Borrower Rate",
                "Relationship between Borrower Rate and Loan Status",
                45);


status_amount_rel = prosper_dataset.groupby("LoanStatus")["LoanOriginalAmount"].mean().to_frame()


status_amount_rel


# Plot line graph for Amount Borrowed and LoanStatus
line_graph_plot(status_amount_rel,
                "Loan Status", "Original Amount Borrowed",
                "Relationship between Borrower Rate and Amount Borrowed",
                45);


status_homeowner_rel = prosper_dataset.groupby(["LoanStatus"])["IsBorrowerHomeowner"].value_counts().to_frame().unstack()

status_homeowner_rel


# Plot line graph for Home ownership and LoanStatus
line_graph_plot(status_homeowner_rel,
                "Loan Status", "Is Borrower Homeowner",
                "Relationship between Borrower Home ownership and Loan Status",
                45);


prosper_dataset.groupby(["LoanStatus"])["IncomeRange"].value_counts().unstack()


# To plot a time-series chart, let us first set our index to the Listing Creation Date

prosper_dataset = prosper_dataset.set_index("ListingCreationDate")


# Plot raw time series of the original amount borrowed

fig, ax = plt.subplots(figsize=(15, 6))
prosper_dataset["LoanOriginalAmount"].plot(xlabel="Time", ylabel="Loan Original Amount", title="Loan Original Amount Series", ax=ax);


prosper_dataset["LoanOriginalAmount"].resample("1W").mean().isnull().sum()

37


prosper_dataset["LoanOriginalAmount"].resample("1M").mean().isnull().sum()

6


# Average amount borrowed weekly

prosper_dataset["LoanOriginalAmount"].resample("1W").mean().fillna(method="ffill").to_frame()


# Average amount borrowed monthly

prosper_dataset["LoanOriginalAmount"].resample("1M").mean().fillna(method="ffill").to_frame()


# Function to plot a time series
def plot_time_series(data, series, sample, xlabel, ylabel, title):
    """
    data (DataFrame): the DataFrame
    series (series): the column to plot
    sample (str): the smaple window for the series. Options are Day (D), Week (W), Month (M), Year (Y)
    xlabel (str): label on the x-axis
    ylabel (str): label on the y-axis
    title (str): title of plot
    """
    fig, ax = plt.subplots(figsize=(15, 6))
    plot = data[series].resample(sample).mean().fillna(method="ffill").plot(ax=ax)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    return plot


# Plot time series of average amount borrowed weekly
plot_time_series(prosper_dataset, "LoanOriginalAmount", "1W", 
                 "Listing Creation Date", "Average Amount Borrowed",
                 "Weekly Time series plot of Average Amount Borrowed");


# Plot time series of average amount borrowed monthly
plot_time_series(prosper_dataset, "LoanOriginalAmount", "1M", 
                 "Listing Creation Date", "Average Amount Borrowed",
                 "Monthly Time series plot of Average Amount Borrowed");


# Plot timw series of borrower rate with a weekly sample window.
plot_time_series(prosper_dataset, "BorrowerRate", "1W",
                 "Listing Creation Date (Year)", "Average Borrower Rate",
                 "Time series plot of Average Borrower Rate (Weekly)");


# Plot timw series of borrower rate with a monthly sample window.
plot_time_series(prosper_dataset, "BorrowerRate", "1M",
                 "Listing Creation Date (Year)", "Average Borrower Rate",
                 "Time series plot of Average Borrower Rate (Monthly)");


g = sns.FacetGrid(data = prosper_data_rand, hue = "IsBorrowerHomeowner",
                height=5, aspect=2) # viridis used in reverse
g.map(sns.regplot, "StatedMonthlyIncome", "LoanOriginalAmount", x_jitter=0.5, y_jitter=0.5, fit_reg=True, scatter_kws={'alpha':5/10})
g.add_legend()
plt.xlabel("Monthly Income")
plt.ylabel("Original Amount Borrowed");


# plot a multivariate Facet Grid to subset homeonwers from non homeowners for the loan amount and borrower rate relationship.

g = sns.FacetGrid(data = prosper_dataset, col = "EmploymentStatus", row = "IsBorrowerHomeowner", margin_titles=True)
g.map(plt.scatter, "LoanOriginalAmount", "BorrowerRate");

	ListingKey	ListingNumber	ListingCreationDate	CreditGrade	Term	LoanStatus	ClosedDate	BorrowerAPR	BorrowerRate	LenderYield	...	LP_ServiceFees	LP_GrossPrincipalLoss	LP_NetPrincipalLoss	PercentFunded	Investors
46159	B5FB348594719244427B0FC	462328	2010-06-13 07:08:50.627000000	NaN	36	Completed	2013-12-30 00:00:00	0.33097	0.2950	0.2850	...	-35.25	0.00	0.00	1.0	35
81917	56F03390726402205C4F164	147487	2007-06-06 04:39:58.487000000	C	36	Chargedoff	2008-06-15 00:00:00	0.29776	0.2900	0.2800	...	-146.62	21951.10	21933.70	1.0	209
97664	87473593448868465EE9951	1030929	2013-11-11 10:55:20.243000000	NaN	60	Current	NaN	0.21115	0.1870	0.1770	...	-37.23	0.00	0.00	1.0	1
107839	BFE33587507307752A4BACA	888273	2013-09-05 16:01:37.460000000	NaN	36	Current	NaN	0.31032	0.2712	0.2612	...	-16.05	0.00	0.00	1.0	29
59363	F5FF3579022256746145B9A	774015	2013-05-08 18:47:09.327000000	NaN	60	Current	NaN	0.20081	0.1769	0.1669	...	-72.31	0.00	0.00	1.0	1
33843	D7B53601896238808A1578A	1193207	2014-02-14 05:53:27.880000000	NaN	36	Current	NaN	0.12117	0.0930	0.0830	...	0.00	0.00	0.00	1.0	1
16566	1CF93399896040134488434	199879	2007-09-10 10:46:05.807000000	D	36	Chargedoff	2008-11-03 00:00:00	0.18475	0.1775	0.1675	...	-30.87	4106.92	4106.92	1.0	166
52128	FBFC3507042071489F0746B	493263	2011-02-09 09:24:46.263000000	NaN	60	Completed	2012-03-06 00:00:00	0.18915	0.1749	0.1649	...	-120.00	0.00	0.00	1.0	313
22827	FB9A3534284455007B9FD2C	544504	2011-12-13 10:13:56.453000000	NaN	36	Current	NaN	0.19108	0.1621	0.1521	...	-76.08	0.00	0.00	1.0	94
46271	6A2E33767314400493604DB	76053	2006-12-18 15:53:52.733000000	C	36	Completed	2008-09-11 00:00:00	0.13654	0.1295	0.1245	...	-18.78	0.00	0.00	1.0	85

	ListingNumber	Term	BorrowerAPR	BorrowerRate	LenderYield	EstimatedEffectiveYield	EstimatedLoss	EstimatedReturn	ProsperRating (numeric)	ProsperScore	ListingCategory (numeric)	EmploymentStatusDuration	CreditScoreRangeLower	CreditScoreRangeUpper	CurrentCreditLines	OpenCreditLines	TotalCreditLinespast7years	OpenRevolvingAccounts	OpenRevolvingMonthlyPayment	InquiriesLast6Months	TotalInquiries	CurrentDelinquencies	AmountDelinquent	DelinquenciesLast7Years	PublicRecordsLast10Years	PublicRecordsLast12Months	RevolvingCreditBalance	BankcardUtilization	AvailableBankcardCredit	TotalTrades	TradesNeverDelinquent (percentage)	TradesOpenedLast6Months	DebtToIncomeRatio	StatedMonthlyIncome	TotalProsperLoans	TotalProsperPaymentsBilled	OnTimeProsperPayments	ProsperPaymentsLessThanOneMonthLate	ProsperPaymentsOneMonthPlusLate	ProsperPrincipalBorrowed	ProsperPrincipalOutstanding	ScorexChangeAtTimeOfListing	LoanCurrentDaysDelinquent	LoanFirstDefaultedCycleNumber	LoanMonthsSinceOrigination	LoanNumber	LoanOriginalAmount	MonthlyLoanPayment	LP_CustomerPayments	LP_CustomerPrincipalPayments	LP_InterestandFees	LP_ServiceFees	LP_CollectionFees	LP_GrossPrincipalLoss	LP_NetPrincipalLoss	LP_NonPrincipalRecoverypayments	PercentFunded	Recommendations	InvestmentFromFriendsCount	InvestmentFromFriendsAmount	Investors
count	113,937.000	113,937.000	113,912.000	113,937.000	113,937.000	84,853.000	84,853.000	84,853.000	84,853.000	84,853.000	113,937.000	106,312.000	113,346.000	113,346.000	106,333.000	106,333.000	113,240.000	113,937.000	113,937.000	113,240.000	112,778.000	113,240.000	106,315.000	112,947.000	113,240.000	106,333.000	106,333.000	106,333.000	106,393.000	106,393.000	106,393.000	106,393.000	105,383.000	113,937.000	22,085.000	22,085.000	22,085.000	22,085.000	22,085.000	22,085.000	22,085.000	18,928.000	113,937.000	16,952.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000	113,937.000
mean	627,885.693	40.830	0.219	0.193	0.183	0.169	0.080	0.096	4.072	5.950	2.774	96.072	685.568	704.568	10.317	9.260	26.755	6.970	398.292	1.435	5.584	0.592	984.507	4.155	0.313	0.015	17,598.707	0.561	11,210.225	23.230	0.886	0.802	0.276	5,608.026	1.421	22.934	22.272	0.614	0.049	8,472.312	2,930.314	-3.223	152.817	16.268	31.897	69,444.474	8,337.014	272.476	4,183.079	3,105.537	1,077.543	-54.726	-14.243	700.446	681.420	25.143	0.999	0.048	0.023	16.551	80.475
std	328,076.236	10.436	0.080	0.075	0.075	0.068	0.047	0.030	1.673	2.377	3.997	94.481	66.458	66.458	5.458	5.023	13.638	4.631	447.160	2.438	6.430	1.979	7,158.270	10.160	0.728	0.154	32,936.402	0.318	19,818.361	11.871	0.148	1.098	0.552	7,478.497	0.764	19.250	18.830	2.447	0.556	7,395.508	3,806.635	50.064	466.320	9.006	29.974	38,930.480	6,245.801	192.698	4,790.907	4,069.528	1,183.414	60.675	109.233	2,388.514	2,357.167	275.658	0.018	0.332	0.232	294.545	103.239
min	4.000	12.000	0.007	0.000	-0.010	-0.183	0.005	-0.183	1.000	1.000	0.000	0.000	0.000	19.000	0.000	0.000	2.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	-209.000	0.000	0.000	0.000	1.000	1,000.000	0.000	-2.350	0.000	-2.350	-664.870	-9,274.750	-94.200	-954.550	0.000	0.700	0.000	0.000	0.000	1.000
25%	400,919.000	36.000	0.156	0.134	0.124	0.116	0.042	0.074	3.000	4.000	1.000	26.000	660.000	679.000	7.000	6.000	17.000	4.000	114.000	0.000	2.000	0.000	0.000	0.000	0.000	0.000	3,121.000	0.310	880.000	15.000	0.820	0.000	0.140	3,200.333	1.000	9.000	9.000	0.000	0.000	3,500.000	0.000	-35.000	0.000	9.000	6.000	37,332.000	4,000.000	131.620	1,005.760	500.890	274.870	-73.180	0.000	0.000	0.000	0.000	1.000	0.000	0.000	0.000	2.000
50%	600,554.000	36.000	0.210	0.184	0.173	0.162	0.072	0.092	4.000	6.000	1.000	67.000	680.000	699.000	10.000	9.000	25.000	6.000	271.000	1.000	4.000	0.000	0.000	0.000	0.000	0.000	8,549.000	0.600	4,100.000	22.000	0.940	0.000	0.220	4,666.667	1.000	16.000	15.000	0.000	0.000	6,000.000	1,626.550	-3.000	0.000	14.000	21.000	68,599.000	6,500.000	217.740	2,583.830	1,587.500	700.840	-34.440	0.000	0.000	0.000	0.000	1.000	0.000	0.000	0.000	44.000
75%	892,634.000	36.000	0.284	0.250	0.240	0.224	0.112	0.117	5.000	8.000	3.000	137.000	720.000	739.000	13.000	12.000	35.000	9.000	525.000	2.000	7.000	0.000	0.000	3.000	0.000	0.000	19,521.000	0.840	13,180.000	30.000	1.000	1.000	0.320	6,825.000	2.000	33.000	32.000	0.000	0.000	11,000.000	4,126.720	25.000	0.000	22.000	65.000	101,901.000	12,000.000	371.580	5,548.400	4,000.000	1,458.540	-13.920	0.000	0.000	0.000	0.000	1.000	0.000	0.000	0.000	115.000
max	1,255,725.000	60.000	0.512	0.497	0.492	0.320	0.366	0.284	7.000	11.000	20.000	755.000	880.000	899.000	59.000	54.000	136.000	51.000	14,985.000	105.000	379.000	83.000	463,881.000	99.000	38.000	20.000	1,435,667.000	5.950	646,285.000	126.000	1.000	20.000	10.010	1,750,002.917	8.000	141.000	141.000	42.000	21.000	72,499.000	23,450.950	286.000	2,704.000	44.000	100.000	136,486.000	35,000.000	2,251.510	40,702.390	35,000.000	15,617.030	32.060	0.000	25,000.000	25,000.000	21,117.900	1.012	39.000	33.000	25,000.000	1,189.000

	ListingCreationDate	percentage
Year
2013	35413	31.081212
2012	19556	17.163871
2007	11557	10.143325
2011	11442	10.042392
2008	11263	9.885287
2014	10734	9.420996
2006	6213	5.453014
2010	5530	4.853559
2009	2206	1.936158
2005	23	0.020187

	ListingCreationDate	percentage
Month
January	11214	9.842281
October	10539	9.249849
December	10320	9.057637
February	10124	8.885612
September	10074	8.841728
November	9952	8.734652
July	9506	8.343207
August	9202	8.076393
June	8672	7.611224
May	8641	7.584016
March	8032	7.049510
April	7661	6.723891

	ListingCreationDate	percentage
Weekday
Tuesday	19946	17.506166
Monday	19594	17.197223
Wednesday	19057	16.725910
Thursday	17962	15.764853
Friday	16136	14.162212
Saturday	10758	9.442060
Sunday	10484	9.201576

	EmploymentStatus	counts
0	Employed	67322
1	Full-time	26355
2	Not available	5347
3	Not employed	835
4	Other	3806
5	Part-time	1088
6	Retired	795
7	Self-employed	6134

	IncomeRange	percentage
$25,000-49,999	32192	28.254211
$50,000-74,999	31050	27.251902
$100,000+	17337	15.216304
$75,000-99,999	16916	14.846801
Not displayed	7741	6.794106
$1-24,999	7274	6.384230
Not employed	806	0.707408
$0	621	0.545038

	ProsperScore	counts
0	1.0000	985
1	2.0000	5710
2	3.0000	7563
3	4.0000	12477
4	5.0000	9698
5	6.0000	12155
6	7.0000	10506
7	8.0000	11980
8	9.0000	6883
9	10.0000	4747

	ListingCategory (numeric)	counts	Listing Category
0	0	16727	Not Available
1	1	57950	Debt Consolidation
2	2	7386	Home Improvement
3	3	6940	Business
4	4	2274	Personal Loan
5	5	689	Student Use
6	6	2544	Auto
7	7	10311	Other
8	8	198	Baby&Adoption
9	9	84	Boat
10	10	90	Cosmetic Procedure
11	11	213	Engagement Ring
12	12	57	Green Loans
13	13	1940	Household Expenses
14	14	870	Large Purchases
15	15	1505	Medical/Dental
16	16	304	Motorcycle
17	17	52	RV
18	18	881	Taxes
19	19	760	Vacation
20	20	768	Wedding Loans

	BorrowerAPR
LoanStatus
Cancelled	0.1932
Chargedoff	0.2575
Completed	0.2085
Current	0.2133
Defaulted	0.2389
FinalPaymentInProgress	0.2296
Past Due (1-15 days)	0.2636
Past Due (16-30 days)	0.2679
Past Due (31-60 days)	0.2652
Past Due (61-90 days)	0.2730
Past Due (91-120 days)	0.2705
Past Due (>120 days)	0.2791

	BorrowerRate
LoanStatus
Cancelled	0.1711
Chargedoff	0.2352
Completed	0.1862
Current	0.1834
Defaulted	0.2231
FinalPaymentInProgress	0.1970
Past Due (1-15 days)	0.2306
Past Due (16-30 days)	0.2347
Past Due (31-60 days)	0.2325
Past Due (61-90 days)	0.2398
Past Due (91-120 days)	0.2379
Past Due (>120 days)	0.2484

	LoanOriginalAmount
LoanStatus
Cancelled	1875.0000
Chargedoff	6377.1221
Completed	6202.3844
Current	10383.4047
Defaulted	6444.4226
FinalPaymentInProgress	8346.1220
Past Due (1-15 days)	8473.4176
Past Due (16-30 days)	8211.3615
Past Due (31-60 days)	8558.2333
Past Due (61-90 days)	7758.1569
Past Due (91-120 days)	8044.7143
Past Due (>120 days)	8633.3333

	IsBorrowerHomeowner
IsBorrowerHomeowner	False	True
LoanStatus
Cancelled	3	1
Chargedoff	6444	5201
Completed	19338	18059
Current	25913	30404
Defaulted	2702	2236
FinalPaymentInProgress	95	110
Past Due (1-15 days)	422	373
Past Due (16-30 days)	133	127
Past Due (31-60 days)	178	182
Past Due (61-90 days)	130	176
Past Due (91-120 days)	174	127
Past Due (>120 days)	7	8

	LoanOriginalAmount
ListingCreationDate
2005-11-13	1250.0000
2005-11-20	4027.2727
2005-11-27	4000.0000
2005-12-04	2564.8333
2005-12-11	3500.0000
...	...
2014-02-16	11811.3439
2014-02-23	11981.4665
2014-03-02	12066.8385
2014-03-09	11957.3443
2014-03-16	10320.0000

	LoanOriginalAmount
ListingCreationDate
2005-11-30	3342.1053
2005-12-31	6296.7500
2006-01-31	4962.0000
2006-02-28	5774.4298
2006-03-31	4782.1673
...	...
2013-11-30	11148.7404
2013-12-31	11800.6394
2014-01-31	11863.5759
2014-02-28	12000.4169
2014-03-31	11835.9887

Prosper Loan Data - Part I¶

by Oluwafunmito Blessed ODEFEMI¶

Introduction¶

Preliminary Wrangling¶

What is the structure of your dataset?¶

What is/are the main feature(s) of interest in your dataset?¶

What features in the dataset do you think will help support your investigation into your feature(s) of interest?¶

Univariate Exploration¶

Variable 1 - ListingCreationDate¶

Variable 2 - Term¶

Variable 3 - LoanStatus¶

Variable 4 - BorrowerAPR¶

Variable 5 - BorrowerState¶

Variable 6 - EmploymentStatus¶

Variable 7 - IsBorrowerHomeowner¶

Variable 8 - IncomeRange¶

Variable 9 - StatedMonthlyIncome¶

Variable 10 - ProsperScore¶

Variable 11 - ListingCategory¶

Variable 12 - LoanOriginalAmount¶

Variable 13 - BorrowerRate¶

Discuss the distribution(s) of your variable(s) of interest. Were there any unusual points? Did you need to perform any transformations?¶

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?¶

Bivariate Exploration¶

Investigation 1: ProsperScore vs LoanStatus¶

Investigation 2: LoanOriginalAmount vs StatedMonthlyIncome¶

Investigation 3: LoanOriginalAmount vs BorrowerRate/BorrowerAPR¶

Investigation 4: LoanStatus vs BorrowerAPR/BorrowerRate¶

Investigation 5: LoanStatus vs LoanOriginalAmount¶

Investigation 6: LoanStatus vs Home ownership¶

Investigation 7: ListingCreationDate vs LoanOriginalAmount¶

Investigation 8: ListingCreationDate vs BorrowerRate¶

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?¶

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?¶

Multivariate Exploration¶

Investigation 1: Monthly Income vs Loan Amount vs Home Ownersip¶

Investigation 2: Loan Amount vs Borrower Rate vs Home Ownersip vs Employment Status¶

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?¶

Were there any interesting or surprising interactions between features?¶

Conclusions¶

Resources¶