I am training a model in R with the caret package:
ctrl <- trainControl(method = "repeatedcv", repeats = 3, summaryFunction = twoClassSummary)
logitBoostFit <- train(LoanStatus~., credit, method = "LogitBoost", family=binomial, preProcess=c("center", "scale", "pca"),
trControl = ctrl)
I get the following warnings:
Warning message:
In train.default(x, y, weights = w, ...): The metric "Accuracy" was not in the result set. ROC will be used instead.
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
Something is wrong; all the ROC metric values are missing:
      ROC           Sens              Spec
 Min.   : NA   Min.   :0.03496   Min.   :0.9747
 1st Qu.: NA   1st Qu.:0.03919   1st Qu.:0.9758
 Median : NA   Median :0.04343   Median :0.9770
 Mean   :NaN   Mean   :0.04349   Mean   :0.9779
 3rd Qu.: NA   3rd Qu.:0.04776   3rd Qu.:0.9795
 Max.   : NA   Max.   :0.05210   Max.   :0.9821
 NA's   :3
Error in train.default(x, y, weights = w, ...): Stopping
I installed the pROC package:
install.packages("pROC", repos="http://cran.rstudio.com/")
library(pROC)
Type 'citation("pROC")' for a citation.
Attaching package: ‘pROC’
The following objects are masked from ‘package:stats’:
cov, smooth, var
Here is the data:
str(credit)
'data.frame': 8580 obs. of 45 variables:
$ ListingCategory : int 1 7 3 1 1 7 1 1 1 1 ...
$ IncomeRange : int 3 4 6 4 4 3 3 4 3 3 ...
$ StatedMonthlyIncome : num 2583 4326 10500 4167 5667 ...
$ IncomeVerifiable : logi TRUE TRUE TRUE FALSE TRUE TRUE ...
$ DTIwProsperLoan : num 1.8e-01 2.0e-01 1.7e-01 1.0e+06 1.8e-01 4.4e-01 2.2e-01 2.0e-01 2.0e-01 3.1e-01 ...
$ EmploymentStatusDescription: Factor w/ 7 levels "Employed","Full-time",..: 1 4 1 7 1 1 1 1 1 1 ...
$ Occupation : Factor w/ 65 levels "","Accountant/CPA",..: 37 37 20 14 43 58 48 37 37 37 ...
$ MonthsEmployed : int 4 44 159 67 26 16 209 147 24 9 ...
$ BorrowerState : Factor w/ 48 levels "AK","AL","AR",..: 22 32 5 5 14 28 4 10 10 34 ...
$ BorrowerCity : Factor w/ 3089 levels "AARONSBURG","ABERDEEN",..: 1737 3059 2488 654 482 719 895 1699 2747 1903 ...
$ BorrowerMetropolitanArea : Factor w/ 1 level "(Not Implemented)": 1 1 1 1 1 1 1 1 1 1 ...
$ LenderIndicator : int 0 0 0 1 0 0 0 0 1 0 ...
$ GroupIndicator : logi FALSE FALSE FALSE TRUE FALSE FALSE ...
$ GroupName : Factor w/ 83 levels "","00 Used Car Loans",..: 1 1 1 47 1 1 1 1 1 1 ...
$ ChannelCode : int 90000 90000 90000 80000 40000 40000 90000 90000 80000 90000 ...
$ AmountParticipation : int 0 0 0 0 0 0 0 0 0 0 ...
$ MonthlyDebt : int 247 785 1631 817 644 1524 427 817 654 749 ...
$ CurrentDelinquencies : int 0 0 0 0 0 0 0 1 0 1 ...
$ DelinquenciesLast7Years : int 0 10 0 0 0 0 0 0 0 0 ...
$ PublicRecordsLast10Years : int 0 1 0 0 0 0 1 0 1 0 ...
$ PublicRecordsLast12Months : int 0 0 0 0 0 0 0 0 0 0 ...
$ FirstRecordedCreditLine : Factor w/ 4719 levels "1/1/00 0:00",..: 3032 2673 1197 2541 4698 4345 3150 925 4452 2358 ...
$ CreditLinesLast7Years : int 53 30 36 26 7 22 15 20 34 32 ...
$ InquiriesLast6Months : int 2 8 5 0 0 0 0 3 0 0 ...
$ AmountDelinquent : int 0 0 0 0 0 0 0 63 0 15 ...
$ CurrentCreditLines : int 10 10 18 10 4 11 6 10 7 8 ...
$ OpenCreditLines : int 9 10 15 8 3 8 5 7 7 8 ...
$ BankcardUtilization : num 0.26 0.69 0.94 0.69 0.81 0.38 0.55 0.24 0.03 0 ...
$ TotalOpenRevolvingAccounts : int 9 7 12 10 3 5 4 5 4 6 ...
$ InstallmentBalance : int 48648 14827 0 0 0 30916 0 21619 41340 15447 ...
$ RealEstateBalance : int 0 0 577745 0 0 0 191296 0 0 126039 ...
$ RevolvingBalance : int 5265 9967 94966 50511 37871 22463 19550 2436 1223 3236 ...
$ RealEstatePayment : int 0 0 4159 0 0 0 1303 0 0 1279 ...
$ RevolvingAvailablePercent : int 78 52 36 45 18 61 44 74 96 76 ...
$ TotalInquiries : int 8 11 15 2 0 0 1 7 1 1 ...
$ TotalTradeItems : int 53 30 36 26 7 22 15 20 34 32 ...
$ SatisfactoryAccounts : int 52 23 36 26 7 19 15 18 34 29 ...
$ NowDelinquentDerog : int 0 0 0 0 0 0 0 1 0 1 ...
$ WasDelinquentDerog : int 1 7 0 0 0 3 0 1 0 2 ...
$ OldestTradeOpenDate : int 5092001 5011977 12011984 4272000 9081993 9122000 6161987 11181999 9191990 4132000 ...
$ DelinquenciesOver30Days : int 0 6 0 0 0 13 0 2 0 2 ...
$ DelinquenciesOver60Days : int 0 4 0 0 0 0 0 0 0 1 ...
$ DelinquenciesOver90Days : int 0 10 0 0 0 0 0 0 0 0 ...
$ IsHomeowner : logi FALSE FALSE TRUE FALSE FALSE FALSE ...
$ LoanStatus : Factor w/ 2 levels "0","1": 2 1 1 2 2 2 2 2 2 1 ...
summary(credit)
 ListingCategory   IncomeRange     StatedMonthlyIncome IncomeVerifiable
 Min.   : 0.000   Min.   :1.000   Min.   :     0      Mode :logical
 1st Qu.: 1.000   1st Qu.:3.000   1st Qu.:  3167      FALSE:784
 Median : 2.000   Median :4.000   Median :  4750      TRUE :7796
 Mean   : 4.997   Mean   :4.089   Mean   :  5755      NA's :0
 3rd Qu.: 7.000   3rd Qu.:5.000   3rd Qu.:  7083
 Max.   :20.000   Max.   :7.000   Max.   :250000
 DTIwProsperLoan     EmploymentStatusDescription MonthsEmployed
 Min.   :      0.0   Employed     :7182          Min.   :-23.00
 1st Qu.:      0.1   Full-time    : 416          1st Qu.: 26.00
 Median :      0.2   Not employed : 122          Median : 68.00
 Mean   :  91609.4   Other        : 475          Mean   : 97.44
 3rd Qu.:      0.3   Part-time    :   7          3rd Qu.:139.00
 Max.   :1000000.0   Retired      :  32          Max.   :755.00
                     Self-employed: 346          NA's   :  5
 BorrowerState  LenderIndicator   GroupIndicator  ChannelCode
 CA     :1056   Min.   :0.00000   Mode :logical   Min.   :40000
 FL     : 608   1st Qu.:0.00000   FALSE:8325      1st Qu.:80000
 NY     : 574   Median :0.00000   TRUE :255       Median :80000
 TX     : 532   Mean   :0.09196   NA's :0         Mean   :77196
 IL     : 443   3rd Qu.:0.00000                   3rd Qu.:90000
 GA     : 343   Max.   :1.00000                   Max.   :90000
 (Other):5024
 MonthlyDebt       CurrentDelinquencies DelinquenciesLast7Years
 Min.   :    0.0   Min.   : 0.0000      Min.   : 0.000
 1st Qu.:  364.0   1st Qu.: 0.0000      1st Qu.: 0.000
 Median :  708.0   Median : 0.0000      Median : 0.000
 Mean   :  885.5   Mean   : 0.4119      Mean   : 4.009
 3rd Qu.: 1205.2   3rd Qu.: 0.0000      3rd Qu.: 3.000
 Max.   :30213.0   Max.   :21.0000      Max.   :99.000
 PublicRecordsLast10Years PublicRecordsLast12Months CreditLinesLast7Years
 Min.   : 0.0000          Min.   :0.00000           Min.   :  2.0
 1st Qu.: 0.0000          1st Qu.:0.00000           1st Qu.: 16.0
 Median : 0.0000          Median :0.00000           Median : 24.0
 Mean   : 0.2809          Mean   :0.01364           Mean   : 26.1
 3rd Qu.: 0.0000          3rd Qu.:0.00000           3rd Qu.: 34.0
 Max.   :11.0000          Max.   :4.00000           Max.   :115.0
 InquiriesLast6Months AmountDelinquent  CurrentCreditLines OpenCreditLines
 Min.   : 0.0000      Min.   :     0    Min.   : 0.000     Min.   : 0.000
 1st Qu.: 0.0000      1st Qu.:     0    1st Qu.: 5.000     1st Qu.: 5.000
 Median : 1.0000      Median :     0    Median : 9.000     Median : 8.000
 Mean   : 0.9994      Mean   :  1195    Mean   : 9.345     Mean   : 8.306
 3rd Qu.: 1.0000      3rd Qu.:     0    3rd Qu.:12.000     3rd Qu.:11.000
 Max.   :15.0000      Max.   :179158    Max.   :54.000     Max.   :42.000
 BankcardUtilization TotalOpenRevolvingAccounts InstallmentBalance
 Min.   :0.0000      Min.   : 0.000             Min.   :     0
 1st Qu.:0.2500      1st Qu.: 3.000             1st Qu.:  3338
 Median :0.5400      Median : 6.000             Median : 14453
 Mean   :0.5182      Mean   : 6.441             Mean   : 24900
 3rd Qu.:0.7900      3rd Qu.: 9.000             3rd Qu.: 32238
 Max.   :2.2300      Max.   :44.000             Max.   :739371
 NA's   :328
 RealEstateBalance RevolvingBalance RealEstatePayment RevolvingAvailablePercent
 Min.   :      0   Min.   :     0   Min.   :    0.0   Min.   :  0.00
 1st Qu.:      0   1st Qu.:  2799   1st Qu.:    0.0   1st Qu.: 29.00
 Median :  26154   Median :  8784   Median :  346.5   Median : 52.00
 Mean   : 109306   Mean   : 19555   Mean   :  830.5   Mean   : 51.46
 3rd Qu.: 176542   3rd Qu.: 21110   3rd Qu.: 1382.2   3rd Qu.: 75.00
 Max.   :1938421   Max.   :695648   Max.   :13651.0   Max.   :100.00
 TotalInquiries  TotalTradeItems SatisfactoryAccounts NowDelinquentDerog
 Min.   : 0.00   Min.   :  2.0   Min.   :  1.00       Min.   : 0.0000
 1st Qu.: 2.00   1st Qu.: 16.0   1st Qu.: 14.00       1st Qu.: 0.0000
 Median : 3.00   Median : 24.0   Median : 21.00       Median : 0.0000
 Mean   : 3.91   Mean   : 26.1   Mean   : 23.34       Mean   : 0.4119
 3rd Qu.: 5.00   3rd Qu.: 34.0   3rd Qu.: 30.25       3rd Qu.: 0.0000
 Max.   :36.00   Max.   :115.0   Max.   :113.00       Max.   :21.0000
 WasDelinquentDerog OldestTradeOpenDate DelinquenciesOver30Days
 Min.   : 0.000     Min.   : 1011957    Min.   : 0.000
 1st Qu.: 0.000     1st Qu.: 4101996    1st Qu.: 0.000
 Median : 1.000     Median : 7191993    Median : 1.000
 Mean   : 2.343     Mean   : 6934230    Mean   : 4.332
 3rd Qu.: 3.000     3rd Qu.:10011990    3rd Qu.: 5.000
 Max.   :32.000     Max.   :12312004    Max.   :99.000
 DelinquenciesOver60Days DelinquenciesOver90Days IsHomeowner     LoanStatus
 Min.   : 0.000          Min.   : 0.000          Mode :logical   0:1518
 1st Qu.: 0.000          1st Qu.: 0.000          FALSE:4264      1:7062
 Median : 0.000          Median : 0.000          TRUE :4316
 Mean   : 1.908          Mean   : 4.009          NA's :0
 3rd Qu.: 2.000          3rd Qu.: 3.000
 Max.   :73.000          Max.   :99.000
try(na.fail(credit))
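`na.fail` only signals that missing values exist somewhere in the data frame. To see which columns contain them, something like the following could be used. This is a minimal sketch: `toy` is a hypothetical miniature standing in for `credit`, and its values are made up for illustration; `colSums`, `is.na`, and `complete.cases` are base R.

```r
# Hypothetical miniature of `credit` with a few missing values
toy <- data.frame(
  MonthsEmployed      = c(4L, NA, 159L, 67L),
  BankcardUtilization = c(0.26, 0.69, NA, NA),
  MonthlyDebt         = c(247L, 785L, 1631L, 817L)
)

# Number of missing values per column; show only columns that have any
na_counts <- colSums(is.na(toy))
na_counts[na_counts > 0]

# Keep only the rows with no missing value in any column
complete <- toy[complete.cases(toy), ]
nrow(complete)
```

Running the same two calls on `credit` itself would show whether the `NA's` rows in the summary above are the only missing values.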
dput(head(credit,4))
structure(list(ListingCategory = c(1L, 7L, 3L, 1L), IncomeRange = c(3L,
4L, 6L, 4L), StatedMonthlyIncome = c(2583.3333, 4326, 10500,
4166.6667), IncomeVerifiable = c(TRUE, TRUE, TRUE, FALSE), DTIwProsperLoan = c(0.18,
0.2, 0.17, 1e+06), EmploymentStatusDescription = structure(c(1L,
4L, 1L, 7L), .Label = c("Employed", "Full-time", "Not employed",
"Other", "Part-time", "Retired", "Self-employed"), class = "factor"),
MonthsEmployed = c(4L, 44L, 159L, 67L), BorrowerState = structure(c(22L,
32L, 5L, 5L), .Label = c("AK", "AL", "AR", "AZ", "CA", "CO",
"CT", "DC", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "KS",
"KY", "LA", "MA", "MD", "MI", "MN", "MO", "MS", "MT", "NC",
"NE", "NH", "NJ", "NM", "NV", "NY", "OH", "OK", "OR", "PA",
"RI", "SC", "SD", "TN", "TX", "UT", "VA", "VT", "WA", "WI",
"WV", "WY"), class = "factor"), LenderIndicator = c(0L, 0L,
0L, 1L), GroupIndicator = c(FALSE, FALSE, FALSE, TRUE), ChannelCode = c(90000L,
90000L, 90000L, 80000L), MonthlyDebt = c(247L, 785L, 1631L,
817L), CurrentDelinquencies = c(0L, 0L, 0L, 0L), DelinquenciesLast7Years = c(0L,
10L, 0L, 0L), PublicRecordsLast10Years = c(0L, 1L, 0L, 0L
), PublicRecordsLast12Months = c(0L, 0L, 0L, 0L), CreditLinesLast7Years = c(53L,
30L, 36L, 26L), InquiriesLast6Months = c(2L, 8L, 5L, 0L),
AmountDelinquent = c(0L, 0L, 0L, 0L), CurrentCreditLines = c(10L,
10L, 18L, 10L), OpenCreditLines = c(9L, 10L, 15L, 8L), BankcardUtilization = c(0.26,
0.69, 0.94, 0.69), TotalOpenRevolvingAccounts = c(9L, 7L,
12L, 10L), InstallmentBalance = c(48648L, 14827L, 0L, 0L),
RealEstateBalance = c(0L, 0L, 577745L, 0L), RevolvingBalance = c(5265L,
9967L, 94966L, 50511L), RealEstatePayment = c(0L, 0L, 4159L,
0L), RevolvingAvailablePercent = c(78L, 52L, 36L, 45L), TotalInquiries = c(8L,
11L, 15L, 2L), TotalTradeItems = c(53L, 30L, 36L, 26L), SatisfactoryAccounts = c(52L,
23L, 36L, 26L), NowDelinquentDerog = c(0L, 0L, 0L, 0L), WasDelinquentDerog = c(1L,
7L, 0L, 0L), OldestTradeOpenDate = c(5092001L, 5011977L,
12011984L, 4272000L), DelinquenciesOver30Days = c(0L, 6L,
0L, 0L), DelinquenciesOver60Days = c(0L, 4L, 0L, 0L), DelinquenciesOver90Days = c(0L,
10L, 0L, 0L), IsHomeowner = c(FALSE, FALSE, TRUE, FALSE),
LoanStatus = structure(c(2L, 1L, 1L, 2L), .Label = c("0",
"1"), class = "factor")), .Names = c("ListingCategory", "IncomeRange",
"StatedMonthlyIncome", "IncomeVerifiable", "DTIwProsperLoan",
"EmploymentStatusDescription", "MonthsEmployed", "BorrowerState",
"LenderIndicator", "GroupIndicator", "ChannelCode", "MonthlyDebt",
"CurrentDelinquencies", "DelinquenciesLast7Years", "PublicRecordsLast10Years",
"PublicRecordsLast12Months", "CreditLinesLast7Years", "InquiriesLast6Months",
"AmountDelinquent", "CurrentCreditLines", "OpenCreditLines",
"BankcardUtilization", "TotalOpenRevolvingAccounts", "InstallmentBalance",
"RealEstateBalance", "RevolvingBalance", "RealEstatePayment",
"RevolvingAvailablePercent", "TotalInquiries", "TotalTradeItems",
"SatisfactoryAccounts", "NowDelinquentDerog", "WasDelinquentDerog",
"OldestTradeOpenDate", "DelinquenciesOver30Days", "DelinquenciesOver60Days",
"DelinquenciesOver90Days", "IsHomeowner", "LoanStatus"), row.names = c(NA,
4L), class = "data.frame")
Warning message:
In train.default(x, y, weights = w, ...): The metric "Accuracy" was not in the result set. ROC will be used instead.
initial value 5144.538374
iter 10 value 3540.667624
iter 20 value 3329.692768
iter 30 value 3279.191024
iter 40 value 3264.926986
iter 50 value 3259.276647
iter 60 value 3259.056261
final value 3259.032668
converged
initial value 5144.538374
iter 10 value 3540.774666
iter 20 value 3330.016829
iter 30 value 3279.545595
iter 40 value 3265.384385
iter 50 value 3259.499032
iter 60 value 3259.353010
final value 3259.342601
converged
initial value 5144.538374
iter 10 value 3540.667731
iter 20 value 3329.693092
iter 30 value 3279.191379
iter 40 value 3264.927427
iter 50 value 3259.276899
iter 60 value 3259.056561
final value 3259.032978
converged
initial value 5144.538374
iter 10 value 3528.401458
iter 20 value 3314.932958
iter 30 value 3264.117072
iter 40 value 3253.780051
iter 50 value 3253.368959
iter 60 value 3253.359047
final value 3253.358819
converged
initial value 5144.538374
iter 10 value 3528.508505
iter 20 value 3315.134599
iter 30 value 3265.021404
iter 40 value 3255.739021
iter 50 value 3253.817833
iter 60 value 3253.697180
final value 3253.671003
converged
initial value 5144.538374
iter 10 value 3528.401565
iter 20 value 3314.933160
iter 30 value 3264.117768
iter 40 value 3253.780539
iter 50 value 3253.369030
iter 60 value 3253.359358
final value 3253.359133
converged
initial value 5145.231521
iter 10 value 4680.326236
iter 20 value 4672.506024
iter 30 value 3662.998233
iter 40 value 3310.207744
iter 50 value 3252.983656
iter 60 value 3250.400275
iter 70 value 3250.339216
final value 3250.332646
converged
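...
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 4661.569290
iter 20 value 4652.246624
iter 30 value 3715.472355
iter 40 value 3484.096833
iter 50 value 3254.247424
iter 60 value 3248.931841
iter 70 value 3248.154679
iter 80 value 3248.129089
iter 80 value 3248.129085
final value 3248.128574
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 4663.660886
iter 20 value 4654.255466
iter 30 value 3542.473235
iter 40 value 3315.027437
iter 50 value 3250.340679
iter 60 value 3248.693378
iter 70 value 3248.455840
iter 80 value 3248.443345
iter 80 value 3248.443325
iter 80 value 3248.443325
final value 3248.443325
converged
# weights: 72 (71 variable)
initial value 5144.538374
iter 10 value 4661.571382
iter 20 value 4652.248711
iter 30 value 4397.069608
iter 40 value 3532.067046
iter 50 value 3283.179445
iter 60 value 3249.518694
iter 70 value 3248.163057
iter 80 value 3248.129552
final value 3248.128889
converged
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
Something is wrong; all the ROC metric values are missing:
      ROC           Sens              Spec
 Min.   : NA   Min.   :0.01805   Min.   :0.9946
 1st Qu.: NA   1st Qu.:0.01805   1st Qu.:0.9946
 Median : NA   Median :0.01805   Median :0.9946
 Mean   :NaN   Mean   :0.01805   Mean   :0.9946
 3rd Qu.: NA   3rd Qu.:0.01805   3rd Qu.:0.9946
 Max.   : NA   Max.   :0.01805   Max.   :0.9946
 NA's   :3
Error in train.default(x, y, weights = w, ...): Stopping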
The same error occurs with summaryFunction = twoClassSummary and plain cross-validation:
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary)
multinomSummaryFit <- train(LoanStatus~., credit, method = "multinom", family=binomial,
trControl = ctrl)
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.
Something is wrong; all the ROC metric values are missing:
      ROC           Sens              Spec
 Min.   : NA   Min.   :0.01919   Min.   :0.9941
 1st Qu.: NA   1st Qu.:0.01988   1st Qu.:0.9942
 Median : NA   Median :0.02056   Median :0.9943
 Mean   :NaN   Mean   :0.02011   Mean   :0.9943
 3rd Qu.: NA   3rd Qu.:0.02056   3rd Qu.:0.9943
 Max.   : NA   Max.   :0.02057   Max.   :0.9944
 NA's   :3
Error in train.default(x, y, weights = w, ...): Stopping
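A possible cause, offered as a hedged sketch rather than a confirmed diagnosis: `twoClassSummary` computes ROC from class probabilities, which caret only produces when `trainControl` is given `classProbs = TRUE`, and in that mode the outcome levels have to be valid R names ("0" and "1" are not). The example below uses a hypothetical toy data frame `toy` in place of `credit`, a plain `glm` model in place of LogitBoost or multinom, and `make.names` to rename the levels; all of those are assumptions made only for illustration.

```r
# Toy stand-in for `credit` (hypothetical data, only to illustrate the setup)
set.seed(1)
toy <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  LoanStatus = factor(sample(c("0", "1"), 200, replace = TRUE))
)

# "0"/"1" are not syntactically valid R names; caret uses the level names
# as column names for the predicted class probabilities.
levels(toy$LoanStatus) <- make.names(levels(toy$LoanStatus))
levels(toy$LoanStatus)  # "X0" "X1"

# The modelling part only runs if caret and pROC are installed.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("pROC", quietly = TRUE)) {
  library(caret)
  ctrl <- trainControl(method = "cv", number = 5,
                       classProbs = TRUE,  # request class probabilities
                       summaryFunction = twoClassSummary)
  # metric = "ROC" selects on ROC explicitly instead of the default Accuracy
  fit <- train(LoanStatus ~ ., data = toy, method = "glm",
               metric = "ROC", trControl = ctrl)
  print(fit$results)
}
```

Setting `metric = "ROC"` explicitly should also avoid the 'The metric "Accuracy" was not in the result set' warning, since Accuracy is not among the values `twoClassSummary` returns.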