R - createDataPartition returns more samples than expected

Question

I am trying to split the iris dataset into a training set and a test set. I used createDataPartition() as follows:

library(caret)
createDataPartition(iris$Species, p=0.1)
# [1]  12  22  26  41  42  57  63  79  89  93 114 117 134 137 142

createDataPartition(iris$Sepal.Length, p=0.1)
# [1]   1  27  44  46  54  68  72  77  83  84  93  99 104 109 117 132 134

I understand the first result: I get a vector of 0.1 * 150 = 15 elements (150 is the number of samples in the dataset). I would expect a vector of the same length from the second call, but I get 17 elements instead of 15.

Any ideas as to why I am getting these results?

r dataset r-caret

moby91 Oct 05 '17 at 8:45

1 answer

desertnaut · Answer 1 · 2017-10-05T17:46:49+0000

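Sepal.Length is a numeric feature; from the online documentation:

For numeric y, the sample is split into groups sections based on percentiles and sampling is done within these subgroups. For createDataPartition, the number of percentiles is set via the groups argument.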

where the default value for groups is:

groups = min(5, length(y))
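The documented behaviour can be mimicked in a few lines of base R. This is a minimal sketch, not caret's actual code: `partition_sketch` is a hypothetical name, and both the binning (quantile break points passed to `cut(..., include.lowest = TRUE)`) and the per-group draw of `ceiling(n * p)` rows are assumptions about what createDataPartition does internally.

```r
# Hypothetical helper, NOT caret's code: a base-R sketch of the stratified
# sampling that the createDataPartition() documentation describes.
partition_sketch <- function(y, p, groups = min(5, length(y))) {
  if (is.numeric(y)) {
    # Numeric y: 'groups' quantile break points define groups - 1 intervals
    breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups)))
    y <- cut(y, breaks, include.lowest = TRUE)
  }
  # Row indices grouped by stratum (factor level or quantile interval)
  strata <- split(seq_along(y), y, drop = TRUE)
  # Assumed rule: draw ceiling(n_k * p) rows from each stratum
  picked <- lapply(strata, function(idx) {
    idx[sample.int(length(idx), ceiling(length(idx) * p))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(42)
train_num <- partition_sketch(datasets::iris$Sepal.Length, p = 0.1)
train_fac <- partition_sketch(datasets::iris$Species, p = 0.1)
length(train_num)  # 17
length(train_fac)  # 15
```

Only the lengths are deterministic here; which row indices get picked depends on the random seed.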

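Here is what happens in your case:

Since you do not specify groups, it takes the value min(5, 150) = 5 breaks; these breaks correspond to the natural quantiles, i.e. the minimum, the 1st quartile, the median, the 3rd quartile and the maximum, which you can see from the summary: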

> summary(iris$Sepal.Length)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.300   5.100   5.800   5.843   6.400   7.900
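The break points shown by summary() are simply the quantiles at five equally spaced probabilities. A short check, assuming createDataPartition uses R's default quantile() (type 7, the same one summary() uses) to compute them:

```r
# Break points for a numeric y under the default groups = 5:
# the probabilities 0, 0.25, 0.5, 0.75, 1 give five quantiles
breaks <- quantile(datasets::iris$Sepal.Length, probs = seq(0, 1, length.out = 5))
unname(breaks)      # 4.3 5.1 5.8 6.4 7.9
length(breaks) - 1  # 4 intervals between 5 break points
```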

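For numeric y, the function takes a fraction p = 0.1 from each of the (4) intervals defined by the above breaks (quantiles); let's see how many samples fall into each interval: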

l1 = length(which(iris$Sepal.Length >= 4.3 & iris$Sepal.Length <= 5.1)) # 41
l2 = length(which(iris$Sepal.Length > 5.1 & iris$Sepal.Length <= 5.8))  # 39
l3 = length(which(iris$Sepal.Length > 5.8 & iris$Sepal.Length <= 6.4))  # 35
l4 = length(which(iris$Sepal.Length > 6.4 & iris$Sepal.Length <= 7.9))  # 35
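The four counts can also be obtained in one call with cut() and table(). This sketch assumes right-closed intervals with the minimum included in the first one, which is what the boundaries of l1 to l4 above express:

```r
sl <- datasets::iris$Sepal.Length
breaks <- quantile(sl, probs = seq(0, 1, length.out = 5))
# Right-closed intervals; include.lowest = TRUE keeps the minimum 4.3 in bin 1
bins <- cut(sl, breaks, include.lowest = TRUE)
counts <- as.vector(table(bins))
counts       # 41 39 35 35
sum(counts)  # 150
```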

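How many samples exactly will be returned from each interval? Here is the catch: according to line 140 of the source code, it will be the ceiling of the product of the no. of samples and your p; let's see what this gives in your case for p = 0.1: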

p = 0.1
ceiling(l1*p) + ceiling(l2*p) + ceiling(l3*p) + ceiling(l4*p)
# 17
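Rounding up per stratum is where the surplus comes from: 4.1, 3.9, 3.5 and 3.5 each round up, adding 0.9 + 0.1 + 0.5 + 0.5 = 2 extra rows, whereas each Species class has exactly 50 rows, so 50 * 0.1 = 5 needs no rounding. A base-R sketch of both cases (it does not call caret; it only redoes the arithmetic, assuming the same binning as above):

```r
p <- 0.1
sl <- datasets::iris$Sepal.Length
bins <- cut(sl, quantile(sl, probs = seq(0, 1, length.out = 5)), include.lowest = TRUE)
# Rows drawn per interval: 4.1, 3.9, 3.5, 3.5 each rounded up
per_bin <- ceiling(as.vector(table(bins)) * p)
per_bin       # 5 4 4 4
sum(per_bin)  # 17

# Species is a factor with three classes of 50 rows: ceiling(50 * 0.1) = 5 each
per_class <- ceiling(as.vector(table(datasets::iris$Species)) * p)
sum(per_class)  # 15
```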

That's exactly the 17 samples you got! :)
