Note that since R 3.4.0 (2017-04-21), smooth.spline can accept a direct specification of λ through the newly added argument lambda. During estimation it is still converted to the internal spar, so the answer below is unaffected.
The smoothing parameter λ / spar lies at the center of smoothness control
Smoothness is controlled by the smoothing parameter λ. Internally, smooth.spline() works with spar rather than λ:
spar = s0 + 0.0601 * log(λ)
This log transform is needed so that the minimization (of GCV / CV, for example) can be done without constraints. A user specifies spar to indirectly specify λ: as spar grows linearly, λ grows exponentially (for example, increasing spar by 1 multiplies λ by exp(1 / 0.0601) ≈ 1.7e7). A large spar value is therefore rarely needed.
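To see the spar-to-λ mapping numerically, here is a minimal sketch on made-up data (xx and yy are toy vectors, not the x and y from your question): two fits that differ only in spar should show log(lambda) changing by about 1 / 0.0601 ≈ 16.6 per unit of spar.

set.seed(1)
xx <- runif(500)
yy <- sin(2 * pi * xx) + rnorm(500, sd = 0.3)
f1 <- smooth.spline(xx, yy, spar = 1)
f2 <- smooth.spline(xx, yy, spar = 1.5)
## slope of log(lambda) against spar; should be about 1 / 0.0601 ~= 16.6
(log(f2$lambda) - log(f1$lambda)) / (1.5 - 1)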
The degrees of freedom df are also defined in terms of λ:

df = trace( X (X'X + λS)^(-1) X' )

where X is the model matrix of the B-spline basis and S is the penalty matrix.
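As a quick sanity check of this definition (again on the toy xx, yy; the direct lambda argument needs R >= 3.4.0, as noted at the top), df shrinks as λ grows:

set.seed(1)
xx <- runif(500)
yy <- sin(2 * pi * xx) + rnorm(500, sd = 0.3)
## df is the trace of the smoother matrix, so it decreases as lambda increases
sapply(c(1e-6, 1e-4, 1e-2, 1), function (l) smooth.spline(xx, yy, lambda = l)$df)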
You can check these relationships on your dataset:
spar <- seq(1, 2.5, by = 0.1)
a <- sapply(spar, function (spar_i) unlist(smooth.spline(x, y, all.knots = TRUE, spar = spar_i)[c("df", "lambda")]))
Let us plot df ~ spar, λ ~ spar and log(λ) ~ spar:
par(mfrow = c(1, 3))
plot(spar, a[1, ], type = "b", main = "df ~ spar", xlab = "spar", ylab = "df")
plot(spar, a[2, ], type = "b", main = "lambda ~ spar", xlab = "spar", ylab = "lambda")
plot(spar, log(a[2, ]), type = "b", main = "log(lambda) ~ spar", xlab = "spar", ylab = "log(lambda)")

Note the dramatic growth of λ with spar, the linear relationship between log(λ) and spar, and the relatively smooth relationship between df and spar.
How smooth.spline() iterates to find spar
If we manually specify a spar value, as we did inside sapply(), no iteration over spar is performed; otherwise smooth.spline() has to iterate through a series of spar values. If we
- specify cv = TRUE / FALSE, the iteration aims to minimize the CV / GCV score;
- specify df = mydf, the iteration aims to minimize (df(spar) - mydf) ^ 2.
Minimizing GCV is straightforward: we do not care about the GCV value reached, only about the corresponding spar. By contrast, when minimizing (df(spar) - mydf)^2 we often care about the df value at the end of the iteration, not about spar. But since this is a minimization problem, there is no guarantee that the final df matches our target mydf.
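Here is a minimal sketch of the two modes on the same toy data (the target df = 6 is just an illustrative choice, not from your question):

set.seed(1)
xx <- runif(500)
yy <- sin(2 * pi * xx) + rnorm(500, sd = 0.3)
fit_gcv <- smooth.spline(xx, yy)              ## cv = FALSE (default): iterate spar to minimize GCV
fit_cv  <- smooth.spline(xx, yy, cv = TRUE)   ## iterate spar to minimize leave-one-out CV
fit_df  <- smooth.spline(xx, yy, df = 6)      ## iterate spar to minimize (df(spar) - 6)^2
c(gcv = fit_gcv$spar, cv = fit_cv$spar, df6 = fit_df$spar)
fit_df$df  ## close to 6, but not guaranteed to match exactly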
Why do you specify df = 3 but get df = 9.864?
The iteration can stop for one of three reasons: it reached a minimum, it reached the search boundary, or it hit the maximum number of iterations.
We are far from the iteration limit (500 by default), yet the minimum has clearly not been reached (otherwise df would be near 3), so we must have hit the search boundary.
Do not focus on df; look at spar instead.
smooth.spline(x, y, all.knots=TRUE, df=3)$spar
According to ?smooth.spline, by default smooth.spline() searches spar within [-1.5, 1.5]. That is, when you set df = 3, the minimization stops at the search boundary, before reaching df = 3.
Look again at our plot of df against spar. From the figure, it looks like a spar value of about 2 is needed to get df = 3.
Raise the upper search bound via the control.spar argument:
fit <- smooth.spline(x, y, all.knots = TRUE, df = 3, control.spar = list(high = 2.5))
# Smoothing Parameter  spar= 1.859066  lambda= 0.9855336 (14 iterations)
# Equivalent Degrees of Freedom (Df): 3.000305
Now you see that we end up with df = 3, and that it takes spar = 1.86 to get there.
Better advice: do not use all.knots = TRUE
You have 1000 data points, so with all.knots = TRUE you use 1000 parameters. Asking for df = 3 means suppressing 997 of those 1000 parameters; imagine how large a λ, and therefore spar, you need!
Try a penalized regression spline instead, i.e., use fewer knots. Suppressing 200 parameters down to 3 is certainly much easier:
fit <- smooth.spline(x, y, nknots = 200, df = 3)  ## using 200 knots
# Smoothing Parameter  spar= 1.317883  lambda= 0.9853648 (16 iterations)
# Equivalent Degrees of Freedom (Df): 3.000386
Now you end up with df = 3 without having to extend the search interval for spar.