This is an excellent (and perhaps difficult) exercise for digesting the theory and implementation of linear models. My answer consists of two parts:
- Part 1 (this answer) introduces the parameterization I use and shows how this piecewise regression reduces to an ordinary least squares problem. R functions are provided for model estimation, break point selection, and prediction.
- Part 2 (a separate answer) demonstrates, with a toy reproducible example, how to use the functions defined here.
I have to use a different parameterization, because the one you gave in your question is wrong! Your parameterization only ensures continuity of the function value, not of the first derivative. That is why your fitted line is not tangent to the corresponding quadratic polynomial at the break point.
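To spell out the parameterization that `getX` below implements (my restatement of it, with `c` denoting the break point):

$$f(x) = \beta_0 + \beta_1 (x - c) + \beta_2 \min(x - c, 0)^2.$$

For $x \ge c$ this is the straight line $\beta_0 + \beta_1 (x - c)$; for $x < c$ the quadratic term switches on. At $x = c$ both pieces have value $\beta_0$ and slope $\beta_1$, so the line is tangent to the parabola at the break point, and since $f$ is linear in $(\beta_0, \beta_1, \beta_2)$ for a fixed $c$, fitting it is an ordinary least squares problem.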

```r
## generate design matrix
getX <- function (x, c) {
  x <- x - c
  cbind("beta0" = 1, "beta1" = x, "beta2" = pmin(x, 0) ^ 2)
}
```
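As a quick sanity check (made-up numbers of mine), the quadratic column only switches on to the left of the break point:

```r
getX(c(1, 2, 3, 4), c = 2.5)
##      beta0 beta1 beta2
## [1,]     1  -1.5  2.25
## [2,]     1  -0.5  0.25
## [3,]     1   0.5  0.00
## [4,]     1   1.5  0.00
```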



The `est` function below builds on `.lm.fit` (for maximum efficiency) to estimate the model and report summaries for a given break point `c`:
```r
## `x`, `y` give data points; `c` is the known break point
est <- function (x, y, c) {
  ## model matrix
  X <- getX(x, c)
  p <- dim(X)[2L]
  ## solve least squares with QR factorization
  fit <- .lm.fit(X, y)
  ## compute Pearson estimate of `sigma ^ 2`
  r <- c(fit$residuals)
  n <- length(r)
  RSS <- c(crossprod(r))
  sig2 <- RSS / (n - p)
  ## coefficients summary table
  beta <- fit$coefficients
  R <- "dimnames<-"(fit$qr[1:p, ], NULL)
  Rinv <- backsolve(R, diag(p))
  se <- sqrt(rowSums(Rinv ^ 2) * sig2)
  tstat <- beta / se
  pval <- 2 * pt(abs(tstat), n - p, lower.tail = FALSE)
  tab <- matrix(c(beta, se, tstat, pval), nrow = p, ncol = 4L,
                dimnames = list(dimnames(X)[[2L]],
                                c("Estimate", "Std. Error", "t value", "Pr(>|t|)")))
  ## 2 * negative log-likelihood
  nega2logLik <- n * log(2 * pi * sig2) + (n - p)
  ## AIC / BIC
  aic <- nega2logLik + 2 * (p + 1)
  bic <- nega2logLik + log(n) * (p + 1)
  ## multiple R-squared and adjusted R-squared
  TSS <- c(crossprod(y - sum(y) / n))
  r.squared <- 1 - RSS / TSS
  adj.r.squared <- 1 - sig2 * (n - 1) / TSS
  ## return
  list(coefficients = beta, residuals = r, fitted.values = c(X %*% beta),
       R = R, sig2 = sig2, coef.table = tab, aic = aic, bic = bic, c = c,
       RSS = RSS, r.squared = r.squared, adj.r.squared = adj.r.squared)
}
```
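In case the standard-error computation looks cryptic, here is my reading of it: `.lm.fit` returns the QR factorization of the model matrix, so $X^\top X = R^\top R$ and

$$\widehat{\operatorname{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1} = \hat\sigma^2 R^{-1} R^{-\top},$$

whose diagonal is exactly `rowSums(Rinv ^ 2) * sig2` once `Rinv <- backsolve(R, diag(p))` has inverted the triangular factor.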
As you can see, it also returns various summaries, as if `summary.lm` had been called. Now we write a wrapper, `choose.c`, that fits the model on a grid of candidate break points, plots RSS against `c.grid`, and returns the best model at the selected `c`. The snippet below only gets as far as validating the grid; a sketch of the rest follows.

```r
choose.c <- function (x, y, c.grid) {
  if (is.unsorted(c.grid)) stop("'c.grid' is not increasing")
```
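Here is a minimal completion of the grid search, a sketch of mine assuming `choose.c` should simply call `est` at every candidate break point, draw the RSS trace, and return the fit with the smallest RSS:

```r
choose.c <- function (x, y, c.grid) {
  if (is.unsorted(c.grid)) stop("'c.grid' is not increasing")
  ## fit the model at every candidate break point
  fits <- lapply(c.grid, function (c) est(x, y, c))
  ## RSS of each fit
  RSS <- sapply(fits, "[[", "RSS")
  ## plot the RSS trace against the candidate break points
  plot(c.grid, RSS, type = "b", pch = 19, xlab = "c", ylab = "RSS")
  ## return the fit minimizing RSS
  fits[[which.min(RSS)]]
}
```

The returned model carries the chosen break point in its `c` component, so it can be passed straight to `pred` below.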
So far so good. To complete the story, we also need a prediction routine.
```r
pred <- function (model, x.new) {
  ## prediction matrix
  X <- getX(x.new, model$c)
  p <- dim(X)[2L]
  ## predicted mean
  fit <- X %*% model$coefficients
  ## prediction standard error
  Qt <- forwardsolve(t(model$R), t(X))
  se <- sqrt(colSums(Qt ^ 2) * model$sig2)
  ## 95%-confidence interval
  alpha <- qt(0.025, length(model$residuals) - p)
  lwr <- fit + alpha * se
  upr <- fit - alpha * se
  ## return
  matrix(c(fit, se, lwr, upr), ncol = 4L,
         dimnames = list(NULL, c("fit", "se", "lwr", "upr")))
}
```
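The `forwardsolve` step is the same QR shortcut as in `est`, applied to the new data: `Qt` solves $R^\top Q_t = X_{\text{new}}^\top$, so

$$X_{\text{new}} (X^\top X)^{-1} X_{\text{new}}^\top = Q_t^\top Q_t,$$

whose diagonal is `colSums(Qt ^ 2)`; multiplied by `sig2` it gives the variance of the fitted mean, and `lwr` / `upr` are the resulting 95% confidence limits (an interval for the mean curve, not for new observations).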