In my previous blog post I demonstrated how to fit a simple (nonlinear) quantile regression model using a bivariate copula. In these simple quantile regression models, we have one independent and one dependent variable.
The R code below may be used for fitting a multiple (nonlinear) quantile regression model. These multiple (nonlinear) quantile regression models have two or more independent variables (but only one dependent variable). The R code fits these multiple (nonlinear) quantile regression models by means of a multivariate (Archimedean or elliptical) copula.
Finally, the algorithm used for fitting the multiple regression models in the R code below is based on Nelsen’s method for fitting quantile regression models (which I described in my previous blog post).
Suggestions and/or questions? Please contact Stefan Gelissen (email: info at datall-analyse.nl).
See this page for an overview of all of Stefan’s R code blog posts.
R code
library(copula)
library(MASS)
library(scatterplot3d)
#Function for performing multiple (nonlinear) quantile regression
#by means of a fitted multivariate copula
qrCopula <- function(copula, p=.5, marginal, x) {
#-copula is a fitted copula as returned by fitMvdc(). At the moment,
# only Archimedean and elliptical copulas are accepted.
#-p is a number specifying the quantile to be estimated, where p
# is defined in [0, 1].
#-marginal is an integer indicating which of the copula's marginals should
# be treated as the dependent variable. For instance, assume we defined
# x1, x2, and x3 as marginals (in that order) in a multivariate copula.
# If we subsequently want to treat x2 as the dependent variable, then we
# set marginal=2. See Examples 1 and 2 below for details.
#-x is a vector with numbers specifying at which data points we want to
# estimate the quantile. In other words, the numbers in this vector
# are the values on the independent variables.
# The numbers in this vector should be in exactly the same order
# as their corresponding marginals in the fitted copula.
# For instance, assume again that we defined x1, x2, and x3 as
# marginals (in that order) in a multivariate copula. Next, we treat
# x2 as the dependent variable, and x1 and x3 as independent variables.
# As a result, the order of the data points in our vector has to be (x1, x3).
# See Examples 1 and 2 below for details.
mvdcQR <- copula
#number of dimensions
nd <- mvdcQR@mvdc@copula@dimension
#independent variables
indepVar <- c(1:nd)[-marginal]
#compute u[1], ..., u[n-1] for the independent variables
ui <- sapply(1:(nd-1), function (k) {
n <- indepVar[k]
cdf.margin <- copula:::asCall(paste0("p", mvdcQR@mvdc@margins[n]),
mvdcQR@mvdc@paramMargins[[n]])
eval(cdf.margin, list(x=x[k]))
})
#compute dependent variable
ud <- NA
udfun <- function (ud, ui, alpha, t) {t - cCopula(t(as.matrix(c(ui, ud))),
copula=mvdcQR@mvdc@copula,
indices=nd)}
try(ud <- uniroot(udfun, interval=c(1-.99999999999999, .99999999999999),
ui=ui, alpha=alpha, t=p, tol=1e-5)$root, silent=FALSE)
qdf.margin <- copula:::asCall(paste0("q", mvdcQR@mvdc@margins[marginal]),
mvdcQR@mvdc@paramMargins[[marginal]])
x2 <- eval(qdf.margin, list(x=ud))
x2
}
#Function for computing the conditional probability:
#P{ Y<=y | X(1)=x(1), ..., X(N)=x(n) }, where y is the dependent variable
#and x(1), ..., x(n) the independent variables.
condProbY <- function(copula, marginal, data) {
#-copula is a fitted copula as returned by fitMvdc(). At the moment,
# only Archimedean and elliptical copulas are accepted.
#-marginal is an integer indicating which of the copula's marginals should
# be treated as the dependent variable. For instance, assume we defined
# x1, x2, and x3 as marginals (in that order) in a multivariate copula.
# If we subsequently want to treat x2 as the dependent variable, then we
# set marginal=2. See Examples 1 and 2 below for details.
#-data is a vector with numbers specifying at which values for x we
# want to compute the conditional probability.
# The numbers in this vector should be in exactly the same order
# as their corresponding marginals in the fitted copula.
# For instance, assume again that we defined x1, x2, and x3 as
# marginals (in that order) in a multivariate copula. Next, we treat
# x2 as the dependent variable, and x1 and x3 as independent variables.
# As a result, the order of the data points in our vector has to be (x1, x3).
# See Examples 1 and 2 below for details.
mvdcQR <- copula
#number of dimensions
nd <- mvdcQR@mvdc@copula@dimension
#independent variables
indepVar <- c(1:nd)[-marginal]
#compute u[1], ..., u[n]
u <- sapply(1:nd, function (k) {
cdf.margin <- copula:::asCall(paste0("p", mvdcQR@mvdc@margins[k]),
mvdcQR@mvdc@paramMargins[[k]])
eval(cdf.margin, list(x=data[k]))
})
ui <- u[indepVar] #independent variables
ud <- u[marginal] #dependent variable
#compute the conditional probability for y given the independent variables
cp <- cCopula(t(as.matrix(c(ui, ud))), copula=mvdcQR@mvdc@copula, indices=nd)
cp
}
##EXAMPLE 1: multiple linear quantile regression
##Generate some artificial data using a Gaussian copula
#3-dimensional Gaussian copula
myCop <- normalCopula(param=c(.5, .4, -.3), dim=3, dispstr="un")
#Marginals:
#the first marginal is a normal distribution, the second marginal is
#a normal distribution, and the third marginal is also a normal distribution.
myMvdc <- mvdc(copula=myCop, c("norm", "norm", "norm"),
list(list(mean=5, sd=.02),
list(mean=2, sd=3),
list(mean=10, sd=1.5)))
#Draw sample (this sample will subsequently be used as the artificial data)
ns <- 200 #sample size
xyzs <- rMvdc(ns, myMvdc)
#Visualize the data:
#we will label xyzs[,1] as x1 (this is the first marginal in our specified
#copula), xyzs[,2] as x2 (this is the second marginal in our copula), and
#xyzs[,3] as x3 (this is the third marginal in our copula).
scatterplot3d(xyzs, xlab="x1", ylab="x2", zlab="x3",
highlight.3d=TRUE, box=FALSE)
##Fit a 3-dimensional Gaussian copula to the artificial data
#Obtain start values for fitting the copula
normMLE <- fitdistr(xyzs[,1], densfun="normal")
normMean1 <- normMLE$estimate[1]
normSD1 <- normMLE$estimate[2]
normMLE <- fitdistr(xyzs[,2], densfun="normal")
normMean2 <- normMLE$estimate[1]
normSD2 <- normMLE$estimate[2]
normMLE <- fitdistr(xyzs[,3], densfun="normal")
normMean3 <- normMLE$estimate[1]
normSD3 <- normMLE$estimate[2]
#Starting values for the copula parameters
tauEst <- sin(cor(xyzs, method="kendall")*pi/2)
tauEst <- tauEst[upper.tri(tauEst)]
#Fit a Gaussian copula to the artificial data
myGaussian <- mvdc(copula=normalCopula(param=c(1,1,1), dispstr="un", dim=3),
c("norm", "norm", "norm"),
paramMargins=list(list(mean=0, sd=1),
list(mean=0, sd=1),
list(mean=0, sd=1)))
start <- c(normMean1, normSD1, normMean2, normSD2, normMean3, normSD3, tauEst)
fitGaussian <- fitMvdc(xyzs, myGaussian, start=start)
fitGaussian
##Quantile regression: regress x3 on both x1 and x2
#We will treat x3 as the dependent variable (=y), and both
#x1 and x2 as independent variables.
#Fit a (straight) plane for the median
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=5)
x2points <- seq(min(xyzs[,2]), max(xyzs[,2]), length.out=5)
x1x2points <- expand.grid(x1points, x2points)
#Note that the order of the marginals in our fitted copula
#is x1, x2, x3. As a result, the order of the values in the
#vector for x has to be (x1, x2).
#Furthermore, we will treat x3 as the dependent variable, thus set marginal=3.
ypoints <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian, p=.5, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#Color the data points that lie above the plane for the median black, and
#points below the plane light gray.
cprobs <- sapply(1:nrow(xyzs), function (k)
condProbY(copula=fitGaussian, marginal=3, data=xyzs[k,]))
colorPoint <- ifelse(cprobs < .5, gray(level=.6, alpha=.7),
gray(level=0, alpha=1))
#Plot the plane
qrplot <- scatterplot3d(xyzs, xlab="x1", ylab="x2", zlab="y",
angle=40, color=colorPoint, box=FALSE)
for(i in length(x1points):1)
qrplot$points3d(rep(x1points[i], length(x2points)), x2points, ypoints[i,],
type="l", col="blue", lty=2)
for(i in length(x2points):1)
qrplot$points3d(x1points, rep(x2points[i], length(x1points)), ypoints[,i],
type="l", col="blue", lty=2)
#Approximately 100 (out of 200) data points should lie above the
#the plane for the median.
sum(cprobs>=.5)
#Fit a quantile curve while varying the values for x1, and fixing x2
#to some value.
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=50) #values for x1
x2points <- 0 #fixed value for x2
x1x2points <- expand.grid(x1points, x2points)
#First compute true quantile curves for these data points
fitGaussian2 <- fitGaussian
fitGaussian2@mvdc@copula@parameters <- c(.5, .4, -.3)
fitGaussian2@mvdc@paramMargins[[1]]$mean <- 5
fitGaussian2@mvdc@paramMargins[[1]]$sd <- .02
fitGaussian2@mvdc@paramMargins[[2]]$mean <- 2
fitGaussian2@mvdc@paramMargins[[2]]$sd <- 3
fitGaussian2@mvdc@paramMargins[[3]]$mean <- 10
fitGaussian2@mvdc@paramMargins[[3]]$sd <- 1.5
#compute the true median
ypointsTmed <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian2, p=.5, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#compute the true .05th-quantile
ypointsT05 <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian2, p=.05, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#compute the true .95th-quantile
ypointsT95 <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian2, p=.95, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#Plot the results
plot(x1points, ypointsTmed, type="l", col="gray", xlab="x1", ylab="y",
main=paste("x2 = ", x2points), ylim=c(min(ypointsT05), max(ypointsT95)))
lines(x1points, ypointsT05, lty=2, col="gray")
lines(x1points, ypointsT95, lty=2, col="gray")
#add a legend
legend("topleft", lty=c(1, 2, 1, 1, 1),
col=c("black", "black", "gray", "blue", "red"),
legend=c("Median", "90% prediction interval", "True quantile curves",
"qrCopula function", "rq function"),
bty="n", y.intersp=1.2, cex=.7)
#Fit quantiles curves using the qrCopula function:
#compute the median
ypointsmed <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian, p=.5, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#compute the .05th-quantile
ypoints05 <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian, p=.05, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#compute the .95th-quantile
ypoints95 <- matrix(sapply(1:nrow(x1x2points), function (k)
qrCopula(copula=fitGaussian, p=.95, marginal=3, x=as.numeric(x1x2points[k,]))),
nrow=length(x1points))
#add the results to the plot
lines(x1points, ypointsmed, lty=1, col="blue")
lines(x1points, ypoints05, lty=2, col="blue")
lines(x1points, ypoints95, lty=2, col="blue")
#Note that the model for our data is a linear quantile regression model.
#As a result, we may also fit the quantiles curves using the rq function
#in the quantreg package.
library(quantreg)
#fit a linear quantile regression model for the median
rqmodmed <- rq(y ~ x1 + x2, tau=.5,
data=data.frame(y=xyzs[,3], x1=xyzs[,1], x2=xyzs[,2]))
#fit a linear quantile regression model for the .05th-quantile
rqmod05 <- rq(y ~ x1 + x2, tau=.05,
data=data.frame(y=xyzs[,3], x1=xyzs[,1], x2=xyzs[,2]))
#fit a linear quantile regression model for the .95th-quantile
rqmod95 <- rq(y ~ x1 + x2, tau=.95,
data=data.frame(y=xyzs[,3], x1=xyzs[,1], x2=xyzs[,2]))
#compute for these 3 linear quantile regression models the
#corresponding quantile curves
ypointsmed <- predict(rqmodmed, newdata=data.frame(x1=x1points, x2=x2points))
ypoints05 <- predict(rqmod05, newdata=data.frame(x1=x1points, x2=x2points))
ypoints95 <- predict(rqmod95, newdata=data.frame(x1=x1points, x2=x2points))
#add the results to the plot
lines(x1points, ypointsmed, lty=1, col="red")
lines(x1points, ypoints05, lty=2, col="red")
lines(x1points, ypoints95, lty=2, col="red")
##EXAMPLE 2: nonlinear quantile regression
##Generate some artificial data using a Gumbel copula
#3-dimensional Gumbel copula
myCop <- gumbelCopula(param=3, dim=3)
#Marginals:
#the first marginal is a normal distribution, the second marginal a
#gamma distribution, and the third marginal a Weibull distribution
myMvdc <- mvdc(copula=myCop, c("norm", "gamma", "weibull"),
list(list(mean=5, sd=.02),
list(shape=1.2, scale=3),
list(shape=.3, scale=5)))
#Draw sample (this sample will subsequently be used as the artificial data)
ns <- 200 #sample size
xyzs <- rMvdc(ns, myMvdc)
#Visualize the data:
#we will label xyzs[,1] as x1 (this is the first marginal in our specified
#copula), xyzs[,2] as x2 (this is the second marginal in our copula), and
#xyzs[,3] as x3 (this is the third marginal in our copula).
scatterplot3d(x=xyzs[,1], y=xyzs[,3], z=xyzs[,2],
xlab="x1", ylab="x3", zlab="x2",
highlight.3d=TRUE)
##Fit a 3-dimesional Gumbel copula to the artificial data
#Obtain start values for fitting the copula
normMLE <- fitdistr(xyzs[,1], densfun="normal")
normMean1 <- normMLE$estimate[1]
normSD1 <- normMLE$estimate[2]
gammaMLE <- fitdistr(xyzs[,2], densfun="gamma")
gammaShape2 <- gammaMLE$estimate[1]
gammaScale2 <- 1/gammaMLE$estimate[2]
weibullMLE <- fitdistr(xyzs[,3], densfun="weibull")
weibullShape3 <- weibullMLE$estimate[1]
weibullScale3 <- weibullMLE$estimate[2]
#Starting values for the copula parameter
tauEst <- cor(xyzs[,1], xyzs[,2], method="kendall")
thetaGumbel <- 1 / (1-tauEst)
#Fit a Gumbel copula to the artificial data
myGumbel <- mvdc(copula=gumbelCopula(param=1.5, dim=3),
c("norm", "gamma", "weibull"),
paramMargins=list(list(mean=0, sd=1),
list(shape=1, scale=1),
list(shape=1, scale=1)))
start <- c(normMean1, normSD1, gammaShape2, gammaScale2,
weibullShape3, weibullScale3, thetaGumbel)
fitGumbel <- fitMvdc(xyzs, myGumbel, start=start)
fitGumbel
##Quantile regression: regress x2 on both x1 and x3
#We will treat x2 as the dependent variable (=y), and both
#x1 and x3 as independent variables.
#Fit a curved plane for the median
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=15)
x3points <- seq(min(xyzs[,3]), max(xyzs[,3]), length.out=10)
x1x3points <- expand.grid(x1points, x3points)
#Note that the order of the marginals in our fitted copula
#is x1, x2, x3. As a result, the order of the values in the
#vector for x has to be (x1, x3).
#Furthermore, we will treat x2 as the dependent variable, thus marginal=2.
ypoints <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel, p=.5, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#Color the data points that lie above the plane for the median black, and
#points below the plane light gray.
cprobs <- sapply(1:nrow(xyzs), function (k)
condProbY(copula=fitGumbel, marginal=2, data=xyzs[k,]))
colorPoint <- ifelse(cprobs < .5, gray(level=.6, alpha=.7),
gray(level=0, alpha=1))
#Plot the plane
qrplot <- scatterplot3d(x=xyzs[,1], y=xyzs[,3], z=xyzs[,2], angle=60,
xlab="x1", ylab="x3", zlab="y",
color=colorPoint, scale.y=.5, box=FALSE)
for(i in length(x1points):1)
qrplot$points3d(rep(x1points[i], length(x3points)), x3points, ypoints[i,],
type="l", col="blue", lty=2)
for(i in length(x3points):1)
qrplot$points3d(x1points, rep(x3points[i], length(x1points)), ypoints[,i],
type="l", col="blue", lty=2)
#Approximately 100 (out of 200) data points should lie above the
#the plane for the median
sum(cprobs>=.5)
#Fit a quantile curve while varying the values for x1, and fixing x3
#to some value.
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=50) #values for x1
x3points <- 500 #fixed value for x3
x1x3points <- expand.grid(x1points, x3points)
#First compute true quantile curves for these data points
fitGumbel2 <- fitGumbel
fitGumbel2@mvdc@copula@parameters <- 3
fitGumbel2@mvdc@paramMargins[[1]]$mean <- 5
fitGumbel2@mvdc@paramMargins[[1]]$sd <- .02
fitGumbel2@mvdc@paramMargins[[2]]$shape <- 1.2
fitGumbel2@mvdc@paramMargins[[2]]$scale <- 3
fitGumbel2@mvdc@paramMargins[[3]]$shape <- .3
fitGumbel2@mvdc@paramMargins[[3]]$scale <- 5
#compute the true median
ypointsTmed <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel2, p=.5, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#compute the true .05th-quantile
ypointsT05 <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel2, p=.05, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#compute the true .95th-quantile
ypointsT95 <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel2, p=.95, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#Plot the results
plot(x1points, ypointsTmed, type="l", col="gray", xlab="x1", ylab="y",
main=paste("x3 = ", x3points), ylim=c(min(ypointsT05), max(ypointsT95)))
lines(x1points, ypointsT05, lty=2, col="gray")
lines(x1points, ypointsT95, lty=2, col="gray")
#add a legend
legend("topleft", lty=c(1, 2, 1, 1, 1),
col=c("black", "black", "gray", "blue"),
legend=c("Median", "90% prediction interval", "True quantile curves",
"Fitted quantile curves"),
bty="n", y.intersp=1.2, cex=.7)
#Fit quantiles curves using the qrCopula function:
#compute the median
ypointsmed <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel, p=.5, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#compute the .05th-quantile
ypoints05 <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel, p=.05, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#compute the .95th-quantile
ypoints95 <- matrix(sapply(1:nrow(x1x3points), function (k)
qrCopula(copula=fitGumbel, p=.95, marginal=2, x=as.numeric(x1x3points[k,]))),
nrow=length(x1points))
#add the results to the plot
lines(x1points, ypointsmed, lty=1, col="blue")
lines(x1points, ypoints05, lty=2, col="blue")
lines(x1points, ypoints95, lty=2, col="blue")