R code for fitting a multiple (nonlinear) quantile regression model by means of a copula

In my previous blog post I demonstrated how to fit a simple (nonlinear) quantile regression model using a bivariate copula. In these simple quantile regression models, we have one independent and one dependent variable.

The R code below may be used for fitting a multiple (nonlinear) quantile regression model. These multiple (nonlinear) quantile regression models have two or more independent variables (but only one dependent variable). The R code fits these multiple (nonlinear) quantile regression models by means of a multivariate (Archimedean or elliptical) copula.

quantile-regression-multivariate-copula

Finally, the algorithm used for fitting the multiple regression models in the R code below is based on Nelsen’s method for fitting quantile regression models (which I described in my previous blog post).

Suggestions and/or questions? Please contact Stefan Gelissen (email: info at datall-analyse.nl).
See this page for an overview of all of Stefan’s
R code blog posts.

R code

library(copula)
library(MASS)
library(scatterplot3d)

#Function for performing multiple (nonlinear) quantile regression
#by means of a fitted multivariate copula
qrCopula <- function(copula, p=.5, marginal, x) {
  #-copula is a fitted copula as returned by fitMvdc(). At the moment,
  # only Archimedean and elliptical copulas are accepted.
  #-p is a number specifying the quantile to be estimated, where p
  # is defined in [0, 1].
  #-marginal is an integer indicating which of the copula's marginals should
  # be treated as the dependent variable. For instance, assume we defined 
  # x1, x2, and x3 as marginals (in that order) in a multivariate copula.
  # If we subsequently want to treat x2 as the dependent variable, then we
  # set marginal=2. See Examples 1 and 2 below for details.
  #-x is a vector with numbers specifying at which data points we want to
  # estimate the quantile. In other words, the numbers in this vector
  # are the values on the independent variables. 
  # The numbers in this vector should be in exactly the same order
  # as their corresponding marginals in the fitted copula.
  # For instance, assume again that we defined x1, x2, and x3 as
  # marginals (in that order) in a multivariate copula. Next, we treat
  # x2 as the dependent variable, and x1 and x3 as independent variables.
  # As a result, the order of the data points in our vector has to be (x1, x3).
  # See Examples 1 and 2 below for details.
  
  mvdcQR <- copula
  
  #number of dimensions
  nd <- mvdcQR@mvdc@copula@dimension
  
  #independent variables
  indepVar <- c(1:nd)[-marginal]
  
  #compute u[1], ..., u[n-1] for the independent variables
  ui <- sapply(1:(nd-1), function (k) {
    n <- indepVar[k]
    cdf.margin <- copula:::asCall(paste0("p", mvdcQR@mvdc@margins[n]),
                                  mvdcQR@mvdc@paramMargins[[n]])
    eval(cdf.margin, list(x=x[k]))
  })
  
  #compute dependent variable
  ud <- NA
  udfun <- function (ud, ui, alpha, t) {t - cCopula(t(as.matrix(c(ui, ud))),
                                                    copula=mvdcQR@mvdc@copula,
                                                    indices=nd)}
  try(ud <- uniroot(udfun, interval=c(1-.99999999999999, .99999999999999),
                    ui=ui, alpha=alpha, t=p, tol=1e-5)$root, silent=FALSE)
  qdf.margin <- copula:::asCall(paste0("q", mvdcQR@mvdc@margins[marginal]),
                                mvdcQR@mvdc@paramMargins[[marginal]])
  x2 <- eval(qdf.margin, list(x=ud))
  x2
}


#Function for computing the conditional probability:
#P{ Y<=y | X(1)=x(1), ..., X(N)=x(n) }, where y is the dependent variable
#and x(1), ..., x(n) the independent variables.
condProbY <- function(copula, marginal, data) {
  #-copula is a fitted copula as returned by fitMvdc(). At the moment,
  # only Archimedean and elliptical copulas are accepted.
  #-marginal is an integer indicating which of the copula's marginals should
  # be treated as the dependent variable. For instance, assume we defined 
  # x1, x2, and x3 as marginals (in that order) in a multivariate copula.
  # If we subsequently want to treat x2 as the dependent variable, then we
  # set marginal=2. See Examples 1 and 2 below for details.
  #-data is a vector with numbers specifying at which values for x we
  # want to compute the conditional probability. 
  # The numbers in this vector should be in exactly the same order
  # as their corresponding marginals in the fitted copula.
  # For instance, assume again that we defined x1, x2, and x3 as
  # marginals (in that order) in a multivariate copula. Next, we treat
  # x2 as the dependent variable, and x1 and x3 as independent variables.
  # As a result, the order of the data points in our vector has to be (x1, x3).
  # See Examples 1 and 2 below for details.
  
  mvdcQR <- copula
  
  #number of dimensions
  nd <- mvdcQR@mvdc@copula@dimension
  
  #independent variables
  indepVar <- c(1:nd)[-marginal]
  
  #compute u[1], ..., u[n]
  u <- sapply(1:nd, function (k) {
    cdf.margin <- copula:::asCall(paste0("p", mvdcQR@mvdc@margins[k]),
                                  mvdcQR@mvdc@paramMargins[[k]])
    eval(cdf.margin, list(x=data[k]))
  })
  
  ui <- u[indepVar] #independent variables
  ud <- u[marginal] #dependent variable
  
  #compute the conditional probability for y given the independent variables
  cp <- cCopula(t(as.matrix(c(ui, ud))), copula=mvdcQR@mvdc@copula, indices=nd)
  cp
}





##EXAMPLE 1: multiple linear quantile regression

##Generate some artificial data using a Gaussian copula

#3-dimensional Gaussian copula
myCop <- normalCopula(param=c(.5, .4, -.3), dim=3, dispstr="un")

#Marginals:
#the first marginal is a normal distribution, the second marginal is
#a normal distribution, and the third marginal is also a normal distribution.
myMvdc <- mvdc(copula=myCop, c("norm", "norm", "norm"),
               list(list(mean=5, sd=.02),
                    list(mean=2, sd=3),
                    list(mean=10, sd=1.5)))

#Draw sample (this sample will subsequently be used as the artificial data)
ns <- 200 #sample size
xyzs <- rMvdc(ns, myMvdc)

#Visualize the data:
#we will label xyzs[,1] as x1 (this is the first marginal in our specified
#copula), xyzs[,2] as x2 (this is the second marginal in our copula), and
#xyzs[,3] as x3 (this is the third marginal in our copula).
scatterplot3d(xyzs, xlab="x1", ylab="x2", zlab="x3",
              highlight.3d=TRUE, box=FALSE)



##Fit a 3-dimensional Gaussian copula to the artificial data

#Obtain start values for fitting the copula
normMLE <- fitdistr(xyzs[,1], densfun="normal")
normMean1 <- normMLE$estimate[1]
normSD1 <- normMLE$estimate[2]

normMLE <- fitdistr(xyzs[,2], densfun="normal")
normMean2 <- normMLE$estimate[1]
normSD2 <- normMLE$estimate[2]

normMLE <- fitdistr(xyzs[,3], densfun="normal")
normMean3 <- normMLE$estimate[1]
normSD3 <- normMLE$estimate[2]

#Starting values for the copula parameters
tauEst <- sin(cor(xyzs, method="kendall")*pi/2)
tauEst <- tauEst[upper.tri(tauEst)]

#Fit a Gaussian copula to the artificial data
myGaussian <- mvdc(copula=normalCopula(param=c(1,1,1), dispstr="un", dim=3),
                   c("norm", "norm", "norm"),
                   paramMargins=list(list(mean=0, sd=1),
                                     list(mean=0, sd=1),
                                     list(mean=0, sd=1)))

start <- c(normMean1, normSD1, normMean2, normSD2, normMean3, normSD3, tauEst)

fitGaussian <- fitMvdc(xyzs, myGaussian, start=start)
fitGaussian



##Quantile regression: regress x3 on both x1 and x2
#We will treat x3 as the dependent variable (=y), and both
#x1 and x2 as independent variables.

#Fit a (straight) plane for the median
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=5)
x2points <- seq(min(xyzs[,2]), max(xyzs[,2]), length.out=5)
x1x2points <- expand.grid(x1points, x2points)
#Note that the order of the marginals in our fitted copula
#is x1, x2, x3. As a result, the order of the values in the
#vector for x has to be (x1, x2).
#Furthermore, we will treat x3 as the dependent variable, thus set marginal=3.
ypoints <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian, p=.5, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))

#Color the data points that lie above the plane for the median black, and
#points below the plane light gray.
cprobs <- sapply(1:nrow(xyzs), function (k)
  condProbY(copula=fitGaussian, marginal=3, data=xyzs[k,]))

colorPoint <- ifelse(cprobs < .5, gray(level=.6, alpha=.7),
                     gray(level=0, alpha=1))

#Plot the plane
qrplot <- scatterplot3d(xyzs, xlab="x1", ylab="x2", zlab="y",
                        angle=40, color=colorPoint, box=FALSE)
for(i in length(x1points):1)
  qrplot$points3d(rep(x1points[i], length(x2points)), x2points, ypoints[i,],
                  type="l", col="blue", lty=2)
for(i in length(x2points):1)
  qrplot$points3d(x1points, rep(x2points[i], length(x1points)), ypoints[,i],
                  type="l", col="blue", lty=2)

#Approximately 100 (out of 200) data points should lie above the
#the plane for the median.
sum(cprobs>=.5)



#Fit a quantile curve while varying the values for x1, and fixing x2
#to some value.
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=50) #values for x1
x2points <- 0 #fixed value for x2
x1x2points <- expand.grid(x1points, x2points)

#First compute true quantile curves for these data points
fitGaussian2 <- fitGaussian
fitGaussian2@mvdc@copula@parameters <- c(.5, .4, -.3)
fitGaussian2@mvdc@paramMargins[[1]]$mean <- 5
fitGaussian2@mvdc@paramMargins[[1]]$sd <- .02
fitGaussian2@mvdc@paramMargins[[2]]$mean <- 2
fitGaussian2@mvdc@paramMargins[[2]]$sd <- 3
fitGaussian2@mvdc@paramMargins[[3]]$mean <- 10
fitGaussian2@mvdc@paramMargins[[3]]$sd <- 1.5

#compute the true median
ypointsTmed <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian2, p=.5, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))
#compute the true .05th-quantile
ypointsT05 <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian2, p=.05, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))
#compute the true .95th-quantile
ypointsT95 <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian2, p=.95, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))

#Plot the results
plot(x1points, ypointsTmed, type="l", col="gray", xlab="x1", ylab="y",
     main=paste("x2 = ", x2points), ylim=c(min(ypointsT05), max(ypointsT95)))
lines(x1points, ypointsT05, lty=2, col="gray")
lines(x1points, ypointsT95, lty=2, col="gray")
#add a legend
legend("topleft", lty=c(1, 2, 1, 1, 1),
       col=c("black", "black", "gray", "blue", "red"),
       legend=c("Median", "90% prediction interval", "True quantile curves",
                "qrCopula function", "rq function"),
       bty="n", y.intersp=1.2, cex=.7)

#Fit quantiles curves using the qrCopula function:
#compute the median
ypointsmed <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian, p=.5, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))
#compute the .05th-quantile
ypoints05 <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian, p=.05, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))
#compute the .95th-quantile
ypoints95 <- matrix(sapply(1:nrow(x1x2points), function (k)
  qrCopula(copula=fitGaussian, p=.95, marginal=3, x=as.numeric(x1x2points[k,]))),
  nrow=length(x1points))
#add the results to the plot
lines(x1points, ypointsmed, lty=1, col="blue")
lines(x1points, ypoints05, lty=2, col="blue")
lines(x1points, ypoints95, lty=2, col="blue")

#Note that the model for our data is a linear quantile regression model.
#As a result, we may also fit the quantiles curves using the rq function
#in the quantreg package.
library(quantreg)
#fit a linear quantile regression model for the median
rqmodmed <- rq(y ~ x1 + x2, tau=.5,
               data=data.frame(y=xyzs[,3], x1=xyzs[,1], x2=xyzs[,2]))
#fit a linear quantile regression model for the .05th-quantile
rqmod05 <- rq(y ~ x1 + x2, tau=.05,
              data=data.frame(y=xyzs[,3], x1=xyzs[,1], x2=xyzs[,2]))
#fit a linear quantile regression model for the .95th-quantile
rqmod95 <- rq(y ~ x1 + x2, tau=.95,
              data=data.frame(y=xyzs[,3], x1=xyzs[,1], x2=xyzs[,2]))

#compute for these 3 linear quantile regression models the
#corresponding quantile curves
ypointsmed <- predict(rqmodmed, newdata=data.frame(x1=x1points, x2=x2points))
ypoints05 <- predict(rqmod05, newdata=data.frame(x1=x1points, x2=x2points))
ypoints95 <- predict(rqmod95, newdata=data.frame(x1=x1points, x2=x2points))
#add the results to the plot
lines(x1points, ypointsmed, lty=1, col="red")
lines(x1points, ypoints05, lty=2, col="red")
lines(x1points, ypoints95, lty=2, col="red")





##EXAMPLE 2: nonlinear quantile regression

##Generate some artificial data using a Gumbel copula

#3-dimensional Gumbel copula
myCop <- gumbelCopula(param=3, dim=3)

#Marginals:
#the first marginal is a normal distribution, the second marginal a
#gamma distribution, and the third marginal a Weibull distribution
myMvdc <- mvdc(copula=myCop, c("norm", "gamma", "weibull"),
               list(list(mean=5, sd=.02),
                    list(shape=1.2, scale=3),
                    list(shape=.3, scale=5)))

#Draw sample (this sample will subsequently be used as the artificial data)
ns <- 200 #sample size
xyzs <- rMvdc(ns, myMvdc)

#Visualize the data:
#we will label xyzs[,1] as x1 (this is the first marginal in our specified
#copula), xyzs[,2] as x2 (this is the second marginal in our copula), and
#xyzs[,3] as x3 (this is the third marginal in our copula).
scatterplot3d(x=xyzs[,1], y=xyzs[,3], z=xyzs[,2],
              xlab="x1", ylab="x3", zlab="x2",
              highlight.3d=TRUE)



##Fit a 3-dimesional Gumbel copula to the artificial data

#Obtain start values for fitting the copula
normMLE <- fitdistr(xyzs[,1], densfun="normal")
normMean1 <- normMLE$estimate[1]
normSD1 <- normMLE$estimate[2]

gammaMLE <- fitdistr(xyzs[,2], densfun="gamma")
gammaShape2 <- gammaMLE$estimate[1]
gammaScale2 <- 1/gammaMLE$estimate[2]

weibullMLE <- fitdistr(xyzs[,3], densfun="weibull")
weibullShape3 <- weibullMLE$estimate[1]
weibullScale3 <- weibullMLE$estimate[2]

#Starting values for the copula parameter
tauEst <- cor(xyzs[,1], xyzs[,2], method="kendall")
thetaGumbel <- 1 / (1-tauEst)

#Fit a Gumbel copula to the artificial data
myGumbel <- mvdc(copula=gumbelCopula(param=1.5, dim=3),
                 c("norm", "gamma", "weibull"),
                 paramMargins=list(list(mean=0, sd=1),
                                   list(shape=1, scale=1),
                                   list(shape=1, scale=1)))

start <- c(normMean1, normSD1, gammaShape2, gammaScale2,
           weibullShape3, weibullScale3, thetaGumbel)

fitGumbel <- fitMvdc(xyzs, myGumbel, start=start)
fitGumbel



##Quantile regression: regress x2 on both x1 and x3
#We will treat x2 as the dependent variable (=y), and both
#x1 and x3 as independent variables.

#Fit a curved plane for the median
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=15)
x3points <- seq(min(xyzs[,3]), max(xyzs[,3]), length.out=10)
x1x3points <- expand.grid(x1points, x3points)
#Note that the order of the marginals in our fitted copula
#is x1, x2, x3. As a result, the order of the values in the
#vector for x has to be (x1, x3).
#Furthermore, we will treat x2 as the dependent variable, thus marginal=2.
ypoints <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel, p=.5, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))

#Color the data points that lie above the plane for the median black, and
#points below the plane light gray.
cprobs <- sapply(1:nrow(xyzs), function (k)
  condProbY(copula=fitGumbel, marginal=2, data=xyzs[k,]))

colorPoint <- ifelse(cprobs < .5, gray(level=.6, alpha=.7),
                     gray(level=0, alpha=1))

#Plot the plane
qrplot <- scatterplot3d(x=xyzs[,1], y=xyzs[,3], z=xyzs[,2], angle=60,
                        xlab="x1", ylab="x3", zlab="y",
                        color=colorPoint, scale.y=.5, box=FALSE)
for(i in length(x1points):1)
  qrplot$points3d(rep(x1points[i], length(x3points)), x3points, ypoints[i,],
                  type="l", col="blue", lty=2)
for(i in length(x3points):1)
  qrplot$points3d(x1points, rep(x3points[i], length(x1points)), ypoints[,i],
                  type="l", col="blue", lty=2)


#Approximately 100 (out of 200) data points should lie above the
#the plane for the median
sum(cprobs>=.5)


#Fit a quantile curve while varying the values for x1, and fixing x3
#to some value.
x1points <- seq(min(xyzs[,1]), max(xyzs[,1]), length.out=50) #values for x1
x3points <- 500 #fixed value for x3
x1x3points <- expand.grid(x1points, x3points)

#First compute true quantile curves for these data points
fitGumbel2 <- fitGumbel
fitGumbel2@mvdc@copula@parameters <- 3
fitGumbel2@mvdc@paramMargins[[1]]$mean <- 5
fitGumbel2@mvdc@paramMargins[[1]]$sd <- .02
fitGumbel2@mvdc@paramMargins[[2]]$shape <- 1.2
fitGumbel2@mvdc@paramMargins[[2]]$scale <- 3
fitGumbel2@mvdc@paramMargins[[3]]$shape <- .3
fitGumbel2@mvdc@paramMargins[[3]]$scale <- 5

#compute the true median
ypointsTmed <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel2, p=.5, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))
#compute the true .05th-quantile
ypointsT05 <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel2, p=.05, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))
#compute the true .95th-quantile
ypointsT95 <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel2, p=.95, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))

#Plot the results
plot(x1points, ypointsTmed, type="l", col="gray", xlab="x1", ylab="y",
     main=paste("x3 = ", x3points), ylim=c(min(ypointsT05), max(ypointsT95)))
lines(x1points, ypointsT05, lty=2, col="gray")
lines(x1points, ypointsT95, lty=2, col="gray")
#add a legend
legend("topleft", lty=c(1, 2, 1, 1, 1),
       col=c("black", "black", "gray", "blue"),
       legend=c("Median", "90% prediction interval", "True quantile curves",
                "Fitted quantile curves"),
       bty="n", y.intersp=1.2, cex=.7)


#Fit quantiles curves using the qrCopula function:
#compute the median
ypointsmed <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel, p=.5, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))
#compute the .05th-quantile
ypoints05 <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel, p=.05, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))
#compute the .95th-quantile
ypoints95 <- matrix(sapply(1:nrow(x1x3points), function (k)
  qrCopula(copula=fitGumbel, p=.95, marginal=2, x=as.numeric(x1x3points[k,]))),
  nrow=length(x1points))
#add the results to the plot
lines(x1points, ypointsmed, lty=1, col="blue")
lines(x1points, ypoints05, lty=2, col="blue")
lines(x1points, ypoints95, lty=2, col="blue")