R code for constructing confidence areas around the level curves of bivariate copulas

In his 2013 paper called An uncertain journey around the tails of multivariate hydrological distributions Serinaldi discusses the problem of constructing confidence areas for the level curves of bivariate copulas. A level curve at a specific p-value (also referred to as a p-level curve) may be used for estimating the p-th quantiles.

ci-level-curve-copula

The R code below implements a nonparametric bootstrap method for computing such confidence areas for p-level curves.
The code performs the following 7 steps:

  1. Set the number of bootstrap samples to a sufficient large number Nb.
  2. Generate Nb bootstrap samples by sampling with replacement from the observed x-y pairs. The size of all Nb bootstrap samples should be identical to the size of the original sample of observed x-y pairs.
  3. Fit Nb separate copulas [1..Nb] to the generated Nb bootstrap samples.
  4. Generate a random integer number in [1..Nb], and select the corresponding copula from the [1..Nb] fitted copulas.
  5. Simulate a point along the level curve (at a specific p-level, for instance, p=.95) of the copula selected in step 4. Such a point is also referred to as the p-th quantile. Points along a level curve are simulated using an acceptance-rejection algorithm.
  6. Repeat steps 4 and 5 Ns times.
  7. Using these Ns simulated points, construct confidence areas around the p-level curve. These confidence areas reflect the uncertainty associated with the p-level curves. In other words, these confidence areas depict the reliability of the p-th quantile estimates.

The implemented nonparametric bootstrap method resembles the nonparametric bootstrap algorithm as described by Dung and colleagues (2015, p. 706).

Notice that the R code below demonstrates how to construct confidence intervals for a bivariate Gumbel copula with normal distributions as marginals. However, the code can be easily adapted to implement other copulas (Clayton copula, etc.) or marginals (exponential distribution, Weibull distribution, etc.) as well.

References
Dung, N.V, Merz, B., Bárdossy, A., and Apel, H. (2015), Handling uncertainty in bivariate quantile estimation: An application to flood hazard analysis in the Mekong Delta, Journal of Hydrology, 527, 704-717.

Serinaldi, F. (2013), An uncertain journey around the tails of multivariate hydrological distributions, Water Resources Research, 49, 6527–6547.

 

Suggestions and/or questions? Please contact Stefan Gelissen (email: info at datall-analyse.nl).
See this page for an overview of all of Stefan’s
R code blog posts.

R code (last update: 2016-10-05)

library(copula)
library(pracma)

##Generate some artificial data using a Gumbel copula

#bivariate Gumbel copula
myCop <- gumbelCopula(param=3, dim=2)

#Marginals:
#the first marginal is a normal distribution, and
#the second marginal is also a normal distribution
myMvdc <- mvdc(copula=myCop, c("norm", "norm"),
               list(list(mean=25, sd=.02), list(mean=15, sd=.08)))

#Draw sample (this sample will subsequently be used as the artificial data)
ns <- 50 #sample size
xy <- rMvdc(ns, myMvdc)

#Inspect data
plot(xy[,1], xy[,2], xlab="x", ylab="y")


##Fit a bivariate Gumbel copula to the artificial data

#Obtain start values for fitting the copula
normMean1 <- mean(xy[,1])
nNorm1 <- length(xy[,1])
sdNorm1 <- sqrt((nNorm1-1)/nNorm1)*sd(xy[,1]) #biased MLE for the sd

normMean2 <- mean(xy[,2])
nNorm2 <- length(xy[,2])
sdNorm2 <- sqrt((nNorm2-1)/nNorm2)*sd(xy[,2]) #biased MLE for the sd

#Fit Gumbel copula to artificial data
myGumbel <- mvdc(copula=gumbelCopula(param=1.5, dim=2), c("norm", "norm"),
                 list(list(mean=0, sd=1), list(mean=0, sd=1)))

#Starting value for the copula parameter
tauEst <- cor(xy[,1], xy[,2], method="kendall")
thetaGumbel <- 1 / (1-tauEst)

start <- c(normMean1, sdNorm1, normMean2, sdNorm2, thetaGumbel)

fitGumbel <- fitMvdc(xy, myGumbel, start=start)
fitGumbel

myMvdcFit <- mvdc(copula=gumbelCopula(param=fitGumbel@estimate[5], dim=2),
                  c("norm", "norm"),
                  list(list(mean=fitGumbel@estimate[1],
                            sd=fitGumbel@estimate[2]),
                       list(mean=fitGumbel@estimate[3],
                            sd=fitGumbel@estimate[4])))

#Construct plot with p-level curves.
#Notice that our specific Gumbel copula has convex level curves.
contour(myMvdcFit, pMvdc, xlim=c(24.95, 25.06), ylim=c(14.8, 15.3),
        xlab="x", ylab="y")



##Perform steps 2 and 3 of the bootstrap algorithm

#Function for 1) sampling with replacement over the observed pairs,
#and 2) fitting a Gumbel copula to the bootstrap sample.
rfGumbel <- function (data, Gcopula, parameters) {
  #-data is a matrix containing the observed pairs
  #-Gcopula specifies the Gumbel copula to be fitted
  #-parameters is a vector specifying the starting values to be used
  # for fitting the copula
  nobs <- nrow(data)
  #sample with replacement
  sobs <- sample(1:nobs, replace=TRUE)
  dataBoot <- data[sobs,] #bootstrap sample
  #fit copula to bootstrap sample
  fitGumbel <- fitMvdc(dataBoot, Gcopula, start=parameters)
  fitGumbel
}

#Fit Nb separate copulas [1..Nb] to the Nb separate bootstrap samples.
#Fitting these Nb copulas may be computationally intensive.
#(note: for simplicity we will set Nb to 100)
copbs <- replicate(100, rfGumbel(data=xy, Gcopula=myGumbel, parameters=start))



##Perform steps 4 and 5 of the bootstrap algorithm

#Generic function for computing the quantiles of a bivariate copula
qMvdc <- function (p, qmargin1, start, xyMvdc) {
  #-p is a number specifying the probability
  #-qmargin1 is the quantile value for the first marginal
  #-start is a point near the quantile value for the second marginal
  #-xyMvdc is a fitted bivariate copula
  
  #function for computing the corresponding quantile value for the second marginal,
  #given some quantile value for the first marginal
  qPair2 <- function (i) p - pMvdc(c(qmargin1, i), xyMvdc)
  #given the quantile value for the first marginal, compute the corresponding
  #quantile value for the second marginal
  qmargin2 <- fzero(qPair2, x=start)$x
  #return quantile pair
  c(x1=qmargin1, x2=qmargin2)
}

#Try the generic function for computing the quantiles of a bivariate copula
qpair <- qMvdc(p=.75, qmargin1=25.2, start=20, xyMvdc=myMvdcFit)
qpair #result (=quantile pair)
#Compute the corresponding probability of the obtained quantile pair
pMvdc(qpair, myMvdcFit)



#Function for sampling from a p-level curve. The sampling is done using
#an acceptance-rejection algorithm. The implemented acceptance-rejection
#algorithm is a modified version of the algorithm as described by
#Serinaldi (2013, Appendix B).
#CAUTION: this is a tailor made function that will be used for a bivariate Gumbel
#copula with normal distributions as marginals.
AccRejAlgo <- function (p, xyMvdc) {
  #-p is a number specifying the probability
  #-xyMvdc is a bivariate copula
  
  #1. The highest density of a p-level curve is found at the intersection
  #of the curve describing full dependence and the respective
  #p-level curve (this holds, at least, for symmetrical copulas).
  #See Dung et al. (2015, Section 3.5, pp. 708-709), and also
  #Volpi, E., and Fiori, A. (2012), Design event selection in bivariate
  #hydrological frequency analysis, Hydrological Sciences Journal, 57, 1506-1515.
      
  #At this first step we will compute our desired point of intersection
  #between the level curve and the curve of full dependence.
  #The latter curve is the curve where the cumulative probability of the
  #first marginal distribution is identical to the cumulative probability
  #of the second marginal distribution (or, similarly, Fx=Fy, where Fx
  #is the cumulative probability of the first marginal, and Fy the
  #cumulative probability of the second marginal).
  #Note that for our specific Gumbel copula, both marginals are normal
  #distributions, and therefore we will employ qnorm for computing quantiles.
  IntersectionCFD <- function (u) {
    xi <- qnorm(u, mean=xyMvdc@paramMargins[[1]]$mean,
                sd=xyMvdc@paramMargins[[1]]$sd)
    yi <- qnorm(u, mean=xyMvdc@paramMargins[[2]]$mean,
                sd=xyMvdc@paramMargins[[2]]$sd)
    p - pMvdc(c(xi, yi), xyMvdc)}
  
  u <- fzero(IntersectionCFD, x=c(1-.99999999999999, .99999999999999))$x
  
  #x-value at point of intersection
  xint <- qnorm(u, mean=xyMvdc@paramMargins[[1]]$mean,
                sd=xyMvdc@paramMargins[[1]]$sd)
  #corresponding y-value
  yint <- qnorm(u, mean=xyMvdc@paramMargins[[2]]$mean,
                sd=xyMvdc@paramMargins[[2]]$sd)
  
  #2. Compute the copula density at the intersection. This computed density
  #corresponds to the highest (maximal) density along the p-level curve.
  dmax <- dMvdc(x=c(xint, yint), mvdc=xyMvdc)
  
  #3. Sample from the p-level curve using an acceptance-rejection algorithm.
  success <- FALSE
  xy <- NA

  #specify range of quantiles for the second marginal
  y.0001 <- qnorm(1-.99999999999999, mean=xyMvdc@paramMargins[[2]]$mean,
                  sd=xyMvdc@paramMargins[[2]]$sd)
  y99.999 <- qnorm(.99999999999999, mean=xyMvdc@paramMargins[[2]]$mean,
                   sd=xyMvdc@paramMargins[[2]]$sd)
  
  while (!success) {
    #1. Simulate a realization from the first marginal distribution, 
    #by using a random number from a uniform distribution defined in [p, 1].
    #For our specific Gumbel copula, the first marginal is a normal distribution,
    #and as a consequence we will employ the qnorm function.
    #Also note that our specific Gumbel copula has convex level curves. However,
    #if a copula has concave level curves, then sample a random number from
    #a uniform distribution defined in [0, p].
    qmargins1 <- qnorm(runif(1, min=p, max=.99999999999999),
                       mean=xyMvdc@paramMargins[[1]]$mean,
                       sd=xyMvdc@paramMargins[[1]]$sd)
    #2. Compute the corresponding value for the second marginal distribution,
    #such that the quantile pair (x, y) is located at the p-level curve.
    qpair <- qMvdc(p=p, qmargin1=qmargins1, start=c(y.0001, y99.999), xyMvdc=xyMvdc)
    #3. Compute the copula density c(x, y) of the quantile pair.
    dpair <- dMvdc(x=qpair, mvdc=xyMvdc)
    #4. Draw a random number from a uniform distribution defined in [0, 1].
    U <- runif(1)
    #5. Accept the pair for which holds: xi <= c(x, y) / max( c(x, y) )
    if (U <= dpair/dmax) {xy <- qpair; success <- TRUE}
  }
  
  #return the accepted quantile pair
  xy
}



##Function for performing steps 4 and 5 of the bootstrap algorithm
pairXY <- function (copulas, p) {
  #-copulas is a list containing the Nb fitted copulas
  #-p is a number specifying the p-level
  
  #randomly sample from fitted copulas
  rs <- sample(1:length(copulas), 1)
  rscopbs <- copulas[[rs]] #randomly selected copula
  
  #specify the selected copula with normal distributions as marginals
  myMvdc <- mvdc(copula=gumbelCopula(param=rscopbs@estimate[5], dim=2),
                 margins=c("norm", "norm"),
                 paramMargins=list(list(mean=rscopbs@estimate[1],
                                        sd=rscopbs@estimate[2]),
                                   list(mean=rscopbs@estimate[3],
                                        sd=rscopbs@estimate[4])))
  
  #For our selected Gumbel copula above (=myMvdc), simulate an x-y pair
  #at a given p-level.
  xysim <- AccRejAlgo(p=p, xyMvdc=myMvdc)
  #return the pair
  xysim
}

#Perform steps 4 and 5 of the bootstrap algorithm Ns times (here Ns=1000)
simPairXY <- replicate(1000, pairXY(copulas=copbs, p=.75), simplify=FALSE)

#Plot the Ns simulated x-y pairs
plot(matrix(unlist(lapply(simPairXY, function(x) x)), ncol=2, byrow=TRUE),
     cex=.2, xlab="x", ylab="y", main="p=.75")

#Plot the confidence area for the p-level curve
library(emdbook)
library(RColorBrewer)
nlev <- 4
cols <- rev(brewer.pal(nlev, "RdYlBu"))
kdData <- matrix(unlist(lapply(simPairXY, function(x) x)), ncol=2, byrow=TRUE)
plot(xy[,1], xy[,2], xlab="x", ylab="y", pch=3, cex=.6, col="gray",
     main="Level curve (at p=.75)")
HPDregionplot(kdData, prob=c(.95, .75, .5, .25), col=cols,
              lwd=1, add=TRUE)
contour(myMvdcFit, pMvdc, xlim=c(24.5, 25.04), ylim=c(14.5, 15.2), levels=.75,
        labcex=.8, add=TRUE)
legend("topleft", legend=c("95%", "75%", "50%", "25%"),
       col=cols, lty=c(1,1,1), lwd=c(1,1,1),
       bty="n", y.intersp=1.2, cex=.7)