Extreme value analysis in engineering (smallest extreme values)

Extreme value analysis deals with extreme events. In engineering, extreme value analysis is used to estimate the minimum strength of materials, the minimum life time of a component, the minimum surrounding/outside temperature, or the minimum load at which a crack will develop, just to name a few.

The following code may be used for fitting extreme value models in R. Note, however, that the R code is restricted to the analysis of minima (or smallest values). For maxima (or largest values) see this R code.

The code focuses on an application of extreme value analysis in the field of engineering, namely obtaining the minimum breaking strength of a chain.

The code shows how to fit a Gumbel and Weibull distribution for smallest values to the breaking strength data. Subsequently, the same data will be fitted with a Generalized Extreme Value (or GEV) distribution for minima. The three extreme value distributions for smallest values (i.e., Gumbel, Weibull, and Fréchet distribution) are all family members of the GEV distribution. The code will demonstrate that a GEV model for smallest values fits either the Gumbel, Weibull, or Fréchet distribution for minima.

A comprehensive introduction to extreme value analysis is given by Castillo et al. in their 2005 book Extreme value and related models with applications in engineering and science.
Another introduction to extreme value analysis is given by Coles in his 2001 book An introduction to statistical modeling of extreme values.

Suggestions and/or questions? Please contact Stefan Gelissen (email: info at datall-analyse.nl).
See this page for an overview of all of Stefan’s R code blog posts.

R code

library(survival)
library(SPREDA)
library(extRemes)


#The R functions for performing the extreme value analysis (for minima)
#can be downloaded from: 
source("http://www.datall-analyse.nl/R/eva_min.R")


##data
#data from Castillo et al., Table 1.10, p. 12-13
#breaking strength (in kg) for 20 chains
evobs <- scan("http://personales.unican.es/castie/extremes/Table1-10.txt")

#explore data visually
hist(evobs)



##MLEs of parameters Gumbel and Weibull model (smallest values)

#Gumbel
gumbelmod <- survreg(Surv(evobs) ~ 1, dist="extreme")
summary(gumbelmod)
#extract MLEs (these are needed for the remaining part of the analysis)
muG <- coef(gumbelmod)
sigmaG <- gumbelmod$scale

#Weibull
weibullmod <- survreg(Surv(evobs) ~ 1, dist="weibull")
summary(weibullmod)
#extract MLEs (these are needed for the remaining part of the analysis)
muW <- coef(weibullmod)
sigmaW <- weibullmod$scale



##probability plot for smallest values (=minima)
#probability plot for Gumbel and Weibull distribution (side by side)
par(mfrow=c(1, 2))
probplot(values=evobs, model=gumbelmod, varname="Strength (kg)",
         alpha=1-.95, dist="gumbel")
probplot(values=evobs, model=weibullmod, varname="Strength (kg)",
         alpha=1-.95, dist="weibull")
par(mfrow=c(1, 1))



##cumulative probability plot for smallest values (including return period)
#note: for the breaking strength data, a return period of 10 means that out
#of 10 chains, only 1 is (on average) expected to have a breaking strength of
#less than muG+qsev(1/10)*sigmaG=59.3 kg (in case of a Gumbel distribution),
#or exp(muW+qsev(1/10)*sigmaW)=62.9 kg (in case of a Weibull distribution)
#(this expected breaking strength is also called the return level)

#Gumbel distribution
cprobplot(values=evobs, model=gumbelmod, varname="Strength (kg)",
          alpha=1-.95, dist="gumbel")
#Weibull distribution
cprobplot(values=evobs, model=weibullmod, varname="Strength (kg)",
          alpha=1-.95, dist="weibull")



##diagnostics
#q-q and p-p plot for Gumbel and Weibull distribution
par(mfrow=c(2, 2))
QQplot(values=evobs, mu=muG, sigma=sigmaG, dist="gumbel")
QQplot(values=evobs, mu=muW, sigma=sigmaW, dist="weibull")
PPplot(values=evobs, mu=muG, sigma=sigmaG, dist="gumbel")
PPplot(values=evobs, mu=muW, sigma=sigmaW, dist="weibull")
par(mfrow=c(1, 1))



##fit GEV distribution (smallest values) to data
#note: the GEV model for minima is fitted with the GEV model for maxima
#therefore, the signs of the data are reversed (evobs -> -1*evobs)
#the MLEs of the GEV for maxima are identical to those of the GEV for minima,
#apart from the sign of mu (thus: muGev [maxima] becomes -muGev [minima])
#see Castillo et al., p. 226; Coles, p. 52-53
gevmod <- fevd(x=-1*evobs, units="Strength (kg)", period.basis="# of chain",
               type="GEV", method="MLE")
gevmod
#extract MLEs (these are needed for the remaining part of the analysis)
muGev <- gevmod$results$par[1]
sigmaGev <- gevmod$results$par[2]
xiGev <- gevmod$results$par[3]



##confidence intervals for MLEs of GEV model
#normal-approximation confidence intervals
#remember: reverse the sign of mu (location -> -location)
ci(gevmod, alpha=.05, type="parameter", method="normal")

#likelihood based confidence intervals
#mu (=location)
#remember: reverse the sign of mu (location -> -location)
ci(gevmod, alpha=.05, which.par=1, type="parameter", method="proflik",
   xrange=c(-110,-80), nint=50, verbose=TRUE, main="GEV - smallest values")
#sigma (=scale)
ci(gevmod, alpha=.05, which.par=2, type="parameter", method="proflik",
   xrange=c(10,30), nint=50, verbose=TRUE, main="GEV - smallest values")
#xi (=shape)
ci(gevmod, alpha=.05, which.par=3, type="parameter", method="proflik",
   xrange=c(-.5,.2), nint=50, verbose=TRUE, main="GEV - smallest values",
   ylim=c(-90,-80)) #change range of the y-axis



##relationship between GEV model and Gumbel/Weibull/Fréchet distribution
#obtain parameters of Gumbel, Weibull, and Fréchet model from GEV model
#see Castillo et al., p. 200

#note: sign of muGev is reversed

#xi = 0: GEV model fitted a Gumbel distribution
parmsGumbel <- c(-muGev, sigmaGev)
names(parmsGumbel) <- c("Mu", "Sigma")
parmsGumbel #GEV model's estimates of the Gumbel distribution
#compare with Gumbel model
survreg(Surv(evobs) ~ 1, dist="extreme")

#xi < 0: GEV model fitted a Weibull distribution
parmsWeibull <- c(log(sigmaGev/-xiGev), -xiGev, -muGev+sigmaGev/xiGev)
names(parmsWeibull) <- c("Mu", "Sigma", "Gamma")
parmsWeibull #GEV model's estimates of the Weibull distribution
#compare with Weibull model
gamma <- parmsWeibull[3] #threshold parameter of Weibull distribution
survreg(Surv(evobs-gamma) ~ 1, dist="weibull")

#xi > 0: GEV model fitted a Fréchet distribution
parmsFrechet <- c(log(sigmaGev/xiGev), xiGev, -muGev+sigmaGev/xiGev)
names(parmsFrechet) <- c("Mu", "Sigma", "Gamma")
parmsFrechet #GEV model's estimates of the Fréchet distribution
#compare with Fréchet model
gamma <- parmsFrechet[3] #threshold parameter of Fréchet distribution
Lifedata.MLE(Surv(gamma-evobs) ~ 1, dist="frechet")



##diagnostics
#histogram of the data together with the density of the fitted GEV distribution
plot(gevmod, "hist", ylim=c(0,0.1), col="gray", main="GEV - smallest values")
#density plot of the data together with the density of the fitted GEV distribution
plot(gevmod, "density", ylim=c(0,0.1), main="GEV - smallest values")

#q-q and p-p plot for GEV (here: compare with Gumbel distribution)
par(mfrow=c(2, 2))
QQplot(values=evobs, mu=muGev, sigma=sigmaGev, xi=xiGev, dist="gev")
QQplot(values=evobs, mu=muG, sigma=sigmaG, dist="gumbel")
PPplot(values=evobs, mu=muGev, sigma=sigmaGev, xi=xiGev, dist="gev")
PPplot(values=evobs, mu=muG, sigma=sigmaG, dist="gumbel")
par(mfrow=c(1, 1))



#probability plot for GEV distribution (here: compare with Gumbel distribution)
#note: the GEV model may display concave/convex behavior in the lower tail
#this concave/convex behavior depends on the xi-parameter of the GEV model
#- if this parameter differs from zero, then the GEV model fitted
# either the Weibull distribution (concave lower tail), or the Fréchet
# distribution (convex lower tail)
#- if this xi-parameter is very close to zero, then the GEV model fitted a
# Gumbel distribution, and no concave/convex lower tail will show up
par(mfrow=c(1, 2))
probplot(values=evobs, model=gevmod, varname="Strength (kg)",
         alpha=1-.95, dist="gev")
probplot(values=evobs, model=gumbelmod, varname="Strength (kg)",
         alpha=1-.95, dist="gumbel")
par(mfrow=c(1, 1))

#compare GEV and Gumbel model with likelihood-ratio test
gummod <- fevd(x=-1*evobs, units="Strength (kg)", type="Gumbel", method="MLE")
lr.test(gevmod, gummod)



##cumulative probability plot for GEV model (including return period)
#GEV distribution (smallest values)
cprobplot(values=evobs, model=gevmod, varname="Strength (kg)",
          alpha=1-.95, dist="gev")
#return period versus return level
plotRL(model=gevmod, alpha=1-.95)



##likelihood based intervals for return levels (GEV model)
#in case of the GEV model, the normal-approximation confidence intervals for the
#return level can be unrealistically wide, especially in the tails
#(see, for instance, the probability and cumulative probability plot of the GEV model)
#likelihood based intervals, on the other hand, yield more realistic intervals

#specify cumulative probability
cumProb <- .01
returnPeriod <- 1/cumProb #return period = 1/cumulative probability

#normal-approximation confidence interval
ciRetperN <- ci(gevmod, type="return.level", return.period=returnPeriod,
                method="normal", alpha=.05)
#point estimate
-as.vector(ciRetperN[2])
#confidence interval
-as.vector(c(ciRetperN[3], ciRetperN[1]))

#likelihood based confidence interval
ciRetperL <- ci(gevmod, type="return.level", return.period=returnPeriod,
                main="GEV - largest values",
                method="proflik", alpha=.05, nint=100,
                xrange=c(ciRetperN[1], 0), verbose=TRUE)
#confidence interval
-as.vector(c(ciRetperL[3], ciRetperL[1]))
#note: a negative value for breaking strength is physically unrealistic
#hence, the lower limit for the return level was set to zero
#(see xrange in ci function above)