R code for forecasting with the Ensemble Kalman Filter

In one of my previous blog posts I demonstrated how to implement an Ensemble Kalman Filter (EnKF) in R.
In this post I will demonstrate how to predict future system states and observations with the
EnKF.

Ensemble Kalman Filter Prediction Interval

Suggestions and/or questions? Please contact Stefan Gelissen (email: info at datall-analyse.nl).
See this page for an overview of all of Stefan’s
R code blog posts.

R code

library(mvtnorm)


#The R functions for the Ensemble Kalman filter (=EnKF) and
#forecasting (=EnKFforecast) can be downloaded from: 
source("http://www.datall-analyse.nl/R/EnKF.R")
#Have a look at the functions EnKF and EnKFforecast, and notice that at the
#beginning of the functions' scripts you will find a description of
#the functions' arguments.
EnKF
EnKFforecast




##EXAMPLE

#In the following example, the EnKF will be used to simulate heat
#conduction in a one-dimensional bar. The heat conduction in the
#bar is governed by the partial differential equation (PDE):
# d T(x,t) / d t = k * d^2 T(x,t) / d x^2 + u(x,t) + w(x,t),
#where T(x,t) is the temperature at position x and time t, k the heat conduction
#coefficient, u(x,t) an external heat source at position x and time t,
#and w(x,t) represents Gaussian noise at position x and time t.
#We will employ a numerical method (more specifically, the "explicit method")
#for solving the above PDE. In applying this numerical method, we will devide
#the bar into 100 equally spaced nodes, which yields 100 temperature states.
#We will assume that the boundary conditions are T(0,t)=200 and T(L,t)=150 (where
#L is the length of the bar). The intial condition is set to 100 for the interior
#nodes, and 200 and 150 for the nodes at x=0 and x=L, respectively.
#In this example we will apply two external heat sources. The two heat sources
#act on the one-dimensional at nodes 33 and 67.
#These two heat sources are active from time steps 1 to 400, while they are
#turned off at time step 401 till the end of our heat conduction experiment.
#Finally, the process noise (covariance), w(x,t), is set
#to 1e-2*I (where I is a 100x100 identity matrix).
#
#Measurements of the temperature are taken at nodes 10, 20, ..., 90. The
#noise (covariance) of these measurements is 1*I (where I is
#a 9x9 identity matrix).



##TRUE STATES

#Generate the true states T(x,t).
dt <- 1/500 #time step size used in the explicit method
tt <- 100 #upper bound of time window (that is, max number of time steps)
st <- seq(0, tt, by=dt) #lower time bounds of the integration intervals
ns <- length(st) #number of integrations
nc <- 100 #number of nodes
x <- matrix(0, ncol=nc, nrow=ns) #specify matrix for states
dx <- 1 #spacing between nodes
x[1,] <- c(200, rep(100, nc-2), 150) + rnorm(nc, 0, 1) #initial conditions (at t=0)

#Specify noises (variances)
wk <- 1e-2 #process noise
vk <- 1 #measurement noise

#Parameters in the PDE
k <- 5 #heat conduction coefficient

#State transition matrix
A <- matrix(0, ncol=nc, nrow=nc)

#Entries of the interior nodes in the transition matrix
for(i in 2:(nc-1)) {
  A[i, (i-1):(i+1)] <- c(1, -2, 1)
}

#Taking into account the spacing between the nodes (=dx) and
#the heat conduction coefficient (=k)
A <- (dx)^2*k*A

#The explicit method yields stable and convergent results if (k*dt)/(dx)^2 < .5
(k*dt)/(dx)^2

#Control input matrix
B <- matrix(0, ncol=2, nrow=100)
B[33, 1] <- B[67, 2] <- 1 #heat sources act at nodes 33 and 67

#Heat source
timeHS <- 400 #time steps during which the heat sources act on the bar
hs <- c(0, c(rep(1, timeHS), rep(0, length(st)-timeHS)))

#Simulate true states
for (i in 2:ns) {
  #states
  x[i, ] <- x[i-1, ] + (A%*%x[i-1, ])*dt +
    B%*%c(2*hs[i], .5*hs[i]) +
    rnorm(nc, 0, sqrt(wk))}



#Plot of simulated states at specific times.
#The times are computed as k*dt, where k is the time step index
#and dt the size of the time steps.
#However, note that x[1, ] represents the temperature states at t=0. Thus, for
#t=5 we need to select x[501, ].
par(mfrow=c(2, 2))
plot(1:nc, x[501,], type="l", xlab="Node, i", ylab="Temperature (i)",
     main="t=5")
plot(1:nc, x[1001,], type="l", xlab="Node, i", ylab="Temperature (i)",
     main="t=10")
plot(1:nc, x[5001,], type="l", xlab="Node, i", ylab="Temperature (i)",
     main="t=50")
plot(1:nc, x[10001,], type="l", xlab="Node, i", ylab="Temperature (i)",
     main="t=100")
par(mfrow=c(1, 1))



##MEASUREMENTS

#We will create measurements for the first 510 time steps.
#More specifically, we assume that we take measurements from t=.01 to t=5.1.
#Again, let k be the time step index and dt the size of the time steps.
#In that way, we may compute the time (t) as k*dt. Thus, for k=1 and dt=1/100
#the corresponding time is 1*(1/100)=.01

#Extract the temperature states for t=.01 to t=5.1.
#Note that x[1, ] represents the temperature states at t=0. Thus, for t=.01
#to t=5.1 we need to select x[2:511, ]
x <- x[2:511,]
#Select temperatures at nodes 10, 20, ..., 90
np <- length(seq(10,90,10)) #number of nodes
dataEx1 <- x[, seq(10, 90, 10)]

#Add measurement noise to the selected states
for (i in 1:510) {
  dataEx1[i, ] <- dataEx1[i, ] + rnorm(np, 0, sqrt(vk))}

#Plot the generated measurements at t=5.1.
#Remember that the times are computed as k*dt, where k is the time step index
#and dt the size of the time steps.
plot(1:nc, x[510,], type="l", xlab="Node, i", ylab="Temperature (i)",
     col=gray(level=.8))
points(seq(10,90,10), dataEx1[510,], col="darkgreen", cex=.5)
legend("topright", pch=c(NA, 1), lty=c(1, NA),
       col=c(gray(level=.8), "darkgreen"),
       legend=c("true", "measured"),
       bty="n", y.intersp=1.2, cex=.7)

#Of these 510 steps, we will use the first 500 for fitting the EnKF
#and the last 10 steps for forecasting.
dataPred1 <- dataEx1[501:510,]
dataEx1 <- dataEx1[1:500, ]



##FORECASTING WITH THE ENSEMBLE KALMAN FILTER (EnKF)

#Dynamic model: specifying 100 temperature states.
#Note that you may change the specified initial state estimates (at t=0)
#below for the 100 states and see how such changes influence the behavior
#of the ensemble Kalman filter.
ex1 <- list(m0=c(200, rep(100, nc-2), 150), #initial state estimates
            #error covariances of the initial state estimates:
            #this matrix reflects the uncertainty in our initial state estimates
            #you may change the values in this matrix and see how they influence
            #the behavior of the Kalman filter
            C0=diag(rep(1e+1, nc)),
            #measurement noise
            V=diag(rep(vk, 9)),
            #process noise
            W=diag(rep(wk, nc)))

#Specify the state transition function:
#WARNING: always use arguments x and k when specifying the GGfunction
GGfunction <- function (x, k){
  x + (A%*%x)*dt + B%*%c(2*hs[k+1], .5*hs[k+1])}

#Specify the observation/measurement function:
#WARNING: always use arguments x and k when specifying the FFfunction
FFfunction <- function (x, k){
  x[seq(10, 90, 10)]}



##Compute the filtered (a posteriori) state estimates with the EnKF
#and employ 10 ensemble members in the EnKF 
enkf1 <- EnKF(y=dataEx1, mod=ex1, size=10,
              GGfunction=GGfunction, FFfunction=FFfunction)

#As a comparison, increase the size of the ensemble to 20
enkf2 <- EnKF(y=dataEx1, mod=ex1, size=20,
              GGfunction=GGfunction, FFfunction=FFfunction)

#And, finally, an EnKF with 100 ensemble members
enkf3 <- EnKF(y=dataEx1, mod=ex1, size=100,
              GGfunction=GGfunction, FFfunction=FFfunction)



#Plot the filtered state estimates at t=5
plot(1:nc, x[500,], type="l", col=c(gray(level=.5)),
     ylim=range(c(x[500,], enkf1$m[501,], enkf2$m[501,], enkf3$m[501,])),
     xlab="Node, i", ylab="Temperature (i)", main="t=5")
lines(1:nc, enkf1$m[501,], lty=2, col="blue", lwd=1)
lines(1:nc, enkf2$m[501,], lty=2, col="red", lwd=1)
lines(1:nc, enkf3$m[501,], lty=2, col="darkgreen", lwd=1)
legend("topright", lty=c(1, 2, 2, 2),
       col=c(gray(level=.5), "blue", "red", "darkgreen"),
       legend=c("true state", "EnKf with 10 members",
                "EnKf with 20 members", "EnKf with 100 members"),
       bty="n", y.intersp=1.2, cex=.7)

#Error plot
e1 <- sapply(1:(nrow(x)-10), function (i) mean((x[i,]-enkf1$m[i+1,])^2))
e2 <- sapply(1:(nrow(x)-10), function (i) mean((x[i,]-enkf2$m[i+1,])^2))
e3 <- sapply(1:(nrow(x)-10), function (i) mean((x[i,]-enkf3$m[i+1,])^2))
rangeError <- range(cbind(e1, e2, e3))
plot(1:(nrow(x)-10), e1, type="l", lty=1, col="blue", log="y",
     ylim=rangeError, ylab="Mean Squared Error", xlab="Time index, k")
lines(1:(nrow(x)-10), e2, lty=1, col="red")
lines(1:(nrow(x)-10), e3, lty=1, col="darkgreen")
legend("topright", lty=c(2, 2, 2),
       col=c("blue", "red", "darkgreen"),
       legend=c("EnKf with 10 members",
                "EnKf with 20 members", "EnKf with 100 members"),
       bty="n", y.intersp=1.2, cex=.7)



#Compute forecasts for 10 time steps ahead
#(If this forecasting function generates an error, then decrease nAhead
#from 10 to, for instance, 5)
fc1 <- EnKFforecast(filterData=enkf1, nAhead=10)
fc2 <- EnKFforecast(filterData=enkf2, nAhead=10)
fc3 <- EnKFforecast(filterData=enkf3, nAhead=10)

##FUTURE SYSTEM STATES
#95% confidence intervals for forecasts

#EnKF with 10 members: future system states 10 steps ahead
seFoA <- sqrt(fc1$R[[10]])
ciFoA <- fc1$a[10, ] + qnorm(.05/2)*seFoA%o%c(1, -1)

#Plot future system states including confidence intervals
plot(1:nc, x[510, ], type="l", lty=2, col="red",
     xlab="Node, i", ylab="Temperature (i)",
     main="EnKF with 10 members")
lines(1:nc, fc1$a[10, ], col="blue", lwd=1)
lines(1:nc, ciFoA[,1], col="blue", lty=2, lwd=1)
lines(1:nc, ciFoA[,2], col="blue", lty=2, lwd=1)
legend("topright", lty=c(2, 1), 
       col=c("red", "blue"),
       legend=c("true state", "future state"),
       bty="n", y.intersp=1.2, cex=.7)



#EnKF with 20 members: future system states 10 steps ahead
seFoA <- sqrt(fc2$R[[10]])
ciFoA <- fc2$a[10, ] + qnorm(.05/2)*seFoA%o%c(1, -1)

#Plot future system states including confidence intervals
plot(1:nc, x[510, ], type="l", lty=2, col="red",
     xlab="Node, i", ylab="Temperature (i)",
     main="EnKF with 20 members")
lines(1:nc, fc2$a[10, ], col="blue", lwd=1)
lines(1:nc, ciFoA[,1], col="blue", lty=2, lwd=1)
lines(1:nc, ciFoA[,2], col="blue", lty=2, lwd=1)
legend("topright", lty=c(2, 1), 
       col=c("red", "blue"),
       legend=c("true state", "future state"),
       bty="n", y.intersp=1.2, cex=.7)



#EnKF with 100 members: future system states 10 steps ahead
seFoA <- sqrt(fc3$R[[10]])
ciFoA <- fc3$a[10, ] + qnorm(.05/2)*seFoA%o%c(1, -1)

#Plot future system states including confidence intervals
plot(1:nc, x[510, ], type="l", lty=2, col="red",
     xlab="Node, i", ylab="Temperature (i)",
     main="EnKF with 100 members")
lines(1:nc, fc3$a[10, ], col="blue", lwd=1)
lines(1:nc, ciFoA[,1], col="blue", lty=2, lwd=1)
lines(1:nc, ciFoA[,2], col="blue", lty=2, lwd=1)
legend("topright", lty=c(2, 1), 
       col=c("red", "blue"),
       legend=c("true state", "future state"),
       bty="n", y.intersp=1.2, cex=.7)



##FUTURE OBSERVATIONS
#95% confidence intervals for forecasts

#EnKF with 10 members: future observations
seFoX <- sqrt(fc1$Q[[10]])
ciFoX <- fc1$f[10, ] + qnorm(.05/2)*seFoX%o%c(1, -1)

#Plot prediction intervals of future observations
plot(1:nc, x[510, ], type="l", lty=2, col=gray(level=.7),
     xlab="Node, i", ylab="Temperature (i)",
     main="EnKF with 10 members")
points(seq(10,90,10), dataPred1[10,], col="red", cex=.5)
arrows(seq(10,90,10), ciFoX[,1], seq(10,90,10) ,ciFoX[,2],
       code=3, length=0.1, angle=90, col=gray(level=.4))
legend("topright", lty=c(2, NA, 1), pch=c(NA, 1, NA),
       col=c(gray(level=.7), "red", gray(level=.7)),
       legend=c("true state", "measurements", "prediction interval"),
       bty="n", y.intersp=1.2, cex=.7)


#EnKF with 20 members: future observations
seFoX <- sqrt(fc2$Q[[10]])
ciFoX <- fc2$f[10, ] + qnorm(.05/2)*seFoX%o%c(1, -1)

#Plot prediction intervals of future observations
plot(1:nc, x[510, ], type="l", lty=2, col=gray(level=.7),
     xlab="Node, i", ylab="Temperature (i)",
     main="EnKF with 20 members")
points(seq(10,90,10), dataPred1[10,], col="red", cex=.5)
arrows(seq(10,90,10), ciFoX[,1], seq(10,90,10) ,ciFoX[,2],
       code=3, length=0.1, angle=90, col=gray(level=.4))
legend("topright", lty=c(2, NA, 1), pch=c(NA, 1, NA),
       col=c(gray(level=.7), "red", gray(level=.7)),
       legend=c("true state", "measurements", "prediction interval"),
       bty="n", y.intersp=1.2, cex=.7)



#EnKF with 100 members: future observations
seFoX <- sqrt(fc3$Q[[10]])
ciFoX <- fc3$f[10, ] + qnorm(.05/2)*seFoX%o%c(1, -1)

#Plot prediction intervals of future observations
plot(1:nc, x[510, ], type="l", lty=2, col=gray(level=.7),
     xlab="Node, i", ylab="Temperature (i)",
     main="EnKF with 100 members")
points(seq(10,90,10), dataPred1[10,], col="red", cex=.5)
arrows(seq(10,90,10), ciFoX[,1], seq(10,90,10) ,ciFoX[,2],
       code=3, length=0.1, angle=90, col=gray(level=.4))
legend("topright", lty=c(2, NA, 1), pch=c(NA, 1, NA),
       col=c(gray(level=.7), "red", gray(level=.7)),
       legend=c("true state", "measurements", "prediction interval"),
       bty="n", y.intersp=1.2, cex=.7)





#Future system states and observations for Node 10 separately.

#EnKF with 100 members: future system states for node 10
seFoA <- unlist(lapply(fc3$R, function (i) sqrt(i[10])))
ciFoA <- fc3$a[1:10, 10] + qnorm(.05/2)*seFoA%o%c(1, -1)

#Plot future system states including 95% confidence intervals for Node 10
plot(1:10, x[501:510, 10], type="l", lty=2, col="red",
     ylim=range(ciFoA),
     xlab="Step ahead, k", ylab="Temperature (k)",
     main="EnKF with 100 members")
lines(1:10, fc3$a[1:10, 10], col="blue", lwd=1)
lines(1:10, ciFoA[,1], col="blue", lty=2, lwd=1)
lines(1:10, ciFoA[,2], col="blue", lty=2, lwd=1)
legend("topright", lty=c(2, 1, 2), 
       col=c("red", "blue", "blue"),
       legend=c("true state", "future state", "confidence interval"),
       bty="n", y.intersp=1.2, cex=.7)


#EnKF with 100 members: future observations for node 10
seFoX <- unlist(lapply(fc3$Q, function (i) sqrt(i[1])))
ciFoX <- fc3$f[1:10, 1] + qnorm(.05/2)*seFoX%o%c(1, -1)

#Plot 95% prediction intervals for future observations of Node 10
plot(1:10, x[501:510, 10], type="l", lty=2, col=gray(level=.7),
     ylim=range(ciFoX),
     xlab="Step ahead, k", ylab="Temperature (k)",
     main="EnKF with 100 members")
points(1:10, dataPred1[1:10, 1], col="red", cex=.5)
lines(1:10, fc3$f[1:10, 1], col="blue", lwd=1)
lines(1:10, ciFoX[,1], col="blue", lty=2, lwd=1)
lines(1:10, ciFoX[,2], col="blue", lty=2, lwd=1)
legend("topleft", lty=c(2, NA, 1, 2), pch=c(NA, 1, NA, NA),
       col=c(gray(level=.7), "red", "blue", "blue"),
       legend=c("true state", "measurement", "predicted", "prediction interval"),
       bty="n", y.intersp=1.2, cex=.7)