# R code for computing variable importance for a neural network

The following R code computes the relative importance of input variables in a neural network. The implemented method for computing the relative importance was inspired by the Leo Breiman’s method for computing variable importance in a Random Forest.

Breiman’s method for computing variable importance can best be explained with an example.
Suppose you have 5 predictor variables, say x1 to x5. These 5 variables are used for predicting some observed response y. However, the 5 variables do not predict exactly the observed values for y. In other words, the predictions based on the 5 variables will more or less deviate from the observed values for y. The Mean Squared Error (MSE) is calculated as the mean of these deviations.
Assume now that predictor x1 has no predictive value for the response y. Hence, if we would randomly permute the observed values for x1, then our predictions for y would hardly change. As a consequence, the MSE before and after permuting the observed values for x1 would be similar.
On the other hand, assume that x3 is strongly related to our response y. If we randomly permute the observed values for x3, then our MSE before and after this permutation would deviate considerably.
Based on these random permutations and our observed change in MSE, we may conclude that predictor variable x3 is more important than x1 in predicting y.

The R code below applies Breiman’s permutation method for computing the relative importance of input variables in a neural network.
Furthermore, the R code also compares in 3 simulations the performance of this method for neural networks with those of:

• Olden’s method for computing variable importance in a neural network
• Breiman’s permutation method for computing variable importance in a random forest

The simulations in the R code are similar to the simulation described by Olden, Joy, and Death in their paper An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Note that Olden and colleagues refer to Olden’s method for computing variable importance as the Connection Weights method.

See this page for an overview of all of Stefan’s R code blog posts.

R code

```library(nnet)
library(NeuralNetTools)
library(mvtnorm)
library(cluster)
library(randomForest)

#Function for computing importance of input variables in a neural network
#based on Breiman's permutation method.
breiman <- function (model, y, covdata, nrep){
#model is an object of class nnet as returned by nnet
#y is a vector containing the target (or dependent variable)
#covdata is a data frame containing the covariates
#nrep specifies the number of Monte Carlo simulations for computing the
# variable importance

#calculate mean squared error of model (will be used as benchmark)
residuals <- y-predict(model, covdata, type="raw")
MSE <- mean(residuals^2)

#compute variable importance
ncov <- ncol(covdata)
nobs <- nrow(covdata)

vi <- sapply(1:ncov, function (i){
simMat <- covdata[rep(seq(nobs), nrep), ]
permutedCol <- as.vector(replicate(nrep, sample(covdata[,i], nobs, replace=FALSE)))
indicatorNrep <- rep(1:nrep, each=nobs)
simMat[,i] <- permutedCol
permuted.pred <- predict(model, newdata=simMat, type="raw")
SMSE <- tapply(permuted.pred, indicatorNrep, function (j) mean((y-j)^2))
mean((SMSE-MSE)/MSE)})

names(vi) <- colnames(covdata)
return(vi)
}

##SIMULATION 1

##Generate data:
#- Five predictor variables (x1 to x5) and one response variable (y).
#- Correlations between these five predictors and the response variable are set to
# .8, .6., .4, .2, and 0.
#- Correlations between predictor variables are set to .2.
#- Mean of each variable is zero, and the variance is 1.

#variance-covariance matrix
varcov <- matrix(c(1, .8, .6, .4, .2, 0,
.8, 1, .2, .2, .2, .2,
.6, .2, 1, .2, .2, .2,
.4, .2, .2 ,1, .2, .2,
.2, .2, .2, .2, 1, .2,
0, .2, .2, .2, .2, 1), ncol=6, byrow=TRUE)

#statistical population
statpop <- rmvnorm(10000, mean=rep(0, 6), sigma=varcov, method="chol")
colnames(statpop) <- c("y", "x1", "x2", "x3", "x4", "x5")

#check generated data
apply(statpop,2,mean) #means of variables
round(var(statpop),2) #variances-covariances
round(cor(statpop),2) #correlations
summary(lm(y~., data=data.frame(statpop))) #fit ols-model to data
#Why has the coefficient for x5 a (large) negative value? Predictor variable
#x5 is a classical example of, what is called in regression modeling, a suppressor.
#That is, there is a zero correlation between x5 and the response variable (y) in
#the statistical population. At the same time, x5 is correlated with all the
#other predictor variables (x1 to x4) in the population.
#Furthermore, all these other predictors (x1 to x4) are positively correlated with
#the response variable in the population. Under such circumstances, the coefficient
#of the suppressor in a regression model usually has a negative value.

##Monte-Carlo experiment

#Optimal values for the size and decay parameters of the neural network.
#These values are obtained using a 10-fold cross-validation method.
#(see the appendix towards the end of the R code for performing
#this 10-fold cross-validation)
bestValues <- matrix(c(5, .01), nrow=1)

#Run 500 Monte-Carlo trials. For each trial, draw a sample of 50 from the
#statistical population. Subsequently, compute for each trial sample the
#importance of each of the predictor variables (x1 to x5) .
simMat <- matrix(NA, nrow=500, ncol=15)

for(i in 1:500){
#sample
randSampleObs <- sample(10000, 2*50, replace=FALSE)
randSample <- data.frame(statpop[randSampleObs,])
colnames(randSample) <- c("y", "x1", "x2", "x3", "x4", "x5")

#split random sample into training and validation set
d <- sample(nrow(randSample), size=.5*(nrow(randSample)), replace=FALSE)
#training data
randSampleT <-randSample[d,]
#validation data (needed for Breiman's method for neural networks)
randSampleV <-randSample[-d,]

#fit neural network to training data
modnn <- nnet(y~., data=randSampleT, linout=TRUE, size=bestValues[,1],
decay=bestValues[,2], trace=FALSE)
#fit random forest to training data
modrf <- randomForest(y~., data=randSampleT, importance=TRUE)

#Breiman's variable importance for neural networks
resultBreiman <- breiman(modnn, randSampleV[,1], randSampleV[,-1], nrep=50)
simMat[i, 1:5] <- names(resultBreiman)[order(-resultBreiman)]
#Olden's variable importance
resultOlden <- olden(modnn, "y", bar_plot=FALSE)
simMat[i, 6:10] <- rownames(resultOlden)[order(-resultOlden)]
#Breiman's variable importance for random forest
resultBreimanRF <- importance(modrf)[,1]
simMat[i, 11:15] <- names(resultBreimanRF)[order(-resultBreimanRF)]}

#extract results
Breiman <- simMat[,1:5] #Breiman's method applied to neural network
Olden <- simMat[,6:10] #Olden's method
BreimanRF <- simMat[,11:15] #Breiman's method applied to random forest

##Gower similarity
#assess the degree of similarity between the estimated and the true
#ranked importance of the predictor variables
trueRanking <- c("x1", "x2", "x3", "x4", "x5")

#results Olden's variable importance
simOlden <- apply(Olden, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mOnn <- mean(simOlden); sdOnn <- sd(simOlden)

#results Breiman's variable importance for neural network
simBreiman <- apply(Breiman, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mBnn <- mean(simBreiman); sdBnn <- sd(simBreiman)

#results Breiman's variable importance for random forest
simBreimanRF <- apply(BreimanRF, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mBrf <- mean(simBreimanRF); sdBrf <- sd(simBreimanRF)

#plot results (similar to Figure 1 in Olden, Joy, and Death)
#Olden = Olden's variable importance method
#Breiman (nnet) = Breiman's variable importance method applied to neural network
#Breiman (rf) = Breiman's variable importance method applied to random forest
simGower <- cbind(c(mBrf, mBnn, mOnn), c(mBrf-sdBrf, mBnn-sdBnn, mOnn-sdOnn),
c(mBrf+sdBrf, mBnn+sdBnn, mOnn+sdOnn))
rownames(simGower) <- c("Breiman (rf)", "Breiman (nnet)","Olden")
dotchart(simGower[,1], xlab="Gower Similarity",
xlim=c(min(simGower[,2]), max(simGower[,3])))
segments(simGower[,2], 1:3, simGower[,3], 1:3, col="darkgray")

##Ranked Variable Importance
#For each of the 500 Monte-Carlo trials, the predictor variables are
#ranked according to their computed variable importance value.
#The predictor variable with the highest calculated importance
#value is assigned position 1 in the ranking, the variable with the
#second highest value position 2, et cetera.
#The Ranked Variable Importance is the average position calculated
#over 500 trials.

#results Olden's variable importance
rankOlden <- apply(Olden, 1, order)
mrOnn <- apply(rankOlden, 1, mean); sdrOnn <- apply(rankOlden, 1, sd)
lclOnn <- mrOnn-sdrOnn; uclOnn <- mrOnn+sdrOnn

#results Breiman's variable importance for neural network
rankBreiman <- apply(Breiman, 1, order)
mrBnn <- apply(rankBreiman, 1, mean); sdrBnn <- apply(rankBreiman, 1, sd)
lclBnn <- mrBnn-sdrBnn; uclBnn <- mrBnn+sdrBnn

#results Breiman's variable importance for random forest
rankBreimanRF <- apply(BreimanRF, 1, order)
mrBrf <- apply(rankBreimanRF, 1, mean); sdrBrf <- apply(rankBreimanRF, 1, sd)
lclBrf <- mrBrf-sdrBrf; uclBrf <- mrBrf+sdrBrf

#plot results (similar to Figure 2 in Olden, Joy, and Death)
#Olden = Olden's variable importance method
#Breiman (nnet) = Breiman's variable importance method applied to neural network
#Breiman (rf) = Breiman's variable importance method applied to random forest
simRank <- c(mrOnn, mrBnn, mrBrf)
names(simRank) <- rep(trueRanking, times=3)

dotchart(simRank, xlab="Ranked Variable Importance",
groups=factor(rep(c("1. Olden", "2. Breiman (nnet)", "3. Breiman (rf)"), each=5)),
xlim=c(min(c(lclOnn, lclBnn, lclBrf)), max(c(uclOnn, uclBnn, uclBrf))),
cex=.8, col="darkgreen")
segments(lclOnn, 15:19, uclOnn, 15:19, col="darkgray")
segments(lclBnn, 8:12, uclBnn, 8:12, col="darkgray")
segments(lclBrf, 1:5, uclBrf, 1:5, col="darkgray")
abline(v=c(1,2,3,4,5), col="gray", lty=2)

#Of all three methods, Breiman's method applied to neural networks
#seems to do the worst job of predicting the correct variable importance
#order for this data.

#For explaining the results of Breiman's method applied to neural networks,
#it is useful to look once again at the ols-model fitted to the
#statistical population.
summary(lm(y~., data=data.frame(statpop))) #fit ols-model to data
#Now, take the absolute values of the ols-coefficients for x1 to x5,
#and order them from large to small.
#The resulting order is x1, x2, x5, x3, and x4. This order resembles the
#order of the ranked variable importances we found for Breiman's method
#applied to neural networks (see dotchart for Ranked Variables Importance).
#Note that it only makes sense to compare the absolute values of the
#ols-coefficients when the predictor variables are measured on the same scale.
#Remember that this was the case with the predictor variables employed in
#Simulation 1 (variables x1 to x5 all had mean 0 and variance 1).

#In the following simulation we will generate a dataset where the
#order of the ols-coefficients (from large to small) will be x1, x2, x3, x4, and x5.
#In that case we might expect that Breiman's method for neural networks
#yields an order of variable importances similar to the order of these ols-coefficients.

##SIMULATION 2

##Generate data:
#Again, the mean of each variable is set to 0, and the variance to 1.
varcov <- matrix(c(1, .8, .6, .5, .4, .3,
.8, 1, .2, .2, .2, .2,
.6, .2, 1, .2, .2, .2,
.5, .2, .2 ,1, .2, .2,
.4, .2, .2, .2, 1, .2,
.3, .2, .2, .2, .2, 1), ncol=6, byrow=TRUE)

#statistical population
statpop <- rmvnorm(10000, rep(0, 6), sigma=varcov, method="chol")
colnames(statpop) <- c("y", "x1", "x2", "x3", "x4", "x5")

#check generated data
apply(statpop,2,mean) #means of variables
round(var(statpop),2) #variances-covariances
round(cor(statpop),2) #correlations
summary(lm(y~., data=data.frame(statpop))) #fit ols-model to data
#note that the order of the ols-coefficients for the predictor variables is
#x1, x2, x3, x4, and x5 (when ordered from large to small)

##Monte-Carlo experiment

#optimal values for the size and decay parameters of the neural network
bestValues <- matrix(c(2, .01), nrow=1)

#run 500 Monte-Carlo trials
simMat <- matrix(NA, nrow=500, ncol=15)

for(i in 1:500){
#sample
randSampleObs <- sample(10000, 2*50, replace=FALSE)
randSample <- data.frame(statpop[randSampleObs,])
colnames(randSample) <- c("y", "x1", "x2", "x3", "x4", "x5")

#split random sample into training and validation set
d <- sample(nrow(randSample), size=.5*(nrow(randSample)), replace=FALSE)
#training data
randSampleT <-randSample[d,]
#validation data (needed for Breiman's method for neural networks)
randSampleV <-randSample[-d,]

#fit neural network to training data
modnn <- nnet(y~., data=randSampleT, linout=TRUE, size=bestValues[,1],
decay=bestValues[,2], trace=FALSE)
#fit random forest to training data
modrf <- randomForest(y~., data=randSampleT, importance=TRUE)

#Breiman's variable importance for neural networks
resultBreiman <- breiman(modnn, randSampleV[,1], randSampleV[,-1], nrep=50)
simMat[i, 1:5] <- names(resultBreiman)[order(-resultBreiman)]
#Olden's variable importance
resultOlden <- olden(modnn, "y", bar_plot=FALSE)
simMat[i, 6:10] <- rownames(resultOlden)[order(-resultOlden)]
#Breiman's variable importance for random forest
resultBreimanRF <- importance(modrf)[,1]
simMat[i, 11:15] <- names(resultBreimanRF)[order(-resultBreimanRF)]}

#extract results
Breiman <- simMat[,1:5] #Breiman's method applied to neural network
Olden <- simMat[,6:10] #Olden's method
BreimanRF <- simMat[,11:15] #Breiman's method applied to random forest

##Gower similarity
#assess the degree of similarity between the estimated and the true
#ranked importance of the predictor variables
trueRanking <- c("x1", "x2", "x3", "x4", "x5")

#results Breiman's variable importance for neural network
simBreiman <- apply(Breiman, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mBnn <- mean(simBreiman); sdBnn <- sd(simBreiman)

#results Olden's variable importance
simOlden <- apply(Olden, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mOnn <- mean(simOlden); sdOnn <- sd(simOlden)

#results Breiman's variable importance for random forest
simBreimanRF <- apply(BreimanRF, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mBrf <- mean(simBreimanRF); sdBrf <- sd(simBreimanRF)

#plot results
simGower <- cbind(c(mBrf, mBnn, mOnn), c(mBrf-sdBrf, mBnn-sdBnn, mOnn-sdOnn),
c(mBrf+sdBrf, mBnn+sdBnn, mOnn+sdOnn))
rownames(simGower) <- c("Breiman (rf)", "Breiman (nnet)","Olden")
dotchart(simGower[,1], xlab="Gower Similarity",
xlim=c(min(simGower[,2]), max(simGower[,3])))
segments(simGower[,2], 1:3, simGower[,3], 1:3, col="darkgray")

##Ranked variable importance
#results Breiman's variable importance for neural network
rankBreiman <- apply(Breiman, 1, order)
mrBnn <- apply(rankBreiman, 1, mean); sdrBnn <- apply(rankBreiman, 1, sd)
lclBnn <- mrBnn-sdrBnn; uclBnn <- mrBnn+sdrBnn

#results Olden's variable importance
rankOlden <- apply(Olden, 1, order)
mrOnn <- apply(rankOlden, 1, mean); sdrOnn <- apply(rankOlden, 1, sd)
lclOnn <- mrOnn-sdrOnn; uclOnn <- mrOnn+sdrOnn

#results Breiman's variable importance for random forest
rankBreimanRF <- apply(BreimanRF, 1, order)
mrBrf <- apply(rankBreimanRF, 1, mean); sdrBrf <- apply(rankBreimanRF, 1, sd)
lclBrf <- mrBrf-sdrBrf; uclBrf <- mrBrf+sdrBrf

#plot results
simRank <- c(mrOnn, mrBnn, mrBrf)
names(simRank) <- rep(trueRanking, times=3)

dotchart(simRank, xlab="Ranked Variable Importance",
groups=factor(rep(c("1. Olden", "2. Breiman (nnet)", "3. Breiman (rf)"), each=5)),
xlim=c(min(c(lclOnn, lclBnn, lclBrf)), max(c(uclOnn, uclBnn, uclBrf))),
cex=.8, col="darkgreen")
segments(lclOnn, 15:19, uclOnn, 15:19, col="darkgray")
segments(lclBnn, 8:12, uclBnn, 8:12, col="darkgray")
segments(lclBrf, 1:5, uclBrf, 1:5, col="darkgray")
abline(v=c(1,2,3,4,5), col="gray", lty=2)

#Note that Breiman's method applied to neural networks now predicts the order of
#the variable importances much more precisely.

##SIMULATION 3

##Generate data:
#In the following dataset predictor variables x1 and x3 are positively
#correlated with the response variable (y), whereas x2 is negatively
#correlated with the response variable.
varcov <- matrix(c(1, .7, -.5, .3,
.7, 1, .2, .2,
-.5, .2, 1, .2,
.3, .2, .2 ,1), ncol=4, byrow=TRUE)

#statistical population
statpop <- rmvnorm(10000, rep(0, 4), sigma=varcov, method="chol")
colnames(statpop) <- c("y", "x1", "x2", "x3")

#check generated data
apply(statpop,2,mean) #means of variables
round(var(statpop),2) #variances-covariances
round(cor(statpop),2) #correlations
summary(lm(y~., data=data.frame(statpop))) #fit ols-model to data
#If we take the absolute values of the coefficients and order them
#from large to small, then we observe that the order of the coefficients
#is x1, x2, x3.

##Monte-Carlo experiment

#optimal values for the size and decay parameters of the neural network
bestValues <- matrix(c(10, .001), nrow=1)

#run 500 Monte-Carlo trials
simMat <- matrix(NA, nrow=500, ncol=15)

for(i in 1:500){
#sample
randSampleObs <- sample(10000, 2*50, replace=FALSE)
randSample <- data.frame(statpop[randSampleObs,])
colnames(randSample) <- c("y", "x1", "x2", "x3")

#split random sample into training and validation set
d <- sample(nrow(randSample), size=.5*(nrow(randSample)), replace=FALSE)
#training data
randSampleT <-randSample[d,]
#validation data (needed for Breiman's method for neural networks)
randSampleV <-randSample[-d,]

#fit neural network to training data
modnn <- nnet(y~., data=randSampleT, linout=TRUE, size=bestValues[,1],
decay=bestValues[,2], trace=FALSE)
#fit random forest to training data
modrf <- randomForest(y~., data=randSampleT, importance=TRUE)

#Breiman's variable importance for neural networks
resultBreiman <- breiman(modnn, randSampleV[,1], randSampleV[,-1], nrep=50)
simMat[i, 1:3] <- names(resultBreiman)[order(-resultBreiman)]
#Olden's variable importance
resultOlden <- olden(modnn, "y", bar_plot=FALSE)
simMat[i, 4:6] <- rownames(resultOlden)[order(-resultOlden)]
#Breiman's variable importance for random forest
resultBreimanRF <- importance(modrf)[,1]
simMat[i, 7:9] <- names(resultBreimanRF)[order(-resultBreimanRF)]}

#extract results
Breiman <- simMat[,1:3] #Breiman's method applied to neural network
Olden <- simMat[,4:6] #Olden's method
BreimanRF <- simMat[,7:9] #Breiman's method applied to random forest

##Gower similarity
#assess the degree of similarity between the estimated and the true
#ranked importance of the variables
trueRanking <- c("x1", "x2", "x3")

#results Breiman's variable importance for neural network
simBreiman <- apply(Breiman, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mBnn <- mean(simBreiman); sdBnn <- sd(simBreiman)

#results Olden's variable importance
simOlden <- apply(Olden, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mOnn <- mean(simOlden); sdOnn <- sd(simOlden)

#results Breiman's variable importance for random forest
simBreimanRF <- apply(BreimanRF, 1, function (x)
1-daisy(data.frame(rbind(trueRanking, x)), metric="gower"))
mBrf <- mean(simBreimanRF); sdBrf <- sd(simBreimanRF)

#plot results
simGower <- cbind(c(mBrf, mBnn, mOnn), c(mBrf-sdBrf, mBnn-sdBnn, mOnn-sdOnn),
c(mBrf+sdBrf, mBnn+sdBnn, mOnn+sdOnn))
rownames(simGower) <- c("Breiman (rf)", "Breiman (nnet)","Olden")
dotchart(simGower[,1], xlab="Gower Similarity",
xlim=c(min(simGower[,2]), max(simGower[,3])))
segments(simGower[,2], 1:3, simGower[,3], 1:3, col="darkgray")

##Ranked variable importance
#results Breiman's variable importance for neural network
rankBreiman <- apply(Breiman, 1, order)
mrBnn <- apply(rankBreiman, 1, mean); sdrBnn <- apply(rankBreiman, 1, sd)
lclBnn <- mrBnn-sdrBnn; uclBnn <- mrBnn+sdrBnn

#results Olden's variable importance
rankOlden <- apply(Olden, 1, order)
mrOnn <- apply(rankOlden, 1, mean); sdrOnn <- apply(rankOlden, 1, sd)
lclOnn <- mrOnn-sdrOnn; uclOnn <- mrOnn+sdrOnn

#results Breiman's variable importance for random forest
rankBreimanRF <- apply(BreimanRF, 1, order)
mrBrf <- apply(rankBreimanRF, 1, mean); sdrBrf <- apply(rankBreimanRF, 1, sd)
lclBrf <- mrBrf-sdrBrf; uclBrf <- mrBrf+sdrBrf

#plot results
simRank <- c(mrOnn, mrBnn, mrBrf)
names(simRank) <- rep(trueRanking, times=3)

dotchart(simRank, xlab="Ranked Variable Importance",
groups=factor(rep(c("1. Olden", "2. Breiman (nnet)", "3. Breiman (rf)"), each=3)),
xlim=c(min(c(lclOnn, lclBnn, lclBrf)), max(c(uclOnn, uclBnn, uclBrf))),
cex=.8, col="darkgreen")
segments(lclOnn, 11:13, uclOnn, 11:13, col="darkgray")
segments(lclBnn, 6:8, uclBnn, 6:8, col="darkgray")
segments(lclBrf, 1:3, uclBrf, 1:3, col="darkgray")
abline(v=c(1,2,3), col="gray", lty=2)

#Note that Olden's method seems to do the worst job of predicting
#the correct variable importance order for this data.

##Sobol's indices
#We will also calculate the Sobol Indices for the data employed in Simulation 3.
library(sensitivity)

#fit ols-model to data
modlm <- lm(y~., data=data.frame(statpop))

#model specification
lm.model <- function (X, coefs=coef(modlm)) (c(1, X)%*%coefs)

lm.sim <- function(X){
y <- apply(X, 1, lm.model)
as.numeric(y)
}

#generate "X1" data (=first random sample)
N <- 10000
X1 <- data.frame(rmvnorm(N, rep(0, 4), sigma=varcov, method="chol"))[,-1]
colnames(X1) <- c("x1", "x2", "x3")

#generate "X2" data (=second random sample)
N <- 10000
X2 <- data.frame(rmvnorm(N, rep(0, 4), sigma=varcov, method="chol"))[,-1]
colnames(X2) <- c("x1", "x2", "x3")

#compute Sobol's indices
xs <- sobol(model=lm.sim, X1=X1, X2=X2, order=1)
print(xs)
plot(xs)

#alternatively
xs2 <- soboljansen(model=lm.sim, X1=X1, X2=X2)
print(xs2)
plot(xs2,ylim=c(min(xs2\$S\$"original",xs2\$T\$"original")-.05,
max(xs2\$S\$"original",xs2\$T\$"original")+.05))

##APPENDIX
library(ipred)

#TRAIN NETWORK ON POPULATION
#specify tuning values for:
#1) the size of the hidden layer
#2) the weight decay parameter
sizes <- c(2, 5, 10)
decays <- c(1e-3, .01, .1)
tryValues <- expand.grid(size=sizes, decay=decays)

#perform nrep times 10-fold cross-validation
trainData <- data.frame(statpop)
nrep <- 1
errorCV <- sapply(1:nrow(tryValues), function (i)
replicate(nrep, errorest(y~., data=trainData, model=nnet, linout=TRUE,
size=tryValues[i,1], decay=tryValues[i,2], trace=FALSE,
estimator="cv",
est.para=list(control.errorest(k=10)))\$error),
simplify=FALSE)

#display results
interaction.plot(rep(tryValues[,2], each=nrep), rep(tryValues[,1], each=nrep),
unlist(errorCV), lty=1, col=1:length(sizes), ylab="mean rmse",
xlab=names(tryValues), trace.label=names(tryValues))
#select tuning values that yield the lowest mean rmse
(bestValues <- tryValues[which.min(simplify2array(lapply(errorCV, mean))),])```