R code for fitting a quantile regression model to censored data by means of a copula

In my previous blog post I showed how to fit a copula to censored data. For ease of reference, I'm going to call these fitted copulas censored copulas.

The following R code demonstrates how these censored copulas can, in turn, be used to fit a quantile regression model to censored data.
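To give a flavour of the approach, the sketch below computes conditional quantiles from a Clayton copula with lognormal margins. All parameter values are hypothetical stand-ins for estimates obtained from a censored copula fit; this is not the post's actual code.

```r
## Minimal sketch: conditional quantiles from a fitted Clayton copula
## with lognormal margins (all parameter values are hypothetical
## stand-ins for estimates from a censored copula fit)
cond_quantile <- function(x, tau, theta,
                          meanlog_x, sdlog_x, meanlog_y, sdlog_y) {
  u <- plnorm(x, meanlog_x, sdlog_x)           # probability transform of x
  ## invert the Clayton h-function h(v | u) at level tau
  v <- ((tau * u^(theta + 1))^(-theta / (theta + 1)) - u^(-theta) + 1)^(-1 / theta)
  qlnorm(v, meanlog_y, sdlog_y)                # back-transform to the y scale
}

## The tau-th quantile regression curve, here the conditional median
curve(cond_quantile(x, tau = 0.5, theta = 2, 0, 0.5, 1, 0.7),
      from = 0.1, to = 5, xlab = "x", ylab = "conditional median of y")
```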

A more detailed description of the method employed for fitting the quantile regression model can be found in this blog post.

Continue reading R code for fitting a quantile regression model to censored data by means of a copula

R code for fitting a copula to censored data

The following R code fits a bivariate (Archimedean or elliptical) copula to data where one of the variables contains censored observations. The censored observations can be left, right or interval censored.


Two-stage parametric ML method
The copula is fitted using the two-stage parametric ML approach (also referred to as the Inference Functions for Margins [IFM] method). This method fits a copula in two steps (see the sketch after the list):

  1. Estimate the parameters of the marginals
  2. Fix the marginal parameters to the values estimated in the first step, and subsequently estimate the copula parameters.
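
To make the two stages concrete, below is a minimal sketch of the IFM approach for a Clayton copula with lognormal margins, where the second margin is right censored. The simulated data, the choice of Clayton, and the use of survreg for the marginal fits are illustrative assumptions; the actual code in the post is more general.

```r
library(survival)

## Simulate dependent (u0, v0) from a Clayton copula (theta = 2),
## then apply lognormal margins; y is right censored
set.seed(1)
n     <- 300
theta <- 2
u0 <- runif(n); w <- runif(n)
v0 <- (u0^(-theta) * (w^(-theta / (theta + 1)) - 1) + 1)^(-1 / theta)
x      <- qlnorm(u0, 0, 0.5)
y_true <- qlnorm(v0, 1, 0.7)
y_cens <- rlnorm(n, 1.3, 0.7)
y      <- pmin(y_true, y_cens)
status <- as.numeric(y_true <= y_cens)         # 1 = observed, 0 = right censored

## Stage 1: marginal ML fits (the censored margin via survreg)
fx <- survreg(Surv(x, rep(1, n)) ~ 1, dist = "lognormal")
fy <- survreg(Surv(y, status) ~ 1, dist = "lognormal")
u  <- plnorm(x, coef(fx), fx$scale)
v  <- plnorm(y, coef(fy), fy$scale)

## Clayton density c(u, v) and h-function h(v | u) = dC(u, v)/du
cdens <- function(u, v, th)
  (1 + th) * (u * v)^(-th - 1) * (u^(-th) + v^(-th) - 1)^(-(2 * th + 1) / th)
hfun <- function(u, v, th)
  u^(-th - 1) * (u^(-th) + v^(-th) - 1)^(-(th + 1) / th)

## Stage 2: copula log-likelihood with the margins held fixed; an observed
## pair contributes the density c(u, v), while a right-censored y
## contributes P(V > v | U = u) = 1 - h(v | u)
negll <- function(th)
  -sum(ifelse(status == 1,
              log(cdens(u, v, th)),
              log(1 - hfun(u, v, th))))
optimize(negll, interval = c(0.01, 20))$minimum
```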

Continue reading R code for fitting a copula to censored data

R code for constructing probability plots

Probability plots are a tool for assessing whether the observed data follow a particular distribution.

Example of a probability plot for a Beta distribution

In short, if all data points in a probability plot fall on an approximately straight line, then you may assume that the data follow the distribution. In the figure above, for instance, all points seem to fall on a straight line in a Beta probability plot. As a result, we may assume that these data points come from a Beta distribution.
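To make the idea concrete before the full code in the post, here is a minimal sketch of a Beta probability plot for simulated data; the sample, the ML starting values, and the plotting choices are all illustrative assumptions, not the post's actual code.

```r
## Minimal sketch of a Beta probability plot for simulated data
set.seed(1)
x <- rbeta(100, shape1 = 2, shape2 = 5)

## Estimate the Beta parameters by ML (starting values are arbitrary)
fit <- MASS::fitdistr(x, "beta", start = list(shape1 = 1, shape2 = 1))
p   <- ppoints(length(x))                      # plotting positions
q   <- qbeta(p, fit$estimate["shape1"], fit$estimate["shape2"])

## Points near the straight line support the Beta assumption
plot(q, sort(x), xlab = "Theoretical quantiles", ylab = "Ordered data",
     main = "Beta probability plot")
abline(0, 1, lty = 2)
```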

The following R code constructs probability plots.

Continue reading R code for constructing probability plots

R code for modeling with left truncated and right/interval censored data

Burn-in testing is used to screen out units or systems with short lifetimes. Units or systems that survived a burn-in test may give rise to left truncated data that is either right or interval censored.

Left truncated and right censored data
In their 2012 book Applied Reliability, Tobias and Trindade report field failure times of units that survived a burn-in test of 5000 hours (Example 4.6, p. 109). These field failure times are an example of left truncation in combination with right censoring.

Left truncated and interval censored data
In their 1998 book Statistical Methods for Reliability Data, Meeker and Escobar describe a field-tracking study of units that survived a 1000-hour burn-in test (Example 11.11, pp. 269-270). The data in this study are an example of left truncation in combination with interval censoring.

Model fitting using maximum likelihood optimization

The R code fits a Weibull (or lognormal) model to left truncated data that is either right or interval censored. These models are fitted by maximizing the log-likelihood (using R's optim function).
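As a rough sketch of how such a likelihood is built (with simulated data and a 5000-hour truncation time echoing the Tobias and Trindade example), the code below fits a Weibull model to left truncated, right censored data by means of optim; for interval censoring, the density term would be replaced by a difference of CDF values.

```r
## Minimal sketch: Weibull ML fit to left truncated, right censored data
## (simulated data; tau is the burn-in / truncation time)
set.seed(1)
tau    <- 5000
t_all  <- rweibull(2000, shape = 1.2, scale = 30000)
t_obs  <- t_all[t_all > tau]                   # only burn-in survivors are seen
t_end  <- 60000                                # administrative censoring time
time   <- pmin(t_obs, t_end)
status <- as.numeric(t_obs <= t_end)           # 1 = failure observed, 0 = censored

## Log-likelihood: an observed failure contributes log f(t) - log S(tau),
## a censored unit contributes log S(t) - log S(tau)
negll <- function(par) {
  shape <- exp(par[1]); scale <- exp(par[2])   # log-parameters keep both positive
  ll <- ifelse(status == 1,
               dweibull(time, shape, scale, log = TRUE),
               pweibull(time, shape, scale, lower.tail = FALSE, log.p = TRUE)) -
        pweibull(tau, shape, scale, lower.tail = FALSE, log.p = TRUE)
  -sum(ll)
}
fit <- optim(c(0, 10), negll)
exp(fit$par)                                   # estimated shape and scale
```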

Continue reading R code for modeling with left truncated and right/interval censored data

R code for computing variable importance for a survival model

The following R code computes the relative importance of the predictor variables in a survival model. The implemented method was inspired by Leo Breiman's method for computing variable importance in a random forest.

Breiman’s method for computing variable importance is best explained with an example.
Suppose you have 5 predictor variables, say x1 to x5, which are used for predicting some observed response y. The 5 variables will not predict the observed values of y exactly; the predictions will deviate more or less from the observed values. The Mean Squared Error (MSE) is calculated as the mean of the squared deviations.
Now assume that predictor x1 has no predictive value for the response y. If we randomly permute the observed values of x1, our predictions for y will hardly change. Consequently, the MSE before and after permuting the observed values of x1 will be similar.
On the other hand, assume that x3 is strongly related to the response y. If we randomly permute the observed values of x3, the MSE after the permutation will deviate considerably from the MSE before.
Based on these random permutations and the resulting changes in MSE, we may conclude that predictor variable x3 is more important than x1 in predicting y.
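
A tiny, self-contained illustration of this permutation idea, with simulated data and an ordinary linear model standing in for the predictive model:

```r
## Toy illustration of the permutation idea with MSE (simulated data)
set.seed(1)
n  <- 500
x1 <- rnorm(n)                                 # pure noise predictor
x3 <- rnorm(n)                                 # strongly predictive
y  <- 2 * x3 + rnorm(n)
d0 <- data.frame(x1, x3)
fit <- lm(y ~ x1 + x3, data = d0)

mse <- function(d) mean((y - predict(fit, newdata = d))^2)
mse(d0)                                        # baseline MSE
mse(transform(d0, x1 = sample(x1)))            # hardly changes
mse(transform(d0, x3 = sample(x3)))            # increases sharply
```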

The R code below applies Breiman's permutation method to a survival model, computing the relative importance of its predictor variables. However, instead of the MSE, the code employs concordance as the measure of prediction accuracy.
Furthermore, in two simulations the R code compares the performance of Breiman's method applied to survival models with that of the variable importance method implemented in a random survival forest.
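
As a minimal sketch of the permutation step with concordance (a Cox model on the survival package's lung data serves as a stand-in; the single permutation per variable is a simplification, and in practice one would average over repeated permutations):

```r
library(survival)

## Permutation importance with concordance, sketched on the lung data
dat  <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
fit  <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = dat)
base <- concordance(fit)$concordance           # baseline concordance

set.seed(1)
imp <- sapply(c("age", "sex", "ph.ecog"), function(var) {
  perm        <- dat
  perm[[var]] <- sample(perm[[var]])           # permute one predictor
  lp <- predict(fit, newdata = perm, type = "lp")
  ## drop in concordance measures the importance of the permuted variable
  base - concordance(Surv(dat$time, dat$status) ~ lp, reverse = TRUE)$concordance
})
round(imp, 3)
```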


Continue reading R code for computing variable importance for a survival model