Title: | Enhanced Implementation of Whittaker-Henderson Smoothing |
---|---|
Description: | An enhanced implementation of Whittaker-Henderson smoothing for the gradation of one-dimensional and two-dimensional actuarial tables used to quantify Life Insurance risks. 'WH' is based on the methods described in Biessy (2023) <doi:10.48550/arXiv.2306.06932>. Among other features, it generalizes the original smoothing algorithm to maximum likelihood estimation, automatically selects the smoothing parameter(s) and extrapolates beyond the range of data. |
Authors: | Guillaume Biessy [aut, cre, cph] |
Maintainer: | Guillaume Biessy <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.2.9000 |
Built: | 2024-11-14 05:14:36 UTC |
Source: | https://github.com/guillaumebiessy/wh |
An enhanced implementation of Whittaker-Henderson smoothing for the gradation
of one-dimensional and two-dimensional actuarial tables used to quantify Life
Insurance risks. WH
is based on the methods described in Biessy (2023)
doi:10.48550/arXiv.2306.06932. Among other features, it generalizes the
original smoothing algorithm to maximum likelihood estimation, automatically
selects the smoothing parameter(s) and extrapolates beyond the range of data.
Maintainer: Guillaume Biessy [email protected] (ORCID) [copyright holder]
Useful links:
Provide WH model Fit Results as a Data.frame
output_to_df(object, dim1 = "x", dim2 = "t")
output_to_df(object, dim1 = "x", dim2 = "t")
object |
An object of class |
dim1 |
The optional name to be given to the first dimension |
dim2 |
The optional name to be given to the second dimension |
A data.frame regrouping information about the fitted and predicted values, the model variance, residuals and effective degrees of freedom...
d <- portfolio_mort$d ec <- portfolio_mort$ec y <- log(d / ec) y[d == 0] <- - 20 wt <- d fit_1d <- WH_1d(d, ec) output_to_df(fit_1d) keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] y <- log(d / ec) # observation vector y[d == 0] <- - 20 wt <- d # Maximum likelihood fit_2d <- WH_2d(d, ec) output_to_df(fit_2d)
d <- portfolio_mort$d ec <- portfolio_mort$ec y <- log(d / ec) y[d == 0] <- - 20 wt <- d fit_1d <- WH_1d(d, ec) output_to_df(fit_1d) keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] y <- log(d / ec) # observation vector y[d == 0] <- - 20 wt <- d # Maximum likelihood fit_2d <- WH_2d(d, ec) output_to_df(fit_2d)
Plot Method for a Whittaker-Henderson Fit
## S3 method for class 'WH_1d' plot(x, what = "fit", trans, ...)
## S3 method for class 'WH_1d' plot(x, what = "fit", trans, ...)
x |
An object of class |
what |
What should be plotted. Should be one of |
trans |
An (optional) transformation to be applied to the data. By default the identity function |
... |
Not used |
A plot representing the desired element from the fit
d <- portfolio_mort$d ec <- portfolio_mort$ec fit <- WH_1d(d, ec) plot(fit) plot(fit, "res") plot(fit, "edf")
d <- portfolio_mort$d ec <- portfolio_mort$ec fit <- WH_1d(d, ec) plot(fit) plot(fit, "res") plot(fit, "edf")
Plot Method for a Whittaker-Henderson Fit
## S3 method for class 'WH_2d' plot(x, what = "y_hat", trans, ...)
## S3 method for class 'WH_2d' plot(x, what = "y_hat", trans, ...)
x |
An object of class |
what |
Should be one of |
trans |
An (optional) transformation to be applied to the data. By default the identity function |
... |
Not used |
A plot representing the desired element from the fit...
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] fit <- WH_2d(d, ec) plot(fit) plot(fit, "std_y_hat")
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] fit <- WH_2d(d, ec) plot(fit) plot(fit, "std_y_hat")
Agregated dataset built from a synthetic long-term care portfolio
portfolio_LTC
portfolio_LTC
A dataset obtaining by agregating the information from a fictive long-term care annuitant database with 5,000 annuitants over a 10-year observation period. The dataset is supplied as a list with two components :
A matrix containing the number of observed deaths for each combination of age and duration in LTC where at least one death has been observed
A matrix containing the associated central exposure in person-years for each combination of age and duration in LTC in d
Agregated dataset built from a synthetic mortality portfolio
portfolio_mort
portfolio_mort
A dataset containing the information from a simulated annuity portfolio with 100,000 contributors over a 10-year observation period. The dataset is supplied as a list with two components :
A vector containing the number of observed deaths for ages where at least one death has been observed
A vector containing the associated central exposure in person-years for each age in d
Extrapolate the Whittaker-Henderson fit for new observations.
## S3 method for class 'WH_1d' predict(object, newdata = NULL, ...)
## S3 method for class 'WH_1d' predict(object, newdata = NULL, ...)
object |
An object of class |
newdata |
A vector containing the position of new observations. Observations from the fit will automatically be added to this, in the adequate order |
... |
Not used |
An object of class "WH_1d"
with additional components y_pred
and
std_y_pred
corresponding to the model predictions and associated standard
deviations.
d <- portfolio_mort$d ec <- portfolio_mort$ec fit <- WH_1d(d, ec) newdata = 18:99 pred <- predict(fit, newdata) plot(pred)
d <- portfolio_mort$d ec <- portfolio_mort$ec fit <- WH_1d(d, ec) newdata = 18:99 pred <- predict(fit, newdata) plot(pred)
Extrapolate the Whittaker-Henderson fit for new observations in a way that is consistent with the initial model fit.
## S3 method for class 'WH_2d' predict(object, newdata = NULL, ...)
## S3 method for class 'WH_2d' predict(object, newdata = NULL, ...)
object |
An object of class |
newdata |
A list containing two vectors indicating the new observation positions |
... |
Not used |
An object of class "WH_2d"
with additional components y_pred
and
std_y_pred
corresponding to the model predictions and associated standard
deviations.
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] fit <- WH_2d(d, ec) newdata <- list(age = 50:99, duration = 0:19) pred <- predict(fit, newdata) plot(pred)
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] fit <- WH_2d(d, ec) newdata <- list(age = 50:99, duration = 0:19) pred <- predict(fit, newdata) plot(pred)
Print Method for a Whittaker-Henderson Fit
## S3 method for class 'WH_1d' print(x, ...)
## S3 method for class 'WH_1d' print(x, ...)
x |
An object of class |
... |
Not used |
Invisibly returns x
.
d <- portfolio_mort$d ec <- portfolio_mort$ec y <- log(d / ec) y[d == 0] <- - 20 wt <- d WH_1d(d, ec)
d <- portfolio_mort$d ec <- portfolio_mort$ec y <- log(d / ec) y[d == 0] <- - 20 wt <- d WH_1d(d, ec)
Print Method for a Whittaker-Henderson Fit
## S3 method for class 'WH_2d' print(x, ...)
## S3 method for class 'WH_2d' print(x, ...)
x |
An object of class |
... |
Not used |
Invisibly returns x
.
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] WH_2d(d, ec)
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] WH_2d(d, ec)
Main package function to apply Whittaker-Henderson smoothing in a
one-dimensional survival analysis framework. It takes as input a vector of
observed events and a vector of associated central exposure, both depending
on a single covariate, and build a smooth version of the log-hazard rate.
Smoothing parameters may be supplied or automatically chosen according to an
adequate criterion such as "REML"
(the default), "AIC"
, "BIC"
or
"GCV"
. Whittaker-Henderson may be applied in a full maximum likelihood
framework (the default) or an approximate gaussian framework (the original).
WH_1d( d, ec, lambda, criterion, method, q = 2, framework, y, wt, quiet = FALSE, ... )
WH_1d( d, ec, lambda, criterion, method, q = 2, framework, y, wt, quiet = FALSE, ... )
d |
Vector of observed events, should have named elements. |
ec |
Vector of central exposure. The central exposure corresponds to the sum of the exposure duration over the insured population. An individual experiencing an event of interest during the year will no longer be exposed afterward and the exposure should be computed accordingly. |
lambda |
Smoothing parameter. If missing, an optimization procedure will
be used to find the optimal smoothing parameter. If supplied, no optimal
smoothing parameter search will take place unless the |
criterion |
Criterion to be used for the selection of the optimal
smoothing parameter. Default is |
method |
Method to be used to find the optimal smoothing parameter.
Default to |
q |
Order of penalization. Polynoms of degrees |
framework |
Default framework is |
y |
Optional vector of observations whose elements should be named. Used
only in the regression framework and if the |
wt |
Optional vector of weights. As for the observation vector |
quiet |
Should messages and warnings be silenced ? Default to |
... |
Additional parameters passed to the smoothing function called. |
An object of class WH_1d
i.e. a list containing :
d
The inputed vector of observed events (if supplied as input)
ec
The inputed vector of central exposure (if supplied as input)
y
The observation vector, either supplied or computed as y = log(d) - log(ec)
wt
The inputed vector of weights, either supplied or computed as d
y_hat
The vector of values fitted using Whittaker-Henderson smoothing
std_y_hat
The vector of standard deviation associated with the fit
res
The vector of deviance residuals associated with the fit
edf_obs
The vector of effective degrees of freedom associated with each observation
edf_par
The vector of effective degrees of freedom associated with each eigenvector
diagnosis
A data.frame with one line containing the sum of effective degrees of freedom
for the model, the deviance of the fit as well as the AIC, BIC, GCV and
REML criteria
Psi
The variance-covariance matrix associated with the fit, which is required
for the extrapolation step.
lambda
The smoothing parameter used.
p
The number of eigenvectors kept on each dimension if the rank reduction method
is used (it should not in the one-dimensional case).
q
The supplied order for the penalization matrix.
d <- portfolio_mort$d ec <- portfolio_mort$ec y <- log(d / ec) y[d == 0] <- - 20 wt <- d # Maximum likelihood WH_1d(d, ec, lambda = 1e2) WH_1d(d, ec) # default outer iteration method based on the optimize function WH_1d(d, ec, criterion = "GCV") # alternative optimization criterion for smoothing parameter selection # Regression WH_1d(y = y, wt = wt, lambda = 1e2) # regression framework is default when y is supplied WH_1d(d, ec, framework = "reg", lambda = 1e2) # setting framework = "reg" forces computation of y from d and ec
d <- portfolio_mort$d ec <- portfolio_mort$ec y <- log(d / ec) y[d == 0] <- - 20 wt <- d # Maximum likelihood WH_1d(d, ec, lambda = 1e2) WH_1d(d, ec) # default outer iteration method based on the optimize function WH_1d(d, ec, criterion = "GCV") # alternative optimization criterion for smoothing parameter selection # Regression WH_1d(y = y, wt = wt, lambda = 1e2) # regression framework is default when y is supplied WH_1d(d, ec, framework = "reg", lambda = 1e2) # setting framework = "reg" forces computation of y from d and ec
Main package function to apply Whittaker-Henderson smoothing in a
bidimensional survival analysis framework. It takes as input a matrix of
observed events and a matrix of associated central exposure, both depending
on two covariates, and build a smooth version of the log-hazard rate.
Smoothing parameters may be supplied or automatically chosen according to a
specific criterion such as "REML"
(the default), "AIC"
, "BIC"
or
"GCV"
. Whittaker-Henderson may be applied in a full maximum likelihood
framework or an approximate gaussian framework. As Whittaker-Henderson
smoothing relies on full-rank smoothers, computation time and memory usage in
the bidimensional case may prove overwhelming and the function integrates a
rank-reduction procedure to avoid such issues.
WH_2d( d, ec, lambda, criterion, method, max_dim = 200, p, q = c(2, 2), framework, y, wt, quiet = FALSE, ... )
WH_2d( d, ec, lambda, criterion, method, max_dim = 200, p, q = c(2, 2), framework, y, wt, quiet = FALSE, ... )
d |
Matrix of observed events, whose rows and columns must be named. |
ec |
Matrix of central exposure. The central exposure corresponds to the sum of the exposure duration over the insured population. An individual experiencing an event of interest during the year will no longer be exposed afterward and the exposure should be computed accordingly. |
lambda |
Smoothing parameter vector of size |
criterion |
Criterion to be used for the selection of the optimal
smoothing parameter. Default is |
method |
Method to be used to find the optimal smoothing parameter.
Default to |
max_dim |
Number of parameters to be kept in the optimization problem.
Default is |
p |
Optional vector of size |
q |
Order of penalization vector of size |
framework |
Default framework is |
y |
Optional matrix of observations whose rows and columns should be
named. Used only in the regression framework and if the |
wt |
Optional matrix of weights. As for the observation vector |
quiet |
Should messages and warnings be silenced ? Default to |
... |
Additional parameters passed to the smoothing function called. |
An object of class WH_2d
i.e. a list containing :
d
The inputed matrix of observed events (if supplied as input)
ec
The inputed matrix of central exposure (if supplied as input)
y
The observation matrix, either supplied or computed as y = log(d) - log(ec)
wt
The inputed matrix of weights, either supplied or set to d
y_hat
The matrix of values fitted using Whittaker-Henderson smoothing
std_y_hat
The matrix of standard deviations associated with the fit
res
The matrix of deviance residuals associated with the fit
edf_obs
The matrix of effective degrees of freedom associated with each observation
edf_par
The matrix of effective degrees of freedom associated with each eigenvector
diagnosis
A data.frame with one line containing the sum of effective degrees of freedom
for the model, the deviance of the fit as well as the AIC, BIC, GCV and
REML criteria
Psi
The variance-covariance matrix associated with the fit, which is required
for the extrapolation step.
lambda
The vector of smoothing parameters used.
p
The number of eigenvectors kept on each dimension if the rank reduction method
is used.
q
The supplied vector of orders for the penalization matrices.
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] y <- log(d / ec) # observation vector y[d == 0] <- - 20 wt <- d # Maximum likelihood WH_2d(d, ec, lambda = c(1e2, 1e2)) WH_2d(d, ec) # performance iteration default method WH_2d(d, ec, method = "outer") # slower but safer outer iteration method WH_2d(d, ec, criterion = "GCV") # alternative optimization criteria for smoothing parameter selection # Regression WH_2d(y = y, wt = wt, lambda = c(1e2, 1e2)) # regression framework is triggered when y is supplied WH_2d(d, ec, framework = "reg", lambda = c(1e2, 1e2)) # setting framework = "reg" forces computation of y from d and ec # Rank reduction keep_age <- which(rowSums(portfolio_LTC$ec) > 1e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e2) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] prod(dim(d)) # problem dimension is 627 ! WH_2d(d, ec) # rank-reduction is used to find an approximate solution using 200 parameters
keep_age <- which(rowSums(portfolio_LTC$ec) > 5e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e3) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] y <- log(d / ec) # observation vector y[d == 0] <- - 20 wt <- d # Maximum likelihood WH_2d(d, ec, lambda = c(1e2, 1e2)) WH_2d(d, ec) # performance iteration default method WH_2d(d, ec, method = "outer") # slower but safer outer iteration method WH_2d(d, ec, criterion = "GCV") # alternative optimization criteria for smoothing parameter selection # Regression WH_2d(y = y, wt = wt, lambda = c(1e2, 1e2)) # regression framework is triggered when y is supplied WH_2d(d, ec, framework = "reg", lambda = c(1e2, 1e2)) # setting framework = "reg" forces computation of y from d and ec # Rank reduction keep_age <- which(rowSums(portfolio_LTC$ec) > 1e2) keep_duration <- which(colSums(portfolio_LTC$ec) > 1e2) d <- portfolio_LTC$d[keep_age, keep_duration] ec <- portfolio_LTC$ec[keep_age, keep_duration] prod(dim(d)) # problem dimension is 627 ! WH_2d(d, ec) # rank-reduction is used to find an approximate solution using 200 parameters