Subsampling
Functions
This package provides a series of methods for partitioning time-series data based on jackknifing and bootstrapping.
Jackknife
The first group of algorithms includes the following generalisations of the jackknife for dependent data:
- the block jackknife (Kunsch, 1989);
- the artificial delete-$d$ jackknife (Pellegrino, 2022).
MessyTimeSeries.block_jackknife — Function
block_jackknife(Y::Union{FloatMatrix, JMatrix{Float64}}, subsample::Float64)
Generate block jackknife (Kunsch, 1989) samples. This implementation is described in Pellegrino (2022).
This technique subsamples a time series dataset by removing, in turn, all the blocks of consecutive observations with a given size.
Arguments
- `Y`: Observed measurements (`n x T`), where `n` and `T` are the number of series and observations.
- `subsample`: Block size as a percentage of the number of observed periods. It is bounded between 0 and 1.
References
Kunsch (1989) and Pellegrino (2022).
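The block-removal step can be sketched in plain Julia. This is an illustrative sketch only (`block_jackknife_sketch` is a hypothetical helper, not the package's implementation): each sample "deletes" one block of consecutive observations by replacing it with `NaN`.

```julia
# Illustrative sketch of a block jackknife (not the MessyTimeSeries code).
# Each of the T - block_size + 1 samples removes one block of consecutive
# observations by overwriting it with NaN.
function block_jackknife_sketch(Y::Matrix{Float64}, subsample::Float64)
    n, T = size(Y)
    block_size = ceil(Int, subsample * T)      # observations removed per sample
    n_samples = T - block_size + 1             # one sample per full block
    samples = Array{Float64}(undef, n, T, n_samples)
    for i in 1:n_samples
        samples[:, :, i] = Y
        samples[:, i:i+block_size-1, i] .= NaN # "delete" the i-th block
    end
    return samples
end

S = block_jackknife_sketch(randn(2, 20), 0.2)  # block size 4 → 17 samples
```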
MessyTimeSeries.artificial_jackknife — Function
artificial_jackknife(Y::Union{FloatMatrix, JMatrix{Float64}}, subsample::Float64, max_samples::Int64, seed::Int64=1)
Generate artificial jackknife samples as in Pellegrino (2022).
The artificial delete-$d$ jackknife is an extension of the delete-$d$ jackknife for dependent data problems.
- This technique replaces the actual data removal step with a fictitious deletion, which consists of imposing $d$-dimensional (artificial) patterns of missing observations on the data.
- This approach neither alters the data order nor destroys the correlation structure.
Arguments
- `Y`: Observed measurements (`n x T`), where `n` and `T` are the number of series and observations.
- `subsample`: $d$ as a percentage of the original sample size. It is bounded between 0 and 1.
- `max_samples`: If $\binom{nT}{d}$ is too large, `artificial_jackknife` generates `max_samples` jackknife samples.
- `seed`: Random seed (default: 1).
References
Pellegrino (2022).
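The fictitious-deletion idea can be sketched as follows. This is a simplified illustration (`artificial_jackknife_sketch` is a hypothetical helper, not the package's implementation): each sample imposes a random $d$-dimensional pattern of missing observations, here encoded as `NaN`, without reordering the data.

```julia
using Random

# Illustrative sketch of the artificial delete-d jackknife (not the
# MessyTimeSeries code). Each sample overwrites d randomly chosen entries
# of Y with NaN, leaving the data order and correlation structure intact.
function artificial_jackknife_sketch(Y::Matrix{Float64}, subsample::Float64,
                                     max_samples::Int, seed::Int=1)
    rng = MersenneTwister(seed)
    n, T = size(Y)
    d = ceil(Int, subsample * n * T)          # entries "deleted" per sample
    samples = Array{Float64}(undef, n, T, max_samples)
    for i in 1:max_samples
        s = view(samples, :, :, i)
        s .= Y
        pattern = randperm(rng, n * T)[1:d]   # random pattern, no replacement
        s[pattern] .= NaN                     # fictitious deletion
    end
    return samples
end

S = artificial_jackknife_sketch(randn(3, 10), 0.1, 5)  # d = 3, 5 samples
```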
MessyTimeSeries.optimal_d — Function
optimal_d(n::Int64, T::Int64)
Select the optimal value for $d$. See `?artificial_jackknife` for more details on $d$.
Arguments
- `n`: Number of series.
- `T`: Number of observations.
Bootstrap
The second group includes the following bootstrap variants compatible with time series:
- the moving block bootstrap (Kunsch, 1989; Liu and Singh, 1992);
- the stationary block bootstrap (Politis and Romano, 1994).
MessyTimeSeries.moving_block_bootstrap — Function
moving_block_bootstrap(Y::Union{FloatMatrix, JMatrix{Float64}}, subsample::Float64, samples::Int64, seed::Int64=1)
Generate moving block bootstrap samples.
The moving block bootstrap randomly subsamples a time series into ordered and overlapped blocks of consecutive observations.
Arguments
- `Y`: Observed measurements (`n x T`), where `n` and `T` are the number of series and observations.
- `subsample`: Block size as a percentage of the number of observed periods. It is bounded between 0 and 1.
- `samples`: Number of bootstrap samples.
- `seed`: Random seed (default: 1).
References
Kunsch (1989) and Liu and Singh (1992).
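The resampling scheme can be sketched as follows. This is an illustrative sketch (`moving_block_bootstrap_sketch` is a hypothetical helper, not the package's implementation): each bootstrap sample concatenates randomly drawn overlapping blocks of consecutive observations until the original length is reached.

```julia
using Random

# Illustrative sketch of the moving block bootstrap (not the
# MessyTimeSeries code). Overlapping blocks of consecutive columns are
# drawn with replacement and concatenated, then trimmed to length T.
function moving_block_bootstrap_sketch(Y::Matrix{Float64}, subsample::Float64,
                                       samples::Int, seed::Int=1)
    rng = MersenneTwister(seed)
    n, T = size(Y)
    b = ceil(Int, subsample * T)              # block size
    n_blocks = ceil(Int, T / b)               # blocks needed to cover T periods
    out = Array{Float64}(undef, n, T, samples)
    for s in 1:samples
        cols = Int[]
        for _ in 1:n_blocks
            start = rand(rng, 1:T-b+1)        # random overlapping block start
            append!(cols, start:start+b-1)
        end
        out[:, :, s] = Y[:, cols[1:T]]        # trim to the original length
    end
    return out
end
```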
MessyTimeSeries.stationary_block_bootstrap — Function
stationary_block_bootstrap(Y::Union{FloatMatrix, JMatrix{Float64}}, subsample::Float64, samples::Int64, seed::Int64=1)
Generate stationary block bootstrap samples.
The stationary bootstrap is similar to the block bootstrap proposed independently in Kunsch (1989) and Liu and Singh (1992).
There are two main differences:
- The blocks have random length.
- In order to achieve stationarity, the stationary (block) bootstrap "wraps" the data around in a "circle" so that the first observation follows the last.

Note: Block size is exponentially distributed with mean `Int64(ceil(subsample*T))`.
Arguments
- `Y`: Observed measurements (`n x T`), where `n` and `T` are the number of series and observations.
- `subsample`: Block size as a percentage of the number of observed periods. It is bounded between 0 and 1.
- `samples`: Number of bootstrap samples.
- `seed`: Random seed (default: 1).
References
Politis and Romano (1994).
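The two differences above can be sketched in plain Julia. This is an illustrative sketch (`stationary_block_bootstrap_sketch` is a hypothetical helper, not the package's implementation); it draws block lengths from a geometric law, the discrete analogue of the exponential distribution with the stated mean, and wraps the data around the circle.

```julia
using Random

# Illustrative sketch of the stationary block bootstrap (not the
# MessyTimeSeries code). A new block starts with probability p, so block
# lengths are geometric with mean 1/p; indices wrap around via mod1.
function stationary_block_bootstrap_sketch(Y::Matrix{Float64}, subsample::Float64,
                                           samples::Int, seed::Int=1)
    rng = MersenneTwister(seed)
    n, T = size(Y)
    avg_block = ceil(Int, subsample * T)      # mean block length
    p = 1 / avg_block                         # prob. of starting a new block
    out = Array{Float64}(undef, n, T, samples)
    for s in 1:samples
        j = rand(rng, 1:T)                    # random starting observation
        for t in 1:T
            out[:, t, s] = Y[:, j]
            if rand(rng) < p
                j = rand(rng, 1:T)            # start a new block
            else
                j = mod1(j + 1, T)            # first observation follows the last
            end
        end
    end
    return out
end
```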