API reference¶
All public symbols from purgedcv.__all__, auto-rendered from the source
docstrings. The constructors of the splitters share a single set of
keyword arguments (prediction_times, evaluation_times,
purge_horizon, embargo, groups); see
BaseTemporalSplitter for the shared
contract.
Splitters¶
purgedcv.BaseTemporalSplitter ¶
Bases: ABC
Duck-typed sklearn CV splitter with purge + embargo orchestration.
Concrete subclasses implement :meth:_iter_test_indices to yield the
raw test-index arrays for each fold. The base class handles purge,
embargo, optional group-disjointness, and the sklearn-compatible
:meth:split / :meth:get_n_splits protocol.
Times are bound to the splitter at construction. This couples the splitter to a specific dataset's timestamps, which is intentional: a splitter for one dataset is rarely meaningful for another.
.. note::
The subclassing interface (_iter_test_indices,
_candidate_train_idx) is not yet covered by the v0.3 stability
contract. Subclasses may need adjustments through v1.0.
get_n_splits abstractmethod ¶
Return the total number of splits the iterator will yield.
split ¶
split(X: NDArrayAny | DataFrame, y: object = None, groups: object = None) -> Iterator[tuple[NDArrayAny, NDArrayAny]]
Yield (train_idx, test_idx) pairs for each fold.
The y and groups parameters of this method are accepted for
sklearn protocol compatibility but ignored — group information must
be bound at construction via the groups argument of __init__.
When groups were bound at construction,
:func:~purgedcv.diagnostics.assert_groups_disjoint is called on
every fold after purge and embargo; a
:class:~purgedcv.exceptions.GroupLeakageError is raised if any
group identifier appears in both train and test of the same fold.
with_times ¶
Return a copy of this splitter with new times bound. All other
parameters (n_splits, purge_horizon, embargo, groups,
and any subclass-specific state such as a cached unique-group list)
are preserved unchanged.
To change groups or any other construction parameter, build a
fresh splitter via the constructor — this avoids surprising
interactions between cached state and rebound inputs.
purgedcv.WalkForwardSplit ¶
Bases: BaseTemporalSplitter
Walk-forward CV with sliding or expanding training window.
For window="expanding" and zero purge/embargo, this matches
:class:sklearn.model_selection.TimeSeriesSplit. For
window="sliding", the training window has a fixed maximum length
(the most recent train_size samples before the test fold).
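The fold layout described above can be sketched in plain Python. This is an illustrative reading of the description, not the library's code: the helper name `walk_forward_indices` is hypothetical, purge and embargo are left out, and the sliding window is switched on by passing `train_size` rather than through a `window` argument.

```python
def walk_forward_indices(n_samples, n_splits, test_size, train_size=None):
    """Yield (train, test) index lists; test folds tile the END of the data.

    Hypothetical helper: no purge or embargo is applied here.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        test = list(range(test_start, test_start + test_size))
        # Expanding window: every row strictly before the test fold.
        train = list(range(0, test_start))
        if train_size is not None:
            # Sliding window: keep only the most recent train_size rows.
            train = train[-train_size:]
        yield train, test


expanding = list(walk_forward_indices(10, n_splits=3, test_size=2))
sliding = list(walk_forward_indices(10, n_splits=3, test_size=2, train_size=3))
```

With 10 rows, 3 folds and `test_size=2`, the test folds start at rows 4, 6 and 8, which is the same layout `TimeSeriesSplit(n_splits=3, test_size=2)` produces.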
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv import WalkForwardSplit
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=10, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> cv = WalkForwardSplit(
... n_splits=3, test_size=2,
... prediction_times=pred, evaluation_times=evalu,
... )
>>> cv.get_n_splits()
3
__init__ ¶
__init__(n_splits: int, test_size: int, *, train_size: int | None = None, window: WindowMode = 'expanding', prediction_times: Series, evaluation_times: Series, purge_horizon: HorizonLike | None = None, embargo: HorizonLike | None = None) -> None
Configure a walk-forward CV splitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_splits | int | Number of folds to yield. Test folds tile the END of the dataset; each fold trains only on data strictly before its test indices. | required |
| test_size | int | Number of consecutive rows in each test fold. | required |
| train_size | int \| None | Maximum number of training rows per fold AFTER purge and embargo. Counts kept rows, not raw indices, so with a 2-day embargo and | None |
| window | WindowMode | | 'expanding' |
| prediction_times | Series | Per-sample prediction times for the dataset. Bound at construction so :meth: | required |
| evaluation_times | Series | Per-sample evaluation times. Required to apply purge and embargo correctly. | required |
| purge_horizon | HorizonLike \| None | Symmetric padding around the test fold's label window; training rows whose label horizon overlaps the padded test horizon are dropped. | None |
| embargo | HorizonLike \| None | Post-test embargo duration; training rows whose prediction time falls in the closed window | None |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
split ¶
split(X: NDArrayAny | DataFrame, y: object = None, groups: object = None) -> Iterator[tuple[NDArrayAny, NDArrayAny]]
Yield (train_idx, test_idx) pairs for each walk-forward fold.
The base class handles X length validation, purge, and embargo;
this override additionally trims each training set to the most
recent train_size rows when window="sliding". Training
indices are always strictly less than the minimum index of the
test fold (the defining walk-forward property).
The number of pairs yielded equals self.n_splits. The y
and groups arguments are accepted for sklearn protocol
compatibility but ignored.
purgedcv.PurgedKFold ¶
Bases: BaseTemporalSplitter
K-fold CV with contiguous test folds and purge + embargo applied.
Test folds tile the index space contiguously: fold k holds rows
[start_k, start_k + size_k) where size_k differs across folds
by at most one due to integer division. Train = complement of the
test fold, with D2 purge and D3 embargo applied by the base class.
For zero purge_horizon and embargo, the test fold structure
is identical to :class:sklearn.model_selection.KFold(shuffle=False)
and the full complement of indices serves as training data per fold.
The first n_samples % n_splits folds receive
n_samples // n_splits + 1 rows; the remaining folds receive
n_samples // n_splits. When n_splits > n_samples, the
excess folds are empty — the base class purge and embargo handle
empty arrays gracefully.
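The fold-size rule above can be written out as a short sketch. The helper name `contiguous_fold_bounds` is hypothetical and only illustrates the arithmetic stated in the text; it is not taken from the library.

```python
def contiguous_fold_bounds(n_samples, n_splits):
    """Return half-open (start, stop) row ranges for each test fold.

    The first n_samples % n_splits folds get one extra row.
    """
    base, extra = divmod(n_samples, n_splits)
    bounds = []
    start = 0
    for k in range(n_splits):
        size = base + 1 if k < extra else base
        bounds.append((start, start + size))
        start += size
    return bounds


bounds_10_3 = contiguous_fold_bounds(10, 3)   # sizes 4, 3, 3
bounds_2_3 = contiguous_fold_bounds(2, 3)     # last fold is empty
```

When `n_splits > n_samples`, the trailing folds come out as empty ranges such as `(2, 2)`, matching the empty-fold case described above.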
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv import PurgedKFold
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=20, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> cv = PurgedKFold(
... n_splits=5,
... prediction_times=pred, evaluation_times=evalu,
... )
>>> cv.get_n_splits()
5
__init__ ¶
__init__(n_splits: int, *, prediction_times: Series, evaluation_times: Series, purge_horizon: HorizonLike | None = None, embargo: HorizonLike | None = None) -> None
Configure a purged k-fold splitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_splits | int | Number of folds. Must be at least 2. | required |
| prediction_times | Series | Per-sample prediction times. Bound at construction so :meth: | required |
| evaluation_times | Series | Per-sample evaluation times. | required |
| purge_horizon | HorizonLike \| None | Symmetric padding around the test fold's label window; training rows whose label horizon overlaps the padded test horizon are dropped. | None |
| embargo | HorizonLike \| None | Post-test embargo duration; training rows whose prediction time falls in any closed window | None |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
purgedcv.PurgedGroupKFold ¶
Bases: BaseTemporalSplitter
Group-aware k-fold variant of :class:PurgedKFold.
Each test fold consists of all rows from a contiguous block of unique
group identifiers, so no group_id ever appears in both train and
test of the same fold. Purge and embargo still apply ACROSS group
boundaries — training rows from other groups whose horizons overlap
the test window are dropped by D2/D3 in the base class.
Groups are assigned to fold blocks in first-appearance order within
the groups Series (the order returned by pd.Series.unique()).
For temporal coherence — patients enrolled in chronological order,
assets in time-of-IPO order — ensure the groups Series is sorted
by first occurrence time.
The base class's :func:assert_groups_disjoint enforcement runs on
every fold automatically because groups is bound at construction.
Useful in clinical ML (no patient appears in both sides of a fold), asset CV (no symbol appears in both sides), and any setting where grouped observations must not leak across the train/test boundary.
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv import PurgedGroupKFold
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=12, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> groups = pd.Series(np.repeat([0, 1, 2, 3], 3))
>>> cv = PurgedGroupKFold(
... n_splits=2,
... prediction_times=pred, evaluation_times=evalu,
... groups=groups,
... )
>>> cv.get_n_splits()
2
__init__ ¶
__init__(n_splits: int, *, prediction_times: Series, evaluation_times: Series, groups: Series, purge_horizon: HorizonLike | None = None, embargo: HorizonLike | None = None) -> None
Configure a group-aware purged k-fold splitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_splits | int | Number of folds. Must be at least 2 and must not exceed the number of unique group identifiers. | required |
| prediction_times | Series | Per-sample prediction times. | required |
| evaluation_times | Series | Per-sample evaluation times. | required |
| groups | Series | Group identifier per sample. Must have the same length as | required |
| purge_horizon | HorizonLike \| None | Symmetric padding around the test fold's label window; cross-group training rows whose label horizon overlaps the padded test horizon are dropped. | None |
| embargo | HorizonLike \| None | Post-test embargo duration. | None |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
purgedcv.CombinatorialPurgedCV ¶
Bases: BaseTemporalSplitter
Combinatorial Purged Cross-Validation (fold enumeration).
Partitions the time-ordered samples into n_splits contiguous group
blocks. For each combination of n_test_groups chosen from those
blocks, yields one fold whose test indices are the union of the
chosen blocks. Total folds: C(n_splits, n_test_groups).
Each group block appears as test in exactly C(n_splits - 1,
n_test_groups - 1) folds.
The base class applies D2 purge and D3 embargo to each fold's train
set. :meth:backtest_paths then assembles the C(N,K) folds into
n_paths time-ordered out-of-sample sequences.
See Advances in Financial Machine Learning (Lopez de Prado, Wiley 2018), chapter 12 section 12.4, for the original method.
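The fold counts quoted above can be checked with the standard library alone. This sketch only enumerates block combinations; it assumes folds are enumerated in `itertools.combinations` order and does not touch the library.

```python
from itertools import combinations
from math import comb

n_splits, n_test_groups = 6, 2

# One fold per combination of test blocks.
folds = list(combinations(range(n_splits), n_test_groups))

# How many folds use each block as a test block.
appearances = [sum(g in fold for fold in folds) for g in range(n_splits)]

n_folds = len(folds)                                # C(6, 2)
per_block = comb(n_splits - 1, n_test_groups - 1)   # C(5, 1)
```

For `n_splits=6` and `n_test_groups=2` this gives 15 folds, with every block tested in 5 of them.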
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv import CombinatorialPurgedCV
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=24, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> cv = CombinatorialPurgedCV(
... n_splits=6, n_test_groups=2,
... prediction_times=pred, evaluation_times=evalu,
... )
>>> cv.get_n_splits()
15
__init__ ¶
__init__(n_splits: int, n_test_groups: int, *, prediction_times: Series, evaluation_times: Series, purge_horizon: HorizonLike | None = None, embargo: HorizonLike | None = None) -> None
Configure a Combinatorial Purged CV splitter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_splits | int | Number of contiguous group blocks to partition the samples into. Must be at least 2. | required |
| n_test_groups | int | Number of group blocks chosen as the test set in each fold. Must be in | required |
| prediction_times | Series | Per-sample prediction times. | required |
| evaluation_times | Series | Per-sample evaluation times. | required |
| purge_horizon | HorizonLike \| None | Symmetric padding around the test fold's label window; training rows whose label horizon overlaps the padded test horizon are dropped. | None |
| embargo | HorizonLike \| None | Post-test embargo duration; training rows whose prediction time falls in any closed window | None |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
backtest_paths ¶
Fit estimator on each fold and reconstruct the C(N-1, K-1)
out-of-sample backtest paths.
For each of the C(N, K) folds:
- Clone the estimator (so per-fold fits do not contaminate each other or the original).
- Fit on the fold's training set (after purge + embargo).
- Predict on the fold's test set.
- If the fold has no training rows under an unusually aggressive purge/embargo configuration, the predictions for that fold are NaN.
The per-fold predictions are then handed to :func:reconstruct_paths,
which assembles them into an (n_paths, n_samples) matrix where
each row is a complete time-ordered out-of-sample prediction
sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| estimator | object | A scikit-learn estimator with | required |
| X | NDArrayAny \| DataFrame | Feature matrix of shape | required |
| y | NDArrayAny \| Series | Target vector of shape | required |
Returns:
| Type | Description |
|---|---|
| NDArrayAny | |
| NDArrayAny | with |
| NDArrayAny | Affected rows contain NaN when an upstream fold could not be fit. |
Raises:
| Type | Description |
|---|---|
| AttributeError or TypeError | if |
Examples:
>>> import warnings
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.dummy import DummyRegressor
>>> from sklearn.exceptions import FitFailedWarning
>>> from purgedcv import CombinatorialPurgedCV
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=16, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> cv = CombinatorialPurgedCV(
... n_splits=4, n_test_groups=2,
... prediction_times=pred, evaluation_times=evalu,
... )
>>> X = np.arange(16).reshape(-1, 1).astype(float)
>>> y = np.arange(16).astype(float)
>>> with warnings.catch_warnings():
... warnings.simplefilter("ignore", FitFailedWarning)
... paths = cv.backtest_paths(DummyRegressor(strategy="mean"), X, y)
>>> paths.shape
(3, 16)
Backtest paths¶
purgedcv.reconstruct_paths ¶
reconstruct_paths(fold_predictions: Sequence[NDArrayAny], fold_test_indices: Sequence[NDArrayAny], n_splits: int, n_test_groups: int, n_samples: int) -> NDArrayAny
Assemble Combinatorial Purged CV fold predictions into backtest paths.
Given the predictions and test indices for all C(N, K) folds produced by
:class:CombinatorialPurgedCV, returns a (n_paths, n_samples) array
where each row is a complete time-ordered out-of-sample prediction
sequence built from a different combination of the folds.
n_paths = C(n_splits - 1, n_test_groups - 1). Each sample is
predicted in every path (no missing entries). The assignment of fold
to path for each group is the canonical greedy positional rule: for
group g, the k-th fold (in fold-iteration order) that contains
g as a test group contributes g's predictions to path k.
NaN predictions in any fold propagate only to the paths that use that fold for the affected group(s).
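The greedy positional rule can be sketched as a small lookup table. This is one reading of the rule as stated above, not the library's implementation: the helper name `assign_folds_to_paths` is hypothetical, and fold-iteration order is assumed to match `itertools.combinations`.

```python
from itertools import combinations
from math import comb


def assign_folds_to_paths(n_splits, n_test_groups):
    """Return table[path][group] = index of the fold supplying that group."""
    n_paths = comb(n_splits - 1, n_test_groups - 1)
    table = [[None] * n_splits for _ in range(n_paths)]
    seen = [0] * n_splits  # number of folds that have tested each group so far
    fold_combos = combinations(range(n_splits), n_test_groups)
    for fold_idx, combo in enumerate(fold_combos):
        for g in combo:
            # The k-th fold containing group g feeds path k.
            table[seen[g]][g] = fold_idx
            seen[g] += 1
    return table


table = assign_folds_to_paths(4, 2)
```

For 4 blocks and 2 test groups, path 0 takes group 0 and group 1 from fold 0, group 2 from fold 1, and group 3 from fold 2. Every cell is filled, which is why no path has missing entries.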
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fold_predictions | Sequence[NDArrayAny] | One array per fold; | required |
| fold_test_indices | Sequence[NDArrayAny] | One array per fold; the per-fold test_idx as yielded by :meth: | required |
| n_splits | int | Number of CPCV group blocks. | required |
| n_test_groups | int | Number of test groups per fold. | required |
| n_samples | int | Total number of samples in the dataset. | required |
Returns:
| Type | Description |
|---|---|
| NDArrayAny | |
Raises:
| Type | Description |
|---|---|
| ValueError | on fold count or fold-prediction length mismatch. |
Examples:
>>> import numpy as np
>>> from purgedcv import reconstruct_paths
>>> # Synthetic output for the C(4, 2) = 6 folds: each fold's test set is
>>> # two 4-sample groups (8 samples), with predictions equal to the fold index.
>>> fold_test = [
... np.array([0, 1, 2, 3, 4, 5, 6, 7]), # combo (0, 1)
... np.array([0, 1, 2, 3, 8, 9, 10, 11]), # combo (0, 2)
... np.array([0, 1, 2, 3, 12, 13, 14, 15]), # combo (0, 3)
... np.array([4, 5, 6, 7, 8, 9, 10, 11]), # combo (1, 2)
... np.array([4, 5, 6, 7, 12, 13, 14, 15]), # combo (1, 3)
... np.array([8, 9, 10, 11, 12, 13, 14, 15]), # combo (2, 3)
... ]
>>> fold_preds = [
... np.full(len(ti), float(f)) for f, ti in enumerate(fold_test)
... ]
>>> paths = reconstruct_paths(fold_preds, fold_test, 4, 2, 16)
>>> paths.shape
(3, 16)
Row-level primitives¶
purgedcv.purge ¶
purge(train_idx: NDArrayAny, test_idx: NDArrayAny, prediction_times: Series, evaluation_times: Series, purge_horizon: Timedelta | None = None) -> NDArrayAny
Drop training rows whose half-open label horizon overlaps any test horizon.
Each test row contributes a half-open horizon
[prediction_time - purge_horizon, evaluation_time + purge_horizon).
The intervals are merged before filtering, so disjoint test blocks purge
only their local overlap zones rather than the full convex hull between
them.
A training row at position i is kept iff
its label horizon overlaps none of those merged test horizons.
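The merge-then-filter logic can be illustrated with integer "day numbers" in place of timestamps. This is a minimal sketch under that assumption: `merge_intervals` and `purge_sketch` are hypothetical names, and the library operates on pandas Series and index arrays rather than Python lists.

```python
def merge_intervals(intervals):
    """Merge half-open [start, end) intervals that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def purge_sketch(train_idx, test_idx, pred, evalu, pad=0):
    """Keep training rows whose [pred, evalu) horizon misses every test horizon."""
    test_horizons = merge_intervals(
        [(pred[i] - pad, evalu[i] + pad) for i in test_idx]
    )
    kept = []
    for i in train_idx:
        overlaps = any(
            pred[i] < end and start < evalu[i] for start, end in test_horizons
        )
        if not overlaps:
            kept.append(i)
    return kept


pred = list(range(10))            # day numbers 0..9
evalu = [t + 3 for t in pred]     # each label spans 3 days
kept = purge_sketch([0, 1, 2, 3, 4, 8, 9], [5, 6, 7], pred, evalu)
```

Row 2 survives because its horizon `[2, 5)` only touches the merged test horizon `[5, 10)`; rows 3, 4, 8 and 9 overlap it and are dropped.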
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_idx | NDArrayAny | positional indices of training rows. | required |
| test_idx | NDArrayAny | positional indices of test rows. | required |
| prediction_times | Series | prediction times for all rows. | required |
| evaluation_times | Series | evaluation times for all rows. | required |
| purge_horizon | Timedelta \| None | optional symmetric padding ( | None |
Returns:
| Type | Description |
|---|---|
| NDArrayAny | The subset of |
| NDArrayAny | window. Input ordering and dtype are preserved. |
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv import purge
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=10, freq="D"))
>>> evalu = pred + pd.Timedelta(days=3)
>>> train_idx = np.array([0, 1, 2, 3, 4, 8, 9])
>>> test_idx = np.array([5, 6, 7])
>>> purge(train_idx, test_idx, pred, evalu)
array([0, 1, 2])
purgedcv.apply_embargo ¶
apply_embargo(train_idx: NDArrayAny, test_idx: NDArrayAny, prediction_times: Series, evaluation_times: Series, embargo: Timedelta) -> NDArrayAny
Drop training rows whose prediction_time falls inside any closed
embargo window [test_evaluation_time, test_evaluation_time + embargo].
Embargo is asymmetric: rows whose prediction_time is strictly before
all test evaluation times are never dropped. embargo == 0 is the
identity (the embargo window is logically empty at zero width), avoiding
degenerate single-point windows.
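The closed-window rule can be sketched with integer day numbers standing in for timestamps. `embargo_sketch` is a hypothetical helper written from the description above, not the library's implementation.

```python
def embargo_sketch(train_idx, test_idx, pred, evalu, embargo):
    """Drop training rows whose prediction time lies in any closed window
    [test_evaluation_time, test_evaluation_time + embargo]."""
    if embargo == 0:
        # Zero-width embargo is treated as the identity.
        return list(train_idx)
    windows = [(evalu[i], evalu[i] + embargo) for i in test_idx]
    return [
        i for i in train_idx
        if not any(lo <= pred[i] <= hi for lo, hi in windows)
    ]


pred = list(range(20))            # day numbers 0..19
evalu = [t + 1 for t in pred]     # labels resolve one day later
kept = embargo_sketch([11, 12, 13, 14], range(5, 10), pred, evalu, embargo=1)
```

Row 11 is dropped because the last test row evaluates on day 10 and the closed window `[10, 11]` includes day 11; rows before the test fold are never affected.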
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_idx | NDArrayAny | positional indices of training rows. | required |
| test_idx | NDArrayAny | positional indices of test rows. | required |
| prediction_times | Series | prediction times for all rows. | required |
| evaluation_times | Series | evaluation times for all rows. | required |
| embargo | Timedelta | post-test embargo duration. | required |
Returns:
| Type | Description |
|---|---|
| NDArrayAny | The subset of |
| NDArrayAny | post-test embargo window. Input ordering and dtype are preserved. |
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv import apply_embargo
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=20, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> train_idx = np.array([11, 12, 13, 14])
>>> test_idx = np.arange(5, 10)
>>> apply_embargo(train_idx, test_idx, pred, evalu, pd.Timedelta(days=1))
array([12, 13, 14])
Time and horizon utilities¶
purgedcv.parse_horizon ¶
Coerce a horizon-like input to a non-negative pd.Timedelta.
Accepts pandas offset strings ("2D", "6h", "30min"),
pd.Timedelta, datetime.timedelta, and numpy.timedelta64.
Rejects negative durations and calendar-ambiguous offsets such as
"M" (month) or "Y" (year), which do not represent a fixed
duration in seconds.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | HorizonLike | The horizon to parse. | required |
Returns:
| Type | Description |
|---|---|
| Timedelta | A non-negative |
Raises:
| Type | Description |
|---|---|
| ValueError | if the input is negative or a calendar-ambiguous string. |
| TypeError | if the input is not one of the supported types. |
Examples:
purgedcv.horizons_overlap ¶
horizons_overlap(a_start: Timestamp, a_end: Timestamp, b_start: Timestamp, b_end: Timestamp) -> bool
Return True iff half-open intervals [a_start, a_end) and
[b_start, b_end) overlap.
Touching intervals (a_end == b_start) do NOT overlap. The function
is symmetric in its arguments.
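The half-open overlap test reduces to two strict comparisons. The sketch below uses `datetime` values and a hypothetical name, `overlap_sketch`; it illustrates the predicate as described and is not the library's source.

```python
from datetime import datetime


def overlap_sketch(a_start, a_end, b_start, b_end):
    """True iff [a_start, a_end) and [b_start, b_end) share at least one instant."""
    # Strict inequalities make touching intervals (a_end == b_start) disjoint.
    return a_start < b_end and b_start < a_end


d1, d2, d3, d4 = (datetime(2024, 1, day) for day in (1, 2, 3, 4))
overlapping = overlap_sketch(d1, d3, d2, d4)
touching = overlap_sketch(d1, d2, d2, d3)
```

Swapping the two intervals leaves the result unchanged, which is the symmetry noted above.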
Examples:
>>> import pandas as pd
>>> from purgedcv import horizons_overlap
>>> horizons_overlap(
... pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-03"),
... pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-04"),
... )
True
>>> horizons_overlap(
... pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-02"),
... pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-03"),
... )
False
purgedcv.validate_times ¶
validate_times(prediction_times: Series, evaluation_times: Series, *, require_monotonic: bool = True) -> None
Validate that prediction_times and evaluation_times are well-formed.
Raises:
| Type | Description |
|---|---|
| ValueError | on length mismatch, NaT values, |
Examples:
Statistical metrics¶
purgedcv.probabilistic_sharpe_ratio ¶
Probability that the true Sharpe ratio exceeds benchmark_skill.
Formula (Bailey & Lopez de Prado 2012, Eq. 7):
.. math:: \text{PSR}(\text{SR}^\ast) = \Phi\!\left( \frac{(\widehat{\text{SR}} - \text{SR}^\ast)\sqrt{n - 1}} {\sqrt{1 - \widehat{\gamma}_3\,\widehat{\text{SR}} + \frac{\widehat{\gamma}_4 - 1}{4}\,\widehat{\text{SR}}^{\,2}}} \right)
where :math:\widehat{\gamma}_3 is sample skew, :math:\widehat{\gamma}_4
is sample kurtosis (NOT excess kurtosis), and :math:\Phi is the
standard normal CDF.
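The formula can be evaluated directly from summary statistics. This is a minimal sketch with a hypothetical function, `psr_sketch`, that takes the sample Sharpe ratio, skew and (non-excess) kurtosis as inputs; it sidesteps how the library estimates those moments from a returns array, which is not shown here.

```python
from math import sqrt
from statistics import NormalDist


def psr_sketch(sr_hat, sr_star, n, skew, kurtosis):
    """Evaluate the PSR formula for given sample statistics.

    kurtosis is the raw (non-excess) kurtosis, so a normal sample has 3.0.
    """
    denom = sqrt(1.0 - skew * sr_hat + (kurtosis - 1.0) / 4.0 * sr_hat ** 2)
    z = (sr_hat - sr_star) * sqrt(n - 1) / denom
    return NormalDist().cdf(z)


# 101 observations, per-period Sharpe 0.1, normal-like moments, benchmark 0.
p = psr_sketch(0.1, 0.0, n=101, skew=0.0, kurtosis=3.0)
```

When the observed Sharpe ratio equals the benchmark, the argument of the CDF is zero and the result is exactly 0.5.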
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| returns | NDArrayAny | 1-D array of returns, length >= 2, no NaN, non-zero variance. | required |
| benchmark_skill | float | The Sharpe-ratio threshold to test against. Use 0 for "is this strategy better than holding cash." | required |
Returns:
| Type | Description |
|---|---|
| float | Scalar probability in [0, 1]. |
Raises:
| Type | Description |
|---|---|
| ValueError | on length < 2, NaN, or zero variance. |
Examples:
purgedcv.deflated_sharpe_ratio ¶
Probability that the true Sharpe ratio exceeds the deflated
benchmark that accounts for n_trials independent hyperparameter
searches under the null.
Formula (Bailey & Lopez de Prado 2014):
.. math:: \text{SR}^\ast_{n} = \sqrt{V[\text{SR}]} \left[ (1 - \gamma) \Phi^{-1}\!\left(1 - \frac{1}{n_{\text{trials}}}\right) + \gamma \Phi^{-1}\!\left(1 - \frac{1}{n_{\text{trials}} \cdot e}\right) \right]
where :math:\gamma \approx 0.5772 is the Euler-Mascheroni constant.
DSR is then :func:probabilistic_sharpe_ratio evaluated at the
deflated benchmark :math:\text{SR}^\ast_n.
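The deflated benchmark itself is a one-line computation. The sketch below uses a hypothetical helper, `deflated_benchmark_sketch`, and only covers `n_trials >= 2`, because the inverse normal CDF is undefined at 0; how the library treats `n_trials == 1` is not shown here.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def deflated_benchmark_sketch(var_sharpe, n_trials):
    """Expected maximum Sharpe ratio under the null across n_trials."""
    if n_trials < 2:
        # Assumption of this sketch only: inv_cdf(0) is undefined.
        raise ValueError("this sketch needs n_trials >= 2")
    if var_sharpe < 0:
        raise ValueError("var_sharpe must be non-negative")
    inv = NormalDist().inv_cdf
    return sqrt(var_sharpe) * (
        (1 - EULER_MASCHERONI) * inv(1 - 1 / n_trials)
        + EULER_MASCHERONI * inv(1 - 1 / (n_trials * e))
    )


benchmark_10 = deflated_benchmark_sketch(var_sharpe=1.0, n_trials=10)
benchmark_100 = deflated_benchmark_sketch(var_sharpe=1.0, n_trials=100)
```

The benchmark grows with the number of trials and scales with the standard deviation of the trial Sharpe ratios, so more searching raises the bar the reported strategy has to clear.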
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| returns | NDArrayAny | 1-D array of returns (passed through to PSR). | required |
| n_trials | int | Number of independent hyperparameter searches the user ran before reporting this strategy's Sharpe. Must be >= 1. With | required |
| var_sharpe | float | Estimated variance of Sharpe ratios across the | required |
Returns:
| Type | Description |
|---|---|
| float | Scalar probability in [0, 1]. |
Raises:
| Type | Description |
|---|---|
| ValueError | on invalid n_trials or negative var_sharpe. |
Examples:
purgedcv.min_track_record_length ¶
min_track_record_length(observed_sharpe: float, target_sharpe: float, alpha: float, skew: float, kurtosis: float) -> int
Minimum sample size such that PSR(target_sharpe) >= 1 - alpha.
Inverts the :func:probabilistic_sharpe_ratio formula for n:
.. math:: n^\ast = 1 + \left\lceil \left(\frac{\Phi^{-1}(1 - \alpha) \cdot \sqrt{1 - \gamma_3 \widehat{\text{SR}} + \frac{\gamma_4 - 1}{4} \widehat{\text{SR}}^2}} {\widehat{\text{SR}} - \text{SR}^\ast} \right)^2 \right\rceil
Bailey & Lopez de Prado (2012, Eq. 11).
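The inversion can be worked through numerically. `min_trl_sketch` is a hypothetical helper that follows the formula above term by term; rejecting `observed_sharpe <= target_sharpe` is an assumption of the sketch, made because the ratio is otherwise undefined or meaningless.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_trl_sketch(observed_sharpe, target_sharpe, alpha, skew, kurtosis):
    """Smallest n with PSR(target_sharpe) >= 1 - alpha, per the formula above."""
    if observed_sharpe <= target_sharpe:
        # Assumption of this sketch: no finite track record can suffice.
        raise ValueError("observed_sharpe must exceed target_sharpe")
    z = NormalDist().inv_cdf(1 - alpha)
    spread = sqrt(
        1 - skew * observed_sharpe + (kurtosis - 1) / 4 * observed_sharpe ** 2
    )
    return 1 + ceil((z * spread / (observed_sharpe - target_sharpe)) ** 2)


# Per-period Sharpe 0.1 against a zero benchmark at 95% confidence,
# with normal-like moments (skew 0, raw kurtosis 3).
n_min = min_trl_sketch(0.1, 0.0, alpha=0.05, skew=0.0, kurtosis=3.0)
```

Here the squared term is about 271.9, so the ceiling is 272 and the minimum track record is 273 observations.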
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| observed_sharpe | float | The sample Sharpe ratio you actually observed. Must be strictly greater than | required |
| target_sharpe | float | The benchmark you want to beat with confidence. | required |
| alpha | float | Significance level in (0, 1). PSR must meet 1 - alpha. | required |
| skew | float | Sample skew of the return distribution. | required |
| kurtosis | float | Sample kurtosis (NOT excess) of the return distribution. | required |
Returns:
| Type | Description |
|---|---|
| int | Minimum integer sample size. |
Raises:
| Type | Description |
|---|---|
| ValueError | if |
Examples:
Diagnostics¶
purgedcv.diagnostics.compute_overlap_fraction ¶
compute_overlap_fraction(train_idx: NDArrayAny, test_idx: NDArrayAny, prediction_times: Series, evaluation_times: Series) -> float
Return the fraction of training rows whose half-open label horizon overlaps any test horizon.
Unlike :func:assert_no_temporal_leakage, this never raises — it
returns 0.0 for clean splits and 1.0 when every training row
leaks. Useful for logging splitter health metrics or for debugging
a splitter that produces unexpected behavior.
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv.diagnostics import compute_overlap_fraction
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=20, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> compute_overlap_fraction(np.arange(5), np.arange(10, 15), pred, evalu)
0.0
>>> compute_overlap_fraction(
... np.arange(10, 15), np.arange(10, 15), pred, evalu
... )
1.0
purgedcv.diagnostics.assert_no_temporal_leakage ¶
assert_no_temporal_leakage(train_idx: NDArrayAny, test_idx: NDArrayAny, prediction_times: Series, evaluation_times: Series, *, purge_horizon: HorizonLike | None = None) -> None
Raise :class:TemporalLeakageError if any training row's label horizon
overlaps any test label horizon, optionally padded on both sides by
purge_horizon.
Test horizons are checked as a union of half-open intervals, not as one convex hull. This matters for CPCV folds whose test groups are intentionally non-contiguous.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_idx | NDArrayAny | positional indices of training rows. | required |
| test_idx | NDArrayAny | positional indices of test rows. | required |
| prediction_times | Series | prediction times for all rows. | required |
| evaluation_times | Series | evaluation times for all rows. | required |
| purge_horizon | HorizonLike \| None | optional padding (default: | None |
Raises:
| Type | Description |
|---|---|
| TemporalLeakageError | with the offending training row index and the two overlapping intervals in the message. |
Examples:
purgedcv.diagnostics.assert_groups_disjoint ¶
Raise :class:GroupLeakageError if any group identifier appears in
both train_idx and test_idx.
Used by group-aware splitters to verify that no entity (patient, asset, user, etc.) is represented in both training and test of the same fold. The error message names a representative overlapping group plus the total count of overlapping groups, so the caller can scope follow-up.
Examples:
purgedcv.diagnostics.assert_embargo_respected ¶
assert_embargo_respected(train_idx: NDArrayAny, test_idx: NDArrayAny, prediction_times: Series, evaluation_times: Series, embargo: HorizonLike) -> None
Raise :class:EmbargoViolationError if any training row's
prediction_time falls inside any closed embargo window
[test_evaluation_time, test_evaluation_time + embargo].
Embargo is asymmetric: rows whose prediction_time is strictly before
all test evaluation times are never flagged. embargo == 0 is the
identity (no rows flagged) — the embargo window is logically empty at
zero width.
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> from purgedcv.diagnostics import assert_embargo_respected
>>> pred = pd.Series(pd.date_range("2024-01-01", periods=20, freq="D"))
>>> evalu = pred + pd.Timedelta(days=1)
>>> assert_embargo_respected(
... np.array([18]), np.arange(5, 10), pred, evalu, embargo="2D"
... )
Exceptions¶
purgedcv.TemporalCVError ¶
Bases: ValueError
Base class for all purged-cross-validation errors.
purgedcv.TemporalLeakageError ¶
Bases: TemporalCVError
Raised when a training row's label horizon overlaps the test horizon.
purgedcv.EmbargoViolationError ¶
Bases: TemporalCVError
Raised when a training row falls inside the post-test embargo window.
purgedcv.GroupLeakageError ¶
Bases: TemporalCVError
Raised when a group_id appears in both training and test of the same fold.