Model Validation
Model validation checks that machine learning models generalise well to unseen data by assessing their performance on independent validation sets.
Selecting a Model Validation Method in evoML
- Create a New Trial
- Under Splitting Options, select a Model Validation method. The available methods are:
- Holdout (all tasks)
- K-fold cross validation (classification & regression)
- Sliding window (timeseries)
- Expanding window (timeseries)
Model Validation Options
1. Holdout (all tasks)
The dataset is split into two subsets: one for training and one for validation.
Setting | Details |
---|---|
Size | The fraction of the training dataset to include in the validation subset. |
Keep order | Whether to preserve the original row order. If disabled, the data is shuffled before splitting. |
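
The holdout split described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not evoML's implementation: the function name `holdout_split` and the parameters `size`, `keep_order`, and `seed` are hypothetical, and taking the validation rows from the end of the dataset is an assumption.

```python
import random


def holdout_split(n_rows, size=0.2, keep_order=True, seed=0):
    """Return (train_indices, validation_indices) for one holdout split.

    Assumption: the validation subset is taken from the end of the
    (optionally shuffled) index list.
    """
    if not 0 < size < 1:
        raise ValueError("size must be a fraction between 0 and 1")
    indices = list(range(n_rows))
    if not keep_order:
        # Shuffle reproducibly when the original order need not be kept.
        random.Random(seed).shuffle(indices)
    n_val = int(round(n_rows * size))
    cut = n_rows - n_val
    return indices[:cut], indices[cut:]


train_idx, val_idx = holdout_split(10, size=0.2, keep_order=True)
shuffled_train, shuffled_val = holdout_split(10, size=0.2, keep_order=False)
```

With `keep_order=True` the last 20% of rows form the validation subset. With `keep_order=False` the same proportions are drawn from a shuffled index list.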
2. K-fold Cross Validation (Classification & Regression)
The dataset is divided into K subsets (folds). The model is trained on K-1 subsets and validated on the remaining one. The process repeats K times so that each subset serves as the validation set exactly once.
Setting | Details |
---|---|
K | Number of subsets into which to divide the training data. |
Keep order | Whether to preserve the original row order. If disabled, the data is shuffled before splitting. |
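
To make the K-fold procedure concrete, here is a minimal index-based sketch in plain Python. It is not evoML's implementation: the name `k_fold_splits` and its parameters are hypothetical, and spreading the remainder rows over the first folds is an assumption.

```python
import random


def k_fold_splits(n_rows, k=5, keep_order=True, seed=0):
    """Return a list of (train_indices, validation_indices), one per fold."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    if not keep_order:
        random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    splits = []
    start = 0
    for fold_size in fold_sizes:
        stop = start + fold_size
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, validation))
        start = stop
    return splits


splits = k_fold_splits(10, k=5, keep_order=True)
for round_number, (train, validation) in enumerate(splits, start=1):
    print(round_number, validation)
```

Each row appears in exactly one validation fold, so every observation contributes to the validation score once.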
3. Sliding Window (Timeseries)
The model is trained on a fixed-length training window and validated on the forecast window that follows it. Between rounds, both windows move forward in time by a defined slide length. An optional gap can be added between the training and forecast windows.
Setting | Details |
---|---|
Evaluation Window | Number of time steps in the forecast window (typically 15% of the dataset). |
Train Window | Number of time steps in the training window (typically 45% of the dataset). |
Slide | Number of time steps by which both windows move forward between rounds (typically 15% of the dataset). |
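
The way the windows advance can be sketched as follows. This is a sketch under stated assumptions, not evoML's implementation: the function `sliding_window_splits` and its parameter names are hypothetical, and it assumes rounds stop as soon as the forecast window would run past the end of the series.

```python
def sliding_window_splits(n_steps, train_window, eval_window, slide, gap=0):
    """Return a list of (train_range, eval_range) pairs over time steps.

    Assumption: a round is only produced if its forecast window fits
    entirely inside the series.
    """
    if slide <= 0:
        raise ValueError("slide must be a positive number of time steps")
    splits = []
    start = 0
    while start + train_window + gap + eval_window <= n_steps:
        train_end = start + train_window
        eval_start = train_end + gap
        splits.append(
            (range(start, train_end), range(eval_start, eval_start + eval_window))
        )
        # Both windows move forward by the same slide length.
        start += slide
    return splits


# 100 time steps with the typical 45% / 15% / 15% proportions.
sliding = sliding_window_splits(100, train_window=45, eval_window=15, slide=15)
for train_range, eval_range in sliding:
    print(train_range, eval_range)
```

With these values the sketch yields three rounds, and the training window keeps a constant length of 45 steps throughout.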
4. Expanding Window (Timeseries)
The model is trained on an initial training window that grows by a defined expansion length in each round, so earlier data is never dropped. The forecast window moves forward by the same expansion length in each round. An optional gap can be added between the training and forecast windows.
Setting | Details |
---|---|
Evaluation Window | Number of time steps in the forecast window (typically 15% of the dataset). |
Initial Train Window | Number of time steps in the first round of training (typically 45% of the dataset). |
Expansion Length | Number of time steps by which the training window grows in each subsequent round (typically 15% of the dataset). |
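
The expanding-window scheme can be sketched in the same index-based style. Again this is illustrative rather than evoML's implementation: `expanding_window_splits` and its parameter names are hypothetical, and it assumes the training window always starts at the first time step and that rounds stop once the forecast window no longer fits.

```python
def expanding_window_splits(n_steps, initial_train_window, eval_window,
                            expansion, gap=0):
    """Return a list of (train_range, eval_range) pairs over time steps.

    Assumption: training always starts at time step 0 and only its end
    point moves forward.
    """
    if expansion <= 0:
        raise ValueError("expansion must be a positive number of time steps")
    splits = []
    train_end = initial_train_window
    while train_end + gap + eval_window <= n_steps:
        eval_start = train_end + gap
        splits.append(
            (range(0, train_end), range(eval_start, eval_start + eval_window))
        )
        # The training window grows; the forecast window shifts with its end.
        train_end += expansion
    return splits


# 100 time steps with the typical 45% / 15% / 15% proportions.
expanding = expanding_window_splits(
    100, initial_train_window=45, eval_window=15, expansion=15
)
for train_range, eval_range in expanding:
    print(len(train_range), eval_range)
```

With these values the training window grows from 45 to 60 to 75 steps over three rounds, while the forecast windows match those of a sliding window with the same evaluation and slide lengths.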