
Build Timeseries Models

This guide walks you through creating time series models using evoML. It covers the full workflow: data preparation, model configuration, training, evaluation, and deployment. For time-series-specific options, refer to this guide.

Key Considerations

  • This mode is activated using the Time series forecasting slider.
  • Timeseries models are built by generating lagged values of the target and features and using those lagged values to predict future values with tabular regression/classification models.
  • Lagged values are generated over a context window, and the length of time into the future we forecast is controlled by the horizon option.
  • The features for which lags are generated are chosen in feature engineering using the future/past covariates options.
  • Time series data is never shuffled. Time-ordered splitting methods are used instead, so that models are always tested on data that comes after the data they were trained on.
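
The lag-based framing described above can be sketched in a few lines of plain Python. This is a minimal illustration of the general idea, not evoML's implementation: the function name `make_supervised` and its parameters are assumptions chosen for the example.

```python
def make_supervised(series, context_window, horizon):
    """Turn a time-ordered series into (lagged inputs, future target) rows.

    Hypothetical helper for illustration only; not part of evoML.
    """
    rows = []
    # The forecast origin t needs `context_window` values up to and
    # including t, and a target that exists at t + horizon.
    for t in range(context_window - 1, len(series) - horizon):
        lags = series[t - context_window + 1 : t + 1]  # oldest to newest
        target = series[t + horizon]
        rows.append((lags, target))
    return rows


series = [10, 12, 13, 15, 18, 21, 25]
rows = make_supervised(series, context_window=3, horizon=2)
for lags, target in rows:
    print(lags, "->", target)
```

Each row is an ordinary tabular example: here the first row pairs the lags `[10, 12, 13]` with the value two steps later, `18`. Any tabular regression or classification model can then be trained on such rows.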

1. Data Preparation and Upload

  1. Upload: Start by uploading or importing your dataset.
  2. Explore: Select the uploaded dataset to see additional details that help you understand its structure and quality. Key steps include:
    • Examine Feature Types: evoML automatically detects the types of each feature (numeric, categorical, etc.). Review detected feature types to ensure proper handling.
    • Analyse Patterns: Investigate trends, seasonality, and patterns. Check tags generated by evoML.
    • Check Feature Distribution: Analyse the distribution of features (including the target variable) to identify skewed distributions or other potential problems.
    • Correlation & Association: Compute correlation and association between features.
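
For readers who want to reproduce two of these checks outside the platform, the following sketch computes a skewness measure and the Pearson correlation in plain Python. The helper names are hypothetical, and the source does not say which statistics evoML itself reports, so treat this only as an illustration of the concepts.

```python
import math


def skewness(xs):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


target = [1, 1, 1, 10]          # one large value gives a right-skewed distribution
feature = [2, 4, 6, 8]
print(skewness(target) > 0)     # True: long right tail
print(pearson(feature, [1, 2, 3, 4]))  # perfectly linear relationship
```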

2. Create Trial

  1. Initiate Trial: Select New Trial to initiate a new model creation pipeline.
  2. Choose Dataset: Select the dataset you previously uploaded/imported.
  3. Specify Target Column: Choose the column to predict.

3. Task Configuration

  1. Auto-Detect ML Task: evoML automatically identifies the type of machine learning task (e.g., classification) based on the target column.

4. Multi-Objective Optimization

  1. Select Objective Function: Choose an appropriate function for the task. If needed, you can define a custom objective function to fine-tune performance metrics.
  2. Optimise Hyperparameters: evoML supports various optimizers to fine-tune model performance by adjusting hyperparameters.

5. Time Series Forecasting options

  1. Enable the Timeseries feature: Turn on the Time series forecasting slider.
  2. Select Timeseries Index: Choose the timestamp column defining temporal order.
  3. Select task type: Based on the target column, choose regression or classification.

More details are available in this guide.

6. Train/Test Split

  1. Define Split Ratio: evoML provides different splitting strategies for training and testing (e.g., 80/20 split). Make sure to choose the ratio that works best for your data.
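
Because time series data must stay in temporal order, an 80/20 split takes the earliest 80% of rows for training and the latest 20% for testing. The sketch below shows that idea in plain Python; `time_ordered_split` and the record layout are assumptions made for the example, not evoML functions.

```python
def time_ordered_split(records, train_fraction=0.8):
    """Split (timestamp, value) records so the test set is strictly later.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(records, key=lambda r: r[0])  # sort by timestamp, never shuffle
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


records = [
    ("2024-01-03", 7),
    ("2024-01-01", 5),
    ("2024-01-02", 6),
    ("2024-01-05", 9),
    ("2024-01-04", 8),
]
train, test = time_ordered_split(records)
print(train[-1][0], "<", test[0][0])  # last training day precedes first test day
```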

7. Model Validation Strategy

  1. Holdout Validation: This method splits the data into a training set and a separate testing set. It's straightforward and often used for quick evaluation.
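  2. Sliding Window Validation: Trains on a fixed-length window of past data and tests on the period immediately after it, then moves both windows forward in time and repeats.
  3. Expanding Window Validation: Works like the sliding window, but the training window always starts at the beginning of the data and grows with each fold.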

For more details, refer to our guide on validation.
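
As a rough sketch of how sliding and expanding window folds differ, the generator below yields index ranges for each fold. It follows the common textbook definitions of these strategies; the function name, parameters, and fold sizes are assumptions, and evoML's own fold construction may differ.

```python
def window_splits(n_samples, train_size, test_size, expanding=False):
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper: sliding windows keep `train_size` fixed,
    expanding windows always start training at index 0.
    """
    train_end = train_size
    while train_end + test_size <= n_samples:
        train_start = 0 if expanding else train_end - train_size
        train_idx = list(range(train_start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        train_end += test_size  # move forward by one test block


sliding = list(window_splits(8, train_size=4, test_size=2))
expanding = list(window_splits(8, train_size=4, test_size=2, expanding=True))
print(sliding)
print(expanding)
```

With 8 samples, both strategies produce two folds that test on indices `[4, 5]` and then `[6, 7]`. The second sliding fold trains on `[2, 3, 4, 5]`, while the second expanding fold trains on `[0, 1, 2, 3, 4, 5]`.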

8. Feature Engineering

Timeseries specific features
  • Context Window: Define how many previous time steps the model will consider when making a prediction. The context window lets the model use historical data for regression or classification.
  • Forecast Horizon: Define how far ahead you want the model to predict (e.g., predict the value or class on the 5th day from now).
  • Covariates: Specify which features are past or future covariates.
    • Past Covariates: Known only after the timestamp they correspond to (e.g., sales data for a given day, only known once the day ends).
    • Future Covariates: Known ahead of time (e.g., whether a given day is a holiday).
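
The practical difference between the two covariate types is which time steps may be used as model inputs. The sketch below builds one training row under the assumption that past covariates contribute only values up to the forecast origin, while future covariates may also contribute their value at the forecast time. The names `build_row`, `footfall`, and `is_holiday` are hypothetical and only serve the example.

```python
def build_row(t, horizon, context_window, target, past_cov, future_cov):
    """Assemble inputs available at forecast origin t for a target at t + horizon.

    Hypothetical helper for illustration only; not an evoML function.
    """
    lo = t - context_window + 1
    return {
        "target_lags": target[lo : t + 1],
        # Past covariate: only values observed up to time t are usable.
        "past_cov_lags": past_cov[lo : t + 1],
        # Future covariate: known in advance, so the value at the
        # forecast time itself can be used without leakage.
        "future_cov_at_forecast": future_cov[t + horizon],
        "label": target[t + horizon],
    }


sales = [100, 120, 90, 150]     # target
footfall = [30, 35, 28, 40]     # past covariate (known only after the day ends)
is_holiday = [0, 0, 0, 1]       # future covariate (calendar is known ahead)

row = build_row(t=2, horizon=1, context_window=2,
                target=sales, past_cov=footfall, future_cov=is_holiday)
print(row)
```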
Timeseries encoding strategies
  • Difference Transform: Computes the difference between values separated by the forecast horizon, helping to highlight trends and seasonality.
  • Ratio Transform: Calculates the ratio between values separated by the forecast horizon, normalizing fluctuating data and identifying growth patterns.
  • Log-Ratio Transform: Takes the logarithm of the ratio between values separated by the forecast horizon, which stabilizes variance and normalizes the data.
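
The three transforms can be written compactly as follows. This is a sketch under stated assumptions: each value is compared with the value `horizon` steps earlier, and the log-ratio is taken to be the natural logarithm of the ratio. The source does not give evoML's exact formulas, so the function names and details here are illustrative.

```python
import math


def difference(series, horizon):
    """Value minus the value `horizon` steps earlier."""
    return [series[i] - series[i - horizon] for i in range(horizon, len(series))]


def ratio(series, horizon):
    """Value divided by the value `horizon` steps earlier (must be non-zero)."""
    return [series[i] / series[i - horizon] for i in range(horizon, len(series))]


def log_ratio(series, horizon):
    """Natural log of the ratio; requires strictly positive values."""
    return [math.log(r) for r in ratio(series, horizon)]


series = [100, 200, 400, 800]   # doubles at every step
print(difference(series, horizon=2))
print(ratio(series, horizon=2))
print(log_ratio(series, horizon=2))
```

For this doubling series the differences grow over time (`[300, 600]`), while the ratio and log-ratio are constant, which is why ratio-style transforms suit data with multiplicative growth.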
Timeseries imputation strategies
  • Forward Filling: Uses the last observed value.
  • Backward Filling: Uses the next valid observation (use cautiously to avoid data leakage).
  • Linear Interpolation: Estimates missing values along a straight line between the surrounding data points.
  • Spline Interpolation: Uses a smooth curve for estimation.
  • Moving Average: Fills gaps using the mean of surrounding values.
  • Polynomial Interpolation: Uses higher-order polynomials for more precise estimates.
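
To make the first three strategies concrete, here is a small plain-Python sketch that fills gaps marked as `None`. The function names are hypothetical, and the handling of gaps at the start or end of the series is an assumption of this example rather than documented evoML behaviour.

```python
def forward_fill(values):
    """Replace each gap with the last observed value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def backward_fill(values):
    """Replace each gap with the next observed value (trailing gaps stay None).

    Uses information from the future, so it can leak data into training rows.
    """
    return forward_fill(values[::-1])[::-1]


def linear_interpolate(values):
    """Fill gaps between two known points along a straight line."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # gaps before the first or after the last known point remain


values = [1.0, None, None, 4.0, None]
print(forward_fill(values))
print(backward_fill(values))
print(linear_interpolate(values))
```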

For further details, refer to the timeseries guide.

9. Model Selection

  1. Select Models: evoML offers multiple model options, ranging from traditional machine learning models (e.g., decision trees) to more advanced models (e.g., XGBoost).
  2. Tune Hyperparameters: You can manually adjust the model's hyperparameters, or evoML will tune them for you based on your defined range of values. Visit the model tuning guide for details.

10. Train the Model

Once all configurations are set, click Start to initiate model training:

  1. evoML will automatically preprocess your data.
  2. The data will be split into training and test sets as per the defined strategy.
  3. The optimizer will tune hyperparameters and train the models on the training set.

11. Evaluate Performance

After training, evoML will evaluate models on the test set using the appropriate classification metrics or regression metrics.

Refer to the performance evaluation guide for more in-depth analysis and interpretation of these metrics.
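
As a reminder of what such metrics measure, the sketch below computes three common ones on a held-out test set: MAE and RMSE for regression, and accuracy for classification. These are widely used examples chosen for illustration; the source does not list which metrics evoML reports, and the function names here are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the forecast error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def accuracy(y_true, y_pred):
    """Fraction of test points whose predicted class matches the true class."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Regression forecast on four test points
print(mae([3, 5, 7, 9], [2, 5, 8, 11]))
print(rmse([3, 5, 7, 9], [2, 5, 8, 11]))

# Classification forecast on four test points
print(accuracy(["up", "down", "up", "up"], ["up", "up", "up", "down"]))
```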

12. Deploy Model

  1. Generate Deployment Pipeline: evoML can generate a deployment pipeline to help you integrate your model into production environments.
  2. Deploy to Your Preferred Environment: evoML supports easy deployment on different environments (e.g., local machine, cloud server).