Consumer price index forecasting

Overview

This notebook presents a pipeline for forecasting the Consumer Price Index (CPI), a key measure of inflation, over the next year. It generates predictions for the year-on-year CPI inflation rate at 12-, 9-, 6- and 3-month horizons, with corresponding visualizations to illustrate projected trends.

Clients can use the generated predictions as analytical insight into inflation trends, or as an input when making decisions on setting interest rates.

The dataset used here can be sourced from the UK Office for National Statistics (ONS). Feel free to experiment with other datasets providing inflation measures, should they fit your statistical objective.

Setup

Dependencies

This work uses the following libraries:

turintech-evoml-client
pandas
matplotlib
plotly
nbformat (if using xls data)

Credentials

You will also require:

A URL for an instance of the evoML platform (e.g. https://evoml.ai)
Your evoML username and password

import pandas as pd 
import os 
import numpy as np
import matplotlib.pyplot as plt
import typing 
from typing import Final
import evoml_client as ec 
from evoml_client.trial_conf_models import BudgetMode, SplitMethodOptions
import plotly.express as px
import plotly.graph_objects as go 
import nbformat as nbf
import math
import openpyxl
from dataclasses import dataclass

API_URL: Final[str] = "https://evoml.ai"
EVOML_USERNAME: Final[str] = ""
EVOML_PASSWORD: Final[str] = ""

# Connect to evoML platform
ec.init(base_url=API_URL, username=EVOML_USERNAME, password=EVOML_PASSWORD)
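
As an alternative to typing credentials into the notebook, they could be read from environment variables. The sketch below is only an illustration: the helper name and the variable names `EVOML_URL`, `EVOML_USERNAME` and `EVOML_PASSWORD` are assumptions made for this example and are not defined by the evoML client.

```python
import os


def load_evoml_credentials(env=None):
    """Return (url, username, password) read from a mapping of environment variables.

    The variable names used here are illustrative assumptions,
    not names defined by the evoML client.
    """
    env = os.environ if env is None else env
    url = env.get("EVOML_URL", "https://evoml.ai")
    username = env.get("EVOML_USERNAME", "")
    password = env.get("EVOML_PASSWORD", "")
    if not username or not password:
        raise ValueError("Set EVOML_USERNAME and EVOML_PASSWORD before connecting.")
    return url, username, password


# Example with an explicit mapping instead of the real environment
url, username, password = load_evoml_credentials(
    {"EVOML_USERNAME": "demo-user", "EVOML_PASSWORD": "demo-pass"}
)
print(url, username)
```

The returned values can then be passed to `ec.init` in place of the hard-coded constants.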

Data loading and manipulation

We first load the downloaded workbook and select the relevant sheet (Table 57, in this case) to extract the CPI summary of all items for the period between 1988 and 2025.

We then convert the date column to a datetime format and create a dataframe containing only the time column and the dependent variable.

# Reading the data 
xls = pd.ExcelFile("consumer-price-inflation-ONS.xlsx", engine="openpyxl")
CPI_UK = pd.read_excel("consumer-price-inflation-ONS.xlsx", sheet_name="Table 57", skiprows=6, engine="openpyxl")

# Dropping columns 
#print(CPI_UK['name'].tail(15)) #The last 14 rows of data are not relevant to the analysis
CPI_UK = CPI_UK.drop(CPI_UK.tail(14).index)

# Converting the time column to datetime format 
CPI_UK['name'] = pd.to_datetime(CPI_UK['name'])
CPI_UK['name'] = CPI_UK['name'].dt.strftime('%Y-%m') #Removing the 00:00:00 timestamp from the date
CPI_UK.rename(columns={"name": "Date_CPI"}, inplace=True)

# Creating the df for exploratory analysis 
# Take an explicit copy so later column assignments do not trigger SettingWithCopyWarning
CPI_UK_single = CPI_UK[['Date_CPI', 'CPI ALL ITEMS']].copy()

Dependent variable manipulation

In order to retrieve a valuable estimate of inflation, we:

Compute the 12-month rolling inflation rate from the Consumer Price index, which can be expressed with the following equation:

\pi_t = \frac{CPI_t - CPI_{t-12}}{CPI_{t-12}} \times 100

To avoid division by zero, we add a small offset (epsilon) to the denominator.

To move the series towards stationarity, we apply seasonal differencing, which further detrends the data by removing annual seasonality.

\Delta_{12} \pi_t = \pi_t - \pi_{t-12}

We visualize both results and prepare the final dataframe for analysis.
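
The two transformations above can be illustrated on a small synthetic series. The index values below are made up for this example (an index rising by one point per month) and are not ONS data.

```python
import pandas as pd

# Synthetic monthly index rising by exactly 1 point per month (illustrative only)
dates = pd.date_range("2020-01-01", periods=36, freq="MS")
cpi = pd.Series([100.0 + i for i in range(36)], index=dates)

# Year-on-year inflation rate: percentage change against the same month one year earlier
yoy = (cpi - cpi.shift(12)) / cpi.shift(12) * 100

# Seasonal differencing: change in the year-on-year rate against 12 months earlier
delta_yoy = yoy.diff(12)

print(yoy.dropna().head(3))
print(delta_yoy.dropna().head(3))
```

Each step consumes 12 observations, so a 36-month series leaves 24 year-on-year values and 12 seasonally differenced values. This is why the notebook calls `dropna()` after each transformation.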

# Variable manipulation 
# Convert Date_CPI to datetime and sort the dataframe
CPI_UK_single['Date_CPI'] = pd.to_datetime(CPI_UK_single['Date_CPI'])
CPI_UK_single = CPI_UK_single.sort_values(by='Date_CPI')

# Compute the 12-month rolling inflation rate
epsilon = 1e-10
cpi_lagged = CPI_UK_single['CPI ALL ITEMS'].shift(12)  # CPI level 12 months earlier
CPI_UK_single['CPI_Annual_Change'] = (
    (CPI_UK_single['CPI ALL ITEMS'] - cpi_lagged) / (cpi_lagged + epsilon) * 100
)

CPI_UK_single = CPI_UK_single.dropna()

# Apply seasonal differencing to the data
CPI_UK_single['Delta_CPI_Annual_Change'] = CPI_UK_single['CPI_Annual_Change'].diff(12)
CPI_UK_single = CPI_UK_single.dropna()

# Create the figure
fig = go.Figure()

# Add traces
fig.add_trace(
    go.Scatter(x=CPI_UK_single['Date_CPI'], 
               y=CPI_UK_single['CPI_Annual_Change'], 
               mode='lines', 
               name='CPI Annual Change', 
               line=dict(color='blue'))
)
fig.add_trace(
    go.Scatter(x=CPI_UK_single['Date_CPI'], 
               y=CPI_UK_single['Delta_CPI_Annual_Change'], 
               mode='lines', 
               name='Delta CPI Annual Change', 
               line=dict(color='red'))
)

# Determine the y-axis range with a buffer
y_min = min(CPI_UK_single[['CPI_Annual_Change', 'Delta_CPI_Annual_Change']].min()) * 1.1
y_max = max(CPI_UK_single[['CPI_Annual_Change', 'Delta_CPI_Annual_Change']].max()) * 1.1

# Update layout to ensure y-axis does not change when toggling legend items
fig.update_layout(
    height=400,
    width=900,
    title='Annual CPI Inflation Rate (12-Month Change)',
    xaxis_title='Year',
    yaxis_title='Annual Inflation Rate (%)',
    xaxis=dict(tickangle=45, showgrid=False),
    yaxis=dict(showgrid=False, fixedrange=True, range=[y_min, y_max]),
    legend_title_text='',
    plot_bgcolor='white',
    paper_bgcolor='white'
)

# Show the figure
fig.show()

# And fetch final dataframe for analysis
CPI_Delta_YoY = CPI_UK_single[['Date_CPI', 'Delta_CPI_Annual_Change']]

# Upload the dataset to evoML
dataset = ec.Dataset.from_pandas(CPI_Delta_YoY, name="CPI_Dataset_Delta")
dataset.put()
dataset.wait()
print(f"Dataset URL: {API_URL}/platform/datasets/view/{dataset.dataset_id}")

Dataset URL: https://evoml.ai/platform/datasets/view/67d97d584082c2ed0965a5f8

Trial configuration

Here we use the client to configure four different trials, with predictive horizons of 12, 9, 6 and 3 months into the future, respectively.

We therefore create a single reusable workflow that runs each trial, generates predictions, and back-transforms them to the original year-on-year inflation scale. For this example, we keep the window size fixed at 6 and choose regularized regression models (ridge, lasso and elastic net), with the aim of supporting generalization and handling multicollinearity. The workflow also fetches the best model from each trial.

To run the workflow, we mirror evoML's 80/20 preprocessing split at the notebook's top level, so that the test rows and the values used for back-transformation line up with the trial's split. If your dependent variable does not require back-transformation, this step can be skipped.
You may also move the workflow into a separate module and import it into this notebook, as per best practices.
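
A minimal sketch of the local split, assuming the rows are already sorted by date and that the split is purely positional (first 80% for training, last 20% for testing). The helper name `chronological_split` and the toy dataframe are hypothetical and used only for this example.

```python
import pandas as pd


def chronological_split(df, train_fraction=0.8):
    """Split a date-sorted dataframe into leading train rows and trailing test rows."""
    split_idx = int(len(df) * train_fraction)
    return df.iloc[:split_idx].copy(), df.iloc[split_idx:].copy()


# Toy data: ten monthly observations
frame = pd.DataFrame({
    "Date_CPI": pd.date_range("2024-01-01", periods=10, freq="MS"),
    "value": range(10),
})
train, test = chronological_split(frame)
print(len(train), len(test))
```

No shuffling is applied, so every test date falls after every training date, which is what a time series evaluation needs.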

# End-to-end workflow: trial configuration, processing, fetching predictions, and back-transforming them for visualization
def config_trial(trial_name, models, dataset_id, target_col, train_percentage=0.8, budget_mode=BudgetMode.fast, loss_funcs=["Root Mean Squared Error"], is_timeseries=True, TimeseriesHorizon=12):
    '''
    Configures and runs a trial with the specified parameters.

    Params:
    dataset_id: str - the dataset ID
    target_col: str - the target column name
    trial_name: str - the name of the trial
    models: list - a list of model names
    train_percentage: float - the percentage of the dataset to use for training
    budget_mode: BudgetMode - the budget mode
    loss_funcs: list - a list of loss functions
    is_timeseries: bool - whether the dataset is a time series
    TimeseriesHorizon: int - the time series horizon (default is 12)

    Returns:
    trial: Trial - the trial object
    best_model: Model - the best model object
    '''
    try:
        config = ec.TrialConfig.with_models(
            models=models,
            task=ec.MlTask.regression,
            budget_mode=budget_mode,
            loss_funcs=loss_funcs,
            dataset_id=dataset_id,
            is_timeseries=is_timeseries,
        )
        config.options.timeSeriesWindowSize = 6
        config.options.timeSeriesHorizon = TimeseriesHorizon
        config.options.splittingMethodOptions = SplitMethodOptions(method="percentage", trainPercentage=train_percentage)
        config.options.enableBudgetTuning = False
        
        trial, _ = ec.Trial.from_dataset_id(
            dataset_id,
            target_col=target_col,
            trial_name=trial_name,
            config=config,
        )

        trial.run(timeout=900)
        
        best_model = trial.get_best()
        best_model.build_model()
        
        return trial, best_model

    except Exception as e:
        print(f"An error occurred while building the trial: {e}")
        return None, None

@dataclass
class TrialResult:
    metrics_df: typing.Any  # metrics table returned by trial.get_metrics_dataframe()
    best_model_name: str
    best_model_mse_test: float
    best_model_rmse_test: float

def process_trial(trial: ec.Trial, trial_number: int) -> TrialResult: 
    '''
    Params: 
    trial: Trial - the trial object
    trial_number: int - the trial number

    Returns: TrialResult - an instance of TrialResult containing the results
    '''
    if trial:
        metrics_df = trial.get_metrics_dataframe()
        
        best_model = trial.get_best()
        model_rep_dict = best_model.model_rep.__dict__
        
        best_model_name = model_rep_dict.get('name')
        
        best_model_mse_test = model_rep_dict.get('metrics', {}).get('regression-mse', {}).get('test', {}).get('average')
        
        best_model_rmse_test = math.sqrt(best_model_mse_test)
        
        print(f"Best Model Name ({trial_number}): {best_model_name}")
        print(f"Best Model MSE (Test) ({trial_number}): {best_model_mse_test}")
        print(f"Best Model RMSE (Test) ({trial_number}): {best_model_rmse_test}")
        
        return TrialResult(
            metrics_df=metrics_df,
            best_model_name=best_model_name,
            best_model_mse_test=best_model_mse_test,
            best_model_rmse_test=best_model_rmse_test
        )
    else:
        print(f"Trial {trial_number} unsuccessful.")
        return None

def extend_test_data_and_get_predictions(test_data, model, periods):
    '''
    Extends the test data by adding new dates and generates predictions using the model.
    
    Params:
    test_data: pd.DataFrame - the test data
    model: Model - the trained model
    periods: int - the number of periods to extend
    
    Returns:
    extended_test_data: pd.DataFrame - the extended test data
    predictions: pd.Series - the model predictions
    '''
    last_date = test_data['Date_CPI'].max()
    new_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=periods, freq='M')
    new_entries = pd.DataFrame({
        'Date_CPI': new_dates,
        'Delta_CPI_Annual_Change': [0] * len(new_dates)
    })
    extended_test_data = pd.concat([test_data, new_entries], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
    predictions = pd.Series(model.predict(data=extended_test_data), index=extended_test_data.index)
    return extended_test_data, predictions

def back_transformed_predictions(last_data, predictions, period=12):
    '''
    Back-transforms the predictions to the original scale.
    
    Params:
    last_data: pd.Series - the last 'period' data points from the original series
    predictions: pd.Series - the model predictions
    period: int - the period of the time series
    
    Returns:
    reversed_predictions: pd.Series - the back-transformed predictions
    '''
    extended_predictions = pd.concat([last_data, predictions], ignore_index=True)
    
    reversed_predictions = extended_predictions.copy()
    for t in range(period, len(extended_predictions)):
        reversed_predictions[t] = extended_predictions[t] + reversed_predictions[t-period]
    
    return reversed_predictions

# --- Workflow preparation steps --- 
split_idx_test = int(len(CPI_Delta_YoY) * 0.8)
test_data = CPI_Delta_YoY.iloc[split_idx_test:].copy()   # Last 20% 
train_data = CPI_Delta_YoY.iloc[:split_idx_test].copy()  # First 80%

# Retrieve last 12 months of data for back-transformation 
slice_length = len(train_data)
CPI_UK_train = CPI_UK_single.iloc[:slice_length]
last_12 = CPI_UK_train.tail(12)['CPI_Annual_Change'].reset_index(drop=True)
# ----Workflow preparation steps --- 

def run_workflow(trial_name, models, dataset_id, target_col, train_percentage, budget_mode, loss_funcs, is_timeseries, TimeseriesHorizon, test_data, last_12):
    results = {}

    # Configure the trial
    trial, best_model = config_trial(
        trial_name=trial_name, 
        models=models, 
        dataset_id=dataset_id,  
        target_col=target_col, 
        train_percentage=train_percentage, 
        budget_mode=budget_mode, 
        loss_funcs=loss_funcs, 
        is_timeseries=is_timeseries, 
        TimeseriesHorizon=TimeseriesHorizon
    )

    # Process the trial
    trial_result = process_trial(trial, TimeseriesHorizon)
    if trial_result:
        results[trial_name] = trial_result

        # Get Predictions 
        extended_test_data, predictions = extend_test_data_and_get_predictions(test_data, best_model, TimeseriesHorizon)

        # Back transform them to their original scale
        back_transformed_preds = back_transformed_predictions(last_12, predictions, period=12)

        # Print metrics for the best model
        best_model_rmse_test = results[trial_name].best_model_rmse_test
        print(f"Best Model RMSE Test for {trial_name}: {best_model_rmse_test}")

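        # Attach as column for further visualization.
        # The first len(last_12) values are the seed history, so skip them to align with the test rows.
        extended_test_data[f'Recovered_CPI_Annual_Change_{TimeseriesHorizon}'] = back_transformed_preds.iloc[len(last_12):].to_numpy()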

        return extended_test_data, results, best_model_rmse_test
    else:
        return None, None, None  # callers unpack three values

extended_test_data_12, results_12, best_model_rmse_test_12 = run_workflow(
    trial_name='Inflation_12',
    models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
    dataset_id=dataset.dataset_id,
    target_col='Delta_CPI_Annual_Change',
    train_percentage=0.8,
    budget_mode=BudgetMode.fast,
    loss_funcs=['Root Mean Squared Error'],
    is_timeseries=True,
    TimeseriesHorizon=12,
    test_data=test_data,
    last_12=last_12
)

extended_test_data_9, results_9, best_model_rmse_test_9 = run_workflow(
    trial_name='Inflation_9',
    models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
    dataset_id=dataset.dataset_id,
    target_col='Delta_CPI_Annual_Change',
    train_percentage=0.8,
    budget_mode=BudgetMode.fast,
    loss_funcs=['Root Mean Squared Error'],
    is_timeseries=True,
    TimeseriesHorizon=9,
    test_data=test_data,
    last_12=last_12
)

extended_test_data_6, results_6, best_model_rmse_test_6 = run_workflow(
    trial_name='Inflation_6',
    models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
    dataset_id=dataset.dataset_id,
    target_col='Delta_CPI_Annual_Change',
    train_percentage=0.8,
    budget_mode=BudgetMode.fast,
    loss_funcs=['Root Mean Squared Error'],
    is_timeseries=True,
    TimeseriesHorizon=6,
    test_data=test_data,
    last_12=last_12
)

extended_test_data_3, results_3, best_model_rmse_test_3 = run_workflow(
    trial_name='Inflation_3',
    models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
    dataset_id=dataset.dataset_id,
    target_col='Delta_CPI_Annual_Change',
    train_percentage=0.8,
    budget_mode=BudgetMode.fast,
    loss_funcs=['Root Mean Squared Error'],
    is_timeseries=True,
    TimeseriesHorizon=3,
    test_data=test_data,
    last_12=last_12
)

Defining and processing trials

We now create our four trials, each with a different predictive horizon, and extract their metrics. We recommend running them one after the other, as running them simultaneously can be quite time-consuming.
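
Since the four trials differ only in their horizon, they could also be run from a loop. The sketch below is an assumption-laden illustration: `run_all_horizons` and `fake_run` are hypothetical names, and `fake_run` stands in for a real trial so that the example runs without the evoML platform.

```python
def run_all_horizons(run_one, horizons=(12, 9, 6, 3)):
    """Call run_one(horizon) for each horizon in turn and collect the results by horizon."""
    outputs = {}
    for horizon in horizons:
        outputs[horizon] = run_one(horizon)
    return outputs


# Stand-in for a real trial, used only to make the sketch runnable
def fake_run(horizon):
    return f"Inflation_{horizon}"


outputs = run_all_horizons(fake_run)
print(outputs)
```

In the notebook, `run_one` could be a small function that calls `run_workflow` with `TimeseriesHorizon=horizon` and the shared arguments, which keeps the trials sequential.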

Retrieving predictions

After creating our four trials, we manually split the data to generate predictions on the test set. We also apply the same split to the original year-on-year inflation rate so that we can visualize the actual values.

Next, we extend our test dataframes with a time window representing the desired period for forecasting and generate predictions.
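
The date extension can be sketched in isolation. The helper name `future_month_ends` is hypothetical; it passes a `MonthEnd` offset object as the frequency, which is assumed here to be equivalent to the month-end alias used in the notebook.

```python
import pandas as pd


def future_month_ends(last_date, periods):
    """Return month-end dates for the `periods` months that follow last_date."""
    start = pd.Timestamp(last_date) + pd.DateOffset(months=1)
    return pd.date_range(start=start, periods=periods, freq=pd.offsets.MonthEnd())


new_dates = future_month_ends("2025-01-01", 3)
print(list(new_dates.strftime("%Y-%m-%d")))
```

With a last observed date of 2025-01-01 and three periods, the generated dates are the month ends of February, March and April 2025, so the final date of a 3-month horizon is 2025-04-30.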

Dependent variable transformation

Finally, we transform the seasonally differenced predictions back to their original scale. We take the last 12 values before the start of the test set and add each predicted annual difference to the value 12 months earlier. This yields a year-on-year inflation rate that is interpretable and aligned with our visualization data, as expressed by the equation below:

\pi_t = \pi_{t-12} + \Delta_{12} \pi_t
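
The back-transformation can be checked with a round trip on synthetic data: difference a series at lag 12, then rebuild it from its first 12 values. The helper name `invert_seasonal_difference` and the toy values are assumptions made for this sketch.

```python
import pandas as pd


def invert_seasonal_difference(history, diffs, period=12):
    """Rebuild values from seasonal differences, seeded with at least `period` known values."""
    values = list(history)
    for d in diffs:
        # Each new value is its difference plus the value `period` steps earlier
        values.append(d + values[-period])
    return pd.Series(values[len(history):])


# Toy series with integer-valued floats so the round trip is exact
series = pd.Series([float((i * i) % 7 + i) for i in range(36)])
diffs = series.diff(12).dropna()
history = series.iloc[:12]

recovered = invert_seasonal_difference(history, diffs)
expected = series.iloc[12:].reset_index(drop=True)
print((recovered - expected).abs().max())
```

The returned series excludes the seed values, so it has one entry per difference and can be attached directly to the rows it describes.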

We visualize the results with dashed vertical lines marking the prediction window between 2025 and 2026 for each of our trials.

# Split dataset for visualization (original DV format)
split_idx_vis = int(len(CPI_UK_single) * 0.8)
visualization_data = CPI_UK_single.iloc[split_idx_vis:].copy()

# Plotting the data
plot_data = pd.concat([
    visualization_data[['Date_CPI', 'CPI_Annual_Change']].rename(columns={'CPI_Annual_Change': 'Annual_Change'}),
    extended_test_data_12[['Date_CPI', 'Recovered_CPI_Annual_Change_12']].rename(columns={'Recovered_CPI_Annual_Change_12': 'Annual_Change'}),
    extended_test_data_9[['Date_CPI', 'Recovered_CPI_Annual_Change_9']].rename(columns={'Recovered_CPI_Annual_Change_9': 'Annual_Change'}),
    extended_test_data_6[['Date_CPI', 'Recovered_CPI_Annual_Change_6']].rename(columns={'Recovered_CPI_Annual_Change_6': 'Annual_Change'}),
    extended_test_data_3[['Date_CPI', 'Recovered_CPI_Annual_Change_3']].rename(columns={'Recovered_CPI_Annual_Change_3': 'Annual_Change'})
], keys=['Actual', '12-Month Prediction', '9-Month Prediction', '6-Month Prediction', '3-Month Prediction']).reset_index(level=0).rename(columns={'level_0': 'Type'})


fig = go.Figure()


actual_data = plot_data[plot_data['Type'] == 'Actual']
fig.add_trace(go.Scatter(x=actual_data['Date_CPI'], y=actual_data['Annual_Change'], mode='lines', name='Actual'))


for prediction_type in ['12-Month Prediction', '9-Month Prediction', '6-Month Prediction', '3-Month Prediction']:
    prediction_data = plot_data[plot_data['Type'] == prediction_type]
    fig.add_trace(go.Scatter(x=prediction_data['Date_CPI'], y=prediction_data['Annual_Change'], mode='lines', name=prediction_type, opacity=0.5))


fig.add_shape(
    type="line",
    x0="2025-01-01", y0=0, x1="2025-01-01", y1=1,
    xref='x', yref='paper', opacity=0.5,
    line=dict(color="Black", width=1, dash="dash")
)
fig.add_shape(
    type="line",
    x0="2026-01-01", y0=0, x1="2026-01-01", y1=1,
    xref='x', yref='paper', opacity=0.5,
    line=dict(color="Black", width=1, dash="dash")
)

fig.update_layout(
    height=400,
    width=900,
    title_text="Annual CPI Inflation Rate with Predictions (12-Month, 9-Month, 6-Month, and 3-Month Horizons)",
    xaxis_title="Date",
    yaxis_title="Annual CPI Change",
    legend_title="Forecast",
    plot_bgcolor='white'  
)
fig.update_xaxes(tickangle=45, showgrid=True)
fig.update_yaxes(showgrid=True)


fig.show()

The final plot summarizes our trials, showing the last prediction point generated by the best selected model of each trial for its respective horizon (12, 9, 6 and 3 months).
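
The shaded band in this plot is built by adding and subtracting each trial's test RMSE from its prediction. A minimal sketch of that calculation follows; the prediction and RMSE values are made up for illustration and are not results from the trials.

```python
import pandas as pd

# Made-up predictions and test RMSE values, one row per horizon (illustrative only)
points = pd.DataFrame({
    "horizon_months": [3, 6, 9, 12],
    "prediction": [2.8, 2.6, 2.5, 2.4],
    "rmse": [0.4, 0.6, 0.7, 0.9],
})

# Band of plus or minus one RMSE around each prediction
points["upper"] = points["prediction"] + points["rmse"]
points["lower"] = points["prediction"] - points["rmse"]
print(points)
```

A band of one RMSE gives a rough indication of typical test error at each horizon rather than a formal statistical confidence interval.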

# Extract specific points for the 3rd, 6th, 9th, and 12th month predictions
prediction_points = {
    'last_actual': visualization_data[visualization_data['Date_CPI'] == '2025-01-01T00:00:00.000000000']['CPI_Annual_Change'].values[0],
    '3-Month Prediction': extended_test_data_3[extended_test_data_3['Date_CPI'] == '2025-04-30T00:00:00.000000000']['Recovered_CPI_Annual_Change_3'].values[0],
    '6-Month Prediction': extended_test_data_6[extended_test_data_6['Date_CPI'] == '2025-07-31T00:00:00.000000000']['Recovered_CPI_Annual_Change_6'].values[0],
    '9-Month Prediction': extended_test_data_9[extended_test_data_9['Date_CPI'] == '2025-10-31T00:00:00.000000000']['Recovered_CPI_Annual_Change_9'].values[0],
    '12-Month Prediction': extended_test_data_12[extended_test_data_12['Date_CPI'] == '2026-01-31T00:00:00.000000000']['Recovered_CPI_Annual_Change_12'].values[0]
}


# Create a new DataFrame for these points with RMSE values
prediction_points_df = pd.DataFrame({
    'Date_CPI': pd.to_datetime(['2025-01-01', '2025-04-30', '2025-07-31', '2025-10-31', '2026-01-31']),
    'Annual_Change': list(prediction_points.values()),
    'RMSE': [0] + [best_model_rmse_test_3, best_model_rmse_test_6, best_model_rmse_test_9, best_model_rmse_test_12]
})

# Calculate upper and lower bounds
prediction_points_df['Upper_Bound'] = prediction_points_df['Annual_Change'] + prediction_points_df['RMSE']
prediction_points_df['Lower_Bound'] = prediction_points_df['Annual_Change'] - prediction_points_df['RMSE']

fig = go.Figure()

# Add actual data trace
actual_data = visualization_data[['Date_CPI', 'CPI_Annual_Change']].rename(columns={'CPI_Annual_Change': 'Annual_Change'})
actual_data['Date_CPI'] = pd.to_datetime(actual_data['Date_CPI'])
fig.add_trace(go.Scatter(x=actual_data['Date_CPI'], y=actual_data['Annual_Change'], mode='lines', name='Actual'))


fig.add_trace(go.Scatter(
    x=prediction_points_df['Date_CPI'],
    y=prediction_points_df['Upper_Bound'],
    mode='lines',
    line=dict(width=0),
    name='Upper Bound',
    showlegend=False
))


fig.add_trace(go.Scatter(
    x=prediction_points_df['Date_CPI'],
    y=prediction_points_df['Lower_Bound'],
    mode='lines',
    line=dict(width=0),
    fill='tonexty',
    fillcolor='rgba(255, 0, 0, 0.1)',
    name='Confidence Interval',
    showlegend=False
))


fig.add_trace(go.Scatter(
    x=prediction_points_df['Date_CPI'],
    y=prediction_points_df['Annual_Change'],
    mode='lines+markers+text',
    name='Prediction Points',
    line=dict(color='red', dash='dash'),
    marker=dict(size=10, color='red'),
    text=prediction_points_df['Date_CPI'].dt.strftime('%b'),
    textposition='top center',
    error_y=dict(
        type='data',
        array=prediction_points_df['RMSE'],
        visible=True,
        color='lightgrey', 
    )
))

fig.add_shape(
    type="line",
    x0="2025-01-01", y0=0, x1="2025-01-01", y1=1,
    xref='x', yref='paper', opacity=0.5,
    line=dict(color="Black", width=1, dash="dash")
)
fig.add_shape(
    type="line",
    x0="2026-01-01", y0=0, x1="2026-01-01", y1=1,
    xref='x', yref='paper', opacity=0.5,
    line=dict(color="Black", width=1, dash="dash")
)

fig.update_layout(
    height=600,
    width=1200,
    title_text="CPI Annual Inflation Rate Projection",
    xaxis_title="Date",
    yaxis_title="Annual CPI Change",
    legend_title="Data Type",
    plot_bgcolor='white'
)

fig.update_xaxes(tickangle=45, showgrid=True)
fig.update_yaxes(showgrid=True)

fig.show()
