Consumer price index forecasting
Overview
This notebook presents a pipeline for forecasting the Consumer Price Index (CPI), a key measure of inflation, over the next year. It generates predictions for the year-on-year CPI at 12-, 9-, 6-, and 3-month horizons, with corresponding visualizations to illustrate projected trends.
Clients could use the generated predictions as analytical insight into inflation trends, or as an input when making decisions on setting interest rates.
The dataset used here can be sourced from the UK Office for National Statistics (ONS). Feel free to experiment with other datasets providing inflation measures, should they fit your statistical objective.
Setup
Dependencies
This work uses the following libraries:
- turintech-evoml-client
- pandas
- matplotlib
- plotly
- nbformat
- openpyxl (if using xlsx data)
Credentials
You will also require:
- A URL for an instance of the evoML platform (e.g. https://evoml.ai)
- Your evoML username and password
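As an alternative to typing credentials directly into the notebook, they could be read from environment variables. The sketch below is only an illustration: the function `load_credentials` and the variable names `EVOML_API_URL`, `EVOML_USERNAME` and `EVOML_PASSWORD` are assumptions made for this example, not names defined by the evoML client.

```python
import os


def load_credentials(env=os.environ):
    """Read connection settings from environment variables.

    The variable names are hypothetical; choose whatever fits your setup.
    """
    return {
        "base_url": env.get("EVOML_API_URL", "https://evoml.ai"),
        "username": env.get("EVOML_USERNAME", ""),
        "password": env.get("EVOML_PASSWORD", ""),
    }


# Demonstration with an explicit mapping instead of the real environment
creds = load_credentials({"EVOML_USERNAME": "alice", "EVOML_PASSWORD": "s3cret"})
print(creds["base_url"], creds["username"])
```

The dictionary keys match the keyword arguments passed to `ec.init` in this notebook, so the result could be unpacked as `ec.init(**creds)`.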
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import typing
from typing import Final
import evoml_client as ec
from evoml_client.trial_conf_models import BudgetMode, SplitMethodOptions
import plotly.express as px
import plotly.graph_objects as go
import nbformat as nbf
import math
import openpyxl
from dataclasses import dataclass
API_URL: Final[str] = "https://evoml.ai"
EVOML_USERNAME: Final[str] = ""
EVOML_PASSWORD: Final[str] = ""
# Connect to evoML platform
ec.init(base_url=API_URL, username=EVOML_USERNAME, password=EVOML_PASSWORD)
Data loading and manipulation
We first load the downloaded data and select the relevant sheet (Table 57, in this case) to extract the CPI summary of all items for the period between 1988 and 2025.
We then convert the date column into a datetime format and create a dataframe containing only the time column and the dependent variable.
# Reading the data
xls = pd.ExcelFile("consumer-price-inflation-ONS.xlsx", engine="openpyxl")
CPI_UK = pd.read_excel("consumer-price-inflation-ONS.xlsx", sheet_name="Table 57", skiprows=6, engine="openpyxl")
# Dropping columns
#print(CPI_UK['name'].tail(15)) #The last 14 rows of data are not relevant to the analysis
CPI_UK = CPI_UK.drop(CPI_UK.tail(14).index)
# Converting the time column to datetime format
CPI_UK['name'] = pd.to_datetime(CPI_UK['name'])
CPI_UK['name'] = CPI_UK['name'].dt.strftime('%Y-%m') #Removing the 00:00:00 timestamp from the date
CPI_UK.rename(columns={"name": "Date_CPI"}, inplace=True)
# Creating the df for exploratory analysis
# Take an explicit copy so that later column assignments do not trigger SettingWithCopyWarning
CPI_UK_single = CPI_UK[['Date_CPI', 'CPI ALL ITEMS']].copy()
Dependent variable manipulation
In order to retrieve a valuable estimate of inflation, we:
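- Compute the 12-month rolling inflation rate from the Consumer Price Index, which can be expressed with the following equation:
$$\pi_t = \frac{\text{CPI}_t - \text{CPI}_{t-12}}{\text{CPI}_{t-12} + \epsilon} \times 100$$
where $\pi_t$ is the year-on-year inflation rate (in %) in month $t$. To avoid division by zero, we add a small offset $\epsilon$ to the denominator.
- To move the series closer to stationarity, we apply seasonal differencing at a 12-month lag, which removes annual seasonality and further detrends the data.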
We visualize both results and prepare the final dataframe for analysis.
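The two transformations above can be tried on a toy series before running them on the ONS data. The sketch below uses a synthetic index that grows by 0.5% per month (made-up data, used only for illustration): its year-on-year rate is constant at roughly 6.17%, so the seasonally differenced series is essentially zero.

```python
import pandas as pd

# Synthetic monthly index growing 0.5% per month (illustrative data only)
dates = pd.date_range("2000-01-01", periods=36, freq="MS")
index = pd.Series([100 * 1.005 ** i for i in range(36)], index=dates)

# Year-on-year percentage change, with a small offset in the denominator
epsilon = 1e-10
yoy = (index - index.shift(12)) / (index.shift(12) + epsilon) * 100

# Seasonal differencing: this month's rate minus the rate 12 months earlier
delta_yoy = yoy.diff(12)

print(yoy.dropna().head(3))
print(delta_yoy.dropna().head(3))
```

Note that each step costs 12 observations: the first 12 months have no year-on-year rate, and the following 12 have no seasonal difference.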
# Variable manipulation
# Convert Date_CPI to datetime and sort the dataframe
CPI_UK_single['Date_CPI'] = pd.to_datetime(CPI_UK_single['Date_CPI'])
CPI_UK_single = CPI_UK_single.sort_values(by='Date_CPI')
# Compute the 12-month rolling inflation rate
epsilon = 1e-10
previous_year = CPI_UK_single['CPI ALL ITEMS'].shift(12)  # index level 12 months earlier
CPI_UK_single['CPI_Annual_Change'] = (
    (CPI_UK_single['CPI ALL ITEMS'] - previous_year) / (previous_year + epsilon) * 100
)
CPI_UK_single = CPI_UK_single.dropna()
# Apply seasonal differencing to the data
CPI_UK_single['Delta_CPI_Annual_Change'] = CPI_UK_single['CPI_Annual_Change'].diff(12)
CPI_UK_single = CPI_UK_single.dropna()
# Create the figure
fig = go.Figure()
# Add traces
fig.add_trace(
go.Scatter(x=CPI_UK_single['Date_CPI'],
y=CPI_UK_single['CPI_Annual_Change'],
mode='lines',
name='CPI Annual Change',
line=dict(color='blue'))
)
fig.add_trace(
go.Scatter(x=CPI_UK_single['Date_CPI'],
y=CPI_UK_single['Delta_CPI_Annual_Change'],
mode='lines', name='Delta CPI Annual Change',
line=dict(color='red'))
)
# Determine the y-axis range with a buffer
y_min = min(CPI_UK_single[['CPI_Annual_Change', 'Delta_CPI_Annual_Change']].min()) * 1.1
y_max = max(CPI_UK_single[['CPI_Annual_Change', 'Delta_CPI_Annual_Change']].max()) * 1.1
# Update layout to ensure y-axis does not change when toggling legend items
fig.update_layout(
height=400,
width=900,
title='Annual CPI Inflation Rate (12-Month Change)',
xaxis_title='Year',
yaxis_title='Annual Inflation Rate (%)',
xaxis=dict(tickangle=45, showgrid=False),
yaxis=dict(showgrid=False, fixedrange=True, range=[y_min, y_max]),
legend_title_text='',
plot_bgcolor='white',
paper_bgcolor='white'
)
# Show the figure
fig.show()
# And fetch final dataframe for analysis
CPI_Delta_YoY = CPI_UK_single[['Date_CPI', 'Delta_CPI_Annual_Change']]
# Upload the dataset to evoML
dataset = ec.Dataset.from_pandas(CPI_Delta_YoY, name="CPI_Dataset_Delta")
dataset.put()
dataset.wait()
print(f"Dataset URL: {API_URL}/platform/datasets/view/{dataset.dataset_id}")
Dataset URL: https://evoml.ai/platform/datasets/view/67d97d584082c2ed0965a5f8
Trial configuration
Here we use the client to configure four different trials, with predictive horizons of 12, 9, 6, and 3 months, respectively.
We therefore create a universal workflow that executes this process, generates predictions, and back-transforms these predictions to their original year-on-year inflation scale. For this example, we have kept the window size the same (6) and have chosen regularized regression models, with the aim of ensuring generalization and handling multicollinearity concerns. Finally, the workflow also fetches the best model to be used for our purposes.
- To execute our workflow at the end, we mirror evoML's 80/20 preprocessing split globally. If your dependent variable does not require back-transformation, this step can be skipped.
- Feel free to also move the workflow into a separate module and import it into this notebook, as per best practices.
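The chronological 80/20 split mentioned above can be sketched as a small helper. The function name `chronological_split` is hypothetical, and the assumption that the platform rounds the split index the same way (truncating to an integer) has not been verified here.

```python
import pandas as pd


def chronological_split(df, train_fraction=0.8):
    """Split a time-ordered dataframe without shuffling: first rows train, last rows test."""
    split_idx = int(len(df) * train_fraction)
    return df.iloc[:split_idx].copy(), df.iloc[split_idx:].copy()


# Toy frame with 10 monthly rows (illustrative data only)
frame = pd.DataFrame({
    "Date_CPI": pd.date_range("2020-01-01", periods=10, freq="MS"),
    "value": range(10),
})
train, test = chronological_split(frame)
print(len(train), len(test))
```

Keeping the rows in time order matters here: the last values of the training portion are later used as the starting point for back-transformation.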
# End-to-end workflow: trial configuration, processing, fetching predictions, and back-transforming them for visualization
def config_trial(trial_name, models, dataset_id, target_col, train_percentage=0.8, budget_mode=BudgetMode.fast, loss_funcs=["Root Mean Squared Error"], is_timeseries=True, TimeseriesHorizon=12):
    '''
    Configures and runs a trial with the specified parameters.
    Params:
        dataset_id: str - the dataset ID
        target_col: str - the target column name
        trial_name: str - the name of the trial
        models: list - a list of model names
        train_percentage: float - the percentage of the dataset to use for training
        budget_mode: BudgetMode - the budget mode
        loss_funcs: list - a list of loss functions
        is_timeseries: bool - whether the dataset is a time series
        TimeseriesHorizon: int - the time series horizon (default is 12)
    Returns:
        trial: Trial - the trial object
        best_model: Model - the best model object
    '''
    try:
        config = ec.TrialConfig.with_models(
            models=models,
            task=ec.MlTask.regression,
            budget_mode=budget_mode,
            loss_funcs=loss_funcs,
            dataset_id=dataset_id,
            is_timeseries=is_timeseries,
        )
        # Same window size for every trial; only the horizon varies
        config.options.timeSeriesWindowSize = 6
        config.options.timeSeriesHorizon = TimeseriesHorizon
        config.options.splittingMethodOptions = SplitMethodOptions(method="percentage", trainPercentage=train_percentage)
        config.options.enableBudgetTuning = False
        trial, _ = ec.Trial.from_dataset_id(
            dataset_id,
            target_col=target_col,
            trial_name=trial_name,
            config=config,
        )
        trial.run(timeout=900)
        best_model = trial.get_best()
        best_model.build_model()
        return trial, best_model
    except Exception as e:
        print(f"An error occurred while building the trial: {e}")
        return None, None
@dataclass
class TrialResult:
    metrics_df: typing.Any  # metrics table returned by trial.get_metrics_dataframe()
    best_model_name: str
    best_model_mse_test: float
    best_model_rmse_test: float
def process_trial(trial: ec.Trial, trial_number: int) -> typing.Optional[TrialResult]:
    '''
    Params:
        trial: Trial - the trial object
        trial_number: int - the trial number
    Returns: TrialResult - an instance of TrialResult containing the results, or None if the trial failed
    '''
    if trial:
        metrics_df = trial.get_metrics_dataframe()
        best_model = trial.get_best()
        model_rep_dict = best_model.model_rep.__dict__
        best_model_name = model_rep_dict.get('name')
        best_model_mse_test = model_rep_dict.get('metrics', {}).get('regression-mse', {}).get('test', {}).get('average')
        best_model_rmse_test = math.sqrt(best_model_mse_test)
        print(f"Best Model Name ({trial_number}): {best_model_name}")
        print(f"Best Model MSE (Test) ({trial_number}): {best_model_mse_test}")
        print(f"Best Model RMSE (Test) ({trial_number}): {best_model_rmse_test}")
        return TrialResult(
            metrics_df=metrics_df,
            best_model_name=best_model_name,
            best_model_mse_test=best_model_mse_test,
            best_model_rmse_test=best_model_rmse_test
        )
    else:
        print(f"Trial {trial_number} unsuccessful.")
        return None
def extend_test_data_and_get_predictions(test_data, model, periods):
    '''
    Extends the test data by adding new dates and generates predictions using the model.
    Params:
        test_data: pd.DataFrame - the test data
        model: Model - the trained model
        periods: int - the number of periods to extend
    Returns:
        extended_test_data: pd.DataFrame - the extended test data
        predictions: pd.Series - the model predictions
    '''
    last_date = test_data['Date_CPI'].max()
    # freq='M' yields month-end labels; newer pandas versions prefer the alias 'ME'
    new_dates = pd.date_range(start=last_date + pd.DateOffset(months=1), periods=periods, freq='M')
    new_entries = pd.DataFrame({
        'Date_CPI': new_dates,
        'Delta_CPI_Annual_Change': [0] * len(new_dates)  # placeholder target for the future rows
    })
    # DataFrame.append was removed in pandas 2.0, so we concatenate instead
    extended_test_data = pd.concat([test_data, new_entries], ignore_index=True)
    predictions = pd.Series(model.predict(data=extended_test_data), index=extended_test_data.index)
    return extended_test_data, predictions
def back_transformed_predictions(last_data, predictions, period=12):
    '''
    Back-transforms the predictions to the original scale.
    Params:
        last_data: pd.Series - the last 'period' data points from the original series
        predictions: pd.Series - the model predictions
        period: int - the period of the time series
    Returns:
        reversed_predictions: pd.Series - the back-transformed predictions
    '''
    extended_predictions = pd.concat([last_data, predictions], ignore_index=True)
    reversed_predictions = extended_predictions.copy()
    # Add each predicted difference to the (recovered) value one period earlier
    for t in range(period, len(extended_predictions)):
        reversed_predictions[t] = extended_predictions[t] + reversed_predictions[t-period]
    return reversed_predictions
# --- Workflow preparation steps ---
split_idx_test = int(len(CPI_Delta_YoY) * 0.8)
test_data = CPI_Delta_YoY.iloc[split_idx_test:].copy() # Last 20%
train_data = CPI_Delta_YoY.iloc[:split_idx_test].copy() # First 80%
# Retrieve last 12 months of data for back-transformation
slice_length = len(train_data)
CPI_UK_train = CPI_UK_single.iloc[:slice_length]
last_12 = CPI_UK_train.tail(12)['CPI_Annual_Change'].reset_index(drop=True)
# --- Workflow ---
def run_workflow(trial_name, models, dataset_id, target_col, train_percentage, budget_mode, loss_funcs, is_timeseries, TimeseriesHorizon, test_data, last_12):
    results = {}
    # Configure the trial
    trial, best_model = config_trial(
        trial_name=trial_name,
        models=models,
        dataset_id=dataset_id,
        target_col=target_col,
        train_percentage=train_percentage,
        budget_mode=budget_mode,
        loss_funcs=loss_funcs,
        is_timeseries=is_timeseries,
        TimeseriesHorizon=TimeseriesHorizon
    )
    # Process the trial
    trial_result = process_trial(trial, TimeseriesHorizon)
    if trial_result:
        results[trial_name] = trial_result
        # Get predictions
        extended_test_data, predictions = extend_test_data_and_get_predictions(test_data, best_model, TimeseriesHorizon)
        # Back-transform them to their original scale
        back_transformed_preds = back_transformed_predictions(last_12, predictions, period=12)
        # Print metrics for the best model
        best_model_rmse_test = results[trial_name].best_model_rmse_test
        print(f"Best Model RMSE Test for {trial_name}: {best_model_rmse_test}")
        # Attach as column for further visualization
        extended_test_data[f'Recovered_CPI_Annual_Change_{TimeseriesHorizon}'] = back_transformed_preds
        return extended_test_data, results, best_model_rmse_test
    else:
        # Return three values so that callers unpacking three results do not fail
        return None, None, None
extended_test_data_12, results_12, best_model_rmse_test_12 = run_workflow(
trial_name='Inflation_12',
models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
dataset_id=dataset.dataset_id,
target_col='Delta_CPI_Annual_Change',
train_percentage=0.8,
budget_mode=BudgetMode.fast,
loss_funcs=['Root Mean Squared Error'],
is_timeseries=True,
TimeseriesHorizon=12,
test_data=test_data,
last_12=last_12
)
extended_test_data_9, results_9, best_model_rmse_test_9 = run_workflow(
trial_name='Inflation_9',
models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
dataset_id=dataset.dataset_id,
target_col='Delta_CPI_Annual_Change',
train_percentage=0.8,
budget_mode=BudgetMode.fast,
loss_funcs=['Root Mean Squared Error'],
is_timeseries=True,
TimeseriesHorizon=9,
test_data=test_data,
last_12=last_12
)
extended_test_data_6, results_6, best_model_rmse_test_6 = run_workflow(
trial_name='Inflation_6',
models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
dataset_id=dataset.dataset_id,
target_col='Delta_CPI_Annual_Change',
train_percentage=0.8,
budget_mode=BudgetMode.fast,
loss_funcs=['Root Mean Squared Error'],
is_timeseries=True,
TimeseriesHorizon=6,
test_data=test_data,
last_12=last_12
)
extended_test_data_3, results_3, best_model_rmse_test_3 = run_workflow(
trial_name='Inflation_3',
models=['ridge_regressor', 'lasso_regressor', 'elastic_net_regressor'],
dataset_id=dataset.dataset_id,
target_col='Delta_CPI_Annual_Change',
train_percentage=0.8,
budget_mode=BudgetMode.fast,
loss_funcs=['Root Mean Squared Error'],
is_timeseries=True,
TimeseriesHorizon=3,
test_data=test_data,
last_12=last_12
)
Defining and processing trials
We will now create our four trials with varying predictive horizons and extract their metrics. We recommend running these one after the other, as running them simultaneously can be quite time-consuming.
Retrieving predictions
After we have created our four trials, we manually split the data to generate predictions on the test set. We also generate a similar split for the original year-on-year inflation rate variable to visualize our actual values.
Next, we extend our test dataframes with a time window representing the desired period for forecasting and generate predictions.
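Extending a dataframe into the future can be sketched as follows. The helper `extend_with_future_months` is a hypothetical name, the toy values are made up, and the sketch uses month-start labels (`freq="MS"`) as an assumption, whereas the notebook's own function uses month-end labels.

```python
import pandas as pd


def extend_with_future_months(df, date_col, value_col, periods):
    """Append `periods` future monthly rows with a placeholder value of 0.0."""
    last_date = df[date_col].max()
    future_dates = pd.date_range(
        start=last_date + pd.DateOffset(months=1), periods=periods, freq="MS"
    )
    placeholder = pd.DataFrame({date_col: future_dates, value_col: 0.0})
    return pd.concat([df, placeholder], ignore_index=True)


# Toy history ending in January 2025 (illustrative values only)
history = pd.DataFrame({
    "Date_CPI": pd.to_datetime(["2024-11-01", "2024-12-01", "2025-01-01"]),
    "Delta_CPI_Annual_Change": [0.1, -0.2, 0.3],
})
extended = extend_with_future_months(history, "Date_CPI", "Delta_CPI_Annual_Change", 3)
print(extended.tail(3))
```

The placeholder values only reserve rows for the forecast dates; they are not observations.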
Dependent variable transformation
Finally, we transform the seasonally differenced predictions back to their original scale: we take the last 12 values before the start of the test set and add each predicted annual difference to the value 12 months earlier. This yields a year-on-year inflation rate that is interpretable and aligned with our visualization data. This can also be expressed by the equation below:
$$\hat{\pi}_t = \hat{\Delta}_t + \hat{\pi}_{t-12}$$
where $\hat{\Delta}_t$ is the predicted seasonal difference for month $t$ and $\hat{\pi}_{t-12}$ is the year-on-year rate 12 months earlier (an observed value for the first 12 steps, a previously recovered value afterwards).
We visualize the results, marking the prediction window between 2025 and 2026 for each of our trials.
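A quick round trip on toy data can confirm that adding each difference to the value 12 steps earlier undoes seasonal differencing. In the sketch below the function `invert_seasonal_difference` is a hypothetical helper and the series is made up; unlike the notebook's `back_transformed_predictions`, it returns only the recovered values and not the seed values in front of them.

```python
import pandas as pd


def invert_seasonal_difference(seed, deltas, period=12):
    """Rebuild levels from seasonal differences, given the last `period` known levels."""
    values = list(seed)
    for delta in deltas:
        # The level `period` steps before the new point is always values[-period]
        values.append(delta + values[-period])
    return pd.Series(values[len(seed):])


# Toy series of 36 integer-valued levels (illustrative data only)
levels = pd.Series([float((i * i) % 17 + i) for i in range(36)])
deltas = levels.diff(12).dropna()   # differences for positions 12..35
seed = levels.iloc[:12]             # the 12 levels preceding the first difference

recovered = invert_seasonal_difference(seed, deltas)
print(recovered.tolist() == levels.iloc[12:].tolist())
```

With exact differences the original levels are recovered; with model predictions in place of `deltas`, prediction errors carry forward to points more than 12 steps ahead, because those points build on previously recovered values.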
# Split dataset for visualization (original DV format)
split_idx_vis = int(len(CPI_UK_single) * 0.8)
visualization_data = CPI_UK_single.iloc[split_idx_vis:].copy()
# Plotting the data
plot_data = pd.concat([
visualization_data[['Date_CPI', 'CPI_Annual_Change']].rename(columns={'CPI_Annual_Change': 'Annual_Change'}),
extended_test_data_12[['Date_CPI', 'Recovered_CPI_Annual_Change_12']].rename(columns={'Recovered_CPI_Annual_Change_12': 'Annual_Change'}),
extended_test_data_9[['Date_CPI', 'Recovered_CPI_Annual_Change_9']].rename(columns={'Recovered_CPI_Annual_Change_9': 'Annual_Change'}),
extended_test_data_6[['Date_CPI', 'Recovered_CPI_Annual_Change_6']].rename(columns={'Recovered_CPI_Annual_Change_6': 'Annual_Change'}),
extended_test_data_3[['Date_CPI', 'Recovered_CPI_Annual_Change_3']].rename(columns={'Recovered_CPI_Annual_Change_3': 'Annual_Change'})
], keys=['Actual', '12-Month Prediction', '9-Month Prediction', '6-Month Prediction', '3-Month Prediction']).reset_index(level=0).rename(columns={'level_0': 'Type'})
fig = go.Figure()
actual_data = plot_data[plot_data['Type'] == 'Actual']
fig.add_trace(go.Scatter(x=actual_data['Date_CPI'], y=actual_data['Annual_Change'], mode='lines', name='Actual'))
for prediction_type in ['12-Month Prediction', '9-Month Prediction', '6-Month Prediction', '3-Month Prediction']:
    prediction_data = plot_data[plot_data['Type'] == prediction_type]
    fig.add_trace(go.Scatter(x=prediction_data['Date_CPI'], y=prediction_data['Annual_Change'], mode='lines', name=prediction_type, opacity=0.5))
fig.add_shape(
type="line",
x0="2025-01-01", y0=0, x1="2025-01-01", y1=1,
xref='x', yref='paper', opacity=0.5,
line=dict(color="Black", width=1, dash="dash")
)
fig.add_shape(
type="line",
x0="2026-01-01", y0=0, x1="2026-01-01", y1=1,
xref='x', yref='paper', opacity=0.5,
line=dict(color="Black", width=1, dash="dash")
)
fig.update_layout(
height=400,
width=900,
title_text="Annual CPI Inflation Rate with Predictions (12-Month, 9-Month, 6-Month, and 3-Month Horizons)",
xaxis_title="Date",
yaxis_title="Annual CPI Change",
legend_title="Forecast",
plot_bgcolor='white'
)
fig.update_xaxes(tickangle=45, showgrid=True)
fig.update_yaxes(showgrid=True)
fig.show()
The final plot summarizes our trials, showing the last prediction point that each of the best selected models generates for its respective horizon (12, 9, 6, and 3 months).
# Extract specific points for the 3rd, 6th, 9th, and 12th month predictions
prediction_points = {
'last_actual': visualization_data[visualization_data['Date_CPI'] == '2025-01-01T00:00:00.000000000']['CPI_Annual_Change'].values[0],
'3-Month Prediction': extended_test_data_3[extended_test_data_3['Date_CPI'] == '2025-04-30T00:00:00.000000000']['Recovered_CPI_Annual_Change_3'].values[0],
'6-Month Prediction': extended_test_data_6[extended_test_data_6['Date_CPI'] == '2025-07-31T00:00:00.000000000']['Recovered_CPI_Annual_Change_6'].values[0],
'9-Month Prediction': extended_test_data_9[extended_test_data_9['Date_CPI'] == '2025-10-31T00:00:00.000000000']['Recovered_CPI_Annual_Change_9'].values[0],
'12-Month Prediction': extended_test_data_12[extended_test_data_12['Date_CPI'] == '2026-01-31T00:00:00.000000000']['Recovered_CPI_Annual_Change_12'].values[0]
}
# Create a new DataFrame for these points with RMSE values
prediction_points_df = pd.DataFrame({
'Date_CPI': pd.to_datetime(['2025-01-01', '2025-04-30', '2025-07-31', '2025-10-31', '2026-01-31']),
'Annual_Change': list(prediction_points.values()),
'RMSE': [0] + [best_model_rmse_test_3, best_model_rmse_test_6, best_model_rmse_test_9, best_model_rmse_test_12]
})
# Calculate upper and lower bounds
prediction_points_df['Upper_Bound'] = prediction_points_df['Annual_Change'] + prediction_points_df['RMSE']
prediction_points_df['Lower_Bound'] = prediction_points_df['Annual_Change'] - prediction_points_df['RMSE']
fig = go.Figure()
# Add actual data trace
actual_data = visualization_data[['Date_CPI', 'CPI_Annual_Change']].rename(columns={'CPI_Annual_Change': 'Annual_Change'})
actual_data['Date_CPI'] = pd.to_datetime(actual_data['Date_CPI'])
fig.add_trace(go.Scatter(x=actual_data['Date_CPI'], y=actual_data['Annual_Change'], mode='lines', name='Actual'))
fig.add_trace(go.Scatter(
x=prediction_points_df['Date_CPI'],
y=prediction_points_df['Upper_Bound'],
mode='lines',
line=dict(width=0),
name='Upper Bound',
showlegend=False
))
fig.add_trace(go.Scatter(
x=prediction_points_df['Date_CPI'],
y=prediction_points_df['Lower_Bound'],
mode='lines',
line=dict(width=0),
fill='tonexty',
fillcolor='rgba(255, 0, 0, 0.1)',
name='Confidence Interval',
showlegend=False
))
fig.add_trace(go.Scatter(
x=prediction_points_df['Date_CPI'],
y=prediction_points_df['Annual_Change'],
mode='lines+markers+text',
name='Prediction Points',
line=dict(color='red', dash='dash'),
marker=dict(size=10, color='red'),
text=prediction_points_df['Date_CPI'].dt.strftime('%b'),
textposition='top center',
error_y=dict(
type='data',
array=prediction_points_df['RMSE'],
visible=True,
color='lightgrey',
)
))
fig.add_shape(
type="line",
x0="2025-01-01", y0=0, x1="2025-01-01", y1=1,
xref='x', yref='paper', opacity=0.5,
line=dict(color="Black", width=1, dash="dash")
)
fig.add_shape(
type="line",
x0="2026-01-01", y0=0, x1="2026-01-01", y1=1,
xref='x', yref='paper', opacity=0.5,
line=dict(color="Black", width=1, dash="dash")
)
fig.update_layout(
height=600,
width=1200,
title_text="CPI Annual Inflation Rate Projection",
xaxis_title="Date",
yaxis_title="Annual CPI Change",
legend_title="Data Type",
plot_bgcolor='white'
)
fig.update_xaxes(tickangle=45, showgrid=True)
fig.update_yaxes(showgrid=True)
fig.show()