Household Power Consumption
Overview
In this notebook, we will give an example of how you can use evoML to build a predictive model for future daily household power consumption. We will be using the daily cumulative power consumption of 955 households in Germany over a period from November 2017 to October 2020. In this example we will be forecasting the mean daily consumption across all households.
This notebook can also be viewed at https://nbviewer.org/url/docs.evoml.ai/notebooks/predicting-power-consumption.ipynb.
Setup
Dependencies
This notebook uses the following dependencies:
- turintech-evoml-client
- pandas
- matplotlib
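These can be installed with pip; for example (a sketch, using the package names from the list above):
%pip install turintech-evoml-client pandas matplotlib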
Credentials and Dataset
You will also require:
- A URL for an instance of the evoML platform (e.g. https://evoml.ai).
- Your evoML username and password.
- After creating an IEEE account, the all-users-daily-data.csv dataset is freely available at: https://ieee-dataport.org/open-access/gem-house-opendata-german-electricity-consumption-many-households-over-three-years-2018.
import os
from typing import Final

import pandas as pd
import matplotlib.pyplot as plt

import evoml_client as ec
from evoml_client.trial_conf_models import BudgetMode
# Load credentials from environment variables (set EVOML_USERNAME and EVOML_PASSWORD before running)
API_URL: Final[str] = "https://evoml.ai"
EVOML_USERNAME: Final[str] = os.environ.get("EVOML_USERNAME", "")
EVOML_PASSWORD: Final[str] = os.environ.get("EVOML_PASSWORD", "")
# Connect to the evoML platform
ec.init(base_url=API_URL, username=EVOML_USERNAME, password=EVOML_PASSWORD)
Data Preparation
Mean Daily Energy Consumption
After downloading the all-users-daily-data.csv dataset, we will load it into a pandas DataFrame and convert the raw meter readings to kilowatt-hours (kWh). We will then calculate the daily power consumption of each household and resample the data to ensure daily frequency. The data is stored in a narrow format and the households are distinguished by the userId column.
cumulative_energy = pd.read_csv("all-users-daily-data.csv")
cumulative_energy["energy"] *= 1e-10 # Convert from raw meter readings to kWh
# Function to standardize time index and compute daily energy
def calculate_daily_energy(df):
return (
df.assign(date=pd.to_datetime(df['date'])) # Ensure date column is in datetime format
.set_index('date') # Set date column as the index
.resample('D') # Resample to ensure daily frequency
.asfreq() # Fill missing dates with NaNs
.assign(daily_energy=lambda x: x['energy'].diff()) # Compute daily energy difference
)
# Apply the function groupwise and standardize the time index
daily_energy = cumulative_energy.groupby('userId').apply(calculate_daily_energy)
# Group by date and calculate the mean daily_energy
mean_daily_energy = daily_energy.pivot_table(values='daily_energy', index='date', aggfunc='mean')
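Before exploring the data, a quick sanity check is worthwhile: resampling to daily frequency and differencing both introduce NaNs. A minimal sketch of such a check:
# Optional sanity checks on the prepared series (a sketch, not required for the workflow)
print(mean_daily_energy.shape)  # expect roughly one row per calendar day
print(mean_daily_energy["daily_energy"].isna().sum())  # NaNs left over from resampling/differencing
print(mean_daily_energy.head())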
Exploratory Data Analysis
To get some idea of what's going on in our data, we will now visualize the mean daily power consumption.
mean_daily_energy["daily_energy"].plot(figsize=(15, 6), linestyle='-')
plt.title('Mean Daily Energy Consumption Across All Users Over Time')
plt.xlabel('Date')
plt.ylabel('Mean Daily Energy Consumption (kWh)')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
Immediately we can see that the daily power consumption follows a slow yearly cycle, with a lower-amplitude weekly cycle superimposed. This will be useful when setting up a trial.
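One way to back up this observation quantitatively is to check the autocorrelation of the series at the relevant lags. This is an optional sketch, not required for the evoML workflow:
# Optional: check autocorrelation at daily, weekly, and yearly lags (sketch)
for lag in (1, 7, 365):
    corr = mean_daily_energy["daily_energy"].autocorr(lag=lag)
    print(f"lag {lag:>3} days: autocorrelation = {corr:.3f}")
A pronounced value at lag 7 relative to its neighbours would confirm the weekly cycle.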
Upload to evoML
To prepare a trial, we first need to upload a dataset. We will perform this task from within the current notebook using the evoML client.
# Upload the dataset
dataset = ec.Dataset.from_pandas(mean_daily_energy.reset_index(), name="Mean Daily Energy Consumption")
dataset.put()
dataset.wait()
print(f"Dataset URL: {API_URL}/platform/datasets/view/{dataset.dataset_id}")
Running a Trial
With our dataset uploaded, we can now set up a trial to forecast the mean daily power consumption. We first create a configuration, then create a trial object and run it.
config = ec.TrialConfig.with_default(
task=ec.MlTask.regression,
budget_mode=BudgetMode.fast,
loss_funcs=["R2"],
dataset_id=dataset.dataset_id,
is_timeseries=True,
)
config.options.timeSeriesWindowSize = 7  # one-week window, matching the weekly cycle seen in the EDA
config.options.enableBudgetTuning = False
trial, dataset = ec.Trial.from_dataset_id(
dataset.dataset_id,
target_col="daily_energy",
trial_name="Forecast - Daily Energy",
config=config,
)
trial.run(timeout=900)
metrics = trial.get_metrics_dataframe()
selected_metrics = metrics.loc[:, pd.IndexSlice["regression-r2", ["validation", "test"]]]
selected_metrics.sort_values(by=("regression-r2", "validation"), ascending=False)
| Model | Validation R2 | Test R2 |
|---|---|---|
| linear_regressor-7dc3d | 0.845 | 0.817 |
| lightgbm_regressor-424fa | 0.834 | 0.831 |
| decision_tree_regressor-5a883 | 0.761 | -0.787 |
We see that on the validation set the best model was a linear model, which achieved an R2 score of 0.845. Let's get the pipeline for this model and use it to predict the mean daily power consumption for the last 4 weeks of data.
Interacting with Pipelines
Building the Pipeline
After a trial completes, we can retrieve the best pipeline and use it to make online predictions.
# Retrieve the best model according to the specified objective function and validation set
best_model = trial.get_best()
# Build the model so that we can use it for prediction
best_model.build_model()
Visualizing Predictions
Now we create the test set, generate predictions, and plot the results.
# Select the last n_days of data
n_days = 7 * 4  # four weeks
test_data = mean_daily_energy.iloc[-n_days:]
# Predict the mean daily energy
predicted_mean_daily_energy = pd.Series(best_model.predict(data=test_data.reset_index()), index=test_data.index)
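Before plotting, we can quantify the fit on this window. The sketch below uses scikit-learn's r2_score; note that scikit-learn is an extra dependency not listed in the setup, and the check assumes no NaNs remain in the last 4 weeks of data:
# Optional: score the predictions on the 4-week window (requires scikit-learn)
from sklearn.metrics import r2_score

window_r2 = r2_score(test_data["daily_energy"], predicted_mean_daily_energy)
print(f"R2 over the last {n_days} days: {window_r2:.3f}")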
# Plot the actual and predicted mean daily energy without auto-legend
ax = test_data["daily_energy"].plot(
    figsize=(15, 6),
    linestyle='-',
    label='Actual',
    legend=False
)
predicted_mean_daily_energy.plot(
ax=ax,
linestyle='--',
label='Predicted',
legend=False
)
ax.legend(['Actual', 'Predicted'])
plt.title('Mean Daily Energy Consumption: Actual vs Predicted')
plt.xlabel('Date')
plt.ylabel('Mean Daily Energy Consumption (kWh)')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
The predicted values generally align with the trend of actual energy consumption but do not fully capture sharp peaks and troughs, such as those around the 11th, 19th, and 25th. This suggests the model smooths the predictions and has limited sensitivity to sudden daily variability.
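To see where the model misses most, one quick follow-up is to inspect the largest residuals. A minimal sketch:
# Optional: find the dates with the largest prediction errors (sketch)
residuals = test_data["daily_energy"] - predicted_mean_daily_energy
print(residuals.abs().nlargest(5))  # the biggest misses should line up with the sharp peaks and troughs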