Intro to MLOps: Hyperparameter Tuning

Introduction

While the ingredients of a recipe play an important role, the instructions are just as important. Whether you bake a batch of cookies at 160°C for 20 minutes or at 180°C for 12 minutes can make a huge difference with the same ingredients.
 
So what does this have to do with machine learning (ML)? Well, in ML, the data, the preprocessing, and the model selection play an important role. But the model’s hyperparameters can significantly impact your ML model’s performance as well.
 
However, choosing the right hyperparameters for an ML model can be time-consuming. This article aims to give you an overview of what hyperparameters are, why it is important to tune them, how to tune them, and three different algorithms to automate hyperparameter optimization.

This is the second article in a small series of articles related to MLOps. Be sure to read the first article about Experiment Tracking in Machine Learning.
 
Let’s get started.

What are Hyperparameters in Machine Learning?

Hyperparameters are parameters that control the learning process. In contrast to other parameters, e.g., model weights, hyperparameters are not learned during the training process. Instead, you set them before training an ML model.
 
An example of a hyperparameter is the learning rate for training a neural network. The learning rate determines the step size at which the optimizer updates the model weights during training. With a larger learning rate, the model converges faster, but it may also overshoot the optimal solution. With a lower learning rate, training may take longer to converge, but the model can find a better solution.
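 
To make this concrete, here is a minimal, framework-agnostic sketch of a single gradient-descent update; the weights, gradients, and learning rate values are made up for illustration. The learning rate simply scales how far the weights move along the negative gradient:

```python
# Minimal sketch of one gradient-descent step: the learning rate scales
# how far the weights move in the direction of the negative gradient.
def sgd_step(weights, gradients, learning_rate=1e-3):
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

# The same gradient produces a much larger step with a larger learning rate.
print(sgd_step([0.5, -0.2], [0.1, 0.3], learning_rate=1e-1))  # ≈ [0.49, -0.23]
print(sgd_step([0.5, -0.2], [0.1, 0.3], learning_rate=1e-3))  # ≈ [0.4999, -0.2003]
```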

What is Hyperparameter Optimization in Machine Learning?

Hyperparameter optimization or hyperparameter tuning is the process of finding the best hyperparameters for an ML model. This is done by evaluating different sets of hyperparameter values from a specified search space to identify the best combination.
 
In the example of the learning rate, hyperparameter optimization aims to find a value that reaches the best solution in a given time frame, essentially finding the best trade-off between the advantages and disadvantages of smaller and larger learning rate values.
 
Optimizing hyperparameters is essential because it can significantly impact a model’s performance. Different hyperparameter values result in different model performances.

How Do You Optimize Hyperparameters?

You can optimize hyperparameters manually or automatically. You can manually search for the best set of hyperparameters based on intuition and experience and through trial and error. Or you can use algorithms to automate this task for you. And automation is far more popular.
 
Before you begin with the hyperparameter tuning process, you need to define the following:
  • A set of hyperparameters you want to optimize (e.g., learning rate)
  • A search space for each hyperparameter, either as specific values (e.g., 1e-3, 1e-4, and 1e-5) or as a value range (e.g., between 1e-5 and 1e-3)
  • A performance metric to optimize (e.g., validation accuracy)
  • The number of trial runs (depending on the type of hyperparameter optimization, this can be implicit instead of explicit)
Before starting with automated hyperparameter optimization, you need to specify all of the above. But you can adjust the hyperparameter search space and the number of trial runs during manual hyperparameter tuning.
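 
As a minimal sketch, these inputs could be collected in a plain Python dictionary before handing them to whatever tuning tool you use; the hyperparameter names, ranges, metric, and run count below are example values, not recommendations:

```python
# Hypothetical search definition; all values are placeholders for illustration.
search_definition = {
    "hyperparameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},    # value range
        "epochs": {"values": [5, 10, 15]},              # fixed set of values
    },
    "metric": {"name": "val_acc", "goal": "maximize"},  # performance metric to optimize
    "max_runs": 20,                                     # number of trial runs
}
```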
 
The generic steps for hyperparameter tuning are:
  • Select a set of hyperparameter values to evaluate
  • Run an ML experiment for the selected set of hyperparameters and their values, and evaluate and log its performance metric.
  • Repeat for the specified number of trial runs or until you are happy with the model’s performance
Depending on whether you manually conduct these steps or automate them, we talk about manual or automated hyperparameter optimization.
 
After this process, you will end up with a list of experiments, including their hyperparameters and performance metrics. An automated hyperparameter optimization algorithm returns the experiment with the best performance metric and the respective hyperparameter values.
 
During or after this process, you can compare the experiments’ performance metrics and choose the set of hyperparameter values that resulted in the best performance metric.
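 
In code, this generic loop looks roughly like the following sketch. The run_experiment function and the candidate hyperparameter sets are placeholders; how the candidates are chosen is exactly what distinguishes the algorithms discussed below:

```python
import random

# Hypothetical stand-in for training and evaluating a model; in practice this
# would train the model with the given hyperparameters and return the metric.
def run_experiment(hyperparameters):
    return random.random()  # placeholder performance metric (e.g., val_acc)

# Example candidate sets (placeholder values).
candidates = [
    {"learning_rate": 1e-3, "epochs": 10},
    {"learning_rate": 1e-4, "epochs": 15},
]

results = []
for hyperparameters in candidates:
    val_acc = run_experiment(hyperparameters)   # run and evaluate the experiment
    results.append((val_acc, hyperparameters))  # log its performance metric

# Return the experiment with the best performance metric.
best_val_acc, best_hyperparameters = max(results, key=lambda r: r[0])
```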

Methods for Automated Hyperparameter Optimization

The three main algorithms used in automated hyperparameter optimization are:
  • Grid Search
  • Random Search
  • Bayesian Optimization
The main difference between the three algorithms is how they select the set of hyperparameter values to test next. But they are also different in how you define the search space (fixed values vs. value ranges) and how you specify the number of runs (implicit vs. explicit).
This section will explore these differences and their advantages and disadvantages.
I will use W&B Sweeps to optimize the hyperparameters epochs and learning_rate in the following. For more details, you can check out my related Kaggle Notebook and W&B project.

Grid Search

Grid search is a hyperparameter tuning technique that evaluates all possible hyperparameter combinations in a specified grid (Cartesian product). It is a brute-force approach recommended only for ML models with few hyperparameters.

Inputs

  • A set of hyperparameters you want to optimize
  • A discretized search space for each hyperparameter, given as a fixed set of specific values
  • A performance metric to optimize
  • (Implicit number of runs: Because the search space is a fixed set of values, you don’t have to specify the number of experiments to run)
(The inputs that differ from random search and Bayesian optimization are the discretized search space and the implicit number of runs.)
A popular way to implement grid search in Python is to use GridSearchCV from the scikit-learn library. Alternatively, as shown below, you can set up a grid search for hyperparameter tuning with W&B:
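The exact sweep from the notebook is not reproduced here, but a minimal sketch looks like the following; the hyperparameter values, the project name, and the train function are placeholder assumptions:

```python
import wandb

# Placeholder training function: it reads the hyperparameters from the sweep
# config, trains the model, and logs the metric the sweep optimizes.
def train():
    with wandb.init() as run:
        # ... train the model with run.config.learning_rate and run.config.epochs ...
        run.log({"val_acc": 0.0})  # placeholder value

sweep_config = {
    "method": "grid",  # evaluate every combination in the grid
    "metric": {"name": "val_acc", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-3, 1e-4, 1e-5]},  # example values
        "epochs": {"values": [5, 10, 15]},                # example values
    },
}

sweep_id = wandb.sweep(sweep_config, project="hyperparameter-tuning")  # example project name
wandb.agent(sweep_id, function=train)  # runs until the grid is exhausted
```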

Steps

Step 1: The grid search algorithm selects a set of hyperparameter values to evaluate by creating a grid (Cartesian product) of all possible combinations of the specified hyperparameter values. Then it simply iterates over this grid, which makes it an exhaustive, brute-force search.
 
Below, you can see the resulting grid for our example.
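 
As a sketch, the same kind of grid can be constructed with itertools.product; the value lists are example assumptions, not the exact values from the notebook:

```python
from itertools import product

# Example discretized search space (values chosen for illustration).
learning_rates = [1e-3, 1e-4, 1e-5]
epochs = [5, 10, 15]

# Cartesian product: every combination of the specified values (3 x 3 = 9 runs).
grid = [{"learning_rate": lr, "epochs": e} for lr, e in product(learning_rates, epochs)]
print(len(grid))  # 9
```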
Step 2: Run an ML experiment for the selected set of hyperparameters and their values, and evaluate and log its performance metric.
 
Step 3: Repeat for the specified number of trial runs or until you are happy with the model’s performance

Output

As with all automated hyperparameter optimization algorithms, Grid Search returns the experiment with the best performance metric and the respective hyperparameter values.
 
Below, you can see which hyperparameter values the optimization algorithm chose at which point in time and the resulting performance. You can make the following observations:
  • The grid search algorithm iterates over the grid of hyperparameter sets as specified.
  • Since grid search is an uninformed search algorithm, the resulting performance doesn’t show a trend over the runs.
  • The best val_acc score is 0.9902.

Advantages

  • Simple to implement
  • Can be parallelized: because the hyperparameter sets can be evaluated independently

Disadvantages

  • Not suitable for models with many hyperparameters: this is largely because the computational cost grows exponentially with the number of hyperparameters
  • Uninformed search because knowledge from previous experiments is not leveraged. You may want to run the grid search algorithm several times with a fine-tuned search space to achieve good results.
Grid search is therefore generally recommended only if you have three or fewer hyperparameters to tune.

Random Search

Random search is a hyperparameter tuning technique that randomly samples values from a specified search space. It is more effective than grid search for ML models with many hyperparameters of which only a few affect the model's performance [1].

Inputs

  • A set of hyperparameters you want to optimize
  • A continuous search space for each hyperparameter as a value range
  • A performance metric to optimize
  • Explicit number of runs: Because the search space is continuous, you must manually stop the search or define a maximum number of runs.
The inputs that differ from grid search are the continuous search space and the explicit number of runs.
 
A popular way to implement random search in Python is to use RandomizedSearchCV from the scikit-learn library. Alternatively, as shown below, you can set up a random search for hyperparameter tuning with W&B:
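Compared to the grid search sweep, the method changes to random and the search space becomes a range; the distributions, bounds, run count, and project name below are placeholder assumptions:

```python
import wandb

def train():  # placeholder training function, as in the grid search sketch
    with wandb.init() as run:
        # ... train the model with run.config.learning_rate and run.config.epochs ...
        run.log({"val_acc": 0.0})  # placeholder value

sweep_config = {
    "method": "random",  # sample hyperparameter values at random
    "metric": {"name": "val_acc", "goal": "maximize"},
    "parameters": {
        # Continuous ranges instead of fixed values (example bounds and distributions).
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "epochs": {"distribution": "int_uniform", "min": 5, "max": 15},
    },
}

sweep_id = wandb.sweep(sweep_config, project="hyperparameter-tuning")  # example project name
wandb.agent(sweep_id, function=train, count=20)  # count makes the number of runs explicit
```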

Steps

Step 1: The random search algorithm selects a set of hyperparameters to evaluate by randomly sampling values from the specified search space in each iteration, for the specified number of iterations.
 
Below, you can see that the sampled sets of hyperparameter values do not follow a grid like in the grid search algorithm:
Step 2: Run an ML experiment for the selected set of hyperparameters and their values, and evaluate and log its performance metric.
 
Step 3: Repeat for the specified number of trial runs.

Output

As with all automated hyperparameter optimization algorithms, Random Search returns the experiment with the best performance metric and the respective hyperparameter values.
 
Below, you can see which hyperparameter values the optimization algorithm chose at which point in time and the resulting performance. You can make the following observations:
  • While random search samples values from the full search space for the hyperparameter epochs, it doesn’t explore the full search space for the hyperparameter learning_rate within the first few experiments.
  • Since random search is an uninformed search algorithm, the resulting performance doesn’t show a trend over the runs.
  • The best val_acc score is 0.9868, which is worse than the best val_acc score achieved with grid search (0.9902). The main reason is likely that the learning_rate has a large impact on the model’s performance, and the algorithm did not sample favorable values for it in this example.

Advantages

  • Simple to implement
  • Can be parallelized: because the hyperparameter sets can be evaluated independently
  • Suitable for models with many hyperparameters: random search has been shown to be more effective than grid search for models with many hyperparameters of which only a few affect the model’s performance [1]

Disadvantages

  • Uninformed search because knowledge from previous experiments is not leveraged. You may want to run the random search algorithm several times with a fine-tuned search space to achieve good results.

Bayesian Optimization

Bayesian optimization is a hyperparameter tuning technique that uses a surrogate function to determine the next set of hyperparameters to evaluate. In contrast to grid search and random search, Bayesian optimization is an informed search method.

Inputs

  • A set of hyperparameters you want to optimize
  • A continuous search space for each hyperparameter as a value range
  • A performance metric to optimize
  • Explicit number of runs: Because the search space is continuous, you must manually stop the search or define a maximum number of runs.
As with random search, the inputs that differ from grid search are the continuous search space and the explicit number of runs.
 
A popular way to implement Bayesian optimization in Python is to use BayesianOptimization from the bayes_opt library. Alternatively, as shown below, you can set up Bayesian optimization for hyperparameter tuning with W&B:
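Compared to the random search sweep, essentially only the method changes; the distributions, bounds, run count, and project name below are again placeholder assumptions:

```python
import wandb

def train():  # placeholder training function, as in the earlier sketches
    with wandb.init() as run:
        # ... train the model with run.config.learning_rate and run.config.epochs ...
        run.log({"val_acc": 0.0})  # placeholder value

sweep_config = {
    "method": "bayes",  # choose the next hyperparameters via a surrogate model
    "metric": {"name": "val_acc", "goal": "maximize"},  # the metric the surrogate models
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "epochs": {"distribution": "int_uniform", "min": 5, "max": 15},
    },
}

sweep_id = wandb.sweep(sweep_config, project="hyperparameter-tuning")  # example project name
wandb.agent(sweep_id, function=train, count=20)  # explicit number of runs
```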

Steps

  • Step 1: Build a probabilistic model of the objective function. This probabilistic model is called a surrogate function. The surrogate function comes from a Gaussian process [2] and estimates your ML model’s performance for different sets of hyperparameters.
  • Step 2: The next set of hyperparameters is chosen based on what the surrogate function expects to achieve the best performance for the specified search space.
 
Below, you can see that the sampled sets of hyperparameter values do not follow a grid like in the grid search algorithm.
  • Step 3: Run an ML experiment for the selected set of hyperparameters and their values, and evaluate and log its performance metric.
  • Step 4: After the experiment, the surrogate function is updated with the last experiment’s results.
  • Step 5: Repeat steps 2 – 4 for the specified number of trial runs (a sketch of this loop follows below).
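 
To make these steps more concrete, here is a minimal sketch of the loop using scikit-learn’s GaussianProcessRegressor as the surrogate and expected improvement as the acquisition function. These are common but illustrative choices and not necessarily what W&B Sweeps does internally; run_experiment, the candidate pool, and all bounds are placeholder assumptions:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical stand-in: train and evaluate the model, return val_acc.
def run_experiment(learning_rate, epochs):
    return float(np.random.rand())  # placeholder metric for illustration only

# Candidate hyperparameter sets drawn from the search space (example bounds).
rng = np.random.default_rng(seed=0)
candidates = np.column_stack([
    10 ** rng.uniform(-5, -3, size=500),   # learning_rate between 1e-5 and 1e-3
    rng.integers(5, 16, size=500),         # epochs between 5 and 15
])

# Step 1: a few initial observations to build the first surrogate model.
X = candidates[:3].tolist()
y = [run_experiment(lr, int(e)) for lr, e in X]

for _ in range(10):  # Step 5: repeat for the specified number of trial runs
    # Steps 1/4: (re)fit the Gaussian-process surrogate on all observations so far.
    surrogate = GaussianProcessRegressor().fit(X, y)
    mu, sigma = surrogate.predict(candidates, return_std=True)

    # Step 2: pick the candidate with the highest expected improvement.
    best = max(y)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    expected_improvement = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    learning_rate, epochs = candidates[int(np.argmax(expected_improvement))]

    # Step 3: run the experiment for the selected hyperparameters.
    score = run_experiment(learning_rate, int(epochs))

    # Step 4: update the surrogate's data with the last experiment's result.
    X.append([learning_rate, float(epochs)])
    y.append(score)

print(max(y))  # best observed performance metric
```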

Output

As with all automated hyperparameter optimization algorithms, Bayesian Optimization returns the experiment with the best performance metric and the respective hyperparameter values.
Below, you can see which hyperparameter values the optimization algorithm chose at which point in time and the resulting performance. You can make the following observations:
  • While the Bayesian optimization algorithm samples values from the full search space for the hyperparameter epochs, it doesn’t explore the full search space for the hyperparameter learning_rate within the first few experiments.
  • Since the Bayesian optimization algorithm is an informed search algorithm, the resulting performance shows improvements over the runs.
  • The best val_acc score is 0.9852, which is worse than the best val_acc scores achieved with grid search (0.9902) and random search (0.9868). The main reason is likely that the learning_rate has a large impact on the model’s performance, and the algorithm did not sample favorable values for it in this example. However, you can see that the algorithm had already begun to decrease the learning_rate to achieve better results. Given more runs, the Bayesian optimization algorithm could likely have found hyperparameter values that result in better performance.

Advantages

  • Suitable for models with many hyperparameters
  • Informed search: Takes advantage of knowledge from previous experiments and thus can converge faster to good hyperparameter values

Disadvantages

  • Difficult to implement
  • Can’t be parallelized because the next set of hyperparameters to be evaluated depends on the previous experiment’s results

Conclusion

Hyperparameter optimization is essential when developing an ML model because it can improve its performance. While you can tune hyperparameters manually, it is helpful to automate this task.
 
This article has discussed three popular methods of automated hyperparameter optimization: Grid search, random search, and Bayesian optimization.
 
Generally, the three automated hyperparameter tuning algorithms follow the same approach. They take a set of hyperparameters and their search spaces, a performance metric to optimize, and the number of trial runs as input. Then, a set of hyperparameter values is selected and evaluated in an ML experiment. All experiments’ results are logged to return the set of hyperparameter values with the best performance.
 
The main difference between the three algorithms is how they select the set of hyperparameter values to test next. Grid search evaluates all possible hyperparameter combinations in a specified grid (Cartesian product). Random search randomly samples values from a specified search space. Bayesian optimization uses a surrogate function to determine which set of hyperparameters to test next. In contrast to grid search and random search, Bayesian optimization is an informed search method.
 
The three methods also differ in how you define the search space and specify the number of runs. While grid search takes fixed values to test, random search and Bayesian optimization take a continuous search space as input. Thus, you need to explicitly define the number of trial runs for random search and Bayesian optimization.
 
Finally, the three approaches can be compared regarding their advantages and disadvantages:
  • Bayesian optimization is more difficult to implement than grid search and random search.
  • Because grid search is a brute-force approach, it is unsuitable for models with many hyperparameters, unlike random search and Bayesian optimization.
  • In contrast to grid search and random search, Bayesian optimization is an informed search method and doesn’t have to be repeated with a fine-tuned search space to achieve good results.
  • But because Bayesian optimization is an informed search, it cannot be parallelized the way the other two can.

References

[1] Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(2).
[2] Weights & Biases Inc. (2022). Documentation: Define sweep configuration (accessed 29 December 2022).