HyperOpt Auto-ML

June 23, 2024

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

HyperOpt and HyperOpt-Sklearn

HyperOpt is an open-source Python library for hyperparameter optimization. It is designed for large-scale optimization of models with many hyperparameters, and it allows the optimization process to be scaled across several cores and multiple machines.

The library was explicitly designed to optimize machine learning pipelines, including data preparation, model selection, and model hyperparameters.

“Our approach is to expose the underlying expression graph of how a performance metric (e.g. classification accuracy on validation examples) is computed from hyperparameters that govern not only how individual processing steps are applied, but even which processing steps are included.”

— Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, 2013.

Because both the optimization procedure and the search space must be specified precisely, using HyperOpt directly is challenging. The HyperOpt-Sklearn extension allows the HyperOpt approach to be applied to the data preparation methods and machine learning models provided by scikit-learn, the popular open-source machine learning library. HyperOpt-Sklearn wraps the HyperOpt library and automates the search over data preparation methods, machine learning algorithms, and model hyperparameters for classification and regression tasks.

Now that we are familiar with HyperOpt and HyperOpt-Sklearn, let's look at how to use HyperOpt-Sklearn.

Installation and Use of HyperOpt-Sklearn

Installing the HyperOpt library is the first step.

This can be done with the pip package manager:

sudo pip install hyperopt

Once the library is installed, we can confirm that the installation succeeded and check its version with the following command:

sudo pip show hyperopt

This prints the installed version of HyperOpt, confirming that a recent version is in use.

Name: hyperopt
Version: 0.2.3
Summary: Distributed Asynchronous Hyperparameter Optimization
Home-page: http://hyperopt.github.com/hyperopt/
Author: James Bergstra
Author-email: james.bergstra@gmail.com
License: BSD
Location: ...
Requires: tqdm, six, networkx, future, scipy, cloudpickle, numpy
Required-by:

The HyperOpt-Sklearn library must then be installed.

This can also be installed with pip, but we must do it manually by cloning the repository and running the installation from the local files:

git clone git@github.com:hyperopt/hyperopt-sklearn.git
cd hyperopt-sklearn
sudo pip install .
cd ..

Again, by using the following command to check the version number, we can verify that the installation was successful:

sudo pip show hpsklearn

This prints the installed version of HyperOpt-Sklearn, confirming that a recent version is in use.

Name: hpsklearn
Version: 0.0.3
Summary: Hyperparameter Optimization for SKlearn
Home-page: http://hyperopt.github.com/hyperopt-sklearn/
Author: James Bergstra
Author-email: anon@anon.com
License: BSD
Location: ...
Requires: nose, scikit-learn, numpy, scipy, hyperopt
Required-by:
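The same check can also be made from Python. The short sketch below relies only on the standard library's importlib.metadata module (Python 3.8 or later); the helper name installed_version is ours and is used for illustration only.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two libraries installed in the steps above.
for name in ("hyperopt", "hpsklearn"):
    print(name, installed_version(name) or "not installed")
```

A result of "not installed" for either name suggests that the corresponding pip step did not complete in the current Python environment.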

We may review the HyperOpt-Sklearn API now that the necessary libraries have been installed.

HyperOpt-Sklearn is simple to use. The search procedure is defined by creating and configuring an instance of the HyperoptEstimator class.

The "algo" argument specifies the search algorithm, "max_evals" sets the number of evaluations performed during the search, and "trial_timeout" sets a time limit, in seconds, for evaluating each pipeline.

...

# define search

model = HyperoptEstimator(..., algo=tpe.suggest, max_evals=50, trial_timeout=120)

There are numerous options for optimization algorithms, including:

Random Search
Tree of Parzen Estimators
Annealing
Tree
Gaussian Process Tree

The "Tree of Parzen Estimators" is a good default, and the paper "Algorithms for Hyper-Parameter Optimization" offers further information on the various types of algorithms.
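To make the idea of an evaluation budget concrete, here is a minimal pure-Python sketch of the simplest strategy in the list, random search. It is not HyperOpt's implementation: the function names, the toy objective, and the (low, high) search-space format are all assumptions made for illustration. The max_evals parameter plays the same role as the argument of that name shown earlier.

```python
import random


def random_search(objective, space, max_evals, seed=0):
    """Sample max_evals configurations uniformly and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(max_evals):
        # Draw one value per hyperparameter from its (low, high) range.
        config = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def toy_loss(config):
    """Toy objective standing in for a validation loss; its minimum is at x=2, y=-1."""
    return (config["x"] - 2.0) ** 2 + (config["y"] + 1.0) ** 2


# Hypothetical search space: each hyperparameter maps to a (low, high) range.
space = {"x": (-5.0, 5.0), "y": (-5.0, 5.0)}
best_config, best_loss = random_search(toy_loss, space, max_evals=50)
print(best_config, best_loss)
```

Random search ignores the results of earlier evaluations when choosing the next configuration. Algorithms such as the Tree of Parzen Estimators instead use those results to decide where to sample next, which is why they often need fewer evaluations.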

The search space of models is set with the "classifier" argument for classification tasks or the "regressor" argument for regression tasks. Either can be set to one of the predefined lists of models provided by the library, such as "any_classifier" and "any_regressor".

Similarly, the "preprocessing" argument defines the search space for data preparation, and it can be set to a predefined list of preprocessing steps such as "any_preprocessing".

...

# define search

model = HyperoptEstimator(classifier=any_classifier('cla'), preprocessing=any_preprocessing('pre'), ...)

You can look at the class's source code to learn more about the additional search parameters:

HyperoptEstimator Class Arguments

Once the search has been defined, it can be run by calling the fit() function.

...

# perform the search

model.fit(X_train, y_train)

After the run, the best-performing model can be evaluated on new data by calling the score() function.

...

# summarize performance

acc = model.score(X_test, y_test)

print("Accuracy: %.3f" % acc)

The best_model() function returns the Pipeline of transforms, models, and model configurations that performed best on the training dataset.

...

# summarize the best model

print(model.best_model())
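The fragments above can be combined into one end-to-end sketch. The following is an illustration under stated assumptions rather than a definitive recipe: the iris dataset, the 33% test split, the function name run_search, and the reduced budget of 10 evaluations with a 30-second timeout are our choices, not part of the original walkthrough. The import paths are those exposed by the hpsklearn package and may differ between releases.

```python
def run_search(max_evals=10, trial_timeout=30):
    """Search preprocessing steps and classifiers on iris; return (accuracy, best_model)."""
    # Third-party imports are kept inside the function so that defining it
    # does not require hyperopt, hpsklearn, or scikit-learn to be installed.
    from hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing
    from hyperopt import tpe
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Assumption: iris stands in for whatever dataset you want to model.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=1
    )

    # define search
    model = HyperoptEstimator(
        classifier=any_classifier("cla"),
        preprocessing=any_preprocessing("pre"),
        algo=tpe.suggest,
        max_evals=max_evals,
        trial_timeout=trial_timeout,
    )

    # perform the search
    model.fit(X_train, y_train)

    # summarize performance on held-out data
    accuracy = model.score(X_test, y_test)
    return accuracy, model.best_model()
```

Calling accuracy, best = run_search() and printing both values reproduces the last two steps above. Because the search is stochastic, the selected model and its accuracy will vary from run to run.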