
LIGHT AUTOML: Introduction, Facts and Features

August 07, 2024
Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 18 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Introduction

Welcome to the world of Automated Machine Learning, where dreams are data-driven and the future is forever within our grasp. Imagine a world where anyone, regardless of technical expertise, can unlock the full potential of machine learning. This is the essence of Automated Machine Learning, or AutoML, a groundbreaking concept that promises to revolutionize the way we approach complex problem-solving. Embrace the art of automation and the magic of self-learning systems, as AutoML paves the way for organizations to embrace the future of AI-driven decisions. No longer limited by technical boundaries, the world of AutoML beckons us to unlock the full potential of data, guiding us towards a future where innovation knows no bounds. AutoML has several benefits, including a reduction in the time and resources required to develop and deploy machine learning models. This can help to increase the adoption of machine learning in a wide range of applications. Let us embark on this remarkable journey, where the pursuit of knowledge and innovation propels us towards a brighter tomorrow. In this blog, we discuss in detail Light AutoML, an open-source library aimed at automatic machine learning.

Let us begin our journey.

Light AutoML is an advanced open-source AutoML library that automates feature and model selection, which simplifies and accelerates the machine learning process. Light AutoML leverages the synergy between different machine learning techniques, such as gradient boosting, linear models, and neural networks, to create an ensemble of models that complement each other's strengths. This ensemble approach can significantly improve prediction accuracy and generalization, making it a formidable tool for tackling various real-world challenges. Light AutoML handles structured data, unstructured data, and even time series data, making it versatile and adaptable to a wide range of applications. Its flexibility allows data scientists and machine learning practitioners to focus on refining problem-specific aspects while the library takes care of the heavy lifting of model optimization.

Some interesting facts about AutoML

India is the second most well-established country after the USA in the field of Data Science and AI, and several companies there use AutoML concepts day to day. On average, AutoML keywords have been searched more than 60 times per month in India over the last 2 years.

Worldwide, the AutoML keyword has been searched more than 80 times per month on average over the last 2 years, and Myanmar is among the top countries for web searches of this term.

Features of Light AutoML

Light AutoML is an open source AutoML library that offers a wide range of features to simplify and accelerate the machine learning process. Some of the important features of this library are:

Automated feature engineering:

One of the main advantages of this library is automated feature engineering. It applies techniques such as missing value imputation, scaling, and normalization, which can significantly improve the quality of the feature set and the accuracy of the final model.
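
To make the imputation and scaling steps concrete, here is a minimal plain-Python sketch. It is not Light AutoML's internal code; the function names and the `lot_area` column are hypothetical and serve only to illustrate mean imputation followed by min-max scaling.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric column with one missing entry.
lot_area = [8450, None, 11250, 9550]
imputed = impute_mean(lot_area)   # the gap is filled with 9750
scaled = min_max_scale(imputed)   # smallest value -> 0.0, largest -> 1.0
```

An AutoML library performs steps of this kind for every column automatically, which is where the time saving comes from.
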
Custom feature engineering:

Custom feature engineering is an important feature of Light AutoML because it allows users to tailor the feature set to their specific needs. The library provides a wide range of feature engineering techniques, such as categorical encoding, text processing, and feature selection, that can be combined in various ways to create a feature set suited to the given task. However, sometimes the built-in feature engineering techniques are not sufficient to capture the complexity of the data or to address domain-specific problems.

Custom feature engineering allows users to define their own features based on domain knowledge and experience. This can be done by using external data sources, such as weather or economic data, or by creating new features based on specific business rules or constraints. For example, if the task is to predict the demand for a particular product, a domain expert may build a feature that captures the impact of a marketing campaign on the demand for the product.

Defining custom features allows the user to build models with better accuracy. However, it is important to remember that custom feature engineering can introduce bias into the model if not done carefully. Therefore, it is important to validate the performance of the model on held-out data.
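
The marketing campaign example above can be sketched in a few lines of Python. The data, the campaign dates, and the `add_campaign_flag` helper are all hypothetical; the point is only to show how a business rule becomes a new column.

```python
from datetime import date


def add_campaign_flag(rows, campaign_start, campaign_end):
    """Return copies of the rows with a 0/1 'in_campaign' feature added."""
    enriched = []
    for row in rows:
        in_campaign = campaign_start <= row["date"] <= campaign_end
        enriched.append({**row, "in_campaign": int(in_campaign)})
    return enriched


# Hypothetical daily sales records.
sales = [
    {"date": date(2024, 3, 1), "units": 120},
    {"date": date(2024, 3, 10), "units": 180},
    {"date": date(2024, 4, 2), "units": 125},
]

# Hypothetical campaign window: 5 March to 20 March 2024.
features = add_campaign_flag(sales, date(2024, 3, 5), date(2024, 3, 20))
```

Only the second record falls inside the campaign window, so it is the only one flagged with 1. The enriched rows can then be passed to the AutoML library like any other feature.
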
Model Selection:

Model selection is an important step in the machine learning pipeline, and Light AutoML offers a variety of options to help users select the best model for their task. The library includes a wide range of models, such as tree-based models, linear models, and neural networks, which can be applied to various machine learning tasks.

Ensemble learning is a popular technique in machine learning that combines the predictions of multiple models to improve overall accuracy. Light AutoML uses ensemble learning to create a final model suited to the given task. The library provides several ensemble learning techniques, such as stacking, bagging, and boosting, which can be used to combine the predictions of multiple models.
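
As an illustration of combining predictions, the sketch below uses the simplest form of ensembling, a weighted average (often called blending). It is a stand-in for the idea, not the stacking, bagging, or boosting code of the library, and the predictions and weights are made-up numbers.

```python
def blend(predictions, weights):
    """Weighted average of several models' predictions, row by row."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Hypothetical predictions from two models for the same three rows.
linear_preds = [200.0, 150.0, 310.0]
boosting_preds = [210.0, 140.0, 300.0]

# The boosting model is trusted three times as much as the linear one.
blended = blend([linear_preds, boosting_preds], weights=[1.0, 3.0])
```

Each blended value lies between the two individual predictions and sits closer to the model with the larger weight.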

Light AutoML also supports custom models, which allow users to define their own models based on their domain knowledge and experience. This is mainly useful for complex problems where no pre-built models are available, or for problems that require specific model architectures.

In addition to model selection techniques, Light AutoML also provides tools to select and finalize the best hyperparameters. Hyperparameters are settings chosen before training rather than learned from the data, and they differ from one machine learning model to another. Examples of hyperparameters include the learning rate in neural networks, the number of trees in a random forest model, the regularization parameter in a linear model, the kernel in an SVM, and the number of nearest neighbours in KNN. Optimizing hyperparameters can improve the performance of the model and reduce overfitting.

The library offers several hyperparameter tuning techniques, such as grid search, random search, and Bayesian optimization. Grid search evaluates every combination of the hyperparameter values listed in a grid and selects the best one, while random search evaluates randomly sampled combinations from the given ranges and keeps the best of those. Bayesian optimization uses a probabilistic model of the results seen so far to choose which hyperparameters to try next.
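
The difference between grid search and random search is easiest to see in code. The following is a minimal sketch in plain Python. The `validation_error` function is a toy stand-in for "train a model and score it", and the hyperparameter names and ranges are assumptions made for the example.

```python
import itertools
import random


def validation_error(learning_rate, n_trees):
    """Toy stand-in for training and scoring a model; lower is better."""
    return (learning_rate - 0.1) ** 2 + ((n_trees - 300) / 1000) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid and keep the best one."""
    names = list(grid)
    candidates = [
        dict(zip(names, combo)) for combo in itertools.product(*grid.values())
    ]
    return min(candidates, key=lambda params: validation_error(**params))


def random_search(ranges, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations and keep the best one."""
    rng = random.Random(seed)
    candidates = [
        {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),
            "n_trees": rng.randint(*ranges["n_trees"]),
        }
        for _ in range(n_trials)
    ]
    return min(candidates, key=lambda params: validation_error(**params))


best_grid = grid_search(
    {"learning_rate": [0.01, 0.1, 0.3], "n_trees": [100, 300, 500]}
)
best_random = random_search(
    {"learning_rate": (0.01, 0.3), "n_trees": (100, 500)}, n_trials=20
)
```

Grid search tries all 9 combinations here, while random search tries 20 sampled points and can land on values that are not on the grid. Bayesian optimization is not shown, since it needs a surrogate model that is beyond a short sketch.
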
Tabular and image data support:

Light AutoML is developed to work with both image and tabular data, making it a versatile tool for a wide range of machine learning tasks. For tabular data, the library supports common file formats such as CSV and Excel, and includes a wide range of preprocessing techniques for cleaning and preparing the data before modelling. These preprocessing techniques include missing value imputation, scaling, normalization, and feature engineering. This library also provides tools for feature selection and dimensionality reduction, which improve the efficiency and accuracy of the model.

For image data, Light AutoML supports popular image file formats such as JPEG and PNG, and includes a range of image preprocessing techniques such as cropping, resizing, and colour normalization. The library also supports popular deep learning models for image classification, object detection, and segmentation, including convolutional neural networks (CNNs), residual networks (ResNets), and U-Net architectures.
Time-series support:

Time series data consists of observations recorded at successive points in time. Examples of time-series data include weather data, stock prices and sensor readings.

Here, the observations are often dependent on previous observations, and the data can exhibit patterns such as trends and seasonality. Light AutoML systems explore and evaluate various time series models, such as ARIMA (Autoregressive Integrated Moving Average), SARIMA (Seasonal ARIMA), Exponential Smoothing (ETS), Prophet, and more. The system can automatically select the most appropriate model based on the data characteristics and performance metrics.

Light AutoML's support for time-series data makes it a powerful tool for modelling and analysing such data, allowing users to make accurate predictions and gain valuable insights into complex time-series patterns.
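
To show how each observation can depend on the previous ones, here is a minimal sketch of simple exponential smoothing, the most basic member of the exponential smoothing family mentioned above. It is written in plain Python for illustration only and is not taken from Light AutoML; the readings and the `alpha` value are made up.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level is a weighted mix of the latest observation and
    the previous level, so older observations fade out gradually.
    """
    level = series[0]
    levels = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels


# Hypothetical sensor readings.
readings = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(readings, alpha=0.5)

# With this method, the one-step-ahead forecast is the last level.
forecast = levels[-1]
```

A larger `alpha` makes the forecast react faster to recent values, while a smaller one smooths out noise more strongly.
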
Interpretability:

This library provides feature importance scores and model interpretability tools to help users understand how the model makes predictions. This is especially important in the medical and financial industries.
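
One common way to compute a feature importance score is permutation importance: shuffle one feature column and measure how much the error grows. The sketch below is a plain-Python illustration of that idea under stated assumptions. The `model` function is a hypothetical fitted model, and this is not necessarily how Light AutoML computes its own scores.

```python
import random


def model(row):
    """Hypothetical fitted model: the price depends on area only."""
    return 100.0 * row["area"]


def mse(rows, targets, predict):
    """Mean squared error of the predictions against the targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, predict, feature, seed=0):
    """Increase in error after shuffling one feature column."""
    baseline = mse(rows, targets, predict)
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return mse(permuted, targets, predict) - baseline


# Toy data: "noise" is a column that the model never uses.
rows = [{"area": float(a), "noise": float(a % 3)} for a in range(1, 9)]
targets = [100.0 * r["area"] for r in rows]

area_importance = permutation_importance(rows, targets, model, "area")
noise_importance = permutation_importance(rows, targets, model, "noise")
```

Shuffling `area` breaks the link between the feature and the target, so the error rises. Shuffling `noise` changes nothing, so its importance is zero.
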
Ease of use:

This library is designed to be user-friendly and accessible to data scientists and researchers with varying levels of expertise. It includes detailed documentation and examples to help users get started easily and quickly. Additionally, the library is available as a Python package, making it easy to integrate into existing workflows.
Reproducibility:

This library ensures reproducibility by using a fixed random seed and providing detailed information about the model training process. Users can compare the performance of different models using the same dataset.
Model deployment:

The library provides tools for model deployment, allowing users to easily deploy the trained model to a production environment. The library includes a range of deployment options, such as REST API deployment, making it easy to integrate the model into existing systems.
Community support:

This library has an active community of users and contributors, which provides support and feedback to the developers. It is an open-source library and users can use it freely.

Example Without Using AutoML

Here we load the data into a variable, do the preprocessing, and then import a variety of models, which are trained on the training data and tested on the test data. Hyperparameters are passed and checked manually. Evaluation is then done using various evaluation metrics.

Various modules are imported

The datasets are loaded into variables. The test dataset has 1459 rows and 80 columns, and the train dataset has 1460 rows and 81 columns.

The dataset is split into train and validation sets.

Shape of the dataset has been mentioned in the above code.

Model building is done manually: the user selects each model and checks which one gives good accuracy.
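
The original code for this section is not reproduced here, so the following is only a small sketch of what manual model selection looks like. It uses plain Python and made-up numbers: two candidate models are fitted by hand and compared on a validation set using the mean absolute error.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_line(xs, ys):
    """Ordinary least squares with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_absolute_error(model, xs, ys):
    """Average absolute difference between predictions and targets."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Hypothetical toy data: living area vs. sale price (in thousands).
train_x, train_y = [50, 80, 100, 120], [110, 165, 205, 240]
valid_x, valid_y = [60, 110], [130, 220]

# The user lists the candidates, fits each one, and compares the scores.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {
    name: mean_absolute_error(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
```

Every new candidate model means another entry to write, fit, and score by hand. This loop is the work that an AutoML library takes over.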

Thus, we can see that using an AutoML library saves time: the library selects a model automatically, which simplifies and accelerates the machine learning process.

Using Light AutoML with Code Examples

We now show code that uses AutoML. The sample dataset comes from Kaggle. It is called the "House Prices: Advanced Regression Techniques" dataset and contains information about various features of houses, such as the number of bedrooms, the size of the lot, and the age of the house. The task is to predict the sale price of the house.

Step 1: Import the necessary libraries

We will start by importing the necessary libraries. We will use Light AutoML, pandas, and scikit-learn.

Step 2: Load the data

Next, the dataset is loaded into a pandas DataFrame using the read_csv function.

Step 3: Data split into training and validation sets

The dataset is split into training and validation sets using the train_test_split function from scikit-learn, with 80% of the data used for training and 20% for validation.

We can see that the train_data size is (1168, 81) and the valid_data size is (292, 81).
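
The row counts above follow directly from an 80/20 split of 1460 rows. The sketch below reproduces the arithmetic in plain Python. It imitates the idea of a seeded shuffle-and-split, not the actual implementation of train_test_split, and the helper name and seed value are assumptions.

```python
import random


def train_valid_split(rows, valid_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then hold out valid_fraction of the rows."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


# Stand-in for the 1460 rows of the training file.
rows = list(range(1460))
train_rows, valid_rows = train_valid_split(rows)
print(len(train_rows), len(valid_rows))  # 1168 292
```

Because the seed is fixed, running the split again gives exactly the same rows in each set, which makes results comparable across experiments.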

Step 4: Define the task and target column

The task is defined as regression, and the target column is the sale price of the house.

Step 5: Create a Light AutoML model

The Light AutoML model is created using the TabularAutoML preset for tabular data. The regression task is passed to the preset, with the sale price as the target.

A timeout of 600 seconds and a cpu_limit of 4 cores are set. The algorithms considered for model selection are specified using the use_algos parameter.

Step 6: Train the model

The fit_predict function trains the model on the training data, with the target column passed through the roles argument, and returns out-of-fold predictions for the training rows.

The predict function is then used to get predictions on the validation data.
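
Steps 1 to 6 above can be sketched as follows. This is a hedged reconstruction, not the author's original code. The file name `train.csv`, the `random_state` value, and the algorithm list in `use_algos` are assumptions, and the target column is taken to be `SalePrice`, as in the Kaggle dataset. The third-party imports sit inside the function so that the settings can be read without the packages installed.

```python
# Settings from the steps above: a 600-second budget and 4 CPU cores.
AUTOML_SETTINGS = {
    "timeout": 600,
    "cpu_limit": 4,
    # Assumed algorithm list; the post does not say which ones were used.
    "general_params": {"use_algos": [["linear_l2", "lgb"]]},
}
TARGET = "SalePrice"  # target column in the Kaggle House Prices data


def run_lightautoml(train_csv="train.csv"):
    """Load the data, split it 80/20, train TabularAutoML, and predict."""
    # Step 1: imports (requires pandas, scikit-learn and lightautoml).
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from lightautoml.automl.presets.tabular_presets import TabularAutoML
    from lightautoml.tasks import Task

    # Step 2: load the data (file path is an assumption).
    data = pd.read_csv(train_csv)

    # Step 3: 80% training, 20% validation.
    train_data, valid_data = train_test_split(
        data, test_size=0.2, random_state=42
    )

    # Step 4: regression task.
    task = Task("reg")

    # Step 5: create the TabularAutoML preset with the settings above.
    automl = TabularAutoML(task=task, **AUTOML_SETTINGS)

    # Step 6: train; fit_predict returns out-of-fold predictions.
    automl.fit_predict(train_data, roles={"target": TARGET})
    valid_pred = automl.predict(valid_data)

    # Return actual and predicted prices for evaluation.
    return valid_data[TARGET].to_numpy(), valid_pred.data[:, 0]
```

Calling `run_lightautoml()` with the Kaggle training file in the working directory returns the actual and predicted sale prices for the validation rows, ready for evaluation.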

Step 7: Evaluate the model

The root mean squared error (RMSE) is used to compare the actual and predicted sale prices.

The RMSE gives us an estimate of the error of the model on the validation data. A lower RMSE indicates better performance. Here we get an RMSE of 32.49, which is acceptable for this model.
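
For readers who want to see the metric itself, here is a minimal sketch of the RMSE calculation in plain Python. The three prices are made-up numbers for illustration and are unrelated to the result reported above.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted prices.
actual_prices = [200.0, 150.0, 300.0]
predicted_prices = [210.0, 140.0, 320.0]

# Errors are 10, 10 and 20, so the mean squared error is 200.
error = rmse(actual_prices, predicted_prices)
```

Because the errors are squared before averaging, one large miss raises the RMSE more than several small ones.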

Conclusion:

In conclusion, the Light AutoML library aims to simplify and accelerate the process of building machine learning models by automating various steps, such as feature engineering, model selection, hyperparameter tuning, and ensembling. It makes the entire process seamless and saves time.

In this blog, we have provided an overview of Light AutoML and demonstrated how to use it with code examples. We have also worked through the same task without AutoML. We have seen how AutoML carries out machine learning tasks automatically and how it helps to solve a regression problem using a sample dataset. Light AutoML can be beneficial in various scenarios, such as when data scientists want to rapidly prototype and evaluate multiple models, when domain experts without extensive machine learning knowledge need to leverage AI, or when computational resources are limited.

I hope you have had an amazing experience while reading this blog. I’m ending my blog with a famous saying “We are not what we know but what we are willing to learn”. Thanks for your patience and appreciation. If you liked this content, please provide your feedback in the comment section.