
AutoGluon - AutoML Framework for Deep Learning

June 23, 2024

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industry 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, where for more than ten years he has been making the transition into IT easier for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Machine learning became well known because of its broad range of uses, which include financial services, life science analysis, marketing, and manufacturing, and which bring significant advantages to each of these sectors. More recently, "Automated Machine Learning," or simply "AutoML," has grown in popularity and is producing strong results across many applications. Traditional machine learning involves many decisions for each application: preparing the data, choosing an algorithm, evaluating the model, and deploying it. AutoML was developed in response to the difficulty of building and evaluating several models before choosing the best one. An AutoML framework builds the pipeline for us: it preprocesses the data, tries various machine learning models, and tunes their hyperparameters.

AutoML and the frameworks that implement it are fairly new technologies that support the Machine Learning model building process. AutoML frameworks now exist for many kinds of data, in both open-source and paid versions. Open-source AutoML packages include AutoGluon, Auto-PyTorch, MLJAR, H2O AutoML, MLBox, TPOT, AutoKeras, Auto-sklearn, and Auto-WEKA, and there are also a few commercial offerings such as Darwin, DataRobot, and Google AutoML.

Image courtesy: https://towardsdatascience.com/autogluon-deep-learning-automl-5cdb4e2388ec

Amazon Web Services (AWS) released AutoGluon as an open-source library that lets developers build Deep Learning models on image, text, or tabular data using just a few lines of code. The toolkit is designed to be easy to use and easy to extend, so that it serves both Machine Learning beginners and experts. Its main advantage is how much a short piece of code can do: automatic hyperparameter tuning, model selection, data pre-processing such as data cleansing and feature engineering, and automatic application of state-of-the-art (SOTA) Deep Learning models.

In this post, let's experiment with some of the AutoGluon capabilities that automate machine learning tasks. AutoGluon covers ensembling, deep learning, and real-world applications on image, text, and tabular data, which makes it a simple-to-use and simple-to-extend AutoML package. Because it requires only a few lines of code, we can prototype Deep Learning and conventional ML solutions more quickly. Anyone can use the package and apply these approaches in the proper context without a thorough understanding of the state-of-the-art techniques behind them. The main benefits are automatic hyperparameter tuning, model selection and ensembling, architecture search, and data processing. Depending on your area of expertise, it is also straightforward to adjust or refine the models and data pipelines that AutoGluon produces and to customise them for your own use cases.

AutoGluon is one of the newer libraries from AWS. It was released in 2020 as an open-source AutoML toolkit and aims for good predictive performance across a range of Machine Learning and Deep Learning models. Installation is supported on Linux and macOS; Windows was not officially supported in the early releases. When we call AutoGluon simple, we mean that training and deploying regression and classification models takes only a few lines of code. Users can pass in raw data without first doing feature engineering or data transformation, and AutoGluon searches for the best model it can find within a given time limit. It is also a fault-tolerant AutoML framework: users can review all of the intermediate stages, and training can resume after an interruption.
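The idea of searching for the best model within a time limit can be illustrated with a toy sketch in plain Python. This is not AutoGluon's implementation: the three candidate models, the `select_best` helper, and the tiny dataset are all hypothetical and exist only to show the loop that an AutoML framework automates (fit each candidate, score it on held-out data, stop when the budget runs out, rank the results).

```python
import statistics
import time


def mean_model(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = statistics.fmean(ys)
    return lambda x: m


def median_model(xs, ys):
    """Baseline: always predict the median of the training targets."""
    m = statistics.median(ys)
    return lambda x: m


def linear_model(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on (xs, ys)."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)) ** 0.5


def select_best(candidates, train, valid, time_limit):
    """Fit candidates until the time budget is spent; return a sorted leaderboard.

    At least one candidate is always fitted, so the result is never empty.
    """
    deadline = time.monotonic() + time_limit
    board = []
    for name, fit in candidates:
        if board and time.monotonic() >= deadline:
            break  # budget exhausted: keep what we have so far
        model = fit(*train)
        board.append((rmse(model, *valid), name))
    board.sort()  # lowest validation error first
    return board


# Hypothetical data: y = 2x + 1, split into training and validation parts.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train, valid = (xs[:7], ys[:7]), (xs[7:], ys[7:])

CANDIDATES = [("mean", mean_model), ("median", median_model), ("linear", linear_model)]
board = select_best(CANDIDATES, train, valid, time_limit=5.0)
for score, name in board:
    print(f"{name:>8}  validation RMSE = {score:.3f}")
```

A real framework adds far more on top of this loop, such as preprocessing, hyperparameter search, and ensembling, but the time-budgeted "fit, score, rank" structure is the same.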

How do you install AutoGluon?

The AutoGluon release described here requires Python 3.7, 3.8, or 3.9. With a supported interpreter in place, the following commands install the package.

pip3 install -U pip

pip3 install -U setuptools wheel

pip3 install "torch>=1.0,<1.11+cpu" -f

pip3 install autogluon
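Before running the commands above, it can help to confirm that the active interpreter is one of the versions named in this post. The following is a minimal sketch using only the standard library; the `SUPPORTED_MINORS` set and the `is_supported` helper are hypothetical names, and the version list simply mirrors the one stated above (other AutoGluon releases may support different versions).

```python
import sys

# Python versions named in this post for the AutoGluon release it describes.
SUPPORTED_MINORS = {(3, 7), (3, 8), (3, 9)}


def is_supported(version_info=None):
    """Return True if the (major, minor) pair is in SUPPORTED_MINORS."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_MINORS


current = ".".join(str(part) for part in sys.version_info[:3])
if is_supported():
    print(f"Python {current}: matches the versions listed in this post.")
else:
    print(f"Python {current}: not in the listed versions; check the AutoGluon docs.")
```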

AutoGluon is divided into sub-modules dedicated to tabular, text, or image data. By installing only a specific sub-module, we can reduce the number of dependencies: run python3 -m pip install <submodule>, where <submodule> is the sub-module for the kind of data we work with. The available sub-modules are autogluon.tabular, autogluon.vision, autogluon.text, autogluon.core and autogluon.features. AutoGluon can therefore be used for several categories of task: tabular prediction, image prediction, object detection, text prediction, and multimodal prediction. To fit and fine-tune a tabular classification model with AutoGluon, we use TabularPredictor(label='stroke').fit(train_data=df_train, verbosity=2, presets='best_quality'). Because the outcome column contains only the two unique labels '0' and '1', AutoGluon correctly recognizes this as a classification problem. AutoGluon trains several different models and selects the best one automatically. In one tabular regression run, for example, AutoGluon trained 11 models and recommended KNN (KNeighborsDist_BAG_L1) as the best model, followed by XGBoost (XGBoost_BAG_L1).
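The stroke example above can be sketched as follows. The `fit_stroke_classifier` function wraps the `TabularPredictor` call quoted in the text; it assumes that `autogluon.tabular` is installed and that `df_train` is a pandas DataFrame with a `stroke` column, and the `time_limit` value is an illustrative assumption. The `guess_problem_type` helper is a deliberately simplified, hypothetical stand-in for the label inspection described above, not AutoGluon's actual rule.

```python
def guess_problem_type(labels):
    """Simplified illustration of inferring the task from the label column.

    NOT AutoGluon's real logic: two distinct values -> binary classification,
    floating-point targets -> regression, anything else -> multiclass.
    """
    unique = set(labels)
    if len(unique) == 2:
        return "binary"
    if any(isinstance(value, float) for value in unique):
        return "regression"
    return "multiclass"


def fit_stroke_classifier(df_train, time_limit=600):
    """Train a TabularPredictor on a DataFrame that has a 'stroke' column."""
    # Imported lazily so this file loads even where AutoGluon is not installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label="stroke").fit(
        train_data=df_train,
        presets="best_quality",  # favour accuracy over training speed
        time_limit=time_limit,   # seconds; an assumed budget, adjust as needed
        verbosity=2,
    )
    return predictor


# The label column in the stroke example holds only 0 and 1:
print(guess_problem_type([0, 1, 1, 0, 0]))  # binary
```

After fitting, `predictor.problem_type` reports the task AutoGluon actually inferred, which is the reliable way to check this on real data.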

For categorising images based on their content (image prediction), AutoGluon provides a straightforward 'fit()' function that automatically generates high-quality image classification models. Object detection is the computer vision task of finding and localizing objects in an image. Here too, a single 'fit()' call lets AutoGluon automatically build a high-quality object detection model that reports the presence and position of objects in images.

Let's look at an example in which we import AutoGluon in Python and then specify a task on tabular data using TabularPrediction. The dataset is a CSV file stored on S3. When fit() is called, AutoGluon examines the data and builds a collection of machine learning models, known as a "predictor", that can predict the "class" variable in the input data. AutoGluon uses the other columns, such as each person's employment, age, and education, as predictive features. The resulting ensemble combines many trained and evaluated algorithms, including LightGBM, CatBoost, and Deep Neural Networks, which routinely beat more conventional ML models such as logistic regression. These algorithms are well regarded within the ML community for their quality, robustness, and speed.

AutoGluon's model leaderboard, showing the different models and their accuracies.
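The tabular workflow described above (load a CSV, fit, predict, inspect the leaderboard) might look like the sketch below. It assumes `autogluon.tabular` is installed. The post does not give the dataset location, so the two URLs are an assumption: they point to the income dataset used in AutoGluon's own quick-start and may have moved. The function name `run_income_example` and the `time_limit` default are likewise hypothetical.

```python
# Assumed dataset location (AutoGluon quick-start data); not stated in this post.
TRAIN_CSV = "https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv"
TEST_CSV = "https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv"


def run_income_example(train_csv=TRAIN_CSV, test_csv=TEST_CSV,
                       label="class", time_limit=120):
    """Fit a predictor for the `label` column and return predictions and leaderboard."""
    # Imported lazily so this file loads even where AutoGluon is not installed.
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset(train_csv)  # behaves like a pandas DataFrame
    test_data = TabularDataset(test_csv)

    # All columns other than `label` are used as predictive features.
    predictor = TabularPredictor(label=label).fit(train_data, time_limit=time_limit)

    # Predict on the test features, then score every trained model on the test set.
    predictions = predictor.predict(test_data.drop(columns=[label]))
    leaderboard = predictor.leaderboard(test_data)
    return predictions, leaderboard


if __name__ == "__main__":
    preds, board = run_income_example()
    print(preds.head())
    print(board)
```

The leaderboard is returned as a DataFrame with one row per trained model, which is what the figure above displays.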

For supervised learning on text data, a simple 'fit()' command can automatically generate high-quality text prediction models. Each training example can be a sentence or a short paragraph, optionally accompanied by additional numeric or categorical features. A single 'predictor.fit()' call trains highly accurate neural networks on the given text dataset, where the target values or labels may be continuous values or discrete categories. Although the TextPredictor is designed only for classification and regression tasks, it can be used directly for other NLP tasks as well, provided the data is properly formatted as a data table. The TextPredictor uses only Transformer neural network models, which are fit to the provided data via transfer learning from pre-trained NLP models such as BERT, ALBERT, and ELECTRA. The neural network hyperparameters can be tuned automatically with Hyperparameter Optimization (HPO). AutoGluon also handles multimodal data tables that mix text, numeric, and categorical columns; raw text is treated as a first-class citizen of the data table. On such tables it can train and compare a wide variety of models, including classical tabular models like LightGBM, Random Forest (RF), and CatBoost as well as a multimodal network based on pre-trained NLP models.
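A minimal sketch of the text workflow described above follows. It assumes an AutoGluon release that still ships the `autogluon.text` sub-module with `TextPredictor` (newer releases may expose this functionality under a different predictor), plus pandas. The column names `sentence` and `label`, the `make_text_table` helper, and the `time_limit` default are hypothetical choices for illustration.

```python
def make_text_table(sentences, labels):
    """Format raw examples as table rows, one dict per training example."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    return [{"sentence": s, "label": y} for s, y in zip(sentences, labels)]


def train_text_classifier(rows, time_limit=60):
    """Fit a TextPredictor on rows produced by make_text_table."""
    # Imported lazily so this file loads even where these packages are missing.
    import pandas as pd
    from autogluon.text import TextPredictor

    train_data = pd.DataFrame(rows)
    predictor = TextPredictor(label="label", eval_metric="acc")
    predictor.fit(train_data, time_limit=time_limit)  # time_limit is in seconds
    return predictor


# Hypothetical toy data; a real run needs many more examples.
rows = make_text_table(["good movie", "bad movie"], [1, 0])
print(rows)
```

Numeric or categorical features can be added as extra keys in each row, which gives the mixed text, numeric, and categorical table described above.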

We have covered a lot of ground on what the AutoGluon AutoML framework can do. It shortens the time it takes to develop ML models that are suitable for production, which speeds up the whole ML process and gives data scientists more time to concentrate on creative solutions to real-world problems. Its major advantage is the flexibility to train and test many current Machine Learning algorithms on different datasets independently. It should also be emphasised that using AutoGluon does not remove the need for training and for a fundamental understanding of the data, the data annotation, and the intended outcome. The framework's success will probably depend on how quickly it is adopted and on the real benefits it delivers in a particular sector. Even so, we expect the AutoGluon AutoML framework to remain in use and to keep providing practical answers.

As previously mentioned, AutoGluon is an open-source AutoML framework that allows users to build highly accurate Machine Learning models on raw data using just a few lines of Python. In comparisons with other AutoML systems such as TPOT, H2O, AutoWEKA, auto-sklearn, and Google AutoML, it has been reported to be quicker, more reliable, and significantly more accurate. As a result, the AutoGluon AutoML framework is likely to play an important role in the future of machine learning.