
Auto-Stacker


According to the No Free Lunch (NFL) theorem of Wolpert and Macready, no single machine learning algorithm performs best on all possible problems. So whenever a new dataset arrives, two questions arise: which model should be chosen, and what are the best hyper-parameters for the selected model? Selecting a good model requires knowledge and experience, and tuning the hyper-parameters often takes a long time. If this process of model selection and hyper-parameter tuning can be automated, a newly available dataset can be trained on easily, and non-expert users can also obtain solutions, making machine learning readily available to everyone.

Auto-ML systems were created to meet this need: the ability to tackle challenging ML problems without writing much code. Given a prepared dataset as input, an Auto-ML model outputs one or more modelling pipelines that perform well on that dataset. Numerous effective Auto-ML libraries have been developed, such as Auto-SKlearn and TPOT, which perform well across a range of datasets.
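As an illustration, here is a minimal sketch of how such a library is typically driven, using TPOT (the dataset, split, and generation/population settings are illustrative, not a recommendation):

```python
# Minimal TPOT sketch; the dataset and EA settings are illustrative.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT evolves full pipelines (preprocessing, model, hyper-parameters).
tpot = TPOTClassifier(generations=5, population_size=20,
                      random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")  # export the winning pipeline as code
```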

Auto-Stacker is a newer Auto-ML architecture in this space. Its main source of inspiration is the stacking approach used in ensemble learning. To solve a problem, Auto-Stacker searches for pipelines that combine multiple machine learning models. Evaluated across a larger number of datasets than other Auto-ML models, it shows an equivalent capacity to solve problems, and it is a formidable rival in terms of both speed and accuracy.


Cascading: When dealing with a small or sparse dataset, cascading is used to overcome the shortage of data. The original dataset is reused in each stacking layer, and synthetic columns (the previous layer's predictions) are added at each layer of the stack.

Model Flexibility: Existing Auto-ML models output a full pipeline consisting of preprocessing, feature engineering, and model selection, where model selection means optimizing a primitive machine learning model like an SVM or a traditional ensemble method like boosting. Auto-Stacker can instead combine multiple machine learning models, giving it a larger range of uses.

Evolutionary Search Algorithm: An Evolutionary Algorithm (EA) helps find good solutions when a large number of variables are present. These variables include the primitive machine learning models, the hyper-parameters of each model, and the configuration settings of the framework itself. All of these are treated as hyper-parameters, and in this large hyper-parameter space Auto-ML is treated as a search problem instead of the traditional optimization problem. Using the parallel search capability of EAs, Auto-Stacker can find good pipelines very quickly.

To better understand the above properties, we must also understand stacking, cascading, and Auto-ML.

Stacking and Cascading: Stacking is a form of ensembling that uses the following strategy: the original dataset is fed to the top layer of the stack, and the outputs of the classifiers in each layer serve as the input for the following layer, so each layer corrects the errors made by the earlier layers. Cascading likewise uses the output of one layer as the input for the following layer. Viola and Jones investigated it as a model for ensemble learning: the data is fed through a sequence of binary classifiers; if a classifier's output is false, the iteration stops and the cascade returns a false result, otherwise it advances to the next classifier. The cascade delivers a true result only if the data passes all levels with a true output. Heitz et al. developed an advanced version of cascading known as cascaded classification models (CCMs), which break a difficult problem down into smaller, simpler ones.
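The stacking idea can be sketched with scikit-learn's StackingClassifier; setting passthrough=True also forwards the original features to the next layer, which mirrors the reuse of the original dataset described under cascading (the base models here are illustrative):

```python
# Stacking sketch in scikit-learn; the choice of primitives is illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),
    passthrough=True,  # the final layer also sees the original features
)
stack.fit(X, y)
print(stack.score(X, y))
```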

Automatic Machine Learning: Automated machine learning techniques focus on two major tasks:

ā¦ Building a machine learning pipeline,

ā¦ Intelligent model hyper-parameter search.

One such instance is AutoWeka, which chooses a single primitive machine learning model and optimises its hyper-parameters. It is built on Weka, and the best hyper-parameter values for the machine learning pipeline are found using Bayesian Optimisation (a sequential model-based optimisation approach). The pipeline adheres to the usual steps of data preparation, feature engineering, and prediction. However, this type of model (a single predictor) struggles with complicated or tiny datasets. A similar approach is used by AutoSKlearn, which likewise employs a Bayesian optimizer to adjust hyper-parameters and a machine learning library as a toolbox.
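For comparison, here is a minimal AutoSKlearn sketch (the time budget is illustrative, and auto-sklearn has its own installation requirements):

```python
# Minimal auto-sklearn sketch; the time budget is illustrative.
import autosklearn.classification
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Bayesian optimization over models and hyper-parameters runs inside.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
```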

Several other Bayesian Optimization frameworks are designed to handle large-scale parameter configuration problems like those in an Auto-ML model. One such example is RoBO, which offers multiple implementations of different Bayesian optimization methods and lets users swap out the components of the process. Hyperopt uses a similar sequential model-based optimization approach to design a pipeline from classification and preprocessing models.
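Hyperopt's sequential model-based search looks roughly like this (the SVM objective and search space are illustrative):

```python
# Hyperopt sketch: TPE-based sequential model-based optimization.
from hyperopt import fmin, hp, tpe
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def objective(params):
    # fmin minimizes, so return the negative cross-validated accuracy.
    clf = SVC(C=params["C"], gamma=params["gamma"])
    return -cross_val_score(clf, X, y, cv=3).mean()

space = {"C": hp.loguniform("C", -3, 3),
         "gamma": hp.loguniform("gamma", -3, 3)}
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)
```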

The TPOT architecture, on the other hand, explores the use of EAs for hyper-parameter optimization. In addition to the approaches in AutoWeka and AutoSKlearn, TPOT performs parallel feature engineering before model prediction, and it uses an EA to treat parameter optimization as a search problem.

All of the above techniques focus primarily on a single machine learning model. AutoSKlearn can build ensemble models, but only traditional ensemble methods are considered. Stacked ensemble models have been shown to be more robust and to perform more efficiently.

Auto-Stacker, however, is an ensemble approach by default. As primitives, it can handle both individual and ensemble techniques concurrently, and the cascading approach lets many ML primitives be integrated compatibly to manage generalisation effectively. With Auto-Stacker we can employ numerous ensemble models in the same architecture, so instead of the conventional technique, where one primitive model is picked and optimised, Auto-Stacker combines complementary ML primitives to achieve higher performance. Compared with a single-model AutoML system like TPOT or AutoSklearn, the search space is considerably larger when models are stacked on top of one another, and the primitive models must still be optimised. The evolutionary algorithm addresses these problems: an EA, rather than another optimisation approach such as Bayesian Optimisation, is employed to identify the right hyper-parameters. EAs are also used for optimisation in other domains, such as neural network optimisation and reinforcement learning, which shows they are effective in situations with big search spaces.

Problem Setting:

In supervised learning, a model is the function ‘f’ relating input data ‘X’ and output data ‘Y’.

Y = f_{H,O}(X)

The model f is controlled by two sets of parameters: the hyper-parameters H and the model parameters O.
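As a concrete example of this distinction, in scikit-learn the hyper-parameters H are what you pass to a model's constructor, while the model parameters O are what fit() learns from the data:

```python
# H (hyper-parameters) are set before training; O (model parameters)
# are learned from the data by fit().
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(max_depth=3)  # H: a depth we choose
clf.fit(X, y)                              # O: the fitted tree structure
print(clf.tree_.threshold[:5])             # a slice of O, the learned splits
```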

The terms used in the paper are as follows (a minimal sketch of this structure follows the list):

ā— Primitive: It is a preexisting ML model like a Decision Tree.

ā— Pipeline: It is the output of Auto-Stacker; it might be a single primitive or a combination of multiple primitives.

ā— Layer and Node: Each stack is a layer and each layer contains multiple nodes. Each node is a primitive.

System Architecture:

An Auto-Stacker run maintains many pipelines, each of which has layers containing numerous nodes, and every node holds a primitive. The first layer uses the dataset directly, and each subsequent layer uses the outputs of the preceding layer as part of its input, so learning is transferred from layer to layer. The predictions of the preceding layer are added to the dataset as synthetic features and served as input to the following layer. This gives higher weight to the traits that are important for prediction.

The hyper-parameter space in Auto-Stacker consists of the following (one possible encoding is sketched after the list):

ā— Type of each primitive

ā— Each model hyperparameter within each primitive

ā— Number of layers in each pipeline

ā— Number of nodes in each layer.

Search Algorithm:

An evolutionary algorithm is used to find the optimal hyper-parameters for each primitive machine learning algorithm. The bare-bones EA used here consists of only mutation and crossover, without any sophisticated additions, and this alone is enough to obtain significantly better performance.
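A minimal sketch of those two operators acting on a pipeline configuration encoded as nested lists of node dictionaries, as in the hypothetical search-space sketch above:

```python
# Bare-bones EA operators on a hypothetical pipeline encoding:
# a pipeline is a list of layers, each a list of node dictionaries.
import copy
import random

OPTIONS = {"primitive": ["decision_tree", "svm", "logistic_regression"],
           "max_depth": list(range(1, 11)),
           "C": [0.01, 0.1, 1.0, 10.0]}

def mutate(config):
    """Re-sample one hyper-parameter of one randomly chosen node."""
    config = copy.deepcopy(config)
    node = random.choice(random.choice(config))
    key = random.choice(list(OPTIONS))
    node[key] = random.choice(OPTIONS[key])
    return config

def crossover(a, b):
    """Swap the tails of two parent pipelines at a random layer cut."""
    cut = random.randint(1, min(len(a), len(b)))
    child1 = copy.deepcopy(a[:cut] + b[cut:])
    child2 = copy.deepcopy(b[:cut] + a[cut:])
    return child1, child2
```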

Training and Testing Process:

During the evaluation stage, the stacking model is trained layer by layer, following the hierarchical structure. The primitives within a layer are trained independently on the same dataset. Each subsequent layer is trained on the dataset combined with the prediction outcomes of the previously trained layer.
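A hypothetical sketch of this layer-by-layer training; for brevity it predicts on the training data itself, whereas a real run would use proper held-out evaluation:

```python
# Hypothetical layer-by-layer training sketch: each layer's predictions
# become synthetic columns appended to the original features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)

layers = [[DecisionTreeClassifier(max_depth=3),
           LogisticRegression(max_iter=500)],  # layer 0
          [LogisticRegression(max_iter=500)]]  # layer 1 (final)

features = X
for layer in layers:
    preds = []
    for node in layer:           # primitives in a layer train independently
        node.fit(features, y)
        preds.append(node.predict_proba(features)[:, 1])
    # original dataset + this layer's predictions feed the next layer
    features = np.column_stack([X] + preds)
```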

The same technique is used for the validation and test processes, though on different held-out sets. Auto-Stacker's final output is selected from the ten pipelines with the best accuracy.

Scaling and Parallelization:

Auto-Stacker has an advantage over other systems in that it is very flexible in scaling up and parallelization, thanks to the EA. Each pipeline works independently through initialization, mutation, crossover, training, validation, and evaluation, so pipelines can run in parallel; results are shared only so that the pipelines can be ranked.
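Since pipelines share only their scores for ranking, the evaluation loop parallelizes naturally; here is a hypothetical sketch using a process pool, where each "pipeline" is reduced to a single tree depth for brevity:

```python
# Hypothetical sketch: candidates are evaluated independently, so a
# process pool can score a whole EA population in parallel.
from multiprocessing import Pool

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

def evaluate(max_depth):
    """Train and validate one candidate; only its score is shared back."""
    clf = DecisionTreeClassifier(max_depth=max_depth)
    return cross_val_score(clf, X, y, cv=3).mean()

if __name__ == "__main__":
    population = [1, 2, 3, 5, 8]          # candidate configurations
    with Pool() as pool:
        scores = pool.map(evaluate, population)
    print(sorted(zip(scores, population), reverse=True))
```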

Application of Genetic Programming in Auto-Stacker:

Auto-Stacker, an EA-based AutoML system, uses genetic programming to optimise a succession of feature selectors, pre-processors, and ML models in order to increase classification accuracy. It has been observed that Auto-Stacker typically beats its rival methods even when no prior information about the task is known. Because it is an ensemble model, it can readily handle big datasets, such as those from gene expression studies.

Limitation:

In problems involving large, high-dimensional datasets or multi-task settings such as computer vision and natural language processing, deep learning approaches are dominant. In such cases, Auto-Stacker does not scale well.

Auto-Stacker still has considerable room for improvement, particularly in its search algorithm.

Conclusion:

Auto-Stacker is an AutoML system that uses stacking, cascading, and evolutionary algorithms. It surpasses several rival AutoML systems even without data preprocessing and feature selection stages.
