Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An IIT and ISB alumnus with more than 17 years of experience, he has held prominent positions at IT leaders such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of training experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, bridging the gap between academia and industry.
MLOps is a discipline in which the entire ML lifecycle, including deployment and monitoring in production, is performed seamlessly. Data Science professionals with MLOps skills are therefore increasingly preferred, and these skills are the way forward for climbing the career ladder and earning salaries much higher than those of typical Data Scientists.
MLOps and DevOps have a lot in common. However, DevOps involves developing and deploying software application code in production, and this code is usually static and does not change rapidly.
MLOps, on the other hand, involves developing and deploying ML code in production. Here, however, the data changes rapidly, so models have to be upgraded far more frequently than typical software application code.
DataOps is a term coined by IBM with a focus on data quality. A sudden change in the data will trigger an alarm so that stakeholders can take action.
MLOps includes DataOps as one of its components and, in addition, covers end-to-end model development, deployment, and monitoring.
Data Science typically has the following issues:
These risks can be addressed by using MLOps.
Model deployment in production is now treated as the start of the actual ML lifecycle. Monitoring how the model performs over a long duration, how the data grows, and how to scale the model for wider organizational use happens post-deployment. These activities are at the core of the ML lifecycle and at the heart of MLOps.
MLOps, a.k.a. Machine Learning Operations, is an emerging domain within the larger AI/DS/ML space that addresses the problem of operationalizing ML models. MLOps can be thought of as a practice and culture within software engineering that fundamentally attempts to unify machine learning/data science model development (Dev) and its subsequent operationalization (Ops). MLOps has some analogies to traditional DevOps but is also significantly different from it. While DevOps predominantly focuses on operationalizing code and software releases, which may not be stateful, MLOps has an added complexity: data. That is why MLOps is often described as the union of ML + Data + Ops (machine learning, data engineering, and DevOps).
MLOps has several benefits. Some of them are listed below (in no particular order)
There are many different ways in which MLOps infrastructure can be created. The core responsibility typically lies outside the scope of an MLOps engineer. However, for a given set of existing environments, the MLOps engineer can certainly create a tech stack best suited to hosting a successful machine learning platform. For example, if the enterprise has a predominantly AWS-based infrastructure, it becomes easy to implement MLOps pipelines using the AWS SageMaker framework in conjunction with services like SageMaker Pipelines, CloudFormation, and Lambda for orchestration and Infrastructure as Code. If the enterprise is open to it, most modern software development firms are leaning towards a Kubernetes (k8s) powered infrastructure as the platform of choice. This also enables the ML engineer to adopt Kubeflow, which is quickly becoming the de facto MLOps framework for many ML practitioners. However, creating an infrastructure exclusively for ML models is generally not within the scope of an ML Engineer.
CI stands for continuous integration and CD stands for continuous deployment. The fundamental feature of having a CI/CD pipeline is to ensure that data scientists and software engineering teams are able to create and deploy error-free code as quickly as possible.
Specifically, a CI/CD pipeline aims to automate and streamline the software deployment process, which includes building code, running tests, and deploying new versions of the model/application when there are updates or revisions.
CI/CD for machine learning has the added complexity of handling data in addition to code. It can be achieved through a variety of tools, depending on the technical stack the enterprise uses.
If the technical stack is primarily AWS-driven, SageMaker Pipelines can stand in for CI/CD pipelines.
Other approaches include Kubeflow Pipelines and traditional tools like Jenkins, or even GitHub Actions, to build CI/CD pipelines.
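Whatever the tool, the fail-fast behaviour of such a pipeline can be sketched in a few lines of Python. The stage names and functions below are hypothetical stand-ins for illustration, not part of any particular CI/CD product:

```python
# Minimal sketch of the stages a CI/CD pipeline automates for an ML service.
# All function and stage names here are hypothetical, for illustration only.

def build() -> bool:
    """Stand-in for packaging the model service (e.g. building a container image)."""
    return True

def run_tests() -> bool:
    """Stand-in for unit tests plus ML-specific checks (data schema, model quality)."""
    return True

def deploy() -> bool:
    """Stand-in for releasing the new model/application version."""
    return True

def run_pipeline(stages) -> list:
    """Run stages in order, stopping at the first failure (fail-fast behaviour)."""
    completed = []
    for name, stage in stages:
        if not stage():
            break
        completed.append(name)
    return completed

stages = [("build", build), ("test", run_tests), ("deploy", deploy)]
print(run_pipeline(stages))  # → ['build', 'test', 'deploy']
```

Real tools add triggers (e.g. on every commit), artifact storage, and environment promotion on top of this basic ordered, fail-fast execution.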
Model drift, sometimes called concept drift, occurs when the model's performance during the inference phase (on real-world data) degrades compared to its performance during the training phase (on historical, labeled data). It is also known as train/serve skew, as the model's performance is skewed between the training and serving phases. This could be due to many reasons, like:
To detect model drift, it is necessary to continuously monitor the model's performance. If there is a sustained degradation of model performance, the cause needs to be investigated and treated accordingly, which almost always involves retraining the model.
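One simple way to flag sustained (rather than momentary) degradation is to alert only after the monitored metric stays below a tolerance band around the training baseline for several consecutive windows. The thresholds and window counts below are illustrative assumptions, not recommendations:

```python
from collections import deque

def drift_monitor(baseline_acc: float, tolerance: float = 0.05, patience: int = 3):
    """Return a callable that ingests one accuracy reading per monitoring
    window and reports True once accuracy has stayed below
    (baseline_acc - tolerance) for `patience` consecutive windows.
    A crude sustained-degradation check, for illustration only."""
    recent = deque(maxlen=patience)  # rolling record of "was this window bad?"

    def check(window_acc: float) -> bool:
        recent.append(window_acc < baseline_acc - tolerance)
        return len(recent) == patience and all(recent)

    return check

check = drift_monitor(baseline_acc=0.90)          # alert threshold is 0.85
readings = [0.91, 0.88, 0.83, 0.82, 0.81]         # hypothetical per-window accuracy
print([check(a) for a in readings])               # → [False, False, False, False, True]
```

The alert fires only at the fifth reading, after three consecutive windows below 0.85; a single noisy dip would not trigger retraining.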
MLOps is the intersection of Machine Learning and DevOps principles. There is a growing need in the Data Science and Artificial Intelligence industry to quickly and efficiently deploy ML models into production. With the power of MLOps, Data Scientists can iterate over models quickly. Once the best one is evaluated and identified, it can easily be deployed, typically on the cloud as containers. On top of that, MLOps frameworks allow data scientists to track and version their experiments, perform A/B testing, monitor performance, and log results, creating a feedback loop. It is a powerful framework for ML practitioners that can be achieved using Kubeflow, MLflow, Apache Airflow, and TensorFlow Extended, among others.
MLOps has a lot of similarities with DevOps, in that it has origins in the latter. Essentially, MLOps exists because of the inherent differences between software engineering and machine learning projects. DevOps principles for software engineering are fairly robust and well established. But ML projects have some unique features such as:
Continuous Integration (CI) is not just about code and components; it also needs to account for models, input data, and its schema.
Continuous Delivery (CD) is not just a single service or software but an entire ML pipeline (for various stages of the MLDC) which should serve the inference pipeline.
Continuous Training (CT) is unique to MLOps, where the framework has mechanisms in place for retraining and calibrating models periodically.
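A minimal Continuous Training policy often combines the two common triggers, a fixed retraining schedule and a drift alarm from monitoring. The sketch below is illustrative; the 30-day cadence is an assumed default, not a recommendation:

```python
from datetime import datetime, timedelta

def should_retrain(last_trained: datetime, now: datetime,
                   drift_detected: bool,
                   max_age: timedelta = timedelta(days=30)) -> bool:
    """Hypothetical CT policy: retrain on a fixed schedule *or* as soon as
    monitoring has flagged drift, whichever comes first."""
    return drift_detected or (now - last_trained) >= max_age

trained = datetime(2023, 1, 1)
print(should_retrain(trained, datetime(2023, 1, 10), drift_detected=True))   # → True  (drift alarm)
print(should_retrain(trained, datetime(2023, 1, 10), drift_detected=False))  # → False (fresh model, no drift)
print(should_retrain(trained, datetime(2023, 2, 15), drift_detected=False))  # → True  (model older than 30 days)
```

In a real framework this decision would sit inside a scheduler (e.g. a pipeline cron trigger), with the drift flag fed in from the monitoring component.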
TFX, which stands for TensorFlow Extended, is the open-source version of the data science and initial MLOps solution developed by Google. The main idea behind TFX is standardization. There are multiple ways to develop a data science solution, but that leads to issues such as lack of reproducibility, poor production-readiness, and no way to monitor models after production deployment.
TFX takes each step of the ML model lifecycle and develops components to standardize the code development process. In practice, the ML model lifecycle has many stages before model training begins, such as data ingestion, validation, and transformation, commonly bundled into the CRISP-ML stages known as Data Understanding and Data Preparation. TFX emphasizes the importance of validating datasets and asserting their schema, calculating the statistics and distributions of the features, etc. This is done using TensorFlow Data Validation (TFDV), which is a component of TFX but can also be used as a standalone library. Not only does this give us the distribution and schema of our data, it also allows us to compare two datasets, for example to determine whether our train/eval splits have similar characteristics.
By way of extension, this feature also allows us to compare the training data and the inference data and look for something called ‘data drift’. Data Drift is a condition where the inference data on which predictions are expected do not follow the same distribution as the training data.
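A common way to quantify such drift for a single numerical feature is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of the training and inference samples. Below is a pure-Python sketch of that statistic; in practice a library routine (e.g. `scipy.stats.ks_2samp`, or TFDV's own comparators) would be used and would also provide a significance test:

```python
import bisect

def ks_statistic(sample_a, sample_b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute gap
    between the empirical CDFs of the two samples. 0.0 means identical
    distributions of the observed points; 1.0 means fully disjoint ranges."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:  # the CDF gap can only change at observed points
        cdf_a = bisect.bisect_right(a, x) / n
        cdf_b = bisect.bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d

train   = [1, 2, 3, 4, 5]          # hypothetical training feature values
same    = [1, 2, 3, 4, 5]          # inference data from the same distribution
shifted = [11, 12, 13, 14, 15]     # inference data that has drifted
print(ks_statistic(train, same))     # → 0.0
print(ks_statistic(train, shifted))  # → 1.0
```

A monitoring job could compute this per feature on each batch of inference data and raise an alert when the statistic exceeds a chosen threshold.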
More often than not, data is not passed to the modeling phase in its raw format. It needs to be preprocessed and hence undergoes several transformations. Moreover, many machine learning algorithms accept only numerical inputs and are not equipped to deal with missing values and outliers, so the data must be transformed accordingly: removing or filling missing values, handling outliers, scaling numerical values, encoding categorical features, etc.
The challenge is that all the processing steps need to be repeated when trying to derive inferences because the model expects the data on which predictions need to be issued to be in the same format as the training data.
If the prediction data differs significantly from the training data then it can be argued that there is a train/serve skew.
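One common guard against this skew is to route both the training data and the serving data through the very same preprocessing code, rather than maintaining two parallel implementations. A minimal sketch, with hypothetical feature names and scaling constants:

```python
def preprocess(record: dict) -> list:
    """Single preprocessing routine shared by the training and serving paths,
    so both apply identical transformations. Feature names and scaling
    constants are hypothetical, for illustration only."""
    age = float(record.get("age") or 0.0)       # fill missing values with 0
    income = float(record.get("income") or 0.0)
    return [age / 100.0, income / 1e5]          # identical scaling everywhere

# Training and serving call the *same* function, so identical raw
# records always yield identical feature vectors:
train_row = {"age": 30, "income": 50000}
serve_row = {"age": 30, "income": 50000}
assert preprocess(train_row) == preprocess(serve_row)
print(preprocess(serve_row))  # → [0.3, 0.5]
```

Frameworks make the same guarantee at scale: for example, TFX's Transform component exports the training-time preprocessing graph so the serving path replays it exactly.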
There are multiple ways to avoid train serve skew like:
The introduction of data, and of the mathematical logic (algorithms/models) applied to that data, makes MLOps an interesting endeavor. Ideally, for most software engineering projects CI/CD should be enough, but MLOps also introduces the following concepts:
Continuous Training: when to re-train your model, how often to do it, etc.
Continuous Monitoring: whether the model is performing well in the field.
There are a few concepts from web service deployments that map nicely into strategies for deploying models into production environments, like creating several instances of a live inferencing application for scalability and progressively switching from an older to a newer model. A couple of them are
In a Blue-Green deployment, the newer version of the model is brought into a staging environment that is almost identical to the production environment. In some cases, the environment is the same as production but the traffic is routed differently. If we use Kubernetes, it is possible to have a single k8s cluster route test traffic to the 'blue' deployment (the new version) while production traffic continues to go to the older 'green' deployment. This allows further testing of the newer model in a production-like setting before complete adoption. Once enough confidence is established in the newer model, it is promoted to 'green' (production) status, and the process repeats with any further improvements.
Canary deployment is a bit more involved and usually riskier, but it is gaining popularity in the DevOps community. It follows a deployment model similar to the blue-green approach discussed above, but provides the ability to progressively change the configuration based on constraints, depending on the level of confidence in the newer model. In this case, traffic is routed progressively to the newer model while the previous model is still serving predictions. The two versions are live and processing requests simultaneously, but in different ratios. The reason for this percentage-based rollout is that you can enable metrics and other checks to capture problems in real time, allowing you to roll back immediately if conditions are unfavorable.
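The percentage-based routing at the heart of a canary rollout can be sketched as a deterministic hash-bucket split. The function below is illustrative and not tied to any specific tool; in practice this logic lives in a load balancer or service mesh:

```python
def route(request_id: int, canary_fraction: float = 0.1) -> str:
    """Deterministic percentage-based routing sketch: map the request id
    into a bucket in [0, 1) and send that fraction of traffic to the
    canary model. Hashing keeps each request's routing stable, so a given
    user/request id consistently hits the same model version."""
    bucket = (hash(request_id) % 1000) / 1000.0
    return "canary" if bucket < canary_fraction else "stable"

# Route 1000 hypothetical requests with a 10% canary split:
counts = {"canary": 0, "stable": 0}
for rid in range(1000):
    counts[route(rid, canary_fraction=0.1)] += 1
print(counts)  # → {'canary': 100, 'stable': 900}
```

Raising `canary_fraction` step by step (10% → 25% → 50% → 100%) while watching the canary's error and latency metrics gives the progressive rollout described above; dropping it back to 0 is the rollback.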
Both of these strategies can be applied with Kubeflow, as it natively relies on the Kubernetes environment.