MLOps Interview Questions & Answers
Table of Contents
- What is the difference between MLOps, ModelOps & AIOps
- Define MLOps and how is it different from Data Science?
- What is the difference between MLOps and DevOps?
- What is the difference between MLOps and DataOps?
- What are the risks associated with Data Science & how MLOps can overcome the same?
- Is model deployment end of ML lifecycle?
- What is MLOps?
- What are the benefits of MLOps?
- How do you create infrastructure for MLOps?
- How to create CI/CD pipelines for machine learning?
- Explain about model/concept drift.
- What is MLOps?
- Difference between DevOps and MLOps?
- What is TFDV and how does it help with some of the pertinent challenges of MLOps?
- Define train/serve skew and some potential ways to avoid them
- In addition to CI and CD are there any other considerations unique to MLOps?
- What can be some of the deployment strategies borrowed from DevOps that can be utilized in MLOps and how to achieve them?
What is the difference between MLOps, ModelOps & AIOps
- MLOps is the application of DevOps to building Machine Learning solutions end to end, including Data Collection, Data Pre-processing, Model Building, Model Deployment in Production, Monitoring the Model in Production, and Periodic Model Upgrades.
- ModelOps is the application of DevOps to the end-to-end implementation of any kind of model, including non-ML models such as Rule-Based Models. It is the more generic term.
- AIOps is sometimes used to mean building AI applications end to end using DevOps concepts, although the term more commonly refers to applying AI to IT operations (for example, automated anomaly detection and alert triage).
Define MLOps and how is it different from Data Science?
MLOps is a discipline in which the entire ML lifecycle, including deployment and monitoring in production, is carried out seamlessly, whereas Data Science typically concentrates on analysing data and building models. Data Science professionals with MLOps skills are therefore likely to be preferred by employers, which can help them move up the career ladder and earn higher salaries than data scientists without those skills.
What is the difference between MLOps and DevOps?
MLOps and DevOps have a lot in common. DevOps covers developing and deploying software application code in production, and this code is usually fairly static and does not change rapidly.
MLOps likewise covers developing and deploying ML code in production. Here, however, the data changes rapidly, so models have to be upgraded more frequently than typical software application code.
What is the difference between MLOps and DataOps?
DataOps, a term often credited to IBM, focuses on data quality. A sudden change in the data triggers an alert to the stakeholders so that they can take action.
MLOps includes DataOps as one of its components and adds end-to-end model development, deployment, and monitoring on top of it.
What are the risks associated with Data Science & how MLOps can overcome the same?
Data Science typically has the following issues:
- The model goes down without an alert and becomes unavailable
- The model gives an incorrect prediction for a given observation, and the prediction cannot be investigated further
- Model accuracy keeps decreasing as time passes
- Model maintenance has to be done by data scientists, who are expensive
- Scaling models across the organization is not easy
MLOps addresses these risks through automated deployment, alerting and monitoring in production, versioning of models and data for traceability, and periodic retraining.
Is model deployment end of ML lifecycle?
Model deployment in production is nowadays treated as the start of the actual ML lifecycle rather than its end. Monitoring how the model performs over a longer period, how the data grows, and how to scale the model for wider use across the organization all happen after deployment. These activities are at the core of the ML lifecycle and at the heart of MLOps.
What is MLOps?
MLOps, a.k.a. Machine Learning Operations, is an emerging domain within the larger AI/DS/ML space that addresses the problem of operationalizing ML models. MLOps can be thought of as a practice and culture within software engineering which fundamentally attempts to unify machine learning/data science model development (Dev) and its subsequent operationalization (Ops). MLOps has some analogies to traditional DevOps but is also significantly different from it. While DevOps predominantly focuses on operationalizing code and software releases, which may not be stateful, MLOps has another complexity added to it: data. That is why MLOps is often referred to as the union of ML + Data + Ops (machine learning, data engineering, and DevOps).
What are the benefits of MLOps?
MLOps has several benefits. Some of them are listed below (in no particular order):
- Improves efficiency - Implementing MLOps principles gives both Data Engineers and Data Scientists ready access to curated datasets, which considerably increases their ability to develop models quickly.
- Rinse/Repeat - Because MLOps automates all or most of the steps in the MDLC (model development lifecycle), data scientists and MLOps engineers can reproduce experiments quickly and make sure models are trained and evaluated properly. It also enables versioning of both models and data.
- Improves reliability - Because MLOps practices borrow heavily from DevOps, they carry over several CI/CD principles, thereby improving code quality and reliability.
- Leaves breadcrumbs (audit trail) - Versioned models and datasets considerably improve the model audit trail, allowing data scientists to fall back on a model that performed better if a newer iteration does not meet expectations.
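The audit-trail idea can be illustrated with a minimal sketch. It assumes an in-memory registry and a single evaluation metric where higher is better; `ModelRegistry`, `ModelVersion`, and the snapshot names are hypothetical and do not belong to any particular MLOps tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ModelVersion:
    version: int
    data_version: str   # identifier of the dataset snapshot used for training
    metric: float       # evaluation score; higher is assumed to be better


@dataclass
class ModelRegistry:
    """In-memory audit trail of trained models (illustrative only)."""
    versions: List[ModelVersion] = field(default_factory=list)

    def register(self, data_version: str, metric: float) -> ModelVersion:
        entry = ModelVersion(len(self.versions) + 1, data_version, metric)
        self.versions.append(entry)
        return entry

    def latest(self) -> ModelVersion:
        return self.versions[-1]

    def best(self) -> ModelVersion:
        # Highest metric wins; on a tie the earliest version is returned.
        return max(self.versions, key=lambda v: v.metric)


registry = ModelRegistry()
registry.register("snapshot-a", 0.81)   # made-up scores for illustration
registry.register("snapshot-b", 0.86)
registry.register("snapshot-c", 0.79)   # newest model scores worse

if registry.latest().metric < registry.best().metric:
    fallback = registry.best()
    print(f"fall back to v{fallback.version} trained on {fallback.data_version}")
```

Because every entry records both the model version and the data version, it is possible to tell which dataset produced the model that is being rolled back to.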
How do you create infrastructure for MLOps?
There are many different ways to create MLOps infrastructure, and the core responsibility for it typically lies outside the scope of an MLOps engineer. For a given set of existing environments, however, the MLOps engineer can put together a tech stack suited to hosting a successful machine learning platform. For example, if the enterprise has a predominantly AWS-based infrastructure, it becomes easy to implement MLOps pipelines with Amazon SageMaker in conjunction with services such as SageMaker Pipelines, CloudFormation, and Lambda for orchestration and Infrastructure as Code. If the enterprise is not tied to a particular cloud provider, many modern software development firms are leaning towards a Kubernetes (k8s) powered infrastructure. This also enables the ML engineer to adopt Kubeflow, which is a popular MLOps framework among ML practitioners.
How to create CI/CD pipelines for machine learning?
CI stands for continuous integration and CD stands for continuous delivery (or continuous deployment). The fundamental purpose of a CI/CD pipeline is to ensure that data scientists and software engineering teams are able to create and deploy error-free code as quickly as possible.
Specifically, a CI/CD pipeline aims to automate and streamline the software deployment process, which includes building code, running tests, and deploying new versions of the model/application when there are updates/revisions.
CI/CD for machine learning has the added complexity of including data in addition to code, but it can be achieved with a variety of tools depending on the technical stack the enterprise is using.
If the technical stack is primarily AWS driven, SageMaker Pipelines can fill the role of CI/CD pipelines.
Other approaches are to use Kubeflow Pipelines or traditional tools like Jenkins or even GitHub Actions to build CI/CD pipelines.
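Whichever tool runs the pipeline, an ML-specific CI step usually includes some form of model quality gate. The following is a minimal sketch of such a gate, not the API of any of the tools named above: the function names, metric names, values, and the `max_drop` threshold are all assumptions made for illustration.

```python
from typing import Dict, List


def quality_gate(candidate: Dict[str, float],
                 baseline: Dict[str, float],
                 max_drop: float = 0.01) -> List[str]:
    """Return a list of failures; an empty list means the candidate may be deployed."""
    failures = []
    for name, base_value in baseline.items():
        if name not in candidate:
            failures.append(f"{name}: missing from candidate metrics")
        elif candidate[name] < base_value - max_drop:
            failures.append(
                f"{name}: {candidate[name]:.3f} is below baseline {base_value:.3f}"
            )
    return failures


def main() -> int:
    # Hypothetical metric values; a real pipeline would load them from the
    # evaluation step's output and from the currently deployed model.
    baseline = {"accuracy": 0.90, "recall": 0.80}
    candidate = {"accuracy": 0.91, "recall": 0.76}
    failures = quality_gate(candidate, baseline)
    for failure in failures:
        print("FAILED", failure)
    return 1 if failures else 0   # a non-zero value is meant to fail the CI job
```

A CI job could call `sys.exit(main())` as its last step, so that a candidate model that regresses on any tracked metric blocks the deployment stage in the same way a failing unit test would.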
Explain about model/concept drift.
Model drift occurs when the model's performance during the inference phase (using real-world data) degrades compared with its performance during the training phase (using historical, labeled data). Concept drift, where the relationship between the inputs and the target changes over time, is one common cause, and the two terms are often used interchangeably. The resulting gap between training and serving performance is sometimes loosely described as train/serve skew, although that term more specifically refers to differences between the data or processing used in training and in serving. Drift can have many causes, such as:
- The underlying distribution of the data has changed
- Unforeseen events - for example, a model trained on pre-COVID data is expected to perform much worse on data from the COVID-19 pandemic
- Training covered a limited number of categories, but a recent change in the environment added another category
- In NLP problems, the real-world data contains significantly more tokens that do not appear in the training data
To detect model drift, it is necessary to monitor the performance of the model continuously. If there is a sustained degradation of model performance, the cause needs to be investigated and a remedy applied accordingly, which almost always involves model retraining.
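One way to turn "sustained degradation" into something checkable is to compare windowed accuracy against the accuracy measured at training time. The sketch below assumes that true labels eventually become available for served predictions; the class name, the window size, the tolerance, and the patience value are illustrative choices rather than recommended settings.

```python
class PerformanceMonitor:
    """Flags drift when windowed accuracy stays below a floor for several windows."""

    def __init__(self, baseline_accuracy: float, tolerance: float = 0.05,
                 window_size: int = 100, patience: int = 3) -> None:
        self.floor = baseline_accuracy - tolerance
        self.window_size = window_size
        self.patience = patience
        self._window = []
        self._bad_windows = 0   # consecutive completed windows below the floor

    def record(self, prediction, label) -> bool:
        """Record one labelled outcome; return True while degradation is sustained."""
        self._window.append(prediction == label)
        if len(self._window) == self.window_size:
            accuracy = sum(self._window) / self.window_size
            # A single healthy window resets the counter, so brief dips are ignored.
            self._bad_windows = self._bad_windows + 1 if accuracy < self.floor else 0
            self._window = []
        return self._bad_windows >= self.patience
```

Requiring several consecutive bad windows is a deliberate trade-off: it delays the alert slightly but avoids triggering an investigation or a retraining run on a short-lived fluctuation.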
What is MLOps?
MLOps is the intersection of Machine Learning and DevOps principles. There is a growing need in the Data Science and Artificial Intelligence industry to deploy ML models into production quickly and efficiently. With MLOps, Data Scientists can iterate over models fast. Once the best one has been evaluated and identified, it can be deployed easily, typically on the cloud as a container. On top of that, MLOps frameworks allow data scientists to track and version their experiments, perform A/B testing, monitor model performance, and log results, creating a feedback loop. It is a powerful practice for ML practitioners that can be implemented with tools such as Kubeflow, MLflow, Apache Airflow, and TensorFlow Extended, among others.
Difference between DevOps and MLOps?
MLOps has a lot of similarities with DevOps, since it has its origins in the latter. Essentially, MLOps exists because of the inherent differences between software engineering and machine learning projects. DevOps principles for software engineering are fairly robust and well established, but ML projects have some unique features, such as:
- They are exploratory in nature; sometimes there may be no result, or the result is not satisfactory.
- Data scientists and ML engineers often come from research or mathematics backgrounds and may not have experience writing production-quality code.
- Testing is more complex: data validation, model quality evaluation, and model validation are needed in addition to the usual code tests.
Continuous Integration (CI) is not just about code and components; it also needs to account for models, input data, and the data schema.
Continuous Delivery (CD) does not deliver a single service or software package but an entire ML pipeline (covering the various stages of the MDLC), which in turn deploys the inference pipeline.
Continuous Training (CT) is unique to MLOps: the framework has mechanisms in place for retraining and recalibrating models periodically.
What is TFDV and how does it help with some of the pertinent challenges of MLOps?
TFDV (TensorFlow Data Validation) is part of TFX, which stands for TensorFlow Extended. TFX is the open-source version of the solution developed by Google for the data science and initial MLOps phases of a project. The main idea behind TFX is ‘standardization’. There are many ways to develop a data science solution, but that freedom leads to issues such as poor reproducibility, code that is not production-friendly, and no way to monitor models after deployment to production.
TFX takes each step of the ML model lifecycle and provides components that standardize the code development process. In practice, the ML model lifecycle has several stages before model training begins, such as data ingestion, validation, and transformation, which are commonly grouped under the CRISP-DM phases known as Data Understanding and Data Preparation. TFX emphasizes the importance of validating datasets: asserting the schema, calculating the statistics and distribution of the features, and so on. This is done with TensorFlow Data Validation, a.k.a. TFDV, which is a component of TFX but can also be used as a standalone library. It not only tells us about the distribution and schema of our data but also lets us compare two datasets, for example to check whether our train/eval splits have similar characteristics.
By extension, it also allows us to compare the training data with the inference data and look for something called ‘data drift’. Data drift is a condition where the inference data on which predictions are expected does not follow the same distribution as the training data.
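To make the ideas of schema validation and drift comparison concrete, here is a standard-library sketch of the same kind of checks. It is deliberately not the TFDV API: `infer_schema`, `validate`, and `linf_distance` are hypothetical stand-ins, the column names are invented, and the schema is reduced to a column-to-type mapping. The L-infinity distance is used because TFDV's drift comparison for categorical features is likewise based on it.

```python
from collections import Counter
from typing import Dict, List


def infer_schema(rows: List[dict]) -> Dict[str, str]:
    """Map each column to the type name seen in the first row (simplified)."""
    return {name: type(value).__name__ for name, value in rows[0].items()}


def validate(rows: List[dict], schema: Dict[str, str]) -> List[str]:
    """Report missing columns, type mismatches and unexpected columns."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, expected in schema.items():
            if name not in row or row[name] is None:
                anomalies.append(f"row {i}: '{name}' is missing")
            elif type(row[name]).__name__ != expected:
                anomalies.append(f"row {i}: '{name}' should be {expected}")
        for name in sorted(row.keys() - schema.keys()):
            anomalies.append(f"row {i}: unexpected column '{name}'")
    return anomalies


def linf_distance(train_values: list, serving_values: list) -> float:
    """Largest absolute gap between the category frequencies of two samples."""
    p, q = Counter(train_values), Counter(serving_values)
    n_p, n_q = len(train_values), len(serving_values)
    return max(abs(p[c] / n_p - q[c] / n_q) for c in p.keys() | q.keys())


# Made-up rows: the schema comes from training data and is asserted on serving data.
train = [{"age": 31, "plan": "basic"}, {"age": 45, "plan": "pro"}]
serving = [{"age": "52", "plan": "basic"}, {"plan": "trial", "region": "EU"}]

schema = infer_schema(train)
for anomaly in validate(serving, schema):
    print(anomaly)
print(linf_distance([r["plan"] for r in train], [r["plan"] for r in serving]))  # 0.5
```

In practice a drift score like this would be compared against a threshold chosen per feature, and an alert raised when the threshold is exceeded.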
Define train/serve skew and some potential ways to avoid them
More often than not, data is not passed to the modeling phase in its raw format. It needs to be preprocessed and hence undergoes several transformations. Many machine learning algorithms accept only numerical inputs and are not equipped to deal with missing values and outliers, so typical steps include removing or filling missing values, handling outliers, scaling numerical values, and encoding categorical features.
The challenge is that all of these processing steps need to be repeated at inference time, because the model expects the data on which predictions are requested to be in the same format as the training data.
If the data reaching the model at prediction time differs significantly from the training data, for example because a preprocessing step is applied differently or left out, there is train/serve skew.
There are multiple ways to avoid train/serve skew, such as:
- Keep the data preprocessing code in a separate module (a separate class or module.py file) that both the training and the serving code import
- Compose a preprocessing graph with the TFX Transform component (TensorFlow Transform), so that the same graph is applied in training and serving
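The first option, a shared preprocessing module, might look like the following sketch. It assumes two hypothetical features, a numeric `income` and a categorical `city`, and stores the fitted parameters as JSON so that the serving code reuses exactly what training computed; none of these names come from a specific library.

```python
import json
from statistics import mean, pstdev
from typing import List


class Preprocessor:
    """Imputes and standardizes 'income' and one-hot encodes 'city'."""

    def fit(self, rows: List[dict]) -> "Preprocessor":
        incomes = [r["income"] for r in rows if r["income"] is not None]
        self.params = {
            "income_mean": mean(incomes),
            "income_std": pstdev(incomes) or 1.0,   # avoid division by zero
            "cities": sorted({r["city"] for r in rows}),
        }
        return self

    def transform(self, row: dict) -> List[float]:
        p = self.params
        # Missing income is imputed with the training mean.
        income = p["income_mean"] if row["income"] is None else row["income"]
        scaled = (income - p["income_mean"]) / p["income_std"]
        # Categories unseen during training encode as all zeros.
        one_hot = [1.0 if row["city"] == c else 0.0 for c in p["cities"]]
        return [scaled] + one_hot

    def to_json(self) -> str:
        return json.dumps(self.params)

    @classmethod
    def from_json(cls, text: str) -> "Preprocessor":
        obj = cls()
        obj.params = json.loads(text)
        return obj


# Training side: fit once and persist the parameters next to the model.
training_rows = [
    {"income": 10.0, "city": "A"},
    {"income": 30.0, "city": "B"},
    {"income": None, "city": "A"},
]
saved = Preprocessor().fit(training_rows).to_json()

# Serving side: load the same parameters instead of recomputing them.
serving_prep = Preprocessor.from_json(saved)
print(serving_prep.transform({"income": 30.0, "city": "B"}))  # [1.0, 0.0, 1.0]
```

The important design choice is that the serving path never calls `fit`: statistics such as the mean and the category list are frozen at training time, which removes one common source of skew.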
In addition to CI and CD are there any other considerations unique to MLOps?
The introduction of data, and of the mathematical logic (algorithms/models) applied to that data, makes MLOps an interesting endeavor. For most software engineering projects CI/CD is ideally enough, but MLOps also introduces the following concepts:
Continuous Training: when to retrain the model, how often to do it, and so on.
Continuous Monitoring: whether the model is performing well in the field.
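The two concepts meet in the decision of when to retrain. A minimal sketch of such a decision rule follows; the `RetrainPolicy` name and all three thresholds are assumptions chosen for illustration, and a real policy would be tuned to the use case.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetrainPolicy:
    max_age: timedelta = timedelta(days=30)   # retrain at least this often
    min_metric: float = 0.85                  # lowest acceptable live metric
    min_new_labels: int = 1000                # fresh labelled rows needed to retrain

    def decide(self, trained_on: date, today: date,
               live_metric: float, new_labels: int) -> str:
        """Return the reason to retrain, or 'skip' when no retraining is due."""
        if new_labels < self.min_new_labels:
            return "skip"        # not enough new data to learn from
        if live_metric < self.min_metric:
            return "metric"      # continuous monitoring raised the flag
        if today - trained_on >= self.max_age:
            return "schedule"    # periodic refresh
        return "skip"


policy = RetrainPolicy()
print(policy.decide(date(2024, 1, 1), date(2024, 1, 10), live_metric=0.80, new_labels=5000))
# prints "metric"
```

Returning the reason rather than a plain boolean makes it easier to log why a training run was started, which feeds back into the audit trail.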
What can be some of the deployment strategies borrowed from DevOps that can be utilized in MLOps and how to achieve them?
There are a few concepts from web service deployments that map nicely onto strategies for deploying models into production environments, such as running several instances of a live inferencing application for scalability and progressively switching from an older to a newer model. Two of them are:
- Blue-Green Deployment
- Canary deployment
In a Blue-Green deployment, the newer version of the model is brought up in a staging environment that is almost identical to the production environment. In some cases, it is the same environment as production but the traffic is routed differently. With Kubernetes, for example, the new version can run as a separate deployment (or in a separate k8s cluster), the ‘blue’ deployment, while production traffic keeps going to the older ‘green’ deployment. This allows further testing of the newer model in a production-like environment before complete adoption. Once enough confidence is established in the newer model, production traffic is switched over to it, so that it takes over the ‘green’ (live) role; the older version is kept available for a quick rollback, and the process repeats with any further improvements.
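The essential mechanics, one live model, one staged model, and a single switch, can be sketched without any infrastructure. The `BlueGreenRouter` class below is a hypothetical in-process stand-in for what a load balancer or Kubernetes service selector would do; models are represented as plain callables.

```python
from typing import Callable, Optional

Model = Callable[[float], float]


class BlueGreenRouter:
    """Keeps one live model and one staged model; promotion is a single swap."""

    def __init__(self, live: Model) -> None:
        self.live = live
        self.staged: Optional[Model] = None
        self.previous: Optional[Model] = None

    def stage(self, candidate: Model) -> None:
        self.staged = candidate            # deployed, but receives no user traffic

    def predict(self, x: float) -> float:
        return self.live(x)                # all production traffic goes to live

    def smoke_test(self, x: float) -> float:
        if self.staged is None:
            raise RuntimeError("no staged model")
        return self.staged(x)              # test traffic only

    def promote(self) -> None:
        if self.staged is None:
            raise RuntimeError("no staged model")
        # The old live model is retained so that rollback stays cheap.
        self.previous, self.live, self.staged = self.live, self.staged, None

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("nothing to roll back to")
        self.live, self.previous = self.previous, None
```

The switch in `promote` is all-or-nothing, which is the defining property of this strategy: at any moment, users are served by exactly one version.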
Canary deployment is a bit more involved, and it exposes real users to a model that is not yet fully trusted, but it is gaining popularity in the DevOps community. It follows a similar deployment model to the blue-green approach discussed above but adds the ability to shift traffic progressively, depending on the level of confidence in the newer model. Traffic is routed to the newer model in increasing proportions while the previous model continues to serve predictions, so the two versions are live and processing requests simultaneously, but in different ratios. The reason for this percentage-based rollout is that metrics and other checks can capture problems in real time, allowing an immediate rollback if conditions are unfavorable.
Both of these strategies can be applied with Kubeflow, as it runs natively on Kubernetes.
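The percentage-based rollout of a canary deployment can be sketched as a weighted router. This is an in-process illustration only: `CanaryRouter`, the rollout steps, and the error-rate threshold are assumptions, and on Kubernetes the traffic split would normally be configured in the serving or networking layer rather than in application code.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Model = Callable[[float], float]


class CanaryRouter:
    """Sends a configurable share of requests to the new (canary) model."""

    def __init__(self, stable: Model, canary: Model,
                 steps: Sequence[float] = (0.05, 0.25, 0.5, 1.0),
                 seed: Optional[int] = None) -> None:
        self.stable, self.canary = stable, canary
        self.steps = list(steps)
        self.stage = 0                     # index into steps
        self.share = self.steps[0]         # current share of canary traffic
        self._rng = random.Random(seed)

    def predict(self, x: float) -> Tuple[float, str]:
        use_canary = self._rng.random() < self.share
        model = self.canary if use_canary else self.stable
        return model(x), ("canary" if use_canary else "stable")

    def advance(self, canary_error_rate: float,
                max_error_rate: float = 0.02) -> str:
        """Widen the rollout if the canary looks healthy, otherwise roll back."""
        if canary_error_rate > max_error_rate:
            self.share = 0.0               # immediate rollback: stable serves everything
            self.stage = -1                # a later healthy check restarts at steps[0]
            return "rolled_back"
        self.stage = min(self.stage + 1, len(self.steps) - 1)
        self.share = self.steps[self.stage]
        return "promoted" if self.share >= 1.0 else "advanced"
```

Calling `advance` after each monitoring interval with the canary's observed error rate moves the rollout one step forward or cancels it, which mirrors the progressive, metric-driven switch described above.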