Workflow Element Store

Data Collection
  1. Mobile Applications or IoT Applications
  2. Databases (NoSQL)
  3. Experiments (DoE)
  4. Web Scraping
  5. Flat files
  6. Databases (SQL)
  7. APIs and Data Feeds
  8. Data Collaboration and Partnerships
  9. Feedback Data
  10. Public Datasets
  11. Surveys and Questionnaires
Data Ingestion & Storage
  1. AWS S3
  2. AWS Glue
  3. Google Cloud Storage (GCS)
  4. Azure Stream Analytics
  5. RDBMS
  6. PostgreSQL
  7. Microsoft SQL Server
  8. Apache Kafka
  9. AWS Redshift
  10. Azure Synapse
  11. Azure Blob Storage
  12. GCP Dataflow
  13. MySQL
  14. GCP Data Fusion
  15. ETL/ELT pipeline
  16. MongoDB
  17. Oracle DB
  18. Azure Data Factory (ADF)
  19. AWS Kinesis
  20. AWS RDS
  21. GCP BigQuery
Preprocessing & Feature Engineering
  1. Dealing with Outliers
  2. Data Scaling and Normalization
  3. Data Transformations
  4. Domain-Specific Feature Engineering
  5. Handling Noisy Data
  6. Feature Extraction from Images
  7. Time-Based Features
  8. Annotation
  9. Feature Selection
  10. Dimensionality Reduction
  11. Textual Feature Extraction
  12. Handling Imbalanced Classes
  13. Handling Categorical Data
  14. Auto-Preprocessing libraries
  15. Interaction Features
  16. Handling Missing Data
  17. AutoEDA libraries
  18. Binning / Discretization
  19. Polynomial Features
  20. Augmentation
  21. Data Partitioning (Train, Validation, and Test)
  22. Handling Time-Series Data
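Several preprocessing elements above interact: Handling Missing Data and Data Scaling and Normalization should be fitted on the training partition only and then applied unchanged to the validation and test partitions, otherwise statistics from held-out data leak into training. A minimal pure-Python sketch of that order of operations follows. The helper names (`split_train_val_test`, `fit_impute_standardize`, `transform`), mean imputation, z-score scaling, and the default 15% validation / 15% test fractions (rounded down) are illustrative assumptions, not part of the original workflow.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple


def split_train_val_test(
    rows: Sequence, val_frac: float = 0.15, test_frac: float = 0.15, seed: int = 42
) -> Tuple[list, list, list]:
    """Shuffle once with a fixed seed, then slice into train / validation / test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)  # rounded down
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_impute_standardize(train: List[Optional[float]]) -> Tuple[float, float]:
    """Learn the fill value (mean) and the scale (std) from the training split only."""
    observed = [x for x in train if x is not None]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed) or 1.0  # avoid dividing by zero for a constant column
    return mean, std


def transform(values: List[Optional[float]], mean: float, std: float) -> List[float]:
    """Impute missing values with the training mean, then standardize to z-scores."""
    return [((mean if x is None else x) - mean) / std for x in values]


if __name__ == "__main__":
    data = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0, 9.0, 10.0]
    train, val, test = split_train_val_test(data)
    mean, std = fit_impute_standardize(train)
    print(len(train), len(val), len(test))
    print(transform(val, mean, std), transform(test, mean, std))
```

The same fitted `mean` and `std` would also be saved alongside the model so the inference pipeline can apply identical preprocessing.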
Training & Modelling
  1. Binary Classification Techniques
  2. Regression Analysis
  3. Regularization
  4. Performance Visualization
  5. Batch Normalization
  6. Batch Size Selection
  7. Natural Language Processing
  8. Forecasting Techniques
  9. Ensemble Techniques
  10. Reinforcement Learning
  11. Model Comparison
  12. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  13. Cross-Validation
  14. AutoML
  15. Association Rules
  16. Transfer Learning
  17. Weight Initialization
  18. Clustering
  19. Data Augmentation
  20. Black-Box Neural Network Models
  21. Recommendation Engine
  22. Regularization Techniques
  23. Learning Rate Scheduling
  24. Hyperparameter Tuning
  25. Model Interpretability
  26. Regular Monitoring and Logging
  27. Network Analytics / Geospatial Analytics
  28. Multiclass Classification Techniques
  29. Word Embeddings
  30. Evaluation Metrics
  31. External Validation
  32. Early Stopping
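Cross-Validation appears in the list above as a model-evaluation element. As a hedged illustration of the idea, the sketch below generates unshuffled k-fold train/validation index splits (sizing folds the way scikit-learn's `KFold` does when `shuffle=False`) and scores a mean-predictor baseline by mean absolute error on each held-out fold. The function names and the baseline "model" are hypothetical stand-ins; in practice a library splitter and a real estimator would be used.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for unshuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def cross_val_mae(y: List[float], k: int = 5) -> List[float]:
    """Score a mean-predictor baseline on each held-out fold with mean absolute error."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)  # "fit" on training folds only
        scores.append(sum(abs(y[i] - prediction) for i in val_idx) / len(val_idx))
    return scores


if __name__ == "__main__":
    y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
    fold_scores = cross_val_mae(y, k=5)
    print(fold_scores, sum(fold_scores) / len(fold_scores))
```

Every sample lands in exactly one validation fold, so the averaged fold score uses all the data for evaluation while never scoring a sample the baseline was fitted on. Time-series data would need a forward-chaining split instead.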
Storage
  1. Data Preprocessing Pipeline Models
  2. Databases
  3. Model Registry
  4. Code Repository
  5. Data Warehouse
Deployment & Monitoring
  1. Data Drift Monitoring
  2. Containerization
  3. Model Drift
  4. Bias and Fairness Assessment
  5. FastAPI
  6. Model Health Monitoring
  7. Alerting and Notification
  8. Model Versioning
  9. Flask
  10. Prediction Logging
  11. Streamlit
  12. Feedback Collection
  13. Performance Metrics
  14. Cloud Deployment
  15. Edge Deployment
  16. Model Serialization
  17. Serverless Computing
  18. Concept Drift Detection
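Data Drift Monitoring is often implemented by comparing a feature's distribution in live traffic against a reference sample from training. One common statistic for this is the Population Stability Index (PSI), which sums (a_i - e_i) * ln(a_i / e_i) over histogram bins, where e_i and a_i are the reference and live bin fractions. The sketch below is one possible implementation under stated assumptions: equal-width bins over the reference range (quantile bins are an equally common choice), out-of-range live values clamped into the edge bins, and empty-bin fractions floored at a small epsilon. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], lo: float, width: float, n_bins: int, eps: float) -> List[float]:
    """Fraction of values per equal-width bin; out-of-range values go to the edge bins."""
    counts = [0] * n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    # Floor at eps so an empty bin does not produce log(0) or a division by zero.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(
    reference: Sequence[float], live: Sequence[float], n_bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI = sum over bins of (live_frac - ref_frac) * ln(live_frac / ref_frac)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # constant reference column: fall back to unit width
    ref = _bin_fractions(reference, lo, width, n_bins, eps)
    cur = _bin_fractions(live, lo, width, n_bins, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print(population_stability_index(reference, reference))  # identical samples
    print(population_stability_index(reference, shifted))    # drifted sample
```

Each term is non-negative, so PSI is zero only when the binned distributions match. Cut-offs such as 0.1 (minor shift) and 0.25 (significant shift) are frequently quoted rules of thumb rather than fixed standards; an alert from this check would feed the Alerting and Notification element.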
ML Workflow Intermediate - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline
  1. Data Collection: Inference API, API Stream, Web crawler (Python, Selenium)
  2. Data Ingestion into the Data Landing Zone, which stores data from all the sources
  3. Data Cleaning / Preprocessing, producing derived & base features
  4. Data Training & Modelling
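The training pipeline is a chain of stages, each consuming the previous stage's output. The pure-Python sketch below mirrors that chain under simplifying assumptions: two in-memory lists stand in for the API stream and the web crawler, the landing zone is a plain list, cleaning only drops records with missing values, the derived feature is `x_sq = x ** 2`, and the model is an ordinary-least-squares line fitted on that derived feature. All function and column names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def ingest(*sources: List[Record]) -> List[Record]:
    """Data Ingestion: copy records from every source into one landing zone."""
    landing_zone: List[Record] = []
    for source in sources:
        landing_zone.extend(dict(record) for record in source)
    return landing_zone


def clean(records: List[Record]) -> List[Record]:
    """Data Cleaning / Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def add_features(records: List[Record]) -> List[Record]:
    """Derived & base features: keep the base column 'x' and derive 'x_sq'."""
    return [{**r, "x_sq": r["x"] ** 2} for r in records]


def fit_ols(records: List[Record], feature: str, target: str = "y") -> Dict[str, float]:
    """Training & Modelling: least-squares fit of target = slope * feature + intercept."""
    xs = [r[feature] for r in records]
    ys = [r[target] for r in records]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def run_training_pipeline(*sources: List[Record]) -> Dict[str, float]:
    """Run ingestion, cleaning, feature derivation, and training in order."""
    return fit_ols(add_features(clean(ingest(*sources))), feature="x_sq")


if __name__ == "__main__":
    api_stream = [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 9.0}]
    web_crawl = [{"x": 3.0, "y": 19.0}, {"x": None, "y": 5.0}, {"x": 4.0, "y": 33.0}]
    print(run_training_pipeline(api_stream, web_crawl))
```

The example data follow y = 2x² + 1, so a straight line fitted on the derived feature recovers slope 2 and intercept 1, which a straight line on the base feature `x` alone could not. Keeping each stage a separate function makes it easy to swap in real ingestion tools or a real model without touching the other stages.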

Inference Pipeline
  1. Input Data for Forecasting
  2. Cleaned & Processed Data
  3. Inference with the serialized model (pickle or Joblib)
  4. Serving via Streamlit or an Inference API
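The inference pipeline reuses the model produced by training, persisted with pickle or Joblib. The sketch below illustrates that hand-off with the standard-library `pickle` module only: fitted parameters are serialized to bytes at the end of training, then loaded, applied to freshly preprocessed input, and used to predict. The dictionary-of-parameters model, the fill-value preprocessing, and the function names are hypothetical simplifications; a real pipeline would typically persist the fitted estimator and its preprocessing steps themselves.

```python
import pickle
from typing import Dict, List, Optional


def save_model(params: Dict[str, float]) -> bytes:
    """End of training: serialize fitted parameters (the bytes could go to a file or model registry)."""
    return pickle.dumps(params)


def predict(model_bytes: bytes, raw_inputs: List[Optional[float]], fill_value: float) -> List[float]:
    """Inference: load the artifact, preprocess inputs, then apply the linear model."""
    params = pickle.loads(model_bytes)  # unpickling can execute code: load only trusted artifacts
    cleaned = [fill_value if x is None else x for x in raw_inputs]  # fill value learned at training time
    return [params["slope"] * x + params["intercept"] for x in cleaned]


if __name__ == "__main__":
    blob = save_model({"slope": 2.0, "intercept": 1.0})
    print(predict(blob, [0.0, None, 3.0], fill_value=1.0))
```

A Streamlit app or an Inference API endpoint (for example built with FastAPI or Flask, both listed among the deployment elements) would wrap `predict`, ideally loading the artifact once at start-up rather than on every request. For objects holding large NumPy arrays, `joblib.dump` and `joblib.load` are a common alternative to `pickle`, with the same caveat about loading only trusted files.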