Workflow Element Store

Data Collection
  1. Experiments (DoE)
  2. Web Scraping
  3. Mobile Applications or IoT Applications
  4. Databases - NoSQL
  5. Data Collaboration and Partnerships
  6. Databases - SQL
  7. Public Datasets
  8. Feedback Data
  9. Flat files
  10. APIs and Data Feeds
  11. Surveys and Questionnaires
Data Ingestion and Storage
  1. Apache Kafka
  2. Azure Synapse
  3. Oracle DB
  4. RDBMS
  5. Azure Blob Storage
  6. AWS Redshift
  7. AWS S3
  8. MySQL
  9. GCP Dataflow
  10. AWS Glue
  11. GCS
  12. AWS RDS
  13. GCP Data Fusion
  14. ETL/ELT pipeline
  15. Azure Data Factory (ADF)
  16. Azure Stream Analytics
  17. MongoDB
  18. PostgreSQL
  19. AWS Kinesis
  20. MS SQL Server
  21. GCP BigQuery
Data Cleaning / Preprocessing
  1. Domain-Specific Feature Engineering
  2. Binning / Discretization
  3. Textual Feature Extraction
  4. Dimensionality Reduction
  5. Annotation
  6. Time-Based Features
  7. Feature Selection
  8. Data Partitioning - Train, Validation, & Test
  9. Interaction Features
  10. Feature Extraction from Images
  11. Handling Imbalanced Classes
  12. Dealing with Outliers
  13. Handling Noisy Data
  14. AutoEDA libraries
  15. Handling Missing Data
  16. Data Scaling and Normalization
  17. Handling Time-Series Data
  18. Handling Categorical Data
  19. Data Transformations
  20. Auto-Preprocessing libraries
  21. Polynomial Features
  22. Augmentation
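
As an illustration of the "Data Partitioning - Train, Validation, & Test" item above, here is a minimal sketch using only the Python standard library. The function name `partition`, the 70/15/15 default split, and the fixed seed are assumptions made for this example, not choices taken from the workflow itself.

```python
import random


def partition(rows, train=0.7, val=0.15, seed=42):
    """Shuffle rows and split them into train / validation / test lists.

    `train` and `val` are fractions of the data; the remainder is the test set.
    The default fractions and seed are illustrative assumptions.
    """
    if not 0 < train < 1 or not 0 <= val < 1 or train + val >= 1:
        raise ValueError("train and val must be fractions with train + val < 1")

    shuffled = list(rows)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train_rows, val_rows, test_rows = partition(range(100))
print(len(train_rows), len(val_rows), len(test_rows))
```

A random shuffle like this is not appropriate for time-series data, where the split should usually respect time order.
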
Data Training & Modelling
  1. Regularization Techniques
  2. Reinforcement Learning
  3. Association Rules
  4. Performance Visualization
  5. Forecasting Techniques
  6. Batch Size Selection
  7. Multiclass Classification Techniques
  8. Network Analytics / Geospatial Analytics
  9. Transfer Learning
  10. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  11. Cross-Validation
  12. Data Augmentation
  13. Regularization
  14. Evaluation Metrics
  15. External Validation
  16. Recommendation Engine
  17. Model Interpretability
  18. Learning Rate Scheduling
  19. Hyperparameter Tuning
  20. Word Embeddings
  21. Natural Language Processing
  22. Weight Initialization
  23. Blackbox - Neural Network Models
  24. Binary Classification Techniques
  25. Cross-Validation
  26. AutoML
  27. Batch Normalization
  28. Regular Monitoring and Logging
  29. Early Stopping
  30. Ensemble Techniques
  31. Regression Analysis
  32. Clustering
  33. Model Comparison
  34. Transfer Learning
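
The "Cross-Validation" item above can be sketched as a k-fold index generator. This is a simplified standard-library illustration: the name `k_fold_indices` and the contiguous, unshuffled folds are assumptions of this example, and libraries such as scikit-learn provide fuller implementations.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index lists for k contiguous folds.

    Each sample lands in exactly one validation fold. When n_samples is not
    divisible by k, the first folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(10, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

A model would be trained once per fold on `train_idx` and scored on `val_idx`, and the scores averaged.
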
Repositories, Databases, and Registries
  1. Code repository
  2. Data warehouse
  3. Data Preprocessing pipeline models
  4. Databases
  5. Model registry
Deployment and Monitoring
  1. Bias and Fairness Assessment
  2. Feedback Collection
  3. Prediction Logging
  4. Serverless Computing
  5. Performance Metrics
  6. Streamlit
  7. Cloud Deployment
  8. Model Drift
  9. Flask
  10. FastAPI
  11. Model Serialization
  12. Concept Drift Detection
  13. Alerting and Notification
  14. Data Drift Monitoring
  15. Edge Deployment
  16. Model Versioning
  17. Containerization
  18. Model Health Monitoring
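
One simple way to approach the "Data Drift Monitoring" item above is to compare a live batch of a numeric feature against a reference sample from training. The sketch below is only one possible check. The function names, the 0.5 threshold, and the sample values are all hypothetical, and production systems often use statistical tests or dedicated monitoring tools instead.

```python
from statistics import mean, stdev


def mean_shift_score(reference, live):
    """Absolute shift of the live mean, in units of the reference standard deviation."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference sample has zero variance")
    return abs(mean(live) - mean(reference)) / ref_std


def has_drifted(reference, live, threshold=0.5):
    """Flag drift when the mean shift exceeds a (hypothetical) threshold."""
    return mean_shift_score(reference, live) > threshold


# Made-up values for illustration only.
reference = [10.0, 11.0, 9.0, 10.5, 9.5]
stable_batch = [10.2, 9.8, 10.1, 9.9]
shifted_batch = [13.0, 12.5, 13.5, 12.8]

print(has_drifted(reference, stable_batch))
print(has_drifted(reference, shifted_batch))
```

A result of `True` could feed the "Alerting and Notification" item in the same list.
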
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
Data Collection

Inference API

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline
Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference

Inference pickle
Inference Joblib
Streamlit
Inference API
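
The hand-off between the two pipelines above (a model trained in the Training Pipeline, then loaded through "Inference pickle" in the Inference Pipeline) can be sketched as follows. The toy mean forecaster, the function names, and the sample values are assumptions made for this illustration. A real pipeline would serialize an actual trained model.

```python
import pickle


def fit_mean_forecaster(targets):
    """Toy stand-in for training: the 'model' is just the mean of the targets."""
    return {"kind": "mean_forecaster", "mean": sum(targets) / len(targets)}


def predict(model, n_steps):
    """Forecast n_steps values from the fitted parameters."""
    return [model["mean"]] * n_steps


# Training pipeline: fit the model, then serialize it.
model = fit_mean_forecaster([12.0, 15.0, 18.0])
artifact = pickle.dumps(model)  # bytes; in practice written to a file or model registry

# Inference pipeline: deserialize the artifact and predict on new input.
loaded_model = pickle.loads(artifact)
forecast = predict(loaded_model, n_steps=2)
print(forecast)
```

The "Inference Joblib" path follows the same pattern with `joblib.dump` and `joblib.load` on a file. In either case, only load artifacts from a trusted source, since unpickling can execute arbitrary code.
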