Workflow Element Store

  1. Flat files
  2. Surveys and Questionnaires
  3. Mobile Applications or IoT Applications
  4. Web Scraping
  5. Experiments (DoE)
  6. Databases - SQL
  7. Databases - NoSQL
  8. Feedback Data
  9. APIs and Data Feeds
  10. Public Datasets
  11. Data Collaboration and Partnerships
  1. Apache Kafka
  2. AWS RDS
  3. MySQL
  4. GCP Dataflow
  5. AWS Kinesis
  6. GCP BigQuery
  7. Azure Synapse
  8. ETL/ELT pipeline
  9. Azure Stream Analytics
  10. Azure Blob Storage
  11. RDBMS
  12. PostgreSQL
  13. AWS Glue
  14. Oracle DB
  15. AWS S3
  16. GCS
  17. GCP Data Fusion
  18. MS SQL Server
  19. AWS Redshift
  20. Azure Data Factory (ADF)
  21. MongoDB
  1. Handling Missing Data
  2. Polynomial Features
  3. Data Scaling and Normalization
  4. Handling Imbalanced Classes
  5. Feature Selection
  6. Data Transformations
  7. Interaction Features
  8. Handling Time-Series Data
  9. Handling Noisy Data
  10. Data Partitioning - Train, Validation, & Test
  11. Binning / Discretization
  12. AutoEDA libraries
  13. Domain-Specific Feature Engineering
  14. Textual Feature Extraction
  15. Time-Based Features
  16. Feature Extraction from Images
  17. Dimensionality Reduction
  18. Augmentation
  19. Auto-Preprocessing libraries
  20. Handling Categorical Data
  21. Dealing with Outliers
  22. Annotation
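
A few of the preprocessing elements above (Handling Missing Data, Data Scaling and Normalization, and Data Partitioning - Train, Validation, & Test) can be illustrated with a minimal standard-library sketch. The function names, the mean-imputation strategy, the min-max scaling choice, and the 70/15/15 split are assumptions made for illustration only; the source does not prescribe any of them.

```python
import random
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no scale information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def partition(rows, train=0.7, validation=0.15, seed=0):
    """Shuffle rows and split them into train, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Hypothetical raw feature column with two missing readings.
raw = [4.0, None, 10.0, 6.0, None, 8.0, 2.0, 12.0, 14.0, 0.0]
clean = impute_mean(raw)
scaled = min_max_scale(clean)
train_set, val_set, test_set = partition(scaled)
```

In a real pipeline the imputation mean and the scaling bounds would normally be computed on the training split only and then reused for validation and test data, to avoid leakage; the sketch applies them to the whole column for brevity.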
  1. Transfer Learning
  2. Regular Monitoring and Logging
  3. Multiclass Classification Techniques
  4. Regularization Techniques
  5. Batch Size Selection
  6. Natural Language Processing
  7. Weight Initialization
  8. Binary Classification Techniques
  9. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  10. External Validation
  11. Batch Normalization
  12. Blackbox - Neural Network Models
  13. Ensemble Techniques
  14. Performance Visualization
  15. Cross-Validation
  16. Model Interpretability
  17. Forecasting Techniques
  18. Learning Rate Scheduling
  19. Reinforcement Learning
  20. Recommendation Engine
  21. Regression Analysis
  22. Data Augmentation
  23. Word Embeddings
  24. Network Analytics / Geospatial Analytics
  25. Model Comparison
  26. Association Rules
  27. Evaluation Metrics
  28. Clustering
  29. AutoML
  30. Hyperparameter Tuning
  31. Early Stopping
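
Cross-Validation and Hyperparameter Tuning from the list above can be combined in a small hand-rolled sketch. This is not scikit-learn's GridSearchCV; it is a plain-Python stand-in that assumes a one-feature ridge regression through the origin, and every name in it (`k_fold_indices`, `fit_ridge_slope`, `cv_mse`, the `grid` values) is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def fit_ridge_slope(xs, ys, lam):
    """Closed-form slope of a one-feature ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean validation MSE across k folds for one regularization strength."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        w = fit_ridge_slope(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx], lam
        )
        mse = sum((ys[i] - w * xs[i]) ** 2 for i in val_idx) / len(val_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


# Hypothetical noise-free data: y = 2x, so no regularization should win.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]

grid = [0.0, 1.0, 10.0]  # candidate regularization strengths
scores = {lam: cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
```

Each candidate in the grid is scored on held-out folds only, and the candidate with the lowest mean validation error is kept.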
  1. Code Repository
  2. Data Warehouse
  3. Data Preprocessing Pipeline Models
  4. Model Registry
  5. Databases
  1. Containerization
  2. Cloud Deployment
  3. Model Drift
  4. Streamlit
  5. Concept Drift Detection
  6. Model Versioning
  7. Bias and Fairness Assessment
  8. Edge Deployment
  9. FastAPI
  10. Data Drift Monitoring
  11. Feedback Collection
  12. Alerting and Notification
  13. Model Serialization
  14. Prediction Logging
  15. Performance Metrics
  16. Flask
  17. Serverless Computing
  18. Model Health Monitoring
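
Data Drift Monitoring and Alerting and Notification from the list above can be sketched as a simple threshold check. The statistic used here (shift of the mean, measured in reference standard deviations) and the threshold of 2.0 are assumptions chosen for brevity; production setups often use other tests, and the function names are hypothetical.

```python
import statistics


def mean_shift_score(reference, current):
    """Absolute shift of the mean, in units of the reference std deviation."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.pstdev(reference)
    cur_mean = statistics.fmean(current)
    if ref_std == 0:
        # Degenerate reference: any change at all counts as infinite shift.
        return 0.0 if cur_mean == ref_mean else float("inf")
    return abs(cur_mean - ref_mean) / ref_std


def check_drift(reference, current, threshold=2.0):
    """Return the drift score and whether it exceeds the alert threshold."""
    score = mean_shift_score(reference, current)
    return {"score": score, "drift": score > threshold}


# Hypothetical feature values seen at training time and in two serving windows.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
stable_window = [10, 9, 11, 10, 10]
shifted_window = [15, 16, 14, 15, 17]

stable_report = check_drift(reference, stable_window)
shifted_report = check_drift(reference, shifted_window)

if shifted_report["drift"]:
    # Placeholder for a real notification channel (email, pager, chat).
    alert_message = f"Data drift detected, score={shifted_report['score']:.2f}"
```

The alert here is only a formatted string; wiring it to an actual notification service is left out because the source does not name one.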
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  1. Data Collection: API Stream, Web crawler, Python, Selenium
  2. Data Ingestion
  3. Data Landing Zone: Store Data from all the Sources
  4. Data Cleaning / Preprocessing: Derived & Base features
  5. Data Training & Modelling
Inference Pipeline
  1. Input Data for Forecasting
  2. Input Data
  3. Cleaned & Processed Data
  4. Inference: pickle, Joblib, Streamlit, Inference API
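
The hand-off between the two pipelines (train, serialize with pickle, then load and run inference on cleaned input) can be sketched as follows. The mean forecaster is a toy placeholder rather than the model used in the architecture, and all function names are hypothetical. The model is kept as a plain dictionary and serialized in memory so the example stays self-contained; a real deployment would more likely write the pickle or Joblib artifact to storage and load it behind Streamlit or an inference API.

```python
import pickle


def clean(series):
    """Drop missing readings and cast the rest to float."""
    return [float(v) for v in series if v is not None]


def train(series):
    """Fit a toy forecaster that only remembers the training mean."""
    return {"kind": "mean_forecaster", "mean": sum(series) / len(series)}


def predict(model, horizon):
    """Forecast `horizon` future steps as the stored mean."""
    return [model["mean"]] * horizon


# Training pipeline: collect -> clean -> train -> serialize.
raw_training_data = [3, None, 5, 4]
model = train(clean(raw_training_data))
artifact = pickle.dumps(model)  # bytes that could be written to a .pkl file

# Inference pipeline: load the artifact and forecast.
# Only unpickle artifacts from a trusted source.
loaded_model = pickle.loads(artifact)
forecast = predict(loaded_model, horizon=3)
```

Keeping `clean` as a shared function reflects the diagram's point that inference input passes through the same cleaning step as training data.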