Workflow Element Store

  1. Web Scraping
  2. Surveys and Questionnaires
  3. Mobile Applications or IoT Applications
  4. Databases - NoSQL
  5. APIs and Data Feeds
  6. Databases - SQL
  7. Flat files
  8. Experiments (DoE)
  9. Public Datasets
  10. Data Collaboration and Partnerships
  11. Feedback Data
  1. RDBMS
  2. ETL/ELT pipeline
  3. Apache Kafka
  4. AWS Kinesis
  5. MS SQL Server
  6. AWS RDS
  7. Azure Data Factory (ADF)
  8. AWS S3
  9. GCP Dataflow
  10. MySQL
  11. Azure Blob Storage
  12. GCP BigQuery
  13. PostgreSQL
  14. AWS Redshift
  15. AWS Glue
  16. Azure Stream Analytics
  17. Azure Synapse
  18. Oracle DB
  19. MongoDB
  20. GCS
  21. GCP Data Fusion
  1. Binning / Discretization
  2. Annotation
  3. Handling Noisy Data
  4. Polynomial Features
  5. Feature Selection
  6. Dealing with Outliers
  7. Handling Time-Series Data
  8. Data Partitioning - Train, Validation, & Test
  9. Auto-Preprocessing libraries
  10. Feature Extraction from Images
  11. Data Transformations
  12. Handling Categorical Data
  13. Handling Missing Data
  14. Handling Imbalanced Classes
  15. Time-Based Features
  16. Domain-Specific Feature Engineering
  17. Interaction Features
  18. Augmentation
  19. Dimensionality Reduction
  20. AutoEDA libraries
  21. Data Scaling and Normalization
  22. Textual Feature Extraction
  1. Regression Analysis
  2. Performance Visualization
  3. Network Analytics / Geospatial Analytics
  4. Transfer Learning
  5. Multiclass Classification Techniques
  6. Forecasting Techniques
  7. Cross-Validation
  8. AutoML
  9. Weight Initialization
  10. Black Box - Neural Network Models
  11. Early Stopping
  12. External Validation
  13. Clustering
  14. Natural Language Processing
  15. Ensemble Techniques
  16. Model Interpretability
  17. Association Rules
  18. Batch Size Selection
  19. Word Embeddings
  20. Learning Rate Scheduling
  21. Regular Monitoring and Logging
  22. Reinforcement Learning
  23. Regularization Techniques
  24. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  25. Recommendation Engine
  26. Batch Normalization
  27. Binary Classification Techniques
  28. Data Augmentation
  29. Model Comparison
  30. Hyperparameter Tuning
  31. Evaluation Metrics
  1. Databases
  2. Model Registry
  3. Data Warehouse
  4. Code Repository
  5. Data Preprocessing Pipeline Models
  1. Cloud Deployment
  2. Model Versioning
  3. Streamlit
  4. Performance Metrics
  5. Bias and Fairness Assessment
  6. FastAPI
  7. Serverless Computing
  8. Concept Drift Detection
  9. Edge Deployment
  10. Prediction Logging
  11. Model Drift
  12. Alerting and Notification
  13. Feedback Collection
  14. Model Serialization
  15. Flask
  16. Data Drift Monitoring
  17. Containerization
  18. Model Health Monitoring
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  1. Data Collection: API stream, web crawler (Python, Selenium)
  2. Data Ingestion
  3. Data Landing Zone: store data from all the sources
  4. Data Cleaning / Preprocessing: derived and base features
  5. Data Training & Modelling
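
The training stages (collection, ingestion into a landing zone, cleaning, modelling) can be sketched as a chain of plain functions. This is a minimal illustration using only the Python standard library, not the workflow's actual implementation. The sample records, the function names (collect, ingest, clean, train, run_training_pipeline), and the choice of a least-squares line as the model are all assumptions made for the example.

```python
from statistics import mean


def collect():
    """Data Collection: stand-in for an API stream or web crawler.

    The records are hypothetical sample data, not taken from the source.
    """
    return [
        {"day": 1, "sales": "10"},
        {"day": 2, "sales": "12"},
        {"day": 3, "sales": None},  # missing value, removed during cleaning
        {"day": 4, "sales": "16"},
    ]


def ingest(records, landing_zone):
    """Data Ingestion: append raw records from a source to the landing zone."""
    landing_zone.extend(records)
    return landing_zone


def clean(landing_zone):
    """Data Cleaning / Preprocessing: drop missing rows and cast to float."""
    return [
        (float(r["day"]), float(r["sales"]))
        for r in landing_zone
        if r["sales"] is not None
    ]


def train(rows):
    """Data Training & Modelling: least-squares fit of sales = a + b * day."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


def run_training_pipeline():
    """Run the stages in the order given for the training pipeline."""
    landing_zone = ingest(collect(), [])
    return train(clean(landing_zone))


model = run_training_pipeline()
print(model)
```

Keeping each stage as its own function means a real source (an API client, a Selenium crawler) or a real model could replace the stand-ins without touching the other stages.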

Inference Pipeline
  1. Input Data for Forecasting
  2. Cleaned & Processed Data
  3. Inference: pickle, Joblib, Streamlit, Inference API
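
The inference stages can be sketched in the same spirit: load a serialized model, apply the same cleaning to the input data, and produce forecasts. This is a hedged example, not the workflow's actual code. The model parameters, the preprocess and predict function names, and the input values are hypothetical. The artifact is kept in memory with pickle.dumps so the block is self-contained; in practice it would typically be a file written with pickle.dump or joblib.dump.

```python
import pickle

# Hypothetical fitted model: parameters of sales = intercept + slope * day.
trained_model = {"intercept": 8.0, "slope": 2.0}

# Serialize the model as the training side would (bytes here, a file in practice).
artifact = pickle.dumps(trained_model)


def preprocess(raw_inputs):
    """Cleaned & Processed Data: drop missing values and cast to float."""
    return [float(v) for v in raw_inputs if v is not None]


def predict(artifact_bytes, raw_inputs):
    """Inference: load the serialized model and forecast for each input day."""
    model = pickle.loads(artifact_bytes)
    days = preprocess(raw_inputs)
    return [model["intercept"] + model["slope"] * day for day in days]


# Input Data for Forecasting: hypothetical raw values, one of them missing.
forecast = predict(artifact, ["5", None, "6"])
print(forecast)
```

A Streamlit page or an API endpoint would then only need to call predict with the user's input. Only unpickle artifacts from a trusted source, since loading a pickle can execute arbitrary code.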