Workflow Element Store

  1. Flat files
  2. Surveys and Questionnaires
  3. Feedback Data
  4. Databases - NoSQL
  5. Experiments (DoE)
  6. Public Datasets
  7. Mobile Applications or IoT Applications
  8. Data Collaboration and Partnerships
  9. APIs and Data Feeds
  10. Databases - SQL
  11. Web Scraping
  1. MongoDB
  2. AWS Redshift
  3. RDBMS
  4. Azure Blob Storage
  5. Oracle DB
  6. MS SQL Server
  7. MySQL
  8. AWS S3
  9. GCP BigQuery
  10. Azure Data Factory (ADF)
  11. AWS Kinesis
  12. Google Cloud Storage (GCS)
  13. Apache Kafka
  14. PostgreSQL
  15. GCP Data Fusion
  16. AWS Glue
  17. ETL/ELT pipeline
  18. Azure Stream Analytics
  19. Azure Synapse
  20. GCP Dataflow
  21. AWS RDS
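The ingestion items above (flat files as a source, an RDBMS as a target, an ETL/ELT pipeline in between) can be illustrated with a minimal sketch. It assumes an in-memory SQLite database in place of a production RDBMS, and the CSV contents, column names, and table name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; in practice this would be read from disk or object storage.
RAW_CSV = """sensor_id,reading,unit
a1,21.5,C
a2,,C
a3,70.7,F
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing reading and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["reading"]:
            continue  # missing value, skip the row
        value = float(row["reading"])
        if row["unit"] == "F":
            value = (value - 32.0) * 5.0 / 9.0
        cleaned.append((row["sensor_id"], round(value, 2)))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a SQL table (idempotent per sensor_id)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT PRIMARY KEY, celsius REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO readings VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the target RDBMS
load(conn, transform(extract(RAW_CSV)))
rows = conn.execute(
    "SELECT sensor_id, celsius FROM readings ORDER BY sensor_id"
).fetchall()
print(rows)
```

Swapping the order of `transform` and `load` (load raw rows first, then clean them with SQL) would turn the same sketch into an ELT flow.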
  1. Dimensionality Reduction
  2. AutoEDA libraries
  3. Binning / Discretization
  4. Auto-Preprocessing libraries
  5. Handling Categorical Data
  6. Handling Missing Data
  7. Data Transformations
  8. Domain-Specific Feature Engineering
  9. Data Scaling and Normalization
  10. Handling Noisy Data
  11. Handling Time-Series Data
  12. Handling Imbalanced Classes
  13. Annotation
  14. Textual Feature Extraction
  15. Augmentation
  16. Data Partitioning - Train, Validation, & Test
  17. Feature Selection
  18. Interaction Features
  19. Dealing with Outliers
  20. Polynomial Features
  21. Time-Based Features
  22. Feature Extraction from Images
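Two of the preprocessing items above, data partitioning and data scaling, interact: scaling parameters should be learned on the training split only and then reused on the other splits. A minimal stdlib-only sketch follows; the function names and split fractions are assumptions chosen for illustration.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and cut it into train / validation / test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_min_max(values):
    """Learn min-max scaling parameters from the training data only."""
    lo, hi = min(values), max(values)
    span = hi - lo if hi > lo else 1.0  # avoid division by zero for constant features
    return lo, span


def apply_min_max(values, params):
    """Apply previously fitted parameters; unseen data may fall outside [0, 1]."""
    lo, span = params
    return [(v - lo) / span for v in values]


data = [float(x) for x in range(10)]
train, val, test = train_val_test_split(data)
params = fit_min_max(train)              # fitted on train only, to avoid leakage
train_scaled = apply_min_max(train, params)
val_scaled = apply_min_max(val, params)
test_scaled = apply_min_max(test, params)
```

For time-series data a random shuffle is usually inappropriate; a chronological cut would replace the shuffle step.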
  1. External Validation
  2. Evaluation Metrics
  3. Regression Analysis
  4. Natural Language Processing
  5. Regularization Techniques
  6. Learning Rate Scheduling
  7. Cross-Validation
  8. AutoML
  9. Ensemble Techniques
  10. Early Stopping
  11. Regular Monitoring and Logging
  12. Recommendation Engine
  13. Reinforcement Learning
  14. Binary Classification Techniques
  15. Transfer Learning
  16. Word Embeddings
  17. Transfer Learning
  18. Data Augmentation
  19. Batch Normalization
  20. Weight Initialization
  21. Network Analytics / Geospatial Analytics
  22. Batch Size Selection
  23. Forecasting Techniques
  24. Cross-Validation
  25. Regularization
  26. Model Interpretability
  27. Clustering
  28. Hyperparameter Tuning
  29. Performance Visualization
  30. Multiclass Classification Techniques
  31. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  32. Association Rules
  33. Model Comparison
  34. Blackbox - Neural Network Models
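Cross-validation and model comparison from the list above can be sketched without any ML library. The example below assumes contiguous, unshuffled folds and uses a deliberately trivial baseline (predict the training mean) as a stand-in model; both choices are illustrative, not a recommendation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one validation fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def mean_predictor_mse(y, train_idx, val_idx):
    """Baseline 'model': predict the training mean, score with mean squared error."""
    mean = sum(y[i] for i in train_idx) / len(train_idx)
    return sum((y[i] - mean) ** 2 for i in val_idx) / len(val_idx)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
folds = list(k_fold_indices(len(y), k=3))
scores = [mean_predictor_mse(y, tr, va) for tr, va in folds]
cv_score = sum(scores) / len(scores)  # average validation error across folds
print(cv_score)
```

Any candidate model can be compared against this baseline by replacing `mean_predictor_mse` with a function that fits on `train_idx` and scores on `val_idx`.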
  1. Model registry
  2. Data warehouse
  3. Code repository
  4. Databases
  5. Data Preprocessing pipeline models
  1. Prediction Logging
  2. Concept Drift Detection
  3. Flask
  4. Data Drift Monitoring
  5. Edge Deployment
  6. Bias and Fairness Assessment
  7. Alerting and Notification
  8. Cloud Deployment
  9. FastAPI
  10. Model Versioning
  11. Streamlit
  12. Containerization
  13. Performance Metrics
  14. Model Drift
  15. Model Health Monitoring
  16. Model Serialization
  17. Serverless Computing
  18. Feedback Collection
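Data drift monitoring and alerting from the list above can be sketched with the Population Stability Index (PSI), one common drift statistic. The equal-width binning, the `eps` floor for empty bins, and the 0.2 alert threshold are assumptions here; the threshold is a frequently quoted rule of thumb and should be tuned per use case.

```python
import math


def psi(expected, actual, n_bins=5, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant reference

    def proportions(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = int((x - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cut-off, not a universal constant


def check_drift(reference, live):
    """Return the PSI score and whether it should trigger an alert."""
    score = psi(reference, live)
    return {"psi": score, "alert": score > DRIFT_THRESHOLD}


reference = [i / 100 for i in range(100)]
stable = check_drift(reference, list(reversed(reference)))   # same distribution
shifted = check_drift(reference, [x + 0.5 for x in reference])  # shifted inputs
print(stable["alert"], shifted["alert"])
```

In a deployed setting the `alert` flag would feed the alerting and notification step rather than a `print` call.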
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
Data Collection

Inference API

API Stream

Web crawler

Python

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling
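The training pipeline stages named above (collection, ingestion into a landing zone, cleaning with base and derived features, training) can be chained as plain functions. This is a minimal sketch: the records, field names, derived feature, and the one-variable least-squares model are all hypothetical stand-ins.

```python
from statistics import mean


def collect():
    """Data collection: stand-ins for an API stream and a web crawler."""
    api_stream = [{"day": 1, "sales": 10.0}, {"day": 2, "sales": 12.0}]
    web_crawler = [{"day": 3, "sales": None}, {"day": 4, "sales": 16.0}]
    return [api_stream, web_crawler]


def ingest(sources):
    """Data ingestion: store records from all sources in one landing zone."""
    landing_zone = []
    for source in sources:
        landing_zone.extend(source)
    return landing_zone


def clean(landing_zone):
    """Cleaning / preprocessing: drop incomplete rows and add a derived feature."""
    rows = [r for r in landing_zone if r["sales"] is not None]
    # `day` is a base feature; `is_even_day` is a (hypothetical) derived feature.
    return [{**r, "is_even_day": r["day"] % 2 == 0} for r in rows]


def train(rows):
    """Training: fit sales = intercept + slope * day by ordinary least squares."""
    xs = [r["day"] for r in rows]
    ys = [r["sales"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return {"intercept": y_bar - slope * x_bar, "slope": slope}


# Each stage consumes the previous stage's output, mirroring the diagram's flow.
model = train(clean(ingest(collect())))
print(model)
```

Keeping each stage a pure function makes it straightforward to reuse `clean` unchanged on the inference side, so training and inference see identically processed data.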

Inference Pipeline
Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference

Inference pickle
Inference Joblib
Streamlit
Inference API
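The hand-off between the two pipelines (a serialized model written by training, then loaded for inference behind an API) can be sketched with the standard-library `pickle` module. The model parameters, payload shape, and handler name below are hypothetical; an in-memory buffer stands in for a file or model registry. `joblib` offers a similar `dump` / `load` pair and is not shown here.

```python
import io
import pickle

# Hypothetical fitted parameters, standing in for the training pipeline's output.
trained_model = {"intercept": 8.0, "slope": 2.0}

# Training side: serialize the model. A BytesIO buffer stands in for a file on disk.
buffer = io.BytesIO()
pickle.dump(trained_model, buffer)


def load_model(raw_bytes):
    """Deserialize a model artifact.

    Only unpickle artifacts from a trusted source: unpickling can run arbitrary code.
    """
    return pickle.loads(raw_bytes)


def predict(model, payload):
    """Inference handler sketch: cleaned input data in, forecast out."""
    days = payload["days"]
    return {"forecast": [model["intercept"] + model["slope"] * d for d in days]}


# Inference side: load once at start-up, then serve requests.
model = load_model(buffer.getvalue())
response = predict(model, {"days": [5, 6]})
print(response)  # {'forecast': [18.0, 20.0]}
```

A Flask, FastAPI, or Streamlit front end would wrap `predict` in a route or a UI callback; the load-once, predict-many structure stays the same.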