Workflow Element Store

  1. Web Scraping
  2. Databases - NoSQL
  3. Databases - SQL
  4. Experiments (DoE)
  5. Data Collaboration and Partnerships
  6. Mobile Applications or IoT Applications
  7. Surveys and Questionnaires
  8. Flat files
  9. APIs and Data Feeds
  10. Feedback Data
  11. Public Datasets
  1. AWS Redshift
  2. AWS Kinesis
  3. RDBMS
  4. Azure ADF
  5. MongoDB
  6. GCP Data Fusion
  7. MS SQL Server
  8. Azure Blob Storage
  9. GCS
  10. Oracle DB
  11. PostgreSQL
  12. GCP Dataflow
  13. Azure Stream Analytics
  14. ETL/ELT pipeline
  15. GCP BigQuery
  16. AWS S3
  17. AWS Glue
  18. Apache Kafka
  19. MySQL
  20. AWS RDS
  21. Azure Synapse
  1. Textual Feature Extraction
  2. Feature Extraction from Images
  3. Data Scaling and Normalization
  4. Dimensionality Reduction
  5. Handling Imbalanced Classes
  6. Handling Missing Data
  7. Annotation
  8. Auto-Preprocessing libraries
  9. Data Partitioning - Train, Validation, & Test
  10. Handling Categorical Data
  11. Polynomial Features
  12. AutoEDA libraries
  13. Feature Selection
  14. Interaction Features
  15. Binning / Discretization
  16. Data Transformations
  17. Domain-Specific Feature Engineering
  18. Dealing with Outliers
  19. Handling Noisy Data
  20. Handling Time-Series Data
  21. Augmentation
  22. Time-Based Features
  1. AutoML
  2. Forecasting Techniques
  3. Ensemble Techniques
  4. Regression Analysis
  5. Batch Size Selection
  6. Recommendation Engine
  7. Early Stopping
  8. Multiclass Classification Techniques
  9. Network Analytics / Geospatial Analytics
  10. Cross-Validation
  11. Hyperparameter Tuning
  12. Model Interpretability
  13. Word Embeddings
  14. Natural Language Processing
  15. Transfer Learning
  16. Blackbox - Neural Network Models
  17. Clustering
  18. Data Augmentation
  19. Model Comparison
  20. Binary Classification Techniques
  21. External Validation
  22. Batch Normalization
  23. Reinforcement Learning
  24. Evaluation Metrics
  25. Performance Visualization
  26. Regularization Techniques
  27. Weight Initialization
  28. Cross-Validation
  29. Regular Monitoring and Logging
  30. Transfer Learning
  31. Learning Rate Scheduling
  32. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  33. Regularization
  34. Association Rules
  1. Data Warehouse
  2. Code Repository
  3. Model Registry
  4. Databases
  5. Data Preprocessing pipeline models
  1. Concept Drift Detection
  2. Model Serialization
  3. Cloud Deployment
  4. Containerization
  5. Performance Metrics
  6. Edge Deployment
  7. Flask
  8. Model Drift
  9. Feedback Collection
  10. Alerting and Notification
  11. Prediction Logging
  12. Streamlit
  13. FastAPI
  14. Data Drift Monitoring
  15. Bias and Fairness Assessment
  16. Model Versioning
  17. Serverless Computing
  18. Model Health Monitoring
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
Data Collection

Inference API

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling
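The training pipeline stages above (collection, landing zone, cleaning, features, training) can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the records, the field names `day` and `sales`, and the helper functions `ingest`, `clean`, `add_features`, and `fit_line` are all hypothetical, and a plain least-squares line stands in for whatever model the workflow actually trains.

```python
from statistics import mean

# Hypothetical raw records from two sources (an API stream and a web crawler).
api_rows = [
    {"day": 1, "sales": 10.0},
    {"day": 2, "sales": None},  # missing target, removed during cleaning
    {"day": 3, "sales": 14.0},
]
crawler_rows = [{"day": 4, "sales": 16.0}, {"day": 5, "sales": 18.0}]


def ingest(*sources):
    """Landing zone: store data from all the sources in one place."""
    landed = []
    for rows in sources:
        landed.extend(rows)
    return landed


def clean(rows):
    """Cleaning / preprocessing: drop records whose target is missing."""
    return [r for r in rows if r["sales"] is not None]


def add_features(rows):
    """Keep the base feature (day) and add a derived one (day squared)."""
    return [{**r, "day_sq": r["day"] ** 2} for r in rows]


def fit_line(rows):
    """Training: least-squares fit of sales = slope * day + intercept.

    Only the base feature is used here; the derived feature is carried
    along to show where it would enter a richer model.
    """
    xs = [r["day"] for r in rows]
    ys = [r["sales"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


landed = ingest(api_rows, crawler_rows)
features = add_features(clean(landed))
model = fit_line(features)
```

Keeping each stage as its own function mirrors the boxes in the diagram, so the same `clean` and `add_features` steps could be reused on the inference side.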

Inference Pipeline
Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference

Inference pickle
Inference Joblib
Streamlit
Inference API
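The inference pipeline above (input data, cleaning, loading a pickled model, predicting) might look roughly like the following. It is a minimal sketch, not the workflow's actual code: the model is assumed to be a dictionary of line parameters, and `preprocess` and `predict` are hypothetical helpers. It uses the standard-library `pickle` module; `joblib.dump` and `joblib.load` play the same role when the model is saved with Joblib.

```python
import pickle


def preprocess(raw_inputs):
    """Clean incoming data: drop missing values and cast to float."""
    return [float(x) for x in raw_inputs if x is not None]


def predict(model, days):
    """Apply the (assumed) linear model to each cleaned input."""
    return [model["slope"] * d + model["intercept"] for d in days]


# Training side: serialize the fitted parameters to bytes.
# The values are made up for illustration.
model_bytes = pickle.dumps({"slope": 2.0, "intercept": 8.0})

# Inference side: load the artifact, clean the input, predict.
loaded_model = pickle.loads(model_bytes)
forecast = predict(loaded_model, preprocess([6, None, 7]))
```

In a deployed setup the bytes would normally live in a file or a model registry, and `predict` would sit behind the Inference API or a Streamlit front end. Only unpickle artifacts from a trusted source, since loading a pickle can execute arbitrary code.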