Workflow Element Store

Data Collection
  1. Web Scraping
  2. Data Collaboration and Partnerships
  3. Public Datasets
  4. Flat files
  5. Feedback Data
  6. Experiments (DoE)
  7. Databases - SQL
  8. APIs and Data Feeds
  9. Mobile Applications or IoT Applications
  10. Databases - NoSQL
  11. Surveys and Questionnaires
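Sources such as flat files and SQL databases can be read into one common record format before ingestion. The following is a minimal sketch using only the Python standard library. The sample CSV text, the `readings` table, and the helper names are hypothetical, and an in-memory SQLite database stands in for whatever SQL database is actually used.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content; in practice this would be a file on disk.
CSV_TEXT = "sensor,value\na,1.5\nb,2.0\n"


def read_flat_file(text):
    """Parse CSV text into a list of dict records."""
    return [
        {"sensor": row["sensor"], "value": float(row["value"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_sql(conn):
    """Read records from a hypothetical `readings` table."""
    cursor = conn.execute("SELECT sensor, value FROM readings ORDER BY sensor")
    return [{"sensor": s, "value": v} for s, v in cursor]


# In-memory SQLite stands in for a real SQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("c", 3.25), ("d", 4.0)])

# Tag each record with its source so later stages can trace its origin.
records = [dict(r, source="flat_file") for r in read_flat_file(CSV_TEXT)]
records += [dict(r, source="sql") for r in read_sql(conn)]
conn.close()
```

Tagging each record with a `source` field is one possible design choice; it keeps provenance visible once data from several sources lands in the same place.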
Data Ingestion
  1. Apache Kafka
  2. GCP BigQuery
  3. AWS Glue
  4. GCS
  5. AWS Redshift
  6. GCP Data Fusion
  7. MySQL
  8. GCP Dataflow
  9. RDBMS
  10. Azure Data Factory (ADF)
  11. AWS RDS
  12. PostgreSQL
  13. Oracle DB
  14. ETL/ELT pipeline
  15. Azure Synapse
  16. AWS S3
  17. MongoDB
  18. Azure Stream Analytics
  19. MS SQL Server
  20. Azure Blob Storage
  21. AWS Kinesis
Data Cleaning / Preprocessing
  1. Handling Categorical Data
  2. Interaction Features
  3. Handling Time-Series Data
  4. Textual Feature Extraction
  5. Augmentation
  6. Annotation
  7. AutoEDA libraries
  8. Polynomial Features
  9. Handling Missing Data
  10. Data Transformations
  11. Feature Selection
  12. Domain-Specific Feature Engineering
  13. Handling Imbalanced Classes
  14. Dimensionality Reduction
  15. Data Partitioning - Train, Validation, & Test
  16. Time-Based Features
  17. Auto-Preprocessing libraries
  18. Feature Extraction from Images
  19. Binning / Discretization
  20. Dealing with Outliers
  21. Data Scaling and Normalization
  22. Handling Noisy Data
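Two of the elements above, data partitioning and data scaling, interact: scaling parameters are normally learned from the training split only and then reused on the other splits. The following is a minimal standard-library sketch of that idea. The split ratios, the seed, the single-feature data, and all function names are illustrative assumptions.

```python
import random


def split_indices(n, train=0.7, val=0.15, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_val = round(n * val)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


def fit_min_max(values):
    """Learn the minimum and maximum from the training values only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values relative to the training range [lo, hi]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


data = [float(x) for x in range(20)]  # hypothetical single feature
train_idx, val_idx, test_idx = split_indices(len(data))

train = [data[i] for i in train_idx]
lo, hi = fit_min_max(train)

train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max([data[i] for i in test_idx], lo, hi)
```

Because the range comes from the training split, scaled validation or test values can fall slightly outside [0, 1]; that is expected and avoids leaking information from held-out data.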
Data Training & Modelling
  1. Binary Classification Techniques
  2. Batch Normalization
  3. Word Embeddings
  4. Weight Initialization
  5. Model Comparison
  6. Clustering
  7. Association Rules
  8. Performance Visualization
  9. Ensemble Techniques
  10. Early Stopping
  11. Blackbox - Neural Network Models
  12. Batch Size Selection
  13. Regularization Techniques
  14. Reinforcement Learning
  15. Network Analytics / Geospatial Analytics
  16. Regression Analysis
  17. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  18. Regularization
  19. Natural Language Processing
  20. Recommendation Engine
  21. Hyperparameter Tuning
  22. Regular Monitoring and Logging
  23. Multiclass Classification Techniques
  24. Cross-Validation
  25. Learning Rate Scheduling
  26. Forecasting Techniques
  27. Transfer Learning
  28. AutoML
  29. Model Interpretability
  30. External Validation
  31. Data Augmentation
  32. Evaluation Metrics
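As an illustration of how cross-validation, evaluation metrics, and model comparison fit together, the following standard-library sketch scores a baseline with k-fold cross-validation. The target values, the mean-predicting baseline, and the function names are hypothetical; a real project would more likely use a library implementation.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets; the "model" is a baseline that predicts the training mean.
y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]

fold_scores = []
for train_idx, val_idx in k_fold_indices(len(y), k=5):
    train_mean = sum(y[i] for i in train_idx) / len(train_idx)
    predictions = [train_mean] * len(val_idx)
    fold_scores.append(mean_absolute_error([y[i] for i in val_idx], predictions))

# Average validation error across folds, usable for comparing candidate models.
cv_score = sum(fold_scores) / len(fold_scores)
```

The folds here are contiguous and unshuffled, which is a simplification; for data without a meaningful order, shuffling the indices before splitting is a common choice.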
Storage
  1. Code repository
  2. Databases
  3. Model registry
  4. Data Preprocessing pipeline models
  5. Data warehouse
Deployment & Monitoring
  1. Cloud Deployment
  2. Bias and Fairness Assessment
  3. Alerting and Notification
  4. Data Drift Monitoring
  5. Performance Metrics
  6. Model Health Monitoring
  7. Model Versioning
  8. Containerization
  9. Streamlit
  10. Flask
  11. Concept Drift Detection
  12. Model Serialization
  13. Model Drift
  14. Prediction Logging
  15. Feedback Collection
  16. Serverless Computing
  17. FastAPI
  18. Edge Deployment
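Data drift monitoring can be sketched by comparing the distribution of a feature at training time with its distribution in production. The example below uses the Population Stability Index (PSI) as one possible drift score; the source does not name a specific metric, so PSI, the bin count, the sample data, and the alert threshold are all assumptions made for illustration.

```python
import math


def histogram_fractions(values, edges):
    """Return the fraction of values in each bin defined by consecutive edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the edge range are clamped into the first or last bin.
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference, current, bins=4, eps=1e-6):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    ref_frac = histogram_fractions(reference, edges)
    cur_frac = histogram_fractions(current, edges)
    psi = 0.0
    for r, c in zip(ref_frac, cur_frac):
        # eps avoids log(0) and division by zero for empty bins.
        r, c = max(r, eps), max(c, eps)
        psi += (c - r) * math.log(c / r)
    return psi


reference = [float(x) for x in range(100)]      # hypothetical training feature
shifted = [float(x) + 50.0 for x in range(100)]  # hypothetical production feature

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)

ALERT_THRESHOLD = 0.2  # hypothetical value; tune per use case
should_alert = psi_shifted > ALERT_THRESHOLD
```

A check like this could run on a schedule against logged predictions and feed the alerting element, with the threshold chosen from observed variation rather than fixed in advance.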
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  1. Data Collection: API Stream, Web crawler (Python, Selenium), Inference API
  2. Data Ingestion: Data Landing Zone (store data from all the sources)
  3. Data Cleaning / Preprocessing: Derived & Base features
  4. Data Training & Modelling
Inference Pipeline
  1. Input Data for Forecasting: Input Data
  2. Cleaned & Processed Data
  3. Inference: pickle, Joblib, Streamlit, Inference API
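The hand-off between the training pipeline and the inference pipeline can be sketched as a serialized model artifact: training writes it, inference loads it and scores new input. The example below uses `pickle` from the Python standard library; the least-squares line fit, the sample data, and the function names are hypothetical stand-ins for the real model.

```python
import pickle


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    """Apply the fitted line to new inputs."""
    return [model["slope"] * x + model["intercept"] for x in xs]


# Training pipeline: fit on hypothetical data and serialize the parameters.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]
artifact = pickle.dumps(fit_line(train_x, train_y))  # bytes; could be written to disk

# Inference pipeline: load the artifact and score new input data.
loaded_model = pickle.loads(artifact)
forecast = predict(loaded_model, [4.0, 5.0])
```

Storing the parameters as a plain dictionary keeps the artifact independent of any class definition. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.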