Workflow Element Store

  1. Feedback Data
  2. Flat files
  3. Mobile Applications or IoT Applications
  4. Public Datasets
  5. Databases - SQL
  6. Experiments (DoE)
  7. Web Scraping
  8. APIs and Data Feeds
  9. Data Collaboration and Partnerships
  10. Databases - NoSQL
  11. Surveys and Questionnaires
  1. Azure blob storage
  2. Oracle DB
  3. AWS S3
  4. Azure Data Factory (ADF)
  5. GCP Dataflow
  6. ETL/ELT pipeline
  7. GCP BigQuery
  8. AWS Glue
  9. Azure Synapse
  10. MongoDB
  11. Azure Stream Analytics
  12. MySQL
  13. AWS Redshift
  14. MS SQL Server
  15. Apache Kafka
  16. AWS Kinesis
  17. PostgreSQL
  18. GCP Data Fusion
  19. RDBMS
  20. AWS RDS
  21. Google Cloud Storage (GCS)
  1. Domain-Specific Feature Engineering
  2. AutoEDA libraries
  3. Handling Categorical Data
  4. Dimensionality Reduction
  5. Feature Extraction from Images
  6. Handling Noisy Data
  7. Feature Selection
  8. Data Transformations
  9. Handling Imbalanced Classes
  10. Handling Missing Data
  11. Textual Feature Extraction
  12. Data Scaling and Normalization
  13. Data Partitioning - Train, Validation, & Test
  14. Auto-Preprocessing libraries
  15. Dealing with Outliers
  16. Polynomial Features
  17. Annotation
  18. Handling Time-Series Data
  19. Binning / Discretization
  20. Interaction Features
  21. Time-Based Features
  22. Augmentation
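
Three of the preprocessing elements listed above (data partitioning, handling missing data, and data scaling) can be sketched together as follows. This is a minimal illustration in plain Python, not a prescribed implementation: the function names, the `ages` column, and the 60/20/20 split are all assumptions made for the example. The imputation value and scaling statistics are learned from the training partition only and then reused for validation and test data, which is one common way to avoid leaking information across partitions.

```python
import random
from statistics import mean, pstdev


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of the rows, then cut it into three partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_preprocessor(train_values):
    """Learn the imputation value and scaling statistics from training data only."""
    observed = [v for v in train_values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in train_values]
    sigma = pstdev(filled)
    # Fall back to 1.0 so a constant column does not cause division by zero.
    return {"fill": fill, "mean": mean(filled), "std": sigma if sigma > 0 else 1.0}


def transform(values, params):
    """Impute missing entries, then standardize with the training statistics."""
    filled = [params["fill"] if v is None else v for v in values]
    return [(v - params["mean"]) / params["std"] for v in filled]


# Toy feature column with two missing entries (hypothetical data).
ages = [25.0, None, 35.0, 45.0, None, 55.0, 30.0, 40.0, 50.0, 60.0]

train, val, test = train_val_test_split(ages)
params = fit_preprocessor(train)
train_scaled = transform(train, params)
val_scaled = transform(val, params)
test_scaled = transform(test, params)
```

The same `params` dictionary would be the piece to store alongside the model so that inference data is transformed exactly as the training data was.
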
  1. Early Stopping
  2. Association Rules
  3. Forecasting Techniques
  4. Cross-Validation
  5. Recommendation Engine
  6. Hyperparameter Tuning
  7. Model Comparison
  8. Weight Initialization
  9. Data Augmentation
  10. Learning Rate Scheduling
  11. Word Embeddings
  12. Black-box - Neural Network Models
  13. Transfer Learning
  14. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  15. Regression Analysis
  16. Batch Size Selection
  17. Reinforcement Learning
  18. Network Analytics / Geospatial Analytics
  19. Binary Classification Techniques
  20. Performance Visualization
  21. Batch Normalization
  22. Regularization Techniques
  23. AutoML
  24. Multiclass Classification Techniques
  25. Regular Monitoring and Logging
  26. Evaluation Metrics
  27. Model Interpretability
  28. External Validation
  29. Natural Language Processing
  30. Clustering
  31. Ensemble Techniques
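
Several of the modelling elements above (cross-validation, hyperparameter tuning by grid search, regularization, and an evaluation metric) fit together in one loop. The sketch below shows that loop in plain Python under stated assumptions: the model is a deliberately tiny one-feature ridge regression without intercept, the data is synthetic, and every function name is hypothetical. In practice a library class such as scikit-learn's `GridSearchCV` would typically take this role.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds.

    Rows are assumed to be shuffled already (or ordered in time).
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def fit_ridge(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, w):
    """Mean squared error of the predictions w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search_cv(xs, ys, lambdas, k=5):
    """Return the lambda with the lowest mean validation MSE, plus all scores."""
    scores = {}
    for lam in lambdas:
        fold_errors = []
        for train_idx, val_idx in k_fold_indices(len(xs), k):
            w = fit_ridge([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
            fold_errors.append(mse([xs[i] for i in val_idx], [ys[i] for i in val_idx], w))
        scores[lam] = sum(fold_errors) / len(fold_errors)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data: y is roughly 2x with a small alternating offset (hypothetical).
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]
lambdas = [0.0, 0.1, 1.0, 10.0]

best_lam, scores = grid_search_cv(xs, ys, lambdas)
```

Model comparison follows the same pattern: replace the loop over `lambdas` with a loop over candidate models and compare their mean validation scores.
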
  1. Databases
  2. Data Warehouse
  3. Model Registry
  4. Data Preprocessing Pipeline Models
  5. Code Repository
  1. Concept Drift Detection
  2. Model Drift
  3. Model Serialization
  4. Performance Metrics
  5. Prediction Logging
  6. Alerting and Notification
  7. Edge Deployment
  8. FastAPI
  9. Flask
  10. Serverless Computing
  11. Cloud Deployment
  12. Model Versioning
  13. Containerization
  14. Bias and Fairness Assessment
  15. Feedback Collection
  16. Data Drift Monitoring
  17. Streamlit
  18. Model Health Monitoring
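
As an illustration of the data drift monitoring and alerting elements above, the sketch below compares a live feature distribution against the training distribution using the Population Stability Index (PSI). PSI is only one of several possible drift measures and is chosen here as an assumption; the function names, the bin count, and the 0.25 alert threshold (a commonly quoted rule of thumb) are likewise illustrative rather than part of the workflow itself.

```python
import math


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values per bin; out-of-range values go to the first or last bin."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            if v < edges[i + 1] or i == last:
                counts[i] += 1
                break
    # Floor each fraction at eps so the logarithm below is always defined.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(reference, current, n_bins=5):
    """PSI = sum over bins of (current - reference) * ln(current / reference)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    # Bin edges are derived from the reference (training) data only.
    edges = [lo + i * width for i in range(n_bins + 1)]
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def check_drift(reference, current, threshold=0.25):
    """Return the PSI and whether it exceeds the (assumed) alert threshold."""
    psi = population_stability_index(reference, current)
    return {"psi": psi, "alert": psi > threshold}


# Hypothetical feature values seen at training time and in production.
training_feature = [float(v) for v in range(100)]
live_feature = [v + 50.0 for v in training_feature]

stable_report = check_drift(training_feature, training_feature)
drift_report = check_drift(training_feature, live_feature)
```

The `alert` flag is the point where an alerting and notification step would hook in, and logging each report over time gives a simple model health history.
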
ML Workflow - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Model Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
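
The two pipelines in the architecture above can be sketched as a pair of functions that share the same cleaning step and the same stored artifact. This is a toy illustration under stated assumptions, not the architecture's actual implementation: the record fields `x` and `y`, the one-weight linear model, the JSON artifact, and all function names are hypothetical. The point it shows is that whatever preprocessing parameters the training pipeline learns must be saved and reapplied unchanged in the inference pipeline.

```python
import json


def ingest(*sources):
    """Data Ingestion: land the records from every source in one landing zone."""
    landing_zone = []
    for source in sources:
        landing_zone.extend(source)
    return landing_zone


def clean(records, required):
    """Data Cleaning / Preprocessing: drop records missing any required field."""
    return [r for r in records if all(r.get(k) is not None for k in required)]


def train_pipeline(*sources):
    """Training pipeline: ingest, clean, derive a scaled feature, fit, serialize.

    Assumes at least one cleaned record has a nonzero x.
    """
    rows = clean(ingest(*sources), required=("x", "y"))
    scale = max(abs(r["x"]) for r in rows) or 1.0
    feats = [r["x"] / scale for r in rows]  # derived feature from base feature x
    targets = [r["y"] for r in rows]
    # Least-squares fit of y = weight * feature (no intercept).
    weight = sum(f * y for f, y in zip(feats, targets)) / sum(f * f for f in feats)
    return json.dumps({"scale": scale, "weight": weight})


def inference_pipeline(artifact_json, input_records):
    """Inference pipeline: clean the input, reuse the stored scale, predict."""
    artifact = json.loads(artifact_json)
    rows = clean(input_records, required=("x",))
    return [artifact["weight"] * (r["x"] / artifact["scale"]) for r in rows]


# Hypothetical records from two collection sources; y equals 3 * x.
api_stream = [{"x": 1.0, "y": 3.0}, {"x": 2.0, "y": 6.0}]
web_crawler = [{"x": None, "y": 1.0}, {"x": 4.0, "y": 12.0}]

artifact = train_pipeline(api_stream, web_crawler)
predictions = inference_pipeline(artifact, [{"x": 2.0}, {"x": None}])
```

In a fuller setup the JSON string would be written to a model registry, and the inference side would load a specific version of it.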