Workflow Element Store

  1. Surveys and Questionnaires
  2. Data Collaboration and Partnerships
  3. Public Datasets
  4. Flat files
  5. Mobile Applications or IoT Applications
  6. Feedback Data
  7. Web Scraping
  8. Databases - SQL
  9. Databases - NoSQL
  10. APIs and Data Feeds
  11. Experiments (DoE)

  1. AWS Kinesis
  2. AWS S3
  3. GCP Data Fusion
  4. AWS Glue
  5. MongoDB
  6. GCP Dataflow
  7. Apache Kafka
  8. ETL/ELT pipeline
  9. Azure Blob Storage
  10. Azure Data Factory (ADF)
  11. Azure Synapse
  12. GCS
  13. RDBMS
  14. MS SQL Server
  15. AWS Redshift
  16. GCP BigQuery
  17. MySQL
  18. Oracle DB
  19. Azure Stream Analytics
  20. AWS RDS
  21. PostgreSQL

  1. Interaction Features
  2. Handling Missing Data
  3. Handling Imbalanced Classes
  4. Data Scaling and Normalization
  5. Data Transformations
  6. Annotation
  7. Feature Selection
  8. Auto-Preprocessing libraries
  9. Data Partitioning - Train, Validation, & Test
  10. Augmentation
  11. Polynomial Features
  12. Dimensionality Reduction
  13. Textual Feature Extraction
  14. Handling Noisy Data
  15. Feature Extraction from Images
  16. Domain-Specific Feature Engineering
  17. Time-Based Features
  18. AutoEDA libraries
  19. Dealing with Outliers
  20. Binning / Discretization
  21. Handling Time-Series Data
  22. Handling Categorical Data

  1. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  2. Batch Normalization
  3. Ensemble Techniques
  4. Multiclass Classification Techniques
  5. Blackbox - Neural Network Models
  6. Regression Analysis
  7. External Validation
  8. Weight Initialization
  9. Learning Rate Scheduling
  10. Transfer Learning
  11. Cross-Validation
  12. Model Interpretability
  13. Forecasting Techniques
  14. Word Embeddings
  15. Regularization Techniques
  16. Batch Size Selection
  17. Binary Classification Techniques
  18. Evaluation Metrics
  19. Data Augmentation
  20. Association Rules
  21. Natural Language Processing
  22. Regular Monitoring and Logging
  23. AutoML
  24. Network Analytics / Geospatial Analytics
  25. Clustering
  26. Hyperparameter Tuning
  27. Early Stopping
  28. Reinforcement Learning
  29. Performance Visualization
  30. Model Comparison
  31. Recommendation Engine

  1. Data Warehouse
  2. Code Repository
  3. Model Registry
  4. Databases
  5. Data Preprocessing Pipeline Models

  1. Data Drift Monitoring
  2. Streamlit
  3. Cloud Deployment
  4. Prediction Logging
  5. FastAPI
  6. Concept Drift Detection
  7. Flask
  8. Edge Deployment
  9. Performance Metrics
  10. Model Health Monitoring
  11. Feedback Collection
  12. Alerting and Notification
  13. Bias and Fairness Assessment
  14. Model Drift
  15. Serverless Computing
  16. Model Serialization
  17. Model Versioning
  18. Containerization

ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
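
The two pipelines above can be read as a chain of stages in which the inference pipeline reuses the cleaning step of the training pipeline. The following is a minimal sketch of that flow, not the architecture's actual implementation: the function names, the sample records, and the choice of a one-variable least-squares model are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical raw records standing in for the Data Collection sources.
RAW_SOURCES = {
    "api_stream": [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": None, "y": 5.0}],
    "web_crawler": [{"x": 3.0, "y": 6.0}, {"x": 4.0, "y": 8.0}],
}


def ingest(sources):
    """Data Ingestion: gather records from all sources into one landing zone."""
    return [row for rows in sources.values() for row in rows]


def preprocess(rows):
    """Data Cleaning / Preprocessing: drop rows with a missing base feature
    and attach a derived feature next to the base one."""
    cleaned = [r for r in rows if r.get("x") is not None]
    # x_squared only illustrates a derived feature; the model below uses x alone.
    return [{**r, "x_squared": r["x"] ** 2} for r in cleaned]


def train(rows):
    """Data Training & Modelling: fit y = slope * x + intercept by least squares."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def infer(model, input_rows):
    """Inference Pipeline: apply the same preprocessing, then predict."""
    processed = preprocess(input_rows)
    return [model["slope"] * r["x"] + model["intercept"] for r in processed]


# Training pipeline: collection -> ingestion -> preprocessing -> modelling.
model = train(preprocess(ingest(RAW_SOURCES)))

# Inference pipeline: input data -> cleaned and processed data -> inference.
predictions = infer(model, [{"x": 5.0}, {"x": None}])
print(model, predictions)
```

Routing inference input through the same `preprocess` function as the training data keeps both pipelines consistent, which is one reason to store preprocessing pipeline models alongside the trained model.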