Workflow Element Store

Data Collection
  1. Surveys and Questionnaires
  2. Databases - NoSQL
  3. Public Datasets
  4. Data Collaboration and Partnerships
  5. APIs and Data Feeds
  6. Experiments (DoE)
  7. Flat files
  8. Feedback Data
  9. Mobile Applications or IoT Applications
  10. Databases - SQL
  11. Web Scraping
Data Ingestion and Storage
  1. Apache Kafka
  2. MySQL
  3. AWS Redshift
  4. Azure Blob Storage
  5. RDBMS
  6. PostgreSQL
  7. MS SQL Server
  8. Azure Stream Analytics
  9. ETL/ELT pipeline
  10. AWS S3
  11. Azure Data Factory (ADF)
  12. Azure Synapse
  13. GCP Dataflow
  14. GCS
  15. AWS Glue
  16. GCP BigQuery
  17. AWS Kinesis
  18. AWS RDS
  19. Oracle DB
  20. GCP Data Fusion
  21. MongoDB
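Three of the elements above combine in the simplest batch ingestion job: flat files, an RDBMS such as MySQL or PostgreSQL, and an ETL/ELT pipeline. The sketch below is a minimal illustration that uses only the Python standard library. It is not a production pipeline. SQLite stands in for a real RDBMS or warehouse, and the table name `readings` and the columns `sensor_id`, `ts` and `reading` are hypothetical.

```python
import csv
import io
import sqlite3


def extract_csv(csv_text):
    """Extract: parse the text of a flat CSV file into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform_rows(rows):
    """Transform: drop rows with an empty reading and cast the reading to float."""
    records = []
    for row in rows:
        reading = row["reading"].strip()
        if reading:
            records.append((row["sensor_id"], row["ts"], float(reading)))
    return records


def load_rows(conn, records):
    """Load: append records to a SQL table (SQLite stands in for a real RDBMS)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, ts TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(csv_text, conn):
    """Run extract -> transform -> load for one CSV payload."""
    load_rows(conn, transform_rows(extract_csv(csv_text)))
```

For example, `run_etl(open("readings.csv").read(), sqlite3.connect("warehouse.db"))` would load a local CSV file. A different database would mostly change the connection object and the SQL dialect. The extract/transform/load split would stay the same.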
Data Cleaning / Preprocessing and Feature Engineering
  1. Time-Based Features
  2. Handling Noisy Data
  3. Feature Extraction from Images
  4. Data Partitioning - Train, Validation, & Test
  5. Polynomial Features
  6. Textual Feature Extraction
  7. Handling Time-Series Data
  8. Auto-Preprocessing libraries
  9. Data Transformations
  10. Feature Selection
  11. Interaction Features
  12. Handling Categorical Data
  13. AutoEDA libraries
  14. Dealing with Outliers
  15. Handling Imbalanced Classes
  16. Augmentation
  17. Annotation
  18. Dimensionality Reduction
  19. Binning / Discretization
  20. Data Scaling and Normalization
  21. Handling Missing Data
  22. Domain-Specific Feature Engineering
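Three of the preprocessing elements above interact in one important way: Data Partitioning, Handling Missing Data, and Data Scaling and Normalization. Imputation and scaling statistics should be learned from the training split only. Those same values are then reused on validation, test and inference data. Otherwise, information leaks from the held-out data.

The sketch below shows this for a single numeric column using only the standard library. The function names, the 70/15/15 default split and mean imputation are illustrative assumptions.

```python
import random
from statistics import mean


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Data partitioning: shuffle reproducibly, then cut into train/validation/test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def fit_imputer_and_scaler(train_values):
    """Learn the fill value (mean) and the min/max range from the TRAINING split only."""
    observed = [v for v in train_values if v is not None]
    return mean(observed), min(observed), max(observed)


def impute_and_scale(values, fill, lo, hi):
    """Replace None with the training mean, then min-max scale with the training range.

    Values outside the training range map outside [0, 1].
    """
    span = (hi - lo) or 1.0  # guard against a constant training column
    return [((fill if v is None else v) - lo) / span for v in values]
```

A typical call fits on the training column with `fill, lo, hi = fit_imputer_and_scaler(train_col)`. It then applies `impute_and_scale(col, fill, lo, hi)` to every split, using those same three values.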
Data Training & Modelling
  1. Cross-Validation
  2. Clustering
  3. Blackbox - Neural Network Models
  4. Model Comparison
  5. Word Embeddings
  6. Transfer Learning
  7. Regularization
  8. Binary Classification Techniques
  9. Ensemble Techniques
  10. Learning Rate Scheduling
  11. Association Rules
  12. Reinforcement Learning
  13. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  14. Natural Language Processing
  15. Performance Visualization
  16. Weight Initialization
  17. Network Analytics / Geospatial Analytics
  18. Regression Analysis
  19. Data Augmentation
  20. Forecasting Techniques
  21. Regular Monitoring and Logging
  22. Batch Size Selection
  23. Model Interpretability
  24. External Validation
  25. AutoML
  26. Recommendation Engine
  27. Hyperparameter Tuning
  28. Early Stopping
  29. Multiclass Classification Techniques
  30. Batch Normalization
  31. Regularization Techniques
  32. Evaluation Metrics
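Cross-Validation, Model Comparison and Evaluation Metrics fit together in a standard loop:

- split the data into k folds;
- train on k-1 of them and score on the held-out fold;
- average the fold scores.

The following is a minimal k-fold sketch under these assumptions:

- Folds are contiguous, so rows should be shuffled first for i.i.d. data. Time-series data usually needs forward-chaining splits instead.
- `fit` and `score` are caller-supplied callables.
- The mean-baseline model and the MAE metric are hypothetical placeholders.

In practice, scikit-learn's `KFold` and `cross_val_score` cover the same ground.

```python
from statistics import mean


def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) for k contiguous, non-overlapping validation folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold, and return the mean fold score."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return mean(scores)


# Hypothetical baseline model and metric, only to exercise cross_validate.
def fit_mean_baseline(xs, ys):
    return mean(ys)  # the "model" is simply the training mean of y


def mean_absolute_error(model, xs, ys):
    return mean(abs(y - model) for y in ys)
```

Comparing two candidate models then amounts to calling `cross_validate` once per model, with the same folds and metric, and comparing the averages.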
Model and Artifact Storage
  1. Code Repository
  2. Databases
  3. Data Preprocessing Pipeline Models
  4. Data Warehouse
  5. Model Registry
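A model registry, together with Model Versioning in the deployment list, mainly answers one question: which trained artifact is version N, and how did it score? The in-memory class below is a hypothetical illustration of that bookkeeping only. Real registries, for example the MLflow Model Registry, also persist artifacts and lifecycle metadata. This sketch does not attempt either.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    version: int
    artifact: Any
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: model name -> versions numbered 1, 2, 3, ..."""
    models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact: Any,
                 metrics: Optional[Dict[str, float]] = None) -> int:
        """Store a new version of `name` and return its version number."""
        versions = self.models.setdefault(name, [])
        entry = ModelVersion(len(versions) + 1, artifact, dict(metrics or {}))
        versions.append(entry)
        return entry.version

    def get(self, name: str, version: Optional[int] = None) -> ModelVersion:
        """Return the requested version, or the latest one when version is None."""
        versions = self.models[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]
```

For example, `registry.register("forecaster", params, {"mae": 0.5})` returns 1 for the first version. `registry.get("forecaster")` returns the latest entry.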
Deployment and Monitoring
  1. Cloud Deployment
  2. Model Health Monitoring
  3. Serverless Computing
  4. Model Drift
  5. Data Drift Monitoring
  6. Edge Deployment
  7. Prediction Logging
  8. Flask
  9. Containerization
  10. Feedback Collection
  11. Bias and Fairness Assessment
  12. Alerting and Notification
  13. Performance Metrics
  14. Concept Drift Detection
  15. FastAPI
  16. Model Versioning
  17. Streamlit
  18. Model Serialization
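Data Drift Monitoring needs a number that says how far a live feature distribution has moved from the training distribution. One widely used choice is the Population Stability Index (PSI). It is swapped in here as an example and is not taken from the source. PSI is the sum over bins of (q - p) * ln(q / p), where p and q are the reference and live bin shares. The sketch uses equal-width bins and a small epsilon for empty bins. Both are assumptions, and quantile bins are a common alternative.

```python
import math


def population_stability_index(reference, live, bins=10, eps=1e-6):
    """PSI between a reference feature sample (e.g. training data) and a live sample.

    Bins are equal-width over the reference range. Live values outside that range
    are clipped into the first or last bin. eps avoids log(0) for empty bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # a constant reference column gets width 1.0

    def bin_shares(sample):
        counts = [0] * bins
        for v in sample:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        return [max(c / len(sample), eps) for c in counts]

    p, q = bin_shares(reference), bin_shares(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly quoted rule of thumb reads PSI below about 0.1 as stable and above about 0.25 as a significant shift. Those cut-offs are conventions to tune per feature, not guarantees. The Alerting and Notification element could fire when the value crosses the chosen threshold.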
ML Workflow Intermediate - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline
  • Data Collection: Inference API, API Stream, Web crawler (Python, Selenium)
  • Data Ingestion: Data Landing Zone, which stores data from all the sources
  • Data Cleaning / Preprocessing: derived & base features
  • Data Training & Modelling
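The training pipeline stages can be traced end to end in a few functions. This is a minimal sketch that rests on several assumptions:

- Each source is already a list of dict records.
- The landing zone is a local directory of JSON Lines files, one file per source.
- Cleaning only drops rows with a missing base feature `x` or target `y`.
- The model is a one-feature least-squares line.

Derived features and real sources such as the Selenium crawler are left out. Every name here is hypothetical.

```python
import json
from pathlib import Path
from statistics import mean


def land(records, landing_dir, source_name):
    """Data Ingestion: write one source's raw records to the landing zone as JSON Lines."""
    landing_dir = Path(landing_dir)
    landing_dir.mkdir(parents=True, exist_ok=True)
    path = landing_dir / f"{source_name}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_landing_zone(landing_dir):
    """Read back every landed file, whichever source produced it."""
    rows = []
    for path in sorted(Path(landing_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows


def clean(rows):
    """Cleaning / preprocessing: keep rows where base feature 'x' and target 'y' exist."""
    return [r for r in rows if r.get("x") is not None and r.get("y") is not None]


def fit_line(xs, ys):
    """Training & modelling: ordinary least squares for y = slope * x + intercept."""
    # Assumes at least two distinct x values; otherwise sxx is 0.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return {"slope": slope, "intercept": my - slope * mx}


def run_training_pipeline(sources, landing_dir):
    """sources maps a source name (e.g. 'api_stream') to a list of raw dict records."""
    for name, records in sources.items():
        land(records, landing_dir, name)
    rows = clean(read_landing_zone(landing_dir))
    return fit_line([r["x"] for r in rows], [r["y"] for r in rows])
```

Keeping the landing zone as a separate, append-only step means the raw data from every source can be re-read if the cleaning rules change later.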

Inference Pipeline
  • Input Data for Forecasting: input data → cleaned & processed data
  • Inference: pickle / Joblib model files, served through Streamlit and an Inference API
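On the inference side, two things matter. The trained model is serialized once (the diagram lists pickle and Joblib). Incoming data then passes through the same cleaning rule used in training before prediction.

The sketch below is hypothetical. It uses the standard-library `pickle` module on a plain dict of parameters. `joblib.dump(obj, filename)` and `joblib.load(filename)` are the Joblib equivalents. A Streamlit app or an Inference API endpoint would call something like `run_inference` rather than reimplementing it.

```python
import pickle
from pathlib import Path


def save_model(params, path):
    """Model serialization: persist the trained parameters with pickle."""
    with Path(path).open("wb") as f:
        pickle.dump(params, f)


def load_model(path):
    """Load a pickled model. Only unpickle trusted files: loading can run arbitrary code."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def preprocess(raw_rows):
    """Reuse the training-time cleaning rule: skip rows whose 'x' is missing."""
    return [float(r["x"]) for r in raw_rows if r.get("x") is not None]


def predict(params, xs):
    return [params["slope"] * x + params["intercept"] for x in xs]


def run_inference(model_path, raw_rows):
    """Inference pipeline: input data -> cleaned & processed data -> predictions."""
    return predict(load_model(model_path), preprocess(raw_rows))
```

Pickling a plain dict of parameters, rather than a custom class instance, keeps the file loadable even if the training code's module layout changes. The trade-off is that the prediction logic must live in the serving code.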