Workflow Element Store

Data Collection
  1. Public Datasets
  2. Web Scraping
  3. APIs and Data Feeds
  4. Databases - SQL
  5. Feedback Data
  6. Surveys and Questionnaires
  7. Data Collaboration and Partnerships
  8. Mobile Applications or IoT Applications
  9. Flat files
  10. Databases - NoSQL
  11. Experiments (DoE)

Data Ingestion
  1. AWS Redshift
  2. PostgreSQL
  3. ETL/ELT pipeline
  4. MongoDB
  5. AWS RDS
  6. AWS S3
  7. Azure Data Factory (ADF)
  8. MS SQL Server
  9. GCP BigQuery
  10. Azure Synapse
  11. Google Cloud Storage (GCS)
  12. GCP Data Fusion
  13. GCP Dataflow
  14. MySQL
  15. Apache Kafka
  16. AWS Glue
  17. Oracle DB
  18. Azure Stream Analytics
  19. RDBMS
  20. Azure Blob Storage
  21. AWS Kinesis

Data Cleaning / Preprocessing
  1. AutoEDA libraries
  2. Dealing with Outliers
  3. Handling Noisy Data
  4. Data Transformations
  5. Polynomial Features
  6. Feature Extraction from Images
  7. Handling Missing Data
  8. Handling Imbalanced Classes
  9. Handling Categorical Data
  10. Dimensionality Reduction
  11. Time-Based Features
  12. Annotation
  13. Domain-Specific Feature Engineering
  14. Handling Time-Series Data
  15. Auto-Preprocessing libraries
  16. Data Scaling and Normalization
  17. Feature Selection
  18. Binning / Discretization
  19. Augmentation
  20. Interaction Features
  21. Textual Feature Extraction
  22. Data Partitioning - Train, Validation, & Test

Data Training & Modelling
  1. Ensemble Techniques
  2. Recommendation Engine
  3. Forecasting Techniques
  4. Clustering
  5. Regularization Techniques
  6. Regular Monitoring and Logging
  7. Blackbox - Neural Network Models
  8. Natural Language Processing
  9. External Validation
  10. Regression Analysis
  11. Multiclass Classification Techniques
  12. Batch Size Selection
  13. Performance Visualization
  14. Reinforcement Learning
  15. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  16. Transfer Learning
  17. Hyperparameter Tuning
  18. Data Augmentation
  19. Association Rules
  20. Cross-Validation
  21. AutoML
  22. Learning Rate Scheduling
  23. Early Stopping
  24. Word Embeddings
  25. Weight Initialization
  26. Binary Classification Techniques
  27. Model Interpretability
  28. Network Analytics / Geospatial Analytics
  29. Batch Normalization
  30. Model Comparison
  31. Regularization
  32. Evaluation Metrics

  1. Data Preprocessing Pipeline Models
  2. Databases
  3. Model Registry
  4. Code Repository
  5. Data Warehouse

  1. Flask
  2. Concept Drift Detection
  3. Performance Metrics
  4. Model Versioning
  5. Cloud Deployment
  6. Data Drift Monitoring
  7. Model Health Monitoring
  8. Bias and Fairness Assessment
  9. Prediction Logging
  10. Containerization
  11. Model Drift
  12. Streamlit
  13. Model Serialization
  14. Alerting and Notification
  15. Serverless Computing
  16. Feedback Collection
  17. Edge Deployment
  18. FastAPI
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline

Data Collection
  • Inference API
  • API Stream
  • Web crawler
  • Python
  • Selenium

Data Ingestion
  • Data Landing Zone
  • Store Data from all the Sources

Data Cleaning / Preprocessing
  • Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting
  • Input Data
  • Cleaned & Processed Data

Inference
  • Inference pickle
  • Inference Joblib
  • Streamlit
  • Inference API
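
The hand-off between the two pipelines can be sketched roughly as follows. This is a minimal illustration, not the architecture's actual implementation: the training pipeline cleans the data, fits a model, and serializes it with pickle, and the inference pipeline loads that artifact and applies the same cleaning rule to new input. The function names (`clean_pairs`, `clean_inputs`, `fit_line`, `training_pipeline`, `inference_pipeline`), the straight-line model, and the file name `model.pkl` are all assumptions made for the example. Only the Python standard library is used; joblib would play the same role as pickle here.

```python
import pickle
import tempfile
from pathlib import Path


def clean_pairs(rows):
    """Data Cleaning / Preprocessing: drop (x, y) rows with a missing value."""
    return [(float(x), float(y)) for x, y in rows if x is not None and y is not None]


def clean_inputs(xs):
    """Apply the same missing-value rule to inference inputs."""
    return [float(x) for x in xs if x is not None]


def fit_line(pairs):
    """Data Training & Modelling: least-squares fit of y = slope * x + intercept.

    Hypothetical stand-in for a real model; assumes at least two distinct x values.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def training_pipeline(raw_rows, model_path):
    """Clean -> train -> serialize the fitted parameters (the pickle artifact)."""
    model = fit_line(clean_pairs(raw_rows))
    Path(model_path).write_bytes(pickle.dumps(model))


def inference_pipeline(model_path, raw_inputs):
    """Load the serialized model, clean the input data, and predict."""
    model = pickle.loads(Path(model_path).read_bytes())
    return [model["slope"] * x + model["intercept"] for x in clean_inputs(raw_inputs)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"  # hypothetical artifact name
        training_pipeline([(1, 2), (2, 4), (None, 5), (3, 6)], path)
        print(inference_pipeline(path, [4, None, 5]))
```

Storing plain fitted parameters keeps the artifact small and independent of class definitions. Unpickling can execute arbitrary code, so an inference service should load only artifacts it produced or otherwise trusts.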