Workflow Element Store

  1. Flat files
  2. Web Scraping
  3. Surveys and Questionnaires
  4. APIs and Data Feeds
  5. Public Datasets
  6. Mobile Applications or IoT Applications
  7. Experiments (DoE)
  8. Databases - NoSQL
  9. Databases - SQL
  10. Feedback Data
  11. Data Collaboration and Partnerships
  1. Azure Data Factory (ADF)
  2. Azure Blob Storage
  3. AWS Kinesis
  4. Oracle DB
  5. ETL/ELT pipeline
  6. AWS S3
  7. PostgreSQL
  8. AWS Glue
  9. AWS RDS
  10. GCP BigQuery
  11. GCP Dataflow
  12. AWS Redshift
  13. RDBMS
  14. MongoDB
  15. Azure Synapse
  16. MS SQL Server
  17. GCS
  18. MySQL
  19. GCP Data Fusion
  20. Azure Stream Analytics
  21. Apache Kafka
  1. Augmentation
  2. Dimensionality Reduction
  3. Auto-Preprocessing Libraries
  4. Feature Extraction from Images
  5. Interaction Features
  6. Handling Noisy Data
  7. Annotation
  8. Handling Missing Data
  9. Domain-Specific Feature Engineering
  10. Data Partitioning - Train, Validation, & Test
  11. Dealing with Outliers
  12. Data Transformations
  13. Textual Feature Extraction
  14. Feature Selection
  15. Handling Time-Series Data
  16. AutoEDA Libraries
  17. Time-Based Features
  18. Handling Categorical Data
  19. Binning / Discretization
  20. Handling Imbalanced Classes
  21. Polynomial Features
  22. Data Scaling and Normalization
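
Two of the elements above, data partitioning and data scaling, interact: scaling parameters are normally learned on the training split only and then reused on the other splits. The following is a minimal standard-library sketch of that idea. The function names (`split_train_val_test`, `fit_standard_scaler`, `apply_standard_scaler`) and the 60/20/20 split are illustrative assumptions, not part of the workflow itself.

```python
import random
from statistics import mean, pstdev


def split_train_val_test(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and split it into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def fit_standard_scaler(values):
    """Learn the mean and standard deviation from the training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant column
    return mu, sigma


def apply_standard_scaler(values, mu, sigma):
    """Scale values with parameters learned elsewhere (no refitting)."""
    return [(v - mu) / sigma for v in values]


# Hypothetical single numeric feature
data = [float(x) for x in range(100)]
train, val, test = split_train_val_test(data)

mu, sigma = fit_standard_scaler(train)               # fit on train only
train_scaled = apply_standard_scaler(train, mu, sigma)
val_scaled = apply_standard_scaler(val, mu, sigma)   # reuse the train parameters
test_scaled = apply_standard_scaler(test, mu, sigma)
```

Fitting the scaler on the training split alone keeps information from the validation and test splits out of the model inputs.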
  1. Cross-Validation
  2. Binary Classification Techniques
  3. Weight Initialization
  4. Regression Analysis
  5. Natural Language Processing
  6. Evaluation Metrics
  7. External Validation
  8. AutoML
  9. Performance Visualization
  10. Transfer Learning
  11. Model Comparison
  12. Batch Normalization
  13. Recommendation Engine
  14. Network Analytics / Geospatial Analytics
  15. Regular Monitoring and Logging
  16. Black Box - Neural Network Models
  17. Forecasting Techniques
  18. Data Augmentation
  19. Word Embeddings
  20. Reinforcement Learning
  21. Clustering
  22. Association Rules
  23. Hyperparameter Tuning
  24. Batch Size Selection
  25. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  26. Early Stopping
  27. Multiclass Classification Techniques
  28. Ensemble Techniques
  29. Learning Rate Scheduling
  30. Regularization Techniques
  31. Model Interpretability
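
Cross-validation, hyperparameter tuning, and regularization from the list above are usually combined: each candidate setting is scored on held-out folds and the best average score wins. The sketch below hand-rolls that loop with the standard library to show roughly what a grid search with cross-validation automates. The one-parameter ridge model, the synthetic data, and all function names are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def fit_ridge_slope(xs, ys, alpha):
    """Closed-form ridge fit of y = w * x (no intercept); alpha is the penalty."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search_cv(xs, ys, alphas, k=5):
    """Return (best_alpha, scores); scores maps alpha to mean validation MSE."""
    scores = {}
    for alpha in alphas:
        fold_errors = []
        for train_idx, val_idx in k_fold_indices(len(xs), k):
            w = fit_ridge_slope([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx], alpha)
            fold_errors.append(mse([xs[i] for i in val_idx],
                                   [ys[i] for i in val_idx], w))
        scores[alpha] = sum(fold_errors) / k
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores


# Synthetic data (assumption): y = 3x plus Gaussian noise
rng = random.Random(42)
xs = [i / 10 for i in range(1, 51)]
ys = [3.0 * x + rng.gauss(0, 0.5) for x in xs]

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_alpha, scores = grid_search_cv(xs, ys, alphas)
```

Every candidate is evaluated on the same folds because `k_fold_indices` uses a fixed seed, which keeps the comparison between settings fair.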
  1. Data Warehouse
  2. Code Repository
  3. Model Registry
  4. Data Preprocessing Pipeline Models
  5. Databases
  1. Feedback Collection
  2. Streamlit
  3. Model Health Monitoring
  4. Containerization
  5. Prediction Logging
  6. Data Drift Monitoring
  7. Flask
  8. Alerting and Notification
  9. Concept Drift Detection
  10. Model Drift
  11. Serverless Computing
  12. FastAPI
  13. Bias and Fairness Assessment
  14. Edge Deployment
  15. Performance Metrics
  16. Model Versioning
  17. Cloud Deployment
  18. Model Serialization
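
For the data drift monitoring element above, one commonly used statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. The sketch below is one possible standard-library implementation. The equal-width binning, the `1e-6` floor for empty bins, and the function name are assumptions, and the often-quoted cut-offs of about 0.1 and 0.25 are rules of thumb rather than fixed standards.

```python
import math
import random


def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the reference range into the edge bins
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty
        return [max(c / len(values), 1e-6) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Synthetic samples (assumption): a reference set, a fresh draw from the
# same distribution, and a draw whose mean has shifted by one unit
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

psi_same = population_stability_index(reference, same_dist)
psi_shifted = population_stability_index(reference, shifted)
```

A monitoring job could compute this per feature on a schedule and feed the result into the alerting and notification element when it exceeds a chosen threshold.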
ML Workflow Intermediate - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
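
The two pipelines above share one constraint: the inference pipeline has to clean and process its input exactly as the training pipeline did. A minimal sketch of that hand-off follows, assuming the preprocessing parameters and the model are stored together as one JSON artifact. The function names, the single-feature linear model, and the toy data are illustrative assumptions, not the architecture's actual components.

```python
import json
from statistics import mean, pstdev


def clean(x, params):
    """Shared preprocessing step: scale with the stored training statistics."""
    return (x - params["mu"]) / params["sigma"]


def train(xs, ys):
    """Training pipeline: fit preprocessing, then fit a one-feature linear model."""
    params = {"mu": mean(xs), "sigma": pstdev(xs) or 1.0}
    zs = [clean(x, params) for x in xs]
    z_bar, y_bar = mean(zs), mean(ys)
    var = sum((z - z_bar) ** 2 for z in zs)
    cov = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    params["slope"] = cov / var if var else 0.0
    params["intercept"] = y_bar - params["slope"] * z_bar
    return params


def predict(xs, params):
    """Inference pipeline: same cleaning step, then the stored model."""
    return [params["intercept"] + params["slope"] * clean(x, params) for x in xs]


# Training pipeline: fit and serialise everything the inference side needs
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [3.0, 5.0, 7.0, 9.0, 11.0]  # toy relationship: y = 2x + 1
artifact = json.dumps(train(train_x, train_y))

# Inference pipeline: load the artifact and score new input data
loaded = json.loads(artifact)
predictions = predict([6.0, 7.0], loaded)
```

Storing the scaling statistics next to the model coefficients means the inference side never recomputes them from incoming data, which is one way to avoid a mismatch between training and serving.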