Workflow Element Store

  1. Databases - NoSQL
  2. APIs and Data Feeds
  3. Surveys and Questionnaires
  4. Public Datasets
  5. Mobile Applications or IoT Applications
  6. Flat files
  7. Experiments (DoE)
  8. Databases - SQL
  9. Data Collaboration and Partnerships
  10. Web Scraping
  11. Feedback Data
  1. Google Cloud Storage (GCS)
  2. GCP Data Fusion
  3. AWS Kinesis
  4. MS SQL Server
  5. AWS Glue
  6. AWS RDS
  7. GCP Dataflow
  8. Azure Stream Analytics
  9. Oracle DB
  10. GCP BigQuery
  11. RDBMS
  12. PostgreSQL
  13. Apache Kafka
  14. AWS S3
  15. MySQL
  16. Azure Synapse
  17. AWS Redshift
  18. Azure Data Factory (ADF)
  19. Azure Blob Storage
  20. ETL/ELT pipeline
  21. MongoDB
  1. Binning / Discretization
  2. Handling Missing Data
  3. Dealing with Outliers
  4. Auto-Preprocessing libraries
  5. Handling Time-Series Data
  6. AutoEDA libraries
  7. Data Scaling and Normalization
  8. Data Partitioning - Train, Validation, & Test
  9. Annotation
  10. Data Transformations
  11. Interaction Features
  12. Handling Noisy Data
  13. Dimensionality Reduction
  14. Domain-Specific Feature Engineering
  15. Feature Extraction from Images
  16. Handling Categorical Data
  17. Augmentation
  18. Handling Imbalanced Classes
  19. Feature Selection
  20. Textual Feature Extraction
  21. Time-Based Features
  22. Polynomial Features
  1. Ensemble Techniques
  2. Model Comparison
  3. Regular Monitoring and Logging
  4. Data Augmentation
  5. Batch Normalization
  6. Cross-Validation
  7. Black-box - Neural Network Models
  8. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  9. Regularization
  10. Weight Initialization
  11. Model Interpretability
  12. Transfer Learning
  13. Early Stopping
  14. Recommendation Engine
  15. Hyperparameter Tuning
  16. Learning Rate Scheduling
  17. Batch Size Selection
  18. Performance Visualization
  19. Word Embeddings
  20. Cross-Validation
  21. Binary Classification Techniques
  22. Association Rules
  23. Forecasting Techniques
  24. Regression Analysis
  25. Evaluation Metrics
  26. Transfer Learning
  27. Reinforcement Learning
  28. External Validation
  29. Clustering
  30. AutoML
  31. Network Analytics / Geospatial Analytics
  32. Multiclass Classification Techniques
  33. Natural Language Processing
  34. Regularization Techniques
  1. Model Registry
  2. Code Repository
  3. Databases
  4. Data Preprocessing pipeline models
  5. Data Warehouse
  1. Containerization
  2. Performance Metrics
  3. Alerting and Notification
  4. Streamlit
  5. Flask
  6. Model Serialization
  7. Serverless Computing
  8. Model Health Monitoring
  9. Concept Drift Detection
  10. Edge Deployment
  11. Prediction Logging
  12. Model Versioning
  13. Cloud Deployment
  14. Data Drift Monitoring
  15. FastAPI
  16. Feedback Collection
  17. Model Drift
  18. Bias and Fairness Assessment
ML Workflow - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
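
The two pipelines above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the architecture's actual implementation: the function names, the `value` field, and the mean-based placeholder model are all hypothetical. The point it tries to show is that the training and inference pipelines reuse the same cleaning and feature steps.

```python
from statistics import mean


def clean(rows):
    """Data Cleaning / Preprocessing: drop records with a missing 'value' (hypothetical field)."""
    return [r for r in rows if r.get("value") is not None]


def build_features(rows):
    """Derived & Base features: keep the base value and add one derived feature."""
    return [{"value": r["value"], "value_squared": r["value"] ** 2} for r in rows]


def preprocess(rows):
    """Shared by both pipelines so training and inference see identically prepared data."""
    return build_features(clean(rows))


def train(features):
    """Data Training & Modelling: placeholder model that only remembers the mean value."""
    return {"mean": mean(f["value"] for f in features)}


def training_pipeline(sources):
    """Data Collection -> Data Ingestion -> Data Landing Zone -> Preprocessing -> Modelling."""
    # Data Landing Zone: store data from all the sources in one place.
    landing_zone = [row for source in sources for row in source]
    return train(preprocess(landing_zone))


def inference_pipeline(model, input_rows):
    """Input Data -> Cleaned & Processed Data -> Inference."""
    features = preprocess(input_rows)
    # Placeholder inference: predict the training mean for every valid input row.
    return [model["mean"] for _ in features]
```

A caller would pass one list of records per source (for example one from an API stream and one from a web crawler) to `training_pipeline`, then hand the returned model and new input records to `inference_pipeline`.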