Workflow Element Store

  1. Public Datasets
  2. Mobile Applications or IoT Applications
  3. Flat files
  4. Feedback Data
  5. Surveys and Questionnaires
  6. Web Scraping
  7. APIs and Data Feeds
  8. Databases - NoSQL
  9. Experiments (DoE)
  10. Data Collaboration and Partnerships
  11. Databases - SQL
  1. AWS RDS
  2. MS SQL server
  3. Azure Data Factory (ADF)
  4. ETL/ELT pipeline
  5. GCP BigQuery
  6. Azure Stream Analytics
  7. PostgreSQL
  8. AWS S3
  9. GCP Dataflow
  10. MongoDB
  11. Google Cloud Storage (GCS)
  12. MySQL
  13. AWS Kinesis
  14. GCP Data Fusion
  15. Oracle DB
  16. AWS Redshift
  17. Azure Blob Storage
  18. AWS Glue
  19. Apache Kafka
  20. Azure Synapse
  21. RDBMS
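The ingestion tools above share one pattern: an ETL/ELT pipeline that extracts records from a source, transforms them, and loads them into a store. The following is a minimal sketch of that pattern under stated assumptions. A CSV string stands in for a flat file, and Python's built-in `sqlite3` stands in for the relational databases listed (PostgreSQL, MySQL, Oracle DB, and so on). The `sales` table and its columns are hypothetical.

```python
import csv
import io
import sqlite3

def extract_rows(csv_text):
    """Extract: read rows from a flat file (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform_rows(rows):
    """Transform: trim text, cast amounts to float, and drop rows with no amount."""
    records = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if amount:
            records.append((row["customer"].strip(), float(amount)))
    return records

def load_rows(conn, records):
    """Load: write the cleaned records into a relational table (hypothetical `sales` schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (customer, amount) VALUES (?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real RDBMS
    raw = "customer,amount\nalice,10.5\nbob,\ncarol,4\n"
    load_rows(conn, transform_rows(extract_rows(raw)))
    print(conn.execute("SELECT customer, amount FROM sales").fetchall())
```

In an ELT variant, the raw rows would be loaded first and the transformation would run inside the warehouse, for example as SQL in BigQuery or Redshift.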
  1. Feature Selection
  2. Dimensionality Reduction
  3. Textual Feature Extraction
  4. Polynomial Features
  5. Binning / Discretization
  6. Handling Categorical Data
  7. Interaction Features
  8. Feature Extraction from Images
  9. Domain-Specific Feature Engineering
  10. Annotation
  11. Data Scaling and Normalization
  12. Auto-Preprocessing libraries
  13. Time-Based Features
  14. Dealing with Outliers
  15. AutoEDA libraries
  16. Handling Time-Series Data
  17. Handling Missing Data
  18. Augmentation
  19. Handling Imbalanced Classes
  20. Handling Noisy Data
  21. Data Transformations
  22. Data Partitioning - Train, Validation, & Test
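Three of the preprocessing elements above usually interact: handling missing data, scaling, and train/validation/test partitioning. The sketch below is a minimal single-column example of how they combine. It uses mean imputation and min-max scaling, which are chosen here as illustrations rather than as the only options. The key design point is that the imputation and scaling statistics are learned from the training partition only, so that no information leaks from the validation or test data. The function names are hypothetical.

```python
import random
from statistics import mean

def split_train_val_test(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of the rows, then cut it into train, validation, and test partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

def fit_preprocessor(train):
    """Learn imputation (mean) and scaling (min/max) statistics from the training partition only."""
    observed = [x for x in train if x is not None]
    return mean(observed), min(observed), max(observed)

def impute_and_scale(values, fill, lo, hi):
    """Fill missing values with the training mean, then min-max scale with the training range."""
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    return [((fill if x is None else x) - lo) / span for x in values]

if __name__ == "__main__":
    data = [3.0, None, 7.0, 1.0, 9.0, None, 5.0, 2.0, 8.0, 4.0]
    train, val, test = split_train_val_test(data, val_frac=0.2, test_frac=0.2)
    fill, lo, hi = fit_preprocessor(train)
    print(impute_and_scale(train, fill, lo, hi))  # training values fall in [0, 1]
    print(impute_and_scale(test, fill, lo, hi))   # unseen values may fall outside [0, 1]
```

For time-series data, a random shuffle is usually replaced by a chronological split, so that the model is never validated on data older than the data it was trained on.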
  1. Forecasting Techniques
  2. Binary Classification Techniques
  3. Early Stopping
  4. Weight Initialization
  5. Transfer Learning
  6. Model Interpretability
  7. Data Augmentation
  8. Performance Visualization
  9. Batch Normalization
  10. Natural Language Processing
  11. Association Rules
  12. Multiclass Classification Techniques
  13. Regularization
  14. Black-Box Neural Network Models
  15. Recommendation Engine
  16. Word Embeddings
  17. Batch Size Selection
  18. Regression Analysis
  19. Cross-Validation
  20. Model Comparison
  21. Network Analytics / Geospatial Analytics
  22. Regular Monitoring and Logging
  23. Ensemble Techniques
  24. Evaluation Metrics
  25. Clustering
  26. AutoML
  27. Reinforcement Learning
  28. Hyperparameter Tuning
  29. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  30. Regularization Techniques
  31. Learning Rate Scheduling
  32. External Validation
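Several modelling elements above fit together in practice: cross-validation, regularization, hyperparameter tuning, and model comparison. The following pure-Python sketch is what a grid search with k-fold cross-validation (the idea behind scikit-learn's `GridSearchCV`) does internally. It uses a hypothetical one-feature ridge regression through the origin, with the closed form w = Σxy / (Σx² + λ), and tunes the regularization strength λ. All function names are illustrative assumptions, not a library API.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds covering n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge regression through the origin: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_score(xs, ys, lam, k=5):
    """Mean squared error on the held-out fold, averaged over all k folds."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        fold_errors.append(mean((ys[i] - w * xs[i]) ** 2 for i in val_idx))
    return mean(fold_errors)

def grid_search(xs, ys, grid, k=5):
    """Score every candidate lambda and return the best one along with all scores."""
    scores = {lam: cv_score(xs, ys, lam, k) for lam in grid}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
    best, scores = grid_search(xs, ys, grid=[0.0, 0.1, 1.0, 10.0], k=4)
    print(best, scores)
```

The folds here are contiguous, so the data should be shuffled beforehand unless it is ordered in time. `RandomizedSearchCV` and `BayesSearchCV` replace the exhaustive loop over `grid` with random or model-guided sampling of candidates.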
  1. Databases
  2. Data Preprocessing Pipeline Models
  3. Data Warehouse
  4. Model Registry
  5. Code Repository
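A model registry stores trained artifacts under a name and an increasing version, so that deployment can load either the latest model or a pinned one. The fitted preprocessing pipeline can be registered in the same way. The class below is a hypothetical file-based sketch that uses only the standard library (`pickle` for serialization and JSON for metadata). It is not the API of any real registry product such as MLflow.

```python
import json
import pickle
import tempfile
from pathlib import Path

class FileModelRegistry:
    """Hypothetical minimal registry: <root>/<name>/<version>.pkl plus <version>.json metadata."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def versions(self, name):
        """Return the registered version numbers for a model name, in ascending order."""
        model_dir = self.root / name
        if not model_dir.exists():
            return []
        return sorted(int(p.stem) for p in model_dir.glob("*.pkl"))

    def register(self, name, model, metadata=None):
        """Serialize the model as the next version and return that version number."""
        existing = self.versions(name)
        version = existing[-1] + 1 if existing else 1
        model_dir = self.root / name
        model_dir.mkdir(exist_ok=True)
        with open(model_dir / f"{version}.pkl", "wb") as f:
            pickle.dump(model, f)
        (model_dir / f"{version}.json").write_text(json.dumps(metadata or {}))
        return version

    def load(self, name, version=None):
        """Load a pinned version, or the latest one if no version is given."""
        if version is None:
            version = self.versions(name)[-1]  # raises IndexError if nothing is registered
        # Only unpickle files you trust: pickle can execute arbitrary code on load.
        with open(self.root / name / f"{version}.pkl", "rb") as f:
            return pickle.load(f)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        registry = FileModelRegistry(d)
        registry.register("forecaster", {"slope": 1.0}, {"mse": 0.4})
        registry.register("forecaster", {"slope": 1.5}, {"mse": 0.2})
        print(registry.versions("forecaster"), registry.load("forecaster"))
```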
  1. Cloud Deployment
  2. Model Versioning
  3. Data Drift Monitoring
  4. Model Serialization
  5. Model Health Monitoring
  6. Edge Deployment
  7. Bias and Fairness Assessment
  8. Prediction Logging
  9. Streamlit
  10. Feedback Collection
  11. Alerting and Notification
  12. Containerization
  13. Model Drift
  14. FastAPI
  15. Performance Metrics
  16. Concept Drift Detection
  17. Serverless Computing
  18. Flask
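Data drift monitoring, alerting, and notification are often wired together. A drift statistic compares a feature's live distribution with its training distribution, and an alert fires when that statistic crosses a threshold. The sketch below uses the Population Stability Index (PSI), which is one common drift statistic and is swapped in here as an example. The list above does not name a specific method. The 0.25 threshold is a widely quoted rule of thumb, not a standard, and the function names are hypothetical.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the reference sample `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # equal-width bins over the reference range

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range live values into edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def check_drift(expected, actual, threshold=0.25):
    """Return the PSI score and whether it should trigger an alert (assumed threshold)."""
    score = psi(expected, actual)
    return score, score >= threshold

if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]   # a feature as seen at training time
    live = [0.5 + i / 200 for i in range(100)]  # the same feature in production
    score, alert = check_drift(reference, live)
    print(f"PSI={score:.3f}, alert={alert}")
```

The same comparison applied to the model's prediction distribution, or to its error rate once labels arrive through feedback collection, is one way to approach model drift and concept drift detection.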
ML Workflow - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline
  1. Data Collection: API Stream, Web crawler, Selenium
  2. Data Ingestion
  3. Data Landing Zone: store data from all the sources
  4. Data Cleaning / Preprocessing
  5. Derived & Base features
  6. Data Training & Modelling

Inference Pipeline: Input Data for Forecasting
  1. Input Data
  2. Cleaned & Processed Data
  3. Inference
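The training and inference pipelines above share one architectural constraint: the cleaning and feature steps applied to training data must be applied in exactly the same way to the input data at inference time. The sketch below walks through both pipelines end to end under heavy assumptions. The source records are hypothetical in-memory stand-ins for an API stream and a web crawler, and the landing zone is a plain list. A lag-1 linear forecaster, fit by ordinary least squares, stands in for whatever model the "Data Training & Modelling" stage actually uses.

```python
from dataclasses import dataclass

# Hypothetical raw records, as they might arrive from an API stream and a web crawler.
API_RECORDS = [{"day": 1, "value": 10.0}, {"day": 2, "value": 12.0}, {"day": 3, "value": None}]
CRAWLER_RECORDS = [{"day": 3, "value": 14.0}, {"day": 4, "value": 16.0}, {"day": 2, "value": 12.0}]

def ingest(*sources):
    """Data ingestion: land the records from every source in one raw store."""
    landing_zone = []
    for source in sources:
        landing_zone.extend(source)
    return landing_zone

def clean(records):
    """Preprocessing shared by both pipelines: drop missing values, de-duplicate by day, sort."""
    by_day = {}
    for r in records:
        if r["value"] is not None:
            by_day.setdefault(r["day"], r["value"])  # keep the first valid value per day
    return [by_day[d] for d in sorted(by_day)]

def lag_features(series):
    """Derived feature: pair each value (target) with the previous value (lag-1 input)."""
    return [(series[i - 1], series[i]) for i in range(1, len(series))]

@dataclass
class LagModel:
    intercept: float
    slope: float

    def predict(self, last_value):
        return self.intercept + self.slope * last_value

def train(series):
    """Training: OLS fit of value_t on value_{t-1}; needs at least two distinct lagged inputs."""
    pairs = lag_features(series)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LagModel(intercept=my - slope * mx, slope=slope)

def infer(model, input_records):
    """Inference: reuse the same cleaning step, then forecast the next value."""
    series = clean(input_records)
    return model.predict(series[-1])

if __name__ == "__main__":
    model = train(clean(ingest(API_RECORDS, CRAWLER_RECORDS)))
    print(model, infer(model, [{"day": 5, "value": 18.0}]))
```

Because `clean` is a single function called by both `train` (through its input) and `infer`, a change to preprocessing cannot silently diverge between the two pipelines. In a larger system, the same guarantee usually comes from serializing the fitted preprocessing pipeline and loading it alongside the model.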