Workflow Element Store

  1. Flat files
  2. Mobile Applications or IoT Applications
  3. Web Scraping
  4. APIs and Data Feeds
  5. Feedback Data
  6. Databases - SQL
  7. Public Datasets
  8. Data Collaboration and Partnerships
  9. Experiments (DoE)
  10. Surveys and Questionnaires
  11. Databases - NoSQL
  1. GCP BigQuery
  2. ETL/ELT pipeline
  3. AWS Kinesis
  4. Azure Stream Analytics
  5. GCP Dataflow
  6. Azure Blob Storage
  7. MongoDB
  8. MS SQL Server
  9. AWS Redshift
  10. Azure Data Factory (ADF)
  11. Google Cloud Storage (GCS)
  12. Apache Kafka
  13. Azure Synapse
  14. GCP Data Fusion
  15. AWS S3
  16. Oracle DB
  17. AWS RDS
  18. PostgreSQL
  19. AWS Glue
  20. RDBMS
  21. MySQL
  1. Handling Missing Data
  2. Data Scaling and Normalization
  3. Data Transformations
  4. Interaction Features
  5. Feature Selection
  6. Binning / Discretization
  7. Handling Time-Series Data
  8. Polynomial Features
  9. Augmentation
  10. Handling Noisy Data
  11. Annotation
  12. Time-Based Features
  13. Textual Feature Extraction
  14. Domain-Specific Feature Engineering
  15. Dealing with Outliers
  16. Handling Imbalanced Classes
  17. Feature Extraction from Images
  18. Handling Categorical Data
  19. AutoEDA Libraries
  20. Auto-Preprocessing Libraries
  21. Dimensionality Reduction
  22. Data Partitioning - Train, Validation, & Test
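The last item above, partitioning data into train, validation, and test sets, can be sketched in plain Python. This is a minimal illustration, not the workflow's own implementation: the function name `partition`, the 70/15/15 ratios, and the fixed seed are all assumptions chosen for the example.

```python
import random

def partition(rows, train=0.7, val=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    The ratios and the seed are illustrative assumptions.
    """
    if not 0 < train < 1 or not 0 <= val < 1 or train + val >= 1:
        raise ValueError("train and val must leave a non-empty test share")
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded for reproducible splits
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

train_set, val_set, test_set = partition(range(100))
print(len(train_set), len(val_set), len(test_set))
```

For time-ordered data, such as inputs to a forecasting model, a chronological split without shuffling is usually the safer choice, because shuffling lets future observations leak into the training set.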
  1. Natural Language Processing
  2. Black-Box - Neural Network Models
  3. Forecasting Techniques
  4. AutoML
  5. Model Interpretability
  6. Cross-Validation
  7. Transfer Learning
  8. Regularization
  9. Regular Monitoring and Logging
  10. Weight Initialization
  11. Data Augmentation
  12. Ensemble Techniques
  13. Network Analytics / Geospatial Analytics
  14. Batch Size Selection
  15. Association Rules
  16. Evaluation Metrics
  17. Binary Classification Techniques
  18. Model Comparison
  19. Clustering
  20. Reinforcement Learning
  21. External Validation
  22. Hyperparameter Tuning
  23. Performance Visualization
  24. Regression Analysis
  25. Recommendation Engine
  26. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  27. Word Embeddings
  28. Learning Rate Scheduling
  29. Multiclass Classification Techniques
  30. Early Stopping
  31. Batch Normalization
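Cross-validation, listed among the modelling elements above, can be illustrated with a small index generator. This is a sketch under stated assumptions: `k_fold_indices` is a hypothetical helper written for this example, it uses contiguous folds without shuffling, and it is not the API of any particular library.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds.

    Hypothetical helper for illustration; folds differ in size by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes stay balanced.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

Each sample lands in exactly one validation fold, so averaging a metric over the folds uses every observation for validation once.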
  1. Code Repository
  2. Databases
  3. Data Warehouse
  4. Model Registry
  5. Data Preprocessing pipeline models
  1. Flask
  2. Edge Deployment
  3. Model Serialization
  4. Performance Metrics
  5. Model Health Monitoring
  6. Model Versioning
  7. Model Drift
  8. Feedback Collection
  9. Bias and Fairness Assessment
  10. Streamlit
  11. Serverless Computing
  12. Prediction Logging
  13. Containerization
  14. Data Drift Monitoring
  15. Alerting and Notification
  16. Concept Drift Detection
  17. FastAPI
  18. Cloud Deployment
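Data drift monitoring and alerting, both listed above, are often approached by comparing the distribution of a live feature against a reference sample. The sketch below uses the Population Stability Index (PSI) as one possible drift measure. The function name `psi`, the equal-width binning, the bin count, and the `ALERT_THRESHOLD` value are assumptions made for this example, not choices stated in the workflow.

```python
import math

ALERT_THRESHOLD = 0.2  # hypothetical cut-off; tune per feature and use case

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Both samples must be non-empty. Bins are equal-width over the
    reference range.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # avoid zero width when all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(1000)]   # training-time feature values
shifted = [v + 3 for v in reference]         # live values with a mean shift
score = psi(reference, shifted)
if score > ALERT_THRESHOLD:
    print(f"drift alert: PSI={score:.3f}")
```

Identical samples give a PSI of zero, and the score grows as the live distribution moves away from the reference, which makes it convenient to wire into an alerting rule.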
ML Workflow - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline

Data Collection

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline

Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference
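The two pipelines in the diagram can be sketched as a chain of small functions, with the inference pipeline reusing the same cleaning step as the training pipeline. Everything here is a toy stand-in under stated assumptions: the function names, the sample values, and the mean-based "model" are hypothetical and only show how the stages connect.

```python
from statistics import mean

def collect():
    """Data Collection: stand-in for API streams, web crawlers, or Selenium."""
    return [[10.0, None, 12.0], [11.0, 13.0]]   # one batch per source (made-up values)

def ingest(batches):
    """Data Ingestion: store data from all sources in one landing zone."""
    return [value for batch in batches for value in batch]

def clean(values):
    """Data Cleaning / Preprocessing: drop missing values."""
    return [v for v in values if v is not None]

def train(values):
    """Data Training & Modelling: a naive model that remembers the mean."""
    return {"mean": mean(values)}

def infer(model, new_values):
    """Inference: apply the same cleaning step, then predict per input."""
    cleaned = clean(new_values)
    return [model["mean"] for _ in cleaned]

# Training pipeline: collection -> ingestion -> cleaning -> modelling
model = train(clean(ingest(collect())))

# Inference pipeline: input data -> cleaned and processed data -> inference
predictions = infer(model, [9.0, None, 14.0])
print(model, predictions)
```

Sharing `clean` between both paths is the design point worth keeping: if training and inference preprocess data differently, the model sees inputs at serving time that it never saw during training.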