Workflow Element Store

  1. Web Scraping
  2. Data Collaboration and Partnerships
  3. Databases - NoSQL
  4. Flat files
  5. Surveys and Questionnaires
  6. Feedback Data
  7. Public Datasets
  8. Databases - SQL
  9. Mobile Applications or IoT Applications
  10. APIs and Data Feeds
  11. Experiments (DoE)
  1. AWS Redshift
  2. Apache Kafka
  3. RDBMS
  4. Google Cloud Storage (GCS)
  5. Azure Stream Analytics
  6. Oracle DB
  7. Azure Blob Storage
  8. MS SQL Server
  9. GCP BigQuery
  10. Azure Synapse
  11. AWS S3
  12. PostgreSQL
  13. AWS Glue
  14. AWS Kinesis
  15. MySQL
  16. AWS RDS
  17. Azure Data Factory (ADF)
  18. MongoDB
  19. GCP Data Fusion
  20. GCP Dataflow
  21. ETL/ELT pipeline
  1. Auto-Preprocessing libraries
  2. Handling Missing Data
  3. Handling Time-Series Data
  4. Annotation
  5. Data Partitioning - Train, Validation, & Test
  6. Dimensionality Reduction
  7. Time-Based Features
  8. Handling Noisy Data
  9. Domain-Specific Feature Engineering
  10. Feature Extraction from Images
  11. Data Scaling and Normalization
  12. Polynomial Features
  13. Handling Categorical Data
  14. Dealing with Outliers
  15. Textual Feature Extraction
  16. Interaction Features
  17. Data Transformations
  18. Binning / Discretization
  19. Handling Imbalanced Classes
  20. Augmentation
  21. Feature Selection
  22. AutoEDA libraries
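Two of the elements above, data partitioning and data scaling, interact in a way that is easy to get wrong: scaling bounds should be learned from the training split only and then reused for the other splits. The following is a minimal standard-library sketch of that idea. The function names (`train_val_test_split`, `fit_min_max`, `apply_min_max`) and the 60/20/20 split are illustrative assumptions, not part of the element store.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows once, then cut them into train / validation / test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test


def fit_min_max(values):
    """Learn scaling bounds from the training values only."""
    return min(values), max(values)


def apply_min_max(values, bounds):
    """Scale values with previously learned bounds (may fall outside [0, 1])."""
    lo, hi = bounds
    span = (hi - lo) or 1.0  # avoid division by zero for constant features
    return [(v - lo) / span for v in values]


# Hypothetical toy data: a single numeric feature.
data = [float(x) for x in range(10)]
train, val, test = train_val_test_split(data)
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
val_scaled = apply_min_max(val, bounds)    # reuses the training bounds
test_scaled = apply_min_max(test, bounds)  # reuses the training bounds
```

Validation and test values can land outside [0, 1] because the bounds come from the training split. That is expected, since refitting the bounds on those splits would leak information into evaluation.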
  1. Cross-Validation
  2. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  3. Reinforcement Learning
  4. Forecasting Techniques
  5. Blackbox - Neural Network Models
  6. External Validation
  7. Performance Visualization
  8. Regularization
  9. Evaluation Metrics
  10. Recommendation Engine
  11. Weight Initialization
  12. Transfer Learning
  13. Early Stopping
  14. Learning Rate Scheduling
  15. Regular Monitoring and Logging
  16. Ensemble Techniques
  17. Batch Size Selection
  18. Model Interpretability
  19. Binary Classification Techniques
  20. Regression Analysis
  21. Natural Language Processing
  22. Hyperparameter Tuning
  23. Batch Normalization
  24. Clustering
  25. Multiclass Classification Techniques
  26. Data Augmentation
  27. Regularization Techniques
  28. Word Embeddings
  29. Model Comparison
  30. Network Analytics / Geospatial Analytics
  31. AutoML
  32. Association Rules
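Cross-validation, regularization, and hyperparameter tuning from the list above are often used together: each candidate hyperparameter value is scored by k-fold cross-validation and the best-scoring value is kept. The sketch below shows that loop with the standard library only. The one-feature ridge model, the helper names, and the toy data are assumptions made for illustration. In practice a library class such as scikit-learn's `GridSearchCV` would normally do this job.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def cv_mse(xs, ys, alpha, k=5):
    """Mean validation MSE of the ridge fit across k folds."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train_idx],
                         [ys[i] for i in train_idx], alpha)
        errs = [(ys[i] - w * xs[i]) ** 2 for i in val_idx]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, alphas, k=5):
    """Score every alpha by cross-validation and return the best one."""
    scores = {alpha: cv_mse(xs, ys, alpha, k) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical noiseless toy data, so the unregularized fit is exact.
xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]
best_alpha, scores = grid_search(xs, ys, alphas=[0.0, 1.0, 10.0])
```

On noiseless data the smallest penalty wins. With noisy or scarce data a non-zero `alpha` can score better, which is the reason to search over it at all.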
  1. Model Registry
  2. Data Preprocessing Pipeline Models
  3. Code Repository
  4. Databases
  5. Data Warehouse
  1. Prediction Logging
  2. Model Drift
  3. Cloud Deployment
  4. Feedback Collection
  5. Serverless Computing
  6. Performance Metrics
  7. Edge Deployment
  8. FastAPI
  9. Containerization
  10. Bias and Fairness Assessment
  11. Alerting and Notification
  12. Model Versioning
  13. Data Drift Monitoring
  14. Model Serialization
  15. Model Health Monitoring
  16. Flask
  17. Concept Drift Detection
  18. Streamlit
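Data drift monitoring, one of the elements above, can be illustrated with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. PSI is only one possible drift measure and is chosen here as an assumption. The function name `psi`, the equal-width binning, and the `eps` floor for empty bins are likewise illustrative choices.

```python
import math


def psi(expected, actual, n_bins=5, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Bins are equal-width over the range of the reference sample; live
    values outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Hypothetical samples: the live feature has shifted upward by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]

no_drift_score = psi(reference, reference)
drift_score = psi(reference, shifted)
```

A monitoring job could compare `drift_score` against an alert threshold and trigger the alerting element when it is exceeded. A value around 0.25 is a commonly quoted rule of thumb for significant drift, but the right threshold depends on the feature and is not specified by this document.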
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  1. Data Collection: API Stream, Web crawler, Selenium
  2. Data Ingestion: Data Landing Zone (store data from all the sources)
  3. Data Cleaning / Preprocessing: Derived & Base features
  4. Data Training & Modelling

Inference Pipeline
  1. Input Data for Forecasting: Input Data
  2. Cleaned & Processed Data
  3. Inference: Streamlit
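
The training and inference pipelines above share the cleaning / preprocessing stage, and the trained model is the artifact passed from one to the other. The following sketch shows one way this could be wired together, under several assumptions: every function name is hypothetical, the collected records are hard-coded stand-ins for an API stream or crawler, the model is a one-parameter least-squares fit, and `pickle` stands in for whatever serialization the real system uses.

```python
import pickle


def collect():
    """Stand-in for Data Collection (API stream, web crawler, ...)."""
    return [
        {"x": 1.0, "y": 2.1},
        {"x": 2.0, "y": None},  # incomplete record, dropped during cleaning
        {"x": 3.0, "y": 6.2},
        {"x": 4.0, "y": 7.9},
    ]


def ingest(records, landing_zone):
    """Data Ingestion: store raw records from all sources in one place."""
    landing_zone.extend(records)
    return landing_zone


def preprocess(records, require_target=True):
    """Data Cleaning / Preprocessing: drop incomplete rows, add a derived feature."""
    cleaned = []
    for r in records:
        if r.get("x") is None or (require_target and r.get("y") is None):
            continue
        # Base feature "x" plus a derived feature (unused by the toy model below).
        cleaned.append({**r, "x_squared": r["x"] ** 2})
    return cleaned


def train(rows):
    """Data Training & Modelling: least-squares fit of y ~ w * x (no intercept)."""
    w = sum(r["x"] * r["y"] for r in rows) / sum(r["x"] ** 2 for r in rows)
    return {"w": w}


def training_pipeline():
    landing_zone = ingest(collect(), [])
    model = train(preprocess(landing_zone))
    return pickle.dumps(model)  # serialized artifact handed to inference


def inference_pipeline(model_bytes, input_data):
    model = pickle.loads(model_bytes)
    rows = preprocess(input_data, require_target=False)  # same cleaning step
    return [model["w"] * r["x"] for r in rows]


artifact = training_pipeline()
predictions = inference_pipeline(artifact, [{"x": 5.0}, {"x": None}])
```

Reusing `preprocess` in both pipelines keeps the features seen at inference consistent with those seen during training. A front end such as Streamlit would call something like `inference_pipeline` and display `predictions`.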