ML Trading Pipeline
Production-quality machine learning pipeline for stock market prediction with ensemble models, SHAP explainability, and vectorbt backtesting.
January 1, 2026
Machine Learning, XGBoost, SHAP, Python, Finance
Overview
A personal side project: a modular machine learning pipeline for stock market prediction and backtesting, built to production standards. It is designed around chronological train/validation/test splitting, SHAP-based explainability, and a comprehensive test suite.
Pipeline Architecture
- Data Ingestion: Alpaca Markets API with Parquet caching for efficient historical data retrieval
- Feature Engineering: 12 technical indicators, including SMA, EMA, RSI, MACD, Bollinger Bands, ATR, and OBV
- Temporal Splitting: Train/validation/test splits with strict lookahead prevention
- Model Training: Random Forest, XGBoost, and LightGBM classifiers with hyperparameter tuning
- Explainability: SHAP values for feature importance and model interpretability
- Backtesting: vectorbt engine for realistic portfolio simulation with transaction costs
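
The feature-engineering and temporal-splitting stages above can be sketched without any dependencies. This is a minimal illustration, not the project's actual code: `trailing_sma` and `temporal_split` are hypothetical names, the 60/20/20 proportions and the one-row `gap` are assumptions, and the real pipeline presumably operates on Pandas DataFrames rather than plain lists.

```python
from typing import List, Optional, Sequence, Tuple


def trailing_sma(prices: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average over the current bar and the window-1 bars before it."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)  # not enough history yet, so no value is emitted
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def temporal_split(
    n_rows: int,
    train_frac: float = 0.6,   # assumed proportion, not taken from the project
    val_frac: float = 0.2,     # assumed proportion, not taken from the project
    gap: int = 1,              # rows dropped between splits (assumption)
) -> Tuple[range, range, range]:
    """Return (train, val, test) index ranges in chronological order.

    Rows are never shuffled. `gap` rows are skipped after each boundary so a
    label built from the next bar's return cannot straddle two splits.
    """
    train_end = round(n_rows * train_frac)
    val_end = train_end + round(n_rows * val_frac)
    train = range(0, train_end)
    val = range(train_end + gap, val_end)
    test = range(val_end + gap, n_rows)
    return train, val, test


# Usage with made-up prices.
prices = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 13.5, 14.0, 13.0]
features = trailing_sma(prices, window=3)
train_idx, val_idx, test_idx = temporal_split(len(prices))
```

Each feature value depends only on the current and earlier bars, and every training index precedes every validation and test index, which is the property the lookahead-prevention claim relies on.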
Key Features
- 42 unit and integration tests ensuring pipeline correctness
- Lookahead bias prevention at every stage
- Model comparison framework across multiple classifiers
- Feature importance analysis with SHAP visualizations
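
One way a test suite can check for lookahead bias is a perturbation test: change every price after a cutoff and confirm that feature values up to the cutoff stay identical. The sketch below is an assumption about how such a test might look, not a copy of the project's tests. `is_lookahead_free`, `trailing_mean`, and `centered_mean` are hypothetical helpers written for this example.

```python
from typing import Callable, List, Sequence


def is_lookahead_free(
    feature_fn: Callable[[List[float]], List[float]],
    prices: Sequence[float],
    cutoff: int,
) -> bool:
    """True if feature values at indices <= cutoff ignore all later prices."""
    baseline = feature_fn(list(prices))
    tampered = list(prices)
    for i in range(cutoff + 1, len(tampered)):
        tampered[i] = tampered[i] * 10 + 1  # alter every bar after the cutoff
    return baseline[: cutoff + 1] == feature_fn(tampered)[: cutoff + 1]


def trailing_mean(prices: List[float], window: int = 3) -> List[float]:
    """Mean of the current bar and up to window-1 earlier bars (no future data)."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(prices: List[float], window: int = 3) -> List[float]:
    """Deliberately leaky: each value also averages in the following bar."""
    half = window // 2
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - half) : i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Made-up prices; the trailing feature passes and the centered one fails.
sample = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.0, 13.5]
trailing_ok = is_lookahead_free(trailing_mean, sample, cutoff=4)
centered_ok = is_lookahead_free(centered_mean, sample, cutoff=4)
```

The same check works for any function that maps a price series to a feature series, so it can be applied to each indicator in turn.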
Technologies
Python, scikit-learn, XGBoost, LightGBM, SHAP, vectorbt, Alpaca API, Pandas, NumPy