ML Trading Pipeline

Production-quality machine learning pipeline for stock market prediction with ensemble models, SHAP explainability, and vectorbt backtesting.

January 1, 2026
Machine Learning, XGBoost, SHAP, Python, Finance

Overview

A personal side project: a modular, production-quality machine learning pipeline for stock market prediction and backtesting, built around chronological data splits, model explainability, and a comprehensive test suite.

Pipeline Architecture

  1. Data Ingestion: Alpaca Markets API with Parquet caching for efficient historical data retrieval
  2. Feature Engineering: 12 technical indicators, including SMA, EMA, RSI, MACD, Bollinger Bands, ATR, and OBV
  3. Temporal Splitting: Train/validation/test splits with strict lookahead prevention
  4. Model Training: Random Forest, XGBoost, and LightGBM classifiers with hyperparameter tuning
  5. Explainability: SHAP values for feature importance and model interpretability
  6. Backtesting: vectorbt engine for realistic portfolio simulation with transaction costs
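
The temporal splitting step (step 3) could be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the project's actual code: the function name `temporal_split`, the default fractions, and the `gap` parameter are all assumptions made for the example. The idea is to cut time-ordered rows into contiguous segments without shuffling, optionally dropping a few rows at each boundary so that labels built from future bars cannot leak across it.

```python
from typing import Sequence, Tuple


def temporal_split(
    rows: Sequence,
    train_frac: float = 0.7,   # hypothetical default, not from the project
    val_frac: float = 0.15,    # hypothetical default, not from the project
    gap: int = 0,
) -> Tuple[list, list, list]:
    """Split time-ordered rows into train/validation/test without shuffling.

    `gap` rows are dropped after each boundary so that a label computed
    from future bars (e.g. a next-day return) cannot straddle two segments.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    if gap < 0:
        raise ValueError("gap must be non-negative")

    n = len(rows)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))

    # Segments stay in chronological order: train, then validation, then test.
    train = list(rows[:train_end])
    val = list(rows[train_end + gap:val_end])
    test = list(rows[val_end + gap:])
    return train, val, test


if __name__ == "__main__":
    bars = list(range(100))  # stand-in for 100 time-ordered bars
    train, val, test = temporal_split(bars, gap=5)
    print(len(train), len(val), len(test))
```

Because the segments are contiguous and ordered, every training row precedes every validation row, which in turn precedes every test row. scikit-learn's `TimeSeriesSplit` offers a comparable walk-forward scheme if cross-validation is needed.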

Key Features

  • 42 unit and integration tests ensuring pipeline correctness
  • Lookahead bias prevention at every stage
  • Model comparison framework across multiple classifiers
  • Feature importance analysis with SHAP visualizations
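
One way a test might check a feature for lookahead bias is to corrupt every price after some cutoff and confirm that feature values before the cutoff do not change. The sketch below is an assumption about how such a check could look, not a description of the project's test suite. The helper names `trailing_sma`, `forward_mean`, and `has_lookahead` are hypothetical, and plain Python lists stand in for the Pandas objects the pipeline uses.

```python
from typing import Callable, List, Optional, Sequence

Feature = Callable[[Sequence[float]], List[Optional[float]]]


def trailing_sma(prices: Sequence[float], window: int = 3) -> List[Optional[float]]:
    """Simple moving average over the current bar and the bars before it."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the bar leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def forward_mean(prices: Sequence[float], window: int = 3) -> List[Optional[float]]:
    """Deliberately leaky feature: averages the current bar and FUTURE bars."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        chunk = prices[i:i + window]
        out.append(sum(chunk) / window if len(chunk) == window else None)
    return out


def has_lookahead(feature_fn: Feature, prices: Sequence[float], cutoff: int) -> bool:
    """Return True if values before `cutoff` change when later prices change."""
    baseline = feature_fn(prices)
    tampered = list(prices)
    for i in range(cutoff, len(tampered)):
        tampered[i] = tampered[i] * 10 + 1  # corrupt every bar from cutoff on
    return feature_fn(tampered)[:cutoff] != baseline[:cutoff]


if __name__ == "__main__":
    prices = [float(x) for x in range(1, 21)]
    print(has_lookahead(trailing_sma, prices, cutoff=10))   # trailing window
    print(has_lookahead(forward_mean, prices, cutoff=10))   # forward window
```

The same perturbation check applies to any indicator: if changing future bars alters a past feature value, that feature would leak information into training.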

Technologies

Python, scikit-learn, XGBoost, LightGBM, SHAP, vectorbt, Alpaca API, Pandas, NumPy