# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Alpha Lab is a quantitative research experiment framework for the `qshare` library. It uses a notebook-centric approach for exploring trading strategies and ML models. The codebase is organized around two prediction tasks:

- **cta_1d**: CTA (Commodity Trading Advisor) futures 1-day return prediction
- **stock_15m**: Stock 15-minute forward return prediction using high-frequency features

## Directory Structure

```
alpha_lab/
├── common/                 # Shared utilities
│   ├── __init__.py
│   ├── paths.py            # Path management
│   └── plotting.py         # Common plotting functions
│
├── cta_1d/                 # CTA 1-day return prediction
│   ├── __init__.py         # Re-exports from src/
│   ├── config.yaml         # Task configuration
│   ├── src/                # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py       # CTA1DLoader
│   │   ├── train.py        # Training functions
│   │   ├── backtest.py     # Backtest functions
│   │   └── labels.py       # Label blending utilities
│   └── *.ipynb             # Experiment notebooks
│
├── stock_15m/              # Stock 15-minute return prediction
│   ├── __init__.py         # Re-exports from src/
│   ├── config.yaml         # Task configuration
│   ├── src/                # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py       # Stock15mLoader
│   │   └── train.py        # Training functions
│   └── *.ipynb             # Experiment notebooks
│
└── results/                # Output directory (gitignored)
```

## Common Commands

### Development Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Create environment configuration
cp .env.template .env
# Edit .env with your DolphinDB host and data paths
```

### Running Experiments

```bash
# Start Jupyter for interactive experiments
jupyter notebook

# Train CTA model from config
python -m cta_1d.train --config cta_1d/config.yaml --output results/cta_1d/exp01

# Train Stock 15m model
python -m stock_15m.train --config stock_15m/config.yaml --output results/stock_15m/exp01

# Run CTA backtest
python -m cta_1d.backtest \
    --model results/cta_1d/exp01/model.json \
    --dt-range 2023-01-01 2023-12-31 \
    --output results/cta_1d/backtest_01
```

### Python API Usage

```python
# CTA 1D workflow
from cta_1d import CTA1DLoader, train_model, TrainConfig

loader = CTA1DLoader(return_type='o2c_twap1min', normalization='dual')
dataset = loader.load(dt_range=['2020-01-01', '2023-12-31'])

config = TrainConfig(dt_range=['2020-01-01', '2023-12-31'], feature_sets=['alpha158'])
model, metrics = train_model(config, output_dir='results/exp01')

# Stock 15m workflow
from stock_15m import Stock15mLoader, train_model, TrainConfig

loader = Stock15mLoader(normalization_mode='dual')
dataset = loader.load(
    dt_range=['2020-01-01', '2023-12-31'],
    feature_path='/data/parquet/stock_1min_alpha158',
    kline_path='/data/parquet/stock_1min_kline'
)
```

## Architecture

### Module Organization

All implementation code lives in `src/` subdirectories:

- **`cta_1d/src/`**: CTA-specific implementations
  - `loader.py`: CTA1DLoader class
  - `train.py`: train_model, TrainConfig
  - `backtest.py`: run_backtest, BacktestConfig
  - `labels.py`: Label blending utilities

- **`stock_15m/src/`**: Stock-specific implementations
  - `loader.py`: Stock15mLoader class
  - `train.py`: train_model, TrainConfig

Root `__init__.py` files re-export public APIs for backward compatibility:
```python
from cta_1d import CTA1DLoader  # Imports from cta_1d.src
```

### Data Flow

Both tasks follow a consistent pattern:

1. **Loaders** (`src/loader.py`): Fetch data from DolphinDB (CTA) or Parquet files (Stock), apply normalization, compute sample weights, and return a `pl_Dataset`
2. **Training** (`src/train.py`): Train XGBoost with early stopping; output the model as JSON plus metrics
3. **Backtest** (`src/backtest.py`): CTA only; uses `qshare.eval.cta.backtest.CTABacktester` for strategy simulation

### Key Classes

- **`CTA1DLoader`**: Loads alpha158/hffactor features from DolphinDB; supports 5 normalization modes (`zscore`, `cs_zscore`, `rolling_20`, `rolling_60`, `dual`)
- **`Stock15mLoader`**: Loads Alpha158 features on 1-minute data; computes 15-minute forward returns; supports 3 normalization modes (`industry`, `cs_zscore`, `dual`)
- **`pl_Dataset`**: From `qshare.data`; provides `.with_segments()`, `.split()`, and `.to_numpy()` methods

### Normalization Modes

**CTA 1D** (`dual` blending):
- `zscore`: Fit-time mean/std normalization
- `cs_zscore`: Cross-sectional z-score per datetime
- `rolling_20` / `rolling_60`: Rolling-window normalization (window lengths 20 and 60)
- `dual`: Weighted blend (default weights: [0.2, 0.1, 0.3, 0.4])
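
As a rough illustration of how these modes relate, the sketch below implements simplified versions in plain Python on a small panel (rows are datetimes, columns are instruments). It is not the `CTA1DLoader` implementation. The function names and toy data are hypothetical, `fit_zscore` assumes per-instrument statistics from a fit window, and the mapping of the four default weights to `zscore`, `cs_zscore`, `rolling_20`, `rolling_60` in that order is an assumption.

```python
from statistics import mean, pstdev

# A panel is a list of rows: panel[t][k] is the value at datetime t for instrument k.


def _z(value, sample):
    """Z-score `value` against `sample`; 0.0 when the sample has no spread."""
    sigma = pstdev(sample)
    return 0.0 if sigma == 0 else (value - mean(sample)) / sigma


def fit_zscore(panel, fit_rows):
    """Per-instrument z-score using only the first `fit_rows` rows as the fit window."""
    n_cols = len(panel[0])
    fit = [[row[k] for row in panel[:fit_rows]] for k in range(n_cols)]
    return [[_z(row[k], fit[k]) for k in range(n_cols)] for row in panel]


def cs_zscore(panel):
    """Z-score each datetime's row across instruments."""
    return [[_z(v, row) for v in row] for row in panel]


def rolling_zscore(panel, window):
    """Per-instrument z-score against the trailing `window` rows (current row included)."""
    out = []
    for t, row in enumerate(panel):
        hist = panel[max(0, t - window + 1): t + 1]
        out.append([_z(row[k], [h[k] for h in hist]) for k in range(len(row))])
    return out


def dual(panel, fit_rows, weights=(0.2, 0.1, 0.3, 0.4)):
    """Weighted blend of the four normalizations (weight order is an assumption)."""
    parts = [
        fit_zscore(panel, fit_rows),
        cs_zscore(panel),
        rolling_zscore(panel, 20),
        rolling_zscore(panel, 60),
    ]
    return [
        [sum(w * p[t][k] for w, p in zip(weights, parts)) for k in range(len(panel[0]))]
        for t in range(len(panel))
    ]


# Toy panel: 3 datetimes x 2 instruments
toy = [[1.0, 3.0], [2.0, 2.0], [3.0, 7.0]]
blended = dual(toy, fit_rows=2)
```

With so few rows, the rolling windows are only partially filled; a real loader would more likely mask or drop the warm-up period than z-score against a short history.
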

**Stock 15m**:
- `industry`: Industry-neutralized returns
- `cs_zscore`: Cross-sectional z-score
- `dual`: 80% industry-neutral + 20% cs_zscore
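
The 80/20 blend can be sketched for a single datetime's cross-section as follows. This is a minimal illustration, not the `Stock15mLoader` or `qshare.algo.polars` implementation: it assumes industry neutralization means subtracting the industry mean, and all function names and sample values are hypothetical. It also blends the two components on their raw scales, whereas a real implementation may standardize them first.

```python
from collections import defaultdict
from statistics import mean, pstdev


def industry_neutralize(returns, industries):
    """Subtract each stock's industry mean within one cross-section."""
    groups = defaultdict(list)
    for r, ind in zip(returns, industries):
        groups[ind].append(r)
    ind_mean = {ind: mean(vals) for ind, vals in groups.items()}
    return [r - ind_mean[ind] for r, ind in zip(returns, industries)]


def cs_zscore(returns):
    """Z-score returns across all stocks in the cross-section."""
    mu, sigma = mean(returns), pstdev(returns)
    return [0.0 if sigma == 0 else (r - mu) / sigma for r in returns]


def dual_label(returns, industries, w_industry=0.8, w_cs=0.2):
    """Blend industry-neutral returns with cross-sectional z-scores."""
    neutral = industry_neutralize(returns, industries)
    z = cs_zscore(returns)
    return [w_industry * n + w_cs * c for n, c in zip(neutral, z)]


# Hypothetical cross-section: four stocks in two industries
returns = [0.01, 0.03, -0.02, 0.02]
industries = ["bank", "bank", "tech", "tech"]
neutral = industry_neutralize(returns, industries)
labels = dual_label(returns, industries)
```
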

### Experiment Tracking

Manual tracking in `results/{task}/README.md`:

```markdown
## 2025-01-15: Baseline XGB
- Notebook: `cta_1d/03_baseline_xgb.ipynb` (cells 1-50)
- Config: eta=0.5, lambda=0.1
- Train IC: 0.042
- Test IC: 0.038
- Notes: Dual normalization, 4 trades/day
```
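
If writing entries by hand becomes tedious, a small helper could render them in the same layout. The sketch below is hypothetical: `format_entry` and `append_entry` are not part of the repository, and the field set simply mirrors the example entry.

```python
from datetime import date
from pathlib import Path


def format_entry(day, title, notebook, config, train_ic, test_ic, notes):
    """Render one experiment entry in the README layout used for tracking."""
    cfg = ", ".join(f"{key}={value}" for key, value in config.items())
    return "\n".join([
        f"## {day.isoformat()}: {title}",
        f"- Notebook: `{notebook}`",
        f"- Config: {cfg}",
        f"- Train IC: {train_ic:.3f}",
        f"- Test IC: {test_ic:.3f}",
        f"- Notes: {notes}",
    ])


def append_entry(readme, entry):
    """Append an entry to the tracking README, creating the file if needed."""
    readme = Path(readme)
    readme.parent.mkdir(parents=True, exist_ok=True)
    with readme.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n\n")


entry = format_entry(
    day=date(2025, 1, 15),
    title="Baseline XGB",
    notebook="cta_1d/03_baseline_xgb.ipynb",
    config={"eta": 0.5, "lambda": 0.1},
    train_ic=0.042,
    test_ic=0.038,
    notes="Dual normalization, 4 trades/day",
)
# append_entry("results/cta_1d/README.md", entry)  # uncomment to write to disk
```
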

### Dependencies on qshare

The codebase relies heavily on the `qshare` library (already installed in the venv):

- `qshare.data.pl_Dataset`: Dataset container with Polars backend
- `qshare.io.ddb`: DolphinDB session management
- `qshare.io.polars`: Parquet loading utilities
- `qshare.algo.polars`: Industry neutralization, cross-sectional z-score
- `qshare.eval.cta.backtest`: CTA backtesting framework
- `qshare.config.research.cta`: Predefined column lists (HFFACTOR_COLS)

### Configuration Files

YAML configs define data ranges, model hyperparameters, and output settings:

```yaml
data:
  dt_range: ['2020-01-01', '2023-12-31']
  feature_sets: [alpha158, hffactor]
  normalization: dual
model:
  type: xgb
  params: {eta: 0.05, max_depth: 6}
```

Load a config by passing it to `python -m cta_1d.train --config config.yaml`, or parse it directly with `yaml.safe_load()`.
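
When parsing a config directly, it can help to validate the resulting mapping before using it. The sketch below works on the plain dict that `yaml.safe_load()` would return for the YAML above. `DataSettings`, `ModelSettings`, and `parse_config` are hypothetical names, not the repository's `TrainConfig`, and the fallback defaults (`dual`, `xgb`) are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataSettings:
    dt_range: tuple
    feature_sets: list
    normalization: str = "dual"


@dataclass
class ModelSettings:
    type: str = "xgb"
    params: dict = field(default_factory=dict)


def parse_config(raw):
    """Validate a parsed YAML mapping and split it into typed sections."""
    data = raw["data"]
    # date.fromisoformat raises ValueError on malformed dates
    start, end = (date.fromisoformat(d) for d in data["dt_range"])
    if start > end:
        raise ValueError(f"dt_range is reversed: {start} > {end}")
    if not data.get("feature_sets"):
        raise ValueError("feature_sets must list at least one feature set")
    model = raw.get("model", {})
    data_cfg = DataSettings(
        dt_range=(start.isoformat(), end.isoformat()),
        feature_sets=list(data["feature_sets"]),
        normalization=data.get("normalization", "dual"),
    )
    model_cfg = ModelSettings(
        type=model.get("type", "xgb"),
        params=dict(model.get("params", {})),
    )
    return data_cfg, model_cfg


# Equivalent to yaml.safe_load() applied to the YAML example
raw = {
    "data": {
        "dt_range": ["2020-01-01", "2023-12-31"],
        "feature_sets": ["alpha158", "hffactor"],
        "normalization": "dual",
    },
    "model": {"type": "xgb", "params": {"eta": 0.05, "max_depth": 6}},
}
data_cfg, model_cfg = parse_config(raw)
```
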