# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Alpha Lab is a quantitative research experiment framework built on the `qshare` library. It takes a notebook-centric approach to exploring trading strategies and ML models. The codebase is organized around two prediction tasks:

- **cta_1d**: CTA (Commodity Trading Advisor) futures 1-day return prediction
- **stock_15m**: Stock 15-minute forward return prediction using high-frequency features
## Directory Structure

```
alpha_lab/
├── common/                    # Shared utilities
│   ├── __init__.py
│   ├── paths.py               # Path management
│   └── plotting.py            # Common plotting functions
│
├── cta_1d/                    # CTA 1-day return prediction
│   ├── __init__.py            # Re-exports from src/
│   ├── config.yaml            # Task configuration
│   ├── src/                   # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py          # CTA1DLoader
│   │   ├── train.py           # Training functions
│   │   ├── backtest.py        # Backtest functions
│   │   └── labels.py          # Label blending utilities
│   └── *.ipynb                # Experiment notebooks
│
├── stock_15m/                 # Stock 15-minute return prediction
│   ├── __init__.py            # Re-exports from src/
│   ├── config.yaml            # Task configuration
│   ├── src/                   # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py          # Stock15mLoader
│   │   └── train.py           # Training functions
│   └── *.ipynb                # Experiment notebooks
│
└── results/                   # Output directory (gitignored)
```
## Common Commands

### Development Setup

```bash
# Install dependencies
pip install -r requirements.txt

# Create environment configuration
cp .env.template .env
# Edit .env with your DolphinDB host and data paths
```
### Running Experiments

```bash
# Start Jupyter for interactive experiments
jupyter notebook

# Train CTA model from config
python -m cta_1d.train --config cta_1d/config.yaml --output results/cta_1d/exp01

# Train Stock 15m model
python -m stock_15m.train --config stock_15m/config.yaml --output results/stock_15m/exp01

# Run CTA backtest
python -m cta_1d.backtest \
    --model results/cta_1d/exp01/model.json \
    --dt-range 2023-01-01 2023-12-31 \
    --output results/cta_1d/backtest_01
```
### Python API Usage

```python
# CTA 1D workflow
from cta_1d import CTA1DLoader, train_model, TrainConfig

loader = CTA1DLoader(return_type='o2c_twap1min', normalization='dual')
dataset = loader.load(dt_range=['2020-01-01', '2023-12-31'])

config = TrainConfig(dt_range=['2020-01-01', '2023-12-31'], feature_sets=['alpha158'])
model, metrics = train_model(config, output_dir='results/exp01')

# Stock 15m workflow
from stock_15m import Stock15mLoader, train_model, TrainConfig

loader = Stock15mLoader(normalization_mode='dual')
dataset = loader.load(
    dt_range=['2020-01-01', '2023-12-31'],
    feature_path='/data/parquet/stock_1min_alpha158',
    kline_path='/data/parquet/stock_1min_kline'
)
```
## Architecture

### Module Organization

All implementation code lives in `src/` subdirectories:

- **`cta_1d/src/`**: CTA-specific implementations
  - `loader.py`: `CTA1DLoader` class
  - `train.py`: `train_model`, `TrainConfig`
  - `backtest.py`: `run_backtest`, `BacktestConfig`
  - `labels.py`: Label blending utilities

- **`stock_15m/src/`**: Stock-specific implementations
  - `loader.py`: `Stock15mLoader` class
  - `train.py`: `train_model`, `TrainConfig`

Root `__init__.py` files re-export public APIs for backward compatibility:

```python
from cta_1d import CTA1DLoader  # Imports from cta_1d.src
```
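The re-export pattern can be demonstrated end to end with a throwaway package built on disk. The stub `CTA1DLoader` below is a hypothetical stand-in for the real class in `cta_1d/src/loader.py`; only the package layout and the re-export mechanics are meant to match.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package mirroring the cta_1d/ layout.
root = Path(tempfile.mkdtemp())
pkg = root / "cta_1d"
(pkg / "src").mkdir(parents=True)
(pkg / "src" / "__init__.py").write_text("")

# Hypothetical stand-in for the real loader implementation.
(pkg / "src" / "loader.py").write_text(
    "class CTA1DLoader:\n"
    "    def __init__(self, return_type='o2c_twap1min', normalization='dual'):\n"
    "        self.return_type = return_type\n"
    "        self.normalization = normalization\n"
)

# Root __init__.py re-exports the public API so callers never import src directly.
(pkg / "__init__.py").write_text(
    "from .src.loader import CTA1DLoader\n"
    "__all__ = ['CTA1DLoader']\n"
)

sys.path.insert(0, str(root))
from cta_1d import CTA1DLoader  # resolves via the re-export

loader = CTA1DLoader()
```

Note that the class still reports `cta_1d.src.loader` as its defining module; the root `__init__.py` only forwards the name.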

### Data Flow

Both tasks follow a consistent pattern:

1. **Loaders** (`src/loader.py`): Fetch data from DolphinDB (CTA) or Parquet files (Stock), apply normalization, compute sample weights, and return a `pl_Dataset`
2. **Training** (`src/train.py`): XGBoost with early stopping; outputs model JSON plus metrics
3. **Backtest** (`src/backtest.py`): CTA-only; uses `qshare.eval.cta.backtest.CTABacktester` for strategy simulation
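The loader → train hand-off above can be sketched with hypothetical stand-ins. The real classes come from `qshare` and the task `src/` modules; every name, signature, and value here is illustrative only, intended to show the shape of the interface between stages.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """Illustrative stand-in for qshare.data.pl_Dataset."""
    features: list
    labels: list
    weights: list = field(default_factory=list)

@dataclass
class TrainConfig:
    """Illustrative stand-in for the real TrainConfig."""
    dt_range: list
    feature_sets: list

def load(dt_range):
    # Stage 1: fetch + normalize + weight (stubbed with toy data).
    return Dataset(features=[[0.1], [0.2]], labels=[0.01, -0.02], weights=[1.0, 1.0])

def train_model(config, dataset):
    # Stage 2: fit a model, return (model, metrics) -- stubbed.
    model = {"type": "xgb", "n_features": len(dataset.features[0])}
    metrics = {"train_ic": 0.0}
    return model, metrics

# Stage 3 (backtest, CTA-only) would consume the saved model JSON.
cfg = TrainConfig(dt_range=["2020-01-01", "2023-12-31"], feature_sets=["alpha158"])
ds = load(cfg.dt_range)
model, metrics = train_model(cfg, ds)
```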
### Key Classes

- **`CTA1DLoader`**: Loads alpha158/hffactor features from DolphinDB; supports 5 normalization modes (`zscore`, `cs_zscore`, `rolling_20`, `rolling_60`, `dual`)
- **`Stock15mLoader`**: Loads Alpha158 on 1-min data; computes 15-min forward returns; normalization modes: `industry`, `cs_zscore`, `dual`
- **`pl_Dataset`**: From `qshare.data`; provides `.with_segments()`, `.split()`, and `.to_numpy()` methods
### Normalization Modes

**CTA 1D** (`dual` blending):

- `zscore`: Fit-time mean/std normalization
- `cs_zscore`: Cross-sectional z-score per datetime
- `rolling_20/60`: Rolling-window normalization
- `dual`: Weighted blend of the modes above (default weights: `[0.2, 0.1, 0.3, 0.4]`)

**Stock 15m**:

- `industry`: Industry-neutralized returns
- `cs_zscore`: Cross-sectional z-score
- `dual`: 80% industry-neutral + 20% cs_zscore
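A minimal NumPy sketch of the blending idea, assuming `dual` is a weighted sum of the individually normalized feature matrices. The mapping of the default weights to the four modes, and the rolling-window details, are assumptions; the real implementations live in the task `src/` modules.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # toy (datetimes x features) matrix

def cs_zscore(x):
    # Cross-sectional z-score: normalize each row (one datetime) across columns.
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / sd

def fit_zscore(x):
    # Fit-time z-score: normalize each column with full-sample mean/std.
    return (x - x.mean(axis=0)) / x.std(axis=0)

def rolling_norm(x, window):
    # Assumed rolling normalization: trailing-window mean/std per column.
    out = np.empty_like(x)
    for i in range(len(x)):
        w = x[max(0, i - window + 1) : i + 1]
        out[i] = (x[i] - w.mean(axis=0)) / (w.std(axis=0) + 1e-12)
    return out

# `dual`: weighted blend, mirroring the documented default [0.2, 0.1, 0.3, 0.4]
weights = [0.2, 0.1, 0.3, 0.4]
parts = [fit_zscore(X), cs_zscore(X), rolling_norm(X, 20), rolling_norm(X, 60)]
X_dual = sum(w * p for w, p in zip(weights, parts))
```

The Stock 15m `dual` mode follows the same shape with two parts and weights `[0.8, 0.2]`.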
### Experiment Tracking

Experiments are tracked manually in `results/{task}/README.md`:

```markdown
## 2025-01-15: Baseline XGB
- Notebook: `cta_1d/03_baseline_xgb.ipynb` (cells 1-50)
- Config: eta=0.5, lambda=0.1
- Train IC: 0.042
- Test IC: 0.038
- Notes: Dual normalization, 4 trades/day
```
### Dependencies on qshare

The codebase relies heavily on the `qshare` library (already installed in the venv):

- `qshare.data.pl_Dataset`: Dataset container with a Polars backend
- `qshare.io.ddb`: DolphinDB session management
- `qshare.io.polars`: Parquet loading utilities
- `qshare.algo.polars`: Industry neutralization, cross-sectional z-score
- `qshare.eval.cta.backtest`: CTA backtesting framework
- `qshare.config.research.cta`: Predefined column lists (`HFFACTOR_COLS`)
### Configuration Files

YAML configs define data ranges, model hyperparameters, and output settings:

```yaml
data:
  dt_range: ['2020-01-01', '2023-12-31']
  feature_sets: [alpha158, hffactor]
  normalization: dual
model:
  type: xgb
  params: {eta: 0.05, max_depth: 6}
```

Load a config via the CLI (`python -m cta_1d.train --config cta_1d/config.yaml`) or read it with `yaml.safe_load()` directly.
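Reading a config with `yaml.safe_load()` can be sketched as follows (assumes PyYAML is installed; the config is inlined here for self-containment, whereas in practice you would `open()` the task's `config.yaml`):

```python
import yaml

# Inline copy of the config shown above.
raw = """
data:
  dt_range: ['2020-01-01', '2023-12-31']
  feature_sets: [alpha158, hffactor]
  normalization: dual
model:
  type: xgb
  params: {eta: 0.05, max_depth: 6}
"""

# safe_load parses the nested mapping/sequence structure into dicts and lists.
cfg = yaml.safe_load(raw)
dt_range = cfg['data']['dt_range']
xgb_params = cfg['model']['params']
```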