# Alpha Lab
Quantitative research experiments for the qshare library. This repository contains Jupyter notebooks and analysis scripts for exploring trading strategies and machine learning models.
## Philosophy
- **Notebook-centric**: Experiments are interactive notebooks, not rigid scripts
- **Minimal abstraction**: Simple functions over complex class hierarchies
- **Self-contained**: Each task directory is independent
- **Ad-hoc friendly**: Easy to modify for exploration
## Structure
```
alpha_lab/
├── common/                       # Shared utilities (keep minimal!)
│   ├── __init__.py
│   ├── paths.py                  # Path management
│   └── plotting.py               # Common plotting functions
│
├── cta_1d/                       # CTA 1-day return prediction
│   ├── __init__.py               # Re-exports from src/
│   ├── config.yaml               # Task configuration
│   ├── src/                      # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py             # CTA1DLoader
│   │   ├── train.py              # Training functions
│   │   ├── backtest.py           # Backtest functions
│   │   └── labels.py             # Label blending utilities
│   ├── 01_data_check.ipynb
│   ├── 02_label_analysis.ipynb
│   ├── 03_baseline_xgb.ipynb
│   └── 04_blend_comparison.ipynb
│
├── stock_15m/                    # Stock 15-minute return prediction
│   ├── __init__.py               # Re-exports from src/
│   ├── config.yaml               # Task configuration
│   ├── src/                      # Implementation modules
│   │   ├── __init__.py
│   │   ├── loader.py             # Stock15mLoader
│   │   └── train.py              # Training functions
│   ├── 01_data_exploration.ipynb
│   └── 02_baseline_model.ipynb
│
└── results/                      # Output directory (gitignored)
    ├── cta_1d/
    └── stock_15m/
```
## Setup
```bash
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.template .env
# Edit .env with your settings
```
## Usage
### Interactive (Notebooks)
Start Jupyter and run notebooks interactively:
```bash
jupyter notebook
```
Each task directory contains numbered notebooks:
- `01_*.ipynb` - Data loading and exploration
- `02_*.ipynb` - Analysis and baseline models
- `03_*.ipynb` - Advanced experiments
- `04_*.ipynb` - Comparisons and ablations
### Command Line
Train models from config files:
```bash
# Run from the repository root. The modules live under src/ (see Structure).
# CTA 1D
python -m cta_1d.src.train --config cta_1d/config.yaml --output results/cta_1d/exp01
# Stock 15m
python -m stock_15m.src.train --config stock_15m/config.yaml --output results/stock_15m/exp01
# CTA Backtest
python -m cta_1d.src.backtest \
    --model results/cta_1d/exp01/model.json \
    --dt-range 2023-01-01 2023-12-31 \
    --output results/cta_1d/backtest_01
```
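The backtest flags above suggest an `argparse` interface along the following lines. This is a minimal sketch, not the repository's actual CLI: `build_backtest_parser` is a hypothetical name, and the argument types (two ISO dates for `--dt-range`) are inferred from the example command.

```python
import argparse
from datetime import date


def build_backtest_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the backtest flags shown above."""
    parser = argparse.ArgumentParser(prog="backtest")
    parser.add_argument("--model", required=True, help="Path to a saved model file")
    # Two ISO dates: start and end of the backtest window (assumed meaning)
    parser.add_argument(
        "--dt-range",
        nargs=2,
        type=date.fromisoformat,
        metavar=("START", "END"),
        required=True,
    )
    parser.add_argument("--output", required=True, help="Directory for backtest results")
    return parser


# Parse the same arguments as the example command
args = build_backtest_parser().parse_args(
    [
        "--model", "results/cta_1d/exp01/model.json",
        "--dt-range", "2023-01-01", "2023-12-31",
        "--output", "results/cta_1d/backtest_01",
    ]
)
start, end = args.dt_range  # argparse stores --dt-range as dt_range
print(start, end)
```

Parsing the dates at the CLI boundary means a malformed date fails immediately with a usage error instead of partway through a backtest.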
### Python API
```python
# Import from task root (re-exports from src/)
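from cta_1d import CTA1DLoader, train_model, TrainConfig
# Both tasks export train_model and TrainConfig; alias one set when importing
# from both in the same session, otherwise the second import shadows the first
from stock_15m import Stock15mLoader
from stock_15m import train_model as train_stock_model, TrainConfig as StockTrainConfig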
from common import create_experiment_dir
```
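This README does not document what `create_experiment_dir` does. As an illustration only, a helper of that kind could create a timestamped directory under `results/{task}/`. The function name `make_experiment_dir`, its signature, and the naming scheme below are all assumptions, not the behavior of `common`.

```python
from datetime import datetime
from pathlib import Path
import tempfile


def make_experiment_dir(task: str, name: str, root: Path = Path("results")) -> Path:
    """Hypothetical helper: create <root>/<task>/<YYYYmmdd_HHMMSS>_<name>/ and return it."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    exp_dir = root / task / f"{stamp}_{name}"
    # exist_ok=False: fail loudly rather than silently reuse an earlier run's directory
    exp_dir.mkdir(parents=True, exist_ok=False)
    return exp_dir


# Demonstrate inside a temporary directory so nothing is written to the repo
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = make_experiment_dir("cta_1d", "baseline_xgb", root=Path(tmp))
    created = exp_dir.is_dir()
    task_name = exp_dir.parent.name
    print(exp_dir.name)
```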
## Experiment Tracking
Experiments are tracked manually in `results/{task}/README.md`:
```markdown
## 2025-01-15: Baseline XGB
- Notebook: `cta_1d/03_baseline_xgb.ipynb` (cells 1-50)
- Config: eta=0.5, lambda=0.1
- Train IC: 0.042
- Test IC: 0.038
- Notes: Dual normalization, 4 trades/day
```
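Entries in this format are simple enough to append from a notebook cell. The sketch below is one way to do that; `append_experiment_entry` is a hypothetical helper, not part of `common`, and the field names are taken from the example entry above.

```python
from datetime import date
from pathlib import Path
import tempfile


def append_experiment_entry(readme: Path, title: str, fields: dict, when: date) -> str:
    """Append one entry in the format shown above and return the text written."""
    lines = [f"## {when.isoformat()}: {title}"]
    lines += [f"- {key}: {value}" for key, value in fields.items()]
    entry = "\n".join(lines) + "\n"
    readme.parent.mkdir(parents=True, exist_ok=True)
    with readme.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")  # trailing blank line separates entries
    return entry


# Demonstrate inside a temporary directory so nothing is written to the repo
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "results" / "cta_1d" / "README.md"
    entry = append_experiment_entry(
        readme,
        "Baseline XGB",
        {
            "Notebook": "`cta_1d/03_baseline_xgb.ipynb` (cells 1-50)",
            "Train IC": 0.042,
            "Test IC": 0.038,
        },
        when=date(2025, 1, 15),
    )
    written = readme.read_text(encoding="utf-8")
    print(entry)
```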
## Adding a New Task
1. Create directory: `mkdir my_task`
2. Add a `src/` subdirectory with:
   - `__init__.py` - Export public APIs
   - `loader.py` - Dataset loader class
   - Other modules as needed
3. Add root `__init__.py` that re-exports from `src/`
4. Create numbered notebooks
5. Add entry to `results/my_task/README.md`
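
The file-creating steps above (1, 2, 3, and 5) can be sketched as a small scaffolding script. `scaffold_task` is a hypothetical helper, and the placeholder file contents, including the `from .src import *` re-export, are assumptions about one reasonable layout rather than the repository's own template.

```python
from pathlib import Path
import tempfile


def scaffold_task(repo_root: Path, task: str) -> list[Path]:
    """Create the skeleton for a new task and return the files created."""
    task_dir = repo_root / task
    src_dir = task_dir / "src"
    src_dir.mkdir(parents=True)  # raises FileExistsError if the task already exists

    files = {
        src_dir / "loader.py": "# Dataset loader class goes here\n",
        src_dir / "__init__.py": "# Export public APIs here\n",
        task_dir / "__init__.py": "from .src import *  # re-export the public API\n",
        repo_root / "results" / task / "README.md": f"# {task} experiments\n",
    }
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    return sorted(files)


# Demonstrate inside a temporary directory so nothing is written to the repo
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_task(Path(tmp), "my_task")
    relative = [p.relative_to(tmp).as_posix() for p in created]
    print(relative)
```

Notebooks (step 4) are left out because they are easier to create from Jupyter itself.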
## Git Worktrees
This repository uses git worktrees for parallel experiment development:
| Worktree | Branch | Purpose |
|----------|--------|---------|
| `alpha_lab` | `master` | Main repo (reference) |
| `alpha_lab_cta_1d` | `cta_1d_exp` | CTA 1-day experiments |
| `alpha_lab_stock_1d` | `stock_1d_exp` | Stock 1-day experiments |
| `alpha_lab_stock_15m` | `stock_15m_exp` | Stock 15-min experiments |
| `alpha_lab_data_ops` | `data_ops_exp` | Data ops research |
```bash
# Create a new worktree
git worktree add ../alpha_lab_new_exp -b new_exp
# List all worktrees
git worktree list
# Remove a worktree when done
git worktree remove ../alpha_lab_new_exp
```
## Best Practices
1. **Keep it simple**: Move code into `common/` only after it has been copied into three or more places
2. **Module organization**: Place implementation in `src/` and re-export from the root `__init__.py`
3. **Notebook configs**: Define a `CONFIG` dict in the first cell for easy modification
4. **Document results**: Update the results README after significant runs
5. **Git discipline**: Don't commit large files, results, or credentials
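
For practice 3, a first notebook cell might look like the following. The keys and values are illustrative assumptions, not the repository's actual configuration, and `with_overrides` is a hypothetical helper for deriving variants without editing `CONFIG` in place.

```python
# First notebook cell: every tunable lives in one dict
CONFIG = {
    "task": "cta_1d",
    "train_range": ("2020-01-01", "2022-12-31"),  # illustrative dates
    "test_range": ("2023-01-01", "2023-12-31"),   # illustrative dates
    "xgb_params": {"eta": 0.5, "lambda": 0.1},
    "output_dir": "results/cta_1d/exp01",
}


def with_overrides(base: dict, **overrides) -> dict:
    """Return a copy of base with top-level keys replaced, leaving base untouched."""
    unknown = set(overrides) - set(base)
    if unknown:
        # Catch typos such as xgb_param=... before they silently do nothing
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return {**base, **overrides}


# Later cells derive variants instead of mutating CONFIG
variant = with_overrides(CONFIG, xgb_params={"eta": 0.1, "lambda": 0.1})
print(variant["xgb_params"])
```

Keeping the baseline dict unchanged makes it easier to record exactly which settings produced a given entry in the results README.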