CTA 1-Day Return Prediction
Experiments for predicting CTA (Commodity Trading Advisor) futures 1-day returns.
Data
- Features: alpha158, hffactor
- Labels: Return indicators (o2c_twap1min, o2o_twap1min, etc.)
- Normalization: dual (blend of zscore, cs_zscore, rolling_20, rolling_60)
Notebooks
| Notebook | Purpose |
|---|---|
| 01_data_check.ipynb | Load and validate CTA data |
| 02_label_analysis.ipynb | Explore label distributions and blending |
| 03_baseline_xgb.ipynb | Train baseline XGBoost model |
| 04_blend_comparison.ipynb | Compare different normalization blends |
Blend Configurations
The label blending combines 4 normalization methods:
- zscore: Fit-time mean/std normalization
- cs_zscore: Cross-sectional z-score per datetime
- rolling_20: 20-day rolling window normalization
- rolling_60: 60-day rolling window normalization
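To make the difference between a fit-time z-score and a cross-sectional one concrete, here is a minimal standard-library sketch of the cross-sectional variant. The function name `cs_zscore`, the tuple layout, and the choice of population standard deviation are assumptions for illustration; the repository's own implementation may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cs_zscore(rows):
    """Standardize each value against the other instruments on the same datetime.

    rows: list of (datetime, instrument, value) tuples (hypothetical layout).
    Returns a dict keyed by (datetime, instrument).
    """
    by_dt = defaultdict(list)
    for dt, _, value in rows:
        by_dt[dt].append(value)

    # One (mean, std) pair per datetime; population std is an assumption.
    stats = {dt: (mean(vs), pstdev(vs)) for dt, vs in by_dt.items()}

    out = {}
    for dt, inst, value in rows:
        mu, sigma = stats[dt]
        # Guard against a flat cross-section, where std is zero.
        out[(dt, inst)] = (value - mu) / sigma if sigma > 0 else 0.0
    return out


rows = [
    ("2024-01-02", "A", 0.01), ("2024-01-02", "B", 0.03),
    ("2024-01-03", "A", -0.02), ("2024-01-03", "B", 0.02),
]
cs = cs_zscore(rows)
```

A fit-time z-score would instead use a single mean and standard deviation estimated once on the training window, so it keeps day-to-day level differences that the cross-sectional version removes.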
Predefined weights (from qshare.config.research.cta.labels):
- equal: [0.25, 0.25, 0.25, 0.25]
- zscore_heavy: [0.5, 0.2, 0.15, 0.15]
- rolling_heavy: [0.1, 0.1, 0.3, 0.5]
- cs_heavy: [0.2, 0.5, 0.15, 0.15]
- short_term: [0.1, 0.1, 0.4, 0.4]
- long_term: [0.4, 0.2, 0.2, 0.2]
Default: [0.2, 0.1, 0.3, 0.4]
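The sketch below shows one way such weights could be applied, assuming the blend is a plain weighted sum and that each weight vector follows the order zscore, cs_zscore, rolling_20, rolling_60 (the preset names suggest this order, but it is not confirmed here). `blend_labels` and `BLEND_WEIGHTS` are illustrative names, not the API of qshare.

```python
import math

# Assumed weight order; matches the order in which the methods are listed.
METHODS = ("zscore", "cs_zscore", "rolling_20", "rolling_60")

BLEND_WEIGHTS = {
    "equal": [0.25, 0.25, 0.25, 0.25],
    "zscore_heavy": [0.5, 0.2, 0.15, 0.15],
    "rolling_heavy": [0.1, 0.1, 0.3, 0.5],
    "cs_heavy": [0.2, 0.5, 0.15, 0.15],
    "short_term": [0.1, 0.1, 0.4, 0.4],
    "long_term": [0.4, 0.2, 0.2, 0.2],
    "default": [0.2, 0.1, 0.3, 0.4],
}


def blend_labels(normalized, weights):
    """Weighted sum of the four normalized label series, element by element.

    normalized: dict mapping each name in METHODS to an equally long list.
    """
    if len(weights) != len(METHODS):
        raise ValueError("expected one weight per normalization method")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    columns = [normalized[m] for m in METHODS]
    return [sum(w * x for w, x in zip(weights, row)) for row in zip(*columns)]


normalized = {
    "zscore": [1.0, 0.0],
    "cs_zscore": [2.0, 0.0],
    "rolling_20": [3.0, 0.0],
    "rolling_60": [4.0, 0.0],
}
blended = blend_labels(normalized, BLEND_WEIGHTS["equal"])
```

Checking that each weight vector sums to 1 keeps the blended label on roughly the same scale as its inputs, which makes the presets easier to compare.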
Processors Module
The cta_1d.src.processors module provides Polars-based data processors that replicate Qlib's preprocessing pipeline:
Available Processors
| Processor | Description |
|---|---|
| DiffProcessor | Adds diff features with configurable period |
| FlagMarketInjector | Adds market_0, market_1 columns based on instrument codes |
| FlagSTInjector | Creates IsST column from ST flags |
| ColumnRemover | Removes specified columns |
| FlagToOnehot | Converts one-hot industry flags to single index column |
| IndusNtrlInjector | Industry neutralization per datetime |
| RobustZScoreNorm | Robust z-score normalization using median/MAD |
| Fillna | Fills NaN values with specified value |
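As a rough illustration of median/MAD normalization, the following standard-library sketch centers on the median, scales by the median absolute deviation times 1.4826, and clips the result. The clip range of ±3 and the epsilon are assumptions; the repository's processor operates on Polars DataFrames and may use different settings.

```python
from statistics import median

MAD_TO_STD = 1.4826  # makes MAD comparable to std for normally distributed data
EPS = 1e-12          # avoids division by zero on constant columns


def robust_zscore(values, clip=3.0):
    """Center on the median, scale by MAD * 1.4826, then clip to [-clip, clip]."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = (mad + EPS) * MAD_TO_STD
    return [max(-clip, min(clip, (v - center) / scale)) for v in values]


# One extreme value barely moves the median or the MAD,
# so the remaining points keep a sensible scale.
scores = robust_zscore([1.0, 2.0, 3.0, 4.0, 100.0])
```

Compared with a mean/std z-score, the outlier at 100 does not inflate the scale here; it is simply clipped to the upper bound.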
RobustZScoreNorm with Pre-fitted Parameters
The RobustZScoreNorm processor supports loading pre-fitted parameters from Qlib's proc_list.proc:
    from cta_1d.src.processors import RobustZScoreNorm

    # Method 1: Load from saved version (recommended)
    processor = RobustZScoreNorm.from_version("csiallx_feature2_ntrla_flag_pnlnorm")

    # Method 2: Load with direct parameters
    processor = RobustZScoreNorm(
        feature_cols=['KMID', 'KLEN', ...],
        use_qlib_params=True,
        qlib_mean=mean_array,
        qlib_std=std_array
    )

    # Apply normalization
    df = processor.process(df)
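Pre-fitted parameters matter because the center and scale are estimated once on the training window and then reused unchanged, so new data never influences its own normalization. The sketch below shows that idea on plain Python rows. `apply_prefitted` is a hypothetical helper, and both the positional alignment of `mean`/`std` with `feature_cols` and the ±3 clip are assumptions.

```python
def apply_prefitted(rows, feature_cols, mean, std, clip=3.0):
    """Normalize selected columns with parameters fitted elsewhere.

    rows: list of dicts (stand-in for a DataFrame).
    mean, std: sequences assumed to be aligned position by position
    with feature_cols.
    """
    if not (len(feature_cols) == len(mean) == len(std)):
        raise ValueError("feature_cols, mean and std must have the same length")
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for col, mu, sigma in zip(feature_cols, mean, std):
            z = (row[col] - mu) / sigma
            new_row[col] = max(-clip, min(clip, z))
        out.append(new_row)
    return out


rows = [
    {"instrument": "A", "KMID": 1.0, "KLEN": 10.0},
    {"instrument": "B", "KMID": 0.0, "KLEN": 20.0},
]
normed = apply_prefitted(rows, ["KMID", "KLEN"], mean=[0.0, 4.0], std=[0.5, 2.0])
```

The length check is worth keeping in any real version: a mismatch between the stored parameter arrays and the feature list would otherwise normalize columns with the wrong statistics without raising an error.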
Parameter Extraction
Extract parameters from Qlib's proc_list.proc:
    python stock_1d/d033/alpha158_beta/scripts/extract_qlib_params.py \
        --proc-list /path/to/proc_list.proc \
        --version my_version
Output structure:
    data/robust_zscore_params/{version}/
    ├── mean_train.npy    # Pre-fitted mean (330,)
    ├── std_train.npy     # Pre-fitted std (330,)
    └── metadata.json     # Feature columns and metadata
Pipeline Helper Functions
    from cta_1d.src.processors import create_processor_pipeline, get_final_feature_columns

    # Create pipeline from processor configs
    pipeline = create_processor_pipeline([
        {'type': 'Diff', 'columns': ['turnover', 'free_turnover']},
        {'type': 'RobustZScoreNorm', 'feature_cols': feature_cols},
        {'type': 'Fillna', 'value': 0},
    ])

    # Get final feature columns after industry neutralization
    final_cols = get_final_feature_columns(
        alpha158_cols=ALPHA158_COLS,
        market_ext_cols=MARKET_EXT_COLS,
    )
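A config-driven pipeline like the one above is commonly built with a registry that maps each `type` string to a processor class. The sketch below illustrates that pattern on plain lists of dicts; it is not the implementation of `create_processor_pipeline`. The names `REGISTRY`, `build_pipeline` and `run_pipeline`, and the simplified stand-in processors, are all hypothetical.

```python
class ColumnRemover:
    """Stand-in processor: drops the named columns from every row."""

    def __init__(self, columns):
        self.columns = set(columns)

    def process(self, rows):
        return [{k: v for k, v in row.items() if k not in self.columns}
                for row in rows]


class Fillna:
    """Stand-in processor: replaces None and NaN with a fixed value."""

    def __init__(self, value=0):
        self.value = value

    def process(self, rows):
        # `v != v` is true only for NaN.
        return [{k: (self.value if v is None or v != v else v)
                 for k, v in row.items()} for row in rows]


REGISTRY = {"ColumnRemover": ColumnRemover, "Fillna": Fillna}


def build_pipeline(configs):
    """Instantiate one processor per config dict, keyed by its 'type'."""
    pipeline = []
    for cfg in configs:
        params = dict(cfg)  # copy so the caller's config is not mutated
        cls = REGISTRY[params.pop("type")]
        pipeline.append(cls(**params))
    return pipeline


def run_pipeline(pipeline, rows):
    """Apply the processors in order, feeding each output to the next."""
    for processor in pipeline:
        rows = processor.process(rows)
    return rows


configs = [
    {"type": "ColumnRemover", "columns": ["tmp"]},
    {"type": "Fillna", "value": 0},
]
result = run_pipeline(build_pipeline(configs),
                      [{"a": float("nan"), "b": 1.0, "tmp": 5}])
```

Keeping the processor order in the config list makes the sequence explicit, which matters when one step (such as normalization) depends on columns produced or removed by an earlier one.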