# CTA 1-Day Return Prediction

Experiments for predicting 1-day returns of CTA (Commodity Trading Advisor) futures.

## Data

- **Features**: alpha158, hffactor
- **Labels**: Return indicators (o2c_twap1min, o2o_twap1min, etc.)
- **Normalization**: dual (blend of zscore, cs_zscore, rolling_20, rolling_60)

## Notebooks

| Notebook | Purpose |
|----------|---------|
| `01_data_check.ipynb` | Load and validate CTA data |
| `02_label_analysis.ipynb` | Explore label distributions and blending |
| `03_baseline_xgb.ipynb` | Train a baseline XGBoost model |
| `04_blend_comparison.ipynb` | Compare different normalization blends |

## Blend Configurations

The label blending combines four normalization methods:

- **zscore**: Z-score using the mean/std computed over the fit period
- **cs_zscore**: Cross-sectional z-score per datetime
- **rolling_20**: 20-day rolling window normalization
- **rolling_60**: 60-day rolling window normalization

Predefined weights (from `qshare.config.research.cta.labels`) are given in the order `[zscore, cs_zscore, rolling_20, rolling_60]`. Every set sums to 1:

- `equal`: [0.25, 0.25, 0.25, 0.25]
- `zscore_heavy`: [0.5, 0.2, 0.15, 0.15]
- `rolling_heavy`: [0.1, 0.1, 0.3, 0.5]
- `cs_heavy`: [0.2, 0.5, 0.15, 0.15]
- `short_term`: [0.1, 0.1, 0.4, 0.4]
- `long_term`: [0.4, 0.2, 0.2, 0.2]

Default: [0.2, 0.1, 0.3, 0.4]

## Processors Module

The `cta_1d.src.processors` module provides Polars-based data processors that replicate Qlib's preprocessing pipeline.

### Available Processors

| Processor | Description |
|-----------|-------------|
| `DiffProcessor` | Adds differenced features over a configurable period |
| `FlagMarketInjector` | Adds `market_0` and `market_1` columns based on instrument codes |
| `FlagSTInjector` | Creates an `IsST` column from ST flags |
| `ColumnRemover` | Removes the specified columns |
| `FlagToOnehot` | Converts one-hot industry flags into a single industry-index column |
| `IndusNtrlInjector` | Applies industry neutralization per datetime |
| `RobustZScoreNorm` | Robust z-score normalization using median/MAD |
| `Fillna` | Fills NaN values with a specified value |

### RobustZScoreNorm with Pre-fitted Parameters

The `RobustZScoreNorm` processor supports loading pre-fitted parameters from Qlib's `proc_list.proc`:

```python
import numpy as np

from cta_1d.src.processors import RobustZScoreNorm

# Method 1: Load from a saved version (recommended)
processor = RobustZScoreNorm.from_version("csiallx_feature2_ntrla_flag_pnlnorm")

# Method 2: Load with direct parameters
# Pre-fitted arrays as written by the extraction script (see "Output structure" below)
mean_array = np.load("data/robust_zscore_params/my_version/mean_train.npy")
std_array = np.load("data/robust_zscore_params/my_version/std_train.npy")
processor = RobustZScoreNorm(
    feature_cols=['KMID', 'KLEN', ...],  # full feature list elided; order must match the arrays
    use_qlib_params=True,
    qlib_mean=mean_array,
    qlib_std=std_array,
)

# Apply normalization (df is a Polars DataFrame containing the feature columns)
df = processor.process(df)
```

### Parameter Extraction

Extract parameters from Qlib's `proc_list.proc`:

```bash
python stock_1d/d033/alpha158_beta/scripts/extract_qlib_params.py \
    --proc-list /path/to/proc_list.proc \
    --version my_version
```

Output structure:

```
data/robust_zscore_params/{version}/
├── mean_train.npy      # Pre-fitted mean (330,)
├── std_train.npy       # Pre-fitted std (330,)
└── metadata.json       # Feature columns and metadata
```

### Pipeline Helper Functions

```python
from cta_1d.src.processors import create_processor_pipeline, get_final_feature_columns

# feature_cols, ALPHA158_COLS and MARKET_EXT_COLS are lists of column names defined elsewhere

# Create a pipeline from processor configs
pipeline = create_processor_pipeline([
    {'type': 'Diff', 'columns': ['turnover', 'free_turnover']},
    {'type': 'RobustZScoreNorm', 'feature_cols': feature_cols},
    {'type': 'Fillna', 'value': 0},
])

# Get the final feature columns after industry neutralization
final_cols = get_final_feature_columns(
    alpha158_cols=ALPHA158_COLS,
    market_ext_cols=MARKET_EXT_COLS,
)
```
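The median/MAD normalization behind `RobustZScoreNorm` can be illustrated with a small pure-Python sketch. It assumes the same convention as Qlib's `RobustZScoreNorm`, as we understand it:

- the center is the per-feature median;
- the scale is `(MAD + eps) * 1.4826`;
- outputs are clipped to [-3, 3].

The saved `mean_train.npy` / `std_train.npy` presumably hold these per-feature centers and scales. The helpers `fit_robust_zscore` and `apply_robust_zscore` are hypothetical and are not part of `cta_1d.src.processors`.

```python
import statistics

# Assumed constants, mirroring our understanding of Qlib's RobustZScoreNorm
MAD_TO_STD = 1.4826  # scales MAD to match the std of a normal distribution
EPS = 1e-12          # guards against division by zero for constant features


def fit_robust_zscore(columns):
    """Return {feature: (median, scaled MAD)} fitted on training data.

    columns: dict mapping feature name -> list of floats (NaNs assumed removed).
    """
    params = {}
    for name, values in columns.items():
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        params[name] = (med, (mad + EPS) * MAD_TO_STD)
    return params


def apply_robust_zscore(columns, params, clip=3.0):
    """Normalize with pre-fitted (center, scale) pairs, then clip to [-clip, clip]."""
    out = {}
    for name, values in columns.items():
        center, scale = params[name]
        out[name] = [max(-clip, min(clip, (v - center) / scale)) for v in values]
    return out


train = {"KMID": [1.0, 2.0, 3.0, 4.0, 100.0]}
params = fit_robust_zscore(train)  # median 3.0, MAD 1.0, so the outlier 100.0 barely moves the fit
print(apply_robust_zscore({"KMID": [3.0, 100.0, -100.0]}, params))  # → {'KMID': [0.0, 3.0, -3.0]}
```

Using the median and MAD instead of the mean and std keeps a single extreme value from distorting the fitted scale. The clip then bounds any remaining outliers at transform time.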
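The label blending from Blend Configurations can be sketched as a weighted sum of the four normalized label variants. This assumes the blend is linear and uses the order `[zscore, cs_zscore, rolling_20, rolling_60]`. `PRESETS` and `blend_labels` are hypothetical names that copy the documented weight sets; they are not the `qshare` API.

```python
import math

# Hypothetical copy of the documented presets.
# Order: [zscore, cs_zscore, rolling_20, rolling_60]
PRESETS = {
    "equal": [0.25, 0.25, 0.25, 0.25],
    "zscore_heavy": [0.5, 0.2, 0.15, 0.15],
    "rolling_heavy": [0.1, 0.1, 0.3, 0.5],
    "cs_heavy": [0.2, 0.5, 0.15, 0.15],
    "short_term": [0.1, 0.1, 0.4, 0.4],
    "long_term": [0.4, 0.2, 0.2, 0.2],
    "default": [0.2, 0.1, 0.3, 0.4],
}


def blend_labels(components, weights):
    """Blend normalized label series as a weighted sum (assumed blending rule).

    components: equal-length lists of floats, one per normalization method.
    weights: one weight per component, in the same order; must sum to 1.
    """
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per normalization component")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    n = len(components[0])
    if any(len(c) != n for c in components):
        raise ValueError("all components must have the same length")
    # Element-wise: blended[i] = sum_k weights[k] * components[k][i]
    return [sum(w * c[i] for w, c in zip(weights, components)) for i in range(n)]


zscore, cs_zscore = [1.0, -0.5], [2.0, 0.0]
rolling_20, rolling_60 = [3.0, 0.5], [4.0, 1.0]
print(blend_labels([zscore, cs_zscore, rolling_20, rolling_60], PRESETS["equal"]))  # → [2.5, 0.25]
```

Requiring the weights to sum to 1 keeps the blended label on roughly the same scale as its z-scored inputs. This makes blends such as `short_term` and `long_term` directly comparable.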