- Remove split_at_end parameter from pipeline.transform(); it now always returns a DataFrame
- Add pack_struct parameter to pack feature groups into struct columns
- Rename exporters: select_feature_groups_from_df -> get_groups, select_feature_groups -> get_groups_from_fg
- Add pack_structs() and unpack_struct() helper functions
- Remove split_from_merged() method from FeatureGroups (no longer needed)
- Rename dump_polars_dataset.py to dump_features.py with --pack-struct flag
- Update README with new CLI usage and struct column documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add extract_qlib_params.py script to extract pre-fitted mean/std parameters
  from Qlib's proc_list.proc and save them as reusable .npy files with metadata.json
- Add RobustZScoreNorm.from_version() class method to load saved parameters
  by version name, so that multiple parameter versions can coexist
- Update dump_polars_dataset.py to use from_version() instead of loading
  parameters directly from proc_list.proc
- Update generate_beta_embedding.py to use qshare's filter_instruments for
  stock universe filtering
- Save parameters to data/robust_zscore_params/csiallx_feature2_ntrla_flag_pnlnorm/
  with 330 features (158 alpha158_ntrl + 158 alpha158_raw + 7 market_ext_ntrl + 7 market_ext_raw)
- Update README.md with documentation for parameter extraction and usage
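A rough sketch of loading versioned parameters by name, as the from_version() item above describes. The class here is a stand-in, not the repo's or Qlib's implementation; the file names mean.npy and std.npy, the n_features metadata key, and the clip to ±3 are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


class RobustZScoreNorm:
    """Stand-in normalizer that applies pre-fitted center/scale parameters."""

    def __init__(self, center: np.ndarray, scale: np.ndarray, clip: float = 3.0):
        self.center = center
        self.scale = scale
        self.clip = clip  # assumed outlier clip; check against the real processor

    @classmethod
    def from_version(cls, version: str, root: Path) -> "RobustZScoreNorm":
        """Load parameters from <root>/<version>/ (assumed directory layout)."""
        param_dir = Path(root) / version
        meta = json.loads((param_dir / "metadata.json").read_text())
        center = np.load(param_dir / "mean.npy")
        scale = np.load(param_dir / "std.npy")
        # Guard against mixing parameter files from different versions.
        if center.shape != (meta["n_features"],) or scale.shape != center.shape:
            raise ValueError(f"parameter shape mismatch for version {version!r}")
        return cls(center, scale)

    def transform(self, x: np.ndarray) -> np.ndarray:
        """Normalize column-wise, then clip to [-clip, clip]."""
        return np.clip((x - self.center) / self.scale, -self.clip, self.clip)


# Demo with a throwaway 3-feature version directory.
root = Path(tempfile.mkdtemp())
version_dir = root / "demo_v1"
version_dir.mkdir()
np.save(version_dir / "mean.npy", np.array([0.0, 1.0, 2.0]))
np.save(version_dir / "std.npy", np.array([1.0, 2.0, 4.0]))
(version_dir / "metadata.json").write_text(json.dumps({"n_features": 3}))

norm = RobustZScoreNorm.from_version("demo_v1", root)
out = norm.transform(np.array([[1.0, 1.0, 100.0]]))
```

Keeping each version in its own directory lets several parameter sets sit side by side, and the shape check fails early if the metadata and arrays disagree.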
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major changes:
- Fix FixedFlagMarketInjector to add market_0, market_1 columns based on instrument codes
- Fix FixedFlagSTInjector to create IsST column from ST_S, ST_Y flags
- Update generate_beta_embedding.py to handle IsST creation conditionally
- Add dump_polars_dataset.py for generating raw and processed datasets
- Add debug_data_divergence.py for comparing gold-standard vs polars output
Documentation:
- Update BUG_ANALYSIS_FINAL.md with IsST column issue discovery
- Update README.md with polars dataset generation instructions
Key discovery:
- The FlagSTInjector in the gold-standard qlib code fails silently
- The VAE was therefore trained without the IsST column (341 features, not 342)
- The polars pipeline correctly skips FlagSTInjector to match the gold-standard output
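The kind of check that surfaces a discrepancy like the missing IsST column can be sketched as below. This is not the contents of debug_data_divergence.py; it is an illustrative comparison over plain column dicts (for example, what a polars DataFrame yields via to_dict(as_series=False)), assuming both tables have their rows in the same order. The function name and report keys are hypothetical.

```python
def compare_columns(
    gold: dict[str, list[float]],
    candidate: dict[str, list[float]],
    atol: float = 1e-6,
) -> dict:
    """Report columns present on one side only and columns whose values differ."""
    gold_cols, cand_cols = set(gold), set(candidate)
    report = {
        "only_in_gold": sorted(gold_cols - cand_cols),
        "only_in_candidate": sorted(cand_cols - gold_cols),
        "mismatched": {},  # column name -> largest absolute difference
    }
    for name in sorted(gold_cols & cand_cols):
        a, b = gold[name], candidate[name]
        if len(a) != len(b):
            # A row-count mismatch is recorded as an infinite difference.
            report["mismatched"][name] = float("inf")
            continue
        worst = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
        if worst > atol:
            report["mismatched"][name] = worst
    return report


# Toy example: the candidate has an extra IsST column and one diverging value.
gold = {"f1": [1.0, 2.0], "f2": [0.5, 0.5]}
candidate = {"f1": [1.0, 2.5], "f2": [0.5, 0.5], "IsST": [0.0, 1.0]}
report = compare_columns(gold, candidate)
```

Checking the column sets before the values makes a feature-count mismatch (such as 341 vs 342) visible immediately, without it being buried in value-level noise.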
Generated dataset structure (2026-02-23 to 2026-02-27):
- Raw data: 18,291 rows × 204 columns
- Processed data: 18,291 rows × 342 columns (341 for VAE input)
- market_0, market_1 columns correctly added to feature_flag group
- Add .claudeignore and .clauderc for Claude Code setup
- Add config.yaml for cta_1d, stock_15m, and alpha158_beta tasks
- Add alpha158_beta pipeline.py with documentation
- Add utility scripts for embedding generation and prediction
- Add executed baseline notebook for cta_1d
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>