├── utils ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-312.pyc │ ├── data_loader.cpython-312.pyc │ └── performance.cpython-312.pyc ├── data_loader.py └── performance.py ├── factors ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-312.pyc │ ├── factor_scoring.cpython-312.pyc │ ├── timing_signal.cpython-312.pyc │ ├── factor_analysis.cpython-312.pyc │ ├── financial_factors.cpython-312.pyc │ ├── sentiment_factors.cpython-312.pyc │ └── technical_factors.cpython-312.pyc ├── sentiment_factors.py ├── financial_factors.py ├── timing_signal.py ├── technical_factors.py ├── factor_scoring.py └── factor_analysis.py ├── output ├── ic_summary.csv ├── positions.csv ├── portfolio_value.csv └── return_statistics.csv ├── strategy ├── __pycache__ │ ├── backtest.cpython-312.pyc │ ├── timing_signal.cpython-312.pyc │ └── stock_selection.cpython-312.pyc ├── stock_selection.py ├── timing_signal.py └── backtest.py ├── visualization ├── __pycache__ │ ├── ic_plot.cpython-312.pyc │ ├── plot_results.cpython-312.pyc │ └── backtest_vs_real.cpython-312.pyc ├── ic_plot.py ├── backtest_vs_real.py └── plot_results.py ├── factor_graveyard ├── retired_factors_log.txt └── bias_60.py ├── config.py ├── requirements.txt ├── LICENSE ├── main.py └── README.md /utils/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /factors/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /output/ ic_summary.csv: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /output/positions.csv: -------------------------------------------------------------------------------- 1 | 
-------------------------------------------------------------------------------- /output/portfolio_value.csv: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /output/ return_statistics.csv: -------------------------------------------------------------------------------- 1 | 
-------------------------------------------------------------------------------- /factor_graveyard/retired_factors_log.txt: -------------------------------------------------------------------------------- 1 | touch factor_graveyard/retired_factors_log.txt 2 | 2025-03-06 15:45:32 - bias_60 retired due to 3 consecutive months ICIR < 0.3 3 | 2025-04-06 15:46:12 - volatility_30 retired due to 3 consecutive months ICIR < 0.3 -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | # config.py 2 | TUSHARE_TOKEN = 'e89dfa8dd6d20936bb00c60980fa1eb527d874b59ea073b2c4ff0128' 3 | 4 | START_DATE = '2023-01-01' 5 | END_DATE = '2023-12-31' 6 | TOP_N = 20 7 | BENCHMARK_INDEX = '000300.SH' 8 | 9 | # Added: lookback windows N for technical factor calculation 10 | TECHNICAL_FACTOR_WINDOWS = [5, 20, 60, 120, 250] # adjustable; controlled in one place -------------------------------------------------------------------------------- /factor_graveyard/bias_60.py: -------------------------------------------------------------------------------- 1 | # factor_graveyard/bias_60.py 2 | """ 3 | Bias_60 factor calculation 4 | Rule: deviation of the current price from its 60-day moving average 5 | """ 6 | 7 | import pandas as pd 8 | 9 | def calculate_bias_60(stock_data): 10 | """ 11 | Compute the 60-day price bias 12 | :param stock_data: historical market data for a single stock, with ts_code, trade_date and close columns 13 | :return: DataFrame with the ts_code, trade_date and bias_60 columns 14 | """ 15 | stock_data['ma_60'] = stock_data['close'].rolling(window=60).mean() 16 | stock_data['bias_60'] = (stock_data['close'] - stock_data['ma_60']) / stock_data['ma_60'] 17 | return stock_data[['ts_code', 'trade_date', 'bias_60']] -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | beautifulsoup4==4.13.3 2 | bs4==0.0.2 3 | certifi==2025.1.31 4 | charset-normalizer==3.4.1 5 | contourpy==1.3.1 6 | cycler==0.12.1 7 | fonttools==4.55.5 8 | idna==3.10 9 | joblib==1.4.2 10 | kiwisolver==1.4.8 
11 | lightgbm==4.6.0 12 | lxml==5.3.1 13 | matplotlib==3.10.0 14 | numpy==2.2.2 15 | packaging==24.2 16 | pandas==2.2.3 17 | pillow==11.1.0 18 | pyparsing==3.2.1 19 | python-dateutil==2.9.0.post0 20 | pytz==2024.2 21 | requests==2.32.3 22 | scikit-learn==1.6.1 23 | scipy==1.15.1 24 | seaborn==0.13.2 25 | simplejson==3.20.1 26 | six==1.17.0 27 | soupsieve==2.6 28 | threadpoolctl==3.5.0 29 | tqdm==4.67.1 30 | tushare==1.4.19 31 | typing_extensions==4.12.2 32 | tzdata==2025.1 33 | urllib3==2.3.0 34 | websocket-client==1.8.0 35 | -------------------------------------------------------------------------------- /visualization/ic_plot.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import matplotlib.pyplot as plt 3 | 4 | def plot_ic_time_series(ic_df, selected_factors=None, output_file='output/ic_time_series.png'): 5 | """ 6 | Plot the IC time series of individual factors 7 | 8 | :param ic_df: daily IC data (index = date, columns = factor names) 9 | :param selected_factors: list of factors to plot (None plots all factors) 10 | :param output_file: path to save the figure 11 | """ 12 | plt.figure(figsize=(12, 6)) 13 | 14 | factors = selected_factors if selected_factors else ic_df.columns.tolist() 15 | 16 | for factor in factors: 17 | plt.plot(ic_df.index, ic_df[factor], label=factor) 18 | 19 | plt.axhline(0, color='gray', linestyle='--', linewidth=0.8) 20 | plt.legend() 21 | plt.title('Factor IC Time Series') 22 | plt.xlabel('Date') 23 | plt.ylabel('IC') 24 | plt.grid(True) 25 | 26 | plt.savefig(output_file) 27 | plt.close() 28 | 29 | print(f"✅ Factor IC time series plot saved to {output_file}") -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 1984tkr 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the 
rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /visualization/backtest_vs_real.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import matplotlib.pyplot as plt 3 | 4 | def plot_backtest_vs_market(portfolio_value, market_data, timing_signals, output_file='output/backtest_vs_market.png'): 5 | """ 6 | Plot the backtest NAV against the market index NAV, with the timing signal overlaid 7 | :param portfolio_value: backtest portfolio NAV (trade_date, portfolio_value) 8 | :param market_data: market data (trade_date, ts_code='000001.SH', close column) 9 | :param timing_signals: timing signals (trade_date, final_signal) 10 | """ 11 | 12 | # Extract the market benchmark index (here the SSE Composite Index) 13 | index_data = market_data[market_data['ts_code'] == '000001.SH'][['trade_date', 'close']].copy() 14 | index_data['index_return'] = index_data['close'].pct_change().fillna(0) 15 | index_data['index_nav'] = (1 + index_data['index_return']).cumprod() 16 | 17 | # Merge the data 18 | merged = portfolio_value.merge(index_data, on='trade_date', how='inner') 19 | merged = merged.merge(timing_signals, on='trade_date', how='left') 20 | 21 | # Plot 22 | fig, ax1 = plt.subplots(figsize=(12, 
6)) 23 | 24 | ax1.plot(merged['trade_date'], merged['portfolio_value'], label='Strategy NAV', color='blue') 25 | ax1.plot(merged['trade_date'], merged['index_nav'], label='SSE Composite', color='gray') 26 | ax1.set_ylabel('NAV') 27 | ax1.legend(loc='upper left') 28 | ax1.set_title('Strategy NAV vs SSE Composite vs Timing Signal') 29 | 30 | # Overlay the timing signal 31 | ax2 = ax1.twinx() 32 | ax2.plot(merged['trade_date'], merged['final_signal'], label='Timing signal', color='red', linestyle='--', alpha=0.6) 33 | ax2.set_ylabel('Timing signal (1 = long, 0 = short)') 34 | ax2.set_ylim(-0.1, 1.1) 35 | ax2.legend(loc='upper right') 36 | 37 | plt.grid(True) 38 | plt.savefig(output_file) 39 | plt.close() 40 | 41 | print(f"✅ Backtest vs market vs timing signal plot saved to {output_file}") -------------------------------------------------------------------------------- /factors/sentiment_factors.py: -------------------------------------------------------------------------------- 1 | # factors/sentiment_factors.py 2 | # Computes typical sentiment factors (single-stock and market-wide sentiment) 3 | 4 | import pandas as pd 5 | 6 | def calculate_sentiment_factors(stock_data, market_data): 7 | """ 8 | Compute sentiment factors, including turnover rate, consecutive limit-ups and market heat 9 | :param stock_data: per-stock market data (including limit-up/limit-down information) 10 | :param market_data: market-wide data (used for the overall sentiment factor) 11 | :return: DataFrame containing the sentiment factors 12 | """ 13 | df = stock_data.copy() 14 | 15 | # 1. Turnover rate factor 16 | if 'float_share' in df.columns and 'vol' in df.columns: 17 | df['turnover_rate'] = df['vol'] / df['float_share'] 18 | else: 19 | df['turnover_rate'] = None # left missing when float-share data is unavailable 20 | 21 | # 2. Limit-up / limit-down flags 22 | df['is_limit_up'] = (df['pct_chg'] >= 9.9).astype(int) 23 | df['is_limit_down'] = (df['pct_chg'] <= -9.9).astype(int) 24 | 25 | # 3. Consecutive limit-up count (computed per stock) 26 | df['consecutive_limit_up'] = 0 27 | for ts_code, stock_group in df.groupby('ts_code'): 28 | stock_group = stock_group.sort_values('trade_date') 29 | consecutive = 0 30 | for i, row in stock_group.iterrows(): 31 | if row['is_limit_up'] == 1: 32 | consecutive += 1 33 | else: 34 | consecutive = 0 35 | df.at[i, 'consecutive_limit_up'] = consecutive 36 | 37 | # 4. 
Overall market heat (ratio of limit-up count to limit-down count) 38 | if 'is_limit_up' in market_data.columns and 'is_limit_down' in market_data.columns: 39 | daily_stats = market_data.groupby('trade_date').agg( 40 | limit_up_count=('is_limit_up', 'sum'), 41 | limit_down_count=('is_limit_down', 'sum') 42 | ) 43 | daily_stats['market_heat'] = daily_stats['limit_up_count'] / (daily_stats['limit_down_count'] + 1) 44 | else: 45 | daily_stats = pd.DataFrame(index=market_data['trade_date'].unique()) 46 | daily_stats['market_heat'] = None 47 | 48 | # Merge the market heat into the per-stock data 49 | df = df.merge(daily_stats[['market_heat']], on='trade_date', how='left') 50 | 51 | # 5. Placeholders for future factors (e.g. net buying on the top-traders list, news sentiment score) 52 | df['net_buy_lhb'] = None # example placeholder 53 | df['sentiment_score'] = None # example placeholder, NLP sentiment score 54 | 55 | return df[['ts_code', 'trade_date', 'turnover_rate', 'consecutive_limit_up', 'market_heat', 'net_buy_lhb', 'sentiment_score']] -------------------------------------------------------------------------------- /factors/financial_factors.py: -------------------------------------------------------------------------------- 1 | # financial_factors.py 2 | # Financial factor module: computes core financial metrics for each stock 3 | 4 | import numpy as np 5 | import pandas as pd 6 | 7 | def calculate_financial_factors(df): 8 | """ 9 | Compute a range of financial factors from financial data 10 | These factors can serve as an important part of a multi-factor stock selection system 11 | """ 12 | 13 | # Growth factors 14 | df['revenue_growth'] = df['revenue'].pct_change(4) # year-over-year growth (4 quarters back) 15 | df['profit_growth'] = df['net_profit'].pct_change(4) 16 | 17 | # Valuation factors (assumes market cap was computed and added externally) 18 | df['pe'] = df['market_cap'] / df['net_profit'] 19 | df['pb'] = df['market_cap'] / df['net_asset'] 20 | df['ev_ebitda'] = (df['market_cap'] + df['total_liability'] - df['cash']) / df['ebitda'] 21 | 22 | # Financial health factors 23 | df['debt_to_asset'] = df['total_liability'] / df['net_asset'] 24 | df['cash_ratio'] = df['cash'] / df['total_liability'] 25 | 26 | # Quality factor 27 | df['roe'] = df['net_profit'] / df['net_asset'] 28 | 29 | # Special handling to keep extreme values from distorting results 30 | for col in ['pe', 'pb', 'ev_ebitda']: 31 | df[col] = np.clip(df[col], 0, np.percentile(df[col].dropna(), 95)) # clip extreme values 
32 | 33 | return df 34 | 35 | # factors/financial_factors.py (note: the definition below overrides the calculate_financial_factors defined above) 36 | """ 37 | Financial factor module 38 | Derives profitability, valuation and related factors from financial data 39 | """ 40 | 41 | def calculate_financial_factors(all_data): 42 | """ 43 | Compute financial factors 44 | :param all_data: merged market + financial data (with trade_date, ts_code, pe_ttm, roe and similar columns) 45 | :return: all_data with the financial factor columns added 46 | """ 47 | 48 | # Valuation factors (Series.ffill() replaces the deprecated fillna(method='ffill')) 49 | all_data['pe_ttm'] = all_data['pe_ttm'].replace([None, 0], pd.NA).ffill() 50 | all_data['pb'] = all_data['pb'].replace([None, 0], pd.NA).ffill() 51 | all_data['ps_ttm'] = all_data['ps_ttm'].replace([None, 0], pd.NA).ffill() 52 | 53 | # Profitability factors 54 | all_data['roe_ttm'] = all_data['roe'].replace([None], pd.NA).ffill() 55 | all_data['gross_profit_margin'] = all_data['grossprofit_margin'].replace([None], pd.NA).ffill() 56 | 57 | # Leverage factor 58 | all_data['debt_asset_ratio'] = all_data['debt_to_assets'].replace([None], pd.NA).ffill() 59 | 60 | # Growth factors 61 | all_data['revenue_growth'] = all_data['revenue_yoy'].replace([None], pd.NA).ffill() 62 | all_data['net_profit_growth'] = all_data['netprofit_yoy'].replace([None], pd.NA).ffill() 63 | 64 | return all_data -------------------------------------------------------------------------------- /strategy/stock_selection.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import os 4 | 5 | def standardize_and_score(group, factor_weights): 6 | """ 7 | Standardize one period of factor data (z-score) and compute the composite score 8 | """ 9 | for factor in factor_weights: 10 | group[f'{factor}_z'] = (group[factor] - group[factor].mean()) / (group[factor].std() + 1e-8) 11 | group['composite_score'] = sum(group[f'{factor}_z'] * weight for factor, weight in factor_weights.items()) 12 | return group 13 | 14 | def select_stocks_by_score(group, top_n=50): 15 | """ 16 | Select the top N stocks by composite score and assign equal weights (score weighting is a possible alternative) 17 | """ 18 | group = group.sort_values(by='composite_score', 
ascending=False).head(top_n) 19 | group['weight'] = 1 / len(group) # equal weighting; can be changed to score weighting 20 | return group[['trade_date', 'ts_code', 'weight']] 21 | 22 | def construct_positions(factor_data, factor_weights, top_n=50, output_dir='output'): 23 | """ 24 | Full pipeline: standardize factors -> composite score -> select stocks -> compute weights -> save positions file 25 | """ 26 | if not os.path.exists(output_dir): 27 | os.makedirs(output_dir) 28 | 29 | # Standardize and score period by period (group_keys=False keeps trade_date out of the index so the next groupby is unambiguous) 30 | factor_data = factor_data.groupby('trade_date', group_keys=False).apply(lambda x: standardize_and_score(x, factor_weights)) 31 | 32 | # Select stocks and build positions period by period 33 | positions = factor_data.groupby('trade_date', group_keys=False).apply(lambda x: select_stocks_by_score(x, top_n)) 34 | positions = positions.reset_index(drop=True) 35 | 36 | # Save to positions.csv 37 | positions.to_csv(os.path.join(output_dir, 'positions.csv'), index=False) 38 | 39 | return positions 40 | 41 | if __name__ == "__main__": 42 | # Sample factor data (replace with real data for live use) 43 | data = { 44 | 'trade_date': ['2024-03-01'] * 5 + ['2024-03-02'] * 5, 45 | 'ts_code': ['000001.SZ', '600519.SH', '002230.SZ', '000858.SZ', '300750.SZ'] * 2, 46 | 'pe_ttm': [12, 25, 30, 15, 40, 12, 24, 29, 16, 38], 47 | 'roe': [0.15, 0.22, 0.18, 0.21, 0.13, 0.14, 0.21, 0.17, 0.20, 0.12], 48 | 'momentum_20': [0.05, 0.08, 0.12, 0.03, 0.10, 0.04, 0.07, 0.11, 0.02, 0.09], 49 | 'volatility_60': [0.20, 0.18, 0.25, 0.22, 0.30, 0.21, 0.19, 0.24, 0.23, 0.28], 50 | 'sentiment_score': [0.60, 0.72, 0.55, 0.70, 0.65, 0.61, 0.71, 0.57, 0.68, 0.66] 51 | } 52 | factor_data = pd.DataFrame(data) 53 | 54 | # Factor weights (the sign gives the direction of each factor's contribution to the score) 55 | factor_weights = { 56 | 'pe_ttm': -0.2, # lower valuation is better 57 | 'roe': 0.3, # higher profitability is better 58 | 'momentum_20': 0.3, # stronger trend is better 59 | 'volatility_60': -0.1, # lower volatility is better 60 | 'sentiment_score': 0.2 # stronger sentiment is better 61 | } 62 | 63 | # Build positions and save 64 | positions = construct_positions(factor_data, factor_weights, top_n=3) 65 | 66 | print("✅ Positions built and saved to output/positions.csv") 67 | print(positions) -------------------------------------------------------------------------------- /utils/data_loader.py: 
-------------------------------------------------------------------------------- 1 | import tushare as ts 2 | import pandas as pd 3 | import time 4 | import requests 5 | from config import TUSHARE_TOKEN 6 | 7 | # Set the Tushare token 8 | ts.set_token(TUSHARE_TOKEN) 9 | pro = ts.pro_api() 10 | 11 | # Fetch the full-market stock list (with retries) 12 | def get_stock_list_with_retry(max_retries=5): 13 | """ 14 | Fetch the full-market stock list with a retry mechanism, to guard against failures caused by Tushare timeouts or rate limiting 15 | """ 16 | for attempt in range(max_retries): 17 | try: 18 | print(f"Fetching stock list, attempt {attempt+1}/{max_retries}...") 19 | stock_list = pro.stock_basic(exchange='', list_status='L')['ts_code'].tolist() 20 | print(f"Fetched stock list: {len(stock_list)} stocks") 21 | return stock_list 22 | except requests.exceptions.RequestException as e: 23 | print(f"Failed to fetch stock list, retry {attempt+1}/{max_retries}, error: {e}") 24 | time.sleep(5) # wait 5 seconds before each retry 25 | raise Exception("Failed to fetch the stock list after multiple retries") 26 | 27 | # Fetch market data in batches 28 | def load_market_data(start_date='20230101', end_date='20240306', batch_size=50): 29 | """ 30 | Fetch market data in batches of at most batch_size stocks, pausing 5 seconds between batches to avoid Tushare rate limiting 31 | """ 32 | stock_list = get_stock_list_with_retry() 33 | 34 | all_data = [] 35 | for i in range(0, len(stock_list), batch_size): 36 | batch = stock_list[i:i+batch_size] 37 | print(f"Fetching market data batch {i//batch_size + 1} ({len(batch)} stocks)...") 38 | for ts_code in batch: 39 | try: 40 | df = pro.daily(ts_code=ts_code, start_date=start_date, end_date=end_date) 41 | all_data.append(df) 42 | except Exception as e: 43 | print(f"Failed to fetch market data for {ts_code}, skipping. Error: {e}") 44 | time.sleep(5) # pause 5 seconds between batches to reduce the risk of rate limiting 45 | 46 | market_data = pd.concat(all_data, ignore_index=True) 47 | market_data['trade_date'] = pd.to_datetime(market_data['trade_date']) 48 | 49 | return market_data 50 | 51 | # Fetch financial data in batches 52 | def load_financial_data(start_date='20230101', end_date='20240306', batch_size=50): 53 | """ 54 | Fetch financial data in batches of at most batch_size stocks, pausing 5 seconds between batches to avoid Tushare rate limiting 55 | """ 56 | stock_list = get_stock_list_with_retry() 57 | 58 | all_data = [] 59 | for i in range(0, len(stock_list), 
batch_size): 60 | batch = stock_list[i:i+batch_size] 61 | print(f"Fetching financial data batch {i//batch_size + 1} ({len(batch)} stocks)...") 62 | for ts_code in batch: 63 | try: 64 | df = pro.fina_indicator(ts_code=ts_code, start_date=start_date, end_date=end_date) 65 | all_data.append(df) 66 | except Exception as e: 67 | print(f"Failed to fetch financial data for {ts_code}, skipping. Error: {e}") 68 | time.sleep(5) # pause 5 seconds between batches to reduce the risk of rate limiting 69 | 70 | financial_data = pd.concat(all_data, ignore_index=True) 71 | financial_data['trade_date'] = pd.to_datetime(financial_data['ann_date']) # assumption: fina_indicator returns ann_date/end_date rather than trade_date, so the announcement date is used as the merge key 72 | 73 | return financial_data -------------------------------------------------------------------------------- /factors/timing_signal.py: -------------------------------------------------------------------------------- 1 | # timing_signal.py 2 | # Timing signal module for the multi-factor stock selection strategy (with weighted timing) 3 | 4 | import pandas as pd 5 | import numpy as np 6 | 7 | def calculate_ma_timing_signal(index_df, short_window=20, long_window=60): 8 | """ 9 | Simple moving-average timing signal: bullish when the short MA is above the long MA, bearish otherwise 10 | """ 11 | index_df['ma_short'] = index_df['close'].rolling(short_window).mean() 12 | index_df['ma_long'] = index_df['close'].rolling(long_window).mean() 13 | index_df['ma_signal'] = np.where(index_df['ma_short'] > index_df['ma_long'], 1, 0) 14 | return index_df[['ma_signal']] 15 | 16 | def calculate_breadth_timing_signal(stock_universe_df, date_col='trade_date'): 17 | """ 18 | Market breadth timing signal: share of stocks rising each day 19 | """ 20 | stock_universe_df['up'] = stock_universe_df['pct_chg'] > 0 21 | breadth_df = stock_universe_df.groupby(date_col)['up'].mean() 22 | breadth_signal = np.where(breadth_df > 0.6, 1, np.where(breadth_df < 0.4, 0, np.nan)) 23 | return pd.DataFrame(breadth_signal, index=breadth_df.index, columns=['breadth_signal']) 24 | 25 | def calculate_momentum_timing_signal(index_df, window=20): 26 | """ 27 | Index momentum timing signal: bullish when the return over the last N days is above 0, bearish otherwise 28 | """ 29 | index_df['momentum'] = index_df['close'].pct_change(window) # N-day return, matching the docstring 30 | index_df['momentum_signal'] = np.where(index_df['momentum'] > 0, 1, 0) 31 | return 
index_df[['momentum_signal']] 32 | 33 | def calculate_weighted_timing_signal(index_df, stock_universe_df, weights=None, long_threshold=0.6, short_threshold=0.4): 34 | """ 35 | Weighted timing signal: 36 | - combine the three timing signals by weight 37 | - go long when the weighted score is at or above long_threshold, stay in cash when it is at or below short_threshold 38 | """ 39 | if weights is None: 40 | # Default weights (can be tuned based on historical backtest results) 41 | weights = { 42 | 'ma': 0.4, 43 | 'breadth': 0.3, 44 | 'momentum': 0.3 45 | } 46 | 47 | # Compute each individual signal 48 | ma_signal = calculate_ma_timing_signal(index_df)['ma_signal'] 49 | breadth_signal = calculate_breadth_timing_signal(stock_universe_df)['breadth_signal'] 50 | momentum_signal = calculate_momentum_timing_signal(index_df)['momentum_signal'] 51 | 52 | # Combine the signals 53 | combined_signal = pd.concat([ma_signal, breadth_signal, momentum_signal], axis=1) 54 | combined_signal.columns = ['ma_signal', 'breadth_signal', 'momentum_signal'] 55 | 56 | # Compute the weighted score 57 | combined_signal['weighted_score'] = ( 58 | combined_signal['ma_signal'] * weights['ma'] + 59 | combined_signal['breadth_signal'] * weights['breadth'] + 60 | combined_signal['momentum_signal'] * weights['momentum'] 61 | ) 62 | 63 | # Derive the final signal from the weighted score 64 | combined_signal['final_signal'] = np.where( 65 | combined_signal['weighted_score'] >= long_threshold, 1, 66 | np.where(combined_signal['weighted_score'] <= short_threshold, 0, np.nan) 67 | ) 68 | 69 | return combined_signal[['weighted_score', 'final_signal']] -------------------------------------------------------------------------------- /strategy/timing_signal.py: -------------------------------------------------------------------------------- 1 | # strategy/timing_signal.py 2 | """ 3 | Market timing signal module 4 | Includes moving-average timing, market breadth and volume trend signals 5 | """ 6 | 7 | import pandas as pd 8 | import numpy as np 9 | 10 | def calculate_moving_average_signals(market_data): 11 | """ 12 | Classify the market as bullish or bearish from index moving averages 13 | - MA20 > MA60: bull market 14 | - MA20 < MA60: bear market 15 | """ 16 | index_data = market_data[market_data['ts_code'] == '000001.SH'].copy() # SSE Composite Index; copy to avoid writing into a slice of market_data 17 | index_data['ma20'] = index_data['close'].rolling(20).mean() 
18 | index_data['ma60'] = index_data['close'].rolling(60).mean() 19 | index_data['ma_signal'] = np.where(index_data['ma20'] > index_data['ma60'], 1, 0) 20 | return index_data[['trade_date', 'ma_signal']] 21 | 22 | def calculate_market_breadth_signals(market_data): 23 | """ 24 | Market breadth indicator: 25 | - share of rising stocks > 60%: bull market 26 | - share of rising stocks < 40%: bear market 27 | """ 28 | def daily_breadth(group): 29 | up_stocks = (group['pct_chg'] > 0).sum() 30 | total_stocks = len(group) 31 | return up_stocks / total_stocks 32 | 33 | breadth_df = market_data.groupby('trade_date').apply(daily_breadth).reset_index() 34 | breadth_df.columns = ['trade_date', 'up_ratio'] 35 | breadth_df['breadth_signal'] = np.where(breadth_df['up_ratio'] > 0.6, 1, 36 | np.where(breadth_df['up_ratio'] < 0.4, 0, np.nan)) 37 | 38 | return breadth_df[['trade_date', 'breadth_signal']] 39 | 40 | def calculate_volume_trend_signals(market_data): 41 | """ 42 | Overall market volume trend 43 | - 5-day volume MA > 20-day volume MA: rising volume 44 | - otherwise: shrinking volume 45 | """ 46 | index_data = market_data[market_data['ts_code'] == '000001.SH'].copy() # copy to avoid writing into a slice of market_data 47 | index_data['vol_ma5'] = index_data['vol'].rolling(5).mean() 48 | index_data['vol_ma20'] = index_data['vol'].rolling(20).mean() 49 | index_data['volume_signal'] = np.where(index_data['vol_ma5'] > index_data['vol_ma20'], 1, 0) 50 | return index_data[['trade_date', 'volume_signal']] 51 | 52 | def generate_combined_timing_signal(market_data): 53 | """ 54 | Combine several timing signals into the final timing signal 55 | Signal weights can be adjusted based on strategy experience 56 | """ 57 | ma_signals = calculate_moving_average_signals(market_data) 58 | breadth_signals = calculate_market_breadth_signals(market_data) 59 | volume_signals = calculate_volume_trend_signals(market_data) 60 | 61 | combined = ma_signals.merge(breadth_signals, on='trade_date', how='left') 62 | combined = combined.merge(volume_signals, on='trade_date', how='left') 63 | 64 | # Simple signal vote 65 | combined['timing_signal'] = combined[['ma_signal', 'breadth_signal', 'volume_signal']].mean(axis=1) 66 | 67 | # Signal >= 0.66 (two or more signals bullish): treat as a long signal 68 | # Signal of roughly one third or less (two or more signals bearish): treat as a short signal 69 
| combined['final_signal'] = np.where(combined['timing_signal'] >= 0.66, 1, 70 | np.where(combined['timing_signal'] <= 0.34, 0, np.nan)) # 0.34 so that a mean of 1/3 (one bullish, two bearish) counts as bearish 71 | 72 | return combined[['trade_date', 'final_signal']] -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | from utils.data_loader import load_market_data, load_financial_data 3 | from factors.financial_factors import calculate_financial_factors 4 | from factors.technical_factors import calculate_technical_factors 5 | from factors.factor_analysis import evaluate_and_filter_factors 6 | from strategy.stock_selection import construct_positions 7 | from strategy.backtest import run_backtest 8 | from strategy.timing_signal import generate_combined_timing_signal 9 | from utils.performance import calculate_performance_metrics 10 | from visualization.plot_results import plot_portfolio_performance 11 | from visualization.ic_plot import plot_ic_time_series 12 | from visualization.backtest_vs_real import plot_backtest_vs_market 13 | import os 14 | 15 | def main(): 16 | # Make sure the output directory exists 17 | os.makedirs('output', exist_ok=True) 18 | 19 | print("📊 Loading market and financial data...") 20 | market_data = load_market_data() 21 | financial_data = load_financial_data() 22 | 23 | # Merge the data and compute factors 24 | print("📊 Computing financial and technical factors...") 25 | all_data = market_data.merge(financial_data, on=['trade_date', 'ts_code'], how='left') 26 | all_data = calculate_financial_factors(all_data) 27 | all_data = calculate_technical_factors(all_data) 28 | 29 | # Compute the forward 5-day return, the basis for IC evaluation 30 | all_data = all_data.sort_values(by=['ts_code', 'trade_date']) 31 | all_data['future_5d_return'] = all_data.groupby('ts_code')['close'].shift(-5) / all_data['close'] - 1 32 | 33 | # Evaluate factor performance and keep the effective factors 34 | print("📊 Evaluating and filtering factors...") 35 | selected_factors, ic_df, monthly_ic, icir_df = evaluate_and_filter_factors(all_data, future_return_col='future_5d_return') 36 | 37 | print(f"✅ 
Selected effective factors: {selected_factors}") 38 | 39 | # Plot the factor IC time series 40 | plot_ic_time_series(ic_df, selected_factors, output_file='output/ic_time_series.png') 41 | 42 | # Build positions (stock selection + factor-weighted scoring) 43 | print("📊 Building stock positions...") 44 | factor_weights = {factor: 1 / len(selected_factors) for factor in selected_factors} 45 | positions = construct_positions(all_data, factor_weights, top_n=50) 46 | 47 | # Generate market timing signals 48 | print("📊 Generating market timing signals...") 49 | timing_signals = generate_combined_timing_signal(market_data) 50 | timing_signals.to_csv('output/timing_signals.csv', index=False) 51 | 52 | # Run the backtest (combining timing signals and positions) 53 | print("📊 Running backtest...") 54 | portfolio_value, daily_positions = run_backtest(positions, market_data, timing_signals) 55 | 56 | # Save the daily NAV and position records 57 | portfolio_value.to_csv('output/portfolio_value.csv', index=False) 58 | daily_positions.to_csv('output/positions.csv', index=False) 59 | 60 | # Compute and save performance statistics 61 | print("📊 Computing backtest performance...") 62 | performance_summary = calculate_performance_metrics(portfolio_value) 63 | performance_summary['Return Statistics'].to_csv('output/return_statistics.csv', index=False) 64 | 65 | # Save factor IC performance 66 | ic_summary = monthly_ic.copy() 67 | for factor in monthly_ic.columns: 68 | ic_summary[f'ICIR_{factor}'] = icir_df[factor] 69 | ic_summary.to_csv('output/ic_summary.csv') 70 | 71 | # Plot the portfolio NAV curve with the timing signal 72 | print("📊 Generating NAV and timing signal chart...") 73 | plot_portfolio_performance(portfolio_value, timing_signals) 74 | 75 | # Plot the backtest vs SSE Composite vs timing signal comparison 76 | print("📊 Generating backtest vs market comparison chart...") 77 | plot_backtest_vs_market(portfolio_value, market_data, timing_signals, output_file='output/backtest_vs_market.png') 78 | 79 | print("✅ Pipeline finished; results saved to the output folder!") 80 | 81 | if __name__ == '__main__': 82 | main() -------------------------------------------------------------------------------- /visualization/plot_results.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | 5 | def 
plot_portfolio_performance(portfolio_value, timing_signals=None, benchmark_data=None, output_file='output/portfolio_performance.png'): 6 | """ 7 | 绘制策略净值曲线,并叠加择时信号(可选)和基准指数(可选),以及最大回撤区域。 8 | 9 | :param portfolio_value: DataFrame,包含 trade_date, portfolio_value 10 | :param timing_signals: DataFrame(可选),择时信号(trade_date, final_signal) 11 | :param benchmark_data: DataFrame(可选),基准指数净值(trade_date, benchmark_value) 12 | :param output_file: 图片保存路径 13 | """ 14 | fig, ax1 = plt.subplots(figsize=(12, 6)) 15 | 16 | # 画策略净值曲线 17 | ax1.plot(portfolio_value['trade_date'], portfolio_value['portfolio_value'], label='策略净值', color='blue') 18 | 19 | # 画基准指数净值曲线(可选) 20 | if benchmark_data is not None: 21 | ax1.plot(benchmark_data['trade_date'], benchmark_data['benchmark_value'], label='基准指数', color='gray') 22 | 23 | ax1.set_ylabel('净值') 24 | ax1.set_title('策略净值 vs 参考指数 vs 择时信号') 25 | ax1.legend(loc='upper left') 26 | 27 | # === 标注最大回撤区域 === 28 | max_drawdown_idx = (portfolio_value['portfolio_value'] / portfolio_value['portfolio_value'].cummax() - 1).idxmin() 29 | max_drawdown_date = portfolio_value.loc[max_drawdown_idx, 'trade_date'] 30 | ax1.axvline(x=max_drawdown_date, color='black', linestyle='--', label='最大回撤') 31 | 32 | # === 画择时信号(可选)=== 33 | if timing_signals is not None: 34 | ax2 = ax1.twinx() 35 | ax2.plot(timing_signals['trade_date'], timing_signals['final_signal'], label='择时信号', color='red', linestyle='--', alpha=0.6) 36 | ax2.set_ylabel('择时信号(1=多头,0=空仓)') 37 | ax2.set_ylim(-0.1, 1.1) 38 | ax2.legend(loc='upper right') 39 | 40 | plt.grid(True) 41 | plt.savefig(output_file) 42 | plt.close() 43 | 44 | print(f"✅ 策略净值 vs 择时信号 图已保存至 {output_file}") 45 | 46 | def plot_annual_returns(portfolio_value, output_file='output/annual_returns.png'): 47 | """ 48 | 绘制年度收益柱状图 49 | 50 | :param portfolio_value: DataFrame,包含 trade_date, portfolio_value 51 | :param output_file: 图片保存路径 52 | """ 53 | df = portfolio_value.copy() 54 | df['year'] = pd.to_datetime(df['trade_date']).dt.year 55 | 
    df['daily_return'] = df['portfolio_value'].pct_change().fillna(0)

    # Compound the daily returns within each year; summing them only approximates the annual return
    annual_returns = df.groupby('year')['daily_return'].apply(lambda r: (1 + r).prod() - 1)

    plt.figure(figsize=(10, 5))
    annual_returns.plot(kind='bar', color='blue', alpha=0.7)
    plt.xlabel('Year')
    plt.ylabel('Annual return')
    plt.title('Annual Return')
    plt.grid(True)

    plt.savefig(output_file)
    plt.close()
    print(f"✅ Annual return bar chart saved to {output_file}")

def plot_excess_returns(portfolio_value, benchmark_data, output_file='output/excess_returns.png'):
    """
    Plot the cumulative excess return of the strategy over the benchmark index.

    :param portfolio_value: DataFrame with trade_date, portfolio_value
    :param benchmark_data: DataFrame of benchmark net value (trade_date, benchmark_value)
    :param output_file: path of the saved image
    """
    merged = portfolio_value.merge(benchmark_data, on='trade_date', how='inner')
    # Difference of the two net-value curves; both are expected to start from the same base
    merged['excess_return'] = merged['portfolio_value'] - merged['benchmark_value']

    plt.figure(figsize=(12, 6))
    plt.plot(merged['trade_date'], merged['excess_return'], label='Cumulative excess return', color='green')
    plt.axhline(0, color='gray', linestyle='--')
    plt.xlabel('Date')
    plt.ylabel('Excess return')
    plt.title('Cumulative Excess Return: Strategy vs. Benchmark')
    plt.legend()
    plt.grid(True)

    plt.savefig(output_file)
    plt.close()
    print(f"✅ Cumulative excess return curve saved to {output_file}")
--------------------------------------------------------------------------------
/factors/technical_factors.py:
--------------------------------------------------------------------------------
# technical_factors.py
# Technical factor module; all window parameters are configured externally
#
# NOTE: the helpers below call pct_change/rolling directly on `close`, so on a
# multi-stock frame they need to be applied per ts_code group; otherwise the
# windows span stock boundaries.

import numpy as np
import pandas as pd
from config import TECHNICAL_FACTOR_WINDOWS

def calculate_momentum_factors(df):
    """
    Multi-period momentum factors (return over the past N days); the window N is configurable.
    """
    for window in TECHNICAL_FACTOR_WINDOWS:
        df[f'momentum_{window}'] = df['close'].pct_change(window)
    return df

def calculate_volatility_factors(df):
    """
    Multi-period volatility factors (standard deviation of daily returns over the past N days); the window N is configurable.
    """
    for window in TECHNICAL_FACTOR_WINDOWS:
        df[f'volatility_{window}'] = df['close'].pct_change().rolling(window).std()
    return df

def calculate_bias_factors(df):
    """
    Multi-period moving-average bias (deviation) factors; the window N is configurable.
    """
    for window in TECHNICAL_FACTOR_WINDOWS:
        df[f'ma_{window}'] = df['close'].rolling(window).mean()
        df[f'bias_{window}'] = (df['close'] - df[f'ma_{window}']) / df[f'ma_{window}']
    return df

def calculate_turnover_factors(df):
    """
    Turnover rate (volume / float shares); no rolling window.
    """
    if 'float_share' in df.columns:
        df['turnover_rate'] = df['vol'] / df['float_share']
    else:
        df['turnover_rate'] = np.nan
    return df

def calculate_macd(df, short_window=12, long_window=26, signal_window=9):
    """
    MACD indicator (trend signal); parameters can be tuned to the trading style.
    """
    df['ema_short'] = df['close'].ewm(span=short_window, adjust=False).mean()
    df['ema_long'] = df['close'].ewm(span=long_window, adjust=False).mean()
    df['macd'] = df['ema_short'] - df['ema_long']
    df['macd_signal'] = df['macd'].ewm(span=signal_window, adjust=False).mean()
    df['macd_hist'] = df['macd'] - df['macd_signal']
    return df

def calculate_rsi(df, window=14):
    """
    RSI (relative strength index, overbought/oversold); default window is 14.
    """
    delta = df['close'].diff()
    gain = delta.where(delta > 0, 0).rolling(window).mean()
    loss = -delta.where(delta < 0, 0).rolling(window).mean()
    rs = gain / loss
    df['rsi'] = 100 - 100 / (1 + rs)
    return df

def calculate_technical_factors(df):
    """
    Main entry point: compute all technical factors according to the configuration.
    """
    df = calculate_momentum_factors(df)
    df = calculate_volatility_factors(df)
    df = calculate_bias_factors(df)
    df = calculate_turnover_factors(df)
    df = calculate_macd(df)
    df = calculate_rsi(df)
    return df

# factors/technical_factors.py
"""
Technical factor module.
Includes momentum, volatility, and moving-average deviation factors.
"""


# NOTE: this second definition overrides calculate_technical_factors above when the module is imported.
def calculate_technical_factors(all_data):
    """
    Compute technical factors.
    :param all_data: 
market data (columns include trade_date, ts_code, close, etc.)
    :return: all_data with the technical factor columns added
    """

    # Compute the rolling indicators for one stock
    def calc_rolling_features(group, windows):
        for window in windows:
            group[f'ma_{window}'] = group['close'].rolling(window).mean()
            group[f'bias_{window}'] = (group['close'] - group[f'ma_{window}']) / group[f'ma_{window}']
            group[f'momentum_{window}'] = group['close'].pct_change(window)

            # Volatility
            group[f'volatility_{window}'] = group['close'].pct_change().rolling(window).std()

        return group

    # group_keys=False keeps the original row index, so ts_code is not added as an
    # index level that would clash with the ts_code column in the groupby below
    all_data = all_data.groupby('ts_code', group_keys=False).apply(lambda x: calc_rolling_features(x, [5, 20, 60]))

    # Other example factors (ATR, price-volume factors, etc.)
    all_data['turnover_rate'] = all_data['vol'] / all_data['float_share']
    all_data['avg_turnover_20'] = all_data.groupby('ts_code')['turnover_rate'].transform(lambda x: x.rolling(20).mean())

    return all_data
--------------------------------------------------------------------------------
/factors/factor_scoring.py:
--------------------------------------------------------------------------------
# factor_scoring.py
# Trains a multi-factor scoring model with LightGBM: GridSearchCV hyperparameter
# tuning, early stopping, and feature-importance analysis

import pandas as pd
import numpy as np
import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import mean_squared_error
import os

# Paths for the saved model and feature importances
MODEL_PATH = 'score_model.lgb'
FEATURE_IMPORTANCE_PATH = 'feature_importance.csv'

def train_ml_model_with_tuning(factor_df, future_returns):
    """
    Train the multi-factor scoring model (LightGBM) with GridSearchCV tuning,
    early stopping, and feature-importance analysis.
    """
    X = factor_df.copy()
    y = future_returns.copy()

    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Base LightGBM model
    base_model = lgb.LGBMRegressor(boosting_type='gbdt', objective='regression',
                                   random_state=42)

    # Hyperparameter search space
    param_grid = {
        'num_leaves': [15, 31, 50],
        'learning_rate': [0.01, 0.05, 0.1],
        'n_estimators': [100, 200],
        'max_depth': [-1, 5, 10],
        'min_child_samples': [10, 20, 30]
    }

    # Hyperparameter search
    grid_search = GridSearchCV(
        base_model,
        param_grid,
        cv=5,
        scoring='neg_mean_squared_error',
        verbose=1,
        n_jobs=-1
    )
    grid_search.fit(X_train, y_train)

    # Best parameters
    best_params = grid_search.best_params_
    print(f"[GridSearch best params] {best_params}")

    # Retrain with early stopping, using the best parameters
    train_data = lgb.Dataset(X_train, label=y_train)
    valid_data = lgb.Dataset(X_test, label=y_test)

    params = {
        'objective': 'regression',
        'boosting_type': 'gbdt',
        'metric': 'rmse',
        'random_state': 42,
        'verbose': -1,
        **best_params
    }

    model = lgb.train(
        params,
        train_data,
        valid_sets=[train_data, valid_data],
        # LightGBM 4.x removed the early_stopping_rounds / verbose_eval arguments
        # of lgb.train; the callback form is used instead
        callbacks=[lgb.early_stopping(stopping_rounds=10, verbose=False)]
    )

    # Evaluate on the test set
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"[Test RMSE of the early-stopped model] {rmse:.5f}")

    # Save the model
    model.save_model(MODEL_PATH)
    print(f"[Scoring model saved] {MODEL_PATH}")

    # Save and plot the feature importances
    save_and_plot_feature_importance(model, X_train.columns)

    return model

def save_and_plot_feature_importance(model, feature_names):
    """
    Save and plot the feature importances.
    """
    importance = model.feature_importance(importance_type='gain')
    importance_df = pd.DataFrame({'feature': feature_names, 'importance': importance})
    importance_df = importance_df.sort_values(by='importance', ascending=False)
    importance_df.to_csv(FEATURE_IMPORTANCE_PATH, index=False)

    print(f"[Feature importance saved] {FEATURE_IMPORTANCE_PATH}")
    plot_feature_importance(importance_df)

def plot_feature_importance(importance_df):
    """
    Plot a horizontal bar chart of the feature importances.
    """
    plt.figure(figsize=(10, 6))
    plt.barh(importance_df['feature'], importance_df['importance'], color='skyblue')
    plt.xlabel('Importance (Gain)')
    plt.title('Feature Importance')
    plt.gca().invert_yaxis()
    plt.grid(axis='x', linestyle='--', alpha=0.6)
    plt.show()

def load_ml_model():
    """
    Load the LightGBM scoring model.
    """
    if os.path.exists(MODEL_PATH):
        model = lgb.Booster(model_file=MODEL_PATH)
        print(f"[Scoring model loaded] {MODEL_PATH}")
        return model
    else:
        raise FileNotFoundError("Scoring model not found; train the model first")

def score_stocks_ml(factor_df):
    """
    Score stocks with the trained LightGBM model (predicted future return).
    """
    model = load_ml_model()
    scores = model.predict(factor_df)
    return pd.Series(scores, index=factor_df.index)
--------------------------------------------------------------------------------
/utils/performance.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np

def calculate_performance_metrics(portfolio_value, benchmark_data=None, positions=None):
    """
    Compute the key performance metrics of the strategy:
    - Annualized return
    - Maximum drawdown
    - Sharpe ratio
    - Calmar ratio
    - Sortino ratio
    - Information ratio (against the benchmark index)
    - Time to recovery from the maximum drawdown
    - Win rate
    - Profit-loss ratio
    - Rebalancing turnover rate (from positions.csv)

    :param portfolio_value: DataFrame with trade_date, portfolio_value
    :param benchmark_data: optional DataFrame of benchmark net value (trade_date, benchmark_value)
    :param positions: optional DataFrame of holdings (trade_date, ts_code, weight)
    :return: DataFrame of performance metrics
    """
    df = portfolio_value.copy()
    df['daily_return'] = df['portfolio_value'].pct_change().fillna(0)

    # === Annualized return ===
    total_days = len(df)
    annual_return = (df['portfolio_value'].iloc[-1] / df['portfolio_value'].iloc[0]) ** (250 / total_days) - 1

    # === Maximum drawdown ===
    rolling_max = df['portfolio_value'].cummax()
    drawdown = df['portfolio_value'] / rolling_max - 1
    max_drawdown = drawdown.min()

    # === Sharpe ratio ===
    risk_free_rate = 0.02  # risk-free rate of 2%
    excess_return = df['daily_return'] - risk_free_rate / 250
    sharpe_ratio = excess_return.mean() / excess_return.std() * np.sqrt(250)

    # === Calmar ratio ===
    calmar_ratio = annual_return / abs(max_drawdown) if max_drawdown != 0 else np.nan

    # === Sortino ratio (based on downside volatility) ===
    downside_return = df[df['daily_return'] < 0]['daily_return']
    downside_vol = downside_return.std() * np.sqrt(250)
    # Annualize the mean excess return as well, so numerator and denominator share the same horizon
    sortino_ratio = excess_return.mean() * 250 / downside_vol if downside_vol != 0 else np.nan

    # === Information ratio (against the benchmark index) ===
    if benchmark_data is not None:
        merged = df.merge(benchmark_data, on='trade_date', how='inner')
        merged['benchmark_return'] = merged['benchmark_value'].pct_change().fillna(0)
        active_return = merged['daily_return'] - merged['benchmark_return']
        tracking_error = active_return.std() * np.sqrt(250)
        # Annualized active return over annualized tracking error
        information_ratio = active_return.mean() * 250 / tracking_error if tracking_error != 0 else np.nan
    else:
        information_ratio = np.nan

    # === Time to recovery, counted in trading days (rows) ===
    recovery_time = np.nan
    if max_drawdown < 0:
        values = df['portfolio_value'].to_numpy()
        trough_pos = int(np.argmin(drawdown.to_numpy()))
        prior_peak = rolling_max.iloc[trough_pos]
        # First row after the trough at which the net value is back at the prior peak
        recovered = np.flatnonzero(values[trough_pos + 1:] >= prior_peak)
        if recovered.size > 0:
            recovery_time = int(recovered[0]) + 1

    # === Win rate ===
    win_rate = (df['daily_return'] > 0).sum() / len(df)

    # === Profit-loss ratio ===
    avg_win = df[df['daily_return'] > 0]['daily_return'].mean()
    avg_loss = abs(df[df['daily_return'] < 0]['daily_return'].mean())
    profit_loss_ratio = avg_win / avg_loss if avg_loss != 0 else np.nan

    # === Turnover rate (from positions.csv) ===
    if positions is not None:
        turnover_rate = calculate_turnover_rate(positions)
    else:
        turnover_rate = np.nan

    #
    # Assemble the results
    metrics = pd.DataFrame({
        'Metric': [
            'Annual Return', 'Max Drawdown', 'Sharpe Ratio', 'Calmar Ratio', 'Sortino Ratio',
            'Information Ratio', 'Time to Recovery', 'Win Rate', 'Profit-Loss Ratio', 'Turnover Rate'
        ],
        'Value': [
            f'{annual_return:.2%}', f'{max_drawdown:.2%}', f'{sharpe_ratio:.2f}', f'{calmar_ratio:.2f}', f'{sortino_ratio:.2f}',
            f'{information_ratio:.2f}', f'{recovery_time} days', f'{win_rate:.2%}', f'{profit_loss_ratio:.2f}', f'{turnover_rate:.2%}'
        ]
    })

    return metrics

def calculate_turnover_rate(positions):
    """
    Estimate the turnover rate.
    :param positions: DataFrame of holdings
    :return: annualized turnover rate
    """
    unique_dates = positions['trade_date'].nunique()
    turnover_per_trade = 0.5  # placeholder: assumes 50% turnover; not derived from the holdings
    annual_turnover = turnover_per_trade * (250 / unique_dates)  # rough annualized estimate
    return annual_turnover
--------------------------------------------------------------------------------
/factors/factor_analysis.py:
--------------------------------------------------------------------------------
import pandas as pd
import os
import shutil
from collections import defaultdict


def calculate_ic(all_data, future_return_col='future_5d_return'):
    """
    Compute the daily IC (Information Coefficient) as the Spearman rank correlation.
    :param all_data: DataFrame containing the factor columns and the future-return column
    :return: DataFrame of daily IC values
    """
    factor_cols = [col for col in all_data.columns if
                   col.startswith(('momentum', 'volatility', 'bias', 'pe', 'roe', 'turnover', 'sentiment'))]

    # dtype=float so later mean/std aggregations work on numeric columns
    ic_df = pd.DataFrame(index=all_data['trade_date'].unique(), columns=factor_cols, dtype=float)

    for date, group in all_data.groupby('trade_date'):
        for factor in factor_cols:
            if group[factor].isnull().all():
                continue
            ic = group[factor].corr(group[future_return_col], method='spearman')
            ic_df.at[date, factor] = ic

    return ic_df


def calculate_monthly_icir(ic_df):
    """
    Compute the monthly IC mean and ICIR (IC mean / IC standard deviation).
    :param ic_df: daily IC data
    :return: monthly IC mean, monthly ICIR
    """
    # Build the month key without adding a column to ic_df (the caller reuses ic_df);
    # astype(str) lets YYYYMMDD-style dates parse the same way as datetime values
    month = pd.to_datetime(ic_df.index.astype(str)).strftime('%Y-%m').to_numpy()
    grouped = ic_df.astype(float).groupby(month)
    monthly_ic = grouped.mean()
    monthly_ic_std = grouped.std()
    icir_df = monthly_ic / monthly_ic_std
    return monthly_ic, icir_df


def filter_factors_by_icir(monthly_ic, icir_df):
    """
    Select effective factors from the latest monthly IC and ICIR.
    Criteria: IC > 0.02 and ICIR > 0.3
    :return: list of selected factors
    """
    last_month = icir_df.index[-1]
    latest_ic = monthly_ic.loc[last_month]
    latest_icir = icir_df.loc[last_month]

    selected_factors = latest_ic[(latest_ic > 0.02) & (latest_icir > 0.3)].index.tolist()

    return selected_factors


def track_and_remove_underperforming_factors(icir_df, graveyard_path='factor_graveyard'):
    """
    Factor retirement mechanism:
    a factor whose ICIR stays below 0.3 for 3 consecutive months is retired and
    archived in factor_graveyard.
    :param icir_df: monthly ICIR
    :param graveyard_path: directory for retired factors
    :return: list of retired factors
    """
    if not os.path.exists(graveyard_path):
        os.makedirs(graveyard_path)

    factor_retreat_count = defaultdict(int)

    # Find factors with a low ICIR in each of the last 3 months
    for factor in icir_df.columns:
        low_icir_streak = 0
        for month in icir_df.index[-3:]:
            if pd.isna(icir_df.at[month, factor]) or icir_df.at[month, factor] >= 0.3:
                low_icir_streak = 0
            else:
                low_icir_streak += 1

        if low_icir_streak >= 3:
            factor_retreat_count[factor] += 1

    retired_factors = []

    for factor in factor_retreat_count:
        factor_file = f'factors/{factor}.py'
        if os.path.exists(factor_file):
            shutil.move(factor_file, os.path.join(graveyard_path, f'{factor}.py'))
            log_factor_retreat(factor, graveyard_path)
            retired_factors.append(factor)

    return retired_factors


def log_factor_retreat(factor_name, graveyard_path):
    """
    Append a retired factor to the log.
    :param factor_name: factor name
    :param graveyard_path: directory for retired factors
    """
    log_file = os.path.join(graveyard_path, 'retired_factors_log.txt')
    with open(log_file, 'a', encoding='utf-8') as f:
        f.write(f'{pd.Timestamp.now()} - {factor_name} retired due to 3 consecutive months ICIR < 0.3\n')


def evaluate_and_filter_factors(all_data, future_return_col='future_5d_return'):
    """
    Factor evaluation pipeline:
    1. Daily IC
    2. Monthly IC and ICIR
    3. Factor selection
    4. Factor retirement (factors with ICIR < 0.3 for 3 consecutive months move to factor_graveyard)
    :return: list of retained factors, daily IC, monthly IC, monthly ICIR
    """
    print("📊 Computing daily IC...")
    ic_df = calculate_ic(all_data, future_return_col)

    print("📊 Computing monthly IC and ICIR...")
    monthly_ic, icir_df = calculate_monthly_icir(ic_df)

    print("📊 Selecting effective factors...")
    selected_factors = filter_factors_by_icir(monthly_ic, icir_df)

    print("📊 Checking and applying the factor retirement mechanism...")
    retired_factors = track_and_remove_underperforming_factors(icir_df)

    print(f"✅ Retained factors: {selected_factors}")
    print(f"❌ Retired factors: {retired_factors}")

    return selected_factors, ic_df, monthly_ic, icir_df
--------------------------------------------------------------------------------
/strategy/backtest.py:
--------------------------------------------------------------------------------
import pandas as pd
import numpy as np

def run_backtest(positions, market_data, timing_signals, initial_capital=1e7):
    """
    Full backtest logic:
    - tracks holdings dynamically
    - applies the timing signal (switching between long and flat)
    - fills in prices for suspended stocks
    - assumes the market data already uses dividend-adjusted prices

    :param positions: selected positions per period (trade_date, ts_code, weight)
    :param market_data: market data (includes trade_date, ts_code, close, etc.)
    :param timing_signals: timing signal (trade_date, final_signal=0/1)
    :param initial_capital: starting capital
    :return: DataFrame of daily net value, DataFrame of daily holdings snapshots
    """

    all_dates = market_data['trade_date'].sort_values().unique()

    portfolio_value = []
    daily_positions = []

    capital = initial_capital
    cash = initial_capital   # uninvested capital; keeps the net value intact while flat
    current_positions = {}   # stock -> shares
    last_prices = {}         # stock -> last valid close (used to fill suspensions)

    for trade_date in all_dates:
        daily_market = market_data[market_data['trade_date'] == trade_date]
        timing_signal = timing_signals.loc[timing_signals['trade_date'] == trade_date, 'final_signal'].values

        if len(timing_signal) > 0:
            timing_signal = timing_signal[0]
        else:
            timing_signal = 1  # default to a long position when there is no signal

        # === Rebalance day ===
        if trade_date in positions['trade_date'].values:
            daily_positions_data = positions[positions['trade_date'] == trade_date]

            if timing_signal == 1:  # normal holdings
                current_positions = adjust_positions(daily_positions_data, daily_market, capital)
                # Fully invested when any stock could be bought, otherwise stay in cash
                cash = 0.0 if current_positions else capital
            else:  # flat signal: liquidate and keep the proceeds in cash
                current_positions = {}
                cash = capital

        # === Daily market value (cash plus holdings) ===
        daily_value = cash
        for stock, shares in current_positions.items():
            price_row = daily_market[daily_market['ts_code'] == stock]
            if not price_row.empty:
                close_price = price_row['close'].values[0]
                last_prices[stock] = close_price  # update the last valid price
            else:
                close_price = last_prices.get(stock, np.nan)  # suspended: use the most recent price

            if not np.isnan(close_price):
                daily_value += shares * close_price

        capital = daily_value
        portfolio_value.append({'trade_date': trade_date, 'portfolio_value': capital / initial_capital})

        # === Record the daily holdings ===
        for stock, shares in current_positions.items():
            close_price = last_prices.get(stock, np.nan)
            if not np.isnan(close_price):
                daily_positions.append({
                    'trade_date': trade_date,
                    'ts_code': stock,
                    'shares': shares,
                    'value': shares * close_price
                })

    portfolio_value_df = pd.DataFrame(portfolio_value)
    daily_positions_df = pd.DataFrame(daily_positions)

    return portfolio_value_df, daily_positions_df


def adjust_positions(positions_data, daily_market, capital):
    """
    Allocate capital by position weight and compute share counts (with suspension handling).
    :param positions_data: stock selection (positions) for the day
    :param daily_market: market data for the day
    :param capital: total capital
    :return: stock -> number of shares held
    """
    positions = {}
    effective_weights = {}

    for _, row in positions_data.iterrows():
        ts_code = row['ts_code']
        weight = row['weight']
        price_row = daily_market[daily_market['ts_code'] == ts_code]

        if not price_row.empty:
            close_price = price_row['close'].values[0]
            shares = (capital * weight) / close_price
            positions[ts_code] = shares
            effective_weights[ts_code] = weight

    # If some stocks are suspended (no price), their weight is redistributed to the rest
    total_effective_weight = sum(effective_weights.values())

    if total_effective_weight < 1.0:
        for ts_code in effective_weights:
            effective_weights[ts_code] /= total_effective_weight

    for ts_code, weight in effective_weights.items():
        price_row = daily_market[daily_market['ts_code'] == ts_code]
        if not price_row.empty:
            close_price = price_row['close'].values[0]
            shares = (capital * weight) / close_price
            positions[ts_code] = shares

    return positions
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Multi-Factor Stock Selection - Quantitative Trading System

## Project Overview
This project implements a **multi-factor stock selection and backtesting system** that combines **multi-factor models, market timing strategies, and backtest analysis**. Using **Python and the Tushare API**, the system retrieves financial market data, scores stocks through factor analysis, and applies timing signals to decide when the selected portfolio is held. Users can test different factor combinations and assess how they would have performed under historical market conditions.

## Project Structure
```
multi-factor-stock-selection/
├── data/                     # Raw market data (retrieved via Tushare API)
├── output/                   # Backtest results & analysis reports
├── factors/                  # Factor computation & evaluation
├── factor_graveyard/         # Deprecated factors & logs
├── strategy/                 # Stock selection, market timing, backtesting
├── utils/                    # Helper functions
├── visualization/            # Performance visualization
├── config.py                 # Strategy parameter settings
├── main.py                   # Main execution script
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
└── Dockerfile                # Docker containerization support (optional)
```

## Core Features & Workflow
### 1️⃣ Data Acquisition
- Uses the **Tushare API** to fetch **daily market prices, financial data, suspension records, and dividend adjustments**.
- Cleans and preprocesses data (outlier removal, standardization, missing data handling).
- Computes **future N-day returns** for factor evaluation (`main.py` uses the forward 5-day return).

### 2️⃣ Factor Computation
- **Fundamental Factors**: PE, PB, ROE, debt ratio, revenue growth.
- **Technical Factors**: Momentum (5-day, 10-day returns), moving averages (MA5, MA20, MA60), volume trends.
- **Sentiment Factors**: News sentiment analysis, capital inflow tracking.

### 3️⃣ Factor Evaluation & Selection
- Computes the daily **Factor IC (Information Coefficient)**, the Spearman rank correlation between each factor and the future return, to assess predictive power.
- Calculates the monthly **ICIR (IC mean / IC standard deviation)** as a measure of factor stability.
- Implements a **Factor Removal Mechanism**: a factor whose ICIR stays below 0.3 for 3 consecutive months is moved to `factor_graveyard/` and logged.

### 4️⃣ Stock Selection & Portfolio Construction
- A **factor-weighted scoring method** ranks the stocks (`main.py` weights the selected factors equally).
- Selects the **Top N stocks** for the final portfolio (`top_n=50` in `main.py`).
- **Dynamic rebalancing** based on factor performance.

### 5️⃣ Market Timing
- **Moving average signals** (e.g., MA20 vs MA60 for trend confirmation).
- **Market breadth indicators** (advancing stock percentage threshold).
- **Volume trend signals** (increased/decreased trading volume).

### 6️⃣ Backtesting Framework
- Combines **stock selection & timing signals** for backtesting.
- Handles **suspensions** by valuing a suspended stock at its last available close, and assumes the price data is already **adjusted for dividends**.
- Computes the **daily portfolio value** and generates performance reports.

### 7️⃣ Performance Evaluation
- Calculates key performance metrics:
  - **Annual Return, Max Drawdown, Sharpe Ratio, Calmar Ratio, Sortino Ratio**.
  - **Information Ratio (vs Benchmark), Profit-Loss Ratio, Win Rate, Turnover Rate**.

### 8️⃣ Data Visualization
- **Portfolio Performance vs Benchmark Index**.
- **Factor IC Time-Series Analysis**.
- **Annual Return Distribution**.
- **Cumulative Excess Returns**.

## Installation & Usage
### Prerequisites
Ensure you have **Python 3.8+** installed, then install the required dependencies:
```sh
pip install -r requirements.txt
```

### Running the System
To execute the full pipeline:
```sh
python main.py
```
This script will **fetch data, compute factors, select stocks, generate timing signals, run backtests, and visualize performance**.
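
Once a run has finished, the headline numbers can be cross-checked directly from the daily net-value series. The following is a minimal sketch, not the project's own implementation: it assumes a frame with `trade_date` and `portfolio_value` columns (the layout `main.py` writes to `output/portfolio_value.csv`), 250 trading days per year, and a 2% risk-free rate. The function name `summarize_net_value` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def summarize_net_value(pv: pd.DataFrame, risk_free_rate: float = 0.02, periods: int = 250) -> dict:
    """Annualized return, max drawdown and Sharpe ratio from a net-value series."""
    nav = pv["portfolio_value"].astype(float)
    daily = nav.pct_change().dropna()

    # Geometric annualization over the number of observed rows
    annual_return = (nav.iloc[-1] / nav.iloc[0]) ** (periods / len(nav)) - 1

    # Largest peak-to-trough decline relative to the running maximum
    max_drawdown = (nav / nav.cummax() - 1).min()

    # Daily excess return over the risk-free rate, annualized by sqrt(periods)
    excess = daily - risk_free_rate / periods
    std = excess.std()
    sharpe = excess.mean() / std * np.sqrt(periods) if std > 0 else np.nan

    return {"annual_return": annual_return, "max_drawdown": max_drawdown, "sharpe": sharpe}


# Synthetic five-day series, for illustration only
demo = pd.DataFrame({
    "trade_date": pd.date_range("2024-01-01", periods=5, freq="B"),
    "portfolio_value": [1.00, 1.10, 0.99, 1.05, 1.20],
})
stats = summarize_net_value(demo)
print(stats)
```

On real output, `pd.read_csv('output/portfolio_value.csv')` would replace the synthetic frame. Keeping the check separate from `utils/performance.py` makes it easier to spot a discrepancy in either calculation.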

## Output Files
```
output/
├── portfolio_value.csv        # Daily portfolio value
├── positions.csv              # Daily stock positions
├── return_statistics.csv      # Performance metrics
├── ic_summary.csv             # Factor IC statistics
├── timing_signals.csv         # Market timing signals
├── ic_time_series.png         # Factor IC time series
├── portfolio_performance.png  # Portfolio performance vs Index
├── backtest_vs_market.png     # Backtest vs market index vs timing signal
├── annual_returns.png         # Annual return bar chart (plot_annual_returns; not called by main.py)
└── excess_returns.png         # Cumulative excess return curve (plot_excess_returns; not called by main.py)
```

## Technologies Used
- **Python**: Core development language
- **Pandas, NumPy**: Data processing & analysis
- **Matplotlib, Seaborn**: Data visualization
- **Tushare API**: Market data acquisition
- **Scikit-learn**: Factor normalization, scoring, hyperparameter search
- **LightGBM**: Machine-learning factor scoring model (`factors/factor_scoring.py`)
- **Git & GitHub**: Version control & collaboration

## Future Enhancements
- **Real-time market data tracking** for live trading signals.
- **Automated backtest scheduling** (daily updates, performance reports).
- **Dynamic factor weighting** based on IC performance.
- **Integration with trading platforms** (e.g., Alpaca, Interactive Brokers).

## Contact & Contribution
Contributions are welcome! If you'd like to improve the system, feel free to open an **Issue** or submit a **Pull Request** on GitHub.

**GitHub Repository**: https://github.com/1984tkr/multi-factor-stock-selection

--------------------------------------------------------------------------------