├── README.md
├── data
│   ├── preprocessed
│   │   ├── SortCols.csv
│   │   ├── finance.csv
│   │   └── stockReturns.csv
│   └── raw
│       ├── Balance sheet
│       │   ├── FS_Combas1.xlsx
│       │   ├── FS_Combas2.xlsx
│       │   ├── FS_Combas3.xlsx
│       │   └── balance sheet.csv
│       ├── Income statement
│       │   ├── FS_Comins1.xlsx
│       │   ├── FS_Comins2.xlsx
│       │   └── FS_Comins3.xlsx
│       ├── mktmnth
│       │   ├── TRD_Cnmont.xlsx
│       │   └── TRD_Cnmont1.xlsx
│       ├── rf
│       │   └── TRD_Nrrate.xlsx
│       ├── stockmnth
│       │   ├── TRD_Mnth0.xlsx
│       │   ├── TRD_Mnth1.xlsx
│       │   └── TRD_Mnth2.xlsx
│       └── 字段说明.txt
├── factors.py
├── preprocess.py
├── regression.py
└── requirements.txt


/README.md:
--------------------------------------------------------------------------------
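 1 | ## Factor construction
 2 | We use pandas + numpy for data preprocessing, and the statsmodels API for the regression analysis.
 3 | 
 4 | The code is split into three files, each one a module. The main pieces are described below: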
 5 | 
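 6 | - preprocess.py handles data preprocessing
 7 |   
 8 |   1. First, merge the files under each raw-data folder into one table per folder
 9 |   2. Then merge the balance sheet and the income statement into finance.csv
10 |   3. Then merge individual stock returns, the risk-free rate, and the market return into stockReturns.csv
11 |   4. Finally, compute the BM, Size, Inv, and OP sorting variables from finance.csv and stockReturns.csv. The two tables are merged only when a variable needs columns from both; otherwise they are used separately.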
12 | 
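13 | - factors.py computes the Fama-French five factors
14 |    
15 |    We use the 2*3 sorts from the paper
16 |    
17 |    This file defines a Grouping class, whose main job is to sort stocks into groups by BM, Size, Inv, and OP. Its getVMReturn function returns the value-weighted return of a given group; see the class docstrings for usage
18 |    
19 |    Functions with the _MethodOne suffix each compute a single factor using the 2*3 sorts
20 |    
21 |    The FF5 function merges the individual factors and writes two CSV files, 2016FF5.csv and 2017FF5.csv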
22 |    
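23 | - PortfolioExcessReturn (also in factors.py) forms portfolios on the chosen sorting variables and outputs each portfolio's market-value-weighted excess return (the dependent variable we use).
24 | 
25 |     We sort stocks on Inv-Size, OP-Size, and BM-Size into 5*5 portfolios, which produces the 6 CSV files in the group_result folder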
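
The 2*3 construction described above can be sketched without pandas. This is a minimal illustration, not the code in factors.py: the helper names (`split_by_quantile`, `value_weighted_return`, `hml`) and the eight toy stocks are made up for the example. It sorts independently on Size (2 groups) and BM (3 groups), intersects the groups, and takes the difference between the average value-weighted return of the high-BM and low-BM portfolios.

```python
def split_by_quantile(stocks, key, breakpoints):
    """Sort stocks by `key` and cut them at the given lower quantile bounds.

    `breakpoints` is a list of (tag, lower_bound) pairs such as
    [("S", 0), ("B", 0.5)]; each group ends where the next one starts.
    Returns {tag: set of stock codes}.
    """
    ordered = sorted(stocks, key=lambda s: s[key])
    n = len(ordered)
    groups = {}
    for idx, (tag, lower) in enumerate(breakpoints):
        upper = breakpoints[idx + 1][1] if idx + 1 < len(breakpoints) else 1
        groups[tag] = {s["code"] for s in ordered[int(lower * n):int(upper * n)]}
    return groups


def value_weighted_return(stocks, codes):
    """Market-value-weighted return of the stocks whose code is in `codes`."""
    members = [s for s in stocks if s["code"] in codes]
    total_size = sum(s["size"] for s in members)
    return sum(s["ret"] * s["size"] for s in members) / total_size


def hml(stocks):
    """HML from a 2*3 sort: average of the high-BM minus low-BM portfolios."""
    size = split_by_quantile(stocks, "size", [("S", 0), ("B", 0.5)])
    bm = split_by_quantile(stocks, "bm", [("L", 0), ("N", 0.3), ("H", 0.7)])

    def vw(size_tag, bm_tag):
        # Intersection of one Size group with one BM group
        return value_weighted_return(stocks, size[size_tag] & bm[bm_tag])

    return (vw("S", "H") + vw("B", "H")) / 2 - (vw("S", "L") + vw("B", "L")) / 2


# Hypothetical one-month cross-section (all numbers invented for the example)
stocks = [
    {"code": "A", "size": 1, "bm": 0.1, "ret": 0.01},
    {"code": "B", "size": 2, "bm": 0.9, "ret": 0.05},
    {"code": "C", "size": 3, "bm": 0.5, "ret": 0.02},
    {"code": "D", "size": 4, "bm": 0.8, "ret": 0.04},
    {"code": "E", "size": 10, "bm": 0.2, "ret": 0.00},
    {"code": "F", "size": 20, "bm": 0.6, "ret": 0.02},
    {"code": "G", "size": 30, "bm": 1.0, "ret": 0.03},
    {"code": "H", "size": 40, "bm": 0.4, "ret": 0.01},
]
result = hml(stocks)
print(result)
```

RMW and CMA follow the same pattern with OP and Inv in place of BM. A portfolio that ends up empty after the intersection would make the weighted average divide by zero, which a real data set may need to guard against.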
26 | ## Usage
27 | To run the program, first install the dependencies
28 | ```commandline
29 | pip install -r requirements.txt
30 | ```
31 | Then run the following commands in order
32 | ```commandline
33 | python preprocess.py
34 | python factors.py
35 | python regression.py
36 | ```
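
The last command, regression.py, regresses each portfolio's excess return on the factors with `sm.OLS`. As a rough, dependency-free sketch of what that fit computes, the hypothetical `ols` helper below solves the normal equations for a constant plus any number of regressors. It uses two made-up factor series for brevity (the repository uses five) and returns coefficients only, without the p-values and t-values that statsmodels reports.

```python
def ols(rows, y):
    """Least-squares coefficients [const, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the constant column
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Invented monthly factor values: (Mktrf, SMB)
factors = [
    (0.010, 0.005),
    (-0.020, 0.010),
    (0.030, -0.004),
    (0.000, 0.002),
    (0.015, -0.010),
    (-0.010, 0.007),
]
# Portfolio excess return built from known loadings, so the fit should recover them
excess = [0.001 + 1.1 * m + 0.5 * s for m, s in factors]

const, beta_mkt, beta_smb = ols(factors, excess)
print(const, beta_mkt, beta_smb)
```

The normal equations are fine for a handful of well-scaled regressors like these; a library routine is the better choice for real data.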
37 | 
38 | ## Project structure
39 | ```text
40 | │  factors.py # computes the factors
41 | │  preprocess.py # preprocessing
42 | │  regression.py # regressions
43 | │  requirements.txt # dependencies
44 | │
45 | ├─data
46 | │  ├─preprocessed # data after initial preprocessing
47 | │  │      finance.csv # merged balance sheet and income statement
48 | │  │      SortCols.csv # BM, Size, Inv, OP sorting variables
49 | │  │      stockReturns.csv # merged returns table
50 | │  │
51 | │  └─raw # raw data
52 | │      │  字段说明.txt # field descriptions
53 | │      │
54 | │      ├─Balance sheet
55 | │      │      balance sheet.csv
56 | │      │      FS_Combas1.xlsx
57 | │      │      FS_Combas2.xlsx
58 | │      │      FS_Combas3.xlsx
59 | │      │
60 | │      ├─Income statement
61 | │      │      FS_Comins1.xlsx
62 | │      │      FS_Comins2.xlsx
63 | │      │      FS_Comins3.xlsx
64 | │      │
65 | │      ├─mktmnth
66 | │      │      TRD_Cnmont.xlsx
67 | │      │      TRD_Cnmont1.xlsx
68 | │      │
69 | │      ├─rf
70 | │      │      TRD_Nrrate.xlsx
71 | │      │
72 | │      └─stockmnth
73 | │              TRD_Mnth0.xlsx
74 | │              TRD_Mnth1.xlsx
75 | │              TRD_Mnth2.xlsx
76 | │
77 | ├─factor_result # 2*3 FF five-factor results (2016 and 2017 phases)
78 | │      2016FF5.csv
79 | │      2017FF5.csv
80 | │
81 | ├─group_result # 5*5 sorts, portfolio excess returns
82 | │      2016_BM_Size.csv
83 | │      2016_Inv_Size.csv
84 | │      2016_OP_Size.csv
85 | │      2017_BM_Size.csv
86 | │      2017_Inv_Size.csv
87 | │      2017_OP_Size.csv
88 | │
89 | └─regression_result # regression results
90 |        regression_BM.csv
91 |        regression_BM_pvalue.csv
92 |        regression_Inv.csv
93 |        regression_Inv_pvalue.csv
94 |        regression_OP.csv
95 |        regression_OP_pvalue.csv
96 | 
97 | ```
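
The year in file names such as 2016FF5.csv is a phase t, which `date2Phase` in factors.py defines as July of year t through June of year t+1. A small sketch of that convention follows; the helper names `month_to_phase` and `months_in_phase` are illustrative and do not exist in the repository. Note that factors.py itself stops the 2017 phase at 2018-01, whereas this sketch always lists twelve months.

```python
def month_to_phase(month):
    """Map 'YYYY-MM' to phase t: July of year t through June of year t+1."""
    year, mon = (int(part) for part in month.split("-"))
    return year if mon >= 7 else year - 1


def months_in_phase(t):
    """List the twelve 'YYYY-MM' months that belong to phase t."""
    pairs = [(t, m) for m in range(7, 13)] + [(t + 1, m) for m in range(1, 7)]
    return [f"{y}-{m:02d}" for y, m in pairs]


months_2016 = months_in_phase(2016)
print(months_2016[0], months_2016[-1])
print(month_to_phase("2018-01"))
```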
98 | 
99 | 


--------------------------------------------------------------------------------
/data/raw/Balance sheet/FS_Combas1.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/Balance sheet/FS_Combas1.xlsx


--------------------------------------------------------------------------------
/data/raw/Balance sheet/FS_Combas2.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/Balance sheet/FS_Combas2.xlsx


--------------------------------------------------------------------------------
/data/raw/Balance sheet/FS_Combas3.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/Balance sheet/FS_Combas3.xlsx


--------------------------------------------------------------------------------
/data/raw/Income statement/FS_Comins1.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/Income statement/FS_Comins1.xlsx


--------------------------------------------------------------------------------
/data/raw/Income statement/FS_Comins2.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/Income statement/FS_Comins2.xlsx


--------------------------------------------------------------------------------
/data/raw/Income statement/FS_Comins3.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/Income statement/FS_Comins3.xlsx


--------------------------------------------------------------------------------
/data/raw/mktmnth/TRD_Cnmont.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/mktmnth/TRD_Cnmont.xlsx


--------------------------------------------------------------------------------
/data/raw/mktmnth/TRD_Cnmont1.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/mktmnth/TRD_Cnmont1.xlsx


--------------------------------------------------------------------------------
/data/raw/rf/TRD_Nrrate.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/rf/TRD_Nrrate.xlsx


--------------------------------------------------------------------------------
/data/raw/stockmnth/TRD_Mnth0.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/stockmnth/TRD_Mnth0.xlsx


--------------------------------------------------------------------------------
/data/raw/stockmnth/TRD_Mnth1.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/stockmnth/TRD_Mnth1.xlsx


--------------------------------------------------------------------------------
/data/raw/stockmnth/TRD_Mnth2.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/stockmnth/TRD_Mnth2.xlsx


--------------------------------------------------------------------------------
/data/raw/字段说明.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/teal0range/FF5/75854dd622df74152d6e322ef2762569becdb6c9/data/raw/字段说明.txt


--------------------------------------------------------------------------------
/factors.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import os
  4 | from datetime import datetime
  5 | from numpy import vectorize
  6 | 
  7 | from preprocess import generateDate
  8 | 
  9 | data_path = 'data/preprocessed'
 10 | phase = {
 11 |     "2016": generateDate("2016-07-31", "2017-06-30"),
 12 |     "2017": generateDate("2017-07-31", "2018-01-31")
 13 | }
 14 | 
 15 | currentPhase = phase["2016"]
 16 | 
 17 | 
 18 | class Grouping:
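 19 |     # Sorts stocks into groups along one or more dimensions and intersects them
 20 |     def __init__(self, t):
 21 |         """One entry per sort dimension, e.g. [[('h', df0), ('l', df1)], [('b', df0), ('s', df1)]];
 22 |         stores the groups before they are intersected
 23 |         """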
 24 |         self.groups = []
 25 | 
 26 |         """pd.DataFrame that stores the intersections in an easy-to-query form"""
 27 |         self.frame = pd.DataFrame()
 28 |         self.view = pd.DataFrame()
 29 | 
 30 |         """Whether the instance is ready to be queried"""
 31 |         self.isPrepared = False
 32 | 
 33 |         """Phase t"""
 34 |         self.t = t
 35 | 
 36 |         """BM Inv OP Size"""
 37 |         self.factorList = pd.read_csv(os.path.join(data_path, 'SortCols.csv'))
 38 |         self.factorList = self.factorList[self.factorList['phase'] == int(self.t)].copy()  # copy so the assignment below does not write to a view
 39 |         self.factorList['MktSize'] = self.factorList['Size']
 40 | 
 41 |         """Returns file"""
 42 |         self.stockReturn = self.date2Phase(
 43 |             pd.read_csv(os.path.join(data_path, 'stockReturns.csv'))[['Stkcd', 'date', 'Mretwd', 'Msmvosd', 'Nrrmtdt']]
 44 |         )
 45 | 
 46 |         """Vectorize getVMReturn and getVMExcessReturn over dates"""
 47 |         self.getVMReturn = np.vectorize(self.getVMReturn, excluded=['section'])
 48 |         self.getVMExcessReturn = np.vectorize(self.getVMExcessReturn, excluded=['section'])
 49 | 
 50 |     def append(self, breakpoints, col_name):
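 51 |         """
 52 |         Add a sort dimension to the grouping
 53 |         (the groups are cut from self.factorList, see split)
 54 |         :param breakpoints: lower quantile bounds, e.g. [('H', 0), ('L', 0.5)] tags the [0, 0.5] quantile range as H and [0.5, 1] as L
 55 |         :param col_name: name of the column to sort on
 56 |         :return: None
 57 |         """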
 58 |         self.groups.append(self.split(breakpoints, col_name))
 59 |         self.isPrepared = False
 60 | 
 61 |     def split(self, breakpoints, col_name):
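 62 |         """
 63 |         Split factorList into groups by one column
 64 |         :param breakpoints: lower quantile bounds, e.g. [('H', 0), ('L', 0.5)] tags the [0, 0.5] quantile range as H and [0.5, 1] as L
 65 |         :param col_name: name of the column to sort on
 66 |         :return: list of (tag, DataFrame) pairs, e.g. [('h', df0), ('l', df1)]
 67 |         """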
 68 |         df = self.factorList
 69 |         df = df[['Stkcd', 'phase', 'MktSize', col_name]].sort_values(by=[col_name])
 70 |         res = []
 71 |         for idx, _breakpoint in enumerate(breakpoints):
 72 |             upper = 1 if idx == len(breakpoints) - 1 else breakpoints[idx + 1][1]
 73 |             tag, ratio = _breakpoint
 74 |             res.append((tag, df.iloc[int(ratio * len(df)):int(upper * len(df))].drop([col_name], axis=1)))
 75 |         return res
 76 | 
 77 |     def prepare(self):
 78 |         """
 79 |         Prepare the instance for querying
 80 |         :return: None
 81 |         """
 82 |         self.isPrepared = True
 83 |         self.intersections()
 84 |         for i in range(len(self.frame)):
 85 |             data = self.frame.iat[i, self.getAxis()]
 86 |             self.frame.iat[i, self.getAxis()] = pd.merge(data, self.stockReturn, on=['Stkcd', 'phase'])
 87 | 
 88 |     def intersections(self, axis=0, cols=None, df=None):
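 89 |         """
 90 |         Recursively intersect the groups of all sort dimensions
 91 |         :param axis: index of the sort dimension currently being processed
 92 |         :param cols: tags chosen so far, keyed by axis
 93 |         :param df: intersection of the groups chosen so far
 94 |         :return: None
 95 |         """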
 96 |         if cols is None:
 97 |             cols = {}
 98 |         if axis == len(self.groups):
 99 |             cols['data'] = [df]
100 |             self.frame = self.frame.append(pd.DataFrame(cols, index=[len(self.frame)]), sort=True)
101 |             return
102 | 
103 |         for group in self.groups[axis]:
104 |             if axis == 0:
105 |                 df = group[1]
106 |             tmp = df
107 |             if axis != 0:
108 |                 df = pd.merge(df, group[1], on=['Stkcd', 'phase', 'MktSize'])
109 |             cols[axis] = [group[0]]
110 |             self.intersections(axis + 1, cols, df)
111 |             cols.pop(axis)
112 |             df = tmp
113 | 
114 |     def getAxis(self):
115 |         return len(self.groups)
116 | 
117 |     def __getitem__(self, item):
118 |         if not self.isPrepared:
119 |             self.prepare()
120 |         res = self.frame
121 |         for axis in range(self.getAxis()):
122 |             res = res[res[axis] == item[axis]]
123 |         return res.iat[0, self.getAxis()].reset_index().drop(['index'], axis=1)
124 | 
125 |     def getVMReturn(self, date, section):
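126 |         """
127 |         Value-weighted return of one group; this function is vectorized in __init__
128 |         :param date: month, YYYY-MM
129 |         :param section: group tags, one per sort dimension, e.g. ['S', 'H']
130 |         :return: value-weighted return of the group in that month
131 |         """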
132 |         df = self.__getitem__(section)
133 |         df = df[df['date'] == date]
134 |         return np.sum(df['Mretwd'] * df['MktSize']) / np.sum(df['MktSize'])
135 | 
136 |     def getVMExcessReturn(self, date, section):
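137 |         """
138 |         Value-weighted excess return (return minus the risk-free rate) of one group; this function is vectorized in __init__
139 |         :param date: month, YYYY-MM
140 |         :param section: group tags, one per sort dimension, e.g. ['S', 'H']
141 |         :return: value-weighted excess return of the group in that month
142 |         """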
143 |         df = self.__getitem__(section)
144 |         df = df[df['date'] == date]
145 |         return np.sum((df['Mretwd'] - df['Nrrmtdt']) * df['MktSize']) / np.sum(df['MktSize'])
146 | 
147 |     @staticmethod
148 |     def date2Phase(df):
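149 |         """
150 |         Map each date to its phase t, which runs from July of year t to June of year t+1
151 |         :param df: data table with a 'date' column in YYYY-MM format
152 |         :return: the same table with a 'phase' column added
153 |         """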
154 |         df['phase'] = df['date'].apply(lambda x: datetime.strptime(x, "%Y-%m").year
155 |         if datetime.strptime(x, "%Y-%m").month >= 7 else datetime.strptime(x, "%Y-%m").year - 1)
156 |         return df
157 | 
158 | 
159 | def Mktrf_MethodOne():
160 |     df = pd.read_csv(os.path.join(data_path, 'stockReturns.csv'))
161 |     df = df[['date', 'Cmretwdos', 'Nrrmtdt']].drop_duplicates().sort_values(by='date')
162 |     df['Mktrf'] = df['Cmretwdos'] - df['Nrrmtdt']
163 |     return df[['date', 'Mktrf']]
164 | 
165 | 
166 | def SMB_MethodOne(t):
167 |     # SMB BM
168 |     g = Grouping(t)
169 |     g.append([('S', 0), ('B', 0.5)], 'Size')
170 |     g.append([('L', 0), ('N', 0.3), ('H', 0.7)], 'BM')
171 |     SMB_BM = (g.getVMReturn(currentPhase, section=['S', 'L']) + g.getVMReturn(currentPhase, section=['S', 'N'])
172 |               + g.getVMReturn(currentPhase, section=['S', 'H'])) / 3 - \
173 |              (g.getVMReturn(currentPhase, section=['B', 'L']) + g.getVMReturn(currentPhase, section=['B', 'N'])
174 |               + g.getVMReturn(currentPhase, section=['B', 'H'])) / 3
175 | 
176 |     g = Grouping(t)
177 |     g.append([('S', 0), ('B', 0.5)], 'Size')
178 |     g.append([('W', 0), ('N', 0.3), ('R', 0.7)], 'OP')
179 |     SMB_OP = (g.getVMReturn(currentPhase, section=['S', 'W']) + g.getVMReturn(currentPhase, section=['S', 'N'])
180 |               + g.getVMReturn(currentPhase, section=['S', 'R'])) / 3 - \
181 |              (g.getVMReturn(currentPhase, section=['B', 'W']) + g.getVMReturn(currentPhase, section=['B', 'N'])
182 |               + g.getVMReturn(currentPhase, section=['B', 'R'])) / 3
183 | 
184 |     g = Grouping(t)
185 |     g.append([('S', 0), ('B', 0.5)], 'Size')
186 |     g.append([('C', 0), ('N', 0.3), ('A', 0.7)], 'Inv')
187 |     SMB_Inv = (g.getVMReturn(currentPhase, section=['S', 'C']) + g.getVMReturn(currentPhase, section=['S', 'N'])
188 |                + g.getVMReturn(currentPhase, section=['S', 'A'])) / 3 - \
189 |               (g.getVMReturn(currentPhase, section=['B', 'C']) + g.getVMReturn(currentPhase, section=['B', 'N'])
190 |                + g.getVMReturn(currentPhase, section=['B', 'A'])) / 3
191 | 
192 |     return pd.DataFrame({'date': currentPhase, 'SMB': (SMB_BM + SMB_OP + SMB_Inv) / 3})
193 | 
194 | 
195 | def HML_MethodOne(t):
196 |     g = Grouping(t)
197 |     g.append([('S', 0), ('B', 0.5)], 'Size')
198 |     g.append([('L', 0), ('N', 0.3), ('H', 0.7)], 'BM')
199 |     HML = (g.getVMReturn(currentPhase, section=['S', 'H']) + g.getVMReturn(currentPhase, section=['B', 'H'])) / 2 - \
200 |           (g.getVMReturn(currentPhase, section=['S', 'L']) + g.getVMReturn(currentPhase, section=['B', 'L'])) / 2
201 |     return pd.DataFrame({'date': currentPhase, 'HML': HML})
202 | 
203 | 
204 | def RMW_MethodOne(t):
205 |     g = Grouping(t)
206 |     g.append([('S', 0), ('B', 0.5)], 'Size')
207 |     g.append([('W', 0), ('N', 0.3), ('R', 0.7)], 'OP')
208 |     RMW = (g.getVMReturn(currentPhase, section=['S', 'R']) + g.getVMReturn(currentPhase, section=['B', 'R'])) / 2 - \
209 |           (g.getVMReturn(currentPhase, section=['S', 'W']) + g.getVMReturn(currentPhase, section=['B', 'W'])) / 2
210 | 
211 |     return pd.DataFrame({'date': currentPhase, 'RMW': RMW})
212 | 
213 | 
214 | def CMA_MethodOne(t):
215 |     g = Grouping(t)
216 |     g.append([('S', 0), ('B', 0.5)], 'Size')
217 |     g.append([('C', 0), ('N', 0.3), ('A', 0.7)], 'Inv')
218 |     CMA = (g.getVMReturn(currentPhase, section=['S', 'C']) + g.getVMReturn(currentPhase, section=['B', 'C'])) / 2 - \
219 |           (g.getVMReturn(currentPhase, section=['S', 'A']) + g.getVMReturn(currentPhase, section=['B', 'A'])) / 2
220 | 
221 |     return pd.DataFrame({'date': currentPhase, 'CMA': CMA})
222 | 
223 | 
224 | @vectorize
225 | def FF5(t):
226 |     # Write the FF5 factors for phase t to factor_result/<t>FF5.csv
227 |     global currentPhase
228 |     currentPhase = phase[t]
229 |     res = pd.merge(Mktrf_MethodOne(),
230 |                    pd.merge(SMB_MethodOne(t),
231 |                             pd.merge(HML_MethodOne(t),
232 |                                      pd.merge(RMW_MethodOne(t), CMA_MethodOne(t), on=['date']), on=['date']),
233 |                             on=['date']), on=['date'])
234 |     res.to_csv(os.path.join("factor_result", t + "FF5.csv"), index=False)
235 | 
236 | 
237 | def PortfolioExcessReturn(t, row_type, row_num, col_type='Size', col_num=5, csv=True):
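238 |     """
239 |     Form portfolios on two sorting variables and compute their excess returns
240 |     :param t: phase
241 |     :param row_type: sorting variable for the rows
242 |     :param row_num: number of row groups
243 |     :param col_type: sorting variable for the columns (default Size)
244 |     :param col_num: number of column groups (default 5)
245 |     :param csv: whether to write the result to CSV
246 |     :return: DataFrame of excess returns by date and group
247 |     """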
248 |     global currentPhase
249 |     currentPhase = phase[t]
250 |     g = Grouping(t)
251 |     g.append([(i, i * 1 / row_num) for i in range(row_num)], row_type)
252 |     g.append([(i, i * 1 / col_num) for i in range(col_num)], col_type)
253 | 
254 |     @vectorize
255 |     def formatDataframe(row_count, col_count):
256 |         return pd.DataFrame({"date": currentPhase,
257 |                              "row_type": [row_type] * len(currentPhase),
258 |                              "row_count": [row_count + 1] * len(currentPhase),
259 |                              "col_type": [col_type] * len(currentPhase),
260 |                              "col_count": [col_count + 1] * len(currentPhase),
261 |                              "ExcessReturn": g.getVMExcessReturn(currentPhase, section=[row_count, col_count])}
262 |                             , index=[i for i in range(len(currentPhase))])
263 | 
264 |     df = pd.DataFrame().append(
265 |         list(formatDataframe(np.arange(0, row_num * col_num, 1) % row_num,
266 |                              np.arange(0, row_num * col_num, 1) // row_num))
267 |     )
268 |     if csv:
269 |         df.to_csv(os.path.join("group_result", t + "_" + row_type + "_" + col_type + ".csv"), index=False)
270 |     return df
271 | 
272 | 
273 | if __name__ == '__main__':
274 |     FF5(['2016', '2017'])
275 |     PortfolioExcessReturn('2016', 'OP', 5)
276 |     PortfolioExcessReturn('2017', 'OP', 5)
277 |     PortfolioExcessReturn('2016', 'Inv', 5)
278 |     PortfolioExcessReturn('2017', 'Inv', 5)
279 |     PortfolioExcessReturn('2016', 'BM', 5)
280 |     PortfolioExcessReturn('2017', 'BM', 5)


--------------------------------------------------------------------------------
/preprocess.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import os
  3 | from datetime import datetime, timedelta
  4 | import numpy as np
  5 | 
  6 | data_path = './data/preprocessed'
  7 | 
  8 | 
  9 | # Utilities
 10 | def convertCode(code) -> str:
 11 |     """
 12 |     Convert a stock code to the standard 6-digit form
 13 |     :param code: stock code
 14 |     :return: zero-padded 6-digit code
 15 |     """
 16 |     return "{:06d}".format(int(code))
 17 | 
 18 | 
 19 | def generateDate(start, end) -> list:
 20 |     start = datetime.strptime(start, "%Y-%m-%d")
 21 |     end = datetime.strptime(end, "%Y-%m-%d")
 22 |     current = start
 23 |     res = []
 24 |     while current < end:
 25 |         res.append(current.strftime("%Y-%m"))
 26 |         current = current.replace(day=28) + timedelta(days=4)
 27 |     return res
 28 | 
 29 | 
 30 | # I/O
 31 | def excel2Df(directory: str, **kwargs) -> pd.DataFrame:
 32 |     """
 33 |     Merge all Excel files under a directory into one DataFrame
 34 |     :rtype: pd.DataFrame
 35 |     :param directory: directory containing the xlsx files
 36 |     :return: the concatenated DataFrame
 37 |     """
 38 |     # Temporary list of DataFrames
 39 |     df_ls = []
 40 |     # Walk through every Excel file under the directory
 41 |     for root, dirs, files in os.walk(directory):
 42 |         for file in files:
 43 |             if file.endswith(".xlsx"):
 44 |                 df_ls.append(pd.read_excel(os.path.join(root, file), **kwargs))
 45 |     # Concatenate and return
 46 |     return pd.DataFrame().append(df_ls, sort=False)
 47 | 
 48 | 
 49 | # Filters
 50 | def dateFilter(df: pd.DataFrame, date_col: str, date_format: str = None) -> pd.DataFrame:
 51 |     """
 52 |     Keep only the June and December rows
 53 |     :param df: data table
 54 |     :param date_col: name of the date column
 55 |     :param date_format: date format
 56 |     :return: filtered DataFrame
 57 |     """
 58 |     if date_format is None:
 59 |         # Default date format
 60 |         date_format = "%Y-%m-%d"
 61 |     return df[df[date_col].apply(lambda x: datetime.strptime(x, date_format).month in [6, 12])]
 62 | 
 63 | 
 64 | def MainBoardFilter(df: pd.DataFrame, code_col: str) -> pd.DataFrame:
 65 |     """
 66 |     Keep A-share stocks, selected by stock-code prefix
 67 |     :param df: data table
 68 |     :param code_col: name of the stock-code column
 69 |     :return: filtered data
 70 |     """
 71 |     return df[np.isin(df[code_col].apply(convertCode).str[:3], ['000', '600', '601', '603', '605'])]
 72 | 
 73 | 
 74 | def FinanceFrames():
 75 |     # Merge and preprocess the financial statements
 76 |     balanceSheet = excel2Df("./data/raw/Balance sheet", index_col=0)
 77 |     IncomeSheet = excel2Df("./data/raw/Income statement")
 78 |     balanceSheet = balanceSheet[balanceSheet['Typrep'] == 'A'].sort_values(by=['Stkcd', 'Accper'])
 79 |     IncomeSheet = IncomeSheet[IncomeSheet['Typrep'] == 'A'].sort_values(by=['Stkcd', 'Accper'])
 80 |     # Filter by date and by board
 81 |     balanceSheet = MainBoardFilter(dateFilter(balanceSheet, 'Accper'), 'Stkcd').drop(['Typrep'], axis=1)
 82 |     IncomeSheet = MainBoardFilter(dateFilter(IncomeSheet, 'Accper'), 'Stkcd').drop(['Typrep'], axis=1)
 83 |     pd.merge(balanceSheet, IncomeSheet, on=['Stkcd', 'Accper'], how='outer'). \
 84 |         to_csv("./data/preprocessed/finance.csv", index=False)
 85 | 
 86 | 
 87 | def StockReturnFrames():
 88 |     # Merge and preprocess
 89 |     # Market return and risk-free rate
 90 |     mktmnth = excel2Df("./data/raw/mktmnth")
 91 |     mktmnth = mktmnth[mktmnth['Markettype'] == 5][['Trdmnt', 'Cmretwdos']]
 92 |     rf = excel2Df("./data/raw/rf")[['Clsdt', 'Nrrmtdt']]
 93 |     rf['month'] = rf['Clsdt'].str[:7]
 94 |     rf = rf.groupby('month').apply(lambda x: x.iloc[np.argmax(x['Clsdt'].values)])
 95 |     rf['Nrrmtdt'] = rf['Nrrmtdt'] / 100
 96 |     marketRf = pd.merge(mktmnth.rename(columns={"Trdmnt": "date"}),
 97 |                         rf[['month', 'Nrrmtdt']].rename(columns={"month": "date"}),
 98 |                         on='date', how='outer').sort_values(by=['date'])
 99 |     # Individual stock returns
100 |     stockmnth = excel2Df("./data/raw/stockmnth")[['Stkcd', 'Trdmnt', 'Msmvosd', 'Mretwd', 'Markettype']]
101 |     stockmnth = stockmnth[np.isin(stockmnth['Markettype'], [1, 4])]. \
102 |         rename(columns={'Trdmnt': 'date'}).drop('Markettype', axis=1)
103 |     pd.merge(stockmnth, marketRf, on='date').sort_values(by=['Stkcd', 'date']).dropna(). \
104 |         to_csv("data/preprocessed/stockReturns.csv", index=None)
105 | 
106 | 
107 | def extraFactors():
108 |     finance = pd.read_csv(os.path.join(data_path, 'finance.csv')).rename(columns={'Accper': "date"})
109 |     finance['date'] = finance['date'].str[:7]
110 | 
111 |     stockReturns = pd.read_csv(os.path.join(data_path, 'stockReturns.csv'))
112 | 
113 |     df = pd.merge(stockReturns, finance, on=['Stkcd', 'date'])
114 |     #   Size
115 |     Size = stockReturns.groupby(['Stkcd']).apply(lambda x: pd.DataFrame(
116 |         {
117 |             'phase': [2016, 2017],
118 |             'Size': [
119 |                 x[x['date'] == '2016-06']['Msmvosd'].iat[0] if len(x[x['date'] == '2016-06']) > 0 else np.nan,
120 |                 x[x['date'] == '2017-06']['Msmvosd'].iat[0] if len(x[x['date'] == '2017-06']) > 0 else np.nan
121 |             ]
122 |         }).dropna()).reset_index().drop(['level_1'], axis=1)
123 |     #   B/M ratio
124 |     BM = df.groupby(['Stkcd']).apply(lambda x: pd.DataFrame({
125 |         'phase': [2016, 2017],
126 |         'BM': [
127 |             x[x['date'] == '2015-12']['total_equity'].iat[0] / x[x['date'] == '2015-12']['Msmvosd'].iat[0]
128 |             if len(x[x['date'] == '2015-12']) > 0 else np.nan,
129 |             x[x['date'] == '2016-12']['total_equity'].iat[0] / x[x['date'] == '2016-12']['Msmvosd'].iat[0]
130 |             if len(x[x['date'] == '2016-12']) > 0 else np.nan
131 |         ]
132 |     }).dropna()).reset_index().drop(['level_1'], axis=1)
133 | 
134 |     #   Inv
135 |     Inv = finance.groupby(['Stkcd']).apply(lambda x: pd.DataFrame({
136 |         'phase': [2016, 2017],
137 |         'Inv': [
138 |             (x[x['date'] == '2015-12']['total_assets'].iat[0] - x[x['date'] == '2014-12']['total_assets'].iat[0])
139 |             / x[x['date'] == '2014-12']['total_assets'].iat[0]
140 |             if len(x[x['date'] == '2015-12']) > 0 and len(x[x['date'] == '2014-12']) > 0 else np.nan,
141 |             (x[x['date'] == '2016-12']['total_assets'].iat[0] - x[x['date'] == '2015-12']['total_assets'].iat[0])
142 |             / x[x['date'] == '2015-12']['total_assets'].iat[0]
143 |             if len(x[x['date'] == '2016-12']) > 0 and len(x[x['date'] == '2015-12']) > 0 else np.nan
144 |         ]
145 |     }).dropna()).reset_index().drop(['level_1'], axis=1)
146 | 
147 |     #   OP
148 | 
149 |     OP = df.groupby(['Stkcd']).apply(lambda x: pd.DataFrame({
150 |         'phase': [2016, 2017],
151 |         'OP': [
152 |             x[x['date'] == '2015-12']['operating profit'].iat[0] / x[x['date'] == '2015-12']['total_equity'].iat[0]
153 |             if len(x[x['date'] == '2015-12']) > 0 else np.nan,
154 |             x[x['date'] == '2016-12']['operating profit'].iat[0] / x[x['date'] == '2016-12']['total_equity'].iat[0]
155 |             if len(x[x['date'] == '2016-12']) > 0 else np.nan
156 |         ]
157 |     }).dropna()).reset_index().drop(['level_1'], axis=1)
158 | 
159 |     pd.merge(pd.merge(BM, Inv), pd.merge(OP, Size), on=['Stkcd', 'phase']). \
160 |         to_csv(os.path.join(data_path, "SortCols.csv"), index=False)
161 | 
162 | 
163 | if __name__ == '__main__':
164 |     FinanceFrames()
165 |     StockReturnFrames()
166 |     extraFactors()
167 | 


--------------------------------------------------------------------------------
/regression.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import os
  4 | import statsmodels.api as sm
  5 | 
  6 | # Read and concatenate the FF5 factor data
  7 | data1 = pd.read_csv("factor_result/2016FF5.csv")
  8 | data2 = pd.read_csv("factor_result/2017FF5.csv")
  9 | data_ff5 = data1.append(data2)
 10 | data_ff5 = data_ff5.reset_index(drop=True)
 11 | # Read the portfolio excess returns
 12 | for fac in ['BM', 'Inv', 'OP']:
 13 |     data1 = pd.read_csv("group_result/2016_" + fac + "_Size.csv")
 14 |     data2 = pd.read_csv("group_result/2017_" + fac + "_Size.csv")
 15 |     if fac == 'BM':
 16 |         Size_BM = data1.append(data2)
 17 |     elif fac == 'Inv':
 18 |         Size_Inv = data1.append(data2)
 19 |     elif fac == 'OP':
 20 |         Size_OP = data1.append(data2)
 21 | # Five-factor regressions for the BM-Size portfolios
 22 | regression_BM = pd.DataFrame(columns=['Size', 'BM', 'Const', 'Mktrf', 'SMB', 'HML', 'RMW', 'CMA', 'R-squared'])
 23 | regression_BM_pvalue = pd.DataFrame(columns=['Size', 'BM', 'Const_p', 'Mktrf_p', 'SMB_p', 'HML_p', 'RMW_p', 'CMA_p'])
 24 | regression_BM_tvalue = pd.DataFrame(columns=['Size', 'BM', 'Const_t', 'Mktrf_t', 'SMB_t', 'HML_t', 'RMW_t', 'CMA_t'])
 25 | x_ = Size_BM['row_count'].value_counts().shape[0]
 26 | y_ = Size_BM['col_count'].value_counts().shape[0]
 27 | for i in range(1, x_ + 1):
 28 |     for j in range(1, y_ + 1):
 29 |         X = data_ff5.iloc[0:19, 1:6]
 30 |         X = sm.add_constant(X)
 31 |         y = Size_BM[(Size_BM['row_count'] == i) & (Size_BM['col_count'] == j)].iloc[:, 5]
 32 |         y = y.reset_index(drop=True)
 33 |         model = sm.OLS(y, X)
 34 |         results = model.fit()
 35 |         regression_BM = regression_BM.append([{'Size': i, 'BM': j, 'Const': results.params[0],
 36 |                                                'Mktrf': results.params[1], 'SMB': results.params[2],
 37 |                                                'HML': results.params[3], 'RMW': results.params[4],
 38 |                                                'CMA': results.params[5], 'R-squared': results.rsquared}])
 39 |         regression_BM_pvalue = regression_BM_pvalue.append([{'Size': i, 'BM': j, 'Const_p': results.pvalues[0],
 40 |                                                              'Mktrf_p': results.pvalues[1], 'SMB_p': results.pvalues[2],
 41 |                                                              'HML_p': results.pvalues[3], 'RMW_p': results.pvalues[4],
 42 |                                                              'CMA_p': results.pvalues[5]}])
 43 |         regression_BM_tvalue = regression_BM_tvalue.append([{'Size': i, 'BM': j, 'Const_t': results.tvalues[0],
 44 |                                                              'Mktrf_t': results.tvalues[1], 'SMB_t': results.tvalues[2],
 45 |                                                              'HML_t': results.tvalues[3], 'RMW_t': results.tvalues[4],
 46 |                                                              'CMA_t': results.tvalues[5]}])
 47 | # Five-factor regressions for the Inv-Size portfolios
 48 | regression_Inv = pd.DataFrame(columns=['Size', 'Inv', 'Const', 'Mktrf', 'SMB', 'HML', 'RMW', 'CMA', 'R-squared'])
 49 | regression_Inv_pvalue = pd.DataFrame(columns=['Size', 'Inv', 'Const_p', 'Mktrf_p', 'SMB_p', 'HML_p', 'RMW_p', 'CMA_p'])
 50 | regression_Inv_tvalue = pd.DataFrame(columns=['Size', 'Inv', 'Const_t', 'Mktrf_t', 'SMB_t', 'HML_t', 'RMW_t', 'CMA_t'])
 51 | x_ = Size_Inv['row_count'].value_counts().shape[0]
 52 | y_ = Size_Inv['col_count'].value_counts().shape[0]
 53 | for i in range(1, x_ + 1):
 54 |     for j in range(1, y_ + 1):
 55 |         X = data_ff5.iloc[0:19, 1:6]
 56 |         X = sm.add_constant(X)
 57 |         y = Size_Inv[(Size_Inv['row_count'] == i) & (Size_Inv['col_count'] == j)].iloc[:, 5]
 58 |         y = y.reset_index(drop=True)
 59 |         model = sm.OLS(y, X)
 60 |         results = model.fit()
 61 |         regression_Inv = regression_Inv.append([{'Size': i, 'Inv': j, 'Const': results.params[0],
 62 |                                                  'Mktrf': results.params[1], 'SMB': results.params[2],
 63 |                                                  'HML': results.params[3], 'RMW': results.params[4],
 64 |                                                  'CMA': results.params[5], 'R-squared': results.rsquared}])
 65 |         regression_Inv_pvalue = regression_Inv_pvalue.append([{'Size': i, 'Inv': j, 'Const_p': results.pvalues[0],
 66 |                                                                'Mktrf_p': results.pvalues[1],
 67 |                                                                'SMB_p': results.pvalues[2], 'HML_p': results.pvalues[3],
 68 |                                                                'RMW_p': results.pvalues[4],
 69 |                                                                'CMA_p': results.pvalues[5]}])
 70 |         regression_Inv_tvalue = regression_Inv_tvalue.append([{'Size': i, 'Inv': j, 'Const_t': results.tvalues[0],
 71 |                                                                'Mktrf_t': results.tvalues[1], 'SMB_t': results.tvalues[2],
 72 |                                                                'HML_t': results.tvalues[3], 'RMW_t': results.tvalues[4],
 73 |                                                                'CMA_t': results.tvalues[5]}])
 74 | # Five-factor regression on the OP-Size portfolios
 75 | regression_OP = pd.DataFrame(columns=['Size', 'OP', 'Const', 'Mktrf', 'SMB', 'HML', 'RMW', 'CMA', 'R-squared'])
 76 | regression_OP_pvalue = pd.DataFrame(columns=['Size', 'OP', 'Const_p', 'Mktrf_p', 'SMB_p', 'HML_p', 'RMW_p', 'CMA_p'])
 77 | regression_OP_tvalue = pd.DataFrame(columns=['Size', 'OP', 'Const_t', 'Mktrf_t', 'SMB_t', 'HML_t', 'RMW_t', 'CMA_t'])
 78 | x_ = Size_OP['row_count'].value_counts().shape[0]
 79 | y_ = Size_OP['col_count'].value_counts().shape[0]
 80 | for i in range(1, x_ + 1):
 81 |     for j in range(1, y_ + 1):
 82 |         X = data_ff5.iloc[0:19, 1:6]
 83 |         X = sm.add_constant(X)
 84 |         y = Size_OP[(Size_OP['row_count'] == i) & (Size_OP['col_count'] == j)].iloc[:, 5]
 85 |         y = y.reset_index(drop=True)
 86 |         model = sm.OLS(y, X)
 87 |         results = model.fit()
 88 |         regression_OP = regression_OP.append([{'Size': i, 'OP': j, 'Const': results.params[0],
 89 |                                                'Mktrf': results.params[1], 'SMB': results.params[2],
 90 |                                                'HML': results.params[3], 'RMW': results.params[4],
 91 |                                                'CMA': results.params[5], 'R-squared': results.rsquared}])
 92 |         regression_OP_pvalue = regression_OP_pvalue.append([{'Size': i, 'OP': j, 'Const_p': results.pvalues[0],
 93 |                                                              'Mktrf_p': results.pvalues[1], 'SMB_p': results.pvalues[2],
 94 |                                                              'HML_p': results.pvalues[3], 'RMW_p': results.pvalues[4],
 95 |                                                              'CMA_p': results.pvalues[5]}])
 96 |         regression_OP_tvalue = regression_OP_tvalue.append([{'Size': i, 'OP': j, 'Const_t': results.tvalues[0],
 97 |                                                                  'Mktrf_t': results.tvalues[1], 'SMB_t': results.tvalues[2],
 98 |                                                                  'HML_t': results.tvalues[3], 'RMW_t': results.tvalues[4],
 99 |                                                                  'CMA_t': results.tvalues[5]}])
100 | 
101 | if not os.path.exists("regression_result/"):
102 |     os.mkdir("regression_result/")
103 | # Save the result tables
104 | regression_BM.to_csv('regression_result/regression_BM.csv', index=False)
105 | regression_BM_pvalue.to_csv('regression_result/regression_BM_pvalue.csv', index=False)
106 | regression_BM_tvalue.to_csv('regression_result/regression_BM_tvalue.csv', index=False)
107 | regression_Inv.to_csv('regression_result/regression_Inv.csv', index=False)
108 | regression_Inv_pvalue.to_csv('regression_result/regression_Inv_pvalue.csv', index=False)
109 | regression_Inv_tvalue.to_csv('regression_result/regression_Inv_tvalue.csv', index=False)
110 | regression_OP.to_csv('regression_result/regression_OP.csv', index=False)
111 | regression_OP_pvalue.to_csv('regression_result/regression_OP_pvalue.csv', index=False)
112 | regression_OP_tvalue.to_csv('regression_result/regression_OP_tvalue.csv', index=False)
113 | 
114 | 
115 | # Regress each factor on the other four factors
116 | regression_factors = pd.DataFrame(columns=['DepVar','Const','Mktrf','SMB','HML','RMW','CMA','R-squared'])
117 | regression_factors_tvalue = pd.DataFrame(columns=['DepVar','Const_t','Mktrf_t','SMB_t','HML_t','RMW_t','CMA_t'])
118 | regression_factors_pvalue = pd.DataFrame(columns=['DepVar','Const_p','Mktrf_p','SMB_p','HML_p','RMW_p','CMA_p'])
119 | # Dependent variable: Mktrf
120 | y = data_ff5.loc[:,['Mktrf']]
121 | X = data_ff5.loc[:,['SMB','HML','RMW','CMA']]
122 | X = sm.add_constant(X)
123 | model = sm.OLS(y,X)
124 | results = model.fit()
125 | regression_factors = regression_factors.append([{'DepVar': 'Mktrf','Const': results.params[0],
126 |                                                  'Mktrf': 0, 'SMB': results.params[1],
127 |                                                  'HML': results.params[2], 'RMW': results.params[3],
128 |                                                  'CMA': results.params[4], 'R-squared': results.rsquared}])
129 | regression_factors_tvalue = regression_factors_tvalue.append([{'DepVar': 'Mktrf','Const_t': results.tvalues[0],
130 |                                                                'Mktrf_t': 0, 'SMB_t': results.tvalues[1],
131 |                                                                'HML_t': results.tvalues[2], 'RMW_t': results.tvalues[3],
132 |                                                                'CMA_t': results.tvalues[4]}])
133 | regression_factors_pvalue = regression_factors_pvalue.append([{'DepVar': 'Mktrf','Const_p': results.pvalues[0],
134 |                                                                'Mktrf_p': 0, 'SMB_p': results.pvalues[1],
135 |                                                                'HML_p': results.pvalues[2], 'RMW_p': results.pvalues[3],
136 |                                                                'CMA_p': results.pvalues[4]}])
137 | # Dependent variable: SMB
138 | y = data_ff5.loc[:,['SMB']]
139 | X = data_ff5.loc[:,['Mktrf','HML','RMW','CMA']]
140 | X = sm.add_constant(X)
141 | model = sm.OLS(y,X)
142 | results = model.fit()
143 | regression_factors = regression_factors.append([{'DepVar': 'SMB','Const': results.params[0],
144 |                                                  'Mktrf': results.params[1], 'SMB': 0,
145 |                                                  'HML': results.params[2], 'RMW': results.params[3],
146 |                                                  'CMA': results.params[4], 'R-squared': results.rsquared}])
147 | regression_factors_tvalue = regression_factors_tvalue.append([{'DepVar': 'SMB','Const_t': results.tvalues[0],
148 |                                                                'Mktrf_t': results.tvalues[1], 'SMB_t': 0,
149 |                                                                'HML_t': results.tvalues[2], 'RMW_t': results.tvalues[3],
150 |                                                                'CMA_t': results.tvalues[4]}])
151 | regression_factors_pvalue = regression_factors_pvalue.append([{'DepVar': 'SMB','Const_p': results.pvalues[0],
152 |                                                                'Mktrf_p': results.pvalues[1], 'SMB_p': 0,
153 |                                                                'HML_p': results.pvalues[2], 'RMW_p': results.pvalues[3],
154 |                                                                'CMA_p': results.pvalues[4]}])
155 | # Dependent variable: HML
156 | y = data_ff5.loc[:,['HML']]
157 | X = data_ff5.loc[:,['Mktrf','SMB','RMW','CMA']]
158 | X = sm.add_constant(X)
159 | model = sm.OLS(y,X)
160 | results = model.fit()
161 | regression_factors = regression_factors.append([{'DepVar': 'HML','Const': results.params[0],
162 |                                                  'Mktrf': results.params[1], 'SMB': results.params[2],
163 |                                                  'HML': 0, 'RMW': results.params[3],
164 |                                                  'CMA': results.params[4], 'R-squared': results.rsquared}])
165 | regression_factors_tvalue = regression_factors_tvalue.append([{'DepVar': 'HML','Const_t': results.tvalues[0],
166 |                                                                'Mktrf_t': results.tvalues[1], 'SMB_t': results.tvalues[2],
167 |                                                                'HML_t': 0, 'RMW_t': results.tvalues[3],
168 |                                                                'CMA_t': results.tvalues[4]}])
169 | regression_factors_pvalue = regression_factors_pvalue.append([{'DepVar': 'HML','Const_p': results.pvalues[0],
170 |                                                                'Mktrf_p': results.pvalues[1], 'SMB_p': results.pvalues[2],
171 |                                                                'HML_p': 0, 'RMW_p': results.pvalues[3],
172 |                                                                'CMA_p': results.pvalues[4]}])
173 | # Dependent variable: RMW
174 | y = data_ff5.loc[:,['RMW']]
175 | X = data_ff5.loc[:,['Mktrf','SMB','HML','CMA']]
176 | X = sm.add_constant(X)
177 | model = sm.OLS(y,X)
178 | results = model.fit()
179 | regression_factors = regression_factors.append([{'DepVar': 'RMW','Const': results.params[0],
180 |                                                  'Mktrf': results.params[1], 'SMB': results.params[2],
181 |                                                  'HML': results.params[3], 'RMW': 0,
182 |                                                  'CMA': results.params[4], 'R-squared': results.rsquared}])
183 | regression_factors_tvalue = regression_factors_tvalue.append([{'DepVar': 'RMW','Const_t': results.tvalues[0],
184 |                                                                'Mktrf_t': results.tvalues[1], 'SMB_t': results.tvalues[2],
185 |                                                                'HML_t': results.tvalues[3], 'RMW_t': 0,
186 |                                                                'CMA_t': results.tvalues[4]}])
187 | regression_factors_pvalue = regression_factors_pvalue.append([{'DepVar': 'RMW','Const_p': results.pvalues[0],
188 |                                                                'Mktrf_p': results.pvalues[1], 'SMB_p': results.pvalues[2],
189 |                                                                'HML_p': results.pvalues[3], 'RMW_p': 0,
190 |                                                                'CMA_p': results.pvalues[4]}])
191 | # Dependent variable: CMA
192 | y = data_ff5.loc[:,['CMA']]
193 | X = data_ff5.loc[:,['Mktrf','SMB','HML','RMW']]
194 | X = sm.add_constant(X)
195 | model = sm.OLS(y,X)
196 | results = model.fit()
197 | # params order follows X: [const, Mktrf, SMB, HML, RMW]
198 | regression_factors = regression_factors.append([{'DepVar': 'CMA','Const': results.params[0],
199 |                                                  'Mktrf': results.params[1], 'SMB': results.params[2],
200 |                                                  'HML': results.params[3], 'RMW': results.params[4],
201 |                                                  'CMA': 0, 'R-squared': results.rsquared}])
202 | regression_factors_tvalue = regression_factors_tvalue.append([{'DepVar': 'CMA','Const_t': results.tvalues[0],
203 |                                                                'Mktrf_t': results.tvalues[1], 'SMB_t': results.tvalues[2],
204 |                                                                'HML_t': results.tvalues[3], 'RMW_t': results.tvalues[4],
205 |                                                                'CMA_t': 0}])
206 | regression_factors_pvalue = regression_factors_pvalue.append([{'DepVar': 'CMA','Const_p': results.pvalues[0],
207 |                                                                'Mktrf_p': results.pvalues[1], 'SMB_p': results.pvalues[2],
208 |                                                                'HML_p': results.pvalues[3], 'RMW_p': results.pvalues[4],
209 |                                                                'CMA_p': 0}])
209 | 
210 | regression_factors.to_csv('regression_result/regression_factors.csv', index=False)
211 | regression_factors_tvalue.to_csv('regression_result/regression_factors_tvalue.csv', index=False)
212 | regression_factors_pvalue.to_csv('regression_result/regression_factors_pvalue.csv', index=False)


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas==1.1.4
2 | statsmodels==0.12.1
3 | numpy==1.22.0
4 | 


--------------------------------------------------------------------------------