├── LICENSE
├── README.md
└── secdef_parser.py


/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2020 Databento, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# secdef-parser

This project contains a command line tool that demonstrates how to parse secdef
files found on CME's public FTP server (ftp://ftp.cmegroup.com/SBEFix/Production/).
For the purpose of this demonstration, the tool generates a list of the most
active futures products ranked by open interest.


# About secdef files

The secdef file contains production instrument definitions and settlement
prices as concatenated plaintext FIX messages. Each message is on its own line,
and each `tag=value` field ends with the SOH (`\x01`) character.
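
The following sketch makes the format concrete by splitting one secdef line into tag/value pairs. The sample line is made up for illustration and is not real market data. `parse_fix_line` is a hypothetical helper that follows the same idea as `secdef_parser.py` without reproducing its code. It assumes that each field is a `tag=value` pair terminated by SOH.

```python
SOH = '\x01'  # FIX field delimiter

# The tags secdef_parser.py extracts, with the friendly names it uses
SECDEF_MAP = {'207': 'SecurityExchange', '1151': 'SecurityGroup',
              '55': 'Symbol', '167': 'SecurityType',
              '462': 'UnderlyingProduct', '5792': 'OpenInterestQty'}


def parse_fix_line(line):
    """Hypothetical helper: split one FIX message into {tag: value}."""
    fields = line.rstrip('\r\n').rstrip(SOH).split(SOH)
    # partition() splits on the first '=' only, so values may contain '='
    pairs = (field.partition('=') for field in fields)
    # Drop fragments without '=' (e.g. an empty line)
    return {tag: value for tag, sep, value in pairs if sep}


# Made-up, abbreviated record (35=d is the FIX Security Definition type)
sample = SOH.join(['35=d', '207=XCME', '1151=ES', '55=ESM0',
                   '167=FUT', '462=5', '5792=123456']) + SOH

fields = parse_fix_line(sample)
# Keep only the tags of interest, renamed to friendly names
row = {SECDEF_MAP[tag]: value for tag, value in fields.items()
       if tag in SECDEF_MAP}
print(row['SecurityGroup'], row['OpenInterestQty'])  # prints: ES 123456
```

In this sample, `462=5` would map to `Equity` through the tool's `ASSET_CLASS_MAP`.
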
You can learn more about the message specifications [here](https://www.cmegroup.com/confluence/display/EPICSANDBOX/MDP+3.0+-+Security+Definition).
This is especially useful if you want to parse fields from the secdef files
other than the ones this tool extracts.

CME updates its secdef files almost daily. However, you should not rely on this
approach for intraday instrument definition updates.

More information about secdef files can be found at: ftp://ftp.cmegroup.com/SBEFix/Production/secdef_disclaimer.txt


# Requirements

Required:
- Python >= 3.4
- pandas >= 0.24.2

Recommended:
- Minimum 4 GB free memory. secdef files often contain 400k+ instrument
  definitions, which come to 300+ MB uncompressed. This tool is not optimized
  for low-memory devices. It uses a substantial amount of memory so that it can
  parse the secdef files relatively quickly.


# Installation

Clone this repository.

```
git clone https://github.com/databento/secdef-parser.git
cd secdef-parser
```

# Usage

By default, the parser looks for `secdef.dat.gz` in your current working
directory and uses it as the input secdef file. It writes the output as
comma-separated values to `list.csv` in the same directory as `secdef_parser.py`.
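
The input can be either `secdef.dat.gz` or a decompressed `secdef.dat`, and the full file is large. If memory is tight, you can instead stream the file one line at a time and keep only running totals per group. `secdef_parser.py` does not work this way. The sketch below is a minimal example built on a hypothetical helper, `stream_open_interest`. It assumes the same tags and filters as the tool: rows missing any extracted field are skipped, and only rows whose `SecurityType` (tag 167) is `FUT` are counted.

```python
import gzip
from collections import Counter

SOH = '\x01'  # FIX field delimiter

# Tags the tool requires on every row: exchange, group, symbol,
# security type, underlying product (asset class) and open interest
REQUIRED_TAGS = {'207', '1151', '55', '167', '462', '5792'}


def stream_open_interest(path):
    """Hypothetical helper: sum futures open interest (tag 5792) per
    (exchange, security group, asset class code), one line at a time.
    """
    # Like the tool, treat '.gz' files as compressed, others as plain text
    opener = gzip.open if path.endswith('.gz') else open
    totals = Counter()
    with opener(path, 'rt', encoding='utf8') as f:
        for line in f:
            fields = {}
            for field in line.rstrip('\r\n').rstrip(SOH).split(SOH):
                tag, sep, value = field.partition('=')
                if sep and tag in REQUIRED_TAGS:
                    fields[tag] = value
            # Skip rows missing any required tag, and anything but futures
            if len(fields) < len(REQUIRED_TAGS) or fields['167'] != 'FUT':
                continue
            key = (fields['207'], fields['1151'], fields['462'])
            totals[key] += int(fields['5792'])
    return totals
```

For example, `stream_open_interest('secdef.dat.gz').most_common(40)` returns the 40 largest groups as `((exchange, group, asset_class_code), open_interest)` tuples. Memory use then grows with the number of security groups rather than the number of instruments. The asset class codes can afterwards be translated with the tool's `ASSET_CLASS_MAP`.
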

First, download the latest copy of the secdef file
(ftp://ftp.cmegroup.com/SBEFix/Production/secdef.dat.gz). Then run the parser:

```bash
# Run with default input and output
$ python secdef_parser.py
```

You can override the defaults:

```bash
# Parse a regular (compressed) secdef file
$ python secdef_parser.py -i secdef.dat.gz

# Alternatively, parse a decompressed secdef file
$ python secdef_parser.py -i secdef.dat

# Write to a specific location
$ python secdef_parser.py -o path/to/my/output.csv
```

For convenience, the tool can also download the secdef file directly from
CME's FTP server. `--download` cannot be combined with `-i`.

```bash
$ python secdef_parser.py --download -o path/to/my/output.csv
```


# Results

Here are the top 40 most active futures products on May 11, 2020, ranked by
aggregate open interest.

```
SecurityExchange,SecurityGroup,UnderlyingProduct,OpenInterestQty
XCME,GE,Interest Rate,10743955
XCBT,ZF,Interest Rate,3618711
XCBT,ZN,Interest Rate,3391115
XCME,ES,Equity,3265839
XNYM,CL,Energy,3003110
XCBT,ZT,Interest Rate,2447405
XNYM,NG,Energy,2217482
XCBT,ZS,Commodity/Agriculture,1729122
XCBT,ZQ,Interest Rate,1719442
XCBT,ZC,Commodity/Agriculture,1395498
XNYM,PW,Energy,1131339
XCBT,ZU,Interest Rate,1061430
XCBT,ZB,Interest Rate,1002804
XCBT,Z1,Interest Rate,888267
XNYM,CC,Energy,761722
XCME,6E,Currency,562127
XCME,RY,Equity,531217
XCEC,GC,Metals,526345
XCME,SS,Interest Rate,430569
XCBT,ZW,Commodity/Agriculture,346276
XNYM,HX,Energy,282184
XCME,LE,Commodity/Agriculture,267117
XCME,NQ,Equity,243555
XNYM,OP,Energy,232991
XNYM,ZZ,Other,226829
XCME,0B,Equity,226530
XKLS,BC,Commodity/Agriculture,216524
XCBT,KE,Commodity/Agriculture,212212
XCME,HE,Commodity/Agriculture,200599
XNYM,PT,Energy,193133
XCEC,HG,Metals,177534
XCME,6B,Currency,163511
XCME,6J,Currency,160269
XCME,SD,Equity,146997
XCEC,SI,Metals,141423
XCME,6A,Currency,138974
XNYM,GS,Energy,130863
XCME,6C,Currency,120412
XCME,MS,Equity,111053
XCME,SP,Equity,103459
```

To look up the SecurityGroup codes, use the [CME Product Slate](https://www.cmegroup.com/trading/products/).
Not surprisingly, Eurodollar futures (GE) are by far the most active outright
futures. They are followed by US Treasury futures (ZF, ZN), E-mini S&P 500
futures (ES) and crude oil (CL).


# Historical secdef files

For historical secdef data, Databento hosts a free batch of secdef files from
Dec 2019 [here](https://s3.amazonaws.com/databento.com/samples/sample-cme-secdef-201912.zip).
To learn more about Databento, visit us at [https://databento.com/](https://databento.com).


# Release notes

*0.1.0*
- Initial release


# License

This project is licensed and made available under the terms of the MIT
License. See the contained `LICENSE` file for specific language.

--------------------------------------------------------------------------------
/secdef_parser.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
"""secdef_parser.py: Tool to parse secdef and find most active instruments"""

import os
import sys
import urllib.request as request
import gzip
from argparse import ArgumentParser
from contextlib import closing

import pandas as pd


VERSION = '0.1.0'

SECDEF_URL = "ftp://ftp.cmegroup.com/SBEFix/Production/secdef.dat.gz"

# FIX tags extracted from each secdef message, mapped to friendly names
SECDEF_MAP = {'207': 'SecurityExchange',
              '1151': 'SecurityGroup',
              '55': 'Symbol',
              '167': 'SecurityType',
              '462': 'UnderlyingProduct',
              '5792': 'OpenInterestQty'}

# Values of tag 462 (UnderlyingProduct), mapped to asset class names
ASSET_CLASS_MAP = {'2': 'Commodity/Agriculture',
                   '4': 'Currency',
                   '5': 'Equity',
                   '12': 'Other',
                   '14': 'Interest Rate',
                   '15': 'FX Cash',
                   '16': 'Energy',
                   '17': 'Metals'}

DEFAULT_INPUT = 'secdef.dat.gz'
DEFAULT_OUTPUT = 'list.csv'

secdef_keys = SECDEF_MAP.keys()


def download_secdef():
    """
    Downloads secdef file from CME FTP server into in-memory object
    """
    print("Downloading secdef file...")

    try:
        with closing(request.urlopen(SECDEF_URL)) as response:
            content_raw = response.read()
    except Exception as e:
        print(e)
        print("Unable to access secdef from CME FTP server")
        sys.exit(1)

    print("Download complete, decompressing...")

    content = gzip.decompress(content_raw)
    return content.decode('utf8')


def process_row(line):
    """
    Extracts only the fields we are interested in from one FIX message
    (tag=value fields separated by SOH, 0x01) and converts them to
    key-value representation
    """
    fields = (field.partition('=')
              for field in line.rstrip('\x01').split('\x01'))
    # partition() splits on the first '=' only, so values may contain '='
    return {tag: value for tag, sep, value in fields
            if sep and tag in secdef_keys}


def parse_secdef(secdef_raw):
    """
    Parses raw, uncompressed secdef data and generates cleaned dataframe
    with aggregate open interest quantity
    """

    print("Parsing secdef data... this could take a minute...")

    # Split by row, remove empty rows and parse each FIX message into a
    # dict holding only the fields we need
    data = [process_row(line) for line in secdef_raw.split('\n') if line]

    # Generate dataframe
    df = pd.DataFrame(data)

    # Convert fields to friendly names
    df.rename(columns=SECDEF_MAP, inplace=True)

    # Rearrange columns and remove rows missing any of them
    df = df[['SecurityExchange',
             'SecurityGroup',
             'Symbol',
             'SecurityType',
             'UnderlyingProduct',
             'OpenInterestQty']].dropna()

    # Convert str quantities to integer type
    df = df.astype({'OpenInterestQty': int})

    # Remove non-futures instruments
    df = df[df['SecurityType'] == 'FUT']

    # Aggregate open interest by security group; selecting the column
    # explicitly keeps the string columns out of the sum
    agg = df.groupby(['SecurityExchange', 'SecurityGroup',
                      'UnderlyingProduct'],
                     as_index=False)['OpenInterestQty'].sum()

    return agg


def main():

    # Directory containing this script, used for the default output path
    this_file_dir = os.path.dirname(os.path.realpath(__file__))

    # Parse arguments
    parser = ArgumentParser(description="Tool to parse secdef and find most "
                                        "active instruments")

    # -i and --download are mutually exclusive input sources
    group = parser.add_mutually_exclusive_group(required=False)
    group.add_argument('-i',
                       type=str,
                       default=DEFAULT_INPUT,
                       metavar='SECDEF_FILE',
                       help="Input secdef.dat or secdef.dat.gz file "
                            "(default=secdef.dat.gz)",
                       dest='input')
    group.add_argument('-d',
                       '--download',
                       action='store_true',
                       default=False,
                       help="Download data from CME FTP server (default=off)",
                       dest='download')

    parser.add_argument('-o',
                        type=str,
                        default=os.path.join(this_file_dir, DEFAULT_OUTPUT),
                        help="Output CSV file listing most active instruments "
                             "(default=list.csv next to this script)",
                        dest='output')

    parser.add_argument('--version',
                        action='store_true',
                        default=False,
                        help="Prints version number",
                        dest='version')

    args = parser.parse_args()

    if args.version:
        print(VERSION)
        sys.exit(0)

    if args.download:
        secdef_raw = download_secdef()

    else:
        # Error-checking: -i
        if not os.path.isfile(args.input):
            print("Unable to find specified input secdef file")
            sys.exit(1)

        _, ext = os.path.splitext(os.path.basename(args.input))

        if ext == '.gz':
            # Expect compressed secdef file
            with gzip.open(args.input, 'rb') as f:
                secdef_raw = f.read().decode('utf8')

        else:
            # Expect uncompressed secdef file
            with open(args.input, 'r', encoding='utf8') as f:
                secdef_raw = f.read()

    try:
        agg = parse_secdef(secdef_raw)
    except Exception as e:
        print(e)
        print("Failed to parse secdef file, check if your file is corrupted")
        sys.exit(1)

    # Make output pretty: asset class names, largest open interest first
    agg['UnderlyingProduct'] = agg['UnderlyingProduct'].map(ASSET_CLASS_MAP)
    agg.sort_values(by='OpenInterestQty', ascending=False, inplace=True)

    try:
        agg.to_csv(args.output, index=False)
    except Exception as e:
        print(e)
        print("Failed to write output file, check if directory exists")
        sys.exit(1)


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------