├── .gitignore
├── docs
    └── img
    │   ├── top-10-holdings.png
    │   ├── autoclassified-regions.png
    │   ├── autoclassified-sectors.png
    │   └── autoclassified-stock-style.png
├── requirements.txt
├── isin2secid.json
├── readme.md
└── portfolio-classifier.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.py[cod]
2 | *.sqlite


--------------------------------------------------------------------------------
/docs/img/top-10-holdings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfons1Qto12/pp-portfolio-classifier/HEAD/docs/img/top-10-holdings.png


--------------------------------------------------------------------------------
/docs/img/autoclassified-regions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfons1Qto12/pp-portfolio-classifier/HEAD/docs/img/autoclassified-regions.png


--------------------------------------------------------------------------------
/docs/img/autoclassified-sectors.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfons1Qto12/pp-portfolio-classifier/HEAD/docs/img/autoclassified-sectors.png


--------------------------------------------------------------------------------
/docs/img/autoclassified-stock-style.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfons1Qto12/pp-portfolio-classifier/HEAD/docs/img/autoclassified-stock-style.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | Jinja2==2.11.2
2 | requests==2.32.3
3 | requests-cache==0.5.2
4 | jsonpath-ng==1.5.3
5 | markupsafe==1.1.1
6 | beautifulsoup4==4.9.3
7 | 


--------------------------------------------------------------------------------
/isin2secid.json:
--------------------------------------------------------------------------------
1 | {
2 |  "IE00B8FHGS14": "0P0000Y2A1|etf|de",
3 |  "IE00BP3QZ601": "0P00014G96|etf|de",
4 |  "IE00BP3QZ825": "0P00014G97|etf|de",
5 |  "IE00BP3QZB59": "0P00014G99|etf|de",
6 |  "IE00BP3QZD73": "0P00014G98|etf|de"
7 | }


--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
 1 | # pp-portfolio-classifier
 2 | 
 3 | 
 4 | Python script that automatically classifies funds/ETFs (for stocks) managed in [Portfolio Performance](https://www.portfolio-performance.info/) files by the asset types, sectors, regions, and countries they are invested in. Furthermore it determines the Top 10 holdings of each fund. The classifier uses the information from Morningstar as a data source for classification.
 5 | 
 6 | Based on the script by fbuchinger and fizban99.
 7 | 
 8 | This version of the script contains a major modification and additional features. Instead of creating taxonomies anew, it updates existing ones (if they exist). This has the advantage that e.g. colours and balancing weights set by the user are maintained. If you want to preserve the earlier versions, please duplicate and rename them before running the script. If you want taxonomies to be created from scratch as in the original version, please delete or rename them before running the script. Note that the script keeps previously obtained classifications of a fund/ETF, if the instrument is inactive or if no information can be retrieved from Morningstar for the corresponding taxonomy. Note also that holdings are kept as classifications in the taxonomy 'Holding' even if they are no longer associated with any security.
 9 | 
10 | Furthermore, there are the following improvements/features: Script now retrieves data for all active funds/ETFs in the file (not just for those having transactions). Script avoids category entries with zero weight. Script tries to round total sum of a taxonomy to 100% (or less) when it slightly exceeds 100%. Script ignores negative weights and rounds individual weight of a category down to 100%, if it exceeds 100%. Script is more verbose and informs user more about its activities. Script dumps the retrieved data into pp_data_fetched.csv (which is overwritten in each run).
11 | 
12 | Further addition: Script now supports a mechanism to retrieve classification for funds/ETFs from an alternative ISIN. It is used when Morningstar data for the native ISIN does not contain classification for a taxonomy. User needs to add #PPC:[ISIN2=*XY0011223344*] with the desired ISIN value to note field of the security in PP (besides other content). (This does not work for individual stocks.)
13 | 
14 | Latest addition (Oct 2024): Script now also tries to retrieve classifications for stocks when `-stocks` is added to command line.
15 |  
16 | ## Warnings & Known Issues
17 | - Experimental software - use with caution! 
18 | - Check the [Portfolio Performance Forum thread](https://forum.portfolio-performance.info/t/automatic-import-of-classifications/14672)
19 | - New version of the script might use different colours than the original version when it needs to assign them. (Exiting colour assignments are anyway kept when a taxonomy is updated).
20 | - This version updates the name of the geographic region "Europe Emerging". So if you run the script on an xml from a previous version, this category will be recreated from scratch and colours, balancing weights, etc. will not be maintained. (If you want to keep them, rename it manually from "Europe Emerging / Russia" to "Europe Emerging" before you run the script).
21 | - If you have issues with fetching data, try deleting the files cache.sqlite. Sometimes this helps :-).
22 | 
23 | ## Installation
24 | requires Python 3, git and Portfolio Performance.
25 | Steps:
26 | 1. `git clone` this repository
27 | 2. in the install directory run `pip3 install -r requirements.txt`
28 | 3. test the script by running either `python portfolio-classifier.py test/multifaktortest.xml` or `python portfolio-classifier.py test/multifaktortest.xml -stocks`. (The latter also updates the stocks included in the the xml file). Then open the resulting file `pp_classified.xml` in Portfolio Performance.
29 | 
30 | ## How it works:
31 | 
32 | **Important: Never try this script on your original Portfolio Performance files -> risk of data loss. Always make a copy first that is safe to play around with or create a dummy portfolio like in test folder.**
33 | 
34 | 1. In Portfolio Performance, save a copy of your portfolio file as unencrypted XML. The script won't work with any other format (i.e. it also doesn't work with the more recent 'XML with "id" attributes' format of Portfolio Performance).
35 | 2. The MSID is the value of the attribute is the code at the end of the Morningstar url of the security (the id of length 10 after the  "?id=", something like 0P00012345). The script will try to get it from the Morningstar website, but the script might have to be configured with the domain of your country, since not all securities area available in all countries. The domain is only important for the translation from ISIN to secid. Once the secid (aka MSID) is obtained, the Morningstar APIs are country-independent. The script caches the mapping between the ISIN and the secid plus the security id type and the domain of the security into a file called isin2secid.json in order to reduce the number of requests.
36 | 3. Run the script `python portfolio-classifier.py <input_file> [<output_file>] [-d domain] [-stocks]` If output file is not specified, a file called pp_classified.xml will be created. If domain is not specified, 'de' will be used for morningstar.de. This is only used to retrieve the corresponding internal Morningstar id (MSID) for each ISIN.
37 | 4. open pp_classified.xml (or the given output_file name) in Portfolio Performance and check out the modified or added taxonomies and classifications.
38 | 
39 | 
40 | ## Gallery
41 | 
42 | ### Autoclassified stock-style
43 | <img src="docs/img/autoclassified-stock-style.png" alt="Autoclassified Security types" width="600"/>
44 | 
45 | 
46 | 
47 | ### Autoclassified Regions
48 | <img src="docs/img/autoclassified-regions.png" alt="Autoclassified Regions" width="600"/>
49 | 
50 | 
51 | 
52 | ### Autoclassified Sectors
53 | <img src="docs/img/autoclassified-sectors.png" alt="Autoclassified Sectors" width="600"/>
54 | 
55 | 
56 | 
57 | ### List of stocks and holdings from Top 10 of each fund
58 | <img src="docs/img/top-10-holdings.png" alt="Holdings from Top 10" width="600"/>
59 | 


--------------------------------------------------------------------------------
/portfolio-classifier.py:
--------------------------------------------------------------------------------
   1 | import xml.etree.ElementTree as ET
   2 | from xml.sax.saxutils import escape
   3 | import uuid
   4 | import argparse
   5 | import re
   6 | from jsonpath_ng.ext import parse
   7 | from typing import NamedTuple
   8 | from itertools import cycle
   9 | from collections import defaultdict
  10 | from jinja2 import Environment, BaseLoader
  11 | import requests
  12 | import requests_cache
  13 | from bs4 import BeautifulSoup 
  14 | import os
  15 | import json
  16 | 
  17 | 
  18 | requests_cache.install_cache(expire_after=60) #cache downloaded files for 1 minute
  19 | requests_cache.remove_expired_responses()
  20 | 
  21 | 
  22 | COLORS = [
  23 |   "#C0B0A0",
  24 |   "#CD9575",
  25 |   "#FDD9B5",
  26 |   "#78DBE2",
  27 |   "#87A96B",
  28 |   "#FFA474",
  29 |   "#FAE7B5",
  30 |   "#9F8170",
  31 |   "#FD7C6E",
  32 |   "#000000",
  33 |   "#ACE5EE",
  34 |   "#1F75FE",
  35 |   "#A2A2D0",
  36 |   "#6699CC",
  37 |   "#0D98BA",
  38 |   "#7366BD",
  39 |   "#DE5D83",
  40 |   "#CB4154",
  41 |   "#B4674D",
  42 |   "#FF7F49",
  43 |   "#EA7E5D",
  44 |   "#B0B7C6",
  45 |   "#FFFF99",
  46 |   "#1CD3A2",
  47 |   "#FFAACC",
  48 |   "#DD4492",
  49 |   "#1DACD6",
  50 |   "#BC5D58",
  51 |   "#DD9475",
  52 |   "#9ACEEB",
  53 |   "#FFBCD9",
  54 |   "#FDDB6D",
  55 |   "#2B6CC4",
  56 |   "#EFCDB8",
  57 |   "#6E5160",
  58 |   "#CEFF1D",
  59 |   "#71BC78",
  60 |   "#6DAE81",
  61 |   "#C364C5",
  62 |   "#CC6666",
  63 |   "#E7C697",
  64 |   "#FCD975",
  65 |   "#A8E4A0",
  66 |   "#95918C",
  67 |   "#1CAC78",
  68 |   "#1164B4",
  69 |   "#F0E891",
  70 |   "#FF1DCE",
  71 |   "#B2EC5D",
  72 |   "#5D76CB",
  73 |   "#CA3767",
  74 |   "#3BB08F",
  75 |   "#FEFE22",
  76 |   "#FCB4D5",
  77 |   "#FFF44F",
  78 |   "#FFBD88",
  79 |   "#F664AF",
  80 |   "#AAF0D1",
  81 |   "#CD4A4C",
  82 |   "#EDD19C",
  83 |   "#979AAA",
  84 |   "#FF8243",
  85 |   "#C8385A",
  86 |   "#EF98AA",
  87 |   "#FDBCB4",
  88 |   "#1A4876",
  89 |   "#30BA8F",
  90 |   "#C54B8C",
  91 |   "#1974D2",
  92 |   "#FFA343",
  93 |   "#BAB86C",
  94 |   "#FF7538",
  95 |   "#FF2B2B",
  96 |   "#F8D568",
  97 |   "#E6A8D7",
  98 |   "#414A4C",
  99 |   "#FF6E4A",
 100 |   "#1CA9C9",
 101 |   "#FFCFAB",
 102 |   "#C5D0E6",
 103 |   "#FDDDE6",
 104 |   "#158078",
 105 |   "#FC74FD",
 106 |   "#F78FA7",
 107 |   "#8E4585",
 108 |   "#7442C8",
 109 |   "#9D81BA",
 110 |   "#FE4EDA",
 111 |   "#FF496C",
 112 |   "#D68A59",
 113 |   "#714B23",
 114 |   "#FF48D0",
 115 |   "#E3256B",
 116 |   "#EE204D",
 117 |   "#FF5349",
 118 |   "#C0448F",
 119 |   "#1FCECB",
 120 |   "#7851A9",
 121 |   "#FF9BAA",
 122 |   "#FC2847",
 123 |   "#76FF7A",
 124 |   "#9FE2BF",
 125 |   "#A5694F",
 126 |   "#8A795D",
 127 |   "#45CEA2",
 128 |   "#FB7EFD",
 129 |   "#CDC5C2",
 130 |   "#80DAEB",
 131 |   "#ECEABE",
 132 |   "#FFCF48",
 133 |   "#FD5E53",
 134 |   "#FAA76C",
 135 |   "#18A7B5",
 136 |   "#EBC7DF",
 137 |   "#FC89AC",
 138 |   "#DBD7D2",
 139 |   "#17806D",
 140 |   "#DEAA88",
 141 |   "#77DDE7",
 142 |   "#FFFF66",
 143 |   "#926EAE",
 144 |   "#324AB2",
 145 |   "#F75394",
 146 |   "#FFA089",
 147 |   "#8F509D",
 148 |   "#FFFFFF",
 149 |   "#A2ADD0",
 150 |   "#FF43A4",
 151 |   "#FC6C85",
 152 |   "#CDA4DE",
 153 |   "#FCE883",
 154 |   "#C5E384",
 155 |   "#FFAE42"
 156 | ]
 157 | 
 158 | 
 159 | taxonomies = {'Asset-Type': {'url': 'https://www.emea-api.morningstar.com/sal/sal-service/fund/process/asset/v2/{secid}/data',
 160 |                              'component': 'sal-components-mip-asset-allocation',
 161 |                              'jsonpath': '$.allocationMap',                                              
 162 |                              'category': '',                                                
 163 |                              'percent': 'netAllocation',
 164 |                              'url2': 'https://www.emea-api.morningstar.com/sal/sal-service/stock/equityOverview/{secid}/data',
 165 |                              'component2': 'sal-eqsv-overview',
 166 |                              'jsonpath2': '$.securityName',                                              
 167 |                              'map':{"AssetAllocNonUSEquity":"Stocks", 
 168 |                                     "CANAssetAllocCanEquity" : "Stocks", 
 169 |                                     "CANAssetAllocUSEquity" : "Stocks",
 170 |                                     "CANAssetAllocInternationalEquity": "Stocks",
 171 |                                     "AssetAllocUSEquity":"Stocks",
 172 |                                     "AssetAllocCash":"Cash",
 173 |                                     "CANAssetAllocCash": "Stocks",
 174 |                                     "AssetAllocBond":"Bonds", 
 175 |                                     "CANAssetAllocFixedIncome": "Bonds",
 176 |                                     "UK bond":"Bonds",
 177 |                                     "AssetAllocNotClassified":"Other",
 178 |                                     "AssetAllocOther":"Other",
 179 |                                     "CANAssetAllocOther": "Other"
 180 |                                     }
 181 |                              },
 182 |               'Stock-style': {'url': 'https://www.emea-api.morningstar.com/sal/sal-service/fund/process/weighting/{secid}/data',
 183 |                             'component': 'sal-components-mip-style-weight',
 184 |                             'jsonpath': '$',
 185 |                             'category': '',
 186 |                             'percent': '',
 187 |                             'url2': 'https://www.emea-api.morningstar.com/sal/sal-service/stock/equityOverview/{secid}/data',
 188 |                             'component2': 'sal-eqsv-overview',
 189 |                             'jsonpath2': '$.investmentStyle',
 190 |                             'map':{ "largeValue":"Large Value",
 191 |                                     "largeBlend":"Large Blend", 
 192 |                                     "largeGrowth":"Large Growth",
 193 |                                     "middleValue":"Mid-Cap Value",
 194 |                                     "middleBlend":"Mid-Cap Blend", 
 195 |                                     "middleGrowth":"Mid-Cap Growth",
 196 |                                     "smallValue":"Small Value",
 197 |                                     "smallBlend":"Small Blend",
 198 |                                     "smallGrowth":"Small Growth",
 199 |                                     },
 200 |                             'map2':{"1":"Large Value",
 201 |                                     "2":"Large Blend", 
 202 |                                     "3":"Large Growth",
 203 |                                     "4":"Mid-Cap Value",
 204 |                                     "5":"Mid-Cap Blend", 
 205 |                                     "6":"Mid-Cap Growth",
 206 |                                     "7":"Small Value",
 207 |                                     "8":"Small Blend",
 208 |                                     "9":"Small Growth",
 209 |                                     }   
 210 |                             },                            
 211 | 
 212 |               'Sector': {'url': 'https://www.emea-api.morningstar.com/sal/sal-service/fund/portfolio/v2/sector/{secid}/data',
 213 |                          'component': 'sal-components-mip-sector-exposure',
 214 |                          'jsonpath': '$.EQUITY.fundPortfolio',
 215 |                          'category': '',
 216 |                          'percent': '',
 217 |                          'url2': 'https://www.emea-api.morningstar.com/sal/sal-service/stock/equityOverview/{secid}/data',
 218 |                          'component2': 'sal-eqsv-overview',
 219 |                          'jsonpath2': '$.sector',
 220 |                          'map':{"basicMaterials":"Basic Materials", 
 221 |                                 "communicationServices":"Communication Services",
 222 |                                 "consumerCyclical":"Consumer Cyclical",
 223 |                                 "consumerDefensive":"Consumer Defensive", 
 224 |                                 "energy":"Energy",
 225 |                                 "financialServices":"Financial Services",
 226 |                                 "healthcare":"Healthcare",
 227 |                                 "industrials":"Industrials",
 228 |                                 "realEstate":"Real Estate",
 229 |                                 "technology":"Technology",
 230 |                                 "utilities":"Utilities",
 231 |                                 }
 232 |                          },   
 233 |               'Holding': {'url':'https://www.emea-api.morningstar.com/sal/sal-service/fund/portfolio/holding/v2/{secid}/data',
 234 |                           'component': 'sal-components-mip-holdings',
 235 |                           'jsonpath': '$.equityHoldingPage.holdingList[*]',
 236 |                           'category': 'securityName',
 237 |                           'percent': 'weighting',
 238 |                           'url2': 'https://www.emea-api.morningstar.com/sal/sal-service/stock/equityOverview/{secid}/data',
 239 |                           'component2': 'sal-eqsv-overview',
 240 |                           'jsonpath2': '$.securityName',     
 241 |                          },  
 242 |               'Region': { 'url': 'https://www.emea-api.morningstar.com/sal/sal-service/fund/portfolio/regionalSector/{secid}/data',
 243 |                          'component': 'sal-components-mip-region-exposure',
 244 |                          'jsonpath': '$.fundPortfolio',
 245 |                          'category': '',
 246 |                          'percent': '',
 247 |                          'url2': 'https://www.emea-api.morningstar.com/sal/sal-service/stock/companyProfile/{secid}',
 248 |                          'component2': '',
 249 |                          'jsonpath2': '$..contact.country',
 250 |                          'map':{"northAmerica":"North America", 
 251 |                                 "europeDeveloped":"Europe Developed",
 252 |                                 "asiaDeveloped":"Asia Developed",
 253 |                                 "asiaEmerging":"Asia Emerging", 
 254 |                                 "australasia":"Australasia",
 255 |                                 "europeDeveloped":"Europe Developed",
 256 |                                 "europeEmerging":"Europe Emerging",
 257 |                                 "japan":"Japan",
 258 |                                 "latinAmerica":"Central & Latin America",
 259 |                                 "unitedKingdom":"United Kingdom",
 260 |                                 "africaMiddleEast":"Middle East / Africa",
 261 |                                 },
 262 |                          'map2':{ "Aruba": "Central & Latin America",
 263 |                                   "Afghanistan": "Asia Emerging",
 264 |                                   "Angola": "Middle East / Africa",
 265 |                                   "Anguilla": "Central & Latin America",
 266 |                                   "Albania": "Europe Emerging",
 267 |                                   "Andorra": "Europe Developed",
 268 |                                   "UnitedArabEmirates": "Middle East / Africa",
 269 |                                   "Argentina": "Central & Latin America",
 270 |                                   "Armenia": "Asia Emerging",
 271 |                                   "AmericanSamoa": "Asia Emerging",
 272 |                                   "AntiguaAndBarbuda": "Central & Latin America",
 273 |                                   "Australia": "Australasia",
 274 |                                   "Austria": "Europe Developed",
 275 |                                   "Azerbaijan": "Asia Emerging",
 276 |                                   "Burundi": "Middle East / Africa",
 277 |                                   "Belgium": "Europe Developed",
 278 |                                   "Benin": "Middle East / Africa",
 279 |                                   "BurkinaFaso": "Middle East / Africa",
 280 |                                   "Bangladesh": "Asia Emerging",
 281 |                                   "Bulgaria": "Europe Emerging",
 282 |                                   "Bahrain": "Middle East / Africa",
 283 |                                   "Bahamas": "Central & Latin America",
 284 |                                   "BosniaAndHerzegovina": "Europe Emerging",
 285 |                                   "Belarus": "Europe Emerging",
 286 |                                   "Belize": "Central & Latin America",
 287 |                                   "Bermuda": "Central & Latin America",
 288 |                                   "Bolivia": "Central & Latin America",
 289 |                                   "Brazil": "Central & Latin America",
 290 |                                   "Barbados": "Central & Latin America",
 291 |                                   "BruneiDarussalam": "Asia Developed",
 292 |                                   "Bhutan": "Asia Emerging",
 293 |                                   "BouvetIsland": "Middle East / Africa",
 294 |                                   "Botswana": "Middle East / Africa",
 295 |                                   "CentralAfricanRepublic": "Middle East / Africa",
 296 |                                   "Canada": "North America",
 297 |                                   "CocosKeelingIslands": "Asia Emerging",
 298 |                                   "Switzerland": "Europe Developed",
 299 |                                   "Chile": "Central & Latin America",
 300 |                                   "China": "Asia Emerging",
 301 |                                   "CoteDIvoire": "Middle East / Africa",
 302 |                                   "Cameroon": "Middle East / Africa",
 303 |                                   "CongoDemocraticRepublic": "Middle East / Africa",
 304 |                                   "Congo": "Middle East / Africa",
 305 |                                   "CookIslands": "Asia Emerging",
 306 |                                   "Colombia": "Central & Latin America",
 307 |                                   "Comoros": "Middle East / Africa",
 308 |                                   "CapeVerde": "Middle East / Africa",
 309 |                                   "CostaRica": "Central & Latin America",
 310 |                                   "Cuba": "Central & Latin America",
 311 |                                   "ChristmasIsland": "Asia Emerging",
 312 |                                   "CaymanIslands": "Central & Latin America",
 313 |                                   "Cyprus": "Europe Developed",
 314 |                                   "CzechRepublic": "Europe Emerging",
 315 |                                   "Germany": "Europe Developed",
 316 |                                   "Djibouti": "Middle East / Africa",
 317 |                                   "Dominica": "Central & Latin America",
 318 |                                   "Denmark": "Europe Developed",
 319 |                                   "DominicanRepublic": "Central & Latin America",
 320 |                                   "Algeria": "Middle East / Africa",
 321 |                                   "Ecuador": "Central & Latin America",
 322 |                                   "Egypt": "Middle East / Africa",
 323 |                                   "Eritrea": "Middle East / Africa",
 324 |                                   "WesternSahara": "Middle East / Africa",
 325 |                                   "Spain": "Europe Developed",
 326 |                                   "Estonia": "Europe Emerging",
 327 |                                   "Ethiopia": "Middle East / Africa",
 328 |                                   "Finland": "Europe Developed",
 329 |                                   "Fiji": "Asia Emerging",
 330 |                                   "FalklandIslands": "Central & Latin America",
 331 |                                   "France": "Europe Developed",
 332 |                                   "FaroeIslands": "Europe Developed",
 333 |                                   "Micronesia": "Asia Emerging",
 334 |                                   "Gabon": "Middle East / Africa",
 335 |                                   "UnitedKingdom": "United Kingdom",
 336 |                                   "Georgia": "Asia Emerging",
 337 |                                   "Guernsey": "United Kingdom",
 338 |                                   "Ghana": "Middle East / Africa",
 339 |                                   "Gibraltar": "Europe Developed",
 340 |                                   "Guinea": "Middle East / Africa",
 341 |                                   "Guadeloupe": "Central & Latin America",
 342 |                                   "Gambia": "Middle East / Africa",
 343 |                                   "GuineaBissau": "Middle East / Africa",
 344 |                                   "EquatorialGuinea": "Middle East / Africa",
 345 |                                   "Greece": "Europe Developed",
 346 |                                   "Grenada": "Central & Latin America",
 347 |                                   "Greenland": "Europe Developed",
 348 |                                   "Guatemala": "Central & Latin America",
 349 |                                   "FrenchGuiana": "Central & Latin America",
 350 |                                   "Guam": "Asia Developed",
 351 |                                   "Guyana": "Central & Latin America",
 352 |                                   "HongKong": "Asia Developed",
 353 |                                   "HeardIslandAndMcDonaldIslands": "Asia Emerging",
 354 |                                   "Honduras": "Central & Latin America",
 355 |                                   "Croatia": "Europe Emerging",
 356 |                                   "Haiti": "Central & Latin America",
 357 |                                   "Hungary": "Europe Emerging",
 358 |                                   "Indonesia": "Asia Emerging",
 359 |                                   "IsleofMan": "United Kingdom",
 360 |                                   "India": "Asia Emerging",
 361 |                                   "Ireland": "Europe Developed",
 362 |                                   "Iran": "Middle East / Africa",
 363 |                                   "Iraq": "Middle East / Africa",
 364 |                                   "Iceland": "Europe Developed",
 365 |                                   "Israel": "Middle East / Africa",
 366 |                                   "Italy": "Europe Developed",
 367 |                                   "Jamaica": "Central & Latin America",
 368 |                                   "Jersey": "United Kingdom",
 369 |                                   "Jordan": "Middle East / Africa",
 370 |                                   "Japan": "Japan",
 371 |                                   "Kazakhstan": "Asia Emerging",
 372 |                                   "Kenya": "Middle East / Africa",
 373 |                                   "Kyrgyzstan": "Asia Emerging",
 374 |                                   "Cambodia": "Asia Emerging",
 375 |                                   "Kiribati": "Asia Emerging",
 376 |                                   "StKittsAndNevis": "Central & Latin America",
 377 |                                   "SouthKorea": "Asia Developed",
 378 |                                   "Kuwait": "Middle East / Africa",
 379 |                                   "Laos": "Asia Emerging",
 380 |                                   "Lebanon": "Middle East / Africa",
 381 |                                   "Liberia": "Middle East / Africa",
 382 |                                   "Libya": "Middle East / Africa",
 383 |                                   "StLucia": "Central & Latin America",
 384 |                                   "Liechtenstein": "Europe Developed",
 385 |                                   "SriLanka": "Asia Emerging",
 386 |                                   "Lesotho": "Middle East / Africa",
 387 |                                   "Lithuania": "Europe Emerging",
 388 |                                   "Luxembourg": "Europe Developed",
 389 |                                   "Latvia": "Europe Emerging",
 390 |                                   "Macao": "Asia Developed",
 391 |                                   "Morocco": "Middle East / Africa",
 392 |                                   "Monaco": "Europe Developed",
 393 |                                   "Moldova": "Europe Emerging",
 394 |                                   "Madagascar": "Middle East / Africa",
 395 |                                   "Maldives": "Asia Emerging",
 396 |                                   "Mexico": "Central & Latin America",
 397 |                                   "MarshallIslands": "Asia Emerging",
 398 |                                   "Macedonia": "Europe Emerging",
 399 |                                   "Mali": "Middle East / Africa",
 400 |                                   "Malta": "Europe Developed",
 401 |                                   "Myanmar": "Asia Emerging",
 402 |                                   "Mongolia": "Asia Emerging",
 403 |                                   "NorthernMarianaIslands": "Asia Emerging",
 404 |                                   "Mozambique": "Middle East / Africa",
 405 |                                   "Mauritania": "Middle East / Africa",
 406 |                                   "Montserrat": "Central & Latin America",
 407 |                                   "Martinique": "Central & Latin America",
 408 |                                   "Mauritius": "Middle East / Africa",
 409 |                                   "Malawi": "Middle East / Africa",
 410 |                                   "Malaysia": "Asia Emerging",
 411 |                                   "Mayotte": "Middle East / Africa",
 412 |                                   "Namibia": "Middle East / Africa",
 413 |                                   "NewCaledonia": "Asia Developed",
 414 |                                   "Niger": "Middle East / Africa",
 415 |                                   "NorfolkIsland": "Asia Emerging",
 416 |                                   "Nigeria": "Middle East / Africa",
 417 |                                   "Nicaragua": "Central & Latin America",
 418 |                                   "Niue": "Asia Emerging",
 419 |                                   "Netherlands": "Europe Developed",
 420 |                                   "Norway": "Europe Developed",
 421 |                                   "Nepal": "Asia Emerging",
 422 |                                   "Nauru": "Asia Emerging",
 423 |                                   "NewZealand": "Australasia",
 424 |                                   "Oman": "Middle East / Africa",
 425 |                                   "Pakistan": "Asia Emerging",
 426 |                                   "Panama": "Central & Latin America",
 427 |                                   "Pitcairn": "Asia Emerging",
 428 |                                   "Peru": "Central & Latin America",
 429 |                                   "Philippines": "Asia Emerging",
 430 |                                   "Palau": "Asia Emerging",
 431 |                                   "PapuaNewGuinea": "Asia Emerging",
 432 |                                   "Poland": "Europe Emerging",
 433 |                                   "PuertoRico": "Central & Latin America",
 434 |                                   "NorthKorea": "Asia Emerging",
 435 |                                   "Portugal": "Europe Developed",
 436 |                                   "Paraguay": "Central & Latin America",
 437 |                                   "OccupiedPalestinianTerritory": "Middle East / Africa",
 438 |                                   "FrenchPolynesia": "Asia Developed",
 439 |                                   "Qatar": "Middle East / Africa",
 440 |                                   "Reunion": "Middle East / Africa",
 441 |                                   "Romania": "Europe Emerging",
 442 |                                   "Russia": "Europe Emerging",
 443 |                                   "Rwanda": "Middle East / Africa",
 444 |                                   "SaudiArabia": "Middle East / Africa",
 445 |                                   "Sudan": "Middle East / Africa",
 446 |                                   "Senegal": "Middle East / Africa",
 447 |                                   "Singapore": "Asia Developed",
 448 |                                   "StHelena": "Middle East / Africa",
 449 |                                   "SvalbardandJanMayen": "Europe Developed",
 450 |                                   "SolomonIslands": "Asia Emerging",
 451 |                                   "SierraLeone": "Middle East / Africa",
 452 |                                   "ElSalvador": "Central & Latin America",
 453 |                                   "SanMarino": "Europe Developed",
 454 |                                   "Somalia": "Middle East / Africa",
 455 |                                   "Serbia": "Europe Emerging",
 456 |                                   "SaoTomeAndPrincipe": "Middle East / Africa",
 457 |                                   "Suriname": "Central & Latin America",
 458 |                                   "Slovakia": "Europe Emerging",
 459 |                                   "Slovenia": "Europe Developed",
 460 |                                   "Sweden": "Europe Developed",
 461 |                                   "Swaziland": "Middle East / Africa",
 462 |                                   "Seychelles": "Middle East / Africa",
 463 |                                   "SyrianArabRepublic": "Middle East / Africa",
 464 |                                   "TurksAndCaicosIslands": "Central & Latin America",
 465 |                                   "Chad": "Middle East / Africa",
 466 |                                   "Togo": "Middle East / Africa",
 467 |                                   "Thailand": "Asia Emerging",
 468 |                                   "Tajikistan": "Asia Emerging",
 469 |                                   "Tokelau": "Asia Emerging",
 470 |                                   "Turkmenistan": "Asia Emerging",
 471 |                                   "TimorLeste": "Asia Emerging",
 472 |                                   "Tonga": "Asia Emerging",
 473 |                                   "TrinidadAndTobago": "Central & Latin America",
 474 |                                   "Tunisia": "Middle East / Africa",
 475 |                                   "Turkey": "Europe Emerging",
 476 |                                   "Tuvalu": "Asia Emerging",
 477 |                                   "Taiwan": "Asia Developed",
 478 |                                   "Tanzania": "Middle East / Africa",
 479 |                                   "Uganda": "Middle East / Africa",
 480 |                                   "Ukraine": "Europe Emerging",
 481 |                                   "Uruguay": "Central & Latin America",
 482 |                                   "UnitedStates": "North America",
 483 |                                   "Uzbekistan": "Asia Emerging",
 484 |                                   "Vatican": "Europe Developed",
 485 |                                   "StVincentAndTheGrenadines": "Central & Latin America",
 486 |                                   "Venezuela": "Central & Latin America",
 487 |                                   "BritishVirginIslands": "Central & Latin America",
 488 |                                   "USVirginIslands": "Central & Latin America",
 489 |                                   "Vietnam": "Asia Emerging",
 490 |                                   "Vanuatu": "Asia Emerging",
 491 |                                   "WallisAndFutunaIslands": "Asia Emerging",
 492 |                                   "Samoa": "Asia Emerging",
 493 |                                   "Yemen": "Middle East / Africa",
 494 |                                   "SouthAfrica": "Middle East / Africa",
 495 |                                   "Zambia": "Middle East / Africa",
 496 |                                   "Zimbabwe": "Middle East / Africa",
 497 |                                   "BonaireSintEustatiusAndSaba": "Central & Latin America",
 498 |                                   "Curacao": "Central & Latin America",
 499 |                                   "Supranational": "Supranational",                       
 500 |                                 },                        
 501 |                          },  
 502 |               'Country': { 'url': 'https://www.emea-api.morningstar.com/sal/sal-service/fund/portfolio/regionalSectorIncludeCountries/{secid}/data',
 503 |                           'component': 'sal-components-mip-country-exposure',
 504 |                           'jsonpath': '$.fundPortfolio.countries[*]',
 505 |                           'category': 'name',
 506 |                           'percent': 'percent',
 507 |                           'url2': 'https://www.emea-api.morningstar.com/sal/sal-service/stock/companyProfile/{secid}',
 508 |                           'component2': '',
 509 |                           'jsonpath2': '$..contact.country',
 510 |                           },
 511 |         }
 512 | 
 513 | 
 514 | 
 515 | class Isin2secid:
 516 |     mapping = dict()
 517 |     
 518 |     @staticmethod
 519 |     def load_cache():
 520 |         if os.path.exists("isin2secid.json"):
 521 |             with open("isin2secid.json", "r") as f:
 522 |                 try:
 523 |                     Isin2secid.mapping = json.load(f)
 524 |                 except json.JSONDecodeError:
 525 |                     print("Invalid json file")
 526 |                     
 527 |         
 528 |     @staticmethod
 529 |     def save_cache():
 530 |         with open("isin2secid.json", "w") as f:
 531 |             json.dump(Isin2secid.mapping, f, indent=1, sort_keys=True)
 532 |             
 533 |     @staticmethod
 534 |     def get_secid(isin):
 535 |         cached_secid = Isin2secid.mapping.get(isin,"-")
 536 |         if cached_secid == "-" or len(cached_secid.split("|"))<3:
 537 |             url = f"https://global.morningstar.com/api/v1/{DOMAIN}/search/securities"
 538 |             if isin is None: isin=""
 539 |             params = {
 540 |                    "query": '((isin ~= "' + isin +'"))'
 541 |                  }
 542 |             headers = {
 543 |                 'accept': '*/*',
 544 |                 'accept-encoding': 'gzip, deflate, br',
 545 |                 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36',
 546 |                 }
 547 |             resp = requests.get(url, headers=headers, params=params)		
 548 |             if resp.status_code == 200:
 549 |                 response = resp.json()
 550 |                 jsonpath = parse("$..securityID")
 551 |                 index = 0             
 552 |                 if jsonpath.find(response):
 553 |                   if len(jsonpath.find(response)) > 2: index=2
 554 |                   elif len(jsonpath.find(response)) > 1: index=1  
 555 |                   secid = jsonpath.find(response)[index].value              
 556 |                 else:
 557 |                   secid =""  
 558 |                 jsonpath = parse("$..universe")
 559 |                 if jsonpath.find(response):
 560 |                   if jsonpath.find(response)[0].value == "EQ": secid_type = "stock"
 561 |                   else: secid_type = "fund"
 562 |                 else:
 563 |                   secid_type = "unknown" 
 564 |                 secid_type_domain = secid + "|" + secid_type + "|" + DOMAIN
 565 |                 Isin2secid.mapping[isin] = secid_type_domain
 566 |             else:
 567 |                 secid_type_domain = '||'
 568 |         else:
 569 |             secid_type_domain = Isin2secid.mapping[isin]
 570 |         return secid_type_domain.split("|")
 571 | 
 572 | 
 573 | class Security:
 574 |  
 575 |     def __init__ (self, **kwargs):
 576 |         self.__dict__.update(kwargs)
 577 |         self.holdings = []
 578 | 
 579 |     def load_holdings (self):
 580 |         if len(self.holdings) == 0:
 581 |             self.holdings = SecurityHoldingReport()
 582 |             self.holdings.load(isin = self.ISIN, secid = self.secid, name = self.name, isRetired = self.isRetired)
 583 |         return self.holdings
 584 | 
 585 | 
 586 | class SecurityHoldingReport:
 587 |     def __init__ (self):
 588 |         self.secid=''
 589 |         pass
 590 | 
 591 | 
 592 |     
 593 |     def get_bearer_token(self, secid, domain):
 594 |         # the secid can change for retrieval purposes
 595 |         # find the retrieval secid
 596 |         global BEARER_TOKEN
 597 |         headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'}
 598 |         url = f'https://www.morningstar.{domain}/{domain}/funds/snapshot/snapshot.aspx?id={secid}'
 599 |         response = requests.get(url, headers=headers)
 600 |         secid_regexp = r"var FC =  '(.*)';"
 601 |         matches = re.findall(secid_regexp, response.text)
 602 |         if len(matches)>0:
 603 |             secid_to_search = matches[0]
 604 |         else:
 605 |             secid_to_search = secid
 606 |             
 607 |         # get one bearer token for all requests
 608 |         if BEARER_TOKEN == "":
 609 |           url = f'https://www.morningstar.{domain}/Common/funds/snapshot/PortfolioSAL.aspx'
 610 |           response = requests.get(url, headers=headers)
 611 |           token_regex = r"const maasToken \=\s\"(.+)\""
 612 |           resultstringtoken = re.findall(token_regex, response.text)[0]
 613 |           BEARER_TOKEN = resultstringtoken
 614 |         else:
 615 |           resultstringtoken = BEARER_TOKEN
 616 |         return resultstringtoken, secid_to_search
 617 | 
 618 |     def calculate_grouping(self, categories, percentages, grouping_name, long_equity):
 619 |         for category_name, percentage in zip(categories, percentages):
 620 |             self.grouping[grouping_name][escape(category_name)] = self.grouping[grouping_name].get(escape(category_name),0) +  percentage  
 621 | 
 622 |         if grouping_name !='Asset-Type':
 623 |             self.grouping[grouping_name] = {k:v*long_equity for k, v in 
 624 |                                             self.grouping[grouping_name].items()}
 625 | 
 626 |    
 627 |         
 628 |     def load (self, isin, secid, name, isRetired):
 629 |         secid, secid_type, domain = Isin2secid.get_secid(isin)
 630 |         if secid == '':
 631 |             print(f"@ isin {isin} not found in Morningstar, skipping it...")
 632 |             print(f"  [{name}]")
 633 |             return
 634 |         elif isRetired=="true":
 635 |             print(f"@ isin {isin} is inactive, skipping it...")
 636 |             print(f"  [{name}]")         
 637 |             return 
 638 |         self.secid = secid
 639 |         bearer_token, secid = self.get_bearer_token(secid, domain)
 640 |         if secid_type=="stock":
 641 |              if STOCKS:
 642 |                  print(f"@ Retrieving data for {secid_type} {isin} ({secid}) ...")
 643 |              else:    
 644 |                  print(f"@ isin {isin} is a stock, skipping it...")
 645 |         else:
 646 |              print(f"@ Retrieving data for {secid_type} {isin} ({secid}) ...")
 647 |         print(f"  [{name}]")
 648 |         headers_short = {
 649 |             'accept': '*/*',
 650 |             'accept-encoding': 'gzip, deflate, br',
 651 |             'accept-language': 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7',
 652 |             }
 653 |         headers = headers_short.copy()
 654 |         headers['Authorization'] = f'Bearer {bearer_token}'
 655 |         
 656 |         params = {
 657 |             'languageId': 'en-EU',
 658 |             'locale': 'en',
 659 |             'benchmarkId': 'undefined',
 660 |             'version': '4.65.0',
 661 |         }
 662 |     
 663 |         
 664 |         self.grouping=dict()
 665 |         for taxonomy in taxonomies:
 666 |             self.grouping[taxonomy] = defaultdict(float)
 667 |        
 668 |         non_categories = ['avgMarketCap', 'portfolioDate', 'name', 'masterPortfolioId' ]
 669 |         
 670 |         if secid_type!="stock":	              # secid_type=="fund"
 671 |           for grouping_name, taxonomy in taxonomies.items():
 672 |             url = taxonomy['url']
 673 |             # use corresponding id (secid or isin)
 674 |             url = url.replace("{secid}", secid)
 675 |             for urlparam in ['component']:			
 676 |               if taxonomy.get(urlparam): params[urlparam] = taxonomy[urlparam]
 677 |             resp = requests.get(url, params=params, headers=headers)
 678 |             if resp.status_code == 401 or resp.status_code == 400:
 679 |                 print(f"  Warning: No information on {grouping_name} for {secid} [{resp.status_code}]")
 680 |                 continue
 681 |             try:
 682 |                 response = resp.json()
 683 |                 jsonpath = parse(taxonomy['jsonpath'])
 684 |                 percent_field = taxonomy['percent']
 685 |                 # single match of the jsonpath from sal-service means the path contains the categories
 686 |                 if "sal-service" in url and len(jsonpath.find(response))==1:
 687 |                     value = jsonpath.find(response)[0].value 
 688 |                     keys = [key for key in value if key not in non_categories]
 689 |                     
 690 |                     if percent_field != "":
 691 |                         if value[keys[0]][percent_field] is not None:
 692 |                             percentages = [float(value[key][percent_field]) for key in keys]
 693 |                         else:
 694 |                             percentages =[]
 695 |                     else:
 696 |                         if value[keys[0]] is not None:
 697 |                             percentages = [float(value[key]) for key in keys]
 698 |                         else:
 699 |                             percentages = []
 700 |                         
 701 |                     if grouping_name == 'Asset-Type':
 702 |                         try:
 703 |                             long_equity = (float(value.get('assetAllocEquity',{}).get('longAllocation',0)) +
 704 |                                       float(value.get('AssetAllocNonUSEquity',{}).get('longAllocation',0)) +           
 705 |                                       float(value.get('AssetAllocUSEquity',{}).get('longAllocation',0)))/100
 706 |                         except TypeError:
 707 |                             print(f"  Warning: No information on {grouping_name} for {secid}")
 708 |                 else:
 709 |                     # every match is a category
 710 |                     value = jsonpath.find(response)
 711 |                     keys = [key.value[taxonomy['category']] for key in value]
 712 |                     if len(value) == 0 or value[0].value.get(taxonomy['percent'],"") =="":
 713 |                         print(f"  Warning: percentages not found for {grouping_name} for {secid}")
 714 |                     else:
 715 |                         percentages = [float(key.value[taxonomy['percent']]) for key in value]
 716 | 
 717 |                 # Map names if there is a map
 718 |                 if len(taxonomy.get('map',{})) != 0:
 719 |                     categories = [taxonomy['map'][key] for key in keys if key in taxonomy['map'].keys()]
 720 |                     unmapped = [key for key in keys if key not in taxonomy['map'].keys()]
 721 |                     if  unmapped:
 722 |                         print(f"  Warning: Categories not mapped: {unmapped} for {secid}")
 723 |                 else:
 724 |                     # capitalize first letter if not mapping
 725 |                     categories = [key[0].upper() + key[1:] for key in keys]
 726 |                 
 727 |                 if percentages:
 728 |                     self.calculate_grouping (categories, percentages, grouping_name, long_equity)
 729 |                 
 730 |             except Exception:
 731 |                 print(f"  Warning: Problem with {grouping_name} for secid {secid} in PortfolioSAL...")
 732 |                 
 733 |                 
 734 |         else:		# secid_type=="stock"
 735 |           
 736 |           if STOCKS:
 737 |               
 738 |            for grouping_name, taxonomy in taxonomies.items():
 739 |             url = taxonomy['url2']
 740 |             # use corresponding id (secid or isin)
 741 |             url = url.replace("{secid}", secid)			
 742 |             if taxonomy.get('component2'): params['component'] = taxonomy['component2']
 743 |             resp = requests.get(url, params=params, headers=headers)
 744 |             if resp.status_code != 200:                
 745 |                 print(f"  Warning: No information on {grouping_name} for {secid} [{resp.status_code}]")
 746 |                 continue
 747 |             response = resp.json()
 748 |             jsonpath = parse(taxonomy['jsonpath2'])
 749 |             value = jsonpath.find(response)[0].value
 750 |             if grouping_name == 'Asset-Type':
 751 |               print ("  Name:", value)
 752 |               value = "Stocks"
 753 |             if grouping_name in ['Country', 'Region']:
 754 |               value = re.sub(r'\([^)]*\)', '', value)
 755 |               value = value.replace(' ', '')
 756 |               value = value.replace('ofAmerica', '')
 757 |               value = value.replace('ofGreatBritainandNorthernIreland', '')
 758 |               value = value.replace('Korea','SouthKorea')
 759 |               value = value.replace('Czechia','CzechRepublic')
 760 |               value = value.replace('RussianFederation','Russia')
 761 |             if value is not None:
 762 |              if len(taxonomy.get('map2',{})) != 0:
 763 |                if value in taxonomy['map2'].keys():
 764 |                     value = taxonomy['map2'][value]
 765 |                else:
 766 |                     print (" ",grouping_name,":", value, "not mapped !!! Please report")
 767 |                     value = "" 
 768 |              if value != "":
 769 |               self.grouping[grouping_name][escape(value)] = 100.0
 770 |               continue              
 771 |             print(f"  Warning: No information on {grouping_name} for {secid}")                   
 772 |       
 773 |         
 774 |     def group_by_key (self,key):
 775 |         return self.grouping[key]
 776 | 
 777 | 
 778 | class PortfolioPerformanceFile:
 779 | 
 780 |     def __init__ (self, filepath):
 781 |         self.filepath = filepath
 782 |         self.pp_tree = ET.parse(filepath)
 783 |         self.pp = self.pp_tree.getroot()
 784 |         self.securities = None
 785 |         if self.pp.get('id') is not None:
 786 |           print ("ABORTED: XML FORMAT WITH IDs IS NOT SUPPORTED")
 787 |           print ("Please save input file in original XML format of PP.")
 788 |           print ("Please don't use XML format with \"id\" attributes.")
 789 |           exit()
 790 | 
 791 |     def get_security(self, security_xpath):
 792 |         """return a security object """
 793 |         if (matching := self.pp.findall(security_xpath)):
 794 |             security = matching[0]
 795 |         else:
 796 |             return None 
 797 |         if security is not None:
 798 |             isin = security.find('isin') 
 799 |             if isin is not None:
 800 |                 isin = isin.text
 801 |                 name = security.find('name')
 802 |                 if name is not None:
 803 |                     name = name.text
 804 |                 secid = security.find('secid')
 805 |                 if secid is not None:
 806 |                     secid = secid.text
 807 |                 note = security.find('note')
 808 |                 isRetired = security.find('isRetired').text
 809 |                 security2 = None
 810 |                 if note is not None:
 811 |                     note = note.text
 812 |                     if note is not None:
 813 |                        token_pattern = r'#PPC:\[ISIN2=([A-Z0-9]{12})'
 814 |                        match = re.search(token_pattern,note)
 815 |                        if match:
 816 |                            ISIN2 = match.group(1)
 817 |                            security2 = self.get_security2(ISIN2, isin, isRetired)
 818 |                 return Security(
 819 |                     name = name,
 820 |                     ISIN = isin,
 821 |                     secid = secid,
 822 |                     UUID = security.find('uuid').text,
 823 |                     isRetired = isRetired,
 824 |                     note = note,
 825 |                     security2 = security2
 826 |                 )
 827 |             else:
 828 |                 name = security.find('name').text
 829 |                 print(f"  Warning: security '{name}' does not have isin, skipping it...")
 830 |         return None
 831 |       
 832 |     def get_security2(self, isin2, isin, isRetired):				  
 833 |         """return an alternative security object """
 834 |         return Security(
 835 |                     name = "Alternative ISIN for " + isin,
 836 |                     ISIN = isin2,
 837 |                     secid = "",
 838 |                     UUID = "00000000-0000-0000-0000-000000000000",
 839 |                     isRetired = isRetired,
 840 |                     note = "alternative security for fetching classification"
 841 |                 ) 
 842 | 
 843 |     def get_security_xpath_by_uuid (self, uuid):
 844 |         for idx, security in enumerate(self.pp.findall(".//securities/security")):
 845 |             sec_uuid =  security.find('uuid').text
 846 |             if sec_uuid == uuid and idx == 0:
 847 |                 return "../../../../../../../../securities/security"
 848 |             if sec_uuid == uuid:
 849 |                 return f"../../../../../../../../securities/security[{idx + 1}]"
 850 |         print (f"Error: No xpath found for UUID '{uuid}'") 
 851 | 
 852 |     def add_taxonomy (self, kind):
 853 |           securities = self.get_securities()
 854 |           color = cycle(COLORS)
 855 |         
 856 |           # Does taxonomy of type kind exist in xml file? If not, create an entry.
 857 |           if self.pp.find("taxonomies/taxonomy[name='%s']" % kind) is None:
 858 |           
 859 |             print (f"### No entry for '{kind}' found: Creating it from scratch")
 860 |             new_taxonomy_tpl =  """
 861 |     <taxonomy>
 862 |       <id>{{ outer_uuid }}</id>
 863 |       <name>{{ kind }}</name>
 864 |       <root>
 865 |         <id>{{ inner_uuid }}</id>
 866 |         <name>{{ kind }}</name>
 867 |         <color>#89afee</color>
 868 |         <children/>
 869 |         <assignments/>
 870 |         <weight>10000</weight>
 871 |          <rank>0</rank>
 872 |        </root>
 873 |     </taxonomy>
 874 |             """
 875 |             
 876 |             new_taxonomy_tpl = Environment(loader=BaseLoader).from_string(new_taxonomy_tpl)
 877 |             new_taxonomy_xml = new_taxonomy_tpl.render(
 878 |                                       outer_uuid = str(uuid.uuid4()),
 879 |                                       inner_uuid = str(uuid.uuid4()),
 880 |                                       kind = kind,                            
 881 |                                    )                                  
 882 |             self.pp.find('.//taxonomies').append(ET.fromstring(new_taxonomy_xml))
 883 |            
 884 |                                 
 885 |           else:
 886 |             print (f"### Entry for '{kind}' found: updating existing data")
 887 |             
 888 |             # Substitute "'" with "....."  in all names of classifications of all taxonomies of type kind            
 889 |             for child in self.pp.findall(".//taxonomies/taxonomy[name='%s']/root/children/classification" % kind):
 890 |               category_name = child.find('name')
 891 |               if category_name is not None and category_name.text is not None:
 892 |                   category_name.text = category_name.text.replace("'", ".....")
 893 |                            
 894 |           double_entry = False
 895 |           
 896 |           for taxonomy in self.pp.findall("taxonomies/taxonomy[name='%s']" % kind):
 897 |              if double_entry == True:
 898 |                  print (f"### Another entry for '{kind}' found: updating existing data with same input")
 899 |              double_entry = True
 900 |              rank = 0       
 901 |                                                       
 902 |              # Run through all securities for which data was fetched (i.e. all securities that are not plain stocks)
 903 |              for security in securities:
 904 |                   security_xpath = self.get_security_xpath_by_uuid(security.UUID)
 905 |                   security_assignments = security.holdings.grouping[kind]
 906 |                        
 907 |                   # Set weight = 0 in all existing assignments of this specific security
 908 |                   # for all(!) categories, if anything was retrieved for this taxonomy (aka kind)
 909 |                   # (last step will remove all assignement with weight == 0)    
 910 |                   
 911 |                   if security.holdings.grouping[kind] == {}:
 912 |                      if security.security2 is not None:
 913 |                        if security.security2.holdings.grouping[kind] == {}:
 914 |                          grouping_exists = False
 915 |                          print (f"  Warning: No input for '{kind}' for '{security.name}' (also not in alternative ISIN): keeping existing data")
 916 |                        else:     
 917 |                          grouping_exists = True
 918 |                          security_assignments = security.security2.holdings.grouping[kind]
 919 |                          print (f"  Info: Using alternative ISIN {security.security2.ISIN} for '{kind}' for '{security.name}'")
 920 |                      else:
 921 |                        grouping_exists = False
 922 |                        print (f"  Warning: No input for '{kind}' for '{security.name}': keeping existing data")
 923 |                   else:
 924 |                      grouping_exists = True                       
 925 |                   
 926 |                   if grouping_exists:
 927 |                       for existing_assignment in taxonomy.findall("./root/children/classification/assignments/assignment"):                  
 928 |                            investment_vehicle = existing_assignment.find('investmentVehicle')
 929 |                            if investment_vehicle is not None and investment_vehicle.attrib.get('reference') == security_xpath:
 930 |                                weight_element = existing_assignment.find('weight')
 931 |                                if weight_element is not None:
 932 |                                    weight_element.text = "0"
 933 |                                    rank += 1
 934 |                                    next(color)            
 935 |                                     
 936 |                   # 1. Determine scaling factor for rounding issues when sum of percentages is in the range 100,01% to 100,05%
 937 |                   # 2a. Check for each category for which the security has a contribution, if there is already an entry in the file. If not, create the category.
 938 |                   # 2b. Also check, if there is already an assignment for the security in the category. If not, create one with weight = 0.
 939 |                   # 3.  Set the new weight values.                                    
 940 | 
 941 |                   scaling = 1
 942 |                   w_sum_initial = 0  
 943 |                   
 944 |                   while True:
 945 |                        w_sum = 0
 946 |                        for category, weight in security_assignments.items():
 947 |                               weight = round(weight*100*scaling)
 948 |                               if weight > 10000: weight = 10000    # weight value above 100% reduced to 100%
 949 |                               if weight > 0: w_sum += weight       # skip negative values
 950 |                        if w_sum_initial == 0: w_sum_initial = w_sum     # remember initial value without scaling
 951 |                        if w_sum > 10000 and w_sum < 10006:
 952 |                               scaling = scaling * 0.999999         # try again with new scaling
 953 |                        else: break                 
 954 |                                                                   
 955 |                   w_sum = 0                 
 956 |                   for category, weight in security_assignments.items():
 957 |                         
 958 |                         weight = round(weight*100*scaling)   
 959 |                         category = category.replace("'", ".....")
 960 |                         category = clean_text(category)                       
 961 |                         
 962 |                         if weight != 0:
 963 |                              for children in taxonomy.findall(".//root/children"):                                                
 964 |                                 
 965 |                                 # Does category already exist in xml file for this taxonomy (aka kind)?
 966 |                                 if any(clean_text(child.find('name').text) == category for child in children if child.find('name') is not None):
 967 |                                    category_found = True
 968 |                                 else:
 969 |                                    category_found = False
 970 |                                           
 971 |                                 if category_found == False:                        
 972 | 
 973 |                                        new_child_tpl =  """                    
 974 |           <classification>
 975 |             <id>{{ uuid }}</id>
 976 |             <name>{{ name }}</name>
 977 |             <color>{{ color }}</color>
 978 |             <parent reference="../../.."/>
 979 |             <children/>
 980 |             <assignments/>
 981 |             <weight>0</weight>
 982 |             <rank>1</rank>
 983 |           </classification>
 984 |                                        """
 985 |                                        
 986 |                                        new_child_tpl = Environment(loader=BaseLoader).from_string(new_child_tpl)
 987 |                                        new_child_xml = new_child_tpl.render(
 988 |                                                   uuid = str(uuid.uuid4()),
 989 |                                                   name = category.replace("&","&amp;"),
 990 |                                                   color = next(color)                              
 991 |                                                 )
 992 |                                        children.append(ET.fromstring(new_child_xml))    
 993 |                                                                                                                   
 994 |                                        print ("  Info: Entry for '%s' in '%s' created" % (category.replace(".....","'"),kind))          
 995 |                                                                            
 996 |                              # Does investment vehicle already exist in xml file for this security in this category in this taxonomy (aka kind)?
 997 |                              if any(existing_vehicle.attrib['reference'] == security_xpath for existing_vehicle in taxonomy.findall(".//root/children/classification[name='%s']/assignments/assignment/investmentVehicle" % category) if existing_vehicle.attrib['reference'] is not None):
 998 |                                    vehicle_found = True
 999 |                              else:
1000 |                                    vehicle_found = False                                                        
1001 |                              
1002 |                              if vehicle_found == False:
1003 |                              
1004 |                                        new_ass_tpl =  """
1005 |             <assignment>
1006 |                 <investmentVehicle class="security" reference="{{ security_xpath }}"/>
1007 |                 <weight>0</weight>
1008 |                 <rank>{{ rank }}</rank>
1009 |             </assignment>                             
1010 |                                        """  
1011 |                                                                                                                        
1012 |                                        new_ass_tpl = Environment(loader=BaseLoader).from_string(new_ass_tpl)
1013 |                                        
1014 |                                        rank += 1
1015 |                                        new_ass_xml = new_ass_tpl.render(
1016 |                                                   security_xpath = security_xpath,
1017 |                                                   rank = str(rank)                            
1018 |                                                 )
1019 |                                         
1020 |                                        new_ass = ET.fromstring(new_ass_xml)
1021 |                                        
1022 |                                        for assignments_element in taxonomy.findall(".//root/children/classification[name='%s']/assignments" % category):
1023 |                                             assignments_element.append(new_ass)
1024 |                                             print ("  Info: Entry for '%s' in '%s' created" % (security.name, category.replace(".....","'")))                            
1025 |         
1026 |                              for existing_assignment in taxonomy.findall(".//root/children/classification[name='%s']/assignments/assignment" % category):
1027 |                                   investment_vehicle = existing_assignment.find('investmentVehicle')
1028 |                                   if investment_vehicle is not None and investment_vehicle.attrib.get('reference') == security_xpath:
1029 |                                       weight_element = existing_assignment.find('weight')
1030 |                                       if weight_element is not None:
1031 |                                           if weight < 0:
1032 |                                                print (f"  !!! Warning: Skipping negative weight for '{security.name}' for '{category}' in '{kind}' ({weight/100}%) !!!") 
1033 |                                           else:
1034 |                                              if weight > 10000: 
1035 |                                                   print (f"  !!! Warning: Weight value > 100% reduced to 100% for '{security.name}' for '{category}' in '{kind}' (was: {weight/100}%) !!!")
1036 |                                                   weight = 10000                                         
1037 |                                              weight_element.text = str(weight)
1038 |                                              w_sum += weight
1039 |                                             
1040 |   
1041 |                   if scaling != 1:
1042 |                         print (f"  Warning: Sum adjusted to {(w_sum/100):.2f}% for '{security.name}' in '{kind}' (was: {w_sum_initial/100}%)") 
1043 |                   if w_sum > 10000:
1044 |                         print (f"  !!! Warning: Sum is higher than 100% for '{security.name}' in '{kind}' (kept: {w_sum/100}%) !!!")
1045 |               
1046 |             
1047 |           # Substitute "....." with "'" in all names of classifications of all taxonomies of type kind             
1048 |           for child in self.pp.findall(".//taxonomies/taxonomy[name='%s']/root/children/classification" % kind):
1049 |             category_name = child.find('name')
1050 |             if category_name is not None and category_name.text is not None:
1051 |                 category_name.text = category_name.text.replace(".....", "'")
1052 |              
1053 |           # delete all assignments for this taxonomy with weight == 0:
1054 |           deletions = []
1055 |              
1056 |           for assignment_parent in self.pp.findall(".//taxonomies/taxonomy[name='%s']/root/children/classification/assignments" % kind):
1057 |             for assignment in assignment_parent:
1058 |               if assignment.find('weight').text == "0":
1059 |                   deletions.append((assignment_parent,assignment))
1060 |                     
1061 |           for parent,assignment_for_deletion in deletions:
1062 |              parent.remove(assignment_for_deletion)   
1063 | 
1064 |     def write_xml(self, output_file):
1065 |         with open(output_file, 'wb') as f:
1066 |             self.pp_tree.write(f, encoding="utf-8")
1067 | 
1068 | 
1069 |     def dump_xml(self):
1070 |         print (ET.tostring(self.pp, encoding="unicode"))
1071 | 
1072 |     def dump_csv(self):
1073 |         csv_file = open ("pp_data_fetched.csv", 'w')
1074 |         csv_file.write ("ISIN,Taxonomy,Classification,Percentage,Name\n")
1075 |         for security in sorted(self.securities, key=lambda security: security.name.upper()):
1076 |           for taxonomy in sorted(taxonomies):     
1077 |              for key, value in sorted(security.holdings.grouping[taxonomy].items(), reverse=False):   
1078 |                    csv_file.write (f"{security.ISIN},{clean_text(taxonomy)},{clean_text(key)},{value/100},{clean_text(security.name)}\n")
1079 |              if security.security2 is not None: 
1080 |                for key, value in sorted(security.security2.holdings.grouping[taxonomy].items(), reverse=False):   
1081 |                    csv_file.write (f"{security.security2.ISIN},{clean_text(taxonomy)},{clean_text(key)},{value/100},{clean_text(security.security2.name)}\n")
1082 | 
1083 |     def get_securities(self):
1084 |         if self.securities is None:
1085 |             self.securities = []
1086 |             sec_xpaths = []
1087 |             
1088 |             # create list of xpaths for all securities in the file
1089 |             for count, sec in enumerate(self.pp.findall(".//securities/security")):
1090 |                sec_xpaths.append('.//securities/security['+ str(count+1) + ']')     
1091 |     
1092 |             for sec_xpath in list(set(sec_xpaths)):
1093 |                 security = self.get_security(sec_xpath)
1094 |                 if security is not None:
1095 |                     security_h = security.load_holdings()
1096 |                     if security_h.secid !='':
1097 |                         if security.security2 is not None:
1098 |                            security.security2.load_holdings()
1099 |                         self.securities.append(security)
1100 |         return self.securities
1101 | 
1102 | def clean_text(text):
1103 |     return BeautifulSoup(text, "html.parser").text
1104 | 
1105 | def print_class (grouped_holding):
1106 |     for key, value in sorted(grouped_holding.items(), reverse=True):
1107 |         print (key, "\t\t{:.2f}%".format(value))
1108 |     print ("-"*30)
1109 | 
1110 | 
1111 | if __name__ == '__main__':
1112 | 
1113 |     print ("WARNING: MORNINGSTAR API HAS CHANGED.")
1114 |     print ("THIS VERSION OF THE SCRIPT WILL PROBABALY NOT WORK ANYMMORE.")
1115 |     print ("Please try new-api-branch instead.")
1116 |   
1117 |     parser = argparse.ArgumentParser(
1118 |     #usage="%(prog) <input_file> [<output_file>] [-d domain] [-stocks]",
1119 |     description='\r\n'.join(["reads a portfolio performance xml file and auto-classifies",
1120 |                  "the securities in it by asset-type, stock-style, sector, holdings, region and country weights",
1121 |                  "For each security, you need to have an ISIN"])
1122 |     )
1123 |     
1124 |     
1125 |     parser.add_argument('-d', default='de',  dest='domain', type=str,
1126 |                         help='Morningstar domain from which to retrieve the security token and the secid (default: de)')
1127 |     
1128 |     parser.add_argument('input_file', metavar='input_file', type=str,
1129 |                    help='path to unencrypted pp.xml file')
1130 |     
1131 |     parser.add_argument('output_file', metavar='output_file', type=str, nargs='?',
1132 |                    help='path to auto-classified output file', default='pp_classified.xml')
1133 |                    
1134 |     parser.add_argument('-stocks', action='store_true', dest='retrieve_stocks',
1135 |                    help='activates retrieval of data on individual stocks')
1136 |                    
1137 | 
1138 |     args = parser.parse_args()
1139 |     
1140 |     if "input_file" not in args:
1141 |         parser.print_help()
1142 |     else:
1143 |         DOMAIN = args.domain
1144 |         STOCKS = args.retrieve_stocks
1145 |         BEARER_TOKEN = ""
1146 |         # Isin2secid.load_cache()
1147 |         pp_file = PortfolioPerformanceFile(args.input_file)
1148 |         for taxonomy in taxonomies:
1149 |             pp_file.add_taxonomy(taxonomy)
1150 |         # Isin2secid.save_cache()
1151 |         pp_file.write_xml(args.output_file)
1152 |         pp_file.dump_csv()
1153 | 


--------------------------------------------------------------------------------