├── .gitignore ├── README.md ├── Tableau_calculation_extractor.ipynb ├── Tableau_calculation_extractor_with_mermaid.ipynb ├── Tableau_calculation_extractor_with_mermaid.py ├── excelgenerator.py └── output_examples ├── TheMoodsofMidgarWillSutton_CALCS_only.pdf └── TheMoodsofMidgarWillSutton_CALCS_only.xlsx /.gitignore: -------------------------------------------------------------------------------- 1 | #temporary input and output folders 2 | inputs 3 | outputs 4 | 5 | .idea 6 | 7 | #Unzipped tableau workbooks 8 | inputs/Image 9 | inputs/Data 10 | inputs/to use later 11 | 12 | #Archived files from main directory 13 | archive 14 | 15 | 16 | # Byte-compiled / optimized / DLL files 17 | __pycache__/ 18 | *.py[cod] 19 | *$py.class 20 | 21 | # C extensions 22 | *.so 23 | 24 | # Distribution / packaging 25 | .Python 26 | build/ 27 | develop-eggs/ 28 | dist/ 29 | downloads/ 30 | eggs/ 31 | .eggs/ 32 | lib/ 33 | lib64/ 34 | parts/ 35 | sdist/ 36 | var/ 37 | wheels/ 38 | pip-wheel-metadata/ 39 | share/python-wheels/ 40 | *.egg-info/ 41 | .installed.cfg 42 | *.egg 43 | MANIFEST 44 | 45 | # PyInstaller 46 | # Usually these files are written by a python script from a template 47 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 48 | *.manifest 49 | *.spec 50 | 51 | # Installer logs 52 | pip-log.txt 53 | pip-delete-this-directory.txt 54 | 55 | # Unit test / coverage reports 56 | htmlcov/ 57 | .tox/ 58 | .nox/ 59 | .coverage 60 | .coverage.* 61 | .cache 62 | nosetests.xml 63 | coverage.xml 64 | *.cover 65 | *.py,cover 66 | .hypothesis/ 67 | .pytest_cache/ 68 | 69 | # Translations 70 | *.mo 71 | *.pot 72 | 73 | # Django stuff: 74 | *.log 75 | local_settings.py 76 | db.sqlite3 77 | db.sqlite3-journal 78 | 79 | # Flask stuff: 80 | instance/ 81 | .webassets-cache 82 | 83 | # Scrapy stuff: 84 | .scrapy 85 | 86 | # Sphinx documentation 87 | docs/_build/ 88 | 89 | # PyBuilder 90 | target/ 91 | 92 | # Jupyter Notebook 93 | .ipynb_checkpoints 94 | 95 | # IPython 96 | profile_default/ 97 | ipython_config.py 98 | 99 | # pyenv 100 | .python-version 101 | 102 | # pipenv 103 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 104 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 105 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 106 | # install all needed dependencies. 107 | #Pipfile.lock 108 | 109 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 110 | __pypackages__/ 111 | 112 | # Celery stuff 113 | celerybeat-schedule 114 | celerybeat.pid 115 | 116 | # SageMath parsed files 117 | *.sage.py 118 | 119 | # Environments 120 | .env 121 | .venv 122 | env/ 123 | venv/ 124 | ENV/ 125 | env.bak/ 126 | venv.bak/ 127 | 128 | # Spyder project settings 129 | .spyderproject 130 | .spyproject 131 | 132 | # Rope project settings 133 | .ropeproject 134 | 135 | # mkdocs documentation 136 | /site 137 | 138 | # mypy 139 | .mypy_cache/ 140 | .dmypy.json 141 | dmypy.json 142 | 143 | # Pyre type checker 144 | .pyre/ 145 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # tableauCalculationExport 2 | 3 | ## What this code does 4 | - This code will extract all Calculated Fields, Default Fields and Parameters from a Tableau workbook and export them into an Excel and PDF file. 
5 | - The code will also generate a Mermaid diagram showing the lineage between fields. The diagram will be exported into an HTML file that you can open in your PC's internet browser. 6 | - Note that the Lineage Diagram will only show relationships between USED fields (ie. Default (datasource) fields that are NOT used in a Calculated Field will NOT come up in the diagram). 7 | 8 | ## Limitations and Important Considerations 9 | - The latest version of the code will only work on **twbx** files (packaged Tableau files). 10 | - The code is only available for **Windows** systems (as it needs the win32com.client package to generate the Excel file). 11 | 12 | ## Getting Started 13 | - Please make sure you have a **working Python environment** and that you have installed the following packages/libraries (either via pip install or conda install - please Google the steps to install each package, as some are either pip or Conda specific) 14 | - win32com.client 15 | - [tableaudocumentapi](https://tableau.github.io/document-api-python/docs/) 16 | - pandas 17 | - Jupyter Notebook 18 | - Some modules should already come with your Python installation (depending on which Python version you are using), but if for some reason they're not present in your Python env, please make sure you get them too 19 | - pathlib 20 | 21 | ## Downloading the Code and Setting up your working directory 22 | 23 | **Before starting on this section, please make sure you've installed Python and any dependencies into your Python environment (ie. the libraries and packages detailed in the previous section)** 24 | 25 | 1. Download ALL the code into your preferred directory (ie. a folder on your PC). 26 | - Make sure the **excelgenerator.py** file is in the SAME directory as the ipynb or py file you want to run (ie. Tableau_calculation_extractor_with_mermaid.ipynb) 27 | 28 | 2. In your working directory, create an empty "/inputs" and an "/outputs" folder. Your working directory should look like this: 29 | 30 | Note that an "/inputs" folder means that you will create a folder called "inputs" inside your working directory. From here onwards I will use "/inputs" and "inputs" interchangeably (same for outputs). 31 | 32 | ![image](https://github.com/user-attachments/assets/62ec66c6-0db6-495a-9063-8b603fe66d17) 33 | 34 | 35 | 36 | 3. Once you have a Tableau packaged workbook (twbx file) that you want to analyse, save it in the "inputs" folder. 37 | 38 | 4. Run the **Calculation Extractor code** (ie. Tableau_calculation_extractor_with_mermaid.ipynb or Tableau_calculation_extractor_with_mermaid.py, depending on which version you want to run - either the Jupyter Notebook or the py file - both are meant to have the same functionality) 39 | 40 | 5. Check the "/outputs" folder for the code outputs - you should now have a PDF, an Excel and an HTML file with the results from the Calculation Extraction process (the PDF and Excel files) and the Lineage Creation process (the HTML file). 41 | 42 | ### Running the code again (eg. to analyse a new workbook) 43 | At the moment the code **only handles one twbx file from the inputs folder** per run; if two or more twbx files are found there, only one of them will be analysed (see the sketch below). In future versions I will add file handling so that more than one twbx file can be analysed at a time - you're also welcome to submit a PR if you'd like to contribute this feature! 
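For reference, the file selection is just a directory listing. Below is a minimal sketch of the logic the notebooks use (the folder name and twbx filter match the code; the loop body here is illustrative only):

```python
import os
from os.path import isfile, join

input_path = "inputs"
mypath = "./{}".format(input_path)

# keep only files (not folders) inside inputs/ whose name ends in .twbx
input_files = [f for f in os.listdir(mypath)
               if isfile(join(mypath, f)) and f[-5:] == '.twbx']

# the notebooks then loop over this list, so when several twbx files are
# present the workbook variables simply keep the values from the LAST
# iteration; keeping a single workbook in inputs/ makes the run deterministic
for workbook_file in input_files:
    print("workbook found:", workbook_file)
```

Extending that loop so every listed workbook is processed, rather than only the last one, is the natural starting point for a multi-file PR.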
44 | 45 | - Before analysing a new workbook (once already saved to the "/inputs" folder), remove any OTHER files from the "/inputs" folder (eg. any previous workbook you have already analysed), and only leave the one workbook you want to analyse. 46 | - You can now run the Calculation Extractor code. 47 | - You don't need to worry about emptying the "/outputs" folder - it simply stores the outputs from every run of the Calculation Extractor code, so outputs will accumulate as more runs occur. 48 | 49 | 50 | # Troubleshooting and Help 51 | As this is a personal project, I am not providing any IT support for this code. However, if you have any questions that are NOT answered above, feel free to reach out to nana7milana@gmail.com. 52 | I will aim to reply within one or two weeks, but if I don't, feel free to send me a reminder. 53 | Thanks for checking out my code! 54 | 55 | Ana 56 | 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /Tableau_calculation_extractor.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# version 2.35" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": null, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "import numpy as np\n", 18 | "import os, re, sys, pathlib, zipfile\n", 19 | "import win32com.client\n", 20 | "import xml.etree.ElementTree as ET\n", 21 | "import tableaudocumentapi\n", 22 | "\n", 23 | "from tableaudocumentapi import Workbook\n", 24 | "from os.path import isfile, join" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Input folder - Find if there is a twbx or twb file in the folder\n", 32 | "- if there is a twbx, unzip it to create a twb, then work with this\n", 33 | "- if there's only a twb, work with this" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": { 40 | "scrolled": true 41 | }, 42 | "outputs": [], 43 | "source": [ 44 | "input_path = \"inputs\"\n", 45 | "output_path = \"outputs\"\n", 46 | "\n", 47 | "mypath = \"./{}\".format(input_path)" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": { 54 | "scrolled": false 55 | }, 56 | "outputs": [], 57 | "source": [ 58 | "#only gets files and not directories within the inputs folder -https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory\n", 59 | "f = [f for f in os.listdir(mypath) if isfile(join(mypath, f)) and f[-5:] == '.twbx'] \n", 60 | "f" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "def removeSpecialCharFromStr(spstring):\n", 70 | " \n", 71 | " return ''.join(e for e in spstring if e.isalnum())" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "for i in f: \n", 81 | " \n", 82 | " if i[-5:] == '.twbx':\n", 83 | " sp_packagedWorkbook = i[:len(i)-5]\n", 84 | " print(sp_packagedWorkbook)\n", 85 | " packagedWorkbook = removeSpecialCharFromStr(sp_packagedWorkbook)+'.twbx'\n", 86 | " print(packagedWorkbook)\n", 87 | " \n", 88 | " old_file = join(input_path, sp_packagedWorkbook+'.twbx')\n", 89 | " new_file = join(input_path, packagedWorkbook)\n", 90 | " os.rename(old_file, 
new_file)\n", 91 | " \n", 92 | " with zipfile.ZipFile(input_path+\"/\"+packagedWorkbook, 'r') as zip_ref:\n", 93 | " zip_ref.extractall(input_path+\"/\")\n", 94 | " else:\n", 95 | " packagedWorkbook = \"\"\n", 96 | " \n", 97 | "for i in [f for f in os.listdir(mypath) if isfile(join(mypath, f))] :\n", 98 | " \n", 99 | " if i[-4:] == '.twb':\n", 100 | " sp_unpackagedWorkbook = i[:len(i)-4]\n", 101 | " unpackedWorkbook = removeSpecialCharFromStr(sp_unpackagedWorkbook)+'.twb' \n", 102 | " \n", 103 | " old_file = join(input_path, sp_unpackagedWorkbook+'.twb')\n", 104 | " new_file = join(input_path, unpackedWorkbook)\n", 105 | " os.rename(old_file, new_file)\n", 106 | "\n", 107 | "print('\\n')\n", 108 | "print('packaged workbook: ' + packagedWorkbook)\n", 109 | "print('unpackaged workbook: ' + unpackedWorkbook)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "tableauFile = input_path+\"/\"+unpackedWorkbook\n", 119 | "tableauFile" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "packagedTableauFile = input_path+\"/\"+packagedWorkbook\n", 129 | "packagedTableauFile" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "#substring to be used when naming the exported data\n", 139 | "\n", 140 | "tableau_name_substring = packagedWorkbook.replace(\".twbx\",\"\")[:30]\n", 141 | "tableau_name_substring" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "# Parse xml to get all calculations" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "tree = ET.parse(tableauFile)\n", 158 | "root = tree.getroot()\n", 159 | "\n", 160 | "collator1 = []\n", 161 | "calcNames = []\n", 162 | "calcCaptions = []\n", 163 | "\n", 164 | "for_findall = [\"./datasources/datasource/column\", \"./worksheets/worksheet/table/view/datasource-dependencies/column\"]\n", 165 | "\n", 166 | "for pathy in for_findall:\n", 167 | " for elem in root.findall(pathy):\n", 168 | "\n", 169 | " dict_temp = {}\n", 170 | "\n", 171 | " if (elem.findall('calculation')) != []: #only get nodes where there is a calculation\n", 172 | " try:\n", 173 | " dict_temp['caption'] = elem.attrib['caption']\n", 174 | " calcCaptions.append(elem.attrib['caption'])\n", 175 | " except:\n", 176 | " dict_temp['caption'] = elem.attrib['name'] #DEPRECATED #'MISSING'\n", 177 | " calcCaptions.append(elem.attrib['name']) #DEPRECATED append('MISSING')\n", 178 | "\n", 179 | " dict_temp['datatype'] = elem.attrib['datatype']\n", 180 | " dict_temp['name'] = elem.attrib['name']\n", 181 | "\n", 182 | " f2 = (elem.attrib['name']).replace(']','')\n", 183 | " f2 = f2.replace('[', '')\n", 184 | " calcNames.append(f2)\n", 185 | "\n", 186 | " try: #this part evaluates for a parameter\n", 187 | " paramExists = elem.attrib['param-domain-type']\n", 188 | " dict_temp['isParameter'] = 'yes'\n", 189 | " dict_temp['formula'] = 'NA'\n", 190 | "\n", 191 | " except: #this part is for calculations only (not parameters)\n", 192 | " dict_temp['isParameter'] = 'no'\n", 193 | "\n", 194 | " try:\n", 195 | " for calc in elem.findall('calculation'):\n", 196 | " dict_temp['formula'] = calc.attrib['formula']\n", 197 | " except:\n", 198 | "\n", 199 | " 
dict_temp['formula'] = 'NA'\n", 200 | "\n", 201 | " collator1.append(dict_temp)" 202 | ] 203 | }, 204 | { 205 | "cell_type": "code", 206 | "execution_count": null, 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "calcDict = dict(zip(calcNames, calcCaptions))\n", 211 | "calcDict" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "def default_to_friendly_names(formulaList):\n", 221 | "\n", 222 | " for i in formulaList:\n", 223 | " for tableauName, friendlyName in calcDict.items():\n", 224 | " i['formula'] = (i['formula']).replace(tableauName, friendlyName)\n", 225 | " \n", 226 | " return formulaList" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": { 233 | "scrolled": true 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "collator1 = default_to_friendly_names(collator1)\n", 238 | "collator1[0:2]" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": { 245 | "scrolled": true 246 | }, 247 | "outputs": [], 248 | "source": [ 249 | "df = pd.DataFrame(collator1)\n", 250 | "df = df[['caption', 'datatype', 'formula', 'isParameter', 'name']]\n", 251 | "df.columns = ['CalculationName', 'DataType', 'Formula', 'isParameter', 'RawName']\n", 252 | "\n", 253 | "df = df.drop_duplicates()\n", 254 | "\n", 255 | "df = df.sort_values(by=['isParameter','CalculationName'])\n", 256 | "df = df.reset_index(drop=True)\n", 257 | "df" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "# Getting all filters for all worksheets" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [ 273 | "tree = ET.parse(tableauFile)\n", 274 | "root = tree.getroot()\n", 275 | "\n", 276 | "filters_in_sheet = []\n", 277 | "context = []\n", 278 | "collatelist = []\n", 279 | "\n", 280 | "for worskheet in root.findall(\"./worksheets/worksheet\"):\n", 281 | " \n", 282 | " tempdict = {}\n", 283 | " c = 0\n", 284 | " \n", 285 | " for filt in worskheet.findall('table/view/filter'):\n", 286 | "\n", 287 | " calcfromfilter = filt.attrib['column'] \n", 288 | " pat = '(?<=\\:)(.*?)(?=\\:)' \n", 289 | " string_cleaned = calcfromfilter.split('].[')[1].replace(']','')\n", 290 | " \n", 291 | " tempdict['field'] = calcfromfilter\n", 292 | " tempdict['formula'] = calcfromfilter\n", 293 | " tempdict['counter'] = c\n", 294 | " tempdict['sheetname'] = worskheet.attrib['name']\n", 295 | " \n", 296 | " try:\n", 297 | " st1 = re.findall(pat,string_cleaned)[0]\n", 298 | " tempdict['field'] = st1\n", 299 | " tempdict['formula'] = st1\n", 300 | " collatelist.append(tempdict)\n", 301 | " \n", 302 | " except:\n", 303 | " st2 = string_cleaned.replace(':','')\n", 304 | " tempdict['field'] = st2\n", 305 | " tempdict['formula'] = st2\n", 306 | " collatelist.append(tempdict)\n", 307 | "\n", 308 | " try:\n", 309 | " tempdict['context'] = filt.attrib['context']\n", 310 | " except:\n", 311 | " tempdict['context'] = 'False'\n", 312 | " \n", 313 | " c = c + 1\n", 314 | " tempdict = {}\n", 315 | " \n", 316 | "collatelist[0:2]" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": null, 322 | "metadata": {}, 323 | "outputs": [], 324 | "source": [ 325 | "collatelist = default_to_friendly_names(collatelist)\n", 326 | "collatelist[0:2]" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | 
"execution_count": null, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [ 335 | "try: \n", 336 | " df1 = pd.DataFrame(collatelist)\n", 337 | "\n", 338 | " df1 = df1[['sheetname', 'formula', 'context', 'field']]\n", 339 | " df1.columns = ['Sheet Name', 'FilterField', 'Context filter', 'FilterField_RawName']\n", 340 | "\n", 341 | " print(df1.head(2))\n", 342 | "except:\n", 343 | " print('error with df1')" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "# Extracting rows and cols for each sheet" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": null, 356 | "metadata": { 357 | "scrolled": true 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "collecteddata = []\n", 362 | "\n", 363 | "for worksheet in root.findall(\"./worksheets/worksheet\"):\n", 364 | "\n", 365 | " argumentstopass = ['rows', 'cols']\n", 366 | " \n", 367 | " for i in argumentstopass: \n", 368 | " \n", 369 | " internaldict = {}\n", 370 | "\n", 371 | " internaldict['sheetname'] = worksheet.attrib['name']\n", 372 | " internaldict['type'] = i\n", 373 | " \n", 374 | " formulahere = worksheet.findall('table/'+i)[0].text\n", 375 | " internaldict['formula'] = formulahere\n", 376 | " \n", 377 | " collecteddata.append(internaldict)\n", 378 | " \n", 379 | "collecteddata[0:2]" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "scrolled": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "for i in collecteddata:\n", 391 | "\n", 392 | " try:\n", 393 | " pattern = '\\:.*?\\:'\n", 394 | " pat = '(?<=\\:)(.*?)(?=\\:)'\n", 395 | "\n", 396 | " calculationsWithColon = re.findall(pattern,i['formula']) \n", 397 | " calcsWithoutColon = []\n", 398 | "\n", 399 | " for n in calculationsWithColon:\n", 400 | " oneCalcWithoutColon = re.findall(pat,n)[0]\n", 401 | "\n", 402 | " calcsWithoutColon.append(oneCalcWithoutColon)\n", 403 | " \n", 404 | " i['extracted formulas'] = calcsWithoutColon\n", 405 | " \n", 406 | " except:\n", 407 | " i['extracted formulas'] = []\n", 408 | " \n", 409 | " newcalcs = []\n", 410 | " formulas_to_process = i['extracted formulas']\n", 411 | " \n", 412 | " for n in formulas_to_process:\n", 413 | " \n", 414 | " for tableauName, friendlyName in calcDict.items():\n", 415 | " \n", 416 | " n = n.replace(tableauName, friendlyName)\n", 417 | " \n", 418 | " newcalcs.append(n)\n", 419 | " \n", 420 | " #version 2.35 added this part to check for longitude or latitute in the formula\n", 421 | " #separate to other try/except as long/lat appear in a different string structure so cannot analyse with above regex\n", 422 | " try:\n", 423 | " if \"Longitude (generated)\" in i['formula']:\n", 424 | " newcalcs.append(\"Longitude (generated)\")\n", 425 | " elif \"Latitude (generated)\" in i['formula']:\n", 426 | " newcalcs.append(\"Latitude (generated)\")\n", 427 | " except:\n", 428 | " dummy = 0\n", 429 | " \n", 430 | " i['processed formulas'] = newcalcs\n", 431 | "\n", 432 | "collecteddata" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": null, 438 | "metadata": { 439 | "scrolled": true 440 | }, 441 | "outputs": [], 442 | "source": [ 443 | "df2 = pd.DataFrame(collecteddata)\n", 444 | "df2 = df2[['extracted formulas', 'formula', 'processed formulas', 'sheetname', 'type']]\n", 445 | "df2 = df2.drop(columns=['formula', 'extracted formulas'])\n", 446 | "df2 = df2.pivot(index='sheetname', columns='type', values='processed formulas')\n", 447 | "df2 = 
df2.reset_index()\n", 448 | "df2" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "# Doc API" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | "# All default fields - DOC API" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": { 469 | "scrolled": true 470 | }, 471 | "outputs": [], 472 | "source": [ 473 | "packagedTableauFile" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": null, 479 | "metadata": { 480 | "scrolled": true 481 | }, 482 | "outputs": [], 483 | "source": [ 484 | "#get all fields in workbook\n", 485 | "sourceTWBX = Workbook(packagedTableauFile)\n", 486 | "\n", 487 | "collator = []\n", 488 | "calcID = []\n", 489 | "calcID2 = []\n", 490 | "calcNames = []\n", 491 | "\n", 492 | "c = 0\n", 493 | "\n", 494 | "worksheets = sourceTWBX.worksheets\n", 495 | "\n", 496 | "#the per-worksheet loop is intentionally left out here, so that ALL fields are listed once\n", 497 | " \n", 498 | "for datasource in sourceTWBX.datasources:\n", 499 | "\n", 500 | " for count, field in enumerate(datasource.fields.values()):\n", 501 | "\n", 502 | " #if worksheet in field.worksheets: #removed this part so all fields are listed, as otherwise some fields were missed out\n", 503 | "\n", 504 | " dict_temp = {}\n", 505 | " dict_temp['counter'] = c\n", 506 | " dict_temp['worksheet'] = field.worksheets #list of the sheets where this field appears\n", 507 | " dict_temp['datasource_name'] = datasource.name\n", 508 | " dict_temp['field_WHOLE'] = field\n", 509 | " dict_temp['field_name'] = field.name\n", 510 | " dict_temp['field_caption'] = field.caption\n", 511 | " dict_temp['field_calculation'] = field.calculation\n", 512 | " dict_temp['field_id'] = field.id\n", 513 | " dict_temp['field_datatype'] = field.datatype\n", 514 | "\n", 515 | "\n", 516 | " if field.calculation is not None:\n", 517 | " calcID.append(field.id)\n", 518 | " calcNames.append(field.name)\n", 519 | "\n", 520 | " f2 = (field.id).replace(']','')\n", 521 | " f2 = f2.replace('[', '')\n", 522 | " calcID2.append(f2)\n", 523 | "\n", 524 | " c = c + 1\n", 525 | "\n", 526 | " collator.append(dict_temp)" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [ 535 | "calcDict = dict(zip(calcID, calcNames))\n", 536 | "calcDict2 = dict(zip(calcID2, calcNames)) #raw fields without any []\n", 537 | "\n", 538 | "def default_to_friendly_names2(formulaList,fieldToConvert, dictToUse):\n", 539 | "\n", 540 | " for i in formulaList:\n", 541 | " for tableauName, friendlyName in dictToUse.items():\n", 542 | " try:\n", 543 | " i[fieldToConvert] = (i[fieldToConvert]).replace(tableauName, friendlyName)\n", 544 | " except:\n", 545 | " a = 0\n", 546 | " \n", 547 | " return formulaList" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "def f(row):\n", 557 | " if row['field_calculation'] is None:\n", 558 | " val = 'Datasource field'\n", 559 | " else:\n", 560 | " val = 'Calculated field'\n", 561 | " return val" 562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": null, 567 | "metadata": {}, 568 | "outputs": [], 569 | "source": [ 570 | "default_to_friendly_names2(collator,'field_calculation',calcDict)\n", 571 | "\n", 572 | "df_API_all = pd.DataFrame(collator)\n", 573 | "df_API_all['field_type'] = df_API_all.apply(f, axis=1)\n", 574 | "\n", 575 
| "df_API_all.head()" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": null, 581 | "metadata": {}, 582 | "outputs": [], 583 | "source": [ 584 | "df_defaultFields = df_API_all[df_API_all['field_type'] == 'Datasource field'][['field_id', 'field_caption','field_datatype', 'datasource_name']].drop_duplicates().copy()\n", 585 | "\n", 586 | "df_defaultFields['prefOrder'] = np.where(df_defaultFields['field_caption'].isnull(), 0, 1)\n", 587 | "df_defaultFields['field_id2'] = df_defaultFields['field_id'].str.replace('[','')\n", 588 | "df_defaultFields['field_id2'] = df_defaultFields['field_id2'].str.replace(']','')\n", 589 | "\n", 590 | "df_defaultFields = df_defaultFields.sort_values(by = ['field_id2'])\n", 591 | "#https://stackoverflow.com/questions/63271050/use-drop-duplicates-in-pandas-df-but-choose-keep-column-based-on-a-preference-li\n", 592 | "preference_list=[1,0]\n", 593 | "\n", 594 | "df_defaultFields[\"prefOrder\"] = pd.Categorical(df_defaultFields[\"prefOrder\"], categories=preference_list, ordered=True)\n", 595 | "\n", 596 | "df_defaultFields = df_defaultFields.sort_values([\"field_id2\",\"prefOrder\"]).drop_duplicates(\"field_id2\")\n", 597 | "df_defaultFields = df_defaultFields.drop('prefOrder', axis=1)\n", 598 | "df_defaultFields = df_defaultFields.drop('field_id2', axis=1)\n", 599 | "df_defaultFields.head(2)" 600 | ] 601 | }, 602 | { 603 | "cell_type": "markdown", 604 | "metadata": {}, 605 | "source": [ 606 | "# Parameters" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": { 613 | "scrolled": true 614 | }, 615 | "outputs": [], 616 | "source": [ 617 | "colsToUse = ['field_id', 'field_name', 'field_calculation', 'field_caption','field_datatype', 'datasource_name' ]\n", 618 | "dfAPIParameters = df_API_all[colsToUse][df_API_all['datasource_name']=='Parameters'].drop_duplicates().copy()\n", 619 | "\n", 620 | "dfAPIParameters" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": null, 626 | "metadata": {}, 627 | "outputs": [], 628 | "source": [ 629 | "df = df.merge(dfAPIParameters[['field_id','field_calculation']], left_on='RawName', right_on = 'field_id', how='left')\n", 630 | "\n", 631 | "df[\"Formula\"] = np.where(df[\"Formula\"] == \"NA\", df['field_calculation'], df[\"Formula\"])\n", 632 | "df = df.drop(columns=['field_id', 'field_calculation'])\n", 633 | "df" 634 | ] 635 | }, 636 | { 637 | "cell_type": "markdown", 638 | "metadata": {}, 639 | "source": [ 640 | "# Sheet - all field dependencies, not just the explicitly used fields" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "metadata": {}, 647 | "outputs": [], 648 | "source": [ 649 | "#df_api_insheet\n", 650 | "sourceTWBX = Workbook(packagedTableauFile)\n", 651 | "\n", 652 | "collator_sheet_dependencies = []\n", 653 | "\n", 654 | "c = 0\n", 655 | "\n", 656 | "worksheets = sourceTWBX.worksheets\n", 657 | "\n", 658 | "for worksheet in worksheets:\n", 659 | " \n", 660 | " for datasource in sourceTWBX.datasources:\n", 661 | " \n", 662 | " for count, field in enumerate(datasource.fields.values()):\n", 663 | " \n", 664 | " if worksheet in field.worksheets: #to see if only fields that appear in sheets are listed, else last df is too large\n", 665 | " \n", 666 | " dict_temp = {}\n", 667 | " dict_temp['counter'] = c\n", 668 | " dict_temp['worksheet'] = worksheet\n", 669 | " dict_temp['datasource_name'] = datasource.name\n", 670 | " dict_temp['field_WHOLE'] = field\n", 671 | " 
dict_temp['field_name'] = field.name\n", 672 | " dict_temp['field_caption'] = field.caption\n", 673 | " dict_temp['field_calculation'] = field.calculation\n", 674 | " dict_temp['field_id'] = field.id\n", 675 | " dict_temp['field_datatype'] = field.datatype\n", 676 | " \n", 677 | " c = c + 1\n", 678 | " \n", 679 | " collator_sheet_dependencies.append(dict_temp)" 680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": null, 685 | "metadata": { 686 | "scrolled": false 687 | }, 688 | "outputs": [], 689 | "source": [ 690 | "#default_to_friendly_names2(collator_sheet_dependencies, 'field_calculation',calcDict)\n", 691 | "\n", 692 | "df_api_insheet = pd.DataFrame(collator_sheet_dependencies)\n", 693 | "df_api_insheet['field_type'] = df_api_insheet.apply(f, axis=1)\n", 694 | "df_api_insheet.head()" 695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": null, 700 | "metadata": { 701 | "scrolled": true 702 | }, 703 | "outputs": [], 704 | "source": [ 705 | "df_sheetDependencies = df_api_insheet.copy()\n", 706 | "preference_list=[1,0]\n", 707 | "\n", 708 | "df_sheetDependencies['prefOrder'] = np.where(df_sheetDependencies['field_caption'].isnull(), 0, 1)\n", 709 | "\n", 710 | "df_sheetDependencies['field_id2'] = df_sheetDependencies['field_id'].str.replace('[','')\n", 711 | "df_sheetDependencies['field_id2'] = df_sheetDependencies['field_id2'].str.replace(']','')\n", 712 | "\n", 713 | "df_sheetDependencies[\"prefOrder\"] = pd.Categorical(df_sheetDependencies[\"prefOrder\"], categories=preference_list, ordered=True)\n", 714 | "df_sheetDependencies = df_sheetDependencies.sort_values([\"field_id2\",\\\n", 715 | " \"prefOrder\"]).drop_duplicates(subset=[\"field_id2\", \"worksheet\"])\n", 716 | "\n", 717 | "df_sheetDependencies = df_sheetDependencies.drop(\\\n", 718 | " columns=['prefOrder', 'field_id2', 'counter', 'field_caption', 'field_WHOLE', \\\n", 719 | " 'field_calculation', 'field_id'])\n", 720 | "\n", 721 | "df_sheetDependencies = df_sheetDependencies[['worksheet', 'field_name', 'field_datatype', \\\n", 722 | " 'field_type', 'datasource_name']].sort_values(by = ['worksheet', 'field_type', 'field_name'])\n", 723 | "df_sheetDependencies.head()" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": {}, 729 | "source": [ 730 | "# General workbook description" 731 | ] 732 | }, 733 | { 734 | "cell_type": "code", 735 | "execution_count": null, 736 | "metadata": {}, 737 | "outputs": [], 738 | "source": [ 739 | "sourceTWBX = Workbook(packagedTableauFile)" 740 | ] 741 | }, 742 | { 743 | "cell_type": "code", 744 | "execution_count": null, 745 | "metadata": {}, 746 | "outputs": [], 747 | "source": [ 748 | "collate_list = []\n", 749 | "\n", 750 | "for dash in sourceTWBX.dashboards:\n", 751 | " dicti = {}\n", 752 | " \n", 753 | " dicti['type'] = 'dashboard'\n", 754 | " # print(format(dash))\n", 755 | " dicti['name'] = format(dash)\n", 756 | " \n", 757 | " collate_list.append(dicti)\n", 758 | " \n", 759 | "for data in sourceTWBX.datasources:\n", 760 | " dicti = {}\n", 761 | " \n", 762 | " dicti['type'] = 'datasource'\n", 763 | " dicti['name'] = format(data.name)\n", 764 | " # print(format(data.name))\n", 765 | " \n", 766 | " collate_list.append(dicti)\n", 767 | " \n", 768 | "for data in sourceTWBX.worksheets:\n", 769 | " dicti = {}\n", 770 | " \n", 771 | " dicti['type'] = 'sheet'\n", 772 | " dicti['name'] = format(data)\n", 773 | " # print(format(data))\n", 774 | " \n", 775 | " collate_list.append(dicti)" 776 | ] 777 | }, 778 | { 779 | "cell_type": 
"code", 780 | "execution_count": null, 781 | "metadata": {}, 782 | "outputs": [], 783 | "source": [ 784 | "df_workbookdec = pd.DataFrame(collate_list)\n", 785 | "df_workbookdec = df_workbookdec[['type', 'name']]\n", 786 | "df_workbookdec.head(2)" 787 | ] 788 | }, 789 | { 790 | "cell_type": "code", 791 | "execution_count": null, 792 | "metadata": { 793 | "scrolled": false 794 | }, 795 | "outputs": [], 796 | "source": [ 797 | "df_workbookdec_counts = df_workbookdec.groupby(['type']).count().reset_index()\n", 798 | "df_workbookdec_counts" 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": null, 804 | "metadata": {}, 805 | "outputs": [], 806 | "source": [ 807 | "#count parameters and calc fields, based on xml scraping\n", 808 | "parameterCount = len(df[df['isParameter'] == 'yes'])\n", 809 | "calcFieldCount = len(df[df['isParameter'] != 'yes'])" 810 | ] 811 | }, 812 | { 813 | "cell_type": "code", 814 | "execution_count": null, 815 | "metadata": {}, 816 | "outputs": [], 817 | "source": [ 818 | "new_row1 = {'type':'parameter', 'name':parameterCount}\n", 819 | "new_row2 = {'type':'calculated field', 'name':calcFieldCount}\n", 820 | "\n", 821 | "toappend = [new_row1, new_row2]\n", 822 | "\n", 823 | "for i in toappend:\n", 824 | "#append row to the dataframe\n", 825 | " df_workbookdec_counts = df_workbookdec_counts.append(i, ignore_index=True)\n", 826 | "\n", 827 | "df_workbookdec_counts.columns = ['type', 'count']\n", 828 | "df_workbookdec_counts" 829 | ] 830 | }, 831 | { 832 | "cell_type": "markdown", 833 | "metadata": {}, 834 | "source": [ 835 | "## Generating an excel file from a df (so the excel rows/cols can be formatted), then turning the excel into a pdf" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": null, 841 | "metadata": {}, 842 | "outputs": [], 843 | "source": [ 844 | "cwd = os.getcwd()\n", 845 | "path_string = pathlib.Path(cwd).resolve().__str__() + \"\\{}\"" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "metadata": {}, 851 | "source": [ 852 | "- Loading the file names and output locations for the excel and pdfs to be produced" 853 | ] 854 | }, 855 | { 856 | "cell_type": "code", 857 | "execution_count": null, 858 | "metadata": {}, 859 | "outputs": [], 860 | "source": [ 861 | "name_to_use = tableau_name_substring \n", 862 | "\n", 863 | "newFileName = 'outputs\\{}'.format(name_to_use)\n", 864 | "excelName = newFileName + \".xlsx\"\n", 865 | "pdfName = newFileName + \".pdf\"\n", 866 | "print(pdfName)\n", 867 | "\n", 868 | "excel_path = path_string.format(excelName)\n", 869 | "path_to_pdf = path_string.format(pdfName)" 870 | ] 871 | }, 872 | { 873 | "cell_type": "markdown", 874 | "metadata": {}, 875 | "source": [ 876 | "- Functions to format the excel files" 877 | ] 878 | }, 879 | { 880 | "cell_type": "code", 881 | "execution_count": null, 882 | "metadata": {}, 883 | "outputs": [], 884 | "source": [ 885 | "#colors to be used in each sheet\n", 886 | "c1 = '#f4dfa4'\n", 887 | "c2 = '#ffc8b3'\n", 888 | "c3 = '#fff0b3'\n", 889 | "c4 = '#d5dfb9'\n", 890 | "c5 = '#d1c5d3'\n", 891 | "c6 = '#bfd9d7'" 892 | ] 893 | }, 894 | { 895 | "cell_type": "code", 896 | "execution_count": null, 897 | "metadata": {}, 898 | "outputs": [], 899 | "source": [ 900 | "def mainCol(colNumber, color):\n", 901 | " format_mainCol = workbook.add_format({'text_wrap': True, 'bold': True})\n", 902 | " format_mainCol.set_align('vcenter')\n", 903 | " format_mainCol.set_bg_color(color)\n", 904 | " format_mainCol.set_border(1)\n", 905 | " 
worksheet.set_column(colNumber,colNumber,20,format_mainCol)\n", 906 | " return worksheet" 907 | ] 908 | }, 909 | { 910 | "cell_type": "code", 911 | "execution_count": null, 912 | "metadata": {}, 913 | "outputs": [], 914 | "source": [ 915 | "def normalCol(colNumber, colWidth):\n", 916 | " format2 = workbook.add_format({'text_wrap': True})\n", 917 | " format2.set_align('vcenter')\n", 918 | " format2.set_border(1)\n", 919 | " worksheet.set_column(colNumber,colNumber,colWidth,format2)\n", 920 | " return worksheet" 921 | ] 922 | }, 923 | { 924 | "cell_type": "markdown", 925 | "metadata": {}, 926 | "source": [ 927 | "- Creation of excel file" 928 | ] 929 | }, 930 | { 931 | "cell_type": "code", 932 | "execution_count": null, 933 | "metadata": {}, 934 | "outputs": [], 935 | "source": [ 936 | "#modify this part if you want to add more information/dfs to be saved as a separate sheet in excel\n", 937 | "\n", 938 | "dfs_to_use = [{'excelSheetTitle': 'Dashboard, datasource and sheet details', 'df_to_use':df_workbookdec, 'mainColWidth':'' , \n", 939 | " 'normalColWidth': [30], 'sheetName': 'GeneralDetails', 'footer': 'Data_1 (DOC API)', 'papersize':9, 'color': c1} , \n", 940 | " \n", 941 | " {'excelSheetTitle': 'Overall counts of dashboards, datasources and sheets', 'df_to_use':df_workbookdec_counts, 'mainColWidth':'' , \n", 942 | " 'normalColWidth': [10], 'sheetName': 'GeneralCounts', 'footer': 'Data_2 (DOC API + XML)', 'papersize':9, 'color': c1},\n", 943 | " \n", 944 | " {'excelSheetTitle': 'Default fields from all datasources', 'df_to_use':df_defaultFields, 'mainColWidth':'' , \n", 945 | " 'normalColWidth': [20,20,40], 'sheetName': 'DefaultFields', 'footer': 'Data_3 (XML extraction)', 'papersize':9, 'color': c2},\n", 946 | " \n", 947 | " {'excelSheetTitle': 'Calculated fields and parameters', 'df_to_use':df, 'mainColWidth':'' , \n", 948 | " 'normalColWidth': [10,50,10,20], 'sheetName': 'CalculatedFields', 'footer': 'Data_4 (XML extraction + DOC API for Param value)', \n", 949 | " 'papersize':9, 'color': c3},\n", 950 | " \n", 951 | " {'excelSheetTitle': 'Filters used in each sheet', 'df_to_use':df1, 'mainColWidth':'' , \n", 952 | " 'normalColWidth': [20,20,40], 'sheetName': 'Filters', 'footer': 'Data_5 (XML extraction)', 'papersize':9, 'color': c4},\n", 953 | " \n", 954 | " {'excelSheetTitle': 'Metrics used in Columns and Rows, for each sheet', 'df_to_use':df2, 'mainColWidth':'' , \n", 955 | " 'normalColWidth': [30,40], 'sheetName': 'RowsAndCols', 'footer': 'Data_6 (XML extraction)', 'papersize':9, 'color': c5},\n", 956 | " \n", 957 | " {'excelSheetTitle': 'Sheet dependencies on default fields, calculated fields and parameters', 'df_to_use':df_sheetDependencies, 'mainColWidth':'' , \n", 958 | " 'normalColWidth': [30,15,25,30], 'sheetName': 'SheetDependencies', 'footer': 'Data_7 (DOC API)', 'papersize':8, 'color': c6}\n", 959 | " ]\n", 960 | "\n", 961 | "#papersize: a3 = 8, a4 = 9" 962 | ] 963 | }, 964 | { 965 | "cell_type": "code", 966 | "execution_count": null, 967 | "metadata": {}, 968 | "outputs": [], 969 | "source": [ 970 | "writer = pd.ExcelWriter(excelName, engine = 'xlsxwriter')\n", 971 | "\n", 972 | "#code to create each sheet in excel, with the specified df and formatting each sheet as per requirements\n", 973 | "#also adds a header and footer to each sheet\n", 974 | "#all the info to be replaced below (ie. 
for each df) comes form the dfs_to_use list of dictionaries\n", 975 | "\n", 976 | "for x in dfs_to_use:\n", 977 | " excelSheetTitle = x['excelSheetTitle']\n", 978 | " df_to_use = x['df_to_use']\n", 979 | " normalColWidth = x['normalColWidth']\n", 980 | " sheetName = x['sheetName']\n", 981 | " papersize = x['papersize']\n", 982 | " footer = x['footer']\n", 983 | " color = x['color']\n", 984 | "\n", 985 | " df_to_use.to_excel(writer, sheet_name = sheetName, index=False)\n", 986 | " \n", 987 | " workbook=writer.book\n", 988 | " worksheet = writer.sheets[sheetName]\n", 989 | "\n", 990 | " worksheet = mainCol(0, color)\n", 991 | " \n", 992 | " ws = 1\n", 993 | " for i in normalColWidth:\n", 994 | " worksheet = normalCol(ws,i)\n", 995 | " ws = ws + 1\n", 996 | "\n", 997 | " worksheet.set_paper(papersize) # a4\n", 998 | " worksheet.fit_to_pages(1,0) # fit to 1 page wide, n long\n", 999 | " worksheet.repeat_rows(0) # repeat the first row\n", 1000 | " \n", 1001 | " header_x = '&C&\"Arial,Bold\"&10{}'.format(excelSheetTitle)\n", 1002 | " footer_x = '&L{}&CPage &P of &N'.format(footer)\n", 1003 | "\n", 1004 | " worksheet.set_header(header_x)\n", 1005 | " worksheet.set_footer(footer_x)\n", 1006 | "\n", 1007 | "writer.save()" 1008 | ] 1009 | }, 1010 | { 1011 | "cell_type": "markdown", 1012 | "metadata": {}, 1013 | "source": [ 1014 | "- Creation of pdf from excel file" 1015 | ] 1016 | }, 1017 | { 1018 | "cell_type": "code", 1019 | "execution_count": null, 1020 | "metadata": {}, 1021 | "outputs": [], 1022 | "source": [ 1023 | "#this creates an index to list each excel sheet, based on the number of sheets that were created before\n", 1024 | "\n", 1025 | "for_ws_index_list = []\n", 1026 | "for i in range(len(dfs_to_use)):\n", 1027 | " for_ws_index_list.append(i+1)" 1028 | ] 1029 | }, 1030 | { 1031 | "cell_type": "code", 1032 | "execution_count": null, 1033 | "metadata": {}, 1034 | "outputs": [], 1035 | "source": [ 1036 | "excel = win32com.client.Dispatch(\"Excel.Application\")\n", 1037 | "excel.Visible = False\n", 1038 | "\n", 1039 | "wb = excel.Workbooks.Open(excel_path)\n", 1040 | "\n", 1041 | "#print all the excel sheets into a single pdf\n", 1042 | "ws_index_list = for_ws_index_list\n", 1043 | "wb.Worksheets(ws_index_list).Select()\n", 1044 | "wb.ActiveSheet.ExportAsFixedFormat(0, path_to_pdf)\n", 1045 | "wb.Close()\n", 1046 | "excel.Quit()" 1047 | ] 1048 | } 1049 | ], 1050 | "metadata": { 1051 | "kernelspec": { 1052 | "display_name": "Python 3 (ipykernel)", 1053 | "language": "python", 1054 | "name": "python3" 1055 | }, 1056 | "language_info": { 1057 | "codemirror_mode": { 1058 | "name": "ipython", 1059 | "version": 3 1060 | }, 1061 | "file_extension": ".py", 1062 | "mimetype": "text/x-python", 1063 | "name": "python", 1064 | "nbconvert_exporter": "python", 1065 | "pygments_lexer": "ipython3", 1066 | "version": "3.7.13" 1067 | } 1068 | }, 1069 | "nbformat": 4, 1070 | "nbformat_minor": 2 1071 | } 1072 | -------------------------------------------------------------------------------- /Tableau_calculation_extractor_with_mermaid.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# version 3.1\n", 10 | "\n", 11 | "import pandas as pd\n", 12 | "import os, re, sys\n", 13 | "import string\n", 14 | "import webbrowser\n", 15 | "\n", 16 | "from tableaudocumentapi import Workbook\n", 17 | "from os.path import isfile, join\n", 18 | "\n", 
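"#excelgenerator is the companion module in this repo; it wraps the Excel formatting and Excel-to-PDF export helpers used further below\n",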
19 | "import excelgenerator as exg\n", 20 | "\n", 21 | "pd.set_option('display.max_columns', None)" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## File Handling\n", 29 | "\n", 30 | "- this version of code will only work with twbx files" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": { 37 | "scrolled": true 38 | }, 39 | "outputs": [], 40 | "source": [ 41 | "input_path = \"inputs\"\n", 42 | "output_path = \"outputs\"\n", 43 | "\n", 44 | "mypath = \"./{}\".format(input_path) #./ points to \"this path\" as a relative path\n", 45 | "\n", 46 | "mypath" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": { 53 | "scrolled": true 54 | }, 55 | "outputs": [], 56 | "source": [ 57 | "#only gets files and not directories within the inputs folder -https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory\n", 58 | "input_files = [f for f in os.listdir(mypath) if isfile(join(mypath, f)) and f[-5:] == '.twbx'] \n", 59 | "#input_files.pop()\n", 60 | "input_files" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "def removeSpecialCharFromStr(spstring):\n", 70 | " \n", 71 | "# \"\"\"\n", 72 | "# input: string\n", 73 | "# output: new string, without any special char\n", 74 | "# \"\"\"\n", 75 | " \n", 76 | " return ''.join(e for e in spstring if e.isalnum())\n", 77 | "\n", 78 | "def removeSpecialCharFromStr_leaveSpaces(spstring):\n", 79 | " \n", 80 | " return ''.join(e for e in spstring if (e.isalnum() or e ==' '))\n", 81 | "\n", 82 | "def remove_sp_char_then_turn_spaces_into_underscore(string_to_convert):\n", 83 | " filtered_string = re.sub(r'[^a-zA-Z0-9\\s_]', '', string_to_convert).replace(' ', \"_\")\n", 84 | " return filtered_string\n", 85 | "\n", 86 | "def remove_sp_char_leave_undescore_square_brackets(string_to_convert):\n", 87 | " filtered_string = re.sub(r'[^a-zA-Z0-9\\s._\\[\\]]', '', string_to_convert).replace(' ', \"_\")\n", 88 | " return filtered_string\n", 89 | "\n", 90 | "def find_twbx_file(inputfile):\n", 91 | " \n", 92 | "# \"\"\"\n", 93 | "# input: any input file\n", 94 | "# output: returns the file name without any special char for a twxb file if one is found, else returns empty string\n", 95 | "# \"\"\"\n", 96 | "\n", 97 | " if inputfile[-5:] == '.twbx':\n", 98 | " sp_packagedWorkbook = inputfile[:len(inputfile)-5]\n", 99 | " \n", 100 | " packagedWorkbook = removeSpecialCharFromStr(sp_packagedWorkbook)+'.twbx'\n", 101 | " \n", 102 | " old_file = join(input_path, sp_packagedWorkbook+'.twbx')\n", 103 | " new_file = join(input_path, packagedWorkbook)\n", 104 | " os.rename(old_file, new_file)\n", 105 | "\n", 106 | " else:\n", 107 | " packagedWorkbook = \"\" \n", 108 | " \n", 109 | " return packagedWorkbook" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "for i in input_files: \n", 119 | " packagedWorkbook = find_twbx_file(i)\n", 120 | " print('Packaged workbook (no sp char): ' + packagedWorkbook)\n", 121 | "\n", 122 | " #substring to be used when naming the exported data, NEEDS A PACKAGED WORKBOOK TO EXIST, OTHERWISE IT WILL GIVE AN EMPTY STRING\n", 123 | " tableau_name_substring = packagedWorkbook.replace(\".twbx\",\"\")[:30]\n", 124 | " print('\\nOutput docs name (word/pdf): ' + tableau_name_substring)\n", 125 | " \n", 126 | 
"packagedTableauFile_relPath = input_path+\"/\"+packagedWorkbook\n", 127 | "packagedTableauFile_relPath" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "# Doc API" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": null, 140 | "metadata": {}, 141 | "outputs": [], 142 | "source": [ 143 | "%%capture \n", 144 | "\n", 145 | "#get all fields in workbook\n", 146 | "TWBX_Workbook = Workbook(packagedTableauFile_relPath)\n", 147 | "\n", 148 | "collator = []\n", 149 | "calcID = []\n", 150 | "calcID2 = []\n", 151 | "calcNames = []\n", 152 | "\n", 153 | "c = 0\n", 154 | " \n", 155 | "for datasource in TWBX_Workbook.datasources:\n", 156 | " datasource_name = datasource.name\n", 157 | " datasource_caption = datasource.caption if datasource.caption else datasource_name\n", 158 | "\n", 159 | " for count, field in enumerate(datasource.fields.values()):\n", 160 | " dict_temp = {\n", 161 | " 'counter': c,\n", 162 | " 'datasource_name': datasource_name,\n", 163 | " 'datasource_caption': datasource_caption,\n", 164 | " 'alias': field.alias,\n", 165 | " 'field_calculation': field.calculation,\n", 166 | " 'field_calculation_bk': field.calculation,\n", 167 | " 'field_caption': field.caption,\n", 168 | " 'field_datatype': field.datatype,\n", 169 | " 'field_def_agg': field.default_aggregation,\n", 170 | " 'field_desc': field.description,\n", 171 | " 'field_hidden': field.hidden,\n", 172 | " 'field_id': field.id,\n", 173 | " 'field_is_nominal': field.is_nominal,\n", 174 | " 'field_is_ordinal': field.is_ordinal,\n", 175 | " 'field_is_quantitative': field.is_quantitative,\n", 176 | " 'field_name': field.name,\n", 177 | " 'field_role': field.role,\n", 178 | " 'field_type': field.type,\n", 179 | " 'field_worksheets': field.worksheets,\n", 180 | " 'field_WHOLE': field\n", 181 | " }\n", 182 | "\n", 183 | " if field.calculation is not None:\n", 184 | " calcID.append(field.id)\n", 185 | " calcNames.append(field.name)\n", 186 | "\n", 187 | " f2 = field.id.replace(']', '').replace('[', '')\n", 188 | " calcID2.append(f2)\n", 189 | "\n", 190 | " c += 1\n", 191 | " collator.append(dict_temp)" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": {}, 198 | "outputs": [], 199 | "source": [ 200 | "def default_to_friendly_names2(formulaList,fieldToConvert, dictToUse):\n", 201 | "\n", 202 | " for i in formulaList:\n", 203 | " for tableauName, friendlyName in dictToUse.items():\n", 204 | " try:\n", 205 | " i[fieldToConvert] = (i[fieldToConvert]).replace(tableauName, friendlyName)\n", 206 | " except:\n", 207 | " a = 0\n", 208 | " \n", 209 | " return formulaList\n", 210 | "\n", 211 | "\n", 212 | "def category_field_type(row):\n", 213 | " if row['datasource_name'] == 'Parameters':\n", 214 | " val = 'Parameters'\n", 215 | " elif row['field_calculation'] == None:\n", 216 | " val = 'Default_Field'\n", 217 | " else:\n", 218 | " val = 'Calculated_Field'\n", 219 | " return val\n", 220 | "\n", 221 | "def compare_fields(row):\n", 222 | " if row['field_id'] == row['field_id2']:\n", 223 | " val = 0\n", 224 | " else:\n", 225 | " val = 1\n", 226 | " return val" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": { 233 | "scrolled": false 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "calcDict = dict(zip(calcID, calcNames))\n", 238 | "calcDict2 = dict(zip(calcID2, calcNames)) #raw fields without any []\n", 239 | "\n", 240 | "collator = 
default_to_friendly_names2(collator,'field_calculation',calcDict2)\n", 241 | "\n", 242 | "df_API_all = pd.DataFrame(collator)\n", 243 | "df_API_all['field_type'] = df_API_all.apply(category_field_type, axis=1)\n", 244 | "\n", 245 | "preference_list=['Parameters', 'Calculated_Field', 'Default_Field']\n", 246 | "df_API_all[\"field_type\"] = pd.Categorical(df_API_all[\"field_type\"], categories=preference_list, ordered=True)\n", 247 | "\n", 248 | "#get rid of duplicates for parameters, so only parameters from the explicit Parameters datasource are kept (as they are also listed again under the name of any other datasources)\n", 249 | "df_API_all = df_API_all.sort_values([\"field_id\",\"field_type\"]).drop_duplicates([\"field_id\", 'field_calculation']) \n", 250 | "\n", 251 | "df_API_all['field_id2'] = df_API_all['field_id'].str.replace(r'[\\[\\]]', '', regex=True)\n", 252 | "\n", 253 | "df_API_all['comparison'] = df_API_all.apply(compare_fields, axis=1)\n", 254 | "df_API_all = df_API_all[df_API_all['comparison'] == 1]\n", 255 | "\n", 256 | "df_API_all = df_API_all.drop(['field_id2', 'comparison'], axis=1)\n", 257 | "df_API_all.sort_values(['datasource_name', 'field_type', 'counter', 'field_name'])\n", 258 | "\n", 259 | "df1 = df_API_all[[ 'field_name', 'field_datatype','field_type', 'field_calculation', 'field_id', 'datasource_caption']].copy()\n", 260 | "\n", 261 | "preference_list=[ 'Default_Field', 'Parameters', 'Calculated_Field']\n", 262 | "df1[\"field_type\"] = pd.Categorical(df1[\"field_type\"], categories=preference_list, ordered=True)\n", 263 | "df1 = df1.sort_values(['field_type'])\n", 264 | "\n", 265 | "df1.columns = ['Field_Name', 'DataType', 'Type', 'Calculation', 'Field_ID', 'Datasource']\n", 266 | "\n", 267 | "df1['Field_Name'] = df1['Field_Name'].str.replace(r'[\\[\\]]', '', regex=True)\n", 268 | "\n", 269 | "df1" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "## Generating an excel file from a df (so the excel rows/cols can be formatted), then turning the excel into a pdf" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "#modify this part if you want to add more information/dfs to be saved as a separate sheet in excel\n", 286 | "\n", 287 | "dfs_to_use = [{'excelSheetTitle': 'All fields extracted from DOC API', 'df_to_use':df1, 'mainColWidth':'' , \n", 288 | " 'normalColWidth': [10,15,50,20, 25], 'sheetName': 'GeneralDetails', 'footer': 'Data_1 (DOC API)', 'papersize':9, 'color': '#fff0b3'} \n", 289 | " \n", 290 | " ]\n", 291 | "\n", 292 | "#papersize: a3 = 8, a4 = 9" 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": null, 298 | "metadata": { 299 | "scrolled": true 300 | }, 301 | "outputs": [], 302 | "source": [ 303 | "path_excel_file_to_create, path_pdf_file_to_create = exg.create_new_file_paths(tableau_name_substring+'_CALCS_only')\n", 304 | "\n", 305 | "exg.create_excel_from_dfs(dfs_to_use, path_excel_file_to_create)\n", 306 | "\n", 307 | "exg.create_pdf_from_excel(path_excel_file_to_create, path_pdf_file_to_create, dfs_to_use)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "# Start of mermaid module" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "def first_char_checker(cell_value):\n", 324 | " if cell_value[0] != '[':\n", 325 | " 
cell_value = '__' + cell_value + '__'\n", 326 | " else:\n", 327 | " cell_value = cell_value.replace('[', '__')\n", 328 | " cell_value = cell_value.replace(']', '__')\n", 329 | "\n", 330 | " return cell_value\n", 331 | "\n", 332 | "\n", 333 | "#define abc list to use during mermaid creation\n", 334 | "\n", 335 | "abc=list(string.ascii_uppercase)\n", 336 | "collated_abc = []\n", 337 | "\n", 338 | "for i in abc:\n", 339 | " for j in abc:\n", 340 | " collated_abc.append(i+j)" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": null, 346 | "metadata": {}, 347 | "outputs": [], 348 | "source": [ 349 | "def_fields = df1[df1['Type'] == 'Default_Field']['Field_ID'].copy().apply(remove_sp_char_leave_undescore_square_brackets)\n", 350 | "\n", 351 | "abc_touse = collated_abc[0:len(def_fields)]\n", 352 | "\n", 353 | "def_fields_final = pd.DataFrame(list(zip(def_fields.tolist(), abc_touse)))\n", 354 | "def_fields_final['aa'] = def_fields_final.apply(lambda row: first_char_checker(row[0]), axis=1)\n", 355 | "def_fields_final['default_field'] = def_fields_final.apply(lambda row: '_st_' + row['aa'] + '_en_', axis=1)\n", 356 | "\n", 357 | "mapping_dict_friendly_names = dict(zip(def_fields_final[0].tolist(), abc_touse))\n", 358 | "mapping_dict = dict(zip(def_fields_final['aa'].tolist(), abc_touse))\n", 359 | "\n", 360 | "def_fields_final" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": null, 366 | "metadata": { 367 | "scrolled": true 368 | }, 369 | "outputs": [], 370 | "source": [ 371 | "created_calc = df_API_all[df_API_all['field_type'] != 'Default_Field']\\\n", 372 | " [['field_name', 'field_id', 'field_calculation', 'field_calculation_bk']].copy()\n", 373 | "\n", 374 | "nlsi = ['x___' + i for i in collated_abc]\n", 375 | "nlsi_to_use = nlsi[0:len(created_calc)]\n", 376 | "\n", 377 | "created_calc['field_name'] = created_calc['field_name'].apply(remove_sp_char_leave_undescore_square_brackets)\n", 378 | "created_calc['aa'] = created_calc.apply(lambda row: first_char_checker(row['field_id']), axis=1)\n", 379 | "created_calc['calc_field'] = created_calc.apply(lambda row: '_st_' + row['aa'] + '_en_', axis=1)\n", 380 | "created_calc['field_calculation_bk'] = created_calc['field_calculation_bk'].str.replace(r'[\\[\\]]', '__', regex=True)\n", 381 | "\n", 382 | "created_calc" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": null, 388 | "metadata": {}, 389 | "outputs": [], 390 | "source": [ 391 | "calc_map_dict = dict(zip(created_calc['aa'].to_list(), nlsi_to_use))\n", 392 | "calc_map_dict" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": null, 398 | "metadata": { 399 | "scrolled": true 400 | }, 401 | "outputs": [], 402 | "source": [ 403 | "created_calc['shorthand_abc'] = created_calc['aa'].map(calc_map_dict)\n", 404 | "created_calc.sort_values(by='shorthand_abc', inplace = True)\n", 405 | "created_calc" 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "metadata": {}, 412 | "outputs": [], 413 | "source": [ 414 | "# function to add suffixes to duplicate values\n", 415 | "def differentiate_duplicates(series):\n", 416 | " counts = series.groupby(series).cumcount() \n", 417 | " return series + counts.astype(str).replace('0', '')\n", 418 | "\n", 419 | "# differentiate field names that have duplicate values (eg. 
calc field Index appears twice in workbook, now it will be Index, Index1)\n", 420 | "created_calc['field_name'] = differentiate_duplicates(created_calc['field_name'])\n", 421 | "\n", 422 | "created_calc" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": null, 428 | "metadata": { 429 | "scrolled": true 430 | }, 431 | "outputs": [], 432 | "source": [ 433 | "calc_map_dict_friendly_names = dict(zip(created_calc['field_name'], created_calc['shorthand_abc'] ))\n", 434 | "calc_map_dict_friendly_names" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": null, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "def create_mermaid_paths(df, field_type):\n", 444 | " \n", 445 | " c = 0\n", 446 | " t_collator = []\n", 447 | "\n", 448 | " for i in df['aa']:\n", 449 | "\n", 450 | " print('\\n______________________' + field_type.upper() + ' TO ANALYSE ________________________: ' + i + '\\n')\n", 451 | "\n", 452 | " try:\n", 453 | " tlist = created_calc[created_calc['field_calculation_bk'].str.contains(i, regex=False) == True]['aa'].to_list()\n", 454 | " except:\n", 455 | " tlist = []\n", 456 | "\n", 457 | " if len(tlist) != 0:\n", 458 | " print('LIST PRINTING:\\n\\n' + str(tlist))\n", 459 | "\n", 460 | " for x in tlist:\n", 461 | " newdict = {}\n", 462 | "\n", 463 | " newdict['count'] = c\n", 464 | " newdict['starting'] = i\n", 465 | " newdict['ending'] = x\n", 466 | "\n", 467 | " newdict['path_mermaid'] = i + \" --> \" + x\n", 468 | "\n", 469 | " print('\\n' + str(c) + ' ******************NEW DICT PRINTING ********************** \\n\\n' + str(newdict))\n", 470 | "\n", 471 | " t_collator.append(newdict)\n", 472 | "\n", 473 | " c = c + 1\n", 474 | " \n", 475 | " return t_collator" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [ 484 | "t_collator_def_fields = create_mermaid_paths(def_fields_final, 'default_field')\n", 485 | "t_collator_def_fields" 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": null, 491 | "metadata": {}, 492 | "outputs": [], 493 | "source": [ 494 | "t_collator_calcs = create_mermaid_paths(created_calc, 'calculation')\n", 495 | "t_collator_calcs" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": null, 501 | "metadata": {}, 502 | "outputs": [], 503 | "source": [ 504 | "###############################\n", 505 | "#replace the full names of fields and calcs for their abbrv letters, to make the mermaid code leaner\n", 506 | "\n", 507 | "for default_field, mapping_letter in mapping_dict.items():\n", 508 | " for i in t_collator_def_fields:\n", 509 | " i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter)\n", 510 | "\n", 511 | "for default_field, mapping_letter in calc_map_dict.items():\n", 512 | " for i in t_collator_def_fields:\n", 513 | " i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter)\n", 514 | "\n", 515 | "t_collator_def_fields\n", 516 | "##############################\n", 517 | "\n", 518 | "##############################\n", 519 | "# replace the full names of fields and calcs for their abbrv letters, to make the mermaid code leaner\n", 520 | "\n", 521 | "for default_field, mapping_letter in mapping_dict.items():\n", 522 | " for i in t_collator_calcs:\n", 523 | " i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter)\n", 524 | "\n", 525 | "for default_field, mapping_letter in calc_map_dict.items():\n", 
526 | " for i in t_collator_calcs:\n", 527 | " i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter)\n", 528 | "\n", 529 | "t_collator_calcs\n", 530 | "##############################" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": {}, 537 | "outputs": [], 538 | "source": [ 539 | "new_list_a = ['']\n", 540 | "fields_list = ['']\n", 541 | "\n", 542 | "new_list_a.extend([i['path_mermaid'] for i in t_collator_calcs])\n", 543 | "new_list_a.extend([i['path_mermaid'] for i in t_collator_def_fields])\n", 544 | "\n", 545 | "################################\n", 546 | "#find the unique nodes within the a --> b mermaid paths in new_list_a (eg. a and b)\n", 547 | "c = []\n", 548 | "\n", 549 | "for i in new_list_a:\n", 550 | " print(i)\n", 551 | " c.append(i.split(' --> ')[0])\n", 552 | "\n", 553 | " try:\n", 554 | " c.append(i.split(' --> ')[1])\n", 555 | " except:\n", 556 | " pass\n", 557 | "\n", 558 | "c.pop(0)\n", 559 | "s = set(c)\n", 560 | "c = list(s)\n", 561 | "##############################\n", 562 | "\n", 563 | "for i, d in mapping_dict_friendly_names.items():\n", 564 | " if d in c:\n", 565 | " if i[0] != '[':\n", 566 | " print(d + \"[\" + i + \"]\")\n", 567 | " fields_list.append(d + \"[\" + i + \"]:::foo\")\n", 568 | " else:\n", 569 | " print(d + i)\n", 570 | " fields_list.append(d + i + ':::foo')\n", 571 | "\n", 572 | "for i, d in calc_map_dict_friendly_names.items():\n", 573 | " if d in c:\n", 574 | " print(d + \"[\" + i + \"]\")\n", 575 | " fields_list.append(d + \"[\" + i + \"]\")\n", 576 | " \n", 577 | "superfinallist = fields_list + new_list_a\n", 578 | "superfinallist" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": { 585 | "scrolled": true 586 | }, 587 | "outputs": [], 588 | "source": [ 589 | "mermaid_diagram_code = \\\n", 590 | "\"\"\"\n", 591 | "flowchart LR\n", 592 | " classDef foo fill:#f9f,stroke:#333,stroke-width:1px{}\n", 593 | "\"\"\".format(\"\\n\\t\".join(superfinallist))\n", 594 | "\n", 595 | "print(mermaid_diagram_code)" 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": null, 601 | "metadata": {}, 602 | "outputs": [], 603 | "source": [ 604 | "### Create html which will display the mermaid diagram\n", 605 | "\n", 606 | "\n", 607 | "html_base = \"\"\"\n", 608 | "\n", 609 | "\n", 610 | "\n", 611 | "\n", 612 | " \n", 613 | " \n", 614 | " \"\"\" + tableau_name_substring + \" Calculation Lineage\" + \"\"\"\n", 615 | " \n", 616 | " \n", 620 | "\n", 621 | "\n", 622 | "
<h1>\"\"\" + tableau_name_substring + \" Calculation Lineage\" + \"\"\"</h1>\n", 623 | "    \n", 624 | "    <pre class=\"mermaid\">\"\"\" + mermaid_diagram_code + \"\"\"</pre>
\n", 625 | "\n", 626 | "\n", 627 | "\"\"\"\n", 628 | "\n", 629 | "print('\\n ______________________________ START_OF_HTML ______________________________')\n", 630 | "print(html_base)\n", 631 | "print('\\n ______________________________ END_OF_HTML ______________________________')\n", 632 | "\n", 633 | "\n", 634 | "### Output html string to a local file, then open it on the web browser (this bit was done with help of chatgpt)\n", 635 | "\n", 636 | "# Specify the file path\n", 637 | "file_path = 'outputs\\mermaid_diagram_{}.html'.format(tableau_name_substring)\n", 638 | "\n", 639 | "# Write the string to an HTML file\n", 640 | "with open(file_path, 'w') as file:\n", 641 | " file.write(html_base)\n", 642 | "\n", 643 | "print(\"HTML content successfully written to {}\".format(file_path))\n", 644 | "\n", 645 | "# Open the HTML file in the default web browser\n", 646 | "webbrowser.open('file://' + os.path.realpath(file_path))\n", 647 | "\n", 648 | "### end of chatgpt code" 649 | ] 650 | } 651 | ], 652 | "metadata": { 653 | "kernelspec": { 654 | "display_name": "Python 3 (ipykernel)", 655 | "language": "python", 656 | "name": "python3" 657 | }, 658 | "language_info": { 659 | "codemirror_mode": { 660 | "name": "ipython", 661 | "version": 3 662 | }, 663 | "file_extension": ".py", 664 | "mimetype": "text/x-python", 665 | "name": "python", 666 | "nbconvert_exporter": "python", 667 | "pygments_lexer": "ipython3", 668 | "version": "3.7.13" 669 | } 670 | }, 671 | "nbformat": 4, 672 | "nbformat_minor": 2 673 | } 674 | -------------------------------------------------------------------------------- /Tableau_calculation_extractor_with_mermaid.py: -------------------------------------------------------------------------------- 1 | # version 3.0 2 | 3 | import pandas as pd, os, re, string, webbrowser 4 | 5 | from tableaudocumentapi import Workbook 6 | from os.path import isfile, join 7 | 8 | import excelgenerator as exg 9 | 10 | pd.set_option('display.max_columns', None) 11 | 12 | 13 | # ## File Handling 14 | # 15 | # - this version of code will only work with twbx files 16 | 17 | input_path = "inputs" 18 | output_path = "outputs" 19 | 20 | mypath = "./{}".format(input_path) #./ points to "this path" as a relative path 21 | 22 | #only gets files and not directories within the inputs folder -https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory 23 | input_files = [f for f in os.listdir(mypath) if isfile(join(mypath, f))] 24 | input_files 25 | 26 | 27 | 28 | def removeSpecialCharFromStr(spstring): 29 | 30 | # """ 31 | # input: string 32 | # output: new string, without any special char 33 | # """ 34 | 35 | return ''.join(e for e in spstring if e.isalnum()) 36 | 37 | 38 | def removeSpecialCharFromStr_leaveSpaces(spstring): 39 | 40 | return ''.join(e for e in spstring if (e.isalnum() or e ==' ')) 41 | 42 | 43 | def remove_sp_char_then_turn_spaces_into_underscore(string_to_convert): 44 | filtered_string = re.sub(r'[^a-zA-Z0-9\s_]', '', string_to_convert).replace(' ', "_") 45 | return filtered_string 46 | 47 | 48 | def remove_sp_char_leave_undescore_square_brackets(string_to_convert): 49 | filtered_string = re.sub(r'[^a-zA-Z0-9\s_\[\]]', '', string_to_convert).replace(' ', "_") 50 | return filtered_string 51 | 52 | def find_twbx_file(inputfile): 53 | 54 | # """ 55 | # input: any input file 56 | # output: returns the file name without any special char for a twxb file if one is found, else returns empty string 57 | # """ 58 | 59 | if inputfile[-5:] == '.twbx': 60 | sp_packagedWorkbook = 
inputfile[:len(inputfile)-5] 61 |  62 |         packagedWorkbook = removeSpecialCharFromStr(sp_packagedWorkbook)+'.twbx' 63 |  64 |         old_file = join(input_path, sp_packagedWorkbook+'.twbx') 65 |         new_file = join(input_path, packagedWorkbook) 66 |         os.rename(old_file, new_file) 67 |  68 |     else: 69 |         packagedWorkbook = "" 70 |  71 |     return packagedWorkbook 72 |  73 |  74 | for i in input_files: 75 |     packagedWorkbook = find_twbx_file(i) 76 |     print('Packaged workbook (no sp char): ' + packagedWorkbook) 77 |  78 | #substring to be used when naming the exported data, NEEDS A PACKAGED WORKBOOK TO EXIST, OTHERWISE IT WILL GIVE AN EMPTY STRING 79 | tableau_name_substring = packagedWorkbook.replace(".twbx","")[:30] 80 | print('\nOutput docs name (word/pdf): ' + tableau_name_substring) 81 |  82 | packagedTableauFile_relPath = input_path+"/"+packagedWorkbook 83 |  84 |  85 | # # Doc API 86 |  87 | # get all fields in workbook 88 | TWBX_Workbook = Workbook(packagedTableauFile_relPath) 89 |  90 | collator = [] 91 | calcID = [] 92 | calcID2 = [] 93 | calcNames = [] 94 |  95 | c = 0 96 |  97 | for datasource in TWBX_Workbook.datasources: 98 |     datasource_name = datasource.name 99 |     datasource_caption = datasource.caption if datasource.caption else datasource_name 100 |  101 |     for count, field in enumerate(datasource.fields.values()): 102 |         dict_temp = { 103 |             'counter': c, 104 |             'datasource_name': datasource_name, 105 |             'datasource_caption': datasource_caption, 106 |             'alias': field.alias, 107 |             'field_calculation': field.calculation, 108 |             'field_calculation_bk': field.calculation, 109 |             'field_caption': field.caption, 110 |             'field_datatype': field.datatype, 111 |             'field_def_agg': field.default_aggregation, 112 |             'field_desc': field.description, 113 |             'field_hidden': field.hidden, 114 |             'field_id': field.id, 115 |             'field_is_nominal': field.is_nominal, 116 |             'field_is_ordinal': field.is_ordinal, 117 |             'field_is_quantitative': field.is_quantitative, 118 |             'field_name': field.name, 119 |             'field_role': field.role, 120 |             'field_type': field.type, 121 |             'field_worksheets': field.worksheets, 122 |             'field_WHOLE': field 123 |         } 124 |  125 |         if field.calculation is not None: 126 |             calcID.append(field.id) 127 |             calcNames.append(field.name) 128 |  129 |             f2 = field.id.replace(']', '').replace('[', '') 130 |             calcID2.append(f2) 131 |  132 |         c += 1 133 |         collator.append(dict_temp) 134 |  135 |  136 |  137 | def default_to_friendly_names2(formulaList,fieldToConvert, dictToUse): 138 |  139 |     for i in formulaList: 140 |         for tableauName, friendlyName in dictToUse.items(): 141 |             try: 142 |                 i[fieldToConvert] = (i[fieldToConvert]).replace(tableauName, friendlyName) 143 |             except: 144 |                 pass  # field_calculation is None for default fields, so .replace() raises 145 |  146 |     return formulaList 147 |  148 |  149 | def category_field_type(row): 150 |     if row['datasource_name'] == 'Parameters': 151 |         val = 'Parameters' 152 |     elif row['field_calculation'] is None: 153 |         val = 'Default_Field' 154 |     else: 155 |         val = 'Calculated_Field' 156 |     return val 157 |  158 | def compare_fields(row): 159 |     if row['field_id'] == row['field_id2']: 160 |         val = 0 161 |     else: 162 |         val = 1 163 |     return val 164 |  165 |  166 | calcDict = dict(zip(calcID, calcNames)) 167 | calcDict2 = dict(zip(calcID2, calcNames)) #raw fields without any [] 168 |  169 | collator = default_to_friendly_names2(collator,'field_calculation',calcDict2) 170 |  171 | df_API_all = pd.DataFrame(collator) 172 | df_API_all['field_type'] = df_API_all.apply(category_field_type, axis=1) 173 |  174 | preference_list=['Parameters', 'Calculated_Field', 'Default_Field'] 175 | df_API_all["field_type"] = 
pd.Categorical(df_API_all["field_type"], categories=preference_list, ordered=True) 176 | 177 | #get rid of duplicates for parameters, so only parameters from the explicit Parameters datasource are kept (as they are also listed again under the name of any other datasources) 178 | df_API_all = df_API_all.sort_values(["field_id","field_type"]).drop_duplicates(["field_id", 'field_calculation']) 179 | 180 | df_API_all['field_id2'] = df_API_all['field_id'].str.replace(r'[\[\]]', '', regex=True) 181 | 182 | df_API_all['comparison'] = df_API_all.apply(compare_fields, axis=1) 183 | df_API_all = df_API_all[df_API_all['comparison'] == 1] 184 | 185 | df_API_all = df_API_all.drop(['field_id2', 'comparison'], axis=1) 186 | df_API_all.sort_values(['datasource_name', 'field_type', 'counter', 'field_name']) 187 | 188 | df1 = df_API_all[[ 'field_name', 'field_datatype','field_type', 'field_calculation', 'field_id', 'datasource_caption']].copy() 189 | 190 | preference_list=[ 'Default_Field', 'Parameters', 'Calculated_Field'] 191 | df1["field_type"] = pd.Categorical(df1["field_type"], categories=preference_list, ordered=True) 192 | df1 = df1.sort_values(['field_type']) 193 | 194 | df1.columns = ['Field_Name', 'DataType', 'Type', 'Calculation', 'Field_ID', 'Datasource'] 195 | 196 | df1['Field_Name'] = df1['Field_Name'].str.replace(r'[\[\]]', '', regex=True) 197 | 198 | 199 | 200 | # ## Generating an excel file from a df (so the excel rows/cols can be formatted), then turning the excel into a pdf 201 | 202 | #modify this part if you want to add more information/dfs to be saved as a separate sheet in excel 203 | 204 | dfs_to_use = [{'excelSheetTitle': 'All fields extracted from DOC API', 'df_to_use':df1, 'mainColWidth':'' , 205 | 'normalColWidth': [10,15,50,20, 25], 'sheetName': 'GeneralDetails', 'footer': 'Data_1 (DOC API)', 'papersize':9, 'color': '#fff0b3'} 206 | 207 | ] 208 | 209 | #papersize: a3 = 8, a4 = 9 210 | 211 | 212 | 213 | path_excel_file_to_create, path_pdf_file_to_create = exg.create_new_file_paths(tableau_name_substring+'_CALCS_only') 214 | 215 | exg.create_excel_from_dfs(dfs_to_use, path_excel_file_to_create) 216 | 217 | exg.create_pdf_from_excel(path_excel_file_to_create, path_pdf_file_to_create, dfs_to_use) 218 | 219 | 220 | # # Start of mermaid module 221 | 222 | # In[23]: 223 | 224 | 225 | def first_char_checker(cell_value): 226 | if cell_value[0] != '[': 227 | cell_value = '__' + cell_value + '__' 228 | else: 229 | cell_value = cell_value.replace('[', '__') 230 | cell_value = cell_value.replace(']', '__') 231 | 232 | return cell_value 233 | 234 | 235 | #define abc list to use during mermaid creation 236 | 237 | abc=list(string.ascii_uppercase) 238 | collated_abc = [] 239 | 240 | for i in abc: 241 | for j in abc: 242 | collated_abc.append(i+j) 243 | 244 | 245 | # In[24]: 246 | 247 | 248 | def_fields = df1[df1['Type'] == 'Default_Field']['Field_ID'].copy().apply(remove_sp_char_leave_undescore_square_brackets) 249 | 250 | abc_touse = collated_abc[0:len(def_fields)] 251 | 252 | def_fields_final = pd.DataFrame(list(zip(def_fields.tolist(), abc_touse))) 253 | def_fields_final['aa'] = def_fields_final.apply(lambda row: first_char_checker(row[0]), axis=1) 254 | def_fields_final['default_field'] = def_fields_final.apply(lambda row: '_st_' + row['aa'] + '_en_', axis=1) 255 | 256 | mapping_dict_friendly_names = dict(zip(def_fields_final[0].tolist(), abc_touse)) 257 | mapping_dict = dict(zip(def_fields_final['aa'].tolist(), abc_touse)) 258 | 259 | 260 | created_calc = 
df_API_all[df_API_all['field_type'] != 'Default_Field'][['field_name', 'field_id', 'field_calculation', 'field_calculation_bk']].copy() 261 | 262 | nlsi = ['x___' + i for i in collated_abc] 263 | nlsi_to_use = nlsi[0:len(created_calc)] 264 | 265 | created_calc['field_name'] = created_calc['field_name'].apply(remove_sp_char_leave_undescore_square_brackets) 266 | created_calc['aa'] = created_calc.apply(lambda row: first_char_checker(row['field_id']), axis=1) 267 | created_calc['calc_field'] = created_calc.apply(lambda row: '_st_' + row['aa'] + '_en_', axis=1) 268 | created_calc['field_calculation_bk'] = created_calc['field_calculation_bk'].str.replace(r'[\[\]]', '__', regex=True) 269 | 270 | calc_map_dict_friendly_names = dict(zip(created_calc['field_name'].to_list(), nlsi_to_use)) 271 | calc_map_dict = dict(zip(created_calc['aa'].to_list(), nlsi_to_use)) 272 | 273 | 274 | def create_mermaid_paths(df, field_type): 275 | 276 | c = 0 277 | t_collator = [] 278 | 279 | for i in df['aa']: 280 | 281 | print('\n______________________' + field_type.upper() + ' TO ANALYSE ________________________: ' + i + '\n') 282 | 283 | try: 284 | tlist = created_calc[created_calc['field_calculation_bk'].str.contains(i, regex=False) == True]['aa'].to_list() 285 | except: 286 | tlist = [] 287 | 288 | if len(tlist) != 0: 289 | print('LIST PRINTING:\n\n' + str(tlist)) 290 | 291 | for x in tlist: 292 | newdict = {} 293 | 294 | newdict['count'] = c 295 | newdict['starting'] = i 296 | newdict['ending'] = x 297 | 298 | newdict['path_mermaid'] = i + " --> " + x 299 | 300 | print('\n' + str(c) + ' ******************NEW DICT PRINTING ********************** \n\n' + str(newdict)) 301 | 302 | t_collator.append(newdict) 303 | 304 | c = c + 1 305 | 306 | return t_collator 307 | 308 | 309 | 310 | t_collator_def_fields = create_mermaid_paths(def_fields_final, 'default_field') 311 | 312 | 313 | t_collator_calcs = create_mermaid_paths(created_calc, 'calculation') 314 | 315 | 316 | 317 | ############################### 318 | #replace the full names of fields and calcs for their abbrv letters, to make the mermaid code leaner 319 | 320 | for default_field, mapping_letter in mapping_dict.items(): 321 | for i in t_collator_def_fields: 322 | i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter) 323 | 324 | for default_field, mapping_letter in calc_map_dict.items(): 325 | for i in t_collator_def_fields: 326 | i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter) 327 | 328 | 329 | ############################## 330 | 331 | ############################## 332 | # replace the full names of fields and calcs for their abbrv letters, to make the mermaid code leaner 333 | 334 | for default_field, mapping_letter in mapping_dict.items(): 335 | for i in t_collator_calcs: 336 | i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter) 337 | 338 | for default_field, mapping_letter in calc_map_dict.items(): 339 | for i in t_collator_calcs: 340 | i['path_mermaid'] = i['path_mermaid'].replace(default_field, mapping_letter) 341 | 342 | 343 | ############################## 344 | 345 | 346 | new_list_a = [''] 347 | fields_list = [''] 348 | 349 | new_list_a.extend([i['path_mermaid'] for i in t_collator_calcs]) 350 | new_list_a.extend([i['path_mermaid'] for i in t_collator_def_fields]) 351 | 352 | ################################ 353 | #find the unique nodes within the a --> b mermaid paths in new_list_a (eg. 
a and b) 354 | c = [] 355 | 356 | for i in new_list_a: 357 | print(i) 358 | c.append(i.split(' --> ')[0]) 359 | 360 | try: 361 | c.append(i.split(' --> ')[1]) 362 | except: 363 | pass 364 | 365 | c.pop(0) 366 | s = set(c) 367 | c = list(s) 368 | ############################## 369 | 370 | for i, d in mapping_dict_friendly_names.items(): 371 | if d in c: 372 | if i[0] != '[': 373 | print(d + "[" + i + "]") 374 | fields_list.append(d + "[" + i + "]:::foo") 375 | else: 376 | print(d + i) 377 | fields_list.append(d + i + ':::foo') 378 | 379 | for i, d in calc_map_dict_friendly_names.items(): 380 | if d in c: 381 | print(d + "[" + i + "]") 382 | fields_list.append(d + "[" + i + "]") 383 | 384 | superfinallist = fields_list + new_list_a 385 | 386 | 387 | 388 | mermaid_diagram_code = """ 389 | flowchart LR 390 | classDef foo fill:#f9f,stroke:#333,stroke-width:1px{} 391 | """.format("\n\t".join(superfinallist)) 392 | 393 | print(mermaid_diagram_code) 394 | 395 | 396 | ### Create html which will display the mermaid diagram 397 | 398 | html_base = """ 399 | 400 | 401 | 402 | 403 | 404 | 405 | """ + tableau_name_substring + " Calculation Lineage" + """ 406 | 407 | 411 | 412 | 413 |
<h1>""" + tableau_name_substring + " Calculation Lineage" + """</h1> 414 |  415 |     <pre class="mermaid">""" + mermaid_diagram_code + """</pre>
416 | 417 | 418 | """ 419 | 420 | print('\n ______________________________ START_OF_HTML ______________________________') 421 | print(html_base) 422 | print('\n ______________________________ END_OF_HTML ______________________________') 423 | 424 | 425 | 426 | ### Output html string to a local file, then open it on the web browser (this bit was done with help of chatgpt) 427 | 428 | # Specify the file path 429 | file_path = 'outputs\mermaid_diagram_{}.html'.format(tableau_name_substring) 430 | 431 | # Write the string to an HTML file 432 | with open(file_path, 'w') as file: 433 | file.write(html_base) 434 | 435 | print("HTML content successfully written to {}".format(file_path)) 436 | 437 | # Open the HTML file in the default web browser 438 | webbrowser.open('file://' + os.path.realpath(file_path)) 439 | 440 | ### end of code block done with help of chatgpt 441 | 442 | -------------------------------------------------------------------------------- /excelgenerator.py: -------------------------------------------------------------------------------- 1 | import os, pathlib 2 | import win32com.client 3 | import pandas as pd 4 | 5 | 6 | def create_new_file_paths(tableau_name_substring): 7 | 8 | cwd = os.getcwd() 9 | path_string = pathlib.Path(cwd).resolve().__str__() + "\{}" 10 | 11 | print(path_string) 12 | 13 | newFileName = 'outputs\{}'.format(tableau_name_substring) 14 | 15 | excel_path = path_string.format(newFileName + ".xlsx") 16 | path_to_pdf = path_string.format(newFileName + ".pdf") 17 | 18 | print(excel_path) 19 | print(path_to_pdf) 20 | 21 | return (excel_path, path_to_pdf) 22 | 23 | 24 | def mainCol(colNumber, color, writer, sheetName): 25 | 26 | workbook = writer.book 27 | worksheet = writer.sheets[sheetName] 28 | 29 | format_mainCol = workbook.add_format({'text_wrap': True, 'bold': True}) 30 | format_mainCol.set_align('vcenter') 31 | format_mainCol.set_bg_color(color) 32 | format_mainCol.set_border(1) 33 | worksheet.set_column(colNumber,colNumber,20,format_mainCol) 34 | return worksheet 35 | 36 | 37 | def normalCol(colNumber, colWidth, writer, sheetName): 38 | 39 | workbook = writer.book 40 | worksheet = writer.sheets[sheetName] 41 | 42 | 43 | format2 = workbook.add_format({'text_wrap': True}) 44 | format2.set_align('vcenter') 45 | format2.set_border(1) 46 | worksheet.set_column(colNumber,colNumber,colWidth,format2) 47 | return worksheet 48 | 49 | 50 | def create_excel_from_dfs(dfs_to_use, excel_path): 51 | 52 | writer = pd.ExcelWriter(excel_path, engine='xlsxwriter') 53 | 54 | # input: any number of dfs 55 | # output: an excel file with one excel sheet per df 56 | 57 | # code to create each sheet in excel, with the specified df and formatting each sheet as per requirements 58 | # also adds a header and footer to each sheet 59 | # all the info to be replaced below (ie. 
for each df) comes from the dfs_to_use list of dictionaries 60 |  61 |     for x in dfs_to_use: 62 |         excelSheetTitle = x['excelSheetTitle'] 63 |         df_to_use = x['df_to_use'] 64 |         normalColWidth = x['normalColWidth'] 65 |         sheetName = x['sheetName'] 66 |         papersize = x['papersize'] 67 |         footer = x['footer'] 68 |         color = x['color'] 69 |  70 |         df_to_use.to_excel(writer, sheet_name=sheetName, index=False) 71 |  72 |         worksheet = mainCol(colNumber = 0, color = color, writer=writer, sheetName=sheetName) 73 |  74 |         ws = 1 75 |         for i in normalColWidth: #iterates through each column 76 |             worksheet = normalCol(ws, i, writer=writer, sheetName=sheetName) 77 |             ws = ws + 1 78 |  79 |         worksheet.set_paper(papersize) # papersize: a3 = 8, a4 = 9 80 |         worksheet.fit_to_pages(1, 0) # fit to 1 page wide, n long 81 |         worksheet.repeat_rows(0) # repeat the first row 82 |  83 |         header_x = '&C&"Arial,Bold"&10{}'.format(excelSheetTitle) 84 |         footer_x = '&L{}&CPage &P of &N'.format(footer) 85 |  86 |         worksheet.set_header(header_x) 87 |         worksheet.set_footer(footer_x) 88 |  89 |     #writer.save() 90 |     writer.close() 91 |  92 |  93 | def create_pdf_from_excel(path_excel, path_pdf, dfs_to_use): 94 |  95 |  96 |     # this creates an index to list each excel sheet, based on the number of sheets that were created before 97 |  98 |     for_ws_index_list = [] 99 |     for i in range(len(dfs_to_use)): 100 |         for_ws_index_list.append(i + 1) 101 |  102 |     excel = win32com.client.Dispatch("Excel.Application") 103 |     excel.Visible = False 104 |  105 |     wb = excel.Workbooks.Open(path_excel) 106 |  107 |     #print all the excel sheets into a single pdf 108 |     ws_index_list = for_ws_index_list 109 |     wb.Worksheets(ws_index_list).Select() 110 |     wb.ActiveSheet.ExportAsFixedFormat(0, path_pdf) 111 |     wb.Close() 112 |     excel.Quit() 113 | -------------------------------------------------------------------------------- /output_examples/TheMoodsofMidgarWillSutton_CALCS_only.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/scinana/tableauCalculationExport/165989e7d9967fe1c3810624ad942f47fe2436e3/output_examples/TheMoodsofMidgarWillSutton_CALCS_only.pdf -------------------------------------------------------------------------------- /output_examples/TheMoodsofMidgarWillSutton_CALCS_only.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/scinana/tableauCalculationExport/165989e7d9967fe1c3810624ad942f47fe2436e3/output_examples/TheMoodsofMidgarWillSutton_CALCS_only.xlsx --------------------------------------------------------------------------------
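
Editor's note on the Mermaid HTML scaffold: the `html_base` string appears twice above (in the notebook and in the .py script), and its markup was partially garbled in this dump. Below is a minimal, self-contained sketch of the kind of page the script builds around `mermaid_diagram_code`. The tag choices (`<h1>`, `<pre class="mermaid">`) and the CDN-based ESM import are assumptions drawn from the standard mermaid.js embedding pattern, not a verbatim copy of the repo's markup; the sample diagram and output file name are likewise hypothetical.

```python
# Hedged sketch: wrap generated Mermaid code in a standalone HTML page.
# Assumes mermaid 10's documented ESM embedding; tag choices are inferred.
mermaid_diagram_code = """
flowchart LR
    classDef foo fill:#f9f,stroke:#333,stroke-width:1px
    AA[Sales]:::foo
    x___AA[Profit Ratio]
    AA --> x___AA
"""
tableau_name_substring = "ExampleWorkbook"  # hypothetical workbook name

html_base = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>""" + tableau_name_substring + " Calculation Lineage" + """</title>
  <script type="module">
    import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
    mermaid.initialize({ startOnLoad: True and true if False else {'startOnLoad': 1} });
  </script>
</head>
<body>
  <h1>""" + tableau_name_substring + " Calculation Lineage" + """</h1>
  <pre class="mermaid">""" + mermaid_diagram_code + """</pre>
</body>
</html>
"""

# Write the page next to the script's other outputs and open it, mirroring
# the webbrowser.open() step used in the repo.
with open('outputs/mermaid_diagram_example.html', 'w') as f:
    f.write(html_base)
```

mermaid.js scans the page for elements with the `mermaid` class on load, so the `<pre class="mermaid">` wrapper is what actually triggers rendering of the flowchart text.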