├── .DS_Store
├── Archive.zip
├── LICENSE
├── Obsidianki 4.0.apkg
├── Obsidianki 4.ankiaddon
├── README.md
├── miscellaneous
│   └── ankiweb.html
└── src
    ├── README.md
    ├── __init__.py
    ├── anki_importer.py
    ├── files.py
    ├── manifest.json
    ├── markdown2
    │   ├── markdown2.py
    │   └── markdown2Mathjax.py
    ├── obsidian_url.py
    ├── processor.py
    └── settings.py

--------------------------------------------------------------------------------
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/.DS_Store

--------------------------------------------------------------------------------
/Archive.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/Archive.zip

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 wxxedu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/Obsidianki 4.0.apkg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/Obsidianki 4.0.apkg

--------------------------------------------------------------------------------
/Obsidianki 4.ankiaddon:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/Obsidianki 4.ankiaddon

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# obsidianki 4

NOTE: The project is now **PAUSED**, and I will not be actively maintaining it until June. I am busy with my exam preparations for now. Sorry about that :-(

> **Please back up your vault regularly while using this add-on!**
>
> I am a noob at programming; while no loss of notes has happened so far, I am afraid that it might. As your notes are valuable, please do remember to back them up.
>
> **Versions**
>
> Theoretically, it now supports Anki 2.1.26+.
> I am unaware whether it supports earlier versions, and I wasn't able to test it on Anki 2.1.28, as my laptop is an M1 MacBook Air and Anki 2.1.28 does not open on it.
>
> **Expectations**
>
> With this add-on, it is expected that you make all your changes in Obsidian (including the deletion, addition, and moving of files). If you want to edit a file, you can just click on the link in Anki to jump back to Obsidian. Instead of deleting files, you should move unused files into the `.trash` folder that you can turn on in the settings of Obsidian. Obsidianki will automatically remove them for you.

This is an [Anki](https://github.com/ankitects) add-on that imports your files from [Obsidian](https://obsidian.md) into Anki while preserving the wiki-links. Each file in Obsidianki is converted to a single note in Anki. It does so by searching through your vault for the file with the specified name and generating an Obsidian URL from the path.

Its GitHub page is [obsidianki4](https://github.com/wxxedu/obsidianki4).

This add-on also works with [hierarchical tags](https://ankiweb.net/shared/info/594329229): the hierarchical tags in Obsidian's metadata section (`tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1/tag2.1.1]`) are converted into the Anki hierarchical tags `tag1::tag1.1::tag1.1.1` and `tag2::tag2.1::tag2.1.1`.

## How to Install

You can install this add-on by downloading the `obsidianki 4.ankiaddon` file from the releases section of GitHub and double-clicking on it.

You can also download it from AnkiWeb: [Obsidianki 4 Addon Page](https://ankiweb.net/shared/info/620260832). The code for this add-on is 620260832.

## How to Use

**Before starting to use it, you will have to install Obsidianki's template, without which Obsidianki will not work.** To do so, go to Anki's add-ons folder, open the folder "Obsidianki 4", and find `Obsidianki 4.apkg`. Double-click on it to install. You can also download it from GitHub.

After you've installed the add-on, open Anki and select `Tools` -> `Obsidianki 4`, as shown in the following picture.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmmwz3peljj30u80ncq62.jpg)

The following menu will pop up, which includes the default preferences panel. **NOTE THAT THE SETTINGS IN THIS PANEL ARE ALL DEFAULT SETTINGS**, and you **SHOULD NOT** change them regularly, as a change will **AFFECT ALL YOUR NOTES**.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmpllk0e9nj30rq0zkn1f.jpg)

Copy the path of your Obsidian vault into the first field. Note that you will have to use **forward slashes** `/` instead of backward ones for Obsidianki to function properly.

After you've set the settings (explained in the next section), you can click on "Save and Convert", and it will complete the conversion. However, you won't notice a difference at first. Why? Because Anki's interface is not refreshed automatically. To refresh it, click on anything in Anki's main interface.

## Default Settings

Now, let's take a look at the default settings.

### Vault Path

This field shows the path to your vault. Note that in order for the wiki-links in Anki to link back to Obsidian, you will have to use a path that is actually a vault. If you just copy the path of a folder inside the vault, the link function will not work.

Another thing to take special note of is that you should use **forward slashes** instead of backward ones.
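As a minimal sketch of why the path must be the vault root: the last path segment is used as the vault name in the generated `obsidian://` link (illustrative only; the add-on's real implementation lives in `src/files.py` and percent-encodes characters slightly differently):

```python
from urllib.parse import quote

def obsidian_url(vault_path: str, relative_path: str) -> str:
    """Build an obsidian://open URL from a vault path and a note path."""
    vault_name = vault_path.rstrip("/").split("/")[-1]  # last segment = vault name
    note = relative_path.lstrip("/").rsplit(".", 1)[0]  # drop the .md extension
    return "obsidian://open?vault=" + quote(vault_name) + "&file=" + quote(note)

print(obsidian_url("C:/Users/me/MyVault", "/folder/My Note.md"))
# obsidian://open?vault=MyVault&file=folder/My%20Note
```

If the path pointed at a sub-folder instead, the `vault=` parameter would name a vault that Obsidian does not know about, and the link would fail.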
### Templates Folder Name

The name of the first-level folder that holds your templates. If specified, the contents of this folder will not be imported into Anki.

### Trash Folder Name

The name of the first-level folder that holds your trash. If specified, the contents of this folder will be **erased** when you run the Obsidianki add-on, and the corresponding cards in Anki will also be deleted.

### Archive Folder Name

The name of the first-level folder that holds your archived files. If specified, the Anki cards corresponding to the contents of this folder will be deleted in Anki, but the files themselves stay in Obsidian and will not be deleted.

### Mode

There are four importing cloze modes in Obsidianki.

#### `word` mode

It generates a card for every cloze. If you have 10 clozes, it generates 10 cards from `{{c1::Card 1}}` to `{{c10::Card 10}}`.

#### `line` mode

It generates a card for every line. If you have 10 clozes in the first line, they will be `{{c1::Card 1}}` to `{{c1::Card 10}}`. If you have 2 more clozes in the second line, they will be `{{c2::Card 11}}` and `{{c2::Card 12}}`.

#### `heading` mode (Recommended)

It generates a card for the content under every heading, with the exception of list cards and QA cards (I will explain this below). If you have a file as below:

```markdown
# Heading 1

Hello **Obsidianki**.

This is the best **Anki** add-on for importing Obsidian files into **Anki**.

## Heading 2

This is something **interesting**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: Obsidianki!

What are the features of Obsidianki?

1. Import files
2. Preserve wiki links
3. Convert to Clozes

## Heading 3

This is **Heading 3**.

```

The "Obsidianki" and "Anki" under "Heading 1" will be turned into `{{c1::Obsidianki}}` and `{{c1::Anki}}` respectively.

Theoretically, everything under "Heading 2" should be turned into `{{c2::...}}` cards, right? Not quite, because I have added QA cards and list cards. So, after conversion, the portion under Heading 2 would become:

```markdown
## Heading 2

This is something **{{c2::interesting}}**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: {{c3::Obsidianki!}}

What are the features of Obsidianki?

1. {{c4::Import files}}
2. {{c5::Preserve wiki links}}
3. {{c6::Convert to Clozes}}
```

#### `document` mode

In `document` mode, everything will be converted to `{{c1::...}}`.
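To make the numbering concrete, here is a rough sketch of how `line` mode could assign cloze indices, assuming for simplicity that only `**bold**` spans become clozes (the add-on's actual converter, in `src/processor.py`, handles many more syntaxes):

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")

def clozes_by_line(text: str) -> str:
    """line mode: every bold span on line n gets the cloze index cn."""
    out = []
    for n, line in enumerate(text.splitlines(), start=1):
        out.append(BOLD.sub(lambda m, n=n: "{{c%d::%s}}" % (n, m.group(1)), line))
    return "\n".join(out)

print(clozes_by_line("**a** and **b**\n**c**"))
# {{c1::a}} and {{c1::b}}
# {{c2::c}}
```

`word` mode would instead increment the index on every match, and `document` mode would hold it fixed at 1.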
### Type

There are two types in Obsidianki 4: `cloze` and `basic`. Note that these two types are different from Anki's `cloze` and `basic`.

#### `cloze`

This type will create visible deletions on the screen. You will be able to see `[...]` on the screen where you applied a cloze.

#### `basic`

This type will only create one card, and the cloze deletion will not be visible.

### Conversions

#### Bold to Cloze:

This converts the bold syntax `**bold**` to a cloze in Anki, while preserving the format.

#### Italics to Cloze:

This converts the italics syntax `*italics*` to a cloze in Anki, while preserving the format.

#### Highlight to Cloze:

This converts the highlight syntax `==highlight==` to a cloze in Anki, while preserving the format.

#### Image to Cloze:

This converts the image syntax `![]()` to a cloze in Anki, while preserving the image.

#### Quote to Cloze:

This converts the quote syntax `> this is a quote` to a cloze in Anki, while preserving the format.

**Be aware that this currently conflicts with the other syntaxes. If you want to leave this option on, you will have to make sure that you apply no other cloze formatting inside the quote.**

#### QA to Cloze:

This converts the QA syntax that I created into a cloze in Anki, so that

```markdown
Q: Question

A: Answer
```

becomes:

```markdown
Q: Question

A: {{c1::Answer}}
```

#### List to Cloze:

This turns any list into clozes, where each list item is a cloze.

#### Inline Code to Cloze:

This converts the inline code syntax to a cloze in Anki, while preserving the format.

#### Block Code to Cloze:

This converts the block code syntax to a cloze in Anki, while preserving the format.

## Individual Settings

You can also individually specify the settings for each note (file) in the metadata section of your file. The metadata section is the following segment at the very beginning of a document.

```markdown
---
uid: 4511487055494033182
---
```

**By the way, Obsidianki will automatically create a metadata section that contains the file's unique id in the file. If you don't want duplicated notes, do not change the uid.**

If you want to change the individual importing settings for a file, add them to the metadata section. You can make this a template in Obsidian:

```
---
mode: heading
type: cloze
bold: True
italics: True
highlight: False
image: True
quote: False
QA: True
list: True
inline code: True
block code: False
---
```
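For illustration, a metadata block like the one above can be read into a dict of per-file settings with just a few lines (a minimal sketch, not the add-on's actual parser):

```python
def read_front_matter(text: str) -> dict:
    """Parse a leading ----fenced block of 'key: value' lines."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of the metadata block
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

print(read_front_matter("---\nmode: heading\nbold: True\n---\nBody text"))
# {'mode': 'heading', 'bold': 'True'}
```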
## Special Note

### About the Development

I will try my best to develop and maintain this add-on. However, as of right now, I am just a high school student who barely knows any programming. All my knowledge of programming comes from my AP Computer Science A class LOL.

I know that my code is pretty bad, so feel free to help me improve it (please do, so that I can learn from you!). I will probably add more comments to my code explaining my thoughts while writing it in the future, just in case you want to know what I did in the code. (I want to do this because I struggled to understand Anki's source code and other add-ons.) While this will not be an add-on writing tutorial and I am by no means good at Python, it is my best hope that sharing my thoughts as a beginner will help other beginners better understand how to write Anki add-ons. This will take some time for me to do, as I need to get back to work and studying, but I am going to spend some time doing so.

### Thanks

I want to thank the creators of Anki and Obsidian for building such beautiful apps. I also want to thank my friend [Anis](https://github.com/qiaozhanrong) for helping me with the code.

--------------------------------------------------------------------------------
/miscellaneous/ankiweb.html:
--------------------------------------------------------------------------------
Note: please back up your Obsidian vault regularly while using this add-on. Since it writes certain information to your vault, I am concerned that there is a very slight chance it could erase your files.

This is an Anki add-on that imports your files from Obsidian into Anki while preserving the wiki-links. Each file in Obsidianki is converted to a single note in Anki. It does so by searching through your vault for the file with the specified name and generating an Obsidian URL from the path. Note that this only works with Anki 2.1.38+.

https://github.com/wxxedu/obsidianki4

This add-on also works with hierarchical tags: the hierarchical tags in Obsidian's metadata section (tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1/tag2.1.1]) are converted into the Anki hierarchical tags tag1::tag1.1::tag1.1.1 and tag2::tag2.1::tag2.1.1.

## How to Install

You can install this add-on by downloading the obsidianki 4.ankiaddon file from the releases section of GitHub and double-clicking on it.

You can also use the install code below.

## How to Use

Before starting to use it, you will have to install Obsidianki's template, without which Obsidianki will not work. You can download it from Obsidianki's GitHub page.

After you've installed the add-on, open Anki and select Tools -> Obsidianki 4, as shown in the following picture.

The following menu will pop up, which includes the default preferences panel. NOTE THAT THE SETTINGS IN THIS PANEL ARE ALL DEFAULT SETTINGS, and you SHOULD NOT change them regularly, as a change will AFFECT ALL YOUR NOTES.

Copy the path of your Obsidian vault into the first field. Note that you will have to use forward slashes / instead of backward ones for Obsidianki to function properly.

After you've set the settings (explained in the next section), you can click on "Save and Convert", and it will complete the conversion. However, you won't notice a difference at first. Why? Because Anki's interface is not refreshed automatically. To refresh it, click on anything in Anki's main interface.

## Default Settings

Now, let's take a look at the default settings.

### Vault Path

This field shows the path to your vault. Note that in order for the wiki-links in Anki to link back to Obsidian, you will have to use a path that is actually a vault. If you just copy the path of a folder inside the vault, the link function will not work.

Another thing to take special note of is that you should use forward slashes instead of backward ones.

### Mode

There are four importing cloze modes in Obsidianki.

#### word mode

It generates a card for every cloze. If you have 10 clozes, it generates 10 cards from {{c1::Card 1}} to {{c10::Card 10}}.

#### line mode

It generates a card for every line. If you have 10 clozes in the first line, they will be {{c1::Card 1}} to {{c1::Card 10}}.
If you have 2 more clozes in the second line, they will be {{c2::Card 11}} and {{c2::Card 12}}.

#### heading mode (Recommended)

It generates a card for the content under every heading, with the exception of list cards and QA cards (I will explain this below). If you have a file as below:

The "Obsidianki" and "Anki" under "Heading 1" will be turned into {{c1::Obsidianki}} and {{c1::Anki}} respectively.

Theoretically, everything under "Heading 2" should be turned into {{c2::...}} cards, right? Not quite, because I have added QA cards and list cards. So, after conversion, the portion under Heading 2 would become:

#### document mode

In document mode, everything will be converted to {{c1::...}}.

### Type

There are two types in Obsidianki 4: cloze and basic. Note that these two types are different from Anki's cloze and basic.

#### cloze

This type will create visible deletions on the screen. You will be able to see [...] on the screen where you applied a cloze.

#### basic

This type will only create one card, and the cloze deletion will not be visible.

### Conversions

#### Bold to Cloze:

This converts the bold syntax **bold** to a cloze in Anki, while preserving the format.

#### Italics to Cloze:

This converts the italics syntax *italics* to a cloze in Anki, while preserving the format.

#### Highlight to Cloze:

This converts the highlight syntax ==highlight== to a cloze in Anki, while preserving the format.

#### Image to Cloze:

This converts the image syntax ![]() to a cloze in Anki, while preserving the image.

#### Quote to Cloze:

This converts the quote syntax > this is a quote to a cloze in Anki, while preserving the format.

Be aware that this currently conflicts with the other syntaxes. If you want to leave this option on, you will have to make sure that you apply no other cloze formatting inside the quote.

#### QA to Cloze:

This converts the QA syntax that I created into a cloze in Anki.

#### List to Cloze:

This turns any list into clozes, where each list item is a cloze.

#### Inline Code to Cloze:

This converts the inline code syntax to a cloze in Anki, while preserving the format.

#### Block Code to Cloze:

This converts the block code syntax to a cloze in Anki, while preserving the format.

## Individual Settings

You can also individually specify the settings for each note (file) in the metadata section of your file. The metadata section is the segment at the very beginning of a document.

By the way, Obsidianki will automatically create a metadata section that contains the file's unique id in the file. If you don't want duplicated notes, do not change the uid.

If you want to change the individual importing settings for a file, add them to the metadata section. You can make this a template in Obsidian:

## Special Note

### About the Development

I will try my best to develop and maintain this add-on. However, as of right now, I am just a high school student who barely knows any programming. All my knowledge of programming comes from my AP Computer Science A class LOL.
I know that my code is pretty bad, so feel free to help me improve it (please do, so that I can learn from you!).

### Thanks

I want to thank the creators of Anki and Obsidian for building such beautiful apps. I also want to thank my friend Anis for helping me with the code.

--------------------------------------------------------------------------------
/src/README.md:
--------------------------------------------------------------------------------
# obsidianki 4

> **Please back up your vault regularly while using this add-on!**
>
> Theoretically, it now supports Anki 2.1.26+. I am unaware whether it supports earlier versions, and I wasn't able to test it on Anki 2.1.28, as my laptop is an M1 MacBook Air and Anki 2.1.28 does not open on it.
>
> This add-on also works with [hierarchical tags](https://ankiweb.net/shared/info/594329229): the hierarchical tags in Obsidian's metadata section (`tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1/tag2.1.1]`) are converted into the Anki hierarchical tags `tag1::tag1.1::tag1.1.1` and `tag2::tag2.1::tag2.1.1`.

This is an [Anki](https://github.com/ankitects) add-on that imports your files from [Obsidian](https://obsidian.md) into Anki while preserving the wiki-links. Each file in Obsidianki is converted to a single note in Anki. It does so by searching through your vault for the file with the specified name and generating an Obsidian URL from the path.

Its GitHub page is [obsidianki4](https://github.com/wxxedu/obsidianki4).

## How to Install

You can install this add-on by downloading the `obsidianki 4.ankiaddon` file from the releases section of GitHub and double-clicking on it.

You can also download it from AnkiWeb: [Obsidianki 4 Addon Page](https://ankiweb.net/shared/info/620260832). The code for this add-on is 620260832.

## How to Use

**Before starting to use it, you will have to install Obsidianki's template, without which Obsidianki will not work.** To do so, go to Anki's add-ons folder, open the folder "Obsidianki 4", and find `Obsidianki 4.apkg`. Double-click on it to install. You can also download it from GitHub.

After you've installed the add-on, open Anki and select `Tools` -> `Obsidianki 4`, as shown in the following picture.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmmwz3peljj30u80ncq62.jpg)

The following menu will pop up, which includes the default preferences panel. **NOTE THAT THE SETTINGS IN THIS PANEL ARE ALL DEFAULT SETTINGS**, and you **SHOULD NOT** change them regularly, as a change will **AFFECT ALL YOUR NOTES**.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmpllk0e9nj30rq0zkn1f.jpg)

Copy the path of your Obsidian vault into the first field. Note that you will have to use **forward slashes** `/` instead of backward ones for Obsidianki to function properly.

After you've set the settings (explained in the next section), you can click on "Save and Convert", and it will complete the conversion. However, you won't notice a difference at first. Why? Because Anki's interface is not refreshed automatically. To refresh it, click on anything in Anki's main interface.

## Default Settings

Now, let's take a look at the default settings.

### Vault Path

This field shows the path to your vault. Note that in order for the wiki-links in Anki to link back to Obsidian, you will have to use a path that is actually a vault. If you just copy the path of a folder inside the vault, the link function will not work.

Another thing to take special note of is that you should use **forward slashes** instead of backward ones.
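If you copied the path from Windows Explorer, it will contain backslashes; a quick normalization like the following generic Python one-liner (not part of the add-on) gives the form Obsidianki expects:

```python
vault_path = r"C:\Users\me\MyVault".replace("\\", "/")
print(vault_path)  # C:/Users/me/MyVault
```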
### Templates Folder Name

The name of the first-level folder that holds your templates. If specified, the contents of this folder will not be imported into Anki.

### Trash Folder Name

The name of the first-level folder that holds your trash. If specified, the contents of this folder will be **erased** when you run the Obsidianki add-on, and the corresponding cards in Anki will also be deleted.

### Archive Folder Name

The name of the first-level folder that holds your archived files. If specified, the Anki cards corresponding to the contents of this folder will be deleted in Anki, but the files themselves stay in Obsidian and will not be deleted.

### Mode

There are four importing cloze modes in Obsidianki.

#### `word` mode

It generates a card for every cloze. If you have 10 clozes, it generates 10 cards from `{{c1::Card 1}}` to `{{c10::Card 10}}`.

#### `line` mode

It generates a card for every line. If you have 10 clozes in the first line, they will be `{{c1::Card 1}}` to `{{c1::Card 10}}`. If you have 2 more clozes in the second line, they will be `{{c2::Card 11}}` and `{{c2::Card 12}}`.

#### `heading` mode (Recommended)

It generates a card for the content under every heading, with the exception of list cards and QA cards (I will explain this below). If you have a file as below:

```markdown
# Heading 1

Hello **Obsidianki**.

This is the best **Anki** add-on for importing Obsidian files into **Anki**.

## Heading 2

This is something **interesting**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: Obsidianki!

What are the features of Obsidianki?

1. Import files
2. Preserve wiki links
3. Convert to Clozes

## Heading 3

This is **Heading 3**.

```

The "Obsidianki" and "Anki" under "Heading 1" will be turned into `{{c1::Obsidianki}}` and `{{c1::Anki}}` respectively.

Theoretically, everything under "Heading 2" should be turned into `{{c2::...}}` cards, right? Not quite, because I have added QA cards and list cards. So, after conversion, the portion under Heading 2 would become:

```markdown
## Heading 2

This is something **{{c2::interesting}}**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: {{c3::Obsidianki!}}

What are the features of Obsidianki?

1. {{c4::Import files}}
2. {{c5::Preserve wiki links}}
3. {{c6::Convert to Clozes}}
```

#### `document` mode

In `document` mode, everything will be converted to `{{c1::...}}`.

### Type

There are two types in Obsidianki 4: `cloze` and `basic`. Note that these two types are different from Anki's `cloze` and `basic`.

#### `cloze`

This type will create visible deletions on the screen. You will be able to see `[...]` on the screen where you applied a cloze.

#### `basic`

This type will only create one card, and the cloze deletion will not be visible.

### Conversions

#### Bold to Cloze:

This converts the bold syntax `**bold**` to a cloze in Anki, while preserving the format.

#### Italics to Cloze:

This converts the italics syntax `*italics*` to a cloze in Anki, while preserving the format.

#### Highlight to Cloze:

This converts the highlight syntax `==highlight==` to a cloze in Anki, while preserving the format.

#### Image to Cloze:

This converts the image syntax `![]()` to a cloze in Anki, while preserving the image.

#### Quote to Cloze:

This converts the quote syntax `> this is a quote` to a cloze in Anki, while preserving the format.

**Be aware that this currently conflicts with the other syntaxes. If you want to leave this option on, you will have to make sure that you apply no other cloze formatting inside the quote.**

#### QA to Cloze:

This converts the QA syntax that I created into a cloze in Anki, so that

```markdown
Q: Question

A: Answer
```

becomes:

```markdown
Q: Question

A: {{c1::Answer}}
```
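To sketch how such a QA pair can be rewritten mechanically (illustrative only; the add-on's actual converter is more involved and advances the cloze counter according to the selected mode):

```python
import re

QA = re.compile(r"^(Q:.*\n\n?)A: (.+)$", re.M)

def qa_to_cloze(text: str, index: int = 1) -> str:
    """Wrap the answer of a 'Q: ... / A: ...' pair in a cloze deletion."""
    return QA.sub(lambda m: m.group(1) + "A: {{c%d::%s}}" % (index, m.group(2)), text)

print(qa_to_cloze("Q: Question\n\nA: Answer"))
# Q: Question
#
# A: {{c1::Answer}}
```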
#### List to Cloze:

This turns any list into clozes, where each list item is a cloze.

#### Inline Code to Cloze:

This converts the inline code syntax to a cloze in Anki, while preserving the format.

#### Block Code to Cloze:

This converts the block code syntax to a cloze in Anki, while preserving the format.

## Individual Settings

You can also individually specify the settings for each note (file) in the metadata section of your file. The metadata section is the following segment at the very beginning of a document.

```markdown
---
uid: 4511487055494033182
---
```

**By the way, Obsidianki will automatically create a metadata section that contains the file's unique id in the file. If you don't want duplicated notes, do not change the uid.**

If you want to change the individual importing settings for a file, add them to the metadata section. You can make this a template in Obsidian:

```
---
mode: heading
type: cloze
bold: True
italics: True
highlight: False
image: True
quote: False
QA: True
list: True
inline code: True
block code: False
---
```

## Special Note

### About the Development

I will try my best to develop and maintain this add-on. However, as of right now, I am just a high school student who barely knows any programming. All my knowledge of programming comes from my AP Computer Science A class LOL.

I know that my code is pretty bad, so feel free to help me improve it (please do, so that I can learn from you!). I will probably add more comments to my code explaining my thoughts while writing it in the future, just in case you want to know what I did in the code. (I want to do this because I struggled to understand Anki's source code and other add-ons.) While this will not be an add-on writing tutorial and I am by no means good at Python, it is my best hope that sharing my thoughts as a beginner will help other beginners better understand how to write Anki add-ons. This will take some time for me to do, as I need to get back to work and studying, but I am going to spend some time doing so.
### Thanks

I want to thank the creators of Anki and Obsidian for building such beautiful apps. I also want to thank my friend [Anis](https://github.com/qiaozhanrong) for helping me with the code.

--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import os
from . import files
from . import settings
from . import obsidian_url
from . import anki_importer
import aqt
from aqt import mw
from aqt import AnkiQt, gui_hooks
from aqt.qt import *
from aqt.utils import showInfo
from aqt.utils import tooltip
from PyQt5 import QtWidgets, QtCore


def read_files(root_path, relative_path):
    files_catalog = []
    if relative_path == "":
        paths = os.listdir(root_path)
    else:
        paths = os.listdir(root_path + "/" + relative_path)
    for path in paths:
        folder_is_ignored = False

        ignore_folder_s = settings.get_settings_by_name("ignore folder")

        if ignore_folder_s == "":
            ignore_folders = []  # nothing to ignore
        elif ignore_folder_s.find("\n") != -1:
            ignore_folders = ignore_folder_s.split("\n")
        else:
            ignore_folders = [ignore_folder_s]

        for ignore_folder in ignore_folders:
            ignore_folder = ignore_folder.lstrip(" ")
            ignore_folder = ignore_folder.rstrip(" ")
            ignore_folder = "/" + ignore_folder
            if relative_path.startswith(ignore_folder) and ignore_folder != "/":
                folder_is_ignored = True

        if path.find(".") != -1 and path.split(".")[-1] != "md" and path != ".trash":
            pass
        elif folder_is_ignored:
            pass
        elif path.endswith(".md"):
            new_path = relative_path + "/" + path
            new_file = files.File(root_path, new_path)
            files_catalog.append(new_file)
        else:
            try:
                new_path = relative_path + "/" + path
                files_catalog = files_catalog + read_files(root_path, new_path)
            except NotADirectoryError:
                pass
    return files_catalog

def get_bool(status_text):
    return status_text == "True" or status_text == "true"

def get_text(status_bool):
    if status_bool:
        return "True"
    else:
        return "False"

class ObsidiankiSettings(QDialog):
    def __init__(self, mw):
        super().__init__(mw)

        layout = QFormLayout(self)

        self.vault_path = QPlainTextEdit(self)
        self.templates_folder = QPlainTextEdit(self)
        self.archive_folder = QPlainTextEdit(self)

        self.mode = QLineEdit(self)
        self.type = QLineEdit(self)

        self.bold = QCheckBox(self)
        self.highlight = QCheckBox(self)
        self.italics = QCheckBox(self)
        self.image = QCheckBox(self)
        self.quote = QCheckBox(self)
        self.QuestionOrAnswer = QCheckBox(self)
        self.list = QCheckBox(self)
        self.inline_code = QCheckBox(self)
        self.block_code = QCheckBox(self)
        self.convert_button = QPushButton("Save and Convert")
        self.save_button = QPushButton("Save and Close")

        layout.addRow(QLabel("Vault Path: "))
        layout.addRow(QLabel("(Please use forward slashes for your vault path)"))
        layout.addRow(self.vault_path)

        layout.addRow(QLabel("Ignore Folders: "))
        layout.addRow(QLabel("(Notes in Anki in this obsidian folder will be ignored)"))
        layout.addRow(self.templates_folder)

        layout.addRow(QLabel("Archive Folder Name: "))
        layout.addRow(self.archive_folder)
        layout.addRow(QLabel("Anki Cards in this folder will be deleted"))

        layout.addRow(QLabel("Mode: "), self.mode)
        layout.addRow(QLabel("Mode: choose from word/line/heading/document"))
        layout.addRow(QLabel("Type: "), self.type)
        layout.addRow(QLabel("Type: choose from cloze/basic"))

        layout.addRow(QLabel("Bold to Cloze: "), self.bold)
        layout.addRow(QLabel("Italics to Cloze: "), self.italics)
        layout.addRow(QLabel("Highlight to Cloze: "), self.highlight)
        layout.addRow(QLabel("Image to Cloze: "), self.image)
        layout.addRow(QLabel("Quote to Cloze: "), self.quote)
        layout.addRow(QLabel("QA to Cloze"), self.QuestionOrAnswer)
        layout.addRow(QLabel("List to Cloze"), self.list)
        layout.addRow(QLabel("Inline Code to Cloze"), self.inline_code)
        layout.addRow(QLabel("Block Code to Cloze"), self.block_code)

        layout.addRow(self.save_button, self.convert_button)

        self.vault_path.setPlainText(settings.get_settings_by_name("vault path"))
        self.templates_folder.setPlainText(settings.get_settings_by_name("ignore folder"))
        self.archive_folder.setPlainText(settings.get_settings_by_name("archive folder"))

        self.mode.setText(settings.get_settings_by_name("mode"))
        self.type.setText(settings.get_settings_by_name("type"))

        self.bold.setChecked(get_bool(settings.get_settings_by_name("bold")))
        self.italics.setChecked(get_bool(settings.get_settings_by_name("italics")))
        self.highlight.setChecked(get_bool(settings.get_settings_by_name("highlight")))
        self.image.setChecked(get_bool(settings.get_settings_by_name("image")))
        self.quote.setChecked(get_bool(settings.get_settings_by_name("quote")))
        self.QuestionOrAnswer.setChecked(get_bool(settings.get_settings_by_name("QA")))
        self.list.setChecked(get_bool(settings.get_settings_by_name("list")))
        self.inline_code.setChecked(get_bool(settings.get_settings_by_name("inline code")))
        self.block_code.setChecked(get_bool(settings.get_settings_by_name("block code")))

        self.convert_button.setDefault(True)
        self.convert_button.clicked.connect(self.onOk)
        self.save_button.clicked.connect(self.onSave)
        self.show()

    def onOk(self):
        newSettings = {}
        newSettings["vault path"] = self.vault_path.toPlainText()
        newSettings["ignore folder"] = self.templates_folder.toPlainText()
        # newSettings["trash folder"] = self.trash_folder.text()
        newSettings["archive folder"] = self.archive_folder.toPlainText()
        newSettings["mode"] = self.mode.text()
        newSettings["type"] = self.type.text()
        newSettings["bold"] = get_text(self.bold.isChecked())
        newSettings["highlight"] = get_text(self.highlight.isChecked())
        newSettings["italics"] = get_text(self.italics.isChecked())
        newSettings["image"] = get_text(self.image.isChecked())
        newSettings["quote"] = get_text(self.quote.isChecked())
        newSettings["QA"] = get_text(self.QuestionOrAnswer.isChecked())
        newSettings["list"] = get_text(self.list.isChecked())
        newSettings["inline code"] = get_text(self.inline_code.isChecked())
        newSettings["block code"] = get_text(self.block_code.isChecked())
        settings.save_settings(newSettings)
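        # The banner-delimited block below is the actual conversion pipeline:
        # it reads every vault listed in the settings, rewrites the wiki-links
        # in each file to obsidian:// URLs, and hands the result to the
        # importer, which creates or updates the corresponding Anki notes.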
        ###############################################################################################################################
        ###############################################################################################################################
        ###############################################################################################################################

        if self.vault_path.toPlainText().find("\n") != -1:
            vault_paths = self.vault_path.toPlainText().split("\n")
        else:
            vault_paths = [self.vault_path.toPlainText()]

        my_files_catalog = []

        for a_vault_path in vault_paths:
            if a_vault_path != "":
                my_files_catalog = my_files_catalog + read_files(a_vault_path, "")

        length_of_files = len(my_files_catalog)
        for i in range(0, length_of_files):
            my_files_catalog[i].set_file_content(obsidian_url.process_obsidian_file(my_files_catalog[i].file_content, my_files_catalog))

        anki_importer.importer(my_files_catalog)

        ###############################################################################################################################
        ###############################################################################################################################
        ###############################################################################################################################

        mw.update()
        mw.reset(True)

        self.close()

    def onSave(self):
        newSettings = {}
        newSettings["vault path"] = self.vault_path.toPlainText()
        newSettings["ignore folder"] = self.templates_folder.toPlainText()
        newSettings["archive folder"] = self.archive_folder.toPlainText()

        newSettings["mode"] = self.mode.text()
        newSettings["type"] = self.type.text()
        newSettings["bold"] = get_text(self.bold.isChecked())
        newSettings["highlight"] = get_text(self.highlight.isChecked())
        newSettings["italics"] = get_text(self.italics.isChecked())
        newSettings["image"] = get_text(self.image.isChecked())
        newSettings["quote"] = get_text(self.quote.isChecked())
        newSettings["QA"] = get_text(self.QuestionOrAnswer.isChecked())
        newSettings["list"] = get_text(self.list.isChecked())
        newSettings["inline code"] = get_text(self.inline_code.isChecked())
        newSettings["block code"] = get_text(self.block_code.isChecked())
        settings.save_settings(newSettings)
        self.close()

action = QAction("Obsidianki 4", aqt.mw)
action.triggered.connect(lambda: ObsidiankiSettings(aqt.mw))

aqt.mw.form.menuTools.addAction(action)

--------------------------------------------------------------------------------
/src/anki_importer.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import os
import shutil
from .
import settings
from aqt import mw
from anki.cards import Card
from anki.notes import Note
from anki.collection import Collection
from aqt.utils import showInfo

def importer(my_files_catalog):
    for file in my_files_catalog:
        importer_to_anki(file)
    empty_trash()
    delete_empty_decks()

def importer_to_anki(file):

    archive_folder_input = settings.get_settings_by_name("archive folder")
    if archive_folder_input == "":
        archive_folders = []  # no archive folder configured
    elif archive_folder_input.find("\n") != -1:
        archive_folders = archive_folder_input.split("\n")
    else:
        archive_folders = [archive_folder_input]

    is_in_archive_folder = False

    for archive_folder in archive_folders:
        archive_folder = archive_folder.lstrip(" ")
        archive_folder = archive_folder.rstrip(" ")
        archive_folder = "/" + archive_folder
        if file.get_file_relative_path().startswith(archive_folder) and archive_folder != "" and archive_folder != "\n":
            is_in_archive_folder = True

    if file.get_file_relative_path().startswith("/.trash"):
        uid = file.get_file_uid()
        note_list = mw.col.find_notes(uid)
        if len(note_list) > 0:
            for single_note_id in note_list:
                single_note = mw.col.getNote(single_note_id)
                try:
                    if single_note["UID"] == uid:
                        mw.col.remNotes([single_note_id])
                except KeyError:
                    pass
    elif is_in_archive_folder:  # or file.get_file_root_folder() == settings.get_settings_by_name("ignore folder")
        uid = file.get_file_uid()
        note_list = mw.col.find_notes(uid)
        if len(note_list) > 0:
            for single_note_id in note_list:
                single_note = mw.col.getNote(single_note_id)
                if single_note["UID"] == uid:
                    mw.col.remNotes([single_note_id])
    else:
        deck_id = mw.col.decks.id(file.get_deck_name())
        mw.col.decks.select(deck_id)
        card_model = mw.col.models.byName("Obsidianki4")
        uid = file.get_file_uid()
        note_list = mw.col.find_notes(uid)
        found_existing_file = False
        if len(note_list) > 0:
            for single_note_id in note_list:
                single_note = mw.col.getNote(single_note_id)
                if single_note.model() == card_model:
                    if single_note["UID"] == uid:
                        if file.get_cloze_or_basic():
                            single_note["Cloze"] = file.get_file_content()
                            single_note["Text"] = ""
                        else:
                            single_note["Cloze"] = "{{c1::}}"
                            single_note["Text"] = file.get_file_content()

                        back_extra = "Source: " + file.get_file_name_with_url()
                        single_note["Back Extra"] = back_extra

                        single_note.tags = []
                        for tag in file.get_tags():
                            single_note.tags.append(tag)
                        try:
                            card_ids = mw.col.card_ids_of_note(single_note_id)
                            mw.col.set_deck(card_ids, deck_id)
                        except AttributeError:
                            card_ids = mw.col.find_cards(uid)
                            mw.col.decks.setDeck(card_ids, deck_id)
                        single_note.flush()
                        found_existing_file = True
        if not found_existing_file:
            try:
                deck = mw.col.decks.get(deck_id)
                deck["mid"] = card_model["id"]
                mw.col.decks.save(deck)
                note_object = mw.col.newNote(deck_id)
                if file.get_cloze_or_basic():
                    note_object["Cloze"] = file.get_file_content()
                    note_object["Text"] = ""
                else:
                    note_object["Cloze"] = "{{c1::}}"
                    note_object["Text"] = file.get_file_content()
                note_object["UID"] = uid
                back_extra = "Source: " + file.get_file_name_with_url()
                note_object["Back Extra"] = back_extra
                for tag in file.get_tags():
                    note_object.tags.append(tag)

                mw.col.add_note(note_object, deck_id)
            except TypeError:
                pass
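# Note on the flow above: Obsidianki keys every note on the "UID" field that
# it writes into each file's front matter. importer_to_anki() therefore
# (1) deletes the matching note when the file sits in .trash or an archive
# folder, (2) updates an existing note with the same UID in place, and
# (3) creates a fresh "Obsidianki4" note only when no note with that UID exists.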
def delete_empty_decks():
    names_and_ids = mw.col.decks.all_names_and_ids()
    for name_and_id in names_and_ids:
        # I could not find what type this object is, so the only way for me to do it now is to use the string.
        name_and_id_segments = str(name_and_id).split("\n")
        deck_id = int(name_and_id_segments[0].split(": ")[1])

        if deck_is_empty(deck_id):
            mw.col.decks.rem(deck_id, True, True)

def empty_trash():
    path_s = settings.get_settings_by_name("vault path")

    if path_s == "":
        paths = []
    elif path_s.find("\n") != -1:
        paths = path_s.split("\n")
    else:
        paths = [path_s]

    for path in paths:
        path = path.lstrip(" ")
        path = path.rstrip(" ")
        # TODO: Add this to settings
        if path != "":
            trash_can_path = path + "/" + ".trash"
            try:
                trash_directories = os.listdir(trash_can_path)
                for trash_directory in trash_directories:
                    trash_directory_path = trash_can_path + "/" + trash_directory
                    try:
                        shutil.rmtree(trash_directory_path)
                    except NotADirectoryError:
                        os.remove(trash_directory_path)
            except (FileNotFoundError, NotADirectoryError):
                pass

def deck_is_empty(deck_id):
    if deck_id != 1:
        try:
            if mw.col.decks.card_count(deck_id, True) == 0:
                return True
        except AttributeError:
            cids = mw.col.decks.cids(deck_id, True)
            if len(cids) == 0:
                return True
    return False

--------------------------------------------------------------------------------
/src/files.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
from aqt import mw
from aqt.utils import showInfo
from . import processor

def gen_obsidian_url(vault_name, file_url):
    vault_url = "obsidian://open?vault=" + my_encode(vault_name)
    file_url = "&file=" + my_encode(file_url)
    return vault_url + file_url


def my_encode(string: str):
    # Percent-encode the UTF-8 bytes of the string for use in an obsidian:// URL.
    string = str(string.encode("utf-8"))
    string = string.replace("\\x", "%")
    string = string.replace(" ", "%20")
    string = string.replace("/", "%2F")
    string = string.lstrip("\'b")
    string = string.rstrip("\'")
    string = capitalize_unicode(string)
    return string


def capitalize_unicode(string):
    # Upper-case the two hex digits that follow every "%" escape.
    new = []
    position = -5
    for index in range(0, len(string)):
        if string[index] == "%":
            position = index
            new.append(string[index])
        elif index == position + 1 or index == position + 2:
            new.append(string[index].capitalize())
        else:
            new.append(string[index])
    return "".join(new)


class File:
    # important parameters
    file_name = ""
    vault_path = ""
    relative_path = ""
    full_path = ""
    uid = ""
    file_content = ""
    cloze_or_basic = True
    obsidian_url = ""
    metadata = {}

    def __init__(self, vault_path, relative_path):
        self.vault_path = vault_path
        self.relative_path = relative_path
        self.full_path = self.vault_path + "/" + self.relative_path
        tmp = processor.read_file(self.full_path)
        self.uid = tmp[0]
        self.file_content = tmp[1]
        self.cloze_or_basic = tmp[2]
        self.metadata = tmp[3]
        self.obsidian_url = self.generate_obsidian_url()
        self.file_name = self.generate_file_name()

    def get_deck_name(self):
        root_name = self.vault_path.split("/")[-1]
        sublevel_name_segments = self.relative_path.split("/")[:-1]
        sublevel_name = "::".join(sublevel_name_segments)
        deck_name = root_name + sublevel_name
        return deck_name

    def get_file_root_folder(self):
        tmp = self.relative_path.lstrip("/")
        root_folder = tmp.split("/")[0]
        return root_folder

    def get_file_full_path(self):
        return self.full_path

    def get_file_relative_path(self):
        return self.relative_path

    def generate_obsidian_url(self):
        vault_name = self.vault_path.split("/")[-1]
        file_url_segments = self.relative_path.split(".")[:-1]
        file_url = ".".join(file_url_segments)
        return gen_obsidian_url(vault_name, file_url)

    def get_obsidian_url(self):
        return self.obsidian_url

    def generate_file_name(self):
        file_name = self.relative_path.split("/")[-1]
        file_name_segments = file_name.split(".")[:-1]
        file_name = ".".join(file_name_segments)
        return file_name

    def get_file_name(self):
        return self.file_name

    def get_file_name_with_url(self):
        url = self.get_obsidian_url()
        name = self.get_file_name()
        name_with_url = "<a href=\"" + url + "\">" + name + "</a>"
        return name_with_url

    def get_file_uid(self):
        return self.uid

    def get_cloze_or_basic(self):
        return self.cloze_or_basic

    def set_file_content(self, file_content):
        self.file_content = file_content

    def get_file_content(self):
        return self.file_content

    def get_tags(self):
        tag_line = "[]"
        try:
            tag_line = self.metadata["tags"]
        except KeyError:
            pass
        tag_line = tag_line.lstrip("[")
        tag_line = tag_line.rstrip("]")
        if tag_line.find("/") != -1:
            tag_line = tag_line.replace("/", "::")
        tags = tag_line.split(",")
        for i in range(0, len(tags)):
            tags[i] = tags[i].lstrip(" ")
            tags[i] = tags[i].rstrip(" ")
            tags[i] = tags[i].replace(" ", "_")
        return tags
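# For reference, the tag conversion implemented by get_tags() maps, e.g.:
#   tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1]
# to the Anki hierarchical tags:
#   ["tag1::tag1.1::tag1.1.1", "tag2::tag2.1"]
# (nesting "/" becomes "::", and spaces inside a tag become "_").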
--------------------------------------------------------------------------------
/src/manifest.json:
--------------------------------------------------------------------------------
{
    "name": "Obsidianki 4",
    "package": "Obsidianki 4"
}

--------------------------------------------------------------------------------
/src/markdown2/markdown2.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# Copyright (c) 2012 Trent Mick.
# Copyright (c) 2007-2008 ActiveState Corp.
# License: MIT (http://www.opensource.org/licenses/mit-license.php)

r"""A fast and complete Python implementation of Markdown.

[from http://daringfireball.net/projects/markdown/]
> Markdown is a text-to-HTML filter; it translates an easy-to-read /
> easy-to-write structured text format into HTML. Markdown's text
> format is most similar to that of plain text email, and supports
> features such as headers, *emphasis*, code blocks, blockquotes, and
> links.
>
> Markdown's syntax is designed not as a generic markup language, but
> specifically to serve as a front-end to (X)HTML. You can use span-level
> HTML tags anywhere in a Markdown document, and you can use block level
> HTML tags (like <div> and <table> as well).

Module usage:

    >>> import markdown2
    >>> markdown2.markdown("*boo!*")  # or use `html = markdown_path(PATH)`
    u'<p><em>boo!</em></p>\n'

    >>> markdowner = Markdown()
    >>> markdowner.convert("*boo!*")
    u'<p><em>boo!</em></p>\n'
    >>> markdowner.convert("**boom!**")
    u'<p><strong>boom!</strong></p>\n'

This implementation of Markdown implements the full "core" syntax plus a
number of extras (e.g., code syntax coloring, footnotes) as described on
<https://github.com/trentm/python-markdown2/wiki/Extras>.
"""

cmdln_desc = """A fast and complete Python implementation of Markdown, a
text-to-HTML conversion tool for web writers.

Supported extra syntax options (see -x|--extras option below and
see <https://github.com/trentm/python-markdown2/wiki/Extras> for details):

* break-on-newline: Replace single new line characters with <br> when True
* code-friendly: Disable _ and __ for em and strong.
* cuddled-lists: Allow lists to be cuddled to the preceding paragraph.
* fenced-code-blocks: Allows a code block to not have to be indented
  by fencing it with '```' on a line before and after. Based on
  <http://github.github.com/github-flavored-markdown/> with support for
  syntax highlighting.
* footnotes: Support footnotes as in use on daringfireball.net and
  implemented in other Markdown processors (tho not in Markdown.pl v1.0.1).
* header-ids: Adds "id" attributes to headers. The id value is a slug of
  the header text.
* highlightjs-lang: Allows specifying the language which used for syntax
  highlighting when using fenced-code-blocks and highlightjs.
* html-classes: Takes a dict mapping html tag names (lowercase) to a
  string to use for a "class" tag attribute. Currently only supports "img",
  "table", "pre" and "code" tags. Add an issue if you require this for other
  tags.
* link-patterns: Auto-link given regex patterns in text (e.g. bug number
  references, revision number references).
* markdown-in-html: Allow the use of `markdown="1"` in a block HTML tag to
  have markdown processing be done on its contents. Similar to
  <http://michelf.com/projects/php-markdown/extra/#markdown-attr> but with
  some limitations.
* metadata: Extract metadata from a leading '---'-fenced block.
  See <https://github.com/trentm/python-markdown2/wiki/metadata> for details.
* nofollow: Add `rel="nofollow"` to add `<a>` tags with an href. See
  <http://en.wikipedia.org/wiki/Nofollow>.
* numbering: Support of generic counters. Non standard extension to
  allow sequential numbering of figures, tables, equations, exhibits etc.
* pyshell: Treats unindented Python interactive shell sessions as <code>
  blocks.
* smarty-pants: Replaces ' and " with curly quotation marks or curly
  apostrophes. Replaces --, ---, ..., and . . . with en dashes, em dashes,
  and ellipses.
* spoiler: A special kind of blockquote commonly hidden behind a
  click on SO. Syntax per <http://meta.stackexchange.com/a/72878>.
* strike: text inside of double tilde is ~~strikethrough~~
* tag-friendly: Requires atx style headers to have a space between the # and
  the header text. Useful for applications that require twitter style tags to
  pass through the parser.
* tables: Tables using the same format as GFM
  <https://help.github.com/articles/github-flavored-markdown#tables> and
  PHP-Markdown Extra <https://michelf.ca/projects/php-markdown/extra/#table>.
* toc: The returned HTML string gets a new "toc_html" attribute which is
  a Table of Contents for the document. (experimental)
* use-file-vars: Look for an Emacs-style markdown-extras file variable to turn
  on Extras.
* wiki-tables: Google Code Wiki-style tables. See
  <http://code.google.com/p/support/wiki/WikiSyntax#Tables>.
* xml: Passes one-liner processing instructions and namespaced XML tags.
"""

# Dev Notes:
# - Python's regex syntax doesn't have '\z', so I'm using '\Z'. I'm
#   not yet sure if there are implications with this. Compare 'pydoc sre'
#   and 'perldoc perlre'.

__version_info__ = (2, 4, 0)
__version__ = '.'.join(map(str, __version_info__))
__author__ = "Trent Mick"

import sys
import re
import logging
from hashlib import sha256
import optparse
from random import random, randint
import codecs
from collections import defaultdict


# ---- Python version compat

# Use `bytes` for byte strings and `unicode` for unicode strings (str in Py3).
if sys.version_info[0] <= 2:
    py3 = False
    try:
        bytes
    except NameError:
        bytes = str
    base_string_type = basestring
elif sys.version_info[0] >= 3:
    py3 = True
    unicode = str
    base_string_type = str

# ---- globals

DEBUG = False
log = logging.getLogger("markdown")

DEFAULT_TAB_WIDTH = 4


SECRET_SALT = bytes(randint(0, 1000000))
# MD5 function was previously used for this; the "md5" prefix was kept for
# backwards compatibility.
def _hash_text(s):
    return 'md5-' + sha256(SECRET_SALT + s.encode("utf-8")).hexdigest()[32:]

# Table of hash values for escaped characters:
g_escape_table = dict([(ch, _hash_text(ch))
                       for ch in '\\`*_{}[]()>#+-.!'])

# Ampersand-encoding based entirely on Nat Irons's Amputator MT plugin:
# http://bumppo.net/projects/amputator/
_AMPERSAND_RE = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')


# ---- exceptions
class MarkdownError(Exception):
    pass


# ---- public api

def markdown_path(path, encoding="utf-8",
                  html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
                  safe_mode=None, extras=None, link_patterns=None,
                  footnote_title=None, footnote_return_symbol=None,
                  use_file_vars=False):
    fp = codecs.open(path, 'r', encoding)
    text = fp.read()
    fp.close()
    return Markdown(html4tags=html4tags, tab_width=tab_width,
                    safe_mode=safe_mode, extras=extras,
                    link_patterns=link_patterns,
                    footnote_title=footnote_title,
                    footnote_return_symbol=footnote_return_symbol,
                    use_file_vars=use_file_vars).convert(text)


def markdown(text, html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
             safe_mode=None, extras=None, link_patterns=None,
             footnote_title=None, footnote_return_symbol=None,
             use_file_vars=False, cli=False) -> object:
    """

    @rtype: object
    """
    return Markdown(html4tags=html4tags, tab_width=tab_width,
                    safe_mode=safe_mode, extras=extras,
                    link_patterns=link_patterns,
                    footnote_title=footnote_title,
                    footnote_return_symbol=footnote_return_symbol,
                    use_file_vars=use_file_vars, cli=cli).convert(text)


class Markdown(object):
    # The dict of "extras" to enable in processing -- a mapping of
    # extra name to argument for the extra. Most extras do not have an
    # argument, in which case the value is None.
    #
    # This can be set via (a) subclassing and (b) the constructor
    # "extras" argument.
198 | extras = None
199 |
200 | urls = None
201 | titles = None
202 | html_blocks = None
203 | html_spans = None
204 | html_removed_text = "{(#HTML#)}" # placeholder removed text that does not trigger bold
205 | html_removed_text_compat = "[HTML_REMOVED]" # for compat with markdown.py
206 |
207 | _toc = None
208 |
209 | # Used to track when we're inside an ordered or unordered list
210 | # (see _ProcessListItems() for details):
211 | list_level = 0
212 |
213 | _ws_only_line_re = re.compile(r"^[ \t]+$", re.M)
214 |
215 | def __init__(self, html4tags=False, tab_width=4, safe_mode=None,
216 | extras=None, link_patterns=None,
217 | footnote_title=None, footnote_return_symbol=None,
218 | use_file_vars=False, cli=False):
219 | if html4tags:
220 | self.empty_element_suffix = ">"
221 | else:
222 | self.empty_element_suffix = " />"
223 | self.tab_width = tab_width
224 | self.tab = tab_width * " "
225 |
226 | # For compatibility with earlier markdown2.py and with
227 | # markdown.py's safe_mode being a boolean,
228 | # safe_mode == True -> "replace"
229 | if safe_mode is True:
230 | self.safe_mode = "replace"
231 | else:
232 | self.safe_mode = safe_mode
233 |
234 | # Massaging and building the "extras" info.
235 | if self.extras is None:
236 | self.extras = {}
237 | elif not isinstance(self.extras, dict):
238 | self.extras = dict([(e, None) for e in self.extras])
239 | if extras:
240 | if not isinstance(extras, dict):
241 | extras = dict([(e, None) for e in extras])
242 | self.extras.update(extras)
243 | assert isinstance(self.extras, dict)
244 |
245 | if "toc" in self.extras:
246 | if "header-ids" not in self.extras:
247 | self.extras["header-ids"] = None # "toc" implies "header-ids"
248 |
249 | if self.extras["toc"] is None:
250 | self._toc_depth = 6
251 | else:
252 | self._toc_depth = self.extras["toc"].get("depth", 6)
253 | self._instance_extras = self.extras.copy()
254 |
255 | self.link_patterns = link_patterns
256 | self.footnote_title = footnote_title
257 | self.footnote_return_symbol = footnote_return_symbol
258 | self.use_file_vars = use_file_vars
259 | self._outdent_re = re.compile(r'^(\t|[ ]{1,%d})' % tab_width, re.M)
260 | self.cli = cli
261 |
262 | self._escape_table = g_escape_table.copy()
263 | if "smarty-pants" in self.extras:
264 | self._escape_table['"'] = _hash_text('"')
265 | self._escape_table["'"] = _hash_text("'")
266 |
267 | def reset(self):
268 | self.urls = {}
269 | self.titles = {}
270 | self.html_blocks = {}
271 | self.html_spans = {}
272 | self.list_level = 0
273 | self.extras = self._instance_extras.copy()
274 | if "footnotes" in self.extras:
275 | self.footnotes = {}
276 | self.footnote_ids = []
277 | if "header-ids" in self.extras:
278 | self._count_from_header_id = defaultdict(int)
279 | if "metadata" in self.extras:
280 | self.metadata = {}
281 | self._toc = None
282 |
283 | # Per <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#attr-rel> "rel"
284 | # should only be used in <a> tags with an "href" attribute.
285 |
286 | # Opens the linked document in a new window or tab
287 | # should only be used in <a> tags with an "href" attribute.
288 | # same with _a_nofollow
289 | _a_nofollow_or_blank_links = re.compile(r"""
290 | <(a)
291 | (
292 | [^>]*
293 | href= # href is required
294 | ['"]? # HTML5 attribute values do not have to be quoted
295 | [^#'"] # We don't want to match href values that start with # (like footnotes)
296 | )
297 | """,
298 | re.IGNORECASE | re.VERBOSE
299 | )
300 |
301 | def convert(self, text):
302 | """Convert the given text."""
303 | # Main function. The order in which other subs are called here is
304 | # essential. Link and image substitutions need to happen before
305 | # _EscapeSpecialChars(), so that any *'s or _'s in the <a>
306 | # and <img> tags get encoded.
307 |
308 | # Clear the global hashes. If we don't clear these, you get conflicts
309 | # from other articles when generating a page which contains more than
310 | # one article (e.g. an index page that shows the N most recent
311 | # articles):
312 | self.reset()
313 |
314 | if not isinstance(text, unicode):
315 | # TODO: perhaps shouldn't presume UTF-8 for string input?
316 | text = unicode(text, 'utf-8')
317 |
318 | if self.use_file_vars:
319 | # Look for emacs-style file variable hints.
320 | emacs_vars = self._get_emacs_vars(text)
321 | if "markdown-extras" in emacs_vars:
322 | splitter = re.compile("[ ,]+")
323 | for e in splitter.split(emacs_vars["markdown-extras"]):
324 | if '=' in e:
325 | ename, earg = e.split('=', 1)
326 | try:
327 | earg = int(earg)
328 | except ValueError:
329 | pass
330 | else:
331 | ename, earg = e, None
332 | self.extras[ename] = earg
333 |
334 | # Standardize line endings:
335 | text = text.replace("\r\n", "\n")
336 | text = text.replace("\r", "\n")
337 |
338 | # Make sure $text ends with a couple of newlines:
339 | text += "\n\n"
340 |
341 | # Convert all tabs to spaces.
342 | text = self._detab(text)
343 |
344 | # Strip any lines consisting only of spaces and tabs.
345 | # This makes subsequent regexen easier to write, because we can
346 | # match consecutive blank lines with /\n+/ instead of something
347 | # contorted like /[ \t]*\n+/ .
348 | text = self._ws_only_line_re.sub("", text)
349 |
350 | # strip metadata from head and extract
351 | if "metadata" in self.extras:
352 | text = self._extract_metadata(text)
353 |
354 | text = self.preprocess(text)
355 |
356 | if "fenced-code-blocks" in self.extras and not self.safe_mode:
357 | text = self._do_fenced_code_blocks(text)
358 |
359 | if self.safe_mode:
360 | text = self._hash_html_spans(text)
361 |
362 | # Turn block-level HTML blocks into hash entries
363 | text = self._hash_html_blocks(text, raw=True)
364 |
365 | if "fenced-code-blocks" in self.extras and self.safe_mode:
366 | text = self._do_fenced_code_blocks(text)
367 |
368 | # Because numbering references aren't links (yet?), we can do everything associated with counters
369 | # before we get started
370 | if "numbering" in self.extras:
371 | text = self._do_numbering(text)
372 |
373 | # Strip link definitions, store in hashes.
374 | if "footnotes" in self.extras: 375 | # Must do footnotes first because an unlucky footnote defn 376 | # looks like a link defn: 377 | # [^4]: this "looks like a link defn" 378 | text = self._strip_footnote_definitions(text) 379 | text = self._strip_link_definitions(text) 380 | 381 | text = self._run_block_gamut(text) 382 | 383 | if "footnotes" in self.extras: 384 | text = self._add_footnotes(text) 385 | 386 | text = self.postprocess(text) 387 | 388 | text = self._unescape_special_chars(text) 389 | 390 | if self.safe_mode: 391 | text = self._unhash_html_spans(text) 392 | # return the removed text warning to its markdown.py compatible form 393 | text = text.replace(self.html_removed_text, self.html_removed_text_compat) 394 | 395 | do_target_blank_links = "target-blank-links" in self.extras 396 | do_nofollow_links = "nofollow" in self.extras 397 | 398 | if do_target_blank_links and do_nofollow_links: 399 | text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow noopener" target="_blank"\2', text) 400 | elif do_target_blank_links: 401 | text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="noopener" target="_blank"\2', text) 402 | elif do_nofollow_links: 403 | text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow"\2', text) 404 | 405 | if "toc" in self.extras and self._toc: 406 | self._toc_html = calculate_toc_html(self._toc) 407 | 408 | # Prepend toc html to output 409 | if self.cli: 410 | text = '{}\n{}'.format(self._toc_html, text) 411 | 412 | text += "\n" 413 | 414 | # Attach attrs to output 415 | rv = UnicodeWithAttrs(text) 416 | 417 | if "toc" in self.extras and self._toc: 418 | rv.toc_html = self._toc_html 419 | 420 | if "metadata" in self.extras: 421 | rv.metadata = self.metadata 422 | return rv 423 | 424 | def postprocess(self, text): 425 | """A hook for subclasses to do some postprocessing of the html, if 426 | desired. This is called before unescaping of special chars and 427 | unhashing of raw HTML spans. 428 | """ 429 | return text 430 | 431 | def preprocess(self, text): 432 | """A hook for subclasses to do some preprocessing of the Markdown, if 433 | desired. This is called after basic formatting of the text, but prior 434 | to any extras, safe mode, etc. processing. 435 | """ 436 | return text 437 | 438 | # Is metadata if the content starts with optional '---'-fenced `key: value` 439 | # pairs. E.g. (indented for presentation): 440 | # --- 441 | # foo: bar 442 | # another-var: blah blah 443 | # --- 444 | # # header 445 | # or: 446 | # foo: bar 447 | # another-var: blah blah 448 | # 449 | # # header 450 | _meta_data_pattern = re.compile(r'^(?:---[\ \t]*\n)?((?:[\S\w]+\s*:(?:\n+[ \t]+.*)+)|(?:.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)|(?:\s*[\S\w]+\s*:(?! >)[ \t]*.*\n?))(?:---[\ \t]*\n)?', re.MULTILINE) 451 | _key_val_pat = re.compile(r"[\S\w]+\s*:(?! 
>)[ \t]*.*\n?", re.MULTILINE)
452 | # this allows key: >
453 | # value
454 | # continues over multiple lines
455 | _key_val_block_pat = re.compile(
456 | r"(.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)", re.MULTILINE
457 | )
458 | _key_val_list_pat = re.compile(
459 | r"^-(?:[ \t]*([^\n]*)(?:[ \t]*[:-][ \t]*(\S+))?)(?:\n((?:[ \t]+[^\n]+\n?)+))?",
460 | re.MULTILINE,
461 | )
462 | _key_val_dict_pat = re.compile(
463 | r"^([^:\n]+)[ \t]*:[ \t]*([^\n]*)(?:((?:\n[ \t]+[^\n]+)+))?", re.MULTILINE
464 | ) # grp0: key, grp1: value, grp2: multiline value
465 | _meta_data_fence_pattern = re.compile(r'^---[\ \t]*\n', re.MULTILINE)
466 | _meta_data_newline = re.compile("^\n", re.MULTILINE)
467 |
468 | def _extract_metadata(self, text):
469 | if text.startswith("---"):
470 | fence_splits = re.split(self._meta_data_fence_pattern, text, maxsplit=2)
471 | metadata_content = fence_splits[1]
472 | match = re.findall(self._meta_data_pattern, metadata_content)
473 | if not match:
474 | return text
475 | tail = fence_splits[2]
476 | else:
477 | metadata_split = re.split(self._meta_data_newline, text, maxsplit=1)
478 | metadata_content = metadata_split[0]
479 | match = re.findall(self._meta_data_pattern, metadata_content)
480 | if not match:
481 | return text
482 | tail = metadata_split[1]
483 |
484 | def parse_structured_value(value):
485 | vs = value.lstrip()
486 | vs = value.replace(v[: len(value) - len(vs)], "\n")[1:]
487 |
488 | # List
489 | if vs.startswith("-"):
490 | r = []
491 | for match in re.findall(self._key_val_list_pat, vs):
492 | if match[0] and not match[1] and not match[2]:
493 | r.append(match[0].strip())
494 | elif match[0] == ">" and not match[1] and match[2]:
495 | r.append(match[2].strip())
496 | elif match[0] and match[1]:
497 | r.append({match[0].strip(): match[1].strip()})
498 | elif not match[0] and not match[1] and match[2]:
499 | r.append(parse_structured_value(match[2]))
500 | else:
501 | # Broken case
502 | pass
503 |
504 | return r
505 |
506 | # Dict
507 | else:
508 | return {
509 | match[0].strip(): (
510 | match[1].strip()
511 | if match[1]
512 | else parse_structured_value(match[2])
513 | )
514 | for match in re.findall(self._key_val_dict_pat, vs)
515 | }
516 |
517 | for item in match:
518 |
519 | k, v = item.split(":", 1)
520 |
521 | # Multiline value
522 | if v[:3] == " >\n":
523 | self.metadata[k.strip()] = v[3:].strip()
524 |
525 | # Empty value
526 | elif v == "\n":
527 | self.metadata[k.strip()] = ""
528 |
529 | # Structured value
530 | elif v[0] == "\n":
531 | self.metadata[k.strip()] = parse_structured_value(v)
532 |
533 | # Simple value
534 | else:
535 | self.metadata[k.strip()] = v.strip()
536 |
537 | return tail
538 |
539 | _emacs_oneliner_vars_pat = re.compile(r"-\*-\s*([^\r\n]*?)\s*-\*-", re.UNICODE)
540 | # This regular expression is intended to match blocks like this:
541 | # PREFIX Local Variables: SUFFIX
542 | # PREFIX mode: Tcl SUFFIX
543 | # PREFIX End: SUFFIX
544 | # Some notes:
545 | # - "[ \t]" is used instead of "\s" to specifically exclude newlines
546 | # - "(\r\n|\n|\r)" is used instead of "$" because the sre engine does
547 | # not like anything other than Unix-style line terminators.
548 | _emacs_local_vars_pat = re.compile(r"""^
549 | (?P<prefix>(?:[^\r\n|\n|\r])*?)
550 | [\ \t]*Local\ Variables:[\ \t]*
551 | (?P<suffix>.*?)(?:\r\n|\n|\r)
552 | (?P<content>.*?\1End:)
553 | """, re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)
554 |
555 | def _get_emacs_vars(self, text):
556 | """Return a dictionary of emacs-style local variables.
557 | 558 | Parsing is done loosely according to this spec (and according to 559 | some in-practice deviations from this): 560 | http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html#Specifying-File-Variables 561 | """ 562 | emacs_vars = {} 563 | SIZE = pow(2, 13) # 8kB 564 | 565 | # Search near the start for a '-*-'-style one-liner of variables. 566 | head = text[:SIZE] 567 | if "-*-" in head: 568 | match = self._emacs_oneliner_vars_pat.search(head) 569 | if match: 570 | emacs_vars_str = match.group(1) 571 | assert '\n' not in emacs_vars_str 572 | emacs_var_strs = [s.strip() for s in emacs_vars_str.split(';') 573 | if s.strip()] 574 | if len(emacs_var_strs) == 1 and ':' not in emacs_var_strs[0]: 575 | # While not in the spec, this form is allowed by emacs: 576 | # -*- Tcl -*- 577 | # where the implied "variable" is "mode". This form 578 | # is only allowed if there are no other variables. 579 | emacs_vars["mode"] = emacs_var_strs[0].strip() 580 | else: 581 | for emacs_var_str in emacs_var_strs: 582 | try: 583 | variable, value = emacs_var_str.strip().split(':', 1) 584 | except ValueError: 585 | log.debug("emacs variables error: malformed -*- " 586 | "line: %r", emacs_var_str) 587 | continue 588 | # Lowercase the variable name because Emacs allows "Mode" 589 | # or "mode" or "MoDe", etc. 590 | emacs_vars[variable.lower()] = value.strip() 591 | 592 | tail = text[-SIZE:] 593 | if "Local Variables" in tail: 594 | match = self._emacs_local_vars_pat.search(tail) 595 | if match: 596 | prefix = match.group("prefix") 597 | suffix = match.group("suffix") 598 | lines = match.group("content").splitlines(0) 599 | # print "prefix=%r, suffix=%r, content=%r, lines: %s"\ 600 | # % (prefix, suffix, match.group("content"), lines) 601 | 602 | # Validate the Local Variables block: proper prefix and suffix 603 | # usage. 604 | for i, line in enumerate(lines): 605 | if not line.startswith(prefix): 606 | log.debug("emacs variables error: line '%s' " 607 | "does not use proper prefix '%s'" 608 | % (line, prefix)) 609 | return {} 610 | # Don't validate suffix on last line. Emacs doesn't care, 611 | # neither should we. 612 | if i != len(lines)-1 and not line.endswith(suffix): 613 | log.debug("emacs variables error: line '%s' " 614 | "does not use proper suffix '%s'" 615 | % (line, suffix)) 616 | return {} 617 | 618 | # Parse out one emacs var per line. 619 | continued_for = None 620 | for line in lines[:-1]: # no var on the last line ("PREFIX End:") 621 | if prefix: line = line[len(prefix):] # strip prefix 622 | if suffix: line = line[:-len(suffix)] # strip suffix 623 | line = line.strip() 624 | if continued_for: 625 | variable = continued_for 626 | if line.endswith('\\'): 627 | line = line[:-1].rstrip() 628 | else: 629 | continued_for = None 630 | emacs_vars[variable] += ' ' + line 631 | else: 632 | try: 633 | variable, value = line.split(':', 1) 634 | except ValueError: 635 | log.debug("local variables error: missing colon " 636 | "in local variables entry: '%s'" % line) 637 | continue 638 | # Do NOT lowercase the variable name, because Emacs only 639 | # allows "mode" (and not "Mode", "MoDe", etc.) in this block. 640 | value = value.strip() 641 | if value.endswith('\\'): 642 | value = value[:-1].rstrip() 643 | continued_for = variable 644 | else: 645 | continued_for = None 646 | emacs_vars[variable] = value 647 | 648 | # Unquote values. 
649 | for var, val in list(emacs_vars.items()):
650 | if len(val) > 1 and (val.startswith('"') and val.endswith('"')
651 | or val.startswith("'") and val.endswith("'")):
652 | emacs_vars[var] = val[1:-1]
653 |
654 | return emacs_vars
655 |
656 | def _detab_line(self, line):
657 | r"""Recursively convert tabs to spaces in a single line.
658 |
659 | Called from _detab()."""
660 | if '\t' not in line:
661 | return line
662 | chunk1, chunk2 = line.split('\t', 1)
663 | chunk1 += (' ' * (self.tab_width - len(chunk1) % self.tab_width))
664 | output = chunk1 + chunk2
665 | return self._detab_line(output)
666 |
667 | def _detab(self, text):
668 | r"""Iterate text line by line and convert tabs to spaces.
669 |
670 | >>> m = Markdown()
671 | >>> m._detab("\tfoo")
672 | ' foo'
673 | >>> m._detab(" \tfoo")
674 | ' foo'
675 | >>> m._detab("\t foo")
676 | ' foo'
677 | >>> m._detab(" foo")
678 | ' foo'
679 | >>> m._detab(" foo\n\tbar\tblam")
680 | ' foo\n bar blam'
681 | """
682 | if '\t' not in text:
683 | return text
684 | output = []
685 | for line in text.splitlines():
686 | output.append(self._detab_line(line))
687 | return '\n'.join(output)
688 |
689 | # I broke out the html5 tags here and add them to _block_tags_a and
690 | # _block_tags_b. This way html5 tags are easy to keep track of.
691 | _html5tags = '|article|aside|header|hgroup|footer|nav|section|figure|figcaption'
692 |
693 | _block_tags_a = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del'
694 | _block_tags_a += _html5tags
695 |
696 | _strict_tag_block_re = re.compile(r"""
697 | ( # save in \1
698 | ^ # start of line (with re.M)
699 | <(%s) # start tag = \2
700 | \b # word break
701 | (.*\n)*? # any number of lines, minimally matching
702 | </\2> # the matching end tag
703 | [ \t]* # trailing spaces/tabs
704 | (?=\n+|\Z) # followed by a newline or end of document
705 | )
706 | """ % _block_tags_a,
707 | re.X | re.M)
708 |
709 | _block_tags_b = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math'
710 | _block_tags_b += _html5tags
711 |
712 | _liberal_tag_block_re = re.compile(r"""
713 | ( # save in \1
714 | ^ # start of line (with re.M)
715 | <(%s) # start tag = \2
716 | \b # word break
717 | (.*\n)*? # any number of lines, minimally matching
718 | .*</\2> # the matching end tag
719 | [ \t]* # trailing spaces/tabs
720 | (?=\n+|\Z) # followed by a newline or end of document
721 | )
722 | """ % _block_tags_b,
723 | re.X | re.M)
724 |
725 | _html_markdown_attr_re = re.compile(
726 | r'''\s+markdown=("1"|'1')''')
727 | def _hash_html_block_sub(self, match, raw=False):
728 | html = match.group(1)
729 | if raw and self.safe_mode:
730 | html = self._sanitize_html(html)
731 | elif 'markdown-in-html' in self.extras and 'markdown=' in html:
732 | first_line = html.split('\n', 1)[0]
733 | m = self._html_markdown_attr_re.search(first_line)
734 | if m:
735 | lines = html.split('\n')
736 | middle = '\n'.join(lines[1:-1])
737 | last_line = lines[-1]
738 | first_line = first_line[:m.start()] + first_line[m.end():]
739 | f_key = _hash_text(first_line)
740 | self.html_blocks[f_key] = first_line
741 | l_key = _hash_text(last_line)
742 | self.html_blocks[l_key] = last_line
743 | return ''.join(["\n\n", f_key,
744 | "\n\n", middle, "\n\n",
745 | l_key, "\n\n"])
746 | key = _hash_text(html)
747 | self.html_blocks[key] = html
748 | return "\n\n" + key + "\n\n"
749 |
750 | def _hash_html_blocks(self, text, raw=False):
751 | """Hashify HTML blocks
752 |
753 | We only want to do this for block-level HTML tags, such as headers,
754 | lists, and tables. That's because we still want to wrap <p>s around
755 | "paragraphs" that are wrapped in non-block-level tags, such as anchors,
756 | phrase emphasis, and spans. The list of tags we're looking for is
757 | hard-coded.
758 |
759 | @param raw {boolean} indicates if these are raw HTML blocks in
760 | the original source. It makes a difference in "safe" mode.
761 | """
762 | if '<' not in text:
763 | return text
764 |
765 | # Pass `raw` value into our calls to self._hash_html_block_sub.
766 | hash_html_block_sub = _curry(self._hash_html_block_sub, raw=raw)
767 |
768 | # First, look for nested blocks, e.g.:
769 | # <div>
770 | # <div>
771 | # tags for inner block must be indented.
772 | # </div>
773 | # </div>
774 | #
775 | # The outermost tags must start at the left margin for this to match, and
776 | # the inner nested divs must be indented.
777 | # We need to do this before the next, more liberal match, because the next
778 | # match will start at the first `<div>` and stop at the first `</div>`.
779 | text = self._strict_tag_block_re.sub(hash_html_block_sub, text)
780 |
781 | # Now match more liberally, simply from `\n<tag>` to `</tag>\n`
782 | text = self._liberal_tag_block_re.sub(hash_html_block_sub, text)
783 |
784 | # Special case just for <hr />. It was easier to make a special
785 | # case than to make the other regex more complicated.
786 | if "<hr" in text:
787 | _hr_tag_re = _hr_tag_re_from_tab_width(self.tab_width)
788 | text = _hr_tag_re.sub(hash_html_block_sub, text)
789 |
790 | # Special case for standalone HTML comments:
791 | if "<!--" in text:
792 | start = 0
793 | while True:
794 | # Delimiters for next comment block.
795 | try:
796 | start_idx = text.index("<!--", start)
797 | except ValueError:
798 | break
799 | try:
800 | end_idx = text.index("-->", start_idx) + 3
801 | except ValueError:
802 | break
803 |
804 | # Start position for next comment block search.
805 | start = end_idx
806 |
807 | # Validate whitespace before comment.
808 | if start_idx:
809 | # - Up to `tab_width - 1` spaces before start_idx.
810 | for i in range(self.tab_width - 1):
811 | if text[start_idx - 1] != ' ':
812 | break
813 | start_idx -= 1
814 | if start_idx == 0:
815 | break
816 | # - Must be preceded by 2 newlines or hit the start of
817 | # the document.
818 | if start_idx == 0:
819 | pass
820 | elif start_idx == 1 and text[0] == '\n':
821 | start_idx = 0 # to match minute detail of Markdown.pl regex
822 | elif text[start_idx-2:start_idx] == '\n\n':
823 | pass
824 | else:
825 | break
826 |
827 | # Validate whitespace after comment.
828 | # - Any number of spaces and tabs.
829 | while end_idx < len(text):
830 | if text[end_idx] not in ' \t':
831 | break
832 | end_idx += 1
833 | # - Must be followed by 2 newlines or hit end of text.
834 | if text[end_idx:end_idx+2] not in ('', '\n', '\n\n'):
835 | continue
836 |
837 | # Escape and hash (must match `_hash_html_block_sub`).
838 | html = text[start_idx:end_idx]
839 | if raw and self.safe_mode:
840 | html = self._sanitize_html(html)
841 | key = _hash_text(html)
842 | self.html_blocks[key] = html
843 | text = text[:start_idx] + "\n\n" + key + "\n\n" + text[end_idx:]
844 |
845 | if "xml" in self.extras:
846 | # Treat XML processing instructions and namespaced one-liner
847 | # tags as if they were block HTML tags. E.g., if standalone
848 | # (i.e. are their own paragraph), the following do not get
849 | # wrapped in a <p> tag:
850 | # <?foo bar?>
851 | #
852 | # <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="chapter_1.md"/>
853 | _xml_oneliner_re = _xml_oneliner_re_from_tab_width(self.tab_width)
854 | text = _xml_oneliner_re.sub(hash_html_block_sub, text)
855 |
856 | return text
857 |
858 | def _strip_link_definitions(self, text):
859 | # Strips link definitions from text, stores the URLs and titles in
860 | # hash references.
861 | less_than_tab = self.tab_width - 1
862 |
863 | # Link defs are in the form:
864 | # [id]: url "optional title"
865 | _link_def_re = re.compile(r"""
866 | ^[ ]{0,%d}\[(.+)\]: # id = \1
867 | [ \t]*
868 | \n? # maybe *one* newline
869 | [ \t]*
870 | <?(.+?)>? # url = \2
871 | [ \t]*
872 | (?:
873 | \n? # maybe one newline
874 | [ \t]*
875 | (?<=\s) # lookbehind for whitespace
876 | ['"(]
877 | ([^\n]*) # title = \3
878 | ['")]
879 | [ \t]*
880 | )? # title is optional
881 | (?:\n+|\Z)
882 | """ % less_than_tab, re.X | re.M | re.U)
883 | return _link_def_re.sub(self._extract_link_def_sub, text)
884 |
885 | def _extract_link_def_sub(self, match):
886 | id, url, title = match.groups()
887 | key = id.lower() # Link IDs are case-insensitive
888 | self.urls[key] = self._encode_amps_and_angles(url)
889 | if title:
890 | self.titles[key] = title
891 | return ""
892 |
893 | def _do_numbering(self, text):
894 | ''' We handle the special extension for generic numbering for
895 | tables, figures etc.
896 | '''
897 | # First pass to define all the references
898 | self.regex_defns = re.compile(r'''
899 | \[\#(\w+)\s* # the counter. Open square plus hash plus a word \1
900 | ([^@]*)\s* # Some optional characters, that aren't an @. \2
901 | @(\w+) # the id. Should this be normed? \3
902 | ([^\]]*)\] # The rest of the text up to the terminating ] \4
903 | ''', re.VERBOSE)
904 | self.regex_subs = re.compile(r"\[@(\w+)\s*\]") # [@ref_id]
905 | counters = {}
906 | references = {}
907 | replacements = []
908 | definition_html = '<figcaption class="{}" id="counter-ref-{}">{}{}{}</figcaption>'
909 | reference_html = '<a class="{}" href="#counter-ref-{}">{}</a>'
910 | for match in self.regex_defns.finditer(text):
911 | # We must have four match groups otherwise this isn't a numbering reference
912 | if len(match.groups()) != 4:
913 | continue
914 | counter = match.group(1)
915 | text_before = match.group(2)
916 | ref_id = match.group(3)
917 | text_after = match.group(4)
918 | number = counters.get(counter, 1)
919 | references[ref_id] = (number, counter)
920 | replacements.append((match.start(0),
921 | definition_html.format(counter,
922 | ref_id,
923 | text_before,
924 | number,
925 | text_after),
926 | match.end(0)))
927 | counters[counter] = number + 1
928 | for repl in reversed(replacements):
929 | text = text[:repl[0]] + repl[1] + text[repl[2]:]
930 |
931 | # Second pass to replace the references with the right
932 | # value of the counter
933 | # Fwiw, it's vaguely annoying to have to turn the iterator into
934 | # a list and then reverse it but I can't think of a better thing to do.
935 | for match in reversed(list(self.regex_subs.finditer(text))):
936 | number, counter = references.get(match.group(1), (None, None))
937 | if number is not None:
938 | repl = reference_html.format(counter,
939 | match.group(1),
940 | number)
941 | else:
942 | repl = reference_html.format(match.group(1),
943 | 'countererror',
944 | '?' + match.group(1) + '?')
945 | if "smarty-pants" in self.extras:
946 | repl = repl.replace('"', self._escape_table['"'])
947 |
948 | text = text[:match.start()] + repl + text[match.end():]
949 | return text
950 |
951 | def _extract_footnote_def_sub(self, match):
952 | id, text = match.groups()
953 | text = _dedent(text, skip_first_line=not text.startswith('\n')).strip()
954 | normed_id = re.sub(r'\W', '-', id)
955 | # Ensure footnote text ends with a couple newlines (for some
956 | # block gamut matches).
957 | self.footnotes[normed_id] = text + "\n\n"
958 | return ""
959 |
960 | def _strip_footnote_definitions(self, text):
961 | """A footnote definition looks like this:
962 |
963 | [^note-id]: Text of the note.
964 |
965 | May include one or more indented paragraphs.
966 |
967 | Where,
968 | - The 'note-id' can be pretty much anything, though typically it
969 | is the number of the footnote.
970 | - The first paragraph may start on the next line, like so:
971 |
972 | [^note-id]:
973 | Text of the note.
974 | """
975 | less_than_tab = self.tab_width - 1
976 | footnote_def_re = re.compile(r'''
977 | ^[ ]{0,%d}\[\^(.+)\]: # id = \1
978 | [ \t]*
979 | ( # footnote text = \2
980 | # First line need not start with the spaces.
981 | (?:\s*.*\n+)
982 | (?:
983 | (?:[ ]{%d} | \t) # Subsequent lines must be indented.
984 | .*\n+
985 | )*
986 | )
987 | # Lookahead for non-space at line-start, or end of doc.
988 | (?:(?=^[ ]{0,%d}\S)|\Z)
989 | ''' % (less_than_tab, self.tab_width, self.tab_width),
990 | re.X | re.M)
991 | return footnote_def_re.sub(self._extract_footnote_def_sub, text)
992 |
993 | _hr_re = re.compile(r'^[ ]{0,3}([-_*][ ]{0,2}){3,}$', re.M)
994 |
995 | def _run_block_gamut(self, text):
996 | # These are all the transformations that form block-level
997 | # tags like paragraphs, headers, and list items.
998 |
999 | if "fenced-code-blocks" in self.extras:
1000 | text = self._do_fenced_code_blocks(text)
1001 |
1002 | text = self._do_headers(text)
1003 |
1004 | # Do Horizontal Rules:
1005 | # On the number of spaces in horizontal rules: The spec is fuzzy: "If
1006 | # you wish, you may use spaces between the hyphens or asterisks."
1007 | # Markdown.pl 1.0.1's hr regexes limit the number of spaces between the
1008 | # hr chars to one or two. We'll reproduce that limit here.
1009 | hr = "\n<hr"+self.empty_element_suffix+"\n"
1010 | text = re.sub(self._hr_re, hr, text)
1011 |
1012 | text = self._do_lists(text)
1013 |
1014 | if "pyshell" in self.extras:
1015 | text = self._prepare_pyshell_blocks(text)
1016 | if "wiki-tables" in self.extras:
1017 | text = self._do_wiki_tables(text)
1018 | if "tables" in self.extras:
1019 | text = self._do_tables(text)
1020 |
1021 | text = self._do_code_blocks(text)
1022 |
1023 | text = self._do_block_quotes(text)
1024 |
1025 | # We already ran _HashHTMLBlocks() before, in Markdown(), but that
1026 | # was to escape raw HTML in the original Markdown source. This time,
1027 | # we're escaping the markup we've just created, so that we don't wrap
1028 | # <p> tags around block-level tags.
1029 | text = self._hash_html_blocks(text)
1030 |
1031 | text = self._form_paragraphs(text)
1032 |
1033 | return text
1034 |
1035 | def _pyshell_block_sub(self, match):
1036 | lines = match.group(0).splitlines(0)
1037 | _dedentlines(lines)
1038 | indent = ' ' * self.tab_width
1039 | s = ('\n' # separate from possible cuddled paragraph
1040 | + indent + ('\n'+indent).join(lines)
1041 | + '\n\n')
1042 | return s
1043 |
1044 | def _prepare_pyshell_blocks(self, text):
1045 | """Ensure that Python interactive shell sessions are put in
1046 | code blocks -- even if not properly indented.
1047 | """
1048 | if ">>>" not in text:
1049 | return text
1050 |
1051 | less_than_tab = self.tab_width - 1
1052 | _pyshell_block_re = re.compile(r"""
1053 | ^([ ]{0,%d})>>>[ ].*\n # first line
1054 | ^(\1.*\S+.*\n)* # any number of subsequent lines
1055 | ^\n # ends with a blank line
1056 | """ % less_than_tab, re.M | re.X)
1057 |
1058 | return _pyshell_block_re.sub(self._pyshell_block_sub, text)
1059 |
1060 | def _table_sub(self, match):
1061 | trim_space_re = '^[ \t\n]+|[ \t\n]+$'
1062 | trim_bar_re = r'^\||\|$'
1063 | split_bar_re = r'^\||(?<!\\)\|'
1064 | escape_bar_re = r'\\\|'
1065 |
1066 | head, underline, body = match.groups()
1067 |
1068 | # Determine aligns for columns.
1069 | cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", underline)))]
1070 | align_from_col_idx = {}
1071 | for col_idx, col in enumerate(cols):
1072 | if col.startswith(':') and col.endswith(':'):
1073 | align_from_col_idx[col_idx] = ' style="text-align:center;"'
1074 | elif col.startswith(':'):
1075 | align_from_col_idx[col_idx] = ' style="text-align:left;"'
1076 | elif col.endswith(':'):
1077 | align_from_col_idx[col_idx] = ' style="text-align:right;"'
1078 |
1079 | # Create a new table with the correct alignments.
1080 | hlines = ['<table%s>' % self._html_class_str_from_tag('table'), '<thead>', '<tr>']
1081 | cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", head)))]
1082 | for col_idx, col in enumerate(cols):
1083 | hlines.append('  <th%s>%s</th>' % (
1084 | align_from_col_idx.get(col_idx, ''),
1085 | self._run_span_gamut(col)
1086 | ))
1087 | hlines.append('</tr>')
1088 | hlines.append('</thead>')
1089 |
1090 | # tbody
1091 | hlines.append('<tbody>')
1092 | for line in body.strip('\n').split('\n'):
1093 | hlines.append('<tr>')
1094 | cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", line)))]
1095 | for col_idx, col in enumerate(cols):
1096 | hlines.append('  <td%s>%s</td>' % (
1097 | align_from_col_idx.get(col_idx, ''),
1098 | self._run_span_gamut(col)
1099 | ))
1100 | hlines.append('</tr>')
1101 | hlines.append('</tbody>')
1102 | hlines.append('</table>')
1103 |
1104 | return '\n'.join(hlines) + '\n'
1105 |
1106 | def _do_tables(self, text):
1107 | """Copying PHP-Markdown and GFM table syntax. Some regex borrowed from
1108 | https://github.com/michelf/php-markdown/blob/lib/Michelf/Markdown.php#L2538
1109 | """
1110 | less_than_tab = self.tab_width - 1
1111 | table_re = re.compile(r'''
1112 | (?:(?<=\n\n)|\A\n?) # leading blank line
1113 |
1114 | ^[ ]{0,%d} # allowed whitespace
1115 | (.*[|].*) \n # $1: header row (at least one pipe)
1116 |
1117 | ^[ ]{0,%d} # allowed whitespace
1118 | ( # $2: underline row
1119 | # underline row with leading bar
1120 | (?: \|\ *:?-+:?\ * )+ \|? \n
1121 | |
1122 | # or, underline row without leading bar
1123 | (?: \ *:?-+:?\ *\| )+ (?: \ *:?-+:?\ * )? \n
1124 | )
1125 |
1126 | ( # $3: data rows
1127 | (?:
1128 | ^[ ]{0,%d}(?!\ ) # ensure line begins with 0 to less_than_tab spaces
1129 | .*\|.* \n
1130 | )+
1131 | )
1132 | ''' % (less_than_tab, less_than_tab, less_than_tab), re.M | re.X)
1133 | return table_re.sub(self._table_sub, text)
1134 |
1135 | def _wiki_table_sub(self, match):
1136 | ttext = match.group(0).strip()
1137 | # print('wiki table: %r' % match.group(0))
1138 | rows = []
1139 | for line in ttext.splitlines(0):
1140 | line = line.strip()[2:-2].strip()
1141 | row = [c.strip() for c in re.split(r'(?<!\\)\|\|', line)]
1142 | rows.append(row)
1143 | # from pprint import pprint
1144 | # pprint(rows)
1145 | hlines = []
1146 |
1147 | def add_hline(line, indents=0):
1148 | hlines.append((self.tab * indents) + line)
1149 |
1150 | def format_cell(text):
1151 | return self._run_span_gamut(re.sub(r"^\s*~", "", cell).strip(" "))
1152 |
1153 | add_hline('<table%s>' % self._html_class_str_from_tag('table'))
1154 | # Check if first cell of first row is a header cell. If so, assume the whole row is a header row.
1155 | if rows and rows[0] and re.match(r"^\s*~", rows[0][0]):
1156 | add_hline('<thead>', 1)
1157 | add_hline('<tr>', 2)
1158 | for cell in rows[0]:
1159 | add_hline("<th>{}</th>".format(format_cell(cell)), 3)
1160 | add_hline('</tr>', 2)
1161 | add_hline('</thead>', 1)
1162 | # Only one header row allowed.
1163 | rows = rows[1:]
1164 | # If no more rows, don't create a tbody.
1165 | if rows:
1166 | add_hline('<tbody>', 1)
1167 | for row in rows:
1168 | add_hline('<tr>', 2)
1169 | for cell in row:
1170 | add_hline('<td>{}</td>'.format(format_cell(cell)), 3)
1171 | add_hline('</tr>', 2)
1172 | add_hline('</tbody>', 1)
1173 | add_hline('</table>')
1174 | return '\n'.join(hlines) + '\n'
1175 |
1176 | def _do_wiki_tables(self, text):
1177 | # Optimization.
1178 | if "||" not in text:
1179 | return text
1180 |
1181 | less_than_tab = self.tab_width - 1
1182 | wiki_table_re = re.compile(r'''
1183 | (?:(?<=\n\n)|\A\n?) # leading blank line
1184 | ^([ ]{0,%d})\|\|.+?\|\|[ ]*\n # first line
1185 | (^\1\|\|.+?\|\|\n)* # any number of subsequent lines
1186 | ''' % less_than_tab, re.M | re.X)
1187 | return wiki_table_re.sub(self._wiki_table_sub, text)
1188 |
1189 | def _run_span_gamut(self, text):
1190 | # These are all the transformations that occur *within* block-level
1191 | # tags like paragraphs, headers, and list items.
1192 |
1193 | text = self._do_code_spans(text)
1194 |
1195 | text = self._escape_special_chars(text)
1196 |
1197 | # Process anchor and image tags.
1198 | if "link-patterns" in self.extras:
1199 | text = self._do_link_patterns(text)
1200 |
1201 | text = self._do_links(text)
1202 |
1203 | # Make links out of things like `<http://example.com/>`
1204 | # Must come after _do_links(), because you can use < and >
1205 | # delimiters in inline links like [this](<url>).
1206 | text = self._do_auto_links(text)
1207 |
1208 | text = self._encode_amps_and_angles(text)
1209 |
1210 | if "strike" in self.extras:
1211 | text = self._do_strike(text)
1212 |
1213 | if "underline" in self.extras:
1214 | text = self._do_underline(text)
1215 |
1216 | text = self._do_italics_and_bold(text)
1217 |
1218 | if "smarty-pants" in self.extras:
1219 | text = self._do_smart_punctuation(text)
1220 |
1221 | # Do hard breaks:
1222 | if "break-on-newline" in self.extras:
1223 | text = re.sub(r" *\n", "<br%s\n" % self.empty_element_suffix, text)
1224 | else:
1225 | text = re.sub(r" {2,}\n", " <br%s\n" % self.empty_element_suffix, text)
1226 |
1227 | return text
1228 |
1229 | # "Sorta" because auto-links are identified as spans.
1230 | _sorta_html_tokenize_re = re.compile(r"""
1231 | (
1232 | # tag
1233 | </?
1234 | (?:\w+) # tag name
1235 | (?:\s+(?:[\w-]+:)?[\w-]+=(?:".*?"|'.*?'))* # attributes
1236 | \s*/?>
1237 | |
1238 | # auto-link (e.g., <http://www.activestate.com/>)
1239 | <\w+[^>]*>
1240 | |
1241 | <!--.*?--> # comment
1242 | |
1243 | <\?.*?\?> # processing instruction
1244 | )
1245 | """, re.X)
1246 |
1247 | def _escape_special_chars(self, text):
1248 | # Python markdown note: the HTML tokenization here differs from
1249 | # that in Markdown.pl, hence the behaviour for subtle cases can
1250 | # differ (I believe the tokenizer here does a better job because
1251 | # it isn't susceptible to unmatched '<' and '>' in HTML tags).
1252 | # Note, however, that '>' is not allowed in an auto-link URL
1253 | # here.
1254 | escaped = []
1255 | is_html_markup = False
1256 | for token in self._sorta_html_tokenize_re.split(text):
1257 | if is_html_markup:
1258 | # Within tags/HTML-comments/auto-links, encode * and _
1259 | # so they don't conflict with their use in Markdown for
1260 | # italics and strong. We're replacing each such
1261 | # character with its corresponding MD5 checksum value;
1262 | # this is likely overkill, but it should prevent us from
1263 | # colliding with the escape values by accident.
1264 | escaped.append(token.replace('*', self._escape_table['*'])
1265 | .replace('_', self._escape_table['_']))
1266 | else:
1267 | escaped.append(self._encode_backslash_escapes(token))
1268 | is_html_markup = not is_html_markup
1269 | return ''.join(escaped)
1270 |
1271 | def _hash_html_spans(self, text):
1272 | # Used for safe_mode.
1273 |
1274 | def _is_auto_link(s):
1275 | if ':' in s and self._auto_link_re.match(s):
1276 | return True
1277 | elif '@' in s and self._auto_email_link_re.match(s):
1278 | return True
1279 | return False
1280 |
1281 | tokens = []
1282 | is_html_markup = False
1283 | for token in self._sorta_html_tokenize_re.split(text):
1284 | if is_html_markup and not _is_auto_link(token):
1285 | sanitized = self._sanitize_html(token)
1286 | key = _hash_text(sanitized)
1287 | self.html_spans[key] = sanitized
1288 | tokens.append(key)
1289 | else:
1290 | tokens.append(self._encode_incomplete_tags(token))
1291 | is_html_markup = not is_html_markup
1292 | return ''.join(tokens)
1293 |
1294 | def _unhash_html_spans(self, text):
1295 | for key, sanitized in list(self.html_spans.items()):
1296 | text = text.replace(key, sanitized)
1297 | return text
1298 |
1299 | def _sanitize_html(self, s):
1300 | if self.safe_mode == "replace":
1301 | return self.html_removed_text
1302 | elif self.safe_mode == "escape":
1303 | replacements = [
1304 | ('&', '&amp;'),
1305 | ('<', '&lt;'),
1306 | ('>', '&gt;'),
1307 | ]
1308 | for before, after in replacements:
1309 | s = s.replace(before, after)
1310 | return s
1311 | else:
1312 | raise MarkdownError("invalid value for 'safe_mode': %r (must be "
1313 | "'escape' or 'replace')" % self.safe_mode)
1314 |
1315 | _inline_link_title = re.compile(r'''
1316 | ( # \1
1317 | [ \t]+
1318 | (['"]) # quote char = \2
1319 | (?P<title>.*?)
1320 | \2
1321 | )? # title is optional
1322 | \)$
1323 | ''', re.X | re.S)
1324 | _tail_of_reference_link_re = re.compile(r'''
1325 | # Match tail of: [text][id]
1326 | [ ]?
# one optional space 1327 | (?:\n[ ]*)? # one optional newline followed by spaces 1328 | \[ 1329 | (?P<id>.*?) 1330 | \] 1331 | ''', re.X | re.S) 1332 | 1333 | _whitespace = re.compile(r'\s*') 1334 | 1335 | _strip_anglebrackets = re.compile(r'<(.*)>.*') 1336 | 1337 | def _find_non_whitespace(self, text, start): 1338 | """Returns the index of the first non-whitespace character in text 1339 | after (and including) start 1340 | """ 1341 | match = self._whitespace.match(text, start) 1342 | return match.end() 1343 | 1344 | def _find_balanced(self, text, start, open_c, close_c): 1345 | """Returns the index where the open_c and close_c characters balance 1346 | out - the same number of open_c and close_c are encountered - or the 1347 | end of string if it's reached before the balance point is found. 1348 | """ 1349 | i = start 1350 | l = len(text) 1351 | count = 1 1352 | while count > 0 and i < l: 1353 | if text[i] == open_c: 1354 | count += 1 1355 | elif text[i] == close_c: 1356 | count -= 1 1357 | i += 1 1358 | return i 1359 | 1360 | def _extract_url_and_title(self, text, start): 1361 | """Extracts the url and (optional) title from the tail of a link""" 1362 | # text[start] equals the opening parenthesis 1363 | idx = self._find_non_whitespace(text, start+1) 1364 | if idx == len(text): 1365 | return None, None, None 1366 | end_idx = idx 1367 | has_anglebrackets = text[idx] == "<" 1368 | if has_anglebrackets: 1369 | end_idx = self._find_balanced(text, end_idx+1, "<", ">") 1370 | end_idx = self._find_balanced(text, end_idx, "(", ")") 1371 | match = self._inline_link_title.search(text, idx, end_idx) 1372 | if not match: 1373 | return None, None, None 1374 | url, title = text[idx:match.start()], match.group("title") 1375 | if has_anglebrackets: 1376 | url = self._strip_anglebrackets.sub(r'\1', url) 1377 | return url, title, end_idx 1378 | 1379 | _safe_protocols = re.compile(r'(https?|ftp):', re.I) 1380 | def _do_links(self, text): 1381 | """Turn Markdown link shortcuts into XHTML <a> and <img> tags. 1382 | 1383 | This is a combination of Markdown.pl's _DoAnchors() and 1384 | _DoImages(). They are done together because that simplified the 1385 | approach. It was necessary to use a different approach than 1386 | Markdown.pl because of the lack of atomic matching support in 1387 | Python's regex engine used in $g_nested_brackets. 1388 | """ 1389 | MAX_LINK_TEXT_SENTINEL = 3000 # markdown2 issue 24 1390 | 1391 | # `anchor_allowed_pos` is used to support img links inside 1392 | # anchors, but not anchors inside anchors. An anchor's start 1393 | # pos must be `>= anchor_allowed_pos`. 1394 | anchor_allowed_pos = 0 1395 | 1396 | curr_pos = 0 1397 | while True: # Handle the next link. 1398 | # The next '[' is the start of: 1399 | # - an inline anchor: [text](url "title") 1400 | # - a reference anchor: [text][id] 1401 | # - an inline img: ![text](url "title") 1402 | # - a reference img: ![text][id] 1403 | # - a footnote ref: [^id] 1404 | # (Only if 'footnotes' extra enabled) 1405 | # - a footnote defn: [^id]: ... 1406 | # (Only if 'footnotes' extra enabled) These have already 1407 | # been stripped in _strip_footnote_definitions() so no 1408 | # need to watch for them. 1409 | # - a link definition: [id]: url "title" 1410 | # These have already been stripped in 1411 | # _strip_link_definitions() so no need to watch for them. 1412 | # - not markup: [...anything else... 
1413 | try: 1414 | start_idx = text.index('[', curr_pos) 1415 | except ValueError: 1416 | break 1417 | text_length = len(text) 1418 | 1419 | # Find the matching closing ']'. 1420 | # Markdown.pl allows *matching* brackets in link text so we 1421 | # will here too. Markdown.pl *doesn't* currently allow 1422 | # matching brackets in img alt text -- we'll differ in that 1423 | # regard. 1424 | bracket_depth = 0 1425 | for p in range(start_idx+1, min(start_idx+MAX_LINK_TEXT_SENTINEL, 1426 | text_length)): 1427 | ch = text[p] 1428 | if ch == ']': 1429 | bracket_depth -= 1 1430 | if bracket_depth < 0: 1431 | break 1432 | elif ch == '[': 1433 | bracket_depth += 1 1434 | else: 1435 | # Closing bracket not found within sentinel length. 1436 | # This isn't markup. 1437 | curr_pos = start_idx + 1 1438 | continue 1439 | link_text = text[start_idx+1:p] 1440 | 1441 | # Fix for issue 341 - Injecting XSS into link text 1442 | if self.safe_mode: 1443 | link_text = self._hash_html_spans(link_text) 1444 | link_text = self._unhash_html_spans(link_text) 1445 | 1446 | # Possibly a footnote ref? 1447 | if "footnotes" in self.extras and link_text.startswith("^"): 1448 | normed_id = re.sub(r'\W', '-', link_text[1:]) 1449 | if normed_id in self.footnotes: 1450 | self.footnote_ids.append(normed_id) 1451 | result = '<sup class="footnote-ref" id="fnref-%s">' \ 1452 | '<a href="#fn-%s">%s</a></sup>' \ 1453 | % (normed_id, normed_id, len(self.footnote_ids)) 1454 | text = text[:start_idx] + result + text[p+1:] 1455 | else: 1456 | # This id isn't defined, leave the markup alone. 1457 | curr_pos = p+1 1458 | continue 1459 | 1460 | # Now determine what this is by the remainder. 1461 | p += 1 1462 | if p == text_length: 1463 | return text 1464 | 1465 | # Inline anchor or img? 1466 | if text[p] == '(': # attempt at perf improvement 1467 | url, title, url_end_idx = self._extract_url_and_title(text, p) 1468 | if url is not None: 1469 | # Handle an inline anchor or img. 1470 | is_img = start_idx > 0 and text[start_idx-1] == "!" 1471 | if is_img: 1472 | start_idx -= 1 1473 | 1474 | # We've got to encode these to avoid conflicting 1475 | # with italics/bold. 1476 | url = url.replace('*', self._escape_table['*']) \ 1477 | .replace('_', self._escape_table['_']) 1478 | if title: 1479 | title_str = ' title="%s"' % ( 1480 | _xml_escape_attr(title) 1481 | .replace('*', self._escape_table['*']) 1482 | .replace('_', self._escape_table['_'])) 1483 | else: 1484 | title_str = '' 1485 | if is_img: 1486 | img_class_str = self._html_class_str_from_tag("img") 1487 | result = '<img src="%s" alt="%s"%s%s%s' \ 1488 | % (_html_escape_url(url, safe_mode=self.safe_mode), 1489 | _xml_escape_attr(link_text), 1490 | title_str, 1491 | img_class_str, 1492 | self.empty_element_suffix) 1493 | if "smarty-pants" in self.extras: 1494 | result = result.replace('"', self._escape_table['"']) 1495 | curr_pos = start_idx + len(result) 1496 | text = text[:start_idx] + result + text[url_end_idx:] 1497 | elif start_idx >= anchor_allowed_pos: 1498 | safe_link = self._safe_protocols.match(url) or url.startswith('#') 1499 | if self.safe_mode and not safe_link: 1500 | result_head = '<a href="#"%s>' % (title_str) 1501 | else: 1502 | result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str) 1503 | result = '%s%s</a>' % (result_head, link_text) 1504 | if "smarty-pants" in self.extras: 1505 | result = result.replace('"', self._escape_table['"']) 1506 | # <img> allowed from curr_pos on, <a> from 1507 | # anchor_allowed_pos on. 
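# (In other words: an <img> may still be emitted inside this anchor's
# link text, but a new <a> cannot start until after this anchor's
# closing tag; hence the two different positions tracked here.)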
1508 | curr_pos = start_idx + len(result_head) 1509 | anchor_allowed_pos = start_idx + len(result) 1510 | text = text[:start_idx] + result + text[url_end_idx:] 1511 | else: 1512 | # Anchor not allowed here. 1513 | curr_pos = start_idx + 1 1514 | continue 1515 | 1516 | # Reference anchor or img? 1517 | else: 1518 | match = self._tail_of_reference_link_re.match(text, p) 1519 | if match: 1520 | # Handle a reference-style anchor or img. 1521 | is_img = start_idx > 0 and text[start_idx-1] == "!" 1522 | if is_img: 1523 | start_idx -= 1 1524 | link_id = match.group("id").lower() 1525 | if not link_id: 1526 | link_id = link_text.lower() # for links like [this][] 1527 | if link_id in self.urls: 1528 | url = self.urls[link_id] 1529 | # We've got to encode these to avoid conflicting 1530 | # with italics/bold. 1531 | url = url.replace('*', self._escape_table['*']) \ 1532 | .replace('_', self._escape_table['_']) 1533 | title = self.titles.get(link_id) 1534 | if title: 1535 | title = _xml_escape_attr(title) \ 1536 | .replace('*', self._escape_table['*']) \ 1537 | .replace('_', self._escape_table['_']) 1538 | title_str = ' title="%s"' % title 1539 | else: 1540 | title_str = '' 1541 | if is_img: 1542 | img_class_str = self._html_class_str_from_tag("img") 1543 | result = '<img src="%s" alt="%s"%s%s%s' \ 1544 | % (_html_escape_url(url, safe_mode=self.safe_mode), 1545 | _xml_escape_attr(link_text), 1546 | title_str, 1547 | img_class_str, 1548 | self.empty_element_suffix) 1549 | if "smarty-pants" in self.extras: 1550 | result = result.replace('"', self._escape_table['"']) 1551 | curr_pos = start_idx + len(result) 1552 | text = text[:start_idx] + result + text[match.end():] 1553 | elif start_idx >= anchor_allowed_pos: 1554 | if self.safe_mode and not self._safe_protocols.match(url): 1555 | result_head = '<a href="#"%s>' % (title_str) 1556 | else: 1557 | result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str) 1558 | result = '%s%s</a>' % (result_head, link_text) 1559 | if "smarty-pants" in self.extras: 1560 | result = result.replace('"', self._escape_table['"']) 1561 | # <img> allowed from curr_pos on, <a> from 1562 | # anchor_allowed_pos on. 1563 | curr_pos = start_idx + len(result_head) 1564 | anchor_allowed_pos = start_idx + len(result) 1565 | text = text[:start_idx] + result + text[match.end():] 1566 | else: 1567 | # Anchor not allowed here. 1568 | curr_pos = start_idx + 1 1569 | else: 1570 | # This id isn't defined, leave the markup alone. 1571 | curr_pos = match.end() 1572 | continue 1573 | 1574 | # Otherwise, it isn't markup. 1575 | curr_pos = start_idx + 1 1576 | 1577 | return text 1578 | 1579 | def header_id_from_text(self, text, prefix, n): 1580 | """Generate a header id attribute value from the given header 1581 | HTML content. 1582 | 1583 | This is only called if the "header-ids" extra is enabled. 1584 | Subclasses may override this for different header ids. 1585 | 1586 | @param text {str} The text of the header tag 1587 | @param prefix {str} The requested prefix for header ids. This is the 1588 | value of the "header-ids" extra key, if any. Otherwise, None. 1589 | @param n {int} The <hN> tag number, i.e. `1` for an <h1> tag. 1590 | @returns {str} The value for the header tag's "id" attribute. Return 1591 | None to not have an id attribute and to exclude this header from 1592 | the TOC (if the "toc" extra is specified). 
1593 | """ 1594 | header_id = _slugify(text) 1595 | if prefix and isinstance(prefix, base_string_type): 1596 | header_id = prefix + '-' + header_id 1597 | 1598 | self._count_from_header_id[header_id] += 1 1599 | if 0 == len(header_id) or self._count_from_header_id[header_id] > 1: 1600 | header_id += '-%s' % self._count_from_header_id[header_id] 1601 | 1602 | return header_id 1603 | 1604 | def _toc_add_entry(self, level, id, name): 1605 | if level > self._toc_depth: 1606 | return 1607 | if self._toc is None: 1608 | self._toc = [] 1609 | self._toc.append((level, id, self._unescape_special_chars(name))) 1610 | 1611 | _h_re_base = r''' 1612 | (^(.+)[ \t]*\n(=+|-+)[ \t]*\n+) 1613 | | 1614 | (^(\#{1,6}) # \1 = string of #'s 1615 | [ \t]%s 1616 | (.+?) # \2 = Header text 1617 | [ \t]* 1618 | (?<!\\) # ensure not an escaped trailing '#' 1619 | \#* # optional closing #'s (not counted) 1620 | \n+ 1621 | ) 1622 | ''' 1623 | 1624 | _h_re = re.compile(_h_re_base % '*', re.X | re.M) 1625 | _h_re_tag_friendly = re.compile(_h_re_base % '+', re.X | re.M) 1626 | 1627 | def _h_sub(self, match): 1628 | if match.group(1) is not None and match.group(3) == "-": 1629 | return match.group(1) 1630 | elif match.group(1) is not None: 1631 | # Setext header 1632 | n = {"=": 1, "-": 2}[match.group(3)[0]] 1633 | header_group = match.group(2) 1634 | else: 1635 | # atx header 1636 | n = len(match.group(5)) 1637 | header_group = match.group(6) 1638 | 1639 | demote_headers = self.extras.get("demote-headers") 1640 | if demote_headers: 1641 | n = min(n + demote_headers, 6) 1642 | header_id_attr = "" 1643 | if "header-ids" in self.extras: 1644 | header_id = self.header_id_from_text(header_group, 1645 | self.extras["header-ids"], n) 1646 | if header_id: 1647 | header_id_attr = ' id="%s"' % header_id 1648 | html = self._run_span_gamut(header_group) 1649 | if "toc" in self.extras and header_id: 1650 | self._toc_add_entry(n, header_id, html) 1651 | return "<h%d%s>%s</h%d>\n\n" % (n, header_id_attr, html, n) 1652 | 1653 | def _do_headers(self, text): 1654 | # Setext-style headers: 1655 | # Header 1 1656 | # ======== 1657 | # 1658 | # Header 2 1659 | # -------- 1660 | 1661 | # atx-style headers: 1662 | # # Header 1 1663 | # ## Header 2 1664 | # ## Header 2 with closing hashes ## 1665 | # ... 1666 | # ###### Header 6 1667 | 1668 | if 'tag-friendly' in self.extras: 1669 | return self._h_re_tag_friendly.sub(self._h_sub, text) 1670 | return self._h_re.sub(self._h_sub, text) 1671 | 1672 | _marker_ul_chars = '*+-' 1673 | _marker_any = r'(?:[%s]|\d+\.)' % _marker_ul_chars 1674 | _marker_ul = '(?:[%s])' % _marker_ul_chars 1675 | _marker_ol = r'(?:\d+\.)' 1676 | 1677 | def _list_sub(self, match): 1678 | lst = match.group(1) 1679 | lst_type = match.group(3) in self._marker_ul_chars and "ul" or "ol" 1680 | result = self._process_list_items(lst) 1681 | if self.list_level: 1682 | return "<%s>\n%s</%s>\n" % (lst_type, result, lst_type) 1683 | else: 1684 | return "<%s>\n%s</%s>\n\n" % (lst_type, result, lst_type) 1685 | 1686 | def _do_lists(self, text): 1687 | # Form HTML ordered (numbered) and unordered (bulleted) lists. 1688 | 1689 | # Iterate over each *non-overlapping* list match. 1690 | pos = 0 1691 | while True: 1692 | # Find the *first* hit for either list style (ul or ol). We 1693 | # match ul and ol separately to avoid adjacent lists of different 1694 | # types running into each other (see issue #16). 
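# (Sketch of the failure this avoids: in "- a\n1. b", matching both
# marker styles with one pattern could run the bullet list and the
# numbered list together; scanning each style separately keeps them
# distinct.)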
1695 | hits = [] 1696 | for marker_pat in (self._marker_ul, self._marker_ol): 1697 | less_than_tab = self.tab_width - 1 1698 | whole_list = r''' 1699 | ( # \1 = whole list 1700 | ( # \2 1701 | [ ]{0,%d} 1702 | (%s) # \3 = first list item marker 1703 | [ \t]+ 1704 | (?!\ *\3\ ) # '- - - ...' isn't a list. See 'not_quite_a_list' test case. 1705 | ) 1706 | (?:.+?) 1707 | ( # \4 1708 | \Z 1709 | | 1710 | \n{2,} 1711 | (?=\S) 1712 | (?! # Negative lookahead for another list item marker 1713 | [ \t]* 1714 | %s[ \t]+ 1715 | ) 1716 | ) 1717 | ) 1718 | ''' % (less_than_tab, marker_pat, marker_pat) 1719 | if self.list_level: # sub-list 1720 | list_re = re.compile("^"+whole_list, re.X | re.M | re.S) 1721 | else: 1722 | list_re = re.compile(r"(?:(?<=\n\n)|\A\n?)"+whole_list, 1723 | re.X | re.M | re.S) 1724 | match = list_re.search(text, pos) 1725 | if match: 1726 | hits.append((match.start(), match)) 1727 | if not hits: 1728 | break 1729 | hits.sort() 1730 | match = hits[0][1] 1731 | start, end = match.span() 1732 | middle = self._list_sub(match) 1733 | text = text[:start] + middle + text[end:] 1734 | pos = start + len(middle) # start pos for next attempted match 1735 | 1736 | return text 1737 | 1738 | _list_item_re = re.compile(r''' 1739 | (\n)? # leading line = \1 1740 | (^[ \t]*) # leading whitespace = \2 1741 | (?P<marker>%s) [ \t]+ # list marker = \3 1742 | ((?:.+?) # list item text = \4 1743 | (\n{1,2})) # eols = \5 1744 | (?= \n* (\Z | \2 (?P<next_marker>%s) [ \t]+)) 1745 | ''' % (_marker_any, _marker_any), 1746 | re.M | re.X | re.S) 1747 | 1748 | _task_list_item_re = re.compile(r''' 1749 | (\[[\ xX]\])[ \t]+ # tasklist marker = \1 1750 | (.*) # list item text = \2 1751 | ''', re.M | re.X | re.S) 1752 | 1753 | _task_list_warpper_str = r'<input type="checkbox" class="task-list-item-checkbox" %sdisabled> %s' 1754 | 1755 | def _task_list_item_sub(self, match): 1756 | marker = match.group(1) 1757 | item_text = match.group(2) 1758 | if marker in ['[x]','[X]']: 1759 | return self._task_list_warpper_str % ('checked ', item_text) 1760 | elif marker == '[ ]': 1761 | return self._task_list_warpper_str % ('', item_text) 1762 | 1763 | _last_li_endswith_two_eols = False 1764 | def _list_item_sub(self, match): 1765 | item = match.group(4) 1766 | leading_line = match.group(1) 1767 | if leading_line or "\n\n" in item or self._last_li_endswith_two_eols: 1768 | item = self._run_block_gamut(self._outdent(item)) 1769 | else: 1770 | # Recursion for sub-lists: 1771 | item = self._do_lists(self._outdent(item)) 1772 | if item.endswith('\n'): 1773 | item = item[:-1] 1774 | item = self._run_span_gamut(item) 1775 | self._last_li_endswith_two_eols = (len(match.group(5)) == 2) 1776 | 1777 | if "task_list" in self.extras: 1778 | item = self._task_list_item_re.sub(self._task_list_item_sub, item) 1779 | 1780 | return "<li>%s</li>\n" % item 1781 | 1782 | def _process_list_items(self, list_str): 1783 | # Process the contents of a single ordered or unordered list, 1784 | # splitting it into individual list items. 1785 | 1786 | # The $g_list_level global keeps track of when we're inside a list. 1787 | # Each time we enter a list, we increment it; when we leave a list, 1788 | # we decrement. If it's zero, we're not in a list anymore. 1789 | # 1790 | # We do this because when we're not inside a list, we want to treat 1791 | # something like this: 1792 | # 1793 | # I recommend upgrading to version 1794 | # 8. Oops, now this line is treated 1795 | # as a sub-list. 
1796 | # 1797 | # As a single paragraph, despite the fact that the second line starts 1798 | # with a digit-period-space sequence. 1799 | # 1800 | # Whereas when we're inside a list (or sub-list), that line will be 1801 | # treated as the start of a sub-list. What a kludge, huh? This is 1802 | # an aspect of Markdown's syntax that's hard to parse perfectly 1803 | # without resorting to mind-reading. Perhaps the solution is to 1804 | # change the syntax rules such that sub-lists must start with a 1805 | # starting cardinal number; e.g. "1." or "a.". 1806 | self.list_level += 1 1807 | self._last_li_endswith_two_eols = False 1808 | list_str = list_str.rstrip('\n') + '\n' 1809 | list_str = self._list_item_re.sub(self._list_item_sub, list_str) 1810 | self.list_level -= 1 1811 | return list_str 1812 | 1813 | def _get_pygments_lexer(self, lexer_name): 1814 | try: 1815 | from pygments import lexers, util 1816 | except ImportError: 1817 | return None 1818 | try: 1819 | return lexers.get_lexer_by_name(lexer_name) 1820 | except util.ClassNotFound: 1821 | return None 1822 | 1823 | def _color_with_pygments(self, codeblock, lexer, **formatter_opts): 1824 | import pygments 1825 | import pygments.formatters 1826 | 1827 | class HtmlCodeFormatter(pygments.formatters.HtmlFormatter): 1828 | def _wrap_code(self, inner): 1829 | """A function for use in a Pygments Formatter which 1830 | wraps in <code> tags. 1831 | """ 1832 | yield 0, "<code>" 1833 | for tup in inner: 1834 | yield tup 1835 | yield 0, "</code>" 1836 | 1837 | def wrap(self, source, outfile): 1838 | """Return the source with a code, pre, and div.""" 1839 | return self._wrap_div(self._wrap_pre(self._wrap_code(source))) 1840 | 1841 | formatter_opts.setdefault("cssclass", "codehilite") 1842 | formatter = HtmlCodeFormatter(**formatter_opts) 1843 | return pygments.highlight(codeblock, lexer, formatter) 1844 | 1845 | def _code_block_sub(self, match, is_fenced_code_block=False): 1846 | lexer_name = None 1847 | if is_fenced_code_block: 1848 | lexer_name = match.group(1) 1849 | if lexer_name: 1850 | formatter_opts = self.extras['fenced-code-blocks'] or {} 1851 | codeblock = match.group(2) 1852 | codeblock = codeblock[:-1] # drop one trailing newline 1853 | else: 1854 | codeblock = match.group(1) 1855 | codeblock = self._outdent(codeblock) 1856 | codeblock = self._detab(codeblock) 1857 | codeblock = codeblock.lstrip('\n') # trim leading newlines 1858 | codeblock = codeblock.rstrip() # trim trailing whitespace 1859 | 1860 | # Note: "code-color" extra is DEPRECATED. 1861 | if "code-color" in self.extras and codeblock.startswith(":::"): 1862 | lexer_name, rest = codeblock.split('\n', 1) 1863 | lexer_name = lexer_name[3:].strip() 1864 | codeblock = rest.lstrip("\n") # Remove lexer declaration line. 
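# (Sketch of the deprecated "code-color" convention handled just above: a
# code block whose first line is, e.g., ":::python" has that line peeled
# off, leaving lexer_name == "python" for the Pygments call below.)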
1865 | formatter_opts = self.extras['code-color'] or {}
1866 |
1867 | # Use pygments only if not using the highlightjs-lang extra
1868 | if lexer_name and "highlightjs-lang" not in self.extras:
1869 | def unhash_code(codeblock):
1870 | for key, sanitized in list(self.html_spans.items()):
1871 | codeblock = codeblock.replace(key, sanitized)
1872 | replacements = [
1873 | ("&amp;", "&"),
1874 | ("&lt;", "<"),
1875 | ("&gt;", ">")
1876 | ]
1877 | for old, new in replacements:
1878 | codeblock = codeblock.replace(old, new)
1879 | return codeblock
1880 | lexer = self._get_pygments_lexer(lexer_name)
1881 | if lexer:
1882 | codeblock = unhash_code( codeblock )
1883 | colored = self._color_with_pygments(codeblock, lexer,
1884 | **formatter_opts)
1885 | return "\n\n%s\n\n" % colored
1886 |
1887 | codeblock = self._encode_code(codeblock)
1888 | pre_class_str = self._html_class_str_from_tag("pre")
1889 |
1890 | if "highlightjs-lang" in self.extras and lexer_name:
1891 | code_class_str = ' class="%s language-%s"' % (lexer_name, lexer_name)
1892 | else:
1893 | code_class_str = self._html_class_str_from_tag("code")
1894 |
1895 | return "\n\n<pre%s><code%s>%s\n</code></pre>\n\n" % (
1896 | pre_class_str, code_class_str, codeblock)
1897 |
1898 | def _html_class_str_from_tag(self, tag):
1899 | """Get the appropriate ' class="..."' string (note the leading
1900 | space), if any, for the given tag.
1901 | """
1902 | if "html-classes" not in self.extras:
1903 | return ""
1904 | try:
1905 | html_classes_from_tag = self.extras["html-classes"]
1906 | except TypeError:
1907 | return ""
1908 | else:
1909 | if tag in html_classes_from_tag:
1910 | return ' class="%s"' % html_classes_from_tag[tag]
1911 | return ""
1912 |
1913 | def _do_code_blocks(self, text):
1914 | """Process Markdown `<pre><code>` blocks."""
1915 | code_block_re = re.compile(r'''
1916 | (?:\n\n|\A\n?)
1917 | ( # $1 = the code block -- one or more lines, starting with a space/tab
1918 | (?:
1919 | (?:[ ]{%d} | \t) # Lines must start with a tab or a tab-width of spaces
1920 | .*\n+
1921 | )+
1922 | )
1923 | ((?=^[ ]{0,%d}\S)|\Z) # Lookahead for non-space at line-start, or end of doc
1924 | # Lookahead to make sure this block isn't already in a code block.
1925 | # Needed when syntax highlighting is being used.
1926 | (?![^<]*\</code\>)
1927 | ''' % (self.tab_width, self.tab_width),
1928 | re.M | re.X)
1929 | return code_block_re.sub(self._code_block_sub, text)
1930 |
1931 | _fenced_code_block_re = re.compile(r'''
1932 | (?:\n+|\A\n?)
1933 | ^```\s*?([\w+-]+)?\s*?\n # opening fence, $1 = optional lang
1934 | (.*?) # $2 = code block content
1935 | ^```[ \t]*\n # closing fence
1936 | ''', re.M | re.X | re.S)
1937 |
1938 | def _fenced_code_block_sub(self, match):
1939 | return self._code_block_sub(match, is_fenced_code_block=True)
1940 |
1941 | def _do_fenced_code_blocks(self, text):
1942 | """Process ```-fenced unindented code blocks ('fenced-code-blocks' extra)."""
1943 | return self._fenced_code_block_re.sub(self._fenced_code_block_sub, text)
1944 |
1945 | # Rules for a code span:
1946 | # - backslash escapes are not interpreted in a code span
1947 | # - to include a backtick or a run of backticks, the delimiters must
1948 | # be a longer run of backticks
1949 | # - cannot start or end a code span with a backtick; pad with a
1950 | # space and that space will be removed in the emitted HTML
1951 | # See `test/tm-cases/escapes.text` for a number of edge-case
1952 | # examples.
1953 |     _code_span_re = re.compile(r'''
1954 |             (?<!\\)
1955 |             (`+)        # \1 = Opening run of `
1956 |             (?!`)       # See Note A test/tm-cases/escapes.text
1957 |             (.+?)       # \2 = The code block
1958 |             (?<!`)
1959 |             \1          # Matching closer
1960 |             (?!`)
1961 |         ''', re.X | re.S)
1962 | 
1963 |     def _code_span_sub(self, match):
1964 |         c = match.group(2).strip(" \t")
1965 |         c = self._encode_code(c)
1966 |         return "<code>%s</code>" % c
1967 | 
1968 |     def _do_code_spans(self, text):
1969 |         #   * Backtick quotes are used for <code></code> spans.
1970 |         #
1971 |         #   * You can use multiple backticks as the delimiters if you want to
1972 |         #     include literal backticks in the code span. So, this input:
1973 |         #
1974 |         #         Just type ``foo `bar` baz`` at the prompt.
1975 |         #
1976 |         #     Will translate to:
1977 |         #
1978 |         #         <p>Just type <code>foo `bar` baz</code> at the prompt.</p>
1979 |         #
1980 |         #     There's no arbitrary limit to the number of backticks you
1981 |         #     can use as delimiters. If you need three consecutive backticks
1982 |         #     in your code, use four for delimiters, etc.
1983 |         #
1984 |         #   * You can use spaces to get literal backticks at the edges:
1985 |         #
1986 |         #         ... type `` `bar` `` ...
1987 |         #
1988 |         #     Turns to:
1989 |         #
1990 |         #         ... type <code>`bar`</code> ...
1991 |         return self._code_span_re.sub(self._code_span_sub, text)
1992 | 
1993 |     def _encode_code(self, text):
1994 |         """Encode/escape certain characters inside Markdown code runs.
1995 |         The point is that in code, these characters are literals,
1996 |         and lose their special Markdown meanings.
1997 |         """
1998 |         replacements = [
1999 |             # Encode all ampersands; HTML entities are not
2000 |             # entities within a Markdown code span.
2001 |             ('&', '&amp;'),
2002 |             # Do the angle bracket song and dance:
2003 |             ('<', '&lt;'),
2004 |             ('>', '&gt;'),
2005 |         ]
2006 |         for before, after in replacements:
2007 |             text = text.replace(before, after)
2008 |         hashed = _hash_text(text)
2009 |         self._escape_table[text] = hashed
2010 |         return hashed
2011 | 
2012 |     _strike_re = re.compile(r"~~(?=\S)(.+?)(?<=\S)~~", re.S)
2013 |     def _do_strike(self, text):
2014 |         text = self._strike_re.sub(r"<strike>\1</strike>", text)
2015 |         return text
2016 | 
2017 |     _underline_re = re.compile(r"--(?=\S)(.+?)(?<=\S)--", re.S)
2018 |     def _do_underline(self, text):
2019 |         text = self._underline_re.sub(r"<u>\1</u>", text)
2020 |         return text
2021 | 
2022 |     _strong_re = re.compile(r"(\*\*|__)(?=\S)(.+?[*_]*)(?<=\S)\1", re.S)
2023 |     _em_re = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1", re.S)
2024 |     _code_friendly_strong_re = re.compile(r"\*\*(?=\S)(.+?[*_]*)(?<=\S)\*\*", re.S)
2025 |     _code_friendly_em_re = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*", re.S)
2026 |     def _do_italics_and_bold(self, text):
2027 |         # <strong> must go first:
2028 |         if "code-friendly" in self.extras:
2029 |             text = self._code_friendly_strong_re.sub(r"<strong>\1</strong>", text)
2030 |             text = self._code_friendly_em_re.sub(r"<em>\1</em>", text)
2031 |         else:
2032 |             text = self._strong_re.sub(r"<strong>\2</strong>", text)
2033 |             text = self._em_re.sub(r"<em>\2</em>", text)
2034 |         return text
2035 | 
2036 |     # "smarty-pants" extra: Very liberal in interpreting a single prime as an
2037 |     # apostrophe; e.g. ignores the fact that "round", "bout", "twer", and
2038 |     # "twixt" can be written without an initial apostrophe. This is fine because
2039 |     # using scare quotes (single quotation marks) is rare.
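    # For example (a sketch of the intended behavior, not verified output):
    #     "'Twas the night"  ->  "&#8217;Twas the night"
    #     "back in '99"      ->  "back in &#8217;99"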
2040 |     _apostrophe_year_re = re.compile(r"'(\d\d)(?=(\s|,|;|\.|\?|!|$))")
2041 |     _contractions = ["tis", "twas", "twer", "neath", "o", "n",
2042 |         "round", "bout", "twixt", "nuff", "fraid", "sup"]
2043 |     def _do_smart_contractions(self, text):
2044 |         text = self._apostrophe_year_re.sub(r"&#8217;\1", text)
2045 |         for c in self._contractions:
2046 |             text = text.replace("'%s" % c, "&#8217;%s" % c)
2047 |             text = text.replace("'%s" % c.capitalize(),
2048 |                 "&#8217;%s" % c.capitalize())
2049 |         return text
2050 | 
2051 |     # Substitute double-quotes before single-quotes.
2052 |     _opening_single_quote_re = re.compile(r"(?<!\S)'(?=\S)")
2053 |     _opening_double_quote_re = re.compile(r'(?<!\S)"(?=\S)')
2054 |     _closing_single_quote_re = re.compile(r"(?<=\S)'")
2055 |     _closing_double_quote_re = re.compile(r'(?<=\S)"(?=(\s|,|;|\.|\?|!|$))')
2056 |     def _do_smart_punctuation(self, text):
2057 |         """Fancifies 'single quotes', "double quotes", and apostrophes.
2058 |         Converts --, ---, and ... into en dashes, em dashes, and ellipses.
2059 | 
2060 |         Inspiration is: <http://daringfireball.net/projects/smartypants/>
2061 |         See "test/tm-cases/smarty_pants.text" for a full discussion of the
2062 |         support here and
2063 |         <http://code.google.com/p/python-markdown2/issues/detail?id=42> for a
2064 |         discussion of some diversion from the original SmartyPants.
2065 |         """
2066 |         if "'" in text:  # guard for perf
2067 |             text = self._do_smart_contractions(text)
2068 |             text = self._opening_single_quote_re.sub("&#8216;", text)
2069 |             text = self._closing_single_quote_re.sub("&#8217;", text)
2070 | 
2071 |         if '"' in text:  # guard for perf
2072 |             text = self._opening_double_quote_re.sub("&#8220;", text)
2073 |             text = self._closing_double_quote_re.sub("&#8221;", text)
2074 | 
2075 |         text = text.replace("---", "&#8212;")
2076 |         text = text.replace("--", "&#8211;")
2077 |         text = text.replace("...", "&#8230;")
2078 |         text = text.replace(" . . . ", "&#8230;")
2079 |         text = text.replace(". . .", "&#8230;")
2080 | 
2081 |         # TODO: Temporary hack to fix https://github.com/trentm/python-markdown2/issues/150
2082 |         if "footnotes" in self.extras and "footnote-ref" in text:
2083 |             # Quotes in the footnote back ref get converted to "smart" quotes
2084 |             # Change them back here to ensure they work.
2085 |             text = text.replace('class="footnote-ref&#8221;', 'class="footnote-ref"')
2086 | 
2087 |         return text
2088 | 
2089 |     _block_quote_base = r'''
2090 |         (                           # Wrap whole match in \1
2091 |             (
2092 |                 ^[ \t]*>%s[ \t]?
# '>' at the start of a line 2093 | .+\n # rest of the first line 2094 | (.+\n)* # subsequent consecutive lines 2095 | )+ 2096 | ) 2097 | ''' 2098 | _block_quote_re = re.compile(_block_quote_base % '', re.M | re.X) 2099 | _block_quote_re_spoiler = re.compile(_block_quote_base % '[ \t]*?!?', re.M | re.X) 2100 | _bq_one_level_re = re.compile('^[ \t]*>[ \t]?', re.M) 2101 | _bq_one_level_re_spoiler = re.compile('^[ \t]*>[ \t]*?![ \t]?', re.M) 2102 | _bq_all_lines_spoilers = re.compile(r'\A(?:^[ \t]*>[ \t]*?!.*[\n\r]*)+\Z', re.M) 2103 | _html_pre_block_re = re.compile(r'(\s*<pre>.+?</pre>)', re.S) 2104 | def _dedent_two_spaces_sub(self, match): 2105 | return re.sub(r'(?m)^ ', '', match.group(1)) 2106 | 2107 | def _block_quote_sub(self, match): 2108 | bq = match.group(1) 2109 | is_spoiler = 'spoiler' in self.extras and self._bq_all_lines_spoilers.match(bq) 2110 | # trim one level of quoting 2111 | if is_spoiler: 2112 | bq = self._bq_one_level_re_spoiler.sub('', bq) 2113 | else: 2114 | bq = self._bq_one_level_re.sub('', bq) 2115 | # trim whitespace-only lines 2116 | bq = self._ws_only_line_re.sub('', bq) 2117 | bq = self._run_block_gamut(bq) # recurse 2118 | 2119 | bq = re.sub('(?m)^', ' ', bq) 2120 | # These leading spaces screw with <pre> content, so we need to fix that: 2121 | bq = self._html_pre_block_re.sub(self._dedent_two_spaces_sub, bq) 2122 | 2123 | if is_spoiler: 2124 | return '<blockquote class="spoiler">\n%s\n</blockquote>\n\n' % bq 2125 | else: 2126 | return '<blockquote>\n%s\n</blockquote>\n\n' % bq 2127 | 2128 | def _do_block_quotes(self, text): 2129 | if '>' not in text: 2130 | return text 2131 | if 'spoiler' in self.extras: 2132 | return self._block_quote_re_spoiler.sub(self._block_quote_sub, text) 2133 | else: 2134 | return self._block_quote_re.sub(self._block_quote_sub, text) 2135 | 2136 | def _form_paragraphs(self, text): 2137 | # Strip leading and trailing lines: 2138 | text = text.strip('\n') 2139 | 2140 | # Wrap <p> tags. 2141 | grafs = [] 2142 | for i, graf in enumerate(re.split(r"\n{2,}", text)): 2143 | if graf in self.html_blocks: 2144 | # Unhashify HTML blocks 2145 | grafs.append(self.html_blocks[graf]) 2146 | else: 2147 | cuddled_list = None 2148 | if "cuddled-lists" in self.extras: 2149 | # Need to put back trailing '\n' for `_list_item_re` 2150 | # match at the end of the paragraph. 2151 | li = self._list_item_re.search(graf + '\n') 2152 | # Two of the same list marker in this paragraph: a likely 2153 | # candidate for a list cuddled to preceding paragraph 2154 | # text (issue 33). Note the `[-1]` is a quick way to 2155 | # consider numeric bullets (e.g. "1." and "2.") to be 2156 | # equal. 2157 | if (li and len(li.group(2)) <= 3 2158 | and ( 2159 | (li.group("next_marker") and li.group("marker")[-1] == li.group("next_marker")[-1]) 2160 | or 2161 | li.group("next_marker") is None 2162 | ) 2163 | ): 2164 | start = li.start() 2165 | cuddled_list = self._do_lists(graf[start:]).rstrip("\n") 2166 | assert cuddled_list.startswith("<ul>") or cuddled_list.startswith("<ol>") 2167 | graf = graf[:start] 2168 | 2169 | # Wrap <p> tags. 
2170 |                 graf = self._run_span_gamut(graf)
2171 |                 grafs.append("<p%s>" % self._html_class_str_from_tag('p') + graf.lstrip(" \t") + "</p>")
2172 | 
2173 |                 if cuddled_list:
2174 |                     grafs.append(cuddled_list)
2175 | 
2176 |         return "\n\n".join(grafs)
2177 | 
2178 |     def _add_footnotes(self, text):
2179 |         if self.footnotes:
2180 |             footer = [
2181 |                 '<div class="footnotes">',
2182 |                 '<hr' + self.empty_element_suffix,
2183 |                 '<ol>',
2184 |             ]
2185 | 
2186 |             if not self.footnote_title:
2187 |                 self.footnote_title = "Jump back to footnote %d in the text."
2188 |             if not self.footnote_return_symbol:
2189 |                 self.footnote_return_symbol = "&#8617;"
2190 | 
2191 |             for i, id in enumerate(self.footnote_ids):
2192 |                 if i != 0:
2193 |                     footer.append('')
2194 |                 footer.append('<li id="fn-%s">' % id)
2195 |                 footer.append(self._run_block_gamut(self.footnotes[id]))
2196 |                 try:
2197 |                     backlink = ('<a href="#fnref-%s" ' +
2198 |                                 'class="footnoteBackLink" ' +
2199 |                                 'title="' + self.footnote_title + '">' +
2200 |                                 self.footnote_return_symbol +
2201 |                                 '</a>') % (id, i+1)
2202 |                 except TypeError:
2203 |                     log.debug("Footnote error. `footnote_title` "
2204 |                               "must include parameter. Using defaults.")
2205 |                     backlink = ('<a href="#fnref-%s" '
2206 |                                 'class="footnoteBackLink" '
2207 |                                 'title="Jump back to footnote %d in the text.">'
2208 |                                 '&#8617;</a>' % (id, i+1))
2209 | 
2210 |                 if footer[-1].endswith("</p>"):
2211 |                     footer[-1] = footer[-1][:-len("</p>")] \
2212 |                         + '&#160;' + backlink + "</p>"
2213 |                 else:
2214 |                     footer.append("\n<p>%s</p>" % backlink)
2215 |                 footer.append('</li>')
2216 |             footer.append('</ol>')
2217 |             footer.append('</div>')
2218 |             return text + '\n\n' + '\n'.join(footer)
2219 |         else:
2220 |             return text
2221 | 
2222 |     _naked_lt_re = re.compile(r'<(?![a-z/?\$!])', re.I)
2223 |     _naked_gt_re = re.compile(r'''(?<![a-z0-9?!/'"-])>''', re.I)
2224 | 
2225 |     def _encode_amps_and_angles(self, text):
2226 |         # Smart processing for ampersands and angle brackets that need
2227 |         # to be encoded.
2228 |         text = _AMPERSAND_RE.sub('&amp;', text)
2229 | 
2230 |         # Encode naked <'s
2231 |         text = self._naked_lt_re.sub('&lt;', text)
2232 | 
2233 |         # Encode naked >'s
2234 |         # Note: Other markdown implementations (e.g. Markdown.pl, PHP
2235 |         # Markdown) don't do this.
2236 |         text = self._naked_gt_re.sub('&gt;', text)
2237 |         return text
2238 | 
2239 |     _incomplete_tags_re = re.compile(r"<(/?\w+?(?!\w).+?[\s/]+?)")
2240 | 
2241 |     def _encode_incomplete_tags(self, text):
2242 |         if self.safe_mode not in ("replace", "escape"):
2243 |             return text
2244 | 
2245 |         if text.endswith(">"):
2246 |             return text  # this is not an incomplete tag, this is a link in the form <http://x.y.z>
2247 | 
2248 |         return self._incomplete_tags_re.sub("&lt;\\1", text)
2249 | 
2250 |     def _encode_backslash_escapes(self, text):
2251 |         for ch, escape in list(self._escape_table.items()):
2252 |             text = text.replace("\\"+ch, escape)
2253 |         return text
2254 | 
2255 |     _auto_link_re = re.compile(r'<((https?|ftp):[^\'">\s]+)>', re.I)
2256 |     def _auto_link_sub(self, match):
2257 |         g1 = match.group(1)
2258 |         return '<a href="%s">%s</a>' % (g1, g1)
2259 | 
2260 |     _auto_email_link_re = re.compile(r"""
2261 |         <
2262 |         (?:mailto:)?
2263 |         (
2264 |             [-.\w]+
2265 |             \@
2266 |             [-\w]+(\.[-\w]+)*\.[a-z]+
2267 |         )
2268 |         >
2269 |         """, re.I | re.X | re.U)
2270 |     def _auto_email_link_sub(self, match):
2271 |         return self._encode_email_address(
2272 |             self._unescape_special_chars(match.group(1)))
2273 | 
2274 |     def _do_auto_links(self, text):
2275 |         text = self._auto_link_re.sub(self._auto_link_sub, text)
2276 |         text = self._auto_email_link_re.sub(self._auto_email_link_sub, text)
2277 |         return text
2278 | 
2279 |     def _encode_email_address(self, addr):
2280 |         #  Input: an email address, e.g. "foo@example.com"
2281 |         #
2282 |         #  Output: the email address as a mailto link, with each character
2283 |         #      of the address encoded as either a decimal or hex entity, in
2284 |         #      the hopes of foiling most address harvesting spam bots. E.g.:
2285 |         #
2286 |         #    <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#102;&#111;&#111;&#64;e
2287 |         #       xample.com">&#102;&#111;&#111;
2288 |         #       &#64;example.com</a>
2289 |         #
2290 |         #  Based on a filter by Matthew Wickline, posted to the BBEdit-Talk
2291 |         #  mailing list: <http://tinyurl.com/yu7ue>
2292 |         chars = [_xml_encode_email_char_at_random(ch)
2293 |                  for ch in "mailto:" + addr]
2294 |         # Strip the mailto: from the visible part.
2295 |         addr = '<a href="%s">%s</a>' \
2296 |                % (''.join(chars), ''.join(chars[7:]))
2297 |         return addr
2298 | 
2299 |     def _do_link_patterns(self, text):
2300 |         link_from_hash = {}
2301 |         for regex, repl in self.link_patterns:
2302 |             replacements = []
2303 |             for match in regex.finditer(text):
2304 |                 if hasattr(repl, "__call__"):
2305 |                     href = repl(match)
2306 |                 else:
2307 |                     href = match.expand(repl)
2308 |                 replacements.append((match.span(), href))
2309 |             for (start, end), href in reversed(replacements):
2310 | 
2311 |                 # Do not match against links inside brackets.
2312 |                 if text[start - 1:start] == '[' and text[end:end + 1] == ']':
2313 |                     continue
2314 | 
2315 |                 # Do not match against links in the standard markdown syntax.
2316 |                 if text[start - 2:start] == '](' or text[end:end + 2] == '")':
2317 |                     continue
2318 | 
2319 |                 # Do not match against links which are escaped.
2320 |                 if text[start - 3:start] == '"""' and text[end:end + 3] == '"""':
2321 |                     text = text[:start - 3] + text[start:end] + text[end + 3:]
2322 |                     continue
2323 | 
2324 |                 escaped_href = (
2325 |                     href.replace('"', '&quot;')  # b/c of attr quote
2326 |                         # To avoid markdown <em> and <strong>:
2327 |                         .replace('*', self._escape_table['*'])
2328 |                         .replace('_', self._escape_table['_']))
2329 |                 link = '<a href="%s">%s</a>' % (escaped_href, text[start:end])
2330 |                 hash = _hash_text(link)
2331 |                 link_from_hash[hash] = link
2332 |                 text = text[:start] + hash + text[end:]
2333 |         for hash, link in list(link_from_hash.items()):
2334 |             text = text.replace(hash, link)
2335 |         return text
2336 | 
2337 |     def _unescape_special_chars(self, text):
2338 |         # Swap back in all the special characters we've hidden.
2339 | for ch, hash in list(self._escape_table.items()): 2340 | text = text.replace(hash, ch) 2341 | return text 2342 | 2343 | def _outdent(self, text): 2344 | # Remove one level of line-leading tabs or spaces 2345 | return self._outdent_re.sub('', text) 2346 | 2347 | 2348 | class MarkdownWithExtras(Markdown): 2349 | """A markdowner class that enables most extras: 2350 | 2351 | - footnotes 2352 | - code-color (only has effect if 'pygments' Python module on path) 2353 | 2354 | These are not included: 2355 | - pyshell (specific to Python-related documenting) 2356 | - code-friendly (because it *disables* part of the syntax) 2357 | - link-patterns (because you need to specify some actual 2358 | link-patterns anyway) 2359 | """ 2360 | extras = ["footnotes", "code-color"] 2361 | 2362 | 2363 | # ---- internal support functions 2364 | 2365 | 2366 | def calculate_toc_html(toc): 2367 | """Return the HTML for the current TOC. 2368 | 2369 | This expects the `_toc` attribute to have been set on this instance. 2370 | """ 2371 | if toc is None: 2372 | return None 2373 | 2374 | def indent(): 2375 | return ' ' * (len(h_stack) - 1) 2376 | lines = [] 2377 | h_stack = [0] # stack of header-level numbers 2378 | for level, id, name in toc: 2379 | if level > h_stack[-1]: 2380 | lines.append("%s<ul>" % indent()) 2381 | h_stack.append(level) 2382 | elif level == h_stack[-1]: 2383 | lines[-1] += "</li>" 2384 | else: 2385 | while level < h_stack[-1]: 2386 | h_stack.pop() 2387 | if not lines[-1].endswith("</li>"): 2388 | lines[-1] += "</li>" 2389 | lines.append("%s</ul></li>" % indent()) 2390 | lines.append('%s<li><a href="#%s">%s</a>' % ( 2391 | indent(), id, name)) 2392 | while len(h_stack) > 1: 2393 | h_stack.pop() 2394 | if not lines[-1].endswith("</li>"): 2395 | lines[-1] += "</li>" 2396 | lines.append("%s</ul>" % indent()) 2397 | return '\n'.join(lines) + '\n' 2398 | 2399 | 2400 | class UnicodeWithAttrs(unicode): 2401 | """A subclass of unicode used for the return value of conversion to 2402 | possibly attach some attributes. E.g. the "toc_html" attribute when 2403 | the "toc" extra is used. 2404 | """ 2405 | metadata = None 2406 | toc_html = None 2407 | 2408 | ## {{{ http://code.activestate.com/recipes/577257/ (r1) 2409 | _slugify_strip_re = re.compile(r'[^\w\s-]') 2410 | _slugify_hyphenate_re = re.compile(r'[-\s]+') 2411 | def _slugify(value): 2412 | """ 2413 | Normalizes string, converts to lowercase, removes non-alpha characters, 2414 | and converts spaces to hyphens. 2415 | 2416 | From Django's "django/template/defaultfilters.py". 
2417 | """ 2418 | import unicodedata 2419 | value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode() 2420 | value = _slugify_strip_re.sub('', value).strip().lower() 2421 | return _slugify_hyphenate_re.sub('-', value) 2422 | ## end of http://code.activestate.com/recipes/577257/ }}} 2423 | 2424 | 2425 | # From http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52549 2426 | def _curry(*args, **kwargs): 2427 | function, args = args[0], args[1:] 2428 | def result(*rest, **kwrest): 2429 | combined = kwargs.copy() 2430 | combined.update(kwrest) 2431 | return function(*args + rest, **combined) 2432 | return result 2433 | 2434 | 2435 | # Recipe: regex_from_encoded_pattern (1.0) 2436 | def _regex_from_encoded_pattern(s): 2437 | """'foo' -> re.compile(re.escape('foo')) 2438 | '/foo/' -> re.compile('foo') 2439 | '/foo/i' -> re.compile('foo', re.I) 2440 | """ 2441 | if s.startswith('/') and s.rfind('/') != 0: 2442 | # Parse it: /PATTERN/FLAGS 2443 | idx = s.rfind('/') 2444 | _, flags_str = s[1:idx], s[idx+1:] 2445 | flag_from_char = { 2446 | "i": re.IGNORECASE, 2447 | "l": re.LOCALE, 2448 | "s": re.DOTALL, 2449 | "m": re.MULTILINE, 2450 | "u": re.UNICODE, 2451 | } 2452 | flags = 0 2453 | for char in flags_str: 2454 | try: 2455 | flags |= flag_from_char[char] 2456 | except KeyError: 2457 | raise ValueError("unsupported regex flag: '%s' in '%s' " 2458 | "(must be one of '%s')" 2459 | % (char, s, ''.join(list(flag_from_char.keys())))) 2460 | return re.compile(s[1:idx], flags) 2461 | else: # not an encoded regex 2462 | return re.compile(re.escape(s)) 2463 | 2464 | 2465 | # Recipe: dedent (0.1.2) 2466 | def _dedentlines(lines, tabsize=8, skip_first_line=False): 2467 | """_dedentlines(lines, tabsize=8, skip_first_line=False) -> dedented lines 2468 | 2469 | "lines" is a list of lines to dedent. 2470 | "tabsize" is the tab width to use for indent width calculations. 2471 | "skip_first_line" is a boolean indicating if the first line should 2472 | be skipped for calculating the indent width and for dedenting. 2473 | This is sometimes useful for docstrings and similar. 2474 | 2475 | Same as dedent() except operates on a sequence of lines. Note: the 2476 | lines list is modified **in-place**. 
2477 | """ 2478 | DEBUG = False 2479 | if DEBUG: 2480 | print("dedent: dedent(..., tabsize=%d, skip_first_line=%r)"\ 2481 | % (tabsize, skip_first_line)) 2482 | margin = None 2483 | for i, line in enumerate(lines): 2484 | if i == 0 and skip_first_line: continue 2485 | indent = 0 2486 | for ch in line: 2487 | if ch == ' ': 2488 | indent += 1 2489 | elif ch == '\t': 2490 | indent += tabsize - (indent % tabsize) 2491 | elif ch in '\r\n': 2492 | continue # skip all-whitespace lines 2493 | else: 2494 | break 2495 | else: 2496 | continue # skip all-whitespace lines 2497 | if DEBUG: print("dedent: indent=%d: %r" % (indent, line)) 2498 | if margin is None: 2499 | margin = indent 2500 | else: 2501 | margin = min(margin, indent) 2502 | if DEBUG: print("dedent: margin=%r" % margin) 2503 | 2504 | if margin is not None and margin > 0: 2505 | for i, line in enumerate(lines): 2506 | if i == 0 and skip_first_line: continue 2507 | removed = 0 2508 | for j, ch in enumerate(line): 2509 | if ch == ' ': 2510 | removed += 1 2511 | elif ch == '\t': 2512 | removed += tabsize - (removed % tabsize) 2513 | elif ch in '\r\n': 2514 | if DEBUG: print("dedent: %r: EOL -> strip up to EOL" % line) 2515 | lines[i] = lines[i][j:] 2516 | break 2517 | else: 2518 | raise ValueError("unexpected non-whitespace char %r in " 2519 | "line %r while removing %d-space margin" 2520 | % (ch, line, margin)) 2521 | if DEBUG: 2522 | print("dedent: %r: %r -> removed %d/%d"\ 2523 | % (line, ch, removed, margin)) 2524 | if removed == margin: 2525 | lines[i] = lines[i][j+1:] 2526 | break 2527 | elif removed > margin: 2528 | lines[i] = ' '*(removed-margin) + lines[i][j+1:] 2529 | break 2530 | else: 2531 | if removed: 2532 | lines[i] = lines[i][removed:] 2533 | return lines 2534 | 2535 | 2536 | def _dedent(text, tabsize=8, skip_first_line=False): 2537 | """_dedent(text, tabsize=8, skip_first_line=False) -> dedented text 2538 | 2539 | "text" is the text to dedent. 2540 | "tabsize" is the tab width to use for indent width calculations. 2541 | "skip_first_line" is a boolean indicating if the first line should 2542 | be skipped for calculating the indent width and for dedenting. 2543 | This is sometimes useful for docstrings and similar. 2544 | 2545 | textwrap.dedent(s), but don't expand tabs to spaces 2546 | """ 2547 | lines = text.splitlines(1) 2548 | _dedentlines(lines, tabsize=tabsize, skip_first_line=skip_first_line) 2549 | return ''.join(lines) 2550 | 2551 | 2552 | class _memoized(object): 2553 | """Decorator that caches a function's return value each time it is called. 2554 | If called later with the same arguments, the cached value is returned, and 2555 | not re-evaluated. 2556 | 2557 | http://wiki.python.org/moin/PythonDecoratorLibrary 2558 | """ 2559 | def __init__(self, func): 2560 | self.func = func 2561 | self.cache = {} 2562 | 2563 | def __call__(self, *args): 2564 | try: 2565 | return self.cache[args] 2566 | except KeyError: 2567 | self.cache[args] = value = self.func(*args) 2568 | return value 2569 | except TypeError: 2570 | # uncachable -- for instance, passing a list as an argument. 2571 | # Better to not cache than to blow up entirely. 2572 | return self.func(*args) 2573 | 2574 | def __repr__(self): 2575 | """Return the function's docstring.""" 2576 | return self.func.__doc__ 2577 | 2578 | 2579 | def _xml_oneliner_re_from_tab_width(tab_width): 2580 | """Standalone XML processing instruction regex.""" 2581 | return re.compile(r""" 2582 | (?: 2583 | (?<=\n\n) # Starting after a blank line 2584 | | # or 2585 | \A\n? 
# the beginning of the doc
2586 |         )
2587 |         (                           # save in $1
2588 |             [ ]{0,%d}
2589 |             (?:
2590 |                 <\?\w+\b\s+.*?\?>   # XML processing instruction
2591 |                 |
2592 |                 <\w+:\w+\b\s+.*?/>  # namespaced single tag
2593 |             )
2594 |             [ \t]*
2595 |             (?=\n{2,}|\Z)           # followed by a blank line or end of document
2596 |         )
2597 |         """ % (tab_width - 1), re.X)
2598 | _xml_oneliner_re_from_tab_width = _memoized(_xml_oneliner_re_from_tab_width)
2599 | 
2600 | 
2601 | def _hr_tag_re_from_tab_width(tab_width):
2602 |     return re.compile(r"""
2603 |         (?:
2604 |             (?<=\n\n)       # Starting after a blank line
2605 |             |               # or
2606 |             \A\n?           # the beginning of the doc
2607 |         )
2608 |         (                       # save in \1
2609 |             [ ]{0,%d}
2610 |             <(hr)               # start tag = \2
2611 |             \b                  # word break
2612 |             ([^<>])*?           #
2613 |             /?>                 # the matching end tag
2614 |             [ \t]*
2615 |             (?=\n{2,}|\Z)       # followed by a blank line or end of document
2616 |         )
2617 |         """ % (tab_width - 1), re.X)
2618 | _hr_tag_re_from_tab_width = _memoized(_hr_tag_re_from_tab_width)
2619 | 
2620 | 
2621 | def _xml_escape_attr(attr, skip_single_quote=True):
2622 |     """Escape the given string for use in an HTML/XML tag attribute.
2623 | 
2624 |     By default this doesn't bother with escaping `'` to `&#39;`, presuming that
2625 |     the tag attribute is surrounded by double quotes.
2626 |     """
2627 |     escaped = _AMPERSAND_RE.sub('&amp;', attr)
2628 | 
2629 |     escaped = (escaped
2630 |                .replace('"', '&quot;')
2631 |                .replace('<', '&lt;')
2632 |                .replace('>', '&gt;'))
2633 |     if not skip_single_quote:
2634 |         escaped = escaped.replace("'", "&#39;")
2635 |     return escaped
2636 | 
2637 | 
2638 | def _xml_encode_email_char_at_random(ch):
2639 |     r = random()
2640 |     # Roughly 10% raw, 45% hex, 45% dec.
2641 |     # '@' *must* be encoded. I [John Gruber] insist.
2642 |     # Issue 26: '_' must be encoded.
2643 |     if r > 0.9 and ch not in "@_":
2644 |         return ch
2645 |     elif r < 0.45:
2646 |         # The [1:] is to drop leading '0': 0x63 -> x63
2647 |         return '&#%s;' % hex(ord(ch))[1:]
2648 |     else:
2649 |         return '&#%s;' % ord(ch)
2650 | 
2651 | 
2652 | def _html_escape_url(attr, safe_mode=False):
2653 |     """Replace special characters that are potentially malicious in url string."""
2654 |     escaped = (attr
2655 |                .replace('"', '&quot;')
2656 |                .replace('<', '&lt;')
2657 |                .replace('>', '&gt;'))
2658 |     if safe_mode:
2659 |         escaped = escaped.replace('+', ' ')
2660 |         escaped = escaped.replace("'", "&#39;")
2661 |     return escaped
2662 | 
2663 | 
2664 | # ---- mainline
2665 | 
2666 | class _NoReflowFormatter(optparse.IndentedHelpFormatter):
2667 |     """An optparse formatter that does NOT reflow the description."""
2668 |     def format_description(self, description):
2669 |         return description or ""
2670 | 
2671 | 
2672 | def _test():
2673 |     import doctest
2674 |     doctest.testmod()
2675 | 
2676 | 
2677 | def main(argv=None):
2678 |     if argv is None:
2679 |         argv = sys.argv
2680 |     if not logging.root.handlers:
2681 |         logging.basicConfig()
2682 | 
2683 |     usage = "usage: %prog [PATHS...]"
2684 |     version = "%prog "+__version__
2685 |     parser = optparse.OptionParser(prog="markdown2", usage=usage,
2686 |         version=version, description=cmdln_desc,
2687 |         formatter=_NoReflowFormatter())
2688 |     parser.add_option("-v", "--verbose", dest="log_level",
2689 |                       action="store_const", const=logging.DEBUG,
2690 |                       help="more verbose output")
2691 |     parser.add_option("--encoding",
2692 |                       help="specify encoding of text content")
2693 |     parser.add_option("--html4tags", action="store_true", default=False,
2694 |                       help="use HTML 4 style for empty element tags")
2695 |     parser.add_option("-s", "--safe", metavar="MODE", dest="safe_mode",
2696 |                       help="sanitize 
literal HTML: 'escape' escapes " 2697 | "HTML meta chars, 'replace' replaces with an " 2698 | "[HTML_REMOVED] note") 2699 | parser.add_option("-x", "--extras", action="append", 2700 | help="Turn on specific extra features (not part of " 2701 | "the core Markdown spec). See above.") 2702 | parser.add_option("--use-file-vars", 2703 | help="Look for and use Emacs-style 'markdown-extras' " 2704 | "file var to turn on extras. See " 2705 | "<https://github.com/trentm/python-markdown2/wiki/Extras>") 2706 | parser.add_option("--link-patterns-file", 2707 | help="path to a link pattern file") 2708 | parser.add_option("--self-test", action="store_true", 2709 | help="run internal self-tests (some doctests)") 2710 | parser.add_option("--compare", action="store_true", 2711 | help="run against Markdown.pl as well (for testing)") 2712 | parser.set_defaults(log_level=logging.INFO, compare=False, 2713 | encoding="utf-8", safe_mode=None, use_file_vars=False) 2714 | opts, paths = parser.parse_args() 2715 | log.setLevel(opts.log_level) 2716 | 2717 | if opts.self_test: 2718 | return _test() 2719 | 2720 | if opts.extras: 2721 | extras = {} 2722 | for s in opts.extras: 2723 | splitter = re.compile("[,;: ]+") 2724 | for e in splitter.split(s): 2725 | if '=' in e: 2726 | ename, earg = e.split('=', 1) 2727 | try: 2728 | earg = int(earg) 2729 | except ValueError: 2730 | pass 2731 | else: 2732 | ename, earg = e, None 2733 | extras[ename] = earg 2734 | else: 2735 | extras = None 2736 | 2737 | if opts.link_patterns_file: 2738 | link_patterns = [] 2739 | f = open(opts.link_patterns_file) 2740 | try: 2741 | for i, line in enumerate(f.readlines()): 2742 | if not line.strip(): continue 2743 | if line.lstrip().startswith("#"): continue 2744 | try: 2745 | pat, href = line.rstrip().rsplit(None, 1) 2746 | except ValueError: 2747 | raise MarkdownError("%s:%d: invalid link pattern line: %r" 2748 | % (opts.link_patterns_file, i+1, line)) 2749 | link_patterns.append( 2750 | (_regex_from_encoded_pattern(pat), href)) 2751 | finally: 2752 | f.close() 2753 | else: 2754 | link_patterns = None 2755 | 2756 | from os.path import join, dirname, abspath, exists 2757 | markdown_pl = join(dirname(dirname(abspath(__file__))), "test", 2758 | "Markdown.pl") 2759 | if not paths: 2760 | paths = ['-'] 2761 | for path in paths: 2762 | if path == '-': 2763 | text = sys.stdin.read() 2764 | else: 2765 | fp = codecs.open(path, 'r', opts.encoding) 2766 | text = fp.read() 2767 | fp.close() 2768 | if opts.compare: 2769 | from subprocess import Popen, PIPE 2770 | print("==== Markdown.pl ====") 2771 | p = Popen('perl %s' % markdown_pl, shell=True, stdin=PIPE, stdout=PIPE, close_fds=True) 2772 | p.stdin.write(text.encode('utf-8')) 2773 | p.stdin.close() 2774 | perl_html = p.stdout.read().decode('utf-8') 2775 | if py3: 2776 | sys.stdout.write(perl_html) 2777 | else: 2778 | sys.stdout.write(perl_html.encode( 2779 | sys.stdout.encoding or "utf-8", 'xmlcharrefreplace')) 2780 | print("==== markdown2.py ====") 2781 | html = markdown(text, 2782 | html4tags=opts.html4tags, 2783 | safe_mode=opts.safe_mode, 2784 | extras=extras, link_patterns=link_patterns, 2785 | use_file_vars=opts.use_file_vars, 2786 | cli=True) 2787 | if py3: 2788 | sys.stdout.write(html) 2789 | else: 2790 | sys.stdout.write(html.encode( 2791 | sys.stdout.encoding or "utf-8", 'xmlcharrefreplace')) 2792 | if extras and "toc" in extras: 2793 | log.debug("toc_html: " + 2794 | str(html.toc_html.encode(sys.stdout.encoding or "utf-8", 'xmlcharrefreplace'))) 2795 | if opts.compare: 2796 | test_dir = 
join(dirname(dirname(abspath(__file__))), "test")
2797 |         if exists(join(test_dir, "test_markdown2.py")):
2798 |             sys.path.insert(0, test_dir)
2799 |             from test_markdown2 import norm_html_from_html
2800 |             norm_html = norm_html_from_html(html)
2801 |             norm_perl_html = norm_html_from_html(perl_html)
2802 |         else:
2803 |             norm_html = html
2804 |             norm_perl_html = perl_html
2805 |         print("==== match? %r ====" % (norm_perl_html == norm_html))
2806 | 
2807 | 
2808 | if __name__ == "__main__":
2809 |     sys.exit(main(sys.argv))
2810 | 
--------------------------------------------------------------------------------
/src/markdown2/markdown2Mathjax.py:
--------------------------------------------------------------------------------
1 | __version_info__ = (0,3,9)
2 | __version__ = '.'.join(map(str,__version_info__))
3 | __author__ = "Matthew Young"
4 | 
5 | import re
6 | from .markdown2 import markdown
7 | 
8 | def break_tie(inline,equation):
9 |     """If one of the delimiters is a substring of the other (e.g., $ and $$) it is possible that the two will begin at the same location. In this case we need some criteria to break the tie and decide which operation takes precedence. I've gone with giving the longer of the two delimiters priority (for example, $$ over $). This function returns 2 if the equation block takes precedence and 1 if the inline block does. The magic-looking return statement maps 0->2 and 1->1."""
10 |     tmp=(inline.end()-inline.start() > equation.end()-equation.start())
11 |     return (tmp*3+2)%4
12 | 
13 | def markdown_safe(placeholder):
14 |     """Is the placeholder changed by markdown? If it is, this isn't a valid placeholder."""
15 |     mdstrip=re.compile("<p>(.*)</p>\n")
16 |     md=markdown(placeholder)
17 |     mdp=mdstrip.match(md)
18 |     if mdp and mdp.group(1)==placeholder:
19 |         return True
20 |     return False
21 | 
22 | def mathdown(text):
23 |     """Convenience function which runs the basic markdown and mathjax processing sequentially."""
24 |     tmp=sanitizeInput(text)
25 |     return reconstructMath(markdown(tmp[0]),tmp[1])
26 | 
27 | def sanitizeInput(string,inline_delims=["$","$"],equation_delims=["$$","$$"],placeholder="™™™"):
28 |     """Given a string that will be passed to markdown, the content of the different math blocks is stripped out and replaced by a placeholder which MUST be ignored by markdown. A list is returned containing the text with placeholders and a list of the stripped out equations. Note that any pre-existing instances of the placeholder are "replaced" with themselves and a corresponding dummy entry is placed in the returned codeblock. The sanitized string can then be passed safely through markdown and then reconstructed with reconstructMath.
29 | 
30 |     There are potentially four delimiters that can be specified: the left and right delimiters for inline and for equation mode math. These can be anything that isn't already used by markdown and is compatible with mathjax (see the documentation for both).
31 |     """
32 |     #Check placeholder is valid.
33 |     if not markdown_safe(placeholder):
34 |         raise ValueError("Placeholder %s altered by markdown processing."
% placeholder)
35 |     #really what we want is a reverse markdown function, but as that's too much work, this will do
36 |     inline_left=re.compile("(?<!\\\\)"+re.escape(inline_delims[0]))
37 |     inline_right=re.compile("(?<!\\\\)"+re.escape(inline_delims[1]))
38 |     equation_left=re.compile("(?<!\\\\)"+re.escape(equation_delims[0]))
39 |     equation_right=re.compile("(?<!\\\\)"+re.escape(equation_delims[1]))
40 |     placeholder_re = re.compile("(?<!\\\\)"+re.escape(placeholder))
41 |     placeholder_scan = placeholder_re.scanner(string)
42 |     ilscanner=[inline_left.scanner(string),inline_right.scanner(string)]
43 |     eqscanner=[equation_left.scanner(string),equation_right.scanner(string)]
44 |     scanners=[placeholder_scan,ilscanner,eqscanner]
45 |     #There are 3 types of blocks: inline math, equation math and occurrences of the placeholder in the text
46 |     #inBlock is 0 for a placeholder, 1 for an inline block, 2 for an equation
47 |     inBlock=0
48 |     post=-1
49 |     stlen=len(string)
50 |     startmatches=[placeholder_scan.search(),ilscanner[0].search(),eqscanner[0].search()]
51 |     startpoints=[stlen,stlen,stlen]
52 |     startpoints[0]= startmatches[0].start() if startmatches[0] else stlen
53 |     startpoints[1]= startmatches[1].start() if startmatches[1] else stlen
54 |     startpoints[2]= startmatches[2].start() if startmatches[2] else stlen
55 |     terminator=-1
56 |     sanitizedString=''
57 |     codeblocks=[]
58 |     while 1:
59 |         #find the next point of interest.
60 |         while startmatches[0] and startmatches[0].start()<post:
61 |             startmatches[0]=placeholder_scan.search()
62 |             startpoints[0]= startmatches[0].start() if startmatches[0] else stlen
63 |         while startmatches[1] and startmatches[1].start()<post:
64 |             startmatches[1]=ilscanner[0].search()
65 |             startpoints[1]= startmatches[1].start() if startmatches[1] else stlen
66 |         while startmatches[2] and startmatches[2].start()<post:
67 |             startmatches[2]=eqscanner[0].search()
68 |             startpoints[2]= startmatches[2].start() if startmatches[2] else stlen
69 |         #Found start of next block of each type
70 |         #Placeholder type always takes precedence if it exists and is next...
71 |         if startmatches[0] and min(startpoints)==startpoints[0]:
72 |             #We can do it all in one!
73 |             #First add the "stripped" code to the blocks
74 |             codeblocks.append('0'+placeholder)
75 |             #Work out where the placeholder ends
76 |             tmp=startpoints[0]+len(placeholder)
77 |             #Add the "sanitized" text up to and including the placeholder
78 |             sanitizedString = sanitizedString + string[post*(post>=0):tmp]
79 |             #Set the new post
80 |             post=tmp
81 |             #Back to start!
82 |             continue
83 |         elif startmatches[1] is None and startmatches[2] is None:
84 |             #No more blocks, add in the rest of string and be done with it...
85 |             sanitizedString = sanitizedString + string[post*(post>=0):]
86 |             return (sanitizedString, codeblocks)
87 |         elif startmatches[1] is None:
88 |             inBlock=2
89 |         elif startmatches[2] is None:
90 |             inBlock=1
91 |         else:
92 |             inBlock = (startpoints[1] < startpoints[2]) + (startpoints[1] > startpoints[2])*2
93 |             if not inBlock:
94 |                 inBlock = break_tie(startmatches[1],startmatches[2])
95 |         #Magic to ensure minimum index is 0
96 |         sanitizedString = sanitizedString+string[(post*(post>=0)):startpoints[inBlock]]
97 |         post = startmatches[inBlock].end()
98 |         #Now find the matching end...
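        #A terminator that starts before post lies inside text we have already
        #consumed, so keep scanning until we find one past the opening delimiter.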
99 |         while terminator<post:
100 |             endpoint=scanners[inBlock][1].search()
101 |             #If we run out of terminators before ending this loop, we're done
102 |             if endpoint is None:
103 |                 #Add the unterminated codeblock to the sanitized string
104 |                 sanitizedString = sanitizedString + string[startpoints[inBlock]:]
105 |                 return (sanitizedString, codeblocks)
106 |             terminator=endpoint.start()
107 |         #We found a matching endpoint, add the bit to the appropriate codeblock...
108 |         codeblocks.append(str(inBlock)+string[post:endpoint.start()])
109 |         #Now add in the appropriate placeholder
110 |         sanitizedString = sanitizedString+placeholder
111 |         #Fabulous. Now we can start again once we update post...
112 |         post = endpoint.end()
113 | 
114 | def reconstructMath(processedString,codeblocks,inline_delims=["$","$"],equation_delims=["$$","$$"],placeholder="™™™",htmlSafe=False) -> str:
115 |     """This usually takes the output of sanitizeInput, after its output string has been passed through markdown. The delimiters given to this function should match those used to construct the string to begin with.
116 | 
117 |     This will output a string containing html suitable to use with mathjax.
118 | 
119 |     "<", ">" and "&" symbols in math can confuse the html interpreter because they mark the beginning and end of definition blocks. To avoid issues, if htmlSafe is set to True these symbols will be replaced by ascii codes in the math blocks. The downside is that if anyone is already doing this, their already-niced text might be mangled (I think I've taken steps to make sure it won't be, but this is not extensively tested...)."""
120 |     delims=[['',''],inline_delims,equation_delims]
121 |     placeholder_re = re.compile("(?<!\\\\)"+re.escape(placeholder))
122 |     #If we've defined some "new" special characters we'll have to process any escapes of them here
123 |     #Make html substitutions.
124 |     if htmlSafe:
125 |         safeAmp=re.compile("&(?!(?:amp;|lt;|gt;))")
126 |         for i in range(len(codeblocks)):
127 |             codeblocks[i]=safeAmp.sub("&amp;",codeblocks[i])
128 |             codeblocks[i]=codeblocks[i].replace("<","&lt;")
129 |             codeblocks[i]=codeblocks[i].replace(">","&gt;")
130 |     #Step through the codeblocks one at a time and replace the next occurrence of the placeholder. Extra placeholders are invalid math blocks and ignored...
131 |     outString=''
132 |     scan = placeholder_re.scanner(processedString)
133 |     post=0
134 |     for i in range(len(codeblocks)):
135 |         inBlock=int(codeblocks[i][0])
136 |         match=scan.search()
137 |         if not match:
138 |             raise ValueError("More codeblocks given than valid placeholders in text.")
139 |         outString=outString+processedString[post:match.start()]+delims[inBlock][0]+codeblocks[i][1:]+delims[inBlock][1]
140 |         post = match.end()
141 |     #Add the rest of the string (if we need to)
142 |     if post<len(processedString):
143 |         outString = outString+processedString[post:]
144 |     return outString
145 | 
146 | def findBoundaries(string):
147 |     """A deprecated function. Finds the location of string boundaries in a stupid way."""
148 |     last=''
149 |     twod=[]
150 |     oned=[]
151 |     boundary=False
152 |     inoned=False
153 |     intwod=False
154 |     for count,char in enumerate(string):
155 |         if char=="$" and last!='\\':
156 |             #We just hit a valid $ character!
157 | if inoned: 158 | oned.append(count) 159 | inoned=False 160 | elif intwod: 161 | if boundary: 162 | twod.append(count) 163 | intwod=False 164 | boundary=False 165 | else: 166 | boundary=True 167 | elif boundary: 168 | #This means the last character was also a valid $ 169 | twod.append(count) 170 | intwod=True 171 | boundary=False 172 | else: 173 | #This means the last character was NOT a useable $ 174 | boundary=True 175 | elif boundary: 176 | #The last character was a valid $, but this one isn't... 177 | #This means the last character was a valid $, but this isn't 178 | if inoned: 179 | print("THIS SHOULD NEVER HAPPEN!") 180 | elif intwod: 181 | #ignore it... 182 | pass 183 | else: 184 | oned.append(count-1) 185 | inoned=True 186 | boundary=False 187 | last=char 188 | #What if we finished on a boundary character? Actually doesn't matter, but let's include it for completeness 189 | if boundary: 190 | if not (inoned or intwod): 191 | oned.append(count) 192 | inoned=True 193 | return (oned,twod) 194 | -------------------------------------------------------------------------------- /src/obsidian_url.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from aqt.utils import showInfo 4 | 5 | def process_obsidian_file(file_content:str, files_catalog:list): 6 | lines = file_content.split("\n") 7 | 8 | isInCode = False 9 | 10 | for i in range(0, len(lines)): 11 | if lines[i].startswith("<div class=\"codehilite\"><pre><span></span><code>"): 12 | isInCode = True 13 | elif lines[i].startswith("</code></pre></div>"): 14 | isInCode = False 15 | if not isInCode: 16 | lines[i] = process_obsidian_line(lines[i], files_catalog) 17 | 18 | file_content = "\n".join(lines) 19 | return file_content 20 | 21 | def process_obsidian_line(line, files_catalog:list): 22 | if line.find("[[") != -1 and line.find("]]") != -1: 23 | line = line.replace("[[", "ªªª[[") 24 | line = line.replace("]]", "]]ªªª") 25 | line_segments = line.split("[[") 26 | line = "º".join(line_segments) 27 | line_segments = line.split("]]") 28 | line = "º".join(line_segments) 29 | line_segments = line.split("º") 30 | number_of_segments = len(line_segments) 31 | number_of_replacements = number_of_segments // 2 32 | if number_of_replacements > 0: 33 | for i in range(1, number_of_replacements + 1): 34 | replacement_index = 2 * i - 1 35 | line_segments[replacement_index] = process_obsidian_link_content(line_segments[replacement_index], files_catalog) 36 | line = "".join(line_segments) 37 | line = line.replace("ªªª", "") 38 | return line 39 | 40 | def process_obsidian_link_content(content, files_catalog:list): 41 | if content.find("|") != -1: 42 | content_segments = content.split("|") 43 | obsidian_url = search_for_note(content_segments[0], files_catalog) 44 | content = "<a href = \"" + obsidian_url + "\">" + content_segments[1] + "</a>" 45 | else: 46 | obsidian_url = search_for_note(content, files_catalog) 47 | content = "<a href = \"" + obsidian_url + "\">" + content + "</a>" 48 | return content 49 | 50 | def search_for_note(name:str, files_catalog:list): 51 | for file in files_catalog: 52 | if file.get_file_name() == name: 53 | return file.get_obsidian_url() 54 | return "" 55 | -------------------------------------------------------------------------------- /src/processor.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import re 3 | import html 4 | import random 5 | import hashlib 6 | from . 
import settings
7 | from .markdown2 import markdown2
8 | from .markdown2 import markdown2Mathjax
9 | from aqt.utils import showInfo
10 | 
11 | 
12 | 
13 | mark_file_extras = {
14 |     "fenced-code-blocks": None,
15 |     "metadata": None,
16 |     "strike": None,
17 |     "tables": None,
18 |     "tag-friendly": None,
19 |     "task_list": None,
20 |     "footnotes": None,
21 |     "break-on-newline": True
22 | }
23 | 
24 | def read_file(full_path:str) -> list:
25 |     output = ""
26 |     source = ""
27 |     uid = ""
28 |     has_uid = False
29 |     with open(full_path, mode="r", encoding="utf-8") as file:
30 |         source = file.read()
31 |     temporary_content = markdown2Mathjax.sanitizeInput(source)
32 |     temporary_content0 = temporary_content[0].replace("[[", "œœœ")
33 |     temporary_content0 = temporary_content0.replace("]]", "®®®")
34 |     if source.startswith("---"):
35 |         markdown_file = markdown2.markdown(temporary_content0, extras = ["fenced-code-blocks", "metadata", "strike", "tables", "tag-friendly", "task_list", "footnotes", "break-on-newline"])
36 |         metadata = markdown_file.metadata
37 |     else:
38 |         markdown_file = markdown2.markdown(temporary_content0, extras = ["fenced-code-blocks", "strike", "tables", "tag-friendly", "task_list", "footnotes", "break-on-newline"])
39 |         metadata = {}
40 |     markdown_file = markdown_file.replace("œœœ", "[[")
41 |     markdown_file = markdown_file.replace("®®®", "]]")
42 |     try:
43 |         uid = metadata["uid"]
44 |     except KeyError:
45 |         random_number = random.randint(0, 100000000000000000000000000000)
46 |         new_source = source + full_path + str(random_number)
47 |         hash_value = hashlib.md5(new_source.encode())
48 |         uid = str(hash_value.hexdigest())
49 |         if len(metadata) == 0:
50 |             source = "---\nuid: " + uid + "\n---\n\n" + source
51 |         else:
52 |             source_lines = source.split("\n")
53 |             source_lines[0] = "---\nuid: " + uid
54 |             source = "\n".join(source_lines)
55 | 
56 |     for i in range(len(temporary_content[1])):
57 |         temporary_content[1][i] = html.escape(temporary_content[1][i])
58 |         temporary_content[1][i] = temporary_content[1][i].replace("{{", "{ {")
59 |         temporary_content[1][i] = temporary_content[1][i].replace("}}", "} }")
60 | 
61 |     cloze_settings = metadata_to_settings(metadata)
62 | 
63 |     markdown_file = get_converted_file(cloze_settings, markdown_file)
64 |     if cloze_settings["type"] != "cloze" and cloze_settings["type"] != "Cloze":
65 |         markdown_file[1] = False
66 |     output = markdown2Mathjax.reconstructMath(markdown_file[0], temporary_content[1])
67 |     output = math_conversion(output)
68 | 
69 |     # FIXME: output here
70 |     with open(full_path, mode = "w", encoding = "utf-8") as file:
71 |         file.write(source)
72 |     return [uid, output, markdown_file[1], metadata]
73 | 
74 | def metadata_to_settings(metadata: dict) -> dict:
75 |     new_settings = {}
76 |     default_settings = settings.get_settings()
77 |     for individual_key in default_settings.keys():
78 |         try:
79 |             new_settings[individual_key] = metadata[individual_key]
80 |         except KeyError:
81 |             new_settings[individual_key] = default_settings[individual_key]
82 |     return new_settings
83 | 
84 | def get_converted_file(cloze_settings, file_content):
85 |     file_content = cloze_generation(cloze_settings, file_content)
86 |     file_content = cloze_number_generation(cloze_settings["mode"], file_content)
87 |     return file_content
88 | 
89 | # Special thanks to Anis Qiao (https://github.com/qiaozhanrong) for the math_conversion section of the code! Now, obsidianki can support display math formulas written across multiple lines.
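# A rough sketch of the conversion below (assuming the default "$"/"$$"
# delimiters; this example is illustrative, not verified output):
#   "Inline $x$ and display $$x^2$$"  ->  "Inline \(x\) and display \[x^2\]"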
90 | 
91 | def math_conversion(file_content):
92 |     isOpen = False
93 |     s = ""
94 |     p = 0
95 |     while True:
96 |         q = file_content.find("$$", p)
97 |         if q == -1:
98 |             s += file_content[p:]
99 |             break
100 |         s += file_content[p:q] + ("\\]" if isOpen else "\\[")
101 |         isOpen = not isOpen
102 |         p = q + 2
103 |     file_content = s
104 | 
105 |     isOpen = False
106 |     s = ""
107 |     p = 0
108 |     while True:
109 |         q = file_content.find("$", p)
110 |         if q == -1:
111 |             s += file_content[p:]
112 |             break
113 |         s += file_content[p:q] + ("\\)" if isOpen else "\\(")
114 |         isOpen = not isOpen
115 |         p = q + 1
116 |     file_content = s
117 | 
118 |     return file_content
119 | 
120 | def cloze_generation(cloze_settings:dict, file_content:str) -> str:
121 |     file_content = file_content.replace("#new_cloze", "<label id = \"tag\">#new_cloze</label>")
122 |     if cloze_settings["type"] == "cloze" or cloze_settings["type"] == "Cloze":
123 |         if cloze_settings["bold"] == "True" or cloze_settings["bold"] == "true":
124 |             file_content = file_content.replace("<strong>", "<strong>{{c¡::")
125 |             file_content = file_content.replace("</strong>", "}}</strong>")
126 |         if cloze_settings["italics"] == "True" or cloze_settings["italics"] == "true":
127 |             file_content = file_content.replace("<em>", "<em>{{c¡::")
128 |             file_content = file_content.replace("</em>", "}}</em>")
129 |         if cloze_settings["image"] == "True" or cloze_settings["image"] == "true":
130 |             file_content = apply_cloze_to_image(file_content)
131 |         if cloze_settings["inline code"] == "True" or cloze_settings["inline code"] == "true":
132 |             file_content = re.sub(r"<code>(?!<span)", "<code>{{c¡::", file_content)
133 |             file_content = re.sub(r"</code>(?!</pre>)", "}}</code>", file_content)
134 |         if cloze_settings["QA"] == "True" or cloze_settings["QA"] == "true":
135 |             tmp = file_content.split("\n")
136 |             for i in range(0, len(tmp)):
137 |                 if tmp[i].startswith("<p>A: ") and tmp[i].endswith("</p>"):
138 |                     # TODO: add a security check to make sure that these two things are in the same line.
139 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
140 |                     tmp[i] = tmp[i].replace("}}", "")
141 |                     tmp[i] = tmp[i].replace("<p>A: ", "<p>A: {{c¡::", 1)
142 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
143 |                 elif tmp[i].startswith("<p>答:") and tmp[i].endswith("</p>"):
144 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
145 |                     tmp[i] = tmp[i].replace("}}", "")
146 |                     tmp[i] = tmp[i].replace("<p>答:", "<p>答:{{c¡::", 1)
147 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
148 | 
149 |                 # ==================================================================
150 |                 # | You can disable this code if you enabled strict line spacing.  |
151 |                 # ==================================================================
152 |                 elif tmp[i].startswith("A: ") and tmp[i].endswith("</p>"):
153 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
154 |                     tmp[i] = tmp[i].replace("}}", "")
155 |                     tmp[i] = tmp[i].replace("A: ", "A: {{c¡::", 1)
156 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
157 |                 elif tmp[i].startswith("答:") and tmp[i].endswith("</p>"):
158 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
159 |                     tmp[i] = tmp[i].replace("}}", "")
160 |                     tmp[i] = tmp[i].replace("答:", "答: {{c¡::", 1)
161 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
162 |             file_content = "\n".join(tmp)
163 |         if cloze_settings["list"] == "True" or cloze_settings["list"] == "true":
164 |             tmp = file_content.split("\n")
165 |             for i in range(0, len(tmp)):
166 |                 if tmp[i].find("{{c¡::") != -1:
167 |                     pass
168 |                 else:
169 |                     tmp[i] = tmp[i].replace("<li>", "<li>{{c¡::")
170 |                     tmp[i] = tmp[i].replace("</li>", "}}</li>")
171 |             file_content = "\n".join(tmp)
172 |         if cloze_settings["quote"] == "True" or cloze_settings["quote"] == "true":
173 |             # ===================================================
174 |             # | TODO: use REGEX to replace the proper ones here |
175 |             # ===================================================
176 |             file_content = file_content.replace("<blockquote>", "<blockquote>{{c¡::")
177 |             file_content = file_content.replace("</blockquote>", "}}</blockquote>")
178 |         if cloze_settings["block code"] == "True" or cloze_settings["block code"] == "true":
179 |             # ===================================================
180 |             # | TODO: use REGEX to replace the proper ones here |
181 |             # ===================================================
182 |             file_content = file_content.replace("<div class=\"codehilite\"><pre><span></span><code>", "<div class=\"codehilite\"><pre><span></span><code>{{c¡::")
183 |             file_content = file_content.replace("</code></pre></div>", "}}</code></pre></div>")
184 |         file_content = highlight_conversion(file_content, cloze_settings["highlight"])
185 |     elif cloze_settings["type"] == "basic" or cloze_settings["type"] == "Basic":
186 |         file_content = highlight_conversion(file_content, "False")
187 |     return file_content
188 | 
189 | def cloze_number_generation(mode:str, file_content:str) -> list:
190 |     has_cloze = False
191 | 
192 |     if file_content.find("¡") != -1:
193 |         has_cloze = True
194 | 
195 |     if mode == "word":
196 |         cloze_num = 1
197 |         while file_content.find("¡") != -1:
198 |             file_content = file_content.replace("¡", str(cloze_num), 1)
199 |             cloze_num = cloze_num + 1
200 |     elif mode == "line":
201 |         tmp = file_content.split("\n")
202 |         cloze_num = 0
203 |         for i in range(0, len(tmp)):
204 |             if tmp[i].find("¡") != -1:
205 |                 cloze_num = cloze_num + 1
206 |                 tmp[i] = tmp[i].replace("¡", str(cloze_num))
207 |         file_content = "\n".join(tmp)
208 |     elif mode == "heading":
209 |         # ==========================================================
210 |         # | TODO: Check the code here to see if it actually works  |
211 |         # ==========================================================
212 |         tmp = file_content.split("\n")
213 |         cloze_num = 0
214 |         increase_num = 0
215 |         new_cloze = 0
216 |         for i in range(0, len(tmp)):
217 |             if re.search(r"<h\d>", tmp[i]) is not None or tmp[i].find("#new_cloze") != -1:
218 |                 # TODO: add this to documentation
219 |                 cloze_num = get_cloze_number(tmp) + 1
220 |             if tmp[i].startswith("<p>A: ") or tmp[i].startswith("<p>答:") or tmp[i].startswith("A: ") or tmp[i].startswith("答:"):
221 |                 increase_num = get_cloze_number(tmp) + 1
222 |                 tmp[i] = tmp[i].replace("¡", str(increase_num))
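                # (The answer line takes the next unused cloze index; the lines
                # that follow in this heading section continue from increase_num + 1.)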
223 |                 cloze_num = increase_num + 1
224 |             elif tmp[i].startswith("<li>"):
225 |                 if new_cloze == 0 and i < (len(tmp) - 2) and not tmp[i + 1].startswith("<li>"):
226 |                     increase_num = get_cloze_number(tmp) + 1
227 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
228 |                     cloze_num = increase_num + 1
229 |                 elif new_cloze == 0:
230 |                     new_cloze = 1
231 |                     increase_num = get_cloze_number(tmp) + 1
232 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
233 |                 elif new_cloze == 1 and i < (len(tmp) - 2) and tmp[i + 1].startswith("<li>"):
234 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
235 |                 elif new_cloze == 1 and i < (len(tmp) - 2) and not tmp[i + 1].startswith("<li>"):
236 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
237 |                     cloze_num = increase_num + 1
238 |                     new_cloze = 0
239 |             tmp[i] = tmp[i].replace("¡", str(cloze_num))
240 |         file_content = "\n".join(tmp)
241 |     elif mode == "document":
242 |         if file_content.find("¡") != -1:
243 |             file_content = file_content.replace("¡", "1")
244 |     return [file_content, has_cloze]
245 | 
246 | 
247 | def get_cloze_number(tmp) -> int:
248 |     file_content = "".join(tmp)
249 |     cloze_number = 0
250 |     for i in range(1, 7):
251 |         if file_content.find("{{c%d::"%(i)) != -1:
252 |             cloze_number = i
253 |     return cloze_number
254 | 
255 | # =========================================
256 | # | TODO: Check to see if this code works |
257 | # =========================================
258 | 
259 | 
260 | def highlight_conversion(file_content: str, to_cloze: str) -> str:
261 |     lines = file_content.split("\n")
262 |     isInCode = False
263 |     for i in range(0, len(lines)):
264 |         if lines[i].startswith("<div class=\"codehilite\"><pre><span></span><code>"):
265 |             isInCode = True
266 |         elif lines[i].startswith("</code></pre></div>"):
267 |             isInCode = False
268 |         if not isInCode:
269 |             lines[i] = apply_highlight(lines[i], to_cloze)
270 |     file_content = "\n".join(lines)
271 |     return file_content
272 | 
273 | 
274 | def apply_highlight(line: str, to_cloze: str) -> str:
275 |     line = "ªªª" + line + "ªªª"
276 |     line_segments = line.split("==")
277 |     number_of_highlights = len(line_segments) // 2
278 |     if number_of_highlights > 0:
279 |         if to_cloze == "True" or to_cloze == "true":
280 |             for i in range(1, number_of_highlights + 1):
281 |                 highlight_index = 2 * i - 1
282 |                 line_segments[highlight_index] = "<label id = \"highlight\">{{c¡::" + line_segments[highlight_index] + "}}</label>"
283 |         else:
284 |             for i in range(1, number_of_highlights + 1):
285 |                 highlight_index = 2 * i - 1
286 |                 line_segments[highlight_index] = "<label id = \"highlight\">" + line_segments[highlight_index] + "</label>"
287 | 
288 |     line = "".join(line_segments)
289 |     line = line.replace("ªªª", "")
290 |     return line
291 | 
292 | def apply_cloze_to_image(file_content: str) -> str:
293 |     lines = file_content.split("\n")
294 |     for i in range(0, len(lines)):
295 |         image_url = re.search(r"<img src=\".+? \/>", lines[i])
296 |         if image_url is not None:
297 |             lines[i] = re.sub(r"<img src=\".+? 
\/>", "{{c¡::" + image_url.group(0) + "}}", lines[i]) 298 | file_content = "\n".join(lines) 299 | return file_content -------------------------------------------------------------------------------- /src/settings.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import os 3 | import aqt 4 | import pickle 5 | from aqt.qt import * 6 | from aqt import AnkiQt, gui_hooks 7 | from aqt.utils import tooltip 8 | from PyQt5 import QtWidgets, QtCore 9 | 10 | # TODO: Rework on the settings and clean up the code 11 | 12 | default_settings = { 13 | "vault path": "/Users/xiuxuan/Knowledge Base", 14 | "trash folder": "Trash", 15 | "archive folder": "Archive", 16 | "ignore folder": "Templates", 17 | "mode": "heading", 18 | "type": "cloze", 19 | "bold": "True", 20 | "italics": "True", 21 | "image": "True", 22 | "quote": "False", # FIXME: fix the conflict of Quote with other clozes 23 | "QA": "True", 24 | "list": "True", 25 | "inline code": "True", 26 | "block code": "False", 27 | "highlight": "False" 28 | } 29 | 30 | SETTINGS_PATH = os.path.expanduser("~/.obsidianki4.settings") 31 | 32 | 33 | def save_settings(settings, path=SETTINGS_PATH): 34 | with open(path, "wb") as fd: 35 | pickle.dump(settings, fd) 36 | 37 | 38 | def load_settings(path=SETTINGS_PATH): 39 | if os.path.isfile(path): 40 | with open(path, "rb") as fd: 41 | return pickle.load(fd) 42 | return default_settings 43 | 44 | def get_settings(): 45 | settings = load_settings() 46 | return settings 47 | 48 | def get_settings_by_name(setting_name): 49 | settings = load_settings() 50 | try: 51 | return settings[setting_name] 52 | except KeyError: 53 | return default_settings[setting_name] --------------------------------------------------------------------------------