├── .DS_Store
├── Archive.zip
├── LICENSE
├── Obsidianki 4.0.apkg
├── Obsidianki 4.ankiaddon
├── README.md
├── miscellaneous
│   └── ankiweb.html
└── src
    ├── README.md
    ├── __init__.py
    ├── anki_importer.py
    ├── files.py
    ├── manifest.json
    ├── markdown2
    │   ├── markdown2.py
    │   └── markdown2Mathjax.py
    ├── obsidian_url.py
    ├── processor.py
    └── settings.py

--------------------------------------------------------------------------------
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/.DS_Store

--------------------------------------------------------------------------------
/Archive.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/Archive.zip

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2021 wxxedu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/Obsidianki 4.0.apkg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/Obsidianki 4.0.apkg

--------------------------------------------------------------------------------
/Obsidianki 4.ankiaddon:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wxxedu/obsidianki4/90305bbf74aa6b625b6b374e87964bf5d5d62a13/Obsidianki 4.ankiaddon

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# obsidianki 4

NOTE: The project is now **PAUSED**, and I will not be actively maintaining it until June. I am busy with my exam preparations for now. Sorry about that :-(

> **Please back up your vault regularly while using this add-on!**
>
> I am a noob at programming; while no loss of notes has happened so far, I am afraid that it might. As your notes are valuable, please do remember to back them up.
>
> **Versions**
>
> Theoretically, it now supports Anki 2.1.26+.
> I am unaware whether it supports earlier versions, and I wasn't able to test it on Anki 2.1.28, as my laptop is an M1 MacBook Air and Anki 2.1.28 does not open on it.
>
> **Expectations**
>
> With this add-on, it is expected that you make all your changes in Obsidian (including the deletion, addition, and moving of files). If you want to edit a file, you can just click on the link in Anki to jump back to Obsidian. Instead of deleting files, you should move unused files into the `.trash` folder that you can turn on in the settings of Obsidian. Obsidianki will automatically remove them for you.

This is an [Anki](https://github.com/ankitects) add-on that imports your files from [Obsidian](https://obsidian.md) into Anki while preserving the wiki-links. Each file in Obsidianki is converted to a single note in Anki. It does so by searching through your vault for the file with the specified name and generating an Obsidian URL from the path.

Its GitHub page is [obsidianki4](https://github.com/wxxedu/obsidianki4).

This add-on also works with [hierarchical tags](https://ankiweb.net/shared/info/594329229): the hierarchical tags in Obsidian's metadata section (`tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1/tag2.1.1]`) are converted into the Anki hierarchical tags `tag1::tag1.1::tag1.1.1` and `tag2::tag2.1::tag2.1.1`.

## How to Install

You can install this add-on by downloading the `obsidianki 4.ankiaddon` file from the releases section of GitHub and double-clicking on it.

You can also download it from AnkiWeb: [Obsidianki 4 Addon Page](https://ankiweb.net/shared/info/620260832). The code for this add-on is 620260832.

## How to Use

**Before starting to use it, you will have to install Obsidianki's template, without which Obsidianki will not work.** To do so, go to Anki's add-ons folder, open the folder "Obsidianki 4", and find `Obsidianki 4.apkg`. Double-click on it to install. You can also download it from GitHub.

After you've installed the add-on, open Anki and select `Tools` -> `Obsidianki 4`, as shown in the following picture.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmmwz3peljj30u80ncq62.jpg)

The following menu will pop up, which includes the default preferences panel. **NOTE THAT THE SETTINGS IN THIS PANEL ARE ALL DEFAULT SETTINGS**, and you **SHOULD NOT** change them regularly, as a change will **AFFECT ALL YOUR NOTES**.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmpllk0e9nj30rq0zkn1f.jpg)

Copy the path of your Obsidian vault into the first field. Note that you will have to use **forward slashes** `/` instead of backward ones for Obsidianki to function properly.

After you've set the settings (explained in the next section), you can click on "Save and Convert", and it will complete the conversion. However, you won't notice a difference at first. Why? Because Anki's interface is not refreshed automatically. To refresh it, click on anything in Anki's main interface.

## Default Settings

Now, let's take a look at the default settings.

### Vault Path

This field shows the path to your vault. Note that in order for the wiki-links in Anki to link back to Obsidian, you will have to use a path that is actually a vault. If you just copy the path of a folder inside the vault, the link function will not work.

Another thing to take special note of is that you should use **forward slashes** instead of backward ones.
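As a minimal sketch of why the path must be the vault root: the last path segment is used as the vault name in the generated `obsidian://` link (illustrative only; the add-on's real implementation lives in `src/files.py` and percent-encodes characters slightly differently):

```python
from urllib.parse import quote

def obsidian_url(vault_path: str, relative_path: str) -> str:
    """Build an obsidian://open URL from a vault path and a note path."""
    vault_name = vault_path.rstrip("/").split("/")[-1]  # last segment = vault name
    note = relative_path.lstrip("/").rsplit(".", 1)[0]  # drop the .md extension
    return "obsidian://open?vault=" + quote(vault_name) + "&file=" + quote(note)

print(obsidian_url("C:/Users/me/MyVault", "/folder/My Note.md"))
# obsidian://open?vault=MyVault&file=folder/My%20Note
```

If the path pointed at a sub-folder instead, the `vault=` parameter would name a vault that Obsidian does not know about, and the link would fail.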
### Templates Folder Name

The name of the first-level folder that holds your templates. If specified, the contents of this folder will not be imported into Anki.

### Trash Folder Name

The name of the first-level folder that holds your trash. If specified, the contents of this folder will be **erased** when you run the Obsidianki add-on, and the corresponding cards in Anki will also be deleted.

### Archive Folder Name

The name of the first-level folder that holds your archived files. If specified, the Anki cards corresponding to the contents of this folder will be deleted in Anki, but the files themselves stay in Obsidian and will not be deleted.

### Mode

There are four importing cloze modes in Obsidianki.

#### `word` mode

It generates a card for every cloze. If you have 10 clozes, it generates 10 cards from `{{c1::Card 1}}` to `{{c10::Card 10}}`.

#### `line` mode

It generates a card for every line. If you have 10 clozes in the first line, they will be `{{c1::Card 1}}` to `{{c1::Card 10}}`. If you have 2 more clozes in the second line, they will be `{{c2::Card 11}}` and `{{c2::Card 12}}`.

#### `heading` mode (Recommended)

It generates a card for the content under every heading, with the exception of list cards and QA cards (I will explain this below). If you have a file as below:

```markdown
# Heading 1

Hello **Obsidianki**.

This is the best **Anki** add-on for importing Obsidian files into **Anki**.

## Heading 2

This is something **interesting**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: Obsidianki!

What are the features of Obsidianki?

1. Import files
2. Preserve wiki links
3. Convert to Clozes

## Heading 3

This is **Heading 3**.

```

The "Obsidianki" and "Anki" under "Heading 1" will be turned into `{{c1::Obsidianki}}` and `{{c1::Anki}}` respectively.

Theoretically, everything under "Heading 2" should be turned into `{{c2::...}}` cards, right? Not quite, because I have added QA cards and list cards. So, after conversion, the portion under Heading 2 would become:

```markdown
## Heading 2

This is something **{{c2::interesting}}**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: {{c3::Obsidianki!}}

What are the features of Obsidianki?

1. {{c4::Import files}}
2. {{c5::Preserve wiki links}}
3. {{c6::Convert to Clozes}}
```

#### `document` mode

In `document` mode, everything will be converted to `{{c1::...}}`.
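To make the numbering concrete, here is a rough sketch of how `line` mode could assign cloze indices, assuming for simplicity that only `**bold**` spans become clozes (the add-on's actual converter, in `src/processor.py`, handles many more syntaxes):

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")

def clozes_by_line(text: str) -> str:
    """line mode: every bold span on line n gets the cloze index cn."""
    out = []
    for n, line in enumerate(text.splitlines(), start=1):
        out.append(BOLD.sub(lambda m, n=n: "{{c%d::%s}}" % (n, m.group(1)), line))
    return "\n".join(out)

print(clozes_by_line("**a** and **b**\n**c**"))
# {{c1::a}} and {{c1::b}}
# {{c2::c}}
```

`word` mode would instead increment the index on every match, and `document` mode would hold it fixed at 1.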
### Type

There are two types in Obsidianki 4: `cloze` and `basic`. Note that these two types are different from Anki's `cloze` and `basic`.

#### `cloze`

This type will create visible deletions on the screen. You will be able to see `[...]` on the screen where you applied a cloze.

#### `basic`

This type will only create one card, and the cloze deletion will not be visible.

### Conversions

#### Bold to Cloze:

This converts the bold syntax `**bold**` to a cloze in Anki, while preserving the format.

#### Italics to Cloze:

This converts the italics syntax `*italics*` to a cloze in Anki, while preserving the format.

#### Highlight to Cloze:

This converts the highlight syntax `==highlight==` to a cloze in Anki, while preserving the format.

#### Image to Cloze:

This converts the image syntax `![]()` to a cloze in Anki, while preserving the image.

#### Quote to Cloze:

This converts the quote syntax `> this is a quote` to a cloze in Anki, while preserving the format.

**Be aware that this currently conflicts with the other syntaxes. If you want to leave this option on, you will have to make sure that you apply no other cloze formatting inside the quote.**

#### QA to Cloze:

This converts the QA syntax that I created into a cloze in Anki, so that

```markdown
Q: Question

A: Answer
```

becomes:

```markdown
Q: Question

A: {{c1::Answer}}
```

#### List to Cloze:

This turns any list into clozes, where each list item is a cloze.

#### Inline Code to Cloze:

This converts the inline code syntax to a cloze in Anki, while preserving the format.

#### Block Code to Cloze:

This converts the block code syntax to a cloze in Anki, while preserving the format.

## Individual Settings

You can also individually specify the settings for each note (file) in the metadata section of your file. The metadata section is the following segment at the very beginning of a document.

```markdown
---
uid: 4511487055494033182
---
```

**By the way, Obsidianki will automatically create a metadata section that contains the file's unique id in the file. If you don't want duplicated notes, do not change the uid.**

If you want to change the individual importing settings for a file, add them to the metadata section. You can make this a template in Obsidian:

```
---
mode: heading
type: cloze
bold: True
italics: True
highlight: False
image: True
quote: False
QA: True
list: True
inline code: True
block code: False
---
```
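For illustration, a metadata block like the one above can be read into a dict of per-file settings with just a few lines (a minimal sketch, not the add-on's actual parser):

```python
def read_front_matter(text: str) -> dict:
    """Parse a leading ----fenced block of 'key: value' lines."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break  # end of the metadata block
            if ":" in line:
                key, _, value = line.partition(":")
                meta[key.strip()] = value.strip()
    return meta

print(read_front_matter("---\nmode: heading\nbold: True\n---\nBody text"))
# {'mode': 'heading', 'bold': 'True'}
```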
## Special Note

### About the Development

I will try my best to develop and maintain this add-on. However, as of right now, I am just a high school student who barely knows any programming. All my knowledge of programming comes from my AP Computer Science A class LOL.

I know that my code is pretty bad, so feel free to help me improve it (please do, so that I can learn from you!). I will probably add more comments to my code explaining my thoughts while writing it in the future, just in case you want to know what I did in the code. (I want to do this because I struggled to understand Anki's source code and other add-ons.) While this will not be an add-on writing tutorial and I am by no means good at Python, it is my best hope that sharing my thoughts as a beginner will help other beginners better understand how to write Anki add-ons. This will take some time for me to do, as I need to get back to work and studying, but I am going to spend some time doing so.

### Thanks

I want to thank the creators of Anki and Obsidian for building such beautiful apps. I also want to thank my friend [Anis](https://github.com/qiaozhanrong) for helping me with the code.

--------------------------------------------------------------------------------
/miscellaneous/ankiweb.html:
--------------------------------------------------------------------------------
Note: please back up your Obsidian vault regularly while using this add-on. Since it writes certain information to your vault, I am concerned that there is a very slight chance it could erase your files.

This is an Anki add-on that imports your files from Obsidian into Anki while preserving the wiki-links. Each file in Obsidianki is converted to a single note in Anki. It does so by searching through your vault for the file with the specified name and generating an Obsidian URL from the path. Note that this only works with Anki 2.1.38+.

https://github.com/wxxedu/obsidianki4

This add-on also works with hierarchical tags: the hierarchical tags in Obsidian's metadata section (tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1/tag2.1.1]) are converted into the Anki hierarchical tags tag1::tag1.1::tag1.1.1 and tag2::tag2.1::tag2.1.1.

## How to Install

You can install this add-on by downloading the obsidianki 4.ankiaddon file from the releases section of GitHub and double-clicking on it.

You can also use the install code below.

## How to Use

Before starting to use it, you will have to install Obsidianki's template, without which Obsidianki will not work. You can download it from Obsidianki's GitHub page.

After you've installed the add-on, open Anki and select Tools -> Obsidianki 4, as shown in the following picture.

The following menu will pop up, which includes the default preferences panel. NOTE THAT THE SETTINGS IN THIS PANEL ARE ALL DEFAULT SETTINGS, and you SHOULD NOT change them regularly, as a change will AFFECT ALL YOUR NOTES.

Copy the path of your Obsidian vault into the first field. Note that you will have to use forward slashes / instead of backward ones for Obsidianki to function properly.

After you've set the settings (explained in the next section), you can click on "Save and Convert", and it will complete the conversion. However, you won't notice a difference at first. Why? Because Anki's interface is not refreshed automatically. To refresh it, click on anything in Anki's main interface.

## Default Settings

Now, let's take a look at the default settings.

### Vault Path

This field shows the path to your vault. Note that in order for the wiki-links in Anki to link back to Obsidian, you will have to use a path that is actually a vault. If you just copy the path of a folder inside the vault, the link function will not work.

Another thing to take special note of is that you should use forward slashes instead of backward ones.

### Mode

There are four importing cloze modes in Obsidianki.

#### word mode

It generates a card for every cloze. If you have 10 clozes, it generates 10 cards from {{c1::Card 1}} to {{c10::Card 10}}.

#### line mode

It generates a card for every line. If you have 10 clozes in the first line, they will be {{c1::Card 1}} to {{c1::Card 10}}.
If you have 2 more clozes in the second line, they will be {{c2::Card 11}} and {{c2::Card 12}}.

#### heading mode (Recommended)

It generates a card for the content under every heading, with the exception of list cards and QA cards (I will explain this below). If you have a file as below:

The "Obsidianki" and "Anki" under "Heading 1" will be turned into {{c1::Obsidianki}} and {{c1::Anki}} respectively.

Theoretically, everything under "Heading 2" should be turned into {{c2::...}} cards, right? Not quite, because I have added QA cards and list cards. So, after conversion, the portion under Heading 2 would become:

#### document mode

In document mode, everything will be converted to {{c1::...}}.

### Type

There are two types in Obsidianki 4: cloze and basic. Note that these two types are different from Anki's cloze and basic.

#### cloze

This type will create visible deletions on the screen. You will be able to see [...] on the screen where you applied a cloze.

#### basic

This type will only create one card, and the cloze deletion will not be visible.

### Conversions

#### Bold to Cloze:

This converts the bold syntax **bold** to a cloze in Anki, while preserving the format.

#### Italics to Cloze:

This converts the italics syntax *italics* to a cloze in Anki, while preserving the format.

#### Highlight to Cloze:

This converts the highlight syntax ==highlight== to a cloze in Anki, while preserving the format.

#### Image to Cloze:

This converts the image syntax ![]() to a cloze in Anki, while preserving the image.

#### Quote to Cloze:

This converts the quote syntax > this is a quote to a cloze in Anki, while preserving the format.

Be aware that this currently conflicts with the other syntaxes. If you want to leave this option on, you will have to make sure that you apply no other cloze formatting inside the quote.

#### QA to Cloze:

This converts the QA syntax that I created into a cloze in Anki.

#### List to Cloze:

This turns any list into clozes, where each list item is a cloze.

#### Inline Code to Cloze:

This converts the inline code syntax to a cloze in Anki, while preserving the format.

#### Block Code to Cloze:

This converts the block code syntax to a cloze in Anki, while preserving the format.

## Individual Settings

You can also individually specify the settings for each note (file) in the metadata section of your file. The metadata section is the segment at the very beginning of a document.

By the way, Obsidianki will automatically create a metadata section that contains the file's unique id in the file. If you don't want duplicated notes, do not change the uid.

If you want to change the individual importing settings for a file, add them to the metadata section. You can make this a template in Obsidian:

## Special Note

### About the Development

I will try my best to develop and maintain this add-on. However, as of right now, I am just a high school student who barely knows any programming. All my knowledge of programming comes from my AP Computer Science A class LOL.
I know that my code is pretty bad, so feel free to help me improve it (please do, so that I can learn from you!).

### Thanks

I want to thank the creators of Anki and Obsidian for building such beautiful apps. I also want to thank my friend Anis for helping me with the code.

--------------------------------------------------------------------------------
/src/README.md:
--------------------------------------------------------------------------------
# obsidianki 4

> **Please back up your vault regularly while using this add-on!**
>
> Theoretically, it now supports Anki 2.1.26+. I am unaware whether it supports earlier versions, and I wasn't able to test it on Anki 2.1.28, as my laptop is an M1 MacBook Air and Anki 2.1.28 does not open on it.
>
> This add-on also works with [hierarchical tags](https://ankiweb.net/shared/info/594329229): the hierarchical tags in Obsidian's metadata section (`tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1/tag2.1.1]`) are converted into the Anki hierarchical tags `tag1::tag1.1::tag1.1.1` and `tag2::tag2.1::tag2.1.1`.

This is an [Anki](https://github.com/ankitects) add-on that imports your files from [Obsidian](https://obsidian.md) into Anki while preserving the wiki-links. Each file in Obsidianki is converted to a single note in Anki. It does so by searching through your vault for the file with the specified name and generating an Obsidian URL from the path.

Its GitHub page is [obsidianki4](https://github.com/wxxedu/obsidianki4).

## How to Install

You can install this add-on by downloading the `obsidianki 4.ankiaddon` file from the releases section of GitHub and double-clicking on it.

You can also download it from AnkiWeb: [Obsidianki 4 Addon Page](https://ankiweb.net/shared/info/620260832). The code for this add-on is 620260832.

## How to Use

**Before starting to use it, you will have to install Obsidianki's template, without which Obsidianki will not work.** To do so, go to Anki's add-ons folder, open the folder "Obsidianki 4", and find `Obsidianki 4.apkg`. Double-click on it to install. You can also download it from GitHub.

After you've installed the add-on, open Anki and select `Tools` -> `Obsidianki 4`, as shown in the following picture.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmmwz3peljj30u80ncq62.jpg)

The following menu will pop up, which includes the default preferences panel. **NOTE THAT THE SETTINGS IN THIS PANEL ARE ALL DEFAULT SETTINGS**, and you **SHOULD NOT** change them regularly, as a change will **AFFECT ALL YOUR NOTES**.

![](https://tva1.sinaimg.cn/large/008eGmZEgy1gmpllk0e9nj30rq0zkn1f.jpg)

Copy the path of your Obsidian vault into the first field. Note that you will have to use **forward slashes** `/` instead of backward ones for Obsidianki to function properly.

After you've set the settings (explained in the next section), you can click on "Save and Convert", and it will complete the conversion. However, you won't notice a difference at first. Why? Because Anki's interface is not refreshed automatically. To refresh it, click on anything in Anki's main interface.

## Default Settings

Now, let's take a look at the default settings.

### Vault Path

This field shows the path to your vault. Note that in order for the wiki-links in Anki to link back to Obsidian, you will have to use a path that is actually a vault. If you just copy the path of a folder inside the vault, the link function will not work.

Another thing to take special note of is that you should use **forward slashes** instead of backward ones.
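If you copied the path from Windows Explorer, it will contain backslashes; a quick normalization like the following generic Python one-liner (not part of the add-on) gives the form Obsidianki expects:

```python
vault_path = r"C:\Users\me\MyVault".replace("\\", "/")
print(vault_path)  # C:/Users/me/MyVault
```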
### Templates Folder Name

The name of the first-level folder that holds your templates. If specified, the contents of this folder will not be imported into Anki.

### Trash Folder Name

The name of the first-level folder that holds your trash. If specified, the contents of this folder will be **erased** when you run the Obsidianki add-on, and the corresponding cards in Anki will also be deleted.

### Archive Folder Name

The name of the first-level folder that holds your archived files. If specified, the Anki cards corresponding to the contents of this folder will be deleted in Anki, but the files themselves stay in Obsidian and will not be deleted.

### Mode

There are four importing cloze modes in Obsidianki.

#### `word` mode

It generates a card for every cloze. If you have 10 clozes, it generates 10 cards from `{{c1::Card 1}}` to `{{c10::Card 10}}`.

#### `line` mode

It generates a card for every line. If you have 10 clozes in the first line, they will be `{{c1::Card 1}}` to `{{c1::Card 10}}`. If you have 2 more clozes in the second line, they will be `{{c2::Card 11}}` and `{{c2::Card 12}}`.

#### `heading` mode (Recommended)

It generates a card for the content under every heading, with the exception of list cards and QA cards (I will explain this below). If you have a file as below:

```markdown
# Heading 1

Hello **Obsidianki**.

This is the best **Anki** add-on for importing Obsidian files into **Anki**.

## Heading 2

This is something **interesting**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: Obsidianki!

What are the features of Obsidianki?

1. Import files
2. Preserve wiki links
3. Convert to Clozes

## Heading 3

This is **Heading 3**.

```

The "Obsidianki" and "Anki" under "Heading 1" will be turned into `{{c1::Obsidianki}}` and `{{c1::Anki}}` respectively.

Theoretically, everything under "Heading 2" should be turned into `{{c2::...}}` cards, right? Not quite, because I have added QA cards and list cards. So, after conversion, the portion under Heading 2 would become:

```markdown
## Heading 2

This is something **{{c2::interesting}}**.

Q: What is the best add-on for importing Obsidian files into **Anki**?

A: {{c3::Obsidianki!}}

What are the features of Obsidianki?

1. {{c4::Import files}}
2. {{c5::Preserve wiki links}}
3. {{c6::Convert to Clozes}}
```

#### `document` mode

In `document` mode, everything will be converted to `{{c1::...}}`.

### Type

There are two types in Obsidianki 4: `cloze` and `basic`. Note that these two types are different from Anki's `cloze` and `basic`.

#### `cloze`

This type will create visible deletions on the screen. You will be able to see `[...]` on the screen where you applied a cloze.

#### `basic`

This type will only create one card, and the cloze deletion will not be visible.

### Conversions

#### Bold to Cloze:

This converts the bold syntax `**bold**` to a cloze in Anki, while preserving the format.

#### Italics to Cloze:

This converts the italics syntax `*italics*` to a cloze in Anki, while preserving the format.

#### Highlight to Cloze:

This converts the highlight syntax `==highlight==` to a cloze in Anki, while preserving the format.

#### Image to Cloze:

This converts the image syntax `![]()` to a cloze in Anki, while preserving the image.

#### Quote to Cloze:

This converts the quote syntax `> this is a quote` to a cloze in Anki, while preserving the format.

**Be aware that this currently conflicts with the other syntaxes. If you want to leave this option on, you will have to make sure that you apply no other cloze formatting inside the quote.**

#### QA to Cloze:

This converts the QA syntax that I created into a cloze in Anki, so that

```markdown
Q: Question

A: Answer
```

becomes:

```markdown
Q: Question

A: {{c1::Answer}}
```
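To sketch how such a QA pair can be rewritten mechanically (illustrative only; the add-on's actual converter is more involved and advances the cloze counter according to the selected mode):

```python
import re

QA = re.compile(r"^(Q:.*\n\n?)A: (.+)$", re.M)

def qa_to_cloze(text: str, index: int = 1) -> str:
    """Wrap the answer of a 'Q: ... / A: ...' pair in a cloze deletion."""
    return QA.sub(lambda m: m.group(1) + "A: {{c%d::%s}}" % (index, m.group(2)), text)

print(qa_to_cloze("Q: Question\n\nA: Answer"))
# Q: Question
#
# A: {{c1::Answer}}
```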
#### List to Cloze:

This turns any list into clozes, where each list item is a cloze.

#### Inline Code to Cloze:

This converts the inline code syntax to a cloze in Anki, while preserving the format.

#### Block Code to Cloze:

This converts the block code syntax to a cloze in Anki, while preserving the format.

## Individual Settings

You can also individually specify the settings for each note (file) in the metadata section of your file. The metadata section is the following segment at the very beginning of a document.

```markdown
---
uid: 4511487055494033182
---
```

**By the way, Obsidianki will automatically create a metadata section that contains the file's unique id in the file. If you don't want duplicated notes, do not change the uid.**

If you want to change the individual importing settings for a file, add them to the metadata section. You can make this a template in Obsidian:

```
---
mode: heading
type: cloze
bold: True
italics: True
highlight: False
image: True
quote: False
QA: True
list: True
inline code: True
block code: False
---
```

## Special Note

### About the Development

I will try my best to develop and maintain this add-on. However, as of right now, I am just a high school student who barely knows any programming. All my knowledge of programming comes from my AP Computer Science A class LOL.

I know that my code is pretty bad, so feel free to help me improve it (please do, so that I can learn from you!). I will probably add more comments to my code explaining my thoughts while writing it in the future, just in case you want to know what I did in the code. (I want to do this because I struggled to understand Anki's source code and other add-ons.) While this will not be an add-on writing tutorial and I am by no means good at Python, it is my best hope that sharing my thoughts as a beginner will help other beginners better understand how to write Anki add-ons. This will take some time for me to do, as I need to get back to work and studying, but I am going to spend some time doing so.
### Thanks

I want to thank the creators of Anki and Obsidian for building such beautiful apps. I also want to thank my friend [Anis](https://github.com/qiaozhanrong) for helping me with the code.

--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import os
from . import files
from . import settings
from . import obsidian_url
from . import anki_importer
import aqt
from aqt import mw
from aqt import AnkiQt, gui_hooks
from aqt.qt import *
from aqt.utils import showInfo
from aqt.utils import tooltip
from PyQt5 import QtWidgets, QtCore


def read_files(root_path, relative_path):
    files_catalog = []
    if relative_path == "":
        paths = os.listdir(root_path)
    else:
        paths = os.listdir(root_path + "/" + relative_path)
    for path in paths:
        folder_is_ignored = False

        ignore_folder_s = settings.get_settings_by_name("ignore folder")

        if ignore_folder_s == "":
            ignore_folders = []  # nothing to ignore
        elif ignore_folder_s.find("\n") != -1:
            ignore_folders = ignore_folder_s.split("\n")
        else:
            ignore_folders = [ignore_folder_s]

        for ignore_folder in ignore_folders:
            ignore_folder = ignore_folder.lstrip(" ")
            ignore_folder = ignore_folder.rstrip(" ")
            ignore_folder = "/" + ignore_folder
            if relative_path.startswith(ignore_folder) and ignore_folder != "/":
                folder_is_ignored = True

        if path.find(".") != -1 and path.split(".")[-1] != "md" and path != ".trash":
            pass
        elif folder_is_ignored:
            pass
        elif path.endswith(".md"):
            new_path = relative_path + "/" + path
            new_file = files.File(root_path, new_path)
            files_catalog.append(new_file)
        else:
            try:
                new_path = relative_path + "/" + path
                files_catalog = files_catalog + read_files(root_path, new_path)
            except NotADirectoryError:
                pass
    return files_catalog

def get_bool(status_text):
    return status_text == "True" or status_text == "true"

def get_text(status_bool):
    if status_bool:
        return "True"
    else:
        return "False"

class ObsidiankiSettings(QDialog):
    def __init__(self, mw):
        super().__init__(mw)

        layout = QFormLayout(self)

        self.vault_path = QPlainTextEdit(self)
        self.templates_folder = QPlainTextEdit(self)
        self.archive_folder = QPlainTextEdit(self)

        self.mode = QLineEdit(self)
        self.type = QLineEdit(self)

        self.bold = QCheckBox(self)
        self.highlight = QCheckBox(self)
        self.italics = QCheckBox(self)
        self.image = QCheckBox(self)
        self.quote = QCheckBox(self)
        self.QuestionOrAnswer = QCheckBox(self)
        self.list = QCheckBox(self)
        self.inline_code = QCheckBox(self)
        self.block_code = QCheckBox(self)
        self.convert_button = QPushButton("Save and Convert")
        self.save_button = QPushButton("Save and Close")

        layout.addRow(QLabel("Vault Path: "))
        layout.addRow(QLabel("(Please use forward slashes for your vault path)"))
        layout.addRow(self.vault_path)

        layout.addRow(QLabel("Ignore Folders: "))
        layout.addRow(QLabel("(Notes in Anki in this obsidian folder will be ignored)"))
        layout.addRow(self.templates_folder)

        layout.addRow(QLabel("Archive Folder Name: "))
        layout.addRow(self.archive_folder)
        layout.addRow(QLabel("Anki Cards in this folder will be deleted"))

        layout.addRow(QLabel("Mode: "), self.mode)
        layout.addRow(QLabel("Mode: choose from word/line/heading/document"))
        layout.addRow(QLabel("Type: "), self.type)
        layout.addRow(QLabel("Type: choose from cloze/basic"))

        layout.addRow(QLabel("Bold to Cloze: "), self.bold)
        layout.addRow(QLabel("Italics to Cloze: "), self.italics)
        layout.addRow(QLabel("Highlight to Cloze: "), self.highlight)
        layout.addRow(QLabel("Image to Cloze: "), self.image)
        layout.addRow(QLabel("Quote to Cloze: "), self.quote)
        layout.addRow(QLabel("QA to Cloze"), self.QuestionOrAnswer)
        layout.addRow(QLabel("List to Cloze"), self.list)
        layout.addRow(QLabel("Inline Code to Cloze"), self.inline_code)
        layout.addRow(QLabel("Block Code to Cloze"), self.block_code)

        layout.addRow(self.save_button, self.convert_button)

        self.vault_path.setPlainText(settings.get_settings_by_name("vault path"))
        self.templates_folder.setPlainText(settings.get_settings_by_name("ignore folder"))
        self.archive_folder.setPlainText(settings.get_settings_by_name("archive folder"))

        self.mode.setText(settings.get_settings_by_name("mode"))
        self.type.setText(settings.get_settings_by_name("type"))

        self.bold.setChecked(get_bool(settings.get_settings_by_name("bold")))
        self.italics.setChecked(get_bool(settings.get_settings_by_name("italics")))
        self.highlight.setChecked(get_bool(settings.get_settings_by_name("highlight")))
        self.image.setChecked(get_bool(settings.get_settings_by_name("image")))
        self.quote.setChecked(get_bool(settings.get_settings_by_name("quote")))
        self.QuestionOrAnswer.setChecked(get_bool(settings.get_settings_by_name("QA")))
        self.list.setChecked(get_bool(settings.get_settings_by_name("list")))
        self.inline_code.setChecked(get_bool(settings.get_settings_by_name("inline code")))
        self.block_code.setChecked(get_bool(settings.get_settings_by_name("block code")))

        self.convert_button.setDefault(True)
        self.convert_button.clicked.connect(self.onOk)
        self.save_button.clicked.connect(self.onSave)
        self.show()

    def onOk(self):
        newSettings = {}
        newSettings["vault path"] = self.vault_path.toPlainText()
        newSettings["ignore folder"] = self.templates_folder.toPlainText()
        # newSettings["trash folder"] = self.trash_folder.text()
        newSettings["archive folder"] = self.archive_folder.toPlainText()
        newSettings["mode"] = self.mode.text()
        newSettings["type"] = self.type.text()
        newSettings["bold"] = get_text(self.bold.isChecked())
        newSettings["highlight"] = get_text(self.highlight.isChecked())
        newSettings["italics"] = get_text(self.italics.isChecked())
        newSettings["image"] = get_text(self.image.isChecked())
        newSettings["quote"] = get_text(self.quote.isChecked())
        newSettings["QA"] = get_text(self.QuestionOrAnswer.isChecked())
        newSettings["list"] = get_text(self.list.isChecked())
        newSettings["inline code"] = get_text(self.inline_code.isChecked())
        newSettings["block code"] = get_text(self.block_code.isChecked())
        settings.save_settings(newSettings)
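        # The banner-delimited block below is the actual conversion pipeline:
        # it reads every vault listed in the settings, rewrites the wiki-links
        # in each file to obsidian:// URLs, and hands the result to the
        # importer, which creates or updates the corresponding Anki notes.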
        ###############################################################################################################################
        ###############################################################################################################################
        ###############################################################################################################################

        if self.vault_path.toPlainText().find("\n") != -1:
            vault_paths = self.vault_path.toPlainText().split("\n")
        else:
            vault_paths = [self.vault_path.toPlainText()]

        my_files_catalog = []

        for a_vault_path in vault_paths:
            if a_vault_path != "":
                my_files_catalog = my_files_catalog + read_files(a_vault_path, "")

        length_of_files = len(my_files_catalog)
        for i in range(0, length_of_files):
            my_files_catalog[i].set_file_content(obsidian_url.process_obsidian_file(my_files_catalog[i].file_content, my_files_catalog))

        anki_importer.importer(my_files_catalog)

        ###############################################################################################################################
        ###############################################################################################################################
        ###############################################################################################################################

        mw.update()
        mw.reset(True)

        self.close()

    def onSave(self):
        newSettings = {}
        newSettings["vault path"] = self.vault_path.toPlainText()
        newSettings["ignore folder"] = self.templates_folder.toPlainText()
        newSettings["archive folder"] = self.archive_folder.toPlainText()

        newSettings["mode"] = self.mode.text()
        newSettings["type"] = self.type.text()
        newSettings["bold"] = get_text(self.bold.isChecked())
        newSettings["highlight"] = get_text(self.highlight.isChecked())
        newSettings["italics"] = get_text(self.italics.isChecked())
        newSettings["image"] = get_text(self.image.isChecked())
        newSettings["quote"] = get_text(self.quote.isChecked())
        newSettings["QA"] = get_text(self.QuestionOrAnswer.isChecked())
        newSettings["list"] = get_text(self.list.isChecked())
        newSettings["inline code"] = get_text(self.inline_code.isChecked())
        newSettings["block code"] = get_text(self.block_code.isChecked())
        settings.save_settings(newSettings)
        self.close()

action = QAction("Obsidianki 4", aqt.mw)
action.triggered.connect(lambda: ObsidiankiSettings(aqt.mw))

aqt.mw.form.menuTools.addAction(action)

--------------------------------------------------------------------------------
/src/anki_importer.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
import os
import shutil
from .
import settings
from aqt import mw
from anki.cards import Card
from anki.notes import Note
from anki.collection import Collection
from aqt.utils import showInfo

def importer(my_files_catalog):
    for file in my_files_catalog:
        importer_to_anki(file)
    empty_trash()
    delete_empty_decks()

def importer_to_anki(file):

    archive_folder_input = settings.get_settings_by_name("archive folder")
    if archive_folder_input == "":
        archive_folders = []  # no archive folder configured
    elif archive_folder_input.find("\n") != -1:
        archive_folders = archive_folder_input.split("\n")
    else:
        archive_folders = [archive_folder_input]

    is_in_archive_folder = False

    for archive_folder in archive_folders:
        archive_folder = archive_folder.lstrip(" ")
        archive_folder = archive_folder.rstrip(" ")
        archive_folder = "/" + archive_folder
        if file.get_file_relative_path().startswith(archive_folder) and archive_folder != "" and archive_folder != "\n":
            is_in_archive_folder = True

    if file.get_file_relative_path().startswith("/.trash"):
        uid = file.get_file_uid()
        note_list = mw.col.find_notes(uid)
        if len(note_list) > 0:
            for single_note_id in note_list:
                single_note = mw.col.getNote(single_note_id)
                try:
                    if single_note["UID"] == uid:
                        mw.col.remNotes([single_note_id])
                except KeyError:
                    pass
    elif is_in_archive_folder:  # or file.get_file_root_folder() == settings.get_settings_by_name("ignore folder")
        uid = file.get_file_uid()
        note_list = mw.col.find_notes(uid)
        if len(note_list) > 0:
            for single_note_id in note_list:
                single_note = mw.col.getNote(single_note_id)
                if single_note["UID"] == uid:
                    mw.col.remNotes([single_note_id])
    else:
        deck_id = mw.col.decks.id(file.get_deck_name())
        mw.col.decks.select(deck_id)
        card_model = mw.col.models.byName("Obsidianki4")
        uid = file.get_file_uid()
        note_list = mw.col.find_notes(uid)
        found_existing_file = False
        if len(note_list) > 0:
            for single_note_id in note_list:
                single_note = mw.col.getNote(single_note_id)
                if single_note.model() == card_model:
                    if single_note["UID"] == uid:
                        if file.get_cloze_or_basic():
                            single_note["Cloze"] = file.get_file_content()
                            single_note["Text"] = ""
                        else:
                            single_note["Cloze"] = "{{c1::}}"
                            single_note["Text"] = file.get_file_content()

                        back_extra = "Source: " + file.get_file_name_with_url()
                        single_note["Back Extra"] = back_extra

                        single_note.tags = []
                        for tag in file.get_tags():
                            single_note.tags.append(tag)
                        try:
                            card_ids = mw.col.card_ids_of_note(single_note_id)
                            mw.col.set_deck(card_ids, deck_id)
                        except AttributeError:
                            card_ids = mw.col.find_cards(uid)
                            mw.col.decks.setDeck(card_ids, deck_id)
                        single_note.flush()
                        found_existing_file = True
        if not found_existing_file:
            try:
                deck = mw.col.decks.get(deck_id)
                deck["mid"] = card_model["id"]
                mw.col.decks.save(deck)
                note_object = mw.col.newNote(deck_id)
                if file.get_cloze_or_basic():
                    note_object["Cloze"] = file.get_file_content()
                    note_object["Text"] = ""
                else:
                    note_object["Cloze"] = "{{c1::}}"
                    note_object["Text"] = file.get_file_content()
                note_object["UID"] = uid
                back_extra = "Source: " + file.get_file_name_with_url()
                note_object["Back Extra"] = back_extra
                for tag in file.get_tags():
                    note_object.tags.append(tag)

                mw.col.add_note(note_object, deck_id)
            except TypeError:
                pass
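# Note on the flow above: Obsidianki keys every note on the "UID" field that
# it writes into each file's front matter. importer_to_anki() therefore
# (1) deletes the matching note when the file sits in .trash or an archive
# folder, (2) updates an existing note with the same UID in place, and
# (3) creates a fresh "Obsidianki4" note only when no note with that UID exists.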
def delete_empty_decks():
    names_and_ids = mw.col.decks.all_names_and_ids()
    for name_and_id in names_and_ids:
        # I could not find what type this object is, so the only way for me to do it now is to use the string.
        name_and_id_segments = str(name_and_id).split("\n")
        deck_id = int(name_and_id_segments[0].split(": ")[1])

        if deck_is_empty(deck_id):
            mw.col.decks.rem(deck_id, True, True)

def empty_trash():
    path_s = settings.get_settings_by_name("vault path")

    if path_s == "":
        paths = []
    elif path_s.find("\n") != -1:
        paths = path_s.split("\n")
    else:
        paths = [path_s]

    for path in paths:
        path = path.lstrip(" ")
        path = path.rstrip(" ")
        # TODO: Add this to settings
        if path != "":
            trash_can_path = path + "/" + ".trash"
            try:
                trash_directories = os.listdir(trash_can_path)
                for trash_directory in trash_directories:
                    trash_directory_path = trash_can_path + "/" + trash_directory
                    try:
                        shutil.rmtree(trash_directory_path)
                    except NotADirectoryError:
                        os.remove(trash_directory_path)
            except (FileNotFoundError, NotADirectoryError):
                pass

def deck_is_empty(deck_id):
    if deck_id != 1:
        try:
            if mw.col.decks.card_count(deck_id, True) == 0:
                return True
        except AttributeError:
            cids = mw.col.decks.cids(deck_id, True)
            if len(cids) == 0:
                return True
    return False

--------------------------------------------------------------------------------
/src/files.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
from aqt import mw
from aqt.utils import showInfo
from . import processor

def gen_obsidian_url(vault_name, file_url):
    vault_url = "obsidian://open?vault=" + my_encode(vault_name)
    file_url = "&file=" + my_encode(file_url)
    return vault_url + file_url


def my_encode(string: str):
    # Percent-encode the UTF-8 bytes of the string for use in an obsidian:// URL.
    string = str(string.encode("utf-8"))
    string = string.replace("\\x", "%")
    string = string.replace(" ", "%20")
    string = string.replace("/", "%2F")
    string = string.lstrip("\'b")
    string = string.rstrip("\'")
    string = capitalize_unicode(string)
    return string


def capitalize_unicode(string):
    # Upper-case the two hex digits that follow every "%" escape.
    new = []
    position = -5
    for index in range(0, len(string)):
        if string[index] == "%":
            position = index
            new.append(string[index])
        elif index == position + 1 or index == position + 2:
            new.append(string[index].capitalize())
        else:
            new.append(string[index])
    return "".join(new)


class File:
    # important parameters
    file_name = ""
    vault_path = ""
    relative_path = ""
    full_path = ""
    uid = ""
    file_content = ""
    cloze_or_basic = True
    obsidian_url = ""
    metadata = {}

    def __init__(self, vault_path, relative_path):
        self.vault_path = vault_path
        self.relative_path = relative_path
        self.full_path = self.vault_path + "/" + self.relative_path
        tmp = processor.read_file(self.full_path)
        self.uid = tmp[0]
        self.file_content = tmp[1]
        self.cloze_or_basic = tmp[2]
        self.metadata = tmp[3]
        self.obsidian_url = self.generate_obsidian_url()
        self.file_name = self.generate_file_name()

    def get_deck_name(self):
        root_name = self.vault_path.split("/")[-1]
        sublevel_name_segments = self.relative_path.split("/")[:-1]
        sublevel_name = "::".join(sublevel_name_segments)
        deck_name = root_name + sublevel_name
        return deck_name

    def get_file_root_folder(self):
        tmp = self.relative_path.lstrip("/")
        root_folder = tmp.split("/")[0]
        return root_folder

    def get_file_full_path(self):
        return self.full_path

    def get_file_relative_path(self):
        return self.relative_path

    def generate_obsidian_url(self):
        vault_name = self.vault_path.split("/")[-1]
        file_url_segments = self.relative_path.split(".")[:-1]
        file_url = ".".join(file_url_segments)
        return gen_obsidian_url(vault_name, file_url)

    def get_obsidian_url(self):
        return self.obsidian_url

    def generate_file_name(self):
        file_name = self.relative_path.split("/")[-1]
        file_name_segments = file_name.split(".")[:-1]
        file_name = ".".join(file_name_segments)
        return file_name

    def get_file_name(self):
        return self.file_name

    def get_file_name_with_url(self):
        url = self.get_obsidian_url()
        name = self.get_file_name()
        name_with_url = "<a href=\"" + url + "\">" + name + "</a>"
        return name_with_url

    def get_file_uid(self):
        return self.uid

    def get_cloze_or_basic(self):
        return self.cloze_or_basic

    def set_file_content(self, file_content):
        self.file_content = file_content

    def get_file_content(self):
        return self.file_content

    def get_tags(self):
        tag_line = "[]"
        try:
            tag_line = self.metadata["tags"]
        except KeyError:
            pass
        tag_line = tag_line.lstrip("[")
        tag_line = tag_line.rstrip("]")
        if tag_line.find("/") != -1:
            tag_line = tag_line.replace("/", "::")
        tags = tag_line.split(",")
        for i in range(0, len(tags)):
            tags[i] = tags[i].lstrip(" ")
            tags[i] = tags[i].rstrip(" ")
            tags[i] = tags[i].replace(" ", "_")
        return tags
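# For reference, the tag conversion implemented by get_tags() maps, e.g.:
#   tags: [tag1/tag1.1/tag1.1.1, tag2/tag2.1]
# to the Anki hierarchical tags:
#   ["tag1::tag1.1::tag1.1.1", "tag2::tag2.1"]
# (nesting "/" becomes "::", and spaces inside a tag become "_").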
--------------------------------------------------------------------------------
/src/manifest.json:
--------------------------------------------------------------------------------
{
    "name": "Obsidianki 4",
    "package": "Obsidianki 4"
}

--------------------------------------------------------------------------------
/src/markdown2/markdown2.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# Copyright (c) 2012 Trent Mick.
# Copyright (c) 2007-2008 ActiveState Corp.
# License: MIT (http://www.opensource.org/licenses/mit-license.php)

r"""A fast and complete Python implementation of Markdown.

[from http://daringfireball.net/projects/markdown/]
> Markdown is a text-to-HTML filter; it translates an easy-to-read /
> easy-to-write structured text format into HTML. Markdown's text
> format is most similar to that of plain text email, and supports
> features such as headers, *emphasis*, code blocks, blockquotes, and
> links.
>
> Markdown's syntax is designed not as a generic markup language, but
> specifically to serve as a front-end to (X)HTML. You can use span-level
> HTML tags anywhere in a Markdown document, and you can use block level
> HTML tags (like <div> and <table> as well).

Module usage:

    >>> import markdown2
    >>> markdown2.markdown("*boo!*")  # or use `html = markdown_path(PATH)`
    u'<p><em>boo!</em></p>\n'

    >>> markdowner = Markdown()
    >>> markdowner.convert("*boo!*")
    u'<p><em>boo!</em></p>\n'
    >>> markdowner.convert("**boom!**")
    u'<p><strong>boom!</strong></p>\n'

This implementation of Markdown implements the full "core" syntax plus a
number of extras (e.g., code syntax coloring, footnotes) as described on
<https://github.com/trentm/python-markdown2/wiki/Extras>.
"""

cmdln_desc = """A fast and complete Python implementation of Markdown, a
text-to-HTML conversion tool for web writers.

Supported extra syntax options (see -x|--extras option below and
see <https://github.com/trentm/python-markdown2/wiki/Extras> for details):

* break-on-newline: Replace single new line characters with <br> when True
* code-friendly: Disable _ and __ for em and strong.
* cuddled-lists: Allow lists to be cuddled to the preceding paragraph.
* fenced-code-blocks: Allows a code block to not have to be indented
  by fencing it with '```' on a line before and after. Based on
  <http://github.github.com/github-flavored-markdown/> with support for
  syntax highlighting.
* footnotes: Support footnotes as in use on daringfireball.net and
  implemented in other Markdown processors (tho not in Markdown.pl v1.0.1).
* header-ids: Adds "id" attributes to headers. The id value is a slug of
  the header text.
* highlightjs-lang: Allows specifying the language which used for syntax
  highlighting when using fenced-code-blocks and highlightjs.
* html-classes: Takes a dict mapping html tag names (lowercase) to a
  string to use for a "class" tag attribute. Currently only supports "img",
  "table", "pre" and "code" tags. Add an issue if you require this for other
  tags.
* link-patterns: Auto-link given regex patterns in text (e.g. bug number
  references, revision number references).
* markdown-in-html: Allow the use of `markdown="1"` in a block HTML tag to
  have markdown processing be done on its contents. Similar to
  <http://michelf.com/projects/php-markdown/extra/#markdown-attr> but with
  some limitations.
* metadata: Extract metadata from a leading '---'-fenced block.
  See <https://github.com/trentm/python-markdown2/wiki/metadata> for details.
* nofollow: Add `rel="nofollow"` to add `<a>` tags with an href. See
  <http://en.wikipedia.org/wiki/Nofollow>.
* numbering: Support of generic counters. Non standard extension to
  allow sequential numbering of figures, tables, equations, exhibits etc.
* pyshell: Treats unindented Python interactive shell sessions as <code>
  blocks.
* smarty-pants: Replaces ' and " with curly quotation marks or curly
  apostrophes. Replaces --, ---, ..., and . . . with en dashes, em dashes,
  and ellipses.
* spoiler: A special kind of blockquote commonly hidden behind a
  click on SO. Syntax per <http://meta.stackexchange.com/a/72878>.
* strike: text inside of double tilde is ~~strikethrough~~
* tag-friendly: Requires atx style headers to have a space between the # and
  the header text. Useful for applications that require twitter style tags to
  pass through the parser.
* tables: Tables using the same format as GFM
  <https://help.github.com/articles/github-flavored-markdown#tables> and
  PHP-Markdown Extra <https://michelf.ca/projects/php-markdown/extra/#table>.
* toc: The returned HTML string gets a new "toc_html" attribute which is
  a Table of Contents for the document. (experimental)
* use-file-vars: Look for an Emacs-style markdown-extras file variable to turn
  on Extras.
* wiki-tables: Google Code Wiki-style tables. See
  <http://code.google.com/p/support/wiki/WikiSyntax#Tables>.
* xml: Passes one-liner processing instructions and namespaced XML tags.
"""

# Dev Notes:
# - Python's regex syntax doesn't have '\z', so I'm using '\Z'. I'm
#   not yet sure if there are implications with this. Compare 'pydoc sre'
#   and 'perldoc perlre'.

__version_info__ = (2, 4, 0)
__version__ = '.'.join(map(str, __version_info__))
__author__ = "Trent Mick"

import sys
import re
import logging
from hashlib import sha256
import optparse
from random import random, randint
import codecs
from collections import defaultdict


# ---- Python version compat

# Use `bytes` for byte strings and `unicode` for unicode strings (str in Py3).
if sys.version_info[0] <= 2:
    py3 = False
    try:
        bytes
    except NameError:
        bytes = str
    base_string_type = basestring
elif sys.version_info[0] >= 3:
    py3 = True
    unicode = str
    base_string_type = str

# ---- globals

DEBUG = False
log = logging.getLogger("markdown")

DEFAULT_TAB_WIDTH = 4


SECRET_SALT = bytes(randint(0, 1000000))
# MD5 function was previously used for this; the "md5" prefix was kept for
# backwards compatibility.
def _hash_text(s):
    return 'md5-' + sha256(SECRET_SALT + s.encode("utf-8")).hexdigest()[32:]

# Table of hash values for escaped characters:
g_escape_table = dict([(ch, _hash_text(ch))
                       for ch in '\\`*_{}[]()>#+-.!'])

# Ampersand-encoding based entirely on Nat Irons's Amputator MT plugin:
# http://bumppo.net/projects/amputator/
_AMPERSAND_RE = re.compile(r'&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)')


# ---- exceptions
class MarkdownError(Exception):
    pass


# ---- public api

def markdown_path(path, encoding="utf-8",
                  html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
                  safe_mode=None, extras=None, link_patterns=None,
                  footnote_title=None, footnote_return_symbol=None,
                  use_file_vars=False):
    fp = codecs.open(path, 'r', encoding)
    text = fp.read()
    fp.close()
    return Markdown(html4tags=html4tags, tab_width=tab_width,
                    safe_mode=safe_mode, extras=extras,
                    link_patterns=link_patterns,
                    footnote_title=footnote_title,
                    footnote_return_symbol=footnote_return_symbol,
                    use_file_vars=use_file_vars).convert(text)


def markdown(text, html4tags=False, tab_width=DEFAULT_TAB_WIDTH,
             safe_mode=None, extras=None, link_patterns=None,
             footnote_title=None, footnote_return_symbol=None,
             use_file_vars=False, cli=False) -> object:
    """

    @rtype: object
    """
    return Markdown(html4tags=html4tags, tab_width=tab_width,
                    safe_mode=safe_mode, extras=extras,
                    link_patterns=link_patterns,
                    footnote_title=footnote_title,
                    footnote_return_symbol=footnote_return_symbol,
                    use_file_vars=use_file_vars, cli=cli).convert(text)


class Markdown(object):
    # The dict of "extras" to enable in processing -- a mapping of
    # extra name to argument for the extra. Most extras do not have an
    # argument, in which case the value is None.
    #
    # This can be set via (a) subclassing and (b) the constructor
    # "extras" argument.
198 | extras = None
199 |
200 | urls = None
201 | titles = None
202 | html_blocks = None
203 | html_spans = None
204 | html_removed_text = "{(#HTML#)}" # placeholder removed text that does not trigger bold
205 | html_removed_text_compat = "[HTML_REMOVED]" # for compat with markdown.py
206 |
207 | _toc = None
208 |
209 | # Used to track when we're inside an ordered or unordered list
210 | # (see _ProcessListItems() for details):
211 | list_level = 0
212 |
213 | _ws_only_line_re = re.compile(r"^[ \t]+$", re.M)
214 |
215 | def __init__(self, html4tags=False, tab_width=4, safe_mode=None,
216 | extras=None, link_patterns=None,
217 | footnote_title=None, footnote_return_symbol=None,
218 | use_file_vars=False, cli=False):
219 | if html4tags:
220 | self.empty_element_suffix = ">"
221 | else:
222 | self.empty_element_suffix = " />"
223 | self.tab_width = tab_width
224 | self.tab = tab_width * " "
225 |
226 | # For compatibility with earlier markdown2.py and with
227 | # markdown.py's safe_mode being a boolean,
228 | # safe_mode == True -> "replace"
229 | if safe_mode is True:
230 | self.safe_mode = "replace"
231 | else:
232 | self.safe_mode = safe_mode
233 |
234 | # Massaging and building the "extras" info.
235 | if self.extras is None:
236 | self.extras = {}
237 | elif not isinstance(self.extras, dict):
238 | self.extras = dict([(e, None) for e in self.extras])
239 | if extras:
240 | if not isinstance(extras, dict):
241 | extras = dict([(e, None) for e in extras])
242 | self.extras.update(extras)
243 | assert isinstance(self.extras, dict)
244 |
245 | if "toc" in self.extras:
246 | if "header-ids" not in self.extras:
247 | self.extras["header-ids"] = None # "toc" implies "header-ids"
248 |
249 | if self.extras["toc"] is None:
250 | self._toc_depth = 6
251 | else:
252 | self._toc_depth = self.extras["toc"].get("depth", 6)
253 | self._instance_extras = self.extras.copy()
254 |
255 | self.link_patterns = link_patterns
256 | self.footnote_title = footnote_title
257 | self.footnote_return_symbol = footnote_return_symbol
258 | self.use_file_vars = use_file_vars
259 | self._outdent_re = re.compile(r'^(\t|[ ]{1,%d})' % tab_width, re.M)
260 | self.cli = cli
261 |
262 | self._escape_table = g_escape_table.copy()
263 | if "smarty-pants" in self.extras:
264 | self._escape_table['"'] = _hash_text('"')
265 | self._escape_table["'"] = _hash_text("'")
266 |
267 | def reset(self):
268 | self.urls = {}
269 | self.titles = {}
270 | self.html_blocks = {}
271 | self.html_spans = {}
272 | self.list_level = 0
273 | self.extras = self._instance_extras.copy()
274 | if "footnotes" in self.extras:
275 | self.footnotes = {}
276 | self.footnote_ids = []
277 | if "header-ids" in self.extras:
278 | self._count_from_header_id = defaultdict(int)
279 | if "metadata" in self.extras:
280 | self.metadata = {}
281 | self._toc = None
282 |
283 | # Per <https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#attr-rel> "rel"
284 | # should only be used in <a> tags with an "href" attribute.
285 |
286 | # Opens the linked document in a new window or tab
287 | # should only be used in <a> tags with an "href" attribute.
288 | # same with _a_nofollow
289 | _a_nofollow_or_blank_links = re.compile(r"""
290 | <(a)
291 | (
292 | [^>]*
293 | href= # href is required
294 | ['"]? # HTML5 attribute values do not have to be quoted
295 | [^#'"] # We don't want to match href values that start with # (like footnotes)
296 | )
297 | """,
298 | re.IGNORECASE | re.VERBOSE
299 | )
300 |
301 | def convert(self, text):
302 | """Convert the given text."""
303 | # Main function. The order in which other subs are called here is
304 | # essential. Link and image substitutions need to happen before
305 | # _EscapeSpecialChars(), so that any *'s or _'s in the <a>
306 | # and <img> tags get encoded.
307 |
308 | # Clear the global hashes. If we don't clear these, you get conflicts
309 | # from other articles when generating a page which contains more than
310 | # one article (e.g. an index page that shows the N most recent
311 | # articles):
312 | self.reset()
313 |
314 | if not isinstance(text, unicode):
315 | # TODO: perhaps shouldn't presume UTF-8 for string input?
316 | text = unicode(text, 'utf-8')
317 |
318 | if self.use_file_vars:
319 | # Look for emacs-style file variable hints.
320 | emacs_vars = self._get_emacs_vars(text)
321 | if "markdown-extras" in emacs_vars:
322 | splitter = re.compile("[ ,]+")
323 | for e in splitter.split(emacs_vars["markdown-extras"]):
324 | if '=' in e:
325 | ename, earg = e.split('=', 1)
326 | try:
327 | earg = int(earg)
328 | except ValueError:
329 | pass
330 | else:
331 | ename, earg = e, None
332 | self.extras[ename] = earg
333 |
334 | # Standardize line endings:
335 | text = text.replace("\r\n", "\n")
336 | text = text.replace("\r", "\n")
337 |
338 | # Make sure $text ends with a couple of newlines:
339 | text += "\n\n"
340 |
341 | # Convert all tabs to spaces.
342 | text = self._detab(text)
343 |
344 | # Strip any lines consisting only of spaces and tabs.
345 | # This makes subsequent regexen easier to write, because we can
346 | # match consecutive blank lines with /\n+/ instead of something
347 | # contorted like /[ \t]*\n+/ .
348 | text = self._ws_only_line_re.sub("", text)
349 |
350 | # strip metadata from head and extract
351 | if "metadata" in self.extras:
352 | text = self._extract_metadata(text)
353 |
354 | text = self.preprocess(text)
355 |
356 | if "fenced-code-blocks" in self.extras and not self.safe_mode:
357 | text = self._do_fenced_code_blocks(text)
358 |
359 | if self.safe_mode:
360 | text = self._hash_html_spans(text)
361 |
362 | # Turn block-level HTML blocks into hash entries
363 | text = self._hash_html_blocks(text, raw=True)
364 |
365 | if "fenced-code-blocks" in self.extras and self.safe_mode:
366 | text = self._do_fenced_code_blocks(text)
367 |
368 | # Because numbering references aren't links (yet?), we can do everything associated with counters
369 | # before we get started
370 | if "numbering" in self.extras:
371 | text = self._do_numbering(text)
372 |
373 | # Strip link definitions, store in hashes.
374 | if "footnotes" in self.extras: 375 | # Must do footnotes first because an unlucky footnote defn 376 | # looks like a link defn: 377 | # [^4]: this "looks like a link defn" 378 | text = self._strip_footnote_definitions(text) 379 | text = self._strip_link_definitions(text) 380 | 381 | text = self._run_block_gamut(text) 382 | 383 | if "footnotes" in self.extras: 384 | text = self._add_footnotes(text) 385 | 386 | text = self.postprocess(text) 387 | 388 | text = self._unescape_special_chars(text) 389 | 390 | if self.safe_mode: 391 | text = self._unhash_html_spans(text) 392 | # return the removed text warning to its markdown.py compatible form 393 | text = text.replace(self.html_removed_text, self.html_removed_text_compat) 394 | 395 | do_target_blank_links = "target-blank-links" in self.extras 396 | do_nofollow_links = "nofollow" in self.extras 397 | 398 | if do_target_blank_links and do_nofollow_links: 399 | text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow noopener" target="_blank"\2', text) 400 | elif do_target_blank_links: 401 | text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="noopener" target="_blank"\2', text) 402 | elif do_nofollow_links: 403 | text = self._a_nofollow_or_blank_links.sub(r'<\1 rel="nofollow"\2', text) 404 | 405 | if "toc" in self.extras and self._toc: 406 | self._toc_html = calculate_toc_html(self._toc) 407 | 408 | # Prepend toc html to output 409 | if self.cli: 410 | text = '{}\n{}'.format(self._toc_html, text) 411 | 412 | text += "\n" 413 | 414 | # Attach attrs to output 415 | rv = UnicodeWithAttrs(text) 416 | 417 | if "toc" in self.extras and self._toc: 418 | rv.toc_html = self._toc_html 419 | 420 | if "metadata" in self.extras: 421 | rv.metadata = self.metadata 422 | return rv 423 | 424 | def postprocess(self, text): 425 | """A hook for subclasses to do some postprocessing of the html, if 426 | desired. This is called before unescaping of special chars and 427 | unhashing of raw HTML spans. 428 | """ 429 | return text 430 | 431 | def preprocess(self, text): 432 | """A hook for subclasses to do some preprocessing of the Markdown, if 433 | desired. This is called after basic formatting of the text, but prior 434 | to any extras, safe mode, etc. processing. 435 | """ 436 | return text 437 | 438 | # Is metadata if the content starts with optional '---'-fenced `key: value` 439 | # pairs. E.g. (indented for presentation): 440 | # --- 441 | # foo: bar 442 | # another-var: blah blah 443 | # --- 444 | # # header 445 | # or: 446 | # foo: bar 447 | # another-var: blah blah 448 | # 449 | # # header 450 | _meta_data_pattern = re.compile(r'^(?:---[\ \t]*\n)?((?:[\S\w]+\s*:(?:\n+[ \t]+.*)+)|(?:.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)|(?:\s*[\S\w]+\s*:(?! >)[ \t]*.*\n?))(?:---[\ \t]*\n)?', re.MULTILINE) 451 | _key_val_pat = re.compile(r"[\S\w]+\s*:(?! 
>)[ \t]*.*\n?", re.MULTILINE)
452 | # this allows key: >
453 | # value
454 | # continues over multiple lines
455 | _key_val_block_pat = re.compile(
456 | r"(.*:\s+>\n\s+[\S\s]+?)(?=\n\w+\s*:\s*\w+\n|\Z)", re.MULTILINE
457 | )
458 | _key_val_list_pat = re.compile(
459 | r"^-(?:[ \t]*([^\n]*)(?:[ \t]*[:-][ \t]*(\S+))?)(?:\n((?:[ \t]+[^\n]+\n?)+))?",
460 | re.MULTILINE,
461 | )
462 | _key_val_dict_pat = re.compile(
463 | r"^([^:\n]+)[ \t]*:[ \t]*([^\n]*)(?:((?:\n[ \t]+[^\n]+)+))?", re.MULTILINE
464 | ) # grp0: key, grp1: value, grp2: multiline value
465 | _meta_data_fence_pattern = re.compile(r'^---[\ \t]*\n', re.MULTILINE)
466 | _meta_data_newline = re.compile("^\n", re.MULTILINE)
467 |
468 | def _extract_metadata(self, text):
469 | if text.startswith("---"):
470 | fence_splits = re.split(self._meta_data_fence_pattern, text, maxsplit=2)
471 | metadata_content = fence_splits[1]
472 | match = re.findall(self._meta_data_pattern, metadata_content)
473 | if not match:
474 | return text
475 | tail = fence_splits[2]
476 | else:
477 | metadata_split = re.split(self._meta_data_newline, text, maxsplit=1)
478 | metadata_content = metadata_split[0]
479 | match = re.findall(self._meta_data_pattern, metadata_content)
480 | if not match:
481 | return text
482 | tail = metadata_split[1]
483 |
484 | def parse_structured_value(value):
485 | vs = value.lstrip()
486 | vs = value.replace(v[: len(value) - len(vs)], "\n")[1:]
487 |
488 | # List
489 | if vs.startswith("-"):
490 | r = []
491 | for match in re.findall(self._key_val_list_pat, vs):
492 | if match[0] and not match[1] and not match[2]:
493 | r.append(match[0].strip())
494 | elif match[0] == ">" and not match[1] and match[2]:
495 | r.append(match[2].strip())
496 | elif match[0] and match[1]:
497 | r.append({match[0].strip(): match[1].strip()})
498 | elif not match[0] and not match[1] and match[2]:
499 | r.append(parse_structured_value(match[2]))
500 | else:
501 | # Broken case
502 | pass
503 |
504 | return r
505 |
506 | # Dict
507 | else:
508 | return {
509 | match[0].strip(): (
510 | match[1].strip()
511 | if match[1]
512 | else parse_structured_value(match[2])
513 | )
514 | for match in re.findall(self._key_val_dict_pat, vs)
515 | }
516 |
517 | for item in match:
518 |
519 | k, v = item.split(":", 1)
520 |
521 | # Multiline value
522 | if v[:3] == " >\n":
523 | self.metadata[k.strip()] = v[3:].strip()
524 |
525 | # Empty value
526 | elif v == "\n":
527 | self.metadata[k.strip()] = ""
528 |
529 | # Structured value
530 | elif v[0] == "\n":
531 | self.metadata[k.strip()] = parse_structured_value(v)
532 |
533 | # Simple value
534 | else:
535 | self.metadata[k.strip()] = v.strip()
536 |
537 | return tail
538 |
539 | _emacs_oneliner_vars_pat = re.compile(r"-\*-\s*([^\r\n]*?)\s*-\*-", re.UNICODE)
540 | # This regular expression is intended to match blocks like this:
541 | # PREFIX Local Variables: SUFFIX
542 | # PREFIX mode: Tcl SUFFIX
543 | # PREFIX End: SUFFIX
544 | # Some notes:
545 | # - "[ \t]" is used instead of "\s" to specifically exclude newlines
546 | # - "(\r\n|\n|\r)" is used instead of "$" because the sre engine does
547 | # not like anything other than Unix-style line terminators.
548 | _emacs_local_vars_pat = re.compile(r"""^
549 | (?P<prefix>(?:[^\r\n|\n|\r])*?)
550 | [\ \t]*Local\ Variables:[\ \t]*
551 | (?P<suffix>.*?)(?:\r\n|\n|\r)
552 | (?P<content>.*?\1End:)
553 | """, re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)
554 |
555 | def _get_emacs_vars(self, text):
556 | """Return a dictionary of emacs-style local variables.
557 | 558 | Parsing is done loosely according to this spec (and according to 559 | some in-practice deviations from this): 560 | http://www.gnu.org/software/emacs/manual/html_node/emacs/Specifying-File-Variables.html#Specifying-File-Variables 561 | """ 562 | emacs_vars = {} 563 | SIZE = pow(2, 13) # 8kB 564 | 565 | # Search near the start for a '-*-'-style one-liner of variables. 566 | head = text[:SIZE] 567 | if "-*-" in head: 568 | match = self._emacs_oneliner_vars_pat.search(head) 569 | if match: 570 | emacs_vars_str = match.group(1) 571 | assert '\n' not in emacs_vars_str 572 | emacs_var_strs = [s.strip() for s in emacs_vars_str.split(';') 573 | if s.strip()] 574 | if len(emacs_var_strs) == 1 and ':' not in emacs_var_strs[0]: 575 | # While not in the spec, this form is allowed by emacs: 576 | # -*- Tcl -*- 577 | # where the implied "variable" is "mode". This form 578 | # is only allowed if there are no other variables. 579 | emacs_vars["mode"] = emacs_var_strs[0].strip() 580 | else: 581 | for emacs_var_str in emacs_var_strs: 582 | try: 583 | variable, value = emacs_var_str.strip().split(':', 1) 584 | except ValueError: 585 | log.debug("emacs variables error: malformed -*- " 586 | "line: %r", emacs_var_str) 587 | continue 588 | # Lowercase the variable name because Emacs allows "Mode" 589 | # or "mode" or "MoDe", etc. 590 | emacs_vars[variable.lower()] = value.strip() 591 | 592 | tail = text[-SIZE:] 593 | if "Local Variables" in tail: 594 | match = self._emacs_local_vars_pat.search(tail) 595 | if match: 596 | prefix = match.group("prefix") 597 | suffix = match.group("suffix") 598 | lines = match.group("content").splitlines(0) 599 | # print "prefix=%r, suffix=%r, content=%r, lines: %s"\ 600 | # % (prefix, suffix, match.group("content"), lines) 601 | 602 | # Validate the Local Variables block: proper prefix and suffix 603 | # usage. 604 | for i, line in enumerate(lines): 605 | if not line.startswith(prefix): 606 | log.debug("emacs variables error: line '%s' " 607 | "does not use proper prefix '%s'" 608 | % (line, prefix)) 609 | return {} 610 | # Don't validate suffix on last line. Emacs doesn't care, 611 | # neither should we. 612 | if i != len(lines)-1 and not line.endswith(suffix): 613 | log.debug("emacs variables error: line '%s' " 614 | "does not use proper suffix '%s'" 615 | % (line, suffix)) 616 | return {} 617 | 618 | # Parse out one emacs var per line. 619 | continued_for = None 620 | for line in lines[:-1]: # no var on the last line ("PREFIX End:") 621 | if prefix: line = line[len(prefix):] # strip prefix 622 | if suffix: line = line[:-len(suffix)] # strip suffix 623 | line = line.strip() 624 | if continued_for: 625 | variable = continued_for 626 | if line.endswith('\\'): 627 | line = line[:-1].rstrip() 628 | else: 629 | continued_for = None 630 | emacs_vars[variable] += ' ' + line 631 | else: 632 | try: 633 | variable, value = line.split(':', 1) 634 | except ValueError: 635 | log.debug("local variables error: missing colon " 636 | "in local variables entry: '%s'" % line) 637 | continue 638 | # Do NOT lowercase the variable name, because Emacs only 639 | # allows "mode" (and not "Mode", "MoDe", etc.) in this block. 640 | value = value.strip() 641 | if value.endswith('\\'): 642 | value = value[:-1].rstrip() 643 | continued_for = variable 644 | else: 645 | continued_for = None 646 | emacs_vars[variable] = value 647 | 648 | # Unquote values. 
649 | for var, val in list(emacs_vars.items()):
650 | if len(val) > 1 and (val.startswith('"') and val.endswith('"')
651 | or val.startswith("'") and val.endswith("'")):
652 | emacs_vars[var] = val[1:-1]
653 |
654 | return emacs_vars
655 |
656 | def _detab_line(self, line):
657 | r"""Recursively convert tabs to spaces in a single line.
658 |
659 | Called from _detab()."""
660 | if '\t' not in line:
661 | return line
662 | chunk1, chunk2 = line.split('\t', 1)
663 | chunk1 += (' ' * (self.tab_width - len(chunk1) % self.tab_width))
664 | output = chunk1 + chunk2
665 | return self._detab_line(output)
666 |
667 | def _detab(self, text):
668 | r"""Iterate text line by line and convert tabs to spaces.
669 |
670 | >>> m = Markdown()
671 | >>> m._detab("\tfoo")
672 | ' foo'
673 | >>> m._detab(" \tfoo")
674 | ' foo'
675 | >>> m._detab("\t foo")
676 | ' foo'
677 | >>> m._detab(" foo")
678 | ' foo'
679 | >>> m._detab(" foo\n\tbar\tblam")
680 | ' foo\n bar blam'
681 | """
682 | if '\t' not in text:
683 | return text
684 | output = []
685 | for line in text.splitlines():
686 | output.append(self._detab_line(line))
687 | return '\n'.join(output)
688 |
689 | # I broke out the html5 tags here and add them to _block_tags_a and
690 | # _block_tags_b. This way html5 tags are easy to keep track of.
691 | _html5tags = '|article|aside|header|hgroup|footer|nav|section|figure|figcaption'
692 |
693 | _block_tags_a = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del'
694 | _block_tags_a += _html5tags
695 |
696 | _strict_tag_block_re = re.compile(r"""
697 | ( # save in \1
698 | ^ # start of line (with re.M)
699 | <(%s) # start tag = \2
700 | \b # word break
701 | (.*\n)*? # any number of lines, minimally matching
702 | </\2> # the matching end tag
703 | [ \t]* # trailing spaces/tabs
704 | (?=\n+|\Z) # followed by a newline or end of document
705 | )
706 | """ % _block_tags_a,
707 | re.X | re.M)
708 |
709 | _block_tags_b = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math'
710 | _block_tags_b += _html5tags
711 |
712 | _liberal_tag_block_re = re.compile(r"""
713 | ( # save in \1
714 | ^ # start of line (with re.M)
715 | <(%s) # start tag = \2
716 | \b # word break
717 | (.*\n)*? # any number of lines, minimally matching
718 | .*</\2> # the matching end tag
719 | [ \t]* # trailing spaces/tabs
720 | (?=\n+|\Z) # followed by a newline or end of document
721 | )
722 | """ % _block_tags_b,
723 | re.X | re.M)
724 |
725 | _html_markdown_attr_re = re.compile(
726 | r'''\s+markdown=("1"|'1')''')
727 | def _hash_html_block_sub(self, match, raw=False):
728 | html = match.group(1)
729 | if raw and self.safe_mode:
730 | html = self._sanitize_html(html)
731 | elif 'markdown-in-html' in self.extras and 'markdown=' in html:
732 | first_line = html.split('\n', 1)[0]
733 | m = self._html_markdown_attr_re.search(first_line)
734 | if m:
735 | lines = html.split('\n')
736 | middle = '\n'.join(lines[1:-1])
737 | last_line = lines[-1]
738 | first_line = first_line[:m.start()] + first_line[m.end():]
739 | f_key = _hash_text(first_line)
740 | self.html_blocks[f_key] = first_line
741 | l_key = _hash_text(last_line)
742 | self.html_blocks[l_key] = last_line
743 | return ''.join(["\n\n", f_key,
744 | "\n\n", middle, "\n\n",
745 | l_key, "\n\n"])
746 | key = _hash_text(html)
747 | self.html_blocks[key] = html
748 | return "\n\n" + key + "\n\n"
749 |
750 | def _hash_html_blocks(self, text, raw=False):
751 | """Hashify HTML blocks
752 |
753 | We only want to do this for block-level HTML tags, such as headers,
754 | lists, and tables. That's because we still want to wrap <p>s around
755 | "paragraphs" that are wrapped in non-block-level tags, such as anchors,
756 | phrase emphasis, and spans. The list of tags we're looking for is
757 | hard-coded.
758 |
759 | @param raw {boolean} indicates if these are raw HTML blocks in
760 | the original source. It makes a difference in "safe" mode.
761 | """
762 | if '<' not in text:
763 | return text
764 |
765 | # Pass `raw` value into our calls to self._hash_html_block_sub.
766 | hash_html_block_sub = _curry(self._hash_html_block_sub, raw=raw)
767 |
768 | # First, look for nested blocks, e.g.:
769 | # <div>
770 | # <div>
771 | # tags for inner block must be indented.
772 | # </div>
773 | # </div>
774 | #
775 | # The outermost tags must start at the left margin for this to match, and
776 | # the inner nested divs must be indented.
777 | # We need to do this before the next, more liberal match, because the next
778 | # match will start at the first `<div>` and stop at the first `</div>`.
779 | text = self._strict_tag_block_re.sub(hash_html_block_sub, text)
780 |
781 | # Now match more liberally, simply from `\n<tag>` to `</tag>\n`
782 | text = self._liberal_tag_block_re.sub(hash_html_block_sub, text)
783 |
784 | # Special case just for <hr />. It was easier to make a special
785 | # case than to make the other regex more complicated.
786 | if "<hr" in text:
787 | _hr_tag_re = _hr_tag_re_from_tab_width(self.tab_width)
788 | text = _hr_tag_re.sub(hash_html_block_sub, text)
789 |
790 | # Special case for standalone HTML comments:
791 | if "<!--" in text:
792 | start = 0
793 | while True:
794 | # Delimiters for next comment block.
795 | try:
796 | start_idx = text.index("<!--", start)
797 | except ValueError:
798 | break
799 | try:
800 | end_idx = text.index("-->", start_idx) + 3
801 | except ValueError:
802 | break
803 |
804 | # Start position for next comment block search.
805 | start = end_idx
806 |
807 | # Validate whitespace before comment.
808 | if start_idx:
809 | # - Up to `tab_width - 1` spaces before start_idx.
810 | for i in range(self.tab_width - 1):
811 | if text[start_idx - 1] != ' ':
812 | break
813 | start_idx -= 1
814 | if start_idx == 0:
815 | break
816 | # - Must be preceded by 2 newlines or hit the start of
817 | # the document.
818 | if start_idx == 0:
819 | pass
820 | elif start_idx == 1 and text[0] == '\n':
821 | start_idx = 0 # to match minute detail of Markdown.pl regex
822 | elif text[start_idx-2:start_idx] == '\n\n':
823 | pass
824 | else:
825 | break
826 |
827 | # Validate whitespace after comment.
828 | # - Any number of spaces and tabs.
829 | while end_idx < len(text):
830 | if text[end_idx] not in ' \t':
831 | break
832 | end_idx += 1
833 | # - Must be followed by 2 newlines or hit end of text.
834 | if text[end_idx:end_idx+2] not in ('', '\n', '\n\n'):
835 | continue
836 |
837 | # Escape and hash (must match `_hash_html_block_sub`).
838 | html = text[start_idx:end_idx]
839 | if raw and self.safe_mode:
840 | html = self._sanitize_html(html)
841 | key = _hash_text(html)
842 | self.html_blocks[key] = html
843 | text = text[:start_idx] + "\n\n" + key + "\n\n" + text[end_idx:]
844 |
845 | if "xml" in self.extras:
846 | # Treat XML processing instructions and namespaced one-liner
847 | # tags as if they were block HTML tags. E.g., if standalone
848 | # (i.e. are their own paragraph), the following do not get
849 | # wrapped in a <p> tag:
850 | # <?foo bar?>
851 | #
852 | # <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="chapter_1.md"/>
853 | _xml_oneliner_re = _xml_oneliner_re_from_tab_width(self.tab_width)
854 | text = _xml_oneliner_re.sub(hash_html_block_sub, text)
855 |
856 | return text
857 |
858 | def _strip_link_definitions(self, text):
859 | # Strips link definitions from text, stores the URLs and titles in
860 | # hash references.
861 | less_than_tab = self.tab_width - 1
862 |
863 | # Link defs are in the form:
864 | # [id]: url "optional title"
865 | _link_def_re = re.compile(r"""
866 | ^[ ]{0,%d}\[(.+)\]: # id = \1
867 | [ \t]*
868 | \n? # maybe *one* newline
869 | [ \t]*
870 | <?(.+?)>? # url = \2
871 | [ \t]*
872 | (?:
873 | \n? # maybe one newline
874 | [ \t]*
875 | (?<=\s) # lookbehind for whitespace
876 | ['"(]
877 | ([^\n]*) # title = \3
878 | ['")]
879 | [ \t]*
880 | )? # title is optional
881 | (?:\n+|\Z)
882 | """ % less_than_tab, re.X | re.M | re.U)
883 | return _link_def_re.sub(self._extract_link_def_sub, text)
884 |
885 | def _extract_link_def_sub(self, match):
886 | id, url, title = match.groups()
887 | key = id.lower() # Link IDs are case-insensitive
888 | self.urls[key] = self._encode_amps_and_angles(url)
889 | if title:
890 | self.titles[key] = title
891 | return ""
892 |
893 | def _do_numbering(self, text):
894 | ''' We handle the special extension for generic numbering for
895 | tables, figures etc.
896 | '''
897 | # First pass to define all the references
898 | self.regex_defns = re.compile(r'''
899 | \[\#(\w+)\s* # the counter. Open square plus hash plus a word \1
900 | ([^@]*)\s* # Some optional characters, that aren't an @. \2
901 | @(\w+) # the id. Should this be normed? \3
902 | ([^\]]*)\] # The rest of the text up to the terminating ] \4
903 | ''', re.VERBOSE)
904 | self.regex_subs = re.compile(r"\[@(\w+)\s*\]") # [@ref_id]
905 | counters = {}
906 | references = {}
907 | replacements = []
908 | definition_html = '<figcaption class="{}" id="counter-ref-{}">{}{}{}</figcaption>'
909 | reference_html = '<a class="{}" href="#counter-ref-{}">{}</a>'
910 | for match in self.regex_defns.finditer(text):
911 | # We must have four match groups otherwise this isn't a numbering reference
912 | if len(match.groups()) != 4:
913 | continue
914 | counter = match.group(1)
915 | text_before = match.group(2)
916 | ref_id = match.group(3)
917 | text_after = match.group(4)
918 | number = counters.get(counter, 1)
919 | references[ref_id] = (number, counter)
920 | replacements.append((match.start(0),
921 | definition_html.format(counter,
922 | ref_id,
923 | text_before,
924 | number,
925 | text_after),
926 | match.end(0)))
927 | counters[counter] = number + 1
928 | for repl in reversed(replacements):
929 | text = text[:repl[0]] + repl[1] + text[repl[2]:]
930 |
931 | # Second pass to replace the references with the right
932 | # value of the counter
933 | # Fwiw, it's vaguely annoying to have to turn the iterator into
934 | # a list and then reverse it but I can't think of a better thing to do.
935 | for match in reversed(list(self.regex_subs.finditer(text))):
936 | number, counter = references.get(match.group(1), (None, None))
937 | if number is not None:
938 | repl = reference_html.format(counter,
939 | match.group(1),
940 | number)
941 | else:
942 | repl = reference_html.format(match.group(1),
943 | 'countererror',
944 | '?' + match.group(1) + '?')
945 | if "smarty-pants" in self.extras:
946 | repl = repl.replace('"', self._escape_table['"'])
947 |
948 | text = text[:match.start()] + repl + text[match.end():]
949 | return text
950 |
951 | def _extract_footnote_def_sub(self, match):
952 | id, text = match.groups()
953 | text = _dedent(text, skip_first_line=not text.startswith('\n')).strip()
954 | normed_id = re.sub(r'\W', '-', id)
955 | # Ensure footnote text ends with a couple newlines (for some
956 | # block gamut matches).
957 | self.footnotes[normed_id] = text + "\n\n"
958 | return ""
959 |
960 | def _strip_footnote_definitions(self, text):
961 | """A footnote definition looks like this:
962 |
963 | [^note-id]: Text of the note.
964 |
965 | May include one or more indented paragraphs.
966 |
967 | Where,
968 | - The 'note-id' can be pretty much anything, though typically it
969 | is the number of the footnote.
970 | - The first paragraph may start on the next line, like so:
971 |
972 | [^note-id]:
973 | Text of the note.
974 | """
975 | less_than_tab = self.tab_width - 1
976 | footnote_def_re = re.compile(r'''
977 | ^[ ]{0,%d}\[\^(.+)\]: # id = \1
978 | [ \t]*
979 | ( # footnote text = \2
980 | # First line need not start with the spaces.
981 | (?:\s*.*\n+)
982 | (?:
983 | (?:[ ]{%d} | \t) # Subsequent lines must be indented.
984 | .*\n+
985 | )*
986 | )
987 | # Lookahead for non-space at line-start, or end of doc.
988 | (?:(?=^[ ]{0,%d}\S)|\Z)
989 | ''' % (less_than_tab, self.tab_width, self.tab_width),
990 | re.X | re.M)
991 | return footnote_def_re.sub(self._extract_footnote_def_sub, text)
992 |
993 | _hr_re = re.compile(r'^[ ]{0,3}([-_*][ ]{0,2}){3,}$', re.M)
994 |
995 | def _run_block_gamut(self, text):
996 | # These are all the transformations that form block-level
997 | # tags like paragraphs, headers, and list items.
998 |
999 | if "fenced-code-blocks" in self.extras:
1000 | text = self._do_fenced_code_blocks(text)
1001 |
1002 | text = self._do_headers(text)
1003 |
1004 | # Do Horizontal Rules:
1005 | # On the number of spaces in horizontal rules: The spec is fuzzy: "If
1006 | # you wish, you may use spaces between the hyphens or asterisks."
1007 | # Markdown.pl 1.0.1's hr regexes limit the number of spaces between the
1008 | # hr chars to one or two. We'll reproduce that limit here.
1009 | hr = "\n<hr"+self.empty_element_suffix+"\n"
1010 | text = re.sub(self._hr_re, hr, text)
1011 |
1012 | text = self._do_lists(text)
1013 |
1014 | if "pyshell" in self.extras:
1015 | text = self._prepare_pyshell_blocks(text)
1016 | if "wiki-tables" in self.extras:
1017 | text = self._do_wiki_tables(text)
1018 | if "tables" in self.extras:
1019 | text = self._do_tables(text)
1020 |
1021 | text = self._do_code_blocks(text)
1022 |
1023 | text = self._do_block_quotes(text)
1024 |
1025 | # We already ran _HashHTMLBlocks() before, in Markdown(), but that
1026 | # was to escape raw HTML in the original Markdown source. This time,
1027 | # we're escaping the markup we've just created, so that we don't wrap
1028 | # <p> tags around block-level tags.
1029 | text = self._hash_html_blocks(text)
1030 |
1031 | text = self._form_paragraphs(text)
1032 |
1033 | return text
1034 |
1035 | def _pyshell_block_sub(self, match):
1036 | lines = match.group(0).splitlines(0)
1037 | _dedentlines(lines)
1038 | indent = ' ' * self.tab_width
1039 | s = ('\n' # separate from possible cuddled paragraph
1040 | + indent + ('\n'+indent).join(lines)
1041 | + '\n\n')
1042 | return s
1043 |
1044 | def _prepare_pyshell_blocks(self, text):
1045 | """Ensure that Python interactive shell sessions are put in
1046 | code blocks -- even if not properly indented.
1047 | """
1048 | if ">>>" not in text:
1049 | return text
1050 |
1051 | less_than_tab = self.tab_width - 1
1052 | _pyshell_block_re = re.compile(r"""
1053 | ^([ ]{0,%d})>>>[ ].*\n # first line
1054 | ^(\1.*\S+.*\n)* # any number of subsequent lines
1055 | ^\n # ends with a blank line
1056 | """ % less_than_tab, re.M | re.X)
1057 |
1058 | return _pyshell_block_re.sub(self._pyshell_block_sub, text)
1059 |
1060 | def _table_sub(self, match):
1061 | trim_space_re = '^[ \t\n]+|[ \t\n]+$'
1062 | trim_bar_re = r'^\||\|$'
1063 | split_bar_re = r'^\||(?<!\\)\|'
1064 | escape_bar_re = r'\\\|'
1065 |
1066 | head, underline, body = match.groups()
1067 |
1068 | # Determine aligns for columns.
1069 | cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", underline)))]
1070 | align_from_col_idx = {}
1071 | for col_idx, col in enumerate(cols):
1072 | if col.startswith(':') and col.endswith(':'):
1073 | align_from_col_idx[col_idx] = ' style="text-align:center;"'
1074 | elif col.startswith(':'):
1075 | align_from_col_idx[col_idx] = ' style="text-align:left;"'
1076 | elif col.endswith(':'):
1077 | align_from_col_idx[col_idx] = ' style="text-align:right;"'
1078 |
1079 | # Create a new table with the correct alignments.
1080 | hlines = ['<table%s>' % self._html_class_str_from_tag('table'), '<thead>', '<tr>']
1081 | cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", head)))]
1082 | for col_idx, col in enumerate(cols):
1083 | hlines.append('  <th%s>%s</th>' % (
1084 | align_from_col_idx.get(col_idx, ''),
1085 | self._run_span_gamut(col)
1086 | ))
1087 | hlines.append('</tr>')
1088 | hlines.append('</thead>')
1089 |
1090 | # tbody
1091 | hlines.append('<tbody>')
1092 | for line in body.strip('\n').split('\n'):
1093 | hlines.append('<tr>')
1094 | cols = [re.sub(escape_bar_re, '|', cell.strip()) for cell in re.split(split_bar_re, re.sub(trim_bar_re, "", re.sub(trim_space_re, "", line)))]
1095 | for col_idx, col in enumerate(cols):
1096 | hlines.append('  <td%s>%s</td>' % (
1097 | align_from_col_idx.get(col_idx, ''),
1098 | self._run_span_gamut(col)
1099 | ))
1100 | hlines.append('</tr>')
1101 | hlines.append('</tbody>')
1102 | hlines.append('</table>')
1103 |
1104 | return '\n'.join(hlines) + '\n'
1105 |
1106 | def _do_tables(self, text):
1107 | """Copying PHP-Markdown and GFM table syntax. Some regex borrowed from
1108 | https://github.com/michelf/php-markdown/blob/lib/Michelf/Markdown.php#L2538
1109 | """
1110 | less_than_tab = self.tab_width - 1
1111 | table_re = re.compile(r'''
1112 | (?:(?<=\n\n)|\A\n?) # leading blank line
1113 |
1114 | ^[ ]{0,%d} # allowed whitespace
1115 | (.*[|].*) \n # $1: header row (at least one pipe)
1116 |
1117 | ^[ ]{0,%d} # allowed whitespace
1118 | ( # $2: underline row
1119 | # underline row with leading bar
1120 | (?: \|\ *:?-+:?\ * )+ \|? \n
1121 | |
1122 | # or, underline row without leading bar
1123 | (?: \ *:?-+:?\ *\| )+ (?: \ *:?-+:?\ * )? \n
1124 | )
1125 |
1126 | ( # $3: data rows
1127 | (?:
1128 | ^[ ]{0,%d}(?!\ ) # ensure line begins with 0 to less_than_tab spaces
1129 | .*\|.* \n
1130 | )+
1131 | )
1132 | ''' % (less_than_tab, less_than_tab, less_than_tab), re.M | re.X)
1133 | return table_re.sub(self._table_sub, text)
1134 |
1135 | def _wiki_table_sub(self, match):
1136 | ttext = match.group(0).strip()
1137 | # print('wiki table: %r' % match.group(0))
1138 | rows = []
1139 | for line in ttext.splitlines(0):
1140 | line = line.strip()[2:-2].strip()
1141 | row = [c.strip() for c in re.split(r'(?<!\\)\|\|', line)]
1142 | rows.append(row)
1143 | # from pprint import pprint
1144 | # pprint(rows)
1145 | hlines = []
1146 |
1147 | def add_hline(line, indents=0):
1148 | hlines.append((self.tab * indents) + line)
1149 |
1150 | def format_cell(text):
1151 | return self._run_span_gamut(re.sub(r"^\s*~", "", cell).strip(" "))
1152 |
1153 | add_hline('<table%s>' % self._html_class_str_from_tag('table'))
1154 | # Check if first cell of first row is a header cell. If so, assume the whole row is a header row.
1155 | if rows and rows[0] and re.match(r"^\s*~", rows[0][0]):
1156 | add_hline('<thead>', 1)
1157 | add_hline('<tr>', 2)
1158 | for cell in rows[0]:
1159 | add_hline("<th>{}</th>".format(format_cell(cell)), 3)
1160 | add_hline('</tr>', 2)
1161 | add_hline('</thead>', 1)
1162 | # Only one header row allowed.
1163 | rows = rows[1:]
1164 | # If no more rows, don't create a tbody.
1165 | if rows:
1166 | add_hline('<tbody>', 1)
1167 | for row in rows:
1168 | add_hline('<tr>', 2)
1169 | for cell in row:
1170 | add_hline('<td>{}</td>'.format(format_cell(cell)), 3)
1171 | add_hline('</tr>', 2)
1172 | add_hline('</tbody>', 1)
1173 | add_hline('</table>')
1174 | return '\n'.join(hlines) + '\n'
1175 |
1176 | def _do_wiki_tables(self, text):
1177 | # Optimization.
1178 | if "||" not in text:
1179 | return text
1180 |
1181 | less_than_tab = self.tab_width - 1
1182 | wiki_table_re = re.compile(r'''
1183 | (?:(?<=\n\n)|\A\n?) # leading blank line
1184 | ^([ ]{0,%d})\|\|.+?\|\|[ ]*\n # first line
1185 | (^\1\|\|.+?\|\|\n)* # any number of subsequent lines
1186 | ''' % less_than_tab, re.M | re.X)
1187 | return wiki_table_re.sub(self._wiki_table_sub, text)
1188 |
1189 | def _run_span_gamut(self, text):
1190 | # These are all the transformations that occur *within* block-level
1191 | # tags like paragraphs, headers, and list items.
1192 |
1193 | text = self._do_code_spans(text)
1194 |
1195 | text = self._escape_special_chars(text)
1196 |
1197 | # Process anchor and image tags.
1198 | if "link-patterns" in self.extras:
1199 | text = self._do_link_patterns(text)
1200 |
1201 | text = self._do_links(text)
1202 |
1203 | # Make links out of things like `<http://example.com/>`
1204 | # Must come after _do_links(), because you can use < and >
1205 | # delimiters in inline links like [this](<url>).
1206 | text = self._do_auto_links(text)
1207 |
1208 | text = self._encode_amps_and_angles(text)
1209 |
1210 | if "strike" in self.extras:
1211 | text = self._do_strike(text)
1212 |
1213 | if "underline" in self.extras:
1214 | text = self._do_underline(text)
1215 |
1216 | text = self._do_italics_and_bold(text)
1217 |
1218 | if "smarty-pants" in self.extras:
1219 | text = self._do_smart_punctuation(text)
1220 |
1221 | # Do hard breaks:
1222 | if "break-on-newline" in self.extras:
1223 | text = re.sub(r" *\n", "<br%s\n" % self.empty_element_suffix, text)
1224 | else:
1225 | text = re.sub(r" {2,}\n", " <br%s\n" % self.empty_element_suffix, text)
1226 |
1227 | return text
1228 |
1229 | # "Sorta" because auto-links are identified as spans.
1230 | _sorta_html_tokenize_re = re.compile(r"""
1231 | (
1232 | # tag
1233 | </?
1234 | (?:\w+) # tag name
1235 | (?:\s+(?:[\w-]+:)?[\w-]+=(?:".*?"|'.*?'))* # attributes
1236 | \s*/?>
1237 | |
1238 | # auto-link (e.g., <http://www.activestate.com/>)
1239 | <\w+[^>]*>
1240 | |
1241 | <!--.*?--> # comment
1242 | |
1243 | <\?.*?\?> # processing instruction
1244 | )
1245 | """, re.X)
1246 |
1247 | def _escape_special_chars(self, text):
1248 | # Python markdown note: the HTML tokenization here differs from
1249 | # that in Markdown.pl, hence the behaviour for subtle cases can
1250 | # differ (I believe the tokenizer here does a better job because
1251 | # it isn't susceptible to unmatched '<' and '>' in HTML tags).
1252 | # Note, however, that '>' is not allowed in an auto-link URL
1253 | # here.
1254 | escaped = []
1255 | is_html_markup = False
1256 | for token in self._sorta_html_tokenize_re.split(text):
1257 | if is_html_markup:
1258 | # Within tags/HTML-comments/auto-links, encode * and _
1259 | # so they don't conflict with their use in Markdown for
1260 | # italics and strong. We're replacing each such
1261 | # character with its corresponding MD5 checksum value;
1262 | # this is likely overkill, but it should prevent us from
1263 | # colliding with the escape values by accident.
1264 | escaped.append(token.replace('*', self._escape_table['*'])
1265 | .replace('_', self._escape_table['_']))
1266 | else:
1267 | escaped.append(self._encode_backslash_escapes(token))
1268 | is_html_markup = not is_html_markup
1269 | return ''.join(escaped)
1270 |
1271 | def _hash_html_spans(self, text):
1272 | # Used for safe_mode.
1273 |
1274 | def _is_auto_link(s):
1275 | if ':' in s and self._auto_link_re.match(s):
1276 | return True
1277 | elif '@' in s and self._auto_email_link_re.match(s):
1278 | return True
1279 | return False
1280 |
1281 | tokens = []
1282 | is_html_markup = False
1283 | for token in self._sorta_html_tokenize_re.split(text):
1284 | if is_html_markup and not _is_auto_link(token):
1285 | sanitized = self._sanitize_html(token)
1286 | key = _hash_text(sanitized)
1287 | self.html_spans[key] = sanitized
1288 | tokens.append(key)
1289 | else:
1290 | tokens.append(self._encode_incomplete_tags(token))
1291 | is_html_markup = not is_html_markup
1292 | return ''.join(tokens)
1293 |
1294 | def _unhash_html_spans(self, text):
1295 | for key, sanitized in list(self.html_spans.items()):
1296 | text = text.replace(key, sanitized)
1297 | return text
1298 |
1299 | def _sanitize_html(self, s):
1300 | if self.safe_mode == "replace":
1301 | return self.html_removed_text
1302 | elif self.safe_mode == "escape":
1303 | replacements = [
1304 | ('&', '&amp;'),
1305 | ('<', '&lt;'),
1306 | ('>', '&gt;'),
1307 | ]
1308 | for before, after in replacements:
1309 | s = s.replace(before, after)
1310 | return s
1311 | else:
1312 | raise MarkdownError("invalid value for 'safe_mode': %r (must be "
1313 | "'escape' or 'replace')" % self.safe_mode)
1314 |
1315 | _inline_link_title = re.compile(r'''
1316 | ( # \1
1317 | [ \t]+
1318 | (['"]) # quote char = \2
1319 | (?P<title>.*?)
1320 | \2
1321 | )? # title is optional
1322 | \)$
1323 | ''', re.X | re.S)
1324 | _tail_of_reference_link_re = re.compile(r'''
1325 | # Match tail of: [text][id]
1326 | [ ]?
# one optional space 1327 | (?:\n[ ]*)? # one optional newline followed by spaces 1328 | \[ 1329 | (?P<id>.*?) 1330 | \] 1331 | ''', re.X | re.S) 1332 | 1333 | _whitespace = re.compile(r'\s*') 1334 | 1335 | _strip_anglebrackets = re.compile(r'<(.*)>.*') 1336 | 1337 | def _find_non_whitespace(self, text, start): 1338 | """Returns the index of the first non-whitespace character in text 1339 | after (and including) start 1340 | """ 1341 | match = self._whitespace.match(text, start) 1342 | return match.end() 1343 | 1344 | def _find_balanced(self, text, start, open_c, close_c): 1345 | """Returns the index where the open_c and close_c characters balance 1346 | out - the same number of open_c and close_c are encountered - or the 1347 | end of string if it's reached before the balance point is found. 1348 | """ 1349 | i = start 1350 | l = len(text) 1351 | count = 1 1352 | while count > 0 and i < l: 1353 | if text[i] == open_c: 1354 | count += 1 1355 | elif text[i] == close_c: 1356 | count -= 1 1357 | i += 1 1358 | return i 1359 | 1360 | def _extract_url_and_title(self, text, start): 1361 | """Extracts the url and (optional) title from the tail of a link""" 1362 | # text[start] equals the opening parenthesis 1363 | idx = self._find_non_whitespace(text, start+1) 1364 | if idx == len(text): 1365 | return None, None, None 1366 | end_idx = idx 1367 | has_anglebrackets = text[idx] == "<" 1368 | if has_anglebrackets: 1369 | end_idx = self._find_balanced(text, end_idx+1, "<", ">") 1370 | end_idx = self._find_balanced(text, end_idx, "(", ")") 1371 | match = self._inline_link_title.search(text, idx, end_idx) 1372 | if not match: 1373 | return None, None, None 1374 | url, title = text[idx:match.start()], match.group("title") 1375 | if has_anglebrackets: 1376 | url = self._strip_anglebrackets.sub(r'\1', url) 1377 | return url, title, end_idx 1378 | 1379 | _safe_protocols = re.compile(r'(https?|ftp):', re.I) 1380 | def _do_links(self, text): 1381 | """Turn Markdown link shortcuts into XHTML <a> and <img> tags. 1382 | 1383 | This is a combination of Markdown.pl's _DoAnchors() and 1384 | _DoImages(). They are done together because that simplified the 1385 | approach. It was necessary to use a different approach than 1386 | Markdown.pl because of the lack of atomic matching support in 1387 | Python's regex engine used in $g_nested_brackets. 1388 | """ 1389 | MAX_LINK_TEXT_SENTINEL = 3000 # markdown2 issue 24 1390 | 1391 | # `anchor_allowed_pos` is used to support img links inside 1392 | # anchors, but not anchors inside anchors. An anchor's start 1393 | # pos must be `>= anchor_allowed_pos`. 1394 | anchor_allowed_pos = 0 1395 | 1396 | curr_pos = 0 1397 | while True: # Handle the next link. 1398 | # The next '[' is the start of: 1399 | # - an inline anchor: [text](url "title") 1400 | # - a reference anchor: [text][id] 1401 | # - an inline img: ![text](url "title") 1402 | # - a reference img: ![text][id] 1403 | # - a footnote ref: [^id] 1404 | # (Only if 'footnotes' extra enabled) 1405 | # - a footnote defn: [^id]: ... 1406 | # (Only if 'footnotes' extra enabled) These have already 1407 | # been stripped in _strip_footnote_definitions() so no 1408 | # need to watch for them. 1409 | # - a link definition: [id]: url "title" 1410 | # These have already been stripped in 1411 | # _strip_link_definitions() so no need to watch for them. 1412 | # - not markup: [...anything else... 
1413 | try: 1414 | start_idx = text.index('[', curr_pos) 1415 | except ValueError: 1416 | break 1417 | text_length = len(text) 1418 | 1419 | # Find the matching closing ']'. 1420 | # Markdown.pl allows *matching* brackets in link text so we 1421 | # will here too. Markdown.pl *doesn't* currently allow 1422 | # matching brackets in img alt text -- we'll differ in that 1423 | # regard. 1424 | bracket_depth = 0 1425 | for p in range(start_idx+1, min(start_idx+MAX_LINK_TEXT_SENTINEL, 1426 | text_length)): 1427 | ch = text[p] 1428 | if ch == ']': 1429 | bracket_depth -= 1 1430 | if bracket_depth < 0: 1431 | break 1432 | elif ch == '[': 1433 | bracket_depth += 1 1434 | else: 1435 | # Closing bracket not found within sentinel length. 1436 | # This isn't markup. 1437 | curr_pos = start_idx + 1 1438 | continue 1439 | link_text = text[start_idx+1:p] 1440 | 1441 | # Fix for issue 341 - Injecting XSS into link text 1442 | if self.safe_mode: 1443 | link_text = self._hash_html_spans(link_text) 1444 | link_text = self._unhash_html_spans(link_text) 1445 | 1446 | # Possibly a footnote ref? 1447 | if "footnotes" in self.extras and link_text.startswith("^"): 1448 | normed_id = re.sub(r'\W', '-', link_text[1:]) 1449 | if normed_id in self.footnotes: 1450 | self.footnote_ids.append(normed_id) 1451 | result = '<sup class="footnote-ref" id="fnref-%s">' \ 1452 | '<a href="#fn-%s">%s</a></sup>' \ 1453 | % (normed_id, normed_id, len(self.footnote_ids)) 1454 | text = text[:start_idx] + result + text[p+1:] 1455 | else: 1456 | # This id isn't defined, leave the markup alone. 1457 | curr_pos = p+1 1458 | continue 1459 | 1460 | # Now determine what this is by the remainder. 1461 | p += 1 1462 | if p == text_length: 1463 | return text 1464 | 1465 | # Inline anchor or img? 1466 | if text[p] == '(': # attempt at perf improvement 1467 | url, title, url_end_idx = self._extract_url_and_title(text, p) 1468 | if url is not None: 1469 | # Handle an inline anchor or img. 1470 | is_img = start_idx > 0 and text[start_idx-1] == "!" 1471 | if is_img: 1472 | start_idx -= 1 1473 | 1474 | # We've got to encode these to avoid conflicting 1475 | # with italics/bold. 1476 | url = url.replace('*', self._escape_table['*']) \ 1477 | .replace('_', self._escape_table['_']) 1478 | if title: 1479 | title_str = ' title="%s"' % ( 1480 | _xml_escape_attr(title) 1481 | .replace('*', self._escape_table['*']) 1482 | .replace('_', self._escape_table['_'])) 1483 | else: 1484 | title_str = '' 1485 | if is_img: 1486 | img_class_str = self._html_class_str_from_tag("img") 1487 | result = '<img src="%s" alt="%s"%s%s%s' \ 1488 | % (_html_escape_url(url, safe_mode=self.safe_mode), 1489 | _xml_escape_attr(link_text), 1490 | title_str, 1491 | img_class_str, 1492 | self.empty_element_suffix) 1493 | if "smarty-pants" in self.extras: 1494 | result = result.replace('"', self._escape_table['"']) 1495 | curr_pos = start_idx + len(result) 1496 | text = text[:start_idx] + result + text[url_end_idx:] 1497 | elif start_idx >= anchor_allowed_pos: 1498 | safe_link = self._safe_protocols.match(url) or url.startswith('#') 1499 | if self.safe_mode and not safe_link: 1500 | result_head = '<a href="#"%s>' % (title_str) 1501 | else: 1502 | result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str) 1503 | result = '%s%s</a>' % (result_head, link_text) 1504 | if "smarty-pants" in self.extras: 1505 | result = result.replace('"', self._escape_table['"']) 1506 | # <img> allowed from curr_pos on, <a> from 1507 | # anchor_allowed_pos on. 
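# (In other words: an <img> may still be emitted inside this anchor's
# link text, but a new <a> cannot start until after this anchor's
# closing tag; hence the two different positions tracked here.)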
1508 | curr_pos = start_idx + len(result_head) 1509 | anchor_allowed_pos = start_idx + len(result) 1510 | text = text[:start_idx] + result + text[url_end_idx:] 1511 | else: 1512 | # Anchor not allowed here. 1513 | curr_pos = start_idx + 1 1514 | continue 1515 | 1516 | # Reference anchor or img? 1517 | else: 1518 | match = self._tail_of_reference_link_re.match(text, p) 1519 | if match: 1520 | # Handle a reference-style anchor or img. 1521 | is_img = start_idx > 0 and text[start_idx-1] == "!" 1522 | if is_img: 1523 | start_idx -= 1 1524 | link_id = match.group("id").lower() 1525 | if not link_id: 1526 | link_id = link_text.lower() # for links like [this][] 1527 | if link_id in self.urls: 1528 | url = self.urls[link_id] 1529 | # We've got to encode these to avoid conflicting 1530 | # with italics/bold. 1531 | url = url.replace('*', self._escape_table['*']) \ 1532 | .replace('_', self._escape_table['_']) 1533 | title = self.titles.get(link_id) 1534 | if title: 1535 | title = _xml_escape_attr(title) \ 1536 | .replace('*', self._escape_table['*']) \ 1537 | .replace('_', self._escape_table['_']) 1538 | title_str = ' title="%s"' % title 1539 | else: 1540 | title_str = '' 1541 | if is_img: 1542 | img_class_str = self._html_class_str_from_tag("img") 1543 | result = '<img src="%s" alt="%s"%s%s%s' \ 1544 | % (_html_escape_url(url, safe_mode=self.safe_mode), 1545 | _xml_escape_attr(link_text), 1546 | title_str, 1547 | img_class_str, 1548 | self.empty_element_suffix) 1549 | if "smarty-pants" in self.extras: 1550 | result = result.replace('"', self._escape_table['"']) 1551 | curr_pos = start_idx + len(result) 1552 | text = text[:start_idx] + result + text[match.end():] 1553 | elif start_idx >= anchor_allowed_pos: 1554 | if self.safe_mode and not self._safe_protocols.match(url): 1555 | result_head = '<a href="#"%s>' % (title_str) 1556 | else: 1557 | result_head = '<a href="%s"%s>' % (_html_escape_url(url, safe_mode=self.safe_mode), title_str) 1558 | result = '%s%s</a>' % (result_head, link_text) 1559 | if "smarty-pants" in self.extras: 1560 | result = result.replace('"', self._escape_table['"']) 1561 | # <img> allowed from curr_pos on, <a> from 1562 | # anchor_allowed_pos on. 1563 | curr_pos = start_idx + len(result_head) 1564 | anchor_allowed_pos = start_idx + len(result) 1565 | text = text[:start_idx] + result + text[match.end():] 1566 | else: 1567 | # Anchor not allowed here. 1568 | curr_pos = start_idx + 1 1569 | else: 1570 | # This id isn't defined, leave the markup alone. 1571 | curr_pos = match.end() 1572 | continue 1573 | 1574 | # Otherwise, it isn't markup. 1575 | curr_pos = start_idx + 1 1576 | 1577 | return text 1578 | 1579 | def header_id_from_text(self, text, prefix, n): 1580 | """Generate a header id attribute value from the given header 1581 | HTML content. 1582 | 1583 | This is only called if the "header-ids" extra is enabled. 1584 | Subclasses may override this for different header ids. 1585 | 1586 | @param text {str} The text of the header tag 1587 | @param prefix {str} The requested prefix for header ids. This is the 1588 | value of the "header-ids" extra key, if any. Otherwise, None. 1589 | @param n {int} The <hN> tag number, i.e. `1` for an <h1> tag. 1590 | @returns {str} The value for the header tag's "id" attribute. Return 1591 | None to not have an id attribute and to exclude this header from 1592 | the TOC (if the "toc" extra is specified). 
1593 | """ 1594 | header_id = _slugify(text) 1595 | if prefix and isinstance(prefix, base_string_type): 1596 | header_id = prefix + '-' + header_id 1597 | 1598 | self._count_from_header_id[header_id] += 1 1599 | if 0 == len(header_id) or self._count_from_header_id[header_id] > 1: 1600 | header_id += '-%s' % self._count_from_header_id[header_id] 1601 | 1602 | return header_id 1603 | 1604 | def _toc_add_entry(self, level, id, name): 1605 | if level > self._toc_depth: 1606 | return 1607 | if self._toc is None: 1608 | self._toc = [] 1609 | self._toc.append((level, id, self._unescape_special_chars(name))) 1610 | 1611 | _h_re_base = r''' 1612 | (^(.+)[ \t]*\n(=+|-+)[ \t]*\n+) 1613 | | 1614 | (^(\#{1,6}) # \1 = string of #'s 1615 | [ \t]%s 1616 | (.+?) # \2 = Header text 1617 | [ \t]* 1618 | (?<!\\) # ensure not an escaped trailing '#' 1619 | \#* # optional closing #'s (not counted) 1620 | \n+ 1621 | ) 1622 | ''' 1623 | 1624 | _h_re = re.compile(_h_re_base % '*', re.X | re.M) 1625 | _h_re_tag_friendly = re.compile(_h_re_base % '+', re.X | re.M) 1626 | 1627 | def _h_sub(self, match): 1628 | if match.group(1) is not None and match.group(3) == "-": 1629 | return match.group(1) 1630 | elif match.group(1) is not None: 1631 | # Setext header 1632 | n = {"=": 1, "-": 2}[match.group(3)[0]] 1633 | header_group = match.group(2) 1634 | else: 1635 | # atx header 1636 | n = len(match.group(5)) 1637 | header_group = match.group(6) 1638 | 1639 | demote_headers = self.extras.get("demote-headers") 1640 | if demote_headers: 1641 | n = min(n + demote_headers, 6) 1642 | header_id_attr = "" 1643 | if "header-ids" in self.extras: 1644 | header_id = self.header_id_from_text(header_group, 1645 | self.extras["header-ids"], n) 1646 | if header_id: 1647 | header_id_attr = ' id="%s"' % header_id 1648 | html = self._run_span_gamut(header_group) 1649 | if "toc" in self.extras and header_id: 1650 | self._toc_add_entry(n, header_id, html) 1651 | return "<h%d%s>%s</h%d>\n\n" % (n, header_id_attr, html, n) 1652 | 1653 | def _do_headers(self, text): 1654 | # Setext-style headers: 1655 | # Header 1 1656 | # ======== 1657 | # 1658 | # Header 2 1659 | # -------- 1660 | 1661 | # atx-style headers: 1662 | # # Header 1 1663 | # ## Header 2 1664 | # ## Header 2 with closing hashes ## 1665 | # ... 1666 | # ###### Header 6 1667 | 1668 | if 'tag-friendly' in self.extras: 1669 | return self._h_re_tag_friendly.sub(self._h_sub, text) 1670 | return self._h_re.sub(self._h_sub, text) 1671 | 1672 | _marker_ul_chars = '*+-' 1673 | _marker_any = r'(?:[%s]|\d+\.)' % _marker_ul_chars 1674 | _marker_ul = '(?:[%s])' % _marker_ul_chars 1675 | _marker_ol = r'(?:\d+\.)' 1676 | 1677 | def _list_sub(self, match): 1678 | lst = match.group(1) 1679 | lst_type = match.group(3) in self._marker_ul_chars and "ul" or "ol" 1680 | result = self._process_list_items(lst) 1681 | if self.list_level: 1682 | return "<%s>\n%s</%s>\n" % (lst_type, result, lst_type) 1683 | else: 1684 | return "<%s>\n%s</%s>\n\n" % (lst_type, result, lst_type) 1685 | 1686 | def _do_lists(self, text): 1687 | # Form HTML ordered (numbered) and unordered (bulleted) lists. 1688 | 1689 | # Iterate over each *non-overlapping* list match. 1690 | pos = 0 1691 | while True: 1692 | # Find the *first* hit for either list style (ul or ol). We 1693 | # match ul and ol separately to avoid adjacent lists of different 1694 | # types running into each other (see issue #16). 
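# (Sketch of the failure this avoids: in "- a\n1. b", matching both
# marker styles with one pattern could run the bullet list and the
# numbered list together; scanning each style separately keeps them
# distinct.)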
1695 | hits = [] 1696 | for marker_pat in (self._marker_ul, self._marker_ol): 1697 | less_than_tab = self.tab_width - 1 1698 | whole_list = r''' 1699 | ( # \1 = whole list 1700 | ( # \2 1701 | [ ]{0,%d} 1702 | (%s) # \3 = first list item marker 1703 | [ \t]+ 1704 | (?!\ *\3\ ) # '- - - ...' isn't a list. See 'not_quite_a_list' test case. 1705 | ) 1706 | (?:.+?) 1707 | ( # \4 1708 | \Z 1709 | | 1710 | \n{2,} 1711 | (?=\S) 1712 | (?! # Negative lookahead for another list item marker 1713 | [ \t]* 1714 | %s[ \t]+ 1715 | ) 1716 | ) 1717 | ) 1718 | ''' % (less_than_tab, marker_pat, marker_pat) 1719 | if self.list_level: # sub-list 1720 | list_re = re.compile("^"+whole_list, re.X | re.M | re.S) 1721 | else: 1722 | list_re = re.compile(r"(?:(?<=\n\n)|\A\n?)"+whole_list, 1723 | re.X | re.M | re.S) 1724 | match = list_re.search(text, pos) 1725 | if match: 1726 | hits.append((match.start(), match)) 1727 | if not hits: 1728 | break 1729 | hits.sort() 1730 | match = hits[0][1] 1731 | start, end = match.span() 1732 | middle = self._list_sub(match) 1733 | text = text[:start] + middle + text[end:] 1734 | pos = start + len(middle) # start pos for next attempted match 1735 | 1736 | return text 1737 | 1738 | _list_item_re = re.compile(r''' 1739 | (\n)? # leading line = \1 1740 | (^[ \t]*) # leading whitespace = \2 1741 | (?P<marker>%s) [ \t]+ # list marker = \3 1742 | ((?:.+?) # list item text = \4 1743 | (\n{1,2})) # eols = \5 1744 | (?= \n* (\Z | \2 (?P<next_marker>%s) [ \t]+)) 1745 | ''' % (_marker_any, _marker_any), 1746 | re.M | re.X | re.S) 1747 | 1748 | _task_list_item_re = re.compile(r''' 1749 | (\[[\ xX]\])[ \t]+ # tasklist marker = \1 1750 | (.*) # list item text = \2 1751 | ''', re.M | re.X | re.S) 1752 | 1753 | _task_list_warpper_str = r'<input type="checkbox" class="task-list-item-checkbox" %sdisabled> %s' 1754 | 1755 | def _task_list_item_sub(self, match): 1756 | marker = match.group(1) 1757 | item_text = match.group(2) 1758 | if marker in ['[x]','[X]']: 1759 | return self._task_list_warpper_str % ('checked ', item_text) 1760 | elif marker == '[ ]': 1761 | return self._task_list_warpper_str % ('', item_text) 1762 | 1763 | _last_li_endswith_two_eols = False 1764 | def _list_item_sub(self, match): 1765 | item = match.group(4) 1766 | leading_line = match.group(1) 1767 | if leading_line or "\n\n" in item or self._last_li_endswith_two_eols: 1768 | item = self._run_block_gamut(self._outdent(item)) 1769 | else: 1770 | # Recursion for sub-lists: 1771 | item = self._do_lists(self._outdent(item)) 1772 | if item.endswith('\n'): 1773 | item = item[:-1] 1774 | item = self._run_span_gamut(item) 1775 | self._last_li_endswith_two_eols = (len(match.group(5)) == 2) 1776 | 1777 | if "task_list" in self.extras: 1778 | item = self._task_list_item_re.sub(self._task_list_item_sub, item) 1779 | 1780 | return "<li>%s</li>\n" % item 1781 | 1782 | def _process_list_items(self, list_str): 1783 | # Process the contents of a single ordered or unordered list, 1784 | # splitting it into individual list items. 1785 | 1786 | # The $g_list_level global keeps track of when we're inside a list. 1787 | # Each time we enter a list, we increment it; when we leave a list, 1788 | # we decrement. If it's zero, we're not in a list anymore. 1789 | # 1790 | # We do this because when we're not inside a list, we want to treat 1791 | # something like this: 1792 | # 1793 | # I recommend upgrading to version 1794 | # 8. Oops, now this line is treated 1795 | # as a sub-list. 
1796 | # 1797 | # As a single paragraph, despite the fact that the second line starts 1798 | # with a digit-period-space sequence. 1799 | # 1800 | # Whereas when we're inside a list (or sub-list), that line will be 1801 | # treated as the start of a sub-list. What a kludge, huh? This is 1802 | # an aspect of Markdown's syntax that's hard to parse perfectly 1803 | # without resorting to mind-reading. Perhaps the solution is to 1804 | # change the syntax rules such that sub-lists must start with a 1805 | # starting cardinal number; e.g. "1." or "a.". 1806 | self.list_level += 1 1807 | self._last_li_endswith_two_eols = False 1808 | list_str = list_str.rstrip('\n') + '\n' 1809 | list_str = self._list_item_re.sub(self._list_item_sub, list_str) 1810 | self.list_level -= 1 1811 | return list_str 1812 | 1813 | def _get_pygments_lexer(self, lexer_name): 1814 | try: 1815 | from pygments import lexers, util 1816 | except ImportError: 1817 | return None 1818 | try: 1819 | return lexers.get_lexer_by_name(lexer_name) 1820 | except util.ClassNotFound: 1821 | return None 1822 | 1823 | def _color_with_pygments(self, codeblock, lexer, **formatter_opts): 1824 | import pygments 1825 | import pygments.formatters 1826 | 1827 | class HtmlCodeFormatter(pygments.formatters.HtmlFormatter): 1828 | def _wrap_code(self, inner): 1829 | """A function for use in a Pygments Formatter which 1830 | wraps in <code> tags. 1831 | """ 1832 | yield 0, "<code>" 1833 | for tup in inner: 1834 | yield tup 1835 | yield 0, "</code>" 1836 | 1837 | def wrap(self, source, outfile): 1838 | """Return the source with a code, pre, and div.""" 1839 | return self._wrap_div(self._wrap_pre(self._wrap_code(source))) 1840 | 1841 | formatter_opts.setdefault("cssclass", "codehilite") 1842 | formatter = HtmlCodeFormatter(**formatter_opts) 1843 | return pygments.highlight(codeblock, lexer, formatter) 1844 | 1845 | def _code_block_sub(self, match, is_fenced_code_block=False): 1846 | lexer_name = None 1847 | if is_fenced_code_block: 1848 | lexer_name = match.group(1) 1849 | if lexer_name: 1850 | formatter_opts = self.extras['fenced-code-blocks'] or {} 1851 | codeblock = match.group(2) 1852 | codeblock = codeblock[:-1] # drop one trailing newline 1853 | else: 1854 | codeblock = match.group(1) 1855 | codeblock = self._outdent(codeblock) 1856 | codeblock = self._detab(codeblock) 1857 | codeblock = codeblock.lstrip('\n') # trim leading newlines 1858 | codeblock = codeblock.rstrip() # trim trailing whitespace 1859 | 1860 | # Note: "code-color" extra is DEPRECATED. 1861 | if "code-color" in self.extras and codeblock.startswith(":::"): 1862 | lexer_name, rest = codeblock.split('\n', 1) 1863 | lexer_name = lexer_name[3:].strip() 1864 | codeblock = rest.lstrip("\n") # Remove lexer declaration line. 
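# (Sketch of the deprecated "code-color" convention handled just above: a
# code block whose first line is, e.g., ":::python" has that line peeled
# off, leaving lexer_name == "python" for the Pygments call below.)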
1865 | formatter_opts = self.extras['code-color'] or {}
1866 |
1867 | # Use pygments only if not using the highlightjs-lang extra
1868 | if lexer_name and "highlightjs-lang" not in self.extras:
1869 | def unhash_code(codeblock):
1870 | for key, sanitized in list(self.html_spans.items()):
1871 | codeblock = codeblock.replace(key, sanitized)
1872 | replacements = [
1873 | ("&amp;", "&"),
1874 | ("&lt;", "<"),
1875 | ("&gt;", ">")
1876 | ]
1877 | for old, new in replacements:
1878 | codeblock = codeblock.replace(old, new)
1879 | return codeblock
1880 | lexer = self._get_pygments_lexer(lexer_name)
1881 | if lexer:
1882 | codeblock = unhash_code( codeblock )
1883 | colored = self._color_with_pygments(codeblock, lexer,
1884 | **formatter_opts)
1885 | return "\n\n%s\n\n" % colored
1886 |
1887 | codeblock = self._encode_code(codeblock)
1888 | pre_class_str = self._html_class_str_from_tag("pre")
1889 |
1890 | if "highlightjs-lang" in self.extras and lexer_name:
1891 | code_class_str = ' class="%s language-%s"' % (lexer_name, lexer_name)
1892 | else:
1893 | code_class_str = self._html_class_str_from_tag("code")
1894 |
1895 | return "\n\n<pre%s><code%s>%s\n</code></pre>\n\n" % (
1896 | pre_class_str, code_class_str, codeblock)
1897 |
1898 | def _html_class_str_from_tag(self, tag):
1899 | """Get the appropriate ' class="..."' string (note the leading
1900 | space), if any, for the given tag.
1901 | """
1902 | if "html-classes" not in self.extras:
1903 | return ""
1904 | try:
1905 | html_classes_from_tag = self.extras["html-classes"]
1906 | except TypeError:
1907 | return ""
1908 | else:
1909 | if tag in html_classes_from_tag:
1910 | return ' class="%s"' % html_classes_from_tag[tag]
1911 | return ""
1912 |
1913 | def _do_code_blocks(self, text):
1914 | """Process Markdown `<pre><code>` blocks."""
1915 | code_block_re = re.compile(r'''
1916 | (?:\n\n|\A\n?)
1917 | ( # $1 = the code block -- one or more lines, starting with a space/tab
1918 | (?:
1919 | (?:[ ]{%d} | \t) # Lines must start with a tab or a tab-width of spaces
1920 | .*\n+
1921 | )+
1922 | )
1923 | ((?=^[ ]{0,%d}\S)|\Z) # Lookahead for non-space at line-start, or end of doc
1924 | # Lookahead to make sure this block isn't already in a code block.
1925 | # Needed when syntax highlighting is being used.
1926 | (?![^<]*\</code\>)
1927 | ''' % (self.tab_width, self.tab_width),
1928 | re.M | re.X)
1929 | return code_block_re.sub(self._code_block_sub, text)
1930 |
1931 | _fenced_code_block_re = re.compile(r'''
1932 | (?:\n+|\A\n?)
1933 | ^```\s*?([\w+-]+)?\s*?\n # opening fence, $1 = optional lang
1934 | (.*?) # $2 = code block content
1935 | ^```[ \t]*\n # closing fence
1936 | ''', re.M | re.X | re.S)
1937 |
1938 | def _fenced_code_block_sub(self, match):
1939 | return self._code_block_sub(match, is_fenced_code_block=True)
1940 |
1941 | def _do_fenced_code_blocks(self, text):
1942 | """Process ```-fenced unindented code blocks ('fenced-code-blocks' extra)."""
1943 | return self._fenced_code_block_re.sub(self._fenced_code_block_sub, text)
1944 |
1945 | # Rules for a code span:
1946 | # - backslash escapes are not interpreted in a code span
1947 | # - to include a backtick or a run of backticks, the delimiters must
1948 | # be a longer run of backticks
1949 | # - cannot start or end a code span with a backtick; pad with a
1950 | # space and that space will be removed in the emitted HTML
1951 | # See `test/tm-cases/escapes.text` for a number of edge-case
1952 | # examples.
1953 |     _code_span_re = re.compile(r'''
1954 |             (?<!\\)
1955 |             (`+)        # \1 = Opening run of `
1956 |             (?!`)       # See Note A test/tm-cases/escapes.text
1957 |             (.+?)       # \2 = The code block
1958 |             (?<!`)
1959 |             \1          # Matching closer
1960 |             (?!`)
1961 |         ''', re.X | re.S)
1962 | 
1963 |     def _code_span_sub(self, match):
1964 |         c = match.group(2).strip(" \t")
1965 |         c = self._encode_code(c)
1966 |         return "<code>%s</code>" % c
1967 | 
1968 |     def _do_code_spans(self, text):
1969 |         #   * Backtick quotes are used for <code></code> spans.
1970 |         #
1971 |         #   * You can use multiple backticks as the delimiters if you want to
1972 |         #     include literal backticks in the code span. So, this input:
1973 |         #
1974 |         #         Just type ``foo `bar` baz`` at the prompt.
1975 |         #
1976 |         #     Will translate to:
1977 |         #
1978 |         #         <p>Just type <code>foo `bar` baz</code> at the prompt.</p>
1979 |         #
1980 |         #     There's no arbitrary limit to the number of backticks you
1981 |         #     can use as delimiters. If you need three consecutive backticks
1982 |         #     in your code, use four for delimiters, etc.
1983 |         #
1984 |         #   * You can use spaces to get literal backticks at the edges:
1985 |         #
1986 |         #         ... type `` `bar` `` ...
1987 |         #
1988 |         #     Turns to:
1989 |         #
1990 |         #         ... type <code>`bar`</code> ...
1991 |         return self._code_span_re.sub(self._code_span_sub, text)
1992 | 
1993 |     def _encode_code(self, text):
1994 |         """Encode/escape certain characters inside Markdown code runs.
1995 |         The point is that in code, these characters are literals,
1996 |         and lose their special Markdown meanings.
1997 |         """
1998 |         replacements = [
1999 |             # Encode all ampersands; HTML entities are not
2000 |             # entities within a Markdown code span.
2001 |             ('&', '&amp;'),
2002 |             # Do the angle bracket song and dance:
2003 |             ('<', '&lt;'),
2004 |             ('>', '&gt;'),
2005 |         ]
2006 |         for before, after in replacements:
2007 |             text = text.replace(before, after)
2008 |         hashed = _hash_text(text)
2009 |         self._escape_table[text] = hashed
2010 |         return hashed
2011 | 
2012 |     _strike_re = re.compile(r"~~(?=\S)(.+?)(?<=\S)~~", re.S)
2013 |     def _do_strike(self, text):
2014 |         text = self._strike_re.sub(r"<strike>\1</strike>", text)
2015 |         return text
2016 | 
2017 |     _underline_re = re.compile(r"--(?=\S)(.+?)(?<=\S)--", re.S)
2018 |     def _do_underline(self, text):
2019 |         text = self._underline_re.sub(r"<u>\1</u>", text)
2020 |         return text
2021 | 
2022 |     _strong_re = re.compile(r"(\*\*|__)(?=\S)(.+?[*_]*)(?<=\S)\1", re.S)
2023 |     _em_re = re.compile(r"(\*|_)(?=\S)(.+?)(?<=\S)\1", re.S)
2024 |     _code_friendly_strong_re = re.compile(r"\*\*(?=\S)(.+?[*_]*)(?<=\S)\*\*", re.S)
2025 |     _code_friendly_em_re = re.compile(r"\*(?=\S)(.+?)(?<=\S)\*", re.S)
2026 |     def _do_italics_and_bold(self, text):
2027 |         # <strong> must go first:
2028 |         if "code-friendly" in self.extras:
2029 |             text = self._code_friendly_strong_re.sub(r"<strong>\1</strong>", text)
2030 |             text = self._code_friendly_em_re.sub(r"<em>\1</em>", text)
2031 |         else:
2032 |             text = self._strong_re.sub(r"<strong>\2</strong>", text)
2033 |             text = self._em_re.sub(r"<em>\2</em>", text)
2034 |         return text
2035 | 
2036 |     # "smarty-pants" extra: Very liberal in interpreting a single prime as an
2037 |     # apostrophe; e.g. ignores the fact that "round", "bout", "twer", and
2038 |     # "twixt" can be written without an initial apostrophe. This is fine because
2039 |     # using scare quotes (single quotation marks) is rare.
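    # For example (a sketch of the intended behavior, not verified output):
    #     "'Twas the night"  ->  "&#8217;Twas the night"
    #     "back in '99"      ->  "back in &#8217;99"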
2040 |     _apostrophe_year_re = re.compile(r"'(\d\d)(?=(\s|,|;|\.|\?|!|$))")
2041 |     _contractions = ["tis", "twas", "twer", "neath", "o", "n",
2042 |         "round", "bout", "twixt", "nuff", "fraid", "sup"]
2043 |     def _do_smart_contractions(self, text):
2044 |         text = self._apostrophe_year_re.sub(r"&#8217;\1", text)
2045 |         for c in self._contractions:
2046 |             text = text.replace("'%s" % c, "&#8217;%s" % c)
2047 |             text = text.replace("'%s" % c.capitalize(),
2048 |                 "&#8217;%s" % c.capitalize())
2049 |         return text
2050 | 
2051 |     # Substitute double-quotes before single-quotes.
2052 |     _opening_single_quote_re = re.compile(r"(?<!\S)'(?=\S)")
2053 |     _opening_double_quote_re = re.compile(r'(?<!\S)"(?=\S)')
2054 |     _closing_single_quote_re = re.compile(r"(?<=\S)'")
2055 |     _closing_double_quote_re = re.compile(r'(?<=\S)"(?=(\s|,|;|\.|\?|!|$))')
2056 |     def _do_smart_punctuation(self, text):
2057 |         """Fancifies 'single quotes', "double quotes", and apostrophes.
2058 |         Converts --, ---, and ... into en dashes, em dashes, and ellipses.
2059 | 
2060 |         Inspiration is: <http://daringfireball.net/projects/smartypants/>
2061 |         See "test/tm-cases/smarty_pants.text" for a full discussion of the
2062 |         support here and
2063 |         <http://code.google.com/p/python-markdown2/issues/detail?id=42> for a
2064 |         discussion of some diversion from the original SmartyPants.
2065 |         """
2066 |         if "'" in text:  # guard for perf
2067 |             text = self._do_smart_contractions(text)
2068 |             text = self._opening_single_quote_re.sub("&#8216;", text)
2069 |             text = self._closing_single_quote_re.sub("&#8217;", text)
2070 | 
2071 |         if '"' in text:  # guard for perf
2072 |             text = self._opening_double_quote_re.sub("&#8220;", text)
2073 |             text = self._closing_double_quote_re.sub("&#8221;", text)
2074 | 
2075 |         text = text.replace("---", "&#8212;")
2076 |         text = text.replace("--", "&#8211;")
2077 |         text = text.replace("...", "&#8230;")
2078 |         text = text.replace(" . . . ", "&#8230;")
2079 |         text = text.replace(". . .", "&#8230;")
2080 | 
2081 |         # TODO: Temporary hack to fix https://github.com/trentm/python-markdown2/issues/150
2082 |         if "footnotes" in self.extras and "footnote-ref" in text:
2083 |             # Quotes in the footnote back ref get converted to "smart" quotes
2084 |             # Change them back here to ensure they work.
2085 |             text = text.replace('class="footnote-ref&#8221;', 'class="footnote-ref"')
2086 | 
2087 |         return text
2088 | 
2089 |     _block_quote_base = r'''
2090 |         (                           # Wrap whole match in \1
2091 |             (
2092 |                 ^[ \t]*>%s[ \t]?
# '>' at the start of a line 2093 | .+\n # rest of the first line 2094 | (.+\n)* # subsequent consecutive lines 2095 | )+ 2096 | ) 2097 | ''' 2098 | _block_quote_re = re.compile(_block_quote_base % '', re.M | re.X) 2099 | _block_quote_re_spoiler = re.compile(_block_quote_base % '[ \t]*?!?', re.M | re.X) 2100 | _bq_one_level_re = re.compile('^[ \t]*>[ \t]?', re.M) 2101 | _bq_one_level_re_spoiler = re.compile('^[ \t]*>[ \t]*?![ \t]?', re.M) 2102 | _bq_all_lines_spoilers = re.compile(r'\A(?:^[ \t]*>[ \t]*?!.*[\n\r]*)+\Z', re.M) 2103 | _html_pre_block_re = re.compile(r'(\s*<pre>.+?</pre>)', re.S) 2104 | def _dedent_two_spaces_sub(self, match): 2105 | return re.sub(r'(?m)^ ', '', match.group(1)) 2106 | 2107 | def _block_quote_sub(self, match): 2108 | bq = match.group(1) 2109 | is_spoiler = 'spoiler' in self.extras and self._bq_all_lines_spoilers.match(bq) 2110 | # trim one level of quoting 2111 | if is_spoiler: 2112 | bq = self._bq_one_level_re_spoiler.sub('', bq) 2113 | else: 2114 | bq = self._bq_one_level_re.sub('', bq) 2115 | # trim whitespace-only lines 2116 | bq = self._ws_only_line_re.sub('', bq) 2117 | bq = self._run_block_gamut(bq) # recurse 2118 | 2119 | bq = re.sub('(?m)^', ' ', bq) 2120 | # These leading spaces screw with <pre> content, so we need to fix that: 2121 | bq = self._html_pre_block_re.sub(self._dedent_two_spaces_sub, bq) 2122 | 2123 | if is_spoiler: 2124 | return '<blockquote class="spoiler">\n%s\n</blockquote>\n\n' % bq 2125 | else: 2126 | return '<blockquote>\n%s\n</blockquote>\n\n' % bq 2127 | 2128 | def _do_block_quotes(self, text): 2129 | if '>' not in text: 2130 | return text 2131 | if 'spoiler' in self.extras: 2132 | return self._block_quote_re_spoiler.sub(self._block_quote_sub, text) 2133 | else: 2134 | return self._block_quote_re.sub(self._block_quote_sub, text) 2135 | 2136 | def _form_paragraphs(self, text): 2137 | # Strip leading and trailing lines: 2138 | text = text.strip('\n') 2139 | 2140 | # Wrap <p> tags. 2141 | grafs = [] 2142 | for i, graf in enumerate(re.split(r"\n{2,}", text)): 2143 | if graf in self.html_blocks: 2144 | # Unhashify HTML blocks 2145 | grafs.append(self.html_blocks[graf]) 2146 | else: 2147 | cuddled_list = None 2148 | if "cuddled-lists" in self.extras: 2149 | # Need to put back trailing '\n' for `_list_item_re` 2150 | # match at the end of the paragraph. 2151 | li = self._list_item_re.search(graf + '\n') 2152 | # Two of the same list marker in this paragraph: a likely 2153 | # candidate for a list cuddled to preceding paragraph 2154 | # text (issue 33). Note the `[-1]` is a quick way to 2155 | # consider numeric bullets (e.g. "1." and "2.") to be 2156 | # equal. 2157 | if (li and len(li.group(2)) <= 3 2158 | and ( 2159 | (li.group("next_marker") and li.group("marker")[-1] == li.group("next_marker")[-1]) 2160 | or 2161 | li.group("next_marker") is None 2162 | ) 2163 | ): 2164 | start = li.start() 2165 | cuddled_list = self._do_lists(graf[start:]).rstrip("\n") 2166 | assert cuddled_list.startswith("<ul>") or cuddled_list.startswith("<ol>") 2167 | graf = graf[:start] 2168 | 2169 | # Wrap <p> tags. 
2170 |                 graf = self._run_span_gamut(graf)
2171 |                 grafs.append("<p%s>" % self._html_class_str_from_tag('p') + graf.lstrip(" \t") + "</p>")
2172 | 
2173 |                 if cuddled_list:
2174 |                     grafs.append(cuddled_list)
2175 | 
2176 |         return "\n\n".join(grafs)
2177 | 
2178 |     def _add_footnotes(self, text):
2179 |         if self.footnotes:
2180 |             footer = [
2181 |                 '<div class="footnotes">',
2182 |                 '<hr' + self.empty_element_suffix,
2183 |                 '<ol>',
2184 |             ]
2185 | 
2186 |             if not self.footnote_title:
2187 |                 self.footnote_title = "Jump back to footnote %d in the text."
2188 |             if not self.footnote_return_symbol:
2189 |                 self.footnote_return_symbol = "&#8617;"
2190 | 
2191 |             for i, id in enumerate(self.footnote_ids):
2192 |                 if i != 0:
2193 |                     footer.append('')
2194 |                 footer.append('<li id="fn-%s">' % id)
2195 |                 footer.append(self._run_block_gamut(self.footnotes[id]))
2196 |                 try:
2197 |                     backlink = ('<a href="#fnref-%s" ' +
2198 |                                 'class="footnoteBackLink" ' +
2199 |                                 'title="' + self.footnote_title + '">' +
2200 |                                 self.footnote_return_symbol +
2201 |                                 '</a>') % (id, i+1)
2202 |                 except TypeError:
2203 |                     log.debug("Footnote error. `footnote_title` "
2204 |                               "must include parameter. Using defaults.")
2205 |                     backlink = ('<a href="#fnref-%s" '
2206 |                                 'class="footnoteBackLink" '
2207 |                                 'title="Jump back to footnote %d in the text.">'
2208 |                                 '&#8617;</a>' % (id, i+1))
2209 | 
2210 |                 if footer[-1].endswith("</p>"):
2211 |                     footer[-1] = footer[-1][:-len("</p>")] \
2212 |                         + '&#160;' + backlink + "</p>"
2213 |                 else:
2214 |                     footer.append("\n<p>%s</p>" % backlink)
2215 |                 footer.append('</li>')
2216 |             footer.append('</ol>')
2217 |             footer.append('</div>')
2218 |             return text + '\n\n' + '\n'.join(footer)
2219 |         else:
2220 |             return text
2221 | 
2222 |     _naked_lt_re = re.compile(r'<(?![a-z/?\$!])', re.I)
2223 |     _naked_gt_re = re.compile(r'''(?<![a-z0-9?!/'"-])>''', re.I)
2224 | 
2225 |     def _encode_amps_and_angles(self, text):
2226 |         # Smart processing for ampersands and angle brackets that need
2227 |         # to be encoded.
2228 |         text = _AMPERSAND_RE.sub('&amp;', text)
2229 | 
2230 |         # Encode naked <'s
2231 |         text = self._naked_lt_re.sub('&lt;', text)
2232 | 
2233 |         # Encode naked >'s
2234 |         # Note: Other markdown implementations (e.g. Markdown.pl, PHP
2235 |         # Markdown) don't do this.
2236 |         text = self._naked_gt_re.sub('&gt;', text)
2237 |         return text
2238 | 
2239 |     _incomplete_tags_re = re.compile(r"<(/?\w+?(?!\w).+?[\s/]+?)")
2240 | 
2241 |     def _encode_incomplete_tags(self, text):
2242 |         if self.safe_mode not in ("replace", "escape"):
2243 |             return text
2244 | 
2245 |         if text.endswith(">"):
2246 |             return text  # this is not an incomplete tag, this is a link in the form <http://x.y.z>
2247 | 
2248 |         return self._incomplete_tags_re.sub("&lt;\\1", text)
2249 | 
2250 |     def _encode_backslash_escapes(self, text):
2251 |         for ch, escape in list(self._escape_table.items()):
2252 |             text = text.replace("\\"+ch, escape)
2253 |         return text
2254 | 
2255 |     _auto_link_re = re.compile(r'<((https?|ftp):[^\'">\s]+)>', re.I)
2256 |     def _auto_link_sub(self, match):
2257 |         g1 = match.group(1)
2258 |         return '<a href="%s">%s</a>' % (g1, g1)
2259 | 
2260 |     _auto_email_link_re = re.compile(r"""
2261 |         <
2262 |         (?:mailto:)?
2263 |         (
2264 |             [-.\w]+
2265 |             \@
2266 |             [-\w]+(\.[-\w]+)*\.[a-z]+
2267 |         )
2268 |         >
2269 |         """, re.I | re.X | re.U)
2270 |     def _auto_email_link_sub(self, match):
2271 |         return self._encode_email_address(
2272 |             self._unescape_special_chars(match.group(1)))
2273 | 
2274 |     def _do_auto_links(self, text):
2275 |         text = self._auto_link_re.sub(self._auto_link_sub, text)
2276 |         text = self._auto_email_link_re.sub(self._auto_email_link_sub, text)
2277 |         return text
2278 | 
2279 |     def _encode_email_address(self, addr):
2280 |         #  Input: an email address, e.g. "foo@example.com"
2281 |         #
2282 |         #  Output: the email address as a mailto link, with each character
2283 |         #      of the address encoded as either a decimal or hex entity, in
2284 |         #      the hopes of foiling most address harvesting spam bots. E.g.:
2285 |         #
2286 |         #    <a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;&#102;&#111;&#111;&#64;e
2287 |         #       xample.com">&#102;&#111;&#111;
2288 |         #       &#64;example.com</a>
2289 |         #
2290 |         #  Based on a filter by Matthew Wickline, posted to the BBEdit-Talk
2291 |         #  mailing list: <http://tinyurl.com/yu7ue>
2292 |         chars = [_xml_encode_email_char_at_random(ch)
2293 |                  for ch in "mailto:" + addr]
2294 |         # Strip the mailto: from the visible part.
2295 |         addr = '<a href="%s">%s</a>' \
2296 |                % (''.join(chars), ''.join(chars[7:]))
2297 |         return addr
2298 | 
2299 |     def _do_link_patterns(self, text):
2300 |         link_from_hash = {}
2301 |         for regex, repl in self.link_patterns:
2302 |             replacements = []
2303 |             for match in regex.finditer(text):
2304 |                 if hasattr(repl, "__call__"):
2305 |                     href = repl(match)
2306 |                 else:
2307 |                     href = match.expand(repl)
2308 |                 replacements.append((match.span(), href))
2309 |             for (start, end), href in reversed(replacements):
2310 | 
2311 |                 # Do not match against links inside brackets.
2312 |                 if text[start - 1:start] == '[' and text[end:end + 1] == ']':
2313 |                     continue
2314 | 
2315 |                 # Do not match against links in the standard markdown syntax.
2316 |                 if text[start - 2:start] == '](' or text[end:end + 2] == '")':
2317 |                     continue
2318 | 
2319 |                 # Do not match against links which are escaped.
2320 |                 if text[start - 3:start] == '"""' and text[end:end + 3] == '"""':
2321 |                     text = text[:start - 3] + text[start:end] + text[end + 3:]
2322 |                     continue
2323 | 
2324 |                 escaped_href = (
2325 |                     href.replace('"', '&quot;')  # b/c of attr quote
2326 |                         # To avoid markdown <em> and <strong>:
2327 |                         .replace('*', self._escape_table['*'])
2328 |                         .replace('_', self._escape_table['_']))
2329 |                 link = '<a href="%s">%s</a>' % (escaped_href, text[start:end])
2330 |                 hash = _hash_text(link)
2331 |                 link_from_hash[hash] = link
2332 |                 text = text[:start] + hash + text[end:]
2333 |         for hash, link in list(link_from_hash.items()):
2334 |             text = text.replace(hash, link)
2335 |         return text
2336 | 
2337 |     def _unescape_special_chars(self, text):
2338 |         # Swap back in all the special characters we've hidden.
2339 | for ch, hash in list(self._escape_table.items()): 2340 | text = text.replace(hash, ch) 2341 | return text 2342 | 2343 | def _outdent(self, text): 2344 | # Remove one level of line-leading tabs or spaces 2345 | return self._outdent_re.sub('', text) 2346 | 2347 | 2348 | class MarkdownWithExtras(Markdown): 2349 | """A markdowner class that enables most extras: 2350 | 2351 | - footnotes 2352 | - code-color (only has effect if 'pygments' Python module on path) 2353 | 2354 | These are not included: 2355 | - pyshell (specific to Python-related documenting) 2356 | - code-friendly (because it *disables* part of the syntax) 2357 | - link-patterns (because you need to specify some actual 2358 | link-patterns anyway) 2359 | """ 2360 | extras = ["footnotes", "code-color"] 2361 | 2362 | 2363 | # ---- internal support functions 2364 | 2365 | 2366 | def calculate_toc_html(toc): 2367 | """Return the HTML for the current TOC. 2368 | 2369 | This expects the `_toc` attribute to have been set on this instance. 2370 | """ 2371 | if toc is None: 2372 | return None 2373 | 2374 | def indent(): 2375 | return ' ' * (len(h_stack) - 1) 2376 | lines = [] 2377 | h_stack = [0] # stack of header-level numbers 2378 | for level, id, name in toc: 2379 | if level > h_stack[-1]: 2380 | lines.append("%s<ul>" % indent()) 2381 | h_stack.append(level) 2382 | elif level == h_stack[-1]: 2383 | lines[-1] += "</li>" 2384 | else: 2385 | while level < h_stack[-1]: 2386 | h_stack.pop() 2387 | if not lines[-1].endswith("</li>"): 2388 | lines[-1] += "</li>" 2389 | lines.append("%s</ul></li>" % indent()) 2390 | lines.append('%s<li><a href="#%s">%s</a>' % ( 2391 | indent(), id, name)) 2392 | while len(h_stack) > 1: 2393 | h_stack.pop() 2394 | if not lines[-1].endswith("</li>"): 2395 | lines[-1] += "</li>" 2396 | lines.append("%s</ul>" % indent()) 2397 | return '\n'.join(lines) + '\n' 2398 | 2399 | 2400 | class UnicodeWithAttrs(unicode): 2401 | """A subclass of unicode used for the return value of conversion to 2402 | possibly attach some attributes. E.g. the "toc_html" attribute when 2403 | the "toc" extra is used. 2404 | """ 2405 | metadata = None 2406 | toc_html = None 2407 | 2408 | ## {{{ http://code.activestate.com/recipes/577257/ (r1) 2409 | _slugify_strip_re = re.compile(r'[^\w\s-]') 2410 | _slugify_hyphenate_re = re.compile(r'[-\s]+') 2411 | def _slugify(value): 2412 | """ 2413 | Normalizes string, converts to lowercase, removes non-alpha characters, 2414 | and converts spaces to hyphens. 2415 | 2416 | From Django's "django/template/defaultfilters.py". 
2417 | """ 2418 | import unicodedata 2419 | value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode() 2420 | value = _slugify_strip_re.sub('', value).strip().lower() 2421 | return _slugify_hyphenate_re.sub('-', value) 2422 | ## end of http://code.activestate.com/recipes/577257/ }}} 2423 | 2424 | 2425 | # From http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52549 2426 | def _curry(*args, **kwargs): 2427 | function, args = args[0], args[1:] 2428 | def result(*rest, **kwrest): 2429 | combined = kwargs.copy() 2430 | combined.update(kwrest) 2431 | return function(*args + rest, **combined) 2432 | return result 2433 | 2434 | 2435 | # Recipe: regex_from_encoded_pattern (1.0) 2436 | def _regex_from_encoded_pattern(s): 2437 | """'foo' -> re.compile(re.escape('foo')) 2438 | '/foo/' -> re.compile('foo') 2439 | '/foo/i' -> re.compile('foo', re.I) 2440 | """ 2441 | if s.startswith('/') and s.rfind('/') != 0: 2442 | # Parse it: /PATTERN/FLAGS 2443 | idx = s.rfind('/') 2444 | _, flags_str = s[1:idx], s[idx+1:] 2445 | flag_from_char = { 2446 | "i": re.IGNORECASE, 2447 | "l": re.LOCALE, 2448 | "s": re.DOTALL, 2449 | "m": re.MULTILINE, 2450 | "u": re.UNICODE, 2451 | } 2452 | flags = 0 2453 | for char in flags_str: 2454 | try: 2455 | flags |= flag_from_char[char] 2456 | except KeyError: 2457 | raise ValueError("unsupported regex flag: '%s' in '%s' " 2458 | "(must be one of '%s')" 2459 | % (char, s, ''.join(list(flag_from_char.keys())))) 2460 | return re.compile(s[1:idx], flags) 2461 | else: # not an encoded regex 2462 | return re.compile(re.escape(s)) 2463 | 2464 | 2465 | # Recipe: dedent (0.1.2) 2466 | def _dedentlines(lines, tabsize=8, skip_first_line=False): 2467 | """_dedentlines(lines, tabsize=8, skip_first_line=False) -> dedented lines 2468 | 2469 | "lines" is a list of lines to dedent. 2470 | "tabsize" is the tab width to use for indent width calculations. 2471 | "skip_first_line" is a boolean indicating if the first line should 2472 | be skipped for calculating the indent width and for dedenting. 2473 | This is sometimes useful for docstrings and similar. 2474 | 2475 | Same as dedent() except operates on a sequence of lines. Note: the 2476 | lines list is modified **in-place**. 
2477 | """ 2478 | DEBUG = False 2479 | if DEBUG: 2480 | print("dedent: dedent(..., tabsize=%d, skip_first_line=%r)"\ 2481 | % (tabsize, skip_first_line)) 2482 | margin = None 2483 | for i, line in enumerate(lines): 2484 | if i == 0 and skip_first_line: continue 2485 | indent = 0 2486 | for ch in line: 2487 | if ch == ' ': 2488 | indent += 1 2489 | elif ch == '\t': 2490 | indent += tabsize - (indent % tabsize) 2491 | elif ch in '\r\n': 2492 | continue # skip all-whitespace lines 2493 | else: 2494 | break 2495 | else: 2496 | continue # skip all-whitespace lines 2497 | if DEBUG: print("dedent: indent=%d: %r" % (indent, line)) 2498 | if margin is None: 2499 | margin = indent 2500 | else: 2501 | margin = min(margin, indent) 2502 | if DEBUG: print("dedent: margin=%r" % margin) 2503 | 2504 | if margin is not None and margin > 0: 2505 | for i, line in enumerate(lines): 2506 | if i == 0 and skip_first_line: continue 2507 | removed = 0 2508 | for j, ch in enumerate(line): 2509 | if ch == ' ': 2510 | removed += 1 2511 | elif ch == '\t': 2512 | removed += tabsize - (removed % tabsize) 2513 | elif ch in '\r\n': 2514 | if DEBUG: print("dedent: %r: EOL -> strip up to EOL" % line) 2515 | lines[i] = lines[i][j:] 2516 | break 2517 | else: 2518 | raise ValueError("unexpected non-whitespace char %r in " 2519 | "line %r while removing %d-space margin" 2520 | % (ch, line, margin)) 2521 | if DEBUG: 2522 | print("dedent: %r: %r -> removed %d/%d"\ 2523 | % (line, ch, removed, margin)) 2524 | if removed == margin: 2525 | lines[i] = lines[i][j+1:] 2526 | break 2527 | elif removed > margin: 2528 | lines[i] = ' '*(removed-margin) + lines[i][j+1:] 2529 | break 2530 | else: 2531 | if removed: 2532 | lines[i] = lines[i][removed:] 2533 | return lines 2534 | 2535 | 2536 | def _dedent(text, tabsize=8, skip_first_line=False): 2537 | """_dedent(text, tabsize=8, skip_first_line=False) -> dedented text 2538 | 2539 | "text" is the text to dedent. 2540 | "tabsize" is the tab width to use for indent width calculations. 2541 | "skip_first_line" is a boolean indicating if the first line should 2542 | be skipped for calculating the indent width and for dedenting. 2543 | This is sometimes useful for docstrings and similar. 2544 | 2545 | textwrap.dedent(s), but don't expand tabs to spaces 2546 | """ 2547 | lines = text.splitlines(1) 2548 | _dedentlines(lines, tabsize=tabsize, skip_first_line=skip_first_line) 2549 | return ''.join(lines) 2550 | 2551 | 2552 | class _memoized(object): 2553 | """Decorator that caches a function's return value each time it is called. 2554 | If called later with the same arguments, the cached value is returned, and 2555 | not re-evaluated. 2556 | 2557 | http://wiki.python.org/moin/PythonDecoratorLibrary 2558 | """ 2559 | def __init__(self, func): 2560 | self.func = func 2561 | self.cache = {} 2562 | 2563 | def __call__(self, *args): 2564 | try: 2565 | return self.cache[args] 2566 | except KeyError: 2567 | self.cache[args] = value = self.func(*args) 2568 | return value 2569 | except TypeError: 2570 | # uncachable -- for instance, passing a list as an argument. 2571 | # Better to not cache than to blow up entirely. 2572 | return self.func(*args) 2573 | 2574 | def __repr__(self): 2575 | """Return the function's docstring.""" 2576 | return self.func.__doc__ 2577 | 2578 | 2579 | def _xml_oneliner_re_from_tab_width(tab_width): 2580 | """Standalone XML processing instruction regex.""" 2581 | return re.compile(r""" 2582 | (?: 2583 | (?<=\n\n) # Starting after a blank line 2584 | | # or 2585 | \A\n? 
# the beginning of the doc
2586 |         )
2587 |         (                           # save in $1
2588 |             [ ]{0,%d}
2589 |             (?:
2590 |                 <\?\w+\b\s+.*?\?>   # XML processing instruction
2591 |                 |
2592 |                 <\w+:\w+\b\s+.*?/>  # namespaced single tag
2593 |             )
2594 |             [ \t]*
2595 |             (?=\n{2,}|\Z)           # followed by a blank line or end of document
2596 |         )
2597 |         """ % (tab_width - 1), re.X)
2598 | _xml_oneliner_re_from_tab_width = _memoized(_xml_oneliner_re_from_tab_width)
2599 | 
2600 | 
2601 | def _hr_tag_re_from_tab_width(tab_width):
2602 |     return re.compile(r"""
2603 |         (?:
2604 |             (?<=\n\n)       # Starting after a blank line
2605 |             |               # or
2606 |             \A\n?           # the beginning of the doc
2607 |         )
2608 |         (                       # save in \1
2609 |             [ ]{0,%d}
2610 |             <(hr)               # start tag = \2
2611 |             \b                  # word break
2612 |             ([^<>])*?           #
2613 |             /?>                 # the matching end tag
2614 |             [ \t]*
2615 |             (?=\n{2,}|\Z)       # followed by a blank line or end of document
2616 |         )
2617 |         """ % (tab_width - 1), re.X)
2618 | _hr_tag_re_from_tab_width = _memoized(_hr_tag_re_from_tab_width)
2619 | 
2620 | 
2621 | def _xml_escape_attr(attr, skip_single_quote=True):
2622 |     """Escape the given string for use in an HTML/XML tag attribute.
2623 | 
2624 |     By default this doesn't bother with escaping `'` to `&#39;`, presuming that
2625 |     the tag attribute is surrounded by double quotes.
2626 |     """
2627 |     escaped = _AMPERSAND_RE.sub('&amp;', attr)
2628 | 
2629 |     escaped = (escaped
2630 |                .replace('"', '&quot;')
2631 |                .replace('<', '&lt;')
2632 |                .replace('>', '&gt;'))
2633 |     if not skip_single_quote:
2634 |         escaped = escaped.replace("'", "&#39;")
2635 |     return escaped
2636 | 
2637 | 
2638 | def _xml_encode_email_char_at_random(ch):
2639 |     r = random()
2640 |     # Roughly 10% raw, 45% hex, 45% dec.
2641 |     # '@' *must* be encoded. I [John Gruber] insist.
2642 |     # Issue 26: '_' must be encoded.
2643 |     if r > 0.9 and ch not in "@_":
2644 |         return ch
2645 |     elif r < 0.45:
2646 |         # The [1:] is to drop leading '0': 0x63 -> x63
2647 |         return '&#%s;' % hex(ord(ch))[1:]
2648 |     else:
2649 |         return '&#%s;' % ord(ch)
2650 | 
2651 | 
2652 | def _html_escape_url(attr, safe_mode=False):
2653 |     """Replace special characters that are potentially malicious in url string."""
2654 |     escaped = (attr
2655 |                .replace('"', '&quot;')
2656 |                .replace('<', '&lt;')
2657 |                .replace('>', '&gt;'))
2658 |     if safe_mode:
2659 |         escaped = escaped.replace('+', ' ')
2660 |         escaped = escaped.replace("'", "&#39;")
2661 |     return escaped
2662 | 
2663 | 
2664 | # ---- mainline
2665 | 
2666 | class _NoReflowFormatter(optparse.IndentedHelpFormatter):
2667 |     """An optparse formatter that does NOT reflow the description."""
2668 |     def format_description(self, description):
2669 |         return description or ""
2670 | 
2671 | 
2672 | def _test():
2673 |     import doctest
2674 |     doctest.testmod()
2675 | 
2676 | 
2677 | def main(argv=None):
2678 |     if argv is None:
2679 |         argv = sys.argv
2680 |     if not logging.root.handlers:
2681 |         logging.basicConfig()
2682 | 
2683 |     usage = "usage: %prog [PATHS...]"
2684 |     version = "%prog "+__version__
2685 |     parser = optparse.OptionParser(prog="markdown2", usage=usage,
2686 |         version=version, description=cmdln_desc,
2687 |         formatter=_NoReflowFormatter())
2688 |     parser.add_option("-v", "--verbose", dest="log_level",
2689 |                       action="store_const", const=logging.DEBUG,
2690 |                       help="more verbose output")
2691 |     parser.add_option("--encoding",
2692 |                       help="specify encoding of text content")
2693 |     parser.add_option("--html4tags", action="store_true", default=False,
2694 |                       help="use HTML 4 style for empty element tags")
2695 |     parser.add_option("-s", "--safe", metavar="MODE", dest="safe_mode",
2696 |                       help="sanitize 
literal HTML: 'escape' escapes " 2697 | "HTML meta chars, 'replace' replaces with an " 2698 | "[HTML_REMOVED] note") 2699 | parser.add_option("-x", "--extras", action="append", 2700 | help="Turn on specific extra features (not part of " 2701 | "the core Markdown spec). See above.") 2702 | parser.add_option("--use-file-vars", 2703 | help="Look for and use Emacs-style 'markdown-extras' " 2704 | "file var to turn on extras. See " 2705 | "<https://github.com/trentm/python-markdown2/wiki/Extras>") 2706 | parser.add_option("--link-patterns-file", 2707 | help="path to a link pattern file") 2708 | parser.add_option("--self-test", action="store_true", 2709 | help="run internal self-tests (some doctests)") 2710 | parser.add_option("--compare", action="store_true", 2711 | help="run against Markdown.pl as well (for testing)") 2712 | parser.set_defaults(log_level=logging.INFO, compare=False, 2713 | encoding="utf-8", safe_mode=None, use_file_vars=False) 2714 | opts, paths = parser.parse_args() 2715 | log.setLevel(opts.log_level) 2716 | 2717 | if opts.self_test: 2718 | return _test() 2719 | 2720 | if opts.extras: 2721 | extras = {} 2722 | for s in opts.extras: 2723 | splitter = re.compile("[,;: ]+") 2724 | for e in splitter.split(s): 2725 | if '=' in e: 2726 | ename, earg = e.split('=', 1) 2727 | try: 2728 | earg = int(earg) 2729 | except ValueError: 2730 | pass 2731 | else: 2732 | ename, earg = e, None 2733 | extras[ename] = earg 2734 | else: 2735 | extras = None 2736 | 2737 | if opts.link_patterns_file: 2738 | link_patterns = [] 2739 | f = open(opts.link_patterns_file) 2740 | try: 2741 | for i, line in enumerate(f.readlines()): 2742 | if not line.strip(): continue 2743 | if line.lstrip().startswith("#"): continue 2744 | try: 2745 | pat, href = line.rstrip().rsplit(None, 1) 2746 | except ValueError: 2747 | raise MarkdownError("%s:%d: invalid link pattern line: %r" 2748 | % (opts.link_patterns_file, i+1, line)) 2749 | link_patterns.append( 2750 | (_regex_from_encoded_pattern(pat), href)) 2751 | finally: 2752 | f.close() 2753 | else: 2754 | link_patterns = None 2755 | 2756 | from os.path import join, dirname, abspath, exists 2757 | markdown_pl = join(dirname(dirname(abspath(__file__))), "test", 2758 | "Markdown.pl") 2759 | if not paths: 2760 | paths = ['-'] 2761 | for path in paths: 2762 | if path == '-': 2763 | text = sys.stdin.read() 2764 | else: 2765 | fp = codecs.open(path, 'r', opts.encoding) 2766 | text = fp.read() 2767 | fp.close() 2768 | if opts.compare: 2769 | from subprocess import Popen, PIPE 2770 | print("==== Markdown.pl ====") 2771 | p = Popen('perl %s' % markdown_pl, shell=True, stdin=PIPE, stdout=PIPE, close_fds=True) 2772 | p.stdin.write(text.encode('utf-8')) 2773 | p.stdin.close() 2774 | perl_html = p.stdout.read().decode('utf-8') 2775 | if py3: 2776 | sys.stdout.write(perl_html) 2777 | else: 2778 | sys.stdout.write(perl_html.encode( 2779 | sys.stdout.encoding or "utf-8", 'xmlcharrefreplace')) 2780 | print("==== markdown2.py ====") 2781 | html = markdown(text, 2782 | html4tags=opts.html4tags, 2783 | safe_mode=opts.safe_mode, 2784 | extras=extras, link_patterns=link_patterns, 2785 | use_file_vars=opts.use_file_vars, 2786 | cli=True) 2787 | if py3: 2788 | sys.stdout.write(html) 2789 | else: 2790 | sys.stdout.write(html.encode( 2791 | sys.stdout.encoding or "utf-8", 'xmlcharrefreplace')) 2792 | if extras and "toc" in extras: 2793 | log.debug("toc_html: " + 2794 | str(html.toc_html.encode(sys.stdout.encoding or "utf-8", 'xmlcharrefreplace'))) 2795 | if opts.compare: 2796 | test_dir = 
join(dirname(dirname(abspath(__file__))), "test")
2797 |         if exists(join(test_dir, "test_markdown2.py")):
2798 |             sys.path.insert(0, test_dir)
2799 |             from test_markdown2 import norm_html_from_html
2800 |             norm_html = norm_html_from_html(html)
2801 |             norm_perl_html = norm_html_from_html(perl_html)
2802 |         else:
2803 |             norm_html = html
2804 |             norm_perl_html = perl_html
2805 |         print("==== match? %r ====" % (norm_perl_html == norm_html))
2806 | 
2807 | 
2808 | if __name__ == "__main__":
2809 |     sys.exit(main(sys.argv))
2810 | 
--------------------------------------------------------------------------------
/src/markdown2/markdown2Mathjax.py:
--------------------------------------------------------------------------------
1 | __version_info__ = (0,3,9)
2 | __version__ = '.'.join(map(str,__version_info__))
3 | __author__ = "Matthew Young"
4 | 
5 | import re
6 | from .markdown2 import markdown
7 | 
8 | def break_tie(inline,equation):
9 |     """If one of the delimiters is a substring of the other (e.g., $ and $$) it is possible that the two will begin at the same location. In this case we need some criteria to break the tie and decide which operation takes precedence. I've gone with giving the longer of the two delimiters priority (for example, $$ over $). This function returns 2 if the equation block takes precedence and 1 if the inline block does. The magic-looking return statement maps 0->2 and 1->1."""
10 |     tmp=(inline.end()-inline.start() > equation.end()-equation.start())
11 |     return (tmp*3+2)%4
12 | 
13 | def markdown_safe(placeholder):
14 |     """Is the placeholder changed by markdown? If it is, this isn't a valid placeholder."""
15 |     mdstrip=re.compile("<p>(.*)</p>\n")
16 |     md=markdown(placeholder)
17 |     mdp=mdstrip.match(md)
18 |     if mdp and mdp.group(1)==placeholder:
19 |         return True
20 |     return False
21 | 
22 | def mathdown(text):
23 |     """Convenience function which runs the basic markdown and mathjax processing sequentially."""
24 |     tmp=sanitizeInput(text)
25 |     return reconstructMath(markdown(tmp[0]),tmp[1])
26 | 
27 | def sanitizeInput(string,inline_delims=["$","$"],equation_delims=["$$","$$"],placeholder="™™™"):
28 |     """Given a string that will be passed to markdown, the content of the different math blocks is stripped out and replaced by a placeholder which MUST be ignored by markdown. A list is returned containing the text with placeholders and a list of the stripped out equations. Note that any pre-existing instances of the placeholder are "replaced" with themselves and a corresponding dummy entry is placed in the returned codeblock. The sanitized string can then be passed safely through markdown and then reconstructed with reconstructMath.
29 | 
30 |     There are potentially four delimiters that can be specified: the left and right delimiters for inline and for equation mode math. These can be anything that isn't already used by markdown and is compatible with mathjax (see the documentation for both).
31 |     """
32 |     #Check placeholder is valid.
33 |     if not markdown_safe(placeholder):
34 |         raise ValueError("Placeholder %s altered by markdown processing."
% placeholder)
35 |     #really what we want is a reverse markdown function, but as that's too much work, this will do
36 |     inline_left=re.compile("(?<!\\\\)"+re.escape(inline_delims[0]))
37 |     inline_right=re.compile("(?<!\\\\)"+re.escape(inline_delims[1]))
38 |     equation_left=re.compile("(?<!\\\\)"+re.escape(equation_delims[0]))
39 |     equation_right=re.compile("(?<!\\\\)"+re.escape(equation_delims[1]))
40 |     placeholder_re = re.compile("(?<!\\\\)"+re.escape(placeholder))
41 |     placeholder_scan = placeholder_re.scanner(string)
42 |     ilscanner=[inline_left.scanner(string),inline_right.scanner(string)]
43 |     eqscanner=[equation_left.scanner(string),equation_right.scanner(string)]
44 |     scanners=[placeholder_scan,ilscanner,eqscanner]
45 |     #There are 3 types of blocks: inline math, equation math and occurrences of the placeholder in the text
46 |     #inBlock is 0 for a placeholder, 1 for an inline block, 2 for an equation
47 |     inBlock=0
48 |     post=-1
49 |     stlen=len(string)
50 |     startmatches=[placeholder_scan.search(),ilscanner[0].search(),eqscanner[0].search()]
51 |     startpoints=[stlen,stlen,stlen]
52 |     startpoints[0]= startmatches[0].start() if startmatches[0] else stlen
53 |     startpoints[1]= startmatches[1].start() if startmatches[1] else stlen
54 |     startpoints[2]= startmatches[2].start() if startmatches[2] else stlen
55 |     terminator=-1
56 |     sanitizedString=''
57 |     codeblocks=[]
58 |     while 1:
59 |         #find the next point of interest.
60 |         while startmatches[0] and startmatches[0].start()<post:
61 |             startmatches[0]=placeholder_scan.search()
62 |             startpoints[0]= startmatches[0].start() if startmatches[0] else stlen
63 |         while startmatches[1] and startmatches[1].start()<post:
64 |             startmatches[1]=ilscanner[0].search()
65 |             startpoints[1]= startmatches[1].start() if startmatches[1] else stlen
66 |         while startmatches[2] and startmatches[2].start()<post:
67 |             startmatches[2]=eqscanner[0].search()
68 |             startpoints[2]= startmatches[2].start() if startmatches[2] else stlen
69 |         #Found start of next block of each type
70 |         #Placeholder type always takes precedence if it exists and is next...
71 |         if startmatches[0] and min(startpoints)==startpoints[0]:
72 |             #We can do it all in one!
73 |             #First add the "stripped" code to the blocks
74 |             codeblocks.append('0'+placeholder)
75 |             #Work out where the placeholder ends
76 |             tmp=startpoints[0]+len(placeholder)
77 |             #Add the "sanitized" text up to and including the placeholder
78 |             sanitizedString = sanitizedString + string[post*(post>=0):tmp]
79 |             #Set the new post
80 |             post=tmp
81 |             #Back to start!
82 |             continue
83 |         elif startmatches[1] is None and startmatches[2] is None:
84 |             #No more blocks, add in the rest of string and be done with it...
85 |             sanitizedString = sanitizedString + string[post*(post>=0):]
86 |             return (sanitizedString, codeblocks)
87 |         elif startmatches[1] is None:
88 |             inBlock=2
89 |         elif startmatches[2] is None:
90 |             inBlock=1
91 |         else:
92 |             inBlock = (startpoints[1] < startpoints[2]) + (startpoints[1] > startpoints[2])*2
93 |             if not inBlock:
94 |                 inBlock = break_tie(startmatches[1],startmatches[2])
95 |         #Magic to ensure minimum index is 0
96 |         sanitizedString = sanitizedString+string[(post*(post>=0)):startpoints[inBlock]]
97 |         post = startmatches[inBlock].end()
98 |         #Now find the matching end...
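        #A terminator that starts before post lies inside text we have already
        #consumed, so keep scanning until we find one past the opening delimiter.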
99 |         while terminator<post:
100 |             endpoint=scanners[inBlock][1].search()
101 |             #If we run out of terminators before ending this loop, we're done
102 |             if endpoint is None:
103 |                 #Add the unterminated codeblock to the sanitized string
104 |                 sanitizedString = sanitizedString + string[startpoints[inBlock]:]
105 |                 return (sanitizedString, codeblocks)
106 |             terminator=endpoint.start()
107 |         #We found a matching endpoint, add the bit to the appropriate codeblock...
108 |         codeblocks.append(str(inBlock)+string[post:endpoint.start()])
109 |         #Now add in the appropriate placeholder
110 |         sanitizedString = sanitizedString+placeholder
111 |         #Fabulous. Now we can start again once we update post...
112 |         post = endpoint.end()
113 | 
114 | def reconstructMath(processedString,codeblocks,inline_delims=["$","$"],equation_delims=["$$","$$"],placeholder="™™™",htmlSafe=False) -> str:
115 |     """This usually takes the output of sanitizeInput, after its output string has been passed through markdown. The delimiters given to this function should match those used to construct the string to begin with.
116 | 
117 |     This will output a string containing html suitable to use with mathjax.
118 | 
119 |     "<", ">" and "&" symbols in math can confuse the html interpreter because they mark the beginning and end of definition blocks. To avoid issues, if htmlSafe is set to True these symbols will be replaced by ascii codes in the math blocks. The downside is that if anyone is already doing this, their already-niced text might be mangled (I think I've taken steps to make sure it won't be, but this is not extensively tested...)."""
120 |     delims=[['',''],inline_delims,equation_delims]
121 |     placeholder_re = re.compile("(?<!\\\\)"+re.escape(placeholder))
122 |     #If we've defined some "new" special characters we'll have to process any escapes of them here
123 |     #Make html substitutions.
124 |     if htmlSafe:
125 |         safeAmp=re.compile("&(?!(?:amp;|lt;|gt;))")
126 |         for i in range(len(codeblocks)):
127 |             codeblocks[i]=safeAmp.sub("&amp;",codeblocks[i])
128 |             codeblocks[i]=codeblocks[i].replace("<","&lt;")
129 |             codeblocks[i]=codeblocks[i].replace(">","&gt;")
130 |     #Step through the codeblocks one at a time and replace the next occurrence of the placeholder. Extra placeholders are invalid math blocks and ignored...
131 |     outString=''
132 |     scan = placeholder_re.scanner(processedString)
133 |     post=0
134 |     for i in range(len(codeblocks)):
135 |         inBlock=int(codeblocks[i][0])
136 |         match=scan.search()
137 |         if not match:
138 |             raise ValueError("More codeblocks given than valid placeholders in text.")
139 |         outString=outString+processedString[post:match.start()]+delims[inBlock][0]+codeblocks[i][1:]+delims[inBlock][1]
140 |         post = match.end()
141 |     #Add the rest of the string (if we need to)
142 |     if post<len(processedString):
143 |         outString = outString+processedString[post:]
144 |     return outString
145 | 
146 | def findBoundaries(string):
147 |     """A deprecated function. Finds the location of string boundaries in a stupid way."""
148 |     last=''
149 |     twod=[]
150 |     oned=[]
151 |     boundary=False
152 |     inoned=False
153 |     intwod=False
154 |     for count,char in enumerate(string):
155 |         if char=="$" and last!='\\':
156 |             #We just hit a valid $ character!
157 | if inoned: 158 | oned.append(count) 159 | inoned=False 160 | elif intwod: 161 | if boundary: 162 | twod.append(count) 163 | intwod=False 164 | boundary=False 165 | else: 166 | boundary=True 167 | elif boundary: 168 | #This means the last character was also a valid $ 169 | twod.append(count) 170 | intwod=True 171 | boundary=False 172 | else: 173 | #This means the last character was NOT a useable $ 174 | boundary=True 175 | elif boundary: 176 | #The last character was a valid $, but this one isn't... 177 | #This means the last character was a valid $, but this isn't 178 | if inoned: 179 | print("THIS SHOULD NEVER HAPPEN!") 180 | elif intwod: 181 | #ignore it... 182 | pass 183 | else: 184 | oned.append(count-1) 185 | inoned=True 186 | boundary=False 187 | last=char 188 | #What if we finished on a boundary character? Actually doesn't matter, but let's include it for completeness 189 | if boundary: 190 | if not (inoned or intwod): 191 | oned.append(count) 192 | inoned=True 193 | return (oned,twod) 194 | -------------------------------------------------------------------------------- /src/obsidian_url.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from aqt.utils import showInfo 4 | 5 | def process_obsidian_file(file_content:str, files_catalog:list): 6 | lines = file_content.split("\n") 7 | 8 | isInCode = False 9 | 10 | for i in range(0, len(lines)): 11 | if lines[i].startswith("<div class=\"codehilite\"><pre><span></span><code>"): 12 | isInCode = True 13 | elif lines[i].startswith("</code></pre></div>"): 14 | isInCode = False 15 | if not isInCode: 16 | lines[i] = process_obsidian_line(lines[i], files_catalog) 17 | 18 | file_content = "\n".join(lines) 19 | return file_content 20 | 21 | def process_obsidian_line(line, files_catalog:list): 22 | if line.find("[[") != -1 and line.find("]]") != -1: 23 | line = line.replace("[[", "ªªª[[") 24 | line = line.replace("]]", "]]ªªª") 25 | line_segments = line.split("[[") 26 | line = "º".join(line_segments) 27 | line_segments = line.split("]]") 28 | line = "º".join(line_segments) 29 | line_segments = line.split("º") 30 | number_of_segments = len(line_segments) 31 | number_of_replacements = number_of_segments // 2 32 | if number_of_replacements > 0: 33 | for i in range(1, number_of_replacements + 1): 34 | replacement_index = 2 * i - 1 35 | line_segments[replacement_index] = process_obsidian_link_content(line_segments[replacement_index], files_catalog) 36 | line = "".join(line_segments) 37 | line = line.replace("ªªª", "") 38 | return line 39 | 40 | def process_obsidian_link_content(content, files_catalog:list): 41 | if content.find("|") != -1: 42 | content_segments = content.split("|") 43 | obsidian_url = search_for_note(content_segments[0], files_catalog) 44 | content = "<a href = \"" + obsidian_url + "\">" + content_segments[1] + "</a>" 45 | else: 46 | obsidian_url = search_for_note(content, files_catalog) 47 | content = "<a href = \"" + obsidian_url + "\">" + content + "</a>" 48 | return content 49 | 50 | def search_for_note(name:str, files_catalog:list): 51 | for file in files_catalog: 52 | if file.get_file_name() == name: 53 | return file.get_obsidian_url() 54 | return "" 55 | -------------------------------------------------------------------------------- /src/processor.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import re 3 | import html 4 | import random 5 | import hashlib 6 | from . 
import settings
7 | from .markdown2 import markdown2
8 | from .markdown2 import markdown2Mathjax
9 | from aqt.utils import showInfo
10 | 
11 | 
12 | 
13 | mark_file_extras = {
14 |     "fenced-code-blocks": None,
15 |     "metadata": None,
16 |     "strike": None,
17 |     "tables": None,
18 |     "tag-friendly": None,
19 |     "task_list": None,
20 |     "footnotes": None,
21 |     "break-on-newline": True
22 | }
23 | 
24 | def read_file(full_path:str) -> list:
25 |     output = ""
26 |     source = ""
27 |     uid = ""
28 |     has_uid = False
29 |     with open(full_path, mode="r", encoding="utf-8") as file:
30 |         source = file.read()
31 |     temporary_content = markdown2Mathjax.sanitizeInput(source)
32 |     temporary_content0 = temporary_content[0].replace("[[", "œœœ")
33 |     temporary_content0 = temporary_content0.replace("]]", "®®®")
34 |     if source.startswith("---"):
35 |         markdown_file = markdown2.markdown(temporary_content0, extras = ["fenced-code-blocks", "metadata", "strike", "tables", "tag-friendly", "task_list", "footnotes", "break-on-newline"])
36 |         metadata = markdown_file.metadata
37 |     else:
38 |         markdown_file = markdown2.markdown(temporary_content0, extras = ["fenced-code-blocks", "strike", "tables", "tag-friendly", "task_list", "footnotes", "break-on-newline"])
39 |         metadata = {}
40 |     markdown_file = markdown_file.replace("œœœ", "[[")
41 |     markdown_file = markdown_file.replace("®®®", "]]")
42 |     try:
43 |         uid = metadata["uid"]
44 |     except KeyError:
45 |         random_number = random.randint(0, 100000000000000000000000000000)
46 |         new_source = source + full_path + str(random_number)
47 |         hash_value = hashlib.md5(new_source.encode())
48 |         uid = str(hash_value.hexdigest())
49 |         if len(metadata) == 0:
50 |             source = "---\nuid: " + uid + "\n---\n\n" + source
51 |         else:
52 |             source_lines = source.split("\n")
53 |             source_lines[0] = "---\nuid: " + uid
54 |             source = "\n".join(source_lines)
55 | 
56 |     for i in range(len(temporary_content[1])):
57 |         temporary_content[1][i] = html.escape(temporary_content[1][i])
58 |         temporary_content[1][i] = temporary_content[1][i].replace("{{", "{ {")
59 |         temporary_content[1][i] = temporary_content[1][i].replace("}}", "} }")
60 | 
61 |     cloze_settings = metadata_to_settings(metadata)
62 | 
63 |     markdown_file = get_converted_file(cloze_settings, markdown_file)
64 |     if cloze_settings["type"] != "cloze" and cloze_settings["type"] != "Cloze":
65 |         markdown_file[1] = False
66 |     output = markdown2Mathjax.reconstructMath(markdown_file[0], temporary_content[1])
67 |     output = math_conversion(output)
68 | 
69 |     # FIXME: output here
70 |     with open(full_path, mode = "w", encoding = "utf-8") as file:
71 |         file.write(source)
72 |     return [uid, output, markdown_file[1], metadata]
73 | 
74 | def metadata_to_settings(metadata: dict) -> dict:
75 |     new_settings = {}
76 |     default_settings = settings.get_settings()
77 |     for individual_key in default_settings.keys():
78 |         try:
79 |             new_settings[individual_key] = metadata[individual_key]
80 |         except KeyError:
81 |             new_settings[individual_key] = default_settings[individual_key]
82 |     return new_settings
83 | 
84 | def get_converted_file(cloze_settings, file_content):
85 |     file_content = cloze_generation(cloze_settings, file_content)
86 |     file_content = cloze_number_generation(cloze_settings["mode"], file_content)
87 |     return file_content
88 | 
89 | # Special thanks to Anis Qiao (https://github.com/qiaozhanrong) for the math_conversion section of the code! Now, obsidianki can support display math formulas written across multiple lines.
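# A rough sketch of the conversion below (assuming the default "$"/"$$"
# delimiters; this example is illustrative, not verified output):
#   "Inline $x$ and display $$x^2$$"  ->  "Inline \(x\) and display \[x^2\]"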
90 | 
91 | def math_conversion(file_content):
92 |     isOpen = False
93 |     s = ""
94 |     p = 0
95 |     while True:
96 |         q = file_content.find("$$", p)
97 |         if q == -1:
98 |             s += file_content[p:]
99 |             break
100 |         s += file_content[p:q] + ("\\]" if isOpen else "\\[")
101 |         isOpen = not isOpen
102 |         p = q + 2
103 |     file_content = s
104 | 
105 |     isOpen = False
106 |     s = ""
107 |     p = 0
108 |     while True:
109 |         q = file_content.find("$", p)
110 |         if q == -1:
111 |             s += file_content[p:]
112 |             break
113 |         s += file_content[p:q] + ("\\)" if isOpen else "\\(")
114 |         isOpen = not isOpen
115 |         p = q + 1
116 |     file_content = s
117 | 
118 |     return file_content
119 | 
120 | def cloze_generation(cloze_settings:dict, file_content:str) -> str:
121 |     file_content = file_content.replace("#new_cloze", "<label id = \"tag\">#new_cloze</label>")
122 |     if cloze_settings["type"] == "cloze" or cloze_settings["type"] == "Cloze":
123 |         if cloze_settings["bold"] == "True" or cloze_settings["bold"] == "true":
124 |             file_content = file_content.replace("<strong>", "<strong>{{c¡::")
125 |             file_content = file_content.replace("</strong>", "}}</strong>")
126 |         if cloze_settings["italics"] == "True" or cloze_settings["italics"] == "true":
127 |             file_content = file_content.replace("<em>", "<em>{{c¡::")
128 |             file_content = file_content.replace("</em>", "}}</em>")
129 |         if cloze_settings["image"] == "True" or cloze_settings["image"] == "true":
130 |             file_content = apply_cloze_to_image(file_content)
131 |         if cloze_settings["inline code"] == "True" or cloze_settings["inline code"] == "true":
132 |             file_content = re.sub(r"<code>(?!<span)", "<code>{{c¡::", file_content)
133 |             file_content = re.sub(r"</code>(?!</pre>)", "}}</code>", file_content)
134 |         if cloze_settings["QA"] == "True" or cloze_settings["QA"] == "true":
135 |             tmp = file_content.split("\n")
136 |             for i in range(0, len(tmp)):
137 |                 if tmp[i].startswith("<p>A: ") and tmp[i].endswith("</p>"):
138 |                     # TODO: add a security check to make sure that these two things are in the same line.
139 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
140 |                     tmp[i] = tmp[i].replace("}}", "")
141 |                     tmp[i] = tmp[i].replace("<p>A: ", "<p>A: {{c¡::", 1)
142 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
143 |                 elif tmp[i].startswith("<p>答:") and tmp[i].endswith("</p>"):
144 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
145 |                     tmp[i] = tmp[i].replace("}}", "")
146 |                     tmp[i] = tmp[i].replace("<p>答:", "<p>答:{{c¡::", 1)
147 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
148 | 
149 |                 # ==================================================================
150 |                 # | You can disable this code if you enabled strict line spacing.  |
151 |                 # ==================================================================
152 |                 elif tmp[i].startswith("A: ") and tmp[i].endswith("</p>"):
153 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
154 |                     tmp[i] = tmp[i].replace("}}", "")
155 |                     tmp[i] = tmp[i].replace("A: ", "A: {{c¡::", 1)
156 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
157 |                 elif tmp[i].startswith("答:") and tmp[i].endswith("</p>"):
158 |                     tmp[i] = tmp[i].replace("{{c¡::", "")
159 |                     tmp[i] = tmp[i].replace("}}", "")
160 |                     tmp[i] = tmp[i].replace("答:", "答: {{c¡::", 1)
161 |                     tmp[i] = tmp[i].replace("</p>", "}}</p>", 1)
162 |             file_content = "\n".join(tmp)
163 |         if cloze_settings["list"] == "True" or cloze_settings["list"] == "true":
164 |             tmp = file_content.split("\n")
165 |             for i in range(0, len(tmp)):
166 |                 if tmp[i].find("{{c¡::") != -1:
167 |                     pass
168 |                 else:
169 |                     tmp[i] = tmp[i].replace("<li>", "<li>{{c¡::")
170 |                     tmp[i] = tmp[i].replace("</li>", "}}</li>")
171 |             file_content = "\n".join(tmp)
172 |         if cloze_settings["quote"] == "True" or cloze_settings["quote"] == "true":
173 |             # ===================================================
174 |             # | TODO: use REGEX to replace the proper ones here |
175 |             # ===================================================
176 |             file_content = file_content.replace("<blockquote>", "<blockquote>{{c¡::")
177 |             file_content = file_content.replace("</blockquote>", "}}</blockquote>")
178 |         if cloze_settings["block code"] == "True" or cloze_settings["block code"] == "true":
179 |             # ===================================================
180 |             # | TODO: use REGEX to replace the proper ones here |
181 |             # ===================================================
182 |             file_content = file_content.replace("<div class=\"codehilite\"><pre><span></span><code>", "<div class=\"codehilite\"><pre><span></span><code>{{c¡::")
183 |             file_content = file_content.replace("</code></pre></div>", "}}</code></pre></div>")
184 |         file_content = highlight_conversion(file_content, cloze_settings["highlight"])
185 |     elif cloze_settings["type"] == "basic" or cloze_settings["type"] == "Basic":
186 |         file_content = highlight_conversion(file_content, "False")
187 |     return file_content
188 | 
189 | def cloze_number_generation(mode:str, file_content:str) -> list:
190 |     has_cloze = False
191 | 
192 |     if file_content.find("¡") != -1:
193 |         has_cloze = True
194 | 
195 |     if mode == "word":
196 |         cloze_num = 1
197 |         while file_content.find("¡") != -1:
198 |             file_content = file_content.replace("¡", str(cloze_num), 1)
199 |             cloze_num = cloze_num + 1
200 |     elif mode == "line":
201 |         tmp = file_content.split("\n")
202 |         cloze_num = 0
203 |         for i in range(0, len(tmp)):
204 |             if tmp[i].find("¡") != -1:
205 |                 cloze_num = cloze_num + 1
206 |                 tmp[i] = tmp[i].replace("¡", str(cloze_num))
207 |         file_content = "\n".join(tmp)
208 |     elif mode == "heading":
209 |         # ==========================================================
210 |         # | TODO: Check the code here to see if it actually works  |
211 |         # ==========================================================
212 |         tmp = file_content.split("\n")
213 |         cloze_num = 0
214 |         increase_num = 0
215 |         new_cloze = 0
216 |         for i in range(0, len(tmp)):
217 |             if re.search(r"<h\d>", tmp[i]) is not None or tmp[i].find("#new_cloze") != -1:
218 |                 # TODO: add this to documentation
219 |                 cloze_num = get_cloze_number(tmp) + 1
220 |             if tmp[i].startswith("<p>A: ") or tmp[i].startswith("<p>答:") or tmp[i].startswith("A: ") or tmp[i].startswith("答:"):
221 |                 increase_num = get_cloze_number(tmp) + 1
222 |                 tmp[i] = tmp[i].replace("¡", str(increase_num))
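                # (The answer line takes the next unused cloze index; the lines
                # that follow in this heading section continue from increase_num + 1.)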
223 |                 cloze_num = increase_num + 1
224 |             elif tmp[i].startswith("<li>"):
225 |                 if new_cloze == 0 and i < (len(tmp) - 2) and not tmp[i + 1].startswith("<li>"):
226 |                     increase_num = get_cloze_number(tmp) + 1
227 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
228 |                     cloze_num = increase_num + 1
229 |                 elif new_cloze == 0:
230 |                     new_cloze = 1
231 |                     increase_num = get_cloze_number(tmp) + 1
232 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
233 |                 elif new_cloze == 1 and i < (len(tmp) - 2) and tmp[i + 1].startswith("<li>"):
234 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
235 |                 elif new_cloze == 1 and i < (len(tmp) - 2) and not tmp[i + 1].startswith("<li>"):
236 |                     tmp[i] = tmp[i].replace("¡", str(increase_num))
237 |                     cloze_num = increase_num + 1
238 |                     new_cloze = 0
239 |             tmp[i] = tmp[i].replace("¡", str(cloze_num))
240 |         file_content = "\n".join(tmp)
241 |     elif mode == "document":
242 |         if file_content.find("¡") != -1:
243 |             file_content = file_content.replace("¡", "1")
244 |     return [file_content, has_cloze]
245 | 
246 | 
247 | def get_cloze_number(tmp) -> int:
248 |     file_content = "".join(tmp)
249 |     cloze_number = 0
250 |     for i in range(1, 7):
251 |         if file_content.find("{{c%d::"%(i)) != -1:
252 |             cloze_number = i
253 |     return cloze_number
254 | 
255 | # =========================================
256 | # | TODO: Check to see if this code works |
257 | # =========================================
258 | 
259 | 
260 | def highlight_conversion(file_content: str, to_cloze: str) -> str:
261 |     lines = file_content.split("\n")
262 |     isInCode = False
263 |     for i in range(0, len(lines)):
264 |         if lines[i].startswith("<div class=\"codehilite\"><pre><span></span><code>"):
265 |             isInCode = True
266 |         elif lines[i].startswith("</code></pre></div>"):
267 |             isInCode = False
268 |         if not isInCode:
269 |             lines[i] = apply_highlight(lines[i], to_cloze)
270 |     file_content = "\n".join(lines)
271 |     return file_content
272 | 
273 | 
274 | def apply_highlight(line: str, to_cloze: str) -> str:
275 |     line = "ªªª" + line + "ªªª"
276 |     line_segments = line.split("==")
277 |     number_of_highlights = len(line_segments) // 2
278 |     if number_of_highlights > 0:
279 |         if to_cloze == "True" or to_cloze == "true":
280 |             for i in range(1, number_of_highlights + 1):
281 |                 highlight_index = 2 * i - 1
282 |                 line_segments[highlight_index] = "<label id = \"highlight\">{{c¡::" + line_segments[highlight_index] + "}}</label>"
283 |         else:
284 |             for i in range(1, number_of_highlights + 1):
285 |                 highlight_index = 2 * i - 1
286 |                 line_segments[highlight_index] = "<label id = \"highlight\">" + line_segments[highlight_index] + "</label>"
287 | 
288 |     line = "".join(line_segments)
289 |     line = line.replace("ªªª", "")
290 |     return line
291 | 
292 | def apply_cloze_to_image(file_content: str) -> str:
293 |     lines = file_content.split("\n")
294 |     for i in range(0, len(lines)):
295 |         image_url = re.search(r"<img src=\".+? \/>", lines[i])
296 |         if image_url is not None:
297 |             lines[i] = re.sub(r"<img src=\".+? 
\/>", "{{c¡::" + image_url.group(0) + "}}", lines[i]) 298 | file_content = "\n".join(lines) 299 | return file_content -------------------------------------------------------------------------------- /src/settings.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import os 3 | import aqt 4 | import pickle 5 | from aqt.qt import * 6 | from aqt import AnkiQt, gui_hooks 7 | from aqt.utils import tooltip 8 | from PyQt5 import QtWidgets, QtCore 9 | 10 | # TODO: Rework on the settings and clean up the code 11 | 12 | default_settings = { 13 | "vault path": "/Users/xiuxuan/Knowledge Base", 14 | "trash folder": "Trash", 15 | "archive folder": "Archive", 16 | "ignore folder": "Templates", 17 | "mode": "heading", 18 | "type": "cloze", 19 | "bold": "True", 20 | "italics": "True", 21 | "image": "True", 22 | "quote": "False", # FIXME: fix the conflict of Quote with other clozes 23 | "QA": "True", 24 | "list": "True", 25 | "inline code": "True", 26 | "block code": "False", 27 | "highlight": "False" 28 | } 29 | 30 | SETTINGS_PATH = os.path.expanduser("~/.obsidianki4.settings") 31 | 32 | 33 | def save_settings(settings, path=SETTINGS_PATH): 34 | with open(path, "wb") as fd: 35 | pickle.dump(settings, fd) 36 | 37 | 38 | def load_settings(path=SETTINGS_PATH): 39 | if os.path.isfile(path): 40 | with open(path, "rb") as fd: 41 | return pickle.load(fd) 42 | return default_settings 43 | 44 | def get_settings(): 45 | settings = load_settings() 46 | return settings 47 | 48 | def get_settings_by_name(setting_name): 49 | settings = load_settings() 50 | try: 51 | return settings[setting_name] 52 | except KeyError: 53 | return default_settings[setting_name] --------------------------------------------------------------------------------