├── .gitignore
├── DEPENDENCIES.md
├── LICENSE
├── README.md
├── data
│   ├── cbow.bin
│   ├── cbow.sh
│   ├── cbow.txt
│   ├── mtgvocab.json
│   └── output.txt
├── decode.py
├── encode.py
├── lib
│   ├── cardlib.py
│   ├── cbow.py
│   ├── config.py
│   ├── datalib.py
│   ├── html_extra_data.py
│   ├── jdecode.py
│   ├── manalib.py
│   ├── namediff.py
│   ├── nltk_model.py
│   ├── nltk_model_api.py
│   ├── transforms.py
│   └── utils.py
├── mtg_sweep1.ipynb
├── scripts
│   ├── analysis.py
│   ├── autosample.py
│   ├── collect_checkpoints.py
│   ├── distances.py
│   ├── keydiff.py
│   ├── mtg_validate.py
│   ├── ngrams.py
│   ├── pairing.py
│   ├── sanity.py
│   ├── streamcards.py
│   ├── sum.py
│   └── summarize.py
└── sortcards.py
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | *.pyc
3 | AllSets.json
4 | AllSets-x.json
5 | lib/__init__.py
6 |
--------------------------------------------------------------------------------
/DEPENDENCIES.md:
--------------------------------------------------------------------------------
1 | Dependencies
2 | ======
3 |
4 | ## mtgjson
5 |
6 | First, you'll need the json corpus of Magic the Gathering cards, which can be found at:
7 |
8 | http://mtgjson.com/
9 |
10 | You probably want the file AllSets.json, which you should also be able to download here:
11 |
12 | http://mtgjson.com/json/AllSets.json
13 |
14 | ## Python packages
15 |
16 | mtgencode uses a few additional Python packages which you should be able to install with pip, Python's package manager. They aren't mission critical, but they provide better capitalization of names and text in human-readable output formats. If they aren't installed, mtgencode will silently fall back to less effective workarounds.
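The fallback works via the usual optional-import pattern. A minimal sketch of the idea (illustrative only; the crude capitalizer below is an assumption, not mtgencode's exact workaround):

```
# degrade gracefully if titlecase isn't installed
try:
    from titlecase import titlecase
except ImportError:
    def titlecase(s):
        # cruder workaround: just capitalize each word
        return ' '.join(w.capitalize() for w in s.split())

print titlecase('crovax the cursed')
```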
17 |
18 | On Ubuntu, you should be able to install the necessary packages with:
19 |
20 | ```
21 | sudo apt-get install python-pip
22 | sudo pip install titlecase
23 | sudo pip install nltk
24 | ```
25 |
26 | nltk requires some additional data files to work, so you'll also have to do:
27 |
28 | ```
29 | mkdir ~/nltk_data
30 | cd ~/nltk_data
31 | python -c "import nltk; nltk.download('punkt')"
32 | cd -
33 | ```
34 |
35 | You don't have to put the files in ~/nltk_data; that's just one of the places nltk will look automatically. If you try to run decode.py with nltk but without the additional files, the error message is pretty helpful.
36 |
37 | mtgencode can also use numpy to speed up some of the long calculations required to generate the creativity statistics comparing similarity of generated and existing cards. You can install numpy with:
38 |
39 | ```
40 | sudo apt-get install python-dev python-pip
41 | sudo pip install numpy
42 | ```
43 |
44 | This will launch an absolutely massive compilation process for all of the numpy C sources. Go get a cup of coffee, and if it fails, consult Google. You'll probably need to at least have GCC installed; I'm not sure what else.
45 |
46 | Some additional packages will be needed for multithreading, but that doesn't work yet, so no worries.
47 |
48 | ## word2vec
49 |
50 | The creativity analysis is done using vector models produced by this tool:
51 |
52 | https://code.google.com/p/word2vec/
53 |
54 | You can install it pretty easily with subversion:
55 |
56 | ```
57 | sudo apt-get install subversion
58 | mkdir ~/word2vec
59 | cd ~/word2vec
60 | svn checkout http://word2vec.googlecode.com/svn/trunk/
61 | cd trunk
62 | make
63 | ```
64 |
65 | That should create some files, among them a binary called word2vec. Add this to your path somehow, and you'll be able to invoke cbow.sh from within the data/ subdirectory to recompile the vector model (cbow.bin) from whatever text representation was last produced (cbow.txt).
66 |
67 | ## Rebuilding the data files
68 |
69 | The standard procedure to produce the derived data files from AllSets.json is the following:
70 |
71 | ```
72 | ./encode.py -v data/AllSets.json data/output.txt
73 | ./encode.py -v data/output.txt data/cbow.txt -s -e vec
74 | cd data
75 | ./cbow.sh
76 | ```
77 |
78 | This of course assumes that you have AllSets.json in data/, and that you start from the root of the repo, in the same directory as encode.py.
79 |
80 | ## Magic Set Editor 2
81 |
82 | MSE2 is a tool for creating and viewing custom magic cards:
83 |
84 | http://magicseteditor.sourceforge.net/
85 |
86 | Set files, with the extension .mse-set, can be produced by decode.py using the -mse option and then viewed in MSE2.
87 |
88 | Unfortunately, getting MSE2 to run on Linux can be tricky. Both Wine 1.6 and 1.7 have been reported to work on Ubuntu; instructions for 1.7 can be found here:
89 |
90 | https://www.winehq.org/download/ubuntu
91 |
92 | To install MSE with Wine, download the standard Windows installer and open it with Wine. Everything should just work. You will need some additional card styles:
93 |
94 | http://sourceforge.net/projects/msetemps/files/Magic%20-%20Recently%20Printed%20Styles.mse-installer/download
95 |
96 | And possibly this:
97 |
98 | http://sourceforge.net/projects/msetemps/files/Magic%20-%20M15%20Extra.mse-installer/download
99 |
100 | Once MSE2 is installed with Wine, you should be able to just click on the template installers and MSE2 will know what to do with them.
101 |
102 | Some additional system fonts are required, specifically Beleren Bold, Beleren Small Caps Bold, and Relay Medium. Those can be found here:
103 |
104 | http://www.slightlymagic.net/forum/viewtopic.php?f=15&t=14730
105 |
106 | http://www.azfonts.net/download/relay-medium/ttf.html
107 |
108 | Open them in Font Viewer and click install; you might then have to clear the font caches so MSE2 can see them:
109 |
110 | ```
111 | sudo fc-cache -fv
112 | ```
113 |
114 | If you're running a Linux distro other than Ubuntu, then a similar procedure will probably work. If you're on Windows, then it should work fine as is without messing around with Wine. You'll still need the additional styles.
115 |
116 | I tried to build MSE2 from source on 64-bit Ubuntu. After hacking up some of the files, I did get a working binary, but I was unable to set up the data files it needs in such a way that I could actually open a set. If you manage to get this to work, please explain how, and I will be very grateful.
117 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015 Bill Zorn
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 |
10 | The above copyright notice and this permission notice shall be included in
11 | all copies or substantial portions of the Software.
12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 19 | THE SOFTWARE. 20 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # mtgencode 2 | 3 | Utilities to assist in the process of generating Magic the Gathering cards with neural nets. Inspired by this thread on the mtgsalvation forums: 4 | 5 | http://www.mtgsalvation.com/forums/creativity/custom-card-creation/612057-generating-magic-cards-using-deep-recurrent-neural 6 | 7 | The purpose of this code is mostly to wrangle text between various human and machine readable formats. The original input comes from [mtgjson](http://mtgjson.com); this is filtered and reduced to one of several input formats intended for neural network training, such as the standard encoded format used in [data/output.txt](https://github.com/billzorn/mtgencode/blob/master/data/output.txt). Any json or encoded data, including output from appropriately trained neural nets, can then be interpreted as cards and decoded to a human readable format, such as a text spoiler, [Magic Set Editor 2](http://magicseteditor.sourceforge.net) set file, or a pretty, portable html file that can be viewed in any browser. 8 | 9 | ## Requirements 10 | 11 | I'm running this code on Ubuntu 14.04 with Python 2.7. Unfortunately it does not work with Python 3, though apparently it isn't too hard to use 2to3 to automatically convert it. 12 | 13 | For the most part it should work out of the box, though there are a few optional bonus features that will make it much better. See [DEPENDENCIES.md](https://github.com/billzorn/mtgencode/blob/master/DEPENDENCIES.md#dependencies). 14 | 15 | This code does not have anything to do with neural nets; if you want to generate cards with them, see the [tutorial](https://github.com/billzorn/mtgencode#tutorial). 16 | 17 | ## Usage 18 | 19 | Functionality is provided by two main driver scripts: encode.py and decode.py. Logically, encode.py handles encoding to formats intended to feed into a neural network, while decode.py handles decoding to formats intended to be read by a human. 
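Both drivers also expose their logic as an importable main() function whose keyword arguments mirror the command-line flags, so you can drive them from Python instead of the shell. A minimal sketch (assuming you run from the root of the repo, so the scripts can find lib/):

```
# encode the json corpus, then decode it back to a text spoiler
import encode, decode

encode.main('data/AllSets.json', 'data/output.txt', verbose=True)
decode.main('data/output.txt', 'data/output.pretty.txt', vdump=True)
```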
20 |
21 | ### encode.py
22 |
23 | ```
24 | usage: encode.py [-h] [-e {std,named,noname,rfields,old,norarity,vec,custom}]
25 | [-r] [--nolinetrans] [--nolabel] [-s] [-v]
26 | infile [outfile]
27 |
28 | positional arguments:
29 | infile encoded card file or json corpus to encode
30 | outfile output file, defaults to stdout
31 |
32 | optional arguments:
33 | -h, --help show this help message and exit
34 | -e {std,named,noname,rfields,old,norarity,vec,custom}, --encoding {std,named,noname,rfields,old,norarity,vec,custom}
35 | encoding format to use
36 | -r, --randomize randomize the order of symbols in mana costs
37 | --nolinetrans don't reorder lines of card text
38 | --nolabel don't label fields
39 | -s, --stable don't randomize the order of the cards
40 | -v, --verbose verbose output
41 | ```
42 |
43 | The supported encodings are:
44 |
45 | Argument | Description
46 | -----------|------------
47 | std | Standard format: `|type|supertype|subtype|loyalty|pt|text|cost|rarity|name|`.
48 | named | Name first: `|name|type|supertype|subtype|loyalty|pt|text|cost|rarity|`.
49 | noname | No name field at all: `|type|supertype|subtype|loyalty|pt|text|cost|rarity|`.
50 | rfields | Randomize the order of the fields, using only the label to distinguish which field is which.
51 | old | Legacy format: `|name|supertype|type|loyalty|subtype|rarity|pt|cost|text|`. No field labels.
52 | norarity | Older legacy format: `|name|supertype|type|loyalty|subtype|pt|cost|text|`. No field labels.
53 | vec | Produce a content vector for each card; used with [word2vec](https://code.google.com/p/word2vec/).
54 | custom | Blank format slot, intended to help users add their own formats to the python source.
55 |
56 | ### decode.py
57 |
58 | ```
59 | usage: decode.py [-h] [-e {std,named,noname,rfields,old,norarity,vec,custom}]
60 | [-g] [-f] [-c] [-d] [-v] [-mse] [-html]
61 | infile [outfile]
62 |
63 | positional arguments:
64 | infile encoded card file or json corpus to decode
65 | outfile output file, defaults to stdout
66 |
67 | optional arguments:
68 | -h, --help show this help message and exit
69 | -e {std,named,noname,rfields,old,norarity,vec,custom}, --encoding {std,named,noname,rfields,old,norarity,vec,custom}
70 | encoding format to use
71 | -g, --gatherer emulate Gatherer visual spoiler
72 | -f, --forum use pretty mana encoding for mtgsalvation forum
73 | -c, --creativity use CBOW fuzzy matching to check creativity of cards
74 | -d, --dump dump out lots of information about invalid cards
75 | -v, --verbose verbose output
76 | -mse, --mse use Magic Set Editor 2 encoding; will output as .mse-
77 | set file
78 | -html, --html create a .html file with pretty forum formatting
79 | ```
80 |
81 | The default output is a text spoiler which modifies the output of the neural net as little as possible while making it human readable. Specifying the -g option will produce a prettier, Gatherer-inspired text spoiler with heavier-weight transformations applied to the text, such as capitalization. The -f option encodes mana symbols in the format used by the mtgsalvation forum; this is useful if you want to cut and paste your spoiler into a post to share it.
82 |
83 | Passing the -mse option will cause decode.py to produce both the hilarious internal MSE text format as well as an actual mse set file, which is really just a renamed zip archive. The -f and -g flags will be respected in the text that is dumped to each card's notes field.
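Since the .mse-set file is just a renamed zip archive holding a single file named 'set', you can peek at what decode.py produced without opening MSE2. A small sketch (the path here is an example):

```
import zipfile

with zipfile.ZipFile('data/allcards.mse-set') as zf:
    print zf.namelist()         # ['set']
    print zf.read('set')[:300]  # start of the internal MSE text format
```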
84 |
85 | Finally, the -c and -d options will print out additional data about the quality of the cards. Running with -c is extremely slow due to the massive amount of computation involved, though at least we can do it in parallel over all of your processor cores; -d is probably a good idea to use in general unless you're trying to produce pretty output to show off. Using html mode is especially useful with -c as we can link to visual spoilers from magiccards.info.
86 |
87 | ### Examples
88 |
89 | To generate the standard encoding in data/output.txt, I run:
90 |
91 | ```
92 | ./encode.py -v data/AllSets.json data/output.txt
93 | ```
94 |
95 | Of course, this requires that you've downloaded the mtgjson corpus to data/AllSets.json, and are running from the root of the repo.
96 |
97 | If I wanted to convert that standard output to a Magic Set Editor 2 set, I'd run:
98 |
99 | ```
100 | ./decode.py -v data/output.txt data/allcards -f -g -d -mse
101 | ```
102 |
103 | This will produce a useless text file called data/allcards, and a set file called data/allcards.mse-set that you can open with MSE2. The -f and -g options will cause the text spoiler included in the notes field of each card in the set to be a pretty Gatherer-inspired affair that you could cut and paste onto the mtgsalvation forum. The -d option will dump additional information if any of the cards are invalidly formatted, which probably won't do anything because all existing magic cards are encoded correctly. Specifying the -c option here would be a bad idea; it would probably take several days to run.
104 |
105 | ### Scripts
106 |
107 | A bunch of additional data processing functionality is provided by the files in scripts/. Right now there isn't a whole lot, but more tools might be added in the future, to do things such as convert card dumps into .arff files that could be analyzed in [Weka](http://www.cs.waikato.ac.nz/ml/weka/).
108 |
109 | Currently, scripts/summarize.py will build a bunch of big data mining indices and use them to print out interesting statistics about a dump of cards. If you want to use mtgencode to do your own data analysis, taking a look at it would be a good place to start.
110 |
111 |
112 |
113 | ## Tutorial
114 |
115 | This tutorial will cover how to generate cards from scratch using neural nets.
116 |
117 | ### Set up a Linux environment
118 |
119 | If you're already running on Linux, hooray! If not, you have a few options. The easiest is probably to use a virtual machine; the disadvantage of this approach is that it will prevent you from using a graphics card to train the neural net, which speeds things up immensely. For reference, my GTX Titan is about 10x faster than my overclocked 8-core i7-5960X.
120 |
121 | The other option is to dual boot your machine (which is what I do) or otherwise acquire a machine that you can run Linux on natively. How exactly you do this is beyond the scope of this tutorial.
122 |
123 | If you do decide to go the virtual machine route:
124 |
125 | 1. Download some sort of virtual machine software. I recommend [VirtualBox](https://help.ubuntu.com/community/VirtualBox).
126 | 2. Download a Linux operating system. I recommend [Ubuntu](http://www.ubuntu.com/download/desktop).
127 | 3. [Create a virtual machine, and install the operating system on it](https://help.ubuntu.com/community/VirtualBox/FirstVM).
128 |
129 | IMPORTANT NOTE: Training neural nets is extremely CPU intensive, and rather memory intensive as well.
If you don't want training to take multiple weeks, it's a very good idea to give your virtual machine as many processor cores and as much memory as you can spare, and to monitor system performance with the 'top' command to make sure you aren't [swapping](https://help.ubuntu.com/community/SwapFaq), as that will degrade performance immensely.
130 |
131 | You should be able to boot up the virtual machine and use whatever operating system you installed. If you're new to Linux, you might want to familiarize yourself with it a little. For my own sanity, I'm going to assume at least basic familiarity. Most of what we'll be doing will be in terminals; if the instructions say to do something and then provide some code in a block quote, it probably means to type that into a terminal, one line at a time.
132 |
133 | ### Set up the neural net code
134 |
135 | We're ultimately going to use the code from the [mtg-rnn repo](https://github.com/billzorn/mtg-rnn); if anything is unclear you can refer to the documentation there as well.
136 |
137 | First, we need to install some dependencies. The primary one is Torch, the scientific computing framework the neural net code is written in. Directions are [here](http://torch.ch/docs/getting-started.html).
138 |
139 | Next, open a terminal and install some additional lua packages:
140 |
141 | ```
142 | luarocks install nngraph
143 | luarocks install optim
144 | ```
145 |
146 | Now we'll clone the git repo with the neural net code. You'll need git installed; if it isn't:
147 |
148 | ```
149 | sudo apt-get install git
150 | ```
151 |
152 | Then go to your home directory (or wherever you want to put the repo, it can be anywhere really) and clone it:
153 |
154 | ```
155 | cd ~
156 | git clone https://github.com/billzorn/mtg-rnn.git
157 | ```
158 |
159 | This should create the folder mtg-rnn, with a bunch of files in it. To check if it works, try:
160 |
161 | ```
162 | cd ~/mtg-rnn
163 | th train.lua --help
164 | ```
165 |
166 | A large usage message should be printed. If you get an error, then check to make sure Torch is working. As always, Google is your best friend when anything goes wrong.
167 |
168 | ### Set up mtgencode
169 |
170 | Go back to your home directory (or wherever) and clone mtgencode as well:
171 |
172 | ```
173 | cd ~
174 | git clone https://github.com/billzorn/mtgencode.git
175 | ```
176 |
177 | This should create the folder mtgencode, also with a bunch of files in it.
178 |
179 | You'll need Python to run it; to get full functionality, consult [DEPENDENCIES.md](https://github.com/billzorn/mtgencode/blob/master/DEPENDENCIES.md#dependencies). But, it should work with just Python. To install Python:
180 |
181 | ```
182 | sudo apt-get install python
183 | ```
184 |
185 | To check if it works:
186 |
187 | ```
188 | cd ~/mtgencode
189 | ./encode.py --help
190 | ```
191 |
192 | Again, you should see a usage message; if you don't, make sure Python is working. mtgencode uses Python 2.7, so if you think your default python is Python 3, you can try:
193 |
194 | ```
195 | python2 encode.py --help
196 | ```
197 |
198 | instead of running the script directly.
199 |
200 | ### Generating an encoded corpus for training
201 |
202 | If you just want to train with the default corpus, you can skip this step, as it already exists in mtg-rnn. Just replace all instances of 'custom_encoding' with 'mtgencode-std'.
203 |
204 | To generate an encoded corpus, you'll first need to download AllSets.json from [mtgjson.com](http://mtgjson.com/) to data/AllSets.json. Then to encode it:
205 |
206 | ```
207 | ./encode.py -v data/AllSets.json data/custom_encoding.txt
208 | ```
209 |
210 | This will create the file data/custom_encoding.txt with your encoding in it. You can add some options to create a different encoding; consult the usage of [encode.py](https://github.com/billzorn/mtgencode#encodepy).
211 |
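As a quick sanity check, cards in the encoded corpus are separated by blank lines (two newlines; see the format notes at the end of this README), so you can count them with a couple of lines of Python (sketch):

```
# count the cards in the encoded corpus
with open('data/custom_encoding.txt') as f:
    cards = [c for c in f.read().split('\n\n') if c.strip()]
print len(cards)
```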
212 | Now copy this encoded corpus over to mtg-rnn:
213 |
214 | ```
215 | cd ~/mtg-rnn
216 | mkdir data/custom_encoding
217 | cp ~/mtgencode/data/custom_encoding.txt data/custom_encoding/input.txt
218 | ```
219 |
220 | The input file does have to be named input.txt, though you can name the folder that holds it, under mtg-rnn/data/, whatever you want.
221 |
222 | ### Training a neural net
223 |
224 | There are lots of parameters to control training. With a good GPU, I can train a 3-layer, size 512 network in a few hours; on a CPU this will probably take at least a day.
225 |
226 | Most networks we use are about that size. I'd recommend avoiding anything much larger, as they don't seem to produce appreciably better results and take longer to train. The only other parameter you really have to change from the defaults is seq_length, which we usually set somewhere from 120-200. If this causes memory issues you can reduce batch_size slightly to compensate.
227 |
228 | A sample training command might look like this:
229 |
230 | ```
231 | th train.lua -gpuid -1 -rnn_size 256 -num_layers 3 -seq_length 200 -data_dir data/custom_encoding -checkpoint_dir cv/custom_format-256/ -eval_val_every 1000 -seed 7767
232 | ```
233 |
234 | This tells the neural network to train using the corpus in data/custom_encoding/, and to output periodic checkpoints to the directory cv/custom_format-256/. The option "-gpuid -1" means to use the CPU, not a GPU (which won't be possible in VirtualBox anyway). The final options, -eval_val_every and -seed, aren't necessary, but I like to specify them. The seed will be set to a fixed 123 if you don't specify one yourself. If you're generating too many checkpoints and filling up your disk, you can increase the number of iterations between saving them by increasing the argument to -eval_val_every.
235 |
236 | If all goes well, you should see the neural net code do some stuff and then start training, reporting training loss and batch times as it goes:
237 |
238 | ```
239 | 1/112100 (epoch 0.000), train_loss = 4.21492900, grad/param norm = 3.1264e+00, time/batch = 4.73s
240 | 2/112100 (epoch 0.001), train_loss = 4.29372822, grad/param norm = 8.6741e+00, time/batch = 3.62s
241 | 3/112100 (epoch 0.001), train_loss = 4.02817964, grad/param norm = 8.0445e+00, time/batch = 3.57s
242 | ...
243 | ```
244 |
245 | This process can take a while, so go to sleep or something and come back in the morning. The train_loss should eventually start to decrease and settle around 0.5 or so; if it doesn't, then something is wrong and the neural net will probably produce gibberish.
246 |
247 | Every N iterations, where N is the argument to -eval_val_every, the neural net will generate a checkpoint in cv/custom_format-256/. They look like this:
248 |
249 | ```
250 | lm_lstm_epoch2.23_0.5367.t7
251 | ```
252 |
253 | The numbers are important; the first is the epoch, which tells you how many passes the neural network had made over the training data when it saved the checkpoint, and the second is the validation loss of the checkpoint. Validation loss is effectively a measurement of how accurate the checkpoint is at producing text that resembles the encoded format, the lower the better. The two numbers are separated by an underscore, so for the example above, the checkpoint is from epoch 2.23, and it had a validation loss of 0.5367, which isn't great but probably isn't gibberish either.
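If you accumulate a directory full of checkpoints, the two numbers are easy to pull back out of the filenames (a sketch based on the naming convention above):

```
import re

fname = 'lm_lstm_epoch2.23_0.5367.t7'
m = re.match(r'lm_lstm_epoch([0-9.]+)_([0-9.]+)\.t7$', fname)
epoch, val_loss = float(m.group(1)), float(m.group(2))
print epoch, val_loss   # 2.23 0.5367
```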
254 |
255 | ### Sampling checkpoints to generate cards
256 |
257 | Once you're done training, or you've got enough checkpoints and you're just impatient, you can sample to generate actual cards. If the network is still training, you'll probably want to pause it by typing Control-Z in the terminal; you can resume it later with the command 'fg'. Training will use all available CPU resources all by itself, so trying to sample at the same time is a recipe for slow.
258 |
259 | Once you're ready, go to the mtg-rnn repo. A typical sampling command might look like this:
260 |
261 | ```
262 | th sample.lua cv/custom_format-256/lm_lstm_epochXX.XX_X.XXXX.t7 -gpuid -1 -temperature 0.9 -length 2000 | tee cards.txt
263 | ```
264 |
265 | Replace the Xs in the checkpoint name with the numbers in the name of an actual checkpoint; tab completion is your friend. This command will sample 2000 characters, which is probably something like 20 cards, and both print them to the terminal and write them to a file called cards.txt. The interesting options here are the temperature and the length. Temperature controls how cautious the network is; lower values produce more probable output, while higher values make it wilder and more creative. Somewhere in the range of 0.7-1.0 usually works best. Length is just how many characters to generate. You can also specify a seed with -seed, exactly as for training, which is a particularly good idea if you just generated a few million characters and would like to see something new. The default seed is fixed at 123, again exactly as for training.
266 |
267 | You can read the output yourself, but it might be painful, especially if you're using randomly ordered fields.
268 |
269 | ### Postprocessing neural net output with mtgencode
270 |
271 | Once you've generated some cards, you can turn them into pretty text spoilers or a set file for MSE2.
272 |
273 | Go back to mtgencode, and run something like:
274 |
275 | ```
276 | ./decode.py -v ~/mtg-rnn/cards.txt cards.pretty.txt -d
277 | ```
278 |
279 | This should create a file called cards.pretty.txt with a text spoiler in it that's actually designed for human consumption. Open it in your favorite text editor and enjoy!
280 |
281 | The -d option ensures you'll still be able to see anything that went wrong with the cards. You can change the formatting with -f and -g, and produce a set file for MSE2 with -mse. The -c option produces some interesting comparisons to existing cards, but it's slow, so be prepared to wait a long time if you use it on a large dump.
282 |
283 | ## Gory details of the format
284 |
285 | Individual cards are separated by two newlines. Multifaced cards (split, flip, etc.) are encoded together, with the castable one first if applicable, and separated by only one newline.
286 |
287 | All decimal numbers are represented in unary, with numbers over 20 special-cased into english. Fun fact: the only numbers over 20 on cards are 25, 30, 40, 50, 100, and 200. The unary representation uses one character to mark the start of the number, and another to count. So 0 is &, 1 is &^, 2 is &^^, 11 is &^^^^^^^^^^^, and so on.
288 |
289 | Mana costs are specially encoded between braces {}. I use the unary counter to encode the colorless part, and then special two-character symbols for everything else. So, {3}{W}{W} becomes {^^^WWWW}, {U/B}{U/B} becomes {UBUB}, and {X}{X}{X} becomes {XXXXXX}. The details are controlled in lib/utils.py, and handled with the Manacost and Manatext objects in lib/manalib.py.
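A toy version of both encodings, using the markers defined in lib/config.py (the real conversions live in lib/utils.py and lib/manalib.py; this sketch only illustrates the idea):

```
unary_marker, unary_counter = '&', '^'

def to_unary(n):
    # 0 -> '&', 2 -> '&^^', 11 -> '&^^^^^^^^^^^'
    return unary_marker + unary_counter * n

print to_unary(11)
# inside braces there's no '&'; the colorless part is just counters, and
# every other symbol is two characters, so {3}{W}{W} encodes as {^^^WWWW}
print '{' + unary_counter * 3 + 'WW' * 2 + '}'   # {^^^WWWW}
```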
290 |
291 | The name of the card becomes @ in the text. I try to handle all the stupid special cases correctly. For example, Crovax the Cursed is referred to in his text box as simply 'Crovax'. Yuch.
292 |
293 | The names of counters are similarly replaced with %, and then a special line of text is added to tell what kind of counter % refers to. Fun fact: there are more than a hundred different kinds used in real cards.
294 |
295 | Several ambiguous words are resolved. Most directly, the word 'counter' as in 'counter target spell' is replaced with 'uncast'. This should prevent confusion with +&^/+&^ counters and % counters.
296 |
297 | I also reformat cards that choose between multiple things by removing the choice clause itself and instead having a delimited list of options prefixed by a number. If you could choose different numbers of things (one or both, one or more - turns out the latter is valid in all existing cases) then the number is 0, otherwise it's however many things you'd get to choose. So, 'choose one -\= effect x\= effect y' (the \ is a newline) becomes [&^ = effect x = effect y].
298 |
299 | Finally, some postprocessing is done to put the lines of a card's ability text into a standardized, canonical form. Lines with multiple keywords are split, and then we put all of the simple keywords first, followed by things like static or activated abilities. A few things always go first (such as equip and enchant) and a few other things always go last (such as kicker and countertype). There are various reasons for doing this transformation, and some proper science could probably come up with a better specific procedure. One of the primary motivations for putting abilities onto individual lines is that it should simplify the process of adding back in reminder text. It should be noted somewhere that the definition of a simple keyword ability vs. some other line of text is that a simple keyword won't contain a period, and we can split a line with multiple of them by looking for commas and semicolons.
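The splitting heuristic itself is simple; a sketch (the real logic lives with the other text transformations, presumably in lib/transforms.py):

```
import re

def split_simple_keywords(line):
    # a simple keyword line contains no period, so it can be split on
    # commas and semicolons into one keyword per line
    if '.' in line:
        return [line]
    return [w.strip() for w in re.split(r'[,;]', line)]

print split_simple_keywords('flying, first strike')   # ['flying', 'first strike']
```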
300 |
301 | ======
302 |
303 | Here's an attempt at a list of all the things I do:
304 |
305 | * Aggregate split / flip / rotating / etc. cards by their card number (22a with 22b) and put them together
306 |
307 | * Make all text lowercase, so the symbols for mana and X are distinct
308 |
309 | * Remove all reminder text
310 |
311 | * Put @ in for the name of the card
312 |
313 | * Encode the mana costs, and the tap and untap symbols
314 |
315 | * Convert decimal numbers to unary
316 |
317 | * Simplify the syntax of dashes, so that - is only used as a minus sign, and ~ is used elsewhere
318 |
319 | * Make sure that where X is the variable X, it's uppercase
320 |
321 | * Change the names of all counters to % and add a line to identify what kind of counter % refers to
322 |
323 | * Move the equip cost of equipment to the beginning of the text so that it's closer to the type
324 |
325 | * Rename 'counter' in the context of 'counter target spell' to 'uncast'
326 |
327 | * Put choices into [&^ = effect x = effect y] format
328 |
329 | * Replace actual newline characters with \ so that we can use those to separate cards
330 |
331 | * Clean all the unicode junk like accents and unicode minus signs out of the text so there are fewer characters
332 |
333 | * Split composite text lines (e.g. "flying, first strike" -> "flying\first strike") and put the lines into canonical order
334 |
--------------------------------------------------------------------------------
/data/cbow.bin:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/billzorn/mtgencode/ee5f26590dc77cd252fa0ceb00d88b4665e2a9bf/data/cbow.bin
--------------------------------------------------------------------------------
/data/cbow.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | word2vec -train cbow.txt -output cbow.bin -cbow 1 -size 200 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 8 -binary 1 -iter 15
4 |
--------------------------------------------------------------------------------
/data/mtgvocab.json:
--------------------------------------------------------------------------------
1 | {"idx_to_token": {"1": "\n", "2": " ", "3": "\"", "4": "%", "5": "&", "6": "'", "7": "*", "8": "+", "9": ",", "10": "-", "11": ".", "12": "/", "13": "0", "14": "1", "15": "2", "16": "3", "17": "4", "18": "5", "19": "6", "20": "7", "21": "8", "22": "9", "23": ":", "24": "=", "25": "@", "26": "A", "27": "B", "28": "C", "29": "E", "30": "G", "31": "L", "32": "N", "33": "O", "34": "P", "35": "Q", "36": "R", "37": "S", "38": "T", "39": "U", "40": "W", "41": "X", "42": "Y", "43": "[", "44": "\\", "45": "]", "46": "^", "47": "a", "48": "b", "49": "c", "50": "d", "51": "e", "52": "f", "53": "g", "54": "h", "55": "i", "56": "j", "57": "k", "58": "l", "59": "m", "60": "n", "61": "o", "62": "p", "63": "q", "64": "r", "65": "s", "66": "t", "67": "u", "68": "v", "69": "w", "70": "x", "71": "y", "72": "z", "73": "{", "74": "|", "75": "}", "76": "~"}, "token_to_idx": {"\n": 1, " ": 2, "\"": 3, "%": 4, "'": 6, "&": 5, "+": 8, "*": 7, "-": 10, ",": 9, "/": 12, ".": 11, "1": 14, "0": 13, "3": 16, "2": 15, "5": 18, "4": 17, "7": 20, "6": 19, "9": 22, "8": 21, ":": 23, "=": 24, "A": 26, "@": 25, "C": 28, "B": 27, "E": 29, "G": 30, "L": 31, "O": 33, "N": 32, "Q": 35, "P": 34, "S": 37, "R": 36, "U": 39, "T": 38, "W": 40, "Y": 42, "X": 41, "[": 43, "]": 45, "\\": 44, "^": 46, "a": 47, "c": 49, "b": 48, "e": 51, "d": 50, "g": 53, "f": 52, "i": 55, "h": 54, "k": 57, "j": 56, "m": 59, "l": 58, "o": 61, "n": 60, "q": 63, "p": 62, "s": 65, "r": 64, "u": 67, "t": 66, "w": 69, "v": 68, "y": 
71, "x": 70, "{": 73, "z": 72, "}": 75, "|": 74, "~": 76}} -------------------------------------------------------------------------------- /decode.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import zipfile 5 | import shutil 6 | 7 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'lib') 8 | sys.path.append(libdir) 9 | import utils 10 | import jdecode 11 | import cardlib 12 | from cbow import CBOW 13 | from namediff import Namediff 14 | 15 | def main(fname, oname = None, verbose = True, encoding = 'std', 16 | gatherer = False, for_forum = False, for_mse = False, 17 | creativity = False, vdump = False, for_html = False): 18 | 19 | # there is a sane thing to do here (namely, produce both at the same time) 20 | # but we don't support it yet. 21 | if for_mse and for_html: 22 | print 'ERROR - decode.py - incompatible formats "mse" and "html"' 23 | return 24 | 25 | fmt_ordered = cardlib.fmt_ordered_default 26 | 27 | if encoding in ['std']: 28 | pass 29 | elif encoding in ['named']: 30 | fmt_ordered = cardlib.fmt_ordered_named 31 | elif encoding in ['noname']: 32 | fmt_ordered = cardlib.fmt_ordered_noname 33 | elif encoding in ['rfields']: 34 | pass 35 | elif encoding in ['old']: 36 | fmt_ordered = cardlib.fmt_ordered_old 37 | elif encoding in ['norarity']: 38 | fmt_ordered = cardlib.fmt_ordered_norarity 39 | elif encoding in ['vec']: 40 | pass 41 | elif encoding in ['custom']: 42 | ## put custom format decisions here ########################## 43 | 44 | ## end of custom format ###################################### 45 | pass 46 | else: 47 | raise ValueError('encode.py: unknown encoding: ' + encoding) 48 | 49 | cards = jdecode.mtg_open_file(fname, verbose=verbose, fmt_ordered=fmt_ordered) 50 | 51 | if creativity: 52 | namediff = Namediff() 53 | cbow = CBOW() 54 | if verbose: 55 | print 'Computing nearest names...' 56 | nearest_names = namediff.nearest_par(map(lambda c: c.name, cards), n=3) 57 | if verbose: 58 | print 'Computing nearest cards...' 59 | nearest_cards = cbow.nearest_par(cards) 60 | for i in range(0, len(cards)): 61 | cards[i].nearest_names = nearest_names[i] 62 | cards[i].nearest_cards = nearest_cards[i] 63 | if verbose: 64 | print '...Done.' 65 | 66 | def hoverimg(cardname, dist, nd): 67 | truename = nd.names[cardname] 68 | code = nd.codes[cardname] 69 | namestr = '' 70 | if for_html: 71 | if code: 72 | namestr = ('
<a class="hover_img" href="#">' + truename
73 | + '<span><img src="http://magiccards.info/scans/en/' + code + '.jpg" alt="image"/></span></a>'
74 | + ': ' + str(dist) + '\n<hr>\n') # note: literal html tags in this file are plausible reconstructions; the originals were stripped when this dump was rendered
75 | else:
76 | namestr = '<div>' + truename + ': ' + str(dist) + '</div>'
77 | elif for_forum:
78 | namestr = '[card]' + truename + '[/card]' + ': ' + str(dist) + '\n'
79 | else:
80 | namestr = truename + ': ' + str(dist) + '\n'
81 | return namestr
82 |
83 | def writecards(writer):
84 | if for_mse:
85 | # have to prepend a massive chunk of formatting info
86 | writer.write(utils.mse_prepend)
87 |
88 | if for_html:
89 | # have to prepend html info
90 | writer.write(utils.html_prepend)
91 | # separate the write function to allow for writing smaller chunks of cards at a time
92 | segments = sort_colors(cards)
93 | for i in range(len(segments)):
94 | # sort each color section by type
95 | segments[i] = sort_type(segments[i])
96 | # this allows card boxes to be colored for each color
97 | # for coloring of each box separately cardlib.Card.format() must change non-minimally
98 | writer.write('<div>') # (reconstructed tag)
99 | writehtml(writer, segments[i])
100 | writer.write("</div><hr>") # (reconstructed tags)
101 | # closing the html file
102 | writer.write(utils.html_append)
103 | return # break out of the writecards function to avoid writing cards twice
104 |
105 |
106 | for card in cards:
107 | if for_mse:
108 | writer.write(card.to_mse().encode('utf-8'))
109 | fstring = ''
110 | if card.json:
111 | fstring += 'JSON:\n' + card.json + '\n'
112 | if card.raw:
113 | fstring += 'raw:\n' + card.raw + '\n'
114 | fstring += '\n'
115 | fstring += card.format(gatherer = gatherer, for_forum = for_forum,
116 | vdump = vdump) + '\n'
117 | fstring = fstring.replace('<', '(').replace('>', ')')
118 | writer.write(('\n' + fstring[:-1]).replace('\n', '\n\t\t'))
119 | else:
120 | fstring = card.format(gatherer = gatherer, for_forum = for_forum,
121 | vdump = vdump, for_html = for_html)
122 | writer.write((fstring + '\n').encode('utf-8'))
123 |
124 | if creativity:
125 | cstring = '~~ closest cards ~~\n'
126 | nearest = card.nearest_cards
127 | for dist, cardname in nearest:
128 | cstring += hoverimg(cardname, dist, namediff)
129 | cstring += '~~ closest names ~~\n'
130 | nearest = card.nearest_names
131 | for dist, cardname in nearest:
132 | cstring += hoverimg(cardname, dist, namediff)
133 | if for_mse:
134 | cstring = ('\n\n' + cstring[:-1]).replace('\n', '\n\t\t')
135 | writer.write(cstring.encode('utf-8'))
136 |
137 | writer.write('\n'.encode('utf-8'))
138 |
139 | if for_mse:
140 | # more formatting info
141 | writer.write('version control:\n\ttype: none\napprentice code: ')
142 |
143 |
144 | def writehtml(writer, card_set):
145 | for card in card_set:
146 | fstring = card.format(gatherer = gatherer, for_forum = True,
147 | vdump = vdump, for_html = for_html)
148 | if creativity:
149 | fstring = fstring[:-6] # chop off the closing </div> to stick stuff in
150 | writer.write((fstring + '\n').encode('utf-8'))
151 |
152 | if creativity:
153 | cstring = '~~ closest cards ~~\n<br>\n'
154 | nearest = card.nearest_cards
155 | for dist, cardname in nearest:
156 | cstring += hoverimg(cardname, dist, namediff)
157 | cstring += "<hr>\n"
158 | cstring += '~~ closest names ~~\n<br>\n'
159 | nearest = card.nearest_names
160 | for dist, cardname in nearest:
161 | cstring += hoverimg(cardname, dist, namediff)
162 | cstring = '<div>' + cstring + '</div>
\n' 163 | writer.write(cstring.encode('utf-8')) 164 | 165 | writer.write('\n'.encode('utf-8')) 166 | 167 | # Sorting by colors 168 | def sort_colors(card_set): 169 | # Initialize sections 170 | red_cards = [] 171 | blue_cards = [] 172 | green_cards = [] 173 | black_cards = [] 174 | white_cards = [] 175 | multi_cards = [] 176 | colorless_cards = [] 177 | lands = [] 178 | for card in card_set: 179 | if len(card.get_colors())>1: 180 | multi_cards += [card] 181 | continue 182 | if 'R' in card.get_colors(): 183 | red_cards += [card] 184 | continue 185 | elif 'U' in card.get_colors(): 186 | blue_cards += [card] 187 | continue 188 | elif 'B' in card.get_colors(): 189 | black_cards += [card] 190 | continue 191 | elif 'G' in card.get_colors(): 192 | green_cards += [card] 193 | continue 194 | elif 'W' in card.get_colors(): 195 | white_cards += [card] 196 | continue 197 | else: 198 | if "land" in card.get_types(): 199 | lands += [card] 200 | continue 201 | colorless_cards += [card] 202 | return[white_cards, blue_cards, black_cards, red_cards, green_cards, multi_cards, colorless_cards, lands] 203 | 204 | def sort_type(card_set): 205 | sorting = ["creature", "enchantment", "instant", "sorcery", "artifact", "planeswalker"] 206 | sorted_cards = [[],[],[],[],[],[],[]] 207 | sorted_set = [] 208 | for card in card_set: 209 | types = card.get_types() 210 | for i in range(len(sorting)): 211 | if sorting[i] in types: 212 | sorted_cards[i] += [card] 213 | break 214 | else: 215 | sorted_cards[6] += [card] 216 | for value in sorted_cards: 217 | for card in value: 218 | sorted_set += [card] 219 | return sorted_set 220 | 221 | 222 | 223 | def sort_cmc(card_set): 224 | sorted_cards = [] 225 | sorted_set = [] 226 | for card in card_set: 227 | # make sure there is an empty set for each CMC 228 | while len(sorted_cards)-1 < card.get_cmc(): 229 | sorted_cards += [[]] 230 | # add card to correct set of CMC values 231 | sorted_cards[card.get_cmc()] += [card] 232 | # combine each set of CMC valued cards together 233 | for value in sorted_cards: 234 | for card in value: 235 | sorted_set += [card] 236 | return sorted_set 237 | 238 | 239 | if oname: 240 | if for_html: 241 | print oname 242 | # if ('.html' != oname[-]) 243 | # oname += '.html' 244 | if verbose: 245 | print 'Writing output to: ' + oname 246 | with open(oname, 'w') as ofile: 247 | writecards(ofile) 248 | if for_mse: 249 | # Copy whatever output file is produced, name the copy 'set' (yes, no extension). 250 | if os.path.isfile('set'): 251 | print 'ERROR: tried to overwrite existing file "set" - aborting.' 252 | return 253 | shutil.copyfile(oname, 'set') 254 | # Use the freaky mse extension instead of zip. 255 | with zipfile.ZipFile(oname+'.mse-set', mode='w') as zf: 256 | try: 257 | # Zip up the set file into oname.mse-set. 258 | zf.write('set') 259 | finally: 260 | if verbose: 261 | print 'Made an MSE set file called ' + oname + '.mse-set.' 262 | # The set file is useless outside the .mse-set, delete it. 263 | os.remove('set') 264 | else: 265 | writecards(sys.stdout) 266 | sys.stdout.flush() 267 | 268 | 269 | if __name__ == '__main__': 270 | import argparse 271 | parser = argparse.ArgumentParser() 272 | 273 | parser.add_argument('infile', #nargs='?'. 
default=None,
274 | help='encoded card file or json corpus to decode')
275 | parser.add_argument('outfile', nargs='?', default=None,
276 | help='output file, defaults to stdout')
277 | parser.add_argument('-e', '--encoding', default='std', choices=utils.formats,
278 | #help='{' + ','.join(formats) + '}',
279 | help='encoding format to use',
280 | )
281 | parser.add_argument('-g', '--gatherer', action='store_true',
282 | help='emulate Gatherer visual spoiler')
283 | parser.add_argument('-f', '--forum', action='store_true',
284 | help='use pretty mana encoding for mtgsalvation forum')
285 | parser.add_argument('-c', '--creativity', action='store_true',
286 | help='use CBOW fuzzy matching to check creativity of cards')
287 | parser.add_argument('-d', '--dump', action='store_true',
288 | help='dump out lots of information about invalid cards')
289 | parser.add_argument('-v', '--verbose', action='store_true',
290 | help='verbose output')
291 | parser.add_argument('-mse', '--mse', action='store_true',
292 | help='use Magic Set Editor 2 encoding; will output as .mse-set file')
293 | parser.add_argument('-html', '--html', action='store_true', help='create a .html file with pretty forum formatting')
294 |
295 | args = parser.parse_args()
296 |
297 | main(args.infile, args.outfile, verbose = args.verbose, encoding = args.encoding,
298 | gatherer = args.gatherer, for_forum = args.forum, for_mse = args.mse,
299 | creativity = args.creativity, vdump = args.dump, for_html = args.html)
300 |
301 | exit(0)
302 |
--------------------------------------------------------------------------------
/encode.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | import sys
3 | import os
4 |
5 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'lib')
6 | sys.path.append(libdir)
7 | import re
8 | import random
9 | import utils
10 | import jdecode
11 | import cardlib
12 |
13 | def main(fname, oname = None, verbose = True, encoding = 'std',
14 | nolinetrans = False, randomize = False, nolabel = False, stable = False):
15 | fmt_ordered = cardlib.fmt_ordered_default
16 | fmt_labeled = None if nolabel else cardlib.fmt_labeled_default
17 | fieldsep = utils.fieldsep
18 | line_transformations = not nolinetrans
19 | randomize_fields = False
20 | randomize_mana = randomize
21 | initial_sep = True
22 | final_sep = True
23 |
24 | # set the properties of the encoding
25 |
26 | if encoding in ['std']:
27 | pass
28 | elif encoding in ['named']:
29 | fmt_ordered = cardlib.fmt_ordered_named
30 | elif encoding in ['noname']:
31 | fmt_ordered = cardlib.fmt_ordered_noname
32 | elif encoding in ['rfields']:
33 | randomize_fields = True
34 | final_sep = False
35 | elif encoding in ['old']:
36 | fmt_ordered = cardlib.fmt_ordered_old
37 | elif encoding in ['norarity']:
38 | fmt_ordered = cardlib.fmt_ordered_norarity
39 | elif encoding in ['vec']:
40 | pass
41 | elif encoding in ['custom']:
42 | ## put custom format decisions here ##########################
43 |
44 | ## end of custom format ######################################
45 | pass
46 | else:
47 | raise ValueError('encode.py: unknown encoding: ' + encoding)
48 |
49 | if verbose:
50 | print 'Preparing to encode:'
51 | print ' Using encoding ' + repr(encoding)
52 | if stable:
53 | print ' NOT randomizing order of cards.'
54 | if randomize_mana:
55 | print ' Randomizing order of symbols in mana costs.'
56 | if not fmt_labeled:
57 | print ' NOT labeling fields for this run (may be harder to decode).'
58 | if not line_transformations: 59 | print ' NOT using line reordering transformations' 60 | 61 | cards = jdecode.mtg_open_file(fname, verbose=verbose, linetrans=line_transformations) 62 | 63 | # This should give a random but consistent ordering, to make comparing changes 64 | # between the output of different versions easier. 65 | if not stable: 66 | random.seed(1371367) 67 | random.shuffle(cards) 68 | 69 | def writecards(writer): 70 | for card in cards: 71 | if encoding in ['vec']: 72 | writer.write(card.vectorize() + '\n\n') 73 | else: 74 | writer.write(card.encode(fmt_ordered = fmt_ordered, 75 | fmt_labeled = fmt_labeled, 76 | fieldsep = fieldsep, 77 | randomize_fields = randomize_fields, 78 | randomize_mana = randomize_mana, 79 | initial_sep = initial_sep, 80 | final_sep = final_sep) 81 | + utils.cardsep) 82 | 83 | if oname: 84 | if verbose: 85 | print 'Writing output to: ' + oname 86 | with open(oname, 'w') as ofile: 87 | writecards(ofile) 88 | else: 89 | writecards(sys.stdout) 90 | sys.stdout.flush() 91 | 92 | 93 | if __name__ == '__main__': 94 | import argparse 95 | parser = argparse.ArgumentParser() 96 | 97 | parser.add_argument('infile', 98 | help='encoded card file or json corpus to encode') 99 | parser.add_argument('outfile', nargs='?', default=None, 100 | help='output file, defaults to stdout') 101 | parser.add_argument('-e', '--encoding', default='std', choices=utils.formats, 102 | #help='{' + ','.join(formats) + '}', 103 | help='encoding format to use', 104 | ) 105 | parser.add_argument('-r', '--randomize', action='store_true', 106 | help='randomize the order of symbols in mana costs') 107 | parser.add_argument('--nolinetrans', action='store_true', 108 | help="don't reorder lines of card text") 109 | parser.add_argument('--nolabel', action='store_true', 110 | help="don't label fields") 111 | parser.add_argument('-s', '--stable', action='store_true', 112 | help="don't randomize the order of the cards") 113 | parser.add_argument('-v', '--verbose', action='store_true', 114 | help='verbose output') 115 | 116 | args = parser.parse_args() 117 | main(args.infile, args.outfile, verbose = args.verbose, encoding = args.encoding, 118 | nolinetrans = args.nolinetrans, randomize = args.randomize, nolabel = args.nolabel, 119 | stable = args.stable) 120 | exit(0) 121 | -------------------------------------------------------------------------------- /lib/cbow.py: -------------------------------------------------------------------------------- 1 | # Infinite thanks to Talcos from the mtgsalvation forums, who among 2 | # many, many other things wrote the original version of this code. 3 | # I have merely ported it to fit my needs. 4 | 5 | import re 6 | import sys 7 | import subprocess 8 | import os 9 | import struct 10 | import math 11 | import multiprocessing 12 | 13 | import utils 14 | import cardlib 15 | import transforms 16 | import namediff 17 | 18 | libdir = os.path.dirname(os.path.realpath(__file__)) 19 | datadir = os.path.realpath(os.path.join(libdir, '../data')) 20 | 21 | # multithreading control parameters 22 | cores = multiprocessing.cpu_count() 23 | 24 | # max length of vocabulary entries 25 | max_w = 50 26 | 27 | 28 | #### snip! 
#### 29 | 30 | def read_vector_file(fname): 31 | with open(fname, 'rb') as f: 32 | words = int(f.read(4)) 33 | size = int(f.read(4)) 34 | vocab = [' '] * (words * max_w) 35 | M = [] 36 | for b in range(0,words): 37 | a = 0 38 | while True: 39 | c = f.read(1) 40 | vocab[b * max_w + a] = c; 41 | if len(c) == 0 or c == ' ': 42 | break 43 | if (a < max_w) and vocab[b * max_w + a] != '\n': 44 | a += 1 45 | tmp = list(struct.unpack('f'*size,f.read(4 * size))) 46 | length = math.sqrt(sum([tmp[i] * tmp[i] for i in range(0,len(tmp))])) 47 | for i in range(0,len(tmp)): 48 | tmp[i] /= length 49 | M.append(tmp) 50 | return ((''.join(vocab)).split(),M) 51 | 52 | def makevector(vocabulary,vecs,sequence): 53 | words = sequence.split() 54 | indices = [] 55 | for word in words: 56 | if word not in vocabulary: 57 | #print("Missing word in vocabulary: " + word) 58 | continue 59 | #return [0.0]*len(vecs[0]) 60 | indices.append(vocabulary.index(word)) 61 | #res = map(sum,[vecs[i] for i in indices]) 62 | res = None 63 | for v in [vecs[i] for i in indices]: 64 | if res == None: 65 | res = v 66 | else: 67 | res = [x + y for x, y in zip(res,v)] 68 | 69 | # bad things happen if we have a vector of only unknown words 70 | if res is None: 71 | return [0.0]*len(vecs[0]) 72 | 73 | length = math.sqrt(sum([res[i] * res[i] for i in range(0,len(res))])) 74 | for i in range(0,len(res)): 75 | res[i] /= length 76 | return res 77 | 78 | #### !snip #### 79 | 80 | 81 | try: 82 | import numpy 83 | def cosine_similarity(v1,v2): 84 | A = numpy.array([v1,v2]) 85 | 86 | # from http://stackoverflow.com/questions/17627219/whats-the-fastest-way-in-python-to-calculate-cosine-similarity-given-sparse-mat 87 | 88 | # base similarity matrix (all dot products) 89 | # replace this with A.dot(A.T).todense() for sparse representation 90 | similarity = numpy.dot(A, A.T) 91 | 92 | # squared magnitude of preference vectors (number of occurrences) 93 | square_mag = numpy.diag(similarity) 94 | 95 | # inverse squared magnitude 96 | inv_square_mag = 1 / square_mag 97 | 98 | # if it doesn't occur, set it's inverse magnitude to zero (instead of inf) 99 | inv_square_mag[numpy.isinf(inv_square_mag)] = 0 100 | 101 | # inverse of the magnitude 102 | inv_mag = numpy.sqrt(inv_square_mag) 103 | 104 | # cosine similarity (elementwise multiply by inverse magnitudes) 105 | cosine = similarity * inv_mag 106 | cosine = cosine.T * inv_mag 107 | 108 | return cosine[0][1] 109 | 110 | except ImportError: 111 | def cosine_similarity(v1,v2): 112 | #compute cosine similarity of v1 to v2: (v1 dot v1)/{||v1||*||v2||) 113 | sumxx, sumxy, sumyy = 0, 0, 0 114 | for i in range(len(v1)): 115 | x = v1[i]; y = v2[i] 116 | sumxx += x*x 117 | sumyy += y*y 118 | sumxy += x*y 119 | return sumxy/math.sqrt(sumxx*sumyy) 120 | 121 | def cosine_similarity_name(cardvec, v, name): 122 | return (cosine_similarity(cardvec, v), name) 123 | 124 | # we need to put the logic in a regular function (as opposed to a method of an object) 125 | # so that we can pass the function to multiprocessing 126 | def f_nearest(card, vocab, vecs, cardvecs, n): 127 | if isinstance(card, cardlib.Card): 128 | words = card.vectorize().split('\n\n')[0] 129 | else: 130 | # assume it's a string (that's already a vector) 131 | words = card 132 | 133 | if not words: 134 | return [] 135 | 136 | cardvec = makevector(vocab, vecs, words) 137 | 138 | comparisons = [cosine_similarity_name(cardvec, v, name) for (name, v) in cardvecs] 139 | 140 | comparisons.sort(reverse = True) 141 | comp_n = comparisons[:n] 142 | 143 | if 
isinstance(card, cardlib.Card) and card.bside: 144 | comp_n += f_nearest(card.bside, vocab, vecs, cardvecs, n=n) 145 | 146 | return comp_n 147 | 148 | def f_nearest_per_thread(workitem): 149 | (workcards, vocab, vecs, cardvecs, n) = workitem 150 | return map(lambda card: f_nearest(card, vocab, vecs, cardvecs, n), workcards) 151 | 152 | class CBOW: 153 | def __init__(self, verbose = True, 154 | vector_fname = os.path.join(datadir, 'cbow.bin'), 155 | card_fname = os.path.join(datadir, 'output.txt')): 156 | self.verbose = verbose 157 | self.cardvecs = [] 158 | 159 | if self.verbose: 160 | print 'Building a cbow model...' 161 | 162 | if self.verbose: 163 | print ' Reading binary vector data from: ' + vector_fname 164 | (vocab, vecs) = read_vector_file(vector_fname) 165 | self.vocab = vocab 166 | self.vecs = vecs 167 | 168 | if self.verbose: 169 | print ' Reading encoded cards from: ' + card_fname 170 | print ' They\'d better be in the same order as the file used to build the vector model!' 171 | with open(card_fname, 'rt') as f: 172 | text = f.read() 173 | for card_src in text.split(utils.cardsep): 174 | if card_src: 175 | card = cardlib.Card(card_src) 176 | name = card.name 177 | self.cardvecs += [(name, makevector(self.vocab, 178 | self.vecs, 179 | card.vectorize()))] 180 | 181 | if self.verbose: 182 | print '... Done.' 183 | print ' vocab size: ' + str(len(self.vocab)) 184 | print ' raw vecs: ' + str(len(self.vecs)) 185 | print ' card vecs: ' + str(len(self.cardvecs)) 186 | 187 | def nearest(self, card, n=5): 188 | return f_nearest(card, self.vocab, self.vecs, self.cardvecs, n) 189 | 190 | def nearest_par(self, cards, n=5, threads=cores): 191 | workpool = multiprocessing.Pool(threads) 192 | proto_worklist = namediff.list_split(cards, threads) 193 | worklist = map(lambda x: (x, self.vocab, self.vecs, self.cardvecs, n), proto_worklist) 194 | donelist = workpool.map(f_nearest_per_thread, worklist) 195 | return namediff.list_flatten(donelist) 196 | -------------------------------------------------------------------------------- /lib/config.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | # Utilities for handling unicode, unary numbers, mana costs, and special symbols. 4 | # For convenience we redefine everything from utils so that it can all be accessed 5 | # from the utils module. 
6 |
7 | # separators
8 | cardsep = '\n\n'
9 | fieldsep = '|'
10 | bsidesep = '\n'
11 | newline = '\\'
12 |
13 | # special indicators
14 | dash_marker = '~'
15 | bullet_marker = '='
16 | this_marker = '@'
17 | counter_marker = '%'
18 | reserved_marker = '\v'
19 | reserved_mana_marker = '$'
20 | choice_open_delimiter = '['
21 | choice_close_delimiter = ']'
22 | x_marker = 'X'
23 | tap_marker = 'T'
24 | untap_marker = 'Q'
25 | # second letter of the word
26 | rarity_common_marker = 'O'
27 | rarity_uncommon_marker = 'N'
28 | rarity_rare_marker = 'A'
29 | rarity_mythic_marker = 'Y'
30 | # with some crazy exceptions
31 | rarity_special_marker = 'E'
32 | rarity_basic_land_marker = 'L'
33 |
34 | # unambiguous synonyms
35 | counter_rename = 'uncast'
36 |
37 | # unary numbers
38 | unary_marker = '&'
39 | unary_counter = '^'
40 | unary_max = 20
41 | unary_exceptions = {
42 | 25 : 'twenty' + dash_marker + 'five',
43 | 30 : 'thirty',
44 | 40 : 'forty',
45 | 50 : 'fifty',
46 | 100: 'one hundred',
47 | 200: 'two hundred',
48 | }
49 |
50 | # field labels, to allow potential reordering of card format
51 | field_label_name = '1'
52 | field_label_rarity = '0' # 2 is part of some mana symbols {2/B} ...
53 | field_label_cost = '3'
54 | field_label_supertypes = '4'
55 | field_label_types = '5'
56 | field_label_subtypes = '6'
57 | field_label_loyalty = '7'
58 | field_label_pt = '8'
59 | field_label_text = '9'
60 |
61 | # additional fields we add to the json cards
62 | json_field_bside = 'bside'
63 | json_field_set_name = 'setName'
64 | json_field_info_code = 'magicCardsInfoCode'
65 |
--------------------------------------------------------------------------------
/lib/datalib.py:
--------------------------------------------------------------------------------
1 | import re
2 |
3 | import utils
4 | from cardlib import Card
5 |
6 | # Format a list of rows of data into nice columns.
7 | # Note that it's the columns that are nice, not this code.
8 | def padrows(l):
9 | # get length for each field
10 | lens = []
11 | for ll in l:
12 | for i, field in enumerate(ll):
13 | if i < len(lens):
14 | lens[i] = max(len(str(field)), lens[i])
15 | else:
16 | lens += [len(str(field))]
17 | # now pad out to that length
18 | padded = []
19 | for ll in l:
20 | padded += ['']
21 | for i, field in enumerate(ll):
22 | s = str(field)
23 | pad = ' ' * (lens[i] - len(s))
24 | padded[-1] += (s + pad + ' ')
25 | return padded
26 | def printrows(l):
27 | for row in l:
28 | print row
29 |
30 | # index management helpers
31 | def index_size(d):
32 | return sum(map(lambda k: len(d[k]), d))
33 |
34 | def inc(d, k, obj):
35 | if k or k == 0:
36 | if k in d:
37 | d[k] += obj
38 | else:
39 | d[k] = obj
40 |
41 | # thanks gleemax
42 | def plimit(s, mlen = 1000):
43 | if len(s) > mlen:
44 | return s[:mlen] + '[...]'
45 | else:
46 | return s
47 |
48 | class Datamine:
49 | # build the global indices
50 | def __init__(self, card_srcs):
51 | # global card pools
52 | self.unparsed_cards = []
53 | self.invalid_cards = []
54 | self.cards = []
55 | self.allcards = []
56 |
57 | # global indices
58 | self.by_name = {}
59 | self.by_type = {}
60 | self.by_type_inclusive = {}
61 | self.by_supertype = {}
62 | self.by_supertype_inclusive = {}
63 | self.by_subtype = {}
64 | self.by_subtype_inclusive = {}
65 | self.by_color = {}
66 | self.by_color_inclusive = {}
67 | self.by_color_count = {}
68 | self.by_cmc = {}
69 | self.by_cost = {}
70 | self.by_power = {}
71 | self.by_toughness = {}
72 | self.by_pt = {}
73 | self.by_loyalty = {}
74 | self.by_textlines = {}
75 | self.by_textlen = {}
76 |
77 | self.indices = {
78 | 'by_name' : self.by_name,
79 | 'by_type' : self.by_type,
80 | 'by_type_inclusive' : self.by_type_inclusive,
81 | 'by_supertype' : self.by_supertype,
82 | 'by_supertype_inclusive' : self.by_supertype_inclusive,
83 | 'by_subtype' : self.by_subtype,
84 | 'by_subtype_inclusive' : self.by_subtype_inclusive,
85 | 'by_color' : self.by_color,
86 | 'by_color_inclusive' : self.by_color_inclusive,
87 | 'by_color_count' : self.by_color_count,
88 | 'by_cmc' : self.by_cmc,
89 | 'by_cost' : self.by_cost,
90 | 'by_power' : self.by_power,
91 | 'by_toughness' : self.by_toughness,
92 | 'by_pt' : self.by_pt,
93 | 'by_loyalty' : self.by_loyalty,
94 | 'by_textlines' : self.by_textlines,
95 | 'by_textlen' : self.by_textlen,
96 | }
97 |
98 | for card_src in card_srcs:
99 | # the empty card is not interesting
100 | if not card_src:
101 | continue
102 | card = Card(card_src)
103 | if card.valid:
104 | self.cards += [card]
105 | self.allcards += [card]
106 | elif card.parsed:
107 | self.invalid_cards += [card]
108 | self.allcards += [card]
109 | else:
110 | self.unparsed_cards += [card]
111 |
112 | if card.parsed:
113 | inc(self.by_name, card.name, [card])
114 |
115 | inc(self.by_type, ' '.join(card.types), [card])
116 | for t in card.types:
117 | inc(self.by_type_inclusive, t, [card])
118 | inc(self.by_supertype, ' '.join(card.supertypes), [card])
119 | for t in card.supertypes:
120 | inc(self.by_supertype_inclusive, t, [card])
121 | inc(self.by_subtype, ' '.join(card.subtypes), [card])
122 | for t in card.subtypes:
123 | inc(self.by_subtype_inclusive, t, [card])
124 |
125 | if card.cost.colors:
126 | inc(self.by_color, card.cost.colors, [card])
127 | for c in card.cost.colors:
128 | inc(self.by_color_inclusive, c, [card])
129 | inc(self.by_color_count, len(card.cost.colors), [card])
130 | else:
131 | # colorless, still want to include in these tables
132 | inc(self.by_color, 'A', [card])
133
| inc(self.by_color_inclusive, 'A', [card]) 134 | inc(self.by_color_count, 0, [card]) 135 | 136 | inc(self.by_cmc, card.cost.cmc, [card]) 137 | inc(self.by_cost, card.cost.encode() if card.cost.encode() else 'none', [card]) 138 | 139 | inc(self.by_power, card.pt_p, [card]) 140 | inc(self.by_toughness, card.pt_t, [card]) 141 | inc(self.by_pt, card.pt, [card]) 142 | 143 | inc(self.by_loyalty, card.loyalty, [card]) 144 | 145 | inc(self.by_textlines, len(card.text_lines), [card]) 146 | inc(self.by_textlen, len(card.text.encode()), [card]) 147 | 148 | # summarize the indices 149 | # Yes, this printing code is pretty terrible. 150 | def summarize(self, hsize = 10, vsize = 10, cmcsize = 20): 151 | print '====================' 152 | print str(len(self.cards)) + ' valid cards, ' + str(len(self.invalid_cards)) + ' invalid cards.' 153 | print str(len(self.allcards)) + ' cards parsed, ' + str(len(self.unparsed_cards)) + ' failed to parse' 154 | print '--------------------' 155 | print str(len(self.by_name)) + ' unique card names' 156 | print '--------------------' 157 | print (str(len(self.by_color_inclusive)) + ' represented colors (including colorless as \'A\'), ' 158 | + str(len(self.by_color)) + ' combinations') 159 | print 'Breakdown by color:' 160 | rows = [self.by_color_inclusive.keys()] 161 | rows += [[len(self.by_color_inclusive[k]) for k in rows[0]]] 162 | printrows(padrows(rows)) 163 | print 'Breakdown by number of colors:' 164 | rows = [self.by_color_count.keys()] 165 | rows += [[len(self.by_color_count[k]) for k in rows[0]]] 166 | printrows(padrows(rows)) 167 | print '--------------------' 168 | print str(len(self.by_type_inclusive)) + ' unique card types, ' + str(len(self.by_type)) + ' combinations' 169 | print 'Breakdown by type:' 170 | d = sorted(self.by_type_inclusive, 171 | lambda x,y: cmp(len(self.by_type_inclusive[x]), len(self.by_type_inclusive[y])), 172 | reverse = True) 173 | rows = [[k for k in d[:hsize]]] 174 | rows += [[len(self.by_type_inclusive[k]) for k in rows[0]]] 175 | printrows(padrows(rows)) 176 | print '--------------------' 177 | print (str(len(self.by_subtype_inclusive)) + ' unique subtypes, ' 178 | + str(len(self.by_subtype)) + ' combinations') 179 | print '-- Popular subtypes: --' 180 | d = sorted(self.by_subtype_inclusive, 181 | lambda x,y: cmp(len(self.by_subtype_inclusive[x]), len(self.by_subtype_inclusive[y])), 182 | reverse = True) 183 | rows = [] 184 | for k in d[0:vsize]: 185 | rows += [[k, len(self.by_subtype_inclusive[k])]] 186 | printrows(padrows(rows)) 187 | print '-- Top combinations: --' 188 | d = sorted(self.by_subtype, 189 | lambda x,y: cmp(len(self.by_subtype[x]), len(self.by_subtype[y])), 190 | reverse = True) 191 | rows = [] 192 | for k in d[0:vsize]: 193 | rows += [[k, len(self.by_subtype[k])]] 194 | printrows(padrows(rows)) 195 | print '--------------------' 196 | print (str(len(self.by_supertype_inclusive)) + ' unique supertypes, ' 197 | + str(len(self.by_supertype)) + ' combinations') 198 | print 'Breakdown by supertype:' 199 | d = sorted(self.by_supertype_inclusive, 200 | lambda x,y: cmp(len(self.by_supertype_inclusive[x]),len(self.by_supertype_inclusive[y])), 201 | reverse = True) 202 | rows = [[k for k in d[:hsize]]] 203 | rows += [[len(self.by_supertype_inclusive[k]) for k in rows[0]]] 204 | printrows(padrows(rows)) 205 | print '--------------------' 206 | print str(len(self.by_cmc)) + ' different CMCs, ' + str(len(self.by_cost)) + ' unique mana costs' 207 | print 'Breakdown by CMC:' 208 | d = sorted(self.by_cmc, reverse = False) 209 | 
rows = [[k for k in d[:cmcsize]]] 210 | rows += [[len(self.by_cmc[k]) for k in rows[0]]] 211 | printrows(padrows(rows)) 212 | print '-- Popular mana costs: --' 213 | d = sorted(self.by_cost, 214 | lambda x,y: cmp(len(self.by_cost[x]), len(self.by_cost[y])), 215 | reverse = True) 216 | rows = [] 217 | for k in d[0:vsize]: 218 | rows += [[utils.from_mana(k), len(self.by_cost[k])]] 219 | printrows(padrows(rows)) 220 | print '--------------------' 221 | print str(len(self.by_pt)) + ' unique p/t combinations' 222 | if len(self.by_power) > 0 and len(self.by_toughness) > 0: 223 | print ('Largest power: ' + str(max(map(len, self.by_power)) - 1) + 224 | ', largest toughness: ' + str(max(map(len, self.by_toughness)) - 1)) 225 | print '-- Popular p/t values: --' 226 | d = sorted(self.by_pt, 227 | lambda x,y: cmp(len(self.by_pt[x]), len(self.by_pt[y])), 228 | reverse = True) 229 | rows = [] 230 | for k in d[0:vsize]: 231 | rows += [[utils.from_unary(k), len(self.by_pt[k])]] 232 | printrows(padrows(rows)) 233 | print '--------------------' 234 | print 'Loyalty values:' 235 | d = sorted(self.by_loyalty, 236 | lambda x,y: cmp(len(self.by_loyalty[x]), len(self.by_loyalty[y])), 237 | reverse = True) 238 | rows = [] 239 | for k in d[0:vsize]: 240 | rows += [[utils.from_unary(k), len(self.by_loyalty[k])]] 241 | printrows(padrows(rows)) 242 | print '--------------------' 243 | if len(self.by_textlen) > 0 and len(self.by_textlines) > 0: 244 | print('Card text ranges from ' + str(min(self.by_textlen)) + ' to ' 245 | + str(max(self.by_textlen)) + ' characters in length') 246 | print('Card text ranges from ' + str(min(self.by_textlines)) + ' to ' 247 | + str(max(self.by_textlines)) + ' lines') 248 | print '-- Line counts by frequency: --' 249 | d = sorted(self.by_textlines, 250 | lambda x,y: cmp(len(self.by_textlines[x]), len(self.by_textlines[y])), 251 | reverse = True) 252 | rows = [] 253 | for k in d[0:vsize]: 254 | rows += [[k, len(self.by_textlines[k])]] 255 | printrows(padrows(rows)) 256 | print '====================' 257 | 258 | 259 | # describe outliers in the indices 260 | def outliers(self, hsize = 10, vsize = 10, dump_invalid = False): 261 | print '********************' 262 | print 'Overview of indices:' 263 | rows = [['Index Name', 'Keys', 'Total Members']] 264 | for index in self.indices: 265 | rows += [[index, len(self.indices[index]), index_size(self.indices[index])]] 266 | printrows(padrows(rows)) 267 | print '********************' 268 | if len(self.by_name) > 0: 269 | scardname = sorted(self.by_name, 270 | lambda x,y: cmp(len(x), len(y)), 271 | reverse = False)[0] 272 | print 'Shortest Cardname: (' + str(len(scardname)) + ')' 273 | print ' ' + scardname 274 | lcardname = sorted(self.by_name, 275 | lambda x,y: cmp(len(x), len(y)), 276 | reverse = True)[0] 277 | print 'Longest Cardname: (' + str(len(lcardname)) + ')' 278 | print ' ' + lcardname 279 | d = sorted(self.by_name, 280 | lambda x,y: cmp(len(self.by_name[x]), len(self.by_name[y])), 281 | reverse = True) 282 | rows = [] 283 | for k in d[0:vsize]: 284 | if len(self.by_name[k]) > 1: 285 | rows += [[k, len(self.by_name[k])]] 286 | if rows == []: 287 | print('No duplicated cardnames') 288 | else: 289 | print '-- Most duplicated names: --' 290 | printrows(padrows(rows)) 291 | else: 292 | print 'No cards indexed by name?' 
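        # [editor's note: clarifying aside, not part of the original source]
        # the frequency sorts in this file all use the python 2 cmp-function
        # form seen above; an equivalent, simpler spelling of the same sort is:
        #   d = sorted(self.by_name, key = lambda k: len(self.by_name[k]), reverse = True)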
293 | print '--------------------' 294 | if len(self.by_type) > 0: 295 | ltypes = sorted(self.by_type, 296 | lambda x,y: cmp(len(x), len(y)), 297 | reverse = True)[0] 298 | print 'Longest card type: (' + str(len(ltypes)) + ')' 299 | print ' ' + ltypes 300 | else: 301 | print 'No cards indexed by type?' 302 | if len(self.by_subtype) > 0: 303 | lsubtypes = sorted(self.by_subtype, 304 | lambda x,y: cmp(len(x), len(y)), 305 | reverse = True)[0] 306 | print 'Longest subtype: (' + str(len(lsubtypes)) + ')' 307 | print ' ' + lsubtypes 308 | else: 309 | print 'No cards indexed by subtype?' 310 | if len(self.by_supertype) > 0: 311 | lsupertypes = sorted(self.by_supertype, 312 | lambda x,y: cmp(len(x), len(y)), 313 | reverse = True)[0] 314 | print 'Longest supertype: (' + str(len(lsupertypes)) + ')' 315 | print ' ' + lsupertypes 316 | else: 317 | print 'No cards indexed by supertype?' 318 | print '--------------------' 319 | if len(self.by_cost) > 0: 320 | lcost = sorted(self.by_cost, 321 | lambda x,y: cmp(len(x), len(y)), 322 | reverse = True)[0] 323 | print 'Longest mana cost: (' + str(len(lcost)) + ')' 324 | print ' ' + utils.from_mana(lcost) 325 | print '\n' + plimit(self.by_cost[lcost][0].encode()) + '\n' 326 | else: 327 | print 'No cards indexed by cost?' 328 | if len(self.by_cmc) > 0: 329 | lcmc = sorted(self.by_cmc, reverse = True)[0] 330 | print 'Largest cmc: (' + str(lcmc) + ')' 331 | print ' ' + str(self.by_cmc[lcmc][0].cost) 332 | print '\n' + plimit(self.by_cmc[lcmc][0].encode()) 333 | else: 334 | print 'No cards indexed by cmc?' 335 | print '--------------------' 336 | if len(self.by_power) > 0: 337 | lpower = sorted(self.by_power, 338 | lambda x,y: cmp(len(x), len(y)), 339 | reverse = True)[0] 340 | print 'Largest creature power: ' + utils.from_unary(lpower) 341 | print '\n' + plimit(self.by_power[lpower][0].encode()) + '\n' 342 | else: 343 | print 'No cards indexed by power?' 344 | if len(self.by_toughness) > 0: 345 | ltoughness = sorted(self.by_toughness, 346 | lambda x,y: cmp(len(x), len(y)), 347 | reverse = True)[0] 348 | print 'Largest creature toughness: ' + utils.from_unary(ltoughness) 349 | print '\n' + plimit(self.by_toughness[ltoughness][0].encode()) 350 | else: 351 | print 'No cards indexed by toughness?' 352 | print '--------------------' 353 | if len(self.by_textlines) > 0: 354 | llines = sorted(self.by_textlines, reverse = True)[0] 355 | print 'Most lines of text in a card: ' + str(llines) 356 | print '\n' + plimit(self.by_textlines[llines][0].encode()) + '\n' 357 | else: 358 | print 'No cards indexed by line count?' 359 | if len(self.by_textlen) > 0: 360 | ltext = sorted(self.by_textlen, reverse = True)[0] 361 | print 'Most chars in a card text: ' + str(ltext) 362 | print '\n' + plimit(self.by_textlen[ltext][0].encode()) 363 | else: 364 | print 'No cards indexed by char count?' 365 | print '--------------------' 366 | print 'There were ' + str(len(self.invalid_cards)) + ' invalid cards.' 367 | if dump_invalid: 368 | for card in self.invalid_cards: 369 | print '\n' + repr(card.fields) 370 | elif len(self.invalid_cards) > 0: 371 | print 'Not summarizing.' 372 | print '--------------------' 373 | print 'There were ' + str(len(self.unparsed_cards)) + ' unparsed cards.' 374 | if dump_invalid: 375 | for card in self.unparsed_cards: 376 | print '\n' + repr(card.fields) 377 | elif len(self.unparsed_cards) > 0: 378 | print 'Not summarizing.' 
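        # [editor's note: illustrative usage sketch, not part of the original
        #  source; assumes data/output.txt is an encoded dump made by encode.py]
        #   import utils
        #   with open('data/output.txt', 'rt') as f:
        #       mine = Datamine(f.read().split(utils.cardsep))
        #   mine.summarize()
        #   mine.outliers(dump_invalid = False)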
379 | print '====================' 380 | -------------------------------------------------------------------------------- /lib/jdecode.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | import utils 4 | import cardlib 5 | 6 | def mtg_open_json(fname, verbose = False): 7 | 8 | with open(fname, 'r') as f: 9 | jobj = json.load(f) 10 | 11 | allcards = {} 12 | asides = {} 13 | bsides = {} 14 | 15 | for k_set in jobj: 16 | set = jobj[k_set] 17 | setname = set['name'] 18 | if 'magicCardsInfoCode' in set: 19 | codename = set['magicCardsInfoCode'] 20 | else: 21 | codename = '' 22 | 23 | for card in set['cards']: 24 | card[utils.json_field_set_name] = setname 25 | card[utils.json_field_info_code] = codename 26 | 27 | cardnumber = None 28 | if 'number' in card: 29 | cardnumber = card['number'] 30 | # the lower avoids duplication of at least one card (Will-o/O'-the-Wisp) 31 | cardname = card['name'].lower() 32 | 33 | uid = set['code'] 34 | if cardnumber == None: 35 | uid = uid + '_' + cardname + '_' 36 | else: 37 | uid = uid + '_' + cardnumber 38 | 39 | # aggregate by name to avoid duplicates, not counting bsides 40 | if not uid[-1] == 'b': 41 | if cardname in allcards: 42 | allcards[cardname] += [card] 43 | else: 44 | allcards[cardname] = [card] 45 | 46 | # also aggregate aside cards by uid so we can add bsides later 47 | if uid[-1:] == 'a': 48 | asides[uid] = card 49 | if uid[-1:] == 'b': 50 | bsides[uid] = card 51 | 52 | for uid in bsides: 53 | aside_uid = uid[:-1] + 'a' 54 | if aside_uid in asides: 55 | # the second check handles the brothers yamazaki edge case 56 | if not asides[aside_uid]['name'] == bsides[uid]['name']: 57 | asides[aside_uid][utils.json_field_bside] = bsides[uid] 58 | else: 59 | pass 60 | # this exposes some coldsnap theme deck bsides that aren't 61 | # really bsides; shouldn't matter too much 62 | #print aside_uid 63 | #print bsides[uid] 64 | 65 | if verbose: 66 | print 'Opened ' + str(len(allcards)) + ' uniquely named cards.' 
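    # [editor's note: worked example, not part of the original source]
    # the uid scheme above is set code plus collector number when one exists,
    # else set code plus lowercased name; so a hypothetical flip card printed
    # as '42a'/'42b' in set 'XYZ' gets uids 'XYZ_42a' and 'XYZ_42b', and the
    # loop above hangs the b-side off its a-side under utils.json_field_bside.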
67 |     return allcards
68 | 
69 | # filters to ignore some undesirable cards, only used when opening json
70 | def default_exclude_sets(cardset):
71 |     return cardset == 'Unglued' or cardset == 'Unhinged' or cardset == 'Celebration'
72 | 
73 | def default_exclude_types(cardtype):
74 |     return cardtype in ['conspiracy']
75 | 
76 | def default_exclude_layouts(layout):
77 |     return layout in ['token', 'plane', 'scheme', 'phenomenon', 'vanguard']
78 | 
79 | # centralized logic for opening files of cards, either encoded or json
80 | def mtg_open_file(fname, verbose = False,
81 |                   linetrans = True, fmt_ordered = cardlib.fmt_ordered_default,
82 |                   exclude_sets = default_exclude_sets,
83 |                   exclude_types = default_exclude_types,
84 |                   exclude_layouts = default_exclude_layouts):
85 | 
86 |     cards = []
87 |     valid = 0
88 |     skipped = 0
89 |     invalid = 0
90 |     unparsed = 0
91 | 
92 |     if fname[-5:] == '.json':
93 |         if verbose:
94 |             print 'This looks like a json file: ' + fname
95 |         json_srcs = mtg_open_json(fname, verbose)
96 |         # sorted for stability
97 |         for json_cardname in sorted(json_srcs):
98 |             if len(json_srcs[json_cardname]) > 0:
99 |                 jcards = json_srcs[json_cardname]
100 | 
101 |                 # look for a normal rarity version, in a set we can use
102 |                 idx = 0
103 |                 card = cardlib.Card(jcards[idx], linetrans=linetrans)
104 |                 while (idx < len(jcards)
105 |                        and (card.rarity == utils.rarity_special_marker
106 |                             or exclude_sets(jcards[idx][utils.json_field_set_name]))):
107 |                     idx += 1
108 |                     if idx < len(jcards):
109 |                         card = cardlib.Card(jcards[idx], linetrans=linetrans)
110 |                 # if there isn't one, settle with index 0
111 |                 if idx >= len(jcards):
112 |                     idx = 0
113 |                     card = cardlib.Card(jcards[idx], linetrans=linetrans)
114 |                 # we could go back and look for a card satisfying one of the criteria,
115 |                 # but eh
116 | 
117 |                 skip = False
118 |                 if (exclude_sets(jcards[idx][utils.json_field_set_name])
119 |                     or exclude_layouts(jcards[idx]['layout'])):
120 |                     skip = True
121 |                 for cardtype in card.types:
122 |                     if exclude_types(cardtype):
123 |                         skip = True
124 |                 if skip:
125 |                     skipped += 1
126 |                     continue
127 | 
128 |                 if card.valid:
129 |                     valid += 1
130 |                     cards += [card]
131 |                 elif card.parsed:
132 |                     invalid += 1
133 |                     if verbose:
134 |                         print 'Invalid card: ' + json_cardname
135 |                 else:
136 |                     unparsed += 1
137 | 
138 |     # fall back to opening a normal encoded file
139 |     else:
140 |         if verbose:
141 |             print 'Opening encoded card file: ' + fname
142 |         with open(fname, 'rt') as f:
143 |             text = f.read()
144 |         for card_src in text.split(utils.cardsep):
145 |             if card_src:
146 |                 card = cardlib.Card(card_src, fmt_ordered=fmt_ordered)
147 |                 # unlike opening from json, we still want to return invalid cards
148 |                 cards += [card]
149 |                 if card.valid:
150 |                     valid += 1
151 |                 elif card.parsed:
152 |                     invalid += 1
153 |                     if verbose:
154 |                         print 'Invalid card: ' + card.name
155 |                 else:
156 |                     unparsed += 1
157 | 
158 |     if verbose:
159 |         print (str(valid) + ' valid, ' + str(skipped) + ' skipped, '
160 |                + str(invalid) + ' invalid, ' + str(unparsed) + ' failed to parse.')
161 | 
162 |     good_count = 0
163 |     bad_count = 0
164 |     for card in cards:
165 |         if not card.parsed and not card.text.text:
166 |             bad_count += 1
167 |         elif len(card.name) > 50 or len(card.rarity) > 3:
168 |             bad_count += 1
169 |         else:
170 |             good_count += 1
171 |         if good_count + bad_count > 15:
172 |             break
173 |     # random heuristic
174 |     if bad_count > 10:
175 |         print 'WARNING: Saw a bunch of unparsed cards:'
176 |         print '  If this is a legacy format, you may need to specify the field order.'
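    # [editor's note: illustrative usage, not part of the original source]
    #   cards = mtg_open_file('data/AllSets.json', verbose = True)
    #   cards = mtg_open_file('data/output.txt', fmt_ordered = cardlib.fmt_ordered_default)
    # the json path skips excluded sets / types / layouts entirely, while the
    # encoded path returns invalid cards too, as noted in the comments above.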
177 | 178 | return cards 179 | -------------------------------------------------------------------------------- /lib/manalib.py: -------------------------------------------------------------------------------- 1 | # representation for mana costs and text with embedded mana costs 2 | # data aggregating classes 3 | import re 4 | import random 5 | 6 | import utils 7 | 8 | class Manacost: 9 | '''mana cost representation with data''' 10 | 11 | # hardcoded to be dependent on the symbol structure... ah well 12 | def get_colors(self): 13 | colors = '' 14 | for sym in self.symbols: 15 | if self.symbols[sym] > 0: 16 | symcolors = re.sub(r'2|P|S|X|C', '', sym) 17 | for symcolor in symcolors: 18 | if symcolor not in colors: 19 | colors += symcolor 20 | # sort so the order is always consistent 21 | return ''.join(sorted(colors)) 22 | 23 | def check_colors(self, symbolstring): 24 | for sym in symbolstring: 25 | if not sym in self.colors: 26 | return False 27 | return True 28 | 29 | def __init__(self, src, fmt = ''): 30 | # source fields, exactly one will be set 31 | self.raw = None 32 | self.json = None 33 | # flags 34 | self.parsed = True 35 | self.valid = True 36 | self.none = False 37 | # default values for all fields 38 | self.inner = None 39 | self.cmc = 0 40 | self.colorless = 0 41 | self.sequence = [] 42 | self.symbols = {sym : 0 for sym in utils.mana_syms} 43 | self.allsymbols = {sym : 0 for sym in utils.mana_symall} 44 | self.colors = '' 45 | 46 | if fmt == 'json': 47 | self.json = src 48 | text = utils.mana_translate(self.json.upper()) 49 | else: 50 | self.raw = src 51 | text = self.raw 52 | 53 | if text == '': 54 | self.inner = '' 55 | self.none = True 56 | 57 | elif not (len(text) >= 2 and text[0] == '{' and text[-1] == '}'): 58 | self.parsed = False 59 | self.valid = False 60 | 61 | else: 62 | self.inner = text[1:-1] 63 | 64 | # structure mirrors the decoding in utils, but we pull out different data here 65 | idx = 0 66 | while idx < len(self.inner): 67 | # taking this branch is an infinite loop if unary_marker is empty 68 | if (len(utils.mana_unary_marker) > 0 and 69 | self.inner[idx:idx+len(utils.mana_unary_marker)] == utils.mana_unary_marker): 70 | idx += len(utils.mana_unary_marker) 71 | self.sequence += [utils.mana_unary_marker] 72 | elif self.inner[idx:idx+len(utils.mana_unary_counter)] == utils.mana_unary_counter: 73 | idx += len(utils.mana_unary_counter) 74 | self.sequence += [utils.mana_unary_counter] 75 | self.colorless += 1 76 | self.cmc += 1 77 | else: 78 | old_idx = idx 79 | for symlen in range(utils.mana_symlen_min, utils.mana_symlen_max + 1): 80 | encoded_sym = self.inner[idx:idx+symlen] 81 | if encoded_sym in utils.mana_symall_decode: 82 | idx += symlen 83 | # leave the sequence encoded for convenience 84 | self.sequence += [encoded_sym] 85 | sym = utils.mana_symall_decode[encoded_sym] 86 | self.allsymbols[sym] += 1 87 | if sym in utils.mana_symalt: 88 | self.symbols[utils.mana_alt(sym)] += 1 89 | else: 90 | self.symbols[sym] += 1 91 | if sym == utils.mana_X: 92 | self.cmc += 0 93 | elif utils.mana_2 in sym: 94 | self.cmc += 2 95 | else: 96 | self.cmc += 1 97 | break 98 | # otherwise we'll go into an infinite loop if we see a symbol we don't know 99 | if idx == old_idx: 100 | idx += 1 101 | self.valid = False 102 | 103 | self.colors = self.get_colors() 104 | 105 | def __str__(self): 106 | if self.none: 107 | return '_NOCOST_' 108 | return utils.mana_untranslate(utils.mana_open_delimiter + ''.join(self.sequence) 109 | + utils.mana_close_delimiter) 110 | 111 | def format(self, 
for_forum = False, for_html = False): 112 | if self.none: 113 | return '_NOCOST_' 114 | 115 | else: 116 | return utils.mana_untranslate(utils.mana_open_delimiter + ''.join(self.sequence) 117 | + utils.mana_close_delimiter, for_forum, for_html) 118 | 119 | def encode(self, randomize = False): 120 | if self.none: 121 | return '' 122 | elif randomize: 123 | # so this won't work very well if mana_unary_marker isn't empty 124 | return (utils.mana_open_delimiter 125 | + ''.join(random.sample(self.sequence, len(self.sequence))) 126 | + utils.mana_close_delimiter) 127 | else: 128 | return utils.mana_open_delimiter + ''.join(self.sequence) + utils.mana_close_delimiter 129 | 130 | def vectorize(self, delimit = False): 131 | if self.none: 132 | return '' 133 | elif delimit: 134 | ld = '(' 135 | rd = ')' 136 | else: 137 | ld = '' 138 | rd = '' 139 | return ' '.join(map(lambda s: ld + s + rd, sorted(self.sequence))) 140 | 141 | 142 | class Manatext: 143 | '''text representation with embedded mana costs''' 144 | 145 | def __init__(self, src, fmt = ''): 146 | # source fields 147 | self.raw = None 148 | self.json = None 149 | # flags 150 | self.valid = True 151 | # default values for all fields 152 | self.text = src 153 | self.costs = [] 154 | 155 | if fmt == 'json': 156 | self.json = src 157 | manastrs = re.findall(utils.mana_json_regex, src) 158 | else: 159 | self.raw = src 160 | manastrs = re.findall(utils.mana_regex, src) 161 | 162 | for manastr in manastrs: 163 | cost = Manacost(manastr, fmt) 164 | if not cost.valid: 165 | self.valid = False 166 | self.costs += [cost] 167 | self.text = self.text.replace(manastr, utils.reserved_mana_marker, 1) 168 | 169 | if (utils.mana_open_delimiter in self.text 170 | or utils.mana_close_delimiter in self.text 171 | or utils.mana_json_open_delimiter in self.text 172 | or utils.mana_json_close_delimiter in self.text): 173 | self.valid = False 174 | 175 | def __str__(self): 176 | text = self.text 177 | for cost in self.costs: 178 | text = text.replace(utils.reserved_mana_marker, str(cost), 1) 179 | return text 180 | 181 | def format(self, for_forum = False, for_html = False): 182 | text = self.text 183 | for cost in self.costs: 184 | text = text.replace(utils.reserved_mana_marker, cost.format(for_forum=for_forum, for_html=for_html), 1) 185 | if for_html: 186 | text = text.replace('\n', '
\n') 187 | return text 188 | 189 | def encode(self, randomize = False): 190 | text = self.text 191 | for cost in self.costs: 192 | text = text.replace(utils.reserved_mana_marker, cost.encode(randomize = randomize), 1) 193 | return text 194 | 195 | def vectorize(self): 196 | text = self.text 197 | special_chars = [utils.reserved_mana_marker, 198 | utils.dash_marker, 199 | utils.bullet_marker, 200 | utils.this_marker, 201 | utils.counter_marker, 202 | utils.choice_open_delimiter, 203 | utils.choice_close_delimiter, 204 | utils.newline, 205 | #utils.x_marker, 206 | utils.tap_marker, 207 | utils.untap_marker, 208 | utils.newline, 209 | ';', ':', '"', ',', '.'] 210 | for char in special_chars: 211 | text = text.replace(char, ' ' + char + ' ') 212 | text = text.replace('/', '/ /') 213 | for cost in self.costs: 214 | text = text.replace(utils.reserved_mana_marker, cost.vectorize(), 1) 215 | return ' '.join(text.split()) 216 | -------------------------------------------------------------------------------- /lib/namediff.py: -------------------------------------------------------------------------------- 1 | # This module is misleadingly named, as it has other utilities as well 2 | # that are generally necessary when trying to postprocess output by 3 | # comparing it against existing cards. 4 | 5 | import difflib 6 | import os 7 | import multiprocessing 8 | 9 | import utils 10 | import jdecode 11 | import cardlib 12 | 13 | libdir = os.path.dirname(os.path.realpath(__file__)) 14 | datadir = os.path.realpath(os.path.join(libdir, '../data')) 15 | 16 | # multithreading control parameters 17 | cores = multiprocessing.cpu_count() 18 | 19 | # split a list into n pieces; return a list of these lists 20 | # has slightly interesting behavior, in that if n is large, it can 21 | # run out of elements early and return less than n lists 22 | def list_split(l, n): 23 | if n <= 0: 24 | return l 25 | split_size = len(l) / n 26 | if len(l) % n > 0: 27 | split_size += 1 28 | return [l[i:i+split_size] for i in range(0, len(l), split_size)] 29 | 30 | # flatten a list of lists into a single list of all their contents, in order 31 | def list_flatten(l): 32 | return [item for sublist in l for item in sublist] 33 | 34 | 35 | # isolated logic for multiprocessing 36 | def f_nearest(name, matchers, n): 37 | for m in matchers: 38 | m.set_seq1(name) 39 | ratios = [(m.ratio(), m.b) for m in matchers] 40 | ratios.sort(reverse = True) 41 | 42 | if ratios[0][0] >= 1: 43 | return ratios[:1] 44 | else: 45 | return ratios[:n] 46 | 47 | def f_nearest_per_thread(workitem): 48 | (worknames, names, n) = workitem 49 | # each thread (well, process) needs to generate its own matchers 50 | matchers = [difflib.SequenceMatcher(b=name, autojunk=False) for name in names] 51 | return map(lambda name: f_nearest(name, matchers, n), worknames) 52 | 53 | class Namediff: 54 | def __init__(self, verbose = True, 55 | json_fname = os.path.join(datadir, 'AllSets.json')): 56 | self.verbose = verbose 57 | self.names = {} 58 | self.codes = {} 59 | self.cardstrings = {} 60 | 61 | if self.verbose: 62 | print 'Setting up namediff...' 
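        # [editor's note: illustrative usage, not part of the original source]
        #   nd = Namediff()
        #   nd.nearest('storm crow', n = 3)           # [(ratio, name), ...], best match first
        #   nd.nearest_par(names, threads = cores)    # same results, one list per query name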
63 | 64 | if self.verbose: 65 | print ' Reading names from: ' + json_fname 66 | json_srcs = jdecode.mtg_open_json(json_fname, verbose) 67 | namecount = 0 68 | for json_cardname in sorted(json_srcs): 69 | if len(json_srcs[json_cardname]) > 0: 70 | jcards = json_srcs[json_cardname] 71 | 72 | # just use the first one 73 | idx = 0 74 | card = cardlib.Card(jcards[idx]) 75 | name = card.name 76 | jname = jcards[idx]['name'] 77 | jcode = jcards[idx][utils.json_field_info_code] 78 | if 'number' in jcards[idx]: 79 | jnum = jcards[idx]['number'] 80 | else: 81 | jnum = '' 82 | 83 | if name in self.names: 84 | print ' Duplicate name ' + name + ', ignoring.' 85 | else: 86 | self.names[name] = jname 87 | self.cardstrings[name] = card.encode() 88 | if jcode and jnum: 89 | self.codes[name] = jcode + '/' + jnum + '.jpg' 90 | else: 91 | self.codes[name] = '' 92 | namecount += 1 93 | 94 | print ' Read ' + str(namecount) + ' unique cardnames' 95 | print ' Building SequenceMatcher objects.' 96 | 97 | self.matchers = [difflib.SequenceMatcher(b=n, autojunk=False) for n in self.names] 98 | self.card_matchers = [difflib.SequenceMatcher(b=self.cardstrings[n], autojunk=False) for n in self.cardstrings] 99 | 100 | print '... Done.' 101 | 102 | def nearest(self, name, n=3): 103 | return f_nearest(name, self.matchers, n) 104 | 105 | def nearest_par(self, names, n=3, threads=cores): 106 | workpool = multiprocessing.Pool(threads) 107 | proto_worklist = list_split(names, threads) 108 | worklist = map(lambda x: (x, self.names, n), proto_worklist) 109 | donelist = workpool.map(f_nearest_per_thread, worklist) 110 | return list_flatten(donelist) 111 | 112 | def nearest_card(self, card, n=5): 113 | return f_nearest(card.encode(), self.card_matchers, n) 114 | 115 | def nearest_card_par(self, cards, n=5, threads=cores): 116 | workpool = multiprocessing.Pool(threads) 117 | proto_worklist = list_split(cards, threads) 118 | worklist = map(lambda x: (map(lambda c: c.encode(), x), self.cardstrings.values(), n), proto_worklist) 119 | donelist = workpool.map(f_nearest_per_thread, worklist) 120 | return list_flatten(donelist) 121 | -------------------------------------------------------------------------------- /lib/nltk_model.py: -------------------------------------------------------------------------------- 1 | # Natural Language Toolkit: Language Models 2 | # 3 | # Copyright (C) 2001-2014 NLTK Project 4 | # Authors: Steven Bird 5 | # Daniel Blanchard 6 | # Ilia Kurenkov 7 | # URL: 8 | # For license information, see LICENSE.TXT 9 | # 10 | # adapted for mtgencode Nov. 2015 11 | # an attempt was made to preserve the exact functionality of this code, 12 | # hampered somewhat by its brokenness 13 | 14 | from __future__ import unicode_literals 15 | 16 | from math import log 17 | 18 | from nltk.probability import ConditionalProbDist, ConditionalFreqDist, LidstoneProbDist 19 | from nltk.util import ngrams 20 | from nltk_model_api import ModelI 21 | 22 | from nltk import compat 23 | 24 | 25 | def _estimator(fdist, **estimator_kwargs): 26 | """ 27 | Default estimator function using a LidstoneProbDist. 28 | """ 29 | # can't be an instance method of NgramModel as they 30 | # can't be pickled either. 31 | return LidstoneProbDist(fdist, 0.001, **estimator_kwargs) 32 | 33 | 34 | @compat.python_2_unicode_compatible 35 | class NgramModel(ModelI): 36 | """ 37 | A processing interface for assigning a probability to the next word. 
38 | """ 39 | 40 | def __init__(self, n, train, pad_left=True, pad_right=False, 41 | estimator=None, **estimator_kwargs): 42 | """ 43 | Create an ngram language model to capture patterns in n consecutive 44 | words of training text. An estimator smooths the probabilities derived 45 | from the text and may allow generation of ngrams not seen during 46 | training. See model.doctest for more detailed testing 47 | 48 | >>> from nltk.corpus import brown 49 | >>> lm = NgramModel(3, brown.words(categories='news')) 50 | >>> lm 51 | 52 | >>> lm._backoff 53 | 54 | >>> lm.entropy(brown.words(categories='humor')) 55 | ... # doctest: +ELLIPSIS 56 | 12.0399... 57 | 58 | :param n: the order of the language model (ngram size) 59 | :type n: int 60 | :param train: the training text 61 | :type train: list(str) or list(list(str)) 62 | :param pad_left: whether to pad the left of each sentence with an (n-1)-gram of empty strings 63 | :type pad_left: bool 64 | :param pad_right: whether to pad the right of each sentence with an (n-1)-gram of empty strings 65 | :type pad_right: bool 66 | :param estimator: a function for generating a probability distribution 67 | :type estimator: a function that takes a ConditionalFreqDist and 68 | returns a ConditionalProbDist 69 | :param estimator_kwargs: Extra keyword arguments for the estimator 70 | :type estimator_kwargs: (any) 71 | """ 72 | 73 | # protection from cryptic behavior for calling programs 74 | # that use the pre-2.0.2 interface 75 | assert(isinstance(pad_left, bool)) 76 | assert(isinstance(pad_right, bool)) 77 | 78 | self._lpad = ('',) * (n - 1) if pad_left else () 79 | self._rpad = ('',) * (n - 1) if pad_right else () 80 | 81 | # make sure n is greater than zero, otherwise print it 82 | assert (n > 0), n 83 | 84 | # For explicitness save the check whether this is a unigram model 85 | self.is_unigram_model = (n == 1) 86 | # save the ngram order number 87 | self._n = n 88 | # save left and right padding 89 | self._lpad = ('',) * (n - 1) if pad_left else () 90 | self._rpad = ('',) * (n - 1) if pad_right else () 91 | 92 | if estimator is None: 93 | estimator = _estimator 94 | 95 | cfd = ConditionalFreqDist() 96 | 97 | # set read-only ngrams set (see property declaration below to reconfigure) 98 | self._ngrams = set() 99 | 100 | # If given a list of strings instead of a list of lists, create enclosing list 101 | if (train is not None) and isinstance(train[0], compat.string_types): 102 | train = [train] 103 | 104 | # we need to keep track of the number of word types we encounter 105 | vocabulary = set() 106 | for sent in train: 107 | raw_ngrams = ngrams(sent, n, pad_left, pad_right, pad_symbol='') 108 | for ngram in raw_ngrams: 109 | self._ngrams.add(ngram) 110 | context = tuple(ngram[:-1]) 111 | token = ngram[-1] 112 | cfd[context][token] += 1 113 | vocabulary.add(token) 114 | 115 | # Unless number of bins is explicitly passed, we should use the number 116 | # of word types encountered during training as the bins value. 117 | # If right padding is on, this includes the padding symbol. 
118 | if 'bins' not in estimator_kwargs: 119 | estimator_kwargs['bins'] = len(vocabulary) 120 | 121 | self._model = ConditionalProbDist(cfd, estimator, **estimator_kwargs) 122 | 123 | # recursively construct the lower-order models 124 | if not self.is_unigram_model: 125 | self._backoff = NgramModel(n-1, train, 126 | pad_left, pad_right, 127 | estimator, 128 | **estimator_kwargs) 129 | 130 | self._backoff_alphas = dict() 131 | # For each condition (or context) 132 | for ctxt in cfd.conditions(): 133 | backoff_ctxt = ctxt[1:] 134 | backoff_total_pr = 0.0 135 | total_observed_pr = 0.0 136 | 137 | # this is the subset of words that we OBSERVED following 138 | # this context. 139 | # i.e. Count(word | context) > 0 140 | for words in self._words_following(ctxt, cfd): 141 | 142 | # so, _words_following as fixed gives back a whole list now... 143 | for word in words: 144 | 145 | total_observed_pr += self.prob(word, ctxt) 146 | # we also need the total (n-1)-gram probability of 147 | # words observed in this n-gram context 148 | backoff_total_pr += self._backoff.prob(word, backoff_ctxt) 149 | 150 | assert (0 <= total_observed_pr <= 1), total_observed_pr 151 | # beta is the remaining probability weight after we factor out 152 | # the probability of observed words. 153 | # As a sanity check, both total_observed_pr and backoff_total_pr 154 | # must be GE 0, since probabilities are never negative 155 | beta = 1.0 - total_observed_pr 156 | 157 | # backoff total has to be less than one, otherwise we get 158 | # an error when we try subtracting it from 1 in the denominator 159 | assert (0 <= backoff_total_pr < 1), backoff_total_pr 160 | alpha_ctxt = beta / (1.0 - backoff_total_pr) 161 | 162 | self._backoff_alphas[ctxt] = alpha_ctxt 163 | 164 | # broken 165 | # def _words_following(self, context, cond_freq_dist): 166 | # for ctxt, word in cond_freq_dist.iterkeys(): 167 | # if ctxt == context: 168 | # yield word 169 | 170 | # fixed 171 | def _words_following(self, context, cond_freq_dist): 172 | for ctxt in cond_freq_dist.iterkeys(): 173 | if ctxt == context: 174 | yield cond_freq_dist[ctxt].keys() 175 | 176 | def prob(self, word, context): 177 | """ 178 | Evaluate the probability of this word in this context using Katz Backoff. 179 | 180 | :param word: the word to get the probability of 181 | :type word: str 182 | :param context: the context the word is in 183 | :type context: list(str) 184 | """ 185 | context = tuple(context) 186 | if (context + (word,) in self._ngrams) or (self.is_unigram_model): 187 | return self._model[context].prob(word) 188 | else: 189 | return self._alpha(context) * self._backoff.prob(word, context[1:]) 190 | 191 | def _alpha(self, context): 192 | """Get the backoff alpha value for the given context 193 | """ 194 | error_message = "Alphas and backoff are not defined for unigram models" 195 | assert not self.is_unigram_model, error_message 196 | 197 | if context in self._backoff_alphas: 198 | return self._backoff_alphas[context] 199 | else: 200 | return 1 201 | 202 | def logprob(self, word, context): 203 | """ 204 | Evaluate the (negative) log probability of this word in this context. 
205 | 
206 |         :param word: the word to get the probability of
207 |         :type word: str
208 |         :param context: the context the word is in
209 |         :type context: list(str)
210 |         """
211 |         return -log(self.prob(word, context), 2)
212 | 
213 |     @property
214 |     def ngrams(self):
215 |         return self._ngrams
216 | 
217 |     @property
218 |     def backoff(self):
219 |         return self._backoff
220 | 
221 |     @property
222 |     def model(self):
223 |         return self._model
224 | 
225 |     def choose_random_word(self, context):
226 |         '''
227 |         Randomly select a word that is likely to appear in this context.
228 | 
229 |         :param context: the context the word is in
230 |         :type context: list(str)
231 |         '''
232 | 
233 |         return self.generate(1, context)[-1]
234 | 
235 |     # NB, this will always start with the same word if the model
236 |     # was trained on a single text
237 |     def generate(self, num_words, context=()):
238 |         '''
239 |         Generate random text based on the language model.
240 | 
241 |         :param num_words: number of words to generate
242 |         :type num_words: int
243 |         :param context: initial words in generated string
244 |         :type context: list(str)
245 |         '''
246 | 
247 |         text = list(context)
248 |         for i in range(num_words):
249 |             text.append(self._generate_one(text))
250 |         return text
251 | 
252 |     def _generate_one(self, context):
253 |         context = (self._lpad + tuple(context))[-self._n + 1:]
254 |         if context in self:
255 |             return self[context].generate()
256 |         elif self._n > 1:
257 |             return self._backoff._generate_one(context[1:])
258 |         else:
259 |             return '.'
260 | 
261 |     def entropy(self, text):
262 |         """
263 |         Calculate the approximate cross-entropy of the n-gram model for a
264 |         given evaluation text.
265 |         This is the average log probability of each word in the text.
266 | 
267 |         :param text: words to use for evaluation
268 |         :type text: list(str)
269 |         """
270 | 
271 |         H = 0.0     # entropy is conventionally denoted by "H"
272 |         text = list(self._lpad) + text + list(self._rpad)
273 |         for i in range(self._n - 1, len(text)):
274 |             context = tuple(text[(i - self._n + 1):i])
275 |             token = text[i]
276 |             H += self.logprob(token, context)
277 |         return H / float(len(text) - (self._n - 1))
278 | 
279 |     def perplexity(self, text):
280 |         """
281 |         Calculates the perplexity of the given text.
282 |         This is simply 2 ** cross-entropy for the text.
283 | 
284 |         :param text: words to calculate perplexity of
285 |         :type text: list(str)
286 |         """
287 | 
288 |         return pow(2.0, self.entropy(text))
289 | 
290 |     def __contains__(self, item):
291 |         if not isinstance(item, tuple):
292 |             item = (item,)
293 |         return item in self._model
294 | 
295 |     def __getitem__(self, item):
296 |         if not isinstance(item, tuple):
297 |             item = (item,)
298 |         return self._model[item]
299 | 
300 |     def __repr__(self):
301 |         return '<NgramModel with %d %d-grams>' % (len(self._ngrams), self._n)
302 | 
303 | if __name__ == "__main__":
304 |     import doctest
305 |     doctest.testmod(optionflags=doctest.NORMALIZE_WHITESPACE)
306 | 
--------------------------------------------------------------------------------
/lib/nltk_model_api.py:
--------------------------------------------------------------------------------
1 | # Natural Language Toolkit: API for Language Models
2 | #
3 | # Copyright (C) 2001-2014 NLTK Project
4 | # Author: Steven Bird
5 | # URL: <http://nltk.org/>
6 | # For license information, see LICENSE.TXT
7 | #
8 | # imported for use in mtgencode Nov. 2015
9 | 
10 | 
11 | # should this be a subclass of ConditionalProbDistI?
12 | 
13 | class ModelI(object):
14 |     """
15 |     A processing interface for assigning a probability to the next word.
16 | """ 17 | 18 | def __init__(self): 19 | '''Create a new language model.''' 20 | raise NotImplementedError() 21 | 22 | def prob(self, word, context): 23 | '''Evaluate the probability of this word in this context.''' 24 | raise NotImplementedError() 25 | 26 | def logprob(self, word, context): 27 | '''Evaluate the (negative) log probability of this word in this context.''' 28 | raise NotImplementedError() 29 | 30 | def choose_random_word(self, context): 31 | '''Randomly select a word that is likely to appear in this context.''' 32 | raise NotImplementedError() 33 | 34 | def generate(self, n): 35 | '''Generate n words of text from the language model.''' 36 | raise NotImplementedError() 37 | 38 | def entropy(self, text): 39 | '''Evaluate the total entropy of a message with respect to the model. 40 | This is the sum of the log probability of each word in the message.''' 41 | raise NotImplementedError() 42 | 43 | -------------------------------------------------------------------------------- /lib/transforms.py: -------------------------------------------------------------------------------- 1 | # transform passes used to encode / decode cards 2 | import re 3 | import random 4 | 5 | # These could probably use a little love... They tend to hardcode in lots 6 | # of things very specific to the mtgjson format. 7 | 8 | import utils 9 | 10 | cardsep = utils.cardsep 11 | fieldsep = utils.fieldsep 12 | bsidesep = utils.bsidesep 13 | newline = utils.newline 14 | dash_marker = utils.dash_marker 15 | bullet_marker = utils.bullet_marker 16 | this_marker = utils.this_marker 17 | counter_marker = utils.counter_marker 18 | reserved_marker = utils.reserved_marker 19 | choice_open_delimiter = utils.choice_open_delimiter 20 | choice_close_delimiter = utils.choice_close_delimiter 21 | x_marker = utils.x_marker 22 | tap_marker = utils.tap_marker 23 | untap_marker = utils.untap_marker 24 | counter_rename = utils.counter_rename 25 | unary_marker = utils.unary_marker 26 | unary_counter = utils.unary_counter 27 | 28 | 29 | # Name Passes. 30 | 31 | 32 | def name_pass_1_sanitize(s): 33 | s = s.replace('!', '') 34 | s = s.replace('?', '') 35 | s = s.replace('-', dash_marker) 36 | s = s.replace('100,000', 'one hundred thousand') 37 | s = s.replace('1,000', 'one thousand') 38 | s = s.replace('1996', 'nineteen ninety-six') 39 | return s 40 | 41 | 42 | # Name unpasses. 43 | 44 | 45 | # particularly helpful if you want to call text_unpass_8_unicode later 46 | # and NOT have it stick unicode long dashes into names. 47 | def name_unpass_1_dashes(s): 48 | return s.replace(dash_marker, '-') 49 | 50 | 51 | # Text Passes. 52 | 53 | 54 | def text_pass_1_strip_rt(s): 55 | return re.sub(r'\(.*\)', '', s) 56 | 57 | 58 | def text_pass_2_cardname(s, name): 59 | # Here are some fun edge cases, thanks to jml34 on the forum for 60 | # pointing them out. 61 | if name == 'sacrifice': 62 | s = s.replace(name, this_marker, 1) 63 | return s 64 | elif name == 'fear': 65 | return s 66 | 67 | s = s.replace(name, this_marker) 68 | 69 | # So, some legends don't use the full cardname in their text box... 70 | # this check finds about 400 of them. 71 | nameparts = name.split(',') 72 | if len(nameparts) > 1: 73 | mininame = nameparts[0] 74 | new_s = s.replace(mininame, this_marker) 75 | if not new_s == s: 76 | s = new_s 77 | 78 | # A few others don't have a convenient comma to detect their nicknames, 79 | # so we override them here. 
80 | overrides = [ 81 | # detectable by splitting on 'the', though that might cause other issues 82 | 'crovax', 83 | 'rashka', 84 | 'phage', 85 | 'shimatsu', 86 | # random and arbitrary: they have a last name, 1996 world champion, etc. 87 | 'world champion', 88 | 'axelrod', 89 | 'hazezon', 90 | 'rubinia', 91 | 'rasputin', 92 | 'hivis', 93 | ] 94 | 95 | for override in overrides: 96 | s = s.replace(override, this_marker) 97 | 98 | # stupid planeswalker abilities 99 | s = s.replace('to him.', 'to ' + this_marker + '.') 100 | s = s.replace('to him this', 'to ' + this_marker + ' this') 101 | s = s.replace('to himself', 'to itself') 102 | s = s.replace("he's", this_marker + ' is') 103 | 104 | # sometimes we actually don't want to do this replacement 105 | s = s.replace('named ' + this_marker, 'named ' + name) 106 | s = s.replace('name is still ' + this_marker, 'name is still ' + name) 107 | s = s.replace('named keeper of ' + this_marker, 'named keeper of ' + name) 108 | s = s.replace('named kobolds of ' + this_marker, 'named kobolds of ' + name) 109 | s = s.replace('named sword of kaldra, ' + this_marker, 'named sword of kaldra, ' + name) 110 | 111 | return s 112 | 113 | 114 | def text_pass_3_unary(s): 115 | return utils.to_unary(s) 116 | 117 | 118 | # Run only after doing unary conversion. 119 | def text_pass_4a_dashes(s): 120 | s = s.replace('-' + unary_marker, reserved_marker) 121 | s = s.replace('-', dash_marker) 122 | s = s.replace(reserved_marker, '-' + unary_marker) 123 | 124 | # level up is annoying 125 | levels = re.findall(r'level &\^*\-&', s) 126 | for level in levels: 127 | newlevel = level.replace('-', dash_marker) 128 | s = s.replace(level, newlevel) 129 | 130 | levels = re.findall(r'level &\^*\+', s) 131 | for level in levels: 132 | newlevel = level.replace('+', dash_marker) 133 | s = s.replace(level, newlevel) 134 | 135 | # and we still have the ~x issue 136 | return s 137 | 138 | 139 | # Run this after fixing dashes, because this unbreaks the ~x issue. 140 | # Also probably don't run this on names, there are a few names with x~ in them. 141 | def text_pass_4b_x(s): 142 | s = s.replace(dash_marker + 'x', '-' + x_marker) 143 | s = s.replace('+x', '+' + x_marker) 144 | s = s.replace(' x ', ' ' + x_marker + ' ') 145 | s = s.replace('x:', x_marker + ':') 146 | s = s.replace('x~', x_marker + '~') 147 | s = s.replace(u'x\u2014', x_marker + u'\u2014') 148 | s = s.replace('x.', x_marker + '.') 149 | s = s.replace('x,', x_marker + ',') 150 | s = s.replace('x is', x_marker + ' is') 151 | s = s.replace('x can\'t', x_marker + ' can\'t') 152 | s = s.replace('x/x', x_marker + '/' + x_marker) 153 | s = s.replace('x target', x_marker + ' target') 154 | s = s.replace('si' + x_marker + ' target', 'six target') 155 | s = s.replace('avara' + x_marker, 'avarax') 156 | # there's also some stupid ice age card that wants -x/-y 157 | s = s.replace('/~', '/-') 158 | return s 159 | 160 | 161 | # Call this before replacing newlines. 162 | # This one ends up being really bad because of the confusion 163 | # with 'counter target spell or ability'. 164 | def text_pass_5_counters(s): 165 | # so, big fat old dictionary time!!!!!!!!! 
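    # [editor's note: worked example, not part of the original source; '%' is
    #  the counter_marker, as the commented-out checks further down suggest]
    # 'put a charge counter on it' becomes
    #   'countertype % charge' + '\n' + 'put a % counter on it'
    # via the replacement loop and the prepended countertype line below.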
166 | allcounters = [ 167 | 'time counter', 168 | 'devotion counter', 169 | 'charge counter', 170 | 'ki counter', 171 | 'matrix counter', 172 | 'spore counter', 173 | 'poison counter', 174 | 'quest counter', 175 | 'hatchling counter', 176 | 'storage counter', 177 | 'growth counter', 178 | 'paralyzation counter', 179 | 'energy counter', 180 | 'study counter', 181 | 'glyph counter', 182 | 'depletion counter', 183 | 'sleight counter', 184 | 'loyalty counter', 185 | 'hoofprint counter', 186 | 'wage counter', 187 | 'echo counter', 188 | 'lore counter', 189 | 'page counter', 190 | 'divinity counter', 191 | 'mannequin counter', 192 | 'ice counter', 193 | 'fade counter', 194 | 'pain counter', 195 | #'age counter', 196 | 'gold counter', 197 | 'muster counter', 198 | 'infection counter', 199 | 'plague counter', 200 | 'fate counter', 201 | 'slime counter', 202 | 'shell counter', 203 | 'credit counter', 204 | 'despair counter', 205 | 'globe counter', 206 | 'currency counter', 207 | 'blood counter', 208 | 'soot counter', 209 | 'carrion counter', 210 | 'fuse counter', 211 | 'filibuster counter', 212 | 'wind counter', 213 | 'hourglass counter', 214 | 'trap counter', 215 | 'corpse counter', 216 | 'awakening counter', 217 | 'verse counter', 218 | 'scream counter', 219 | 'doom counter', 220 | 'luck counter', 221 | 'intervention counter', 222 | 'eyeball counter', 223 | 'flood counter', 224 | 'eon counter', 225 | 'death counter', 226 | 'delay counter', 227 | 'blaze counter', 228 | 'magnet counter', 229 | 'feather counter', 230 | 'shield counter', 231 | 'wish counter', 232 | 'petal counter', 233 | 'music counter', 234 | 'pressure counter', 235 | 'manifestation counter', 236 | #'net counter', 237 | 'velocity counter', 238 | 'vitality counter', 239 | 'treasure counter', 240 | 'pin counter', 241 | 'bounty counter', 242 | 'rust counter', 243 | 'mire counter', 244 | 'tower counter', 245 | #'ore counter', 246 | 'cube counter', 247 | 'strife counter', 248 | 'elixir counter', 249 | 'hunger counter', 250 | 'level counter', 251 | 'winch counter', 252 | 'fungus counter', 253 | 'training counter', 254 | 'theft counter', 255 | 'arrowhead counter', 256 | 'sleep counter', 257 | 'healing counter', 258 | 'mining counter', 259 | 'dream counter', 260 | 'aim counter', 261 | 'arrow counter', 262 | 'javelin counter', 263 | 'gem counter', 264 | 'bribery counter', 265 | 'mine counter', 266 | 'omen counter', 267 | 'phylactery counter', 268 | 'tide counter', 269 | 'polyp counter', 270 | 'petrification counter', 271 | 'shred counter', 272 | 'pupa counter', 273 | 'crystal counter', 274 | ] 275 | usedcounters = [] 276 | for countername in allcounters: 277 | if countername in s: 278 | usedcounters += [countername] 279 | s = s.replace(countername, counter_marker + ' counter') 280 | 281 | # oh god some of the counter names are suffixes of others... 
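    # [editor's note: clarifying example, not part of the original source]
    # 'age counter' is a suffix of 'storage counter', and 'ore counter' hides
    # inside 'one or more counters' (doubling season), which is exactly what
    # the 'more counter' guard below protects against.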
282 | shortcounters = [ 283 | 'age counter', 284 | 'net counter', 285 | 'ore counter', 286 | ] 287 | for countername in shortcounters: 288 | # SUPER HACKY fix for doubling season 289 | if countername in s and 'more counter' not in s: 290 | usedcounters += [countername] 291 | s = s.replace(countername, counter_marker + ' counter') 292 | 293 | # miraculously this doesn't seem to happen 294 | # if len(usedcounters) > 1: 295 | # print usedcounters 296 | 297 | # we haven't done newline replacement yet, so use actual newlines 298 | if len(usedcounters) == 1: 299 | # and yeah, this line of code can blow up in all kinds of different ways 300 | s = 'countertype ' + counter_marker + ' ' + usedcounters[0].split()[0] + '\n' + s 301 | 302 | return s 303 | 304 | 305 | # The word 'counter' is confusing when used to refer to what we do to spells 306 | # and sometimes abilities to make them not happen. Let's rename that. 307 | # Call this after doing the counter replacement to simplify the regexes. 308 | counter_rename = 'uncast' 309 | def text_pass_6_uncast(s): 310 | # pre-checks to make sure we aren't doing anything dumb 311 | # if '% counter target ' in s or '^ counter target ' in s or '& counter target ' in s: 312 | # print s + '\n' 313 | # if '% counter a ' in s or '^ counter a ' in s or '& counter a ' in s: 314 | # print s + '\n' 315 | # if '% counter all ' in s or '^ counter all ' in s or '& counter all ' in s: 316 | # print s + '\n' 317 | # if '% counter a ' in s or '^ counter a ' in s or '& counter a ' in s: 318 | # print s + '\n' 319 | # if '% counter that ' in s or '^ counter that ' in s or '& counter that ' in s: 320 | # print s + '\n' 321 | # if '% counter @' in s or '^ counter @' in s or '& counter @' in s: 322 | # print s + '\n' 323 | # if '% counter the ' in s or '^ counter the ' in s or '& counter the ' in s: 324 | # print s + '\n' 325 | 326 | # counter target 327 | s = s.replace('counter target ', counter_rename + ' target ') 328 | # counter a 329 | s = s.replace('counter a ', counter_rename + ' a ') 330 | # counter all 331 | s = s.replace('counter all ', counter_rename + ' all ') 332 | # counters a 333 | s = s.replace('counters a ', counter_rename + 's a ') 334 | # countered (this could get weird in terms of englishing the word; lets just go for hilarious) 335 | s = s.replace('countered', counter_rename + 'ed') 336 | # counter that 337 | s = s.replace('counter that ', counter_rename + ' that ') 338 | # counter @ 339 | s = s.replace('counter @', counter_rename + ' @') 340 | # counter it (this is tricky 341 | s = s.replace(', counter it', ', ' + counter_rename + ' it') 342 | # counter the (it happens at least once, thanks wizards!) 343 | s = s.replace('counter the ', counter_rename + ' the ') 344 | # counter up to 345 | s = s.replace('counter up to ', counter_rename + ' up to ') 346 | 347 | # check if the word exists in any other context 348 | # if 'counter' in (s.replace('% counter', '').replace('countertype', '') 349 | # .replace('^ counter', '').replace('& counter', ''): 350 | # print s + '\n' 351 | 352 | # whew! by manual inspection of a few dozen texts, it looks like this about covers it. 353 | return s 354 | 355 | 356 | # Run after fixing dashes, it makes the regexes better, but before replacing newlines. 
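# [editor's note: worked example, not part of the original source; '&' and '^'
#  are the unary marker and counter, as in the level-up regexes above]
# 'choose two \u2014\n\u2022 foo\n\u2022 bar\n' is rewritten by the helper below into
# '[&^^ \u2022 foo \u2022 bar]\n': the count in unary, the options joined onto
# one delimited line.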
357 | def text_pass_7_choice(s): 358 | # the idea is to take 'choose n ~\n=ability\n=ability\n' 359 | # to '[n = ability = ability]\n' 360 | 361 | def choice_formatting_helper(s_helper, prefix, count, suffix = ''): 362 | single_choices = re.findall(ur'(' + prefix + ur'\n?(\u2022.*(\n|$))+)', s_helper) 363 | for choice in single_choices: 364 | newchoice = choice[0] 365 | newchoice = newchoice.replace(prefix, unary_marker + (unary_counter * count) + suffix) 366 | newchoice = newchoice.replace('\n', ' ') 367 | if newchoice[-1:] == ' ': 368 | newchoice = choice_open_delimiter + newchoice[:-1] + choice_close_delimiter + '\n' 369 | else: 370 | newchoice = choice_open_delimiter + newchoice + choice_close_delimiter 371 | s_helper = s_helper.replace(choice[0], newchoice) 372 | return s_helper 373 | 374 | s = choice_formatting_helper(s, ur'choose one \u2014', 1) 375 | s = choice_formatting_helper(s, ur'choose one \u2014 ', 1) # ty Promise of Power 376 | s = choice_formatting_helper(s, ur'choose two \u2014', 2) 377 | s = choice_formatting_helper(s, ur'choose two \u2014 ', 2) # ty Profane Command 378 | s = choice_formatting_helper(s, ur'choose one or both \u2014', 0) 379 | s = choice_formatting_helper(s, ur'choose one or more \u2014', 0) 380 | s = choice_formatting_helper(s, ur'choose khans or dragons.', 1) 381 | # this is for 'an opponent chooses one', which will be a bit weird but still work out 382 | s = choice_formatting_helper(s, ur'chooses one \u2014', 1) 383 | # Demonic Pact has 'choose one that hasn't been chosen'... 384 | s = choice_formatting_helper(s, ur"choose one that hasn't been chosen \u2014", 1, 385 | suffix=" that hasn't been chosen") 386 | # 'choose n. you may choose the same mode more than once.' 387 | s = choice_formatting_helper(s, ur'choose three. you may choose the same mode more than once.', 3, 388 | suffix='. 
you may choose the same mode more than once.') 389 | 390 | return s 391 | 392 | 393 | # do before removing newlines 394 | # might as well do this after countertype because we probably care more about 395 | # the location of the equip cost 396 | def text_pass_8_equip(s): 397 | equips = re.findall(r'equip ' + utils.mana_json_regex + r'.?$', s) 398 | # there don't seem to be any cases with more than one 399 | if len(equips) == 1: 400 | equip = equips[0] 401 | s = s.replace('\n' + equip, '') 402 | s = s.replace(equip, '') 403 | 404 | if equip[-1:] == ' ': 405 | equip = equip[0:-1] 406 | 407 | if s == '': 408 | s = equip 409 | else: 410 | s = equip + '\n' + s 411 | 412 | nonmana = re.findall(ur'(equip\u2014.*(\n|$))', s) 413 | if len(nonmana) == 1: 414 | equip = nonmana[0][0] 415 | s = s.replace('\n' + equip, '') 416 | s = s.replace(equip, '') 417 | 418 | if equip[-1:] == ' ': 419 | equip = equip[0:-1] 420 | 421 | if s == '': 422 | s = equip 423 | else: 424 | s = equip + '\n' + s 425 | 426 | return s 427 | 428 | 429 | def text_pass_9_newlines(s): 430 | return s.replace('\n', utils.newline) 431 | 432 | 433 | def text_pass_10_symbols(s): 434 | return utils.to_symbols(s) 435 | 436 | 437 | # reorder the lines of text into a canonical form: 438 | # first enchant and equip 439 | # then other keywords, one per line (things with no period on the end) 440 | # then other abilities 441 | # then kicker and countertype last of all 442 | def text_pass_11_linetrans(s): 443 | # let's just not deal with level up 444 | if 'level up' in s: 445 | return s 446 | 447 | prelines = [] 448 | keylines = [] 449 | mainlines = [] 450 | postlines = [] 451 | 452 | lines = s.split(utils.newline) 453 | for line in lines: 454 | line = line.strip() 455 | if line == '': 456 | continue 457 | if not '.' 
in line: 458 | # because this is inconsistent 459 | line = line.replace(',', ';') 460 | line = line.replace('; where', ', where') # Thromok the Insatiable 461 | line = line.replace('; and', ', and') # wonky protection 462 | line = line.replace('; from', ', from') # wonky protection 463 | line = line.replace('upkeep;', 'upkeep,') # wonky protection 464 | sublines = line.split(';') 465 | for subline in sublines: 466 | subline = subline.strip() 467 | if 'equip' in subline or 'enchant' in subline: 468 | prelines += [subline] 469 | elif 'countertype' in subline or 'kicker' in subline: 470 | postlines += [subline] 471 | else: 472 | keylines += [subline] 473 | elif u'\u2014' in line and not u' \u2014 ' in line: 474 | if 'equip' in line or 'enchant' in line: 475 | prelines += [line] 476 | elif 'countertype' in line or 'kicker' in line: 477 | postlines += [line] 478 | else: 479 | keylines += [line] 480 | else: 481 | mainlines += [line] 482 | 483 | alllines = prelines + keylines + mainlines + postlines 484 | return utils.newline.join(alllines) 485 | 486 | 487 | # randomize the order of the lines 488 | # not a text pass, intended to be invoked dynamically when encoding a card 489 | # call this on fully encoded text, with mana symbols expanded 490 | def separate_lines(text): 491 | # forget about level up, ignore empty text too while we're at it 492 | if text == '' or 'level up' in text: 493 | return [],[],[],[],[] 494 | 495 | preline_search = ['equip', 'fortify', 'enchant ', 'bestow'] 496 | # probably could use optimization with a regex 497 | costline_search = [ 498 | 'multikicker', 'kicker', 'suspend', 'echo', 'awaken', 499 | 'buyback', 'dash', 'entwine', 'evoke', 'flashback', 500 | 'madness', 'megamorph', 'morph', 'miracle', 'ninjutsu', 'overload', 501 | 'prowl', 'recover', 'reinforce', 'replicate', 'scavenge', 'splice', 502 | 'surge', 'unearth', 'transmute', 'transfigure', 503 | ] 504 | # cycling is a special case to handle the variants 505 | postline_search = ['countertype'] 506 | keyline_search = ['cumulative'] 507 | 508 | prelines = [] 509 | keylines = [] 510 | mainlines = [] 511 | costlines = [] 512 | postlines = [] 513 | 514 | lines = text.split(utils.newline) 515 | # we've already done linetrans once, so some of the irregularities have been simplified 516 | for line in lines: 517 | if not '.' 
in line: 518 | if any(line.startswith(s) for s in preline_search): 519 | prelines.append(line) 520 | elif any(line.startswith(s) for s in postline_search): 521 | postlines.append(line) 522 | elif any(line.startswith(s) for s in costline_search) or 'cycling' in line: 523 | costlines.append(line) 524 | else: 525 | keylines.append(line) 526 | elif (utils.dash_marker in line and not 527 | (' '+utils.dash_marker+' ' in line or 'non'+utils.dash_marker in line)): 528 | if any(line.startswith(s) for s in preline_search): 529 | prelines.append(line) 530 | elif any(line.startswith(s) for s in costline_search) or 'cycling' in line: 531 | costlines.append(line) 532 | elif any(line.startswith(s) for s in keyline_search): 533 | keylines.append(line) 534 | else: 535 | mainlines.append(line) 536 | elif ': monstrosity' in line: 537 | costlines.append(line) 538 | else: 539 | mainlines.append(line) 540 | 541 | return prelines, keylines, mainlines, costlines, postlines 542 | 543 | choice_re = re.compile(re.escape(utils.choice_open_delimiter) + r'.*' + 544 | re.escape(utils.choice_close_delimiter)) 545 | choice_divider = ' ' + utils.bullet_marker + ' ' 546 | def randomize_choice(line): 547 | choices = re.findall(choice_re, line) 548 | if len(choices) < 1: 549 | return line 550 | new_line = line 551 | for choice in choices: 552 | parts = choice[1:-1].split(choice_divider) 553 | if len(parts) < 3: 554 | continue 555 | choiceparts = parts[1:] 556 | random.shuffle(choiceparts) 557 | new_line = new_line.replace(choice, 558 | utils.choice_open_delimiter + 559 | choice_divider.join(parts[:1] + choiceparts) + 560 | utils.choice_close_delimiter, 561 | 1) 562 | return new_line 563 | 564 | def randomize_lines(text): 565 | if text == '' or 'level up' in text: 566 | return text 567 | 568 | prelines, keylines, mainlines, costlines, postlines = separate_lines(text) 569 | 570 | new_mainlines = [] 571 | for line in mainlines: 572 | if line.endswith(utils.choice_close_delimiter): 573 | new_mainlines.append(randomize_choice(line)) 574 | # elif utils.choice_open_delimiter in line or utils.choice_close_delimiter in line: 575 | # print(line) 576 | else: 577 | new_mainlines.append(line) 578 | 579 | if False: # TODO: make this an option 580 | lines = prelines + keylines + new_mainlines + costlines + postlines 581 | random.shuffle(lines) 582 | return utils.newline.join(lines) 583 | else: 584 | random.shuffle(prelines) 585 | random.shuffle(keylines) 586 | random.shuffle(new_mainlines) 587 | random.shuffle(costlines) 588 | #random.shuffle(postlines) # only one kind ever (countertype) 589 | return utils.newline.join(prelines+keylines+new_mainlines+costlines+postlines) 590 | 591 | 592 | # Text unpasses, for decoding. All assume the text inside a Manatext, so don't do anything 593 | # weird with the mana cost symbol. 
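# [editor's note: illustrative sketch, not part of the original source; the
#  real call order lives in cardlib.py, so treat this ordering as an assumption]
# decoding a text field roughly applies the unpasses below in numbered order:
#   s = text_unpass_1_choice(s, delimit = False)
#   s = text_unpass_2_counters(s)
#   s = text_unpass_3_uncast(s)
#   s = text_unpass_4_unary(s)
#   s = text_unpass_5_symbols(s, for_forum, for_html)
#   s = text_unpass_6_cardname(s, name)
#   s = text_unpass_7_newlines(s)
#   s = text_unpass_8_unicode(s)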
594 | 595 | 596 | def text_unpass_1_choice(s, delimit = False): 597 | choice_regex = (re.escape(choice_open_delimiter) + re.escape(unary_marker) 598 | + r'.*' + re.escape(bullet_marker) + r'.*' + re.escape(choice_close_delimiter)) 599 | choices = re.findall(choice_regex, s) 600 | for choice in sorted(choices, lambda x,y: cmp(len(x), len(y)), reverse = True): 601 | fragments = choice[1:-1].split(bullet_marker) 602 | countfrag = fragments[0] 603 | optfrags = fragments[1:] 604 | choicecount = int(utils.from_unary(re.findall(utils.number_unary_regex, countfrag)[0])) 605 | newchoice = '' 606 | 607 | if choicecount == 0: 608 | if len(countfrag) == 2: 609 | newchoice += 'choose one or both ' 610 | else: 611 | newchoice += 'choose one or more ' 612 | elif choicecount == 1: 613 | newchoice += 'choose one ' 614 | elif choicecount == 2: 615 | newchoice += 'choose two ' 616 | else: 617 | newchoice += 'choose ' + utils.to_unary(str(choicecount)) + ' ' 618 | newchoice += dash_marker 619 | 620 | for option in optfrags: 621 | option = option.strip() 622 | if option: 623 | newchoice += newline + bullet_marker + ' ' + option 624 | 625 | if delimit: 626 | s = s.replace(choice, choice_open_delimiter + newchoice + choice_close_delimiter) 627 | s = s.replace('an opponent ' + choice_open_delimiter + 'choose ', 628 | 'an opponent ' + choice_open_delimiter + 'chooses ') 629 | else: 630 | s = s.replace(choice, newchoice) 631 | s = s.replace('an opponent choose ', 'an opponent chooses ') 632 | 633 | return s 634 | 635 | 636 | def text_unpass_2_counters(s): 637 | countertypes = re.findall(r'countertype ' + re.escape(counter_marker) 638 | + r'[^' + re.escape(newline) + r']*' + re.escape(newline), s) 639 | # lazier than using groups in the regex 640 | countertypes += re.findall(r'countertype ' + re.escape(counter_marker) 641 | + r'[^' + re.escape(newline) + r']*$', s) 642 | if len(countertypes) > 0: 643 | countertype = countertypes[0].replace('countertype ' + counter_marker, '') 644 | countertype = countertype.replace(newline, '\n').strip() 645 | s = s.replace(countertypes[0], '') 646 | s = s.replace(counter_marker, countertype) 647 | 648 | return s 649 | 650 | 651 | def text_unpass_3_uncast(s): 652 | return s.replace(counter_rename, 'counter') 653 | 654 | 655 | def text_unpass_4_unary(s): 656 | return utils.from_unary(s) 657 | 658 | 659 | def text_unpass_5_symbols(s, for_forum, for_html): 660 | return utils.from_symbols(s, for_forum = for_forum, for_html = for_html) 661 | 662 | 663 | def text_unpass_6_cardname(s, name): 664 | return s.replace(this_marker, name) 665 | 666 | 667 | def text_unpass_7_newlines(s): 668 | return s.replace(newline, '\n') 669 | 670 | 671 | def text_unpass_8_unicode(s): 672 | s = s.replace(dash_marker, u'\u2014') 673 | s = s.replace(bullet_marker, u'\u2022') 674 | return s 675 | -------------------------------------------------------------------------------- /lib/utils.py: -------------------------------------------------------------------------------- 1 | import re 2 | 3 | # Utilities for handling unicode, unary numbers, mana costs, and special symbols. 4 | # For convenience we redefine everything from config so that it can all be accessed 5 | # from the utils module. 6 | 7 | import config 8 | 9 | # special chunk of text that Magic Set Editor 2 requires at the start of all set files. 
10 | mse_prepend = 'mse version: 0.3.8\ngame: magic\nstylesheet: m15\nset info:\n\tsymbol:\nstyling:\n\tmagic-m15:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay:\n\tmagic-m15-clear:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-m15-extra-improved:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\tpt box symbols: magic-pt-symbols-extra.mse-symbol-font\n\t\toverlay: \n\tmagic-m15-planeswalker:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-m15-planeswalker-promo-black:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-m15-promo-dka:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-m15-token-clear:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-new-planeswalker:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-new-planeswalker-4abil:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-new-planeswalker-clear:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n\tmagic-new-planeswalker-promo-black:\n\t\ttext box mana symbols: magic-mana-small.mse-symbol-font\n\t\toverlay: \n'
11 | 
12 | # special chunk of text to start an HTML document.
13 | import html_extra_data
14 | segment_ids = html_extra_data.id_lables
15 | html_prepend = html_extra_data.html_prepend
16 | html_append = "\n</body></html>\n"
17 | 
18 | # encoding formats we know about
19 | formats = [
20 |     'std',
21 |     'named',
22 |     'noname',
23 |     'rfields',
24 |     'old',
25 |     'norarity',
26 |     'vec',
27 |     'custom',
28 | ]
29 | 
30 | # separators
31 | cardsep = config.cardsep
32 | fieldsep = config.fieldsep
33 | bsidesep = config.bsidesep
34 | newline = config.newline
35 | 
36 | # special indicators
37 | dash_marker = config.dash_marker
38 | bullet_marker = config.bullet_marker
39 | this_marker = config.this_marker
40 | counter_marker = config.counter_marker
41 | reserved_marker = config.reserved_marker
42 | reserved_mana_marker = config.reserved_mana_marker
43 | choice_open_delimiter = config.choice_open_delimiter
44 | choice_close_delimiter = config.choice_close_delimiter
45 | x_marker = config.x_marker
46 | tap_marker = config.tap_marker
47 | untap_marker = config.untap_marker
48 | rarity_common_marker = config.rarity_common_marker
49 | rarity_uncommon_marker = config.rarity_uncommon_marker
50 | rarity_rare_marker = config.rarity_rare_marker
51 | rarity_mythic_marker = config.rarity_mythic_marker
52 | rarity_special_marker = config.rarity_special_marker
53 | rarity_basic_land_marker = config.rarity_basic_land_marker
54 | 
55 | json_rarity_map = {
56 |     'Common' : rarity_common_marker,
57 |     'Uncommon' : rarity_uncommon_marker,
58 |     'Rare' : rarity_rare_marker,
59 |     'Mythic Rare' : rarity_mythic_marker,
60 |     'Special' : rarity_special_marker,
61 |     'Basic Land' : rarity_basic_land_marker,
62 | }
63 | json_rarity_unmap = {json_rarity_map[k] : k for k in json_rarity_map}
64 | 
65 | # unambiguous synonyms
66 | counter_rename = config.counter_rename
67 | 
68 | # field labels
69 | field_label_name = config.field_label_name
70 | field_label_rarity = config.field_label_rarity
71 | field_label_cost = config.field_label_cost
72 | field_label_supertypes = config.field_label_supertypes
73 | field_label_types = config.field_label_types
74 | field_label_subtypes = config.field_label_subtypes
75 | field_label_loyalty = config.field_label_loyalty
76 | field_label_pt = config.field_label_pt
77 | field_label_text = config.field_label_text
78 | 
79 | # additional fields we add to the json cards
80 | json_field_bside = config.json_field_bside
81 | json_field_set_name = config.json_field_set_name
82 | json_field_info_code = config.json_field_info_code
83 | 
84 | # unicode / ascii conversion
85 | unicode_trans = {
86 |     u'\u2014' : dash_marker, # unicode long dash
87 |     u'\u2022' : bullet_marker, # unicode bullet
88 |     u'\u2019' : '"', # single quote
89 |     u'\u2018' : '"', # single quote
90 |     u'\u2212' : '-', # minus sign
91 |     u'\xe6' : 'ae', # ae symbol
92 |     u'\xfb' : 'u', # u with caret
93 |     u'\xfa' : 'u', # u with accent
94 |     u'\xe9' : 'e', # e with accent
95 |     u'\xe1' : 'a', # a with accent
96 |     u'\xe0' : 'a', # a with accent going the other way
97 |     u'\xe2' : 'a', # a with caret
98 |     u'\xf6' : 'o', # o with umlaut
99 |     u'\xed' : 'i', # i with accent
100 | }
101 | 
102 | # this one is one-way only
103 | def to_ascii(s):
104 |     for uchar in unicode_trans:
105 |         s = s.replace(uchar, unicode_trans[uchar])
106 |     return s
107 | 
108 | # unary numbers
109 | unary_marker = config.unary_marker
110 | unary_counter = config.unary_counter
111 | unary_max = config.unary_max
112 | unary_exceptions = config.unary_exceptions
113 | 
114 | def to_unary(s, warn = False):
115 |     numbers = re.findall(r'[0123456789]+', s)
116 |     # replace largest first to avoid accidentally replacing shared substrings
117 |     for n in sorted(numbers, cmp = lambda x,y: cmp(int(x), int(y)), reverse = True):
118 |         i = int(n)
119 |         if i in unary_exceptions:
120 |             s = s.replace(n, unary_exceptions[i])
121 |         elif i > unary_max:
122 |             i = unary_max
123 |             if warn:
124 |                 print s
125 |             s = s.replace(n, unary_marker + unary_counter * i)
126 |         else:
127 |             s = s.replace(n, unary_marker + unary_counter * i)
128 |     return s
129 | 
130 | def from_unary(s):
131 |     numbers = re.findall(re.escape(unary_marker + unary_counter) + '*', s)
132 |     # again, largest first so we don't replace substrings and break everything
133 |     for n in sorted(numbers, cmp = lambda x,y: cmp(len(x), len(y)), reverse = True):
134 |         i = (len(n) - len(unary_marker)) / len(unary_counter)
135 |         s = s.replace(n, str(i))
136 |     return s
137 | 
138 | # mana syntax
139 | mana_open_delimiter = '{'
140 | mana_close_delimiter = '}'
141 | mana_json_open_delimiter = mana_open_delimiter
142 | mana_json_close_delimiter = mana_close_delimiter
143 | mana_json_hybrid_delimiter = '/'
144 | mana_forum_open_delimiter = '[mana]'
145 | mana_forum_close_delimiter = '[/mana]'
146 | mana_html_open_delimiter = "<img class='mana-"
147 | mana_html_close_delimiter = "'>"
148 | mana_html_hybrid_delimiter = '-'
149 | mana_unary_marker = '' # if the same as unary_marker, from_unary WILL replace numbers in mana costs
150 | mana_unary_counter = unary_counter
151 | 
152 | # The decoding from mtgjson format is dependent on the specific structure of
153 | # these internally used mana symbol strings, so if you want to change them you'll
154 | # also have to change the json decoding functions.
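As a concrete illustration of the unary scheme above, before moving on to the mana symbol tables: a minimal round-trip sketch. The '&' marker and '^' counter it assumes are the usual config.py defaults, but treat them as assumptions and substitute whatever your config defines.

```python
# Round-trip a number through the unary encoding.
# The '&'/'^' markers in the comments are assumed defaults from config.py.
import sys
sys.path.append('lib')  # assumes the repo root is the working directory

import utils

encoded = utils.to_unary('draw 2 cards')  # 'draw &^^ cards' with the assumed markers
decoded = utils.from_unary(encoded)       # 'draw 2 cards'
assert decoded == 'draw 2 cards'
```

Both functions substitute the largest matches first for a reason: '1' is a substring of '10', so a smallest-first pass would corrupt multi-digit numbers on the way in, and the same overlap hazard applies to the tally strings on the way back.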
155 | 156 | # standard mana symbol set 157 | mana_W = 'W' # single color 158 | mana_U = 'U' 159 | mana_B = 'B' 160 | mana_R = 'R' 161 | mana_G = 'G' 162 | mana_P = 'P' # colorless phyrexian 163 | mana_S = 'S' # snow 164 | mana_X = 'X' # colorless X 165 | mana_C = 'C' # colorless only 'eldrazi' 166 | mana_E = 'E' # energy counter 167 | mana_WP = 'WP' # single color phyrexian 168 | mana_UP = 'UP' 169 | mana_BP = 'BP' 170 | mana_RP = 'RP' 171 | mana_GP = 'GP' 172 | mana_2W = '2W' # single color hybrid 173 | mana_2U = '2U' 174 | mana_2B = '2B' 175 | mana_2R = '2R' 176 | mana_2G = '2G' 177 | mana_WU = 'WU' # dual color hybrid 178 | mana_WB = 'WB' 179 | mana_RW = 'RW' 180 | mana_GW = 'GW' 181 | mana_UB = 'UB' 182 | mana_UR = 'UR' 183 | mana_GU = 'GU' 184 | mana_BR = 'BR' 185 | mana_BG = 'BG' 186 | mana_RG = 'RG' 187 | # alternative order symbols 188 | mana_WP_alt = 'PW' # single color phyrexian 189 | mana_UP_alt = 'PU' 190 | mana_BP_alt = 'PB' 191 | mana_RP_alt = 'PR' 192 | mana_GP_alt = 'PG' 193 | mana_2W_alt = 'W2' # single color hybrid 194 | mana_2U_alt = 'U2' 195 | mana_2B_alt = 'B2' 196 | mana_2R_alt = 'R2' 197 | mana_2G_alt = 'G2' 198 | mana_WU_alt = 'UW' # dual color hybrid 199 | mana_WB_alt = 'BW' 200 | mana_RW_alt = 'WR' 201 | mana_GW_alt = 'WG' 202 | mana_UB_alt = 'BU' 203 | mana_UR_alt = 'RU' 204 | mana_GU_alt = 'UG' 205 | mana_BR_alt = 'RB' 206 | mana_BG_alt = 'GB' 207 | mana_RG_alt = 'GR' 208 | # special 209 | mana_2 = '2' # use with 'in' to identify single color hybrid 210 | 211 | # master symbol lists 212 | mana_syms = [ 213 | mana_W, 214 | mana_U, 215 | mana_B, 216 | mana_R, 217 | mana_G, 218 | mana_P, 219 | mana_S, 220 | mana_X, 221 | mana_C, 222 | mana_E, 223 | mana_WP, 224 | mana_UP, 225 | mana_BP, 226 | mana_RP, 227 | mana_GP, 228 | mana_2W, 229 | mana_2U, 230 | mana_2B, 231 | mana_2R, 232 | mana_2G, 233 | mana_WU, 234 | mana_WB, 235 | mana_RW, 236 | mana_GW, 237 | mana_UB, 238 | mana_UR, 239 | mana_GU, 240 | mana_BR, 241 | mana_BG, 242 | mana_RG, 243 | ] 244 | mana_symalt = [ 245 | mana_WP_alt, 246 | mana_UP_alt, 247 | mana_BP_alt, 248 | mana_RP_alt, 249 | mana_GP_alt, 250 | mana_2W_alt, 251 | mana_2U_alt, 252 | mana_2B_alt, 253 | mana_2R_alt, 254 | mana_2G_alt, 255 | mana_WU_alt, 256 | mana_WB_alt, 257 | mana_RW_alt, 258 | mana_GW_alt, 259 | mana_UB_alt, 260 | mana_UR_alt, 261 | mana_GU_alt, 262 | mana_BR_alt, 263 | mana_BG_alt, 264 | mana_RG_alt, 265 | ] 266 | mana_symall = mana_syms + mana_symalt 267 | 268 | # alt symbol conversion 269 | def mana_alt(sym): 270 | if not sym in mana_symall: 271 | raise ValueError('invalid mana symbol for mana_alt(): ' + repr(sym)) 272 | if len(sym) < 2: 273 | return sym 274 | else: 275 | return sym[::-1] 276 | 277 | # produce intended neural net output format 278 | def mana_sym_to_encoding(sym): 279 | if not sym in mana_symall: 280 | raise ValueError('invalid mana symbol for mana_sym_to_encoding(): ' + repr(sym)) 281 | if len(sym) < 2: 282 | return sym * 2 283 | else: 284 | return sym 285 | 286 | # produce json formatting used in mtgjson 287 | def mana_sym_to_json(sym): 288 | if not sym in mana_symall: 289 | raise ValueError('invalid mana symbol for mana_sym_to_json(): ' + repr(sym)) 290 | if len(sym) < 2: 291 | return mana_json_open_delimiter + sym + mana_json_close_delimiter 292 | else: 293 | return (mana_json_open_delimiter + sym[0] + mana_json_hybrid_delimiter 294 | + sym[1] + mana_json_close_delimiter) 295 | 296 | # produce pretty formatting that renders on mtgsalvation forum 297 | # converts individual symbols; surrounding 
[mana][/mana] tags are added elsewhere 298 | def mana_sym_to_forum(sym): 299 | if not sym in mana_symall: 300 | raise ValueError('invalid mana symbol for mana_sym_to_forum(): ' + repr(sym)) 301 | if sym in mana_symalt: 302 | sym = mana_alt(sym) 303 | if len(sym) < 2: 304 | return sym 305 | else: 306 | return mana_json_open_delimiter + sym + mana_json_close_delimiter 307 | 308 | # forward symbol tables for encoding 309 | mana_syms_encode = {sym : mana_sym_to_encoding(sym) for sym in mana_syms} 310 | mana_symalt_encode = {sym : mana_sym_to_encoding(sym) for sym in mana_symalt} 311 | mana_symall_encode = {sym : mana_sym_to_encoding(sym) for sym in mana_symall} 312 | mana_syms_jencode = {sym : mana_sym_to_json(sym) for sym in mana_syms} 313 | mana_symalt_jencode = {sym : mana_sym_to_json(sym) for sym in mana_symalt} 314 | mana_symall_jencode = {sym : mana_sym_to_json(sym) for sym in mana_symall} 315 | 316 | # reverse symbol tables for decoding 317 | mana_syms_decode = {mana_sym_to_encoding(sym) : sym for sym in mana_syms} 318 | mana_symalt_decode = {mana_sym_to_encoding(sym) : sym for sym in mana_symalt} 319 | mana_symall_decode = {mana_sym_to_encoding(sym) : sym for sym in mana_symall} 320 | mana_syms_jdecode = {mana_sym_to_json(sym) : sym for sym in mana_syms} 321 | mana_symalt_jdecode = {mana_sym_to_json(sym) : sym for sym in mana_symalt} 322 | mana_symall_jdecode = {mana_sym_to_json(sym) : sym for sym in mana_symall} 323 | 324 | # going straight from json to encoding and vice versa 325 | def mana_encode_direct(jsym): 326 | if not jsym in mana_symall_jdecode: 327 | raise ValueError('json string not found in decode table for mana_encode_direct(): ' 328 | + repr(jsym)) 329 | else: 330 | return mana_symall_encode[mana_symall_jdecode[jsym]] 331 | 332 | def mana_decode_direct(sym): 333 | if not sym in mana_symall_decode: 334 | raise ValueError('mana symbol not found in decode table for mana_decode_direct(): ' 335 | + repr(sym)) 336 | else: 337 | return mana_symall_jencode[mana_symall_decode[sym]] 338 | 339 | # hacked in support for mtgsalvation forum 340 | def mana_decode_direct_forum(sym): 341 | if not sym in mana_symall_decode: 342 | raise ValueError('mana symbol not found in decode table for mana_decode_direct_forum(): ' 343 | + repr(sym)) 344 | else: 345 | return mana_sym_to_forum(mana_symall_decode[sym]) 346 | 347 | # processing entire strings 348 | def unique_string(s): 349 | return ''.join(set(s)) 350 | 351 | mana_charset_special = mana_unary_marker + mana_unary_counter 352 | mana_charset_strict = unique_string(''.join(mana_symall) + mana_charset_special) 353 | mana_charset = unique_string(mana_charset_strict + mana_charset_strict.lower()) 354 | 355 | mana_regex_strict = (re.escape(mana_open_delimiter) + '[' 356 | + re.escape(mana_charset_strict) 357 | + ']*' + re.escape(mana_close_delimiter)) 358 | mana_regex = (re.escape(mana_open_delimiter) + '[' 359 | + re.escape(mana_charset) 360 | + ']*' + re.escape(mana_close_delimiter)) 361 | 362 | # as a special case, we let unary or decimal numbers exist in json mana strings 363 | mana_json_charset_special = ('0123456789' + unary_marker + unary_counter) 364 | mana_json_charset_strict = unique_string(''.join(mana_symall_jdecode) + mana_json_charset_special) 365 | mana_json_charset = unique_string(mana_json_charset_strict + mana_json_charset_strict.lower()) 366 | 367 | # note that json mana strings can't be empty between the delimiters 368 | mana_json_regex_strict = (re.escape(mana_json_open_delimiter) + '[' 369 | + 
re.escape(mana_json_charset_strict) 370 | + ']+' + re.escape(mana_json_close_delimiter)) 371 | mana_json_regex = (re.escape(mana_json_open_delimiter) + '[' 372 | + re.escape(mana_json_charset) 373 | + ']+' + re.escape(mana_json_close_delimiter)) 374 | 375 | number_decimal_regex = r'[0123456789]+' 376 | number_unary_regex = re.escape(unary_marker) + re.escape(unary_counter) + '*' 377 | mana_decimal_regex = (re.escape(mana_json_open_delimiter) + number_decimal_regex 378 | + re.escape(mana_json_close_delimiter)) 379 | mana_unary_regex = (re.escape(mana_json_open_delimiter) + number_unary_regex 380 | + re.escape(mana_json_close_delimiter)) 381 | 382 | # convert a json mana string to the proper encoding 383 | def mana_translate(jmanastr): 384 | manastr = jmanastr 385 | for n in sorted(re.findall(mana_unary_regex, manastr), 386 | lambda x,y: cmp(len(x), len(y)), reverse = True): 387 | ns = re.findall(number_unary_regex, n) 388 | i = (len(ns[0]) - len(unary_marker)) / len(unary_counter) 389 | manastr = manastr.replace(n, mana_unary_marker + mana_unary_counter * i) 390 | for n in sorted(re.findall(mana_decimal_regex, manastr), 391 | lambda x,y: cmp(len(x), len(y)), reverse = True): 392 | ns = re.findall(number_decimal_regex, n) 393 | i = int(ns[0]) 394 | manastr = manastr.replace(n, mana_unary_marker + mana_unary_counter * i) 395 | for jsym in sorted(mana_symall_jdecode, lambda x,y: cmp(len(x), len(y)), reverse = True): 396 | if jsym in manastr: 397 | manastr = manastr.replace(jsym, mana_encode_direct(jsym)) 398 | return mana_open_delimiter + manastr + mana_close_delimiter 399 | 400 | # convert an encoded mana string back to json 401 | mana_symlen_min = min([len(sym) for sym in mana_symall_decode]) 402 | mana_symlen_max = max([len(sym) for sym in mana_symall_decode]) 403 | def mana_untranslate(manastr, for_forum = False, for_html = False): 404 | inner = manastr[1:-1] 405 | jmanastr = '' 406 | colorless_total = 0 407 | idx = 0 408 | while idx < len(inner): 409 | # taking this branch is an infinite loop if unary_marker is empty 410 | if len(mana_unary_marker) > 0 and inner[idx:idx+len(mana_unary_marker)] == mana_unary_marker: 411 | idx += len(mana_unary_marker) 412 | elif inner[idx:idx+len(mana_unary_counter)] == mana_unary_counter: 413 | idx += len(mana_unary_counter) 414 | colorless_total += 1 415 | else: 416 | old_idx = idx 417 | for symlen in range(mana_symlen_min, mana_symlen_max + 1): 418 | sym = inner[idx:idx+symlen] 419 | if sym in mana_symall_decode: 420 | idx += symlen 421 | if for_html: 422 | jmanastr = jmanastr + mana_decode_direct(sym) 423 | jmanastr = jmanastr.replace(mana_open_delimiter, mana_html_open_delimiter) 424 | jmanastr = jmanastr.replace(mana_close_delimiter, mana_html_close_delimiter) 425 | jmanastr = jmanastr.replace(mana_open_delimiter, mana_html_open_delimiter) 426 | jmanastr = jmanastr.replace(mana_json_hybrid_delimiter, mana_html_hybrid_delimiter) 427 | elif for_forum: 428 | jmanastr = jmanastr + mana_decode_direct_forum(sym) 429 | else: 430 | jmanastr = jmanastr + mana_decode_direct(sym) 431 | break 432 | # otherwise we'll go into an infinite loop if we see a symbol we don't know 433 | if idx == old_idx: 434 | idx += 1 435 | 436 | if for_html: 437 | if jmanastr == '': 438 | return mana_html_open_delimiter + str(colorless_total) + mana_html_close_delimiter 439 | else: 440 | return (('' if colorless_total == 0 441 | else mana_html_open_delimiter + str(colorless_total) + mana_html_close_delimiter) 442 | + jmanastr) 443 | 444 | elif for_forum: 445 | if jmanastr == '': 446 
| return mana_forum_open_delimiter + str(colorless_total) + mana_forum_close_delimiter 447 | else: 448 | return (mana_forum_open_delimiter + ('' if colorless_total == 0 449 | else str(colorless_total)) 450 | + jmanastr + mana_forum_close_delimiter) 451 | else: 452 | if jmanastr == '': 453 | return mana_json_open_delimiter + str(colorless_total) + mana_json_close_delimiter 454 | else: 455 | return (('' if colorless_total == 0 else 456 | mana_json_open_delimiter + str(colorless_total) + mana_json_close_delimiter) 457 | + jmanastr) 458 | 459 | # finally, replacing all instances in a string 460 | # notice the calls to .upper(), this way we recognize lowercase symbols as well just in case 461 | def to_mana(s): 462 | jmanastrs = re.findall(mana_json_regex, s) 463 | for jmanastr in sorted(jmanastrs, lambda x,y: cmp(len(x), len(y)), reverse = True): 464 | s = s.replace(jmanastr, mana_translate(jmanastr.upper())) 465 | return s 466 | 467 | def from_mana(s, for_forum = False): 468 | manastrs = re.findall(mana_regex, s) 469 | for manastr in sorted(manastrs, lambda x,y: cmp(len(x), len(y)), reverse = True): 470 | s = s.replace(manastr, mana_untranslate(manastr.upper(), for_forum = for_forum)) 471 | return s 472 | 473 | # Translation could also be accomplished using the datamine.Manacost object's 474 | # display methods, but these direct string transformations are retained for 475 | # quick scripting and convenience (and used under the hood by that class to 476 | # do its formatting). 477 | 478 | # more convenience features for formatting tap / untap symbols 479 | json_symbol_tap = tap_marker 480 | json_symbol_untap = untap_marker 481 | 482 | json_symbol_trans = { 483 | mana_json_open_delimiter + json_symbol_tap + mana_json_close_delimiter : tap_marker, 484 | mana_json_open_delimiter + json_symbol_tap.lower() + mana_json_close_delimiter : tap_marker, 485 | mana_json_open_delimiter + json_symbol_untap + mana_json_close_delimiter : untap_marker, 486 | mana_json_open_delimiter + json_symbol_untap.lower() + mana_json_close_delimiter : untap_marker, 487 | } 488 | symbol_trans = { 489 | tap_marker : mana_json_open_delimiter + json_symbol_tap + mana_json_close_delimiter, 490 | untap_marker : mana_json_open_delimiter + json_symbol_untap + mana_json_close_delimiter, 491 | } 492 | symbol_forum_trans = { 493 | tap_marker : mana_forum_open_delimiter + json_symbol_tap + mana_forum_close_delimiter, 494 | untap_marker : mana_forum_open_delimiter + json_symbol_untap + mana_forum_close_delimiter, 495 | } 496 | symbol_html_trans = { 497 | tap_marker : mana_html_open_delimiter + json_symbol_tap + mana_html_close_delimiter, 498 | untap_marker : mana_html_open_delimiter + json_symbol_untap + mana_html_close_delimiter, 499 | } 500 | 501 | json_symbol_regex = (re.escape(mana_json_open_delimiter) + '[' 502 | + json_symbol_tap + json_symbol_tap.lower() 503 | + json_symbol_untap + json_symbol_untap.lower() 504 | + ']' + re.escape(mana_json_close_delimiter)) 505 | symbol_regex = '[' + tap_marker + untap_marker + ']' 506 | 507 | def to_symbols(s): 508 | jsymstrs = re.findall(json_symbol_regex, s) 509 | for jsymstr in sorted(jsymstrs, lambda x,y: cmp(len(x), len(y)), reverse = True): 510 | s = s.replace(jsymstr, json_symbol_trans[jsymstr]) 511 | return s 512 | 513 | def from_symbols(s, for_forum = False, for_html = False): 514 | symstrs = re.findall(symbol_regex, s) 515 | #for symstr in sorted(symstrs, lambda x,y: cmp(len(x), len(y)), reverse = True): 516 | # We have to do the right thing here, because the thing we replace exists 
in the thing 517 | # we replace it with... 518 | for symstr in set(symstrs): 519 | if for_html: 520 | s = s.replace(symstr, symbol_html_trans[symstr]) 521 | elif for_forum: 522 | s = s.replace(symstr, symbol_forum_trans[symstr]) 523 | else: 524 | s = s.replace(symstr, symbol_trans[symstr]) 525 | return s 526 | 527 | unletters_regex = r"[^abcdefghijklmnopqrstuvwxyz']" 528 | -------------------------------------------------------------------------------- /scripts/analysis.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import re 5 | from collections import OrderedDict 6 | 7 | # scipy is kinda necessary 8 | import scipy 9 | import scipy.stats 10 | import numpy as np 11 | import math 12 | 13 | def mean_nonan(l): 14 | filtered = [x for x in l if not math.isnan(x)] 15 | return np.mean(filtered) 16 | 17 | def gmean_nonzero(l): 18 | filtered = [x for x in l if x != 0 and not math.isnan(x)] 19 | return scipy.stats.gmean(filtered) 20 | 21 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 22 | sys.path.append(libdir) 23 | datadir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../data') 24 | import jdecode 25 | 26 | import mtg_validate 27 | import ngrams 28 | 29 | def annotate_values(values): 30 | for k in values: 31 | (total, good, bad) = values[k] 32 | values[k] = OrderedDict([('total', total), ('good', good), ('bad', bad)]) 33 | return values 34 | 35 | def print_statistics(stats, ident = 0): 36 | for k in stats: 37 | if isinstance(stats[k], OrderedDict): 38 | print(' ' * ident + str(k) + ':') 39 | print_statistics(stats[k], ident=ident+2) 40 | elif isinstance(stats[k], dict): 41 | print(' ' * ident + str(k) + ': ') 42 | elif isinstance(stats[k], list): 43 | print(' ' * ident + str(k) + ': ') 44 | else: 45 | print(' ' * ident + str(k) + ': ' + str(stats[k])) 46 | 47 | def get_statistics(fname, lm = None, sep = False, verbose=False): 48 | stats = OrderedDict() 49 | cards = jdecode.mtg_open_file(fname, verbose=verbose) 50 | stats['cards'] = cards 51 | 52 | # unpack the name of the checkpoint - terrible and hacky 53 | try: 54 | final_name = os.path.basename(fname) 55 | halves = final_name.split('_epoch') 56 | cp_name = halves[0] 57 | cp_info = halves[1][:-4] 58 | info_halves = cp_info.split('_') 59 | cp_epoch = float(info_halves[0]) 60 | fragments = info_halves[1].split('.') 61 | cp_vloss = float('.'.join(fragments[:2])) 62 | cp_temp = float('.'.join(fragments[-2:])) 63 | cp_ident = '.'.join(fragments[2:-2]) 64 | stats['cp'] = OrderedDict([('name', cp_name), 65 | ('epoch', cp_epoch), 66 | ('vloss', cp_vloss), 67 | ('temp', cp_temp), 68 | ('ident', cp_ident)]) 69 | except Exception as e: 70 | pass 71 | 72 | # validate 73 | ((total_all, total_good, total_bad, total_uncovered), 74 | values) = mtg_validate.process_props(cards) 75 | 76 | stats['props'] = annotate_values(values) 77 | stats['props']['overall'] = OrderedDict([('total', total_all), 78 | ('good', total_good), 79 | ('bad', total_bad), 80 | ('uncovered', total_uncovered)]) 81 | 82 | # distances 83 | distfname = fname + '.dist' 84 | if os.path.isfile(distfname): 85 | name_dupes = 0 86 | card_dupes = 0 87 | with open(distfname, 'rt') as f: 88 | distlines = f.read().split('\n') 89 | dists = OrderedDict([('name', []), ('cbow', [])]) 90 | for line in distlines: 91 | fields = line.split('|') 92 | if len(fields) < 4: 93 | continue 94 | idx = int(fields[0]) 95 | name = str(fields[1]) 96 | ndist = float(fields[2]) 97 | 
cdist = float(fields[3])
98 |                 dists['name'] += [ndist]
99 |                 dists['cbow'] += [cdist]
100 |                 if ndist == 1.0:
101 |                     name_dupes += 1
102 |                 if cdist == 1.0:
103 |                     card_dupes += 1
104 | 
105 |         dists['name_mean'] = mean_nonan(dists['name'])
106 |         dists['cbow_mean'] = mean_nonan(dists['cbow'])
107 |         dists['name_geomean'] = gmean_nonzero(dists['name'])
108 |         dists['cbow_geomean'] = gmean_nonzero(dists['cbow'])
109 |         stats['dists'] = dists
110 | 
111 |     # n-grams
112 |     if not lm is None:
113 |         ngram = OrderedDict([('perp', []), ('perp_per', []),
114 |                              ('perp_max', []), ('perp_per_max', [])])
115 |         for card in cards:
116 |             if len(card.text.text) == 0:
117 |                 perp = perp_max = 0.0
118 |                 perp_per = perp_per_max = 0.0
119 |             elif sep:
120 |                 vtexts = [line.vectorize().split() for line in card.text_lines
121 |                           if len(line.vectorize().split()) > 0]
122 |                 perps = [lm.perplexity(vtext) for vtext in vtexts]
123 |                 perps_per = [perps[i] / float(len(vtexts[i])) for i in range(0, len(vtexts))]
124 |                 perp = gmean_nonzero(perps)
125 |                 perp_per = gmean_nonzero(perps_per)
126 |                 perp_max = max(perps)
127 |                 perp_per_max = max(perps_per)
128 |             else:
129 |                 vtext = card.text.vectorize().split()
130 |                 perp = lm.perplexity(vtext)
131 |                 perp_per = perp / float(len(vtext))
132 |                 perp_max = perp
133 |                 perp_per_max = perp_per
134 | 
135 |             ngram['perp'] += [perp]
136 |             ngram['perp_per'] += [perp_per]
137 |             ngram['perp_max'] += [perp_max]
138 |             ngram['perp_per_max'] += [perp_per_max]
139 | 
140 |         ngram['perp_mean'] = mean_nonan(ngram['perp'])
141 |         ngram['perp_per_mean'] = mean_nonan(ngram['perp_per'])
142 |         ngram['perp_geomean'] = gmean_nonzero(ngram['perp'])
143 |         ngram['perp_per_geomean'] = gmean_nonzero(ngram['perp_per'])
144 |         stats['ngram'] = ngram
145 | 
146 |     return stats
147 | 
148 | 
149 | def main(infile, verbose = False):
150 |     lm = ngrams.build_ngram_model(jdecode.mtg_open_file(str(os.path.join(datadir, 'output.txt'))),
151 |                                   3, separate_lines=True, verbose=True)
152 |     stats = get_statistics(infile, lm=lm, sep=True, verbose=verbose)
153 |     print_statistics(stats)
154 | 
155 | if __name__ == '__main__':
156 | 
157 |     import argparse
158 |     parser = argparse.ArgumentParser()
159 | 
160 |     parser.add_argument('infile', #nargs='?'. default=None,
161 |                         help='encoded card file or json corpus to process')
162 |     parser.add_argument('-v', '--verbose', action='store_true',
163 |                         help='verbose output')
164 | 
165 |     args = parser.parse_args()
166 |     main(args.infile, verbose=args.verbose)
167 |     exit(0)
168 | 
--------------------------------------------------------------------------------
/scripts/autosample.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | import sys
3 | import os
4 | import subprocess
5 | import random
6 | 
7 | def extract_cp_name(name):
8 |     # "lm_lstm_epoch50.00_0.1870.t7"
9 |     if not (name[:13] == 'lm_lstm_epoch' and name[-3:] == '.t7'):
10 |         return None
11 |     name = name[13:-3]
12 |     (epoch, vloss) = tuple(name.split('_'))
13 |     return (float(epoch), float(vloss))
14 | 
15 | def sample(cp, temp, count, seed = None, ident = 'output'):
16 |     if seed is None:
17 |         seed = random.randint(-1000000000, 1000000000)
18 |     outfile = cp + '.' + ident + '.'
+ str(temp) + '.txt' 19 | cmd = ('th sample.lua ' + cp 20 | + ' -temperature ' + str(temp) 21 | + ' -length ' + str(count) 22 | + ' -seed ' + str(seed) 23 | + ' >> ' + outfile) 24 | if os.path.exists(outfile): 25 | print(outfile + ' already exists, skipping') 26 | return False 27 | else: 28 | # UNSAFE SHELL=TRUE FOR CONVENIENCE 29 | subprocess.call('echo "' + cmd + '" | tee ' + outfile, shell=True) 30 | subprocess.call(cmd, shell=True) 31 | 32 | def find_best_cp(cpdir): 33 | best = None 34 | best_cp = None 35 | for path in os.listdir(cpdir): 36 | fullpath = os.path.join(cpdir, path) 37 | if os.path.isfile(fullpath): 38 | extracted = extract_cp_name(path) 39 | if not extracted is None: 40 | (epoch, vloss) = extracted 41 | if best is None or vloss < best: 42 | best = vloss 43 | best_cp = fullpath 44 | return best_cp 45 | 46 | def process_dir(cpdir, temp, count, seed = None, ident = 'output', verbose = False): 47 | if verbose: 48 | print('processing ' + cpdir) 49 | best_cp = find_best_cp(cpdir) 50 | if not best_cp is None: 51 | sample(best_cp, temp, count, seed=seed, ident=ident) 52 | for path in os.listdir(cpdir): 53 | fullpath = os.path.join(cpdir, path) 54 | if os.path.isdir(fullpath): 55 | process_dir(fullpath, temp, count, seed=seed, ident=ident, verbose=verbose) 56 | 57 | def main(rnndir, cpdir, temp, count, seed = None, ident = 'output', verbose = False): 58 | if not os.path.isdir(rnndir): 59 | raise ValueError('bad rnndir: ' + rnndir) 60 | if not os.path.isdir(cpdir): 61 | raise ValueError('bad cpdir: ' + cpdir) 62 | os.chdir(rnndir) 63 | process_dir(cpdir, temp, count, seed=seed, ident=ident, verbose=verbose) 64 | 65 | if __name__ == '__main__': 66 | import argparse 67 | parser = argparse.ArgumentParser() 68 | 69 | parser.add_argument('rnndir', #nargs='?'. 
default=None, 70 | help='base rnn directory, must contain sample.lua') 71 | parser.add_argument('cpdir', #nargs='?', default=None, 72 | help='checkpoint directory, all subdirectories will be processed') 73 | parser.add_argument('-t', '--temperature', action='store', default='1.0', 74 | help='sampling temperature') 75 | parser.add_argument('-c', '--count', action='store', default='1000000', 76 | help='number of characters to sample each time') 77 | parser.add_argument('-s', '--seed', action='store', default=None, 78 | help='fixed seed; if not present, a random seed will be used') 79 | parser.add_argument('-i', '--ident', action='store', default='output', 80 | help='identifier to include in the output filenames') 81 | parser.add_argument('-v', '--verbose', action='store_true', 82 | help='verbose output') 83 | 84 | args = parser.parse_args() 85 | if args.seed is None: 86 | seed = None 87 | else: 88 | seed = int(args.seed) 89 | main(args.rnndir, args.cpdir, float(args.temperature), int(args.count), 90 | seed=seed, ident=args.ident, verbose = args.verbose) 91 | exit(0) 92 | -------------------------------------------------------------------------------- /scripts/collect_checkpoints.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import shutil 5 | 6 | def cleanup_dump(dumpstr): 7 | cardfrags = dumpstr.split('\n\n') 8 | if len(cardfrags) < 4: 9 | return '' 10 | else: 11 | return '\n\n'.join(cardfrags[2:-1]) + '\n\n' 12 | 13 | def identify_checkpoints(basedir, ident): 14 | cp_infos = [] 15 | for path in os.listdir(basedir): 16 | fullpath = os.path.join(basedir, path) 17 | if not os.path.isfile(fullpath): 18 | continue 19 | if not (path[:13] == 'lm_lstm_epoch' and path[-4:] == '.txt'): 20 | continue 21 | if not ident in path: 22 | continue 23 | # attempt super hacky parsing 24 | inner = path[13:-4] 25 | halves = inner.split('_') 26 | if not len(halves) == 2: 27 | continue 28 | parts = halves[1].split('.') 29 | if not len(parts) == 6: 30 | continue 31 | # lm_lstm_epoch[25.00_0.3859.t7.output.1.0].txt 32 | if not parts[3] == ident: 33 | continue 34 | epoch = halves[0] 35 | vloss = '.'.join([parts[0], parts[1]]) 36 | temp = '.'.join([parts[4], parts[5]]) 37 | cpname = 'lm_lstm_epoch' + epoch + '_' + vloss + '.t7' 38 | cp_infos += [(fullpath, os.path.join(basedir, cpname), 39 | (epoch, vloss, temp))] 40 | return cp_infos 41 | 42 | def process_dir(basedir, targetdir, ident, copy_cp = False, verbose = False): 43 | (basepath, basedirname) = os.path.split(basedir) 44 | if basedirname == '': 45 | (basepath, basedirname) = os.path.split(basepath) 46 | 47 | cp_infos = identify_checkpoints(basedir, ident) 48 | for (dpath, cpath, (epoch, vloss, temp)) in cp_infos: 49 | if verbose: 50 | print('found dumpfile ' + dpath) 51 | dname = basedirname + '_epoch' + epoch + '_' + vloss + '.' + ident + '.' 
+ temp + '.txt'
52 |         cname = basedirname + '_epoch' + epoch + '_' + vloss + '.t7'
53 |         tdpath = os.path.join(targetdir, dname)
54 |         tcpath = os.path.join(targetdir, cname)
55 |         if verbose:
56 |             print(' cpx ' + dpath + ' ' + tdpath)
57 |         with open(dpath, 'rt') as infile:
58 |             with open(tdpath, 'wt') as outfile:
59 |                 outfile.write(cleanup_dump(infile.read()))
60 |         if copy_cp:
61 |             if os.path.isfile(cpath):
62 |                 if verbose:
63 |                     print(' cp ' + cpath + ' ' + tcpath)
64 |                 shutil.copy(cpath, tcpath)
65 | 
66 |     if copy_cp and len(cp_infos) > 0:
67 |         cmdpath = os.path.join(basedir, 'command.txt')
68 |         tcmdpath = os.path.join(targetdir, basedirname + '.command')
69 |         if os.path.isfile(cmdpath):
70 |             if verbose:
71 |                 print(' cp ' + cmdpath + ' ' + tcmdpath)
72 |             shutil.copy(cmdpath, tcmdpath)
73 | 
74 |     for path in os.listdir(basedir):
75 |         fullpath = os.path.join(basedir, path)
76 |         if os.path.isdir(fullpath):
77 |             process_dir(fullpath, targetdir, ident, copy_cp=copy_cp, verbose=verbose)
78 | 
79 | def main(basedir, targetdir, ident = 'output', copy_cp = False, verbose = False):
80 |     process_dir(basedir, targetdir, ident, copy_cp=copy_cp, verbose=verbose)
81 | 
82 | if __name__ == '__main__':
83 |     import argparse
84 |     parser = argparse.ArgumentParser()
85 | 
86 |     parser.add_argument('basedir', #nargs='?'. default=None,
87 |                         help='directory tree of checkpoint dumps to process, all subdirectories will be searched')
88 |     parser.add_argument('targetdir', #nargs='?', default=None,
89 |                         help='directory to collect the renamed dump files (and checkpoints) into')
90 |     parser.add_argument('-c', '--copy_cp', action='store_true',
91 |                         help='copy checkpoints used to generate the output files')
92 |     parser.add_argument('-i', '--ident', action='store', default='output',
93 |                         help='identifier to look for to determine checkpoints')
94 |     parser.add_argument('-v', '--verbose', action='store_true',
95 |                         help='verbose output')
96 | 
97 |     args = parser.parse_args()
98 |     main(args.basedir, args.targetdir, ident=args.ident, copy_cp=args.copy_cp, verbose=args.verbose)
99 |     exit(0)
100 | 
--------------------------------------------------------------------------------
/scripts/distances.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | import sys
3 | import os
4 | 
5 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib')
6 | sys.path.append(libdir)
7 | import utils
8 | import jdecode
9 | from namediff import Namediff
10 | from cbow import CBOW
11 | 
12 | def main(fname, oname, verbose = True, parallel = True):
13 |     # may need to set special arguments here
14 |     cards = jdecode.mtg_open_file(fname, verbose=verbose)
15 | 
16 |     # this could reasonably be some separate function
17 |     # might make sense to merge cbow and namediff and have this be the main interface
18 |     namediff = Namediff()
19 |     cbow = CBOW()
20 | 
21 |     if verbose:
22 |         print 'Computing nearest names...'
23 |     if parallel:
24 |         nearest_names = namediff.nearest_par(map(lambda c: c.name, cards), n=1)
25 |     else:
26 |         nearest_names = [namediff.nearest(c.name, n=1) for c in cards]
27 | 
28 |     if verbose:
29 |         print 'Computing nearest cards...'
30 | if parallel: 31 | nearest_cards = cbow.nearest_par(cards, n=1) 32 | else: 33 | nearest_cards = [cbow.nearest(c, n=1) for c in cards] 34 | 35 | for i in range(0, len(cards)): 36 | cards[i].nearest_names = nearest_names[i] 37 | cards[i].nearest_cards = nearest_cards[i] 38 | 39 | # # unfortunately this takes ~30 hours on 8 cores for a 10MB dump 40 | # if verbose: 41 | # print 'Computing nearest encodings by text edit distance...' 42 | # if parallel: 43 | # nearest_cards_text = namediff.nearest_card_par(cards, n=1) 44 | # else: 45 | # nearest_cards_text = [namediff.nearest_card(c, n=1) for c in cards] 46 | 47 | if verbose: 48 | print '...Done.' 49 | 50 | # write to a file to store the data, this is a terribly long computation 51 | # we could also just store this same info in the cards themselves as more fields... 52 | sep = '|' 53 | with open(oname, 'w') as ofile: 54 | for i in range(0, len(cards)): 55 | card = cards[i] 56 | ostr = str(i) + sep + card.name + sep 57 | ndist, _ = card.nearest_names[0] 58 | ostr += str(ndist) + sep 59 | cdist, _ = card.nearest_cards[0] 60 | ostr += str(cdist) + '\n' 61 | # tdist, _ = nearest_cards_text[i][0] 62 | # ostr += str(tdist) + '\n' 63 | ofile.write(ostr.encode('utf-8')) 64 | 65 | if __name__ == '__main__': 66 | 67 | import argparse 68 | parser = argparse.ArgumentParser() 69 | 70 | parser.add_argument('infile', #nargs='?'. default=None, 71 | help='encoded card file or json corpus to process') 72 | parser.add_argument('outfile', #nargs='?', default=None, 73 | help='name of output file, will be overwritten') 74 | parser.add_argument('-v', '--verbose', action='store_true', 75 | help='verbose output') 76 | parser.add_argument('-p', '--parallel', action='store_true', 77 | help='run in parallel on all cores') 78 | 79 | args = parser.parse_args() 80 | main(args.infile, args.outfile, verbose=args.verbose, parallel=args.parallel) 81 | exit(0) 82 | -------------------------------------------------------------------------------- /scripts/keydiff.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | def parse_keyfile(f, d, constructor = lambda x: x): 4 | for line in f: 5 | kv = map(lambda s: s.strip(), line.split(':')) 6 | if not len(kv) == 2: 7 | continue 8 | d[kv[0]] = constructor(kv[1]) 9 | 10 | def merge_dicts(d1, d2): 11 | d = {} 12 | for k in d1: 13 | d[k] = (d1[k], d2[k] if k in d2 else None) 14 | for k in d2: 15 | if not k in d: 16 | d[k] = (None, d2[k]) 17 | return d 18 | 19 | def main(fname1, fname2, verbose = True): 20 | if verbose: 21 | print 'opening ' + fname1 + ' as base key/value store' 22 | print 'opening ' + fname2 + ' as target key/value store' 23 | 24 | d1 = {} 25 | d2 = {} 26 | with open(fname1, 'rt') as f1: 27 | parse_keyfile(f1, d1, int) 28 | with open(fname2, 'rt') as f2: 29 | parse_keyfile(f2, d2, int) 30 | 31 | tot1 = sum(d1.values()) 32 | tot2 = sum(d2.values()) 33 | 34 | if verbose: 35 | print ' ' + fname1 + ': ' + str(len(d1)) + ', total ' + str(tot1) 36 | print ' ' + fname2 + ': ' + str(len(d2)) + ', total ' + str(tot2) 37 | 38 | d_merged = merge_dicts(d1, d2) 39 | 40 | ratios = {} 41 | only_1 = {} 42 | only_2 = {} 43 | for k in d_merged: 44 | (v1, v2) = d_merged[k] 45 | if v1 is None: 46 | only_2[k] = v2 47 | elif v2 is None: 48 | only_1[k] = v1 49 | else: 50 | ratios[k] = float(v2 * tot1) / float(v1 * tot2) 51 | 52 | print 'shared: ' + str(len(ratios)) 53 | for k in sorted(ratios, lambda x,y: cmp(d2[x], d2[y]), reverse=True): 54 | print ' ' + k + ': ' + str(d2[k]) + 
'/' + str(d1[k]) + ' (' + str(ratios[k]) + ')' 55 | print '' 56 | 57 | print '1 only: ' + str(len(only_1)) 58 | for k in sorted(only_1, lambda x,y: cmp(d1[x], d1[y]), reverse=True): 59 | print ' ' + k + ': ' + str(d1[k]) 60 | print '' 61 | 62 | print '2 only: ' + str(len(only_2)) 63 | for k in sorted(only_2, lambda x,y: cmp(d2[x], d2[y]), reverse=True): 64 | print ' ' + k + ': ' + str(d2[k]) 65 | print '' 66 | 67 | if __name__ == '__main__': 68 | 69 | import argparse 70 | parser = argparse.ArgumentParser() 71 | 72 | parser.add_argument('file1', #nargs='?'. default=None, 73 | help='base key file to diff against') 74 | parser.add_argument('file2', nargs='?', default=None, 75 | help='other file to compare against the baseline') 76 | parser.add_argument('-v', '--verbose', action='store_true', 77 | help='verbose output') 78 | 79 | args = parser.parse_args() 80 | main(args.file1, args.file2, verbose=args.verbose) 81 | exit(0) 82 | -------------------------------------------------------------------------------- /scripts/mtg_validate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import re 5 | from collections import OrderedDict 6 | 7 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 8 | sys.path.append(libdir) 9 | import utils 10 | import jdecode 11 | 12 | datadir = os.path.realpath(os.path.join(libdir, '../data')) 13 | gramdir = os.path.join(datadir, 'ngrams') 14 | compute_ngrams = False 15 | gramdicts = {} 16 | if os.path.isdir(gramdir): 17 | import keydiff 18 | compute_ngrams = True 19 | for fname in os.listdir(gramdir): 20 | suffixes = re.findall(r'\.[0-9]*g$', fname) 21 | if suffixes: 22 | grams = int(suffixes[0][1:-1]) 23 | d = {} 24 | with open(os.path.join(gramdir, fname), 'rt') as f: 25 | keydiff.parse_keyfile(f, d, int) 26 | gramdicts[grams] = d 27 | 28 | def rare_grams(card, thresh = 2, grams = 2): 29 | if not grams in gramdicts: 30 | return None 31 | rares = 0 32 | gramdict = gramdicts[grams] 33 | for line in card.text_lines_words: 34 | for i in range(0, len(line) - (grams - 1)): 35 | ngram = ' '.join([line[i + j] for j in range(0, grams)]) 36 | if ngram in gramdict: 37 | if gramdict[ngram] < thresh: 38 | rares += 1 39 | else: 40 | rares += 1 41 | return rares 42 | 43 | def list_only(l, items): 44 | for e in l: 45 | if not e in items: 46 | return False 47 | return True 48 | 49 | def pct(x, total): 50 | pctstr = 100.0 * float(x) / float(total) 51 | return '(' + str(pctstr)[:5] + '%)' 52 | 53 | def check_types(card): 54 | if 'instant' in card.types: 55 | return list_only(card.types, ['tribal', 'instant']) 56 | if 'sorcery' in card.types: 57 | return list_only(card.types, ['tribal', 'sorcery']) 58 | if 'creature' in card.types: 59 | return list_only(card.types, ['tribal', 'creature', 'artifact', 'land', 'enchantment']) 60 | if 'planeswalker' in card.types: 61 | return list_only(card.types, ['tribal', 'planeswalker', 'artifact', 'land', 'enchantment']) 62 | else: 63 | return list_only(card.types, ['tribal', 'artifact', 'land', 'enchantment']) 64 | 65 | def check_pt(card): 66 | if ('creature' in card.types or 'vehicle' in card.subtypes) or card.pt: 67 | return ((('creature' in card.types or 'vehicle' in card.subtypes) and len(re.findall(re.escape('/'), card.pt)) == 1) 68 | and not card.loyalty) 69 | if 'planeswalker' in card.types or card.loyalty: 70 | return (('planeswalker' in card.types and card.loyalty) 71 | and not card.pt) 72 | return None 73 | 74 | def 
check_lands(card):
75 |     if 'land' in card.types:
76 |         return card.cost.format() == '_NOCOST_'
77 |     else:
78 |         return None
79 | 
80 | # doesn't handle granted activated abilities in ""
81 | def check_X(card):
82 |     correct = None
83 |     incost = 'X' in card.cost.encode()
84 |     extra_cost_lines = 0
85 |     cost_lines = 0
86 |     use_lines = 0
87 |     for mt in card.text_lines:
88 |         sides = mt.text.split(':')
89 |         if len(sides) == 2:
90 |             actcosts = len(re.findall(re.escape(utils.reserved_mana_marker), sides[0]))
91 |             lcosts = mt.costs[:actcosts]
92 |             rcosts = mt.costs[actcosts:]
93 |             if 'X' in sides[0] or (utils.reserved_mana_marker in sides[0] and
94 |                                    'X' in ''.join(map(lambda c: c.encode(), lcosts))):
95 | 
96 |                 if incost:
97 |                     return False # bad, duplicated Xs in costs
98 | 
99 |                 if 'X' in sides[1] or (utils.reserved_mana_marker in sides[1] and
100 |                                        'X' in ''.join(map(lambda c: c.encode(), rcosts))):
101 |                     correct = True # good, defined X is either specified or used
102 |                     if 'monstrosity' in sides[1]:
103 |                         extra_cost_lines += 1
104 |                     continue
105 |                 elif 'remove X % counters' in sides[0] and 'each counter removed' in sides[1]:
106 |                     correct = True # Blademane Baku
107 |                     continue
108 |                 elif 'note' in sides[1]:
109 |                     correct = True # Ice Cauldron
110 |                     continue
111 |                 else:
112 |                     return False # bad, defined X is unused
113 | 
114 |         # we've checked all cases where an X occurs in an activation cost
115 |         linetext = mt.encode()
116 |         intext = len(re.findall(r'X', linetext))
117 |         defs = (len(re.findall(r'X is', linetext))
118 |                 + len(re.findall(re.escape('pay {X'), linetext))
119 |                 + len(re.findall(re.escape('pay X'), linetext))
120 |                 + len(re.findall(re.escape('reveal X'), linetext))
121 |                 + len(re.findall(re.escape('may tap X'), linetext)))
122 | 
123 |         if incost:
124 |             if intext:
125 |                 correct = True # defined and used or specified in some way
126 |         elif intext > 0:
127 |             if intext > 1 and defs > 0:
128 |                 correct = True # look for multiples
129 |             elif 'suspend' in linetext or 'bloodthirst' in linetext:
130 |                 correct = True # special case keywords
131 |             elif 'reinforce' in linetext and intext > 2:
132 |                 correct = True # this should work
133 |             elif 'contain {X' in linetext or 'with {X' in linetext:
134 |                 correct = True
135 | 
136 |             elif ('additional cost' in linetext
137 |                   or 'morph' in linetext
138 |                   or 'kicker' in linetext):
139 |                 cost_lines += 1
140 |             else:
141 |                 use_lines += 1
142 | 
143 |     if incost and not correct:
144 |         if 'sunburst' in card.text.text or 'spent to cast' in card.text.text:
145 |             return True # Engineered Explosives, Skyrider Elf
146 |         return False # otherwise we should have seen X somewhere if it was in the cost
147 | 
148 |     elif cost_lines > 0 or use_lines > 0:
149 |         if (cost_lines + extra_cost_lines) == 1 and use_lines > 0:
150 |             return True # dreams, etc.
151 |     else:
152 |         return False
153 | 
154 |     return correct
155 | 
156 | def check_kicker(card):
157 |     # also lazy and simple
158 |     if 'kicker' in card.text.text or 'kicked' in card.text.text:
159 |         # could also check for costs, at least make 'it's $ kicker,' not count as a kicker ability
160 |         newtext = card.text.text.replace(utils.reserved_mana_marker + ' kicker', '')
161 |         return 'kicker' in newtext and 'kicked' in newtext
162 |     else:
163 |         return None
164 | 
165 | def check_counters(card):
166 |     uses = len(re.findall(re.escape(utils.counter_marker), card.text.text))
167 |     if uses > 0:
168 |         return uses > 1 and 'countertype ' + utils.counter_marker in card.text.text
169 |     else:
170 |         return None
171 | 
172 | def check_choices(card):
173 |     bullets = len(re.findall(re.escape(utils.bullet_marker), card.text.text))
174 |     obracks = len(re.findall(re.escape(utils.choice_open_delimiter), card.text.text))
175 |     cbracks = len(re.findall(re.escape(utils.choice_close_delimiter), card.text.text))
176 |     if bullets + obracks + cbracks > 0:
177 |         if not (obracks == cbracks and bullets > 0):
178 |             return False
179 |         # could compile ahead of time
180 |         choice_regex = (re.escape(utils.choice_open_delimiter) + re.escape(utils.unary_marker)
181 |                         + r'.*' + re.escape(utils.bullet_marker) + r'.*'
182 |                         + re.escape(utils.choice_close_delimiter))
183 |         nochoices = re.sub(choice_regex, '', card.text.text)
184 |         nobullets = len(re.findall(re.escape(utils.bullet_marker), nochoices))
185 |         noobracks = len(re.findall(re.escape(utils.choice_open_delimiter), nochoices))
186 |         nocbracks = len(re.findall(re.escape(utils.choice_close_delimiter), nochoices))
187 |         return nobullets + noobracks + nocbracks == 0
188 |     else:
189 |         return None
190 | 
191 | def check_auras(card):
192 |     # a bit loose
193 |     if 'enchantment' in card.types or 'aura' in card.subtypes or 'enchant' in card.text.text:
194 |         return 'enchantment' in card.types or 'aura' in card.subtypes or 'enchant' in card.text.text
195 |     else:
196 |         return None
197 | 
198 | def check_equipment(card):
199 |     # probably even looser, could check for actual equip abilities and noncreatureness
200 |     if 'equipment' in card.subtypes:
201 |         return 'equip' in card.text.text
202 |     else:
203 |         return None
204 | 
205 | def check_vehicles(card):
206 |     if 'vehicle' in card.subtypes:
207 |         return 'crew' in card.text.text
208 |     else:
209 |         return None
210 | 
211 | def check_planeswalkers(card):
212 |     if 'planeswalker' in card.types:
213 |         good_lines = 0
214 |         bad_lines = 0
215 |         initial_re = r'^[+-]?'
+ re.escape(utils.unary_marker) + re.escape(utils.unary_counter) + '*:' 216 | initial_re_X = r'^[-+]' + re.escape(utils.x_marker) + '+:' 217 | for line in card.text_lines: 218 | if len(re.findall(initial_re, line.text)) == 1: 219 | good_lines += 1 220 | elif len(re.findall(initial_re_X, line.text)) == 1: 221 | good_lines += 1 222 | elif 'can be your commander' in line.text: 223 | pass 224 | elif 'countertype' in line.text or 'transform' in line.text: 225 | pass 226 | else: 227 | bad_lines += 1 228 | return good_lines > 1 and bad_lines == 0 229 | else: 230 | return None 231 | 232 | def check_levelup(card): 233 | if 'level' in card.text.text: 234 | uplines = 0 235 | llines = 0 236 | for line in card.text_lines: 237 | if 'countertype ' + utils.counter_marker + ' level' in line.text: 238 | uplines += 1 239 | llines += 1 240 | elif 'with level up' in line.text: 241 | llines += 1 242 | elif 'level up' in line.text: 243 | uplines += 1 244 | elif 'level' in line.text: 245 | llines += 1 246 | return uplines == 1 and llines > 0 247 | else: 248 | return None 249 | 250 | def check_activated(card): 251 | activated = 0 252 | for line in card.text_lines: 253 | if '.' in line.text: 254 | subtext = re.sub(r'"[^"]*"', '', line.text) 255 | if 'forecast' in subtext: 256 | pass 257 | elif 'return ' + utils.this_marker + ' from your graveyard' in subtext: 258 | pass 259 | elif 'on the stack' in subtext: 260 | pass 261 | elif ':' in subtext: 262 | activated += 1 263 | if activated > 0: 264 | return list_only(card.types, ['creature', 'land', 'artifact', 'enchantment', 'planeswalker', 'tribal']) 265 | else: 266 | return None 267 | 268 | def check_triggered(card): 269 | triggered = 0 270 | triggered_2 = 0 271 | for line in card.text_lines: 272 | if 'when ' + utils.this_marker + ' enters the battlefield' in line.text: 273 | triggered += 1 274 | if 'when ' + utils.this_marker + ' leaves the battlefield' in line.text: 275 | triggered += 1 276 | if 'when ' + utils.this_marker + ' dies' in line.text: 277 | triggered += 1 278 | elif 'at the beginning' == line.text[:16] or 'when' == line.text[:4]: 279 | if 'from your graveyard' in line.text: 280 | triggered_2 += 1 281 | elif 'in your graveyard' in line.text: 282 | triggered_2 += 1 283 | elif 'if ' + utils.this_marker + ' is suspended' in line.text: 284 | triggered_2 += 1 285 | elif 'if that card is exiled' in line.text or 'if ' + utils.this_marker + ' is exiled' in line.text: 286 | triggered_2 += 1 287 | elif 'when the creature ' + utils.this_marker + ' haunts' in line.text: 288 | triggered_2 += 1 289 | elif 'when you cycle ' + utils.this_marker in line.text or 'when you cast ' + utils.this_marker in line.text: 290 | triggered_2 += 1 291 | elif 'this turn' in line.text or 'this combat' in line.text or 'your next upkeep' in line.text: 292 | triggered_2 += 1 293 | elif 'from your library' in line.text: 294 | triggered_2 += 1 295 | elif 'you discard ' + utils.this_marker in line.text or 'you to discard ' + utils.this_marker in line.text: 296 | triggered_2 += 1 297 | else: 298 | triggered += 1 299 | 300 | if triggered > 0: 301 | return list_only(card.types, ['creature', 'land', 'artifact', 'enchantment', 'planeswalker', 'tribal']) 302 | elif triggered_2: 303 | return True 304 | else: 305 | return None 306 | 307 | def check_chosen(card): 308 | if 'chosen' in card.text.text: 309 | return ('choose' in card.text.text 310 | or 'chosen at random' in card.text.text 311 | or 'name' in card.text.text 312 | or 'is chosen' in card.text.text 313 | or 'search' in card.text.text) 314 | 
else: 315 | return None 316 | 317 | def check_shuffle(card): 318 | retval = None 319 | # sadly, this does not detect spurious shuffling 320 | for line in card.text_lines: 321 | if 'search' in line.text and 'library' in line.text: 322 | thisval = ('shuffle' in line.text 323 | or 'searches' in line.text 324 | or 'searched' in line.text 325 | or 'searching' in line.text 326 | or 'rest' in line.text 327 | or 'instead' in line.text) 328 | if retval is None: 329 | retval = thisval 330 | else: 331 | retval = retval and thisval 332 | return retval 333 | 334 | def check_quotes(card): 335 | retval = None 336 | for line in card.text_lines: 337 | quotes = len(re.findall(re.escape('"'), line.text)) 338 | # HACK: the '" pattern in the training set is actually incorrect 339 | quotes += len(re.findall(re.escape('\'"'), line.text)) 340 | if quotes > 0: 341 | thisval = quotes % 2 == 0 342 | if retval is None: 343 | retval = thisval 344 | else: 345 | retval = retval and thisval 346 | return retval 347 | 348 | props = OrderedDict([ 349 | ('types', check_types), 350 | ('pt', check_pt), 351 | ('lands', check_lands), 352 | ('X', check_X), 353 | ('kicker', check_kicker), 354 | ('counters', check_counters), 355 | ('choices', check_choices), 356 | ('quotes', check_quotes), 357 | ('auras', check_auras), 358 | ('equipment', check_equipment), 359 | ('vehicles', check_vehicles), 360 | ('planeswalkers', check_planeswalkers), 361 | ('levelup', check_levelup), 362 | ('chosen', check_chosen), 363 | ('shuffle', check_shuffle), 364 | ('activated', check_activated), 365 | ('triggered', check_triggered), 366 | ]) 367 | 368 | def process_props(cards, dump = False, uncovered = False): 369 | total_all = 0 370 | total_good = 0 371 | total_bad = 0 372 | total_uncovered = 0 373 | values = OrderedDict([(k, (0,0,0)) for k in props]) 374 | 375 | for card in cards: 376 | total_all += 1 377 | overall = True 378 | any_prop = False 379 | for prop in props: 380 | (total, good, bad) = values[prop] 381 | this_prop = props[prop](card) 382 | if not this_prop is None: 383 | total += 1 384 | if not prop == 'types': 385 | any_prop = True 386 | if this_prop: 387 | good += 1 388 | else: 389 | bad += 1 390 | overall = False 391 | if card.name not in ['demonic pact', 'lavaclaw reaches', 392 | "ertai's trickery", 'rumbling aftershocks', # i hate these 393 | ] and dump: 394 | print('---- ' + prop + ' ----') 395 | print(card.encode()) 396 | print(card.format()) 397 | values[prop] = (total, good, bad) 398 | if overall: 399 | total_good += 1 400 | else: 401 | total_bad += 1 402 | if not any_prop: 403 | total_uncovered += 1 404 | if uncovered: 405 | print('---- uncovered ----') 406 | print(card.encode()) 407 | print(card.format()) 408 | 409 | return ((total_all, total_good, total_bad, total_uncovered), 410 | values) 411 | 412 | def main(fname, oname = None, verbose = False, dump = False): 413 | # may need to set special arguments here 414 | cards = jdecode.mtg_open_file(fname, verbose=verbose) 415 | 416 | do_grams = False 417 | 418 | if do_grams: 419 | rg = {} 420 | for card in cards: 421 | g = rare_grams(card, thresh=2, grams=2) 422 | if len(card.text_words) > 0: 423 | g = int(1.0 + (float(g) * 100.0 / float(len(card.text_words)))) 424 | if g in rg: 425 | rg[g] += 1 426 | else: 427 | rg[g] = 1 428 | if g >= 60: 429 | print g 430 | print card.format() 431 | 432 | tot = 0 433 | vmax = sum(rg.values()) 434 | pct90 = None 435 | pct95 = None 436 | pct99 = None 437 | for i in sorted(rg): 438 | print str(i) + ' rare ngrams: ' + str(rg[i]) 439 | tot += rg[i] 440 | 
if pct90 is None and tot >= vmax * 0.90: 441 | pct90 = i 442 | if pct95 is None and tot >= vmax * 0.95: 443 | pct95 = i 444 | if pct99 is None and tot >= vmax * 0.99: 445 | pct99 = i 446 | 447 | print '90% - ' + str(pct90) 448 | print '95% - ' + str(pct95) 449 | print '99% - ' + str(pct99) 450 | 451 | else: 452 | ((total_all, total_good, total_bad, total_uncovered), 453 | values) = process_props(cards, dump=dump) 454 | 455 | # summary 456 | print('-- overall --') 457 | print(' total : ' + str(total_all)) 458 | print(' good : ' + str(total_good) + ' ' + pct(total_good, total_all)) 459 | print(' bad : ' + str(total_bad) + ' ' + pct(total_bad, total_all)) 460 | print(' uncocoverd: ' + str(total_uncovered) + ' ' + pct(total_uncovered, total_all)) 461 | print('----') 462 | 463 | # breakdown 464 | for prop in props: 465 | (total, good, bad) = values[prop] 466 | print(prop + ':') 467 | print(' total: ' + str(total) + ' ' + pct(total, total_all)) 468 | print(' good : ' + str(good) + ' ' + pct(good, total_all)) 469 | print(' bad : ' + str(bad) + ' ' + pct(bad, total_all)) 470 | 471 | 472 | if __name__ == '__main__': 473 | 474 | import argparse 475 | parser = argparse.ArgumentParser() 476 | 477 | parser.add_argument('infile', #nargs='?'. default=None, 478 | help='encoded card file or json corpus to process') 479 | parser.add_argument('outfile', nargs='?', default=None, 480 | help='name of output file, will be overwritten') 481 | parser.add_argument('-v', '--verbose', action='store_true', 482 | help='verbose output') 483 | parser.add_argument('-d', '--dump', action='store_true', 484 | help='print invalid cards') 485 | 486 | args = parser.parse_args() 487 | main(args.infile, args.outfile, verbose=args.verbose, dump=args.dump) 488 | exit(0) 489 | 490 | -------------------------------------------------------------------------------- /scripts/ngrams.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import pickle 5 | 6 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 7 | sys.path.append(libdir) 8 | import jdecode 9 | import nltk_model as model 10 | 11 | def update_ngrams(lines, gramdict, grams): 12 | for line in lines: 13 | for i in range(0, len(line) - (grams - 1)): 14 | ngram = ' '.join([line[i + j] for j in range(0, grams)]) 15 | if ngram in gramdict: 16 | gramdict[ngram] += 1 17 | else: 18 | gramdict[ngram] = 1 19 | 20 | def describe_bins(gramdict, bins): 21 | bins = sorted(bins) 22 | counts = [0 for _ in range(0, len(bins) + 1)] 23 | 24 | for ngram in gramdict: 25 | for i in range(0, len(bins) + 1): 26 | if i < len(bins): 27 | if gramdict[ngram] <= bins[i]: 28 | counts[i] += 1 29 | break 30 | else: 31 | # didn't fit into any of the smaller bins, stick in on the end 32 | counts[-1] += 1 33 | 34 | for i in range(0, len(counts)): 35 | if counts[i] > 0: 36 | print (' ' + (str(bins[i]) if i < len(bins) else str(bins[-1]) + '+') 37 | + ': ' + str(counts[i])) 38 | 39 | def extract_language(cards, separate_lines = True): 40 | if separate_lines: 41 | lang = [line.vectorize() for card in cards for line in card.text_lines] 42 | else: 43 | lang = [card.text.vectorize() for card in cards] 44 | return map(lambda s: s.split(), lang) 45 | 46 | def build_ngram_model(cards, n, separate_lines = True, verbose = False): 47 | if verbose: 48 | print('generating ' + str(n) + '-gram model') 49 | lang = extract_language(cards, separate_lines=separate_lines) 50 | if verbose: 51 | print('found ' + 
str(len(lang)) + ' sentences') 52 | lm = model.NgramModel(n, lang, pad_left=True, pad_right=True) 53 | if verbose: 54 | print(lm) 55 | return lm 56 | 57 | def main(fname, oname, gmin = 2, gmax = 8, nltk = False, sep = False, verbose = False): 58 | # may need to set special arguments here 59 | cards = jdecode.mtg_open_file(fname, verbose=verbose) 60 | gmin = int(gmin) 61 | gmax = int(gmax) 62 | 63 | if nltk: 64 | n = gmin 65 | lm = build_ngram_model(cards, n, separate_lines=sep, verbose=verbose) 66 | if verbose: 67 | teststr = 'when @ enters the battlefield' 68 | print('litmus test: perplexity of ' + repr(teststr)) 69 | print(' ' + str(lm.perplexity(teststr.split()))) 70 | if verbose: 71 | print('pickling module to ' + oname) 72 | with open(oname, 'wb') as f: 73 | pickle.dump(lm, f) 74 | 75 | else: 76 | bins = [1, 2, 3, 10, 30, 100, 300, 1000] 77 | if gmin < 2 or gmax < gmin: 78 | print 'invalid gram sizes: ' + str(gmin) + '-' + str(gmax) 79 | exit(1) 80 | 81 | for grams in range(gmin, gmax+1): 82 | if verbose: 83 | print 'generating ' + str(grams) + '-grams...' 84 | gramdict = {} 85 | for card in cards: 86 | update_ngrams(card.text_lines_words, gramdict, grams) 87 | 88 | oname_full = oname + '.' + str(grams) + 'g' 89 | if verbose: 90 | print(' writing ' + str(len(gramdict)) + ' unique ' + str(grams) 91 | + '-grams to ' + oname_full) 92 | describe_bins(gramdict, bins) 93 | 94 | with open(oname_full, 'wt') as f: 95 | for ngram in sorted(gramdict, 96 | lambda x,y: cmp(gramdict[x], gramdict[y]), 97 | reverse = True): 98 | f.write((ngram + ': ' + str(gramdict[ngram]) + '\n').encode('utf-8')) 99 | 100 | if __name__ == '__main__': 101 | 102 | import argparse 103 | parser = argparse.ArgumentParser() 104 | 105 | parser.add_argument('infile', #nargs='?'. default=None, 106 | help='encoded card file or json corpus to process') 107 | parser.add_argument('outfile', #nargs='?', default=None, 108 | help='base name of output file, outputs ending in .2g, .3g etc. 
will be produced') 109 | parser.add_argument('-min', '--min', action='store', default='2', 110 | help='minimum gram size to compute') 111 | parser.add_argument('-max', '--max', action='store', default='8', 112 | help='maximum gram size to compute') 113 | parser.add_argument('-nltk', '--nltk', action='store_true', 114 | help='use nltk model.NgramModel, with n = min') 115 | parser.add_argument('-s', '--separate', action='store_true', 116 | help='separate card text into lines when constructing nltk model') 117 | parser.add_argument('-v', '--verbose', action='store_true', 118 | help='verbose output') 119 | 120 | args = parser.parse_args() 121 | main(args.infile, args.outfile, gmin=args.min, gmax=args.max, nltk=args.nltk, 122 | sep=args.separate, verbose=args.verbose) 123 | exit(0) 124 | -------------------------------------------------------------------------------- /scripts/pairing.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import random 5 | import zipfile 6 | import shutil 7 | 8 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 9 | sys.path.append(libdir) 10 | datadir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../data') 11 | import utils 12 | import jdecode 13 | import ngrams 14 | import analysis 15 | import mtg_validate 16 | 17 | from cbow import CBOW 18 | 19 | separate_lines=True 20 | 21 | def select_card(cards, stats, i): 22 | card = cards[i] 23 | nearest = stats['dists']['cbow'][i] 24 | perp = stats['ngram']['perp'][i] 25 | perp_per = stats['ngram']['perp_per'][i] 26 | perp_max = stats['ngram']['perp_max'][i] 27 | 28 | if nearest > 0.9 or perp_per > 2.0 or perp_max > 10.0: 29 | return None 30 | 31 | ((_, total_good, _, _), _) = mtg_validate.process_props([card]) 32 | if not total_good == 1: 33 | return False 34 | 35 | # print '====' 36 | # print nearest 37 | # print perp 38 | # print perp_per 39 | # print perp_max 40 | # print '----' 41 | # print card.format() 42 | 43 | return True 44 | 45 | def compare_to_real(card, realcard): 46 | ctypes = ' '.join(sorted(card.types)) 47 | rtypes = ' '.join(sorted(realcard.types)) 48 | return ctypes == rtypes and realcard.cost.check_colors(card.cost.get_colors()) 49 | 50 | def writecard(card, name, writer): 51 | gatherer = False 52 | for_forum = True 53 | vdump = True 54 | fmt = card.format(gatherer = gatherer, for_forum = for_forum, vdump = vdump) 55 | oldname = card.name 56 | # alter name used in image 57 | card.name = name 58 | writer.write(card.to_mse().encode('utf-8')) 59 | card.name = oldname 60 | fstring = '' 61 | if card.json: 62 | fstring += 'JSON:\n' + card.json + '\n' 63 | if card.raw: 64 | fstring += 'raw:\n' + card.raw + '\n' 65 | fstring += '\n' 66 | fstring += fmt + '\n' 67 | fstring = fstring.replace('<', '(').replace('>', ')') 68 | writer.write(('\n' + fstring[:-1]).replace('\n', '\n\t\t').encode('utf-8')) 69 | writer.write('\n'.encode('utf-8')) 70 | 71 | def main(fname, oname, n=20, verbose=False): 72 | cbow = CBOW() 73 | realcards = jdecode.mtg_open_file(str(os.path.join(datadir, 'output.txt')), verbose=verbose) 74 | real_by_name = {c.name: c for c in realcards} 75 | lm = ngrams.build_ngram_model(realcards, 3, separate_lines=separate_lines, verbose=verbose) 76 | cards = jdecode.mtg_open_file(fname, verbose=verbose) 77 | stats = analysis.get_statistics(fname, lm=lm, sep=separate_lines, verbose=verbose) 78 | 79 | selected = [] 80 | for i in range(0, len(cards)): 81 | if select_card(cards, 
stats, i): 82 | selected += [(i, cards[i])] 83 | 84 | limit = 3000 85 | 86 | random.shuffle(selected) 87 | #selected = selected[:limit] 88 | 89 | if verbose: 90 | print('computing nearest cards for ' + str(len(selected)) + ' candindates...') 91 | cbow_nearest = cbow.nearest_par(map(lambda (i, c): c, selected)) 92 | for i in range(0, len(selected)): 93 | (j, card) = selected[i] 94 | selected[i] = (j, card, cbow_nearest[i]) 95 | if verbose: 96 | print('...done') 97 | 98 | final = [] 99 | for (i, card, nearest) in selected: 100 | for dist, rname in nearest: 101 | realcard = real_by_name[rname] 102 | if compare_to_real(card, realcard): 103 | final += [(i, card, realcard, dist)] 104 | break 105 | 106 | for (i, card, realcard, dist) in final: 107 | print '-- real --' 108 | print realcard.format() 109 | print '-- fake --' 110 | print card.format() 111 | print '-- stats --' 112 | perp_per = stats['ngram']['perp_per'][i] 113 | perp_max = stats['ngram']['perp_max'][i] 114 | print dist 115 | print perp_per 116 | print perp_max 117 | print '----' 118 | 119 | if not oname is None: 120 | with open(oname, 'wt') as ofile: 121 | ofile.write(utils.mse_prepend) 122 | for (i, card, realcard, dist) in final: 123 | name = realcard.name 124 | writecard(realcard, name, ofile) 125 | writecard(card, name, ofile) 126 | ofile.write('version control:\n\ttype: none\napprentice code: ') 127 | # Copy whatever output file is produced, name the copy 'set' (yes, no extension). 128 | if os.path.isfile('set'): 129 | print 'ERROR: tried to overwrite existing file "set" - aborting.' 130 | return 131 | shutil.copyfile(oname, 'set') 132 | # Use the freaky mse extension instead of zip. 133 | with zipfile.ZipFile(oname+'.mse-set', mode='w') as zf: 134 | try: 135 | # Zip up the set file into oname.mse-set. 136 | zf.write('set') 137 | finally: 138 | if verbose: 139 | print 'Made an MSE set file called ' + oname + '.mse-set.' 140 | # The set file is useless outside the .mse-set, delete it. 141 | os.remove('set') 142 | 143 | if __name__ == '__main__': 144 | 145 | import argparse 146 | parser = argparse.ArgumentParser() 147 | 148 | parser.add_argument('infile', #nargs='?'. 
default=None, 149 | help='encoded card file or json corpus to process') 150 | parser.add_argument('outfile', nargs='?', default=None, 151 | help='output file, defaults to none') 152 | parser.add_argument('-n', '--n', action='store', 153 | help='number of cards to consider for each pairing') 154 | parser.add_argument('-v', '--verbose', action='store_true', 155 | help='verbose output') 156 | 157 | args = parser.parse_args() 158 | main(args.infile, args.outfile, n=args.n, verbose=args.verbose) 159 | exit(0) 160 | -------------------------------------------------------------------------------- /scripts/sanity.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | import re 5 | import json 6 | 7 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 8 | sys.path.append(libdir) 9 | import utils 10 | import jdecode 11 | import cardlib 12 | import transforms 13 | 14 | def check_lines(fname): 15 | cards = jdecode.mtg_open_file(fname, verbose=True, linetrans=True) 16 | 17 | prelines = set() 18 | keylines = set() 19 | mainlines = set() 20 | costlines = set() 21 | postlines = set() 22 | 23 | known = ['enchant ', 'equip', 'countertype', 'multikicker', 'kicker', 24 | 'suspend', 'echo', 'awaken', 'bestow', 'buyback', 25 | 'cumulative', 'dash', 'entwine', 'evoke', 'fortify', 26 | 'flashback', 'madness', 'morph', 'megamorph', 'miracle', 'ninjutsu', 27 | 'overload', 'prowl', 'recover', 'reinforce', 'replicate', 'scavenge', 28 | 'splice', 'surge', 'unearth', 'transfigure', 'transmute', 29 | ] 30 | known = [] 31 | 32 | for card in cards: 33 | prel, keyl, mainl, costl, postl = transforms.separate_lines(card.text.encode(randomize=False)) 34 | if card.bside: 35 | prel2, keyl2, mainl2, costl2, postl2 = transforms.separate_lines(card.bside.text.encode(randomize=False)) 36 | prel += prel2 37 | keyl += keyl2 38 | mainl += mainl2 39 | costl += costl2 40 | postl += postl2 41 | 42 | for line in prel: 43 | if line.strip() == '': 44 | print(card.name, card.text.text) 45 | if any(line.startswith(s) for s in known): 46 | line = 'known' 47 | prelines.add(line) 48 | for line in postl: 49 | if line.strip() == '': 50 | print(card.name, card.text.text) 51 | if any(line.startswith(s) for s in known): 52 | line = 'known' 53 | postlines.add(line) 54 | for line in keyl: 55 | if line.strip() == '': 56 | print(card.name, card.text.text) 57 | if any(line.startswith(s) for s in known): 58 | line = 'known' 59 | keylines.add(line) 60 | for line in mainl: 61 | if line.strip() == '': 62 | print(card.name, card.text.text) 63 | # if any(line.startswith(s) for s in known): 64 | # line = 'known' 65 | mainlines.add(line) 66 | for line in costl: 67 | if line.strip() == '': 68 | print(card.name, card.text.text) 69 | # if any(line.startswith(s) for s in known) or 'cycling' in line or 'monstrosity' in line: 70 | # line = 'known' 71 | costlines.add(line) 72 | 73 | print('prel: {:d}, keyl: {:d}, mainl: {:d}, postl {:d}' 74 | .format(len(prelines), len(keylines), len(mainlines), len(postlines))) 75 | 76 | print('\nprelines') 77 | for line in sorted(prelines): 78 | print(line) 79 | 80 | print('\npostlines') 81 | for line in sorted(postlines): 82 | print(line) 83 | 84 | print('\ncostlines') 85 | for line in sorted(costlines): 86 | print(line) 87 | 88 | print('\nkeylines') 89 | for line in sorted(keylines): 90 | print(line) 91 | 92 | print('\nmainlines') 93 | for line in sorted(mainlines): 94 | #if any(s in line for s in ['champion', 'devour', 
'tribute']): 95 | print(line) 96 | 97 | def check_vocab(fname): 98 | cards = jdecode.mtg_open_file(fname, verbose=True, linetrans=True) 99 | 100 | vocab = {} 101 | for card in cards: 102 | words = card.text.vectorize().split() 103 | if card.bside: 104 | words += card.bside.text.vectorize().split() 105 | for word in words: 106 | if not word in vocab: 107 | vocab[word] = 1 108 | else: 109 | vocab[word] += 1 110 | 111 | for word in sorted(vocab, lambda x,y: cmp(vocab[x], vocab[y]), reverse = True): 112 | print('{:8d} : {:s}'.format(vocab[word], word)) 113 | 114 | n = 3 115 | 116 | for card in cards: 117 | words = card.text.vectorize().split() 118 | if card.bside: 119 | words += card.bside.text.vectorize().split() 120 | for word in words: 121 | if vocab[word] <= n: 122 | #if 'name' in word: 123 | print('\n{:8d} : {:s}'.format(vocab[word], word)) 124 | print(card.encode()) 125 | break 126 | 127 | def check_characters(fname, vname): 128 | cards = jdecode.mtg_open_file(fname, verbose=True, linetrans=True) 129 | 130 | tokens = {c for c in utils.cardsep} 131 | for card in cards: 132 | for c in card.encode(): 133 | tokens.add(c) 134 | 135 | token_to_idx = {tok:i+1 for i, tok in enumerate(sorted(tokens))} 136 | idx_to_token = {i+1:tok for i, tok in enumerate(sorted(tokens))} 137 | 138 | print('Vocabulary: ({:d} symbols)'.format(len(token_to_idx))) 139 | for token in sorted(token_to_idx): 140 | print('{:8s} : {:4d}'.format(repr(token), token_to_idx[token])) 141 | 142 | # compliant with torch-rnn 143 | if vname: 144 | json_data = {'token_to_idx':token_to_idx, 'idx_to_token':idx_to_token} 145 | print('writing vocabulary to {:s}'.format(vname)) 146 | with open(vname, 'w') as f: 147 | json.dump(json_data, f) 148 | 149 | if __name__ == '__main__': 150 | import argparse 151 | parser = argparse.ArgumentParser() 152 | 153 | parser.add_argument('infile', nargs='?', default=os.path.join(libdir, '../data/output.txt'), 154 | help='encoded card file or json corpus to process') 155 | parser.add_argument('-lines', action='store_true', 156 | help='show behavior of line separation') 157 | parser.add_argument('-vocab', action='store_true', 158 | help='show vocabulary counts from encoded card text') 159 | parser.add_argument('-chars', action='store_true', 160 | help='generate and display vocabulary of characters used in encoding') 161 | parser.add_argument('--vocab_name', default=None, 162 | help='json file to write vocabulary to') 163 | args = parser.parse_args() 164 | 165 | if args.lines: 166 | check_lines(args.infile) 167 | if args.vocab: 168 | check_vocab(args.infile) 169 | if args.chars: 170 | check_characters(args.infile, args.vocab_name) 171 | 172 | exit(0) 173 | -------------------------------------------------------------------------------- /scripts/streamcards.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # -- STOLEN FROM torch-rnn/scripts/streamfile.py -- # 4 | 5 | import os 6 | import threading 7 | import time 8 | import signal 9 | import traceback 10 | import psutil 11 | 12 | # correctly setting up a stream that won't get orphaned and left clutting the operating 13 | # system proceeds in 3 parts: 14 | # 1) invoke install_suicide_handlers() to ensure correct behavior on interrupt 15 | # 2) get threads by invoking spawn_stream_threads 16 | # 3) invoke wait_and_kill_self_noreturn(threads) 17 | # or, use the handy wrapper that does it for you 18 | 19 | def spawn_stream_threads(fds, runthread, mkargs): 20 | threads = [] 21 | for i, fd in 
enumerate(fds): 22 | stream_thread = threading.Thread(target=runthread, args=mkargs(i, fd)) 23 | stream_thread.daemon = True 24 | stream_thread.start() 25 | threads.append(stream_thread) 26 | return threads 27 | 28 | def force_kill_self_noreturn(): 29 | # We have a strange issue here, which is that our threads will refuse to die 30 | # to a normal exit() or sys.exit() because they're all blocked in write() calls 31 | # on full pipes; the simplest workaround seems to be to ask the OS to terminate us. 32 | # This kinda works, but... 33 | #os.kill(os.getpid(), signal.SIGTERM) 34 | # psutil might have useful features like checking if the pid has been reused before killing it. 35 | # Also we might have child processes like l2e luajits to think about. 36 | me = psutil.Process(os.getpid()) 37 | for child in me.children(recursive=True): 38 | child.terminate() 39 | me.terminate() 40 | 41 | def handler_kill_self(signum, frame): 42 | if signum != signal.SIGQUIT: 43 | traceback.print_stack(frame) 44 | print('caught signal {:d} - streamer sending SIGTERM to self'.format(signum)) 45 | force_kill_self_noreturn() 46 | 47 | def install_suicide_handlers(): 48 | for sig in [signal.SIGHUP, signal.SIGINT, signal.SIGQUIT]: 49 | signal.signal(sig, handler_kill_self) 50 | 51 | def wait_and_kill_self_noreturn(threads): 52 | running = True 53 | while running: 54 | running = False 55 | for thread in threads: 56 | if thread.is_alive(): 57 | running = True 58 | if(os.getppid() <= 1): 59 | # exit if parent process died (and we were reparented to init) 60 | break 61 | time.sleep(1) 62 | force_kill_self_noreturn() 63 | 64 | def streaming_noreturn(fds, write_stream, mkargs): 65 | install_suicide_handlers() 66 | threads = spawn_stream_threads(fds, write_stream, mkargs) 67 | wait_and_kill_self_noreturn(threads) 68 | assert False, 'should not return from streaming' 69 | 70 | # -- END STOLEN FROM torch-rnn/scripts/streamfile.py -- # 71 | 72 | import sys 73 | import random 74 | 75 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 76 | sys.path.append(libdir) 77 | import utils 78 | import jdecode 79 | import transforms 80 | 81 | def main(args): 82 | fds = args.fds 83 | fname = args.fname 84 | block_size = args.block_size 85 | main_seed = args.seed if args.seed != 0 else None 86 | 87 | # simple default encoding for now, will add more options with the curriculum 88 | # learning feature 89 | 90 | cards = jdecode.mtg_open_file(fname, verbose=True, linetrans=True) 91 | 92 | def write_stream(i, fd): 93 | local_random = random.Random(main_seed) 94 | local_random.jumpahead(i) 95 | local_cards = [card for card in cards] 96 | with open('/proc/self/fd/'+str(fd), 'wt') as f: 97 | while True: 98 | local_random.shuffle(local_cards) 99 | for card in local_cards: 100 | f.write(card.encode(randomize_mana=True, randomize_lines=True)) 101 | f.write(utils.cardsep) 102 | 103 | def mkargs(i, fd): 104 | return i, fd 105 | 106 | streaming_noreturn(fds, write_stream, mkargs) 107 | 108 | if __name__ == '__main__': 109 | import argparse 110 | 111 | parser = argparse.ArgumentParser() 112 | parser.add_argument('fds', type=int, nargs='+', 113 | help='file descriptors to write streams to') 114 | parser.add_argument('-f', '--fname', default=os.path.join(libdir, '../data/output.txt'), 115 | help='file to read cards from') 116 | parser.add_argument('-n', '--block_size', type=int, default=10000, 117 | help='number of characters each stream should read/write at a time') 118 | parser.add_argument('-s', '--seed', type=int, default=0, 
119 | help='random seed') 120 | args = parser.parse_args() 121 | 122 | main(args) 123 | -------------------------------------------------------------------------------- /scripts/sum.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | 5 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 6 | sys.path.append(libdir) 7 | 8 | def main(fname): 9 | with open(fname, 'rt') as f: 10 | text = f.read() 11 | 12 | cardstats = text.split('\n') 13 | nonempty = 0 14 | name_avg = 0 15 | name_dupes = 0 16 | card_avg = 0 17 | card_dupes = 0 18 | 19 | for c in cardstats: 20 | fields = c.split('|') 21 | if len(fields) < 4: 22 | continue 23 | nonempty += 1 24 | idx = int(fields[0]) 25 | name = str(fields[1]) 26 | ndist = float(fields[2]) 27 | cdist = float(fields[3]) 28 | 29 | name_avg += ndist 30 | if ndist == 1.0: 31 | name_dupes += 1 32 | card_avg += cdist 33 | if cdist == 1.0: 34 | card_dupes += 1 35 | 36 | name_avg = name_avg / float(nonempty) 37 | card_avg = card_avg / float(nonempty) 38 | 39 | print str(nonempty) + ' cards' 40 | print '-- names --' 41 | print 'avg distance: ' + str(name_avg) 42 | print 'num duplicates: ' + str(name_dupes) 43 | print '-- cards --' 44 | print 'avg distance: ' + str(card_avg) 45 | print 'num duplicates: ' + str(card_dupes) 46 | print '----' 47 | 48 | if __name__ == '__main__': 49 | 50 | import argparse 51 | parser = argparse.ArgumentParser() 52 | 53 | parser.add_argument('infile', #nargs='?'. default=None, 54 | help='data file to process') 55 | 56 | args = parser.parse_args() 57 | main(args.infile) 58 | exit(0) 59 | -------------------------------------------------------------------------------- /scripts/summarize.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | import os 4 | 5 | libdir = os.path.join(os.path.dirname(os.path.realpath(__file__)), '../lib') 6 | sys.path.append(libdir) 7 | import utils 8 | import jdecode 9 | from datalib import Datamine 10 | 11 | def main(fname, verbose = True, outliers = False, dump_all = False): 12 | if fname[-5:] == '.json': 13 | if verbose: 14 | print 'This looks like a json file: ' + fname 15 | json_srcs = jdecode.mtg_open_json(fname, verbose) 16 | card_srcs = [] 17 | for json_cardname in sorted(json_srcs): 18 | if len(json_srcs[json_cardname]) > 0: 19 | card_srcs += [json_srcs[json_cardname][0]] 20 | else: 21 | if verbose: 22 | print 'Opening encoded card file: ' + fname 23 | with open(fname, 'rt') as f: 24 | text = f.read() 25 | card_srcs = text.split(utils.cardsep) 26 | 27 | mine = Datamine(card_srcs) 28 | mine.summarize() 29 | if outliers or dump_all: 30 | mine.outliers(dump_invalid = dump_all) 31 | 32 | 33 | if __name__ == '__main__': 34 | import argparse 35 | parser = argparse.ArgumentParser() 36 | 37 | parser.add_argument('infile', 38 | help='encoded card file or json corpus to process') 39 | parser.add_argument('-x', '--outliers', action='store_true', 40 | help='show additional diagnostics and edge cases') 41 | parser.add_argument('-a', '--all', action='store_true', 42 | help='show all information and dump invalid cards') 43 | parser.add_argument('-v', '--verbose', action='store_true', 44 | help='verbose output') 45 | 46 | args = parser.parse_args() 47 | main(args.infile, verbose = args.verbose, outliers = args.outliers, dump_all = args.all) 48 | exit(0) 49 | 
-------------------------------------------------------------------------------- /sortcards.py: -------------------------------------------------------------------------------- 1 | import re 2 | import codecs 3 | import sys 4 | from collections import OrderedDict 5 | 6 | # returns back a dictionary mapping the names of classes of cards 7 | # to lists of cards in those classes 8 | def sortcards(cards): 9 | classes = OrderedDict([ 10 | ('Special classes:', None), 11 | ('multicards', []), 12 | ('Inclusive classes:', None), 13 | ('X cards', []), 14 | ('kicker cards', []), 15 | ('counter cards', []), 16 | ('uncast cards', []), 17 | ('choice cards', []), 18 | ('equipment', []), 19 | ('levelers', []), 20 | ('legendary', []), 21 | ('Exclusive classes:', None), 22 | ('planeswalkers', []), 23 | ('lands', []), 24 | ('instants', []), 25 | ('sorceries', []), 26 | ('enchantments', []), 27 | ('noncreature artifacts', []), 28 | ('creatures', []), 29 | ('other', []), 30 | ('By color:', None), 31 | ('white', []), 32 | ('blue', []), 33 | ('black', []), 34 | ('red', []), 35 | ('green', []), 36 | ('colorless nonland', []), 37 | ('colorless land', []), 38 | ('unknown color', []), 39 | ('By number of colors:', None), 40 | ('zero colors', []), 41 | ('one color', []), 42 | ('two colors', []), 43 | ('three colors', []), 44 | ('four colors', []), 45 | ('five colors', []), 46 | ('more colors?', []), 47 | ]) 48 | 49 | for card in cards: 50 | # special classes 51 | if '|\n|' in card: 52 | # better formatting pls??? 53 | classes['multicards'] += [card.replace('|\n|', '|\n~~~~~~~~~~~~~~~~\n|')] 54 | continue 55 | 56 | # inclusive classes 57 | if 'X' in card: 58 | classes['X cards'] += [card] 59 | if 'kick' in card: 60 | classes['kicker cards'] += [card] 61 | if '%' in card or '#' in card: 62 | classes['counter cards'] += [card] 63 | if 'uncast' in card: 64 | classes['uncast cards'] += [card] 65 | if '[' in card or ']' in card or '=' in card: 66 | classes['choice cards'] += [card] 67 | if '|equipment|' in card or 'equip {' in card: 68 | classes['equipment'] += [card] 69 | if 'level up' in card or 'level &' in card: 70 | classes['levelers'] += [card] 71 | if '|legendary|' in card: 72 | classes['legendary'] += [card] 73 | 74 | # exclusive classes 75 | if '|planeswalker|' in card: 76 | classes['planeswalkers'] += [card] 77 | elif '|land|' in card: 78 | classes['lands'] += [card] 79 | elif '|instant|' in card: 80 | classes['instants'] += [card] 81 | elif '|sorcery|' in card: 82 | classes['sorceries'] += [card] 83 | elif '|enchantment|' in card: 84 | classes['enchantments'] += [card] 85 | elif '|artifact|' in card: 86 | classes['noncreature artifacts'] += [card] 87 | elif '|creature|' in card or 'artifact creature' in card: 88 | classes['creatures'] += [card] 89 | else: 90 | classes['other'] += [card] 91 | 92 | # color classes need to find the mana cost 93 | fields = card.split('|') 94 | if len(fields) != 11: 95 | classes['unknown color'] += [card] 96 | else: 97 | cost = fields[8] 98 | color_count = 0 99 | if 'W' in cost or 'U' in cost or 'B' in cost or 'R' in cost or 'G' in cost: 100 | if 'W' in cost: 101 | classes['white'] += [card] 102 | color_count += 1 103 | if 'U' in cost: 104 | classes['blue'] += [card] 105 | color_count += 1 106 | if 'B' in cost: 107 | classes['black'] += [card] 108 | color_count += 1 109 | if 'R' in cost: 110 | classes['red'] += [card] 111 | color_count += 1 112 | if 'G' in cost: 113 | classes['green'] += [card] 114 | color_count += 1 115 | # should be unreachable 116 | if color_count == 0: 117 | 
classes['unknown color'] += [card] 118 | else: 119 | if '|land|' in card: 120 | classes['colorless land'] += [card] 121 | else: 122 | classes['colorless nonland'] += [card] 123 | 124 | if color_count == 0: 125 | classes['zero colors'] += [card] 126 | elif color_count == 1: 127 | classes['one color'] += [card] 128 | elif color_count == 2: 129 | classes['two colors'] += [card] 130 | elif color_count == 3: 131 | classes['three colors'] += [card] 132 | elif color_count == 4: 133 | classes['four colors'] += [card] 134 | elif color_count == 5: 135 | classes['five colors'] += [card] 136 | else: 137 | classes['more colors?'] += [card] 138 | 139 | return classes 140 | 141 | 142 | def main(fname, oname = None, verbose = True): 143 | if verbose: 144 | print 'Opening encoded card file: ' + fname 145 | 146 | f = open(fname, 'r') 147 | text = f.read() 148 | f.close() 149 | 150 | # we get rid of the first and last because they are probably partial 151 | cards = text.split('\n\n')[1:-1] 152 | classes = sortcards(cards) 153 | 154 | if not oname == None: 155 | if verbose: 156 | print 'Writing output to: ' + oname 157 | ofile = codecs.open(oname, 'w', 'utf-8') 158 | 159 | for cardclass in classes: 160 | if classes[cardclass] == None: 161 | print cardclass 162 | else: 163 | print ' ' + cardclass + ': ' + str(len(classes[cardclass])) 164 | 165 | if oname == None: 166 | outputter = sys.stdout 167 | else: 168 | outputter = ofile 169 | 170 | for cardclass in classes: 171 | if classes[cardclass] == None: 172 | outputter.write(cardclass + '\n') 173 | else: 174 | classlen = len(classes[cardclass]) 175 | if classlen > 0: 176 | outputter.write('[spoiler=' + cardclass + ': ' + str(classlen) + ' cards]\n') 177 | for card in classes[cardclass]: 178 | outputter.write(card + '\n\n') 179 | outputter.write('[/spoiler]\n') 180 | 181 | if not oname == None: 182 | ofile.close() 183 | 184 | 185 | if __name__ == '__main__': 186 | import sys 187 | if len(sys.argv) == 2: 188 | main(sys.argv[1]) 189 | elif len(sys.argv) == 3: 190 | main(sys.argv[1], oname = sys.argv[2]) 191 | else: 192 | print 'Usage: ' + sys.argv[0] + ' ' + ' [output filename]' 193 | exit(1) 194 | 195 | --------------------------------------------------------------------------------