├── .gitignore
├── LICENSE
├── README.md
├── msgs
│   └── .gitignore
└── search.py

/.gitignore:
--------------------------------------------------------------------------------
libpff*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2015 Severin Schols

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Outlook PST file search

I had a few old PST files that I needed some information from. Running OS X without Outlook made it difficult to get at their contents.

So I searched the internet and found [libpff](https://github.com/libyal/libpff), which can (sort of) parse PST files and also has Python bindings. I used the packaged source release, because I couldn't get the development version from git to compile. I ran everything on a Debian VPS.

The code here searches the plain-text body of every message for a keyword (case-sensitive). For each match, it writes the message's subject and body to a numbered `.txt` file (`0.txt`, `1.txt`, and so on) in the `msgs` folder.

## Compiling and running
```
apt-get install python-dev build-essential

wget https://da1ba3cfdffc2404250f16d3711dfb32dcd40e96.googledrive.com/host/0B3fBvzttpiiScU9qcG5ScEZKZE0/libpff-experimental-20131028.tar.gz
tar xvzf libpff-experimental-20131028.tar.gz
cd libpff-20131028/
./configure --enable-python
make
make install

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib python search.py /home/severin/my_mails.pst rubberducky
```

This searches the PST file "/home/severin/my_mails.pst" for the term "rubberducky". The `LD_LIBRARY_PATH` prefix lets the dynamic loader find the libpff shared library that `make install` placed in `/usr/local/lib`. Run the script from the repository root, since it writes into the relative `msgs` folder.
--------------------------------------------------------------------------------
/msgs/.gitignore:
--------------------------------------------------------------------------------
*.txt
--------------------------------------------------------------------------------
/search.py:
--------------------------------------------------------------------------------
import os
import sys

import pypff

if len(sys.argv) != 3:
    print "Usage: python search.py <pst_file> <search_term>"
    sys.exit(1)

pst_file = sys.argv[1]
search_term = sys.argv[2]

print "PST file:", pst_file
print "Search term:", search_term

pst = pypff.file()
pst.open(pst_file)

print "Size:", pst.get_size()
print

# Number of matching messages written so far; also used as the output file name.
msg_counter = 0


def search_dir(folder, path):
    """Recursively search every message in `folder` and its subfolders."""
    name = folder.get_display_name()
    if name:
        new_path = path + u"/" + unicode(name)
    else:
        new_path = path

    # Encode explicitly so non-ASCII folder names also print when stdout is a pipe.
    print "Searching", new_path.encode("UTF-8")

    for i in range(folder.get_number_of_sub_messages()):
        msg = folder.get_sub_message(i)
        body = msg.get_plain_text_body()
        # get_plain_text_body() may return None when there is no plain-text body.
        if body is not None and search_term in body:
            write_to_file(msg, body)

    for i in range(folder.get_number_of_sub_folders()):
        search_dir(folder.get_sub_folder(i), new_path)


def write_to_file(msg, body):
    """Write the subject and plain-text body of `msg` to msgs/<n>.txt."""
    global msg_counter
    subject = msg.get_subject() or u""  # a message may have no subject
    # "msgs" is relative to the current working directory.
    with open(os.path.join("msgs", str(msg_counter) + ".txt"), "wb") as f:
        f.write("Subject: ")
        f.write(subject.encode("UTF-8"))
        f.write("\n\n")
        f.write(body.encode("UTF-8"))
    msg_counter += 1


search_dir(pst.get_root_folder(), u"")
pst.close()
--------------------------------------------------------------------------------