├── .gitignore
├── credentials.ini.txt
├── testdata.csv
├── LICENSE
├── README.md
└── post-articles.py


/.gitignore:
--------------------------------------------------------------------------------
*.csv
*.tsv
credentials.ini
*.swp
.kdev4/
*.kdev4
*.kate-swp
*.directory

--------------------------------------------------------------------------------
/credentials.ini.txt:
--------------------------------------------------------------------------------
[DEFAULT]
host =
username =
password =
client_id =
c_secret =
selfsigned =

--------------------------------------------------------------------------------
/testdata.csv:
--------------------------------------------------------------------------------
url,is_read,is_fav
https://support.dnsimple.com/articles/differences-a-cname-records/,1,0
https://www.datenverlust.at/2016/04/01/vertikale-montage-von-festplatten-beguenstigt-datenverlust/,1,0
https://blog.torproject.org/blog/trouble-cloudflare,1,0
http://verygoodsoftwarenotvirus.ru/,1,0
http://www.heise.de/tp/artikel/47/47852/1.html,1,0
http://www.wired.com/2016/04/sculpture-lets-museums-amplify-tors-anonymity-network/,1,0
http://www.berliner-zeitung.de/berlin/eine-lehrstunde-in-sachen-sicherheit-fuer-den-bundesinnenminister-23830694,1,0
https://www.kuketz-blog.de/app-verbindungen-mitschneiden-auf-android-und-ios/,1,0
http://www.einfachbewusst.de/2016/04/keinen-fernseher/,1,0
"http://mobil.fr-online.de/cms/wirtschaft/afd-zu-lasten-der-kleinen-leute,4233346,33912204,view,asFitMl.html?originalReferrer=https://t.co/4EJQoe7R5Z",1,0

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2016 Benedikt Geißler

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# wallabag-migration

This Python 3 script takes a CSV file (formatted like `testdata.csv`) and the credentials from `credentials.ini` and posts the listed articles to the wallabag v2 API.

It is meant as an alternative way to migrate from wallabag v1 to v2 when the
number of articles is too large and the v1 export functionality only delivers
0-byte-sized JSON files.

## Generating CSV with a tool

To get the CSV data you can export a CSV file from your wallabag v1 database
with the help of a tool such as [Adminer](https://www.adminer.org/) and the
SQL command

```sql
SELECT `url`, `is_read`, `is_fav`
FROM `entries`
WHERE `user_id` = 'N'; # replace N by your actual user ID
```

After that you might need to convert the exported file from DOS to Unix format with a command like

```bash
dos2unix sql.csv
```

## Generating CSV with the mysql CLI

You can also export a CSV file directly on the command line with this command:

```bash
export WB_USERID=1

mysql -e"
SELECT url, is_read, is_fav
FROM entries
WHERE user_id = ${WB_USERID}
INTO OUTFILE '/tmp/wallabag-user-${WB_USERID}.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '\"'
LINES TERMINATED BY '\n';" wallabag

echo 'url,is_read,is_fav' > ./wallabag-user-${WB_USERID}.csv
cat /tmp/wallabag-user-${WB_USERID}.csv >> ./wallabag-user-${WB_USERID}.csv
sudo rm /tmp/wallabag-user-${WB_USERID}.csv # We need sudo here because mysqld wrote the file rather than our mysql CLI
```

## Importing

The exported CSV needs to be in the same directory as
`post-articles.py`.
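
Before starting a long import it can help to confirm that the CSV really carries the three expected columns. The following is only a minimal sketch and not part of `post-articles.py`: the helper name `has_required_columns` and the constant `REQUIRED_FIELDS` are made up for illustration, and the header is parsed with Python's `csv` module so that quoted header fields are handled as well.

```python
import csv

# Column names the import expects, in any order (taken from testdata.csv).
REQUIRED_FIELDS = {'url', 'is_read', 'is_fav'}


def has_required_columns(csv_file_name):
    """Hypothetical helper: True if the header holds exactly the required columns."""
    with open(csv_file_name, newline='') as f:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(f), [])
    # Same names, no extras and no duplicates.
    return set(header) == REQUIRED_FIELDS and len(header) == len(REQUIRED_FIELDS)
```

Calling `has_required_columns('sql.csv')` from the directory that holds the export would then tell you whether the header line is in place before any request is sent.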

To start the actual migration, copy `credentials.ini.txt` to `credentials.ini`
```bash
cp credentials.ini.txt credentials.ini
```
and fill out `credentials.ini` like
```ini
[DEFAULT]
host = https://wallabag.example.org
username = johndoe
password = s3cr3tp4ssw0rd
client_id = 1_yaifiil7ooyaezohne9nei4azoopieshoo8eicae0moh2eumi
c_secret = iquohme5naehee7ieg6ohsh0uo3aghaik3kiephi9jequoodoc
selfsigned = False
```
You can get the client ID and secret by clicking on "developer" in your wallabag v2 and "create a new client" there. `selfsigned` states whether your TLS certificate is a self-signed one. Then execute
```bash
./post-articles.py sql.csv # replace sql.csv by the actual CSV file name
```
This process might take quite a long time when you have many articles, so I'd recommend running it in a terminal multiplexer. In my case it ended with
```
posted 1304 articles
finished successfully.
./post-articles.py wallabag.csv  55,81s user 3,38s system 0% cpu 2:19:34,25 total
```

--------------------------------------------------------------------------------
/post-articles.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3

import configparser
import csv
import itertools
import requests
import sys

credentialFileName = 'credentials.ini'
maxFailCount = 2
# number of successfully posted articles; defined here so the summary
# at the bottom also works when the script is interrupted early
counter = 0


def main(args):
    config = configparser.ConfigParser()
    config.read(credentialFileName)
    hostname, payload, isSelfSigned = extractCreds(config)
    # skip TLS verification only for self-signed certificates
    doVerify = not isSelfSigned
    token = getToken(hostname, payload, doVerify)

    csvFileName = args[1]

    hasRequiredColumns = checkCsvFile(csvFileName)
    if not hasRequiredColumns:
        sys.exit('''csv file does not have the required columns
there should be:
    `url`, `is_fav`, `is_read`''')

    fp = open(csvFileName, newline='')
    reader = csv.DictReader(fp)

    global counter
    counter = 0
    for row in reader:
        failCount = 0
        # try each article up to maxFailCount times
        while failCount < maxFailCount:
            article = extractArticle(row, token)
            printf('.')
            r = requests.post('{}/api/entries.json'.format(hostname), article,
                              verify=doVerify)
            if not connectionFailed(r):
                counter += 1
                printf('\b+')
                break
            else:
                failCount += 1
                printf('\b-')
                # the token may have expired, so fetch a new one and retry
                token = getToken(hostname, payload, doVerify)
                article['access_token'] = token
        if failCount == maxFailCount:
            sys.exit('\nConnection failed.')
    fp.close()


def extractCreds(config):
    '''
    reads the config file and
    returns a tuple of the hostname (str)
    and the API request payload (dict)
    and if the TLS cert is selfsigned (bool)
    '''
    config = config.defaults()
    hostname = config['host']
    username = config['username']
    password = config['password']
    clientid = config['client_id']
    secret = config['c_secret']
    payload = {'username': username, 'password': password,
               'client_id': clientid, 'client_secret': secret,
               'grant_type': 'password'}
    isSelfSigned = False
    if (config['selfsigned'] == 'True') or (config['selfsigned'] == 'true'):
        isSelfSigned = True
    return (hostname, payload, isSelfSigned)


def getToken(hostname, payload, doVerify):
    '''
    acquires an API token

    returns str
    '''
    r = requests.post('{}/oauth/v2/token'.format(hostname), payload,
                      verify=doVerify)
    token = r.json().get('access_token')
    refresh = r.json().get('refresh_token')
    # later calls reuse the payload with the refresh token
    payload['grant_type'] = 'refresh_token'
    payload['refresh_token'] = refresh
    return token


def checkCsvFile(csvFileName):
    '''
    ensures that the CSV file has the right columns

    returns bool
    '''
    with open(csvFileName, 'r') as f:
        firstLine = f.readline().strip()

    requiredFields = ['url', 'is_read', 'is_fav']
    hasRequiredColumns = False

    # the columns may appear in any order
    for l in itertools.permutations(requiredFields):
        toMatch = '{},{},{}'.format(l[0], l[1], l[2])
        doesMatch = (firstLine == toMatch)
        if doesMatch:
            hasRequiredColumns = True
            break
    return hasRequiredColumns


def extractArticle(row, token):
    '''
    interprets a line of the CSV file

    returns dict
    '''
    url = row['url']
    isRead = int(row['is_read'])
    isFaved = int(row['is_fav'])
    article = {'url': url, 'archive': isRead,
               'starred': isFaved, 'access_token': token}
    return article


def connectionFailed(response):
    '''
    checks if there was an error when connecting to the API

    returns bool
    '''
    return 'error' in response.json().keys()


def printf(text):
    '''
    prints text without newline at the end

    returns None
    '''
    print(text, end='', flush=True)


if __name__ == "__main__":
    try:
        main(sys.argv)
        print('\nposted {} articles\nfinished successfully.'.format(counter))
    except KeyboardInterrupt:
        sys.exit('\nposted {} articles\naborted.'.format(counter))

--------------------------------------------------------------------------------
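
The `selfsigned` check in `extractCreds` only recognises the literal strings `True` and `true`. As a rough sketch of an alternative (not what the script currently does), `configparser` can parse the flag itself via `getboolean`, which also accepts values such as `yes`, `on` and `1`. The helper name `is_self_signed` is an assumption made for this example, and treating an empty or missing value as `False` is a choice of the sketch, picked because `credentials.ini.txt` ships with `selfsigned =` left blank.

```python
import configparser


def is_self_signed(config):
    """Hypothetical helper: read the `selfsigned` flag from the [DEFAULT] section."""
    try:
        # getboolean understands 1/yes/true/on and 0/no/false/off, case-insensitively;
        # the fallback is used when the option is missing altogether
        return config.getboolean('DEFAULT', 'selfsigned', fallback=False)
    except ValueError:
        # raised for values that are not booleans, e.g. the blank template value
        return False


config = configparser.ConfigParser()
config.read_string('[DEFAULT]\nselfsigned = yes\n')
print(is_self_signed(config))
```

With a helper like this, `doVerify = not is_self_signed(config)` would behave the same as today for `True` and `true` while tolerating the other common spellings.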