├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── config
│   ├── application.config.ini
│   └── my_family.py
├── import_all_changes.py
├── import_list.py
├── import_one.py
├── import_recent_changes.py
├── list
├── monitor_wikidata_identifier_changes.py
├── requirements.txt
├── script.sh
├── user-config.py
├── user-password.py
└── util
    ├── IdSparql.py
    ├── PropertyWikidataIdentifier.py
    ├── changes.py
    ├── get_wikidata_changes.py
    └── util.py

/.gitignore:
--------------------------------------------------------------------------------
1 | env/
2 | pywikibot.lwp
3 | throttle.ctrl
4 | apicache-py3/
5 | .idea/
6 | util/__pycache__/
7 | venv
8 | *.pyc
9 | *log*
10 | *sh
11 | src/
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM python:3.8-slim-buster
2 |
3 | RUN apt-get update && apt-get install -y git
4 | RUN apt-get install -y procps
5 |
6 | WORKDIR /app
7 |
8 | COPY requirements.txt requirements.txt
9 | RUN pip3 install -r requirements.txt
10 |
11 | COPY . .
12 |
13 | CMD "./script.sh"
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 The QA Company
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # WikibaseSync documentation
2 | This is an open-source project developed by [The QA Company](https://the-qa-company.com).
3 |
4 | *You can use this project to sync Wikidata Items and Properties with your own Wikibase instance.*
5 |
6 | This tool is actively used at [https://linkedopendata.eu](https://linkedopendata.eu).
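A typical first session, once the configuration described under "Setup" below is in place, boils down to:

- `source venv/bin/activate` (after the one-time installation described under "Installation remarks")
- `python import_one.py Q1` to pull a first entity from Wikidata
- `python import_recent_changes.py` on a schedule (e.g. via `script.sh`) to stay in sync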
7 |
8 | ## Features
9 | * Import Wikidata Items and Properties
10 | * Track changes on Wikidata and keep them synchronized
11 | * Keep external links to Wikidata
12 |
13 | ## Installation remarks
14 | This is a standard setup of a Python repository using virtual environments:
15 | - ensure you have Python 3 and its development headers: `sudo yum install python3-devel` or `sudo apt install python3-dev`
16 | - `sudo apt install virtualenv`
17 | - `virtualenv venv --python=python3`
18 | - `source venv/bin/activate`
19 | - `pip install -r requirements.txt`
20 |
21 | ## Setup
22 | This is the standard procedure to create a bot account on Wikibase. The account is responsible for making the edits.
23 | 1. Log in to the Wikibase instance (for example using the **admin** account)
24 | 2. Go to "Special Pages" -> "Bot passwords" and type the name of your bot (for example **WikidataUpdater**)
25 | 3. Give it the following rights: "High-volume editing", "Edit existing pages" and "Create, edit, and move pages"
26 | 4. Copy your username into "user-config.py"
27 | 5. Copy your username, the name of the bot and the password into "user-password.py"
28 | 6. Update the application config [`config/application.config.ini`]
29 |
30 | ### Application config
31 |
32 | Define the Wikibase properties in this file.
33 | > ### *application.config.ini*
34 | >
35 | > located in `config/application.config.ini` in the repository
36 | >
37 | > Customize this file based on your Wikibase properties. Check the example properties below (it matches the default properties of a [Wikibase Docker installation](https://github.com/wmde/wikibase-release-pipeline))
38 | >
39 | > ```
40 | > [wikibase]
41 | > user = admin
42 | > sparqlEndPoint = http://localhost:8834/proxy/wdqs/bigdata/namespace/wdq/sparql
43 | > domain = localhost:80
44 | > protocol = http
45 | > apiUrl= http://localhost:80/w/api.php
46 | > entityUri=http://wikibase.svc/entity
47 | > propertyUri=http://wikibase.svc/prop
48 | >
49 | > ```
50 |
51 | ## Usage
52 | - `python import_one.py Q1` to import Q1 from Wikidata (if it was already imported, the entity is put back in sync)
53 | - `python import_one.py P31` to import P31 from Wikidata (if it was already imported, the entity is put back in sync)
54 | - `python import_all_changes.py` to sync all currently imported items
55 | - `python import_recent_changes.py` to sync all entities that were changed in Wikidata (calling this regularly keeps all imported items and properties in sync with Wikidata)
56 |
--------------------------------------------------------------------------------
/config/application.config.ini:
--------------------------------------------------------------------------------
1 | [DEFAULT]
2 | ServerAliveInterval = 45
3 |
4 | [wikibase]
5 | user = admin
6 | sparqlEndPoint = http://localhost:8834/proxy/wdqs/bigdata/namespace/wdq/sparql
7 | domain = localhost:80
8 | protocol = http
9 | apiUrl= http://localhost:80/w/api.php
10 | entityUri=http://wikibase.svc/entity
11 | propertyUri=http://wikibase.svc/prop
12 | overwriteLocalChanges = false
13 |
--------------------------------------------------------------------------------
/config/my_family.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """Family module for Wikidata."""
3 | #
4 | # (C) Pywikibot team, 2012-2018
5 | #
6 | # Distributed under the terms of the MIT license.
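# Note: this family file is what lets pywikibot talk to the target Wikibase.
# It defines a custom family named 'my' whose domain and protocol are read
# from config/application.config.ini instead of being hard-coded; the import
# scripts register it (see config2.register_family_file in
# import_all_changes.py) and then open the site with pywikibot.Site("my", "my").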
7 | # 8 | from __future__ import absolute_import, division, unicode_literals 9 | 10 | from pywikibot import config 11 | from pywikibot import family 12 | import configparser 13 | app_config = configparser.ConfigParser() 14 | app_config.read('config/application.config.ini') 15 | 16 | 17 | # The Wikidata family 18 | class Family(family.WikimediaFamily): 19 | 20 | """Family class for Wikidata.""" 21 | 22 | name = 'my' 23 | 24 | langs = { 25 | 'my': app_config.get('wikibase', 'domain'), 26 | } 27 | 28 | interwiki_forward = 'wikipedia' 29 | 30 | category_redirect_templates = { 31 | 'my': ( 32 | 'Category redirect', 33 | ), 34 | } 35 | 36 | # Subpages for documentation. 37 | doc_subpages = { 38 | '_default': (('/doc', ), ['wikidata']), 39 | } 40 | 41 | # Disable cosmetic changes 42 | config.cosmetic_changes_disable.update({ 43 | 'my': ('wikidata2', 'test', 'beta') 44 | }) 45 | 46 | def interface(self, code): 47 | """Return 'DataSite'.""" 48 | return 'DataSite' 49 | 50 | def calendarmodel(self, code): 51 | """Default calendar model for WbTime datatype.""" 52 | return 'http://www.wikidata.org/entity/Q1985727' 53 | 54 | def shared_geo_shape_repository(self, code): 55 | """Return Wikimedia Commons as the repository for geo-shapes.""" 56 | # Per geoShapeStorageFrontendUrl settings in Wikibase 57 | return ('commons', 'commons') 58 | 59 | def shared_tabular_data_repository(self, code): 60 | """Return Wikimedia Commons as the repository for tabular-datas.""" 61 | # Per tabularDataStorageFrontendUrl settings in Wikibase 62 | return ('commons', 'commons') 63 | 64 | def default_globe(self, code): 65 | """Default globe for Coordinate datatype.""" 66 | return 'earth' 67 | 68 | 69 | def protocol(self, code): 70 | return { 71 | 'my': app_config.get('wikibase', 'protocol'), 72 | }[code] 73 | 74 | def globes(self, code): 75 | """Supported globes for Coordinate datatype.""" 76 | return { 77 | 'ariel': 'http://www.wikidata.org/entity/Q3343', 78 | 'callisto': 'http://www.wikidata.org/entity/Q3134', 79 | 'ceres': 'http://www.wikidata.org/entity/Q596', 80 | 'deimos': 'http://www.wikidata.org/entity/Q7548', 81 | 'dione': 'http://www.wikidata.org/entity/Q15040', 82 | 'earth': 'http://www.wikidata.org/entity/Q2', 83 | 'enceladus': 'http://www.wikidata.org/entity/Q3303', 84 | 'eros': 'http://www.wikidata.org/entity/Q16711', 85 | 'europa': 'http://www.wikidata.org/entity/Q3143', 86 | 'ganymede': 'http://www.wikidata.org/entity/Q3169', 87 | 'gaspra': 'http://www.wikidata.org/entity/Q158244', 88 | 'hyperion': 'http://www.wikidata.org/entity/Q15037', 89 | 'iapetus': 'http://www.wikidata.org/entity/Q17958', 90 | 'io': 'http://www.wikidata.org/entity/Q3123', 91 | 'jupiter': 'http://www.wikidata.org/entity/Q319', 92 | 'lutetia': 'http://www.wikidata.org/entity/Q107556', 93 | 'mars': 'http://www.wikidata.org/entity/Q111', 94 | 'mercury': 'http://www.wikidata.org/entity/Q308', 95 | 'mimas': 'http://www.wikidata.org/entity/Q15034', 96 | 'miranda': 'http://www.wikidata.org/entity/Q3352', 97 | 'moon': 'http://www.wikidata.org/entity/Q405', 98 | 'oberon': 'http://www.wikidata.org/entity/Q3332', 99 | 'phobos': 'http://www.wikidata.org/entity/Q7547', 100 | 'phoebe': 'http://www.wikidata.org/entity/Q17975', 101 | 'pluto': 'http://www.wikidata.org/entity/Q339', 102 | 'rhea': 'http://www.wikidata.org/entity/Q15050', 103 | 'steins': 'http://www.wikidata.org/entity/Q150249', 104 | 'tethys': 'http://www.wikidata.org/entity/Q15047', 105 | 'titan': 'http://www.wikidata.org/entity/Q2565', 106 | 'titania': 'http://www.wikidata.org/entity/Q3322', 
107 | 'triton': 'http://www.wikidata.org/entity/Q3359',
108 | 'umbriel': 'http://www.wikidata.org/entity/Q3338',
109 | 'venus': 'http://www.wikidata.org/entity/Q313',
110 | 'vesta': 'http://www.wikidata.org/entity/Q3030',
111 | }
112 |
--------------------------------------------------------------------------------
/import_all_changes.py:
--------------------------------------------------------------------------------
1 | #configuration for pywikibot
2 | import os
3 | import time
4 |
5 | import pywikibot
6 | from SPARQLWrapper import SPARQLWrapper, JSON
7 | from pywikibot import config2
8 | import configparser
9 | app_config = configparser.ConfigParser()
10 | app_config.read('config/application.config.ini')
11 |
12 |
13 | family = 'my'
14 | mylang = 'my'
15 | familyfile = os.path.relpath("./config/my_family.py")
16 | if not os.path.isfile(familyfile):
17 |     print("family file %s is missing" % familyfile)
18 | config2.register_family_file(family, familyfile)
19 | config2.password_file = "user-password.py"
20 | config2.usernames['my']['my'] = app_config.get('wikibase', 'user')
21 |
22 | #connect to the wikibase
23 | wikibase = pywikibot.Site("my", "my")
24 | wikibase_repo = wikibase.data_repository()
25 | wikibase_repo.login()
26 |
27 | #connect to wikidata
28 | wikidata = pywikibot.Site("wikidata", "wikidata")
29 | wikidata_repo = wikidata.data_repository()
30 |
31 | #import an item
32 | from util.util import WikibaseImporter
33 | wikibase_importer = WikibaseImporter(wikibase_repo, wikidata_repo)
34 |
35 | sparql = SPARQLWrapper(app_config.get('wikibase', 'sparqlEndPoint'))
36 |
37 | query = """
38 | # select distinct ?id where {
39 | #  ?s <https://linkedopendata.eu/prop/direct/P1> ?id .
40 | #  ?s ?p ?o .
41 | #  FILTER(STRSTARTS(STR(?p), "https://linkedopendata.eu/prop/direct/") && ?p != <https://linkedopendata.eu/prop/direct/P1> && STRSTARTS(STR(?s), "https://linkedopendata.eu/entity/Q"))
42 | # } order by desc(?id)
43 | SELECT ?s1 ?id WHERE {
44 | ?s1 <"""+app_config.get('wikibase', 'propertyUri')+"""/direct/P1> ?id .
45 | {SELECT DISTINCT ?s1 WHERE {
46 | ?s1 <"""+app_config.get('wikibase', 'propertyUri')+"""/P35> ?blank . ?blank <"""+app_config.get('wikibase', 'propertyUri')+"""/statement/P35> <"""+app_config.get('wikibase', 'entityUri')+"""/Q196899> .
47 | ?s1 <"""+app_config.get('wikibase', 'propertyUri')+"""/direct/P1> ?id .
48 | ?s1 <"""+app_config.get('wikibase', 'propertyUri')+"""/P35> ?blank2 . ?blank2 <"""+app_config.get('wikibase', 'propertyUri')+"""/statement/P35> ?prop
49 | } group by ?s1 having(count(?prop) = 1)}}
50 |
51 | """
52 | sparql.setQuery(query)
53 | sparql.setReturnFormat(JSON)
54 | results = sparql.query().convert()
55 | count = 1
56 | for result in results['results']['bindings']:
57 |     print(count, "/", len(results['results']['bindings']))
58 |     count = count+1
59 |     split = result['id']['value'].split('/')
60 |     id = split[len(split)-1]
61 |     print("Changing ", id)
62 |     wikidata_item = pywikibot.ItemPage(wikidata_repo, id)
63 |     try:
64 |         wikidata_item.get()
65 |         wikibase_importer.change_item(wikidata_item, wikibase_repo, True)
66 |     except pywikibot.exceptions.IsRedirectPage as e:
67 |         print("THIS IS A REDIRECT PAGE "+id)
68 |         time.sleep(5)
69 |         sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
70 |         # a redirected item points to its target via owl:sameAs in the Wikidata query service
71 |         query = "select ?id where { <http://www.wikidata.org/entity/"+id+"> <http://www.w3.org/2002/07/owl#sameAs> ?id . }"
72 |         sparql.setQuery(query)
73 |         sparql.setReturnFormat(JSON)
74 |         new_results = sparql.query().convert()
75 |         for new_result in new_results['results']['bindings']:
76 |             newId = new_result['id']['value'].replace("http://www.wikidata.org/entity/", "")
77 |             print("SEARCHING THE NEW ID ", newId)
78 |             wikidata_item = pywikibot.ItemPage(wikidata_repo, newId)
79 |             try:
80 |                 wikidata_item.get()
81 |                 wikibase_importer.change_item_given_id(wikidata_item, result['s1']['value'].replace("https://linkedopendata.eu/entity/", ""), wikibase_repo, True)
82 |             except pywikibot.exceptions.IsRedirectPage as e:
83 |                 print("THIS SHOULD NOT HAPPEN")
84 |
--------------------------------------------------------------------------------
/import_list.py:
--------------------------------------------------------------------------------
1 | #configuration for pywikibot
2 | import pywikibot
3 |
4 | #connect to the wikibase
5 | wikibase = pywikibot.Site("my", "my")
6 | wikibase_repo = wikibase.data_repository()
7 | wikibase_repo.login()
8 |
9 | #connect to wikidata
10 | wikidata = pywikibot.Site("wikidata", "wikidata")
11 | wikidata_repo = wikidata.data_repository()
12 |
13 | from util.util import WikibaseImporter
14 | wikibase_importer = WikibaseImporter(wikibase_repo, wikidata_repo)
15 |
16 | # import a list
17 | filepath = 'list'
18 | with open(filepath) as fp:
19 |     line = fp.readline()
20 |     while line:
21 |         entity_id = line.strip()  # drop the trailing newline before using the ID as a page title
22 |         print("Importing " + entity_id)
23 |         if not entity_id.startswith("#"):
24 |             if entity_id.startswith("Q"):
25 |                 wikidata_item = pywikibot.ItemPage(wikidata_repo, entity_id)
26 |                 wikidata_item.get()
27 |                 wikibase_importer.change_item(wikidata_item, wikibase_repo, True)
28 |             elif entity_id.startswith("P"):
29 |                 wikidata_property = pywikibot.PropertyPage(wikidata_repo, entity_id)
30 |                 wikidata_property.get()
31 |                 wikibase_importer.change_property(wikidata_property, wikibase_repo, True)
32 |         line = fp.readline()
33 |
--------------------------------------------------------------------------------
/import_one.py:
--------------------------------------------------------------------------------
1 | # configuration for pywikibot
2 | import sys
3 |
4 | import pywikibot
5 |
6 | # connect to the wikibase
7 | wikibase = pywikibot.Site("my", "my")
8 | wikibase_repo = wikibase.data_repository()
9 | wikibase_repo.login()
10 |
11 | # connect to wikidata
12 | wikidata = pywikibot.Site("wikidata", "wikidata")
13 | wikidata_repo = wikidata.data_repository()
14 |
15 | from util.util import WikibaseImporter
16 |
17 | wikibase_importer = WikibaseImporter(wikibase_repo, wikidata_repo)
18 |
19 | # import a single item or property
20 | arg = sys.argv[1]
21 | print(f"Importing {arg}")
22 | if arg.startswith("Q"):
23 |     wikidata_item = pywikibot.ItemPage(wikidata_repo, arg)
24 |     wikidata_item.get()
25 |     wikibase_importer.change_item(wikidata_item, wikibase_repo, True)
26 | elif arg.startswith("P"):
27 |     wikidata_property = pywikibot.PropertyPage(wikidata_repo, arg)
28 |     wikidata_property.get()
29 |     wikibase_importer.change_property(wikidata_property, wikibase_repo, True)
--------------------------------------------------------------------------------
/import_recent_changes.py:
--------------------------------------------------------------------------------
1 | #configuration for pywikibot
2 | import os
3 | import pywikibot
4 |
5 | from util.get_wikidata_changes import get_wikidata_changes
6 |
7 | from util.IdSparql import IdSparql
8 | from util.PropertyWikidataIdentifier import PropertyWikidataIdentifier
9 | import configparser
10 | app_config = configparser.ConfigParser()
11 | app_config.read('config/application.config.ini')
12 |
13 | #connect to the wikibase
14 | wikibase = pywikibot.Site("my", "my")
15 | wikibase_repo = wikibase.data_repository()
16 | wikibase_repo.login()
17 |
18 | #connect to wikidata
19 | wikidata = pywikibot.Site("wikidata", "wikidata")
20 | wikidata_repo = wikidata.data_repository()
21 |
22 | from util.util import WikibaseImporter
23 | wikibase_importer = WikibaseImporter(wikibase_repo, wikidata_repo)
24 |
25 | identifier = PropertyWikidataIdentifier()
26 | identifier.get(wikibase_repo)
27 | print('Wikidata Item Identifier', identifier.itemIdentifier)
28 |
29 | idSparql = IdSparql(app_config.get('wikibase', 'sparqlEndPoint'), identifier.itemIdentifier, identifier.propertyIdentifier)
30 | idSparql.load()
31 |
32 | #grab all entities that changed
33 | recent = get_wikidata_changes(None, 15)
34 | for rc in recent:
35 |     print(str(rc['title']))
36 |     if idSparql.contains_id(str(rc['title'])):
37 |         print("The entity", idSparql.get_id(str(rc['title'])), "corresponding to Wikidata entity "+str(rc['title'])+" has changed and will be synced!")
38 |         wikidata_item = pywikibot.ItemPage(wikidata_repo, str(rc['title']))
39 |         #check if the entity has some statements
40 |         wikibase_item = pywikibot.ItemPage(wikibase_repo, idSparql.get_id(str(rc['title'])))
41 |         wikibase_item.get()
42 |         count = 0
43 |         for wikibase_claims in wikibase_item.claims:
44 |             for wikibase_c in wikibase_item.claims.get(wikibase_claims):
45 |                 count = count+1
46 |         if count > 1:
47 |             wikibase_importer.change_item(wikidata_item, wikibase_repo, True)
48 |         else:
49 |             print("Change only the labels")
50 |             wikibase_importer.change_item(wikidata_item, wikibase_repo, False)
--------------------------------------------------------------------------------
/list:
--------------------------------------------------------------------------------
1 | # European union and outgoing
2 | Q458
3 | Q27
4 | Q28
5 | Q29
6 | Q31
7 | Q32
8 | Q33
9 | Q34
10 | Q35
11 | Q36
12 | Q37
13 | Q38
14 | Q40
15 | Q41
16 | Q45
17 | Q55
18 | Q142
19 | Q145
20 | Q183
21 | Q191
22 | Q211
23 | Q213
24 | Q214
25 | Q215
26 | Q218
27 | Q219
28 | Q224
29 | Q229
30 | Q233
31 | # capitals of eu countries
32 | #select distinct ?o2 where { wd:Q458 wdt:P150 ?o . ?o wdt:P36 ?o2 . ?o2 rdfs:label ?label }
33 | Q64
34 | Q84
35 | Q90
36 | Q216
37 | Q220
38 | Q239
39 | Q270
40 | Q437
41 | Q472
42 | Q597
43 | Q727
44 | Q1085
45 | Q1435
46 | Q1524
47 | Q1741
48 | Q1748
49 | Q1754
50 | Q1757
51 | Q1761
52 | Q1770
53 | Q1773
54 | Q1780
55 | Q1781
56 | Q1842
57 | Q2807
58 | Q3856
59 | Q19660
60 | Q23800
61 | # Institutions of the european union
62 | #select distinct ?o where { ?o wdt:P31 wd:Q748720 .
} 63 | Q56683646 64 | Q4951 65 | Q8880 66 | Q8886 67 | Q8889 68 | Q8896 69 | Q8900 70 | Q8901 71 | # mostly DGs and things associated to the commission 72 | Q384175 73 | Q596757 74 | Q610734 75 | Q818564 76 | Q1378038 77 | Q1378040 78 | Q1476458 79 | Q1485366 80 | Q1500915 81 | Q1501745 82 | Q1501749 83 | Q1501751 84 | Q1501753 85 | Q1780205 86 | Q2231042 87 | Q5280604 88 | # European parliament and outgoing 89 | Q8889 90 | Q466985 91 | Q740126 92 | Q18088790 93 | Q142 94 | Q27169 95 | Q748720 96 | Q2391857 97 | Q5438305 98 | Q6054776 99 | Q8427547 100 | Q55394398 101 | # European council and outgoing 102 | Q8886 103 | Q150 104 | Q188 105 | Q458 106 | Q1860 107 | Q8882 108 | Q8896 109 | Q8908 110 | Q122482 111 | Q735587 112 | Q748720 113 | Q950958 114 | Q1375164 115 | Q2285706 116 | Q8427286 117 | Q62023253 118 | # head of states eu countries 119 | Q9682 120 | Q16004 121 | Q29032 122 | Q29207 123 | Q45068 124 | Q57276 125 | Q57467 126 | Q76658 127 | Q78869 128 | Q102139 129 | Q154952 130 | Q155004 131 | Q191045 132 | Q206471 133 | Q348043 134 | Q701484 135 | Q1508402 136 | Q2112764 137 | Q3052772 138 | Q3176299 139 | Q3407316 140 | Q3956186 141 | Q6756617 142 | Q9151911 143 | Q9268379 144 | Q12366816 145 | Q26257557 146 | Q26273268 147 | # head of governement eu countries 148 | #select distinct ?o2 where { wd:Q458 wdt:P150 ?o . ?o wdt:P6 ?o2 . ?o2 rdfs:label ?label } 149 | Q567 150 | Q5015 151 | Q57641 152 | Q57775 153 | Q57792 154 | Q180589 155 | Q552751 156 | Q561213 157 | Q610788 158 | Q671955 159 | Q916162 160 | Q938224 161 | Q1728820 162 | Q1797508 163 | Q2112764 164 | Q2621730 165 | Q2740012 166 | Q3579995 167 | Q3736499 168 | Q6070218 169 | Q10819807 170 | Q11685764 171 | Q11771436 172 | Q11852228 173 | Q12795878 174 | Q17330556 175 | Q18434995 176 | Q53844829 177 | # capitals of eu countries 178 | #select distinct ?o2 where { wd:Q458 wdt:P150 ?o . ?o wdt:P36 ?o2 . ?o2 rdfs:label ?label } 179 | Q64 180 | Q84 181 | Q90 182 | Q216 183 | Q220 184 | Q239 185 | Q270 186 | Q437 187 | Q472 188 | Q597 189 | Q727 190 | Q1085 191 | Q1435 192 | Q1524 193 | Q1741 194 | Q1748 195 | Q1754 196 | Q1757 197 | Q1761 198 | Q1770 199 | Q1773 200 | Q1780 201 | Q1781 202 | Q1842 203 | Q2807 204 | Q3856 205 | Q19660 206 | Q23800 207 | # Institutions of the european union 208 | #select distinct ?o where { ?o wdt:P31 wd:Q748720 . 
}
209 | Q56683646
210 | Q4951
211 | Q8880
212 | Q8886
213 | Q8889
214 | Q8896
215 | Q8900
216 | Q8901
217 | # mostly DGs and things associated to the commission
218 | Q384175
219 | Q596757
220 | Q610734
221 | Q818564
222 | Q1378038
223 | Q1378040
224 | Q1476458
225 | Q1485366
226 | Q1500915
227 | Q1501745
228 | Q1501749
229 | Q1501751
230 | Q1501753
231 | Q1780205
232 | Q2231042
233 | Q2661677
234 | Q2983826
235 | Q3304308
236 | Q3375710
237 | Q5280584
238 | Q5280587
239 | Q5280589
240 | Q5280590
241 | Q5280591
242 | Q5280593
243 | Q5280595
244 | Q5280596
245 | Q5280597
246 | Q5280598
247 | Q5280599
248 | Q5280602
249 | Q5280603
250 | Q5280604
251 | Q5280607
252 | Q5468311
253 | Q6047642
254 | Q7079168
255 | Q15811618
256 | Q16572792
257 | Q16978157
258 | Q63161299
259 | Q63161910
260 | Q2142823
261 |
--------------------------------------------------------------------------------
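Several blocks in this list are repeated verbatim (for example the "capitals of eu countries" and "Institutions of the european union" sections both appear twice). import_list.py simply re-imports duplicates, which is harmless but slow; a small pre-filter sketch (hypothetical, not in the repository) that collects the unique IDs first:

```
# Sketch: read an entity list such as the one above, skipping comments,
# blank lines and duplicates, before handing the IDs to WikibaseImporter.
seen = set()
entity_ids = []
with open('list') as fp:
    for line in fp:
        entity_id = line.strip()
        if entity_id and not entity_id.startswith('#') and entity_id not in seen:
            seen.add(entity_id)
            entity_ids.append(entity_id)
print(len(entity_ids), "unique entities to import")
```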
") 60 | wikidata_item = pywikibot.ItemPage(self.wikidata_repo, wikidata_qid) 61 | wikidata_item.get() 62 | self.wikibase_importer.change_item(wikidata_item, self.wikibase_repo, True) 63 | else: 64 | return 65 | 66 | # get changes 67 | def get_changes(self): 68 | print("Fetching changes ...") 69 | current_time = self.wikibase.server_time() 70 | requests=self.wikibase.recentchanges(start=current_time, end=current_time - timedelta(minutes=20)) 71 | response=requests.request.submit() 72 | changes=response.get('query')['recentchanges'] 73 | for change in changes: 74 | try: 75 | if change.get('type') == 'new': 76 | item_id=change.get('title').split(':')[-1] 77 | self.check_differences(item_id, change) 78 | elif change.get('type') == 'edit': 79 | item_id = change.get('title').split(':')[-1] 80 | self.check_differences(item_id, change) 81 | except Exception as e: 82 | print(e) 83 | return response 84 | 85 | 86 | def start(): 87 | changes = MonitorChanges(wikibase,wikidata) 88 | while True: 89 | try: 90 | res = changes.get_changes() 91 | print(res) 92 | except Exception as e: 93 | print(e) 94 | print('Wikiwata QID Change Monitor sleeps for 180s') 95 | time.sleep(180) 96 | 97 | 98 | start() 99 | exit() 100 | 101 | 102 | 103 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2019.11.28 2 | chardet==3.0.4 3 | idna==2.8 4 | isodate==0.6.0 5 | pycparser==2.19 6 | pyparsing==2.4.5 7 | rdflib==4.2.2 8 | requests==2.22.0 9 | six==1.13.0 10 | SPARQLWrapper==1.8.4 11 | urllib3==1.25.7 12 | 13 | pywikibot~=7.7.2 14 | mwparserfromhell>=0.5.0 -------------------------------------------------------------------------------- /script.sh: -------------------------------------------------------------------------------- 1 | while true 2 | do 3 | echo "kill previews process" 4 | pkill -f import_recent_changes.py 5 | sleep 3 6 | echo "importing entity" 7 | python -u import_recent_changes.py & 8 | echo "import done" 9 | sleep 300 10 | done 11 | -------------------------------------------------------------------------------- /user-config.py: -------------------------------------------------------------------------------- 1 | from pywikibot.config2 import usernames 2 | 3 | user_families_paths = ['./config'] 4 | mylang = "wikidata" 5 | family = "wikidata" 6 | usernames['my']['my'] = u'admin' 7 | password_file = "user-password.py" 8 | minthrottle = 0 9 | maxthrottle = 0 10 | max_retries = 100 11 | #verbose_output = True 12 | -------------------------------------------------------------------------------- /user-password.py: -------------------------------------------------------------------------------- 1 | (u'admin', BotPassword(u'WikidataUpdater', u'BotPassword')) 2 | -------------------------------------------------------------------------------- /util/IdSparql.py: -------------------------------------------------------------------------------- 1 | # this class makes the correspondence between Wikidata entities and entities in the Wikibase using the external 2 | # identifier for Wikidata 3 | from SPARQLWrapper import SPARQLWrapper, JSON 4 | import configparser 5 | 6 | 7 | class IdSparql: 8 | def __init__(self, endpoint, item_identifier, property_identifier): 9 | self.mapEntity = {} 10 | self.mapProperty = {} 11 | self.endpoint = endpoint 12 | self.item_identifier = item_identifier 13 | self.property_identifier = property_identifier 14 | self.app_config = 
/util/IdSparql.py:
--------------------------------------------------------------------------------
1 | # this class makes the correspondence between Wikidata entities and entities in the Wikibase using the external
2 | # identifier for Wikidata
3 | from SPARQLWrapper import SPARQLWrapper, JSON
4 | import configparser
5 |
6 |
7 | class IdSparql:
8 |     def __init__(self, endpoint, item_identifier, property_identifier):
9 |         self.mapEntity = {}
10 |         self.mapProperty = {}
11 |         self.endpoint = endpoint
12 |         self.item_identifier = item_identifier
13 |         self.property_identifier = property_identifier
14 |         self.app_config = configparser.ConfigParser()
15 |         self.app_config.read('config/application.config.ini')
16 |
17 |     def load(self):
18 |         sparql = SPARQLWrapper(self.endpoint)
19 |         query = """
20 |             select ?item ?id where {
21 |                 ?item <""" + self.app_config.get('wikibase', 'propertyUri') + """/direct/""" + self.item_identifier + """> ?id
22 |             }
23 |         """
24 |         sparql.setQuery(query)
25 |         sparql.setReturnFormat(JSON)
26 |         results = sparql.query().convert()
27 |         for result in results['results']['bindings']:
28 |             split = result['item']['value'].split('/')
29 |             id = split[len(split)-1]
30 |             if id.startswith('Q'):
31 |                 self.mapEntity[result['id']['value']] = id
32 |         query = """
33 |             select ?item ?id where {
34 |                 ?item <""" + self.app_config.get('wikibase', 'propertyUri') + """/direct/""" + self.property_identifier + """> ?id
35 |             }
36 |         """
37 |         sparql.setQuery(query)
38 |         sparql.setReturnFormat(JSON)
39 |         results = sparql.query().convert()
40 |         for result in results['results']['bindings']:
41 |             split = result['item']['value'].split('/')
42 |             id = split[len(split) - 1]
43 |             if id.startswith('P'):
44 |                 self.mapProperty[result['id']['value']] = id
45 |             else:
46 |                 print("This should not happen")
47 |
48 |     def get_id(self, id):
49 |         if id.startswith("Q"):
50 |             return self.mapEntity[id]
51 |         elif id.startswith("P"):
52 |             return self.mapProperty[id]
53 |         else:
54 |             raise NameError('This should not happen')
55 |
56 |     def save_id(self, id, new_id):
57 |         if id.startswith("Q"):
58 |             self.mapEntity[id] = str(new_id)
59 |         elif id.startswith("P"):
60 |             self.mapProperty[id] = str(new_id)
61 |         else:
62 |             raise NameError('This should not happen')
63 |
64 |     def contains_id(self, id):
65 |         if id.startswith("Q"):
66 |             return id in self.mapEntity
67 |         elif id.startswith("P"):
68 |             return id in self.mapProperty
69 |         else:
70 |             raise NameError('This should not happen')
--------------------------------------------------------------------------------
/util/PropertyWikidataIdentifier.py:
--------------------------------------------------------------------------------
1 | # This class creates the external identifier pointing to Wikidata, i.e. for the entities imported from Wikidata the
2 | # Wikidata ID will be recorded
3 | import pywikibot
4 | from pywikibot.data import api
5 |
6 |
7 | def wiki_item_exists(wikibase_repo, label):
8 |     params = {'action': 'wbsearchentities', 'format': 'json',
9 |               'language': 'en', 'type': 'property', 'limit': 1,
10 |               'search': label}
11 |     request = api.Request(site=wikibase_repo, parameters=params)
12 |     result = request.submit()
13 |     return result['search']
14 |
15 |
16 | class PropertyWikidataIdentifier:
17 |
18 |     def __init__(self):
19 |         self.itemIdentifier = None
20 |         self.propertyIdentifier = None
21 |
22 |     def get(self, wikibase_repo):
23 |         if len(wiki_item_exists(wikibase_repo, "Wikidata QID")) > 0:
24 |             self.itemIdentifier = str(wiki_item_exists(wikibase_repo, "Wikidata QID")[0]['id'])
25 |         else:
26 |             wikibase_item = pywikibot.PropertyPage(wikibase_repo, datatype='external-id')
27 |             data = {}
28 |             mylabels = {"en": "Wikidata QID"}
29 |             mydescriptions = {"en": "Corresponding QID in Wikidata (do not change the label of this property otherwise you will break WikibaseSync)"}
30 |             data['labels'] = mylabels
31 |             data['descriptions'] = mydescriptions
32 |             wikibase_item.editEntity(data, summary=u'Insert a property to have a wikidata identifier')
33 |             self.itemIdentifier = str(wikibase_item.getID())
34 |         if len(wiki_item_exists(wikibase_repo, "Wikidata PID")) > 0:
35 |             self.propertyIdentifier = str(wiki_item_exists(wikibase_repo, "Wikidata PID")[0]['id'])
36 |         else:
37 |             wikibase_item = pywikibot.PropertyPage(wikibase_repo, datatype='external-id')
38 |             data = {}
39 |             mylabels = {"en": "Wikidata PID"}
40 |             mydescriptions = {"en": "id in wikidata of the corresponding properties (do not change this property otherwise you will break WikibaseSync)"}
41 |             data['labels'] = mylabels
42 |             data['descriptions'] = mydescriptions
43 |             wikibase_item.editEntity(data, summary=u'Insert a property to have a wikidata identifier')
44 |             self.propertyIdentifier = str(wikibase_item.getID())
--------------------------------------------------------------------------------
/util/changes.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from datetime import datetime, timedelta, timezone
3 |
4 |
5 | def recent_changes(rccontinue, minutes, url="https://wikidata.org/w/api.php"):
6 |     time = datetime.now(timezone.utc) - timedelta(minutes=minutes)
7 |     S = requests.Session()
8 |
9 |     parameters = {
10 |         "format": "json",
11 |         "rcprop": "title",
12 |         "list": "recentchanges",
13 |         "action": "query",
14 |         "rclimit": "500",
15 |     }
16 |     if rccontinue is not None:
17 |         parameters['rccontinue'] = rccontinue
18 |
19 |     R = S.get(url=url, params=parameters)
20 |     data = R.json()
21 |     S.close()
22 |
23 |     date_time_obj = datetime.strptime(data['continue']['rccontinue'].split('|')[0], '%Y%m%d%H%M%S').replace(tzinfo=timezone.utc)
24 |     date_time_obj = date_time_obj.astimezone(timezone.utc)
25 |     print("time ", time, " date_time_obj ", date_time_obj)
26 |     if time < date_time_obj:
27 |         older_changes = recent_changes(data['continue']['rccontinue'], minutes, url)
28 |         return data['query']['recentchanges'] + older_changes
29 |     else:
30 |         print("Finished")
31 |         return data['query']['recentchanges']
32 |
33 |
34 | if __name__ == "__main__":
35 |     recent_changes(None, 30)
--------------------------------------------------------------------------------
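get_wikidata_changes.py below is the same paging logic with the Wikidata endpoint fixed: batches of 500 recent changes are fetched, the rccontinue cursor is followed, and recursion stops once the cursor timestamp is older than the requested window. Typical usage, as in import_recent_changes.py:

```
# Sketch: titles of Wikidata entities changed in the last 15 minutes,
# exactly how import_recent_changes.py consumes this module.
from util.get_wikidata_changes import get_wikidata_changes

for change in get_wikidata_changes(None, 15):
    print(change['title'])  # e.g. "Q42"
```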
/util/get_wikidata_changes.py:
--------------------------------------------------------------------------------
1 | import requests
2 | from datetime import datetime, timedelta, timezone
3 |
4 |
5 | def get_wikidata_changes(rccontinue, minutes):
6 |     time = datetime.now(timezone.utc) - timedelta(minutes=minutes)
7 |     S = requests.Session()
8 |
9 |     url = "https://wikidata.org/w/api.php"
10 |
11 |     parameters = {
12 |         "format": "json",
13 |         "rcprop": "title",
14 |         "list": "recentchanges",
15 |         "action": "query",
16 |         "rclimit": "500",
17 |     }
18 |     if rccontinue is not None:
19 |         parameters['rccontinue'] = rccontinue
20 |
21 |     R = S.get(url=url, params=parameters)
22 |     data = R.json()
23 |     S.close()
24 |
25 |     date_time_obj = datetime.strptime(data['continue']['rccontinue'].split('|')[0], '%Y%m%d%H%M%S').replace(tzinfo=timezone.utc)
26 |     date_time_obj = date_time_obj.astimezone(timezone.utc)
27 |     print("time ", time, " date_time_obj ", date_time_obj)
28 |     if time < date_time_obj:
29 |         older_changes = get_wikidata_changes(data['continue']['rccontinue'], minutes)
30 |         return data['query']['recentchanges'] + older_changes
31 |     else:
32 |         print("Finished")
33 |         return data['query']['recentchanges']
34 |
35 |
36 | if __name__ == "__main__":
37 |     get_wikidata_changes(None, 30)
--------------------------------------------------------------------------------
/util/util.py:
--------------------------------------------------------------------------------
1 | import re
2 | from decimal import Decimal
3 | import json
4 | import pywikibot
5 | from pywikibot.page import Claim
6 | import configparser
7 |
8 | from util.IdSparql import IdSparql
9 | from util.PropertyWikidataIdentifier import PropertyWikidataIdentifier
10 |
11 | user_config = __import__("user-config")
12 |
13 | languages = ["bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr", "hu", "it", "lb", "lt", "lv", "mt",
14 |              "nl", "pl", "pt", "ro", "sk", "sl", "sv", "tr"]
15 |
16 |
17 | class WikibaseImporter:
18 |     def __init__(self, wikibase_repo, wikidata_repo):
19 |         self.wikibase_repo = wikibase_repo
20 |         self.wikidata_repo = wikidata_repo
21 |         self.identifier = PropertyWikidataIdentifier()
22 |         self.identifier.get(wikibase_repo)
23 |         self.appConfig = configparser.ConfigParser()
24 |         self.appConfig.read('config/application.config.ini')
25 |         endpoint = self.appConfig.get('wikibase', 'sparqlEndPoint')
26 |         self.id = IdSparql(endpoint, self.identifier.itemIdentifier, self.identifier.propertyIdentifier)
27 |         self.id.load()
28 |
29 |     # transforms the json to an item
30 |     def json_to_item(self, wikibase_repo, json_object):
31 |         y = json.loads(json_object)
32 |         data = {}
33 |         # labels
34 |         labels = {}
35 |         if 'labels' in y:
36 |             for lang in y['labels']:
37 |                 if 'removed' not in y['labels'][lang]:  # T56767
38 |                     labels[lang] = y['labels'][lang]['value']
39 |
40 |         # descriptions
41 |         descriptions = {}
42 |         if 'descriptions' in y:
43 |             for lang in y['descriptions']:
44 |                 descriptions[lang] = y['descriptions'][lang]['value']
45 |
46 |         # aliases
47 |         aliases = {}
48 |         if 'aliases' in y:
49 |             for lang in y['aliases']:
50 |                 aliases[lang] = []
51 |                 for value in y['aliases'][lang]:
52 |                     aliases[lang].append(value['value'])
53 |
54 |         # claims
55 |         claims = {}
56 |         if 'claims' in y:
57 |             for pid in y['claims']:
58 |                 claims[pid] = []
59 |                 for claim in y['claims'][pid]:
60 |                     try:
61 |                         c = Claim.fromJSON(wikibase_repo, claim)
62 |                         # c.on_item = self
63 |                         claims[pid].append(c)
64 |                     except KeyError:
65 |                         print("This can happen when a property was deleted")
66 |
67 |         data['labels'] = labels
68 |         data['descriptions'] = descriptions
69 |         data['aliases'] = aliases
70 |         data['claims'] = claims
71 |         return data
72 |
73 |     def get_last_label_update(self, revisions, focus_label):
74
""" 75 | for a given label, return the last/most recent revision where it was updated 76 | :arguments 77 | :revisions - list of revisions on the wikibase item 78 | :focus_label - current label to get difference for 79 | :return - the revisions item where the last update was made on the focus label 80 | """ 81 | index = 0 82 | # get labels string as dictionary 83 | # most_recent_label_value = ast.literal_eval(revisions[index]['slots']['main']['*'])['labels'][focus_label]['value'] 84 | most_recent_label_value = json.loads(revisions[index]['slots']['main']['*'])['labels'][focus_label]['value'] 85 | for revision in revisions: 86 | if index == 0: # focus revision 87 | index += 1 88 | elif index > 0: 89 | # get revision asterisk string data as dictionary 90 | revision_asterisk = json.loads(revision['slots']['main']['*']) 91 | if revision_asterisk['labels'][focus_label]['value'] != most_recent_label_value: 92 | # the next revision after this is where a change was made 93 | return revisions[index - 1] 94 | 95 | else: 96 | index += 1 97 | return None 98 | 99 | # comparing the labels 100 | def diffLabels(self, wikidata_item, wikibase_item): 101 | """ 102 | LOADING REVISION HERE IS EFFICIENT, UPDATES TURN OUT SUCCESSFUL ( but script ends with an exception) 103 | pywikibot.exceptions.NoPage: Page [[my:Item:-1]] doesn't exist. 104 | """ 105 | 106 | revisions = [] 107 | try: 108 | revisions_tmp = wikibase_item.revisions(content=True) 109 | # problem with the revisions_tmp object 110 | for h in revisions_tmp: 111 | revisions.append(h) 112 | except pywikibot.exceptions.NoPageError: 113 | # pywikibot.exceptions.NoPage: Page [[my:Item:-1]] doesn't exist 114 | # No revision 115 | pass 116 | 117 | mylabels = {} 118 | for label in wikidata_item.labels: 119 | if label in languages: 120 | # confirm that wikidata label and description do not have the same value before proceeding 121 | if not (wikidata_item.labels.get(label) == wikidata_item.descriptions.get(label)): 122 | if wikibase_item.getID() != str(-1) and label in wikibase_item.labels: 123 | if not (wikidata_item.labels.get(label) == wikibase_item.labels.get(label)): 124 | 125 | if revisions is None or revisions[0] is None: 126 | # no update has been done on label, accept remote update 127 | mylabels[label] = wikidata_item.labels.get(label) 128 | else: 129 | if self.appConfig.get('wikibase', 'overwriteLocalChanges').lower() == 'false': 130 | last_update_revision_on_label = self.get_last_label_update(revisions, label) 131 | if last_update_revision_on_label is None: 132 | # no update has been done on label, accept remote update 133 | mylabels[label] = wikidata_item.labels.get(label) 134 | else: 135 | # accept remote update if the last update on the label was made by wikidata updater 136 | # leave current value if update was by a local user/admin 137 | # if last_update_revision_on_label["user"].lower() == self.appConfig.get('wikibase', 'user').lower(): 138 | if last_update_revision_on_label["user"].lower() == str( 139 | user_config.usernames['my']['my']): 140 | mylabels[label] = wikidata_item.labels.get(label) 141 | else: 142 | mylabels[label] = wikidata_item.labels.get(label) 143 | else: 144 | mylabels[label] = wikidata_item.labels.get(label) 145 | return mylabels 146 | 147 | def change_labels(self, wikidata_item, wikibase_item): 148 | mylabels = self.diffLabels(wikidata_item, wikibase_item) 149 | if len(mylabels) != 0: 150 | print("Import labels") 151 | # wikibase_item.editLabels(mylabels, summary=u'Label in wikidata changed') 152 | try: 153 | 
wikibase_item.editLabels(mylabels, summary=u'Label in wikidata changed') 154 | return wikibase_item.getID() 155 | except pywikibot.exceptions.OtherPageSaveError as e: 156 | print("Could not set labels of ", wikibase_item.getID()) 157 | print(e) 158 | # this happens when a property with the same label already exists 159 | x = re.search(r"\[\[Property:.*\]\]", str(e)) 160 | if x: 161 | return x.group(0).replace("[[Property:", "").split("|")[0] 162 | else: 163 | print("This should not happen 3") 164 | 165 | # comparing the descriptions 166 | def diff_descriptions(self, wikidata_item, wikibase_item): 167 | myDescriptions = {} 168 | for description in wikidata_item.descriptions: 169 | if description in languages: 170 | # confirm that wikidata label and description do not have the same value before proceeding 171 | if not (wikidata_item.labels.get(description) == wikidata_item.descriptions.get(description)): 172 | if wikibase_item.getID() != str(-1) and description in wikibase_item.descriptions: 173 | if not (wikidata_item.descriptions.get(description) == wikibase_item.descriptions.get( 174 | description)): 175 | # print("Change", wikidata_item.descriptions.get(description), "----", wikibase_item.descriptions.get(description)) 176 | myDescriptions[description] = wikidata_item.descriptions.get(description) 177 | else: 178 | myDescriptions[description] = wikidata_item.descriptions.get(description) 179 | return myDescriptions 180 | 181 | # comparing the descriptions 182 | def change_descriptions(self, wikidata_item, wikibase_item): 183 | myDescriptions = self.diff_descriptions(wikidata_item, wikibase_item) 184 | # print(myDescriptions) 185 | if len(myDescriptions) != 0: 186 | print("Import Descriptions") 187 | try: 188 | wikibase_item.editEntity({'descriptions': myDescriptions}, 189 | summary='The description in Wikidata changed') 190 | except pywikibot.exceptions.OtherPageSaveError as e: 191 | print("Could not set description of ", wikibase_item.getID()) 192 | print(e) 193 | x = re.search(r'\[\[Item:.*\]\]', str(e)) 194 | if x: 195 | return x.group(0).replace("[[Item:", "").split("|")[0] 196 | else: 197 | print("This should not happen 4") 198 | print("Error probably property or item already existing ", e) 199 | 200 | # diff the aliases 201 | def diff_aliases(self, wikidata_item, wikibase_item): 202 | mylabels = {} 203 | for alias in wikidata_item.aliases: 204 | if alias in languages: 205 | if wikibase_item.getID() != str(-1) and alias in wikibase_item.aliases: 206 | if not (wikidata_item.aliases.get(alias) == wikibase_item.aliases.get(alias)): 207 | # print("Change", wikidata_item.aliases.get(alias), "----", wikibase_item.aliases.get(alias)) 208 | mylabels[alias] = wikidata_item.aliases.get(alias) 209 | else: 210 | mylabels[alias] = wikidata_item.aliases.get(alias) 211 | return mylabels 212 | 213 | # comparing the aliases 214 | def change_aliases(self, wikidata_item, wikibase_item): 215 | myaliases = self.diff_aliases(wikidata_item, wikibase_item) 216 | if len(myaliases) != 0: 217 | print("Import aliases") 218 | try: 219 | wikibase_item.editAliases(myaliases, summary=u'Aliases in wikidata changed') 220 | except pywikibot.exceptions.OtherPageSaveError as e: 221 | print("This should not happen ", e) 222 | 223 | # comparing the sitelinks 224 | def diff_site_links(self, wikidata_item, wikibase_item): 225 | siteLinks = [] 226 | id = wikibase_item.getID() 227 | for sitelink in wikidata_item.sitelinks: 228 | for lang in languages: 229 | if str(sitelink) == lang + "wiki": 230 | if id != str(-1) and 
sitelink in wikibase_item.sitelinks: 231 | if not (str(wikidata_item.sitelinks.get(sitelink)) == str( 232 | wikibase_item.sitelinks.get(sitelink))): 233 | # print("Change", wikidata_item.sitelinks.get(sitelink), "----", wikibase_item.sitelinks.get(sitelink)) 234 | siteLinks.append({'site': sitelink, 235 | 'title': str(wikidata_item.sitelinks.get(sitelink)).replace('[[', 236 | '').replace( 237 | ']]', '')}) 238 | else: 239 | # print("Change", wikidata_item.sitelinks.get(sitelink), "----", wikibase_item.sitelinks.get(sitelink)) 240 | siteLinks.append({'site': sitelink, 241 | 'title': str(wikidata_item.sitelinks.get(sitelink)).replace('[[', 242 | '').replace( 243 | ']]', '')}) 244 | return siteLinks 245 | 246 | # comparing the sitelinks 247 | def change_site_links(self, wikidata_item, wikibase_item): 248 | siteLinks = self.diff_site_links(wikidata_item, wikibase_item) 249 | if len(siteLinks) != 0: 250 | print("Import sitelinks") 251 | try: 252 | wikibase_item.setSitelinks(siteLinks, summary=u'Sitelinks in wikidata changed') 253 | except pywikibot.exceptions.OtherPageSaveError as e: 254 | print("Could not set sitelinks of ", wikibase_item.getID()) 255 | print(e) 256 | except pywikibot.exceptions.UnknownSite as e: 257 | print("Could not set sitelinks of ", wikibase_item.getID()) 258 | print(e) 259 | 260 | def import_item(self, wikidata_item): 261 | print("Import Entity", wikidata_item.getID() + " from Wikidata") 262 | wikibase_item = pywikibot.ItemPage(self.wikibase_repo) 263 | mylabels = self.diffLabels(wikidata_item, wikibase_item) 264 | myDescriptions = self.diff_descriptions(wikidata_item, wikibase_item) 265 | myaliases = self.diff_aliases(wikidata_item, wikibase_item) 266 | # mySitelinks = diffSiteLinks(wikidata_item, wikibase_item) 267 | mySitelinks = [] 268 | claim = pywikibot.page.Claim(self.wikibase_repo, self.identifier.itemIdentifier, datatype='external-id') 269 | target = wikidata_item.getID() 270 | claim.setTarget(target) 271 | data = { 272 | 'labels': mylabels, 273 | 'descriptions': myDescriptions, 274 | 'aliases': myaliases, 275 | 'sitelinks': mySitelinks, 276 | 'claims': [claim.toJSON()] 277 | } 278 | # print(data) 279 | try: 280 | wikibase_item.editEntity(data, summary=u'Importing entity ' + wikidata_item.getID() + ' from wikidata') 281 | self.id.save_id(wikidata_item.getID(), wikibase_item.getID()) 282 | return wikibase_item.getID() 283 | except pywikibot.exceptions.OtherPageSaveError as e: 284 | print("Could not set description of ", wikibase_item.getID()) 285 | print("This is the error message ", e) 286 | x = re.search(r'\[\[Item:.*\]\]', str(e)) 287 | if x: 288 | return x.group(0).replace("[[Item:", "").split("|")[0] 289 | else: 290 | print("This should not happen 5") 291 | print("Error probably property or item already existing ", e) 292 | 293 | def importProperty(self, wikidata_item): 294 | print("Import Property", wikidata_item.getID() + " from Wikidata") 295 | wikibase_item = pywikibot.PropertyPage(self.wikibase_repo, datatype=wikidata_item.type) 296 | mylabels = self.diffLabels(wikidata_item, wikibase_item) 297 | myDescriptions = self.diff_descriptions(wikidata_item, wikibase_item) 298 | myaliases = self.diff_aliases(wikidata_item, wikibase_item) 299 | claim = pywikibot.page.Claim(self.wikibase_repo, self.identifier.propertyIdentifier, datatype='external-id') 300 | target = wikidata_item.getID() 301 | claim.setTarget(target) 302 | 303 | data = { 304 | 'labels': mylabels, 305 | 'descriptions': myDescriptions, 306 | 'aliases': myaliases, 307 | 'claims': [claim.toJSON()] 
308 | } 309 | try: 310 | wikibase_item.editEntity(data, 311 | summary=u'Importing property ' + wikidata_item.getID() + ' from wikidata') 312 | self.id.save_id(wikidata_item.getID(), wikibase_item.getID()) 313 | return wikibase_item.getID() 314 | except pywikibot.exceptions.OtherPageSaveError as e: 315 | print("Could not set description of ", wikibase_item.getID()) 316 | print(e) 317 | x = re.search(r'\[\[Item:.*\]\]', str(e)) 318 | if x: 319 | return x.group(0).replace("[[Item:", "").split("|")[0] 320 | else: 321 | print("This should not happen 6") 322 | print("Error probably property or item already existing ", e) 323 | 324 | # comparing two claims 325 | def compare_claim(self, wikidata_claim, wikibase_claim, translate): 326 | found = False 327 | found_equal_value = False 328 | wikidata_propertyId = wikidata_claim.get('property') 329 | wikibase_propertyId = wikibase_claim.get('property') 330 | if ((translate == True and self.id.get_id(wikidata_propertyId) == wikibase_propertyId) or ( 331 | translate == False and wikidata_propertyId == wikibase_propertyId)): 332 | found = True 333 | if wikidata_claim.get('snaktype') == 'somevalue' and wikibase_claim.get('snaktype') == 'somevalue': 334 | found_equal_value = True 335 | elif wikidata_claim.get('snaktype') == 'novalue' and wikibase_claim.get('snaktype') == 'novalue': 336 | found_equal_value = True 337 | else: 338 | # WIKIBASE_ITEM 339 | if wikidata_claim.get('datatype') == 'wikibase-item': 340 | if wikibase_claim.get('datatype') == 'wikibase-item': 341 | wikidata_objectId = 'Q' + str( 342 | wikidata_claim.get('datavalue').get('value').get('numeric-id')) 343 | wikibase_objectId = 'Q' + str( 344 | wikibase_claim.get('datavalue').get('value').get('numeric-id')) 345 | # print(self.id.get_id(wikidata_propertyId),"---", wikibase_propertyId) 346 | # print(self.id.get_id(wikidata_objectId),"---",wikibase_objectId) 347 | if translate: 348 | if self.id.contains_id(wikidata_objectId) and self.id.get_id( 349 | wikidata_objectId) == wikibase_objectId: 350 | found_equal_value = True 351 | else: 352 | if wikidata_objectId == wikibase_objectId: 353 | found_equal_value = True 354 | # WIKIBASE-PROPERTY 355 | elif wikidata_claim.get('datatype') == 'wikibase-property': 356 | if wikibase_claim.get('datatype') == 'wikibase-property': 357 | wikidata_objectId = 'P' + str( 358 | wikidata_claim.get('datavalue').get('value').get('numeric-id')) 359 | wikibase_objectId = 'P' + str( 360 | wikibase_claim.get('datavalue').get('value').get('numeric-id')) 361 | # print(self.id.get_id(wikidata_propertyId),"---", wikibase_propertyId) 362 | # print(self.id.get_id(wikidata_objectId),"---",wikibase_objectId) 363 | if translate: 364 | if self.id.contains_id(wikidata_objectId) and self.id.get_id( 365 | wikidata_objectId) == wikibase_objectId: 366 | found_equal_value = True 367 | else: 368 | if wikidata_objectId == wikibase_objectId: 369 | found_equal_value = True 370 | # MONOLINGUALTEXT 371 | elif wikidata_claim.get('datatype') == 'monolingualtext': 372 | if wikibase_claim.get('datatype') == 'monolingualtext': 373 | wikibase_propertyId = wikibase_claim.get('property') 374 | 375 | wikibase_text = wikibase_claim.get('datavalue').get('value').get( 376 | 'text') 377 | wikibase_language = wikibase_claim.get('datavalue').get( 378 | 'value').get('language') 379 | 380 | wikidata_text = wikidata_claim.get('datavalue').get('value').get( 381 | 'text') 382 | wikidata_language = wikidata_claim.get('datavalue').get( 383 | 'value').get('language') 384 | 385 | # if wikibase_propertyId == "P8": 386 | 
# print(wikibase_propertyId) 387 | # print(wikibase_text , "---", wikidata_text) 388 | # print(wikibase_language, "---", wikidata_language) 389 | if wikibase_text == wikidata_text and wikibase_language == wikidata_language: 390 | found_equal_value = True 391 | 392 | # COMMONS-MEDIA 393 | elif wikidata_claim.get('datatype') == 'commonsMedia': 394 | if wikibase_claim.get('datatype') == 'commonsMedia': 395 | wikibase_propertyId = wikibase_claim.get('property') 396 | wikibase_text = wikibase_claim.get('datavalue').get('value') 397 | wikidata_text = wikidata_claim.get('datavalue').get('value') 398 | # print(self.id.get_id(wikidata_propertyId),'--',wikibase_propertyId,'--',wikibase_text, '--- ', wikidata_text, wikibase_text == wikidata_text) 399 | if wikibase_text == wikidata_text: 400 | found_equal_value = True 401 | # GLOBAL-COORDINATE 402 | elif wikidata_claim.get('datatype') == 'globe-coordinate': 403 | if wikibase_claim.get('datatype') == 'globe-coordinate': 404 | wikibase_propertyId = wikibase_claim.get('property') 405 | wikibase_latitude = wikibase_claim.get('datavalue').get( 406 | 'value').get('latitude') 407 | wikibase_longitude = wikibase_claim.get('datavalue').get( 408 | 'value').get('longitude') 409 | wikibase_altitude = wikibase_claim.get('datavalue').get( 410 | 'value').get('altitude') 411 | wikibase_precision = wikibase_claim.get('datavalue').get( 412 | 'value').get('precision') 413 | wikibase_globe = wikibase_claim.get('datavalue').get('value').get( 414 | 'globe') 415 | wikidata_latitude = wikidata_claim.get('datavalue').get( 416 | 'value').get('latitude') 417 | wikidata_longitude = wikidata_claim.get('datavalue').get( 418 | 'value').get('longitude') 419 | wikidata_altitude = wikidata_claim.get('datavalue').get( 420 | 'value').get('altitude') 421 | wikidata_precision = wikidata_claim.get('datavalue').get( 422 | 'value').get('precision') 423 | wikidata_globe = wikidata_claim.get('datavalue').get('value').get( 424 | 'globe') 425 | if wikibase_latitude == wikidata_latitude and wikibase_longitude == wikidata_longitude and wikibase_globe == wikidata_globe \ 426 | and wikibase_altitude == wikidata_altitude and ( 427 | wikibase_precision == wikidata_precision or ( 428 | wikibase_precision == 1 and wikidata_precision == None)): 429 | found_equal_value = True 430 | # QUANTITY 431 | elif wikidata_claim.get('datatype') == 'quantity': 432 | if wikibase_claim.get('datatype') == 'quantity': 433 | wikibase_propertyId = wikibase_claim.get('property') 434 | 435 | wikibase_amount = wikibase_claim.get('datavalue').get('value').get( 436 | 'amount') 437 | wikibase_upperBound = wikibase_claim.get('datavalue').get( 438 | 'value').get('upperBound') 439 | wikibase_lowerBound = wikibase_claim.get('datavalue').get( 440 | 'value').get('lowerBound') 441 | wikibase_unit = wikibase_claim.get('datavalue').get('value').get( 442 | 'unit') 443 | 444 | wikidata_amount = wikidata_claim.get('datavalue').get('value').get( 445 | 'amount') 446 | wikidata_upperBound = wikidata_claim.get('datavalue').get( 447 | 'value').get('upperBound') 448 | wikidata_lowerBound = wikidata_claim.get('datavalue').get( 449 | 'value').get('lowerBound') 450 | wikidata_unit = wikidata_claim.get('datavalue').get('value').get( 451 | 'unit') 452 | # print("Compare") 453 | # print(wikibase_amount, "--", wikidata_amount) 454 | # print(wikibase_upperBound, "--", wikidata_upperBound) 455 | # print(wikibase_lowerBound, "--", wikidata_lowerBound) 456 | # print(wikibase_unit, "--", wikidata_unit) 457 | if wikibase_amount == wikidata_amount and 
wikibase_upperBound == wikidata_upperBound and wikibase_lowerBound == wikidata_lowerBound: 458 | if (wikidata_unit == None and wikibase_unit == None) or ( 459 | wikidata_unit == '1' and wikibase_unit == '1'): 460 | found_equal_value = True 461 | else: 462 | if ("entity/" in wikidata_unit) and ("entity/" in wikibase_unit): 463 | unit_id = wikibase_unit.split("entity/")[1] 464 | wikidata_unit_id = wikidata_unit.split("entity/")[1] 465 | if translate: 466 | if self.id.contains_id(wikidata_unit_id) and self.id.get_id( 467 | wikidata_unit_id) == unit_id: 468 | found_equal_value = True 469 | else: 470 | if wikidata_unit_id == unit_id: 471 | found_equal_value = True 472 | # print("EQUAL ",found_equal_value) 473 | 474 | # TIME 475 | elif wikidata_claim.get('datatype') == 'time': 476 | if wikibase_claim.get('datatype') == 'time': 477 | 478 | wikibase_propertyId = wikibase_claim.get('property') 479 | 480 | wikidata_time = wikidata_claim.get('datavalue').get('value').get('time') 481 | wikidata_precision = wikidata_claim.get('datavalue').get('value').get( 482 | 'precision') 483 | wikidata_after = wikidata_claim.get('datavalue').get('value').get( 484 | 'after') 485 | wikidata_before = wikidata_claim.get('datavalue').get('value').get( 486 | 'before') 487 | wikidata_timezone = wikidata_claim.get('datavalue').get('value').get( 488 | 'timezone') 489 | wikidata_calendermodel = wikidata_claim.get('datavalue').get( 490 | 'value').get( 491 | 'calendarmodel') 492 | 493 | wikibase_time = wikibase_claim.get('datavalue').get('value').get('time') 494 | wikibase_precision = wikibase_claim.get('datavalue').get('value').get( 495 | 'precision') 496 | wikibase_after = wikibase_claim.get('datavalue').get('value').get( 497 | 'after') 498 | wikibase_before = wikibase_claim.get('datavalue').get('value').get( 499 | 'before') 500 | wikibase_timezone = wikibase_claim.get('datavalue').get('value').get( 501 | 'timezone') 502 | wikibase_calendermodel = wikibase_claim.get('datavalue').get( 503 | 'value').get( 504 | 'calendarmodel') 505 | 506 | # print(wikidata_time , "---" , wikibase_time) 507 | # print(wikidata_precision , "---" , wikibase_precision) 508 | # print(wikidata_after , "---" , wikibase_after) 509 | # print(wikidata_before , "---" , wikibase_before) 510 | # print(wikidata_timezone , "---" , wikibase_timezone) 511 | # print(wikidata_calendermodel , "---" , wikibase_calendermodel) 512 | if wikidata_time == wikibase_time and wikidata_precision == wikibase_precision and wikidata_after == wikibase_after and wikidata_before == wikibase_before and wikidata_timezone == wikibase_timezone \ 513 | and wikidata_calendermodel == wikibase_calendermodel: 514 | found_equal_value = True 515 | 516 | # URL 517 | elif wikidata_claim.get('datatype') == 'url': 518 | if wikibase_claim.get('datatype') == 'url': 519 | wikibase_propertyId = wikibase_claim.get('property') 520 | wikibase_value = wikibase_claim.get('datavalue').get( 521 | 'value')[0:500] 522 | 523 | wikidata_value = wikidata_claim.get('datavalue').get( 524 | 'value')[0:500] 525 | if wikibase_value == wikidata_value: 526 | found_equal_value = True 527 | # STRING 528 | elif wikidata_claim.get('datatype') == 'string': 529 | if wikibase_claim.get('datatype') == 'string': 530 | wikibase_value = wikibase_claim.get('datavalue').get( 531 | 'value') 532 | wikidata_value = wikidata_claim.get('datavalue').get( 533 | 'value') 534 | if wikibase_value == wikidata_value: 535 | found_equal_value = True 536 | # EXTERNAL ID 537 | elif wikidata_claim.get('datatype') == 'external-id': 538 | if 
    # translate one claim from Wikidata into one for the local Wikibase
    def translateClaim(self, wikidata_claim):
        wikidata_propertyId = wikidata_claim.get('property')
        if not self.id.contains_id(wikidata_propertyId):
            wikidata_property = pywikibot.PropertyPage(self.wikidata_repo, wikidata_propertyId,
                                                       datatype=wikidata_claim.get('datatype'))
            wikidata_property.get()
            self.importProperty(wikidata_property)
        if wikidata_claim.get('snaktype') == 'somevalue':
            claim = pywikibot.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                    datatype=wikidata_claim.get('datatype'))
            claim.setSnakType('somevalue')
            return claim
        elif wikidata_claim.get('snaktype') == 'novalue':
            claim = pywikibot.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                    datatype=wikidata_claim.get('datatype'))
            claim.setSnakType('novalue')
            # return the claim here as well, otherwise 'novalue' snaks are silently dropped
            return claim
        else:
            # WIKIBASE-ITEM
            if wikidata_claim.get('datatype') == 'wikibase-item':
                # add the entity to the wiki
                wikidata_objectId = 'Q' + str(wikidata_claim.get('datavalue').get('value').get('numeric-id'))
                if not self.id.contains_id(wikidata_objectId):
                    item = pywikibot.ItemPage(self.wikidata_repo, wikidata_objectId)
                    try:
                        item.get()
                        self.import_item(item)
                    except pywikibot.exceptions.IsRedirectPage:
                        print("Redirect page, we are ignoring this")

                if self.id.contains_id(wikidata_objectId) and (not self.id.get_id(wikidata_objectId) == '-1'):
                    claim = pywikibot.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                            datatype='wikibase-item')
                    target_item = pywikibot.ItemPage(self.wikibase_repo, self.id.get_id(wikidata_objectId))
                    claim.setTarget(target_item)
                    claim.setRank(wikidata_claim.get('rank'))
                    return claim
            # WIKIBASE-PROPERTY
            elif wikidata_claim.get('datatype') == 'wikibase-property':
                wikidata_objectId = 'P' + str(wikidata_claim.get('datavalue').get('value').get('numeric-id'))
                if not self.id.contains_id(wikidata_objectId):
                    item = pywikibot.PropertyPage(self.wikidata_repo, wikidata_objectId)
                    try:
                        item.get()
                        self.importProperty(item)
                    except pywikibot.exceptions.IsRedirectPage:
                        print("Redirect page, we are ignoring this")

                if self.id.contains_id(wikidata_objectId) and (not self.id.get_id(wikidata_objectId) == '-1'):
                    claim = pywikibot.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                            datatype='wikibase-property')
                    target_property = pywikibot.PropertyPage(self.wikibase_repo, self.id.get_id(wikidata_objectId))
                    claim.setTarget(target_property)
                    claim.setRank(wikidata_claim.get('rank'))
                    return claim
            # MONOLINGUALTEXT
            elif wikidata_claim.get('datatype') == 'monolingualtext':
                claim = pywikibot.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                        datatype='monolingualtext')
                wikidata_text = wikidata_claim.get('datavalue').get('value').get('text')
                wikidata_language = wikidata_claim.get('datavalue').get('value').get('language')
                # HACK
                # print(wikidata_text, "---", wikidata_language)
                target = pywikibot.WbMonolingualText(text=wikidata_text, language=wikidata_language)
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # GLOBE-COORDINATES
            elif wikidata_claim.get('datatype') == 'globe-coordinate':
                wikidata_latitude = wikidata_claim.get('datavalue').get('value').get('latitude')
                wikidata_longitude = wikidata_claim.get('datavalue').get('value').get('longitude')
                wikidata_altitude = wikidata_claim.get('datavalue').get('value').get('altitude')
                wikidata_globe_uri = wikidata_claim.get('datavalue').get('value').get('globe').replace(
                    "http://www.wikidata.org/entity/", "")
                wikidata_precision = wikidata_claim.get('datavalue').get('value').get('precision')
                wikidata_globe_item = pywikibot.ItemPage(self.wikidata_repo, wikidata_globe_uri)
                wikidata_globe_item.get()
                wikibase_globe_item = self.change_item(wikidata_globe_item, self.wikibase_repo, False)

                # Note: the globe is always set to the Wikidata item for Earth (Q2); this is the
                # convention in a Wikibase even if that entity does not exist locally
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='globe-coordinate')
                if wikidata_precision is not None:
                    target = pywikibot.Coordinate(site=self.wikibase_repo, lat=wikidata_latitude,
                                                  lon=wikidata_longitude,
                                                  alt=wikidata_altitude,
                                                  globe_item="http://www.wikidata.org/entity/Q2",
                                                  precision=wikidata_precision)
                else:
                    target = pywikibot.Coordinate(site=self.wikibase_repo, lat=wikidata_latitude,
                                                  lon=wikidata_longitude,
                                                  alt=wikidata_altitude,
                                                  globe_item="http://www.wikidata.org/entity/Q2",
                                                  precision=1)
                # print(wikidata_propertyId)
                # print("Property ", self.id.get_id(wikidata_propertyId))
                # print("My target ", target)
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # TIME
            elif wikidata_claim.get('datatype') == 'time':
                wikidata_time = wikidata_claim.get('datavalue').get('value').get('time')
                wikidata_precision = wikidata_claim.get('datavalue').get('value').get('precision')
                wikidata_after = wikidata_claim.get('datavalue').get('value').get('after')
                wikidata_before = wikidata_claim.get('datavalue').get('value').get('before')
                wikidata_timezone = wikidata_claim.get('datavalue').get('value').get('timezone')
                wikidata_calendarmodel = wikidata_claim.get('datavalue').get('value').get('calendarmodel')

                # Note: the calendar model is kept as the Wikidata one; this is the
                # convention in a Wikibase even if that entity does not exist locally
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='time')
                target = pywikibot.WbTime.fromTimestr(site=self.wikibase_repo, datetimestr=wikidata_time,
                                                      precision=wikidata_precision,
                                                      after=wikidata_after, before=wikidata_before,
                                                      timezone=wikidata_timezone,
                                                      calendarmodel=wikidata_calendarmodel)
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))

                return claim
            # COMMONSMEDIA
            elif wikidata_claim.get('datatype') == 'commonsMedia':
                claim = pywikibot.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                        datatype='commonsMedia')
                wikidata_text = wikidata_claim.get('datavalue').get('value')
                commonssite = pywikibot.Site('commons', 'commons')
                imagelink = pywikibot.Link(wikidata_text, source=commonssite,
                                           default_namespace=6)
                image = pywikibot.FilePage(imagelink)
                if image.isRedirectPage():
                    image = pywikibot.FilePage(image.getRedirectTarget())

                if not image.exists():
                    pywikibot.output("{} doesn't exist so I can't link to it"
                                     .format(image.title(as_link=True)))
                    return

                claim.setTarget(image)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # QUANTITY
            elif wikidata_claim.get('datatype') == 'quantity':
                wikidata_amount = wikidata_claim.get('datavalue').get('value').get('amount')
                wikidata_upperBound = wikidata_claim.get('datavalue').get('value').get('upperBound')
                wikidata_lowerBound = wikidata_claim.get('datavalue').get('value').get('lowerBound')
                wikidata_unit = wikidata_claim.get('datavalue').get('value').get('unit')
                wikidata_objectId = wikidata_unit.replace("http://www.wikidata.org/entity/", "")
                # add the unit if it is not yet in the wiki
                if not (wikidata_unit is None or wikidata_unit == '1'):
                    if not self.id.contains_id(wikidata_objectId):
                        item = pywikibot.ItemPage(self.wikidata_repo, wikidata_objectId)
                        self.change_item(item, self.wikibase_repo, False)
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='quantity')
                # print(wikidata_amount)
                # print(Decimal(wikidata_amount))
                # print(wikidata_upperBound)
                # print(Decimal(wikidata_upperBound) - Decimal(wikidata_amount))
                if wikidata_unit is None or wikidata_unit == '1':
                    if wikidata_upperBound is None:
                        # print("Here 1", '{:f}'.format(Decimal(wikidata_amount)))
                        target = pywikibot.WbQuantity(amount='{:f}'.format(Decimal(wikidata_amount)),
                                                      site=self.wikibase_repo)
                        claim.setTarget(target)
                        claim.setRank(wikidata_claim.get('rank'))
                        return claim
                    else:
                        # print("Here 2", '{:f}'.format(Decimal(wikidata_amount)))
                        target = pywikibot.WbQuantity(amount=Decimal(wikidata_amount), site=self.wikibase_repo,
                                                      error=Decimal(wikidata_upperBound) - Decimal(wikidata_amount))
                        claim.setTarget(target)
                        claim.setRank(wikidata_claim.get('rank'))
                        return claim
                else:
                    if self.id.contains_id(wikidata_objectId) and not self.id.get_id(wikidata_objectId) == '-1':
                        if wikidata_upperBound is None:
                            # print("Here 3", '{:f}'.format(Decimal(wikidata_amount)))
                            wikibase_unit = pywikibot.ItemPage(self.wikibase_repo,
                                                               self.id.get_id(wikidata_objectId))
                            # here this is a hack ...
                            target = pywikibot.WbQuantity(amount=Decimal(wikidata_amount), unit=wikibase_unit,
                                                          site=self.wikibase_repo)
                            claim.setTarget(target)
                            claim.setRank(wikidata_claim.get('rank'))
                        else:
                            # print("Here 4", '{:f}'.format(Decimal(wikidata_amount)))
                            wikibase_unit = pywikibot.ItemPage(self.wikibase_repo,
                                                               self.id.get_id(wikidata_objectId))
                            target = pywikibot.WbQuantity(amount=Decimal(wikidata_amount), unit=wikibase_unit,
                                                          site=self.wikibase_repo,
                                                          error=Decimal(wikidata_upperBound) - Decimal(
                                                              wikidata_amount))
                            claim.setTarget(target)
                            claim.setRank(wikidata_claim.get('rank'))
                        return claim
            # URL
            elif wikidata_claim.get('datatype') == 'url':
                wikidata_value = wikidata_claim.get('datavalue').get('value')
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='url')
                # keep only the first 500 characters, matching the comparison in compare_claim()
                target = wikidata_value[0:500]
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # EXTERNAL-ID
            elif wikidata_claim.get('datatype') == 'external-id':
                wikidata_value = wikidata_claim.get('datavalue').get('value')
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='external-id')
                target = wikidata_value
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # STRING
            elif wikidata_claim.get('datatype') == 'string':
                wikidata_value = wikidata_claim.get('datavalue').get('value')
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='string')
                target = wikidata_value
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # GEOSHAPE
            elif wikidata_claim.get('datatype') == 'geo-shape':
                claim = pywikibot.page.Claim(self.wikibase_repo, self.id.get_id(wikidata_propertyId),
                                             datatype='geo-shape')
                commons_site = pywikibot.Site('commons', 'commons')
                page = pywikibot.Page(commons_site, wikidata_claim.get('datavalue').get('value'))
                target = pywikibot.WbGeoShape(page)
                claim.setTarget(target)
                claim.setRank(wikidata_claim.get('rank'))
                return claim
            # TABULAR-DATA
            elif wikidata_claim.get('datatype') == 'tabular-data':
                print('Not implemented yet: tabular-data')
                return None
                # raise NameError('Tabular data not implemented')
                # set new claim
                # claim = pywikibot.page.Claim(
                #     testsite, 'P30175', datatype='tabular-data')
                # commons_site = pywikibot.Site('commons', 'commons')
                # page = pywikibot.Page(commons_site, 'Data:Bea.gov/GDP by state.tab')
                # target = pywikibot.WbGeoShape(page)
                # claim.setTarget(target)
                # item.addClaim(claim)
            else:
                print('This datatype is not supported ', wikidata_claim.get('datatype'),
                      ' translating the following claim ', wikidata_claim)
        return None

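For orientation, the mainsnak JSON that translateClaim() consumes follows the Wikibase JSON data model. Two abridged examples (the property IDs and values are picked for illustration, and the `hash` field is omitted):

```
# an item-valued snak: translateClaim() reads the numeric-id and maps it to a local QID
item_snak = {
    'snaktype': 'value',
    'property': 'P31',
    'datatype': 'wikibase-item',
    'datavalue': {
        'value': {'entity-type': 'item', 'numeric-id': 5, 'id': 'Q5'},
        'type': 'wikibase-entityid',
    },
}

# a time-valued snak: the value fields are passed to pywikibot.WbTime.fromTimestr()
time_snak = {
    'snaktype': 'value',
    'property': 'P569',
    'datatype': 'time',
    'datavalue': {
        'value': {
            'time': '+1952-03-11T00:00:00Z',
            'timezone': 0, 'before': 0, 'after': 0,
            'precision': 11,  # 11 = day precision
            'calendarmodel': 'http://www.wikidata.org/entity/Q1985727',
        },
        'type': 'time',
    },
}
```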
    # compare two claims together with their qualifiers and references
    def compare_claim_with_qualifiers_and_references(self, wikidata_claim, wikibase_claim, translate):
        # compare the mainsnak
        found_equal_value = False
        (claim_found, main_claim_found_equal_value) = self.compare_claim(wikidata_claim.get('mainsnak'),
                                                                         wikibase_claim.get('mainsnak'), translate)
        # compare the qualifiers: every Wikidata qualifier must match at least one Wikibase qualifier
        qualifiers_equal = True
        if main_claim_found_equal_value and ('qualifiers' in wikidata_claim) and ('qualifiers' in wikibase_claim):
            for q1 in wikidata_claim.get('qualifiers'):
                for q_wikidata in wikidata_claim.get('qualifiers').get(q1):
                    qualifier_equal = False
                    # print("Passing here .... ", q_wikidata)
                    for q2 in wikibase_claim.get('qualifiers'):
                        for q_wikibase in wikibase_claim.get('qualifiers').get(q2):
                            wikidata_propertyId = q_wikidata.get('property')
                            if self.id.contains_id(wikidata_propertyId):
                                (qualifier_claim_found, qualifier_claim_found_equal_value) = self.compare_claim(
                                    q_wikidata, q_wikibase, translate)
                                if qualifier_claim_found_equal_value:
                                    qualifier_equal = True
                    if not qualifier_equal:
                        qualifiers_equal = False
        # if exactly one side has qualifiers, the claims cannot be equal
        if main_claim_found_equal_value and (
                ('qualifiers' in wikidata_claim) != ('qualifiers' in wikibase_claim)):
            qualifiers_equal = False

        # compare the references: every Wikidata reference snak must match at least one Wikibase reference snak
        references_equal = True
        # print(wikidata_claim.get('references'))
        # print(wikibase_claim.get('references'))
        if ('references' in wikidata_claim) and ('references' in wikibase_claim):
            # if len(wikidata_claim.get('references')) == len(wikibase_claim.get('references')):
            for i in range(0, len(wikidata_claim.get('references'))):
                for q1 in wikidata_claim.get('references')[i].get('snaks'):
                    for q_wikidata in wikidata_claim.get('references')[i].get('snaks').get(q1):
                        reference_equal = False
                        for snak in wikibase_claim.get('references'):
                            for q2 in snak.get('snaks'):
                                for q_wikibase in snak.get('snaks').get(q2):
                                    wikidata_propertyId = q_wikidata.get('property')
                                    if self.id.contains_id(wikidata_propertyId):
                                        (references_claim_found,
                                         references_claim_found_equal_value) = self.compare_claim(
                                            q_wikidata, q_wikibase, translate)
                                        if references_claim_found_equal_value:
                                            reference_equal = True
                        if not reference_equal:
                            references_equal = False
        # if exactly one side has references, the claims cannot be equal
        if ('references' in wikidata_claim) != ('references' in wikibase_claim):
            references_equal = False
        if main_claim_found_equal_value and qualifiers_equal and references_equal \
                and wikidata_claim.get('rank') == wikibase_claim.get('rank'):
            found_equal_value = True
        # the statement to import is "more accurate" when it carries references or
        # qualifiers that the existing local statement lacks
        more_accurate = False
        if main_claim_found_equal_value and ('references' not in wikibase_claim) and (
                'qualifiers' not in wikibase_claim) and (
                ('references' in wikidata_claim and len(wikidata_claim.get('references')) > 0) or (
                'qualifiers' in wikidata_claim and len(wikidata_claim.get('qualifiers')) > 0)):
            more_accurate = True
        return claim_found, found_equal_value, more_accurate

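Stripped of the surrounding bookkeeping, the qualifier and reference loops above implement a single rule: a statement only counts as equal if every Wikidata snak finds at least one equal counterpart on the Wikibase side (extra Wikibase-only snaks are caught by the presence checks, not by the loops). A minimal sketch of that rule, with `snaks_equal` standing in for the compare_claim() call:

```
def all_snaks_matched(wikidata_snaks, wikibase_snaks, snaks_equal):
    # every Wikidata snak needs at least one equal Wikibase snak
    return all(any(snaks_equal(wd, wb) for wb in wikibase_snaks)
               for wd in wikidata_snaks)

assert all_snaks_matched([1, 2], [2, 1, 3], lambda a, b: a == b)
assert not all_snaks_matched([1, 4], [1, 2], lambda a, b: a == b)
```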
    def check_claim_was_not_deleted_locally(self, wikibase_repo, revisions, wikidata_claim):
        """
        Check whether the given claim existed in the wikibase at some point in time
        and was deleted by a local user (i.e. not by the updater bot itself).
        Returns True if the claim is safe to (re-)import, i.e. it was NOT deleted
        by a local user, and False otherwise.
        """
        found = False
        # check in the revisions whether the claim existed at some point in time
        k = 0
        for i in range(0, len(revisions)):
            if not found:
                item_revision = self.json_to_item(wikibase_repo, revisions[i]['text'])
                claims = item_revision["claims"]

                for claim_key, claim in claims.items():

                    for c_revision in claim:
                        (claim_found, found_equal_value) = self.compare_claim(
                            wikidata_claim.get('mainsnak'),
                            c_revision.toJSON().get('mainsnak'), False)

                        if found_equal_value:
                            found = True
                            k = i
        if found:
            if k == 0:
                # the claim is still present in the latest revision
                return True
            else:
                # the claim disappeared in the next newer revision: it was deleted
                # locally if that revision was not made by the updater bot
                if revisions[k - 1]["user"].lower() != str(user_config.usernames['my']['my']).lower():
                    return False
                else:
                    return True
        else:
            return True

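The method above walks the history from the newest revision down: a claim was deleted locally exactly when it last appeared in some revision k > 0 and the next newer revision (index k - 1) was saved by someone other than the updater bot. A compact sketch over plain dicts, newest revision first, with `claim_in_revision` standing in for the compare_claim() walk:

```
def safe_to_reimport(revisions, claim_in_revision, bot_user):
    """Mirror of check_claim_was_not_deleted_locally() over plain dicts."""
    for k, revision in enumerate(revisions):  # revisions[0] is the newest
        if claim_in_revision(revision):
            if k == 0:
                return True  # the claim is still there
            # the claim vanished in revision k-1: only re-import it if the
            # bot itself removed it there
            return revisions[k - 1]['user'].lower() == bot_user.lower()
    return True  # the claim never existed locally, so nothing was deleted

revs = [{'user': 'Alice', 'claims': []},
        {'user': 'WikidataUpdater', 'claims': ['c1']}]
assert not safe_to_reimport(revs, lambda r: 'c1' in r['claims'], 'WikidataUpdater')
```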
    # change the claims
    def change_claims(self, wikidata_item, wikibase_item):
        # check which claims are in the wikibase and in wikidata with the same
        # property but a different value, and delete them
        claims_to_remove = []
        claim_more_accurate = []
        for wikibase_claims in wikibase_item.claims:
            for wikibase_c in wikibase_item.claims.get(wikibase_claims):
                # print("Trying to find this claim ", wikibase_c)
                alreadyFound = False
                wikibase_claim = wikibase_c.toJSON()
                wikibase_propertyId = wikibase_claim.get('mainsnak').get('property')
                found = False
                found_equal_value = False
                # tells whether the statement to import is better than the existing
                # one, i.e. it has references and qualifiers for the fact
                found_more_accurate = False
                for claims in wikidata_item.claims:
                    for c in wikidata_item.claims.get(claims):
                        wikidata_claim = c.toJSON()
                        wikidata_property_id = wikidata_claim.get('mainsnak').get('property')
                        # if the property is not mapped, the claim cannot be in the
                        # wikibase and in wikidata at the same time
                        if self.id.contains_id(wikidata_property_id):
                            if self.id.get_id(wikidata_property_id) == wikibase_propertyId:
                                # print(wikidata_claim, "---", wikibase_claim)
                                (found_here, found_equal_value_here,
                                 more_accurate_here) = self.compare_claim_with_qualifiers_and_references(
                                    wikidata_claim, wikibase_claim, True)
                                # print('Result ', found_here, found_equal_value_here, more_accurate_here)
                                if found_here:
                                    found = True
                                if found_equal_value and found_equal_value_here:
                                    alreadyFound = True
                                if found_equal_value_here:
                                    found_equal_value = True
                                    found_more_accurate = more_accurate_here

                if found and not found_equal_value:
                    claims_to_remove.append(wikibase_c)
                    claim_more_accurate.append(found_more_accurate)
                if alreadyFound:
                    claims_to_remove.append(wikibase_c)
                    claim_more_accurate.append(found_more_accurate)
                    print("This claim is deleted, it's a duplicate ", wikibase_claim)

        # check that the claims to delete were added by the Wikidata updater;
        # if not, do not delete them
        # get all the edit history
        not_remove = []
        revisions_tmp = wikibase_item.revisions(content=True)
        revisions = []
        # the revisions_tmp generator cannot be indexed, so materialize it first
        for h in revisions_tmp:
            revisions.append(h)
        is_only_wikidata_updater_user = True
        # if only the wikidata updater made changes, then it is for sure a deletion in wikidata
        for revision in revisions:
            # print(revision['user'])
            if revision['user'].lower() != str(user_config.usernames['my']['my']).lower():
                is_only_wikidata_updater_user = False
                break
        # print("is_only_wikidata_updater_user", is_only_wikidata_updater_user)
        claims_found_in_revisions = []
        print(len(claims_to_remove))
        if not is_only_wikidata_updater_user:
            for i in range(0, len(claims_to_remove)):
                claimToRemove = claims_to_remove[i]
                # go through the history and find the edit where the claim was
                # added, and the user that made that edit
                # (if the new claim is more accurate, the existing one is replaced anyway)
                if not claim_more_accurate[i]:
                    edit_where_claim_was_added = len(revisions) - 1
                    print(len(revisions))
                    for j in range(0, len(revisions)):
                        # print("new revision ", revisions[j]['user'])
                        item_revision = self.json_to_item(self.wikibase_repo, revisions[j]['text'])
                        found = False
                        for claims_revision in item_revision['claims']:
                            if not found:
                                for c_revision in item_revision['claims'].get(claims_revision):
                                    if not found:
                                        (found_here, found_equal_value_here,
                                         more_accurate) = self.compare_claim_with_qualifiers_and_references(
                                            claimToRemove.toJSON(), c_revision.toJSON(), False)
                                        # print(claimToRemove.toJSON(), "----", c_revision.toJSON())
                                        if found_equal_value_here:
                                            found = True
                        if not found:
                            edit_where_claim_was_added = j - 1
                            break

                    # print("User that added this claim ", revisions[edit_where_claim_was_added]['user'])
                    if revisions[edit_where_claim_was_added]['user'].lower() != self.appConfig.get(
                            'wikibase', 'user').lower():
                        not_remove.append(claimToRemove)
        for c in not_remove:
            claims_to_remove.remove(c)
        print("claimsToRemove ", claims_to_remove)
        if len(claims_to_remove) > 0:
            for claimsToRemoveChunk in chunks(claims_to_remove, 50):
                wikibase_item.get()
                wikibase_item.removeClaims(claimsToRemoveChunk,
                                           summary="Removing these statements since they changed in Wikidata")
        # check which claims are in wikidata and not in the wikibase, and import them
        # re-fetch the wikibase entity since some statements may have been deleted
        if wikibase_item.getID().startswith("Q"):
            wikibase_item = pywikibot.ItemPage(self.wikibase_repo, wikibase_item.getID())
        else:
            wikibase_item = pywikibot.PropertyPage(self.wikibase_repo, wikibase_item.getID())
        wikibase_item.get()
        new_claims = []
        for claims in wikidata_item.claims:
            for c in wikidata_item.claims.get(claims):
                wikidata_claim = c.toJSON()
                found_equal_value = False
                wikidata_property_id = wikidata_claim.get('mainsnak').get('property')
                print(wikidata_property_id)
                if wikibase_item.getID().startswith("Q") or wikibase_item.getID().startswith("P"):
                    for wikibase_claims in wikibase_item.claims:
                        for wikibase_c in wikibase_item.claims.get(wikibase_claims):
                            wikibase_claim = wikibase_c.toJSON()
                            if self.id.contains_id(wikidata_property_id):
                                (claim_found, claim_found_equal_value,
                                 more_accurate) = self.compare_claim_with_qualifiers_and_references(
                                    wikidata_claim, wikibase_claim, True)
                                if claim_found_equal_value:
                                    found_equal_value = True
                print(found_equal_value)
                if not found_equal_value:
                    # print("This claim is added ", wikidata_claim)
                    # import the property if it does not exist
                    if wikidata_claim.get('mainsnak').get('snaktype') == 'value':
                        # the claim is added
                        claim = self.translateClaim(wikidata_claim.get('mainsnak'))
                        if claim is not None:
                            claim.setRank(wikidata_claim.get('rank'))
                            if 'qualifiers' in wikidata_claim:
                                for key in wikidata_claim.get('qualifiers'):
                                    for old_qualifier in wikidata_claim.get('qualifiers').get(key):
                                        new_qualifier = self.translateClaim(old_qualifier)
                                        if new_qualifier is not None:
                                            claim.addQualifier(new_qualifier)
                            if 'references' in wikidata_claim:
                                for snak in wikidata_claim.get('references'):
                                    for key in snak.get('snaks'):
                                        new_references = []
                                        for old_reference in snak.get('snaks').get(key):
                                            # print('old', old_reference)
                                            new_reference = self.translateClaim(old_reference)
                                            # print(new_reference)
                                            # translation can fail if the object entity
                                            # has no label in any given language
                                            if new_reference is not None:
                                                new_references.append(new_reference)
                                        if len(new_references) > 0:
                                            claim.addSources(new_references)
                            new_claims.append(claim.toJSON())
                        else:
                            print('The translated claim is None ', wikidata_claim.get('mainsnak'))
                    elif wikidata_claim.get('mainsnak').get('snaktype') == 'novalue':
                        print("Claims with no value are not implemented yet")
                    else:
                        print('This should not happen ', wikidata_claim.get('mainsnak'))

        if len(new_claims) > 0:
            # exclude the claims that were deleted locally
            if not is_only_wikidata_updater_user:
                temp_new_claims = []
                for claim in new_claims:
                    if self.check_claim_was_not_deleted_locally(self.wikibase_repo, revisions, claim):
                        temp_new_claims.append(claim)
                new_claims = temp_new_claims
            # add the claims in chunks
            for claimsToAdd in chunks(new_claims, 20):
                data = {}
                data['claims'] = claimsToAdd
                try:
                    wikibase_item.editEntity(data,
                                             summary="Adding these statements since they were added in Wikidata")
                except (pywikibot.data.api.APIError, pywikibot.exceptions.OtherPageSaveError) as e:
                    print(e)

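Both write phases above batch their edits through the module-level chunks() helper defined at the end of this file: removals go out 50 statements per removeClaims() call, additions 20 per editEntity() call with a `{'claims': [...]}` payload. A quick illustration; the import path assumes the repository root is on `sys.path`:

```
from util.util import chunks  # this module

batch = list(range(7))
assert list(chunks(batch, 3)) == [[0, 1, 2], [3, 4, 5], [6]]

# each 20-claim slice of new_claims becomes one write, roughly:
# wikibase_item.editEntity({'claims': claimsToAdd}, summary="...")
```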
    def wikidata_link(self, wikibase_item, wikidata_item):
        # add a link to wikidata if it does not exist yet
        found = False
        if hasattr(wikibase_item, "claims"):
            for wikibase_claims in wikibase_item.claims:
                for wikibase_c in wikibase_item.claims.get(wikibase_claims):
                    wikibase_claim = wikibase_c.toJSON()
                    wikibase_propertyId = wikibase_claim.get('mainsnak').get('property')
                    if wikibase_propertyId == self.identifier.itemIdentifier:
                        found = True
        if not found:
            claim = pywikibot.page.Claim(self.wikibase_repo, self.identifier.itemIdentifier,
                                         datatype='external-id')
            target = wikidata_item.getID()
            claim.setTarget(target)
            wikibase_item.addClaim(claim)

    def change_item(self, wikidata_item, wikibase_repo, statements):
        try:
            item = wikidata_item.get()
        except pywikibot.exceptions.UnknownSite as e:
            print("There is a problem fetching this entity, this should ideally not occur")
            return
        print("Change Entity ", wikidata_item.getID())
        if not self.id.contains_id(wikidata_item.getID()):
            new_id = self.import_item(wikidata_item)
            wikibase_item = pywikibot.ItemPage(wikibase_repo, new_id)
            wikibase_item.get()
        else:
            print("This entity corresponds to ", self.id.get_id(wikidata_item.getID()))
            wikibase_item = pywikibot.ItemPage(wikibase_repo, self.id.get_id(wikidata_item.getID()))
            wikibase_item.get()
        self.change_labels(wikidata_item, wikibase_item)
        self.change_aliases(wikidata_item, wikibase_item)
        self.change_descriptions(wikidata_item, wikibase_item)
        self.wikidata_link(wikibase_item, wikidata_item)
        if statements:
            self.change_site_links(wikidata_item, wikibase_item)
            self.change_claims(wikidata_item, wikibase_item)
        return wikibase_item

    def change_item_given_id(self, wikidata_item, id, wikibase_repo, statements):
        print("This entity corresponds to ", id)
        wikibase_item = pywikibot.ItemPage(wikibase_repo, id)
        wikibase_item.get()
        self.change_labels(wikidata_item, wikibase_item)
        self.change_aliases(wikidata_item, wikibase_item)
        self.change_descriptions(wikidata_item, wikibase_item)
        self.wikidata_link(wikibase_item, wikidata_item)
        if statements:
            self.change_site_links(wikidata_item, wikibase_item)
            self.change_claims(wikidata_item, wikibase_item)

    def change_property(self, wikidata_item, wikibase_repo, statements):
        print("Change Property ", wikidata_item.getID())
        wikidata_item.get()
        wikibase_item = None
        if not self.id.contains_id(wikidata_item.getID()):
            new_id = self.importProperty(wikidata_item)
            wikibase_item = pywikibot.PropertyPage(wikibase_repo, new_id, datatype=wikidata_item.type)
            wikibase_item.get()
        else:
            print("This property corresponds to ", self.id.get_id(wikidata_item.getID()))
            wikibase_item = pywikibot.PropertyPage(wikibase_repo, self.id.get_id(wikidata_item.getID()),
                                                   datatype=wikidata_item.type)
            wikibase_item.get()
            new_id = wikibase_item.getID()
        self.change_labels(wikidata_item, wikibase_item)
        self.change_aliases(wikidata_item, wikibase_item)
        self.change_descriptions(wikidata_item, wikibase_item)
        if statements:
            self.change_claims(wikidata_item, wikibase_item)
        return wikibase_item

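change_item() and change_property() are the two entry points of this sync flow. A hypothetical driver (not part of this repository) mirroring what `python import_one.py Q1` or `P31` does would dispatch on the ID prefix; `bot` here stands for the object exposing the methods above:

```
import pywikibot

def sync_entity(bot, wikidata_repo, wikibase_repo, entity_id):
    # properties start with 'P', items with 'Q'
    if entity_id.startswith('P'):
        page = pywikibot.PropertyPage(wikidata_repo, entity_id)
        return bot.change_property(page, wikibase_repo, statements=True)
    page = pywikibot.ItemPage(wikidata_repo, entity_id)
    return bot.change_item(page, wikibase_repo, statements=True)
```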
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

--------------------------------------------------------------------------------