├── LICENSE.txt ├── PyLOD ├── PyLOD.py └── __init__.py └── README.md /LICENSE.txt: -------------------------------------------------------------------------------- 1 | PyLOD is released under the W3C® SOFTWARE NOTICE AND LICENSE. 2 | 3 | This work (and included software, documentation such as READMEs, or other related items) is being provided by the copyright holders under the following license. By obtaining, using and/or copying this work, you (the licensee) agree that you have read, understood, and will comply with the following terms and conditions. 4 | 5 | Permission to copy, modify, and distribute this software and its documentation, with or without modification, for any purpose and without fee or royalty is hereby granted, provided that you include the following on ALL copies of the software and documentation or portions thereof, including modifications: 6 | 7 | 1. The full text of this NOTICE in a location viewable to users of the redistributed or derivative work. 8 | 2. Any pre-existing intellectual property disclaimers, notices, or terms and conditions. If none exist, the W3C Software Short Notice should be included (hypertext is preferred, text is permitted) within the body of any redistributed or derivative code. 9 | 3. Notice of any changes or modifications to the files, including the date changes were made. (We recommend you provide URIs to the location from which the code is derived.) 10 | 11 | THIS SOFTWARE AND DOCUMENTATION IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS. 12 | 13 | COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE SOFTWARE OR DOCUMENTATION. 
14 | 15 | The name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to the software without specific, written prior permission. Title to copyright in this software and any associated documentation will at all times remain with copyright holders. 16 | 17 | See also http://www.w3.org/Consortium/Legal/copyright-software for further details 18 | -------------------------------------------------------------------------------- /PyLOD/PyLOD.py: -------------------------------------------------------------------------------- 1 | """ 2 | PyLOD - A Python wrapper for exposing Linked Open Data from public SPARQL-served endpoints. 3 | Version 0.1 4 | 5 | Official webpage: http://pmitzias.com/PyLOD 6 | Documentation: http://pmitzias.com/PyLOD/docs.html 7 | 8 | Created by Panos Mitzias (http://www.pmitzias.com), Efstratios Kontopoulos (http://www.stratoskontopoulos.com) 9 | Powered by CERTH/MKLab (http://mklab.iti.gr) 10 | """ 11 | 12 | from SPARQLWrapper import SPARQLWrapper, JSON 13 | import re 14 | import sys 15 | 16 | 17 | class PyLOD: 18 | def __init__(self, endpoint_dictionary=None, namespaces_dictionary=None): 19 | """ 20 | The PyLOD class constructor. 21 | :param endpoint_dictionary: Optional argument for user-defined SPARQL-served LOD endpoints given as a dictionary, where the keys are the endpoint names and the values are the endpoint URLs. 22 | :param namespaces_dictionary: Optional argument for user-defined namespaces given as a dictionary, where the keys are the namespace prefixes and the values are the namespace URLs. 23 | """ 24 | 25 | class Endpoints: 26 | def __init__(self, endpoint_dictionary=None): 27 | """ 28 | The Endpoints class constructor. 29 | :param endpoint_dictionary: Optional argument for user-defined SPARQL-served LOD endpoints given as a dictionary, where the keys are the endpoint names and the values are the endpoint URLs. 
30 | """ 31 | 32 | self.dictionary = {} 33 | self.set_endpoints(endpoint_dictionary) 34 | 35 | def set_endpoints(self, endpoint_dictionary=None): 36 | """ 37 | Sets the dictionary of endpoints to be queried. If the argument endpoint_dictionary is not provided, a set of popular endpoints (e.g. DBpedia) will be used. 38 | :param endpoint_dictionary: A user-defined dictionary of endpoints where the keys are the endpoint names and the key values are the corresponding endpoint URLs. 39 | """ 40 | 41 | if endpoint_dictionary is None: 42 | # Set popular endpoints 43 | self.dictionary = { 44 | "DBpedia": "http://dbpedia.org/sparql", 45 | "GeoLinkedData": "http://linkedgeodata.org/sparql" 46 | } 47 | 48 | # If a user-defined endpoint dictionary was given as argument 49 | elif isinstance(endpoint_dictionary, dict): 50 | self.dictionary = {} 51 | 52 | # For each given endpoint 53 | for key in endpoint_dictionary: 54 | try: 55 | # If given value is string 56 | if isinstance(endpoint_dictionary[key], str): 57 | self.dictionary[key] = endpoint_dictionary[key] 58 | except Exception as e: 59 | print("PyLOD.Endpoints.set_endpoints() - Error appending provided endpoint to endpoints dictionary") 60 | print(e) 61 | 62 | else: 63 | self.dictionary = {} 64 | 65 | def get_endpoints(self): 66 | """ 67 | :return: The dictionary of currently set endpoints. 68 | """ 69 | 70 | return self.dictionary 71 | 72 | class Namespaces: 73 | def __init__(self, namespace_dictionary): 74 | """ 75 | The Namespaces class constructor. 76 | :param namespace_dictionary: Optional argument for a user-defined dictionary of namespaces where the keys are the desired prefixes and the key values are the corresponding namespace URLs. 77 | """ 78 | self.dictionary = self.set_namespaces(namespace_dictionary) 79 | 80 | def set_namespaces(self, namespace_dictionary=None): 81 | """ 82 | Returns a dictionary of the most popular namespaces (rdf, rdfs, etc.). 
The argument namespace_dictionary may contain a dictionary of user-defined namespaces. 83 | :param namespace_dictionary: A user-defined dictionary of namespaces where the keys are the desired prefixes and the key values are the corresponding namespace URLs 84 | :return: A dictionary of namespaces. 85 | """ 86 | 87 | # Popular namespaces 88 | namespaces = { 89 | "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", 90 | "rdfs": "http://www.w3.org/2000/01/rdf-schema#", 91 | "prov": "http://www.w3.org/ns/prov#", 92 | "foaf": "http://xmlns.com/foaf/0.1/", 93 | "xml": "http://www.w3.org/2001/XMLSchema#", 94 | "owl": "http://www.w3.org/2002/07/owl#", 95 | "db": "http://dbpedia.org/", 96 | "dbo": "http://dbpedia.org/ontology/", 97 | "dbp": "http://dbpedia.org/property/" 98 | } 99 | 100 | # If a user-defined namespace dictionary was given as argument 101 | if (namespace_dictionary is not None) and (isinstance(namespace_dictionary, dict)): 102 | 103 | # For each given namespace prefix 104 | for prefix in namespace_dictionary: 105 | 106 | try: 107 | namespaces[prefix] = namespace_dictionary[prefix] 108 | except Exception as e: 109 | print("PyLOD.Namespaces.set_namespaces() - Error appending provided namespace to namespace dictionary") 110 | print(e) 111 | 112 | return namespaces 113 | 114 | def get_namespaces(self): 115 | """ 116 | :return: The dictionary of currently set namespaces. 
117 | """ 118 | return self.dictionary 119 | 120 | def get_namespaces_string(self): 121 | """ 122 | Concatenates all namespaces in the namespace dictionary into a string, in order to be used in SPARQL queries 123 | :return: A string that complies with W3C SPARQL definition of namespaces 124 | """ 125 | 126 | namespaces_string = '' 127 | 128 | for prefix in self.dictionary: 129 | try: 130 | namespaces_string += "PREFIX %s: <%s>\n" % (prefix, self.dictionary[prefix]) 131 | except Exception as e: 132 | print("PyLOD.Namespaces.get_namespaces_string() - Error while generating namespaces string from namespace dictionary") 133 | print(e) 134 | 135 | return namespaces_string 136 | 137 | class SPARQL: 138 | def __init__(self, pylod): 139 | """ 140 | The SPARQL class constructor. 141 | :param pylod: SPARQL's parent class object (PyLOD object). 142 | """ 143 | 144 | self.pylod = pylod 145 | 146 | def execute_select(self, endpoint_url, query, limit=None): 147 | """ 148 | Uses the SPARQLWrapper module to execute a SPARQL query against the given endpoint. 149 | :param endpoint_url: A URL of the SPARQL-served endpoint to be queried. 150 | :param query: The desired SPARQL query. 151 | :param limit: Optional argument (integer) to limit query results. 152 | :return: The query results as a dictionary (JSON format). 
153 | """ 154 | 155 | if (not self.pylod.is_valid_string(endpoint_url)) or (not self.pylod.is_valid_string(query)): 156 | print("PyLOD.SPARQL.execute_select() - Invalid arguments") 157 | return False 158 | 159 | # Connect to the endpoint 160 | sparql = SPARQLWrapper(endpoint_url) 161 | 162 | # Add prefixes to query 163 | query = self.pylod.namespaces.get_namespaces_string() + query 164 | 165 | # Add limit to query 166 | if (limit is not None) and (isinstance(limit, int)): 167 | query = query + ' LIMIT ' + str(limit) 168 | 169 | # Set query 170 | try: 171 | sparql.setQuery(query) 172 | # In case it is not unicode 173 | except TypeError: 174 | sparql.setQuery(unicode(query)) 175 | 176 | # Set output to JSON 177 | sparql.setReturnFormat(JSON) 178 | 179 | try: 180 | # Execute query and return results 181 | return sparql.query().convert()['results']['bindings'] 182 | except Exception as e: 183 | # print("PyLOD.SPARQL.execute_select() - Error while executing query to ", endpoint_url) 184 | # print(e) 185 | return False 186 | 187 | def execute_select_to_all_endpoints(self, query, limit_per_endpoint=None): 188 | """ 189 | Executes the given query against all endpoints in the endpoint dictionary. 190 | :param query: The desired SPARQL query. 191 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 192 | :return: A dictionary with the query results per endpoint. 
193 | """ 194 | 195 | if not self.pylod.is_valid_string(query) or (limit_per_endpoint is not None and not isinstance(limit_per_endpoint, int)): 196 | print("PyLOD.SPARQL.execute_select_to_all_endpoints() - Invalid arguments") 197 | return False 198 | 199 | results = {} 200 | 201 | # Get the endpoints dictionary 202 | endpoints = self.pylod.endpoints.get_endpoints() 203 | 204 | # For each endpoint 205 | for endpoint_name in endpoints: 206 | 207 | sys.stdout.write("Querying \033[95m" + str(endpoint_name) + "\033[0m | Endpoint status:") 208 | 209 | # If endpoint is reachable 210 | if self.pylod.sparql.is_active_endpoint(endpoint_url=endpoints[endpoint_name]): 211 | 212 | sys.stdout.write("\033[92m ACTIVE \033[0m") 213 | 214 | try: 215 | results[endpoint_name] = self.pylod.sparql.execute_select( 216 | endpoint_url=endpoints[endpoint_name], 217 | query=query, 218 | limit=limit_per_endpoint) 219 | 220 | if results[endpoint_name]: 221 | sys.stdout.write("| Results:\033[92m RETRIEVED \033[0m \n") 222 | else: 223 | sys.stdout.write("| Results:\033[91m NOT RETRIEVED \033[0m \n") 224 | 225 | except Exception as e: 226 | print("PyLOD.SPARQL.execute_select_to_all_endpoints() - Error while executing query to ", endpoint_name) 227 | print(e) 228 | else: 229 | sys.stdout.write("\033[91m UNREACHABLE \033[0m") 230 | sys.stdout.write("| Results: \033[91m NOT RETRIEVED \033[0m \n") 231 | 232 | results[endpoint_name] = None 233 | 234 | sys.stdout.flush() 235 | 236 | return results 237 | 238 | def is_active_endpoint(self, endpoint_url): 239 | """ 240 | Checks if the given endpoint URL corresponds to an active SPARQL-served endpoint. 241 | :param endpoint_url: The endpoint URL to check. 242 | :return: True if endpoint is active, False if endpoint is not reachable. 
243 | """ 244 | 245 | # Try to make a selection 246 | if not self.execute_select(endpoint_url, 'SELECT ?x WHERE {?x ?y ?z}', limit=1): 247 | return False 248 | else: 249 | return True 250 | 251 | class Expose: 252 | def __init__(self, pylod): 253 | """ 254 | The Expose class constructor. 255 | :param pylod: Expose's parent class object (PyLOD object). 256 | """ 257 | 258 | self.pylod = pylod 259 | 260 | def classes(self, limit_per_endpoint=None): 261 | """ 262 | Exposes URIs of classes. 263 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 264 | :return: The query results as a dictionary (JSON format). 265 | """ 266 | 267 | # Execute query 268 | return self.pylod.sparql.execute_select_to_all_endpoints( 269 | query=""" 270 | SELECT DISTINCT (?class AS ?uri) 271 | WHERE { 272 | ?class rdf:type owl:Class . 273 | } 274 | """, 275 | limit_per_endpoint=limit_per_endpoint) 276 | 277 | def sub_classes(self, super_class, limit_per_endpoint=None): 278 | """ 279 | Exposes URIs of entities that are sub classes of the given class. 280 | :param super_class: The desired class to expose its sub classes. 281 | Should be given either with a known prefix (e.g. "dbo:Artist") or with the complete URI (e.g. "http://dbpedia.org/ontology/Artist"). 282 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 283 | :return: The query results as a dictionary (JSON format). 284 | """ 285 | 286 | # Validate given argument 287 | if self.pylod.is_valid_string(super_class): 288 | if self.pylod.is_url(super_class): 289 | super_class = "<" + super_class + ">" 290 | 291 | else: 292 | print("PyLOD.Expose.sub_classes() - Invalid argument") 293 | return False 294 | 295 | # Execute query 296 | return self.pylod.sparql.execute_select_to_all_endpoints( 297 | query=""" 298 | SELECT DISTINCT (?subclass AS ?uri) 299 | WHERE { 300 | ?subclass rdfs:subClassOf %s . 
301 | } 302 | """ % (super_class,), 303 | limit_per_endpoint=limit_per_endpoint) 304 | 305 | def super_classes(self, sub_class, limit_per_endpoint=None): 306 | """ 307 | Exposes URIs of entities that are super classes of the given class. 308 | :param sub_class: The desired class to expose its super classes. 309 | Should be given either with a known prefix (e.g. "dbo:Artist") or with the complete URI (e.g. "http://dbpedia.org/ontology/Artist"). 310 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 311 | :return: The query results as a dictionary (JSON format). 312 | """ 313 | # Validate given argument 314 | if self.pylod.is_valid_string(sub_class): 315 | if self.pylod.is_url(sub_class): 316 | sub_class = "<" + sub_class + ">" 317 | 318 | else: 319 | print("PyLOD.Expose.super_classes() - Invalid argument") 320 | return False 321 | 322 | # Execute query 323 | return self.pylod.sparql.execute_select_to_all_endpoints( 324 | query=""" 325 | SELECT DISTINCT (?superclass AS ?uri) 326 | WHERE { 327 | %s rdfs:subClassOf ?superclass . 328 | } 329 | """ % (sub_class,), 330 | limit_per_endpoint=limit_per_endpoint) 331 | 332 | def equivalent_classes(self, cls, limit_per_endpoint=None): 333 | """ 334 | Exposes URIs of entities that are equivalent classes of the given class. 335 | :param cls: The desired class to expose its equivalent classes. 336 | Should be given either with a known prefix (e.g. "dbo:Artist") or with the complete URI (e.g. "http://dbpedia.org/ontology/Artist"). 337 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 338 | :return: The query results as a dictionary (JSON format). 
339 | """ 340 | 341 | # Validate given argument 342 | if self.pylod.is_valid_string(cls): 343 | if self.pylod.is_url(cls): 344 | cls = "<" + cls + ">" 345 | else: 346 | print("PyLOD.Expose.equivalent_classes() - Invalid argument") 347 | return False 348 | 349 | # Execute query 350 | return self.pylod.sparql.execute_select_to_all_endpoints( 351 | query=""" 352 | SELECT DISTINCT (?equivalent_class AS ?uri) 353 | WHERE { 354 | ?equivalent_class owl:equivalentClass %s . 355 | } 356 | """ % (cls,), 357 | limit_per_endpoint=limit_per_endpoint) 358 | 359 | def disjoint_classes(self, cls, limit_per_endpoint=None): 360 | """ 361 | Exposes URIs of entities that are disjoint classes of the given class. 362 | :param cls: The desired class to expose its disjoint classes. 363 | Should be given either with a known prefix (e.g. "dbo:Artist") or with the complete URI (e.g. "http://dbpedia.org/ontology/Artist"). 364 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 365 | :return: The query results as a dictionary (JSON format). 366 | """ 367 | 368 | # Validate given argument 369 | if self.pylod.is_valid_string(cls): 370 | if self.pylod.is_url(cls): 371 | cls = "<" + cls + ">" 372 | else: 373 | print("PyLOD.Expose.disjoint_classes() - Invalid argument") 374 | return False 375 | 376 | # Execute query 377 | return self.pylod.sparql.execute_select_to_all_endpoints( 378 | query=""" 379 | SELECT DISTINCT (?disjoint_class AS ?uri) 380 | WHERE { 381 | ?disjoint_class owl:disjointWith %s . 382 | } 383 | """ % (cls,), 384 | limit_per_endpoint=limit_per_endpoint) 385 | 386 | def sub_properties(self, super_property, limit_per_endpoint=None): 387 | """ 388 | Exposes URIs of properties that are sub properties of the given property. 389 | :param super_property: The desired property to expose its sub properties. 390 | Should be given either with a known prefix (e.g. "rdfs:label") or with the complete URI (e.g. 
"https://www.w3.org/2000/01/rdf-schema#label"). 391 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 392 | :return: The query results as a dictionary (JSON format). 393 | """ 394 | 395 | # Validate given argument 396 | if self.pylod.is_valid_string(super_property): 397 | if self.pylod.is_url(super_property): 398 | super_property = "<" + super_property + ">" 399 | 400 | else: 401 | print("PyLOD.Expose.sub_properties() - Invalid argument") 402 | return False 403 | 404 | # Execute query 405 | return self.pylod.sparql.execute_select_to_all_endpoints( 406 | query=""" 407 | SELECT DISTINCT (?subproperty AS ?uri) 408 | WHERE { 409 | ?subproperty rdfs:subPropertyOf %s . 410 | } 411 | """ % (super_property,), 412 | limit_per_endpoint=limit_per_endpoint) 413 | 414 | def super_properties(self, sub_property, limit_per_endpoint=None): 415 | """ 416 | Exposes URIs of properties that are super properties of the given property. 417 | :param sub_property: The desired property to expose its super properties. 418 | Should be given either with a known prefix (e.g. "rdfs:label") or with the complete URI (e.g. "https://www.w3.org/2000/01/rdf-schema#label"). 419 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 420 | :return: The query results as a dictionary (JSON format). 421 | """ 422 | 423 | # Validate given argument 424 | if self.pylod.is_valid_string(sub_property): 425 | if self.pylod.is_url(sub_property): 426 | sub_property = "<" + sub_property + ">" 427 | 428 | else: 429 | print("PyLOD.Expose.super_properties() - Invalid argument") 430 | return False 431 | 432 | # Execute query 433 | return self.pylod.sparql.execute_select_to_all_endpoints( 434 | query=""" 435 | SELECT DISTINCT (?superproperty AS ?uri) 436 | WHERE { 437 | %s rdfs:subPropertyOf ?superproperty . 
438 | } 439 | """ % (sub_property,), 440 | limit_per_endpoint=limit_per_endpoint) 441 | 442 | def subjects(self, predicate, object, limit_per_endpoint=None): 443 | """ 444 | Exposes entities found as subjects with the given predicate and object, within the scope of the triple pattern Subject-Predicate-Object. 445 | :param predicate: The desired predicate (either as a full URI or with a known namespace) 446 | :param object: The desired object (either as a full URI or with a known namespace) 447 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 448 | :return: The query results as a dictionary (JSON format). 449 | """ 450 | 451 | # Validate given arguments 452 | if self.pylod.is_valid_string(predicate) and self.pylod.is_valid_string(object): 453 | if self.pylod.is_url(predicate): 454 | predicate = "<" + predicate + ">" 455 | if self.pylod.is_url(object): 456 | object = "<" + object + ">" 457 | 458 | else: 459 | print("PyLOD.Expose.subjects() - Invalid arguments") 460 | return False 461 | 462 | # Execute query 463 | return self.pylod.sparql.execute_select_to_all_endpoints( 464 | query=""" 465 | SELECT DISTINCT (?subject AS ?uri) 466 | WHERE { 467 | ?subject %s %s . 468 | } 469 | """ % (predicate, object), 470 | limit_per_endpoint=limit_per_endpoint) 471 | 472 | def predicates(self, subject, object, limit_per_endpoint=None): 473 | """ 474 | Exposes entities found as predicates with the given subject and object, within the scope of the triple pattern Subject-Predicate-Object. 475 | :param subject: The desired subject (either as a full URI or with a known namespace) 476 | :param object: The desired object (either as a full URI or with a known namespace) 477 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 478 | :return: The query results as a dictionary (JSON format). 
479 | """ 480 | 481 | # Validate given arguments 482 | if self.pylod.is_valid_string(subject) and self.pylod.is_valid_string(object): 483 | if self.pylod.is_url(subject): 484 | subject = "<" + subject + ">" 485 | if self.pylod.is_url(object): 486 | object = "<" + object + ">" 487 | 488 | else: 489 | print("PyLOD.Expose.predicates() - Invalid arguments") 490 | return False 491 | 492 | # Execute query 493 | return self.pylod.sparql.execute_select_to_all_endpoints( 494 | query=""" 495 | SELECT DISTINCT (?predicate AS ?uri) 496 | WHERE { 497 | %s ?predicate %s . 498 | } 499 | """ % (subject, object), 500 | limit_per_endpoint=limit_per_endpoint) 501 | 502 | def objects(self, subject, predicate, limit_per_endpoint=None): 503 | """ 504 | Exposes entities found as objects with the given subject and predicate, within the scope of the triple pattern Subject-Predicate-Object. 505 | :param subject: The desired subject (either as a full URI or with a known namespace) 506 | :param predicate: The desired predicate (either as a full URI or with a known namespace) 507 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 508 | :return: The query results as a dictionary (JSON format). 509 | """ 510 | 511 | # Validate given arguments 512 | if self.pylod.is_valid_string(subject) and self.pylod.is_valid_string(predicate): 513 | if self.pylod.is_url(subject): 514 | subject = "<" + subject + ">" 515 | if self.pylod.is_url(predicate): 516 | predicate = "<" + predicate + ">" 517 | 518 | else: 519 | print("PyLOD.Expose.objects() - Invalid arguments") 520 | return False 521 | 522 | # Execute query 523 | return self.pylod.sparql.execute_select_to_all_endpoints( 524 | query=""" 525 | SELECT DISTINCT (?object AS ?uri) 526 | WHERE { 527 | %s %s ?object . 
528 | } 529 | """ % (subject, predicate), 530 | limit_per_endpoint=limit_per_endpoint) 531 | 532 | def triples(self, subject=None, predicate=None, object=None, limit_per_endpoint=None): 533 | """ 534 | Exposes triples with the given subject and/or predicate and/or object, within the scope of the triple pattern Subject-Predicate-Object. 535 | If any of the arguments (subject, predicate, object) is not defined (None), then it will act as a variable in the query. 536 | :param subject: Optional argument. If not provided, triples will be returned where the subject is variable. 537 | :param predicate: Optional argument. If not provided, triples will be returned where the predicate is variable. 538 | :param object: Optional argument. If not provided, triples will be returned where the object is variable. 539 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 540 | :return: The query results as a dictionary (JSON format). 541 | """ 542 | 543 | # Validate arguments and initialize not given arguments 544 | 545 | # Subject argument 546 | if subject is None: 547 | subject = "?subject" 548 | elif self.pylod.is_valid_string(subject): 549 | if self.pylod.is_url(subject): 550 | subject = "<" + subject + ">" 551 | else: 552 | print("PyLOD.Expose.triples() - Invalid subject argument") 553 | return False 554 | 555 | # Predicate argument 556 | if predicate is None: 557 | predicate = "?predicate" 558 | elif self.pylod.is_valid_string(predicate): 559 | if self.pylod.is_url(predicate): 560 | predicate = "<" + predicate + ">" 561 | else: 562 | print("PyLOD.Expose.triples() - Invalid predicate argument") 563 | return False 564 | 565 | # Object argument 566 | if object is None: 567 | object = "?object" 568 | elif self.pylod.is_valid_string(object): 569 | if self.pylod.is_url(object): 570 | object = "<" + object + ">" 571 | else: 572 | print("PyLOD.Expose.triples() - Invalid object argument") 573 | return False 574 | 575 | # Execute query 576 | 
return self.pylod.sparql.execute_select_to_all_endpoints( 577 | query=""" 578 | SELECT DISTINCT ?subject ?predicate ?object 579 | WHERE { 580 | %s %s %s . 581 | } 582 | """ % (subject, predicate, object), 583 | limit_per_endpoint=limit_per_endpoint) 584 | 585 | def instances_of_class(self, cls, include_subclasses=False, limit_per_endpoint=None): 586 | """ 587 | Exposes instances of the given class and (optionally) its subclasses. 588 | :param cls: The desired class to be queried for instances. 589 | :param include_subclasses: Optional argument (boolean). If True, instances from cls's subclasses will also be returned. 590 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 591 | :return: The query results as a dictionary (JSON format). 592 | """ 593 | 594 | # Validate given argument 595 | if self.pylod.is_valid_string(cls): 596 | if self.pylod.is_url(cls): 597 | cls = "<" + cls + ">" 598 | 599 | else: 600 | print("PyLOD.Expose.instances_of_class() - Invalid argument") 601 | return False 602 | 603 | # Check if subclasses of cls should be included (property path: a type that is cls or any of its subclasses) 604 | predicate = "rdf:type" 605 | if include_subclasses: 606 | predicate += "/rdfs:subClassOf*" 607 | 608 | # Execute query 609 | return self.pylod.sparql.execute_select_to_all_endpoints( 610 | query=""" 611 | SELECT DISTINCT (?instance AS ?uri) 612 | WHERE { 613 | ?instance %s %s . 614 | } 615 | """ % (predicate, cls,), 616 | limit_per_endpoint=limit_per_endpoint) 617 | 618 | def labels(self, entity, language=None, limit_per_endpoint=None): 619 | """ 620 | Exposes the labels of entities. Optionally, a language tag can be defined. 621 | :param entity: The URI of the entity whose labels should be retrieved 622 | :param language: Optional language parameter as defined in BCP 47. 623 | :param limit_per_endpoint: Optional argument (integer) to limit query results per endpoint. 624 | :return: The query results as a dictionary (JSON format). 
625 | """ 626 | 627 | # Validate given argument 628 | if self.pylod.is_valid_string(entity): 629 | if self.pylod.is_url(entity): 630 | entity = "<" + entity + ">" 631 | 632 | else: 633 | print("PyLOD.Expose.labels() - Invalid argument") 634 | return False 635 | 636 | language_filter = "" 637 | 638 | # Check if a language tag is selected 639 | if language is not None and self.pylod.is_valid_string(language): 640 | language_filter = "FILTER (LANG(?label) = '%s')" % (language,) 641 | 642 | # Execute query 643 | return self.pylod.sparql.execute_select_to_all_endpoints( 644 | query=""" 645 | SELECT DISTINCT ?label 646 | WHERE { 647 | %s rdfs:label ?label . 648 | %s 649 | } 650 | """ % (entity, language_filter,), 651 | limit_per_endpoint=limit_per_endpoint) 652 | 653 | self.endpoints = Endpoints(endpoint_dictionary=endpoint_dictionary) 654 | self.namespaces = Namespaces(namespace_dictionary=namespaces_dictionary) 655 | self.sparql = SPARQL(pylod=self) 656 | self.expose = Expose(pylod=self) 657 | 658 | def is_url(self, text): 659 | """ 660 | Checks if a given string is a URL. 661 | :param text: The string to be tested. 662 | :return: True if URL, False if not a URL. 663 | """ 664 | 665 | regex = re.compile( 666 | r'^(?:http|ftp)s?://' # http:// or https:// 667 | r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain... 668 | r'localhost|' #localhost... 669 | r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip 670 | r'(?::\d+)?' # optional port 671 | r'(?:/?|[/?]\S+)$', re.IGNORECASE) 672 | 673 | try: 674 | return re.match(regex, text) is not None 675 | except Exception as e: 676 | print("PyLOD.is_url() - Invalid argument") 677 | print(e) 678 | return False 679 | 680 | def is_valid_string(self, arg): 681 | """ 682 | Checks if the given argument is a non-empty, non-whitespace string 683 | :param arg: The argument to check. 
684 | :return: True if valid string, False if not 685 | """ 686 | 687 | if isinstance(arg, str) and arg and (not arg.isspace()): 688 | return True 689 | 690 | return False 691 | 692 | if __name__ == '__main__': 693 | print("Please visit http://pmitzias.com/PyLOD/docs.html for usage instructions.") 694 | 695 | 696 | 697 | -------------------------------------------------------------------------------- /PyLOD/__init__.py: -------------------------------------------------------------------------------- 1 | try: 2 | from PyLOD.PyLOD import PyLOD 3 | except ImportError: 4 | from PyLOD import PyLOD 5 | 6 | __author__ = 'Panos Mitzias' 7 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PyLOD 2 | PyLOD is a Python wrapper for exposing Linked Open Data from public SPARQL-served endpoints. It acts as an abstraction layer for the retrieval of structured data, such as classes, properties and individuals, without requiring any knowledge of SPARQL. 3 | 4 | [![Downloads](http://pepy.tech/badge/pylod)](http://pepy.tech/project/pylod) 5 | 6 | ## Getting Started 7 | PyLOD is a minimal module for Python (2.x and 3.x). 8 | 9 | ### Prerequisites 10 | 11 | [SPARQLWrapper](https://rdflib.github.io/sparqlwrapper/) - SPARQLWrapper is a simple Python wrapper around a SPARQL service to remotely execute queries. 12 | 13 | ### Installation 14 | 15 | * #### Manually 16 | 17 | 1. Install [SPARQLWrapper](https://github.com/RDFLib/sparqlwrapper). 18 | 2. Save `PyLOD.py` to your project's directory. 19 | 20 | * #### From PyPI 21 | 22 | ``` 23 | pip install PyLOD 24 | ``` 25 | 26 | ## Usage 27 | **1. Import the PyLOD class and create a PyLOD class object.** 28 | ```python 29 | from PyLOD import PyLOD 30 | pylod = PyLOD() 31 | ``` 32 | 33 | **2.
Provide a dictionary of desired namespaces:** 34 | ```python 35 | my_namespaces = { 36 | "dbo": "http://dbpedia.org/ontology/", 37 | "dbp": "http://dbpedia.org/property/" 38 | } 39 | 40 | pylod = PyLOD(namespaces_dictionary=my_namespaces) 41 | ``` 42 | This step is optional, since PyLOD already incorporates a set of known namespaces; user-defined prefixes are merged into that set. Pass the dictionary to the constructor, because `pylod.namespaces.set_namespaces()` only returns the merged dictionary and does not store it. To get the list of defined namespaces, use this: 43 | 44 | ```python 45 | print(pylod.namespaces.get_namespaces()) 46 | ``` 47 | 48 | **3. Define a dictionary of SPARQL endpoints to be queried:** 49 | ```python 50 | my_endpoints = { 51 | "DBpedia": "http://dbpedia.org/sparql", 52 | "GeoLinkedData": "http://linkedgeodata.org/sparql" 53 | } 54 | 55 | pylod.endpoints.set_endpoints(my_endpoints) 56 | ``` 57 | If no endpoints are defined, PyLOD will use a pre-defined set of known endpoints. To get the list of these endpoints, do this: 58 | 59 | ```python 60 | print(pylod.endpoints.get_endpoints()) 61 | ``` 62 | 63 | **4. Use PyLOD's `expose` functions to retrieve structured data from the endpoints.** 64 | Set the optional argument `limit_per_endpoint` to limit the results per endpoint.
For example: 65 | ```python 66 | # Get entities of type owl:Class 67 | classes = pylod.expose.classes(limit_per_endpoint=100) 68 | 69 | # Get the sub-classes of a specific class 70 | sub_classes = pylod.expose.sub_classes(super_class="dbo:Artist") 71 | 72 | # Get instances of a specific class 73 | instances = pylod.expose.instances_of_class(cls="dbo:Artist", include_subclasses=True, limit_per_endpoint=50) 74 | 75 | # Execute custom SPARQL select query to all endpoints 76 | results = pylod.sparql.execute_select_to_all_endpoints(query="SELECT * WHERE {?s ?p ?o}") 77 | ``` 78 | 79 | ### Expose functions: 80 | * __classes()__ - Returns class entities 81 | * __sub_classes()__ - Returns the sub-classes of a given class 82 | * __super_classes()__ - Returns the super-classes of a given class 83 | * __equivalent_classes()__ - Returns the equivalent classes of a given class 84 | * __disjoint_classes()__ - Returns the disjoint classes of a given class 85 | * __sub_properties()__ - Returns the sub-properties of a given property 86 | * __super_properties()__ - Returns the super-properties of a given property 87 | * __triples()__ - Allows the retrieval of triples within the pattern (subject-predicate-object) 88 | * __subjects()__ - Returns the subjects of a given predicate-object pair 89 | * __predicates()__ - Returns the predicates of a given subject-object pair 90 | * __objects()__ - Returns the objects of a given subject-predicate pair 91 | * __instances_of_class()__ - Returns instances of a given class type 92 | * __labels()__ - Returns labels of a given entity, with an optional language argument 93 | 94 | ### SPARQL functions: 95 | * __execute_select()__ - Allows the execution of a custom SPARQL select query to a given endpoint URL 96 | * __execute_select_to_all_endpoints()__ - Allows the execution of a custom SPARQL select query to all endpoints defined in `pylod.endpoints.get_endpoints()` 97 | * __is_active_endpoint()__ - Checks if a given endpoint URL is alive and 
responds to SPARQL queries 98 | 99 | ## Documentation 100 | [The official webpage](http://pmitzias.com/PyLOD) - [The Docs](http://pmitzias.com/PyLOD/docs.html) 101 | 102 | ## Authors 103 | * [Panos Mitzias](http://pmitzias.com) - Design and development 104 | * [Stratos Kontopoulos](http://stratoskontopoulos.com) - Contribution to the design 105 | 106 | ## Powered by 107 | * [Centre for Research & Technology Hellas - CERTH](https://www.certh.gr/root.en.aspx) 108 | * [Multimedia Knowledge & Social Media Analytics Laboratory - MKLab](http://mklab.iti.gr/) 109 | 110 | ## Applications 111 | PyLOD has been deployed in the following projects: 112 | 113 | * [PERICLES](http://project-pericles.eu/) 114 | * [ROBORDER](http://roborder.eu/) 115 | * [TENSOR](https://tensor-project.eu/) 116 | * [SUITCEYES](http://suitceyes.eu/) 117 | * [beAWARE](https://beaware-project.eu/) 118 | --------------------------------------------------------------------------------
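
## Reading the results

The expose functions and `execute_select_to_all_endpoints()` return a dictionary keyed by endpoint name. Judging from `PyLOD.py`, each value is the list of SPARQL JSON bindings for that endpoint, `False` when the query failed, or `None` when the endpoint was unreachable. The sketch below shows one way to flatten such a result into plain values. The helper `collect_values` and the sample data are hypothetical and not part of PyLOD; the sample is hand-written to mimic the shape of a result, not real endpoint output.

```python
def collect_values(results, variable="uri"):
    """Flatten per-endpoint SPARQL JSON bindings into {endpoint: [values]}.

    Hypothetical helper, not part of PyLOD. Most expose functions alias
    their result variable to ?uri, hence the default.
    """
    values = {}
    for endpoint_name, bindings in results.items():
        # None (unreachable endpoint), False (failed query) and an empty
        # list (no matches) all end up as an empty list here.
        if not bindings:
            values[endpoint_name] = []
            continue
        # Each binding maps a variable name to {"type": ..., "value": ...}
        values[endpoint_name] = [
            row[variable]["value"] for row in bindings if variable in row
        ]
    return values


# Hand-written sample shaped like a PyLOD result; not real endpoint output.
sample_results = {
    "DBpedia": [
        {"uri": {"type": "uri", "value": "http://dbpedia.org/ontology/Painter"}},
        {"uri": {"type": "uri", "value": "http://dbpedia.org/ontology/Actor"}},
    ],
    "GeoLinkedData": None,
}

print(collect_values(sample_results))
```

For `labels()` the variable would be `label` rather than `uri`, and `triples()` returns `subject`, `predicate` and `object` bindings, so the helper would need to be called once per variable.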