├── .gitignore ├── CONFIG.md ├── LICENSE.md ├── README.md ├── _config.yml ├── elsapy ├── __init__.py ├── elsclient.py ├── elsdoc.py ├── elsentity.py ├── elsprofile.py ├── elssearch.py ├── log_util.py └── utils.py ├── exampleProg.py ├── setup.cfg ├── setup.py └── test_elsapy.py /.gitignore: -------------------------------------------------------------------------------- 1 | .cache 2 | config.json 3 | niso_training_prog.py 4 | test_elsapy.py 5 | dump.json 6 | *.pyc 7 | *.log 8 | /data 9 | /test_data 10 | /logs 11 | /dist 12 | /build 13 | /elsapy.egg-info -------------------------------------------------------------------------------- /CONFIG.md: -------------------------------------------------------------------------------- 1 | # configuration 2 | If you are using (or adapting) exampleProg.py, do this: 3 | - In the folder in which exampleProg.py resides, create a file called 'config.json' 4 | - Open 'config.json' in a file editor, and insert the following: 5 | 6 | ``` 7 | { 8 | "apikey": "ENTER_APIKEY_HERE", 9 | "insttoken": "ENTER_INSTTOKEN_HERE_IF_YOU_HAVE_ONE_ELSE_DELETE" 10 | } 11 | ``` 12 | 13 | - Paste your APIkey (obtained from http://dev.elsevier.com) in the right place 14 | - If you don't have a valid insttoken (which you would have received from Elsevier support staff), delete the placeholder text. If you enter a dummy value, your API requests will fail. 15 | 16 | The '.gitignore' file lists 'config.json' as a file to be ignored when committing elsapy to a GIT repository, which is to prevent your APIkey from being shared with the world. Make similar provisions when you change your configuration setup. 17 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | ## Copyright (c) 2016-2017, Elsevier Inc. All rights reserved. 
2 | 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 4 | 5 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 6 | 7 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 8 | 9 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 10 | 11 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # elsapy 2 | 3 | A Python module for use with api.elsevier.com. Its aim is to make life easier for people who are not primarily programmers, but need to interact with publication and citation data from Elsevier products in a programmatic manner (e.g. academic researchers). 
The module consists of the following classes: 4 | 5 | * ElsClient: represents a [client interface to api.elsevier.com](https://github.com/ElsevierDev/elsapy/wiki/Establishing-an-API-interface-for-your-program). 6 | * ElsEntity: an abstract class representing an entity in the Elsevier (specifically, Scopus) data model. ElsEntities can be initialized with a URI, after which they can read their own data from api.elsevier.com through an ElsClient instance. ElsEntity has the following descendants: 7 | * ElsProfile: an abstract class representing a _profiled_ entity in Scopus. This class has two descendants: 8 | * ElsAuthor: represents the author of one or more documents in Scopus. 9 | * ElsAffil: represents an affiliation (i.e. an institution authors are affiliated with) in Scopus. 10 | * AbsDoc: represents a document in Scopus (i.e. abstract only). This document typically is the record of a scholarly article in any of the journals covered in Scopus. 11 | * FullDoc: represents a document in ScienceDirect (i.e. full text). This document is the full-text version of a scholarly article or book chapter from a journal published by Elsevier. 12 | 13 | Each ElsEntity (once read) has a .data attribute, which contains a JSON/dictionary representation of the object's data. Use the object's .data.keys() method to list the first-level keys in the dictionary; drill down from there to explore the data. 14 | 15 | ElsAuthor and ElsAffil objects also have a method, .read_docs(), that retrieves all the publications associated with that author/affiliation from Elsevier's API and stores them as a list attribute, .doc_list. Each entry in the list is a dictionary containing that document's metadata. 16 | * ElsSearch: represents a search through one of Elsevier's indexes, which can be a document index (ScienceDirect or Scopus), an author index, or an affiliation index. 
Once executed, each search object has a list attribute, .results, that contains the results retrieved from Elsevier's APIs for that search. Each entry in the list is a dictionary containing that result's metadata. 17 | 18 | More info on the [wiki](https://github.com/ElsevierDev/elsapy/wiki). 19 | 20 | ## Prerequisites 21 | * An API key from http://dev.elsevier.com 22 | * Python 3.x on your machine, with the [Requests HTTP](http://docs.python-requests.org/) and [pandas](https://pandas.pydata.org/) libraries added. If you have neither installed yet, you might want to get the [Anaconda distribution of Python 3.6](https://www.continuum.io/downloads) to get both in one go (plus a lot of other useful stuff). 23 | * A network connection at an institution that subscribes to Scopus and/or ScienceDirect 24 | * Some knowledge of Python and [object-oriented design](https://en.wikipedia.org/wiki/Object-oriented_design) 25 | 26 | ## Quick start 27 | * Run `pip install elsapy` from your command line 28 | * In your project root folder, [create a config file and add your APIkey](https://github.com/ElsevierDev/elsapy/blob/master/CONFIG.md) to it 29 | * Download [exampleProg.py](https://raw.githubusercontent.com/ElsevierDev/elsapy/master/exampleProg.py) to your project root folder and modify it to suit your needs 30 | 31 | ## Disclaimer 32 | This is not an 'official' SDK and is not guaranteed to always work with Elsevier's APIs, on all platforms, or without eating up all your machine's resources. But we'll do our best to keep it in good shape, are happy to take suggestions for improvements, and are open to collaborations. License info is [here](https://github.com/ElsevierDev/elsapy/blob/master/LICENSE.md). 
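The config step in the quick start can be sketched as below. This is a minimal sketch, not part of elsapy: `load_config` is a hypothetical helper name, and it assumes the `config.json` layout described in CONFIG.md (an `apikey` entry and an optional `insttoken` entry). A missing or empty `insttoken` is mapped to `None`, so that no dummy token is sent with requests.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read the API key and optional institutional token from a JSON file.

    Hypothetical helper (not part of elsapy). Assumes the file layout
    described in CONFIG.md: "apikey" plus an optional "insttoken".
    """
    with Path(path).open() as config_file:
        config = json.load(config_file)
    api_key = config["apikey"]
    # A missing or empty insttoken becomes None, so no dummy value is used.
    inst_token = config.get("insttoken") or None
    return api_key, inst_token
```

The returned pair could then be handed to the client, for example `ElsClient(api_key, inst_token=inst_token)`, which matches the constructor signature in elsapy/elsclient.py.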
33 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-minimal -------------------------------------------------------------------------------- /elsapy/__init__.py: -------------------------------------------------------------------------------- 1 | """A Python module for use with api.elsevier.com. Its aim is to make life easier 2 | for people who are not primarily programmers, but need to interact with 3 | publication and citation data from Elsevier products in a programmatic 4 | manner (e.g. academic researchers). 5 | Additional resources: 6 | * https://github.com/ElsevierDev/elsapy 7 | * https://dev.elsevier.com 8 | * https://api.elsevier.com""" 9 | 10 | version = '0.5.1' -------------------------------------------------------------------------------- /elsapy/elsclient.py: -------------------------------------------------------------------------------- 1 | """A Python module that provides the API client component for the elsapy package. 2 | Additional resources: 3 | * https://github.com/ElsevierDev/elsapy 4 | * https://dev.elsevier.com 5 | * https://api.elsevier.com""" 6 | 7 | 8 | import requests, json, time 9 | from . import log_util 10 | from .__init__ import version 11 | try: 12 | import pathlib 13 | except ImportError: 14 | import pathlib2 as pathlib 15 | 16 | logger = log_util.get_logger(__name__) 17 | 18 | class ElsClient: 19 | """A class that implements a Python interface to api.elsevier.com""" 20 | 21 | # class variables 22 | __url_base = "https://api.elsevier.com/" ## Base URL for later use 23 | __user_agent = "elsapy-v%s" % version ## Helps track library use 24 | __min_req_interval = 1 ## Min. 
request interval in sec 25 | __ts_last_req = time.time() ## Tracker for throttling 26 | 27 | # constructors 28 | def __init__(self, api_key, inst_token = None, num_res = 25, local_dir = None): 29 | # TODO: make num_res configurable for searches and documents/authors view 30 | # - see https://github.com/ElsevierDev/elsapy/issues/32 31 | """Initializes a client with a given API Key and, optionally, institutional 32 | token, number of results per request, and local data path.""" 33 | self.api_key = api_key 34 | self.inst_token = inst_token 35 | self.num_res = num_res 36 | if not local_dir: 37 | self.local_dir = pathlib.Path.cwd() / 'data' 38 | else: 39 | self.local_dir = pathlib.Path(local_dir) 40 | if not self.local_dir.exists(): 41 | self.local_dir.mkdir() 42 | 43 | # properties 44 | @property 45 | def api_key(self): 46 | """Get the apiKey for the client instance""" 47 | return self._api_key 48 | @api_key.setter 49 | def api_key(self, api_key): 50 | """Set the apiKey for the client instance""" 51 | self._api_key = api_key 52 | 53 | @property 54 | def inst_token(self): 55 | """Get the instToken for the client instance""" 56 | return self._inst_token 57 | @inst_token.setter 58 | def inst_token(self, inst_token): 59 | """Set the instToken for the client instance""" 60 | self._inst_token = inst_token 61 | 62 | @property 63 | def num_res(self): 64 | """Gets the max. number of results to be used by the client instance""" 65 | return self._num_res 66 | 67 | @num_res.setter 68 | def num_res(self, numRes): 69 | """Sets the max. 
number of results to be used by the client instance""" 70 | self._num_res = numRes 71 | 72 | @property 73 | def local_dir(self): 74 | """Gets the currently configured local path to write data to.""" 75 | return self._local_dir 76 | 77 | @property 78 | def req_status(self): 79 | '''Return the status of the request response, ''' 80 | return {'status_code': self._status_code, 'status_msg': self._status_msg} 81 | 82 | @local_dir.setter 83 | def local_dir(self, path_str): 84 | """Sets the local path to write data to.""" 85 | self._local_dir = pathlib.Path(path_str) 86 | 87 | # access functions 88 | def getBaseURL(self): 89 | """Returns the ELSAPI base URL currently configured for the client""" 90 | return self.__url_base 91 | 92 | # request/response execution functions 93 | def exec_request(self, URL): 94 | """Sends the actual request; returns response.""" 95 | 96 | ## Throttle request, if need be 97 | interval = time.time() - self.__ts_last_req 98 | if (interval < self.__min_req_interval): 99 | time.sleep( self.__min_req_interval - interval ) 100 | 101 | ## Construct and execute request 102 | headers = { 103 | "X-ELS-APIKey" : self.api_key, 104 | "User-Agent" : self.__user_agent, 105 | "Accept" : 'application/json' 106 | } 107 | if self.inst_token: 108 | headers["X-ELS-Insttoken"] = self.inst_token 109 | logger.info('Sending GET request to ' + URL) 110 | r = requests.get( 111 | URL, 112 | headers = headers 113 | ) 114 | self.__ts_last_req = time.time() 115 | self._status_code=r.status_code 116 | if r.status_code == 200: 117 | self._status_msg='data retrieved' 118 | return json.loads(r.text) 119 | else: 120 | self._status_msg="HTTP " + str(r.status_code) + " Error from " + URL + " and using headers " + str(headers) + ": " + r.text 121 | raise requests.HTTPError("HTTP " + str(r.status_code) + " Error from " + URL + "\nand using headers " + str(headers) + ":\n" + r.text) 122 | -------------------------------------------------------------------------------- 
/elsapy/elsdoc.py: -------------------------------------------------------------------------------- 1 | """The document module of elsapy. 2 | Additional resources: 3 | * https://github.com/ElsevierDev/elsapy 4 | * https://dev.elsevier.com 5 | * https://api.elsevier.com""" 6 | 7 | from . import log_util 8 | from .elsentity import ElsEntity 9 | 10 | logger = log_util.get_logger(__name__) 11 | 12 | class FullDoc(ElsEntity): 13 | """A document in ScienceDirect. Initialize with PII or DOI.""" 14 | 15 | # static variables 16 | __payload_type = u'full-text-retrieval-response' 17 | __uri_base = u'https://api.elsevier.com/content/article/' 18 | 19 | @property 20 | def title(self): 21 | """Gets the document's title""" 22 | return self.data["coredata"]["dc:title"]; 23 | 24 | @property 25 | def uri(self): 26 | """Gets the document's uri""" 27 | return self._uri 28 | 29 | # constructors 30 | def __init__(self, uri = '', sd_pii = '', doi = ''): 31 | """Initializes a document given a ScienceDirect PII or DOI.""" 32 | if uri and not sd_pii and not doi: 33 | super().__init__(uri) 34 | elif sd_pii and not uri and not doi: 35 | super().__init__(self.__uri_base + 'pii/' + str(sd_pii)) 36 | elif doi and not uri and not sd_pii: 37 | super().__init__(self.__uri_base + 'doi/' + str(doi)) 38 | elif not uri and not doi: 39 | raise ValueError('No URI, ScienceDirect PII or DOI specified') 40 | else: 41 | raise ValueError('Multiple identifiers specified; just need one.') 42 | 43 | # modifier functions 44 | def read(self, els_client = None): 45 | """Reads the JSON representation of the document from ELSAPI. 46 | Returns True if successful; else, False.""" 47 | if super().read(self.__payload_type, els_client): 48 | return True 49 | else: 50 | return False 51 | 52 | class AbsDoc(ElsEntity): 53 | """A document in Scopus. 
Initialize with URI or Scopus ID.""" 54 | 55 | # static variables 56 | __payload_type = u'abstracts-retrieval-response' 57 | __uri_base = u'https://api.elsevier.com/content/abstract/' 58 | 59 | @property 60 | def title(self): 61 | """Gets the document's title""" 62 | return self.data["coredata"]["dc:title"] 63 | 64 | @property 65 | def uri(self): 66 | """Gets the document's uri""" 67 | return self._uri 68 | 69 | # constructors 70 | def __init__(self, uri = '', scp_id = ''): 71 | """Initializes a document given a Scopus document URI or Scopus ID.""" 72 | if uri and not scp_id: 73 | super().__init__(uri) 74 | elif scp_id and not uri: 75 | super().__init__(self.__uri_base + 'scopus_id/' + str(scp_id)) 76 | elif not uri and not scp_id: 77 | raise ValueError('No URI or Scopus ID specified') 78 | else: 79 | raise ValueError('Both URI and Scopus ID specified; just need one.') 80 | 81 | # modifier functions 82 | def read(self, els_client = None): 83 | """Reads the JSON representation of the document from ELSAPI. 84 | Returns True if successful; else, False.""" 85 | if super().read(self.__payload_type, els_client): 86 | return True 87 | else: 88 | return False -------------------------------------------------------------------------------- /elsapy/elsentity.py: -------------------------------------------------------------------------------- 1 | """The (abstract) base entity module for elsapy. Used by elsprofile, elsdoc. 2 | Additional resources: 3 | * https://github.com/ElsevierDev/elsapy 4 | * https://dev.elsevier.com 5 | * https://api.elsevier.com""" 6 | 7 | import requests, json, urllib.parse 8 | from abc import ABCMeta, abstractmethod 9 | from . 
import log_util 10 | 11 | logger = log_util.get_logger(__name__) 12 | 13 | class ElsEntity(metaclass=ABCMeta): 14 | """An abstract class representing an entity in Elsevier's data model""" 15 | 16 | # constructors 17 | @abstractmethod 18 | def __init__(self, uri): 19 | """Initializes a data entity with its URI""" 20 | self._uri = uri 21 | self._data = None 22 | self._client = None 23 | 24 | # properties 25 | @property 26 | def uri(self): 27 | """Get the URI of the entity instance""" 28 | return self._uri 29 | 30 | @uri.setter 31 | def uri(self, uri): 32 | """Set the URI of the entity instance""" 33 | self._uri = uri 34 | 35 | @property 36 | def id(self): 37 | """Get the dc:identifier of the entity instance""" 38 | return self.data["coredata"]["dc:identifier"] 39 | 40 | @property 41 | def int_id(self): 42 | """Get the (non-URI, numbers only) ID of the entity instance""" 43 | dc_id = self.data["coredata"]["dc:identifier"] 44 | return dc_id[dc_id.find(':') + 1:] 45 | 46 | @property 47 | def data(self): 48 | """Get the full JSON data for the entity instance""" 49 | return self._data 50 | 51 | @property 52 | def client(self): 53 | """Get the elsClient instance currently used by this entity instance""" 54 | return self._client 55 | 56 | @client.setter 57 | def client(self, elsClient): 58 | """Set the elsClient instance to be used by this entity instance""" 59 | self._client = elsClient 60 | 61 | # modifier functions 62 | @abstractmethod 63 | def read(self, payloadType, elsClient): 64 | """Fetches the latest data for this entity from api.elsevier.com. 65 | Returns True if successful; else, False.""" 66 | if elsClient: 67 | self._client = elsClient 68 | elif not self.client: 69 | raise ValueError('''Entity object not currently bound to elsClient instance. 
Call .read() with elsClient argument or set .client attribute.''') 70 | try: 71 | api_response = self.client.exec_request(self.uri) 72 | if isinstance(api_response[payloadType], list): 73 | self._data = api_response[payloadType][0] 74 | else: 75 | self._data = api_response[payloadType] 76 | ## TODO: check if URI is the same, if necessary update and log warning. 77 | logger.info("Data loaded for " + self.uri) 78 | return True 79 | except (requests.HTTPError, requests.RequestException) as e: 80 | for elm in e.args: 81 | logger.warning(elm) 82 | return False 83 | 84 | def write(self): 85 | """If data exists for the entity, writes it to disk as a .JSON file with 86 | the url-encoded URI as the filename and returns True. Else, returns 87 | False.""" 88 | if (self.data): 89 | dataPath = self.client.local_dir / (urllib.parse.quote_plus(self.uri)+'.json') 90 | with dataPath.open(mode='w') as dump_file: 91 | json.dump(self.data, dump_file) 92 | dump_file.close() 93 | logger.info('Wrote ' + self.uri + ' to file') 94 | return True 95 | else: 96 | logger.warning('No data to write for ' + self.uri) 97 | return False -------------------------------------------------------------------------------- /elsapy/elsprofile.py: -------------------------------------------------------------------------------- 1 | """The author/affiliation profile module of elsapy. 2 | Additional resources: 3 | * https://github.com/ElsevierDev/elsapy 4 | * https://dev.elsevier.com 5 | * https://api.elsevier.com""" 6 | 7 | import requests, json, urllib.parse, pandas as pd 8 | from abc import ABCMeta, abstractmethod 9 | from . 
import log_util 10 | from .elsentity import ElsEntity 11 | from .utils import recast_df 12 | 13 | 14 | logger = log_util.get_logger(__name__) 15 | 16 | class ElsProfile(ElsEntity, metaclass=ABCMeta): 17 | """An abstract class representing an author or affiliation profile in 18 | Elsevier's data model""" 19 | 20 | def __init__(self, uri): 21 | """Initializes a data entity with its URI""" 22 | super().__init__(uri) 23 | self._doc_list = None 24 | 25 | 26 | @property 27 | def doc_list(self): 28 | """Get the list of documents for this entity""" 29 | return self._doc_list 30 | 31 | @abstractmethod 32 | def read_docs(self, payloadType, els_client = None): 33 | """Fetches the list of documents associated with this entity from 34 | api.elsevier.com. If need be, splits the requests in batches to 35 | retrieve them all. Returns True if successful; else, False. 36 | NOTE: this method requires elevated API permissions. 37 | See http://bit.ly/2leirnq for more info.""" 38 | if els_client: 39 | self._client = els_client; 40 | elif not self.client: 41 | raise ValueError('''Entity object not currently bound to els_client instance. 
Call .read() with els_client argument or set .client attribute.''') 42 | try: 43 | api_response = self.client.exec_request(self.uri + "?view=documents") 44 | if isinstance(api_response[payloadType], list): 45 | data = api_response[payloadType][0] 46 | else: 47 | data = api_response[payloadType] 48 | docCount = int(data["documents"]["@total"]) 49 | self._doc_list = [x for x in data["documents"]["abstract-document"]] 50 | for i in range (0, docCount//self.client.num_res): 51 | try: 52 | api_response = self.client.exec_request(self.uri + "?view=documents&startref=" + str((i+1) * self.client.num_res+1)) 53 | if isinstance(api_response[payloadType], list): 54 | data = api_response[payloadType][0] 55 | else: 56 | data = api_response[payloadType] 57 | self._doc_list = self._doc_list + [x for x in data["documents"]["abstract-document"]] 58 | except (requests.HTTPError, requests.RequestException) as e: 59 | if hasattr(self, 'doc_list'): ## We don't want incomplete doc lists 60 | self._doc_list = None 61 | raise e 62 | logger.info("Documents loaded for " + self.uri) 63 | self.docsframe = recast_df(pd.DataFrame(self._doc_list)) 64 | logger.info("Documents loaded into dataframe for " + self.uri) 65 | return True 66 | except (requests.HTTPError, requests.RequestException) as e: 67 | logger.warning(e.args) 68 | return False 69 | 70 | def write_docs(self): 71 | """If a doclist exists for the entity, writes it to disk as a JSON file 72 | with the url-encoded URI as the filename and returns True. 
Else, 73 | returns False.""" 74 | if self.doc_list: 75 | dump_file = open('data/' 76 | + urllib.parse.quote_plus(self.uri+'?view=documents') 77 | + '.json', mode='w' 78 | ) 79 | dump_file.write('[' + json.dumps(self.doc_list[0])) 80 | for i in range (1, len(self.doc_list)): 81 | dump_file.write(',' + json.dumps(self.doc_list[i])) 82 | dump_file.write(']') 83 | dump_file.close() 84 | logger.info('Wrote ' + self.uri + '?view=documents to file') 85 | return True 86 | else: 87 | logger.warning('No doclist to write for ' + self.uri) 88 | return False 89 | 90 | 91 | class ElsAuthor(ElsProfile): 92 | """An author of a document in Scopus. Initialize with URI or author ID.""" 93 | 94 | # static variables 95 | _payload_type = u'author-retrieval-response' 96 | _uri_base = u'https://api.elsevier.com/content/author/author_id/' 97 | 98 | # constructors 99 | def __init__(self, uri = '', author_id = ''): 100 | """Initializes an author given a Scopus author URI or author ID""" 101 | if uri and not author_id: 102 | super().__init__(uri) 103 | elif author_id and not uri: 104 | super().__init__(self._uri_base + str(author_id)) 105 | elif not uri and not author_id: 106 | raise ValueError('No URI or author ID specified') 107 | else: 108 | raise ValueError('Both URI and author ID specified; just need one.') 109 | 110 | # properties 111 | @property 112 | def first_name(self): 113 | """Gets the author's first name""" 114 | return self.data[u'author-profile'][u'preferred-name'][u'given-name'] 115 | 116 | @property 117 | def last_name(self): 118 | """Gets the author's last name""" 119 | return self.data[u'author-profile'][u'preferred-name'][u'surname'] 120 | 121 | @property 122 | def full_name(self): 123 | """Gets the author's full name""" 124 | return self.first_name + " " + self.last_name 125 | 126 | # modifier functions 127 | def read(self, els_client = None): 128 | """Reads the JSON representation of the author from ELSAPI. 
129 | Returns True if successful; else, False.""" 130 | if ElsProfile.read(self, self._payload_type, els_client): 131 | return True 132 | else: 133 | return False 134 | 135 | def read_docs(self, els_client = None): 136 | """Fetches the list of documents associated with this author from 137 | api.elsevier.com. Returns True if successful; else, False.""" 138 | return ElsProfile.read_docs(self, self._payload_type, els_client) 139 | 140 | def read_metrics(self, els_client = None): 141 | """Reads the bibliographic metrics for this author from api.elsevier.com 142 | and updates self.data with them. Returns True if successful; else, 143 | False.""" 144 | try: 145 | fields = [ 146 | "document-count", 147 | "cited-by-count", 148 | "citation-count", 149 | "h-index", 150 | "dc:identifier", 151 | ] 152 | api_response = els_client.exec_request( 153 | self.uri + "?field=" + ",".join(fields)) 154 | data = api_response[self._payload_type][0] 155 | if not self.data: 156 | self._data = dict() 157 | self._data['coredata'] = dict() 158 | # TODO: apply decorator for type conversion of common fields 159 | self._data['coredata']['dc:identifier'] = data['coredata']['dc:identifier'] 160 | self._data['coredata']['citation-count'] = int(data['coredata']['citation-count']) 161 | self._data['coredata']['cited-by-count'] = int(data['coredata']['cited-by-count']) 162 | self._data['coredata']['document-count'] = int(data['coredata']['document-count']) 163 | self._data['h-index'] = int(data['h-index']) 164 | logger.info('Added/updated author metrics') 165 | except (requests.HTTPError, requests.RequestException) as e: 166 | logger.warning(e.args) 167 | return False 168 | return True 169 | 170 | 171 | class ElsAffil(ElsProfile): 172 | """An affiliation (i.e. an institution an author is affiliated with) in Scopus. 
173 | Initialize with URI or affiliation ID.""" 174 | 175 | # static variables 176 | _payload_type = u'affiliation-retrieval-response' 177 | _uri_base = u'https://api.elsevier.com/content/affiliation/affiliation_id/' 178 | 179 | # constructors 180 | def __init__(self, uri = '', affil_id = ''): 181 | """Initializes an affiliation given a Scopus affiliation URI or affiliation ID.""" 182 | if uri and not affil_id: 183 | super().__init__(uri) 184 | elif affil_id and not uri: 185 | super().__init__(self._uri_base + str(affil_id)) 186 | elif not uri and not affil_id: 187 | raise ValueError('No URI or affiliation ID specified') 188 | else: 189 | raise ValueError('Both URI and affiliation ID specified; just need one.') 190 | 191 | # properties 192 | @property 193 | def name(self): 194 | """Gets the affiliation's name""" 195 | return self.data["affiliation-name"]; 196 | 197 | # modifier functions 198 | def read(self, els_client = None): 199 | """Reads the JSON representation of the affiliation from ELSAPI. 200 | Returns True if successful; else, False.""" 201 | if ElsProfile.read(self, self._payload_type, els_client): 202 | return True 203 | else: 204 | return False 205 | 206 | def read_docs(self, els_client = None): 207 | """Fetches the list of documents associated with this affiliation from 208 | api.elsevier.com. Returns True if successful; else, False.""" 209 | return ElsProfile.read_docs(self, self._payload_type, els_client) 210 | -------------------------------------------------------------------------------- /elsapy/elssearch.py: -------------------------------------------------------------------------------- 1 | """The search module of elsapy. 2 | Additional resources: 3 | * https://github.com/ElsevierDev/elsapy 4 | * https://dev.elsevier.com 5 | * https://api.elsevier.com""" 6 | 7 | from . 
import log_util 8 | from urllib.parse import quote_plus as url_encode 9 | import pandas as pd, json 10 | from .utils import recast_df 11 | 12 | logger = log_util.get_logger(__name__) 13 | 14 | class ElsSearch(): 15 | """Represents a search to one of the search indexes accessible 16 | through api.elsevier.com. Returns True if successful; else, False.""" 17 | 18 | # static / class variables 19 | _base_url = u'https://api.elsevier.com/content/search/' 20 | _cursored_indexes = [ 21 | 'scopus', 22 | ] 23 | 24 | def __init__(self, query, index): 25 | """Initializes a search object with a query and target index.""" 26 | self.query = query 27 | self.index = index 28 | self._cursor_supported = (index in self._cursored_indexes) 29 | self._uri = self._base_url + self.index + '?query=' + url_encode( 30 | self.query) 31 | self.results_df = pd.DataFrame() 32 | 33 | # properties 34 | @property 35 | def query(self): 36 | """Gets the search query""" 37 | return self._query 38 | 39 | @query.setter 40 | def query(self, query): 41 | """Sets the search query""" 42 | self._query = query 43 | 44 | @property 45 | def index(self): 46 | """Gets the label of the index targeted by the search""" 47 | return self._index 48 | 49 | @index.setter 50 | def index(self, index): 51 | """Sets the label of the index targeted by the search""" 52 | self._index = index 53 | 54 | @property 55 | def results(self): 56 | """Gets the results for the search""" 57 | return self._results 58 | 59 | @property 60 | def tot_num_res(self): 61 | """Gets the total number of results that exist in the index for 62 | this query. This number might be larger than can be retrieved 63 | and stored in a single ElsSearch object (i.e. 5,000).""" 64 | return self._tot_num_res 65 | 66 | @property 67 | def num_res(self): 68 | """Gets the number of results for this query that are stored in the 69 | search object. 
This number might be smaller than the number of 70 | results that exist in the index for the query.""" 71 | return len(self.results) 72 | 73 | @property 74 | def uri(self): 75 | """Gets the request uri for the search""" 76 | return self._uri 77 | 78 | def _upper_limit_reached(self): 79 | """Determines if the upper limit for retrieving results from the 80 | search index is reached. Returns True if so, else False. Upper 81 | limit is 5,000 for indexes that don't support cursor-based 82 | pagination.""" 83 | if self._cursor_supported: 84 | return False 85 | else: 86 | return self.num_res >= 5000 87 | 88 | 89 | def execute( 90 | self, 91 | els_client = None, 92 | get_all = False, 93 | use_cursor = False, 94 | view = None, 95 | count = 25, 96 | fields = [] 97 | ): 98 | """Executes the search. If get_all = False (default), this retrieves 99 | the default number of results specified for the API. If 100 | get_all = True, multiple API calls will be made to iteratively get 101 | all results for the search, up to a maximum of 5,000.""" 102 | ## TODO: add exception handling 103 | url = self._uri 104 | if use_cursor: 105 | url += "&cursor=*" 106 | if view: 107 | url += "&view={}".format(view) 108 | api_response = els_client.exec_request(url) 109 | self._tot_num_res = int(api_response['search-results']['opensearch:totalResults']) 110 | self._results = api_response['search-results']['entry'] 111 | if get_all is True: 112 | while (self.num_res < self.tot_num_res) and not self._upper_limit_reached(): 113 | for e in api_response['search-results']['link']: 114 | if e['@ref'] == 'next': 115 | next_url = e['@href'] 116 | api_response = els_client.exec_request(next_url) 117 | self._results += api_response['search-results']['entry'] 118 | with open('dump.json', 'w') as f: 119 | f.write(json.dumps(self._results)) 120 | self.results_df = recast_df(pd.DataFrame(self._results)) 121 | 122 | def hasAllResults(self): 123 | """Returns true if the search object has retrieved all results for 
the 124 | query from the index (i.e. num_res equals tot_num_res).""" 125 | return (self.num_res == self.tot_num_res) 126 | -------------------------------------------------------------------------------- /elsapy/log_util.py: -------------------------------------------------------------------------------- 1 | """A logging module for use with elsapy. 2 | Additional resources: 3 | * https://github.com/ElsevierDev/elsapy 4 | * https://dev.elsevier.com 5 | * https://api.elsevier.com""" 6 | 7 | import time, logging 8 | try: 9 | from pathlib import Path 10 | except ImportError: 11 | from pathlib2 import Path 12 | 13 | ## Following adapted from https://docs.python.org/3/howto/logging-cookbook.html 14 | 15 | def get_logger(name): 16 | # TODO: add option to disable logging, without stripping logger out of all modules 17 | # - e.g. by simply not writing to file if logging is disabled. See 18 | # https://github.com/ElsevierDev/elsapy/issues/26 19 | 20 | # create logger with module name 21 | logger = logging.getLogger(name) 22 | logger.setLevel(logging.DEBUG) 23 | # create log path, if not already there 24 | logPath = Path('logs') 25 | if not logPath.exists(): 26 | logPath.mkdir() 27 | # create file handler which logs even debug messages 28 | fh = logging.FileHandler('logs/elsapy-%s.log' % time.strftime('%Y%m%d')) 29 | fh.setLevel(logging.DEBUG) 30 | # create console handler with a higher log level 31 | ch = logging.StreamHandler() 32 | ch.setLevel(logging.ERROR) 33 | # create formatter and add it to the handlers 34 | formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') 35 | fh.setFormatter(formatter) 36 | ch.setFormatter(formatter) 37 | # add the handlers to the logger 38 | logger.addHandler(fh) 39 | logger.addHandler(ch) 40 | logger.info("Module loaded.") 41 | return logger -------------------------------------------------------------------------------- /elsapy/utils.py: 
-------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """An elsapy module that contains decorators and other utilities to make the 3 | project more maintainable. 4 | """ 5 | 6 | import pandas as pd 7 | from . import log_util 8 | 9 | logger = log_util.get_logger(__name__) 10 | 11 | 12 | def recast_df(df): 13 | '''Recasts a data frame so that it has proper date fields and a more 14 | useful data structure for URLs''' 15 | int_resp_fields = [ 16 | 'document-count', 17 | 'citedby-count', 18 | ] 19 | date_resp_fields = [ 20 | 'prism:coverDate', 21 | ] 22 | 23 | # Modify data structure for storing links/URLs in a DF 24 | if 'link' in df.columns: 25 | if '@rel' in df.link[0][0].keys(): 26 | # To deal with inconsistency. In some API responses, the link type 27 | # field uses '@rel' as key; in others, it uses '@ref'. 28 | link_type_key ='@rel' 29 | else: 30 | link_type_key = '@ref' 31 | df['link'] = df.link.apply( 32 | lambda x: dict([(e[link_type_key], e['@href']) for e in x])) 33 | # Recast fields that contain integers from strings to the integer type 34 | for int_field in int_resp_fields: 35 | if int_field in df.columns: 36 | df[int_field] = df[int_field].apply( 37 | int) 38 | # Recast fields that contain datetime from strings to a datetime type 39 | for date_field in date_resp_fields: 40 | if date_field in df.columns: 41 | logger.info("Converting {}".format(date_field)) 42 | df[date_field] = df[date_field].apply( 43 | pd.Timestamp) 44 | return df -------------------------------------------------------------------------------- /exampleProg.py: -------------------------------------------------------------------------------- 1 | """An example program that uses the elsapy module""" 2 | 3 | from elsapy.elsclient import ElsClient 4 | from elsapy.elsprofile import ElsAuthor, ElsAffil 5 | from elsapy.elsdoc import FullDoc, AbsDoc 6 | from elsapy.elssearch import ElsSearch 7 | import json 8 | 9 | ## Load configuration 10 
| con_file = open("config.json") 11 | config = json.load(con_file) 12 | con_file.close() 13 | 14 | ## Initialize client 15 | client = ElsClient(config['apikey']) 16 | client.inst_token = config['insttoken'] 17 | 18 | ## Author example 19 | # Initialize author with uri 20 | my_auth = ElsAuthor( 21 | uri = 'https://api.elsevier.com/content/author/author_id/7004367821') 22 | # Read author data, then write to disk 23 | if my_auth.read(client): 24 | print ("my_auth.full_name: ", my_auth.full_name) 25 | my_auth.write() 26 | else: 27 | print ("Read author failed.") 28 | 29 | ## Affiliation example 30 | # Initialize affiliation with ID as string 31 | my_aff = ElsAffil(affil_id = '60101411') 32 | if my_aff.read(client): 33 | print ("my_aff.name: ", my_aff.name) 34 | my_aff.write() 35 | else: 36 | print ("Read affiliation failed.") 37 | 38 | ## Scopus (Abstract) document example 39 | # Initialize document with ID as integer 40 | scp_doc = AbsDoc(scp_id = 84872135457) 41 | if scp_doc.read(client): 42 | print ("scp_doc.title: ", scp_doc.title) 43 | scp_doc.write() 44 | else: 45 | print ("Read document failed.") 46 | 47 | ## ScienceDirect (full-text) document example using PII 48 | pii_doc = FullDoc(sd_pii = 'S1674927814000082') 49 | if pii_doc.read(client): 50 | print ("pii_doc.title: ", pii_doc.title) 51 | pii_doc.write() 52 | else: 53 | print ("Read document failed.") 54 | 55 | ## ScienceDirect (full-text) document example using DOI 56 | doi_doc = FullDoc(doi = '10.1016/S1525-1578(10)60571-5') 57 | if doi_doc.read(client): 58 | print ("doi_doc.title: ", doi_doc.title) 59 | doi_doc.write() 60 | else: 61 | print ("Read document failed.") 62 | 63 | 64 | ## Load list of documents from the API into affiliation and author objects. 65 | # Since a document list is retrieved for 25 entries at a time, this is 66 | # a potentially lengthy operation - hence the prompt. 
67 | print ("Load documents (Y/N)?") 68 | s = input('--> ') 69 | 70 | if (s == "y" or s == "Y"): 71 | 72 | ## Read all documents for example author, then write to disk 73 | if my_auth.read_docs(client): 74 | print ("my_auth.doc_list has " + str(len(my_auth.doc_list)) + " items.") 75 | my_auth.write_docs() 76 | else: 77 | print ("Read docs for author failed.") 78 | 79 | ## Read all documents for example affiliation, then write to disk 80 | if my_aff.read_docs(client): 81 | print ("my_aff.doc_list has " + str(len(my_aff.doc_list)) + " items.") 82 | my_aff.write_docs() 83 | else: 84 | print ("Read docs for affiliation failed.") 85 | 86 | ## Initialize author search object and execute search 87 | auth_srch = ElsSearch('authlast(keuskamp)','author') 88 | auth_srch.execute(client) 89 | print ("auth_srch has", len(auth_srch.results), "results.") 90 | 91 | ## Initialize affiliation search object and execute search 92 | aff_srch = ElsSearch('affil(amsterdam)','affiliation') 93 | aff_srch.execute(client) 94 | print ("aff_srch has", len(aff_srch.results), "results.") 95 | 96 | ## Initialize doc search object using Scopus and execute search, retrieving 97 | # all results 98 | doc_srch = ElsSearch("AFFIL(dartmouth) AND AUTHOR-NAME(lewis) AND PUBYEAR > 2011",'scopus') 99 | doc_srch.execute(client, get_all = True) 100 | print ("doc_srch has", len(doc_srch.results), "results.") 101 | 102 | ## Initialize doc search object using ScienceDirect and execute search, 103 | # retrieving only the default number of results 104 | doc_srch = ElsSearch("star trek vs star wars",'sciencedirect') 105 | doc_srch.execute(client, get_all = False) 106 | print ("doc_srch has", len(doc_srch.results), "results.") -------------------------------------------------------------------------------- /setup.py: 
-------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from setuptools import find_packages 3 | from elsapy.__init__ import version 4 | 5 | # TODO: use pbr for various packaging tasks. 6 | setup( 7 | name = 'elsapy', 8 | version = version, 9 | description = "A Python module for use with Elsevier's APIs: Scopus, ScienceDirect, others - see https://api.elsevier.com", 10 | long_description = "See https://github.com/ElsevierDev/elsapy for the latest information / background to this project.", 11 | author = 'Elsevier, Inc.', 12 | author_email = 'integrationsupport@elsevier.com', 13 | url = 'https://github.com/ElsevierDev/elsapy', 14 | license = 'License :: OSI Approved :: BSD License', 15 | download_url = 'https://www.github.com/ElsevierDev/elsapy/archive/0.4.6.tar.gz', 16 | keywords = ['elsevier api', 'sciencedirect api', 'scopus api'], # arbitrary keywords 17 | classifiers = [ 18 | 'Development Status :: 3 - Alpha', 19 | 'Intended Audience :: Developers', 20 | 'Intended Audience :: Science/Research', 21 | 'License :: OSI Approved :: BSD License', 22 | 'Programming Language :: Python :: 3', 23 | ], 24 | packages = find_packages(exclude=['contrib', 'docs', 'tests*']), 25 | install_requires = ['requests', 'pandas'] 26 | ) -------------------------------------------------------------------------------- /test_elsapy.py: -------------------------------------------------------------------------------- 1 | """Test cases for elsapy""" 2 | 3 | ## TODO: 4 | ## - break down in modules (test suites) for each class to allow faster unit-testing 5 | ## - this will require a shared 'utility class' 6 | ## - add a module that integrates all 7 | 8 | from elsapy.elsclient import ElsClient 9 | from elsapy.elsprofile import ElsAuthor, ElsAffil 10 | from elsapy.elsdoc import FullDoc, AbsDoc 11 | from elsapy.elssearch import ElsSearch 12 | from urllib.parse import quote_plus as url_encode 13 | import json, pathlib 14 | 15 | ## Load good 
client configuration 16 | conFile = open("config.json") 17 | config = json.load(conFile) 18 | conFile.close() 19 | 20 | ## Set local path for test data and ensure it's clean 21 | test_path = pathlib.Path.cwd() / 'test_data' 22 | if not test_path.exists(): 23 | test_path.mkdir() 24 | else: 25 | file_list = list(test_path.glob('*')) 26 | ## TODO: write recursive function that also identifies and clears out child directories 27 | for e in file_list: 28 | if e.is_file(): 29 | e.unlink() 30 | 31 | class util: 32 | """Contains tests common to test cases from multiple classes""" 33 | 34 | def file_exist_with_id(id): 35 | """Test case: exactly one local file exists with given ID in the filename""" 36 | if len(list(test_path.glob('*' + id +'*'))) == 1: 37 | return True 38 | 39 | 40 | class TestElsClient: 41 | """Test general client functionality""" 42 | 43 | def test_init_apikey_(self): 44 | """Test case: APIkey is set correctly during initialization and insttoken defaults to None""" 45 | my_client = ElsClient(config['apikey']) 46 | assert my_client.api_key == config['apikey'] 47 | assert my_client.inst_token is None 48 | 49 | def test_init_apikey_insttoken(self): 50 | """Test case: APIkey and insttoken are set correctly during initialization""" 51 | my_client = ElsClient(config['apikey'], inst_token = config['insttoken']) 52 | assert my_client.api_key == config['apikey'] 53 | assert my_client.inst_token == config['insttoken'] 54 | 55 | def test_init_apikey_insttoken_path(self): 56 | """Test case: APIkey, insttoken and local path are set correctly during initialization""" 57 | loc_dir = '\\TEMP' 58 | my_client = ElsClient(config['apikey'], inst_token = config['insttoken'], local_dir = loc_dir) 59 | assert my_client.api_key == config['apikey'] 60 | assert my_client.inst_token == config['insttoken'] 61 | assert str(my_client.local_dir) == loc_dir 62 | 63 | def test_set_apikey_insttoken(self): 64 | """Test case: APIkey and insttoken are set correctly using setters""" 65 | my_client = 
ElsClient("dummy") 66 | my_client.api_key = config['apikey'] 67 | my_client.inst_token = config['insttoken'] 68 | assert my_client.api_key == config['apikey'] 69 | assert my_client.inst_token == config['insttoken'] 70 | 71 | class TestElsAuthor: 72 | """Test author object functionality""" 73 | 74 | ## Test data 75 | auth_uri = "https://api.elsevier.com/content/author/author_id/55070335500" 76 | auth_id_int = 55070335500 77 | auth_id_str = "55070335500" 78 | 79 | ## Test initialization 80 | def test_init_uri(self): 81 | """ Test case: uri is set correctly during initialization with uri""" 82 | myAuth = ElsAuthor(uri = self.auth_uri) 83 | assert myAuth.uri == self.auth_uri 84 | 85 | def test_init_auth_id_int(self): 86 | """ Test case: uri is set correctly during initialization with author id as integer""" 87 | myAuth = ElsAuthor(author_id = self.auth_id_int) 88 | assert myAuth.uri == self.auth_uri 89 | 90 | def test_init_auth_id_str(self): 91 | """ Test case: uri is set correctly during initialization with author id as string""" 92 | myAuth = ElsAuthor(author_id = self.auth_id_str) 93 | assert myAuth.uri == self.auth_uri 94 | 95 | ## Test reading/writing author profile data 96 | bad_client = ElsClient("dummy") 97 | good_client = ElsClient(config['apikey'], inst_token = config['insttoken']) 98 | good_client.local_dir = str(test_path) 99 | 100 | myAuth = ElsAuthor(uri = auth_uri) 101 | 102 | def test_read_good_bad_client(self): 103 | """Test case: using a well-configured client leads to successful read 104 | and using a badly-configured client does not.""" 105 | assert self.myAuth.read(self.bad_client) == False 106 | assert self.myAuth.read(self.good_client) == True 107 | 108 | def test_json_to_dict(self): 109 | """Test case: the JSON read by the author object from the API is parsed 110 | into a Python dictionary""" 111 | assert type(self.myAuth.data) == dict 112 | 113 | def test_name_getter(self): 114 | """Test case: the full name attribute is returned as a non-empty 
string""" 115 | assert (type(self.myAuth.full_name) == str and self.myAuth.full_name != '') 116 | 117 | def test_write(self): 118 | """Test case: the author object's data is written to a file with the author 119 | ID in the filename""" 120 | self.myAuth.write() 121 | assert util.file_exist_with_id(self.myAuth.data['coredata']['dc:identifier'].split(':')[1]) 122 | 123 | def test_read_docs(self): 124 | self.myAuth.read_docs() 125 | assert len(self.myAuth.doc_list) > 0 126 | ## TODO: once author metrics inconsistency is resolved, change to: 127 | # assert len(self.myAuth.doc_list) == int(self.myAuth.data['coredata']['document-count']) 128 | 129 | def test_read_metrics_new_author(self): 130 | myAuth = ElsAuthor(uri = self.auth_uri) 131 | myAuth.read_metrics(self.good_client) 132 | assert ( 133 | myAuth.data['coredata']['citation-count'] and 134 | myAuth.data['coredata']['cited-by-count'] and 135 | myAuth.data['coredata']['document-count'] and 136 | myAuth.data['h-index']) 137 | 138 | def test_read_metrics_existing_author(self): 139 | self.myAuth.read_metrics(self.good_client) 140 | assert ( 141 | self.myAuth.data['coredata']['citation-count'] and 142 | self.myAuth.data['coredata']['cited-by-count'] and 143 | self.myAuth.data['coredata']['document-count'] and 144 | self.myAuth.data['h-index']) 145 | 146 | 147 | class TestElsAffil: 148 | """Test affiliation functionality""" 149 | 150 | ## Test data 151 | aff_uri = "https://api.elsevier.com/content/affiliation/affiliation_id/60101411" 152 | aff_id_int = 60101411 153 | aff_id_str = "60101411" 154 | 155 | ## Test initialization 156 | def test_init_uri(self): 157 | """ Test case: uri is set correctly during initialization with uri""" 158 | myAff = ElsAffil(uri = self.aff_uri) 159 | assert myAff.uri == self.aff_uri 160 | 161 | def test_init_aff_id_int(self): 162 | """ Test case: uri is set correctly during initialization with affiliation id as integer""" 163 | myAff = ElsAffil(affil_id = self.aff_id_int) 164 | assert 
myAff.uri == self.aff_uri 165 | 166 | def test_init_aff_id_str(self): 167 | """ Test case: uri is set correctly during initialization with affiliation id as string""" 168 | myAff = ElsAffil(affil_id = self.aff_id_str) 169 | assert myAff.uri == self.aff_uri 170 | 171 | ## Test reading/writing affiliation profile data 172 | bad_client = ElsClient("dummy") 173 | good_client = ElsClient(config['apikey'], inst_token = config['insttoken']) 174 | good_client.local_dir = str(test_path) 175 | 176 | myAff = ElsAffil(uri = aff_uri) 177 | 178 | def test_read_good_bad_client(self): 179 | """Test case: using a well-configured client leads to successful read 180 | and using a badly-configured client does not.""" 181 | assert self.myAff.read(self.bad_client) == False 182 | assert self.myAff.read(self.good_client) == True 183 | 184 | def test_json_to_dict(self): 185 | """Test case: the JSON read by the affiliation object from the API is parsed 186 | into a Python dictionary""" 187 | assert type(self.myAff.data) == dict 188 | 189 | def test_name_getter(self): 190 | """Test case: the name attribute is returned as a non-empty string""" 191 | assert (type(self.myAff.name) == str and self.myAff.name != '') 192 | 193 | def test_write(self): 194 | """Test case: the affiliation object's data is written to a file with the affiliation 195 | ID in the filename""" 196 | self.myAff.write() 197 | assert util.file_exist_with_id(self.myAff.data['coredata']['dc:identifier'].split(':')[1]) 198 | 199 | def test_read_docs(self): 200 | self.myAff.read_docs() 201 | assert len(self.myAff.doc_list) == int(self.myAff.data['coredata']['document-count']) 202 | 203 | 204 | class TestAbsDoc: 205 | """Test Scopus document functionality""" 206 | 207 | ## Test data 208 | abs_uri = "https://api.elsevier.com/content/abstract/scopus_id/84872135457" 209 | scp_id_int = 84872135457 210 | scp_id_str = "84872135457" 211 | 212 | ## Test initialization 213 | def test_init_uri(self): 214 | """ Test case: uri is set correctly during 
initialization with uri""" 215 | myAbsDoc = AbsDoc(uri = self.abs_uri) 216 | assert myAbsDoc.uri == self.abs_uri 217 | 218 | def test_init_scp_id_int(self): 219 | """ Test case: uri is set correctly during initialization with Scopus id as integer""" 220 | myAbsDoc = AbsDoc(scp_id = self.scp_id_int) 221 | assert myAbsDoc.uri == self.abs_uri 222 | 223 | def test_init_scp_id_str(self): 224 | """ Test case: uri is set correctly during initialization with Scopus id as string""" 225 | myAbsDoc = AbsDoc(scp_id = self.scp_id_str) 226 | assert myAbsDoc.uri == self.abs_uri 227 | 228 | ## Test reading/writing abstract document data 229 | bad_client = ElsClient("dummy") 230 | good_client = ElsClient(config['apikey'], inst_token = config['insttoken']) 231 | good_client.local_dir = str(test_path) 232 | 233 | myAbsDoc = AbsDoc(uri = abs_uri) 234 | 235 | def test_read_good_bad_client(self): 236 | """Test case: using a well-configured client leads to successful read 237 | and using a badly-configured client does not.""" 238 | assert self.myAbsDoc.read(self.bad_client) == False 239 | assert self.myAbsDoc.read(self.good_client) == True 240 | 241 | def test_json_to_dict(self): 242 | """Test case: the JSON read by the abstract document object from the 243 | API is parsed into a Python dictionary""" 244 | assert type(self.myAbsDoc.data) == dict 245 | 246 | def test_title_getter(self): 247 | """Test case: the title attribute is returned as a non-empty string""" 248 | assert (type(self.myAbsDoc.title) == str and self.myAbsDoc.title != '') 249 | 250 | def test_write(self): 251 | """Test case: the abstract document object's data is written to a file with the Scopus 252 | ID in the filename""" 253 | self.myAbsDoc.write() 254 | assert util.file_exist_with_id(self.myAbsDoc.data['coredata']['dc:identifier'].split(':')[1]) 255 | 256 | class TestFullDoc: 257 | """Test ScienceDirect article functionality""" 258 | 259 | ## Test data 260 | full_pii_uri = 
"https://api.elsevier.com/content/article/pii/S1674927814000082" 261 | sd_pii = 'S1674927814000082' 262 | full_doi_uri = "https://api.elsevier.com/content/article/doi/10.1016/S1525-1578(10)60571-5" 263 | doi = '10.1016/S1525-1578(10)60571-5' 264 | 265 | ## Test initialization 266 | def test_init_uri(self): 267 | """ Test case: uri is set correctly during initialization with uri""" 268 | myFullDoc = FullDoc(uri = self.full_pii_uri) 269 | assert myFullDoc.uri == self.full_pii_uri 270 | 271 | def test_init_sd_pii(self): 272 | """ Test case: uri is set correctly during initialization with ScienceDirect PII""" 273 | myFullDoc = FullDoc(sd_pii = self.sd_pii) 274 | assert myFullDoc.uri == self.full_pii_uri 275 | 276 | def test_init_doi(self): 277 | """ Test case: uri is set correctly during initialization with DOI""" 278 | myFullDoc = FullDoc(doi = self.doi) 279 | assert myFullDoc.uri == self.full_doi_uri 280 | 281 | ## Test reading/writing author profile data 282 | bad_client = ElsClient("dummy") 283 | good_client = ElsClient(config['apikey'], inst_token = config['insttoken']) 284 | good_client.local_dir = str(test_path) 285 | 286 | myFullDoc = FullDoc(uri = full_pii_uri) 287 | 288 | def test_read_good_bad_client(self): 289 | """Test case: using a well-configured client leads to successful read 290 | and using a badly-configured client does not.""" 291 | assert self.myFullDoc.read(self.bad_client) == False 292 | assert self.myFullDoc.read(self.good_client) == True 293 | 294 | def test_json_to_dict(self): 295 | """Test case: the JSON read by the full article object from the 296 | API is parsed into a Python dictionary""" 297 | assert type(self.myFullDoc.data) == dict 298 | 299 | def test_title_getter(self): 300 | """Test case: the title attribute is returned as a non-empty string""" 301 | assert (type(self.myFullDoc.title) == str and self.myFullDoc.title != '') 302 | 303 | def test_write(self): 304 | """Test case: the full article object's data is written to a file with 
the ID in the filename""" 305 | self.myFullDoc.write() 306 | ## TODO: replace following (strung-together replace) with regex 307 | assert util.file_exist_with_id( 308 | self.myFullDoc.data['coredata']['pii'].replace('-','').replace('(','').replace(')','')) 309 | 310 | 311 | 312 | class TestSearch: 313 | """Test search functionality""" 314 | 315 | ## Test data 316 | base_url = u'https://api.elsevier.com/content/search/' 317 | search_types = [ 318 | {"query" : "authlast(keuskamp)", "index" : "author"}, 319 | {"query" : "affil(amsterdam)", "index" : "affiliation"}, 320 | {"query" : "AFFIL(dartmouth) AND AUTHOR-NAME(lewis) AND PUBYEAR > 2011", 321 | "index" : "scopus"}, 322 | {"query" : "star trek vs star wars", "index" : "sciencedirect"} 323 | ] 324 | 325 | searches = [ ElsSearch(search_type["query"], search_type["index"]) 326 | for search_type in search_types] 327 | 328 | good_client = ElsClient(config['apikey'], inst_token = config['insttoken']) 329 | 330 | 331 | ## Test initialization 332 | def test_init_uri(self): 333 | """Test case: query, index and uri are set correctly during 334 | initialization""" 335 | match_all = True 336 | for i in range(len(self.search_types)): 337 | if (self.searches[i].query != self.search_types[i]['query'] or 338 | self.searches[i].index != self.search_types[i]['index'] or 339 | self.searches[i].uri != (self.base_url + 340 | self.search_types[i]['index'] + 341 | '?query=' + 342 | url_encode(self.search_types[i]['query']))): 343 | match_all = False 344 | assert match_all == True 345 | 346 | def test_execution(self): 347 | '''Test case: all searches are executed without raising an exception.''' 348 | for search in self.searches: 349 | search.execute(self.good_client) 350 | assert True 351 | --------------------------------------------------------------------------------
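The `get_all` branch of `ElsSearch.execute` in elsapy/elssearch.py pages through results by following the link marked 'next' in each response. The sketch below illustrates that loop in isolation, under stated assumptions: `next_link`, `collect_all`, `fetch`, `PAGES` and `FIRST` are hypothetical names that do not exist in elsapy, the canned responses are made-up data, and the flat 5,000 cap stands in for `_upper_limit_reached`. Unlike the original loop, it stops when a response advertises no 'next' link, which is one possible way to avoid requesting the same URL again.

```python
def next_link(page):
    """Return the URL of the 'next' page, or None if the response has none."""
    for link in page['search-results'].get('link', []):
        # Some responses label the link type '@ref', others '@rel'
        # (see the comment in elsapy/utils.py).
        if link.get('@ref', link.get('@rel')) == 'next':
            return link['@href']
    return None


def collect_all(first_page, fetch, limit=5000):
    """Gather entries across pages until the total, the cap, or the last page."""
    body = first_page['search-results']
    total = int(body['opensearch:totalResults'])
    results = list(body.get('entry', []))
    page = first_page
    while len(results) < total and len(results) < limit:
        url = next_link(page)
        if url is None:
            break  # nothing left to follow; avoids re-requesting a stale URL
        page = fetch(url)
        results += page['search-results'].get('entry', [])
    return results


# Canned pages standing in for API responses (made-up data, no network access).
PAGES = {
    'page-2': {'search-results': {
        'opensearch:totalResults': '3',
        'entry': [{'id': 3}],
        'link': [{'@ref': 'self', '@href': 'page-2'}]}},
}
FIRST = {'search-results': {
    'opensearch:totalResults': '3',
    'entry': [{'id': 1}, {'id': 2}],
    'link': [{'@ref': 'next', '@href': 'page-2'}]}}

all_results = collect_all(FIRST, PAGES.__getitem__)
print(len(all_results), "results collected")
```

Passing the request function in as `fetch` keeps the loop testable without an API key; in elsapy itself that role is played by `els_client.exec_request`.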