├── .github
    └── workflows
    │   └── build.yml
├── .gitignore
├── LICENSE
├── README.md
├── examples
    ├── fetch_abstracts.py
    └── pubmedflow.ipynb
├── pubmedflow
    ├── __init__.py
    ├── pubmedflow.py
    └── utils.py
└── setup.py


/.github/workflows/build.yml:
--------------------------------------------------------------------------------
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Build

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:

    runs-on: ubuntu-latest
    strategy:
      fail-fast: true
      matrix:
        python-version: ["3.8", "3.9"]
        os: [ubuntu-latest, macos-latest]

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v3
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        python -m pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pip install --upgrade pip
        python setup.py install


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
.DS_Store

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 NFFLOW

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Contribute and Support

[License: MIT](https://opensource.org/licenses/MIT)
[Commits](https://github.com/nfflow/pubmedflow/commits/main)
[PRs welcome](http://makeapullrequest.com)
[Open in Colab](https://colab.research.google.com/drive/1mjlnHAb7aqwfDEylo05z3RdIyyaNRoQ5?usp=sharing)


## 🎮 Features

- Fetch PubMed IDs (PMIDs) for a keyword query (multi-keyword queries are supported)
- Fetch the abstracts of research papers from PubMed by PMID
- Download the full PDF for a PMID when it is available on PubMed Central (PMC)
- If the PDF is not available on PMC, fall back to downloading it from Sci-Hub


## How to obtain an NCBI API key?

- Follow this [tutorial](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/#:~:text=To%20create%20the%20key%2C%20go,and%20copy%20the%20resulting%20key)

## Installation
### From PyPI

```
pip install pubmedflow
```

### From source
```
python setup.py install
```
OR
```
pip install git+https://github.com/nfflow/pubmedflow
```

## How to use the API?
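
Using the API starts with supplying an NCBI key. The following is a minimal sketch, not part of pubmedflow: it reads the key from an environment variable instead of hard-coding it, and shows the shape of the public NCBI E-utilities `esearch` request that a keyword-to-PMID lookup corresponds to. The helper names and the `NCBI_API_KEY` variable name are assumptions made for illustration, and pubmedflow may issue its requests differently.

```python
import os
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def load_ncbi_api_key(env_var="NCBI_API_KEY"):
    """Return the key stored in `env_var` (hypothetical name), or '' if unset."""
    return os.environ.get(env_var, "").strip()


def build_esearch_url(query, max_documents=20, api_key=""):
    """Build (but do not send) an esearch URL for a PubMed keyword query."""
    params = {
        "db": "pubmed",           # search the PubMed database
        "term": query,            # keyword query
        "retmax": max_documents,  # maximum number of PMIDs to return
        "retmode": "json",
    }
    if api_key:
        # The key is optional for E-utilities, so it is only sent when present.
        params["api_key"] = api_key
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("chronic disease", max_documents=5,
                            api_key=load_ncbi_api_key()))
```

The value returned by `load_ncbi_api_key()` could then be passed wherever the library expects the key, which keeps the secret out of notebooks and version control.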

Arguments:

Name | Input | Description
----------- | ----------- | -----------
folder_name | Optional, str | Path where the output data is stored


## Quick Start

### Download PubMed articles as PDFs and a DataFrame

```python

import eutils
from pubmedflow import LazyPubmed

title_query = 'Chronic'  # example keyword query

pb = LazyPubmed(title_query,
                folder_name='pubmed_data',
                api_key='',  # your NCBI API key
                max_documents=None,
                download_pdf=True,
                scihub=False)

```

### Perform unsupervised learning to build a pre-trained model from the collected data

```python
pb.pubmed_train(model_name='sentence-transformers/all-mpnet-base-v2',
                model_output_path='pubmedflow_model',
                model_architecture='ct')
```

### Run question answering on the downloaded text to get answer spans from each article

```python

qa_results = pb.pubmed_qa(qa_query='What are the chronic diseases')
print(qa_results)
```

### Summarise each article

```python

summ_results = pb.pubmed_summarise()
print(summ_results)
```

### Perform entity extraction on each article

```python

ents = pb.pubmed_entity_extraction()
print(ents)
```


--------------------------------------------------------------------------------
/examples/fetch_abstracts.py:
--------------------------------------------------------------------------------
from pubmedflow import LazyPubmed
pb = LazyPubmed()

result = pb.fetch(query = "lncRNA",
                  key = "your_api_key",
                  max_documents = 5)
--------------------------------------------------------------------------------
/examples/pubmedflow.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "d7aa088f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pubmedflow import LazyPubmed\n",
    "\n",
    "pb = LazyPubmed()\n",
    "df_result = pb.pubmed_search(query = 'Chronic',\n",
    " key = \"your_api_key\",\n",
    " max_documents = 10,\n",
    " download_pdf = True, \n",
    " scihub = False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e14c8487",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [

| | title | issue | pages | abstract | journal | authors | pubdate | pmid | mesh_terms | publication_types | ... | references | delete | affiliations | pmc | other_id | medline_ta | nlm_unique_id | issn_linking | country | pdf_content |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assessing the experience of person-centred coo... | 25(3) | 1069-1080 | BACKGROUND\\nCountries are adapting their healt... | Health expectations : an international journal... | Rijken\|Mieke\|M\|https://orcid.org/0000-0001-607... | 2022 | 35318778 | D000328:Adult; D000368:Aged; D000369:Aged, 80 ... | D016428:Journal Article | ... | 29444767;29166917;8870135;15804318;22778146;18... | False | Nivel (Netherlands Institute for Health Servic... | NaN | NaN | Health Expect | 9815926 | 1369-6513 | England | Received 11 August 2021 Revised 28 November 20... |
| 1 | Association Between Systolic Blood Pressure Va... | 11(11) | e025513 | Background Whether visit-to-visit systolic blo... | Journal of the American Heart Association | Park\|Cheol Ho\|CH\|0000-0003-4636-5745;Kim\|Hyung... | 2022 | 35656977 | D001794:Blood Pressure; D002318:Cardiovascular... | D016428:Journal Article | ... | NaN | False | Department of Internal Medicine College of Med... | NaN | NaN | J Am Heart Assoc | 101580524 | 2047-9980 | England | NaN |
| 2 | Comorbidity progression patterns of major chro... | NaN | 17423953221087647 | OBJECTIVE\\nThe presence of one chronic disease... | Chronic illness | Uddin\|Shahadat\|S\|https://orcid.org/0000-0003-0... | 2022 | 35306857 | NaN | D016428:Journal Article | ... | NaN | False | Faculty of Engineering, 4334The University of ... | NaN | NaN | Chronic Illn | 101253019 | 1742-3953 | United States | NaN |
| 3 | A Review of Laser Therapy and Low-Intensity Ul... | 26(1) | 57-63 | PURPOSE OF REVIEW\\nChronic pain management the... | Current pain and headache reports | Chen\|Frank R\|FR\|;Manzi\|Joseph E\|JE\|;Mehta\|Neel... | 2022 | 35133560 | D059350:Chronic Pain; D006801:Humans; D053685:... | D016428:Journal Article; D016454:Review | ... | 32880358;25824429;31726927;30443883;12605432;2... | False | Department of Anesthesiology, Hospital of the ... | NaN | NaN | Curr Pain Headache Rep | 100970666 | 1534-3081 | United States | NaN |
| 4 | Patients' and healthcare providers' perception... | 22(1) | 9 | BACKGROUND\\nTelehealth and online health infor... | BMC geriatrics | Jiang\|Yuyu\|Y\|;Sun\|Pingping\|P\|;Chen\|Zhongyi\|Z\|;... | 2022 | 34979967 | D000368:Aged; D019468:Disease Management; D006... | D016428:Journal Article; D013485:Research Supp... | ... | 32512462;16867972;32314971;12020305;33687342;2... | False | Research office of chronic disease management ... | NaN | NaN | BMC Geriatr | 100968548 | 1471-2318 | England | Jiang et al BMC Geriatrics 2022 22 9 https doi... |
| 5 | A Preliminary Study of Provider Burden in the ... | 22(11) | 1408-1417 | This study compared perceptions of the burden ... | The journal of pain | Tait\|Raymond C\|RC\|;Chibnall\|John T\|JT\|;Kalauok... | 2021 | 33989786 | D000328:Adult; D001291:Attitude of Health Pers... | D016428:Journal Article; D013485:Research Supp... | ... | NaN | False | Department of Psychiatry and Behavioral Neuros... | NaN | NaN | J Pain | 100898657 | 1526-5900 | United States | NaN |
| 6 | "A little bit of a guidance and a little bit o... | 43(23) | 3347-3356 | PURPOSE\\nTo understand preferences, barriers, ... | Disability and rehabilitation | Dnes\|Natalie\|N\|;Coley\|Bridget\|B\|;Frisby\|Kaitly... | 2021 | 32223460 | D000293:Adolescent; D000328:Adult; D059350:Chr... | D016428:Journal Article; D013485:Research Supp... | ... | NaN | False | Department of Physical Therapy, University of ... | NaN | NaN | Disabil Rehabil | 9207179 | 0963-8288 | England | NaN |
| 7 | Chronic disease health literacy in First Natio... | 30(17-18) | 2683-2695 | AIM\\nTo explore chronic disease education, sel... | Journal of clinical nursing | Rheault\|Haunnah\|H\|https://orcid.org/0000-0001-... | 2021 | 34180097 | D000328:Adult; D001315:Australia; D002908:Chro... | D016428:Journal Article | ... | NaN | False | School of Nursing, Queensland University of Te... | NaN | NaN | J Clin Nurs | 9207302 | 0962-1067 | England | NaN |
| 8 | Patient Perceptions of Physician Burden in the... | 22(9) | 1060-1071 | While patient perceptions of burden to caregiv... | The journal of pain | Tait\|Raymond C\|RC\|;Chibnall\|John T\|JT\|;Kalauok... | 2021 | 33727158 | D000328:Adult; D059350:Chronic Pain; D002983:C... | D016428:Journal Article | ... | NaN | False | Department of Psychiatry and Behavioral Neuros... | NaN | NaN | J Pain | 100898657 | 1526-5900 | United States | NaN |
| 9 | The relationship between the perception of chr... | NaN | 17423953211039792 | OBJECTIVES\\nIn this study, it was aimed to det... | Chronic illness | Akca\|Nesrin\|N\|;Saygili\|Meltem\|M\|;Ture\|Aysun Ka... | 2021 | 34569319 | NaN | D016428:Journal Article | ... | NaN | False | 52977Kirikkale University, Faculty of Health S... | NaN | NaN | Chronic Illn | 101253019 | 1742-3953 | United States | NaN |

10 rows × 23 columns
\n", 314 | "