├── .gitignore
├── LICENSE
├── MANIFEST.in
├── README.md
├── examples
│   ├── C0501.2019.xlsx
│   ├── query-count-year.png
│   ├── query-count.png
│   ├── query-help.png
│   ├── query-keys.png
│   ├── query-output-jl.png
│   ├── query-output-xlsx.png
│   ├── query-year-and-subject.png
│   └── query-year-region.png
├── help.md
├── nsfc
│   ├── __init__.py
│   ├── bin
│   │   ├── __init__.py
│   │   ├── build.py
│   │   ├── crawl.py
│   │   ├── main.py
│   │   ├── query.py
│   │   └── report.py
│   ├── db
│   │   ├── __init__.py
│   │   ├── manager.py
│   │   └── model.py
│   ├── src
│   │   ├── __init__.py
│   │   ├── letpub.py
│   │   ├── medsci.py
│   │   └── official.py
│   ├── util
│   │   ├── __init__.py
│   │   └── parse_data.py
│   └── version
│       └── version.json
├── requirements.txt
└── setup.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *pyc
2 | *__pycache__
3 | .vscode
4 | *.sh
5 | *.egg-info
6 | build
7 | dist
8 |

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/LICENSE

--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include requirements.txt
2 | include nsfc/version/version.json
3 |

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![PyPI](https://img.shields.io/pypi/v/nsfc?color=red&label=latest%20version)
2 | [![Downloads](https://pepy.tech/badge/nsfc)](https://pepy.tech/project/nsfc)
3 | ![GitHub last commit](https://img.shields.io/github/last-commit/suqingdong/nsfc)
4 | ![GitHub Repo stars](https://img.shields.io/github/stars/suqingdong/nsfc?color=orange)
5 | ![GitHub forks](https://img.shields.io/github/forks/suqingdong/nsfc?color=%23CCAA88)
6 |
7 | # NSFC Grant Data Query System (国家自然科学基金数据查询系统)
8 |
9 | ## Installation
10 | ```bash
11 | pip3 install nsfc
12 | ```
13 |
14 | ## Data Download
15 | > The database files are large and can be downloaded from Baidu Netdisk
16 | > ([download link](https://pan.baidu.com/s/1eadrfUg1ovBF1EAXWSTV-w), extraction code: `2nw5`)
17 | - Download the database file you need, e.g. `project.A.sqlite3`, or the full dataset `project.all.sqlite3`
18 | - Save it to the `data` directory under the `nsfc` installation path, e.g. `/path/to/site-packages/nsfc/data/project.db`
19 | - Or save it to the `nsfc_data` directory under your `HOME` directory, e.g. `~/nsfc_data/project.db`
20 | - Alternatively, point to any database file with the `-d` option
21 |
22 | ## Usage Examples
23 | ### Local Query
24 | ```bash
25 | # show the help message
26 | nsfc query
27 | ```
28 | ![](https://suqingdong.github.io/nsfc/examples/query-help.png)
29 |
30 | ```bash
31 | # list the available query fields
32 | nsfc query -K
33 | ```
34 | ![](https://suqingdong.github.io/nsfc/examples/query-keys.png)
35 |
36 | ```bash
37 | # output the record count only
38 | nsfc query -C
39 | ```
40 | ![](https://suqingdong.github.io/nsfc/examples/query-count.png)
41 |
42 | ```bash
43 | # query by approval year
44 | nsfc query -C -s approval_year 2019
45 | ```
46 | ![](https://suqingdong.github.io/nsfc/examples/query-count-year.png)
47 |
48 | ```bash
49 | # query by approval year + subject code (fuzzy match)
50 | nsfc query -C -s approval_year 2019 -s subject_code "%A%"
51 | ```
52 | ![](https://suqingdong.github.io/nsfc/examples/query-year-and-subject.png)
53 |
54 | ```bash
55 | # the approval year can also be a range
56 | nsfc query -C -s approval_year 2015-2019 -s subject_code "%C01%"
57 | ```
58 | ![](https://suqingdong.github.io/nsfc/examples/query-year-region.png)
59 |
60 | ```bash
61 | # write the results to a .jl file
62 | nsfc query -s approval_year 2019 -s subject_code "%C0501%" -o C0501.2019.jl
63 | ```
64 | ![](https://suqingdong.github.io/nsfc/examples/query-output-jl.png)
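Each line of the resulting `.jl` (JSON Lines) file is one JSON object whose keys are the columns of the `project` table (`project_id`, `title`, `person`, `approval_year`, ...). A minimal post-processing sketch — the filename follows the example above, and the selected fields are only an illustration:

```python
import json

# Read the JSON-Lines output produced by `nsfc query ... -o C0501.2019.jl`
with open('C0501.2019.jl', encoding='utf-8') as f:
    for line in f:
        record = json.loads(line)
        # each record maps `project` table column names to values
        print(record['project_id'], record['approval_year'], record['title'])
```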
65 |
66 | ```bash
67 | # write the results to an xlsx file
68 | nsfc query -s approval_year 2019 -s subject_code "%C0501%" -o C0501.2019.xlsx -F xlsx
69 | ```
70 | ![](https://suqingdong.github.io/nsfc/examples/query-output-xlsx.png)
71 |
72 | ```bash
73 | # limit the maximum number of output records
74 | nsfc query -L 5 -s approval_year 2019
75 | ```
76 |
77 | #### Conclusion Report Download
78 | ```bash
79 | nsfc report 20671004
80 |
81 | nsfc report 20671004 -o out.pdf
82 | ```
83 |
84 | ### Other Features
85 | #### Fetch Data from LetPub
86 | ```bash
87 | nsfc crawl
88 | ```
89 |
90 | #### Build/Update the Local Database
91 | ```bash
92 | nsfc build
93 | ```
94 |
95 | #### Notes
96 | - Most of the data covers 2019 and earlier; there is very little data for 2020
97 | - The database will be updated when new data becomes available
98 |
99 | #### Changelog
100 | - [2022-01-14] version 2.0.4
101 |     - update the urls of Official
102 |

--------------------------------------------------------------------------------
/examples/C0501.2019.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/C0501.2019.xlsx

--------------------------------------------------------------------------------
/examples/query-count-year.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-count-year.png

--------------------------------------------------------------------------------
/examples/query-count.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-count.png

--------------------------------------------------------------------------------
/examples/query-help.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-help.png

--------------------------------------------------------------------------------
/examples/query-keys.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-keys.png

--------------------------------------------------------------------------------
/examples/query-output-jl.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-output-jl.png

--------------------------------------------------------------------------------
/examples/query-output-xlsx.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-output-xlsx.png

--------------------------------------------------------------------------------
/examples/query-year-and-subject.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-year-and-subject.png

--------------------------------------------------------------------------------
/examples/query-year-region.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/examples/query-year-region.png
-------------------------------------------------------------------------------- /help.md: -------------------------------------------------------------------------------- 1 | ### 数据来源 2 | #### [LetPub](http://www.letpub.com.cn/index.php?page=grant) 3 | > 查询快,但频率会有限制 4 | > 目前数据只更新到了2019年,2020年数据和官网一样受限 5 | 6 | 7 | #### [MedSci](https://www.medsci.cn/sci/nsfc.do) 8 | > 查询不好(单个查询限制500条,资助类别分类不准确) 9 | > 有2020年数据(但不是很全) 10 | 11 | 12 | #### [NSFC](http://output.nsfc.gov.cn/) 13 | > 官网 14 | > 资助项目查询已失效,只能查询结题项目 15 | > 结题项目查询可用 16 | -------------------------------------------------------------------------------- /nsfc/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import codecs 4 | 5 | 6 | BASE_DIR = os.path.dirname(os.path.realpath(__file__)) 7 | version_file = os.path.join(BASE_DIR, 'version', 'version.json') 8 | version_info = json.load(codecs.open(version_file, encoding='utf-8')) 9 | 10 | __version__ = version_info['version'] 11 | 12 | DEFAULT_DB = os.path.join(BASE_DIR, 'data', 'project.db') 13 | HOME = os.path.expanduser('~') 14 | 15 | if not os.path.isfile(DEFAULT_DB): 16 | file = os.path.join(HOME, 'nsfc_data', 'project.db') 17 | if os.path.isfile(file): 18 | DEFAULT_DB = file 19 | -------------------------------------------------------------------------------- /nsfc/bin/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/nsfc/bin/__init__.py -------------------------------------------------------------------------------- /nsfc/bin/build.py: -------------------------------------------------------------------------------- 1 | from pprint import pprint 2 | 3 | import click 4 | 5 | from nsfc.db.model import Project 6 | from nsfc.db.manager import Manager 7 | from nsfc.src.official import Official 8 | from nsfc.util.parse_data import parse as parse_data 9 | from nsfc import DEFAULT_DB 10 | 11 | 12 | @click.command(no_args_is_help=True, name='build', help='build the local database') 13 | @click.argument('infiles', nargs=-1) 14 | @click.option('-d', '--dbfile', help='the path of database file', default=DEFAULT_DB, show_default=True) 15 | @click.option('--echo', help='turn echo on for sqlalchemy', is_flag=True) 16 | @click.option('--drop', help='drop table before creating', is_flag=True) 17 | @click.option('-no', help='do not get conclusion data', is_flag=True) 18 | def main(**kwargs): 19 | print(kwargs) 20 | uri = 'sqlite:///{dbfile}'.format(**kwargs) 21 | with Manager(uri=uri, echo=kwargs['echo'], drop=kwargs['drop']) as m: 22 | for infile in kwargs['infiles']: 23 | for data in parse_data(infile): 24 | 25 | query_result = m.query(Project, 'project_id', data['project_id']).first() 26 | if query_result: 27 | print('*** skip ***', query_result) 28 | continue 29 | 30 | project = Project(**data) 31 | 32 | if not kwargs['no']: 33 | conc_data = Official.get_conclusion_data(data['project_id']) 34 | if conc_data: 35 | project.finished = True 36 | project.project_type_code = conc_data.get('projectType') 37 | project.abstract = conc_data.get('projectAbstractC') 38 | project.abstract_en = conc_data.get('projectAbstractE') 39 | project.abstract_conc = conc_data.get('conclusionAbstract') 40 | project.keyword = conc_data.get('projectKeywordC') 41 | project.keyword_en = conc_data.get('projectKeywordE') 42 | project.result_stat = conc_data.get('result_stat') 43 | 44 | pprint(project.as_dict) 45 | 
m.insert(Project, 'project_id', project) 46 | 47 | 48 | if __name__ == '__main__': 49 | main() 50 | -------------------------------------------------------------------------------- /nsfc/bin/crawl.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import json 4 | 5 | import click 6 | from simple_loggers import SimpleLogger 7 | 8 | from nsfc.src.letpub import LetPub 9 | 10 | 11 | @click.command(no_args_is_help=True, name='crawl', help='crawl data from website') 12 | @click.option('-y', '--year', help='the start year of searching', required=True) 13 | @click.option('-e', '--end', help='the end year of searching') 14 | @click.option('-sc', '--subcategory', help='subcategory of searching') 15 | @click.option('-c', '--code', help='the code of subject', required=True) 16 | @click.option('-O', '--outdir', help='the output directory', default='done', show_default=True) 17 | @click.option('-o', '--outfile', help='the output file', default='out.jl', show_default=True) 18 | @click.option('-l', '--level', help='the level of given code', type=click.Choice(['0', '1', '2', '3', '-1'])) 19 | @click.option('-L', '--list', help='list the subcode for given code', is_flag=True) 20 | @click.option('-C', '--count', help='count only', is_flag=True) 21 | def main(**kwargs): 22 | start_time = time.time() 23 | 24 | logger = SimpleLogger('MAIN') 25 | logger.info(f'input arguments: {kwargs}') 26 | 27 | year = kwargs['year'] 28 | end = kwargs['end'] or year 29 | code = kwargs['code'] 30 | subcategory = kwargs['subcategory'] 31 | level = int (kwargs['level']) if kwargs['level'] else None 32 | count = kwargs['count'] 33 | letpub = LetPub(logger=logger) 34 | 35 | outdir = kwargs['outdir'] 36 | outfile = os.path.join(kwargs['outdir'], kwargs['outfile']) 37 | if not os.path.exists(outdir): 38 | os.makedirs(outdir) 39 | 40 | if kwargs['list']: 41 | code_list = letpub.code_list 42 | print(code_list.get(code)) 43 | exit(0) 44 | 45 | try: 46 | with open(outfile, 'w') as out: 47 | for context in letpub.search(code, startTime=year, endTime=end, subcategory=subcategory, level=level, count=count): 48 | if not count: 49 | line = json.dumps(context, ensure_ascii=False) + '\n' 50 | out.write(line) 51 | if not count: 52 | logger.info(f'save file: {outfile}') 53 | except KeyboardInterrupt: 54 | os.remove(outfile) 55 | 56 | elapsed = time.time() - start_time 57 | logger.info(f'elapsed time: {elapsed:.2f}s') 58 | 59 | 60 | if __name__ == '__main__': 61 | main() 62 | -------------------------------------------------------------------------------- /nsfc/bin/main.py: -------------------------------------------------------------------------------- 1 | import click 2 | 3 | from nsfc import version_info 4 | from nsfc.bin.crawl import main as crawl_cli 5 | from nsfc.bin.build import main as build_cli 6 | from nsfc.bin.query import main as query_cli 7 | from nsfc.bin.report import main as report_cli 8 | 9 | 10 | CONTEXT_SETTINGS = dict(help_option_names=['-h', '--help']) 11 | 12 | __epilog__ = click.style(f'''\ 13 | contact: {version_info['author']} <{version_info['author_email']}> 14 | ''', fg='cyan') 15 | 16 | @click.group(help=click.style(version_info['desc'], bold=True, fg='green'), 17 | epilog=__epilog__, 18 | context_settings=CONTEXT_SETTINGS) 19 | @click.version_option(version=version_info['version'], prog_name=version_info['prog']) 20 | def cli(**kwargs): 21 | pass 22 | 23 | 24 | def main(): 25 | cli.add_command(crawl_cli) 26 | cli.add_command(build_cli) 27 | 
    cli.add_command(query_cli)
28 |     cli.add_command(report_cli)
29 |     cli()
30 |
31 |
32 | if __name__ == '__main__':
33 |     main()
34 |

--------------------------------------------------------------------------------
/nsfc/bin/query.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import json
4 |
5 | import openpyxl
6 | from openpyxl.styles import Font, PatternFill
7 | import click
8 | from prettytable import PrettyTable
9 | from simple_loggers import SimpleLogger
10 |
11 | from nsfc import DEFAULT_DB, version_info
12 | from nsfc.db.model import Project
13 | from nsfc.db.manager import Manager
14 |
15 |
16 | __epilog__ = click.style('''\
17 | examples:
18 |
19 | \b
20 | # 查看帮助
21 | nsfc query
22 | \b
23 | # 列出可用的查询字段
24 | nsfc query -K
25 | \b
26 | # 输出数量
27 | nsfc query -C
28 | \b
29 | # 按批准年份查询
30 | nsfc query -C -s approval_year 2019
31 | \b
32 | # 按批准年份+学科代码(模糊)
33 | nsfc query -C -s approval_year 2019 -s subject_code "%A%"
34 | \b
35 | # 批准年份也可以是一个区间
36 | nsfc query -C -s approval_year 2015-2019 -s subject_code "%C01%"
37 | \b
38 | # 结果输出为.jl文件
39 | nsfc query -s approval_year 2019 -s subject_code "%C0501%" -o C0501.2019.jl
40 | \b
41 | # 结果输出为xlsx文件
42 | nsfc query -s approval_year 2019 -s subject_code "%C0501%" -o C0501.2019.xlsx -F xlsx
43 | \b
44 | # 限制最大输出条数
45 | nsfc query -L 5 -s approval_year 2019
46 | ''', fg='yellow')
47 |
48 | @click.command(no_args_is_help=True,
49 |                name='query',
50 |                epilog=__epilog__,
51 |                help='query data from the local database')
52 | @click.option('-d', '--dbfile', help='the database file', default=DEFAULT_DB, show_default=True)
53 |
54 | @click.option('-s', '--search', help='the search condition, e.g. project_id 41950410575', multiple=True, nargs=2)
55 |
56 | @click.option('-o', '--outfile', help='the output filename')
57 |
58 | @click.option('-F', '--format', help='the format of output',
59 |               type=click.Choice(['json', 'jl', 'tsv', 'xlsx']), default='jl',
60 |               show_choices=True, show_default=True)
61 | @click.option('-K', '--keys', help='list the available keys for query', is_flag=True)
62 | @click.option('-C', '--count', help='only output the count of matched records', is_flag=True)
63 | @click.option('-L', '--limit', help='the maximum number of records to output', type=int)
64 | @click.option('-l', '--log-level', help='the level of logging',
65 |               type=click.Choice(SimpleLogger().level_maps), default='info',
66 |               show_choices=True, show_default=True)
67 | def main(**kwargs):
68 |
69 |     logger = SimpleLogger('STATS')
70 |     logger.level = logger.level_maps[kwargs['log_level']]
71 |
72 |     logger.info(f'input arguments: {kwargs}')
73 |
74 |     dbfile = kwargs['dbfile']
75 |     limit = kwargs['limit']
76 |     outfile = kwargs['outfile']
77 |
78 |     if kwargs['keys']:
79 |         table = PrettyTable(['Key', 'Comment', 'Type'])
80 |         for k, v in Project.metadata.tables['project'].columns.items():
81 |             table.add_row([k, v.comment, v.type])
82 |         for field in table._field_names:
83 |             table.align[field] = 'l'
84 |         print(click.style(str(table), fg='cyan'))
85 |         exit(0)
86 |
87 |     if not os.path.isfile(dbfile):
88 |         logger.error(f'dbfile not exists! [{dbfile}]')
89 |         baidu = version_info['baidu_data']
90 |         logger.info(f'可通过百度网盘下载需要的数据:{baidu}\n'
91 |                     f'下载完成后可通过-d参数指定数据库文件,也可以拷贝文件到:{DEFAULT_DB}')
92 |         exit(1)
93 |
94 |     uri = f'sqlite:///{dbfile}'
95 |     with Manager(uri=uri, echo=False, logger=logger) as m:
96 |
97 |         query = m.session.query(Project)
98 |
99 |         if kwargs['search']:
100 |             for key, value in kwargs['search']:
101 |                 if '%' in value:
102 |                     query = query.filter(Project.__dict__[key].like(value))
103 |                 elif key in ('approval_year', ) and not value.isdigit():
104 |                     if '-' in value:
105 |                         min_value, max_value = value.split('-')
106 |                         query = query.filter(Project.__dict__[key] >= min_value)
107 |                         query = query.filter(Project.__dict__[key] <= max_value)
108 |                     else:
109 |                         logger.error(f'bad approval_year: {value}')
110 |                         exit(1)
111 |                 else:
112 |                     query = query.filter(Project.__dict__[key] == value)
113 |
114 |         if limit:
115 |             query = query.limit(limit)
116 |
117 |         logger.debug(str(query))
118 |
119 |         if kwargs['count']:
120 |             logger.info(f'count: {query.count()}')
121 |         elif not query.count():
122 |             logger.warning('no result for your input')
123 |         else:
124 |             if outfile and kwargs['format'] == 'xlsx':
125 |                 wb = openpyxl.Workbook()
126 |                 ws = wb.active
127 |                 ws.title = 'NSFC-RESULT'
128 |                 title = [k for k, v in query.first().__dict__.items() if k != '_sa_instance_state']
129 |                 ws.append(title)
130 |                 for col, v in enumerate(title, 1):
131 |                     _ = ws.cell(row=1, column=col, value=v)
132 |                     _.font = Font(color='FFFFFF', bold=True)
133 |                     _.fill = PatternFill(start_color='000000', end_color='000000', fill_type='solid')
134 |
135 |                 for n, row in enumerate(query):
136 |                     context = [v for k, v in row.__dict__.items() if k != '_sa_instance_state']
137 |                     ws.append(context)
138 |
139 |                 ws.freeze_panes = 'A2'
140 |                 wb.save(outfile)
141 |             else:
142 |                 out = open(outfile, 'w') if outfile else sys.stdout
143 |                 with out:
144 |                     if kwargs['format'] == 'json':
145 |                         data = [{k: v for k, v in row.__dict__.items() if k != '_sa_instance_state'} for row in query]
146 |                         out.write(json.dumps(data, ensure_ascii=False, indent=2) + '\n')
147 |                     else:
148 |                         for n, row in enumerate(query):
149 |                             context = {k: v for k, v in row.__dict__.items() if k != '_sa_instance_state'}
150 |                             if n == 0 and kwargs['format'] == 'tsv':
151 |                                 title = '\t'.join(context.keys())
152 |                                 out.write(title + '\n')
153 |                             if kwargs['format'] == 'tsv':
154 |                                 line = '\t'.join(map(str, context.values()))
155 |                             else:
156 |                                 line = json.dumps(context, ensure_ascii=False)
157 |                             out.write(line + '\n')
158 |             if outfile:
159 |                 logger.info(f'save file: {outfile}')
160 |
161 |
162 | if __name__ == '__main__':
163 |     main()
164 |

--------------------------------------------------------------------------------
/nsfc/bin/report.py:
--------------------------------------------------------------------------------
1 | import os
2 | import shutil
3 | import tempfile
4 |
5 | import click
6 |
7 | from nsfc.src.official import Official
8 |
9 |
10 | __epilog__ = click.style('''
11 |
12 | \b
13 | examples:
14 |     nsfc report 20671004
15 |     nsfc report 20671004 -o out.pdf
16 |     nsfc report 20671004 -o out.pdf -k
17 | ''', fg='yellow')
18 |
19 | @click.command(name='report',
20 |                epilog=__epilog__,
21 |                no_args_is_help=True,
22 |                help='download the conclusion report for given project_id')
23 | @click.argument('project_id')
24 | @click.option('-t', '--tmpdir', help='the temporary directory to store pngs', default=tempfile.gettempdir(), show_default=True)
25 | @click.option('-o', '--outfile', help='the output filename of the pdf')
26 | @click.option('-k', '--keep', help='do not delete the temporary directory after completion', is_flag=True)
27 | def main(**kwargs):
28 |
29 |     tmpdir = tempfile.mktemp(prefix='nsfc-report-', dir=kwargs['tmpdir'])
30 |     if Official.get_conclusion_report(kwargs['project_id'], tmpdir=tmpdir, outfile=kwargs['outfile']):
31 |         if not kwargs['keep']:
32 |             shutil.rmtree(tmpdir)
33 |             Official.logger.debug(f'tempdir deleted: {tmpdir}')
34 |
35 |
36 | if __name__ == "__main__":
37 |     main()
38 |

--------------------------------------------------------------------------------
/nsfc/db/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/nsfc/db/__init__.py

--------------------------------------------------------------------------------
/nsfc/db/manager.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sqlalchemy
3 | from sqlalchemy.orm import sessionmaker
4 | from sqlalchemy.orm.state import InstanceState
5 |
6 | from .model import Base, Project
7 |
8 | from simple_loggers import SimpleLogger
9 |
10 |
11 | class Manager(object):
12 |     """
13 |     uri:
14 |         - sqlite:///relative/path/to/db
15 |         - sqlite:////absolute/path/to/db
16 |         - sqlite:///:memory:
17 |     """
18 |     def __init__(self, uri=None, echo=True, drop=False, logger=None):
19 |         self.uri = uri or 'sqlite:///:memory:'
20 |         self.logger = logger or SimpleLogger('Manager')
21 |         self.engine = sqlalchemy.create_engine(self.uri, echo=echo)
22 |         self.engine.logger.level = self.logger.level
23 |
24 |         self.session = self.connect()
25 |         self.create_table(drop=drop)
26 |
27 |     def __enter__(self):
28 |         return self
29 |
30 |     def __exit__(self, *exc_info):
31 |         self.session.commit()
32 |         self.session.close()
33 |         self.logger.debug('database closed.')
34 |
35 |     def connect(self):
36 |         self.logger.debug('connecting to: {}'.format(self.uri))
37 |         DBSession = sessionmaker(bind=self.engine)
38 |         session = DBSession()
39 |         return session
40 |
41 |     def create_table(self, drop=False):
42 |         if drop:
43 |             Base.metadata.drop_all(self.engine)
44 |         Base.metadata.create_all(self.engine)
45 |
46 |     def query(self, Meta, key, value):
47 |         if key not in Meta.__dict__:
48 |             self.logger.warning(f'unavailable key: {key}')
49 |             return None
50 |         res = self.session.query(Meta).filter(Meta.__dict__[key]==value)
51 |         return res
52 |
53 |     def insert(self, Meta, key, datas, upsert=True):
54 |         """
55 |         upsert: add when key not exists, update when key exists
56 |         """
57 |         if isinstance(datas, Base):
58 |             datas = [datas]
59 |
60 |         for data in datas:
61 |             res = self.query(Meta, key, data.__dict__[key])
62 |             if not res.first():
63 |                 self.logger.debug(f'>>> insert data: {data}')
64 |                 self.session.add(data)
65 |             elif upsert:
66 |                 self.logger.debug(f'>>> update data: {data}')
67 |                 context = {k: v for k, v in data.__dict__.items() if not isinstance(v, InstanceState)}
68 |                 res.update(context)
69 |
70 |
71 |
72 | if __name__ == '__main__':
73 |     # uri = 'sqlite:///:memory:'
74 |     # uri = 'sqlite:///./project.db'
75 |     # uri = 'sqlite:////path/to/test.db'
76 |     # m = Manager(uri)
77 |     # m.create_table()
78 |
79 |     uri = 'sqlite:///./project.1997_2000.db'
80 |
81 |     with Manager(uri=uri, echo=False) as m:
82 |         # m.create_table()
83 |         res = m.query(Project, 'project_id', '10001001')
84 |         print(dir(res))
85 |
86 |
87 |
88 |

--------------------------------------------------------------------------------
/nsfc/db/model.py:
-------------------------------------------------------------------------------- 1 | from sqlalchemy import Column, Integer, Float, DECIMAL, String, DATETIME, ForeignKey, BOOLEAN, Index, DATE 2 | from sqlalchemy.orm import relationship 3 | from sqlalchemy.orm.state import InstanceState 4 | from sqlalchemy.ext.declarative import declarative_base 5 | 6 | 7 | # 创建对象的基类: 8 | Base = declarative_base() 9 | 10 | 11 | class Project(Base): 12 | __tablename__ = 'project' 13 | 14 | project_id = Column(String(20), primary_key=True, comment='项目编号') 15 | 16 | title = Column(String(200), comment='项目名称') 17 | project_type = Column(String(50), comment='项目类型') 18 | project_type_code = Column(String(20), comment='项目类型代码') 19 | 20 | approval_year = Column(Integer, comment='批准年度') 21 | person = Column(String(20), comment='负责人', index=True) 22 | money = Column(Float, comment='项目金额(万)') 23 | institution = Column(String(50), comment='依托单位') 24 | 25 | start_time = Column(Integer, comment='开始时间(YYYYMM)') 26 | end_time = Column(Integer, comment='结束时间(YYYYMM)') 27 | 28 | subject = Column(String(30), comment='所属学部') 29 | subject_class_list = Column(String(100), comment='学科分类分级') 30 | subject_code_list = Column(String(50), comment='学科代码分级') 31 | subject_code = Column(String(20), comment='学科代码') 32 | 33 | finished = Column(BOOLEAN, comment='是否结题', default=False) 34 | 35 | keyword = Column(String(100), comment='中文关键词') 36 | keyword_en = Column(String(100), comment='英文关键词') 37 | abstract = Column(String(1000), comment='中文摘要') 38 | abstract_en = Column(String(1000), comment='英文摘要') 39 | abstract_conc = Column(String(1000), comment='结题摘要') 40 | result_stat = Column(String(30), comment='研究成果统计') 41 | 42 | __table_args__ = ( 43 | Index('search_by_year', 'approval_year'), 44 | Index('search_by_title', 'title'), 45 | ) 46 | 47 | @property 48 | def as_dict(self): 49 | return {k:v for k,v in self.__dict__.items() if not isinstance(v, InstanceState)} 50 | 51 | def __str__(self): 52 | return '[{project_id} - {title}]'.format(**self.__dict__) 53 | 54 | __repr__ = __str__ 55 | 56 | 57 | # class FieldCode(Base): 58 | # __tablename__ = 'field_code' 59 | # code = Column(String(20), comment='学科代码') 60 | # name = Column(String(20), comment='学科名称') 61 | 62 | 63 | # class SupportType(Base): 64 | # __tablename__ = 'support_type' 65 | # code = Column(String(20), comment='类别代码') 66 | # name = Column(String(20), comment='类别名称') 67 | 68 | 69 | if __name__ == '__main__': 70 | data = {'approval_year': '2019', 71 | 'institution': '中国人民解放军第四军医大学', 72 | 'money': '20', 73 | 'period': '2020-01 - 2022-12', 74 | 'person': '宗春琳', 75 | 'project_id': '81903249', 76 | 'project_type': '青年科学基金项目', 77 | 'subject': '医学科学部', 78 | 'subject_class': '一级:放射医学,二级:放射医学,三级:放射医学', 79 | 'subject_code': '一级:H22,二级:H2201,三级:H2201', 80 | 'title': '放射性颌骨骨坏死中巨噬细胞外泌体对肌成纤维细胞的调控作用及机制研究'} 81 | p = Project(**data) 82 | print(p) -------------------------------------------------------------------------------- /nsfc/src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/nsfc/src/__init__.py -------------------------------------------------------------------------------- /nsfc/src/letpub.py: -------------------------------------------------------------------------------- 1 | import math 2 | import time 3 | import random 4 | from collections import defaultdict 5 | 6 | import requests 7 | from webrequests import WebRequest as WR 8 | from simple_loggers 
import SimpleLogger 9 | 10 | """ 11 | 注意事项: 12 | - 查询结果最多显示20页(200个条目) 13 | - 按学科查询时,会存在特殊情况: 14 | - 2级 == 3级 eg. A02 A0203 A0203 15 | - 1级 == 2级 == 3级 eg. A01 A01 A01 16 | - 按项目类别查询时,"应急管理项目" 实际应该为 "科学部主任基金项目/应急管理项目" 17 | 18 | 其他问题: 19 | - 列表文件应该是旧的,数据不全 20 | - 存在列表中没有的学科,如A06, A08 (U1930117) 21 | - 或学科编号和官网不一致:eg. 11571001 22 | - LetPub: A011404 23 | - 官网:A0602 (http://output.nsfc.gov.cn/conclusionProject/2672d6fe408220c02da8ab9e24a0f637) 24 | - 列表中没有其他学部(L) 25 | 26 | """ 27 | 28 | 29 | class LetPub(object): 30 | base_url = 'http://www.letpub.com.cn' 31 | index_url = base_url + '/index.php?page=grant' 32 | search_url = base_url + '/nsfcfund_search.php' 33 | 34 | def __init__(self, logger=None): 35 | self.logger = logger or SimpleLogger('LetPub') 36 | self.subcategory_list = self.list_support_types() 37 | self.province_list = self.list_provinces() 38 | self.code_list = self.list_codes() 39 | 40 | def list_support_types(self): 41 | """项目类别列表 42 | 43 | Bug: 网页显示:应急管理项目 44 | 实际应该:科学部主任基金项目/应急管理项目 45 | """ 46 | self.logger.debug('list support types ...') 47 | soup = WR.get_soup(self.index_url) 48 | subcategory_list = [] 49 | for option in soup.select('select#subcategory option')[1:]: 50 | if option.text == '应急管理项目': 51 | text = '科学部主任基金项目/应急管理项目' 52 | else: 53 | text = option.text 54 | subcategory_list.append(text) 55 | 56 | return subcategory_list 57 | 58 | def list_provinces(self): 59 | """省份列表 60 | """ 61 | self.logger.debug('list provinces ...') 62 | soup = WR.get_soup(self.index_url) 63 | province_list = [each.attrs['value'] for each in soup.select('#province_main option[value!=""]')] 64 | return province_list 65 | 66 | def list_codes(self): 67 | self.logger.debug('list subject codes ...') 68 | url = self.base_url + '/js/nsfctags2019multiple.js' 69 | resp = WR.get_response(url) 70 | 71 | codes = defaultdict(list) 72 | for line in resp.text.split('\n'): 73 | if line.startswith('subtag['): 74 | linelist = line.strip().split("', '") 75 | subject = linelist[2] # 学部 76 | code1 = linelist[3] # 一级学科 A01 77 | code2 = linelist[4] # 二级学科 A0101 78 | code3 = linelist[5] # 二级学科 A010101 79 | # name = linelist[6].split("'")[0] # 学科名字 80 | 81 | if code1 not in codes[subject]: 82 | codes[subject].append(code1) 83 | if code2 not in codes[code1]: 84 | codes[code1].append(code2) 85 | if code3 not in codes[code2]: 86 | codes[code2].append(code3) 87 | 88 | # print(subject, name, code1, code2, code3) 89 | return dict(codes) 90 | 91 | def search(self, code, page=1, startTime='', endTime='', subcategory='', province_main='', level='', count=False): 92 | params = { 93 | 'mode': 'advanced', 94 | 'datakind': 'list', 95 | 'currentpage': page 96 | } 97 | payload = { 98 | 'addcomment_s1': code[0], 99 | 'startTime': startTime, 100 | 'endTime': endTime, 101 | 'subcategory': subcategory, 102 | 'province_main': province_main, 103 | } 104 | 105 | level = level or int((len(code) - 1) / 2) 106 | if level > 0: 107 | payload[f'addcomment_s{level+1}'] = code 108 | 109 | soup = self.search_page(params, payload) 110 | total_count = int(soup.select_one('#dict div b').text) 111 | total_page = math.ceil(total_count / 10.) 
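        # LetPub shows at most 20 result pages (10 records per page, ~200 records),
        # so when total_page exceeds 20 the search below narrows itself recursively:
        # first by child subject codes, then by subcategory, then by province.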
112 | self.logger.info(f'total count: {total_count} [{payload}]') 113 | 114 | if count: 115 | yield None 116 | elif total_page > 20: 117 | if 0 <= level < 3 and code in self.code_list: 118 | self.logger.warning(f'too many results, search with class level: {level+1} ...') 119 | for code2 in self.code_list[code]: 120 | yield from self.search(code2, page=page, startTime=startTime, endTime=endTime, subcategory=subcategory, province_main=province_main, level=level+1) 121 | elif not subcategory: 122 | self.logger.warning('too many results, search with subcategory ...') 123 | for subcategory in self.subcategory_list: 124 | yield from self.search(code, page=page, startTime=startTime, endTime=endTime, subcategory=subcategory, province_main=province_main, level=level) 125 | elif not province_main: 126 | self.logger.warning('too many results, search with province_main ...') 127 | for province_main in self.province_list: 128 | yield from self.search(code, page=page, startTime=startTime, endTime=endTime, subcategory=subcategory, province_main=province_main, level=level) 129 | else: 130 | self.logger.error(f'still too many results! [{payload}]') 131 | elif total_page > 0: 132 | self.logger.debug(f'parsing for page: {page}/{total_page}') 133 | yield from self.parse_page(soup) 134 | if page < total_page: 135 | page += 1 136 | yield from self.search(code, page=page, startTime=startTime, endTime=endTime, subcategory=subcategory, province_main=province_main, level=level) 137 | # yield code, total_count 138 | 139 | def search_page(self, params, payload): 140 | """查询页面 141 | """ 142 | self.logger.debug(f'searching for: {payload} [page: {params["currentpage"]}]') 143 | while True: 144 | soup = WR.get_soup(self.search_url, method='POST', params=params, data=payload) 145 | if not soup.select_one('#dict div b'): 146 | self.logger.warning(f'{soup.text}') 147 | if '需要先注册登录' in soup.text: 148 | exit() 149 | time.sleep(30) 150 | continue 151 | 152 | time.sleep(random.randint(5, 10)) 153 | return soup 154 | 155 | def parse_page(self, soup): 156 | """项目内容列表解析 157 | """ 158 | ths = soup.select('table.table_yjfx .table_yjfx_th') 159 | if ths: 160 | title = [th.text for th in ths] 161 | context = {} 162 | for tr in soup.select('table.table_yjfx tr')[2:-1]: 163 | values = [td.text for td in tr.select('td')] 164 | if len(values) == len(title): 165 | if context: 166 | yield context 167 | context = dict(zip(title, values)) 168 | else: 169 | context.update(dict([values])) 170 | yield context 171 | 172 | 173 | if __name__ == '__main__': 174 | import json 175 | from pprint import pprint 176 | 177 | # letpub = LetPub(logfile='run.log') 178 | letpub = LetPub() 179 | old_codes = [k for k in letpub.code_list.keys() if len(k)==3] 180 | 181 | 182 | url = 'http://output.nsfc.gov.cn/common/data/fieldCode' 183 | # data = requests.get(url).json()['data'] 184 | # new_codes = [] 185 | # for item in data: 186 | # code = item['code'] 187 | # if len(code) == 3 and code not in old_codes: 188 | # new_codes.append(code) 189 | # print(new_codes) 190 | 191 | # code = 'A0203' 192 | # code = 'A26' 193 | 194 | year = 2019 195 | 196 | subcategory = '' 197 | # subcategory = '科学部主任基金项目/应急管理项目' 198 | 199 | # for code in new_codes: 200 | # if not code.startswith('A'): 201 | # continue 202 | # with open(f'{code}.{year}.jl', 'w') as out: 203 | # for each in letpub.search(code, startTime=year, endTime=year, subcategory=subcategory): 204 | # out.write(json.dumps(each, ensure_ascii=False) + '\n') 205 | 206 | code = 'L' 207 | year = 2017 208 | level = -1 209 | 
with open(f'{code}.{year}.jl', 'w') as out: 210 | for each in letpub.search(code, startTime=year, endTime=year, subcategory=subcategory, level=level): 211 | out.write(json.dumps(each, ensure_ascii=False) + '\n') 212 | 213 | -------------------------------------------------------------------------------- /nsfc/src/medsci.py: -------------------------------------------------------------------------------- 1 | import re 2 | import sys 3 | import math 4 | import json 5 | 6 | import click 7 | from webrequests import WebRequest as WR 8 | 9 | 10 | 11 | class MedSCI(object): 12 | url = 'https://www.medsci.cn/sci/nsfc.do' 13 | 14 | @classmethod 15 | def search(cls, page=1, txtitle='', project_classname_list='', date_begin='', date_end='', **kwargs): 16 | params = { 17 | 'txtitle': txtitle, 18 | 'page': page, 19 | 'project_classname_list': project_classname_list, 20 | 'cost_begin': '', 21 | 'cost_end': '', 22 | 'date_begin': date_begin, 23 | 'date_end': date_end, 24 | 'sort_type': '3', 25 | } 26 | soup = WR.get_soup(cls.url, params=params) 27 | total_count = int(re.findall(r'\d+', soup.select_one('.list-result').text)[0]) 28 | total_page = math.ceil(total_count / 15.) 29 | click.secho(f'total page: {total_page}, total count: {total_count} [{params}]', err=True, fg='yellow') 30 | 31 | if total_count == 500: 32 | click.secho(f'too many results: {params}, searching by each project ...', err=True, fg='yellow') 33 | for project in cls.list_projects(): 34 | if params['project_classname_list']: 35 | click.secho(f'still too many results: {params} ...', err=True, fg='red') 36 | exit(1) 37 | params['project_classname_list'] = project 38 | yield from cls.search(**params) 39 | 40 | for page in range(1, total_page + 1): 41 | click.secho(f'>>> crawling page: {page}/{total_page}', err=True, fg='green') 42 | params['page'] = page 43 | soup = WR.get_soup(cls.url, params=params) 44 | for a in soup.select('#journalList .journal-item strong a'): 45 | click.secho(str(a), err=True, fg='white') 46 | context = {} 47 | href = a.attrs['href'] 48 | data = dict(list(cls.get_detail(href))) 49 | context['title'] = data['项目名称'] 50 | context['project_id'] = data['项目批准号'] 51 | context['project_type'] = data['资助类型'] 52 | context['person'] = data['负责人'] 53 | context['institution'] = data['依托单位'] 54 | context['money'] = data['批准金额'].strip('万元') 55 | context['approval_year'] = data['批准年份'] 56 | 57 | context['subject_code'] = data['学科分类'].split()[0] 58 | 59 | context['start_time'], context['end_time'] = data['起止时间'].split('-') 60 | 61 | yield context 62 | 63 | @classmethod 64 | def get_detail(cls, url): 65 | soup = WR.get_soup(url) 66 | for column in soup.select('.journal-content .journal-content-column'): 67 | key = column.select_one('.column-label').text 68 | value = column.select_one('.font-black').text.strip() 69 | yield key, value 70 | 71 | @classmethod 72 | def list_projects(cls): 73 | soup = WR.get_soup(cls.url) 74 | for box in soup.select('.input-area .ms-checkbox input'): 75 | yield box.attrs['value'] 76 | 77 | 78 | @click.command() 79 | @click.option('-y', '--year', help='the year of approval') 80 | @click.option('-o', '--outfile', help='the output filename') 81 | def main(**kwargs): 82 | year = kwargs['year'] 83 | out = open(kwargs['outfile'], 'w') if kwargs['outfile'] else sys.stdout 84 | 85 | with out: 86 | for context in MedSCI.search(date_begin=year, date_end=year): 87 | out.write(json.dumps(context, ensure_ascii=False) + '\n') 88 | # break 89 | 90 | 91 | if __name__ == '__main__': 92 | main() 93 | 
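# Note: this module is not registered as a subcommand in the `nsfc` CLI (see nsfc/bin/main.py);
# it can be run directly, e.g. (illustrative output filename, run from the repository root):
#     python nsfc/src/medsci.py -y 2020 -o medsci.2020.jl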
-------------------------------------------------------------------------------- /nsfc/src/official.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import requests 4 | import img2pdf 5 | import human_readable 6 | 7 | from webrequests import WebRequest as WR 8 | from simple_loggers import SimpleLogger 9 | 10 | 11 | class Official(object): 12 | base_url = 'https://kd.nsfc.gov.cn' 13 | logger = SimpleLogger('Official') 14 | 15 | field_codes = WR.get_response(base_url + '/api/common/fieldCode').json()['data'] 16 | 17 | @classmethod 18 | def list_root_codes(cls): 19 | """ 20 | 获取所有的学科分类代码 21 | """ 22 | root_codes = {} 23 | for context in cls.field_codes: 24 | if len(context['code']) == 1: 25 | root_codes[context['code']] = context['name'] 26 | return root_codes 27 | 28 | @classmethod 29 | def list_child_codes(cls, keys): 30 | """ 31 | 获取最低级的学科代码 32 | C01 --> C010101, C010102, ... 33 | H10 --> H1001, H1002, ... 34 | """ 35 | child_codes = {} 36 | for key in keys.split(','): 37 | for context in cls.field_codes: 38 | code = context['code'] 39 | if len(code) == 1: 40 | continue 41 | if code.startswith(key): 42 | child_codes[code] = context['name'] 43 | if code[:-2] in child_codes: 44 | del child_codes[code[:-2]] 45 | return child_codes 46 | 47 | @classmethod 48 | def get_conclusion_data(cls, ratify_number, detail=True): 49 | """ 50 | 获取指定项目批准号的结题数据 51 | """ 52 | url = cls.base_url + '/api/baseQuery/completionQueryResultsData' 53 | payload = { 54 | 'ratifyNo': ratify_number, 55 | # 'queryType': 'input', 56 | # 'complete': 'true', 57 | } 58 | result = WR.get_response(url, method='POST', json=payload).json()['data']['resultsData'] 59 | data = {} 60 | if result: 61 | data['projectid'] = result[0][0] 62 | data['project_type'] = result[0][3] 63 | data['result_stat'] = result[0][10] 64 | 65 | if detail and data.get('projectid'): 66 | detail_data = cls.get_detail_data(data['projectid']) 67 | data.update(detail_data) 68 | return data 69 | 70 | @classmethod 71 | def get_detail_data(cls, projectid): 72 | url = cls.base_url + '/api/baseQuery/conclusionProjectInfo/' + projectid 73 | data = WR.get_response(url).json()['data'] 74 | return data 75 | 76 | @classmethod 77 | def get_conclusion_report(cls, ratify_number, tmpdir='tmp', pdf=True, outfile=None): 78 | data = cls.get_conclusion_data(ratify_number, detail=False) 79 | if not data: 80 | cls.logger.warning(f'no conclusion result for: {ratify_number}') 81 | return 82 | 83 | images = list(cls.get_conclusion_report_images(data['projectid'])) 84 | 85 | if not os.path.exists(tmpdir): 86 | os.makedirs(tmpdir) 87 | 88 | pngs = [] 89 | for n, url in enumerate(images, 1): 90 | name = os.path.basename(url) 91 | png = f'{tmpdir}/{name}.png' 92 | pngs.append(png) 93 | cls.logger.debug(f'[{n}/{len(images)}] download png: {url} => {png}') 94 | 95 | resp = WR.get_response(url, stream=True) 96 | with open(png, 'wb') as out: 97 | for chunk in resp.iter_content(chunk_size=512): 98 | out.write(chunk) 99 | cls.logger.debug(f'save png: {png}') 100 | 101 | if pdf: 102 | cls.logger.debug('converting *png to pdf') 103 | outfile = outfile or f'{ratify_number}.pdf' 104 | with open(outfile, 'wb') as out: 105 | out.write(img2pdf.convert(pngs)) 106 | 107 | size = human_readable.file_size(os.stat(outfile).st_size) 108 | cls.logger.info(f'save pdf: {outfile} [{size}]') 109 | return True 110 | 111 | @classmethod 112 | def get_conclusion_report_images(cls, projectid): 113 | url = cls.base_url + '/api/baseQuery/completeProjectReport' 
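        # The report is served as a sequence of page images: request index 1, 2, ...
        # and yield each image URL until the API reports hasnext == False.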
114 | index = 1 115 | while True: 116 | payload = { 117 | 'id': projectid, 118 | 'index': index 119 | } 120 | res = WR.get_response(url, method='POST', data=payload).json()['data'] 121 | if not res['hasnext']: 122 | break 123 | yield cls.base_url + res['url'] 124 | index += 1 125 | 126 | 127 | if __name__ == '__main__': 128 | Official.get_conclusion_report('20671004') 129 | -------------------------------------------------------------------------------- /nsfc/util/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/suqingdong/nsfc/9185e4af8ebcc98dbc33d487be44c823015668ea/nsfc/util/__init__.py -------------------------------------------------------------------------------- /nsfc/util/parse_data.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | 4 | def parse(infile): 5 | with open(infile) as f: 6 | for line in f: 7 | context = json.loads(line.strip()) 8 | if 'project_id' in context: 9 | yield context 10 | else: 11 | data = {} 12 | data['project_id'] = context['项目编号'] 13 | data['person'] = context['负责人'] 14 | data['institution'] = context['单位'] 15 | data['money'] = context['金额 (万)'] 16 | data['subject'] = context['所属学部'] 17 | data['project_type'] = context['项目类型'] 18 | data['approval_year'] = context['批准年份'] 19 | data['title'] = context['题目'] 20 | data['subject_class_list'] = context['学科分类'] 21 | data['subject_code_list'] = context['学科代码'] 22 | 23 | data['subject_code'] = context['学科代码'].split()[-1] 24 | 25 | start, end = context['执行时间'].split(' 至 ') 26 | data['start_time'] = start.replace('-', '') 27 | data['end_time'] = end.replace('-', '') 28 | 29 | yield data 30 | -------------------------------------------------------------------------------- /nsfc/version/version.json: -------------------------------------------------------------------------------- 1 | { 2 | "desc": "国家自然科学基金数据查询系统", 3 | "prog": "nsfc", 4 | "author": "suqingdong", 5 | "author_email": "suqingdong1114@gmail.com", 6 | "baidu_data": "链接: https://pan.baidu.com/s/1eadrfUg1ovBF1EAXWSTV-w 提取码: 2nw5", 7 | "version": "2.0.4" 8 | } 9 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | click 2 | img2pdf 3 | openpyxl 4 | requests 5 | sqlalchemy 6 | prettytable 7 | human-readable 8 | webrequests 9 | simple-loggers 10 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import codecs 4 | from setuptools import setup, find_packages 5 | 6 | 7 | BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 8 | version_info = json.load(codecs.open(os.path.join(BASE_DIR, 'nsfc', 'version', 'version.json'), encoding='utf-8')) 9 | 10 | 11 | setup( 12 | name=version_info['prog'], 13 | version=version_info['version'], 14 | author=version_info['author'], 15 | author_email=version_info['author_email'], 16 | description=version_info['desc'], 17 | long_description=codecs.open(os.path.join(BASE_DIR, 'README.md'), encoding='utf-8').read(), 18 | long_description_content_type="text/markdown", 19 | url='https://github.com/suqingdong/nsfc', 20 | project_urls={ 21 | 'Documentation': 'https://nsfc.readthedocs.io', 22 | 'Tracker': 'https://github.com/suqingdong/nsfc/issues', 23 | }, 24 | license='BSD License', 25 | 
install_requires=codecs.open(os.path.join(BASE_DIR, 'requirements.txt'), encoding='utf-8').read().split('\n'), 26 | packages=find_packages(), 27 | include_package_data=True, 28 | entry_points={'console_scripts': [ 29 | 'nsfc = nsfc.bin.main:main', 30 | ]}, 31 | classifiers=[ 32 | 'Development Status :: 5 - Production/Stable', 33 | 'Operating System :: OS Independent', 34 | 'Intended Audience :: Developers', 35 | 'License :: OSI Approved :: MIT License', 36 | 'Programming Language :: Python', 37 | 'Programming Language :: Python :: 3', 38 | 'Programming Language :: Python :: 3.8', 39 | 'Topic :: Software Development :: Libraries' 40 | ] 41 | ) 42 | --------------------------------------------------------------------------------