├── License.txt
├── README.md
├── dictionary
    └── index.list.dic
├── python
    ├── e_Stat_API_Adaptor.py
    ├── examples.py
    ├── get_csv.py
    └── install.py
└── www
    └── run.py


/License.txt:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | Copyright (c) 2016 National Statistics Center
3 | 
4 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
5 | 
6 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
7 | 
8 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Intermediate App (中間アプリ)
  2 | 
  3 | 
  4 | ## Overview
  5 | The Intermediate App is a Python library that makes the e-Stat API easier to use.
  6 | 
  7 | ## License
  8 | * MIT  
  9 |     * see License.txt
 10 |     
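 11 | ## Prerequisites
 12 | Install the following before using this library. The code targets Python 2 (it uses `print` statements, `urllib2`, and `StringIO`) and shells out to `curl`, `cat`, `grep`, `head`, `tail`, `awk`, `ls`, and `rm`, so it needs a Unix-like environment.
 13 | 
 14 | ### Python libraries
 15 | 	pandas, numpy, Flask (math is also imported, but it ships with Python)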
 16 | 
 17 | 
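 18 | ## Directory and file layout
 19 | (Create any directories that do not exist.)
 20 | 
 21 |     ├ data-cache/	 				 Cache directory (data is saved here in CSV format)  
 22 |     ├ dictionary/					 Dictionary directory (search dictionaries are created here)  
 23 |     ├ python/						 Directory for the Python library  
 24 |     │  └ e_Stat_API_Adaptor.py    
 25 |     ├ tmp/						 Temporary download directory (the raw JSON data is stored here temporarily)  
 26 |     └ www/						 Web-facing directory  
 27 |        └ run.py					 Web script for the Intermediate App  
 28 | 
 29 | These directories can be configured when creating an instance and inside e_Stat_API_Adaptor.py.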
 30 | 
 31 | ## Creating an instance
 32 | Creating an instance requires an appId, which you can obtain from the e-Stat API site.
 33 | Obtain one in advance.
 34 | 
 35 |     #!/usr/bin/env python
 36 |     # -*- coding: utf-8 -*-
 37 |     import sys
 38 |     sys.path.append('./python')
 39 |     import e_Stat_API_Adaptor
 40 |     eStatAPI = e_Stat_API_Adaptor.e_Stat_API_Adaptor({
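 41 | 		# The appId you obtained
 42 | 		'appId'		: 'hogehoge'
 43 | 		# Number of records fetched per request when downloading data
 44 | 		,'limit'	: '10000'
 45 | 		# Whether to follow next_key (if False, only the number of records set by limit above is downloaded)
 46 | 		# True to follow next_key / False to ignore it
 47 | 		,'next_key'	: True
 48 | 		# Directory where the Intermediate App is installed
 49 | 		,'directory':'./'
 50 | 		# API version
 51 | 		,'ver'		:'2.0'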
 52 |     })
 53 | 
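The `next_key` option above decides whether the library keeps requesting pages until the API stops returning a NEXT_KEY. The loop can be sketched as follows, written for Python 3 and independent of the library. `fetch_page` and the in-memory `PAGES` table are hypothetical stand-ins for the HTTP request; the start position `'1'` and the end marker `'-1'` follow the convention used in e_Stat_API_Adaptor.py.

```python
# Hypothetical stand-in for the API: maps a start position to
# (records, next_key). A real request would return up to `limit`
# records beginning at that position.
PAGES = {
    '1': (['r1', 'r2'], '3'),
    '3': (['r3', 'r4'], '5'),
    '5': (['r5'], None),  # last page: the response carries no NEXT_KEY
}


def fetch_page(start_position):
    """Return (records, next_key) for one page; next_key is '-1' at the end."""
    records, next_key = PAGES[start_position]
    return records, '-1' if next_key is None else str(next_key)


def download_all(follow_next_key=True):
    """Collect records page by page, starting at position '1'."""
    records = []
    next_key = '1'
    while next_key != '-1':
        page, next_key = fetch_page(next_key)
        records.extend(page)
        if not follow_next_key:
            break  # only the first page (at most `limit` records) is kept
    return records
```

With `follow_next_key=False` the sketch stops after the first page, which corresponds to the `'next_key': False` setting.
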
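 54 | ## First steps
 55 | After creating the directories, run the following first. It generates the index needed to search for statistical table IDs.
 56 | 
 57 |     # Download all statistical table IDs to the local machine
 58 |     eStatAPI.load_all_ids()
 59 |     # Build an index from the downloaded statistical table IDs
 60 |     eStatAPI.build_statid_index()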
 61 | 
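For reference, `load_all_ids` requests the `getStatsList` endpoint of the e-Stat API. Below is a minimal Python 3 sketch of how such a URL is assembled, using the host and path layout found in e_Stat_API_Adaptor.py. This `build_uri` is a standalone illustration rather than the library method, and `urlencode` is swapped in for the library's manual string joining.

```python
from urllib.parse import urlencode

HOST = 'http://api.e-stat.go.jp'


def build_uri(version, endpoint, params):
    """Join host, REST path, and query string into one request URL."""
    path = '/'.join(['rest', version, 'app', 'json', endpoint])
    return '{}/{}?{}'.format(HOST, path, urlencode(params))


# 'hogehoge' is the placeholder appId used elsewhere in this README.
print(build_uri('2.0', 'getStatsList', {'appId': 'hogehoge', 'searchWord': ''}))
```
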
 62 | You can also build a search index (N-gram format) from STATISTICS_NAME and TITLE by running the following.
 63 |     
 64 |     eStatAPI.build_detailed_index()
 65 |     eStatAPI.search_detailed_index('家計')
 66 | 
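The N-gram index mentioned above stores overlapping character pairs, so a two-character query such as '家計' can be matched without word segmentation. A rough Python 3 sketch of the idea follows; `ngrams` is a hypothetical helper, not the library's `create_n_gram_str`, and the set of stripped punctuation is an assumption.

```python
import re
import unicodedata


def ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    # Fold full-width and half-width variants so that 'Ａ' matches 'A'.
    text = unicodedata.normalize('NFKC', text)
    # Drop whitespace and a few punctuation characters before slicing.
    text = re.sub(r'[\s(),\[\]・-]', '', text)
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams('家計調査'))            # ['家計', '計調', '調査']
print('家計' in ngrams('家計調査'))  # True
```
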
 67 | ## Features
 68 | 
 69 | The Intermediate App has five main features.
 70 | 
 71 | ### Searching for statistical table IDs
 72 | 
 73 | Search the index list:
 74 | 
 75 |     eStatAPI.search_id(
 76 |     	 '法人'
 77 |     	,eStatAPI.path['dictionary-index']
 78 |     )
 79 | 
 80 |     eStatAPI.search_id(
 81 |     	 'index'
 82 |     	,eStatAPI.path['dictionary-index']
 83 |     )
 84 |     
 85 | Search the user-created index:
 86 | 
 87 |     eStatAPI.search_id(
 88 |     	 '法人'
 89 |     	,eStatAPI.path['dictionary-user']
 90 |     	,'user'
 91 |     )
 92 | 
 93 |     eStatAPI.search_id(
 94 |     	 '家計'
 95 |     	,eStatAPI.path['dictionary-index']
 96 |     )
 97 | 
 98 | The following adds the matches from the detailed index to the user index.
 99 | 
100 |     eStatAPI.create_user_index_from_detailed_index('法人')
101 | 
102 | ### Downloading data
103 | Calling the get_csv method as shown below
104 | saves the corresponding data in CSV format in the data-cache directory.
105 | 
106 | get_csv returns the first row and the third and subsequent rows of the saved CSV.
107 | 
108 | 
109 |     eStatAPI.get_csv(
110 |     	 'get'
111 |     	,'0000030002'
112 |     )
113 | 
114 | The resulting CSV file looks like this.  
115 | Row 1: column names  
116 | Row 2: keys  
117 | Row 3 onward: data (strings)  
118 | 
119 |     "$","全国都道府県030001","男女Ａ030001","年齢各歳階級Ｂ030003","全域・集中の別030002","時間軸(年次)","unit"
120 |     "$","area","cat02","cat03","cat01","time","unit"
121 |     "117060396","全国","男女総数","総数","全域","1980年","人"
122 | 
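Because the cached file carries two header rows, a reader has to separate them from the data. A small Python 3 sketch, using the sample rows shown above; `read_cached_csv` is a hypothetical helper, not part of the library.

```python
import csv
import io

CACHED_CSV = '''"$","全国都道府県030001","男女Ａ030001","年齢各歳階級Ｂ030003","全域・集中の別030002","時間軸(年次)","unit"
"$","area","cat02","cat03","cat01","time","unit"
"117060396","全国","男女総数","総数","全域","1980年","人"
'''


def read_cached_csv(text):
    """Split a cached CSV into column names, keys, and data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1], rows[2:]


names, keys, data = read_cached_csv(CACHED_CSV)
# Pair each key with its human-readable column name.
key_to_name = dict(zip(keys, names))
print(key_to_name['area'])          # 全国都道府県030001
print(data[0][keys.index('area')])  # 全国
```
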
123 | ### Displaying data
124 | Passing "get" as the first argument of get_csv displays all of the downloaded data.
125 | 
126 |     eStatAPI.get_csv(
127 |     	 'get'
128 |     	,'0000030002'
129 |     )
130 | 
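131 | Passing "head" as the first argument displays the beginning of the data, and "tail" displays the end. The method calls the `head` and `tail` commands with their default line counts.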
132 | 
133 |     eStatAPI.get_csv(
134 |     	 'head'
135 |     	,'0000030001'
136 |     )
137 |     eStatAPI.get_csv(
138 |     	 'tail'
139 |     	,'0000030001'
140 |     )
141 | 
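142 | In addition, the get_output method converts the data to JSON.  
143 | Two JSON formats are available: json-row, which is based on CSV rows (pass "rjson" as the second argument), and json-col, which is based on columns (pass "cjson" as the second argument).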
144 |  
145 |     eStatAPI.get_output(
146 |     	 eStatAPI.get_csv('get' , '0000030001')
147 |     	,'rjson'
148 |     )
149 | 
150 |     [
151 |     	 {全域・集中の別030002: "全域", 全国都道府県030001: "全国", 男女Ａ030001: "男女総数", 時間軸(年次): "1980年", 年齢５歳階級Ａ030002: "総数",…}
152 |     	,{全域・集中の別030002: "全域", 全国都道府県030001: "全国市部", 男女Ａ030001: "男女総数", 時間軸(年次): "1980年", 年齢５歳階級Ａ030002: "総数",…}
153 | 	    …
154 | 	]
155 | 	
156 |     eStatAPI.get_output(
157 |     	 eStatAPI.get_csv('get' , '0000030001')
158 |     	,'cjson'
159 |     )
160 |     
161 |     {
162 |     	$: [117060396, 89187409, 27872987, 5575989, 1523907, 1421927, 2082320, 1256745, 1251917,…]
163 |     	unit: ["人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人",…]
164 |     	全国都道府県030001: ["全国", "全国市部", "全国郡部", "北海道", "青森県", "岩手県", "宮城県", "秋田県", "山形県", "福島県", "茨城県", "栃木県", "群馬県", "埼玉県",…]
165 |     	全域・集中の別030002: ["全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域",…]
166 |     	年齢５歳階級Ａ030002: ["総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数",…]
167 |     	時間軸(年次): ["1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年",…]
168 |     }
169 | 
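The difference between the two JSON layouts is easiest to see on a tiny table. The Python 3 sketch below is a standalone approximation of what get_output does. The two-column sample is made up, and the rule that `$` columns become floats follows the library source.

```python
import csv
import io
import json

SAMPLE = '"area","$"\n"全国","117060396"\n"北海道","5575989"\n'


def _value(name, raw):
    """Convert value columns (named '$...') to float; keep others as text."""
    if name.startswith('$'):
        return float(raw) if raw != '' else None
    return raw


def csv_to_rjson(text):
    """Row-oriented: a list holding one {column: value} object per data row."""
    header, *rows = csv.reader(io.StringIO(text.strip()))
    out = [{h: _value(h, v) for h, v in zip(header, row)} for row in rows]
    return json.dumps(out, ensure_ascii=False)


def csv_to_cjson(text):
    """Column-oriented: a single {column: [values, ...]} object."""
    header, *rows = csv.reader(io.StringIO(text.strip()))
    out = {h: [_value(h, row[i]) for row in rows] for i, h in enumerate(header)}
    return json.dumps(out, ensure_ascii=False)
```

Row-oriented output suits record-by-record processing, while column-oriented output is convenient when a whole series is passed to a chart or a numeric routine.
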
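170 | ### Aggregating data
171 | 
172 | You can also merge and aggregate two statistical tables.
173 | As shown below, pass the two statistical table IDs as a single string separated by a comma (`,`) as the first argument, the category name to group by (area, etc.) as the second argument, and the aggregation method as the third argument.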
174 | 
175 | 
176 |     eStatAPI.merge_data(
177 |     	 '0000030001,0000030002'
178 |     	,'all'
179 |     	,'std'
180 |     )
181 | 
182 | The aggregation methods accepted as the third argument are listed below.
183 | 
184 | 
185 | |Method|Value|
186 | |---|---|
187 | |Minimum|min|
188 | |Maximum|max|
189 | |Median|median|
190 | |Count|count|
191 | |Variance|var|
192 | |Standard deviation|std|
193 | |Mean|mean|
194 | |Sum|sum|
195 | 
196 | 
197 | 
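Each method name in the table maps to one reduction applied per group. The library does this with pandas `groupby`; the Python 3 sketch below shows the same idea with the standard library only, so the helper `aggregate`, the `AGGREGATORS` table, and the sample rows are all illustrative assumptions. `var` and `std` use the sample (n - 1) definitions, which matches the pandas defaults.

```python
import statistics
from collections import defaultdict

AGGREGATORS = {
    'min': min,
    'max': max,
    'median': statistics.median,
    'count': len,
    'var': statistics.variance,  # sample variance; needs at least two values
    'std': statistics.stdev,     # sample standard deviation
    'mean': statistics.mean,
    'sum': sum,
}


def aggregate(rows, key, method):
    """Group dict rows by `key` and reduce each group's '$' values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row['$'])
    return {k: AGGREGATORS[method](v) for k, v in groups.items()}


rows = [
    {'area': 'A', '$': 10.0},
    {'area': 'A', '$': 30.0},
    {'area': 'B', '$': 5.0},
    {'area': 'B', '$': 7.0},
]
print(aggregate(rows, 'area', 'mean'))  # {'A': 20.0, 'B': 6.0}
```
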
198 | ### HTTP access to data
199 | With uWSGI and Flask, you can also display and retrieve data over HTTP.
200 | Run `www/run.py`.
201 | 
202 | #### Retrieval
203 | 
204 |     Request URL 		: '<appId>/<cmd>/<id>.<ext>'
205 |     <appId> 		: The App ID you obtained.
206 |     <cmd>			: One of get, head, or tail. get shows the whole data set, head shows the beginning of the data, and tail shows the end.
207 |     <id>			: The statistical table ID of the data to retrieve.
208 |     <ext>			: The output format (see "Output formats").
209 |     
210 |     Parameters
211 |     dl			: Set to true to download the output.
212 | 
213 | ##### Examples
214 | To display the Population Census (statistical table ID: 0000030001) as CSV:
215 |    
216 |     <appId>/get/0000030001.csv
217 | 
218 | To download the Population Census (statistical table ID: 0000030001) as CSV:
219 |    
220 |     <appId>/get/0000030001.csv?dl=true
221 | 
222 | #### Aggregation
223 | 
224 |     Request URL 	: '<appId>/merge/<ids>/<group_by>.<ext>'
225 |     <appId> 	: The App ID you obtained.
226 |     <ids>		: The two statistical table IDs to merge, separated by a comma.
227 |     <group_by>	: The column (key) to group by, such as area, time, cat01, or cat02. Use all when no grouping is needed.
228 |     <ext>		: The output format (see "Output formats").
229 |     
230 |     Parameters
231 |     dl			: Set to true to download the output.
232 |     aggregate	: The aggregation method. The methods below are currently supported. If omitted, all data is shown.
233 | 
234 | 
235 | |Method|Value|
236 | |---|---|
237 | |Minimum|min|
238 | |Maximum|max|
239 | |Median|median|
240 | |Count|count|
241 | |Variance|var|
242 | |Standard deviation|std|
243 | |Mean|mean|
244 | |Sum|sum|
245 | 
246 | 
247 | 
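248 | ##### Examples
249 | To merge the Population Census tables (statistical table IDs: 0000030001, 0000030002) and display the result as CSV:
250 | 
251 |     /<appId>/merge/0000030001,0000030002/all.csv
252 | 
253 | To merge the Population Census tables (statistical table IDs: 0000030001, 0000030002) by area and display the result as CSV:
254 | 
255 |     /<appId>/merge/0000030001,0000030002/area.csv
256 | 
257 | To aggregate the Population Census tables (statistical table IDs: 0000030001, 0000030002) by the mean for each area and display the result as CSV:
258 | 
259 |     /<appId>/merge/0000030001,0000030002/area.csv?aggregate=mean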
260 | 
261 | #### Search
262 | 
263 |     Request URL 	: '<appId>/search/<q>.<ext>'
264 |     <appId> 	: The App ID you obtained.
265 |     <q>			: The word to search for (a single word). Specifying "index" lists all entries.
266 |     <ext>		: The output format (see "Output formats").
267 | 
268 |     Parameters
269 |     dl			: Set to true to download the output.
270 | 
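271 | ##### Examples 
272 | To display the list of statistical table IDs that contain the word "法人" (corporation) as CSV:
273 | 
274 |     /<appId>/search/法人.csv
275 | 
276 | To download the list of statistical table IDs that contain the word "法人" as CSV:
277 | 
278 |     /<appId>/search/法人.csv?dl=true
279 | 
280 | To display the list of statistical table IDs that contain the word "法人" as row-oriented JSON:
281 | 
282 |     /<appId>/search/法人.rjson
283 | 
284 | To display the list of statistical table IDs that contain the word "法人" as column-oriented JSON: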
285 | 
286 |     /<appId>/search/法人.cjson
287 | 
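The three request URL patterns described above (retrieval, aggregation, search) share one shape: path segments, a format extension, and optional query parameters. A client could assemble them as in the following Python 3 sketch. `request_path` and the app ID `'APPID'` are hypothetical, and percent-encoding the Japanese query is an assumption about what a client should send.

```python
from urllib.parse import quote, urlencode


def request_path(app_id, *segments, ext='csv', **params):
    """Build '/<appId>/<segment>/.../<last>.<ext>' plus an optional query string."""
    # Keep the comma between two table IDs readable; encode everything else.
    parts = [quote(str(s), safe=',') for s in (app_id,) + segments]
    path = '/' + '/'.join(parts) + '.' + ext
    return path + ('?' + urlencode(params) if params else '')


print(request_path('APPID', 'get', '0000030001'))
print(request_path('APPID', 'merge', '0000030001,0000030002', 'area', aggregate='mean'))
print(request_path('APPID', 'search', '法人', ext='rjson'))
```
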
288 | #### Output formats
289 | There are three output formats, represented by "<ext>" in each request URL.
290 | All output is encoded in UTF-8.
291 | 
292 | 1. csv  
293 | CSV output. It looks like this.
294 |     
295 |         "全域・集中の別030002","男女Ａ030001","年齢５歳階級Ａ030002","全国都道府県030001","時間軸(年次)","unit","$"
296 |         "全域","男女総数","総数","全国","1980年","人","117060396"
297 |         "全域","男女総数","総数","全国市部","1980年","人","89187409"
298 |         "全域","男女総数","総数","全国郡部","1980年","人","27872987"
299 |         "全域","男女総数","総数","北海道","1980年","人","5575989"
300 |         "全域","男女総数","総数","青森県","1980年","人","1523907"
301 |         "全域","男女総数","総数","岩手県","1980年","人","1421927"
302 |         .....
303 | 
304 | 2. rjson  
305 | JSON that groups the CSV by row. It looks like this.
306 | 
307 |         [
308 |     	 {全域・集中の別030002: "全域", 全国都道府県030001: "全国", 男女Ａ030001: "男女総数", 時間軸(年次): "1980年", 年齢５歳階級Ａ030002: "総数",…}
309 |     	,{全域・集中の別030002: "全域", 全国都道府県030001: "全国市部", 男女Ａ030001: "男女総数", 時間軸(年次): "1980年", 年齢５歳階級Ａ030002: "総数",…}
310 | 	    …
311 | 	    ]
312 | 
313 | 3. cjson  
314 | JSON that groups the CSV by column. It looks like this.
315 | 
316 | 
317 |         {
318 |     	$: [117060396, 89187409, 27872987, 5575989, 1523907, 1421927, 2082320, 1256745, 1251917,…]
319 |     	unit: ["人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人", "人",…]
320 |     	全国都道府県030001: ["全国", "全国市部", "全国郡部", "北海道", "青森県", "岩手県", "宮城県", "秋田県", "山形県", "福島県", "茨城県", "栃木県", "群馬県", "埼玉県",…]
321 |     	全域・集中の別030002: ["全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域", "全域",…]
322 |     	年齢５歳階級Ａ030002: ["総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数", "総数",…]
323 |     	時間軸(年次): ["1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年", "1980年",…]
324 |         }
325 | 
326 | 
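The output format and the `dl` parameter together determine the content type of the response. The mapping used in e_Stat_API_Adaptor.py can be summarized by this Python 3 sketch; `content_type` is a hypothetical standalone function, whereas the library reads `dl` from the Flask request.

```python
def content_type(ext, download=False):
    """Pick the response MIME type for an output format."""
    if download:
        # dl=true: a generic binary type, which browsers typically save to disk
        return 'application/octet-stream'
    return 'text/plain' if ext == 'csv' else 'application/json'
```
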
327 | ### Notes
328 | * Data is cached, so if the data changes on the e-Stat API side, delete the CSV file for that data and download it again.
329 | 
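Refreshing a cached table therefore amounts to deleting `data-cache/<statsDataId>.csv` before the next get_csv call. A minimal Python 3 sketch, assuming the default directory layout described earlier; `invalidate_cache` is a hypothetical helper and the demonstration runs in a throwaway directory.

```python
import os
import tempfile


def invalidate_cache(directory, stats_data_id):
    """Delete data-cache/<id>.csv under directory; return True if a file was removed."""
    path = os.path.join(directory, 'data-cache', stats_data_id + '.csv')
    if os.path.exists(path):
        os.remove(path)
        return True
    return False


# Demonstration in a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'data-cache'))
    open(os.path.join(root, 'data-cache', '0000030001.csv'), 'w').close()
    first = invalidate_cache(root, '0000030001')   # True: the file was removed
    second = invalidate_cache(root, '0000030001')  # False: nothing left to remove
```
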


--------------------------------------------------------------------------------
/python/e_Stat_API_Adaptor.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # -*- coding: utf-8 -*-
  3 | 
  4 | # # # # # # # # # # # # # # # # # # # # # # # #
  5 | #
  6 | #  e-Stat API Adaptor
  7 | #  (c) 2016 National Statistics Center
  8 | #  License: MIT
  9 | #
 10 | # # # # # # # # # # # # # # # # # # # # # # # #
 11 | 
 12 | import os
 13 | import subprocess
 14 | import unicodedata
 15 | import urllib2
 16 | import json
 17 | import csv
 18 | import re
 19 | import StringIO
 20 | import random
 21 | import numpy
 22 | import math
 23 | import pandas as pd
 24 | from flask import request
 25 | from flask import Response
 26 | from flask import Flask
 27 | 
 28 | 
 29 | class e_Stat_API_Adaptor:
 30 | 
 31 |     def __init__(self, _):
 32 |         # Application settings
 33 |         self._ = _
 34 |         # Path settings
 35 |         self.path = {
 36 |             'tmp': self._['directory'] + 'tmp/',  # Directory used while downloading data
 37 |             'csv': self._['directory'] + 'data-cache/',  # CSV directory
 38 |             'statid-json': self._['directory'] + 'dictionary/all.json.dic',  # JSON file containing all statistical table IDs
 39 |             'dictionary-index': self._['directory'] + 'dictionary/index.list.dic',  # Path where the index is created
 40 |             'dictionary-user': self._['directory'] + 'dictionary/user.csv.dic',  # User index
 41 |             'dictionary-stat-center': self._['directory'] + 'dictionary/stat.center.csv.dic',  # Statistics Center index
 42 |             # Download URL for the Statistics Center index
 43 |             'url-dictionary-stat-center': 'http://www.e-stat.go.jp/api/sample2/api-m/stat-center-index.csv',
 44 |             'dictionary-detail': self._['directory'] + 'dictionary/detail/',  # Detailed index (n-gram format)
 45 |             'http-public': '/',  # Public directory
 46 |         }
 47 |         self.msg = {
 48 |             'check-extension': 'Oops! check your extension!'
 49 |         }
 50 |         self.url = {
 51 |             'host'		: 'http://api.e-stat.go.jp', 'path': '/'.join([
 52 |                 'rest', self._['ver'], 'app', 'json', 'getStatsData'
 53 |             ])
 54 |         }
 55 |         self.csv_header = {
 56 |             'index': ['statsDataId', '調査名', '調査年月', '組織名', 'カテゴリー'], 'user': ['statsDataId', '検索語']
 57 |         }
 58 |         self.header = {'Access-Control-Allow-Origin': '*'}
 59 |         self.random_str = 'ABCDEFGHIJKLMNOPQRTSUVWXYZabcdefghijklmnopqrstuvwxyz1234567890'
 60 |         self.cache = {}
 61 |         # N-gram settings
 62 |         self.gram = 2
 63 |     # Download all statistical table IDs
 64 | 
 65 |     def load_all_ids(self):
 66 |         load_uri = self.build_uri({
 67 |             'appId': self._['appId'], 'searchWord': ''
 68 |         }).replace('getStatsData', 'getStatsList')
 69 |         self.cmd_line(self.build_cmd([
 70 |             'curl', '-o', self.path['statid-json'], '"' + load_uri + '"'
 71 |         ]))
 72 |     # Build an index file from the downloaded list of statistical tables
 73 | 
 74 |     def build_statid_index(self):
 75 |         jd = self.load_json(
 76 |             self.path['statid-json'])['GET_STATS_LIST']['DATALIST_INF']['TABLE_INF']
 77 |         rows = '\n'.join([
 78 |             '-'.join([
 79 |                          j['@id'], j['STAT_NAME']['$'], str(j['SURVEY_DATE']), j['GOV_ORG']['$'], j[
 80 |                              'MAIN_CATEGORY']['$'], j['SUB_CATEGORY']['$']
 81 |                          ]) + '.dic'
 82 |             for j in jd
 83 |         ]).encode('utf-8')
 84 |         with open(self.path['dictionary-index'], 'w') as f:
 85 |             f.write(rows)
 86 |     # Download the index created by the Statistics Center
 87 | 
 88 |     def load_stat_center_index(self):
 89 |         self.cmd_line(self.build_cmd([
 90 |             'curl', '-o', self.path['dictionary-stat-center'], '"' +
 91 |             self.path['url-dictionary-stat-center'] + '"'
 92 |         ]))
 93 | 
 94 |     def build_detailed_index(self):
 95 |         jd = self.load_json(
 96 |             self.path['statid-json'])['GET_STATS_LIST']['DATALIST_INF']['TABLE_INF']
 97 |         for i, j in enumerate(jd):
 98 | 
 99 |             filename = '-'.join([
100 |                 j['@id'], j['STAT_NAME']['$'], str(j['SURVEY_DATE']), j['GOV_ORG']['$'], j[
101 |                     'MAIN_CATEGORY']['$'], j['SUB_CATEGORY']['$']
102 |             ]) + '.dic'
103 |             try:
104 |                 STATISTICS_NAME = self.create_n_gram_str(
105 |                     j['STATISTICS_NAME'], self.gram)
106 |             except:
107 |                 STATISTICS_NAME = ''
108 |             try:
109 |                 TITLE = self.create_n_gram_str(j['TITLE']['$'], self.gram)
110 |             except:
111 |                 TITLE = ''
112 |             with open(self.path['dictionary-detail'] + filename, 'w') as f:
113 |                 f.write(
114 |                     '\n'.join([STATISTICS_NAME.encode('utf-8'), TITLE.encode('utf-8')]))
115 | 
116 |     def create_n_gram_str(self, text, gram):
117 |         text = unicodedata.normalize('NFKC', text)
118 |         text = re.sub('[\s\(\),\[\]-]', '', text).replace(u'・', '')
119 |         return ','.join(text[i:i + gram] for i in range(max(len(text) - gram + 1, 1)))
120 | 
121 |     def search_detailed_index(self, q):
122 |         detail_files = os.listdir(self.path['dictionary-detail'])
123 |         detail_index = []
124 |         for dic in detail_files:
125 |             with open(self.path['dictionary-detail'] + dic, 'r') as f:
126 |                 for row in f.readlines():
127 |                     if q in row:
128 |                         detail_index.append(','.join([dic.split('-')[0], q]))
129 |         return detail_index
130 | 
131 |     def create_user_index_from_detailed_index(self, q):
132 |         with open(self.path['dictionary-user'], 'a') as f:
133 |             f.write('\n'.join(self.search_detailed_index(q)) + '\n')
134 | 
135 |     def build_uri(self, param):
136 |         return '?'.join([
137 |             '/'.join([self.url['host'], self.url['path']]
138 |                      ), '&'.join([k + '=' + str(v) for k, v in param.items()])
139 |         ])
140 | 
141 |     def build_cmd(self, cmd_list):
142 |         return ' '.join(cmd_list)
143 | 
144 |     def cmd_line(self, cmd):
145 |         try:
146 |             return subprocess.check_output(cmd, shell=True)
147 |         except:
148 |             return None
149 | 
150 |     def load_json(self, path):
151 |         with open(path) as json_data:
152 |             return json.load(json_data)
153 | 
154 |     def search_id(self, q, _index, _header='index'):
155 |         if q == 'index':
156 |             rows = [[c for c in f.split('-') if '.dic' not in c]
157 |                     for f in self.cmd_line(self.build_cmd(['cat', _index])).split('\n')]
158 |         else:
159 |             output = (self.cmd_line(
160 |                 'cat ' + _index + ' | ' + 'grep -n \"' + q + '\"') or '').split('\n')
161 |             rows = [[c if i > 0 else c.split(
162 |                 ':')[-1] for i, c in enumerate(f.split('-')) if '.dic' not in c] for f in output]
163 |         for i, r in enumerate(rows):
164 |             if len(r) == 6:
165 |                 rows[i][2] = rows[i][2] + '-' + rows[i][3]
166 |                 del rows[i][3]
167 |             rows[i] = ','.join(rows[i])
168 |         rows = '\n'.join([','.join(self.csv_header[_header]), '\n'.join(rows)])
169 |         return rows
170 | 
171 |     def get_all_data(self, statsDataId, next_key):
172 |         self.cache['tmp'] = self.path['tmp'] + \
173 |             '.'.join([self._['appId'], statsDataId, next_key, 'json'])
174 |         try:
175 |             if os.path.exists(self.cache['tmp']) == False:
176 |                 apiURI = self.build_uri({
177 |                     'appId'		: self._['appId'], 'statsDataId'	: statsDataId, 'limit'		: self._['limit'], 'startPosition': next_key
178 |                 })
179 |                 self.cmd_line(self.build_cmd(
180 |                     ['curl', '-o', self.cache['tmp'], '"' + apiURI + '"'])).replace('\n', '')
181 |             RESULT_INF = self.load_json(self.cache['tmp'])['GET_STATS_DATA'][
182 |                 'STATISTICAL_DATA']['RESULT_INF']
183 |             NEXT_KEY = '-1' if 'NEXT_KEY' not in RESULT_INF.keys() else RESULT_INF[
184 |                 'NEXT_KEY']
185 |             return str(NEXT_KEY)
186 |         except:
187 |             # TODO: reconsider the error handling below
188 |             filepath = self.path[
189 |                 'tmp'] + '.'.join([self._['appId'], statsDataId, '*', 'json'])
190 |             try:
191 |                 downloaded_files = self.cmd_line(
192 |                     self.build_cmd(['ls', filepath]))
193 |                 if downloaded_files != '':
194 |                     self.remove_file(filepath)
195 |                 return None
196 |             except:
197 |                 return None
198 | 
199 |     def convert_raw_json_to_csv(self, statsDataId):
200 |         try:
201 |             self.cache['csv'] = self.path['csv'] + statsDataId + '.csv'
202 |             dat = {'header': None, 'body': [], 'keys': None}
203 |             ix = [
204 |                 {int(f.split('.')[1]):f}
205 |                 for f in self.cmd_line(
206 |                     self.build_cmd(
207 |                         ['ls', self.path['tmp'] + '.'.join([self._['appId'], statsDataId, '*', 'json'])])
208 |                 ).split('\n')
209 |                 if f != ''
210 |             ]
211 |             print ix
212 |             ix.sort()
213 |             ix = [hash.values()[0] for hash in ix]
214 |             for i, json_file in enumerate(ix):
215 |                 print i, json_file
216 |                 jd = self.load_json(json_file)
217 |                 if i == 0:
218 |                     dat['header'] = [
219 |                         k.replace('@', '')
220 |                         for k in jd['GET_STATS_DATA']['STATISTICAL_DATA']['DATA_INF']['VALUE'][0].keys()
221 |                     ]
222 |                     dat['keys'] = jd['GET_STATS_DATA'][
223 |                         'STATISTICAL_DATA']['CLASS_INF']
224 |                 dat['body'].extend(jd['GET_STATS_DATA'][
225 |                                    'STATISTICAL_DATA']['DATA_INF']['VALUE'])
226 |             _h = {}
227 |             _b = {}
228 |             for o in dat['keys']['CLASS_OBJ']:
229 |                 o['CLASS'] = [o['CLASS']] if (
230 |                     type(o['CLASS']) is list) is False else o['CLASS']
231 |                 if o['@id'] not in _b.keys():
232 |                     _b[o['@id']] = {}
233 |                 for oc in o['CLASS']:
234 |                     _b[o['@id']][oc['@code']] = oc['@name']
235 |                 _h[o['@id']] = o['@name']
236 |             newCSV = [[r.encode('utf-8') for r in [_h[h]
237 |                                                    if h in _h.keys() else h for h in dat['header']]]]
238 |             newCSV.append(dat['header'])
239 |             for body in dat['body']:
240 |                 newCSV.append(body.values())
241 |             for i, x in enumerate(newCSV):
242 |                 if i > 0:
243 |                     for j, d in enumerate(x):
244 |                         if dat['header'][j] in _b.keys() and d in _b[dat['header'][j]].keys():
245 |                             newCSV[i][j] = _b[dat['header'][j]][
246 |                                 d].encode('utf-8')
247 |                         else:
248 |                             newCSV[i][j] = d.encode('utf-8')
249 |             with open(self.cache['csv'], 'w') as f:
250 |                 csv.writer(f, quoting=csv.QUOTE_NONNUMERIC).writerows(newCSV)
251 |             filepath = self.path[
252 |                 'tmp'] + '.'.join([self._['appId'], statsDataId, '*', 'json'])
253 |             self.cmd_line(self.build_cmd(['rm', filepath]))
254 | 
255 |         except:
256 |             # The path contains a glob, so os.path.exists() would never match it;
257 |             # let the shell expand the pattern when removing partial downloads.
258 |             self.remove_file(self.path['tmp'] +
259 |                              '.'.join([self._['appId'], statsDataId, '*', 'json']))
260 | 
261 |     def merge_data(self, statsDataId, group_by, aggregate):
262 |         statsDataId = statsDataId.split(',')
263 |         data = {}
264 |         for id in statsDataId:
265 |             csv_path = self.path['csv'] + id + '.csv'
266 |             if os.path.exists(csv_path) == False:
267 |                 self.get_all_data(id, '1')
268 |                 self.convert_raw_json_to_csv(id)
269 |             data[id] = pd.read_csv(csv_path, skiprows=[0])
270 |             data[id]['stat-id'] = id
271 |         for k, v in data.items():
272 |             v.rename(columns=lambda x: x.replace('$', '$' + k), inplace=True)
273 |         data = pd.concat([v for k, v in data.items()], ignore_index=True)
274 |         if group_by != 'all':
275 |             # summation
276 |             if aggregate == 'sum':
277 |                 data = data.groupby(group_by.split(',')).sum()
278 |             # min
279 |             elif aggregate == 'min':
280 |                 data = data.groupby(group_by.split(',')).min()
281 |             # max
282 |             elif aggregate == 'max':
283 |                 data = data.groupby(group_by.split(',')).max()
284 |             # median
285 |             elif aggregate == 'median':
286 |                 data = data.groupby(group_by.split(',')).median()
287 |             # count
288 |             elif aggregate == 'count':
289 |                 data = data.groupby(group_by.split(',')).count()
290 |             # variance
291 |             elif aggregate == 'var':
292 |                 data = data.groupby(group_by.split(',')).var()
293 |             # standard deviation
294 |             elif aggregate == 'std':
295 |                 data = data.groupby(group_by.split(',')).std()
296 |             # mean
297 |             elif aggregate == 'mean':
298 |                 data = data.groupby(group_by.split(',')).mean()
299 |             else:
300 |                 data = data
301 |         if group_by != 'all':
302 |             data = data.loc[:, [c for c in data.columns if '$' in c or group_by in c]
303 |                             ] if aggregate == '' else data.loc[:, [c for c in data.columns if '$' in c]]
304 |         return data.reset_index()
305 | 
306 |     def remove_file(self, filepath):
307 |         self.cmd_line(self.build_cmd([
308 |             'rm', filepath
309 |         ]))
310 | 
311 |     def get_csv(self, cmd, statsDataId):
312 |         cmd = {'get': 'cat', 'head': 'head', 'tail': 'tail'}.get(cmd, 'cat')  # cmd reaches the shell, so allow only known commands
313 |         self.cache['csv'] = self.path['csv'] + statsDataId + '.csv'
314 | 
315 |         if os.path.exists(self.cache['csv']) == False:
316 |             next_key = '1'
317 |             if self._['next_key'] == True:
318 |                 while next_key != '-1':
319 |                     next_key = self.get_all_data(statsDataId, next_key)
320 |                     print next_key
321 |             else:
322 |                 self.get_all_data(statsDataId, next_key)
323 |             self.convert_raw_json_to_csv(statsDataId)
324 |         txt = self.cmd_line(self.build_cmd([
325 |             cmd, self.cache['csv'], " | awk 'NR != 2 { print $0; }'"
326 |         ])) if cmd == 'cat' or cmd == 'head' else self.cmd_line(self.build_cmd([
327 |             cmd, self.cache['csv']
328 |         ]))
329 |         return txt
330 | 
331 |     def error(self, txt):
332 |         return txt
333 | 
334 |     def get_output(self, data, output_type):
335 |         def get_tmp_data(tmp_data_0_j, tmp_data_i_j):
336 |             if re.match('^\$[0-9]+$', tmp_data_0_j) or tmp_data_0_j == '$':
337 |                 return float(tmp_data_i_j) if tmp_data_i_j != '' else None
338 |             else:
339 |                 return tmp_data_i_j
340 |         if output_type == 'csv':
341 |             return data
342 |         elif output_type == 'rjson':
343 |             tmp_data = [d for d in csv.reader(StringIO.StringIO(data.strip()))]
344 |             data = []
345 |             for i in range(1, len(tmp_data)):
346 |                 row_data = {}
347 |                 for j in range(0, len(tmp_data[i])):
348 |                     row_data[tmp_data[0][j]] = get_tmp_data(
349 |                         tmp_data[0][j], tmp_data[i][j])
350 |                 data.append(row_data)
351 |             return json.dumps(data)
352 |         elif output_type == 'cjson':
353 |             tmp_data = [d for d in csv.reader(StringIO.StringIO(data.strip()))]
354 |             print tmp_data[0]
355 |             data = {}
356 |             for i in range(0, len(tmp_data[0])):
357 |                 print tmp_data[0][i]
358 |                 data[tmp_data[0][i]] = [get_tmp_data(tmp_data[0][i], tmp_data[j][
359 |                                                      i]) for j in range(1, len(tmp_data))]
360 |             return json.dumps(data)
361 |         else:
362 |             return self.error(self.msg['check-extension'])
363 | 
364 |     def mimetype(self, ext):
365 |         mt = 'text/plain' if ext == 'csv' else 'application/json'
366 |         mt = 'application/octet-stream' if request.args.get(
367 |             'dl') == 'true' else mt
368 |         return mt
369 | 
370 |     def response(self, res, ext):
371 |         return Response(res, mimetype=self.mimetype(ext), headers=self.header)
372 | 


--------------------------------------------------------------------------------
/python/examples.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | import sys
 4 | sys.path.append('./')
 5 | import e_Stat_API_Adaptor
 6 | eStatAPI = e_Stat_API_Adaptor.e_Stat_API_Adaptor({
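 7 |     'appId': '#appID#',  # The appId you obtained
 8 |     'limit': '10000',  # Number of records fetched per request when downloading data
 9 |     # Whether to follow next_key (if False, only the number of records set by limit above is downloaded)
10 |     # True to follow next_key / False to ignore it
11 |     'next_key': True,
12 |     'directory': '#absolute path#',  # Directory where the Intermediate App is installed
13 |     'ver': '2.0',  # API version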
14 | })
15 | #
16 | #
17 | # # Run the following right after installation
18 | # #
19 | # Download all statistical table IDs to the local machine
20 | # print eStatAPI.load_all_ids()
21 | # Build an index from the downloaded statistical table IDs
22 | # print eStatAPI.build_statid_index()
23 | #
24 | #
25 | # # Build an index from STATISTICS_NAME and TITLE (N-gram)
26 | # # print eStatAPI.build_detailed_index()
27 | # # print eStatAPI.search_detailed_index('家計')
28 | # #
29 | # # The following adds the matches to the user index
30 | # # print eStatAPI.create_user_index_from_detailed_index('法人')
31 | #
32 | #
33 | # # Search the index list
34 | # # print  eStatAPI.search_id('法人', eStatAPI.path['dictionary-index'] )
35 | # # Search the user-created index
36 | # # print  eStatAPI.search_id('法人', eStatAPI.path['dictionary-user'], 'user')
37 | #
38 | # # print eStatAPI.search_id('index',	eStatAPI.path['dictionary-index'])
39 | # # print eStatAPI.search_id('家計',  eStatAPI.path['dictionary-index'])
40 | #
41 | #
42 | # # Remove the CSV files
43 | # # eStatAPI.remove_file(eStatAPI.path['csv']+'*.csv')
44 | #
45 | # # Download data
46 | # # eStatAPI.get_csv('get' , '0000030002')
47 | # # Example of the resulting CSV file (row 1: column names, row 2: keys, data are strings)
48 | # #"$","全国都道府県030001","男女Ａ030001","年齢各歳階級Ｂ030003","全域・集中の別030002","時間軸(年次)","unit"
49 | # #"$","area","cat02","cat03","cat01","time","unit"
50 | # #"117060396","全国","男女総数","総数","全域","1980年","人"
51 | #
52 | # # Merge data
53 | # # print eStatAPI.merge_data('0000030001,0000030001', 'all', 'std')
54 | #
55 | # # View part of the data
56 | # # print eStatAPI.get_csv('head', '0000030001')
57 | # # print eStatAPI.get_csv('tail', '0000030001')
58 | #
59 | # # Output the data
60 | # # csv	:CSV format
61 | # # rjson :json-row format
62 | # # cjson :json-col format
63 | # # print eStatAPI.get_output(eStatAPI.get_csv('get' , '0000030001'),'csv')
64 | # # print eStatAPI.get_output(eStatAPI.get_csv('get' , '0000030001'),'rjson')
65 | # # print eStatAPI.get_output(eStatAPI.get_csv('get' , '0000030001'),'cjson')
66 | #
67 | # # File layout
68 | 


--------------------------------------------------------------------------------
/python/get_csv.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | import sys
 4 | sys.path.append('./')
 5 | import e_Stat_API_Adaptor
 6 | eStatAPI = e_Stat_API_Adaptor.e_Stat_API_Adaptor({
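 7 |     'appId'	: '#appID#' # The appId you obtained
 8 |     , 'limit'	: '10000'  # Number of records fetched per request when downloading data
 9 |     , 'next_key'	: True        # Whether to follow next_key (if False, only the number of records set by limit is downloaded); True to follow / False to ignore
10 |     , 'directory': '#Directory#'      # Directory where the Intermediate App is installed
11 |     , 'ver'		: '2.0'       # API version
12 | })
13 | # Download the table whose ID is passed as the first command-line argument, in CSV format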
14 | print 'id:' + sys.argv[1]
15 | print eStatAPI.get_csv('get', sys.argv[1])
16 | 


--------------------------------------------------------------------------------
/python/install.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | import sys
 4 | sys.path.append('./')
 5 | import e_Stat_API_Adaptor
 6 | eStatAPI = e_Stat_API_Adaptor.e_Stat_API_Adaptor({
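 7 |     # The appId you obtained
 8 |     'appId'	: '#appId#'   
 9 |     # Number of records fetched per request when downloading data
10 |     , 'limit'	: '10000'    
11 |     # Whether to follow next_key (if False, only the number of records set by limit above is downloaded)
12 |     # True to follow next_key / False to ignore it
13 |     , 'next_key'	: True        
14 |     # Directory where the Intermediate App is installed
15 |     , 'directory': '#absolute path# /foo/bar/'
16 |     # API version
17 |     , 'ver'		: '2.0'
18 | })
19 | # Download all statistical table IDs to the local machine
20 | print eStatAPI.load_all_ids()
21 | # Build an index from the downloaded statistical table IDs
22 | print eStatAPI.build_statid_index()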
23 | 


--------------------------------------------------------------------------------
/www/run.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | # -*- coding: utf-8 -*-
 3 | import sys
 4 | import csv
 5 | import random
 6 | import pandas as pd
 7 | from flask import Flask, request
 8 | sys.path.append('../python/')
 9 | 
10 | import e_Stat_API_Adaptor
11 | 
12 | app = Flask(__name__)
13 | eStatAPI = e_Stat_API_Adaptor.e_Stat_API_Adaptor({
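14 |     # The appId you obtained
15 |     'appId'	: '#appID#'   
16 |     , 'limit' : '10000'    # Number of records fetched per request when downloading data
17 |     , 'next_key': False # Whether to follow next_key (if False, only the number of records set by limit is downloaded); True to follow / False to ignore
18 |     , 'directory': '#absolute path# /foo/bar/'     # Directory where the Intermediate App is installed
19 |     , 'ver'		: '2.0'      # API version
20 |     , 'format'	: 'json'     # Format in which the data is fetched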
21 | })
22 | 
23 | 
24 | @app.route(eStatAPI.path['http-public'] + '<appId>/search/<q>.<ext>', methods=['GET'])
25 | def _search_id(appId, q, ext):
26 |     eStatAPI._['appId'] = appId
27 |     return eStatAPI.response(eStatAPI.get_output(eStatAPI.search_id(q, eStatAPI.path['dictionary-index']), ext), ext)
28 | 
29 | 
30 | @app.route(eStatAPI.path['http-public'] + '<appId>/<cmd>/<id>.<ext>', methods=['GET'])
31 | def _get_data(appId, cmd, id, ext):
32 |     eStatAPI._['appId'] = appId
33 |     return eStatAPI.response(eStatAPI.get_output(eStatAPI.get_csv(cmd, id), ext), ext)
34 | 
35 | 
36 | @app.route(eStatAPI.path['http-public'] + '<appId>/merge/<ids>/<group_by>.<ext>', methods=['GET'])
37 | def _merge_data(appId, ids, group_by, ext):
38 |     eStatAPI._['appId'] = appId
39 |     aggregate = request.args.get('aggregate') if request.args.get(
40 |         'aggregate') is not None else ''
41 |     data = eStatAPI.merge_data(ids, group_by, aggregate)
42 |     eStatAPI.path['tmp_merge'] = eStatAPI.path['tmp'] + '.'.join(
43 |         [eStatAPI._['appId'], ''.join(random.choice(eStatAPI.random_str) for _ in range(16)), 'csv'])
44 |     data.to_csv(eStatAPI.path['tmp_merge'],
45 |                 quoting=csv.QUOTE_NONNUMERIC, index=None)
46 |     tmp_csv = eStatAPI.cmd_line(eStatAPI.build_cmd(
47 |         ['cat', eStatAPI.path['tmp_merge']]))
48 |     eStatAPI.cmd_line(eStatAPI.build_cmd(['rm', eStatAPI.path['tmp_merge']]))
49 |     return eStatAPI.response(eStatAPI.get_output(tmp_csv, ext), ext)
50 | 
51 | if __name__ == '__main__':
52 |     app.run(host='0.0.0.0', debug=True)
53 | 


--------------------------------------------------------------------------------