├── .gitignore
├── LICENSE
├── README.md
├── heroku
    └── dadosabertosinep.zip
└── scripts
    ├── exemplo_mapreduce.json
    ├── exemplo_mapreduce.txt
    └── import
        ├── anos_finais.csv
        ├── anos_iniciais.csv
        ├── divulgacao-anos-finais-municipios-2011.csv
        ├── divulgacao-anos-iniciais-municipios-2011.csv
        ├── importa_ideb.py
        ├── importa_ideb_municipal.py
        └── importa_ideb_municipal_mongo.py


/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
_apps
_documentos
dados


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.) You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price.
Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 
55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. 
You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. 
But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. 
(This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. 
These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 
214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. 
If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | {description} 294 | Copyright (C) {year} {fullname} 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 
305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 331 | 332 | {signature of Ty Coon}, 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

## **INEP OPEN DATA PROJECT**

### Project goals
* Show that INEP data can be published following the recommendations of the Cartilha de Dados Abertos (the open data publishing guide, http://dados.gov.br/cartilha-publicacao-dados-abertos/)
* Publish the data in its rawest form and offer simplified access to it, with semantically meaningful URLs and unique, persistent identifiers

### API domain

* Use http://api.dadosabertosinep.org/v1 as the prefix for every call to this API


### API call URLs

**Returns the schools that match a given (non-exclusive) filter [microdata]**
* /ideb/escolas.{json}

* Parameters (at least one must be provided)
    * uf=[sigla] (state abbreviation, e.g. RR)
    * codigo_municipio=[cod_municipio]
    * rede=[municipal|estadual|federal|publica]

* Examples
    * http://api.dadosabertosinep.org/v1/ideb/escolas.json?uf=RR
    * http://api.dadosabertosinep.org/v1/ideb/escolas.json?uf=RR&rede=estadual
    * http://api.dadosabertosinep.org/v1/ideb/escolas.json?codigo_municipio=1100254
    * http://api.dadosabertosinep.org/v1/ideb/escolas.json?codigo_municipio=1100254&rede=municipal

**Returns a summary of the data for a given (non-exclusive) filter**
* /ideb.{json}

* Parameters
    * uf=[sigla] (required)

* Example
    * http://api.dadosabertosinep.org/v1/ideb.json?uf=ES

**Returns a specific school**
* /ideb/escola/[código_escola].{json}

* Example
    * http://api.dadosabertosinep.org/v1/ideb/escola/43101895.json

**(FUTURE) Returns a grouped summary for a specific UF (state)**
* /ideb/uf/[uf].{json}

* Parameters
    * rede=[municipal|estadual|federal|publica]

* Examples
    * http://api.dadosabertosinep.org/v1/ideb/uf/MG.json
    * http://api.dadosabertosinep.org/v1/ideb/uf/MG.json?rede=municipal

**(FUTURE) Returns the grouped result for a specific municipality**
* /ideb/municipio/[código_municipio].{json}

* Example
    * http://api.dadosabertosinep.org/v1/ideb/municipio/1100254.json

### Project roadmap

1. Refine the logic used to organize the data

    1. Create a “single schools bucket” that groups the school data from the censuses (basic education and higher education) with each school's unique information (including geolocation data)
    1. Create a “school census bucket” with the data on structure, courses, teachers and students
    1. Create a “bucket for each indicator” with the school code as the key. Example: bucket “ideb”, with the key “11046430” ([código_escola]_[indicador]), and under that index show the data grouped by year
    1. Study and identify a bucket model for the surveys (SAEB, ENEM, PADAE, PNERA, PROVAO, PROVA BRASIL, CENSO MAGISTERIO)

2. Generate views of the raw data in csv, html and xml formats, as http://api.convenios.gov.br/ does


--------------------------------------------------------------------------------
/heroku/dadosabertosinep.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/inepdadosabertos/api/7d32b7e23bc6b5eb0fa2916d1cf982d3203c79d6/heroku/dadosabertosinep.zip


--------------------------------------------------------------------------------
/scripts/exemplo_mapreduce.json:
--------------------------------------------------------------------------------
{
    "inputs": [
        ["test_leao", "11047739"],
        ["test_leao", "11043709"],
        ["test_leao", "11040629"],
        ["test_leao", "11037512"],
        ["test_leao", "11033576"],
        ["test_leao", "11033371"],
        ["test_leao", "11026812"],
        ["test_leao", "11026235"],
        ["test_leao", "11025310"],
        ["test_leao", "11022388"],
        ["test_leao", "11022221"]
    ],
    "query": [
        {
            "map": {
                "language": "javascript",
                "source": "function(value){ return Riak.mapValuesJson(value); }",
                "keep": true
            }
        }
    ]
}


--------------------------------------------------------------------------------
/scripts/exemplo_mapreduce.txt:
--------------------------------------------------------------------------------
curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' -d @exemplo_mapreduce.json

http://127.0.0.1:8098/buckets/ideb_escola/index/rede_bin/municipal/
http://127.0.0.1:8098/buckets/ideb_municipio/index/uf_bin/MG/


--------------------------------------------------------------------------------
/scripts/import/importa_ideb.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-

HOST = "54.207.108.144" #"localhost"
# alternative HOST value: "10.0.0.36"
PORT = 8098
BUCKET_NAME = "ideb_escola"


import csv

import riak

# Python 2 script: CSV rows are byte strings and are decoded explicitly.

YEARS = ["2005", "2007", "2009", "2011"]
PROJECTION_YEARS = ["2007", "2009", "2011", "2013", "2015", "2017", "2019", "2021"]
SCORE_KEYS = ["matematica", "lingua_portuguesa", "nota_media_padronizada"]

# (CSV file, stage name, approval-rate keys in column order)
STAGES = [
    ("anos_iniciais.csv", "anos_iniciais",
     ["ano_1a5", "ano_1", "ano_2", "ano_3", "ano_4", "ano_5", "indicador_rendimento"]),
    ("anos_finais.csv", "anos_finais",
     ["ano_6a9", "ano_6", "ano_7", "ano_8", "ano_9", "indicador_rendimento"]),
]


def format_number(number, number_type):
    """Convert a CSV cell to number_type; "ND" and "ND*" markers become None."""
    if number.lower() in ("nd", "nd*"):
        return None
    # Every "-" is replaced by "0" (a lone "-" reads as 0); decimal commas become dots.
    return number_type(number.replace("-", "0").replace(",", "."))


def create_new_item(codigo_escola, row):
    """Build the empty document for one school from its identifying columns."""
    return {
        "codigo_escola": codigo_escola,
        "codigo_municipio": row[1],
        "uf": row[0],
        "nome_municipio": row[2],
        "nome_escola": row[4].decode('UTF-8'),
        "rede": row[5].lower(),
        "taxa_aprovacao": {year: {} for year in YEARS},
        "nota_prova_brasil": {year: {} for year in YEARS},
        "ideb": {},
        "projecoes": {},
    }


def read_values(row, start, keys):
    """Map consecutive columns, beginning at `start`, onto `keys` as floats."""
    return {key: format_number(row[start + i], float) for i, key in enumerate(keys)}


def load_stage(filename, stage, approval_keys, dados):
    """Merge one stage's CSV file into `dados`, keyed by school code."""
    with open(filename, "rb") as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='"')
        next(reader)  # skip the header row
        for row in reader:
            codigo_escola = row[3]
            print(codigo_escola)

            # Reuse the document created by an earlier stage, if any.
            item = dados.get(codigo_escola)
            if item is None:
                item = create_new_item(codigo_escola, row)

            col = 6  # first data column after the identifying fields
            for year in YEARS:
                item["taxa_aprovacao"][year][stage] = read_values(row, col, approval_keys)
                col += len(approval_keys)
            for year in YEARS:
                item["nota_prova_brasil"][year][stage] = read_values(row, col, SCORE_KEYS)
                col += len(SCORE_KEYS)
            item["ideb"][stage] = read_values(row, col, YEARS)
            col += len(YEARS)
            item["projecoes"][stage] = read_values(row, col, PROJECTION_YEARS)

            dados[codigo_escola] = item


client = riak.RiakClient(host=HOST, http_port=PORT)
escola_bucket = client.bucket(BUCKET_NAME)

print(client.get_buckets())

dados = {}
for filename, stage, approval_keys in STAGES:
    print("**** %s ****" % stage)
    load_stage(filename, stage, approval_keys, dados)

print("saving to Riak...")

for codigo_escola, item in dados.items():
    print("> %s" % codigo_escola)
    new_escola = escola_bucket.new(codigo_escola, data=item)

    # Secondary indexes for filtering by state, municipality and school network.
    new_escola.add_index('uf_bin', item['uf'])
    new_escola.add_index('codigo_municipio_bin', item['codigo_municipio'])
    new_escola.add_index('rede_bin', item['rede'])
    new_escola.add_index('uf_rede_bin', "%s_%s" % (item['uf'], item['rede']))
    new_escola.add_index('codigo_municipio_rede_bin', "%s_%s" % (item['codigo_municipio'], item['rede']))
    new_escola.store()

print("done!")


"""
CSV COLUMN MAP

 0 uf,
 1 codigo_municipio,
 2 nome_municipio,
 3 codigo_escola,
 4 nome_escola,
 5 rede,
 6 2005_aprov_1a5,
 7 2005_ano_1,
 8 2005_ano_2,
 9 2005_ano_3,
10 2005_ano_4,
11 2005_ano_5,
12 2005_indicador_1a5,
13 2007_aprov_1a5,
14 2007_ano_1,
15 2007_ano_2,
16 2007_ano_3,
17 2007_ano_4,
18 2007_ano_5,
19 2007_indicador_1a5,
20 2009_aprov_1a5,
21 2009_ano_1,
22 2009_ano_2,
23 2009_ano_3,
24 2009_ano_4,
25 2009_ano_5,
26 2009_indicador_1a5,
27 2011_aprov_1a5,
28 2011_ano_1,
29 2011_ano_2,
30 2011_ano_3,
31 2011_ano_4,
32 2011_ano_5,
33 2011_indicador_1a5,
34 2005_provabrasil_Matemática,
35 2005_provabrasil_Língua Portuguesa,
36 2005_provabrasil_Nota Média Padronizada (N),
37 2007_provabrasil_Matemática,
38 2007_provabrasil_Língua Portuguesa,
39 2007_provabrasil_Nota Média Padronizada (N),
40 2009_provabrasil_Matemática,
41 2009_provabrasil_Língua Portuguesa,
42 2009_provabrasil_Nota Média Padronizada (N),
43 2011_provabrasil_Matemática,
44 2011_provabrasil_Língua Portuguesa,
45 2011_provabrasil_MEDIA_PADRONIZADA,
46 Ideb_2005,
47 Ideb_2007,
48 Ideb_2009,
49 Ideb_2011,
50 Proj_2007,
51 Proj_2009,
52 Proj_2011,
53 Proj_2013,
54 Proj_2015,
55 Proj_2017,
56 Proj_2019,
57 Proj_2021

"""

--------------------------------------------------------------------------------
/scripts/import/importa_ideb_municipal.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-


HOST = "54.207.108.144" #"localhost" #"10.0.0.36"
PORT = 8098
BUCKET_NAME = "ideb_municipio"


import riak
import time
import uuid
import csv
import json


def format_rede(rede):
    # Turn the network name into a dictionary key: lowercase, "ú" -> "u".
    return rede.lower().replace("ú", "u")

def format_number(number, type):
    # "ND" and "ND*" are the CSV's placeholders for missing values.
    if (number.lower() == "nd" or number.lower() == "nd*"):
        return None

    # Hyphens are replaced with 0 (a lone "-" becomes 0) and decimal
    # commas with decimal points before the cast.
    return type(number.replace("-", "0").replace(",", "."))


def create_new_item(codigo_municipio, row):
    return {
        "codigo_municipio": codigo_municipio,
        "uf": row[0],
        "nome_municipio": row[2],
        "redes": {}
    }

def create_new_rede(codigo_rede):
    return {
        "taxa_aprovacao": {
            "2005": {},
            "2007": {},
            "2009": {},
            "2011": {}
        },

        "nota_prova_brasil": {
            "2005": {},
            "2007": {},
            "2009": {},
            "2011": {}
        },

        "ideb": {},
        "projecoes": {}
    }


client = riak.RiakClient(host=HOST, http_port=PORT)


escola_bucket = client.bucket(BUCKET_NAME)

# Records are accumulated in memory below; nothing is written to Riak
# until the final loop.


print client.get_buckets()


dados = {}
dados_array = []


i=0


# First pass: initial-years (grades 1-5) indicators, one CSV row per
# municipality and school network.
with open("divulgacao-anos-iniciais-municipios-2011.csv", "rb") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = spamreader.next()
    for row in spamreader:

        #i+=1
        #if i > 100:
        #    i=0
        #    break;

        codigo_municipio = row[1]
        codigo_rede = format_rede(row[3])
        print codigo_municipio

        item = create_new_item(codigo_municipio, row)
        if codigo_municipio in dados:
            item = dados[codigo_municipio]

        rede = create_new_rede(codigo_rede)
        if codigo_rede in item["redes"]:
            rede = item["redes"][codigo_rede]


        rede["taxa_aprovacao"]["2005"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[4], float),
            "ano_1": format_number(row[5], float),
            "ano_2": format_number(row[6], float),
            "ano_3": format_number(row[7], float),
            "ano_4": format_number(row[8], float),
            "ano_5": format_number(row[9], float),
            "indicador_rendimento": format_number(row[10], float)
        }

        rede["taxa_aprovacao"]["2007"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[11], float),
            "ano_1": format_number(row[12], float),
            "ano_2": format_number(row[13], float),
            "ano_3": format_number(row[14], float),
            "ano_4": format_number(row[15], float),
            "ano_5": format_number(row[16], float),
            "indicador_rendimento": format_number(row[17], float)
        }
        rede["taxa_aprovacao"]["2009"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[18], float),
            "ano_1": format_number(row[19], float),
            "ano_2": format_number(row[20], float),
            "ano_3": format_number(row[21], float),
            "ano_4": format_number(row[22], float),
            "ano_5": format_number(row[23], float),
            # Parsed like every other value (it was stored as a raw string).
            "indicador_rendimento": format_number(row[24], float)
        }
        rede["taxa_aprovacao"]["2011"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[25], float),
            "ano_1": format_number(row[26], float),
            "ano_2": format_number(row[27], float),
            "ano_3": format_number(row[28], float),
            "ano_4": format_number(row[29], float),
            "ano_5": format_number(row[30], float),
            "indicador_rendimento": format_number(row[31], float)
        }


        rede["nota_prova_brasil"]["2005"]["anos_iniciais"] = {
            "matematica": format_number(row[32], float),
            "lingua_portuguesa": format_number(row[33], float),
            "nota_media_padronizada": format_number(row[34], float)
        }
        rede["nota_prova_brasil"]["2007"]["anos_iniciais"] = {
            "matematica": format_number(row[35], float),
            "lingua_portuguesa": format_number(row[36], float),
            "nota_media_padronizada": format_number(row[37], float)
        }
        rede["nota_prova_brasil"]["2009"]["anos_iniciais"] = {
            "matematica": format_number(row[38], float),
            "lingua_portuguesa": format_number(row[39], float),
            "nota_media_padronizada": format_number(row[40], float)
        }
        rede["nota_prova_brasil"]["2011"]["anos_iniciais"] = {
            "matematica": format_number(row[41], float),
            "lingua_portuguesa": format_number(row[42], float),
            "nota_media_padronizada": format_number(row[43], float)
        }

        rede["ideb"]["anos_iniciais"] = {
            "2005": format_number(row[44], float),
            "2007": format_number(row[45], float),
            "2009": format_number(row[46], float),
            "2011": format_number(row[47], float)
        }
        rede["projecoes"]["anos_iniciais"] = {
            "2007": format_number(row[48], float),
            "2009": format_number(row[49], float),
            "2011": format_number(row[50], float),
            "2013": format_number(row[51], float),
            "2015": format_number(row[52], float),
            "2017": format_number(row[53], float),
            "2019": format_number(row[54], float),
            "2021": format_number(row[55], float)
        }

        item["redes"][codigo_rede] = rede
        dados[codigo_municipio] = item


print "**** FINAL-YEARS DATA ***"

with open("divulgacao-anos-finais-municipios-2011.csv", "rb") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = spamreader.next()
    for row in spamreader:

        #i+=1
        #if i > 100:
        #    i=0
        #    break;

        codigo_municipio = row[1]
        codigo_rede = format_rede(row[3])
        print codigo_municipio

        item = create_new_item(codigo_municipio, row)
        if codigo_municipio in dados:
            item = dados[codigo_municipio]

        rede = create_new_rede(codigo_rede)
        if codigo_rede in item["redes"]:
            rede = item["redes"][codigo_rede]

        # Final-years values are stored under "anos_finais" so they do not
        # overwrite the "anos_iniciais" values loaded in the first pass.
        # Each year spans six consecutive columns, starting at column 4
        # (the 2007 and 2009 indexes were shifted by one).
        rede["taxa_aprovacao"]["2005"]["anos_finais"] = {
            "ano_6a9": format_number(row[4], float),
            "ano_6": format_number(row[5], float),
            "ano_7": format_number(row[6], float),
            "ano_8": format_number(row[7], float),
            "ano_9": format_number(row[8], float),
            "indicador_rendimento": format_number(row[9], float)
        }

        rede["taxa_aprovacao"]["2007"]["anos_finais"] = {
            "ano_6a9": format_number(row[10], float),
            "ano_6": format_number(row[11], float),
            "ano_7": format_number(row[12], float),
            "ano_8": format_number(row[13], float),
            "ano_9": format_number(row[14], float),
            "indicador_rendimento": format_number(row[15], float)
        }
        rede["taxa_aprovacao"]["2009"]["anos_finais"] = {
            "ano_6a9": format_number(row[16], float),
            "ano_6": format_number(row[17], float),
            "ano_7": format_number(row[18], float),
            "ano_8": format_number(row[19], float),
            "ano_9": format_number(row[20], float),
            "indicador_rendimento": format_number(row[21], float)
        }
        rede["taxa_aprovacao"]["2011"]["anos_finais"] = {
            "ano_6a9": format_number(row[22], float),
            "ano_6": format_number(row[23], float),
            "ano_7": format_number(row[24], float),
            "ano_8": format_number(row[25], float),
            "ano_9": format_number(row[26], float),
            "indicador_rendimento": format_number(row[27], float)
        }


        # Stored under "anos_finais" so the initial-years values survive.
        rede["nota_prova_brasil"]["2005"]["anos_finais"] = {
            "matematica": format_number(row[28], float),
            "lingua_portuguesa": format_number(row[29], float),
            "nota_media_padronizada": format_number(row[30], float)
        }
        rede["nota_prova_brasil"]["2007"]["anos_finais"] = {
            "matematica": format_number(row[31], float),
            "lingua_portuguesa": format_number(row[32], float),
            "nota_media_padronizada": format_number(row[33], float)
        }
        rede["nota_prova_brasil"]["2009"]["anos_finais"] = {
            "matematica": format_number(row[34], float),
            "lingua_portuguesa": format_number(row[35], float),
            "nota_media_padronizada": format_number(row[36], float)
        }
        rede["nota_prova_brasil"]["2011"]["anos_finais"] = {
            "matematica": format_number(row[37], float),
            "lingua_portuguesa": format_number(row[38], float),
            "nota_media_padronizada": format_number(row[39], float)
        }

        rede["ideb"]["anos_finais"] = {
            "2005": format_number(row[40], float),
            "2007": format_number(row[41], float),
            "2009": format_number(row[42], float),
            "2011": format_number(row[43], float)
        }
        rede["projecoes"]["anos_finais"] = {
            "2007": format_number(row[44], float),
            "2009": format_number(row[45], float),
            "2011": format_number(row[46], float),
            "2013": format_number(row[47], float),
            "2015": format_number(row[48], float),
            "2017": format_number(row[49], float),
            "2019": format_number(row[50], float),
            "2021": format_number(row[51], float)
        }

        item["redes"][codigo_rede] = rede
        dados[codigo_municipio] = item


print "saving to Riak..."

# Store one Riak object per municipality, keyed by codigo_municipio.
for codigo_municipio in dados:
    print "> %s" % codigo_municipio
    item = dados[codigo_municipio]
    new_escola = escola_bucket.new(
        "%s" % (codigo_municipio), data=item)

    new_escola.add_index('uf_bin', item['uf'])
    new_escola.add_index('codigo_municipio_bin', item['codigo_municipio'])
    new_escola.store()


    #print row
    #print ', '.join(row)

    #break


print "done!"


#print dados


"""
CSV COLUMN MAP (initial-years file, as indexed by the code above)

 0 uf,
 1 codigo_municipio,
 2 nome_municipio,
 3 rede,
 4 2005_aprov_1a5,
 5 2005_ano_1,
 6 2005_ano_2,
 7 2005_ano_3,
 8 2005_ano_4,
 9 2005_ano_5,
10 2005_indicador_1a5,
11 2007_aprov_1a5,
12 2007_ano_1,
13 2007_ano_2,
14 2007_ano_3,
15 2007_ano_4,
16 2007_ano_5,
17 2007_indicador_1a5,
18 2009_aprov_1a5,
19 2009_ano_1,
20 2009_ano_2,
21 2009_ano_3,
22 2009_ano_4,
23 2009_ano_5,
24 2009_indicador_1a5,
25 2011_aprov_1a5,
26 2011_ano_1,
27 2011_ano_2,
28 2011_ano_3,
29 2011_ano_4,
30 2011_ano_5,
31 2011_indicador_1a5,
32 2005_provabrasil_Matemática,
33 2005_provabrasil_Língua Portuguesa,
34 2005_provabrasil_Nota Média Padronizada (N),
35 2007_provabrasil_Matemática,
36 2007_provabrasil_Língua Portuguesa,
37 2007_provabrasil_Nota Média Padronizada (N),
38 2009_provabrasil_Matemática,
39 2009_provabrasil_Língua Portuguesa,
40 2009_provabrasil_Nota Média Padronizada (N),
41 2011_provabrasil_Matemática,
42 2011_provabrasil_Língua Portuguesa,
43 2011_provabrasil_MEDIA_PADRONIZADA,
44 Ideb_2005,
45 Ideb_2007,
46 Ideb_2009,
47 Ideb_2011,
48 Proj_2007,
49 Proj_2009,
50 Proj_2011,
51 Proj_2013,
52 Proj_2015,
53 Proj_2017,
54 Proj_2019,
55 Proj_2021

"""
--------------------------------------------------------------------------------
/scripts/import/importa_ideb_municipal_mongo.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: utf-8 -*-


HOST = "localhost" #"10.0.0.36"
PORT = 8098
BUCKET_NAME = "ideb_municipio"

HOST_MONGO = "localhost"
PORT_MONGO = 27017
DATA_BASE_NAME = "ideb"


import riak
# MongoClient is used unqualified below, so import the name itself.
from pymongo import MongoClient
import time
import uuid
import csv
import json


def format_number(number, type):
    # "ND" and "ND*" are the CSV's placeholders for missing values.
    if (number.lower() == "nd" or number.lower() == "nd*"):
        return None

    # Hyphens are replaced with 0 (a lone "-" becomes 0) and decimal
    # commas with decimal points before the cast.
    return type(number.replace("-", "0").replace(",", "."))


def create_new_item(codigo_municipio, row):
    return {
        "codigo_municipio": codigo_municipio,
        "uf": row[0],
        "nome_municipio": row[2],
        "redes": {}
    }

def create_new_rede(codigo_rede):
    return {
        "taxa_aprovacao": {
            "2005": {},
            "2007": {},
            "2009": {},
            "2011": {}
        },

        "nota_prova_brasil": {
            "2005": {},
            "2007": {},
            "2009": {},
            "2011": {}
        },

        "ideb": {},
        "projecoes": {}
    }


client = riak.RiakClient(host=HOST, http_port=PORT)


municipio_bucket = client.bucket(BUCKET_NAME)
mongoClient = MongoClient(HOST_MONGO, PORT_MONGO)
mongoDB = mongoClient[DATA_BASE_NAME]

# Records are accumulated in memory below; nothing is written to MongoDB
# until the end of the script.


print client.get_buckets()


dados = {}
dados_array = []


i=0


with open("D:/Data/INEP/IDEB/divulgacao-anos-iniciais-escolas-2011.xls", "rb") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = spamreader.next()
    for row in spamreader:

        #i+=1
        #if i > 100:
        #    i=0
        #    break;

        codigo_municipio = row[1]
        codigo_rede = row[3]
        print codigo_municipio

        item = create_new_item(codigo_municipio, row)
        if codigo_municipio in dados:
            item = dados[codigo_municipio]

        rede = create_new_rede(codigo_rede)
        if codigo_rede in item["redes"]:
            rede = item["redes"][codigo_rede]


        rede["taxa_aprovacao"]["2005"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[4], float),
            "ano_1": format_number(row[5], float),
            "ano_2": format_number(row[6], float),
            "ano_3": format_number(row[7], float),
            "ano_4": format_number(row[8], float),
            "ano_5": format_number(row[9], float),
            "indicador_rendimento": format_number(row[10], float)
        }

        rede["taxa_aprovacao"]["2007"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[11], float),
            "ano_1": format_number(row[12], float),
            "ano_2": format_number(row[13], float),
            "ano_3": format_number(row[14], float),
            "ano_4": format_number(row[15], float),
            "ano_5": format_number(row[16], float),
            "indicador_rendimento": format_number(row[17], float)
        }
        rede["taxa_aprovacao"]["2009"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[18], float),
            "ano_1": format_number(row[19], float),
            "ano_2": format_number(row[20], float),
            "ano_3": format_number(row[21], float),
            "ano_4": format_number(row[22], float),
            "ano_5": format_number(row[23], float),
            # Parsed like every other value (it was stored as a raw string).
            "indicador_rendimento": format_number(row[24], float)
        }
        rede["taxa_aprovacao"]["2011"]["anos_iniciais"] = {
            "ano_1a5": format_number(row[25], float),
            "ano_1": format_number(row[26], float),
            "ano_2": format_number(row[27], float),
            "ano_3": format_number(row[28], float),
            "ano_4": format_number(row[29], float),
            "ano_5": format_number(row[30], float),
            "indicador_rendimento": format_number(row[31], float)
        }


        rede["nota_prova_brasil"]["2005"]["anos_iniciais"] = {
            "matematica": format_number(row[32], float),
            "lingua_portuguesa": format_number(row[33], float),
            "nota_media_padronizada": format_number(row[34], float)
        }
        rede["nota_prova_brasil"]["2007"]["anos_iniciais"] = {
            "matematica": format_number(row[35], float),
            "lingua_portuguesa": format_number(row[36], float),
            "nota_media_padronizada": format_number(row[37], float)
        }
        rede["nota_prova_brasil"]["2009"]["anos_iniciais"] = {
            "matematica": format_number(row[38], float),
            "lingua_portuguesa": format_number(row[39], float),
            "nota_media_padronizada": format_number(row[40], float)
        }
        rede["nota_prova_brasil"]["2011"]["anos_iniciais"] = {
            "matematica": format_number(row[41], float),
            "lingua_portuguesa": format_number(row[42], float),
            "nota_media_padronizada": format_number(row[43], float)
        }

        rede["ideb"]["anos_iniciais"] = {
            "2005": format_number(row[44], float),
            "2007": format_number(row[45], float),
            "2009": format_number(row[46], float),
            "2011": format_number(row[47], float)
        }
        rede["projecoes"]["anos_iniciais"] = {
            "2007": format_number(row[48], float),
            "2009": format_number(row[49], float),
            "2011": format_number(row[50], float),
            "2013": format_number(row[51], float),
            "2015": format_number(row[52], float),
            "2017": format_number(row[53], float),
            "2019": format_number(row[54], float),
            "2021": format_number(row[55], float)
        }

        item["redes"][codigo_rede] = rede
        dados[codigo_municipio] = item


print "**** FINAL-YEARS DATA ***"

with open("D:/Data/INEP/IDEB/divulgacao-anos-finais-municipios-2011.csv", "rb") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = spamreader.next()
    for row in spamreader:

        #i+=1
        #if i > 100:
        #    i=0
        #    break;

        codigo_municipio = row[1]
        codigo_rede = row[3]
        print codigo_municipio

        item = create_new_item(codigo_municipio, row)
        if codigo_municipio in dados:
            item = dados[codigo_municipio]

        rede = create_new_rede(codigo_rede)
        if codigo_rede in item["redes"]:
            rede = item["redes"][codigo_rede]

        # Final-years values are stored under "anos_finais" so they do not
        # overwrite the "anos_iniciais" values loaded in the first pass.
        # Each year spans six consecutive columns, starting at column 4
        # (the 2007 and 2009 indexes were shifted by one).
        rede["taxa_aprovacao"]["2005"]["anos_finais"] = {
            "ano_6a9": format_number(row[4], float),
            "ano_6": format_number(row[5], float),
            "ano_7": format_number(row[6], float),
            "ano_8": format_number(row[7], float),
            "ano_9": format_number(row[8], float),
            "indicador_rendimento": format_number(row[9], float)
        }

        rede["taxa_aprovacao"]["2007"]["anos_finais"] = {
            "ano_6a9": format_number(row[10], float),
            "ano_6": format_number(row[11], float),
            "ano_7": format_number(row[12], float),
            "ano_8": format_number(row[13], float),
            "ano_9": format_number(row[14], float),
            "indicador_rendimento": format_number(row[15], float)
        }
        rede["taxa_aprovacao"]["2009"]["anos_finais"] = {
            "ano_6a9": format_number(row[16], float),
            "ano_6": format_number(row[17], float),
            "ano_7": format_number(row[18], float),
            "ano_8": format_number(row[19], float),
            "ano_9": format_number(row[20], float),
            "indicador_rendimento": format_number(row[21], float)
        }
        rede["taxa_aprovacao"]["2011"]["anos_finais"] = {
            "ano_6a9": format_number(row[22], float),
            "ano_6": format_number(row[23], float),
            "ano_7": format_number(row[24], float),
            "ano_8": format_number(row[25], float),
            "ano_9": format_number(row[26], float),
            "indicador_rendimento": format_number(row[27], float)
        }


        rede["nota_prova_brasil"]["2005"]["anos_finais"] = {
            "matematica": format_number(row[28], float),
            "lingua_portuguesa": format_number(row[29], float),
            "nota_media_padronizada": format_number(row[30], float)
        }
        rede["nota_prova_brasil"]["2007"]["anos_finais"] = {
            "matematica": format_number(row[31], float),
            "lingua_portuguesa": format_number(row[32], float),
            "nota_media_padronizada": format_number(row[33], float)
        }
        rede["nota_prova_brasil"]["2009"]["anos_finais"] = {
            "matematica": format_number(row[34], float),
            "lingua_portuguesa": format_number(row[35], float),
            "nota_media_padronizada": format_number(row[36], float)
        }
        rede["nota_prova_brasil"]["2011"]["anos_finais"] = {
            "matematica": format_number(row[37], float),
            "lingua_portuguesa": format_number(row[38], float),
            "nota_media_padronizada": format_number(row[39], float)
        }

        rede["ideb"]["anos_finais"] = {
            "2005": format_number(row[40], float),
            "2007": format_number(row[41], float),
            "2009": format_number(row[42], float),
            "2011": format_number(row[43], float)
        }
        rede["projecoes"]["anos_finais"] = {
            "2007": format_number(row[44], float),
            "2009": format_number(row[45], float),
            "2011": format_number(row[46], float),
            "2013": format_number(row[47], float),
            "2015": format_number(row[48], float),
            "2017": format_number(row[49], float),
            "2019": format_number(row[50], float),
            "2021": format_number(row[51], float)
        }

        item["redes"][codigo_rede] = rede
        dados[codigo_municipio] = item


print "saving to MongoDB..."

municipiosDB = mongoDB.municipio
# Insert one document per municipality. Passing the whole `dados` dict
# would store a single document keyed by every codigo_municipio.
municipiosDB.insert(dados.values())


print "done!"


#print dados


"""
CSV COLUMN MAP (initial-years file, as indexed by the code above)

 0 uf,
 1 codigo_municipio,
 2 nome_municipio,
 3 rede,
 4 2005_aprov_1a5,
 5 2005_ano_1,
 6 2005_ano_2,
 7 2005_ano_3,
 8 2005_ano_4,
 9 2005_ano_5,
10 2005_indicador_1a5,
11 2007_aprov_1a5,
12 2007_ano_1,
13 2007_ano_2,
14 2007_ano_3,
15 2007_ano_4,
16 2007_ano_5,
17 2007_indicador_1a5,
18 2009_aprov_1a5,
19 2009_ano_1,
20 2009_ano_2,
21 2009_ano_3,
22 2009_ano_4,
23 2009_ano_5,
24 2009_indicador_1a5,
25 2011_aprov_1a5,
26 2011_ano_1,
27 2011_ano_2,
28 2011_ano_3,
29 2011_ano_4,
30 2011_ano_5,
31 2011_indicador_1a5,
32 2005_provabrasil_Matemática,
33 2005_provabrasil_Língua Portuguesa,
34 2005_provabrasil_Nota Média Padronizada (N),
35 2007_provabrasil_Matemática,
36 2007_provabrasil_Língua Portuguesa,
37 2007_provabrasil_Nota Média Padronizada (N),
38 2009_provabrasil_Matemática,
39 2009_provabrasil_Língua Portuguesa,
40 2009_provabrasil_Nota Média Padronizada (N),
41 2011_provabrasil_Matemática,
42 2011_provabrasil_Língua Portuguesa,
43 2011_provabrasil_MEDIA_PADRONIZADA,
44 Ideb_2005,
45 Ideb_2007,
46 Ideb_2009,
47 Ideb_2011,
48 Proj_2007,
49 Proj_2009,
50 Proj_2011,
51 Proj_2013,
52 Proj_2015,
53 Proj_2017,
54 Proj_2019,
55 Proj_2021

"""
--------------------------------------------------------------------------------
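The import scripts above address every CSV column by a hand-written index, which is easy to get wrong when a block of columns is copied from one file layout to another. As a minimal sketch only, and not part of the repository, the same nested dictionaries could be built by walking the row left to right from a declared field list. The names `parse_indicators`, `take`, `YEARS`, `PROJECTION_YEARS`, `INICIAIS_FIELDS`, `FINAIS_FIELDS` and `PROVA_BRASIL_FIELDS` are hypothetical, and the sketch assumes the column order the municipal scripts rely on: approval rates per year, then Prova Brasil scores per year, then IDEB values, then projections. It returns one dictionary per school stage instead of writing into the `["2005"]["anos_finais"]` structure the scripts use.

```python
YEARS = ("2005", "2007", "2009", "2011")
PROJECTION_YEARS = ("2007", "2009", "2011", "2013",
                    "2015", "2017", "2019", "2021")
INICIAIS_FIELDS = ("ano_1a5", "ano_1", "ano_2", "ano_3", "ano_4", "ano_5",
                   "indicador_rendimento")
FINAIS_FIELDS = ("ano_6a9", "ano_6", "ano_7", "ano_8", "ano_9",
                 "indicador_rendimento")
PROVA_BRASIL_FIELDS = ("matematica", "lingua_portuguesa",
                       "nota_media_padronizada")


def format_number(number, cast=float):
    """Same rules as the scripts: ND/ND* -> None, "-" -> 0, "," -> "."."""
    if number.lower() in ("nd", "nd*"):
        return None
    return cast(number.replace("-", "0").replace(",", "."))


def parse_indicators(row, first_col, grade_fields):
    """Consume the indicator columns of one CSV row, left to right.

    first_col is the index of the first approval-rate column (4 in the
    municipal files, assuming the layout described above).
    """
    cells = iter(row[first_col:])

    def take(fields):
        # Read one cell per field, in column order.
        return {field: format_number(next(cells)) for field in fields}

    # The order of these statements mirrors the assumed column order.
    taxa_aprovacao = {year: take(grade_fields) for year in YEARS}
    nota_prova_brasil = {year: take(PROVA_BRASIL_FIELDS) for year in YEARS}
    ideb = take(YEARS)
    projecoes = take(PROJECTION_YEARS)

    return {
        "taxa_aprovacao": taxa_aprovacao,
        "nota_prova_brasil": nota_prova_brasil,
        "ideb": ideb,
        "projecoes": projecoes,
    }


# Synthetic row (not real INEP data): each cell holds its own column index,
# so the parsed values show which column every field was read from.
example_row = ["UF", "0000000", "Example", "rede"] + [str(n) for n in range(4, 52)]
example = parse_indicators(example_row, 4, FINAIS_FIELDS)
```

With a layout declared once per file (`INICIAIS_FIELDS` or `FINAIS_FIELDS`), a shifted index can only come from a wrong `first_col` or a wrong field list, both of which are visible in one place.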