├── LICENSE
├── README.md
└── src
    ├── f.sh
    ├── page.sh
    └── port.sh


/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2013 qindongliang
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # shell-mysql
2 | Read MySQL data page by page with shell scripts
3 |
4 | ### Background
5 | **I needed to read table data from a remote MySQL server on Linux, clean it up a little, and upload it to a Hadoop cluster. Writing that in Java felt like too much trouble: develop on Windows, build a jar,
6 | upload it to Linux, and repeat the whole cycle whenever something goes wrong. So I wrote it in shell instead. No JDBC driver is needed, only the MySQL
7 | client on the Linux machine, which a single yum command installs. I spent a little time wrapping this up as a small script.**
8 |
9 | ### Features
10 | **A small script that reads MySQL table data remotely, page by page, directly from a shell on Linux. Tested on reading more than 6 million rows;
11 | throughput was roughly on par with JDBC.**
12 |
13 | ### Scripts
14 | **The tool consists of three scripts:
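1. page.sh: the main script; it defines the paging conditions, which are easy to follow once you read it
2. f.sh: a small wrapper script that uses sed to strip the table header from the output
3. port.sh: the script that actually runs the paged query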
**
15 |
16 |
17 | ### Usage
18 | **Usage is simple: edit the query fields and the page size (default 10000) in page.sh, then run sh page.sh databaseName tableName, passing the database name and the table name.**
19 |
20 | ### Blog
21 |
22 | (1) [Personal site (kept in sync since 2018)](http://8090nixi.com/)
23 |
24 | (2) [iteye blog]()
25 |
26 |
27 |
28 |
29 |
30 |
31 | ### My WeChat official account (woshigcs)
32 |
33 | If you have questions, follow my official account and leave a message.
34 |
35 | ![image](https://github.com/qindongliang/answer_sheet_scan/blob/master/imgs/gcs.jpg)
36 |
--------------------------------------------------------------------------------
/src/f.sh:
--------------------------------------------------------------------------------
1 |
2 | # First argument: table name
3 | # Second argument: start
4 | # Third argument: offset
5 | # Read one page of data, delete the first line (the column header), and append the rest to a file named after the table
6 | sh port.sh "$1" "$2" "$3" | sed '1d' >> "tables/$1"
7 |
8 | # Strip leftover HTML tags
9 | #sed -i 's/<[^>]*>//g;/^$/d' tables/$1
10 | # Strip some unwanted special characters
11 | #sed -i 's/[a-zA-Z\.():;><-]//g' tables/$1
12 |
13 |
14 |
15 |
16 |
17 |
18 |
--------------------------------------------------------------------------------
/src/page.sh:
--------------------------------------------------------------------------------
1 |
2 | # Log in to MySQL and query the total number of rows in a table
3 | MYSQL=`which mysql`
4 | count=`$MYSQL -hmysqlhost --default-character-set=utf8 -P3306 -uname -ppwd <]*>//g;/^$/d' tables/$1
46 |
47 | #sed -i 's/[a-zA-Z\.():;><-]//g' tables/$1
48 |
49 |
50 |
--------------------------------------------------------------------------------
/src/port.sh:
--------------------------------------------------------------------------------
1 |
2 | # Connect to MySQL
3 | # -h host address, -u user name, -p password
4 |
5 | # The statement below logs in, selects a database, and then queries the table according to the conditions
6 | MYSQL=`which mysql`
7 |
8 | $MYSQL -hmysqlhost --default-character-set=utf8 -P3306 -uname -ppwd <
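
The README describes the flow as: count the rows, walk the table in pages of 10000, and strip the header line from each page before appending it to a file named after the table. The listing above shows page.sh and port.sh only in part, so the following is a minimal sketch of that flow under stated assumptions, not the author's code. The function names (`page_offsets`, `build_query`, `strip_header`, `dump_table`), the environment variables `DB_HOST`, `DB_USER`, `DB_PASS`, and the `SELECT *` query are all hypothetical.

```shell
#!/bin/sh
# Hedged sketch: hypothetical helpers, not the original page.sh/f.sh/port.sh.

# Print one "start size" pair per page for a table of $1 rows and page size $2.
page_offsets() {
    total=$1
    size=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        echo "$start $size"
        start=$((start + size))
    done
}

# Build the paged query. "SELECT *" is a placeholder; the README says the
# selected fields are edited in page.sh.
build_query() {
    printf 'SELECT * FROM %s LIMIT %s, %s;\n' "$1" "$2" "$3"
}

# Drop the first line (the column header), as f.sh does with sed '1d'.
strip_header() {
    sed '1d'
}

# Fetch every page of table $2 in database $1 and append it to tables/$2.
# $3 is the total row count, $4 the page size (default 10000).
# DB_HOST, DB_USER and DB_PASS are assumed environment variables.
dump_table() {
    db=$1
    table=$2
    total=$3
    size=${4:-10000}
    mkdir -p tables
    page_offsets "$total" "$size" | while read -r start n; do
        # </dev/null keeps mysql from touching the loop's stdin.
        mysql -h"$DB_HOST" -P3306 -u"$DB_USER" -p"$DB_PASS" \
            --default-character-set=utf8 \
            -e "$(build_query "$table" "$start" "$n")" "$db" </dev/null |
            strip_header >> "tables/$table"
    done
}
```

With the variables exported, a call such as `dump_table databaseName tableName 25000` would issue three queries (offsets 0, 10000, 20000); the row count would normally come from a `SELECT COUNT(*)` run beforehand. Note that `LIMIT` without an `ORDER BY` does not guarantee stable page boundaries, so ordering by a key column is the safer choice on a table that is being written to.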