├── .gitignore ├── LICENSE ├── README.org ├── bin └── hyper_dict.sh ├── conf └── hyper_dict.conf ├── demo_data └── hdict_20150818181818 │ ├── dat │ └── idx ├── depend └── libevent-1.4.14b-stable.tar.gz ├── hdb ├── 0 │ └── .gitignore ├── 1 │ └── .gitignore ├── 2 │ └── .gitignore ├── 3 │ └── .gitignore └── link │ └── .gitignore ├── src ├── Makefile ├── common.mk ├── hdict.c ├── hdict.h ├── hyper_dict.c └── index_sort.c ├── tools ├── LushanFileOutputFormat.java └── generate_idx.py └── upload ├── 0 └── .gitignore ├── 1 └── .gitignore ├── 2 └── .gitignore └── 3 └── .gitignore /.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, WEIBO, Inc. 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are 6 | met: 7 | 8 | * Redistributions of source code must retain the above copyright 9 | notice, this list of conditions and the following disclaimer. 10 | 11 | * Redistributions in binary form must reproduce the above 12 | copyright notice, this list of conditions and the following disclaimer 13 | in the documentation and/or other materials provided with the 14 | distribution. 15 | 16 | * Neither the name of the WEIBO nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 21 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 22 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 23 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 24 | OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 25 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 26 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 27 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 28 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 29 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- /README.org: -------------------------------------------------------------------------------- 1 | * 简介 2 | lushan是一个基于memcache协议的key-value数据库, 可以动态挂载多个库, 用来进行静态数据的存储, 适用于存储更新频次较低的数据. 可以作为redis的有效补充, 以节省昂贵的内存服务器成本, lushan不像redis那样需要将数据完全存在内存中, 而是结合使用内存和硬盘, 内存只用来存储索引文件, 硬盘则用来存储真正的数据文件. 另外在写入时不能像redis那样实时写入, 而是通过加载离线的静态数据文件完成(例如用MapReduce生成的数据) 3 | 4 | lushan的每个库由数据文件和索引文件组成, 数据文件命名为dat, 索引文件命名为idx, 目录命名为hdict_xxxxxxxxxxx, 后面是时间编号. 索引文件由一个个key-pos对组成. 其中key就是key-value结构中需要查询的key, 而pos则包含两部分信息, 它的前40位表示value在dat文件中的偏离值off, 后20位表示value的长度length, 通过off和length来共同定位dat文件中的value 5 | 6 | 关于lushan的实现原理可以参考微博推荐博客的这篇博文: http://www.wbrecom.com/?p=453 7 | * 使用方法 8 | ** 源码编译 9 | *** 依赖的第三方源码包 10 | - libevent1.4 11 | *** 步骤 12 | 0. git clone https://github.com/wbrecom/lushan.git 13 | 1. 创建lushan编译环境 14 | #+BEGIN_SRC sh 15 | SOURCE_DIR=/tmp/lushan_environment/ 16 | mkdir "$SOURCE_DIR" 17 | #+END_SRC 18 | 2. 复制lushan项目源码到lushan编译环境目录 19 | #+BEGIN_SRC sh 20 | cp -r lushan "$SOURCE_DIR" 21 | #+END_SRC 22 | 3. 
编译libevent 23 | #+BEGIN_SRC sh 24 | cd "$SOURCE_DIR"lushan/depend/libevent-1.4.14b-stable 25 | ./configure --prefix="$SOURCE_DIR"libevent 26 | make && make install 27 | #+END_SRC 28 | 4. 编译lushan 29 | #+BEGIN_EXAMPLE 30 | cd "$SOURCE_DIR"lushan/src 31 | # 修改common.mk, 将LIBEVENT_HOME的值修改为刚才安装libevent的目录 32 | LIBEVENT_DIR="$SOURCE_DIR"libevent 33 | sed -i "s#LIBEVENT_HOME.*#LIBEVENT_HOME = $LIBEVENT_DIR#g" common.mk 34 | make 35 | #+END_EXAMPLE 36 | ** 部署lushan 37 | 1. 创建lushan部署环境 38 | #+BEGIN_SRC sh 39 | SOURCE_DIR=/tmp/lushan_environment/ 40 | DEPLOY_DIR=/tmp/lushan_deploy/ 41 | mkdir -p $DEPLOY_DIR 42 | #+END_SRC 43 | 2. 创建bin conf hdb logs upload目录 44 | #+BEGIN_SRC sh 45 | cp -r "$SOURCE_DIR"lushan/bin $DEPLOY_DIR 46 | cp -r "$SOURCE_DIR"lushan/conf $DEPLOY_DIR 47 | cp -r "$SOURCE_DIR"lushan/hdb $DEPLOY_DIR 48 | cp -r "$SOURCE_DIR"lushan/logs $DEPLOY_DIR 49 | cp -r "$SOURCE_DIR"lushan/upload $DEPLOY_DIR 50 | #+END_SRC 51 | 3. 替换编译好的lushan程序 52 | #+BEGIN_SRC sh 53 | cp "$SOURCE_DIR"lushan/src/hyper_dict "$DEPLOY_DIR"bin 54 | #+END_SRC 55 | 4. 修改lushan.conf配置文件 56 | #+BEGIN_SRC sh 57 | HDB_DIR="$DEPLOY_DIR"hdb 58 | sed -i "s#HDB_PATH=.*#HDB_PATH=$HDB_DIR#g" "$DEPLOY_DIR"conf/hyper_dict.conf 59 | UPLOAD_DIR="$DEPLOY_DIR"upload 60 | sed -i "s#UPLOAD_PATH=.*#UPLOAD_PATH=$UPLOAD_DIR#g" "$DEPLOY_DIR"conf/hyper_dict.conf 61 | #+END_SRC 62 | 5. 挂载数据 63 | 64 | 将含有数据文件的目录复制到hdb目录下(数据文件目录命名为hdict_$datetime) 65 | 66 | 该目录包含choic.flg done.flg dat idx这4个文件 67 | #+BEGIN_SRC sh 68 | cp -r hdict_20150820131415 "$DEPLOY_DIR"hdb/1 69 | #+END_SRC 70 | 6. 启动lushan 71 | #+BEGIN_SRC sh 72 | bash "$DEPLOY_DIR"bin/hyper_dict.sh 73 | #+END_SRC 74 | 7. 补充: 动态挂载数据 75 | 76 | 将含有数据文件的目录复制到upload目录下(数据文件目录命名为hdict_$datetime) 77 | 78 | 该目录包含done.flg dat idx这3个文件 79 | #+BEGIN_SRC sh 80 | cp -r hdict_20150820142244 "$DEPLOY_DIR"upload/2 81 | touch "$DEPLOY_DIR"upload/2/hdict_20150820142244/done.flg 82 | #+END_SRC 83 | ** 访问示例 84 | 启动后即可通过stats命令查看lushan状态 85 | #+BEGIN_EXAMPLE 86 | echo -ne "stats\r\n" | nc 127.0.0.1 9999 87 | #+END_EXAMPLE 88 | 查询某个key的value(get dbnum-key) 89 | #+BEGIN_EXAMPLE 90 | echo -ne "get 1-123456\r\n" | nc 127.0.0.1 9999 91 | #+END_EXAMPLE 92 | ** 生成数据 93 | 生成符合lushan格式的数据有两种方法 94 | *** 脚本转化 95 | #+BEGIN_EXAMPLE 96 | 0. 有一个原始的数据文件dat, 每行都是key-value结构, 用:分隔, key必须为整数 97 | 1. 通过tools/generate_idx.py脚本生成索引文件 98 | 2. 如果数据文件的key是无序的, 可使用index_sort程序对索引文件排序 99 | 3. 新建hditc_xxxxxxxxx目录, 将idx文件和dat文件放到该目录下 100 | #+END_EXAMPLE 101 | *** MapReduce直接生成 102 | 如果是在hadoop上用MapReduce直接生成数据, 则需要使用tools/LushanFileOutputFormat.java, 指定MapReudce的输出格式类为LushanFileOutputFormat 103 | #+BEGIN_EXAMPLE 104 | job.setOutputFormat(LushanFileOutputFormat.class) 105 | #+END_EXAMPLE 106 | * 支持命令 107 | - info 108 | 109 | 查看库是否挂载成功, 显示每个库的信息, 打开时间, 当前处理的请求量, 库里面有多少条记录 110 | 111 | - stats 112 | 113 | 查看lushan本身的状态, 主要是通信部分的信息(例如: 当前等待处理队列里有多少请求, 有多少请求在等待队列里超时了). 
这些信息, 有利于知道服务是否稳定, 是否性能满足要求 114 | 115 | - randomkey 116 | 117 | 随机取得一个key 118 | 119 | - get 120 | 121 | 取得一个或多个key的value 122 | 123 | - open reopen 124 | 125 | 动态挂载库 126 | 127 | - stats reset(慎用) 128 | 129 | 重置lushan统计信息 130 | 131 | - close 132 | 133 | 关闭客户端连接 134 | -------------------------------------------------------------------------------- /bin/hyper_dict.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | # resolve links - $0 may be a softlink 4 | THIS="$0" 5 | while [ -h "$THIS" ]; do 6 | ls=`ls -ld "$THIS"` 7 | link=`expr "$ls" : '.*-> \(.*\)$'` 8 | if expr "$link" : '.*/.*' > /dev/null; then 9 | THIS="$link" 10 | else 11 | THIS=`dirname "$THIS"`/"$link" 12 | fi 13 | done 14 | 15 | THIS_DIR=`dirname "$THIS"` 16 | HOME=`cd "$THIS_DIR" ; pwd` 17 | 18 | . $HOME/../conf/hyper_dict.conf 19 | 20 | if [ -e $HDB_PATH/hyper_dict.work ];then 21 | exit 0 22 | fi 23 | 24 | touch $HDB_PATH/hyper_dict.work 25 | 26 | if [ -e $HDB_PATH/hyper_dict.stop ];then 27 | rm $HDB_PATH/hyper_dict.stop 28 | fi 29 | 30 | LINK_PATH=$HDB_PATH/link 31 | 32 | while [ 1 -eq 1 ]; 33 | do 34 | if [ -e $HDB_PATH/hyper_dict.stop ];then 35 | rm $HDB_PATH/hyper_dict.stop 36 | echo -n -e "stop\r\n" | nc 127.0.0.1 $PORT >/dev/null 37 | break 38 | fi 39 | echo -n -e "info\r\n" | nc 127.0.0.1 $PORT >/dev/null 40 | if [ $? != 0 ]; 41 | then 42 | if [ -e $HDB_PATH/hyper_dict.init ];then 43 | rm $HDB_PATH/hyper_dict.init 44 | fi 45 | 46 | i=0 47 | while [ $i -lt $HDICT_NUM ]; 48 | do 49 | if [ -L $LINK_PATH/$i ];then 50 | rm -f $LINK_PATH/$i 51 | fi 52 | 53 | choiceid=$(ls -t $HDB_PATH/$i/hdict*/choice.flg 2>/dev/null |head -1| awk -F/ '{print $(NF-1)}') 54 | if [ "$choiceid"x != x ];then 55 | echo $choiceid 56 | ln -s $HDB_PATH/$i/$choiceid $LINK_PATH/$i 57 | echo -n -e "open $LINK_PATH/$i $i\r\n" >> $HDB_PATH/hyper_dict.init 58 | fi 59 | 60 | i=$(expr $i + 1 ) 61 | 62 | done 63 | echo -n -e "end\r\n" >> $HDB_PATH/hyper_dict.init 64 | 65 | $HOME/hyper_dict -t $NUM_THREADS -T $TIMEOUT -c $MAXCONNS -p $PORT -d -v -i $HDB_PATH/hyper_dict.init > $HOME/../logs/hyper_dict.log 2>&1 66 | i=0 67 | while [ 1 -eq 1 ];do 68 | sleep 1 69 | echo -n -e "info\r\n" | nc 127.0.0.1 $PORT >/dev/null 70 | if [ $? -eq 0 ];then 71 | date +"restart hyper_dict %F %T. ok " 72 | if [ -e $HDB_PATH/hyper_dict.init ];then 73 | rm $HDB_PATH/hyper_dict.init 74 | fi 75 | break 76 | fi 77 | i=$(expr $i + 1) 78 | if [ $i -ge 20 ];then 79 | date +"restart hyper_dict %F %T. 
failed " 80 | break 81 | fi 82 | echo "FAIL" 83 | done 84 | 85 | if [ $i -ge 20 ];then 86 | continue 87 | fi 88 | 89 | fi 90 | 91 | i=0 92 | while [ $i -lt $HDICT_NUM ]; 93 | do 94 | dataids=$(ls -t $UPLOAD_PATH/$i/hdict*/done.flg 2>/dev/null | awk -F/ '{print $(NF-1)}' |awk 'BEGIN{ids="";}{if(length(ids)==0){ids=$1;}else {ids=ids" "$1} }END{print ids}') 95 | if [ "$dataids"x != x ];then 96 | first=1 97 | for dataid in $dataids; 98 | do 99 | if [ $first -eq 1 ];then 100 | mv $UPLOAD_PATH/$i/$dataid $HDB_PATH/$i/ 101 | first=0 102 | else 103 | rm -fr $UPLOAD_PATH/$i/$dataid 104 | fi 105 | done 106 | 107 | dataids=$(ls -t $HDB_PATH/$i/hdict*/done.flg 2>/dev/null | awk -F/ '{print $(NF-1)}' |awk 'BEGIN{ids="";}{if(length(ids)==0){ids=$1;}else {ids=ids" "$1} }END{print ids}') 108 | if [ "$dataids"x != x ];then 109 | first=1 110 | for dataid in $dataids; 111 | do 112 | if [ $first -eq 1 ];then 113 | touch $HDB_PATH/$i/$dataid/choice.flg 114 | if [ -L $LINK_PATH/$i ];then 115 | rm $LINK_PATH/$i 116 | fi 117 | ln -s $HDB_PATH/$i/$dataid $LINK_PATH/$i 118 | 119 | k=0 120 | while [ $k -lt 5 ]; 121 | do 122 | status=$(echo -n -e "open $LINK_PATH/$i $i\r\n" | nc 127.0.0.1 $PORT) 123 | expected=$(echo -e "OPENED\r\n") 124 | if [ "$status"x != "$expected"x ]; 125 | then 126 | sleep 3 127 | else 128 | break 129 | fi 130 | k=$(expr $k + 1 ) 131 | done 132 | 133 | first=0 134 | else 135 | if [ -e $HDB_PATH/$i/$dataid/choice.flg ]; then 136 | rm $HDB_PATH/$i/$dataid/choice.flg 137 | fi 138 | rm -fr $HDB_PATH/$i/$dataid 139 | fi 140 | done 141 | fi 142 | fi 143 | i=$(expr $i + 1 ) 144 | 145 | done 146 | 147 | sleep 3 148 | done 149 | 150 | rm $HDB_PATH/hyper_dict.work 151 | 152 | exit 0 153 | -------------------------------------------------------------------------------- /conf/hyper_dict.conf: -------------------------------------------------------------------------------- 1 | PORT=9999 2 | HDB_PATH=/tmp/lushan/hdb 3 | UPLOAD_PATH=/tmp/lushan/upload 4 | HDICT_NUM=32 5 | TIMEOUT=200 6 | MAXCONNS=4096 7 | NUM_THREADS=16 8 | -------------------------------------------------------------------------------- /demo_data/hdict_20150818181818/dat: -------------------------------------------------------------------------------- 1 | 2192792924:2011-06-25``````CHARMINGSS_346_804 2 | 3535170764:651247,13596,303566690383932033 3 | 5666520884:2015-08-12`1`2015-08-17 -------------------------------------------------------------------------------- /demo_data/hdict_20150818181818/idx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/demo_data/hdict_20150818181818/idx -------------------------------------------------------------------------------- /depend/libevent-1.4.14b-stable.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/depend/libevent-1.4.14b-stable.tar.gz -------------------------------------------------------------------------------- /hdb/0/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/hdb/0/.gitignore -------------------------------------------------------------------------------- /hdb/1/.gitignore: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/hdb/1/.gitignore -------------------------------------------------------------------------------- /hdb/2/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/hdb/2/.gitignore -------------------------------------------------------------------------------- /hdb/3/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/hdb/3/.gitignore -------------------------------------------------------------------------------- /hdb/link/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/hdb/link/.gitignore -------------------------------------------------------------------------------- /src/Makefile: -------------------------------------------------------------------------------- 1 | include common.mk 2 | 3 | CFLAGS := -I$(LIBEVENT_HOME)/include 4 | 5 | LIBS := $(LIBS) -L$(LIBEVENT_HOME)/lib -levent -lpthread 6 | 7 | ### Complie Rules. ### 8 | .c.o: 9 | $(CC) $(CFLAGS) $(CPPFLAGS) -c -o $@ $< -fPIC 10 | .cc.o: 11 | $(CXX) $(CFLAGS) $(CPPFLAGS) -c -o $@ $< -fPIC 12 | 13 | TARGET = hyper_dict index_sort 14 | 15 | ### Objects. ### 16 | OBJS1 = hdict.o hyper_dict.o 17 | OBJS2 = index_sort.o 18 | 19 | all : $(TARGET) 20 | 21 | hyper_dict : $(OBJS1) 22 | $(CC) $(CFLAGS) -o $@ $(OBJS1) $(LIBS) 23 | 24 | index_sort : $(OBJS2) 25 | $(CC) $(CFLAGS) -o $@ $(OBJS2) $(LIBS) 26 | 27 | 28 | ### Clean. ### 29 | clean: 30 | rm -f $(TARGET) *.o *.a tags 31 | -------------------------------------------------------------------------------- /src/common.mk: -------------------------------------------------------------------------------- 1 | CFLAGS = -g -Wall 2 | 3 | LIBEVENT_HOME = /usr/local/libevent 4 | -------------------------------------------------------------------------------- /src/hdict.c: -------------------------------------------------------------------------------- 1 | #include "hdict.h" 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #define LOCK(hdb) pthread_mutex_lock(&(hdb)->mutex) 15 | #define UNLK(hdb) pthread_mutex_unlock(&(hdb)->mutex) 16 | 17 | int get_hdict_meta(const char *path, meta_t *hdict_meta) 18 | { 19 | FILE *fp = fopen(path, "r"); 20 | if (fp == NULL) { 21 | return -1; 22 | } 23 | 24 | char line[LINE_SIZE]; 25 | 26 | while (fgets(line, LINE_SIZE, fp)) { 27 | 28 | if (strncasecmp(line, "version", 7) == 0) { 29 | int i = 7; 30 | while((line[i] && line[i] != '\r' && line[i] != '\n') && 31 | (line[i] == ' ' || line[i] == ':' || line[i] == '=')) i++; 32 | if (line[i] && line[i] != '\r' && line[i] != '\n') 33 | hdict_meta->version = atoi(line+i); 34 | } else if (strncasecmp(line, "label", 5) == 0) { 35 | int i = 5; 36 | while((line[i] && line[i] != '\r' && line[i] != '\n') && 37 | (line[i] == ' ' || line[i] == ':' || line[i] == '=')) i++; 38 | int j = 0; 39 | while(line[i] && line[i] != '\r' && line[i] != '\n' && j < 20) { 40 | hdict_meta->label[j++] = line[i++]; 41 | } 42 | hdict_meta->label[j] = '\0'; 43 | } 44 | } 45 | 46 | fclose(fp); 47 | 48 | return 0; 49 | } 50 | 51 | hdict_t* hdict_open(const char *path, int 
*hdict_errnop) 52 | { 53 | FILE *fp = NULL; 54 | char pathname[256]; 55 | 56 | hdict_t *hdict = (hdict_t *)calloc(1, sizeof(hdict[0])); 57 | hdict->path = strdup(path); 58 | 59 | snprintf(pathname, sizeof(pathname), "%s/idx", hdict->path); 60 | 61 | struct stat st; 62 | if (stat(pathname, &st) == -1 || 63 | (st.st_size % sizeof(hdict->idx[0])) != 0) { 64 | *hdict_errnop = EHDICT_BAD_FILE; 65 | goto error; 66 | } 67 | 68 | hdict->idx = (idx_t *)malloc(st.st_size); 69 | if (hdict->idx == NULL) { 70 | *hdict_errnop = EHDICT_OUT_OF_MEMERY; 71 | goto error; 72 | } 73 | 74 | if ((fp = fopen(pathname, "r")) == NULL) { 75 | *hdict_errnop = EHDICT_BAD_FILE; 76 | goto error; 77 | } 78 | 79 | hdict->idx_num = st.st_size / sizeof(hdict->idx[0]); 80 | if (fread(hdict->idx, sizeof(hdict->idx[0]), hdict->idx_num, fp) != hdict->idx_num) { 81 | *hdict_errnop = EHDICT_BAD_FILE; 82 | goto error; 83 | } 84 | 85 | fclose(fp); 86 | fp = NULL; 87 | 88 | snprintf(pathname, sizeof(pathname), "%s/dat", hdict->path); 89 | hdict->fd = open(pathname, O_RDWR); 90 | if (hdict->fd <= 0) { 91 | //*hdict_errnop = EHDICT_BAD_FILE; 92 | *hdict_errnop = EHDICT_OUT_OF_MEMERY; 93 | goto error; 94 | } 95 | hdict->open_time = time(NULL); 96 | 97 | // meta 98 | snprintf(pathname, sizeof(pathname), "%s/meta", hdict->path); 99 | 100 | meta_t *hdict_meta = (meta_t*)calloc(1, sizeof(meta_t)); 101 | if (hdict_meta == NULL) { 102 | *hdict_errnop = EHDICT_OUT_OF_MEMERY; 103 | goto error; 104 | } 105 | 106 | get_hdict_meta(pathname, hdict_meta); 107 | hdict->hdict_meta = hdict_meta; 108 | 109 | return hdict; 110 | 111 | error: 112 | if (fp) fclose(fp); 113 | if (hdict) hdict_close(hdict); 114 | return NULL; 115 | } 116 | 117 | int hdict_seek(hdict_t *hdict, uint64_t key, off_t *off, uint32_t *length) 118 | { 119 | uint32_t low = 0; 120 | uint32_t high = hdict->idx_num; 121 | uint32_t mid; 122 | int hit = 0; 123 | int count = 0; 124 | while (low < high) { 125 | ++count; 126 | mid = (low + high) / 2; 127 | if (hdict->idx[mid].key > key) { 128 | high = mid; 129 | } else if (hdict->idx[mid].key < key) { 130 | low = mid + 1; 131 | } else { 132 | *off = (hdict->idx[mid].pos & 0xFFFFFFFFFF); 133 | *length = (hdict->idx[mid].pos >> 40); 134 | hit = 1; 135 | break; 136 | } 137 | } 138 | 139 | return hit; 140 | } 141 | 142 | int hdict_randomkey(hdict_t *hdict, uint64_t *key) 143 | { 144 | if (hdict->idx_num == 0) 145 | return -1; 146 | int i = rand() % hdict->idx_num; 147 | *key = hdict->idx[i].key; 148 | return 0; 149 | } 150 | 151 | int hdict_read(hdict_t *hdict, char *buf, uint32_t length, off_t off) 152 | { 153 | return pread(hdict->fd, buf, length, off); 154 | } 155 | 156 | void hdict_close(hdict_t *hdict) 157 | { 158 | if (hdict->fd > 0) close(hdict->fd); 159 | if (hdict->idx) free(hdict->idx); 160 | if (hdict->hdict_meta) free(hdict->hdict_meta); 161 | free(hdict->path); 162 | free(hdict); 163 | } 164 | 165 | void *hdb_mgr(void *arg) 166 | { 167 | hdb_t *hdb = (hdb_t *)arg; 168 | 169 | for (;;) { 170 | if (TAILQ_FIRST(&hdb->close_list)) { 171 | hdict_t *hdict, *next; 172 | hdict_t *hdicts[100]; 173 | int i, k = 0; 174 | 175 | LOCK(hdb); 176 | for (hdict = TAILQ_FIRST(&hdb->close_list); hdict; hdict = next) { 177 | next = TAILQ_NEXT(hdict, link); 178 | if (hdict->ref == 0) { 179 | hdicts[k++] = hdict; 180 | TAILQ_REMOVE(&hdb->close_list, hdict, link); 181 | hdb->num_close--; 182 | if (k == 100) 183 | break; 184 | } 185 | } 186 | UNLK(hdb); 187 | 188 | for (i = 0; i < k; ++i) { 189 | hdict = hdicts[i]; 190 | hdict_close(hdict); 191 | } 192 | } 
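        /* Deferred reclamation (descriptive note, not original source): hdicts parked on
         * close_list are freed only after their reference count has dropped to zero;
         * the manager thread re-scans the list once per second. */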
193 | sleep(1); 194 | } 195 | return NULL; 196 | } 197 | 198 | int hdb_init(hdb_t *hdb) 199 | { 200 | if (pthread_mutex_init(&hdb->mutex, NULL)) return -1; 201 | TAILQ_INIT(&hdb->open_list); 202 | TAILQ_INIT(&hdb->close_list); 203 | hdb->num_open = 0; 204 | hdb->num_close = 0; 205 | 206 | int i; 207 | LIST_HEAD(, hdict_t) htab[HTAB_SIZE]; 208 | for (i = 0; i < HTAB_SIZE; i++) { 209 | LIST_INIT(htab + i); 210 | } 211 | return 0; 212 | } 213 | 214 | int hdb_reopen(hdb_t *hdb, const char *hdict_path, uint32_t hdid) 215 | { 216 | hdict_t *hd, *next; 217 | uint32_t hash; 218 | char rpath[1024]; 219 | realpath(hdict_path, rpath); 220 | 221 | int hdict_errno = 0; 222 | hdict_t *hdict = hdict_open(rpath, &hdict_errno); 223 | if (hdict == NULL) return hdict_errno; 224 | 225 | LOCK(hdb); 226 | for (hd = TAILQ_FIRST(&hdb->open_list); hd; hd = next) { 227 | next = TAILQ_NEXT(hd, link); 228 | if (hd->hdid == hdid) { 229 | LIST_REMOVE(hd, h_link); 230 | TAILQ_REMOVE(&hdb->open_list, hd, link); 231 | hdb->num_open--; 232 | TAILQ_INSERT_TAIL(&hdb->close_list, hd, link); 233 | hdb->num_close++; 234 | break; 235 | } 236 | } 237 | hdict->hdid = hdid; 238 | TAILQ_INSERT_TAIL(&hdb->open_list, hdict, link); 239 | hash = HASH(hdict->hdid); 240 | LIST_INSERT_HEAD(&hdb->htab[hash], hdict, h_link); 241 | hdb->num_open++; 242 | UNLK(hdb); 243 | 244 | return 0; 245 | } 246 | 247 | int hdb_close(hdb_t *hdb, uint32_t hdid) 248 | { 249 | int found = 0; 250 | hdict_t *hd, *next; 251 | LOCK(hdb); 252 | for (hd = TAILQ_FIRST(&hdb->open_list); hd; hd = next) { 253 | next = TAILQ_NEXT(hd, link); 254 | if (hd->hdid == hdid) { 255 | LIST_REMOVE(hd, h_link); 256 | TAILQ_REMOVE(&hdb->open_list, hd, link); 257 | hdb->num_open--; 258 | TAILQ_INSERT_TAIL(&hdb->close_list, hd, link); 259 | hdb->num_close++; 260 | found = 1; 261 | break; 262 | } 263 | } 264 | UNLK(hdb); 265 | 266 | return found; 267 | } 268 | 269 | int hdb_info(hdb_t *hdb, char *buf, int size) 270 | { 271 | int len = 0; 272 | len += snprintf(buf+len, size-len, "%2s %20s %5s %3s %9s %8s %13s %s\n", 273 | "id", "label", "state", "ref", "num_qry", "idx_num", "open_time", "path"); 274 | if (len < size) 275 | len += snprintf(buf+len, size-len, "----------------------------------------------------------------\n"); 276 | int pass, k; 277 | hdict_t *hdict; 278 | LOCK(hdb); 279 | for (pass = 0; pass < 2; ++pass) { 280 | const char *state; 281 | struct hdict_list_t *hlist; 282 | switch (pass) { 283 | case 0: 284 | state = "OPEN"; 285 | hlist = &hdb->open_list; 286 | break; 287 | case 1: 288 | state = "CLOSE"; 289 | hlist = &hdb->close_list; 290 | break; 291 | default: 292 | state = NULL; 293 | hlist = NULL; 294 | } 295 | 296 | k = 0; 297 | for (hdict = TAILQ_FIRST(hlist); hdict; hdict = TAILQ_NEXT(hdict, link)) { 298 | ++k; 299 | if (len < size) { 300 | struct tm tm; 301 | localtime_r(&hdict->open_time, &tm); 302 | len += snprintf(buf+len, size-len, "%2d %20s %5s %2d %10d %8d %02d%02d%02d-%02d%02d%02d %s\n", 303 | hdict->hdid, 304 | hdict->hdict_meta->label, 305 | state, 306 | hdict->ref, 307 | hdict->num_qry, 308 | hdict->idx_num, 309 | tm.tm_year - 100, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec, 310 | hdict->path); 311 | } 312 | } 313 | } 314 | UNLK(hdb); 315 | return len; 316 | } 317 | 318 | hdict_t *hdb_ref(hdb_t *hdb, uint32_t hdid) 319 | { 320 | hdict_t *hdict = NULL; 321 | hdict_t *hd; 322 | LOCK(hdb); 323 | uint32_t hash = HASH(hdid); 324 | for (hd = LIST_FIRST(&hdb->htab[hash]); hd; hd = LIST_NEXT(hd, h_link)) { 325 | if (hd->hdid == hdid) { 326 | 
hd->ref++; 327 | hdict = hd; 328 | break; 329 | } 330 | } 331 | UNLK(hdb); 332 | return hdict; 333 | } 334 | 335 | int hdb_deref(hdb_t *hdb, hdict_t *hdict) 336 | { 337 | LOCK(hdb); 338 | hdict->ref--; 339 | UNLK(hdb); 340 | return 0; 341 | } 342 | -------------------------------------------------------------------------------- /src/hdict.h: -------------------------------------------------------------------------------- 1 | #ifndef HDICT_H 2 | #define HDICT_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #define LINE_SIZE 1024 11 | #define BUF_SIZE 255 12 | 13 | typedef struct { 14 | uint64_t key; 15 | uint64_t pos; 16 | } idx_t; 17 | 18 | typedef struct { 19 | uint32_t version; 20 | char label[21]; 21 | } meta_t; 22 | 23 | typedef struct hdict_t hdict_t; 24 | struct hdict_t { 25 | TAILQ_ENTRY(hdict_t) link; 26 | LIST_ENTRY(hdict_t) h_link; 27 | char *path; 28 | uint32_t idx_num; 29 | idx_t *idx; 30 | int fd; 31 | time_t open_time; 32 | uint32_t num_qry; 33 | uint32_t ref; 34 | uint32_t hdid; 35 | meta_t *hdict_meta; 36 | }; 37 | 38 | TAILQ_HEAD(hdict_list_t, hdict_t); 39 | 40 | #define HTAB_SIZE 1024 41 | #define HASH(dict_id) ((dict_id) % HTAB_SIZE) 42 | 43 | typedef struct hdb_t hdb_t; 44 | struct hdb_t { 45 | pthread_mutex_t mutex; 46 | struct hdict_list_t open_list; 47 | struct hdict_list_t close_list; 48 | int num_open; 49 | int num_close; 50 | LIST_HEAD(, hdict_t) htab[HTAB_SIZE]; 51 | }; 52 | 53 | #define EHDICT_OUT_OF_MEMERY -1 54 | #define EHDICT_BAD_FILE -2 55 | 56 | hdict_t* hdict_open(const char *path, int *hdict_errnop); 57 | 58 | #define HDICT_VALUE_LENGTH_MAX 204800 59 | int hdict_seek(hdict_t *hdict, uint64_t key, off_t *off, uint32_t *length); 60 | 61 | int hdict_randomkey(hdict_t *hdict, uint64_t *key); 62 | 63 | int hdict_read(hdict_t *hdict, char *buf, uint32_t length, off_t off); 64 | 65 | void hdict_close(hdict_t *hdict); 66 | 67 | void *hdb_mgr(void *arg); 68 | 69 | int hdb_init(hdb_t *hdb); 70 | 71 | int hdb_reopen(hdb_t *hdb, const char *hdict_path, uint32_t hdid); 72 | 73 | int hdb_close(hdb_t *hdb, uint32_t hdid); 74 | 75 | int hdb_info(hdb_t *hdb, char *buf, int size); 76 | 77 | hdict_t *hdb_ref(hdb_t *hdb, uint32_t hdid); 78 | 79 | int hdb_deref(hdb_t *hdb, hdict_t *hdict); 80 | 81 | #endif 82 | -------------------------------------------------------------------------------- /src/hyper_dict.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2015, WEIBO, Inc. */ 2 | /* All rights reserved. */ 3 | 4 | /* Redistribution and use in source and binary forms, with or without */ 5 | /* modification, are permitted provided that the following conditions are */ 6 | /* met: */ 7 | 8 | /* * Redistributions of source code must retain the above copyright */ 9 | /* notice, this list of conditions and the following disclaimer. */ 10 | 11 | /* * Redistributions in binary form must reproduce the above */ 12 | /* copyright notice, this list of conditions and the following disclaimer */ 13 | /* in the documentation and/or other materials provided with the */ 14 | /* distribution. */ 15 | 16 | /* * Neither the name of the WEIBO nor the names of its */ 17 | /* contributors may be used to endorse or promote products derived from */ 18 | /* this software without specific prior written permission. 
*/ 19 | 20 | /* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS */ 21 | /* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT */ 22 | /* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR */ 23 | /* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT */ 24 | /* OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, */ 25 | /* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT */ 26 | /* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */ 27 | /* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY */ 28 | /* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT */ 29 | /* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE */ 30 | /* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 31 | 32 | /* 33 | * memcached - memory caching daemon 34 | * 35 | * http://www.memcached.org/ 36 | * 37 | * Copyright 2003 Danga Interactive, Inc. All rights reserved. 38 | * 39 | * Use and distribution licensed under the BSD license. See 40 | * the LICENSE file for full text. 41 | * 42 | * Authors: 43 | * Anatoly Vorobey 44 | * Brad Fitzpatrick 45 | */ 46 | 47 | #include 48 | #include 49 | #include 50 | #include 51 | #include 52 | #include 53 | #include 54 | #include 55 | #include 56 | #include 57 | #include 58 | #include 59 | #include 60 | #include 61 | #include 62 | #include 63 | #include 64 | #include 65 | #include 66 | 67 | 68 | #include "hdict.h" 69 | 70 | #define mytimesub(a, b, result) do { \ 71 | (result)->tv_sec = (a)->tv_sec - (b)->tv_sec; \ 72 | (result)->tv_usec = (a)->tv_usec - (b)->tv_usec; \ 73 | if ((result)->tv_usec < 0) { \ 74 | --(result)->tv_sec; \ 75 | (result)->tv_usec += 1000000; \ 76 | } \ 77 | } while (0) 78 | 79 | struct settings { 80 | int maxconns; 81 | int port; 82 | struct in_addr interf; 83 | int num_threads; 84 | int verbose; 85 | int timeout; 86 | }; 87 | 88 | static struct settings settings; 89 | 90 | static void settings_init(void) { 91 | settings.port = 9764; 92 | settings.interf.s_addr = htonl(INADDR_ANY); 93 | settings.maxconns = 8192; 94 | settings.num_threads = 4; 95 | settings.verbose = 0; 96 | settings.timeout = 0; 97 | } 98 | 99 | struct stats { 100 | uint32_t curr_conns; 101 | uint32_t conn_structs; 102 | uint32_t get_cmds; 103 | uint32_t get_hits; 104 | uint32_t get_misses; 105 | time_t started; 106 | uint32_t timeouts; 107 | uint32_t ialloc_failed; 108 | }; 109 | 110 | static struct stats stats; 111 | static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER; 112 | #define STATS_LOCK() pthread_mutex_lock(&stats_lock); 113 | #define STATS_UNLOCK() pthread_mutex_unlock(&stats_lock); 114 | 115 | static void stats_init() 116 | { 117 | stats.curr_conns = stats.conn_structs = 0; 118 | stats.get_cmds = stats.get_hits = stats.get_misses = 0; 119 | stats.ialloc_failed = 0; 120 | stats.started = time(NULL); 121 | return; 122 | } 123 | 124 | enum conn_states { 125 | conn_listening, /* the socket which listens for connections */ 126 | conn_read, /* reading in a command line */ 127 | conn_write, /* writing out a simple response */ 128 | conn_closing, /* closing this connection */ 129 | }; 130 | 131 | typedef struct _conn { 132 | int sfd; 133 | int state; 134 | struct event event; 135 | short ev_flags; 136 | short which; /* which events were just triggered */ 137 | 138 | char *rbuf; /* buffer to read commands into */ 139 | char *rcurr; /* but if we parsed some already, this is where we 
stopped */ 140 | int rsize; /* total allocated size of rbuf */ 141 | int rbytes; /* how much data, starting from rcur, do we have unparsed */ 142 | 143 | char *wbuf; 144 | char *wcurr; 145 | int wsize; 146 | int wbytes; 147 | int write_and_go; 148 | 149 | char *cmd; 150 | struct timeval tv; 151 | struct _conn *next; 152 | } conn; 153 | 154 | struct conn_queue { 155 | conn *head; 156 | conn *tail; 157 | uint32_t length; 158 | int wakeup; 159 | pthread_mutex_t lock; 160 | pthread_cond_t cond; 161 | }; 162 | 163 | static struct conn_queue REQ; 164 | static struct conn_queue RSP; 165 | static pthread_mutex_t notify_lock = PTHREAD_MUTEX_INITIALIZER; 166 | static int notify_receive_fd; 167 | static int notify_send_fd; 168 | 169 | static void cq_init(struct conn_queue *cq, int wakeup) { 170 | pthread_mutex_init(&cq->lock, NULL); 171 | pthread_cond_init(&cq->cond, NULL); 172 | cq->length = 0; 173 | cq->wakeup = wakeup; 174 | cq->head = NULL; 175 | cq->tail = NULL; 176 | } 177 | 178 | static void cq_push(struct conn_queue *cq, conn *item) { 179 | item->next = NULL; 180 | 181 | pthread_mutex_lock(&cq->lock); 182 | if (NULL == cq->tail) 183 | cq->head = item; 184 | else 185 | cq->tail->next = item; 186 | cq->tail = item; 187 | cq->length++; 188 | if (cq->wakeup) pthread_cond_signal(&cq->cond); 189 | pthread_mutex_unlock(&cq->lock); 190 | } 191 | 192 | static conn *cq_pop(struct conn_queue *cq) { 193 | conn *item; 194 | 195 | assert(cq->wakeup); 196 | pthread_mutex_lock(&cq->lock); 197 | while (NULL == cq->head) 198 | pthread_cond_wait(&cq->cond, &cq->lock); 199 | item = cq->head; 200 | cq->head = item->next; 201 | if (NULL == cq->head) 202 | cq->tail = NULL; 203 | cq->length--; 204 | pthread_mutex_unlock(&cq->lock); 205 | 206 | return item; 207 | } 208 | 209 | static conn *cq_peek(struct conn_queue *cq) { 210 | conn *item; 211 | 212 | pthread_mutex_lock(&cq->lock); 213 | item = cq->head; 214 | if (NULL != item) { 215 | cq->head = item->next; 216 | if (NULL == cq->head) 217 | cq->tail = NULL; 218 | cq->length--; 219 | } 220 | pthread_mutex_unlock(&cq->lock); 221 | 222 | return item; 223 | } 224 | 225 | static uint32_t cq_length(struct conn_queue *cq) { 226 | int length; 227 | pthread_mutex_lock(&cq->lock); 228 | length = cq->length; 229 | pthread_mutex_unlock(&cq->lock); 230 | return length; 231 | } 232 | 233 | void event_handler(const int fd, const short which, void *arg); 234 | void out_string(conn *c, char *str); 235 | conn *conn_new(int sfd, int init_state, int event_flags, struct event_base *base); 236 | 237 | static conn **freeconns; 238 | static int freetotal; 239 | static int freecurr; 240 | 241 | static void conn_init() 242 | { 243 | freetotal = 200; 244 | freecurr = 0; 245 | freeconns = (conn **)malloc(sizeof (conn *)*freetotal); 246 | return; 247 | } 248 | 249 | #define DATA_BUFFER_SIZE 8192 250 | 251 | static void conn_close(conn *c) { 252 | event_del(&c->event); 253 | 254 | if (settings.verbose > 1) 255 | fprintf(stderr, "<%d connection closed.\n", c->sfd); 256 | 257 | close(c->sfd); 258 | 259 | if (freecurr < freetotal) { 260 | freeconns[freecurr++] = c; 261 | } else { 262 | conn **new_freeconns = realloc(freeconns, sizeof(conn *)*freetotal*2); 263 | if (new_freeconns) { 264 | freetotal *= 2; 265 | freeconns = new_freeconns; 266 | freeconns[freecurr++] = c; 267 | } else { 268 | if (settings.verbose > 0) 269 | fprintf(stderr, "Couldn't realloc freeconns\n"); 270 | free(c->rbuf); 271 | free(c->wbuf); 272 | free(c); 273 | 274 | STATS_LOCK(); 275 | stats.conn_structs--; 276 | STATS_UNLOCK(); 
277 | } 278 | } 279 | 280 | STATS_LOCK(); 281 | stats.curr_conns--; 282 | STATS_UNLOCK(); 283 | } 284 | 285 | static int update_event(conn *c, const int new_flags) { 286 | assert(c != NULL); 287 | 288 | struct event_base *base = c->event.ev_base; 289 | if (c->ev_flags == new_flags) 290 | return 1; 291 | if (c->ev_flags != 0) { 292 | if (event_del(&c->event) == -1) return 0; 293 | } 294 | event_set(&c->event, c->sfd, new_flags, event_handler, (void *)c); 295 | event_base_set(base, &c->event); 296 | c->ev_flags = new_flags; 297 | if (event_add(&c->event, 0) == -1) return 0; 298 | return 1; 299 | } 300 | 301 | #define FORWARD_OUT 1 302 | #define STAY_IN 2 303 | 304 | static int process_command(conn *c, char *command) { 305 | 306 | if (strcmp(command, "quit") == 0) { 307 | c->state = conn_closing; 308 | return STAY_IN; 309 | } else if (strcmp(command, "stop") == 0) { 310 | exit(0); 311 | } else if (strcmp(command, "version") == 0) { 312 | // compat with old php Memcache client 313 | out_string(c, "VERSION 1.1.13"); 314 | return STAY_IN; 315 | } else { 316 | if (event_del(&c->event) == -1) { 317 | out_string(c, "SERVER_ERROR can't forward"); 318 | return STAY_IN; 319 | } 320 | gettimeofday(&c->tv, NULL); 321 | c->ev_flags = 0; 322 | c->cmd = command; 323 | cq_push(&REQ, c); 324 | return FORWARD_OUT; 325 | } 326 | } 327 | 328 | static int try_read_command(conn *c) { 329 | char *el, *cont; 330 | 331 | if (!c->rbytes) 332 | return 0; 333 | el = (char *)memchr(c->rcurr, '\n', c->rbytes); 334 | if (!el) 335 | return 0; 336 | cont = el + 1; 337 | if (el - c->rcurr > 1 && *(el - 1) == '\r') { 338 | el--; 339 | } 340 | *el = '\0'; 341 | 342 | int res = process_command(c, c->rcurr); 343 | 344 | c->rbytes -= (cont - c->rcurr); 345 | c->rcurr = cont; 346 | 347 | return res; 348 | } 349 | 350 | void pack_string(conn *c, char *str) { 351 | int len; 352 | 353 | len = strlen(str); 354 | if (len + 2 > c->wsize) { 355 | /* ought to be always enough. 
just fail for simplicity */ 356 | str = "SERVER_ERROR output line too long"; 357 | len = strlen(str); 358 | } 359 | 360 | strcpy(c->wbuf, str); 361 | strcat(c->wbuf, "\r\n"); 362 | c->wbytes = len + 2; 363 | 364 | return; 365 | } 366 | 367 | void out_string(conn *c, char *str) { 368 | pack_string(c, str); 369 | c->wcurr = c->wbuf; 370 | c->state = conn_write; 371 | c->write_and_go = conn_read; 372 | return; 373 | } 374 | 375 | 376 | static int try_read_network(conn *c) { 377 | int gotdata = 0; 378 | int res; 379 | 380 | if (c->rcurr != c->rbuf) { 381 | if (c->rbytes != 0) /* otherwise there's nothing to copy */ 382 | memmove(c->rbuf, c->rcurr, c->rbytes); 383 | c->rcurr = c->rbuf; 384 | } 385 | 386 | while (1) { 387 | if (c->rbytes >= c->rsize) { 388 | char *new_rbuf = (char *)realloc(c->rbuf, c->rsize*2); 389 | if (!new_rbuf) { 390 | if (settings.verbose > 0) 391 | fprintf(stderr, "Couldn't realloc input buffer\n"); 392 | c->rbytes = 0; /* ignore what we read */ 393 | out_string(c, "SERVER_ERROR out of memory"); 394 | c->write_and_go = conn_closing; 395 | return 1; 396 | } 397 | c->rcurr = c->rbuf = new_rbuf; 398 | c->rsize *= 2; 399 | } 400 | int avail = c->rsize - c->rbytes; 401 | res = read(c->sfd, c->rbuf + c->rbytes, avail); 402 | if (res > 0) { 403 | gotdata = 1; 404 | c->rbytes += res; 405 | if (res == avail) { 406 | continue; 407 | } else { 408 | break; 409 | } 410 | } 411 | if (res == 0) { 412 | /* connection closed */ 413 | c->state = conn_closing; 414 | return 1; 415 | } 416 | 417 | if (res == -1) { 418 | if (errno == EAGAIN || errno == EWOULDBLOCK) break; 419 | else { 420 | c->state = conn_closing; 421 | return 1; 422 | } 423 | } 424 | } 425 | return gotdata; 426 | } 427 | 428 | static void drive_machine(conn *c) { 429 | int stop = 0; 430 | int sfd, flags = 1; 431 | socklen_t addrlen; 432 | struct sockaddr_in addr; 433 | conn *newc; 434 | int res; 435 | 436 | assert(c != NULL); 437 | 438 | while (!stop) { 439 | 440 | switch(c->state) { 441 | case conn_listening: 442 | addrlen = sizeof(addr); 443 | if ((sfd = accept(c->sfd, (struct sockaddr *)&addr, &addrlen)) == -1) { 444 | if (errno == EAGAIN || errno == EWOULDBLOCK) { 445 | /* these are transient, so don't log anything */ 446 | stop = 1; 447 | } else if (errno == EMFILE) { 448 | if (settings.verbose > 0) 449 | fprintf(stderr, "Too many open connections\n"); 450 | stop = 1; 451 | } else { 452 | perror("accept()"); 453 | stop = 1; 454 | } 455 | break; 456 | } 457 | 458 | if ((flags = fcntl(sfd, F_GETFL, 0)) < 0 || 459 | fcntl(sfd, F_SETFL, flags | O_NONBLOCK) < 0) { 460 | perror("setting O_NONBLOCK"); 461 | close(sfd); 462 | break; 463 | } 464 | newc = conn_new(sfd, conn_read, EV_READ | EV_PERSIST, c->event.ev_base); 465 | if (!newc) { 466 | if (settings.verbose > 0) 467 | fprintf(stderr, "couldn't create new connection\n"); 468 | close(sfd); 469 | } 470 | break; 471 | 472 | case conn_read: 473 | res = try_read_command(c); 474 | if (res == STAY_IN) { 475 | continue; 476 | } else if (res == FORWARD_OUT) { 477 | stop = 1; 478 | break; 479 | } 480 | if (try_read_network(c) != 0) { 481 | continue; 482 | } 483 | /* we have no command line and no data to read from network */ 484 | if (!update_event(c, EV_READ | EV_PERSIST)) { 485 | if (settings.verbose > 0) 486 | fprintf(stderr, "Couldn't update event\n"); 487 | c->state = conn_closing; 488 | break; 489 | } 490 | stop = 1; 491 | break; 492 | 493 | case conn_write: 494 | if (c->wbytes == 0) { 495 | c->state = c->write_and_go; 496 | break; 497 | } 498 | 499 | res = write(c->sfd, 
c->wcurr, c->wbytes); 500 | if (res > 0) { 501 | c->wcurr += res; 502 | c->wbytes -= res; 503 | break; 504 | } 505 | if (res == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) { 506 | if (!update_event(c, EV_WRITE | EV_PERSIST)) { 507 | if (settings.verbose > 0) 508 | fprintf(stderr, "Couldn't update event\n"); 509 | c->state = conn_closing; 510 | break; 511 | } 512 | stop = 1; 513 | break; 514 | } 515 | c->state = conn_closing; 516 | break; 517 | 518 | case conn_closing: 519 | conn_close(c); 520 | stop = 1; 521 | break; 522 | } 523 | } 524 | 525 | return; 526 | } 527 | 528 | 529 | void event_handler(const int fd, const short which, void *arg) { 530 | conn *c; 531 | 532 | c = (conn *)arg; 533 | assert(c != NULL); 534 | 535 | c->which = which; 536 | 537 | if (fd != c->sfd) { 538 | if (settings.verbose > 0) 539 | fprintf(stderr, "Catastrophic: event fd doesn't match conn fd!\n"); 540 | conn_close(c); 541 | return; 542 | } 543 | 544 | drive_machine(c); 545 | 546 | return; 547 | } 548 | 549 | void notify_handler(const int fd, const short which, void *arg) { 550 | if (fd != notify_receive_fd) { 551 | if (settings.verbose > 0) 552 | fprintf(stderr, "Catastrophic: event fd doesn't match conn fd!\n"); 553 | return; 554 | } 555 | char buf[1]; 556 | if (read(fd, buf, 1) != 1) { 557 | if (settings.verbose > 0) 558 | fprintf(stderr, "Can't read from event pipe\n"); 559 | } 560 | 561 | conn *c = cq_peek(&RSP); 562 | if (c != NULL) { 563 | c->wcurr = c->wbuf; 564 | c->state = conn_write; 565 | c->write_and_go = conn_read; 566 | 567 | drive_machine(c); 568 | } 569 | 570 | return; 571 | } 572 | 573 | conn *conn_new(int sfd, int init_state, int event_flags, struct event_base *base) { 574 | conn *c; 575 | 576 | if (freecurr > 0) { 577 | c = freeconns[--freecurr]; 578 | } else { 579 | if (!(c = (conn *)malloc(sizeof(conn)))) { 580 | perror("malloc()"); 581 | return 0; 582 | } 583 | 584 | c->rbuf = c->wbuf = 0; 585 | c->rbuf = (char *) malloc(DATA_BUFFER_SIZE); 586 | c->wbuf = (char *) malloc(DATA_BUFFER_SIZE); 587 | 588 | if (c->rbuf == 0 || c->wbuf == 0) { 589 | if (c->rbuf != 0) free(c->rbuf); 590 | if (c->wbuf != 0) free(c->wbuf); 591 | free(c); 592 | perror("malloc()"); 593 | return 0; 594 | } 595 | c->rsize = c->wsize = DATA_BUFFER_SIZE; 596 | c->rcurr = c->rbuf; 597 | 598 | STATS_LOCK(); 599 | stats.conn_structs++; 600 | STATS_UNLOCK(); 601 | } 602 | 603 | if (settings.verbose > 1) { 604 | if (init_state == conn_listening) 605 | fprintf(stderr, "<%d server listening\n", sfd); 606 | else 607 | fprintf(stderr, "<%d new client connection\n", sfd); 608 | } 609 | 610 | c->sfd = sfd; 611 | c->state = init_state; 612 | c->rbytes = c->wbytes = 0; 613 | c->wcurr = c->wbuf; 614 | c->write_and_go = conn_read; 615 | event_set(&c->event, sfd, event_flags, event_handler, (void *)c); 616 | event_base_set(base, &c->event); 617 | c->ev_flags = event_flags; 618 | if (event_add(&c->event, 0) == -1) { 619 | if (freecurr < freetotal) { 620 | freeconns[freecurr++] = c; 621 | } else { 622 | free (c->rbuf); 623 | free (c->wbuf); 624 | free (c); 625 | 626 | STATS_LOCK(); 627 | stats.conn_structs--; 628 | STATS_UNLOCK(); 629 | } 630 | return 0; 631 | } 632 | 633 | STATS_LOCK(); 634 | stats.curr_conns++; 635 | STATS_UNLOCK(); 636 | return c; 637 | } 638 | 639 | static int l_socket = 0; 640 | 641 | static int new_socket() { 642 | int sfd; 643 | int flags; 644 | 645 | if ((sfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) { 646 | perror("socket()"); 647 | return -1; 648 | } 649 | 650 | if ((flags = fcntl(sfd, F_GETFL, 0)) < 0 || 651 
| fcntl(sfd, F_SETFL, flags | O_NONBLOCK) < 0) { 652 | perror("setting O_NONBLOCK"); 653 | close(sfd); 654 | return -1; 655 | } 656 | return sfd; 657 | } 658 | 659 | static int server_socket(const int port) { 660 | int sfd; 661 | struct linger ling = {0, 0}; 662 | struct sockaddr_in addr; 663 | int flags = 1; 664 | 665 | if ((sfd = new_socket()) == -1) { 666 | return -1; 667 | } 668 | 669 | setsockopt(sfd, SOL_SOCKET, SO_REUSEADDR, (void *)&flags, sizeof(flags)); 670 | setsockopt(sfd, SOL_SOCKET, SO_KEEPALIVE, (void *)&flags, sizeof(flags)); 671 | setsockopt(sfd, SOL_SOCKET, SO_LINGER, (void *)&ling, sizeof(ling)); 672 | setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, (void *)&flags, sizeof(flags)); 673 | 674 | memset(&addr, 0, sizeof(addr)); 675 | 676 | addr.sin_family = AF_INET; 677 | addr.sin_port = htons(port); 678 | addr.sin_addr = settings.interf; 679 | if (bind(sfd, (struct sockaddr *)&addr, sizeof(addr)) == -1) { 680 | perror("bind()"); 681 | close(sfd); 682 | return -1; 683 | } 684 | 685 | if (listen(sfd, 1024) == -1) { 686 | perror("listen()"); 687 | close(sfd); 688 | return -1; 689 | } 690 | return sfd; 691 | } 692 | 693 | static void* worker(void *arg) 694 | { 695 | pthread_detach(pthread_self()); 696 | 697 | hdb_t *hdb = (hdb_t *)arg; 698 | hdict_t *hdict; 699 | conn *c; 700 | while(1) { 701 | c = cq_pop(&REQ); 702 | 703 | struct timeval now; 704 | gettimeofday(&now, NULL); 705 | 706 | struct timeval tv; 707 | mytimesub(&now, &c->tv, &tv); 708 | 709 | if (settings.timeout > 0 && tv.tv_sec * 1000 + tv.tv_usec/1000 > settings.timeout) { 710 | pack_string(c, "SERVER_ERROR timeout"); 711 | STATS_LOCK(); 712 | stats.timeouts++; 713 | STATS_UNLOCK(); 714 | } else if (strcmp(c->cmd, "stats") == 0) { 715 | char temp[1024]; 716 | pid_t pid = getpid(); 717 | char *pos = temp; 718 | 719 | uint32_t length = cq_length(&REQ); 720 | STATS_LOCK(); 721 | pos += sprintf(pos, "STAT pid %u\r\n", pid); 722 | pos += sprintf(pos, "STAT uptime %lld\r\n", (long long)stats.started); 723 | pos += sprintf(pos, "STAT curr_connections %u\r\n", stats.curr_conns - 1); /* ignore listening conn */ 724 | pos += sprintf(pos, "STAT connection_structures %u\r\n", stats.conn_structs); 725 | pos += sprintf(pos, "STAT cmd_get %u\r\n", stats.get_cmds); 726 | pos += sprintf(pos, "STAT get_hits %u\r\n", stats.get_hits); 727 | pos += sprintf(pos, "STAT get_misses %u\r\n", stats.get_misses); 728 | pos += sprintf(pos, "STAT threads %u\r\n", settings.num_threads); 729 | pos += sprintf(pos, "STAT timeouts %u\r\n", stats.timeouts); 730 | pos += sprintf(pos, "STAT waiting_requests %u\r\n", length); 731 | pos += sprintf(pos, "STAT ialloc_failed %u\r\n", stats.ialloc_failed); 732 | pos += sprintf(pos, "END"); 733 | STATS_UNLOCK(); 734 | pack_string(c, temp); 735 | } else if (strcmp(c->cmd, "stats reset") == 0) { 736 | STATS_LOCK(); 737 | stats.get_cmds = 0; 738 | stats.get_hits = 0; 739 | stats.get_misses = 0; 740 | stats.timeouts = 0; 741 | stats.ialloc_failed = 0; 742 | STATS_UNLOCK(); 743 | pack_string(c, "RESET"); 744 | } else if (strncmp(c->cmd, "open ", 5) == 0 || 745 | strncmp(c->cmd, "reopen ", 7) == 0) { 746 | 747 | char path[256]; 748 | uint32_t hdid; 749 | int res = sscanf(c->cmd, "%*s %255s %u\n", path, &hdid); 750 | if (res != 2 || strlen(path) == 0) { 751 | pack_string(c, "CLIENT_ERROR bad command line format"); 752 | } else { 753 | int status = hdb_reopen(hdb, path, hdid); 754 | if (status == 0) { 755 | pack_string(c, "OPENED"); 756 | } else { 757 | if (settings.verbose > 0) 758 | fprintf(stderr, "failed to open %s on 
%d, return %d\n", 759 | path, hdid, status); 760 | if (status == EHDICT_OUT_OF_MEMERY) { 761 | STATS_LOCK(); 762 | stats.ialloc_failed++; 763 | STATS_UNLOCK(); 764 | } 765 | pack_string(c, "SERVER_ERROR open failed"); 766 | } 767 | } 768 | } else if (strncmp(c->cmd, "close ", 6) == 0) { 769 | uint32_t hdid; 770 | hdid = atoi(c->cmd + 6); 771 | int res = hdb_close(hdb, hdid); 772 | if (res == 1) { 773 | pack_string(c, "CLOSED"); 774 | } else { 775 | pack_string(c, "NOT_FOUND"); 776 | } 777 | } else if (strncmp(c->cmd, "randomkey ", 10) == 0) { 778 | uint32_t hdid; 779 | hdid = atoi(c->cmd + 10); 780 | hdict = hdb_ref(hdb, hdid); 781 | if (hdict == NULL) { 782 | if (settings.verbose > 1) 783 | fprintf(stderr, "cmd, "info", 4) == 0) { 799 | char temp[4096]; 800 | hdb_info(hdb, temp, 4096); 801 | pack_string(c, temp); 802 | } else if (strncmp(c->cmd, "get ", 4) == 0) { 803 | char *start = c->cmd + 4; 804 | char token[251]; 805 | int next; 806 | uint32_t hdid; 807 | uint64_t key; 808 | off_t off; 809 | uint32_t length; 810 | c->wbytes = 0; 811 | int res, nc; 812 | char *en_dash; 813 | while(sscanf(start, " %250s%n", token, &next) >= 1) { 814 | start += next; 815 | 816 | STATS_LOCK(); 817 | stats.get_cmds++; 818 | STATS_UNLOCK(); 819 | 820 | hdid = 0; 821 | if ((en_dash = strchr(token, '-')) == NULL) { 822 | key = strtoull(token, NULL, 10); 823 | } else { 824 | hdid = atoi(token); 825 | key = strtoull(en_dash+1, NULL, 10); 826 | } 827 | hdict = hdb_ref(hdb, hdid); 828 | if (hdict == NULL) { 829 | if (settings.verbose > 1) 830 | fprintf(stderr, "num_qry++; 837 | 838 | if (hdict_seek(hdict, key, &off, &length)) { 839 | STATS_LOCK(); 840 | stats.get_hits++; 841 | STATS_UNLOCK(); 842 | if (length < HDICT_VALUE_LENGTH_MAX) { 843 | if (c->wsize - c->wbytes < 512 + length) { 844 | int relloc_length = c->wsize * 2; 845 | if (relloc_length - c->wbytes < 512 + length) { 846 | relloc_length = 512 + length + c->wbytes; 847 | relloc_length = (relloc_length/DATA_BUFFER_SIZE + 1) * DATA_BUFFER_SIZE; 848 | } 849 | char *newbuf = (char *)realloc(c->wbuf, relloc_length); 850 | if (newbuf) { 851 | c->wbuf = newbuf; 852 | c->wsize = relloc_length; 853 | } else { 854 | if (settings.verbose > 0) 855 | fprintf(stderr, "Couldn't realloc output buffer\n"); 856 | goto NEXT; 857 | } 858 | } 859 | 860 | nc = sprintf(c->wbuf + c->wbytes, "VALUE %s %u %u\r\n", token, 0, length); 861 | res = hdict_read(hdict, c->wbuf + c->wbytes + nc, length, off); 862 | if (res != length) { 863 | if (settings.verbose > 0) 864 | fprintf(stderr, "Failed to read from hdict\n"); 865 | goto NEXT; 866 | } 867 | 868 | c->wbytes += nc; 869 | c->wbytes += length; 870 | c->wbytes += sprintf(c->wbuf + c->wbytes, "\r\n"); 871 | } 872 | } else { 873 | STATS_LOCK(); 874 | stats.get_misses++; 875 | STATS_UNLOCK(); 876 | } 877 | NEXT: 878 | hdb_deref(hdb, hdict); 879 | } 880 | c->wbytes += sprintf(c->wbuf + c->wbytes, "END\r\n"); 881 | } else { 882 | pack_string(c, "ERROR"); 883 | } 884 | 885 | cq_push(&RSP, c); 886 | pthread_mutex_lock(¬ify_lock); 887 | if (write(notify_send_fd, "", 1) != 1) { 888 | perror("Writing to thread notify pipe"); 889 | } 890 | pthread_mutex_unlock(¬ify_lock); 891 | } 892 | 893 | return NULL; 894 | } 895 | 896 | static void usage(void) { 897 | printf("-p TCP port number to listen on (default: 9764)\n" 898 | "-l interface to listen on, default is INDRR_ANY\n" 899 | "-c max simultaneous connections (default: 1024)\n" 900 | "-d run as a daemon\n" 901 | "-h print this help and exit\n" 902 | "-v verbose (print errors/warnings)\n" 903 | 
"-s strict mode (exit while open hdb failed before startup)\n" 904 | "-t number of worker threads to use, default 4\n" 905 | "-T timeout in millisecond, 0 for none, default 0\n"); 906 | return; 907 | } 908 | 909 | int main(int argc, char *argv[]) 910 | { 911 | int c; 912 | int daemonize = 0; 913 | int strict = 0; 914 | struct in_addr addr; 915 | 916 | char *init = NULL; 917 | 918 | settings_init(); 919 | setbuf(stderr, NULL); 920 | 921 | while ((c = getopt(argc, argv, "p:c:hvdl:t:T:i:")) != -1) { 922 | switch (c) { 923 | case 'p': 924 | settings.port = atoi(optarg); 925 | break; 926 | case 'c': 927 | settings.maxconns = atoi(optarg); 928 | break; 929 | case 'h': 930 | usage(); 931 | exit(0); 932 | case 'l': 933 | if (inet_pton(AF_INET, optarg, &addr) <= 0) { 934 | fprintf(stderr, "Illegal address: %s\n", optarg); 935 | return 1; 936 | } else { 937 | settings.interf = addr; 938 | } 939 | break; 940 | case 'd': 941 | daemonize = 1; 942 | break; 943 | case 'v': 944 | settings.verbose++; 945 | break; 946 | case 's': 947 | strict = 1; 948 | break; 949 | case 't': 950 | settings.num_threads = atoi(optarg); 951 | if (settings.num_threads == 0) { 952 | fprintf(stderr, "Number of threads must be greater than 0\n"); 953 | return 1; 954 | } 955 | break; 956 | case 'T': 957 | settings.timeout = atoi(optarg); 958 | break; 959 | case 'i': 960 | init = strdup(optarg); 961 | break; 962 | default: 963 | fprintf(stderr, "Illegal argument \"%c\"\n", c); 964 | return 1; 965 | } 966 | } 967 | 968 | srand(time(NULL)^getpid()); 969 | 970 | struct rlimit rlim; 971 | if (getrlimit(RLIMIT_NOFILE, &rlim) != 0) { 972 | fprintf(stderr, "failed to getrlimit number of files\n"); 973 | exit(1); 974 | } else { 975 | int maxfiles = settings.maxconns; 976 | if (rlim.rlim_cur < maxfiles) 977 | rlim.rlim_cur = maxfiles; 978 | if (rlim.rlim_max < rlim.rlim_cur) 979 | rlim.rlim_max = rlim.rlim_cur; 980 | if (setrlimit(RLIMIT_NOFILE, &rlim) != 0) { 981 | fprintf(stderr, "failed to set rlimit for open files. 
Try running as root or requesting smaller maxconns value.\n"); 982 | exit(1); 983 | } 984 | } 985 | 986 | if (daemonize) { 987 | int res; 988 | res = daemon(1, settings.verbose); 989 | if (res == -1) { 990 | fprintf(stderr, "failed to daemon() in order to daemonize\n"); 991 | return 1; 992 | } 993 | } 994 | 995 | stats_init(); 996 | conn_init(); 997 | 998 | l_socket = server_socket(settings.port); 999 | if (l_socket == -1) { 1000 | fprintf(stderr, "failed to listen\n"); 1001 | exit(1); 1002 | } 1003 | 1004 | cq_init(&REQ, 1); 1005 | cq_init(&RSP, 0); 1006 | 1007 | int fds[2]; 1008 | if (pipe(fds)) { 1009 | fprintf(stderr, "can't create notify pipe\n"); 1010 | exit(1); 1011 | } 1012 | notify_receive_fd = fds[0]; 1013 | notify_send_fd = fds[1]; 1014 | 1015 | hdb_t hdb; 1016 | hdb_init(&hdb); 1017 | 1018 | int i; 1019 | if (init) { 1020 | FILE *fp; 1021 | if ((fp = fopen(init, "r")) == NULL) { 1022 | fprintf(stderr, "failed to open %s\n", init); 1023 | exit(1); 1024 | } 1025 | 1026 | char line[1024]; 1027 | int bad = 1; 1028 | while(fgets(line, 1024, fp)) { 1029 | i = strlen(line) - 1; 1030 | while(i>=0 && (line[i]=='\r' || line[i]=='\n')) { 1031 | line[i] = '\0'; 1032 | i--; 1033 | } 1034 | if (strncmp(line, "open ", 5) == 0) { 1035 | char path[256]; 1036 | uint32_t hdid; 1037 | int res = sscanf(line, "%*s %255s %u\n", path, &hdid); 1038 | if (res != 2 || strlen(path) == 0) { 1039 | fprintf(stderr, "illegal init command %s\n", line); 1040 | exit(1); 1041 | } 1042 | int status = hdb_reopen(&hdb, path, hdid); 1043 | if (status != 0) { 1044 | fprintf(stderr, "failed to open %s on %d, return %d\n", path, hdid, status); 1045 | if (strict) 1046 | exit(1); 1047 | else { 1048 | if (status == EHDICT_OUT_OF_MEMERY) { 1049 | stats.ialloc_failed++; 1050 | } 1051 | } 1052 | 1053 | } 1054 | } else if (strcmp(line, "end") == 0) { 1055 | bad = 0; 1056 | } 1057 | } 1058 | fclose(fp); 1059 | if (bad) { 1060 | fprintf(stderr, "bad init command file %s, expect \"end\"\n", init); 1061 | exit(1); 1062 | } 1063 | } 1064 | 1065 | pthread_t tid; 1066 | pthread_create(&tid, NULL, hdb_mgr, &hdb); 1067 | 1068 | for (i = 0; i < settings.num_threads; i++) { 1069 | pthread_create(&tid, NULL, worker, &hdb); 1070 | } 1071 | 1072 | struct event_base *main_base = event_init(); 1073 | 1074 | struct sigaction sa; 1075 | sa.sa_handler = SIG_IGN; 1076 | sa.sa_flags = 0; 1077 | if (sigemptyset(&sa.sa_mask) == -1 || 1078 | sigaction(SIGPIPE, &sa, 0) == -1) { 1079 | perror("failed to ignore SIGPIPE; sigaction"); 1080 | exit(1); 1081 | } 1082 | struct event notify_event; 1083 | event_set(¬ify_event, notify_receive_fd, 1084 | EV_READ | EV_PERSIST, notify_handler, NULL); 1085 | event_base_set(main_base, ¬ify_event); 1086 | 1087 | if (event_add(¬ify_event, 0) == -1) { 1088 | fprintf(stderr, "can't monitor libevent notify pipe\n"); 1089 | exit(1); 1090 | } 1091 | 1092 | conn *listen_conn; 1093 | if (!(listen_conn = conn_new(l_socket, conn_listening, 1094 | EV_READ | EV_PERSIST, main_base))) { 1095 | fprintf(stderr, "failed to create listening connection"); 1096 | exit(1); 1097 | } 1098 | event_base_loop(main_base, 0); 1099 | 1100 | exit(0); 1101 | } 1102 | -------------------------------------------------------------------------------- /src/index_sort.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | typedef struct { 10 | uint64_t key; 11 | uint64_t pos; 12 | } idx_t; 13 | 14 | static int cmp_idx_key(const 
void* p1, const void* p2) 15 | { 16 | uint64_t key1 = ((idx_t*)p1)->key; 17 | uint64_t key2 = ((idx_t*)p2)->key; 18 | 19 | return key1 >= key2 ? 1 : -1; 20 | } 21 | 22 | static void tcpl_error(char* fmt, ...) 23 | { 24 | va_list args; 25 | 26 | va_start(args, fmt); 27 | fprintf(stdout, "error: "); 28 | vfprintf(stdout, fmt, args); 29 | fprintf(stdout, "\n"); 30 | va_end(args); 31 | 32 | exit(EXIT_FAILURE); 33 | } 34 | 35 | int main(int argc, char *argv[]) 36 | { 37 | if (argc != 2) { 38 | tcpl_error("usage: %s idx_file", argv[0]); 39 | } 40 | 41 | const char* idx_file = argv[1]; 42 | 43 | struct stat st; 44 | if (stat(idx_file, &st) == -1 || 45 | (st.st_size % sizeof(idx_t) != 0)) { 46 | tcpl_error("bad idx file: %s", idx_file); 47 | } 48 | FILE* idx_fp = fopen(idx_file, "r+b"); 49 | if (idx_fp == NULL) { 50 | tcpl_error("open %s fail", idx_file); 51 | } 52 | idx_t* idx = (idx_t*)malloc(st.st_size); 53 | if (idx == NULL) { 54 | tcpl_error("malloc fail"); 55 | } 56 | uint32_t idx_num = st.st_size / sizeof(idx_t); 57 | 58 | // 读索引文件 59 | if (fread(idx, sizeof(idx_t), idx_num, idx_fp) != idx_num) { 60 | tcpl_error("fread fail"); 61 | } 62 | 63 | // 根据索引的key值升序排序 64 | qsort(idx, idx_num, sizeof(idx_t), cmp_idx_key); 65 | 66 | // 将排序好的结果覆盖原索引文件 67 | rewind(idx_fp); 68 | if (fwrite(idx, sizeof(idx_t), idx_num, idx_fp) != idx_num) { 69 | tcpl_error("fwrite fail"); 70 | } 71 | 72 | free(idx); 73 | idx = NULL; 74 | 75 | fclose(idx_fp); 76 | idx_fp = NULL; 77 | 78 | exit(EXIT_SUCCESS); 79 | } 80 | -------------------------------------------------------------------------------- /tools/LushanFileOutputFormat.java: -------------------------------------------------------------------------------- 1 | import java.io.IOException; 2 | import java.io.DataOutputStream; 3 | import java.io.BufferedOutputStream; 4 | 5 | import org.apache.hadoop.fs.FileSystem; 6 | import org.apache.hadoop.fs.Path; 7 | 8 | import org.apache.hadoop.io.compress.CompressionOutputStream; 9 | import org.apache.hadoop.io.compress.Compressor; 10 | import org.apache.hadoop.io.compress.CodecPool; 11 | import org.apache.hadoop.io.compress.CompressionCodec; 12 | import org.apache.hadoop.io.compress.DefaultCodec; 13 | 14 | import org.apache.hadoop.io.WritableComparable; 15 | import org.apache.hadoop.io.Writable; 16 | import org.apache.hadoop.io.LongWritable; 17 | import org.apache.hadoop.io.DataOutputBuffer; 18 | import org.apache.hadoop.fs.FSDataOutputStream; 19 | import org.apache.hadoop.io.SequenceFile.CompressionType; 20 | import org.apache.hadoop.io.serializer.SerializationFactory; 21 | import org.apache.hadoop.io.serializer.Serializer; 22 | import org.apache.hadoop.conf.Configuration; 23 | import org.apache.hadoop.util.Progressable; 24 | import org.apache.hadoop.util.ReflectionUtils; 25 | import org.apache.hadoop.mapred.FileOutputFormat; 26 | import org.apache.hadoop.mapred.JobConf; 27 | import org.apache.hadoop.mapred.RecordWriter; 28 | import org.apache.hadoop.mapred.Reporter; 29 | 30 | public class LushanFileOutputFormat 31 | extends FileOutputFormat { 32 | 33 | protected static class LushanRecordWriter 34 | implements RecordWriter { 35 | 36 | private long size = 0; 37 | private long offset = 0; 38 | private LongWritable lastKey = new LongWritable(); 39 | 40 | private boolean compress = false; 41 | private CompressionCodec codec = null; 42 | private CompressionOutputStream deflateFilter = null; 43 | private DataOutputStream deflateOut = null; 44 | private Compressor compressor = null; 45 | 46 | protected DataOutputBuffer buffer = new 
DataOutputBuffer(); 47 | protected Serializer uncompressedValSerializer = null; 48 | protected Serializer compressedValSerializer = null; 49 | 50 | public static final String INDEX_FILE_NAME = "idx"; 51 | public static final String DATA_FILE_NAME = "dat"; 52 | 53 | protected FSDataOutputStream dataOut; 54 | protected FSDataOutputStream indexOut; 55 | 56 | public LushanRecordWriter(Configuration conf, FileSystem fs, String dirName, 57 | Class valClass, CompressionCodec codec, Progressable progress) throws IOException { 58 | 59 | this.codec = codec; 60 | 61 | SerializationFactory serializationFactory = new SerializationFactory(conf); 62 | if (this.codec != null) { 63 | this.compress = true; 64 | this.compressor = CodecPool.getCompressor(this.codec); 65 | this.deflateFilter = this.codec.createOutputStream(buffer, compressor); 66 | 67 | this.deflateOut = 68 | new DataOutputStream(new BufferedOutputStream(deflateFilter)); 69 | this.compressedValSerializer = serializationFactory.getSerializer(valClass); 70 | this.compressedValSerializer.open(deflateOut); 71 | } else { 72 | this.uncompressedValSerializer = serializationFactory.getSerializer(valClass); 73 | this.uncompressedValSerializer.open(buffer); 74 | } 75 | 76 | Path dir = new Path(dirName); 77 | if (!fs.mkdirs(dir)) { 78 | throw new IOException("Mkdirs failed to create directory " + dir.toString()); 79 | } 80 | Path dataFile = new Path(dir, DATA_FILE_NAME); 81 | Path indexFile = new Path(dir, INDEX_FILE_NAME); 82 | 83 | this.dataOut = fs.create(dataFile, progress); 84 | this.indexOut = fs.create(indexFile, progress); 85 | } 86 | 87 | private void checkKey(LongWritable key) throws IOException { 88 | 89 | // check that keys are well-ordered 90 | if (size != 0 && lastKey.get() > key.get()) 91 | throw new IOException("key out of order: "+key+" after "+lastKey); 92 | 93 | lastKey.set(key.get()); 94 | } 95 | 96 | 97 | 98 | public synchronized void write(WritableComparable key, Writable value) 99 | throws IOException { 100 | 101 | if (!(key instanceof LongWritable)) { 102 | throw new IOException("key is not instanceof LongWritable"); 103 | } 104 | 105 | buffer.reset(); 106 | 107 | LongWritable k = (LongWritable)key; 108 | checkKey(k); 109 | offset = dataOut.getPos(); 110 | if (offset > 0xFFFFFFFFFFL) return; 111 | 112 | if (compress) { 113 | deflateFilter.resetState(); 114 | compressedValSerializer.serialize(value); 115 | deflateOut.flush(); 116 | deflateFilter.finish(); 117 | } else { 118 | uncompressedValSerializer.serialize(value); 119 | } 120 | 121 | long length = buffer.getLength(); 122 | if (length > 0x2FFFFFL) return; 123 | 124 | long pos = (length << 40) | offset; 125 | dataOut.write(buffer.getData(), 0, buffer.getLength()); 126 | 127 | indexOut.writeLong(Long.reverseBytes(k.get())); 128 | indexOut.writeLong(Long.reverseBytes(pos)); 129 | 130 | size++; 131 | } 132 | 133 | public synchronized void close(Reporter reporter) throws IOException { 134 | 135 | if (uncompressedValSerializer != null) { 136 | uncompressedValSerializer.close(); 137 | } 138 | 139 | if (compressedValSerializer != null) { 140 | compressedValSerializer.close(); 141 | CodecPool.returnCompressor(compressor); 142 | compressor = null; 143 | } 144 | 145 | dataOut.close(); 146 | indexOut.close(); 147 | } 148 | } 149 | 150 | public RecordWriter getRecordWriter(FileSystem ignored, JobConf job, 151 | String name, Progressable progress) 152 | throws IOException { 153 | // get the path of the temporary output file 154 | Path file = FileOutputFormat.getTaskOutputPath(job, name); 155 
| 156 | FileSystem fs = file.getFileSystem(job); 157 | CompressionCodec codec = null; 158 | if (getCompressOutput(job)) { 159 | // find the right codec 160 | Class codecClass = getOutputCompressorClass(job, 161 | DefaultCodec.class); 162 | codec = ReflectionUtils.newInstance(codecClass, job); 163 | } 164 | 165 | // ignore the progress parameter, since LushanFile is local 166 | return new LushanRecordWriter(job, fs, file.toString(), 167 | job.getOutputValueClass().asSubclass(Writable.class), 168 | codec, 169 | progress); 170 | 171 | } 172 | } 173 | 174 | -------------------------------------------------------------------------------- /tools/generate_idx.py: -------------------------------------------------------------------------------- 1 | import struct 2 | import sys 3 | 4 | if len(sys.argv) < 3: 5 | print "usage: %s dat_file idx_file" % sys.argv[0] 6 | sys.exit(1) 7 | 8 | dat_fp = open(sys.argv[1])  # plain-text dat file: one "key:value" record per line 9 | idx_fp = open(sys.argv[2], 'wb') 10 | 11 | off = dat_fp.tell()  # byte offset where the current line starts in dat 12 | line = dat_fp.readline() 13 | while line != '': 14 | pos = line.find(':')  # key and value are separated by the first ':' 15 | if pos > 0: 16 | id = int(line[0:pos]) 17 | off += pos + 1  # skip past "key:" so off points at the value bytes 18 | data_idx = struct.pack('QQ', id, ((len(line) - pos - 1) << 40) | off)  # index record: key, then pos packing the value length above the 40-bit dat offset 19 | idx_fp.write(data_idx) 20 | off = dat_fp.tell()  # start offset of the next line 21 | line = dat_fp.readline() 22 | 23 | dat_fp.close() 24 | idx_fp.close() 25 | -------------------------------------------------------------------------------- /upload/0/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/upload/0/.gitignore -------------------------------------------------------------------------------- /upload/1/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/upload/1/.gitignore -------------------------------------------------------------------------------- /upload/2/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/upload/2/.gitignore -------------------------------------------------------------------------------- /upload/3/.gitignore: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wbrecom/lushan/a98af965f79c2463f7f5e68c5c21b3f4a5def590/upload/3/.gitignore --------------------------------------------------------------------------------
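A note on reading the files above: each record in an idx file is a pair of 64-bit integers, the key and a pos field whose low 40 bits give the value's byte offset in dat and whose higher bits give the value's length, exactly as packed by generate_idx.py and LushanFileOutputFormat.java. The sketch below is not part of the repository; it is a minimal Python 3 helper (hypothetical name dump_hdict) for inspecting an hdict directory, assuming the idx file uses the native little-endian layout that both tools produce.
#+BEGIN_SRC python
import struct
import sys

OFFSET_BITS = 40
OFFSET_MASK = (1 << OFFSET_BITS) - 1  # low 40 bits of pos hold the offset into dat

def dump_hdict(idx_path, dat_path):
    """Print every (key, value) pair of an hdict by walking idx and seeking into dat."""
    with open(idx_path, 'rb') as idx_fp, open(dat_path, 'rb') as dat_fp:
        while True:
            record = idx_fp.read(16)                 # one idx record: key (u64) + pos (u64)
            if len(record) < 16:
                break
            key, pos = struct.unpack('QQ', record)   # native byte order, same as struct.pack('QQ', ...) in generate_idx.py
            off = pos & OFFSET_MASK                  # where the value starts in dat
            length = pos >> OFFSET_BITS              # how many bytes to read
            dat_fp.seek(off)
            print(key, dat_fp.read(length))

if __name__ == '__main__':
    dump_hdict(sys.argv[1], sys.argv[2])
#+END_SRC
Saved as, say, dump_hdict.py, it can be pointed at the bundled demo data: python3 dump_hdict.py demo_data/hdict_20150818181818/idx demo_data/hdict_20150818181818/dat. The off/length arithmetic is the same one lushan performs when it resolves a get request against a mounted library.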