├── LinUCB
    ├── LinUCB流程图.jpg
    └── LinUCB流程图.vsd
├── README.md
├── UCBAlgorithm
    ├── UCB1_by_C++
    │   ├── Makefile
    │   ├── README.md
    │   ├── UCB1.cpp
    │   ├── UCB1.h
    │   └── main.cpp
    ├── UCBServer_python
    │   ├── README.md
    │   ├── UCB1.py
    │   └── UCBServer.py
    ├── UCB_Server_demo
    │   ├── 20150924015910.attri
    │   ├── Makefile
    │   ├── README.md
    │   ├── UCB1.cpp
    │   ├── UCB1.h
    │   ├── main.cpp
    │   ├── url_chinese.cpp
    │   └── url_chinese.h
    ├── UCB_json_time
    │   ├── Makefile4
    │   ├── README.md
    │   ├── UCB1.cpp
    │   ├── UCB1.h
    │   ├── read_test.cpp
    │   └── topics_and_tags
    │   │   ├── tags_json_file_0.3k.txt
    │   │   ├── tags_json_file_0.4w.txt
    │   │   ├── tags_json_file_0.5k.txt
    │   │   ├── tags_json_file_0.6w.txt
    │   │   ├── tags_json_file_0.8w.txt
    │   │   ├── tags_json_file_1.2w.txt
    │   │   ├── tags_json_file_1.4w.txt
    │   │   ├── tags_json_file_1.5k.txt
    │   │   ├── tags_json_file_1.6w.txt
    │   │   ├── tags_json_file_1.8w.txt
    │   │   ├── tags_json_file_1k.txt
    │   │   ├── tags_json_file_1w.txt
    │   │   ├── tags_json_file_2k.txt
    │   │   ├── tags_json_file_2w.txt
    │   │   ├── topics_json_file_0.3k.txt
    │   │   ├── topics_json_file_0.4w.txt
    │   │   ├── topics_json_file_0.5k.txt
    │   │   ├── topics_json_file_0.6w.txt
    │   │   ├── topics_json_file_0.8w.txt
    │   │   ├── topics_json_file_1.2w.txt
    │   │   ├── topics_json_file_1.4w.txt
    │   │   ├── topics_json_file_1.5k.txt
    │   │   ├── topics_json_file_1.6w.txt
    │   │   ├── topics_json_file_1.8w.txt
    │   │   ├── topics_json_file_1k.txt
    │   │   ├── topics_json_file_1w.txt
    │   │   ├── topics_json_file_2k.txt
    │   │   └── topics_json_file_2w.txt
    ├── Using Multi-armed Bandit to Solve Cold-start Prob.pdf
    └── bandit_algorithms_for_website_optimization.pdf
├── user_follow_rate
    ├── README.md
    ├── add_coperation_count.py
    ├── add_coperation_count.pyc
    ├── getid_from_mysql.py
    ├── getid_from_mysql.pyc
    ├── read_from_clean.py
    ├── read_from_clean.pyc
    ├── sort_user_follow.py
    ├── start_read.py
    ├── start_read.sh
    └── start_read.sh.bak
└── web_crawler
    └── get_url
        ├── README.md
        ├── get_final_url.py
        ├── get_url_name.py
        ├── list_unique.txt
        ├── list_url.txt
        ├── list_url_first.txt
        └── wget.sh


/LinUCB/LinUCB流程图.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YinWenAtBIT/Machine_Learning/393751cc0e780f754faa7346c54da29c41a54bd6/LinUCB/LinUCB流程图.jpg


--------------------------------------------------------------------------------
/LinUCB/LinUCB流程图.vsd:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YinWenAtBIT/Machine_Learning/393751cc0e780f754faa7346c54da29c41a54bd6/LinUCB/LinUCB流程图.vsd


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Machine Learning and Data Mining

Algorithms I learned while interning as a recommendation-algorithm engineer are posted here.

### Reinforcement learning
1. UCB (upper confidence bound) algorithm
2. Crawling product URLs

### File list
1. UCB algorithm:
    1. The original UCB1 base class, written in C++
    2. A UCB test demo built on served, an HTTP server written with C++11 and Boost 1.56
    3. A JSON parsing speed test for the C++ UCB class
    4. The UCB class rewritten in Python, plus a UCB recommendation server built on Tornado, Python's asynchronous non-blocking web server

2. Crawling product URLs:
    1. Crawls the product URLs from the Singles' Day (11 November) shopping lists on SMZDM (什么值得买), using a mix of shell and Python.

3. Analyzing user action logs to rank the shops, brands, and lists each user follows:
    1. The recorded user actions on a product are: click, purchase, favorite, block
    2. Look up each product's brand, shop, or list in a MySQL database and record the actions per shop
    3. 
       Score those actions to rank the shops and brands the user follows, and store the result in a Redis database

--------------------------------------------------------------------------------
/UCBAlgorithm/UCB1_by_C++/Makefile:
--------------------------------------------------------------------------------
C=gcc
CXX=g++
CFLAGS= -g -D_DEBUG -fPIC -Wshadow -Wcast-qual -Wcast-align -Wwrite-strings -Wsign-compare -Winvalid-pch -fms-extensions -Wall -MMD
CPPFLAGS=$(CFLAGS) -Woverloaded-virtual -Wsign-promo -fno-gnu-keywords -std=c++11

DEPS=/home/yw/jsoncpp-src-0.5.0/include /home/yw/jsoncpp-src-0.5.0/libs/linux-gcc-4.8.3/libjson_linux-gcc-4.8.3_libmt.a
OBJS=main.o UCB1.o

TARGET=unittest_UCB1


ALL: $(TARGET)

$(TARGET):main.o UCB1.o
	$(CXX) $(CPPFLAGS) -o unittest_UCB1 main.o UCB1.o -I$(DEPS)

UCB1.o: UCB1.cpp
	$(CXX) $(CPPFLAGS) -c $< -I$(DEPS)

main.o:main.cpp
	$(CXX) $(CPPFLAGS) -c $< -I$(DEPS)

--------------------------------------------------------------------------------
/UCBAlgorithm/UCB1_by_C++/README.md:
--------------------------------------------------------------------------------
# UCB (upper confidence bound) algorithm

### 1. Problem model

The multi-armed bandit problem (known in Chinese as the “多臂赌博机” problem).
In probability theory, the multi-armed bandit problem (sometimes called the K-armed or N-armed bandit problem) describes a gambler facing a row of slot machines who must decide which machine's arm to pull and how many times to pull each one. Each machine pays out rewards drawn from its own probability distribution. The gambler's goal is to maximize the total reward collected over a sequence of pulls.

In practice, the bandit model is used to model the management of research projects. A pharmaceutical company, for example, has a fixed budget and must allocate it across projects whose payoffs are only partly known at the start and become clearer as the projects progress.

Solving the problem therefore means trading off “exploration” (trying arms to learn more about their rewards) against “exploitation” (choosing the arm with the highest observed reward to maximize payoff).

### 2. Properties of the UCB1 algorithm

Implementing UCB requires only one assumption: rewards lie between 0 and 1, with 1 being the maximum reward. If the model's rewards fall outside this range, they must be normalized.

Besides tracking a confidence level for each arm's estimate, UCB differs from earlier algorithms in two ways:

1. UCB uses no randomness at all: in every situation, the arm it selects can be computed from the data

2. UCB has no parameters to configure, so it can be used in any setting without prior knowledge

### Changelog
1. Initial version
2. 
   Fixed a variable type error in select_arm_N; added more detailed test documentation and a makefile
## Details
For a more detailed introduction to the UCB algorithm, see my blog:
http://blog.csdn.net/yw8355507/article/details/48579635

--------------------------------------------------------------------------------
/UCBAlgorithm/UCB1_by_C++/UCB1.cpp:
--------------------------------------------------------------------------------
/*************************************************************************
    > File Name: UCB1.cpp
    > Author: YinWen
    > Mail: YinWenatBIT@163.com
    > Created Time: Tue 01 Sep 2015 08:32:35 PM CST
 ************************************************************************/

#include "UCB1.h"
#include
#include
#include
#include "json/json.h"
#include



using namespace std;

UCB1::UCB1():
    totalcount(0), default_count(0), default_value(0.0)
{

}

UCB1::UCB1(int init_totalcount, int init_count, double init_value):
    totalcount(init_totalcount), default_count(init_count), default_value(init_value)
{

}

UCB1::~UCB1()
{

}

string UCB1::toString()
{
    Json::FastWriter writer;
    Json::Value value;
    Json::Value element;
    std::string key;

    value["totalcount"] = Json::Value(totalcount);
    value["default_count"] = Json::Value(default_count);
    value["default_value"] = Json::Value(default_value);

    for(auto it = frequencyReward.begin(); it !=frequencyReward.end(); it++)
    {
        key = it->first;

        Json::Value temp_value;
        temp_value["counts"] = Json::Value(it->second.counts);
        temp_value["keyValue"] = Json::Value(it->second.values);
        element[key] = temp_value;
    }
    value["element"] = element;
    return writer.write(value);
}


bool UCB1::readFromString(const string& JString)
{
    Json::Reader reader;
    Json::Value value;
    Json::Value element;
    std::string key;

    int 
readcount; 69 | double readdouble; 70 | 71 | if(!reader.parse(JString, value)) 72 | return false; 73 | 74 | element = value["element"]; 75 | auto keymember = element.getMemberNames(); 76 | int num = keymember.size(); 77 | 78 | for(int i=0; isecond.counts); 134 | countstr = temp; 135 | return countstr; 136 | } 137 | 138 | string UCB1::get_value(string key) 139 | { 140 | string valuestr; 141 | auto origin = frequencyReward.find(key); 142 | if(origin == frequencyReward.end()) 143 | return valuestr; 144 | 145 | char temp[25]; 146 | sprintf(temp, "%f", origin->second.values); 147 | valuestr = temp; 148 | return valuestr; 149 | } 150 | 151 | 152 | bool UCB1::update(const char * start, double res) 153 | { 154 | string key(start); 155 | return update(key, res); 156 | } 157 | 158 | 159 | 160 | bool UCB1::update(string& key, double res) 161 | { 162 | auto origin = frequencyReward.find(key); 163 | if(origin == frequencyReward.end()) 164 | { 165 | auto ret = frequencyReward.insert({ key, {default_count, default_value} }); 166 | return true; 167 | } 168 | 169 | 170 | double n = ++origin->second.counts; 171 | origin->second.values = origin->second.values*(n-1)/n + res/n; 172 | // ++this->totalcount; 173 | 174 | return true; 175 | } 176 | 177 | bool UCB1::update_reset_last(string & key, double res) 178 | { 179 | auto origin = frequencyReward.find(key); 180 | if(origin == frequencyReward.end()) 181 | { 182 | auto ret = frequencyReward.insert({ key, {default_count, default_value} }); 183 | return true; 184 | } 185 | 186 | 187 | double n = origin->second.counts; 188 | origin->second.values = (origin->second.values*n -default_value + res)/n; 189 | 190 | return true; 191 | 192 | } 193 | bool UCB1::update_reset_last(const char *start, double res) 194 | { 195 | string key(start); 196 | return update_reset_last(key, res); 197 | } 198 | 199 | string UCB1::select_arm() 200 | { 201 | string maxkey; 202 | 203 | double bonus; 204 | double maxvalue = 0.0; 205 | 206 | 
if(frequencyReward.empty()) 207 | throw empty_arm("no arm in the map"); 208 | 209 | for(auto it = frequencyReward.begin(); it != frequencyReward.end(); it++ ) 210 | { 211 | if(it->second.counts == 0) 212 | return it->first; 213 | 214 | bonus = sqrt(2* log((double)totalcount))/it->second.counts; 215 | if(maxvalue < bonus + it->second.values) 216 | { 217 | maxvalue = bonus + it->second.values; 218 | maxkey = it->first; 219 | } 220 | } 221 | 222 | return maxkey; 223 | } 224 | 225 | 226 | std::vector & UCB1::select_arm_N(size_t n) 227 | { 228 | keystrs.clear(); 229 | int countzero= 0; 230 | int count = 0; 231 | 232 | if(n ==0) 233 | return keystrs; 234 | if(n > frequencyReward.size()) 235 | n = frequencyReward.size(); 236 | 237 | if(frequencyReward.empty()) 238 | throw empty_arm("no arm in the map"); 239 | 240 | 241 | 242 | vector maxkey; 243 | maxkey.resize(n+1); 244 | 245 | double bonus; 246 | double valuenow; 247 | vector maxvalue; 248 | maxvalue.resize(n+1, 0.0); 249 | 250 | for(auto it = frequencyReward.begin(); it != frequencyReward.end() && countzero second.counts == 0) 253 | { 254 | keystrs.push_back(it->first); 255 | ++countzero; 256 | } 257 | else 258 | { 259 | bonus = sqrt(2* log((double)totalcount))/it->second.counts; 260 | valuenow = bonus +it->second.values; 261 | 262 | int i; 263 | if(count < n - countzero) 264 | i = count; 265 | else 266 | i = n - countzero; 267 | 268 | while(i>0) 269 | { 270 | if(valuenow > maxvalue[i-1]) 271 | { 272 | maxvalue[i] = maxvalue[i-1]; 273 | maxkey[i] = maxkey[i-1]; 274 | --i; 275 | } 276 | else 277 | break; 278 | 279 | } 280 | maxvalue[i] = valuenow; 281 | maxkey[i] = it->first; 282 | ++count; 283 | 284 | } 285 | } 286 | 287 | int remain = n - countzero; 288 | int i=0; 289 | while(i< remain) 290 | { 291 | keystrs.push_back(maxkey[i]); 292 | i++; 293 | } 294 | 295 | for(int i=0; iadd_totalcount(1); 299 | 300 | return keystrs; 301 | 302 | 303 | } 304 | 
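
For comparison with `select_arm` above, the textbook UCB1 rule scores each arm as its mean reward plus an exploration bonus of sqrt(2 ln t / n_i), where t is the total number of pulls and n_i is the number of pulls of arm i. The following is a minimal sketch of that rule in Python (the language of `UCBServer_python`), not the author's implementation: the function names `ucb1_scores` and `select_arm` and the `stats` layout of `{arm: (count, mean)}` are assumptions made for illustration. Note that `UCB1.cpp` above computes its bonus as `sqrt(2*log(totalcount))/counts`, with the division outside the square root, so the two rules may rank arms differently.

```python
import math


def ucb1_scores(stats, total_count):
    """Return {arm: mean + sqrt(2 ln t / n)} for stats = {arm: (count, mean)}.

    Arms that have never been pulled get an infinite score so they are
    tried first (the C++ code above returns such an arm immediately).
    """
    t = max(total_count, 1)  # guard: log(0) is undefined
    scores = {}
    for arm, (count, mean) in stats.items():
        if count == 0:
            scores[arm] = math.inf
        else:
            scores[arm] = mean + math.sqrt(2.0 * math.log(t) / count)
    return scores


def select_arm(stats, total_count):
    """Pick the arm with the highest UCB1 score; no randomness is involved."""
    if not stats:
        raise ValueError("no arm in the map")
    scores = ucb1_scores(stats, total_count)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical data: arm "c" has never been pulled, so it is chosen.
    stats = {"a": (10, 0.5), "b": (2, 0.4), "c": (0, 0.0)}
    print(select_arm(stats, 12))  # prints c
```

Once every arm has been pulled at least once, a rarely pulled arm can still outrank one with a higher mean because its bonus is larger; this is how the rule balances exploration against exploitation without any tunable parameter.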
-------------------------------------------------------------------------------- /UCBAlgorithm/UCB1_by_C++/UCB1.h: -------------------------------------------------------------------------------- 1 | /************************************************************************* 2 | > File Name: UCB1.h 3 | > Author: YinWen 4 | > Mail: YinWenatBIT@163.com 5 | > Created Time: Tue 01 Sep 2015 07:54:50 PM CST 6 | >Description:实现UCB1算法,提供一个UCB类 7 | ************************************************************************/ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #ifndef _UCB1_H 15 | #define _UCB1_H 16 | 17 | using namespace std; 18 | 19 | 20 | class empty_arm: public runtime_error 21 | { 22 | public: 23 | explicit empty_arm(const string &str): 24 | runtime_error(str) {} 25 | }; 26 | 27 | 28 | struct UCBNode 29 | { 30 | int counts; 31 | double values; 32 | }; 33 | 34 | typedef UCBNode UCB; 35 | 36 | class UCB1 37 | { 38 | public: 39 | UCB1(); 40 | UCB1(int init_totalcount, int init_count, double init_value); 41 | ~UCB1(); 42 | bool update(string & key, double res); 43 | bool update(const char *start, double res); 44 | 45 | bool update_reset_last(string & key, double res); 46 | bool update_reset_last(const char *start, double res); 47 | 48 | string toString(); 49 | bool readFromString(const string& JString); 50 | string select_arm(); 51 | std::vector & select_arm_N(size_t n); 52 | 53 | std::vector keystrs; 54 | bool insert(string & key, UCB value); 55 | bool insert(string &key); 56 | void set_totalcount(int number); 57 | void add_totalcount(int num); 58 | string get_totalcount(); 59 | string get_count(string key); 60 | string get_value(string key); 61 | private: 62 | 63 | std::unordered_map frequencyReward; 64 | int totalcount; 65 | int default_count; 66 | double default_value; 67 | }; 68 | 69 | 70 | 71 | #endif 72 | -------------------------------------------------------------------------------- /UCBAlgorithm/UCB1_by_C++/main.cpp: 
-------------------------------------------------------------------------------- 1 | /************************************************************************* 2 | > File Name: main.cpp 3 | > Author: YinWen 4 | > Mail: yinwenatbit@163.com 5 | > Created Time: Sun 06 Sep 2015 06:10:36 PM CST 6 | ************************************************************************/ 7 | 8 | #include 9 | using namespace std; 10 | 11 | #include "UCB1.h" 12 | 13 | 14 | 15 | int main() 16 | { 17 | UCB1 goods; 18 | goods.update("a", 0.0); 19 | goods.update("b", 0.0); 20 | goods.update("c", 0.0); 21 | goods.update("d", 0.0); 22 | goods.update("e", 0.0); 23 | goods.update("f", 0.0); 24 | 25 | cout<<"select three with all zero\n"<maxvalue: 82 | maxvalue = totalvalue 83 | maxkey = key 84 | 85 | self.add_totalcount(1) 86 | self.update(maxkey, self.default_value) 87 | return maxkey 88 | 89 | #添加黑名单后,立刻重新选择新的推荐,但是不增加totalcount与counts 90 | def select_arm_N_forcedbyBlackList(self, N): 91 | if N == 0: 92 | return None 93 | 94 | allvalue = {x:self.value_bonus(x) for x in self.frequencyReward if self.frequencyReward[x][2] != 1} 95 | 96 | sorted_key = sorted(allvalue.iteritems(), key = lambda asd:asd[1], reverse = True) 97 | allstr = [x[0] for x in sorted_key] 98 | return allstr[0:N] 99 | 100 | 101 | def select_arm_N(self, N): 102 | if N == 0: 103 | return None 104 | 105 | allvalue = {x:self.value_bonus(x) for x in self.frequencyReward if self.frequencyReward[x][2] != 1} 106 | 107 | sorted_key = sorted(allvalue.iteritems(), key = lambda asd:asd[1], reverse = True) 108 | allstr = [x[0] for x in sorted_key] 109 | for i in range(N): 110 | self.update(allstr[i], self.default_value) 111 | 112 | self.add_totalcount(1) 113 | return allstr[0:N] 114 | 115 | 116 | def value_bonus(self, key): 117 | if key in self.frequencyReward: 118 | bonus = math.sqrt(2*math.log(self.totalcount)) /float(self.frequencyReward[key][0]) 119 | return bonus+self.frequencyReward[key][1] 120 | 121 | 122 | def select_arm_N_sort(self, 
N):
        if N == 0:
            return None

        allvalue = {x:self.value_bonus(x) for x in self.frequencyReward}
        # NOTE: sorted() returns a new list; its result is discarded here,
        # so the dict below is returned unsorted.
        sorted(allvalue.iteritems(), key = lambda asd:asd[1], reverse = True)
        return allvalue

    def toString(self):
        json = {}
        json["totalcount"] = self.totalcount
        json["default_count"] = self.default_count
        json["default_value"] = self.default_value
        json["element"] = self.frequencyReward
        #element = simplejson.dumps(self.frequencyReward)
        json_str = simplejson.dumps(json)
        return json_str

    def readFromString(self, str):
        try:
            readDict = simplejson.loads(str)
            self.totalcount = readDict["totalcount"]
            self.default_count = readDict["default_count"]
            self.default_value = readDict["default_value"]
            self.frequencyReward = readDict["element"]
        except :
            print "wrong string"

--------------------------------------------------------------------------------
/UCBAlgorithm/UCBServer_python/UCBServer.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding=utf-8

import redis

import tornado.ioloop
import tornado.web
import UCB1
import simplejson

initStr = '{"default_value": 0.5, "totalcount": 3, "default_count": 10, "element": {"topic9": [10, 0.5, 0], "topic8": [10, 0.5, 0], "topic1": [10, 0.5, 0], "topic3": [10, 0.5, 0], "topic2": [10, 0.5, 0], "topic5": [10, 0.5, 0], "topic4": [10, 0.5, 0], "topic7": [10, 0.5, 0], "topic6": [10, 0.5, 0], "topic14": [10, 0.5, 0], "topic11": [10, 0.5, 0], "topic10": [10, 0.5, 0], "topic13": [10, 0.5, 0], "topic12": [10, 0.5, 0]}}'

print "use http://211.144.146.217:8888/select?uid=123&number=10 for select goods"

print "use http://211.144.146.217:8888/update?uid=123&armkey=topic1&value=0.7 for update"
print "use http://211.144.146.217:8888/blacklist?uid=123&armkey=topic1 for addblacklist"

redisClient=redis.StrictRedis(host='211.144.146.217',port=6379,db=0)


class SelectHandler(tornado.web.RequestHandler):
    def get(self):
        uid = self.get_argument("uid")
        num = self.get_argument("number")
        num = int(num)
        userKey = uid +".updated.ee"
        selectKey = uid+".selected.ee"
        # if userKey exists, send the previously stored selection, then compute and store a new one
        if redisClient.exists(userKey):
            json_selected = redisClient.get(selectKey)
            self.write(json_selected)

            userUCB = UCB1.UCB1()
            UCBJson = redisClient.get(userKey)
            userUCB.readFromString(UCBJson)
            selected = userUCB.select_arm_N(num)
            json_selected = simplejson.dumps(selected)
            # write the updated selection and user data back to the database
            newUCBJson = userUCB.toString()
            redisClient.set(userKey, newUCBJson)
            redisClient.set(selectKey, json_selected)
        else:
            userUCB = UCB1.UCB1(1,10,0.5)
            userUCB.readFromString(initStr)
            selected = userUCB.select_arm_N(num)
            json_selected = simplejson.dumps(selected)
            # send the initial data
            self.write(json_selected)
            # write the updated selection and user data back to the database
            newUCBJson = userUCB.toString()
            redisClient.set(userKey, newUCBJson)
            redisClient.set(selectKey, json_selected)

class UpdateHandler(tornado.web.RequestHandler):
    def get(self):
        uid = self.get_argument("uid")
        armkey = self.get_argument("armkey")
        value = self.get_argument("value")
        value = float(value)
        print value
        userKey = uid +".updated.ee"
        userUCB = UCB1.UCB1()
        UCBJson = redisClient.get(userKey)
        userUCB.readFromString(UCBJson)
        userUCB.update_reset_last(armkey, value)
        # write the updated user data back to the database
        newUCBJson = userUCB.toString()
        redisClient.set(userKey, newUCBJson)


class BlackListHandler(tornado.web.RequestHandler):
    def get(self):
        uid = self.get_argument("uid")
        armkey = self.get_argument("armkey")
        userKey = uid +".updated.ee"
        selectKey = uid+".selected.ee"

        # add the key to the blacklist
        userUCB = UCB1.UCB1()
        UCBJson = redisClient.get(userKey)
        userUCB.readFromString(UCBJson)
        userUCB.addBlackList([armkey])
        # write the updated selection and user data back to the database
        newUCBJson = userUCB.toString()
        redisClient.set(userKey, newUCBJson)
        # serialize to JSON, matching what SelectHandler stores under selectKey
        json_selected = simplejson.dumps(userUCB.select_arm_N_forcedbyBlackList(10))
        redisClient.set(selectKey, json_selected)

application = tornado.web.Application([
    (r"/select", SelectHandler),
    (r"/update", UpdateHandler),
    (r"/blacklist", BlackListHandler)
])


if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_Server_demo/Makefile:
--------------------------------------------------------------------------------
C=gcc
CXX=g++
CFLAGS= -g -D_DEBUG -fPIC -Wshadow -Wcast-qual -Wcast-align -Wwrite-strings -Wsign-compare -Winvalid-pch -fms-extensions -Wall -MMD
CPPFLAGS=$(CFLAGS) -Woverloaded-virtual -Wsign-promo -fno-gnu-keywords -std=c++11

DEPS=-I/home/yinwen/jsoncpp-src-0.5.0/include -I/home/yinwen/served/src /home/yinwen/jsoncpp-src-0.5.0/libs/linux-gcc-4.8.2/libjson_linux-gcc-4.8.2_libmt.a

LIBPATH=/home/yinwen/served/lib

LIB=-lserved
#-lboost_system-mt -lstdc++

OBJS=main.o UCB1.o url_chinese.o

TARGET=unittest_UCB1


ALL: $(TARGET)

$(TARGET):$(OBJS)
	$(CXX) $(CPPFLAGS) -o unittest_UCB1 $^ $(DEPS) -L$(LIBPATH) $(LIB)

UCB1.o: UCB1.cpp
	$(CXX) $(CPPFLAGS) -c $< $(DEPS)

url_chinese.o:url_chinese.cpp
	$(CXX) $(CPPFLAGS) -c $< $(DEPS)

main.o:main.cpp
	$(CXX) $(CPPFLAGS) -c $< $(DEPS)


--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_Server_demo/README.md:
--------------------------------------------------------------------------------
# HTTP server demo based on served

### 1. The served server
The server is written in C++11 with Boost 1.56; building it requires gcc 4.8 and at least Boost 1.53.
The server is in my GitHub repositories; I have only read and used it and have not contributed code to it.

### 2. What the demo does and how it is implemented

1. The demo first reads the JSON strings in the attri file (included in this directory). Each JSON string holds one product's name and link, along with the product's various categories.
2. Two kinds of category are used here, tags and topics. One UCB object is created for each to record each customer's preferences over tags and topics.
3. An unordered_map then maps each tag or topic to a std::vector. The vector holds the JSON strings of the products that share that tag or topic.
4. A recommendation handler takes key=tags or key=topics and number=the number of items to recommend.
5. A feedback handler takes key=tags or key=topics, the name of the tag or topic, and the value to update it with.
6. Opening the URL in a browser shows that the recommendations match expectations.


## Details
For a more detailed introduction to the demo, see my blog:
http://blog.csdn.net/yw8355507/article/details/49206395

--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_Server_demo/UCB1.cpp:
--------------------------------------------------------------------------------
/*************************************************************************
    > File Name: UCB1.cpp
    > Author: YinWen
    > Mail: YinWenatBIT@163.com
    > Created Time: Tue 01 Sep 2015 08:32:35 PM CST
 ************************************************************************/

#include "UCB1.h"
#include
#include
#include
#include "json/json.h"
#include



using namespace std;

UCB1::UCB1():
    totalcount(0), default_count(0), default_value(0.0)
{

}

UCB1::UCB1(int init_totalcount, int init_count, double init_value):
    totalcount(init_totalcount), default_count(init_count), default_value(init_value)
{

}

UCB1::~UCB1()
{

}

string UCB1::toString()
{
    Json::FastWriter writer;
    Json::Value value;
    Json::Value element;
    std::string key;

    value["totalcount"] = Json::Value(totalcount);
    value["default_count"] = Json::Value(default_count);
    value["default_value"] = Json::Value(default_value);

    for(auto it = 
frequencyReward.begin(); it !=frequencyReward.end(); it++) 48 | { 49 | key = it->first; 50 | 51 | Json::Value temp_value; 52 | temp_value["counts"] = Json::Value(it->second.counts); 53 | temp_value["keyValue"] = Json::Value(it->second.values); 54 | element[key] = temp_value; 55 | } 56 | value["element"] = element; 57 | return writer.write(value); 58 | } 59 | 60 | 61 | bool UCB1::readFromString(const string& JString) 62 | { 63 | Json::Reader reader; 64 | Json::Value value; 65 | Json::Value element; 66 | std::string key; 67 | 68 | int readcount; 69 | double readdouble; 70 | 71 | if(!reader.parse(JString, value)) 72 | return false; 73 | 74 | element = value["element"]; 75 | auto keymember = element.getMemberNames(); 76 | int num = keymember.size(); 77 | 78 | for(int i=0; isecond.counts); 134 | countstr = temp; 135 | return countstr; 136 | } 137 | 138 | string UCB1::get_value(string key) 139 | { 140 | string valuestr; 141 | auto origin = frequencyReward.find(key); 142 | if(origin == frequencyReward.end()) 143 | return valuestr; 144 | 145 | char temp[25]; 146 | sprintf(temp, "%f", origin->second.values); 147 | valuestr = temp; 148 | return valuestr; 149 | } 150 | 151 | 152 | bool UCB1::update(const char * start, double res) 153 | { 154 | string key(start); 155 | return update(key, res); 156 | } 157 | 158 | 159 | 160 | bool UCB1::update(string& key, double res) 161 | { 162 | auto origin = frequencyReward.find(key); 163 | if(origin == frequencyReward.end()) 164 | { 165 | auto ret = frequencyReward.insert({ key, {default_count, default_value} }); 166 | return true; 167 | } 168 | 169 | 170 | double n = ++origin->second.counts; 171 | origin->second.values = origin->second.values*(n-1)/n + res/n; 172 | // ++this->totalcount; 173 | 174 | return true; 175 | } 176 | 177 | bool UCB1::update_reset_last(string & key, double res) 178 | { 179 | auto origin = frequencyReward.find(key); 180 | if(origin == frequencyReward.end()) 181 | { 182 | auto ret = frequencyReward.insert({ key, 
{default_count, default_value} }); 183 | return true; 184 | } 185 | 186 | 187 | double n = origin->second.counts; 188 | origin->second.values = (origin->second.values*n -default_value + res)/n; 189 | 190 | return true; 191 | 192 | } 193 | bool UCB1::update_reset_last(const char *start, double res) 194 | { 195 | string key(start); 196 | return update_reset_last(key, res); 197 | } 198 | 199 | string UCB1::select_arm() 200 | { 201 | string maxkey; 202 | 203 | double bonus; 204 | double maxvalue = 0.0; 205 | 206 | if(frequencyReward.empty()) 207 | throw empty_arm("no arm in the map"); 208 | 209 | for(auto it = frequencyReward.begin(); it != frequencyReward.end(); it++ ) 210 | { 211 | if(it->second.counts == 0) 212 | return it->first; 213 | 214 | bonus = sqrt(2* log((double)totalcount))/it->second.counts; 215 | if(maxvalue < bonus + it->second.values) 216 | { 217 | maxvalue = bonus + it->second.values; 218 | maxkey = it->first; 219 | } 220 | } 221 | 222 | return maxkey; 223 | } 224 | 225 | 226 | std::vector & UCB1::select_arm_N(size_t n) 227 | { 228 | keystrs.clear(); 229 | int countzero= 0; 230 | int count = 0; 231 | 232 | if(n ==0) 233 | return keystrs; 234 | if(n > frequencyReward.size()) 235 | n = frequencyReward.size(); 236 | 237 | if(frequencyReward.empty()) 238 | throw empty_arm("no arm in the map"); 239 | 240 | 241 | 242 | vector maxkey; 243 | maxkey.resize(n+1); 244 | 245 | double bonus; 246 | double valuenow; 247 | vector maxvalue; 248 | maxvalue.resize(n+1, 0.0); 249 | 250 | for(auto it = frequencyReward.begin(); it != frequencyReward.end() && countzero second.counts == 0) 253 | { 254 | keystrs.push_back(it->first); 255 | ++countzero; 256 | } 257 | else 258 | { 259 | bonus = sqrt(2* log((double)totalcount))/it->second.counts; 260 | valuenow = bonus +it->second.values; 261 | 262 | int i; 263 | if(count < n - countzero) 264 | i = count; 265 | else 266 | i = n - countzero; 267 | 268 | while(i>0) 269 | { 270 | if(valuenow > maxvalue[i-1]) 271 | { 272 | 
maxvalue[i] = maxvalue[i-1]; 273 | maxkey[i] = maxkey[i-1]; 274 | --i; 275 | } 276 | else 277 | break; 278 | 279 | } 280 | maxvalue[i] = valuenow; 281 | maxkey[i] = it->first; 282 | ++count; 283 | 284 | } 285 | } 286 | 287 | int remain = n - countzero; 288 | int i=0; 289 | while(i< remain) 290 | { 291 | keystrs.push_back(maxkey[i]); 292 | i++; 293 | } 294 | 295 | for(int i=0; iadd_totalcount(1); 299 | 300 | return keystrs; 301 | 302 | 303 | } 304 | -------------------------------------------------------------------------------- /UCBAlgorithm/UCB_Server_demo/UCB1.h: -------------------------------------------------------------------------------- 1 | /************************************************************************* 2 | > File Name: UCB1.h 3 | > Author: YinWen 4 | > Mail: YinWenatBIT@163.com 5 | > Created Time: Tue 01 Sep 2015 07:54:50 PM CST 6 | >Description:实现UCB1算法,提供一个UCB类 7 | ************************************************************************/ 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | #ifndef _UCB1_H 15 | #define _UCB1_H 16 | 17 | using namespace std; 18 | 19 | 20 | class empty_arm: public runtime_error 21 | { 22 | public: 23 | explicit empty_arm(const string &str): 24 | runtime_error(str) {} 25 | }; 26 | 27 | 28 | struct UCBNode 29 | { 30 | int counts; 31 | double values; 32 | }; 33 | 34 | typedef UCBNode UCB; 35 | 36 | class UCB1 37 | { 38 | public: 39 | UCB1(); 40 | UCB1(int init_totalcount, int init_count, double init_value); 41 | ~UCB1(); 42 | bool update(string & key, double res); 43 | bool update(const char *start, double res); 44 | 45 | bool update_reset_last(string & key, double res); 46 | bool update_reset_last(const char *start, double res); 47 | 48 | string toString(); 49 | bool readFromString(const string& JString); 50 | string select_arm(); 51 | std::vector & select_arm_N(size_t n); 52 | 53 | std::vector keystrs; 54 | bool insert(string & key, UCB value); 55 | bool insert(string &key); 56 | void 
set_totalcount(int number); 57 | void add_totalcount(int num); 58 | string get_totalcount(); 59 | string get_count(string key); 60 | string get_value(string key); 61 | private: 62 | 63 | std::unordered_map frequencyReward; 64 | int totalcount; 65 | int default_count; 66 | double default_value; 67 | }; 68 | 69 | 70 | 71 | #endif 72 | -------------------------------------------------------------------------------- /UCBAlgorithm/UCB_Server_demo/main.cpp: -------------------------------------------------------------------------------- 1 | /************************************************************************* 2 | > File Name: main.cpp 3 | > Author: YinWen 4 | > Mail: yinwenatbit@163.com 5 | > Created Time: Sun 06 Sep 2015 06:10:36 PM CST 6 | ************************************************************************/ 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include "json/json.h" 13 | #include 14 | #include 15 | #include "url_chinese.h" 16 | #include "UCB1.h" 17 | 18 | using namespace std; 19 | 20 | void addurl(unordered_map> &good_url, string &key, string &url); 21 | 22 | void initialize(UCB1 &tags_key_goods, UCB1 & topics_key_goods, unordered_map> &tag_jason, unordered_map> &topic_jason); 23 | 24 | void addlike(string &x, string &key, string &count_str, string & value_str, string &totalcount_str, string &jsonstr, string & output); 25 | 26 | int main() 27 | { 28 | UCB1 tags_key_goods(1, 10, 0.5), topics_key_goods(1, 10, 0.5); 29 | unordered_map> tag_jason; 30 | unordered_map> topic_jason; 31 | 32 | initialize(tags_key_goods, topics_key_goods, tag_jason, topic_jason); 33 | 34 | string key; 35 | string number; 36 | 37 | served::multiplexer mux; 38 | mux.use_after(served::plugin::access_log); 39 | 40 | const std::string image_name("served-logo.png"); 41 | 42 | mux.handle("/UCB/query") 43 | .get([&](served::response & res, const served::request & req) { 44 | 45 | 46 | // res.set_header(std::string("charset"), std::string("gb2312")); 47 | key = 
req.query["key"]; 48 | number = req.query["number"]; 49 | 50 | int chonum = atoi(number.c_str()); 51 | std::vector tohttp; 52 | string totalcount_str; 53 | 54 | if(key == "tags") 55 | { 56 | tohttp = tags_key_goods.select_arm_N(chonum); 57 | totalcount_str = tags_key_goods.get_totalcount(); 58 | } 59 | else if (key == "topics") 60 | { 61 | tohttp = topics_key_goods.select_arm_N(chonum); 62 | totalcount_str = topics_key_goods.get_totalcount(); 63 | 64 | } 65 | else 66 | cout<<"wrong key"<"; 77 | totaloutput +=""; 78 | for(auto x: tohttp) 79 | { 80 | vector allurl; 81 | if(key == "tags") 82 | { 83 | allurl = tag_jason[x]; 84 | count_str = tags_key_goods.get_count(x); 85 | value_str = tags_key_goods.get_value(x); 86 | } 87 | if(key == "topics") 88 | { 89 | allurl = topic_jason[x]; 90 | count_str = topics_key_goods.get_count(x); 91 | value_str = topics_key_goods.get_value(x); 92 | 93 | } 94 | int number = rand() % allurl.size(); 95 | select_good = allurl[number]; 96 | 97 | 98 | addlike(x, key, count_str, value_str, totalcount_str, select_good, totaloutput); 99 | } 100 | 101 | 102 | totaloutput +=""; 103 | res.set_body(totaloutput); 104 | }); 105 | 106 | string res_key; 107 | string res_target; 108 | string res_choose; 109 | 110 | mux.handle("/UCB/response") 111 | .get([&](served::response & res, const served::request & req) { 112 | 113 | res_key = req.query["key"]; 114 | res_choose = req.query["choose"]; 115 | double updatevalue = atof(res_choose.c_str()); 116 | res_target = Url2Str_gb2312(req.query["target"]); 117 | cout<<"target: "<> &tag_jason, unordered_map> &topic_jason) 144 | { 145 | 146 | ifstream data("20150924015910.attri"); 147 | string readdata; 148 | 149 | 150 | Json::Reader reader; 151 | Json::Value value; 152 | Json::Value tags; 153 | Json::Value topic; 154 | std::string key; 155 | std::string url; 156 | int readcount; 157 | double readfloat; 158 | 159 | Json::FastWriter writer; 160 | int counts =0; 161 | 162 | srand(time(0)); 163 | 164 | 
while(!data.eof()) 165 | { 166 | 167 | getline(data, readdata); 168 | 169 | if(!reader.parse(readdata, value)) 170 | break; 171 | 172 | topic = value["topics"]; 173 | if(!topic.empty()) 174 | { 175 | unsigned int j=0; 176 | key = topic[j].asString(); 177 | topics_key_goods.insert(key); 178 | addurl(topic_jason, key, readdata); 179 | } 180 | 181 | /*use tags as key*/ 182 | tags = value["tags"]; 183 | unsigned int num = tags.size(); 184 | for(unsigned int i=0; i< num; i++) 185 | { 186 | key = tags[i].asString(); 187 | 188 | tags_key_goods.insert(key); 189 | addurl(tag_jason, key, readdata); 190 | } 191 | 192 | } 193 | 194 | data.close(); 195 | 196 | 197 | 198 | } 199 | 200 | 201 | void addlike(string &x, string &key, string &count_str, string & value_str, string &totalcount_str, string &jsonstr, string & output) 202 | { 203 | Json::Reader reader; 204 | Json::Value value; 205 | Json::Value tags; 206 | std::string topic; 207 | std::string url; 208 | std::string imag; 209 | std::string good_name; 210 | 211 | unsigned int i = 0; 212 | 213 | reader.parse(jsonstr, value); 214 | 215 | url = value["goods_channel_origin_url"].asString(); 216 | tags = value["tags"]; 217 | topic = value["topics"][i].asString(); 218 | imag = value["image"][i].asString(); 219 | good_name = value["goods_name"].asString(); 220 | 221 | std::string likestring; 222 | likestring.reserve(100); 223 | 224 | 225 | 226 | likestring = ""; 231 | output += likestring; 232 | 233 | likestring = ""; 236 | output += likestring; 237 | 238 | 239 | likestring = ""; 250 | output += likestring; 251 | 252 | likestring = ""; 273 | output += likestring; 274 | 275 | likestring = ""; 282 | output += likestring; 283 | 284 | 285 | } 286 | 287 | 288 | void addurl(unordered_map> &good_url, string &key, string &url) 289 | { 290 | auto origin = good_url.find(key); 291 | if(origin == good_url.end()) 292 | { 293 | vector tag_url; 294 | tag_url.push_back(url); 295 | good_url.insert({key, tag_url}); 296 | } 297 | else 298 | { 299 | 
origin->second.push_back(url); 300 | } 301 | } 302 | -------------------------------------------------------------------------------- /UCBAlgorithm/UCB_Server_demo/url_chinese.cpp: -------------------------------------------------------------------------------- 1 | /************************************************************************* 2 | > File Name: url_chinese.cpp 3 | > Author: YinWen 4 | > Mail: yinwenatbit@163.com 5 | > Created Time: 2015年09月29日 星期二 15时59分31秒 6 | ************************************************************************/ 7 | 8 | #include "url_chinese.h" 9 | using namespace std; 10 | 11 | char Char2Int(char ch){ 12 | if(ch>='0' && ch<='9')return (char)(ch-'0'); 13 | if(ch>='a' && ch<='f')return (char)(ch-'a'+10); 14 | if(ch>='A' && ch<='F')return (char)(ch-'A'+10); 15 | return -1; 16 | } 17 | 18 | char Str2Bin(char *str){ 19 | char tempWord[2]; 20 | char chn; 21 | 22 | tempWord[0] = Char2Int(str[0]); //make the B to 11 -- 00001011 23 | tempWord[1] = Char2Int(str[1]); //make the 0 to 0 -- 00000000 24 | 25 | chn = (tempWord[0] << 4) | tempWord[1]; //to change the BO to 10110000 26 | 27 | return chn; 28 | } 29 | 30 | string UrlDecode(string str){ 31 | string output=""; 32 | char tmp[2]; 33 | int i=0,idx=0,ndx,len=str.length(); 34 | 35 | while(i File Name: url_chinese.h 3 | > Author: YinWen 4 | > Mail: yinwenatbit@163.com 5 | > Created Time: 2015年09月29日 星期二 15时58分08秒 6 | ************************************************************************/ 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #ifndef _URL_CHINESE_H 13 | #define _URL_CHINESE_H 14 | 15 | 16 | 17 | char Char2Int(char ch); 18 | char Str2Bin(char *str); 19 | std::string UrlDecode(std::string str); 20 | std::string Url2Str_Utf8(std::string instr); 21 | 22 | std::string Url2Str_gb2312(std::string str); 23 | 24 | class CodeConverter { 25 | private: 26 | iconv_t cd; 27 | public: 28 | CodeConverter(const char *from_charset,const char *to_charset) {// 构造 29 | cd = 
iconv_open(to_charset,from_charset);
30 |     }
31 |
32 |     ~CodeConverter() {// destructor
33 |         iconv_close(cd);
34 |     }
35 |
36 |     int convert(char *inbuf,int inlen,char *outbuf,int outlen) {// convert inbuf and write the result to outbuf
37 |         // iconv() takes size_t byte counts; casting an int* to size_t* reads past the int on 64-bit targets
38 |         size_t inleft = inlen;
39 |         size_t outleft = outlen;
40 |         memset(outbuf,0,outlen);
41 |         return iconv(cd,&inbuf,&inleft,&outbuf,&outleft);
42 |     }
43 | };
44 |
45 | #endif
46 |
47 |
--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_json_time/Makefile4:
--------------------------------------------------------------------------------
1 |
2 | C=gcc
3 | CXX=g++
4 | CFLAGS= -g -D_DEBUG -fPIC -Wshadow -Wcast-qual -Wcast-align -Wwrite-strings -Wsign-compare -Winvalid-pch -fms-extensions -Wall -MMD
5 | CPPFLAGS=$(CFLAGS) -Woverloaded-virtual -Wsign-promo -fno-gnu-keywords -std=c++11
6 |
7 | DEPS=-I/home/yw/jsoncpp-src-0.5.0/include -I/home/yw/github/served/src/ /home/yw/jsoncpp-src-0.5.0/libs/linux-gcc-4.8.3/libjson_linux-gcc-4.8.3_libmt.a
8 |
9 | LIBPATH=/home/yw/github/served/lib
10 |
11 | LIB=-lserved -lboost_system-mt -lstdc++
12 |
13 | #-lboost_system-mt -lstdc++
14 |
15 | OBJS=read_test.o UCB1.o
16 |
17 | TARGET=unit_read
18 |
19 |
20 | ALL: $(TARGET)
21 |
22 | $(TARGET):$(OBJS)
23 |     $(CXX) $(CPPFLAGS) -o $(TARGET) $(OBJS) $(DEPS)
24 |
25 |
26 | read_test.o:read_test.cpp
27 |     $(CXX) $(CPPFLAGS) -c $< $(DEPS)
28 |
29 | UCB1.o: UCB1.cpp
30 |     $(CXX) $(CPPFLAGS) -c $< $(DEPS)
31 |
32 |
--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_json_time/README.md:
--------------------------------------------------------------------------------
1 | #Testing the JSON parsing speed of the UCB class
2 |
3 | ###1. Motivation
4 | In production, the server has to keep one UCB object per client. Each object is saved as a JSON string through the toString method; when a recommendation request arrives, the JSON string is fetched from the database and parsed, and then the arms are selected (the recommendation). The goal is to finish all of this within 10 ms, so the speed bottleneck is expected to be the time spent parsing the JSON string.
5 |
6 | ###2. Demo features and implementation
7 |
8 | 1. The demo uses the gettimeofday function from <sys/time.h>, which reports time with microsecond resolution. It runs the JSON parse 100 times in a loop to estimate the average time per parse.
9 | 2. The number of arms in the UCB object varies, from 10,000 arms down to 100. Measuring the time for each arm count shows what upper limit on the number of arms can be set.
10 |
11 |
12 | ##Details
13 | More detailed load-time measurements are on my blog:
14 | http://blog.csdn.net/yw8355507/article/details/49220855
--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_json_time/UCB1.cpp:
--------------------------------------------------------------------------------
1 | /*************************************************************************
2 | > File Name: UCB1.cpp
3 | > Author: YinWen
4 | > Mail: YinWenatBIT@163.com
5 | > Created Time: Tue 01 Sep 2015 08:32:35 PM CST
6 | ************************************************************************/
7 |
8 | #include "UCB1.h"
9 | #include <cmath>
10 | #include <cstdio>
11 | #include <string>
12 | #include "json/json.h"
13 | #include <vector>
14 |
15 |
16 |
17 | using namespace std;
18 |
19 | UCB1::UCB1():
20 |     totalcount(0), default_count(0), default_value(0.0)
21 | {
22 |
23 | }
24 |
25 | UCB1::UCB1(int init_totalcount, int init_count, double init_value):
26 |     totalcount(init_totalcount), default_count(init_count), default_value(init_value)
27 | {
28 |
29 | }
30 |
31 | UCB1::~UCB1()
32 | {
33 |
34 | }
35 | // Serialize the global counters and every arm's statistics into one JSON string
36 | string UCB1::toString()
37 | {
38 |     Json::FastWriter writer;
39 |     Json::Value value;
40 |     Json::Value element;
41 |     std::string key;
42 |
43 |     value["totalcount"] = Json::Value(totalcount);
44 |     value["default_count"] = Json::Value(default_count);
45 |     value["default_value"] = Json::Value(default_value);
46 |
47 |     for(auto it = frequencyReward.begin(); it !=frequencyReward.end(); it++)
48 |     {
49 |         key = it->first;
50 |
51 |         Json::Value temp_value;
52 |         temp_value["counts"] = Json::Value(it->second.counts);
53 |         temp_value["keyValue"] = Json::Value(it->second.values);
54 |         element[key] = temp_value;
55 |     }
56 |     value["element"] = element;
57 |     return writer.write(value);
58 | }
59 |
60 | // Parse a JSON string back into this object; returns false if the string cannot be parsed
61 | bool UCB1::readFromString(const string& JString)
62 | {
63 |     Json::Reader reader;
64 |     Json::Value value;
65 |     Json::Value element;
66 |     std::string key;
67 |
68 |
int readcount; 69 | double readdouble; 70 | 71 | if(!reader.parse(JString, value)) 72 | return false; 73 | 74 | element = value["element"]; 75 | auto keymember = element.getMemberNames(); 76 | int num = keymember.size(); 77 | 78 | for(int i=0; isecond.counts); 134 | countstr = temp; 135 | return countstr; 136 | } 137 | 138 | string UCB1::get_value(string key) 139 | { 140 | string valuestr; 141 | auto origin = frequencyReward.find(key); 142 | if(origin == frequencyReward.end()) 143 | return valuestr; 144 | 145 | char temp[25]; 146 | sprintf(temp, "%f", origin->second.values); 147 | valuestr = temp; 148 | return valuestr; 149 | } 150 | 151 | 152 | bool UCB1::update(const char * start, double res) 153 | { 154 | string key(start); 155 | return update(key, res); 156 | } 157 | 158 | 159 | 160 | bool UCB1::update(string& key, double res) 161 | { 162 | auto origin = frequencyReward.find(key); 163 | if(origin == frequencyReward.end()) 164 | { 165 | auto ret = frequencyReward.insert({ key, {default_count, default_value} }); 166 | return true; 167 | } 168 | 169 | 170 | double n = ++origin->second.counts; 171 | origin->second.values = origin->second.values*(n-1)/n + res/n; 172 | // ++this->totalcount; 173 | 174 | return true; 175 | } 176 | 177 | bool UCB1::update_reset_last(string & key, double res) 178 | { 179 | auto origin = frequencyReward.find(key); 180 | if(origin == frequencyReward.end()) 181 | { 182 | auto ret = frequencyReward.insert({ key, {default_count, default_value} }); 183 | return true; 184 | } 185 | 186 | 187 | double n = origin->second.counts; 188 | origin->second.values = (origin->second.values*n -default_value + res)/n; 189 | 190 | return true; 191 | 192 | } 193 | bool UCB1::update_reset_last(const char *start, double res) 194 | { 195 | string key(start); 196 | return update_reset_last(key, res); 197 | } 198 | 199 | string UCB1::select_arm() 200 | { 201 | string maxkey; 202 | 203 | double bonus; 204 | double maxvalue = 0.0; 205 | 206 | 
if(frequencyReward.empty()) 207 | throw empty_arm("no arm in the map"); 208 | 209 | for(auto it = frequencyReward.begin(); it != frequencyReward.end(); it++ ) 210 | { 211 | if(it->second.counts == 0) 212 | return it->first; 213 | 214 | bonus = sqrt(2* log((double)totalcount))/it->second.counts; 215 | if(maxvalue < bonus + it->second.values) 216 | { 217 | maxvalue = bonus + it->second.values; 218 | maxkey = it->first; 219 | } 220 | } 221 | 222 | return maxkey; 223 | } 224 | 225 | 226 | std::vector & UCB1::select_arm_N(size_t n) 227 | { 228 | keystrs.clear(); 229 | int countzero= 0; 230 | int count = 0; 231 | 232 | if(n ==0) 233 | return keystrs; 234 | if(n > frequencyReward.size()) 235 | n = frequencyReward.size(); 236 | 237 | if(frequencyReward.empty()) 238 | throw empty_arm("no arm in the map"); 239 | 240 | 241 | 242 | vector maxkey; 243 | maxkey.resize(n+1); 244 | 245 | double bonus; 246 | double valuenow; 247 | vector maxvalue; 248 | maxvalue.resize(n+1, 0.0); 249 | 250 | for(auto it = frequencyReward.begin(); it != frequencyReward.end() && countzero second.counts == 0) 253 | { 254 | keystrs.push_back(it->first); 255 | ++countzero; 256 | } 257 | else 258 | { 259 | bonus = sqrt(2* log((double)totalcount))/it->second.counts; 260 | valuenow = bonus +it->second.values; 261 | 262 | int i; 263 | if(count < n - countzero) 264 | i = count; 265 | else 266 | i = n - countzero; 267 | 268 | while(i>0) 269 | { 270 | if(valuenow > maxvalue[i-1]) 271 | { 272 | maxvalue[i] = maxvalue[i-1]; 273 | maxkey[i] = maxkey[i-1]; 274 | --i; 275 | } 276 | else 277 | break; 278 | 279 | } 280 | maxvalue[i] = valuenow; 281 | maxkey[i] = it->first; 282 | ++count; 283 | 284 | } 285 | } 286 | 287 | int remain = n - countzero; 288 | int i=0; 289 | while(i< remain) 290 | { 291 | keystrs.push_back(maxkey[i]); 292 | i++; 293 | } 294 | 295 | for(int i=0; iadd_totalcount(1); 299 | 300 | return keystrs; 301 | 302 | 303 | } 304 | 
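For comparison with `select_arm` above, here is a minimal sketch of the textbook UCB1 index, in which the pull count sits inside the square root: mean + sqrt(2 ln(total) / pulls). `select_arm` above instead divides the square root by the count, so the two give different bonuses. The names `ArmStats`, `ucb1_index` and `pick_arm` are hypothetical and are not part of this repository; `ArmStats` only mirrors the `counts`/`values` fields of `UCBNode`.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for UCBNode: number of pulls and mean reward of one arm.
struct ArmStats {
    int counts;
    double values;
};

// Textbook UCB1 index: mean reward plus sqrt(2 * ln(total_pulls) / pulls of this arm).
double ucb1_index(const ArmStats &arm, int total_pulls) {
    assert(arm.counts > 0 && total_pulls > 0);
    double bonus = std::sqrt(2.0 * std::log(static_cast<double>(total_pulls)) / arm.counts);
    return arm.values + bonus;
}

// Returns the key of the arm to play next. An arm that has never been pulled
// is returned as soon as it is seen; otherwise the arm with the largest index wins.
// Returns an empty string when there are no arms (an assumption of this sketch).
std::string pick_arm(const std::unordered_map<std::string, ArmStats> &arms, int total_pulls) {
    std::string best_key;
    double best_index = 0.0;
    bool have_best = false;
    for (const auto &entry : arms) {
        if (entry.second.counts == 0)
            return entry.first;
        double index = ucb1_index(entry.second, total_pulls);
        if (!have_best || index > best_index) {
            best_index = index;
            best_key = entry.first;
            have_best = true;
        }
    }
    return best_key;
}
```

Tracking `have_best` separately, rather than starting the maximum at 0.0, keeps the selection correct even if every index were negative, which can happen when rewards are allowed to be negative.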
--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_json_time/UCB1.h:
--------------------------------------------------------------------------------
1 | /*************************************************************************
2 | > File Name: UCB1.h
3 | > Author: YinWen
4 | > Mail: YinWenatBIT@163.com
5 | > Created Time: Tue 01 Sep 2015 07:54:50 PM CST
6 | >Description: implements the UCB1 algorithm and provides a UCB class
7 | ************************************************************************/
8 |
9 | #include <string>
10 | #include <vector>
11 | #include <unordered_map>
12 | #include <stdexcept>
13 |
14 | #ifndef _UCB1_H
15 | #define _UCB1_H
16 |
17 | using namespace std;
18 |
19 | // Thrown by select_arm / select_arm_N when no arm has been inserted
20 | class empty_arm: public runtime_error
21 | {
22 | public:
23 |     explicit empty_arm(const string &str):
24 |         runtime_error(str) {}
25 | };
26 |
27 | // Per-arm statistics: number of pulls and running mean reward
28 | struct UCBNode
29 | {
30 |     int counts;
31 |     double values;
32 | };
33 |
34 | typedef UCBNode UCB;
35 |
36 | class UCB1
37 | {
38 | public:
39 |     UCB1();
40 |     UCB1(int init_totalcount, int init_count, double init_value);
41 |     ~UCB1();
42 |     bool update(string & key, double res);
43 |     bool update(const char *start, double res);
44 |
45 |     bool update_reset_last(string & key, double res);
46 |     bool update_reset_last(const char *start, double res);
47 |
48 |     string toString();
49 |     bool readFromString(const string& JString);
50 |     string select_arm();
51 |     std::vector<string> & select_arm_N(size_t n);
52 |
53 |     std::vector<string> keystrs;
54 |     bool insert(string & key, UCB value);
55 |     bool insert(string &key);
56 |     void set_totalcount(int number);
57 |     void add_totalcount(int num);
58 |     string get_totalcount();
59 |     string get_count(string key);
60 |     string get_value(string key);
61 | private:
62 |
63 |     std::unordered_map<string, UCB> frequencyReward;
64 |     int totalcount;
65 |     int default_count;
66 |     double default_value;
67 | };
68 |
69 |
70 |
71 | #endif
72 |
--------------------------------------------------------------------------------
/UCBAlgorithm/UCB_json_time/read_test.cpp:
-------------------------------------------------------------------------------- 1 | /************************************************************************* 2 | > File Name: read_test.cpp 3 | > Author: YinWen 4 | > Mail: yinwenatbit@163.com 5 | > Created Time: 2015年10月18日 星期日 18时55分55秒 6 | ************************************************************************/ 7 | 8 | #include "json/json.h" 9 | #include 10 | #include 11 | #include 12 | #include "UCB1.h" 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | 19 | using namespace std; 20 | void addurl(unordered_map> &good_url, string &key, string &url); 21 | void initialize(UCB1 &tags_key_goods, UCB1 & topics_key_goods, unordered_map> &tag_jason, unordered_map> &topic_jason); 22 | 23 | 24 | int main() 25 | { 26 | 27 | std::string tags_str, topics_str; 28 | 29 | ifstream tags_json_file("./topics_and_tags/tags_json_file_0.8w.txt"); 30 | ifstream topics_json_file("./topics_and_tags/topics_json_file_0.8w.txt"); 31 | 32 | if(!tags_json_file.is_open()) 33 | cout<<"cannot open tags_json_file"<>topics_str; 41 | 42 | 43 | tags_json_file>>tags_str; 44 | 45 | struct timeval timebefore, timenow; 46 | long time_tags = 0, time_topics = 0; 47 | 48 | for(int i =0; i<100; i++) 49 | { 50 | UCB1 tags_key_goods, topics_key_goods; 51 | 52 | gettimeofday(&timebefore, NULL); 53 | topics_key_goods.readFromString(topics_str); 54 | // topics_key_goods.select_arm_N(200); 55 | gettimeofday(&timenow, NULL); 56 | 57 | time_topics += (timenow.tv_sec - timebefore.tv_sec)*1000000 +timenow.tv_usec - timebefore.tv_usec; 58 | 59 | 60 | 61 | gettimeofday(&timebefore, NULL); 62 | tags_key_goods.readFromString(tags_str); 63 | // tags_key_goods.select_arm_N(200); 64 | gettimeofday(&timenow, NULL); 65 | 66 | time_tags += (timenow.tv_sec - timebefore.tv_sec)*1000000 +timenow.tv_usec - timebefore.tv_usec; 67 | } 68 | 69 | cout<<"topics_time in ms: "<).+?(?=<)",all_the_text) 18 | listName = listName[0] 19 | listId 
=re.findall(r"(?<=/p/).+?(?=/)" ,wget_url) 20 | link_list =re.findall(r"(?<=
Product name Image Attributes Choice UCB parameters
"; 229 | likestring +=good_name; 230 | likestring +=" "; 240 | 241 | unsigned int num = tags.size(); 242 | for(i=0; i< num; i++) 243 | { 244 | likestring += tags[i].asString(); 245 | likestring += "
"; 246 | } 247 | 248 | likestring += topic; 249 | likestring += "
like
"; 257 | output += likestring; 258 | 259 | 260 | likestring = " soso
"; 265 | output += likestring; 266 | 267 | 268 | likestring = " dislike
count:"; 276 | likestring += count_str; 277 | likestring += "
value:"; 278 | likestring += value_str; 279 | likestring +="
totalcount:"; 280 | likestring +=totalcount_str; 281 | likestring +="