├── README.md
├── business
    ├── assessor.go
    ├── consumer.go
    ├── producer.go
    ├── proxy.go
    ├── service.go
    └── settings.go
├── config
    ├── README.md
    └── config.yml
├── dao
    ├── dao.go
    └── proxy.go
├── docs
    └── screenshot
        └── directory structure.png
├── go.mod
├── go.sum
├── main.go
├── model
    ├── config.go
    ├── database.go
    ├── httpbin.go
    ├── model.go
    └── source.go
├── source
    ├── nimadaili.com.http.yml
    ├── nimadaili.com.https.yml
    ├── xiladaili.com.http.yml
    └── xiladaili.com.https.yml
└── std
    ├── mysql.go
    ├── sqlite.go
    └── std.go


/README.md:
--------------------------------------------------------------------------------
# golang-proxy `v3.1`

![golang-proxy](https://img.shields.io/teamcity/codebetter/bt428.svg)
[![download](https://img.shields.io/eclipse-marketplace/dt/notepad4e.svg)](https://github.com/storyicon/golang-proxy)

> The logic that decides whether a proxy is highly anonymous has been changed. The earlier logic left the pool without any usable proxies. Now you can enjoy it. 😏

- [English Document](#english-document)
  - [1. Feature](#1-feature)
  - [2. How to use](#2-how-to-use)
    - [API interface](#api-interface)
  - [3. Advanced](#3-advanced)
    - [two `data tables`](#two-data-tables)
      - [1. Table Crude Proxy](#1-table-crude-proxy)
      - [2. Table Proxy](#2-table-proxy)
    - [one `configuration file`](#one-configuration-file)
    - [one `source folder`](#one-source-folder)
    - [four `modules`](#four-modules)
  - [Request for comments](#request-for-comments)
- [Chinese Document](#chinese-document)
  - [What's new in `v3.0`](#whats-new-in-v30)
  - [How to use `golang-proxy`](#how-to-use-golang-proxy)
    - [1. Use the out-of-the-box release](#1-use-the-out-of-the-box-release)
      - [API example: `localhost:9999/sql`](#api-example-localhost9999sql)
    - [2. Build from source](#2-build-from-source)
  - [Why use Golang-Proxy](#why-use-golang-proxy)
  - [How to configure a new source](#how-to-configure-a-new-source)
  - [Feedback](#feedback)

![golang-proxy](https://raw.githubusercontent.com/parnurzeal/gorequest/gh-pages/images/Gopher_GoRequest_400x300.jpg)

# English Document

Golang-proxy is an efficient crawler for free proxies. It keeps only proxies that are highly anonymous and continuously checks their quality. You can use the captured proxies to download network resources while keeping your own identity private.

## 1. Feature

- Very fast crawling: the crawler can download 1000 pages per second.
- Customizable crawl sources, defined in a very simple configuration file.
- Ships as a compiled binary with a built-in SQLite database, and also supports MySQL.
- Comes with an HTTP API, so every function is available with one click.
- A proxy evaluation system keeps the quality of the proxy pool high.

## 2. How to use

`golang-proxy` provides compiled binaries, so you do not need `golang` on the machine. Download the archive that matches your system from the [Release Page](https://github.com/storyicon/golang-proxy/releases/), unzip it and run it. After a few minutes, open `localhost:9999/all` in a browser to see the crawled proxies.

Before going into the details of golang-proxy, here is the most useful information first.
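The same results can also be read programmatically instead of in a browser. The following is a minimal sketch, not part of golang-proxy itself: it assumes the service is listening on `localhost:9999`, that the `/sql` endpoint described under "API interface" returns the `{"error": ..., "message": [...]}` envelope shown in this README, and that the JSON field names match the columns of the `proxy` table. The names `proxyRecord`, `apiResponse`, `decodeProxies` and `fetchProxies` are hypothetical and exist only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// proxyRecord mirrors a subset of the columns of the `proxy` table (assumed field names).
type proxyRecord struct {
	IP         string  `json:"ip"`
	Port       string  `json:"port"`
	SchemeType int     `json:"scheme_type"`
	Score      float64 `json:"score"`
}

// apiResponse mirrors the {"error": ..., "message": ...} envelope of the HTTP API.
type apiResponse struct {
	Error   string        `json:"error"`
	Message []proxyRecord `json:"message"`
}

// decodeProxies parses a raw API response body and reports a service-side error, if any.
func decodeProxies(body []byte) ([]proxyRecord, error) {
	var resp apiResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Error != "" {
		return nil, fmt.Errorf("service error: %s", resp.Error)
	}
	return resp.Message, nil
}

// fetchProxies sends a URL-escaped SQL query to the /sql endpoint and decodes the result.
func fetchProxies(baseURL, query string) ([]proxyRecord, error) {
	res, err := http.Get(baseURL + "/sql?query=" + url.QueryEscape(query))
	if err != nil {
		return nil, err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return nil, err
	}
	return decodeProxies(body)
}

func main() {
	proxies, err := fetchProxies("http://localhost:9999", "SELECT * FROM proxy WHERE score > 5 ORDER BY score DESC")
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	for _, p := range proxies {
		fmt.Printf("%s:%s scheme=%d score=%.2f\n", p.IP, p.Port, p.SchemeType, p.Score)
	}
}
```

Escaping the query with `url.QueryEscape` keeps spaces and operators such as `>` intact in the request URL; decoding is kept in its own function so it can be exercised without a running service.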

### API interface

After you start the binary, you can open the following URLs in a browser to get proxies.

| url | description |
| --- | --- |
| `localhost:9999/all` | Get all highly available proxies |
| `localhost:9999/all?table=proxy` | Get all highly available proxies |
| `localhost:9999/random` | Randomly get one highly available proxy |
| `localhost:9999/all?table=crude_proxy` | Get the proxies in the temporary table (their quality cannot be guaranteed) |
| `localhost:9999/random?table=crude_proxy` | Randomly get one proxy from the temporary table (its quality cannot be guaranteed) |
| `localhost:9999/sql?query=` | Write the SQL statement you want to execute after `query=` to define your own filter rules |

With the above you can already use about half of what `golang-proxy` offers. The last endpoint, however, executes custom SQL statements, and for that you need to know at least the structure of the tables. The next section describes them.

## 3. Advanced

golang-proxy consists of the following parts:

- two `data tables`
- one `configuration file`
- one `source folder`
- four `modules`

### two `data tables`

#### 1. Table Crude Proxy

Temporary proxies are stored in the table `crude_proxy`, which is defined as follows.

| field | type | example | description |
| --- | --- | --- | --- |
| id | int | - | - |
| ip | string | 192.168.0.1 | - |
| port | string | 255 | - |
| content | string | 192.168.0.1:255 | - |
| insert_time | int | 1540798717 | - |
| update_time | int | 1540798717 | - |

The table `crude_proxy` stores proxies exactly as they were crawled, so their quality cannot be guaranteed.

#### 2. Table Proxy

When a proxy in the `crude_proxy` table passes `pre assess` (which roughly verifies that the proxy is available and tests its support for `https` and `http`), it enters the `proxy` table.

| field | type | example | description |
| --- | --- | --- | --- |
| id | int | - | - |
| ip | string | 192.168.0.1 | - |
| port | string | 255 | - |
| scheme_type | int | 2 | How far the proxy supports http and https: `0` http only, `1` https only, `2` https & http |
| content | string | 192.168.0.1:255 | - |
| assess_times | int | 5 | Number of times the proxy has been evaluated |
| success_times | int | 5 | Number of times the proxy passed the evaluation |
| avg_response_time | float | 0.001 | - |
| continuous_failed_times | int | 0 | Number of consecutive failures during evaluation |
| score | float | 25 | The higher the better |
| insert_time | int | 1540798717 | - |
| update_time | int | 1540798717 | - |

Proxies in the `proxy` table are evaluated periodically and their scores are updated. Proxies whose quality drops too low are deleted.

### one `configuration file`

For convenience, golang-proxy stores its proxies in the portable SQLite database by default.
You can make `golang-proxy` use a MySQL database instead by adding a `config.yml` file to the directory that contains the executable.

For details, see the [Config](https://github.com/storyicon/golang-proxy/tree/master/config) page.

### one `source folder`

golang-proxy needs `source` definitions that describe what to crawl and how. The run directory of golang-proxy must therefore contain a `source` folder holding at least one source in `yml` format.
A source is defined as follows:

```yml
page:
  entry: "http://www.xxx.com/http/?page=1"
  template: "http://www.xxx.com/http/?page={page}"
  from: 1
  to: 2000
selector:
  iterator: ".list .item"
  ip: ".ip"
  port: ".port"
category:
  parallelnumber: 3
  delayRange: [10, 30]
  interval: "@every 10m"
debug: true
```

With the definition above, `producer` first crawls the entry page, then crawls:

```
http://www.xxx.com/http/?page=1
http://www.xxx.com/http/?page=2
http://www.xxx.com/http/?page=3
...
http://www.xxx.com/http/?page=2000
```

This source definition expects pages in the following format:

```html
<html>
  ...
  <div class="list">
    <div class="item">
      <div class="ip">127.0.0.1</div>
      <div class="port">80</div>
      ...
    </div>
    <div class="item">
      <div class="ip">125.4.0.1</div>
      <div class="port">8080</div>
      ...
    </div>
    ...
  </div>
  ...
</html>
```

When `producer` parses a page, it first iterates over the nodes matched by `iterator`, then reads the elements matched by the `ip` and `port` selectors inside each node. The source definition above also works for the following HTML structure.

```html
<html>
  ...
  <div class="list">
    <div class="item">
      <div class="ip">127.0.0.1:80</div>
    </div>
    <div class="item">
      <div class="ip">125.4.0.1:8080</div>
    </div>
    ...
  </div>
  ...
</html>
```

This works because `producer` first tries to read both the IP and the port from the text matched by the `ip` selector, and falls back to the `port` selector only when that text contains no port.

Once a source is saved in the `source` folder in `yml` format, its definition is complete: golang-proxy reads it the next time it starts and adds it to the crawl list.

> If a source file name starts with a `.`, the source will not be read.

### four `modules`

golang-proxy consists of four modules that cooperate to do its work.

| module name | description |
| --- | --- |
| producer | Periodically fetches the sources defined in the `source` directory and writes the fetched proxies to the `crude_proxy` table. |
| consumer | Periodically reads a number of proxies from `crude_proxy`, determines their scheme type and availability, and writes them to the `proxy` table. |
| assessor | Periodically reads a number of proxies from the `proxy` table and evaluates their quality. |
| service | Serves the HTTP API of `golang-proxy`, which lets you filter and fetch the proxies in the `crude_proxy` and `proxy` tables through `localhost:9999/all`, `localhost:9999/random`, and `localhost:9999/sql`. |

Starting the golang-proxy executable starts these modules in turn. To start only one module, add the `-mode` startup parameter after the executable name.
For example:

```bash
golang-proxy -mode=service
```

This starts only the HTTP API service.

At this point you know about 95% of what golang-proxy can do. To learn more, read the source code, and feel free to improve it.

## Request for comments

Issues are welcome.
If golang-proxy helps you, please consider giving it a star or watching the repository. Thanks!

# Chinese Document

Golang-Proxy is a simple and efficient tool for collecting free proxies. It crawls the free proxies published on the web and maintains your own pool of highly anonymous proxies for web crawlers, resource downloads, and similar uses.

## What's new in `v3.0`

1. Still provides a highly flexible **API**: after starting the main program, open `localhost:9999/all` or `localhost:9999/random` in a browser to get the captured proxies directly. You can even use `localhost:9999/sql?query=` to run simple SQL statements and define your own filter rules.
2. Still provides **out-of-the-box builds** for `Windows`, `Linux`, and `Mac`.
   [Download Release v3.0](https://github.com/storyicon/golang-proxy/releases/)
3. Detects the proxy type automatically; `schemeType` tells you how far a proxy supports `http` and `https`.
4. Supports MySQL; see [Config](https://github.com/storyicon/golang-proxy/tree/master/config) for details.
5. Supports starting a single service: pass `-mode=` when launching the compiled binary to start only `producer`/`consumer`/`assessor`/`service`.
6. The data tables were redesigned. Note that this means the `API` has changed.
7. The data structure of a `source` was redesigned and fields such as `filter` were removed. Note that `v2.0` sources may cause problems when used directly with `v3.0`.
8. Some `sources` were updated.
9. The `-source` startup parameter is no longer supported.

## How to use `golang-proxy`

### 1. Use the out-of-the-box release

The [Release page](https://github.com/storyicon/golang-proxy/releases/) provides archives for each system environment; unzip one and run it.

Download link for the out-of-the-box release: [Download Release v3.0](https://github.com/storyicon/golang-proxy/releases/)

After downloading, extract the binary and the `source` directory from the archive to the same location and start the binary. The program starts the following services:

1. `producer`: periodically crawls the sources defined in the `source` directory and writes the captured proxies to the `crude_proxy` table.
2. `consumer`: periodically reads a number of proxies from `crude_proxy`, determines their proxy type and availability, and writes them to the `proxy` table.
3. `assessor`: periodically reads a number of proxies from the `proxy` table and evaluates their quality.
4. `service`: the HTTP API of `golang-proxy`, which lets you filter and fetch the proxies in the `crude_proxy` and `proxy` tables through `localhost:9999/all`, `localhost:9999/random`, and `localhost:9999/sql?query=`.

When you start the compiled binary, these services start in turn by default. Since `v3.0` you can add the `-mode` startup parameter to start a single service, for example:

```
golang-proxy -mode=service
```

This starts only the `service` service. Once it is running, you can open the following URLs in a browser to get proxies:

| url | description |
| --- | --- |
| `localhost:9999/all` | Get all captured proxies in the `proxy` table |
| `localhost:9999/all?table=proxy` | Get all captured proxies in the `proxy` table |
| `localhost:9999/all?table=crude_proxy` | Get all captured proxies in the `crude_proxy` table |
| `localhost:9999/random` | Get one random proxy from the `proxy` table |
| `localhost:9999/random?table=proxy` | Get one random proxy from the `proxy` table |
| `localhost:9999/random?table=crude_proxy` | Get one random proxy from the `crude_proxy` table |
| `localhost:9999/sql?query=` | Append an `SQL` statement after `query=` to get its result; only fairly simple queries are supported |

Note that `crude_proxy` is only temporary storage for captured proxies, so their quality cannot be guaranteed. Proxies in the `proxy` table are evaluated continuously by `assessor`; the `score` field of the `proxy` table reflects the quality of a proxy fairly completely, and proxies of low quality are deleted.

#### API example: `localhost:9999/sql`

For example, visiting `localhost:9999/sql?query=SELECT * FROM PROXY WHERE SCORE > 5 ORDER BY SCORE DESC` returns all proxies in the `proxy` table with a score above 5, ordered from the highest score to the lowest.

```json
{
    "error": "",
    "message": [
        {
            "id": 2,
            "ip": "45.113.69.177",
            "port": "1080",
            // scheme_type takes one of the following values:
            // 0: the proxy supports http only
            // 1: the proxy supports https only
            // 2: the proxy supports both http and https
            "scheme_type": 0,
            "content": "45.113.69.177:1080",
            // number of evaluations
            "assess_times": 9,
            // number of successful evaluations; success_times/assess_times gives the connection success rate
            "success_times": 9,
            // average response time
            "avg_response_time": 0.098,
            // number of consecutive failures
            "continuous_failed_times": 0,
            // score; proxies scoring above 5 are recommended
            "score": 68.45106053570785,
            "insert_time": 1540793312,
            "update_time": 1540797880
        }
    ]
}
```

### 2. Build from source

```bash
go get -u github.com/storyicon/golang-proxy
```

Enter the `golang-proxy` directory, run `go build main.go`, and run the resulting binary.

**Note:**

The `./source` folder in the project root is required at run time; it stores the source definitions for the various websites. All other folders contain only project source code. After compiling, you can therefore move the resulting `main` binary together with the `source` folder to any location, and you can rename `main` freely.

## Why use Golang-Proxy

1. Stable and fast.
   The crawling module can reach **1000 pages per second on a single core**.
2. Highly configurable and extensible.
   You do not need to write any code; filling in a configuration file takes **a minute or two** and adds a new website source.
3. Evaluation.
   The Assessor module tests proxy quality periodically and computes an overall score from independent impact factors such as **test success rate, anonymity, number of tests, mutation, and response speed**. The algorithm is highly configurable, and the weight of each factor can be adjusted independently to suit your project.
4. A highly flexible **API**: after starting the main program, open `localhost:9999/all` or `localhost:9999/random` in a browser to get the captured proxies directly. You can even use `localhost:9999/sql?query=` to run SQL statements and define your own filter rules.
5. No dependency on any server-based database: download it and use it out of the box.

## How to configure a new source

Every yml file under `./source/` is a **source**. You can add sources, make the program ignore a source by prefixing its file name with a **`.`**, or delete the file to remove a source for good. The source parameters are described below:

```yml
# Page options
page:
  entry: "https://xxx/1.html"
  template: "https://xxx/{page}.html"
  from: 2
  to: 10
# producer first crawls entry, i.e. https://xxx/1.html
# then, following template, from and to, it crawls in order:
#   https://xxx/2.html
#   https://xxx/3.html
#   https://xxx/4.html
#   ...
#   https://xxx/10.html
```

```yml
# Selector options
selector:
  iterator: ".table tbody tr"
  ip: "td:nth-child(1)"
  port: "td:nth-child(2)"
# The configuration above crawls the following HTML structure
# <table class="table">
#   <tbody>
#     <tr>
#       <td>187.3.0.1</td>
#       <td>8080</td>
#       <td>HTTP</td>
#     </tr>
#     <tr>
#       <td>164.23.1.2</td>
#       <td>80</td>
#       <td>HTTPS</td>
#     </tr>
#     <tr>
#       <td>131.9.2.3</td>
#       <td>8080</td>
#       <td>HTTP</td>
#     </tr>
#   </tbody>
# </table>
# Selectors are ordinary jQuery-style selectors. iterator selects the repeated element, for example the rows of a table with one proxy per row; ip and port are then looked up among the child elements of each iterator match.
```

```yml
category:
  # number of parallel requests
  parallelnumber: 1
  # for this source, after each crawled page
  # wait a random 5~20s before crawling the next page
  delayRange: [5, 20]
  # how often this source is run
  # @every 10s , @every 10h...
  interval: "@every 10m"
debug: true
```

## Feedback

1. Open an `issue` for any problem you run into.
2. If you find a new source that works well, you are welcome to share it.
3. Since you are here, leave a Star before you go : )

--------------------------------------------------------------------------------
/business/assessor.go:
--------------------------------------------------------------------------------
package business

import (
	"math"
	"time"

	"github.com/robfig/cron"
	log "github.com/sirupsen/logrus"
	"github.com/storyicon/golang-proxy/dao"
	"github.com/storyicon/golang-proxy/model"
)

var (
	// AssessorStackLength stores the number of proxies currently being evaluated in memory.
	// NOTE: it is updated from several goroutines without synchronization.
	AssessorStackLength = 0
)

// StartAssessor starts the evaluation procedure.
func StartAssessor() {
	scheduler := cron.New()
	scheduler.AddFunc("@every 3s", func() {
		// Only load more proxies while the in-memory stack has free capacity.
		if AssessorStackLength < AssessorStackCapacity {
			proxies := dao.GetProxy(AssessorInterval, AssessorPerExtract)
			AssessorStackLength += len(proxies)
			for _, proxy := range proxies {
				go Assess(proxy)
			}
		}
	})
	log.Infoln("Start Assessor")
	scheduler.Start()
}

// Assess is used to evaluate a proxy.
func Assess(proxy *model.Proxy) {
	schemeTest := HTTP
	timestamp := time.Now().UnixNano() / 1e6
	switch proxy.SchemeType {
	case typeHTTP:
		schemeTest = HTTP
	case typeHTTPS:
		schemeTest = HTTPS
	case typeBOTH:
		// Pick the scheme from the parity of the current millisecond timestamp.
		if timestamp%2 == 0 {
			schemeTest = HTTPS
		}
	default:
		log.Errorf("Unknown proxy scheme type: %d", proxy.SchemeType)
	}

	testOK := HTTPBinTester(proxy.IP, proxy.Port, schemeTest)
	AssessorStackLength--
	// Elapsed time in seconds.
	timeCost := float64(time.Now().UnixNano()/1e6-timestamp) / 1e3
	feedBack(proxy, testOK, timeCost)
}

// feedBack records the result of one evaluation and updates the running statistics.
func feedBack(proxy *model.Proxy, isOK bool, timeCost float64) {
	proxy.AssessTimes++
	assessTimes := float64(proxy.AssessTimes)
	// Incrementally update the running average of the response time.
	proxy.AvgResponseTime = (proxy.AvgResponseTime*(assessTimes-1.0) + timeCost) / assessTimes
	if isOK {
		proxy.ContinuousFailedTimes = 0
		proxy.SuccessTimes++
		log.Infof("[Assessor]Proxy %s assess pass(%gs)", proxy.Content, timeCost)
	} else {
		proxy.ContinuousFailedTimes++
		log.Warnf("[Assessor]Proxy %s Assess Failed", proxy.Content)
	}
	proxy.UpdateTime = time.Now().Unix()
	proxy.Score = GetScore(proxy)
	UpdateProxy(proxy)
}

// UpdateProxy is used to update the evaluation information to the database.
// Proxies whose success rate falls below AssessorAllowSuccessRateMin are deleted.
func UpdateProxy(proxy *model.Proxy) {
	session := dao.GetDatabase()
	successRate := float64(proxy.SuccessTimes) / float64(proxy.AssessTimes)
	if successRate < AssessorAllowSuccessRateMin {
		log.Warnf("[Assessor]Proxy %s Deleted: success rate too low", proxy.Content)
		session.Delete(proxy)
	} else {
		session.Save(proxy)
	}
}

// GetScore evaluates the score of a Proxy.
// It combines 4 impact factors, namely AssessTimes, SuccessTimes, Speed, Mutation.
// Consecutive failures shrink the Mutation factor quadratically, so the Score drops sharply.
// The formula depends on SuccessTimes and AssessTimes at the same time:
//   score = success * speed * mutation / sqrt(times)
func GetScore(p *model.Proxy) float64 {
	times := float64(p.AssessTimes)
	success := float64(p.SuccessTimes)
	speed := math.Sqrt(float64(RequestTimeout)) / p.AvgResponseTime
	mutation := 1 / math.Pow(float64(p.ContinuousFailedTimes)+1, 2.0)
	return success * speed * mutation / math.Sqrt(times)
}

--------------------------------------------------------------------------------
/business/consumer.go:
--------------------------------------------------------------------------------
package business

import (
	"fmt"
	"net/http"
	"strings"
	"time"

	"github.com/parnurzeal/gorequest"
	"github.com/robfig/cron"
	log "github.com/sirupsen/logrus"
	"github.com/storyicon/golang-proxy/dao"
	"github.com/storyicon/golang-proxy/model"
)

const (
	// HTTP defines a proxy type
	HTTP = "http"
	// HTTPS defines a proxy type
	HTTPS = "https"
)

var (
	// ConsumerStackLength stores the number of proxies currently being pre-assessed in memory.
	ConsumerStackLength = 0
	// LocalIP is used to store local ipv4 address
	LocalIP = GetLocalIPAddress()
)

// StartConsumer is used to start the consumer
func StartConsumer() {
	scheduler := cron.New()
	scheduler.AddFunc("@every 1s", func() {
		if ConsumerStackLength < ConsumerStackCapacity {
			log.Infoln("[Consumer]Stack has free capacity, start to extract proxy from database to pre assess")
			proxies := dao.PopCrudeProxy(0, ConsumerPerExtract)
			ConsumerStackLength += len(proxies)
			for _, proxy := range proxies {
				go PreAssess(proxy)
			}
		}
	})
	log.Infoln("Start Consumer")
	scheduler.Start()
}

// PreAssess is used to pre assess a proxy.
func PreAssess(proxy *model.CrudeProxy) {
	IP, port := proxy.IP, proxy.Port
	httpOK := HTTPBinTester(IP, port, HTTP)
	httpsOK := HTTPBinTester(IP, port, HTTPS)

	ConsumerStackLength--

	// Discard proxies that support neither scheme.
	if !httpOK && !httpsOK {
		return
	}
	// The http-only case keeps the default typeHTTP.
	var schemeType int64 = typeHTTP
	if httpOK && httpsOK {
		schemeType = typeBOTH
	} else if httpsOK {
		schemeType = typeHTTPS
	}
	dao.SaveProxy(&model.Proxy{
		IP:         proxy.IP,
		Port:       proxy.Port,
		SchemeType: schemeType,
		Content:    proxy.Content,
	})
}

// isAnonymous reports whether the origin seen by the test endpoint hides the local address.
func isAnonymous(proxy string, origin string) bool {
	if proxy == origin {
		return true
	}
	for _, ip := range LocalIP {
		if strings.Contains(origin, ip) {
			return false
		}
	}
	return true
}

// GetLocalIPAddress is used to get local ip address
func GetLocalIPAddress() []string {
	httpBin := &model.HTTPBinIP{}
	// The request needs a target URL; the httpbin endpoint used by HTTPBinTester is assumed here.
	_, _, errs := gorequest.New().Timeout(3 * time.Second).Get("http://httpbin.org/ip").EndStruct(httpBin)
	if len(errs) != 0 {
		return []string{}
	}
	localIP := []string{}
	for _, addr := range strings.Split(httpBin.Origin, ",") {
		// Skip empty entries so that isAnonymous never matches on an empty string.
		if addr = strings.TrimSpace(addr); addr != "" {
			localIP = append(localIP, addr)
		}
	}
	return localIP
}

// HTTPBinTester tests a proxy against httpbin.
func HTTPBinTester(IP string, port string, schemeTest string) bool {
	// Anything other than HTTPS falls back to HTTP.
	if schemeTest != HTTPS {
		schemeTest = HTTP
	}
	url := fmt.Sprintf("%s://httpbin.org/ip", schemeTest)
	proxy := fmt.Sprintf("%s://%s:%s", schemeTest, IP, port)
	httpBin := &model.HTTPBinIP{}
	request := gorequest.New().Proxy(proxy).Timeout(RequestTimeout * time.Second)
	response, _, errs := request.Get(url).
		Retry(
			ConsumerRetryTimes,
			// NOTE: the untyped constant is used as a time.Duration here, i.e. 5ns between retries.
			RequestTimeout,
			http.StatusBadRequest,
			http.StatusInternalServerError,
		).
		EndStruct(httpBin)
	if len(errs) == 0 && response.StatusCode == 200 {
		if isAnonymous(IP, httpBin.Origin) {
			log.Infof(`[Consumer][%s]Proxy Pre Assess Successful: %s`, schemeTest, proxy)
			return true
		}
		log.Println(IP, httpBin.Origin)
		log.Warnf(`[Consumer]Proxy %s Pre Assess Failed: Not Highly Anonymous`, proxy)
		return false
	}
	log.Warnf(`[Consumer]Proxy %s Pre Assess Failed: Connection Timeout or Refused`, proxy)
	return false
}

--------------------------------------------------------------------------------
/business/producer.go:
--------------------------------------------------------------------------------
package business

import (
	"math/rand"
	"time"

	"github.com/gocolly/colly"
	"github.com/robfig/cron"
	log "github.com/sirupsen/logrus"
	"github.com/storyicon/golang-proxy/dao"
	"github.com/storyicon/golang-proxy/model"
	"github.com/storyicon/golang-proxy/std"
)

// StartProducer is used to start the producer.
// The producer crawls the sources and stores the crawled proxies in the database.
func StartProducer() {
	scheduler := cron.New()
	sources := dao.GetSources()

	log.Infof("[Producer]Totally %d sources were found", len(sources))

	std.Dump(sources)
	for _, source := range sources {
		(func(scheduler *cron.Cron, source *model.Source) {
			// Crawl once immediately, then again on every scheduled interval.
			go newSourceCrawler(source)
			name := source.Name
			interval := source.Category.Interval
			log.Infof("[Producer]Periodical Task %s Was Assigned %s", name, interval)
			scheduler.AddFunc(interval, func() {
				log.Infof("[Producer]Periodical Task @%s is Running!", name)
				newSourceCrawler(source)
			})
		})(scheduler, source)
	}

	scheduler.Start()
}

// newSourceCrawler crawls the entry page of a source and then every templated page.
func newSourceCrawler(source *model.Source) {
	var (
		name           = source.Name
		debug          = source.Debug
		parallelNumber = source.Category.ParallelNumber
		iterator       = source.Selector.Iterator
		IPSelector     = source.Selector.IP
		portSelector   = source.Selector.Port
		startURL       = source.Page.Entry
		pageFrom       = source.Page.From
		pageTo         = source.Page.To
		delayRange     = source.Category.DelayRange
		template       = source.Page.Template
	)
	c := colly.NewCollector(
		colly.UserAgent(UserAgent),
		colly.Async(true),
	)
	c.SetRequestTimeout(RequestTimeout * time.Second)
	c.Limit(&colly.LimitRule{
		Parallelism: parallelNumber,
	})
	c.OnError(func(_ *colly.Response, err error) {
		if debug {
			log.Warnf("[Producer][%s]Visit error: %s", name, err)
		}
	})
	c.OnRequest(func(request *colly.Request) {
		if debug {
			log.Infof("[Producer][%s]Start visit: %s", name, request.URL)
		}
	})
	c.OnHTML(iterator, func(element *colly.HTMLElement) {
		item := element.DOM
		proxy := NewProxy(
			item.Find(IPSelector).Text(),
			item.Find(portSelector).Text(),
		)
		if proxy == nil {
			return
		}
		if content := proxy.String(); content != "" {
			dao.SaveCrudeProxy(&model.CrudeProxy{
				IP:      proxy.IP,
				Port:    proxy.Port,
				Content: content,
			})
			if debug {
				log.Infof("[Producer][%s]Proxy %s was mined", name, content)
			}
		}
	})
	c.OnScraped(func(response *colly.Response) {
		if debug {
			log.Infof("[Producer][%s]Finish visit: %s", name, response.Request.URL)
		}
	})

	c.Visit(startURL)

	// The README describes `to` as the last page crawled, so the upper bound is inclusive.
	for i := pageFrom; i <= pageTo; i++ {
		sleep(delayRange)
		c.Visit(std.TemplateRender(template, "page", i))
	}
}

// sleep waits for a delay in seconds derived from delayRange.
func sleep(delayRange []int) {
	delay := 1
	switch count := len(delayRange); count {
	case 0:
		// Keep the default delay of one second.
	case 1:
		delay = delayRange[0]
	case 2:
		delay = delayRange[0]
		// rand.Intn panics on a non-positive argument, so add jitter only for a real range.
		if span := delayRange[1] - delayRange[0]; span > 0 {
			delay += rand.Intn(span)
		}
	default:
		delay = delayRange[rand.Intn(count)]
	}
	time.Sleep(time.Duration(delay) * time.Second)
}

--------------------------------------------------------------------------------
/business/proxy.go:
--------------------------------------------------------------------------------
package business

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

const (
	// PortRegexString is used for regular matching port numbers.
	// Go tries alternatives from left to right, so the five-digit forms come first;
	// otherwise a port such as 12345 would be cut to 1234.
	PortRegexString = `(6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[0-9]{1,4})`
	// IPRegexString is used for regular matching IP address.
	IPRegexString = `((?:(?:25[0-5]|2[0-4]\d|(?:1\d{2}|[1-9]?\d))\.){3}(?:25[0-5]|2[0-4]\d|(?:1\d{2}|[1-9]?\d)))`
)

// Values of the scheme_type column.
const (
	typeHTTP = iota
	typeHTTPS
	typeBOTH
)

var (
	// PortRegexp is the compiled port regular expression.
	PortRegexp = regexp.MustCompile(PortRegexString)
	// IPRegexp is the compiled IP regular expression.
	IPRegexp = regexp.MustCompile(IPRegexString)
)

// Proxy defines the data structure of proxy.
type Proxy struct {
	// IP is an ip address in the strict sense, in the form of xxx.xxx.xxx.xxx
	IP string
	// Port is a port number in the strict sense
	Port string
	// In order to avoid the burden of source configuration and untrusted scheme data,
	// we decided to hide this data item and automatically detect the proxy's scheme by the assessor module.
	// Scheme string
}

// String is used to convert a Proxy to a string.
// It returns an empty string when the IP or the port is missing.
func (proxy *Proxy) String() string {
	if proxy.IP == "" || proxy.Port == "" {
		return ""
	}
	return fmt.Sprintf("%s:%s", proxy.IP, proxy.Port)
}

// NewProxy is used to return a Proxy object by parsing and formatting.
// The port is read from ipRaw first; portRaw is only used when ipRaw carries no port.
func NewProxy(ipRaw string, portRaw string) *Proxy {
	ip, port := parseProxyByIPRaw(ipRaw)
	if ip == "" {
		return nil
	}
	if port == "" {
		port = parsePortByPortRaw(portRaw)
	}
	return &Proxy{
		IP:   ip,
		Port: port,
	}
}

// parsePortByPortRaw parses out the port from the port string
func parsePortByPortRaw(portRaw string) string {
	return PortRegexp.FindString(portRaw)
}

// parseIPByIPRaw parses out the IP from the IP string
func parseIPByIPRaw(ipRaw string) string {
	return IPRegexp.FindString(ipRaw)
}

// parseProxyByIPRaw is used to parse out ip and port from ip string
// Accept parameters similar to "127.0.0.1", "127.0.0.1:1080", "http://127.0.0.1" ...
76 | func parseProxyByIPRaw(ipRaw string) (ip string, port string) { 77 | if strings.Index(ipRaw, "://") == -1 { 78 | ipRaw = "http://" + ipRaw 79 | } 80 | if parser, err := url.Parse(ipRaw); err == nil { 81 | ip = parseIPByIPRaw(parser.Hostname()) 82 | port = parser.Port() 83 | } 84 | return 85 | } 86 | -------------------------------------------------------------------------------- /business/service.go: -------------------------------------------------------------------------------- 1 | package business 2 | 3 | import ( 4 | "fmt" 5 | "net/http" 6 | "strings" 7 | 8 | "github.com/gin-gonic/gin" 9 | log "github.com/sirupsen/logrus" 10 | "github.com/storyicon/golang-proxy/dao" 11 | ) 12 | 13 | // Response is the response struct of the http service 14 | type Response struct { 15 | Error string `json:"error"` 16 | Message interface{} `json:"message"` 17 | } 18 | 19 | // StartService used to start the http service 20 | func StartService() { 21 | router := gin.Default() 22 | router.GET("/all", func(c *gin.Context) { 23 | tableName := queryWithDefault(c, "table", "proxy") 24 | sql := fmt.Sprintf("SELECT * FROM %s ", tableName) 25 | redirect(c, sql) 26 | }) 27 | router.GET("/random", func(c *gin.Context) { 28 | databaseType := dao.GetDatabase().Dialect().GetName() 29 | tableName := queryWithDefault(c, "table", "proxy") 30 | var sql string 31 | switch databaseType { 32 | case "sqlite3": 33 | sql = fmt.Sprintf("SELECT * FROM %s ORDER BY RANDOM() limit 1", tableName) 34 | case "mysql": 35 | sql =fmt.Sprintf("SELECT * FROM %s ORDER BY RAND() limit 1", tableName) 36 | } 37 | redirect(c, sql) 38 | }) 39 | router.GET("/sql", func(c *gin.Context) { 40 | query := c.Query("query") 41 | tableName := getTableNameBySQL(query) 42 | response, statusCode := Response{}, http.StatusOK 43 | if tableName != "" { 44 | record, err := dao.GetSQLResult(tableName, query) 45 | response.Message = record 46 | if err != nil { 47 | statusCode = http.StatusInternalServerError 48 | response.Error = 
					fmt.Sprint(err)
			}
		} else {
			statusCode = http.StatusInternalServerError
			response.Error = "Unable to resolve table name"
		}
		c.JSON(statusCode, response)
	})
	log.Infof("[S]Start Service on %s", ServiceListenAddress)
	router.Run(ServiceListenAddress)
}

// redirect forwards the request to the /sql endpoint with the given query.
func redirect(context *gin.Context, sql string) {
	context.Redirect(http.StatusTemporaryRedirect, fmt.Sprintf("/sql?query=%s", sql))
}

// getTableNameBySQL returns the word that follows the first "from" in the
// statement (lowercased), or "" if there is none.
func getTableNameBySQL(s string) string {
	// strings.Fields tolerates repeated spaces, tabs and newlines between words.
	words := strings.Fields(strings.ToLower(s))
	length := len(words)
	for i := 0; i < length; i++ {
		if words[i] == "from" {
			if i < length-1 {
				return words[i+1]
			}
			break
		}
	}
	return ""
}

// queryWithDefault returns the query parameter key, or defaultValue if it is empty.
func queryWithDefault(c *gin.Context, key string, defaultValue string) string {
	if conseq := c.Query(key); conseq != "" {
		return conseq
	}
	return defaultValue
}

--------------------------------------------------------------------------------
/business/settings.go:
--------------------------------------------------------------------------------
package business

const (
	RequestTimeout = 5
	UserAgent      = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"

	ConsumerRetryTimes     = 3
	ConsumerStackCapacity  = 500
	ConsumerPerExtract     = 30
	ConsumerProxyInitScore = 1

	AssessorAllowSuccessRateMin       = 0.5
	AssessorStackCapacity             = 500
	AssessorPerExtract                = 30
	AssessorInterval            int64 = 60

	ServiceListenAddress = ":9999"
)

--------------------------------------------------------------------------------
/config/README.md:
--------------------------------------------------------------------------------
# Config

- [English Document](#english-document)
- [中文文档](#中文文档)

## English Document

You can configure `golang-proxy` to use the `MySQL` database
by adding a file called `config.yml` to the directory that contains the compiled binary. The directory structure is as follows:

![directory structure](https://raw.githubusercontent.com/storyicon/golang-proxy/master/docs/screenshot/directory%20structure.png)

The contents of the `config.yml` file can be seen in [config.yml](https://github.com/storyicon/golang-proxy/blob/master/config/config.yml). If the MySQL connection fails or the `config.yml` file cannot be read, the `sqlite` database is used by default.

If you choose the MySQL database, you need to create the database yourself, but you never need to create the tables used by golang-proxy, because golang-proxy creates them automatically when they do not exist.

## 中文文档

You can make `golang-proxy` use a `MySQL` database by creating a file named `config.yml` in the directory of the compiled `golang-proxy` binary. The directory layout looks like this:

![directory structure](https://raw.githubusercontent.com/storyicon/golang-proxy/master/docs/screenshot/directory%20structure.png)

In `config.yml` you can specify the `MySQL` `HOST`, `PORT`, and so on. For an example of `config.yml`, see [config.yml](https://github.com/storyicon/golang-proxy/blob/master/config/config.yml)

If you decide to use `MySQL` as the storage engine for `golang-proxy`, you need to create the database yourself and specify it in `config.yml`, but you do not need to create the tables, because `golang-proxy` creates them automatically when they do not exist.

When `golang-proxy` fails to connect to `MySQL`, or cannot find `config.yml` in the same directory, it uses the `sqlite` database by default. You can download software such as `sqlite studio` to read this portable database (in fact, the [HTTP interface](https://github.com/storyicon/golang-proxy#api-interface) provided by `golang-proxy` is already sufficient).

--------------------------------------------------------------------------------
/config/config.yml:
--------------------------------------------------------------------------------
MYSQL:
    HOST: "127.0.0.1"
    PORT: "3306"
    USER: "root"
    PASS: "root"
    DB: "proxy"
    CHARSET: "utf8mb4"
--------------------------------------------------------------------------------
/dao/dao.go:
--------------------------------------------------------------------------------
package dao

import (
	"os"
	"path/filepath"
	"strings"

	"github.com/jinzhu/gorm"
	log "github.com/sirupsen/logrus"
	"github.com/storyicon/golang-proxy/model"
	"github.com/storyicon/golang-proxy/std"
)

var (
	SourceFolderPath string
	ConfigFilePath   string
)

var (
	Database *gorm.DB
	Config   *model.Config
	Sources  model.Sources
)

func GetDatabase() *gorm.DB {
	if Database == nil {
		Database = getDatabase()
	}
	return Database
}

func GetSources() model.Sources {
	if Sources == nil {
		Sources = getSources()
	}
	return Sources
}

func GetConfig() *model.Config {
	if Config == nil {
		config, err := getConfig()
		if err != nil {
			log.Panicf("Failed to load config file, %v", err)
		}
		Config = config
	}
	return Config
}

func getDatabase() (db *gorm.DB) {
	config, err := getConfig()
	if err == nil {
		log.Infoln("The configuration file has been read, try to use the MySQL database")
		db, err = std.NewMySQL(config.MySQL)
		if err == nil {
			return db
		}
		log.Errorf("An error occurred while connecting to the MySQL database: %v", err)
	}
	log.Infof("Start to use sqlite database, because %v", err)
	db, err = std.NewSQLite(model.SQLiteDatabase)
	if err != nil {
		log.Panicf("An error occurred while initializing database SQLite: %v", err)
	}
	return
}

func getConfig() (*model.Config, error) {
	config := model.Config{}
	if err := std.LoadYaml(getConfigFilePath(), &config); err != nil {
		return nil, err
	}
	return &config, nil
}

func getSources() model.Sources {
	sources := model.Sources{}
	filepath.Walk(getSourceFolderPath(), func(path string, info os.FileInfo, err error) error {
		if info == nil || info.IsDir() {
			return nil
		}
		if filename := info.Name(); filepath.Ext(filename) == ".yml" && !strings.HasPrefix(filename, ".") {
			source := model.Source{}
			// Skip sources that cannot be read or parsed instead of
			// registering an empty configuration.
			if err := std.LoadYaml(path, &source); err != nil {
				log.Errorf("Failed to load source %s: %v", filename, err)
				return nil
			}
			source.Name = filename
			sources = append(sources, &source)
			log.Infof("Successfully load source: %s", filename)
		}
		return nil
	})
	return sources
}

func getSourceFolderPath() string {
	if SourceFolderPath == "" {
		path := filepath.Join(std.GetCurrentDirectory(), "source")
		if !std.IsDirExists(path) {
			log.Errorf(`Source folder "%s" does not exist`, path)
			os.Exit(1)
		}
		SourceFolderPath = path
	}
	return SourceFolderPath
}

func getConfigFilePath() string {
	if ConfigFilePath == "" {
		ConfigFilePath = filepath.Join(std.GetCurrentDirectory(), "config.yml")
	}
	return ConfigFilePath
}

func init() {
	session := GetDatabase()
	session.AutoMigrate(
		&model.CrudeProxy{},
		&model.Proxy{},
	)
}

--------------------------------------------------------------------------------
/dao/proxy.go:
--------------------------------------------------------------------------------
package dao

import (
	"errors"
	"time"

	log "github.com/sirupsen/logrus"
	"github.com/storyicon/golang-proxy/model"
)

// GetSQLResult runs the raw SQL statement and scans the rows into a slice
// of the model that matches tableName.
func GetSQLResult(tableName string, sql string) (conseq interface{}, err error) {
	session := GetDatabase()
	switch tableName {
	case model.CrudeProxyTableName:
		conseq = &[]model.CrudeProxy{}
	case model.ProxyTableName:
		conseq = &[]model.Proxy{}
	default:
		return nil, errors.New("Query unknown table")
	}
	err = session.Raw(sql).Scan(conseq).Error
	return
}

// SaveProxy inserts a freshly validated proxy with zeroed statistics.
func SaveProxy(proxy *model.Proxy) error {
	timestamp := time.Now().Unix()
	_, err := Save(&model.Proxy{
		IP: proxy.IP,
		Port:
			proxy.Port,
		SchemeType:            proxy.SchemeType,
		Content:               proxy.Content,
		AssessTimes:           0,
		SuccessTimes:          0,
		AvgResponseTime:       0,
		ContinuousFailedTimes: 0,
		InsertTime:            timestamp,
		Score:                 model.ProxyInitScore,
	})
	return err
}

// GetProxy returns up to limit proxies whose last update is at least
// interval seconds old, oldest first.
func GetProxy(interval int64, limit int) []*model.Proxy {
	proxy := []*model.Proxy{}
	session := GetDatabase()
	session.Where("update_time <= ?", time.Now().Unix()-interval).
		Order("update_time").
		Limit(limit).
		Find(&proxy)
	return proxy
}

// GetCrudeProxy returns up to limit crude proxies starting at offset.
func GetCrudeProxy(offset int, limit int) []*model.CrudeProxy {
	proxies := []*model.CrudeProxy{}
	session := GetDatabase()
	session.Model(proxies).
		Offset(offset).
		Limit(limit).
		Find(&proxies)
	return proxies
}

// PopCrudeProxy fetches crude proxies and deletes them from the table.
func PopCrudeProxy(offset int, limit int) []*model.CrudeProxy {
	proxies := GetCrudeProxy(offset, limit)
	DeleteCrudeProxy(proxies)
	return proxies
}

// SaveCrudeProxy inserts a crude proxy, filling in the timestamps.
func SaveCrudeProxy(proxy *model.CrudeProxy) error {
	timestamp := time.Now().Unix()
	if proxy.InsertTime == 0 {
		proxy.InsertTime = timestamp
	}
	_, err := Save(&model.CrudeProxy{
		IP:         proxy.IP,
		Port:       proxy.Port,
		Content:    proxy.Content,
		InsertTime: proxy.InsertTime,
		UpdateTime: timestamp,
	})
	return err
}

// DeleteCrudeProxy deletes the given crude proxies by ID.
func DeleteCrudeProxy(proxies []*model.CrudeProxy) error {
	var idList []int64
	session := GetDatabase()
	for _, proxy := range proxies {
		idList = append(idList, proxy.ID)
	}
	return session.Where("id in (?)", idList).Delete(&model.CrudeProxy{}).Error
}

// Save creates the record inside a transaction and rolls back on failure.
func Save(data interface{}) (interface{}, error) {
	session := GetDatabase().Begin()
	if err := session.Create(data).Error; err != nil {
		log.Errorln(err)
		session.Rollback()
		return nil, err
	}
	session.Commit()
	return data, nil
}

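`GetProxy` above selects proxies whose `update_time` is at least `interval` seconds in the past, and `SaveProxy` never sets `UpdateTime`, so a newly saved proxy (with `update_time` 0) is picked up right away. The predicate can be sketched in isolation as follows. The name `dueForAssessment` is hypothetical and not part of the repository, and using 60 seconds (the value of `AssessorInterval` in settings.go) as the interval is an assumption about how the assessor calls `GetProxy`.

```go
package main

import (
	"fmt"
	"time"
)

// dueForAssessment is a hypothetical helper that mirrors the WHERE clause
// "update_time <= now - interval" used by GetProxy.
func dueForAssessment(updateTime, now, interval int64) bool {
	return updateTime <= now-interval
}

func main() {
	now := time.Now().Unix()
	const interval int64 = 60 // assumed to match AssessorInterval

	// A proxy that was never assessed has update_time 0 and is always due.
	fmt.Println(dueForAssessment(0, now, interval))
	// A proxy updated 10 seconds ago is not due yet.
	fmt.Println(dueForAssessment(now-10, now, interval))
}
```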
-------------------------------------------------------------------------------- /docs/screenshot/directory structure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/storyicon/golang-proxy/4380014884b5ea2263a29af321b64be38e752566/docs/screenshot/directory structure.png -------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/storyicon/golang-proxy 2 | 3 | go 1.13 4 | 5 | require ( 6 | github.com/PuerkitoBio/goquery v1.5.1 // indirect 7 | github.com/antchfx/htmlquery v1.2.2 // indirect 8 | github.com/antchfx/xmlquery v1.2.3 // indirect 9 | github.com/antchfx/xpath v1.1.5 // indirect 10 | github.com/elazarl/goproxy v0.0.0-20200315184450-1f3cb6622dad // indirect 11 | github.com/gin-gonic/gin v1.6.2 12 | github.com/gobwas/glob v0.2.3 // indirect 13 | github.com/gocolly/colly v1.2.0 14 | github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e // indirect 15 | github.com/jinzhu/gorm v1.9.12 16 | github.com/kennygrant/sanitize v1.2.4 // indirect 17 | github.com/parnurzeal/gorequest v0.2.16 18 | github.com/pkg/errors v0.9.1 // indirect 19 | github.com/robfig/cron v1.2.0 20 | github.com/saintfish/chardet v0.0.0-20120816061221-3af4cd4741ca // indirect 21 | github.com/sirupsen/logrus v1.5.0 22 | github.com/smartystreets/goconvey v1.6.4 // indirect 23 | github.com/temoto/robotstxt v1.1.1 // indirect 24 | gopkg.in/yaml.v2 v2.2.8 25 | moul.io/http2curl v1.0.0 // indirect 26 | ) 27 | -------------------------------------------------------------------------------- /go.sum: -------------------------------------------------------------------------------- 1 | github.com/PuerkitoBio/goquery v1.5.1 h1:PSPBGne8NIUWw+/7vFBV+kG2J/5MOjbzc7154OaKCSE= 2 | github.com/PuerkitoBio/goquery v1.5.1/go.mod h1:GsLWisAFVj4WgDibEWF4pvYnkVQBpKBKeU+7zCJoLcc= 3 | 
github.com/andybalholm/cascadia v1.1.0 h1:BuuO6sSfQNFRu1LppgbD25Hr2vLYW25JvxHs5zzsLTo= 4 | github.com/andybalholm/cascadia v1.1.0/go.mod h1:GsXiBklL0woXo1j/WYWtSYYC4ouU9PqHO0sqidkEA4Y= 5 | github.com/antchfx/htmlquery v1.2.2 h1:exe4hUStBqXdRZ+9nB7EYA+W2zfIHIq3rRFpChh+VSk= 6 | github.com/antchfx/htmlquery v1.2.2/go.mod h1:MS9yksVSQXls00iXkiMqXr0J+umL/AmxXKuP28SUJM8= 7 | github.com/antchfx/xmlquery v1.2.3 h1:++irmxT+Pkn55FGtSTkUTHarZ6E0b1yyR+UiPZRA+eY= 8 | github.com/antchfx/xmlquery v1.2.3/go.mod h1:/+CnyD/DzHRnv2eRxrVbieRU/FIF6N0C+7oTtyUtCKk= 9 | github.com/antchfx/xpath v1.1.5 h1:pQWeT0Xuv0gR7bDXXuoLAA7ztm9dxb19tTdxdxJR1Bo= 10 | github.com/antchfx/xpath v1.1.5/go.mod h1:Yee4kTMuNiPYJ7nSNorELQMr1J33uOpXDMByNYhvtNk= 11 | github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 12 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 13 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 14 | github.com/denisenkom/go-mssqldb v0.0.0-20191124224453-732737034ffd h1:83Wprp6ROGeiHFAP8WJdI2RoxALQYgdllERc3N5N2DM= 15 | github.com/denisenkom/go-mssqldb v0.0.0-20191124224453-732737034ffd/go.mod h1:xbL0rPBG9cCiLr28tMa8zpbdarY27NDyej4t/EjAShU= 16 | github.com/elazarl/goproxy v0.0.0-20200315184450-1f3cb6622dad h1:zPs0fNF2Io1Qytf92EI2CDJ9oCXZr+NmjEVexrUEdq4= 17 | github.com/elazarl/goproxy v0.0.0-20200315184450-1f3cb6622dad/go.mod h1:Ro8st/ElPeALwNFlcTpWmkr6IoMFfkjXAvTHpevnDsM= 18 | github.com/elazarl/goproxy/ext v0.0.0-20190711103511-473e67f1d7d2 h1:dWB6v3RcOy03t/bUadywsbyrQwCqZeNIEX6M1OtSZOM= 19 | github.com/elazarl/goproxy/ext v0.0.0-20190711103511-473e67f1d7d2/go.mod h1:gNh8nYJoAm43RfaxurUnxr+N1PwuFV3ZMl/efxlIlY8= 20 | github.com/erikstmartin/go-testdb v0.0.0-20160219214506-8d10e4a1bae5 h1:Yzb9+7DPaBjB8zlTR87/ElzFsnQfuHnVUVqpZZIcV5Y= 21 | github.com/erikstmartin/go-testdb v0.0.0-20160219214506-8d10e4a1bae5/go.mod h1:a2zkGnVExMxdzMo3M0Hi/3sEU+cWnZpSni0O6/Yb/P0= 22 | 
github.com/gin-contrib/sse v0.1.0 h1:Y/yl/+YNO8GZSjAhjMsSuLt29uWRFHdHYUb5lYOV9qE= 23 | github.com/gin-contrib/sse v0.1.0/go.mod h1:RHrZQHXnP2xjPF+u1gW/2HnVO7nvIa9PG3Gm+fLHvGI= 24 | github.com/gin-gonic/gin v1.6.2 h1:88crIK23zO6TqlQBt+f9FrPJNKm9ZEr7qjp9vl/d5TM= 25 | github.com/gin-gonic/gin v1.6.2/go.mod h1:75u5sXoLsGZoRN5Sgbi1eraJ4GU3++wFwWzhwvtwp4M= 26 | github.com/go-playground/assert/v2 v2.0.1 h1:MsBgLAaY856+nPRTKrp3/OZK38U/wa0CcBYNjji3q3A= 27 | github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4= 28 | github.com/go-playground/locales v0.13.0 h1:HyWk6mgj5qFqCT5fjGBuRArbVDfE4hi8+e8ceBS/t7Q= 29 | github.com/go-playground/locales v0.13.0/go.mod h1:taPMhCMXrRLJO55olJkUXHZBHCxTMfnGwq/HNwmWNS8= 30 | github.com/go-playground/universal-translator v0.17.0 h1:icxd5fm+REJzpZx7ZfpaD876Lmtgy7VtROAbHHXk8no= 31 | github.com/go-playground/universal-translator v0.17.0/go.mod h1:UkSxE5sNxxRwHyU+Scu5vgOQjsIJAF8j9muTVoKLVtA= 32 | github.com/go-playground/validator/v10 v10.2.0 h1:KgJ0snyC2R9VXYN2rneOtQcw5aHQB1Vv0sFl1UcHBOY= 33 | github.com/go-playground/validator/v10 v10.2.0/go.mod h1:uOYAAleCW8F/7oMFd6aG0GOhaH6EGOAJShg8Id5JGkI= 34 | github.com/go-sql-driver/mysql v1.4.1 h1:g24URVg0OFbNUTx9qqY1IRZ9D9z3iPyi5zKhQZpNwpA= 35 | github.com/go-sql-driver/mysql v1.4.1/go.mod h1:zAC/RDZ24gD3HViQzih4MyKcchzm+sOG5ZlKdlhCg5w= 36 | github.com/gobwas/glob v0.2.3 h1:A4xDbljILXROh+kObIiy5kIaPYD8e96x1tgBhUI5J+Y= 37 | github.com/gobwas/glob v0.2.3/go.mod h1:d3Ez4x06l9bZtSvzIay5+Yzi0fmZzPgnTbPcKjJAkT8= 38 | github.com/gocolly/colly v1.2.0 h1:qRz9YAn8FIH0qzgNUw+HT9UN7wm1oF9OBAilwEWpyrI= 39 | github.com/gocolly/colly v1.2.0/go.mod h1:Hof5T3ZswNVsOHYmba1u03W65HDWgpV5HifSuueE0EA= 40 | github.com/golang-sql/civil v0.0.0-20190719163853-cb61b32ac6fe h1:lXe2qZdvpiX5WZkZR4hgp4KJVfY3nMkvmwbVkpv1rVY= 41 | github.com/golang-sql/civil v0.0.0-20190719163853-cb61b32ac6fe/go.mod h1:8vg3r2VgvsThLBIFL93Qb5yWzgyZWhEmBwUJWevAkK0= 42 | github.com/golang/groupcache 
v0.0.0-20200121045136-8c9f03a8e57e h1:1r7pUrabqp18hOBcwBwiTsbnFeTZHV9eER/QT5JVZxY= 43 | github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= 44 | github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= 45 | github.com/golang/protobuf v1.3.3 h1:gyjaxf+svBWX08ZjK86iN9geUJF0H6gp2IRKX6Nf6/I= 46 | github.com/golang/protobuf v1.3.3/go.mod h1:vzj43D7+SQXF/4pzW/hwtAqwc6iTitCiVSaWz5lYuqw= 47 | github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= 48 | github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1 h1:EGx4pi6eqNxGaHF6qqu48+N2wcFQ5qg5FXgOdqsJ5d8= 49 | github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1/go.mod h1:wJfORRmW1u3UXTncJ5qlYoELFm8eSnnEO6hX4iZ3EWY= 50 | github.com/jinzhu/gorm v1.9.12 h1:Drgk1clyWT9t9ERbzHza6Mj/8FY/CqMyVzOiHviMo6Q= 51 | github.com/jinzhu/gorm v1.9.12/go.mod h1:vhTjlKSJUTWNtcbQtrMBFCxy7eXTzeCAzfL5fBZT/Qs= 52 | github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E= 53 | github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc= 54 | github.com/jinzhu/now v1.0.1 h1:HjfetcXq097iXP0uoPCdnM4Efp5/9MsM0/M+XOTeR3M= 55 | github.com/jinzhu/now v1.0.1/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8= 56 | github.com/json-iterator/go v1.1.9 h1:9yzud/Ht36ygwatGx56VwCZtlI/2AD15T1X2sjSuGns= 57 | github.com/json-iterator/go v1.1.9/go.mod h1:KdQUCv79m/52Kvf8AW2vK1V8akMuk1QjK/uOdHXbAo4= 58 | github.com/jtolds/gls v4.20.0+incompatible h1:xdiiI2gbIgH/gLH7ADydsJ1uDOEzR8yvV7C0MuV77Wo= 59 | github.com/jtolds/gls v4.20.0+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU= 60 | github.com/kennygrant/sanitize v1.2.4 h1:gN25/otpP5vAsO2djbMhF/LQX6R7+O1TB4yv8NzpJ3o= 61 | github.com/kennygrant/sanitize v1.2.4/go.mod h1:LGsjYYtgxbetdg5owWB2mpgUL6e2nfw2eObZ0u0qvak= 62 | github.com/konsorten/go-windows-terminal-sequences v1.0.1 
h1:mweAR1A6xJ3oS2pRaGiHgQ4OO8tzTaLawm8vnODuwDk= 63 | github.com/konsorten/go-windows-terminal-sequences v1.0.1/go.mod h1:T0+1ngSBFLxvqU3pZ+m/2kptfBszLMUkC4ZK/EgS/cQ= 64 | github.com/leodido/go-urn v1.2.0 h1:hpXL4XnriNwQ/ABnpepYM/1vCLWNDfUNts8dX3xTG6Y= 65 | github.com/leodido/go-urn v1.2.0/go.mod h1:+8+nEpDfqqsY+g338gtMEUOtuK+4dEMhiQEgxpxOKII= 66 | github.com/lib/pq v1.1.1 h1:sJZmqHoEaY7f+NPP8pgLB/WxulyR3fewgCM2qaSlBb4= 67 | github.com/lib/pq v1.1.1/go.mod h1:5WUZQaWbwv1U+lTReE5YruASi9Al49XbQIvNi/34Woo= 68 | github.com/mattn/go-isatty v0.0.12 h1:wuysRhFDzyxgEmMf5xjvJ2M9dZoWAXNNr5LSBS7uHXY= 69 | github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU= 70 | github.com/mattn/go-sqlite3 v2.0.1+incompatible h1:xQ15muvnzGBHpIpdrNi1DA5x0+TcBZzsIDwmw9uTHzw= 71 | github.com/mattn/go-sqlite3 v2.0.1+incompatible/go.mod h1:FPy6KqzDD04eiIsT53CuJW3U88zkxoIYsOqkbpncsNc= 72 | github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421 h1:ZqeYNhU3OHLH3mGKHDcjJRFFRrJa6eAM5H+CtDdOsPc= 73 | github.com/modern-go/concurrent v0.0.0-20180228061459-e0a39a4cb421/go.mod h1:6dJC0mAP4ikYIbvyc7fijjWJddQyLn8Ig3JB5CqoB9Q= 74 | github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742 h1:Esafd1046DLDQ0W1YjYsBW+p8U2u7vzgW2SQVmlNazg= 75 | github.com/modern-go/reflect2 v0.0.0-20180701023420-4b7aa43c6742/go.mod h1:bx2lNnkwVCuqBIxFjflWJWanXIb3RllmbCylyMrvgv0= 76 | github.com/parnurzeal/gorequest v0.2.16 h1:T/5x+/4BT+nj+3eSknXmCTnEVGSzFzPGdpqmUVVZXHQ= 77 | github.com/parnurzeal/gorequest v0.2.16/go.mod h1:3Kh2QUMJoqw3icWAecsyzkpY7UzRfDhbRdTjtNwNiUE= 78 | github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= 79 | github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= 80 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 81 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 82 | github.com/robfig/cron v1.2.0 
h1:ZjScXvvxeQ63Dbyxy76Fj3AT3Ut0aKsyd2/tl3DTMuQ= 83 | github.com/robfig/cron v1.2.0/go.mod h1:JGuDeoQd7Z6yL4zQhZ3OPEVHB7fL6Ka6skscFHfmt2k= 84 | github.com/rogpeppe/go-charset v0.0.0-20180617210344-2471d30d28b4/go.mod h1:qgYeAmZ5ZIpBWTGllZSQnw97Dj+woV0toclVaRGI8pc= 85 | github.com/saintfish/chardet v0.0.0-20120816061221-3af4cd4741ca h1:NugYot0LIVPxTvN8n+Kvkn6TrbMyxQiuvKdEwFdR9vI= 86 | github.com/saintfish/chardet v0.0.0-20120816061221-3af4cd4741ca/go.mod h1:uugorj2VCxiV1x+LzaIdVa9b4S4qGAcH6cbhh4qVxOU= 87 | github.com/sirupsen/logrus v1.5.0 h1:1N5EYkVAPEywqZRJd7cwnRtCb6xJx7NH3T3WUTF980Q= 88 | github.com/sirupsen/logrus v1.5.0/go.mod h1:+F7Ogzej0PZc/94MaYx/nvG9jOFMD2osvC3s+Squfpo= 89 | github.com/smartystreets/assertions v0.0.0-20180927180507-b2de0cb4f26d h1:zE9ykElWQ6/NYmHa3jpm/yHnI4xSofP+UP6SpjHcSeM= 90 | github.com/smartystreets/assertions v0.0.0-20180927180507-b2de0cb4f26d/go.mod h1:OnSkiWE9lh6wB0YB77sQom3nweQdgAjqCqsofrRNTgc= 91 | github.com/smartystreets/goconvey v1.6.4 h1:fv0U8FUIMPNf1L9lnHLvLhgicrIVChEkdzIKYqbNC9s= 92 | github.com/smartystreets/goconvey v1.6.4/go.mod h1:syvi0/a8iFYH4r/RixwvyeAJjdLS9QV7WQ/tjFTllLA= 93 | github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 94 | github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= 95 | github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= 96 | github.com/stretchr/testify v1.4.0 h1:2E4SXV/wtOkTonXsotYi4li6zVWxYlZuYNCXe9XRJyk= 97 | github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= 98 | github.com/temoto/robotstxt v1.1.1 h1:Gh8RCs8ouX3hRSxxK7B1mO5RFByQ4CmJZDwgom++JaA= 99 | github.com/temoto/robotstxt v1.1.1/go.mod h1:+1AmkuG3IYkh1kv0d2qEB9Le88ehNO0zwOr3ujewlOo= 100 | github.com/ugorji/go v1.1.7 h1:/68gy2h+1mWMrwZFeD1kQialdSzAb432dtpeJ42ovdo= 101 | github.com/ugorji/go v1.1.7/go.mod h1:kZn38zHttfInRq0xu/PH0az30d+z6vm202qpg1oXVMw= 102 | github.com/ugorji/go/codec v1.1.7 
h1:2SvQaVZ1ouYrrKKwoSk2pzd4A9evlKJb9oTL+OaLUSs= 103 | github.com/ugorji/go/codec v1.1.7/go.mod h1:Ax+UKWsSmolVDwsd+7N3ZtXu+yMGCf907BLYF3GoBXY= 104 | golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= 105 | golang.org/x/crypto v0.0.0-20190325154230-a5d413f7728c/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= 106 | golang.org/x/crypto v0.0.0-20191205180655-e7c4368fe9dd h1:GGJVjV8waZKRHrgwvtH66z9ZGVurTD1MT0n1Bb+q4aM= 107 | golang.org/x/crypto v0.0.0-20191205180655-e7c4368fe9dd/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= 108 | golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 109 | golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 110 | golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= 111 | golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3 h1:0GoQqolDA55aaLxZyTzK/Y2ePZzZTUrRacwib7cNsYQ= 112 | golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= 113 | golang.org/x/net v0.0.0-20200202094626-16171245cfb2 h1:CCH4IOTTfewWjGOlSp+zGcjutRKlBEZQ6wTn8ozI/nI= 114 | golang.org/x/net v0.0.0-20200202094626-16171245cfb2/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= 115 | golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 116 | golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 117 | golang.org/x/sys v0.0.0-20190422165155-953cdadca894/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 118 | golang.org/x/sys v0.0.0-20200116001909-b77594299b42 h1:vEOn+mP2zCOVzKckCZy6YsCtDblrpj/w7B9nxGNELpg= 119 | golang.org/x/sys v0.0.0-20200116001909-b77594299b42/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= 120 | golang.org/x/text v0.3.0/go.mod 
h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= 121 | golang.org/x/text v0.3.2 h1:tW2bmiBqwgJj/UpqtC8EpXEZVYOwU0yG4iWbprSVAcs= 122 | golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk= 123 | golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= 124 | golang.org/x/tools v0.0.0-20190328211700-ab21143f2384/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= 125 | google.golang.org/appengine v1.4.0 h1:/wp5JvzpHIxhs/dumFmF7BXTf3Z+dd4uXta4kVyO508= 126 | google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4= 127 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= 128 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 129 | gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 130 | gopkg.in/yaml.v2 v2.2.8 h1:obN1ZagJSUGI0Ek/LBmuj4SNLPfIny3KsKFopxRdj10= 131 | gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 132 | moul.io/http2curl v1.0.0 h1:6XwpyZOYsgZJrU8exnG87ncVkU1FVCcTRpwzOkTDUi8= 133 | moul.io/http2curl v1.0.0/go.mod h1:f6cULg+e4Md/oW1cYmwW4IWQOVl2lGbmCNGOHvzX2kE= 134 | -------------------------------------------------------------------------------- /main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "flag" 5 | "time" 6 | 7 | log "github.com/sirupsen/logrus" 8 | 9 | "github.com/storyicon/golang-proxy/business" 10 | ) 11 | 12 | func main() { 13 | var mode string 14 | flag.StringVar(&mode, "mode", "all", "all/consumer/producer/assessor/service, default is all") 15 | flag.Parse() 16 | 17 | log.Infof("Operating Mode: %s, will start running after 3 seconds", mode) 18 | 19 | time.Sleep(3 * time.Second) 20 | 21 | switch mode { 22 | case "all": 23 | go business.StartProducer() 24 | go business.StartConsumer() 25 | go 
			business.StartAssessor()
		go business.StartService()
	case "consumer":
		business.StartConsumer()
	case "producer":
		business.StartProducer()
	case "assessor":
		business.StartAssessor()
	case "service":
		business.StartService()
	default:
		log.Panicf("Unknown mode: %s", mode)
	}
	// Block forever so the goroutines started in "all" mode keep running.
	select {}
}

--------------------------------------------------------------------------------
/model/config.go:
--------------------------------------------------------------------------------
package model

import "fmt"

type Config struct {
	MySQL *MySQLOptions `yaml:"MYSQL"`
}

type MySQLOptions struct {
	Host    string `yaml:"HOST"`
	Port    string `yaml:"PORT"`
	User    string `yaml:"USER"`
	Pass    string `yaml:"PASS"`
	Db      string `yaml:"DB"`
	Charset string `yaml:"CHARSET"`
}

// String returns the connection string built from the options.
func (options *MySQLOptions) String() string {
	return fmt.Sprintf("%s:%s@(%s:%s)/%s?charset=%s",
		options.User, options.Pass,
		options.Host, options.Port,
		options.Db, options.Charset,
	)
}

--------------------------------------------------------------------------------
/model/database.go:
--------------------------------------------------------------------------------
package model

// CrudeProxy stores the proxies that have been crawled; their quality is not guaranteed.
type CrudeProxy struct {
	// ID is the ID value of the current record, which is unique among all proxies.
	ID int64 `gorm:"AUTO_INCREMENT;" json:"id"`
	// IP is the IP address of the proxy. e.g. 127.0.0.1
	IP string `json:"ip"`
	// Port is the Port of the proxy. e.g. 3306
	Port string `json:"port"`
	// Content is the ip:port of the proxy.
	// e.g. 127.0.0.1:3306
	Content string `gorm:"unique_index:unique_crude_content;" json:"content"`
	// InsertTime is the insertion time of the proxy
	InsertTime int64 `json:"insert_time"`
	// UpdateTime is the update time of the proxy
	UpdateTime int64 `json:"update_time"`
}

func (CrudeProxy) TableName() string {
	return CrudeProxyTableName
}

// Proxy stores the proxy filtered from CrudeProxy
type Proxy struct {
	// ID is the ID value of the current record, which is unique among all proxies.
	ID int64 `gorm:"AUTO_INCREMENT;" json:"id"`
	// IP is the IP address of the proxy. e.g. 127.0.0.1
	IP string `json:"ip"`
	// Port is the Port of the proxy. e.g. 3306
	Port string `json:"port"`
	// SchemeType represents the protocol type supported by the proxy.
	// 0: http
	// 1: https
	// 2: http & https
	SchemeType int64 `json:"scheme_type"`
	// Content is the ip:port of the proxy. e.g. 127.0.0.1:3306
	Content string `gorm:"unique_index:unique_content;" json:"content"`

	// AssessTimes is the number of evaluations of the proxy
	AssessTimes int64 `json:"assess_times"`
	// SuccessTimes is the number of successful evaluations of the proxy
	SuccessTimes int64 `json:"success_times"`
	// AvgResponseTime is the average response time of the proxy
	AvgResponseTime float64 `json:"avg_response_time"`
	// ContinuousFailedTimes is the number of consecutive failures during the proxy evaluation process
	ContinuousFailedTimes int64 `json:"continuous_failed_times"`
	// Score is the rating of the proxy
	Score float64 `json:"score"`
	// InsertTime is the insertion time of the proxy
	InsertTime int64 `json:"insert_time"`
	// UpdateTime is the update time of the proxy, can also reflect the last evaluation time
	UpdateTime int64 `json:"update_time"`
}

func (Proxy) TableName() string {
	return ProxyTableName
}

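To illustrate the `SchemeType` encoding documented on `Proxy` above, here is a minimal sketch of a helper that maps the stored value to a readable label. The function name `schemeLabel` and the `"unknown"` fallback are assumptions made for this example; neither is part of the repository.

```go
package main

import "fmt"

// schemeLabel is a hypothetical helper (not part of golang-proxy) that maps
// the SchemeType values documented on model.Proxy to a readable label.
func schemeLabel(schemeType int64) string {
	switch schemeType {
	case 0:
		return "http"
	case 1:
		return "https"
	case 2:
		return "http & https"
	default:
		// Values outside 0..2 are not documented; treating them as
		// "unknown" is an assumption of this sketch.
		return "unknown"
	}
}

func main() {
	for _, t := range []int64{0, 1, 2, 3} {
		fmt.Printf("%d -> %s\n", t, schemeLabel(t))
	}
}
```

A client that reads rows from the `/sql` endpoint could apply the same mapping to the `scheme_type` JSON field.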
--------------------------------------------------------------------------------
/model/httpbin.go:
--------------------------------------------------------------------------------
package model

// HTTPBinIP is the response struct of httpbin.org/ip
type HTTPBinIP struct {
	// Origin is the real ip returned from httpbin
	Origin string `json:"origin"`
}

--------------------------------------------------------------------------------
/model/model.go:
--------------------------------------------------------------------------------
package model

const (
	ProxyTableName      = "proxy"
	ProxyInitScore      = 1
	SQLiteDatabase      = "sqlite3.db"
	CrudeProxyTableName = "crude_proxy"
)

--------------------------------------------------------------------------------
/model/source.go:
--------------------------------------------------------------------------------
package model

// Sources is a slice of Source.
type Sources []*Source

// Source is the source configuration
type Source struct {
	// Name is the name of source
	Name string
	// Page is the page options
	Page PageOptions
	// Selector is the selector options
	Selector SelectorOptions
	// Category is the category options
	Category CategoryOptions
	// Debug determines whether to output debugging information
	Debug bool
}

// PageOptions is the page configuration
type PageOptions struct {
	// Entry is the first url to crawl
	Entry string
	// Template is the page template.
	// e.g. http://xxxx.xxx/proxy?page={page}
	Template string
	// From is the start page number
	From int
	// To is the end page number
	To int
}

// SelectorOptions is the selector configuration
type SelectorOptions struct {
	// Iterator is the iterable element of proxy items
	Iterator string
	// IP is the IP selector
	IP string
	// Port is the port selector
	Port string
}

// CategoryOptions is the category configuration
type CategoryOptions struct {
	// ParallelNumber is the number of parallel requests used to crawl the source
	// e.g. 10
	ParallelNumber int
	// DelayRange is the interval between crawls, chosen at random within this range
	// e.g. [0, 10]
	DelayRange []int
	// Interval is how often the source is re-crawled from the entry URL
	// e.g. "@every 10m", "@every 10s", "@every 10h"
	Interval string
}

--------------------------------------------------------------------------------
/source/nimadaili.com.http.yml:
--------------------------------------------------------------------------------

# Files whose names start with "." will not be loaded
page:
    entry: "http://www.nimadaili.com/http/?page=1"
    template: "http://www.nimadaili.com/http/?page={page}"
    from: 1
    to: 2000
selector:
    iterator: ".fl-table tr"
    ip: "td:nth-child(1)"
    port: ""
category:
    parallelnumber: 3
    # Lowercase key: yaml.v2 maps untagged struct fields to their lowercased names.
    delayrange: [10, 30]
    interval: "@every 10m"
debug: true

--------------------------------------------------------------------------------
/source/nimadaili.com.https.yml:
--------------------------------------------------------------------------------

# Files whose names start with "." will not be loaded
page:
    entry: "http://www.nimadaili.com/https/?page=1"
    template: "http://www.nimadaili.com/https/?page={page}"
    from: 1
    to: 2000
selector:
    iterator: ".fl-table tr"
    ip: "td:nth-child(1)"
    port: ""
category:
    parallelnumber: 3
    # Lowercase key: yaml.v2 maps untagged struct fields to their lowercased names.
    delayrange: [10, 30]
  interval: "@every 10m"
debug: true

--------------------------------------------------------------------------------
/source/xiladaili.com.http.yml:
--------------------------------------------------------------------------------
# Files whose names start with "." will not be loaded
page:
  entry: "http://www.xiladaili.com/http/1/"
  template: "http://www.xiladaili.com/http/{page}/"
  from: 0
  to: 2000
selector:
  iterator: ".fl-table tr"
  ip: "td:nth-child(1)"
  port: ""
category:
  parallelnumber: 3
  delayRange: [10, 30]
  interval: "@every 10m"
debug: true

--------------------------------------------------------------------------------
/source/xiladaili.com.https.yml:
--------------------------------------------------------------------------------
# Files whose names start with "." will not be loaded
page:
  entry: "http://www.xiladaili.com/https/1/"
  template: "http://www.xiladaili.com/https/{page}/"
  from: 0
  to: 2000
selector:
  iterator: ".fl-table tr"
  ip: "td:nth-child(1)"
  port: ""
category:
  parallelnumber: 3
  delayRange: [10, 30]
  interval: "@every 10m"
debug: true

--------------------------------------------------------------------------------
/std/mysql.go:
--------------------------------------------------------------------------------
package std

import (
	"log"

	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/mysql"
	"github.com/storyicon/golang-proxy/model"
)

// NewMySQL opens a MySQL connection through gorm, using the connection
// string built from options.
func NewMySQL(options *model.MySQLOptions) (db *gorm.DB, err error) {
	// Note: this logs the full connection string, which may include credentials.
	log.Println(options.String())
	db, err = gorm.Open("mysql", options.String())
	if err != nil {
		return
	}
	// Use singular table names instead of gorm's default pluralized names.
	db.SingularTable(true)
	return
}
--------------------------------------------------------------------------------
/std/sqlite.go:
--------------------------------------------------------------------------------
package std

import (
	"github.com/jinzhu/gorm"
	_ "github.com/jinzhu/gorm/dialects/sqlite"
)

// NewSQLite opens the SQLite database file dbname through gorm.
func NewSQLite(dbname string) (db *gorm.DB, err error) {
	db, err = gorm.Open("sqlite3", dbname)
	if err != nil {
		return
	}
	// Use singular table names instead of gorm's default pluralized names.
	db.SingularTable(true)
	return db, nil
}
--------------------------------------------------------------------------------
/std/std.go:
--------------------------------------------------------------------------------
package std

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"os"
	"strings"

	log "github.com/sirupsen/logrus"

	yaml "gopkg.in/yaml.v2"
)

// TemplateRender replaces every "{key}" placeholder in template with value.
func TemplateRender(template string, key string, value interface{}) string {
	return strings.Replace(template, "{"+key+"}", fmt.Sprint(value), -1)
}

// GetCurrentDirectory returns the current working directory of the process,
// which is not necessarily the directory that contains the binary.
func GetCurrentDirectory() string {
	dir, err := os.Getwd()
	if err != nil {
		log.Panicln(err)
	}
	return dir
}

// LoadYaml reads the YAML file at path and unmarshals it into data.
func LoadYaml(path string, data interface{}) error {
	if _, err := os.Stat(path); err != nil {
		return err
	}
	bytes, err := ioutil.ReadFile(path)
	if err != nil {
		return err
	}
	return yaml.Unmarshal(bytes, data)
}

// IsDirExists reports whether path exists and is a directory.
func IsDirExists(path string) bool {
	if s, err := os.Stat(path); err == nil {
		return s.IsDir()
	}
	return false
}

// Dump logs variable as indented JSON, falling back to %+v formatting
// when it cannot be marshaled.
func Dump(variable interface{}) {
	if str, err := json.MarshalIndent(variable, "", " "); err == nil {
		log.Print(string(str))
		return
	}
	log.Printf("%+v", variable)
}
--------------------------------------------------------------------------------
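The `template`, `from` and `to` keys in the source files pair naturally with `TemplateRender` from `std.go`. The sketch below shows one way the page URLs of a source could be expanded. `TemplateRender` is copied from `std.go`; `PageOptions` is a local stand-in holding only the three fields used here, and `ExpandPages` is a hypothetical helper, not a function from this repository. It assumes `to` is inclusive, which the source does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// PageOptions is a local stand-in with only the fields this sketch needs.
type PageOptions struct {
	Template string
	From     int
	To       int
}

// TemplateRender is copied from std.go: it replaces every "{key}"
// placeholder in template with value.
func TemplateRender(template string, key string, value interface{}) string {
	return strings.Replace(template, "{"+key+"}", fmt.Sprint(value), -1)
}

// ExpandPages is a hypothetical helper: it renders one URL per page number
// in [From, To]. Treating To as inclusive is an assumption.
func ExpandPages(p PageOptions) []string {
	if p.To < p.From {
		return nil
	}
	urls := make([]string, 0, p.To-p.From+1)
	for page := p.From; page <= p.To; page++ {
		urls = append(urls, TemplateRender(p.Template, "page", page))
	}
	return urls
}

func main() {
	pages := PageOptions{
		Template: "http://www.xiladaili.com/http/{page}/",
		From:     1,
		To:       3,
	}
	for _, u := range ExpandPages(pages) {
		fmt.Println(u)
	}
}
```

With `to: 2000` as in the shipped source files, building the full slice up front is cheap; a crawler that respects `delayRange` would more likely generate URLs lazily, one per scheduled request.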