├── .gitignore └── BigDataSystems ├── Petuum ├── BgThreads.md ├── CientTableUpdate.md ├── ImportantClasses.md ├── Introduction-to-parameter-server-system.pptx ├── Introduction-to-parameter-server.pptx ├── MF-logs │ ├── MF-log.pdf │ ├── driver.txt │ ├── matrixfact.ubuntu.xulijie.log.INFO.20141230-152214.13924.txt │ └── matrixfact.ubuntu2.xulijie.log.INFO.20141230-152217.8866.txt ├── Matrix-Factorization-Analysis.md ├── MatrixFactorization.md ├── PetuumArchitecture.md ├── Petuum基本架构.md ├── Petuum基础.md ├── Petuum本地编译运行.md ├── Petuum系统及Table配置.md ├── STRADS.md ├── ServerThreads.md ├── TableCreation.md ├── ThreadInitialization.md ├── figures │ ├── Architecture.png │ ├── BSP-ABSP-SSP.png │ ├── ClientTableUpdate.png │ ├── Compare-BSP-ABSP-SSP.png │ ├── ConsistencyModel.png │ ├── CreateTable.png │ ├── CreateTableThreads.png │ ├── DistributedThreads.png │ ├── LocalThreads.png │ ├── PSTableGroup-Init().png │ ├── Petuum-architecture.png │ ├── Petuum-ps-topology.png │ ├── Petuum架构图.graffle │ ├── Petuum架构图.png │ ├── STRADS-architecture.png │ ├── matrixfact-petuum.png │ ├── matrixfact.png │ ├── parallel-matrixfact.png │ └── petuum-overview.png └── 杂项.md └── Spark ├── Build └── BuildingSpark.md ├── ML ├── Introduction to MLlib Pipeline.md └── figures │ ├── CrossValidatorDemo.png │ ├── DAGpipeline.png │ └── pipelineDemo.png ├── Scheduler ├── SparkResourceManager.graffle ├── SparkScheduler.graffle ├── SparkScheduler.md └── figures │ ├── SparkResourceManager.pdf │ ├── SparkSchedulerAppSubmit.pdf │ ├── SparkStandaloneMaster.pdf │ ├── SparkStandaloneResourceAllocation.pdf │ ├── SparkStandaloneTaskScheduler.pdf │ └── SparkStandaloneTaskSchedulerChinese.pdf └── StackOverflowDiagnosis ├── GraphX_StackOverflow_Cause_Diagnosis.md ├── StackOverflow.md └── figures ├── g1.png ├── g2.png └── g3.png /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/BgThreads.md: -------------------------------------------------------------------------------- 1 | # BgWorkers 2 | 3 | BgWorkers的角色与ServerThreads的角色类似,都是管理本进程里的bg/server threads。BgWorker通过BgContext来管理,ServerThreads通过ServerContext来管理。 4 | 5 | BgContext里面存放了以下数据结构: 6 | 7 | ```c++ 8 | int version; // version of the data, increment when a set of OpLogs 9 | // are sent out; may wrap around 10 | // More specifically, version denotes the version of the 11 | // OpLogs that haven't been sent out. 12 | // version表示client端的最新opLog还没有发送给server 13 | RowRequestOpLogMgr *row_request_oplog_mgr; 14 | 15 | // initialized by BgThreadMain(), used in CreateSendOpLogs() 16 | // For server x, table y, the size of serialized OpLog is ... 17 | map > server_table_oplog_size_map; 18 | // The OpLog msg to each server 19 | map server_oplog_msg_map; 20 | // map server id to oplog msg size 21 | map server_oplog_msg_size_map; 22 | // size of oplog per table, reused across multiple tables 23 | map table_server_oplog_size_map; 24 | 25 | /* Data members needed for server push */ 26 | VectorClock server_vector_clock; 27 | ``` 28 | 29 | ## Bg thread初始化 30 | 31 | 1. 在bg thread初始化时会先打印出来“Bg Worker starts here, my id = 100/1100”。 32 | 2. 
InitBgContext()。设置一下`bg_context->row_request_oplog_mgr = new SSPPushRowRequestOpLogMgr`。然后对PS中的每一个`serverId`,将其放入下列数据结构,`server_table_oplog_size_map.insert(serverId, map())`,`server_oplog_msg_map.insert(serverId, 0)`,`server_oplog_msg_size_map.insert(serverId, 0)`,`table_server_oplog_size_map.insert(serverId, 0)`,`server_vector_clock.AddClock(serverId)`。AddClock会将`serverId, clock=0`放入到`server_vector_clock`中。 33 | 3. BgServerHanshake()。 34 | 35 | ``` 36 | 1. 通过ConnectToNameNodeOrServer(name_node_id)连接Namenode。 37 | 首先打印出"ConnectToNameNodeOrServer server_id"。 38 | 然后将自己的client_id填入到ClientConnectMsg中。 39 | 最后将msg发送给server_id对应的local/remote server thread(这里是Namenode thread)。 40 | 2. 等待Namenode返回的ConnectServerMsg (kConnectServer)消息。 41 | 3. 连接PS里面的每个server thread,仍然是通过ConnectToNameNodeOrServer(server_id)。 42 | 4. 等待,直到收到所有server thread返回的kClientStart信息,每收到一条信息就会打印"get kClientStart from server_id"。 43 | 5. 收到namenode和所有server返回的信息后,退出。 44 | ``` 45 | 4. 解除`pthread_barrier_wait`。 46 | 5. 去接受本进程内的AppInitThread的连接。使用`RecvAppInitThreadConnection()`去接受连接,连接消息类型是kAppConnect。 47 | 6. 如果本bg thread是head bg thread(第一个bg thread)就要承担CreateClientTable的任务,先打印"head bg handles CreateTable",然后调用HandleCreateTables(),然后wait直到Table创建完成。 48 | 7. 最后便进入了无限等待循环,等待接受msg,处理msg。 49 | 50 | ### HandleCreateTables() 51 | 52 | > the app thread shall not submit another create table request before the current one returns as it is blocked waiting 53 | 54 | 1. 假设要create 3 tables,那么会去`comm_bus`索取这每个table的BgCreateTableMsg (kBgCreateTable),然后从msg中提取`staleness, row_type, row_capacity, process_cache_capacity, thread_cache_capacity, oplog_capacity`。 55 | 2. 将`table_id, staleness, row_type, row_capacity`包装成`CreateTableMsg`,然后将该msg发送到Namenode。 56 | 3. 等待接收Namenode的反馈信息CreateTableReplyMsg (kCreateTableReply),收到就说明namenode已经知道head bg thread要创建ClientTable。 57 | 4. 然后可以创建`client_table = new ClientTable(table_id, client_table_config)`。 58 | 5. 将`client_table`放进`map tables`里。 59 | 6. 打印"Reply app thread",回复app init thread表示ClientTable已经创建。 60 | 61 | ### `ClientTable(table_id, client_table_config)` 62 | 63 | 与ServerTable直接存储parameter rows不同,ClientTable是一个逻辑概念,它相当于一个ServerTable的buffer/cache,app thread将最新的参数先写入到这个buffer,然后push到Server上。从Server端pull parameter rows的时候也一样,先pull到ClientTable里面然后读到app thread里面。 64 | 65 | ![](figures/ClientTableUpdate.png) 66 | 67 | 1. 提取`table_id, row_type`。 68 | 2. 创建一个`Row sample_row`,创建这个row只是用来使用Row中的函数,而不是ClientTable中实际存储value的row,实际的row存放在`process_storage`中。 69 | 3. 初始化一下oplog,oplog用于存储parameter的本地更新,也就是实际的updated value。有几个bg thread,就有几个oplog.opLogPartition。 70 | 4. 初始化`process_storage(config.process_cache_capacity)`。`process_storage`被所有thread共享,里面存储了ClientTable的实际rows,但由于`process_storage`有大小限制(row的个数),可能存储ClientTable的一部分,完整的Table存放在Server端。 71 | 5. 初始化`oplog_index`,目前还不知道这个东西是干嘛的? 72 | 6. 设置Table的一致性控制器,如果是SSP协议就使用SSPConsistencyController,如果是SSPPush协议,使用SSPPushConsistencyController。 73 | 74 | ## 当bg thread收到kAppConnect消息 75 | 76 | 1. `++num_connected_app_threads` 77 | 78 | ## 当bg thread收到kRowRequest消息 79 | 80 | 1. 接收到`row_request_msg`,类型是RowRequestMsg。 81 | 2. 调用`CheckForwardRowRequestToServer(sender_id, row_request_msg)`来处理rowRequest消息,`sender_id`就是app thread id。 82 | 83 | ### `CheckForwardRowRequestToServer(app_thread_id, row_request_msg)` 84 | 85 | 1. 从msg中提取出`table_id, row_id, clock`。 86 | 2. 从tables中找到`table_id`对应的ClientTable table。 87 | 3. 提取出table对应的ProcessStorage,并去该storage中查找`row_id`对应的row。 88 | 4. 
如果找到了对应的row,且row的clock满足要求(row.clock >= request.clock),那么只是发一个空RowRequestReplyMsg消息给app thread,然后return。如果没找到对应的row,那就要去server端取,会执行下面的步骤: 89 | 5. 构造一个RowRequestInfo,初始化它的`app_thread_id, clock = row_request_msg.clock, version = bgThread.version - 1`。Version in request denotes the update version that the row on server can see. Which should be 1 less than the current version number。 90 | 6. 将这个RowRequestInfo加入到RowRequestOpLogMgr中,使用`bgThread.row_request_oplog_mgr->AddRowRequest(row_request, table_id, row_id)`。 91 | 7. 如果必须send这个RowRequestInfo(本地最新更新也没有)到server,就会先根据`row_id`计算存储该`row_id`的`server_id`(通过GetRowPartitionServerID(table_id, row_id),只是简单地`server_ids[row_id % num_server]`),然后发`row_request_msg`请求给server。 92 | 93 | ### `SSPRowRequestOpLogMgr.AddRowRequest(row_request, table_id, row_id)` 94 | 95 | 1. 提取出request的version (也就是bgThread.version - 1)。 96 | 2. request.sent = true。 97 | 3. 去`map<(tableId, rowId), list > bgThread.row_request_oplog_mgr.pending_row_requests`里取出`(request.table_id, request.row_id)`对应的list,然后从后往前查看,将request插入到合适的位置,使得prev.clock < request.clock < next.clock。如果插入成功,那么会打印"I'm requesting clock is request.clock. There's a previous request requesting clock is prev.clock."。然后将request.sent设置为false(意思是不用send request到server端,先暂时保存),`request_added`设置为true。 98 | 4. `++version_request_cnt_map[version]`。 99 | 100 | 101 | > 可见在client和server端之间不仅要cache push/pull的parameters,还要cache push/pull的requests。 102 | 103 | ## 当bg thread收到kServerRowRequestReply消息 104 | 105 | 1. 收到ServerRowRequestReplyMsg消息 106 | 2. 处理消息`HandleServerRowRequestReply(server_id, server_row_request_reply_msg)`。 107 | 108 | ### `HandleServerRowRequestReply(server_id, server_row_request_reply_msg)` 109 | 110 | 1. 先从msg中提取出`table_id, row_id, clock, version`。 111 | 2. 从bgWorkers.tables中找到`table_id`对应的ClientTable。 112 | 3. 将msg中的row反序列化出来,放到`Row *row_data`中。 113 | 4. 将msg的version信息添加到`bgThread.row_request_oplog_mgr`中,使用`bgThread.row_request_oplog_mgr->ServerAcknowledgeVersion(server_id, version)`。 114 | 5. 处理row,使用`ApplyOpLogsAndInsertRow(table_id, client_table, row_id, version, row_data, clock)`。 115 | 6. `int clock_to_request = bgThread.row_request_oplog_mgr->InformReply(table_id, row_id, clock, bgThread.version, &app_thread_ids)`。 116 | 7. 如果`clock_to_request > 0`,那么构造RowRequestMsg,将`tabel_id, row_id, clock_to_request`填进msg。根据`table_id, row_id`计算存放该row的server thread,然后将msg发给server,并打印“send to server + serverId”。 117 | 8. 构造一个空的RowRequestReplyMsg,发送给每个app thread。 118 | 119 | 120 | ### `row_request_oplog_mgr.ServerAcknowledgeVersion(server_id, version)` 121 | 目前RowRequestOpLogMgr中的方法都会调用其子类SSPRowRequestOpLogMgr中的方法。本方法目前为空。 122 | 123 | ### `ApplyOpLogsAndInsertRow(table_id, client_table, row_id, version, row_data, clock)` 124 | 125 | Step 1:该函数首先执行`ApplyOldOpLogsToRowData(table_id, client_table, row_id, row_version, row_data)`,具体执行如下步骤: 126 | 127 | 1. 如果msg.version + 1 >= bgThread.version,那么直接return。 128 | 2. 调用`bg_oplog = bgThread.row_request_oplog_mgr->OpLogIterInit(version + 1, bgThread.version - 1)`。 129 | 3. `oplog_version = version + 1`。 130 | 4. 对于每一条`bg_oplog: BgOpLog`执行如下操作: 131 | 5. 得到`table_id`对应的BgOpLogPartitions,使用`BgOpLogPartition *bg_oplog_partition = bg_oplog->Get(table_id)`。 132 | 6. `RowOpLog *row_oplog = bg_oplog_partition->FindOpLog(row_id)`。 133 | 7. 如果`row_oplog`不为空,将RowOpLog中的update都更新到`row_data`上。 134 | 8. 
然后去获得下一条`bg_oplog`,使用`bg_oplog = bgThread.row_request_oplog_mgr->OpLogIterNext(&oplog_version)`。该函数会调用`SSPRowRequestOpLogMgr.GetOpLog(version)`去`version_oplog_map`那里获得oplog。 135 | 136 | BgOpLog和TableOpLog不一样,BgOpLog自带的数据结构是`map table_oplog_map`。BgOpLog由RowRequest OpLogMgr自带的`map version_oplog_map`持有,而RowRequestOpLogMgr由每个bg thread持有。RowRequestOpLogMgr有两个子类:SSPRowRequestOpLogMgr和SSPPushRowRequestOpLogMgr。TableOpLog由每个ClientTable对象持有。BgOpLog对row request进行cache,而TableOpLog对parameter updates进行cache。 137 | 138 | Step 2:`ClientRow *client_row = CreateClientRowFunc(clock, row_data)` 139 | 140 | Step 3:获取ClientTable的oplog,使用`TableOpLog &table_oplog = client_table->get_oplog()`。 141 | 142 | Step 4:提取TableOpLog中对应的row的oplogs,然后更新到`row_data`上。 143 | 144 | Step 5:最后将`(row_id, client_row)`插入到ClientTable的`process_storage`中。 145 | 146 | > 整个过程可以看到,先new出来一个新的row,然后将BgThread.BgOpLog持有的一些RowOpLog更新到row上,接着将ClientTable持有的RowOpLog更新到row上。 147 | 148 | 149 | 150 | 151 | ### `row_request_oplog_mgr.InformReply(table_id, row_id, clock, bgThread.version, &app_thread_ids)` 152 | 153 | 154 | ## SSPRowRequestOpLogMgr逻辑 155 | 156 | 1. 负责持有client**待发往**或者**已发往**server的row requests。这些row不在本地process cache中。 157 | 2. 如果requested row不在本地cache中,bg worker会询问RowRequestMgr是否已经发出了改row的request,如果没有,那么就send该row的request,否则,就等待server response。 158 | 3. 当bg worker收到该row的reply时,它会将该row insert到process cache中,然后使用RowRequestMgr检查哪些buffered row request可以被reply。 159 | 4. 从一个row reqeust被bg worker发到server,到bg worker接收server reply的这段时间内,bg worker可能已经发了多组row update requests到server。Server端会buffer这些row然后等到一定时间再update server端的ServerTable,然后再reply。 160 | 5. Bg worker为每一组updates分配一个单调递增的version number。本地的version number表示已经被发往server的updates最新版本。当一个row request被发送的时候,它会包含本地最新的version number。Server接收和处理messages会按照一定的顺序,当server在处理一个row request的时候,比该row request version小的row requests会先被处理,也就是说server按照version顺序来处理同一row的requests。 161 | 6. 当server buffer一个client发来的row request后,又收到同一个client发来的一组updates的时候,server会增加这个已经被buffer的row request的version。这样,当client收到这个row request的reply的时候,它会通过version知道哪些updates已经被server更新,之后在将row插入到process cache之前,会将missing掉的updates应用到row上。 162 | 7. 
RowRequestMgr也负责跟踪管理sent oplog。一个oplog会一直存在不被删掉,直到在此version之前的row requests都已经被reply。 163 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/CientTableUpdate.md: -------------------------------------------------------------------------------- 1 | # ClientTable Upadte 2 | 3 | ## 总体架构 4 | ![](figures/Architecture.png) 5 | 6 | 这个是物理架构图,但实际实现比这张图复杂。可以看到为了减少Server里Table的访问次数,Petuum在Client端设计了两级缓存,分别是Thread级别和Process级别的缓存。 7 | 8 | ## ClientTable结构图 9 | ![](figures/ClientTableUpdate.png) 10 | 11 | ClientTable实际存放在ProcessStorage中,但相对于ServerTable来说,ProcessStorage中存放的Table只是ServerTable的一部分,甚至可以设置ClientTable的row_num为0,这样就可以减少Client端的内存使用量。 12 | 13 | ## ClientTable初始化 14 | 15 | ClientTable属性: 16 | 17 | | Name | Default | Description | L Table | 18 | |:-----|:------|:-------|:-------| 19 | | table\_info.row\_type| N/A | row type (e.g., 0 表示 DenseRow) | 0 | 20 | | process\_cache\_capacity| 0 | Table 里的 row个数| matrix.getN() | 21 | | table\_info.row\_capacity| 0 | 对于 DenseRow,指column个数,对SparseRow无效| K | 22 | | table\_info.table_staleness | 0 | SSP staleness | 0 | 23 | | table\_info.oplog\_capacity | 0 | OpLogTable里面最多可以写入多少个row | 100 | 24 | 25 | 每个bg thread持有一个OpLogTable,OpLogTable的`row_num = oplog_capacity / bg_threads_num`。 26 | 27 | 代码分析: 28 | 29 | ```c++ 30 | void SSPConsistencyController::BatchInc(int32_t row_id, 31 | const int32_t* column_ids, const void* updates, int32_t num_updates) { 32 | 33 | // updates就是每个col上要increase的value。 34 | // 比如,col 1和col 3都要加1,那么column_ids = {1, 3},updates = {1, 1} 35 | // thread_cache_是ThreadTable的指针,ThreadTable就是ClientTable或者ServerTable 36 | // IndexUpadte(row_id)会 37 | thread_cache_->IndexUpdate(row_id); 38 | 39 | OpLogAccessor oplog_accessor; 40 | oplog_.FindInsertOpLog(row_id, &oplog_accessor); 41 | 42 | const uint8_t* deltas_uint8 = reinterpret_cast(updates); 43 | 44 | for (int i = 0; i < num_updates; ++i) { 45 | void *oplog_delta = oplog_accessor.FindCreate(column_ids[i]); 46 | sample_row_->AddUpdates(column_ids[i], oplog_delta, deltas_uint8 47 | + sample_row_->get_update_size()*i); 48 | } 49 | 50 | RowAccessor row_accessor; 51 | bool found = process_storage_.Find(row_id, &row_accessor); 52 | if (found) { 53 | row_accessor.GetRowData()->ApplyBatchInc(column_ids, updates, 54 | num_updates); 55 | } 56 | } 57 | ``` 58 | 59 | 60 | ## ClientTable属性解释 61 | 62 | ```c++ 63 | Class ClientTable { 64 | private: 65 | // table Id 66 | int32_t table_id_; 67 | // Table里面row的类型,比如DenseRow 68 | int32_t row_type_; 69 | // Row的游标(指针) 70 | const AbstractRow* const sample_row_; 71 | // Table的更新日志 72 | TableOpLog oplog_; 73 | // 进程里cache的Table 74 | ProcessStorage process_storage_; 75 | // Table的一致性控制协议 76 | AbstractConsistencyController *consistency_controller_; 77 | 78 | // ThreadTable就是ClientTable或者ServerTable 79 | // thread_cahce就是Threads维护的ClientTable的全局对象 80 | boost::thread_specific_ptr thread_cache_; 81 | // 操作日志,每个bg thread对应一个index value 82 | TableOpLogIndex oplog_index_; 83 | } 84 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Petuum/ImportantClasses.md: -------------------------------------------------------------------------------- 1 | # Important Classes 2 | 3 | ## ClientTable 4 | 5 | ```c++ 6 | class ClientTable : public AbstractClientTable { 7 | public: 8 | // Instantiate AbstractRow, TableOpLog, and ProcessStorage using config. 
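  // Construction, as walked through in BgThreads.md of this repo:
  //  - keeps table_id / row_type from config, and news up sample_row_, which is
  //    only used for its member functions, never for storing actual values;
  //  - initializes oplog_ (one OpLogPartition per bg thread) to buffer local
  //    parameter updates before they are pushed to the servers;
  //  - sizes process_storage_ with process_cache_capacity; the process cache
  //    may hold only part of the table, the complete table lives on the servers;
  //  - initializes oplog_index_;
  //  - installs SSPConsistencyController or SSPPushConsistencyController as
  //    consistency_controller_, depending on the configured consistency protocol.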
9 | ClientTable(int32_t table_id, const ClientTableConfig& config); 10 | 11 | ~ClientTable(); 12 | 13 | void RegisterThread(); 14 | 15 | void GetAsync(int32_t row_id); 16 | void WaitPendingAsyncGet(); 17 | void ThreadGet(int32_t row_id, ThreadRowAccessor *row_accessor); 18 | void ThreadInc(int32_t row_id, int32_t column_id, const void *update); 19 | void ThreadBatchInc(int32_t row_id, const int32_t* column_ids, 20 | const void* updates, 21 | int32_t num_updates); 22 | void FlushThreadCache(); 23 | 24 | void Get(int32_t row_id, RowAccessor *row_accessor); 25 | void Inc(int32_t row_id, int32_t column_id, const void *update); 26 | void BatchInc(int32_t row_id, const int32_t* column_ids, const void* updates, 27 | int32_t num_updates); 28 | 29 | void Clock(); 30 | cuckoohash_map *GetAndResetOpLogIndex(int32_t client_table); 31 | 32 | ProcessStorage& get_process_storage () { 33 | return process_storage_; 34 | } 35 | 36 | TableOpLog& get_oplog () { 37 | return oplog_; 38 | } 39 | 40 | const AbstractRow* get_sample_row () const { 41 | return sample_row_; 42 | } 43 | 44 | int32_t get_row_type () const { 45 | return row_type_; 46 | } 47 | 48 | private: 49 | int32_t table_id_; 50 | int32_t row_type_; 51 | // 指向每一个row的指针 52 | const AbstractRow* const sample_row_; 53 | // Table操作日志 54 | TableOpLog oplog_; 55 | // 进程的Table 56 | ProcessStorage process_storage_; 57 | // Table的一致性controller 58 | AbstractConsistencyController *consistency_controller_; 59 | 60 | // ThreadTable指针 61 | boost::thread_specific_ptr thread_cache_; 62 | // Table操作日志的index 63 | TableOpLogIndex oplog_index_; 64 | }; 65 | 66 | } // namespace petuum 67 | ``` 68 | 69 | ## ThreadTable 70 | 71 | ```c++ 72 | class ThreadTable : boost::noncopyable { 73 | public: 74 | explicit ThreadTable(const AbstractRow *sample_row); 75 | ~ThreadTable(); 76 | void IndexUpdate(int32_t row_id); 77 | void FlushOpLogIndex(TableOpLogIndex &oplog_index); 78 | 79 | AbstractRow *GetRow(int32_t row_id); 80 | void InsertRow(int32_t row_id, const AbstractRow *to_insert); 81 | void Inc(int32_t row_id, int32_t column_id, const void *delta); 82 | void BatchInc(int32_t row_id, const int32_t *column_ids, 83 | const void *deltas, int32_t num_updates); 84 | 85 | void FlushCache(ProcessStorage &process_storage, TableOpLog &table_oplog, 86 | const AbstractRow *sample_row); 87 | 88 | private: 89 | // Vector[set, set, set, ..., set] 90 | std::vector > oplog_index_; 91 | // HashMap 92 | boost::unordered_map row_storage_; 93 | // HashMap 94 | boost::unordered_map oplog_map_; 95 | // Row指针 96 | const AbstractRow *sample_row_; 97 | }; 98 | ``` 99 | 100 | ## TableGroup 101 | ```c++ 102 | class TableGroup : public AbstractTableGroup { 103 | public: 104 | TableGroup(const TableGroupConfig &table_group_config, 105 | bool table_access, int32_t *init_thread_id); 106 | 107 | ~TableGroup(); 108 | 109 | bool CreateTable(int32_t table_id, 110 | const ClientTableConfig& table_config); 111 | 112 | void CreateTableDone(); 113 | 114 | void WaitThreadRegister(); 115 | 116 | AbstractClientTable *GetTableOrDie(int32_t table_id) { 117 | auto iter = tables_.find(table_id); 118 | CHECK(iter != tables_.end()) << "Table " << table_id << " does not exist"; 119 | return static_cast(iter->second); 120 | } 121 | 122 | int32_t RegisterThread(); 123 | 124 | void DeregisterThread(); 125 | 126 | void Clock(); 127 | 128 | void GlobalBarrier(); 129 | 130 | private: 131 | typedef void (TableGroup::*ClockFunc) (); 132 | ClockFunc ClockInternal; 133 | 134 | void ClockAggressive(); 135 | void ClockConservative(); 
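  // Clock() is dispatched through the ClockInternal member-function pointer
  // to either ClockAggressive() or ClockConservative(). (Assumption: the
  // binding is chosen from table_group_config at construction time, with the
  // aggressive variant propagating clock ticks to the bg workers more eagerly
  // than the conservative one, as the names suggest.)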
136 | 137 | // TreeMap 138 | std::map tables_; 139 | // barrier 140 | pthread_barrier_t register_barrier_; 141 | // 注册的app thread(也就是worker thread)数目 142 | std::atomic num_app_threads_registered_; 143 | 144 | // Max staleness among all tables. 145 | int32_t max_table_staleness_; 146 | // Table处于第几个clock里面 147 | VectorClockMT vector_clock_; 148 | }; 149 | ``` 150 | 151 | ## SSPClientRow 152 | 153 | ```c++ 154 | // ClientRow is a wrapper on user-defined ROW data structure (e.g., vector, 155 | // map) with additional features: 156 | // 157 | // 1. Reference Counting: number of references used by application. Note the 158 | // copy in storage itself does not contribute to the count 159 | // 2. Row Metadata 160 | // 161 | // ClientRow does not provide thread-safety in itself. The locks are 162 | // maintained in the storage and in (user-defined) ROW. 163 | class SSPClientRow : public ClientRow { 164 | public: 165 | // ClientRow takes ownership of row_data. 166 | SSPClientRow(int32_t clock, AbstractRow* row_data): 167 | ClientRow(clock, row_data), 168 | clock_(clock){ } 169 | 170 | void SetClock(int32_t clock) { 171 | std::unique_lock ulock(clock_mtx_); 172 | clock_ = clock; 173 | } 174 | 175 | int32_t GetClock() const { 176 | std::unique_lock ulock(clock_mtx_); 177 | return clock_; 178 | } 179 | 180 | // Take row_data_pptr_ from other and destroy other. Existing ROW will not 181 | // be accessible any more, but will stay alive until all RowAccessors 182 | // referencing the ROW are destroyed. Accesses to SwapAndDestroy() and 183 | // GetRowDataPtr() must be mutually exclusive as they the former modifies 184 | // row_data_pptr_. 185 | void SwapAndDestroy(ClientRow* other) { 186 | clock_ = dynamic_cast(other)->clock_; 187 | ClientRow::SwapAndDestroy(other); 188 | } 189 | 190 | private: // private members 191 | mutable std::mutex clock_mtx_; 192 | int32_t clock_; 193 | }; 194 | ``` 195 | 196 | 197 | ## SerializedRowReader 198 | 199 | ```c++ 200 | // Provide sequential access to a byte string that's serialized rows. 201 | // Used to facilicate server reading row data. 202 | 203 | // st_separator : serialized_table_separator 204 | // st_end : serialized_table_end 205 | 206 | // Tables are serialized as the following memory layout 207 | // 1. int32_t : table id, could be st_separator or st_end 208 | // 2. int32_t : row id, could be st_separator or st_end 209 | // 3. size_t : serialized row size 210 | // 4. row data 211 | // repeat 1, 2, 3, 4 212 | // st_separator can not happen right after st_separator 213 | // st_end can not happen right after st_separator 214 | 215 | // Rules for serialization: 216 | // The serialized row data is guaranteed to end when seeing a st_end or with 217 | // finish reading the entire memory buffer. 218 | // When seeing a st_separator, there could be another table or no table 219 | // following. The latter happens only when the buffer reaches its memory 220 | // boundary. 
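// Illustrative byte layout (example values, not taken from real data): a
// buffer holding table 0 with two rows, followed by an empty table 1 that is
// the last table, would read as
//   [int32: 0]                       <- table id (consumed by Restart())
//   [int32: 3][size_t: len_a][len_a bytes of row data]
//   [int32: 7][size_t: len_b][len_b bytes of row data]
//   [int32: st_separator][int32: 1]  <- switch to table 1
//   [int32: st_end]                  <- table 1 is empty, nothing follows
//
// Minimal usage sketch, assuming mem/mem_size hold such a buffer:
//   SerializedRowReader reader(mem, mem_size);
//   if (reader.Restart()) {
//     int32_t table_id, row_id;
//     size_t row_size;
//     const void *data;
//     while ((data = reader.Next(&table_id, &row_id, &row_size)) != NULL) {
//       // apply row_size bytes at data to (table_id, row_id) on the server
//     }
//   }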
221 | 222 | class SerializedRowReader : boost::noncopyable { 223 | public: 224 | // does not take ownership 225 | SerializedRowReader(const void *mem, size_t mem_size): 226 | mem_(reinterpret_cast(mem)), 227 | mem_size_(mem_size) { 228 | VLOG(0) << "mem_size_ = " << mem_size_; 229 | } 230 | ~SerializedRowReader() { } 231 | 232 | bool Restart() { 233 | offset_ = 0; 234 | current_table_id_ = *(reinterpret_cast(mem_ + offset_)); 235 | offset_ += sizeof(int32_t); 236 | 237 | if (current_table_id_ == GlobalContext::get_serialized_table_end()) 238 | return false; 239 | return true; 240 | } 241 | 242 | const void *Next(int32_t *table_id, int32_t *row_id, size_t *row_size) { 243 | // When starting, there are 4 possiblilities: 244 | // 1. finished reading the mem buffer 245 | // 2. encounter the end of an table but there are other tables following 246 | // (st_separator) 247 | // 3. encounter the end of an table but there is no other table following 248 | // (st_end) 249 | // 4. normal row data 250 | 251 | if (offset_ + sizeof (int32_t) > mem_size_) 252 | return NULL; 253 | *row_id = *(reinterpret_cast(mem_ + offset_)); 254 | offset_ += sizeof(int32_t); 255 | 256 | do { 257 | if (*row_id == GlobalContext::get_serialized_table_separator()) { 258 | if (offset_ + sizeof (int32_t) > mem_size_) 259 | return NULL; 260 | 261 | current_table_id_ = *(reinterpret_cast(mem_ + offset_)); 262 | offset_ += sizeof(int32_t); 263 | 264 | if (offset_ + sizeof (int32_t) > mem_size_) 265 | return NULL; 266 | 267 | *row_id = *(reinterpret_cast(mem_ + offset_)); 268 | offset_ += sizeof(int32_t); 269 | // row_id could be 270 | // 1) st_separator: if the table is empty and there there are other 271 | // tables following; 272 | // 2) st_end: if the table is empty and there are no more table 273 | // following 274 | continue; 275 | } else if (*row_id == GlobalContext::get_serialized_table_end()) { 276 | return NULL; 277 | } else { 278 | *table_id = current_table_id_; 279 | *row_size = *(reinterpret_cast(mem_ + offset_)); 280 | offset_ += sizeof(size_t); 281 | const void *data_mem = mem_ + offset_; 282 | offset_ += *row_size; 283 | //VLOG(0) << "mem read offset = " << offset_; 284 | return data_mem; 285 | } 286 | }while(1); 287 | } 288 | 289 | private: 290 | const uint8_t *mem_; 291 | size_t mem_size_; 292 | size_t offset_; // bytes to be read next 293 | int32_t current_table_id_; 294 | }; 295 | ``` 296 | 297 | ## ProcessStorage 298 | 299 | ```c++ 300 | // ProcessStorage is shared by all threads. 301 | // 302 | // TODO(wdai): Include thread storage in ProcessStorage. 303 | class ProcessStorage { 304 | public: 305 | // capacity is the upper bound of the number of rows this ProcessStorage 306 | // can store. 307 | explicit ProcessStorage(int32_t capacity, size_t lock_pool_size); 308 | 309 | ~ProcessStorage(); 310 | 311 | // Find row row_id; row_accessor is a read-only smart pointer. Return true 312 | // if found, false otherwise. Note that the # of active row_accessor 313 | // cannot be close to capacity, or Insert() will have undefined behavior 314 | // as we may not be able to evict any row that's not being referenced by 315 | // row_accessor. 316 | bool Find(int32_t row_id, RowAccessor* row_accessor); 317 | 318 | // Check if a row exists, does not count as one access 319 | bool Find(int32_t row_id); 320 | 321 | // Insert a row, and take ownership of client_row. Return true if row_id 322 | // does not already exist (possibly evicting another row), false if row 323 | // row_id already exists and is updated. 
If hitting capacity, then evict a 324 | // row using ClockLRU. Return read reference and evicted row id if 325 | // row_accessor and evicted_row_id is supplied. We assume 326 | // row_id is always non-negative, and use *evicted_row_id = -1 if no row 327 | // is evicted. The evicted row is guaranteed to have 0 reference count 328 | // (i.e., no application is using). 329 | // 330 | // Note: To stay below the capacity, we first check num_rows_. If 331 | // num_rows_ >= capacity_, we subtract (num_rows_ - capacity_) from 332 | // num_rows_ and then evict (num_rows_ - capacity_ + 1) rows using 333 | // ClockLRU before inserting. This could result in over-eviction when two 334 | // threads simultaneously do this eviction, but this is fine. 335 | // 336 | // TODO(wdai): Watch out when over-eviction clears the inactive list. 337 | bool Insert(int32_t row_id, ClientRow* client_row); 338 | bool Insert(int32_t row_id, ClientRow* client_row, 339 | RowAccessor* row_accessor, int32_t* evicted_row_id = 0); 340 | 341 | bool Insert(int32_t row_id, ClientRow* client_row, 342 | RowAccessor *row_accessor, int32_t *evicted_row_id, 343 | ClientRow** evicted_row); 344 | 345 | private: // private functions 346 | // Evict one inactive row using CLOCK replacement algorithm. 347 | void EvictOneInactiveRow(); 348 | 349 | // Find row_id in storage_map_, assuming there is lock on row_id. If 350 | // found, update it with client_row, reference LRU, and set row_accessor 351 | // accordingly, and return true. Return false if row_id is not found. 352 | bool FindAndUpdate(int32_t row_id, ClientRow* client_row); 353 | bool FindAndUpdate(int32_t row_id, ClientRow* client_row, 354 | RowAccessor* row_accessor); 355 | 356 | private: // private members 357 | // Number of rows allowed in this storage. 358 | int32_t capacity_; 359 | 360 | // Number of rows in the storage. We choose not to use Cuckoo's size() 361 | // which is more expensive. 362 | std::atomic num_rows_; 363 | 364 | // Shared map with ClockLRU. The key type is row_id (int32_t), and the 365 | // value type consists of a ClientRow* pointer (void*) and a slot # 366 | // (int32_t). 367 | // HashMap 368 | cuckoohash_map > storage_map_; 369 | 370 | // Depends on storage_map_, thus need to be initialized after it. 371 | ClockLRU clock_lru_; 372 | 373 | // Lock pool. 
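  // (Presumably each row_id is hashed onto one of the lock_pool_size stripes
  //  passed to the constructor, so that threads touching different rows rarely
  //  contend on the same lock; FindAndUpdate() above assumes the caller already
  //  holds the lock for row_id.)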
374 | StripedLock locks_; 375 | }; 376 | ``` 377 | 378 | ## RowOpLog 379 | ```c++ 380 | class RowOpLog : boost::noncopyable { 381 | public: 382 | RowOpLog(uint32_t update_size, InitUpdateFunc InitUpdate): 383 | update_size_(update_size), 384 | InitUpdate_(InitUpdate) { } 385 | 386 | ~RowOpLog() { 387 | auto iter = oplogs_.begin(); 388 | for (; iter != oplogs_.end(); iter++) { 389 | delete reinterpret_cast(iter->second); 390 | } 391 | } 392 | 393 | void* Find(int32_t col_id) { 394 | auto iter = oplogs_.find(col_id); 395 | if (iter == oplogs_.end()) { 396 | return 0; 397 | } 398 | return iter->second; 399 | } 400 | 401 | const void* FindConst(int32_t col_id) const { 402 | auto iter = oplogs_.find(col_id); 403 | if (iter == oplogs_.end()) { 404 | return 0; 405 | } 406 | return iter->second; 407 | } 408 | 409 | void* FindCreate(int32_t col_id) { 410 | auto iter = oplogs_.find(col_id); 411 | if (iter == oplogs_.end()) { 412 | void* update = reinterpret_cast(new uint8_t[update_size_]); 413 | InitUpdate_(col_id, update); 414 | oplogs_[col_id] = update; 415 | return update; 416 | } 417 | return iter->second; 418 | } 419 | 420 | // Guaranteed ordered traversal 421 | void* BeginIterate(int32_t *column_id) { 422 | iter_ = oplogs_.begin(); 423 | if (iter_ == oplogs_.end()) { 424 | return 0; 425 | } 426 | *column_id = iter_->first; 427 | return iter_->second; 428 | } 429 | 430 | void* Next(int32_t *column_id) { 431 | iter_++; 432 | if (iter_ == oplogs_.end()) { 433 | return 0; 434 | } 435 | *column_id = iter_->first; 436 | return iter_->second; 437 | } 438 | 439 | // Guaranteed ordered traversal, in ascending order of column_id 440 | const void* BeginIterateConst(int32_t *column_id) const { 441 | const_iter_ = oplogs_.cbegin(); 442 | if (const_iter_ == oplogs_.cend()) { 443 | return 0; 444 | } 445 | *column_id = const_iter_->first; 446 | return const_iter_->second; 447 | } 448 | 449 | const void* NextConst(int32_t *column_id) const { 450 | const_iter_++; 451 | if (const_iter_ == oplogs_.cend()) { 452 | return 0; 453 | } 454 | *column_id = const_iter_->first; 455 | return const_iter_->second; 456 | } 457 | 458 | int32_t GetSize() const { 459 | return oplogs_.size(); 460 | } 461 | 462 | private: 463 | // 464 | const uint32_t update_size_; 465 | // TreeMap 466 | std::map oplogs_; 467 | // 最初的update函数 468 | InitUpdateFunc InitUpdate_; 469 | 470 | std::map::iterator iter_; 471 | mutable std::map::const_iterator const_iter_; 472 | }; 473 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Introduction-to-parameter-server-system.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/Introduction-to-parameter-server-system.pptx -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Introduction-to-parameter-server.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/Introduction-to-parameter-server.pptx -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/MF-log.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/MF-logs/MF-log.pdf -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/driver.txt: -------------------------------------------------------------------------------- 1 | xulijie@ubuntu:~/dev/Petuum/petuum-0.93/apps/matrixfact$ Data mode: Loading matrix sampledata/9x9_3blocks into memory... 2 | Matrix dimensions: 9 by 9 3 | # non-missing entries: 81 4 | Factorization rank: 3 5 | # client machines: 2 6 | # worker threads per client: 2 7 | SSP staleness: 5 8 | Step size formula: 0.5 * (100 + t)^(-0.5) 9 | Regularization strength lambda: 0.1 10 | (Note: displayed loss function does not include regularization term) 11 | Iteration 1/2... loss function = 81.5027... elapsed time = 0.082 12 | Iteration 2/2... loss function = 157.365... elapsed time = 0.007 13 | Outputting results to prefix mf_output ... done 14 | total runtime = 2.925s 15 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/matrixfact.ubuntu.xulijie.log.INFO.20141230-152214.13924.txt: -------------------------------------------------------------------------------- 1 | Log file created at: 2014/12/30 15:22:14 2 | Running on machine: ubuntu 3 | Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg 4 | I1230 15:22:14.711292 13924 comm_bus.cpp:117] CommBus ThreadRegister() 5 | I1230 15:22:14.711571 13925 comm_bus.cpp:117] CommBus ThreadRegister() 6 | I1230 15:22:14.712108 13925 name_node_thread.cpp:126] Number total_bg_threads() = 2 7 | I1230 15:22:14.712117 13925 name_node_thread.cpp:128] Number total_server_threads() = 2 8 | I1230 15:22:14.712414 13924 server_threads.cpp:92] RowSubscribe = SSPPushRowSubscribe 9 | I1230 15:22:14.712421 13924 server_threads.cpp:106] Create server thread 0 10 | I1230 15:22:14.712700 13928 server_threads.cpp:239] ServerThreads num_clients = 2 11 | I1230 15:22:14.712708 13928 server_threads.cpp:240] my id = 1 12 | I1230 15:22:14.712713 13928 server_threads.cpp:246] network addr = 192.168.40.100:10000 13 | I1230 15:22:14.712718 13928 comm_bus.cpp:117] CommBus ThreadRegister() 14 | I1230 15:22:14.712924 13928 server_threads.cpp:252] Server thread registered CommBus 15 | I1230 15:22:14.712944 13928 server_threads.cpp:141] Connect to local name node 16 | I1230 15:22:14.713012 13925 name_node_thread.cpp:142] Name node gets server 1 17 | I1230 15:22:14.713145 13929 bg_workers.cpp:889] Bg Worker starts here, my_id = 100 18 | I1230 15:22:14.713166 13929 comm_bus.cpp:117] CommBus ThreadRegister() 19 | I1230 15:22:14.713189 13929 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 0 20 | I1230 15:22:14.713193 13929 bg_workers.cpp:290] Connect to local server 0 21 | I1230 15:22:14.713238 13925 name_node_thread.cpp:139] Name node gets client 100 22 | I1230 15:22:17.447321 13925 name_node_thread.cpp:142] Name node gets server 1000 23 | I1230 15:22:17.447495 13925 name_node_thread.cpp:139] Name node gets client 1100 24 | I1230 15:22:17.447517 13925 name_node_thread.cpp:149] Has received connections from all clients and servers, sending out connect_server_msg 25 | I1230 15:22:17.447549 13925 name_node_thread.cpp:156] Send connect_server_msg done 26 | I1230 15:22:17.447556 13925 name_node_thread.cpp:162] InitNameNode done 27 | I1230 15:22:17.449017 13929 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1 28 | I1230 15:22:17.449028 13929 bg_workers.cpp:290] Connect 
to local server 1 29 | I1230 15:22:17.449089 13929 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1000 30 | I1230 15:22:17.449095 13929 bg_workers.cpp:293] Connect to remote server 1000 31 | I1230 15:22:17.449098 13929 bg_workers.cpp:296] server_addr = 192.168.40.101:10000 32 | I1230 15:22:17.450561 13928 server_threads.cpp:187] InitNonNameNode done 33 | I1230 15:22:17.470659 13929 bg_workers.cpp:368] get kClientStart from 0 num_started_servers = 0 34 | I1230 15:22:17.470669 13929 bg_workers.cpp:368] get kClientStart from 1 num_started_servers = 1 35 | I1230 15:22:17.470676 13929 bg_workers.cpp:368] get kClientStart from 1000 num_started_servers = 2 36 | I1230 15:22:17.472113 13925 name_node_thread.cpp:308] msg_type = 4 37 | I1230 15:22:17.472244 13929 bg_workers.cpp:911] head bg handles CreateTable 38 | I1230 15:22:17.472594 13925 name_node_thread.cpp:308] msg_type = 5 39 | I1230 15:22:17.472605 13925 name_node_thread.cpp:308] msg_type = 4 40 | I1230 15:22:17.472697 13925 name_node_thread.cpp:308] msg_type = 5 41 | I1230 15:22:17.473858 13929 oplog_index.cpp:42] Constructor shared_oplog_index = 0x1e358c0 42 | I1230 15:22:17.473875 13929 bg_workers.cpp:439] Reply app thread 200 43 | I1230 15:22:17.475160 13925 name_node_thread.cpp:308] msg_type = 4 44 | I1230 15:22:17.475195 13925 name_node_thread.cpp:308] msg_type = 4 45 | I1230 15:22:17.475469 13925 name_node_thread.cpp:308] msg_type = 5 46 | I1230 15:22:17.475558 13925 name_node_thread.cpp:308] msg_type = 5 47 | I1230 15:22:17.476692 13929 oplog_index.cpp:42] Constructor shared_oplog_index = 0x1e35840 48 | I1230 15:22:17.476702 13929 bg_workers.cpp:439] Reply app thread 200 49 | I1230 15:22:17.477852 13925 name_node_thread.cpp:308] msg_type = 4 50 | I1230 15:22:17.477886 13925 name_node_thread.cpp:308] msg_type = 4 51 | I1230 15:22:17.478170 13925 name_node_thread.cpp:308] msg_type = 5 52 | I1230 15:22:17.478266 13925 name_node_thread.cpp:308] msg_type = 5 53 | I1230 15:22:17.479367 13929 oplog_index.cpp:42] Constructor shared_oplog_index = 0x1e35c00 54 | I1230 15:22:17.479375 13929 bg_workers.cpp:439] Reply app thread 200 55 | I1230 15:22:17.486426 13934 comm_bus.cpp:117] CommBus ThreadRegister() 56 | I1230 15:22:17.488441 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 57 | I1230 15:22:17.488484 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 58 | I1230 15:22:17.488497 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 59 | I1230 15:22:17.488502 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 60 | I1230 15:22:17.488507 13928 server.cpp:202] Read and Apply Update Done 61 | I1230 15:22:17.488517 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 62 | I1230 15:22:17.488519 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 63 | I1230 15:22:17.488523 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 64 | I1230 15:22:17.488526 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 65 | I1230 15:22:17.488529 13928 server.cpp:202] Read and Apply Update Done 66 | I1230 15:22:17.488535 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 67 | I1230 15:22:17.488538 13928 
serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 68 | I1230 15:22:17.488541 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 69 | I1230 15:22:17.488544 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 70 | I1230 15:22:17.488548 13928 server.cpp:202] Read and Apply Update Done 71 | I1230 15:22:17.488554 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 72 | I1230 15:22:17.488556 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 73 | I1230 15:22:17.488560 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 74 | I1230 15:22:17.488562 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 75 | I1230 15:22:17.488565 13928 server.cpp:202] Read and Apply Update Done 76 | I1230 15:22:17.488571 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 77 | I1230 15:22:17.488590 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 78 | I1230 15:22:17.488595 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 79 | I1230 15:22:17.488598 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 80 | I1230 15:22:17.488601 13928 server.cpp:202] Read and Apply Update Done 81 | I1230 15:22:17.488610 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 82 | I1230 15:22:17.488615 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 83 | I1230 15:22:17.488617 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 84 | I1230 15:22:17.488620 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 85 | I1230 15:22:17.488623 13928 server.cpp:202] Read and Apply Update Done 86 | I1230 15:22:17.488647 13933 comm_bus.cpp:117] CommBus ThreadRegister() 87 | I1230 15:22:17.491765 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 88 | I1230 15:22:17.491780 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 89 | I1230 15:22:17.491786 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 90 | I1230 15:22:17.491789 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 91 | I1230 15:22:17.491792 13928 server.cpp:202] Read and Apply Update Done 92 | I1230 15:22:17.492027 13928 server.cpp:236] Serializing table 2 93 | I1230 15:22:17.492035 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 94 | I1230 15:22:17.492074 13928 server.cpp:236] Serializing table 1 95 | I1230 15:22:17.492079 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 96 | I1230 15:22:17.492082 13928 server.cpp:236] Serializing table 0 97 | I1230 15:22:17.492085 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 98 | I1230 15:22:17.493206 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 4294967295Set server version = 0 99 | I1230 15:22:17.493221 13929 
serialized_row_reader.hpp:64] mem_size_ = 24 100 | I1230 15:22:17.495124 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 101 | I1230 15:22:17.495134 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 102 | I1230 15:22:17.495137 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 103 | I1230 15:22:17.495141 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 104 | I1230 15:22:17.495144 13928 server.cpp:202] Read and Apply Update Done 105 | I1230 15:22:17.495151 13928 server.cpp:236] Serializing table 2 106 | I1230 15:22:17.495156 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 107 | I1230 15:22:17.495159 13928 server.cpp:236] Serializing table 1 108 | I1230 15:22:17.495162 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 109 | I1230 15:22:17.495165 13928 server.cpp:236] Serializing table 0 110 | I1230 15:22:17.495168 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 111 | I1230 15:22:17.495184 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 112 | I1230 15:22:17.495188 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 113 | I1230 15:22:17.495193 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 114 | I1230 15:22:17.495195 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 115 | I1230 15:22:17.495198 13928 server.cpp:202] Read and Apply Update Done 116 | I1230 15:22:17.495630 13928 server.cpp:236] Serializing table 2 117 | I1230 15:22:17.495647 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 118 | I1230 15:22:17.495651 13928 server.cpp:236] Serializing table 1 119 | I1230 15:22:17.495653 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 120 | I1230 15:22:17.495657 13928 server.cpp:236] Serializing table 0 121 | I1230 15:22:17.495661 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 122 | I1230 15:22:17.495678 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 123 | I1230 15:22:17.495682 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 124 | I1230 15:22:17.495686 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 125 | I1230 15:22:17.495688 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 126 | I1230 15:22:17.495692 13928 server.cpp:202] Read and Apply Update Done 127 | I1230 15:22:17.496121 13928 server.cpp:236] Serializing table 2 128 | I1230 15:22:17.496136 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 129 | I1230 15:22:17.496140 13928 server.cpp:236] Serializing table 1 130 | I1230 15:22:17.496143 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 131 | I1230 15:22:17.496146 13928 server.cpp:236] Serializing table 0 132 | I1230 15:22:17.496150 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 133 | I1230 15:22:17.496168 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 134 | I1230 15:22:17.496172 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 135 | I1230 15:22:17.496176 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 
update_size = 4 rows_left_in_current_table_ = 0 136 | I1230 15:22:17.496179 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 137 | I1230 15:22:17.496182 13928 server.cpp:202] Read and Apply Update Done 138 | I1230 15:22:17.496492 13928 server.cpp:236] Serializing table 2 139 | I1230 15:22:17.496546 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 140 | I1230 15:22:17.496551 13928 server.cpp:236] Serializing table 1 141 | I1230 15:22:17.496554 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 142 | I1230 15:22:17.496557 13928 server.cpp:236] Serializing table 0 143 | I1230 15:22:17.496561 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 144 | I1230 15:22:17.496572 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 145 | I1230 15:22:17.496574 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 146 | I1230 15:22:17.496578 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 147 | I1230 15:22:17.496598 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 148 | I1230 15:22:17.496603 13928 server.cpp:202] Read and Apply Update Done 149 | I1230 15:22:17.496997 13928 server.cpp:236] Serializing table 2 150 | I1230 15:22:17.497007 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 151 | I1230 15:22:17.497011 13928 server.cpp:236] Serializing table 1 152 | I1230 15:22:17.497014 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 153 | I1230 15:22:17.497017 13928 server.cpp:236] Serializing table 0 154 | I1230 15:22:17.497020 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 155 | I1230 15:22:17.497136 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 0Set server version = 1 156 | I1230 15:22:17.497144 13929 serialized_row_reader.hpp:64] mem_size_ = 24 157 | I1230 15:22:17.497153 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 1Set server version = 2 158 | I1230 15:22:17.497158 13929 serialized_row_reader.hpp:64] mem_size_ = 24 159 | I1230 15:22:17.497164 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 2Set server version = 3 160 | I1230 15:22:17.497167 13929 serialized_row_reader.hpp:64] mem_size_ = 24 161 | I1230 15:22:17.497174 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 3Set server version = 4 162 | I1230 15:22:17.497179 13929 serialized_row_reader.hpp:64] mem_size_ = 24 163 | I1230 15:22:17.497184 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 4Set server version = 5 164 | I1230 15:22:17.497189 13929 serialized_row_reader.hpp:64] mem_size_ = 24 165 | I1230 15:22:17.497195 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 4294967295Set server version = 0 166 | I1230 15:22:17.497200 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 0 167 | I1230 15:22:17.497208 13929 server_version_mgr.cpp:92] New min_version_ = 0 168 | I1230 15:22:17.497212 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 0 169 | I1230 15:22:17.497231 13929 serialized_row_reader.hpp:64] mem_size_ = 24 170 | I1230 15:22:17.497562 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 0Set server version = 1 171 | I1230 15:22:17.497570 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 1 172 | I1230 15:22:17.497573 13929 server_version_mgr.cpp:92] New min_version_ = 1 173 | I1230 15:22:17.497577 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 1 174 | I1230 15:22:17.497607 13929 serialized_row_reader.hpp:64] mem_size_ = 24 175 | I1230 15:22:17.497620 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 1Set server version = 2 176 | I1230 15:22:17.497624 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 2 177 | I1230 15:22:17.497628 13929 server_version_mgr.cpp:92] New min_version_ = 2 178 | I1230 15:22:17.497632 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 2 179 | I1230 15:22:17.497635 13929 serialized_row_reader.hpp:64] mem_size_ = 24 180 | I1230 15:22:17.497642 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 2Set server version = 3 181 | I1230 15:22:17.497645 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 3 182 | I1230 15:22:17.497648 13929 server_version_mgr.cpp:92] New min_version_ = 3 183 | I1230 15:22:17.497652 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 3 184 | I1230 15:22:17.497678 13929 serialized_row_reader.hpp:64] mem_size_ = 24 185 | I1230 15:22:17.497687 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 3Set server version = 4 186 | I1230 15:22:17.497690 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 4 187 | I1230 15:22:17.497694 13929 server_version_mgr.cpp:92] New min_version_ = 4 188 | I1230 15:22:17.497697 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 4 189 | I1230 15:22:17.497700 13929 serialized_row_reader.hpp:64] mem_size_ = 24 190 | I1230 15:22:17.497706 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 4Set server version = 5 191 | I1230 15:22:17.497710 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 5 192 | I1230 15:22:17.497714 13929 server_version_mgr.cpp:92] New min_version_ = 5 193 | I1230 15:22:17.497716 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 5 194 | I1230 15:22:17.497720 13929 serialized_row_reader.hpp:64] mem_size_ = 24 195 | I1230 15:22:17.499207 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 196 | I1230 15:22:17.569367 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 197 | I1230 15:22:17.570533 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 198 | I1230 15:22:17.570849 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 199 | I1230 15:22:17.570858 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 200 | I1230 15:22:17.570863 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 201 | I1230 15:22:17.570866 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 202 | I1230 15:22:17.570869 13928 server.cpp:202] Read and Apply Update Done 203 | I1230 15:22:17.570947 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 204 | I1230 15:22:17.571192 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 205 | I1230 15:22:17.572718 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 206 | I1230 15:22:17.572728 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 207 | I1230 15:22:17.572733 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 208 | I1230 15:22:17.572737 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 209 | I1230 15:22:17.572741 13928 server.cpp:202] Read and Apply Update Done 210 | I1230 15:22:17.572749 13928 server.cpp:236] Serializing table 2 211 | I1230 15:22:17.572753 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 212 | I1230 15:22:17.572757 13928 server.cpp:236] Serializing table 1 213 | I1230 15:22:17.572760 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 214 | I1230 15:22:17.572764 13928 server.cpp:236] Serializing table 0 215 | I1230 15:22:17.572767 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 216 | I1230 15:22:17.578516 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 5Set server version = 6 217 | I1230 15:22:17.578531 13929 serialized_row_reader.hpp:64] mem_size_ = 292 218 | I1230 15:22:17.578754 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 219 | I1230 15:22:17.578763 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 220 | I1230 15:22:17.578768 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 221 | I1230 15:22:17.578770 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 222 | I1230 15:22:17.578774 13928 server.cpp:202] Read and Apply Update Done 223 | I1230 15:22:17.578816 13928 
serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 224 | I1230 15:22:17.578820 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 225 | I1230 15:22:17.578824 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 226 | I1230 15:22:17.578827 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 227 | I1230 15:22:17.578830 13928 server.cpp:202] Read and Apply Update Done 228 | I1230 15:22:17.578836 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 229 | I1230 15:22:17.578840 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 230 | I1230 15:22:17.578843 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 231 | I1230 15:22:17.578846 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 232 | I1230 15:22:17.578850 13928 server.cpp:202] Read and Apply Update Done 233 | I1230 15:22:17.578855 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 234 | I1230 15:22:17.578858 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 235 | I1230 15:22:17.578861 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 236 | I1230 15:22:17.578865 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 237 | I1230 15:22:17.578867 13928 server.cpp:202] Read and Apply Update Done 238 | I1230 15:22:17.578873 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 239 | I1230 15:22:17.578876 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 240 | I1230 15:22:17.578881 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 241 | I1230 15:22:17.578883 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 242 | I1230 15:22:17.578886 13928 server.cpp:202] Read and Apply Update Done 243 | I1230 15:22:17.578892 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 244 | I1230 15:22:17.578896 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 245 | I1230 15:22:17.578899 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 246 | I1230 15:22:17.578902 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 247 | I1230 15:22:17.578905 13928 server.cpp:202] Read and Apply Update Done 248 | I1230 15:22:17.578912 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 249 | I1230 15:22:17.578914 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 250 | I1230 15:22:17.578917 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 251 | I1230 15:22:17.578920 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 252 | I1230 15:22:17.578923 13928 
server.cpp:202] Read and Apply Update Done 253 | I1230 15:22:17.578932 13928 server_threads.cpp:419] get ClientShutDown from bg 1100 254 | I1230 15:22:17.578991 13925 name_node_thread.cpp:308] msg_type = 16 255 | I1230 15:22:17.578999 13925 name_node_thread.cpp:313] get ClientShutDown from bg 1100 256 | I1230 15:22:17.579012 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 5Set server version = 6 257 | I1230 15:22:17.579016 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 6 258 | I1230 15:22:17.579020 13929 server_version_mgr.cpp:92] New min_version_ = 6 259 | I1230 15:22:17.579023 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 6 260 | I1230 15:22:17.579049 13929 serialized_row_reader.hpp:64] mem_size_ = 216 261 | I1230 15:22:17.584317 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 262 | I1230 15:22:17.584328 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 263 | I1230 15:22:17.584332 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 264 | I1230 15:22:17.584336 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 265 | I1230 15:22:17.584339 13928 server.cpp:202] Read and Apply Update Done 266 | I1230 15:22:17.584352 13928 server.cpp:236] Serializing table 2 267 | I1230 15:22:17.584357 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 268 | I1230 15:22:17.584360 13928 server.cpp:236] Serializing table 1 269 | I1230 15:22:17.584363 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 270 | I1230 15:22:17.584367 13928 server.cpp:236] Serializing table 0 271 | I1230 15:22:17.584370 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 272 | I1230 15:22:17.584385 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 273 | I1230 15:22:17.584389 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 274 | I1230 15:22:17.584393 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 275 | I1230 15:22:17.584395 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 276 | I1230 15:22:17.584398 13928 server.cpp:202] Read and Apply Update Done 277 | I1230 15:22:17.584403 13928 server.cpp:236] Serializing table 2 278 | I1230 15:22:17.584406 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 279 | I1230 15:22:17.584409 13928 server.cpp:236] Serializing table 1 280 | I1230 15:22:17.584413 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 281 | I1230 15:22:17.584415 13928 server.cpp:236] Serializing table 0 282 | I1230 15:22:17.584419 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 283 | I1230 15:22:17.584426 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 284 | I1230 15:22:17.584429 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 285 | I1230 15:22:17.584432 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 286 | I1230 15:22:17.584435 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 287 | I1230 15:22:17.584439 13928 server.cpp:202] Read and Apply Update Done 288 | I1230 15:22:17.584442 13928 
server.cpp:236] Serializing table 2 289 | I1230 15:22:17.584446 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 290 | I1230 15:22:17.584450 13928 server.cpp:236] Serializing table 1 291 | I1230 15:22:17.584452 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 292 | I1230 15:22:17.584455 13928 server.cpp:236] Serializing table 0 293 | I1230 15:22:17.584458 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 294 | I1230 15:22:17.584465 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 295 | I1230 15:22:17.584470 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 296 | I1230 15:22:17.584472 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 297 | I1230 15:22:17.584475 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 298 | I1230 15:22:17.584478 13928 server.cpp:202] Read and Apply Update Done 299 | I1230 15:22:17.584482 13928 server.cpp:236] Serializing table 2 300 | I1230 15:22:17.584486 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 301 | I1230 15:22:17.584488 13928 server.cpp:236] Serializing table 1 302 | I1230 15:22:17.584491 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 303 | I1230 15:22:17.584494 13928 server.cpp:236] Serializing table 0 304 | I1230 15:22:17.584497 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 305 | I1230 15:22:17.584504 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 306 | I1230 15:22:17.584527 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 307 | I1230 15:22:17.584532 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 308 | I1230 15:22:17.584534 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 309 | I1230 15:22:17.584537 13928 server.cpp:202] Read and Apply Update Done 310 | I1230 15:22:17.584542 13928 server.cpp:236] Serializing table 2 311 | I1230 15:22:17.584545 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 312 | I1230 15:22:17.584549 13928 server.cpp:236] Serializing table 1 313 | I1230 15:22:17.584553 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 314 | I1230 15:22:17.584555 13928 server.cpp:236] Serializing table 0 315 | I1230 15:22:17.584558 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 316 | I1230 15:22:17.584566 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 317 | I1230 15:22:17.584570 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 318 | I1230 15:22:17.584573 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 319 | I1230 15:22:17.584576 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 320 | I1230 15:22:17.584636 13928 server.cpp:202] Read and Apply Update Done 321 | I1230 15:22:17.584887 13928 server.cpp:236] Serializing table 2 322 | I1230 15:22:17.584894 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 323 | I1230 15:22:17.584897 13928 server.cpp:236] Serializing table 1 324 | I1230 15:22:17.584900 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 325 | I1230 15:22:17.584903 13928 server.cpp:236] Serializing table 0 326 | I1230 15:22:17.584907 13928 server_table.hpp:83] 
tmp_row_buff_size_ = 512 327 | I1230 15:22:17.584918 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 328 | I1230 15:22:17.584923 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 329 | I1230 15:22:17.584925 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 330 | I1230 15:22:17.584928 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 331 | I1230 15:22:17.584931 13928 server.cpp:202] Read and Apply Update Done 332 | I1230 15:22:17.585170 13928 server.cpp:236] Serializing table 2 333 | I1230 15:22:17.585177 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 334 | I1230 15:22:17.585180 13928 server.cpp:236] Serializing table 1 335 | I1230 15:22:17.585183 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 336 | I1230 15:22:17.585186 13928 server.cpp:236] Serializing table 0 337 | I1230 15:22:17.585189 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 338 | I1230 15:22:17.587085 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 6Set server version = 7 339 | I1230 15:22:17.587095 13929 serialized_row_reader.hpp:64] mem_size_ = 292 340 | I1230 15:22:17.587123 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 7Set server version = 8 341 | I1230 15:22:17.587127 13929 serialized_row_reader.hpp:64] mem_size_ = 24 342 | I1230 15:22:17.587134 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 8Set server version = 9 343 | I1230 15:22:17.587138 13929 serialized_row_reader.hpp:64] mem_size_ = 24 344 | I1230 15:22:17.587146 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 9Set server version = 10 345 | I1230 15:22:17.587148 13929 serialized_row_reader.hpp:64] mem_size_ = 24 346 | I1230 15:22:17.587155 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 10Set server version = 11 347 | I1230 15:22:17.587159 13929 serialized_row_reader.hpp:64] mem_size_ = 24 348 | I1230 15:22:17.587165 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 11Set server version = 12 349 | I1230 15:22:17.587169 13929 serialized_row_reader.hpp:64] mem_size_ = 24 350 | I1230 15:22:17.587175 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 12Set server version = 13 351 | I1230 15:22:17.587208 13929 serialized_row_reader.hpp:64] mem_size_ = 24 352 | I1230 15:22:17.593243 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 6Set server version = 7 353 | I1230 15:22:17.593261 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 7 354 | I1230 15:22:17.593264 13929 server_version_mgr.cpp:92] New min_version_ = 7 355 | I1230 15:22:17.593267 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 7 356 | I1230 15:22:17.593284 13929 serialized_row_reader.hpp:64] mem_size_ = 216 357 | I1230 15:22:17.593324 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 7Set server version = 8 358 | I1230 15:22:17.593330 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 8 359 | I1230 15:22:17.593333 13929 server_version_mgr.cpp:92] New min_version_ = 8 360 | I1230 15:22:17.593336 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 8 361 | I1230 15:22:17.593340 13929 serialized_row_reader.hpp:64] mem_size_ = 24 362 | I1230 15:22:17.593350 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 8Set server version = 9 363 | I1230 15:22:17.593354 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 9 364 | I1230 15:22:17.593358 13929 server_version_mgr.cpp:92] New min_version_ = 9 365 | I1230 15:22:17.593360 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 9 366 | I1230 15:22:17.593365 13929 serialized_row_reader.hpp:64] mem_size_ = 24 367 | I1230 15:22:17.593372 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 9Set server version = 10 368 | I1230 15:22:17.593375 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 10 369 | I1230 15:22:17.593379 13929 server_version_mgr.cpp:92] New min_version_ = 10 370 | I1230 15:22:17.593382 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 10 371 | I1230 15:22:17.593385 13929 serialized_row_reader.hpp:64] mem_size_ = 24 372 | I1230 15:22:17.593391 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 10Set server version = 11 373 | I1230 15:22:17.593395 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 11 374 | I1230 15:22:17.593399 13929 server_version_mgr.cpp:92] New min_version_ = 11 375 | I1230 15:22:17.593401 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 11 376 | I1230 15:22:17.593405 13929 serialized_row_reader.hpp:64] mem_size_ = 24 377 | I1230 15:22:17.593411 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 11Set server version = 12 378 | I1230 15:22:17.593415 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 12 379 | I1230 15:22:17.593417 13929 server_version_mgr.cpp:92] New min_version_ = 12 380 | I1230 15:22:17.593420 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 12 381 | I1230 15:22:17.593425 13929 serialized_row_reader.hpp:64] mem_size_ = 24 382 | I1230 15:22:17.597100 13925 name_node_thread.cpp:308] msg_type = 16 383 | I1230 15:22:17.597122 13925 name_node_thread.cpp:313] get ClientShutDown from bg 100 384 | I1230 15:22:17.597277 13925 name_node_thread.cpp:316] NameNode shutting down 385 | I1230 15:22:17.597395 13929 bg_workers.cpp:970] get ServerShutDownAck from server 0 386 | I1230 15:22:17.597450 13928 server_threads.cpp:419] get ClientShutDown from bg 100 387 | I1230 15:22:17.597481 13928 server_threads.cpp:422] Server shutdown 388 | I1230 15:22:17.599992 13929 bg_workers.cpp:970] get ServerShutDownAck from server 1 389 | I1230 15:22:17.635277 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 12Set server version = 13 390 | I1230 15:22:17.635350 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 13 391 | I1230 15:22:17.635354 13929 server_version_mgr.cpp:92] New min_version_ = 13 392 | I1230 15:22:17.635359 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 13 393 | I1230 15:22:17.635367 13929 serialized_row_reader.hpp:64] mem_size_ = 24 394 | I1230 15:22:17.635381 13929 bg_workers.cpp:970] get ServerShutDownAck from server 1000 395 | I1230 15:22:17.635422 13929 bg_workers.cpp:973] Bg worker 100 shutting down 396 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/matrixfact.ubuntu2.xulijie.log.INFO.20141230-152217.8866.txt: -------------------------------------------------------------------------------- 1 | Log file created at: 2014/12/30 15:22:17 2 | Running on machine: ubuntu2 3 | Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg 4 | I1230 15:22:17.474882 8866 comm_bus.cpp:117] CommBus ThreadRegister() 5 | I1230 15:22:17.475028 8866 server_threads.cpp:92] RowSubscribe = SSPPushRowSubscribe 6 | I1230 15:22:17.475034 8866 server_threads.cpp:106] Create server thread 0 7 | I1230 15:22:17.475098 8867 server_threads.cpp:239] ServerThreads num_clients = 2 8 | I1230 15:22:17.475103 8867 server_threads.cpp:240] my id = 1000 9 | I1230 15:22:17.475110 8867 server_threads.cpp:246] network addr = 192.168.40.101:10000 10 | I1230 15:22:17.475113 8867 comm_bus.cpp:117] CommBus ThreadRegister() 11 | I1230 15:22:17.475584 8867 server_threads.cpp:252] Server thread registered CommBus 12 | I1230 15:22:17.475623 8867 server_threads.cpp:144] Connect to remote name node 13 | I1230 15:22:17.475628 8867 server_threads.cpp:147] name_node_addr = 192.168.40.100:9999 14 | I1230 15:22:17.475725 8870 bg_workers.cpp:889] Bg Worker starts here, my_id = 1100 15 | I1230 15:22:17.475742 8870 comm_bus.cpp:117] CommBus ThreadRegister() 16 | I1230 15:22:17.475757 8870 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 0 17 | I1230 15:22:17.475761 8870 bg_workers.cpp:293] Connect to remote server 0 18 | I1230 15:22:17.475764 8870 bg_workers.cpp:296] server_addr = 192.168.40.100:9999 19 | I1230 15:22:17.478379 8870 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1 20 | I1230 15:22:17.478387 8870 bg_workers.cpp:293] Connect to remote server 1 21 | I1230 15:22:17.478390 8870 bg_workers.cpp:296] server_addr = 192.168.40.100:10000 22 | I1230 15:22:17.480212 8870 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1000 23 | I1230 15:22:17.480221 8870 bg_workers.cpp:290] Connect to local server 1000 24 | I1230 15:22:17.480249 8870 bg_workers.cpp:368] get kClientStart from 0 num_started_servers = 0 25 | I1230 15:22:17.481348 8870 bg_workers.cpp:368] get kClientStart from 1 num_started_servers = 1 26 | I1230 15:22:17.481624 8867 server_threads.cpp:187] InitNonNameNode done 27 | I1230 15:22:17.481638 8870 bg_workers.cpp:368] get kClientStart from 1000 num_started_servers = 2 28 | I1230 15:22:17.481751 8870 bg_workers.cpp:911] head bg handles CreateTable 29 | I1230 15:22:17.505518 8870 oplog_index.cpp:42] Constructor shared_oplog_index = 0x13e5700 30 | I1230 15:22:17.505545 8870 bg_workers.cpp:439] Reply app thread 1200 31 | I1230 15:22:17.508220 8870 oplog_index.cpp:42] Constructor shared_oplog_index = 0x13e5840 32 | I1230 15:22:17.508232 8870 bg_workers.cpp:439] Reply app thread 1200 33 | I1230 15:22:17.510934 8870 oplog_index.cpp:42] Constructor shared_oplog_index = 0x13e5a00 34 | I1230 15:22:17.510947 8870 bg_workers.cpp:439] Reply app thread 1200 35 | I1230 
15:22:17.511045 8872 comm_bus.cpp:117] CommBus ThreadRegister() 36 | I1230 15:22:17.511118 8871 comm_bus.cpp:117] CommBus ThreadRegister() 37 | I1230 15:22:17.512034 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 38 | I1230 15:22:17.512040 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 39 | I1230 15:22:17.512049 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 40 | I1230 15:22:17.512053 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 41 | I1230 15:22:17.512056 8867 server.cpp:202] Read and Apply Update Done 42 | I1230 15:22:17.512743 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 43 | I1230 15:22:17.512749 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 44 | I1230 15:22:17.512753 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 45 | I1230 15:22:17.512754 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 46 | I1230 15:22:17.512758 8867 server.cpp:202] Read and Apply Update Done 47 | I1230 15:22:17.513478 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 48 | I1230 15:22:17.513485 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 49 | I1230 15:22:17.513488 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 50 | I1230 15:22:17.513521 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 51 | I1230 15:22:17.513525 8867 server.cpp:202] Read and Apply Update Done 52 | I1230 15:22:17.514166 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 53 | I1230 15:22:17.514173 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 54 | I1230 15:22:17.514175 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 55 | I1230 15:22:17.514178 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 56 | I1230 15:22:17.514179 8867 server.cpp:202] Read and Apply Update Done 57 | I1230 15:22:17.514829 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 58 | I1230 15:22:17.514837 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 59 | I1230 15:22:17.514838 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 60 | I1230 15:22:17.514842 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 61 | I1230 15:22:17.514843 8867 server.cpp:202] Read and Apply Update Done 62 | I1230 15:22:17.515512 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 63 | I1230 15:22:17.515518 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 64 | I1230 15:22:17.515522 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 65 | I1230 15:22:17.515523 8867 
serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 66 | I1230 15:22:17.515525 8867 server.cpp:202] Read and Apply Update Done 67 | I1230 15:22:17.523031 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 68 | I1230 15:22:17.523054 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 69 | I1230 15:22:17.523059 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 70 | I1230 15:22:17.523062 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 71 | I1230 15:22:17.523066 8867 server.cpp:202] Read and Apply Update Done 72 | I1230 15:22:17.523350 8867 server.cpp:236] Serializing table 2 73 | I1230 15:22:17.523357 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 74 | I1230 15:22:17.523361 8867 server.cpp:236] Serializing table 1 75 | I1230 15:22:17.523363 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 76 | I1230 15:22:17.523366 8867 server.cpp:236] Serializing table 0 77 | I1230 15:22:17.523370 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 78 | I1230 15:22:17.523473 8870 server_version_mgr.cpp:51] Server id = 1000 original server version = 4294967295Set server version = 5 79 | I1230 15:22:17.523481 8870 serialized_row_reader.hpp:64] mem_size_ = 24 80 | I1230 15:22:17.523488 8870 server_version_mgr.cpp:51] Server id = 1 original server version = 4294967295Set server version = 5 81 | I1230 15:22:17.523491 8870 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1 version = 5 82 | I1230 15:22:17.523496 8870 server_version_mgr.cpp:92] New min_version_ = 5 83 | I1230 15:22:17.523499 8870 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1 version to remove = 5 84 | I1230 15:22:17.523519 8870 serialized_row_reader.hpp:64] mem_size_ = 24 85 | I1230 15:22:17.525842 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 86 | I1230 15:22:17.525852 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 87 | I1230 15:22:17.525856 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 88 | I1230 15:22:17.525857 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 89 | I1230 15:22:17.525893 8867 server.cpp:202] Read and Apply Update Done 90 | I1230 15:22:17.525902 8867 server.cpp:236] Serializing table 2 91 | I1230 15:22:17.525904 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 92 | I1230 15:22:17.525907 8867 server.cpp:236] Serializing table 1 93 | I1230 15:22:17.525909 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 94 | I1230 15:22:17.525913 8867 server.cpp:236] Serializing table 0 95 | I1230 15:22:17.525914 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 96 | I1230 15:22:17.525928 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 97 | I1230 15:22:17.525930 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 98 | I1230 15:22:17.525933 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 99 | I1230 15:22:17.525935 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 100 | I1230 15:22:17.525938 8867 server.cpp:202] Read and Apply Update Done 101 | 
I1230 15:22:17.526305 8867 server.cpp:236] Serializing table 2 102 | I1230 15:22:17.526319 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 103 | I1230 15:22:17.526322 8867 server.cpp:236] Serializing table 1 104 | I1230 15:22:17.526324 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 105 | I1230 15:22:17.526326 8867 server.cpp:236] Serializing table 0 106 | I1230 15:22:17.526329 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 107 | I1230 15:22:17.526343 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 108 | I1230 15:22:17.526346 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 109 | I1230 15:22:17.526348 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 110 | I1230 15:22:17.526350 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 111 | I1230 15:22:17.526353 8867 server.cpp:202] Read and Apply Update Done 112 | I1230 15:22:17.526689 8867 server.cpp:236] Serializing table 2 113 | I1230 15:22:17.526695 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 114 | I1230 15:22:17.526698 8867 server.cpp:236] Serializing table 1 115 | I1230 15:22:17.526700 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 116 | I1230 15:22:17.526702 8867 server.cpp:236] Serializing table 0 117 | I1230 15:22:17.526705 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 118 | I1230 15:22:17.526713 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 119 | I1230 15:22:17.526716 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 120 | I1230 15:22:17.526718 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 121 | I1230 15:22:17.526721 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 122 | I1230 15:22:17.526722 8867 server.cpp:202] Read and Apply Update Done 123 | I1230 15:22:17.527147 8867 server.cpp:236] Serializing table 2 124 | I1230 15:22:17.527159 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 125 | I1230 15:22:17.527163 8867 server.cpp:236] Serializing table 1 126 | I1230 15:22:17.527165 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 127 | I1230 15:22:17.527168 8867 server.cpp:236] Serializing table 0 128 | I1230 15:22:17.527169 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 129 | I1230 15:22:17.527186 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 130 | I1230 15:22:17.527189 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 131 | I1230 15:22:17.527191 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 132 | I1230 15:22:17.527194 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 133 | I1230 15:22:17.527196 8867 server.cpp:202] Read and Apply Update Done 134 | I1230 15:22:17.527624 8867 server.cpp:236] Serializing table 2 135 | I1230 15:22:17.527678 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 136 | I1230 15:22:17.527683 8867 server.cpp:236] Serializing table 1 137 | I1230 15:22:17.527684 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 138 | I1230 15:22:17.527688 8867 server.cpp:236] Serializing table 0 139 | I1230 15:22:17.527689 8867 server_table.hpp:83] 
tmp_row_buff_size_ = 512 140 | I1230 15:22:17.527811 8870 serialized_row_reader.hpp:64] mem_size_ = 24 141 | I1230 15:22:17.527822 8870 serialized_row_reader.hpp:64] mem_size_ = 24 142 | I1230 15:22:17.527827 8870 serialized_row_reader.hpp:64] mem_size_ = 24 143 | I1230 15:22:17.527832 8870 serialized_row_reader.hpp:64] mem_size_ = 24 144 | I1230 15:22:17.527835 8870 serialized_row_reader.hpp:64] mem_size_ = 24 145 | I1230 15:22:17.527959 8870 serialized_row_reader.hpp:64] mem_size_ = 24 146 | I1230 15:22:17.527986 8870 serialized_row_reader.hpp:64] mem_size_ = 24 147 | I1230 15:22:17.527997 8870 serialized_row_reader.hpp:64] mem_size_ = 24 148 | I1230 15:22:17.528008 8870 serialized_row_reader.hpp:64] mem_size_ = 24 149 | I1230 15:22:17.528018 8870 serialized_row_reader.hpp:64] mem_size_ = 24 150 | I1230 15:22:17.528100 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 6 There's a previous request requesting clock 6 151 | I1230 15:22:17.528393 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 152 | I1230 15:22:17.528705 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 153 | I1230 15:22:17.529086 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 154 | I1230 15:22:17.529573 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 155 | I1230 15:22:17.529762 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 156 | I1230 15:22:17.530083 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 157 | I1230 15:22:17.530362 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 158 | I1230 15:22:17.590853 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 159 | I1230 15:22:17.597518 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 160 | I1230 15:22:17.597622 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 161 | I1230 15:22:17.599817 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 162 | I1230 15:22:17.599916 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 163 | I1230 15:22:17.601063 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 164 | I1230 15:22:17.601071 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 165 | I1230 15:22:17.601075 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 166 | I1230 15:22:17.601078 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 167 | I1230 15:22:17.601080 8867 server.cpp:202] Read and Apply Update Done 168 | I1230 15:22:17.603108 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 169 | I1230 15:22:17.603118 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 170 | I1230 
15:22:17.603121 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 171 | I1230 15:22:17.603124 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 172 | I1230 15:22:17.603127 8867 server.cpp:202] Read and Apply Update Done 173 | I1230 15:22:17.603134 8867 server.cpp:236] Serializing table 2 174 | I1230 15:22:17.603173 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 175 | I1230 15:22:17.603176 8867 server.cpp:236] Serializing table 1 176 | I1230 15:22:17.603179 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 177 | I1230 15:22:17.603181 8867 server.cpp:236] Serializing table 0 178 | I1230 15:22:17.603184 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 179 | I1230 15:22:17.603328 8870 server_version_mgr.cpp:51] Server id = 1000 original server version = 5Set server version = 6 180 | I1230 15:22:17.603337 8870 serialized_row_reader.hpp:64] mem_size_ = 216 181 | I1230 15:22:17.603554 8870 server_version_mgr.cpp:51] Server id = 1 original server version = 5Set server version = 6 182 | I1230 15:22:17.603560 8870 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1 version = 6 183 | I1230 15:22:17.603564 8870 server_version_mgr.cpp:92] New min_version_ = 6 184 | I1230 15:22:17.603565 8870 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1 version to remove = 6 185 | I1230 15:22:17.603575 8870 serialized_row_reader.hpp:64] mem_size_ = 292 186 | I1230 15:22:17.604626 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 187 | I1230 15:22:17.604634 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 188 | I1230 15:22:17.604637 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 189 | I1230 15:22:17.604640 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 190 | I1230 15:22:17.604642 8867 server.cpp:202] Read and Apply Update Done 191 | I1230 15:22:17.605319 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 192 | I1230 15:22:17.605325 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 193 | I1230 15:22:17.605329 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 194 | I1230 15:22:17.605330 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 195 | I1230 15:22:17.605332 8867 server.cpp:202] Read and Apply Update Done 196 | I1230 15:22:17.605978 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 197 | I1230 15:22:17.605985 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 198 | I1230 15:22:17.605988 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 199 | I1230 15:22:17.605990 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 200 | I1230 15:22:17.605993 8867 server.cpp:202] Read and Apply Update Done 201 | I1230 15:22:17.606648 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 202 | I1230 15:22:17.606654 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 203 | I1230 15:22:17.606657 
8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 204 | I1230 15:22:17.606659 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 205 | I1230 15:22:17.606662 8867 server.cpp:202] Read and Apply Update Done 206 | I1230 15:22:17.607377 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 207 | I1230 15:22:17.607383 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 208 | I1230 15:22:17.607385 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 209 | I1230 15:22:17.607388 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 210 | I1230 15:22:17.607390 8867 server.cpp:202] Read and Apply Update Done 211 | I1230 15:22:17.608032 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 212 | I1230 15:22:17.608039 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 213 | I1230 15:22:17.608062 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 214 | I1230 15:22:17.608065 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 215 | I1230 15:22:17.608067 8867 server.cpp:202] Read and Apply Update Done 216 | I1230 15:22:17.608726 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 217 | I1230 15:22:17.608732 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 218 | I1230 15:22:17.608736 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 219 | I1230 15:22:17.608737 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 220 | I1230 15:22:17.608739 8867 server.cpp:202] Read and Apply Update Done 221 | I1230 15:22:17.608836 8867 server_threads.cpp:419] get ClientShutDown from bg 1100 222 | I1230 15:22:17.616081 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 223 | I1230 15:22:17.616102 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 224 | I1230 15:22:17.616107 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 225 | I1230 15:22:17.616111 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 226 | I1230 15:22:17.616113 8867 server.cpp:202] Read and Apply Update Done 227 | I1230 15:22:17.616122 8867 server.cpp:236] Serializing table 2 228 | I1230 15:22:17.616125 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 229 | I1230 15:22:17.616128 8867 server.cpp:236] Serializing table 1 230 | I1230 15:22:17.616132 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 231 | I1230 15:22:17.616133 8867 server.cpp:236] Serializing table 0 232 | I1230 15:22:17.616137 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 233 | I1230 15:22:17.616154 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 234 | I1230 15:22:17.616158 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 235 | I1230 15:22:17.616160 8867 
serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 236 | I1230 15:22:17.616163 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 237 | I1230 15:22:17.616165 8867 server.cpp:202] Read and Apply Update Done 238 | I1230 15:22:17.616168 8867 server.cpp:236] Serializing table 2 239 | I1230 15:22:17.616171 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 240 | I1230 15:22:17.616173 8867 server.cpp:236] Serializing table 1 241 | I1230 15:22:17.616175 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 242 | I1230 15:22:17.616178 8867 server.cpp:236] Serializing table 0 243 | I1230 15:22:17.616180 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 244 | I1230 15:22:17.616185 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 245 | I1230 15:22:17.616189 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 246 | I1230 15:22:17.616190 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 247 | I1230 15:22:17.616194 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 248 | I1230 15:22:17.616195 8867 server.cpp:202] Read and Apply Update Done 249 | I1230 15:22:17.616199 8867 server.cpp:236] Serializing table 2 250 | I1230 15:22:17.616200 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 251 | I1230 15:22:17.616204 8867 server.cpp:236] Serializing table 1 252 | I1230 15:22:17.616205 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 253 | I1230 15:22:17.616207 8867 server.cpp:236] Serializing table 0 254 | I1230 15:22:17.616209 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 255 | I1230 15:22:17.616215 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 256 | I1230 15:22:17.616252 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 257 | I1230 15:22:17.616255 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 258 | I1230 15:22:17.616257 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 259 | I1230 15:22:17.616261 8867 server.cpp:202] Read and Apply Update Done 260 | I1230 15:22:17.616264 8867 server.cpp:236] Serializing table 2 261 | I1230 15:22:17.616266 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 262 | I1230 15:22:17.616269 8867 server.cpp:236] Serializing table 1 263 | I1230 15:22:17.616271 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 264 | I1230 15:22:17.616273 8867 server.cpp:236] Serializing table 0 265 | I1230 15:22:17.616276 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 266 | I1230 15:22:17.616298 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 267 | I1230 15:22:17.616303 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 268 | I1230 15:22:17.616305 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 269 | I1230 15:22:17.616307 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 270 | I1230 15:22:17.616309 8867 server.cpp:202] Read and Apply Update Done 271 | I1230 15:22:17.616313 8867 server.cpp:236] Serializing table 2 272 | I1230 15:22:17.616315 8867 
server_table.hpp:83] tmp_row_buff_size_ = 512 273 | I1230 15:22:17.616318 8867 server.cpp:236] Serializing table 1 274 | I1230 15:22:17.616320 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 275 | I1230 15:22:17.616322 8867 server.cpp:236] Serializing table 0 276 | I1230 15:22:17.616324 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 277 | I1230 15:22:17.616330 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 278 | I1230 15:22:17.616333 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 279 | I1230 15:22:17.616335 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 280 | I1230 15:22:17.616338 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 281 | I1230 15:22:17.616339 8867 server.cpp:202] Read and Apply Update Done 282 | I1230 15:22:17.616595 8867 server.cpp:236] Serializing table 2 283 | I1230 15:22:17.616602 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 284 | I1230 15:22:17.616605 8867 server.cpp:236] Serializing table 1 285 | I1230 15:22:17.616607 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 286 | I1230 15:22:17.616610 8867 server.cpp:236] Serializing table 0 287 | I1230 15:22:17.616612 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 288 | I1230 15:22:17.616621 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 289 | I1230 15:22:17.616623 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 290 | I1230 15:22:17.616626 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 291 | I1230 15:22:17.616627 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 292 | I1230 15:22:17.616631 8867 server.cpp:202] Read and Apply Update Done 293 | I1230 15:22:17.622216 8870 server_version_mgr.cpp:51] Server id = 1000 original server version = 6Set server version = 13 294 | I1230 15:22:17.622227 8870 serialized_row_reader.hpp:64] mem_size_ = 216 295 | I1230 15:22:17.622247 8870 serialized_row_reader.hpp:64] mem_size_ = 24 296 | I1230 15:22:17.622252 8870 serialized_row_reader.hpp:64] mem_size_ = 24 297 | I1230 15:22:17.622257 8870 serialized_row_reader.hpp:64] mem_size_ = 24 298 | I1230 15:22:17.622262 8870 serialized_row_reader.hpp:64] mem_size_ = 24 299 | I1230 15:22:17.622267 8870 serialized_row_reader.hpp:64] mem_size_ = 24 300 | I1230 15:22:17.622270 8870 server_version_mgr.cpp:51] Server id = 1 original server version = 6Set server version = 13 301 | I1230 15:22:17.622306 8870 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1 version = 13 302 | I1230 15:22:17.622311 8870 server_version_mgr.cpp:92] New min_version_ = 13 303 | I1230 15:22:17.622314 8870 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1 version to remove = 13 304 | I1230 15:22:17.622330 8870 serialized_row_reader.hpp:64] mem_size_ = 292 305 | I1230 15:22:17.622346 8870 serialized_row_reader.hpp:64] mem_size_ = 24 306 | I1230 15:22:17.622351 8870 serialized_row_reader.hpp:64] mem_size_ = 24 307 | I1230 15:22:17.622355 8870 serialized_row_reader.hpp:64] mem_size_ = 24 308 | I1230 15:22:17.622360 8870 serialized_row_reader.hpp:64] mem_size_ = 24 309 | I1230 15:22:17.622364 8870 serialized_row_reader.hpp:64] mem_size_ = 24 310 | I1230 15:22:17.622370 8870 serialized_row_reader.hpp:64] mem_size_ = 24 311 | I1230 15:22:17.628597 8870 bg_workers.cpp:970] get ServerShutDownAck from server 0 312 | I1230 15:22:17.628619 8870 bg_workers.cpp:970] get ServerShutDownAck from server 1 313 | I1230 15:22:17.663645 8867 server.cpp:236] Serializing table 2 314 | I1230 15:22:17.663669 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 315 | I1230 15:22:17.663674 8867 server.cpp:236] Serializing table 1 316 | I1230 15:22:17.663676 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 317 | I1230 15:22:17.663679 8867 server.cpp:236] Serializing table 0 318 | I1230 15:22:17.663682 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 319 | I1230 15:22:17.663833 8870 serialized_row_reader.hpp:64] mem_size_ = 24 320 | I1230 15:22:17.663854 8867 server_threads.cpp:419] get ClientShutDown from bg 100 321 | I1230 15:22:17.663864 8870 bg_workers.cpp:970] get ServerShutDownAck from server 1000 322 | I1230 15:22:17.663868 8870 bg_workers.cpp:973] Bg worker 1100 shutting down 323 | I1230 15:22:17.664058 8867 server_threads.cpp:422] Server shutdown 324 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Matrix-Factorization-Analysis.md: -------------------------------------------------------------------------------- 1 | # Matrix Factorization分析 2 | 3 | ## 1. 初始化 4 | ### Configure Petuum PS 5 | ```c++ 6 | // Configure PS row types 7 | petuum::PSTableGroup::RegisterRow >(0); // Register dense 8 | ``` 9 | 注册Row类型,实际动作是将`Class DenseRow`放到了一个`map creator_map_`里面,map的key就是`RegisterRow(id)`中的id,这里是0。 10 | 11 | ### Start PS 12 | 13 | ```c++ 14 | // Start PS 15 | // IMPORTANT: This command starts up the name node service on client 0. 16 | // We therefore do it ASAP, before other lengthy actions like 17 | // loading data. 18 | petuum::PSTableGroup::Init(table_group_config, false); // Initializing thread does not need table access 19 | ``` 20 | 实际动作是new出来一个TableGroup。 21 | 22 | Server is different from NameNode. NameNode is not considered as server. 
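To make the `RegisterRow` step above concrete, here is a minimal, self-contained sketch of the pattern it describes: a `creator_map_` keyed by the integer row-type ID, where each entry is a factory that can later create rows of the registered type. This is an illustration only, not Petuum's actual implementation; `RowRegistry`, `AbstractRow`, `CreateRow` and the simplified `DenseRow` are hypothetical stand-ins, and the `float` element type is just an example.

```c++
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <vector>

// Minimal stand-in for an abstract row interface (illustrative only).
class AbstractRow {
 public:
  virtual ~AbstractRow() {}
  virtual void Init(size_t capacity) = 0;
};

// Simplified dense row: a contiguous vector of parameters of element type V.
template <typename V>
class DenseRow : public AbstractRow {
 public:
  void Init(size_t capacity) override { values_.assign(capacity, V()); }
 private:
  std::vector<V> values_;
};

// Registry keyed by the integer row-type ID passed to RegisterRow<T>(id).
// Each entry is a factory that can later create rows of the registered type.
class RowRegistry {
 public:
  template <typename RowType>
  void RegisterRow(int32_t row_type_id) {
    creator_map_[row_type_id] = []() {
      return std::unique_ptr<AbstractRow>(new RowType());
    };
  }

  std::unique_ptr<AbstractRow> CreateRow(int32_t row_type_id) const {
    return creator_map_.at(row_type_id)();
  }

 private:
  std::map<int32_t, std::function<std::unique_ptr<AbstractRow>()>> creator_map_;
};

int main() {
  RowRegistry registry;
  // Mirrors the idea of RegisterRow: the row class is stored in creator_map_
  // under key 0, so table creation can later instantiate rows by this ID.
  registry.RegisterRow<DenseRow<float>>(0);
  auto row = registry.CreateRow(0);
  row->Init(100);  // e.g. row_capacity = 100 columns
  return 0;
}
```

With this pattern, creating a table later only needs the row-type ID recorded in its config to instantiate rows of the right concrete type.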
23 | 24 | Thread id范围 25 | - 0~100: Server和NameNode thread使用 26 | - 200~1000: app thread使用 27 | 28 | Server thread需要设置consistency_model,具体如下: 29 | 30 | ```c++ 31 | //ServerThreadMainFunc ServerThreadMain; 32 | ConsistencyModel consistency_model = GlobalContext::get_consistency_model(); 33 | switch(consistency_model) { 34 | case SSP: 35 | ServerPushRow = SSPServerPushRow; 36 | RowSubscribe = SSPRowSubscribe; 37 | break; 38 | case SSPPush: 39 | ServerPushRow = SSPPushServerPushRow; 40 | RowSubscribe = SSPPushRowSubscribe; 41 | VLOG(0) << "RowSubscribe = SSPPushRowSubscribe"; 42 | break; 43 | default: 44 | LOG(FATAL) << "Unrecognized consistency model " << consistency_model; 45 | } 46 | ``` 47 | background tread is used for storing opLog,同样backgroud thread也需要设置consistency_model如下: 48 | ```c++ 49 | BgThreadMainFunc BgThreadMain; 50 | ConsistencyModel consistency_model = GlobalContext::get_consistency_model(); 51 | switch(consistency_model) { 52 | case SSP: 53 | { 54 | BgThreadMain = SSPBgThreadMain; 55 | MyCreateClientRow = CreateSSPClientRow; 56 | GetRowOpLog = SSPGetRowOpLog; 57 | } 58 | break; 59 | case SSPPush: 60 | { 61 | BgThreadMain = SSPBgThreadMain; 62 | MyCreateClientRow = CreateClientRow; 63 | system_clock_ = 0; 64 | GetRowOpLog = SSPGetRowOpLog; 65 | } 66 | break; 67 | default: 68 | LOG(FATAL) << "Unrecognized consistency model " << consistency_model; 69 | } 70 | ``` 71 | 72 | bg_workers也会添加vector clock. 73 | 74 | init thread也会添加vector clock 75 | 76 | TableGroupConfig里面还有一个aggressive_clock属性: 77 | ```c++ 78 | // If set to true, oplog send is triggered on every Clock() call. 79 | // If set to false, oplog is only sent if the process clock (representing all 80 | // app threads) has advanced. 81 | // Aggressive clock may reduce memory footprint and improve the per-clock 82 | // convergence rate in the cost of performance. 83 | // Default is false (suggested). 84 | bool aggressive_clock; 85 | ``` 86 | 如果是true的,每一个commit(也就是clock())都要send oplog。 87 | 88 | ## Server thread执行逻辑 89 | 90 | ConnectToNameNode() 91 | 92 | Server thread可以connect所有client里面的bg threads。Server thread的功能: 93 | - 接收到kCreateTable消息后,会HandleCreateTable() 94 | - 接收到kRowRequest消息后,会HandleRowRequest() 95 | - 接收到kClientSendOpLog消息后,会HandleOpLogMsg() 96 | - 接收到kClientShutDown消息后,会HandleShutDownMsg() 97 | 98 | 99 | ## StandardMatrixLoader 100 | 101 | `num_workers_`是整个集群中的worker thread个数。每一个worker thread有一个访问Matrix的index,这个index被存在`worker_next_el_pos_`中。 102 | 103 | Client的main thread会利用StandardMatrixLoader将整个Matrix load到内存,然后让每个worker thread顺序访问。 104 | 105 | ## matrixfact.CreateTable() 106 | CreateTable() 先设置table的`max_table_staleness`属性,然后调用`Bgworkers::CreateTable(table_id, table_config)`,该函数会将要create的Table信息通过`SendInProc(id_st_=100, msg, msg_size)`发送给Bg thread。 107 | 108 | Bg thread initialization logic: 109 | - Establish connections with all server threads (app threads cannot send message to bg threads until this is done); 110 | - Wait on a "Start" message from each server thread; 111 | - Receive connections from all app threads. Server message (currently none for pull model) may come in at the same time. 
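The four message handlers listed under「Server thread执行逻辑」above boil down to a receive-and-dispatch loop. Below is a self-contained schematic of that loop; it is not Petuum's real code: the `Msg` struct, the enum constants and the `std::queue` standing in for the CommBus are simplified placeholders. Only the handler names and the shutdown behaviour (exit once every client's head bg thread has sent kClientShutDown, which is what produces the "get ClientShutDown from bg ..." and "Server shutdown" lines in the logs above) follow the description.

```c++
#include <cstdio>
#include <queue>

// Simplified message-type tags; the real Petuum code uses its own message enums.
enum MsgType { kCreateTable, kRowRequest, kClientSendOpLog, kClientShutDown };

struct Msg {
  MsgType type;
  int sender_id;  // id of the bg thread that sent the message
};

// Placeholder handlers standing in for HandleCreateTable()/HandleRowRequest()/
// HandleOpLogMsg() described above; they only print what they would do.
void HandleCreateTable(const Msg& m) { std::printf("HandleCreateTable from %d\n", m.sender_id); }
void HandleRowRequest(const Msg& m)  { std::printf("HandleRowRequest from %d\n", m.sender_id); }
void HandleOpLogMsg(const Msg& m)    { std::printf("HandleOpLogMsg from %d\n", m.sender_id); }

// Schematic server-thread event loop: receive a message, dispatch on its type,
// and exit after a kClientShutDown from every client (the inlined case below
// corresponds to HandleShutDownMsg() in the list above).
void ServerThreadMain(std::queue<Msg>& comm_bus, int num_clients) {
  int num_shutdown_bgs = 0;
  while (!comm_bus.empty()) {
    Msg msg = comm_bus.front();
    comm_bus.pop();
    switch (msg.type) {
      case kCreateTable:     HandleCreateTable(msg); break;
      case kRowRequest:      HandleRowRequest(msg);  break;
      case kClientSendOpLog: HandleOpLogMsg(msg);    break;
      case kClientShutDown:
        std::printf("get ClientShutDown from bg %d\n", msg.sender_id);
        if (++num_shutdown_bgs == num_clients) {
          std::printf("Server shutdown\n");
          return;
        }
        break;
    }
  }
}

int main() {
  // A canned message sequence standing in for real CommBus traffic.
  std::queue<Msg> comm_bus;
  comm_bus.push({kCreateTable, 100});
  comm_bus.push({kRowRequest, 1100});
  comm_bus.push({kClientSendOpLog, 100});
  comm_bus.push({kClientShutDown, 1100});
  comm_bus.push({kClientShutDown, 100});
  ServerThreadMain(comm_bus, /*num_clients=*/2);
  return 0;
}
```

The bg thread handles its own messages (e.g. kBgCreateTable, which triggers the HandleCreateTables() path described next) in the same receive-and-dispatch style.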
112 | 113 | 在初始化`petuum::PSTableGroup::Init(table_group_config, false);`里面就调用Bg thread的`SSPBgThreadMain()`的方法,然后调用`BgWorkers::HandleCreateTables()`方法。由于在Init()的时候,还没有createTable的需求,因此`BgWorkers::HandleCreateTables()`会快速返回。当main()中调用createTable时,比如,` petuum::PSTableGroup::CreateTable(0,table_config); 114 | ` 会向bg thread发送createTable的消息(类型是kBgCreateTable),然后标号是100的bg thread会调用HandleCreateTable(),bg thread的HandleCreateTable()会向NamNode发送创建Table的信息,收到NameNode反馈的信息后,会使用下面的语句来真正地创建表,也就是说Table存在于bg thread中: 115 | 116 | ```c++ 117 | client_table = new ClientTable(table_id, client_table_config); 118 | ``` 119 | 120 | 在创建一个Table时,会同时创建其Consistency model,目前只有两种: 121 | - SSP:对应创建 SSPConsistencyController 122 | - SSPPush:对应创建 SSPPushConsistencyController 123 | 124 | 在MF中,L,R和Loss table的Consistency model都是SSPPush。 125 | 126 | ConsistencyController负责控制对Table的访问,提供了GetAsync(row_id),Get(row_id, row_accessor), ThreadGet(row_id, row_accessor)等方法。其中最重要的方法是Get(row_id),该方法会check freshness,如果row_id不存在或者stale is too old。 127 | 128 | bg thread创建完表以后,会将创建完的信息发送给main() thread。 -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MatrixFactorization.md: -------------------------------------------------------------------------------- 1 | # Analysis of Matrtix Factorization 2 | 3 | ## 算法 4 | 5 | 矩阵LR分解是将一个N * M 的矩阵分解为N * K的L矩阵和K * M的R矩阵。 6 | ![](figures/matrixfact.png) 7 | 8 | 根据`rank(AB) ≤ min(rank(A), rank(B))`可知,如果Matrix的秩大于K,那么LR分解后的矩阵乘积会丢失Matrix的一些信息,类似PCA和SVD。 9 | 10 | ## 算法并行化 11 | ![](figures/parallel-matrixfact.png) 12 | 13 | ## 在PS上实现矩阵分解算法 14 | 15 | ![](figures/matrixfact-petuum.png) -------------------------------------------------------------------------------- /BigDataSystems/Petuum/PetuumArchitecture.md: -------------------------------------------------------------------------------- 1 | # Petuum原理 2 | 3 | ## LazyTable基本架构 4 | ![](figures/Architecture.png) 5 | 6 | 1. 存放parameters的table的rows分布在多个tablet servers上。 7 | 2. 执行一个App script后,PS会在每个Client都会运行一个App program (e.g., matrixfact.main()),每个App program可以生成多个app threads。App thread相当于MapReduce/Spark中的task。 8 | 3. App thread通过client library来访问相应的table servers获取所需的table中的rows。 9 | 4. Client library维护了一个多级cache和operation logs来减少与table server的交互。 10 | 11 | ## LazyTable数据模型与访问API 12 | 13 | ### Data model: Table[row(columns)] 14 | 15 | 由于ML中算法基本使用vector或者matrix,所以可以用Table来存储参数。 16 | 17 | 与二维表类似,一个Table(比如matrixfact中的`L_Table`)包含多个row,row一般是`denseRow`或者`sparseRow`,一个row包含多个column。具体的parameter存在table中的cell中。具体实现时,Table可以用`hashmap`来实现。 18 | 19 | 由于Table中的paramters会被多个threads更新,所以row支持一些聚合操作,比如plus, multiply, union。 20 | 21 | ### LazyTable操作 22 | 23 | 因为要对Table进行读写更新操作,因此Table需要支持一些操作,LazyTable的操作接口借鉴了Piccolo的接口: 24 | 25 | 1. read(tableid, rowid, slack) 26 | 27 | 读取row,如果local cache中存在该row且其slack满足staleness bound(也就是local cache中的参数足够新),那么从local cache读取该row,否则暂停读取线程(the calling thread waits)。这个API也是唯一可以block calling thread的API。 28 | 29 | 2. update(tableid, rowid, delta) 30 | 31 | 更新table中row的参数,newParameter = oldParameter + delta,这三个都是vector。 32 | 33 | 3. refresh(tableid, rowid, slack) 34 | 35 | 如果process cache(被多个app thread共享)中的table中的row已经old了,就更新之。 36 | 37 | 4. clock() 38 | 39 | 调用后表示calling thread已经进入到下一个周期,因为SSP不存在固定的barrier,所以这个看似会synchronization的API并不会block calling thread。 40 | 41 | ### Data freshness and consistency guarantees 42 | 43 | 1. 数据新鲜度保证: 44 | 45 | 每个row有一个data age field(也就是clock)用于表示该row的数据新鲜度。假设一个row的当前data age是t,那么表示该row里面的参数 contains all updates from all app threads for1, 2, ..., t. 
46 | 47 | 对于SSP来说,当calling thread在clock t的周期内发送`read(tableid, rowid, slack)`的请求时,如果相应row的`data age >= t-1-slack`,那么该row可以返回。 48 | 49 | 2. read-my-updates: 50 | 51 | ready-my-updates ensures that the data read by a thread contains all its own updates. 52 | 53 | ## LazyTable系统模块之Tablet Servers 54 | 55 | ### Tablet Servers基本功能 56 | 57 | 一个逻辑上的Table可以分布存放在不同的tablet server上,比如`L_Talbe`中的 i-th row 可以存在`tablet_server_id = i % total_num_of_servers`上。每个tablet server都将rows存放在内存中。 58 | 59 | 每个tablet server使用一个vector clock(也就是`vector`)来keep track of rows的新鲜度。vector中第i个分量表示第i个row的clock,vector中最小的clock被定义为`global_clock_value`,比如`global_clock_value = t` 表示所有的app threads都已经完成了clock t周期的计算及参数更新。问题:每个tablet server只存储table中的一部分rows,一部分rows达到了clock t就能说所有的app threads都完成了clock t周期的计算? 60 | 61 | ### Table updates 62 | 63 | 由于tablet server会不断收到来自多个app thread的update请求,tablet server会先将update请求做一个本地cache(将update请求放到pending updates list中)。当且仅当收到client发送clock()请求时,tablet server才会集中处理将这些updates。这样可以保证row的新鲜度由vector clock唯一决定。 64 | 65 | ### Table read 66 | 67 | 当tablet server收到client端发来的read请求,会先查看`global_clock_value` (为什么不是该row的data age?),如果tablet server中的row新鲜度满足requested data age要求(`global_clock_value >= t-1-slack`),那么直接返回row给client。否则,将read request放到pending read list里面,并按照requested data age排序(从大到小?)。当`global_clock_value`递增到requested data age时,tablet server再将相应的row返回给client。除了返回row,tablet server还返回data age和requester clock。前者是`global_clock_value`,后者是client's clock(说明了which updates from this client have been applied to the row data,client可以利用这个信息来清除一些本地的oplogs)。 68 | 69 | ## LazyTable系统模块之Client library 70 | 71 | Client library与app threads在同一个process,用于将LazyTable API的调用转成message发送到tablet server。Client library包含多层caches和operation logs。Client library会创建一个或多个background threads (简称为bg threds)来完成propagating updates和receiving rows的工作。 72 | 73 | Client library由两层 caches/oplogs 组成:process cache/oplog和thread cache/oplog。Process cache/oplog被同在一个进程中的所有app thread和bg thread共享。Each thread cache/oplog is exclusively associated with one app thread.(实现好像不是这样的)。Thread cache的引入可以避免在process cache端有过多的锁同步,但是只能cache一些rows。 74 | 75 | Client library也使用vector clock来track app thread的clock,第i个分量代表第i个app thread已经进入的clock周期。 76 | 77 | ### Client updates 78 | App thread调用update(deltas)后,会先去访问对应的thread cache/oplog,如果cache中相应的row存在,那么`thread.cache.row += update.deltas`,同时会update写入到oplog中。不存在就直接存起来。当app thread调用clock(),那么在thread oplog中的updates都会被push到process oplog中,同时`process.cache.row += updates.deltas`。如果thread cache/oplog不存在,update会直接被push到process cache/oplog。 79 | 80 | 当一个client process中所有app threads都完成clock为 t 的计算周期,client library会使用一个bg thread(是head bg thread么?)向table server发送一个消息,这个消息包含clock t,process oplogs中clock为 t 的updates。这些process cache/oplogs中的updates会在发送该消息后一直保留,直到收到server返回的更新后的rows。 81 | 82 | ### Client read 83 | 84 | 在clock t周期内,如果一个app thread想要去读row r with a slack of s,那么client library会将这个请求翻译成`read row r with data age >= t-s-1`。接着,client library会先去thread cache中找对应的且满足条件的row,如果不存在就去process cache中找,如果还找不到就向tablet server发送要read row r的请求,同时block calling thread,直到server返回row r。在process cache中每个row有一个tag来表示是否有row request正在被处理,这样可以同步其它的request统一row的请求。 85 | 86 | 当tablet server返回row r时,client library端有一个bg thread会接受到row r,同时接受requester clock rc。rc表示该client提交的clock t的updates已经被处理。之后,process oplog就可以清除`clock <= rc` 的update日志。为了保证 read-my-updates,接收到row r 后,会将process oplog中`clock > rc`的操作作用到row r上,这样就可以得到本地最新的row r。最后,前面接受row r的bg thread会跟心row r的clock并将其返回到waiting app threads。 87 | 88 | 89 | ## Prefetching and fault-tolerance 90 | ### 数据预取 91 
| 92 | LazyTable提供了预取API refresh(),函数参数与read()一样,但与read()不一样的地方是refresh()不会block calling thread。 93 | 94 | LazyTable支持两种预取机制:conservative prefetching和aggressive prefetching。前者只在必要的时候进行refresh,如果`cache_age < t-s-1`,prefetcher才会发送一个`request(row = r, age >= t-s-1)`。对于Aggressive prefetching,如果当前的row不是最新的会主动去更新。 95 | 96 | 97 | ## Differences with Spark 98 | 99 | 1. Spark的通信模式比较简单,最复杂的是shuffle模块,需要redcuer去mapper端fetch数据。 100 | 2. 在PS中,client与server有频繁的交互通信。 101 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum基本架构.md: -------------------------------------------------------------------------------- 1 | # Petuum 基本架构 2 | 3 | 4 | ## Parameter Server (PS) 概念 5 | 6 | PS is a key-value store that allows different processes to share access to a set of variables. 7 | 8 | 对于Distirbuted ML来说,process指的是learning process,variables指的是parameters。PS的特点是 9 | 10 | 1. Data partition 11 | 12 | 每个节点存放一部分data 13 | 2. Shared model 14 | 15 | 多个learning process共享model(模型里面包含参数) 16 | 17 | ## 基本系统架构 18 | 1. 一个Server 19 | 20 | maintains the master copy of the **parameters** and propagates the workers’ **writes (updates)** to other workers. 21 | 2. 多个Worker 22 | 23 | 每个worker通过client library去server那里获取parameters。Client library还会cache之前从server那里获取到的parameters,这样worker就不必每次都去Server那里获取最新的parameters。这个cache成为process storage。每次对parameter所做的write(update)操作都会被insert到一个update table(代码里对应**Oplog**)。 24 | 25 | ## 数据模型 26 | 27 | 在PS中,parameters被表示成key-value paris,并存放在多个Table中。每一个table包含多个Rows,每个Row的类型都相同,且有一个RowID。Row中的每一个cell都包含一个Column ID,每个cell一般存放一个parameter。这样存放到Parameter Server中的每个parameter可以表示成。Table存放在一台或者多台机器上。 28 | 29 | Table-Row既是数据模型也是存储格式。PS允许app选择适合自己的数据结构来组织每个row中的parameters,甚至允许app自定义Rows。 30 | 31 | 每一个table都有自己的update table,update table也有自己的rows,不过是用来存放log的,这里称之为row oplog。 32 | 33 | ## 创建一个PS的app 34 | 35 | 创建一个简单的app,该app包含一个single-threaded的client和一个Table。 36 | 37 | ### 1. 引入头文件 38 | ```c++ 39 | #include 40 | ``` 41 | 42 | 所有app只需包含这个头文件,该文件包含了PS的所有APIs。第一步是去初始化PS的环境,相当于Spark里面的SparkContext,负责初始化的线程被称作init thread。 43 | 44 | 为了简化例子,我们只run一个worker process。如果要run多个worker process的话,所有的worker process要执行同样的初始化流程,并创建多个tables。 45 | 46 | ### 2. 注册row types 47 | Row type可以是多种类型,但需要在PS启动计算前注册。下面的API可以创建一个row ID到row type的映射,其中row ID是32位的integer。之后,app可以在创建table时候使用row ID来获取相应的row type。 48 | 49 | 下面的例子会创建一个row type,类型为vector,这个类型由PS的API提供。Row里面的T就是parameter的类型。更具体地,这里我们注册`petuum::DenseRow`到PS中,并将所有参数初始化为0,如下: 50 | 51 | ```c++ 52 | // register row type petuum::DenseRow with ID 0. petuum::PSTableGroup::RegisterRow >(0); 53 | ``` 54 | 55 | ### 3. 初始化PS环境 56 | 就像在SparkConf中要设置master,port,app name等,在Petuum中,需要设置`host_map`。我们需要将每一个worker process的信息加入到该map中,形成一个entry。每个entry有一个ID(从0开始计数的整数),一个IP地址,还有一个当前未用的port(比如10000)。具体代码如下: 57 | 58 | ```c++ 59 | petuum::TableGroupConfig table_group_config; table_group_config.host_map.insert(std::make_pair(0, HostInfo(0, "127.0.0.1", "10000"))); 60 | petuum::PSTableGroup::Init(table_group_config, false); 61 | ``` 62 | 63 | 将worker process加入到`host_map`中后,就可以使用`petuum::PSTableGroup::Init()`来初始化PS的环境,Init()还包含一个boolean flag,如果设置为true,就表示init thread可以访问table的所有APIs,这些APIs在`petuum:PSTableGroup::GetTableOrDie()`中定义。一般将flag置为false。 64 | 65 | ### 4. 
创建Tables 66 | 67 | 先show代码 68 | 69 | ```c++ 70 | petuum::ClientTableConfig table_config; table_config.table_info.row_type = 0; table_config.table_info.row_capacity = 100; 71 | table_config.process_cache_capacity = 1000; table_config.oplog_capacity = 1000; 72 | // here 0 is the table ID, which will be used later to get table. bool suc = petuum::PSTableGroup::CreateTable(0, table_config); 73 | ``` 74 | 对于一个app来说,上面的配置参数都需要设置。配置参数的具体含义见下表: 75 | 76 | | 名称 | 默认值 | 解释| 77 | |:-----|:------|:-------| 78 | | table\_info.row\_type| N/A | row type| 79 | | table\_info.row\_capacity| 0 | 对于 DenseRow,指column个数,对SparseRow无效| 80 | | process\_cache\_capacity| 0 | row个数| 81 | | process\_oplog\_capacity | 0 | update table里面最多可以写入多少个row| 82 | 83 | 调用`CreateTable()`后,就会去创建tables,创建好后需要调用下面的API来完成table创建过程。 84 | 85 | ```c++ 86 | petuum::PSTableGroup::CreateTableDone(); 87 | ``` 88 | ### 5. 创建并运行Worker threads 89 | 90 | 接下来我们将会创建一个worker thread,该thread可以通过Table接口来访问到parameters。 91 | 92 | 首先定义一个概念,可以访问table APIs的worker thread被称为**table thread**。 93 | 94 | 在成为table thread之前,该worker thread需要通过下面的API来注册自己 95 | 96 | ```c++ 97 | int thread_id = petuum::PSTableGroup::RegisterThread(); 98 | ``` 99 | 然后就可以通过Table ID来得到table实例: 100 | 101 | ```c++ 102 | petuum::Table table = petuum::PSTableGroup::GetTableOrDie(0); 103 | ``` 104 | 可以通过这个`petuum:Table`类型来访问table里面的parameters,之后可以进行计算。 105 | 106 | 当worker thread完成计算之后,需要通过下面的API注销自己 107 | 108 | ```c++ 109 | petuum::PSTableGroup::DeregisterThread(); 110 | ``` 111 | 112 | 如果想让init thread也能访问到table的API,需要将`petuum::PSTableGroup::Init(table_group_config, false);`中的false改为true。init thread不需要注册和注销自己,但它需要通过下面的API等待所有其他thread完成注册。 113 | 114 | ```c++ 115 | petuum::PSTableGroup::WaitThreadRegister(); 116 | ``` 117 | 118 | ### 6. Stop PS 119 | 当所有的worker threads都完成计算退出,我们可以通过下面的API shutdown PS。 120 | 121 | ```c++ 122 | petuum::PSTableGroup::ShutDown(); 123 | ``` 124 | 125 | ## Table API 126 | 127 | ### 1. 访问Table 128 | 在read或者update table之前,需要先get table 129 | ```c++ 130 | // Gain access to table. template petuum::Table petuum::PSTableGroup::GetTableOrDie(int table_id); 131 | ``` 132 | ### 2. Read parameters 133 | 134 | 先new一个`RowAccessor`对象,给定`row_id`后,下面的API会将row信息写入到`row_accessor`指向的`RowAccessor`对象。 135 | 136 | ```c++ 137 | void petuum::Table::Get(int32_t row_id, RowAccessor *row_accessor); 138 | ``` 139 | 140 | ### 3. Update parameters 141 | Petuum提供了两种更新参数的方式: 142 | - 只更新一个parameter 143 | 通过`row_id`和`column_id`定位到parameter,然后更新 144 | 145 | ```c++ 146 | void petuum::Table::Inc(int32_t row_id, int32_t column_id, UPDATE update); 147 | ``` 148 | - 更新一组参数 149 | 通过`row_id`定位到row,然后更新 150 | ```c++ 151 | void petuum::Table::BatchInc(int32_t row_id, const UpdateBatch& update_batch); 152 | ``` 153 | 154 | ### 4. Completion of A Clock Tick 155 | 156 | Inform PS that this thread is advancing to the next iteration, workers only commit their updates at the end of each clock. 157 | 158 | ```c++ 159 | static void petuum::PSTableGroup::Clock(); 160 | ``` 161 | 162 | ## 编译 163 | 164 | ### 1. 编译PS 165 | 进入root文件夹,执行: 166 | ```c++ make third_party_core make ps_lib -j8 167 | ``` 168 | PS library依赖很多第三方库,第一条command就是去编译这些库的。 169 | 170 | ### 2. 
编译app 171 | 在自己的app目录下建立Makefile,并将`defns.mk`里面的内容加入到app的Makefile中。 172 | 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum基础.md: -------------------------------------------------------------------------------- 1 | # Petuum基础 2 | 3 | 4 | Petuum将ML算法应用分为两种类型:Big data(has many data samples)和Big Model(has very large parameter and intermediate variable spaces)。针对这两种应用,Petuum分别设计了两个系统功能模块及一个系统优化模块: 5 | 6 | ## 主要系统模块 7 | 8 | - Distributed parameter server (i.e. key-value storage) 9 | - 用于global的参数同步,主要支持Big data类型算法的并行化,比如矩阵LR分解 10 | - Distributed model scheduler (STRADS) 11 | - 调度worker tasks,主要支持Big Model类型算法的并行化,比如Lasso 12 | - Out-of-core (disk) storage for limited-memory situations 13 | - 针对内存不足的情况,设计的磁盘存储策略 14 | 15 | 前两种可以组合使用,但在目前的例子是分开使用的。 16 | 17 | 更详细的介绍: 18 | 19 | We have develop a prototypic framework for Big ML called Petuum, which comprises several interrelated components, each focused on exploiting various specific properties of *iterative-convergent* behavior in ML. **The components can be used individually, or combined to handle tasks that require their collective capabilities.** Here, we focus on two components: - **Parameter Server for global parameters:** Our parameter server(Petuum-PS) is a distributed key-value store that enables an easy-to-use, distributed-shared-memory model for writing distributed ML programs over **BIG DATA**. Petuum-PS supports novel consistency models such as bounded staleness, which achieve provably good results on iterative-convergent ML algorithms. Petuum-PS additionally offers several “tuning knobs” available to experts but otherwise hidden from regular users such as thread and process-level caching and read-my-write consistency. We also support out-of-core data streaming for datasets that are too large to fit in machine memory. 20 | ![](figures/Petuum-ps-topology.png) 21 | - **Variable Scheduler for local variables:** Our scheduler (STRADS) analyzes the variable structure of ML problems, in order to find parallelization opportunities over **BIG MODEL** while avoiding error due to strong dependencies. STRADS then dispatches parallel variable updates across a distributed cluster, while prioritizing them for maximum objective function progress. Throughout this, STRADS maintains load balance by dispatching new variable updates as soon as worker machines finish existing ones. 22 | 23 | ![](figures/STRADS-architecture.png) 24 | 25 | ## 基本逻辑架构 26 | 27 | ![](figures/petuum-overview.png) 28 | 29 | An update function updates the model parameters and/or latent model states 𝜃 by some function 𝚫𝜃(𝓓) of the data 𝓓. Data parallelism divides the data 𝓓 among different workers, whereas model parallelism divides the parameters (and/or latent states) 𝜃 among different worker. 30 | 31 | 在左图中,数据是分布的但模型参数$\theta$没有分布,每个worker节点持有完整的参数,${worker}\_{i}$要在分块数据${D}\_{i}$上计算参数更新$\Delta\theta({D}\_{i})$(可以想象成梯度)。一般来说,如果参数可以batch update(不需要一个固定的更新顺序),那么计算$\Delta\theta({D}\_{i})$与计算$\Delta\theta({D}\_{j})$过程可以独立,就可以用PS的架构了。 32 | 33 | 在右图中,模型是分布的但数据没有分布,每个worker持有全部的数据,但只持有一部分参数${\theta}\_{i}$,${worker}\_{i}$在整个数据${D}$上计算一部分参数更新$\Delta{\theta}\_{i}(D)$。 34 | 35 | ## 基本算法 36 | - Matrix Factorization 37 | - Stochastic Gradient Descent更新方式 38 | - a data-parallel algorithm 39 | - LASSO regression 40 | - Coordinate Descent更新方式 41 | - a model-parallel algorithm 42 | 43 | ## 共享目录 44 | 45 | **We highly recommend using Petuum in an cluster environment with a shared filesystem** (e.g. 
shared home directories).在实际环境中,这一条很难实现,目前cluster不会有共享的home目录。 Provided all machines are identically configured and have the necessary packages/libraries, you only need to compile Petuum (and any apps you want to use) once, from one machine. The Petuum ML applications are all designed to work in this environment, as long as the input data and configuration files are also available through the shared filesystem. 46 | 47 | ## PS的配置文件 48 | ``` 49 | 0 ip_address_0 10000 50 | 1 ip_address_0 9999 51 | 1000 ip_address_1 9999 52 | 2000 ip_address_2 9999 53 | 3000 ip_address_3 9999 54 | ``` 55 | 56 | Each line in the server configuration file format specifies an ID (0, 1, 1000, 2000, etc.), the IP address of the machine assigned to that ID, and a port number (9999 or 10000). Every machine is assigned to one ID and one port, except for the first machine, which is assigned two IDs and two ports because it has a special role. 整个role就是NameNode。 57 | 58 | If you want to simultaneously run two Petuum apps on the same machines, make sure you give them **separate** Parameter Server configuration files with **different ports**. **The apps cannot share the same ports!** 59 | 60 | ## ML App: Matrix Factorization 61 | 这个例子的运行过程文档已经讲的很清楚,这里再解释几个文档没有细讲的地方: 62 | M = L * R,M是9x9的矩阵,分解后的L是9x3的矩阵,R是3x9的矩阵。 63 | 64 | 1. 当K=3时,MF的输出结果(L矩阵如下): 65 | 66 | ``` 67 | 0.115764 1.03662 0.100797 68 | 0.115764 1.03662 0.100797 69 | 0.115764 1.03662 0.100797 70 | -1.07724 0.107777 0.922327 71 | -1.07724 0.107777 0.922327 72 | -1.07724 0.107777 0.922327 73 | 1.16671 -0.218361 1.17542 74 | 1.16671 -0.218361 1.17542 75 | 1.16671 -0.218361 1.17542 76 | 77 | ``` 78 | 可以发现是每三行几乎一样,原因是$rank(AB) \leq min(rank(A),rank(B))\leq K = 3$。当K=3是9的一个公约数时,分解得到三个线性无关的向量,但当K=4时,不是9的公约数时就没有这个特性了。 79 | 80 | 2. 当K=4时,MF的输出结果(R矩阵如下): 81 | ``` 82 | -0.0919685 -0.668313 0.789098 0.0187957 83 | -0.225036 -0.585597 0.82593 0.123971 84 | 0.140019 -0.814718 0.724556 -0.16359 85 | -0.839692 0.564146 0.475171 -0.915151 86 | -0.728989 0.494618 0.444423 -1.00233 87 | -0.790882 0.533725 0.46164 -0.953696 88 | -1.03243 -1.04973 -0.924 -0.464828 89 | -0.917422 -1.12158 -0.955918 -0.55563 90 | -1.11345 -0.998635 -0.901482 -0.401137 91 | ``` 92 | 93 | 3. App configuration 94 | 需要解释几个配置: 95 | - `client_worker_threads`: how many worker threads to use on each machine 96 | - `--staleness x`: turn on Stale Synchronous Parallel (SSP) consistency at staleness level x; often improves performance when using many machines 97 | - `--lambda x`: sets the L2 regularization strength to x; default is 0 98 | - `--offsetsfile x`: used to provide an "offsets file" for limited-memory situations; 99 | - `--init_step_size x`, --step_size_offset y, --step_size_pow z: used to control the SGD step size. The step size at iteration t is $x * {(y+t)}^{-z}$. Default values are $x=0.5, y=100, z=0.5$. 100 | - `--ps_row_cache_size x`: controls the cache size of each worker machine. By default, the MF app caches the whole L, R matrices for maximum performance, but this means every machine must have enough memory to hold a full copy of L and R. If you are short on memory, set x to the maximum number of L rows and R columns you wish to cache. For example, `--ps_row_cache_size 100` forces every client to only cache 100 rows of L and 100 columns of R. 
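顺带把上面 `--init_step_size`、`--step_size_offset`、`--step_size_pow` 所描述的步长公式写成代码会更直观。下面是一个极简的 C++ 示意(仅用于说明公式,`SgdStepSize` 为自拟函数名,并非 MF app 的真实接口):

```c++
#include <cmath>

// 第 t 轮迭代的 SGD 步长:x * (y + t)^(-z)
// 其中 x = init_step_size, y = step_size_offset, z = step_size_pow
double SgdStepSize(int t, double x = 0.5, double y = 100.0, double z = 0.5) {
  return x * std::pow(y + t, -z);
}
// 默认参数下,t = 0 时步长约为 0.05,并随迭代轮数增加单调减小
```
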
101 | 102 | 103 | 比如要run一个client worker threads为4,staleness为5,lambda为0.1的MF例子: 104 | ``` 105 | scripts/run_matrixfact.sh sampledata/9x9_3blocks 3 100 mf_output scripts/localserver 4 "--staleness 5 --lambda 0.1" 106 | ``` 107 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum本地编译运行.md: -------------------------------------------------------------------------------- 1 | # Petuum本地编译运行 2 | 3 | 4 | ## 编译Pettum 5 | 按照Pettum的安装文档[Installation](https://github.com/petuum/public/wiki/Installation)编译。该编译流程会先下载并编译第三方库(如boost,gflags,leveldb等等),然后会编译petuum自身(见petuum/Makefile)。 6 | 7 | ## 编译apps 8 | 上面的步骤只是编译Petuum计算框架,每个app需要单独编译。比如要编译Matrixfact,只需要进入apps/matrixfact,执行`make`就行了。具体参见[ML App: Matrix Factorization](https://github.com/petuum/public/wiki/ML-App:-Matrix-Factorization)。其他的apps(比如LDA,DNN类似)。 9 | 10 | ## 本地运行app测试 11 | 按照Pettum的[wiki](https://github.com/petuum/public/wiki)运行就好了。 12 | 13 | ## 在Eclipse里面编译petuum 14 | 想要深入理解代码,肯定要导入IDE中debug了。 15 | 16 | 1. 导入project 17 | 18 | 下载Linux版本的Eclipse CDT,将编译好的petuum整个文件夹拷贝到workspace下面,然后删去third_party下面的src文件夹(整个文件夹存放了第三方库的源码,可以不要)。最后,将workspace/petuum import到Eclipse里面,选择`File->New project->C/C++->Makefile Project with Existing Code`,设置Toolchain for Indexer Settings为Linux GCC。 19 | 20 | 2. 编译Petuum 21 | 22 | 导入后,直接`Project->Build project`就可以编译petuum了,以后可以修改petuum的源代码,然后直接build就行了。 23 | 24 | 3. 编译apps 25 | 26 | Eclipse不能自动识别子项目的Makefile,因此`Project->Build project`只能编译petuum自身不能编译apps。解决方法是手动添加apps的编译项。具体方法是在`Properties->C/C++ Build->Manage configurations->New`添加一个编译项,比如name设为matrixfact,确定后设置Build directory为`${workspace_loc:/petuum-0.93/apps/matrixfact}`,Refresh Policy里面的Resources设置为`${workspace_loc:/petuum-0.93/apps/matrixfact}`,最后设置matrixfact为active后,就可以编译matrixfact子项目了。 27 | 28 | ## 在Eclipse里面运行petuum 29 | 30 | Petuum使用脚本运行方式来执行app,比如需要在某个node上执行 31 | 32 | ``` 33 | scripts/run_matrixfact.sh sampledata/9x9_3blocks 3 100 mf_output scripts/localserver 4 "--staleness 5 --lambda 0.1" 34 | ``` 35 | 来运行matrixfact,那么如何在Eclipse里达到同样的效果? 36 | 37 | 答案是在`Run->External Tools->External Tools Configurations`里面的Program里面添加一个运行项,比如命名为run_matrixfact。然后设置Location为`${workspace_loc:/petuum-0.93/apps/matrixfact/scripts/run_matrixfact.sh}`,Workding Directory为`${workspace_loc:/petuum-0.93/apps/matrixfact}`, Arugments为`sampledata/9x9_3blocks 3 100 mf_output scripts/localserver 2`。然后run就相当于在terminal里面执行该脚本了。但遗憾的是没有debug as external tools。想要debug,目前可以用下面的方法解决。 38 | 39 | ## 在Eclipse里面debug app 40 | 41 | 首先要new一个Debug configuration,名字可以是app的名字,比如matrixfact。C/C++ Application是app的路径,比如`apps/matrixfact/bin/matrixfact`。Build Configuration选择之前添加的`matrixfact`。Program arguments填入 42 | 43 | ``` 44 | --hostfile scripts/localserver --datafile sampledata/9x9_3blocks --output_prefix mf_output --K 3 --num_iterations 100 --num_worker_threads 2 --num_clients 1 --client_id 0 45 | ``` 46 | 47 | Working directionary输入`${workspace_loc:petuum-0.93/apps/matrixfact}`。最后设置断点,开始debug。 48 | 49 | > 需要注意的是以这种方式进行debug仅仅是在debug众多client中的一个。 50 | 51 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum系统及Table配置.md: -------------------------------------------------------------------------------- 1 | # Petuum系统及Table配置 2 | 3 | ## 系统配置 4 | 5 | 系统相关的配置都会被存放到`petuum::TableGroupConfig`对象中。 6 | 7 | ### 1. 
建立一个distributed app 8 | 9 | 每一个worker process需要在`host_map`中注册(以一个entry方式存在)。为了简化配置,用户可以定义**optional** server file,交给PS处理。下面的API会用`server_file`中的内容初始化`host_map`。 10 | 11 | ```c++ 12 | void petuum::GetHostInfos(std::string server_file, std::map *host_map); 13 | ``` 14 | 15 | 一个Sever file的例子,三列分别是`processID, IP addresss, port`: 16 | ```c 0 192.168.1.1 10000 1 192.168.1.2 10000 17 | ``` 18 | 另外,需要告诉PS总共要run多少个process以及每个process的ID,如下表 19 | 20 | | 名称 | 默认值 | 解释 | 21 | |:-------|:----------|:-------| 22 | |num\_total\_clients| 1 | Number of processes to run| 23 | | client\_id | 0 | This process's ID | 24 | 25 | ### 2. 让每个node上run多个worker threads 26 | 27 | 设置app threads的个数,包含init thread。 28 | 29 | | 名称 | 默认值 | 解释 | 30 | |:-------|:----------|:-------| 31 | | num\_total\_app\_threads | 2 | Number of local application threads, including the init threads | 32 | 33 | ### 3. 建立多个Table 34 | 35 | 基于以下原因用户可能由建立多个Tables的需求: 36 | - 有多个row types,不同row type对应的row个数(row capacity)也不一样 37 | - 不同的 staleness constraints 38 | - 一个table里row个数太多,放不下 39 | 40 | 要设置多个table,需要更改下面的默认值: 41 | 42 | | 名称 | 默认值 | 解释 | 43 | |:-------|:----------|:-------| 44 | | num\_tables | 1 | 系统包含的table个数| 45 | 46 | ### 4. Taking SnapShots and Resume From SnapShots 47 | 48 | SnapShot可以暂存中间计算结果,对于迭代型的ML算法来说,既可以暂停程序,也可以进行错误恢复,类似与Spark的checkpoint。相关参数见下表: 49 | 50 | | 名称 | 默认值 | 解释 | 51 | |:-------|:----------|:-------| 52 | | snapshot\_clock | -1 | take snapshots every `x` iterations | 53 | | snapshot\_dir | "" | 存放snapshot的目录 | 54 | | resume_clock | -1 | if specified, resume from iteration `x` | 55 | | resume\_dir | ""|从存放snapshot的目录中恢复 | 56 | 57 | ### 5. Runtime Statistics 58 | 59 | PS可以记录runtime statistics,但需要在defns.mk中注释掉下面这一行 60 | 61 | ```c++ 62 | PETUUM_CXXFLAGS += -DPETUUM_STATS 63 | ``` 64 | ## Table配置 65 | 66 | ### 1. 选择Client cahce types 67 | 68 | Client端的cache type由两种: 69 | 70 | - BoundedDense 71 | 72 | BoundedDense是一个连续的memory chunk,适用于模型能够全部装载到client的memory里面的情况。如果`C`代表cache capacity ,那么此时可以访问到row IDs就是\[0, C-1\]。 73 | - BoundedSparse 74 | 75 | BoundedSparse支持换出操作,因此适合于memory不够的情况。 76 | 77 | 78 | ### 2. Staleness threshold 79 | 80 | Petuum里面最重要的概念就是staleness,也就是允许worker thread最多读取多少轮前的parameters。默认是0,通过下表设置 81 | 82 | | 名称 | 默认值 | 解释 | 83 | |:-------|:----------|:-------| 84 | | table\_info.table\_staleness | 0 | SSP staleness threshold | 85 | 86 | ### 3. Row capacity 87 | 88 | | 名称 | 默认值 | 解释 | 89 | |:-------|:----------|:-------| 90 | | table\_info.row\_capacity | 0 | Row capacity | 91 | 92 | 一些(比如dense)row types需要设置这个参数。1代表sparse,0代表dense。如果updates是dense的,最好设置成dense。 -------------------------------------------------------------------------------- /BigDataSystems/Petuum/STRADS.md: -------------------------------------------------------------------------------- 1 | # STRADS 2 | 3 | 4 | STRADS意思是STRucture-Aware Dynamic Scheduler,一个动态调度框架,用于计算Big Model类型的app。STRADS调度的是模型参数,而不是data,调度参数会让参数更新和收敛的速度更快,但要求在参数间没有依赖。目前包含两个app:Lasso和Logistic Regression。 5 | 6 | ## STRADS的四个组件 7 | 四个组件组成了scatter/gather的拓扑结构。A Coordinator, multiple workers and an aggregator make a scatter/gather style topology. 8 | 9 | ### 1. Scheduler 10 | 11 | Scheduler maintain weight information of model parameter. In each iteration, scheduler selects a set of promising model parameters to dispatch based on the weight information so that updating the scheduled parameters is likely to increase convergence speed than updating randomly selected parameter as common in stochastic method. 
Scheduler update weight information on receiving weight change from the coordinator when the dispatched is completed in the coordinator side. In addition to weight based sampling, STRADS scheduler runs user defined model dependency checking routine for a give set of model parameters. If any pair of parameters has too strong interference, one of them will be removed from the set. 12 | 13 | Scheduler维护参数间的权重信息$w$。在每次迭代时,scheduler先按照权重选择一部分promising的参数集合$S$分发给worker,这样的选择方式比random的选择方式能更快地收敛。当参数集合$S$计算并更新完毕后(这个过程由coordinator负责),coordinator会将$w$的更新信息发给scheduler。另外,scheduler可以运行user defined model 依赖检测程序来检测是否参数间有强依赖关系,如果有,那么去掉一个参数。 14 | 15 | 16 | ### 2. 一个Coordinator 17 | 18 | Coordinator is in charge of keeping model parameters, scattering a dispatch of parameter over the worker machines, sending back weight change information to the scheduler. In i-th iteration, the coordinator receive a dispatch set from the scheduler and scatter the dispatch together to all over the worker machines. On receiving updated model parameter values from the aggregator, it will udpate model parameters and send weight change information to the scheduler. 19 | 20 | Coordinator负责持有参数,分发参数给worker,更新参数,并将更新后的$w$信息发给scheduler。当迭代到第i轮时,coordinator会先收到scheduler发来的一个参数集合(dispatch set),然后将参数集合发到所有的worker节点。Worker计算参数更新$\Delta\theta$后,将$\Delta\theta$发给aggregator,aggregator负责更新参数,并将更新结果发给Coordinator,Coordinator存储新的$\theta$后将权重$w$更新信息发给scheduler。 21 | 22 | ### 3. 多个Worker 23 | 24 | On receiving a dispatch, worker executes user function to make a partial result with a partition of input data that is assigned to the worker machine. Each worker sends back its partial results to the aggregator. 25 | 26 | 当收到参数集合$S$后,worker在data partition上执行user function,并将计算结果$\Delta\theta$发给aggregator。 27 | 28 | ### 4. Aggregator 29 | 30 | On collecting all partial results of one dispatch, aggregator runs user defined aggregation function to get new value of model parameters. New model parameter values are sent to the coordinator to be kept. 31 | 32 | 当收到所有的所有worker发来的partial results后,aggregator运行user defined aggregation function来计算出新的参数$theta$,然后将新的参数发送给coordinator存放。 33 | 34 | 35 | ## STRADS提供的其他low level primitives 36 | 37 | - Scheduling 38 | - Global Barrier 39 | - Data Partitioning 40 | - Message abstraction 41 | 42 | ## STRADS编程接口 43 | 44 | 整个编程范型是 scatter/gather。 45 | 46 | Basically, STRADS allows users to define functions to run on scheduler, coordinator, workers and aggregator vertexes. In addition to the functions, use can define message types as C++ template for communicating across different vertexes. STRADS programming interfaces are implemented in the form of two classes. 47 | 48 | STRADS允许在scheduler,coordinator,workers和aggregator vertexes上自定义函数。除了自定义函数外,用户也可以定义不同vertex之间消息传递的message type。具体地, STRADS的编程接口由下面两个类实现: 49 | 50 | ### Handler Class 51 | 52 | Handler class is a template class where user can define user functions as class method here. T1 ~ T4 are template of user defined messages and used as parameter and return type of class methods(user functions). STRADS requires 4 major user functions for scheduling/updating parameters and three minor function for checking progress such as calculating objective function value. 
53 | 54 | 用于调度和参数更新的4个Major User Functions,T1 ~ T4 是用户定义的消息类型 55 | ```c++ 56 | T1 &dispatch_scheduling(SYSMSG, T3) 57 | void do_work(T1, T2) 58 | void do_msgcombiner(T2, stradsctx) 59 | void do_aggregator(T3, stradsctx) 60 | int check_dependency(list parameters) 61 | set_initi_priority(list weight, model_cnt) 62 | ``` 63 | Minor User Functions for progress checking,比如要查看目标函数的value。 64 | ```c++ 65 | void do_obj_calc(T4, stradsctx) 66 | void do_msgcombiner_obj(T4, stradsctx) 67 | void do_object_aggregation(T4, stradsctx) 68 | ``` 69 | ### Message Class 70 | 71 | Message class is a template class that allows user to define a type of messages that contains arbitrary number of elements. Logically, user can define any type for the element. STRADS provides several template classes that can make a message with one, two or three different kinds of element types. Again, you can put arbitrary number of elements with different types on a message. If you define your message type only with POD type, you can simply finish message class definition with defining elements type. 72 | 73 | element class 74 | message class 75 | 76 | 消息可以包含任意多个elements,每个element可以是任意类型。 77 | 78 | 79 | 80 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/ServerThreads.md: -------------------------------------------------------------------------------- 1 | # ServerThreads 2 | 3 | ## 基本结构 4 | 1. 每个client上的app进程持有一个ServerThreads object,这个object管理该client上的所有server threads。这些server threads的启动过程:`app.main() => PSTableGroup::Init() => ServerThreads::Init() => ServerThreadMain(threadId) for each server thread`。 5 | 2. 每个server thread实际上是一个Server object。ServerThreads对象通过`vector threads`和`vector threads_ids`来引用server threads,通过其ServerContex指针用来访问每个server thread对应的Server object(`server_context_ptr->server_obj`)。 6 | 3. 对于每一个server thread,都持有一个ServerContext,其初始化时`server_context.bg_threads_ids`存储PS中所有bg threads的`bg_thread_id`,`server_context.server_obj`存储该server thread对应的Server object。 7 | 4. 每个Server object里面存放了三个数据结构:`client_bg_map`存放PS中有那些client,每个client上有那些bg threads;`client_ids`存放PS中有那些client;`client_clocks`是VectorClock,存放来自client的clock,初始化时clock为0。每个Server thread在初始化时会去connect PS中所有的bg thread,然后将`(client_id, 0)`添加到server thread对应的Server object中的`client_clocks`中。如果某个client上有多个bg thread,那么`(client_id, 0)`会被重复添加到`client_clocks: VectorClock`中,会做替换。注意`client_clocks: VectorClock`的长度为PS中client的总个数,也就是每一个client对应一个clock,而不是每个bg thread对应一个clock。Server object还有一个`client_vector_clock_map`的数据结构,key为`client_id`,value为该client上所有bg thread的VectorClock。也就是说每个server thread不仅存放了每个client的clock,也存放了该client上每个bg thread的clock。 8 | 5. Server object还有一个`bg_version_map`的数据结构,该结构用于存放server thread收到的bg thread的最新oplog版本。 9 | 10 | ## CreateTable 11 | 12 | Server thread启动后,会不断循环等待消息,当收到Namenode发来的`create_table_msg`时,会调用`HandleCreateTable(create_table_msg)`来createTable,会经历以下步骤: 13 | 14 | 1. 从msg中提取出tableId。 15 | 2. 回复消息给Namenode说准备创建table。 16 | 3. 初始化TableInfo消息,包括table的`staleness, row_type, columnNum (row_capacity)`。 17 | 4. 然后调用server thread对应的Server object创建table,使用`Server.CreateTable(table_id, table_info)`。 18 | 5. Server object里面有个`map tables`数据结构,`CreateTable(table_id)`就是new出一个ServerTable,然后将其加入这个map。 19 | 6. 
ServerTable object会存放`table_info`,并且有一个`map storage`,这个map用来存放ServerTable中的rows。另外还有一个`tmp_row_buff[row_length]`的buffer。new ServerTable时,只是初始化一些这些数据结构。 20 | 21 | ## HandleClientSendOpLogMsg 22 | 23 | 当某个server thread收到client里bg thread发来的`client_send_oplog_msg`时,会调用ServerThreads的`HandleOpLogMsg(client_send_oplog_msg)`,该函数会执行如下步骤: 24 | 25 | 1. 从msg中抽取出`client_id`,判断该msg是否是clock信息,并提取出oplog的version。 26 | 2. 调用server thread对应的`ServerObj.ApplyOpLog(client_send_oplog_msg)`。该函数会将oplog中的updates requests都更新到本server thread维护的ServerTable。 27 | 3. 如果msg中没有携带clock信息,那么执行结束,否则继续下面的步骤: 28 | 4. 调用`ServerObj.Clock(client_id, bg_id)`,并返回`bool clock_changed`。该函数会更新client的VectorClock(也就是每个bg thread的clock),如果client的VectorClock中唯一最小的clock被更新,那么client本身的clock也需要更新,这种情况下`clock_changed`为true。 29 | 5. 如果`clock_changed == false`,那么结束,否则,进行下面的步骤: 30 | 6. `vector requests = serverObject.GetFulfilledRowRequests()`。 31 | 7. 对每一个request,提取其`table_id, row_id, bg_id`,然后算出bg thread的`version = serverObj.GetBgVersion(bg_id)`。 32 | 8. 根据提取的`row_id`去Server object的ServerTable中提取对应的row,使用方法`ServerRow server_row = ServerObj.FindCreateRow(table_id, row_id)`。 33 | 9. 调用`RowSubscribe(server_row, bg_id_to_client_id)`。如果consistency model是SSP,那么RowSubscribe就是SSPRowSubscribe;如果是SSP push,那么RowSubscribe就是SSPPushRowSubscribe。NMF使用是后者,因此这一步就是`SSPPushRowSubscribe(server_row, bg_id_to_client_id)`。该方法的意思是将`client_id`注册到该`server_row`,这样将该`server_row`在调用`AppendRowToBuffs`可以使用`callback_subs.AppendRowToBuffs()`。 34 | 10. 查看Server object中VectorClock中的最小clock,使用方法`server_clock = ServerObj.GetMinClock()`。 35 | 11. `ReplyRowRequest(bg_id, server_row, table_id, row_id, sersver_clock)`。 36 | 12. 最后调用`ServerPushRow()`。 37 | 38 | ### `Server.ApplyOpLog(oplog, bg_thread_id, version)` 39 | 40 | 1. check一下,确保自己`bg_version_map`中该bg thread对应的version比这个新来的version小1。 41 | 2. 更新`bg_version_map[bg_thread_id] = version`。 42 | 3. oplog里面可以存在多个update request,对于每一个update request,执行以下步骤: 43 | 4. 读取oplog中的`table_id, row_id, column_ids, num_updates, started_new_table`到updates。 44 | 5. 根据`table_id`从`ServerObj.tables`中找出对应的ServerTable。 45 | 6. 执行ServerTable的`ApplyRowOpLog(row_id, column_ids, updates, num_updates)`。该方法会找出ServerTable对应的row,并对row进行`BatchInc(column_ids, updates)`。如果ServerTable不存在该row,就先`CreateRow(row_id)`,然后`BatchInc()`。 46 | 7. 打出"Read and Apply Update Done"的日志。 47 | 48 | ### `ServerObj.Clock(client_id, bg_id)` 49 | 50 | 1. 执行`ServerObj.client_vector_clock_map[client_id].Tick(bg_id)`,该函数将client对应的VectorClock中`bg_id`对应的clock加1。 51 | 2. 如果`bg_id`对应的原始clock是VectorClock中最小值,且是唯一的最小值,那么clock+1后,需要更新client对应的clock,也就是对`client_clocks.Tick(client_id)`。 52 | 3. 然后看是否达到了snapshot的clock,达到就进行checkpoint。 53 | 54 | ## HandleRowRequestMsg 55 | 56 | 当某个server thread收到client里bg thread发来的`row_request_msg`时,会调用ServerThreads的`HandleRowRequest(bg_id, row_request_msg)`,该函数会执行如下步骤: 57 | 58 | 1. 从msg中提取出`table_id, row_id, clock`。 59 | 2. 查看ServerObj中的所有client的最小clock。使用`server_clock = ServerObj.GetMinClock()`。 60 | 3. 如果msg请求信息中的clock > `server_clock`,也就是说目前有些clients在clock时的更新信息还没有收到,那么先将这个msg的request存起来,等到ServerTable更新到clock时,再reply。具体会执行`ServerObj.AddRowRequest(sender_id, table_id, row_id, clock)`。 61 | 4. 如果msg请求信息中的clock <= `server_clock`,也就是说ServerTable中存在满足clock要求的rows,那么会执行如下步骤: 62 | 5. 得到`bg_id`的version,使用`version = ServerObj.GetBgVersion(sender_id)`,`sender_id`就是发送`row_request_msg`请求的client上面的bg thread。 63 | 6. 将ServerTable中被request的row取出来到`server_row`。 64 | 7. 调用`RowSubscribe(server_row, sender_id_to_thread_id)`。 65 | 8. 
将`server_row`reply给bg thread,具体使用`ReplyRowRequest(sender_id, server_row, table_id, row_id, server_clock, version)`。 66 | 67 | 68 | 69 | ### `ServerObj.AddRowRequest(sender_id, table_id, row_id, clock)` 70 | 71 | 当来自client的request当前无法被处理的时候(server的row太old),server会调用这个函数将请求先放到队列里。具体执行如下步骤: 72 | 73 | 1. 先new一个ServerRowRequest的结构体,将`bg_id, table_id, row_id, clock`放到这个结构体中。 74 | 2. 将ServerRowRequest放进`map> clock_bg_row_requests`中,该数据结构的key是clock,vector中的index是`bg_id`,value是ServerRowRequest。 75 | 76 | ### `ReplyRowRequest(sender_id, server_row, table_id, row_id, server_clock, version)` 77 | 78 | 1. 先构造一个`ServerRowRequestReplyMsg`,然后将`table_id, row_id, server_clock, version`填入这个msg中。 79 | 2. 然后将msg序列化后发回给`bg_id`对应的bg thread。 80 | 81 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/TableCreation.md: -------------------------------------------------------------------------------- 1 | # CreateTable过程 2 | 3 | ## 基本流程 4 | 5 | 1. 每个App main Thread(比如每个节点上matrixfact.main()进程的main/init thread)调用`petuum::PSTableGroup::CreateTable(tableId, table_config)`来创建Table。 6 | 2. 该方法会调用同在一个Process里的head bg thread向NameNode thread发送创建Table的请求`create_table_msg`。 7 | 3. NameNode收到CreateTable请求,如果该Table还未创建,就在自己的线程里创建一个ServerTable。之后会忽略其他要创建同一Table的请求。 8 | 4. NameNode将CreateTable请求`create_table_msg`发送到cluster中的每个Server thread。 9 | 5. Server thread收到CreateTable请求后,先reply `create_table_reply_msg` to NameNode thread,表示自己已经知道要创建Table,然后直接在线程里创建一个ServerTable。 10 | 6. 当NameNode thread收到cluster中所有Server thread返回的reply消息后,就开始reply `create_table_reply_msg` to head bg thread说“Table已被ServerThreads创建”。 11 | 7. 当App main()里定义的所有的Table都被创建完毕(比如matrixfact里要创建三个Table),NameNode thread会向cluster中所有head bg thread发送“所有的Tables都被创建了”的消息,也就是`created_all_tables_msg`。 12 | 13 | ## 流程图 14 | ![CreateTable](figures/CreateTableThreads.png) 15 | 16 | ## 代码结构图 17 | 18 | ![CreateTable](figures/CreateTable.png) -------------------------------------------------------------------------------- /BigDataSystems/Petuum/ThreadInitialization.md: -------------------------------------------------------------------------------- 1 | # Petuum的线程启动过程分析 2 | 3 | Start PS的第一个步骤就是初始化各个线程 4 | ```c++ 5 | petuum::PSTableGroup::Init(table_group_config, false) 6 | ``` 7 | 其具体实现是 8 | - 初始化每个node上的namenode,background及server threads 9 | - 建立这些threads之间的通信关系 10 | - 为createTables()做准备 11 | 12 | ## Namenode thread 13 | 一个Petuum cluster里面只有一个Namenode thread,负责协同各个节点上的bg threads和server threads。 14 | 15 | 16 | ## Server thread 17 | 角色是PS中的Server,负责管理建立和维护用于存放parameters的global tables。 18 | 19 | ## Background (Bg) thread 20 | 角色是PS中的Client,负责管理真正计算的worker threads,并与server thread通信。在每个node上,bg threads可以有多个,其中一个负责建立本地 table。 21 | 22 | ## 代码结构与流程 23 | ![init](figures/PSTableGroup-Init().png) 24 | 25 | 26 | ## Local 模式线程启动分析 27 | 28 | 启动流程 29 | 30 | ```c++ 31 | // main thread调用PSTableGroup::Init()后变成init thread并向CommBus注册自己 32 | I1230 10:00:50.570231 9821 comm_bus.cpp:117] CommBus ThreadRegister() 33 | // init thread创建Namenode thread,该向CommBus注册自己 34 | I1230 10:01:16.210435 10014 comm_bus.cpp:117] CommBus ThreadRegister() 35 | // Namenode thread启动 36 | NameNode is ready to accept connections! 
37 | // cluster中bg thread的个数 38 | I1230 10:05:09.398447 10014 name_node_thread.cpp:126] Number total_bg_threads() = 1 39 | // cluster中的server thread的个数 40 | I1230 10:05:09.398485 10014 name_node_thread.cpp:128] Number total_server_threads() = 1 41 | // app中定义的table_group_config的consistency_model = SSPPush or SSP 42 | I1230 10:06:24.141788 9821 server_threads.cpp:92] RowSubscribe = SSPPushRowSubscribe 43 | // 启动(pthread_create)所有的local server threads,这里只有一个 44 | I1230 10:09:50.340092 9821 server_threads.cpp:106] Create server thread 0 45 | // Server thread获取cluster中的client个数 46 | I1230 10:12:15.419473 10137 server_threads.cpp:239] ServerThreads num_clients = 1 47 | // Server thread自己的thread id 48 | I1230 10:12:15.419505 10137 server_threads.cpp:240] my id = 1 49 | // Server thread向CommBus注册自己 50 | I1230 10:12:15.419514 10137 comm_bus.cpp:117] CommBus ThreadRegister() 51 | // 注册成功 52 | I1230 10:12:15.419587 10137 server_threads.cpp:252] Server thread registered CommBus 53 | // Bg thread启动,id = 100,Bg thread的id从100开始 54 | I1230 10:12:51.534554 10171 bg_workers.cpp:889] Bg Worker starts here, my_id = 100 55 | // Bg thread向CommBus注册自己 56 | I1230 10:12:51.534627 10171 comm_bus.cpp:117] CommBus ThreadRegister() 57 | // Bg thread先去connect Namenode thread 58 | I1230 10:12:51.534677 10171 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 0 59 | // Bg thread去连接Namenode thread 60 | I1230 10:12:51.534683 10171 bg_workers.cpp:290] Connect to local server 0 61 | // Namenode thread 收到Bg thread id = 100的请求 62 | I1230 10:12:51.534826 10014 name_node_thread.cpp:139] Name node gets client 100 63 | // Server thread首先去连接Namenode thread 64 | I1230 10:13:18.879250 10137 server_threads.cpp:141] Connect to local name node 65 | // Namenode thread收到Server thread的请求 66 | I1230 10:13:21.051105 10014 name_node_thread.cpp:142] Name node gets server 1 67 | // Namenode已经收到所有的client和server的连接请求 68 | I1230 10:13:33.913213 10014 name_node_thread.cpp:149] Has received connections from all clients and servers, sending out connect_server_msg 69 | // Namenode向所有client (bg thread) 发送让其连接server thread的命令 70 | I1230 10:13:33.913254 10014 name_node_thread.cpp:156] Send connect_server_msg done 71 | // 发送connect_server_msg命令完毕 72 | I1230 10:13:33.913261 10014 name_node_thread.cpp:162] InitNameNode done 73 | // 每个bg thread去连接cluster中的所有的server threads,这里只有一个server thread 74 | I1230 10:13:33.929790 10171 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1 75 | // Bg thread连接上了server thread 76 | I1230 10:13:33.929821 10171 bg_workers.cpp:290] Connect to local server 1 77 | // 收到Namenode的连接反馈消息(client_start_msg表示连接成功) 78 | I1230 10:13:33.929862 10171 bg_workers.cpp:368] get kClientStart from 0 num_started_servers = 0 79 | // Server thread初始化完成 80 | I1230 10:23:39.355000 10137 server_threads.cpp:187] InitNonNameNode done 81 | // Bg thread收到server thread的反馈信息(client_start_msg表示连接成功) 82 | I1230 10:23:39.355051 10171 bg_workers.cpp:368] get kClientStart from 1 num_started_servers = 1 83 | // Bg thread id=100收到CreateTable的请求 84 | I1230 10:23:39.355198 10171 bg_workers.cpp:911] head bg handles CreateTable 85 | Data mode: Loading matrix sampledata/9x9_3blocks into memory... 
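// 至此,Namenode/Server/Bg thread 均已启动并互相连接,head bg thread 接着处理 CreateTable 请求,
// 随后 matrixfact app 开始将输入矩阵加载到内存,进入正式计算阶段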
86 | ``` 87 | Thread Ids: (local模式下Namenode,Server及Bg thread都只有一个) 88 | - 9821: main() thread 89 | - 10014: Namenode thread 90 | - 10137: Server thread 91 | - 10171: Bg thread 92 | 93 | 图解如下: 94 | 95 | ![LocalThreads](figures/LocalThreads.png) 96 | 97 | ## Distributed 模式线程启动分析 98 | 99 | 启动图解如下: 100 | 101 | ![DistributedThreads](figures/DistributedThreads.png) 102 | 103 | 可以看到各个节点上的线程启动后,Server threads和Bg threads都与Namenode threads建立了连接。然后Namenode通知所有的bg threads与集群中的所有server threads建立连接。连接建立后,可以看到Server threads和Bg threads组成了一个二分图结构,也就是所谓的Parameter Server。 -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Architecture.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/BSP-ABSP-SSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/BSP-ABSP-SSP.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/ClientTableUpdate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/ClientTableUpdate.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Compare-BSP-ABSP-SSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Compare-BSP-ABSP-SSP.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/ConsistencyModel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/ConsistencyModel.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/CreateTable.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/CreateTable.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/CreateTableThreads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/CreateTableThreads.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/DistributedThreads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/DistributedThreads.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/LocalThreads.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/LocalThreads.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/PSTableGroup-Init().png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/PSTableGroup-Init().png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum-architecture.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum-ps-topology.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum-ps-topology.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum架构图.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum架构图.graffle -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum架构图.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum架构图.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/STRADS-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/STRADS-architecture.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/matrixfact-petuum.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/matrixfact-petuum.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/matrixfact.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/matrixfact.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/parallel-matrixfact.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/parallel-matrixfact.png -------------------------------------------------------------------------------- 
/BigDataSystems/Petuum/figures/petuum-overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/petuum-overview.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/杂项.md: -------------------------------------------------------------------------------- 1 | # 杂项 2 | 1. 当调用`L_Table.Get()`后,ServerThreads会调用HandleOpLogMsg()来处理`client_send_oplog_msg` 消息,然后调用ApplyOpLog()处理该消息。ApplyOpLog()会调用ApplyRowOpLog()将UpdateValues更新到ServerTable中相应的Row中去。如果ServerTable中不存在该Row,那么先CreateRow(),然后ApplyRowOpLog()到该Row上。但问题是worker thread什么时候向ServerThreads发送的`client_send_oplog_msg` 消息?在bg_workers.cpp中的kBgClock中,调用HandleClockMsg(ture)后会调用CreateOpLogMsg()向ServerThreads发送消息。 3 | 4 | 2. `SSP_push_consistency_controller::Get()`中需要先`BgWorkers::WaitSystemClock()`才能进入`BgWorkers::RequestRow(table_id, row_id, stalest_clock)`方法。 5 | 6 | 3. Client传给Server的是`oplog`消息,里面包含了`row_id, column_ids, updates`信息,与ClientTable中的`oplog`类似。 7 | -------------------------------------------------------------------------------- /BigDataSystems/Spark/Build/BuildingSpark.md: -------------------------------------------------------------------------------- 1 | # Building Spark 2 | 3 | ## Using SBT 4 | To build spark-1.6.0 using SBT, we can 5 | 6 | 1. git clone Spark-1.6.0 7 | 2. Modify the sbt version 8 | 9 | ```shell 10 | cd Spark-1.6.0/projects 11 | modify build.properties (change sbt.version=0.13.7 to sbt.version=0.13.9) 12 | ``` 13 | 3. Generate idea modules 14 | ```shell 15 | cd Spark-1.6.0 16 | run ./sbt/sbt 'gen-idea no-classifiers no-sbt-classifiers' // faster 17 | or ./sbt/sbt gen-idea 18 | ``` 19 | 20 | 4. Import the projects to IDEA 21 | ```shell 22 | File -> Project from Existing Sources -> SBT -> Use auto-import -> Finish 23 | ``` 24 | 5. Select the profilers 25 | ```shell 26 | Maven projects -> Select hadoop-1, maven-3, sbt, scala-2.10, unix 27 | ``` 28 | 6. Cancel SBT's auto-import 29 | ```shell 30 | File -> Setttings -> SBT -> Cancel Use auto-import 31 | ``` 32 | 33 | When encountering some build errors, we can refer to: 34 | 35 | 1. http://stackoverflow.com/questions/25211071/compilation-errors-in-spark-datatypeconversions-scala-on-intellij-when-using-m 36 | 2. http://stackoverflow.com/questions/33311794/import-spark-source-code-into-intellj-build-error-not-found-type-sparkflumepr 37 | 3. http://blog.csdn.net/tanglizhe1105/article/details/50530104 38 | 4. http://www.iteblog.com/archives/1038 -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/Introduction to MLlib Pipeline.md: -------------------------------------------------------------------------------- 1 | # Introduction to Spark ML Pipeline 2 | 3 | ## 说明 4 | 建议在阅读本文档之前先阅读[官方文档](http://spark.apache.org/docs/latest/ml-guide.html),本文档不是官方文档的翻译,而是对Spark ML Pipeline的进一步理解与总结。 5 | 6 | ## From MLlib to Spark.ml 7 | 从1.2开始,Spark开始提供新的ML package,叫做Spark.ml。这个package提供的API接口比Spark.mllib更清晰更统一,当然最主要的特性是提供了ML pipeline功能。Spark.ml目前是alpha版本,仅包含LogisticRegression一个算法,但后面会加入更多的算法并取代现有的Spark.mllib。 8 | 9 | ## ML任务基本流程 10 | 在介绍ML pipeline之前,我们先回顾一下一个ML任务包含的典型流程: 11 | 12 | 1. 准备训练数据集 (training examples) 13 | 2. 预处理及特征抽取 (training examples => features) 14 | 3. 训练模型 (training models(features)) 15 | 4. 
在测试集上进行模型评测 (testing examples => features => results) 16 | 17 | 可以看到整个ML任务实际上是一个dataflow。更确切地,是两条dataflow。一条是training过程,结束点是训练好的model,另一条是testing过程,结束点是最后得到的results (e.g., predictions)。如果要训练多个模型,那么dataflow会有更多条。 18 | 19 | 从high-level的角度来看,dataflow里只包含两种类型的操作:数据变换(上面的=>)与模型训练(产生model)。 20 | 21 | ## Spark.ml原理 22 | Spark.ml目的为用户提供简单易用的API,方便用户将整个training和testing的dataflow组织成一个统一的pipeline(其实叫workflow更合适)。Spark.ml里主要包含4个基本概念: 23 | 24 | 1. ML data model: Spark ML使用 Spark SQL里面的SchemaRDD来表示整个处理过程中的 input/output/intermediate data。一个SchemaRDD类似一个table,里面可以包含很多不同类型的column,column可以是text,feature,label,prediction等等。 25 | 26 | 2. Transformer: 俗称数据变形金刚。变形金刚可以从一样东西(比如汽车)变成另一样东西(人是不是东西?)。在Spark.ml中Transformer可以将一个SchemaRDD变成另一个SchemaRDD,变换方法由其 Transformer.transform()方法决定,整个过程与RDD.transformation()类似。可想而知,Transformer可以是feature抽取器,也可以是已经训练好的model,等等。比如,Spark.ml提供的一个`Tokenizer: Transformer `可以对training examples中的 text 进行Tokenization预处理,处理结果就是`Tokenizer.transform(text)`。 27 | 再比如Spark.ml中的LogisticRegressionModel也是一个Transformer,当它被训练好后,预测过程就是将testing examples中的features变成predictions,也就是`predictions = LogisticRegressionModel.transform(features)`。 28 | 29 | 3. Estimator: 形象地讲就是可以生产变形金刚的机器。比如要生产一个中国版的擎天柱,只需传入中国人的训练数据(比如中国人的身高,体重等),选择擎天柱的模型,然后就可以生产得到一个中国版的擎天柱。对应到Spark.ml中,要得到一个Transformer(比如要得到训练好的LogisticRegressionModel),我们要提供一些训练数据SchemaRDD给Estimator,然后构造模型(比如直接将Spark.ml中的LogisticRegression模型拿来用),设置参数 params(比如迭代次数),最后训练(`Estimator.fit(dataset: SchemaRDD, params)`)得到一个LogisticRegressionModel,类型是Model。 30 | 31 | 4. Pipeline: 将多个Transformer和Estimator组成一个DAG workflow就是pipeline。想像一下把多个变形金刚组合成战神金刚是不是很流比。具体的组装方法是`val pipeline = new Pipeline().setStages(Transformer*, Estimator*)`,* 表示0个或者多个。其实setStages()方法接收参数类型是`Array[PipelineStage]`,这里这样写是因为Transformer和Estimator都是PipelineStage的子类。得到的`val pipeline`也是一个Estimator,它可以生产(`pipeline.fit()`)出来PipelineModel (类型是Transformer,也就是那个战神金刚)。 32 | 33 | 34 | ## 例子 35 | 36 | ### 1. 官方文档中 Example: Pipeline 的图示: 37 | ![](figures/pipelineDemo.png) 38 | 从图中可以看到: 39 | 40 | 1. 该Example中的pipeline有三个PipelineStage:两个Transformer和一个Estimator。 41 | 2. 这个pipeline最后生成了一个训练好的model。 42 | 3. 利用这个训练好的model可以对testingData进行预测(也就是transform())。 43 | 4. transform()输出的Table (SchemaRDD) 会在其输入的Table里添加一列或者多列。 44 | 5. transform()输出的Table不会存放在内存中(类似RDD.tranformation()的实现原理,这里画出来只是方面说明)。 45 | 46 | 47 | 48 | ### 2. 官方文档中 Example: Model Selection via Cross-Validation 的图示: 49 | ![](figures/CrossValidatorDemo.png) 50 | 51 | 调参是一件痛苦的事情,pipeline实际上是一个调参神器。可以在一个程序里实现**交叉验证+最优参数选择**。 52 | 53 | 比如这个例子中,使用2-Fold交叉验证,特征抽取器(hashingTF)里的参数(numFeatures)有三个values{10, 100, 1000},LR模型的参数(正则化权重regParam)有两个values{0.1, 0.01}。 54 | 55 | 为了方便画图,我把这个例子改为3-Fold交叉验证,将numFeatures参数的values减少到两个。 56 | 57 | 从图中可以看到: 58 | 59 | 1. 交叉验证首先会将traning dataset 划分为k份,k-1份用来做traningData,另外1份用来做testingData。 60 | 2. Transformer和Estimator都可以有自己的参数。这里第二个Transformer(也就是HashingTF的参数有两个values,Estimator的参数(也就是LogisticRegressionModel的正则化权重)也有两个values。 61 | 3. 总的要训练的模型个数为`Values(numFeatures) * Values(regParam)`,但需要`k * Values(numFeatures) * Values(regParam)`条pipeline来训练模型。 62 | 4. 最优模型对应的`sum(metric_i)`最大(或者最小,具体要看cost function的定义),metric可以是AUC等。 63 | 5. 在训练第 i 个fold里的模型的时候,traningData和testingData可以公用。 64 | 65 | ## 实现 66 | 67 | ### 1. 
Transformer 68 | 目前Spark.ml里面只有少量的内置Transformer,Transformer有UnaryTransformer和Model两种子类型,具体如下: 69 | 70 | - UnaryTransformer 71 | - Tokenizer (将input String转换成小写后按空白符分割) 72 | - HashingTF (统计一个document的Term Frequentcy,并将TF信息存放到一个Sparse vector里,index是term的hash值,value是term出现的次数,numFeatures参数意思是样本documents中的总term数目) 73 | - Model 74 | - LogisticRegressionModel (LR模型) 75 | - PipelineModel (pipeline组合成的模型) 76 | - StandardScalerModel (归一化模型) 77 | - CrossValidatorModel(交叉验证模型) 78 | 79 | 80 | Transformer中的transform()实现原理很简单: 在SchemaRDD上执行SELECT操作,SELECT的时候使用transform()作为UDF。注意,一般transform()得到的SchemaRDD后会在原有的SchemaRDD上添加1个或者多个columns。 81 | 82 | 与RDD.transformation()一样,当调用Transformer.transform()时,只会生成新的SchemaRDD变量,而不会去提交job计算SchemaRDD中的内容。 83 | 84 | ### 2. Estimator 85 | 86 | 目前Spark.ml中只有几个Estimator,具体如下: 87 | 88 | - LogisticRegression(可以把LogisticRegression看作是生产learned LogisticRegressionModel的机器) 89 | - StandardScaler(生产StandardScalerModel的机器) 90 | - CrossValidator(生产CrossValidatorModel的机器) 91 | - Pipeline(生产PipelineModel 的机器) 92 | 93 | Estimator里最重要的就是`Estimator.fit(SchemaRDD, params) `方法,给定SchemaRDD和parameters后,可以生产出一个learned model。 94 | 95 | 每当调用 Estimator.fit() 后,都会产生job去训练模型,得到模型参数,类似MLlib中的`model.train()`。 96 | 97 | ### 3. Pipeline 98 | 99 | Pipeline实质是 chained Transformers。啊, 前面不是说也可以在Pipeline中加入Estimator吗?是的,加入Estimator实际上就是加入Transformer,也就是Estimator.fit()产生的Model(Transformer的子类)。同理,也可以在Pipeline中加入另一个Pipeline,反正实际加入的是Pipeline.fit()产生的PipelineModel(Transformer的子类)。 100 | 101 | 102 | ## DAG型的pipeline 103 | 上面例子中的pipeline都是串行的,如何组成DAG型的pipeline? 104 | 105 | 很遗憾,目前的Transformer都是一元(Unary)的,只能输入一个SchemaRDD,输出另一个SchemaRDD。如果以后出现二元的,比如图中的BinaryTransformer,那么可以接收两个SchemaRDD,输出一个SchemaRDD,类似`RDD.join(other RDDs)`,那么pipeline就可以是DAG型的了。 106 | 107 | 注意:目前Transformer之间的联系根据`Transformer.setInputCol()`和`Transformer.setOutputCol()`建立。 108 | ![](figures/DAGpipeline.png) 109 | 110 | 111 | ## 不足之处 112 | 113 | 由于还是alpha版,目前Spark.ml还有很多不足之处: 114 | 115 | 1. pipeline会隐藏中间数据处理结果,这样不方便调试和错误诊断。 116 | 2. 实际上没有做到完全的pipeline,训练模型(pipeline.fit())时是barrier,也就是说训练和测试过程仍然是独立的。 117 | 3. 
在CrossValidator中,用于训练模型的pipelines目前不能够并行运行。 118 | 119 | 120 | 121 | 122 | 123 | 124 | -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/figures/CrossValidatorDemo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/ML/figures/CrossValidatorDemo.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/figures/DAGpipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/ML/figures/DAGpipeline.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/figures/pipelineDemo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/ML/figures/pipelineDemo.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/SparkResourceManager.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/SparkResourceManager.graffle -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/SparkScheduler.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/SparkScheduler.graffle -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/SparkScheduler.md: -------------------------------------------------------------------------------- 1 | ## Spark scheduler 2 | 3 | ### Master 分配 Executor 方法 4 | 5 | ```scala 6 | private def schedule(): Unit = { 7 | // 先打乱 workers 8 | val shuffledWorkers = Random.shuffle(workers) 9 | // 对通过 Spark-submit 提交(也就是 AppClient 类提交)的 app 来说下面这个 for 语句没用 10 | for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) { 11 | for (driver <- waitingDrivers) { 12 | if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { 13 | launchDriver(worker, driver) 14 | waitingDrivers -= driver 15 | } 16 | } 17 | } 18 | // 开始在 workers 上分配 executors 19 | startExecutorsOnWorkers() 20 | } 21 | 22 | ``` 23 | #### `startExecutorsOnWorkers()` 逻辑 24 | 25 | ```scala 26 | private def startExecutorsOnWorkers(): Unit = { 27 | // FIFO 调度策略 28 | for (app <- waitingApps if app.coresLeft > 0) { 29 | // 得到每个 executor 需要的 cores 数目 30 | val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor 31 | // 挑选出可用的 workers,将可用 workers 的资源(空闲 CPU core 个数)按照降序排列 32 | val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE) 33 | .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB && 34 | worker.coresFree >= coresPerExecutor.getOrElse(1)) 35 | .sortBy(_.coresFree).reverse 36 | // 资源分配算法,assignedCores 是一个数组,第 i 个元素表示应该往第 i个 usableWorkers 上分配多少个 core 37 | val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps) 38 | 39 | // 已经得到应该往每个 worker 上分配多少个 core,开始分配 40 | for (pos <- 0 
until usableWorkers.length if assignedCores(pos) > 0) { 41 | allocateWorkerResourceToExecutors( 42 | app, assignedCores(pos), coresPerExecutor, usableWorkers(pos)) 43 | } 44 | } 45 | } 46 | ``` 47 | 48 | #### `scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)`逻辑 49 | 50 | ```scala 51 | private def scheduleExecutorsOnWorkers( 52 | app: ApplicationInfo, 53 | usableWorkers: Array[WorkerInfo], 54 | spreadOutApps: Boolean): Array[Int] = { 55 | // 首先进行一系列初始化 56 | // 每个 executor 需要多少个 core 57 | val coresPerExecutor = app.desc.coresPerExecutor 58 | // 每个 executor 最少需要多少个 core,默认是 1 59 | val minCoresPerExecutor = coresPerExecutor.getOrElse(1) 60 | // 如果用户没有设置 coresPerExecutor,那么 oneExecutorPerWorker 为 true 61 | val oneExecutorPerWorker = coresPerExecutor.isEmpty 62 | // 每个 executor 需要的 memory 用量 63 | val memoryPerExecutor = app.desc.memoryPerExecutorMB 64 | // 可用的 workers 个数 65 | val numUsable = usableWorkers.length 66 | // 每个 worker 要提供的 cores 个数 67 | val assignedCores = new Array[Int](numUsable) 68 | // 在每个 worker 上分配的 executor 个数 69 | val assignedExecutors = new Array[Int](numUsable) 70 | // 要分配的 core 个数 = min(app 需求的 cores,workers 剩余 cores 之和) 71 | var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum) 72 | 73 | 74 | // Keep launching executors until no more workers can accommodate any 75 | // more executors, or if we have reached this application's limits 76 | 77 | // 从所有 workers 中筛选出可用的 workers,筛选算法见 canLauchExecutor 78 | var freeWorkers = (0 until numUsable).filter(canLaunchExecutor) 79 | 80 | while (freeWorkers.nonEmpty) { 81 | freeWorkers.foreach { pos => 82 | var keepScheduling = true 83 | // 如果该 worker 上可以启动 executor 84 | while (keepScheduling && canLaunchExecutor(pos)) { 85 | // 需要分配的 cores 的数目减去每个 executor 需要的 core 个数 86 | coresToAssign -= minCoresPerExecutor 87 | // 将分配好的 core 信息保存到 assignedCores 里面 88 | assignedCores(pos) += minCoresPerExecutor 89 | 90 | // If we are launching one executor per worker, then every iteration assigns 1 core 91 | // to the executor. Otherwise, every iteration assigns cores to a new executor. 92 | if (oneExecutorPerWorker) { 93 | assignedExecutors(pos) = 1 94 | } else { 95 | assignedExecutors(pos) += 1 96 | } 97 | 98 | // Spreading out an application means spreading out its executors across as 99 | // many workers as possible. If we are not spreading out, then we should keep 100 | // scheduling executors on this worker until we use all of its resources. 101 | // Otherwise, just move on to the next worker. 102 | // 如果选择 spreadOut 模式,那么在一个 worker 上分配一个 executor 的 cores 后,就更换 103 | // worker 再分配 104 | if (spreadOutApps) { 105 | keepScheduling = false 106 | } 107 | } 108 | } 109 | // 从 freeWorkers 中再挑选出可以启动 executor 的 workers 110 | freeWorkers = freeWorkers.filter(canLaunchExecutor) 111 | } 112 | 返回在 workers 上分配的 CPU core 资源信息 113 | assignedCores 114 | ``` 115 | 116 | #### `scheduleExecutorsOnWorkers().canLaunchExecutor`逻辑 117 | 118 | ```scala 119 | /** Return whether the specified worker can launch an executor for this app. */ 120 | def canLaunchExecutor(pos: Int): Boolean = { 121 | // 如果 app 里要分配的 core 个数大于每个 executor 需要的个数,仍然继续调度 122 | val keepScheduling = coresToAssign >= minCoresPerExecutor 123 | // 当前 worker 是否有足够的 core 来分配 124 | val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor 125 | 126 | // If we allow multiple executors per worker, then we can always launch new executors. 127 | // Otherwise, if there is already an executor on this worker, just give it more cores. 
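    // 补充说明:若用户设置了 coresPerExecutor(即 oneExecutorPerWorker 为 false),
    // 或者该 worker 上还没有分配过 executor,则本次分配会启动新的 executor;
    // 否则(oneExecutorPerWorker 且该 worker 已有 executor)只是给已有 executor 追加 core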
128 | val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0 129 | // 如果每个 worker 可以分配多个 executor,或者这个 worker 上还没分配到 executor 130 | if (launchingNewExecutor) { 131 | // 计算要在该 worker 上分配了多少 memory 132 | val assignedMemory = assignedExecutors(pos) * memoryPerExecutor 133 | // 计算该 worker 上是否有足够的 memory 来分配 executor 134 | val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor 135 | // 检测是否超过 app executor 数目上限,用于动态调度 136 | val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit 137 | keepScheduling && enoughCores && enoughMemory && underLimit 138 | } else { 139 | // We're adding cores to an existing executor, so no need 140 | // to check memory and executor limits 141 | // 如果每个 worker 只运行一个 executor,那么直接在该 executor 上增加 core 个数 142 | keepScheduling && enoughCores 143 | } 144 | } 145 | ``` 146 | 147 | #### `allocateWorkerResourceToExecutors()`逻辑 148 | 149 | ```scala 150 | /** 151 | * Allocate a worker's resources to one or more executors. 152 | * @param app the info of the application which the executors belong to 153 | * @param assignedCores number of cores on this worker for this application 154 | * @param coresPerExecutor number of cores per executor 155 | * @param worker the worker info 156 | */ 157 | private def allocateWorkerResourceToExecutors( 158 | app: ApplicationInfo, 159 | assignedCores: Int, 160 | coresPerExecutor: Option[Int], 161 | worker: WorkerInfo): Unit = { 162 | // If the number of cores per executor is specified, we divide the cores assigned 163 | // to this worker evenly among the executors with no remainder. 164 | // Otherwise, we launch a single executor that grabs all the assignedCores on this worker. 165 | 166 | // 计算需要在该 worker 上启动多少个 executors (assignedCores / coresPerExecutor) 167 | val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1) 168 | // 每个 executor 需要多少个 core 169 | val coresToAssign = coresPerExecutor.getOrElse(assignedCores) 170 | // 将 executor 信息加到 app 上,在 worker 上启动相应的 executor 171 | for (i <- 1 to numExecutors) { 172 | val exec = app.addExecutor(worker, coresToAssign) 173 | launchExecutor(worker, exec) 174 | app.state = ApplicationState.RUNNING 175 | } 176 | } 177 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkResourceManager.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkResourceManager.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkSchedulerAppSubmit.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkSchedulerAppSubmit.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkStandaloneMaster.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneMaster.pdf -------------------------------------------------------------------------------- 
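回到上面 SparkScheduler.md 中 `startExecutorsOnWorkers()` / `scheduleExecutorsOnWorkers()` 的分配逻辑:下面用一段不依赖 Spark 的 Scala 代码模拟核心的分配循环,直观对比 spreadOut 与非 spreadOut 两种策略(其中 Worker 的空闲 core 数、`ScheduleSimulation`、`assign` 等名字和数据均为虚构,且省略了内存和 executor 数目上限的检查,仅作示意):

```scala
// 一个不依赖 Spark 的简化模拟,仅示意 core 的分配循环
object ScheduleSimulation {

  // usableWorkers 的空闲 core 数,已按降序排列(虚构数据)
  val coresFree = Array(8, 6, 4)

  // 模拟 scheduleExecutorsOnWorkers 的核心分配循环
  def assign(coresRequested: Int, coresPerExecutor: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = new Array[Int](coresFree.length)
    var coresToAssign = math.min(coresRequested, coresFree.sum)

    // 对应 canLaunchExecutor 中与 core 相关的两个条件
    def canLaunch(pos: Int): Boolean =
      coresToAssign >= coresPerExecutor && coresFree(pos) - assigned(pos) >= coresPerExecutor

    var freeWorkers = coresFree.indices.filter(canLaunch)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= coresPerExecutor
          assigned(pos) += coresPerExecutor
          // spreadOut:在一个 worker 上分配一次就换下一个 worker
          if (spreadOut) keepScheduling = false
        }
      }
      freeWorkers = freeWorkers.filter(canLaunch)
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // app 申请 12 个 core,每个 executor 2 个 core
    println(assign(12, 2, spreadOut = true).mkString(","))   // 输出 4,4,4:尽量摊开
    println(assign(12, 2, spreadOut = false).mkString(","))  // 输出 8,4,0:先用满一个 worker
  }
}
```

同样申请 12 个 core、每个 executor 2 个 core 时,spreadOut 模式得到 `4,4,4`,非 spreadOut 模式得到 `8,4,0`,分别对应“尽量把 executor 摊开到更多 worker”和“先把一个 worker 的资源用满再换下一个”两种策略。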
/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneResourceAllocation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneResourceAllocation.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskScheduler.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskScheduler.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskSchedulerChinese.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskSchedulerChinese.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/StackOverflowDiagnosis/StackOverflow.md: -------------------------------------------------------------------------------- 1 | # 一个KCore算法引发的StackOverflow血案 2 | ——记一次扑朔迷离的Spark StackOverflow侦破过程 3 | 4 | 5 | ## 案件概述 6 | 7 | 故事开始于一个KCore算法,这是一个求解图中所有节点KCore值的算法。KCore的算法特点决定了它需要迭代很多轮才能收敛。在亿级别的新浪数据上,迭代个几百次是小Case的。当我们翘首期盼收敛结果时,算法却引发了一个StackOverflow的血案。 8 | 9 | 一开始,算法在本地小数据集上测试通过,没有任何问题,但放到集群上运行后总是出现StackOverflow错误。起初,我们认为错误原因不过是典型的RDD的lineage过长问题,也就是lineage长度随着算法迭代次数增加而不断变长,最后导致Spark在序列化该lineage的时候调用栈溢出。我们随手拎起checkpoint的宝刀,认为每迭代几轮checkpoint一下,可以轻而易举地截断lineage,从而避免这个问题。可是当checkpoint加入后,错误仍然出现。迫不得已,我们进行小米加步枪式的debug,来查找是否还有其他影响因素。 10 | 11 | 在debug过程中,StackOverflow这种类型的错误,展现了比OutOfMemory更难驯服的个性,尤其是在大规模的集群上。这逼迫我们想办法把案发现场转移到单机环境,并通过非常Geek的方式,模拟出算法在分布式环境下的出错情景,重现案发现场,最后追本溯源找到了问题的根源所在:RDD的 f 函数闭包和GraphX中的一个小bug。 12 | 13 | 这两个问题,最终导致task的序列化链,可以偷偷穿越被断掉的lineage而不断延续,也就是task的序列化链随着迭代次数增加不断增长,最终造成StackOverflow错误。整个断案过程耗时一周,可谓扑朔迷离,柳暗花明又一村,且听我们娓娓道来。 14 | 15 | 16 | ## 案件描述 17 | 18 | 在集群上运行KCore算法(完整版参见[1])时,会稳定地在第300+轮出现StackOverflow错误,这个错误由JDK内部的序列化/反序列化方法抛出。不管我们怎么调优参数,错误总会如期而至。 19 | 20 | 这类算法的特点: 21 | 22 | 1. 具有很长的computing chain 23 | 24 | 比如下面的 “degreeGraph=>subGraph=>degreeGraph=>subGraph=>…=>” 25 | 26 | 2. 迭代非常多次才能收敛 27 | 28 | ```scala 29 | //K-Core Algorithm 30 | val kNum = 5 31 | 32 | var degreeGraph = graph.outerJoinVertices(graph.degrees) { 33 | (vid, vd, degree) => degree.getOrElse(0) 34 | }.cache() 35 | var isConverged = false 36 | do { 37 | val subGraph = degreeGraph.subgraph( 38 | vpred = (vid, degree) => degree >= kNum 39 | ).cache() 40 | 41 | val newDegreeGraph = subGraph.degrees 42 | 43 | degreeGraph = subGraph.outerJoinVertices(newDegreeGraph) { 44 | (vid, vd, degree) => degree.getOrElse(0) 45 | }.cache() 46 | 47 | isConverged = check(degreeGraph) 48 | } while (!isConverged) 49 | ``` 50 | 51 | 它产生的错误栈如下: 52 | 53 | * 错误栈1(在JDK序列化时产生): 54 | 55 | ```java 56 | Exception in thread "main" org.apache.spark.SparkException: 57 | Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError 58 | java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1275) 59 | java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230) 60 | ...
61 | java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 62 | java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 63 | scala.collection.immutable.$colon$colon.writeObject(List.scala:379) 64 | sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 65 | ... 66 | sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 67 | java.lang.reflect.Method.invoke(Method.java:606) 68 | ``` 69 | * 错误栈2(也可以在JDK反序列化时产生): 70 | 71 | ```java 72 | ERROR Executor: Exception in task 1.0 in stage 339993.0 (TID 3341) 73 | java.lang.StackOverflowError 74 | at java.lang.StringBuilder.append(StringBuilder.java:204) 75 | at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3143) 76 | ... 77 | java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) 78 | java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) 79 | scala.collection.immutable.$colon$colon.readObject(List.scala:362) 80 | sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) 81 | sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 82 | java.lang.reflect.Method.invoke(Method.java:606) 83 | java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) 84 | ``` 85 | 86 | ## 案件分析 87 | 88 | 日志显示这个错误在task序列化/反序列化时产生,首先定位到error被抛出的地点是在 89 | 90 | 1. driver端在序列化task的时候。具体地点在DAGScheduler.scala的 91 | 92 | ```scala 93 | if (stage.isShuffleMap) { //如果是shuffleMapTask就序列化stage中最后一个RDD及Shuffle依赖关系 94 | closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array() 95 | } else { //如果是ReduceTask就序列化stage中最后一个RDD及用于计算结果的func 96 | closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array() 97 | } 98 | ``` 99 | 100 | 2. executor端在反序列化task的时候。具体地点在 101 | 102 | ShuffleMapTask.scala 103 | ```scala 104 | val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])]( 105 | ByteBuffer.wrap(taskBinary.value), 106 | Thread.currentThread.getContextClassLoader) 107 | ``` 108 | 109 | 或 ResultTask.scala 110 | 111 | ```scala 112 | val (rdd, func) = ser.deserialize[(RDD[T], 113 | (TaskContext, Iterator[T]) => U)](ByteBuffer.wrap(taskBinary.value), 114 | Thread.currentThread.getContextClassLoader) 115 | ``` 116 | 117 | 118 | 119 | 所以,我们认定原因是Spark在序列化task时产生了一条很长的调用链(以下称为**序列化链**),但是这条链是什么?为什么会那么长? 120 | 121 | 122 | 1. **分析task在序列化时要序列化哪些内容:** 123 | 124 | 首先明确一个概念:每个stage生成一组task。Task在序列化的时候主要是要序列化该stage中的最后一个RDD(后面称为finalRDD)。我们分析了RDD的代码,发现在序列化RDD时,需要序列化RDD的成员变量有`RDD id,dependencies_,storageLevel`等。其中最主要的是`dependencies_`变量,它存放着该RDD的依赖关系,也就是该RDD如何依赖于其他的哪些RDDs,这个依赖关系俗称为**lineage**。设想,当序列化后的task发到remote worker node上时,executor只需要反序列化出finalRDD,然后通过它的lineage就可以从最初的数据源(或者shuffleMapTask的输出结果)一步步计算得到finalRDD。 125 | 126 | 2. **分析lineage对序列化链的影响:** 127 | 128 | 由于在序列化finalRDD的dependencies\_时会序列化finalRDD依赖的上游的RDDs的dependencies\_,那么这个序列化过程实际上是lineage的回溯过程。假设lineage是这样的 `RDD a => RDD b => RDD c => finalRDD`,那么首先序列化`finalRDD`,然后顺着dependencies去序列化`RDD c`,然后顺着dependencies去序列化`RDD b`,最后顺着dependencies去序列化`RDD a`。可见lineage长度越长,序列化链越长,最终可以造成栈溢出。 129 | 130 | 3. 
**分析lineage增长规律:** 131 | 132 | 虽然我们推测lineage过长是错误原因,但需要实验证明。我们选择在本地小数据集来实验(大数据上实验一次要2个小时),在小数据集上算法迭代12轮后可以收敛。我们选择了几个关键的RDD,通过`RDD.toDebugString()`来输出这些RDD在每轮迭代中的lineage。起初我们犯了一个小错误,将toDebugString()输出的lineage长度当作序列化时的lineage长度。经 [@hashjoin](http://weibo.com/u/1630850750) 提醒ShuffledRDD可以断掉序列化时的lineage,也就是说序列化链碰到ShuffledRDD就停止了,这与stage划分方法一致。我们重新审查了每轮迭代toDebugString()输出的lineage长度,却发现一个小问题:toDebugString()输出的lineage与真正的lineage有出入:真正的lineage是一个DAG,而toDebugString()输出的是一个stage tree(每个stage中的RDDs被输出成一条线性依赖链),所以像`ZippedPartitionsRDD2`这样同时依赖于两个RDD的依赖关系没有能够在toDebugString()的输出中表示出来。为了准确计算序列化时lineage的长度,我们修改了toDebugString()方法,加入了depth信息,类似求从DAG源点到每个节点的距离,最后终于统计出来每轮迭代lineage长度会增长3。 133 | 134 | 小插曲:由于每轮迭代产生的lineage太长,后面几轮迭代产生的lineage根本输出不了(打印lineage时产生的String会造成本地程序OutOfMemory)。没办法,我们修改了toDebugString()方法,只统计depth,不显示lineage。即使这样,如果lineage很长,toDebugString()执行时间也接受不了。 135 | 136 | lineage的demo参见[2]和[3]。 137 | 138 | ## 推理诊断 139 | 140 | ### 1. 尝试使用checkpoint避免错误失败 141 | 142 | 竟然找到了错误原因是不断增长的lineage,我们只需要每隔几轮就checkpoint()一次,就可以断掉不断增长的lineage,从而控制序列化链的长度。下表分别显示了在**“不进行checkpoint,每隔6轮checkpoint一次,每隔8轮checkpoint一次”**情况下得到的某个重要的RDD的lineage长度变化规律。 143 | 144 | 145 | |i-th iter| WithoutCheckpoint| checkpoint-every-6-iters| checkpoint-every-8-iters| 146 | |----:|--------:|------:|------:| 147 | |1 |9 |2 |2| 148 | |2 |12 |6 |6| 149 | |3 |15 |9 |9| 150 | |4 |18 |12 |12| 151 | |5 |21 |15 |15| 152 | |6 |24 |18 |18| 153 | |7 |27 |2 |21| 154 | |8 |30 |6 |24| 155 | |9 |33 |9 |2| 156 | |10 | 执行时间太长,算不出来 :-( |12 |6| 157 | |11 | 执行时间太长,算不出来 :-( |15 |9| 158 | |12 | 执行时间太长,算不出来 :-( |18 |12| 159 | 160 | 161 | 结果看起来不错,checkpoint以后我们可以控制lineage的长度了,也就解决问题了。 162 | 163 | 然而,现实总是那么残酷,加上checkpoint()后,在大数据集上仍然StackOverflow了,而且还是在300+轮。 164 | 165 | 166 | ### 2. 错误本地重现 167 | 168 | 还有什么因素可以导致序列化链变长?而且随着迭代次数增长而变长?而且checkpoint()也断不了? 169 | 170 | 如果错误是OutOfMemory,那么直接去dump heap,找日志,分析每个object的来源,然后debug调参,等等。可惜错误是StackOverflow,这个错误产生是一瞬间的事,要debug到错误产生的那个时间点非常难,而且无法生成heap dump,而且是在集群上产生的,而且运行一次要2个多小时。 171 | 172 | 我们仔细分析了GraphX的实现代码,看每个函数都干了些什么事情,但依然没有头绪。无奈之下,我们做了一个非常tricky的尝试,修改收敛条件,让算法在小数据集上无限迭代,看错误是否能在本地重现。测试到第400+轮的时候,错误终于在本地重现。为了让错误更快地出现,我们调小了stack大小,设为128KB,这样第30+轮的时候就可以重现。 173 | 174 | 175 | ### 3. 
诊断错误真正原因 176 | 177 | 本地可以重现后,下一步就是要debug到错误产生的时间点。这个点很难把握,因为一个迭代型算法产生的stage非常多,要预测哪个stage会产生error很难(也许直接catch StackOverflowError可以debug到那个时间点,但我们没有尝试,也可能catch到的时候调用栈已经自动退栈了)。我们选择了第20轮附近的某个stage,并假定这个stage就会产生StackOverflow,然后在JDK中的`ObjectOutputStream.java`的`writeObject()`方法处设置断点,仔细观察task序列化的每一步都序列化了什么内容。我们观察了很久很久,仔细查看了几百个stack frame中的内容,发现除了序列化RDD及其依赖关系以外,还序列化了一个奇怪的东西,那就是**RDD的 f 函数闭包中的$outer**,这个$outer指向了一个不在lineage中的RDD(VertexRDD)。也就是说,当序列化RDD的时候,其 f 函数闭包引用的VertexRDD也被序列化了。而在序列化这个VertexRDD时又序列化了它的成员变量partitionsRDD。这个`f -> $outer (VertexRDD) -> partitionsRDD` 的序列化链不属于正常的lineage,可能是错误原因。 178 | 179 | > Note:很多包含 f 的RDDs,都可能存在不正常的$outer,不正常是指这个$outer会引用到其他不在lineage中的RDDs。 180 | 181 | 经过深入的源代码分析+stack分析,我们最后确定这条链可以重新连接被checkpoint断掉的lineage,也就是说序列化的时候可以通过不正常的 f 序列化链访问到之前迭代产生的RDDs。图示如下: 182 | 183 | ![替代文本](figures/g1.png) 184 | 185 | 186 | 187 | 所有被OneToOneDep连接的RDD是正常的lineage,这些RDD也会被正常序列化。如果某个RDD被checkpoint,比如Figure 1中的A,那么A的依赖链会被断掉(dependencies\_被置为null),这样序列化到A或者ShuffledRDD就停止了。然而,由于A的 f 的函数闭包引用了VertexRDD,而partitionsRDD又是VertexRDD的成员变量,当序列化到A的时候会顺着`(1)->(2)`链又访问到`RDDs in previous iterations`,最后造成序列化链与checkpoint前的lineage一样长。这样,随着迭代次数增加,lineage不断变长,序列化链不断变长,最后就StackOverflow了。 188 | 189 | 下图显示了我们debug到的`ZippedPartitionsRDD2 -> f -> $outer -> partitionsRDD`的序列化链。 190 | 191 | ![替代文本](figures/g2.png) 192 | 193 | 更严重的是,单单这条链不断迭代就可以造成StackOverflow,下图显示了连续不断的,与checkpoint之前一样长的序列化链`VertexRDD -> partitionsRDD -> f -> $outer -> partitionsRDD -> f -> $outer -> VertexRDD -> ...` 194 | 195 | ![替代文本](figures/g3.png) 196 | 197 | 198 | ## 总结陈词 199 | 200 | 至此,StackOverflow的原因水落石出,罪魁祸首是这3个原因: 201 | 202 | 1. lineage过长,且随着迭代次数增加而增长 203 | 2. 异常的 f 函数闭包,即引用了lineage之外RDD的 f 闭包。 204 | 3. GraphX小bug:partitionsRDD是VertexRDD的non-transient成员变量 205 | 206 | 207 | ## 案件处理 208 | 209 | 从上面分析可以看出,只需要切断Figure 1中的(1)或(2)就可以达到截断序列化链的目的。 210 | 211 | 截断(1)的方法有两种: 212 | 213 | 1. 修改GraphX的代码结构,使得partitionsRDD的 f 的函数闭包不再引用VertexRDD。目前来看这种方法不行,因为需要重构代码。 214 | 2.
在checkpoint的时候将RDD的 f 置为null。这样如果A被checkpoint,那么f函数闭包会同时被清理掉,就不存在链接(1)了。 215 | 216 | 截断(2)的方法很简单: 217 | 218 | - 直接将partitionsRDD置为transient,相当于序列化时partitionsRDD不是VertexRDD的成员变量,也就不存在连接(2)了。由于partitionsRDD已经通过lineage序列化,不用担心置为transient后会造成task不能计算的情况。 219 | 220 | 我们针对这三个方法,提交了 [issue-4672](https://issues.apache.org/jira/browse/SPARK-4672) 和三个PR,目前都被merge到master和Spark-1.2版本中了。 221 | 222 | 另外,我们发现之前 [@witgo](https://github.com/witgo) 提交了一个 [issue-3623](https://issues.apache.org/jira/browse/SPARK-3623) 讨论了GraphX的checkpoint问题和修复方法,提了几个很好的问题,但没有提到StackOverflow error。 223 | 224 | ## 进一步思考 225 | 226 | 虽然上面的解决方案可以解决GraphX中迭代算法的问题,但其他迭代算法呢?通过上面的错误诊断分析,committers(包括 [@Jason Dai](http://weibo.com/u/3816918426))和我们认识到这是一个general的问题,即:怎么保证task和RDD在序列化时只序列化“重要的”的东西,而自动去除不必要的引用?@Jason Dai给出了一些解决思路,见 [issue-4672](https://issues.apache.org/jira/browse/SPARK-4672) 里面的 comments。如果大家有更优雅的解决思路或者具体的思路实现可以讨论、实现、提交新的PR。 227 | 228 | 这个问题只是分布式系统可靠性和性能优化方面问题的冰山一角,我们还遇到一些可靠性与性能 trade-off 方面的一些问题,也在考虑是否有更优雅的解决方案。 229 | 230 | 感谢[@明风Andy](http://weibo.com/u/2304284334)的统筹、审阅和修改,这个问题最初由[@张萌Taobao](http://weibo.com/u/2712640302)发现,并参与了诊断工作。 231 | 232 | ## References 233 | [1] 造成StackOverflow的完整代码,(需要将最后的收敛条件设置为`filteredCount >= 0L`可以产生无限迭代进而产生错误) 234 | 235 | ```scala 236 | package graphx.test 237 | 238 | import org.apache.hadoop.conf.Configuration 239 | import org.apache.hadoop.fs.{Path, FileSystem} 240 | import org.apache.spark.SparkContext 241 | import org.apache.spark.graphx._ 242 | import org.apache.spark.rdd.RDD 243 | 244 | object SimpleIterAppTriggersStackOverflow { 245 | 246 | def main(args: Array[String]) { 247 | 248 | val sc = new SparkContext("local[2]", "Kcore") 249 | val checkpointPath = "D:\\data\\checkpoint" 250 | sc.setCheckpointDir(checkpointPath) 251 | 252 | 253 | val edges: RDD[(Long, Long)] = 254 | sc.parallelize(Array( 255 | (1L, 17L), (2L, 4L), 256 | (3L, 4L), (4L, 17L), 257 | (4L, 16L), (5L, 15L), 258 | (6L, 7L), (7L, 15L), 259 | (8L, 12L), (9L, 12L), 260 | (10L, 12L), (11L, 12L), 261 | (12L, 18L), (13L, 14L), 262 | (13L, 17L), (14L, 17L), 263 | (15L, 16L), (15L, 19L), 264 | (16L, 17L), (16L, 18L), 265 | (16L, 19L), (17L, 18L), 266 | (17L, 19L), (18L, 19L))) 267 | 268 | 269 | val graph = Graph.fromEdgeTuples(edges, 1).cache() 270 | 271 | var degreeGraph = graph.outerJoinVertices(graph.degrees) { 272 | (vid, vd, degree) => degree.getOrElse(0) 273 | }.cache() 274 | 275 | var filteredCount = 0L 276 | var iters = 0 277 | 278 | val kNum = 5 279 | val checkpointInterval = 10 280 | 281 | do { 282 | 283 | val subGraph = degreeGraph.subgraph(vpred = (vid, degree) => degree >= kNum).cache() 284 | 285 | val preDegreeGraph = degreeGraph 286 | degreeGraph = subGraph.outerJoinVertices(subGraph.degrees) { 287 | (vid, vd, degree) => degree.getOrElse(0) 288 | }.cache() 289 | 290 | if (iters % checkpointInterval == 0) { 291 | 292 | try { 293 | val fs = FileSystem.get(new Configuration()) 294 | if (fs.exists(new Path(checkpointPath))) 295 | fs.delete(new Path(checkpointPath), true) 296 | } catch { 297 | case e: Throwable => { 298 | e.printStackTrace() 299 | println("Something Wrong in GetKCoreGraph Checkpoint Path " + checkpointPath) 300 | System.exit(0) 301 | } 302 | } 303 | 304 | degreeGraph.edges.checkpoint() 305 | degreeGraph.vertices.checkpoint() 306 | 307 | } 308 | 309 | val dVertices = degreeGraph.vertices.count() 310 | val dEdges = degreeGraph.edges.count() 311 | 312 | println("[Iter " + iters + "] dVertices = " + dVertices + ", dEdges = " + dEdges) 313 | 314 | filteredCount = degreeGraph.vertices.filter { 315 | 
case (vid, degree) => degree < kNum 316 | }.count() 317 | 318 | preDegreeGraph.unpersistVertices() 319 | preDegreeGraph.edges.unpersist() 320 | subGraph.unpersistVertices() 321 | subGraph.edges.unpersist() 322 | 323 | 324 | iters += 1 325 | } while (filteredCount >= 1L) 326 | 327 | println(degreeGraph.vertices.count()) 328 | } 329 | } 330 | ``` 331 | [2] lineage demo,未加checkpoint,第一轮迭代 332 | 333 | ```scala 334 | [Iter 1][DEBUG] (2) EdgeRDD[33] at RDD at EdgeRDD.scala:35 335 | | EdgeRDD ZippedPartitionsRDD2[32] at zipPartitions at ReplicatedVertexView.scala:114 336 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 337 | | MappedRDD[11] at map at Graph.scala:392 338 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 339 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 340 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 341 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 342 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 343 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 344 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 345 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 346 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 347 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 348 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 349 | | ShuffledRDD[31] at partitionBy at ReplicatedVertexView.scala:112 350 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts false false (broadcast) MapPartitionsRDD[30] at mapPartitions at VertexRDD.scala:347 351 | | VertexRDD ZippedPartitionsRDD2[28] at zipPartitions at VertexRDD.scala:174 352 | | VertexRDD, VertexRDD MapPartitionsRDD[18] at mapPartitions at VertexRDD.scala:441 353 | | MapPartitionsRDD[17] at mapPartitions at VertexRDD.scala:457 354 | | ShuffledRDD[16] at ShuffledRDD at RoutingTablePartition.scala:36 355 | +-(2) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[15] at mapPartitions at VertexRDD.scala:452 356 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 357 | | MappedRDD[11] at map at Graph.scala:392 358 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 359 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 360 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 361 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 362 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 363 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 364 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 365 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 366 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 367 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 368 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 369 | | VertexRDD ZippedPartitionsRDD2[26] at zipPartitions at VertexRDD.scala:200 370 | | VertexRDD, VertexRDD MapPartitionsRDD[18] at mapPartitions at VertexRDD.scala:441 371 | | MapPartitionsRDD[17] at mapPartitions at VertexRDD.scala:457 372 | | ShuffledRDD[16] at ShuffledRDD at RoutingTablePartition.scala:36 373 | +-(2) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[15] at mapPartitions at VertexRDD.scala:452 374 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex 
at EdgeRDD.scala:169 375 | | MappedRDD[11] at map at Graph.scala:392 376 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 377 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 378 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 379 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 380 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 381 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 382 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 383 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 384 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 385 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 386 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 387 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[24] at zipPartitions at VertexRDD.scala:301 388 | | VertexRDD, VertexRDD MapPartitionsRDD[18] at mapPartitions at VertexRDD.scala:441 389 | | MapPartitionsRDD[17] at mapPartitions at VertexRDD.scala:457 390 | | ShuffledRDD[16] at ShuffledRDD at RoutingTablePartition.scala:36 391 | +-(2) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[15] at mapPartitions at VertexRDD.scala:452 392 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 393 | | MappedRDD[11] at map at Graph.scala:392 394 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 395 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 396 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 397 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 398 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 399 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 400 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 401 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 402 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 403 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 404 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 405 | | ShuffledRDD[23] at ShuffledRDD at MessageToPartition.scala:31 406 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[22] at mapPartitions at GraphImpl.scala:192 407 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 408 | | MappedRDD[11] at map at Graph.scala:392 409 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 410 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 411 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 412 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 413 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 414 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 415 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 416 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 417 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 418 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 419 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:10 420 | ``` 421 | 422 | [3] lineage demo,加checkpoint,第5轮迭代 423 | ```scala 424 | [Iter 5][DEBUG] (2) VertexRDD[113] at RDD at VertexRDD.scala:58 425 | | VertexRDD ZippedPartitionsRDD2[112] at zipPartitions at VertexRDD.scala:200 426 | | VertexRDD 
MapPartitionsRDD[103] at mapPartitions at VertexRDD.scala:127 427 | | VertexRDD ZippedPartitionsRDD2[91] at zipPartitions at VertexRDD.scala:200 428 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 429 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 430 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 431 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 432 | | CheckpointRDD[56] at apply at List.scala:318 433 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 434 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 435 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 436 | | CheckpointRDD[56] at apply at List.scala:318 437 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 438 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 439 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 440 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 441 | | CheckpointRDD[57] at apply at List.scala:318 442 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[89] at zipPartitions at VertexRDD.scala:301 443 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 444 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 445 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 446 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 447 | | CheckpointRDD[56] at apply at List.scala:318 448 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 449 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 450 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 451 | | CheckpointRDD[56] at apply at List.scala:318 452 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 453 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 454 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 455 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 456 | | CheckpointRDD[57] at apply at List.scala:318 457 | | ShuffledRDD[88] at ShuffledRDD at MessageToPartition.scala:31 458 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[87] at mapPartitions at GraphImpl.scala:192 459 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 460 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 461 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 462 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 463 | | CheckpointRDD[57] at apply at List.scala:318 464 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 465 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 466 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 467 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 468 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 469 | | CheckpointRDD[56] at apply at List.scala:318 470 | | VertexRDD ZippedPartitionsRDD2[70] 
at zipPartitions at VertexRDD.scala:200 471 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 472 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 473 | | CheckpointRDD[56] at apply at List.scala:318 474 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 475 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 476 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 477 | | CheckpointRDD[56] at apply at List.scala:318 478 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 479 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 480 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 481 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 482 | | CheckpointRDD[57] at apply at List.scala:318 483 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[110] at zipPartitions at VertexRDD.scala:301 484 | | VertexRDD MapPartitionsRDD[103] at mapPartitions at VertexRDD.scala:127 485 | | VertexRDD ZippedPartitionsRDD2[91] at zipPartitions at VertexRDD.scala:200 486 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 487 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 488 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 489 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 490 | | CheckpointRDD[56] at apply at List.scala:318 491 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 492 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 493 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 494 | | CheckpointRDD[56] at apply at List.scala:318 495 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 496 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 497 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 498 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 499 | | CheckpointRDD[57] at apply at List.scala:318 500 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[89] at zipPartitions at VertexRDD.scala:301 501 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 502 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 503 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 504 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 505 | | CheckpointRDD[56] at apply at List.scala:318 506 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 507 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 508 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 509 | | CheckpointRDD[56] at apply at List.scala:318 510 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 511 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 512 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 513 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 514 | | CheckpointRDD[57] at apply at List.scala:318 515 | | ShuffledRDD[88] at 
ShuffledRDD at MessageToPartition.scala:31 516 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[87] at mapPartitions at GraphImpl.scala:192 517 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 518 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 519 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 520 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 521 | | CheckpointRDD[57] at apply at List.scala:318 522 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 523 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 524 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 525 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 526 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 527 | | CheckpointRDD[56] at apply at List.scala:318 528 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 529 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 530 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 531 | | CheckpointRDD[56] at apply at List.scala:318 532 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 533 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 534 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 535 | | CheckpointRDD[56] at apply at List.scala:318 536 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 537 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 538 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 539 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 540 | | CheckpointRDD[57] at apply at List.scala:318 541 | | ShuffledRDD[109] at ShuffledRDD at MessageToPartition.scala:31 542 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[108] at mapPartitions at GraphImpl.scala:192 543 | | EdgeRDD MapPartitionsRDD[105] at mapPartitions at EdgeRDD.scala:85 544 | | EdgeRDD ZippedPartitionsRDD2[97] at zipPartitions at ReplicatedVertexView.scala:114 545 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 546 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 547 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 548 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 549 | | CheckpointRDD[57] at apply at List.scala:318 550 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 551 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 552 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 553 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 554 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 555 | | CheckpointRDD[56] at apply at List.scala:318 556 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 557 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 558 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at 
VertexRDD.scala:200 559 | | CheckpointRDD[56] at apply at List.scala:318 560 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 561 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 562 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 563 | | CheckpointRDD[56] at apply at List.scala:318 564 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 565 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 566 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 567 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 568 | | CheckpointRDD[57] at apply at List.scala:318 569 | | ShuffledRDD[96] at partitionBy at ReplicatedVertexView.scala:112 570 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[95] at mapPartitions at VertexRDD.scala:347 571 | | VertexRDD ZippedPartitionsRDD2[93] at zipPartitions at VertexRDD.scala:174 572 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 573 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 574 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 575 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 576 | | CheckpointRDD[56] at apply at List.scala:318 577 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 578 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 579 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 580 | | CheckpointRDD[56] at apply at List.scala:318 581 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 582 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 583 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 584 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 585 | | CheckpointRDD[57] at apply at List.scala:318 586 | | VertexRDD ZippedPartitionsRDD2[91] at zipPartitions at VertexRDD.scala:200 587 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 588 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 589 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 590 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 591 | | CheckpointRDD[56] at apply at List.scala:318 592 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 593 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 594 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 595 | | CheckpointRDD[56] at apply at List.scala:318 596 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 597 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 598 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 599 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 600 | | CheckpointRDD[57] at apply at List.scala:318 601 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[89] at zipPartitions at VertexRDD.scala:301 602 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 603 | | VertexRDD 
ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 604 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 605 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 606 | | CheckpointRDD[56] at apply at List.scala:318 607 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 608 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 609 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 610 | | CheckpointRDD[56] at apply at List.scala:318 611 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 612 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 613 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 614 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 615 | | CheckpointRDD[57] at apply at List.scala:318 616 | | ShuffledRDD[88] at ShuffledRDD at MessageToPartition.scala:31 617 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[87] at mapPartitions at GraphImpl.scala:192 618 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 619 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 620 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 621 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 622 | | CheckpointRDD[57] at apply at List.scala:318 623 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 624 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 625 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 626 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 627 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 628 | | CheckpointRDD[56] at apply at List.scala:318 629 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 630 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 631 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 632 | | CheckpointRDD[56] at apply at List.scala:318 633 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 634 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 635 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 636 | | CheckpointRDD[56] at apply at List.scala:318 637 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 638 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 639 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 640 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 641 | | CheckpointRDD[57] at apply at List.scala:318 642 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Spark/StackOverflowDiagnosis/figures/g1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g1.png -------------------------------------------------------------------------------- 
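上文提到,为了统计每轮迭代 lineage 的增长,作者修改了 toDebugString(),只输出 depth。下面是按同样思路写的一个简化工具,仅作示意(`LineageDepth` 这个名字和实现都是虚构的,既不是文中实际使用的改法,也不是 Spark 自带 API):沿 dependencies 回溯、在 ShuffleDependency 处截断、取最大深度,并用 RDD id 做记忆化,避免共享的子 lineage 被重复遍历。

```scala
import scala.collection.mutable

import org.apache.spark.ShuffleDependency
import org.apache.spark.rdd.RDD

// 统计一个 RDD 在序列化意义下的 lineage 深度:
// 遇到 ShuffleDependency 即停止回溯(对应序列化链在 ShuffledRDD 处断开)
object LineageDepth {
  def apply(rdd: RDD[_], memo: mutable.Map[Int, Int] = mutable.Map.empty): Int =
    memo.get(rdd.id) match {
      case Some(depth) => depth
      case None =>
        val parentDepths = rdd.dependencies.collect {
          // 只沿窄依赖继续向上回溯
          case dep if !dep.isInstanceOf[ShuffleDependency[_, _, _]] => apply(dep.rdd, memo)
        }
        val depth = 1 + (if (parentDepths.isEmpty) 0 else parentDepths.max)
        memo(rdd.id) = depth
        depth
    }
}
```

例如在每轮迭代里调用 `println(LineageDepth(degreeGraph.vertices))`,可以得到与前面表格类似的深度序列。注意它只反映正常 lineage 的深度,文中由 `f -> $outer` 额外引入的序列化链并不会体现在这个数字里,这也是最初只盯着 lineage 看不出问题的原因之一。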
/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g2.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/StackOverflowDiagnosis/figures/g3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g3.png --------------------------------------------------------------------------------
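文中 `f -> $outer (VertexRDD) -> partitionsRDD` 这条不正常序列化链的成因,可以用一个与 GraphX 无关的小例子来体会(下面的 Wrapper、baseRdd、scaleBad 等名字均为虚构,只是一个示意,并不是文中 GraphX 的真实代码):当 RDD 变换的闭包引用了外层对象的成员时,闭包会通过 $outer 带上整个外层对象,外层对象里与本次计算无关的 RDD 也会被拖进 task 的序列化链。

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// 演示 "f -> $outer -> 其他RDD" 这类引用是如何产生的
class Wrapper(val baseRdd: RDD[Int]) extends Serializable {
  val multiplier = 3

  // 有问题的写法:闭包引用 this.multiplier,因此 $outer 指向整个 Wrapper,
  // 与计算无关的 baseRdd 也会跟着 task 一起被序列化
  def scaleBad(input: RDD[Int]): RDD[Int] = input.map(_ * multiplier)

  // 常见的规避写法:先拷贝到局部变量,闭包只捕获这个局部值,不再引用 Wrapper
  def scaleGood(input: RDD[Int]): RDD[Int] = {
    val m = multiplier
    input.map(_ * m)
  }
}

object ClosureOuterDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ClosureOuterDemo"))
    val wrapper = new Wrapper(sc.parallelize(1 to 10))
    val input = sc.parallelize(1 to 5)
    println(wrapper.scaleBad(input).collect().mkString(","))   // 3,6,9,12,15(结果正确,但序列化链更长)
    println(wrapper.scaleGood(input).collect().mkString(","))  // 3,6,9,12,15
    sc.stop()
  }
}
```

规避方式与文中的修复思路是一致的:要么让闭包不再引用外层对象(如 scaleGood 里先拷贝成局部变量),要么像 issue-4672 的修复那样,把不需要随 task 序列化的成员(partitionsRDD)标记为 transient,或在 checkpoint 时把 f 置为 null。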