├── .gitignore └── BigDataSystems ├── Petuum ├── BgThreads.md ├── CientTableUpdate.md ├── ImportantClasses.md ├── Introduction-to-parameter-server-system.pptx ├── Introduction-to-parameter-server.pptx ├── MF-logs │ ├── MF-log.pdf │ ├── driver.txt │ ├── matrixfact.ubuntu.xulijie.log.INFO.20141230-152214.13924.txt │ └── matrixfact.ubuntu2.xulijie.log.INFO.20141230-152217.8866.txt ├── Matrix-Factorization-Analysis.md ├── MatrixFactorization.md ├── PetuumArchitecture.md ├── Petuum基本架构.md ├── Petuum基础.md ├── Petuum本地编译运行.md ├── Petuum系统及Table配置.md ├── STRADS.md ├── ServerThreads.md ├── TableCreation.md ├── ThreadInitialization.md ├── figures │ ├── Architecture.png │ ├── BSP-ABSP-SSP.png │ ├── ClientTableUpdate.png │ ├── Compare-BSP-ABSP-SSP.png │ ├── ConsistencyModel.png │ ├── CreateTable.png │ ├── CreateTableThreads.png │ ├── DistributedThreads.png │ ├── LocalThreads.png │ ├── PSTableGroup-Init().png │ ├── Petuum-architecture.png │ ├── Petuum-ps-topology.png │ ├── Petuum架构图.graffle │ ├── Petuum架构图.png │ ├── STRADS-architecture.png │ ├── matrixfact-petuum.png │ ├── matrixfact.png │ ├── parallel-matrixfact.png │ └── petuum-overview.png └── 杂项.md └── Spark ├── Build └── BuildingSpark.md ├── ML ├── Introduction to MLlib Pipeline.md └── figures │ ├── CrossValidatorDemo.png │ ├── DAGpipeline.png │ └── pipelineDemo.png ├── Scheduler ├── SparkResourceManager.graffle ├── SparkScheduler.graffle ├── SparkScheduler.md └── figures │ ├── SparkResourceManager.pdf │ ├── SparkSchedulerAppSubmit.pdf │ ├── SparkStandaloneMaster.pdf │ ├── SparkStandaloneResourceAllocation.pdf │ ├── SparkStandaloneTaskScheduler.pdf │ └── SparkStandaloneTaskSchedulerChinese.pdf └── StackOverflowDiagnosis ├── GraphX_StackOverflow_Cause_Diagnosis.md ├── StackOverflow.md └── figures ├── g1.png ├── g2.png └── g3.png /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/BgThreads.md: -------------------------------------------------------------------------------- 1 | # BgWorkers 2 | 3 | BgWorkers的角色与ServerThreads的角色类似,都是管理本进程里的bg/server threads。BgWorker通过BgContext来管理,ServerThreads通过ServerContext来管理。 4 | 5 | BgContext里面存放了以下数据结构: 6 | 7 | ```c++ 8 | int version; // version of the data, increment when a set of OpLogs 9 | // are sent out; may wrap around 10 | // More specifically, version denotes the version of the 11 | // OpLogs that haven't been sent out. 12 | // version表示client端的最新opLog还没有发送给server 13 | RowRequestOpLogMgr *row_request_oplog_mgr; 14 | 15 | // initialized by BgThreadMain(), used in CreateSendOpLogs() 16 | // For server x, table y, the size of serialized OpLog is ... 17 | map > server_table_oplog_size_map; 18 | // The OpLog msg to each server 19 | map server_oplog_msg_map; 20 | // map server id to oplog msg size 21 | map server_oplog_msg_size_map; 22 | // size of oplog per table, reused across multiple tables 23 | map table_server_oplog_size_map; 24 | 25 | /* Data members needed for server push */ 26 | VectorClock server_vector_clock; 27 | ``` 28 | 29 | ## Bg thread初始化 30 | 31 | 1. 在bg thread初始化时会先打印出来“Bg Worker starts here, my id = 100/1100”。 32 | 2. 
InitBgContext()。设置一下`bg_context->row_request_oplog_mgr = new SSPPushRowRequestOpLogMgr`。然后对PS中的每一个`serverId`,将其放入下列数据结构,`server_table_oplog_size_map.insert(serverId, map())`,`server_oplog_msg_map.insert(serverId, 0)`,`server_oplog_msg_size_map.insert(serverId, 0)`,`table_server_oplog_size_map.insert(serverId, 0)`,`server_vector_clock.AddClock(serverId)`。AddClock会将`serverId, clock=0`放入到`server_vector_clock`中。 33 | 3. BgServerHanshake()。 34 | 35 | ``` 36 | 1. 通过ConnectToNameNodeOrServer(name_node_id)连接Namenode。 37 | 首先打印出"ConnectToNameNodeOrServer server_id"。 38 | 然后将自己的client_id填入到ClientConnectMsg中。 39 | 最后将msg发送给server_id对应的local/remote server thread(这里是Namenode thread)。 40 | 2. 等待Namenode返回的ConnectServerMsg (kConnectServer)消息。 41 | 3. 连接PS里面的每个server thread,仍然是通过ConnectToNameNodeOrServer(server_id)。 42 | 4. 等待,直到收到所有server thread返回的kClientStart信息,每收到一条信息就会打印"get kClientStart from server_id"。 43 | 5. 收到namenode和所有server返回的信息后,退出。 44 | ``` 45 | 4. 解除`pthread_barrier_wait`。 46 | 5. 去接受本进程内的AppInitThread的连接。使用`RecvAppInitThreadConnection()`去接受连接,连接消息类型是kAppConnect。 47 | 6. 如果本bg thread是head bg thread(第一个bg thread)就要承担CreateClientTable的任务,先打印"head bg handles CreateTable",然后调用HandleCreateTables(),然后wait直到Table创建完成。 48 | 7. 最后便进入了无限等待循环,等待接受msg,处理msg。 49 | 50 | ### HandleCreateTables() 51 | 52 | > the app thread shall not submit another create table request before the current one returns as it is blocked waiting 53 | 54 | 1. 假设要create 3 tables,那么会去`comm_bus`索取这每个table的BgCreateTableMsg (kBgCreateTable),然后从msg中提取`staleness, row_type, row_capacity, process_cache_capacity, thread_cache_capacity, oplog_capacity`。 55 | 2. 将`table_id, staleness, row_type, row_capacity`包装成`CreateTableMsg`,然后将该msg发送到Namenode。 56 | 3. 等待接收Namenode的反馈信息CreateTableReplyMsg (kCreateTableReply),收到就说明namenode已经知道head bg thread要创建ClientTable。 57 | 4. 然后可以创建`client_table = new ClientTable(table_id, client_table_config)`。 58 | 5. 将`client_table`放进`map tables`里。 59 | 6. 打印"Reply app thread",回复app init thread表示ClientTable已经创建。 60 | 61 | ### `ClientTable(table_id, client_table_config)` 62 | 63 | 与ServerTable直接存储parameter rows不同,ClientTable是一个逻辑概念,它相当于一个ServerTable的buffer/cache,app thread将最新的参数先写入到这个buffer,然后push到Server上。从Server端pull parameter rows的时候也一样,先pull到ClientTable里面然后读到app thread里面。 64 | 65 | ![](figures/ClientTableUpdate.png) 66 | 67 | 1. 提取`table_id, row_type`。 68 | 2. 创建一个`Row sample_row`,创建这个row只是用来使用Row中的函数,而不是ClientTable中实际存储value的row,实际的row存放在`process_storage`中。 69 | 3. 初始化一下oplog,oplog用于存储parameter的本地更新,也就是实际的updated value。有几个bg thread,就有几个oplog.opLogPartition。 70 | 4. 初始化`process_storage(config.process_cache_capacity)`。`process_storage`被所有thread共享,里面存储了ClientTable的实际rows,但由于`process_storage`有大小限制(row的个数),可能存储ClientTable的一部分,完整的Table存放在Server端。 71 | 5. 初始化`oplog_index`,目前还不知道这个东西是干嘛的? 72 | 6. 设置Table的一致性控制器,如果是SSP协议就使用SSPConsistencyController,如果是SSPPush协议,使用SSPPushConsistencyController。 73 | 74 | ## 当bg thread收到kAppConnect消息 75 | 76 | 1. `++num_connected_app_threads` 77 | 78 | ## 当bg thread收到kRowRequest消息 79 | 80 | 1. 接收到`row_request_msg`,类型是RowRequestMsg。 81 | 2. 调用`CheckForwardRowRequestToServer(sender_id, row_request_msg)`来处理rowRequest消息,`sender_id`就是app thread id。 82 | 83 | ### `CheckForwardRowRequestToServer(app_thread_id, row_request_msg)` 84 | 85 | 1. 从msg中提取出`table_id, row_id, clock`。 86 | 2. 从tables中找到`table_id`对应的ClientTable table。 87 | 3. 提取出table对应的ProcessStorage,并去该storage中查找`row_id`对应的row。 88 | 4. 
如果找到了对应的row,且row的clock满足要求(row.clock >= request.clock),那么只是发一个空RowRequestReplyMsg消息给app thread,然后return。如果没找到对应的row,那就要去server端取,会执行下面的步骤: 89 | 5. 构造一个RowRequestInfo,初始化它的`app_thread_id, clock = row_request_msg.clock, version = bgThread.version - 1`。Version in request denotes the update version that the row on server can see. Which should be 1 less than the current version number。 90 | 6. 将这个RowRequestInfo加入到RowRequestOpLogMgr中,使用`bgThread.row_request_oplog_mgr->AddRowRequest(row_request, table_id, row_id)`。 91 | 7. 如果必须send这个RowRequestInfo(本地最新更新也没有)到server,就会先根据`row_id`计算存储该`row_id`的`server_id`(通过GetRowPartitionServerID(table_id, row_id),只是简单地`server_ids[row_id % num_server]`),然后发`row_request_msg`请求给server。 92 | 93 | ### `SSPRowRequestOpLogMgr.AddRowRequest(row_request, table_id, row_id)` 94 | 95 | 1. 提取出request的version (也就是bgThread.version - 1)。 96 | 2. request.sent = true。 97 | 3. 去`map<(tableId, rowId), list > bgThread.row_request_oplog_mgr.pending_row_requests`里取出`(request.table_id, request.row_id)`对应的list,然后从后往前查看,将request插入到合适的位置,使得prev.clock < request.clock < next.clock。如果插入成功,那么会打印"I'm requesting clock is request.clock. There's a previous request requesting clock is prev.clock."。然后将request.sent设置为false(意思是不用send request到server端,先暂时保存),`request_added`设置为true。 98 | 4. `++version_request_cnt_map[version]`。 99 | 100 | 101 | > 可见在client和server端之间不仅要cache push/pull的parameters,还要cache push/pull的requests。 102 | 103 | ## 当bg thread收到kServerRowRequestReply消息 104 | 105 | 1. 收到ServerRowRequestReplyMsg消息 106 | 2. 处理消息`HandleServerRowRequestReply(server_id, server_row_request_reply_msg)`。 107 | 108 | ### `HandleServerRowRequestReply(server_id, server_row_request_reply_msg)` 109 | 110 | 1. 先从msg中提取出`table_id, row_id, clock, version`。 111 | 2. 从bgWorkers.tables中找到`table_id`对应的ClientTable。 112 | 3. 将msg中的row反序列化出来,放到`Row *row_data`中。 113 | 4. 将msg的version信息添加到`bgThread.row_request_oplog_mgr`中,使用`bgThread.row_request_oplog_mgr->ServerAcknowledgeVersion(server_id, version)`。 114 | 5. 处理row,使用`ApplyOpLogsAndInsertRow(table_id, client_table, row_id, version, row_data, clock)`。 115 | 6. `int clock_to_request = bgThread.row_request_oplog_mgr->InformReply(table_id, row_id, clock, bgThread.version, &app_thread_ids)`。 116 | 7. 如果`clock_to_request > 0`,那么构造RowRequestMsg,将`tabel_id, row_id, clock_to_request`填进msg。根据`table_id, row_id`计算存放该row的server thread,然后将msg发给server,并打印“send to server + serverId”。 117 | 8. 构造一个空的RowRequestReplyMsg,发送给每个app thread。 118 | 119 | 120 | ### `row_request_oplog_mgr.ServerAcknowledgeVersion(server_id, version)` 121 | 目前RowRequestOpLogMgr中的方法都会调用其子类SSPRowRequestOpLogMgr中的方法。本方法目前为空。 122 | 123 | ### `ApplyOpLogsAndInsertRow(table_id, client_table, row_id, version, row_data, clock)` 124 | 125 | Step 1:该函数首先执行`ApplyOldOpLogsToRowData(table_id, client_table, row_id, row_version, row_data)`,具体执行如下步骤: 126 | 127 | 1. 如果msg.version + 1 >= bgThread.version,那么直接return。 128 | 2. 调用`bg_oplog = bgThread.row_request_oplog_mgr->OpLogIterInit(version + 1, bgThread.version - 1)`。 129 | 3. `oplog_version = version + 1`。 130 | 4. 对于每一条`bg_oplog: BgOpLog`执行如下操作: 131 | 5. 得到`table_id`对应的BgOpLogPartitions,使用`BgOpLogPartition *bg_oplog_partition = bg_oplog->Get(table_id)`。 132 | 6. `RowOpLog *row_oplog = bg_oplog_partition->FindOpLog(row_id)`。 133 | 7. 如果`row_oplog`不为空,将RowOpLog中的update都更新到`row_data`上。 134 | 8. 
然后去获得下一条`bg_oplog`,使用`bg_oplog = bgThread.row_request_oplog_mgr->OpLogIterNext(&oplog_version)`。该函数会调用`SSPRowRequestOpLogMgr.GetOpLog(version)`去`version_oplog_map`那里获得oplog。 135 | 136 | BgOpLog和TableOpLog不一样,BgOpLog自带的数据结构是`map table_oplog_map`。BgOpLog由RowRequest OpLogMgr自带的`map version_oplog_map`持有,而RowRequestOpLogMgr由每个bg thread持有。RowRequestOpLogMgr有两个子类:SSPRowRequestOpLogMgr和SSPPushRowRequestOpLogMgr。TableOpLog由每个ClientTable对象持有。BgOpLog对row request进行cache,而TableOpLog对parameter updates进行cache。 137 | 138 | Step 2:`ClientRow *client_row = CreateClientRowFunc(clock, row_data)` 139 | 140 | Step 3:获取ClientTable的oplog,使用`TableOpLog &table_oplog = client_table->get_oplog()`。 141 | 142 | Step 4:提取TableOpLog中对应的row的oplogs,然后更新到`row_data`上。 143 | 144 | Step 5:最后将`(row_id, client_row)`插入到ClientTable的`process_storage`中。 145 | 146 | > 整个过程可以看到,先new出来一个新的row,然后将BgThread.BgOpLog持有的一些RowOpLog更新到row上,接着将ClientTable持有的RowOpLog更新到row上。 147 | 148 | 149 | 150 | 151 | ### `row_request_oplog_mgr.InformReply(table_id, row_id, clock, bgThread.version, &app_thread_ids)` 152 | 153 | 154 | ## SSPRowRequestOpLogMgr逻辑 155 | 156 | 1. 负责持有client**待发往**或者**已发往**server的row requests。这些row不在本地process cache中。 157 | 2. 如果requested row不在本地cache中,bg worker会询问RowRequestMgr是否已经发出了改row的request,如果没有,那么就send该row的request,否则,就等待server response。 158 | 3. 当bg worker收到该row的reply时,它会将该row insert到process cache中,然后使用RowRequestMgr检查哪些buffered row request可以被reply。 159 | 4. 从一个row reqeust被bg worker发到server,到bg worker接收server reply的这段时间内,bg worker可能已经发了多组row update requests到server。Server端会buffer这些row然后等到一定时间再update server端的ServerTable,然后再reply。 160 | 5. Bg worker为每一组updates分配一个单调递增的version number。本地的version number表示已经被发往server的updates最新版本。当一个row request被发送的时候,它会包含本地最新的version number。Server接收和处理messages会按照一定的顺序,当server在处理一个row request的时候,比该row request version小的row requests会先被处理,也就是说server按照version顺序来处理同一row的requests。 161 | 6. 当server buffer一个client发来的row request后,又收到同一个client发来的一组updates的时候,server会增加这个已经被buffer的row request的version。这样,当client收到这个row request的reply的时候,它会通过version知道哪些updates已经被server更新,之后在将row插入到process cache之前,会将missing掉的updates应用到row上。 162 | 7. 
RowRequestMgr也负责跟踪管理sent oplog。一个oplog会一直存在不被删掉,直到在此version之前的row requests都已经被reply。 163 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/CientTableUpdate.md: -------------------------------------------------------------------------------- 1 | # ClientTable Upadte 2 | 3 | ## 总体架构 4 | ![](figures/Architecture.png) 5 | 6 | 这个是物理架构图,但实际实现比这张图复杂。可以看到为了减少Server里Table的访问次数,Petuum在Client端设计了两级缓存,分别是Thread级别和Process级别的缓存。 7 | 8 | ## ClientTable结构图 9 | ![](figures/ClientTableUpdate.png) 10 | 11 | ClientTable实际存放在ProcessStorage中,但相对于ServerTable来说,ProcessStorage中存放的Table只是ServerTable的一部分,甚至可以设置ClientTable的row_num为0,这样就可以减少Client端的内存使用量。 12 | 13 | ## ClientTable初始化 14 | 15 | ClientTable属性: 16 | 17 | | Name | Default | Description | L Table | 18 | |:-----|:------|:-------|:-------| 19 | | table\_info.row\_type| N/A | row type (e.g., 0 表示 DenseRow) | 0 | 20 | | process\_cache\_capacity| 0 | Table 里的 row个数| matrix.getN() | 21 | | table\_info.row\_capacity| 0 | 对于 DenseRow,指column个数,对SparseRow无效| K | 22 | | table\_info.table_staleness | 0 | SSP staleness | 0 | 23 | | table\_info.oplog\_capacity | 0 | OpLogTable里面最多可以写入多少个row | 100 | 24 | 25 | 每个bg thread持有一个OpLogTable,OpLogTable的`row_num = oplog_capacity / bg_threads_num`。 26 | 27 | 代码分析: 28 | 29 | ```c++ 30 | void SSPConsistencyController::BatchInc(int32_t row_id, 31 | const int32_t* column_ids, const void* updates, int32_t num_updates) { 32 | 33 | // updates就是每个col上要increase的value。 34 | // 比如,col 1和col 3都要加1,那么column_ids = {1, 3},updates = {1, 1} 35 | // thread_cache_是ThreadTable的指针,ThreadTable就是ClientTable或者ServerTable 36 | // IndexUpadte(row_id)会 37 | thread_cache_->IndexUpdate(row_id); 38 | 39 | OpLogAccessor oplog_accessor; 40 | oplog_.FindInsertOpLog(row_id, &oplog_accessor); 41 | 42 | const uint8_t* deltas_uint8 = reinterpret_cast(updates); 43 | 44 | for (int i = 0; i < num_updates; ++i) { 45 | void *oplog_delta = oplog_accessor.FindCreate(column_ids[i]); 46 | sample_row_->AddUpdates(column_ids[i], oplog_delta, deltas_uint8 47 | + sample_row_->get_update_size()*i); 48 | } 49 | 50 | RowAccessor row_accessor; 51 | bool found = process_storage_.Find(row_id, &row_accessor); 52 | if (found) { 53 | row_accessor.GetRowData()->ApplyBatchInc(column_ids, updates, 54 | num_updates); 55 | } 56 | } 57 | ``` 58 | 59 | 60 | ## ClientTable属性解释 61 | 62 | ```c++ 63 | Class ClientTable { 64 | private: 65 | // table Id 66 | int32_t table_id_; 67 | // Table里面row的类型,比如DenseRow 68 | int32_t row_type_; 69 | // Row的游标(指针) 70 | const AbstractRow* const sample_row_; 71 | // Table的更新日志 72 | TableOpLog oplog_; 73 | // 进程里cache的Table 74 | ProcessStorage process_storage_; 75 | // Table的一致性控制协议 76 | AbstractConsistencyController *consistency_controller_; 77 | 78 | // ThreadTable就是ClientTable或者ServerTable 79 | // thread_cahce就是Threads维护的ClientTable的全局对象 80 | boost::thread_specific_ptr thread_cache_; 81 | // 操作日志,每个bg thread对应一个index value 82 | TableOpLogIndex oplog_index_; 83 | } 84 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Petuum/ImportantClasses.md: -------------------------------------------------------------------------------- 1 | # Important Classes 2 | 3 | ## ClientTable 4 | 5 | ```c++ 6 | class ClientTable : public AbstractClientTable { 7 | public: 8 | // Instantiate AbstractRow, TableOpLog, and ProcessStorage using config. 
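  // Construction, as walked through in BgThreads.md of this repo:
  //  - keeps table_id / row_type from config, and news up sample_row_, which is
  //    only used for its member functions, never for storing actual values;
  //  - initializes oplog_ (one OpLogPartition per bg thread) to buffer local
  //    parameter updates before they are pushed to the servers;
  //  - sizes process_storage_ with process_cache_capacity; the process cache
  //    may hold only part of the table, the complete table lives on the servers;
  //  - initializes oplog_index_;
  //  - installs SSPConsistencyController or SSPPushConsistencyController as
  //    consistency_controller_, depending on the configured consistency protocol.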
9 | ClientTable(int32_t table_id, const ClientTableConfig& config); 10 | 11 | ~ClientTable(); 12 | 13 | void RegisterThread(); 14 | 15 | void GetAsync(int32_t row_id); 16 | void WaitPendingAsyncGet(); 17 | void ThreadGet(int32_t row_id, ThreadRowAccessor *row_accessor); 18 | void ThreadInc(int32_t row_id, int32_t column_id, const void *update); 19 | void ThreadBatchInc(int32_t row_id, const int32_t* column_ids, 20 | const void* updates, 21 | int32_t num_updates); 22 | void FlushThreadCache(); 23 | 24 | void Get(int32_t row_id, RowAccessor *row_accessor); 25 | void Inc(int32_t row_id, int32_t column_id, const void *update); 26 | void BatchInc(int32_t row_id, const int32_t* column_ids, const void* updates, 27 | int32_t num_updates); 28 | 29 | void Clock(); 30 | cuckoohash_map *GetAndResetOpLogIndex(int32_t client_table); 31 | 32 | ProcessStorage& get_process_storage () { 33 | return process_storage_; 34 | } 35 | 36 | TableOpLog& get_oplog () { 37 | return oplog_; 38 | } 39 | 40 | const AbstractRow* get_sample_row () const { 41 | return sample_row_; 42 | } 43 | 44 | int32_t get_row_type () const { 45 | return row_type_; 46 | } 47 | 48 | private: 49 | int32_t table_id_; 50 | int32_t row_type_; 51 | // 指向每一个row的指针 52 | const AbstractRow* const sample_row_; 53 | // Table操作日志 54 | TableOpLog oplog_; 55 | // 进程的Table 56 | ProcessStorage process_storage_; 57 | // Table的一致性controller 58 | AbstractConsistencyController *consistency_controller_; 59 | 60 | // ThreadTable指针 61 | boost::thread_specific_ptr thread_cache_; 62 | // Table操作日志的index 63 | TableOpLogIndex oplog_index_; 64 | }; 65 | 66 | } // namespace petuum 67 | ``` 68 | 69 | ## ThreadTable 70 | 71 | ```c++ 72 | class ThreadTable : boost::noncopyable { 73 | public: 74 | explicit ThreadTable(const AbstractRow *sample_row); 75 | ~ThreadTable(); 76 | void IndexUpdate(int32_t row_id); 77 | void FlushOpLogIndex(TableOpLogIndex &oplog_index); 78 | 79 | AbstractRow *GetRow(int32_t row_id); 80 | void InsertRow(int32_t row_id, const AbstractRow *to_insert); 81 | void Inc(int32_t row_id, int32_t column_id, const void *delta); 82 | void BatchInc(int32_t row_id, const int32_t *column_ids, 83 | const void *deltas, int32_t num_updates); 84 | 85 | void FlushCache(ProcessStorage &process_storage, TableOpLog &table_oplog, 86 | const AbstractRow *sample_row); 87 | 88 | private: 89 | // Vector[set, set, set, ..., set] 90 | std::vector > oplog_index_; 91 | // HashMap 92 | boost::unordered_map row_storage_; 93 | // HashMap 94 | boost::unordered_map oplog_map_; 95 | // Row指针 96 | const AbstractRow *sample_row_; 97 | }; 98 | ``` 99 | 100 | ## TableGroup 101 | ```c++ 102 | class TableGroup : public AbstractTableGroup { 103 | public: 104 | TableGroup(const TableGroupConfig &table_group_config, 105 | bool table_access, int32_t *init_thread_id); 106 | 107 | ~TableGroup(); 108 | 109 | bool CreateTable(int32_t table_id, 110 | const ClientTableConfig& table_config); 111 | 112 | void CreateTableDone(); 113 | 114 | void WaitThreadRegister(); 115 | 116 | AbstractClientTable *GetTableOrDie(int32_t table_id) { 117 | auto iter = tables_.find(table_id); 118 | CHECK(iter != tables_.end()) << "Table " << table_id << " does not exist"; 119 | return static_cast(iter->second); 120 | } 121 | 122 | int32_t RegisterThread(); 123 | 124 | void DeregisterThread(); 125 | 126 | void Clock(); 127 | 128 | void GlobalBarrier(); 129 | 130 | private: 131 | typedef void (TableGroup::*ClockFunc) (); 132 | ClockFunc ClockInternal; 133 | 134 | void ClockAggressive(); 135 | void ClockConservative(); 
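  // Clock() is dispatched through the ClockInternal member-function pointer
  // to either ClockAggressive() or ClockConservative(). (Assumption: the
  // binding is chosen from table_group_config at construction time, with the
  // aggressive variant propagating clock ticks to the bg workers more eagerly
  // than the conservative one, as the names suggest.)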
136 | 137 | // TreeMap 138 | std::map tables_; 139 | // barrier 140 | pthread_barrier_t register_barrier_; 141 | // 注册的app thread(也就是worker thread)数目 142 | std::atomic num_app_threads_registered_; 143 | 144 | // Max staleness among all tables. 145 | int32_t max_table_staleness_; 146 | // Table处于第几个clock里面 147 | VectorClockMT vector_clock_; 148 | }; 149 | ``` 150 | 151 | ## SSPClientRow 152 | 153 | ```c++ 154 | // ClientRow is a wrapper on user-defined ROW data structure (e.g., vector, 155 | // map) with additional features: 156 | // 157 | // 1. Reference Counting: number of references used by application. Note the 158 | // copy in storage itself does not contribute to the count 159 | // 2. Row Metadata 160 | // 161 | // ClientRow does not provide thread-safety in itself. The locks are 162 | // maintained in the storage and in (user-defined) ROW. 163 | class SSPClientRow : public ClientRow { 164 | public: 165 | // ClientRow takes ownership of row_data. 166 | SSPClientRow(int32_t clock, AbstractRow* row_data): 167 | ClientRow(clock, row_data), 168 | clock_(clock){ } 169 | 170 | void SetClock(int32_t clock) { 171 | std::unique_lock ulock(clock_mtx_); 172 | clock_ = clock; 173 | } 174 | 175 | int32_t GetClock() const { 176 | std::unique_lock ulock(clock_mtx_); 177 | return clock_; 178 | } 179 | 180 | // Take row_data_pptr_ from other and destroy other. Existing ROW will not 181 | // be accessible any more, but will stay alive until all RowAccessors 182 | // referencing the ROW are destroyed. Accesses to SwapAndDestroy() and 183 | // GetRowDataPtr() must be mutually exclusive as they the former modifies 184 | // row_data_pptr_. 185 | void SwapAndDestroy(ClientRow* other) { 186 | clock_ = dynamic_cast(other)->clock_; 187 | ClientRow::SwapAndDestroy(other); 188 | } 189 | 190 | private: // private members 191 | mutable std::mutex clock_mtx_; 192 | int32_t clock_; 193 | }; 194 | ``` 195 | 196 | 197 | ## SerializedRowReader 198 | 199 | ```c++ 200 | // Provide sequential access to a byte string that's serialized rows. 201 | // Used to facilicate server reading row data. 202 | 203 | // st_separator : serialized_table_separator 204 | // st_end : serialized_table_end 205 | 206 | // Tables are serialized as the following memory layout 207 | // 1. int32_t : table id, could be st_separator or st_end 208 | // 2. int32_t : row id, could be st_separator or st_end 209 | // 3. size_t : serialized row size 210 | // 4. row data 211 | // repeat 1, 2, 3, 4 212 | // st_separator can not happen right after st_separator 213 | // st_end can not happen right after st_separator 214 | 215 | // Rules for serialization: 216 | // The serialized row data is guaranteed to end when seeing a st_end or with 217 | // finish reading the entire memory buffer. 218 | // When seeing a st_separator, there could be another table or no table 219 | // following. The latter happens only when the buffer reaches its memory 220 | // boundary. 
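// Illustrative byte layout (example values, not taken from real data): a
// buffer holding table 0 with two rows, followed by an empty table 1 that is
// the last table, would read as
//   [int32: 0]                       <- table id (consumed by Restart())
//   [int32: 3][size_t: len_a][len_a bytes of row data]
//   [int32: 7][size_t: len_b][len_b bytes of row data]
//   [int32: st_separator][int32: 1]  <- switch to table 1
//   [int32: st_end]                  <- table 1 is empty, nothing follows
//
// Minimal usage sketch, assuming mem/mem_size hold such a buffer:
//   SerializedRowReader reader(mem, mem_size);
//   if (reader.Restart()) {
//     int32_t table_id, row_id;
//     size_t row_size;
//     const void *data;
//     while ((data = reader.Next(&table_id, &row_id, &row_size)) != NULL) {
//       // apply row_size bytes at data to (table_id, row_id) on the server
//     }
//   }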
221 | 222 | class SerializedRowReader : boost::noncopyable { 223 | public: 224 | // does not take ownership 225 | SerializedRowReader(const void *mem, size_t mem_size): 226 | mem_(reinterpret_cast(mem)), 227 | mem_size_(mem_size) { 228 | VLOG(0) << "mem_size_ = " << mem_size_; 229 | } 230 | ~SerializedRowReader() { } 231 | 232 | bool Restart() { 233 | offset_ = 0; 234 | current_table_id_ = *(reinterpret_cast(mem_ + offset_)); 235 | offset_ += sizeof(int32_t); 236 | 237 | if (current_table_id_ == GlobalContext::get_serialized_table_end()) 238 | return false; 239 | return true; 240 | } 241 | 242 | const void *Next(int32_t *table_id, int32_t *row_id, size_t *row_size) { 243 | // When starting, there are 4 possiblilities: 244 | // 1. finished reading the mem buffer 245 | // 2. encounter the end of an table but there are other tables following 246 | // (st_separator) 247 | // 3. encounter the end of an table but there is no other table following 248 | // (st_end) 249 | // 4. normal row data 250 | 251 | if (offset_ + sizeof (int32_t) > mem_size_) 252 | return NULL; 253 | *row_id = *(reinterpret_cast(mem_ + offset_)); 254 | offset_ += sizeof(int32_t); 255 | 256 | do { 257 | if (*row_id == GlobalContext::get_serialized_table_separator()) { 258 | if (offset_ + sizeof (int32_t) > mem_size_) 259 | return NULL; 260 | 261 | current_table_id_ = *(reinterpret_cast(mem_ + offset_)); 262 | offset_ += sizeof(int32_t); 263 | 264 | if (offset_ + sizeof (int32_t) > mem_size_) 265 | return NULL; 266 | 267 | *row_id = *(reinterpret_cast(mem_ + offset_)); 268 | offset_ += sizeof(int32_t); 269 | // row_id could be 270 | // 1) st_separator: if the table is empty and there there are other 271 | // tables following; 272 | // 2) st_end: if the table is empty and there are no more table 273 | // following 274 | continue; 275 | } else if (*row_id == GlobalContext::get_serialized_table_end()) { 276 | return NULL; 277 | } else { 278 | *table_id = current_table_id_; 279 | *row_size = *(reinterpret_cast(mem_ + offset_)); 280 | offset_ += sizeof(size_t); 281 | const void *data_mem = mem_ + offset_; 282 | offset_ += *row_size; 283 | //VLOG(0) << "mem read offset = " << offset_; 284 | return data_mem; 285 | } 286 | }while(1); 287 | } 288 | 289 | private: 290 | const uint8_t *mem_; 291 | size_t mem_size_; 292 | size_t offset_; // bytes to be read next 293 | int32_t current_table_id_; 294 | }; 295 | ``` 296 | 297 | ## ProcessStorage 298 | 299 | ```c++ 300 | // ProcessStorage is shared by all threads. 301 | // 302 | // TODO(wdai): Include thread storage in ProcessStorage. 303 | class ProcessStorage { 304 | public: 305 | // capacity is the upper bound of the number of rows this ProcessStorage 306 | // can store. 307 | explicit ProcessStorage(int32_t capacity, size_t lock_pool_size); 308 | 309 | ~ProcessStorage(); 310 | 311 | // Find row row_id; row_accessor is a read-only smart pointer. Return true 312 | // if found, false otherwise. Note that the # of active row_accessor 313 | // cannot be close to capacity, or Insert() will have undefined behavior 314 | // as we may not be able to evict any row that's not being referenced by 315 | // row_accessor. 316 | bool Find(int32_t row_id, RowAccessor* row_accessor); 317 | 318 | // Check if a row exists, does not count as one access 319 | bool Find(int32_t row_id); 320 | 321 | // Insert a row, and take ownership of client_row. Return true if row_id 322 | // does not already exist (possibly evicting another row), false if row 323 | // row_id already exists and is updated. 
If hitting capacity, then evict a 324 | // row using ClockLRU. Return read reference and evicted row id if 325 | // row_accessor and evicted_row_id is supplied. We assume 326 | // row_id is always non-negative, and use *evicted_row_id = -1 if no row 327 | // is evicted. The evicted row is guaranteed to have 0 reference count 328 | // (i.e., no application is using). 329 | // 330 | // Note: To stay below the capacity, we first check num_rows_. If 331 | // num_rows_ >= capacity_, we subtract (num_rows_ - capacity_) from 332 | // num_rows_ and then evict (num_rows_ - capacity_ + 1) rows using 333 | // ClockLRU before inserting. This could result in over-eviction when two 334 | // threads simultaneously do this eviction, but this is fine. 335 | // 336 | // TODO(wdai): Watch out when over-eviction clears the inactive list. 337 | bool Insert(int32_t row_id, ClientRow* client_row); 338 | bool Insert(int32_t row_id, ClientRow* client_row, 339 | RowAccessor* row_accessor, int32_t* evicted_row_id = 0); 340 | 341 | bool Insert(int32_t row_id, ClientRow* client_row, 342 | RowAccessor *row_accessor, int32_t *evicted_row_id, 343 | ClientRow** evicted_row); 344 | 345 | private: // private functions 346 | // Evict one inactive row using CLOCK replacement algorithm. 347 | void EvictOneInactiveRow(); 348 | 349 | // Find row_id in storage_map_, assuming there is lock on row_id. If 350 | // found, update it with client_row, reference LRU, and set row_accessor 351 | // accordingly, and return true. Return false if row_id is not found. 352 | bool FindAndUpdate(int32_t row_id, ClientRow* client_row); 353 | bool FindAndUpdate(int32_t row_id, ClientRow* client_row, 354 | RowAccessor* row_accessor); 355 | 356 | private: // private members 357 | // Number of rows allowed in this storage. 358 | int32_t capacity_; 359 | 360 | // Number of rows in the storage. We choose not to use Cuckoo's size() 361 | // which is more expensive. 362 | std::atomic num_rows_; 363 | 364 | // Shared map with ClockLRU. The key type is row_id (int32_t), and the 365 | // value type consists of a ClientRow* pointer (void*) and a slot # 366 | // (int32_t). 367 | // HashMap 368 | cuckoohash_map > storage_map_; 369 | 370 | // Depends on storage_map_, thus need to be initialized after it. 371 | ClockLRU clock_lru_; 372 | 373 | // Lock pool. 
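  // (Presumably each row_id is hashed onto one of the lock_pool_size stripes
  //  passed to the constructor, so that threads touching different rows rarely
  //  contend on the same lock; FindAndUpdate() above assumes the caller already
  //  holds the lock for row_id.)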
374 | StripedLock locks_; 375 | }; 376 | ``` 377 | 378 | ## RowOpLog 379 | ```c++ 380 | class RowOpLog : boost::noncopyable { 381 | public: 382 | RowOpLog(uint32_t update_size, InitUpdateFunc InitUpdate): 383 | update_size_(update_size), 384 | InitUpdate_(InitUpdate) { } 385 | 386 | ~RowOpLog() { 387 | auto iter = oplogs_.begin(); 388 | for (; iter != oplogs_.end(); iter++) { 389 | delete reinterpret_cast(iter->second); 390 | } 391 | } 392 | 393 | void* Find(int32_t col_id) { 394 | auto iter = oplogs_.find(col_id); 395 | if (iter == oplogs_.end()) { 396 | return 0; 397 | } 398 | return iter->second; 399 | } 400 | 401 | const void* FindConst(int32_t col_id) const { 402 | auto iter = oplogs_.find(col_id); 403 | if (iter == oplogs_.end()) { 404 | return 0; 405 | } 406 | return iter->second; 407 | } 408 | 409 | void* FindCreate(int32_t col_id) { 410 | auto iter = oplogs_.find(col_id); 411 | if (iter == oplogs_.end()) { 412 | void* update = reinterpret_cast(new uint8_t[update_size_]); 413 | InitUpdate_(col_id, update); 414 | oplogs_[col_id] = update; 415 | return update; 416 | } 417 | return iter->second; 418 | } 419 | 420 | // Guaranteed ordered traversal 421 | void* BeginIterate(int32_t *column_id) { 422 | iter_ = oplogs_.begin(); 423 | if (iter_ == oplogs_.end()) { 424 | return 0; 425 | } 426 | *column_id = iter_->first; 427 | return iter_->second; 428 | } 429 | 430 | void* Next(int32_t *column_id) { 431 | iter_++; 432 | if (iter_ == oplogs_.end()) { 433 | return 0; 434 | } 435 | *column_id = iter_->first; 436 | return iter_->second; 437 | } 438 | 439 | // Guaranteed ordered traversal, in ascending order of column_id 440 | const void* BeginIterateConst(int32_t *column_id) const { 441 | const_iter_ = oplogs_.cbegin(); 442 | if (const_iter_ == oplogs_.cend()) { 443 | return 0; 444 | } 445 | *column_id = const_iter_->first; 446 | return const_iter_->second; 447 | } 448 | 449 | const void* NextConst(int32_t *column_id) const { 450 | const_iter_++; 451 | if (const_iter_ == oplogs_.cend()) { 452 | return 0; 453 | } 454 | *column_id = const_iter_->first; 455 | return const_iter_->second; 456 | } 457 | 458 | int32_t GetSize() const { 459 | return oplogs_.size(); 460 | } 461 | 462 | private: 463 | // 464 | const uint32_t update_size_; 465 | // TreeMap 466 | std::map oplogs_; 467 | // 最初的update函数 468 | InitUpdateFunc InitUpdate_; 469 | 470 | std::map::iterator iter_; 471 | mutable std::map::const_iterator const_iter_; 472 | }; 473 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Introduction-to-parameter-server-system.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/Introduction-to-parameter-server-system.pptx -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Introduction-to-parameter-server.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/Introduction-to-parameter-server.pptx -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/MF-log.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/MF-logs/MF-log.pdf -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/driver.txt: -------------------------------------------------------------------------------- 1 | xulijie@ubuntu:~/dev/Petuum/petuum-0.93/apps/matrixfact$ Data mode: Loading matrix sampledata/9x9_3blocks into memory... 2 | Matrix dimensions: 9 by 9 3 | # non-missing entries: 81 4 | Factorization rank: 3 5 | # client machines: 2 6 | # worker threads per client: 2 7 | SSP staleness: 5 8 | Step size formula: 0.5 * (100 + t)^(-0.5) 9 | Regularization strength lambda: 0.1 10 | (Note: displayed loss function does not include regularization term) 11 | Iteration 1/2... loss function = 81.5027... elapsed time = 0.082 12 | Iteration 2/2... loss function = 157.365... elapsed time = 0.007 13 | Outputting results to prefix mf_output ... done 14 | total runtime = 2.925s 15 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/matrixfact.ubuntu.xulijie.log.INFO.20141230-152214.13924.txt: -------------------------------------------------------------------------------- 1 | Log file created at: 2014/12/30 15:22:14 2 | Running on machine: ubuntu 3 | Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg 4 | I1230 15:22:14.711292 13924 comm_bus.cpp:117] CommBus ThreadRegister() 5 | I1230 15:22:14.711571 13925 comm_bus.cpp:117] CommBus ThreadRegister() 6 | I1230 15:22:14.712108 13925 name_node_thread.cpp:126] Number total_bg_threads() = 2 7 | I1230 15:22:14.712117 13925 name_node_thread.cpp:128] Number total_server_threads() = 2 8 | I1230 15:22:14.712414 13924 server_threads.cpp:92] RowSubscribe = SSPPushRowSubscribe 9 | I1230 15:22:14.712421 13924 server_threads.cpp:106] Create server thread 0 10 | I1230 15:22:14.712700 13928 server_threads.cpp:239] ServerThreads num_clients = 2 11 | I1230 15:22:14.712708 13928 server_threads.cpp:240] my id = 1 12 | I1230 15:22:14.712713 13928 server_threads.cpp:246] network addr = 192.168.40.100:10000 13 | I1230 15:22:14.712718 13928 comm_bus.cpp:117] CommBus ThreadRegister() 14 | I1230 15:22:14.712924 13928 server_threads.cpp:252] Server thread registered CommBus 15 | I1230 15:22:14.712944 13928 server_threads.cpp:141] Connect to local name node 16 | I1230 15:22:14.713012 13925 name_node_thread.cpp:142] Name node gets server 1 17 | I1230 15:22:14.713145 13929 bg_workers.cpp:889] Bg Worker starts here, my_id = 100 18 | I1230 15:22:14.713166 13929 comm_bus.cpp:117] CommBus ThreadRegister() 19 | I1230 15:22:14.713189 13929 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 0 20 | I1230 15:22:14.713193 13929 bg_workers.cpp:290] Connect to local server 0 21 | I1230 15:22:14.713238 13925 name_node_thread.cpp:139] Name node gets client 100 22 | I1230 15:22:17.447321 13925 name_node_thread.cpp:142] Name node gets server 1000 23 | I1230 15:22:17.447495 13925 name_node_thread.cpp:139] Name node gets client 1100 24 | I1230 15:22:17.447517 13925 name_node_thread.cpp:149] Has received connections from all clients and servers, sending out connect_server_msg 25 | I1230 15:22:17.447549 13925 name_node_thread.cpp:156] Send connect_server_msg done 26 | I1230 15:22:17.447556 13925 name_node_thread.cpp:162] InitNameNode done 27 | I1230 15:22:17.449017 13929 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1 28 | I1230 15:22:17.449028 13929 bg_workers.cpp:290] Connect 
to local server 1 29 | I1230 15:22:17.449089 13929 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1000 30 | I1230 15:22:17.449095 13929 bg_workers.cpp:293] Connect to remote server 1000 31 | I1230 15:22:17.449098 13929 bg_workers.cpp:296] server_addr = 192.168.40.101:10000 32 | I1230 15:22:17.450561 13928 server_threads.cpp:187] InitNonNameNode done 33 | I1230 15:22:17.470659 13929 bg_workers.cpp:368] get kClientStart from 0 num_started_servers = 0 34 | I1230 15:22:17.470669 13929 bg_workers.cpp:368] get kClientStart from 1 num_started_servers = 1 35 | I1230 15:22:17.470676 13929 bg_workers.cpp:368] get kClientStart from 1000 num_started_servers = 2 36 | I1230 15:22:17.472113 13925 name_node_thread.cpp:308] msg_type = 4 37 | I1230 15:22:17.472244 13929 bg_workers.cpp:911] head bg handles CreateTable 38 | I1230 15:22:17.472594 13925 name_node_thread.cpp:308] msg_type = 5 39 | I1230 15:22:17.472605 13925 name_node_thread.cpp:308] msg_type = 4 40 | I1230 15:22:17.472697 13925 name_node_thread.cpp:308] msg_type = 5 41 | I1230 15:22:17.473858 13929 oplog_index.cpp:42] Constructor shared_oplog_index = 0x1e358c0 42 | I1230 15:22:17.473875 13929 bg_workers.cpp:439] Reply app thread 200 43 | I1230 15:22:17.475160 13925 name_node_thread.cpp:308] msg_type = 4 44 | I1230 15:22:17.475195 13925 name_node_thread.cpp:308] msg_type = 4 45 | I1230 15:22:17.475469 13925 name_node_thread.cpp:308] msg_type = 5 46 | I1230 15:22:17.475558 13925 name_node_thread.cpp:308] msg_type = 5 47 | I1230 15:22:17.476692 13929 oplog_index.cpp:42] Constructor shared_oplog_index = 0x1e35840 48 | I1230 15:22:17.476702 13929 bg_workers.cpp:439] Reply app thread 200 49 | I1230 15:22:17.477852 13925 name_node_thread.cpp:308] msg_type = 4 50 | I1230 15:22:17.477886 13925 name_node_thread.cpp:308] msg_type = 4 51 | I1230 15:22:17.478170 13925 name_node_thread.cpp:308] msg_type = 5 52 | I1230 15:22:17.478266 13925 name_node_thread.cpp:308] msg_type = 5 53 | I1230 15:22:17.479367 13929 oplog_index.cpp:42] Constructor shared_oplog_index = 0x1e35c00 54 | I1230 15:22:17.479375 13929 bg_workers.cpp:439] Reply app thread 200 55 | I1230 15:22:17.486426 13934 comm_bus.cpp:117] CommBus ThreadRegister() 56 | I1230 15:22:17.488441 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 57 | I1230 15:22:17.488484 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 58 | I1230 15:22:17.488497 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 59 | I1230 15:22:17.488502 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 60 | I1230 15:22:17.488507 13928 server.cpp:202] Read and Apply Update Done 61 | I1230 15:22:17.488517 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 62 | I1230 15:22:17.488519 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 63 | I1230 15:22:17.488523 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 64 | I1230 15:22:17.488526 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 65 | I1230 15:22:17.488529 13928 server.cpp:202] Read and Apply Update Done 66 | I1230 15:22:17.488535 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 67 | I1230 15:22:17.488538 13928 
serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 68 | I1230 15:22:17.488541 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 69 | I1230 15:22:17.488544 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 70 | I1230 15:22:17.488548 13928 server.cpp:202] Read and Apply Update Done 71 | I1230 15:22:17.488554 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 72 | I1230 15:22:17.488556 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 73 | I1230 15:22:17.488560 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 74 | I1230 15:22:17.488562 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 75 | I1230 15:22:17.488565 13928 server.cpp:202] Read and Apply Update Done 76 | I1230 15:22:17.488571 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 77 | I1230 15:22:17.488590 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 78 | I1230 15:22:17.488595 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 79 | I1230 15:22:17.488598 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 80 | I1230 15:22:17.488601 13928 server.cpp:202] Read and Apply Update Done 81 | I1230 15:22:17.488610 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 82 | I1230 15:22:17.488615 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 83 | I1230 15:22:17.488617 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 84 | I1230 15:22:17.488620 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 85 | I1230 15:22:17.488623 13928 server.cpp:202] Read and Apply Update Done 86 | I1230 15:22:17.488647 13933 comm_bus.cpp:117] CommBus ThreadRegister() 87 | I1230 15:22:17.491765 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 88 | I1230 15:22:17.491780 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 89 | I1230 15:22:17.491786 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 90 | I1230 15:22:17.491789 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 91 | I1230 15:22:17.491792 13928 server.cpp:202] Read and Apply Update Done 92 | I1230 15:22:17.492027 13928 server.cpp:236] Serializing table 2 93 | I1230 15:22:17.492035 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 94 | I1230 15:22:17.492074 13928 server.cpp:236] Serializing table 1 95 | I1230 15:22:17.492079 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 96 | I1230 15:22:17.492082 13928 server.cpp:236] Serializing table 0 97 | I1230 15:22:17.492085 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 98 | I1230 15:22:17.493206 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 4294967295Set server version = 0 99 | I1230 15:22:17.493221 13929 
serialized_row_reader.hpp:64] mem_size_ = 24 100 | I1230 15:22:17.495124 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 101 | I1230 15:22:17.495134 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 102 | I1230 15:22:17.495137 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 103 | I1230 15:22:17.495141 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 104 | I1230 15:22:17.495144 13928 server.cpp:202] Read and Apply Update Done 105 | I1230 15:22:17.495151 13928 server.cpp:236] Serializing table 2 106 | I1230 15:22:17.495156 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 107 | I1230 15:22:17.495159 13928 server.cpp:236] Serializing table 1 108 | I1230 15:22:17.495162 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 109 | I1230 15:22:17.495165 13928 server.cpp:236] Serializing table 0 110 | I1230 15:22:17.495168 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 111 | I1230 15:22:17.495184 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 112 | I1230 15:22:17.495188 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 113 | I1230 15:22:17.495193 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 114 | I1230 15:22:17.495195 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 115 | I1230 15:22:17.495198 13928 server.cpp:202] Read and Apply Update Done 116 | I1230 15:22:17.495630 13928 server.cpp:236] Serializing table 2 117 | I1230 15:22:17.495647 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 118 | I1230 15:22:17.495651 13928 server.cpp:236] Serializing table 1 119 | I1230 15:22:17.495653 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 120 | I1230 15:22:17.495657 13928 server.cpp:236] Serializing table 0 121 | I1230 15:22:17.495661 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 122 | I1230 15:22:17.495678 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 123 | I1230 15:22:17.495682 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 124 | I1230 15:22:17.495686 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 125 | I1230 15:22:17.495688 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 126 | I1230 15:22:17.495692 13928 server.cpp:202] Read and Apply Update Done 127 | I1230 15:22:17.496121 13928 server.cpp:236] Serializing table 2 128 | I1230 15:22:17.496136 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 129 | I1230 15:22:17.496140 13928 server.cpp:236] Serializing table 1 130 | I1230 15:22:17.496143 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 131 | I1230 15:22:17.496146 13928 server.cpp:236] Serializing table 0 132 | I1230 15:22:17.496150 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 133 | I1230 15:22:17.496168 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 134 | I1230 15:22:17.496172 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 135 | I1230 15:22:17.496176 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 
update_size = 4 rows_left_in_current_table_ = 0 136 | I1230 15:22:17.496179 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 137 | I1230 15:22:17.496182 13928 server.cpp:202] Read and Apply Update Done 138 | I1230 15:22:17.496492 13928 server.cpp:236] Serializing table 2 139 | I1230 15:22:17.496546 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 140 | I1230 15:22:17.496551 13928 server.cpp:236] Serializing table 1 141 | I1230 15:22:17.496554 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 142 | I1230 15:22:17.496557 13928 server.cpp:236] Serializing table 0 143 | I1230 15:22:17.496561 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 144 | I1230 15:22:17.496572 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 145 | I1230 15:22:17.496574 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 146 | I1230 15:22:17.496578 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 147 | I1230 15:22:17.496598 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 148 | I1230 15:22:17.496603 13928 server.cpp:202] Read and Apply Update Done 149 | I1230 15:22:17.496997 13928 server.cpp:236] Serializing table 2 150 | I1230 15:22:17.497007 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 151 | I1230 15:22:17.497011 13928 server.cpp:236] Serializing table 1 152 | I1230 15:22:17.497014 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 153 | I1230 15:22:17.497017 13928 server.cpp:236] Serializing table 0 154 | I1230 15:22:17.497020 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 155 | I1230 15:22:17.497136 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 0Set server version = 1 156 | I1230 15:22:17.497144 13929 serialized_row_reader.hpp:64] mem_size_ = 24 157 | I1230 15:22:17.497153 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 1Set server version = 2 158 | I1230 15:22:17.497158 13929 serialized_row_reader.hpp:64] mem_size_ = 24 159 | I1230 15:22:17.497164 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 2Set server version = 3 160 | I1230 15:22:17.497167 13929 serialized_row_reader.hpp:64] mem_size_ = 24 161 | I1230 15:22:17.497174 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 3Set server version = 4 162 | I1230 15:22:17.497179 13929 serialized_row_reader.hpp:64] mem_size_ = 24 163 | I1230 15:22:17.497184 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 4Set server version = 5 164 | I1230 15:22:17.497189 13929 serialized_row_reader.hpp:64] mem_size_ = 24 165 | I1230 15:22:17.497195 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 4294967295Set server version = 0 166 | I1230 15:22:17.497200 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 0 167 | I1230 15:22:17.497208 13929 server_version_mgr.cpp:92] New min_version_ = 0 168 | I1230 15:22:17.497212 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 0 169 | I1230 15:22:17.497231 13929 serialized_row_reader.hpp:64] mem_size_ = 24 170 | I1230 15:22:17.497562 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 0Set server version = 1 171 | I1230 15:22:17.497570 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 1 172 | I1230 15:22:17.497573 13929 server_version_mgr.cpp:92] New min_version_ = 1 173 | I1230 15:22:17.497577 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 1 174 | I1230 15:22:17.497607 13929 serialized_row_reader.hpp:64] mem_size_ = 24 175 | I1230 15:22:17.497620 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 1Set server version = 2 176 | I1230 15:22:17.497624 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 2 177 | I1230 15:22:17.497628 13929 server_version_mgr.cpp:92] New min_version_ = 2 178 | I1230 15:22:17.497632 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 2 179 | I1230 15:22:17.497635 13929 serialized_row_reader.hpp:64] mem_size_ = 24 180 | I1230 15:22:17.497642 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 2Set server version = 3 181 | I1230 15:22:17.497645 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 3 182 | I1230 15:22:17.497648 13929 server_version_mgr.cpp:92] New min_version_ = 3 183 | I1230 15:22:17.497652 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 3 184 | I1230 15:22:17.497678 13929 serialized_row_reader.hpp:64] mem_size_ = 24 185 | I1230 15:22:17.497687 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 3Set server version = 4 186 | I1230 15:22:17.497690 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 4 187 | I1230 15:22:17.497694 13929 server_version_mgr.cpp:92] New min_version_ = 4 188 | I1230 15:22:17.497697 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 4 189 | I1230 15:22:17.497700 13929 serialized_row_reader.hpp:64] mem_size_ = 24 190 | I1230 15:22:17.497706 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 4Set server version = 5 191 | I1230 15:22:17.497710 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 5 192 | I1230 15:22:17.497714 13929 server_version_mgr.cpp:92] New min_version_ = 5 193 | I1230 15:22:17.497716 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 5 194 | I1230 15:22:17.497720 13929 serialized_row_reader.hpp:64] mem_size_ = 24 195 | I1230 15:22:17.499207 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 196 | I1230 15:22:17.569367 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 197 | I1230 15:22:17.570533 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 198 | I1230 15:22:17.570849 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 199 | I1230 15:22:17.570858 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 200 | I1230 15:22:17.570863 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 201 | I1230 15:22:17.570866 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 202 | I1230 15:22:17.570869 13928 server.cpp:202] Read and Apply Update Done 203 | I1230 15:22:17.570947 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 204 | I1230 15:22:17.571192 13929 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 205 | I1230 15:22:17.572718 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 206 | I1230 15:22:17.572728 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 207 | I1230 15:22:17.572733 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 208 | I1230 15:22:17.572737 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 209 | I1230 15:22:17.572741 13928 server.cpp:202] Read and Apply Update Done 210 | I1230 15:22:17.572749 13928 server.cpp:236] Serializing table 2 211 | I1230 15:22:17.572753 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 212 | I1230 15:22:17.572757 13928 server.cpp:236] Serializing table 1 213 | I1230 15:22:17.572760 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 214 | I1230 15:22:17.572764 13928 server.cpp:236] Serializing table 0 215 | I1230 15:22:17.572767 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 216 | I1230 15:22:17.578516 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 5Set server version = 6 217 | I1230 15:22:17.578531 13929 serialized_row_reader.hpp:64] mem_size_ = 292 218 | I1230 15:22:17.578754 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 219 | I1230 15:22:17.578763 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 220 | I1230 15:22:17.578768 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 221 | I1230 15:22:17.578770 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 222 | I1230 15:22:17.578774 13928 server.cpp:202] Read and Apply Update Done 223 | I1230 15:22:17.578816 13928 
serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 224 | I1230 15:22:17.578820 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 225 | I1230 15:22:17.578824 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 226 | I1230 15:22:17.578827 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 227 | I1230 15:22:17.578830 13928 server.cpp:202] Read and Apply Update Done 228 | I1230 15:22:17.578836 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 229 | I1230 15:22:17.578840 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 230 | I1230 15:22:17.578843 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 231 | I1230 15:22:17.578846 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 232 | I1230 15:22:17.578850 13928 server.cpp:202] Read and Apply Update Done 233 | I1230 15:22:17.578855 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 234 | I1230 15:22:17.578858 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 235 | I1230 15:22:17.578861 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 236 | I1230 15:22:17.578865 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 237 | I1230 15:22:17.578867 13928 server.cpp:202] Read and Apply Update Done 238 | I1230 15:22:17.578873 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 239 | I1230 15:22:17.578876 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 240 | I1230 15:22:17.578881 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 241 | I1230 15:22:17.578883 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 242 | I1230 15:22:17.578886 13928 server.cpp:202] Read and Apply Update Done 243 | I1230 15:22:17.578892 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 244 | I1230 15:22:17.578896 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 245 | I1230 15:22:17.578899 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 246 | I1230 15:22:17.578902 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 247 | I1230 15:22:17.578905 13928 server.cpp:202] Read and Apply Update Done 248 | I1230 15:22:17.578912 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 249 | I1230 15:22:17.578914 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 250 | I1230 15:22:17.578917 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 251 | I1230 15:22:17.578920 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 252 | I1230 15:22:17.578923 13928 
server.cpp:202] Read and Apply Update Done 253 | I1230 15:22:17.578932 13928 server_threads.cpp:419] get ClientShutDown from bg 1100 254 | I1230 15:22:17.578991 13925 name_node_thread.cpp:308] msg_type = 16 255 | I1230 15:22:17.578999 13925 name_node_thread.cpp:313] get ClientShutDown from bg 1100 256 | I1230 15:22:17.579012 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 5Set server version = 6 257 | I1230 15:22:17.579016 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 6 258 | I1230 15:22:17.579020 13929 server_version_mgr.cpp:92] New min_version_ = 6 259 | I1230 15:22:17.579023 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 6 260 | I1230 15:22:17.579049 13929 serialized_row_reader.hpp:64] mem_size_ = 216 261 | I1230 15:22:17.584317 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 262 | I1230 15:22:17.584328 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 5 263 | I1230 15:22:17.584332 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 5 264 | I1230 15:22:17.584336 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 1 265 | I1230 15:22:17.584339 13928 server.cpp:202] Read and Apply Update Done 266 | I1230 15:22:17.584352 13928 server.cpp:236] Serializing table 2 267 | I1230 15:22:17.584357 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 268 | I1230 15:22:17.584360 13928 server.cpp:236] Serializing table 1 269 | I1230 15:22:17.584363 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 270 | I1230 15:22:17.584367 13928 server.cpp:236] Serializing table 0 271 | I1230 15:22:17.584370 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 272 | I1230 15:22:17.584385 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 273 | I1230 15:22:17.584389 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 274 | I1230 15:22:17.584393 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 275 | I1230 15:22:17.584395 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 276 | I1230 15:22:17.584398 13928 server.cpp:202] Read and Apply Update Done 277 | I1230 15:22:17.584403 13928 server.cpp:236] Serializing table 2 278 | I1230 15:22:17.584406 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 279 | I1230 15:22:17.584409 13928 server.cpp:236] Serializing table 1 280 | I1230 15:22:17.584413 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 281 | I1230 15:22:17.584415 13928 server.cpp:236] Serializing table 0 282 | I1230 15:22:17.584419 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 283 | I1230 15:22:17.584426 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 284 | I1230 15:22:17.584429 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 285 | I1230 15:22:17.584432 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 286 | I1230 15:22:17.584435 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 287 | I1230 15:22:17.584439 13928 server.cpp:202] Read and Apply Update Done 288 | I1230 15:22:17.584442 13928 
server.cpp:236] Serializing table 2 289 | I1230 15:22:17.584446 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 290 | I1230 15:22:17.584450 13928 server.cpp:236] Serializing table 1 291 | I1230 15:22:17.584452 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 292 | I1230 15:22:17.584455 13928 server.cpp:236] Serializing table 0 293 | I1230 15:22:17.584458 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 294 | I1230 15:22:17.584465 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 295 | I1230 15:22:17.584470 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 296 | I1230 15:22:17.584472 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 297 | I1230 15:22:17.584475 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 298 | I1230 15:22:17.584478 13928 server.cpp:202] Read and Apply Update Done 299 | I1230 15:22:17.584482 13928 server.cpp:236] Serializing table 2 300 | I1230 15:22:17.584486 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 301 | I1230 15:22:17.584488 13928 server.cpp:236] Serializing table 1 302 | I1230 15:22:17.584491 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 303 | I1230 15:22:17.584494 13928 server.cpp:236] Serializing table 0 304 | I1230 15:22:17.584497 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 305 | I1230 15:22:17.584504 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 306 | I1230 15:22:17.584527 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 307 | I1230 15:22:17.584532 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 308 | I1230 15:22:17.584534 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 309 | I1230 15:22:17.584537 13928 server.cpp:202] Read and Apply Update Done 310 | I1230 15:22:17.584542 13928 server.cpp:236] Serializing table 2 311 | I1230 15:22:17.584545 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 312 | I1230 15:22:17.584549 13928 server.cpp:236] Serializing table 1 313 | I1230 15:22:17.584553 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 314 | I1230 15:22:17.584555 13928 server.cpp:236] Serializing table 0 315 | I1230 15:22:17.584558 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 316 | I1230 15:22:17.584566 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 317 | I1230 15:22:17.584570 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 318 | I1230 15:22:17.584573 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 319 | I1230 15:22:17.584576 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 320 | I1230 15:22:17.584636 13928 server.cpp:202] Read and Apply Update Done 321 | I1230 15:22:17.584887 13928 server.cpp:236] Serializing table 2 322 | I1230 15:22:17.584894 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 323 | I1230 15:22:17.584897 13928 server.cpp:236] Serializing table 1 324 | I1230 15:22:17.584900 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 325 | I1230 15:22:17.584903 13928 server.cpp:236] Serializing table 0 326 | I1230 15:22:17.584907 13928 server_table.hpp:83] 
tmp_row_buff_size_ = 512 327 | I1230 15:22:17.584918 13928 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 328 | I1230 15:22:17.584923 13928 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 329 | I1230 15:22:17.584925 13928 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 330 | I1230 15:22:17.584928 13928 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 331 | I1230 15:22:17.584931 13928 server.cpp:202] Read and Apply Update Done 332 | I1230 15:22:17.585170 13928 server.cpp:236] Serializing table 2 333 | I1230 15:22:17.585177 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 334 | I1230 15:22:17.585180 13928 server.cpp:236] Serializing table 1 335 | I1230 15:22:17.585183 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 336 | I1230 15:22:17.585186 13928 server.cpp:236] Serializing table 0 337 | I1230 15:22:17.585189 13928 server_table.hpp:83] tmp_row_buff_size_ = 512 338 | I1230 15:22:17.587085 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 6Set server version = 7 339 | I1230 15:22:17.587095 13929 serialized_row_reader.hpp:64] mem_size_ = 292 340 | I1230 15:22:17.587123 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 7Set server version = 8 341 | I1230 15:22:17.587127 13929 serialized_row_reader.hpp:64] mem_size_ = 24 342 | I1230 15:22:17.587134 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 8Set server version = 9 343 | I1230 15:22:17.587138 13929 serialized_row_reader.hpp:64] mem_size_ = 24 344 | I1230 15:22:17.587146 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 9Set server version = 10 345 | I1230 15:22:17.587148 13929 serialized_row_reader.hpp:64] mem_size_ = 24 346 | I1230 15:22:17.587155 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 10Set server version = 11 347 | I1230 15:22:17.587159 13929 serialized_row_reader.hpp:64] mem_size_ = 24 348 | I1230 15:22:17.587165 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 11Set server version = 12 349 | I1230 15:22:17.587169 13929 serialized_row_reader.hpp:64] mem_size_ = 24 350 | I1230 15:22:17.587175 13929 server_version_mgr.cpp:51] Server id = 1 original server version = 12Set server version = 13 351 | I1230 15:22:17.587208 13929 serialized_row_reader.hpp:64] mem_size_ = 24 352 | I1230 15:22:17.593243 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 6Set server version = 7 353 | I1230 15:22:17.593261 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 7 354 | I1230 15:22:17.593264 13929 server_version_mgr.cpp:92] New min_version_ = 7 355 | I1230 15:22:17.593267 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 7 356 | I1230 15:22:17.593284 13929 serialized_row_reader.hpp:64] mem_size_ = 216 357 | I1230 15:22:17.593324 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 7Set server version = 8 358 | I1230 15:22:17.593330 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 8 359 | I1230 15:22:17.593333 13929 server_version_mgr.cpp:92] New min_version_ = 8 360 | I1230 15:22:17.593336 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 8 361 | I1230 15:22:17.593340 13929 serialized_row_reader.hpp:64] mem_size_ = 24 362 | I1230 15:22:17.593350 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 8Set server version = 9 363 | I1230 15:22:17.593354 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 9 364 | I1230 15:22:17.593358 13929 server_version_mgr.cpp:92] New min_version_ = 9 365 | I1230 15:22:17.593360 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 9 366 | I1230 15:22:17.593365 13929 serialized_row_reader.hpp:64] mem_size_ = 24 367 | I1230 15:22:17.593372 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 9Set server version = 10 368 | I1230 15:22:17.593375 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 10 369 | I1230 15:22:17.593379 13929 server_version_mgr.cpp:92] New min_version_ = 10 370 | I1230 15:22:17.593382 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 10 371 | I1230 15:22:17.593385 13929 serialized_row_reader.hpp:64] mem_size_ = 24 372 | I1230 15:22:17.593391 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 10Set server version = 11 373 | I1230 15:22:17.593395 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 11 374 | I1230 15:22:17.593399 13929 server_version_mgr.cpp:92] New min_version_ = 11 375 | I1230 15:22:17.593401 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 11 376 | I1230 15:22:17.593405 13929 serialized_row_reader.hpp:64] mem_size_ = 24 377 | I1230 15:22:17.593411 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 11Set server version = 12 378 | I1230 15:22:17.593415 13929 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1000 version = 12 379 | I1230 15:22:17.593417 13929 server_version_mgr.cpp:92] New min_version_ = 12 380 | I1230 15:22:17.593420 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 12 381 | I1230 15:22:17.593425 13929 serialized_row_reader.hpp:64] mem_size_ = 24 382 | I1230 15:22:17.597100 13925 name_node_thread.cpp:308] msg_type = 16 383 | I1230 15:22:17.597122 13925 name_node_thread.cpp:313] get ClientShutDown from bg 100 384 | I1230 15:22:17.597277 13925 name_node_thread.cpp:316] NameNode shutting down 385 | I1230 15:22:17.597395 13929 bg_workers.cpp:970] get ServerShutDownAck from server 0 386 | I1230 15:22:17.597450 13928 server_threads.cpp:419] get ClientShutDown from bg 100 387 | I1230 15:22:17.597481 13928 server_threads.cpp:422] Server shutdown 388 | I1230 15:22:17.599992 13929 bg_workers.cpp:970] get ServerShutDownAck from server 1 389 | I1230 15:22:17.635277 13929 server_version_mgr.cpp:51] Server id = 1000 original server version = 12Set server version = 13 390 | I1230 15:22:17.635350 13929 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1000 version = 13 391 | I1230 15:22:17.635354 13929 server_version_mgr.cpp:92] New min_version_ = 13 392 | I1230 15:22:17.635359 13929 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1000 version to remove = 13 393 | I1230 15:22:17.635367 13929 serialized_row_reader.hpp:64] mem_size_ = 24 394 | I1230 15:22:17.635381 13929 bg_workers.cpp:970] get ServerShutDownAck from server 1000 395 | I1230 15:22:17.635422 13929 bg_workers.cpp:973] Bg worker 100 shutting down 396 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MF-logs/matrixfact.ubuntu2.xulijie.log.INFO.20141230-152217.8866.txt: -------------------------------------------------------------------------------- 1 | Log file created at: 2014/12/30 15:22:17 2 | Running on machine: ubuntu2 3 | Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg 4 | I1230 15:22:17.474882 8866 comm_bus.cpp:117] CommBus ThreadRegister() 5 | I1230 15:22:17.475028 8866 server_threads.cpp:92] RowSubscribe = SSPPushRowSubscribe 6 | I1230 15:22:17.475034 8866 server_threads.cpp:106] Create server thread 0 7 | I1230 15:22:17.475098 8867 server_threads.cpp:239] ServerThreads num_clients = 2 8 | I1230 15:22:17.475103 8867 server_threads.cpp:240] my id = 1000 9 | I1230 15:22:17.475110 8867 server_threads.cpp:246] network addr = 192.168.40.101:10000 10 | I1230 15:22:17.475113 8867 comm_bus.cpp:117] CommBus ThreadRegister() 11 | I1230 15:22:17.475584 8867 server_threads.cpp:252] Server thread registered CommBus 12 | I1230 15:22:17.475623 8867 server_threads.cpp:144] Connect to remote name node 13 | I1230 15:22:17.475628 8867 server_threads.cpp:147] name_node_addr = 192.168.40.100:9999 14 | I1230 15:22:17.475725 8870 bg_workers.cpp:889] Bg Worker starts here, my_id = 1100 15 | I1230 15:22:17.475742 8870 comm_bus.cpp:117] CommBus ThreadRegister() 16 | I1230 15:22:17.475757 8870 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 0 17 | I1230 15:22:17.475761 8870 bg_workers.cpp:293] Connect to remote server 0 18 | I1230 15:22:17.475764 8870 bg_workers.cpp:296] server_addr = 192.168.40.100:9999 19 | I1230 15:22:17.478379 8870 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1 20 | I1230 15:22:17.478387 8870 bg_workers.cpp:293] Connect to remote server 1 21 | I1230 15:22:17.478390 8870 bg_workers.cpp:296] server_addr = 192.168.40.100:10000 22 | I1230 15:22:17.480212 8870 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1000 23 | I1230 15:22:17.480221 8870 bg_workers.cpp:290] Connect to local server 1000 24 | I1230 15:22:17.480249 8870 bg_workers.cpp:368] get kClientStart from 0 num_started_servers = 0 25 | I1230 15:22:17.481348 8870 bg_workers.cpp:368] get kClientStart from 1 num_started_servers = 1 26 | I1230 15:22:17.481624 8867 server_threads.cpp:187] InitNonNameNode done 27 | I1230 15:22:17.481638 8870 bg_workers.cpp:368] get kClientStart from 1000 num_started_servers = 2 28 | I1230 15:22:17.481751 8870 bg_workers.cpp:911] head bg handles CreateTable 29 | I1230 15:22:17.505518 8870 oplog_index.cpp:42] Constructor shared_oplog_index = 0x13e5700 30 | I1230 15:22:17.505545 8870 bg_workers.cpp:439] Reply app thread 1200 31 | I1230 15:22:17.508220 8870 oplog_index.cpp:42] Constructor shared_oplog_index = 0x13e5840 32 | I1230 15:22:17.508232 8870 bg_workers.cpp:439] Reply app thread 1200 33 | I1230 15:22:17.510934 8870 oplog_index.cpp:42] Constructor shared_oplog_index = 0x13e5a00 34 | I1230 15:22:17.510947 8870 bg_workers.cpp:439] Reply app thread 1200 35 | I1230 
15:22:17.511045 8872 comm_bus.cpp:117] CommBus ThreadRegister() 36 | I1230 15:22:17.511118 8871 comm_bus.cpp:117] CommBus ThreadRegister() 37 | I1230 15:22:17.512034 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 38 | I1230 15:22:17.512040 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 39 | I1230 15:22:17.512049 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 40 | I1230 15:22:17.512053 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 41 | I1230 15:22:17.512056 8867 server.cpp:202] Read and Apply Update Done 42 | I1230 15:22:17.512743 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 43 | I1230 15:22:17.512749 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 44 | I1230 15:22:17.512753 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 45 | I1230 15:22:17.512754 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 46 | I1230 15:22:17.512758 8867 server.cpp:202] Read and Apply Update Done 47 | I1230 15:22:17.513478 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 48 | I1230 15:22:17.513485 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 49 | I1230 15:22:17.513488 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 50 | I1230 15:22:17.513521 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 51 | I1230 15:22:17.513525 8867 server.cpp:202] Read and Apply Update Done 52 | I1230 15:22:17.514166 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 53 | I1230 15:22:17.514173 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 54 | I1230 15:22:17.514175 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 55 | I1230 15:22:17.514178 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 56 | I1230 15:22:17.514179 8867 server.cpp:202] Read and Apply Update Done 57 | I1230 15:22:17.514829 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 58 | I1230 15:22:17.514837 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 59 | I1230 15:22:17.514838 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 60 | I1230 15:22:17.514842 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 61 | I1230 15:22:17.514843 8867 server.cpp:202] Read and Apply Update Done 62 | I1230 15:22:17.515512 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 63 | I1230 15:22:17.515518 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 64 | I1230 15:22:17.515522 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 65 | I1230 15:22:17.515523 8867 
serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 66 | I1230 15:22:17.515525 8867 server.cpp:202] Read and Apply Update Done 67 | I1230 15:22:17.523031 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 68 | I1230 15:22:17.523054 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 69 | I1230 15:22:17.523059 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 70 | I1230 15:22:17.523062 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 71 | I1230 15:22:17.523066 8867 server.cpp:202] Read and Apply Update Done 72 | I1230 15:22:17.523350 8867 server.cpp:236] Serializing table 2 73 | I1230 15:22:17.523357 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 74 | I1230 15:22:17.523361 8867 server.cpp:236] Serializing table 1 75 | I1230 15:22:17.523363 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 76 | I1230 15:22:17.523366 8867 server.cpp:236] Serializing table 0 77 | I1230 15:22:17.523370 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 78 | I1230 15:22:17.523473 8870 server_version_mgr.cpp:51] Server id = 1000 original server version = 4294967295Set server version = 5 79 | I1230 15:22:17.523481 8870 serialized_row_reader.hpp:64] mem_size_ = 24 80 | I1230 15:22:17.523488 8870 server_version_mgr.cpp:51] Server id = 1 original server version = 4294967295Set server version = 5 81 | I1230 15:22:17.523491 8870 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1 version = 5 82 | I1230 15:22:17.523496 8870 server_version_mgr.cpp:92] New min_version_ = 5 83 | I1230 15:22:17.523499 8870 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1 version to remove = 5 84 | I1230 15:22:17.523519 8870 serialized_row_reader.hpp:64] mem_size_ = 24 85 | I1230 15:22:17.525842 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 86 | I1230 15:22:17.525852 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 87 | I1230 15:22:17.525856 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 88 | I1230 15:22:17.525857 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 89 | I1230 15:22:17.525893 8867 server.cpp:202] Read and Apply Update Done 90 | I1230 15:22:17.525902 8867 server.cpp:236] Serializing table 2 91 | I1230 15:22:17.525904 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 92 | I1230 15:22:17.525907 8867 server.cpp:236] Serializing table 1 93 | I1230 15:22:17.525909 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 94 | I1230 15:22:17.525913 8867 server.cpp:236] Serializing table 0 95 | I1230 15:22:17.525914 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 96 | I1230 15:22:17.525928 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 97 | I1230 15:22:17.525930 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 98 | I1230 15:22:17.525933 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 99 | I1230 15:22:17.525935 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 100 | I1230 15:22:17.525938 8867 server.cpp:202] Read and Apply Update Done 101 | 
I1230 15:22:17.526305 8867 server.cpp:236] Serializing table 2 102 | I1230 15:22:17.526319 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 103 | I1230 15:22:17.526322 8867 server.cpp:236] Serializing table 1 104 | I1230 15:22:17.526324 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 105 | I1230 15:22:17.526326 8867 server.cpp:236] Serializing table 0 106 | I1230 15:22:17.526329 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 107 | I1230 15:22:17.526343 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 108 | I1230 15:22:17.526346 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 109 | I1230 15:22:17.526348 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 110 | I1230 15:22:17.526350 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 111 | I1230 15:22:17.526353 8867 server.cpp:202] Read and Apply Update Done 112 | I1230 15:22:17.526689 8867 server.cpp:236] Serializing table 2 113 | I1230 15:22:17.526695 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 114 | I1230 15:22:17.526698 8867 server.cpp:236] Serializing table 1 115 | I1230 15:22:17.526700 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 116 | I1230 15:22:17.526702 8867 server.cpp:236] Serializing table 0 117 | I1230 15:22:17.526705 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 118 | I1230 15:22:17.526713 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 119 | I1230 15:22:17.526716 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 120 | I1230 15:22:17.526718 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 121 | I1230 15:22:17.526721 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 122 | I1230 15:22:17.526722 8867 server.cpp:202] Read and Apply Update Done 123 | I1230 15:22:17.527147 8867 server.cpp:236] Serializing table 2 124 | I1230 15:22:17.527159 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 125 | I1230 15:22:17.527163 8867 server.cpp:236] Serializing table 1 126 | I1230 15:22:17.527165 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 127 | I1230 15:22:17.527168 8867 server.cpp:236] Serializing table 0 128 | I1230 15:22:17.527169 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 129 | I1230 15:22:17.527186 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 130 | I1230 15:22:17.527189 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 131 | I1230 15:22:17.527191 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 132 | I1230 15:22:17.527194 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 133 | I1230 15:22:17.527196 8867 server.cpp:202] Read and Apply Update Done 134 | I1230 15:22:17.527624 8867 server.cpp:236] Serializing table 2 135 | I1230 15:22:17.527678 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 136 | I1230 15:22:17.527683 8867 server.cpp:236] Serializing table 1 137 | I1230 15:22:17.527684 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 138 | I1230 15:22:17.527688 8867 server.cpp:236] Serializing table 0 139 | I1230 15:22:17.527689 8867 server_table.hpp:83] 
tmp_row_buff_size_ = 512 140 | I1230 15:22:17.527811 8870 serialized_row_reader.hpp:64] mem_size_ = 24 141 | I1230 15:22:17.527822 8870 serialized_row_reader.hpp:64] mem_size_ = 24 142 | I1230 15:22:17.527827 8870 serialized_row_reader.hpp:64] mem_size_ = 24 143 | I1230 15:22:17.527832 8870 serialized_row_reader.hpp:64] mem_size_ = 24 144 | I1230 15:22:17.527835 8870 serialized_row_reader.hpp:64] mem_size_ = 24 145 | I1230 15:22:17.527959 8870 serialized_row_reader.hpp:64] mem_size_ = 24 146 | I1230 15:22:17.527986 8870 serialized_row_reader.hpp:64] mem_size_ = 24 147 | I1230 15:22:17.527997 8870 serialized_row_reader.hpp:64] mem_size_ = 24 148 | I1230 15:22:17.528008 8870 serialized_row_reader.hpp:64] mem_size_ = 24 149 | I1230 15:22:17.528018 8870 serialized_row_reader.hpp:64] mem_size_ = 24 150 | I1230 15:22:17.528100 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 6 There's a previous request requesting clock 6 151 | I1230 15:22:17.528393 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 152 | I1230 15:22:17.528705 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 153 | I1230 15:22:17.529086 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 154 | I1230 15:22:17.529573 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 155 | I1230 15:22:17.529762 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 156 | I1230 15:22:17.530083 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 157 | I1230 15:22:17.530362 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 158 | I1230 15:22:17.590853 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 159 | I1230 15:22:17.597518 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 160 | I1230 15:22:17.597622 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 161 | I1230 15:22:17.599817 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 162 | I1230 15:22:17.599916 8870 ssp_push_row_request_oplog_mgr.cpp:55] I'm requesting clock is 1 There's a previous request requesting clock 1 163 | I1230 15:22:17.601063 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 164 | I1230 15:22:17.601071 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 165 | I1230 15:22:17.601075 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 166 | I1230 15:22:17.601078 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 167 | I1230 15:22:17.601080 8867 server.cpp:202] Read and Apply Update Done 168 | I1230 15:22:17.603108 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 169 | I1230 15:22:17.603118 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 170 | I1230 
15:22:17.603121 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 171 | I1230 15:22:17.603124 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 172 | I1230 15:22:17.603127 8867 server.cpp:202] Read and Apply Update Done 173 | I1230 15:22:17.603134 8867 server.cpp:236] Serializing table 2 174 | I1230 15:22:17.603173 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 175 | I1230 15:22:17.603176 8867 server.cpp:236] Serializing table 1 176 | I1230 15:22:17.603179 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 177 | I1230 15:22:17.603181 8867 server.cpp:236] Serializing table 0 178 | I1230 15:22:17.603184 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 179 | I1230 15:22:17.603328 8870 server_version_mgr.cpp:51] Server id = 1000 original server version = 5Set server version = 6 180 | I1230 15:22:17.603337 8870 serialized_row_reader.hpp:64] mem_size_ = 216 181 | I1230 15:22:17.603554 8870 server_version_mgr.cpp:51] Server id = 1 original server version = 5Set server version = 6 182 | I1230 15:22:17.603560 8870 server_version_mgr.cpp:61] IsUniqueMin!! server id = 1 version = 6 183 | I1230 15:22:17.603564 8870 server_version_mgr.cpp:92] New min_version_ = 6 184 | I1230 15:22:17.603565 8870 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1 version to remove = 6 185 | I1230 15:22:17.603575 8870 serialized_row_reader.hpp:64] mem_size_ = 292 186 | I1230 15:22:17.604626 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 187 | I1230 15:22:17.604634 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 188 | I1230 15:22:17.604637 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 189 | I1230 15:22:17.604640 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 190 | I1230 15:22:17.604642 8867 server.cpp:202] Read and Apply Update Done 191 | I1230 15:22:17.605319 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 192 | I1230 15:22:17.605325 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 193 | I1230 15:22:17.605329 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 194 | I1230 15:22:17.605330 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 195 | I1230 15:22:17.605332 8867 server.cpp:202] Read and Apply Update Done 196 | I1230 15:22:17.605978 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 197 | I1230 15:22:17.605985 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 198 | I1230 15:22:17.605988 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 199 | I1230 15:22:17.605990 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 200 | I1230 15:22:17.605993 8867 server.cpp:202] Read and Apply Update Done 201 | I1230 15:22:17.606648 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 202 | I1230 15:22:17.606654 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 203 | I1230 15:22:17.606657 
8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 204 | I1230 15:22:17.606659 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 205 | I1230 15:22:17.606662 8867 server.cpp:202] Read and Apply Update Done 206 | I1230 15:22:17.607377 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 207 | I1230 15:22:17.607383 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 208 | I1230 15:22:17.607385 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 209 | I1230 15:22:17.607388 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 210 | I1230 15:22:17.607390 8867 server.cpp:202] Read and Apply Update Done 211 | I1230 15:22:17.608032 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 212 | I1230 15:22:17.608039 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 213 | I1230 15:22:17.608062 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 214 | I1230 15:22:17.608065 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 215 | I1230 15:22:17.608067 8867 server.cpp:202] Read and Apply Update Done 216 | I1230 15:22:17.608726 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 217 | I1230 15:22:17.608732 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 218 | I1230 15:22:17.608736 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 219 | I1230 15:22:17.608737 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 220 | I1230 15:22:17.608739 8867 server.cpp:202] Read and Apply Update Done 221 | I1230 15:22:17.608836 8867 server_threads.cpp:419] get ClientShutDown from bg 1100 222 | I1230 15:22:17.616081 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 223 | I1230 15:22:17.616102 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 4 224 | I1230 15:22:17.616107 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 4 225 | I1230 15:22:17.616111 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 226 | I1230 15:22:17.616113 8867 server.cpp:202] Read and Apply Update Done 227 | I1230 15:22:17.616122 8867 server.cpp:236] Serializing table 2 228 | I1230 15:22:17.616125 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 229 | I1230 15:22:17.616128 8867 server.cpp:236] Serializing table 1 230 | I1230 15:22:17.616132 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 231 | I1230 15:22:17.616133 8867 server.cpp:236] Serializing table 0 232 | I1230 15:22:17.616137 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 233 | I1230 15:22:17.616154 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 234 | I1230 15:22:17.616158 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 235 | I1230 15:22:17.616160 8867 
serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 236 | I1230 15:22:17.616163 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 237 | I1230 15:22:17.616165 8867 server.cpp:202] Read and Apply Update Done 238 | I1230 15:22:17.616168 8867 server.cpp:236] Serializing table 2 239 | I1230 15:22:17.616171 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 240 | I1230 15:22:17.616173 8867 server.cpp:236] Serializing table 1 241 | I1230 15:22:17.616175 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 242 | I1230 15:22:17.616178 8867 server.cpp:236] Serializing table 0 243 | I1230 15:22:17.616180 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 244 | I1230 15:22:17.616185 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 245 | I1230 15:22:17.616189 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 246 | I1230 15:22:17.616190 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 247 | I1230 15:22:17.616194 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 248 | I1230 15:22:17.616195 8867 server.cpp:202] Read and Apply Update Done 249 | I1230 15:22:17.616199 8867 server.cpp:236] Serializing table 2 250 | I1230 15:22:17.616200 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 251 | I1230 15:22:17.616204 8867 server.cpp:236] Serializing table 1 252 | I1230 15:22:17.616205 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 253 | I1230 15:22:17.616207 8867 server.cpp:236] Serializing table 0 254 | I1230 15:22:17.616209 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 255 | I1230 15:22:17.616215 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 256 | I1230 15:22:17.616252 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 257 | I1230 15:22:17.616255 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 258 | I1230 15:22:17.616257 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 259 | I1230 15:22:17.616261 8867 server.cpp:202] Read and Apply Update Done 260 | I1230 15:22:17.616264 8867 server.cpp:236] Serializing table 2 261 | I1230 15:22:17.616266 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 262 | I1230 15:22:17.616269 8867 server.cpp:236] Serializing table 1 263 | I1230 15:22:17.616271 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 264 | I1230 15:22:17.616273 8867 server.cpp:236] Serializing table 0 265 | I1230 15:22:17.616276 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 266 | I1230 15:22:17.616298 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 267 | I1230 15:22:17.616303 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 268 | I1230 15:22:17.616305 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 269 | I1230 15:22:17.616307 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 270 | I1230 15:22:17.616309 8867 server.cpp:202] Read and Apply Update Done 271 | I1230 15:22:17.616313 8867 server.cpp:236] Serializing table 2 272 | I1230 15:22:17.616315 8867 
server_table.hpp:83] tmp_row_buff_size_ = 512 273 | I1230 15:22:17.616318 8867 server.cpp:236] Serializing table 1 274 | I1230 15:22:17.616320 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 275 | I1230 15:22:17.616322 8867 server.cpp:236] Serializing table 0 276 | I1230 15:22:17.616324 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 277 | I1230 15:22:17.616330 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 278 | I1230 15:22:17.616333 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 279 | I1230 15:22:17.616335 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 280 | I1230 15:22:17.616338 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 281 | I1230 15:22:17.616339 8867 server.cpp:202] Read and Apply Update Done 282 | I1230 15:22:17.616595 8867 server.cpp:236] Serializing table 2 283 | I1230 15:22:17.616602 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 284 | I1230 15:22:17.616605 8867 server.cpp:236] Serializing table 1 285 | I1230 15:22:17.616607 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 286 | I1230 15:22:17.616610 8867 server.cpp:236] Serializing table 0 287 | I1230 15:22:17.616612 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 288 | I1230 15:22:17.616621 8867 serialized_oplog_reader.hpp:57] SerializedOpLogReader Restart(), num_tables_left = 3 289 | I1230 15:22:17.616623 8867 serialized_oplog_reader.hpp:119] current_table_id = 0 update_size = 4 rows_left_in_current_table_ = 0 290 | I1230 15:22:17.616626 8867 serialized_oplog_reader.hpp:119] current_table_id = 1 update_size = 4 rows_left_in_current_table_ = 0 291 | I1230 15:22:17.616627 8867 serialized_oplog_reader.hpp:119] current_table_id = 2 update_size = 4 rows_left_in_current_table_ = 0 292 | I1230 15:22:17.616631 8867 server.cpp:202] Read and Apply Update Done 293 | I1230 15:22:17.622216 8870 server_version_mgr.cpp:51] Server id = 1000 original server version = 6Set server version = 13 294 | I1230 15:22:17.622227 8870 serialized_row_reader.hpp:64] mem_size_ = 216 295 | I1230 15:22:17.622247 8870 serialized_row_reader.hpp:64] mem_size_ = 24 296 | I1230 15:22:17.622252 8870 serialized_row_reader.hpp:64] mem_size_ = 24 297 | I1230 15:22:17.622257 8870 serialized_row_reader.hpp:64] mem_size_ = 24 298 | I1230 15:22:17.622262 8870 serialized_row_reader.hpp:64] mem_size_ = 24 299 | I1230 15:22:17.622267 8870 serialized_row_reader.hpp:64] mem_size_ = 24 300 | I1230 15:22:17.622270 8870 server_version_mgr.cpp:51] Server id = 1 original server version = 6Set server version = 13 301 | I1230 15:22:17.622306 8870 server_version_mgr.cpp:61] IsUniqueMin!! 
server id = 1 version = 13 302 | I1230 15:22:17.622311 8870 server_version_mgr.cpp:92] New min_version_ = 13 303 | I1230 15:22:17.622314 8870 ssp_push_row_request_oplog_mgr.cpp:129] server id = 1 version to remove = 13 304 | I1230 15:22:17.622330 8870 serialized_row_reader.hpp:64] mem_size_ = 292 305 | I1230 15:22:17.622346 8870 serialized_row_reader.hpp:64] mem_size_ = 24 306 | I1230 15:22:17.622351 8870 serialized_row_reader.hpp:64] mem_size_ = 24 307 | I1230 15:22:17.622355 8870 serialized_row_reader.hpp:64] mem_size_ = 24 308 | I1230 15:22:17.622360 8870 serialized_row_reader.hpp:64] mem_size_ = 24 309 | I1230 15:22:17.622364 8870 serialized_row_reader.hpp:64] mem_size_ = 24 310 | I1230 15:22:17.622370 8870 serialized_row_reader.hpp:64] mem_size_ = 24 311 | I1230 15:22:17.628597 8870 bg_workers.cpp:970] get ServerShutDownAck from server 0 312 | I1230 15:22:17.628619 8870 bg_workers.cpp:970] get ServerShutDownAck from server 1 313 | I1230 15:22:17.663645 8867 server.cpp:236] Serializing table 2 314 | I1230 15:22:17.663669 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 315 | I1230 15:22:17.663674 8867 server.cpp:236] Serializing table 1 316 | I1230 15:22:17.663676 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 317 | I1230 15:22:17.663679 8867 server.cpp:236] Serializing table 0 318 | I1230 15:22:17.663682 8867 server_table.hpp:83] tmp_row_buff_size_ = 512 319 | I1230 15:22:17.663833 8870 serialized_row_reader.hpp:64] mem_size_ = 24 320 | I1230 15:22:17.663854 8867 server_threads.cpp:419] get ClientShutDown from bg 100 321 | I1230 15:22:17.663864 8870 bg_workers.cpp:970] get ServerShutDownAck from server 1000 322 | I1230 15:22:17.663868 8870 bg_workers.cpp:973] Bg worker 1100 shutting down 323 | I1230 15:22:17.664058 8867 server_threads.cpp:422] Server shutdown 324 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Matrix-Factorization-Analysis.md: -------------------------------------------------------------------------------- 1 | # Matrix Factorization分析 2 | 3 | ## 1. 初始化 4 | ### Configure Petuum PS 5 | ```c++ 6 | // Configure PS row types 7 | petuum::PSTableGroup::RegisterRow >(0); // Register dense 8 | ``` 9 | 注册Row类型,实际动作是将`Class DenseRow`放到了一个`map creator_map_`里面,map的key就是`RegisterRow(id)`中的id,这里是0。 10 | 11 | ### Start PS 12 | 13 | ```c++ 14 | // Start PS 15 | // IMPORTANT: This command starts up the name node service on client 0. 16 | // We therefore do it ASAP, before other lengthy actions like 17 | // loading data. 18 | petuum::PSTableGroup::Init(table_group_config, false); // Initializing thread does not need table access 19 | ``` 20 | 实际动作是new出来一个TableGroup。 21 | 22 | Server is different from NameNode. NameNode is not considered as server. 
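To make the `RegisterRow` step above concrete, here is a minimal, self-contained sketch of the pattern it describes: a `creator_map_` keyed by the integer row-type ID, where each entry is a factory that can later create rows of the registered type. This is an illustration only, not Petuum's actual implementation; `RowRegistry`, `AbstractRow`, `CreateRow` and the simplified `DenseRow` are hypothetical stand-ins, and the `float` element type is just an example.

```c++
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <vector>

// Minimal stand-in for an abstract row interface (illustrative only).
class AbstractRow {
 public:
  virtual ~AbstractRow() {}
  virtual void Init(size_t capacity) = 0;
};

// Simplified dense row: a contiguous vector of parameters of element type V.
template <typename V>
class DenseRow : public AbstractRow {
 public:
  void Init(size_t capacity) override { values_.assign(capacity, V()); }
 private:
  std::vector<V> values_;
};

// Registry keyed by the integer row-type ID passed to RegisterRow<T>(id).
// Each entry is a factory that can later create rows of the registered type.
class RowRegistry {
 public:
  template <typename RowType>
  void RegisterRow(int32_t row_type_id) {
    creator_map_[row_type_id] = []() {
      return std::unique_ptr<AbstractRow>(new RowType());
    };
  }

  std::unique_ptr<AbstractRow> CreateRow(int32_t row_type_id) const {
    return creator_map_.at(row_type_id)();
  }

 private:
  std::map<int32_t, std::function<std::unique_ptr<AbstractRow>()>> creator_map_;
};

int main() {
  RowRegistry registry;
  // Mirrors the idea of RegisterRow: the row class is stored in creator_map_
  // under key 0, so table creation can later instantiate rows by this ID.
  registry.RegisterRow<DenseRow<float>>(0);
  auto row = registry.CreateRow(0);
  row->Init(100);  // e.g. row_capacity = 100 columns
  return 0;
}
```

With this pattern, creating a table later only needs the row-type ID recorded in its config to instantiate rows of the right concrete type.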
23 | 24 | Thread id范围 25 | - 0~100: Server和NameNode thread使用 26 | - 200~1000: app thread使用 27 | 28 | Server thread需要设置consistency_model,具体如下: 29 | 30 | ```c++ 31 | //ServerThreadMainFunc ServerThreadMain; 32 | ConsistencyModel consistency_model = GlobalContext::get_consistency_model(); 33 | switch(consistency_model) { 34 | case SSP: 35 | ServerPushRow = SSPServerPushRow; 36 | RowSubscribe = SSPRowSubscribe; 37 | break; 38 | case SSPPush: 39 | ServerPushRow = SSPPushServerPushRow; 40 | RowSubscribe = SSPPushRowSubscribe; 41 | VLOG(0) << "RowSubscribe = SSPPushRowSubscribe"; 42 | break; 43 | default: 44 | LOG(FATAL) << "Unrecognized consistency model " << consistency_model; 45 | } 46 | ``` 47 | background tread is used for storing opLog,同样backgroud thread也需要设置consistency_model如下: 48 | ```c++ 49 | BgThreadMainFunc BgThreadMain; 50 | ConsistencyModel consistency_model = GlobalContext::get_consistency_model(); 51 | switch(consistency_model) { 52 | case SSP: 53 | { 54 | BgThreadMain = SSPBgThreadMain; 55 | MyCreateClientRow = CreateSSPClientRow; 56 | GetRowOpLog = SSPGetRowOpLog; 57 | } 58 | break; 59 | case SSPPush: 60 | { 61 | BgThreadMain = SSPBgThreadMain; 62 | MyCreateClientRow = CreateClientRow; 63 | system_clock_ = 0; 64 | GetRowOpLog = SSPGetRowOpLog; 65 | } 66 | break; 67 | default: 68 | LOG(FATAL) << "Unrecognized consistency model " << consistency_model; 69 | } 70 | ``` 71 | 72 | bg_workers也会添加vector clock. 73 | 74 | init thread也会添加vector clock 75 | 76 | TableGroupConfig里面还有一个aggressive_clock属性: 77 | ```c++ 78 | // If set to true, oplog send is triggered on every Clock() call. 79 | // If set to false, oplog is only sent if the process clock (representing all 80 | // app threads) has advanced. 81 | // Aggressive clock may reduce memory footprint and improve the per-clock 82 | // convergence rate in the cost of performance. 83 | // Default is false (suggested). 84 | bool aggressive_clock; 85 | ``` 86 | 如果是true的,每一个commit(也就是clock())都要send oplog。 87 | 88 | ## Server thread执行逻辑 89 | 90 | ConnectToNameNode() 91 | 92 | Server thread可以connect所有client里面的bg threads。Server thread的功能: 93 | - 接收到kCreateTable消息后,会HandleCreateTable() 94 | - 接收到kRowRequest消息后,会HandleRowRequest() 95 | - 接收到kClientSendOpLog消息后,会HandleOpLogMsg() 96 | - 接收到kClientShutDown消息后,会HandleShutDownMsg() 97 | 98 | 99 | ## StandardMatrixLoader 100 | 101 | `num_workers_`是整个集群中的worker thread个数。每一个worker thread有一个访问Matrix的index,这个index被存在`worker_next_el_pos_`中。 102 | 103 | Client的main thread会利用StandardMatrixLoader将整个Matrix load到内存,然后让每个worker thread顺序访问。 104 | 105 | ## matrixfact.CreateTable() 106 | CreateTable() 先设置table的`max_table_staleness`属性,然后调用`Bgworkers::CreateTable(table_id, table_config)`,该函数会将要create的Table信息通过`SendInProc(id_st_=100, msg, msg_size)`发送给Bg thread。 107 | 108 | Bg thread initialization logic: 109 | - Establish connections with all server threads (app threads cannot send message to bg threads until this is done); 110 | - Wait on a "Start" message from each server thread; 111 | - Receive connections from all app threads. Server message (currently none for pull model) may come in at the same time. 
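The four message handlers listed under「Server thread执行逻辑」above boil down to a receive-and-dispatch loop. Below is a self-contained schematic of that loop; it is not Petuum's real code: the `Msg` struct, the enum constants and the `std::queue` standing in for the CommBus are simplified placeholders. Only the handler names and the shutdown behaviour (exit once every client's head bg thread has sent kClientShutDown, which is what produces the "get ClientShutDown from bg ..." and "Server shutdown" lines in the logs above) follow the description.

```c++
#include <cstdio>
#include <queue>

// Simplified message-type tags; the real Petuum code uses its own message enums.
enum MsgType { kCreateTable, kRowRequest, kClientSendOpLog, kClientShutDown };

struct Msg {
  MsgType type;
  int sender_id;  // id of the bg thread that sent the message
};

// Placeholder handlers standing in for HandleCreateTable()/HandleRowRequest()/
// HandleOpLogMsg() described above; they only print what they would do.
void HandleCreateTable(const Msg& m) { std::printf("HandleCreateTable from %d\n", m.sender_id); }
void HandleRowRequest(const Msg& m)  { std::printf("HandleRowRequest from %d\n", m.sender_id); }
void HandleOpLogMsg(const Msg& m)    { std::printf("HandleOpLogMsg from %d\n", m.sender_id); }

// Schematic server-thread event loop: receive a message, dispatch on its type,
// and exit after a kClientShutDown from every client (the inlined case below
// corresponds to HandleShutDownMsg() in the list above).
void ServerThreadMain(std::queue<Msg>& comm_bus, int num_clients) {
  int num_shutdown_bgs = 0;
  while (!comm_bus.empty()) {
    Msg msg = comm_bus.front();
    comm_bus.pop();
    switch (msg.type) {
      case kCreateTable:     HandleCreateTable(msg); break;
      case kRowRequest:      HandleRowRequest(msg);  break;
      case kClientSendOpLog: HandleOpLogMsg(msg);    break;
      case kClientShutDown:
        std::printf("get ClientShutDown from bg %d\n", msg.sender_id);
        if (++num_shutdown_bgs == num_clients) {
          std::printf("Server shutdown\n");
          return;
        }
        break;
    }
  }
}

int main() {
  // A canned message sequence standing in for real CommBus traffic.
  std::queue<Msg> comm_bus;
  comm_bus.push({kCreateTable, 100});
  comm_bus.push({kRowRequest, 1100});
  comm_bus.push({kClientSendOpLog, 100});
  comm_bus.push({kClientShutDown, 1100});
  comm_bus.push({kClientShutDown, 100});
  ServerThreadMain(comm_bus, /*num_clients=*/2);
  return 0;
}
```

The bg thread handles its own messages (e.g. kBgCreateTable, which triggers the HandleCreateTables() path described next) in the same receive-and-dispatch style.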
112 | 113 | 在初始化`petuum::PSTableGroup::Init(table_group_config, false);`里面就调用Bg thread的`SSPBgThreadMain()`的方法,然后调用`BgWorkers::HandleCreateTables()`方法。由于在Init()的时候,还没有createTable的需求,因此`BgWorkers::HandleCreateTables()`会快速返回。当main()中调用createTable时,比如,` petuum::PSTableGroup::CreateTable(0,table_config); 114 | ` 会向bg thread发送createTable的消息(类型是kBgCreateTable),然后标号是100的bg thread会调用HandleCreateTable(),bg thread的HandleCreateTable()会向NamNode发送创建Table的信息,收到NameNode反馈的信息后,会使用下面的语句来真正地创建表,也就是说Table存在于bg thread中: 115 | 116 | ```c++ 117 | client_table = new ClientTable(table_id, client_table_config); 118 | ``` 119 | 120 | 在创建一个Table时,会同时创建其Consistency model,目前只有两种: 121 | - SSP:对应创建 SSPConsistencyController 122 | - SSPPush:对应创建 SSPPushConsistencyController 123 | 124 | 在MF中,L,R和Loss table的Consistency model都是SSPPush。 125 | 126 | ConsistencyController负责控制对Table的访问,提供了GetAsync(row_id),Get(row_id, row_accessor), ThreadGet(row_id, row_accessor)等方法。其中最重要的方法是Get(row_id),该方法会check freshness,如果row_id不存在或者stale is too old。 127 | 128 | bg thread创建完表以后,会将创建完的信息发送给main() thread。 -------------------------------------------------------------------------------- /BigDataSystems/Petuum/MatrixFactorization.md: -------------------------------------------------------------------------------- 1 | # Analysis of Matrtix Factorization 2 | 3 | ## 算法 4 | 5 | 矩阵LR分解是将一个N * M 的矩阵分解为N * K的L矩阵和K * M的R矩阵。 6 | ![](figures/matrixfact.png) 7 | 8 | 根据`rank(AB) ≤ min(rank(A), rank(B))`可知,如果Matrix的秩大于K,那么LR分解后的矩阵乘积会丢失Matrix的一些信息,类似PCA和SVD。 9 | 10 | ## 算法并行化 11 | ![](figures/parallel-matrixfact.png) 12 | 13 | ## 在PS上实现矩阵分解算法 14 | 15 | ![](figures/matrixfact-petuum.png) -------------------------------------------------------------------------------- /BigDataSystems/Petuum/PetuumArchitecture.md: -------------------------------------------------------------------------------- 1 | # Petuum原理 2 | 3 | ## LazyTable基本架构 4 | ![](figures/Architecture.png) 5 | 6 | 1. 存放parameters的table的rows分布在多个tablet servers上。 7 | 2. 执行一个App script后,PS会在每个Client都会运行一个App program (e.g., matrixfact.main()),每个App program可以生成多个app threads。App thread相当于MapReduce/Spark中的task。 8 | 3. App thread通过client library来访问相应的table servers获取所需的table中的rows。 9 | 4. Client library维护了一个多级cache和operation logs来减少与table server的交互。 10 | 11 | ## LazyTable数据模型与访问API 12 | 13 | ### Data model: Table[row(columns)] 14 | 15 | 由于ML中算法基本使用vector或者matrix,所以可以用Table来存储参数。 16 | 17 | 与二维表类似,一个Table(比如matrixfact中的`L_Table`)包含多个row,row一般是`denseRow`或者`sparseRow`,一个row包含多个column。具体的parameter存在table中的cell中。具体实现时,Table可以用`hashmap`来实现。 18 | 19 | 由于Table中的paramters会被多个threads更新,所以row支持一些聚合操作,比如plus, multiply, union。 20 | 21 | ### LazyTable操作 22 | 23 | 因为要对Table进行读写更新操作,因此Table需要支持一些操作,LazyTable的操作接口借鉴了Piccolo的接口: 24 | 25 | 1. read(tableid, rowid, slack) 26 | 27 | 读取row,如果local cache中存在该row且其slack满足staleness bound(也就是local cache中的参数足够新),那么从local cache读取该row,否则暂停读取线程(the calling thread waits)。这个API也是唯一可以block calling thread的API。 28 | 29 | 2. update(tableid, rowid, delta) 30 | 31 | 更新table中row的参数,newParameter = oldParameter + delta,这三个都是vector。 32 | 33 | 3. refresh(tableid, rowid, slack) 34 | 35 | 如果process cache(被多个app thread共享)中的table中的row已经old了,就更新之。 36 | 37 | 4. clock() 38 | 39 | 调用后表示calling thread已经进入到下一个周期,因为SSP不存在固定的barrier,所以这个看似会synchronization的API并不会block calling thread。 40 | 41 | ### Data freshness and consistency guarantees 42 | 43 | 1. 数据新鲜度保证: 44 | 45 | 每个row有一个data age field(也就是clock)用于表示该row的数据新鲜度。假设一个row的当前data age是t,那么表示该row里面的参数 contains all updates from all app threads for1, 2, ..., t. 
46 | 47 | 对于SSP来说,当calling thread在clock t的周期内发送`read(tableid, rowid, slack)`的请求时,如果相应row的`data age >= t-1-slack`,那么该row可以返回。 48 | 49 | 2. read-my-updates: 50 | 51 | ready-my-updates ensures that the data read by a thread contains all its own updates. 52 | 53 | ## LazyTable系统模块之Tablet Servers 54 | 55 | ### Tablet Servers基本功能 56 | 57 | 一个逻辑上的Table可以分布存放在不同的tablet server上,比如`L_Talbe`中的 i-th row 可以存在`tablet_server_id = i % total_num_of_servers`上。每个tablet server都将rows存放在内存中。 58 | 59 | 每个tablet server使用一个vector clock(也就是`vector`)来keep track of rows的新鲜度。vector中第i个分量表示第i个row的clock,vector中最小的clock被定义为`global_clock_value`,比如`global_clock_value = t` 表示所有的app threads都已经完成了clock t周期的计算及参数更新。问题:每个tablet server只存储table中的一部分rows,一部分rows达到了clock t就能说所有的app threads都完成了clock t周期的计算? 60 | 61 | ### Table updates 62 | 63 | 由于tablet server会不断收到来自多个app thread的update请求,tablet server会先将update请求做一个本地cache(将update请求放到pending updates list中)。当且仅当收到client发送clock()请求时,tablet server才会集中处理将这些updates。这样可以保证row的新鲜度由vector clock唯一决定。 64 | 65 | ### Table read 66 | 67 | 当tablet server收到client端发来的read请求,会先查看`global_clock_value` (为什么不是该row的data age?),如果tablet server中的row新鲜度满足requested data age要求(`global_clock_value >= t-1-slack`),那么直接返回row给client。否则,将read request放到pending read list里面,并按照requested data age排序(从大到小?)。当`global_clock_value`递增到requested data age时,tablet server再将相应的row返回给client。除了返回row,tablet server还返回data age和requester clock。前者是`global_clock_value`,后者是client's clock(说明了which updates from this client have been applied to the row data,client可以利用这个信息来清除一些本地的oplogs)。 68 | 69 | ## LazyTable系统模块之Client library 70 | 71 | Client library与app threads在同一个process,用于将LazyTable API的调用转成message发送到tablet server。Client library包含多层caches和operation logs。Client library会创建一个或多个background threads (简称为bg threds)来完成propagating updates和receiving rows的工作。 72 | 73 | Client library由两层 caches/oplogs 组成:process cache/oplog和thread cache/oplog。Process cache/oplog被同在一个进程中的所有app thread和bg thread共享。Each thread cache/oplog is exclusively associated with one app thread.(实现好像不是这样的)。Thread cache的引入可以避免在process cache端有过多的锁同步,但是只能cache一些rows。 74 | 75 | Client library也使用vector clock来track app thread的clock,第i个分量代表第i个app thread已经进入的clock周期。 76 | 77 | ### Client updates 78 | App thread调用update(deltas)后,会先去访问对应的thread cache/oplog,如果cache中相应的row存在,那么`thread.cache.row += update.deltas`,同时会update写入到oplog中。不存在就直接存起来。当app thread调用clock(),那么在thread oplog中的updates都会被push到process oplog中,同时`process.cache.row += updates.deltas`。如果thread cache/oplog不存在,update会直接被push到process cache/oplog。 79 | 80 | 当一个client process中所有app threads都完成clock为 t 的计算周期,client library会使用一个bg thread(是head bg thread么?)向table server发送一个消息,这个消息包含clock t,process oplogs中clock为 t 的updates。这些process cache/oplogs中的updates会在发送该消息后一直保留,直到收到server返回的更新后的rows。 81 | 82 | ### Client read 83 | 84 | 在clock t周期内,如果一个app thread想要去读row r with a slack of s,那么client library会将这个请求翻译成`read row r with data age >= t-s-1`。接着,client library会先去thread cache中找对应的且满足条件的row,如果不存在就去process cache中找,如果还找不到就向tablet server发送要read row r的请求,同时block calling thread,直到server返回row r。在process cache中每个row有一个tag来表示是否有row request正在被处理,这样可以同步其它的request统一row的请求。 85 | 86 | 当tablet server返回row r时,client library端有一个bg thread会接受到row r,同时接受requester clock rc。rc表示该client提交的clock t的updates已经被处理。之后,process oplog就可以清除`clock <= rc` 的update日志。为了保证 read-my-updates,接收到row r 后,会将process oplog中`clock > rc`的操作作用到row r上,这样就可以得到本地最新的row r。最后,前面接受row r的bg thread会跟心row r的clock并将其返回到waiting app threads。 87 | 88 | 89 | ## Prefetching and fault-tolerance 90 | ### 数据预取 91 
| 92 | LazyTable提供了预取API refresh(),函数参数与read()一样,但与read()不一样的地方是refresh()不会block calling thread。 93 | 94 | LazyTable支持两种预取机制:conservative prefetching和aggressive prefetching。前者只在必要的时候进行refresh,如果`cache_age < t-s-1`,prefetcher才会发送一个`request(row = r, age >= t-s-1)`。对于Aggressive prefetching,如果当前的row不是最新的会主动去更新。 95 | 96 | 97 | ## Differences with Spark 98 | 99 | 1. Spark的通信模式比较简单,最复杂的是shuffle模块,需要redcuer去mapper端fetch数据。 100 | 2. 在PS中,client与server有频繁的交互通信。 101 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum基本架构.md: -------------------------------------------------------------------------------- 1 | # Petuum 基本架构 2 | 3 | 4 | ## Parameter Server (PS) 概念 5 | 6 | PS is a key-value store that allows different processes to share access to a set of variables. 7 | 8 | 对于Distirbuted ML来说,process指的是learning process,variables指的是parameters。PS的特点是 9 | 10 | 1. Data partition 11 | 12 | 每个节点存放一部分data 13 | 2. Shared model 14 | 15 | 多个learning process共享model(模型里面包含参数) 16 | 17 | ## 基本系统架构 18 | 1. 一个Server 19 | 20 | maintains the master copy of the **parameters** and propagates the workers’ **writes (updates)** to other workers. 21 | 2. 多个Worker 22 | 23 | 每个worker通过client library去server那里获取parameters。Client library还会cache之前从server那里获取到的parameters,这样worker就不必每次都去Server那里获取最新的parameters。这个cache成为process storage。每次对parameter所做的write(update)操作都会被insert到一个update table(代码里对应**Oplog**)。 24 | 25 | ## 数据模型 26 | 27 | 在PS中,parameters被表示成key-value paris,并存放在多个Table中。每一个table包含多个Rows,每个Row的类型都相同,且有一个RowID。Row中的每一个cell都包含一个Column ID,每个cell一般存放一个parameter。这样存放到Parameter Server中的每个parameter可以表示成。Table存放在一台或者多台机器上。 28 | 29 | Table-Row既是数据模型也是存储格式。PS允许app选择适合自己的数据结构来组织每个row中的parameters,甚至允许app自定义Rows。 30 | 31 | 每一个table都有自己的update table,update table也有自己的rows,不过是用来存放log的,这里称之为row oplog。 32 | 33 | ## 创建一个PS的app 34 | 35 | 创建一个简单的app,该app包含一个single-threaded的client和一个Table。 36 | 37 | ### 1. 引入头文件 38 | ```c++ 39 | #include 40 | ``` 41 | 42 | 所有app只需包含这个头文件,该文件包含了PS的所有APIs。第一步是去初始化PS的环境,相当于Spark里面的SparkContext,负责初始化的线程被称作init thread。 43 | 44 | 为了简化例子,我们只run一个worker process。如果要run多个worker process的话,所有的worker process要执行同样的初始化流程,并创建多个tables。 45 | 46 | ### 2. 注册row types 47 | Row type可以是多种类型,但需要在PS启动计算前注册。下面的API可以创建一个row ID到row type的映射,其中row ID是32位的integer。之后,app可以在创建table时候使用row ID来获取相应的row type。 48 | 49 | 下面的例子会创建一个row type,类型为vector,这个类型由PS的API提供。Row里面的T就是parameter的类型。更具体地,这里我们注册`petuum::DenseRow`到PS中,并将所有参数初始化为0,如下: 50 | 51 | ```c++ 52 | // register row type petuum::DenseRow with ID 0. petuum::PSTableGroup::RegisterRow >(0); 53 | ``` 54 | 55 | ### 3. 初始化PS环境 56 | 就像在SparkConf中要设置master,port,app name等,在Petuum中,需要设置`host_map`。我们需要将每一个worker process的信息加入到该map中,形成一个entry。每个entry有一个ID(从0开始计数的整数),一个IP地址,还有一个当前未用的port(比如10000)。具体代码如下: 57 | 58 | ```c++ 59 | petuum::TableGroupConfig table_group_config; table_group_config.host_map.insert(std::make_pair(0, HostInfo(0, "127.0.0.1", "10000"))); 60 | petuum::PSTableGroup::Init(table_group_config, false); 61 | ``` 62 | 63 | 将worker process加入到`host_map`中后,就可以使用`petuum::PSTableGroup::Init()`来初始化PS的环境,Init()还包含一个boolean flag,如果设置为true,就表示init thread可以访问table的所有APIs,这些APIs在`petuum:PSTableGroup::GetTableOrDie()`中定义。一般将flag置为false。 64 | 65 | ### 4. 
创建Tables 66 | 67 | 先show代码 68 | 69 | ```c++ 70 | petuum::ClientTableConfig table_config; table_config.table_info.row_type = 0; table_config.table_info.row_capacity = 100; 71 | table_config.process_cache_capacity = 1000; table_config.oplog_capacity = 1000; 72 | // here 0 is the table ID, which will be used later to get table. bool suc = petuum::PSTableGroup::CreateTable(0, table_config); 73 | ``` 74 | 对于一个app来说,上面的配置参数都需要设置。配置参数的具体含义见下表: 75 | 76 | | 名称 | 默认值 | 解释| 77 | |:-----|:------|:-------| 78 | | table\_info.row\_type| N/A | row type| 79 | | table\_info.row\_capacity| 0 | 对于 DenseRow,指column个数,对SparseRow无效| 80 | | process\_cache\_capacity| 0 | row个数| 81 | | process\_oplog\_capacity | 0 | update table里面最多可以写入多少个row| 82 | 83 | 调用`CreateTable()`后,就会去创建tables,创建好后需要调用下面的API来完成table创建过程。 84 | 85 | ```c++ 86 | petuum::PSTableGroup::CreateTableDone(); 87 | ``` 88 | ### 5. 创建并运行Worker threads 89 | 90 | 接下来我们将会创建一个worker thread,该thread可以通过Table接口来访问到parameters。 91 | 92 | 首先定义一个概念,可以访问table APIs的worker thread被称为**table thread**。 93 | 94 | 在成为table thread之前,该worker thread需要通过下面的API来注册自己 95 | 96 | ```c++ 97 | int thread_id = petuum::PSTableGroup::RegisterThread(); 98 | ``` 99 | 然后就可以通过Table ID来得到table实例: 100 | 101 | ```c++ 102 | petuum::Table table = petuum::PSTableGroup::GetTableOrDie(0); 103 | ``` 104 | 可以通过这个`petuum:Table`类型来访问table里面的parameters,之后可以进行计算。 105 | 106 | 当worker thread完成计算之后,需要通过下面的API注销自己 107 | 108 | ```c++ 109 | petuum::PSTableGroup::DeregisterThread(); 110 | ``` 111 | 112 | 如果想让init thread也能访问到table的API,需要将`petuum::PSTableGroup::Init(table_group_config, false);`中的false改为true。init thread不需要注册和注销自己,但它需要通过下面的API等待所有其他thread完成注册。 113 | 114 | ```c++ 115 | petuum::PSTableGroup::WaitThreadRegister(); 116 | ``` 117 | 118 | ### 6. Stop PS 119 | 当所有的worker threads都完成计算退出,我们可以通过下面的API shutdown PS。 120 | 121 | ```c++ 122 | petuum::PSTableGroup::ShutDown(); 123 | ``` 124 | 125 | ## Table API 126 | 127 | ### 1. 访问Table 128 | 在read或者update table之前,需要先get table 129 | ```c++ 130 | // Gain access to table. template petuum::Table petuum::PSTableGroup::GetTableOrDie(int table_id); 131 | ``` 132 | ### 2. Read parameters 133 | 134 | 先new一个`RowAccessor`对象,给定`row_id`后,下面的API会将row信息写入到`row_accessor`指向的`RowAccessor`对象。 135 | 136 | ```c++ 137 | void petuum::Table::Get(int32_t row_id, RowAccessor *row_accessor); 138 | ``` 139 | 140 | ### 3. Update parameters 141 | Petuum提供了两种更新参数的方式: 142 | - 只更新一个parameter 143 | 通过`row_id`和`column_id`定位到parameter,然后更新 144 | 145 | ```c++ 146 | void petuum::Table::Inc(int32_t row_id, int32_t column_id, UPDATE update); 147 | ``` 148 | - 更新一组参数 149 | 通过`row_id`定位到row,然后更新 150 | ```c++ 151 | void petuum::Table::BatchInc(int32_t row_id, const UpdateBatch& update_batch); 152 | ``` 153 | 154 | ### 4. Completion of A Clock Tick 155 | 156 | Inform PS that this thread is advancing to the next iteration, workers only commit their updates at the end of each clock. 157 | 158 | ```c++ 159 | static void petuum::PSTableGroup::Clock(); 160 | ``` 161 | 162 | ## 编译 163 | 164 | ### 1. 编译PS 165 | 进入root文件夹,执行: 166 | ```c++ make third_party_core make ps_lib -j8 167 | ``` 168 | PS library依赖很多第三方库,第一条command就是去编译这些库的。 169 | 170 | ### 2. 
编译app 171 | 在自己的app目录下建立Makefile,并将`defns.mk`里面的内容加入到app的Makefile中。 172 | 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum基础.md: -------------------------------------------------------------------------------- 1 | # Petuum基础 2 | 3 | 4 | Petuum将ML算法应用分为两种类型:Big data(has many data samples)和Big Model(has very large parameter and intermediate variable spaces)。针对这两种应用,Petuum分别设计了两个系统功能模块及一个系统优化模块: 5 | 6 | ## 主要系统模块 7 | 8 | - Distributed parameter server (i.e. key-value storage) 9 | - 用于global的参数同步,主要支持Big data类型算法的并行化,比如矩阵LR分解 10 | - Distributed model scheduler (STRADS) 11 | - 调度worker tasks,主要支持Big Model类型算法的并行化,比如Lasso 12 | - Out-of-core (disk) storage for limited-memory situations 13 | - 针对内存不足的情况,设计的磁盘存储策略 14 | 15 | 前两种可以组合使用,但在目前的例子是分开使用的。 16 | 17 | 更详细的介绍: 18 | 19 | We have develop a prototypic framework for Big ML called Petuum, which comprises several interrelated components, each focused on exploiting various specific properties of *iterative-convergent* behavior in ML. **The components can be used individually, or combined to handle tasks that require their collective capabilities.** Here, we focus on two components: - **Parameter Server for global parameters:** Our parameter server(Petuum-PS) is a distributed key-value store that enables an easy-to-use, distributed-shared-memory model for writing distributed ML programs over **BIG DATA**. Petuum-PS supports novel consistency models such as bounded staleness, which achieve provably good results on iterative-convergent ML algorithms. Petuum-PS additionally offers several “tuning knobs” available to experts but otherwise hidden from regular users such as thread and process-level caching and read-my-write consistency. We also support out-of-core data streaming for datasets that are too large to fit in machine memory. 20 | ![](figures/Petuum-ps-topology.png) 21 | - **Variable Scheduler for local variables:** Our scheduler (STRADS) analyzes the variable structure of ML problems, in order to find parallelization opportunities over **BIG MODEL** while avoiding error due to strong dependencies. STRADS then dispatches parallel variable updates across a distributed cluster, while prioritizing them for maximum objective function progress. Throughout this, STRADS maintains load balance by dispatching new variable updates as soon as worker machines finish existing ones. 22 | 23 | ![](figures/STRADS-architecture.png) 24 | 25 | ## 基本逻辑架构 26 | 27 | ![](figures/petuum-overview.png) 28 | 29 | An update function updates the model parameters and/or latent model states 𝜃 by some function 𝚫𝜃(𝓓) of the data 𝓓. Data parallelism divides the data 𝓓 among different workers, whereas model parallelism divides the parameters (and/or latent states) 𝜃 among different worker. 30 | 31 | 在左图中,数据是分布的但模型参数$\theta$没有分布,每个worker节点持有完整的参数,${worker}\_{i}$要在分块数据${D}\_{i}$上计算参数更新$\Delta\theta({D}\_{i})$(可以想象成梯度)。一般来说,如果参数可以batch update(不需要一个固定的更新顺序),那么计算$\Delta\theta({D}\_{i})$与计算$\Delta\theta({D}\_{j})$过程可以独立,就可以用PS的架构了。 32 | 33 | 在右图中,模型是分布的但数据没有分布,每个worker持有全部的数据,但只持有一部分参数${\theta}\_{i}$,${worker}\_{i}$在整个数据${D}$上计算一部分参数更新$\Delta{\theta}\_{i}(D)$。 34 | 35 | ## 基本算法 36 | - Matrix Factorization 37 | - Stochastic Gradient Descent更新方式 38 | - a data-parallel algorithm 39 | - LASSO regression 40 | - Coordinate Descent更新方式 41 | - a model-parallel algorithm 42 | 43 | ## 共享目录 44 | 45 | **We highly recommend using Petuum in an cluster environment with a shared filesystem** (e.g. 
shared home directories).在实际环境中,这一条很难实现,目前cluster不会有共享的home目录。 Provided all machines are identically configured and have the necessary packages/libraries, you only need to compile Petuum (and any apps you want to use) once, from one machine. The Petuum ML applications are all designed to work in this environment, as long as the input data and configuration files are also available through the shared filesystem. 46 | 47 | ## PS的配置文件 48 | ``` 49 | 0 ip_address_0 10000 50 | 1 ip_address_0 9999 51 | 1000 ip_address_1 9999 52 | 2000 ip_address_2 9999 53 | 3000 ip_address_3 9999 54 | ``` 55 | 56 | Each line in the server configuration file format specifies an ID (0, 1, 1000, 2000, etc.), the IP address of the machine assigned to that ID, and a port number (9999 or 10000). Every machine is assigned to one ID and one port, except for the first machine, which is assigned two IDs and two ports because it has a special role. 整个role就是NameNode。 57 | 58 | If you want to simultaneously run two Petuum apps on the same machines, make sure you give them **separate** Parameter Server configuration files with **different ports**. **The apps cannot share the same ports!** 59 | 60 | ## ML App: Matrix Factorization 61 | 这个例子的运行过程文档已经讲的很清楚,这里再解释几个文档没有细讲的地方: 62 | M = L * R,M是9x9的矩阵,分解后的L是9x3的矩阵,R是3x9的矩阵。 63 | 64 | 1. 当K=3时,MF的输出结果(L矩阵如下): 65 | 66 | ``` 67 | 0.115764 1.03662 0.100797 68 | 0.115764 1.03662 0.100797 69 | 0.115764 1.03662 0.100797 70 | -1.07724 0.107777 0.922327 71 | -1.07724 0.107777 0.922327 72 | -1.07724 0.107777 0.922327 73 | 1.16671 -0.218361 1.17542 74 | 1.16671 -0.218361 1.17542 75 | 1.16671 -0.218361 1.17542 76 | 77 | ``` 78 | 可以发现是每三行几乎一样,原因是$rank(AB) \leq min(rank(A),rank(B))\leq K = 3$。当K=3是9的一个公约数时,分解得到三个线性无关的向量,但当K=4时,不是9的公约数时就没有这个特性了。 79 | 80 | 2. 当K=4时,MF的输出结果(R矩阵如下): 81 | ``` 82 | -0.0919685 -0.668313 0.789098 0.0187957 83 | -0.225036 -0.585597 0.82593 0.123971 84 | 0.140019 -0.814718 0.724556 -0.16359 85 | -0.839692 0.564146 0.475171 -0.915151 86 | -0.728989 0.494618 0.444423 -1.00233 87 | -0.790882 0.533725 0.46164 -0.953696 88 | -1.03243 -1.04973 -0.924 -0.464828 89 | -0.917422 -1.12158 -0.955918 -0.55563 90 | -1.11345 -0.998635 -0.901482 -0.401137 91 | ``` 92 | 93 | 3. App configuration 94 | 需要解释几个配置: 95 | - `client_worker_threads`: how many worker threads to use on each machine 96 | - `--staleness x`: turn on Stale Synchronous Parallel (SSP) consistency at staleness level x; often improves performance when using many machines 97 | - `--lambda x`: sets the L2 regularization strength to x; default is 0 98 | - `--offsetsfile x`: used to provide an "offsets file" for limited-memory situations; 99 | - `--init_step_size x`, --step_size_offset y, --step_size_pow z: used to control the SGD step size. The step size at iteration t is $x * {(y+t)}^{-z}$. Default values are $x=0.5, y=100, z=0.5$. 100 | - `--ps_row_cache_size x`: controls the cache size of each worker machine. By default, the MF app caches the whole L, R matrices for maximum performance, but this means every machine must have enough memory to hold a full copy of L and R. If you are short on memory, set x to the maximum number of L rows and R columns you wish to cache. For example, `--ps_row_cache_size 100` forces every client to only cache 100 rows of L and 100 columns of R. 
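顺带把上面 `--init_step_size`、`--step_size_offset`、`--step_size_pow` 所描述的步长公式写成代码会更直观。下面是一个极简的 C++ 示意(仅用于说明公式,`SgdStepSize` 为自拟函数名,并非 MF app 的真实接口):

```c++
#include <cmath>

// 第 t 轮迭代的 SGD 步长:x * (y + t)^(-z)
// 其中 x = init_step_size, y = step_size_offset, z = step_size_pow
double SgdStepSize(int t, double x = 0.5, double y = 100.0, double z = 0.5) {
  return x * std::pow(y + t, -z);
}
// 默认参数下,t = 0 时步长约为 0.05,并随迭代轮数增加单调减小
```
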
101 | 102 | 103 | 比如要run一个client worker threads为4,staleness为5,lambda为0.1的MF例子: 104 | ``` 105 | scripts/run_matrixfact.sh sampledata/9x9_3blocks 3 100 mf_output scripts/localserver 4 "--staleness 5 --lambda 0.1" 106 | ``` 107 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum本地编译运行.md: -------------------------------------------------------------------------------- 1 | # Petuum本地编译运行 2 | 3 | 4 | ## 编译Pettum 5 | 按照Pettum的安装文档[Installation](https://github.com/petuum/public/wiki/Installation)编译。该编译流程会先下载并编译第三方库(如boost,gflags,leveldb等等),然后会编译petuum自身(见petuum/Makefile)。 6 | 7 | ## 编译apps 8 | 上面的步骤只是编译Petuum计算框架,每个app需要单独编译。比如要编译Matrixfact,只需要进入apps/matrixfact,执行`make`就行了。具体参见[ML App: Matrix Factorization](https://github.com/petuum/public/wiki/ML-App:-Matrix-Factorization)。其他的apps(比如LDA,DNN类似)。 9 | 10 | ## 本地运行app测试 11 | 按照Pettum的[wiki](https://github.com/petuum/public/wiki)运行就好了。 12 | 13 | ## 在Eclipse里面编译petuum 14 | 想要深入理解代码,肯定要导入IDE中debug了。 15 | 16 | 1. 导入project 17 | 18 | 下载Linux版本的Eclipse CDT,将编译好的petuum整个文件夹拷贝到workspace下面,然后删去third_party下面的src文件夹(整个文件夹存放了第三方库的源码,可以不要)。最后,将workspace/petuum import到Eclipse里面,选择`File->New project->C/C++->Makefile Project with Existing Code`,设置Toolchain for Indexer Settings为Linux GCC。 19 | 20 | 2. 编译Petuum 21 | 22 | 导入后,直接`Project->Build project`就可以编译petuum了,以后可以修改petuum的源代码,然后直接build就行了。 23 | 24 | 3. 编译apps 25 | 26 | Eclipse不能自动识别子项目的Makefile,因此`Project->Build project`只能编译petuum自身不能编译apps。解决方法是手动添加apps的编译项。具体方法是在`Properties->C/C++ Build->Manage configurations->New`添加一个编译项,比如name设为matrixfact,确定后设置Build directory为`${workspace_loc:/petuum-0.93/apps/matrixfact}`,Refresh Policy里面的Resources设置为`${workspace_loc:/petuum-0.93/apps/matrixfact}`,最后设置matrixfact为active后,就可以编译matrixfact子项目了。 27 | 28 | ## 在Eclipse里面运行petuum 29 | 30 | Petuum使用脚本运行方式来执行app,比如需要在某个node上执行 31 | 32 | ``` 33 | scripts/run_matrixfact.sh sampledata/9x9_3blocks 3 100 mf_output scripts/localserver 4 "--staleness 5 --lambda 0.1" 34 | ``` 35 | 来运行matrixfact,那么如何在Eclipse里达到同样的效果? 36 | 37 | 答案是在`Run->External Tools->External Tools Configurations`里面的Program里面添加一个运行项,比如命名为run_matrixfact。然后设置Location为`${workspace_loc:/petuum-0.93/apps/matrixfact/scripts/run_matrixfact.sh}`,Workding Directory为`${workspace_loc:/petuum-0.93/apps/matrixfact}`, Arugments为`sampledata/9x9_3blocks 3 100 mf_output scripts/localserver 2`。然后run就相当于在terminal里面执行该脚本了。但遗憾的是没有debug as external tools。想要debug,目前可以用下面的方法解决。 38 | 39 | ## 在Eclipse里面debug app 40 | 41 | 首先要new一个Debug configuration,名字可以是app的名字,比如matrixfact。C/C++ Application是app的路径,比如`apps/matrixfact/bin/matrixfact`。Build Configuration选择之前添加的`matrixfact`。Program arguments填入 42 | 43 | ``` 44 | --hostfile scripts/localserver --datafile sampledata/9x9_3blocks --output_prefix mf_output --K 3 --num_iterations 100 --num_worker_threads 2 --num_clients 1 --client_id 0 45 | ``` 46 | 47 | Working directionary输入`${workspace_loc:petuum-0.93/apps/matrixfact}`。最后设置断点,开始debug。 48 | 49 | > 需要注意的是以这种方式进行debug仅仅是在debug众多client中的一个。 50 | 51 | 52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/Petuum系统及Table配置.md: -------------------------------------------------------------------------------- 1 | # Petuum系统及Table配置 2 | 3 | ## 系统配置 4 | 5 | 系统相关的配置都会被存放到`petuum::TableGroupConfig`对象中。 6 | 7 | ### 1. 
建立一个distributed app 8 | 9 | 每一个worker process需要在`host_map`中注册(以一个entry方式存在)。为了简化配置,用户可以定义**optional** server file,交给PS处理。下面的API会用`server_file`中的内容初始化`host_map`。 10 | 11 | ```c++ 12 | void petuum::GetHostInfos(std::string server_file, std::map *host_map); 13 | ``` 14 | 15 | 一个Sever file的例子,三列分别是`processID, IP addresss, port`: 16 | ```c 0 192.168.1.1 10000 1 192.168.1.2 10000 17 | ``` 18 | 另外,需要告诉PS总共要run多少个process以及每个process的ID,如下表 19 | 20 | | 名称 | 默认值 | 解释 | 21 | |:-------|:----------|:-------| 22 | |num\_total\_clients| 1 | Number of processes to run| 23 | | client\_id | 0 | This process's ID | 24 | 25 | ### 2. 让每个node上run多个worker threads 26 | 27 | 设置app threads的个数,包含init thread。 28 | 29 | | 名称 | 默认值 | 解释 | 30 | |:-------|:----------|:-------| 31 | | num\_total\_app\_threads | 2 | Number of local application threads, including the init threads | 32 | 33 | ### 3. 建立多个Table 34 | 35 | 基于以下原因用户可能由建立多个Tables的需求: 36 | - 有多个row types,不同row type对应的row个数(row capacity)也不一样 37 | - 不同的 staleness constraints 38 | - 一个table里row个数太多,放不下 39 | 40 | 要设置多个table,需要更改下面的默认值: 41 | 42 | | 名称 | 默认值 | 解释 | 43 | |:-------|:----------|:-------| 44 | | num\_tables | 1 | 系统包含的table个数| 45 | 46 | ### 4. Taking SnapShots and Resume From SnapShots 47 | 48 | SnapShot可以暂存中间计算结果,对于迭代型的ML算法来说,既可以暂停程序,也可以进行错误恢复,类似与Spark的checkpoint。相关参数见下表: 49 | 50 | | 名称 | 默认值 | 解释 | 51 | |:-------|:----------|:-------| 52 | | snapshot\_clock | -1 | take snapshots every `x` iterations | 53 | | snapshot\_dir | "" | 存放snapshot的目录 | 54 | | resume_clock | -1 | if specified, resume from iteration `x` | 55 | | resume\_dir | ""|从存放snapshot的目录中恢复 | 56 | 57 | ### 5. Runtime Statistics 58 | 59 | PS可以记录runtime statistics,但需要在defns.mk中注释掉下面这一行 60 | 61 | ```c++ 62 | PETUUM_CXXFLAGS += -DPETUUM_STATS 63 | ``` 64 | ## Table配置 65 | 66 | ### 1. 选择Client cahce types 67 | 68 | Client端的cache type由两种: 69 | 70 | - BoundedDense 71 | 72 | BoundedDense是一个连续的memory chunk,适用于模型能够全部装载到client的memory里面的情况。如果`C`代表cache capacity ,那么此时可以访问到row IDs就是\[0, C-1\]。 73 | - BoundedSparse 74 | 75 | BoundedSparse支持换出操作,因此适合于memory不够的情况。 76 | 77 | 78 | ### 2. Staleness threshold 79 | 80 | Petuum里面最重要的概念就是staleness,也就是允许worker thread最多读取多少轮前的parameters。默认是0,通过下表设置 81 | 82 | | 名称 | 默认值 | 解释 | 83 | |:-------|:----------|:-------| 84 | | table\_info.table\_staleness | 0 | SSP staleness threshold | 85 | 86 | ### 3. Row capacity 87 | 88 | | 名称 | 默认值 | 解释 | 89 | |:-------|:----------|:-------| 90 | | table\_info.row\_capacity | 0 | Row capacity | 91 | 92 | 一些(比如dense)row types需要设置这个参数。1代表sparse,0代表dense。如果updates是dense的,最好设置成dense。 -------------------------------------------------------------------------------- /BigDataSystems/Petuum/STRADS.md: -------------------------------------------------------------------------------- 1 | # STRADS 2 | 3 | 4 | STRADS意思是STRucture-Aware Dynamic Scheduler,一个动态调度框架,用于计算Big Model类型的app。STRADS调度的是模型参数,而不是data,调度参数会让参数更新和收敛的速度更快,但要求在参数间没有依赖。目前包含两个app:Lasso和Logistic Regression。 5 | 6 | ## STRADS的四个组件 7 | 四个组件组成了scatter/gather的拓扑结构。A Coordinator, multiple workers and an aggregator make a scatter/gather style topology. 8 | 9 | ### 1. Scheduler 10 | 11 | Scheduler maintain weight information of model parameter. In each iteration, scheduler selects a set of promising model parameters to dispatch based on the weight information so that updating the scheduled parameters is likely to increase convergence speed than updating randomly selected parameter as common in stochastic method. 
Scheduler update weight information on receiving weight change from the coordinator when the dispatched is completed in the coordinator side. In addition to weight based sampling, STRADS scheduler runs user defined model dependency checking routine for a give set of model parameters. If any pair of parameters has too strong interference, one of them will be removed from the set. 12 | 13 | Scheduler维护参数间的权重信息$w$。在每次迭代时,scheduler先按照权重选择一部分promising的参数集合$S$分发给worker,这样的选择方式比random的选择方式能更快地收敛。当参数集合$S$计算并更新完毕后(这个过程由coordinator负责),coordinator会将$w$的更新信息发给scheduler。另外,scheduler可以运行user defined model 依赖检测程序来检测是否参数间有强依赖关系,如果有,那么去掉一个参数。 14 | 15 | 16 | ### 2. 一个Coordinator 17 | 18 | Coordinator is in charge of keeping model parameters, scattering a dispatch of parameter over the worker machines, sending back weight change information to the scheduler. In i-th iteration, the coordinator receive a dispatch set from the scheduler and scatter the dispatch together to all over the worker machines. On receiving updated model parameter values from the aggregator, it will udpate model parameters and send weight change information to the scheduler. 19 | 20 | Coordinator负责持有参数,分发参数给worker,更新参数,并将更新后的$w$信息发给scheduler。当迭代到第i轮时,coordinator会先收到scheduler发来的一个参数集合(dispatch set),然后将参数集合发到所有的worker节点。Worker计算参数更新$\Delta\theta$后,将$\Delta\theta$发给aggregator,aggregator负责更新参数,并将更新结果发给Coordinator,Coordinator存储新的$\theta$后将权重$w$更新信息发给scheduler。 21 | 22 | ### 3. 多个Worker 23 | 24 | On receiving a dispatch, worker executes user function to make a partial result with a partition of input data that is assigned to the worker machine. Each worker sends back its partial results to the aggregator. 25 | 26 | 当收到参数集合$S$后,worker在data partition上执行user function,并将计算结果$\Delta\theta$发给aggregator。 27 | 28 | ### 4. Aggregator 29 | 30 | On collecting all partial results of one dispatch, aggregator runs user defined aggregation function to get new value of model parameters. New model parameter values are sent to the coordinator to be kept. 31 | 32 | 当收到所有的所有worker发来的partial results后,aggregator运行user defined aggregation function来计算出新的参数$theta$,然后将新的参数发送给coordinator存放。 33 | 34 | 35 | ## STRADS提供的其他low level primitives 36 | 37 | - Scheduling 38 | - Global Barrier 39 | - Data Partitioning 40 | - Message abstraction 41 | 42 | ## STRADS编程接口 43 | 44 | 整个编程范型是 scatter/gather。 45 | 46 | Basically, STRADS allows users to define functions to run on scheduler, coordinator, workers and aggregator vertexes. In addition to the functions, use can define message types as C++ template for communicating across different vertexes. STRADS programming interfaces are implemented in the form of two classes. 47 | 48 | STRADS允许在scheduler,coordinator,workers和aggregator vertexes上自定义函数。除了自定义函数外,用户也可以定义不同vertex之间消息传递的message type。具体地, STRADS的编程接口由下面两个类实现: 49 | 50 | ### Handler Class 51 | 52 | Handler class is a template class where user can define user functions as class method here. T1 ~ T4 are template of user defined messages and used as parameter and return type of class methods(user functions). STRADS requires 4 major user functions for scheduling/updating parameters and three minor function for checking progress such as calculating objective function value. 
53 | 54 | 用于调度和参数更新的4个Major User Functions,T1 ~ T4 是用户定义的消息类型 55 | ```c++ 56 | T1 &dispatch_scheduling(SYSMSG, T3) 57 | void do_work(T1, T2) 58 | void do_msgcombiner(T2, stradsctx) 59 | void do_aggregator(T3, stradsctx) 60 | int check_dependency(list parameters) 61 | set_initi_priority(list weight, model_cnt) 62 | ``` 63 | Minor User Functions for progress checking,比如要查看目标函数的value。 64 | ```c++ 65 | void do_obj_calc(T4, stradsctx) 66 | void do_msgcombiner_obj(T4, stradsctx) 67 | void do_object_aggregation(T4, stradsctx) 68 | ``` 69 | ### Message Class 70 | 71 | Message class is a template class that allows user to define a type of messages that contains arbitrary number of elements. Logically, user can define any type for the element. STRADS provides several template classes that can make a message with one, two or three different kinds of element types. Again, you can put arbitrary number of elements with different types on a message. If you define your message type only with POD type, you can simply finish message class definition with defining elements type. 72 | 73 | element class 74 | message class 75 | 76 | 消息可以包含任意多个elements,每个element可以是任意类型。 77 | 78 | 79 | 80 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/ServerThreads.md: -------------------------------------------------------------------------------- 1 | # ServerThreads 2 | 3 | ## 基本结构 4 | 1. 每个client上的app进程持有一个ServerThreads object,这个object管理该client上的所有server threads。这些server threads的启动过程:`app.main() => PSTableGroup::Init() => ServerThreads::Init() => ServerThreadMain(threadId) for each server thread`。 5 | 2. 每个server thread实际上是一个Server object。ServerThreads对象通过`vector threads`和`vector threads_ids`来引用server threads,通过其ServerContex指针用来访问每个server thread对应的Server object(`server_context_ptr->server_obj`)。 6 | 3. 对于每一个server thread,都持有一个ServerContext,其初始化时`server_context.bg_threads_ids`存储PS中所有bg threads的`bg_thread_id`,`server_context.server_obj`存储该server thread对应的Server object。 7 | 4. 每个Server object里面存放了三个数据结构:`client_bg_map`存放PS中有那些client,每个client上有那些bg threads;`client_ids`存放PS中有那些client;`client_clocks`是VectorClock,存放来自client的clock,初始化时clock为0。每个Server thread在初始化时会去connect PS中所有的bg thread,然后将`(client_id, 0)`添加到server thread对应的Server object中的`client_clocks`中。如果某个client上有多个bg thread,那么`(client_id, 0)`会被重复添加到`client_clocks: VectorClock`中,会做替换。注意`client_clocks: VectorClock`的长度为PS中client的总个数,也就是每一个client对应一个clock,而不是每个bg thread对应一个clock。Server object还有一个`client_vector_clock_map`的数据结构,key为`client_id`,value为该client上所有bg thread的VectorClock。也就是说每个server thread不仅存放了每个client的clock,也存放了该client上每个bg thread的clock。 8 | 5. Server object还有一个`bg_version_map`的数据结构,该结构用于存放server thread收到的bg thread的最新oplog版本。 9 | 10 | ## CreateTable 11 | 12 | Server thread启动后,会不断循环等待消息,当收到Namenode发来的`create_table_msg`时,会调用`HandleCreateTable(create_table_msg)`来createTable,会经历以下步骤: 13 | 14 | 1. 从msg中提取出tableId。 15 | 2. 回复消息给Namenode说准备创建table。 16 | 3. 初始化TableInfo消息,包括table的`staleness, row_type, columnNum (row_capacity)`。 17 | 4. 然后调用server thread对应的Server object创建table,使用`Server.CreateTable(table_id, table_info)`。 18 | 5. Server object里面有个`map tables`数据结构,`CreateTable(table_id)`就是new出一个ServerTable,然后将其加入这个map。 19 | 6. 
ServerTable object会存放`table_info`,并且有一个`map storage`,这个map用来存放ServerTable中的rows。另外还有一个`tmp_row_buff[row_length]`的buffer。new ServerTable时,只是初始化一些这些数据结构。 20 | 21 | ## HandleClientSendOpLogMsg 22 | 23 | 当某个server thread收到client里bg thread发来的`client_send_oplog_msg`时,会调用ServerThreads的`HandleOpLogMsg(client_send_oplog_msg)`,该函数会执行如下步骤: 24 | 25 | 1. 从msg中抽取出`client_id`,判断该msg是否是clock信息,并提取出oplog的version。 26 | 2. 调用server thread对应的`ServerObj.ApplyOpLog(client_send_oplog_msg)`。该函数会将oplog中的updates requests都更新到本server thread维护的ServerTable。 27 | 3. 如果msg中没有携带clock信息,那么执行结束,否则继续下面的步骤: 28 | 4. 调用`ServerObj.Clock(client_id, bg_id)`,并返回`bool clock_changed`。该函数会更新client的VectorClock(也就是每个bg thread的clock),如果client的VectorClock中唯一最小的clock被更新,那么client本身的clock也需要更新,这种情况下`clock_changed`为true。 29 | 5. 如果`clock_changed == false`,那么结束,否则,进行下面的步骤: 30 | 6. `vector requests = serverObject.GetFulfilledRowRequests()`。 31 | 7. 对每一个request,提取其`table_id, row_id, bg_id`,然后算出bg thread的`version = serverObj.GetBgVersion(bg_id)`。 32 | 8. 根据提取的`row_id`去Server object的ServerTable中提取对应的row,使用方法`ServerRow server_row = ServerObj.FindCreateRow(table_id, row_id)`。 33 | 9. 调用`RowSubscribe(server_row, bg_id_to_client_id)`。如果consistency model是SSP,那么RowSubscribe就是SSPRowSubscribe;如果是SSP push,那么RowSubscribe就是SSPPushRowSubscribe。NMF使用是后者,因此这一步就是`SSPPushRowSubscribe(server_row, bg_id_to_client_id)`。该方法的意思是将`client_id`注册到该`server_row`,这样将该`server_row`在调用`AppendRowToBuffs`可以使用`callback_subs.AppendRowToBuffs()`。 34 | 10. 查看Server object中VectorClock中的最小clock,使用方法`server_clock = ServerObj.GetMinClock()`。 35 | 11. `ReplyRowRequest(bg_id, server_row, table_id, row_id, sersver_clock)`。 36 | 12. 最后调用`ServerPushRow()`。 37 | 38 | ### `Server.ApplyOpLog(oplog, bg_thread_id, version)` 39 | 40 | 1. check一下,确保自己`bg_version_map`中该bg thread对应的version比这个新来的version小1。 41 | 2. 更新`bg_version_map[bg_thread_id] = version`。 42 | 3. oplog里面可以存在多个update request,对于每一个update request,执行以下步骤: 43 | 4. 读取oplog中的`table_id, row_id, column_ids, num_updates, started_new_table`到updates。 44 | 5. 根据`table_id`从`ServerObj.tables`中找出对应的ServerTable。 45 | 6. 执行ServerTable的`ApplyRowOpLog(row_id, column_ids, updates, num_updates)`。该方法会找出ServerTable对应的row,并对row进行`BatchInc(column_ids, updates)`。如果ServerTable不存在该row,就先`CreateRow(row_id)`,然后`BatchInc()`。 46 | 7. 打出"Read and Apply Update Done"的日志。 47 | 48 | ### `ServerObj.Clock(client_id, bg_id)` 49 | 50 | 1. 执行`ServerObj.client_vector_clock_map[client_id].Tick(bg_id)`,该函数将client对应的VectorClock中`bg_id`对应的clock加1。 51 | 2. 如果`bg_id`对应的原始clock是VectorClock中最小值,且是唯一的最小值,那么clock+1后,需要更新client对应的clock,也就是对`client_clocks.Tick(client_id)`。 52 | 3. 然后看是否达到了snapshot的clock,达到就进行checkpoint。 53 | 54 | ## HandleRowRequestMsg 55 | 56 | 当某个server thread收到client里bg thread发来的`row_request_msg`时,会调用ServerThreads的`HandleRowRequest(bg_id, row_request_msg)`,该函数会执行如下步骤: 57 | 58 | 1. 从msg中提取出`table_id, row_id, clock`。 59 | 2. 查看ServerObj中的所有client的最小clock。使用`server_clock = ServerObj.GetMinClock()`。 60 | 3. 如果msg请求信息中的clock > `server_clock`,也就是说目前有些clients在clock时的更新信息还没有收到,那么先将这个msg的request存起来,等到ServerTable更新到clock时,再reply。具体会执行`ServerObj.AddRowRequest(sender_id, table_id, row_id, clock)`。 61 | 4. 如果msg请求信息中的clock <= `server_clock`,也就是说ServerTable中存在满足clock要求的rows,那么会执行如下步骤: 62 | 5. 得到`bg_id`的version,使用`version = ServerObj.GetBgVersion(sender_id)`,`sender_id`就是发送`row_request_msg`请求的client上面的bg thread。 63 | 6. 将ServerTable中被request的row取出来到`server_row`。 64 | 7. 调用`RowSubscribe(server_row, sender_id_to_thread_id)`。 65 | 8. 
将`server_row`reply给bg thread,具体使用`ReplyRowRequest(sender_id, server_row, table_id, row_id, server_clock, version)`。 66 | 67 | 68 | 69 | ### `ServerObj.AddRowRequest(sender_id, table_id, row_id, clock)` 70 | 71 | 当来自client的request当前无法被处理的时候(server的row太old),server会调用这个函数将请求先放到队列里。具体执行如下步骤: 72 | 73 | 1. 先new一个ServerRowRequest的结构体,将`bg_id, table_id, row_id, clock`放到这个结构体中。 74 | 2. 将ServerRowRequest放进`map> clock_bg_row_requests`中,该数据结构的key是clock,vector中的index是`bg_id`,value是ServerRowRequest。 75 | 76 | ### `ReplyRowRequest(sender_id, server_row, table_id, row_id, server_clock, version)` 77 | 78 | 1. 先构造一个`ServerRowRequestReplyMsg`,然后将`table_id, row_id, server_clock, version`填入这个msg中。 79 | 2. 然后将msg序列化后发回给`bg_id`对应的bg thread。 80 | 81 | -------------------------------------------------------------------------------- /BigDataSystems/Petuum/TableCreation.md: -------------------------------------------------------------------------------- 1 | # CreateTable过程 2 | 3 | ## 基本流程 4 | 5 | 1. 每个App main Thread(比如每个节点上matrixfact.main()进程的main/init thread)调用`petuum::PSTableGroup::CreateTable(tableId, table_config)`来创建Table。 6 | 2. 该方法会调用同在一个Process里的head bg thread向NameNode thread发送创建Table的请求`create_table_msg`。 7 | 3. NameNode收到CreateTable请求,如果该Table还未创建,就在自己的线程里创建一个ServerTable。之后会忽略其他要创建同一Table的请求。 8 | 4. NameNode将CreateTable请求`create_table_msg`发送到cluster中的每个Server thread。 9 | 5. Server thread收到CreateTable请求后,先reply `create_table_reply_msg` to NameNode thread,表示自己已经知道要创建Table,然后直接在线程里创建一个ServerTable。 10 | 6. 当NameNode thread收到cluster中所有Server thread返回的reply消息后,就开始reply `create_table_reply_msg` to head bg thread说“Table已被ServerThreads创建”。 11 | 7. 当App main()里定义的所有的Table都被创建完毕(比如matrixfact里要创建三个Table),NameNode thread会向cluster中所有head bg thread发送“所有的Tables都被创建了”的消息,也就是`created_all_tables_msg`。 12 | 13 | ## 流程图 14 | ![CreateTable](figures/CreateTableThreads.png) 15 | 16 | ## 代码结构图 17 | 18 | ![CreateTable](figures/CreateTable.png) -------------------------------------------------------------------------------- /BigDataSystems/Petuum/ThreadInitialization.md: -------------------------------------------------------------------------------- 1 | # Petuum的线程启动过程分析 2 | 3 | Start PS的第一个步骤就是初始化各个线程 4 | ```c++ 5 | petuum::PSTableGroup::Init(table_group_config, false) 6 | ``` 7 | 其具体实现是 8 | - 初始化每个node上的namenode,background及server threads 9 | - 建立这些threads之间的通信关系 10 | - 为createTables()做准备 11 | 12 | ## Namenode thread 13 | 一个Petuum cluster里面只有一个Namenode thread,负责协同各个节点上的bg threads和server threads。 14 | 15 | 16 | ## Server thread 17 | 角色是PS中的Server,负责管理建立和维护用于存放parameters的global tables。 18 | 19 | ## Background (Bg) thread 20 | 角色是PS中的Client,负责管理真正计算的worker threads,并与server thread通信。在每个node上,bg threads可以有多个,其中一个负责建立本地 table。 21 | 22 | ## 代码结构与流程 23 | ![init](figures/PSTableGroup-Init().png) 24 | 25 | 26 | ## Local 模式线程启动分析 27 | 28 | 启动流程 29 | 30 | ```c++ 31 | // main thread调用PSTableGroup::Init()后变成init thread并向CommBus注册自己 32 | I1230 10:00:50.570231 9821 comm_bus.cpp:117] CommBus ThreadRegister() 33 | // init thread创建Namenode thread,该向CommBus注册自己 34 | I1230 10:01:16.210435 10014 comm_bus.cpp:117] CommBus ThreadRegister() 35 | // Namenode thread启动 36 | NameNode is ready to accept connections! 
37 | // cluster中bg thread的个数 38 | I1230 10:05:09.398447 10014 name_node_thread.cpp:126] Number total_bg_threads() = 1 39 | // cluster中的server thread的个数 40 | I1230 10:05:09.398485 10014 name_node_thread.cpp:128] Number total_server_threads() = 1 41 | // app中定义的table_group_config的consistency_model = SSPPush or SSP 42 | I1230 10:06:24.141788 9821 server_threads.cpp:92] RowSubscribe = SSPPushRowSubscribe 43 | // 启动(pthread_create)所有的local server threads,这里只有一个 44 | I1230 10:09:50.340092 9821 server_threads.cpp:106] Create server thread 0 45 | // Server thread获取cluster中的client个数 46 | I1230 10:12:15.419473 10137 server_threads.cpp:239] ServerThreads num_clients = 1 47 | // Server thread自己的thread id 48 | I1230 10:12:15.419505 10137 server_threads.cpp:240] my id = 1 49 | // Server thread向CommBus注册自己 50 | I1230 10:12:15.419514 10137 comm_bus.cpp:117] CommBus ThreadRegister() 51 | // 注册成功 52 | I1230 10:12:15.419587 10137 server_threads.cpp:252] Server thread registered CommBus 53 | // Bg thread启动,id = 100,Bg thread的id从100开始 54 | I1230 10:12:51.534554 10171 bg_workers.cpp:889] Bg Worker starts here, my_id = 100 55 | // Bg thread向CommBus注册自己 56 | I1230 10:12:51.534627 10171 comm_bus.cpp:117] CommBus ThreadRegister() 57 | // Bg thread先去connect Namenode thread 58 | I1230 10:12:51.534677 10171 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 0 59 | // Bg thread去连接Namenode thread 60 | I1230 10:12:51.534683 10171 bg_workers.cpp:290] Connect to local server 0 61 | // Namenode thread 收到Bg thread id = 100的请求 62 | I1230 10:12:51.534826 10014 name_node_thread.cpp:139] Name node gets client 100 63 | // Server thread首先去连接Namenode thread 64 | I1230 10:13:18.879250 10137 server_threads.cpp:141] Connect to local name node 65 | // Namenode thread收到Server thread的请求 66 | I1230 10:13:21.051105 10014 name_node_thread.cpp:142] Name node gets server 1 67 | // Namenode已经收到所有的client和server的连接请求 68 | I1230 10:13:33.913213 10014 name_node_thread.cpp:149] Has received connections from all clients and servers, sending out connect_server_msg 69 | // Namenode向所有client (bg thread) 发送让其连接server thread的命令 70 | I1230 10:13:33.913254 10014 name_node_thread.cpp:156] Send connect_server_msg done 71 | // 发送connect_server_msg命令完毕 72 | I1230 10:13:33.913261 10014 name_node_thread.cpp:162] InitNameNode done 73 | // 每个bg thread去连接cluster中的所有的server threads,这里只有一个server thread 74 | I1230 10:13:33.929790 10171 bg_workers.cpp:283] ConnectToNameNodeOrServer server_id = 1 75 | // Bg thread连接上了server thread 76 | I1230 10:13:33.929821 10171 bg_workers.cpp:290] Connect to local server 1 77 | // 收到Namenode的连接反馈消息(client_start_msg表示连接成功) 78 | I1230 10:13:33.929862 10171 bg_workers.cpp:368] get kClientStart from 0 num_started_servers = 0 79 | // Server thread初始化完成 80 | I1230 10:23:39.355000 10137 server_threads.cpp:187] InitNonNameNode done 81 | // Bg thread收到server thread的反馈信息(client_start_msg表示连接成功) 82 | I1230 10:23:39.355051 10171 bg_workers.cpp:368] get kClientStart from 1 num_started_servers = 1 83 | // Bg thread id=100收到CreateTable的请求 84 | I1230 10:23:39.355198 10171 bg_workers.cpp:911] head bg handles CreateTable 85 | Data mode: Loading matrix sampledata/9x9_3blocks into memory... 
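// 至此,Namenode/Server/Bg thread 均已启动并互相连接,head bg thread 接着处理 CreateTable 请求,
// 随后 matrixfact app 开始将输入矩阵加载到内存,进入正式计算阶段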
86 | ``` 87 | Thread Ids: (local模式下Namenode,Server及Bg thread都只有一个) 88 | - 9821: main() thread 89 | - 10014: Namenode thread 90 | - 10137: Server thread 91 | - 10171: Bg thread 92 | 93 | 图解如下: 94 | 95 | ![LocalThreads](figures/LocalThreads.png) 96 | 97 | ## Distributed 模式线程启动分析 98 | 99 | 启动图解如下: 100 | 101 | ![DistributedThreads](figures/DistributedThreads.png) 102 | 103 | 可以看到各个节点上的线程启动后,Server threads和Bg threads都与Namenode threads建立了连接。然后Namenode通知所有的bg threads与集群中的所有server threads建立连接。连接建立后,可以看到Server threads和Bg threads组成了一个二分图结构,也就是所谓的Parameter Server。 -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Architecture.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/BSP-ABSP-SSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/BSP-ABSP-SSP.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/ClientTableUpdate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/ClientTableUpdate.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Compare-BSP-ABSP-SSP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Compare-BSP-ABSP-SSP.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/ConsistencyModel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/ConsistencyModel.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/CreateTable.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/CreateTable.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/CreateTableThreads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/CreateTableThreads.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/DistributedThreads.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/DistributedThreads.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/LocalThreads.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/LocalThreads.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/PSTableGroup-Init().png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/PSTableGroup-Init().png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum-architecture.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum-ps-topology.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum-ps-topology.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum架构图.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum架构图.graffle -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/Petuum架构图.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/Petuum架构图.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/STRADS-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/STRADS-architecture.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/matrixfact-petuum.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/matrixfact-petuum.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/matrixfact.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/matrixfact.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/figures/parallel-matrixfact.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/parallel-matrixfact.png -------------------------------------------------------------------------------- 
/BigDataSystems/Petuum/figures/petuum-overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Petuum/figures/petuum-overview.png -------------------------------------------------------------------------------- /BigDataSystems/Petuum/杂项.md: -------------------------------------------------------------------------------- 1 | # 杂项 2 | 1. 当调用`L_Table.Get()`后,ServerThreads会调用HandleOpLogMsg()来处理`client_send_oplog_msg` 消息,然后调用ApplyOpLog()处理该消息。ApplyOpLog()会调用ApplyRowOpLog()将UpdateValues更新到ServerTable中相应的Row中去。如果ServerTable中不存在该Row,那么先CreateRow(),然后ApplyRowOpLog()到该Row上。但问题是worker thread什么时候向ServerThreads发送的`client_send_oplog_msg` 消息?在bg_workers.cpp中的kBgClock中,调用HandleClockMsg(ture)后会调用CreateOpLogMsg()向ServerThreads发送消息。 3 | 4 | 2. `SSP_push_consistency_controller::Get()`中需要先`BgWorkers::WaitSystemClock()`才能进入`BgWorkers::RequestRow(table_id, row_id, stalest_clock)`方法。 5 | 6 | 3. Client传给Server的是`oplog`消息,里面包含了`row_id, column_ids, updates`信息,与ClientTable中的`oplog`类似。 7 | -------------------------------------------------------------------------------- /BigDataSystems/Spark/Build/BuildingSpark.md: -------------------------------------------------------------------------------- 1 | # Building Spark 2 | 3 | ## Using SBT 4 | To build spark-1.6.0 using SBT, we can 5 | 6 | 1. git clone Spark-1.6.0 7 | 2. Modify the sbt version 8 | 9 | ```shell 10 | cd Spark-1.6.0/projects 11 | modify build.properties (change sbt.version=0.13.7 to sbt.version=0.13.9) 12 | ``` 13 | 3. Generate idea modules 14 | ```shell 15 | cd Spark-1.6.0 16 | run ./sbt/sbt 'gen-idea no-classifiers no-sbt-classifiers' // faster 17 | or ./sbt/sbt gen-idea 18 | ``` 19 | 20 | 4. Import the projects to IDEA 21 | ```shell 22 | File -> Project from Existing Sources -> SBT -> Use auto-import -> Finish 23 | ``` 24 | 5. Select the profilers 25 | ```shell 26 | Maven projects -> Select hadoop-1, maven-3, sbt, scala-2.10, unix 27 | ``` 28 | 6. Cancel SBT's auto-import 29 | ```shell 30 | File -> Setttings -> SBT -> Cancel Use auto-import 31 | ``` 32 | 33 | When encountering some build errors, we can refer to: 34 | 35 | 1. http://stackoverflow.com/questions/25211071/compilation-errors-in-spark-datatypeconversions-scala-on-intellij-when-using-m 36 | 2. http://stackoverflow.com/questions/33311794/import-spark-source-code-into-intellj-build-error-not-found-type-sparkflumepr 37 | 3. http://blog.csdn.net/tanglizhe1105/article/details/50530104 38 | 4. http://www.iteblog.com/archives/1038 -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/Introduction to MLlib Pipeline.md: -------------------------------------------------------------------------------- 1 | # Introduction to Spark ML Pipeline 2 | 3 | ## 说明 4 | 建议在阅读本文档之前先阅读[官方文档](http://spark.apache.org/docs/latest/ml-guide.html),本文档不是官方文档的翻译,而是对Spark ML Pipeline的进一步理解与总结。 5 | 6 | ## From MLlib to Spark.ml 7 | 从1.2开始,Spark开始提供新的ML package,叫做Spark.ml。这个package提供的API接口比Spark.mllib更清晰更统一,当然最主要的特性是提供了ML pipeline功能。Spark.ml目前是alpha版本,仅包含LogisticRegression一个算法,但后面会加入更多的算法并取代现有的Spark.mllib。 8 | 9 | ## ML任务基本流程 10 | 在介绍ML pipeline之前,我们先回顾一下一个ML任务包含的典型流程: 11 | 12 | 1. 准备训练数据集 (training examples) 13 | 2. 预处理及特征抽取 (training examples => features) 14 | 3. 训练模型 (training models(features)) 15 | 4. 
在测试集上进行模型评测 (testing examples => features => results) 16 | 17 | 可以看到整个ML任务实际上是一个dataflow。更确切地,是两条dataflow。一条是training过程,结束点是训练好的model,另一条是testing过程,结束点是最后得到的results (e.g., predictions)。如果要训练多个模型,那么dataflow会有更多条。 18 | 19 | 从high-level的角度来看,dataflow里只包含两种类型的操作:数据变换(上面的=>)与模型训练(产生model)。 20 | 21 | ## Spark.ml原理 22 | Spark.ml目的为用户提供简单易用的API,方便用户将整个training和testing的dataflow组织成一个统一的pipeline(其实叫workflow更合适)。Spark.ml里主要包含4个基本概念: 23 | 24 | 1. ML data model: Spark ML使用 Spark SQL里面的SchemaRDD来表示整个处理过程中的 input/output/intermediate data。一个SchemaRDD类似一个table,里面可以包含很多不同类型的column,column可以是text,feature,label,prediction等等。 25 | 26 | 2. Transformer: 俗称数据变形金刚。变形金刚可以从一样东西(比如汽车)变成另一样东西(人是不是东西?)。在Spark.ml中Transformer可以将一个SchemaRDD变成另一个SchemaRDD,变换方法由其 Transformer.transform()方法决定,整个过程与RDD.transformation()类似。可想而知,Transformer可以是feature抽取器,也可以是已经训练好的model,等等。比如,Spark.ml提供的一个`Tokenizer: Transformer `可以对training examples中的 text 进行Tokenization预处理,处理结果就是`Tokenizer.transform(text)`。 27 | 再比如Spark.ml中的LogisticRegressionModel也是一个Transformer,当它被训练好后,预测过程就是将testing examples中的features变成predictions,也就是`predictions = LogisticRegressionModel.transform(features)`。 28 | 29 | 3. Estimator: 形象地讲就是可以生产变形金刚的机器。比如要生产一个中国版的擎天柱,只需传入中国人的训练数据(比如中国人的身高,体重等),选择擎天柱的模型,然后就可以生产得到一个中国版的擎天柱。对应到Spark.ml中,要得到一个Transformer(比如要得到训练好的LogisticRegressionModel),我们要提供一些训练数据SchemaRDD给Estimator,然后构造模型(比如直接将Spark.ml中的LogisticRegression模型拿来用),设置参数 params(比如迭代次数),最后训练(`Estimator.fit(dataset: SchemaRDD, params)`)得到一个LogisticRegressionModel,类型是Model。 30 | 31 | 4. Pipeline: 将多个Transformer和Estimator组成一个DAG workflow就是pipeline。想像一下把多个变形金刚组合成战神金刚是不是很流比。具体的组装方法是`val pipeline = new Pipeline().setStages(Transformer*, Estimator*)`,* 表示0个或者多个。其实setStages()方法接收参数类型是`Array[PipelineStage]`,这里这样写是因为Transformer和Estimator都是PipelineStage的子类。得到的`val pipeline`也是一个Estimator,它可以生产(`pipeline.fit()`)出来PipelineModel (类型是Transformer,也就是那个战神金刚)。 32 | 33 | 34 | ## 例子 35 | 36 | ### 1. 官方文档中 Example: Pipeline 的图示: 37 | ![](figures/pipelineDemo.png) 38 | 从图中可以看到: 39 | 40 | 1. 该Example中的pipeline有三个PipelineStage:两个Transformer和一个Estimator。 41 | 2. 这个pipeline最后生成了一个训练好的model。 42 | 3. 利用这个训练好的model可以对testingData进行预测(也就是transform())。 43 | 4. transform()输出的Table (SchemaRDD) 会在其输入的Table里添加一列或者多列。 44 | 5. transform()输出的Table不会存放在内存中(类似RDD.tranformation()的实现原理,这里画出来只是方面说明)。 45 | 46 | 47 | 48 | ### 2. 官方文档中 Example: Model Selection via Cross-Validation 的图示: 49 | ![](figures/CrossValidatorDemo.png) 50 | 51 | 调参是一件痛苦的事情,pipeline实际上是一个调参神器。可以在一个程序里实现**交叉验证+最优参数选择**。 52 | 53 | 比如这个例子中,使用2-Fold交叉验证,特征抽取器(hashingTF)里的参数(numFeatures)有三个values{10, 100, 1000},LR模型的参数(正则化权重regParam)有两个values{0.1, 0.01}。 54 | 55 | 为了方便画图,我把这个例子改为3-Fold交叉验证,将numFeatures参数的values减少到两个。 56 | 57 | 从图中可以看到: 58 | 59 | 1. 交叉验证首先会将traning dataset 划分为k份,k-1份用来做traningData,另外1份用来做testingData。 60 | 2. Transformer和Estimator都可以有自己的参数。这里第二个Transformer(也就是HashingTF的参数有两个values,Estimator的参数(也就是LogisticRegressionModel的正则化权重)也有两个values。 61 | 3. 总的要训练的模型个数为`Values(numFeatures) * Values(regParam)`,但需要`k * Values(numFeatures) * Values(regParam)`条pipeline来训练模型。 62 | 4. 最优模型对应的`sum(metric_i)`最大(或者最小,具体要看cost function的定义),metric可以是AUC等。 63 | 5. 在训练第 i 个fold里的模型的时候,traningData和testingData可以公用。 64 | 65 | ## 实现 66 | 67 | ### 1. 
Transformer 68 | 目前Spark.ml里面只有少量的内置Transformer,Transformer有UnaryTransformer和Model两种子类型,具体如下: 69 | 70 | - UnaryTransformer 71 | - Tokenizer (将input String转换成小写后按空白符分割) 72 | - HashingTF (统计一个document的Term Frequentcy,并将TF信息存放到一个Sparse vector里,index是term的hash值,value是term出现的次数,numFeatures参数意思是样本documents中的总term数目) 73 | - Model 74 | - LogisticRegressionModel (LR模型) 75 | - PipelineModel (pipeline组合成的模型) 76 | - StandardScalerModel (归一化模型) 77 | - CrossValidatorModel(交叉验证模型) 78 | 79 | 80 | Transformer中的transform()实现原理很简单: 在SchemaRDD上执行SELECT操作,SELECT的时候使用transform()作为UDF。注意,一般transform()得到的SchemaRDD后会在原有的SchemaRDD上添加1个或者多个columns。 81 | 82 | 与RDD.transformation()一样,当调用Transformer.transform()时,只会生成新的SchemaRDD变量,而不会去提交job计算SchemaRDD中的内容。 83 | 84 | ### 2. Estimator 85 | 86 | 目前Spark.ml中只有几个Estimator,具体如下: 87 | 88 | - LogisticRegression(可以把LogisticRegression看作是生产learned LogisticRegressionModel的机器) 89 | - StandardScaler(生产StandardScalerModel的机器) 90 | - CrossValidator(生产CrossValidatorModel的机器) 91 | - Pipeline(生产PipelineModel 的机器) 92 | 93 | Estimator里最重要的就是`Estimator.fit(SchemaRDD, params) `方法,给定SchemaRDD和parameters后,可以生产出一个learned model。 94 | 95 | 每当调用 Estimator.fit() 后,都会产生job去训练模型,得到模型参数,类似MLlib中的`model.train()`。 96 | 97 | ### 3. Pipeline 98 | 99 | Pipeline实质是 chained Transformers。啊, 前面不是说也可以在Pipeline中加入Estimator吗?是的,加入Estimator实际上就是加入Transformer,也就是Estimator.fit()产生的Model(Transformer的子类)。同理,也可以在Pipeline中加入另一个Pipeline,反正实际加入的是Pipeline.fit()产生的PipelineModel(Transformer的子类)。 100 | 101 | 102 | ## DAG型的pipeline 103 | 上面例子中的pipeline都是串行的,如何组成DAG型的pipeline? 104 | 105 | 很遗憾,目前的Transformer都是一元(Unary)的,只能输入一个SchemaRDD,输出另一个SchemaRDD。如果以后出现二元的,比如图中的BinaryTransformer,那么可以接收两个SchemaRDD,输出一个SchemaRDD,类似`RDD.join(other RDDs)`,那么pipeline就可以是DAG型的了。 106 | 107 | 注意:目前Transformer之间的联系根据`Transformer.setInputCol()`和`Transformer.setOutputCol()`建立。 108 | ![](figures/DAGpipeline.png) 109 | 110 | 111 | ## 不足之处 112 | 113 | 由于还是alpha版,目前Spark.ml还有很多不足之处: 114 | 115 | 1. pipeline会隐藏中间数据处理结果,这样不方便调试和错误诊断。 116 | 2. 实际上没有做到完全的pipeline,训练模型(pipeline.fit())时是barrier,也就是说训练和测试过程仍然是独立的。 117 | 3. 
在CrossValidator中,用于训练模型的pipelines目前不能够并行运行。 118 | 119 | 120 | 121 | 122 | 123 | 124 | -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/figures/CrossValidatorDemo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/ML/figures/CrossValidatorDemo.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/figures/DAGpipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/ML/figures/DAGpipeline.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/ML/figures/pipelineDemo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/ML/figures/pipelineDemo.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/SparkResourceManager.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/SparkResourceManager.graffle -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/SparkScheduler.graffle: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/SparkScheduler.graffle -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/SparkScheduler.md: -------------------------------------------------------------------------------- 1 | ## Spark scheduler 2 | 3 | ### Master 分配 Executor 方法 4 | 5 | ```scala 6 | private def schedule(): Unit = { 7 | // 先打乱 workers 8 | val shuffledWorkers = Random.shuffle(workers) 9 | // 对通过 Spark-submit 提交(也就是 AppClient 类提交)的 app 来说下面这个 for 语句没用 10 | for (worker <- shuffledWorkers if worker.state == WorkerState.ALIVE) { 11 | for (driver <- waitingDrivers) { 12 | if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) { 13 | launchDriver(worker, driver) 14 | waitingDrivers -= driver 15 | } 16 | } 17 | } 18 | // 开始在 workers 上分配 executors 19 | startExecutorsOnWorkers() 20 | } 21 | 22 | ``` 23 | #### `startExecutorsOnWorkers()` 逻辑 24 | 25 | ```scala 26 | private def startExecutorsOnWorkers(): Unit = { 27 | // FIFO 调度策略 28 | for (app <- waitingApps if app.coresLeft > 0) { 29 | // 得到每个 executor 需要的 cores 数目 30 | val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor 31 | // 挑选出可用的 workers,将可用 workers 的资源(空闲 CPU core 个数)按照降序排列 32 | val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE) 33 | .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB && 34 | worker.coresFree >= coresPerExecutor.getOrElse(1)) 35 | .sortBy(_.coresFree).reverse 36 | // 资源分配算法,assignedCores 是一个数组,第 i 个元素表示应该往第 i个 usableWorkers 上分配多少个 core 37 | val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps) 38 | 39 | // 已经得到应该往每个 worker 上分配多少个 core,开始分配 40 | for (pos <- 0 
until usableWorkers.length if assignedCores(pos) > 0) { 41 | allocateWorkerResourceToExecutors( 42 | app, assignedCores(pos), coresPerExecutor, usableWorkers(pos)) 43 | } 44 | } 45 | } 46 | ``` 47 | 48 | #### `scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)`逻辑 49 | 50 | ```scala 51 | private def scheduleExecutorsOnWorkers( 52 | app: ApplicationInfo, 53 | usableWorkers: Array[WorkerInfo], 54 | spreadOutApps: Boolean): Array[Int] = { 55 | // 首先进行一系列初始化 56 | // 每个 executor 需要多少个 core 57 | val coresPerExecutor = app.desc.coresPerExecutor 58 | // 每个 executor 最少需要多少个 core,默认是 1 59 | val minCoresPerExecutor = coresPerExecutor.getOrElse(1) 60 | // 如果用户没有设置 coresPerExecutor,那么 oneExecutorPerWorker 为 true 61 | val oneExecutorPerWorker = coresPerExecutor.isEmpty 62 | // 每个 executor 需要的 memory 用量 63 | val memoryPerExecutor = app.desc.memoryPerExecutorMB 64 | // 可用的 workers 个数 65 | val numUsable = usableWorkers.length 66 | // 每个 worker 要提供的 cores 个数 67 | val assignedCores = new Array[Int](numUsable) 68 | // 在每个 worker 上分配的 executor 个数 69 | val assignedExecutors = new Array[Int](numUsable) 70 | // 要分配的 core 个数 = min(app 需求的 cores,workers 剩余 cores 之和) 71 | var coresToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum) 72 | 73 | 74 | // Keep launching executors until no more workers can accommodate any 75 | // more executors, or if we have reached this application's limits 76 | 77 | // 从所有 workers 中筛选出可用的 workers,筛选算法见 canLauchExecutor 78 | var freeWorkers = (0 until numUsable).filter(canLaunchExecutor) 79 | 80 | while (freeWorkers.nonEmpty) { 81 | freeWorkers.foreach { pos => 82 | var keepScheduling = true 83 | // 如果该 worker 上可以启动 executor 84 | while (keepScheduling && canLaunchExecutor(pos)) { 85 | // 需要分配的 cores 的数目减去每个 executor 需要的 core 个数 86 | coresToAssign -= minCoresPerExecutor 87 | // 将分配好的 core 信息保存到 assignedCores 里面 88 | assignedCores(pos) += minCoresPerExecutor 89 | 90 | // If we are launching one executor per worker, then every iteration assigns 1 core 91 | // to the executor. Otherwise, every iteration assigns cores to a new executor. 92 | if (oneExecutorPerWorker) { 93 | assignedExecutors(pos) = 1 94 | } else { 95 | assignedExecutors(pos) += 1 96 | } 97 | 98 | // Spreading out an application means spreading out its executors across as 99 | // many workers as possible. If we are not spreading out, then we should keep 100 | // scheduling executors on this worker until we use all of its resources. 101 | // Otherwise, just move on to the next worker. 102 | // 如果选择 spreadOut 模式,那么在一个 worker 上分配一个 executor 的 cores 后,就更换 103 | // worker 再分配 104 | if (spreadOutApps) { 105 | keepScheduling = false 106 | } 107 | } 108 | } 109 | // 从 freeWorkers 中再挑选出可以启动 executor 的 workers 110 | freeWorkers = freeWorkers.filter(canLaunchExecutor) 111 | } 112 | 返回在 workers 上分配的 CPU core 资源信息 113 | assignedCores 114 | ``` 115 | 116 | #### `scheduleExecutorsOnWorkers().canLaunchExecutor`逻辑 117 | 118 | ```scala 119 | /** Return whether the specified worker can launch an executor for this app. */ 120 | def canLaunchExecutor(pos: Int): Boolean = { 121 | // 如果 app 里要分配的 core 个数大于每个 executor 需要的个数,仍然继续调度 122 | val keepScheduling = coresToAssign >= minCoresPerExecutor 123 | // 当前 worker 是否有足够的 core 来分配 124 | val enoughCores = usableWorkers(pos).coresFree - assignedCores(pos) >= minCoresPerExecutor 125 | 126 | // If we allow multiple executors per worker, then we can always launch new executors. 127 | // Otherwise, if there is already an executor on this worker, just give it more cores. 
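    // 补充说明:若用户设置了 coresPerExecutor(即 oneExecutorPerWorker 为 false),
    // 或者该 worker 上还没有分配过 executor,则本次分配会启动新的 executor;
    // 否则(oneExecutorPerWorker 且该 worker 已有 executor)只是给已有 executor 追加 core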
128 | val launchingNewExecutor = !oneExecutorPerWorker || assignedExecutors(pos) == 0 129 | // 如果每个 worker 可以分配多个 executor,或者这个 worker 上还没分配到 executor 130 | if (launchingNewExecutor) { 131 | // 计算要在该 worker 上分配了多少 memory 132 | val assignedMemory = assignedExecutors(pos) * memoryPerExecutor 133 | // 计算该 worker 上是否有足够的 memory 来分配 executor 134 | val enoughMemory = usableWorkers(pos).memoryFree - assignedMemory >= memoryPerExecutor 135 | // 检测是否超过 app executor 数目上限,用于动态调度 136 | val underLimit = assignedExecutors.sum + app.executors.size < app.executorLimit 137 | keepScheduling && enoughCores && enoughMemory && underLimit 138 | } else { 139 | // We're adding cores to an existing executor, so no need 140 | // to check memory and executor limits 141 | // 如果每个 worker 只运行一个 executor,那么直接在该 executor 上增加 core 个数 142 | keepScheduling && enoughCores 143 | } 144 | } 145 | ``` 146 | 147 | #### `allocateWorkerResourceToExecutors()`逻辑 148 | 149 | ```scala 150 | /** 151 | * Allocate a worker's resources to one or more executors. 152 | * @param app the info of the application which the executors belong to 153 | * @param assignedCores number of cores on this worker for this application 154 | * @param coresPerExecutor number of cores per executor 155 | * @param worker the worker info 156 | */ 157 | private def allocateWorkerResourceToExecutors( 158 | app: ApplicationInfo, 159 | assignedCores: Int, 160 | coresPerExecutor: Option[Int], 161 | worker: WorkerInfo): Unit = { 162 | // If the number of cores per executor is specified, we divide the cores assigned 163 | // to this worker evenly among the executors with no remainder. 164 | // Otherwise, we launch a single executor that grabs all the assignedCores on this worker. 165 | 166 | // 计算需要在该 worker 上启动多少个 executors (assignedCores / coresPerExecutor) 167 | val numExecutors = coresPerExecutor.map { assignedCores / _ }.getOrElse(1) 168 | // 每个 executor 需要多少个 core 169 | val coresToAssign = coresPerExecutor.getOrElse(assignedCores) 170 | // 将 executor 信息加到 app 上,在 worker 上启动相应的 executor 171 | for (i <- 1 to numExecutors) { 172 | val exec = app.addExecutor(worker, coresToAssign) 173 | launchExecutor(worker, exec) 174 | app.state = ApplicationState.RUNNING 175 | } 176 | } 177 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkResourceManager.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkResourceManager.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkSchedulerAppSubmit.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkSchedulerAppSubmit.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkStandaloneMaster.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneMaster.pdf -------------------------------------------------------------------------------- 
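回到上面 SparkScheduler.md 中 `startExecutorsOnWorkers()` / `scheduleExecutorsOnWorkers()` 的分配逻辑:下面用一段不依赖 Spark 的 Scala 代码模拟核心的分配循环,直观对比 spreadOut 与非 spreadOut 两种策略(其中 Worker 的空闲 core 数、`ScheduleSimulation`、`assign` 等名字和数据均为虚构,且省略了内存和 executor 数目上限的检查,仅作示意):

```scala
// 一个不依赖 Spark 的简化模拟,仅示意 core 的分配循环
object ScheduleSimulation {

  // usableWorkers 的空闲 core 数,已按降序排列(虚构数据)
  val coresFree = Array(8, 6, 4)

  // 模拟 scheduleExecutorsOnWorkers 的核心分配循环
  def assign(coresRequested: Int, coresPerExecutor: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = new Array[Int](coresFree.length)
    var coresToAssign = math.min(coresRequested, coresFree.sum)

    // 对应 canLaunchExecutor 中与 core 相关的两个条件
    def canLaunch(pos: Int): Boolean =
      coresToAssign >= coresPerExecutor && coresFree(pos) - assigned(pos) >= coresPerExecutor

    var freeWorkers = coresFree.indices.filter(canLaunch)
    while (freeWorkers.nonEmpty) {
      freeWorkers.foreach { pos =>
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          coresToAssign -= coresPerExecutor
          assigned(pos) += coresPerExecutor
          // spreadOut:在一个 worker 上分配一次就换下一个 worker
          if (spreadOut) keepScheduling = false
        }
      }
      freeWorkers = freeWorkers.filter(canLaunch)
    }
    assigned
  }

  def main(args: Array[String]): Unit = {
    // app 申请 12 个 core,每个 executor 2 个 core
    println(assign(12, 2, spreadOut = true).mkString(","))   // 输出 4,4,4:尽量摊开
    println(assign(12, 2, spreadOut = false).mkString(","))  // 输出 8,4,0:先用满一个 worker
  }
}
```

同样申请 12 个 core、每个 executor 2 个 core 时,spreadOut 模式得到 `4,4,4`,非 spreadOut 模式得到 `8,4,0`,分别对应“尽量把 executor 摊开到更多 worker”和“先把一个 worker 的资源用满再换下一个”两种策略。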
/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneResourceAllocation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneResourceAllocation.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskScheduler.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskScheduler.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskSchedulerChinese.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/Scheduler/figures/SparkStandaloneTaskSchedulerChinese.pdf -------------------------------------------------------------------------------- /BigDataSystems/Spark/StackOverflowDiagnosis/StackOverflow.md: -------------------------------------------------------------------------------- 1 | # 一个KCore算法引发的StackOverflow血案 2 | ——记一次扑朔迷离的Spark StackOverflow侦破过程 3 | 4 | 5 | ## 案件概述 6 | 7 | 故事开始于一个KCore算法,这是一个求解图中所有节点KCore值的算法。KCore的算法特点决定了它需要迭代很多轮才能收敛。在亿级别的新浪数据上,迭代个几百次是小Case的。当我们翘首期盼收敛结果时,算法却引发了一个StackOverflow的血案。 8 | 9 | 一开始,算法在本地小数据集上测试通过,没有任何问题,但放到集群上运行后总是出现StackOverflow错误。起初,我们认为错误原因不过是典型的RDD的lineage过长问题,也就是lineage长度随着算法迭代次数增加而不断变长,最后导致Spark在序列化该lineage的时候调用栈溢出。我们随手拎起checkpoint的宝刀,认为每迭代几轮checkpoint一下,可以轻而易举地截断lineage,从而避免这个问题。可是当checkpoint加入后,错误仍然出现。迫不得已,我们进行小米加步枪式的debug,来查找是否还有其他影响因素。 10 | 11 | 在debug过程中,StackOverflow这种类型的错误,展现了比OutOfMemory更难驯服的个性,尤其是在大规模的集群上。这逼迫我们想办法把案发现场转移到单机环境,并通过非常Geek的方式,模拟出算法在分布式环境下的出错情景,重现案发现场,最后追本溯源找到了问题的根源所在:RDD的 f 函数闭包和GraphX中的一个小bug。 12 | 13 | 这两个问题,最终导致task的序列化链,可以偷偷穿越被断掉的lineage而不断延续,也就是task的序列化链随着迭代次数增加不断增长,最终造成StackOverflow错误。整个断案过程耗时一周,可谓扑朔迷离,柳暗花明又一村,且听我们娓娓道来。 14 | 15 | 16 | ## 案件描述 17 | 18 | 在集群上运行KCore算法(完整版参见[1])时,会稳定地在第300+轮出现StackOverflow错误,这个错误由JDK内部的序列化/反序列化方法抛出。不管我们怎么调优参数,错误总会如期而至。 19 | 20 | 这类算法的特点: 21 | 22 | 1. 具有很长的computing chain 23 | 24 | 比如下面的 “degreeGraph=>subGraph=>degreeGraph=>subGraph=>…=>” 25 | 26 | 2. 迭代非常多次才能收敛 27 | 28 | ```scala 29 | //K-Core Algorithm 30 | val kNum = 5 31 | 32 | var degreeGraph = graph.outerJoinVertices(graph.degrees) { 33 | (vid, vd, degree) => degree.getOrElse(0) 34 | }.cache() 35 | var isConverged = false 36 | do { 37 | val subGraph = degreeGraph.subgraph( 38 | vpred = (vid, degree) => degree >= kNum 39 | ).cache() 40 | 41 | val newDegreeGraph = subGraph.degrees 42 | 43 | degreeGraph = subGraph.outerJoinVertices(newDegreeGraph) { 44 | (vid, vd, degree) => degree.getOrElse(0) 45 | }.cache() 46 | 47 | isConverged = check(degreeGraph) 48 | } while (!isConverged) 49 | ``` 50 | 51 | 它产生的错误栈如下: 52 | 53 | * 错误栈1(在JDK序列化时产生): 54 | 55 | ```java 56 | Exception in thread "main" org.apache.spark.SparkException: 57 | Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError 58 | java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1275) 59 | java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1230) 60 | ...
61 | java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177) 62 | java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) 63 | scala.collection.immutable.$colon$colon.writeObject(List.scala:379) 64 | sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 65 | ... 66 | sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 67 | java.lang.reflect.Method.invoke(Method.java:606) 68 | ``` 69 | * 错误栈2(也可以在JDK反序列化时产生): 70 | 71 | ```java 72 | ERROR Executor: Exception in task 1.0 in stage 339993.0 (TID 3341) 73 | java.lang.StackOverflowError 74 | at java.lang.StringBuilder.append(StringBuilder.java:204) 75 | at java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3143) 76 | ... 77 | java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) 78 | java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) 79 | scala.collection.immutable.$colon$colon.readObject(List.scala:362) 80 | sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) 81 | sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 82 | java.lang.reflect.Method.invoke(Method.java:606) 83 | java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) 84 | ``` 85 | 86 | ## 案件分析 87 | 88 | 日志显示这个错误在task序列化/反序列化时产生,首先定位到error被抛出的地点是在 89 | 90 | 1. driver端在序列化task的时候。具体地点在DAGScheduler.scala的 91 | 92 | ```scala 93 | if (stage.isShuffleMap) { //如果是shuffleMapTask就序列化stage中最后一个RDD及Shuffle依赖关系 94 | closureSerializer.serialize((stage.rdd, stage.shuffleDep.get) : AnyRef).array() 95 | } else { //如果是ReduceTask就序列化stage中最后一个RDD及用于计算结果的func 96 | closureSerializer.serialize((stage.rdd, stage.resultOfJob.get.func) : AnyRef).array() 97 | } 98 | ``` 99 | 100 | 2. executor端在反序列化task的时候。具体地点在 101 | 102 | ShuffleMapTask.scala 103 | ```scala 104 | val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])]( 105 | ByteBuffer.wrap(taskBinary.value), 106 | Thread.currentThread.getContextClassLoader) 107 | ``` 108 | 109 | 或 ResultTask.scala 110 | 111 | ```scala 112 | val (rdd, func) = ser.deserialize[(RDD[T], 113 | (TaskContext, Iterator[T]) => U)](ByteBuffer.wrap(taskBinary.value), 114 | Thread.currentThread.getContextClassLoader) 115 | ``` 116 | 117 | 118 | 119 | 所以,我们认定原因是Spark在序列化task时产生了一条很长的调用链(以下称为**序列化链**),但是这条链是什么?为什么会那么长? 120 | 121 | 122 | 1. **分析task在序列化时要序列化哪些内容:** 123 | 124 | 首先明确一个概念:每个stage生成一组task。Task在序列化的时候主要是要序列化该stage中的最后一个RDD(后面称为finalRDD)。我们分析了RDD的代码,发现在序列化RDD时,需要序列化RDD的成员变量有`RDD id,dependencies_,storageLevel`等。其中最主要的是`dependencies_`变量,它存放着该RDD的依赖关系,也就是该RDD如何依赖于其他的哪些RDDs,这个依赖关系俗称为**lineage**。设想,当序列化后的task发到remote worker node上时,executor只需要反序列化出finalRDD,然后通过它的lineage就可以从最初的数据源(或者shuffleMapTask的输出结果)一步步计算得到finalRDD。 125 | 126 | 2. **分析lineage对序列化链的影响:** 127 | 128 | 由于在序列化finalRDD的dependencies\_时会序列化finalRDD依赖的上游的RDDs的dependencies\_,那么这个序列化过程实际上是lineage的回溯过程。假设lineage是这样的 `RDD a => RDD b => RDD c => finalRDD`,那么首先序列化`finalRDD`,然后顺着dependencies去序列化`RDD c`,然后顺着dependencies去序列化`RDD b`,最后顺着dependencies去序列化`RDD a`。可见lineage长度越长,序列化链越长,最终可以造成栈溢出。 129 | 130 | 3. 
**分析lineage增长规律:** 131 | 132 | 虽然我们推测lineage过长是错误原因,但需要实验证明。我们选择在本地小数据集来实验(大数据上实验一次要2个小时),在小数据集上算法迭代12轮后可以收敛。我们选择了几个关键的RDD,通过`RDD.toDebugString()`来输出这些RDD在每轮迭代中的lineage。起初我们犯了一个小错误,将toDebugString()输出的lineage长度当作序列化时的lineage长度。经 [@hashjoin](http://weibo.com/u/1630850750) 提醒ShuffledRDD可以断掉序列化时的lineage,也就是说序列化链碰到ShuffledRDD就停止了,这与stage划分方法一致。我们重新审查了每轮迭代toDebugString()输出的lineage长度,却发现一个小问题:toDebugString()输出的lineage与真正的lineage有出入:真正的lineage是一个DAG,而toDebugString()输出的是一个stage tree(每个stage中的RDDs被输出成一条线性依赖链),所以像`ZippedPartitionsRDD2`这样同时依赖于两个RDD的依赖关系没有能够在toDebugString()的输出中表示出来。为了准确计算序列化时lineage的长度,我们修改了toDebugString()方法,加入了depth信息,类似求从DAG源点到每个节点的距离,最后终于统计出来每轮迭代lineage长度会增长3。 133 | 134 | 小插曲:由于每轮迭代产生的lineage太长,后面几轮迭代产生的lineage根本输出不了(打印lineage时产生的String会造成本地程序OutOfMemory)。没办法,我们修改了toDebugString()方法,只统计depth,不显示lineage。即使这样,如果lineage很长,toDebugString()执行时间也接受不了。 135 | 136 | lineage的demo参见[2]和[3]。 137 | 138 | ## 推理诊断 139 | 140 | ### 1. 尝试使用checkpoint避免错误失败 141 | 142 | 竟然找到了错误原因是不断增长的lineage,我们只需要每隔几轮就checkpoint()一次,就可以断掉不断增长的lineage,从而控制序列化链的长度。下表分别显示了在**“不进行checkpoint,每隔6轮checkpoint一次,每隔8轮checkpoint一次”**情况下得到的某个重要的RDD的lineage长度变化规律。 143 | 144 | 145 | |i-th iter| WithoutCheckpoint| checkpoint-every-6-iters| checkpoint-every-8-iters| 146 | |----:|--------:|------:|------:| 147 | |1 |9 |2 |2| 148 | |2 |12 |6 |6| 149 | |3 |15 |9 |9| 150 | |4 |18 |12 |12| 151 | |5 |21 |15 |15| 152 | |6 |24 |18 |18| 153 | |7 |27 |2 |21| 154 | |8 |30 |6 |24| 155 | |9 |33 |9 |2| 156 | |10 | 执行时间太长,算不出来 :-( |12 |6| 157 | |11 | 执行时间太长,算不出来 :-( |15 |9| 158 | |12 | 执行时间太长,算不出来 :-( |18 |12| 159 | 160 | 161 | 结果看起来不错,checkpoint以后我们可以控制lineage的长度了,也就解决问题了。 162 | 163 | 然而,现实总是那么残酷,加上checkpoint()后,在大数据集上仍然StackOverflow了,而且还是在300+轮。 164 | 165 | 166 | ### 2. 错误本地重现 167 | 168 | 还有什么因素可以导致序列化链变长?而且随着迭代次数增长而变长?而且checkpoint()也断不了? 169 | 170 | 如果错误是OutOfMemory,那么直接去dump heap,找日志,分析每个object的来源,然后debug调参,等等。可惜错误是StackOverflow,这个错误产生是一瞬间的事,要debug到错误产生的那个时间点非常难,而且无法生成heap dump,而且是在集群上产生的,而且运行一次要2个多小时。 171 | 172 | 我们仔细分析了GraphX的实现代码,看每个函数都干了些什么事情,但依然没有头绪。无奈之下,我们做了一个非常tricky的尝试,修改收敛条件,让算法在小数据集上无限迭代,看错误是否能在本地重现。测试到第400+轮的时候,错误终于在本地重现。为了让错误更快地出现,我们调小了stack大小,设为128KB,这样第30+轮的时候就可以重现。 173 | 174 | 175 | ### 3. 
诊断错误真正原因 176 | 177 | 本地可以重现后,下一步就是要debug到错误产生的时间点。这个点很难把握,因为一个迭代型算法产生的stage非常多,要预测哪个stage会产生error很难(也许直接catch StackOverflowError可以debug到那个时间点,但我们没有尝试,也可能catch到的时候调用栈已经自动退栈了)。我们选择了第20轮附近的某个stage,并假定这个stage就会产生StackOverflow,然后在JDK中的`ObjectOutputStream.java`的`writeObject()`方法处设置断点,仔细观察task序列化的每一步都序列化了什么内容。我们观察了很久很久,仔细查看了几百个stack frame中的内容,发现除了序列化RDD及其依赖关系以外,还序列化了一个奇怪的东西,那就是**RDD的 f 函数闭包中的$outer**,这个$outer指向了一个不在lineage中的RDD(VertexRDD)。也就是说,当序列化RDD的时候,其 f 函数闭包引用的VertexRDD也被序列化了。而在序列化这个VertexRDD时又序列化了它的成员变量partitionsRDD。这个`f -> $outer (VertexRDD) -> partitionsRDD` 的序列化链不属于正常的lineage,可能是错误原因。 178 | 179 | > Note:很多包含 f 的RDDs,都可能存在不正常的$outer,不正常是指这个$outer会引用到其他不在lineage中的RDDs。 180 | 181 | 经过深入的源代码分析+stack分析,我们最后确定这条链可以重新连接被checkpoint断掉的lineage,也就是说序列化的时候可以通过不正常的 f 序列化链访问到之前迭代产生的RDDs。图示如下: 182 | 183 | ![替代文本](figures/g1.png) 184 | 185 | 186 | 187 | 所有被OneToOneDep连接的RDD是正常的lineage,这些RDD也会被正常序列化。如果某个RDD被checkpoint,比如Figure 1中的A,那么A的依赖链会被断掉(dependencies\_被置为null),这样序列化到A或者ShuffledRDD就停止了。然而,由于A的 f 的函数闭包引用了VertexRDD,而partitionsRDD又是VertexRDD的成员变量,当序列化到A的时候会顺着`(1)->(2)`链又访问到`RDDs in previous iterations`,最后造成序列化链与checkpoint前的lineage一样长。这样,随着迭代次数增加,lineage不断变长,序列化链不断变长,最后就StackOverflow了。 188 | 189 | 下图显示了我们debug到的`ZippedPartitionsRDD2 -> f -> $outer -> partitionsRDD`的序列化链。 190 | 191 | ![替代文本](figures/g2.png) 192 | 193 | 更严重的是,单单这条链不断迭代就可以造成StackOverflow,下图显示了连续不断的,与checkpoint之前一样长的序列化链`VertexRDD -> partitionsRDD -> f -> $outer -> partitionsRDD -> f -> $outer -> VertexRDD -> ...` 194 | 195 | ![替代文本](figures/g3.png) 196 | 197 | 198 | ## 总结陈词 199 | 200 | 至此,StackOverflow的原因水落石出,罪魁祸首是这3个原因: 201 | 202 | 1. lineage过长,且随着迭代次数增加而增长 203 | 2. 异常的 f 函数闭包,即引用了lineage之外RDD的 f 闭包。 204 | 3. GraphX小bug:partitionsRDD是VertexRDD的non-transient成员变量 205 | 206 | 207 | ## 案件处理 208 | 209 | 从上面分析可以看出,只需要切断Figure 1中的(1)或(2)就可以达到截断序列化链的目的。 210 | 211 | 截断(1)的方法有两种: 212 | 213 | 1. 修改GraphX的代码结构,使得partitionsRDD的 f 的函数闭包不再引用VertexRDD。目前来看这种方法不行,因为需要重构代码。 214 | 2.
在checkpoint的时候将RDD的 f 置为null。这样如果A被checkpoint,那么f函数闭包会同时被清理掉,就不存在链接(1)了。 215 | 216 | 截断(2)的方法很简单: 217 | 218 | - 直接将partitionsRDD置为transient,相当于序列化时partitionsRDD不是VertexRDD的成员变量,也就不存在连接(2)了。由于partitionsRDD已经通过lineage序列化,不用担心置为transient后会造成task不能计算的情况。 219 | 220 | 我们针对这三个方法,提交了 [issue-4672](https://issues.apache.org/jira/browse/SPARK-4672) 和三个PR,目前都被merge到master和Spark-1.2版本中了。 221 | 222 | 另外,我们发现之前 [@witgo](https://github.com/witgo) 提交了一个 [issue-3623](https://issues.apache.org/jira/browse/SPARK-3623) 讨论了GraphX的checkpoint问题和修复方法,提了几个很好的问题,但没有提到StackOverflow error。 223 | 224 | ## 进一步思考 225 | 226 | 虽然上面的解决方案可以解决GraphX中迭代算法的问题,但其他迭代算法呢?通过上面的错误诊断分析,committers(包括 [@Jason Dai](http://weibo.com/u/3816918426))和我们认识到这是一个general的问题,即:怎么保证task和RDD在序列化时只序列化“重要的”的东西,而自动去除不必要的引用?@Jason Dai给出了一些解决思路,见 [issue-4672](https://issues.apache.org/jira/browse/SPARK-4672) 里面的 comments。如果大家有更优雅的解决思路或者具体的思路实现可以讨论、实现、提交新的PR。 227 | 228 | 这个问题只是分布式系统可靠性和性能优化方面问题的冰山一角,我们还遇到一些可靠性与性能 trade-off 方面的一些问题,也在考虑是否有更优雅的解决方案。 229 | 230 | 感谢[@明风Andy](http://weibo.com/u/2304284334)的统筹、审阅和修改,这个问题最初由[@张萌Taobao](http://weibo.com/u/2712640302)发现,并参与了诊断工作。 231 | 232 | ## References 233 | [1] 造成StackOverflow的完整代码,(需要将最后的收敛条件设置为`filteredCount >= 0L`可以产生无限迭代进而产生错误) 234 | 235 | ```scala 236 | package graphx.test 237 | 238 | import org.apache.hadoop.conf.Configuration 239 | import org.apache.hadoop.fs.{Path, FileSystem} 240 | import org.apache.spark.SparkContext 241 | import org.apache.spark.graphx._ 242 | import org.apache.spark.rdd.RDD 243 | 244 | object SimpleIterAppTriggersStackOverflow { 245 | 246 | def main(args: Array[String]) { 247 | 248 | val sc = new SparkContext("local[2]", "Kcore") 249 | val checkpointPath = "D:\\data\\checkpoint" 250 | sc.setCheckpointDir(checkpointPath) 251 | 252 | 253 | val edges: RDD[(Long, Long)] = 254 | sc.parallelize(Array( 255 | (1L, 17L), (2L, 4L), 256 | (3L, 4L), (4L, 17L), 257 | (4L, 16L), (5L, 15L), 258 | (6L, 7L), (7L, 15L), 259 | (8L, 12L), (9L, 12L), 260 | (10L, 12L), (11L, 12L), 261 | (12L, 18L), (13L, 14L), 262 | (13L, 17L), (14L, 17L), 263 | (15L, 16L), (15L, 19L), 264 | (16L, 17L), (16L, 18L), 265 | (16L, 19L), (17L, 18L), 266 | (17L, 19L), (18L, 19L))) 267 | 268 | 269 | val graph = Graph.fromEdgeTuples(edges, 1).cache() 270 | 271 | var degreeGraph = graph.outerJoinVertices(graph.degrees) { 272 | (vid, vd, degree) => degree.getOrElse(0) 273 | }.cache() 274 | 275 | var filteredCount = 0L 276 | var iters = 0 277 | 278 | val kNum = 5 279 | val checkpointInterval = 10 280 | 281 | do { 282 | 283 | val subGraph = degreeGraph.subgraph(vpred = (vid, degree) => degree >= kNum).cache() 284 | 285 | val preDegreeGraph = degreeGraph 286 | degreeGraph = subGraph.outerJoinVertices(subGraph.degrees) { 287 | (vid, vd, degree) => degree.getOrElse(0) 288 | }.cache() 289 | 290 | if (iters % checkpointInterval == 0) { 291 | 292 | try { 293 | val fs = FileSystem.get(new Configuration()) 294 | if (fs.exists(new Path(checkpointPath))) 295 | fs.delete(new Path(checkpointPath), true) 296 | } catch { 297 | case e: Throwable => { 298 | e.printStackTrace() 299 | println("Something Wrong in GetKCoreGraph Checkpoint Path " + checkpointPath) 300 | System.exit(0) 301 | } 302 | } 303 | 304 | degreeGraph.edges.checkpoint() 305 | degreeGraph.vertices.checkpoint() 306 | 307 | } 308 | 309 | val dVertices = degreeGraph.vertices.count() 310 | val dEdges = degreeGraph.edges.count() 311 | 312 | println("[Iter " + iters + "] dVertices = " + dVertices + ", dEdges = " + dEdges) 313 | 314 | filteredCount = degreeGraph.vertices.filter { 315 | 
case (vid, degree) => degree < kNum 316 | }.count() 317 | 318 | preDegreeGraph.unpersistVertices() 319 | preDegreeGraph.edges.unpersist() 320 | subGraph.unpersistVertices() 321 | subGraph.edges.unpersist() 322 | 323 | 324 | iters += 1 325 | } while (filteredCount >= 1L) 326 | 327 | println(degreeGraph.vertices.count()) 328 | } 329 | } 330 | ``` 331 | [2] lineage demo,未加checkpoint,第一轮迭代 332 | 333 | ```scala 334 | [Iter 1][DEBUG] (2) EdgeRDD[33] at RDD at EdgeRDD.scala:35 335 | | EdgeRDD ZippedPartitionsRDD2[32] at zipPartitions at ReplicatedVertexView.scala:114 336 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 337 | | MappedRDD[11] at map at Graph.scala:392 338 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 339 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 340 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 341 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 342 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 343 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 344 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 345 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 346 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 347 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 348 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 349 | | ShuffledRDD[31] at partitionBy at ReplicatedVertexView.scala:112 350 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts false false (broadcast) MapPartitionsRDD[30] at mapPartitions at VertexRDD.scala:347 351 | | VertexRDD ZippedPartitionsRDD2[28] at zipPartitions at VertexRDD.scala:174 352 | | VertexRDD, VertexRDD MapPartitionsRDD[18] at mapPartitions at VertexRDD.scala:441 353 | | MapPartitionsRDD[17] at mapPartitions at VertexRDD.scala:457 354 | | ShuffledRDD[16] at ShuffledRDD at RoutingTablePartition.scala:36 355 | +-(2) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[15] at mapPartitions at VertexRDD.scala:452 356 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 357 | | MappedRDD[11] at map at Graph.scala:392 358 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 359 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 360 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 361 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 362 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 363 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 364 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 365 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 366 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 367 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 368 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 369 | | VertexRDD ZippedPartitionsRDD2[26] at zipPartitions at VertexRDD.scala:200 370 | | VertexRDD, VertexRDD MapPartitionsRDD[18] at mapPartitions at VertexRDD.scala:441 371 | | MapPartitionsRDD[17] at mapPartitions at VertexRDD.scala:457 372 | | ShuffledRDD[16] at ShuffledRDD at RoutingTablePartition.scala:36 373 | +-(2) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[15] at mapPartitions at VertexRDD.scala:452 374 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex 
at EdgeRDD.scala:169 375 | | MappedRDD[11] at map at Graph.scala:392 376 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 377 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 378 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 379 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 380 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 381 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 382 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 383 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 384 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 385 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 386 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 387 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[24] at zipPartitions at VertexRDD.scala:301 388 | | VertexRDD, VertexRDD MapPartitionsRDD[18] at mapPartitions at VertexRDD.scala:441 389 | | MapPartitionsRDD[17] at mapPartitions at VertexRDD.scala:457 390 | | ShuffledRDD[16] at ShuffledRDD at RoutingTablePartition.scala:36 391 | +-(2) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[15] at mapPartitions at VertexRDD.scala:452 392 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 393 | | MappedRDD[11] at map at Graph.scala:392 394 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 395 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 396 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 397 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 398 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 399 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 400 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 401 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 402 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 403 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 404 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:100 405 | | ShuffledRDD[23] at ShuffledRDD at MessageToPartition.scala:31 406 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[22] at mapPartitions at GraphImpl.scala:192 407 | | EdgeRDD MapPartitionsRDD[12] at mapPartitionsWithIndex at EdgeRDD.scala:169 408 | | MappedRDD[11] at map at Graph.scala:392 409 | | MappedRDD[10] at distinct at KCoreCommonDebug.scala:115 410 | | ShuffledRDD[9] at distinct at KCoreCommonDebug.scala:115 411 | +-(2) MappedRDD[8] at distinct at KCoreCommonDebug.scala:115 412 | | FilteredRDD[7] at filter at KCoreCommonDebug.scala:112 413 | | MappedRDD[6] at map at KCoreCommonDebug.scala:102 414 | | MappedRDD[5] at repartition at KCoreCommonDebug.scala:101 415 | | CoalescedRDD[4] at repartition at KCoreCommonDebug.scala:101 416 | | ShuffledRDD[3] at repartition at KCoreCommonDebug.scala:101 417 | +-(2) MapPartitionsRDD[2] at repartition at KCoreCommonDebug.scala:101 418 | | D:\graphData\verylarge.txt MappedRDD[1] at textFile at KCoreCommonDebug.scala:100 419 | | D:\graphData\verylarge.txt HadoopRDD[0] at textFile at KCoreCommonDebug.scala:10 420 | ``` 421 | 422 | [3] lineage demo,加checkpoint,第5轮迭代 423 | ```scala 424 | [Iter 5][DEBUG] (2) VertexRDD[113] at RDD at VertexRDD.scala:58 425 | | VertexRDD ZippedPartitionsRDD2[112] at zipPartitions at VertexRDD.scala:200 426 | | VertexRDD 
MapPartitionsRDD[103] at mapPartitions at VertexRDD.scala:127 427 | | VertexRDD ZippedPartitionsRDD2[91] at zipPartitions at VertexRDD.scala:200 428 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 429 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 430 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 431 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 432 | | CheckpointRDD[56] at apply at List.scala:318 433 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 434 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 435 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 436 | | CheckpointRDD[56] at apply at List.scala:318 437 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 438 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 439 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 440 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 441 | | CheckpointRDD[57] at apply at List.scala:318 442 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[89] at zipPartitions at VertexRDD.scala:301 443 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 444 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 445 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 446 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 447 | | CheckpointRDD[56] at apply at List.scala:318 448 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 449 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 450 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 451 | | CheckpointRDD[56] at apply at List.scala:318 452 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 453 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 454 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 455 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 456 | | CheckpointRDD[57] at apply at List.scala:318 457 | | ShuffledRDD[88] at ShuffledRDD at MessageToPartition.scala:31 458 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[87] at mapPartitions at GraphImpl.scala:192 459 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 460 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 461 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 462 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 463 | | CheckpointRDD[57] at apply at List.scala:318 464 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 465 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 466 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 467 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 468 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 469 | | CheckpointRDD[56] at apply at List.scala:318 470 | | VertexRDD ZippedPartitionsRDD2[70] 
at zipPartitions at VertexRDD.scala:200 471 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 472 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 473 | | CheckpointRDD[56] at apply at List.scala:318 474 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 475 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 476 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 477 | | CheckpointRDD[56] at apply at List.scala:318 478 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 479 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 480 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 481 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 482 | | CheckpointRDD[57] at apply at List.scala:318 483 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[110] at zipPartitions at VertexRDD.scala:301 484 | | VertexRDD MapPartitionsRDD[103] at mapPartitions at VertexRDD.scala:127 485 | | VertexRDD ZippedPartitionsRDD2[91] at zipPartitions at VertexRDD.scala:200 486 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 487 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 488 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 489 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 490 | | CheckpointRDD[56] at apply at List.scala:318 491 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 492 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 493 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 494 | | CheckpointRDD[56] at apply at List.scala:318 495 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 496 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 497 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 498 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 499 | | CheckpointRDD[57] at apply at List.scala:318 500 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[89] at zipPartitions at VertexRDD.scala:301 501 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 502 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 503 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 504 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 505 | | CheckpointRDD[56] at apply at List.scala:318 506 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 507 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 508 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 509 | | CheckpointRDD[56] at apply at List.scala:318 510 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 511 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 512 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 513 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 514 | | CheckpointRDD[57] at apply at List.scala:318 515 | | ShuffledRDD[88] at 
ShuffledRDD at MessageToPartition.scala:31 516 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[87] at mapPartitions at GraphImpl.scala:192 517 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 518 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 519 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 520 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 521 | | CheckpointRDD[57] at apply at List.scala:318 522 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 523 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 524 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 525 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 526 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 527 | | CheckpointRDD[56] at apply at List.scala:318 528 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 529 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 530 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 531 | | CheckpointRDD[56] at apply at List.scala:318 532 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 533 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 534 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 535 | | CheckpointRDD[56] at apply at List.scala:318 536 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 537 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 538 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 539 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 540 | | CheckpointRDD[57] at apply at List.scala:318 541 | | ShuffledRDD[109] at ShuffledRDD at MessageToPartition.scala:31 542 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[108] at mapPartitions at GraphImpl.scala:192 543 | | EdgeRDD MapPartitionsRDD[105] at mapPartitions at EdgeRDD.scala:85 544 | | EdgeRDD ZippedPartitionsRDD2[97] at zipPartitions at ReplicatedVertexView.scala:114 545 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 546 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 547 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 548 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 549 | | CheckpointRDD[57] at apply at List.scala:318 550 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 551 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 552 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 553 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 554 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 555 | | CheckpointRDD[56] at apply at List.scala:318 556 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 557 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 558 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at 
VertexRDD.scala:200 559 | | CheckpointRDD[56] at apply at List.scala:318 560 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 561 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 562 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 563 | | CheckpointRDD[56] at apply at List.scala:318 564 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 565 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 566 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 567 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 568 | | CheckpointRDD[57] at apply at List.scala:318 569 | | ShuffledRDD[96] at partitionBy at ReplicatedVertexView.scala:112 570 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[95] at mapPartitions at VertexRDD.scala:347 571 | | VertexRDD ZippedPartitionsRDD2[93] at zipPartitions at VertexRDD.scala:174 572 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 573 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 574 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 575 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 576 | | CheckpointRDD[56] at apply at List.scala:318 577 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 578 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 579 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 580 | | CheckpointRDD[56] at apply at List.scala:318 581 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 582 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 583 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 584 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 585 | | CheckpointRDD[57] at apply at List.scala:318 586 | | VertexRDD ZippedPartitionsRDD2[91] at zipPartitions at VertexRDD.scala:200 587 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 588 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 589 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 590 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 591 | | CheckpointRDD[56] at apply at List.scala:318 592 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 593 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 594 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 595 | | CheckpointRDD[56] at apply at List.scala:318 596 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 597 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 598 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 599 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 600 | | CheckpointRDD[57] at apply at List.scala:318 601 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[89] at zipPartitions at VertexRDD.scala:301 602 | | VertexRDD MapPartitionsRDD[82] at mapPartitions at VertexRDD.scala:127 603 | | VertexRDD 
ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 604 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 605 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 606 | | CheckpointRDD[56] at apply at List.scala:318 607 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 608 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 609 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 610 | | CheckpointRDD[56] at apply at List.scala:318 611 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 612 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 613 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 614 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 615 | | CheckpointRDD[57] at apply at List.scala:318 616 | | ShuffledRDD[88] at ShuffledRDD at MessageToPartition.scala:31 617 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[87] at mapPartitions at GraphImpl.scala:192 618 | | EdgeRDD MapPartitionsRDD[84] at mapPartitions at EdgeRDD.scala:85 619 | | EdgeRDD ZippedPartitionsRDD2[76] at zipPartitions at ReplicatedVertexView.scala:114 620 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 621 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 622 | | CheckpointRDD[57] at apply at List.scala:318 623 | | ShuffledRDD[75] at partitionBy at ReplicatedVertexView.scala:112 624 | +-(2) ReplicatedVertexView.updateVertices - shippedVerts true true (broadcast) MapPartitionsRDD[74] at mapPartitions at VertexRDD.scala:347 625 | | VertexRDD ZippedPartitionsRDD2[72] at zipPartitions at VertexRDD.scala:174 626 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 627 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 628 | | CheckpointRDD[56] at apply at List.scala:318 629 | | VertexRDD ZippedPartitionsRDD2[70] at zipPartitions at VertexRDD.scala:200 630 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 631 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 632 | | CheckpointRDD[56] at apply at List.scala:318 633 | | VertexRDD, GraphOps.degrees ZippedPartitionsRDD2[68] at zipPartitions at VertexRDD.scala:301 634 | | VertexRDD MapPartitionsRDD[61] at mapPartitions at VertexRDD.scala:127 635 | | VertexRDD ZippedPartitionsRDD2[47] at zipPartitions at VertexRDD.scala:200 636 | | CheckpointRDD[56] at apply at List.scala:318 637 | | ShuffledRDD[67] at ShuffledRDD at MessageToPartition.scala:31 638 | +-(2) GraphImpl.mapReduceTriplets - preAgg MapPartitionsRDD[66] at mapPartitions at GraphImpl.scala:192 639 | | EdgeRDD MapPartitionsRDD[63] at mapPartitions at EdgeRDD.scala:85 640 | | EdgeRDD ZippedPartitionsRDD2[53] at zipPartitions at ReplicatedVertexView.scala:114 641 | | CheckpointRDD[57] at apply at List.scala:318 642 | ``` -------------------------------------------------------------------------------- /BigDataSystems/Spark/StackOverflowDiagnosis/figures/g1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g1.png -------------------------------------------------------------------------------- 
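上文提到,为了统计每轮迭代 lineage 的增长,作者修改了 toDebugString(),只输出 depth。下面是按同样思路写的一个简化工具,仅作示意(`LineageDepth` 这个名字和实现都是虚构的,既不是文中实际使用的改法,也不是 Spark 自带 API):沿 dependencies 回溯、在 ShuffleDependency 处截断、取最大深度,并用 RDD id 做记忆化,避免共享的子 lineage 被重复遍历。

```scala
import scala.collection.mutable

import org.apache.spark.ShuffleDependency
import org.apache.spark.rdd.RDD

// 统计一个 RDD 在序列化意义下的 lineage 深度:
// 遇到 ShuffleDependency 即停止回溯(对应序列化链在 ShuffledRDD 处断开)
object LineageDepth {
  def apply(rdd: RDD[_], memo: mutable.Map[Int, Int] = mutable.Map.empty): Int =
    memo.get(rdd.id) match {
      case Some(depth) => depth
      case None =>
        val parentDepths = rdd.dependencies.collect {
          // 只沿窄依赖继续向上回溯
          case dep if !dep.isInstanceOf[ShuffleDependency[_, _, _]] => apply(dep.rdd, memo)
        }
        val depth = 1 + (if (parentDepths.isEmpty) 0 else parentDepths.max)
        memo(rdd.id) = depth
        depth
    }
}
```

例如在每轮迭代里调用 `println(LineageDepth(degreeGraph.vertices))`,可以得到与前面表格类似的深度序列。注意它只反映正常 lineage 的深度,文中由 `f -> $outer` 额外引入的序列化链并不会体现在这个数字里,这也是最初只盯着 lineage 看不出问题的原因之一。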
/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g2.png -------------------------------------------------------------------------------- /BigDataSystems/Spark/StackOverflowDiagnosis/figures/g3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JerryLead/blogs/f559c3ecf1b11f928c9d9b2f0af91e997327ef35/BigDataSystems/Spark/StackOverflowDiagnosis/figures/g3.png --------------------------------------------------------------------------------
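文中 `f -> $outer (VertexRDD) -> partitionsRDD` 这条不正常序列化链的成因,可以用一个与 GraphX 无关的小例子来体会(下面的 Wrapper、baseRdd、scaleBad 等名字均为虚构,只是一个示意,并不是文中 GraphX 的真实代码):当 RDD 变换的闭包引用了外层对象的成员时,闭包会通过 $outer 带上整个外层对象,外层对象里与本次计算无关的 RDD 也会被拖进 task 的序列化链。

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// 演示 "f -> $outer -> 其他RDD" 这类引用是如何产生的
class Wrapper(val baseRdd: RDD[Int]) extends Serializable {
  val multiplier = 3

  // 有问题的写法:闭包引用 this.multiplier,因此 $outer 指向整个 Wrapper,
  // 与计算无关的 baseRdd 也会跟着 task 一起被序列化
  def scaleBad(input: RDD[Int]): RDD[Int] = input.map(_ * multiplier)

  // 常见的规避写法:先拷贝到局部变量,闭包只捕获这个局部值,不再引用 Wrapper
  def scaleGood(input: RDD[Int]): RDD[Int] = {
    val m = multiplier
    input.map(_ * m)
  }
}

object ClosureOuterDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("ClosureOuterDemo"))
    val wrapper = new Wrapper(sc.parallelize(1 to 10))
    val input = sc.parallelize(1 to 5)
    println(wrapper.scaleBad(input).collect().mkString(","))   // 3,6,9,12,15(结果正确,但序列化链更长)
    println(wrapper.scaleGood(input).collect().mkString(","))  // 3,6,9,12,15
    sc.stop()
  }
}
```

规避方式与文中的修复思路是一致的:要么让闭包不再引用外层对象(如 scaleGood 里先拷贝成局部变量),要么像 issue-4672 的修复那样,把不需要随 task 序列化的成员(partitionsRDD)标记为 transient,或在 checkpoint 时把 f 置为 null。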