├── .gitignore
├── README.md
├── auto_schedule
│   ├── __init__.py
│   └── auto_schedule.py
├── config.py
├── student_test.py
└── test_frame.py

/.gitignore:
--------------------------------------------------------------------------------
1 | _auto_schedule.py
2 | _config.py
3 | batch_test.py
4 | *vscode*
5 | *.txt
6 | *cache*
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Compiler Design Project (Main project)
2 | 
3 | ## Project background
4 | In the first part of the course project, students used TVM's tensor description language to implement the forward and backward passes of five convolutional neural network operators: 2D convolution, 2x2 pooling, ReLU, flatten, and fully-connected. Their correctness was verified by comparison against PyTorch and other frameworks. However, the operators implemented in the first part did not take performance into account. Frameworks such as PyTorch are widely used in industry and academia precisely because they provide high-performance operator implementations. To obtain high-performance operators, the second part of the course project focuses on optimizing them.
5 | By combining the two parts of the project, students will be able to build a complete convolutional neural network and carry out fast training and inference.
6 | 
7 | ## Project requirements
8 | This is the second part of the course project. Its goal is to design a workflow that optimizes the performance of the given operators and to implement that workflow as an automated program.
9 | 
10 | The tasks of this part are:
11 | 1. Learn how to use TVM to optimize operators;
12 | 2. Implement an algorithm that automatically optimizes the given operators and effectively improves their performance (i.e., shortens execution time).
13 | 
14 | ## Using the framework code
15 | An auto\_schedule directory is provided. It contains auto\_schedule.py, which implements the auto\_schedule function; the function is already exported in \_\_init\_\_.py, so the package can be imported from the directory that contains auto\_schedule.
16 | For example:
17 | ```python
18 | user@host:$ ls
19 | auto_schedule config.py README.md student_test.py
20 | user@host:$ python
21 | Python 3.5.6 (default, Apr 20 2019, 09:23:07)
22 | [GCC 5.4.0 20160609] on linux
23 | Type "help", "copyright", "credits" or "license" for more information.
24 | >>> import auto_schedule as auto
25 | >>> auto.auto_schedule
26 | <function auto_schedule at 0x...>
27 | ```
28 | The auto\_schedule function you implement, together with any auxiliary structures, must be defined and used inside the auto\_schedule directory.
29 | 
30 | ---
31 | 
32 | **Environment**:
33 | Python 3 + Numpy + PyTorch + tvm v0.5
34 | 
35 | ---
36 | **Testing**:
37 | ```
38 | python student_test.py
39 | ```
40 | The results are printed to stdout, and the score of the run is also recorded in project2\_score.txt.
41 | The tested parameters are defined in config.py; you can target a specific shape by commenting out the other entries in config.py.
42 | The parameter sets are tested in random order, so do not make any assumptions about the test order.
43 | 
44 | ## Miscellaneous
45 | The test framework may have blind spots; please report any problems promptly.
--------------------------------------------------------------------------------
/auto_schedule/__init__.py:
--------------------------------------------------------------------------------
1 | # export your auto_schedule function
2 | from .auto_schedule import auto_schedule
--------------------------------------------------------------------------------
/auto_schedule/auto_schedule.py:
--------------------------------------------------------------------------------
1 | import tvm
2 | 
3 | 
4 | def auto_schedule(func, args):
5 |     """Automatic scheduler
6 | 
7 |     Args:
8 |     -----------------
9 |     func: function object
10 |         similar to the batch_gemm function mentioned above
11 |     args: tuple
12 |         inputs to func
13 |     -----------------
14 |     Returns:
15 |     s: tvm.schedule.Schedule
16 |     bufs: list of tvm.tensor.Tensor
17 |     """
18 |     ops, bufs = func(*args)
19 |     #################################################
20 |     # do something with `ops`, `bufs` and `args`
21 |     # to analyze which schedule is appropriate
22 | 
23 |     s = tvm.create_schedule(ops)
24 | 
25 |     #################################################
26 |     # perform real schedule according to
27 |     # decisions made above, using primitives
28 |     # such as split, reorder, parallel, unroll...
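    # For example (an illustrative sketch only, not a prescribed approach,
    # and the tile factor 32 is an arbitrary choice): if `func` is the
    # batch_gemm workload from the test framework, its single output op has
    # axes (b, i, j) and one reduce axis k, so a simple tiling could be:
    #
    #   C = ops[0] if isinstance(ops, (list, tuple)) else ops
    #   b, i, j = C.axis
    #   k = C.reduce_axis[0]
    #   io, ii = s[C].split(i, factor=32)
    #   jo, ji = s[C].split(j, factor=32)
    #   s[C].reorder(b, io, jo, k, ii, ji)
    #   s[C].parallel(io)
    #   s[C].unroll(ii)
    #   s[C].vectorize(ji)
    #
    # conv2d workloads expose different axes/reduce_axis, so inspect `ops`
    # and `args` first before deciding how to tile.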
29 | 30 | ################################################# 31 | # finally, remember to return these two results 32 | # we need `bufs` to build function via `tvm.build` 33 | return s, bufs -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | conv_shapes = [ 2 | # batch_size, 3 | # in_channel, 4 | # inputs_height, 5 | # inputs_width, 6 | # out_channel, 7 | # channel_per_group, 8 | # kernel_height, 9 | # kernel_width, 10 | # if_bias=0, 11 | # stride=1, 12 | # padding=0, 13 | # dilation=1, 14 | # groups=1, 15 | # dtype="float32" 16 | 17 | (1, 1024, 7, 7, 1024, 1024, 3, 3, 0, 1, 1, 1, 1), # yolo 24 18 | (8, 384, 27, 27, 64, 384, 1, 1, 1, 1, 0, 1, 1), # squeeze-net fire 8 19 | (4, 112, 14, 14, 224, 112, 3, 3, 0, 1, 1, 2, 1), # google-net inception4b-branch2b 20 | ] 21 | 22 | gemm_shapes = [ 23 | # batch 24 | # height 25 | # width 26 | # length 27 | 28 | (1, 1024, 1024, 1024), 29 | (2, 512, 512, 512), 30 | (8, 1024, 32, 1024), 31 | ] -------------------------------------------------------------------------------- /student_test.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import os 3 | import sys 4 | import time 5 | import traceback 6 | import signal 7 | import multiprocessing.pool as pool 8 | import shutil 9 | import tvm 10 | import torch 11 | import numpy as np 12 | 13 | import config 14 | 15 | from imp import find_module, load_module 16 | from math import ceil 17 | from multiprocessing import Process, Queue, Pipe 18 | from multiprocessing import Pool 19 | 20 | 21 | def handler(signum, frame): 22 | raise TimeoutError() 23 | 24 | 25 | def assert_print(a, b="Error!"): 26 | if a == False: 27 | print(b) 28 | 29 | 30 | def torch_gemm(A, B, *arg): 31 | '''Interface of gemm function in pytorch 32 | 33 | Args: 34 | ----------------------------- 35 | A, B : torch.tensor 36 | args for gemm function in pytorch 37 | 38 | *arg : just for uniform interface 39 | ----------------------------- 40 | 41 | Returns: 42 | ----------------------------- 43 | 44 | torch.tensor 45 | ----------------------------- 46 | ''' 47 | return A.bmm(B) 48 | 49 | 50 | def torch_conv2d(inputs, weight, bias=None, stride=1, padding=0, dilation=1, groups=1): 51 | '''Interface of torch.nn.functional.conv2d 52 | 53 | Args: 54 | ----------------------------- 55 | inputs, weight, bias : torch.tensor 56 | first three args for torch.nn.functional.conv2d 57 | ----------------------------- 58 | 59 | Returns: 60 | ----------------------------- 61 | 62 | torch.tensor 63 | ----------------------------- 64 | ''' 65 | return torch.nn.functional.conv2d(inputs, weight, bias=bias, stride=stride, padding=padding, dilation=dilation, groups=groups) 66 | 67 | 68 | def zero_pad2d(inputs, padding=0): 69 | """Zero padding for 2d tensor 70 | 71 | Args: 72 | ----------------------------- 73 | inputs : tvm.tensor.Tensor 74 | shape [batch, channel, height, width] 75 | padding: (optional:0) int or tuple 76 | expected: (h_pad_up, h_pad_down, w_pad_up, w_pad_down) 77 | ----------------------------- 78 | 79 | Returns: 80 | ----------------------------- 81 | tvm.tensor.Tensor 82 | shape [batch, channel, padded_height, padded_width] 83 | ----------------------------- 84 | """ 85 | padding = (padding, padding, padding, padding) if isinstance( 86 | padding, (int, tvm.expr.IntImm)) else padding 87 | assert_print(isinstance(padding, tuple), 88 | 
"type(padding)={}".format(type(padding))) 89 | if len(padding) == 2: 90 | padding = (padding[0], padding[0], padding[1], padding[1]) 91 | assert_print(len(padding) == 4) 92 | 93 | padding_zero = 0.0 if "float" in inputs.dtype else 0 94 | 95 | batch_size, in_channel, height, width = inputs.shape 96 | return tvm.compute( 97 | (batch_size, in_channel, height + 98 | padding[0] + padding[1], width + padding[2] + padding[3]), 99 | lambda b, c, h, w: tvm.if_then_else( 100 | tvm.all(h >= padding[0], h < height + padding[0], 101 | w >= padding[2], w < width + padding[2]), 102 | inputs[b, c, h - padding[0], w - padding[2]], 103 | padding_zero 104 | ) 105 | ) 106 | 107 | 108 | 109 | def batch_gemm(batch, height, width, length, transposeA=False, transposeB=False, dtype="float32"): 110 | """Matrix multiplies matrix 111 | 112 | Args: 113 | ----------------------------- 114 | batch, height, width, length : int 115 | shape of A and B 116 | A: tvm.tensor.Tensor 117 | shape [batch, height, width] 118 | B: tvm.tensor.Tensor 119 | shape [batch, width, length] 120 | 121 | transposeA: (optional:False) bool 122 | 123 | transposeB: (optional:False) bool 124 | ----------------------------- 125 | 126 | Returns: 127 | ----------------------------- 128 | list [tvm.tensor.Tensor.op] 129 | 130 | list of bufs: 131 | shape [A, B, C] 132 | ----------------------------- 133 | """ 134 | A = tvm.placeholder((batch, height, width), dtype=dtype, name="A") 135 | B = tvm.placeholder((batch, width, length), dtype=dtype, name="B") 136 | if transposeA and transposeB: 137 | k = tvm.reduce_axis((0, B.shape[2])) 138 | assert_print(A.shape[1].value == B.shape[2].value) 139 | C = tvm.compute( 140 | (A.shape[0], A.shape[2], B.shape[1]), 141 | lambda b, i, j: tvm.sum(A[b, k, i] * B[b, j, k], axis=k) 142 | ) 143 | elif transposeA and not transposeB: 144 | k = tvm.reduce_axis((0, B.shape[1])) 145 | assert_print(A.shape[1].value == B.shape[1].value) 146 | C = tvm.compute( 147 | (A.shape[0], A.shape[2], B.shape[2]), 148 | lambda b, i, j: tvm.sum(A[b, k, i] * B[b, k, j], axis=k) 149 | ) 150 | elif not transposeA and transposeB: 151 | k = tvm.reduce_axis((0, B.shape[2])) 152 | assert_print(A.shape[2].value == B.shape[2].value) 153 | C = tvm.compute( 154 | (A.shape[0], A.shape[1], B.shape[1]), 155 | lambda b, i, j: tvm.sum(A[b, i, k] * B[b, j, k], axis=k) 156 | ) 157 | else: 158 | k = tvm.reduce_axis((0, B.shape[1])) 159 | assert_print(A.shape[2].value == B.shape[1].value) 160 | C = tvm.compute( 161 | (A.shape[0], A.shape[1], B.shape[2]), 162 | lambda b, i, j: tvm.sum(A[b, i, k] * B[b, k, j], axis=k) 163 | ) 164 | 165 | return [C.op], [A, B, C] 166 | 167 | 168 | def conv2d_nchw(batch_size, in_channel, inputs_height, inputs_width, out_channel, channel_per_group, kernel_height, kernel_width, if_bias=0, stride=1, padding=0, dilation=1, groups=1, dtype="float32"): 169 | """Convolution 2d NCHW layout 170 | 171 | Args: 172 | ----------------------------- 173 | batch_size, in_channel, inputs_height, inputs_width : int 174 | shape of inputs 175 | inputs : tvm.tensor.Tensor 176 | shape [batch, channel, height, width] 177 | 178 | out_channel, channel_per_group, kernel_height, kernel_width : int 179 | shape of weight 180 | weight : tvm.tensor.Tensor 181 | shape [out_channel, channel // groups, kernel_height, kernel_width] 182 | 183 | if_bias : (optional:0) bool 184 | bias : tvm.tensor.Tensor 185 | shape [out_channel] 186 | 187 | stride : (optional:1) int or tuple 188 | 189 | padding : (optional:0) int or tuple 190 | 191 | dilation: (optional:1) int 192 | 
193 | groups : (optional:1) int 194 | ----------------------------- 195 | 196 | Returns: 197 | ----------------------------- 198 | 199 | list:[tvm.tensor.Tensor.op] 200 | 201 | list of bufs: 202 | [inputs, weight, bias, Output] if if_bias 203 | [inputs, weight, Output] otherwise 204 | ----------------------------- 205 | """ 206 | in_h, in_w, k_h, k_w = inputs_height, inputs_width, kernel_height, kernel_width 207 | inputs = tvm.placeholder( 208 | (batch_size, in_channel, in_h, in_w), dtype=dtype, name="inputs") 209 | weight = tvm.placeholder( 210 | (out_channel, channel_per_group, k_h, k_w), dtype=dtype, name="weight") 211 | if if_bias: 212 | bias = tvm.placeholder((out_channel,), dtype=dtype, name="bias") 213 | assert_print(channel_per_group * groups == in_channel) 214 | out_channel_per_group = out_channel // groups 215 | assert_print(out_channel_per_group * groups == out_channel) 216 | 217 | stride = (stride, stride) if isinstance( 218 | stride, (int, tvm.expr.IntImm)) else stride 219 | padding = (padding, padding) if isinstance( 220 | padding, (int, tvm.expr.IntImm)) else padding 221 | dilation = (dilation, dilation) if isinstance( 222 | dilation, (int, tvm.expr.IntImm)) else dilation 223 | assert_print(isinstance(stride, tuple) and len(stride) == 2) 224 | assert_print(isinstance(padding, tuple) and len(padding) == 2) 225 | assert_print(isinstance(dilation, tuple) and len(dilation) == 2) 226 | 227 | out_h = (in_h + 2 * padding[0] - dilation[0] 228 | * (k_h - 1) - 1) // stride[0] + 1 229 | out_w = (in_w + 2 * padding[1] - dilation[1] 230 | * (k_w - 1) - 1) // stride[1] + 1 231 | rc = tvm.reduce_axis((0, channel_per_group)) 232 | rh = tvm.reduce_axis((0, k_h)) 233 | rw = tvm.reduce_axis((0, k_w)) 234 | 235 | padded = zero_pad2d(inputs, padding=padding) 236 | Output = tvm.compute( 237 | (batch_size, out_channel, out_h, out_w), 238 | lambda b, c, h, w: tvm.sum( 239 | (padded[b, c // out_channel_per_group * channel_per_group + rc, 240 | h * stride[0] + rh * dilation[0], w * stride[1] + rw * dilation[1]] 241 | * weight[c, rc, rh, rw]), 242 | axis=[rc, rw, rh] 243 | ) 244 | ) 245 | if if_bias: 246 | Output = tvm.compute( 247 | (batch_size, out_channel, out_h, out_w), 248 | lambda b, c, h, w: Output[b, c, h, w] + bias[c] 249 | ) 250 | return [Output.op], [inputs, weight, bias, Output] 251 | return [Output.op], [inputs, weight, Output] 252 | 253 | 254 | def build_and_run(s, tensors, control_f, shape, time_count, count=10, device_id=0, target="llvm", timeout=10.0): 255 | """ Build and record the time of running. 256 | 257 | Args: 258 | ----------------------------- 259 | s: schedule.Schedule get form the student's auto_schedule 260 | 261 | tensors (list) 262 | the input tensors and the output tensor 263 | 264 | control_f the torch function 265 | 266 | shape 267 | 268 | time_count: used for record the running time 269 | 270 | count: the number rounds repeat testing 271 | 272 | device_id : the id of CPU 273 | ----------------------------- 274 | 275 | Returns: 276 | ----------------------------- 277 | [tvm_time, torch_time]: 278 | [float , flaot] 279 | which indicates 280 | the total time of running scheduled tvm calculation and 281 | the total time of running torch calculation 282 | ----------------------------- 283 | """ 284 | # Create ctx. 
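    # After the CPU context is created below, this function pops the output
    # tensor off `tensors`, crafts random inputs for both TVM and torch,
    # warms up and times the torch reference with time.time() over `count`
    # calls, then builds the schedule with tvm.build and times it via
    # func.time_evaluator under a SIGALRM timeout; [tvm_time, torch_time]
    # (or an error string on failure) is pushed into the `time_count` queue.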
285 | try: 286 | ctx = tvm.cpu(device_id) 287 | except Exception as e: 288 | string = "Can not found device !!!\n" + str(e) 289 | time_count.put([string, -1]) 290 | return -1 291 | 292 | try: 293 | output_tensor = tensors[-1] 294 | del tensors[-1] 295 | except Exception as e: 296 | string = "The input is not correct !!!" + str(e) 297 | time_count.put([string, -1]) 298 | return -1 299 | # Craft input data. 300 | try: 301 | input_tvm = [] 302 | input_torch = [] 303 | 304 | for tensor in tensors: 305 | data = np.random.random( 306 | [int(j) for j in tensor.shape]).astype(np.float32) * 100 307 | tvm_data = tvm.nd.array(data, ctx) 308 | torch_data = torch.tensor(data) 309 | input_tvm.append(tvm_data) 310 | input_torch.append(torch_data) 311 | 312 | output_holder = tvm.nd.array( 313 | np.zeros([int(j) for j in output_tensor.shape], 314 | dtype=output_tensor.dtype), ctx 315 | ) 316 | 317 | input_tvm = input_tvm + [output_holder] 318 | except Exception as e: 319 | string = "Can't prepare input data!!!\n" + str(e) 320 | time_count.put([string, -1]) 321 | return -1 322 | 323 | torch_args = [] 324 | # TODO use shape length to distinguish conv2d and gemm is foolish 325 | # No bias if this is convolution 326 | if len(shape) > 8 and shape[8] == 0: 327 | torch_args.append(None) 328 | torch_args.extend(shape[9:]) 329 | # warm-up 330 | control_f(*(input_torch + torch_args)) 331 | begin = time.time() 332 | for i in range(0, count): 333 | control_f(*(input_torch + torch_args)) 334 | end = time.time() 335 | torch_time = (end - begin) * 1e3 / count 336 | 337 | # Build function form s and tensors. 338 | try: 339 | func = tvm.build(s, tensors + [output_tensor], target=target) 340 | except Exception as e: 341 | string = "Can not build successfully !!!" + str(e) 342 | time_count.put([string, torch_time]) 343 | return -1 344 | 345 | signal.signal(signal.SIGALRM, handler) 346 | signal.alarm(ceil(timeout)) 347 | try: 348 | evaluator = func.time_evaluator(func.entry_name, ctx, number=count) 349 | tvm_time = evaluator(*input_tvm).mean * 1e3 350 | except TimeoutError: 351 | string = "Timeout when evaluating, the limit is {}ms".format(timeout / count * 1e3) 352 | time_count.put([string, torch_time]) 353 | return -1 354 | except Exception as e: 355 | string = "The culation is not correct !!!\n" + str(e) 356 | time_count.put([string, torch_time]) 357 | return -1 358 | finally: 359 | # restore the default handler 360 | signal.signal(signal.SIGALRM,signal.SIG_IGN) 361 | time_count.put([tvm_time, torch_time]) 362 | return 0 363 | 364 | 365 | def _auto_schedule(auto_schedule_func, func, shape, queue, timeout=20 * 60): 366 | '''Interface of auto_schedule 367 | 368 | Args: 369 | ----------------------------- 370 | auto_schedule_func : auto_schedule function 371 | 372 | func : conv2d_nchw or gemm 373 | 374 | shape : args for auto_schedule 375 | ----------------------------- 376 | 377 | Returns: 378 | ----------------------------- 379 | list:[tvm.tensor.Tensor.op] 380 | 381 | list of bufs in func 382 | ----------------------------- 383 | ''' 384 | signal.signal(signal.SIGALRM, handler) 385 | signal.alarm(ceil(timeout)) 386 | try: 387 | s, bufs = auto_schedule_func(func, shape) 388 | except TimeoutError: 389 | string = "Timeout when running `auto_schedule` function, time limit is {}ms".format(timeout * 1e3) 390 | queue.put([string, -1]) 391 | return None, None 392 | except Exception as e: 393 | string = "Error occurs when running `auto_schedule`\n" + str(e) 394 | queue.put([string, -1]) 395 | return None, None 396 | finally: 397 | 
# restore the default handler 398 | signal.signal(signal.SIGALRM,signal.SIG_IGN) 399 | return s, bufs 400 | 401 | 402 | def _evaluate(torch_func, func, shape, time_count, target="llvm", dev_id=0, times=10, timeout_create=20 * 60, timeout_cal=10.0): 403 | '''evaluating auto_schedule in special shape 404 | 405 | Args: 406 | ----------------------------- 407 | torch_func : torch_conv2d or torch_gemm 408 | interface of torch function 409 | 410 | auto_schedule: function from student 411 | 412 | func : conv2d_nchw or gemm 413 | 414 | shape : list 415 | args for func 416 | 417 | target : string 418 | 419 | dev_id : int 420 | 421 | times : int 422 | times of calculating in Build_and_Run 423 | 424 | timeout_create : (optional: 10.0) float 425 | time limit in creating schedule 426 | 427 | timeout_cal : (optional: 10.0) float 428 | time limit in calculating 429 | 430 | time_count : Queue 431 | for testing result transferring 432 | ----------------------------- 433 | 434 | Returns: 435 | ----------------------------- 436 | ''' 437 | # import student module 438 | try: 439 | student_module = load_module('student_module', *find_module('auto_schedule')) 440 | except ImportError as e: 441 | string = 'An error occurs when importing `auto_schedule` module\n' + str(e) 442 | time_count.put([string, -1]) 443 | return -1 444 | 445 | # scheduling 446 | s, bufs = _auto_schedule(student_module.auto_schedule, func, shape, time_count, timeout_create) 447 | if s is None or bufs is None: 448 | return -1 449 | 450 | # evaluating 451 | ret = build_and_run(s, bufs, torch_func, shape, time_count, times, dev_id, target, timeout_cal) 452 | if ret < 0: 453 | return -1 454 | return 0 455 | 456 | 457 | def evaluate(torch_func, func, shape, target="llvm", dev_id=0, timeout_create=20 * 60, timeout_cal=10.0, times=10): 458 | '''evaluating auto_schedule with a single shape 459 | 460 | Args: 461 | ----------------------------- 462 | torch_func : torch_conv2d or torch_gemm 463 | interface of torch function 464 | 465 | func : conv2d_nchw or gemm 466 | 467 | shape : a single shape 468 | args for func 469 | 470 | target : string 471 | 472 | dev_id : (optional: 0) int 473 | 474 | timeout_create : (optional: 10.0) float 475 | time limit in creating schedule 476 | 477 | timeout_cal : (optional: 10.0) float 478 | time limit in calculating 479 | 480 | times : (optional: 10) int 481 | times of calculating in Build_and_Run 482 | 483 | max_proc_num : (optional: 4) int 484 | ----------------------------- 485 | 486 | Returns: 487 | ----------------------------- 488 | list : [auto_time,torch_time] for each shape 489 | ----------------------------- 490 | ''' 491 | assert shape != None, "empty shape!" 
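    # The evaluation runs in a child process so a hanging or crashing
    # auto_schedule cannot take down the whole test run: the parent polls
    # p.is_alive() and terminates the child once the combined timeout
    # (timeout_create + timeout_cal + slack) expires, then reads the result
    # from the `time_count` queue.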
492 | 493 | time_count = Queue() 494 | try: 495 | p = Process(target=_evaluate, args=( 496 | torch_func, func, shape, time_count, target, dev_id, times, timeout_create, 497 | timeout_cal)) 498 | p.start() 499 | except Exception as e: 500 | print("Failed in creating process for shape {}\n{}".format(shape, str(e))) 501 | 502 | # waiting for testing 503 | timeout = timeout_create + timeout_cal + 10.0 # make up for torch wastes 504 | beg = time.time() 505 | try: 506 | while time.time() - beg < timeout: 507 | if p.is_alive(): 508 | time.sleep(.1) 509 | else: 510 | break 511 | else: 512 | p.terminate() 513 | p.join() 514 | except Exception as e: 515 | print("Shape {}: failed in process\n{}".format(shape, str(e))) 516 | 517 | # collecting testing result 518 | ans = [-1, -1] 519 | if not time_count.empty(): 520 | auto_time, torch_time = time_count.get() 521 | if isinstance(auto_time, str): 522 | print("Exceptons occur in shape {}".format(shape)) 523 | print(auto_time) 524 | else: 525 | ans = [auto_time, torch_time] 526 | else: 527 | print("Shape {} can't get results!".format(shape)) 528 | # clean the queue 529 | time_count.close() 530 | del time_count 531 | return ans 532 | 533 | # overload Pool in order to non-daemonize 534 | # class NonDaemonProcess(Process): 535 | # def __init__(self, *args, **kwargs): 536 | # super(Process, self).__init__(*args, **kwargs) 537 | 538 | # def _get_daemon(self): 539 | # return False 540 | 541 | # def _set_daemon(self, value): 542 | # pass 543 | 544 | # daemon = property(_get_daemon, _set_daemon) 545 | 546 | # class NewPool(pool.Pool): 547 | # def __init__(self, *args, **kwargs): 548 | # super(pool.Pool, self).__init__(*args, **kwargs) 549 | 550 | # Process = NonDaemonProcess 551 | 552 | class NewPool(pool.Pool): 553 | def Process(self, *args, **kwds): 554 | proc = super(NewPool, self).Process(*args, **kwds) 555 | 556 | class NonDaemonProcess(proc.__class__): 557 | """Monkey-patch process to ensure it is never daemonized""" 558 | @property 559 | def daemon(self): 560 | return False 561 | 562 | @daemon.setter 563 | def daemon(self, val): 564 | pass 565 | 566 | proc.__class__ = NonDaemonProcess 567 | return proc 568 | 569 | def parallel_evaluate(parallel=1): 570 | """evaluate process 571 | 572 | student level : synchro 573 | operator level : synchro 574 | shape level : asynchro 575 | """ 576 | # dir preparation 577 | res_file = 'project2_score.txt' 578 | res_path = res_file 579 | time_create = 20 * 60 580 | time_cal = 10.0 581 | number_test = 10 582 | 583 | # test coeffs; currently random 584 | conv2d_shapes = config.conv_shapes.copy() 585 | gemm_shapes = config.gemm_shapes.copy() 586 | np.random.shuffle(conv2d_shapes) 587 | np.random.shuffle(gemm_shapes) 588 | score_item = ['gemm_' + str(s) for s in gemm_shapes] + ['conv2d_' + str(s) for s in conv2d_shapes] 589 | target = 'llvm' 590 | 591 | # for stdout logs 592 | start_time = time.time() 593 | 594 | # exception info 595 | # prob_exceptions = ('Import Failure', 'illegal auto_schedule', 'TLE auto_schedule', 'Build Failure', 'TLE run') 596 | 597 | # evaluate func 598 | def pool_evaluate(shapes, veri_func, func, target="llvm"): 599 | # create process Pool for shapes 600 | p = NewPool() 601 | run_time = [] 602 | # exception_stat = [0, 0, 0, 0, 0] 603 | exception_stat = 0 604 | sub_procs = [] 605 | for i, shape in enumerate(shapes): 606 | subp = p.apply_async(evaluate, (veri_func, func, shape, target, i, time_create, time_cal, number_test)) 607 | sub_procs.append(subp) 608 | 609 | p.close() 610 | p.join() 611 | 612 | ret 
= [] 613 | for i, subp in enumerate(sub_procs): 614 | try: 615 | case_time = subp.get(timeout=1.0) 616 | except Exception as e: 617 | print("Can't get shape {} results\n{}".format(shapes[i], str(e))) 618 | case_time = [-1, -1] 619 | if case_time[0] < 0: 620 | exception_stat += 1 621 | ret.append(case_time) 622 | 623 | return ret, exception_stat 624 | 625 | # stdout logs 626 | logs = '\rProcessing begins...' 627 | sys.stdout.write(logs + '\n') 628 | sys.stdout.flush() 629 | 630 | # evaluate 631 | num_gemms = len(gemm_shapes) 632 | outer = ceil(num_gemms / parallel) 633 | gemm_ret = [] 634 | gemm_error_count = 0 635 | for i in range(outer): 636 | part_gemm_ret, part_gemm_error = pool_evaluate(gemm_shapes[i * parallel:(i+1) * parallel], torch_gemm, batch_gemm, target) 637 | gemm_ret.extend(part_gemm_ret) 638 | gemm_error_count += part_gemm_error 639 | 640 | num_convs = len(conv2d_shapes) 641 | outer = ceil(num_convs / parallel) 642 | conv_ret = [] 643 | conv_error_count = 0 644 | for i in range(outer): 645 | part_conv_ret, part_conv_error = pool_evaluate(conv2d_shapes[i * parallel:(i+1) * parallel], torch_conv2d, conv2d_nchw, target) 646 | conv_ret.extend(part_conv_ret) 647 | conv_error_count += part_conv_error 648 | 649 | if gemm_error_count or conv_error_count: 650 | exception_info = ' exception raises in {} cases'.format(gemm_error_count + conv_error_count) 651 | else: 652 | exception_info = ' No exceptions' 653 | 654 | print() 655 | print("#####################################################") 656 | print("The results:\n") 657 | string = "Time costs of GEMMs\n" 658 | for shape, ret in zip(gemm_shapes, gemm_ret): 659 | times = [ret[0] if ret[0] > 0 else "Timeout", ret[1] if ret[1] > 0 else "Not evaluted"] 660 | string += "{}: yours: {}(ms), torch: {}(ms)\n".format(shape, times[0], times[1]) 661 | print(string) 662 | 663 | string = "Time costs of Conv2ds\n" 664 | for shape, ret in zip(conv2d_shapes, conv_ret): 665 | times = [ret[0] if ret[0] > 0 else "Timeout", ret[1] if ret[1] > 0 else "Not evaluted"] 666 | string += "{}: yours: {}(ms), torch: {}(ms)\n".format(shape, times[0], times[1]) 667 | print(string) 668 | 669 | score_list = list(map(score_calculate, gemm_ret + conv_ret)) 670 | 671 | write_score(res_path, score_list, score_item, exception_info) 672 | 673 | # stdout logs 674 | logs = '\rall done!' 
675 | sys.stdout.write(logs + '\n') 676 | sys.stdout.flush() 677 | return 678 | 679 | def write_score(res_file, score_list, score_item, prob_error=''): 680 | """write score into result file 681 | 682 | Parameters 683 | ---------- 684 | student_id: str 685 | res_file: str 686 | path of file to record scores 687 | score_list: list 688 | scores in each test 689 | score_item: list 690 | test names 691 | prob_error: str 692 | exceptions and errors occurring during tests 693 | 694 | Returns 695 | ------- 696 | """ 697 | total_score = sum(score_list) 698 | line = '{}:\n'.format("your scores") 699 | for i in range(len(score_item)): 700 | line += '{}:{}\n'.format(score_item[i], score_list[i]) 701 | line += 'total:{}\n'.format(total_score) 702 | line += 'exceptions:{}\n'.format(prob_error) 703 | with open(res_file, 'w') as f: 704 | f.write(line) 705 | print(line) 706 | return 707 | 708 | 709 | def score_calculate(time_tuple): 710 | """scores based on look-up table 711 | 712 | Parameters 713 | ---------- 714 | time_tuple: list 715 | with format [auto_time, torch_time] 716 | 717 | Returns 718 | ------- 719 | case_score: float 720 | scores calculated based on the look-up table 721 | """ 722 | time_tvm = time_tuple[0] 723 | time_torch = time_tuple[1] 724 | 725 | if time_tvm < 0: 726 | return 0.0 727 | perf_rate = time_torch / time_tvm 728 | if 0 <= perf_rate < 0.1: 729 | return 0.0 730 | elif 0.1 <= perf_rate < 0.2: 731 | return 0.7 732 | elif 0.2 <= perf_rate < 0.3: 733 | return 1.4 734 | elif 0.3 <= perf_rate < 0.4: 735 | return 2.1 736 | elif 0.4 <= perf_rate < 0.5: 737 | return 2.8 738 | elif 0.5 <= perf_rate < 0.6: 739 | return 3.5 740 | elif 0.6 <= perf_rate < 0.7: 741 | return 4.2 742 | elif 0.7 <= perf_rate < 0.8: 743 | return 4.9 744 | elif 0.8 <= perf_rate < 0.9: 745 | return 5.6 746 | else: 747 | return 7.0 748 | 749 | if __name__ == '__main__': 750 | parallel_evaluate() -------------------------------------------------------------------------------- /test_frame.py: -------------------------------------------------------------------------------- 1 | import tvm 2 | import time 3 | import torch 4 | import traceback 5 | import numpy as np 6 | import signal 7 | from math import ceil 8 | from multiprocessing import Process, Queue, Pipe 9 | 10 | # additional modules 11 | from multiprocessing import Pool 12 | import multiprocessing.pool as pool 13 | import os 14 | import sys 15 | import shutil 16 | 17 | from imp import find_module, load_module 18 | 19 | # remove the auto_schedule func 20 | 21 | def handler(signum, frame): 22 | raise TimeoutError() 23 | 24 | 25 | 26 | def assert_print(a, b="Error!"): 27 | if a == False: 28 | print(b) 29 | 30 | 31 | def torch_batch_gemm(A, B, *arg): 32 | '''Interface of gemm function in pytorch 33 | 34 | Args: 35 | ----------------------------- 36 | A, B : torch.tensor 37 | args for gemm function in pytorch 38 | 39 | *arg : just for uniform interface 40 | ----------------------------- 41 | 42 | Returns: 43 | ----------------------------- 44 | 45 | torch.tensor 46 | ----------------------------- 47 | ''' 48 | return torch.bmm(A, B) 49 | 50 | 51 | def torch_conv2d(inputs, weight, *arg): 52 | '''Interface of torch.nn.functional.conv2d 53 | 54 | Args: 55 | ----------------------------- 56 | inputs, weight: torch.tensor 57 | 58 | arg : tuple 59 | (bias, shape) if if_bias=True 60 | (shape) otherwise 61 | ----------------------------- 62 | 63 | Returns: 64 | ----------------------------- 65 | 66 | torch.tensor 67 | ----------------------------- 68 | ''' 69 | if len(arg)==2: 
70 | return torch.nn.functional.conv2d(inputs, weight, arg[0], *arg[1][9:]) 71 | else: 72 | return torch.nn.functional.conv2d(inputs, weight, None, *arg[0][9:]) 73 | 74 | 75 | def zero_pad2d(inputs, padding=0): 76 | """Zero padding for 2d tensor 77 | 78 | Args: 79 | ----------------------------- 80 | inputs : tvm.tensor.Tensor 81 | shape [batch, channel, height, width] 82 | padding: (optional:0) int or tuple 83 | expected: (h_pad_up, h_pad_down, w_pad_up, w_pad_down) 84 | ----------------------------- 85 | 86 | Returns: 87 | ----------------------------- 88 | tvm.tensor.Tensor 89 | shape [batch, channel, padded_height, padded_width] 90 | ----------------------------- 91 | """ 92 | padding = (padding, padding, padding, padding) if isinstance( 93 | padding, (int, tvm.expr.IntImm)) else padding 94 | assert_print(isinstance(padding, tuple), 95 | "type(padding)={}".format(type(padding))) 96 | if len(padding) == 2: 97 | padding = (padding[0], padding[0], padding[1], padding[1]) 98 | assert_print(len(padding) == 4) 99 | 100 | padding_zero = 0.0 if "float" in inputs.dtype else 0 101 | 102 | batch_size, in_channel, height, width = inputs.shape 103 | return tvm.compute( 104 | (batch_size, in_channel, height + 105 | padding[0] + padding[1], width + padding[2] + padding[3]), 106 | lambda b, c, h, w: tvm.if_then_else( 107 | tvm.all(h >= padding[0], h < height + padding[0], 108 | w >= padding[2], w < width + padding[2]), 109 | inputs[b, c, h - padding[0], w - padding[2]], 110 | padding_zero 111 | ) 112 | ) 113 | 114 | 115 | def batch_gemm(batch, height, width, length, transposeA=False, transposeB=False, dtype="float32"): 116 | """Batched matrix multiplies matrix 117 | 118 | Args: 119 | ----------------------------- 120 | height, width, length : int 121 | shape of A and B 122 | A: tvm.tensor.Tensor 123 | shape [batch, height, width] 124 | B: tvm.tensor.Tensor 125 | shape [batch, width, length] 126 | 127 | transposeA: (optional:False) bool 128 | 129 | transposeB: (optional:False) bool 130 | ----------------------------- 131 | 132 | Returns: 133 | ----------------------------- 134 | list [tvm.tensor.Tensor.op] 135 | 136 | list of bufs: 137 | shape [A, B, C] 138 | ----------------------------- 139 | """ 140 | A = tvm.placeholder((batch, height, width), dtype=dtype, name="A") 141 | B = tvm.placeholder((batch, width, length), dtype=dtype, name="B") 142 | if transposeA and transposeB: 143 | k = tvm.reduce_axis((0, B.shape[2])) 144 | assert_print(A.shape[1].value == B.shape[2].value) 145 | C = tvm.compute( 146 | (A.shape[0], A.shape[2], B.shape[1]), 147 | lambda b, i, j: tvm.sum(A[b, k, i] * B[b, j, k], axis=k) 148 | ) 149 | elif transposeA and not transposeB: 150 | k = tvm.reduce_axis((0, B.shape[1])) 151 | assert_print(A.shape[1].value == B.shape[1].value) 152 | C = tvm.compute( 153 | (A.shape[0], A.shape[2], B.shape[2]), 154 | lambda b, i, j: tvm.sum(A[b, k, i] * B[b, k, j], axis=k) 155 | ) 156 | elif not transposeA and transposeB: 157 | k = tvm.reduce_axis((0, B.shape[2])) 158 | assert_print(A.shape[2].value == B.shape[2].value) 159 | C = tvm.compute( 160 | (A.shape[0], A.shape[1], B.shape[1]), 161 | lambda b, i, j: tvm.sum(A[b, i, k] * B[b, j, k], axis=k) 162 | ) 163 | else: 164 | k = tvm.reduce_axis((0, B.shape[1])) 165 | assert_print(A.shape[2].value == B.shape[1].value) 166 | C = tvm.compute( 167 | (A.shape[0], A.shape[1], B.shape[2]), 168 | lambda b, i, j: tvm.sum(A[b, i, k] * B[b, k, j], axis=k) 169 | ) 170 | return [C.op], [A, B, C] 171 | 172 | 173 | def conv2d_nchw(batch_size, in_channel, 
inputs_height, inputs_width, out_channel, channel_per_group, kernel_height, kernel_width, if_bias=0, stride=1, padding=0, dilation=1, groups=1, dtype="float32"): 174 | """Convolution 2d NCHW layout 175 | 176 | Args: 177 | ----------------------------- 178 | batch_size, in_channel, inputs_height, inputs_width : int 179 | shape of inputs 180 | inputs : tvm.tensor.Tensor 181 | shape [batch, channel, height, width] 182 | 183 | out_channel, channel_per_group, kernel_height, kernel_width : int 184 | shape of weight 185 | weight : tvm.tensor.Tensor 186 | shape [out_channel, channel // groups, kernel_height, kernel_width] 187 | 188 | if_bias : (optional:0) bool 189 | bias : tvm.tensor.Tensor 190 | shape [out_channel] 191 | 192 | stride : (optional:1) int or tuple 193 | 194 | padding : (optional:0) int or tuple 195 | 196 | dilation: (optional:1) int 197 | 198 | groups : (optional:1) int 199 | ----------------------------- 200 | 201 | Returns: 202 | ----------------------------- 203 | 204 | list:[tvm.tensor.Tensor.op] 205 | 206 | list of bufs: 207 | [inputs, weight, bias, Output] if if_bias 208 | [inputs, weight, Output] otherwise 209 | ----------------------------- 210 | """ 211 | in_h, in_w, k_h, k_w = inputs_height, inputs_width, kernel_height, kernel_width 212 | inputs = tvm.placeholder( 213 | (batch_size, in_channel, in_h, in_w), dtype=dtype, name="inputs") 214 | weight = tvm.placeholder( 215 | (out_channel, channel_per_group, k_h, k_w), dtype=dtype, name="weight") 216 | if if_bias: 217 | bias = tvm.placeholder((out_channel,), dtype=dtype, name="bias") 218 | assert_print(channel_per_group * groups == in_channel) 219 | out_channel_per_group = out_channel // groups 220 | assert_print(out_channel_per_group * groups == out_channel) 221 | 222 | stride = (stride, stride) if isinstance( 223 | stride, (int, tvm.expr.IntImm)) else stride 224 | padding = (padding, padding) if isinstance( 225 | padding, (int, tvm.expr.IntImm)) else padding 226 | dilation = (dilation, dilation) if isinstance( 227 | dilation, (int, tvm.expr.IntImm)) else dilation 228 | assert_print(isinstance(stride, tuple) and len(stride) == 2) 229 | assert_print(isinstance(padding, tuple) and len(padding) == 2) 230 | assert_print(isinstance(dilation, tuple) and len(dilation) == 2) 231 | 232 | out_h = (in_h + 2 * padding[0] - dilation[0] 233 | * (k_h - 1) - 1) // stride[0] + 1 234 | out_w = (in_w + 2 * padding[1] - dilation[1] 235 | * (k_w - 1) - 1) // stride[1] + 1 236 | rc = tvm.reduce_axis((0, channel_per_group)) 237 | rh = tvm.reduce_axis((0, k_h)) 238 | rw = tvm.reduce_axis((0, k_w)) 239 | 240 | padded = zero_pad2d(inputs, padding=padding) 241 | Output = tvm.compute( 242 | (batch_size, out_channel, out_h, out_w), 243 | lambda b, c, h, w: tvm.sum( 244 | (padded[b, c // out_channel_per_group * channel_per_group + rc, 245 | h * stride[0] + rh * dilation[0], w * stride[1] + rw * dilation[1]] 246 | * weight[c, rc, rh, rw]), 247 | axis=[rc, rw, rh] 248 | ) 249 | ) 250 | if if_bias: 251 | Output = tvm.compute( 252 | (batch_size, out_channel, out_h, out_w), 253 | lambda b, c, h, w: Output[b, c, h, w] + bias[c] 254 | ) 255 | return [Output.op], [inputs, weight, bias, Output] 256 | return [Output.op], [inputs, weight, Output] 257 | 258 | 259 | def build_and_run(s, Tensor, control_f, shape, time_count, timeout_build, timeout_cal, count=20, device_id=0, tar="llvm"): 260 | """ Build and record the time of running. 
261 | 262 | Args: 263 | ----------------------------- 264 | s : schedule.Schedule get form the student's auto_schedule 265 | 266 | Tensor : (list) 267 | the input tensors and the output tensor 268 | 269 | control_f : the torch function 270 | 271 | shape : arg for control_f 272 | 273 | time_count : used for record the running time 274 | 275 | timeout_build:time limit for building 276 | 277 | timeout_cal : time limit for culation 278 | 279 | count : the number rounds repeat testing 280 | 281 | device_id : the id of CPU 282 | ----------------------------- 283 | 284 | Returns: 285 | ----------------------------- 286 | [tvm_time, torch_time]: 287 | [float , flaot] 288 | which indicates 289 | the total time of running scheduled tvm calculation and 290 | the total time of running torch calculation 291 | ----------------------------- 292 | """ 293 | # Create ctx. 294 | try: 295 | ctx = tvm.cpu(device_id) 296 | except: 297 | print("Can not found device !!!") 298 | time_count.put([-1, -1]) 299 | return -1 300 | # Build function form s and Tensor. 301 | try: 302 | timelimit=ceil(timeout_build) 303 | signal.signal(signal.SIGALRM, handler) 304 | signal.alarm(timelimit) 305 | begin = time.time() 306 | f = tvm.build(s, Tensor, name="my_op") 307 | timepass = time.time()-begin 308 | signal.signal(signal.SIGALRM,signal.SIG_IGN) 309 | if timepass>timeout_build: 310 | print("Timeout in building!") 311 | return -1 312 | except: 313 | traceback.print_exc() 314 | print("Can not build successfully !!!") 315 | time_count.put([-1, -1]) 316 | return -1 317 | try: 318 | Output_tensor = Tensor[-1] 319 | del Tensor[-1] 320 | except: 321 | print("The input is not correct !!!") 322 | time_count.put([-1, -1]) 323 | return -1 324 | # Craft input data. 325 | try: 326 | Input_tvm_batch = [] 327 | Input_torch_batch = [] 328 | for it in range(0, count): 329 | Input_tvm_data = [] 330 | Input_torch_data = [] 331 | 332 | for i in Tensor: 333 | data = np.random.random( 334 | [int(j) for j in i.shape]).astype(np.float32) * 100 335 | tvm_data = tvm.nd.array(data, ctx) 336 | torch_data = torch.tensor(data) 337 | Input_tvm_data.append(tvm_data) 338 | Input_torch_data.append(torch_data) 339 | 340 | Output_holder = tvm.nd.array( 341 | np.zeros([int(j) for j in Output_tensor.shape], 342 | dtype=Output_tensor.dtype), ctx 343 | ) 344 | 345 | Input_tvm_batch.append(Input_tvm_data + [Output_holder]) 346 | Input_torch_batch.append(Input_torch_data) 347 | except: 348 | traceback.print_exc() 349 | print("Can not create input datas !!!") 350 | time_count.put([-1, -1]) 351 | return -1 352 | 353 | try: 354 | f(*Input_tvm_batch[0]) 355 | timelimit = ceil(timeout_cal) 356 | signal.signal(signal.SIGALRM, handler) 357 | signal.alarm(timelimit) 358 | begin = time.time() 359 | for i in range(0, count): 360 | f(*Input_tvm_batch[i]) 361 | tvm_time=time.time()-begin 362 | signal.signal(signal.SIGALRM, signal.SIG_IGN) 363 | if tvm_time>timeout_cal: 364 | print("Results of shape", shape, "Timeout!") 365 | tvm_time = -1 366 | else: 367 | tvm_time/=count 368 | except TimeoutError: 369 | tvm_time = -1 370 | print("Results of shape", shape, "Timeout!") 371 | except: 372 | tvm_time = -1 373 | print("Results of shape", shape, "\n| The culation is not correct !!!") 374 | 375 | try: 376 | control_f(*(Input_torch_batch[0]+[shape])) 377 | begin = time.time() 378 | for i in range(0, count): 379 | control_f(*(Input_torch_batch[i]+[shape])) 380 | torch_time=time.time()-begin 381 | torch_time/=count 382 | except TimeoutError: 383 | torch_time = -1 384 | print("Results of 
shape", shape, "Timeout!") 385 | except: 386 | torch_time = -1 387 | print("Results of shape", shape, "\n| The culation is not correct !!!") 388 | 389 | print("Results of shape", shape, " \n| your time:", tvm_time, " s| pytorch time:", torch_time, "s\n") 390 | time_count.put([tvm_time, torch_time]) 391 | 392 | 393 | def _auto_schedule(auto_schedule_func, func, shape, timeout_create): 394 | '''Interface of auto_schedule 395 | 396 | Args: 397 | ----------------------------- 398 | auto_schedule_func : auto_schedule function 399 | 400 | func : conv2d_nchw or gemm 401 | 402 | shape : args for auto_schedule 403 | 404 | timeout_create : time limit for auto_schedule_func 405 | ----------------------------- 406 | 407 | Returns: 408 | ----------------------------- 409 | list:[tvm.tensor.Tensor.op] 410 | 411 | list of bufs in func 412 | 413 | timepass: float 414 | ----------------------------- 415 | ''' 416 | 417 | timelimit=ceil(timeout_create) 418 | signal.signal(signal.SIGALRM, handler) 419 | signal.alarm(timelimit) 420 | time_now=time.time() 421 | s, bufs = auto_schedule_func(func, shape) 422 | timepass = time.time()-time_now 423 | signal.signal(signal.SIGALRM,signal.SIG_IGN) 424 | return s, bufs, timepass 425 | 426 | 427 | def _evaluate(torch_func, func, shape, target, dev_id, times, timeout_create, timeout_build, timeout_cal, time_count): 428 | '''evaluating auto_schedule in special shape 429 | 430 | Args: 431 | ----------------------------- 432 | torch_func : torch_conv2d or torch_batch_gemm 433 | interface of torch function 434 | 435 | auto_schedule : function from student 436 | 437 | func : conv2d_nchw or batch_gemm 438 | 439 | shape : list 440 | args for func 441 | 442 | target : string 443 | 444 | dev_id : int 445 | 446 | times : int 447 | times of calculating in Build_and_Run 448 | 449 | timeout_create : float 450 | time limit in creating schedule 451 | 452 | timeout_build : float 453 | time limit in building 454 | 455 | timeout_cal : float 456 | time limit in calculating 457 | 458 | time_count : Queue 459 | for testing result transferring 460 | ----------------------------- 461 | 462 | Returns: 463 | ----------------------------- 464 | None 465 | ----------------------------- 466 | ''' 467 | 468 | 469 | # import student module 470 | try: 471 | student_module = load_module('student_module', *find_module(student_id, ['../extract'])) 472 | except ImportError: 473 | score_list = [0 for i in range(20)] 474 | write_score(student_id, res_path, score_list, score_item, 'Error in Importing') 475 | print('An error occurs when importing the file as module:', unpack_path) 476 | return -1 477 | 478 | # testing if auto_schedule work with time limit 479 | try: 480 | s, bufs, timepass = _auto_schedule(student_module.auto_schedule, func, shape, timeout_create) 481 | except: 482 | traceback.print_exc() 483 | print("failed in auto_schedule!") 484 | return -1 485 | 486 | if timepass>timeout_create: 487 | print("timeout in auto_schedule!") 488 | return -1 489 | 490 | # testing calculating speed in Build_and_Run with time limit 491 | try: 492 | build_and_run(s, bufs, torch_func, shape, time_count, timeout_build, 493 | timeout_cal, times, dev_id, target) 494 | except Exception as e: 495 | print("failed in build_and_run!") 496 | print(e) 497 | return -1 498 | 499 | 500 | def evaluate(torch_func, student_id, func, shape, target, dev_id=0, timeout_create=10.0, timeout_build=10.0, timeout_cal=10.0, times=10): 501 | '''evaluating auto_schedule with a single shape 502 | 503 | Args: 504 | ----------------------------- 
505 | torch_func : torch_conv2d or torch_batch_gemm 506 | interface of torch function 507 | 508 | student_id : student module name 509 | 510 | func : conv2d_nchw or batch_gemm 511 | 512 | shape : a single shape 513 | args for func 514 | 515 | target : string 516 | 517 | dev_id : (optional: 0) int 518 | 519 | timeout_create : (optional: 10.0) float 520 | time limit in creating schedule 521 | 522 | timeout_cal : (optional: 10.0) float 523 | time limit in calculating 524 | 525 | timeout_build : (optional: 10.0) float 526 | time limit in building 527 | 528 | times : (optional: 10) int 529 | times of calculating in Build_and_Run 530 | ----------------------------- 531 | 532 | Returns: 533 | ----------------------------- 534 | list : [auto_time,torch_time] for each shape 535 | ----------------------------- 536 | ''' 537 | assert shape != None, "empty shape!" 538 | 539 | # proc = [] 540 | # number = len(shape) 541 | # time_count = [Queue() for i in range(number)] 542 | time_count = Queue() 543 | try: 544 | p = Process(target=_evaluate, args=( 545 | torch_func, student_id, func, shape, target, dev_id, times, timeout_create, timeout_build, 546 | timeout_cal, time_count)) 547 | p.start() 548 | # proc.append(p) 549 | except: 550 | print("failed in creating process!") 551 | 552 | # waiting for testing 553 | timeout = timeout_create+timeout_cal+timeout_build 554 | beg = time.time() 555 | try: 556 | while time.time() - beg < timeout: 557 | if p.is_alive(): 558 | time.sleep(.1) 559 | else: 560 | break 561 | ''' 562 | if any(p.is_alive() for p in proc): 563 | time.sleep(.1) 564 | else: 565 | break 566 | ''' 567 | else: 568 | p.terminate() 569 | p.join() 570 | except: 571 | print("failed in waiting for evaluating") 572 | 573 | # collecting testing result 574 | # ans = [[[-1] for col in range(2)] for i in range(number)] 575 | ans = [-1, -1] 576 | if not time_count.empty(): 577 | auto_time, torch_time = time_count.get() 578 | if torch_time == -1: 579 | print("time out in torch fuction!") 580 | elif auto_time != -1: 581 | ans = [auto_time, torch_time] 582 | else: 583 | print("failed in auto_schedule or time out!") 584 | 585 | return ans 586 | 587 | # overload Pool in order to non-daemonize 588 | class NonDaemonProcess(Process): 589 | def _get_daemon(self): 590 | return False 591 | def _set_daemon(self, value): 592 | pass 593 | daemon = property(_get_daemon, _set_daemon) 594 | 595 | class NewPool(pool.Pool): 596 | Process = NonDaemonProcess 597 | 598 | def parallel_evaluate(): 599 | """evaluate process 600 | 601 | student level : synchro 602 | operator level : synchro 603 | shape level : asynchro 604 | """ 605 | # dir preparation 606 | sub_dir = '../submits' 607 | res_dir = '../results' 608 | imp_dir = '../extract' 609 | res_file = 'project2_score.txt' 610 | res_path = os.path.join(res_dir, res_file) 611 | score_item = ['gemm' + str(i) for i in range(10)] + ['conv2d' + str(i) for i in range(10)] 612 | 613 | if os.path.exists(res_dir) and os.path.isdir(res_dir) and os.listdir(res_dir): 614 | print(res_dir, "is not empty, you'd better copy the contents and clean it, now exit...") 615 | return 616 | if not os.path.exists(res_dir) or not os.path.isdir(res_dir): 617 | os.mkdir(res_dir) 618 | if os.path.exists(imp_dir) and os.path.isdir(imp_dir) and os.listdir(imp_dir): 619 | print(imp_dir, "is not empty. 
Automatically clean it, now continue...") 620 | shutil.rmtree(imp_dir, ignore_errors=True) 621 | if not os.path.exists(imp_dir) or not os.path.isdir(imp_dir): 622 | os.mkdir(imp_dir) 623 | 624 | total_tasks = list(os.listdir(sub_dir)) 625 | 626 | # test coeffs; currently random 627 | conv2d_shapes = [[4, 6, 7, 7, 9, 2, 3, 3, 1, 2, 1, 2, 3] for i in range(10)] 628 | gemm_shapes = [[4, 6, 7] for i in range(10)] 629 | target = 'llvm' 630 | 631 | # for stdout logs 632 | count_task = 0 633 | num_tasks = len(total_tasks) 634 | start_time = time.time() 635 | 636 | # exception info 637 | # prob_exceptions = ('Import Failure', 'illegal auto_schedule', 'TLE auto_schedule', 'Build Failure', 'TLE run') 638 | 639 | # evaluate func 640 | def pool_evaluate(shapes, veri_func, student_id, func, target): 641 | # create process Pool for shapes 642 | p = NewPool() 643 | run_time = [] 644 | # exception_stat = [0, 0, 0, 0, 0] 645 | exception_stat = 0 646 | sub_procs = [] 647 | for shape in shapes: 648 | subp = p.apply_async(evaluate, (veri_func, student_id, func, shape, target)) 649 | sub_procs.append(subp) 650 | ''' 651 | if case_time <= -1: 652 | exception_stat[-1 - case_time] += 1 653 | run_time.append([1, 0]) 654 | else: 655 | run_time.append(case_time) 656 | ''' 657 | p.close() 658 | p.join() 659 | 660 | for subp in sub_procs: 661 | case_time = subp.get() 662 | if case_time[0] == -1: 663 | exception_stat += 1 664 | run_time.append([1, 0]) 665 | else: 666 | run_time.append(case_time) 667 | score_list = list(map(score_calculate, run_time)) 668 | 669 | return score_list, exception_stat 670 | 671 | for filezip in total_tasks: 672 | # stdout logs 673 | count_task += 1 674 | logs = '\rprocessing:{} | [finished/total] = [{}/{}] | [passed: {}s]'.format( 675 | filezip, count_task, num_tasks, int(time.time() - start_time)) 676 | sys.stdout.write(logs + '\n') 677 | sys.stdout.flush() 678 | 679 | # parse the packed archive 680 | zip_path = os.path.join(sub_dir, filezip) 681 | student_id = filezip.split('.')[0] 682 | unpack_path = os.path.join(imp_dir, student_id + '/') 683 | try: 684 | shutil.unpack_archive(zip_path, unpack_path) 685 | except (ValueError, ReadError): 686 | score_list = [0 for i in range(20)] 687 | write_score(student_id, res_path, score_list, score_item, 'Error in Unpacking') 688 | print('An error occurs when unpacking the archive:', filezip) 689 | continue 690 | 691 | # evaluate 692 | gemm_scores, gemm_exc = pool_evaluate(gemm_shapes, torch_batch_gemm, student_id, batch_gemm, target) 693 | conv_scores, conv_exc = pool_evaluate(conv2d_shapes, torch_conv2d, student_id, conv2d_nchw, target) 694 | 695 | if gemm_exc + conv_exc: 696 | exception_info = ' exception raises in {} cases'.format(gemm_exc + conv_exc) 697 | else: 698 | exception_info = ' No exceptions' 699 | ''' 700 | for i in range(5): 701 | if gemm_exception[i] + conv_exception[i] > 0: 702 | exception_info += prob_exceptions[i] + 'in {} cases'.format(gemm_exception[i] + conv_exception[i]) 703 | ''' 704 | score_list = gemm_scores + conv_scores 705 | 706 | write_score(student_id, res_path, score_list, score_item, exception_info) 707 | 708 | return 709 | 710 | def write_score(student_id, res_file, score_list, score_item, prob_error=''): 711 | """write score into result file 712 | 713 | Parameters 714 | ---------- 715 | student_id: str 716 | res_file: str 717 | path of file to record scores 718 | score_list: list 719 | scores in each test 720 | score_item: list 721 | test names 722 | prob_error: str 723 | exceptions and errors occurring during tests 
724 | 725 | Returns 726 | ------- 727 | """ 728 | total_score = sum(score_list) 729 | line = '{}: '.format(student_id) 730 | for i in range(len(score_item)): 731 | line += '{}:{} '.format(score_item[i], score_list[i]) 732 | line += 'total:{} '.format(total_score) 733 | line += 'exceptions:{}\n'.format(prob_error) 734 | with open(res_file, 'a') as f: 735 | f.write(line) 736 | return 737 | 738 | def score_calculate(time_tuple): 739 | """scores based on look-up table 740 | 741 | Parameters 742 | ---------- 743 | time_tuple: list 744 | with format [auto_time, torch_time] 745 | 746 | Returns 747 | ------- 748 | case_score: float 749 | scores calculated based on the look-up table 750 | """ 751 | time_tvm = time_tuple[0] 752 | time_torch = time_tuple[1] 753 | 754 | if time_tvm == -1: 755 | return -1 756 | perf_rate = time_torch / time_tvm 757 | if perf_rate <= 0.1: 758 | return 0 759 | elif 0.1 < perf_rate <= 0.2: 760 | return 0.5 761 | elif 0.2 < perf_rate <= 0.4: 762 | return 1.5 763 | elif 0.4 < perf_rate <= 0.5: 764 | return 2.5 765 | elif 0.5 < perf_rate <= 0.7: 766 | return 4.0 767 | else: 768 | return 5.0 769 | 770 | if __name__ == '__main__': 771 | parallel_evaluate() 772 | --------------------------------------------------------------------------------