├── LICENSE ├── README.md ├── attachments └── overview.png ├── doc ├── ecdsa.md └── guide │ └── aws-f1-usage-cn.md └── test ├── helloworld_ocl ├── README.md ├── hw │ ├── sdaccel_profile_summary.csv │ ├── sdaccel_profile_summary.html │ ├── sdaccel_timeline_trace.csv │ └── sdaccel_timeline_trace.html └── hw_emu │ ├── emconfig.json │ ├── emulation_debug.log │ ├── sdaccel_profile_summary.csv │ ├── sdaccel_profile_summary.html │ ├── sdaccel_timeline_trace.csv │ └── sdaccel_timeline_trace.html └── vector_addition_1000 ├── reports ├── sdaccel_profile_summary.csv ├── sdaccel_profile_summary.html ├── sdaccel_timeline_trace.csv └── sdaccel_timeline_trace.html └── src ├── host.cpp ├── krnl_vadd.cl └── vadd.h /LICENSE: -------------------------------------------------------------------------------- 1 | GNU LESSER GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Copyright (C) 2007 Free Software Foundation, Inc. 5 | Everyone is permitted to copy and distribute verbatim copies 6 | of this license document, but changing it is not allowed. 7 | 8 | 9 | This version of the GNU Lesser General Public License incorporates 10 | the terms and conditions of version 3 of the GNU General Public 11 | License, supplemented by the additional permissions listed below. 12 | 13 | 0. Additional Definitions. 14 | 15 | As used herein, "this License" refers to version 3 of the GNU Lesser 16 | General Public License, and the "GNU GPL" refers to version 3 of the GNU 17 | General Public License. 18 | 19 | "The Library" refers to a covered work governed by this License, 20 | other than an Application or a Combined Work as defined below. 21 | 22 | An "Application" is any work that makes use of an interface provided 23 | by the Library, but which is not otherwise based on the Library. 24 | Defining a subclass of a class defined by the Library is deemed a mode 25 | of using an interface provided by the Library. 
26 | 27 | A "Combined Work" is a work produced by combining or linking an 28 | Application with the Library. The particular version of the Library 29 | with which the Combined Work was made is also called the "Linked 30 | Version". 31 | 32 | The "Minimal Corresponding Source" for a Combined Work means the 33 | Corresponding Source for the Combined Work, excluding any source code 34 | for portions of the Combined Work that, considered in isolation, are 35 | based on the Application, and not on the Linked Version. 36 | 37 | The "Corresponding Application Code" for a Combined Work means the 38 | object code and/or source code for the Application, including any data 39 | and utility programs needed for reproducing the Combined Work from the 40 | Application, but excluding the System Libraries of the Combined Work. 41 | 42 | 1. Exception to Section 3 of the GNU GPL. 43 | 44 | You may convey a covered work under sections 3 and 4 of this License 45 | without being bound by section 3 of the GNU GPL. 46 | 47 | 2. Conveying Modified Versions. 48 | 49 | If you modify a copy of the Library, and, in your modifications, a 50 | facility refers to a function or data to be supplied by an Application 51 | that uses the facility (other than as an argument passed when the 52 | facility is invoked), then you may convey a copy of the modified 53 | version: 54 | 55 | a) under this License, provided that you make a good faith effort to 56 | ensure that, in the event an Application does not supply the 57 | function or data, the facility still operates, and performs 58 | whatever part of its purpose remains meaningful, or 59 | 60 | b) under the GNU GPL, with none of the additional permissions of 61 | this License applicable to that copy. 62 | 63 | 3. Object Code Incorporating Material from Library Header Files. 64 | 65 | The object code form of an Application may incorporate material from 66 | a header file that is part of the Library. 
You may convey such object 67 | code under terms of your choice, provided that, if the incorporated 68 | material is not limited to numerical parameters, data structure 69 | layouts and accessors, or small macros, inline functions and templates 70 | (ten or fewer lines in length), you do both of the following: 71 | 72 | a) Give prominent notice with each copy of the object code that the 73 | Library is used in it and that the Library and its use are 74 | covered by this License. 75 | 76 | b) Accompany the object code with a copy of the GNU GPL and this license 77 | document. 78 | 79 | 4. Combined Works. 80 | 81 | You may convey a Combined Work under terms of your choice that, 82 | taken together, effectively do not restrict modification of the 83 | portions of the Library contained in the Combined Work and reverse 84 | engineering for debugging such modifications, if you also do each of 85 | the following: 86 | 87 | a) Give prominent notice with each copy of the Combined Work that 88 | the Library is used in it and that the Library and its use are 89 | covered by this License. 90 | 91 | b) Accompany the Combined Work with a copy of the GNU GPL and this license 92 | document. 93 | 94 | c) For a Combined Work that displays copyright notices during 95 | execution, include the copyright notice for the Library among 96 | these notices, as well as a reference directing the user to the 97 | copies of the GNU GPL and this license document. 98 | 99 | d) Do one of the following: 100 | 101 | 0) Convey the Minimal Corresponding Source under the terms of this 102 | License, and the Corresponding Application Code in a form 103 | suitable for, and under terms that permit, the user to 104 | recombine or relink the Application with a modified version of 105 | the Linked Version to produce a modified Combined Work, in the 106 | manner specified by section 6 of the GNU GPL for conveying 107 | Corresponding Source. 
108 | 109 | 1) Use a suitable shared library mechanism for linking with the 110 | Library. A suitable mechanism is one that (a) uses at run time 111 | a copy of the Library already present on the user's computer 112 | system, and (b) will operate properly with a modified version 113 | of the Library that is interface-compatible with the Linked 114 | Version. 115 | 116 | e) Provide Installation Information, but only if you would otherwise 117 | be required to provide such information under section 6 of the 118 | GNU GPL, and only to the extent that such information is 119 | necessary to install and execute a modified version of the 120 | Combined Work produced by recombining or relinking the 121 | Application with a modified version of the Linked Version. (If 122 | you use option 4d0, the Installation Information must accompany 123 | the Minimal Corresponding Source and Corresponding Application 124 | Code. If you use option 4d1, you must provide the Installation 125 | Information in the manner specified by section 6 of the GNU GPL 126 | for conveying Corresponding Source.) 127 | 128 | 5. Combined Libraries. 129 | 130 | You may place library facilities that are a work based on the 131 | Library side by side in a single library together with other library 132 | facilities that are not Applications and are not covered by this 133 | License, and convey such a combined library under terms of your 134 | choice, if you do both of the following: 135 | 136 | a) Accompany the combined library with a copy of the same work based 137 | on the Library, uncombined with any other library facilities, 138 | conveyed under the terms of this License. 139 | 140 | b) Give prominent notice with the combined library that part of it 141 | is a work based on the Library, and explaining where to find the 142 | accompanying uncombined form of the same work. 143 | 144 | 6. Revised Versions of the GNU Lesser General Public License. 
145 | 146 | The Free Software Foundation may publish revised and/or new versions 147 | of the GNU Lesser General Public License from time to time. Such new 148 | versions will be similar in spirit to the present version, but may 149 | differ in detail to address new problems or concerns. 150 | 151 | Each version is given a distinguishing version number. If the 152 | Library as you received it specifies that a certain numbered version 153 | of the GNU Lesser General Public License "or any later version" 154 | applies to it, you have the option of following the terms and 155 | conditions either of that published version or of any later version 156 | published by the Free Software Foundation. If the Library as you 157 | received it does not specify a version number of the GNU Lesser 158 | General Public License, you may choose any version of the GNU Lesser 159 | General Public License ever published by the Free Software Foundation. 160 | 161 | If the Library as you received it specifies that a proxy can decide 162 | whether future versions of the GNU Lesser General Public License shall 163 | apply, that proxy's public statement of acceptance of any version is 164 | permanent authorization for you to choose that version for the 165 | Library. 166 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Design and FPGA Implementation of Fast Signature Verification

Since signature verification is the most computationally intensive work in
a non-PoW blockchain system, its performance directly affects the throughput
of the whole network.

This project aims to design a fast signature verification module implemented on
Field-Programmable Gate Array (FPGA) devices, which processes multiple verification
tasks in parallel and will be used in the Ontology Network.

At the first stage, the goals are:

* Implementation of ECDSA verification with curve P-256.
* Task Manager, the host program which schedules tasks and collects results, and
  can be integrated into the Ontology Node.
* The interface between Task Manager and Ontology Node.

# General Design

The Ontology node process collects a batch of signature verification tasks and
sends them to the FPGA board through Task Manager for processing. After all tasks
are done, Task Manager collects the results and returns them to the node.

![](./attachments/overview.png)

Currently, the target platform is AWS F1, an EC2 instance equipped with a Xilinx
Virtex UltraScale+ VU9P FPGA.

## Kernel

### Input

A verification task needs five parameters serialized into a byte sequence:

    | -- hash -- | --- x --- | --- y --- | --- r --- | --- s --- |

Each of the five parameters is 32 bytes long.

* hash is the signed hash value
* x and y are the affine coordinates of the public key, in little-endian
* r and s are the two integers of the signature, in little-endian

Hence, the input data for a task is a 160-byte sequence.

The input data of all tasks are concatenated into one large byte sequence,
passed to the board, and stored in DDR starting at address 0x10. A separate
4-byte argument, stored at 0x00, gives the number of tasks.

### Output

The verification result is a bool value. To get all the results, pass an output
array to the kernel.

### ECDSA Core

See [this](./doc/ecdsa.md) for details about ECDSA verification.


# Contributing

Contributions are welcome!

Feel free to open issues for discussion, and create pull requests to post your updates.


# License

This project is under LGPL v3.0 license.
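
As an illustration of the input layout described in the Kernel section, the following Python sketch packs one task into 160 bytes and builds the batch payload. It is only a sketch under stated assumptions: the names `pack_task` and `pack_batch` are hypothetical, the hash is copied as raw bytes, and the 4-byte task count is assumed to be little-endian, which the text does not specify.

```python
import struct

FIELD_SIZE = 32              # every parameter occupies 32 bytes
TASK_SIZE = 5 * FIELD_SIZE   # hash | x | y | r | s = 160 bytes


def pack_task(digest: bytes, x: int, y: int, r: int, s: int) -> bytes:
    """Serialize one verification task (hypothetical helper)."""
    if len(digest) != FIELD_SIZE:
        raise ValueError("digest must be exactly 32 bytes")
    # x, y, r and s are stored as 32-byte little-endian integers.
    fields = [v.to_bytes(FIELD_SIZE, "little") for v in (x, y, r, s)]
    return digest + b"".join(fields)


def pack_batch(tasks):
    """Return (count, payload): the 4-byte task count and the concatenated tasks.

    Assumption: the count is encoded little-endian; the README only says
    it is a 4-byte argument stored at address 0x00.
    """
    payload = b"".join(pack_task(*task) for task in tasks)
    count = struct.pack("<I", len(tasks))
    return count, payload
```

The host would then write `count` at offset 0x00 and `payload` from offset 0x10, as described above.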

--------------------------------------------------------------------------------
/attachments/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ontio/ontology-fpga/b1e61603535705967b8ff53e07a445bce1f83242/attachments/overview.png


--------------------------------------------------------------------------------
/doc/ecdsa.md:
--------------------------------------------------------------------------------
# Prime Field

Given an odd prime p, the integers {0, 1, ..., p-1} form a finite field Fp.

Operations:

* Addition

```
a + b mod p
```

* Subtraction

```
a - b mod p
```

* Multiplication

```
a * b mod p
```

* Inversion

```
a^-1 mod p
```

# Elliptic Curve

An elliptic curve over a prime field Fp is defined by the following equation:

    y^2 = x^3 + a*x + b

Basic parameters:

* p: the prime
* a
* b
* G: the base point
* n: order of the base point G

As for P-256, the parameters are:

```
p = 2^256 - 2^224 + 2^192 + 2^96 - 1
a = p-3
G: x = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
   y = 0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5
n = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
```

Curve point: P with coordinates (x, y)

Infinite point: O

**Note**: all the point computations below are in the field Fp.

## Double Point

**Input** point P1(x1, y1)

**Output** point P2(x2, y2) = [2]P1

If P1 is O, the result is O.

Otherwise, perform the following computation.

```
X1 = x1, Y1 = y1, Z1 = 1
S = 4 * X1 * Y1^2
M = 3 * X1^2 + a * Z1^4
X2 = M^2 - 2 * S
Y2 = M * (S - X2) - 8 * Y1^4
Z2 = 2 * Y1 * Z1
x2 = X2 / Z2^2
y2 = Y2 / Z2^3
return (x2, y2)
```


## Point Addition

**Input** two points P1(x1, y1) and P2(x2, y2).

**Output** point P3(x3, y3) = P1 + P2


If P1 is O, then P3 = P2.

If P2 is O, then P3 = P1.

Otherwise, perform the following computation.


```
X1 = x1, Y1 = y1, Z1 = 1
X2 = x2, Y2 = y2, Z2 = 1

U1 = X1 * Z2^2
U2 = X2 * Z1^2
S1 = Y1 * Z2^3
S2 = Y2 * Z1^3

if U1 == U2 && S1 != S2:
    return O
else if U1 == U2 && S1 == S2:
    return [2]P1
else:
    H = U2 - U1
    R = S2 - S1
    X3 = R^2 - H^3 - 2 * U1 * H^2
    Y3 = R * (U1 * H^2 - X3) - S1 * H^3
    Z3 = H * Z1 * Z2
    x3 = X3 / Z3^2
    y3 = Y3 / Z3^3
    return (x3, y3)
```


While [2]P is the double of P, [k]P is the k-fold multiple of P. [k]P is
calculated by combining Double Point and Point Addition, for example
[6]P = [2]([2]P + P).


# ECDSA Verification

**Input**

* Public key P(x, y), a point on the curve
* Signed data e, a byte sequence, usually the digest of the original message
  generated by a hash function
* Signature (r, s), a pair of integers in [1, n-1].

**Output**

TRUE or FALSE

**Process**

1. Verify that r, s ∈ [1, n-1]. If not, return FALSE.
2. w = s^-1 mod n
3. u1 = e*w mod n, u2 = r*w mod n
4. Q(x1, y1) = [u1]G + [u2]P. If Q is the infinite point, return FALSE.
5. If r ≡ x1 mod n, return TRUE. Else return FALSE.
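
The doubling, addition and verification steps above can be sketched in Python as follows. This is a minimal illustration, not the FPGA implementation: the function names are hypothetical, `e` is assumed to have been converted to an integer already, each operation returns to affine coordinates (Z1 = Z2 = 1, exactly as in the formulas above), and the code is neither constant-time nor optimized. `pow(x, -1, m)` requires Python 3.8 or later.

```python
# P-256 parameters as listed in the Elliptic Curve section.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
A = P - 3
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551


def double(p1):
    """Double Point; None stands for the infinite point O."""
    if p1 is None:
        return None
    x1, y1 = p1
    if y1 == 0:                      # Z2 would be 0, i.e. the result is O
        return None
    s = 4 * x1 * y1 * y1 % P
    m = (3 * x1 * x1 + A) % P        # Z1 = 1, so a * Z1^4 = a
    x2 = (m * m - 2 * s) % P
    y2 = (m * (s - x2) - 8 * pow(y1, 4, P)) % P
    z_inv = pow(2 * y1, -1, P)       # inverse of Z2 = 2 * Y1 * Z1
    return (x2 * z_inv * z_inv % P, y2 * pow(z_inv, 3, P) % P)


def add(p1, p2):
    """Point Addition with Z1 = Z2 = 1, so U1 = x1, U2 = x2, S1 = y1, S2 = y2."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        return double(p1) if y1 == y2 else None
    h = (x2 - x1) % P
    r = (y2 - y1) % P
    x3 = (r * r - pow(h, 3, P) - 2 * x1 * h * h) % P
    y3 = (r * (x1 * h * h - x3) - y1 * pow(h, 3, P)) % P
    z_inv = pow(h, -1, P)            # inverse of Z3 = H * Z1 * Z2
    return (x3 * z_inv * z_inv % P, y3 * pow(z_inv, 3, P) % P)


def scalar_mult(k, point):
    """[k]point by left-to-right double-and-add."""
    result = None
    for bit in bin(k)[2:]:
        result = double(result)
        if bit == "1":
            result = add(result, point)
    return result


def verify(pub, e, r, s):
    """ECDSA verification following steps 1 to 5 above."""
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    u1 = e * w % N
    u2 = r * w % N
    q = add(scalar_mult(u1, G), scalar_mult(u2, pub))
    if q is None:
        return False
    return q[0] % N == r
```

A signature for testing can be produced with the same helpers: pick a private key `d` and a nonce `k`, set `r` to the x coordinate of `[k]G` reduced mod n, and `s = k^-1 * (e + r*d) mod n`.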


# Optimizations

## Montgomery Multiplication

Prime field multiplication is the most time-consuming operation and can
be optimized using Montgomery Multiplication (MM).

Assume `p` is an `l`-bit prime; MM can efficiently calculate `a * b * 2^-l mod p`.

```
Input   p, a, b, l = s * k, where s is the unit length for calculation

Output  a * b * 2^-l mod p

Pre-compute k0 = -p^-1 mod 2^s

t = a * b
for i <- 1 to k:
    t1 = t mod 2^s
    u = t1 * k0 mod 2^s
    t2 = u * p
    t3 = t + t2
    t = t3 / 2^s

if t >= p: x = t - p
else:      x = t

return x
```

The only division, by 2^s, is exact and can be done with a bit shift.

To calculate `a * b mod p`, first transform `a` and `b` to `a'` and `b'`:

```
a' = a * 2^l ≡ a * 2^2l * 2^-l mod p
b' = b * 2^l ≡ b * 2^2l * 2^-l mod p
```

Then calculate

```
t = a' * b' * 2^-l mod p
a * b ≡ 1 * t * 2^-l mod p
```

Hence, MM alone suffices for the whole calculation.


## Fast Multi-Scalar

Step 4 of ECDSA Verification is the slowest because the scalar multiplication
of points involves many prime field operations. There are several
techniques to accelerate this.

The basic one is to calculate the target point in Jacobian projective coordinates.
This avoids the inversion operation in Fp.

### Shamir's Trick

Shamir's trick is a technique to calculate [m]P + [n]Q simultaneously. The general
idea is to represent m and n in binary. For example, 145 and 207 have the following
representations:

    145 = (1 0 0 1 0 0 0 1)
    207 = (1 1 0 0 1 1 1 1)

Traverse the two from left to right simultaneously.
For each non-zero column, add the corresponding point (P, Q or P+Q) to the result,
and double the result before moving on to the next bit.

This needs l doublings, while the number of additions equals the
so-called Hamming Weight, i.e. the number of non-zero columns.

Signed binary representation is a better form which can reduce the Hamming Weight.
It uses {-1, 0, 1} as the bit values. The following representation of 145 and
207 has Hamming Weight 5:

    145 = (0 1  0 0 1 0 0 0  1)
    207 = (1 0 -1 0 1 0 0 0 -1)

With pre-computed P+Q and P-Q, it needs only 5 additions.

### Joint Sparse Form

A further improved method is the Joint Sparse Form (JSF), which generates signed
binary representations from right to left with a lower Hamming Weight.

TODO

### Double-base Chains

Traditional representations of integers use a single base, such as 2, 8, 10
or 16. A double-base number system uses two bases to represent integers. For example

    n = SUM_i(c_i * 2^a_i * 3^b_i)

uses 2 and 3 as the double base.

If `a_i >= a_i+1` and `b_i >= b_i+1` hold for all i, the
representation is called a Joint Double-Base Chain (JDBC).

Let `v_p(x)` denote the largest exponent e such that `p^e` divides
x, and let `v_p(x, y) = min(v_p(x), v_p(y))`.
To generate a JDBC of m and n,
we can use the following algorithm:

```
i = 0
a_i = v_2(m, n)
b_i = v_3(m, n)
x = m / (2^a_i * 3^b_i)
y = n / (2^a_i * 3^b_i)

while x > 1 or y > 1:
    find c_i, d_i in {-1, 0, 1} that give the largest g = 2^v_2(x - c_i, y - d_i) * 3^v_3(x - c_i, y - d_i)
    x = (x - c_i) / g
    y = (y - d_i) / g
    i = i + 1
    a_i = a_i-1 + v_2(g)
    b_i = b_i-1 + v_3(g)

c_i = x
d_i = y

reverse the sequences [a_i], [b_i], [c_i] and [d_i]
```

The JDBC for m and n are then

    m = SUM_i(c_i * 2^a_i * 3^b_i)
    n = SUM_i(d_i * 2^a_i * 3^b_i)

For example, the JDBC of 542788 and 462444 have the following representations:

            a_i   11   9   7   7   5   5   5   4   2
            b_i    5   4   4   3   3   2   1   0   0
    542788  c_i    1   1   0   1   0   1  -1   0   1
    462444  d_i    1  -1   1  -1  -1   1  -1   1  -1

JDBC can further reduce the Hamming Weight, at the cost of introducing a tripling
operation into the multi-scalar multiplication.
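
The example table above can be checked numerically. The short Python sketch below (helper names are hypothetical) expands both coefficient rows with the shared exponent sequences and confirms that the exponents form a chain.

```python
# Columns of the example table, most significant term first.
A_EXP = [11, 9, 7, 7, 5, 5, 5, 4, 2]
B_EXP = [5, 4, 4, 3, 3, 2, 1, 0, 0]
C = [1, 1, 0, 1, 0, 1, -1, 0, 1]       # coefficients for 542788
D = [1, -1, 1, -1, -1, 1, -1, 1, -1]   # coefficients for 462444


def is_chain(a, b):
    """Check a_i >= a_i+1 and b_i >= b_i+1 for all i."""
    return all(a[i] >= a[i + 1] and b[i] >= b[i + 1] for i in range(len(a) - 1))


def expand(coeffs, a, b):
    """Evaluate SUM_i(coeff_i * 2^a_i * 3^b_i)."""
    return sum(c * 2**ai * 3**bi for c, ai, bi in zip(coeffs, a, b))


assert is_chain(A_EXP, B_EXP)
assert expand(C, A_EXP, B_EXP) == 542788
assert expand(D, A_EXP, B_EXP) == 462444
```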
The following algorithm calculates [m]P + [n]Q with a
JDBC:

```
Pre-compute P+Q and P-Q
let l be the length of the JDBC (take a_l = b_l = 0)
result = O
for i <- 0 to l-1:
    if c_i == 1 and d_i == 1:
        result = result + (P+Q)
    else if c_i == -1 and d_i == -1:
        result = result - (P+Q)
    else if c_i == 1 and d_i == -1:
        result = result + (P-Q)
    else if c_i == -1 and d_i == 1:
        result = result - (P-Q)
    else if c_i != 0:
        result = result + c_i * P
    else if d_i != 0:
        result = result + d_i * Q

    for j <- 1 to a_i - a_i+1:
        result = double(result)
    for k <- 1 to b_i - b_i+1:
        result = triple(result)
```

# Further Reading

[1] https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf

[2] https://eprint.iacr.org/2013/816.pdf

[3] http://www.hackersdelight.org/MontgomeryMultiplication.pdf

[4] http://web.science.mq.edu.au/~doche/asilomar.pdf

[5] http://www.ijana.in/papers/V4I2-8.pdf

[6] https://www.iacr.org/archive/eurocrypt2009/54790501/54790501.pdf


--------------------------------------------------------------------------------
/doc/guide/aws-f1-usage-cn.md:
--------------------------------------------------------------------------------
## Deploying an FPGA Environment on AWS

The AWS EC2 service provides F1, a virtual host equipped with FPGAs, and the FPGA Developer AMI, a machine image for developing FPGA applications. With these services, FPGA applications can be developed, deployed and tested conveniently.

Deploying an FPGA program takes three steps:

1. Compile the FPGA program
2. Create the AFI image
3. Deploy the AFI to an F1 instance

## Requirements

Build environment:

* A sufficiently powerful host. An AWS virtual host can be used, including the F1 instance on which the FPGA application will be deployed.
* SDAccel.

Creating the AFI image:

* AWS CLI.
* An AWS S3 Bucket.

Deployment environment:

* An AWS F1 instance running the FPGA Developer AMI.

## Requesting F1 Instance Access

By default, the F1 instance limit of an AWS user is 0, so no F1 instance can be launched until a limit increase has been requested.

1. Open the support page [http://aws.amazon.com/contact-us/ec2-request](http://aws.amazon.com/contact-us/ec2-request).
2. Select Service Limit Increase.
3. Select EC2 Instances.
4. Select the region in which the instance will be deployed. Currently only US East (N.Virginia), US West (Oregon) and EU (Ireland) support F1 instances.
5. Under Instance type, select the required F1 instance type (f1.2xlarge or f1.16xlarge).
6. Under New limit value, enter the number of F1 instances needed.
7. Click Submit.

After submitting the request, wait 24 to 48 hours. Once the email confirming that the request has been processed arrives, F1 instances can be launched.


## Configuring the Virtual Host

1. Open the AWS console -> EC2.
2. Select Instances in the navigation bar on the left.
3. Click Launch Instance in the pane on the right.
4. In the page that opens, choose the required image and then the instance configuration. For an F1 instance, search for FPGA in AWS Marketplace, select the FPGA Developer AMI from the results, and then choose an f1.2xlarge or f1.16xlarge instance.
5. Click Review and Launch to use the default configuration, or click Next step by step for detailed configuration.
6. On Review Instance Launch, check the instance parameters and click Launch once they are correct.
7. In the dialog that appears, select or create a key. This key is used to authenticate SSH logins.
8. Click Launch Instances to start the instance.

Once the instance has started, its public domain name and IP can be viewed in the console.

Log in to the instance with SSH:

    ssh -i @

* `` is the path to the .pem key file saved when the key was created.
* `` is the public domain name or IP of the instance.
* `` is the login user name. The default user depends on the chosen image. For the FPGA Developer AMI, the default user name is centos.

## Configuring the S3 Bucket

Before an FPGA application can be deployed, the compiled application must be turned into an AFI image. In this process the compiled application is stored on S3 and AWS builds the image, so an S3 Bucket is needed.

1. Log in to the AWS console -> S3.
2. Click Create Bucket.
3. Enter a globally unique ID as the name.
4. Select the region where the instance is located.
5. Click Create to finish.

After the Bucket has been created, open it and click Create Folder to create two folders, one for the compiled files and one for logs.

## Configuring the Build Environment

The FPGA Developer AMI already includes SDx. Further setup can be done with the setup script provided by the aws-fpga project.

```
git clone https://github.com/aws/aws-fpga.git $AWS_FPGA_REPO_DIR
cd $AWS_FPGA_REPO_DIR
source sdaccel_setup.sh
```

Here `$AWS_FPGA_REPO_DIR` is the absolute path where the project files are stored. In the FPGA Developer AMI this environment variable is set by default to

    /home/centos/src/project_data/aws-fpga

On other images, set it to the desired path yourself.

## Compiling the Example Program

The aws-fpga project contains several example programs. Take helloworld_ocl as an example; it implements vector addition.

#### Preparing the Environment

```
cd $AWS_FPGA_REPO_DIR
source sdaccel_setup.sh
source $XILINX_SDX/settings64.sh
```

#### Emulation

Because compiling an FPGA program takes a long time (several hours), SDAccel provides emulation flows that make debugging easier.

Software emulation

```
cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/
make clean
make check TARGETS=sw_emu DEVICES=$AWS_PLATFORM all
```

Hardware emulation

```
cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/
make clean
make check TARGETS=hw_emu DEVICES=$AWS_PLATFORM all
```

#### Compilation

```
cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/
make clean
make TARGETS=hw DEVICES=$AWS_PLATFORM all
```

Compilation takes several hours (about 2.5 hours on an f1.2xlarge instance).


## Creating the AFI Image

After compilation, the compiled program must be turned into an AFI so that it can later be deployed to an F1 instance.

#### Installing the AWS CLI

Creating the image requires the AWS CLI. On EC2 instances using the FPGA Developer AMI or an Amazon Linux image, the AWS CLI is already installed. On other hosts it must be installed manually; see the instructions in [aws-cli](https://github.com/aws/aws-cli).

#### Configuring the AWS CLI

```
$ aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]:
Default output format [None]: json
```

Here `` and `` can be viewed and set under My Security Credentials in the AWS account.


#### Creating the AFI

```
cd $SDACCEL_DIR/examples/xilinx/getting_started/host/helloworld_ocl/xclbin
$SDACCEL_DIR/tools/create_sdaccel_afi.sh \
    -xclbin=vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0.xclbin \
    -s3_bucket= \
    -s3_dcp_key= \
    -s3_logs_key=
```

Here `` is the unique ID of the S3 Bucket created earlier, and `` and `` are the names of the folders created in the Bucket for the compiled files and the logs, respectively.

The create_sdaccel_afi.sh script performs the following operations:

1. Packs the compiled files into a tar file and uploads it to S3.
2. Calls the AWS CLI to build an AFI from the uploaded tar file.
3. Generates the `*_afi_id.txt` file, which records the AFI ID and the global ID.
4. Generates the `*.awsxclbin` file, which the host program of the FPGA application can read to load the AFI automatically.

The AFI is built by an AWS backend service and takes about 50 minutes. The current instance can be shut down while waiting. The build progress can be checked with the following command:

```
aws ec2 describe-fpga-images --fpga-image-ids
```

Here `` is the AFI ID recorded in the `*_afi_id.txt` file.

The "State" -> "Code" entry in the output indicates the current state. There are four states:

* pending: being built
* available: build finished
* failed: build failed
* unavailable: image unavailable

Once the AFI image has been built, it can be deployed to F1 and run.

## Deployment and Execution

If the build environment is not an F1 instance, copy the compiled host program (helloworld) and the `*.awsxclbin` file generated while creating the AFI to the F1 instance.

If there is no `*.awsxclbin` file, the AFI image can be loaded manually. First clear the previously loaded image

```
sudo fpga-clear-local-image -S 0
```

Load the newly created image

```
sudo fpga-load-local-image -S 0 -I
```

Here `` is the global ID of the image (starting with "agfi-").

Run the program

```
sudo sh
source /opt/Xilinx/SDx/2017.1.rte/setup.sh
./helloworld
```

## Performance Reports

SDAccel can generate reports on the resources used by a program and on its performance, to help developers improve the program.

Select the DEBUG platform as the target platform:

```sh
export AWS_PLATFORM=$AWS_PLATFORM_4DDR_DEBUG
```

During emulation, a profile report is generated by default. To generate a timeline report, or to generate reports for execution on the FPGA, add a `sdaccel.ini` file with the following content to the directory in which the program runs:

```
[Debug]
timeline_trace=true
profile=true
```

Reports are saved in both .csv and .html formats.


## References

[1] https://github.com/Xilinx/SDAccel_Examples/wiki/Getting-Started-on-AWS-F1-with-SDAccel-and-RTL-Kernels

[2] https://amazonaws-china.com/cn/blogs/china/running-hello-world-on-fpga/

[3] https://github.com/aws/aws-fpga

[4] https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_1/ug1023-sdaccel-user-guide.pdf


--------------------------------------------------------------------------------
/test/helloworld_ocl/README.md:
--------------------------------------------------------------------------------
Reports of [this example](https://github.com/Xilinx/SDAccel_Examples/tree/51416734fd694773a2ab4991f027e5c78e09c9a8/getting_started/host/helloworld_ocl) running on an AWS F1 instance.


--------------------------------------------------------------------------------
/test/helloworld_ocl/hw/sdaccel_profile_summary.csv:
--------------------------------------------------------------------------------
1 | SDAccel Profile Summary 2 | Generated on: 2018-03-30 02:28:11 3 | Msec since Epoch: 1522376891275 4 | Profiled application: helloworld 5 | Target platform: Xilinx 6 | Target devices: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0 7 | Flow mode: System Run 8 | Tool version: 2017.1 9 | 10 | OpenCL API Calls 11 | API Name,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 12 | clCreateProgramWithBinary,1,7212.89,7212.89,7212.89,7212.89, 13 | clFinish,1,0.477942,0.477942,0.477942,0.477942, 14 | clEnqueueMigrateMemObjects,2,0.090178,0.027306,0.045089,0.062872, 15 | clReleaseProgram,1,0.079477,0.079477,0.079477,0.079477, 16 | clReleaseMemObject,6,0.074336,0.005632,0.0123893,0.04028, 17 | clGetPlatformInfo,14,0.072223,0.004633,0.00515879,0.008087, 18 | clEnqueueTask,1,0.069419,0.069419,0.069419,0.069419, 19 | clCreateBuffer,3,0.058059,0.006781,0.019353,0.043363, 20 | clGetExtensionFunctionAddress,2,0.039078,0.006426,0.019539,0.032652, 21 | clSetKernelArg,4,0.033246,0.00613,0.0083115,0.014044, 22 | clCreateKernel,1,0.02422,0.02422,0.02422,0.02422, 23 | clRetainMemObject,3,0.02251,0.006237,0.00750333,0.009445, 24 |
clCreateContext,1,0.020672,0.020672,0.020672,0.020672, 25 | clGetDeviceIDs,2,0.016448,0.004934,0.008224,0.011514, 26 | clReleaseDevice,2,0.011948,0.005641,0.005974,0.006307, 27 | clGetDeviceInfo,2,0.011626,0.005124,0.005813,0.006502, 28 | clRetainDevice,2,0.011567,0.005202,0.0057835,0.006365, 29 | clCreateCommandQueue,1,0.010785,0.010785,0.010785,0.010785, 30 | clReleaseKernel,1,0.010467,0.010467,0.010467,0.010467, 31 | clReleaseCommandQueue,1,0.00816,0.00816,0.00816,0.00816, 32 | clReleaseContext,1,0.007424,0.007424,0.007424,0.007424, 33 | 34 | 35 | Kernel Execution 36 | Kernel,Number Of Enqueues,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 37 | vector_add,1,0.113689,0.113689,0.113689,0.113689, 38 | 39 | 40 | Compute Unit Utilization 41 | Device,Compute Unit,Kernel,Global Work Size,Local Work Size,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 42 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,vector_add_1,vector_add,1:1:1,1:1:1,1,0.067396,0.067396,0.067396,0.067396, 43 | 44 | 45 | Data Transfer: Host and Global Memory 46 | Context:Number of Devices,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Total Time (ms),Average Time (ms), 47 | context0:1,READ,1,9.065798,0.094435,1.024,0.112952,0.112952, 48 | context0:1,WRITE,1,11.181481,0.116474,2.048,0.183160,0.183160, 49 | 50 | 51 | Data Transfer: Kernels and Global Memory 52 | Device,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Average Time (ns),Device Execution Time (ms), 53 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,READ,32,18.0141,0.156372,0.064,758.281,0.113689, 54 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,WRITE,16,9.00703,0.078186,0.064,95,0.113689, 55 | 56 | 57 | Top Data Transfer: Kernels and Global Memory 58 | Device,Kernel Name,Number of Transfers,Average Bytes per Transfer,Transfer Efficiency (%),Total Data Transfer (MB),Total Write 
(MB),Total Read (MB),Transfer Rate (MB/s),Average Bandwidth Utilization (%), 59 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,ALL,48,64,1.5625,0.003072,0.001024,0.002048,27.0211,0.234558, 60 | 61 | 62 | Top Kernel Execution 63 | Kernel Instance Address,Kernel,Context ID,Command Queue ID,Device,Start Time (ms),Duration (ms),Global Work Size,Local Work Size, 64 | 17498816,vector_add,0,0,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,7215.1,0.113689,1:1:1,1:1:1, 65 | 66 | 67 | Top Buffer Writes 68 | Buffer Address,Context ID,Command Queue ID,Start Time (ms),Duration (ms),Buffer Size (KB),Writing Rate(MB/s), 69 | 17508288,0,0,7214.8,0.183160,2.048,11.181481, 70 | 71 | 72 | Top Buffer Reads 73 | Buffer Address,Context ID,Command Queue ID,Start Time (ms),Duration (ms),Buffer Size (KB),Reading Rate(MB/s), 74 | 17507584,0,0,7215.27,0.112952,1.024,9.065798, 75 | 76 | 77 | PRC Parameters 78 | Parameter,Element,Value, 79 | DEVICE_EXEC_TIME,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,0.113689, 80 | CU_CALLS,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_add_1,1, 81 | MEMORY_BIT_WIDTH,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,512, 82 | 83 | 84 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw/sdaccel_profile_summary.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 17 |

SDAccel Profile Summary

18 |
19 |

Generated on: 2018-03-30 02:28:11

20 |

Profiled application: helloworld

21 |

Target platform: Xilinx

22 |

Target devices: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0

23 |

Flow mode: System Run

24 |

Tool version: 2017.1

25 |
26 |
27 |

OpenCL API Calls

28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
API Name,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
clCreateProgramWithBinary,1,7212.89,7212.89,7212.89,7212.89
clFinish,1,0.477942,0.477942,0.477942,0.477942
clEnqueueMigrateMemObjects,2,0.090178,0.027306,0.045089,0.062872
clReleaseProgram,1,0.079477,0.079477,0.079477,0.079477
clReleaseMemObject,6,0.074336,0.005632,0.0123893,0.04028
clGetPlatformInfo,14,0.072223,0.004633,0.00515879,0.008087
clEnqueueTask,1,0.069419,0.069419,0.069419,0.069419
clCreateBuffer,3,0.058059,0.006781,0.019353,0.043363
clGetExtensionFunctionAddress,2,0.039078,0.006426,0.019539,0.032652
clSetKernelArg,4,0.033246,0.00613,0.0083115,0.014044
clCreateKernel,1,0.02422,0.02422,0.02422,0.02422
clRetainMemObject,3,0.02251,0.006237,0.00750333,0.009445
clCreateContext,1,0.020672,0.020672,0.020672,0.020672
clGetDeviceIDs,2,0.016448,0.004934,0.008224,0.011514
clReleaseDevice,2,0.011948,0.005641,0.005974,0.006307
clGetDeviceInfo,2,0.011626,0.005124,0.005813,0.006502
clRetainDevice,2,0.011567,0.005202,0.0057835,0.006365
clCreateCommandQueue,1,0.010785,0.010785,0.010785,0.010785
clReleaseKernel,1,0.010467,0.010467,0.010467,0.010467
clReleaseCommandQueue,1,0.00816,0.00816,0.00816,0.00816
clReleaseContext,1,0.007424,0.007424,0.007424,0.007424
60 |
61 |

Kernel Execution

62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 |
Kernel,Number Of Enqueues,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
vector_add,1,0.113689,0.113689,0.113689,0.113689
74 |
75 |

Compute Unit Utilization

76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 |
Device,Compute Unit,Kernel,Global Work Size,Local Work Size,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,vector_add_1,vector_add,1:1:1,1:1:1,1,0.067396,0.067396,0.067396,0.067396
92 |
93 |

Data Transfer: Host and Global Memory

94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 |
Context:Number of Devices,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Total Time (ms),Average Time (ms)
context0:1,READ,1,9.065798,0.094435,1.024,0.112952,0.112952
context0:1,WRITE,1,11.181481,0.116474,2.048,0.183160,0.183160
109 |
110 |

Data Transfer: Kernels and Global Memory

111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 |
Device,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Average Time (ns),Device Execution Time (ms)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,READ,32,18.0141,0.156372,0.064,758.281,0.113689
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,WRITE,16,9.00703,0.078186,0.064,95,0.113689
126 |
127 |

Top Data Transfer: Kernels and Global Memory

128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 |
Device,Kernel Name,Number of Transfers,Average Bytes per Transfer,Transfer Efficiency (%),Total Data Transfer (MB),Total Write (MB),Total Read (MB),Transfer Rate (MB/s),Average Bandwidth Utilization (%)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,ALL,48,64,1.5625,0.003072,0.001024,0.002048,27.0211,0.234558
144 | 145 | 146 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw/sdaccel_timeline_trace.csv: -------------------------------------------------------------------------------- 1 | SDAccel Timeline Trace 2 | Generated on: 2018-03-30 02:28:11 3 | Msec since Epoch: 1522376891275 4 | Profiled application: helloworld 5 | Target platform: Xilinx 6 | 7 | 8 | Time_msec,Name,Event,Address_Port,Size,Latency_cycles,Start_cycles,End_cycles,Latency_usec,Start_msec,End_msec, 9 | 0.326859,clGetExtensionFunctionAddress|General,START,,,,,,,,, 10 | 0.359511,clGetExtensionFunctionAddress|General,END,,,,,,,,, 11 | 0.370213,clGetExtensionFunctionAddress|General,START,,,,,,,,, 12 | 0.376639,clGetExtensionFunctionAddress|General,END,,,,,,,,, 13 | 0.384743,clGetPlatformInfo|General,START,,,,,,,,, 14 | 0.39283,clGetPlatformInfo|General,END,,,,,,,,, 15 | 0.398257,clGetPlatformInfo|General,START,,,,,,,,, 16 | 0.403234,clGetPlatformInfo|General,END,,,,,,,,, 17 | 0.416554,clGetPlatformInfo|General,START,,,,,,,,, 18 | 0.421706,clGetPlatformInfo|General,END,,,,,,,,, 19 | 0.426586,clGetPlatformInfo|General,START,,,,,,,,, 20 | 0.431227,clGetPlatformInfo|General,END,,,,,,,,, 21 | 0.435936,clGetPlatformInfo|General,START,,,,,,,,, 22 | 0.440653,clGetPlatformInfo|General,END,,,,,,,,, 23 | 0.445353,clGetPlatformInfo|General,START,,,,,,,,, 24 | 0.450016,clGetPlatformInfo|General,END,,,,,,,,, 25 | 0.456784,clGetPlatformInfo|General,START,,,,,,,,, 26 | 0.461538,clGetPlatformInfo|General,END,,,,,,,,, 27 | 0.466278,clGetPlatformInfo|General,START,,,,,,,,, 28 | 0.470923,clGetPlatformInfo|General,END,,,,,,,,, 29 | 0.475851,clGetPlatformInfo|General,START,,,,,,,,, 30 | 0.480484,clGetPlatformInfo|General,END,,,,,,,,, 31 | 0.485224,clGetPlatformInfo|General,START,,,,,,,,, 32 | 0.489902,clGetPlatformInfo|General,END,,,,,,,,, 33 | 0.494571,clGetPlatformInfo|General,START,,,,,,,,, 34 | 0.500901,clGetPlatformInfo|General,END,,,,,,,,, 35 | 
0.505669,clGetPlatformInfo|General,START,,,,,,,,, 36 | 0.510338,clGetPlatformInfo|General,END,,,,,,,,, 37 | 0.531548,clGetPlatformInfo|General,START,,,,,,,,, 38 | 0.536954,clGetPlatformInfo|General,END,,,,,,,,, 39 | 0.545692,clGetPlatformInfo|General,START,,,,,,,,, 40 | 0.550563,clGetPlatformInfo|General,END,,,,,,,,, 41 | 0.596241,clGetDeviceIDs|General,START,,,,,,,,, 42 | 0.607755,clGetDeviceIDs|General,END,,,,,,,,, 43 | 0.613691,clGetDeviceIDs|General,START,,,,,,,,, 44 | 0.618625,clGetDeviceIDs|General,END,,,,,,,,, 45 | 0.627651,clRetainDevice|General,START,,,,,,,,, 46 | 0.634016,clRetainDevice|General,END,,,,,,,,, 47 | 0.640397,clRetainDevice|General,START,,,,,,,,, 48 | 0.645599,clRetainDevice|General,END,,,,,,,,, 49 | 0.652701,clCreateContext|General,START,,,,,,,,, 50 | 0.673373,clCreateContext|General,END,,,,,,,,, 51 | 0.681043,clCreateCommandQueue|General,START,,,,,,,,, 52 | 0.691828,clCreateCommandQueue|General,END,,,,,,,,, 53 | 0.69954,clGetDeviceInfo|General,START,,,,,,,,, 54 | 0.706042,clGetDeviceInfo|General,END,,,,,,,,, 55 | 0.711652,clGetDeviceInfo|General,START,,,,,,,,, 56 | 0.716776,clGetDeviceInfo|General,END,,,,,,,,, 57 | 1.407974,clCreateProgramWithBinary|General,START,,,,,,,,, 58 | 7214.297443,clCreateProgramWithBinary|General,END,,,,,,,,, 59 | 7214.339748,clCreateBuffer|General,START,,,,,,,,, 60 | 7214.383111,clCreateBuffer|General,END,,,,,,,,, 61 | 7214.391693,clCreateBuffer|General,START,,,,,,,,, 62 | 7214.399608,clCreateBuffer|General,END,,,,,,,,, 63 | 7214.406051,clCreateBuffer|General,START,,,,,,,,, 64 | 7214.412832,clCreateBuffer|General,END,,,,,,,,, 65 | 7214.467568,clRetainMemObject|General,START,,,,,,,,, 66 | 7214.477013,clRetainMemObject|General,END,,,,,,,,, 67 | 7214.48488,clRetainMemObject|General,START,,,,,,,,, 68 | 7214.491708,clRetainMemObject|General,END,,,,,,,,, 69 | 7214.498809,clRetainMemObject|General,START,,,,,,,,, 70 | 7214.505046,clRetainMemObject|General,END,,,,,,,,, 71 | 
7214.516536,clEnqueueMigrateMemObjects|17491104,START,,,,,,,,, 72 | 7214.562483,WRITE_BUFFER,QUEUE,0X10B27C0,2048,,,,,,, 73 | 7214.579408,clEnqueueMigrateMemObjects|17491104,END,,,,,,,,, 74 | 7214.649367,clCreateKernel|General,START,,,,,,,,, 75 | 7214.673214,WRITE_BUFFER,SUBMIT,0X10B27C0,2048,,,,,,, 76 | 7214.673587,clCreateKernel|General,END,,,,,,,,, 77 | 7214.729261,clSetKernelArg|General,START,,,,,,,,, 78 | 7214.743305,clSetKernelArg|General,END,,,,,,,,, 79 | 7214.749772,clSetKernelArg|General,START,,,,,,,,, 80 | 7214.756661,clSetKernelArg|General,END,,,,,,,,, 81 | 7214.762985,clSetKernelArg|General,START,,,,,,,,, 82 | 7214.769115,clSetKernelArg|General,END,,,,,,,,, 83 | 7214.774955,clSetKernelArg|General,START,,,,,,,,, 84 | 7214.781138,clSetKernelArg|General,END,,,,,,,,, 85 | 7214.790468,clEnqueueTask|17491104,START,,,,,,,,, 86 | 7214.802321,WRITE_BUFFER,START,0X10B27C0,2048,,,,,,, 87 | 7214.82705,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|all,QUEUE,0X10B02C0,1,,,,,,, 88 | 7214.859887,clEnqueueTask|17491104,END,,,,,,,,, 89 | 7214.869504,clEnqueueMigrateMemObjects|17491104,START,,,,,,,,, 90 | 7214.888526,READ_BUFFER,QUEUE,0X10B2500,1024,,,,,,, 91 | 7214.89681,clEnqueueMigrateMemObjects|17491104,END,,,,,,,,, 92 | 7214.907691,clFinish|General,START,,,,,,,,, 93 | 7214.985481,WRITE_BUFFER,END,0X10B27C0,2048,,,,,,, 94 | 7215.011003,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|all,SUBMIT,0X10B02C0,1,,,,,,, 95 | 7215.100965,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|vector_add_1,START,0X10B02C0,1,,,,,,, 96 | 7215.168361,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|vector_add_1,END,0X10B02C0,1,,,,,,, 97 | 7215.220212,READ_BUFFER,SUBMIT,0X10B2500,1024,,,,,,, 98 
| 7215.271212,READ_BUFFER,START,0X10B2500,1024,,,,,,, 99 | 7215.384164,READ_BUFFER,END,0X10B2500,1024,,,,,,, 100 | 7215.385633,clFinish|General,END,,,,,,,,, 101 | 7215.496304,clReleaseKernel|General,START,,,,,,,,, 102 | 7215.506771,clReleaseKernel|General,END,,,,,,,,, 103 | 7215.514224,clReleaseMemObject|General,START,,,,,,,,, 104 | 7215.521325,clReleaseMemObject|General,END,,,,,,,,, 105 | 7215.528135,clReleaseMemObject|General,START,,,,,,,,, 106 | 7215.534069,clReleaseMemObject|General,END,,,,,,,,, 107 | 7215.539785,clReleaseMemObject|General,START,,,,,,,,, 108 | 7215.545417,clReleaseMemObject|General,END,,,,,,,,, 109 | 7215.551333,clReleaseMemObject|General,START,,,,,,,,, 110 | 7215.591613,clReleaseMemObject|General,END,,,,,,,,, 111 | 7215.598889,clReleaseMemObject|General,START,,,,,,,,, 112 | 7215.60678,clReleaseMemObject|General,END,,,,,,,,, 113 | 7215.612883,clReleaseMemObject|General,START,,,,,,,,, 114 | 7215.620381,clReleaseMemObject|General,END,,,,,,,,, 115 | 7215.62822,clReleaseProgram|General,START,,,,,,,,, 116 | 7215.707697,clReleaseProgram|General,END,,,,,,,,, 117 | 7215.742002,clReleaseCommandQueue|General,START,,,,,,,,, 118 | 7215.750162,clReleaseCommandQueue|General,END,,,,,,,,, 119 | 7215.757532,clReleaseContext|General,START,,,,,,,,, 120 | 7215.764956,clReleaseContext|General,END,,,,,,,,, 121 | 7215.772112,clReleaseDevice|General,START,,,,,,,,, 122 | 7215.778419,clReleaseDevice|General,END,,,,,,,,, 123 | 7215.784682,clReleaseDevice|General,START,,,,,,,,, 124 | 7215.790323,clReleaseDevice|General,END,,,,,,,,, 125 | Footer,begin 126 | Project,vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0, 127 | Stall profiling,false, 128 | Target,System Run, 129 | Platform,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0, 130 | Footer,end 131 | 132 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw/sdaccel_timeline_trace.html: -------------------------------------------------------------------------------- 1 
| 2 | 3 | 4 | 17 |

SDAccel Timeline Trace

18 |
19 |

Generated on: 2018-03-30 02:28:11

20 |

Profiled application: helloworld

21 |

Target platform: Xilinx

22 |
23 |

24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 |
Time (msec),Name,Event,Address/Port,Size (Bytes or Num),Latency (cycles),Start (cycles),End (cycles),Latency (usec),Start (msec),End (msec)
0.326859,clGetExtensionFunctionAddress|General,START,,,,,,,,
0.359511,clGetExtensionFunctionAddress|General,END,,,,,,,,
0.370213,clGetExtensionFunctionAddress|General,START,,,,,,,,
0.376639,clGetExtensionFunctionAddress|General,END,,,,,,,,
0.384743,clGetPlatformInfo|General,START,,,,,,,,
0.39283,clGetPlatformInfo|General,END,,,,,,,,
0.398257,clGetPlatformInfo|General,START,,,,,,,,
0.403234,clGetPlatformInfo|General,END,,,,,,,,
0.416554,clGetPlatformInfo|General,START,,,,,,,,
0.421706,clGetPlatformInfo|General,END,,,,,,,,
0.426586,clGetPlatformInfo|General,START,,,,,,,,
0.431227,clGetPlatformInfo|General,END,,,,,,,,
0.435936,clGetPlatformInfo|General,START,,,,,,,,
0.440653,clGetPlatformInfo|General,END,,,,,,,,
0.445353,clGetPlatformInfo|General,START,,,,,,,,
0.450016,clGetPlatformInfo|General,END,,,,,,,,
0.456784,clGetPlatformInfo|General,START,,,,,,,,
0.461538,clGetPlatformInfo|General,END,,,,,,,,
0.466278,clGetPlatformInfo|General,START,,,,,,,,
0.470923,clGetPlatformInfo|General,END,,,,,,,,
0.475851,clGetPlatformInfo|General,START,,,,,,,,
0.480484,clGetPlatformInfo|General,END,,,,,,,,
0.485224,clGetPlatformInfo|General,START,,,,,,,,
0.489902,clGetPlatformInfo|General,END,,,,,,,,
0.494571,clGetPlatformInfo|General,START,,,,,,,,
0.500901,clGetPlatformInfo|General,END,,,,,,,,
0.505669,clGetPlatformInfo|General,START,,,,,,,,
0.510338,clGetPlatformInfo|General,END,,,,,,,,
0.531548,clGetPlatformInfo|General,START,,,,,,,,
0.536954,clGetPlatformInfo|General,END,,,,,,,,
0.545692,clGetPlatformInfo|General,START,,,,,,,,
0.550563,clGetPlatformInfo|General,END,,,,,,,,
0.596241,clGetDeviceIDs|General,START,,,,,,,,
0.607755,clGetDeviceIDs|General,END,,,,,,,,
0.613691,clGetDeviceIDs|General,START,,,,,,,,
0.618625,clGetDeviceIDs|General,END,,,,,,,,
0.627651,clRetainDevice|General,START,,,,,,,,
0.634016,clRetainDevice|General,END,,,,,,,,
0.640397,clRetainDevice|General,START,,,,,,,,
0.645599,clRetainDevice|General,END,,,,,,,,
0.652701,clCreateContext|General,START,,,,,,,,
0.673373,clCreateContext|General,END,,,,,,,,
0.681043,clCreateCommandQueue|General,START,,,,,,,,
0.691828,clCreateCommandQueue|General,END,,,,,,,,
0.69954,clGetDeviceInfo|General,START,,,,,,,,
0.706042,clGetDeviceInfo|General,END,,,,,,,,
0.711652,clGetDeviceInfo|General,START,,,,,,,,
0.716776,clGetDeviceInfo|General,END,,,,,,,,
1.407974,clCreateProgramWithBinary|General,START,,,,,,,,
7214.297443,clCreateProgramWithBinary|General,END,,,,,,,,
7214.339748,clCreateBuffer|General,START,,,,,,,,
7214.383111,clCreateBuffer|General,END,,,,,,,,
7214.391693,clCreateBuffer|General,START,,,,,,,,
7214.399608,clCreateBuffer|General,END,,,,,,,,
7214.406051,clCreateBuffer|General,START,,,,,,,,
7214.412832,clCreateBuffer|General,END,,,,,,,,
7214.467568,clRetainMemObject|General,START,,,,,,,,
7214.477013,clRetainMemObject|General,END,,,,,,,,
7214.48488,clRetainMemObject|General,START,,,,,,,,
7214.491708,clRetainMemObject|General,END,,,,,,,,
7214.498809,clRetainMemObject|General,START,,,,,,,,
7214.505046,clRetainMemObject|General,END,,,,,,,,
7214.516536,clEnqueueMigrateMemObjects|17491104,START,,,,,,,,
7214.562483,WRITE_BUFFER,QUEUE,0X10B27C0,2048,,,,,,
7214.579408,clEnqueueMigrateMemObjects|17491104,END,,,,,,,,
7214.649367,clCreateKernel|General,START,,,,,,,,
7214.673214,WRITE_BUFFER,SUBMIT,0X10B27C0,2048,,,,,,
7214.673587,clCreateKernel|General,END,,,,,,,,
7214.729261,clSetKernelArg|General,START,,,,,,,,
7214.743305,clSetKernelArg|General,END,,,,,,,,
7214.749772,clSetKernelArg|General,START,,,,,,,,
7214.756661,clSetKernelArg|General,END,,,,,,,,
7214.762985,clSetKernelArg|General,START,,,,,,,,
7214.769115,clSetKernelArg|General,END,,,,,,,,
7214.774955,clSetKernelArg|General,START,,,,,,,,
7214.781138,clSetKernelArg|General,END,,,,,,,,
7214.790468,clEnqueueTask|17491104,START,,,,,,,,
7214.802321,WRITE_BUFFER,START,0X10B27C0,2048,,,,,,
7214.82705,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|all,QUEUE,0X10B02C0,1,,,,,,
7214.859887,clEnqueueTask|17491104,END,,,,,,,,
7214.869504,clEnqueueMigrateMemObjects|17491104,START,,,,,,,,
7214.888526,READ_BUFFER,QUEUE,0X10B2500,1024,,,,,,
7214.89681,clEnqueueMigrateMemObjects|17491104,END,,,,,,,,
7214.907691,clFinish|General,START,,,,,,,,
7214.985481,WRITE_BUFFER,END,0X10B27C0,2048,,,,,,
7215.011003,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|all,SUBMIT,0X10B02C0,1,,,,,,
7215.100965,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|vector_add_1,START,0X10B02C0,1,,,,,,
7215.168361,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|vector_add|1:1:1|vector_add_1,END,0X10B02C0,1,,,,,,
7215.220212,READ_BUFFER,SUBMIT,0X10B2500,1024,,,,,,
7215.271212,READ_BUFFER,START,0X10B2500,1024,,,,,,
7215.384164,READ_BUFFER,END,0X10B2500,1024,,,,,,
7215.385633,clFinish|General,END,,,,,,,,
7215.496304,clReleaseKernel|General,START,,,,,,,,
7215.506771,clReleaseKernel|General,END,,,,,,,,
7215.514224,clReleaseMemObject|General,START,,,,,,,,
7215.521325,clReleaseMemObject|General,END,,,,,,,,
7215.528135,clReleaseMemObject|General,START,,,,,,,,
7215.534069,clReleaseMemObject|General,END,,,,,,,,
7215.539785,clReleaseMemObject|General,START,,,,,,,,
7215.545417,clReleaseMemObject|General,END,,,,,,,,
7215.551333,clReleaseMemObject|General,START,,,,,,,,
7215.591613,clReleaseMemObject|General,END,,,,,,,,
7215.598889,clReleaseMemObject|General,START,,,,,,,,
7215.60678,clReleaseMemObject|General,END,,,,,,,,
7215.612883,clReleaseMemObject|General,START,,,,,,,,
7215.620381,clReleaseMemObject|General,END,,,,,,,,
7215.62822,clReleaseProgram|General,START,,,,,,,,
7215.707697,clReleaseProgram|General,END,,,,,,,,
7215.742002,clReleaseCommandQueue|General,START,,,,,,,,
7215.750162,clReleaseCommandQueue|General,END,,,,,,,,
7215.757532,clReleaseContext|General,START,,,,,,,,
7215.764956,clReleaseContext|General,END,,,,,,,,
7215.772112,clReleaseDevice|General,START,,,,,,,,
7215.778419,clReleaseDevice|General,END,,,,,,,,
7215.784682,clReleaseDevice|General,START,,,,,,,,
7215.790323,clReleaseDevice|General,END,,,,,,,,
156 | 157 | 158 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw_emu/emconfig.json: -------------------------------------------------------------------------------- 1 | { 2 | "Comment": "This file is auto-generated by the tool. Do not modify", 3 | "Version": { 4 | "FileVersion": "2.0", 5 | "ToolVersion": "2017.1" 6 | }, 7 | "Platform": { 8 | "Boards": [ 9 | { 10 | "Devices": [ 11 | { 12 | "Name": "xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0", 13 | "DdrBanks": [ 14 | { 15 | "Name": "mem0", 16 | "Type": "ddr4", 17 | "Size": "16GB" 18 | }, 19 | { 20 | "Name": "mem1", 21 | "Type": "ddr4", 22 | "Size": "16GB" 23 | }, 24 | { 25 | "Name": "mem2", 26 | "Type": "ddr4", 27 | "Size": "16GB" 28 | }, 29 | { 30 | "Name": "mem3", 31 | "Type": "ddr4", 32 | "Size": "16GB" 33 | } 34 | ] 35 | } 36 | ], 37 | "NumBoards": "1" 38 | } 39 | ] 40 | } 41 | } 42 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw_emu/emulation_debug.log: -------------------------------------------------------------------------------- 1 | INFO: [SDx-EM 01] Hardware emulation runs detailed simulation underneath. It may take long time for large data set. Please use a small dataset for faster execution. You can still get performance trend for your kernel with smaller dataset. 
2 | INFO: [SDx-EM 22] [Wall clock time: 08:14, Emulation time: 0.007938 ms] Data transfer between kernel(s) and global memory(s) 3 | BANK0 RD = 2.000 KB WR = 1.000 KB 4 | BANK1 RD = 0.000 KB WR = 0.000 KB 5 | BANK2 RD = 0.000 KB WR = 0.000 KB 6 | BANK3 RD = 0.000 KB WR = 0.000 KB 7 | 8 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw_emu/sdaccel_profile_summary.csv: -------------------------------------------------------------------------------- 1 | SDAccel Profile Summary 2 | Generated on: 2018-03-21 08:14:19 3 | Msec since Epoch: 1521620059703 4 | Profiled application: helloworld 5 | Target platform: Xilinx 6 | Target devices: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0 7 | Flow mode: Hardware Emulation 8 | Tool version: 2017.1 9 | 10 | OpenCL API Calls 11 | API Name,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 12 | clReleaseProgram,1,9225.68,9225.68,9225.68,9225.68, 13 | clCreateProgramWithBinary,1,4501.22,4501.22,4501.22,4501.22, 14 | clFinish,1,772.519,772.519,772.519,772.519, 15 | clCreateBuffer,3,1.20608,0.3314,0.402025,0.486241, 16 | clEnqueueMigrateMemObjects,2,0.779511,0.368783,0.389755,0.410728, 17 | clEnqueueTask,1,0.755245,0.755245,0.755245,0.755245, 18 | clReleaseMemObject,6,0.060435,0.005765,0.0100725,0.027762, 19 | clGetExtensionFunctionAddress,1,0.037798,0.037798,0.037798,0.037798, 20 | clSetKernelArg,4,0.030918,0.006127,0.0077295,0.012059, 21 | clGetPlatformInfo,4,0.027089,0.005009,0.00677225,0.008946, 22 | clRetainMemObject,3,0.022179,0.006348,0.007393,0.009351, 23 | clCreateKernel,1,0.018523,0.018523,0.018523,0.018523, 24 | clGetDeviceIDs,2,0.016958,0.005037,0.008479,0.011921, 25 | clCreateCommandQueue,1,0.01318,0.01318,0.01318,0.01318, 26 | clReleaseDevice,2,0.012838,0.005995,0.006419,0.006843, 27 | clReleaseCommandQueue,1,0.012037,0.012037,0.012037,0.012037, 28 | clCreateContext,1,0.011895,0.011895,0.011895,0.011895, 29 | 
clRetainDevice,2,0.011652,0.005235,0.005826,0.006417, 30 | clGetDeviceInfo,2,0.011545,0.005401,0.0057725,0.006144, 31 | clReleaseKernel,1,0.010708,0.010708,0.010708,0.010708, 32 | clReleaseContext,1,0.008581,0.008581,0.008581,0.008581, 33 | 34 | 35 | Kernel Execution (includes estimated device times) 36 | Kernel,Number Of Enqueues,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 37 | vector_add,1,0.007672,0.007672,0.007672,0.007672, 38 | 39 | 40 | Compute Unit Utilization (includes estimated device times) 41 | Device,Compute Unit,Kernel,Global Work Size,Local Work Size,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 42 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,vector_add_1,vector_add,1:1:1,1:1:1,1,0.007664,0.007664,0.007664,0.007664, 43 | 44 | 45 | Data Transfer: Host and Global Memory 46 | Context:Number of Devices,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Total Time (ms),Average Time (ms), 47 | context0:1,READ,1,N/A,N/A,1.024,N/A,N/A, 48 | context0:1,WRITE,1,N/A,N/A,2.048,N/A,N/A, 49 | 50 | 51 | Data Transfer: Kernels and Global Memory 52 | Device,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Average Time (ns),Device Execution Time (ms), 53 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,READ,32,266.945,2.31723,0.064,461.25,0.007672, 54 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,WRITE,16,133.472,1.15861,0.064,155,0.007672, 55 | 56 | 57 | Top Data Transfer: Kernels and Global Memory 58 | Device,Kernel Name,Number of Transfers,Average Bytes per Transfer,Transfer Efficiency (%),Total Data Transfer (MB),Total Write (MB),Total Read (MB),Transfer Rate (MB/s),Average Bandwidth Utilization (%), 59 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,ALL,48,64,1.5625,0.003072,0.001024,0.002048,400.417,3.47584, 60 | 61 | 62 | Top Kernel Execution 63 | Kernel Instance Address,Kernel,Context ID,Command Queue 
ID,Device,Start Time (ms),Duration (ms),Global Work Size,Local Work Size, 64 | 21629984,vector_add,0,0,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,0.000142,0.007672,1:1:1,1:1:1, 65 | 66 | 67 | Top Buffer Writes 68 | Buffer Address,Context ID,Command Queue ID,Start Time (ms),Duration (ms),Buffer Size (KB),Writing Rate(MB/s), 69 | 21613744,0,0,4506.97,N/A,2.048,N/A, 70 | 71 | 72 | Top Buffer Reads 73 | Buffer Address,Context ID,Command Queue ID,Start Time (ms),Duration (ms),Buffer Size (KB),Reading Rate(MB/s), 74 | 21644416,0,0,5275.27,N/A,1.024,N/A, 75 | 76 | 77 | PRC Parameters 78 | Parameter,Element,Value, 79 | DEVICE_EXEC_TIME,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,0.007672, 80 | CU_CALLS,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_add_1,1, 81 | MEMORY_BIT_WIDTH,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,512, 82 | 83 | 84 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw_emu/sdaccel_profile_summary.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 17 |

SDAccel Profile Summary

18 |
19 |

Generated on: 2018-03-21 08:14:19

20 |

Profiled application: helloworld

21 |

Target platform: Xilinx

22 |

Target devices: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0

23 |

Flow mode: Hardware Emulation

24 |

Tool version: 2017.1

25 |
26 |
27 |

OpenCL API Calls

28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
API Name,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
clReleaseProgram,1,9225.68,9225.68,9225.68,9225.68
clCreateProgramWithBinary,1,4501.22,4501.22,4501.22,4501.22
clFinish,1,772.519,772.519,772.519,772.519
clCreateBuffer,3,1.20608,0.3314,0.402025,0.486241
clEnqueueMigrateMemObjects,2,0.779511,0.368783,0.389755,0.410728
clEnqueueTask,1,0.755245,0.755245,0.755245,0.755245
clReleaseMemObject,6,0.060435,0.005765,0.0100725,0.027762
clGetExtensionFunctionAddress,1,0.037798,0.037798,0.037798,0.037798
clSetKernelArg,4,0.030918,0.006127,0.0077295,0.012059
clGetPlatformInfo,4,0.027089,0.005009,0.00677225,0.008946
clRetainMemObject,3,0.022179,0.006348,0.007393,0.009351
clCreateKernel,1,0.018523,0.018523,0.018523,0.018523
clGetDeviceIDs,2,0.016958,0.005037,0.008479,0.011921
clCreateCommandQueue,1,0.01318,0.01318,0.01318,0.01318
clReleaseDevice,2,0.012838,0.005995,0.006419,0.006843
clReleaseCommandQueue,1,0.012037,0.012037,0.012037,0.012037
clCreateContext,1,0.011895,0.011895,0.011895,0.011895
clRetainDevice,2,0.011652,0.005235,0.005826,0.006417
clGetDeviceInfo,2,0.011545,0.005401,0.0057725,0.006144
clReleaseKernel,1,0.010708,0.010708,0.010708,0.010708
clReleaseContext,1,0.008581,0.008581,0.008581,0.008581
60 |
61 |

Kernel Execution (includes estimated device times)

62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 |
Kernel,Number Of Enqueues,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
vector_add,1,0.007672,0.007672,0.007672,0.007672
74 |
75 |

Compute Unit Utilization (includes estimated device times)

76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 |
Device,Compute Unit,Kernel,Global Work Size,Local Work Size,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,vector_add_1,vector_add,1:1:1,1:1:1,1,0.007664,0.007664,0.007664,0.007664
92 |
93 |

Data Transfer: Host and Global Memory

94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 |
Context:Number of Devices,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Total Time (ms),Average Time (ms)
context0:1,READ,1,N/A,N/A,1.024,N/A,N/A
context0:1,WRITE,1,N/A,N/A,2.048,N/A,N/A
109 |
110 |

Data Transfer: Kernels and Global Memory

111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 |
Device,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Average Time (ns),Device Execution Time (ms)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,READ,32,266.945,2.31723,0.064,461.25,0.007672
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,WRITE,16,133.472,1.15861,0.064,155,0.007672
126 |
127 |

Top Data Transfer: Kernels and Global Memory

128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 |
Device,Kernel Name,Number of Transfers,Average Bytes per Transfer,Transfer Efficiency (%),Total Data Transfer (MB),Total Write (MB),Total Read (MB),Transfer Rate (MB/s),Average Bandwidth Utilization (%)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,ALL,48,64,1.5625,0.003072,0.001024,0.002048,400.417,3.47584
144 | 145 | 146 | -------------------------------------------------------------------------------- /test/helloworld_ocl/hw_emu/sdaccel_timeline_trace.csv: -------------------------------------------------------------------------------- 1 | SDAccel Timeline Trace 2 | Generated on: 2018-03-21 08:14:19 3 | Msec since Epoch: 1521620059703 4 | Profiled application: helloworld 5 | Target platform: Xilinx 6 | 7 | 8 | Time_msec,Name,Event,Address_Port,Size,Latency_cycles,Start_cycles,End_cycles,Latency_usec,Start_msec,End_msec, 9 | 0.288481,clGetExtensionFunctionAddress|General,START,,,,,,,,, 10 | 0.326279,clGetExtensionFunctionAddress|General,END,,,,,,,,, 11 | 0.338826,clGetPlatformInfo|General,START,,,,,,,,, 12 | 0.347772,clGetPlatformInfo|General,END,,,,,,,,, 13 | 0.353404,clGetPlatformInfo|General,START,,,,,,,,, 14 | 0.358904,clGetPlatformInfo|General,END,,,,,,,,, 15 | 0.401701,clGetPlatformInfo|General,START,,,,,,,,, 16 | 0.409335,clGetPlatformInfo|General,END,,,,,,,,, 17 | 0.418269,clGetPlatformInfo|General,START,,,,,,,,, 18 | 0.423278,clGetPlatformInfo|General,END,,,,,,,,, 19 | 0.50022,clGetDeviceIDs|General,START,,,,,,,,, 20 | 0.512141,clGetDeviceIDs|General,END,,,,,,,,, 21 | 0.518369,clGetDeviceIDs|General,START,,,,,,,,, 22 | 0.523406,clGetDeviceIDs|General,END,,,,,,,,, 23 | 0.531095,clRetainDevice|General,START,,,,,,,,, 24 | 0.537512,clRetainDevice|General,END,,,,,,,,, 25 | 0.54381,clRetainDevice|General,START,,,,,,,,, 26 | 0.549045,clRetainDevice|General,END,,,,,,,,, 27 | 0.555862,clCreateContext|General,START,,,,,,,,, 28 | 0.567757,clCreateContext|General,END,,,,,,,,, 29 | 0.574235,clCreateCommandQueue|General,START,,,,,,,,, 30 | 0.587415,clCreateCommandQueue|General,END,,,,,,,,, 31 | 0.594017,clGetDeviceInfo|General,START,,,,,,,,, 32 | 0.600161,clGetDeviceInfo|General,END,,,,,,,,, 33 | 0.60579,clGetDeviceInfo|General,START,,,,,,,,, 34 | 0.611191,clGetDeviceInfo|General,END,,,,,,,,, 35 | 3.050016,clCreateProgramWithBinary|General,START,,,,,,,,, 36 | 
4504.267456,clCreateProgramWithBinary|General,END,,,,,,,,, 37 | 4504.314784,clCreateBuffer|General,START,,,,,,,,, 38 | 4504.646184,clCreateBuffer|General,END,,,,,,,,, 39 | 4504.655168,clCreateBuffer|General,START,,,,,,,,, 40 | 4505.043603,clCreateBuffer|General,END,,,,,,,,, 41 | 4505.052999,clCreateBuffer|General,START,,,,,,,,, 42 | 4505.53924,clCreateBuffer|General,END,,,,,,,,, 43 | 4505.555236,clRetainMemObject|General,START,,,,,,,,, 44 | 4505.564587,clRetainMemObject|General,END,,,,,,,,, 45 | 4505.572326,clRetainMemObject|General,START,,,,,,,,, 46 | 4505.578806,clRetainMemObject|General,END,,,,,,,,, 47 | 4505.58589,clRetainMemObject|General,START,,,,,,,,, 48 | 4505.592238,clRetainMemObject|General,END,,,,,,,,, 49 | 4505.603836,clEnqueueMigrateMemObjects|21770256,START,,,,,,,,, 50 | 4505.644057,WRITE_BUFFER,QUEUE,0X149CCB0,2048,,,,,,, 51 | 4506.014564,clEnqueueMigrateMemObjects|21770256,END,,,,,,,,, 52 | 4506.027801,clCreateKernel|General,START,,,,,,,,, 53 | 4506.046324,clCreateKernel|General,END,,,,,,,,, 54 | 4506.055694,clSetKernelArg|General,START,,,,,,,,, 55 | 4506.067753,clSetKernelArg|General,END,,,,,,,,, 56 | 4506.073911,clSetKernelArg|General,START,,,,,,,,, 57 | 4506.080348,clSetKernelArg|General,END,,,,,,,,, 58 | 4506.086342,clSetKernelArg|General,START,,,,,,,,, 59 | 4506.092637,clSetKernelArg|General,END,,,,,,,,, 60 | 4506.098477,clSetKernelArg|General,START,,,,,,,,, 61 | 4506.104604,clSetKernelArg|General,END,,,,,,,,, 62 | 4506.112731,clEnqueueTask|21770256,START,,,,,,,,, 63 | 4506.137948,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|all,QUEUE,0X14A0C20,1,,,,,,, 64 | 4506.867976,clEnqueueTask|21770256,END,,,,,,,,, 65 | 4506.882133,clEnqueueMigrateMemObjects|21770256,START,,,,,,,,, 66 | 4506.024483,WRITE_BUFFER,SUBMIT,0X149CCB0,2048,,,,,,, 67 | 4506.895589,READ_BUFFER,QUEUE,0X14A4480,1024,,,,,,, 68 | 4507.250916,clEnqueueMigrateMemObjects|21770256,END,,,,,,,,, 69 | 
4507.263682,clFinish|General,START,,,,,,,,, 70 | 4506.97118,WRITE_BUFFER,START,0X149CCB0,2048,,,,,,, 71 | 4518.469775,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|all,SUBMIT,0X14A0C20,1,,,,,,, 72 | 4518.451556,WRITE_BUFFER,END,0X149CCB0,2048,,,,,,, 73 | 4519.598625,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|vector_add_1,START,0X14A0C20,1,,,,,,, 74 | 5274.379162,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|vector_add_1,END,0X14A0C20,1,,,,,,, 75 | 5274.497885,READ_BUFFER,SUBMIT,0X14A4480,1024,,,,,,, 76 | 5275.273772,READ_BUFFER,START,0X14A4480,1024,,,,,,, 77 | 5279.782589,clFinish|General,END,,,,,,,,, 78 | 5279.890787,clReleaseKernel|General,START,,,,,,,,, 79 | 5279.901495,clReleaseKernel|General,END,,,,,,,,, 80 | 5279.910512,clReleaseMemObject|General,START,,,,,,,,, 81 | 5279.917511,clReleaseMemObject|General,END,,,,,,,,, 82 | 5279.924549,clReleaseMemObject|General,START,,,,,,,,, 83 | 5279.930655,clReleaseMemObject|General,END,,,,,,,,, 84 | 5279.936644,clReleaseMemObject|General,START,,,,,,,,, 85 | 5279.942409,clReleaseMemObject|General,END,,,,,,,,, 86 | 5279.948456,clReleaseMemObject|General,START,,,,,,,,, 87 | 5279.976218,clReleaseMemObject|General,END,,,,,,,,, 88 | 5279.982649,clReleaseMemObject|General,START,,,,,,,,, 89 | 5279.989726,clReleaseMemObject|General,END,,,,,,,,, 90 | 5279.995706,clReleaseMemObject|General,START,,,,,,,,, 91 | 5280.002432,clReleaseMemObject|General,END,,,,,,,,, 92 | 5280.009724,clReleaseProgram|General,START,,,,,,,,, 93 | 5279.754514,READ_BUFFER,END,0X14A4480,1024,,,,,,, 94 | 4519.598665,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,773,774,0.004,4519.598665,4519.598669, 95 | 
4796.389588,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,776,777,0.004,4796.389588,4796.389592, 96 | 4802.839947,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,790,791,0.004,4802.839947,4802.839952, 97 | 4805.621023,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,806,807,0.004,4805.621023,4805.621027, 98 | 4814.678754,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,822,823,0.004,4814.678754,4814.678759, 99 | 4814.678818,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,838,839,0.004,4814.678818,4814.678823, 100 | 4824.60244,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,854,855,0.004,4824.60244,4824.602443, 101 | 4834.441633,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,870,871,0.004,4834.441633,4834.441638, 102 | 4834.441698,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,886,887,0.004,4834.441698,4834.441701, 103 | 4834.441762,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,902,903,0.004,4834.441762,4834.441765, 104 | 4853.353966,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,918,919,0.004,4853.353966,4853.35397, 105 | 4853.35403,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL 
Region,1,1,934,935,0.004,4853.35403,4853.354034, 106 | 4853.354094,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,950,951,0.004,4853.354094,4853.354098, 107 | 4872.516916,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,966,967,0.004,4872.516916,4872.51692, 108 | 4872.51698,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,982,983,0.004,4872.51698,4872.516984, 109 | 4872.517044,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,998,999,0.004,4872.517044,4872.517048, 110 | 4951.401917,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1166,1167,0.004,4951.401917,4951.401921, 111 | 4953.995803,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1169,1170,0.004,4953.995803,4953.995807, 112 | 4960.517865,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1183,1184,0.004,4960.517865,4960.517869, 113 | 4965.36103,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1199,1200,0.004,4965.36103,4965.361034, 114 | 4965.361094,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1215,1216,0.004,4965.361094,4965.361098, 115 | 4974.760256,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1231,1232,0.004,4974.760256,4974.76026, 116 | 
4984.74891,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1247,1248,0.004,4984.74891,4984.748913, 117 | 4984.748973,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1263,1264,0.004,4984.748973,4984.748978, 118 | 4997.527122,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1279,1280,0.004,4997.527122,4997.527126, 119 | 4997.527186,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1295,1296,0.004,4997.527186,4997.52719, 120 | 4997.52725,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1311,1312,0.004,4997.52725,4997.527254, 121 | 5017.219202,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1327,1328,0.004,5017.219202,5017.219205, 122 | 5017.219265,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1343,1344,0.004,5017.219265,5017.21927, 123 | 5017.219329,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1359,1360,0.004,5017.219329,5017.219334, 124 | 5036.532965,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1375,1376,0.004,5036.532965,5036.532968, 125 | 5036.533029,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1391,1392,0.004,5036.533029,5036.533033, 126 | 
5109.782799,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1567,1568,0.004,5109.782799,5109.782804, 127 | 5119.825562,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1585,1586,0.004,5119.825562,5119.825566, 128 | 5127.548296,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1603,1604,0.004,5127.548296,5127.5483, 129 | 5135.259652,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1621,1622,0.004,5135.259652,5135.259657, 130 | 5142.956922,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1639,1640,0.004,5142.956922,5142.956925, 131 | 5150.74268,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1657,1658,0.004,5150.74268,5150.742684, 132 | 5158.442499,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1675,1676,0.004,5158.442499,5158.442504, 133 | 5166.095489,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1693,1694,0.004,5166.095489,5166.095492, 134 | 5173.834999,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1711,1712,0.004,5173.834999,5173.835002, 135 | 5181.560792,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1729,1730,0.004,5181.560792,5181.560796, 136 | 
5189.287673,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1747,1748,0.004,5189.287673,5189.287677, 137 | 5196.987295,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1765,1766,0.004,5196.987295,5196.987299, 138 | 5204.727909,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1783,1784,0.004,5204.727909,5204.727913, 139 | 5212.486518,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1801,1802,0.004,5212.486518,5212.486522, 140 | 5220.327591,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1819,1820,0.004,5220.327591,5220.327594, 141 | 5228.255192,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1837,1838,0.004,5228.255192,5228.255196, 142 | 14505.69079,clReleaseProgram|General,END,,,,,,,,, 143 | 14505.72991,clReleaseCommandQueue|General,START,,,,,,,,, 144 | 14505.74195,clReleaseCommandQueue|General,END,,,,,,,,, 145 | 14505.74941,clReleaseContext|General,START,,,,,,,,, 146 | 14505.75799,clReleaseContext|General,END,,,,,,,,, 147 | 14505.76523,clReleaseDevice|General,START,,,,,,,,, 148 | 14505.77207,clReleaseDevice|General,END,,,,,,,,, 149 | 14505.77837,clReleaseDevice|General,START,,,,,,,,, 150 | 14505.78437,clReleaseDevice|General,END,,,,,,,,, 151 | Footer,begin 152 | Project,vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0, 153 | Stall profiling,false, 154 | Target,Hardware Emulation, 155 | Platform,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0, 156 | Footer,end 157 | 158 | -------------------------------------------------------------------------------- 
/test/helloworld_ocl/hw_emu/sdaccel_timeline_trace.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 17 |

SDAccel Timeline Trace

Generated on: 2018-03-21 08:14:19
Profiled application: helloworld
Target platform: Xilinx

Time (msec),Name,Event,Address/Port,Size (Bytes or Num),Latency (cycles),Start (cycles),End (cycles),Latency (usec),Start (msec),End (msec)
0.288481,clGetExtensionFunctionAddress|General,START
0.326279,clGetExtensionFunctionAddress|General,END
0.338826,clGetPlatformInfo|General,START
0.347772,clGetPlatformInfo|General,END
0.353404,clGetPlatformInfo|General,START
0.358904,clGetPlatformInfo|General,END
0.401701,clGetPlatformInfo|General,START
0.409335,clGetPlatformInfo|General,END
0.418269,clGetPlatformInfo|General,START
0.423278,clGetPlatformInfo|General,END
0.50022,clGetDeviceIDs|General,START
0.512141,clGetDeviceIDs|General,END
0.518369,clGetDeviceIDs|General,START
0.523406,clGetDeviceIDs|General,END
0.531095,clRetainDevice|General,START
0.537512,clRetainDevice|General,END
0.54381,clRetainDevice|General,START
0.549045,clRetainDevice|General,END
0.555862,clCreateContext|General,START
0.567757,clCreateContext|General,END
0.574235,clCreateCommandQueue|General,START
0.587415,clCreateCommandQueue|General,END
0.594017,clGetDeviceInfo|General,START
0.600161,clGetDeviceInfo|General,END
0.60579,clGetDeviceInfo|General,START
0.611191,clGetDeviceInfo|General,END
3.050016,clCreateProgramWithBinary|General,START
4504.267456,clCreateProgramWithBinary|General,END
4504.314784,clCreateBuffer|General,START
4504.646184,clCreateBuffer|General,END
4504.655168,clCreateBuffer|General,START
4505.043603,clCreateBuffer|General,END
4505.052999,clCreateBuffer|General,START
4505.53924,clCreateBuffer|General,END
4505.555236,clRetainMemObject|General,START
4505.564587,clRetainMemObject|General,END
4505.572326,clRetainMemObject|General,START
4505.578806,clRetainMemObject|General,END
4505.58589,clRetainMemObject|General,START
4505.592238,clRetainMemObject|General,END
4505.603836,clEnqueueMigrateMemObjects|21770256,START
4505.644057,WRITE_BUFFER,QUEUE,0X149CCB0,2048
4506.014564,clEnqueueMigrateMemObjects|21770256,END
4506.027801,clCreateKernel|General,START
4506.046324,clCreateKernel|General,END
4506.055694,clSetKernelArg|General,START
4506.067753,clSetKernelArg|General,END
4506.073911,clSetKernelArg|General,START
4506.080348,clSetKernelArg|General,END
4506.086342,clSetKernelArg|General,START
4506.092637,clSetKernelArg|General,END
4506.098477,clSetKernelArg|General,START
4506.104604,clSetKernelArg|General,END
4506.112731,clEnqueueTask|21770256,START
4506.137948,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|all,QUEUE,0X14A0C20,1
4506.867976,clEnqueueTask|21770256,END
4506.882133,clEnqueueMigrateMemObjects|21770256,START
4506.024483,WRITE_BUFFER,SUBMIT,0X149CCB0,2048
4506.895589,READ_BUFFER,QUEUE,0X14A4480,1024
4507.250916,clEnqueueMigrateMemObjects|21770256,END
4507.263682,clFinish|General,START
4506.97118,WRITE_BUFFER,START,0X149CCB0,2048
4518.469775,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|all,SUBMIT,0X14A0C20,1
4518.451556,WRITE_BUFFER,END,0X149CCB0,2048
4519.598625,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|vector_add_1,START,0X14A0C20,1
5274.379162,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0|vector_add|1:1:1|vector_add_1,END,0X14A0C20,1
5274.497885,READ_BUFFER,SUBMIT,0X14A4480,1024
5275.273772,READ_BUFFER,START,0X14A4480,1024
5279.782589,clFinish|General,END
5279.890787,clReleaseKernel|General,START
5279.901495,clReleaseKernel|General,END
5279.910512,clReleaseMemObject|General,START
5279.917511,clReleaseMemObject|General,END
5279.924549,clReleaseMemObject|General,START
5279.930655,clReleaseMemObject|General,END
5279.936644,clReleaseMemObject|General,START
5279.942409,clReleaseMemObject|General,END
5279.948456,clReleaseMemObject|General,START
5279.976218,clReleaseMemObject|General,END
5279.982649,clReleaseMemObject|General,START
5279.989726,clReleaseMemObject|General,END
5279.995706,clReleaseMemObject|General,START
5280.002432,clReleaseMemObject|General,END
5280.009724,clReleaseProgram|General,START
5279.754514,READ_BUFFER,END,0X14A4480,1024
4519.598665,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,773,774,0.004,4519.598665,4519.598669
4796.389588,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,776,777,0.004,4796.389588,4796.389592
4802.839947,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,790,791,0.004,4802.839947,4802.839952
4805.621023,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,806,807,0.004,4805.621023,4805.621027
4814.678754,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,822,823,0.004,4814.678754,4814.678759
4814.678818,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,838,839,0.004,4814.678818,4814.678823
4824.60244,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,854,855,0.004,4824.60244,4824.602443
4834.441633,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,870,871,0.004,4834.441633,4834.441638
4834.441698,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,886,887,0.004,4834.441698,4834.441701
4834.441762,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,902,903,0.004,4834.441762,4834.441765
4853.353966,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,918,919,0.004,4853.353966,4853.35397
4853.35403,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,934,935,0.004,4853.35403,4853.354034
4853.354094,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,950,951,0.004,4853.354094,4853.354098
4872.516916,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,966,967,0.004,4872.516916,4872.51692
4872.51698,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,982,983,0.004,4872.51698,4872.516984
4872.517044,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,998,999,0.004,4872.517044,4872.517048
4951.401917,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1166,1167,0.004,4951.401917,4951.401921
4953.995803,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1169,1170,0.004,4953.995803,4953.995807
4960.517865,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1183,1184,0.004,4960.517865,4960.517869
4965.36103,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1199,1200,0.004,4965.36103,4965.361034
4965.361094,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1215,1216,0.004,4965.361094,4965.361098
4974.760256,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1231,1232,0.004,4974.760256,4974.76026
4984.74891,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1247,1248,0.004,4984.74891,4984.748913
4984.748973,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1263,1264,0.004,4984.748973,4984.748978
4997.527122,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1279,1280,0.004,4997.527122,4997.527126
4997.527186,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1295,1296,0.004,4997.527186,4997.52719
4997.52725,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1311,1312,0.004,4997.52725,4997.527254
5017.219202,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1327,1328,0.004,5017.219202,5017.219205
5017.219265,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1343,1344,0.004,5017.219265,5017.21927
5017.219329,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1359,1360,0.004,5017.219329,5017.219334
5036.532965,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1375,1376,0.004,5036.532965,5036.532968
5036.533029,Kernel_Read|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Read,OCL Region,1,1,1391,1392,0.004,5036.533029,5036.533033
5109.782799,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1567,1568,0.004,5109.782799,5109.782804
5119.825562,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1585,1586,0.004,5119.825562,5119.825566
5127.548296,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1603,1604,0.004,5127.548296,5127.5483
5135.259652,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1621,1622,0.004,5135.259652,5135.259657
5142.956922,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1639,1640,0.004,5142.956922,5142.956925
5150.74268,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1657,1658,0.004,5150.74268,5150.742684
5158.442499,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1675,1676,0.004,5158.442499,5158.442504
5166.095489,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1693,1694,0.004,5166.095489,5166.095492
5173.834999,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1711,1712,0.004,5173.834999,5173.835002
5181.560792,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1729,1730,0.004,5181.560792,5181.560796
5189.287673,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1747,1748,0.004,5189.287673,5189.287677
5196.987295,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1765,1766,0.004,5196.987295,5196.987299
5204.727909,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1783,1784,0.004,5204.727909,5204.727913
5212.486518,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1801,1802,0.004,5212.486518,5212.486522
5220.327591,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1819,1820,0.004,5220.327591,5220.327594
5228.255192,Kernel_Write|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|vector_addition.hw_emu.xilinx_aws-vu9p-f1_4ddr-xpr-2pr_4_0,Write,OCL Region,1,1,1837,1838,0.004,5228.255192,5228.255196
14505.69079,clReleaseProgram|General,END
14505.72991,clReleaseCommandQueue|General,START
14505.74195,clReleaseCommandQueue|General,END
14505.74941,clReleaseContext|General,START
14505.75799,clReleaseContext|General,END
14505.76523,clReleaseDevice|General,START
14505.77207,clReleaseDevice|General,END
14505.77837,clReleaseDevice|General,START
14505.78437,clReleaseDevice|General,END
182 | 183 | 184 | -------------------------------------------------------------------------------- /test/vector_addition_1000/reports/sdaccel_profile_summary.csv: -------------------------------------------------------------------------------- 1 | SDAccel Profile Summary 2 | Generated on: 2018-04-02 14:18:02 3 | Msec since Epoch: 1522678682454 4 | Profiled application: host 5 | Target platform: Xilinx 6 | Target devices: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0 7 | Flow mode: System Run 8 | Tool version: 2017.1 9 | 10 | OpenCL API Calls 11 | API Name,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 12 | clCreateProgramWithBinary,1,5411.55,5411.55,5411.55,5411.55, 13 | clFinish,1,140.521,140.521,140.521,140.521, 14 | clEnqueueTask,1,0.137106,0.137106,0.137106,0.137106, 15 | clEnqueueMigrateMemObjects,2,0.096361,0.030913,0.0481805,0.065448, 16 | clReleaseProgram,1,0.07531,0.07531,0.07531,0.07531, 17 | clGetPlatformInfo,14,0.071116,0.004687,0.00507971,0.007864, 18 | clCreateBuffer,3,0.054971,0.007291,0.0183237,0.039199, 19 | clSetKernelArg,4,0.049103,0.00633,0.0122758,0.028652, 20 | clReleaseKernel,1,0.040271,0.040271,0.040271,0.040271, 21 | clGetExtensionFunctionAddress,2,0.039961,0.00656,0.0199805,0.033401, 22 | clReleaseMemObject,6,0.03825,0.005512,0.006375,0.009968, 23 | clCreateKernel,1,0.024704,0.024704,0.024704,0.024704, 24 | clRetainMemObject,3,0.022616,0.006217,0.00753867,0.009819, 25 | clCreateContext,1,0.021413,0.021413,0.021413,0.021413, 26 | clGetDeviceIDs,2,0.016534,0.005074,0.008267,0.01146, 27 | clReleaseDevice,2,0.011785,0.005697,0.0058925,0.006088, 28 | clGetDeviceInfo,2,0.011555,0.005196,0.0057775,0.006359, 29 | clRetainDevice,2,0.011507,0.005135,0.0057535,0.006372, 30 | clCreateCommandQueue,1,0.009645,0.009645,0.009645,0.009645, 31 | clReleaseCommandQueue,1,0.007712,0.007712,0.007712,0.007712, 32 | clReleaseContext,1,0.0073,0.0073,0.0073,0.0073, 33 | 34 | 35 | Kernel Execution 36 | Kernel,Number Of Enqueues,Total 
Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 37 | krnl_vadd,1,138.407,138.407,138.407,138.407, 38 | 39 | 40 | Compute Unit Utilization 41 | Device,Compute Unit,Kernel,Global Work Size,Local Work Size,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms), 42 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,krnl_vadd_1,krnl_vadd,1:1:1,1:1:1,1,138.359,138.359,138.359,138.359, 43 | 44 | 45 | Data Transfer: Host and Global Memory 46 | Context:Number of Devices,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Total Time (ms),Average Time (ms), 47 | context0:1,READ,1,6794.382028,70.774813,4096,0.602851,0.602851, 48 | context0:1,WRITE,1,5543.153982,57.741187,8192,1.477859,1.477859, 49 | 50 | 51 | Data Transfer: Kernels and Global Memory 52 | Device,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Average Time (ns),Device Execution Time (ms), 53 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,READ,1088000,59.1878,0.513783,0.00752941,3497.32,138.407, 54 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,WRITE,1024000,29.5939,0.256891,0.004,109.761,138.407, 55 | 56 | 57 | Top Data Transfer: Kernels and Global Memory 58 | Device,Kernel Name,Number of Transfers,Average Bytes per Transfer,Transfer Efficiency (%),Total Data Transfer (MB),Total Write (MB),Total Read (MB),Transfer Rate (MB/s),Average Bandwidth Utilization (%), 59 | xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,ALL,2112000,5.81818,0.142045,12.288,4.096,8.192,88.7817,0.770674, 60 | 61 | 62 | Top Kernel Execution 63 | Kernel Instance Address,Kernel,Context ID,Command Queue ID,Device,Start Time (ms),Duration (ms),Global Work Size,Local Work Size, 64 | 33536400,krnl_vadd,0,0,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,5414.6,138.407,1:1:1,1:1:1, 65 | 66 | 67 | Top Buffer Writes 68 | Buffer Address,Context ID,Command Queue ID,Start Time (ms),Duration (ms),Buffer Size (KB),Writing Rate(MB/s), 69 | 
33536592,0,0,5413.01,1.477859,8192,5543.153982, 70 | 71 | 72 | Top Buffer Reads 73 | Buffer Address,Context ID,Command Queue ID,Start Time (ms),Duration (ms),Buffer Size (KB),Reading Rate(MB/s), 74 | 33542752,0,0,5553.06,0.602851,4096,6794.382028, 75 | 76 | 77 | PRC Parameters 78 | Parameter,Element,Value, 79 | DEVICE_EXEC_TIME,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,138.406975, 80 | CU_CALLS,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd_1,1, 81 | MEMORY_BIT_WIDTH,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,512, 82 | 83 | 84 | -------------------------------------------------------------------------------- /test/vector_addition_1000/reports/sdaccel_profile_summary.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 17 |

SDAccel Profile Summary

Generated on: 2018-04-02 14:18:02
Profiled application: host
Target platform: Xilinx
Target devices: xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0
Flow mode: System Run
Tool version: 2017.1

OpenCL API Calls
API Name,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
clCreateProgramWithBinary,1,5411.55,5411.55,5411.55,5411.55
clFinish,1,140.521,140.521,140.521,140.521
clEnqueueTask,1,0.137106,0.137106,0.137106,0.137106
clEnqueueMigrateMemObjects,2,0.096361,0.030913,0.0481805,0.065448
clReleaseProgram,1,0.07531,0.07531,0.07531,0.07531
clGetPlatformInfo,14,0.071116,0.004687,0.00507971,0.007864
clCreateBuffer,3,0.054971,0.007291,0.0183237,0.039199
clSetKernelArg,4,0.049103,0.00633,0.0122758,0.028652
clReleaseKernel,1,0.040271,0.040271,0.040271,0.040271
clGetExtensionFunctionAddress,2,0.039961,0.00656,0.0199805,0.033401
clReleaseMemObject,6,0.03825,0.005512,0.006375,0.009968
clCreateKernel,1,0.024704,0.024704,0.024704,0.024704
clRetainMemObject,3,0.022616,0.006217,0.00753867,0.009819
clCreateContext,1,0.021413,0.021413,0.021413,0.021413
clGetDeviceIDs,2,0.016534,0.005074,0.008267,0.01146
clReleaseDevice,2,0.011785,0.005697,0.0058925,0.006088
clGetDeviceInfo,2,0.011555,0.005196,0.0057775,0.006359
clRetainDevice,2,0.011507,0.005135,0.0057535,0.006372
clCreateCommandQueue,1,0.009645,0.009645,0.009645,0.009645
clReleaseCommandQueue,1,0.007712,0.007712,0.007712,0.007712
clReleaseContext,1,0.0073,0.0073,0.0073,0.0073
Kernel Execution
Kernel,Number Of Enqueues,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
krnl_vadd,1,138.407,138.407,138.407,138.407
Compute Unit Utilization
Device,Compute Unit,Kernel,Global Work Size,Local Work Size,Number Of Calls,Total Time (ms),Minimum Time (ms),Average Time (ms),Maximum Time (ms)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,krnl_vadd_1,krnl_vadd,1:1:1,1:1:1,1,138.359,138.359,138.359,138.359
Data Transfer: Host and Global Memory
Context:Number of Devices,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Total Time (ms),Average Time (ms)
context0:1,READ,1,6794.382028,70.774813,4096,0.602851,0.602851
context0:1,WRITE,1,5543.153982,57.741187,8192,1.477859,1.477859
Data Transfer: Kernels and Global Memory
Device,Transfer Type,Number Of Transfers,Transfer Rate (MB/s),Average Bandwidth Utilization (%),Average Size (KB),Average Time (ns),Device Execution Time (ms)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,READ,1088000,59.1878,0.513783,0.00752941,3497.32,138.407
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,WRITE,1024000,29.5939,0.256891,0.004,109.761,138.407
Top Data Transfer: Kernels and Global Memory
Device,Kernel Name,Number of Transfers,Average Bytes per Transfer,Transfer Efficiency (%),Total Data Transfer (MB),Total Write (MB),Total Read (MB),Transfer Rate (MB/s),Average Bandwidth Utilization (%)
xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,ALL,2112000,5.81818,0.142045,12.288,4.096,8.192,88.7817,0.770674
--------------------------------------------------------------------------------
/test/vector_addition_1000/reports/sdaccel_timeline_trace.csv:
--------------------------------------------------------------------------------
SDAccel Timeline Trace
Generated on: 2018-04-02 14:18:02
Msec since Epoch: 1522678682454
Profiled application: host
Target platform: Xilinx


Time_msec,Name,Event,Address_Port,Size,Latency_cycles,Start_cycles,End_cycles,Latency_usec,Start_msec,End_msec,
0.316351,clGetExtensionFunctionAddress|General,START,,,,,,,,,
0.349752,clGetExtensionFunctionAddress|General,END,,,,,,,,,
0.36127,clGetExtensionFunctionAddress|General,START,,,,,,,,,
0.36783,clGetExtensionFunctionAddress|General,END,,,,,,,,,
0.376046,clGetPlatformInfo|General,START,,,,,,,,,
0.38391,clGetPlatformInfo|General,END,,,,,,,,,
0.389302,clGetPlatformInfo|General,START,,,,,,,,,
0.394414,clGetPlatformInfo|General,END,,,,,,,,,
0.407041,clGetPlatformInfo|General,START,,,,,,,,,
0.412132,clGetPlatformInfo|General,END,,,,,,,,,
0.416989,clGetPlatformInfo|General,START,,,,,,,,,
0.421737,clGetPlatformInfo|General,END,,,,,,,,,
0.426463,clGetPlatformInfo|General,START,,,,,,,,,
0.43115,clGetPlatformInfo|General,END,,,,,,,,,
0.436012,clGetPlatformInfo|General,START,,,,,,,,,
0.440924,clGetPlatformInfo|General,END,,,,,,,,,
0.445734,clGetPlatformInfo|General,START,,,,,,,,,
0.450485,clGetPlatformInfo|General,END,,,,,,,,,
0.455248,clGetPlatformInfo|General,START,,,,,,,,,
0.460017,clGetPlatformInfo|General,END,,,,,,,,,
0.467553,clGetPlatformInfo|General,START,,,,,,,,,
0.472311,clGetPlatformInfo|General,END,,,,,,,,,
0.477081,clGetPlatformInfo|General,START,,,,,,,,,
0.481887,clGetPlatformInfo|General,END,,,,,,,,,
0.486622,clGetPlatformInfo|General,START,,,,,,,,,
0.491328,clGetPlatformInfo|General,END,,,,,,,,,
0.496049,clGetPlatformInfo|General,START,,,,,,,,,
0.500789,clGetPlatformInfo|General,END,,,,,,,,,
0.524833,clGetPlatformInfo|General,START,,,,,,,,,
0.530225,clGetPlatformInfo|General,END,,,,,,,,,
0.539666,clGetPlatformInfo|General,START,,,,,,,,,
0.544446,clGetPlatformInfo|General,END,,,,,,,,,
0.588068,clGetDeviceIDs|General,START,,,,,,,,,
0.599528,clGetDeviceIDs|General,END,,,,,,,,,
0.605543,clGetDeviceIDs|General,START,,,,,,,,,
0.610617,clGetDeviceIDs|General,END,,,,,,,,,
0.619341,clRetainDevice|General,START,,,,,,,,,
0.625713,clRetainDevice|General,END,,,,,,,,,
0.632312,clRetainDevice|General,START,,,,,,,,,
0.637447,clRetainDevice|General,END,,,,,,,,,
0.64441,clCreateContext|General,START,,,,,,,,,
0.665823,clCreateContext|General,END,,,,,,,,,
0.674268,clCreateCommandQueue|General,START,,,,,,,,,
0.683913,clCreateCommandQueue|General,END,,,,,,,,,
0.691083,clGetDeviceInfo|General,START,,,,,,,,,
0.697442,clGetDeviceInfo|General,END,,,,,,,,,
0.702992,clGetDeviceInfo|General,START,,,,,,,,,
0.708188,clGetDeviceInfo|General,END,,,,,,,,,
0.847502,clCreateProgramWithBinary|General,START,,,,,,,,,
5412.401678,clCreateProgramWithBinary|General,END,,,,,,,,,
5412.447043,clCreateKernel|General,START,,,,,,,,,
5412.471747,clCreateKernel|General,END,,,,,,,,,
5412.482605,clCreateBuffer|General,START,,,,,,,,,
5412.521804,clCreateBuffer|General,END,,,,,,,,,
5412.529633,clCreateBuffer|General,START,,,,,,,,,
5412.536924,clCreateBuffer|General,END,,,,,,,,,
5412.593968,clCreateBuffer|General,START,,,,,,,,,
5412.602449,clCreateBuffer|General,END,,,,,,,,,
5412.611607,clRetainMemObject|General,START,,,,,,,,,
5412.621426,clRetainMemObject|General,END,,,,,,,,,
5412.629287,clRetainMemObject|General,START,,,,,,,,,
5412.635867,clRetainMemObject|General,END,,,,,,,,,
5412.642655,clRetainMemObject|General,START,,,,,,,,,
5412.648872,clRetainMemObject|General,END,,,,,,,,,
5412.660265,clEnqueueMigrateMemObjects|33528880,START,,,,,,,,,
5412.70815,WRITE_BUFFER,QUEUE,0X1FFBA50,8192000,,,,,,,
5412.725713,clEnqueueMigrateMemObjects|33528880,END,,,,,,,,,
5412.793915,clSetKernelArg|General,START,,,,,,,,,
5412.800486,WRITE_BUFFER,SUBMIT,0X1FFBA50,8192000,,,,,,,
5412.822567,clSetKernelArg|General,END,,,,,,,,,
5412.875521,clSetKernelArg|General,START,,,,,,,,,
5412.883192,clSetKernelArg|General,END,,,,,,,,,
5412.889394,clSetKernelArg|General,START,,,,,,,,,
5412.895724,clSetKernelArg|General,END,,,,,,,,,
5412.901548,clSetKernelArg|General,START,,,,,,,,,
5412.907998,clSetKernelArg|General,END,,,,,,,,,
5412.953816,clEnqueueTask|33528880,START,,,,,,,,,
5413.011706,WRITE_BUFFER,START,0X1FFBA50,8192000,,,,,,,
5413.026325,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|all,QUEUE,0X1FFB990,1,,,,,,,
5413.090922,clEnqueueTask|33528880,END,,,,,,,,,
5413.101106,clEnqueueMigrateMemObjects|33528880,START,,,,,,,,,
5413.116929,READ_BUFFER,QUEUE,0X1FFD260,4096000,,,,,,,
5413.132019,clEnqueueMigrateMemObjects|33528880,END,,,,,,,,,
5413.14359,clFinish|General,START,,,,,,,,,
5414.489565,WRITE_BUFFER,END,0X1FFBA50,8192000,,,,,,,
5414.540075,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|all,SUBMIT,0X1FFB990,1,,,,,,,
5414.606145,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|krnl_vadd_1,START,0X1FFB990,1,,,,,,,
5552.965332,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|krnl_vadd_1,END,0X1FFB990,1,,,,,,,
5553.019282,READ_BUFFER,SUBMIT,0X1FFD260,4096000,,,,,,,
5553.061313,READ_BUFFER,START,0X1FFD260,4096000,,,,,,,
5553.664164,READ_BUFFER,END,0X1FFD260,4096000,,,,,,,
5553.664777,clFinish|General,END,,,,,,,,,
5559.529787,clReleaseMemObject|General,START,,,,,,,,,
5559.539755,clReleaseMemObject|General,END,,,,,,,,,
5559.545935,clReleaseMemObject|General,START,,,,,,,,,
5559.551635,clReleaseMemObject|General,END,,,,,,,,,
5559.557488,clReleaseMemObject|General,START,,,,,,,,,
5559.563178,clReleaseMemObject|General,END,,,,,,,,,
5559.56911,clReleaseMemObject|General,START,,,,,,,,,
5559.574875,clReleaseMemObject|General,END,,,,,,,,,
5559.580869,clReleaseMemObject|General,START,,,,,,,,,
5559.586381,clReleaseMemObject|General,END,,,,,,,,,
5559.592046,clReleaseMemObject|General,START,,,,,,,,,
5559.597661,clReleaseMemObject|General,END,,,,,,,,,
5559.604453,clReleaseKernel|General,START,,,,,,,,,
5559.644724,clReleaseKernel|General,END,,,,,,,,,
5559.653248,clReleaseProgram|General,START,,,,,,,,,
5559.728558,clReleaseProgram|General,END,,,,,,,,,
5559.759549,clReleaseCommandQueue|General,START,,,,,,,,,
5559.767261,clReleaseCommandQueue|General,END,,,,,,,,,
5559.774428,clReleaseContext|General,START,,,,,,,,,
5559.781728,clReleaseContext|General,END,,,,,,,,,
5559.788665,clReleaseDevice|General,START,,,,,,,,,
5559.794753,clReleaseDevice|General,END,,,,,,,,,
5559.800968,clReleaseDevice|General,START,,,,,,,,,
5559.806665,clReleaseDevice|General,END,,,,,,,,,
Footer,begin
Project,krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0,
Stall profiling,false,
Target,System Run,
Platform,xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0,
Footer,end

--------------------------------------------------------------------------------
/test/vector_addition_1000/reports/sdaccel_timeline_trace.html:
--------------------------------------------------------------------------------

SDAccel Timeline Trace

Generated on: 2018-04-02 14:18:02
Profiled application: host
Target platform: Xilinx

Time (msec),Name,Event,Address/Port,Size (Bytes or Num),Latency (cycles),Start (cycles),End (cycles),Latency (usec),Start (msec),End (msec)
0.316351,clGetExtensionFunctionAddress|General,START,,,,,,,,
0.349752,clGetExtensionFunctionAddress|General,END,,,,,,,,
0.36127,clGetExtensionFunctionAddress|General,START,,,,,,,,
0.36783,clGetExtensionFunctionAddress|General,END,,,,,,,,
0.376046,clGetPlatformInfo|General,START,,,,,,,,
0.38391,clGetPlatformInfo|General,END,,,,,,,,
0.389302,clGetPlatformInfo|General,START,,,,,,,,
0.394414,clGetPlatformInfo|General,END,,,,,,,,
0.407041,clGetPlatformInfo|General,START,,,,,,,,
0.412132,clGetPlatformInfo|General,END,,,,,,,,
0.416989,clGetPlatformInfo|General,START,,,,,,,,
0.421737,clGetPlatformInfo|General,END,,,,,,,,
0.426463,clGetPlatformInfo|General,START,,,,,,,,
0.43115,clGetPlatformInfo|General,END,,,,,,,,
0.436012,clGetPlatformInfo|General,START,,,,,,,,
0.440924,clGetPlatformInfo|General,END,,,,,,,,
0.445734,clGetPlatformInfo|General,START,,,,,,,,
0.450485,clGetPlatformInfo|General,END,,,,,,,,
0.455248,clGetPlatformInfo|General,START,,,,,,,,
0.460017,clGetPlatformInfo|General,END,,,,,,,,
0.467553,clGetPlatformInfo|General,START,,,,,,,,
0.472311,clGetPlatformInfo|General,END,,,,,,,,
0.477081,clGetPlatformInfo|General,START,,,,,,,,
0.481887,clGetPlatformInfo|General,END,,,,,,,,
0.486622,clGetPlatformInfo|General,START,,,,,,,,
0.491328,clGetPlatformInfo|General,END,,,,,,,,
0.496049,clGetPlatformInfo|General,START,,,,,,,,
0.500789,clGetPlatformInfo|General,END,,,,,,,,
0.524833,clGetPlatformInfo|General,START,,,,,,,,
0.530225,clGetPlatformInfo|General,END,,,,,,,,
0.539666,clGetPlatformInfo|General,START,,,,,,,,
0.544446,clGetPlatformInfo|General,END,,,,,,,,
0.588068,clGetDeviceIDs|General,START,,,,,,,,
0.599528,clGetDeviceIDs|General,END,,,,,,,,
0.605543,clGetDeviceIDs|General,START,,,,,,,,
0.610617,clGetDeviceIDs|General,END,,,,,,,,
0.619341,clRetainDevice|General,START,,,,,,,,
0.625713,clRetainDevice|General,END,,,,,,,,
0.632312,clRetainDevice|General,START,,,,,,,,
0.637447,clRetainDevice|General,END,,,,,,,,
0.64441,clCreateContext|General,START,,,,,,,,
0.665823,clCreateContext|General,END,,,,,,,,
0.674268,clCreateCommandQueue|General,START,,,,,,,,
0.683913,clCreateCommandQueue|General,END,,,,,,,,
0.691083,clGetDeviceInfo|General,START,,,,,,,,
0.697442,clGetDeviceInfo|General,END,,,,,,,,
0.702992,clGetDeviceInfo|General,START,,,,,,,,
0.708188,clGetDeviceInfo|General,END,,,,,,,,
0.847502,clCreateProgramWithBinary|General,START,,,,,,,,
5412.401678,clCreateProgramWithBinary|General,END,,,,,,,,
5412.447043,clCreateKernel|General,START,,,,,,,,
5412.471747,clCreateKernel|General,END,,,,,,,,
5412.482605,clCreateBuffer|General,START,,,,,,,,
5412.521804,clCreateBuffer|General,END,,,,,,,,
5412.529633,clCreateBuffer|General,START,,,,,,,,
5412.536924,clCreateBuffer|General,END,,,,,,,,
5412.593968,clCreateBuffer|General,START,,,,,,,,
5412.602449,clCreateBuffer|General,END,,,,,,,,
5412.611607,clRetainMemObject|General,START,,,,,,,,
5412.621426,clRetainMemObject|General,END,,,,,,,,
5412.629287,clRetainMemObject|General,START,,,,,,,,
5412.635867,clRetainMemObject|General,END,,,,,,,,
5412.642655,clRetainMemObject|General,START,,,,,,,,
5412.648872,clRetainMemObject|General,END,,,,,,,,
5412.660265,clEnqueueMigrateMemObjects|33528880,START,,,,,,,,
5412.70815,WRITE_BUFFER,QUEUE,0X1FFBA50,8192000,,,,,,
5412.725713,clEnqueueMigrateMemObjects|33528880,END,,,,,,,,
5412.793915,clSetKernelArg|General,START,,,,,,,,
5412.800486,WRITE_BUFFER,SUBMIT,0X1FFBA50,8192000,,,,,,
5412.822567,clSetKernelArg|General,END,,,,,,,,
5412.875521,clSetKernelArg|General,START,,,,,,,,
5412.883192,clSetKernelArg|General,END,,,,,,,,
5412.889394,clSetKernelArg|General,START,,,,,,,,
5412.895724,clSetKernelArg|General,END,,,,,,,,
5412.901548,clSetKernelArg|General,START,,,,,,,,
5412.907998,clSetKernelArg|General,END,,,,,,,,
5412.953816,clEnqueueTask|33528880,START,,,,,,,,
5413.011706,WRITE_BUFFER,START,0X1FFBA50,8192000,,,,,,
5413.026325,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|all,QUEUE,0X1FFB990,1,,,,,,
5413.090922,clEnqueueTask|33528880,END,,,,,,,,
5413.101106,clEnqueueMigrateMemObjects|33528880,START,,,,,,,,
5413.116929,READ_BUFFER,QUEUE,0X1FFD260,4096000,,,,,,
5413.132019,clEnqueueMigrateMemObjects|33528880,END,,,,,,,,
5413.14359,clFinish|General,START,,,,,,,,
5414.489565,WRITE_BUFFER,END,0X1FFBA50,8192000,,,,,,
5414.540075,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|all,SUBMIT,0X1FFB990,1,,,,,,
5414.606145,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|krnl_vadd_1,START,0X1FFB990,1,,,,,,
5552.965332,KERNEL|xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0-0|krnl_vadd.hw.xilinx_aws-vu9p-f1_4ddr-xpr-2pr-debug_4_0|krnl_vadd|1:1:1|krnl_vadd_1,END,0X1FFB990,1,,,,,,
5553.019282,READ_BUFFER,SUBMIT,0X1FFD260,4096000,,,,,,
5553.061313,READ_BUFFER,START,0X1FFD260,4096000,,,,,,
5553.664164,READ_BUFFER,END,0X1FFD260,4096000,,,,,,
5553.664777,clFinish|General,END,,,,,,,,
5559.529787,clReleaseMemObject|General,START,,,,,,,,
5559.539755,clReleaseMemObject|General,END,,,,,,,,
5559.545935,clReleaseMemObject|General,START,,,,,,,,
5559.551635,clReleaseMemObject|General,END,,,,,,,,
5559.557488,clReleaseMemObject|General,START,,,,,,,,
5559.563178,clReleaseMemObject|General,END,,,,,,,,
5559.56911,clReleaseMemObject|General,START,,,,,,,,
5559.574875,clReleaseMemObject|General,END,,,,,,,,
5559.580869,clReleaseMemObject|General,START,,,,,,,,
5559.586381,clReleaseMemObject|General,END,,,,,,,,
5559.592046,clReleaseMemObject|General,START,,,,,,,,
5559.597661,clReleaseMemObject|General,END,,,,,,,,
5559.604453,clReleaseKernel|General,START,,,,,,,,
5559.644724,clReleaseKernel|General,END,,,,,,,,
5559.653248,clReleaseProgram|General,START,,,,,,,,
5559.728558,clReleaseProgram|General,END,,,,,,,,
5559.759549,clReleaseCommandQueue|General,START,,,,,,,,
5559.767261,clReleaseCommandQueue|General,END,,,,,,,,
5559.774428,clReleaseContext|General,START,,,,,,,,
5559.781728,clReleaseContext|General,END,,,,,,,,
5559.788665,clReleaseDevice|General,START,,,,,,,,
5559.794753,clReleaseDevice|General,END,,,,,,,,
5559.800968,clReleaseDevice|General,START,,,,,,,,
5559.806665,clReleaseDevice|General,END,,,,,,,,
--------------------------------------------------------------------------------
/test/vector_addition_1000/src/host.cpp:
--------------------------------------------------------------------------------
/**********
Copyright (c) 2018, Xilinx, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********/
#include "xcl2.hpp"
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>
#include "vadd.h"

int main(int argc, char* argv[]) {

    size_t vector_size_bytes = sizeof(int) * LENGTH;
    /* aligned_allocator comes from xcl2.hpp; aligned host memory is needed for CL_MEM_USE_HOST_PTR */
    std::vector<int,aligned_allocator<int>> source_a   (LENGTH);
    std::vector<int,aligned_allocator<int>> source_b   (LENGTH);
    std::vector<int,aligned_allocator<int>> result_sim (LENGTH);
    std::vector<int,aligned_allocator<int>> result_krnl(LENGTH);

    /* Create the test data and run the vector addition locally */
    for(int i=0; i < LENGTH; i++){
        source_a[i] = i;
        source_b[i] = 2*i;
        result_sim[i] = source_a[i] + source_b[i];
    }

    std::vector<cl::Device> devices = xcl::get_xil_devices();
    cl::Device device = devices[0];

    cl::Context context(device);
    cl::CommandQueue q(context, device, CL_QUEUE_PROFILING_ENABLE);
    std::string device_name = device.getInfo<CL_DEVICE_NAME>();

    std::string binaryFile = xcl::find_binary_file(device_name,"krnl_vadd");
    cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
    devices.resize(1);
    cl::Program program(context, devices, bins);
    cl::Kernel krnl(program,"krnl_vadd");

    std::vector<cl::Memory> inBufVec, outBufVec;
    cl::Buffer buffer_a(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
            vector_size_bytes, source_a.data());
    cl::Buffer buffer_b(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
            vector_size_bytes, source_b.data());
    cl::Buffer buffer_c(context,CL_MEM_USE_HOST_PTR | CL_MEM_WRITE_ONLY,
            vector_size_bytes, result_krnl.data());
    inBufVec.push_back(buffer_a);
    inBufVec.push_back(buffer_b);
    outBufVec.push_back(buffer_c);


    /* Copy input vectors to device memory */
    q.enqueueMigrateMemObjects(inBufVec,0/* 0 means from host */);

    /* Set the kernel arguments */
    int vector_length = LENGTH;
    krnl.setArg(0, buffer_a);
    krnl.setArg(1, buffer_b);
    krnl.setArg(2, buffer_c);
    krnl.setArg(3, vector_length);

    /* Launch the kernel */
    q.enqueueTask(krnl);

    /* Copy result back to the host buffer and wait for all commands to finish */
    q.enqueueMigrateMemObjects(outBufVec,CL_MIGRATE_MEM_OBJECT_HOST);
    q.finish();


    /* Compare the results of the kernel to the simulation.
       Note: krnl_match is set to 1 when a mismatch is found. */
    int krnl_match = 0;
    for(int i = 0; i < LENGTH; i++){
        if(result_sim[i] != result_krnl[i]){
            printf("Error: Result mismatch\n");
            printf("i = %d CPU result = %d Krnl Result = %d\n", i, result_sim[i], result_krnl[i]);
            krnl_match = 1;
            break;
        } else{
            //printf("Result Match: i = %d CPU result = %d Krnl Result = %d\n", i, result_sim[i], result_krnl[i]);
        }
    }

    std::cout << "TEST " << (krnl_match ? "FAILED" : "PASSED") << std::endl;
    return (krnl_match ? EXIT_FAILURE : EXIT_SUCCESS);
}

--------------------------------------------------------------------------------
/test/vector_addition_1000/src/krnl_vadd.cl:
--------------------------------------------------------------------------------
/**********
Copyright (c) 2018, Xilinx, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********/

#define N 128

//__kernel void __attribute__ ((reqd_work_group_size(1,1,1)))
void
krnl_vadd_core(
     __global int *a,
     __global int *b,
     __global int *c)
{
    int result[N];

    {
        int j;
        read_a:
        __attribute__((xcl_pipeline_loop))
        for(j=0; j < N; j++)
            result[j] = a[j];

        read_b_write_c: // reading b and writing c in the same loop is supported
        __attribute__((xcl_pipeline_loop))
        for(j=0; j < N; j++)
            c[j] = result[j] + b[j];
    }

}

__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1)))
krnl_vadd(
     __global int* a,
     __global int* b,
     __global int* c,
     const int length) {

    // optimized kernel code

    // Process the vectors in blocks of N elements; a remainder of length % N is not processed
    int iterations = length/N;
    for(int i=0; i < iterations; i++) {
        krnl_vadd_core(a + i*N, b + i*N, c + i*N);
    }


    return;
}

--------------------------------------------------------------------------------
/test/vector_addition_1000/src/vadd.h:
--------------------------------------------------------------------------------
/**********
Copyright (c) 2018, Xilinx, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
**********/
#pragma once

#define LENGTH (1024*1000)
#define NUM_WORKGROUPS (1)
#define WORKGROUP_SIZE (16)

--------------------------------------------------------------------------------
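
`krnl_vadd` walks the vectors in blocks of `N = 128` elements and runs `length/N` iterations, so only whole blocks are added. With `LENGTH = 1024*1000` that is exactly 8000 blocks and nothing is left over, but a length that is not a multiple of 128 would leave the tail of `c` unwritten. The sketch below is a plain C++ model of that loop structure for experimenting on the CPU; it is an illustration only, not the repository's code, and `kBlock`, `covered_elements` and `blocked_vadd` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kBlock = 128;  // mirrors "#define N 128" in krnl_vadd.cl

// Number of elements the blocked loop actually processes: whole blocks only.
std::size_t covered_elements(std::size_t length) {
    return (length / kBlock) * kBlock;
}

// CPU model of krnl_vadd: stage a block of a, then add b and store into c.
// Elements past the last whole block are left untouched, as in the kernel.
// Assumption: a, b and c all have the same size.
void blocked_vadd(const std::vector<int>& a,
                  const std::vector<int>& b,
                  std::vector<int>& c) {
    const std::size_t blocks = a.size() / kBlock;
    for (std::size_t i = 0; i < blocks; ++i) {
        const std::size_t base = i * kBlock;
        int result[kBlock];
        for (std::size_t j = 0; j < kBlock; ++j) {  // corresponds to read_a
            result[j] = a[base + j];
        }
        for (std::size_t j = 0; j < kBlock; ++j) {  // corresponds to read_b_write_c
            c[base + j] = result[j] + b[base + j];
        }
    }
}
```

Calling `covered_elements` with the host's vector length before launching the kernel is a cheap way to check whether a chosen `LENGTH` would leave unprocessed elements.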