├── README.md └── refs.bib /README.md: -------------------------------------------------------------------------------- 1 | # Literature Review 2 | ## Conferences 3 | - [ISCA2016](http://isca2016.eecs.umich.edu/) [ISCA2015](http://www.ece.cmu.edu/calcm/isca2015/) 4 | - [Micro2016](https://www.microarch.org/micro49/) [Micro2015](https://www.microarch.org/micro48/) 5 | - [ASPLOS2016](https://www.ece.cmu.edu/calcm/asplos2016/) 6 | - [HPCA2016](http://hpca22.site.ac.upc.edu/index.php/program/conference-program/) [HPCA2015](http://darksilicon.org/hpca/) 7 | - [SC2016](http://sc16.supercomputing.org/conference-components/technical-program-tues-fri/technical-papers) [SC2015](http://sc15.supercomputing.org/program/technical-papers.html) 8 | - [VLDB2016](http://vldb2016.persistent.com/) 9 | - [OSDI2016] 10 | - [EuroSys2016] 11 | - [ICDE2016] 12 | - [SIGMOD2016] 13 | - [FCCM2016](http://fccm.org/2016/cfp.html) [FCCM2015](http://fccm.org/2015/) 14 | - [FPGA2016](http://www.isfpga.org/fpga2016/index.html) [FPGA2015](http://www.eecs.ucf.edu/isfpga/) 15 | - [FPL2016](http://www.fpl2016.org/) [FPL2015](http://www.fpl2015.org/?page=tech_sessions#arch3) 16 | - [FPT2016](http://www.icfpt2016.org/index.jsp) [FPT2015](http://fpt.massey.ac.nz/) 17 | - [ASAP2016](http://www.asap2016.org/) [ASAP2015](http://www.eecg.toronto.edu/asap2015/) 18 | 19 | 20 | ## Research Groups on Graph Acceleration Research 21 | - [GAP](http://gap.cs.berkeley.edu/) 22 | - [GPS](http://infolab.stanford.edu/gps/) 23 | - [Big Graph Mining](http://datalab.snu.ac.kr/projects/big-graph-mining) 24 | - [amplab: GraphX](https://amplab.cs.berkeley.edu/projects/graphx/) 25 | - [Gunrock](http://gunrock.github.io/gunrock/doc/latest/index.html) 26 | - [Ligra](http://jshun.github.io/ligra/) 27 | - [Galois](http://iss.ices.utexas.edu/?p=projects/galois) 28 | - [Trinity](https://www.microsoft.com/en-us/research/project/trinity/) 29 | - [GRASP](http://grasp.cs.ucr.edu/) 30 | 31 | ## Reading List 32
| 33 | ### Graph Processing 34 | The graph processing algorithms and frameworks are roughly classified by target computing platform: many-core processors, distributed systems, GPUs, ASIC-based accelerators, and FPGAs. Rather than presenting a complete graph processing framework, some work focuses on a single aspect of graph processing, such as graph compression, pre-processing, partitioning, or load balancing. These works are listed in their own subsections. 35 | 36 | #### Survey 37 | 38 | - McCune, Robert Ryan, Tim Weninger, and Greg Madey. "Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing." ACM Computing Surveys (CSUR) 48.2 (2015): 25. 39 | 40 | - Doekemeijer, Niels, and Ana Lucia Varbanescu. "A survey of parallel graph processing frameworks." Delft University of Technology (2014). 41 | 42 | 43 | #### Graph Processing on GPUs 44 | - Shi, Xuanhua, J. Liang, X. Luo, S. Di, B. He, L. Lu, and Hai Jin. "Frog: 45 | Asynchronous graph processing on GPU with hybrid coloring model." Huazhong 46 | University of Science and Technology, Tech. Rep. HUSTCGCL-TR-402 (2015). 47 | 48 | - Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha: 49 | vertex-centric graph processing on GPUs. In Proceedings of the 23rd 50 | international symposium on High-performance parallel and distributed computing 51 | (HPDC '14). ACM, New York, NY, USA, 239-252. 52 | 53 | - Fu, Zhisong, Michael Personick, and Bryan Thompson. "Mapgraph: A high level API for fast development of high performance graph analytics on GPUs." In Proceedings of Workshop on GRAph Data management Experiences and Systems pp. 1-6. ACM, 2014 54 | 55 | - Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. 2014. 56 | Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths.
In 57 | Proceedings of the 2014 IEEE 28th International Parallel and Distributed 58 | Processing Symposium (IPDPS '14). IEEE Computer Society, Washington, DC, USA, 59 | 349-359. 60 | 61 | - Merrill, Duane, Michael Garland, and Andrew Grimshaw. "Scalable GPU graph traversal." ACM SIGPLAN Notices. Vol. 47. No. 8. ACM, 2012. 62 | 63 | - Singh D P, Khare N. Modified Dijkstra’s Algorithm for Dense Graphs on GPU 64 | using CUDA[J]. Indian Journal of Science and Technology, 2016, 9(33). 65 | 66 | - Wang, Yangzihao; Davidson, Andrew; Pan, Yuechao; Wu, Yuduo; Riffel, Andy; & Owens, John D.(2016). Gunrock: A High-Performance Graph Processing Library on the GPU. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. 67 | 68 | - Singh DP, Khare N, Rasool A. Efficient Parallel Implementation of Single 69 | Source Shortest Path Algorithm on GPU Using CUDA. International Journal of 70 | Applied Engineering Research. 2016; 11(4):2560–7. 71 | 72 | - Bingsheng He, Jianlong Zhong, "Medusa: Simplified Graph Processing on GPUs", IEEE Transactions on Parallel & Distributed Systems 73 | 74 | - Hong, Sungpack, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun. 75 | "Accelerating CUDA graph algorithms at maximum warp." In ACM SIGPLAN Notices, 76 | vol. 46, no. 8, pp. 267-276. ACM, 2011. 77 | 78 | #### Graph Processing on CPUs 79 | - Roy, Amitabha, Ivo Mihailovic, and Willy Zwaenepoel. "X-Stream: edge-centric 80 | graph processing using streaming partitions." In Proceedings of the 81 | Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472-488. ACM, 82 | 2013. 83 | 84 | - Shang, Zechao, Feifei Li, Jeffrey Xu Yu, Zhiwei Zhang, and Hong Cheng. "Graph 85 | Analytics Through Fine-Grained Parallelism. SIGMOD, 2016" 86 | 87 | - Sundaram, Narayanan, et al. "GraphMat: High performance graph analytics made productive." Proceedings of the VLDB Endowment 8.11 (2015): 1214-1225. 88 | 89 | - Julian Shun. 
An Evaluation of Parallel Eccentricity Estimation Algorithms on 90 | Undirected Real-World Graphs. Proceedings of the ACM SIGKDD Conference on 91 | Knowledge Discovery and Data Mining (KDD), pp. 1095-1104, 2015. 92 | 93 | - Delling, Daniel, et al. "Phast: Hardware-accelerated shortest path trees." 94 | Journal of Parallel and Distributed Computing 73.7 (2013): 940-952. 95 | 96 | - Meyer, Ulrich, and Peter Sanders. "Δ-stepping: a parallelizable shortest path 97 | algorithm." Journal of Algorithms 49.1 (2003): 114-152. 98 | 99 | - Kyrola, Aapo, Guy Blelloch, and Carlos Guestrin. "GraphChi: large-scale graph computation on just a PC." Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 2012. 100 | 101 | - Julian Shun and Guy E. Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13). ACM, New York, NY, USA, 135-146. 102 | 103 | - Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, Huazhong Yang, "NXgraph: An Efficient Graph Processing System on a Single Machine", CoRR, 2015 104 | 105 | - Cheng, Raymond, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan 106 | Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. "Kineograph: taking the pulse 107 | of a fast-changing and connected world." In Proceedings of the 7th ACM 108 | european conference on Computer Systems, pp. 85-98. ACM, 2012. 109 | 110 | - Geisberger, Robert, Peter Sanders, Dominik Schultes, and Daniel Delling. 111 | "Contraction hierarchies: Faster and simpler hierarchical routing in road 112 | networks." In International Workshop on Experimental and Efficient Algorithms, 113 | pp. 319-333. Springer Berlin Heidelberg, 2008. 114 | 115 | - Zheng, Da, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, 116 | and Alexander S. Szalay. 
"FlashGraph: Processing billion-node graphs on an 117 | array of commodity SSDs." In 13th USENIX Conference on File and Storage 118 | Technologies (FAST 15), pp. 45-58. 2015. 119 | 120 | - Yuan, Pingpeng, Wenya Zhang, Changfeng Xie, Hai Jin, Ling Liu, and Kisung Lee. 121 | "Fast iterative graph computation: A path centric approach." In Proceedings of 122 | the International Conference for High Performance Computing, Networking, 123 | Storage and Analysis, pp. 401-412. IEEE Press, 2014. 124 | 125 | - Najeebullah, Kamran, Kifayat Ullah Khan, Waqas Nawaz, and Young-Koo Lee. "BPP: 126 | Large Graph Storage for Efficient Disk Based Processing." arXiv preprint 127 | arXiv:1401.2327 (2014). 128 | 129 | - Nilakant, Karthik, Valentin Dalibard, Amitabha Roy, and Eiko Yoneki. 130 | "PrefEdge: SSD prefetcher for large-scale graph traversal." In Proceedings of 131 | International Conference on Systems and Storage, pp. 1-12. ACM, 2014. 132 | 133 | - Nguyen, Donald, Andrew Lenharth, and Keshav Pingali. "A lightweight 134 | infrastructure for graph analytics." In Proceedings of the Twenty-Fourth ACM 135 | Symposium on Operating Systems Principles, pp. 456-471. ACM, 2013. 136 | 137 | #### Graph Processing on Distributed Systems 138 | - Venkataraman, Shivaram, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, and Robert 139 | S. Schreiber. "Presto: distributed machine learning and graph processing with 140 | sparse matrices." In Proceedings of the 8th ACM European Conference on 141 | Computer Systems, pp. 197-210. ACM, 2013. 142 | 143 | - Gonzalez, Joseph E., et al. "Graphx: Graph processing in a distributed dataflow framework." 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014. 144 | 145 | - Salihoglu, Semih, and Jennifer Widom. "GPS: a graph processing system." Proceedings of the 25th International Conference on Scientific and Statistical Database Management. ACM, 2013. 146 | 147 | - Malewicz, Grzegorz, et al. 
"Pregel: a system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010. 148 | 149 | - Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. 2016. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (GRADES '16). ACM, New York, NY, USA 150 | 151 | - Steinbauer, Matthias, and Gabriele Anderst-Kotsis. "DynamoGraph: extending the Pregel paradigm for large-scale temporal graph processing." International Journal of Grid and Utility Computing 7.2 (2016): 141-151. 152 | 153 | - Steinbauer, Matthias, and Gabriele Anderst-Kotsis. "DynamoGraph: A Distributed System for Large-scale, Temporal Graph Processing, its Implementation and First Observations." Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016. 154 | 155 | - Khayyat, Zuhair, et al. "Mizan: a system for dynamic load balancing in large-scale graph processing." Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013. 156 | 157 | - Sengupta, Dipanjan, et al. "Graphin: An online high performance incremental graph processing framework." European Conference on Parallel Processing. Springer International Publishing, 2016. 158 | 159 | - Sabeur Aridhi, Alberto Montresor, and Yannis Velegrakis. 2016. BLADYG: A Novel Block-Centric Framework for the Analysis of Large Dynamic Graphs. In Proceedings of the ACM Workshop on High Performance Graph Processing (HPGP '16). ACM, New York, NY, USA 160 | 161 | 162 | #### Graph Processing on FPGAs 163 | - Umuroglu, Yaman, Donn Morrison, and Magnus Jahre. "Hybrid breadth-first search 164 | on a single-chip FPGA-CPU heterogeneous platform." In Field Programmable Logic 165 | and Applications (FPL), 2015 25th International Conference on, pp. 1-8. IEEE, 166 | 2015. 167 | 168 | - Oguntebi and Kunle Olukotun. 
2016. GraphOps: A Dataflow Library for Graph 169 | Analytics Acceleration. In Proceedings of the 2016 ACM/SIGDA International 170 | Symposium on Field-Programmable Gate Arrays (FPGA '16). ACM, New York, NY, USA, 171 | 111-117. DOI: http://dx.doi.org/10.1145/2847263.2847337 172 | 173 | - Nurvitadhi, Eriko, et al. "GraphGen: An FPGA framework for vertex-centric graph computation." Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on. IEEE, 2014. 174 | 175 | - U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff and P. Sadayappan, "Parallel FPGA-based all-pairs shortest-paths in a directed graph," Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006 176 | 177 | - Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph Processing Framework on FPGA A Case Study of Breadth-First Search. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 178 | 179 | - N. Engelhardt and H. K. H. So, "GraVF: A vertex-centric distributed graph processing framework on FPGAs," 2016 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 2016, pp. 1-4. 180 | 181 | - Kapre, Nachiket. "Custom FPGA-based soft-processors for sparse graph 182 | acceleration." In 2015 IEEE 26th International Conference on 183 | Application-specific Systems, Architectures and Processors (ASAP), pp. 9-16. 184 | IEEE, 2015. 185 | 186 | - Kapre, Nachiket, and Pradeep Moorthy. "A case for embedded FPGA-based socs in 187 | energy-efficient acceleration of graph problems." Supercomputing frontiers and 188 | innovations 2, no. 3 (2015): 76-86. 189 | 190 | - S. Zhou, C. Chelmis and V. K. Prasanna, "High-Throughput and Energy-Efficient 191 | Graph Processing on FPGA," 2016 IEEE 24th Annual International Symposium on 192 | Field-Programmable Custom Computing Machines (FCCM), Washington, DC, 2016, pp. 193 | 103-110. 
194 | 195 | #### Graph Processing on ASICs 196 | 197 | - Ham, Tae Jun, et al. "Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics." 198 | 199 | - Ozdal, Muhammet Mustafa, et al. "Energy efficient architecture for graph analytics accelerators." Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016. 200 | 201 | - Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA '15). ACM, New York, NY, USA, 105-117. 202 | 203 | ### Graph Partition and Clustering 204 | - Chen, Rong, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. "Powerlyra: 205 | Differentiated graph computation and partitioning on skewed graphs." In 206 | Proceedings of the Tenth European Conference on Computer Systems, p. 1. ACM, 207 | 2015. 208 | 209 | - Vaquero, Luis, et al. "xDGP: A dynamic graph processing system with adaptive partitioning." arXiv preprint arXiv:1309.1049 (2013). 210 | 211 | - Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis and Michael Mahoney. 212 | Parallel Local Graph Clustering. Proceedings of the International Conference 213 | on Very Large Data Bases (VLDB), 2016. 214 | 215 | - A. Abdolrashidi and L. Ramaswamy, "Continual and Cost-Effective Partitioning of Dynamic Graphs for Optimizing Big Graph Processing Systems," 2016 IEEE International Congress on Big Data (BigData Congress), San Francisco, CA, USA, 2016 216 | 217 | - Andreas Beckmann, Ulrich Meyer, and David Veith, "An Implementation of I/O-Efficient Dynamic Breadth-First Search Using Level-Aligned Hierarchical Clustering", 21st Annual European Symposium on Algorithms (ESA), 2013. 218 | 219 | ### Graph Pre-processing 220 | 221 | - Wu, Bo, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, and Xipeng Shen.
222 | "Complexity analysis and algorithm design for reorganizing data to minimize 223 | non-coalesced memory accesses on GPU." In ACM SIGPLAN Notices, vol. 48, no. 8, 224 | pp. 57-68. ACM, 2013. 225 | 226 | - Khorasani, Farzad, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. "CuSha: 227 | vertex-centric graph processing on GPUs." In Proceedings of the 23rd 228 | international symposium on High-performance parallel and distributed 229 | computing, pp. 239-252. ACM, 2014. 230 | 231 | - Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. 2011. 232 | On-the-fly elimination of dynamic irregularities for GPU computing. In 233 | Proceedings of the sixteenth international conference on Architectural support 234 | for programming languages and operating systems (ASPLOS XVI). ACM, New York, NY, USA, 369-380. 235 | 236 | - Sanders, Peter, Dominik Schultes, and Christian Vetter. "Mobile route 237 | planning." In European Symposium on Algorithms, pp. 732-743. Springer Berlin 238 | Heidelberg, 2008. 239 | 240 | ### Load balancing 241 | 242 | ### Graph Compression 243 | - Zhou, Fang. "Graph compression." Department of Computer Science and Helsinki Institute for Information Technology HIIT (2015): 1-12. 244 | 245 | - S. Chen and J. H. Reif. 1996. Efficient Lossless Compression of Trees and Graphs. In Proceedings of the Conference on Data Compression (DCC '96). IEEE Computer Society, Washington 246 | 247 | - Sebastian Maneth and Fabian Peternek, "A Survey on Methods and Systems for Graph Compression", Journal of CoRR, 2015 248 | 249 | - Sparsh Mittal and Jeffrey S. Vetter. 2016. A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems. IEEE Trans. Parallel Distrib. Syst. 27, 5 (May 2016), 1524-1536. 250 | 251 | - Vito Giovanni Castellana, Marco Minutoli, Alessandro Morari, Antonino Tumeo, Marco Lattuada, and Fabrizio Ferrandi. 2015. High Level Synthesis of RDF Queries for Graph Analytics. 
In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '15). IEEE Press, Piscataway, NJ, USA, 323-330. 252 | 253 | - Julian Shun, Laxman Dhulipala and Guy Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+. Proceedings of the IEEE Data Compression Conference (DCC), pp. 403-412, 2015 254 | 255 | ### Graph Approximate Computing 256 | - Shang, Zechao, and Jeffrey Xu Yu. "Auto-approximation of graph computing." 257 | Proceedings of the VLDB Endowment 7, no. 14 (2014): 1833-1844. 258 | 259 | 260 | ## Graph Database 261 | 262 | - Shi, Jiaxin, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. "Fast and 263 | concurrent rdf queries with rdma-based distributed graph exploration." In 12th 264 | USENIX Symposium on Operating Systems Design and Implementation (OSDI 265 | 16), Savannah, GA. 2016. 266 | 267 | - Xirogiannopoulos, Konstantinos, Udayan Khurana, and Amol Deshpande. "GraphGen: exploring interesting graphs in relational data." Proceedings of the VLDB Endowment 8.12 (2015): 2032-2035. 268 | 269 | - Morari, Alessandro, Jesse Weaver, Oreste Villa, David Haglin, Antonino Tumeo, 270 | Vito Giovanni Castellana, and John Feo. "High-Performance, Distributed 271 | Dictionary Encoding of RDF Datasets." In 2015 IEEE International Conference on 272 | Cluster Computing, pp. 250-253. IEEE, 2015. 273 | 274 | - Morari, Alessandro, Vito Giovanni Castellana, Oreste Villa, Jesse Weaver, 275 | Gregory Todd Williams, David J. Haglin, Antonino Tumeo, and John Feo. "GEMS: 276 | Graph Database Engine for Multithreaded Systems." (2015): 139-156. 277 | 278 | ## Research Groups on Database Query Acceleration 279 | - [Xtra Computing Group](http://pdcc.ntu.edu.sg/xtra/) 280 | - [amplab](https://amplab.cs.berkeley.edu/projects/succinct-enabling-queries-on-compressed-data/) 281 | 282 | ## Reading List 283 | 284 | ### Database Query Acceleration 285 | 286 | - M. Sadoghi, R. Javed, N. Tarafdar, H. Singh, R. Palaniappan and H. A.
Jacobsen, "Multi-query Stream Processing on FPGAs," 2012 IEEE 28th International Conference on Data Engineering, Washington, DC, 2012, pp. 1229-1232. 287 | 288 | - Kocberber, Onur, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and 289 | Parthasarathy Ranganathan. "Meet the walkers: Accelerating index traversals 290 | for in-memory databases." In Proceedings of the 46th Annual IEEE/ACM 291 | International Symposium on Microarchitecture, pp. 468-479. ACM, 2013. 292 | 293 | - V. G. Castellana et al., "In-Memory Graph Databases for Web-Scale Data," in 294 | Computer, vol. 48, no. 3, pp. 24-35, Mar. 2015. 295 | 296 | - Zeng, Kai, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. "A 297 | distributed graph engine for web scale RDF data." In Proceedings of the VLDB 298 | Endowment, vol. 6, no. 4, pp. 265-276. VLDB Endowment, 2013. 299 | 300 | - Sukhwani, Bharat, et al. "A hardware/software approach for database query acceleration with fpgas." International Journal of Parallel Programming 43.6 (2015): 1129-1159. 301 | 302 | - Dennl, Christopher, Daniel Ziener, and Jurgen Teich. "On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library." Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on. IEEE, 2012. 303 | 304 | - Wu, Lisa, et al. "The Q100 Database Processing Unit." IEEE Micro 35.3 (2015): 34-46. 305 | 306 | - Chung, Eric S., John D. Davis, and Jaewon Lee. "Linqits: Big data on little clients." ACM SIGARCH Computer Architecture News. Vol. 41. No. 3. ACM, 2013. 307 | 308 | - Halstead, Robert J., et al. "FPGA-based Multithreading for In-Memory Hash Joins." CIDR. 2015. 309 | 310 | - Chen, Ren, and Viktor K. Prasanna. "Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform." 311 | 312 | - Wang, Zeke, Bingsheng He, and Wei Zhang. "A study of data partitioning on OpenCL-based FPGAs." 
2015 25th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2015. 313 | 314 | - R. R. Bordawekar and M. Sadoghi, "Accelerating database workloads by software-hardware-system co-design," 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, 2016, pp. 1428-1431. 315 | 316 | - Guo, Cong and Martin Karsten. “Towards Adaptive Resource Allocation for Database Workloads.” ADMS@VLDB (2015). 317 | 318 | - Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based Pipelined Query Processing Engine. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY 319 | 320 | - Jared Casper and Kunle Olukotun. 2014. Hardware acceleration of database operations. In Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays (FPGA '14) 321 | 322 | - Gokul Soundararajan, Daniel Lupei, Saeed Ghanbari, Adrian Daniel Popescu, Jin Chen, and Cristiana Amza. 2009. Dynamic resource allocation for database servers running on virtual storage. In Proceedings of the 7th conference on File and storage technologies (FAST '09), Margo Seltzer and Ric Wheeler (Eds.). USENIX Association, Berkeley, CA, USA, 71-84. 323 | 324 | - Bingsheng He and Jeffrey Xu Yu. 2011. High-throughput transaction executions on graphics processors. Proc. VLDB Endow. 4, 5 (February 2011), 314-325. 325 | 326 | - Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Bernard Brezzo, Sameh Asaad, Donna Eng Dillenberger, "Database Analytics: A Reconfigurable-Computing Approach", IEEE Micro vol. 34 no. 1, p. 19-29, Jan.-Feb., 2014 327 | 328 | - Shuang Chen, Shunning Jiang, Bingsheng He, and Xueyan Tang. 2016. A Study of Sorting Algorithms on Approximate Memory. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA, 647-662.
329 | 330 | - Gustavo Alonso, "Data Processing on the fast lane", Systems Group, Department of Computer Science, ETH Zurich, Switzerland, FPL keynote, 2016. 331 | 332 | - Zeke Wang, Huiyan Cheah, Johns Paul, Bingsheng He, and Wei Zhang. 2016. Accelerating Database Query Processing on OpenCL-based FPGAs (Abstract Only). In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '16). ACM, New York, NY 333 | 334 | - Barthels, Claude, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten 335 | Hoefler. "Distributed Join Algorithms on Thousands of Cores." Proceedings of 336 | the VLDB Endowment 10, no. 5 (2017). 337 | 338 | ### Database Compression 339 | - Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann and Alfons Kemper. “Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation.” SIGMOD Conference (2016). 340 | 341 | - Lin, Chunbin, Jianguo Wang, and Yannis Papakonstantinou. "Data Compression for 342 | Analytics over Large-scale In-memory Column Databases." arXiv preprint 343 | arXiv:1606.09315 (2016). 344 | 345 | ## Interesting Open Projects & Posts 346 | - [RIFFA](https://github.com/KastnerRG/riffa) 347 | - [Overclocking Arithmetic](https://constantinides.net/2014/12/11/overclocking-friendly-arithmetic-neednt-cost-the-earth/) 348 | - [Some Highlights in FPGA 2016](https://constantinides.net/2016/02/25/fpga-2016-some-highlights/) 349 | - [Time is Precision](https://constantinides.net/2016/12/12/time-is-precision/) 350 | 351 | ## Cutting Edge Techniques 352 | - Ousterhout, John, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, 353 | Behnam Montazeri, Diego Ongaro et al. "The ramcloud storage system." ACM 354 | Transactions on Computer Systems (TOCS) 33, no. 3 (2015): 7. 355 | 356 | - Ho, Chen-Han, Sung Jin Kim, and Karthikeyan Sankaralingam. "Efficient 357 | execution of memory access phases using dataflow specialization."
In ACM 358 | SIGARCH Computer Architecture News, vol. 43, no. 3, pp. 118-130. ACM, 2015. 359 | 360 | - Kumar, Snehasish, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, and 361 | Jordon Phillips. "SQRL: hardware accelerator for collecting software data 362 | structures." In Proceedings of the 23rd international conference on Parallel 363 | architectures and compilation, pp. 475-476. ACM, 2014. 364 | 365 | - Schkufza, Eric, Rahul Sharma, and Alex Aiken. "Stochastic optimization of 366 | floating-point programs with tunable precision." ACM SIGPLAN Notices 49, no. 6 367 | (2014): 53-64. 368 | 369 | ## Interesting Research Topic 370 | 371 | ### Memory access related optimization 372 | - Guo, Qi, Tze-Meng Low, Nikolaos Alachiotis, Berkin Akin, Larry Pileggi, James 373 | C. Hoe, and Franz Franchetti. "Enabling portable energy efficiency with memory 374 | accelerated library." In Proceedings of the 48th International Symposium on 375 | Microarchitecture, pp. 750-761. ACM, 2015. 376 | 377 | - Appuswamy, Raja, Matthaios Olma, and Anastasia Ailamaki. "Scaling the Memory 378 | Power Wall With DRAM-Aware Data Management." In Proceedings of the 11th 379 | International Workshop on Data Management on New Hardware, p. 3. ACM, 2015. 380 | 381 | - Akın, Berkin, Franz Franchetti, and James C. Hoe. "Understanding the design 382 | space of dram-optimized hardware FFT accelerators." In 2014 IEEE 25th 383 | International Conference on Application-Specific Systems, Architectures and 384 | Processors, pp. 248-255. IEEE, 2014. 385 | 386 | - Akin, Berkin, Franz Franchetti, and James C. Hoe. "Data reorganization in 387 | memory using 3d-stacked dram." In ACM SIGARCH Computer Architecture News, vol. 388 | 43, no. 3, pp. 131-143. ACM, 2015. 389 | 390 | - Hsieh, Kevin, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali 391 | Boroumand, Saugata Ghose, and Onur Mutlu. "Accelerating pointer chasing in 392 | 3D-stacked memory: Challenges, mechanisms, evaluation." 
In Computer Design 393 | (ICCD), 2016 IEEE 34th International Conference on, pp. 25-32. IEEE, 2016. 394 | 395 | ### FPGA Design Tools and Frameworks 396 | - Jacobsen, M., Richmond, D., Hogains, M., and Kastner, R. “RIFFA 2.1: A reusable integration framework for FPGA accelerators.” ACM Transactions on Reconfigurable Technology and Systems (TRETS), September 2015. 397 | 398 | - C. Pham-Quoc, Z. Al-Ars and K. Bertels, "Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling," Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, Phoenix, AZ, 2014, pp. 151-160. 399 | 400 | - Niu, Xinyu, Wayne Luk, and Yu Wang. "EURECA: On-chip configuration generation for effective dynamic data access." Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015. 401 | 402 | ### Sparse Matrix Computing Acceleration on FPGAs 403 | - Umuroglu, Yaman, and Magnus Jahre. "Random access schemes for efficient FPGA 404 | SpMV acceleration." Microprocessors and Microsystems (2016). 405 | 406 | - Dorrance, Richard, Fengbo Ren, and Dejan Marković. "A scalable sparse 407 | matrix-vector multiplication kernel for energy-efficient sparse-blas on 408 | FPGAs." In Proceedings of the 2014 ACM/SIGDA international symposium on 409 | Field-programmable gate arrays, pp. 161-170. ACM, 2014. 410 | 411 | - Jamro, Ernest, Tomasz Pabiś, Paweł Russek, and Kazimierz Wiatr. "The 412 | algorithms for FPGA implementation of sparse matrices multiplication." 413 | Computing and Informatics 33, no. 3 (2015): 667-684. 414 | 415 | - Giefers, Heiner, Peter Staar, Costas Bekas, and Christoph Hagleitner. 416 | "Analyzing the energy-efficiency of sparse matrix multiplication on 417 | heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA." In 418 | Performance Analysis of Systems and Software (ISPASS), 2016 IEEE International 419 | Symposium on, pp. 46-56. IEEE, 2016. 
420 | 421 | ### Manycore Simulation and Scalability Research 422 | - Yu, Xiangyao, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael 423 | Stonebraker. "Staring into the abyss: An evaluation of concurrency control 424 | with one thousand cores." Proceedings of the VLDB Endowment 8, no. 3 (2014): 425 | 209-220. 426 | 427 | - Fu, Yaosheng, and David Wentzlaff. "PriME: A parallel and distributed 428 | simulator for thousand-core chips." In Performance Analysis of Systems and 429 | Software (ISPASS), 2014 IEEE International Symposium on, pp. 116-125. IEEE, 430 | 2014. 431 | 432 | - Miller, Jason E., Harshad Kasture, George Kurian, Charles Gruenwald, Nathan 433 | Beckmann, Christopher Celio, Jonathan Eastep, and Anant Agarwal. "Graphite: A 434 | distributed parallel simulator for multicores." In HPCA-16 2010 The Sixteenth 435 | International Symposium on High-Performance Computer Architecture, pp. 1-12. 436 | IEEE, 2010. 437 | 438 | - Carlson, Trevor E., Wim Heirman, and Lieven Eeckhout. "Sniper: exploring the 439 | level of abstraction for scalable and accurate parallel multi-core 440 | simulation." In Proceedings of 2011 International Conference for High 441 | Performance Computing, Networking, Storage and Analysis, p. 52. ACM, 2011. 
442 | -------------------------------------------------------------------------------- /refs.bib: -------------------------------------------------------------------------------- 1 | @inproceedings{charousset2014caf, 2 | title={Caf-the c++ actor framework for scalable and resource-efficient applications}, 3 | author={Charousset, Dominik and Hiesgen, Raphael and Schmidt, Thomas C}, 4 | booktitle={Proceedings of the 4th International Workshop on Programming based on Actors Agents 5 | \& Decentralized Control}, 6 | pages={15--28}, 7 | year={2014}, 8 | organization={ACM} 9 | } 10 | 11 | @inproceedings{hiesgen2015manyfold, 12 | title={Manyfold actors: extending the C++ actor framework to heterogeneous many-core machines 13 | using OpenCL}, 14 | author={Hiesgen, Raphael and Charousset, Dominik and Schmidt, Thomas C}, 15 | booktitle={Proceedings of the 5th International Workshop on Programming Based on Actors, 16 | Agents, and Decentralized Control}, 17 | pages={45--56}, 18 | year={2015}, 19 | organization={ACM} 20 | } 21 | 22 | 23 | @article{wu2014q100, 24 | title={Q100: the architecture and design of a database processing unit}, 25 | author={Wu, Lisa and Lottarini, Andrea and Paine, Timothy K and Kim, Martha A and Ross, Kenneth 26 | A}, 27 | journal={ACM SIGPLAN Notices}, 28 | volume={49}, 29 | number={4}, 30 | pages={255--268}, 31 | year={2014}, 32 | publisher={ACM} 33 | } 34 | 35 | 36 | @article{wu2015q100, 37 | title={The Q100 Database Processing Unit}, 38 | author={Wu, Lisa and Lottarini, Andrea and Paine, Timothy K and Kim, Martha A and Ross, Kenneth 39 | A}, 40 | journal={IEEE Micro}, 41 | volume={35}, 42 | number={3}, 43 | pages={34--46}, 44 | year={2015}, 45 | publisher={IEEE} 46 | } 47 | 48 | @inproceedings{chung2013linqits, 49 | title={Linqits: Big data on little clients}, 50 | author={Chung, Eric S and Davis, John D and Lee, Jaewon}, 51 | booktitle={ACM SIGARCH Computer Architecture News}, 52 | volume={41}, 53 | number={3}, 54 | pages={261--272}, 55 | 
year={2013}, 56 | organization={ACM} 57 | } 58 | 59 | @article{guotowards, 60 | title={Towards Adaptive Resource Allocation for Database Workloads}, 61 | author={Guo, Cong and Karsten, Martin} 62 | } 63 | 64 | @inproceedings{soundararajan2009dynamic, 65 | title={Dynamic Resource Allocation for Database Servers Running on Virtual Storage.}, 66 | author={Soundararajan, Gokul and Lupei, Daniel and Ghanbari, Saeed and Popescu, Adrian Daniel and Chen, Jin and Amza, Cristiana}, 67 | booktitle={FAST}, 68 | volume={9}, 69 | pages={71--84}, 70 | year={2009} 71 | } 72 | 73 | @inproceedings{sadoghi2012multi, 74 | title={Multi-query stream processing on fpgas}, 75 | author={Sadoghi, Mohammad and Javed, Rija and Tarafdar, Naif and Singh, Harsh and Palaniappan, Rohan and Jacobsen, Hans-Arno}, 76 | booktitle={2012 IEEE 28th International Conference on Data Engineering}, 77 | pages={1229--1232}, 78 | year={2012}, 79 | organization={IEEE} 80 | } 81 | 82 | @article{chen2015accelerating, 83 | title={Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform}, 84 | author={Chen, Ren and Prasanna, Viktor K} 85 | } 86 | 87 | @article{papadimitriou2011performance, 88 | title={Performance of partial reconfiguration in FPGA systems: A survey and a cost model}, 89 | author={Papadimitriou, Kyprianos and Dollas, Apostolos and Hauck, Scott}, 90 | journal={ACM Transactions on Reconfigurable Technology and Systems (TRETS)}, 91 | volume={4}, 92 | number={4}, 93 | pages={36}, 94 | year={2011}, 95 | publisher={ACM} 96 | } 97 | 98 | @article{zhang2013omnidb, 99 | title={Omnidb: Towards portable and efficient query processing on parallel cpu/gpu architectures}, 100 | author={Zhang, Shuhao and He, Jiong and He, Bingsheng and Lu, Mian}, 101 | journal={Proceedings of the VLDB Endowment}, 102 | volume={6}, 103 | number={12}, 104 | pages={1374--1377}, 105 | year={2013}, 106 | publisher={VLDB Endowment} 107 | } 108 | 109 | @inproceedings{nurvitadhi2014graphgen, 110 | title={GraphGen: An FPGA framework for 
vertex-centric graph computation}, 111 | author={Nurvitadhi, Eriko and Weisz, Gabriel and Wang, Yu and Hurkat, Skand and Nguyen, Marie and Hoe, James C and Mart{\'\i}nez, Jos{\'e} F and Guestrin, Carlos}, 112 | booktitle={Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on}, 113 | pages={25--28}, 114 | year={2014}, 115 | organization={IEEE} 116 | } 117 | 118 | @inproceedings{ozdal2016energy, 119 | title={Energy efficient architecture for graph analytics accelerators}, 120 | author={Ozdal, Muhammet Mustafa and Yesil, Serif and Kim, Taemin and Ayupov, Andrey and Greth, John and Burns, Steven and Ozturk, Ozcan}, 121 | booktitle={Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on}, 122 | pages={166--177}, 123 | year={2016}, 124 | organization={IEEE} 125 | } 126 | 127 | @article{hamgra2016graphicionado, 128 | title={Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics}, 129 | author={Ham, Tae Jun and Wu, Lisa and Sundaram, Narayanan and Satish, Nadathur and Martonosi, Margaret} 130 | } 131 | 132 | @article{sundaram2015graphmat, 133 | title={GraphMat: High performance graph analytics made productive}, 134 | author={Sundaram, Narayanan and Satish, Nadathur and Patwary, Md Mostofa Ali and Dulloor, Subramanya R and Anderson, Michael J and Vadlamudi, Satya Gautam and Das, Dipankar and Dubey, Pradeep}, 135 | journal={Proceedings of the VLDB Endowment}, 136 | volume={8}, 137 | number={11}, 138 | pages={1214--1225}, 139 | year={2015}, 140 | publisher={VLDB Endowment} 141 | } 142 | 143 | @article{sukhwani2015hardware, 144 | title={A hardware/software approach for database query acceleration with fpgas}, 145 | author={Sukhwani, Bharat and Thoennes, Mathew and Min, Hong and Dube, Parijat and Brezzo, Bernard and Asaad, Sameh and Dillenberger, Donna}, 146 | journal={International Journal of Parallel Programming}, 147 | volume={43}, 148 | number={6}, 149 | 
pages={1129--1159}, 150 | year={2015}, 151 | publisher={Springer} 152 | } 153 | 154 | @article{vaquero2013xdgp, 155 | title={xDGP: A dynamic graph processing system with adaptive partitioning}, 156 | author={Vaquero, Luis and Cuadrado, F{\'e}lix and Logothetis, Dionysios and Martella, Claudio}, 157 | journal={arXiv preprint arXiv:1309.1049}, 158 | year={2013} 159 | } 160 | 161 | @article{mccune2015thinking, 162 | title={Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing}, 163 | author={McCune, Robert Ryan and Weninger, Tim and Madey, Greg}, 164 | journal={ACM Computing Surveys (CSUR)}, 165 | volume={48}, 166 | number={2}, 167 | pages={25}, 168 | year={2015}, 169 | publisher={ACM} 170 | } 171 | 172 | @inproceedings{gonzalez2014graphx, 173 | title={Graphx: Graph processing in a distributed dataflow framework}, 174 | author={Gonzalez, Joseph E and Xin, Reynold S and Dave, Ankur and Crankshaw, Daniel and Franklin, Michael J and Stoica, Ion}, 175 | booktitle={11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)}, 176 | pages={599--613}, 177 | year={2014} 178 | } 179 | 180 | @inproceedings{salihoglu2013gps, 181 | title={GPS: a graph processing system}, 182 | author={Salihoglu, Semih and Widom, Jennifer}, 183 | booktitle={Proceedings of the 25th International Conference on Scientific and Statistical Database Management}, 184 | pages={22}, 185 | year={2013}, 186 | organization={ACM} 187 | } 188 | 189 | @inproceedings{malewicz2010pregel, 190 | title={Pregel: a system for large-scale graph processing}, 191 | author={Malewicz, Grzegorz and Austern, Matthew H and Bik, Aart JC and Dehnert, James C and Horn, Ilan and Leiser, Naty and Czajkowski, Grzegorz}, 192 | booktitle={Proceedings of the 2010 ACM SIGMOD International Conference on Management of data}, 193 | pages={135--146}, 194 | year={2010}, 195 | organization={ACM} 196 | } 197 | 198 | @inproceedings{kyrola2012graphchi, 199 | title={GraphChi: 
large-scale graph computation on just a PC}, 200 | author={Kyrola, Aapo and Blelloch, Guy and Guestrin, Carlos}, 201 | booktitle={Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12)}, 202 | pages={31--46}, 203 | year={2012} 204 | } 205 | 206 | @inproceedings{dennl2012fly, 207 | title={On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library}, 208 | author={Dennl, Christopher and Ziener, Daniel and Teich, Jurgen}, 209 | booktitle={Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on}, 210 | pages={45--52}, 211 | year={2012}, 212 | organization={IEEE} 213 | } 214 | 215 | @article{zhou2015graph, 216 | title={Graph compression}, 217 | author={Zhou, Fang}, 218 | journal={Department of Computer Science and Helsinki Institute for Information Technology HIIT}, 219 | pages={1--12}, 220 | year={2015} 221 | } 222 | 223 | @inproceedings{halstead2015fpga, 224 | title={FPGA-based Multithreading for In-Memory Hash Joins.}, 225 | author={Halstead, Robert J and Absalyamov, Ildar and Najjar, Walid A and Tsotras, Vassilis J}, 226 | booktitle={CIDR}, 227 | year={2015} 228 | } 229 | 230 | @inproceedings{wang2015study, 231 | title={A study of data partitioning on OpenCL-based FPGAs}, 232 | author={Wang, Zeke and He, Bingsheng and Zhang, Wei}, 233 | booktitle={2015 25th International Conference on Field Programmable Logic and Applications (FPL)}, 234 | pages={1--8}, 235 | year={2015}, 236 | organization={IEEE} 237 | } 238 | 239 | @INPROCEEDINGS{bondhugula2006parallel-APSP, 240 | author={U. Bondhugula and A. Devulapalli and J. Fernando and P. Wyckoff and P. 
Sadayappan}, 241 | booktitle={Proceedings 20th IEEE International Parallel Distributed Processing Symposium}, 242 | title={Parallel FPGA-based all-pairs shortest-paths in a directed graph}, 243 | year={2006}, 244 | pages={10 pp.-}, 245 | keywords={directed graphs;field programmable gate arrays;parallel algorithms;Cray XD1 processor;Floyd-Warshall algorithm;VLSI technology;all-pair shortest-path problem;bioinformatics application;directed graph;field programmable gate array;high performance computing;parallel FPGA design;parallel computing;very large scale integration;Algorithm design and analysis;Application software;Bonding;Design optimization;Field programmable gate arrays;High performance 246 | computing;Parallel processing;Signal processing algorithms;Supercomputers;Very large scale integration}, 247 | doi={10.1109/IPDPS.2006.1639347}, 248 | ISSN={1530-2075}, 249 | month={April}, 250 | } 251 | 252 | @INPROCEEDINGS{abdolrashidi2016continual, 253 | author={A. Abdolrashidi and L. Ramaswamy}, 254 | booktitle={2016 IEEE International Congress on Big Data (BigData Congress)}, 255 | title={Continual and Cost-Effective Partitioning of Dynamic Graphs for Optimizing Big Graph Processing Systems}, 256 | year={2016}, 257 | pages={18-25}, 258 | keywords={Distributed Vertex-Centric Graph Processing;Graph Partitioning;Performance Evaluation;Time-Evolving Graphs}, 259 | doi={10.1109/BigDataCongress.2016.12}, 260 | month={June}, 261 | } 262 | 263 | @inproceedings{dai2016fpgp, 264 | author = {Dai, Guohao and Chi, Yuze and Wang, Yu and Yang, Huazhong}, 265 | title = {FPGP: Graph Processing Framework on FPGA A Case Study of Breadth-First Search}, 266 | booktitle = {Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays}, 267 | series = {FPGA '16}, 268 | year = {2016}, 269 | isbn = {978-1-4503-3856-1}, 270 | location = {Monterey, California, USA}, 271 | pages = {105--110}, 272 | numpages = {6}, 273 | url = 
{http://doi.acm.org/10.1145/2847263.2847339}, 274 | doi = {10.1145/2847263.2847339}, 275 | acmid = {2847339}, 276 | publisher = {ACM}, 277 | address = {New York, NY, USA}, 278 | keywords = {fpga framework, large scale graph processing}, 279 | } 280 | 281 | @INPROCEEDINGS{sadogi2012multi-query, 282 | author={M. Sadoghi and R. Javed and N. Tarafdar and H. Singh and R. Palaniappan and H. A. Jacobsen}, 283 | booktitle={2012 IEEE 28th International Conference on Data Engineering}, 284 | title={Multi-query Stream Processing on FPGAs}, 285 | year={2012}, 286 | pages={1229-1232}, 287 | keywords={field programmable gate arrays;logic design;query processing;FPGA;algorithmic trading;complex event processing;high-frequency event streams;line-rate multiquery processing;low-level logic design;multiquery event stream platform;multiquery stream processing;parallelism;pipelining;real-time data analytics;reconfigurable hardware;targeted advertisement;Algebra;Bandwidth;Field programmable gate arrays;Hardware;Hardware design languages;Parallel processing;Semantics}, 288 | doi={10.1109/ICDE.2012.39}, 289 | ISSN={1063-6382}, 290 | month={April}, 291 | } 292 | 293 | @INPROCEEDINGS{engelhardt2016gravf, 294 | author={N. Engelhardt and H. K. H. So}, 295 | booktitle={2016 26th International Conference on Field Programmable Logic and Applications (FPL)}, 296 | title={GraVF: A vertex-centric distributed graph processing framework on FPGAs}, 297 | year={2016}, 298 | pages={1-4}, 299 | keywords={Algorithm design and analysis;Computational modeling;Computer architecture;Field programmable gate arrays;Hardware;Kernel;Programming}, 300 | doi={10.1109/FPL.2016.7577360}, 301 | month={Aug}, 302 | } 303 | 304 | @article{sparsh2012survey, 305 | author = {Mittal, Sparsh}, 306 | title = {A Survey of Architectural Techniques for DRAM Power Management}, 307 | journal = {Int. J. High Perform. Syst. 
Archit.}, 308 | issue_date = {December 2012}, 309 | volume = {4}, 310 | number = {2}, 311 | month = dec, 312 | year = {2012}, 313 | issn = {1751-6528}, 314 | pages = {110--119}, 315 | numpages = {10}, 316 | url = {http://dx.doi.org/10.1504/IJHPSA.2012.050990}, 317 | doi = {10.1504/IJHPSA.2012.050990}, 318 | acmid = {2421513}, 319 | publisher = {Inderscience Publishers}, 320 | address = {Inderscience Publishers, Geneva, SWITZERLAND}, 321 | } 322 | 323 | @ARTICLE{zhong2014medusa, 324 | author={J. Zhong and B. He}, 325 | journal={IEEE Transactions on Parallel and Distributed Systems}, 326 | title={Medusa: Simplified Graph Processing on GPUs}, 327 | year={2014}, 328 | volume={25}, 329 | number={6}, 330 | pages={1543-1552}, 331 | keywords={C++ language;application program interfaces;data structures;graph theory;graphics processing units;optimisation;source code (software);API;GPGPU programs;GPU graph operations;Medusa;data structures;graph processing;graph-centric optimizations;graphics processing unit;runtime system;sequential C-C++ code;source code;Algorithm design and analysis;Data structures;Graphics processing units;Memory management;Optimization;Parallel processing;Programming;GPGPU;GPU 332 | programming;graph processing;runtime framework}, 333 | doi={10.1109/TPDS.2013.111}, 334 | ISSN={1045-9219}, 335 | month={June}, 336 | } 337 | 338 | @INPROCEEDINGS{bordawekar2016accelerating, 339 | author={R. R. Bordawekar and M. 
Sadoghi}, 340 | booktitle={2016 IEEE 32nd International Conference on Data Engineering (ICDE)}, 341 | title={Accelerating database workloads by software-hardware-system co-design}, 342 | year={2016}, 343 | pages={1428-1431}, 344 | keywords={SQL;business data processing;field programmable gate arrays;graphics processing units;hardware-software codesign;query processing;relational databases;FPGA;GPU;NoSQL database;data stream management system;database workload acceleration;enterprise data management workload;field-programmable gate array;graphics processing unit;query execution pipeline;relational database;software-hardware-system codesign;system-level 345 | characterization;Acceleration;Computer architecture;Databases;Field programmable gate arrays;Graphics processing units;Hardware;Programming}, 346 | doi={10.1109/ICDE.2016.7498362}, 347 | month={May}, 348 | } 349 | 350 | @inproceedings{Guo2015TowardsAR, 351 | title={Towards Adaptive Resource Allocation for Database Workloads}, 352 | author={Cong Guo and Martin Karsten}, 353 | booktitle={ADMS@VLDB}, 354 | year={2015} 355 | } 356 | 357 | @inproceedings{chen1996efficient, 358 | author = {Chen, S. and Reif, J. 
H.}, 359 | title = {Efficient Lossless Compression of Trees and Graphs}, 360 | booktitle = {Proceedings of the Conference on Data Compression}, 361 | series = {DCC '96}, 362 | year = {1996}, 363 | isbn = {0-8186-7358-3}, 364 | pages = {428--}, 365 | url = {http://dl.acm.org/citation.cfm?id=789084.789454}, 366 | acmid = {789454}, 367 | publisher = {IEEE Computer Society}, 368 | address = {Washington, DC, USA}, 369 | } 370 | 371 | @inproceedings{paul2016gpl, 372 | author = {Paul, Johns and He, Jiong and He, Bingsheng}, 373 | title = {GPL: A GPU-based Pipelined Query Processing Engine}, 374 | booktitle = {Proceedings of the 2016 International Conference on Management of Data}, 375 | series = {SIGMOD '16}, 376 | year = {2016}, 377 | isbn = {978-1-4503-3531-7}, 378 | location = {San Francisco, California, USA}, 379 | pages = {1935--1950}, 380 | numpages = {16}, 381 | url = {http://doi.acm.org/10.1145/2882903.2915224}, 382 | doi = {10.1145/2882903.2915224}, 383 | acmid = {2915224}, 384 | publisher = {ACM}, 385 | address = {New York, NY, USA}, 386 | keywords = {KBE, channel, pipelined execution, tiling}, 387 | } 388 | 389 | @inproceedings{casper2014hardware, 390 | author = {Casper, Jared and Olukotun, Kunle}, 391 | title = {Hardware Acceleration of Database Operations}, 392 | booktitle = {Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays}, 393 | series = {FPGA '14}, 394 | year = {2014}, 395 | isbn = {978-1-4503-2671-1}, 396 | location = {Monterey, California, USA}, 397 | pages = {151--160}, 398 | numpages = {10}, 399 | url = {http://doi.acm.org/10.1145/2554688.2554787}, 400 | doi = {10.1145/2554688.2554787}, 401 | acmid = {2554787}, 402 | publisher = {ACM}, 403 | address = {New York, NY, USA}, 404 | keywords = {database, fpga, hardware acceleration, join, sort}, 405 | } 406 | 407 | @inproceedings{Lang2016DataBH, 408 | title={Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation}, 409 | 
author={Harald Lang and Tobias M\"{u}hlbauer and Florian Funke and Peter A. Boncz and Thomas Neumann and Alfons Kemper}, 410 | booktitle={SIGMOD Conference}, 411 | year={2016} 412 | } 413 | 414 | @inbook{Beckmann2013, 415 | author={Beckmann, Andreas and Meyer, Ulrich and Veith, David}, 416 | editor={Bodlaender, Hans L. and Italiano, Giuseppe F.}, 417 | title={An Implementation of I/O-Efficient Dynamic Breadth-First Search Using Level-Aligned Hierarchical Clustering}, 418 | bookTitle={Algorithms -- ESA 2013: 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings}, 419 | year={2013}, 420 | publisher={Springer Berlin Heidelberg}, 421 | address={Berlin, Heidelberg}, 422 | pages={121--132}, 423 | isbn={978-3-642-40450-4}, 424 | doi={10.1007/978-3-642-40450-4_11}, 425 | url={http://dx.doi.org/10.1007/978-3-642-40450-4_11} 426 | } 427 | 428 | @article{maneth2015survey, 429 | author = {Sebastian Maneth and Fabian Peternek}, 430 | title = {A Survey on Methods and Systems for Graph Compression}, 431 | journal = {CoRR}, 432 | volume = {abs/1504.00616}, 433 | year = {2015}, 434 | url = {http://arxiv.org/abs/1504.00616}, 435 | timestamp = {Sat, 02 May 2015 17:50:32 +0200}, 436 | biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/ManethP15}, 437 | bibsource = {dblp computer science bibliography, http://dblp.org} 438 | } 439 | 440 | @article{He2011high, 441 | author = {He, Bingsheng and Yu, Jeffrey Xu}, 442 | title = {High-throughput Transaction Executions on Graphics Processors}, 443 | journal = {Proc. 
VLDB Endow.}, 444 | issue_date = {February 2011}, 445 | volume = {4}, 446 | number = {5}, 447 | month = feb, 448 | year = {2011}, 449 | issn = {2150-8097}, 450 | pages = {314--325}, 451 | numpages = {12}, 452 | url = {http://dx.doi.org/10.14778/1952376.1952381}, 453 | doi = {10.14778/1952376.1952381}, 454 | acmid = {1952381}, 455 | publisher = {VLDB Endowment}, 456 | } 457 | 458 | @article{Mittal2016survey, 459 | author = {Mittal, Sparsh and Vetter, Jeffrey S.}, 460 | title = {A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems}, 461 | journal = {IEEE Trans. Parallel Distrib. Syst.}, 462 | issue_date = {May 2016}, 463 | volume = {27}, 464 | number = {5}, 465 | month = may, 466 | year = {2016}, 467 | issn = {1045-9219}, 468 | pages = {1524--1536}, 469 | numpages = {13}, 470 | url = {http://dx.doi.org/10.1109/TPDS.2015.2435788}, 471 | doi = {10.1109/TPDS.2015.2435788}, 472 | acmid = {2927579}, 473 | publisher = {IEEE Press}, 474 | address = {Piscataway, NJ, USA}, 475 | } 476 | 477 | @article{Sukhwani2014database, 478 | author = {Bharat Sukhwani and Hong Min and Mathew Thoennes and Parijat Dube and Bernard Brezzo and Sameh Asaad and Donna Eng Dillenberger}, 479 | title = {Database Analytics: A Reconfigurable-Computing Approach}, 480 | journal = {IEEE Micro}, 481 | volume = {34}, 482 | number = {1}, 483 | issn = {0272-1732}, 484 | year = {2014}, 485 | pages = {19-29}, 486 | doi = {10.1109/MM.2013.107}, 487 | publisher = {IEEE Computer Society}, 488 | address = {Los Alamitos, CA, USA}, 489 | } 490 | 491 | @inproceedings{Chen2016study, 492 | author = {Chen, Shuang and Jiang, Shunning and He, Bingsheng and Tang, Xueyan}, 493 | title = {A Study of Sorting Algorithms on Approximate Memory}, 494 | booktitle = {Proceedings of the 2016 International Conference on Management of Data}, 495 | series = {SIGMOD '16}, 496 | year = {2016}, 497 | isbn = {978-1-4503-3531-7}, 498 | location = {San Francisco, 
California, USA}, 499 | pages = {647--662}, 500 | numpages = {16}, 501 | url = {http://doi.acm.org/10.1145/2882903.2882908}, 502 | doi = {10.1145/2882903.2882908}, 503 | acmid = {2882908}, 504 | publisher = {ACM}, 505 | address = {New York, NY, USA}, 506 | keywords = {approximate storage, database, hybrid storage, phase change memory, sorting algorithms}, 507 | } 508 | 509 | @inproceedings{Wang2016accelerating, 510 | author = {Wang, Zeke and Cheah, Huiyan and Paul, Johns and He, Bingsheng and Zhang, Wei}, 511 | title = {Accelerating Database Query Processing on OpenCL-based FPGAs (Abstract Only)}, 512 | booktitle = {Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays}, 513 | series = {FPGA '16}, 514 | year = {2016}, 515 | isbn = {978-1-4503-3856-1}, 516 | location = {Monterey, California, USA}, 517 | pages = {274--274}, 518 | numpages = {1}, 519 | url = {http://doi.acm.org/10.1145/2847263.2847295}, 520 | doi = {10.1145/2847263.2847295}, 521 | acmid = {2847295}, 522 | publisher = {ACM}, 523 | address = {New York, NY, USA}, 524 | keywords = {fpga, opencl, query processing}, 525 | } 526 | 527 | @INPROCEEDINGS{quoc2014automated, 528 | author={C. Pham-Quoc and Z. Al-Ars and K. 
Bertels}, 529 | booktitle={Parallel Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International}, 530 | title={Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling}, 531 | year={2014}, 532 | pages={151-160}, 533 | keywords={data communication;field programmable gate arrays;interconnections;network-on-chip;shared memory systems;FPGA bus-based accelerator system;NoC;adaptive mapping function;automated hybrid interconnect design;communication behavior;custom interconnect design algorithm;data communication profiling;energy reduction;kernels;quantitative communication profiling;shared local memory solution;Algorithm design and analysis;Computer architecture;DH-HEMTs;Data 534 | communication;Field programmable gate arrays;Kernel;Optimization;FPGA-based accelerator;communication profiling;custom interconnect}, 535 | doi={10.1109/IPDPSW.2014.21}, 536 | month={May}, 537 | } 538 | 539 | @inproceedings{Castellana2015HLS, 540 | author = {Castellana, Vito Giovanni and Minutoli, Marco and Morari, Alessandro and Tumeo, Antonino and Lattuada, Marco and Ferrandi, Fabrizio}, 541 | title = {High Level Synthesis of RDF Queries for Graph Analytics}, 542 | booktitle = {Proceedings of the IEEE/ACM International Conference on Computer-Aided Design}, 543 | series = {ICCAD '15}, 544 | year = {2015}, 545 | isbn = {978-1-4673-8389-9}, 546 | location = {Austin, TX, USA}, 547 | pages = {323--330}, 548 | numpages = {8}, 549 | url = {http://dl.acm.org/citation.cfm?id=2840819.2840865}, 550 | acmid = {2840865}, 551 | publisher = {IEEE Press}, 552 | address = {Piscataway, NJ, USA}, 553 | } 554 | 555 | @inproceedings{Aridhi2016bladyg, 556 | author = {Aridhi, Sabeur and Montresor, Alberto and Velegrakis, Yannis}, 557 | title = {BLADYG: A Novel Block-Centric Framework for the Analysis of Large Dynamic Graphs}, 558 | booktitle = {Proceedings of the ACM Workshop on High Performance Graph Processing}, 559 | series = {HPGP '16}, 560 | year = 
{2016}, 561 | isbn = {978-1-4503-4350-3}, 562 | location = {Kyoto, Japan}, 563 | pages = {39--42}, 564 | numpages = {4}, 565 | url = {http://doi.acm.org/10.1145/2915516.2915525}, 566 | doi = {10.1145/2915516.2915525}, 567 | acmid = {2915525}, 568 | publisher = {ACM}, 569 | address = {New York, NY, USA}, 570 | keywords = {akka framework, distributed graph processing, dynamic graphs}, 571 | } 572 | 573 | 574 | @article{Chi2015NXgraph, 575 | author = {Yuze Chi and Guohao Dai and Yu Wang and Guangyu Sun and Guoliang Li and Huazhong Yang}, 576 | title = {NXgraph: An Efficient Graph Processing System on a Single Machine}, 577 | journal = {CoRR}, 578 | volume = {abs/1510.06916}, 579 | year = {2015}, 580 | url = {http://arxiv.org/abs/1510.06916}, 581 | timestamp = {Wed, 03 Aug 2016 14:57:48 +0200}, 582 | biburl = {http://dblp.uni-trier.de/rec/bib/journals/corr/ChiDWSLY15}, 583 | bibsource = {dblp computer science bibliography, http://dblp.org} 584 | } 585 | 586 | @inproceedings{Ahn2015pim, 587 | author = {Ahn, Junwhan and Hong, Sungpack and Yoo, Sungjoo and Mutlu, Onur and Choi, Kiyoung}, 588 | title = {A Scalable Processing-in-memory Accelerator for Parallel Graph Processing}, 589 | booktitle = {Proceedings of the 42nd Annual International Symposium on Computer Architecture}, 590 | series = {ISCA '15}, 591 | year = {2015}, 592 | isbn = {978-1-4503-3402-0}, 593 | location = {Portland, Oregon}, 594 | pages = {105--117}, 595 | numpages = {13}, 596 | url = {http://doi.acm.org/10.1145/2749469.2750386}, 597 | doi = {10.1145/2749469.2750386}, 598 | acmid = {2750386}, 599 | publisher = {ACM}, 600 | address = {New York, NY, USA}, 601 | } 602 | 603 | @inproceedings{sengupta2016graphin, 604 | title={Graphin: An online high performance incremental graph processing framework}, 605 | author={Sengupta, Dipanjan and Sundaram, Narayanan and Zhu, Xia and Willke, Theodore L and Young, Jeffrey and Wolf, Matthew and Schwan, Karsten}, 606 | booktitle={European Conference on Parallel 
Processing}, 607 | pages={319--333}, 608 | year={2016}, 609 | organization={Springer} 610 | } 611 | 612 | @inproceedings{niu2015eureca, 613 | title={EURECA: On-chip configuration generation for effective dynamic data access}, 614 | author={Niu, Xinyu and Luk, Wayne and Wang, Yu}, 615 | booktitle={Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays}, 616 | pages={74--83}, 617 | year={2015}, 618 | organization={ACM} 619 | } 620 | 621 | @article{xirogiannopoulos2015graphgen, 622 | title={GraphGen: exploring interesting graphs in relational data}, 623 | author={Xirogiannopoulos, Konstantinos and Khurana, Udayan and Deshpande, Amol}, 624 | journal={Proceedings of the VLDB Endowment}, 625 | volume={8}, 626 | number={12}, 627 | pages={2032--2035}, 628 | year={2015}, 629 | publisher={VLDB Endowment} 630 | } 631 | 632 | @inproceedings{khayyat2013mizan, 633 | title={Mizan: a system for dynamic load balancing in large-scale graph processing}, 634 | author={Khayyat, Zuhair and Awara, Karim and Alonazi, Amani and Jamjoom, Hani and Williams, Dan and Kalnis, Panos}, 635 | booktitle={Proceedings of the 8th ACM European Conference on Computer Systems}, 636 | pages={169--182}, 637 | year={2013}, 638 | organization={ACM} 639 | } 640 | 641 | @article{doekemeijer2014survey, 642 | title={A survey of parallel graph processing frameworks}, 643 | author={Doekemeijer, Niels and Varbanescu, Ana Lucia}, 644 | journal={Delft University of Technology}, 645 | year={2014} 646 | } 647 | 648 | @article{steinbauer2016dynamograph-journal, 649 | title={DynamoGraph: extending the Pregel paradigm for large-scale temporal graph processing}, 650 | author={Steinbauer, Matthias and Anderst-Kotsis, Gabriele}, 651 | journal={International Journal of Grid and Utility Computing}, 652 | volume={7}, 653 | number={2}, 654 | pages={141--151}, 655 | year={2016}, 656 | publisher={Inderscience Publishers (IEL)} 657 | } 658 | 659 | @inproceedings{steinbauer2016dynamograph-conf, 
660 | title={DynamoGraph: A Distributed System for Large-scale, Temporal Graph Processing, its Implementation and First Observations}, 661 | author={Steinbauer, Matthias and Anderst-Kotsis, Gabriele}, 662 | booktitle={Proceedings of the 25th International Conference Companion on World Wide Web}, 663 | pages={861--866}, 664 | year={2016}, 665 | organization={International World Wide Web Conferences Steering Committee} 666 | } 667 | 668 | @inproceedings{Iyer2016time-evolving, 669 | author = {Iyer, Anand Padmanabha and Li, Li Erran and Das, Tathagata and Stoica, Ion}, 670 | title = {Time-evolving Graph Processing at Scale}, 671 | booktitle = {Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems}, 672 | series = {GRADES '16}, 673 | year = {2016}, 674 | isbn = {978-1-4503-4780-8}, 675 | location = {Redwood Shores, California}, 676 | pages = {5:1--5:6}, 677 | articleno = {5}, 678 | numpages = {6}, 679 | url = {http://doi.acm.org/10.1145/2960414.2960419}, 680 | doi = {10.1145/2960414.2960419}, 681 | acmid = {2960419}, 682 | publisher = {ACM}, 683 | address = {New York, NY, USA}, 684 | } 685 | 686 | @article{Shun2013ligra, 687 | author = {Shun, Julian and Blelloch, Guy E.}, 688 | title = {Ligra: A Lightweight Graph Processing Framework for Shared Memory}, 689 | journal = {SIGPLAN Not.}, 690 | issue_date = {August 2013}, 691 | volume = {48}, 692 | number = {8}, 693 | month = feb, 694 | year = {2013}, 695 | issn = {0362-1340}, 696 | pages = {135--146}, 697 | numpages = {12}, 698 | url = {http://doi.acm.org/10.1145/2517327.2442530}, 699 | doi = {10.1145/2517327.2442530}, 700 | acmid = {2442530}, 701 | publisher = {ACM}, 702 | address = {New York, NY, USA}, 703 | keywords = {graph algorithms, parallel programming, shared memory}, 704 | } 705 | 706 | @inproceedings{Shun2015ligra+, 707 | author = {Shun, Julian and Dhulipala, Laxman and Blelloch, Guy E.}, 708 | title = {Smaller and Faster: Parallel Processing of Compressed Graphs with 
Ligra+}, 709 | booktitle = {Proceedings of the 2015 Data Compression Conference}, 710 | series = {DCC '15}, 711 | year = {2015}, 712 | isbn = {978-1-4799-8430-5}, 713 | pages = {403--412}, 714 | numpages = {10}, 715 | url = {http://dx.doi.org/10.1109/DCC.2015.8}, 716 | doi = {10.1109/DCC.2015.8}, 717 | acmid = {2860198}, 718 | publisher = {IEEE Computer Society}, 719 | address = {Washington, DC, USA}, 720 | keywords = {Graph compression, Parallel algorithms}, 721 | } 722 | 723 | @article{Shun2016parallel, 724 | author = {Julian Shun and 725 | Farbod Roosta{-}Khorasani and 726 | Kimon Fountoulakis and 727 | Michael W. Mahoney}, 728 | title = {Parallel Local Graph Clustering}, 729 | journal = {CoRR}, 730 | volume = {abs/1604.07515}, 731 | year = {2016}, 732 | url = {http://arxiv.org/abs/1604.07515}, 733 | timestamp = {Mon, 02 May 2016 18:22:52 +0200}, 734 | biburl = 735 | {http://dblp.uni-trier.de/rec/bib/journals/corr/ShunRFM16}, 736 | bibsource = {dblp computer science bibliography, 737 | http://dblp.org} 738 | } 739 | 740 | @inproceedings{wang2016gunrock, 741 | title={Gunrock: A high-performance graph processing library on the GPU}, 742 | author={Wang, Yangzihao and Davidson, Andrew and Pan, Yuechao and Wu, 743 | Yuduo and Riffel, Andy and Owens, John D}, 744 | booktitle={Proceedings of the 21st ACM SIGPLAN Symposium on Principles 745 | and Practice of Parallel Programming}, 746 | pages={11}, 747 | year={2016}, 748 | organization={ACM} 749 | } 750 | 751 | @inproceedings{Davidson2014work-efficient, 752 | author = {Davidson, Andrew and Baxter, Sean and Garland, Michael and Owens, 753 | John D.}, 754 | title = {Work-Efficient Parallel GPU Methods for Single-Source Shortest 755 | Paths}, 756 | booktitle = {Proceedings of the 2014 IEEE 28th International Parallel and 757 | Distributed Processing Symposium}, 758 | series = {IPDPS '14}, 759 | year = {2014}, 760 | isbn = {978-1-4799-3800-1}, 761 | pages = {349--359}, 762 | numpages = {11}, 763 | url = 
{http://dx.doi.org/10.1109/IPDPS.2014.45}, 764 | doi = {10.1109/IPDPS.2014.45}, 765 | acmid = {2650649}, 766 | publisher = {IEEE Computer Society}, 767 | address = {Washington, DC, USA}, 768 | keywords = {GPU computing, graph traversal, single-source 769 | shortest paths, sparse graphs}, 770 | } 771 | 772 | @article{singh2016modified, 773 | title={Modified Dijkstra’s Algorithm for Dense Graphs on GPU using CUDA}, 774 | author={Singh, Dhirendra Pratap and Khare, Nilay}, 775 | journal={Indian Journal of Science and Technology}, 776 | volume={9}, 777 | number={33}, 778 | year={2016} 779 | } 780 | 781 | @article{singh2016efficient, 782 | title={Efficient Parallel Implementation of Single Source Shortest Path 783 | Algorithm on GPU Using CUDA}, 784 | author={Singh, Dhirendra Pratap and Khare, Nilay and Rasool, Akhtar}, 785 | journal={International Journal of Applied Engineering Research}, 786 | volume={11}, 787 | number={4}, 788 | pages={2560--2567}, 789 | year={2016} 790 | } 791 | 792 | @article{delling2013phast, 793 | title={Phast: Hardware-accelerated shortest path trees}, 794 | author={Delling, Daniel and Goldberg, Andrew V and Nowatzyk, Andreas and 795 | Werneck, Renato F}, 796 | journal={Journal of Parallel and Distributed Computing}, 797 | volume={73}, 798 | number={7}, 799 | pages={940--952}, 800 | year={2013}, 801 | publisher={Elsevier} 802 | } 803 | 804 | @article{meyer2003delta, 805 | title={$\Delta$-stepping: a parallelizable shortest path algorithm}, 806 | author={Meyer, Ulrich and Sanders, Peter}, 807 | journal={Journal of Algorithms}, 808 | volume={49}, 809 | number={1}, 810 | pages={114--152}, 811 | year={2003}, 812 | publisher={Elsevier} 813 | } 814 | 815 | @inproceedings{merrill2012scalable, 816 | title={Scalable GPU graph traversal}, 817 | author={Merrill, Duane and Garland, Michael and Grimshaw, Andrew}, 818 | booktitle={ACM SIGPLAN Notices}, 819 | volume={47}, 820 | number={8}, 821 | pages={117--128}, 822 | year={2012}, 823 | organization={ACM} 824 
| } 825 | 826 | @ARTICLE{Castellana2015in-memory, 827 | author={V. G. Castellana and A. Morari and J. Weaver and A. Tumeo and D. 828 | Haglin and O. Villa and J. Feo}, 829 | journal={Computer}, 830 | title={In-Memory Graph Databases for Web-Scale Data}, 831 | year={2015}, 832 | volume={48}, 833 | number={3}, 834 | pages={24-35}, 835 | keywords={Internet;database management systems;graph theory;pattern 836 | clustering;Web-scale data;commodity clusters;graph-based 837 | methods;heterogeneous data;inmemory graph databases;scalable 838 | resource description framework databases;software stack;Algorithm 839 | design and analysis;Clustering algorithms;Data structures;Pattern 840 | matching;Resource description framework;Resource management;Software 841 | development;RDF databases;SPARQL;big data;graph 842 | databases;high-performance computing;multithreading}, 843 | doi={10.1109/MC.2015.74}, 844 | ISSN={0018-9162}, 845 | month={Mar}, 846 | } 847 | 848 | @inproceedings{zeng2013distributed, 849 | title={A distributed graph engine for web scale RDF data}, 850 | author={Zeng, Kai and Yang, Jiacheng and Wang, Haixun and Shao, Bin and 851 | Wang, Zhongyuan}, 852 | booktitle={Proceedings of the VLDB Endowment}, 853 | volume={6}, 854 | number={4}, 855 | pages={265--276}, 856 | year={2013}, 857 | organization={VLDB Endowment} 858 | } 859 | 860 | @misc{microsoft2013trinity, 861 | author = {Microsoft}, 862 | title = {{Trinity}}, 863 | howpublished = 864 | "\url{https://www.microsoft.com/en-us/research/project/trinity/}", 865 | year = {2013}, 866 | note = "[Online; accessed 10-November-2016]" 867 | } 868 | 869 | @inproceedings{kocberber2013meet, 870 | title={Meet the walkers: Accelerating index traversals for in-memory 871 | databases}, 872 | author={Kocberber, Onur and Grot, Boris and Picorel, Javier and Falsafi, 873 | Babak and Lim, Kevin and Ranganathan, Parthasarathy}, 874 | booktitle={Proceedings of the 46th Annual IEEE/ACM International 875 | Symposium on Microarchitecture}, 
876 | pages={468--479}, 877 | year={2013}, 878 | organization={ACM} 879 | } 880 | 881 | @inproceedings{shi2016fast, 882 | author = {Jiaxin Shi and Youyang Yao and Rong Chen and Haibo Chen and Feifei Li}, 883 | title = {Fast and Concurrent {RDF} Queries with RDMA-Based Distributed Graph Exploration}, 884 | booktitle = {12th {USENIX} Symposium on Operating Systems Design and 885 | Implementation, 886 | {OSDI} 2016, Savannah, GA, USA, November 2-4, 887 | 2016.}, 888 | pages = {317--332}, 889 | year = {2016}, 890 | url = 891 | {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/shi}, 892 | timestamp = {Tue, 08 Nov 2016 07:18:04 +0100}, 893 | biburl = 894 | {http://dblp.uni-trier.de/rec/bib/conf/osdi/ShiYCCL16}, 895 | bibsource = {dblp computer science bibliography, 896 | http://dblp.org} 897 | } 898 | 899 | @inproceedings{chen2015powerlyra, 900 | title={Powerlyra: Differentiated graph computation and partitioning on 901 | skewed graphs}, 902 | author={Chen, Rong and Shi, Jiaxin and Chen, Yanzhe and Chen, Haibo}, 903 | booktitle={Proceedings of the Tenth European Conference on Computer 904 | Systems}, 905 | pages={1}, 906 | year={2015}, 907 | organization={ACM} 908 | } 909 | 910 | @inproceedings{fu2014mapgraph, 911 | title={Mapgraph: A high level API for fast development of high performance 912 | graph analytics on GPUs}, 913 | author={Fu, Zhisong and Personick, Michael and Thompson, Bryan}, 914 | booktitle={Proceedings of Workshop on GRAph Data management 915 | Experiences and Systems}, 916 | pages={1--6}, 917 | year={2014}, 918 | organization={ACM} 919 | } 920 | 921 | @inproceedings{dorrance2014scalable, 922 | title={A scalable sparse matrix-vector multiplication kernel for 923 | energy-efficient sparse-blas on FPGAs}, 924 | author={Dorrance, Richard and Ren, Fengbo and Markovi{\'c}, Dejan}, 925 | booktitle={Proceedings of the 2014 ACM/SIGDA international symposium 926 | on Field-programmable gate arrays}, 927 | pages={161--170}, 928 | 
year={2014}, 929 | organization={ACM} 930 | } 931 | 932 | @inproceedings{Khorasani2014CuSha, 933 | 934 | author = {Khorasani, Farzad and Vora, Keval and Gupta, Rajiv and Bhuyan, 935 | Laxmi N.}, 936 | 937 | title = {CuSha: Vertex-centric Graph Processing on GPUs}, 938 | 939 | booktitle = {Proceedings of the 23rd International Symposium on 940 | High-performance Parallel and Distributed Computing}, 941 | 942 | series = {HPDC '14}, 943 | 944 | year = {2014}, 945 | 946 | isbn = {978-1-4503-2749-7}, 947 | 948 | location = {Vancouver, BC, Canada}, 949 | 950 | pages = {239--252}, 951 | 952 | numpages = {14}, 953 | 954 | url = {http://doi.acm.org/10.1145/2600212.2600227}, 955 | 956 | doi = {10.1145/2600212.2600227}, 957 | 958 | acmid = {2600227}, 959 | 960 | publisher = {ACM}, 961 | 962 | address = {New York, NY, USA}, 963 | 964 | keywords = {coalesced memory accesses, concatenated windows, 965 | g-shards, gpu, graph representation}, 966 | 967 | } 968 | 969 | @inproceedings{wu2013complexity, 970 | author = {Wu, Bo and Zhao, Zhijia and Zhang, Eddy Zheng and Jiang, Yunlian 971 | and Shen, Xipeng}, 972 | title = {Complexity Analysis and Algorithm Design for Reorganizing Data to 973 | Minimize Non-coalesced Memory Accesses on GPU}, 974 | booktitle = {Proceedings of the 18th ACM SIGPLAN Symposium on Principles 975 | and Practice of Parallel Programming}, 976 | series = {PPoPP '13}, 977 | year = {2013}, 978 | isbn = {978-1-4503-1922-5}, 979 | location = {Shenzhen, China}, 980 | pages = {57--68}, 981 | numpages = {12}, 982 | url = {http://doi.acm.org/10.1145/2442516.2442523}, 983 | doi = {10.1145/2442516.2442523}, 984 | acmid = {2442523}, 985 | publisher = {ACM}, 986 | address = {New York, NY, USA}, 987 | keywords = {computational complexity, data transformation, 988 | gpgpu, memory coalescing, runtime optimizations, 989 | thread-data remapping}, 990 | 991 | } 992 | 993 | @article{Zhang2011elimination, 994 | 995 | author = {Zhang, Eddy Z. 
and Jiang, Yunlian and Guo, Ziyu and Tian, Kai and 996 | Shen, Xipeng}, 997 | 998 | title = {On-the-fly Elimination of Dynamic Irregularities for GPU 999 | Computing}, 1000 | 1001 | journal = {SIGPLAN Not.}, 1002 | 1003 | issue_date = {March 2011}, 1004 | 1005 | volume = {46}, 1006 | 1007 | number = {3}, 1008 | 1009 | month = mar, 1010 | 1011 | year = {2011}, 1012 | 1013 | issn = {0362-1340}, 1014 | 1015 | pages = {369--380}, 1016 | 1017 | numpages = {12}, 1018 | 1019 | url = {http://doi.acm.org/10.1145/1961296.1950408}, 1020 | 1021 | doi = {10.1145/1961296.1950408}, 1022 | 1023 | acmid = {1950408}, 1024 | 1025 | publisher = {ACM}, 1026 | 1027 | address = {New York, NY, USA}, 1028 | 1029 | keywords = {cpu-gpu pipelining, data transformation, gpgpu, 1030 | memory coalescing, thread data remapping, thread 1031 | divergence}, 1032 | 1033 | } 1034 | 1035 | @inproceedings{cheng2012kineograph, 1036 | 1037 | title={Kineograph: taking the pulse of a fast-changing and connected 1038 | world}, 1039 | 1040 | author={Cheng, Raymond and Hong, Ji and Kyrola, Aapo and Miao, Youshan 1041 | and Weng, Xuetian and Wu, Ming and Yang, Fan and Zhou, Lidong and 1042 | Zhao, Feng and Chen, Enhong}, 1043 | 1044 | booktitle={Proceedings of the 7th ACM european conference on Computer 1045 | Systems}, 1046 | 1047 | pages={85--98}, 1048 | 1049 | year={2012}, 1050 | 1051 | organization={ACM} 1052 | 1053 | } 1054 | 1055 | @inproceedings{hong2011accelerating, 1056 | title={Accelerating CUDA graph algorithms at maximum warp}, 1057 | author={Hong, Sungpack and Kim, Sang Kyun and Oguntebi, Tayo and 1058 | Olukotun, Kunle}, 1059 | booktitle={ACM SIGPLAN Notices}, 1060 | volume={46}, 1061 | number={8}, 1062 | pages={267--276}, 1063 | year={2011}, 1064 | organization={ACM} 1065 | } 
1078 | 1079 | @inproceedings{morari2015high, 1080 | title={High-Performance, Distributed Dictionary Encoding of RDF Datasets}, 1081 | author={Morari, Alessandro and Weaver, Jesse and Villa, Oreste and 1082 | Haglin, David and Tumeo, Antonino and Castellana, Vito Giovanni and 1083 | Feo, John}, 1084 | booktitle={2015 IEEE International Conference on Cluster Computing}, 1085 | pages={250--253}, 1086 | year={2015}, 1087 | organization={IEEE} 1088 | } 1089 | 1090 | @article{lin2016data, 1091 | title={Data Compression for Analytics over Large-scale In-memory Column 1092 | Databases}, 1093 | author={Lin, Chunbin and Wang, Jianguo and Papakonstantinou, Yannis}, 1094 | journal={arXiv preprint arXiv:1606.09315}, 1095 | year={2016} 1096 | } 1097 | 1098 | @article{ousterhout2015ramcloud, 1099 | title={The ramcloud storage system}, 1100 | author={Ousterhout, John and Gopalan, Arjun and Gupta, Ashish and 1101 | Kejriwal, Ankita and Lee, Collin and Montazeri, Behnam and Ongaro, 1102 | Diego and Park, Seo Jin and Qin, Henry and Rosenblum, Mendel and 1103 | others}, 1104 | journal={ACM Transactions on Computer Systems (TOCS)}, 1105 | volume={33}, 1106 | number={3}, 1107 | pages={7}, 1108 | year={2015}, 1109 | publisher={ACM} 1110 | } 1111 | 1112 | @article{jamro2015algorithms, 1113 | title={The algorithms for FPGA implementation of sparse matrices 1114 | multiplication}, 1115 | author={Jamro, Ernest and Pabi{\'s}, Tomasz and Russek, Pawe{\l} and 1116 | Wiatr, Kazimierz}, 1117 | journal={Computing and Informatics}, 1118 | volume={33}, 1119 | number={3}, 1120 | pages={667--684}, 1121 | year={2015} 1122 | } 1123 | 1124 | @misc{morari2015gems, 1125 | title={GEMS: Graph Database Engine for Multithreaded Systems.}, 1126 | author={Morari, Alessandro and Castellana, Vito Giovanni and Villa, 1127 | Oreste and Weaver, Jesse and Williams, 
Gregory Todd and Haglin, 1128 | David J and Tumeo, Antonino and Feo, John}, 1129 | year={2015} 1130 | } 1131 | 1132 | @inproceedings{geisberger2008contraction, 1133 | title={Contraction hierarchies: Faster and simpler hierarchical routing in 1134 | road networks}, 1135 | author={Geisberger, Robert and Sanders, Peter and Schultes, Dominik and 1136 | Delling, Daniel}, 1137 | booktitle={International Workshop on Experimental and Efficient 1138 | Algorithms}, 1139 | pages={319--333}, 1140 | year={2008}, 1141 | organization={Springer} 1142 | } 1143 | 1144 | @article{kapre2015case, 1145 | author = {Nachiket Kapre and Pradeep Moorthy}, 1146 | title = {A Case for Embedded FPGA-based SoCs in Energy-Efficient 1147 | Acceleration of Graph Problems}, 1148 | journal = {Supercomputing frontiers and innovations}, 1149 | volume = {2}, 1150 | number = {3}, 1151 | year = {2015}, 1152 | keywords = {}, 1153 | abstract = {Sparse graph problems are 1154 | notoriously hard to accelerate on 1155 | conventional platforms due to 1156 | irregular memory access patterns 1157 | resulting in underutilization of 1158 | memory bandwidth. These bottlenecks 1159 | on traditional x86-based systems 1160 | mean that sparse graph problems 1161 | scale very poorly, both in terms of 1162 | performance and power efficiency. A 1163 | cluster of embedded SoCs 1164 | (systems-on-chip) with 1165 | closely-coupled FPGA accelerators 1166 | can support distributed memory 1167 | accesses with better matched 1168 | low-power processing. We first 1169 | conduct preliminary experiments 1170 | across a range of COTS (commercial 1171 | off-the-shelf) embedded SoCs 1172 | to establish promise for 1173 | energy-efficiency acceleration of 1174 | sparse problems. We select the 1175 | Xilinx Zynq SoC with FPGA 1176 | accelerators to construct a 1177 | prototype 32-node Beowulf cluster. 
1178 | We develop specialized MPI routines 1179 | and memory DMA offload engines to 1180 | support irregular communication 1181 | efficiently. In this setup, we use 1182 | the ARM processor as a data 1183 | marshaller for local DMA traffic as 1184 | well as remote MPI traffic while the 1185 | FPGA may be used as a programmable 1186 | accelerator. Across a set of 1187 | benchmark graphs, we show that 1188 | 32-node embedded SoC cluster can 1189 | exceed the energy efficiency of an 1190 | Intel E5-2407 by as much as 1.7× at 1191 | a total graph processing capacity of 1192 | 91–95 MTEPS for graphs as large as 1193 | 32 million nodes and edges. }, 1194 | 1195 | issn = {2313-8734},url = {http://superfri.org/superfri/article/view/62} 1196 | 1197 | } 1198 | 1199 | @INPROCEEDINGS{zhou2016high, 1200 | author={S. Zhou and C. Chelmis and V. K. Prasanna}, 1201 | booktitle={2016 IEEE 24th Annual International Symposium on 1202 | Field-Programmable Custom Computing Machines (FCCM)}, 1203 | title={High-Throughput and Energy-Efficient Graph Processing on FPGA}, 1204 | year={2016}, 1205 | pages={103-110}, 1206 | keywords={DRAM chips;energy conservation;field programmable gate 1207 | arrays;logic design;parallel architectures;performance evaluation;power 1208 | aware computing;trees (mathematics);FPGA design;MTEPS;concurrent 1209 | multiple input data processing;data layout;edge-centric 1210 | computing;efficient memory activation schedule;energy-efficiency 1211 | improvement;energy-efficient graph processing;external memory 1212 | bandwidth saturation;external memory performance 1213 | optimization;high-throughput graph processing;large-scale graph 1214 | processing design;million traversed edges per second;minimum 1215 | spanning tree algorithm;on-chip memory power consumption 1216 | reduction;parallel architecture;single-source shortest path 1217 | algorithm;throughput improvement;weakly connected component 1218 | algorithm;Field programmable gate arrays;Layout;Memory 1219 | 
management;Optimization;Random access memory;Throughput;Writing}, 1220 | doi={10.1109/FCCM.2016.35}, 1221 | month={May}, 1222 | } 1223 | 1224 | @inproceedings{sanders2008mobile, 1225 | title={Mobile route planning}, 1226 | author={Sanders, Peter and Schultes, Dominik and Vetter, Christian}, 1227 | booktitle={European Symposium on Algorithms}, 1228 | pages={732--743}, 1229 | year={2008}, 1230 | organization={Springer} 1231 | } 1232 | 1233 | @inproceedings{ho2015efficient, 1234 | title={Efficient execution of memory access phases using dataflow 1235 | specialization}, 1236 | author={Ho, Chen-Han and Kim, Sung Jin and Sankaralingam, Karthikeyan}, 1237 | booktitle={ACM SIGARCH Computer Architecture News}, 1238 | volume={43}, 1239 | number={3}, 1240 | pages={118--130}, 1241 | year={2015}, 1242 | organization={ACM} 1243 | } 1244 | 1245 | @inproceedings{kumar2014sqrl, 1246 | title={SQRL: hardware accelerator for collecting software data 1247 | structures}, 1248 | author={Kumar, Snehasish and Shriraman, Arrvindh and Srinivasan, 1249 | Vijayalakshmi and Lin, Dan and Phillips, Jordon}, 1250 | booktitle={Proceedings of the 23rd international conference on 1251 | Parallel architectures and compilation}, 1252 | pages={475--476}, 1253 | year={2014}, 1254 | organization={ACM} 1255 | } 1256 | 1257 | @article{schkufza2014stochastic, 1258 | title={Stochastic optimization of floating-point programs with tunable 1259 | precision}, 1260 | author={Schkufza, Eric and Sharma, Rahul and Aiken, Alex}, 1261 | journal={ACM SIGPLAN Notices}, 1262 | volume={49}, 1263 | number={6}, 1264 | pages={53--64}, 1265 | year={2014}, 1266 | publisher={ACM} 1267 | } 1268 | 1269 | @article{yu2014staring, 1270 | title={Staring into the abyss: An evaluation of concurrency control with 1271 | one thousand cores}, 1272 | author={Yu, Xiangyao and Bezerra, George and Pavlo, Andrew and Devadas, 1273 | Srinivas and Stonebraker, Michael}, 1274 | journal={Proceedings of the VLDB Endowment}, 1275 | volume={8}, 1276 | 
number={3}, 1277 | pages={209--220}, 1278 | year={2014}, 1279 | publisher={VLDB Endowment} 1280 | } 1281 | 1282 | @inproceedings{giefers2016analyzing, 1283 | title={Analyzing the energy-efficiency of sparse matrix multiplication on 1284 | heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA}, 1285 | author={Giefers, Heiner and Staar, Peter and Bekas, Costas and 1286 | Hagleitner, Christoph}, 1287 | booktitle={Performance Analysis of Systems and Software (ISPASS), 2016 1288 | IEEE International Symposium on}, 1289 | pages={46--56}, 1290 | year={2016}, 1291 | organization={IEEE} 1292 | } 1293 | 1294 | @inproceedings{zheng2015flashgraph, 1295 | title={FlashGraph: Processing billion-node graphs on an array of commodity 1296 | SSDs}, 1297 | author={Zheng, Da and Mhembere, Disa and Burns, Randal and Vogelstein, 1298 | Joshua and Priebe, Carey E and Szalay, Alexander S}, 1299 | booktitle={13th USENIX Conference on File and Storage Technologies 1300 | (FAST 15)}, 1301 | pages={45--58}, 1302 | year={2015} 1303 | } 1304 | 1305 | @inproceedings{yuan2014fast, 1306 | title={Fast iterative graph computation: A path centric approach}, 1307 | author={Yuan, Pingpeng and Zhang, Wenya and Xie, Changfeng and Jin, Hai 1308 | and Liu, Ling and Lee, Kisung}, 1309 | booktitle={Proceedings of the International Conference for High 1310 | Performance Computing, Networking, Storage and Analysis}, 1311 | pages={401--412}, 1312 | year={2014}, 1313 | organization={IEEE Press} 1314 | } 1315 | 1316 | @article{najeebullah2014bpp, 1317 | title={BPP: Large Graph Storage for Efficient Disk Based Processing}, 1318 | author={Najeebullah, Kamran and Khan, Kifayat Ullah and Nawaz, Waqas and 1319 | Lee, Young-Koo}, 1320 | journal={arXiv preprint arXiv:1401.2327}, 1321 | year={2014} 1322 | } 1323 | 1324 | @inproceedings{nilakant2014prefedge, 1325 | title={PrefEdge: SSD prefetcher for large-scale graph traversal}, 1326 | author={Nilakant, Karthik and Dalibard, Valentin and Roy, Amitabha and 
1327 | Yoneki, Eiko}, 1328 | booktitle={Proceedings of International Conference on Systems and 1329 | Storage}, 1330 | pages={1--12}, 1331 | year={2014}, 1332 | organization={ACM} 1333 | } 1334 | 1335 | @inproceedings{fu2014prime, 1336 | title={PriME: A parallel and distributed simulator for thousand-core 1337 | chips}, 1338 | author={Fu, Yaosheng and Wentzlaff, David}, 1339 | booktitle={Performance Analysis of Systems and Software (ISPASS), 2014 1340 | IEEE International Symposium on}, 1341 | pages={116--125}, 1342 | year={2014}, 1343 | organization={IEEE} 1344 | } 1345 | 1346 | @inproceedings{miller2010graphite, 1347 | title={Graphite: A distributed parallel simulator for multicores}, 1348 | author={Miller, Jason E and Kasture, Harshad and Kurian, George and 1349 | Gruenwald, Charles and Beckmann, Nathan and Celio, Christopher and 1350 | Eastep, Jonathan and Agarwal, Anant}, 1351 | booktitle={HPCA-16 2010 The Sixteenth International Symposium on 1352 | High-Performance Computer Architecture}, 1353 | pages={1--12}, 1354 | year={2010}, 1355 | organization={IEEE} 1356 | } 1357 | 1358 | @inproceedings{nguyen2013lightweight, 1359 | title={A lightweight infrastructure for graph analytics}, 1360 | author={Nguyen, Donald and Lenharth, Andrew and Pingali, Keshav}, 1361 | booktitle={Proceedings of the Twenty-Fourth ACM Symposium on Operating 1362 | Systems Principles}, 1363 | pages={456--471}, 1364 | year={2013}, 1365 | organization={ACM} 1366 | 1367 | } 1368 | 1369 | @inproceedings{carlson2011sniper, 1370 | title={Sniper: exploring the level of abstraction for scalable and 1371 | accurate parallel multi-core simulation}, 1372 | author={Carlson, Trevor E and Heirman, Wim and Eeckhout, Lieven}, 1373 | booktitle={Proceedings of 2011 International Conference for High 1374 | Performance Computing, Networking, Storage and Analysis}, 1375 | pages={52}, 1376 | year={2011}, 1377 | organization={ACM} 1378 | } 1379 | 1380 | @article{barthels2017distributed, 1381 | title={Distributed 
Join Algorithms on Thousands of Cores}, 1382 | author={Barthels, Claude and M{\"u}ller, Ingo and Schneider, Timo and 1383 | Alonso, Gustavo and Hoefler, Torsten}, 1384 | journal={Proceedings of the VLDB Endowment}, 1385 | volume={10}, 1386 | number={5}, 1387 | year={2017} 1388 | } 1389 | 1390 | @article{shang2014auto, 1391 | title={Auto-approximation of graph computing}, 1392 | author={Shang, Zechao and Yu, Jeffrey Xu}, 1393 | journal={Proceedings of the VLDB Endowment}, 1394 | volume={7}, 1395 | number={14}, 1396 | pages={1833--1844}, 1397 | year={2014}, 1398 | publisher={VLDB Endowment} 1399 | } 1400 | 1401 | @article{Shang2016large, 1402 | title = {{Graph analytics through fine-grained parallelism}}, 1403 | author = {Shang, Zechao and Li, Feifei and Yu, Jeffrey Xu and Zhang, Zhiwei 1404 | and Cheng, Hong}, 1405 | doi = {10.1145/2882903.2915238}, 1406 | isbn = {9781450335317}, 1407 | issn = {07308078}, 1408 | journal = {Sigmod}, 1409 | pages = {463--478}, 1410 | url = {http://dl.acm.org/citation.cfm?doid=2882903.2915238}, 1411 | year = {2016} 1412 | } 1413 | 1414 | @inproceedings{roy2013x-stream, 1415 | title={X-Stream: edge-centric graph processing using streaming 1416 | partitions}, 1417 | author={Roy, Amitabha and Mihailovic, Ivo and Zwaenepoel, Willy}, 1418 | booktitle={Proceedings of the Twenty-Fourth ACM Symposium on Operating 1419 | Systems Principles}, 1420 | pages={472--488}, 1421 | year={2013}, 1422 | organization={ACM} 1423 | } 1424 | 1425 | @article{shi2015frog, 1426 | title={Frog: Asynchronous graph processing on GPU with hybrid coloring 1427 | model}, 1428 | author={Shi, Xuanhua and Liang, J and Luo, X and Di, S and He, B and Lu, 1429 | L and Jin, Hai}, 1430 | journal={Huazhong University of Science and Technology, Tech. Rep. 
1431 | HUSTCGCL-TR-402}, 1432 | year={2015} 1433 | } 1434 | 1435 | @inproceedings{akin2014understanding, 1436 | title={Understanding the design space of dram-optimized hardware FFT 1437 | accelerators}, 1438 | author={Ak{\i}n, Berkin and Franchetti, Franz and Hoe, James C}, 1439 | booktitle={2014 IEEE 25th International Conference on 1440 | Application-Specific Systems, Architectures and Processors}, 1441 | pages={248--255}, 1442 | year={2014}, 1443 | organization={IEEE} 1444 | } 1445 | 1446 | @inproceedings{akin2015data, 1447 | title={Data reorganization in memory using 3d-stacked dram}, 1448 | author={Akin, Berkin and Franchetti, Franz and Hoe, James C}, 1449 | booktitle={ACM SIGARCH Computer Architecture News}, 1450 | volume={43}, 1451 | number={3}, 1452 | pages={131--143}, 1453 | year={2015}, 1454 | organization={ACM} 1455 | } 1456 | 1457 | @inproceedings{venkataraman2013presto, 1458 | title={Presto: distributed machine learning and graph processing with 1459 | sparse matrices}, 1460 | author={Venkataraman, Shivaram and Bodzsar, Erik and Roy, Indrajit and 1461 | AuYoung, Alvin and Schreiber, Robert S}, 1462 | booktitle={Proceedings of the 8th ACM European Conference on Computer 1463 | Systems}, 1464 | pages={197--210}, 1465 | year={2013}, 1466 | organization={ACM} 1467 | } 1468 | 1469 | @inproceedings{hsieh2016accelerating, 1470 | title={Accelerating pointer chasing in 3D-stacked memory: Challenges, 1471 | mechanisms, evaluation}, 1472 | author={Hsieh, Kevin and Khan, Samira and Vijaykumar, Nandita and Chang, 1473 | Kevin K and Boroumand, Amirali and Ghose, Saugata and Mutlu, Onur}, 1474 | booktitle={Computer Design (ICCD), 2016 IEEE 34th International 1475 | Conference on}, 1476 | pages={25--32}, 1477 | year={2016}, 1478 | organization={IEEE} 1479 | } 1480 | 1481 | @inproceedings{appuswamy2015scaling, 1482 | title={Scaling the Memory Power Wall With DRAM-Aware Data Management}, 1483 | author={Appuswamy, Raja and Olma, Matthaios and Ailamaki, Anastasia}, 1484 | 
booktitle={Proceedings of the 11th International Workshop on Data 1485 | Management on New Hardware}, 1486 | pages={3}, 1487 | year={2015}, 1488 | organization={ACM} 1489 | } 1490 | 1491 | @article{umuroglu2016random, 1492 | title={Random access schemes for efficient FPGA SpMV acceleration}, 1493 | author={Umuroglu, Yaman and Jahre, Magnus}, 1494 | journal={Microprocessors and Microsystems}, 1495 | year={2016}, 1496 | publisher={Elsevier} 1497 | } 1498 | 1499 | @inproceedings{guo2015enabling, 1500 | title={Enabling portable energy efficiency with memory accelerated 1501 | library}, 1502 | author={Guo, Qi and Low, Tze-Meng and Alachiotis, Nikolaos and Akin, 1503 | Berkin and Pileggi, Larry and Hoe, James C and Franchetti, Franz}, 1504 | booktitle={Proceedings of the 48th International Symposium on 1505 | Microarchitecture}, 1506 | pages={750--761}, 1507 | year={2015}, 1508 | organization={ACM} 1509 | } 1510 | 1511 | @inproceedings{Oguntebi:2016:GDL:2847263.2847337, 1512 | author = {Oguntebi, Tayo and Olukotun, Kunle}, 1513 | title = {GraphOps: A Dataflow Library for Graph Analytics Acceleration}, 1514 | booktitle = {Proceedings of the 2016 ACM/SIGDA International Symposium on 1515 | Field-Programmable Gate Arrays}, 1516 | series = {FPGA '16}, 1517 | year = {2016}, 1518 | isbn = {978-1-4503-3856-1}, 1519 | location = {Monterey, California, USA}, 1520 | pages = {111--117}, 1521 | numpages = {7}, 1522 | url = {http://doi.acm.org/10.1145/2847263.2847337}, 1523 | doi = {10.1145/2847263.2847337}, 1524 | acmid = {2847337}, 1525 | publisher = {ACM}, 1526 | address = {New York, NY, USA}, 1527 | keywords = {accelerator, analytics, dataflow, fpga, graph 1528 | analysis}, 1529 | } 1530 | 1531 | @inproceedings{umuroglu2015hybrid, 1532 | title={Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous 1533 | platform}, 1534 | author={Umuroglu, Yaman and Morrison, Donn and Jahre, Magnus}, 1535 | booktitle={Field Programmable Logic and Applications (FPL), 2015 25th 1536 | 
International Conference on}, 1537 | pages={1--8}, 1538 | year={2015}, 1539 | organization={IEEE} 1540 | } 1541 | 1542 | --------------------------------------------------------------------------------