├── README.md
└── refs.bib


/README.md:
--------------------------------------------------------------------------------
  1 | # Literature Review
  2 | ## Conferences
  3 | - [ISCA2016](http://isca2016.eecs.umich.edu/) [ISCA2015](http://www.ece.cmu.edu/calcm/isca2015/)
  4 | - [Micro2016](https://www.microarch.org/micro49/) [Micro2015](https://www.microarch.org/micro48/)
  5 | - [ASPLOS2016](https://www.ece.cmu.edu/calcm/asplos2016/)
  6 | - [HPCA2016](http://hpca22.site.ac.upc.edu/index.php/program/conference-program/) [HPCA2015](http://darksilicon.org/hpca/)
  7 | - [SC2016](http://sc16.supercomputing.org/conference-components/technical-program-tues-fri/technical-papers) [SC2015](http://sc15.supercomputing.org/program/technical-papers.html)
  8 | - [VLDB2016](http://vldb2016.persistent.com/)
  9 | - [OSDI2016]
 10 | - [EuroSys2016]
 11 | - [ICDE2016]
 12 | - [SIGMOD2016]
 13 | - [FCCM2016](http://fccm.org/2016/cfp.html) [FCCM2015](http://fccm.org/2015/)
 14 | - [FPGA2016](http://www.isfpga.org/fpga2016/index.html) [FPGA2015](http://www.eecs.ucf.edu/isfpga/)
 15 | - [FPL2016](http://www.fpl2016.org/) [FPL2015](http://www.fpl2015.org/?page=tech_sessions#arch3)
 16 | - [FPT2016](http://www.icfpt2016.org/index.jsp) [FPT2015](http://fpt.massey.ac.nz/)
 17 | - [ASAP2016](http://www.asap2016.org/) [ASAP2015](http://www.eecg.toronto.edu/asap2015/)
 18 | 
 19 | 
 20 | ## Research Groups on Graph Acceleration
 21 | - [GAP](http://gap.cs.berkeley.edu/)
 22 | - [GPS](http://infolab.stanford.edu/gps/)
 23 | - [Big Graph Mining](http://datalab.snu.ac.kr/projects/big-graph-mining)
 24 | - [amplab: GraphX](https://amplab.cs.berkeley.edu/projects/graphx/)
 25 | - [Gunrock](http://gunrock.github.io/gunrock/doc/latest/index.html)
 26 | - [Ligra](http://jshun.github.io/ligra/)
 27 | - [Galois](http://iss.ices.utexas.edu/?p=projects/galois)
 28 | - [Trinity](https://www.microsoft.com/en-us/research/project/trinity/)
 29 | - [GRASP](http://grasp.cs.ucr.edu/)
 30 | 
 31 | ## Reading List
 32 | 
 33 | ### Graph Processing
 34 | Graph processing algorithms and frameworks are roughly classified by target computing platform: many-core processors, distributed systems, GPUs, ASIC-based accelerators, and FPGAs. Rather than proposing a complete graph processing framework, some work focuses on a single aspect of graph processing, such as graph compression, pre-processing, partitioning, or load balancing; such work is also listed in the corresponding sections.
 35 | 
 36 | #### Survey
 37 | 
 38 | - McCune, Robert Ryan, Tim Weninger, and Greg Madey. "Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing." ACM Computing Surveys (CSUR) 48.2 (2015): 25.
 39 | 
 40 | - Doekemeijer, Niels, and Ana Lucia Varbanescu. "A survey of parallel graph processing frameworks." Delft University of Technology (2014).
 41 | 
 42 | 
 43 | #### Graph Processing on GPUs
 44 | - Shi, Xuanhua, J. Liang, X. Luo, S. Di, B. He, L. Lu, and Hai Jin. "Frog:
 45 |   Asynchronous graph processing on GPU with hybrid coloring model." Huazhong
 46 |   University of Science and Technology, Tech. Rep. HUSTCGCL-TR-402 (2015).
 47 | 
 48 | - Farzad Khorasani, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. 2014. CuSha:
 49 |   vertex-centric graph processing on GPUs. In Proceedings of the 23rd
 50 |   international symposium on High-performance parallel and distributed computing
 51 |   (HPDC '14). ACM, New York, NY, USA, 239-252. 
 52 | 
 53 | - Fu, Zhisong, Michael Personick, and Bryan Thompson. "MapGraph: A high level API for fast development of high performance graph analytics on GPUs." In Proceedings of the Workshop on GRAph Data management Experiences and Systems (GRADES), pp. 1-6. ACM, 2014.
 54 | 
 55 | - Andrew Davidson, Sean Baxter, Michael Garland, and John D. Owens. 2014.
 56 |   Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths. In
 57 |   Proceedings of the 2014 IEEE 28th International Parallel and Distributed
 58 |   Processing Symposium (IPDPS '14). IEEE Computer Society, Washington, DC, USA,
 59 |   349-359. 
 60 | 
 61 | - Merrill, Duane, Michael Garland, and Andrew Grimshaw. "Scalable GPU graph traversal." ACM SIGPLAN Notices. Vol. 47. No. 8. ACM, 2012.
 62 | 
 63 | - Singh, D. P., and N. Khare. "Modified Dijkstra's Algorithm for Dense Graphs on GPU
 64 |   using CUDA." Indian Journal of Science and Technology 9, no. 33 (2016).
 65 | 
 66 | - Wang, Yangzihao, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. "Gunrock: A High-Performance Graph Processing Library on the GPU." In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). ACM, 2016.
 67 | 
 68 | - Singh DP, Khare N, Rasool A. Efficient Parallel Implementation of Single
 69 |   Source Shortest Path Algorithm on GPU Using CUDA. International Journal of
 70 |   Applied Engineering Research. 2016; 11(4):2560–7.
 71 | 
 72 | - Bingsheng He, Jianlong Zhong, "Medusa: Simplified Graph Processing on GPUs", IEEE Transactions on Parallel & Distributed Systems
 73 | 
 74 | - Hong, Sungpack, Sang Kyun Kim, Tayo Oguntebi, and Kunle Olukotun.
 75 |   "Accelerating CUDA graph algorithms at maximum warp." In ACM SIGPLAN Notices,
 76 |   vol. 46, no. 8, pp. 267-276. ACM, 2011.
 77 | 
 78 | #### Graph Processing on CPUs
 79 | - Roy, Amitabha, Ivo Mihailovic, and Willy Zwaenepoel. "X-Stream: edge-centric
 80 |   graph processing using streaming partitions." In Proceedings of the
 81 |   Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 472-488. ACM,
 82 |   2013.
 83 | 
 84 | - Shang, Zechao, Feifei Li, Jeffrey Xu Yu, Zhiwei Zhang, and Hong Cheng. "Graph
 85 |   Analytics Through Fine-Grained Parallelism." SIGMOD, 2016.
 86 | 
 87 | - Sundaram, Narayanan, et al. "GraphMat: High performance graph analytics made productive." Proceedings of the VLDB Endowment 8.11 (2015): 1214-1225.
 88 | 
 89 | - Julian Shun. An Evaluation of Parallel Eccentricity Estimation Algorithms on
 90 |   Undirected Real-World Graphs. Proceedings of the ACM SIGKDD Conference on
 91 |   Knowledge Discovery and Data Mining (KDD), pp. 1095-1104, 2015.
 92 | 
 93 | - Delling, Daniel, et al. "PHAST: Hardware-accelerated shortest path trees."
 94 |   Journal of Parallel and Distributed Computing 73.7 (2013): 940-952.
 95 | 
 96 | - Meyer, Ulrich, and Peter Sanders. "Δ-stepping: a parallelizable shortest path
 97 |   algorithm." Journal of Algorithms 49.1 (2003): 114-152.
 98 | 
 99 | - Kyrola, Aapo, Guy Blelloch, and Carlos Guestrin. "GraphChi: large-scale graph computation on just a PC." Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12). 2012.
100 | 
101 | - Julian Shun and Guy E. Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP '13). ACM, New York, NY, USA, 135-146. 
102 | 
103 | - Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li, Huazhong Yang, "NXgraph: An Efficient Graph Processing System on a Single Machine", CoRR, 2015
104 | 
105 | - Cheng, Raymond, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan
106 |   Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. "Kineograph: taking the pulse
107 |   of a fast-changing and connected world." In Proceedings of the 7th ACM
108 |   European Conference on Computer Systems, pp. 85-98. ACM, 2012.
109 | 
110 | - Geisberger, Robert, Peter Sanders, Dominik Schultes, and Daniel Delling.
111 |   "Contraction hierarchies: Faster and simpler hierarchical routing in road
112 |   networks." In International Workshop on Experimental and Efficient Algorithms,
113 |   pp. 319-333. Springer Berlin Heidelberg, 2008.
114 | 
115 | - Zheng, Da, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe,
116 |   and Alexander S. Szalay. "FlashGraph: Processing billion-node graphs on an
117 |   array of commodity SSDs." In 13th USENIX Conference on File and Storage
118 |   Technologies (FAST 15), pp. 45-58. 2015.
119 | 
120 | - Yuan, Pingpeng, Wenya Zhang, Changfeng Xie, Hai Jin, Ling Liu, and Kisung Lee.
121 |   "Fast iterative graph computation: A path centric approach." In Proceedings of
122 |   the International Conference for High Performance Computing, Networking,
123 |   Storage and Analysis, pp. 401-412. IEEE Press, 2014.
124 | 
125 | - Najeebullah, Kamran, Kifayat Ullah Khan, Waqas Nawaz, and Young-Koo Lee. "BPP:
126 |   Large Graph Storage for Efficient Disk Based Processing." arXiv preprint
127 |   arXiv:1401.2327 (2014).
128 | 
129 | - Nilakant, Karthik, Valentin Dalibard, Amitabha Roy, and Eiko Yoneki.
130 |   "PrefEdge: SSD prefetcher for large-scale graph traversal." In Proceedings of
131 |   International Conference on Systems and Storage, pp. 1-12. ACM, 2014.
132 | 
133 | - Nguyen, Donald, Andrew Lenharth, and Keshav Pingali. "A lightweight
134 |   infrastructure for graph analytics." In Proceedings of the Twenty-Fourth ACM
135 |   Symposium on Operating Systems Principles, pp. 456-471. ACM, 2013.
136 | 
137 | #### Graph Processing on Distributed Systems
138 | - Venkataraman, Shivaram, Erik Bodzsar, Indrajit Roy, Alvin AuYoung, and Robert
139 |   S. Schreiber. "Presto: distributed machine learning and graph processing with
140 |   sparse matrices." In Proceedings of the 8th ACM European Conference on
141 |   Computer Systems, pp. 197-210. ACM, 2013.
142 | 
143 | - Gonzalez, Joseph E., et al. "GraphX: Graph processing in a distributed dataflow framework." 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014.
144 | 
145 | - Salihoglu, Semih, and Jennifer Widom. "GPS: a graph processing system." Proceedings of the 25th International Conference on Scientific and Statistical Database Management. ACM, 2013.
146 | 
147 | - Malewicz, Grzegorz, et al. "Pregel: a system for large-scale graph processing." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.
148 | 
149 | - Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, and Ion Stoica. 2016. Time-evolving graph processing at scale. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (GRADES '16). ACM, New York, NY, USA
150 | 
151 | - Steinbauer, Matthias, and Gabriele Anderst-Kotsis. "DynamoGraph: extending the Pregel paradigm for large-scale temporal graph processing." International Journal of Grid and Utility Computing 7.2 (2016): 141-151.
152 | 
153 | - Steinbauer, Matthias, and Gabriele Anderst-Kotsis. "DynamoGraph: A Distributed System for Large-scale, Temporal Graph Processing, its Implementation and First Observations." Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 2016.
154 | 
155 | - Khayyat, Zuhair, et al. "Mizan: a system for dynamic load balancing in large-scale graph processing." Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013.
156 | 
157 | - Sengupta, Dipanjan, et al. "GraphIn: An online high performance incremental graph processing framework." European Conference on Parallel Processing. Springer International Publishing, 2016.
158 | 
159 | - Sabeur Aridhi, Alberto Montresor, and Yannis Velegrakis. 2016. BLADYG: A Novel Block-Centric Framework for the Analysis of Large Dynamic Graphs. In Proceedings of the ACM Workshop on High Performance Graph Processing (HPGP '16). ACM, New York, NY, USA
160 | 
161 | 
162 | #### Graph Processing on FPGAs
163 | - Umuroglu, Yaman, Donn Morrison, and Magnus Jahre. "Hybrid breadth-first search
164 |   on a single-chip FPGA-CPU heterogeneous platform." In Field Programmable Logic
165 |   and Applications (FPL), 2015 25th International Conference on, pp. 1-8. IEEE,
166 |   2015.
167 | 
168 | - Tayo Oguntebi and Kunle Olukotun. 2016. GraphOps: A Dataflow Library for Graph
169 |   Analytics Acceleration. In Proceedings of the 2016 ACM/SIGDA International
170 |   Symposium on Field-Programmable Gate Arrays (FPGA '16). ACM, New York, NY, USA,
171 |   111-117. DOI: http://dx.doi.org/10.1145/2847263.2847337
172 | 
173 | - Nurvitadhi, Eriko, et al. "GraphGen: An FPGA framework for vertex-centric graph computation." Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on. IEEE, 2014.
174 | 
175 | - U. Bondhugula, A. Devulapalli, J. Fernando, P. Wyckoff and P. Sadayappan, "Parallel FPGA-based all-pairs shortest-paths in a directed graph," Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006
176 | 
177 | - Guohao Dai, Yuze Chi, Yu Wang, and Huazhong Yang. 2016. FPGP: Graph Processing Framework on FPGA A Case Study of Breadth-First Search. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '16).
178 | 
179 | - N. Engelhardt and H. K. H. So, "GraVF: A vertex-centric distributed graph processing framework on FPGAs," 2016 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 2016, pp. 1-4.
180 | 
181 | - Kapre, Nachiket. "Custom FPGA-based soft-processors for sparse graph
182 |   acceleration." In 2015 IEEE 26th International Conference on
183 |   Application-specific Systems, Architectures and Processors (ASAP), pp. 9-16.
184 |   IEEE, 2015.
185 | 
186 | - Kapre, Nachiket, and Pradeep Moorthy. "A case for embedded FPGA-based SoCs in
187 |   energy-efficient acceleration of graph problems." Supercomputing Frontiers and
188 |   Innovations 2, no. 3 (2015): 76-86.
189 | 
190 | - S. Zhou, C. Chelmis and V. K. Prasanna, "High-Throughput and Energy-Efficient
191 |   Graph Processing on FPGA," 2016 IEEE 24th Annual International Symposium on
192 |   Field-Programmable Custom Computing Machines (FCCM), Washington, DC, 2016, pp.
193 |   103-110.
194 | 
195 | #### Graph Processing on ASICs
196 | 
197 | - Ham, Tae Jun, et al. "Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics." In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016.
198 | 
199 | - Ozdal, Muhammet Mustafa, et al. "Energy efficient architecture for graph analytics accelerators." Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 2016.
200 | 
201 | - Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A scalable processing-in-memory accelerator for parallel graph processing. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA '15). ACM, New York, NY, USA, 105-117.
202 | 
203 | ### Graph Partitioning and Clustering
204 | - Chen, Rong, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. "PowerLyra:
205 |   Differentiated graph computation and partitioning on skewed graphs." In
206 |   Proceedings of the Tenth European Conference on Computer Systems, p. 1. ACM,
207 |   2015.
208 | 
209 | - Vaquero, Luis, et al. "xDGP: A dynamic graph processing system with adaptive partitioning." arXiv preprint arXiv:1309.1049 (2013).
210 | 
211 | - Julian Shun, Farbod Roosta-Khorasani, Kimon Fountoulakis and Michael Mahoney.
212 |   Parallel Local Graph Clustering. Proceedings of the International Conference
213 |   on Very Large Data Bases (VLDB), 2016.
214 | 
215 | - A. Abdolrashidi and L. Ramaswamy, "Continual and Cost-Effective Partitioning of Dynamic Graphs for Optimizing Big Graph Processing Systems," 2016 IEEE International Congress on Big Data (BigData Congress), San Francisco, CA, USA, 2016
216 | 
217 | - Andreas Beckmann, Ulrich Meyer, and David Veith. "An Implementation of I/O-Efficient Dynamic Breadth-First Search Using Level-Aligned Hierarchical Clustering." 21st Annual European Symposium on Algorithms (ESA), 2013.
218 | 
219 | ### Graph Pre-processing
220 | 
221 | - Wu, Bo, Zhijia Zhao, Eddy Zheng Zhang, Yunlian Jiang, and Xipeng Shen.
222 |   "Complexity analysis and algorithm design for reorganizing data to minimize
223 |   non-coalesced memory accesses on GPU." In ACM SIGPLAN Notices, vol. 48, no. 8,
224 |   pp. 57-68. ACM, 2013.
225 | 
226 | - Khorasani, Farzad, Keval Vora, Rajiv Gupta, and Laxmi N. Bhuyan. "CuSha:
227 |   vertex-centric graph processing on GPUs." In Proceedings of the 23rd
228 |   international symposium on High-performance parallel and distributed
229 |   computing, pp. 239-252. ACM, 2014.
230 | 
231 | - Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Kai Tian, and Xipeng Shen. 2011.
232 |   On-the-fly elimination of dynamic irregularities for GPU computing. In
233 |   Proceedings of the sixteenth international conference on Architectural support
234 |   for programming languages and operating systems (ASPLOS XVI). ACM, New York, NY, USA, 369-380. 
235 | 
236 | - Sanders, Peter, Dominik Schultes, and Christian Vetter. "Mobile route
237 |   planning." In European Symposium on Algorithms, pp. 732-743. Springer Berlin
238 |   Heidelberg, 2008.
239 | 
240 | ### Load Balancing
241 | 
242 | ### Graph Compression
243 | - Zhou, Fang. "Graph compression." Department of Computer Science and Helsinki Institute for Information Technology HIIT (2015): 1-12.
244 | 
245 | - S. Chen and J. H. Reif. 1996. Efficient Lossless Compression of Trees and Graphs. In Proceedings of the Conference on Data Compression (DCC '96). IEEE Computer Society, Washington
246 | 
247 | - Sebastian Maneth and Fabian Peternek. "A Survey on Methods and Systems for Graph Compression." CoRR, 2015.
248 | 
249 | - Sparsh Mittal and Jeffrey S. Vetter. 2016. A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems. IEEE Trans. Parallel Distrib. Syst. 27, 5 (May 2016), 1524-1536.  
250 | 
251 | - Vito Giovanni Castellana, Marco Minutoli, Alessandro Morari, Antonino Tumeo, Marco Lattuada, and Fabrizio Ferrandi. 2015. High Level Synthesis of RDF Queries for Graph Analytics. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD '15). IEEE Press, Piscataway, NJ, USA, 323-330.
252 | 
253 | - Julian Shun, Laxman Dhulipala and Guy Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+. Proceedings of the IEEE Data Compression Conference (DCC), pp. 403-412, 2015
254 | 
255 | ### Graph Approximate Computing
256 | - Shang, Zechao, and Jeffrey Xu Yu. "Auto-approximation of graph computing."
257 |   Proceedings of the VLDB Endowment 7, no. 14 (2014): 1833-1844.
258 | 
259 | 
260 | ### Graph Database
261 | 
262 | - Shi, Jiaxin, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. "Fast and
263 |   concurrent RDF queries with RDMA-based distributed graph exploration." In 12th
264 |   USENIX Symposium on Operating Systems Design and Implementation (OSDI
265 |   16), Savannah, GA, 2016.
266 |  
267 | - Xirogiannopoulos, Konstantinos, Udayan Khurana, and Amol Deshpande. "GraphGen: exploring interesting graphs in relational data." Proceedings of the VLDB Endowment 8.12 (2015): 2032-2035.
268 | 
269 | - Morari, Alessandro, Jesse Weaver, Oreste Villa, David Haglin, Antonino Tumeo,
270 |   Vito Giovanni Castellana, and John Feo. "High-Performance, Distributed
271 |   Dictionary Encoding of RDF Datasets." In 2015 IEEE International Conference on
272 |   Cluster Computing, pp. 250-253. IEEE, 2015.
273 | 
274 | - Morari, Alessandro, Vito Giovanni Castellana, Oreste Villa, Jesse Weaver,
275 |   Gregory Todd Williams, David J. Haglin, Antonino Tumeo, and John Feo. "GEMS:
276 |   Graph Database Engine for Multithreaded Systems." (2015): 139-156.
277 | 
278 | ## Research Groups on Database Query Acceleration
279 | - [Xtra Computing Group](http://pdcc.ntu.edu.sg/xtra/)
280 | - [amplab](https://amplab.cs.berkeley.edu/projects/succinct-enabling-queries-on-compressed-data/)
281 | 
282 | ## Reading List
283 | 
284 | ### Database Query Acceleration
285 | 
286 | - M. Sadoghi, R. Javed, N. Tarafdar, H. Singh, R. Palaniappan and H. A. Jacobsen, "Multi-query Stream Processing on FPGAs," 2012 IEEE 28th International Conference on Data Engineering, Washington, DC, 2012, pp. 1229-1232.
287 | 
288 | - Kocberber, Onur, Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, and
289 |   Parthasarathy Ranganathan. "Meet the walkers: Accelerating index traversals
290 |   for in-memory databases." In Proceedings of the 46th Annual IEEE/ACM
291 |   International Symposium on Microarchitecture, pp. 468-479. ACM, 2013.
292 | 
293 | - V. G. Castellana et al., "In-Memory Graph Databases for Web-Scale Data," in
294 |   Computer, vol. 48, no. 3, pp. 24-35, Mar. 2015.
295 | 
296 | - Zeng, Kai, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. "A
297 |   distributed graph engine for web scale RDF data." In Proceedings of the VLDB
298 |   Endowment, vol. 6, no. 4, pp. 265-276. VLDB Endowment, 2013.
299 | 
300 | - Sukhwani, Bharat, et al. "A hardware/software approach for database query acceleration with fpgas." International Journal of Parallel Programming 43.6 (2015): 1129-1159.
301 | 
302 | - Dennl, Christopher, Daniel Ziener, and Jurgen Teich. "On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library." Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on. IEEE, 2012.
303 | 
304 | - Wu, Lisa, et al. "The Q100 Database Processing Unit." IEEE Micro 35.3 (2015): 34-46. 
305 | 
306 | - Chung, Eric S., John D. Davis, and Jaewon Lee. "LINQits: Big data on little clients." ACM SIGARCH Computer Architecture News. Vol. 41. No. 3. ACM, 2013.
307 | 
308 | - Halstead, Robert J., et al. "FPGA-based Multithreading for In-Memory Hash Joins." CIDR. 2015.
309 | 
310 | - Chen, Ren, and Viktor K. Prasanna. "Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform." 
311 | 
312 | - Wang, Zeke, Bingsheng He, and Wei Zhang. "A study of data partitioning on OpenCL-based FPGAs." 2015 25th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 2015.
313 | 
314 | - R. R. Bordawekar and M. Sadoghi, "Accelerating database workloads by software-hardware-system co-design," 2016 IEEE 32nd International Conference on Data Engineering (ICDE), Helsinki, 2016, pp. 1428-1431.
315 | 
316 | - Guo, Cong and Martin Karsten. "Towards Adaptive Resource Allocation for Database Workloads." ADMS@VLDB (2015).
317 | 
318 | - Johns Paul, Jiong He, and Bingsheng He. 2016. GPL: A GPU-based Pipelined Query Processing Engine. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY
319 | 
320 | - Jared Casper and Kunle Olukotun. 2014. Hardware acceleration of database operations. In Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays (FPGA '14)
321 | 
322 | - Gokul Soundararajan, Daniel Lupei, Saeed Ghanbari, Adrian Daniel Popescu, Jin Chen, and Cristiana Amza. 2009. Dynamic resource allocation for database servers running on virtual storage. In Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), Margo Seltzer and Ric Wheeler (Eds.). USENIX Association, Berkeley, CA, USA, 71-84.
323 | 
324 | - Bingsheng He and Jeffrey Xu Yu. 2011. High-throughput transaction executions on graphics processors. Proc. VLDB Endow. 4, 5 (February 2011), 314-325.
325 | 
326 | - Bharat Sukhwani, Hong Min, Mathew Thoennes, Parijat Dube, Bernard Brezzo, Sameh Asaad, Donna Eng Dillenberger, "Database Analytics: A Reconfigurable-Computing Approach", IEEE Micro vol. 34 no. 1, p. 19-29, Jan.-Feb., 2014
327 | 
328 | - Shuang Chen, Shunning Jiang, Bingsheng He, and Xueyan Tang. 2016. A Study of Sorting Algorithms on Approximate Memory. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD '16). ACM, New York, NY, USA, 647-662.  
329 | 
330 | - Gustavo Alonso, "Data Processing on the fast lane", Systems Group, Department of Computer Science, ETH Zurich, Switzerland, FPL keynote, 2016.
331 | 
332 | - Zeke Wang, Huiyan Cheah, Johns Paul, Bingsheng He, and Wei Zhang. 2016. Accelerating Database Query Processing on OpenCL-based FPGAs (Abstract Only). In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '16). ACM, New York, NY
333 | 
334 | - Barthels, Claude, Ingo Müller, Timo Schneider, Gustavo Alonso, and Torsten
335 |   Hoefler. "Distributed Join Algorithms on Thousands of Cores." Proceedings of
336 |   the VLDB Endowment 10, no. 5 (2017).
337 | 
338 | ### Database Compression
339 | - Harald Lang, Tobias Mühlbauer, Florian Funke, Peter A. Boncz, Thomas Neumann and Alfons Kemper. "Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation." SIGMOD Conference (2016).
340 | 
341 | - Lin, Chunbin, Jianguo Wang, and Yannis Papakonstantinou. "Data Compression for
342 |   Analytics over Large-scale In-memory Column Databases." arXiv preprint
343 |   arXiv:1606.09315 (2016).
344 | 
345 | ## Interesting Open Projects & Posts
346 | - [RIFFA](https://github.com/KastnerRG/riffa)
347 | - [Overclocking Arithmetic](https://constantinides.net/2014/12/11/overclocking-friendly-arithmetic-neednt-cost-the-earth/)
348 | - [Some Highlights in FPGA 2016](https://constantinides.net/2016/02/25/fpga-2016-some-highlights/)
349 | - [Time is Precision](https://constantinides.net/2016/12/12/time-is-precision/)
350 | 
351 | ## Cutting Edge Techniques
352 | - Ousterhout, John, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee,
353 |   Behnam Montazeri, Diego Ongaro et al. "The RAMCloud storage system." ACM
354 |   Transactions on Computer Systems (TOCS) 33, no. 3 (2015): 7.
355 | 
356 | - Ho, Chen-Han, Sung Jin Kim, and Karthikeyan Sankaralingam. "Efficient
357 |   execution of memory access phases using dataflow specialization." In ACM
358 |   SIGARCH Computer Architecture News, vol. 43, no. 3, pp. 118-130. ACM, 2015.
359 | 
360 | - Kumar, Snehasish, Arrvindh Shriraman, Vijayalakshmi Srinivasan, Dan Lin, and
361 |   Jordon Phillips. "SQRL: hardware accelerator for collecting software data
362 |   structures." In Proceedings of the 23rd international conference on Parallel
363 |   architectures and compilation, pp. 475-476. ACM, 2014.
364 | 
365 | - Schkufza, Eric, Rahul Sharma, and Alex Aiken. "Stochastic optimization of
366 |   floating-point programs with tunable precision." ACM SIGPLAN Notices 49, no. 6
367 |   (2014): 53-64.
368 | 
369 | ## Interesting Research Topics
370 | 
371 | ### Memory Access Optimization
372 | - Guo, Qi, Tze-Meng Low, Nikolaos Alachiotis, Berkin Akin, Larry Pileggi, James
373 |   C. Hoe, and Franz Franchetti. "Enabling portable energy efficiency with memory
374 |   accelerated library." In Proceedings of the 48th International Symposium on
375 |   Microarchitecture, pp. 750-761. ACM, 2015.
376 | 
377 | - Appuswamy, Raja, Matthaios Olma, and Anastasia Ailamaki. "Scaling the Memory
378 |   Power Wall With DRAM-Aware Data Management." In Proceedings of the 11th
379 |   International Workshop on Data Management on New Hardware, p. 3. ACM, 2015.
380 | 
381 | - Akın, Berkin, Franz Franchetti, and James C. Hoe. "Understanding the design
382 |   space of DRAM-optimized hardware FFT accelerators." In 2014 IEEE 25th
383 |   International Conference on Application-Specific Systems, Architectures and
384 |   Processors, pp. 248-255. IEEE, 2014.
385 | 
386 | - Akin, Berkin, Franz Franchetti, and James C. Hoe. "Data reorganization in
387 |   memory using 3D-stacked DRAM." In ACM SIGARCH Computer Architecture News, vol.
388 |   43, no. 3, pp. 131-143. ACM, 2015.
389 | 
390 | - Hsieh, Kevin, Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali
391 |   Boroumand, Saugata Ghose, and Onur Mutlu. "Accelerating pointer chasing in
392 |   3D-stacked memory: Challenges, mechanisms, evaluation." In Computer Design
393 |   (ICCD), 2016 IEEE 34th International Conference on, pp. 25-32. IEEE, 2016.
394 | 
395 | ### FPGA Design Tools and Frameworks
396 | - Jacobsen, M., Richmond, D., Hogains, M., and Kastner, R. "RIFFA 2.1: A reusable integration framework for FPGA accelerators." ACM Transactions on Reconfigurable Technology and Systems (TRETS), September 2015.
397 | 
398 | - C. Pham-Quoc, Z. Al-Ars and K. Bertels, "Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling," Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, Phoenix, AZ, 2014, pp. 151-160.
399 | 
400 | - Niu, Xinyu, Wayne Luk, and Yu Wang. "EURECA: On-chip configuration generation for effective dynamic data access." Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. ACM, 2015.
401 | 
402 | ### Sparse Matrix Computing Acceleration on FPGAs
403 | - Umuroglu, Yaman, and Magnus Jahre. "Random access schemes for efficient FPGA
404 |   SpMV acceleration." Microprocessors and Microsystems (2016).
405 | 
406 | - Dorrance, Richard, Fengbo Ren, and Dejan Marković. "A scalable sparse
407 |   matrix-vector multiplication kernel for energy-efficient sparse-BLAS on
408 |   FPGAs." In Proceedings of the 2014 ACM/SIGDA international symposium on
409 |   Field-programmable gate arrays, pp. 161-170. ACM, 2014. 
410 | 
411 | - Jamro, Ernest, Tomasz Pabiś, Paweł Russek, and Kazimierz Wiatr. "The
412 |   algorithms for FPGA implementation of sparse matrices multiplication."
413 |   Computing and Informatics 33, no. 3 (2015): 667-684.
414 | 
415 | - Giefers, Heiner, Peter Staar, Costas Bekas, and Christoph Hagleitner.
416 |   "Analyzing the energy-efficiency of sparse matrix multiplication on
417 |   heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA." In
418 |   Performance Analysis of Systems and Software (ISPASS), 2016 IEEE International
419 |   Symposium on, pp. 46-56. IEEE, 2016.
420 | 
421 | ### Manycore Simulation and Scalability Research
422 | - Yu, Xiangyao, George Bezerra, Andrew Pavlo, Srinivas Devadas, and Michael
423 |   Stonebraker. "Staring into the abyss: An evaluation of concurrency control
424 |   with one thousand cores." Proceedings of the VLDB Endowment 8, no. 3 (2014):
425 |   209-220.
426 | 
427 | - Fu, Yaosheng, and David Wentzlaff. "PriME: A parallel and distributed
428 |   simulator for thousand-core chips." In Performance Analysis of Systems and
429 |   Software (ISPASS), 2014 IEEE International Symposium on, pp. 116-125. IEEE,
430 |   2014.
431 | 
432 | - Miller, Jason E., Harshad Kasture, George Kurian, Charles Gruenwald, Nathan
433 |   Beckmann, Christopher Celio, Jonathan Eastep, and Anant Agarwal. "Graphite: A
434 |   distributed parallel simulator for multicores." In HPCA-16 2010 The Sixteenth
435 |   International Symposium on High-Performance Computer Architecture, pp. 1-12.
436 |   IEEE, 2010.
437 | 
438 | - Carlson, Trevor E., Wim Heirman, and Lieven Eeckhout. "Sniper: exploring the
439 |   level of abstraction for scalable and accurate parallel multi-core
440 |   simulation." In Proceedings of 2011 International Conference for High
441 |   Performance Computing, Networking, Storage and Analysis, p. 52. ACM, 2011.
442 | 


--------------------------------------------------------------------------------
/refs.bib:
--------------------------------------------------------------------------------
   1 | @inproceedings{charousset2014caf,
   2 |     title={{CAF} - the {C++} Actor Framework for Scalable and Resource-Efficient Applications},
   3 |     author={Charousset, Dominik and Hiesgen, Raphael and Schmidt, Thomas C},
   4 |     booktitle={Proceedings of the 4th International Workshop on Programming based on Actors Agents
   5 |         \& Decentralized Control},
   6 |     pages={15--28},
   7 |     year={2014},
   8 |     organization={ACM}
   9 | }
  10 | 
  11 | @inproceedings{hiesgen2015manyfold,
  12 |     title={Manyfold actors: extending the {C++} actor framework to heterogeneous many-core machines
  13 |         using {OpenCL}},
  14 |     author={Hiesgen, Raphael and Charousset, Dominik and Schmidt, Thomas C},
  15 |     booktitle={Proceedings of the 5th International Workshop on Programming Based on Actors,
  16 |         Agents, and Decentralized Control},
  17 |     pages={45--56},
  18 |     year={2015},
  19 |     organization={ACM}
  20 | }
  21 | 
  22 | 
  23 | @article{wu2014q100,
  24 |     title={Q100: the architecture and design of a database processing unit},
  25 |     author={Wu, Lisa and Lottarini, Andrea and Paine, Timothy K and Kim, Martha A and Ross, Kenneth
  26 |         A},
  27 |     journal={ACM SIGPLAN Notices},
  28 |     volume={49},
  29 |     number={4},
  30 |     pages={255--268},
  31 |     year={2014},
  32 |     publisher={ACM}
  33 | }
  34 | 
  35 | 
  36 | @article{wu2015q100,
  37 |     title={The Q100 Database Processing Unit},
  38 |     author={Wu, Lisa and Lottarini, Andrea and Paine, Timothy K and Kim, Martha A and Ross, Kenneth
  39 |         A},
  40 |     journal={IEEE Micro},
  41 |     volume={35},
  42 |     number={3},
  43 |     pages={34--46},
  44 |     year={2015},
  45 |     publisher={IEEE}
  46 | }
  47 | 
  48 | @inproceedings{chung2013linqits,
  49 |     title={LINQits: Big data on little clients},
  50 |     author={Chung, Eric S and Davis, John D and Lee, Jaewon},
  51 |     booktitle={ACM SIGARCH Computer Architecture News},
  52 |     volume={41},
  53 |     number={3},
  54 |     pages={261--272},
  55 |     year={2013},
  56 |     organization={ACM}
  57 | }
  58 | 
  59 | @inproceedings{guotowards,
  60 |     title={Towards Adaptive Resource Allocation for Database Workloads},
  61 |     author={Guo, Cong and Karsten, Martin}, booktitle={ADMS@VLDB}, year={2015}
  62 | }
  63 | 
  64 | @inproceedings{soundararajan2009dynamic,
  65 |   title={Dynamic Resource Allocation for Database Servers Running on Virtual Storage},
  66 |   author={Soundararajan, Gokul and Lupei, Daniel and Ghanbari, Saeed and Popescu, Adrian Daniel and Chen, Jin and Amza, Cristiana},
  67 |   booktitle={FAST},
  68 |   volume={9},
  69 |   pages={71--84},
  70 |   year={2009}
  71 | }
  72 | 
  73 | @inproceedings{sadoghi2012multi,
  74 |   title={Multi-query stream processing on FPGAs},
  75 |   author={Sadoghi, Mohammad and Javed, Rija and Tarafdar, Naif and Singh, Harsh and Palaniappan, Rohan and Jacobsen, Hans-Arno},
  76 |   booktitle={2012 IEEE 28th International Conference on Data Engineering},
  77 |   pages={1229--1232},
  78 |   year={2012},
  79 |   organization={IEEE}
  80 | }
  81 | 
  82 | @article{chen2015accelerating,
  83 |   title={Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform},
  84 |   author={Chen, Ren and Prasanna, Viktor K}
  85 | }
  86 | 
  87 | @article{papadimitriou2011performance,
  88 |   title={Performance of partial reconfiguration in FPGA systems: A survey and a cost model},
  89 |   author={Papadimitriou, Kyprianos and Dollas, Apostolos and Hauck, Scott},
  90 |   journal={ACM Transactions on Reconfigurable Technology and Systems (TRETS)},
  91 |   volume={4},
  92 |   number={4},
  93 |   pages={36},
  94 |   year={2011},
  95 |   publisher={ACM}
  96 | }
  97 | 
  98 | @article{zhang2013omnidb,
  99 |   title={OmniDB: Towards portable and efficient query processing on parallel CPU/GPU architectures},
 100 |   author={Zhang, Shuhao and He, Jiong and He, Bingsheng and Lu, Mian},
 101 |   journal={Proceedings of the VLDB Endowment},
 102 |   volume={6},
 103 |   number={12},
 104 |   pages={1374--1377},
 105 |   year={2013},
 106 |   publisher={VLDB Endowment}
 107 | }
 108 | 
 109 | @inproceedings{nurvitadhi2014graphgen,
 110 |     title={GraphGen: An FPGA framework for vertex-centric graph computation},
 111 |     author={Nurvitadhi, Eriko and Weisz, Gabriel and Wang, Yu and Hurkat, Skand and Nguyen, Marie and Hoe, James C and Mart{\'\i}nez, Jos{\'e} F and Guestrin, Carlos},
 112 |     booktitle={Field-Programmable Custom Computing Machines (FCCM), 2014 IEEE 22nd Annual International Symposium on},
 113 |     pages={25--28},
 114 |     year={2014},
 115 |     organization={IEEE}
 116 | }
 117 | 
 118 | @inproceedings{ozdal2016energy,
 119 |     title={Energy efficient architecture for graph analytics accelerators},
 120 |     author={Ozdal, Muhammet Mustafa and Yesil, Serif and Kim, Taemin and Ayupov, Andrey and Greth, John and Burns, Steven and Ozturk, Ozcan},
 121 |     booktitle={Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on},
 122 |     pages={166--177},
 123 |     year={2016},
 124 |     organization={IEEE}
 125 | }
 126 | 
 127 | @article{hamgra2016graphicionado,
 128 |     title={Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics},
 129 |     author={Ham, Tae Jun and Wu, Lisa and Sundaram, Narayanan and Satish, Nadathur and Martonosi, Margaret}
 130 | }
 131 | 
 132 | @article{sundaram2015graphmat,
 133 |     title={GraphMat: High performance graph analytics made productive},
 134 |     author={Sundaram, Narayanan and Satish, Nadathur and Patwary, Md Mostofa Ali and Dulloor, Subramanya R and Anderson, Michael J and Vadlamudi, Satya Gautam and Das, Dipankar and Dubey, Pradeep},
 135 |     journal={Proceedings of the VLDB Endowment},
 136 |     volume={8},
 137 |     number={11},
 138 |     pages={1214--1225},
 139 |     year={2015},
 140 |     publisher={VLDB Endowment}
 141 | }
 142 | 
 143 | @article{sukhwani2015hardware,
 144 |     title={A hardware/software approach for database query acceleration with FPGAs},
 145 |     author={Sukhwani, Bharat and Thoennes, Mathew and Min, Hong and Dube, Parijat and Brezzo, Bernard and Asaad, Sameh and Dillenberger, Donna},
 146 |     journal={International Journal of Parallel Programming},
 147 |     volume={43},
 148 |     number={6},
 149 |     pages={1129--1159},
 150 |     year={2015},
 151 |     publisher={Springer}
 152 | }
 153 | 
 154 | @article{vaquero2013xdgp,
 155 |     title={xDGP: A dynamic graph processing system with adaptive partitioning},
 156 |     author={Vaquero, Luis and Cuadrado, F{\'e}lix and Logothetis, Dionysios and Martella, Claudio},
 157 |     journal={arXiv preprint arXiv:1309.1049},
 158 |     year={2013}
 159 | }
 160 | 
 161 | @article{mccune2015thinking,
 162 |     title={Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing},
 163 |     author={McCune, Robert Ryan and Weninger, Tim and Madey, Greg},
 164 |     journal={ACM Computing Surveys (CSUR)},
 165 |     volume={48},
 166 |     number={2},
 167 |     pages={25},
 168 |     year={2015},
 169 |     publisher={ACM}
 170 | }
 171 | 
 172 | @inproceedings{gonzalez2014graphx,
 173 |     title={GraphX: Graph processing in a distributed dataflow framework},
 174 |     author={Gonzalez, Joseph E and Xin, Reynold S and Dave, Ankur and Crankshaw, Daniel and Franklin, Michael J and Stoica, Ion},
 175 |     booktitle={11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14)},
 176 |     pages={599--613},
 177 |     year={2014}
 178 | }
 179 | 
 180 | @inproceedings{salihoglu2013gps,
 181 |     title={GPS: a graph processing system},
 182 |     author={Salihoglu, Semih and Widom, Jennifer},
 183 |     booktitle={Proceedings of the 25th International Conference on Scientific and Statistical Database Management},
 184 |     pages={22},
 185 |     year={2013},
 186 |     organization={ACM}
 187 | }
 188 | 
 189 | @inproceedings{malewicz2010pregel,
 190 |     title={Pregel: a system for large-scale graph processing},
 191 |     author={Malewicz, Grzegorz and Austern, Matthew H and Bik, Aart JC and Dehnert, James C and Horn, Ilan and Leiser, Naty and Czajkowski, Grzegorz},
 192 |     booktitle={Proceedings of the 2010 ACM SIGMOD International Conference on Management of data},
 193 |     pages={135--146},
 194 |     year={2010},
 195 |     organization={ACM}
 196 | }
 197 | 
 198 | @inproceedings{kyrola2012graphchi,
 199 |     title={GraphChi: large-scale graph computation on just a PC},
 200 |     author={Kyrola, Aapo and Blelloch, Guy and Guestrin, Carlos},
 201 |     booktitle={Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12)},
 202 |     pages={31--46},
 203 |     year={2012}
 204 | }
 205 | 
 206 | @inproceedings{dennl2012fly,
 207 |     title={On-the-fly composition of FPGA-based SQL query accelerators using a partially reconfigurable module library},
 208 |     author={Dennl, Christopher and Ziener, Daniel and Teich, J{\"u}rgen},
 209 |     booktitle={Field-Programmable Custom Computing Machines (FCCM), 2012 IEEE 20th Annual International Symposium on},
 210 |     pages={45--52},
 211 |     year={2012},
 212 |     organization={IEEE}
 213 | }
 214 | 
 215 | @article{zhou2015graph,
 216 |     title={Graph compression},
 217 |     author={Zhou, Fang},
 218 |     journal={Department of Computer Science and Helsinki Institute for Information Technology HIIT},
 219 |     pages={1--12},
 220 |     year={2015}
 221 | }
 222 | 
 223 | @inproceedings{halstead2015fpga,
 224 |     title={FPGA-based Multithreading for In-Memory Hash Joins},
 225 |     author={Halstead, Robert J and Absalyamov, Ildar and Najjar, Walid A and Tsotras, Vassilis J},
 226 |     booktitle={CIDR},
 227 |     year={2015}
 228 | }
 229 | 
 230 | @inproceedings{wang2015study,
 231 |     title={A study of data partitioning on OpenCL-based FPGAs},
 232 |     author={Wang, Zeke and He, Bingsheng and Zhang, Wei},
 233 |     booktitle={2015 25th International Conference on Field Programmable Logic and Applications (FPL)},
 234 |     pages={1--8},
 235 |     year={2015},
 236 |     organization={IEEE}
 237 | }
 238 | 
 239 | @INPROCEEDINGS{bondhugula2006parallel-APSP, 
 240 |     author={U. Bondhugula and A. Devulapalli and J. Fernando and P. Wyckoff and P. Sadayappan}, 
 241 |     booktitle={Proceedings 20th IEEE International Parallel \& Distributed Processing Symposium},
 242 |     title={Parallel FPGA-based all-pairs shortest-paths in a directed graph}, 
 243 |     year={2006}, 
 244 |     pages={10 pp.-}, 
 245 |     keywords={directed graphs;field programmable gate arrays;parallel algorithms;Cray XD1 processor;Floyd-Warshall algorithm;VLSI technology;all-pair shortest-path problem;bioinformatics application;directed graph;field programmable gate array;high performance computing;parallel FPGA design;parallel computing;very large scale integration;Algorithm design and analysis;Application software;Bonding;Design optimization;Field programmable gate arrays;High performance
 246 |         computing;Parallel processing;Signal processing algorithms;Supercomputers;Very large scale integration}, 
 247 |     doi={10.1109/IPDPS.2006.1639347}, 
 248 |     ISSN={1530-2075}, 
 249 |     month={April},
 250 | }
 251 | 
 252 | @INPROCEEDINGS{abdolrashidi2016continual, 
 253 |     author={A. Abdolrashidi and L. Ramaswamy}, 
 254 |     booktitle={2016 IEEE International Congress on Big Data (BigData Congress)}, 
 255 |     title={Continual and Cost-Effective Partitioning of Dynamic Graphs for Optimizing Big Graph Processing Systems}, 
 256 |     year={2016}, 
 257 |     pages={18--25},
 258 |     keywords={Distributed Vertex-Centric Graph Processing;Graph Partitioning;Performance Evaluation;Time-Evolving Graphs}, 
 259 |     doi={10.1109/BigDataCongress.2016.12}, 
 260 |     month={June},
 261 | }
 262 | 
 263 | @inproceedings{dai2016fpgp,
 264 |     author = {Dai, Guohao and Chi, Yuze and Wang, Yu and Yang, Huazhong},
 265 |     title = {FPGP: Graph Processing Framework on FPGA - A Case Study of Breadth-First Search},
 266 |     booktitle = {Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
 267 |     series = {FPGA '16},
 268 |     year = {2016},
 269 |     isbn = {978-1-4503-3856-1},
 270 |     location = {Monterey, California, USA},
 271 |     pages = {105--110},
 272 |     numpages = {6},
 273 |     url = {http://doi.acm.org/10.1145/2847263.2847339},
 274 |     doi = {10.1145/2847263.2847339},
 275 |     acmid = {2847339},
 276 |     publisher = {ACM},
 277 |     address = {New York, NY, USA},
 278 |     keywords = {fpga framework, large scale graph processing},
 279 | } 
 280 | 
 281 | @INPROCEEDINGS{sadogi2012multi-query, 
 282 |     author={M. Sadoghi and R. Javed and N. Tarafdar and H. Singh and R. Palaniappan and H. A. Jacobsen}, 
 283 |     booktitle={2012 IEEE 28th International Conference on Data Engineering}, 
 284 |     title={Multi-query Stream Processing on FPGAs}, 
 285 |     year={2012}, 
 286 |     pages={1229--1232},
 287 |     keywords={field programmable gate arrays;logic design;query processing;FPGA;algorithmic trading;complex event processing;high-frequency event streams;line-rate multiquery processing;low-level logic design;multiquery event stream platform;multiquery stream processing;parallelism;pipelining;real-time data analytics;reconfigurable hardware;targeted advertisement;Algebra;Bandwidth;Field programmable gate arrays;Hardware;Hardware design languages;Parallel processing;Semantics}, 
 288 |     doi={10.1109/ICDE.2012.39}, 
 289 |     ISSN={1063-6382}, 
 290 |     month={April},
 291 | }
 292 | 
 293 | @INPROCEEDINGS{engelhardt2016gravf, 
 294 |     author={N. Engelhardt and H. K. H. So}, 
 295 |     booktitle={2016 26th International Conference on Field Programmable Logic and Applications (FPL)}, 
 296 |     title={GraVF: A vertex-centric distributed graph processing framework on FPGAs}, 
 297 |     year={2016}, 
 298 |     pages={1-4}, 
 299 |     keywords={Algorithm design and analysis;Computational modeling;Computer architecture;Field programmable gate arrays;Hardware;Kernel;Programming}, 
 300 |     doi={10.1109/FPL.2016.7577360}, 
 301 |     month={Aug},
 302 | }
 303 | 
 304 | @article{sparsh2012survey,
 305 |     author = {Mittal, Sparsh},
 306 |     title = {A Survey of Architectural Techniques for DRAM Power Management},
 307 |     journal = {Int. J. High Perform. Syst. Archit.},
 308 |     issue_date = {December 2012},
 309 |     volume = {4},
 310 |     number = {2},
 311 |     month = dec,
 312 |     year = {2012},
 313 |     issn = {1751-6528},
 314 |     pages = {110--119},
 315 |     numpages = {10},
 316 |     url = {http://dx.doi.org/10.1504/IJHPSA.2012.050990},
 317 |     doi = {10.1504/IJHPSA.2012.050990},
 318 |     acmid = {2421513},
 319 |     publisher = {Inderscience Publishers},
 320 |     address = {Inderscience Publishers, Geneva, SWITZERLAND},
 321 | } 
 322 | 
 323 | @ARTICLE{zhong2014medusa, 
 324 |     author={J. Zhong and B. He}, 
 325 |     journal={IEEE Transactions on Parallel and Distributed Systems}, 
 326 |     title={Medusa: Simplified Graph Processing on GPUs}, 
 327 |     year={2014}, 
 328 |     volume={25}, 
 329 |     number={6}, 
 330 |     pages={1543--1552},
 331 |     keywords={C++ language;application program interfaces;data structures;graph theory;graphics processing units;optimisation;source code (software);API;GPGPU programs;GPU graph operations;Medusa;data structures;graph processing;graph-centric optimizations;graphics processing unit;runtime system;sequential C-C++ code;source code;Algorithm design and analysis;Data structures;Graphics processing units;Memory management;Optimization;Parallel processing;Programming;GPGPU;GPU
 332 |         programming;graph processing;runtime framework}, 
 333 |     doi={10.1109/TPDS.2013.111}, 
 334 |     ISSN={1045-9219}, 
 335 |     month={June},
 336 | }
 337 | 
 338 | @INPROCEEDINGS{bordawekar2016accelerating, 
 339 |     author={R. R. Bordawekar and M. Sadoghi}, 
 340 |     booktitle={2016 IEEE 32nd International Conference on Data Engineering (ICDE)}, 
 341 |     title={Accelerating database workloads by software-hardware-system co-design}, 
 342 |     year={2016}, 
 343 |     pages={1428--1431},
 344 |     keywords={SQL;business data processing;field programmable gate arrays;graphics processing units;hardware-software codesign;query processing;relational databases;FPGA;GPU;NoSQL database;data stream management system;database workload acceleration;enterprise data management workload;field-programmable gate array;graphics processing unit;query execution pipeline;relational database;software-hardware-system codesign;system-level
 345 |         characterization;Acceleration;Computer architecture;Databases;Field programmable gate arrays;Graphics processing units;Hardware;Programming}, 
 346 |     doi={10.1109/ICDE.2016.7498362}, 
 347 |     month={May},
 348 | }
 349 | 
 350 | @inproceedings{Guo2015TowardsAR,
 351 |     title={Towards Adaptive Resource Allocation for Database Workloads},
 352 |     author={Cong Guo and Martin Karsten},
 353 |     booktitle={ADMS@VLDB},
 354 |     year={2015}
 355 | }
 356 | 
 357 | @inproceedings{chen1996efficient,
 358 |     author = {Chen, S. and Reif, J. H.},
 359 |     title = {Efficient Lossless Compression of Trees and Graphs},
 360 |     booktitle = {Proceedings of the Conference on Data Compression},
 361 |     series = {DCC '96},
 362 |     year = {1996},
 363 |     isbn = {0-8186-7358-3},
 364 |     pages = {428--},
 365 |     url = {http://dl.acm.org/citation.cfm?id=789084.789454},
 366 |     acmid = {789454},
 367 |     publisher = {IEEE Computer Society},
 368 |     address = {Washington, DC, USA},
 369 | } 
 370 | 
 371 | @inproceedings{paul2016gpl,
 372 |     author = {Paul, Johns and He, Jiong and He, Bingsheng},
 373 |     title = {GPL: A GPU-based Pipelined Query Processing Engine},
 374 |     booktitle = {Proceedings of the 2016 International Conference on Management of Data},
 375 |     series = {SIGMOD '16},
 376 |     year = {2016},
 377 |     isbn = {978-1-4503-3531-7},
 378 |     location = {San Francisco, California, USA},
 379 |     pages = {1935--1950},
 380 |     numpages = {16},
 381 |     url = {http://doi.acm.org/10.1145/2882903.2915224},
 382 |     doi = {10.1145/2882903.2915224},
 383 |     acmid = {2915224},
 384 |     publisher = {ACM},
 385 |     address = {New York, NY, USA},
 386 |     keywords = {KBE, channel, pipelined execution, tiling},
 387 | } 
 388 | 
 389 | @inproceedings{casper2014hardware,
 390 |     author = {Casper, Jared and Olukotun, Kunle},
 391 |     title = {Hardware Acceleration of Database Operations},
 392 |     booktitle = {Proceedings of the 2014 ACM/SIGDA International Symposium on Field-programmable Gate Arrays},
 393 |     series = {FPGA '14},
 394 |     year = {2014},
 395 |     isbn = {978-1-4503-2671-1},
 396 |     location = {Monterey, California, USA},
 397 |     pages = {151--160},
 398 |     numpages = {10},
 399 |     url = {http://doi.acm.org/10.1145/2554688.2554787},
 400 |     doi = {10.1145/2554688.2554787},
 401 |     acmid = {2554787},
 402 |     publisher = {ACM},
 403 |     address = {New York, NY, USA},
 404 |     keywords = {database, fpga, hardware acceleration, join, sort},
 405 | } 
 406 | 
 407 | @inproceedings{Lang2016DataBH,
 408 |     title={Data Blocks: Hybrid OLTP and OLAP on Compressed Storage using both Vectorization and Compilation},
 409 |     author={Harald Lang and Tobias M\"{u}hlbauer and Florian Funke and Peter A. Boncz and Thomas Neumann and Alfons Kemper},
 410 |     booktitle={SIGMOD Conference},
 411 |     year={2016}
 412 | }
 413 | 
 414 | @inbook{Beckmann2013,
 415 |     author={Beckmann, Andreas and Meyer, Ulrich and Veith, David},
 416 |     editor={Bodlaender, Hans L. and Italiano, Giuseppe F.},
 417 |     title={An Implementation of I/O-Efficient Dynamic Breadth-First Search Using Level-Aligned Hierarchical Clustering},
 418 |     bookTitle={Algorithms -- ESA 2013: 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings},
 419 |     year={2013},
 420 |     publisher={Springer Berlin Heidelberg},
 421 |     address={Berlin, Heidelberg},
 422 |     pages={121--132},
 423 |     isbn={978-3-642-40450-4},
 424 |     doi={10.1007/978-3-642-40450-4_11},
 425 |     url={http://dx.doi.org/10.1007/978-3-642-40450-4_11}
 426 | }
 427 | 
 428 | @article{maneth2015survey,
 429 |     author    = {Sebastian Maneth and Fabian Peternek},
 430 |     title     = {A Survey on Methods and Systems for Graph Compression},
 431 |     journal   = {CoRR},
 432 |     volume    = {abs/1504.00616},
 433 |     year      = {2015},
 434 |     url       = {http://arxiv.org/abs/1504.00616},
 435 |     timestamp = {Sat, 02 May 2015 17:50:32 +0200},
 436 |     biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/ManethP15},
 437 |     bibsource = {dblp computer science bibliography, http://dblp.org}
 438 | }
 439 | 
 440 | @article{He2011high,
 441 |     author = {He, Bingsheng and Yu, Jeffrey Xu},
 442 |     title = {High-throughput Transaction Executions on Graphics Processors},
 443 |     journal = {Proc. VLDB Endow.},
 444 |     issue_date = {February 2011},
 445 |     volume = {4},
 446 |     number = {5},
 447 |     month = feb,
 448 |     year = {2011},
 449 |     issn = {2150-8097},
 450 |     pages = {314--325},
 451 |     numpages = {12},
 452 |     url = {http://dx.doi.org/10.14778/1952376.1952381},
 453 |     doi = {10.14778/1952376.1952381},
 454 |     acmid = {1952381},
 455 |     publisher = {VLDB Endowment},
 456 | } 
 457 | 
 458 | @article{Mittal2016survey,
 459 |     author = {Mittal, Sparsh and Vetter, Jeffrey S.},
 460 |     title = {A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems},
 461 |     journal = {IEEE Trans. Parallel Distrib. Syst.},
 462 |     issue_date = {May 2016},
 463 |     volume = {27},
 464 |     number = {5},
 465 |     month = may,
 466 |     year = {2016},
 467 |     issn = {1045-9219},
 468 |     pages = {1524--1536},
 469 |     numpages = {13},
 470 |     url = {http://dx.doi.org/10.1109/TPDS.2015.2435788},
 471 |     doi = {10.1109/TPDS.2015.2435788},
 472 |     acmid = {2927579},
 473 |     publisher = {IEEE Press},
 474 |     address = {Piscataway, NJ, USA},
 475 | } 
 476 | 
 477 | @article{Sukhwani2014database,
 478 |     author = {Sukhwani, Bharat and Min, Hong and Thoennes, Mathew and Dube, Parijat and Brezzo, Bernard and Asaad, Sameh and Dillenberger, Donna Eng},
 479 |     title = {Database Analytics: A Reconfigurable-Computing Approach},
 480 |     journal = {IEEE Micro},
 481 |     volume = {34},
 482 |     number = {1},
 483 |     issn = {0272-1732},
 484 |     year = {2014},
 485 |     pages = {19--29},
 486 |     doi = {10.1109/MM.2013.107},
 487 |     publisher = {IEEE Computer Society},
 488 |     address = {Los Alamitos, CA, USA},
 489 | }
 490 | 
 491 | @inproceedings{Chen2016study,
 492 |     author = {Chen, Shuang and Jiang, Shunning and He, Bingsheng and Tang, Xueyan},
 493 |     title = {A Study of Sorting Algorithms on Approximate Memory},
 494 |     booktitle = {Proceedings of the 2016 International Conference on Management of Data},
 495 |     series = {SIGMOD '16},
 496 |     year = {2016},
 497 |     isbn = {978-1-4503-3531-7},
 498 |     location = {San Francisco, California, USA},
 499 |     pages = {647--662},
 500 |     numpages = {16},
 501 |     url = {http://doi.acm.org/10.1145/2882903.2882908},
 502 |     doi = {10.1145/2882903.2882908},
 503 |     acmid = {2882908},
 504 |     publisher = {ACM},
 505 |     address = {New York, NY, USA},
 506 |     keywords = {approximate storage, database, hybrid storage, phase change memory, sorting algorithms},
 507 | } 
 508 | 
 509 | @inproceedings{Wang2016accelerating,
 510 |     author = {Wang, Zeke and Cheah, Huiyan and Paul, Johns and He, Bingsheng and Zhang, Wei},
 511 |     title = {Accelerating Database Query Processing on OpenCL-based FPGAs (Abstract Only)},
 512 |     booktitle = {Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
 513 |     series = {FPGA '16},
 514 |     year = {2016},
 515 |     isbn = {978-1-4503-3856-1},
 516 |     location = {Monterey, California, USA},
 517 |     pages = {274--274},
 518 |     numpages = {1},
 519 |     url = {http://doi.acm.org/10.1145/2847263.2847295},
 520 |     doi = {10.1145/2847263.2847295},
 521 |     acmid = {2847295},
 522 |     publisher = {ACM},
 523 |     address = {New York, NY, USA},
 524 |     keywords = {fpga, opencl, query processing},
 525 | } 
 526 | 
 527 | @INPROCEEDINGS{quoc2014automated, 
 528 |     author={C. Pham-Quoc and Z. Al-Ars and K. Bertels}, 
 529 |     booktitle={Parallel \& Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International},
 530 |     title={Automated Hybrid Interconnect Design for FPGA Accelerators Using Data Communication Profiling}, 
 531 |     year={2014}, 
 532 |     pages={151--160},
 533 |     keywords={data communication;field programmable gate arrays;interconnections;network-on-chip;shared memory systems;FPGA bus-based accelerator system;NoC;adaptive mapping function;automated hybrid interconnect design;communication behavior;custom interconnect design algorithm;data communication profiling;energy reduction;kernels;quantitative communication profiling;shared local memory solution;Algorithm design and analysis;Computer architecture;DH-HEMTs;Data
 534 |         communication;Field programmable gate arrays;Kernel;Optimization;FPGA-based accelerator;communication profiling;custom interconnect}, 
 535 |     doi={10.1109/IPDPSW.2014.21}, 
 536 |     month={May},
 537 | }
 538 | 
 539 | @inproceedings{Castellana2015HLS,
 540 |     author = {Castellana, Vito Giovanni and Minutoli, Marco and Morari, Alessandro and Tumeo, Antonino and Lattuada, Marco and Ferrandi, Fabrizio},
 541 |     title = {High Level Synthesis of RDF Queries for Graph Analytics},
 542 |     booktitle = {Proceedings of the IEEE/ACM International Conference on Computer-Aided Design},
 543 |     series = {ICCAD '15},
 544 |     year = {2015},
 545 |     isbn = {978-1-4673-8389-9},
 546 |     location = {Austin, TX, USA},
 547 |     pages = {323--330},
 548 |     numpages = {8},
 549 |     url = {http://dl.acm.org/citation.cfm?id=2840819.2840865},
 550 |     acmid = {2840865},
 551 |     publisher = {IEEE Press},
 552 |     address = {Piscataway, NJ, USA},
 553 | } 
 554 | 
 555 | @inproceedings{Aridhi2016bladyg,
 556 |     author = {Aridhi, Sabeur and Montresor, Alberto and Velegrakis, Yannis},
 557 |     title = {BLADYG: A Novel Block-Centric Framework for the Analysis of Large Dynamic Graphs},
 558 |     booktitle = {Proceedings of the ACM Workshop on High Performance Graph Processing},
 559 |     series = {HPGP '16},
 560 |     year = {2016},
 561 |     isbn = {978-1-4503-4350-3},
 562 |     location = {Kyoto, Japan},
 563 |     pages = {39--42},
 564 |     numpages = {4},
 565 |     url = {http://doi.acm.org/10.1145/2915516.2915525},
 566 |     doi = {10.1145/2915516.2915525},
 567 |     acmid = {2915525},
 568 |     publisher = {ACM},
 569 |     address = {New York, NY, USA},
 570 |     keywords = {akka framework, distributed graph processing, dynamic graphs},
 571 | } 
 572 | 
 573 | 
 574 | @article{Chi2015NXgraph,
 575 |     author    = {Yuze Chi and Guohao Dai and Yu Wang and Guangyu Sun and Guoliang Li and Huazhong Yang},
 576 |     title     = {NXgraph: An Efficient Graph Processing System on a Single Machine},
 577 |     journal   = {CoRR},
 578 |     volume    = {abs/1510.06916},
 579 |     year      = {2015},
 580 |     url       = {http://arxiv.org/abs/1510.06916},
 581 |     timestamp = {Wed, 03 Aug 2016 14:57:48 +0200},
 582 |     biburl    = {http://dblp.uni-trier.de/rec/bib/journals/corr/ChiDWSLY15},
 583 |     bibsource = {dblp computer science bibliography, http://dblp.org}
 584 | }
 585 | 
 586 | @inproceedings{Ahn2015pim,
 587 |     author = {Ahn, Junwhan and Hong, Sungpack and Yoo, Sungjoo and Mutlu, Onur and Choi, Kiyoung},
 588 |     title = {A Scalable Processing-in-memory Accelerator for Parallel Graph Processing},
 589 |     booktitle = {Proceedings of the 42nd Annual International Symposium on Computer Architecture},
 590 |     series = {ISCA '15},
 591 |     year = {2015},
 592 |     isbn = {978-1-4503-3402-0},
 593 |     location = {Portland, Oregon},
 594 |     pages = {105--117},
 595 |     numpages = {13},
 596 |     url = {http://doi.acm.org/10.1145/2749469.2750386},
 597 |     doi = {10.1145/2749469.2750386},
 598 |     acmid = {2750386},
 599 |     publisher = {ACM},
 600 |     address = {New York, NY, USA},
 601 | }
 602 | 
 603 | @inproceedings{sengupta2016graphin,
 604 |     title={GraphIn: An online high performance incremental graph processing framework},
 605 |     author={Sengupta, Dipanjan and Sundaram, Narayanan and Zhu, Xia and Willke, Theodore L and Young, Jeffrey and Wolf, Matthew and Schwan, Karsten},
 606 |     booktitle={European Conference on Parallel Processing},
 607 |     pages={319--333},
 608 |     year={2016},
 609 |     organization={Springer}
 610 | }
 611 | 
 612 | @inproceedings{niu2015eureca,
 613 |     title={EURECA: On-chip configuration generation for effective dynamic data access},
 614 |     author={Niu, Xinyu and Luk, Wayne and Wang, Yu},
 615 |     booktitle={Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
 616 |     pages={74--83},
 617 |     year={2015},
 618 |     organization={ACM}
 619 | }
 620 | 
 621 | @article{xirogiannopoulos2015graphgen,
 622 |     title={GraphGen: exploring interesting graphs in relational data},
 623 |     author={Xirogiannopoulos, Konstantinos and Khurana, Udayan and Deshpande, Amol},
 624 |     journal={Proceedings of the VLDB Endowment},
 625 |     volume={8},
 626 |     number={12},
 627 |     pages={2032--2035},
 628 |     year={2015},
 629 |     publisher={VLDB Endowment}
 630 | }
 631 | 
 632 | @inproceedings{khayyat2013mizan,
 633 |     title={Mizan: a system for dynamic load balancing in large-scale graph processing},
 634 |     author={Khayyat, Zuhair and Awara, Karim and Alonazi, Amani and Jamjoom, Hani and Williams, Dan and Kalnis, Panos},
 635 |     booktitle={Proceedings of the 8th ACM European Conference on Computer Systems},
 636 |     pages={169--182},
 637 |     year={2013},
 638 |     organization={ACM}
 639 | }
 640 | 
 641 | @article{doekemeijer2014survey,
 642 |     title={A survey of parallel graph processing frameworks},
 643 |     author={Doekemeijer, Niels and Varbanescu, Ana Lucia},
 644 |     journal={Delft University of Technology},
 645 |     year={2014}
 646 | }
 647 | 
 648 | @article{steinbauer2016dynamograph-journal,
 649 |     title={DynamoGraph: extending the Pregel paradigm for large-scale temporal graph processing},
 650 |     author={Steinbauer, Matthias and Anderst-Kotsis, Gabriele},
 651 |     journal={International Journal of Grid and Utility Computing},
 652 |     volume={7},
 653 |     number={2},
 654 |     pages={141--151},
 655 |     year={2016},
 656 |     publisher={Inderscience Publishers (IEL)}
 657 | }
 658 | 
 659 | @inproceedings{steinbauer2016dynamograph-conf,
 660 |     title={DynamoGraph: A Distributed System for Large-scale, Temporal Graph Processing, its Implementation and First Observations},
 661 |     author={Steinbauer, Matthias and Anderst-Kotsis, Gabriele},
 662 |     booktitle={Proceedings of the 25th International Conference Companion on World Wide Web},
 663 |     pages={861--866},
 664 |     year={2016},
 665 |     organization={International World Wide Web Conferences Steering Committee}
 666 | }
 667 | 
 668 | @inproceedings{Iyer2016time-evolving,
 669 |     author = {Iyer, Anand Padmanabha and Li, Li Erran and Das, Tathagata and Stoica, Ion},
 670 |     title = {Time-evolving Graph Processing at Scale},
 671 |     booktitle = {Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems},
 672 |     series = {GRADES '16},
 673 |     year = {2016},
 674 |     isbn = {978-1-4503-4780-8},
 675 |     location = {Redwood Shores, California},
 676 |     pages = {5:1--5:6},
 677 |     articleno = {5},
 678 |     numpages = {6},
 679 |     url = {http://doi.acm.org/10.1145/2960414.2960419},
 680 |     doi = {10.1145/2960414.2960419},
 681 |     acmid = {2960419},
 682 |     publisher = {ACM},
 683 |     address = {New York, NY, USA},
 684 | } 
 685 | 
 686 | @article{Shun2013ligra,
 687 |     author = {Shun, Julian and Blelloch, Guy E.},
 688 |     title = {Ligra: A Lightweight Graph Processing Framework for Shared Memory},
 689 |     journal = {SIGPLAN Not.},
 690 |     issue_date = {August 2013},
 691 |     volume = {48},
 692 |     number = {8},
 693 |     month = feb,
 694 |     year = {2013},
 695 |     issn = {0362-1340},
 696 |     pages = {135--146},
 697 |     numpages = {12},
 698 |     url = {http://doi.acm.org/10.1145/2517327.2442530},
 699 |     doi = {10.1145/2517327.2442530},
 700 |     acmid = {2442530},
 701 |     publisher = {ACM},
 702 |     address = {New York, NY, USA},
 703 |     keywords = {graph algorithms, parallel programming, shared memory},
 704 | } 
 705 | 
 706 | @inproceedings{Shun2015ligra+,
 707 |     author = {Shun, Julian and Dhulipala, Laxman and Blelloch, Guy E.},
 708 |     title = {Smaller and Faster: Parallel Processing of Compressed Graphs with Ligra+},
 709 |     booktitle = {Proceedings of the 2015 Data Compression Conference},
 710 |     series = {DCC '15},
 711 |     year = {2015},
 712 |     isbn = {978-1-4799-8430-5},
 713 |     pages = {403--412},
 714 |     numpages = {10},
 715 |     url = {http://dx.doi.org/10.1109/DCC.2015.8},
 716 |     doi = {10.1109/DCC.2015.8},
 717 |     acmid = {2860198},
 718 |     publisher = {IEEE Computer Society},
 719 |     address = {Washington, DC, USA},
 720 |     keywords = {Graph compression, Parallel algorithms},
 721 | } 
 722 | 
 723 | @article{Shun2016parallel,
 724 |     author    = {Julian Shun and
 725 |         Farbod Roosta{-}Khorasani and
 726 |             Kimon Fountoulakis and
 727 |             Michael W. Mahoney},
 728 |     title     = {Parallel Local Graph Clustering},
 729 |     journal   = {CoRR},
 730 |     volume    = {abs/1604.07515},
 731 |     year      = {2016},
 732 |     url       = {http://arxiv.org/abs/1604.07515},
 733 |     timestamp = {Mon, 02 May 2016 18:22:52 +0200},
 734 |     biburl    =
 735 |     {http://dblp.uni-trier.de/rec/bib/journals/corr/ShunRFM16},
 736 |     bibsource = {dblp computer science bibliography,
 737 |         http://dblp.org}
 738 | }
 739 | 
 740 | @inproceedings{wang2016gunrock,
 741 |     title={Gunrock: A high-performance graph processing library on the GPU},
 742 |     author={Wang, Yangzihao and Davidson, Andrew and Pan, Yuechao and Wu,
 743 |         Yuduo and Riffel, Andy and Owens, John D},
 744 |     booktitle={Proceedings of the 21st ACM SIGPLAN Symposium on Principles
 745 |         and Practice of Parallel Programming},
 746 |     pages={11},
 747 |     year={2016},
 748 |     organization={ACM}
 749 | }
 750 | 
 751 | @inproceedings{Davidson2014work-efficient,
 752 |     author = {Davidson, Andrew and Baxter, Sean and Garland, Michael and Owens,
 753 |         John D.},
 754 |     title = {Work-Efficient Parallel GPU Methods for Single-Source Shortest
 755 |         Paths},
 756 |     booktitle = {Proceedings of the 2014 IEEE 28th International Parallel and
 757 |         Distributed Processing Symposium},
 758 |     series = {IPDPS '14},
 759 |     year = {2014},
 760 |     isbn = {978-1-4799-3800-1},
 761 |     pages = {349--359},
 762 |     numpages = {11},
 763 |     url = {http://dx.doi.org/10.1109/IPDPS.2014.45},
 764 |     doi = {10.1109/IPDPS.2014.45},
 765 |     acmid = {2650649},
 766 |     publisher = {IEEE Computer Society},
 767 |     address = {Washington, DC, USA},
 768 |     keywords = {GPU computing, graph traversal, single-source
 769 |         shortest paths, sparse graphs},
 770 | } 
 771 | 
 772 | @article{singh2016modified,
    title={Modified {Dijkstra's} Algorithm for Dense Graphs on {GPU} using {CUDA}},
 774 |     author={Singh, Dhirendra Pratap and Khare, Nilay},
 775 |     journal={Indian Journal of Science and Technology},
 776 |     volume={9},
 777 |     number={33},
 778 |     year={2016}
 779 | }
 780 | 
 781 | @article{singh2016efficient,
 782 |     title={Efficient Parallel Implementation of Single Source Shortest Path
 783 |         Algorithm on GPU Using CUDA},
 784 |     author={Singh, Dhirendra Pratap and Khare, Nilay and Rasool, Akhtar},
 785 |     journal={International Journal of Applied Engineering Research},
 786 |     volume={11},
 787 |     number={4},
 788 |     pages={2560--2567},
 789 |     year={2016}
 790 | }
 791 | 
 792 | @article{delling2013phast,
    title={{PHAST}: Hardware-accelerated shortest path trees},
 794 |     author={Delling, Daniel and Goldberg, Andrew V and Nowatzyk, Andreas and
 795 |         Werneck, Renato F},
 796 |     journal={Journal of Parallel and Distributed Computing},
 797 |     volume={73},
 798 |     number={7},
 799 |     pages={940--952},
 800 |     year={2013},
 801 |     publisher={Elsevier}
 802 | }
 803 | 
 804 | @article{meyer2003delta,
 805 |     title={$\Delta$-stepping: a parallelizable shortest path algorithm},
 806 |     author={Meyer, Ulrich and Sanders, Peter},
 807 |     journal={Journal of Algorithms},
 808 |     volume={49},
 809 |     number={1},
 810 |     pages={114--152},
 811 |     year={2003},
 812 |     publisher={Elsevier}
 813 | }
 814 | 
 815 | @inproceedings{merrill2012scalable,
 816 |     title={Scalable GPU graph traversal},
 817 |     author={Merrill, Duane and Garland, Michael and Grimshaw, Andrew},
 818 |     booktitle={ACM SIGPLAN Notices},
 819 |     volume={47},
 820 |     number={8},
 821 |     pages={117--128},
 822 |     year={2012},
 823 |     organization={ACM}
 824 | }
 825 | 
 826 | @ARTICLE{Castellana2015in-memory, 
 827 |     author={V. G. Castellana and A. Morari and J. Weaver and A. Tumeo and D.
 828 |         Haglin and O. Villa and J. Feo}, 
 829 |     journal={Computer}, 
 830 |     title={In-Memory Graph Databases for Web-Scale Data}, 
 831 |     year={2015}, 
 832 |     volume={48}, 
 833 |     number={3}, 
    pages={24--35},
 835 |     keywords={Internet;database management systems;graph theory;pattern
 836 |         clustering;Web-scale data;commodity clusters;graph-based
 837 |             methods;heterogeneous data;inmemory graph databases;scalable
 838 |             resource description framework databases;software stack;Algorithm
 839 |             design and analysis;Clustering algorithms;Data structures;Pattern
 840 |             matching;Resource description framework;Resource management;Software
 841 |             development;RDF databases;SPARQL;big data;graph
 842 |             databases;high-performance computing;multithreading}, 
 843 |     doi={10.1109/MC.2015.74}, 
 844 |     ISSN={0018-9162}, 
 845 |     month={Mar},
 846 | }
 847 | 
 848 | @inproceedings{zeng2013distributed,
 849 |     title={A distributed graph engine for web scale RDF data},
 850 |     author={Zeng, Kai and Yang, Jiacheng and Wang, Haixun and Shao, Bin and
 851 |         Wang, Zhongyuan},
 852 |     booktitle={Proceedings of the VLDB Endowment},
 853 |     volume={6},
 854 |     number={4},
 855 |     pages={265--276},
 856 |     year={2013},
 857 |     organization={VLDB Endowment}
 858 | }
 859 | 
 860 | @misc{microsoft2013trinity,
 861 |     author = {Microsoft},
 862 |     title = {{Trinity}},
 863 |     howpublished =
 864 |         "\url{https://www.microsoft.com/en-us/research/project/trinity/}",
 865 |     year = {2013}, 
 866 |     note = "[Online; accessed 10-November-2016]"
 867 | }
 868 | 
 869 | @inproceedings{kocberber2013meet,
 870 |     title={Meet the walkers: Accelerating index traversals for in-memory
 871 |         databases},
 872 |     author={Kocberber, Onur and Grot, Boris and Picorel, Javier and Falsafi,
 873 |         Babak and Lim, Kevin and Ranganathan, Parthasarathy},
 874 |     booktitle={Proceedings of the 46th Annual IEEE/ACM International
 875 |         Symposium on Microarchitecture},
 876 |     pages={468--479},
 877 |     year={2013},
 878 |     organization={ACM}
 879 | }
 880 | 
 881 | @inproceedings{shi2016fast,
 882 |     author    = {Jiaxin Shi and Youyang Yao and Rong Chen and Haibo Chen and Feifei Li},
 883 |     title     = {Fast and Concurrent {RDF} Queries with RDMA-Based Distributed Graph Exploration},
 884 |     booktitle = {12th {USENIX} Symposium on Operating Systems Design and
 885 |         Implementation,
 886 |         {OSDI} 2016, Savannah, GA, USA, November 2-4,
 887 |         2016.},
 888 |     pages     = {317--332},
 889 |     year      = {2016},
 890 |     url       =
 891 |     {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/shi},
 892 |     timestamp = {Tue, 08 Nov 2016 07:18:04 +0100},
 893 |     biburl    =
 894 |     {http://dblp.uni-trier.de/rec/bib/conf/osdi/ShiYCCL16},
 895 |     bibsource = {dblp computer science bibliography,
 896 |         http://dblp.org}
 897 | }
 898 | 
 899 | @inproceedings{chen2015powerlyra,
    title={{PowerLyra}: Differentiated graph computation and partitioning on
 901 |         skewed graphs},
 902 |     author={Chen, Rong and Shi, Jiaxin and Chen, Yanzhe and Chen, Haibo},
 903 |     booktitle={Proceedings of the Tenth European Conference on Computer
 904 |         Systems},
 905 |     pages={1},
 906 |     year={2015},
 907 |     organization={ACM}
 908 | }
 909 | 
 910 | @inproceedings{fu2014mapgraph,
    title={{MapGraph}: A high level {API} for fast development of high performance
 912 |         graph analytics on GPUs},
 913 |     author={Fu, Zhisong and Personick, Michael and Thompson, Bryan},
 914 |     booktitle={Proceedings of Workshop on GRAph Data management
 915 |         Experiences and Systems},
 916 |     pages={1--6},
 917 |     year={2014},
 918 |     organization={ACM}
 919 | }
 920 | 
 921 | @inproceedings{dorrance2014scalable,
 922 |     title={A scalable sparse matrix-vector multiplication kernel for
 923 |         energy-efficient sparse-blas on FPGAs},
 924 |     author={Dorrance, Richard and Ren, Fengbo and Markovi{\'c}, Dejan},
 925 |     booktitle={Proceedings of the 2014 ACM/SIGDA international symposium
 926 |         on Field-programmable gate arrays},
 927 |     pages={161--170},
 928 |     year={2014},
 929 |     organization={ACM}
 930 | }
 931 | 
@inproceedings{Khorasani2014CuSha,
    author = {Khorasani, Farzad and Vora, Keval and Gupta, Rajiv and Bhuyan,
        Laxmi N.},
    title = {CuSha: Vertex-centric Graph Processing on GPUs},
    booktitle = {Proceedings of the 23rd International Symposium on
        High-performance Parallel and Distributed Computing},
    series = {HPDC '14},
    year = {2014},
    isbn = {978-1-4503-2749-7},
    location = {Vancouver, BC, Canada},
    pages = {239--252},
    numpages = {14},
    url = {http://doi.acm.org/10.1145/2600212.2600227},
    doi = {10.1145/2600212.2600227},
    acmid = {2600227},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {coalesced memory accesses, concatenated windows,
        g-shards, gpu, graph representation},
}
 968 | 
 969 | @inproceedings{wu2013complexity,
 970 |     author = {Wu, Bo and Zhao, Zhijia and Zhang, Eddy Zheng and Jiang, Yunlian
 971 |         and Shen, Xipeng},
 972 |     title = {Complexity Analysis and Algorithm Design for Reorganizing Data to
 973 |         Minimize Non-coalesced Memory Accesses on GPU},
 974 |     booktitle = {Proceedings of the 18th ACM SIGPLAN Symposium on Principles
 975 |         and Practice of Parallel Programming},
 976 |     series = {PPoPP '13},
 977 |     year = {2013},
 978 |     isbn = {978-1-4503-1922-5},
 979 |     location = {Shenzhen, China},
 980 |     pages = {57--68},
 981 |     numpages = {12},
 982 |     url = {http://doi.acm.org/10.1145/2442516.2442523},
 983 |     doi = {10.1145/2442516.2442523},
 984 |     acmid = {2442523},
 985 |     publisher = {ACM},
 986 |     address = {New York, NY, USA},
 987 |     keywords = {computational complexity, data transformation,
 988 |         gpgpu, memory coalescing, runtime optimizations,
 989 |         thread-data remapping},
 991 | } 
 992 | 
@article{Zhang2011elimination,
    author = {Zhang, Eddy Z. and Jiang, Yunlian and Guo, Ziyu and Tian, Kai and
        Shen, Xipeng},
    title = {On-the-fly Elimination of Dynamic Irregularities for GPU
        Computing},
    journal = {SIGPLAN Not.},
    issue_date = {March 2011},
    volume = {46},
    number = {3},
    month = mar,
    year = {2011},
    issn = {0362-1340},
    pages = {369--380},
    numpages = {12},
    url = {http://doi.acm.org/10.1145/1961296.1950408},
    doi = {10.1145/1961296.1950408},
    acmid = {1950408},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {cpu-gpu pipelining, data transformation, gpgpu,
        memory coalescing, thread data remapping, thread
        divergence},
}
1034 | 
@inproceedings{cheng2012kineograph,
    title={Kineograph: taking the pulse of a fast-changing and connected
        world},
    author={Cheng, Raymond and Hong, Ji and Kyrola, Aapo and Miao, Youshan
        and Weng, Xuetian and Wu, Ming and Yang, Fan and Zhou, Lidong and
        Zhao, Feng and Chen, Enhong},
    booktitle={Proceedings of the 7th ACM European Conference on Computer
        Systems},
    pages={85--98},
    year={2012},
    organization={ACM}
}
1054 | 
1055 | @inproceedings{hong2011accelerating,
1056 |     title={Accelerating CUDA graph algorithms at maximum warp},
1057 |     author={Hong, Sungpack and Kim, Sang Kyun and Oguntebi, Tayo and
1058 |         Olukotun, Kunle},
1059 |     booktitle={ACM SIGPLAN Notices},
1060 |     volume={46},
1061 |     number={8},
1062 |     pages={267--276},
1063 |     year={2011},
1064 |     organization={ACM}
1065 | }
1078 | 
1079 | @inproceedings{morari2015high,
1080 |     title={High-Performance, Distributed Dictionary Encoding of RDF Datasets},
1081 |     author={Morari, Alessandro and Weaver, Jesse and Villa, Oreste and
1082 |         Haglin, David and Tumeo, Antonino and Castellana, Vito Giovanni and
1083 |             Feo, John},
1084 |     booktitle={2015 IEEE International Conference on Cluster Computing},
1085 |     pages={250--253},
1086 |     year={2015},
1087 |     organization={IEEE}
1088 | }
1089 | 
1090 | @article{lin2016data,
1091 |     title={Data Compression for Analytics over Large-scale In-memory Column
1092 |         Databases},
1093 |     author={Lin, Chunbin and Wang, Jianguo and Papakonstantinou, Yannis},
1094 |     journal={arXiv preprint arXiv:1606.09315},
1095 |     year={2016}
1096 | }
1097 | 
1098 | @article{ousterhout2015ramcloud,
    title={The {RAMCloud} storage system},
1100 |     author={Ousterhout, John and Gopalan, Arjun and Gupta, Ashish and
1101 |         Kejriwal, Ankita and Lee, Collin and Montazeri, Behnam and Ongaro,
1102 |         Diego and Park, Seo Jin and Qin, Henry and Rosenblum, Mendel and
1103 |             others},
1104 |     journal={ACM Transactions on Computer Systems (TOCS)},
1105 |     volume={33},
1106 |     number={3},
1107 |     pages={7},
1108 |     year={2015},
1109 |     publisher={ACM}
1110 | }
1111 | 
1112 | @article{jamro2015algorithms,
1113 |     title={The algorithms for FPGA implementation of sparse matrices
1114 |         multiplication},
1115 |     author={Jamro, Ernest and Pabi{\'s}, Tomasz and Russek, Pawe{\l} and
1116 |         Wiatr, Kazimierz},
1117 |     journal={Computing and Informatics},
1118 |     volume={33},
1119 |     number={3},
1120 |     pages={667--684},
1121 |     year={2015}
1122 | }
1123 | 
1124 | @misc{morari2015gems,
1125 |     title={GEMS: Graph Database Engine for Multithreaded Systems.},
1126 |     author={Morari, Alessandro and Castellana, Vito Giovanni and Villa,
1127 |         Oreste and Weaver, Jesse and Williams, Gregory Todd and Haglin,
1128 |         David J and Tumeo, Antonino and Feo, John},
1129 |     year={2015}
1130 | }
1131 | 
1132 | @inproceedings{geisberger2008contraction,
1133 |     title={Contraction hierarchies: Faster and simpler hierarchical routing in
1134 |         road networks},
1135 |     author={Geisberger, Robert and Sanders, Peter and Schultes, Dominik and
1136 |         Delling, Daniel},
1137 |     booktitle={International Workshop on Experimental and Efficient
1138 |         Algorithms},
1139 |     pages={319--333},
1140 |     year={2008},
1141 |     organization={Springer}
1142 | }
1143 | 
1144 | @article{kapre2015case,
1145 |     author = {Nachiket Kapre and Pradeep Moorthy},
1146 |     title = {A Case for Embedded FPGA-based SoCs in Energy-Efficient
1147 |         Acceleration of Graph Problems},
1148 |     journal = {Supercomputing frontiers and innovations},
1149 |     volume = {2},
1150 |     number = {3},
1151 |     year = {2015},
    abstract = {Sparse graph problems are notoriously hard to accelerate on
        conventional platforms due to irregular memory access patterns
        resulting in underutilization of memory bandwidth. These bottlenecks
        on traditional x86-based systems mean that sparse graph problems
        scale very poorly, both in terms of performance and power
        efficiency. A cluster of embedded SoCs (systems-on-chip) with
        closely-coupled FPGA accelerators can support distributed memory
        accesses with better matched low-power processing. We first conduct
        preliminary experiments across a range of COTS (commercial
        off-the-shelf) embedded SoCs to establish promise for
        energy-efficiency acceleration of sparse problems. We select the
        Xilinx Zynq SoC with FPGA accelerators to construct a prototype
        32-node Beowulf cluster. We develop specialized MPI routines and
        memory DMA offload engines to support irregular communication
        efficiently. In this setup, we use the ARM processor as a data
        marshaller for local DMA traffic as well as remote MPI traffic while
        the FPGA may be used as a programmable accelerator. Across a set of
        benchmark graphs, we show that 32-node embedded SoC cluster can
        exceed the energy efficiency of an Intel E5-2407 by as much as 1.7×
        at a total graph processing capacity of 91–95 MTEPS for graphs as
        large as 32 million nodes and edges.},
    issn = {2313-8734},
    url = {http://superfri.org/superfri/article/view/62}
}
1198 | 
1199 | @INPROCEEDINGS{zhou2016high, 
1200 |     author={S. Zhou and C. Chelmis and V. K. Prasanna}, 
1201 |     booktitle={2016 IEEE 24th Annual International Symposium on
1202 |         Field-Programmable Custom Computing Machines (FCCM)}, 
1203 |     title={High-Throughput and Energy-Efficient Graph Processing on FPGA}, 
1204 |     year={2016}, 
    pages={103--110},
1206 |     keywords={DRAM chips;energy conservation;field programmable gate
1207 |         arrays;logic design;parallel architectures;performance evaluation;power
1208 |             aware computing;trees (mathematics);FPGA design;MTEPS;concurrent
1209 |             multiple input data processing;data layout;edge-centric
1210 |             computing;efficient memory activation schedule;energy-efficiency
1211 |             improvement;energy-efficient graph processing;external memory
1212 |             bandwidth saturation;external memory performance
1213 |             optimization;high-throughput graph processing;large-scale graph
1214 |             processing design;million traversed edges per second;minimum
1215 |             spanning tree algorithm;on-chip memory power consumption
1216 |             reduction;parallel architecture;single-source shortest path
1217 |             algorithm;throughput improvement;weakly connected component
1218 |             algorithm;Field programmable gate arrays;Layout;Memory
1219 |             management;Optimization;Random access memory;Throughput;Writing}, 
1220 |     doi={10.1109/FCCM.2016.35}, 
1221 |     month={May},
1222 | }
1223 | 
1224 | @inproceedings{sanders2008mobile,
1225 |     title={Mobile route planning},
1226 |     author={Sanders, Peter and Schultes, Dominik and Vetter, Christian},
1227 |     booktitle={European Symposium on Algorithms},
1228 |     pages={732--743},
1229 |     year={2008},
1230 |     organization={Springer}
1231 | }
1232 | 
1233 | @inproceedings{ho2015efficient,
1234 |     title={Efficient execution of memory access phases using dataflow
1235 |         specialization},
1236 |     author={Ho, Chen-Han and Kim, Sung Jin and Sankaralingam, Karthikeyan},
1237 |     booktitle={ACM SIGARCH Computer Architecture News},
1238 |     volume={43},
1239 |     number={3},
1240 |     pages={118--130},
1241 |     year={2015},
1242 |     organization={ACM}
1243 | }
1244 | 
1245 | @inproceedings{kumar2014sqrl,
1246 |     title={SQRL: hardware accelerator for collecting software data
1247 |         structures},
1248 |     author={Kumar, Snehasish and Shriraman, Arrvindh and Srinivasan,
1249 |         Vijayalakshmi and Lin, Dan and Phillips, Jordon},
1250 |     booktitle={Proceedings of the 23rd international conference on
1251 |         Parallel architectures and compilation},
1252 |     pages={475--476},
1253 |     year={2014},
1254 |     organization={ACM}
1255 | }
1256 | 
1257 | @article{schkufza2014stochastic,
1258 |     title={Stochastic optimization of floating-point programs with tunable
1259 |         precision},
1260 |     author={Schkufza, Eric and Sharma, Rahul and Aiken, Alex},
1261 |     journal={ACM SIGPLAN Notices},
1262 |     volume={49},
1263 |     number={6},
1264 |     pages={53--64},
1265 |     year={2014},
1266 |     publisher={ACM}
1267 | }
1268 | 
1269 | @article{yu2014staring,
1270 |     title={Staring into the abyss: An evaluation of concurrency control with
1271 |         one thousand cores},
1272 |     author={Yu, Xiangyao and Bezerra, George and Pavlo, Andrew and Devadas,
1273 |         Srinivas and Stonebraker, Michael},
1274 |     journal={Proceedings of the VLDB Endowment},
1275 |     volume={8},
1276 |     number={3},
1277 |     pages={209--220},
1278 |     year={2014},
1279 |     publisher={VLDB Endowment}
1280 | }
1281 | 
1282 | @inproceedings{giefers2016analyzing,
1283 |     title={Analyzing the energy-efficiency of sparse matrix multiplication on
1284 |         heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA},
1285 |     author={Giefers, Heiner and Staar, Peter and Bekas, Costas and
1286 |         Hagleitner, Christoph},
1287 |     booktitle={Performance Analysis of Systems and Software (ISPASS), 2016
1288 |         IEEE International Symposium on},
1289 |     pages={46--56},
1290 |     year={2016},
1291 |     organization={IEEE}
1292 | }
1293 | 
1294 | @inproceedings{zheng2015flashgraph,
1295 |     title={FlashGraph: Processing billion-node graphs on an array of commodity
1296 |         SSDs},
1297 |     author={Zheng, Da and Mhembere, Disa and Burns, Randal and Vogelstein,
1298 |         Joshua and Priebe, Carey E and Szalay, Alexander S},
1299 |     booktitle={13th USENIX Conference on File and Storage Technologies
1300 |         (FAST 15)},
1301 |     pages={45--58},
1302 |     year={2015}
1303 | }
1304 | 
1305 | @inproceedings{yuan2014fast,
1306 |     title={Fast iterative graph computation: A path centric approach},
1307 |     author={Yuan, Pingpeng and Zhang, Wenya and Xie, Changfeng and Jin, Hai
1308 |         and Liu, Ling and Lee, Kisung},
1309 |     booktitle={Proceedings of the International Conference for High
1310 |         Performance Computing, Networking, Storage and Analysis},
1311 |     pages={401--412},
1312 |     year={2014},
1313 |     organization={IEEE Press}
1314 | }
1315 | 
1316 | @article{najeebullah2014bpp,
1317 |     title={BPP: Large Graph Storage for Efficient Disk Based Processing},
1318 |     author={Najeebullah, Kamran and Khan, Kifayat Ullah and Nawaz, Waqas and
1319 |         Lee, Young-Koo},
1320 |     journal={arXiv preprint arXiv:1401.2327},
1321 |     year={2014}
1322 | }
1323 | 
1324 | @inproceedings{nilakant2014prefedge,
1325 |     title={PrefEdge: SSD prefetcher for large-scale graph traversal},
1326 |     author={Nilakant, Karthik and Dalibard, Valentin and Roy, Amitabha and
1327 |         Yoneki, Eiko},
1328 |     booktitle={Proceedings of International Conference on Systems and
1329 |         Storage},
1330 |     pages={1--12},
1331 |     year={2014},
1332 |     organization={ACM}
1333 | }
1334 | 
1335 | @inproceedings{fu2014prime,
1336 |     title={PriME: A parallel and distributed simulator for thousand-core
1337 |         chips},
1338 |     author={Fu, Yaosheng and Wentzlaff, David},
1339 |     booktitle={Performance Analysis of Systems and Software (ISPASS), 2014
1340 |         IEEE International Symposium on},
1341 |     pages={116--125},
1342 |     year={2014},
1343 |     organization={IEEE}
1344 | }
1345 | 
1346 | @inproceedings{miller2010graphite,
1347 |     title={Graphite: A distributed parallel simulator for multicores},
1348 |     author={Miller, Jason E and Kasture, Harshad and Kurian, George and
1349 |         Gruenwald, Charles and Beckmann, Nathan and Celio, Christopher and
1350 |             Eastep, Jonathan and Agarwal, Anant},
1351 |     booktitle={HPCA-16 2010 The Sixteenth International Symposium on
1352 |         High-Performance Computer Architecture},
1353 |     pages={1--12},
1354 |     year={2010},
1355 |     organization={IEEE}
1356 | }
1357 | 
1358 | @inproceedings{nguyen2013lightweight,
1359 |     title={A lightweight infrastructure for graph analytics},
1360 |     author={Nguyen, Donald and Lenharth, Andrew and Pingali, Keshav},
1361 |     booktitle={Proceedings of the Twenty-Fourth ACM Symposium on Operating
1362 |         Systems Principles},
1363 |     pages={456--471},
1364 |     year={2013},
1365 |     organization={ACM}
1367 | }
1368 | 
1369 | @inproceedings{carlson2011sniper,
1370 |     title={Sniper: exploring the level of abstraction for scalable and
1371 |         accurate parallel multi-core simulation},
1372 |     author={Carlson, Trevor E and Heirman, Wim and Eeckhout, Lieven},
1373 |     booktitle={Proceedings of 2011 International Conference for High
1374 |         Performance Computing, Networking, Storage and Analysis},
1375 |     pages={52},
1376 |     year={2011},
1377 |     organization={ACM}
1378 | }
1379 | 
1380 | @article{barthels2017distributed,
1381 |     title={Distributed Join Algorithms on Thousands of Cores},
1382 |     author={Barthels, Claude and M{\"u}ller, Ingo and Schneider, Timo and
1383 |         Alonso, Gustavo and Hoefler, Torsten},
1384 |     journal={Proceedings of the VLDB Endowment},
1385 |     volume={10},
1386 |     number={5},
1387 |     year={2017}
1388 | }
1389 | 
1390 | @article{shang2014auto,
1391 |     title={Auto-approximation of graph computing},
1392 |     author={Shang, Zechao and Yu, Jeffrey Xu},
1393 |     journal={Proceedings of the VLDB Endowment},
1394 |     volume={7},
1395 |     number={14},
1396 |     pages={1833--1844},
1397 |     year={2014},
1398 |     publisher={VLDB Endowment}
1399 | }
1400 | 
@inproceedings{Shang2016large,
    title = {{Graph analytics through fine-grained parallelism}},
    author = {Shang, Zechao and Li, Feifei and Yu, Jeffrey Xu and Zhang, Zhiwei
        and Cheng, Hong},
    booktitle = {Proceedings of the 2016 International Conference on
        Management of Data},
    series = {SIGMOD '16},
    doi = {10.1145/2882903.2915238},
    isbn = {9781450335317},
    pages = {463--478},
    url = {http://dl.acm.org/citation.cfm?doid=2882903.2915238},
    year = {2016}
}
1413 | 
1414 | @inproceedings{roy2013x-stream,
1415 |     title={X-Stream: edge-centric graph processing using streaming
1416 |         partitions},
1417 |     author={Roy, Amitabha and Mihailovic, Ivo and Zwaenepoel, Willy},
1418 |     booktitle={Proceedings of the Twenty-Fourth ACM Symposium on Operating
1419 |         Systems Principles},
1420 |     pages={472--488},
1421 |     year={2013},
1422 |     organization={ACM}
1423 | }
1424 | 
1425 | @article{shi2015frog,
1426 |     title={Frog: Asynchronous graph processing on GPU with hybrid coloring
1427 |         model},
1428 |     author={Shi, Xuanhua and Liang, J and Luo, X and Di, S and He, B and Lu,
1429 |         L and Jin, Hai},
1430 |     journal={Huazhong University of Science and Technology, Tech. Rep.
1431 |         HUSTCGCL-TR-402},
1432 |     year={2015}
1433 | }
1434 | 
1435 | @inproceedings{akin2014understanding,
    title={Understanding the design space of {DRAM}-optimized hardware {FFT}
1437 |         accelerators},
1438 |     author={Ak{\i}n, Berkin and Franchetti, Franz and Hoe, James C},
1439 |     booktitle={2014 IEEE 25th International Conference on
1440 |         Application-Specific Systems, Architectures and Processors},
1441 |     pages={248--255},
1442 |     year={2014},
1443 |     organization={IEEE}
1444 | }
1445 | 
1446 | @inproceedings{akin2015data,
    title={Data reorganization in memory using {3D}-stacked {DRAM}},
1448 |     author={Akin, Berkin and Franchetti, Franz and Hoe, James C},
1449 |     booktitle={ACM SIGARCH Computer Architecture News},
1450 |     volume={43},
1451 |     number={3},
1452 |     pages={131--143},
1453 |     year={2015},
1454 |     organization={ACM}
1455 | }
1456 | 
1457 | @inproceedings{venkataraman2013presto,
1458 |     title={Presto: distributed machine learning and graph processing with
1459 |         sparse matrices},
1460 |     author={Venkataraman, Shivaram and Bodzsar, Erik and Roy, Indrajit and
1461 |         AuYoung, Alvin and Schreiber, Robert S},
1462 |     booktitle={Proceedings of the 8th ACM European Conference on Computer
1463 |         Systems},
1464 |     pages={197--210},
1465 |     year={2013},
1466 |     organization={ACM}
1467 | }
1468 | 
1469 | @inproceedings{hsieh2016accelerating,
1470 |     title={Accelerating pointer chasing in 3D-stacked memory: Challenges,
1471 |         mechanisms, evaluation},
1472 |     author={Hsieh, Kevin and Khan, Samira and Vijaykumar, Nandita and Chang,
1473 |         Kevin K and Boroumand, Amirali and Ghose, Saugata and Mutlu, Onur},
1474 |     booktitle={Computer Design (ICCD), 2016 IEEE 34th International
1475 |         Conference on},
1476 |     pages={25--32},
1477 |     year={2016},
1478 |     organization={IEEE}
1479 | }
1480 | 
1481 | @inproceedings{appuswamy2015scaling,
1482 |     title={Scaling the Memory Power Wall With DRAM-Aware Data Management},
1483 |     author={Appuswamy, Raja and Olma, Matthaios and Ailamaki, Anastasia},
1484 |     booktitle={Proceedings of the 11th International Workshop on Data
1485 |         Management on New Hardware},
1486 |     pages={3},
1487 |     year={2015},
1488 |     organization={ACM}
1489 | }
1490 | 
1491 | @article{umuroglu2016random,
1492 |     title={Random access schemes for efficient {FPGA} {SpMV} acceleration},
1493 |     author={Umuroglu, Yaman and Jahre, Magnus},
1494 |     journal={Microprocessors and Microsystems},
1495 |     year={2016},
1496 |     publisher={Elsevier}
1497 | }
1498 | 
1499 | @inproceedings{guo2015enabling,
1500 |     title={Enabling portable energy efficiency with memory accelerated
1501 |         library},
1502 |     author={Guo, Qi and Low, Tze-Meng and Alachiotis, Nikolaos and Ak{\i}n,
1503 |         Berkin and Pileggi, Larry and Hoe, James C and Franchetti, Franz},
1504 |     booktitle={Proceedings of the 48th International Symposium on
1505 |         Microarchitecture},
1506 |     pages={750--761},
1507 |     year={2015},
1508 |     organization={ACM}
1509 | }
1510 | 
1511 | @inproceedings{Oguntebi:2016:GDL:2847263.2847337,
1512 |     author = {Oguntebi, Tayo and Olukotun, Kunle},
1513 |     title = {{GraphOps}: A Dataflow Library for Graph Analytics Acceleration},
1514 |     booktitle = {Proceedings of the 2016 ACM/SIGDA International Symposium on
1515 |         Field-Programmable Gate Arrays},
1516 |     series = {FPGA '16},
1517 |     year = {2016},
1518 |     isbn = {978-1-4503-3856-1},
1519 |     location = {Monterey, California, USA},
1520 |     pages = {111--117},
1521 |     numpages = {7},
1522 |     url = {http://doi.acm.org/10.1145/2847263.2847337},
1523 |     doi = {10.1145/2847263.2847337},
1524 |     acmid = {2847337},
1525 |     publisher = {ACM},
1526 |     address = {New York, NY, USA},
1527 |     keywords = {accelerator, analytics, dataflow, fpga, graph
1528 |         analysis},
1529 | }
1530 | 
1531 | @inproceedings{umuroglu2015hybrid,
1532 |     title={Hybrid breadth-first search on a single-chip {FPGA-CPU} heterogeneous
1533 |         platform},
1534 |     author={Umuroglu, Yaman and Morrison, Donn and Jahre, Magnus},
1535 |     booktitle={Field Programmable Logic and Applications (FPL), 2015 25th
1536 |         International Conference on},
1537 |     pages={1--8},
1538 |     year={2015},
1539 |     organization={IEEE}
1540 | }
1541 | 
1542 | 


--------------------------------------------------------------------------------