├── .gitignore
├── LICENSE
├── Makefile
├── README.md
├── bigdata.bbl
├── bigdata.bib
├── bigdata.tex
├── docs
│   ├── bigdata.css
│   ├── bigdata10.html
│   ├── bigdata11.html
│   ├── bigdata12.html
│   ├── bigdata13.html
│   ├── bigdata14.html
│   ├── bigdata15.html
│   ├── bigdata16.html
│   ├── bigdata17.html
│   ├── bigdata18.html
│   ├── bigdata19.html
│   ├── bigdata2.html
│   ├── bigdata3.html
│   ├── bigdata4.html
│   ├── bigdata5.html
│   ├── bigdata6.html
│   ├── bigdata7.html
│   ├── bigdata8.html
│   ├── bigdata9.html
│   ├── images
│   └── index.html
└── images
    ├── MapReduce.png
    ├── PigHive_MR.png
    ├── PigHive_Tez.png
    ├── data-management.png
    ├── hadoop.png
    ├── hbase-architecture.png
    ├── hdfs-architecture.png
    ├── mesos-architecture.jpg
    ├── mongodb-replica-set.png
    ├── mongodb-sharding.png
    ├── mongodb-storage-structure.png
    ├── riak-data-distribution.png
    ├── riak-ring.png
    ├── yarn-architecture.png
    └── zookeeper.jpg
/.gitignore:
--------------------------------------------------------------------------------
1 | bigdata.epub
2 | bigdata.pdf
3 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 2, June 1991
3 |
4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
6 | Everyone is permitted to copy and distribute verbatim copies
7 | of this license document, but changing it is not allowed.
8 |
9 | Preamble
10 |
11 | The licenses for most software are designed to take away your
12 | freedom to share and change it. By contrast, the GNU General Public
13 | License is intended to guarantee your freedom to share and change free
14 | software--to make sure the software is free for all its users. This
15 | General Public License applies to most of the Free Software
16 | Foundation's software and to any other program whose authors commit to
17 | using it. (Some other Free Software Foundation software is covered by
18 | the GNU Lesser General Public License instead.) You can apply it to
19 | your programs, too.
20 |
21 | When we speak of free software, we are referring to freedom, not
22 | price. Our General Public Licenses are designed to make sure that you
23 | have the freedom to distribute copies of free software (and charge for
24 | this service if you wish), that you receive source code or can get it
25 | if you want it, that you can change the software or use pieces of it
26 | in new free programs; and that you know you can do these things.
27 |
28 | To protect your rights, we need to make restrictions that forbid
29 | anyone to deny you these rights or to ask you to surrender the rights.
30 | These restrictions translate to certain responsibilities for you if you
31 | distribute copies of the software, or if you modify it.
32 |
33 | For example, if you distribute copies of such a program, whether
34 | gratis or for a fee, you must give the recipients all the rights that
35 | you have. You must make sure that they, too, receive or can get the
36 | source code. And you must show them these terms so they know their
37 | rights.
38 |
39 | We protect your rights with two steps: (1) copyright the software, and
40 | (2) offer you this license which gives you legal permission to copy,
41 | distribute and/or modify the software.
42 |
43 | Also, for each author's protection and ours, we want to make certain
44 | that everyone understands that there is no warranty for this free
45 | software. If the software is modified by someone else and passed on, we
46 | want its recipients to know that what they have is not the original, so
47 | that any problems introduced by others will not reflect on the original
48 | authors' reputations.
49 |
50 | Finally, any free program is threatened constantly by software
51 | patents. We wish to avoid the danger that redistributors of a free
52 | program will individually obtain patent licenses, in effect making the
53 | program proprietary. To prevent this, we have made it clear that any
54 | patent must be licensed for everyone's free use or not licensed at all.
55 |
56 | The precise terms and conditions for copying, distribution and
57 | modification follow.
58 |
59 | GNU GENERAL PUBLIC LICENSE
60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
61 |
62 | 0. This License applies to any program or other work which contains
63 | a notice placed by the copyright holder saying it may be distributed
64 | under the terms of this General Public License. The "Program", below,
65 | refers to any such program or work, and a "work based on the Program"
66 | means either the Program or any derivative work under copyright law:
67 | that is to say, a work containing the Program or a portion of it,
68 | either verbatim or with modifications and/or translated into another
69 | language. (Hereinafter, translation is included without limitation in
70 | the term "modification".) Each licensee is addressed as "you".
71 |
72 | Activities other than copying, distribution and modification are not
73 | covered by this License; they are outside its scope. The act of
74 | running the Program is not restricted, and the output from the Program
75 | is covered only if its contents constitute a work based on the
76 | Program (independent of having been made by running the Program).
77 | Whether that is true depends on what the Program does.
78 |
79 | 1. You may copy and distribute verbatim copies of the Program's
80 | source code as you receive it, in any medium, provided that you
81 | conspicuously and appropriately publish on each copy an appropriate
82 | copyright notice and disclaimer of warranty; keep intact all the
83 | notices that refer to this License and to the absence of any warranty;
84 | and give any other recipients of the Program a copy of this License
85 | along with the Program.
86 |
87 | You may charge a fee for the physical act of transferring a copy, and
88 | you may at your option offer warranty protection in exchange for a fee.
89 |
90 | 2. You may modify your copy or copies of the Program or any portion
91 | of it, thus forming a work based on the Program, and copy and
92 | distribute such modifications or work under the terms of Section 1
93 | above, provided that you also meet all of these conditions:
94 |
95 | a) You must cause the modified files to carry prominent notices
96 | stating that you changed the files and the date of any change.
97 |
98 | b) You must cause any work that you distribute or publish, that in
99 | whole or in part contains or is derived from the Program or any
100 | part thereof, to be licensed as a whole at no charge to all third
101 | parties under the terms of this License.
102 |
103 | c) If the modified program normally reads commands interactively
104 | when run, you must cause it, when started running for such
105 | interactive use in the most ordinary way, to print or display an
106 | announcement including an appropriate copyright notice and a
107 | notice that there is no warranty (or else, saying that you provide
108 | a warranty) and that users may redistribute the program under
109 | these conditions, and telling the user how to view a copy of this
110 | License. (Exception: if the Program itself is interactive but
111 | does not normally print such an announcement, your work based on
112 | the Program is not required to print an announcement.)
113 |
114 | These requirements apply to the modified work as a whole. If
115 | identifiable sections of that work are not derived from the Program,
116 | and can be reasonably considered independent and separate works in
117 | themselves, then this License, and its terms, do not apply to those
118 | sections when you distribute them as separate works. But when you
119 | distribute the same sections as part of a whole which is a work based
120 | on the Program, the distribution of the whole must be on the terms of
121 | this License, whose permissions for other licensees extend to the
122 | entire whole, and thus to each and every part regardless of who wrote it.
123 |
124 | Thus, it is not the intent of this section to claim rights or contest
125 | your rights to work written entirely by you; rather, the intent is to
126 | exercise the right to control the distribution of derivative or
127 | collective works based on the Program.
128 |
129 | In addition, mere aggregation of another work not based on the Program
130 | with the Program (or with a work based on the Program) on a volume of
131 | a storage or distribution medium does not bring the other work under
132 | the scope of this License.
133 |
134 | 3. You may copy and distribute the Program (or a work based on it,
135 | under Section 2) in object code or executable form under the terms of
136 | Sections 1 and 2 above provided that you also do one of the following:
137 |
138 | a) Accompany it with the complete corresponding machine-readable
139 | source code, which must be distributed under the terms of Sections
140 | 1 and 2 above on a medium customarily used for software interchange; or,
141 |
142 | b) Accompany it with a written offer, valid for at least three
143 | years, to give any third party, for a charge no more than your
144 | cost of physically performing source distribution, a complete
145 | machine-readable copy of the corresponding source code, to be
146 | distributed under the terms of Sections 1 and 2 above on a medium
147 | customarily used for software interchange; or,
148 |
149 | c) Accompany it with the information you received as to the offer
150 | to distribute corresponding source code. (This alternative is
151 | allowed only for noncommercial distribution and only if you
152 | received the program in object code or executable form with such
153 | an offer, in accord with Subsection b above.)
154 |
155 | The source code for a work means the preferred form of the work for
156 | making modifications to it. For an executable work, complete source
157 | code means all the source code for all modules it contains, plus any
158 | associated interface definition files, plus the scripts used to
159 | control compilation and installation of the executable. However, as a
160 | special exception, the source code distributed need not include
161 | anything that is normally distributed (in either source or binary
162 | form) with the major components (compiler, kernel, and so on) of the
163 | operating system on which the executable runs, unless that component
164 | itself accompanies the executable.
165 |
166 | If distribution of executable or object code is made by offering
167 | access to copy from a designated place, then offering equivalent
168 | access to copy the source code from the same place counts as
169 | distribution of the source code, even though third parties are not
170 | compelled to copy the source along with the object code.
171 |
172 | 4. You may not copy, modify, sublicense, or distribute the Program
173 | except as expressly provided under this License. Any attempt
174 | otherwise to copy, modify, sublicense or distribute the Program is
175 | void, and will automatically terminate your rights under this License.
176 | However, parties who have received copies, or rights, from you under
177 | this License will not have their licenses terminated so long as such
178 | parties remain in full compliance.
179 |
180 | 5. You are not required to accept this License, since you have not
181 | signed it. However, nothing else grants you permission to modify or
182 | distribute the Program or its derivative works. These actions are
183 | prohibited by law if you do not accept this License. Therefore, by
184 | modifying or distributing the Program (or any work based on the
185 | Program), you indicate your acceptance of this License to do so, and
186 | all its terms and conditions for copying, distributing or modifying
187 | the Program or works based on it.
188 |
189 | 6. Each time you redistribute the Program (or any work based on the
190 | Program), the recipient automatically receives a license from the
191 | original licensor to copy, distribute or modify the Program subject to
192 | these terms and conditions. You may not impose any further
193 | restrictions on the recipients' exercise of the rights granted herein.
194 | You are not responsible for enforcing compliance by third parties to
195 | this License.
196 |
197 | 7. If, as a consequence of a court judgment or allegation of patent
198 | infringement or for any other reason (not limited to patent issues),
199 | conditions are imposed on you (whether by court order, agreement or
200 | otherwise) that contradict the conditions of this License, they do not
201 | excuse you from the conditions of this License. If you cannot
202 | distribute so as to satisfy simultaneously your obligations under this
203 | License and any other pertinent obligations, then as a consequence you
204 | may not distribute the Program at all. For example, if a patent
205 | license would not permit royalty-free redistribution of the Program by
206 | all those who receive copies directly or indirectly through you, then
207 | the only way you could satisfy both it and this License would be to
208 | refrain entirely from distribution of the Program.
209 |
210 | If any portion of this section is held invalid or unenforceable under
211 | any particular circumstance, the balance of the section is intended to
212 | apply and the section as a whole is intended to apply in other
213 | circumstances.
214 |
215 | It is not the purpose of this section to induce you to infringe any
216 | patents or other property right claims or to contest validity of any
217 | such claims; this section has the sole purpose of protecting the
218 | integrity of the free software distribution system, which is
219 | implemented by public license practices. Many people have made
220 | generous contributions to the wide range of software distributed
221 | through that system in reliance on consistent application of that
222 | system; it is up to the author/donor to decide if he or she is willing
223 | to distribute software through any other system and a licensee cannot
224 | impose that choice.
225 |
226 | This section is intended to make thoroughly clear what is believed to
227 | be a consequence of the rest of this License.
228 |
229 | 8. If the distribution and/or use of the Program is restricted in
230 | certain countries either by patents or by copyrighted interfaces, the
231 | original copyright holder who places the Program under this License
232 | may add an explicit geographical distribution limitation excluding
233 | those countries, so that distribution is permitted only in or among
234 | countries not thus excluded. In such case, this License incorporates
235 | the limitation as if written in the body of this License.
236 |
237 | 9. The Free Software Foundation may publish revised and/or new versions
238 | of the General Public License from time to time. Such new versions will
239 | be similar in spirit to the present version, but may differ in detail to
240 | address new problems or concerns.
241 |
242 | Each version is given a distinguishing version number. If the Program
243 | specifies a version number of this License which applies to it and "any
244 | later version", you have the option of following the terms and conditions
245 | either of that version or of any later version published by the Free
246 | Software Foundation. If the Program does not specify a version number of
247 | this License, you may choose any version ever published by the Free Software
248 | Foundation.
249 |
250 | 10. If you wish to incorporate parts of the Program into other free
251 | programs whose distribution conditions are different, write to the author
252 | to ask for permission. For software which is copyrighted by the Free
253 | Software Foundation, write to the Free Software Foundation; we sometimes
254 | make exceptions for this. Our decision will be guided by the two goals
255 | of preserving the free status of all derivatives of our free software and
256 | of promoting the sharing and reuse of software generally.
257 |
258 | NO WARRANTY
259 |
260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
268 | REPAIR OR CORRECTION.
269 |
270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
278 | POSSIBILITY OF SUCH DAMAGES.
279 |
280 | END OF TERMS AND CONDITIONS
281 |
282 | How to Apply These Terms to Your New Programs
283 |
284 | If you develop a new program, and you want it to be of the greatest
285 | possible use to the public, the best way to achieve this is to make it
286 | free software which everyone can redistribute and change under these terms.
287 |
288 | To do so, attach the following notices to the program. It is safest
289 | to attach them to the start of each source file to most effectively
290 | convey the exclusion of warranty; and each file should have at least
291 | the "copyright" line and a pointer to where the full notice is found.
292 |
293 | {description}
294 | Copyright (C) {year} {fullname}
295 |
296 | This program is free software; you can redistribute it and/or modify
297 | it under the terms of the GNU General Public License as published by
298 | the Free Software Foundation; either version 2 of the License, or
299 | (at your option) any later version.
300 |
301 | This program is distributed in the hope that it will be useful,
302 | but WITHOUT ANY WARRANTY; without even the implied warranty of
303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
304 | GNU General Public License for more details.
305 |
306 | You should have received a copy of the GNU General Public License along
307 | with this program; if not, write to the Free Software Foundation, Inc.,
308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
309 |
310 | Also add information on how to contact you by electronic and paper mail.
311 |
312 | If the program is interactive, make it output a short notice like this
313 | when it starts in an interactive mode:
314 |
315 | Gnomovision version 69, Copyright (C) year name of author
316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
317 | This is free software, and you are welcome to redistribute it
318 | under certain conditions; type `show c' for details.
319 |
320 | The hypothetical commands `show w' and `show c' should show the appropriate
321 | parts of the General Public License. Of course, the commands you use may
322 | be called something other than `show w' and `show c'; they could even be
323 | mouse-clicks or menu items--whatever suits your program.
324 |
325 | You should also get your employer (if you work as a programmer) or your
326 | school, if any, to sign a "copyright disclaimer" for the program, if
327 | necessary. Here is a sample; alter the names:
328 |
329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program
330 | `Gnomovision' (which makes passes at compilers) written by James Hacker.
331 |
332 | {signature of Ty Coon}, 1 April 1989
333 | Ty Coon, President of Vice
334 |
335 | This General Public License does not permit incorporating your program into
336 | proprietary programs. If your program is a subroutine library, you may
337 | consider it more useful to permit linking proprietary applications with the
338 | library. If this is what you want to do, use the GNU Lesser General
339 | Public License instead of this License.
340 |
341 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | all: README.md bigdata.epub
2 |
3 | README.md: bigdata.tex
4 | 	pandoc -s -S --toc --webtex --filter pandoc-citeproc --metadata bibliography=bigdata.bib $< -o $@
5 |
6 | bigdata.epub: bigdata.tex
7 | 	pandoc -S --toc $< -o $@
8 |
9 | bigdata.pdf: bigdata.tex
10 | 	pandoc --toc $< -o $@
11 |
--------------------------------------------------------------------------------
/bigdata.bbl:
--------------------------------------------------------------------------------
1 | \begin{thebibliography}{10}
2 |
3 | \bibitem{Accenture13Seattle}
4 | Accenture.
5 | \newblock Accenture analytics and smart building solutions are helping {Seattle}
6 | boost energy efficiency downtown, 2013.
7 |
8 | \bibitem{Accenture14SmartGrid}
9 | Accenture.
10 | \newblock Accenture to help {Thames} {Water} prove the benefits of smart monitoring
11 | capabilities, 2014.
12 |
13 | \bibitem{AdpHcm}
14 | ADP.
15 | \newblock {ADP} {HCM} solutions for large business.
16 | \newblock \url{http://www.adp.com/solutions/large-business/products.aspx},
17 | 2014.
18 |
19 | \bibitem{AMPLabBenchmark2014}
20 | AMPLab.
21 | \newblock Big data benchmark.
22 | \newblock \url{http://amplab.cs.berkeley.edu/benchmark/}.
23 |
24 | \bibitem{Accumulo}
25 | Apache.
26 | \newblock Accumulo.
27 | \newblock \url{http://accumulo.apache.org}.
28 |
29 | \bibitem{Cassandra}
30 | Apache.
31 | \newblock Cassandra.
32 | \newblock \url{http://cassandra.apache.org}.
33 |
34 | \bibitem{Chukwa}
35 | Apache.
36 | \newblock Chukwa.
37 | \newblock \url{http://chukwa.apache.org}.
38 |
39 | \bibitem{HdfsShell}
40 | Apache.
41 | \newblock File system shell guide.
42 | \newblock
43 | \url{http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html}.
44 |
45 | \bibitem{Flume}
46 | Apache.
47 | \newblock Flume.
48 | \newblock \url{http://flume.apache.org}.
49 |
50 | \bibitem{Hadoop}
51 | Apache.
52 | \newblock Hadoop.
53 | \newblock \url{http://hadoop.apache.org}.
54 |
55 | \bibitem{HBase}
56 | Apache.
57 | \newblock {HBase}.
58 | \newblock \url{http://hbase.apache.org}.
59 |
60 | \bibitem{HdfsThrift}
61 | Apache.
62 | \newblock {HDFS} {API}s in perl, python, ruby and php.
63 | \newblock \url{http://wiki.apache.org/hadoop/HDFS-APIs}.
64 |
65 | \bibitem{Kafka}
66 | Apache.
67 | \newblock Kafka.
68 | \newblock \url{http://kafka.apache.org}.
69 |
70 | \bibitem{MapReduceTutorial}
71 | Apache.
72 | \newblock {MapReduce} tutorial.
73 | \newblock
74 | \url{http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html}.
75 |
76 | \bibitem{Maven}
77 | Apache.
78 | \newblock Maven.
79 | \newblock \url{http://maven.apache.org}.
80 |
81 | \bibitem{Spark}
82 | Apache.
83 | \newblock Spark.
84 | \newblock \url{http://spark.apache.org}.
85 |
86 | \bibitem{Sqoop}
87 | Apache.
88 | \newblock Sqoop.
89 | \newblock \url{http://sqoop.apache.org}.
90 |
91 | \bibitem{Storm}
92 | Apache.
93 | \newblock Storm.
94 | \newblock \url{http://storm.apache.org}.
95 |
96 | \bibitem{Tez}
97 | Apache.
98 | \newblock Tez.
99 | \newblock \url{http://tez.apache.org}.
100 |
101 | \bibitem{Thrift}
102 | Apache.
103 | \newblock Thrift.
104 | \newblock \url{http://thrift.apache.org}.
105 |
106 | \bibitem{ZooKeeper}
107 | Apache.
108 | \newblock {ZooKeeper}.
109 | \newblock \url{http://zookeeper.apache.org}.
110 |
111 | \bibitem{Riak}
112 | Basho.
113 | \newblock Riak.
114 | \newblock \url{http://basho.com/riak/}.
115 |
116 | \bibitem{opac:2009}
117 | Philip~A. Bernstein and Eric Newcomer.
118 | \newblock {\em Principles of transaction processing}.
119 | \newblock The Morgan Kaufmann series in data management systems. Morgan
120 | Kaufmann Publishers, Burlington, MA, second edition, 2009.
121 |
122 | \bibitem{Bloom:1970:STH}
123 | Burton~H. Bloom.
124 | \newblock Space/time trade-offs in hash coding with allowable errors.
125 | \newblock {\em Commun. ACM}, 13(7):422--426, July 1970.
126 |
127 | \bibitem{Brewer:2000:TRD}
128 | Eric~A. Brewer.
129 | \newblock Towards robust distributed systems (abstract).
130 | \newblock In {\em Proceedings of the Nineteenth Annual ACM Symposium on
131 | Principles of Distributed Computing}, PODC '00, 2000.
132 |
133 | \bibitem{Brewer:2012}
134 | Eric~A. Brewer.
135 | \newblock {CAP} twelve years later: How the ``rules'' have changed.
136 | \newblock {\em Computer}, 45(2):23--29, February 2012.
137 |
138 | \bibitem{Chang:2006:BDS}
139 | Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson~C. Hsieh, Deborah~A. Wallach,
140 | Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert~E. Gruber.
141 | \newblock Bigtable: A distributed storage system for structured data.
142 | \newblock In {\em Proceedings of the 7th USENIX Symposium on Operating Systems
143 | Design and Implementation - Volume 7}, OSDI '06, 2006.
144 |
145 | \bibitem{ClouderaImpala2014}
146 | Cloudera.
147 | \newblock New {SQL} choices in the {Apache} {Hadoop} ecosystem: Why {Impala} continues
148 | to lead.
149 | \newblock
150 | \url{http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/}.
151 |
152 | \bibitem{LevelDB}
153 | Jeffrey Dean and Sanjay Ghemawat.
154 | \newblock {LevelDB}.
155 | \newblock \url{http://leveldb.org}.
156 |
157 | \bibitem{Dean:2008:MSD}
158 | Jeffrey Dean and Sanjay Ghemawat.
159 | \newblock {MapReduce}: Simplified data processing on large clusters.
160 | \newblock {\em Commun. ACM}, 51(1):107--113, January 2008.
161 |
162 | \bibitem{DeCandia:2007:DAH}
163 | Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,
164 | Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall,
165 | and Werner Vogels.
166 | \newblock Dynamo: {Amazon's} highly available key-value store.
167 | \newblock In {\em Proceedings of Twenty-first ACM SIGOPS Symposium on Operating
168 | Systems Principles}, SOSP '07, pages 205--220, 2007.
169 |
170 | \bibitem{DeWitt:1992:PDS}
171 | David DeWitt and Jim Gray.
172 | \newblock Parallel database systems: The future of high performance database
173 | systems.
174 | \newblock {\em Commun. ACM}, 35(6):85--98, June 1992.
175 |
176 | \bibitem{fidge1988timestamps}
177 | C.~J. Fidge.
178 | \newblock Timestamps in message-passing systems that preserve the partial
179 | ordering.
180 | \newblock {\em Proceedings of the 11th Australian Computer Science Conference},
181 | 10(1):56--66, 1988.
182 |
183 | \bibitem{Forum:1994:MMI}
184 | Message Passing Interface Forum.
185 | \newblock {MPI}: A message-passing interface standard.
186 | \newblock Technical report, University of Tennessee, Knoxville, TN, USA, 1994.
187 |
188 | \bibitem{Gartner2014}
189 | Gartner.
190 | \newblock Gartner's 2014 hype cycle for emerging technologies maps the journey
191 | to digital business.
192 | \newblock \url{http://www.gartner.com/newsroom/id/2819918}, 2014.
193 |
194 | \bibitem{Stinger}
195 | Alan Gates.
196 | \newblock The {Stinger} initiative: Making {Apache} {Hive} 100 times faster.
197 | \newblock \url{http://hortonworks.com/blog/100x-faster-hive/}.
198 |
199 | \bibitem{IndustrialInternetReport2014}
200 | GE and Accenture.
201 | \newblock Industrial internet insights report, 2014.
202 |
203 | \bibitem{Ghemawat:2003:GFS}
204 | Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
205 | \newblock The {Google} file system.
206 | \newblock In {\em Proceedings of the Nineteenth ACM Symposium on Operating
207 | Systems Principles}, SOSP '03, pages 29--43, 2003.
208 |
209 | \bibitem{Ghodsi:2011:DRF}
210 | Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and
211 | Ion Stoica.
212 | \newblock Dominant resource fairness: Fair allocation of multiple resource
213 | types.
214 | \newblock In {\em Proceedings of the 8th USENIX Conference on Networked Systems
215 | Design and Implementation}, NSDI'11, pages 323--336, 2011.
216 |
217 | \bibitem{Gilbert:2002:BCF}
218 | Seth Gilbert and Nancy Lynch.
219 | \newblock Brewer's conjecture and the feasibility of consistent, available,
220 | partition-tolerant web services.
221 | \newblock {\em SIGACT News}, 33(2):51--59, June 2002.
222 |
223 | \bibitem{Gropp:1999:UMA}
224 | William Gropp, Ewing Lusk, and Rajeev Thakur.
225 | \newblock {\em Using MPI-2: Advanced Features of the Message-Passing
226 | Interface}.
227 | \newblock MIT Press, Cambridge, MA, USA, 1999.
228 |
229 | \bibitem{HDFS2010:265}
230 | Apache Hadoop.
231 | \newblock {ASF} {JIRA} {HDFS}-265, 2010.
232 |
233 | \bibitem{YARN2011:279}
234 | Apache Hadoop.
235 | \newblock {ASF} {JIRA} {MAPREDUCE}-279, 2011.
236 |
237 | \bibitem{HDFS}
238 | Apache Hadoop.
239 | \newblock {HDFS} architecture.
240 | \newblock
241 | \url{http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html},
242 | 2014.
243 |
244 | \bibitem{Hindman:2011:MPF}
245 | Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony~D. Joseph,
246 | Randy Katz, Scott Shenker, and Ion Stoica.
247 | \newblock Mesos: A platform for fine-grained resource sharing in the data
248 | center.
249 | \newblock In {\em Proceedings of the 8th USENIX Conference on Networked Systems
250 | Design and Implementation}, NSDI'11, pages 295--308, 2011.
251 |
252 | \bibitem{Watson2013Cancer}
253 | IBM.
254 | \newblock {IBM} {Watson} helps fight cancer with evidence-based diagnosis and
255 | treatment suggestions, 2013.
256 |
257 | \bibitem{Watson2013Healthcare}
258 | IBM.
259 | \newblock Putting {Watson} to work: {Watson} in healthcare, 2013.
260 |
261 | \bibitem{IBM2013}
262 | IBM.
263 | \newblock What is big data?
264 | \newblock
265 | \url{http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html},
266 | 2013.
267 |
268 | \bibitem{Watson2014}
269 | IBM.
270 | \newblock Watson.
271 | \newblock \url{http://www.ibm.com/smarterplanet/us/en/ibmwatson/}, 2014.
272 |
273 | \bibitem{Isard:2007:DDD}
274 | Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly.
275 | \newblock Dryad: Distributed data-parallel programs from sequential building
276 | blocks.
277 | \newblock In {\em Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference
278 | on Computer Systems 2007}, EuroSys '07, pages 59--72, 2007.
279 |
280 | \bibitem{Jain:1988:ACD}
281 | Anil~K. Jain and Richard~C. Dubes.
282 | \newblock {\em Algorithms for Clustering Data}.
283 | \newblock Prentice-Hall, Inc., 1988.
284 |
285 | \bibitem{HBaseCoprocessor}
286 | Mingjie Lai, Eugene Koontz, and Andrew Purtell.
287 | \newblock Coprocessor introduction.
288 | \newblock \url{http://blogs.apache.org/hbase/entry/coprocessor_introduction}.
289 |
290 | \bibitem{Lakshman:2010:CDS}
291 | Avinash Lakshman and Prashant Malik.
292 | \newblock Cassandra: A decentralized structured storage system.
293 | \newblock {\em SIGOPS Oper. Syst. Rev.}, 44(2):35--40, April 2010.
294 |
295 | \bibitem{Lamport:1998:PP}
296 | Leslie Lamport.
297 | \newblock The part-time parliament.
298 | \newblock {\em ACM Trans. Comput. Syst.}, 16(2):133--169, May 1998.
299 |
300 | \bibitem{Laney2012}
301 | Douglas Laney.
302 | \newblock The importance of `big data': A definition.
303 | \newblock {\em Gartner}, June 2012.
304 |
305 | \bibitem{HBaseMTTR}
306 | Nicolas Liochon.
307 | \newblock Introduction to {HBase} mean time to recovery ({MTTR}).
308 | \newblock
309 | \url{http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/}.
310 |
311 | \bibitem{MacLeodClarke2012}
312 | David MacLeod and Nita Clarke.
313 | \newblock Engaging for success: enhancing performance through employee
314 | engagement, 2012.
315 |
316 | \bibitem{Mattern89virtualtime}
317 | Friedemann Mattern.
318 | \newblock Virtual time and global states of distributed systems.
319 | \newblock In {\em Parallel and Distributed Algorithms}, pages 215--226, 1989.
320 |
321 | \bibitem{McKusick:2009:GEF}
322 | Marshall~Kirk McKusick and Sean Quinlan.
323 | \newblock {GFS}: Evolution on fast-forward.
324 | \newblock {\em Queue}, 7(7):10--20, August 2009.
325 |
326 | \bibitem{BSON}
327 | MongoDB.
328 | \newblock {BSON}.
329 | \newblock \url{http://bsonspec.org}.
330 |
331 | \bibitem{MongoDB}
332 | MongoDB.
333 | \newblock {MongoDB}.
334 | \newblock \url{https://www.mongodb.com}.
335 |
336 | \bibitem{WiredTiger}
337 | MongoDB.
338 | \newblock {WiredTiger}.
339 | \newblock \url{http://www.wiredtiger.com}.
340 |
341 | \bibitem{NunesKambil2001}
342 | Paul~F. Nunes and Ajit Kambil.
343 | \newblock Personalization? {No} thanks.
344 | \newblock {\em Harvard Business Review}, April 2001.
345 |
346 | \bibitem{O'Neil96thelog-structured}
347 | Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil.
348 | \newblock The log-structured merge-tree ({LSM}-tree), 1996.
349 |
350 | \bibitem{Pavlo:2009:CAL}
351 | Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel~J. Abadi, David~J. DeWitt,
352 | Samuel Madden, and Michael Stonebraker.
353 | \newblock A comparison of approaches to large-scale data analysis.
354 | \newblock In {\em Proceedings of the 2009 ACM SIGMOD International Conference
355 | on Management of Data}, SIGMOD '09, pages 165--178, 2009.
356 |
357 | \bibitem{Preguica:2012:BAE}
358 | Nuno Pregui\c{c}a, Carlos Baquero, Paulo~S{\'e}rgio Almeida, Victor Fonte, and
359 | Ricardo Gon\c{c}alves.
360 | \newblock Brief announcement: Efficient causality tracking in distributed
361 | storage systems with dotted version vectors.
362 | \newblock In {\em Proceedings of the 2012 ACM Symposium on Principles of
363 | Distributed Computing}, PODC '12, pages 335--336, 2012.
364 |
365 | \bibitem{Reed:2008:STO}
366 | Benjamin Reed and Flavio~P. Junqueira.
367 | \newblock A simple totally ordered broadcast protocol.
368 | \newblock In {\em Proceedings of the 2nd Workshop on Large-Scale Distributed
369 | Systems and Middleware}, LADIS '08, 2008.
370 |
371 | \bibitem{ReichheldSasser1990}
372 | Frederick~F. Reichheld and W.~Earl Sasser, Jr.
373 | \newblock Zero defections: Quality comes to services.
374 | \newblock {\em Harvard Business Review}, September 1990.
375 |
376 | \bibitem{Rowstron:2012:NEG}
377 | Antony Rowstron, Dushyanth Narayanan, Austin Donnelly, Greg O'Shea, and Andrew
378 | Douglas.
379 | \newblock Nobody ever got fired for using {Hadoop} on a cluster.
380 | \newblock In {\em Proceedings of the 1st International Workshop on Hot Topics
381 | in Cloud Data Processing}, HotCDP '12, pages 1--5, 2012.
382 |
383 | \bibitem{TezTutorial}
384 | Bikas Saha.
385 | \newblock Apache {Tez}: A new chapter in {Hadoop} data processing.
386 | \newblock
387 | \url{http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing/}.
388 |
389 | \bibitem{Schwarzkopf:2013:OFS}
390 | Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes.
391 | \newblock Omega: Flexible, scalable schedulers for large compute clusters.
392 | \newblock In {\em Proceedings of the 8th ACM European Conference on Computer
393 | Systems}, EuroSys '13, pages 351--364, 2013.
394 |
395 | \bibitem{CRDT2011}
396 | Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski.
397 | \newblock A comprehensive study of convergent and commutative replicated data
398 | types.
399 | \newblock Technical Report RR-7506, INRIA, 2011.
400 |
401 | \bibitem{Bitcask}
402 | Justin Sheehy and David Smith.
403 | \newblock Bitcask: A log-structured hash table for fast key/value data.
404 | \newblock
405 | \url{http://github.com/basho/basho_docs/raw/master/source/data/bitcask-intro.pdf}.
406 |
407 | \bibitem{Tar2Seq}
408 | Stuart Sierra.
409 | \newblock A million little files.
410 | \newblock \url{http://stuartsierra.com/2008/04/24/a-million-little-files},
411 | 2008.
412 |
413 | \bibitem{Tachyon}
414 | Tachyon Team.
415 | \newblock Tachyon project.
416 | \newblock \url{http://tachyon-project.org}.
417 |
418 | \bibitem{Top500}
419 | Top500.org.
420 | \newblock Numerical wind tunnel: {National Aerospace Laboratory of Japan}.
421 | \newblock
422 | \url{http://www.top500.org/featured/systems/numerical-wind-tunnel-national-aerospace-laboratory-of-japan/},
423 | 2014.
424 |
425 | \bibitem{Facebook13Mouse}
426 | Lisa Vaas.
427 | \newblock Facebook mulls silently tracking users' cursor movements to see which
428 | ads we like best, 2013.
429 |
430 | \bibitem{VagateWilfong2014}
431 | Pamela Vagata and Kevin Wilfong.
432 | \newblock Scaling the {Facebook} data warehouse to 300 {PB}.
433 | \newblock
434 | \url{https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/},
435 | April 2014.
436 |
437 | \bibitem{SmallFiles}
438 | Tom White.
439 | \newblock The small files problem.
440 | \newblock \url{http://blog.cloudera.com/blog/2009/02/the-small-files-problem/},
441 | 2009.
442 |
443 | \bibitem{Nvidia2014}
444 | Wikipedia.
445 | \newblock {Nvidia Tesla}.
446 | \newblock \url{http://en.wikipedia.org/wiki/Nvidia_Tesla}, 2014.
447 |
448 | \bibitem{Zaharia:2012:RDD}
449 | Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy
450 | McCauley, Michael~J. Franklin, Scott Shenker, and Ion Stoica.
451 | \newblock Resilient distributed datasets: A fault-tolerant abstraction for
452 | in-memory cluster computing.
453 | \newblock In {\em Proceedings of the 9th USENIX Conference on Networked Systems
454 | Design and Implementation}, NSDI'12, 2012.
455 |
456 | \bibitem{Zaharia:2010:SCC}
457 | Matei Zaharia, Mosharaf Chowdhury, Michael~J. Franklin, Scott Shenker, and Ion
458 | Stoica.
459 | \newblock Spark: Cluster computing with working sets.
460 | \newblock In {\em Proceedings of the 2nd USENIX Conference on Hot Topics in
461 | Cloud Computing}, HotCloud'10, 2010.
462 |
463 | \end{thebibliography}
464 |
--------------------------------------------------------------------------------
/bigdata.bib:
--------------------------------------------------------------------------------
1 | @misc{Gartner2014,
2 | AUTHOR="Gartner",
3 | TITLE="Gartner's 2014 Hype Cycle for Emerging Technologies Maps the Journey to Digital Business",
4 | howpublished={\url{http://www.gartner.com/newsroom/id/2819918}},
5 | YEAR=2014
6 | }
7 | @misc{VagateWilfong2014,
8 | author = {Pamela Vagata and Kevin Wilfong},
9 | title = {Scaling the {Facebook} data warehouse to 300 {PB}},
10 | month = {April},
11 | year = {2014},
12 | howpublished={\url{https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/}}
13 | }
14 | @ARTICLE{Laney2012,
15 | AUTHOR="Douglas Laney",
16 | TITLE="The Importance of `Big Data': A Definition",
17 | JOURNAL="Gartner",
18 | month = {June},
19 | YEAR=2012
20 | }
21 | @misc{IBM2013,
22 | AUTHOR="IBM",
23 | TITLE="What is big data?",
24 | howpublished={\url{http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html}},
25 | YEAR=2013
26 | }
27 | @misc{Top500,
28 | AUTHOR="Top500.org",
29 | TITLE="Numerical Wind Tunnel: {National Aerospace Laboratory of Japan}",
30 | howpublished={\url{http://www.top500.org/featured/systems/numerical-wind-tunnel-national-aerospace-laboratory-of-japan/}},
31 | YEAR=2014
32 | }
33 | @misc{Nvidia2014,
34 | AUTHOR="Wikipedia",
35 | TITLE="{Nvidia Tesla}",
36 | howpublished={\url{http://en.wikipedia.org/wiki/Nvidia_Tesla}},
37 | YEAR=2014
38 | }
39 | @ARTICLE{NunesKambil2001,
40 | AUTHOR="Paul F. Nunes and Ajit Kambil",
41 | TITLE="Personalization? {No} Thanks.",
42 | JOURNAL="Harvard Business Review",
43 | howpublished={\url{http://hbr.org/2001/04/personalization-no-thanks/ar/1}},
44 | month = {April},
45 | YEAR=2001
46 | }
47 | @inproceedings{Huetal2000,
48 | author = "J. Hu and H.R. Wu and A. Jennings and X. Wang",
49 | title = "Fast and robust equalization: A case study",
50 | booktitle = "Proceedings of the World Multiconference on Systemics, Cybernetics and Informatics, (SCI 2000), Florida, USA, 23-26 July 2000",
51 | publisher = "International Institute of Informatics and Systemics",
52 | address = "FL, USA",
53 | pages = "398--403",
54 | year = "2000"
55 | }
56 | @Book{Conway2000,
57 | author = {Damian Conway},
58 | title = {Object {O}riented {P}erl: {A} comprehensive guide to concepts and programming techniques},
59 | publisher = {Manning Publications Co.},
60 | year = {2000},
61 | address = {Connecticut, USA}
62 | }
63 | @misc{YARN2011:279,
64 | author = {Apache Hadoop},
65 | title = {{ASF JIRA MAPREDUCE-279}},
66 | year = {2011}
67 | }
68 | @misc{Hadoop,
69 | author = {Apache},
70 | title = {Hadoop},
71 | howpublished={\url{http://hadoop.apache.org}}
72 | }
73 | @misc{Tez,
74 | author = {Apache},
75 | title = {Tez},
76 | howpublished={\url{http://tez.apache.org}}
77 | }
78 | @misc{TezTutorial,
79 | author = {Bikas Saha},
80 | title = {Apache {Tez}: A New Chapter in {Hadoop} Data Processing},
81 | howpublished={\url{http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing/}}
82 | }
83 | @misc{Stinger,
84 | author = {Alan Gates},
85 | title = {The {Stinger} Initiative: Making {Apache Hive} 100 Times Faster},
86 | howpublished={\url{http://hortonworks.com/blog/100x-faster-hive/}}
87 | }
88 | @inproceedings{Hindman:2011:MPF,
89 | author = {Hindman, Benjamin and Konwinski, Andy and Zaharia, Matei and Ghodsi, Ali and Joseph, Anthony D. and Katz, Randy and Shenker, Scott and Stoica, Ion},
90 | title = {Mesos: A Platform for Fine-grained Resource Sharing in the Data Center},
91 | booktitle = {Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation},
92 | series = {NSDI'11},
93 | year = {2011},
94 | location = {Boston, MA},
95 | pages = {295--308},
96 | numpages = {14},
97 | url = {http://dl.acm.org/citation.cfm?id=1972457.1972488},
98 | }
99 | @inproceedings{Schwarzkopf:2013:OFS,
100 | author = {Schwarzkopf, Malte and Konwinski, Andy and Abd-El-Malek, Michael and Wilkes, John},
101 | title = {Omega: Flexible, Scalable Schedulers for Large Compute Clusters},
102 | booktitle = {Proceedings of the 8th ACM European Conference on Computer Systems},
103 | series = {EuroSys '13},
104 | year = {2013},
105 | location = {Prague, Czech Republic},
106 | pages = {351--364},
107 | numpages = {14},
108 | url = {http://doi.acm.org/10.1145/2465351.2465386},
109 | }
110 | @inproceedings{Ghodsi:2011:DRF,
111 | author = {Ghodsi, Ali and Zaharia, Matei and Hindman, Benjamin and Konwinski, Andy and Shenker, Scott and Stoica, Ion},
112 | title = {Dominant Resource Fairness: Fair Allocation of Multiple Resource Types},
113 | booktitle = {Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation},
114 | series = {NSDI'11},
115 | year = {2011},
116 | location = {Boston, MA},
117 | pages = {323--336},
118 | numpages = {14},
119 | url = {http://dl.acm.org/citation.cfm?id=1972457.1972490}
120 | }
121 | @inproceedings{Ghemawat:2003:GFS,
122 | author = {Ghemawat, Sanjay and Gobioff, Howard and Leung, Shun-Tak},
123 | title = {The {Google} File System},
124 | booktitle = {Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles},
125 | series = {SOSP '03},
126 | year = {2003},
127 | location = {Bolton Landing, NY, USA},
128 | pages = {29--43}
129 | }
130 | @techreport{Forum:1994:MMI,
131 | author = {Forum, Message P},
132 | title = {{MPI}: A Message-Passing Interface Standard},
133 | year = {1994},
134 | institution = {University of Tennessee},
135 | address = {Knoxville, TN, USA}
136 | }
137 | @article{Dean:2008:MSD,
138 | author = {Dean, Jeffrey and Ghemawat, Sanjay},
139 | title = {{MapReduce}: Simplified Data Processing on Large Clusters},
140 | journal = {Commun. ACM},
141 | issue_date = {January 2008},
142 | volume = {51},
143 | number = {1},
144 | month = jan,
145 | year = {2008},
146 | pages = {107--113}
147 | }
148 | @book{Gropp:1999:UMA,
149 | author = {Gropp, William and Lusk, Ewing and Thakur, Rajeev},
150 | title = {Using {MPI-2}: Advanced Features of the Message-Passing Interface},
151 | year = {1999},
152 | publisher = {MIT Press},
153 | address = {Cambridge, MA, USA}
154 | }
155 | @article{McKusick:2009:GEF,
156 | author = {McKusick, Marshall Kirk and Quinlan, Sean},
157 | title = {{GFS}: Evolution on Fast-forward},
158 | journal = {Queue},
159 | issue_date = {August 2009},
160 | volume = {7},
161 | number = {7},
162 | month = Aug,
163 | year = {2009},
164 | pages = {10--20}
165 | }
166 | @misc{HDFS,
167 | AUTHOR="Apache Hadoop",
168 | TITLE="{HDFS} Architecture",
169 | howpublished={\url{http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html}},
170 | YEAR=2014
171 | }
172 | @misc{HDFS2010:265,
173 | author = {Apache Hadoop},
174 | title = {{ASF JIRA HDFS-265}},
175 | year = {2010}
176 | }
177 | @misc{MacLeodClarke2012,
178 | author = {David MacLeod and Nita Clarke},
179 | title = {Engaging for Success: enhancing performance through employee engagement},
180 | year = {2012}
181 | }
182 | @ARTICLE{ReichheldSasser1990,
183 | AUTHOR="Frederick F. Reichheld and Sasser, Jr., W. Earl",
184 | TITLE="Zero Defections: Quality Comes to Services",
185 | JOURNAL="Harvard Business Review",
186 | month = {September},
187 | YEAR=1990
188 | }
189 | @misc{Accenture14SmartGrid,
190 | title = {Accenture to Help Thames Water Prove the Benefits of Smart Monitoring Capabilities},
191 | author = {Accenture},
192 | year = {2014}
193 | }
194 | @misc{Accenture13Seattle,
195 | title = {Accenture Analytics and Smart Building Solutions are helping Seattle boost energy efficiency downtown},
196 | author = {Accenture},
197 | year = {2013}
198 | }
199 | @misc{IndustrialInternetReport2014,
200 | title = {Industrial Internet Insights report},
201 | author = {GE and Accenture},
202 | year = {2014}
203 | }
204 | @misc{Watson2014,
205 | AUTHOR="IBM",
206 | TITLE="Watson",
207 | howpublished={\url{http://www.ibm.com/smarterplanet/us/en/ibmwatson/}},
208 | YEAR=2014
209 | }
210 | @misc{Watson2013Healthcare,
211 | AUTHOR="IBM",
212 | TITLE="Putting {Watson} to Work: {Watson} in Healthcare",
213 | YEAR=2013
214 | }
215 | @misc{Watson2013Cancer,
216 | AUTHOR="IBM",
217 | TITLE="{IBM Watson} Helps Fight Cancer with Evidence-Based Diagnosis and Treatment Suggestions",
218 | YEAR=2013
219 | }
220 | @misc{Facebook13Mouse,
221 | AUTHOR="Lisa Vaas",
222 | TITLE="Facebook mulls silently tracking users' cursor movements to see which ads we like best",
223 | YEAR=2013
224 | }
225 | @misc{AdpHcm,
226 | AUTHOR="ADP",
227 | TITLE="{ADP HCM} Solutions for Large Business",
228 | howpublished={\url{http://www.adp.com/solutions/large-business/products.aspx}},
229 | YEAR=2014
230 | }
231 | @misc{SmallFiles,
232 | AUTHOR="Tom White",
233 | institution = {Cloudera Blog},
234 | TITLE="The Small Files Problem",
235 | howpublished={\url{http://blog.cloudera.com/blog/2009/02/the-small-files-problem/}},
236 | YEAR=2009
237 | }
238 | @misc{Tar2Seq,
239 | AUTHOR="Stuart Sierra",
240 | TITLE="A Million Little Files",
241 | howpublished={\url{http://stuartsierra.com/2008/04/24/a-million-little-files}},
242 | YEAR=2008
243 | }
244 | @misc{Maven,
245 | AUTHOR="Apache",
246 | TITLE="Maven",
247 | howpublished={\url{http://maven.apache.org}}
248 | }
249 | @misc{Thrift,
250 | AUTHOR="Apache",
251 | TITLE="Thrift",
252 | howpublished={\url{http://thrift.apache.org}}
253 | }
254 | @misc{HdfsThrift,
255 | AUTHOR="Apache",
256 | TITLE="{HDFS} {API}s in perl, python, ruby and php",
257 | howpublished={\url{http://wiki.apache.org/hadoop/HDFS-APIs}}
258 | }
259 | @misc{Sqoop,
260 | AUTHOR="Apache",
261 | TITLE="Sqoop",
262 | howpublished={\url{http://sqoop.apache.org}}
263 | }
264 | @misc{Flume,
265 | AUTHOR="Apache",
266 | TITLE="Flume",
267 | howpublished={\url{http://flume.apache.org}}
268 | }
269 | @misc{Kafka,
270 | AUTHOR="Apache",
271 | TITLE="Kafka",
272 | howpublished={\url{http://kafka.apache.org}}
273 | }
274 | @misc{Chukwa,
275 | AUTHOR="Apache",
276 | TITLE="Chukwa",
277 | howpublished={\url{http://chukwa.apache.org}}
278 | }
279 | @misc{Storm,
280 | AUTHOR="Apache",
281 | TITLE="Storm",
282 | howpublished={\url{http://storm.apache.org}}
283 | }
284 | @misc{HdfsShell,
285 | AUTHOR="Apache",
286 | TITLE="File System Shell Guide",
287 | howpublished={\url{http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html}}
288 | }
289 | @inproceedings{Pavlo:2009:CAL,
290 | author = {Pavlo, Andrew and Paulson, Erik and Rasin, Alexander and Abadi, Daniel J. and DeWitt, David J. and Madden, Samuel and Stonebraker, Michael},
291 | title = {A Comparison of Approaches to Large-scale Data Analysis},
292 | booktitle = {Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data},
293 | series = {SIGMOD '09},
294 | year = {2009},
295 | location = {Providence, Rhode Island, USA},
296 | pages = {165--178}
297 | }
298 | @article{DeWitt:1992:PDS,
299 | author = {DeWitt, David and Gray, Jim},
300 | title = {Parallel Database Systems: The Future of High Performance Database Systems},
301 | journal = {Commun. ACM},
302 | issue_date = {June 1992},
303 | volume = {35},
304 | number = {6},
305 | month = jun,
306 | year = {1992},
307 | pages = {85--98}
308 | }
309 | @book{Jain:1988:ACD,
310 | author = {Jain, Anil K. and Dubes, Richard C.},
311 | title = {Algorithms for Clustering Data},
312 | year = {1988},
313 | publisher = {Prentice-Hall, Inc.}
314 | }
315 | @misc{MapReduceTutorial,
316 | AUTHOR="Apache",
317 | TITLE="MapReduce Tutorial",
318 | howpublished={\url{http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html}}
319 | }
320 | @inproceedings{Zaharia:2010:SCC,
321 | author = {Zaharia, Matei and Chowdhury, Mosharaf and Franklin, Michael J. and Shenker, Scott and Stoica, Ion},
322 | title = {Spark: Cluster Computing with Working Sets},
323 | booktitle = {Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing},
324 | series = {HotCloud'10},
325 | year = {2010},
326 | location = {Boston, MA},
327 | }
328 | @inproceedings{Zaharia:2012:RDD,
329 | author = {Zaharia, Matei and Chowdhury, Mosharaf and Das, Tathagata and Dave, Ankur and Ma, Justin and McCauley, Murphy and Franklin, Michael J. and Shenker, Scott and Stoica, Ion},
330 | title = {Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing},
331 | booktitle = {Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation},
332 | series = {NSDI'12},
333 | year = {2012},
334 | location = {San Jose, CA},
335 | }
336 | @misc{Spark,
337 | AUTHOR="Apache",
338 | TITLE="Spark",
339 | howpublished={\url{http://spark.apache.org}}
340 | }
341 | @inproceedings{Rowstron:2012:NEG,
342 | author = {Rowstron, Antony and Narayanan, Dushyanth and Donnelly, Austin and O'Shea, Greg and Douglas, Andrew},
343 | title = {Nobody Ever Got Fired for Using {Hadoop} on a Cluster},
344 | booktitle = {Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing},
345 | series = {HotCDP '12},
346 | year = {2012},
347 | location = {Bern, Switzerland},
348 | pages = {1--5}
349 | }
350 | @inproceedings{Isard:2007:DDD,
351 | author = {Isard, Michael and Budiu, Mihai and Yu, Yuan and Birrell, Andrew and Fetterly, Dennis},
352 | title = {Dryad: Distributed Data-parallel Programs from Sequential Building Blocks},
353 | booktitle = {Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007},
354 | series = {EuroSys '07},
355 | year = {2007},
356 | location = {Lisbon, Portugal},
357 | pages = {59--72}
358 | }
359 | @misc{Tachyon,
360 | AUTHOR="Tachyon Team",
361 | TITLE="Tachyon Project",
362 | howpublished={\url{http://tachyon-project.org}}
363 | }
364 | @inproceedings{Brewer:2000:TRD,
365 | author = {Brewer, Eric A.},
366 | title = {Towards Robust Distributed Systems (Abstract)},
367 | booktitle = {Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing},
368 | series = {PODC '00},
369 | year = {2000},
370 | location = {Portland, Oregon, USA},
371 | }
372 | @article{Gilbert:2002:BCF,
373 | author = {Gilbert, Seth and Lynch, Nancy},
374 | title = {Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web Services},
375 | journal = {SIGACT News},
376 | issue_date = {June 2002},
377 | volume = {33},
378 | number = {2},
379 | month = jun,
380 | year = {2002},
381 | pages = {51--59}
382 | }
383 | @article{Brewer:2012,
384 | author = {Brewer, Eric A.},
385 | title = {{CAP} Twelve Years Later: How the ``Rules'' Have Changed},
386 | journal = {Computer},
387 | volume = {45},
388 | number = {2},
389 | month = Feb,
390 | year = {2012},
391 | pages = {23--29}
392 | }
393 | @misc{ZooKeeper,
394 | AUTHOR="Apache",
395 | TITLE="{ZooKeeper}",
396 | howpublished={\url{http://zookeeper.apache.org}}
397 | }
398 | @inproceedings{Reed:2008:STO,
399 | author = {Reed, Benjamin and Junqueira, Flavio P.},
400 | title = {A Simple Totally Ordered Broadcast Protocol},
401 | booktitle = {Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware},
402 | series = {LADIS '08},
403 | year = {2008},
404 | location = {Yorktown Heights, New York}
405 | }
406 | @article{Lamport:1998:PP,
407 | author = {Lamport, Leslie},
408 | title = {The Part-time Parliament},
409 | journal = {ACM Trans. Comput. Syst.},
410 | issue_date = {May 1998},
411 | volume = {16},
412 | number = {2},
413 | month = may,
414 | year = {1998},
415 | pages = {133--169}
416 | }
417 | @book{opac:2009,
418 | title = "Principles of transaction processing",
419 | author = "Bernstein, Philip A. and Newcomer, Eric",
420 | series = "The Morgan Kaufmann series in data management systems",
421 | edition = "Second",
422 | publisher = "Morgan Kaufmann Publishers",
423 | address = "Burlington, MA",
424 | year = 2009
425 | }
426 | @inproceedings{Chang:2006:BDS,
427 | author = {Chang, Fay and Dean, Jeffrey and Ghemawat, Sanjay and Hsieh, Wilson C. and Wallach, Deborah A. and Burrows, Mike and Chandra, Tushar and Fikes, Andrew and Gruber, Robert E.},
428 | title = {Bigtable: A Distributed Storage System for Structured Data},
429 | booktitle = {Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7},
430 | series = {OSDI '06},
431 | year = {2006},
432 | location = {Seattle, WA}
433 | }
434 | @misc{HBase,
435 | AUTHOR="Apache",
436 | TITLE="{HBase}",
437 | howpublished={\url{http://hbase.apache.org}}
438 | }
439 | @misc{Accumulo,
440 | AUTHOR="Apache",
441 | TITLE="Accumulo",
442 | howpublished={\url{http://accumulo.apache.org}}
443 | }
444 | @article{Bloom:1970:STH,
445 | author = {Bloom, Burton H.},
446 | title = {Space/Time Trade-offs in Hash Coding with Allowable Errors},
447 | journal = {Commun. ACM},
448 | issue_date = {July 1970},
449 | volume = {13},
450 | number = {7},
451 | month = jul,
452 | year = {1970},
453 | pages = {422--426}
454 | }
455 | @misc{HBaseMTTR,
456 | AUTHOR="Nicolas Liochon",
457 | TITLE="Introduction to {HBase} Mean Time to Recovery ({MTTR})",
458 | howpublished={\url{http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/}}
459 | }
460 | @misc{HBaseCoprocessor,
461 | AUTHOR="Mingjie Lai and Eugene Koontz and Andrew Purtell",
462 | TITLE="Coprocessor Introduction",
463 | howpublished={\url{http://blogs.apache.org/hbase/entry/coprocessor_introduction}}
464 | }
465 | @misc{Riak,
466 | AUTHOR="Basho",
467 | TITLE="Riak",
468 | howpublished={\url{http://basho.com/riak/}}
469 | }
470 | @inproceedings{DeCandia:2007:DAH,
471 | author = {DeCandia, Giuseppe and Hastorun, Deniz and Jampani, Madan and Kakulapati, Gunavardhan and Lakshman, Avinash and Pilchin, Alex and Sivasubramanian, Swaminathan and Vosshall, Peter and Vogels, Werner},
472 | title = {Dynamo: {Amazon's} Highly Available Key-value Store},
473 | booktitle = {Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles},
474 | series = {SOSP '07},
475 | year = {2007},
476 | location = {Stevenson, Washington, USA},
477 | pages = {205--220}
478 | }
479 | @misc{LevelDB,
480 | AUTHOR="Jeffrey Dean and Sanjay Ghemawat",
481 | TITLE="{LevelDB}",
482 | howpublished={\url{http://leveldb.org}}
483 | }
484 | @misc{Bitcask,
485 | AUTHOR= "Justin Sheehy and David Smith",
486 | TITLE="Bitcask: A Log-Structured Hash Table for Fast Key/Value Data",
487 | howpublished={\url{http://github.com/basho/basho_docs/raw/master/source/data/bitcask-intro.pdf}}
488 | }
489 | @techreport{CRDT2011,
490 | author = {Marc Shapiro and Nuno Preguiça and Carlos Baquero and Marek Zawirski},
491 | title = {A comprehensive study of Convergent and Commutative Replicated Data Types},
492 | year = {2011},
493 | number = {RR-7506},
494 | institution = {INRIA}
495 | }
496 | @INPROCEEDINGS{Mattern89virtualtime,
497 | author = {Friedemann Mattern},
498 | title = {Virtual Time and Global States of Distributed Systems},
499 | booktitle = {Parallel and Distributed Algorithms},
500 | year = {1989},
501 | pages = {215--226}
502 | }
503 | @article{fidge1988timestamps,
504 | author = {Fidge, C. J.},
505 | journal = {Proceedings of the 11th Australian Computer Science Conference},
506 | number = 1,
507 | pages = {56--66},
508 | title = {Timestamps in message-passing systems that preserve the partial ordering},
509 | volume = 10,
510 | year = 1988
511 | }
512 | @inproceedings{Preguica:2012:BAE,
513 | author = {Pregui\c{c}a, Nuno and Baquero, Carlos and Almeida, Paulo S{\'e}rgio and Fonte, Victor and Gon\c{c}alves, Ricardo},
514 | title = {Brief Announcement: Efficient Causality Tracking in Distributed Storage Systems with Dotted Version Vectors},
515 | booktitle = {Proceedings of the 2012 ACM Symposium on Principles of Distributed Computing},
516 | series = {PODC '12},
517 | year = {2012},
518 | location = {Madeira, Portugal},
519 | pages = {335--336}
520 | }
521 | @misc{Cassandra,
522 | AUTHOR="Apache",
523 | TITLE="Cassandra",
524 | howpublished={\url{http://cassandra.apache.org}}
525 | }
526 | @article{Lakshman:2010:CDS,
527 | author = {Lakshman, Avinash and Malik, Prashant},
528 | title = {Cassandra: A Decentralized Structured Storage System},
529 | journal = {SIGOPS Oper. Syst. Rev.},
530 | issue_date = {April 2010},
531 | volume = {44},
532 | number = {2},
533 | month = apr,
534 | year = {2010},
535 | pages = {35--40}
536 | }
537 | @MISC{O'Neil96thelog-structured,
538 | author = {Patrick O'Neil and Edward Cheng and Dieter Gawlick and Elizabeth O'Neil},
539 | title = {The Log-Structured Merge-Tree ({LSM}-Tree)},
540 | year = {1996}
541 | }
542 | @misc{MongoDB,
543 | AUTHOR="MongoDB",
544 | TITLE="{MongoDB}",
545 | howpublished={\url{https://www.mongodb.com}}
546 | }
547 | @misc{BSON,
548 | AUTHOR="MongoDB",
549 | TITLE="{BSON}",
550 | howpublished={\url{http://bsonspec.org}}
551 | }
552 | @misc{WiredTiger,
553 | AUTHOR="MongoDB",
554 | TITLE="{WiredTiger}",
555 | howpublished={\url{http://www.wiredtiger.com}}
556 | }
557 | @misc{ClouderaImpala2014,
558 | AUTHOR="Cloudera",
559 | TITLE="New {SQL} Choices in the {Apache Hadoop} Ecosystem: Why {Impala} Continues to Lead",
560 | howpublished={\url{http://blog.cloudera.com/blog/2014/05/new-sql-choices-in-the-apache-hadoop-ecosystem-why-impala-continues-to-lead/}}
561 | }
562 | @misc{AMPLabBenchmark2014,
563 | AUTHOR="AMPLab",
564 | TITLE="Big Data Benchmark",
565 | howpublished={\url{http://amplab.cs.berkeley.edu/benchmark/}}
566 | }
567 |
--------------------------------------------------------------------------------
/docs/bigdata.css:
--------------------------------------------------------------------------------
1 |
2 | /* start css.sty */
3 | .cmr-10{font-size:83%;}
4 | .cmr-17x-x-120{font-size:170%;}
5 | .cmr-12x-x-120{font-size:120%;}
6 | .pcrr7t-x-x-120{font-family: monospace;}
7 | .cmmi-12{font-style: italic;}
8 | .cmmi-8{font-size:66%;font-style: italic;}
9 | .cmr-10x-x-109{font-size:90%;}
10 | .cmti-12{ font-style: italic;}
11 | .cmbx-12{ font-weight: bold;}
12 | .pcrr7t-x-x-109{font-size:90%;font-family: monospace;}
13 | .pcrr7t-{font-size:83%;font-family: monospace;}
14 | .pcrb7t-x-x-120{ font-family: monospace; font-weight: bold;}
15 | p.noindent { text-indent: 0em }
16 | td p.noindent { text-indent: 0em; margin-top:0em; }
17 | p.nopar { text-indent: 0em; }
18 | p.indent{ text-indent: 1.5em }
19 | @media print {div.crosslinks {visibility:hidden;}}
20 | a img { border-top: 0; border-left: 0; border-right: 0; }
21 | center { margin-top:1em; margin-bottom:1em; }
22 | td center { margin-top:0em; margin-bottom:0em; }
23 | .Canvas { position:relative; }
24 | img.math{vertical-align:middle;}
25 | li p.indent { text-indent: 0em }
26 | li p:first-child{ margin-top:0em; }
27 | li p:last-child, li div:last-child { margin-bottom:0.5em; }
28 | li p~ul:last-child, li p~ol:last-child{ margin-bottom:0.5em; }
29 | .enumerate1 {list-style-type:decimal;}
30 | .enumerate2 {list-style-type:lower-alpha;}
31 | .enumerate3 {list-style-type:lower-roman;}
32 | .enumerate4 {list-style-type:upper-alpha;}
33 | div.newtheorem { margin-bottom: 2em; margin-top: 2em;}
34 | .obeylines-h,.obeylines-v {white-space: nowrap; }
35 | div.obeylines-v p { margin-top:0; margin-bottom:0; }
36 | .overline{ text-decoration:overline; }
37 | .overline img{ border-top: 1px solid black; }
38 | td.displaylines {text-align:center; white-space:nowrap;}
39 | .centerline {text-align:center;}
40 | .rightline {text-align:right;}
41 | div.verbatim {font-family: monospace; white-space: nowrap; text-align:left; clear:both; }
42 | .fbox {padding-left:3.0pt; padding-right:3.0pt; text-indent:0pt; border:solid black 0.4pt; }
43 | div.fbox {display:table}
44 | div.center div.fbox {text-align:center; clear:both; padding-left:3.0pt; padding-right:3.0pt; text-indent:0pt; border:solid black 0.4pt; }
45 | div.minipage{width:100%;}
46 | div.center, div.center div.center {text-align: center; margin-left:1em; margin-right:1em;}
47 | div.center div {text-align: left;}
48 | div.flushright, div.flushright div.flushright {text-align: right;}
49 | div.flushright div {text-align: left;}
50 | div.flushleft {text-align: left;}
51 | .underline{ text-decoration:underline; }
52 | .underline img{ border-bottom: 1px solid black; margin-bottom:1pt; }
53 | .framebox-c, .framebox-l, .framebox-r { padding-left:3.0pt; padding-right:3.0pt; text-indent:0pt; border:solid black 0.4pt; }
54 | .framebox-c {text-align:center;}
55 | .framebox-l {text-align:left;}
56 | .framebox-r {text-align:right;}
57 | span.thank-mark{ vertical-align: super }
58 | span.footnote-mark sup.textsuperscript, span.footnote-mark a sup.textsuperscript{ font-size:80%; }
59 | div.tabular, div.center div.tabular {text-align: center; margin-top:0.5em; margin-bottom:0.5em; }
60 | table.tabular td p{margin-top:0em;}
61 | table.tabular {margin-left: auto; margin-right: auto;}
62 | td p:first-child{ margin-top:0em; }
63 | td p:last-child{ margin-bottom:0em; }
64 | div.td00{ margin-left:0pt; margin-right:0pt; }
65 | div.td01{ margin-left:0pt; margin-right:5pt; }
66 | div.td10{ margin-left:5pt; margin-right:0pt; }
67 | div.td11{ margin-left:5pt; margin-right:5pt; }
68 | table[rules] {border-left:solid black 0.4pt; border-right:solid black 0.4pt; }
69 | td.td00{ padding-left:0pt; padding-right:0pt; }
70 | td.td01{ padding-left:0pt; padding-right:5pt; }
71 | td.td10{ padding-left:5pt; padding-right:0pt; }
72 | td.td11{ padding-left:5pt; padding-right:5pt; }
74 | .hline hr, .cline hr{ height : 1px; margin:0px; }
75 | .tabbing-right {text-align:right;}
76 | span.TEX {letter-spacing: -0.125em; }
77 | span.TEX span.E{ position:relative;top:0.5ex;left:-0.0417em;}
78 | a span.TEX span.E {text-decoration: none; }
79 | span.LATEX span.A{ position:relative; top:-0.5ex; left:-0.4em; font-size:85%;}
80 | span.LATEX span.TEX{ position:relative; left: -0.4em; }
81 | div.float, div.figure {margin-left: auto; margin-right: auto;}
82 | div.float img {text-align:center;}
83 | div.figure img {text-align:center;}
84 | .marginpar {width:20%; float:right; text-align:left; margin-left:auto; margin-top:0.5em; font-size:85%; text-decoration:underline;}
85 | .marginpar p{margin-top:0.4em; margin-bottom:0.4em;}
86 | table.equation {width:100%;}
87 | .equation td{text-align:center; }
88 | td.equation { margin-top:1em; margin-bottom:1em; }
89 | td.equation-label { width:5%; text-align:center; }
90 | td.eqnarray4 { width:5%; white-space: normal; }
91 | td.eqnarray2 { width:5%; }
92 | table.eqnarray-star, table.eqnarray {width:100%;}
93 | div.eqnarray{text-align:center;}
94 | div.array {text-align:center;}
95 | div.pmatrix {text-align:center;}
96 | table.pmatrix {width:100%;}
97 | span.pmatrix img{vertical-align:middle;}
98 | div.pmatrix {text-align:center;}
99 | table.pmatrix {width:100%;}
100 | span.bar-css {text-decoration:overline;}
101 | img.cdots{vertical-align:middle;}
102 | .partToc a, .partToc, .likepartToc a, .likepartToc {line-height: 200%; font-weight:bold; font-size:110%;}
103 | .chapterToc a, .chapterToc, .likechapterToc a, .likechapterToc, .appendixToc a, .appendixToc {line-height: 200%; font-weight:bold;}
104 | .index-item, .index-subitem, .index-subsubitem {display:block}
105 | div.caption {text-indent:-2em; margin-left:3em; margin-right:1em; text-align:left;}
106 | div.caption span.id{font-weight: bold; white-space: nowrap; }
107 | h1.partHead{text-align: center}
108 | p.bibitem { text-indent: -2em; margin-left: 2em; margin-top:0.6em; margin-bottom:0.6em; }
109 | p.bibitem-p { text-indent: 0em; margin-left: 2em; margin-top:0.6em; margin-bottom:0.6em; }
110 | .paragraphHead, .likeparagraphHead { margin-top:2em; font-weight: bold;}
111 | .subparagraphHead, .likesubparagraphHead { font-weight: bold;}
112 | .quote {margin-bottom:0.25em; margin-top:0.25em; margin-left:1em; margin-right:1em; text-align:justify;}
113 | .verse{white-space:nowrap; margin-left:2em}
114 | div.maketitle {text-align:center;}
115 | h2.titleHead{text-align:center;}
116 | div.maketitle{ margin-bottom: 2em; }
117 | div.author, div.date {text-align:center;}
118 | div.thanks{text-align:left; margin-left:10%; font-size:85%; font-style:italic; }
119 | div.author{white-space: nowrap;}
120 | .quotation {margin-bottom:0.25em; margin-top:0.25em; margin-left:1em; }
121 | h1.partHead{text-align: center}
122 | .chapterToc, .likechapterToc {margin-left:0em;}
123 | .chapterToc ~ .likesectionToc, .chapterToc ~ .sectionToc, .likechapterToc ~ .likesectionToc, .likechapterToc ~ .sectionToc {margin-left:2em;}
124 | .chapterToc ~ .likesectionToc ~ .likesubsectionToc, .chapterToc ~ .likesectionToc ~ .subsectionToc, .chapterToc ~ .sectionToc ~ .likesubsectionToc, .chapterToc ~ .sectionToc ~ .subsectionToc, .likechapterToc ~ .likesectionToc ~ .likesubsectionToc, .likechapterToc ~ .likesectionToc ~ .subsectionToc, .likechapterToc ~ .sectionToc ~ .likesubsectionToc, .likechapterToc ~ .sectionToc ~ .subsectionToc {margin-left:4em;}
125 | .chapterToc ~ .likesectionToc ~ .likesubsectionToc ~ .likesubsubsectionToc, .chapterToc ~ .likesectionToc ~ .likesubsectionToc ~ .subsubsectionToc, .chapterToc ~ .likesectionToc ~ .subsectionToc ~ .likesubsubsectionToc, .chapterToc ~ .likesectionToc ~ .subsectionToc ~ .subsubsectionToc, .chapterToc ~ .sectionToc ~ .likesubsectionToc ~ .likesubsubsectionToc, .chapterToc ~ .sectionToc ~ .likesubsectionToc ~ .subsubsectionToc, .chapterToc ~ .sectionToc ~ .subsectionToc ~ .likesubsubsectionToc, .chapterToc ~ .sectionToc ~ .subsectionToc ~ .subsubsectionToc, .likechapterToc ~ .likesectionToc ~ .likesubsectionToc ~ .likesubsubsectionToc, .likechapterToc ~ .likesectionToc ~ .likesubsectionToc ~ .subsubsectionToc, .likechapterToc ~ .likesectionToc ~ .subsectionToc ~ .likesubsubsectionToc, .likechapterToc ~ .likesectionToc ~ .subsectionToc ~ .subsubsectionToc, .likechapterToc ~ .sectionToc ~ .likesubsectionToc ~ .likesubsubsectionToc, .likechapterToc ~ .sectionToc ~ .likesubsectionToc ~ .subsubsectionToc, .likechapterToc ~ .sectionToc ~ .subsectionToc ~ .likesubsubsectionToc, .likechapterToc ~ .sectionToc ~ .subsectionToc ~ .subsubsectionToc {margin-left:6em;}
126 | .likesectionToc , .sectionToc {margin-left:0em;}
127 | .likesectionToc ~ .likesubsectionToc, .likesectionToc ~ .subsectionToc, .sectionToc ~ .likesubsectionToc, .sectionToc ~ .subsectionToc {margin-left:2em;}
128 | .likesectionToc ~ .likesubsectionToc ~ .likesubsubsectionToc, .likesectionToc ~ .likesubsectionToc ~ .subsubsectionToc, .likesectionToc ~ .subsectionToc ~ .likesubsubsectionToc, .likesectionToc ~ .subsectionToc ~ .subsubsectionToc, .sectionToc ~ .likesubsectionToc ~ .likesubsubsectionToc, .sectionToc ~ .likesubsectionToc ~ .subsubsectionToc, .sectionToc ~ .subsectionToc ~ .likesubsubsectionToc, .sectionToc ~ .subsectionToc ~ .subsubsectionToc {margin-left:4em;}
129 | .likesubsectionToc, .subsectionToc {margin-left:0em;}
130 | .likesubsectionToc ~ .subsubsectionToc, .subsectionToc ~ .subsubsectionToc {margin-left:2em;}
131 | .figure img.graphics {margin-left:10%;}
132 | .lstlisting .label{margin-right:0.5em; }
133 | div.lstlisting{font-family: monospace; white-space: nowrap; margin-top:0.5em; margin-bottom:0.5em; }
134 | div.lstinputlisting{ font-family: monospace; white-space: nowrap; }
135 | .lstinputlisting .label{margin-right:0.5em;}
136 | /* end css.sty */
137 |
138 |
--------------------------------------------------------------------------------
/docs/bigdata10.html:
--------------------------------------------------------------------------------
1 Google BigQuery is the public implementation of Dremel.
BigQuery provides the core set of features available in Dremel to
third-party developers via a REST API.
1 In the asynchronous model, there is no clock, and nodes must
make decisions based only on the messages received and local computation.
In the asynchronous model, an algorithm has no way of determining
whether a message has been lost or has been arbitrarily delayed in the
transmission channel.
2 In a partially synchronous model, every node has a clock, and all
clocks increase at the same rate. However, the clocks themselves are not
synchronized, in that they may display different values at the same real
time.
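The partially synchronous model described in the footnote above can be illustrated with a minimal Python sketch. The `NodeClock` class, its fields, and the numeric offsets are hypothetical and chosen only for illustration; they do not come from the original text.

```python
from dataclasses import dataclass


@dataclass
class NodeClock:
    """Hypothetical local clock: a fixed offset plus a shared tick rate."""
    offset: float      # differs per node: clocks are not synchronized
    rate: float = 1.0  # identical for all nodes: clocks increase at the same rate

    def read(self, real_time: float) -> float:
        """Value this clock displays at the given real time."""
        return self.offset + self.rate * real_time


a = NodeClock(offset=5.0)
b = NodeClock(offset=-2.5)

# At the same real time the two clocks display different values ...
skew_at_10 = a.read(10.0) - b.read(10.0)
# ... but the difference stays constant, because both tick at the same rate.
skew_at_100 = a.read(100.0) - b.read(100.0)

# Each node therefore measures the same elapsed time between two real instants.
elapsed_a = a.read(100.0) - a.read(10.0)
elapsed_b = b.read(100.0) - b.read(10.0)
```

Under these assumptions the nodes disagree on the current time but agree on durations, which is what lets a node apply a local timeout in this model.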
3 Once the system enters partition mode, one approach is to limit
some operations, thereby reducing availability. The alternative allows
inconsistency but records extra information about the operations that will
be helpful during partition recovery.
5 Earlier versions used the MapFile format. A MapFile is actually
a directory that contains two SequenceFiles: the data file and the index file.
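The split into a data file and an index file can be sketched in Python as follows. This is an in-memory illustration of the general idea (sorted records plus a sparse index), not Hadoop's on-disk format or API; the names `build`, `lookup`, and `INDEX_INTERVAL`, and the interval value, are assumptions made for the example.

```python
import bisect

INDEX_INTERVAL = 2  # hypothetical: index every 2nd record


def build(records):
    """Return (data, index): sorted records and a sparse key -> position index."""
    data = sorted(records)  # stands in for the data file: pairs in key order
    index = [(data[i][0], i) for i in range(0, len(data), INDEX_INTERVAL)]
    return data, index


def lookup(data, index, key):
    """Find the value for key: search the small index, then scan a short run of data."""
    keys = [k for k, _ in index]
    slot = bisect.bisect_right(keys, key) - 1  # last indexed key <= key
    if slot < 0:
        return None  # key sorts before the first record
    start = index[slot][1]
    for k, v in data[start:start + INDEX_INTERVAL]:
        if k == key:
            return v
    return None


data, index = build([("d", 4), ("a", 1), ("c", 3), ("b", 2), ("e", 5)])
value_d = lookup(data, index, "d")
value_missing = lookup(data, index, "z")
```

Only the index needs to be searched in full; the lookup then reads at most `INDEX_INTERVAL` records from the data.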
2 Facebook is even testing data mining methods that would
silently follow users’ mouse movements to see not only where we
click but even where we pause, where we hover and for how long
[77].
3 Apache Thrift [20] is a software framework for scalable
cross-language services development. It combines a software stack with a
code generation engine to build services that work efficiently and
seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl,
C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other
languages.