├── .gitignore
├── Makefile
├── common
│   └── pdf-template.tex
├── en
│   ├── mongodb.markdown
│   ├── title.png
│   ├── title.psd
│   └── title.txt
├── readme.markdown
└── zh-cn
    ├── mongodb.markdown
    ├── title.png
    ├── title.psd
    └── title.txt

/.gitignore:
--------------------------------------------------------------------------------
# Output books

*.pdf
*.epub
*.mobi
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
SOURCE_FILE_NAME = mongodb.markdown
BOOK_FILE_NAME = mongodb

# pandoc 2.0 and later renamed --latex-engine to --pdf-engine.
PDF_BUILDER = pandoc
PDF_BUILDER_FLAGS = \
	--latex-engine xelatex \
	--template ../common/pdf-template.tex \
	--listings

# The first file in $^ (en/title.png) becomes the argument of --epub-cover-image.
EPUB_BUILDER = pandoc
EPUB_BUILDER_FLAGS = \
	--epub-cover-image

MOBI_BUILDER = kindlegen

.PHONY: clean

en/mongodb.pdf: en/mongodb.markdown en/title.png common/pdf-template.tex
	cd en && $(PDF_BUILDER) $(PDF_BUILDER_FLAGS) $(SOURCE_FILE_NAME) -o $(BOOK_FILE_NAME).pdf

en/mongodb.epub: en/title.png en/title.txt en/mongodb.markdown
	$(EPUB_BUILDER) $(EPUB_BUILDER_FLAGS) $^ -o $@

en/mongodb.mobi: en/mongodb.epub
	$(MOBI_BUILDER) $^

clean:
	rm -f */$(BOOK_FILE_NAME).pdf
	rm -f */$(BOOK_FILE_NAME).epub
	rm -f */$(BOOK_FILE_NAME).mobi
--------------------------------------------------------------------------------
/common/pdf-template.tex:
--------------------------------------------------------------------------------
\documentclass{book}


% Fonts and typography

%% Typography
\usepackage[no-math]{fontspec}
\defaultfontfeatures{Scale = MatchLowercase}

%% Fonts
\setmainfont[Ligatures=TeX]{Verdana}
\setsansfont[Ligatures=TeX]{Verdana}
\setmonofont{Consolas}

%% Set Sans font in headings
\usepackage{sectsty}
\allsectionsfont{\sffamily}

%% Set polyglossia language
\usepackage{polyglossia}
\setdefaultlanguage{english}


% Page

%% Use full page in book style
\usepackage{fullpage}

%% Set line spacing
\usepackage{setspace}
\setstretch{1.2}

%% Disable paragraph indentation
\usepackage{parskip}

%% Start sections from new page
\let\stdsection\section
\renewcommand\section{\newpage\stdsection}


% Colors

\usepackage{xcolor}

%% Tango color scheme
\definecolor{SkyBlue}{HTML}{3465A4}
\definecolor{DarkSkyBlue}{HTML}{204A87}

\definecolor{Plum}{HTML}{75507B}

\definecolor{ScarletRed}{HTML}{CC0000}

\definecolor{Aluminium1}{HTML}{EEEEEC}
\definecolor{Aluminium6}{HTML}{2e3436}

\definecolor{Black}{HTML}{000000}


% Listings
\usepackage{upquote}

\usepackage{listings}

\lstdefinelanguage{JavaScript}{
  keywords = {typeof, new, true, false, catch, function, return, null, switch, var, if, in, while, do, else, case, break},
  keywordstyle = \color{SkyBlue}\bfseries,
  ndkeywords = {class, export, boolean, throw, implements, import, this},
  ndkeywordstyle = \color{Aluminium6}\bfseries,
  identifierstyle = \color{Black},
  sensitive = true,
  comment = [l]{//},
  morecomment = [s]{/*}{*/},
  commentstyle = \color{Plum}\ttfamily,
  stringstyle = \color{ScarletRed}\ttfamily,
  morestring = [b]',
  morestring = [b]"
}

\lstset{
  language = JavaScript,
  backgroundcolor = \color{Aluminium1},
  extendedchars = true,
  basicstyle = \normalsize\ttfamily,
  showstringspaces = false,
  showspaces = false,
  tabsize = 1,
  breaklines = true,
  showtabs = false
}


% Links

%% Hyperref
\usepackage[colorlinks, breaklinks, bookmarks, xetex]{hyperref}

\hypersetup {
  linkcolor = DarkSkyBlue,
  citecolor = DarkSkyBlue,
  filecolor = DarkSkyBlue,
  urlcolor = DarkSkyBlue
}

%% Don’t use Mono font for URLs
\urlstyle{same}


% Images

\usepackage{graphicx}


% Pandoc hacks

%% Normal enumerates processing
\usepackage{enumerate}

%% Disable section numbers
\setcounter{secnumdepth}{0}


\begin{document}

% Title page

\thispagestyle{empty}

\vspace*{\fill}
\begin{center}
\includegraphics[width=0.7\textwidth]{title}
\end{center}
\vspace*{\fill}

\setcounter{page}{0}

% Book contents

$body$

\end{document}
--------------------------------------------------------------------------------
/en/mongodb.markdown:
--------------------------------------------------------------------------------
# About This Book #

## License ##
The Little MongoDB Book is licensed under the Attribution-NonCommercial 3.0 Unported license. **You should not have paid for this book.**

You are basically free to copy, distribute, modify or display the book. However, I ask that you always attribute the book to me, Karl Seguin, and do not use it for commercial purposes.

You can see the full text of the license at:



## About The Author ##
Karl Seguin is a developer with experience across various fields and technologies. He's an expert .NET and Ruby developer. He's a semi-active contributor to OSS projects, a technical writer and an occasional speaker. With respect to MongoDB, he was a core contributor to the C# MongoDB library NoRM, wrote the interactive tutorial [mongly](http://openmymind.net/mongly/) as well as the [Mongo Web Admin](https://github.com/karlseguin/Mongo-Web-Admin). His free service for casual game developers, [mogade.com](http://mogade.com/), is powered by MongoDB.

Karl has since written [The Little Redis Book](http://openmymind.net/2012/1/23/The-Little-Redis-Book/).

His blog can be found at , and he tweets via [@karlseguin](http://twitter.com/karlseguin).

## With Thanks To ##
A special thanks to [Perry Neal](http://twitter.com/perryneal) for lending me his eyes, mind and passion. You provided me with invaluable help. Thank you.

## Latest Version ##
This version was updated for MongoDB 2.6 by Asya Kamsky. The latest source of this book is available at:

.

# Introduction #
> It's not my fault the chapters are short; MongoDB is just easy to learn.

It is often said that technology moves at a blazing pace. It's true that there is an ever-growing list of new technologies and techniques being released. However, I've long been of the opinion that the fundamental technologies used by programmers move at a rather slow pace. One could spend years learning little yet remain relevant. What is striking, though, is the speed at which established technologies get replaced. Seemingly overnight, long-established technologies find themselves threatened by shifts in developer focus.

Nothing could be more representative of this sudden shift than the progress of NoSQL technologies against well-established relational databases. It almost seems like one day the web was being driven by a few RDBMSs, and the next, five or so NoSQL solutions had established themselves as worthy alternatives.

Even though these transitions seem to happen overnight, the reality is that they can take years to become accepted practice. The initial enthusiasm is driven by a relatively small set of developers and companies. Solutions are refined and lessons are learned; seeing that a new technology is here to stay, others slowly try it for themselves.
Again, this is particularly true in the case of NoSQL, where many solutions aren't replacements for more traditional storage solutions, but rather address a specific need in addition to what one might get from traditional offerings.

Having said all of that, the first thing we ought to do is explain what is meant by NoSQL. It's a broad term that means different things to different people. Personally, I use it very broadly to mean a system that plays a part in the storage of data. Put another way, NoSQL (again, for me) is the belief that your persistence layer isn't necessarily the responsibility of a single system. Where relational database vendors have historically tried to position their software as a one-size-fits-all solution, NoSQL leans towards smaller units of responsibility where the best tool for a given job can be leveraged. So, your NoSQL stack might still leverage a relational database, say MySQL, but it'll also contain Redis as a persistence lookup for specific parts of the system as well as Hadoop for your intensive data processing. Put simply, NoSQL is about being open and aware of alternative, existing and additional patterns and tools for managing your data.

You might be wondering where MongoDB fits into all of this. As a document-oriented database, MongoDB is a more generalized NoSQL solution. It should be viewed as an alternative to relational databases. Like relational databases, it too can benefit from being paired with some of the more specialized NoSQL solutions. MongoDB has advantages and drawbacks, which we'll cover in later parts of this book.

# Getting Started #
Most of this book will focus on core MongoDB functionality. We'll therefore rely on the MongoDB shell. While the shell is worth learning and is a useful administrative tool, your code will use a MongoDB driver.

This does bring up the first thing you should know about MongoDB: its drivers.
MongoDB has a [number of official drivers](http://docs.mongodb.org/ecosystem/drivers/) for various languages. These drivers can be thought of as the various database drivers you are probably already familiar with. On top of these drivers, the development community has built more language/framework-specific libraries. For example, [NoRM](https://github.com/atheken/NoRM) is a C# library which implements LINQ, and [MongoMapper](https://github.com/jnunemaker/mongomapper) is a Ruby library which is ActiveRecord-friendly. Whether you choose to program directly against the core MongoDB drivers or some higher-level library is up to you. I point this out only because many people new to MongoDB are confused as to why there are both official drivers and community libraries: the former generally focus on core communication/connectivity with MongoDB and the latter on more language- and framework-specific implementations.

As you read through this, I encourage you to play with MongoDB to replicate what I demonstrate as well as to explore questions that you might come up with on your own. It's easy to get up and running with MongoDB, so let's take a few minutes now to set things up.

1. Head over to the [official download page](http://www.mongodb.org/downloads) and grab the binaries from the first row (the recommended stable version) for your operating system of choice. For development purposes, you can pick either 32-bit or 64-bit.

2. Extract the archive (wherever you want) and navigate to the `bin` subfolder. Don't execute anything just yet, but know that `mongod` is the server process and `mongo` is the client shell; these are the two executables we'll be spending most of our time with.

3. Create a new text file in the `bin` subfolder named `mongodb.config`.

4. Add a single line to your `mongodb.config`: `dbpath=PATH_TO_WHERE_YOU_WANT_TO_STORE_YOUR_DATABASE_FILES`.
For example, on Windows you might do `dbpath=c:\mongodb\data` and on Linux you might do `dbpath=/var/lib/mongodb/data`.

5. Make sure the `dbpath` you specified exists.

6. Launch `mongod` with the `--config /path/to/your/mongodb.config` parameter.

As an example for Windows users, if you extracted the downloaded file to `c:\mongodb\` and you created `c:\mongodb\data\`, then within `c:\mongodb\bin\mongodb.config` you would specify `dbpath=c:\mongodb\data\`. You could then launch `mongod` from a command prompt via `c:\mongodb\bin\mongod --config c:\mongodb\bin\mongodb.config`.

Feel free to add the `bin` folder to your path to make all of this less verbose. Mac OS X and Linux users can follow almost identical directions. The only things you should have to change are the paths.

Hopefully you now have MongoDB up and running. If you get an error, read the output carefully; the server is quite good at explaining what's wrong.

You can now launch `mongo` (without the *d*), which will connect a shell to your running server. Try entering `db.version()` to make sure everything's working as it should. Hopefully you'll see the version number you installed.

# Chapter 1 - The Basics #
We begin our journey by getting to know the basic mechanics of working with MongoDB. Obviously this is core to understanding MongoDB, but it should also help us answer higher-level questions about where MongoDB fits.

To get started, there are six simple concepts we need to understand.

1. MongoDB has the same concept of a `database` with which you are likely already familiar (or a schema for you Oracle folks). Within a MongoDB instance you can have zero or more databases, each acting as a high-level container for everything else.

2. A database can have zero or more `collections`. A collection shares enough in common with a traditional `table` that you can safely think of the two as the same thing.

3. Collections are made up of zero or more `documents`. Again, a document can safely be thought of as a `row`.

4. A document is made up of one or more `fields`, which you can probably guess are a lot like `columns`.

5. `Indexes` in MongoDB function mostly like their RDBMS counterparts.

6. `Cursors` are different from the other five concepts, but they are important enough, and often overlooked, that I think they deserve their own discussion. The important thing to understand about cursors is that when you ask MongoDB for data, it returns a pointer to the result set, called a cursor, on which we can perform operations, such as counting or skipping ahead, before actually pulling down data.

To recap, MongoDB is made up of `databases` which contain `collections`. A `collection` is made up of `documents`. Each `document` is made up of `fields`. `Collections` can be `indexed`, which improves lookup and sorting performance. Finally, when we get data from MongoDB we do so through a `cursor` whose actual execution is delayed until necessary.

Why use new terminology (collection vs. table, document vs. row and field vs. column)? Is it just to make things more complicated? The truth is that while these concepts are similar to their relational database counterparts, they are not identical. The core difference comes from the fact that relational databases define `columns` at the `table` level whereas a document-oriented database defines its `fields` at the `document` level. That is to say that each `document` within a `collection` can have its own unique set of `fields`. As such, a `collection` is a dumbed-down container in comparison to a `table`, while a `document` has a lot more information than a `row`.

Although this is important to understand, don't worry if things aren't yet clear. It won't take more than a couple of inserts to see what this truly means.
Ultimately, the point is that a collection isn't strict about what goes in it (it's schema-less). Fields are tracked with each individual document. The benefits and drawbacks of this will be explored in a future chapter.

Let's get hands-on. If you don't have it running already, go ahead and start the `mongod` server as well as the `mongo` shell. The shell runs JavaScript. There are some global commands you can execute, like `help` or `exit`. Commands that you execute against the current database are executed against the `db` object, such as `db.help()` or `db.stats()`. Commands that you execute against a specific collection, which is what we'll be doing a lot of, are executed against the `db.COLLECTION_NAME` object, such as `db.unicorns.help()` or `db.unicorns.count()`.

Go ahead and enter `db.help()`; you'll get a list of commands that you can execute against the `db` object.

A small side note: because this is a JavaScript shell, if you execute a method and omit the parentheses `()`, you'll see the method body rather than executing the method. I only mention it so that the first time you do it and get a response that starts with `function (...){` you won't be surprised. For example, if you enter `db.help` (without the parentheses), you'll see the internal implementation of the `help` method.

First we'll use the global `use` helper to switch databases, so go ahead and enter `use learn`. It doesn't matter that the database doesn't really exist yet. The first collection that we create will also create the actual `learn` database. Now that you are inside a database, you can start issuing database commands, like `db.getCollectionNames()`. If you do so, you should get an empty array (`[ ]`). Since collections are schema-less, we don't explicitly need to create them. We can simply insert a document into a new collection.
To do so, use the `insert` command, supplying it with the document to insert:

    db.unicorns.insert({name: 'Aurora',
        gender: 'f', weight: 450})

The above line executes `insert` against the `unicorns` collection, passing it a single parameter. Internally, MongoDB uses a binary serialized JSON format called BSON. Externally, this means that we use JSON a lot, as is the case with our parameters. If we execute `db.getCollectionNames()` now, we'll actually see two collections: `unicorns` and `system.indexes`. The collection `system.indexes` is created once per database and contains the information on our database's indexes.

You can now use the `find` command against `unicorns` to return a list of documents:

    db.unicorns.find()

Notice that, in addition to the data you specified, there's an `_id` field. Every document must have a unique `_id` field. You can either generate one yourself or let MongoDB generate a value of type `ObjectId` for you. Most of the time you'll probably want to let MongoDB generate it. By default, the `_id` field is indexed, which explains why the `system.indexes` collection was created. You can look at `system.indexes`:

    db.system.indexes.find()

What you're seeing is the name of the index, the database and collection it was created against and the fields included in the index.

Now, back to our discussion about schema-less collections. Insert a totally different document into `unicorns`, such as:

    db.unicorns.insert({name: 'Leto',
        gender: 'm',
        home: 'Arrakeen',
        worm: false})

And, again, use `find` to list the documents. Once we know a bit more, we'll discuss this interesting behavior of MongoDB, but hopefully you are starting to understand why the more traditional terminology wasn't a good fit.
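If the idea of fields living on each document still feels abstract, the following plain-JavaScript sketch may help. It is a toy model, not how MongoDB works internally (the real server stores BSON and generates `ObjectId` values); the `ToyCollection` class and its counter-based ids are hypothetical stand-ins. It mimics just two rules from above: every document gets a unique `_id`, and documents in the same collection don't need to share fields.

```javascript
// A toy, in-memory stand-in for a collection (hypothetical; illustration only).
// Documents are plain objects, and each one carries its own set of fields.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1; // a simple counter instead of a real ObjectId
  }

  insert(doc) {
    // Shallow-copy the caller's object, putting _id first.
    const stored = { _id: undefined, ...doc };
    if (stored._id === undefined) {
      stored._id = this.nextId++; // generate an _id when none is supplied
    }
    // Every _id must be unique within the collection.
    if (this.docs.some((d) => d._id === stored._id)) {
      throw new Error(`duplicate _id: ${stored._id}`);
    }
    this.docs.push(stored);
    return stored._id;
  }

  find() {
    return this.docs.map((d) => ({ ...d }));
  }
}

const unicorns = new ToyCollection();
unicorns.insert({ name: 'Aurora', gender: 'f', weight: 450 });
// A completely different shape is accepted: there is no table-level schema.
unicorns.insert({ name: 'Leto', gender: 'm', home: 'Arrakeen', worm: false });

for (const doc of unicorns.find()) {
  console.log(Object.keys(doc).join(', '));
}
```

Running it prints a different list of field names for each document, which is exactly what a table with fixed columns wouldn't allow.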

## Mastering Selectors ##
In addition to the six concepts we've explored, there's one practical aspect of MongoDB you need to have a good grasp of before moving to more advanced topics: query selectors. A MongoDB query selector is like the `where` clause of an SQL statement. As such, you use it when finding, counting, updating and removing documents from collections. A selector is a JSON object, the simplest of which is `{}`, which matches all documents. If we wanted to find all female unicorns, we could use `{gender: 'f'}`.

Before delving too deeply into selectors, let's set up some data to play with. First, remove what we've put so far in the `unicorns` collection via `db.unicorns.remove({})`. Now, issue the following inserts to get some data we can play with (I suggest you copy and paste this):

    db.unicorns.insert({name: 'Horny',
        dob: new Date(1992, 2, 13, 7, 47),
        loves: ['carrot', 'papaya'],
        weight: 600,
        gender: 'm',
        vampires: 63});
    db.unicorns.insert({name: 'Aurora',
        dob: new Date(1991, 0, 24, 13, 0),
        loves: ['carrot', 'grape'],
        weight: 450,
        gender: 'f',
        vampires: 43});
    db.unicorns.insert({name: 'Unicrom',
        dob: new Date(1973, 1, 9, 22, 10),
        loves: ['energon', 'redbull'],
        weight: 984,
        gender: 'm',
        vampires: 182});
    db.unicorns.insert({name: 'Roooooodles',
        dob: new Date(1979, 7, 18, 18, 44),
        loves: ['apple'],
        weight: 575,
        gender: 'm',
        vampires: 99});
    db.unicorns.insert({name: 'Solnara',
        dob: new Date(1985, 6, 4, 2, 1),
        loves: ['apple', 'carrot',
            'chocolate'],
        weight: 550,
        gender: 'f',
        vampires: 80});
    db.unicorns.insert({name: 'Ayna',
        dob: new Date(1998, 2, 7, 8, 30),
        loves: ['strawberry', 'lemon'],
        weight: 733,
        gender: 'f',
        vampires: 40});
    db.unicorns.insert({name: 'Kenny',
        dob: new Date(1997, 6, 1, 10, 42),
        loves: ['grape', 'lemon'],
        weight: 690,
        gender: 'm',
        vampires: 39});
    db.unicorns.insert({name: 'Raleigh',
        dob: new Date(2005, 4, 3, 0, 57),
        loves: ['apple', 'sugar'],
        weight: 421,
        gender: 'm',
        vampires: 2});
    db.unicorns.insert({name: 'Leia',
        dob: new Date(2001, 9, 8, 14, 53),
        loves: ['apple', 'watermelon'],
        weight: 601,
        gender: 'f',
        vampires: 33});
    db.unicorns.insert({name: 'Pilot',
        dob: new Date(1997, 2, 1, 5, 3),
        loves: ['apple', 'watermelon'],
        weight: 650,
        gender: 'm',
        vampires: 54});
    db.unicorns.insert({name: 'Nimue',
        dob: new Date(1999, 11, 20, 16, 15),
        loves: ['grape', 'carrot'],
        weight: 540,
        gender: 'f'});
    db.unicorns.insert({name: 'Dunx',
        dob: new Date(1976, 6, 18, 18, 18),
        loves: ['grape', 'watermelon'],
        weight: 704,
        gender: 'm',
        vampires: 165});

Now that we have data, we can master selectors. `{field: value}` is used to find any documents where `field` is equal to `value`. `{field1: value1, field2: value2}` is how we do an `and` statement. The special operators `$lt`, `$lte`, `$gt`, `$gte` and `$ne` are used for less than, less than or equal, greater than, greater than or equal and not equal operations. For example, to get all male unicorns that weigh more than 700 pounds, we could do:

    db.unicorns.find({gender: 'm',
        weight: {$gt: 700}})
    //or (not quite the same thing, but for
    //demonstration purposes)
    db.unicorns.find({gender: {$ne: 'f'},
        weight: {$gte: 701}})

The `$exists` operator is used for matching the presence or absence of a field. For example, the following should return a single document:

    db.unicorns.find({
        vampires: {$exists: false}})

The `$in` operator is used for matching one of several values that we pass as an array, for example:

    db.unicorns.find({
        loves: {$in: ['apple', 'orange']}})

This returns any unicorn who loves 'apple' or 'orange'.

If we want to OR rather than AND several conditions on different fields, we use the `$or` operator and assign to it an array of selectors we want or'd:

    db.unicorns.find({gender: 'f',
        $or: [{loves: 'apple'},
              {weight: {$lt: 500}}]})

The above will return all female unicorns which either love apples or weigh less than 500 pounds.

There's something pretty neat going on in our last two examples. You might have already noticed, but the `loves` field is an array. MongoDB supports arrays as first-class objects. This is an incredibly handy feature. Once you start using it, you wonder how you ever lived without it. What's more interesting is how easy selecting based on an array value is: `{loves: 'watermelon'}` will return any document where `watermelon` is a value of `loves`.

There are more available operators than what we've seen so far. These are all described in the [Query Selectors](http://docs.mongodb.org/manual/reference/operator/query/#query-selectors) section of the MongoDB manual. What we've covered so far, though, are the basics you'll need to get started. They're also what you'll end up using most of the time.

We've seen how these selectors can be used with the `find` command. They can also be used with the `remove` command, which we've briefly looked at; the `count` command, which we haven't looked at but you can probably figure out; and the `update` command, which we'll spend more time with later on.
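The matching rules covered so far fit in a small predicate. What follows is a toy, plain-JavaScript model rather than MongoDB's query engine: `matches` and its helpers are hypothetical and handle only equality (including the array rule described above), `$lt`, `$lte`, `$gt`, `$gte`, `$ne`, `$exists`, `$in` and `$or`. Real MongoDB has many more rules (for example, around comparing values of different types) that this sketch deliberately ignores.

```javascript
// Toy model of query-selector matching (hypothetical helpers, not the real
// query engine). Handles equality plus $lt, $lte, $gt, $gte, $ne, $exists,
// $in and $or.

// Equality for the value types used here: primitives and Dates.
function sameValue(a, b) {
  if (a instanceof Date && b instanceof Date) return a.getTime() === b.getTime();
  return a === b;
}

// Only compare values of the same kind: numbers, strings or Dates.
function comparable(a, b) {
  return (typeof a === 'number' && typeof b === 'number') ||
    (typeof a === 'string' && typeof b === 'string') ||
    (a instanceof Date && b instanceof Date);
}

// An array field is tested element by element, so {loves: 'apple'}
// matches any document whose loves array contains 'apple'.
function candidates(value) {
  return Array.isArray(value) ? value : [value];
}

// {$gt: 700} is an operator document; 'm' is a plain value.
function isOperatorDoc(cond) {
  if (cond === null || typeof cond !== 'object') return false;
  if (Array.isArray(cond) || cond instanceof Date) return false;
  const keys = Object.keys(cond);
  return keys.length > 0 && keys.every((k) => k.startsWith('$'));
}

function fieldMatches(value, cond) {
  const vals = candidates(value);
  const eq = (expected) => vals.some((v) => sameValue(v, expected));
  if (!isOperatorDoc(cond)) return eq(cond);

  // Several operators on one field, e.g. {$gt: 1, $lt: 5}, must all hold.
  return Object.entries(cond).every(([op, arg]) => {
    const cmp = (test) => vals.some((v) => comparable(v, arg) && test(v, arg));
    switch (op) {
      case '$lt': return cmp((v, a) => v < a);
      case '$lte': return cmp((v, a) => v <= a);
      case '$gt': return cmp((v, a) => v > a);
      case '$gte': return cmp((v, a) => v >= a);
      case '$ne': return !eq(arg);
      case '$exists': return (value !== undefined) === arg;
      case '$in': return arg.some((expected) => eq(expected));
      default: throw new Error(`unsupported operator: ${op}`);
    }
  });
}

// Top-level conditions are ANDed; $or needs at least one sub-selector to match.
function matches(doc, selector) {
  return Object.entries(selector).every(([key, cond]) =>
    key === '$or'
      ? cond.some((sub) => matches(doc, sub))
      : fieldMatches(doc[key], cond));
}

// A few of the unicorns inserted above (dob omitted for brevity).
const unicorns = [
  { name: 'Horny', loves: ['carrot', 'papaya'], weight: 600, gender: 'm', vampires: 63 },
  { name: 'Aurora', loves: ['carrot', 'grape'], weight: 450, gender: 'f', vampires: 43 },
  { name: 'Unicrom', loves: ['energon', 'redbull'], weight: 984, gender: 'm', vampires: 182 },
  { name: 'Leia', loves: ['apple', 'watermelon'], weight: 601, gender: 'f', vampires: 33 },
  { name: 'Nimue', loves: ['grape', 'carrot'], weight: 540, gender: 'f' },
];

const names = (selector) =>
  unicorns.filter((u) => matches(u, selector)).map((u) => u.name);

console.log(names({ gender: 'm', weight: { $gt: 700 } }));
console.log(names({ vampires: { $exists: false } }));
console.log(names({ gender: 'f', $or: [{ loves: 'apple' }, { weight: { $lt: 500 } }] }));
```

Notice that the top-level keys of a selector are combined with an implicit AND, while `$or` is the only place where alternatives are tried.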

To select a document by the `ObjectId` that MongoDB generated for its `_id` field, replace `TheObjectId` below with an actual value from your `find` output:

    db.unicorns.find(
        {_id: ObjectId("TheObjectId")})

## In This Chapter ##
We haven't looked at the `update` command yet, or some of the fancier things we can do with `find`. However, we did get MongoDB up and running and looked briefly at the `insert` and `remove` commands (there isn't much more to them than what we've seen). We also introduced `find` and saw what MongoDB `selectors` are all about. We've had a good start and laid a solid foundation for things to come. Believe it or not, you actually know most of what you need to know to get started with MongoDB; it really is meant to be quick to learn and easy to use. I strongly urge you to play with your local copy before moving on. Insert different documents, possibly in new collections, and get familiar with different selectors. Use `find`, `count` and `remove`. After a few tries on your own, things that might have seemed awkward at first will hopefully fall into place.

# Chapter 2 - Updating #
In chapter 1 we introduced three of the four CRUD (create, read, update and delete) operations. This chapter is dedicated to the one we skipped over: `update`. It has a few surprising behaviors, which is why we dedicate a chapter to it.

## Update: Replace Versus $set ##
In its simplest form, `update` takes two parameters: the selector (where) to use and what updates to apply to fields. If Roooooodles had gained a bit of weight, you might expect that we should execute:

    db.unicorns.update({name: 'Roooooodles'},
        {weight: 590})

(If you've played with your `unicorns` collection and it doesn't have the original data anymore, go ahead and `remove` all documents and re-insert from the code in chapter 1.)
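Since the rest of this chapter is about how `update` treats its second (and third) parameter, it may help to have those rules in one place. The `toyUpdate` function below is a hypothetical, plain-JavaScript model, not MongoDB's implementation; to stay short, its selectors only support exact equality on top-level fields. It mirrors the behaviors this chapter describes: a document without operators replaces everything except `_id`, `$set`/`$inc`/`$push` touch only the named fields, only the first match is updated unless `multi` is set, and `upsert` inserts when nothing matches.

```javascript
// Toy model of update semantics (hypothetical helper, not MongoDB's code).
// Selectors here only support exact equality on top-level fields.
function matchesExactly(doc, selector) {
  return Object.entries(selector).every(([k, v]) => doc[k] === v);
}

// True when the update document uses operators such as $set or $inc.
function hasOperators(update) {
  return Object.keys(update).some((k) => k.startsWith('$'));
}

function applyOperators(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] ?? 0) + amount; // a missing field starts at 0
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    doc[field] = (doc[field] ?? []).concat([value]); // a missing field becomes an array
  }
}

// Returns the number of documents modified or inserted.
function toyUpdate(docs, selector, update, options = {}) {
  const targets = docs.filter((d) => matchesExactly(d, selector));
  // Without multi, only the first matching document is touched.
  const chosen = options.multi ? targets : targets.slice(0, 1);

  for (const doc of chosen) {
    if (hasOperators(update)) {
      applyOperators(doc, update);
    } else {
      // No operators: replace every field except _id.
      for (const key of Object.keys(doc)) {
        if (key !== '_id') delete doc[key];
      }
      Object.assign(doc, update);
    }
  }

  if (chosen.length === 0 && options.upsert) {
    // Upsert with operators: start from the selector's fields, then apply
    // the operators. Upsert with a plain document: insert that document.
    const fresh = hasOperators(update) ? { ...selector } : { ...update };
    if (hasOperators(update)) applyOperators(fresh, update);
    docs.push(fresh);
    return 1;
  }
  return chosen.length;
}

const unicorns = [
  { _id: 1, name: 'Roooooodles', loves: ['apple'], weight: 575, gender: 'm', vampires: 99 },
  { _id: 2, name: 'Pilot', loves: ['apple', 'watermelon'], weight: 650, gender: 'm', vampires: 54 },
];

// Replacement: Roooooodles is left with only _id and weight.
toyUpdate(unicorns, { name: 'Roooooodles' }, { weight: 590 });
// $inc only touches the named field.
toyUpdate(unicorns, { name: 'Pilot' }, { $inc: { vampires: -2 } });
// Without multi, an empty selector updates just one document.
toyUpdate(unicorns, {}, { $set: { vaccinated: true } });

// The hit counter: the first call inserts, the second increments.
const hits = [];
toyUpdate(hits, { page: 'unicorns' }, { $inc: { hits: 1 } }, { upsert: true });
toyUpdate(hits, { page: 'unicorns' }, { $inc: { hits: 1 } }, { upsert: true });
console.log(unicorns, hits);
```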

Now, if we look at the updated record:

    db.unicorns.find({name: 'Roooooodles'})

You should discover the first surprise of `update`. No document is found: because the second parameter we supplied didn't have any update operators, it was used to **replace** the original document. In other words, the `update` found a document by `name` and replaced the entire document with the new document (the second parameter). There is no equivalent functionality to this in SQL's `update` command. In some situations, this is ideal and can be leveraged for some truly dynamic updates. However, when you want to change the value of one or a few fields, you must use MongoDB's `$set` operator. Go ahead and run this update to reset the lost fields:

    db.unicorns.update({weight: 590}, {$set: {
        name: 'Roooooodles',
        dob: new Date(1979, 7, 18, 18, 44),
        loves: ['apple'],
        gender: 'm',
        vampires: 99}})

This won't overwrite the new `weight` since we didn't specify it. Now, if we execute:

    db.unicorns.find({name: 'Roooooodles'})

we get the expected result. Therefore, the correct way to have updated the weight in the first place is:

    db.unicorns.update({name: 'Roooooodles'},
        {$set: {weight: 590}})

## Update Operators ##
In addition to `$set`, we can leverage other operators to do some nifty things. All update operators work on fields, so your entire document won't be wiped out. For example, the `$inc` operator is used to increment a field by a certain positive or negative amount.
If Pilot was incorrectly awarded a couple of vampire kills, we could correct the mistake by executing:

    db.unicorns.update({name: 'Pilot'},
        {$inc: {vampires: -2}})

If Aurora suddenly developed a sweet tooth, we could add a value to her `loves` field via the `$push` operator:

    db.unicorns.update({name: 'Aurora'},
        {$push: {loves: 'sugar'}})

The [Update Operators](http://docs.mongodb.org/manual/reference/operator/update/#update-operators) section of the MongoDB manual has more information on the other available update operators.

## Upserts ##
One of the more pleasant surprises of using `update` is that it fully supports `upserts`. An `upsert` updates the document if found or inserts it if not. Upserts are handy in certain situations, and when you run into one, you'll know it. To enable upserting, we pass a third parameter to `update`: `{upsert: true}`.

A mundane example is a hit counter for a website. If we wanted to keep an aggregate count in real time without upserts, we'd have to check whether a record already existed for the page and, based on that, decide to run an update or an insert. With the upsert option omitted (or set to false), executing the following won't do anything:

    db.hits.update({page: 'unicorns'},
        {$inc: {hits: 1}});
    db.hits.find();

However, if we add the upsert option, the results are quite different:

    db.hits.update({page: 'unicorns'},
        {$inc: {hits: 1}}, {upsert: true});
    db.hits.find();

Since no document exists with a field `page` equal to `unicorns`, a new document is inserted. If we execute it a second time, the existing document is updated and `hits` is incremented to 2.

    db.hits.update({page: 'unicorns'},
        {$inc: {hits: 1}}, {upsert: true});
    db.hits.find();

## Multiple Updates ##
The final surprise `update` has to offer is that, by default, it'll update a single document.
So far, for the examples we've looked at, this might seem logical. However, if you executed something like:

    db.unicorns.update({},
        {$set: {vaccinated: true}});
    db.unicorns.find({vaccinated: true});

you might expect to find all of your precious unicorns vaccinated. Instead, only the first matching document is updated, so `find` returns a single unicorn. To get the behavior you desire, the `multi` option must be set to `true`:

    db.unicorns.update({},
        {$set: {vaccinated: true}},
        {multi: true});
    db.unicorns.find({vaccinated: true});

## In This Chapter ##
This chapter concluded our introduction to the basic CRUD operations available against a collection. We looked at `update` in detail and observed three interesting behaviors. First, if you pass it a document without update operators, MongoDB's `update` will replace the existing document. Because of this, normally you will use the `$set` operator (or one of the many other available operators that modify the document). Second, `update` supports an intuitive `upsert` option, which is particularly useful when you don't know if the document already exists. Finally, by default, `update` updates only the first matching document, so use the `multi` option when you want to update all matching documents.

# Chapter 3 - Mastering Find #
Chapter 1 provided a superficial look at the `find` command. There's more to `find` than understanding `selectors`, though. We already mentioned that the result from `find` is a `cursor`. We'll now look at exactly what this means in more detail.

## Field Selection ##
Before we jump into `cursors`, you should know that `find` takes a second optional parameter called "projection". This parameter is a document listing the fields we want to retrieve or exclude. For example, we can get all of the unicorns' names without getting back other fields by executing:

    db.unicorns.find({}, {name: 1});

By default, the `_id` field is always returned.
We can explicitly exclude it by specifying `{name: 1, _id: 0}`.

Aside from the `_id` field, you cannot mix and match inclusion and exclusion. If you think about it, that actually makes sense. You either want to select or exclude one or more fields explicitly.

## Ordering ##
A few times now I've mentioned that `find` returns a cursor whose execution is delayed until needed. However, what you've no doubt observed from the shell is that `find` executes immediately. This is a behavior of the shell only. We can observe the true behavior of `cursors` by looking at one of the methods we can chain to `find`. The first that we'll look at is `sort`. We specify the fields we want to sort on as a JSON document, using 1 for ascending and -1 for descending. For example:

    //heaviest unicorns first
    db.unicorns.find().sort({weight: -1})

    //by unicorn name then vampire kills:
    db.unicorns.find().sort({name: 1,
        vampires: -1})

As with a relational database, MongoDB can use an index for sorting. We'll look at indexes in more detail later on. However, you should know that MongoDB limits the size of your sort without an index. That is, if you try to sort a very large result set which can't use an index, you'll get an error. Some people see this as a limitation. In truth, I wish more databases had the capability to refuse to run unoptimized queries. (I won't turn every MongoDB drawback into a positive, but I've seen enough poorly optimized databases that I sincerely wish they had a strict mode.)

## Paging ##
Paging results can be accomplished via the `limit` and `skip` cursor methods. To get the second and third heaviest unicorns, we could do:

    db.unicorns.find()
        .sort({weight: -1})
        .limit(2)
        .skip(1)

The server applies `skip` before `limit`, regardless of the order in which you chain them, which is why this returns the second and third documents. Using `limit` in conjunction with `sort` can be a way to avoid running into problems when sorting on non-indexed fields.
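
To make the skip-then-limit behavior concrete, here is a minimal sketch in plain JavaScript. It mimics `find().sort({weight: -1}).skip(1).limit(2)` on an in-memory array and does not talk to MongoDB. The sample documents and the `page` helper are hypothetical and exist only for illustration.

```javascript
// Sketch only: imitates find().sort({weight: -1}).skip(1).limit(2)
// on a plain array. The sample documents and page() are hypothetical.
const unicorns = [
  { name: "Horny", weight: 600 },
  { name: "Aurora", weight: 450 },
  { name: "Unicrom", weight: 984 },
  { name: "Roooooodles", weight: 575 },
];

function page(docs, skip, limit) {
  // sort({weight: -1}): heaviest first; copy so the input is untouched
  const sorted = [...docs].sort((a, b) => b.weight - a.weight);
  // skip first, then limit, whatever order the cursor calls were chained in
  return sorted.slice(skip, skip + limit);
}

const secondAndThird = page(unicorns, 1, 2);
console.log(secondAndThird.map((u) => u.name)); // [ 'Horny', 'Roooooodles' ]
```

For page `n` of size `size`, you would skip `(n - 1) * size` documents. Keep in mind that on the server, a large `skip` still has to walk past every skipped document.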

## Count ##
The shell makes it possible to execute a `count` directly on a collection, such as:

    db.unicorns.count({vampires: {$gt: 50}})

In reality, `count` is a `cursor` method; the shell simply provides a shortcut. With drivers that don't provide such a shortcut, you need to call it like this (which also works in the shell):

    db.unicorns.find({vampires: {$gt: 50}})
        .count()

## In This Chapter ##
Using `find` and `cursors` is a straightforward proposition. There are a few additional commands that we'll either cover in later chapters or which only serve edge cases, but, by now, you should be getting pretty comfortable working in the mongo shell and understanding the fundamentals of MongoDB.

# Chapter 4 - Data Modeling #
Let's shift gears and have a more abstract conversation about MongoDB. Explaining a few new terms and some new syntax is a trivial task. Having a conversation about modeling with a new paradigm isn't as easy. The truth is that most of us are still finding out what works and what doesn't when it comes to modeling with these new technologies. It's a conversation we can start having, but ultimately you'll have to practice and learn on real code.

Out of all NoSQL databases, document-oriented databases are probably the most similar to relational databases, at least when it comes to modeling. However, the differences that exist are important.

## No Joins ##
The first and most fundamental difference that you'll need to get comfortable with is MongoDB's lack of joins. I don't know the specific reason why some type of join syntax isn't supported in MongoDB, but I do know that joins are generally seen as non-scalable. That is, once you start to split your data horizontally, you end up performing your joins on the client (the application server) anyway.
Regardless of the reasons, the fact remains that data *is* relational, and MongoDB doesn't support joins.

Without knowing anything else, to live in a join-less world, we have to do joins ourselves within our application's code. Essentially, we need to issue a second query to `find` the relevant data in a second collection. Setting our data up isn't any different from declaring a foreign key in a relational database. Let's give a little less focus to our beautiful `unicorns` and a bit more time to our `employees`. The first thing we'll do is create an employee (I'm providing an explicit `_id` so that we can build coherent examples):

    db.employees.insert({_id: ObjectId(
        "4d85c7039ab0fd70a117d730"),
        name: 'Leto'})

Now let's add a couple of employees and set their manager as `Leto`:

    db.employees.insert({_id: ObjectId(
        "4d85c7039ab0fd70a117d731"),
        name: 'Duncan',
        manager: ObjectId(
            "4d85c7039ab0fd70a117d730")});
    db.employees.insert({_id: ObjectId(
        "4d85c7039ab0fd70a117d732"),
        name: 'Moneo',
        manager: ObjectId(
            "4d85c7039ab0fd70a117d730")});

(It's worth repeating that the `_id` can be any unique value. Since you'd likely use an `ObjectId` in real life, we'll use them here as well.)

Of course, to find all of Leto's employees, one simply executes:

    db.employees.find({manager: ObjectId(
        "4d85c7039ab0fd70a117d730")})

There's nothing magical here. In the worst cases, most of the time, the lack of joins will merely require an extra query (likely indexed).

## Arrays and Embedded Documents ##
Just because MongoDB doesn't have joins doesn't mean it doesn't have a few tricks up its sleeve. Remember when we saw that MongoDB supports arrays as first-class objects of a document? It turns out that this is incredibly handy when dealing with many-to-one or many-to-many relationships.
As a simple example, if an employee could have two managers, we could simply store these in an array:

    db.employees.insert({_id: ObjectId(
        "4d85c7039ab0fd70a117d733"),
        name: 'Siona',
        manager: [ObjectId(
            "4d85c7039ab0fd70a117d730"),
            ObjectId(
            "4d85c7039ab0fd70a117d732")] })

Of particular interest is that, for some documents, `manager` can be a scalar value, while for others it can be an array. Our original `find` query will work for both:

    db.employees.find({manager: ObjectId(
        "4d85c7039ab0fd70a117d730")})

You'll quickly find that arrays of values are much more convenient to deal with than many-to-many join tables.

Besides arrays, MongoDB also supports embedded documents. Go ahead and try inserting a document with a nested document, such as:

    db.employees.insert({_id: ObjectId(
        "4d85c7039ab0fd70a117d734"),
        name: 'Ghanima',
        family: {mother: 'Chani',
            father: 'Paul',
            brother: ObjectId(
                "4d85c7039ab0fd70a117d730")}})

In case you are wondering, embedded documents can be queried using dot-notation:

    db.employees.find({
        'family.mother': 'Chani'})

We'll briefly talk about where embedded documents fit and how you should use them.

Combining the two concepts, we can even embed arrays of documents:

    db.employees.insert({_id: ObjectId(
        "4d85c7039ab0fd70a117d735"),
        name: 'Chani',
        family: [ {relation:'mother',name: 'Chani'},
            {relation:'father',name: 'Paul'},
            {relation:'brother', name: 'Duncan'}]})

## Denormalization ##
Yet another alternative to using joins is to denormalize your data. Historically, denormalization was reserved for performance-sensitive code, or for when data should be snapshotted (like in an audit log).
However, with the ever-growing popularity of NoSQL databases, many of which don't have joins, denormalization as part of normal modeling is becoming increasingly common. This doesn't mean you should duplicate every piece of information in every document. However, rather than letting fear of duplicate data drive your design decisions, consider modeling your data based on what information belongs to what document.

For example, say you are writing a forum application. The traditional way to associate a specific `user` with a `post` is via a `userid` column within `posts`. With such a model, you can't display `posts` without retrieving (joining to) `users`. A possible alternative is simply to store the `name` as well as the `userid` with each `post`. You could even do so with an embedded document, like `user: {id: ObjectId('Something'), name: 'Leto'}`. Yes, if you let users change their name, you may have to update each document (which is one multi-update).

Adjusting to this kind of approach won't come easily to some. In a lot of cases it won't even make sense to do this. Don't be afraid to experiment with this approach, though. It's not only suitable in some circumstances, but it can also be the best way to do it.

## Which Should You Choose? ##
Arrays of ids can be a useful strategy when dealing with one-to-many or many-to-many scenarios. But more commonly, new developers are left deciding between using embedded documents and doing "manual" referencing.

First, you should know that an individual document is currently limited to 16 megabytes in size. Knowing that documents have a size limit, though quite generous, gives you some idea of how they are intended to be used. At this point, it seems like most developers lean heavily on manual references for most of their relationships. Embedded documents are frequently leveraged, but mostly for smaller pieces of data which we want to always pull with the parent document.
A real-world example may be to store an array of `addresses` documents with each user, something like:

    db.users.insert({name: 'leto',
        email: 'leto@dune.gov',
        addresses: [{street: "229 W. 43rd St",
            city: "New York", state:"NY",zip:"10036"},
            {street: "555 University",
            city: "Palo Alto", state:"CA",zip:"94107"}]})

This doesn't mean you should underestimate the power of embedded documents or write them off as something of minor utility. Having your data model map directly to your objects makes things a lot simpler and often removes the need to join. This is especially true when you consider that MongoDB lets you query and index fields of embedded documents and arrays.

## Few or Many Collections ##
Given that collections don't enforce any schema, it's entirely possible to build a system using a single collection with a mishmash of documents, but it would be a very bad idea. Most MongoDB systems are laid out somewhat similarly to what you'd find in a relational system, though with fewer collections. In other words, if it would be a table in a relational database, there's a chance it'll be a collection in MongoDB. Many-to-many join tables are an important exception, as are tables that exist only to enable one-to-many relationships with simple entities.

The conversation gets even more interesting when you consider embedded documents. The example that frequently comes up is a blog. Should you have a `posts` collection and a `comments` collection, or should each `post` have an array of `comments` embedded within it? Setting aside the 16MB document size limit for the time being (all of *Hamlet* is less than 200KB, so just how popular is your blog?), most developers should prefer to separate things out. It's simply cleaner, more explicit, and gives you better performance.
MongoDB's flexible schema allows you to combine the two approaches by keeping comments in their own collection but embedding a few comments (maybe the first few) in the blog post to be able to display them with the post. This follows the principle of keeping together data that you want to get back in one query.

There's no hard rule (well, aside from 16MB). Play with different approaches and you'll get a sense of what does and does not feel right.

## In This Chapter ##
Our goal in this chapter was to provide some helpful guidelines for modeling your data in MongoDB: a starting point, if you will. Modeling in a document-oriented system is different from modeling in a relational world, but not too different. You have more flexibility and one constraint, but for a new system, things tend to fit quite nicely. The only way you can go wrong is by not trying.

# Chapter 5 - When To Use MongoDB #
By now you should have a feel for where and how MongoDB might fit into your existing system. There are enough new and competing storage technologies that it's easy to get overwhelmed by all of the choices.

For me, the most important lesson, which has nothing to do with MongoDB, is that you no longer have to rely on a single solution for dealing with your data. No doubt, a single solution has obvious advantages, and for a lot of projects (possibly even most) a single solution is the sensible approach. The idea isn't that you *must* use different technologies, but rather that you *can*. Only you know whether the benefits of introducing a new solution outweigh the costs.

With that said, I'm hopeful that what you've seen so far has made you see MongoDB as a general solution. It's been mentioned a couple of times that document-oriented databases have a lot in common with relational databases.
Therefore, rather than tiptoeing around it, let's simply state that MongoDB should be seen as a direct alternative to relational databases. Where one might see Lucene as enhancing a relational database with full text indexing, or Redis as a persistent key-value store, MongoDB is a central repository for your data.

Notice that I didn't call MongoDB a *replacement* for relational databases, but rather an *alternative*. It's a tool that can do what a lot of other tools can do. Some of it MongoDB does better, and some of it MongoDB does worse. Let's dissect things a little further.

## Flexible Schema ##
An oft-touted benefit of document-oriented databases is that they don't enforce a fixed schema. This makes them much more flexible than traditional database tables. I agree that flexible schema is a nice feature, but not for the main reason most people mention.

People talk about schema-less as though you'll suddenly start storing a crazy mishmash of data. There are domains and data sets which can really be a pain to model using relational databases, but I see those as edge cases. Schema-less is cool, but most of your data is going to be highly structured. It's true that having an occasional mismatch can be handy, especially when you introduce new features, but in reality it's nothing a nullable column probably wouldn't solve just as well.

For me, the real benefit of dynamic schema is the lack of setup and the reduced friction with OOP. This is particularly true when you're working with a static language. I've worked with MongoDB in both C# and Ruby, and the difference is striking. Ruby's dynamism and its popular ActiveRecord implementations already reduce much of the object-relational impedance mismatch. That isn't to say MongoDB isn't a good match for Ruby; it really is.
Rather, I think most Ruby developers would see MongoDB as an incremental improvement, whereas C# or Java developers would see a fundamental shift in how they interact with their data.

Think about it from the perspective of a driver developer. You want to save an object? Serialize it to JSON (technically BSON, but close enough) and send it to MongoDB. There is no property mapping or type mapping. This straightforwardness definitely flows to you, the end developer.

## Writes ##
One area where MongoDB can fit a specialized role is in logging. There are two aspects of MongoDB which make writes quite fast. First, you have an option to send a write command and have it return immediately without waiting for the write to be acknowledged. Second, you can control the write behavior with respect to data durability. These settings, in addition to specifying how many servers should get your data before the write is considered successful, are configurable per write, giving you a great level of control over write performance and data durability.

In addition to these performance factors, log data is one of those data sets which can often take advantage of schema-less collections. Finally, MongoDB has something called a [capped collection](http://docs.mongodb.org/manual/core/capped-collections/). So far, all of the collections we've implicitly created have been normal collections. We can create a capped collection by using the `db.createCollection` command and flagging it as capped:

    //limit our capped collection to 1 megabyte
    db.createCollection('logs', {capped: true,
        size: 1048576})

When our capped collection reaches its 1MB limit, old documents are automatically purged. A limit on the number of documents, rather than the size, can be set using `max` (the `size` option is still required). Capped collections have some interesting properties. For example, you can update a document, but it can't change in size.
The insertion order is preserved, so you don't need to add an extra index to get proper time-based sorting. You can "tail" a capped collection the way you tail a file in Unix via `tail -f`, which allows you to get new data as it arrives, without having to re-query it.

If you want to "expire" your data based on time rather than overall collection size, you can use [TTL Indexes](http://docs.mongodb.org/manual/tutorial/expire-data/), where TTL stands for "time-to-live".

## Durability ##
Prior to version 1.8, MongoDB did not have single-server durability. That is, a server crash would likely result in lost or corrupt data. The solution had always been to run MongoDB in a multi-server setup (MongoDB supports replication). Journaling was one of the major features added in 1.8. Since version 2.0, MongoDB enables journaling by default, which allows fast recovery of the server in case of a crash or abrupt power loss.

Durability is only mentioned here because a lot has been made of MongoDB's past lack of single-server durability. This'll likely show up in Google searches for some time to come. Information you find about journaling being a missing feature is simply out of date.

## Full Text Search ##
True full text search capability is a recent addition to MongoDB. It supports fifteen languages with stemming and stop words. With MongoDB's support for arrays and full text search, you will only need to look to other solutions if you need a more powerful and full-featured full text search engine.

## Transactions ##
MongoDB doesn't have transactions. It has two alternatives: one that is great but of limited use, and another that is cumbersome but flexible.

The first is its many atomic update operations. These are great, so long as they actually address your problem. We already saw some of the simpler ones, like `$inc` and `$set`.
There are also commands like `findAndModify`, which can update or delete a document and return it atomically.

The second, when atomic operations aren't enough, is to fall back to a two-phase commit. A two-phase commit is to transactions what manual dereferencing is to joins. It's a storage-agnostic solution that you do in code. Two-phase commits are actually quite popular in the relational world as a way to implement transactions across multiple databases. The MongoDB website [has an example](http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/) illustrating the most typical case (a transfer of funds). The general idea is that you store the state of the transaction within the actual document being updated atomically and go through the init-pending-commit/rollback steps manually.

MongoDB's support for nested documents and flexible schema design makes two-phase commits slightly less painful, but it still isn't a great process, especially when you are just getting started with it.

## Data Processing ##
Before version 2.2, MongoDB relied on MapReduce for most data processing jobs. Version 2.2 added a powerful feature called the [aggregation framework, or pipeline](http://docs.mongodb.org/manual/core/aggregation-pipeline/), so you'll only need to use MapReduce in rare cases where you need complex functions for aggregations that are not yet supported in the pipeline. In the next chapter we'll look at the aggregation pipeline and MapReduce in detail. For now you can think of them as feature-rich and different ways to `group by` (which is an understatement). For parallel processing of very large data, you may need to rely on something else, such as Hadoop. Thankfully, since the two systems really do complement each other, there's a [MongoDB connector for Hadoop](http://docs.mongodb.org/ecosystem/tools/hadoop/).

Of course, parallelizing data processing isn't something relational databases excel at either.
There are plans for future versions of MongoDB to be better at handling very large sets of data.

## Geospatial ##
A particularly powerful feature of MongoDB is its support for [geospatial indexes](http://docs.mongodb.org/manual/applications/geospatial-indexes/). This allows you to store either GeoJSON or x and y coordinates within documents and then find documents that are `$near` a set of coordinates or `$within` a box or circle. This is a feature best explained via some visual aids, so if you want to learn more, I invite you to try the [5 minute geospatial interactive tutorial](http://mongly.openmymind.net/geo/index).

## Tools and Maturity ##
You probably already know the answer to this, but MongoDB is obviously younger than most relational database systems. This is absolutely something you should consider, though how much it matters depends on what you are doing and how you are doing it. Nevertheless, an honest assessment simply can't ignore the fact that MongoDB is younger and the available tooling isn't great (although the tooling around a lot of very mature relational databases is pretty horrible too!). As an example, the lack of support for base-10 floating point numbers will obviously be a concern (though not necessarily a show-stopper) for systems dealing with money.

On the positive side, drivers exist for a great many languages, the protocol is modern and simple, and development is happening at blinding speeds. MongoDB is in production at enough companies that concerns about maturity, while valid, are quickly becoming a thing of the past.

## In This Chapter ##
The message from this chapter is that MongoDB, in most cases, can replace a relational database. It's much simpler and more straightforward; it's faster and generally imposes fewer restrictions on application developers. The lack of transactions can be a legitimate and serious concern.
However, when people ask *where does MongoDB sit with respect to the new data storage landscape?* the answer is simple: **right in the middle**.

# Chapter 6 - Aggregating Data #

## Aggregation Pipeline ##
The aggregation pipeline gives you a way to transform and combine documents in your collection. You do it by passing the documents through a pipeline that's somewhat analogous to the Unix "pipe", where you send output from one command to another to a third, etc.

The simplest aggregation you are probably already familiar with is the SQL `group by` expression. We already saw the simple `count()` method, but what if we want to see how many unicorns are male and how many are female?

    db.unicorns.aggregate([{$group:{_id:'$gender',
        total: {$sum:1}}}])

In the shell we have the `aggregate` helper, which takes an array of pipeline operators. For a simple count grouped by something, we only need one such operator, and it's called `$group`. This is the exact analog of `GROUP BY` in SQL. We create a new document with an `_id` field indicating what field we are grouping by (here it's `gender`), and the other fields are usually assigned the results of some aggregation. In this case we `$sum` 1 for each document that matches a particular gender. You probably noticed that the `_id` field was assigned `'$gender'` and not `'gender'`. The `'$'` before a field name indicates that the value of this field from the incoming document will be substituted.

What are some of the other pipeline operators that we can use? The most common one to use before (and frequently after) `$group` is `$match`. It takes the same kind of selector as the `find` method, and it allows us to aggregate only a matching subset of our documents, or to exclude some documents from our result.
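
It may help to see what the `$match` and `$group` stages described above actually compute. The following is a minimal plain-JavaScript sketch that works on an in-memory array rather than on MongoDB. The sample documents and the `match`/`groupCount` helpers are hypothetical.

```javascript
// Sketch only: reproduces, on a plain array, what
// {$match: {weight: {$lt: 600}}} followed by
// {$group: {_id: '$gender', total: {$sum: 1}}} would return.
// The sample documents and helper names are hypothetical.
const unicorns = [
  { name: "Horny", gender: "m", weight: 600 },
  { name: "Aurora", gender: "f", weight: 450 },
  { name: "Unicrom", gender: "m", weight: 984 },
  { name: "Solnara", gender: "f", weight: 550 },
  { name: "Pilot", gender: "m", weight: 650 },
];

// $match: keep only the documents that satisfy the filter
function match(docs, predicate) {
  return docs.filter(predicate);
}

// $group with {$sum: 1}: one output document per distinct value
function groupCount(docs, field) {
  const totals = new Map();
  for (const doc of docs) {
    // '$gender' means "the value of this document's gender field"
    const key = doc[field];
    totals.set(key, (totals.get(key) || 0) + 1);
  }
  // Map keeps insertion order here; MongoDB makes no ordering promise
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

const allByGender = groupCount(unicorns, "gender");
const lightByGender = groupCount(
  match(unicorns, (u) => u.weight < 600),
  "gender"
);
console.log(allByGender);   // [ { _id: 'm', total: 3 }, { _id: 'f', total: 2 } ]
console.log(lightByGender); // [ { _id: 'f', total: 2 } ]
```

Putting the filter before the grouping means fewer documents reach `$group`. This is the same reason `$match` usually comes early in a real pipeline.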

    db.unicorns.aggregate([{$match: {weight:{$lt:600}}},
        {$group: {_id:'$gender', total:{$sum:1},
            avgVamp:{$avg:'$vampires'}}},
        {$sort:{avgVamp:-1}} ])

Here we introduced another pipeline operator, `$sort`, which does exactly what you would expect; along with it we also get `$skip` and `$limit`. We also used the `$group` operator `$avg`.

MongoDB arrays are powerful, and they don't stop us from being able to aggregate on values that are stored inside of them. We do need to be able to "flatten" them to properly count everything:

    db.unicorns.aggregate([{$unwind:'$loves'},
        {$group: {_id:'$loves', total:{$sum:1},
            unicorns:{$addToSet:'$name'}}},
        {$sort:{total:-1}},
        {$limit:1} ])

Here we will find out which food item is loved by the most unicorns, and we will also get the list of names of all the unicorns that love it. `$sort` and `$limit` in combination allow you to get answers to "top N" types of questions.

There is another powerful pipeline operator called [`$project`](http://docs.mongodb.org/manual/reference/operator/aggregation/project/#pipe._S_project) (analogous to the projection we can specify to `find`), which allows you not just to include certain fields, but to create or calculate new fields based on values in existing fields. For example, you can use math operators to add together values of several fields before finding out the average, or you can use string operators to create a new field that's a concatenation of some existing fields.

This just barely scratches the surface of what you can do with aggregations. In 2.6, aggregation got more powerful. The aggregate command can return a cursor to the result set (which you already know how to work with from Chapter 1), or it can write your results into a new collection using the `$out` pipeline operator.
You can see a lot more examples, as well as all of the supported pipeline and expression operators, in the [MongoDB manual](http://docs.mongodb.org/manual/core/aggregation-pipeline/).

## MapReduce ##
MapReduce is a two-step approach to data processing. First you map, and then you reduce. The mapping step transforms the input documents and emits a key=>value pair (the key and/or value can be complex). Then, key/value pairs are grouped by key, such that values for the same key end up in an array. The reduce step gets a key and the array of values emitted for that key, and produces the final result. The map and reduce functions are written in JavaScript.

With MongoDB we use the `mapReduce` command on a collection. `mapReduce` takes a map function, a reduce function and an output directive. In our shell we can create and pass a JavaScript function. From most libraries you supply your functions as strings (which is a bit ugly). The third parameter holds the output directive and can also set additional options; for example, we could filter, sort and limit the documents that we want analyzed. We can also supply a `finalize` method to be applied to the results after the `reduce` step.

You probably won't need to use MapReduce for most of your aggregations, but if you do, you can read more about it [on my blog](http://openmymind.net/2011/1/20/Understanding-Map-Reduce/) and in the [MongoDB manual](http://docs.mongodb.org/manual/core/map-reduce/).

## In This Chapter ##
In this chapter we covered MongoDB's [aggregation capabilities](http://docs.mongodb.org/manual/aggregation/). The aggregation pipeline is relatively simple to write once you understand how it's structured, and it's a powerful way to group data. MapReduce is more complicated to understand, but its capabilities can be as boundless as any code you can write in JavaScript.
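
As a companion to the MapReduce description above, here is a minimal in-memory sketch of the map, group-by-key and reduce steps. It counts how many unicorns love each food. The `map` and `reduce` functions follow the shape the text describes: `map` calls `emit` with `this` bound to a document, and `reduce` receives a key and the array of values emitted for it. The `runMapReduce` driver and the sample documents are hypothetical stand-ins for the server. Unlike the server, which skips `reduce` for keys that have only one value, this sketch calls `reduce` for every key; that is harmless for a sum.

```javascript
// Sketch only: an in-memory imitation of map -> group by key -> reduce.
// runMapReduce and the sample documents are hypothetical.
const unicorns = [
  { name: "Aurora", loves: ["carrot", "grape"] },
  { name: "Unicrom", loves: ["energon", "redbull"] },
  { name: "Solnara", loves: ["apple", "carrot", "chocolate"] },
];

// map: `this` is the current document; emit one (food, 1) pair per food
function map() {
  for (var i = 0; i < this.loves.length; i++) {
    emit(this.loves[i], 1);
  }
}

// reduce: may be re-run on partial results, so it returns the same
// shape as the emitted values (a number)
function reduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

function runMapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map(); // key -> array of emitted values
  // expose emit globally so map() can call it, as the shell does
  globalThis.emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  try {
    for (const doc of docs) mapFn.call(doc);
  } finally {
    delete globalThis.emit; // don't leak the helper
  }
  return [...grouped].map(([key, values]) => ({
    _id: key,
    value: reduceFn(key, values),
  }));
}

const counts = runMapReduce(unicorns, map, reduce);
console.log(counts.find((r) => r._id === "carrot")); // { _id: 'carrot', value: 2 }
```

In the mongo shell, functions of this shape should be what you pass as the first two arguments to `db.unicorns.mapReduce`, with the output directive in the third, for example `{out: {inline: 1}}` to get the results back directly instead of writing them to a collection.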

# Chapter 7 - Performance and Tools #
In this last chapter, we look at a few performance topics as well as some of the tools available to MongoDB developers. We won't dive deeply into either topic, but we will examine the most important aspects of each.

## Indexes ##
At the very beginning we saw the special `system.indexes` collection, which contains information on all the indexes in our database. Indexes in MongoDB work a lot like indexes in a relational database: they help improve query and sorting performance. Indexes are created via `ensureIndex`:

    // where "name" is the field name
    db.unicorns.ensureIndex({name: 1});

And dropped via `dropIndex`:

    db.unicorns.dropIndex({name: 1});

A unique index can be created by supplying a second parameter and setting `unique` to `true`:

    db.unicorns.ensureIndex({name: 1},
        {unique: true});

Indexes can be created on embedded fields (again, using the dot-notation) and on array fields. We can also create compound indexes:

    db.unicorns.ensureIndex({name: 1,
        vampires: -1});

The direction of your index (1 for ascending, -1 for descending) doesn't matter for a single-key index, but it can make a difference for compound indexes when you are sorting on more than one indexed field.

The [indexes page](http://docs.mongodb.org/manual/indexes/) has additional information on indexes.

## Explain ##
To see whether or not your queries are using an index, you can use the `explain` method on a cursor:

    db.unicorns.find().explain()

The output tells us that a `BasicCursor` was used (which means non-indexed), that 12 objects were scanned, how long it took, and what index, if any, was used, as well as a few other pieces of useful information.

If we change our query to use an index, we'll see that a `BtreeCursor` was used, as well as the index used to fulfill the request:

    db.unicorns.find({name: 'Pilot'}).explain()

## Replication ##
MongoDB replication works in some ways similarly to how relational database replication works. All production deployments should be replica sets, which ideally consist of three or more servers that hold the same data. Writes are sent to a single server, the primary, from which they are asynchronously replicated to every secondary. You can control whether you allow reads to happen on secondaries or not, which can help direct some special queries away from the primary, at the risk of reading slightly stale data. If the primary goes down, one of the secondaries will be automatically elected to be the new primary. Again, MongoDB replication is outside the scope of this book.

## Sharding ##
MongoDB supports auto-sharding. Sharding is an approach to scalability which partitions your data across multiple servers or clusters. A naive implementation might put all of the data for users with a name that starts with A-M on server 1 and the rest on server 2. Thankfully, MongoDB's sharding capabilities far exceed such a simple algorithm. Sharding is a topic well beyond the scope of this book, but you should know that it exists and that you should consider it, should your needs grow beyond a single replica set.

While replication can help performance somewhat (by isolating long-running queries to secondaries, and reducing latency for some other types of queries), its main purpose is to provide high availability. Sharding is the primary method for scaling MongoDB clusters. Combining replication with sharding is the prescribed approach to achieve scaling and high availability.

## Stats ##
You can obtain statistics on a database by typing `db.stats()`.
Most of the information deals with the size of your database. You can also get statistics on a collection, say `unicorns`, by typing `db.unicorns.stats()`. Most of this information relates to the size of your collection and its indexes. 645 | 646 | ## Profiler ## 647 | You can enable the MongoDB profiler for all operations (profiling level 2) by executing: 648 | 649 | db.setProfilingLevel(2); 650 | 651 | With it enabled, we can run a command: 652 | 653 | db.unicorns.find({weight: {$gt: 600}}); 654 | 655 | And then examine the profiler: 656 | 657 | db.system.profile.find() 658 | 659 | The output tells us what was run and when, how many documents were scanned, and how much data was returned. 660 | 661 | You disable the profiler by calling `setProfilingLevel` again, but with `0` as the parameter. Specifying `1` as the first parameter will profile only queries that take more than 100 milliseconds. 100 milliseconds is the default threshold; you can specify a different minimum time, in milliseconds, with a second parameter: 662 | 663 | //profile anything that takes 664 | //more than 1 second 665 | db.setProfilingLevel(1, 1000); 666 | 667 | ## Backups and Restore ## 668 | Within the MongoDB `bin` folder is a `mongodump` executable. Simply executing `mongodump` will connect to localhost and back up all of your databases to a `dump` subfolder. You can type `mongodump --help` to see additional options. Common options are `--db DBNAME` to back up a specific database and `--collection COLLECTIONNAME` to back up a specific collection. You can then use the `mongorestore` executable, located in the same `bin` folder, to restore a previously made backup. Again, the `--db` and `--collection` options can be specified to restore a specific database and/or collection. `mongodump` and `mongorestore` operate on BSON, which is MongoDB's native format.
669 | 670 | For example, to back up our `learn` database to a `backup` folder, we'd execute the following (`mongodump` is its own executable, which you run in a command/terminal window, not within the mongo shell itself): 671 | 672 | mongodump --db learn --out backup 673 | 674 | To restore only the `unicorns` collection, we could then do: 675 | 676 | mongorestore --db learn --collection unicorns \ 677 | backup/learn/unicorns.bson 678 | 679 | It's worth pointing out that `mongoexport` and `mongoimport` are two other executables which can be used to export data to, and import data from, JSON or CSV. For example, we can get a JSON output by doing: 680 | 681 | mongoexport --db learn --collection unicorns 682 | 683 | And a CSV output by doing: 684 | 685 | mongoexport --db learn \ 686 | --collection unicorns \ 687 | --csv --fields name,weight,vampires 688 | 689 | Note that `mongoexport` and `mongoimport` cannot always faithfully represent your data, because JSON and CSV can only express a subset of the types that BSON supports. Only `mongodump` and `mongorestore` should ever be used for actual backups. You can read more about [your backup options](http://docs.mongodb.org/manual/core/backups/) in the MongoDB Manual. 690 | 691 | ## In This Chapter ## 692 | In this chapter we looked at various commands, tools and performance details of using MongoDB. We haven't touched on everything, but we've looked at some of the most common ones. Indexing in MongoDB is similar to indexing with relational databases, as are many of the tools. However, with MongoDB, many of these are to the point and simple to use. 693 | 694 | # Conclusion # 695 | You should have enough information to start using MongoDB in a real project. There's more to MongoDB than what we've covered, but your next priority should be putting together what we've learned, and getting familiar with the driver you'll be using. The [MongoDB website](http://www.mongodb.org/) has a lot of useful information. The official [MongoDB user group](http://groups.google.com/group/mongodb-user) is a great place to ask questions.
696 | 697 | NoSQL was born not only out of necessity, but also out of an interest in trying new approaches. It is an acknowledgment that our field is ever-advancing and that if we don't try, and sometimes fail, we can never succeed. This, I think, is a good way to lead our professional lives. 698 | -------------------------------------------------------------------------------- /en/title.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ilivebox/the-little-mongodb-book/fb0728599f64aeec9d5d87d70b3f8a1877dea3ef/en/title.png -------------------------------------------------------------------------------- /en/title.psd: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ilivebox/the-little-mongodb-book/fb0728599f64aeec9d5d87d70b3f8a1877dea3ef/en/title.psd -------------------------------------------------------------------------------- /en/title.txt: -------------------------------------------------------------------------------- 1 | % The Little MongoDB Book 2 | % Karl Seguin -------------------------------------------------------------------------------- /readme.markdown: -------------------------------------------------------------------------------- 1 | ## About ## 2 | The Little MongoDB Book is a free book introducing MongoDB. 3 | 4 | The book was written shortly after the creation of the [MongoDB interactive tutorial](http://openmymind.net/mongly/). As such, the two can be seen as complementary. 5 | 6 | The book was written by [Karl Seguin](http://openmymind.net), with [Perry Neal](http://twitter.com/perryneal)'s assistance. 7 | 8 | If you liked this book, maybe you'll also like [The Little Redis Book](http://openmymind.net/2012/1/23/The-Little-Redis-Book/). 
9 | 10 | The book was updated for MongoDB 2.6 by [Asya Kamsky](https://twitter.com/asya999). 11 | 12 | ## Translations ## 13 | 14 | * [Russian](https://github.com/jsmarkus/the-little-mongodb-book/tree/master/ru) 15 | * [Chinese](https://github.com/geminiyellow/the-little-mongodb-book/tree/master/zh-cn) updated for 2.6 16 | * [Chinese](https://github.com/justinyhuang/the-little-mongodb-book-cn) 17 | * [Italian](https://github.com/nicolaiarocci/the-little-mongodb-book/tree/master/it) updated for 2.6 18 | * [Spanish](https://github.com/uokesita/the-little-mongodb-book/tree/master/es) 19 | * [Brazilian Portuguese](https://github.com/danielmelogpi/the-little-mongodb-book/tree/master/pt_BR) 20 | * Japanese [Version 1](http://www.cuspy.org/diary/2012-04-17) - [Version 2](https://github.com/ma2/the-little-mongodb-book) 21 | 22 | ## License ## 23 | The book is freely distributed under the [Attribution-NonCommercial 3.0 Unported license](). 24 | 25 | ## Formats ## 26 | The book is written in [Markdown](http://daringfireball.net/projects/markdown/) and converted to PDF using [Pandoc](http://johnmacfarlane.net/pandoc/). 27 | 28 | The TeX template makes use of [Lena Herrmann's JavaScript highlighter](http://lenaherrmann.net/2010/05/20/javascript-syntax-highlighting-in-the-latex-listings-package). 29 | 30 | The ePub format is generated with [Pandoc](http://johnmacfarlane.net/pandoc/), and the Kindle (Mobi) format is built from the ePub with KindleGen. 31 | 32 | ## Generating books ## 33 | The packages listed below are for Ubuntu. If you use another OS or distribution, the package names should be similar. 34 | 35 | ### PDF 36 | 37 | #### Dependencies 38 | 39 | Packages: 40 | 41 | * `pandoc` 42 | * `texlive-xetex` 43 | * `texlive-latex-extra` 44 | * `texlive-latex-recommended` 45 | 46 | You should also have Microsoft fonts installed, or you can change the fonts in the `common/pdf-template.tex` file. 47 | 48 | #### Building 49 | 50 | Run `make en/mongodb.pdf`.
51 | 52 | ### ePub 53 | 54 | #### Dependencies 55 | 56 | Packages: 57 | 58 | * `pandoc` 59 | 60 | #### Building 61 | 62 | Run `make en/mongodb.epub`. 63 | 64 | ### Mobi 65 | 66 | #### Dependencies 67 | 68 | Packages: 69 | 70 | * `pandoc` 71 | 72 | You should have [KindleGen](http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765211) installed too. 73 | 74 | #### Building 75 | 76 | Run `make en/mongodb.mobi`. 77 | 78 | ## Title Image ## 79 | A PSD of the title image is included. The font used is [Comfortaa](http://www.dafont.com/comfortaa.font). 80 | -------------------------------------------------------------------------------- /zh-cn/mongodb.markdown: -------------------------------------------------------------------------------- 1 | # About This Book # 2 | 3 | ## License ## 4 | The Little MongoDB Book is licensed under the Attribution-NonCommercial 3.0 Unported license. **You should not have paid for this book.** 5 | 6 | You are free to copy, distribute, modify or display the book. However, please always attribute the book to its author, Karl Seguin, and do not use it for any commercial purpose. 7 | 8 | You can read the full text of the license at: 9 | 10 | 11 | 12 | ## About The Author ## 13 | Karl Seguin is a developer with experience across various fields. He's an expert .NET and Ruby developer. He contributes to OSS projects, is a technical writer, and is an occasional speaker. With respect to MongoDB, he was a core contributor to the C# MongoDB library NoRM, and wrote the interactive tutorial [mongly](http://openmymind.net/mongly/) as well as [Mongo Web Admin](https://github.com/karlseguin/Mongo-Web-Admin). He used MongoDB to build [mogade.com](http://mogade.com/), a free service for casual game developers. 14 | 15 | Karl has also written [The Little Redis Book](http://openmymind.net/2012/1/23/The-Little-Redis-Book/) *1* 16 | 17 | You can find his blog at , or follow him on Twitter via [@karlseguin](http://twitter.com/karlseguin). 18 | 19 | ## With Thanks To ## 20 | Special thanks to [Perry Neal](http://twitter.com/perryneal) for lending me your eyes, mind and passion. You gave me endless strength. Thank you. 21 | 22 | ## Latest Version ## 23 | The latest version was updated for MongoDB 2.6 by Asya Kamsky. The latest source of this book is available at: 24 | 25 | .
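26 | 27 | ### Chinese Version ### 28 | In the GitHub repository [the-little-mongodb-book](https://github.com/karlseguin/the-little-mongodb-book), Karl links to [the-little-mongodb-book-cn](https://github.com/justinyhuang/the-little-mongodb-book-cn) by [justinyhuang](https://github.com/justinyhuang). However, justinyhuang does not appear to have updated it for MongoDB 2.6, and its content differs slightly from the original. Because my own ability is limited, I could not confidently submit corrections to it, so I started a new project. If a search engine led you here, I apologize, and I hope that anyone with the skill and the time will help improve this project and keep it in sync. You can contact me by email, or follow me on Twitter via [@geminiyellow](https://twitter.com/geminiyellow). 29 | 30 | The latest Chinese version is based on pull request [#38](https://github.com/karlseguin/the-little-mongodb-book/pull/38), submitted by [asya999](https://github.com/asya999) on May 29, 2014, with SHA 6d4dce8ead6a767e1e8de1b59f714510d36d366f. 31 | 32 | # Introduction # 33 | > It's not my fault the chapters are short, MongoDB is just easy to learn. 34 | 35 | It is often said that technology moves at a blazing pace. It's true that there is an ever-growing list of new technologies and techniques being released. However, I've long been of the opinion that the fundamental technologies used by programmers move at a rather slow pace. One could spend years learning little and still get by. What is striking, though, is the speed at which established technologies get replaced. Seemingly overnight, long-established, mature technologies find that developers are no longer paying attention to them. 36 | 37 | Nothing illustrates this better than the progress of NoSQL technologies and their encroachment on the well-established relational database market. It almost seems as if yesterday the web was driven by RDBMSs, and today five or so NoSQL solutions have already proven themselves worthy. 38 | 39 | Even though these transitions seem to happen overnight, the reality is that they can take years to gain public acceptance. The initial momentum comes from a small group of developers and companies. Solutions are refined, lessons are learned, a new technology is born, and latecomers slowly start trying it too. Again, many NoSQL solutions are not meant to replace traditional storage solutions, but rather to address specific needs and fill gaps that traditional solutions leave open. 40 | 41 | Having said all of that, the first thing we ought to do is explain what is meant by NoSQL. It's a broad term that means different things to different people. Personally, I use it very broadly to mean a system that plays a part in the storage of data. Put another way, NoSQL (again, for me) is the belief that your persistence layer isn't necessarily the responsibility of a single system. Where relational database vendors have historically tried to position their products as one-size-fits-all solutions, NoSQL leans towards using the best tool for a given job. So your NoSQL stack might still include a relational database, say MySQL, but it could also use Redis as the persistence layer for some parts of the system, or Hadoop for processing big data. Put simply, NoSQL is about being open to, and aware of, alternative, existing and future patterns and tools for managing your data. 42 | 43 | You might be wondering where MongoDB fits into all of this. As a document-oriented database, MongoDB is one of the most general-purpose NoSQL solutions. It can be viewed as an alternative to relational databases. Like relational databases, it also works well when paired with other NoSQL solutions. MongoDB has advantages and drawbacks, which we'll cover in later chapters of this book. 44 | 45 | # Getting Started # 46 | Most of this book will focus on core MongoDB functionality, so we'll rely on the MongoDB shell. The shell is useful both for learning and as an administrative tool, but your actual code will use a MongoDB driver. 47 | 48 | This brings up the first thing you should know about MongoDB: its drivers. MongoDB has [official drivers](http://docs.mongodb.org/ecosystem/drivers/) for a range of languages. These drivers can be thought of as the equivalent of the database drivers you are probably already familiar with. On top of these drivers, the development community has built more language- and framework-specific libraries. For example, [NoRM](https://github.com/atheken/NoRM) is a C# library which implements LINQ,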
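while [MongoMapper](https://github.com/jnunemaker/mongomapper) is an ActiveRecord-friendly Ruby library. Whether you program directly against the core MongoDB drivers or use a higher-level library is up to you. I point this out only because many newcomers are confused as to why there are both official drivers and community libraries: the former generally focus on core communication/connectivity with MongoDB, while the latter provide more language- and framework-specific implementations. 49 | 50 | As you read through this, I encourage you to try my examples in your own MongoDB environment and to explore any questions that come up. It's easy to get MongoDB installed and running, so let's take a few minutes to set everything up. 51 | 52 | 1. Head over to the [official download page](http://www.mongodb.org/downloads) and grab the binaries from the first row (the recommended stable version) for your operating system. For development purposes, you can pick either 32-bit or 64-bit. 53 | 54 | 2. Extract the archive (anywhere you like) and navigate to the `bin` subfolder. Don't execute anything just yet, but know that `mongod` is the server process and `mongo` is the client shell; these are the two executables we'll spend most of our time with. 55 | 56 | 3. Create a new text file in the `bin` subfolder named `mongodb.config`. 57 | 58 | 4. Add a single line to mongodb.config: `dbpath=PATH_TO_WHERE_YOU_WANT_TO_STORE_YOUR_DATABASE_FILES`. For example, on Windows you might write `dbpath=c:\mongodb\data` and on Linux `dbpath=/var/lib/mongodb/data`. 59 | 60 | 5. Make sure the `dbpath` you specified exists. 61 | 62 | 6. Launch mongod with the `--config /path/to/your/mongodb.config` parameter. 63 | 64 | As an example for Windows users, if you extracted the download to `c:\mongodb\` and created `c:\mongodb\data\`, then within `c:\mongodb\bin\mongodb.config` you would specify `dbpath=c:\mongodb\data\`. You could then launch `mongod` from a command prompt via `c:\mongodb\bin\mongod --config c:\mongodb\bin\mongodb.config`. 65 | 66 | Feel free to add the `bin` folder to your PATH environment variable to make these commands shorter. MacOSX and Linux users can follow almost identical directions; only the paths should change. 67 | 68 | Hopefully you now have MongoDB up and running. If you get an error, read the output carefully; the server is quite good at explaining what's wrong. 69 | 70 | You can now launch `mongo` (without the *d*), which will connect a shell to your running server. Try entering `db.version()` to make sure everything is working. You should get back the version number you installed. 71 | 72 | # Chapter 1 - The Basics # 73 | We begin our journey by getting to know the core mechanics of working with MongoDB. Obviously this is core to understanding MongoDB, but it should also help us answer higher-level questions, such as where MongoDB fits. 74 | 75 | To get started, there are six simple concepts we need to understand. 76 | 77 | 1. MongoDB has the same concept of a `database` with which you are likely already familiar (or a schema, for you Oracle folks). Within a MongoDB instance you can have zero or more databases, each acting as a high-level container for your data. 78 | 79 | 2. A database can have zero or more `collections`. A collection shares enough in common with a traditional `table` that you can safely think of the two as the same thing. 80 | 81 | 3. Collections are made up of zero or more `documents`. Again, a document can safely be thought of as a `row`. 82 | 83 | 4. A document is made up of zero or more `fields`, which, you guessed it, are a lot like `columns`. 84 | 85 | 5. `Indexes` in MongoDB play the same role as their RDBMS counterparts. 86 | 87 | 6. 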
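`Cursors` are different from the other five concepts, but they are important enough, and often overlooked, that I think they deserve their own discussion. The important thing to understand about cursors is that when you ask MongoDB for data, it returns a pointer to the result set, called a cursor, rather than the data itself. We can do things with the cursor, such as counting or skipping ahead, without first pulling down the actual data. 88 | 89 | To recap, MongoDB is made up of `databases` which contain `collections`. A `collection` is made up of `documents`. Each `document` is made up of `fields`. `Collections` can be `indexed`, which improves lookup and sorting performance. Finally, when we get data from MongoDB we do so through a `cursor`, whose actual execution is delayed until the data is needed. 90 | 91 | Why use new terminology (collection vs. table, document vs. row and field vs. column)? Is it just to make things more complicated? The truth is that while these concepts are similar to their relational database counterparts, they are not identical. The core difference comes from the fact that relational databases define `columns` at the `table` level, whereas a document-oriented database defines its `fields` at the `document` level. That is to say, each `document` within a `collection` can have its own unique set of `fields`. As such, a `collection` is a dumbed-down container compared to a `table`, while a `document` holds a lot more information than a `row`. 92 | 93 | Although this is important to understand, don't worry if things aren't yet clear. It won't take more than a couple of inserts to see what this truly means. Ultimately, the point is that a collection isn't strict about what goes in it (it's schema-less). Fields are tracked with each individual document. The benefits and drawbacks of this will be explored in later chapters. 94 | 95 | Let's get started. If you don't have it running already, go ahead and start the `mongod` server as well as a mongo shell. The shell runs JavaScript. There are some global commands you can execute, like `help` or `exit`. Commands that you execute against the current database are executed against the `db` object, such as `db.help()` or `db.stats()`. Commands that you execute against a specific collection, which is what we'll be doing most of the time, are executed against the `db.COLLECTION_NAME` object, such as `db.unicorns.help()` or `db.unicorns.count()`. 96 | 97 | Go ahead and enter `db.help()`; you'll get a list of commands that you can execute against the `db` object. 98 | 99 | A small side note: because this is a JavaScript shell, if you execute a method and omit the parentheses `()`, you'll see the method body rather than executing the method. I only mention it so that the first time you do it and get a response that starts with `function (...){`, you won't be surprised. For example, if you enter `db.help` (without the parentheses), you'll see the internal implementation of the `help` method. 100 | 101 | First we'll use the global `use` helper to switch databases, so go ahead and enter `use learn`. It doesn't matter that the database doesn't really exist yet. The first collection that we create will also create the actual `learn` database. Now that you are inside a database, you can start issuing database commands, like `db.getCollectionNames()`. If you do so, you should get an empty array (`[ ]`). Since collections are schema-less, we don't explicitly need to create them. We can simply insert a document into a new collection. To do so, use the `insert` command, supplying it with the document to insert: 102 | 103 | db.unicorns.insert({name: 'Aurora', 104 | gender: 'f', weight: 450}) 105 | 106 | The above line executes `insert` against the `unicorns` collection, passing it a single parameter. Internally, MongoDB uses a binary serialized JSON format called BSON. Externally, this means that we use JSON a lot, as is the case with our parameter above. If we now execute `db.getCollectionNames()`, we'll see two collections: `unicorns` and `system.indexes`. A `system.indexes` collection is created once per database and contains the information on that database's indexes. 107 | 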
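The cursor behaviour from concept 6 (nothing runs until the data is actually needed) can be illustrated with a toy model in plain JavaScript for Node.js. This is only a sketch under stated assumptions: `LazyCursor` is a hypothetical class, not the shell's or any driver's cursor, and it works on an in-memory array with a predicate function instead of a real query selector. Chaining `sort`, `skip` and `limit` only records options; the filtering, sorting and slicing happen the first time the cursor is iterated.

```javascript
// Toy, in-memory model of a lazy cursor. LazyCursor is hypothetical: it only
// mirrors the shape of the shell's find().sort().skip().limit() chain.
class LazyCursor {
  constructor(source, selector) {
    this.source = source;     // array of plain objects standing in for a collection
    this.selector = selector; // predicate function standing in for a query selector
    this.sortFn = null;
    this.skipN = 0;
    this.limitN = Infinity;
    this.executions = 0;      // how many times the "query" actually ran
  }

  // These methods only record options and return the cursor for chaining.
  sort(compare) { this.sortFn = compare; return this; }
  skip(n) { this.skipN = n; return this; }
  limit(n) { this.limitN = n; return this; }

  // The work happens here, when results are first requested.
  *[Symbol.iterator]() {
    this.executions += 1;
    const docs = this.source.filter(this.selector);
    if (this.sortFn) docs.sort(this.sortFn);
    yield* docs.slice(this.skipN, this.skipN + this.limitN);
  }
}

const unicorns = [
  { name: 'Horny', weight: 600 },
  { name: 'Aurora', weight: 450 },
  { name: 'Unicrom', weight: 984 },
];

const cursor = new LazyCursor(unicorns, d => d.weight > 500)
  .sort((a, b) => b.weight - a.weight)
  .limit(1);
console.log(cursor.executions); // 0: nothing has run yet
const names = [...cursor].map(d => d.name);
console.log(names, cursor.executions); // [ 'Unicrom' ] 1
```

The shell appears to run `find` immediately only because it iterates the returned cursor for you to print the first batch of results.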
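108 | You can now use the `find` command against `unicorns` to return a list of documents: 109 | 110 | db.unicorns.find() 111 | 112 | Notice that, in addition to the fields you specified, there's an `_id` field. Every document must have a unique `_id` field. You can either generate one yourself or let MongoDB generate a value for you, which has the type `ObjectId`. Most of the time you'll probably want to let MongoDB generate it for you. By default, the `_id` field is indexed, which explains why the `system.indexes` collection was created. You can look at `system.indexes`: 113 | 114 | db.system.indexes.find() 115 | 116 | What you're seeing is the name of the index, the database and collection it was created against, and the fields included in the index. 117 | 118 | Now, back to our discussion about schema-less collections. Insert a totally different document into `unicorns`, such as: 119 | 120 | db.unicorns.insert({name: 'Leto', 121 | gender: 'm', 122 | home: 'Arrakeen', 123 | worm: false}) 124 | 125 | And, again, use `find` to list the documents. Once we know a bit more, we'll discuss this interesting behavior of MongoDB, but hopefully you are starting to understand why the more traditional terminology wasn't a good fit. 126 | 127 | ## Mastering Selectors ## 128 | In addition to the six concepts we've explored, there's one practical aspect of MongoDB you need to have a good grasp of before moving on to more advanced topics: query selectors. A MongoDB query selector is like the `where` clause of an SQL statement. As such, you use it when finding, counting, updating and removing documents from collections. A selector is a JSON object, the simplest of which is `{}`, which matches all documents. If we wanted to find all female unicorns, we could use `{gender:'f'}`. 129 | 130 | Before delving too deeply into selectors, let's set up some data to play with. First, remove what we've put so far in the `unicorns` collection via `db.unicorns.remove({})`. Now, issue the following inserts to get some data we can play with (I suggest you copy and paste rather than type): 131 | 132 | db.unicorns.insert({name: 'Horny', 133 | dob: new Date(1992,2,13,7,47), 134 | loves: ['carrot','papaya'], 135 | weight: 600, 136 | gender: 'm', 137 | vampires: 63}); 138 | db.unicorns.insert({name: 'Aurora', 139 | dob: new Date(1991, 0, 24, 13, 0), 140 | loves: ['carrot', 'grape'], 141 | weight: 450, 142 | gender: 'f', 143 | vampires: 43}); 144 | db.unicorns.insert({name: 'Unicrom', 145 | dob: new Date(1973, 1, 9, 22, 10), 146 | loves: ['energon', 'redbull'], 147 | weight: 984, 148 | gender: 'm', 149 | vampires: 182}); 150 | db.unicorns.insert({name: 'Roooooodles', 151 | dob: new Date(1979, 7, 18, 18, 44), 152 | loves: ['apple'], 153 | weight: 575, 154 | gender: 'm', 155 | vampires: 99}); 156 | db.unicorns.insert({name: 'Solnara', 157 | dob: new Date(1985, 6, 4, 2, 1), 158 | loves:['apple', 'carrot', 159 | 'chocolate'], 160 | weight:550, 161 | gender:'f', 162 | vampires:80}); 163 | db.unicorns.insert({name:'Ayna', 164 | dob: new Date(1998, 2, 7, 8, 30), 165 | loves: ['strawberry',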
'lemon'], 166 | weight: 733, 167 | gender: 'f', 168 | vampires: 40}); 169 | db.unicorns.insert({name:'Kenny', 170 | dob: new Date(1997, 6, 1, 10, 42), 171 | loves: ['grape', 'lemon'], 172 | weight: 690, 173 | gender: 'm', 174 | vampires: 39}); 175 | db.unicorns.insert({name: 'Raleigh', 176 | dob: new Date(2005, 4, 3, 0, 57), 177 | loves: ['apple', 'sugar'], 178 | weight: 421, 179 | gender: 'm', 180 | vampires: 2}); 181 | db.unicorns.insert({name: 'Leia', 182 | dob: new Date(2001, 9, 8, 14, 53), 183 | loves: ['apple', 'watermelon'], 184 | weight: 601, 185 | gender: 'f', 186 | vampires: 33}); 187 | db.unicorns.insert({name: 'Pilot', 188 | dob: new Date(1997, 2, 1, 5, 3), 189 | loves: ['apple', 'watermelon'], 190 | weight: 650, 191 | gender: 'm', 192 | vampires: 54}); 193 | db.unicorns.insert({name: 'Nimue', 194 | dob: new Date(1999, 11, 20, 16, 15), 195 | loves: ['grape', 'carrot'], 196 | weight: 540, 197 | gender: 'f'}); 198 | db.unicorns.insert({name: 'Dunx', 199 | dob: new Date(1976, 6, 18, 18, 18), 200 | loves: ['grape', 'watermelon'], 201 | weight: 704, 202 | gender: 'm', 203 | vampires: 165}); 204 | 205 | Now that we have data, we can master selectors. `{field: value}` is used to find any documents where `field` is equal to `value`. `{field1: value1, field2: value2}` is how we do an `and` statement. The special `$lt`, `$lte`, `$gt`, `$gte` and `$ne` operators are used for less than, less than or equal, greater than, greater than or equal, and not equal operations. For example, to get all male unicorns that weigh more than 700 pounds, we could do: 206 | 207 | db.unicorns.find({gender: 'm', 208 | weight: {$gt: 700}}) 209 | //or (not quite the same thing, but for 210 | //demonstration purposes) 211 | db.unicorns.find({gender: {$ne: 'f'}, 212 | weight: {$gte: 701}}) 213 | 214 | The `$exists` operator is used for matching the presence or absence of a field, for example: 215 | 216 | db.unicorns.find({ 217 | vampires: {$exists: false}}) 218 | 219 | should return a single document. The `$in` operator is used to match one of several values that we pass as an array, for example: 220 | 221 | db.unicorns.find({ 222 | loves: {$in:['apple','orange']}}) 223 | 224 | This returns any unicorn who loves `apple` or `orange`. 225 | 226 | If we want to OR rather than AND several selectors on different fields, we use the `$or` operator and assign to it an array of the selectors we want or'd: 227 | 228 | db.unicorns.find({gender: 'f', 229 | $or: [{loves: 'apple'}, 230 | 
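{weight: {$lt: 500}}]}) 231 | 232 | The above will return all female unicorns which either love `apple` or weigh less than 500 pounds. 233 | 234 | There's something pretty neat going on in our last two examples. You might have already noticed that the `loves` field is an array. MongoDB supports arrays as first class objects. This is an incredibly handy feature. Once you start using it, you wonder how you ever lived without it. What's more interesting is how easy selecting based on an array value is: `{loves: 'watermelon'}` will return any document where `watermelon` is one of the values of `loves`. 235 | 236 | There are more operators available than what we've seen so far. They are all described in the [Query Selectors](http://docs.mongodb.org/manual/reference/operator/query/#query-selectors) section of the MongoDB manual. What we've covered so far are the basics you'll need to get started, and also what you'll end up using most of the time. 237 | 238 | We've seen how these selectors can be used with the `find` command. They can also be used with the `remove` command, which we've briefly looked at, the `count` command, which we haven't looked at but you can probably figure out, and the `update` command, which we'll spend more time with later on. 239 | 240 | The `ObjectId` which MongoDB generated for our `_id` field can be selected like so: 241 | 242 | db.unicorns.find( 243 | {_id: ObjectId("TheObjectId")}) 244 | 245 | ## In This Chapter ## 246 | We haven't looked at the `update` command yet, or at some of the fancier things we can do with `find`. However, we did get MongoDB up and running and looked briefly at the `insert` and `remove` commands (there isn't much more to them than what we've seen). We also introduced `find` and saw what MongoDB `selectors` are all about. We've had a good start and laid a solid foundation for what's to come. Believe it or not, you actually know most of what there is to know about MongoDB; it really is meant to be quick to learn and easy to use. I strongly urge you to play with your local copy before moving on. Insert different documents, possibly in new collections, and get familiar with different selectors. Use `find`, `count` and `remove`. After a few tries on your own, things that might have seemed awkward at first will hopefully fall into place. 247 | 248 | # Chapter 2 - Updating # 249 | In chapter 1 we introduced three of the four CRUD (create, read, update and delete) operations. This chapter is dedicated to the one we skipped over: `update`. `Update` has a few surprising behaviors, which is why we dedicate a chapter to it. 250 | 251 | ## Update: Replace Versus $set ## 252 | In its simplest form, `update` takes two parameters: the selector (where) to use and what updates to apply to fields. If Roooooodles had gained a bit of weight, you might expect that we should execute: 253 | 254 | db.unicorns.update({name: 'Roooooodles'}, 255 | {weight: 590}) 256 | 257 | (If you've played with your `unicorns` collection and it doesn't have the original data anymore, go ahead and `remove` all documents and re-insert them using the code from chapter 1.) 258 | 259 | Now, if we look at the updated record: 260 | 261 | db.unicorns.find({name: 'Roooooodles'}) 262 | 263 | You should discover the first surprise of `update`: no document is found. That's because the second parameter we supplied didn't have any update operators, and therefore it was used to **replace** the original document. In other words, the `update` found a document by `name` and replaced the entire document with the new document (the second parameter). This is completely different from how SQL's `update` command works. In some situations, this is ideal and can be leveraged for some truly dynamic updates. However, when you only want to change the value of one or a few fields, you must use MongoDB's `$set` operator. Go ahead and run this update to restore the lost fields: 264 | 265 | db.unicorns.update({weight: 590}, {$set: { 266 | name: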
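'Roooooodles', 267 | dob: new Date(1979, 7, 18, 18, 44), 268 | loves: ['apple'], 269 | gender: 'm', 270 | vampires: 99}}) 271 | 272 | This won't overwrite the new `weight`, since we didn't specify it. Now if we execute: 273 | 274 | db.unicorns.find({name: 'Roooooodles'}) 275 | 276 | we get the expected result. Therefore, the correct way to have updated the weight in the first place is: 277 | 278 | db.unicorns.update({name: 'Roooooodles'}, 279 | {$set: {weight: 590}}) 280 | 281 | ## Update Operators ## 282 | In addition to `$set`, we can leverage other operators to do some nifty things. All update operators work on fields, so your entire document won't be wiped out. For example, the `$inc` operator is used to increment a field by a certain positive or negative amount. If Pilot was incorrectly awarded a couple of vampire kills, we could correct the mistake by executing: 283 | 284 | db.unicorns.update({name: 'Pilot'}, 285 | {$inc: {vampires: -2}}) 286 | 287 | If Aurora suddenly developed a sweet tooth, we could add a value to her `loves` field via the `$push` operator: 288 | 289 | db.unicorns.update({name: 'Aurora'}, 290 | {$push: {loves: 'sugar'}}) 291 | 292 | The [Update Operators](http://docs.mongodb.org/manual/reference/operator/update/#update-operators) section of the MongoDB manual has more information on the other available update operators. 293 | 294 | ## Upserts ## 295 | One of the more pleasant surprises of using `update` is that it fully supports `upserts`. An `upsert` updates the document if a match is found, or inserts a new document if none is. To enable upsert, we pass a third parameter to update: `{upsert:true}`. 296 | 297 | A mundane example is a hit counter for a website. If we wanted to keep an aggregate count in real time, we'd have to see if a record already existed for the page, and based on that decide whether to run an update or an insert. With the upsert option omitted (or set to false), executing the following won't do anything: 298 | 299 | db.hits.update({page: 'unicorns'}, 300 | {$inc: {hits: 1}}); 301 | db.hits.find(); 302 | 303 | However, if we add the upsert option, the results are quite different: 304 | 305 | db.hits.update({page: 'unicorns'}, 306 | {$inc: {hits: 1}}, {upsert:true}); 307 | db.hits.find(); 308 | 309 | Since no document exists with a field `page` equal to `unicorns`, a new document is inserted. If we execute it a second time, the existing document is updated and `hits` is incremented to 2. 310 | 311 | db.hits.update({page: 'unicorns'}, 312 | {$inc: {hits: 1}}, {upsert:true}); 313 | db.hits.find(); 314 | 315 | ## Multiple Updates ## 316 | The final surprise `update` has to offer is that, by default, it updates a single document. So far, for the examples we've looked at, this might seem logical. However, if you executed something like: 317 | 318 | db.unicorns.update({}, 319 | {$set: {vaccinated: true }}); 320 | db.unicorns.find({vaccinated: true}); 321 | 322 | you would expect all of your precious unicorns to be vaccinated, but only the first matching document is updated. To get the behavior you want, the `multi` option must be set to true: 323 | 324 | db.unicorns.update({}, 325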
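| {$set: {vaccinated: true }}, 326 | {multi:true}); 327 | db.unicorns.find({vaccinated: true}); 328 | 329 | ## In This Chapter ## 330 | This chapter concluded our introduction to the basic CRUD operations available against a collection. We looked at `update` in detail and observed three interesting behaviors. First, if you pass MongoDB's `update` a document without update operators, it replaces the existing document. Because of this, you will normally use the `$set` operator (or one of the many other available operators that modify the document). Second, `update` supports an intuitive `upsert` option, which is particularly useful when you don't know whether the document already exists. Finally, by default, `update` updates only the first matching document, so use `multi` when you want to update all matching documents. 331 | 332 | # Chapter 3 - Mastering Find # 333 | Chapter 1 provided a superficial look at the `find` command. There's more to `find` than `selectors`, though. We already mentioned that the result from `find` is a `cursor`. We'll now look at exactly what this means. 334 | 335 | ## Field Selection ## 336 | Before we jump too far into `cursors`, you should know that `find` takes a second optional parameter called "projection". This parameter is the list of fields we want to retrieve or exclude. For example, we can get all of the unicorns' names without getting back other fields by executing: 337 | 338 | db.unicorns.find({}, {name: 1}); 339 | 340 | By default, the `_id` field is always returned. We can explicitly exclude it by specifying `{name:1, _id: 0}`. 341 | 342 | Aside from the `_id` field, you cannot mix and match inclusion and exclusion. If you think about it, that actually makes sense: you either want to select or exclude one or more fields explicitly. 343 | 344 | ## Ordering ## 345 | A few times now I've mentioned that `find` returns a cursor whose execution is delayed until needed. However, what you've no doubt observed from the shell is that `find` executes immediately. This is a behavior of the shell only. We can observe the true behavior of `cursors` by looking at one of the methods we can chain to `find`. The first that we'll look at is `sort`. We specify the fields we want to sort on as a JSON document, using 1 for ascending and -1 for descending. For example: 346 | 347 | //heaviest unicorns first 348 | db.unicorns.find().sort({weight: -1}) 349 | 350 | //by unicorn name then vampire kills: 351 | db.unicorns.find().sort({name: 1, 352 | vampires: -1}) 353 | 354 | As with a relational database, MongoDB can use an index for sorting. We'll look at indexes in more detail later on. However, you should know that MongoDB limits the size of a sort that cannot use an index. That is, if you try to sort a very large result set on a field that isn't indexed, you'll get an error. Some people see this as a limitation. In truth, I wish more databases had the capability to refuse to run unoptimized queries. (I won't turn every MongoDB drawback into a positive, but I've seen enough poorly optimized databases that I sincerely wish they had a strict mode.) 355 | 356 | ## Paging ## 357 | Paging results can be accomplished via the `limit` and `skip` cursor methods. To get the second and third heaviest unicorns, we could do: 358 | 359 | db.unicorns.find() 360 | .sort({weight: -1}) 361 | .limit(2) 362 | .skip(1) 363 | 364 | Using `limit` in conjunction with `sort` is a way to avoid running into problems when sorting on non-indexed fields. 365 | 366 | ## Count ## 367 | The shell makes it possible to execute a `count` directly on a collection, such as: 368 | 369 | db.unicorns.count({vampires: {$gt: 50}}) 370 | 371 | In reality, `count` is a `cursor` method; the shell simply provides a shortcut. Drivers which don't provide such a shortcut need to execute it like this (which also works in the shell): 372 | 373 | 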
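db.unicorns.find({vampires: {$gt: 50}}) 374 | .count() 375 | 376 | ## In This Chapter ## 377 | Using `find` and `cursors` is a straightforward proposition. There are a few additional commands that we'll either cover in later chapters or which only serve edge cases, but, by now, you should be getting pretty comfortable working in the mongo shell and understanding the fundamentals of MongoDB. 378 | 379 | # Chapter 4 - Data Modeling # 380 | Let's shift gears and have a more abstract conversation about MongoDB. Explaining a few new terms and some new syntax is a trivial task. Having a conversation about modeling with a new paradigm isn't as easy. The truth is that most of us are still finding out what works and what doesn't when it comes to modeling with these new technologies. This is only a starting point; ultimately you'll have to practice and learn on real code. 381 | 382 | Out of all NoSQL databases, document-oriented databases are probably the most similar to relational databases, at least when it comes to modeling. However, the differences that exist are important. 383 | 384 | ## No Joins ## 385 | The first and most fundamental difference that you'll need to get comfortable with is MongoDB's lack of joins. I don't know the specific reasons why joins aren't supported in MongoDB, but I do know that joins are generally seen as non-scalable. That is, once you start to horizontally split your data, you end up performing your joins on the client (the application server) anyway. Regardless of the reasons, the fact remains that data *is* relational, and MongoDB doesn't support joins. 386 | 387 | Without knowing anything else, to live in a join-less world we have to do joins ourselves within our application's code. Essentially, we need to issue a second query to `find` the relevant data in a second collection. Setting our data up isn't any different than declaring a foreign key in a relational database. Let's give a little less focus to our beautiful `unicorns` and a bit more time to our `employees`. The first thing we'll do is create a manager (I'm providing an explicit `_id` so that we can build coherent examples): 388 | 389 | db.employees.insert({_id: ObjectId( 390 | "4d85c7039ab0fd70a117d730"), 391 | name: 'Leto'}) 392 | 393 | Now let's add a couple of employees and set their manager as `Leto`: 394 | 395 | db.employees.insert({_id: ObjectId( 396 | "4d85c7039ab0fd70a117d731"), 397 | name: 'Duncan', 398 | manager: ObjectId( 399 | "4d85c7039ab0fd70a117d730")}); 400 | db.employees.insert({_id: ObjectId( 401 | "4d85c7039ab0fd70a117d732"), 402 | name: 'Moneo', 403 | manager: ObjectId( 404 | "4d85c7039ab0fd70a117d730")}); 405 | 406 | 407 | (It's worth repeating that the `_id` can be any unique value. Since you'd likely use an `ObjectId` in real life, we'll use them here as well.) 408 | 409 | Of course, to find all of Leto's employees, one simply executes: 410 | 411 | db.employees.find({manager: ObjectId( 412 | "4d85c7039ab0fd70a117d730")}) 413 | 414 | There's nothing magical here. In the worst cases, most of the time, the lack of joins will merely require an extra query (likely indexed). 415 | 416 | ## Arrays and Embedded Documents ## 417 | Just because MongoDB doesn't have joins doesn't mean it doesn't have a few tricks up its sleeve. Remember when we saw that MongoDB supports arrays as first class objects of a document? It turns out that this is incredibly handy when dealing with one-to-many or many-to-many relationships. As a simple example, if an employee could have two managers, we could simply store these in an array: 418 | 419 | db.employees.insert({_id: ObjectId( 420 | "4d85c7039ab0fd70a117d733"), 421 | name: 'Siona', 422 | manager: [ObjectId( 423 | "4d85c7039ab0fd70a117d730"), 424 | ObjectId( 425 | "4d85c7039ab0fd70a117d732")] }) 426 | 427 | 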
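To make the application-side join described under No Joins concrete, here is a small sketch in plain JavaScript for Node.js (not mongo shell code). It is a hypothetical, in-memory model: the collection is an ordinary array, the `_id` values are short strings instead of `ObjectId`s, and `find` and `withManagers` are made-up helper names that only mimic the behaviour described in this chapter, including a `manager` field that can hold either a single id or an array of ids.

```javascript
// In-memory stand-in for the employees collection. Ids are short strings
// here instead of ObjectIds, purely for readability (an assumption).
const employees = [
  { _id: 'leto', name: 'Leto' },
  { _id: 'duncan', name: 'Duncan', manager: 'leto' },
  { _id: 'moneo', name: 'Moneo', manager: 'leto' },
  { _id: 'siona', name: 'Siona', manager: ['leto', 'moneo'] },
];

// Hypothetical equality match that, like a MongoDB selector, also matches
// when the field is an array containing the value.
function find(collection, field, value) {
  return collection.filter(doc => {
    const v = doc[field];
    return Array.isArray(v) ? v.includes(value) : v === value;
  });
}

// The "join" done in application code: one query for the employee, then a
// second query for each manager id it references (single value or array).
function withManagers(collection, employeeId) {
  const [employee] = find(collection, '_id', employeeId);
  if (!employee) return null;
  const ids = [].concat(employee.manager ?? []);
  const managers = ids.flatMap(id => find(collection, '_id', id));
  return { ...employee, managers: managers.map(m => m.name) };
}

console.log(find(employees, 'manager', 'leto').map(e => e.name));
// [ 'Duncan', 'Moneo', 'Siona' ]
console.log(withManagers(employees, 'siona').managers); // [ 'Leto', 'Moneo' ]
```

In a real application each call to `find` would be a round trip to MongoDB, which is why the text notes that the lack of joins usually costs one extra (ideally indexed) query.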
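Of particular interest is that, for some documents, `manager` can be a single value, while for others it can be an array. Our original `find` query will work for both: 428 | 429 | db.employees.find({manager: ObjectId( 430 | "4d85c7039ab0fd70a117d730")}) 431 | 432 | You'll quickly find that arrays of values are much more convenient to deal with than many-to-many join tables. 433 | 434 | Besides arrays, MongoDB also supports embedded documents. Go ahead and try inserting a document with a nested document, such as: 435 | 436 | db.employees.insert({_id: ObjectId( 437 | "4d85c7039ab0fd70a117d734"), 438 | name: 'Ghanima', 439 | family: {mother: 'Chani', 440 | father: 'Paul', 441 | brother: ObjectId( 442 | "4d85c7039ab0fd70a117d730")}}) 443 | 444 | As you might have guessed, embedded documents can be queried using dot-notation: 445 | 446 | db.employees.find({ 447 | 'family.mother': 'Chani'}) 448 | 449 | We'll briefly talk about where embedded documents fit and how you should use them. 450 | 451 | Combining the two concepts, we can even embed arrays of documents: 452 | 453 | db.employees.insert({_id: ObjectId( 454 | "4d85c7039ab0fd70a117d735"), 455 | name: 'Chani', 456 | family: [ {relation:'mother',name: 'Chani'}, 457 | {relation:'father',name: 'Paul'}, 458 | {relation:'brother', name: 'Duncan'}]}) 459 | 460 | 461 | ## Denormalization ## 462 | Yet another alternative to using joins is to denormalize your data. Historically, denormalization was reserved for performance-sensitive code, or for when data should be snapshotted (like in an audit log). However, with the ever-growing popularity of NoSQL, many of which don't have joins, denormalization as part of normal modeling is becoming increasingly common. This doesn't mean you should duplicate every piece of information in every document. However, rather than letting fear of duplicate data drive your design decisions, consider modeling your data based on what information belongs to what document. 463 | 464 | For example, say you are writing a forum application. The traditional way to associate a specific `user` with a `post` is via a `userid` column within `posts`. With such a model, you can't display `posts` without retrieving (joining to) `users`. A possible alternative is simply to store the `name` as well as the `userid` with each `post`. You could even do so with an embedded document, like `user: {id: ObjectId('Something'), name: 'Leto'}`. Yes, if you let users change their name, you may have to update each document (which is one multi-update). 465 | 466 | Adjusting to this kind of approach won't come easy to some. In a lot of cases it won't even make sense to do this. Don't be afraid to experiment with this approach, though. It's not suitable in every circumstance, but in some it's the best way to do it. 467 | 468 | ## Which Should You Choose? ## 469 | Arrays of ids can be a useful strategy when dealing with one-to-many or many-to-many scenarios. But more commonly, new developers are left deciding between using embedded documents versus doing "manual" referencing. 470 | 471 | First, you should know that an individual document is currently limited to 16 megabytes in size. Knowing that documents have a size limit, though quite generous, gives you some idea of how they are intended to be used. At this point, it seems like most developers lean heavily on manual references for most of their relationships. Embedded documents are frequently leveraged, but mostly for smaller pieces of data which we want to always pull with the parent document. A real-world example may be to store `addresses` with each user, something like: 472 | 473 | db.users.insert({name: 'leto', 474 | email: 'leto@dune.gov', 475 | addresses: [{street: "229 W.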
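43rd St", 476 | city: "New York", state:"NY",zip:"10036"}, 477 | {street: "555 University", 478 | city: "Palo Alto", state:"CA",zip:"94107"}]}) 479 | 480 | This doesn't mean you should underestimate the power of embedded documents or write them off as something of minor utility. Having your data model map directly to your objects makes things a lot simpler and often removes the need to join. This is especially true when you consider that MongoDB lets you query and index fields of embedded documents and arrays. 481 | 482 | ## Few or Many Collections ## 483 | Given that collections don't enforce any schema, it's entirely possible to build a system using a single collection with a mishmash of documents, but it would be a very bad idea. Most MongoDB systems are laid out similarly to what you'd find in a relational system, split across several collections. In other words, if it would be a table in a relational database, there's a significant chance it'll be a collection in MongoDB (many-to-many join tables being an important exception as well as tables that exist only to enable one to many relationships with simple entities). 484 | 485 | The conversation gets even more interesting when you consider embedded documents. The example that frequently comes up is a blog. Should you have a `posts` collection and a `comments` collection, or should each `post` have an array of `comments` embedded within it? Setting aside the 16MB document size limit for the moment (all of *Hamlet* is less than 200KB, so just how popular is your blog?), many developers prefer to separate things out. It's simply cleaner and more explicit, and gives you better performance. MongoDB's flexible schema allows you to combine the two approaches: you can keep comments in their own collection while embedding a few of them (say, the most recent ones) in the blog post so they can be displayed along with it. This follows the principle of keeping together the data you want to get back in one query. 486 | 487 | There's no hard rule (well, aside from 16MB). Play with different approaches and you'll get a sense of what does and does not work. 488 | 489 | ## In This Chapter ## 490 | Our goal in this chapter was to provide some helpful guidelines for modeling your data in MongoDB, a starting point, if you will. Modeling in a document-oriented system is different from modeling in a relational world, but not too different. You have more flexibility and only one constraint, and for a new system, things tend to fit quite nicely. The only way you can get it wrong is by not trying. 491 | 492 | # Chapter 5 - When To Use MongoDB # 493 | By now you should have a feel for where and how MongoDB might fit into your existing system. There are so many new and competing storage technologies that it's easy to get overwhelmed by all of the choices. 494 | 495 | For me, the most important lesson, which has nothing to do with MongoDB, is that you no longer have to rely on a single solution for dealing with your data. No doubt a single solution has obvious advantages, and for a lot of projects, possibly even most, a single solution is the sensible approach. The idea isn't that you *must* use different technologies, but rather that you *can*. Only you know whether the benefits of introducing a new solution outweigh the costs. 496 | 497 | With that said, I'm hopeful that what you've seen so far has made you see MongoDB as a general solution. It's been mentioned a couple of times that document-oriented databases share a lot in common with relational databases. Therefore, rather than tiptoeing around it, let's simply state that MongoDB should be seen as a direct alternative to relational databases. Where one might see Lucene as enhancing a relational database with full text indexing, or Redis as a persistent key-value store, MongoDB is a central repository for your data. 498 | 499 | Notice that I didn't call MongoDB a *replacement* for relational databases, but rather an *alternative*. It's a tool that can do what a lot of other tools can do. Some of it MongoDB does better, some of it MongoDB does worse. Let's dissect things a little further. 500 | 501 | ## Flexible Schema ## 502 | An oft-touted benefit of document-oriented databases is that they don't require a fixed schema. This makes them much more flexible than traditional database tables. I agree that flexible schema is a nice feature, but not for the main reason most people mention. 503 | 504 | 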
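People talk about schema-less as though you'll suddenly start storing a crazy mishmash of data. There are domains and data sets which can really be a pain to model using relational databases, but I see those as edge cases. Schema-less is cool, but most of your data is going to be highly structured. It's true that having an occasional mismatch can be handy, especially when you introduce new features, but in reality it's nothing a nullable column probably wouldn't solve just as well. 505 | 506 | For me, the real benefit of dynamic schema is the lack of setup and the reduced friction with OOP. This is particularly true when you're working with a static language. I've worked with MongoDB in both C# and Ruby, and the difference is striking. Ruby's dynamism and its popular ActiveRecord implementations already reduce much of the object-relational impedance mismatch. That isn't to say MongoDB is a bad match for Ruby; it's an excellent one. Rather, I think most Ruby developers would see MongoDB as an incremental improvement, whereas C# or Java developers would see a fundamental shift in how they interact with their data. 507 | 508 | Think about it from the perspective of a driver developer. You want to save an object? Serialize it to JSON (technically BSON, but close enough) and send it to MongoDB. There is no property mapping or type mapping. This straightforwardness flows directly to you, the end developer. 509 | 510 | ## Writes ## 511 | One specialized role that MongoDB fits well is logging. There are two aspects of MongoDB which make writes quite fast. First, you have the option to send a write command and have it return immediately, without waiting for the write to complete. Second, you can control the write behavior with respect to data durability. These settings, in addition to specifying how many servers should get your data before a write is considered successful, are configurable per write, giving you a great deal of control over write performance and data durability. 512 | 513 | In addition to these performance factors, log data is one of those data sets which can often take advantage of schema-less collections. Finally, MongoDB has something called a [capped collection](http://docs.mongodb.org/manual/core/capped-collections/). So far, all of the collections we've implicitly created are normal collections. We can create a capped collection by using the `db.createCollection` command and flagging it as capped: 514 | 515 | //limit our capped collection to 1 megabyte 516 | db.createCollection('logs', {capped: true, 517 | size: 1048576}) 518 | 519 | When our capped collection reaches its 1MB limit, old documents are automatically purged. A limit on the number of documents, rather than on size, can be set using `max`. Capped collections have some interesting properties. For example, you can update a document, but the update can't change its size. The insertion order is preserved, so you don't need to add an extra index to get proper time-based sorting. You can also "tail" a capped collection, the way you tail a file in Unix via `tail -f`, which lets you receive new data as it arrives, if it arrives, without having to re-query. 520 | 521 | If you want to "expire" your data based on time rather than overall collection size, you can use [TTL indexes](http://docs.mongodb.org/manual/tutorial/expire-data/), where TTL stands for "time-to-live". 522 | 523 | ## Durability ## 524 | Prior to version 1.8, MongoDB did not have single-server durability. That is, a server crash would likely result in lost or corrupt data. The solution had always been to run MongoDB in a multi-server setup (MongoDB supports replication). Journaling was one of the major features added in 1.8. Since version 2.0, MongoDB enables journaling by default, which allows fast recovery of the server in case of a crash or power outage. 525 | 526 | Durability is only mentioned here because a lot has been made around MongoDB's past lack of single-server durability. This'll likely keep showing up in Google searches for some time to come. Information you find about journaling being a missing feature is simply out of date. 527 | 528 | ## Full Text Search ## 529 | True full text search capability is a recent addition to MongoDB. It supports fifteen languages, with stemming and stop words. If you need a full text search engine more powerful and full-featured than MongoDB's native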
的全文检索支持,如果你需要一个更强大更全面的全文检索引擎的话,你需要另找方案。 530 | 531 | ## 事务(Transactions) ## 532 | MongoDB 不支持事务。这有两个代替案,一个很好用但有限制,另外一个比较麻烦但灵活。 533 | 534 | 第一个方案,就是各种原子更新操作。只要能解决你的问题,都挺不错。我们已经看过几个简单的了,比如 `$inc` 和 `$set`。还有像 `findAndModify` 命令,可以更新或删除文档之后,自动返回修改过的文档。 535 | 536 | 第二个方案,当原子操作不能满足的时候,回到两段提交上来。对于事务,两段提交就好像给链接手工解引用。这是一个和存储无关的解决方案。两段提交实际上在关系型数据库世界中非常常用,用来实现多数据库之间的事务。 MongoDB 网站 [有个例子](http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/) 演示了最典型的场合 (资金转账)。通常的想法是,把事务的状态保存到实际的原子更新的文档中,然后手工的进行 init-pending-commit/rollback 处理。 537 | 538 | MongoDB 支持内嵌文档以及它灵活的 schema 设计,让两步提交没那么痛苦,但是它仍然不是一个好处理,特别是当你刚开始接触它的时候。 539 | 540 | ## 数据处理(Data Processing) ## 541 | 在2.2 版本之前的 MongoDB 依赖 MapReduce 来解决大部分数据处理工作。在 2.2 版本,它追加了一个强力的功能,叫做 [aggregation framework or pipeline](http://docs.mongodb.org/manual/core/aggregation-pipeline/),因此你只要对那些尚未支持管道的,需要使用复杂方法的,不常见的聚合使用 MapReduce。下一章我们将看看聚合管道和 MapReduce 的细节。现在,你可以把他们想象成功能强大的,用不同方法实现的 `group by` (打个比方)。对于非常大的数据的处理,你可能要用到其他的工具,比如 Hadoop。值得庆幸的是,这两个系统是相辅相成的,这里有个 [MongoDB connector for Hadoop](http://docs.mongodb.org/ecosystem/tools/hadoop/)。 542 | 543 | 当然,关系型数据库也不擅长并行数据处理。MongoDB 有计划在未来的版本中,改善增加处理大数据集的能力。 544 | 545 | ## 地理空间查询(Geospatial) ## 546 | 一个很强大的功能就是 MongoDB 支持 [geospatial 索引](http://docs.mongodb.org/manual/applications/geospatial-indexes/)。这允许你保存 geoJSON 或者 x 和 y 坐标到文档,并查询文档,用如 `$near` 来获取坐标集,或者 `$within` 来获取一个矩形或圆中的点。这个特性最好通过一些可视化例子来演示,所以如果你想学更多的话,可以试试看 [5 minute geospatial interactive tutorial](http://mongly.openmymind.net/geo/index)。 547 | 548 | ## 工具和成熟度 ## 549 | 你应该已经知道这个问题的答案了,MongoDB 确实比大多数的关系型数据要年轻很多。这个问题确实是你应当考虑的,但是到底有多重要,这取决于你要做什么,怎么做。不管怎么说,一个好的评估,不可能忽略 MongoDB 年轻这一事实,而可用的工具也不是很好 (虽然成熟的关系型数据库工具有些也非常渣!)。举个例子,它缺乏对十进制浮点数的支持,在处理货币的系统来说,明显是一个问题 (尽管也不是致命的) 。 550 | 551 | 积极的一方面,它为大多数语言提供了驱动,协议现代而简约,开发速度相当快。MongoDB 被众多公司用到了生产环境中,虽然有所担心,但经过验证后,担心很快就变成了过去。 552 | 553 | ## 小结 ## 554 | 本章要说的是,MongoDB,大多数情况下,可以取代关系型数据库。它更简单更直接;更快速并且通常对应用开发者的约束更少。不过缺乏事务支持也许值得慎重考虑。当人们说起 *MongoDB 在新的数据库阵营中到底处在什么位置?* 时,答案很简单: **中庸**(*2*)。 555 | 556 | # 
第六章 - 数据聚合 # 557 | 558 | ## 聚合管道(Aggregation Pipeline) ## 559 | 聚合管道提供了一种方法用于转换整合文档到集合。你可以通过管道来传递文档,就像 Unix 的 "pipe" 一样,将一个命令的输出传递到另第二个,第三个,等等。 560 | 561 | 最简单的聚合,应该是你在 SQL 中早已熟悉的 `group by` 操作。我们已经看过 `count()` 方法,那么假设我们怎么才能知道有多少匹公独角兽,有多少匹母独角兽呢? 562 | 563 | db.unicorns.aggregate([{$group:{_id:'$gender', 564 | total: {$sum:1}}}]) 565 | 566 | 在 shell 中,我们有 `aggregate` 辅助类,用来执行数组的管道操作。对于简单的对某物进行分组计数,我们只需要简单的调用 `$group`。这和 SQL 中的 `GROUP BY` 完全一致,我们用来创建一个新的文档,以 `_id` 字段表示我们以什么来分组(在这里是以 `gender`) ,另外的字段通常被分配为聚合的结果,在这里,我们对匹配某一性别的各文档使用了 `$sum` 1 。你应该注意到了 `_id` 字段被分配为 `'$gender'` 而不是 `'gender'` - 字段前面的 `'$'` 表示,该字段将会被输入的文档中的有同样名字的值所代替,一个占位符。 567 | 568 | 我们还可以用其他什么管道操作呢?在 `$group` 之前(之后也很常用)的一个是 `$match` - 这和 `find` 方法完全一样,允许我们获取文档中某个匹配的子集,或者在我们的结果中对文档进行筛选。 569 | 570 | db.unicorns.aggregate([{$match: {weight:{$lt:600}}}, 571 | {$group: {_id:'$gender', total:{$sum:1}, 572 | avgVamp:{$avg:'$vampires'}}}, 573 | {$sort:{avgVamp:-1}} ]) 574 | 575 | 这里我们介绍另外一个管道操作 `$sort` ,作用和你想的完全一致,还有和它一起用的 `$skip` 和 `$limit`。以及用 `$group` 操作 `$avg`。 576 | 577 | MongoDB 数组非常强大,并且他们不会阻止我们往保存中的数组中写入内容。我们需要可以 "flatten" 他们以便对所有的东西进行计数: 578 | 579 | db.unicorns.aggregate([{$unwind:'$loves'}, 580 | {$group: {_id:'$loves', total:{$sum:1}, 581 | unicorns:{$addToSet:'$name'}}}, 582 | {$sort:{total:-1}}, 583 | {$limit:1} ]) 584 | 585 | 这里我们可以找出独角兽最喜欢吃的食物,以及拿到喜欢这种食物的独角兽的名单。 `$sort` 和 `$limit` 的组合能让你拿到 "top N" 这种查询的结果。 586 | 587 | 还有另外一个强大的管道操作叫做 [`$project`](http://docs.mongodb.org/manual/reference/operator/aggregation/project/#pipe._S_project) (类似于 `find`),不但允许你拿到指定字段,还可以根据现存字段进行创建或计算一个新字段。比如,可以用数学操作,在做平均运算之前,对几个字段进行加法运算,或者你可以用字符串操作创建一个新的字段,用于拼接现有字段。 588 | 589 | 这只是用聚合所能做到的众多功能中的皮毛, 2.6 的聚合拥有了更强大的力量,比如聚合命令可以返回结果集的游标(我们已经在第一章学过了) 或者可以将结果写到另外一个新集合中,通过 `$out` 管道操作。你可以从 [MongoDB 手册](http://docs.mongodb.org/manual/core/aggregation-pipeline/) 得到关于管道操作和表达式操作更多的例子。 590 | 591 | ## MapReduce ## 592 | MapReduce 分两步进行数据处理。首先是 map,然后 reduce。在 map 步骤中,转换输入文档和输出一个 key=>value 对(key 和/或 value 可以很复杂)。然后, key/value 对以 key 
进行分组,有同样的 key 的 value 会被收入一个数组中。在 reduce 步骤中,获取 key 和该 key 的 value 的数组,生成最终结果。map 和 reduce 方法用 JavaScript 来编写。 593 | 594 | 在 MongoDB 中我们对一个集合使用 `mapReduce` 命令。 `mapReduce` 执行 map 方法, reduce 方法和 output 指令。在我们的 shell 中,我们可以创建输入一个 JavaScript 方法。许多库中,支持字符串方法 (有点丑)。第三个参数设置一个附加参数,比如说我们可以过滤,排序和限制那些我们想要分析的文档。我们也可以提供一个 `finalize` 方法来处理 `reduce` 步骤之后的结果。 595 | 596 | 在你的大多数聚合中,也许无需用到 MapReduce , 但如果需要,你可以读到更多关于它的内容,从 [我的 blog](http://openmymind.net/2011/1/20/Understanding-Map-Reduce/) 和 [MongoDB 手册](http://docs.mongodb.org/manual/core/map-reduce/)。 597 | 598 | ## 小结 ## 599 | 在这章中我们介绍了 MongoDB 的 [聚合功能(aggregation capabilities)](http://docs.mongodb.org/manual/aggregation/)。 一旦你理解了聚合管道(Aggregation Pipeline)的构造,它还是相对容易编写的,并且它是一个聚合数据的强有力工具。 MapReduce 更难理解一点,不过它强力无边,就像你用 JavaScript 写的代码一样。 600 | 601 | # 第七章 - 性能和工具 # 602 | 在这章中,我们来讲几个关于性能的话题,以及在 MongoDB 开发中用到的一些工具。我们不会深入其中的一个话题,不过我们会指出每个话题中最重要的方面。 603 | 604 | ## 索引(Index) ## 605 | 首先我们要介绍一个特殊的集合 `system.indexes` ,它保存了我们数据库中所有的索引信息。索引的作用在 MongoDB 中和关系型数据库基本一致: 帮助改善查询和排序的性能。创建索引用 `ensureIndex` : 606 | 607 | // where "name" is the field name 608 | db.unicorns.ensureIndex({name: 1}); 609 | 610 | 删除索引用 `dropIndex`: 611 | 612 | db.unicorns.dropIndex({name: 1}); 613 | 614 | 可以创建唯一索引,这需要把第二个参数 `unique` 设置为 `true`: 615 | 616 | db.unicorns.ensureIndex({name: 1}, 617 | {unique: true}); 618 | 619 | 索引可以内嵌到字段中 (再说一次,用点号) 和任何数组字段。我们可以这样创建复合索引: 620 | 621 | db.unicorns.ensureIndex({name: 1, 622 | vampires: -1}); 623 | 624 | 索引的顺序 (1 升序, -1 降序) 对单键索引不起任何影响,但它会在使用复合索引的时候有所不同,比如你用不止一个索引来进行排序的时候。 625 | 626 | 阅读 [indexes page](http://docs.mongodb.org/manual/indexes/) 获取更多关于索引的信息。 627 | 628 | ## Explain ## 629 | 需要检查你的查询是否用到了索引,你可以通过 `explain` 方法: 630 | 631 | db.unicorns.find().explain() 632 | 633 | 输出告诉我们,我们用的是 `BasicCursor` (意思是没索引), 12 个对象被扫描,用了多少时间,什么索引,如果有索引,还会有其他有用信息。 634 | 635 | 如果我们改变查询索引语句,查询一个有索引的字段,我们可以看到 `BtreeCursor` 作为索引被用到填充请求中去: 636 | 637 | db.unicorns.find({name: 'Pilot'}).explain() 638 | 639 | ## 复制(Replication) ## 640 | MongoDB 
的复制在某些方面和关系型数据库的复制类似。所有的生产部署应该都是副本集,理想情况下,三个或者多个服务器都保持相同的数据。写操作被发送到单个服务器,也即主服务器,然后从它异步复制到所有的从服务器上。你可以控制是否允许从服务器上进行读操作,这可以让一些特定的查询从主服务器中分离出来,当然,存在读取到旧数据的风险。如果主服务器异常关闭,从服务中的一个将会自动晋升为新的主服务器继续工作。另外,MongoDB 的复制不在本书的讨论范围之内。 641 | 642 | ## 分片(Sharding) ## 643 | MongoDB 支持自动分片。分片是实现数据扩展的一种方法,依靠在跨服务器或者集群上进行数据分区来实现。一个最简单的实现是把所有的用户数据,按照名字首字母 A-M 放在服务器 1 ,然后剩下的放在服务器 2。谢天谢地,MongoDB 的拆分能力远比这种分法要强。分片不在本书的讨论范围之内,不过你应当有分片的概念,并且,当你的需求增长超过了使用单一副本集的时候,你应该考虑它。 644 | 645 | 尽管复制有时候可以提高性能(通过将长时间查询隔离到从服务器,或者降低某些类型的查询的延迟),但它的主要目的是维护高可用性。分片是扩展 MongoDB 集群的主要方法。把复制和分片结合起来实现可扩展和高可用性的通用方法。 646 | 647 | ## 状态(Stats) ## 648 | 你可以通过 `db.stats()` 查询数据库的状态。基本上都是关于数据库大小的信息。你还可以查询集合的状态,比如说 `unicorns` 集合,可以输入 `db.unicorns.stats()`。基本上都是关于集合大小的信息,以及集合的索引信息。 649 | 650 | ## 分析器(Profiler) ## 651 | 你可以这样执行 MongoDB profiler : 652 | 653 | db.setProfilingLevel(2); 654 | 655 | 启动之后,我们可以执行一个命令: 656 | 657 | db.unicorns.find({weight: {$gt: 600}}); 658 | 659 | 然后检查 profiler: 660 | 661 | db.system.profile.find() 662 | 663 | 输出会告诉我们:什么时候执行了什么,有多少文档被扫描,有多少数据被返回。 664 | 665 | 你要停止 profiler 只需要再调用一次 `setProfilingLevel` ,不过这次参数是 `0`。指定 `1` 作为第一个参数,将会统计那些超过 100 milliseconds 的任务. 
100 milliseconds 是默认的阈值,你可以在第二个参数中,指定不同的阈值时间,以 milliseconds 为单位: 666 | 667 | //profile anything that takes 668 | //more than 1 second 669 | db.setProfilingLevel(1, 1000); 670 | 671 | ## 备份和还原 ## 672 | 在 MongoDB 的 `bin` 目录下有一个可执行文件 `mongodump` 。简单执行 `mongodump` 会链接到 localhost 并备份你所有的数据库到 `dump` 子目录。你可以用 `mongodump --help` 查看更多执行参数。常用的参数有 `--db DBNAME` 备份指定数据库和 `--collection COLLECTIONNAME` 备份指定集合。你可以用 `mongorestore` 可执行文件,同样在 `bin` 目录下,还原之前的备份。同样, `--db` 和 `--collection` 可以指定还原的数据库和/或集合。 `mongodump` 和 `mongorestore` 使用 BSON,这是 MongoDB 的原生格式。 673 | 674 | 比如,来备份我们的 `learn` 数据库导 `backup` 文件夹,我们需要执行(在控制台或者终端中执行该命令,而不是在 mongo shell 中): 675 | 676 | mongodump --db learn --out backup 677 | 678 | 如果只还原 `unicorns` 集合,我们可以这样做: 679 | 680 | mongorestore --db learn --collection unicorns \ 681 | backup/learn/unicorns.bson 682 | 683 | 值得一提的是, `mongoexport` 和 `mongoimport` 是另外两个可执行文件,用于导出和从 JSON/CSV 格式文件导入数据。比如说,我们可以像这样导出一个 JSON: 684 | 685 | mongoexport --db learn --collection unicorns 686 | 687 | CSV 格式是这样: 688 | 689 | mongoexport --db learn \ 690 | --collection unicorns \ 691 | --csv --fields name,weight,vampires 692 | 693 | 注意 `mongoexport` 和 `mongoimport` 不一定能正确代表数据。真实的备份中,只能使用 `mongodump` 和 `mongorestore` 。 你可以从 MongoDB 手册中读到更多的 [备份须知](http://docs.mongodb.org/manual/core/backups/) 。 694 | 695 | ## 小结 ## 696 | 在这章中我们介绍了 MongoDB 的各种命令,工具和性能细节。我们没有涉及所有的东西,不过我们已经把常用的都看了一遍。MongoDB 的索引和关系型数据库中的索引非常类似,其他一些工具也一样。不过,在 MongoDB 中,这些更易于使用。 697 | 698 | # 总结 # 699 | 你现在应该有足够的能力开始在真实项目中使用 MongoDB 了。虽然 MongoDB 远不止我们学到的这些内容,但是你要作的下一步是,把学到的知识融会贯通,熟悉我们需要用到的功能。[MongoDB website](http://www.mongodb.org/) 有许多有用的信息。官网的 [MongoDB user group](http://groups.google.com/group/mongodb-user) 是个问问题的好地方。 700 | 701 | NoSQL 不光是为需求而生,它同时还是不断尝试创新的成果。不得不承认,我们的领域是不断前行的。如果我们不尝试,一旦失败,我们就绝不会取得成功。就是这样的,我认为,这是让你在职业生涯一路走好的方法。 702 | 703 | ---------- 704 | 705 | *1* :中文版本 [the-little-redis-book][1] 706 | 707 | *2* :参考 [justinyhuang][2] 的翻译。MongoDB 属于 NoSQL,但是和传统关系型数据库类似,且较为通用。 708 | 709 | 
[1]:https://github.com/geminiyellow/the-little-redis-book/blob/master/zh-cn/redis.md
[2]:https://github.com/justinyhuang/the-little-mongodb-book-cn/blob/master/mongodb.md
--------------------------------------------------------------------------------
/zh-cn/title.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ilivebox/the-little-mongodb-book/fb0728599f64aeec9d5d87d70b3f8a1877dea3ef/zh-cn/title.png
--------------------------------------------------------------------------------
/zh-cn/title.psd:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ilivebox/the-little-mongodb-book/fb0728599f64aeec9d5d87d70b3f8a1877dea3ef/zh-cn/title.psd
--------------------------------------------------------------------------------
/zh-cn/title.txt:
--------------------------------------------------------------------------------
% The Little MongoDB Book
% Karl Seguin
--------------------------------------------------------------------------------