├── .gitignore
├── session_notes
    ├── garbage collection.md
    ├── scaling to millions.md
    ├── Dr Nic - Contributing to OSS.md
    ├── adventures in full text search.md
    ├── learning to speak interface.md
    ├── persistence smoothie.md
    ├── Don't Repeat Yourself.md
    ├── app monitoring with cucumber.md
    ├── cassandra and cassandraobject.md
    ├── redis rails and resque.md
    ├── from 1 to 30.md
    ├── derek sivers.md
    ├── user behavior tracking.md
    ├── implementing user recommendations.md
    ├── million dollar mongo.md
    └── Building an API with Rails.md
└── README


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .dropbox
3 | slides


--------------------------------------------------------------------------------
/session_notes/garbage collection.md:
--------------------------------------------------------------------------------
1 | # Garbage Collection and the Ruby Heap
2 | 


--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
1 | jayzes' notes from RailsConf 2010 in Baltimore, with mine added in.  Feel free to fork and add your own.


--------------------------------------------------------------------------------
/session_notes/scaling to millions.md:
--------------------------------------------------------------------------------
1 | # Scaling to Millions
2 | * EC2 performance is definitely an issue - big difference between physical and virtual hardware
3 | * Mongo falls down on big datasets/high concurrency due to locking issues
4 | * Cassandra is cool, but likes larger clusters
5 | 	* Recovery is very resource intensive, can kill smaller machines
6 | 	* Worth the hassle if you have billions of rows.  
7 | * Lesson: I'm a lousy sysadmin
8 | 	* Don't drink the Kool-Aid or believe the hype; do your own homework before going with a new piece of infrastructure


--------------------------------------------------------------------------------
/session_notes/Dr Nic - Contributing to OSS.md:
--------------------------------------------------------------------------------
 1 | ## Dr. Nic - Contributing to OSS ##
 2 | * What do you make?  Tools, TM bundles, little things, first drafts
 3 | * Help or recreate - pitch in rather than rewrite
 4 | 	* Nice to have your name as an author
 5 | 	* Less responsibility to contribute than maintain
 6 | 	* Visibility of contributions via blogs, announcements, etc.
 7 | 	* Make friends (at conferences)
 8 | * Project teams - not necessarily exclusive
 9 | 	* Just 'do' - add/fix/document what you want
10 | 	* Opportunity to express individuality
11 | * Aim is to get things done
12 | 	* OSS is a tool box, spare parts box, free and delicious
13 | * The 8 Steps (not magic, just convenient)
14 | 	1.  Get annoyed by a defect or missing feature
15 | 		* Circle of concern and circle of influence
16 | 	2.  Finding the source
17 | 		* Step 2b - Has anyone else fixed it already?
18 | 	3.  Check out the source
19 | 	4.  Snoop around
20 | 		* rake -T, etc.
21 | 	5.  Make changes
22 | 		* Add coverage tests
23 | 		* Add break tests
24 | 		* Document!
25 | 		* Cleanup/refactor
26 | 		* Add features
27 | 	6.  Refresh the code from the repo
28 | 	7.  Create the patch
29 | 	8.  Submit patch
30 | 	


--------------------------------------------------------------------------------
/session_notes/adventures in full text search.md:
--------------------------------------------------------------------------------
 1 | # Adventures in Full Text Search
 2 | ## Sarah Allen (Mightyverse)
 3 | ### 11:45am Thursday, 06/10/2010 
 4 | 
 5 | What are the things that make full text search special?
 6 | How do you choose what to use?
 7 | 
 8 | ##Stemming
 9 | ###Q:
10 | How do you avoid matching embedded words that should not show up in the search (e.g. "run" inside "drunk")?
11 | ###A: 
12 | Stemming
13 | 
14 | ##Stop words
15 | Words that are meaningless to search and need to be added to an excluded list
16 | 
17 | ##Tokenization
18 | Contextualized groupings of words that have different meanings when grouped in different ways. More common with Asian languages than in English.
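
A minimal Ruby sketch tying the three ideas above together (tokenization, stop words, stemming), assuming a toy in-memory index. The `STOP_WORDS` list, the suffix-stripping `stem`, and the `build_index`/`search` helpers are illustrative names that do not come from the talk, and a real engine would use a proper stemmer rather than this crude one.

```ruby
require "set"

# Tiny illustrative stop list; real lists are much longer.
STOP_WORDS = Set.new(%w[a an and the of to in is]).freeze

# Crude suffix stripping, standing in for a real stemming algorithm.
def stem(word)
  word.sub(/(ing|ed|s)\z/, "")
end

# Split into lowercase words, drop stop words, then stem what is left.
def tokens(text)
  text.downcase.scan(/[a-z]+/).reject { |w| STOP_WORDS.include?(w) }.map { |w| stem(w) }
end

# Inverted index: stemmed term => set of document ids containing it.
def build_index(docs)
  index = Hash.new { |h, term| h[term] = Set.new }
  docs.each { |id, text| tokens(text).each { |t| index[t] << id } }
  index
end

# Documents that contain every (stemmed, non-stop) query term.
def search(index, query)
  tokens(query).map { |t| index.fetch(t, Set.new) }.reduce(:&) || Set.new
end

index = build_index(1 => "He runs every morning", 2 => "The drunk sailor")
search(index, "run") # matches document 1 only; "drunk" is a different token
```

Because matching happens on whole stemmed tokens instead of substrings, "run" finds "runs" but not "drunk".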
19 | 
20 | 
21 | ##Accuracy
22 | * It's a tradeoff between accuracy and speed.
23 | * How important are false positives (or the absence of)?
24 | * How important are the top results compared with the exact number of results?
25 | 
26 | ##Message to take home
27 | * Learn how users use search on your site to inform your search implementation.
28 | * Simple database based full text searches may work for your needs.
29 | 
30 | ##Thoughts
31 | * very cursory
32 | * useful for a beginner overview of key concepts
33 | 
34 | 


--------------------------------------------------------------------------------
/session_notes/learning to speak interface.md:
--------------------------------------------------------------------------------
 1 | #Learn to Speak Interface: Creating Conversations Between Developers and Designers
 2 | ##Jess Martin (Relevance)
 3 | ###2:50pm Wednesday, 06/09/2010 
 4 | 
 5 | * @jessmartin
 6 | * Slides and worksheets: SpeakInterface.com
 7 | * Feedback: http://spkr8.com/t/3440
 8 | 
 9 | 
10 | ##Software is for people
11 | Know the person, purpose, priority
12 | 
13 | ##Three questions:
14 | 
15 |    * WHO is the user
16 |    * WHY are they here?
17 |    * WHAT is most important?
18 | 
19 | ###WHO
20 | 
21 | *Exercises
22 | 
23 |    * Day in the life (go through person's life mentally - what are they doing throughout the day, especially when they are using your application)
24 |       * Draw a timeline (see image from slides)
25 |       * Usage helps identify key features or requirements
26 |    * Gotta Wanna
27 | 
28 | ###Why do they use the app?
29 | 
30 |    * Do they choose to? (intrinsic)
31 |    * Has someone told them to use it? (extrinsic)
32 | PURPOSE - Why are they here?
33 | 
34 | *Exercises
35 | 
36 |    * Back-seat driver - What would you tell them to click next (the thing you'd be frustrated if they missed).
37 |    * Cotton-Eyed Joe - Where is the user in the flow of the application?
38 | 
39 | ###PRIORITY
40 | What is the most important element or action to be taken on the page?
41 | 
42 | *Exercises
43 | 
44 |    * MVP test (don't have more than 3 items in the hierarchy on the page)
45 |       * Design for mobile (focuses on the most important things)
46 |    * Design from the Inside Out
47 |       * Focus on what matters most first
48 |    * Evaluating the MVP
49 |       * Squint Test
50 |       * Font Size Test (make sure not to have more than 3 levels of hierarchy)
51 | 


--------------------------------------------------------------------------------
/session_notes/persistence smoothie.md:
--------------------------------------------------------------------------------
 1 | # Persistence Smoothie #
 2 | * Example code is at http://www.github.com/flipsasser/Persistence-Smoothie
 3 | 
 4 | ## What is NoSQL? ##
 5 | * Any kind of persistence
 6 | * Generally speaking, ACID is out the door in favor of speed and scalability
 7 | * Probably shouldn't be running it *all* the time - good tool for the toolbox though
 8 | * Key-value stores
 9 | 	* Redis, Riak, Voldemort, Tokyo Cabinet, MemcacheDB
10 | 	* Hashes in the sky
11 | 	* Use case is taking ugly (and non-performant) joins and transforming them into denormalized data
12 | * Document Stores
13 | 	* Mongo, Couch, Riak
14 | 	* Multi-level hashes
15 | 	* Generally query with map/reduce to flatten
16 | 	* Use case is loose schema, no ACID needed
17 | * Column-based stores
18 | 	* Cassandra,  HBase
19 | 	* Have a schema, more like a traditional DB
20 | * Graph stores
21 | 	* Track relationships efficiently
22 | 	* Neo4j, HypergraphDB, InfoGrid
23 | 	* Use case is deep relationship mapping
24 | 
25 | ## How do you get started with an existing app written with SQL? ##
26 | * First option is putting together helpers - just slam them into models
27 | 	* not very DRY
28 | * Second option is using DataMapper
29 | 	* swiss army knife of ORMs
30 | 	* much simpler, gives a lot of AR niceties
31 | 	* still requires AR refactoring, ecosystem not as mature
32 | 	
33 | ## Example - Convert a Store App ##
34 | * Cases are Authentication, Products, Purchases, Activity stream
35 | 	* Auth is kind of not worth converting
36 | 	* Products are the sweet spot though - loose schema, lots of different structure
37 | 	* Purchases need to be transactional, don't mess with those
38 | 	* Activity stream is another sweet spot - lots of joins to be eliminated (!)
39 | * (Live coding)
40 | 	* Can use OpenStruct as a base for some simple persistence layers
41 | * Deployment and data portability become a much bigger pain in the ass when dealing with multiple data stores
42 | * The ecosystem of tools around ActiveRecord is really nice, but you basically have to say goodbye to it for many NoSQL ORMs (gem uninstall everything)
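
A rough sketch of the OpenStruct idea from the live coding, assuming an in-memory Hash in place of a real key-value store. The `Product` class, the `STORE` constant, and the `product:<id>` key scheme are hypothetical and do not come from the talk's example repository.

```ruby
require "ostruct"
require "json"

# In-memory Hash standing in for a key-value store such as Redis.
STORE = {}

# Loose-schema record: any attribute passed to new becomes an accessor.
class Product < OpenStruct
  # Serialize all attributes as JSON under a key derived from the id.
  def save
    STORE["product:#{id}"] = JSON.generate(to_h)
    self
  end

  # Rebuild a Product from stored JSON, or return nil when nothing is stored.
  def self.find(id)
    raw = STORE["product:#{id}"]
    raw && new(JSON.parse(raw))
  end
end

Product.new(id: 1, name: "Mug", colors: ["red", "blue"]).save
Product.find(1).colors # the stored list of colors
```

Each product can carry a different set of attributes without a schema change, which is the "loose schema" sweet spot described above.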


--------------------------------------------------------------------------------
/session_notes/Don't Repeat Yourself.md:
--------------------------------------------------------------------------------
 1 | # Don't repeat yourself, repeat others #
 2 | * Experiences writing Ruby - improve, learn, share
 3 | * 3 points - create, steal, think
 4 | 
 5 | ## Create ##
 6 | * What we have to learn to do, we learn by doing - Aristotle
 7 | * Reinvent the wheel - important to dev education and skill (like bodybuilder)
 8 | * MongoMapper - redid others work, but gained an appreciation for the work of others (DM, AR)
 9 | * Dynamic languages are dynamic
10 | 	* autoload - classes loading on demand
11 | 	* method_missing magic
12 | 	* dirty keys in AR
13 | 	* include hook and class_eval - used for the mapper mixins
14 | 	* super and the ancestor tree (with overrides in the same file)
15 | 	* dynamically defined classes and modules (Class.new, Module.new, etc)
16 | * Objects can do more than #new and #save
17 | 	* equality - #eql? versus == (can alias == to #eql?) but don't touch #equal?
18 | 	* clone vs. dup
19 | 		* clone doesn't clone internal objects, like hashes
20 | 		* can use initialize_copy method definition for your class to work properly
21 | 	* Hooks
22 | 		* inherited hook - this is how STI works
23 | 	* Validations, callbacks, comparable, enumerable, etc. 
24 | * Patterns - not just for the enterprise
25 | 	* Proxies a la AR - class as an interface to another class
26 | 	* Decorators - adding behavior to an instance dynamically.  Can use extend/include on an object!
27 | 	* Identity map - "a man with 2 watches never knows the time"
28 | 		* Don't ever load the same object 2x
29 | * Local APIs - eat your own dogfood to make sure it's consistent for others
30 | * "I want to make things, not just glue things together" - Mike Taylor
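
To illustrate the clone vs. dup point above: both make shallow copies, so an internal Hash is shared until `initialize_copy` is defined. The `Document` and `SafeDocument` classes are made up for this sketch.

```ruby
class Document
  attr_reader :attrs

  def initialize(attrs = {})
    @attrs = attrs
  end
end

class SafeDocument < Document
  # Ruby calls this on the new copy for both dup and clone,
  # passing the original object as the argument.
  def initialize_copy(source)
    super
    @attrs = source.attrs.dup
  end
end

original = Document.new({ title: "notes" })
shallow = original.dup
shallow.attrs[:title] = "changed"
original.attrs[:title] # also "changed": both objects share one Hash

safe_original = SafeDocument.new({ title: "notes" })
safe_copy = safe_original.clone
safe_copy.attrs[:title] = "changed"
safe_original.attrs[:title] # still "notes"
```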
31 | 	
32 | ## Steal ##
33 | * HTTParty steals from DM, AR
34 | * HappyMapper steals a *ton* from DM
35 | * MongoMapper - steals from Sequel
36 | * Plucky is a lot like Hash, Arel
37 | * gem whois
38 | * canable - thanks pivotal
39 | 
40 | ## Think ##
41 | * Make decisions.  Don't add things because you *might* need them
42 | * Extraction vs. prediction.  Don't extract until you repeat yourself.
43 | * Refactor (read the book)
44 | 	* Sketch them out on paper
45 | * Write
46 | 	* Helps you connect all the dots on what you've learned
47 | 	* Write for yourself, helping others is incidental


--------------------------------------------------------------------------------
/session_notes/app monitoring with cucumber.md:
--------------------------------------------------------------------------------
 1 | # App Monitoring with Cucumber
 2 | ## Evolution of testing
 3 | * Save & Refresh -> Test::Unit (difficult to demonstrate business value) -> RSpec + BDD (more stakeholder-digestible) -> Cucumber (awesome)
 4 | * Cucumber as a business-readable domain specific language
 5 | 	- Documentation, automated tests, and development aid all in one
 6 | * But what about the actual production app?
 7 | 	* Monitoring is treated as revenue preservation
 8 | 	* Toolsets not as advanced
 9 | 	* Only about 12 people in the room have external monitoring for more than a fitter_happier type URL
10 | 	* 2 axes - what are you looking at (URLs, etc) and how close are you looking (expected values, etc)?
11 | 		* If you're checking just the home page, you have great focus but not breadth
12 | 		* Ex. search - frequent issues on many sites, may be difficult to detect automatically (0 results for beer!)
13 | 	* Not really TATFT on production - why not test where the revenue is being made?
14 | 	
15 | ## Existing tools for monitoring are old & broken
16 | * Nagios is ugly and tends to have a lot of noise - leads to boy-who-cried-wolf scenarios
17 | 	* Nagios is EVIL
18 | * Pingdom - not much depth in tests
19 | * Watchmouse is pretty cool - can drill into specific URLs and methods
20 | 	* Twitter is using it for enhanced status monitoring
21 | * All tools miss a direct link between business value and the actual alerts being sent out
22 | 
23 | ## Enter cucumber - #devops!
24 | * Blur the line between devs and ops staff, and between development and infrastructure management
25 | * Kumbaya.
26 | 
27 | ## Examples (check out cucumber-nagios)
28 | * Benchmarking
29 | * E-mail delivery (use mailinator for stubbed e-mail boxes)
30 | * Tap into existing tools like scout and newrelic (cucumber-scout and cucumber-newrelic)
31 | * SEO checks (go to google, check search results)
32 | * Security - check to make sure people have access and others do not (fired employees, etc.)
33 | * Infrastructure status (RAID arrays, etc)
34 | * DNS - make sure everything resolves properly
35 | * Credit card transactions, SSL (valid certificate), error rate (Hoptoad/Exceptional)
36 | 
37 | ## Running features in production
38 | * Use a JSON cucumber output formatter (cucumber-json)
39 | * Set up a separate 'production' environment and feature files/step defs/support
40 | * cucumber -p production
41 | * OR build a Cucumber Scout plugin
42 | * Use pagerduty.com to send out e-mail alerts and assign them
43 | * Ultimately, this gives you the power to monitor anything and everything with clearly defined business value in a way stakeholders can read and understand, and know of issues before your customers and clients do
44 | 
45 | ## Q&A
46 | * Use hudson for production checks - maintain history, know if things are changing
47 | * Use long-running transactions for dropping generated data.


--------------------------------------------------------------------------------
/session_notes/cassandra and cassandraobject.md:
--------------------------------------------------------------------------------
 1 | # Cassandra and CassandraObject #
 2 | 
 3 | ### Disclaimer: I (Koz) don't have to scale, so no scalability porn here ###
 4 | 
 5 | ## About Cassandra ##
 6 | * Pre-1.0, full-fledged Apache project
 7 | * Distributed and fault-tolerant and elastic from the ground up
 8 | * Runs on a cluster of servers that organize into a ring based on key ranges
 9 | 	* Nodes are aware of each other and talk using a gossip-based protocol
10 | 	* Configurable keyspace partitioning - random (MD5, evenly-distributed) or order preserving (solely key-based, uneven)
11 | 	* Configurable replication factor - number of nodes to write items of data to.  Defaults to 3
12 | 	* Replication happens asynchronously after writes - this is where consistency levels come into place
13 | 		* Essentially, how many nodes need to have responded to a request in order for it to be considered consistent (and thus complete)
14 | 		* Options include none, one, quorum (majority), all
15 | 		* Allows you to trade off performance against data consistency
16 | 	* Fault tolerance will automatically skip dead servers when reading/writing data - can decommission nodes and reassign data on the fly
17 | 	* Adding a new node will automatically rebalance the cluster and increase capacity accordingly
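
A simplified sketch of the ring described above, assuming MD5 tokens (as with the random partitioner) and replicas placed on the next nodes clockwise. The `Ring` class and its method names are invented for illustration and are not Cassandra's API; the consistency level names follow the list in these notes.

```ruby
require "digest/md5"

class Ring
  def initialize(nodes, replication_factor: 3)
    @replication_factor = replication_factor
    # Place every node on the ring at the MD5 token of its name.
    @tokens = nodes.map { |node| [token(node), node] }.sort
  end

  # The first node at or after the key's token, plus the following
  # nodes clockwise, up to the replication factor.
  def replicas_for(key)
    key_token = token(key)
    start = @tokens.index { |node_token, _| node_token >= key_token } || 0
    count = [@replication_factor, @tokens.size].min
    Array.new(count) { |i| @tokens[(start + i) % @tokens.size].last }
  end

  # Number of replica responses needed before a request counts as complete.
  def required_acks(level)
    case level
    when :none   then 0
    when :one    then 1
    when :quorum then @replication_factor / 2 + 1
    when :all    then @replication_factor
    else raise ArgumentError, "unknown consistency level: #{level}"
    end
  end

  private

  def token(value)
    Digest::MD5.hexdigest(value.to_s).to_i(16)
  end
end

ring = Ring.new(%w[node1 node2 node3 node4 node5])
ring.replicas_for("user:42")  # three distinct nodes
ring.required_acks(:quorum)   # 2 of the 3 replicas
```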
18 | 
19 | ## Data Model ##
20 | * Fundamentally a column store (using keys) - not a key-value store!
21 | * Columns consist of a name, value, and a timestamp
22 | * Column families/rows consist of an ordered set of columns
23 | * Supercolumns point to a set of rows (a 2 level hash, essentially, like JSON)
24 | * Can set ordering on column families using CompareWith (for example, TimeUUID)
25 | * There's no WHERE or ORDER or COUNT or SUM in querying - it's more about query-driven modelling
26 | 	* Populate a model that lets you query what you need
27 | 	* Create column families like indexes to search - you need a column family per query, essentially
28 | 	* Can get specific keys individually or grab ranges (only feasible with the order-preserving partitioner due to assembly time)
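
A sketch of query-driven modelling using nested Ruby Hashes in place of Cassandra, with column timestamps omitted for brevity. The `users` and `users_by_city` column families and the helper names are hypothetical; the point is only that every write also maintains the column family that answers the query you need.

```ruby
# keyspace[column_family][row_key] => { column_name => value }
def new_keyspace
  Hash.new { |cfs, cf| cfs[cf] = Hash.new { |rows, key| rows[key] = {} } }
end

def insert_user(keyspace, id, name:, city:)
  # Primary column family, keyed by user id.
  keyspace[:users][id].merge!({ "name" => name, "city" => city })
  # "Index" column family: one row per city, one column per user id.
  keyspace[:users_by_city][city][id] = name
end

# Answers "who lives in this city?" with a single row read,
# returning the columns ordered by column name (the user id).
def users_in(keyspace, city)
  keyspace[:users_by_city][city].sort.to_h
end

keyspace = new_keyspace
insert_user(keyspace, 2, name: "Bo", city: "Baltimore")
insert_user(keyspace, 1, name: "Al", city: "Baltimore")
users_in(keyspace, "Baltimore") # both users, ordered by id
```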
29 | 	
30 | ## Do you really need Cassandra? ##
31 | * Different enough that you want to make sure you need it before you dive in
32 | * Won't be writing a little blog in it, likely
33 | 
34 | ## CassandraObject ##
35 | * Opinionated, hierarchical like ActiveRecord
36 | * Started out as a way to prove the usefulness of ActiveModel (and learn Cassandra) - not to solve scaling problems
37 | 	* Mostly AR-compatible
38 | * Usage
39 | 	* Define attributes and types (pluggable system for implementing those in Cassandra)
40 | 	* Validations (exactly like AR, except the stuff reliant on DB ops like uniqueness)
41 | 	* Keys - selection is crucially important since it determines how you can query
42 | 		* Can do UUID, natural (based on an attribute), or some other key factory object (like a Redis INCR-based factory, very simple API)
43 | 	* Migrations
44 | 		* CassandraObject uses an integer schema version attribute to auto-execute data migrations on access
45 | 		* Pretty cool and transparent
46 | 	* Indexes
47 | 		* Lets you do AR-style find_by_x methods
48 | 		* Can be unique or non-unique
49 | 		* Read-repair is used to re-update indexes on operations
50 | 	* Associations
51 | 		* Deliberately named differently since they function quite differently than AR - no proxy create methods or finders, etc.
52 | * The library is still very beta and API/data is in flux, but Koz is happy to take patches
53 | * Works on Rails 2.x and 3
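
The migrate-on-access idea can be sketched in plain Ruby as below. This is not CassandraObject's actual API; `MIGRATIONS`, `CURRENT_VERSION`, `load_record`, and the attribute names are assumptions made for illustration.

```ruby
# Each entry upgrades a record's attributes TO the given schema version.
MIGRATIONS = {
  1 => ->(attrs) { attrs.merge("full_name" => "#{attrs['first']} #{attrs['last']}") },
  2 => ->(attrs) { attrs.reject { |key, _| %w[first last].include?(key) } }
}.freeze

CURRENT_VERSION = MIGRATIONS.keys.max

# Run every migration newer than the record's stored version, in order,
# then stamp the record with the current version.
def load_record(attrs)
  version = attrs.fetch("schema_version", 0)
  ((version + 1)..CURRENT_VERSION).each do |target|
    attrs = MIGRATIONS.fetch(target).call(attrs)
  end
  attrs.merge("schema_version" => CURRENT_VERSION)
end

load_record({ "first" => "Ada", "last" => "Lovelace" })
```

Old rows are upgraded lazily the first time they are read, so no bulk migration pass over the whole data set is needed.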


--------------------------------------------------------------------------------
/session_notes/redis rails and resque.md:
--------------------------------------------------------------------------------
 1 | # Redis, Rails and Resque # 
 2 | ## 3 favorite pieces of software ##
 3 | * Redis - "we've been wanting it for a long time, even if we didn't know that we wanted it"
 4 | 	* Key-value store for data structures
 5 | 	* Hash, list, set, sorted set, string, integer, all with different operations
 6 | 	* RAM used to be a limiting factor, not anymore.  Can shard across machines to increase capacity
 7 | 	* Used @ GH as a routing table, monitoring tool for file server load, etc.
 8 | 	* Can use in multiple different languages, which is huge
 9 | 	* antirez is awesome
10 | 		* Responsive and transparent (blog posts, etc.)
11 | 		* Included client libraries and CLI utils for easy setup
12 | * Unicorn - not a stupid name
13 | 	* Uses Mongrel parser (which is awesome)
14 | 	* May not be the fastest, but it's uber reliable and graceful at dealing with failure
15 | 	* Uses all kinds of UNIXy underpinnings
16 | 	* Unicorn uses select() to load balance to workers without blocking
17 | 	* Preforking worker to get shared memory, fast startup, and durability for stalled workers
18 | 		* No problems with all mongrels being in the restarting state anymore
19 | 		* Rack-based, but using a shim for Rails 2.2
20 | 		* Luke Malia's GC middleware <-- CHECK OUT
21 | 	* Great maintainer, very attentive, solid code
22 | * Delayed_job - background processor, queue, man about town?
23 | 	* Stupid simple
24 | 	* Sort of like a post-commit hook for your application actions
25 | 	* GH has gone through 6-7 queueing systems
26 | 		* SQS was first - basic wrapper with loop/do
27 | 		* DJ doesn't deal with chaos in workers very well, unfortunately - hanging jobs, etc.
28 | * Time to roll your own
29 | 	* Not based on SQL cause we're not good at it
30 | 	* Needs to be easy!
31 | 	* Needs to be atomic for operations to ensure no work is being done multiple times
32 | 	* High visibility - jump in, kill jobs, etc.
33 | 	* High reliability for the system (at the cost of reliability for individual jobs)
34 | 	* Didn't want full serialization YAML code from DJ - too much overhead
35 | 	* Note: Please don't write push-bots
36 | 	* Enter resque!
37 | 		* Based on redis
38 | 		* No complicated priorities - queues for buckets
39 | 			* Makes it easy to use redis-cli to investigate things
40 | 		* JSON serialization makes things very simple - class and args, that's it
41 | 		* You can have workers in many different languages - Node, C, etc.
42 | 		* Very simple API
43 | 		* Doesn't actually implement queueing - lets redis do that cause it's a better implementation
44 | 		 	* Frontend queueset - critical, high, low, etc.
45 | 			* Queue names can indicate locality - different queues for pages, frontend, backend, etc.
46 | 				* Pages are static data through nginx
47 | 		* Easy monitoring - just look at redis keys
48 | 		* Preforking model
49 | 			* Child can have chaos, it's okay, just kill it
50 | 			* Master can deal with signals and manage the whole deal through its own signals
51 | 		* Abuse the procline to help with debugging
52 | 		* Web UI
53 | 		* Trying to make it an awesome open source project
54 | 			* Open list, community of patches, etc.
55 | 			* Balance the README so it isn't too long
56 | 			* Need to have a vision and a plan for your OSS projects, and be available - that will bring people running
57 | 			* Have no features in the main project - encourage a healthy ecosystem for plugins to support functionality you don't want as a maintainer
58 | 		* What about resque 2.0?
59 | 			* BLPOP in Redis 2 to eliminate polling between client/messaging server
60 | 			* Can do multiple keys - almost a full queueing system with no Ruby code!
61 | 			* More plugins and hooks - nicer plugin API and demo app
62 | 			* Redoing UI in Mustache
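
A toy sketch of the "class and args as JSON, queues as buckets" design described above, using Ruby Arrays in place of Redis lists so it runs without a server. `QUEUES`, `enqueue`, `work_one`, and the `Archive` job are hypothetical names and this is not Resque's own code.

```ruby
require "json"

# Queue name => list of JSON payloads (stand-in for Redis lists).
QUEUES = Hash.new { |queues, name| queues[name] = [] }

# A job is just a class with a perform method.
class Archive
  def self.perform(repo_id, branch)
    "archived #{repo_id}@#{branch}"
  end
end

# The payload is only the class name and its arguments.
def enqueue(queue, klass, *args)
  QUEUES[queue].push(JSON.generate({ "class" => klass.name, "args" => args }))
end

# No priorities: check the queues in the order given and run the first job found.
def work_one(queue_names)
  queue_names.each do |name|
    payload = QUEUES[name].shift
    next unless payload

    job = JSON.parse(payload)
    return Object.const_get(job["class"]).perform(*job["args"])
  end
  nil
end

enqueue(:low, Archive, 1, "main")
enqueue(:critical, Archive, 2, "dev")
work_one([:critical, :low]) # runs the critical job first
```

Because the payload is plain JSON, a worker written in any language can pick it up, and the stored data stays easy to read when inspecting queues by hand.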
63 | 			
64 | ## Q&A ##
65 | * Use supervise for daemon monitoring
66 | * Actually run background tasks on the frontend because Unicorn is so efficient


--------------------------------------------------------------------------------
/session_notes/from 1 to 30.md:
--------------------------------------------------------------------------------
 1 | # From 1 to 30 - Breaking down a big app
 2 | 
 3 | ## Tale of Two Buildings
 4 | * Forbidden city - constantly refactored and made bigger, lasted for years and years
 5 | * Lotus Riverside highrise - built bigger and bigger by greedy developers, fell over on its side (http://www.q2hoo.com/2009/06/a-series-of-photos-about-collapsed-shanghai-flat-lotus-riverside-compound.htm)
 6 | 
 7 | ## The app is a distributed call center app for language trainers
 8 | * Mission critical - kept adding more and more and more to it
 9 | 
10 | ## Monolithic - entire app runs as one Rails application
11 | * Confuses and scares new staff <-- BIG PROBLEM
12 | * Hard to test/extend/scale
13 | * Code gets messy
14 | 
15 | ## What is 30?
16 | * An ecosystem of independent Rails applications
17 | * Linked and seamless
18 | * Users don't see the difference
19 | 
20 | ## Each app:
21 | * Has a separate database
22 | * Runs independently (has its own user stories)
23 | * Is lightweight (can be handled by a single developer)
24 | * Has tight internal cohesion and loose external coupling
25 | * Advantages:
26 | 	* Independent development cycles
27 | 	* Developer autonomy
28 | 	* Safety from technology (im)maturity
29 | 	* Appeal to developer laziness - overriding philosophy of making it easier on yourself (cause it's going to happen anyway)
30 | 
31 | ## The Mysteries of the Forbidden City
32 | * Consistent UI/UX
33 | 	* Shared CSS/JS framework and style guide for all apps
34 | 	* Gem all common helpers (combo search, list tables w/ pagination)
35 | 	* Any time you see repetition, extract and reuse!
36 | 		* Implement new view code in a separate application
37 | 		* Extract into plugin
38 | 		* Roll up into a gem
39 | * Sharing data
40 | 	* For example, purchase app needs data stored in the 'courses' application
41 | 		* Tried built-in Rails stuff (ActiveResource) - tends to be really slow and difficult to debug
42 | 		* Lose Rails magic
43 | 		* Tried SVN externals, but a pain to manage
44 | 	* Read-only database connections
45 | 		* One app responsible for writing/updating/creating data, the others just reading
46 | 		* Pretty safe way of integrating
47 | 		* Acts_as_readonly macro to disable AR save functionality
48 | 		* Configuration is hell, however
49 | 			* Use CoreService ActiveResource model to retrieve app-specific info
50 | 			* When each app starts, it posts its configuration info to a central service for all others to access 
51 | 			* Ordering issues aren't too bad - not too many tight dependencies
52 | 	* Services - pain to build but necessary for write-centric operations
53 | 	* AJAX-loaded view segments - put in a div from another app's URL
54 | * Authentication and Access Control
55 | 	* Registration/Login
56 | 		* Share sessions between applications via a common session store
57 | 	* Profile management
58 | 	* Role-based Access Control
59 | 		* Controller-based - each controller is a node
60 | 		* Posted to CoreService when app starts
61 | 		* Retrieve access rights
62 | 		* Controllers are responsible for checking rights
63 | * Services
64 | 	* File uploads - can't use S3 because of the Chinese firewall so wrote a similar service to receive files posted in the background
65 | 		* make use of super-polymorphic relationships keyed off the application as well as model and id
66 | 	* Mail - universal sending
67 | 	* Comet service
68 | * Deployment
69 | 	* Load each app into a subdirectory, then use unicorn with multiple paths
70 | 	* Use nginx as a reverse proxy to rewrite URIs to be uniform
71 | 	* Definitely extra memory consumption, but it's worth the dev time savings
72 | 	* Have to scale apps independently based on traffic levels - very easy to tweak and balance
73 | 	
74 | ## Net result: Higher productivity
75 | * Faster build-to-deploy
76 | * More developer autonomy
77 | * Safer
78 | * More scalable
79 | * Easier for new folks to jump in
80 | * Greater developer happiness


--------------------------------------------------------------------------------
/session_notes/derek sivers.md:
--------------------------------------------------------------------------------
 1 | # Derek Sivers Keynote - thoughts.pro #
 2 | 
 3 | ## Sometimes you need to see how things work from an opposite point of view
 4 | * Chinese doctors - you pay only when you're healthy (to prevent sickness)
 5 | * 2-3-4-1 musical meter
 6 | * Upside-down map w/ New Zealand and Australia at the top
 7 | 
 8 | ## Accents and Identity
 9 | * French accent story
10 | * You can't hear your own accent to know how bad it is
11 | * Works the same for programming languages - Java guys come to Rails with a serious accent
12 | 
13 | ## Quirks
14 | * Light switch story - waving a hand in front of a panel
15 | * Quirks are only cool when you're used to them
16 | * "Don't make me think" - enormous difference between observing quirks and having to use/get through them.  
17 | * Not so charming when you need to use them.
18 | 
19 | ## My PHP Framework
20 | * CD Baby was cobbled together and quirky, but it worked
21 | * Found Rails partway through
22 | 
23 | ## Playing vs. Planning
24 | * Present focus vs. future focus
25 | * Predictably irrational - $20 bill auction
26 | 	* Easy to get on a path that doesn't lead you anywhere good
27 | 	* Fine when you're playing - good to be in the present moment
28 | 	* Another example is the management rat race - getting away from what you love not because you want to, but because you have to
29 | 
30 | ## Is it fun for people to help you?
31 | * If anyone else needs to learn this PHP framework, they're going to hate it because it's full of personal quirks
32 | * Harry Harlow - monkey pinned door experiment - the monkeys wanted to figure out the puzzle right away during the acclimation period, with no external motivation.
33 | 	* Intrinsic motivation!
34 | 	* Once the monkeys started getting rewarded, they made more errors and got worse at solving puzzles
35 | 	* Lesson: for interesting problems, intrinsic motivation can be stronger than external motivations
36 | * Candy or answers - for interesting questions, the answer can be worth more than any rewards
37 | 
38 | ## Delegating: What would Branson do?
39 | * Over 400 companies in the Virgin group - how does he do it?
40 | * Constantly evaluating and funding ideas - like having unlimited time
41 | 	* Suddenly able to say yes to everything that seemed worth doing
42 | 	* But it's hard to let go
43 | 
44 | ## Brain dump yet? - thoughts.pro
45 | * Dancing guy video - lone nut -> first follower turns him into a leader -> third makes a crowd to hit the tipping point and it's no longer risky
46 | 	* nurture your followers as equals
47 | 	* glorify the action, not yourself
48 | 	* no movement without the first follower
49 | 	* best way to make a movement is to creatively follow and give a lone nut credibility
50 | 
51 | ## Confabulate - fabricate plausible reasons for things we feel
52 | * Brother/sister story
53 | * For moral arguments, the feelings come first and the reasons come later (but are made up to support the feelings)
54 | * Emotions are the elephant, rational self is the rider.  Ultimately, the elephant is going to do whatever it wants.  The rider can only coax in rational directions and avoid danger.
55 | 	* Easy to slip from guiding the elephant to making excuses for it
56 | 	* People believe what they believe, the reasons come after
57 | 
58 | ## Intuition is the sum of everything you've learned
59 | * "How We Decide" book on neuroscience
60 | * Primitive brain is good at making fast decisions
61 | * Rational brain is much slower and younger (evolutionarily)
62 | * Rational brain stores info into the emotional brain and wires a rule
63 | * That's what makes the snap decisions work
64 | 	* Story about submarine blips in Desert Storm
65 | 	
66 | ## Hobby
67 | * Can't really hire people to help with a hobby - you're doing it for your own sake, not as a means to an end
68 | 
69 | ## 90% think they're better than average
70 | * Easy to get caught up with looking good, so you never do anything good
71 | * Easy to get caught up trying to do something great, so you never do anything at all
72 | * Best way around it is to think of yourself as a beginner again.


--------------------------------------------------------------------------------
/session_notes/user behavior tracking.md:
--------------------------------------------------------------------------------
 1 | # User Behavior Tracking
 2 | ## Three steps to implement successfully
 3 | * Track!
 4 | * Interpret and identify
 5 | * Alternate and refactor
 6 | 
 7 | ## The toolbox
 8 | * Google analytics
 9 | * You with Garb
10 | * Vanity
11 | 
12 | ## Questions to answer:
13 | * Which links are being clicked?
14 | * Where are users landing?
15 | * Where are users exiting? frustrations, bounces, etc.
16 | * Where do users spend the most time? most used features, etc.
17 | * What are logged-in users doing? (segmenting users, figuring out what segments are doing, features being used, etc.)
18 | 	
19 | ## Google Analytics - answers "Which links are being clicked?"
20 | * Async tracking
21 | 	* _gaq.push - push tracking values onto an array without actually loading the google JS, then sends them off later
22 | * What should we track? Everything!  More data is always better
23 | * Track clicks as virtual pageviews
24 | 	* Assign clicks fake URLs to track appropriately
25 | 	* Need to come up with a consistent naming scheme
26 | 	* Also works well for modals, etc.
27 | 	* May be faulty sometimes - affects time on page measurements, for example
28 | * Can also track events - simple key/value pairs as a counter
29 | 	* Main disadvantage is the lack/loss of context
30 | 	* Tough to get access to from the API - just a sum count of every event which isn't very useful
31 | * Limitations
32 | 	* Time on page measurements can have weird side effects
33 | * An idea - go through the Rails app instead for some events/pageviews
34 | 	* Use a meta tag like the Rails 3 CSRF meta tag
35 | 	* Insert javascript that picks up the tracking info from the meta tag and sends it off on page load
36 | * Can also use something like Clicktale to step through the site
37 | 
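
One way to keep the fake URLs for virtual pageviews consistent is to build them in a single place. The helper below is a minimal sketch only: the name `virtual_pageview_path` and the `/virtual/` prefix are assumptions for illustration, not part of Google Analytics or of the talk.

```ruby
# Hypothetical helper: turns any number of labels into one predictable
# fake URL, so every click or modal is named the same way.
def virtual_pageview_path(*segments)
  slugs = segments.map do |segment|
    # Lowercase, collapse anything non-alphanumeric into a single dash,
    # then trim dashes from both ends.
    segment.to_s.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  end
  "/virtual/" + slugs.reject(&:empty?).join("/")
end

puts virtual_pageview_path("Modal", "Sign Up", :open)
```

The resulting string is what would be handed to the tracking call for the click or modal in question.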
38 | ## Garb - answers "Where are users landing?"
39 | * Log in with OAuth access token, set globally or per session in Garb
40 | * Pull back a profile with a UA number
41 | * Reports - can grab existing ones from classes or build new ones on the fly
42 | 	* Uses a model DSL like DataMapper (Garb::Resource)
43 | 	* Specify metrics and dimensions
44 | 	* Can specify profile, offsets, limits, dates at query time
45 | 	* Returns an OpenStruct with a lot of data
46 | 	* Can set sorts and filters as well, with operations and dimensions
47 | 	* OR you can build on the fly
48 | 	
49 | ## Garb - answers "Where are users exiting?"
50 | * Google doesn't give you the exit rate via the API anymore
51 | * Can calculate from exits and page views metrics (just divide!)
52 | * Bummer that you can't sort on the Google side anymore
53 | * What about abandonment?
54 | 	* Tracking goals and progress through funnels (you get 4 groups of 5)
55 | 	* Can track with a virtual pageview
56 | 	* Can't track goals vis a vis events - bummer
57 | 	* Can't track from the backend, which kind of sucks too
58 | 	
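
The "just divide" calculation, plus the sorting that can no longer be done on the Google side, might look like the sketch below. The rows are made-up numbers standing in for report results; the hash keys are assumptions, not Garb's actual result format.

```ruby
# Exit rate for a page = exits / pageviews.
def exit_rate(exits, pageviews)
  return 0.0 if pageviews.zero? # avoid dividing by zero on pages with no views
  exits.to_f / pageviews
end

# Made-up sample data standing in for rows pulled back from a report.
rows = [
  { path: "/signup",  pageviews: 200,  exits: 50 },
  { path: "/pricing", pageviews: 400,  exits: 300 },
  { path: "/",        pageviews: 1000, exits: 100 }
]

# Compute the rate per row, then sort locally, worst offenders first.
ranked = rows.map { |row| row.merge(exit_rate: exit_rate(row[:exits], row[:pageviews])) }
             .sort_by { |row| -row[:exit_rate] }

ranked.each { |row| puts format("%-10s %.1f%%", row[:path], row[:exit_rate] * 100) }
```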
59 | ## Garb - answers "Where do users spend the most time?"
60 | * Google doesn't give it to you, but it's just time/(pageviews - exits)
61 | 	* Google doesn't do this because sessions on exit pages always time out after 30 minutes, so it would just skew results
62 | 	
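
The time/(pageviews - exits) formula above, as a small sketch. The method name and the sample numbers are illustrative assumptions only.

```ruby
# Average time on page = total time / (pageviews - exits).
# Exits are subtracted because no time is recorded for the last page of a visit.
def average_time_on_page(total_seconds, pageviews, exits)
  timed_views = pageviews - exits
  return nil unless timed_views.positive? # every view was an exit: nothing to average
  total_seconds.to_f / timed_views
end

# Made-up numbers: 9000 seconds over 400 pageviews, 100 of which were exits.
puts average_time_on_page(9000, 400, 100)
```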
63 | ## GA/Garb - answers "What are logged-in users doing?"
64 | * Use custom variables to find out!
65 | 	 * SetCustomVar - path, slot # (you get 5), scope, and key-value pair
66 | 		* Lasts for the session, might be used to segment admin users
67 | * Great for dynamic segmentation - just grab the right criteria, then can use the ID in reports (grab from the URL, unfortunately)
68 | 
69 | ## Vanity - answers "How do we improve?"
70 | * Redis-backed library for tracking metrics and A/B testing
71 | * Define metrics to be tracked from controller
72 | * Get fancy dashboard of what's going on
73 | 	* Would be cool to build programmatic access to the dashboard data - probably pretty easy
74 | * Define ab tests and the metrics they track against
75 | 	* takes care of session cookies and segmenting users randomly
76 | 	* supports multivariate alternatives
77 | 	* Figures out the test stats for you on the fly
78 | 	* view helpers for switching display
79 | * Can integrate with Garb too!
80 | 	* Give a UA number and the metrics you want to track
81 | 	* Can also hit a report instance directly from the metric definition
82 | 	* Can get crazy custom if you want
83 | 	
84 | ## Impacts
85 | * Give info to your users - more transparency and utility - ex. Flickr stats
86 | * Easy admin panel for your users where you can't give them GA access - like Github's pageview stats
87 | 
88 | ## For the road
89 | * Follow the money - for improvements, start at the place where you make your money
90 | * Nobody is paid by the pageview - dig into useful metrics that actually tell you stuff
91 | * User testing still wins - always much easier to get feedback and more direct than trying to infer info from analytics and guessing


--------------------------------------------------------------------------------
/session_notes/implementing user recommendations.md:
--------------------------------------------------------------------------------
 1 | # Implementing user recommendations in Rails
 2 | * Matthew Deiters - @mdeiters
 3 | * Make graph technologies accessible
 4 | * VW Lemon ad - brought on the golden age of advertising (the "creative revolution")
 5 | 	* Previously had been more of an academic approach
 6 | 	* Technology changed (TV - by 1962, 90% of homes had one)
 7 | 	* Necessarily, the approach changed
 8 | 	* Transition to audience segmentation (away from male-targeted exclusively)
 9 | * Web dev is in a similar place now, particularly with NoSQL and dynamic languages
10 | * We've been doing CRUD for 30 years, and it's gotten really boring for us and for users
11 | 
12 | ## Our Creative Revolution
13 | * Recommendations = Money
14 | 	* 25% of Amazon's sales are on personalized suggestions
15 | 	* StumbleUpon ranks #1/#2 in social media traffic generation
16 | * 3 points
17 | 	* Discover the relationships in your data
18 | 		* FB news feed - using your actions to determine who's relevant to you
19 | 		* Github relationship graph with followers
20 | 		* Applies to social networking, content display (tailoring content display), website analytics, and predictive analysis
21 | 	* Modeling the relationships
22 | 		* RDBMSes don't work for this dataset and operations  b/c SQL is set-based
23 | 			* With 100 people, it's really fast
24 | 			* Doesn't scale though - 2nd degree ops with 60k people takes over an hour
25 | 			* Get some really bad SQL smells - RECURSIVE SQL extension and stuff like that (Hasselhoff graphic)
26 | 		* Use graphs instead - relationships are first class citizens in the data model
27 | 			* Instead of rows, you have nodes (or actors or vertices or points)
28 | 			* Edges are relationships between those nodes (or arcs or links)
29 | 			* Ultimately less complex and more natural (and whiteboard friendly)
30 | 			* Plus they're really fast (FUCK YOU, EINSTEIN)
31 | 	* Using graphs in your Rails app
32 | 		* In-Memory Ruby Graphs
33 | 			* Great for small datasets or ad-hoc querying
34 | 			* There's the RGL gem but it's cryptic (github.com/fmeyer/rgl)
35 | 			* 20k nodes and 1M edges = 300MB of memory (not bad) plus queries went from 30 seconds to a few milliseconds (smallish dataset though)
36 | 		* Persistence & Dynamic Data - Neo4j (OSS graph DB in Java)
37 | 			* Key-value store plus relationships between keys
38 | 			* Neo4j.rb is a solid interface library (ORM) using Lucene but requires JRuby
39 | 			* Don't need to store everything in Neo4j - just another persistence tool like Solr
40 | 			* Neo4jr-social is another library - more like solr/sunspot, much easier to run and doesn't require JRuby
41 | 				* Self-contained jetty server or deployable WAR
42 | 				* Focused on social network analysis
43 | 				* Built-in basic querying, very extensible
44 | 			* (Live code example w/ Star Wars stuff) - built in relationships, suggestions, degrees of separation
45 | 		* Gremlin - XPath-like syntax for querying your social graph
46 | 			* Had a MongoDB backend - cool but not efficient
47 | 			* Also works on Neo4j
48 | 			* More than anything, it's a query syntax that's really easy to use
49 | 			* Net net, SQL is assembly required, Graphs mostly work out of the box with pre-built queries
50 | 				* AllSimplePaths - who do we have in common?
51 | 				* ShortestPath
52 | 				* Dijkstra algorithm - add scores to edges
53 | 					* Can build who do we have in common on steroids
54 | 				* Closeness Centrality - who has the most followers on Twitter
55 | 				* Betweenness Centrality - Who has more influential people following them on Twitter (used for the Github London followers graph)
56 | 				* Eigenvector Centrality - Essentially PageRank
57 | 			* Performance implications
58 | 				* Sparse graphs are faster because there's less traversal necessary
59 | 					* if you get too dense the data isn't especially meaningful
60 | 				* Neo4j is a hoss - needs lots of RAM to avoid swapping
61 | 				* Metadata - can vary depending on how you want to slice and dice and limit
62 | 				* Breadth-first vs. depth-first searches - need to match to data and the problem you want to solve
63 | 
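
For the small-dataset, in-memory case, the nodes-and-edges model and two of the queries mentioned above (degrees of separation and suggestions) can be sketched in plain Ruby. This is not the RGL, Neo4j.rb, or Neo4jr-social API; the `SocialGraph` class, its method names, and the sample data are all assumptions made for illustration.

```ruby
require "set"

# Hypothetical in-memory undirected graph: each node maps to the set of
# nodes it has a relationship (edge) with.
class SocialGraph
  def initialize
    @edges = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Adds an edge in both directions and returns self so calls can be chained.
  def connect(a, b)
    @edges[a] << b
    @edges[b] << a
    self
  end

  def neighbors(node)
    @edges.fetch(node, Set.new)
  end

  # Breadth-first search: number of hops on the shortest path, or nil if
  # the two nodes are not connected at all.
  def degrees_of_separation(from, to)
    return 0 if from == to
    visited = Set[from]
    queue = [[from, 0]]
    until queue.empty?
      node, depth = queue.shift
      neighbors(node).each do |neighbor|
        next if visited.include?(neighbor)
        return depth + 1 if neighbor == to
        visited << neighbor
        queue << [neighbor, depth + 1]
      end
    end
    nil
  end

  # Friends of friends who are not already connected to the node,
  # ranked by how many connections they have in common with it.
  def suggestions_for(node)
    counts = Hash.new(0)
    neighbors(node).each do |friend|
      neighbors(friend).each do |candidate|
        next if candidate == node || neighbors(node).include?(candidate)
        counts[candidate] += 1
      end
    end
    counts.sort_by { |name, count| [-count, name] }.map(&:first)
  end
end

# Made-up sample data.
graph = SocialGraph.new
graph.connect("luke", "leia").connect("luke", "han").connect("leia", "han")
     .connect("han", "chewbacca").connect("leia", "lando").connect("han", "lando")

puts graph.degrees_of_separation("luke", "chewbacca")
puts graph.suggestions_for("luke").inspect
```

Breadth-first search fits the degrees-of-separation question because it reaches nodes in order of distance, so the first time the target is seen is along a shortest path.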
64 | ## Conclusion
65 | * Start thinking in graphs when it makes sense
66 | * NoSQL as an augmentation like Solr or Sphinx or Memcached - okay to have different stores for specific kinds of data
67 | 	
68 | # Q&A
69 | * Probably don't want to store a lot of metadata in Neo4j - use identifiers and relationships, and node data where necessary
70 | * Performance is limited by complexity of relationships, not size of dataset
71 | * Nodes and relationships can be of any type
72 | * Dex graph engine - C-based, could be easy to bind into Ruby.  Also RichRelevance (SaaS) and DirectedEdge (Shopify)
73 | * For loading data, mine info out of relational DB and put it into Neo4j
74 | 	* Not necessarily duplicate data, just modeled for different purposes
75 | * Deployment - Neo4j is a memory hog, may be able to keep the DB in memory but not sure how well it releases RAM


--------------------------------------------------------------------------------
/session_notes/million dollar mongo.md:
--------------------------------------------------------------------------------
 1 | # Million Dollar Mongo - Breaking Free of the Relational Mindset
 2 | ## Backstory ##
 3 | * Client facing total business failure - 2 years of dev, little to no progress, had a hard 6-month deadline
 4 | * Competitive RFP
 5 | * In the process of signing a very large insurance company for a SAAS app that they didn't have built
 6 | 	* Service almost 1/2 of insured people in the US
 7 | * Started with a $500k budget, scope ballooned after 4mo, eventually billed $1.7MM, over 10,000 hours in 9 months
 8 | 	* Peak head count was 14 people across 4 continents
 9 | 	* Reqs driven by client's client - dealing with 1 record at a time since it's pharmaceutical/healthcare
10 | 	* Production releases Feb to April 2010
11 | 	* Ridiculous project points breakdown graph showing scope creep - could be useful for factory!
12 | * Technical backstory
13 | 	* Originally relational data, stored across hundreds of tables
14 | 	* Ridiculously join-heavy, query-heavy (35-36 per request)
15 | 	* Original system was on DotNetNuke and MSSQL
16 | ## The Solution ##
17 | * At core, a patient record system
18 | * Files are very distinct and have no overlap.
19 | * Built spikes for Mongo, Couch, and Cassandra
20 | 	* Perf numbers on Twitter
21 | 	* Big selling point for Mongo was dynamic queries
22 | 		* Couch Map/Reduce felt too much like stored procs
23 | 		* Couch fell over a bit at 10,000 docs on insert/update operations
24 | 		* Cassandra had "too much fucking XML"
25 | 	* Schemaless approach made legacy migration much easier
26 | 		* CSRs needed to build/update records with history
27 | 		* Stored history by generating new keys on the fly
28 | * Resistance to change was a problem
29 | 	* Hashrocket didn't have an internal consensus that this was the right way to go - Mongo was only 0.8, was this the right thing to stake such a large engagement on?
30 | 	* "So, are we going to have to write our own ORM for this thing?"
31 | 	* Out of the normal comfort zone - both good and bad
32 | 		* Normal Rails development has become "more like a zombie"
33 | 		* New challenges, very interesting
34 | * Easiest DB to install ever - no deps
35 | * Schemaless DB makes things uber-agile because migrations are painless
36 | 	* Able to cut down time for data import process
37 | 	* Code has to be able to deal with different data structures - more app-side load
38 | 	* Client DBAs kept changing the schema, causing issues
39 | 	* (fast ramp-up for non-idiots) - if you can understand JSON, you can work with mongo!
40 | * Mongomapper -> Mongoid
41 | 	* Hashrocket fork had 33 patch releases, had more that didn't really fit into the MM philosophy
42 | 	* Prefers embedding documents for everything
43 | 	* Default to atomic operations
44 | 	* Rich criteria API, master/slave support, versioning, etc.
45 | 	* Is NOT ActiveRecord (!!!)
46 | 	* Designed with performance in mind
47 | 	* Average document size is 500k, biggest about 1MB (4MB is limit, but would be hard to hit) - big but can get everything in 1 request
48 | 	* As of Mongo 1.4, you can index asynchronously on inserts
49 | 	* Regex queries on fields, etc.
50 | * Schema design patterns
51 | 	* Denormalization is acceptable
52 | 	* Embed whatever you can
53 | 	* Be wary of pre-epoch dates - use integers where you can
54 | 	* One query to rule them all - no joins!
55 | * MySQL is fast, Mongo is faster
56 | 	* 300,000 writes/second
57 | 	* Didn't even need to shard mongo on an EC2 XL instance
58 | * Hybrid DB model
59 | 	* Hierarchical data in Mongo, relational data in MyISAM, transactions in InnoDB, simple data in Redis
60 | 	* Having more than one DB is okay - favor suitability over simplicity
61 | 	* Needed relational DB as staging tables for normalized imports
62 | 	* Working without transactions
63 | 		* Atomic updates mostly cover you
64 | 		* Optimistic locking is pretty easy to roll in
65 | 		* Integration tests require manual rollback
66 | 		* Selenium tests can run in a separate process (neat) - enables something like Specjour, which helps with the 40-minute cucumber run
67 | * In Production
68 | 	* Deployed on EC2
69 | 	* Close to 500GB of data
70 | 	* Put Mongo on its own utility instance instead of the DB master since MySQL will fight it
71 | 	* Feed it lots of RAM and disk space
72 | 	* Setting up chef scripts was pretty easy
73 | 	* Costs about $10k a month
74 | 	* For backups, do a mongo dump and send it to S3
75 | 		
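
The "optimistic locking is pretty easy to roll in" point from the hybrid DB notes above can be illustrated with a version counter: an update only goes through if the document is still at the version the caller read. The sketch below uses a hypothetical in-memory `VersionedStore` in place of a real collection; it is not the Mongo or Mongoid API, and the talk did not show how the team actually implemented it.

```ruby
class StaleDocumentError < StandardError; end

# Hypothetical in-memory stand-in for a document collection.
class VersionedStore
  def initialize
    @docs = {}
    @mutex = Mutex.new
  end

  def insert(id, attributes)
    @mutex.synchronize { @docs[id] = attributes.merge(version: 1) }
  end

  # Returns a copy, so callers work on their own snapshot.
  def find(id)
    @mutex.synchronize { @docs.fetch(id).dup }
  end

  # Applies the changes only if the stored version still matches the one
  # the caller read; otherwise somebody else got there first.
  def update(id, expected_version, changes)
    @mutex.synchronize do
      current = @docs.fetch(id)
      unless current[:version] == expected_version
        raise StaleDocumentError, "expected v#{expected_version}, found v#{current[:version]}"
      end
      @docs[id] = current.merge(changes).merge(version: expected_version + 1)
    end
  end
end

store = VersionedStore.new
store.insert("patient-1", name: "Pat")

first_reader  = store.find("patient-1")
second_reader = store.find("patient-1")

store.update("patient-1", first_reader[:version], name: "Patricia")

begin
  # The second writer still holds version 1, so this update is rejected.
  store.update("patient-1", second_reader[:version], name: "Patrick")
rescue StaleDocumentError => e
  puts "conflict: #{e.message}"
end
```

On a conflict the caller would typically re-read the document and retry or surface the clash to the user.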
76 | ## What happens when 'real america' interferes? ##
77 | * Relational DBAs can be retarded (especially when threatened by change)
78 | 	* Set up squealer to provide a proxy to MySQL for ad-hoc querying and a GUI
79 | * How the hell did we end up in bed with these people?
80 | * Emerging technologies threatened those opposed to change
81 | 	* Anyone with a C and O in their title has no business speaking about technology
82 | 	* CIO saw the well-suited/less-well-suited page and pitched a fit
83 | 	* Called up another Rails firm (Intridea) to verify, that firm backstabbed somewhat and said they didn't think Mongo was an appropriate choice, and offered to convert in a month
84 | 	* Real bummer in the community that this sort of stuff happens
85 | 
86 | * Would be nice if more case studies on large mongo apps existed for background info.


--------------------------------------------------------------------------------
/session_notes/Building an API with Rails.md:
--------------------------------------------------------------------------------
 1 | # Building an API with Rails #
 2 | * Joe Ferris (Thoughtbot/Hoptoad), Jeremy Kemper (37S), Marcel Molina (Twitter), Rick Olson (Github/ENTP), Derek Willis (NYT)
 3 | 
 4 | ## Authentication ##
 5 | * Twitter implemented OAuth 1.0 about a year ago, then 1.0a for a security issue
 6 | 	* 1.0 spec is difficult to implement because of the signature algorithm - makes requests difficult and easy to screw up
 7 | 	* 2.0 spec uses tokens over SSL instead - much simpler.  
 8 | * 37S is working on OAuth
 9 | 	* Distinction between authentication and authorization is interesting - changes app responsibilities
10 | 	* WRAP is an interesting concept - rolled into OAuth 2
11 | * Github using OAuth 2
12 | 	* Lets you compartmentalize access
13 | * Thoughtbot
14 | 	* Simplicity is key - fast setup
15 | 	* OAuth is cool but overkill/unnecessary
16 | 	
17 | ## Input and Response Formats ##
18 | * Thoughtbot
19 | 	* Originally accepted multiple formats, simplifying and standardizing helped scale more easily (write-heavy)
20 | 	* Documentation is huge - XML schemas provide this implicitly which is nice
21 | * NYT
22 | 	* Read-heavy API, necessary to offer larger data streams and archives in multiple formats for analysis
23 | * Twitter
24 | 	* Like to give devs many options
25 | 	* Moving to standardize on JSON - don't see the lack of choice as an issue
26 | * 37S
27 | 	* Treat all formats equally, restful/resourceful architecture is huge and makes a lot of this a non-issue
28 | 	* Smaller apps make it much easier
29 | 	
30 | ## Versioning ##
31 | * Twitter
32 | 	* Really hard problem (and great interview question) - no real good answers yet
33 | 	* 5 separate things to deal with - format, params, serialization, features, etc.
34 | 	* Fork app and do separate deploys vs. inheritance - change management is a bitch
35 | 	* Twitter has held off - no compelling force driving to implement, don't want to inconvenience devs.  Have handled issues by other means.
36 | 	* Build in new functionality on a switch on/switch off basis - can ease transitions.  Just make sure you clean them up later.
37 | * 37S
38 | 	* Easy to get drunk on engineering and want to tackle it when it's not really a problem
39 | 	* Give developers lead time, but don't sweat making changes
40 | 	* Flip formats with mime types, etc - lots of little tricks to ease transitions
41 | * Thoughtbot
42 | 	* Some people will never upgrade until things are broken, but at the same time the dropoff/abandonment rate is not very big (for those that don't make the jump)
43 | * NYT
44 | 	* Developers are flexible and will roll with a lot of things.  Don't abuse them, but don't bend over backwards either.
45 | 	
46 | ## Scaling ##
47 | * NYT uses mashery for authentication but not for caching - on a per-app basis
48 | 	* Use varnish for frontend - totally awesome
49 | 	* Internal use is huge - timing and cache invalidation is critical to the usefulness of the dataset
50 | * Hoptoad is difficult because of the write-heavy nature
51 | 	* Everything has a cost - need to figure out what they are and prioritize accordingly
52 | 	* Rate limits, etc. help
53 | * 37S
54 | 	* API traffic is way different to cache than web traffic - lots of scraping, batching, etc.  Need different strategies.
55 | 	* Need to rate limit to keep it under control.  Clients get accidentally out of control often.
56 | * Twitter caches everything, but you need to watch out for working set size.  Be smart about it.
57 | 	* Dynamic assembly at runtime
58 | 	* Mind the cost of expiration too
59 | 	* Streaming API for continuous data to prevent polling (which kills)
60 | * Github
61 | 	* Let clients make use of HTTP caching to help you out
62 | 	
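
Several panelists mention rate limiting as the way to keep runaway clients under control. A fixed-window counter is one simple way to do that; the sketch below is an assumption about how it could look, not what any of these companies actually run. The class name and parameters are hypothetical.

```ruby
# Hypothetical fixed-window rate limiter: each client gets `limit` requests
# per window of `window_seconds`. Counts are kept in memory only, and old
# windows are never purged, so this is a sketch rather than production code.
class FixedWindowRateLimiter
  def initialize(limit:, window_seconds:)
    @limit = limit
    @window_seconds = window_seconds
    @counts = Hash.new(0)
  end

  # Returns true if the request is allowed. `now` can be passed in so the
  # behavior is easy to test without waiting for real time to pass.
  def allow?(client_id, now = Time.now)
    window = now.to_i / @window_seconds
    key = [client_id, window]
    return false if @counts[key] >= @limit
    @counts[key] += 1
    true
  end
end

limiter = FixedWindowRateLimiter.new(limit: 3, window_seconds: 60)
start = Time.at(0)

# Five requests in the same window: the last two are rejected.
results = Array.new(5) { limiter.allow?("client-a", start) }
puts results.inspect

# A minute later the client is in a new window and is allowed again.
puts limiter.allow?("client-a", start + 60)
```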
63 | ## Code Separation
64 | * Github
65 | 	* First API was inline with the UI (respond_to) - now use separate controllers because the logic was getting kludgy
66 | 	* Eventually want to separate into its own app (Sinatra or like)
67 | * Twitter
68 | 	* Nice when you can keep all of the business logic in one place
69 | 	* If you use skinny controller/fat model, it's not much duplication
70 | 	* Much easier to reason through changes, figure out logic if it's separate
71 | * 37S
72 | 	* This is why respond_to exists - use it!
73 | 
74 | ## Security Concerns ##
75 | * Twitter - annotations present an interesting challenge as they can store any data
76 | 	* Moral:  you're going to get screwed, no matter how awesome you are.  Be prepared.
77 | * Github
78 | 	* Watch out for login cookies/sessions, particularly with badges
79 | * 37S - keep the walls up
80 | 
81 | ## Developer Communication ##
82 | * Twitter - obviate the need to communicate with developers
83 | 	* Detailed status sites, etc.  Keep them in the loop.
84 | 	* Use what works for you
85 | * Thoughtbot - make sure you're not totally wrong
86 | 	* People will hold you totally responsible for what you say
87 | * 37S - figure out what business priority you have for the API and treat it like a product with devs as customers
88 | 	* Get rid of your API if you're not going to support it
89 | * NYT - build sample apps for documentation for every change.  Helps you learn too, be able to relate.
90 | * Github - foster a sense of community and open lines of communication
91 | 	* Community-supported docs, etc.
92 | 	* Be your own customer (37S highrise iPhone app)


--------------------------------------------------------------------------------