├── .gitignore ├── session_notes ├── garbage collection.md ├── scaling to millions.md ├── Dr Nic - Contributing to OSS.md ├── adventures in full text search.md ├── learning to speak interface.md ├── persistence smoothie.md ├── Don't Repeat Yourself.md ├── app monitoring with cucumber.md ├── cassandra and cassandraobject.md ├── redis rails and resque.md ├── from 1 to 30.md ├── derek sivers.md ├── user behavior tracking.md ├── implementing user recommendations.md ├── million dollar mongo.md └── Building an API with Rails.md └── README /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | .dropbox 3 | slides -------------------------------------------------------------------------------- /session_notes/garbage collection.md: -------------------------------------------------------------------------------- 1 | # Garbage Collection and the Ruby Heap 2 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | jayzes notes from Railsconf 2010 in Baltimore with mine added in. Feel free to fork and add your own. -------------------------------------------------------------------------------- /session_notes/scaling to millions.md: -------------------------------------------------------------------------------- 1 | # Scaling to Millions 2 | * EC2 performance is definitely an issue - big difference between physical and virtual hardware 3 | * Mongo falls down on big datasets/high concurrency due to locking issues 4 | * Cassandra is cool, but likes larger clusters 5 | * Recovery is very resource intensive, can kill smaller machines 6 | * Worth the hassle if you have billions of rows. 
7 | * Lesson: I'm a lousy sysadmin 8 | * Don't drink the Kool-Aid or believe the hype, definitely do your own homework before going with a new piece of infrastructure -------------------------------------------------------------------------------- /session_notes/Dr Nic - Contributing to OSS.md: -------------------------------------------------------------------------------- 1 | ## Dr. Nic - Contributing to OSS ## 2 | * What do you make? Tools, TM bundles, little things, first drafts 3 | * Help or recreate - pitch in rather than rewrite 4 | * Nice to have your name as an author 5 | * Less responsibility to contribute than maintain 6 | * Visibility of contributions via blogs, announcements, etc. 7 | * Make friends (at conferences) 8 | * Project teams - not necessarily exclusive 9 | * Just 'do' - add/fix/document what you want 10 | * Opportunity to express individuality 11 | * Aim is to get things done 12 | * OSS is a tool box, spare parts box, free and delicious 13 | * The 8 Steps (not magic, just convenient) 14 | 1. Get annoyed by a defect or missing feature 15 | * Circle of concern and circle of influence 16 | 2. Find the source 17 | * Step 2b - Has anyone else fixed it already? 18 | 3. Check out the source 19 | 4. Snoop around 20 | * rake -T, etc. 21 | 5. Make changes 22 | * Add coverage tests 23 | * Add break tests 24 | * Document! 25 | * Cleanup/refactor 26 | * Add features 27 | 6. Refresh the code from the repo 28 | 7. Create the patch 29 | 8. Submit patch 30 | -------------------------------------------------------------------------------- /session_notes/adventures in full text search.md: -------------------------------------------------------------------------------- 1 | # Adventures in Full Text Search 2 | ## Sarah Allen (Mightyverse) 3 | ### 11:45am Thursday, 06/10/2010 4 | 5 | What are the things that make full text search special? 6 | How do you choose what to use?
7 | 8 | ##Stemming 9 | ###Q: 10 | How do you avoid embedded words that should not show up in the search (i.e. run inside of drunk) 11 | ###A: 12 | Stemming 13 | 14 | ##Stop words 15 | Words that are meaningless to search and need to be added to an excluded list 16 | 17 | ##Tokenization 18 | Contextualized groupings of words that have different meanings when grouped in different ways. More common with Asian languages than in English. 19 | 20 | 21 | ##Accuracy 22 | * It's a tradeoff between accuracy and speed. 23 | * How important are false positives (or the absence of)? 24 | * How important are the top results compared with the exact number of results? 25 | 26 | ##Message to take home 27 | * Learn how users use search on your site to inform your search implementation. 28 | * Simple database-based full text searches may work for your needs. 29 | 30 | ##Thoughts 31 | * very cursory 32 | * useful for a beginner overview of key concepts 33 | 34 | -------------------------------------------------------------------------------- /session_notes/learning to speak interface.md: -------------------------------------------------------------------------------- 1 | #Learn to Speak Interface: Creating Conversations Between Developers and Designers 2 | ##Jess Martin (Relevance) 3 | ###2:50pm Wednesday, 06/09/2010 4 | 5 | *@jessmartin 6 | *Slides and worksheets: *SpeakInterface.com 7 | *Feedback: http://spkr8.com/t/3440 8 | 9 | 10 | ##Software is for people 11 | Know the person, purpose, priority 12 | 13 | ##Three questions: 14 | 15 | * WHO is the user? 16 | * WHY are they here? 17 | * WHAT is most important? 18 | 19 | ###WHO 20 | 21 | *Exercises 22 | 23 | * Day in the life (go through person's life mentally - what are they doing throughout the day, especially when they are using your application) 24 | * Draw a timeline (see image from slides) 25 | * Usage helps identify key features or requirements 26 | * Gotta Wanna 27 | 28 | ###Why do they use the app?
29 | 30 | * Do they choose to? (intrinsic) 31 | * Has someone told them to use it? (extrinsic) 32 | PURPOSE - Why are they here? 33 | 34 | *Exercises 35 | 36 | * Back-seat driver - What would you tell them to click next (the thing you'd be frustrated if they missed). 37 | * Cotton-Eyed Joe - Where is the user in the flow of the application? 38 | 39 | ###PRIORITY 40 | What is the most important element or action to be taken on the page? 41 | 42 | *Exercises 43 | 44 | * MVP test (don't have more than 3 items in the hierarchy on the page) 45 | * Design for mobile (focuses on the most important things) 46 | * Design from the Inside Out 47 | * Focus on what matters most first 48 | * Evaluating the MVP 49 | * Squint Test 50 | * Font Size Test (make sure not to have more than 3 levels of hierarchy) 51 | -------------------------------------------------------------------------------- /session_notes/persistence smoothie.md: -------------------------------------------------------------------------------- 1 | # Persistence Smoothie # 2 | * Example code is at http://www.github.com/flipsasser/Persistence-Smoothie 3 | 4 | ## What is NoSQL?
## 5 | * Any kind of persistence 6 | * Generally speaking, ACID is out the door in favor of speed and scalability 7 | * Probably shouldn't be running it *all* the time - good tool for the toolbox though 8 | * Key-value stores 9 | * Redis, Riak, Voldemort, Tokyo Cabinet, MemcacheDB 10 | * Hashes in the sky 11 | * Use case is taking ugly (and non-performant) joins and transforming them into denormalized data 12 | * Document Stores 13 | * Mongo, Couch, Riak 14 | * Multi-level hashes 15 | * Generally query with map/reduce to flatten 16 | * Use case is loose schema, no ACID needed 17 | * Column-based stores 18 | * Cassandra, HBase 19 | * Have a schema, more like a traditional DB 20 | * Graph stores 21 | * Track relationships efficiently 22 | * Neo4j, HypergraphDB, InfoGrid 23 | * Use case is deep relationship mapping 24 | 25 | ## How do you get started with an existing app written with SQL? ## 26 | * First option is putting together helpers - just slam them into models 27 | * not very DRY 28 | * Second option is using DataMapper 29 | * swiss army knife of ORMs 30 | * much simpler, gives a lot of AR niceties 31 | * still requires AR refactoring, ecosystem not as mature 32 | 33 | ## Example - Convert a Store App ## 34 | * Cases are Authentication, Products, Purchases, Activity stream 35 | * Auth is kind of not worth converting 36 | * Products are the sweet spot though - loose schema, lots of different structure 37 | * Purchases need to be transactional, don't mess with those 38 | * Activity stream is another sweet spot - lots of joins to be eliminated (!)
39 | * (Live coding) 40 | * Can use OpenStruct as a base for some simple persistence layers 41 | * Deployment and data portability become a much bigger pain in the ass when dealing with multiple data stores 42 | * The ecosystem of tools around ActiveRecord is really nice, but you basically have to say goodbye to it for many NoSQL ORMs (gem uninstall everything) -------------------------------------------------------------------------------- /session_notes/Don't Repeat Yourself.md: -------------------------------------------------------------------------------- 1 | # Don't repeat yourself, repeat others # 2 | * Experiences writing Ruby - improve, learn, share 3 | * 3 points - create, steal, think 4 | 5 | ## Create ## 6 | * What we have to learn to do, we learn by doing - Aristotle 7 | * Reinvent the wheel - important to dev education and skill (like bodybuilder) 8 | * MongoMapper - redid others' work, but gained an appreciation for the work of others (DM, AR) 9 | * Dynamic languages are dynamic 10 | * autoload - classes loading on demand 11 | * method_missing magic 12 | * dirty keys in AR 13 | * include hook and class_eval - used for the mapper mixins 14 | * super and the ancestor tree (with overrides in the same file) 15 | * dynamically defined classes and modules (Class.new, Module.new, etc) 16 | * Objects can do more than #new and #save 17 | * equality - #eql? versus == (can alias == to #eql?) but don't touch #equal? 18 | * clone vs. dup 19 | * clone doesn't clone internal objects, like hashes 20 | * can use initialize_copy method definition for your class to work properly 21 | * Hooks 22 | * inherited hook - this is how STI works 23 | * Validations, callbacks, comparable, enumerable, etc. 24 | * Patterns - not just for the enterprise 25 | * Proxies a la AR - class as an interface to another class 26 | * Decorators - adding behavior to an instance dynamically. Can use extend/include on an object!
27 | * Identity map - "a man with 2 watches never knows the time" 28 | * Don't ever load the same object 2x 29 | * Local APIs - eat your own dogfood to make sure it's consistent for others 30 | * "I want to make things, not just glue things together" - Mike Taylor 31 | 32 | ## Steal ## 33 | * HTTParty steals from DM, AR 34 | * Happymapper steals a *ton* from DM 35 | * Mongomapper - steals from sequel 36 | * Plucky is a lot like Hash, Arel 37 | * gem whois 38 | * canable - thanks pivotal 39 | 40 | ## Think ## 41 | * Make decisions. Don't add things because you *might* need them 42 | * Extraction vs. prediction. Don't extract until you repeat yourself. 43 | * Refactor (read the book) 44 | * Sketch them out on paper 45 | * Write 46 | * Helps you connect all the dots on what you've learned 47 | * Write for yourself, helping others is incidental -------------------------------------------------------------------------------- /session_notes/app monitoring with cucumber.md: -------------------------------------------------------------------------------- 1 | # App Monitoring with Cucumber 2 | ## Evolution of testing 3 | * Save & Refresh -> Test::Unit (difficult to demonstrate business value) -> RSpec + BDD (more stakeholder-digestible) -> Cucumber (awesome) 4 | * Cucumber as a business-readable domain specific language 5 | - Documentation, automated tests, and development aid all in one 6 | * But what about the actual production app? 7 | * Monitoring is treated as revenue preservation 8 | * Toolsets not as advanced 9 | * Only about 12 people in the room have external monitoring for more than a fitter_happier type URL 10 | * 2 axes - what are you looking at (URLs, etc) and how close are you looking (expected values, etc)? 11 | * If you're checking just the home page, you have great focus but not breadth 12 | * Ex. search - frequent issues on many sites, may be difficult to detect automatically (0 results for beer!) 
13 | * Not really TATFT on production - why not test where the revenue is being made? 14 | 15 | ## Existing tools for monitoring are old & broken 16 | * Nagios is ugly and tends to have a lot of noise - leads to boy crying wolf scenarios 17 | * Nagios is EVIL 18 | * Pingdom - not much depth in tests 19 | * Watchmouse is pretty cool - can drill into specific URLs and methods 20 | * Twitter is using it for enhanced status monitoring 21 | * All tools miss a direct link between business value and the actual alerts being sent out 22 | 23 | ## Enter cucumber - #devops! 24 | * Blur the line between devs and ops staff, and between development and infrastructure management 25 | * Kumbaya. 26 | 27 | ## Examples (check out cucumber-nagios) 28 | * Benchmarking 29 | * E-mail delivery (use mailinator for stubbed e-mail boxes) 30 | * Tap into existing tools like scout and newrelic (cucumber-scout and cucumber-newrelic) 31 | * SEO checks (go to google, check search results) 32 | * Security - check to make sure people have access and others do not (fired employees, etc.) 33 | * Infrastructure status (RAID arrays, etc) 34 | * DNS - make sure everything resolves properly 35 | * Credit card transactions, SSL (valid certificate), error rate (Hoptoad/Exceptional) 36 | 37 | ## Running features in production 38 | * Use a JSON cucumber output formatter (cucumber-json) 39 | * Set up a separate 'production' environment and feature files/step defs/support 40 | * cucumber -p production 41 | * OR build a cucumber Scout plugin 42 | * Use pagerduty.com to send e-mail alerts out and assign them 43 | * Ultimately, this gives you the power to monitor anything and everything with clearly defined business value in a way stakeholders can read and understand, and know of issues before your customers and clients do 44 | 45 | ## Q&A 46 | * Use hudson for production checks - maintain history, know if things are changing 47 | * Use long-running transactions for dropping generated data.
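The idea in the notes above (production checks tied to a stated business value, reported in a machine-readable format) can be sketched in plain Ruby. This is only an illustration, not cucumber-nagios or the cucumber-json formatter: `Check`, `run_checks`, `FAKE_SEARCH_RESULTS`, and the sample checks are hypothetical names made up here, and the search data is a stand-in for a real request against production. Only Ruby's standard `json` library is used.

```ruby
require 'json'

# Hypothetical check: a name, the business value it protects, and a probe
# (a lambda) that returns a truthy value when production looks healthy.
Check = Struct.new(:name, :business_value, :probe)

# Runs every check and returns an array of result hashes. A probe that raises
# is reported as failed instead of aborting the whole run.
def run_checks(checks)
  checks.map do |check|
    passed = begin
      check.probe.call ? true : false
    rescue StandardError
      false
    end
    { 'name' => check.name, 'business_value' => check.business_value, 'passed' => passed }
  end
end

# Stand-in for real search traffic; a real probe would query the live site.
FAKE_SEARCH_RESULTS = { 'beer' => 12, 'wine' => 0 }.freeze

CHECKS = [
  Check.new('search returns results for beer', 'customers can find products',
            -> { FAKE_SEARCH_RESULTS.fetch('beer') > 0 }),
  Check.new('search returns results for wine', 'customers can find products',
            -> { FAKE_SEARCH_RESULTS.fetch('wine') > 0 }),
  # fetch raises KeyError for an unknown term, which run_checks records as a failure.
  Check.new('search responds for cider', 'no errors shown to customers',
            -> { FAKE_SEARCH_RESULTS.fetch('cider') >= 0 })
].freeze

# Print the results as JSON when run as a script.
puts JSON.pretty_generate(run_checks(CHECKS)) if __FILE__ == $PROGRAM_NAME
```

Keeping the business value next to each result is the point of the sketch: an alert built from this output can say what is at risk, not just which URL failed.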
-------------------------------------------------------------------------------- /session_notes/cassandra and cassandraobject.md: -------------------------------------------------------------------------------- 1 | # Cassandra and CassandraObject # 2 | 3 | ### Disclaimer: I (Koz) don't have to scale, so no scalability porn here ### 4 | 5 | ## About Cassandra ## 6 | * Pre-1.0, full-fledged Apache project 7 | * Distributed and fault-tolerant and elastic from the ground up 8 | * Runs on a cluster of servers that organize into a ring based on key ranges 9 | * Nodes are aware of each other and talk using a gossip-based protocol 10 | * Configurable keyspace partitioning - random (MD5, evenly-distributed) or order preserving (solely key-based, uneven) 11 | * Configurable replication factor - number of nodes to write items of data to. Defaults to 3 12 | * Replication happens asynchronously after writes - this is where consistency levels come into play 13 | * Essentially, how many nodes need to have responded to a request in order for it to be considered consistent (and thus complete) 14 | * Options include none, one, quorum (majority), all 15 | * Allows you to trade off performance and data consistency speed 16 | * Fault tolerance will automatically skip dead servers when reading/writing data - can decommission nodes and reassign data on the fly 17 | * Adding a new node will automatically rebalance the cluster and increase capacity accordingly 18 | 19 | ## Data Model ## 20 | * Fundamentally a column store (using keys) - not a key-value store!
21 | * Columns consist of a name, value, and a timestamp 22 | * Column families/rows consist of an ordered set of columns 23 | * Supercolumns point to a set of rows (a 2 level hash, essentially, like JSON) 24 | * Can set ordering on column families using CompareWith (for example, TimeUUID) 25 | * There's no WHERE or ORDER or COUNT or SUM in querying - it's more about query-driven modelling 26 | * Populate a model that lets you query what you need 27 | * Create column families like indexes to search - you need a column family per query, essentially 28 | * Can get specific keys individually or grab ranges (only feasible with the order-preserving partitioner due to assembly time) 29 | 30 | ## Do you really need Cassandra? ## 31 | * Different enough that you want to make sure you need it before you dive in 32 | * Won't be writing a little blog in it, likely 33 | 34 | ## CassandraObject ## 35 | * Opinionated, hierarchical like ActiveRecord 36 | * Started out as a way to prove the usefulness of ActiveModel (and learn Cassandra) - not to solve scaling problems 37 | * Mostly AR-compatible 38 | * Usage 39 | * Define attributes and types (pluggable system for implementing those in Cassandra) 40 | * Validations (exactly like AR, except the stuff reliant on DB ops like uniqueness) 41 | * Keys - selection is crucially important since it determines how you can query 42 | * Can do UUID, natural (based on an attribute), or some other key factory object (like a Redis INCR-based factory, very simple API) 43 | * Migrations 44 | * CassandraObject uses an integer schema version attribute to auto-execute data migrations on access 45 | * Pretty cool and transparent 46 | * Indexes 47 | * Lets you do AR-style find_by_x methods 48 | * Can be unique or non-unique 49 | * Read-repair is used to re-update indexes on operations 50 | * Associations 51 | * Deliberately named differently since they function quite differently than AR - no proxy create methods or finders, etc.
52 | * The library is still very beta and API/data is in flux, but Koz is happy to take patches 53 | * Works on Rails 2.x and 3 -------------------------------------------------------------------------------- /session_notes/redis rails and resque.md: -------------------------------------------------------------------------------- 1 | # Redis, Rails and Resque # 2 | ## 3 favorite pieces of software ## 3 | * Redis - "we've been wanting it for a long time, even if we didn't know that we wanted it" 4 | * Key-value store for data structures 5 | * Hash, list, set, sorted set, string, integer, all with different operations 6 | * RAM used to be a limiting factor, not anymore. Can shard across machines to increase capacity 7 | * Used @ GH as a routing table, monitoring tool for file server load, etc. 8 | * Can use in multiple different languages, which is huge 9 | * antirez is awesome 10 | * Responsive and transparent (blog posts, etc.) 11 | * Included client libraries and CLI utils for easy setup 12 | * Unicorn - not a stupid name 13 | * Uses Mongrel parser (which is awesome) 14 | * May not be the fastest, but it's uber reliable and graceful at dealing with failure 15 | * Uses all kinds of UNIXy underpinnings 16 | * Unicorn uses select() to load balance to workers without blocking 17 | * Preforking worker to get shared memory, fast startup, and durability for stalled workers 18 | * No problems with all mongrels being in the restarting state anymore 19 | * Rack-based, but using a shim for Rails 2.2 20 | * Luke Malia's GC middleware <-- CHECK OUT 21 | * Great maintainer, very attentive, solid code 22 | * Delayed_job - background processor, queue, man about town? 23 | * Stupid simple 24 | * Sort of like a post-commit hook for your application actions 25 | * GH has gone through 6-7 queueing systems 26 | * SQS was first - basic wrapper with loop/do 27 | * DJ doesn't deal with chaos in workers very well, unfortunately - hanging jobs, etc.
28 | * Time to roll your own 29 | * Not based on SQL cause we're not good at it 30 | * Needs to be easy! 31 | * Needs to be atomic for operations to ensure no work is being done multiple times 32 | * High visibility - jump in, kill jobs, etc. 33 | * High reliability for the system (at the cost of reliability for individual jobs) 34 | * Didn't want full serialization YAML code from DJ - too much overhead 35 | * Note: Please don't write push-bots 36 | * Enter resque! 37 | * Based on redis 38 | * No complicated priorities - queues for buckets 39 | * Makes it easy to use redis-cli to investigate things 40 | * JSON serialization makes things very simple - class and args, that's it 41 | * You can have workers in many different languages - Node, C, etc. 42 | * Very simple API 43 | * Doesn't actually implement queueing - lets redis do that cause it's a better implementation 44 | * Frontend queueset - critical, high, low, etc. 45 | * Queue names can indicate locality - different queues for pages, frontend, backend, etc. 46 | * Pages are static data through nginx 47 | * Easy monitoring - just look at redis keys 48 | * Preforking model 49 | * Child can have chaos, it's okay, just kill it 50 | * Master can deal with signals and manage the whole deal through its own signals 51 | * Abuse the procline to help with debugging 52 | * Web UI 53 | * Trying to make it an awesome open source project 54 | * Open list, community of patches, etc. 55 | * Balance the README so it isn't too long 56 | * Need to have a vision and a plan for your OSS projects, and be available - that will bring people running 57 | * Have no features in the main project - encourage a healthy ecosystem for plugins to support functionality you don't want as a maintainer 58 | * What about resque 2.0? 59 | * BLPOP in Redis 2 to eliminate polling between client/messaging server 60 | * Can do multiple keys - almost a full queueing system with no Ruby code!
61 | * More plugins and hooks - nicer plugin API and demo app 62 | * Redoing UI in Mustache 63 | 64 | ## Q&A ## 65 | * Use supervise for daemon monitoring 66 | * Actually run background tasks on the frontend because Unicorn is so efficient -------------------------------------------------------------------------------- /session_notes/from 1 to 30.md: -------------------------------------------------------------------------------- 1 | # From 1 to 30 - Breaking down a big app 2 | 3 | ## Tale of Two Buildings 4 | * Forbidden city - constantly refactored and made bigger, lasted for years and years 5 | * Lotus Riverside highrise - built bigger and bigger by greedy developers, fell over on its side (http://www.q2hoo.com/2009/06/a-series-of-photos-about-collapsed-shanghai-flat-lotus-riverside-compound.htm) 6 | 7 | ## The app is a distributed call center app for language trainers 8 | * Mission critical - kept adding more and more and more to it 9 | 10 | ## Monolithic - entire app runs as one Rails application 11 | * Confuses and scares new staff <-- BIG PROBLEM 12 | * Hard to test/extend/scale 13 | * Code gets messy 14 | 15 | ## What is 30?
16 | * An ecosystem of independent Rails applications 17 | * Linked and seamless 18 | * Users don't see the difference 19 | 20 | ## Each app: 21 | * Has a separate database 22 | * Runs independently (has its own user stories) 23 | * Is lightweight (can be handled by a single developer) 24 | * Has tight internal cohesion and loose external coupling 25 | * Advantages: 26 | * Independent development cycles 27 | * Developer autonomy 28 | * Safety from technology (im)maturity 29 | * Appeal to developer laziness - overriding philosophy of making it easier on yourself (cause it's going to happen anyway) 30 | 31 | ## The Mysteries of the Forbidden City 32 | * Consistent UI/UX 33 | * Shared CSS/JS framework and style guide for all apps 34 | * Gem all common helpers (combo search, list tables w/ pagination) 35 | * Any time you see repetition, extract and reuse! 36 | * Implement new view code in a separate application 37 | * Extract into plugin 38 | * Roll up into a gem 39 | * Sharing data 40 | * For example, purchase app needs data stored in the 'courses' application 41 | * Tried built-in Rails stuff (ActiveResource) - tends to be really slow and difficult to debug 42 | * Lose Rails magic 43 | * Tried SVN externals, but a pain to manage 44 | * Read-only database connections 45 | * One app responsible for writing/updating/creating data, the others just reading 46 | * Pretty safe way of integrating 47 | * Acts_as_readonly macro to disable AR save functionality 48 | * Configuration is hell, however 49 | * Use CoreService ActiveResource model to retrieve app-specific info 50 | * When each app starts, it posts its configuration info to a central service for all others to access 51 | * Ordering issues aren't too bad - not too many tight dependencies 52 | * Services - pain to build but necessary for write-centric operations 53 | * AJAX-loaded view segments - put in a div from another apps URL 54 | * Authentication and Access Control 55 | * Registration/Login 56 | * Share sessions 
between applications via a common session store 57 | * Profile management 58 | * Role-based Access Control 59 | * Controller-based - each controller is a node 60 | * Posted to CoreService when app starts 61 | * Retrieve access rights 62 | * Controllers are responsible for checking rights 63 | * Services 64 | * File uploads - can't use S3 because of the Chinese firewall so wrote a similar service to receive files posted in the background 65 | * make use of super-polymorphic relationships keyed off the application as well as model and id 66 | * Mail - universal sending 67 | * Comet service 68 | * Deployment 69 | * Load each app into a subdirectory, then use unicorn with multiple paths 70 | * Use nginx as a reverse proxy to rewrite URIs to be uniform 71 | * Definitely extra memory consumption, but it's worth the dev time savings 72 | * Have to scale apps independently based on traffic levels - very easy to tweak and balance 73 | 74 | ## Net result: Higher productivity 75 | * Faster build-to-deploy 76 | * More developer autonomy 77 | * Safer 78 | * more scalable 79 | * Easier for new folks to jump in 80 | * Greater developer happiness -------------------------------------------------------------------------------- /session_notes/derek sivers.md: -------------------------------------------------------------------------------- 1 | # Derek Sivers Keynote - thoughts.pro # 2 | 3 | ## Sometimes you need to see how things work from an opposite point of view 4 | * Chinese doctors - you pay only when you're healthy (to prevent sickness) 5 | * 2-3-4-1 musical meter 6 | * Upside-down map w/ New Zealand and Australia at the top 7 | 8 | ## Accents and Identity 9 | * French accent story 10 | * You can't hear your own accent to know how bad it is 11 | * Works the same for programming languages - Java guys come to Rails with a serious accent 12 | 13 | ## Quirks 14 | * Light switch story - waving a hand in front of a panel 15 | * Quirks are only cool when you're used to them 16 | *
"Don't make me think" - enormous difference between observing quirks and having to use/get through them. 17 | * Not so charming when you need to use them. 18 | 19 | ## My PHP Framework 20 | * CD Baby was cobbled together and quirky, but it worked 21 | * Found Rails partway through 22 | 23 | ## Playing vs. Planning 24 | * Present focus vs. future focus 25 | * Predictably irrational - $20 bill auction 26 | * Easy to get on a path that doesn't lead you anywhere good 27 | * Fine when you're playing - good to be in the present moment 28 | * Another example is the management rat race - getting away from what you love not because you want to, but because you have to 29 | 30 | ## Is it fun for people to help you? 31 | * If anyone else needs to learn this PHP framework, they're going to hate it because it's full of personal quirks 32 | * Harry Harlow - monkey pinned door experiment - the monkeys wanted to figure out the puzzle right away during the acclimation period, with no external motivation. 33 | * Intrinsic motivation! 34 | * Once the monkeys started getting rewarded, they made more errors and got worse at solving puzzles 35 | * Lesson: for interesting problems, intrinsic motivation can be stronger than external motivations 36 | * Candy or answers - for interesting questions, the answer can be worth more than any rewards 37 | 38 | ## Delegating: What would Branson do? 39 | * Over 400 companies in the Virgin group - how does he do it? 40 | * Constantly evaluating and funding ideas - like having unlimited time 41 | * Suddenly able to say yes to everything that seemed worth doing 42 | * But it's hard to let go 43 | 44 | ## Brain dump yet?
- thoughts.pro 45 | * Dancing guy video - lone nut -> first follower turns him into a leader -> third makes a crowd to hit the tipping point and it's no longer risky 46 | * nurture your followers as equals 47 | * glorify the action, not yourself 48 | * no movement without the first follower 49 | * best way to make a movement is to creatively follow and give a lone nut credibility 50 | 51 | ## Confabulate - fabricate plausible reasons for things we feel 52 | * Brother/sister story 53 | * For moral arguments, the feelings come first and the reasons come later (but are made up to support the feelings) 54 | * Emotions are the elephant, rational self is the rider. Ultimately, the elephant is going to do whatever it wants. The rider can only coax in rational directions and avoid danger. 55 | * Easy to slip from guiding the elephant to making excuses for it 56 | * People believe what they believe, the reasons come after 57 | 58 | ## Intuition is the sum of everything you've learned 59 | * "how we decide" book on neuroscience 60 | * Primitive brain is good at making fast decisions 61 | * Rational brain is much slower and younger (evolutionarily) 62 | * Rational brain stores info into the emotional brain and wires a rule 63 | * That's what makes the snap decisions work 64 | * Story about submarine blips in Desert Storm 65 | 66 | ## Hobby 67 | * Can't really hire people to help with a hobby - you're doing it for your own sake, not as a means to an end 68 | 69 | ## 90% think they're better than average 70 | * Easy to get caught up with looking good, so you never do anything good 71 | * Easy to get caught up trying to do something great, so you never do anything at all 72 | * Best way around it is to think of yourself as a beginner again. 
-------------------------------------------------------------------------------- /session_notes/user behavior tracking.md: -------------------------------------------------------------------------------- 1 | # User Behavior Tracking 2 | ## Three steps to implement successfully 3 | * Track! 4 | * Interpret and identify 5 | * Alternate and refactor 6 | 7 | ## The toolbox 8 | * Google Analytics 9 | * You with Garb 10 | * Vanity 11 | 12 | ## Questions to answer: 13 | * Which links are being clicked? 14 | * Where are users landing? 15 | * Where are users exiting? frustrations, bounces, etc. 16 | * Where do users spend the most time? most used features, etc. 17 | * What are logged-in users doing? (segmenting users, figuring out what segments are doing, features being used, etc.) 18 | 19 | ## Google Analytics - answers "Which links are being clicked?" 20 | * Async tracking 21 | * _gaq.push - push tracking values onto an array without actually loading the Google JS, then sends them off later 22 | * What should we track? Everything! More data is always better 23 | * Track clicks as virtual pageviews 24 | * Assign clicks fake URLs to track appropriately 25 | * Need to come up with a consistent naming scheme 26 | * Also works well for modals, etc. 27 | * May be faulty sometimes - affects time on page measurements, for example 28 | * Can also track events - simple key/value pairs as a counter 29 | * Main disadvantage is the lack/loss of context 30 | * Tough to get access to from the API - just a sum count of every event which isn't very useful 31 | * Limitations 32 | * Time on page measurements can have weird side effects 33 | * An idea - go through the Rails app instead for some events/pageviews 34 | * Use a meta tag like the Rails 3 CSRF meta tag 35 | * Insert javascript that picks up the tracking info on load and dumps it out on load 36 | * Can also use something like Clicktale to step through the site 37 | 38 | ## Garb - answers "Where are users landing?"
39 | * Log in with OAuth access token, set globally or per session in Garb 40 | * Pull back a profile with a UA number 41 | * Reports - can grab existing ones from classes or build new ones on the fly 42 | * Uses a model DSL like DataMapper (Garb::Resource) 43 | * Specify metrics and dimensions 44 | * Can specify profile, offsets, limits, dates at query time 45 | * Returns an OpenStruct with a lot of data 46 | * Can set sorts and filters as well, with operations and dimensions 47 | * OR you can build on the fly 48 | 49 | ## Garb - answers "Where are users exiting?" 50 | * Google doesn't give you the exit rate via the API anymore 51 | * Can calculate from exits and page views metrics (just divide!) 52 | * Bummer that you can't sort on the Google side anymore 53 | * What about abandonment? 54 | * Tracking goals and progress through funnels (you get 4 groups of 5) 55 | * Can track with a virtual pageview 56 | * Can't track goals vis a vis events - bummer 57 | * Can't track from the backend, which kind of sucks too 58 | 59 | ## Garb - answers "Where do users spend the most time?" 60 | * Google doesn't give it to you, but it's just time/(pageviews - exits) 61 | * Doesn't do this because exit pages always time out after 30 minutes, so it would just skew results 62 | 63 | ## GA/Garb - answers "What are logged-in users doing?" 64 | * Use custom variables to find out! 65 | * _setCustomVar - path, slot # (you get 5), scope, and key-value pair 66 | * Lasts for the session, might be used to segment admin users 67 | * Great for dynamic segmentation - just grab the right criteria, then can use the ID in reports (grab from the URL, unfortunately) 68 | 69 | ## Vanity - answers "How do we improve?" 
70 | * Redis-backed library for tracking metrics and A/B testing 71 | * Define metrics to be tracked from controller 72 | * Get fancy dashboard of what's going on 73 | * Would be cool to build programmatic access to the dashboard data - probably pretty easy 74 | * Define ab tests and the metrics they track against 75 | * takes care of session cookies and segmenting users randomly 76 | * supports multivariate alternatives 77 | * Figures out the test stats for you on the fly 78 | * view helpers for switching display 79 | * Can integrate with Garb too! 80 | * Give a UA number and the metrics you want to track 81 | * Can also hit a report instance directly from the metric definition 82 | * Can get crazy custom if you want 83 | 84 | ## Impacts 85 | * Give info to your users - more transparency and utility - ex. Flickr stats 86 | * Easy admin panel for your users where you can't give them GA access - like Github's pageview stats 87 | 88 | ## For the road 89 | * Follow the money - for improvements, start at the place where you make your money 90 | * Nobody is paid by the pageview - dig into useful metrics that actually tell you stuff 91 | * User testing still wins - always much easier to get feedback and more direct than trying to infer info from analytics and guessing -------------------------------------------------------------------------------- /session_notes/implementing user recommendations.md: -------------------------------------------------------------------------------- 1 | # Implementing user recommendations in Rails 2 | * Matthew Deiters - @mdieters 3 | * Make graph technologies accessible 4 | * VW Lemon ad - brought on the golden age of advertising *creative revolution 5 | * Previously had been more of an academic approach 6 | * Technology changed (TV - by 1962, 90% of homes had one) 7 | * Necessarily, the approach changed 8 | * Transition to audience segmentation (away from male-targeted exclusively) 9 | * Web dev is in a similar place now, particularly with 
NoSQL and dynamic languages 10 | * We've been doing CRUD for 30 years, and it's gotten really boring for us and for users 11 | 12 | ## Our Creative Revolution 13 | * Recommendations = Money 14 | * 25% of Amazon's sales are on personalized suggestions 15 | * StumbleUpon ranks #1/#2 in social media traffic generations 16 | * 3 points 17 | * Discover the relationships in your data 18 | * FB news feed - using your actions to determine who's relevant to you 19 | * Github relationship graph with followers 20 | * Applies to social networking, content display (tailoring content display), website analytics, and predictive analysis 21 | * Modeling the relationships 22 | * RDBMSes don't work for this dataset and operations b/c SQL is set-based 23 | * With 100 people, it's really fast 24 | * Doesn't scale though - 2nd degree ops with 60k people takes over an hour 25 | * Get some really bad SQL smells - RECURSIVE SQL extension and stuff like that (Hasselhoff graphic) 26 | * Use graphs instead - relationships are first class citizens in the data model 27 | * Instead of rows, you have nodes (or actors or vertices or points) 28 | * Edges are relationships between those nodes (or arcs or links) 29 | * Ultimately less complex and more natural (and whiteboard friendly) 30 | * Plus they're really fast (FUCK YOU, EINSTEIN) 31 | * Using graphs in your Rails app 32 | * In-Memory Ruby Graphs 33 | * Great for small datasets or ad-hoc querying 34 | * There's the RGL gem but it's cryptic (github.com/fmeyer/rgl) 35 | * 20k nodes and 1M edges = 300MB of memory (not bad) plus queries went from 30 seconds to a few milliseconds (smallish dataset though) 36 | * Persistence & Dynamic Data - Neo4j (OSS graph DB in Java) 37 | * Key-value store plus relationships between keys 38 | * Neo4j.rb is a solid interface library (ORM) using Lucene but requires JRuby 39 | * Don't need to store everything in Neo4j - just another persistence tool like Solr 40 | * Neo4jr-social is another library - more like 
Solr/Sunspot, much easier to run and doesn't require JRuby 41 | * Self-contained Jetty server or deployable WAR 42 | * Focused on social network analysis 43 | * Built-in basic querying, very extensible 44 | * (Live code example w/ Star Wars stuff) - built in relationships, suggestions, degrees of separation 45 | * Gremlin - XPath-like syntax for querying your social graph 46 | * Had a MongoDB backend - cool but not efficient 47 | * Also works on Neo4j 48 | * More than anything, it's a query syntax that's really easy to use 49 | * Net net, SQL is assembly required, Graphs mostly work out of the box with pre-built queries 50 | * AllSimplePaths - who do we have in common? 51 | * ShortestPath 52 | * Dijkstra's algorithm - add scores to edges 53 | * Can build "who do we have in common" on steroids 54 | * Closeness Centrality - who has the most followers on Twitter 55 | * Betweenness Centrality - Who has more influential people following them on Twitter (used for the Github London followers graph) 56 | * Eigenvector Centrality - Essentially PageRank 57 | * Performance implications 58 | * Sparse graphs are faster because there's less traversal necessary 59 | * If you get too dense the data isn't especially meaningful 60 | * Neo4j is a hoss - needs lots of RAM to avoid swapping 61 | * Metadata - can vary depending on how you want to slice and dice and limit 62 | * Breadth-first vs. 
depth-first searches - need to match to data and the problem you want to solve 63 | 64 | ## Conclusion 65 | * Start thinking in graphs when it makes sense 66 | * NoSQL as an augmentation like Solr or Sphinx or Memcached - okay to have different stores for specific kinds of data 67 | 68 | # Q&A 69 | * Probably don't want to store a lot of metadata in Neo4j - use identifiers and relationships, and node data where necessary 70 | * Performance is limited by complexity of relationships, not size of dataset 71 | * Nodes and relationships can be of any type 72 | * Dex graph engine - C-based, could be easy to bind into Ruby. Also RichRelevance (SaaS) and DirectedEdge (Shopify) 73 | * For loading data, mine info out of relational DB and put it into Neo4j 74 | * Not necessarily duplicate data, just modeled for different purposes 75 | * Deployment - Neo4j is a memory hog, may be able to keep the DB in memory but not sure how well it releases RAM -------------------------------------------------------------------------------- /session_notes/million dollar mongo.md: -------------------------------------------------------------------------------- 1 | # Million Dollar Mongo - Breaking Free of the Relational Mindset 2 | ## Backstory ## 3 | * Client facing total business failure - 2 years of dev, little to no progress, had a hard 6-month deadline 4 | * Competitive RFP 5 | * In the process of signing a very large insurance company for a SAAS app that they didn't have built 6 | * Service almost 1/2 of insured people in the US 7 | * Started with a $500k budget, scope ballooned after 4mo, eventually billed $1.7MM, over 10,000 hours in 9 months 8 | * Peak head count was 14 people across 4 continents 9 | * Reqs driven by client's client - dealing with 1 record at a time since it's pharmaceutical/healthcare 10 | * Production releases Feb to April 2010 11 | * Ridiculous project points breakdown graph showing scope creep - could be useful for factory! 
12 | * Technical backstory 13 | * Originally relational data, stored across hundreds of tables 14 | * Ridiculously join-heavy, query-heavy (35-36 per request) 15 | * Original system was on DotNetNuke and MSSQL 16 | ## The Solution ## 17 | * At core, a patient record system 18 | * Files are very distinct and have no overlap. 19 | * Built spikes for Mongo, Couch, and Cassandra 20 | * Perf numbers on Twitter 21 | * Big selling point for Mongo was dynamic queries 22 | * Couch Map/Reduce felt too much like stored procs 23 | * Couch fell over a bit at 10,000 docs on insert/update operations 24 | * Cassandra had "too much fucking XML" 25 | * Schemaless approach made legacy migration much easier 26 | * CSRs needed to build/update records with history 27 | * Stored history by generating new keys on the fly 28 | * Resistance to change was a problem 29 | * Hashrocket didn't have an internal consensus that this was the right way to go - Mongo was only 0.8, was this the right thing to stake such a large engagement on? 30 | * "So, are we going to have to write our own ORM for this thing?" 31 | * Out of the normal comfort zone - both good and bad 32 | * Normal Rails development has become "more like a zombie" 33 | * New challenges, very interesting 34 | * Easiest DB to install ever - no deps 35 | * Schemaless DB makes things uber-agile because migrations are painless 36 | * Able to cut down time for data import process 37 | * Code has to be able to deal with different data structures - more app-side load 38 | * Client DBAs kept changing the schema, causing issues 39 | * (fast ramp-up for non-idiots) - if you can understand JSON, you can work with Mongo! 40 | * MongoMapper -> Mongoid 41 | * Hashrocket fork had 33 patch releases, had more that didn't really fit into the MM philosophy 42 | * Prefers embedding documents for everything 43 | * Default to atomic operations 44 | * Rich criteria API, master/slave support, versioning, etc. 45 | * Is NOT ActiveRecord (!!!) 
46 | * Designed with performance in mind 47 | * Average document size is 500k, biggest about 1MB (4MB is limit, but would be hard to hit) - big but can get everything in 1 request 48 | * As of Mongo 1.4, you can index asynchronously on inserts 49 | * Regex queries on fields, etc. 50 | * Schema design patterns 51 | * Denormalization is acceptable 52 | * Embed whatever you can 53 | * Be wary of pre-epoch dates - use integers where you can 54 | * One query to rule them all - no joins! 55 | * MySQL is fast, Mongo is faster 56 | * 300,000 writes/second 57 | * Didn't even need to shard Mongo on an EC2 XL instance 58 | * Hybrid DB model 59 | * Hierarchical data in Mongo, relational data in MyISAM, transactions in InnoDB, simple data in Redis 60 | * Having more than one DB is okay - favor suitability over simplicity 61 | * Needed relational DB as staging tables for normalized imports 62 | * Working without transactions 63 | * Atomic updates mostly cover you 64 | * Optimistic locking is pretty easy to roll in 65 | * Integration tests require manual rollback 66 | * Selenium tests can run in a separate process (neat) - enables something like SpecJour, which helps with the 40-minute Cucumber run 67 | * In Production 68 | * Deployed on EC2 69 | * Close to 500GB of data 70 | * Put Mongo on its own utility instance instead of the DB master since MySQL will fight it 71 | * Feed it lots of RAM and disk space 72 | * Setting up chef scripts was pretty easy 73 | * Costs about $10k a month 74 | * For backups, do a mongo dump and send it to S3 75 | 76 | ## What happens when 'real america' interferes? ## 77 | * Relational DBAs can be retarded (especially when threatened by change) 78 | * Set up squealer to provide a proxy to MySQL for ad-hoc querying and a GUI 79 | * How the hell did we end up in bed with these people? 
80 | * Emerging technologies threatened those opposed to change 81 | * Anyone with a C and O in their title has no business speaking about technology 82 | * CIO saw the well-suited/less-well-suited page and pitched a fit 83 | * Called up another Rails firm (Intridea) to verify, that firm backstabbed somewhat and said they didn't think Mongo was an appropriate choice, and offered to convert in a month 84 | * Real bummer in the community that this sort of stuff happens 85 | 86 | * Would be nice if more case studies on large mongo apps existed for background info. -------------------------------------------------------------------------------- /session_notes/Building an API with Rails.md: -------------------------------------------------------------------------------- 1 | # Building an API with Rails # 2 | * Joe Ferriss (Thoughtbot/Hoptoad), Jeremy Kemper (37S), Marcel Molina (Twitter), Rick Olson (Github/ENTP), Derek Willis (NYT) 3 | 4 | ## Authentication ## 5 | * Twitter implemented OAuth 1.0 about a year ago, then 1.0a for a security issue 6 | * 1.0 spec is difficult to implement because of the signature algorithm - makes requests difficult and easy to screw up 7 | * 2.0 spec uses tokens over SSL instead - much simpler. 
8 | * 37S is working on OAuth 9 | * Distinction between authentication and authorization is interesting - changes app responsibilities 10 | * Wrap is an interesting concept - rolled into OAuth 2 11 | * Github using OAuth 2 12 | * Lets you compartmentalize access 13 | * Thoughtbot 14 | * Simplicity is key - fast setup 15 | * OAuth is cool but overkill/unnecessary 16 | 17 | ## Input and Response Formats ## 18 | * Thoughtbot 19 | * Originally accepted multiple formats, simplifying and standardizing helped scale more easily (write-heavy) 20 | * Documentation is huge - XML schemas provide this implicitly which is nice 21 | * NYT 22 | * Read-heavy API, necessary to offer larger data streams and archives in multiple formats for analysis 23 | * Twitter 24 | * Like to give devs many options 25 | * Moving to standardize on JSON - don't see the lack of choice as an issue 26 | * 37S 27 | * Treat all formats equally, restful/resourceful architecture is huge and makes a lot of this a non-issue 28 | * Smaller apps make it much easier 29 | 30 | ## Versioning ## 31 | * Twitter 32 | * Really hard problem (and great interview question) - no real good answers yet 33 | * 5 separate things to deal with - format, params, serialization, features, etc. 34 | * Fork app and do separate deploys vs. inheritance - change management is a bitch 35 | * Twitter has held off - no compelling force driving to implement, don't want to inconvenience devs. Have handled issues by other means. 36 | * Build in new functionality on a switch on/switch off basis - can ease transitions. Just make sure you clean them up later. 
37 | * 37S 38 | * Easy to get drunk on engineering and want to tackle it when it's not really a problem 39 | * Give developers lead time, but don't sweat making changes 40 | * Flip formats with mime types, etc - lots of little tricks to ease transitions 41 | * Thoughtbot 42 | * Some people will never upgrade until things are broken, but at the same time the dropoff/abandonment rate is not very big (for those that don't make the jump) 43 | * NYT 44 | * Developers are flexible and will roll with a lot of things. Don't abuse them, but don't bend over backwards either. 45 | 46 | ## Scaling ## 47 | * NYT uses mashery for authentication but not for caching - on a per-app basis 48 | * Use varnish for frontend - totally awesome 49 | * Internal use is huge - timing and cache invalidation is critical to the usefulness of the dataset 50 | * Hoptoad is difficult because of the write-heavy nature 51 | * Everything has a cost - need to figure out what they are and prioritize accordingly 52 | * Rate limits, etc. help 53 | * 37S 54 | * API traffic is way different to cache than web traffic - lots of scraping, batching, etc. Need different strategies. 55 | * Need to rate limit to keep it under control. Clients get accidentally out of control often. 56 | * Twitter caches everything, but you need to watch out for working set size. Be smart about it. 
57 | * Dynamic assembly at runtime 58 | * Mind the cost of expiration too 59 | * Streaming API for continuous data to prevent polling (which kills) 60 | * Github 61 | * Let clients make use of HTTP caching to help you out 62 | 63 | ## Code Separation 64 | * Github 65 | * First API was inline with the UI (respond_to) - now use separate controllers because the logic was getting kludgy 66 | * Eventually want to separate into its own app (Sinatra or the like) 67 | * Twitter 68 | * Nice when you can keep all of the business logic in one place 69 | * If you use skinny controller/fat model, it's not much duplication 70 | * Much easier to reason through changes, figure out logic if it's separate 71 | * 37S 72 | * This is why respond_to exists - use it! 73 | 74 | ## Security Concerns ## 75 | * Twitter - annotations present an interesting challenge as they can store any data 76 | * Moral: you're going to get screwed, no matter how awesome you are. Be prepared. 77 | * Github 78 | * Watch out for login cookies/sessions, particularly with badges 79 | * 37S - keep the walls up 80 | 81 | ## Developer Communication ## 82 | * Twitter - obviate the need to communicate with developers 83 | * Detailed status sites, etc. Keep them in the loop. 84 | * Use what works for you 85 | * Thoughtbot - make sure you're not totally wrong 86 | * People will hold you totally responsible for what you say 87 | * 37S - figure out what business priority you have for the API and treat it like a product with devs as customers 88 | * Get rid of your API if you're not going to support it 89 | * NYT - build sample apps for documentation for every change. Helps you learn too, and be able to relate. 90 | * Github - foster a sense of community and open lines of communication 91 | * Community-supported docs, etc. 92 | * Be your own customer (37S Highrise iPhone app) --------------------------------------------------------------------------------