├── .gitattributes ├── .gitignore ├── 000 - change the world.jpg ├── 000a - Data Science Stack.md ├── 000c - Medicare Blog Post.md ├── 000d - SQL Regex.md ├── 000e - Pesticides.md ├── 000f - Picking a College Using Data.md ├── 001 - Personal Security.md ├── 002 - California Water.md ├── 003 - spot pricing 3.md ├── 005 - spot pricing 2.md ├── 005 - spot pricing 4.md ├── 008 - Car Analysis Three.md ├── 008 - Car Prices.xlsx ├── 008 - Car Quality Stats.xlsx ├── 008 - Car-Models.xlsx ├── 008 - Cars-For-Sale.xlsx ├── 010 - CameraAwesomePhoto (1).jpg ├── 010 - CameraAwesomePhoto (2).jpg ├── 010 - Ethics Inputs.md ├── 010 - equality vs equity.jpg ├── 010 - food lifecycle.jpg ├── 010 - poor deserve the best.png ├── 010 - spider man.jpg ├── 015 - AnnualSalary2010thru2013.csv ├── 015 - Tuition by Raw Numbers.xlsx ├── 015 - UW Salary Info.xlsx ├── 015 - UW Student Tuition Plan.pdf ├── 015 - University Salary Analysis.md ├── 016 - College from First Principles.md ├── 020 - Industries.pdf ├── 020 - What is Ethical.md ├── 020 - ally bingo.jpg ├── 025 - Why Ethics.md ├── 025 - ethics.jpg ├── 030 - Looking for Ethical Work.md ├── 040 - Ethics for Data Pros.md ├── 040 - ML companies.png ├── 040 - NIST.SP.800-179.pdf ├── 040 - data ethics slide.jpg ├── 040 - ead_v1.pdf ├── 040 - ethical tech.jpg ├── 040 - the scored society.pdf ├── 050 - Privacy.markdown ├── 050 - The Database of Ruin.md ├── 050 - online tracking.jpg ├── 060 - Precautionary Principle.md ├── 070 - Environmental Engineering.md ├── 070 - climate change viz.jpg ├── 070 - space ceded to cars.jpg ├── 080 - Music and Engineering.md ├── 100 - Google Wagging the Dog.md ├── 100 - map-reduce funny.png ├── 110 - Big_Data_Landscape.png ├── 110 - Data Stack.md ├── 110 - data stack hadoop cat.png ├── 110 - data stack.jpg ├── 115 - Cost of Complexity.jpg ├── 115 - Systems Thinking.md ├── 120 - Cloud Computing incl Spot.md ├── 120 - cloud computing dev 1.JPG ├── 120 - cloud computing dev 2.JPG ├── 130 - Interdisciplinary.md ├── 135 
- Curiosity and Ego.md ├── 140 - Net Neutrality.md ├── 150 - MonteCarlo.R ├── 150 - Project Paradox.png ├── 150 - Project Planning.markdown ├── 150 - Scratch.R ├── 151 - Project Planning 2.md ├── 152 - Project Planning 3.md ├── 153 - Project Planning 4.md ├── 160 - Communication and Storytelling.md ├── 160 - linkbait effectiveness.png ├── 170 - Getting Started With Programmind.md ├── 180 - Incentives.md ├── 180 - incentives.jpg ├── 190 - Project Animal Names.jpg ├── 190 - Project Names 2.jpg ├── 190 - Project Names.md ├── 200 - data viz.jpg ├── 200 - data viz.md ├── 2013-12-08-pagerank scale.md ├── 2013-12-28 Productivity Analysis.xlsx ├── 2013-12-28-productivity.md ├── 2014-02-23-hal-varian.md ├── 2014-05-19-social-network.md ├── 2014-06-01-democratization-of-bi.md ├── 210 - System Replacements.md ├── 220 - Personal Automation.md ├── 220 - software architecture.png ├── 230 - Association Rules in SQL AdventureWorks 2012.sql ├── 230 - Basic ML Using SQL.markdown ├── 230 - CameraAwesomePhoto.jpg ├── 240 - SQL and Digraphs.markdown ├── 250 - Finding a Vacation Using Data.markdown ├── 260 - Startups and Y Combinator.markdown ├── 270 - Smell Test Dilbert.jpg ├── 270 - The Smell Test.md ├── 280 - Agile and Waterfall.md ├── 290 - Data Science Evolution.md ├── 320 - Feature Engineering.md ├── 330 - Cognition for Data Professionals.md ├── 330 - Cognition for Data Pros.png ├── 330 - know all the things.jpg ├── 350 - Example Math.xlsx ├── 350 - Matrix Prioritizaton.md ├── 360 - Life is an Optimization Problem.md ├── 370 - Industry Comparisons.md ├── 380 - Trust.md ├── 400 - Software as a Craft.markdown ├── 400 - software.jpg ├── 410 - Scientific Method.markdown ├── 420 - Inductive vs Deductive.markdown ├── 430 - Hiring.md ├── 431 - CV of Failures.pdf ├── 431 - Job Searches.md ├── 431 - job searches as developer.png ├── 431 - resume viz.png ├── 431- interviewing honesty.jpg ├── 431-decoding-job-descriptions.jpg ├── 432 - Bad Work Situations.md ├── 432 - fail.jpg ├── 440 - 
Database Development.markdown ├── 450 - Engineering Constraints.markdown ├── 460 - Reputation Systems and PageRank.markdown ├── 470 - Amazon.md ├── 480 - Minorities in Technology.md ├── 480 - computing women.jpg ├── 480 - lego_gender.jpg ├── 480 - racism and bigotry.jpg ├── 480 - recruiting WIT.jpg ├── 480 - what happens we're out.png ├── 480 - women_astronomer.jpg ├── 480- perfectcrime.png ├── 490 - Chart of Cosmic Exploration.jpg ├── 490 - Science and Research.md ├── 490 - scientific method.jpg ├── 490 - what would feynman.png ├── 500 - Intro to Caching and Core Algos.markdown ├── 501 - Moore's Law.md ├── 502 - Self-Documenting Code.md ├── 510 - Analysis of Brilliant People.markdown ├── 510 - Brilliant People.png ├── 510 - Smart People Traits.xlsx ├── 520 - Find a Health Using Data.md ├── 520 - Natural Food Remedies Notes.txt ├── 520 - healthy foods.jpg ├── 520.jpg ├── 530 - 10 commands of architecture.markdown ├── 540 - Learning and Retention Methods.markdown ├── 550 - SQL on RDS.markdown ├── 560 - Balance.markdown ├── 570 - Housing Using Data.md ├── 600 - Advanced ETL Approaches.markdown ├── 610 - ETL tips and Tricks.markdown ├── 620 - Data Science Intro.markdown ├── 621 - SQLSatRedmond - ML For Mere Mortals.markdown ├── 621 - photo.JPG ├── 640 - Making Data Friendly Organizations.markdown ├── 650 - Data To Decisions Education Abstract.html ├── 650 - Data to Decisions Ed abstract.md ├── 650 - Data to Decisions for Education.md ├── 700 - autotrader_scrape.py ├── 9900 - Cloud Uploads.jpg ├── 9900 - Graphical Models.PDF ├── 9900 - IT_roles.jpg ├── 9900 - Programming Links.md ├── 9900 - commit linkbait.jpg ├── 9900 - complexity kills.jpg ├── 9900 - complexity.jpg ├── 9900 - devops and security.jpg ├── 9900 - enterprise-it.png ├── 9900 - git undo flowchart.png ├── 9900 - ie-must-die.jpg ├── 9900 - javascript.png ├── 9900 - linux perf tools.jpg ├── 9900 - multithreading.jpg ├── 9900 - programmer_style.png ├── 9900 - programming spec.jpg ├── 9900 - reading 
software.png ├── 9900 - software-engineer.jpg ├── 9900 - stackoverflow.png ├── 9900 - wicked problems.jpg ├── 9901 - GDP vs GNH.jpg ├── 9901 - Productivity Links.md ├── 9901 - Smartphone Crossing.jpg ├── 9901 - learning stages.jpg ├── 9901 - profanity motivation.jpg ├── 9902 - Career and Branding.md ├── 9903 - CEO streamlining.jpg ├── 9903 - Finance Links.md ├── 9903 - Robots and labor.jpg ├── 9903 - counter-Varian Rule.jpg ├── 9903 - trickle down economics.jpg ├── 9904 - 2016-12-9-gans.pdf ├── 9904 - Big Data Deities.png ├── 9904 - Data Science and Engineering Links.md ├── 9904 - Overfitting diagram.jpg ├── 9904 - RoadToDataScientist1.png ├── 9904 - Scikit_Learn_Cheat_Sheet_Python.pdf ├── 9904 - data science funny.jpg ├── 9904 - data science over time.png ├── 9904 - data science skills venn.jpg ├── 9904 - data viz.png ├── 9904 - data-science-venn-diagram.jpg ├── 9904 - machine learning industry.png ├── 9904 - ml libraries.png ├── 9904 - never use piece charts.jpg ├── 9904 - stats-trick-question.jpg ├── 9904 - storytelling.jpg ├── 9904 - tools.jpg ├── 9904.jpg ├── 9905 - Grief.md ├── 9905 - Parenting Chores over Time.jpg ├── 9905 - Parenting Iron Triangle.jpg ├── 9905 - Personal Life Links.md ├── 9905 - money and time.jpg ├── 9905 - no hipsters.jpg ├── 9905 - why people become unhappy.jpeg ├── 9906 - Security Links.md ├── 9906 - time to crack password.png ├── 9907 - Health Links.md ├── 9907 - Salad Ideas.md ├── 9907 - cheese wheel.jpg ├── 9907 - dentist prices.pdf ├── 9907 - growth of hospital admins.png ├── 9907 - overweight.jpg ├── 9907 - recipe recommendation ML.pdf ├── 9907 - vaccines.gif ├── 9908 - Academia Misincentives.jpg ├── 9908 - College and Career.jpg ├── 9908 - Education Links.md ├── 9908 - NFL odds.jpg ├── 9908 - academic minions.jpg ├── 9908 - education retention.jpg ├── 9908 - game of loans.jpg ├── 9908 - goal of education.jpg ├── 9908 - incentives.jpg ├── 9908 - teacher feedback funny.jpg ├── 9908 - textbooks.png ├── 9908.jpg ├── 9909 - Education 
Reform Warnings.pdf ├── 9909 - Leadership and Management Links.md ├── 9909 - Skunk Works Leadership.png ├── 9909 - get out of the way.jpg ├── 9909 - org charts.jpg ├── 9909 - typical conversation with managers.webm ├── 9910 - Startups.md ├── 9910 - mvp.png ├── 9910 - sick burn by new yorker.jpg ├── 9911 - CBP Task Group Out-brief Slides_FINAL.pdf ├── 9911 - ComparisonOfVotingSystems.png ├── 9911 - Government.md ├── 9911 - Terrorism causes.png ├── 9911 - police and recording.jpg ├── 9912 - Intellectual Property.md ├── 9913 - Companies.md ├── 9913 - Net Neutrality.png ├── 9913 - coca cola.png ├── 9913 - misbehaving.jpg ├── 9914 - Privacy and Security.md ├── 9914 - privacy vs security.jpg ├── 9915 - dont shoot.png ├── 9916 - Police and the Justice System.md ├── 9916 - how to survive police encounters.jpg ├── 9917 - Military.md ├── Archive ├── 140 - Nonprofit_Revenue_-_Donation_Cannibalization.pdf ├── 140 - Seattle Art Museum.markdown ├── 170 - Seattle Aquarium.md ├── 2014-05-08-keynote-one.md ├── 2014-05-08-keynote-two.md ├── 2014-05-13-passbac survey.xlsx ├── 2014-06-11-passbac.md ├── 2014-07-01-tsql-tuesday.md ├── NodeXL graphs.md └── uw - 010 - introduction.md ├── Genome Science Blog Post.md ├── List of things I still can't do in November 2014.md ├── Principles_of_Performance_Tuning.md ├── README.md ├── company size and culture.png ├── crime-vs-incarceration.jpg ├── darwin award.jpg ├── data bias.jpg ├── einstein_ethics.jpg ├── equal-vs-fair.png ├── math_for_grownups.jpg ├── mechanical_calculator.jpg ├── precision-and-recall.jpg ├── resistance is just.jpg ├── student_debt.jpg └── wolf debt.png /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | 4 | # Custom for Visual Studio 5 | *.cs diff=csharp 6 | *.sln merge=union 7 | *.csproj merge=union 8 | *.vbproj merge=union 9 | *.fsproj merge=union 10 | *.dbproj merge=union 11 | 12 | # Standard to 
msysgit 13 | *.doc diff=astextplain 14 | *.DOC diff=astextplain 15 | *.docx diff=astextplain 16 | *.DOCX diff=astextplain 17 | *.dot diff=astextplain 18 | *.DOT diff=astextplain 19 | *.pdf diff=astextplain 20 | *.PDF diff=astextplain 21 | *.rtf diff=astextplain 22 | *.RTF diff=astextplain 23 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ################# 2 | ## Eclipse 3 | ################# 4 | 5 | *.pydevproject 6 | .project 7 | .metadata 8 | bin/ 9 | tmp/ 10 | *.tmp 11 | *.bak 12 | *.swp 13 | *~.nib 14 | local.properties 15 | .classpath 16 | .settings/ 17 | .loadpath 18 | 19 | # External tool builders 20 | .externalToolBuilders/ 21 | 22 | # Locally stored "Eclipse launch configurations" 23 | *.launch 24 | 25 | # CDT-specific 26 | .cproject 27 | 28 | # PDT-specific 29 | .buildpath 30 | 31 | 32 | ################# 33 | ## Visual Studio 34 | ################# 35 | 36 | ## Ignore Visual Studio temporary files, build results, and 37 | ## files generated by popular Visual Studio add-ons. 
38 | 39 | # User-specific files 40 | *.suo 41 | *.user 42 | *.sln.docstates 43 | 44 | # Build results 45 | 46 | [Dd]ebug/ 47 | [Rr]elease/ 48 | x64/ 49 | build/ 50 | [Bb]in/ 51 | [Oo]bj/ 52 | 53 | # MSTest test Results 54 | [Tt]est[Rr]esult*/ 55 | [Bb]uild[Ll]og.* 56 | 57 | *_i.c 58 | *_p.c 59 | *.ilk 60 | *.meta 61 | *.obj 62 | *.pch 63 | *.pdb 64 | *.pgc 65 | *.pgd 66 | *.rsp 67 | *.sbr 68 | *.tlb 69 | *.tli 70 | *.tlh 71 | *.tmp 72 | *.tmp_proj 73 | *.log 74 | *.vspscc 75 | *.vssscc 76 | .builds 77 | *.pidb 78 | *.log 79 | *.scc 80 | 81 | # Visual C++ cache files 82 | ipch/ 83 | *.aps 84 | *.ncb 85 | *.opensdf 86 | *.sdf 87 | *.cachefile 88 | 89 | # Visual Studio profiler 90 | *.psess 91 | *.vsp 92 | *.vspx 93 | 94 | # Guidance Automation Toolkit 95 | *.gpState 96 | 97 | # ReSharper is a .NET coding add-in 98 | _ReSharper*/ 99 | *.[Rr]e[Ss]harper 100 | 101 | # TeamCity is a build add-in 102 | _TeamCity* 103 | 104 | # DotCover is a Code Coverage Tool 105 | *.dotCover 106 | 107 | # NCrunch 108 | *.ncrunch* 109 | .*crunch*.local.xml 110 | 111 | # Installshield output folder 112 | [Ee]xpress/ 113 | 114 | # DocProject is a documentation generator add-in 115 | DocProject/buildhelp/ 116 | DocProject/Help/*.HxT 117 | DocProject/Help/*.HxC 118 | DocProject/Help/*.hhc 119 | DocProject/Help/*.hhk 120 | DocProject/Help/*.hhp 121 | DocProject/Help/Html2 122 | DocProject/Help/html 123 | 124 | # Click-Once directory 125 | publish/ 126 | 127 | # Publish Web Output 128 | *.Publish.xml 129 | *.pubxml 130 | *.publishproj 131 | 132 | # NuGet Packages Directory 133 | ## TODO: If you have NuGet Package Restore enabled, uncomment the next line 134 | #packages/ 135 | 136 | # Windows Azure Build Output 137 | csx 138 | *.build.csdef 139 | 140 | # Windows Store app package directory 141 | AppPackages/ 142 | 143 | # Others 144 | sql/ 145 | *.Cache 146 | ClientBin/ 147 | [Ss]tyle[Cc]op.* 148 | ~$* 149 | *~ 150 | *.dbmdl 151 | *.[Pp]ublish.xml 152 | *.pfx 153 | *.publishsettings 154 | 155 | 
# RIA/Silverlight projects 156 | Generated_Code/ 157 | 158 | # Backup & report files from converting an old project file to a newer 159 | # Visual Studio version. Backup files are not needed, because we have git ;-) 160 | _UpgradeReport_Files/ 161 | Backup*/ 162 | UpgradeLog*.XML 163 | UpgradeLog*.htm 164 | 165 | # SQL Server files 166 | App_Data/*.mdf 167 | App_Data/*.ldf 168 | 169 | ############# 170 | ## Windows detritus 171 | ############# 172 | 173 | # Windows image file caches 174 | Thumbs.db 175 | ehthumbs.db 176 | 177 | # Folder config file 178 | Desktop.ini 179 | 180 | # Recycle Bin used on file shares 181 | $RECYCLE.BIN/ 182 | 183 | # Mac crap 184 | .DS_Store 185 | 186 | 187 | ############# 188 | ## Python 189 | ############# 190 | 191 | *.py[cod] 192 | 193 | # Packages 194 | *.egg 195 | *.egg-info 196 | dist/ 197 | build/ 198 | eggs/ 199 | parts/ 200 | var/ 201 | sdist/ 202 | develop-eggs/ 203 | .installed.cfg 204 | 205 | # Installer logs 206 | pip-log.txt 207 | 208 | # Unit test / coverage reports 209 | .coverage 210 | .tox 211 | 212 | #Translations 213 | *.mo 214 | 215 | #Mr Developer 216 | .mr.developer.cfg 217 | -------------------------------------------------------------------------------- /000 - change the world.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/000 - change the world.jpg -------------------------------------------------------------------------------- /000a - Data Science Stack.md: -------------------------------------------------------------------------------- 1 | Blog post on data science stack -------------------------------------------------------------------------------- /000c - Medicare Blog Post.md: -------------------------------------------------------------------------------- 1 | Use medicare data -------------------------------------------------------------------------------- /000d - SQL 
Regex.md: -------------------------------------------------------------------------------- 1 | Use regex 2 | 3 | 4 | https://connect.microsoft.com/SQLServer/feedback/details/261342/regex-functionality-in-pattern-matching -------------------------------------------------------------------------------- /000e - Pesticides.md: -------------------------------------------------------------------------------- 1 | http://stackoverflow.com/questions/19611729/getting-google-spreadsheet-csv-into-a-pandas-dataframe 2 | 3 | 4 | http://vitals.lifehacker.com/why-you-shouldnt-buy-organic-based-on-the-dirty-dozen-1689190822 5 | 6 | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3135239/ 7 | 8 | Not all pesticides are the same 9 | 10 | 11 | * http://www.extremetech.com/extreme/218689-what-are-endocrine-disruptors-and-how-should-you-protect-yourself-from-them 12 | * http://well.blogs.nytimes.com/2014/12/08/bpa-in-cans-and-plastic-bottles-linked-to-quick-rise-in-blood-pressure/ 13 | * http://arstechnica.com/tech-policy/2015/05/eu-dropped-plans-for-safer-pesticides-because-of-ttip-and-pressure-from-us/ 14 | * http://www.theatlantic.com/health/archive/2015/02/the-food-babe-enemy-of-chemicals/385301/ 15 | * http://theconversation.com/the-mercury-level-in-your-tuna-is-getting-higher-37147 16 | * http://world.openfoodfacts.org/ <- for side business 17 | * http://ajcn.nutrition.org/content/84/3/475.full.pdf 18 | * https://www.supertracker.usda.gov/default.aspx 19 | * http://www.minnpost.com/earth-journal/2014/11/arsenic-laden-rice-fda-deliberates-consumer-reports-issues-guidance 20 | * http://www.reuters.com/article/2015/03/20/us-monsanto-roundup-cancer-idUSKBN0MG2NY20150320 21 | 22 | 23 | 24 |
  • EWG's “Dirty Dozen” list of hormone-disrupting chemicals | @ewg @saferchem
  • 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /000f - Picking a College Using Data.md: -------------------------------------------------------------------------------- 1 | ## Picking a Good College Using Data 2 | 3 | * Look at majors vs. schools. 4 | * School 'prestige' 5 | * School overhead 6 | * Total price - tuition, debt. Look at ROI 7 | 8 | * Look at PayScale data 9 | * Look at US World data, methodology 10 | 11 | - Value of school to prep people for a career. Value of school to prep people to be good citizens. 12 | - Try finding the amount schools spend on administration, other things by scraping Mechanical Turk? 13 | * http://fivethirtyeight.com/features/more-high-school-grads-decide-college-isnt-worth-it/ 14 | * http://www.nakedcapitalism.com/2014/03/us-university-science-shopping-mall-model.html 15 | * http://seattletimes.com/html/businesstechnology/2023239544_apxwealthgapstudentloans.html 16 | * http://www.nytimes.com/2014/04/01/opinion/bruni-our-crazy-college-crossroads.html?src=me&ref=general 17 | * http://priceonomics.com/the-phd-deluge/ 18 | * https://www.discover.com/student-loans/majors/index.html 19 | * http://mobile.nytimes.com/2014/06/29/upshot/americans-think-we-have-the-worlds-best-colleges-we-dont.html 20 | 21 | 22 | http://static.googleusercontent.com/media/www.google.com/en/us/googleblogs/pdfs/google_public_data_march2010.pdf -------------------------------------------------------------------------------- /001 - Personal Security.md: -------------------------------------------------------------------------------- 1 | # Personal Security 2 | 3 | * Target, Home Depot, JP Morgan, you name it 4 | 5 | ### Core Lessons 6 | 7 | * The incentives for keeping data secure are all missing 8 | * Data's genius is also its Achilles heel: perfect copying 9 | * No way to prove that you are you that a hacker can't exploit 10 | * You will get hacked. The incentives are there. 
11 | 12 | 13 | 14 | ### Make Yourself a Hard Target 15 | 16 | * Don't re-use passwords 17 | * Pick complicated passwords. Use pass phrases. Random generator. Password vault of some kind. 18 | * Two-factor auth 19 | * Fake identity questions 20 | * Change passwords over time 21 | * Credit freeze 22 | * Computer, smartphone security 23 | * Don't use services that get hacked. 24 | * Companies that pay their IT professionals low amounts. 25 | * Companies that have been hacked before. 26 | * Any place that limits the length of your password 27 | * Also goes for significant others, spouses, kids 28 | * Offer this as a service to friends, in trade. 29 | 30 | #### When You Get Hacked, Find Out Quickly 31 | 32 | * Credit monitoring services 33 | * Alerts 34 | 35 | 36 | #### When You Get Hacked, Have It Not Be a Huge Deal 37 | 38 | * Separation of security 39 | * Password reset emails 40 | * Identify single points of failure 41 | * Accounts 42 | * Locations 43 | * Devices 44 | * Companies that know enough about you that they could use it as leverage 45 | * Google 46 | * Facebook 47 | * Amazon 48 | * Anything that has your email or web-browsing habits 49 | * Banks 50 | * Medical companies 51 | * (DRAW A GRAPH, TYPICAL AND SAFE) 52 | * 53 | 54 | 55 | ### Pay for Good Ideas 56 | 57 | * One-time, limited-time debit/credit cards 58 | * Two-factor auth 59 | * Strong encryption 60 | * Anything that gives companies incentives to protect your data (higher liability, reputational risk, etc) 61 | 62 | 63 | ### Know Your Limits 64 | 65 | * Don't try protecting yourself from huge states (Mossad or not-Mossad). 66 | * Add XKCD comic. 67 | * Add James Mickens references. 
68 | 69 | 70 | ### Not a Goal: Don't Be a Target 71 | 72 | * Keep a low profile (ummm) 73 | * Have nothing worth stealing (ummm) 74 | * Don't speak up about things you care about, that could make you a target (GamerGate, gun violence, inequality, racism, you name it) -------------------------------------------------------------------------------- /002 - California Water.md: -------------------------------------------------------------------------------- 1 | # California Water Supply 2 | 3 | Scatterplot showing: 4 | - Sales price for farmers 5 | - Amount of water used per pound, or per calorie. 6 | 7 | http://www.nytimes.com/interactive/2015/05/21/us/your-contribution-to-the-california-drought.html -------------------------------------------------------------------------------- /003 - spot pricing 3.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-11-15 4 | layout: post 5 | slug: spot-predictions 6 | title: Predicting AWS Spot Pricing 7 | meta-description: 8 | - aws 9 | - amazon web services 10 | - ec2 11 | - vm 12 | - cloud computing 13 | - race to zero 14 | - cost of computing 15 | - machine learning 16 | - prediction 17 | --- 18 | 19 | In the last two blog posts we covered the basics of AWS spot instances and looked at the landscape of cloud computing competitors. 20 | 21 | Now let's see if we can predict prices. Our goal is a better understanding of how spot prices behave, so we can optimize our computing costs over time. 22 | 23 | #### Prices By Time 24 | 25 | Prices differ by time of day 26 | 27 | * Per biz hour / weekday. 28 | * Per time of day. 29 | * What days and times of day matter? What patterns exist? 30 | * Which times have the most price 'bursts'? 31 | * Which instance types have the most price 'bursts'? 
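
One way to start on the "which times have the most price 'bursts'?" question is to count bursts per hour of day. The sketch below is only an illustration: the `SAMPLE` observations are made up (not real AWS data), the function names are hypothetical, and a "burst" is assumed to mean a price above the upper Tukey fence (Q3 + 1.5 × IQR), which is just one possible threshold.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles

# Hypothetical observations: (ISO timestamp, spot price in USD per hour).
# These values are invented for illustration only.
SAMPLE = [
    ("2014-11-10T02:00:00", 0.030),
    ("2014-11-10T09:00:00", 0.031),
    ("2014-11-10T14:00:00", 0.250),
    ("2014-11-10T20:00:00", 0.029),
    ("2014-11-11T02:00:00", 0.030),
    ("2014-11-11T09:00:00", 0.032),
    ("2014-11-11T14:00:00", 0.030),
    ("2014-11-11T20:00:00", 0.031),
]


def burst_threshold(prices):
    """Assumed burst cutoff: upper Tukey fence, Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(prices, n=4)
    return q3 + 1.5 * (q3 - q1)


def bursts_by_hour(observations):
    """Count observations above the burst cutoff, keyed by hour of day."""
    limit = burst_threshold([price for _, price in observations])
    counts = defaultdict(int)
    for stamp, price in observations:
        if price > limit:
            counts[datetime.fromisoformat(stamp).hour] += 1
    return dict(counts)


print(bursts_by_hour(SAMPLE))
```

The same grouping could be keyed on weekday or instance type instead of hour; the hour key is used here only because it is the simplest of the splits listed above.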
32 | 33 | http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html 34 | 35 | http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/how-spot-instances-work.html 36 | 37 | http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html 38 | 39 | ---- 40 | 41 | **Part 3: Predictions** 42 | 43 | * Show the results, not the algo 44 | * Which regions have the most price 'bursts'? 45 | * Algos to look at: logistic regression, others. Depends on pricing strategy 46 | * When do price spikes happen? Can they be predicted? 47 | * Do price spikes happen across AZs? Regions? Instance types? 48 | * 'Spike' defined as above some threshold, IQR, absolute value. 49 | * For a given price, X, how many hours will it last for each instance type? 50 | * Split per AZ, per region 51 | * Split by biz hour, weekday 52 | * how much does the price vary per instance? Inter-quartile range? 53 | * Are prices normally distributed? If not, why not? 54 | * What are the gaps in the analysis 55 | * Not clear how many instances exist of each type, especially for the specialized ones. 
56 | * Number of hours before the price goes to X 57 | * Trying to do either regression (hours until price goes above X) or classification (odds that price will go away) 58 | * Do an independent prediction for each instance type -------------------------------------------------------------------------------- /005 - spot pricing 2.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-11-20 4 | layout: post 5 | slug: cheap-computing-comparison 6 | title: Comparing Computing for Fun and Profit 7 | meta-description: 8 | - aws 9 | - amazon web services 10 | - ec2 11 | - spot instances 12 | - google compute engine 13 | - gce 14 | - microsoft azure 15 | - windows azure 16 | - vm 17 | - cloud computing 18 | - race to zero 19 | - cost of computing 20 | --- 21 | 22 | In my last blog post, I gave an introduction to Amazon Web Services' spot instances. There were some great deals to be found. 23 | 24 | Let's look at the competition. How cheaply can we find computing resources using Google's [Compute Engine](https://cloud.google.com/compute/) and Microsoft's [Azure](http://azure.microsoft.com/en-us/)? 25 | 26 | First, let's compare the different instance types by both price and performance. For now, I'm going to assume that RAM speed is the same everywhere. It's only RAM capacity that matters. 27 | 28 | CPU speed, on the other hand, varies dramatically. 29 | 30 | 31 | 32 | **Azure** 33 | 34 | * http://azure.microsoft.com/en-us/pricing/details/virtual-machines/#Linux 35 | * Does it charge for local I/O? Does that even exist? 36 | 37 | 38 | **GCE** 39 | 40 | * Charges for local SSD I/O! 
41 | * https://cloud.google.com/compute/docs/machine-types#standard 42 | * https://cloud.google.com/compute/docs/disks 43 | * https://cloud.google.com/compute/docs/local-ssd#pricing_and_quota 44 | 45 | 46 | 47 | 48 | https://aws.amazon.com/blogs/aws/focusing-on-spot-instances-lets-talk-about-best-practices/ 49 | 50 | 51 | 52 | **Resources** 53 | 54 | * http://www.citeworld.com/article/2113976/cloud-computing/ultimate-cloud-speed-tests-amazon-vs-google-vs-windows-azure.html 55 | * http://blog.cloudharmony.com/2013/06/value-of-the-cloud-cpu-performance.html <- AMAZING 56 | * http://www.pythian.com/blog/comparing-cpu-throughput-of-azure-and-aws-ec2/ 57 | * http://www.computerworld.com.au/article/539633/amazon_vs_google_vs_windows_azure_cloud_computing_speed_showdown/ 58 | * http://sqlperformance.com/2014/05/io-subsystem/comparing-azure-vm-performance 59 | * https://cloudvertical.com/cloud-costs#cloud_costs/index 60 | * http://redmonk.com/sogrady/2014/11/18/iaas-pricing-patterns-1114/ <- VERY USEFUL 61 | 62 | 63 | 64 | #### Google Compute Engine 65 | 66 | * Figure out CPU per scaling factor for each 67 | * How much of a discount is this compared to GCE or Azure, since they don't have this feature 68 | 69 | #### Microsoft Azure 70 | 71 | * Figure out CPU per scaling factor for each 72 | * How much of a discount is this compared to GCE or Azure, since they don't have this feature 73 | 74 | 75 | 76 | ### When In Doubt, Competition 77 | 78 | If I were a large company, I would use *several* cloud computing solutions. My reasoning is simple: it's cheaper that way. 79 | 80 | 'Public cloud' infrastructure is incredibly expensive to build and engineer. The leaders in the field have some of the smartest engineers on the planet. The barriers to entry are *extremely* high. 81 | 82 | When I'm a customer of companies that have natural barriers to competition, I want there to be lots of choices. 
As long as many different cloud-computing companies exist, there will be [competition on price, regardless of what people say](http://recode.net/2014/11/12/amazon-cloud-chief-andy-jassy-dismisses-talk-of-price-war/). Competition leads to lower prices than monopolies; that's Economics 101 (LINKME). 83 | -------------------------------------------------------------------------------- /005 - spot pricing 4.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-11-16 4 | layout: post 5 | slug: spot-strategy 6 | title: AWS Spot Strategy 7 | meta-description: 8 | - aws 9 | - amazon web services 10 | - ec2 11 | - vm 12 | - cloud computing 13 | - race to zero 14 | - cost of computing 15 | - bidding strategies 16 | - cloud arbitrage 17 | --- 18 | 19 | Over the last few days I've looked at AWS spot instances, competitors, and predicting their performance. Now, let's look at how to use this information. 20 | 21 | 22 | Bidding strategies: 23 | * Maximum bid, keep it running 24 | * Persistent bid at a certain price, trade-off for cost vs. runtime 25 | * Auto-analyzing bid, move to different locations and exploit deals over time. 26 | 27 | 28 | Caveat. Sometimes there aren't very many of certain instance classes, so you can *bid against yourself*. Unfortunately there often isn't enough information about customer demand vs. supply to figure out what's going on (if you're being outbid by customers, or if AWS is reclaiming instances because it needs the supply for on-demand or other instance types). 29 | 30 | 31 | ---- 32 | **Part 4: Strategy and Uses** 33 | 34 | * Public good. Science. 
35 | * Offer to share 36 | * For a couple of prototypical workloads (use Netflix for an example), walk through the cost differential 37 | * YouTube video on strategies 38 | * http://santtu.iki.fi/2014/03/25/ec2-spot-price-minimum/ 39 | * http://santtu.iki.fi/2014/03/20/ec2-spot-market/ 40 | * http://santtu.iki.fi/2014/03/19/ec2-spot-usage/ 41 | * http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html 42 | 43 | * http://blog.yhathq.com/posts/how-yhat-does-cloud-balancing.html 44 | 45 | 46 | **Post-Publish Notes** 47 | 48 | * Notify interested groups 49 | * Wendy Pastrick (?) at Seattle Cancer Care Alliance 50 | * Chris Bare at Sage Bionetworks 51 | * Susan ____ from SUUC at Harborview 52 | * UW-IT cloud computing folks. IAM. 53 | * eScience Institute 54 | * UW CSE department. My grad-student contact there. 55 | * UW Physics department 56 | * https://www.youtube.com/watch?v=mKElyNabc0A&feature=youtu.be 57 | * Kevin Jorissen, Research Assoc there. 58 | * Fernando Villa, Research scientist there 59 | * They focus on cluster instances -------------------------------------------------------------------------------- /008 - Car Analysis Three.md: -------------------------------------------------------------------------------- 1 | # Car Analysis - Hunting for Deals 2 | 3 | This past May, I gave a presentation at the PASS Business Analytics conference on using data to make decisions. During that presentation I 4 | 5 | I've covered this topic before. I've purchased my own car using data (LINKS), compiled a list of car-buying best practices (LINKS), and even purchased a cheap car in 72 hours for my sister (LINK). 6 | 7 | This time, I wanted answers to questions. 8 | 9 | * What patterns can I find in safe vs. unsafe cars? 10 | * What patterns can I find in cheap vs. expensive cars? 11 | * What are the best deals for a car nowadays? 12 | 13 | 14 | ### Context is King 15 | 16 | All behavior and patterns are influenced by the fundamental rules of their context. 
When buying a car, there are some obvious truths: 17 | 18 | 19 | * The goal of a car is to reduce the time/effort it takes to get from one place to another. It therefore competes with bicycles, walking, buses, subways, planes, trains, and ZipCar, Car2Go, and Lyft* for convenience and value. (LINKS) 20 | * New cars are more expensive than used ones 21 | * Not all cars are created equal. They differ in features, quality, safety, and especially reliability. 22 | * However, cars of the same make and model will behave about the same, unless one of them has been damaged in some way. 23 | * Popular cars are more expensive than unpopular cars 24 | 25 | 26 | There are also some common truths. These are behaviors that happen *most* of the time, but not always: 27 | 28 | * Cars: the price of a car drops by 15-25% each year for the first 5 years. 29 | * The biggest expense is the car itself; it's not gas, or insurance, or repairs, it's the cost of purchasing the vehicle. 30 | * All of them wear down. They're machines. They have a finite lifespan; it's rare to hear of a car that lasts more than 300K miles or so, although cars with 200K miles on them are becoming fairly common. 31 | * It's cheaper to add fancy features (backup cameras, fancy speakers) after buying the car than when buying it. 32 | * People normally drive around 12K miles a year. A 200K-mile car that's driven 12K miles a year will last around 16.6 years, using up roughly 6% of its lifespan each year. 33 | * Most car purchases are made within 50 miles of where the owner lives. 34 | * Car models undergo 'revisions'. A 2009 Toyota Prius and 2011 Toyota Prius don't look alike, because there were a bunch of changes made. Therefore, different model years for the same car will have different behaviors and safety. 35 | 36 | 37 | I'll add one more truth, a psychological one: 38 | 39 | **A car doesn't have to be an expression of your personality**. It can be just a box with an engine that gets you from one place to another. 
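
The depreciation and lifespan rules of thumb listed earlier in this post are simple arithmetic, and a quick sketch makes them easier to play with. This is illustrative only: the function names and the $20,000 sticker price are assumptions, and the depreciation rate is assumed to compound yearly at a constant rate, which real cars will not follow exactly.

```python
def value_after(price, annual_drop, years):
    """Price left after `years` of compounding depreciation.

    `annual_drop` is a fraction, e.g. 0.20 for a 20% drop per year
    (assumed constant, which is a simplification).
    """
    return price * (1 - annual_drop) ** years


def lifespan_years(total_miles=200_000, miles_per_year=12_000):
    """Years a car lasts if it dies at `total_miles` (defaults from the post)."""
    return total_miles / miles_per_year


# Hypothetical $20,000 car at the low, middle, and high end of the 15-25% range.
for drop in (0.15, 0.20, 0.25):
    print(f"{drop:.0%} per year -> ${value_after(20_000, drop, 5):,.0f} after 5 years")

print(f"Lifespan: {lifespan_years():.1f} years, "
      f"{1 / lifespan_years():.1%} of its life used per year")
```

Under these assumptions, a car loses roughly half to three quarters of its price in the first five years while using up only about 30% of its expected miles, which is the gap a used-car buyer is trying to exploit.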
 40 | 41 | http://wolfstreet.com/2017/03/26/automakers-record-incentives-to-slow-decline-in-sales/ 42 | 43 | * http://tradeinqualityindex.com <- HOLY CRAP, THIS IS AMAZING 44 | * http://www.mrmoneymustache.com/2011/09/30/is-a-costco-membership-worth-the-cost/ 45 | * http://arstechnica.com/cars/2015/05/meta-analysis-finds-self-braking-cars-reduce-collisions-by-38-percent/ 46 | * http://consumerist.com/2015/05/20/gm-that-car-you-bought-were-really-the-ones-who-own-it/ 47 | * http://money.usnews.com/money/personal-finance/articles/2015/06/09/startups-offer-new-ways-to-buy-and-sell-used-cars 48 | * http://www.nytimes.com/2015/06/24/business/senate-commerce-hearing-takata-airbag-nhtsa-general-motors.html 49 | * https://medium.com/@ade3/the-zombie-mobile-b03932ac971d 50 | * http://wolfstreet.com/2016/11/22/strongest-pillar-of-the-shaky-us-economy-has-cracked/ 51 | * https://www.nytimes.com/2017/01/27/your-money/used-cars-takata-recalls.html 52 | * https://www.yourmechanic.com/article/the-most-and-least-expensive-cars-to-maintain-by-maddy-martin 53 | * https://www.nytimes.com/2017/04/20/automobiles/wheels/new-cars-technology.html <- FOR CAR BLOG POST 54 | 55 | * https://publish.manheim.com/en/services/consulting/used-vehicle-value-index.html <- also VERY useful for used cars 56 | 57 | 58 | ## Safe and Unsafe Cars 59 | 60 | * Do analysis 61 | 62 | 63 | Right now you can buy over a thousand different car models. Some are brand-new, some are a bit older, but you can find all of them. 64 | 65 | That's a lot, so we have to narrow down the field. Luckily, most of us can do this pretty easily. 66 | 67 | (DEMO) Car makes and models in Excel 68 | 69 | * Ensembling 70 | * Conditional formatting in Excel 71 | * Eliminating bad options vs. picking good ones 72 | 73 | **Ensembling**, a.k.a. model averaging or bagging. 74 | 75 | ***Big problem***: How do we choose what make and model of car to buy? 76 | * We know some reliable brands. Honda and Toyota are famous. 
 77 | * We ask our friends, family, neighbors, coworkers. 78 | * We rely on what has worked before. 79 | 80 | For anyone who followed the US presidential elections in 2008 and 2012, this is what Nate Silver did to predict the outcome in all 50 states. 81 | 82 | * This problem comes up all the time, in politics, medicine, finance, even cooking. Nobody has all of the information and no bias, but *collectively* there's enough information and the bias can average out. 83 | * Simple way: find the ratings from major car sites and average them. This is more reliable than any single site alone. 84 | 85 | I did this for small cars & sedans last year. 86 | 87 | * When there is no single reliable source of data, use the aggregate of different sources. 88 | * Combining the ratings of 10 different car-review sites is more accurate than the ratings of any single site. 89 | 90 | 91 | ## Cheap and Expensive Cars 92 | 93 | * Do analysis. Cost per mile. Cost of the car. TCO. 94 | * What is the price difference if you want to carry more than 5 people? 95 | * What is the price difference if you want to haul things? 96 | * What is the price difference if you want to buy a 'luxury' car brand? 97 | * What is the price difference between new and used cars? 98 | * Can you find used luxury cars at the same price as new econoboxes? 99 | 100 | ## Car Deals 101 | 102 | Our question asks "what's a good deal". A 'deal' is one where there's high value for low cost. So we need to define cost, and value. 103 | 104 | Value, though, is harder to define. It's the value proposition of a car; it's transportation that saves time/energy compared to walking. 105 | 106 | One simple way to measure it is the number of miles the car can take us before it dies. 107 | 108 | The ratio of cost:value is therefore the car's total cost of ownership vs. the # of expected miles. 109 | 110 | We can simplify this to $ per expected mile. 
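Both ideas, averaging ratings across sites and ranking by $ per expected mile, fit in a few lines. A minimal sketch, assuming a 200K-mile lifespan for every car; the listings, ratings, and function names are invented for illustration and are not from the spreadsheets in this repo.

```python
from statistics import mean


def ensemble_rating(ratings):
    """Average one car's ratings across several review sites."""
    return mean(ratings)


def dollars_per_expected_mile(total_cost, odometer, lifespan_miles=200_000):
    """Total cost of ownership divided by the miles the car has left."""
    remaining = lifespan_miles - odometer
    if remaining <= 0:
        raise ValueError("car is already past its expected lifespan")
    return total_cost / remaining


# Made-up listings: TCO in dollars, odometer in miles, ratings out of 10.
listings = [
    {"car": "Car A", "tco": 18_000, "odometer": 20_000, "ratings": [8.5, 9.0, 8.0]},
    {"car": "Car B", "tco": 8_100, "odometer": 110_000, "ratings": [7.0, 7.5, 8.0]},
    {"car": "Car C", "tco": 6_000, "odometer": 170_000, "ratings": [9.0, 8.5, 9.5]},
]

for row in listings:
    row["rating"] = ensemble_rating(row["ratings"])
    row["cost_per_mile"] = dollars_per_expected_mile(row["tco"], row["odometer"])

# Cheapest transportation first.
ranked = sorted(listings, key=lambda row: row["cost_per_mile"])
for row in ranked:
    print(f'{row["car"]}: ${row["cost_per_mile"]:.2f}/mile, rating {row["rating"]:.1f}')
```

Note that the cheapest sticker price (Car C here) is the worst deal per mile, because so few miles are left on it.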
111 | 112 | 113 | 114 | (SWITCH TO TABLEAU, COST PER MILE, TCO PER MILE) 115 | 116 | 117 | * Eliminating bad options vs. picking good ones 118 | * Diminishing returns 119 | DEMO - ROI, diminishing returns, cost:value 120 | 121 | 122 | *Note: I did not mention Uber because of their recent behavior towards journalists. I don't believe companies that abuse their power should be given anything but scorn.* 123 | 124 | 125 | 126 | * http://www.nytimes.com/2015/03/27/automobiles/their-ranks-thinned-the-surviving-car-dealerships-thrive.html 127 | * http://www.nytimes.com/2015/03/31/business/dealbook/prosecutors-scrutinize-minorities-auto-loans.html 128 | * http://www.nytimes.com/2015/04/02/business/us-auto-sales-march.html 129 | * http://www.nytimes.com/2015/04/23/technology/personaltech/an-online-tune-up-for-the-used-car-marketplace.html 130 | * http://www.japantimes.co.jp/news/2014/04/07/business/gods-edging-out-robots-at-toyota-facility/ 131 | * http://www.safetyresearch.net/blog/articles/toyota-unintended-acceleration-and-big-bowl-“spaghetti”-code 132 | * http://blog.instamotor.com/why-dealership-used-cars-cost-more/ 133 | 134 | -------------------------------------------------------------------------------- /008 - Car Prices.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/008 - Car Prices.xlsx -------------------------------------------------------------------------------- /008 - Car Quality Stats.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/008 - Car Quality Stats.xlsx -------------------------------------------------------------------------------- /008 - Car-Models.xlsx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/008 - Car-Models.xlsx -------------------------------------------------------------------------------- /008 - Cars-For-Sale.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/008 - Cars-For-Sale.xlsx -------------------------------------------------------------------------------- /010 - CameraAwesomePhoto (1).jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/010 - CameraAwesomePhoto (1).jpg -------------------------------------------------------------------------------- /010 - CameraAwesomePhoto (2).jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/010 - CameraAwesomePhoto (2).jpg -------------------------------------------------------------------------------- /010 - equality vs equity.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/010 - equality vs equity.jpg -------------------------------------------------------------------------------- /010 - food lifecycle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/010 - food lifecycle.jpg -------------------------------------------------------------------------------- /010 - poor deserve the best.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/010 - poor deserve the best.png -------------------------------------------------------------------------------- /010 - spider man.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/010 - spider man.jpg -------------------------------------------------------------------------------- /015 - Tuition by Raw Numbers.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/015 - Tuition by Raw Numbers.xlsx -------------------------------------------------------------------------------- /015 - UW Salary Info.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/015 - UW Salary Info.xlsx -------------------------------------------------------------------------------- /015 - UW Student Tuition Plan.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/015 - UW Student Tuition Plan.pdf -------------------------------------------------------------------------------- /015 - University Salary Analysis.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/015 - University Salary Analysis.md -------------------------------------------------------------------------------- /016 - College from First Principles.md: -------------------------------------------------------------------------------- 1 | # College from First Principles 2 | 3 | 4 | 5 | ### How much 
should college cost? 6 | 7 | NOTE: this is a later blog post 8 | 9 | Let's look at colleges and universities from a student's perspective. What matters most of all is: 10 | 11 | * A degree they can show future employers 12 | * Good professors and teaching aides 13 | * A safe place to learn 14 | 15 | *Everything* else is optional. 16 | 17 | I'm going to shamelessly copy Elon Musk's idea of [analyzing cost from first principles](LINKME). 18 | 19 | **Cost of a large university degree** 20 | 21 | University professors 22 | TAs / grad students 23 | Cost to rent a building in a suburb. 24 | Cost to rent a studio in a suburb 25 | 26 | **Cost of a small liberal-arts degree** 27 | 28 | University professors 29 | TAs / grad students 30 | Cost to rent a building in a sleepy 'college' town. 31 | Cost to rent a house and share it in a sleepy 'college' town. 32 | Better student:teacher ratio 33 | 34 | 35 | **Cost of a 2-year, 'intensive' degree** 36 | 37 | * Rise of coding schools 38 | * Smaller staff. 39 | * Cost to rent 40 | 41 | 42 | #### Externalities 43 | 44 | This is an unfair comparison. It discards a lot of what colleges pride themselves on: academic research, fancy dorms, 45 | 46 | 47 | Cost of higher ed from first principles: 48 | 49 | 50 | 51 | *Full disclosure: I am a staff member at the University of Washington. I acknowledge that this may cause some bias; I have tried to stick to the facts in an attempt to counter this.* 52 | 53 | 54 | #### Gender Balance 55 | 56 | Use gender-prediction API (name?) to figure out gender per name. Use that to find gender balance per title, per category, and overall. Also look at salary imbalance. 57 | 58 | 59 | http://data.spokesman.com/salaries/state/2014/306-university-of-washington/ 60 | 61 | http://data.spokesman.com/salaries/state/faq/ 62 | 63 | http://fiscal.wa.gov/Salaries.aspx <- salaries. Not total compensation. 
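A rough sketch of the gender-balance analysis described above. The API is not chosen yet, so a tiny hard-coded lookup stands in for it; `NAME_LOOKUP`, the function names, and every row of data are hypothetical. Name-based guesses are imprecise, so anything the lookup can't place goes into an "unknown" bucket instead of being forced into a category.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical stand-in for a gender-prediction API: first name -> label.
NAME_LOOKUP = {"alice": "F", "maria": "F", "bob": "M", "james": "M"}


def guess_gender(full_name):
    """Look up the first name; anything unrecognized is 'unknown'."""
    first = full_name.split()[0].lower()
    return NAME_LOOKUP.get(first, "unknown")


def balance_by_title(rows):
    """Count guessed genders within each job title."""
    counts = defaultdict(Counter)
    for name, title, _salary in rows:
        counts[title][guess_gender(name)] += 1
    return {title: dict(counter) for title, counter in counts.items()}


def median_salary_by_gender(rows):
    """Median salary for each guessed gender, across all titles."""
    salaries = defaultdict(list)
    for name, _title, salary in rows:
        salaries[guess_gender(name)].append(salary)
    return {gender: median(values) for gender, values in salaries.items()}


# Made-up rows in (name, title, salary) form.
rows = [
    ("Alice Example", "Professor", 120_000),
    ("Bob Example", "Professor", 130_000),
    ("James Example", "Professor", 125_000),
    ("Maria Example", "Lecturer", 70_000),
    ("Pat Example", "Lecturer", 72_000),
]

print(balance_by_title(rows))
print(median_salary_by_gender(rows))
```

The salary sources linked above list salaries rather than total compensation, so any imbalance found this way covers base pay only.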
64 | -------------------------------------------------------------------------------- /020 - Industries.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/020 - Industries.pdf -------------------------------------------------------------------------------- /020 - ally bingo.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/020 - ally bingo.jpg -------------------------------------------------------------------------------- /025 - Why Ethics.md: -------------------------------------------------------------------------------- 1 | # Title 2 | 3 | * Ethics for Data Professionals 4 | * Ethics for Professionals 5 | * Professional Ethics: (SECTION) 6 | 7 | 8 | ## Sections 9 | 10 | 1. What is Ethical? 11 | 2. Why Ethics? 12 | * Is my job ethical? 13 | * Why Should I Care? 14 | 3. Looking for Ethics in All The Right Places 15 | * What industries are ethical? 16 | 4. Disrupting unethical industries. 17 | 18 | ## Ethics for Professionals: What and Why 19 | 20 | * "Why should I care?"* 21 | * "I like to work on X technology"* 22 | 23 | 24 | "If a man isn't proud of what he does, then he isn't proud of his living" - garbageworker from the 1960's Memphis Civil Rights movement and protests. 25 | 26 | Technical professionals have a lot of power. We only rarely consider the effects of our power and how our work is being used. 27 | 28 | (IMAGE: with great power comes great...you know) 29 | 30 | 31 | We all like to believe we are working to make our society, our world, a better place. At the very least we want to believe we aren't making things worse. 
 32 | 33 | 34 | #### You Didn't Build Yourself 35 | 36 | If you are reading this, then you're lucky enough to afford an Internet connection, which means you are most likely not starving, have a well-built home with heat, running water, and electricity. It also means you're probably a highly educated IT professional or software engineer making at least $40K a year, if not much more. That puts you in the ___th percentile of the population. 37 | 38 | You also didn't build yourself. Chances are you had family that sacrificed to raise you, a society that paid to educate you, civic services that collectively worked to keep your community safe, warm and intact. As children we consume resources; we don't contribute back to society until we are older. 39 | 40 | That's fine. But saying someone is 'self-made' should mean they taught themselves without a teacher, without parents or family as guides, without police or firefighters to keep them safe. It didn't happen to you. 41 | 42 | Imagine if you were born to a family at the midpoint of our world's population. You wouldn't have the opportunity for an education. Your parents would make ___. 43 | 44 | We are the lucky ones. We won the lottery considering where we were born, where we grew up, and the part of society we were born into. 45 | 46 | "If not me, then who? If not now, then when?" (ATTRIBUTE QUOTE) 47 | 48 | There are several reasons you should care about ethics: 49 | 50 | 1. You're extremely lucky; you've hit the circumstantial jackpot! 51 | 2. The planet is making those chances harder. 52 | 3. At some level, you like to think of yourself as a good person. 53 | 4. If data professionals don't start behaving ethically *en masse*, the data revolution will be worse for the average person than the Industrial Revolution (REFINE). 54 | 5. At some level you want to believe that your compatriots and colleagues are working on the side of the angels. 
 55 | 56 | 57 | * http://www.bbc.com/future/story/20150130-the-man-who-studies-evil 58 | * http://www.ecouterre.com/reality-show-sends-fashion-bloggers-to-work-in-cambodian-sweatshop/ 59 | * http://time.com/3694368/make-internet-better-place/ 60 | * http://www.theguardian.com/commentisfree/oliver-burkeman-column/2015/feb/03/believing-that-life-is-fair-might-make-you-a-terrible-person?CMP=share_btn_fb 61 | * http://www.queerty.com/bayard-rustin-the-gay-dreamer-behind-dr-kings-i-have-a-dream-speech-20130828 62 | 63 | #### Is your job ethical? How do you know? 64 | 65 | That's great. Prove it. 66 | 67 | Engineers, researchers, and IT pros are taught to use data to justify our technical instincts. Writing tests, analyzing server logs, and collecting data for scientific experiments are all examples of our understanding that *we don't know as much as the data can tell us*. 68 | 69 | Let's apply that same principle to a different topic: ethics and work. 70 | 71 | 72 | 73 | 74 | We live in a place with massive inequality. 75 | 76 | * Start with a premise...we are created equal. 77 | * But our results are nowhere near equal, and it's not because of fate. It's partly, if not mostly, due to circumstances. 78 | * We live on a planet that's rapidly losing the ability to be habitable to us. 79 | * That's mostly for two reasons: we are consuming more per capita, and we are an overpopulated species. 80 | * Preventing unwanted pregnancies is a highly ethical thing to do. 81 | * Environment as a closed system. Optimizing under unknown constraints. 82 | * We live on a planet where exploiting other people is highly rewarded. 83 | * Most of us don't think this way. We think in terms of tools and problems. We remove the human results of our actions from the equation, because we're not comfortable with people. 84 | * Nobody likes to think of themselves as a bad person. So we find ways to justify our actions or explain them away. 
85 | * You feel better knowing you're making the world a better place. It puts a spring in your step. You're glad when someone asks "Where do you work and what do you do?" -------------------------------------------------------------------------------- /025 - ethics.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/025 - ethics.jpg -------------------------------------------------------------------------------- /040 - ML companies.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/040 - ML companies.png -------------------------------------------------------------------------------- /040 - NIST.SP.800-179.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/040 - NIST.SP.800-179.pdf -------------------------------------------------------------------------------- /040 - data ethics slide.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/040 - data ethics slide.jpg -------------------------------------------------------------------------------- /040 - ead_v1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/040 - ead_v1.pdf -------------------------------------------------------------------------------- /040 - ethical tech.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/040 - ethical tech.jpg 
-------------------------------------------------------------------------------- /040 - the scored society.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/040 - the scored society.pdf -------------------------------------------------------------------------------- /050 - Privacy.markdown: -------------------------------------------------------------------------------- 1 | Attention Please: This is not a rant, but it is long. I'm going to keep this as tinfoil-hat free as possible. 2 | 3 | People thought we were kind of silly when we went on an anti-Google tirade a little while ago, and I admit I have relapsed into most of those services. With the recent news about PRISM, I'm now scrubbing my life of non-essential services (Facebook, Google, Microsoft, Skype, etc.), and moving anything "cloud" overseas to MEGA and the like. Expect to see less and less of us on Facebook, and for what we post here to become very generic and non-controversial. 4 | 5 | Albert Einstein said, "Never do anything against conscience, even if the state demands it." The seven companies now known to be complicit in the US government's unending quest to invade more and more of our lives either have no conscience, or disagree with this philosophy. Either way, they now stand counter to my own convictions. There are a number of alternative services out there that are either foreign-based and not beholden to US law, or are small enough that they are not on the radar. Use them. Vote with your choices in software, services, purchases, habits. If you can, use an SMS application that encrypts your texts (TextSecure on Android) and your voice communications (RedPhone on Android) and your IM communications (GibberBot on Android). Use a mail client that supports PGP. Use a firewall in your home. 
Use open-source, community-developed alternatives like FireFox and Pidgin instead of Chrome and Google Talk. Use OTR plugins to keep your communications secure. Use Tor. 6 | 7 | These little, "inconvenient" things can add up, and I promise you that once you adopt them and get used to them, you'll forget what you thought was so inconvenient. That is the blessing and the curse of the human mind: we adjust quickly to change when forced. So the same psychology that the NSA and FBI and DHS have used to cow us into accepting ever-increasing encroachment into our personal lives can in fact be used as a weapon against that encroachment, through the adoption of more secure practices. 8 | 9 | The time for laughing it off as a nutter conspiracy theory is over. What happened in Iran can happen here. What happened in East Germany can happen here. What happened in the former USSR can happen here. What happened in Chile can happen here. These were free societies, of different economic policy, that were overtaken by autocracy. The only way to prevent it is to fight it. The only way to fight it is to starve it. 10 | 11 | Each of us has a choice, and our choices matter. 12 | 13 | 14 | Google is like 2000s-era Microsoft. So pervasive it's impossible to get away. 
http://www.businessinsider.com/r-exclusive-google-aiming-to-go-straight-into-car-with-next-android---sources-2014-12 15 | 16 | 17 | * http://thomaslarock.com/2014/03/safe-data-theft/ 18 | * http://www.fastcoexist.com/3027665/the-nsa-can-learn-all-your-secrets-from-your-phone-metadata 19 | * http://billmoyers.com/2014/03/13/tips-for-protecting-your-privacy-online/ 20 | * http://us.macmillan.com/dragnetnation/JuliaAngwin/ 21 | * http://technet.microsoft.com/library/cc722487.aspx 22 | * http://www.extremetech.com/internet/180485-the-ultimate-guide-to-staying-anonymous-and-protecting-your-privacy-online 23 | * http://blogs.computerworld.com/security/23805/michaels-finally-confirms-massive-pos-hack-aaron-bros-well 24 | * http://www.theguardian.com/world/2014/jun/07/stephen-fry-denounces-uk-government-edward-snowden-nsa-revelations 25 | * http://www.pewinternet.org/2014/12/18/other-resounding-themes/ <- "Privacy will become a luxury good" -------------------------------------------------------------------------------- /050 - The Database of Ruin.md: -------------------------------------------------------------------------------- 1 | ## The Database of Ruin 2 | 3 | Privacy matters. 4 | 5 | The more valuable we are, the more likely data is to steal from us. 6 | 7 | 8 | * Black swans happen 9 | * How to measure likely impact. 10 | * Humans underestimate tail risk. 11 | * The riskier the behavior, the more you should default-to-safe. 12 | * Humans are risk averse. 13 | * Prototypes are great for this. 14 | * Change + Risk = Constant per culture (company, org, relationship). 15 | * Compatibility is huge. 
16 | 17 | 18 | * http://boingboing.net/2014/03/03/full-nhs-hospital-records-uplo.html 19 | * http://www.technologyreview.com/photoessay/533426/the-troll-hunters/ 20 | * http://betaboston.com/news/2014/03/05/a-vast-hidden-surveillance-network-runs-across-america-powered-by-the-repo-industry/ 21 | * http://www.extremetech.com/computing/177945-how-big-business-builds-license-plate-databases-that-track-your-every-move 22 | * http://radar.oreilly.com/2014/03/the-creep-factor-how-to-think-about-big-data-and-privacy.html 23 | * http://flowingdata.com/2014/12/15/when-data-gets-creepy/ 24 | * http://krebsonsecurity.com/2014/03/experian-lapse-allowed-id-theft-service-to-access-200-million-consumer-records/ 25 | * http://gigaom.com/2014/03/13/with-data-brokers-selling-lists-of-alcoholics-to-big-business-the-feds-have-some-thinking-to-do/ 26 | * http://mobile.nytimes.com/blogs/bits/2014/12/23/data-broker-is-charged-with-selling-consumers-financial-details-to-fraudsters/ -------------------------------------------------------------------------------- /050 - online tracking.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/050 - online tracking.jpg -------------------------------------------------------------------------------- /060 - Precautionary Principle.md: -------------------------------------------------------------------------------- 1 | ## The Precautionary Principle 2 | 3 | * Black swans happen 4 | * How to measure likely impact. 5 | * Humans underestimate tail risk. 6 | * The riskier the behavior, the more you should default-to-safe. 7 | * Humans are risk averse. 8 | * Prototypes are great for this. 9 | * Change + Risk = Constant per culture (company, org, relationship). 10 | * Compatibility is huge. 11 | 12 | 13 | Risk vs. reward. Risk aversion. 
14 | 15 | * http://www.bloomberg.com/news/2014-04-11/nsa-said-to-have-used-heartbleed-bug-exposing-consumers.html 16 | * http://www.wired.com/2014/04/hospital-equipment-vulnerable/ 17 | * http://www.pqed.org/2014/06/how-should-people-respond-to-open-carry.html 18 | * http://www.economist.com/news/technology-quarterly/21615064-following-example-maker-communities-worldwide-hobbyists-keen-biology-have 19 | * http://arstechnica.com/science/2015/04/apollo-13-the-mistakes-the-explosion-and-six-hours-of-live-saving-decisions/ 20 | * http://pando.com/2014/10/18/gms-hit-and-run-how-a-lawyer-mechanic-and-engineer-blew-the-lid-off-the-worst-auto-scandal-in-history/ 21 | * http://www.wired.com/2012/10/ff-why-products-fail/all/ 22 | * http://www.theatlantic.com/features/archive/2014/03/the-toxins-that-threaten-our-brains/284466/ 23 | * http://www.nytimes.com/2015/04/10/opinion/why-pilots-still-matter.html?_r=0 -------------------------------------------------------------------------------- /070 - climate change viz.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/070 - climate change viz.jpg -------------------------------------------------------------------------------- /070 - space ceded to cars.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/070 - space ceded to cars.jpg -------------------------------------------------------------------------------- /080 - Music and Engineering.md: -------------------------------------------------------------------------------- 1 | # Music and Engineering 2 | 3 | * Both require creativity 4 | * Both have a 'frame' of constraints 5 | 6 |
  • Musicians and Programmers - An Animusic Review - Music, Computers, Math & P
  • 7 |
  • http://www.digitalmusicnews.com/permalink/2014/06/23/fk-heres-entire-youtube-contract-indies
  • 8 |
  • Not Just for Music: Drumming Is Therapy, Too - The Daily Beast
  • 9 |
  • What keeps Henry Rollins productive? The musician/author/speaker/actor shar
  • 10 |
  • Why do we listen to our favourite music over and over again? Because repeat
  • 11 |
  • How Musicians Really Make Money in One Long Graph - Derek Thompson - The At
  • 12 | 13 | 14 | -------------------------------------------------------------------------------- /100 - Google Wagging the Dog.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-02-14 4 | layout: post 5 | slug: google-wag 6 | title: Google is wagging the whole Internet 7 | meta-description: In this blog post, Dev Nambi writes about the massive impact Google is having on all of software engineering. 8 | tags: 9 | - data science 10 | - signal vs. noise 11 | - fud 12 | - marketing 13 | - big data 14 | - machine learning 15 | - learning 16 | - distributed computing 17 | --- 18 | 19 | One of the more famous engineering diagrams is the OSI network model. It describes the different layers of a network, and what each layer is responsible for. It's a *beautiful* example of how separation of concerns and abstraction can be used to build a large system. 20 | 21 | It's the model that defines the engineering around the entire Internet. 22 | 23 | Data engineering doesn't have an equivalent model. This is my attempt to create one. 24 | 25 | 26 | * MapReduce 27 | * HDFS 28 | * BigQuery (HBase) 29 | * Dremel and Drill, Parquet. Columnar big data. 30 | * F1 31 | * Wagging the dog 32 | * omega and mesos 33 | * no-case servers and open compute 34 | * Entire industries around SEO 35 | Ajax in maps. 36 | Large storage in email inboxes. 37 | 38 | * http://the-paper-trail.org/blog/the-elephant-was-a-trojan-horse-on-the-death-of-map-reduce-at-google/ 39 | * http://www.kdnuggets.com/2014/08/sibyl-google-system-large-scale-machine-learning.html <- Sibyl 40 | * http://www.slate.com/blogs/business_insider/2014/10/23/behind_the_scenes_look_at_google_data_centers.html 41 | 42 | ### The Elephant Has Left the Building 43 | 44 | Hadoop is almost a decade old. It's established. It's also showing its age. The original MapReduce paper came out in 2004 (LINK). 
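For readers who haven't met the programming model, here is a single-process sketch of the three MapReduce phases (map, shuffle, reduce) doing a word count. It only illustrates the shape of the model; it is not Hadoop's API, and all function names are my own.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run all three phases in order, in one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the elephant has left", "the building"])
print(counts)
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data across the network, and every job runs as a full batch from start to finish.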
 45 | 46 | #### HDFS 47 | 48 | The HDFS file system is still immensely popular, even with companies that are working to 'replace Hadoop'. I have 2 guesses why: 49 | 50 | 1. It does a great job of maintaining file integrity using inexpensive disks without sacrificing performance. 51 | 2. Filesystems are hard to create. 52 | 53 | #### MapReduce 54 | 55 | MapReduce, on the other hand, hasn't aged as well. It works well for some problems, but it turns out to be very limiting for a lot of others. In particular, its batch-oriented processing paradigm makes it useless for low-latency (interactive) queries. 56 | 57 | Google replaced MapReduce with Dremel. Then it replaced that with F1. 58 | 59 | The most popular cutting-edge implementations of interactive big-data engines are probably Cloudera Impala and Apache Spark/Shark. 60 | 61 | ## The Layers 62 | 63 | Hardware, Infrastructure. 64 | 65 | Low-Level Operators 66 | 67 | High-Level Operators, Queries 68 | 69 | Algorithms, Parameters 70 | 71 | Languages 72 | 73 | 74 | #### Hardware 75 | 76 | 77 | #### Low-Level Operators 78 | 79 | 80 | #### High-Level Operators 81 | 82 | 83 | #### Algorithms, Parameters 84 | 85 | 86 | #### Languages 87 | 88 | 89 | ## The Future 90 | 91 | -------------------------------------------------------------------------------- /100 - map-reduce funny.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/100 - map-reduce funny.png -------------------------------------------------------------------------------- /110 - Big_Data_Landscape.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/110 - Big_Data_Landscape.png -------------------------------------------------------------------------------- /110 - Data Stack.md: 
-------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-02-14 4 | layout: post 5 | slug: data-stack 6 | title: Data Stacks 7 | meta-description: In this blog post, Dev Nambi writes about the new data stacks. 8 | tags: 9 | - data science 10 | - signal vs. noise 11 | - fud 12 | - marketing 13 | - big data 14 | - machine learning 15 | - learning 16 | - distributed computing 17 | --- 18 | 19 | One of the more famous engineering diagrams is the OSI network model. It describes the different layers of a network, and what each layer is responsible for. It's a *beautiful* example of how separation of concerns and abstraction can be used to build a large system. 20 | 21 | It's the model that defines the engineering around the entire Internet. 22 | 23 | Data engineering doesn't have an equivalent model. It needs one, because technology stacks, connectors, and processing models are invented, evolve, and die at a furious pace. 24 | 25 | 26 | http://radar.oreilly.com/2015/02/processing-frameworks-for-hadoop.html <- basically the article I wanted to write 27 | 28 | 29 | ### The Elephant Has Left the Building 30 | 31 | Hadoop is almost a decade old. It's established. It's also showing its age. The original MapReduce paper came out in 2004 (LINK). 32 | 33 | #### HDFS 34 | 35 | HDFS is still immensely popular, even with companies that are working to 'replace Hadoop'. I have 2 guesses why: 36 | 37 | 1. It does a great job of maintaining file integrity using inexpensive disks without sacrificing performance. 38 | 2. Filesystems are hard to create. 39 | 40 | * http://www.slideshare.net/julienledem/th-210pledem 41 | * http://tachyon-project.org/ 42 | 43 | http://venturebeat.com/2014/05/11/the-state-of-big-data-in-2014-chart/ 44 | http://azure.microsoft.com/en-us/documentation/articles/documentdb-sql-query/ 45 | 46 | #### MapReduce 47 | 48 | MapReduce, on the other hand, hasn't aged as well. 
It works well for some problems, but it turns out to be very limiting for a lot of others. In particular, its batch-oriented processing paradigm makes it useless for low-latency (interactive) queries. 49 | 50 | Google replaced MapReduce with Dremel. Then it replaced that with F1. 51 | 52 | The most popular cutting-edge implementations of interactive big-data engines are probably Cloudera Impala and Apache Spark/Shark. 53 | 54 | ## The Layers 55 | 56 | **Figure Out Where They Fit** 57 | 58 | * Summingbird 59 | * MemSQL 60 | 61 | #### Hardware, Infrastructure. 62 | 63 | * Resource management 64 | * Process monitoring and restartability 65 | * ZooKeeper 66 | * Mesos 67 | * Cloud computing tools (Chef, Puppet) 68 | 69 | 70 | #### Storage, Filesystem, Memory 71 | 72 | * I/O (serialization, etc) 73 | * Connectors, connectors everywhere 74 | * Kafka 75 | * Apache Storm 76 | * HDFS 77 | * RDD in Spark 78 | * HBase/Cassandra 79 | * Mongo 80 | * Tachyon 81 | 82 | #### Low-Level Operators 83 | 84 | * Pig 85 | * Connectors, connectors everywhere! 
86 | * Hadoop operators (map, reduce) 87 | * Scala operators (map, flatmap, etc) 88 | * SQL operators (seek, scan, join, aggregate) 89 | 90 | 91 | #### High-Level Operators, Queries 92 | 93 | * ML 94 | * SQL (Hive, Shark) 95 | * Impala 96 | * Mahout 97 | 98 | #### Algorithms, Parameters 99 | 100 | * Brains 101 | * Hyperparameters 102 | * MLBase 103 | 104 | #### Languages 105 | 106 | * R 107 | * Python 108 | * Scala 109 | * Cascading 110 | * Java 111 | * .NET 112 | * C++ 113 | * Cascalog 114 | * Clojure 115 | 116 | 117 | 118 | ## The Future 119 | 120 | * Consolidation 121 | * Go for simplicity (GraphLab) 122 | * Interoperability 123 | 124 | -------------------------------------------------------------------------------- /110 - data stack hadoop cat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/110 - data stack hadoop cat.png -------------------------------------------------------------------------------- /110 - data stack.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/110 - data stack.jpg -------------------------------------------------------------------------------- /115 - Cost of Complexity.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/115 - Cost of Complexity.jpg -------------------------------------------------------------------------------- /115 - Systems Thinking.md: -------------------------------------------------------------------------------- 1 | # Systems Thinking 2 | 3 | * See the whole system and scenario, scale up and down in scope 4 | * NOC job as a basis for becoming a good developer 5 | 6 | https://medium.com/@joaomilho/festina-lente-e29070811b84 7 
| 8 | 9 | ## Cost of Complexity 10 | 11 | 12 | Nassim N. Taleb (@nntaleb) 13 | 3/29/14, 9:50 AM 14 | General Principle: the solutions on balance needs to be simpler than the problems. (Otherwise the system collapses under its complexity) 15 | 16 | http://www.vanityfair.com/politics/2013/09/joint-strike-fighter-lockheed-martin. 17 | 18 | https://devopsu.com/blog/boring-systems-build-badass-businesses/ 19 | 20 | http://www.vanityfair.com/business/2014/10/air-france-flight-447-crash 21 | 22 | http://firstround.com/article/The-one-cost-engineers-and-product-managers-dont-consider -------------------------------------------------------------------------------- /120 - Cloud Computing incl Spot.md: -------------------------------------------------------------------------------- 1 | # Cloud Computing 2 | 3 | 4 | **Glacier** 5 | http://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/ 6 | 7 | What makes you think your tape drive is any better? 8 | 9 | Compression and network latency 10 | 11 | 12 | - Write PoSH cmdlet to upload files to AWS Glacier 13 | 14 | 15 | 16 | * http://recode.net/2014/11/12/amazon-cloud-chief-andy-jassy-dismisses-talk-of-price-war/ 17 | * https://aws.amazon.com/blogs/aws/next-generation-of-dense-storage-instances-for-ec2/ 18 | * http://www.slideshare.net/whiskybar/aws-ec2 19 | * http://www.salon.com/2014/11/13/amazons_dirty_energy_problem_is_about_to_get_even_worse/ 20 | * http://www.theregister.co.uk/2014/11/10/kryders_law_of_ever_cheaper_storage_disproven/?mt=1415981641453 21 | * http://www.infoworld.com/article/2610403/cloud-computing/ultimate-cloud-speed-tests--amazon-vs--google-vs--windows-azure.html?page=4 22 | 23 | 24 | **How to pay for it?** 25 | 26 | * UW CSE 27 | * UW IT 28 | * Startups like Scalyr 29 | * Some other startup? 
30 | * http://www.nouvola.com/ 31 | * http://dataconomy.com/google-using-machine-learning-boost-efficiency-data-centres/ 32 | * https://docs.google.com/a/google.com/viewer?url=www.google.com/about/datacenters/efficiency/internal/assets/machine-learning-applicationsfor-datacenter-optimization-finalv2.pdf -------------------------------------------------------------------------------- /120 - cloud computing dev 1.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/120 - cloud computing dev 1.JPG -------------------------------------------------------------------------------- /120 - cloud computing dev 2.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/120 - cloud computing dev 2.JPG -------------------------------------------------------------------------------- /130 - Interdisciplinary.md: -------------------------------------------------------------------------------- 1 | ## Polyglot Engineering 2 | 3 | 4 | I have met many, many developers and IT engineers who think their job is something new. It's not. The practices involved in working with computers are new mixes of ancient techniques. 5 | 6 | I am inspired by Max Shron's excellent book, *Thinking with Data* (LINK). Max describes the linkages between data analysis and rhetoric, industrial design, and communication. 7 | 8 | ### Teams 9 | 10 | Very, very few software projects or systems are built by one person. The vast majority of the time a team is involved. 11 | 12 | Software engineering requires **psychology** and **social awareness**. The same traits you find in 8-year-olds when building a sand castle. 
Here are some of the "new" skills: 13 | 14 | * An awareness of people's emotions 15 | * Knowledge of how people communicate 16 | * Recognizing and developing 17 | 18 | Software engineering is **education**. You should always be learning: a new codebase, new techniques, new ideas. These build upon your current understanding. You'll also be teaching your peers or new hires. 19 | 20 | * Teaching 21 | * The Socratic Method 22 | * Documentation is like preparing curriculum. 23 | 24 | Software engineering is **communication**. It is the ongoing effort to hear what people mean, not just what they say: asking precise, probing questions to get to the heart of the matter, and modulating your own communication style to match your audience. 25 | 26 | 27 | ### Self Awareness 28 | 29 | *You* build computer products. Your body, brain and reason are involved in the endeavor. 30 | 31 | Software engineering is about **self awareness**. Don't write emails when you're angry. Try to control your ego and hubris when writing code, so it doesn't become overly complicated. 32 | 33 | * Self-awareness. Religion. Meditation 34 | 35 | 36 | Software engineering is about **nutrition**. Eat healthy food so your brain's chemical pathways function well. Exercise so you have energy after you're done with your work. 37 | 38 | 39 | ### Science 40 | 41 | Software engineering borrows many, many things from science. 42 | 43 | Troubleshooting and debugging are the same as the scientific method. 44 | 45 | Designing a computer application is awfully similar to designing an experiment. 46 | 47 | Data analysis in science and in engineering is *exactly the same*. 48 | 49 | Algorithms and data structures are applied math. 50 | 51 | Explaining your data analysis is like journalism. 52 | 53 | ### Craft 54 | 55 | My best lessons in engineering came when I learned carpentry from my grandfather. He posed challenges and let me solve them multiple times, in different ways. 
That was a great series of lessons in complexity, the sublime beauty of good design, and the balance required between form and function. -------------------------------------------------------------------------------- /135 - Curiosity and Ego.md: -------------------------------------------------------------------------------- 1 | ## Curiosity and Ego 2 | 3 | 4 | * We want to know as much as possible 5 | * We want to learn things that will help us 6 | * We don't want to spend time on things we already know, or which we'll never use. 7 | * ROC curves 8 | * We don't want to learn things that are wrong. 9 | * Ego vs. 'meekness'. 10 | * Active learning and education 11 | * XKCD comic on learning Perl in high school 12 | * How to learn, and how to get better at something. What does science tell us? 13 | * Can curiosity be the right mental approach? 14 | 15 | 16 | -------------------------------------------------------------------------------- /140 - Net Neutrality.md: -------------------------------------------------------------------------------- 1 | # Net Neutrality 2 | 3 | http://consumerist.com/2014/04/29/everything-you-need-to-know-before-e-mailing-the-fcc-about-net-neutrality/ 4 | 5 | - Net neutrality idea - imagine if roads were like that 6 | • Traffic lights and speed limits changed depending on how much you paid 7 | • And on how much the other side paid. 8 | • It was also heavily subsidized by the government when first built. 9 | • Now we're auctioning off some sidewalks and tunnel rights. 
10 | • Make it visual 11 | 12 | 13 | -------------------------------------------------------------------------------- /150 - MonteCarlo.R: -------------------------------------------------------------------------------- 1 | # R learning script for Monte Carlo methods 2 | library(ggplot2) 3 | library(VGAM) 4 | theme_set(theme_bw()) 5 | 6 | 7 | # blog post work 8 | set.seed(12345) 9 | req <- seq(1, 2250) 10 | 11 | qplot(x=req, y= (1 - ppareto(length(req), req, 0.138 ))*100, ylim=c(0,100), xlab="Number of Requests", ylab="% Complete") 12 | 13 | 14 | array(1 - ppareto(100, seq(1,100), 0.138 ))[20] 15 | plot(1 - ppareto(100, seq(1,100), 0.138 )) 16 | qplot(x=seq(1,100), y= 1 - ppareto(100, seq(1,100), 0.138 ), ylim=c(0,100)) 17 | 18 | requests.df <- as.data.frame(cbind(req, 1 - ppareto(length(req), req, 0.138 ))) 19 | names(requests.df) <- c("request_id","percent_complete") 20 | 21 | # requests.df$ (incomplete expression left over from drafting; commented out so the script parses) 22 | 23 | head(requests.df) 24 | 25 | -------------------------------------------------------------------------------- /150 - Project Paradox.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/150 - Project Paradox.png -------------------------------------------------------------------------------- /150 - Scratch.R: -------------------------------------------------------------------------------- 1 | # R learning script for Monte Carlo methods 2 | library(ggplot2) 3 | library(VGAM) 4 | 5 | # http://www.youtube.com/watch?v=cpc9D0EVYSk 6 | # It's really about simulation. Monte Carlo looks at probability patterns. 7 | # Sample and replicate are really useful. 
8 | # 9 | # 10 | # 11 | # 12 | # 13 | # 14 | # 15 | # 16 | # 17 | # 18 | # 19 | 20 | # Simulate a game of chance with coin tossing 21 | options(width=60) 22 | sample(c(-1,1), size=50, replace=TRUE) #sample from a given set. 
23 | #Replace the values after you sample them 24 | 25 | # cumsum will do a rolling cumulative sum. That's handy. 26 | 27 | win <- sample(c(-1,1), size=50, replace=TRUE) 28 | cum.win <- cumsum(win) 29 | cum.win 30 | 31 | 32 | #extend, plot the sequence of cumulative winnings for 4 games 33 | par(mfrow=c(2,2)) #carves up the graphic frames into 4 pieces 34 | for (j in 1:4) { 35 | win <- sample(c(-1,1), size=50, replace=TRUE) 36 | plot(cumsum(win), type="l", ylim=c(-15,15)) 37 | abline(h=0) 38 | } 39 | 40 | #there's a lot of variability here. Interesting 41 | # what do we see? There's a level at which you should declare victory and stop 42 | # pick a random set.seed, set it to a large number 43 | 44 | # 1) what's the probability of breaking even after 50 games? 45 | # 2) what's the likely number of tosses during which Peter is winning? 46 | # 3) what's the value of Peter's best fortune? 47 | 48 | # simulate the random process once, and then repeat. 49 | # compute statistics as you repeat. 50 | # 51 | 52 | # user-defined function 53 | peter.paul <- function(n=50) { 54 | win <- sample(c(-1,1), size=n, replace=TRUE) 55 | sum(win) #fortune at the end of the game 56 | } 57 | 58 | peter.paul() 59 | 60 | F <- replicate(10000, peter.paul()) #calls a function a bunch of times 61 | #has 10000 values 62 | max(F) #highest value 63 | 64 | table(F) #frequency binning 65 | par(mfrow=c(1,1)) 66 | plot(table(F)) #looks normally distributed 67 | # no odd numbers: the sum of 50 values of +1/-1 is always even. 68 | # Only way to break even is if the head comes up n/2 times 69 | 70 | ## what are the chances he breaks even? It's the ratio of him finishing with 0 out of 10000 71 | # In the simulation it was 1119/10000, or .1119 72 | dbinom(25,size=50,prob=0.5) #comes out to be .112. 
That's really close 73 | 74 | #now on part 3 75 | 76 | 77 | 78 | 79 | 80 | 81 | # blog post work 82 | set.seed(12345) 83 | req <- seq(1, 2250) 84 | 85 | plot(ppareto(seq(1,100),1,0.548)) 86 | 87 | array(1 - ppareto(100, seq(1,100), 0.138 ))[20] 88 | plot(1 - ppareto(100, seq(1,100), 0.138 )) 89 | qplot(x=seq(1,100), y= 1 - ppareto(100, seq(1,100), 0.138 ), ylim=c(0,1)) 90 | qplot(x=seq(1,2250), y= 1 - ppareto(2250, seq(1,2250), 0.138 ), ylim=c(0,1)) 91 | 92 | cdf_pareto <- function(length, location) 93 | { 94 | sum(dpareto(seq(1,length),location, shape=1)) 95 | } 96 | 97 | req.df <- as.data.frame(req) 98 | names(req) <- c('request') 99 | req.df$pareto <- NULL 100 | req.df$pareto <- cdf_pareto(2250, 101 | 102 | qplot(data=req.df, x=req, y=pareto) 103 | 104 | 105 | alpha <- 3; 106 | k <- exp(1); 107 | x <- seq(2.8, 8, len = 300) 108 | plot(x, dpareto(x, location = alpha, shape = k)) 109 | qpareto(seq(0.1,0.9,by = 0.1),location = alpha,shape = k) 110 | -------------------------------------------------------------------------------- /151 - Project Planning 2.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Project Planning and Prioritization 3 | layout: post 4 | slug: project-expansion 5 | --- 6 | 7 | **Phases** 8 | 9 | 1. Assume customers of equal size, with equal request sizes. Assume all work is the same size. No contravening work. PMs of 1, 2, 3, up to 10 layers away. Add margins of error. 10 | - To add a feature that isn't requested, how much clearer of vision do you need to have? 11 | - What are the implications? 12 | - What are the assumptions of guessing. How to reduce that risk. Sampling. 13 | - Since risk is roughly proportional to guesstimate size, assume random risk up to 2X the size of the estimate. See what happens. 14 | - Punish misses 5X more than early successes. Human psychology for underpromising and overdelivering. 15 | 2. Assume customers of unequal size, with unequal request sizes. 
Repeat 16 | 3. Assume work is unequal in size, with unequal risk and impact. Repeat. 17 | - Impact if we don't take on too small or too-large work. Why. Repeat. 18 | 4. Play with different assumptions. Cynical. Idealistic. See what happens. 19 | 20 | Assume customers of unequal size, with unequal request sizes. Repeat 21 | 22 | - To add a feature that isn't requested, how much clearer of vision do you need to have? 23 | - What are the implications? 24 | - What are the assumptions of guessing. How to reduce that risk. Sampling. 25 | - Since risk is roughly proportional to guesstimate size, assume random risk up to 2X the size of the estimate. See what happens. 26 | - Punish misses 5X more than early successes. Human psychology for underpromising and overdelivering. 27 | - Consider impact of codebase dilution. Hiring more people and its inefficiencies. -------------------------------------------------------------------------------- /152 - Project Planning 3.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Project Planning and Prioritization 3 | layout: post 4 | slug: project-sizes 5 | --- 6 | 7 | **Phases** 8 | 9 | 1. Assume customers of equal size, with equal request sizes. Assume all work is the same size. No contravening work. PMs of 1, 2, 3, up to 10 layers away. Add margins of error. 10 | - To add a feature that isn't requested, how much clearer of vision do you need to have? 11 | - What are the implications? 12 | - What are the assumptions of guessing. How to reduce that risk. Sampling. 13 | - Since risk is roughly proportional to guesstimate size, assume random risk up to 2X the size of the estimate. See what happens. 14 | - Punish misses 5X more than early successes. Human psychology for underpromising and overdelivering. 15 | 2. Assume customers of unequal size, with unequal request sizes. Repeat 16 | 3. Assume work is unequal in size, with unequal risk and impact. Repeat. 
17 | - Impact if we don't take on too small or too-large work. Why. Repeat. 18 | 4. Play with different assumptions. Cynical. Idealistic. See what happens. 19 | 20 | 3. Assume work is unequal in size, with unequal risk and impact. Repeat. 21 | - Impact if we don't take on too small or too-large work. Why. Repeat. 22 | 23 | - To add a feature that isn't requested, how much clearer of vision do you need to have? 24 | - What are the implications? 25 | - What are the assumptions of guessing. How to reduce that risk. Sampling. 26 | - Since risk is roughly proportional to guesstimate size, assume random risk up to 2X the size of the estimate. See what happens. 27 | - Punish misses 5X more than early successes. Human psychology for underpromising and overdelivering. 28 | 29 | Size estimates are notoriously inaccurate, often wrong by an order of magnitude or more. 30 | 31 | Size estimates: 32 | 33 | * Takes 1 person 2 days = 2 days 34 | * Takes 5 people 2 weeks = 10 days 35 | * Takes 10 people 10 weeks = 100 days 36 | 37 | Software estimates are usually logarithmic; making something bigger makes it 10X bigger, not 2X bigger. Also, the amount of risk increases proportionally. 38 | 39 | ### Questions 40 | 41 | * What happens when size estimates are unequal? How does risk play out? 42 | * Think about how iteration reduces risk. Shifting baselines. -------------------------------------------------------------------------------- /153 - Project Planning 4.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Project Planning and Prioritization 3 | layout: post 4 | slug: project-assumptions 5 | --- 6 | 7 | **Phases** 8 | 9 | 1. Assume customers of equal size, with equal request sizes. Assume all work is the same size. No contravening work. PMs of 1, 2, 3, up to 10 layers away. Add margins of error. 10 | - To add a feature that isn't requested, how much clearer of vision do you need to have? 11 | - What are the implications? 
12 | - What are the assumptions of guessing. How to reduce that risk. Sampling. 13 | - Since risk is roughly proportional to guesstimate size, assume random risk up to 2X the size of the estimate. See what happens. 14 | - Punish misses 5X more than early successes. Human psychology for underpromising and overdelivering. 15 | 2. Assume customers of unequal size, with unequal request sizes. Repeat 16 | 3. Assume work is unequal in size, with unequal risk and impact. Repeat. 17 | - Impact if we don't take on too small or too-large work. Why. Repeat. 18 | 4. Play with different assumptions. Cynical. Idealistic. See what happens. 19 | 20 | 4. Play with different assumptions. Cynical. Idealistic. See what happens. 21 | -------------------------------------------------------------------------------- /160 - Communication and Storytelling.md: -------------------------------------------------------------------------------- 1 | # Communication 2 | 3 | **Reject the premise of an argument** 4 | 5 | * Context is king 6 | * Underlying assumption 7 | * Tone, message. 8 | 9 | 10 | ### Writing 11 | 12 | http://bighow.com/news/the-art-of-great-writing-60-writing-tips-from-6-alltime-great-writers 13 | 14 | Funny pictures and quotes. 15 | Math equations. 16 | Drafts 17 | Check spelling. 18 | Read out loud. Edit down. 19 | Remove all big words. 20 | Images - http://designrope.com/design/find-stock-photos-dont-suck/ 21 | Only use active verbs. 22 | - Make a blog post checklist 23 | § Has it been revised? 24 | § Has it been spell checked? 25 | § Has it been checked for grammar errors? 26 | § If it talks about a feature or example, did you mention the SQL version you're using? 
27 | § Check against grammar and style books 28 | □ Strunk and white has impeccable style 29 | - "Write until you're absolutely in love with the work" 30 | 31 | * http://seriouspony.com/blog/2013/10/4/presentation-skills-considered-harmful 32 | * http://mobile.nytimes.com/2015/02/14/world/europe/russian-tv-insider-says-putin-is-running-the-show-in-ukraine.html?_r=1&referrer= 33 | * http://www.bakadesuyo.com/2014/12/how-to-read-people/ 34 | * http://ozar.me/2015/02/best-presentations-based-pain/ 35 | * https://tractionloops.com/web-property-systems/ 36 | 37 | 38 |
  • » Speaking: Entertain, Don’t Teach hilarymason.com
  • 39 |
  • 8 Conversational Habits That Kill Credibility | Inc.com
  • 40 | 41 | 42 | * https://www.khanacademy.org/partner-content/pixar/storytelling 43 | * https://hynek.me/articles/speaking/ 44 | 45 | * http://www.fastcodesign.com/3038950/evidence/the-science-of-politely-ending-a-conversation 46 | * http://www.theatlantic.com/education/archive/2014/12/how-scientists-are-learning-to-write/383685/?single_page=true 47 | * http://www.bobpusateri.com/archive/2015/02/why-you-should-submit-for-pass-summit-2015/ 48 | * http://www.bbc.com/future/story/20150324-the-hidden-tricks-of-persuasion 49 | * http://www.artofmanliness.com/2012/08/22/how-to-make-small-talk/ 50 | * http://www.washingtonpost.com/posteverything/wp/2015/05/26/powerpoint-should-be-banned-this-powerpoint-presentation-explains-why/ 51 | * http://paulgraham.com/talk.html 52 | * http://qz.com/778767/to-tell-someone-theyre-wrong-first-tell-them-how-theyre-right/ 53 | * http://blog.statuspage.io/why-public-apologies-suck 54 | * https://hackernoon.com/pr-101-for-engineers-7cd116cc5347 55 | * https://longreads.com/2017/04/12/the-elements-of-bureaucratic-style/ 56 | * http://andrewchen.co/professional-blogging/ 57 | 58 | 59 | Blog meta 60 | Who is my audience? 61 | DBAs who want to have a better relationship with their developers 62 | Developers who want to have a better relationship with their DBAs 63 | A DBA with 2+ years of experience, 1 or more dev teams to support, and friction 64 | A developer (DBE or not) with 1+ year of SQL Server development experience, who has a hard time working with their DBA(s). 65 | Write down one concept a minute for twenty minutes 66 | Then take each concept and write something about each for two minutes. Write two sentences about each. If you can't write two sentences, delete it. If it’s particularly juicy, note that and move on. 67 | Minimum length - go for as long as you can. 
68 | This increases the chances that someone big will link to it, and your traffic will explode 69 | Use SnagIt for screen capture 70 | Look over Google Analytics to figure out how to improve blog posts 71 | Why did post A get 50% more hits than post B? 72 | - What caused me the biggest pain? 73 | § This will always create new topics 74 | - Blog your life, challenges and improvements 75 | § Tactics I use to get through the day 76 | - Often limited to 1/2 to 2 pages 77 | - Blog posts: fairly condensed, deal with a specific topic, and are transitory 78 | - Why do we write? 79 | § It helps us become better researchers 80 | § It helps us learn 81 | § If we write properly, it helps us organize our thoughts 82 | - How often should I write? 83 | § 1-2 times a week is a good starting point 84 | -------------------------------------------------------------------------------- /160 - linkbait effectiveness.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/160 - linkbait effectiveness.png -------------------------------------------------------------------------------- /170 - Getting Started With Programmind.md: -------------------------------------------------------------------------------- 1 | To get started, I'd recommend learning Python. It's one of the easiest languages to learn, and it's also one of the most widely used (http://redmonk.com/sogrady/2014/06/13/language-rankings-6-14/). It's also an open-source language, so you can install it anywhere. I'd recommend installing Anaconda (http://continuum.io/downloads), which takes care of a lot of the version-incompatibility headaches you can run into with open-source tools. 2 | 3 | There are some decent online sites that walk you through how to learn Python, notably Codecademy (http://www.codecademy.com/en/tracks/python). 
I'd also recommend a couple of books (http://learnpythonthehardway.org/, http://www.amazon.com/Python-Cookbook-Alex-Martelli/dp/0596007973). I'd start with the online sites to learn the basics, and then progress up to the books. 4 | 5 | I'd also recommend signing up for a GitHub account (https://github.com/) and putting all of your code and projects there. GitHub accounts are free, and a lot of the most popular open-source tools are there (like Linux). If you work on projects regularly and improve, your GitHub account becomes a pretty compelling resume. 6 | 7 | It's going to take time, though, and a lot of patience. For me it was an endless series of evenings and weekends spent tinkering. One of my favorite teachers, Hilary Mason (http://www.hilarymason.com/), said that computer science is the endless process of playing with your curiosity and cleverness to find your way around an endless series of brick walls (http://www.hilarymason.com/presentations-2/devs-love-bacon-everything-you-need-to-know-about-machine-learning-in-30-minutes-or-less/). Many people run out of patience; it's probably the most common reason people stop learning to write code. 8 | 9 | The best way I've heard of to fight the disillusionment problem is to use code to work on a problem you're interested in, so it's not as abstract. For Sean, that could be playing with the music/audio utilities in Python (https://wiki.python.org/moin/PythonInMusic), or looking at stuff about Portland (food, housing, weather, etc). There are also local meetups that discuss programming (http://www.meetup.com/pdxpython/ and many others); those can be a lot of fun, and they're amazing places to learn. 10 | 11 | I hope this helps. 12 | Cheers! 
13 | Dev -------------------------------------------------------------------------------- /180 - Incentives.md: -------------------------------------------------------------------------------- 1 | # Incentives 2 | 3 | 4 | 5 | ## We Don't Appreciate What Works Well 6 | 7 | * Cultural bias 8 | * "The squeaky wheel gets the grease" 9 | * What effects does this lead to 10 | * "Bad cases make for bad law" 11 | * Post - undervaluing preventative work, overvaluing heroic fixes 12 | * http://finance.yahoo.com/blogs/breakout/target-s-pr-nightmare-continues-160404828.html 13 | 14 | ## On Influence 15 | 16 | ### The Carrot 17 | 18 | ### The Stick 19 | 20 | ### Trading Favors 21 | 22 | ### Why It Matters 23 | 24 | * Different perspectives 25 | 26 | 27 | **Metrics and Unintended Consequences** 28 | 29 | Schools - http://tpep-wa.org/student-growth-overview/student-growth-case-studies/ 30 | 31 | Altruism - http://www.theatlantic.com/education/archive/2014/06/most-kids-believe-that-achievement-trumps-empathy/373378/ 32 | -------------------------------------------------------------------------------- /180 - incentives.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/180 - incentives.jpg -------------------------------------------------------------------------------- /190 - Project Animal Names.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/190 - Project Animal Names.jpg -------------------------------------------------------------------------------- /190 - Project Names 2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/190 - Project Names 2.jpg 
-------------------------------------------------------------------------------- /190 - Project Names.md: -------------------------------------------------------------------------------- 1 | ## Technology Names and FUD 2 | 3 | All technology products have names. While I try not to judge a book by its cover, I have found that some products can be identified as bad by their names alone. 4 | 5 | 6 | ### What's In a Name? 7 | 8 | What does a project or product name tell us? At the most basic level, they exist for easy identification. MySQL and PostgreSQL aren't named 'RDBMS 1' and 'RDBMS 2' for this reason. 9 | 10 | At a second level, they exist for *branding*. This is where the wheels fall off, because brands have contradictory goals: 11 | 12 | * They should be memorable 13 | * They should be accessible 14 | * They should describe the product 15 | 16 | What I have found is that product/project names can tell you a lot about the culture of the organization creating them. 17 | 18 | ### First There was The Sale 19 | 20 | When we are looking for a product to purchase, we need to know several things: 21 | 22 | * What the product does 23 | * How the product interrelates with *other* products 24 | * Where it comes from 25 | * Does it solve our business problem? 26 | * Does the company creating it 27 | * The cost 28 | * Supportability 29 | * Unambiguous. 30 | 31 | There is *no way* you can fit that into a single sentence, let alone a phrase or name. 32 | 33 | I am sick and tired of products that are "branded" to appeal to sales people. Why? Projects named by marketing people are inevitably targeted towards business executives, CIOs, CTOs, and managers. 
34 | 35 | 36 | https://twitter.com/mrogati/status/395666192842510336 37 | 38 | Here are some names of technology projects/services that are given sales-y names: 39 | 40 | * Windows Azure 41 | * Windows 42 | * Office 43 | * Access 44 | * Word 45 | * Excel 46 | * Exchange 47 | * PowerPoint 48 | * Power View 49 | * Power Query 50 | * Power Pivot 51 | * Power Shell 52 | * Power BI 53 | * In-Memory OLTP 54 | * Q & A 55 | * Windows Azure SQL Database 56 | * SAP HANA 57 | * SalesForce 58 | * Sugar CRM 59 | * Tableau 60 | 61 | 62 | 63 | ### Meanwhile, IRL 64 | 65 | When we use a product day in and day out, we have different requirements: 66 | 67 | * Easy to pronounce 68 | * Unambiguous (both as a project and in normal language) 69 | 70 | ...and that's it. 71 | 72 | 73 | ### Age of the Geek 74 | 75 | I am a huge fan of unusual project names, because unusual names tell me something critical: 76 | 77 | **The product is about substance, and not appearance** 78 | 79 | I have seen far, far too many sales pitches for products that look great and never work correctly. I have a simple theory: 80 | 81 | * There's never enough engineering talent to go around. 82 | * One of the big limitations of any organization is the number of people who *detract* from the work that really matters 83 | * Managers (who aren't visionaries or practical) 84 | * PMs (who aren't customer advocates) 85 | * Salespeople / Marketing (who care more about sales than the product) 86 | * Legal (who are worried about potential liability) 87 | * Anybody who works primarily via email 88 | * Anybody who thinks in purely theoretical concerns. 89 | * When engineers pick names, they have influence in the company 90 | * When marketers pick names, they have influence in the company 91 | 92 | Why? Because they are about function, and not appearance. 93 | 94 | * What can it do? 95 | * What are its capabilities? 96 | * How does it work? 97 | * What new features are involved? 98 | * How do I adopt it? 99 | * What are the limitations? 
100 | * How well does it compare to competing options? 101 | 102 | 103 | 104 | Hadoop is not only the elephant in the room, *it's the name of a stuffed elephant* 105 | 106 | Here are some names of technology projects that are given unusual names: 107 | 108 | * Hadoop 109 | * Hive 110 | * F1 111 | * Spanner 112 | * Dremel 113 | * Mahout 114 | * Spark 115 | * Shark 116 | * Apollo 117 | * Red Dog 118 | * MLBase / MLlib 119 | * DeepDive 120 | * Sandy Bridge 121 | * Ivy Bridge 122 | * Bay Trail 123 | * Solr 124 | * Lucene 125 | * Gump 126 | * Git 127 | * Linux 128 | * Pig 129 | * Impala 130 | * ZooKeeper 131 | * Tomcat 132 | * Python 133 | * R 134 | * Cassandra 135 | * CouchDB 136 | 137 | 138 | (ADD A TABLE ) 139 | 140 | (ADD PET IMAGE: I think that engineers either spend too much, or too little time with their pets) 141 | 142 | I'm also a huge fan of technology project code names, because they're usually chosen by technical people. Marketing people don't pick names like this. 143 | 144 | Why does this matter? 145 | 146 | I want to know about the signal:noise ratio in a product. 147 | 148 | ### Signal 149 | 150 | 151 | ### Noise 152 | 153 | * Does it have all ___ list of features I don't care about? 154 | * Is it 'enterprise' ready? 155 | 156 | 157 | **Noisy Terms** 158 | 159 | * AlwaysOn (replaces Hadron) 160 | * PowerQuery 161 | * PowerView 162 | * In-Memory Index 163 | * Columnstore Index (to replace Apollo) 164 | * In-Memory OLTP (to replace Hekaton) 165 | * Windows Azure SQL Database 166 | * Windows Azure (to replace Red Dog) 167 | * Elastic Compute Cloud (EC2) 168 | * Simple Storage Service (S3) 169 | * Word 170 | * Excel 171 | * Access 172 | * Office 173 | * Office 365 174 | * '3rd generation i7' (to replace Ivy Bridge) 175 | 176 | 177 | 178 | 179 | 180 | Pig. 181 | Hadoop 182 | YARN 183 | Mahout 184 | Impala 185 | Hive 186 | Linux 187 | S3 188 | EC2 189 | Dremel. 
Drill 190 | 191 | http://www.theatlantic.com/features/archive/2014/04/the-origins-of-office-speak/361135/ 192 | 193 | vs. 194 | RAC 195 | AlwaysOn 196 | SQL Server 197 | Windows 198 | Office 199 | SSAS 200 | SSRS 201 | PowerView 202 | PowerPivot 203 | Tableau 204 | Azure 205 | BigQuery -------------------------------------------------------------------------------- /200 - data viz.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/200 - data viz.jpg -------------------------------------------------------------------------------- /200 - data viz.md: -------------------------------------------------------------------------------- 1 | # Data Visualization 2 | 3 | * Categorical 4 | * Numeric 5 | * Optimizing for human psychology 6 | 7 | Scatterplots, bar charts, etc. 8 | Include guidance. 9 | Data Visualization 10 | Difficulty vs comprehension. 11 | Writing your own query 12 | Understanding the data model 13 | Picking a good graphic 14 | Understanding the graphic 15 | Asking a different question. 16 | Repeat 17 | Time involved. 18 | -------------------------------------------------------------------------------- /2013-12-08-pagerank scale.md: -------------------------------------------------------------------------------- 1 | --- 2 | slug: pagerank 3 | title: Scaling PageRank in SQL 4 | layout: post 5 | author: Dev Nambi 6 | date: 2013-12-08 7 | meta-description: In this blog post Dev Nambi analyzes how PageRank scales in SQL. 8 | tags: 9 | - sql 10 | - sql development 11 | - PageRank 12 | - graph databases 13 | - scaling 14 | --- 15 | 16 | In [my last post](http://devnambi.com/2013/pagerank/) I put together an implementation of [PageRank](http://en.wikipedia.org/wiki/PageRank) using SQL. Now let's see how it scales. 
17 | 18 | #### Tables 19 | 20 | I'll be using the same tables as before, **Nodes** and **Edges**. 21 | 22 | {% highlight SQL %} 23 | CREATE TABLE Nodes 24 | (NodeId int not null 25 | ,NodeWeight decimal(10,5) not null 26 | ,NodeCount int not null default(0) 27 | ,HasConverged bit not null default(0) 28 | ,constraint NodesPK primary key clustered (NodeId) 29 | ) 30 | 31 | CREATE TABLE Edges 32 | (SourceNodeId int not null 33 | ,TargetNodeId int not null 34 | ,constraint EdgesPK primary key clustered (SourceNodeId, TargetNodeId) 35 | ,constraint EdgeChk check (SourceNodeId <> TargetNodeId) --ignore self references 36 | ) 37 | {% endhighlight %} 38 | 39 | 40 | #### Table Setup 41 | 42 | To run these tests I'm using my home workstation: a bog-standard Core i5-2500K CPU, 16GB of RAM, and a 1TB 7200rpm SATA drive that I'll be using for both tempdb and the PageRank test database. 43 | 44 | Whenever I run a test, I want to measure a few key metrics: 45 | 46 | * CPU time 47 | * Clock time 48 | * Logical I/O (memory accesses) 49 | * Physical I/O (reads and writes) 50 | * Number of iterations needed to converge 51 | * Number of nodes that converge each iteration 52 | 53 | In addition, I want to start small and scale up my tests. I'll be running several tests: 54 | 55 | * 10 nodes, 15 edges 56 | * 100 nodes, 175 edges 57 | * 1K nodes, 3K edges 58 | * 10K nodes, 50K edges 59 | * 100K nodes, 750K edges 60 | * 1 mil nodes, 10 mil edges 61 | * 10 mil nodes, 100 mil edges 62 | * 100 mil nodes, 1 billion edges 63 | * 1 billion nodes, 10 billion edges 64 | 65 | (LIST THE TESTS) 66 | 67 | I'm also adding a few performance tweaks to the tests: 68 | 69 | * Adding a [columnstore](ADD LINK) (columnar) index to the Edges table, since it is read-only. 
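As a rough illustration, here is a sketch of how a test graph of a given size could be generated and how the two convergence metrics listed above could be counted. It is written in Python rather than SQL, and it is not the implementation from the GitHub repo. The names `make_graph` and `pagerank`, the damping factor of 0.85, and the tolerance of 1e-5 are all assumptions made for this sketch.

```python
import random


def make_graph(node_count, edge_count, seed=0):
    """Build a random edge set for a test graph (hypothetical helper).

    Mirrors the Edges table constraints: no duplicate
    (source, target) pairs and no self references.
    """
    max_edges = node_count * (node_count - 1)
    if edge_count > max_edges:
        raise ValueError("too many edges for this many nodes")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < edge_count:
        source = rng.randrange(node_count)
        target = rng.randrange(node_count)
        if source != target:  # ignore self references
            edges.add((source, target))
    return edges


def pagerank(node_count, edges, damping=0.85, tolerance=1e-5,
             max_iterations=100):
    """Iterate PageRank and record how many nodes converge each iteration.

    Simplification (an assumption of this sketch): weight held by nodes
    with no outgoing edges is dropped instead of being redistributed.
    """
    out_degree = [0] * node_count
    for source, _ in edges:
        out_degree[source] += 1

    weights = [1.0 / node_count] * node_count
    converged_per_iteration = []
    for _ in range(max_iterations):
        new_weights = [(1.0 - damping) / node_count] * node_count
        for source, target in edges:
            new_weights[target] += damping * weights[source] / out_degree[source]
        # A node counts as converged when its weight barely changed.
        converged = sum(
            1 for old, new in zip(weights, new_weights)
            if abs(old - new) < tolerance
        )
        converged_per_iteration.append(converged)
        weights = new_weights
        if converged == node_count:
            break
    return weights, converged_per_iteration


if __name__ == "__main__":
    # Smallest test from the list above: 10 nodes, 15 edges.
    test_edges = make_graph(10, 15)
    _, per_iteration = pagerank(10, test_edges)
    print("iterations:", len(per_iteration))
    print("nodes converged per iteration:", per_iteration)
```

The length of the returned list gives the number of iterations needed to converge, and each entry gives the number of nodes that converged in that iteration. CPU time, clock time, and I/O would still need to be measured in the database itself.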
70 | 71 | #### Results 72 | 73 | **TO DO** 74 | 75 | * CPU scaling 76 | * Iteration scaling 77 | * Logical I/O scaling 78 | * Time scaling 79 | * Physical I/O scaling 80 | * Bottleneck analysis 81 | 82 | 83 | 84 | #### Tweak #1: Data Compression 85 | 86 | SQL Server supports *data compression*, where a row is compressed to save space. It turns out that row compression for the Nodes table reduces its size by ____, reducing I/O by the same amount. 87 | 88 | #### Tweak #2: Excluding converged nodes 89 | 90 | The second tweak is an algorithm change, which excludes nodes that have converged from future iterations. 91 | 92 | 93 | #### Victory! 94 | 95 | As before, my code is available [on GitHub](https://github.com/DevNambi/SqlServerUtilities/tree/master/PageRank). 96 | 97 | **Happy Coding!** -------------------------------------------------------------------------------- /2013-12-28 Productivity Analysis.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/2013-12-28 Productivity Analysis.xlsx -------------------------------------------------------------------------------- /2014-05-19-social-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-05-19 4 | layout: post 5 | slug: social-network-intro 6 | title: Introduction to Social Network Theory 7 | meta-description: 8 | - passbac 9 | - analytics 10 | - social media 11 | - social network theory 12 | - nodexl 13 | - microsoft research 14 | --- 15 | 16 | 17 | 18 | ## Interesting Insights 19 | 20 | Facebook Explorer - http://developers.facebook.com/explorer/ 21 | https://developers.facebook.com/docs/graph-api/reference/v2.0/milestone 22 | 23 | 24 | 25 | 26 | 27 | ## Social Networks 28 | 29 | * Crowds matter online...they're larger than real life often, but we understand them less. Inherently weak signal. 
30 | * Central tenet - social structure emerges from the aggregate of relationships among members of a population. 31 | * Emergence of cliques and clusters. Centrality (core) and periphery (isolates), betweenness. 32 | * Methods - surveys, interviews, etc. 33 | * Social media is all about networks. 34 | 35 | Patterns are left behind. 36 | 37 | There are many kinds of ties: 38 | 39 | * Send 40 | * mention 41 | * like link reply rate review favorite friend follow forward edit tag comment check-in. 42 | * one-way relationships: lend money to. 43 | * bidirectional: is married to. 44 | 45 | Social media sites are meaningfully different from each other, but they all have one thing in common: networks. 46 | 47 | The US doesn't have public squares anymore, with people who disagree with us. If it happens at all it happens online. 48 | 49 | A network is born whenever two entities are joined. 50 | 51 | Network theory: position, position, position. It's all relative. 52 | 53 | NodeXL - like social media for graphs. 54 | 55 | Trying to be the Firefox of GraphML. 56 | 57 | GraphML - XML for social networks (a data structure) 58 | 59 | Open Tools, Open Data, Open Scholarship. 60 | 61 | NodeXLGraphGallery.org - open data, user-generated collections/datasets. 62 | Open Scholarship - trying to make it easy. 63 | 64 | Try using the tool. 65 | 66 | ### 6 social network structures 67 | 68 | Divided or unified crowds 69 | Divided - political/controversial topic. 70 | United - some communities are unified. 71 | Fragmented - brand clusters 72 | they don't reply to each other. 73 | Clustered - community clusters 74 | they interact a bit. 75 | what happens when people grow up a bit. 76 | Hub-and-spoke - broadcast network 77 | PR/marketing. 78 | Institutional speaker. 79 | Called the 'audience' pattern - people who retweet don't interact with each other. 80 | Out-hub-and-spoke - support network 81 | Airline support. 
82 | @DellCares 83 | 84 | The density of the connections is how 85 | 86 | 87 | ## Centrality 88 | 89 | * Eigenvector centrality. 90 | * PageRank 91 | * Betweenness centrality - influencers. The 'bridge' score. 92 | ME - look at this for side business. 93 | 94 | * Some connections are very important. Bridges. Only 2 points of connection. But they're the only thing that connects those two networks. 95 | 96 | When you are the bridge, you may charge a toll. It could be only social capital. It's hard because you connect to something that is not like you. 97 | 98 | Don't be a hub. Be a bridge. 99 | 100 | Isolates. It means there's never been an @____ in their tweets. It means they're the new members. 101 | 102 | IDEA FOR PASS: use social network analysis to identify influencers and new people to connect with. 103 | 104 | #CMgrChat - social media managers. Basically it's a small village. 105 | 106 | Look at the social network of people who are better at this than you. Find out, and then use this analysis to figure it out. 107 | 108 | ME - read more of the stuff by Marc Smith, MSR researcher 109 | 110 | http://www.connectedaction.net/ 111 | 112 | Last - plea for help. 113 | 114 | Because Excel is an ODBC source, anything that can join 2 tables can work in NodeXL. 115 | 116 | 117 | -------------------------------------------------------------------------------- /2014-06-01-democratization-of-bi.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2015-06-01 4 | layout: post 5 | slug: democratization 6 | title: The Democratization of Analysis 7 | meta-description: 8 | tags: 9 | - bi 10 | - analysis 11 | - democratization of bi 12 | - statistics 13 | - programming 14 | - data science 15 | - fud 16 | - marketing 17 | - self service BI 18 | --- 19 | 20 | 21 | Fight the Hippo 22 | 23 | Not everybody is cut out for this kind of work 24 | 25 | Make better decisions 26 | 27 | Same problem as voting. 
Most companies are autocratic, authoritarian, even fascist (dissent will not be tolerated). The main protection is people can vote with their feet. 28 | 29 | 30 | When is it a good idea 31 | 32 | When is it a bad idea 33 | 34 | Overfitting and cross-validation. 35 | 36 | Problem: leadership doesn't know what to trust, because of FUD. Words, good stories and fancy arguments don't prove themselves without data. 37 | 38 | Know how to spot logical fallacies and statistical fallacies. Cut through the noise. -------------------------------------------------------------------------------- /210 - System Replacements.md: -------------------------------------------------------------------------------- 1 | # System Replacements 2 | 3 | * Have to keep the old system online 4 | * The people you have aren't always the people you need 5 | * It's cruel to hire folks for the new system and fire the old folks. 6 | * Vendor or temporary hires aren't good options because of misaligned incentives. 7 | 8 | The big way forward I can see is good, flexible design at the beginning. That, and training your existing folks to incrementally build a new system and acquire new skills 9 | 10 | * There's a thing about expectations vs. reality when it comes to timing, due dates and deliverables. 11 | 12 | http://effectivesoftwaredesign.com/2014/03/17/the-end-of-agile-death-by-over-simplification/ -------------------------------------------------------------------------------- /220 - Personal Automation.md: -------------------------------------------------------------------------------- 1 | # Personal Automation 2 | 3 | http://t.co/IHnSBmoTS1 4 | 5 | A classifier for important email (content + sender), a classifier for email -> response template, and a CRM timer 6 | 7 | The slowest part was going through old email & FB messages to build a training set. 
8 | 9 | http://www.matthewjockers.net/2011/09/29/the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/ 10 | 11 | https://automatedinsights.com/blog/automation-at-work-an-interview-with-hilary-mason/ 12 | 13 | Replace myself with a series of Python scripts 14 | 15 |
  • Automation? Think Causation, not Correlation
-------------------------------------------------------------------------------- /220 - software architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/220 - software architecture.png -------------------------------------------------------------------------------- /230 - Association Rules in SQL AdventureWorks 2012.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/230 - Association Rules in SQL AdventureWorks 2012.sql -------------------------------------------------------------------------------- /230 - Basic ML Using SQL.markdown: -------------------------------------------------------------------------------- 1 | # Machine Learning using SQL 2 | 3 | ### Key points 4 | * Move the computation to the data 5 | * People already have databases. Other languages / tools are hard to find. Also then you have to keep data sets in sync, deal with extracts and such, and the result often goes back *into* an application or database for use. 6 | 7 | ### The Bimodal toolset (size does matter) 8 | There are roughly 2 sets of tools nowadays. Smaller data sets (less than ~10GB or so) can fit into memory, and can be analyzed on workstations using tools like R, Python or Julia. (PROVIDE LINKS). **Large** data sets (greater than 1TB) are best analyzed using 'big data' (distributed computing) tools like Hadoop or Mahout. 9 | 10 | Between the two is where *most* data sets currently fit. They're too big to easily fit into memory, and too small to benefit from the massive scale of Mahout. They will fit into 'big data' solutions, sure, but at smaller scales like this you run into overhead challenges. 
11 | 12 | If only we had a flexible, powerful, possibly interpreted language that would work on datasets between these two sizes. It turns out we do: *SQL*. 13 | 14 | There is one other option: sampling. It's perfectly viable to take a random sample of a large data set, confirm it has the same distribution properties, and work on it using something like R or Python. 15 | 16 | For many machine learning algorithms, SQL works just fine. Let's look at some examples of how to do this. 17 | 18 | http://arcanecode.com/2013/05/07/updating-adventureworksdw2012-for-today/ 19 | 20 | http://www-users.cs.umn.edu/~sarwat/RecDB/ 21 | 22 | 23 | ### Matrix math in SQL 24 | 25 | A *large* number of machine learning algorithms use matrix mathematics. Techniques such as Principal Component Analysis use it extensively. 26 | 27 | **Matrix addition** 28 | 29 | Sparse and dense 30 | 31 | Use data volumes too big for R 32 | 33 | **Matrix subtraction** 34 | 35 | **Matrix multiplication** 36 | 37 | **Matrix division** 38 | 39 | **Matrix transposition** 40 | 41 | **Eigenvalues** 42 | 43 | **Eigenvectors** 44 | 45 | 46 | 47 | ### TF-IDF 48 | 49 | **Cosine similarity** 50 | 51 | 52 | ### K-Means 53 | 54 | **Euclidean distance** 55 | 56 | ? How to measure variance covered? 57 | ? How to measure variance left? 58 | ? How to use functions besides Euclidean distance? 59 | ? 60 | 61 | 62 | ### Association Rules 63 | 64 | This is used for things like 'market basket analysis'. If you buy chips at a grocery store, what *else* are you likely to buy? Turns out it is chips. If you buy diapers at a grocery store, what are you likely to buy? Turns out it's beer. 65 | 66 | Association rules are designed to work on a transactional table. Luckily SQL databases tend to have several of those. 
Let's use an example transaction table from the AdventureWorks database, [TABLE NAME] 67 | 68 | ### Bayesian Math 69 | 70 | 71 | 72 | ### Decision Trees 73 | 74 | ### Statistics 75 | 76 | * Percentiles 77 | * Boxplot 78 | * Median 79 | * Mode 80 | * Distribution 81 | * Correlation 82 | * T-test 83 | * Mutual information criterion 84 | * Rolling average 85 | * Trailing average -------------------------------------------------------------------------------- /230 - CameraAwesomePhoto.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/230 - CameraAwesomePhoto.jpg -------------------------------------------------------------------------------- /240 - SQL and Digraphs.markdown: -------------------------------------------------------------------------------- 1 | # SQL and dependency graphics (digraphs) 2 | 3 | Blog post on SQL dependency graphs 4 | 5 | The most popular ML algorithms are: 6 | 7 | Decision Trees / Regression Trees 8 | Linear Regression 9 | K-Means 10 | Association rules 11 | 12 | Apache Spark uses a DAG -------------------------------------------------------------------------------- /250 - Finding a Vacation Using Data.markdown: -------------------------------------------------------------------------------- 1 | # Quick Guide for a vacation 2 | 3 | Kate and I were wondering about the best way to have a vacation. Also, *why* a vacation? 4 | 5 | ### Why A Vacation 6 | 7 | People are not designed to work continuously. We certainly didn't evolve over hundreds of thousands of years to be indoors all the time, nor sitting, nor at a desk job. 8 | 9 | Also, there's the fundamental question of: why are we here? To help our employers make money? To make the world a better place? To enjoy ourselves? 10 | 11 | I'd argue it's the last two. 12 | 13 | People aren't deterministic. They wear down over time. 
Their productivity is unpredictable, bursty, and prone to lots of different factors. 14 | 15 | Many of the most effective engineers I know take mental health days and have lots of hobbies. They use their vacation time. They're invariably intentional about it. 16 | 17 | Life: optimize time for X. X is what you care about. Time is what you can trade for it. 18 | 19 | Vacations are like that. 20 | 21 | **Implications** 22 | 23 | * Don't go on an expensive vacation. Taking a weeklong trip to Hawaii may be pointless if you have to work for a month extra to pay for it. 24 | * Think about end goals. 25 | 26 | 27 | 28 | ## Cost 29 | 30 | * Airfare 31 | * Rental car 32 | * Places to stay 33 | * Luggage limitations - buying things 34 | 35 | 36 | ## Environmental impact 37 | 38 | Airplanes - .638 to 1 pound of CO2 per passenger per mile. For both of us to fly to San Francisco (a distance of <> miles) means a CO2 footprint of <> pounds. 39 | 40 | Driving - We drove 10,508 miles last year, using 268.4 gallons of gas. That comes to an average of 39.2 miles per gallon. We get better gas mileage on freeways because they aren't as hilly (hills are death to a Prius' gas mileage). Given that there are 19.6 pounds of CO2 in a gallon of gas, that comes out to a carbon footprint of 5261 pounds, or roughly 1/2 pound of CO2 per mile. 41 | 42 | There is also the carbon footprint to create the car, amortized. A Prius takes about <> pounds of CO2 to make, and will be lasting us hopefully 120K to 180K miles, since we bought it with 27K miles on the car. Assuming the lifetime mileage for a Prius is about 180K miles, that comes out to <> pounds of CO2 per mile to run the car. 43 | 44 | Rental Cars - rental cars are usually newer cars. A big part of . 45 | 46 | If we drove to San Francisco, that'd be carbon footprint of. 47 | 48 | There are other reasons as well. 49 | 50 | 51 | * Don't go so fast 52 | * Stop and enjoy the sights. 
53 | * Variety, and serendipity, are the spice of life 54 | 55 | 56 | -------------------------------------------------------------------------------- /260 - Startups and Y Combinator.markdown: -------------------------------------------------------------------------------- 1 | # On Startups 2 | 3 | Y Combinator is the elephant in the room. It has created companies like AirBnB, Dropbox, FlightCar. 4 | 5 | There's a theme with all of these: 6 | 7 | *Make existing resources more efficient* 8 | *Build a network of supply and demand* 9 | 10 | Company: AirBnB 11 | Demand: People who want to stay somewhere overnight, for a few days. Vacationers, business folks at work. 12 | Supply of Spare Resources: Existing homes that are vacant. Spare rooms. Backyard cottages. 13 | 14 | Company: FlightCar 15 | Demand: 16 | Supply of Spare Resources: People who leave their cars at the airport 17 | 18 | Company: RescueTime 19 | Demand: 20 | Supply of Spare Resources: People who are working inefficiently. 21 | 22 | Company: Uber, Lyft 23 | Demand: People who want rides 24 | Supply of Spare Resources: People with cars and a bit of spare time. 
25 | 26 | Company: Dropbox 27 | Demand: 28 | Supply of Spare Resources: 29 | 30 | Company: Payscale, Glassdoor 31 | Demand: People wanting to know about pay and working conditions in different jobs 32 | Supply of Spare Resources: People who are currently employed in different companies and can complain/brag about their 33 | 34 | 35 | http://www.industrytap.com/the-printer-that-can-print-a-house-in-20-hours/9056 36 | 37 | Company: 38 | Demand: 39 | Supply of Spare Resources: 40 | 41 | Company: 42 | Demand: 43 | Supply of Spare Resources: 44 | 45 | Company: 46 | Demand: 47 | Supply of Spare Resources: 48 | 49 | Company: 50 | Demand: 51 | Supply of Spare Resources: 52 | 53 | Company: 54 | Demand: 55 | Supply of Spare Resources: 56 | 57 | http://siliconhillslawyer.com/2014/03/15/409a-service-cash-cows-get-slaughtered/ 58 | 59 | http://www.wired.com/2014/04/no-exit/?hn -------------------------------------------------------------------------------- /270 - Smell Test Dilbert.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/270 - Smell Test Dilbert.jpg -------------------------------------------------------------------------------- /270 - The Smell Test.md: -------------------------------------------------------------------------------- 1 | ## The Smell Test 2 | 3 | * Longevity of code 4 | * Good judgment can be instinctual 5 | * Best way to develop is trial and error 6 | * Can't be taught, must be learned the hard way 7 | * Useful when applying your knowledge to different situations 8 | * Different way of thinking. Not conscious per se. 
9 | * http://dilbert.com/strips/comic/2014-11-06/ -------------------------------------------------------------------------------- /280 - Agile and Waterfall.md: -------------------------------------------------------------------------------- 1 | ## Agile and Waterfall 2 | 3 | ### Story sizing 4 | * More important when in a semi-waterfall environment. 5 | * Good signal that your story process has too much overhead 6 | * Important for 'agilefall' because overhead happens all the time. 7 | 8 | 9 | ### Agilefall 10 | 11 | * The mullet of software development. Agile at its core, waterfall around it. 12 | * Project mgmt nightmare. -------------------------------------------------------------------------------- /290 - Data Science Evolution.md: -------------------------------------------------------------------------------- 1 | ## Data Science and Data Warehousing 2 | 3 | They are a cautious match. 4 | 5 | Data Warehousing - stability. 6 | 7 | Data Science - discovery, new things. 8 | 9 | * Those cause friction. 10 | * Feature engineering is a big problem - a lot of ML algorithms work only when you have the right data. DW limits what you get, and drastically slows down what you can add. 11 | * Tooling is another. DW is largely relational databases and cubes. Cutting edge is viz tools like Tableau, and 'big data' tools like Hadoop/Hive. 12 | * Data Science uses an overlapping tool set, including things like 13 | * DW includes a lot of process overhead. The reason is the assumption that 'if you build it, they (analysts) will come'. That doesn't happen very often. Also, operational reporting (what has happened) is far, far easier than predictive analysis (what will happen) and optimization (how can I change what will happen to be optimal). They have similar tooling, sometimes, but very dissimilar skills. 14 | * Operational reporting - no margin of error. 15 | * Prediction - margin of error. Limitations in what can be predicted with the data. 
16 | * I've seen job postings for 'big data engineers' or 'platform engineers' that focus largely on pipelines. 'Pipelines' are a natural fit for ETL developers and anyone who is comfortable with query optimization; the principles are the same, but the tools are different. 17 | * Not design-heavy. Data modeling happens *after* you know what you need to build. DS helps build data *products*. The data model for Netflix's "Movies you may like" is far less important than the application itself. 18 | 19 | **Becoming a DS** 20 | 21 | Starting over 22 | Rejecting jobs and work you are qualified for. That's a trap. 23 | Being humble. Admit you don't know crap. Learn from smarter people. -------------------------------------------------------------------------------- /320 - Feature Engineering.md: -------------------------------------------------------------------------------- 1 | ## Feature Engineering 2 | 3 | * Longitude and Latitude as an example 4 | * Mention deep learning 5 | * Adding different data sets. They're often from public sources. 6 | * Data cleansing / munging is huge. It's 80% of data science. 7 | * Not all attributes are created equal. In fact, they are dramatically unequal. They are also only identified using ML trial and error. Design and HIPPO won't help you here. 8 | * Goes with the idea that data is abundant. 9 | 10 | http://www.analyticshumor.com/search?updated-max=2014-04-03T08:24:00-07:00&max-results=10#sthash.Gqa5F4va.uxfs -------------------------------------------------------------------------------- /330 - Cognition for Data Professionals.md: -------------------------------------------------------------------------------- 1 | ## Limits of Cognition for Data Professionals 2 | 3 | * We are limited by our own physiology. 4 | * Attention spans, pomodoro. 5 | * Using sampling for rapid iteration. Stats helps with this. 6 | * Data visualization 7 | * Visualization *friction* 8 | * We can only see 7 items in working memory. But a pattern is an item. 
9 | * Sleep is huge. 10 | * Eating well is huge. 11 | * Cost of distractions. 12 | * Telling a good story is important for this reason. People think in story and narrative, not numbers. 13 | * Book about limitations of smart people. 14 | * Thinking Fast and Slow. 15 | * Even statistically literate people don't do so intuitively. Psychology plays tricks against us. 16 | * Having people vet our work is helpful. 17 | * Trying to prove ourselves wrong is also helpful. 18 | * So is thinking of things from a fresh perspective. 19 | * Creativity is destroyed when you're too busy. Go for a walk. Take an extra shower. 20 | * Carry a little pad of paper around. Good ideas happen at random moments. Capture them. 21 | * (Articles in the \Health and \Lifehacker sections of Pocket) 22 | * Curiosity and pride. Humility is helpful. 23 | 24 | "Work is most fulfilling when you're at the comfortable, exciting edge of not quite knowing what you are doing." - https://twitter.com/alaindebotton 25 | 26 | * know your own skills 27 | * Know your weaknesses 28 | * Know your effect on your company, and the company's effects on the world. 29 | 30 | Look for wisdom everywhere. Difference between expert and expert beginner is self-reflection, realization and changing behavior. Self-awareness is the key. 31 | 32 | * http://www.theatlantic.com/health/archive/2013/10/how-to-build-a-happier-brain/280752/ 33 | * http://www.newrepublic.com/article/118714/interruptions-work-make-you-way-less-productive 34 | * http://georgestocker.com/2014/04/15/how-to-destroy-programmer-productivity/ 35 | * It's not about what you know, but rather your framework for adding more knowledge. 
36 | * https://medium.com/@maebert/9-things-i-learned-as-a-software-engineer-c2c9f76c9266 37 | * http://www.sfu.ca/pamr/media-releases/2014/scientists-discover-brains-anti-distraction-system.html 38 | * http://well.blogs.nytimes.com/2014/03/10/do-brain-workouts-work-science-isnt-sure/?src=me&ref=general 39 | * http://ayearofproductivity.com/top-lessons-learned-a-year-of-productivity/ 40 | * http://joshldavis.com/2014/06/13/put-yourself-out-there/ 41 | * https://medium.com/@jakek/my-year-with-a-distraction-free-iphone-and-how-to-start-your-own-experiment-6ff74a0e7a50 42 | * http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0111081 -------------------------------------------------------------------------------- /330 - Cognition for Data Pros.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/330 - Cognition for Data Pros.png -------------------------------------------------------------------------------- /330 - know all the things.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/330 - know all the things.jpg -------------------------------------------------------------------------------- /350 - Example Math.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/350 - Example Math.xlsx -------------------------------------------------------------------------------- /350 - Matrix Prioritizaton.md: -------------------------------------------------------------------------------- 1 | ## Matrix Prioritization 2 | 3 | * Consider both importance and trimmed outliers. 
4 | * You want the healthiest *mix* of attributes, not the best in a particular area -------------------------------------------------------------------------------- /360 - Life is an Optimization Problem.md: -------------------------------------------------------------------------------- 1 | ## Optimizing Life 2 | 3 | Life is an optimization problem. 4 | 5 | * No single thing is good if you scale it up infinitely. 6 | * Very few things are good if you have nothing of them. 7 | * There are diminishing returns everywhere. 8 | 9 | * https://nplusonemag.com/issue-21/the-intellectual-situation/too-fast-too-furious/ 10 | * http://www.kpcb.com/design/how-to-be-happy-in-business-by-bud-cadell 11 | * http://www.seattletimes.com/opinion/living-the-small-happy-life-surprisingly-important-to-many/ 12 | 13 | ### Examples 14 | 15 | * Sleep 16 | * Food 17 | * Money 18 | * Housing 19 | * Exercise 20 | * Friends 21 | * Time spent with X 22 | * Programming, being productive 23 | * Playing (games, etc.) 24 | 25 | Judge everything on a scale of -3 to 3. 0 means it neither adds nor detracts from your life. 26 | 27 | Then it's a linear algebra problem. What do optimal solutions look like? -------------------------------------------------------------------------------- /370 - Industry Comparisons.md: -------------------------------------------------------------------------------- 1 | # Industry Comparisons 2 | 3 | ## Keep Goals In Mind 4 | 5 | * Schools - it's not a race 6 | * Sports - the winner vs. 
everyone else 7 | * Nurses - false-negatives are bad (you want to be alerted all the time) 8 | * Fighter pilots - false positives are bad (don't fire on airliners) 9 | * General issues 10 | * Financialization - http://www.nakedcapitalism.com/2014/06/wikileaks-exposes-super-secret-regulation-gutting-financial-services-pact.html 11 | 12 | ## Government 13 | 14 | **Military spending** 15 | Contractor 16 | Cost after inflation 17 | Cost overruns 18 | Lifetime cost 19 | Number compared to previous generation 20 | Number of overseers 21 | 22 | ## Insurance 23 | 24 | • Insurance post is in \SQL Blog\Cheap Car 25 | * http://techcrunch.com/2014/06/21/will-google-enter-the-insurance-industry/ 26 | 27 | • Don't treat it the same across years 28 | • Do a two-year trailing average 29 | • How to account for differing premiums charged? It's a huge confound 30 | ○ Different lesson? You get what you pay for? 31 | ○ How to project expected return? Premium X loss ratio? 32 | ○ I need to include personal examples from 3 different companies. Yuck 33 | • How to get # of rejected complaints? 34 | • Mutual insurance or not? 35 | • For profit or not? 36 | 37 | http://www.insure.com/articles/interactivetools 38 | 39 | ## Loss Ratio 40 | @DevNambi Publicly traded companies release quarterly reports that include some of that info. cc @erinstellato 41 | 42 | ## Sharing Economy 43 | 44 |
  • Why Portland is keeping Uber out of the Rose City - GeekWire
  • 45 |
  • Seattle City Council worries about gaps in ride-service insurance | Local N
  • 46 | 47 | ### Government 48 | 49 | * Libraries - http://online.wsj.com/news/articles/SB20001424052702303996604580086191560891202?mg=reno64-wsj&url=http%3A%2F%2Fonline.wsj.com%2Farticle%2FSB20001424052702303996604580086191560891202.html 50 | 51 | ### Shared Cars 52 | 53 | http://mattstoller.tumblr.com/post/82233202309/ubers-algorithmic-monopoly-we-are-not-setting-the 54 | 55 | http://www.nytimes.com/2014/04/22/business/companies-built-on-sharing-balk-when-it-comes-to-regulators.html?_r=0 56 | 57 | http://www.wired.com/2014/04/trust-in-the-share-economy/ 58 | 59 | http://www.theverge.com/2014/6/17/5816254/taskrabbit-blows-up-its-auction-house-to-offer-services-on-demand 60 | 61 | http://blogs.citypaper.com/index.php/the-news-hole/desperate-hustle-way-life/ 62 | 63 | http://lefsetz.com/wordpress/index.php/archives/2014/07/05/kids-dont-care-cars/ 64 | 65 | http://bits.blogs.nytimes.com/2014/08/28/uber-and-lyft-have-become-indistinguishable-commodities/ 66 | 67 | http://www.wired.com/2014/10/volvo-turbo-engine-concept/?mbid=social_fb 68 | 69 | 70 | ## Automation 71 | 72 | ### Self-Driving Cars 73 | 74 | http://seattletimes.com/html/nationworld/2023106759_apxdriverlesscars.html 75 | 76 | ### Cooking Robots 77 | 78 | 79 | 80 | * Advertising: The price we pay for being a broke society 81 | * http://tiltthewindmill.com/breather-real-estate-and-the-innovators-dilemma/ 82 | * http://money.cnn.com/2014/10/15/technology/security/malvertising/index.html?iid=HP_River 83 | 84 | http://www.theatlantic.com/politics/archive/2014/04/city-state-governments-privatization-contracting-backlash/361016/ 85 | 86 | - Industries to disrupt 87 | • Law 88 | • Medicine 89 | § http://www.theatlantic.com/health/archive/2012/10/why-were-still-waiting-on-the-yelpification-of-health-care/263815/ 90 | • Realtors 91 | • Any cottage industries 92 | • Education 93 | § http://oedb.org/open/ 94 | § Email Daniel Strauss when I do 95 | 96 | 
http://arstechnica.com/science/2014/04/publishing-stings-find-predatory-journals-shoddy-peer-review/ 97 | 98 | http://www.salon.com/2014/06/20/the_music_industry_is_still_screwed_why_spotify_amazon_and_itunes_cant_save_musical_artists/ -------------------------------------------------------------------------------- /380 - Trust.md: -------------------------------------------------------------------------------- 1 | # Trust 2 | 3 | "Vaccine-autism fraud another reminder that people with wildly generalized mistrust turn out to be the biggest suckers for crazy stuff." 4 | 5 | 6 | * Specific vs. generic 7 | * "Trust but verify" 8 | * What do trustworthy companies/people have in common? -------------------------------------------------------------------------------- /400 - Software as a Craft.markdown: -------------------------------------------------------------------------------- 1 | # The Software Guild 2 | 3 | 4 | - Craft 5 | - Hardware as craft tools 6 | § Monitors, keyboards, mice are like power tools, drills for craftsmen 7 | § Advocate a hardware budget, people can buy their own 8 | § Hell for IT, but not bad 9 | § Same for furniture? 10 | - Blog post on guild laws 11 | - Blog post on the craft of software engineering 12 | 13 | * Post on office setup 14 | * http://www.wired.com/2014/11/ikea-bekant-desk/ 15 | 16 | 17 |
  • codinghorror: Here's where you can order the 2013 Software Craftsmanship Ca
  • 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | -------------------------------------------------------------------------------- /400 - software.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/400 - software.jpg -------------------------------------------------------------------------------- /410 - Scientific Method.markdown: -------------------------------------------------------------------------------- 1 | # Science and Data Professionals 2 | 3 | - Blog post - the scientific method vs how to troubleshoot 4 | • Confounds 5 | • Experimental design vs what happened before 6 | • Isolating a single variable 7 | • Correlation vs causation 8 | • When is correlation enough 9 | 10 | IT pros and scientists are similar 11 | 12 | 13 | http://arstechnica.com/science/2014/09/is-there-a-creativity-deficit-in-science/ -------------------------------------------------------------------------------- /420 - Inductive vs Deductive.markdown: -------------------------------------------------------------------------------- 1 | # Inductive vs. Deductive Learning 2 | 3 | - Inductive vs deductive learning and engineering 4 | -------------------------------------------------------------------------------- /430 - Hiring.md: -------------------------------------------------------------------------------- 1 | # Hiring 2 | 3 | * branded employees 4 | 5 | 6 | ## Hiring 7 | 8 | * https://medium.com/@ChaseTheTruth/hire-the-wisest-not-the-smartest-68b8b640ab5e 9 | 10 | ## How to find good DBAs or developers 11 | This is an information problem. It's also a FUD problem. 
12 | 13 | ## Getting started (a guide for students) 14 | - Guide for new STEM students 15 | 16 | 17 | * http://www.mintzberg.org/blog/mbas-as-ceos <- MBAs make things worse 18 | 19 | * Questions to ask 20 | * Where to go looking 21 | * https://zapier.com/blog/remote-office-photos/ 22 | * http://www.groovehq.com/blog/being-a-remote-team 23 | * http://paddy.io/posts/recruiters/ 24 | * http://www.nytimes.com/2015/05/31/opinion/sunday/guess-who-doesnt-fit-in-at-work.html 25 | * https://medium.com/@joethorntonPF/structured-vs-unstructured-interviews-e35adef75db8 26 | * https://www.brentozar.com/archive/2016/04/interview-dbas-dont-ask-questions-show-screenshots/ 27 | * http://andytroutman.com/articles/2013/01/24/rockstar-programmers-are-not-assholes.html 28 | * http://www.wired.com/2014/02/smart-jerks-old-people-hard-things-company/ 29 | * https://www.shrm.org/resourcesandtools/hr-topics/technology/pages/it-employers-would-pay-15-percent-more-for-top-talent.aspx 30 | * https://medium.com/latticehq/how-much-does-employee-turnover-really-cost-d61df5eed151 31 | * http://www.b-list.org/weblog/2015/oct/19/destroy-all-hiring-processes/ 32 | * https://www.linkedin.com/today/post/article/20140527132535-50510-interviewing-engineers-is-a-team-sport 33 | * http://michaelochurch.wordpress.com/2014/02/06/if-you-stop-promoting-from-within-soon-you-cant/ 34 | * http://blog.landing.jobs/why-hunting-for-unicorns-is-bullshit-and-how-to-hire-a-great-ux-designer/ 35 | * http://www.huffingtonpost.com/susan-p-joyce/job-search-tips_b_4834361.html 36 | * http://blog.fogcreek.com/were-bad-at-interviewing-developers-and-how-to-fix-it-interview-with-kerri-miller/ 37 | * http://firstround.com/article/Mine-Your-Network-for-Early-Stage-Hiring-Gold 38 | * https://www.nczonline.net/blog/2015/09/my-favorite-interview-question/ 39 | * https://medium.com/@evnowandforever/f-you-i-quit-hiring-is-broken-bb8f3a48d324 40 | * http://blog.triplebyte.com/three-hundred-programming-interviews-in-thirty-days 41 | * 
https://medium.com/ride-tech-blog/open-sourcing-our-interviewing-preparation-guide-102021f81626 42 | * Psychology - because we're looking for good judgment 43 | * Functional literacy is disempowering. 44 | * We know how to use a tool, but not when/why. The 'when all you have is a hammer' syndrome 45 | * http://firstround.com/article/Heres-Why-Youre-Not-Hiring-the-Best-and-the-Brightest 46 | * http://www.codecademy.com/blog/142-why-building-a-data-science-team-is-deceptively-hard 47 | * http://rustyrazorblade.com/2014/09/21-ways-to-minimize-employee-retention/ 48 | * http://marlagottschalk.wordpress.com/2014/10/03/losing-talent-go-ahead-tell-yourself-its-mutual/ 49 | * http://blog.alinelerner.com/resumes-suck-heres-the-data/ 50 | * http://weblog.raganwald.com/2006/06/my-favourite-interview-question.html 51 | * http://carlos.bueno.org/2014/06/refactoring.html 52 | * http://swizec.com/blog/dear-tech-companies-this-is-not-how-you-hire-engineers/swizec/6643 53 | * http://www.brendangregg.com/blog/2017-11-13/brilliant-jerks.html 54 | 55 | ### Ways to troll recruiters 56 | 57 | "Sure, I know just the person to talk to!" <- refer them to another recruiter with a fake resume. 58 | 59 | * http://imgur.com/a/ZpNzE 60 | * http://blog.42floors.com/striking-back-recruiter-spam/ 61 | * http://qz.com/258066/this-is-why-you-dont-hire-good-developers/ 62 | * http://radar.oreilly.com/2014/10/resume-driven-development.html 63 | * http://www.cringely.com/2014/09/28/enemy-hr/ 64 | 65 |
  • Secretary Puzzle
  • 66 |
  • Why I Love Being A Programmer in Louisville (or, Why I Won’t Relocate to Wo
  • 67 |
  • adamlaiacano: "If you're going to hire 30 people, you're going to interview
  • 68 |
  • Never Have the "What Would It Take to Keep You Here?" Conversation - Rand's
  • 69 |
  • How should I add a new developer to the team? | Ars Technica
  • 70 |
  • The Most Revealing Interview Question - Referly Blog
  • 71 |
  • What Company Culture IS and IS NOT - Rand's Blog
  • 72 |
  • Insider Secrets for Hiring Great People: Avoid the Big Mistakes | LinkedIn
  • 73 |
  • Elad Blog: Reference Check Candidates
  • 74 |
  • 7 Reasons I’ll Turn Down a Job After Interviewing With You
  • 75 |
  • 76 |
  • How Stripe built one of Silicon Valley’s best engineering teams
  • 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | -------------------------------------------------------------------------------- /431 - CV of Failures.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/431 - CV of Failures.pdf -------------------------------------------------------------------------------- /431 - job searches as developer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/431 - job searches as developer.png -------------------------------------------------------------------------------- /431 - resume viz.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/431 - resume viz.png -------------------------------------------------------------------------------- /431- interviewing honesty.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/431- interviewing honesty.jpg -------------------------------------------------------------------------------- /431-decoding-job-descriptions.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/431-decoding-job-descriptions.jpg -------------------------------------------------------------------------------- /432 - Bad Work Situations.md: -------------------------------------------------------------------------------- 1 | # Bad Work Situations 2 | 3 | 
http://robertehall.com/2014/03/disengagement-economy-robert-hall-huffington-post/ 4 | 5 | What are coping mechanisms? -------------------------------------------------------------------------------- /432 - fail.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/432 - fail.jpg -------------------------------------------------------------------------------- /440 - Database Development.markdown: -------------------------------------------------------------------------------- 1 | # Database Development Series 2 | 3 | - Blog post - process of query tuning 4 | § Long running vs run many times 5 | § Eliminate blocking as a culprit 6 | § What's left 7 | § Crappy hardware 8 | § Query the optimizer can't do much with 9 | § Unreasonable demands (billions of rows) 10 | § Firehose analogy 11 | § Fire hydrant is too small (hardware) 12 | § Hose is too small (query) 13 | § Too much water (rowcount) 14 | § Factory analogy 15 | § Raw ingredients (hardware) 16 | § Factory (query opt) 17 | § Both 18 | § Get estimated and actual query plan 19 | § Check using plan explorer 20 | § Long running 21 | § Get actual query plan 22 | - Things that defeat the optimizer - case statements, functions, tables with data correlations, table variables 23 | 24 | Things that the opt loves - string literals, number-based joins, tables with random distributions, a sensible number of tables to join 25 | 26 | - Blog post on DB development best practices 27 | - Blog post on using EXCEPT or INTERSECT to make sure query refactors return the same data 28 | Best practices for ETL logging 29 | Include the spid 30 | Include @@rowcount 31 | Include sproc name 32 | Include dynamic SQL 33 | Include server name, login 34 | 35 | 36 | - Blog post idea - DB testing 37 | Regression testing 38 | Deployment validation 39 | Whitebox testing 40 | Black-box testing 41 | - ME: go to 
datamanipulation.net/sqlquerystress - load tool 42 | 43 | Blog post on release cadences 44 | The slowest wins. Everybody is forced to adjust for that. 45 | -------------------------------------------------------------------------------- /450 - Engineering Constraints.markdown: -------------------------------------------------------------------------------- 1 | # Engineering with Constraints 2 | 3 | We live in a world of constraints, trade-offs, and complex decisions. It is human nature to use heuristics and previous experience to limit our own choices and make rapid decisions (LINK TO BOOK ABOUT THIS, IN MY ROOM). 4 | 5 | Data structures in real life. Analogies to help people learn 6 | 7 | ### Time X fn_Y(Complexity) X fn_X(People) X fn(Motivation) X fn(Overhead) = constant 8 | 9 | When building software, there are some inherent limits. 10 | 11 | Venn diagram between speed, features, code debt/cleanliness/bugs 12 | 13 | What is the relationship between tools, brains, complexity, opportunity, and judgment? 14 | 15 | Blog idea = data quality, requirements, brains - comm overhead = constant 16 | 17 | The implications are unsettling. 18 | 19 | Human factors in engineering 20 | • Nonlinear factors 21 | • Non-determinism 22 | • Error rates 23 | 24 | #### Approach 1: Use documentation to reduce time required 25 | 26 | The idea is noble: use documentation to reduce complexity. Unfortunately, it's also not well thought out. Documentation has inherent bias, and is usually out of date a few minutes after it's written. The only exceptions to this appear to be documentation that is automatically rebuilt from the code. After all, **The Code Is The Law** (LINK). 27 | 28 | #### Approach 2: Add more people to speed things up 29 | 30 | Anyone who has read the Mythical Man-Month (LINK) knows this limitation. People don't scale. The communication overhead involved rapidly makes it harder . 
This is especially true with unskilled or unmotivated people; it's often faster without them than with them. 31 | 32 | #### Approach 3: Make things simpler 33 | 34 | This is a great idea in general. Unfortunately, making things *too* simple runs into the opposite problem: it's hard to do anything without adding complexity. 35 | 36 | #### Approach 4: No process 37 | 38 | This is also a decent idea. However, it depends on second-order effects of your engineering team. They need to be adaptable and self-critical. The gain is often increased speed as inefficient processes are removed. 39 | 40 | Adds overhead, process 41 | Get enough of the big picture, get the details, and GO 42 | Best way to go fast is to go slow, and pare down to the essentials 43 | Get better at working not by studying how to work, but by working and using reflective practice 44 | Formality is overrated. 45 | 46 | #### Work everybody harder 47 | 48 | This may work in the short term. In the long term, it is less efficient: exhaustion, attrition, and morale problems creep up. There are also physiological limits; it's statistically unlikely that your engineering team can maintain 20-hour days or 100-hour weeks and stay mentally sharp. 49 | 50 | Also, it sends a terrible message. Any executive or manager who thinks their employees should feel grateful for a grueling job at a pittance doesn't understand human psychology. People aren't robots. Their reactions are entirely non-linear and unpredictable, for which I'm grateful. 51 | 52 | #### Flatten 53 | 54 | This is one of my favorite approaches, because it removes overhead. It is also motivating. 55 | 56 | #### Make an 'innovative' team inside an old beast 57 | 58 | Blog post on agile development inside a waterfall framework 59 | • Messaging 60 | • Fitting stuff inside a timeline 61 | Inevitable friction arises. 62 | It also causes resentment on both sides. A rockstar team will feel like they are 'propping up' all these crappy other groups. 
The other groups will feel marginalized. 63 | 64 | #### Use a 'framework' or 'layer' to encapsulate and extend. 65 | 66 | Most of the time you're not reducing complexity. You're just hiding it. 67 | 68 | ### Price vs. value is non-linear. You also end up with various interfaces and APIs that you have to maintain 69 | 70 | 71 | 72 | Ratio of code to features 73 | 74 | - Blog post idea - price vs value 75 | § It's not linear 76 | § It's exponential 77 | 78 | Time series illustrates this well 79 | 80 | 81 | ### Trust is non-linear 82 | 83 | - Tension between prototyping (agile, incomplete) and the degradation of trust 84 | 85 | Requirements & user expectations is where things break down. Can't be agile & get them in a waterfall fashion 86 | 87 | 88 | 89 | ### Prototypes and Engineering 90 | 91 | * Humility 92 | * Getting it right the first time 93 | * Private failures and public successes 94 | * Knowing the goal is important 95 | - Tension between prototyping (agile, incomplete) and the degradation of trust 96 | Blog post on adapting to new changes 97 | Safely 98 | Wisely 99 | 100 | Trial periods are good for this 101 | 102 | http://blog.hut8labs.com/speeding-up-your-eng-org-part-i.html 103 | 104 | ### Ideas vs Execution 105 | 106 | "No business plan survives contact with reality" 107 | "No architecture survives contact with hardware" 108 | - Formality is overrated 109 | Adds overhead, process 110 | Get enough of the big picture, get the details, and GO 111 | Best way to go fast is to go slow, and pare down to the essentials 112 | Get better at working not by studying how to work, but by working and using reflective practice 113 | 114 | http://ejohn.org/blog/write-code-every-day/ 115 | 116 | http://scottberkun.com/2014/critique-dont-fuck-up-culture/ 117 | 118 | http://users.ece.utexas.edu/~adnan/pike.html 119 | 120 | 121 | 122 | # WHERE TO ADD? 
123 | http://highscalability.com/blog/2012/2/27/zen-and-the-art-of-scaling-a-koan-and-epigram-approach.html 124 | - How to improve processes between business, developers, and DBAs 125 | 126 | - Agile requires good working conditions. Why? Because the pace is so rapid that people are the domain knowledge. They're even more critical. That means turnover is more disruptive than in slower organizations. 127 | § Blog: people add process to compensate for individual failings. And to set expectations. Why not go for more competence instead? Isn't process defeatist? 128 | -------------------------------------------------------------------------------- /460 - Reputation Systems and PageRank.markdown: -------------------------------------------------------------------------------- 1 | - Learn more, blog about PageRank algorithm 2 | • How can it be used elsewhere? 3 | 4 | Reputation 5 | 6 | 7 | ### Next Few Years 8 | * Data Science over the next few years will be darwinistic. 9 | * Companies that can be data-driven will thrive. 10 | * Others will die 11 | * Data Science as a C-level position. Strategic decisions about data will be C-level decisions w/ a management chain 12 | 13 | ? Can you build data creativity as a muscle? 14 | 15 | 16 | ** http://blogs.wsj.com/moneybeat/2014/12/19/buffett-reminds-his-top-managers-reputation-is-everything/ -------------------------------------------------------------------------------- /470 - Amazon.md: -------------------------------------------------------------------------------- 1 | # The 'Efficiency' Dystopia 2 | 3 | 4 | I hear a lot about 'efficiency' and 'progress'. My question, inevitably, is: who benefits? What does it cost? Do the benefits outweigh the costs? 5 | 6 | One of the biggest names in 'efficiency' and 'customer focus' is [Amazon.com](http://www.amazon.com). However, their relentless 'customer service' comes at a horrific price for the warehouse workers they 'employ'. 
7 | 8 | * [Salon.com](http://www.salon.com/2014/02/23/worse_than_wal_mart_amazons_sick_brutality_and_secret_history_of_ruthlessly_intimidating_workers/) 9 | * [The Guardian](http://www.theguardian.com/technology/2013/dec/01/week-amazon-insider-feature-treatment-employees-work) 10 | * [Re/Code](http://recode.net/2014/06/30/amazon-was-a-prison-says-former-worker/) 11 | * [Gawker](http://gawker.com/true-stories-of-life-as-an-amazon-worker-1002568208) 12 | * [The Morning Call](http://www.mcall.com/business/mc-amazon-temporary-workers-unemployment-20121215-story.html#page=1) 13 | * [Mother Jones](http://www.motherjones.com/politics/2012/02/mac-mcclelland-free-online-shipping-warehouses-labor) 14 | * [Forbes](http://www.forbes.com/sites/eamonnfingleton/2013/11/25/amazon-com-is-accused-of-slave-driving-after-bbc-secretly-videotaped-warehouse-conditions/) 15 | * [Business Insider](http://www.businessinsider.com/brutal-conditions-in-amazons-warehouses-2013-8) 16 | * [The International Business Times](http://www.ibtimes.com/amazoncoms-workers-are-low-paid-overworked-unhappy-new-employee-model-internet-age-1514780) 17 | 18 | 19 | This isn't limited to the U.S. In Germany, which has a strong union tradition, workers (read: people) are protesting [because the working conditions are inhumane](http://seattletimes.com/html/specialreportspages/2024340124_amazongermanyxml.html). Clearly this is because Germans are well known to be good-for-nothing slackers. 20 | 21 | In the U.K., the BBC did an [undercover investigation](https://www.youtube.com/watch?v=CXWJ4GfQ22E) to show what working at one of the warehouses is like. 22 | 23 | It is painfully obvious that the people who work to deliver goods for Amazon aren't treated like people; they are treated like cogs, to be worn down and discarded, because *there are always more cogs*. Letting people work at a sane pace, or giving them access to medical care, heat, or decent bathroom breaks, would put a (minuscule) dent in the bottom line. 
24 | 25 | This is a more profitable way to run a business. That's the goal, right? -------------------------------------------------------------------------------- /480 - computing women.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480 - computing women.jpg -------------------------------------------------------------------------------- /480 - lego_gender.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480 - lego_gender.jpg -------------------------------------------------------------------------------- /480 - racism and bigotry.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480 - racism and bigotry.jpg -------------------------------------------------------------------------------- /480 - recruiting WIT.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480 - recruiting WIT.jpg -------------------------------------------------------------------------------- /480 - what happens we're out.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480 - what happens we're out.png -------------------------------------------------------------------------------- /480 - women_astronomer.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480 - women_astronomer.jpg 
-------------------------------------------------------------------------------- /480- perfectcrime.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/480- perfectcrime.png -------------------------------------------------------------------------------- /490 - Chart of Cosmic Exploration.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/490 - Chart of Cosmic Exploration.jpg -------------------------------------------------------------------------------- /490 - scientific method.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/490 - scientific method.jpg -------------------------------------------------------------------------------- /490 - what would feynman.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/490 - what would feynman.png -------------------------------------------------------------------------------- /500 - Intro to Caching and Core Algos.markdown: -------------------------------------------------------------------------------- 1 | # Caching 2 | 3 | 4 | • Caching and eviction policies 5 | § LRU (least recently used) 6 | § Time-based 7 | § LRU w/ priority (risk) 8 | • Ways to store data in cache for fast retrieval 9 | § Similar to Thomas Kesjer's 'grade of the steel' blog posts 10 | § Use Powershell? C#? C++? 11 | • O(1) and O(log N) 12 | 13 | Go over core algorithms. Think about Peter Thiel's limitations of current software to rank them by importance. 
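A minimal sketch of the LRU (least recently used) eviction policy from the outline above, written in Python rather than the Powershell/C#/C++ options floated there. The class name `LRUCache` and its `capacity` parameter are made up for illustration; nothing here is taken from an existing post. It leans on `collections.OrderedDict`, which keeps keys in insertion order and can move a key to the end or pop from the front in O(1), so lookups, inserts, and evictions all stay O(1).

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used first, most recent last

    def get(self, key, default=None):
        """Return the cached value and mark the key as most recently used."""
        if key not in self._items:
            return default
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert or update a key, evicting the oldest entry if over capacity."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def __contains__(self, key):
        # Membership checks deliberately do not count as a "use".
        return key in self._items

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")        # "a" is now the most recently used key
    cache.put("c", 3)     # over capacity, so "b" is evicted
    print("b" in cache, "a" in cache, "c" in cache)
```

A time-based policy would store an expiry timestamp next to each value and check it in `get`; LRU with priority would need a second ordering (for example a heap), which is where the O(log N) cost in the outline comes from.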
14 | 15 | - Blog post - Core algorithms 16 | • Hashing and partitioning algorithms 17 | § How to deal with skew? 18 | • Compression algorithms 19 | § Common 20 | § State of the art 21 | § Leverage data patterns for better compression (columnar) 22 | • Caching and eviction policies 23 | § LRU (least recently used) 24 | § Time-based 25 | § LRU w/ priority (risk) 26 | • Ways to store data in cache for fast retrieval 27 | § Similar to Thomas Kesjer's 'grade of the steel' blog posts 28 | § Use Powershell? C#? C++? 29 | • O(1) and O(log N) 30 | 31 | Vector computation - a panacea or not? 32 | 33 | * http://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu-caches-work-and-why-theyre-an-essential-part-of-modern-chips 34 | * http://igoro.com/archive/gallery-of-processor-cache-effects/ 35 | * http://www.damninteresting.com/on-the-origin-of-circuits/ 36 | * http://www.reedbeta.com/blog/2015/01/12/data-oriented-hash-table/ 37 | 38 | Columnar compression - pull out as a CSV, flip into rows using AWK, http://www.unix.com/shell-programming-and-scripting/211181-converting-rows-columns-csv-file.html , and try compressing that way. -------------------------------------------------------------------------------- /501 - Moore's Law.md: -------------------------------------------------------------------------------- 1 | # Moore's Law 2 | 3 | * http://www.extremetech.com/computing/178529-this-is-what-the-death-of-moores-law-looks-like-euv-paused-indefinitely-450mm-wafers-halted-and-no-path-beyond-14nm# 4 | * What are the implications? 5 | * http://fgiesen.wordpress.com/2014/07/07/cache-coherency/ -------------------------------------------------------------------------------- /502 - Self-Documenting Code.md: -------------------------------------------------------------------------------- 1 | # Self-Documenting Code 2 | 3 | * Is it a myth? 4 | * What does it look like? 5 | * Is a spectrum? 
-------------------------------------------------------------------------------- /510 - Analysis of Brilliant People.markdown: -------------------------------------------------------------------------------- 1 | - Analysis of brilliant people 2 | 3 | Look for common traits 4 | Learn from the best. 5 | 6 | 7 | ### Balance 8 | 9 | * Mastery of a skill comes by working for long periods of time. 10 | * Life happens while we make other plans. The unexpected is a fertile source of new ideas. 11 | 12 | These two statements are both true, and also contradictory. It's a struggle to find a good balance. 13 | 14 | Premise: actions speak louder than words. If we want to be better, we should learn from those people who made a big impression in their time. 15 | 16 | What general lessons do they have? What threads are there in common? 17 | 18 | The assumption is that they were more than just smart. They also had a process and lessons that helped translate that intelligence into results. 19 | 20 | "It is necessary for you to learn from others' mistakes. You will not live long enough to make them all yourself." - Hyman G. 
Rickover 21 | 22 | "The way to tell a great idea is that, when people hear it, they say, 'Gee, I could have thought of that.'" – Feynman, quoted by Townes 23 | 24 | 25 | * http://www.hanselman.com/blog/ScottHanselmansCompleteListOfProductivityTips.aspx 26 | * https://podio.com/site/creative-routines 27 | * http://www.moreintelligentlife.com/content/edward-carr/last-days-polymath 28 | * http://nautil.us/issue/18/genius/super_intelligent-humans-are-coming 29 | * http://nautil.us/issue/18/genius/if-you-think-youre-a-genius-youre-crazy 30 | * http://nautil.us/issue/19/illusions/the-loneliest-genius 31 | * http://seekingintellect.com/2014/12/17/practical-advice-from-leonardo-da-vinci-on-learning-and-honing-your-craft.html 32 | * http://ethanwiner.com/adultbeg.html 33 | * http://www.wired.com/2015/05/inside-ilm/ 34 | * http://www.nytimes.com/2015/07/26/magazine/the-singular-mind-of-terry-tao.html 35 | * http://www.brainpickings.org/2015/01/29/music-brain-ted-ed/ 36 | * http://nautil.us/blog/how-a-genius-is-different-from-a-really-smart-person 37 | 38 | Lessons learned from Genius 39 | 40 | 41 | Processing Rank: 42 | 1. Einstein 43 | 1. http://higherpayingskills.com/2011/12/how-einstein-got-smart-learning/ 44 | 2. Thomas Jefferson 45 | 3. Ben Franklin 46 | 4. Napoleon 47 | 5. Leonardo da Vinci 48 | 6. Tesla 49 | 7. Stephen Hawking 50 | 8. Isaac Newton 51 | 9. Marie Curie 52 | 10. Alan Turing 53 | 11. Thomas Edison 54 | 12. 
Steve Jobs


General savants
    Tesla
    Teddy Roosevelt
    Edison
    Napoleon
    Thomas Jefferson
    Peter the Great
    Leonardo da Vinci
    Leon Battista Alberti
    Aristotle
    Archimedes
    Omar Khayyam
    Frederick II (Frederick the Great)
    Albertus Magnus
    Ben Franklin
    Goethe
    Henri Poincaré
Physics geniuses
    Einstein
    Newton
    Feynman
    Bohr
    Stephen Hawking
    Galileo Galilei
    Rene Descartes
    Pascal
    Other Manhattan Project folks
    Marie Curie
    Carl Sagan
    John von Neumann
Computer geniuses
    Turing
    Coders At Work
    Steve Jobs
    Nathan Myhrvold
    Herbert A Simon

Reading Material
Sleep Habits - http://amolife.com/personality/great-people-sleep-less.html

http://carymillsap.blogspot.com/2014/02/how-did-you-learn-so-much-stuff-about.html?m=1

I admire the intersection of passion, ethics, and competence.
Turn this into a Venn diagram
Passion alone - flailing around
Ethics alone - ivory tower debates
Competence alone - amoral burnout
Passion and ethics, no competence - hippies
Passion and competence, no ethics - CEOs
Ethics and competence, no passion -

What about how they grew up? What were their influences?
109 | 110 | http://ayearofproductivity.com/top-lessons-learned-a-year-of-productivity/ 111 | 112 | http://www.newyorker.com/tech/elements/walking-helps-us-think 113 | -------------------------------------------------------------------------------- /510 - Brilliant People.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/510 - Brilliant People.png -------------------------------------------------------------------------------- /510 - Smart People Traits.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/510 - Smart People Traits.xlsx -------------------------------------------------------------------------------- /520 - healthy foods.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/520 - healthy foods.jpg -------------------------------------------------------------------------------- /520.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/520.jpg -------------------------------------------------------------------------------- /530 - 10 commands of architecture.markdown: -------------------------------------------------------------------------------- 1 | # 10 Commandments of System Architecture 2 | 3 | - Blog post idea - 10 commandments of system architecture 4 | Each commandment comes from a truth of the world 5 | Truths of the world are a web 6 | No singletons 7 | Systems must scale 8 | Scale-out is cheaper & more efficient than scale-up 9 | Coupling is bad 10 | Development time is expense 11 | Time is money 12 | Time is short 13 | Interfaces are 
good things
Design must be easily refactored
80/20 rule
Enlightened and lazy
Time is money
Systems have inertia
Change is a constant
• Blog post idea - how do we think about systems architecture?
• Architect - needs to think at multiple levels of abstraction. Jump between them
    § The easier the better

- http://highscalability.com/blog/2012/2/27/zen-and-the-art-of-scaling-a-koan-and-epigram-approach.html

## System Architecture Commandments
Each commandment comes from a truth of the world
Truths of the world are a web
No singletons
Systems must scale
Scale-out is cheaper & more efficient than scale-up
Coupling is bad
Development time is expense
Time is money
Time is short
Interfaces are good things
Design must be easily refactored
80/20 rule
Enlightened and lazy
Time is money
Systems have inertia
Change is a constant
• Blog post idea - how do we think about systems architecture?
• Architect - needs to think at multiple levels of abstraction. Jump between them
    § The easier the better



* Software engineering
* Trust, forgiveness, bad behavior & limits
* Boundaries and interfaces
* Abusive behavior
* Conway's Law
Therefore a failure to understand humans is a serious weakness
- Human factors in engineering
    • Nonlinear factors
    • Non-determinism
    • Error rates
    • Efficiency
--------------------------------------------------------------------------------
/540 - Learning and Retention Methods.markdown:
--------------------------------------------------------------------------------
## Learning and Retention Methods

Mental capacity (RAM)
Computers (hard drive)
Goals:
* Latency
* Accuracy
* Depth
* Breadth
* Connection

How people think is important.

1-back and 2-back test

Software engineers build from their minds. The same way that a construction worker benefits in all sorts of subtle ways from keeping in shape, software engineers benefit from keeping their brains active and healthy.

Music
Sleep/rest
Meditation
Drugs - like steroids.
--------------------------------------------------------------------------------
/550 - SQL on RDS.markdown:
--------------------------------------------------------------------------------
# An Introduction to SQL Server on Amazon RDS

> How to calculate IOPS on your current server

EBS and its limitations

Planning for failure. AWS forces architecture to a higher standard.

### Blog Post Planning

* Replaces DBAs. More particularly, it moves them up the value chain, to more complicated operations roles, or more development/business work.
* Put AdventureWorks on the server
* Come up with a mix of CRUD operations for AdventureWorks
* Mix of procs and direct queries. That's normal.
* Use SQLIOSim (or something else?) for load testing.
* Do it from a different EC2 instance in the same region.
* Test the network latency & bandwidth before doing so.
* Run this on multiple different machines in different regions.
* Look at Scalyr to see how many machines they needed for statistical significance (representative sample)
* Compare to SQL Azure
* Run from a different machine in same region
* But first, measure bandwidth and latency
* Use PowerShell remoting - learn how first.

**Number of Tests**

* Per region, per instance size
* Enough for statistical certainty. Say, 20 each

**Price comparisons**
* Price per month
* Price per hour
* 99th percentile for each query. Price vs query runtime (inverse) vs concurrency.
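
The price comparison above could be computed roughly as follows. This is a minimal sketch, not a finished benchmark harness: the sample runtimes, the hourly price, and the helper names `percentile` and `cost_per_query` are all made up for illustration, and the percentile uses the simple nearest-rank definition.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def cost_per_query(price_per_hour, runtime_seconds, concurrency=1):
    """Rough cost of one query when `concurrency` queries share the instance evenly."""
    return price_per_hour * (runtime_seconds / 3600) / concurrency


# Hypothetical runtimes (seconds) from 20 runs of one query on one instance size.
runtimes = [0.8, 1.1, 0.9, 1.0, 4.2, 1.2, 0.95, 1.05, 1.3, 0.85,
            1.0, 1.1, 0.9, 1.15, 1.25, 0.98, 1.02, 1.4, 0.88, 1.07]

p99 = percentile(runtimes, 99)
# 0.50 is a placeholder hourly price, not a real quote.
print(p99, cost_per_query(price_per_hour=0.50, runtime_seconds=p99, concurrency=4))
```

With only 20 samples the 99th percentile is simply the slowest run, which is one reason to check whether 20 tests per region and instance size is really enough.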

--------------------------------------------------------------------------------
/560 - Balance.markdown:
--------------------------------------------------------------------------------
- Extremes are silly
    • Things that are good become bad w/ too much of them
    • Balance is necessary
    • That's why a straw man is a stupid idea.

http://www.theatlantic.com/health/archive/2014/06/the-dark-knight-of-the-souls/372766/


--------------------------------------------------------------------------------
/570 - Housing Using Data.md:
--------------------------------------------------------------------------------
# Housing Using Data

http://dealloc.me/2014/05/24/opendata-house-hunting/
http://www.nytimes.com/2014/07/20/realestate/using-data-to-find-a-new-york-suburb-that-fits.html?_r=0
--------------------------------------------------------------------------------
/600 - Advanced ETL Approaches.markdown:
--------------------------------------------------------------------------------
# Titles

* You Don't Know ETL
* Dr. ETL Meet Hyde
* Dr. ETL Meet Data Hyde
* Dr. Data Meet ETL Hyde
* **Advanced ETL Using T-SQL**
* ETL Alchemy
* You Can't Handle the ETL

# Summary

One of the most common, complicated problems for data professionals is turning oddly-structured data into clean data. In this session we will look at practical, proven ways to solve complicated data-transformation problems using T-SQL.


# Abstract
One of the most common, complicated problems for data professionals is turning oddly-structured data into clean data. This problem is getting more and more common. Data is increasing in size and complexity, and the most efficient format for analyzing it is rarely the original one.

In this session we will look at practical, proven ways to solve complicated data-transformation problems using T-SQL.
Examples include denormalizing historical dimensions (Type-2), billing system ETL, the bill-of-materials problem, multithreading interdependent ETL processing, and advanced change detection methods. You'll learn general techniques to tackle any data-transformation problem in your ETL processing. 20 | 21 | 22 | @DevNambi Yes, please, you should still submit your session. Be clear about your main objectives, those will stand out. 23 | 24 | 25 | 26 | 27 | # Session Notes 28 | 29 | Versioning 30 | Fact versioning 31 | aggregation based on type-1 dimensions 32 | 33 | ETL framework 34 | type-2 denormalization 35 | iterator vs. CBL 36 | ETL parallelism 37 | digraph 38 | 'Real-time' ETL vs. not 39 | Comes with a warning. 40 | Retention trade-off 41 | No aggregation trade-off 42 | 43 | Type-1 denormalization 44 | Type-2 denormalization 45 | Change detection 46 | 47 | Relational for things it's not designed for 48 | Joint courses ETL 49 | 50 | Evaluate tools to see if they do this 51 | 52 | 53 | ## Tools 54 | Powershell 55 | T-SQL 56 | SSIS 57 | Pitch the SSIS extensions 58 | Hadoop 59 | Python 60 | C# - *not* a good idea -------------------------------------------------------------------------------- /610 - ETL tips and Tricks.markdown: -------------------------------------------------------------------------------- 1 | # ETL Tips and Tricks 2 | 3 | Logging 4 | 80/20 rule for bottlenecks 5 | Parallelism 6 | Amdahl's law 7 | Load frequency wags the dog. 8 | ETL 'frameworks' & not-invented-here syndrome 9 | 10 | Difficulty = Data Volume X Load Frequency X Types of ETL / Talent^2 11 | 12 | ## What is best for Hadoop / Hive / Pig / Cascading / PoSH? 

### Do a comparison-contrast

--------------------------------------------------------------------------------
/620 - Data Science Intro.markdown:
--------------------------------------------------------------------------------
# Titles
* Data Science: Field of Vision
* Data Science: Beyond the Hype
* Data Science: Beyond the Hype Cycle
* Machine Learning for Mere Mortals

# Summary

Machine learning is a way to find meaning in data. This is a fun and gentle introduction to the world of machine learning. You'll learn to implement common techniques in T-SQL and solve everyday problems.

# Abstract

Machine learning is a hybrid of computer science and math. It's used everywhere: web search (Google, Bing), recommendation engines (Netflix, Amazon, LinkedIn), computer vision (self-driving cars), and natural language processing (Google Translate, Klout).

The basics of machine learning are simple. You don't need to be a level 18 data scientist to use machine learning to solve problems.

Join fellow data geek Dev Nambi in this fun and gentle introduction to the world of machine learning. We will cover common techniques such as clustering, supervised vs unsupervised learning, and learning at scale. Finally, you'll learn how to implement common machine learning techniques in T-SQL.

### Abandoned Abstracts

'Data Science' is all the rage these days. Most of the people I've spoken with are hesitant, probably because they weren't good at math.

All of the techniques in data science are pretty intuitive once you see what they're about.

This session will be a fast introduction to the world of data science.

We'll look at the software side of things, including feature extraction and rapid prototyping.

We'll look at the business side of things, including

We'll also dive into its use, including clustering, recommender systems, natural language processing, and computer vision.

Machine learning is the science of building predictive models from available data, in order to predict the behavior of new data.

# Content

## Business

### Story-telling

###

## Math

### Machine Learning

### Statistics

## Engineering

Development
'Big Data'
Optimization
It's all about scripting
Quick and dirty is the point.

## Common Perspectives

It's about the scientific method
You don't know what's going to work ahead of time
Experience makes you stop asking stupid questions. But you can get jaded
Curiosity helps you ask stupid questions.


## Applications

Natural language processing
Searching for correlations
Grouping similar objects together
Pricing
Behavior
Identifying unexpected relationships
System linkages (Splunk)
Purchasing behavior (retailers, Amazon)
People (dating, LinkedIn, recruiter)
Music (Pandora)
Movies (Netflix)



Machine learning / data mining

--------------------------------------------------------------------------------
/621 - photo.JPG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/621 - photo.JPG
--------------------------------------------------------------------------------
/640 - Making Data Friendly Organizations.markdown:
--------------------------------------------------------------------------------

* Well rounded data scientists are pretty rare.
* Managers are thinking holistically (type 2)
* Scientists are thinking more tactically (type 1)

This was an exercise in feature engineering and exploratory data analysis
* Barga wanted to learn something about the members of this class
* Expected to see natural clusters of profiles
* How would you measure similarity?
* "We want to understand who our customers are, how they use our product."
* You have to start w/ some features, initial criteria for calibrating

What would I do to improve the process for a subsequent round?
* A student learning this stuff has a different scale
* How do we define expert, and could we infer it from other data?
* What would it mean to standardize the scale?
* What features would you add?

*You will always screw up cohort clustering the first time*

### Next Few Years
* Data Science over the next few years will be Darwinian.
* Companies that can be data-driven will thrive.
* Others will die
* Data Science as a C-level position. Strategic decisions about data will be C-level decisions w/ a management chain

? Can you build data creativity as a muscle?
--------------------------------------------------------------------------------
/650 - Data To Decisions Education Abstract.html:
--------------------------------------------------------------------------------

    Colleges, Majors and Tuition - using data to make decisions

    Step 1: Ask questions

    Step 2: Look at data

    Step 3: Profit

    Data is growing faster than ever. Anyone who can use data to make decisions has a big advantage and is in high demand.

    Join fellow data geek Dev Nambi (@DevNambi) and learn how to answer thorny questions about picking a college, analyzing majors, and looking at tuition. We'll use clever questions, free data, and common tools like Excel, T-SQL and PowerShell.

    You'll also learn general techniques to make sound data-based decisions for any problem.

--------------------------------------------------------------------------------
/650 - Data to Decisions Ed abstract.md:
--------------------------------------------------------------------------------
### Colleges, Majors and Tuition - using data to make decisions

*Step 1: Ask questions*

*Step 2: Look at data*

*Step 3: Profit*

Data is growing faster than ever. Anyone who can use data to make decisions has a big advantage and is in high demand.

Join fellow data geek Dev Nambi (@DevNambi) and learn how to answer thorny questions about picking a college, analyzing majors, and looking at tuition. We'll use clever questions, free data, and common tools like Excel, T-SQL and PowerShell.

You'll also learn general techniques to make sound data-based decisions for any problem.
--------------------------------------------------------------------------------
/700 - autotrader_scrape.py:
--------------------------------------------------------------------------------
from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen
from time import sleep  # be nice
import re

BASE_URL = 'http://www.autotrader.com'


def f7(seq):
    """De-duplicate a sequence while preserving its order."""
    seen = set()
    seen_add = seen.add
    return [x for x in seq if x not in seen and not seen_add(x)]


def make_soup(url):
    """Download a page and parse it with the lxml parser."""
    return BeautifulSoup(urlopen(url).read(), "lxml")


def get_links(url):
    """Collect absolute links to vehicle detail pages from a search results page."""
    soup = make_soup(url)
    links = [BASE_URL + link['href'] for link in soup.find_all('a', href=re.compile('vehicledetails'))]
    return links


def get_details(url):
    """Scrape the stats table of a single vehicle detail page into a dict."""
    soup = make_soup(url)
    table = soup.find('table', class_='vehicle-stats')
    atid = table.find('td', text='AT Car ID:').next_sibling.get_text()[:11]
    price = table.find('span', class_='primary-price').get_text()
    mileage = table.find('td', text='Mileage').next_sibling.get_text()
    body = table.find('td', text='Body Style').next_sibling.get_text()
    color = table.find('td', text='Exterior Color').next_sibling.get_text()
    drive = table.find('td', text='Drive Type').next_sibling.get_text()
    fuel = table.find('td', text='Fuel Type').next_sibling.get_text()
    doors = table.find('td', text='Doors').next_sibling.get_text()
    return {"atid": atid,
            "price": price,
            "mileage": mileage,
            "body": body,
            "color": color,
            "drive": drive,
            "fuel": fuel,
            "doors": doors}


if __name__ == '__main__':

    url = 'http://www.autotrader.com/cars-for-sale/searchresults.xhtml?zip=98103&Log=0&modelCode1=CTS&makeCode1=CAD&searchRadius=25&mmt=%5BCAD%5BCTS%5B%5D%5D%5B%5D%5D&showcaseListingId=353441599&showcaseOwnerId=100016026&captureSearch=true&showToolbar=true&Log=0'

    links = get_links(url)
    links = f7(links)  # de-dupe
    print(len(links))
    for link in links:
        data = get_details(link)
        print(data)
        sleep(1)  # be nice

--------------------------------------------------------------------------------
/9900 - Cloud Uploads.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - Cloud Uploads.jpg
--------------------------------------------------------------------------------
/9900 - Graphical Models.PDF:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - Graphical Models.PDF
--------------------------------------------------------------------------------
/9900 - IT_roles.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - IT_roles.jpg
--------------------------------------------------------------------------------
/9900 - commit linkbait.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - commit linkbait.jpg -------------------------------------------------------------------------------- /9900 - complexity kills.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - complexity kills.jpg -------------------------------------------------------------------------------- /9900 - complexity.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - complexity.jpg -------------------------------------------------------------------------------- /9900 - devops and security.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - devops and security.jpg -------------------------------------------------------------------------------- /9900 - enterprise-it.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - enterprise-it.png -------------------------------------------------------------------------------- /9900 - git undo flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - git undo flowchart.png -------------------------------------------------------------------------------- /9900 - ie-must-die.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - ie-must-die.jpg -------------------------------------------------------------------------------- /9900 - javascript.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - javascript.png -------------------------------------------------------------------------------- /9900 - linux perf tools.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - linux perf tools.jpg -------------------------------------------------------------------------------- /9900 - multithreading.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - multithreading.jpg -------------------------------------------------------------------------------- /9900 - programmer_style.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - programmer_style.png -------------------------------------------------------------------------------- /9900 - programming spec.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - programming spec.jpg -------------------------------------------------------------------------------- /9900 - reading software.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - reading software.png 
-------------------------------------------------------------------------------- /9900 - software-engineer.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - software-engineer.jpg -------------------------------------------------------------------------------- /9900 - stackoverflow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - stackoverflow.png -------------------------------------------------------------------------------- /9900 - wicked problems.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9900 - wicked problems.jpg -------------------------------------------------------------------------------- /9901 - GDP vs GNH.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9901 - GDP vs GNH.jpg -------------------------------------------------------------------------------- /9901 - Smartphone Crossing.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9901 - Smartphone Crossing.jpg -------------------------------------------------------------------------------- /9901 - learning stages.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9901 - learning stages.jpg -------------------------------------------------------------------------------- /9901 - profanity motivation.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9901 - profanity motivation.jpg -------------------------------------------------------------------------------- /9903 - CEO streamlining.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9903 - CEO streamlining.jpg -------------------------------------------------------------------------------- /9903 - Robots and labor.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9903 - Robots and labor.jpg -------------------------------------------------------------------------------- /9903 - counter-Varian Rule.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9903 - counter-Varian Rule.jpg -------------------------------------------------------------------------------- /9903 - trickle down economics.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9903 - trickle down economics.jpg -------------------------------------------------------------------------------- /9904 - 2016-12-9-gans.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - 2016-12-9-gans.pdf -------------------------------------------------------------------------------- /9904 - Big Data Deities.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - Big Data Deities.png -------------------------------------------------------------------------------- /9904 - Overfitting diagram.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - Overfitting diagram.jpg -------------------------------------------------------------------------------- /9904 - RoadToDataScientist1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - RoadToDataScientist1.png -------------------------------------------------------------------------------- /9904 - Scikit_Learn_Cheat_Sheet_Python.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - Scikit_Learn_Cheat_Sheet_Python.pdf -------------------------------------------------------------------------------- /9904 - data science funny.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - data science funny.jpg -------------------------------------------------------------------------------- /9904 - data science over time.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - data science over time.png -------------------------------------------------------------------------------- /9904 - data science skills venn.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - data science skills venn.jpg -------------------------------------------------------------------------------- /9904 - data viz.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - data viz.png -------------------------------------------------------------------------------- /9904 - data-science-venn-diagram.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - data-science-venn-diagram.jpg -------------------------------------------------------------------------------- /9904 - machine learning industry.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - machine learning industry.png -------------------------------------------------------------------------------- /9904 - ml libraries.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - ml libraries.png -------------------------------------------------------------------------------- /9904 - never use piece charts.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - never use piece charts.jpg -------------------------------------------------------------------------------- /9904 - stats-trick-question.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - stats-trick-question.jpg -------------------------------------------------------------------------------- /9904 - storytelling.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - storytelling.jpg -------------------------------------------------------------------------------- /9904 - tools.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904 - tools.jpg -------------------------------------------------------------------------------- /9904.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9904.jpg -------------------------------------------------------------------------------- /9905 - Parenting Chores over Time.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9905 - Parenting Chores over Time.jpg -------------------------------------------------------------------------------- /9905 - Parenting Iron Triangle.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9905 - Parenting Iron Triangle.jpg -------------------------------------------------------------------------------- /9905 - money and time.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9905 - money and time.jpg 
-------------------------------------------------------------------------------- /9905 - no hipsters.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9905 - no hipsters.jpg -------------------------------------------------------------------------------- /9905 - why people become unhappy.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9905 - why people become unhappy.jpeg -------------------------------------------------------------------------------- /9906 - Security Links.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9906 - Security Links.md -------------------------------------------------------------------------------- /9906 - time to crack password.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9906 - time to crack password.png -------------------------------------------------------------------------------- /9907 - cheese wheel.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9907 - cheese wheel.jpg -------------------------------------------------------------------------------- /9907 - dentist prices.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9907 - dentist prices.pdf -------------------------------------------------------------------------------- /9907 - growth of 
hospital admins.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9907 - growth of hospital admins.png -------------------------------------------------------------------------------- /9907 - overweight.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9907 - overweight.jpg -------------------------------------------------------------------------------- /9907 - recipe recommendation ML.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9907 - recipe recommendation ML.pdf -------------------------------------------------------------------------------- /9907 - vaccines.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9907 - vaccines.gif -------------------------------------------------------------------------------- /9908 - Academia Misincentives.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - Academia Misincentives.jpg -------------------------------------------------------------------------------- /9908 - College and Career.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - College and Career.jpg -------------------------------------------------------------------------------- /9908 - NFL odds.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - NFL odds.jpg -------------------------------------------------------------------------------- /9908 - academic minions.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - academic minions.jpg -------------------------------------------------------------------------------- /9908 - education retention.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - education retention.jpg -------------------------------------------------------------------------------- /9908 - game of loans.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - game of loans.jpg -------------------------------------------------------------------------------- /9908 - goal of education.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - goal of education.jpg -------------------------------------------------------------------------------- /9908 - incentives.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - incentives.jpg -------------------------------------------------------------------------------- /9908 - teacher feedback funny.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - teacher feedback funny.jpg -------------------------------------------------------------------------------- /9908 - textbooks.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908 - textbooks.png -------------------------------------------------------------------------------- /9908.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9908.jpg -------------------------------------------------------------------------------- /9909 - Education Reform Warnings.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9909 - Education Reform Warnings.pdf -------------------------------------------------------------------------------- /9909 - Skunk Works Leadership.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9909 - Skunk Works Leadership.png -------------------------------------------------------------------------------- /9909 - get out of the way.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9909 - get out of the way.jpg -------------------------------------------------------------------------------- /9909 - org charts.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9909 - org charts.jpg 
-------------------------------------------------------------------------------- /9909 - typical conversation with managers.webm: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9909 - typical conversation with managers.webm -------------------------------------------------------------------------------- /9910 - mvp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9910 - mvp.png -------------------------------------------------------------------------------- /9910 - sick burn by new yorker.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9910 - sick burn by new yorker.jpg -------------------------------------------------------------------------------- /9911 - CBP Task Group Out-brief Slides_FINAL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9911 - CBP Task Group Out-brief Slides_FINAL.pdf -------------------------------------------------------------------------------- /9911 - ComparisonOfVotingSystems.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9911 - ComparisonOfVotingSystems.png -------------------------------------------------------------------------------- /9911 - Terrorism causes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9911 - Terrorism causes.png 
-------------------------------------------------------------------------------- /9911 - police and recording.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9911 - police and recording.jpg -------------------------------------------------------------------------------- /9913 - Net Neutrality.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9913 - Net Neutrality.png -------------------------------------------------------------------------------- /9913 - coca cola.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9913 - coca cola.png -------------------------------------------------------------------------------- /9913 - misbehaving.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9913 - misbehaving.jpg -------------------------------------------------------------------------------- /9914 - privacy vs security.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9914 - privacy vs security.jpg -------------------------------------------------------------------------------- /9915 - dont shoot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9915 - dont shoot.png -------------------------------------------------------------------------------- /9916 - how to survive police encounters.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/9916 - how to survive police encounters.jpg -------------------------------------------------------------------------------- /Archive/140 - Nonprofit_Revenue_-_Donation_Cannibalization.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/Archive/140 - Nonprofit_Revenue_-_Donation_Cannibalization.pdf -------------------------------------------------------------------------------- /Archive/2014-05-08-keynote-one.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-05-08 4 | layout: post 5 | slug: pass-analytics-keynote 6 | title: PASS Business Analytics Thursday Keynote 7 | meta-description: 8 | - pass 9 | - keynote 10 | - microsoft 11 | - sqlpass 12 | - passbac 13 | --- 14 | 15 | The [PASS Business Analytics conference](http://www.sqlpass.org/bac/2014/Home.aspx) (ADD IMAGE) has started. I had the privilege of watching the [keynote](http://www.sqlpass.org/bac/2014/Sessions/Keynote.aspx) with 600-700 fellow data nerds. 16 | 17 | I was also playing [Data Science Bingo](https://github.com/tdhopper/Data-Science-Conference-Bingo). 18 | 19 | Good lord, too many pie charts. 20 | 21 | Everyone wants to get more out of their data. 22 | 23 | Of course, we started off with Tom LaRock (t/b). 24 | 25 | "We get paid to work with data every day" (Tom LaRock) 26 | 27 | ### Getting Involved 28 | 29 | We're a bunch of like-minded pros. 30 | 31 | Virtual Chapters 32 | 33 | SQL Saturdays 34 | 35 | There's a niche for you. 36 | 37 | Next year - April 20-22 in Santa Clara. 38 | 39 | ### "Big Data, Predictive Analytics, and the Middle Market" 40 | 41 | [John Whittaker](https://twitter.com/alertsource) 42 | 43 | Sr. 
Dir. IM from Dell Software 44 | 45 | 300 respondents, 96% said that they had big-data projects in flight or were to launch them this year. 46 | 47 | Budgets range around 2-5 million. Within 2 years they'll be at the $6mil level. 48 | 49 | http://www.dell.com/learn/us/en/uscorp1/press-releases/2014-04-28-dell-software-big-data-midmarket-survey 50 | 51 | **What drives success?** 52 | 53 | * 41% strong cooperation between business and IT 54 | * 37 - strong connection between data analytics and perf mgmt 55 | * Required skills - data science - 33 56 | * Bus req complete/accurate - 32 57 | * Server/storage capacity -30% 58 | * Datacenter tools capable 29% 59 | 60 | Unknown - 61 | 62 | Very easy to be clever w/ predictive analytics. Just as easy to be creepy with it. 63 | 64 | **What are the most useful tools?** 65 | 66 | 60% real-time processing 67 | 58% predictive 68 | 56% data viz to convert processed data into actionable insights 69 | 56% cloud computing for lower cost 70 | 71 | 72 | ME - get the keynote slides! 73 | 74 | **Main challenge** 75 | 76 | Complexity, volume, budget 77 | 78 | Data complexity - where it is, cleaning it, etc is still one of the big unsolved challenges today. 79 | 80 | 50% of organizations with big data projects in flight are satisfied with their decision making (speed quality), compared to just 23% among those yet to kick off a project. 81 | 82 | Who should decide: split between IT and biz. 83 | 84 | 85 | ### Bingo Words 86 | 87 | * "Big Data" 88 | * "Complexity" 89 | * "Volume of data" 90 | * "Real time" 91 | * "Hadoop" 92 | * "NoSQL" 93 | 94 | 95 | Star Trek Redshirt Bayesian - 96 | 97 | 98 | # Amir Netz, Kamal Hathi 99 | 100 | Amir - chief designer for data platform 101 | Kamal (@kamalh) - director of engineering for BI 102 | 103 | We have an engineer as a MSFT CEO - is it soon enough? 104 | 105 | Talk about Microsoft culture.
106 | 107 | 108 | 2 mil power pivot 109 | 100K power query 110 | 55K power map 111 | only looking at downloads 112 | HDinsight - 100mil compute hours 113 | powerBI 365 - 12.5K tenants activated (companies) 114 | no 1 market share gain 2013 Gartner BI market share report. 115 | 116 | We've gone meta 117 | 118 | Tenants by date - up to around 12K tenants. Nice growth chart. 119 | 120 | 1,091K questions answered by Q&A last month. 121 | 122 | What kinds of features - most-used is auto-complete. 123 | 124 | "More tweets = more wins" for NBA finals. Not looking at population sentiment. 125 | 126 | iOS app for PowerBI will be available this summer. Native applications. No comments about Android yet, but it seems likely. 127 | 128 | Please speak about authentication for BI data on apps and cloud platforms. Active Directory, etc. 129 | 130 | SSRS running in PowerBI. Going to be available by the end of the summer. Natively integrated. Taking care of all of the infrastructure. Connects to on-premise data source. OK, that's pretty awesome. 131 | 132 | What about security? 133 | 134 | ### The Changing Face of BI 135 | 136 | First - all in IT. 6-7 years ago it wasn't working. 137 | 138 | 6-7 years ago: self-service, focused on analysts and power users. PowerPivot, etc. Self-service BI. 139 | 140 | Data Culture - give everybody the tools to satisfy their curiosity. This can be done in two ways: dumbing things down, elevating things up. 141 | 142 | 143 | ### Ways of Interacting with Data 144 | 145 | * Speed 146 | * Accuracy 147 | * Semantic meaning 148 | 149 | The approach is still dashboards and professionals, not products. 150 | 151 | 152 | "How to make data science so easy that anyone can do it?" 153 | 154 | Forecasting in Power View in Office 365 - simple forecasting. 155 | 156 | Going to be available in every line chart. 157 | Does seasonality and confidence interval. 158 | Confidence interval's based on standard deviation. Assume normal distribution. 
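A rough sketch of what a standard-deviation-based interval could look like, assuming normally distributed values as noted above. The function name `naive_forecast_interval`, the mean-as-forecast shortcut, and the sample numbers are all assumptions for illustration. This is not Power View's actual method, which also handles seasonality.

```python
import statistics


def naive_forecast_interval(history, z=1.96):
    """Return (forecast, lower, upper) for the next point.

    Assumptions: the forecast is simply the mean of the history, and
    values are roughly normal, so the interval is
    forecast +/- z * sample standard deviation (z=1.96 is about 95%).
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    forecast = statistics.mean(history)
    spread = z * statistics.stdev(history)
    return forecast, forecast - spread, forecast + spread


# Hypothetical monthly values, made up for this example.
monthly_sales = [100, 110, 90, 105, 95]
forecast, lower, upper = naive_forecast_interval(monthly_sales)
print(f"forecast={forecast:.1f}, interval=({lower:.1f}, {upper:.1f})")
```

The interval widens as the history gets noisier, which is the main thing to watch for before trusting a forecast line on a chart.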
159 | 160 | ME - do this w/ Tableau forecasting, PowerBI for Notify data. 161 | 162 | Data exploration mode- will be available, no date specified. 163 | 164 | Time series analysis. Does time-series cross-validation to learn how. That's pretty awesome. 165 | 166 | ME - show housing prices. 167 | 168 | Now there are treemaps. 169 | 170 | 171 | Talk about information retrieval. 172 | 173 | 174 | 175 | 176 | -------------------------------------------------------------------------------- /Archive/2014-05-08-keynote-two.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-05-09 4 | layout: post 5 | slug: pass-analytics-mccandless 6 | title: PASS Business Analytics Friday Keynote 7 | meta-description: 8 | - pass 9 | - keynote 10 | - microsoft 11 | - sqlpass 12 | - passbac 13 | - visualization 14 | --- 15 | 16 | Today was the last day of the [PASS Business Analytics conference](http://www.sqlpass.org/bac/2014/Home.aspx) (ADD IMAGE). 17 | 18 | The keynote today was fun; David McCandless (blog, twitter) . 19 | 20 | 21 | ### Denise 22 | 23 | * Amount of info is overwhelming. 24 | * How do you find the patterns that matter? 25 | 26 | 27 | ## David McCandless 28 | 29 | informationisbeautiful.net 30 | 31 | * What does billions look like. 32 | * Too large to really understand. Billion dollar-o-gram. Color are 33 | * The story is the connections between normally unconnected data. 34 | * Cost per taxpayer per day. It's a number and scale we can relate to. 35 | * Journalism tends to feed on fear. Yellow journalism. 36 | * Timeline of the world's biggest fears. 37 | 38 | $148 billion. Spent on obesity-related illness. 39 | 40 | Data is the new soil. Things grow from them. 41 | 42 | You need journalistic inquiry to deliver discovery and delivery. 43 | 44 | Break-up times: never on Xmas, 45 | 46 | Web scraping. David McCandless. 47 | 48 | PICTURE - most common 49 | 50 | Remember to put numbers in context. 
51 | 52 | A million lines of code. 53 | 54 | Visual drake equation 55 | 56 | PICTURE - visual resume. 57 | 58 | We learn different things by osmosis. 59 | 60 | Our brains interact in visual color, pattern and shape. It's the language of our brain. 75% of neurons (check that). 61 | 62 | Venn diagram - pigs, birds, humans - in-flu-venza. Wow, that's a terrible pun. 63 | 64 | **Twitter IPO** 65 | 66 | 100 people 67 | 20 are dead 68 | 60 lazy - not in last week 69 | 5 with more than 10- followers 70 | 5 loud mouths - 32% are bots 71 | 55 women, 45 men. 72 | 73 | 74 | 200 billion hours TV. 100 million hours to create Wikipedia. 75 | 76 | his data is all public. Use it. 77 | 78 | Food supplement by efficacy. 79 | 80 | **Fail Tips** 81 | 82 | * When you visualize a complex data set, you make a complex graphic. Doesn't solve it. 83 | * Circular diagrams aren't that usable. 84 | * Cartograms...hard to get the data out. Very hard to compare. 85 | * Design is really about removing things, cleaning down to a functional essence. 86 | 87 | **What Works** 88 | 89 | * Interestingness - goes after a useful question 90 | * Integrity - trustworthy 91 | * Form - has to look good, certain standard. 92 | * Function - easy to use 93 | 94 | 95 | * Data viz smackdown 96 | * Hard to write - so engrossing. Mark of a brilliant keynote. 97 | 98 | * All datasets are public on Google Docs. They spend a lot of time putting it together. 
99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | -------------------------------------------------------------------------------- /Archive/2014-05-13-passbac survey.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/Archive/2014-05-13-passbac survey.xlsx -------------------------------------------------------------------------------- /Archive/2014-07-01-tsql-tuesday.md: -------------------------------------------------------------------------------- 1 | --- 2 | author: DevNambi 3 | date: 2014-07-01 4 | layout: post 5 | slug: tsql-tuesday-announcement 6 | title: T-SQL Tuesday - Assumptions 7 | meta-description: 8 | - tsql tuesday 9 | - sqlfamily 10 | - sqlpass 11 | --- 12 | 13 | 14 | 15 | It's hard to rock the boat.
    It's hard to ask the basic questions that everybody knows.
    It's hard to slow down and ask for clarification. 16 | 17 | So, we improvise. We guess: things that are accepted as true, without proof. We often forget our assumptions, or make them instinctively. 18 | 19 | For this T-SQL Tuesday, the topic is **assumptions**. 20 | 21 | 22 | For example: 23 | 24 | * The sun will come up tomorrow. 25 | * Firewalls and anti-virus are enough to protect my computer. 26 | * My backups work even if I don't restore them. 27 | * I don't need to check for *that* error, it'll never happen. 28 | 29 | 30 | 31 | Your assignment for this month is to write about a big assumption you encounter at work, one that people are uncomfortable talking about. Every team has an [elephant in the room](http://en.wikipedia.org/wiki/Elephant_in_the_room). 32 | 33 | 34 | **What happens if these big guesses aren't true?** 35 | 36 | 37 | #### Housekeeping 38 | 39 | A few rules to follow when participating: 40 | 41 | * Your post must be published between **00:00 [PDT](http://www.timeanddate.com/library/abbreviations/timezones/na/pdt.html) Tuesday, July 8th, 2014**, and **00:00 PDT Wednesday, July 9, 2014**. 42 | * Your post must contain the T-SQL Tuesday logo from above and the image should link back to this blog post. 43 | * Trackbacks won't work, so please tweet a link to me ([@DevNambi](https://twitter.com/DevNambi)) or send an email (me at devnambi dot com). 44 | 45 | 46 | Some optional (and highly encouraged) things to also do: 47 | 48 | * Include a reference to T-SQL Tuesday in the title of your post 49 | * Tweet about your post using the hash tag [#TSQL2sDay](https://twitter.com/search?q=%23tsql2sday) 50 | * Consider hosting T-SQL Tuesday yourself. Adam Machanic keeps the list. 51 | 52 | 53 | #### About T-SQL Tuesday 54 | T-SQL Tuesday was started by Adam Machanic ( [Blog](http://sqlblog.com/blogs/adam_machanic/) | [@AdamMachanic](https://twitter.com/AdamMachanic) ) in 2009. 
It’s a monthly blog party with a rotating host, who is responsible for providing a new topic each month. In case you've missed a month or two, Steve Jones ( [Blog](http://voiceofthedba.wordpress.com/2012/12/10/t-sql-tuesday-topics-december-2012-update/) | [@way0utwest](https://twitter.com/way0utwest) ) maintains a complete list for your reading enjoyment. 55 | 56 | 57 | *Happy sleuthing!* -------------------------------------------------------------------------------- /Archive/NodeXL graphs.md: -------------------------------------------------------------------------------- 1 | # NodeXL 2 | 3 | * Crowds matter online...they're larger than real life often, but we understand them less. Inherently weak signal. 4 | 5 | ## Social Networks 6 | 7 | * Central tenet - social structure emerges from the aggregate of relationships among members of a population. 8 | * Emergence of cliques and clusters. Centrality (core) and periphery (isolates), betweenness. 9 | * Methods - surveys, interviews, etc. 10 | * Social media is all about networks. 11 | 12 | Patterns are left behind. 13 | 14 | There are many kinds of ties: 15 | 16 | * Send 17 | * mention 18 | * like link reply rate review favorite friend follow forward edit tag comment check-in. 19 | * one way relationships: lend money to. 20 | * bidirectional: is married to. 21 | 22 | Social media platforms are meaningfully different from each other. They all have one thing in common: networks. 23 | 24 | The UW doesn't have public squares anymore, with people who disagree with us. If it happens at all it happens online. 25 | 26 | A network is born whenever two entities are joined. 27 | 28 | Network theory: position, position, position. It's all relative. 29 | 30 | NodeXL - like social media for graphs. 31 | 32 | Trying to be the Firefox of GraphML. 33 | 34 | GraphML - XML for social networks (a data structure) 35 | 36 | Open Tools, Open Data, Open Scholarship. 37 | 38 | NodeXLGraphGallery.org - open data, user-generated collections/datasets.
39 | Open Scholarship - trying to make it easy. 40 | 41 | Try using the tool. 42 | 43 | ### 6 social network structures 44 | 45 | Divided or unified crowds 46 | Divided - political/controversial topic. 47 | United - some communities are unified. 48 | 49 | Fragmented - brand clusters 50 | they don't reply to each other. 51 | Clustered - community clusters 52 | they interact a bit. 53 | what happens when people grow up a bit. 54 | Hub-and-spoke - broadcast network 55 | PR/marketing. 56 | Institutional speaker. 57 | Called the 'audience' pattern - people who retweet don't interact with each other. 58 | Out-hub-and-spoke - support network 59 | Airline support. 60 | @DellCares 61 | 62 | The density of the connections is how 63 | 64 | 65 | ## Centrality 66 | 67 | * Eigenvector centrality. 68 | * PageRank 69 | * Betweenness centrality - influencers. The 'bridge' score. 70 | ME - look at this for side business. 71 | 72 | * Some connections are very important. Bridges. Only 2 points of connection. But they're the only thing that connects those two networks. 73 | 74 | When you are the bridge, you may charge a toll. It could be only social capital. It's hard because you connect to something that is not like you. 75 | 76 | Don't be a hub. Be a bridge. 77 | 78 | Isolates. It means there's never been an @____ in their tweets. It means they're the new members. 79 | 80 | IDEA FOR PASS: use social network analysis to identify influencers and new people to connect with. 81 | 82 | #CMgrChat - social media managers. Basically it's a small village. 83 | 84 | Look at the social network of people who are better at this than you. Find out, and then use this analysis to figure it out. 85 | 86 | ME - read more of stuff by Marc Smith, MSR researcher 87 | 88 | http://www.connectedaction.net/ 89 | 90 | Last - plea for help. 91 | 92 | 93 | Because Excel is an ODBC source, anything that can join 2 tables can work in NodeXL.
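
A minimal sketch of the 'bridge' score from the Centrality notes, to make the idea concrete. It computes unnormalized betweenness centrality with Brandes' algorithm on a small undirected graph. The `betweenness` function and the toy graph (two triangles joined by a single tie) are made up for illustration; nothing here is NodeXL's own implementation or API.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    graph: dict mapping each node to a list of its neighbours.
    Assumes an undirected, unweighted graph.
    """
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical network: two tight triangles joined only by the c-d tie.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = betweenness(graph)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, scores[node])
```

In this toy graph only `c` and `d` get a non-zero score: every shortest path between the two triangles has to cross them, which is the "be a bridge" point in numbers.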
94 | 95 | 96 | 97 | 98 | 99 | -------------------------------------------------------------------------------- /Archive/uw - 010 - introduction.md: -------------------------------------------------------------------------------- 1 | # The Bootstrap 2 | 3 | 4 | This is the first post in a blog on data analysis, data-driven discovery, and decision making at the University of Washington. 5 | 6 | My name's Dev Nambi, and I'm a data scientist in the UW's [Enterprise Data and Analytics](http://www.washington.edu/uwit/im/EDA.html) team. (ADD PIC) I've worked at the UW since 2012. Before that I was a software developer and analyst at [Microsoft's Ads](http://advertising.microsoft.com/en/advertising-online) R&D group, an ETL developer at a startup, and [more](http://devnambi.com). 7 | 8 | *"The best minds of my generation are thinking about how to make people click ads..." - [Jeff Hammerbacher](https://twitter.com/hackingdata)* 9 | 10 | That was me. It was the best job I could find that paid enough to let me work off my student loans. Now I am hoping to give back to the next generation of students at the university. 11 | 12 | 13 | ### Data Science in Academia 14 | 15 | There is quite a bit of [excitement](https://news.cs.washington.edu/2013/11/12/uw-berkeley-nyu-collaborate-on-37-8m-data-science-initiative/) and [activity](http://escience.washington.edu/event/data-science-university-washington-campus-conversation) on data science in academia. So far the emphasis has been, rightly, on data-driven discovery in *scientific research*. The UW's emphasis there is its new [Data Science Incubator](http://data.uw.edu) and [eScience Institute](http://escience.washington.edu/). 16 | 17 | There are potentially far-reaching implications in fields as varied as astrophysics, oceanography, chemical engineering, genomics, and sociology. 18 | 19 | I admire that, but I want to do something more pragmatic, more *direct*.
20 | 21 | 22 | ### Data Science in Administration 23 | 24 | A university can be made more efficient using data. There are so many ways to do this it's mind-boggling, so I use a heuristic to pick areas to focus on: changes must directly benefit students. 25 | 26 | My reason for starting with students is simple: money. 27 | 28 | (ADD "the tuition is too damn high meme"). 29 | 30 | 31 | Tuition is very expensive [compared to entry-level salaries](http://www.zerohedge.com/news/2014-05-18/net-worth-college-grads-student-debt-20-less-high-school-grads-no-debt), and that problem has been getting worse for *[decades](http://measuringup2008.highereducation.org/commentary/callan.php)*. Student debt is now [bigger than credit card debt](http://www.bizjournals.com/stlouis/blog/2013/04/fed-student-loan-debt-surpasses-auto.html), and it's [practically impossible to get rid of](http://www.studentloanborrowerassistance.org/bankruptcy/). 32 | 33 | My goal is simple: to find ways to help UW students graduate with the same quality education they have now, but with less debt. 34 | -------------------------------------------------------------------------------- /Genome Science Blog Post.md: -------------------------------------------------------------------------------- 1 | # Genome Science Blog Post 2 | 3 | https://aws.amazon.com/blogs/big-data/interactive-analysis-of-genomic-datasets-using-amazon-athena/ -------------------------------------------------------------------------------- /List of things I still can't do in November 2014.md: -------------------------------------------------------------------------------- 1 | ### List of things I still can't do in November 2014 2 | 3 | -- from https://www.facebook.com/frkrueger/posts/10154867033210444 4 | 5 | 6 | **Legit** 7 | 8 | * Get a blood test simply, myself, without going to a lab, and get the results overnight. 9 | * Build a house myself using standard small interchangeable parts like legos. 
10 | * Build a website with a resolvable domain on the internet, a great theme, and configurable data inside of 5 minutes. 11 | * Locate where my dog is right now using a barely noticeable GPS chip. 12 | * Share my health and fitness data with my doctor and my trainer in real time and get advice. 13 | * Have my car keep a web record of everywhere it has been, how it is doing, and what needs fixing or updating. 14 | * Find a list of all LA artists online, browse their work, and buy directly from them without going through an art dealer. 15 | * Vote online. 16 | * Get a doctor's house call using an app. (Uber for doctors). 17 | * Move $50,000 into a form of Bitcoin where the value is pegged to the USD, not a random number (the price of Bitcoin) 18 | * Take a picture of any object and find out where to buy it (Shazam for things) 19 | * Have all my accounting done as SaaS -- with 24/7 qualified accountants / planners / bill payers and a complete online record of my expenses and balance sheet available at all times. 20 | * Enter all my favorite classical music playlists, and have it coordinate with my travel schedule to update me on who is playing where. 21 | * Be able to take a course online and get graded -- and get a diploma that means something. Education is really ripe for disruption: Coursera is OK, but we need to invent Stanford 2.0 22 | * Be able to sell my advice online. It's worth something and I should have some way to monetize it. 23 | * Be able to see a list of all single people in LA right now and efficiently sort through this data, with two-way opt-in, to find an ideal match 24 | * See videos of restaurants before you go to them. Pictures can be deceiving. I want a video. 25 | * Be able to get a $100 MRI. It can be done for this price. 26 | * Be able to get into an ER for under $100. It's ridiculous that a mere 15 min consultation can cost somebody (the system) $1000+. 
27 | 28 | 29 | 30 | **First World Problems** 31 | 32 | * Invest $10,000 in Uber in the secondary market by buying some shares from an existing shareholder -- with just a few clicks. 33 | * Find out which are the hot new restaurants in Paris from people who actually know. 34 | * Find a teacher of general relativity online. I tried. 35 | * Travel to space -- click, pay about 20 million, book a trip to the space station. 36 | * Send $10,000 to my caretaker in France and have her withdraw the cash from an ATM that same day. 37 | 38 | 39 | 40 | **Ignores Paying a Living Wage** 41 | 42 | *a.k.a. can be done right now but at a high price* 43 | 44 | * Click on a recipe, pick number of people, and have all the ingredients delivered to me the next day by Amazon Fresh 45 | * Get a qualified guitar / piano / cello teacher to come to my house without randomly calling people on Craigslist or doing general Google searches. 46 | * Get somebody to come to my house, pick up my dry cleaning and drop it back off the next day (Amazon "Clean"). 47 | 48 | 49 | **Ignores Context** 50 | 51 | *a.k.a. provides the wrong incentives, or can be badly abused* 52 | 53 | * Get a discount from the federal government for being healthy. Fat people should pay more taxes because they cost society more. This means some approved weigh-in and testing centers 54 | * Get investors for my startup by advertising the stock offering on the web and selling shares directly. 55 | * Get a quick, binding divorce online. 56 | * Keep track of where everybody in my team is physically right now. 57 | * Call the police or the fire department or paramedics using an app. 
58 | 59 | -------------------------------------------------------------------------------- /Principles_of_Performance_Tuning.md: -------------------------------------------------------------------------------- 1 | # Principles of Performance Tuning 2 | 3 | * Less business complexity 4 | * Do less work 5 | * Less technical/design complexity 6 | * More efficient systems 7 | * More efficient workload (CPU cycles / unit of work) -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | blog-drafts 2 | =========== 3 | 4 | Drafts and ideas for my blog 5 | -------------------------------------------------------------------------------- /company size and culture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/company size and culture.png -------------------------------------------------------------------------------- /crime-vs-incarceration.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/crime-vs-incarceration.jpg -------------------------------------------------------------------------------- /darwin award.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/darwin award.jpg -------------------------------------------------------------------------------- /data bias.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/data bias.jpg -------------------------------------------------------------------------------- /einstein_ethics.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/einstein_ethics.jpg -------------------------------------------------------------------------------- /equal-vs-fair.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/equal-vs-fair.png -------------------------------------------------------------------------------- /math_for_grownups.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/math_for_grownups.jpg -------------------------------------------------------------------------------- /mechanical_calculator.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/mechanical_calculator.jpg -------------------------------------------------------------------------------- /precision-and-recall.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/precision-and-recall.jpg -------------------------------------------------------------------------------- /resistance is just.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/resistance is just.jpg -------------------------------------------------------------------------------- /student_debt.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/student_debt.jpg -------------------------------------------------------------------------------- /wolf debt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/DevNambi/blog-drafts/dcb265a5d583c0b2f7243a12667d2895377f2631/wolf debt.png --------------------------------------------------------------------------------