# awesome_stuff

It turns out that Twitter only keeps links for a year, which is super lame. So here is a living document of everything I find on the internet that is awesome.

https://github.com/ianozsvald/featherweight_web_api

-- automatically generate a web api for a python function

http://pytube.org/

--a site for talks about python

http://arxiv.org/abs/1606.03476

--an interesting paper about Generative Adversarial Imitation Learning

http://inhabitat.com/video-nikola-teslas-dream-is-finally-a-reality-with-wi-fi-powered-electronics/

--ambient backscatter - completely battery-free technology

https://www.opendatascience.com/

--odsc blog posts

https://www.ices.utexas.edu/about/news/350/

--Navier-Stokes equations explained

http://www.darpa.mil/news-events/2016-06-17

--darpa is funding machine learning algorithms that generate machine learning algorithms.

http://arxiv.org/abs/1606.05340

--shows that deep networks can turn highly curved manifolds in input space into flat manifolds in hidden space.

https://www.oreilly.com/learning/hello-tensorflow

--a great intro to tensorflow

https://www.facebook.com/quartznews/videos/1202874983079535/

--an ai algorithm that figures out what sound an object should make

https://golem.ph.utexas.edu/category/2016/06/how_the_simplex_is_a_vector_sp.html

-- how the simplex is a vector space

https://github.com/josephmisiti/awesome-machine-learning

-- awesome machine learning libraries list

https://www.safaribooksonline.com/library/view/python-cookbook-3rd/9781449357337/ch01s05.html

--implementation of a priority queue

https://www.cs.bris.ac.uk/~montanar/teaching/dsa/dijkstra-handout.pdf

-- dijkstra's with priority queue

http://docs.scala-lang.org/tutorials/scala-for-java-programmers.html

-- scala for java programmers

https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf

--functional data structures

http://www.oreilly.com/programming/free/files/functional-programming-python.pdf

--functional programming in python

http://tinkersphere.com/stores

--where to get a raspberry pi in nyc

http://blog.smola.org/post/145983963411/leaving-cmu

--ml guy heads to amazon

http://exploreflask.readthedocs.io/en/latest/views.html

--interesting set of patterns for flask

http://stackoverflow.com/questions/29987323/how-do-i-send-data-from-js-to-python-with-flask

--flask from jquery

https://code.jquery.com/

--jquery cdn

http://stackoverflow.com/questions/1034621/get-current-url-in-web-browser

-- get current url from browser

http://stackoverflow.com/questions/558518/how-can-i-serialize-an-object-to-json-in-javascript

--object serialization in javascript

https://developer.mozilla.org/en-US/docs/Web/API/Geolocation/Using_geolocation

-- getting location from browser

https://pypi.python.org/pypi/honcho

--foreman clone in python

http://stackoverflow.com/questions/16086962/how-to-get-a-time-zone-from-a-location-using-latitude-and-longitude-coordinates

--an interesting discussion about timezones

https://teamtreehouse.com/community/nested-loops-in-flask-how-to-iterate-and-make-nested-lists

-- nested for loops in flask

https://medium.com/@handaru/build-recommendation-engine-using-graph-cbd6d8732e46#.y6b7vd4g3

--recommender engine with graphs

http://www.cs.yale.edu/homes/perlis-alan/quotes.html

--platitudes about programming

https://www.youtube.com/watch?v=3N__tvmZrzc

--programming languages class

http://stackoverflow.com/questions/32311366/alembic-util-command-error-cant-find-identifier

https://devcenter.heroku.com/articles/heroku-postgresql

--how to update your database with migrations when flask-migrate fails to work on heroku

http://www.techinsider.io/modafinil-is-an-effective-cognitive-enhancement-nootropic-2016-6?utm_content=buffer4e362&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer-ti

--an interesting debate on intelligence-enhancing drugs

https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315#.wiuyzrlri

--Neural turing machines

http://minimaxir.com/2016/06/interactive-reactions/

--interesting analysis of public facebook posts

http://www.slideshare.net/AperioIntel/financial-crime-in-the-real-estate-sector-countering-illicit-money-flows

-- how to detect money laundering, with examples

https://medium.com/data-science-cafe/apache-spark-1-6-0-setup-on-mac-os-x-yosemite-d58076e8064e#.2vggkt2n6

--spark setup

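The priority-queue recipe and the Dijkstra handout linked earlier go together nicely. Here's a minimal sketch (not the Cookbook's own class) that uses the standard library's `heapq` as the priority queue; the toy graph is made up for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbor, edge_weight), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path distances from `source`, with heapq as the priority queue."""
    dist = {source: 0.0}
    heap = [(0.0, source)]  # entries are (distance so far, node)
    while heap:
        d, node = heapq.heappop(heap)  # always expand the closest node next
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to `node` was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)], "c": [("d", 1)]}
print(dijkstra(graph, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0}
```

Stale heap entries are skipped instead of updated in place, since `heapq` has no decrease-key operation.
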
https://courses.edx.org/courses/course-v1:BerkeleyX+CS105x+1T2016/info

--spark course

http://bugra.github.io/work/notes/2014-04-19/alternating-least-squares-method-for-collaborative-filtering/

--collaborative filtering via alternating least squares with implementation in python

https://mathiasbynens.be/notes/shell-script-mac-apps

--appify your shell scripts

http://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas

--fuzzy matching with python data frames

https://github.com/dgrtwo/fuzzyjoin

--fuzzy join in R

http://conferences.oreilly.com/strata/hadoop-big-data-ny/public/schedule/speakers

--strata hadoop speakers

http://conferences.oreilly.com/strata/hadoop-big-data-ny

--strata hadoop NY

http://conferences.oreilly.com/strata

--strata conf

https://www.odsc.com/boston

--odsc east

http://mlconf.com/events/new-york-city-ny/

--mlconf nyc

http://icml.cc/2016/?page_id=1519

--icml workshops at a glance

http://icml.cc/2016/?page_id=97

--tutorials at icml

http://icml.cc/2016/?page_id=1839

--schedule icml

http://techtalks.tv/icml2016/

--icml papers

https://sites.google.com/site/icmlworkshoponanomalydetection/

--anomaly detection workshop

https://spark.apache.org/docs/0.9.0/mllib-guide.html

--spark mllib docs

https://spark.apache.org/docs/0.9.0/python-programming-guide.html

--pyspark

https://www.youtube.com/watch?v=wmw8Bbb_eIE&app=desktop

--tensorflow intro

http://www.fastcompany.com/3059634/your-most-productive-self/your-brain-has-a-delete-button-heres-how-to-use-it

--your brain has a delete button

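For the fuzzy-matching entries above, one way to sketch the matching step with only the standard library is `difflib.get_close_matches`; the resulting mapping could then serve as a join key for an ordinary pandas merge. The `fuzzy_join` helper, the 0.75 cutoff and the sample names are all hypothetical, not taken from the linked answer.

```python
import difflib
from typing import Dict, List, Optional


def fuzzy_join(left: List[str], right: List[str],
               cutoff: float = 0.75) -> Dict[str, Optional[str]]:
    """Map each key in `left` to its closest key in `right`, or None if nothing is close."""
    # Compare case-insensitively, but report `right` keys in their original spelling.
    lowered = {key.lower(): key for key in right}
    pairs: Dict[str, Optional[str]] = {}
    for key in left:
        hits = difflib.get_close_matches(key.lower(), list(lowered), n=1, cutoff=cutoff)
        pairs[key] = lowered[hits[0]] if hits else None
    return pairs


left = ["Apple Inc", "Microsoft Corp", "Alphabet"]
right = ["Apple Inc.", "Microsoft Corporation", "Amazon"]
print(fuzzy_join(left, right))
# {'Apple Inc': 'Apple Inc.', 'Microsoft Corp': 'Microsoft Corporation', 'Alphabet': None}
```

Anything below the cutoff maps to `None`, so unmatched rows stay visible instead of being silently paired with a bad match.
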
https://mlalgorithm.wordpress.com/2016/06/08/hierarchical-clustering/

--hierarchical clustering

https://github.com/unitedstates

--united states github

https://github.com/jmcarp/robobrowser

--bad ass web scraper

https://arxiv.org/abs/1606.09458

--ensemble voting methods

http://www.umiacs.umd.edu/~hal/docs/daume04rkhs.pdf

--math hardcore

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

--lstm

http://lesswrong.com

--Bayesian salad

http://blog.socialcops.com/engineering/machine-learning-python?utm_source=facebook&utm_medium=social&utm_campaign=blog_share

--from nothing to nn's

https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest

--compute the median (and other percentiles) of big data fast with the t-digest

http://machinelearningmastery.com/applied-deep-learning-in-python-mini-course/

--deep learning at breakneck speed

http://highscalability.com/blog/2016/7/6/machine-learning-driven-programming-a-new-programming-for-a.html

--deep learning for code generation

https://www.whitehouse.gov/the-press-office/2016/06/30/fact-sheet-launching-data-driven-justice-initiative-disrupting-cycle

--AI and justice from the whitehouse

http://earthmysterynews.com/2016/05/05/physicists-send-particles-of-light-into-the-past-proving-time-travel-is-possible/

--an article claiming an experiment shows that time travel is possible

https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/

--theoretical machine learning course

http://ec2-52-51-244-37.eu-west-1.compute.amazonaws.com

--narrative flow analysis with som

http://www.analyticbridge.com/m/group/discussion?id=2004291%3ATopic%3A304182

--data science book

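To get a feel for the hierarchical clustering post above, here is a naive single-linkage agglomerative sketch: start with every point in its own cluster and keep merging the closest pair. The linked post may use a different linkage or library; the `single_linkage` helper and toy points are hypothetical, and the brute-force loop is only meant for small inputs.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def single_linkage(points: Sequence[Point], k: int) -> List[List[int]]:
    """Agglomerative clustering: keep merging the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the closest members of the two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[a].extend(clusters.pop(b))  # a < b, so popping b leaves index a valid
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(single_linkage(points, k=3))  # [[0, 1], [2, 3], [4]]
```

Stopping at different values of `k` is the same as cutting the dendrogram at different heights.
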
http://www.datasciencecentral.com/profiles/blogs/machine-learning-anomaly-detection-finding-a-needle-in-a-haystack?overrideMobileRedirect=1

--anomaly detection

http://www.wired.com/2016/05/google-open-sourced-syntaxnet-ai-natural-language/?utm_content=buffer7037c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

--SyntaxNet is open source!

http://www.deepgram.com

--audio api

https://www.opendatascience.com/blog/understanding-principal-component-analysis/

--PCA explained

http://stats.stackexchange.com/questions/8000/proper-way-of-using-recurrent-neural-network-for-time-series-analysis

--great description of RNNs for time series (what they are not)

http://science.tumblr.com/post/147401742140/the-most-beautiful-equation

--recursion

https://magenta.tensorflow.org/2016/07/15/lookback-rnn-attention-rnn/

--rnn applications

https://github.com/llllllllll/lazy_python and https://github.com/llllllllll/codetransformer

-- hacking python for fun and profit!

http://tech.magnetic.com/

-- good blog

http://tech.magnetic.com/2016/04/demystifying-logistic-regression.html

--simple intro to logistic regression and ML

https://github.com/mcraig2/pygotham-talk/blob/master/tflow.ipynb

--tensorflow intro

https://pypi.python.org/pypi/ad/1.3.2

--automatic differentiation

https://pypi.python.org/pypi/yappi

--profiler for python

http://kcachegrind.sourceforge.net/html/Home.html

--visualize the profiling from yappi

http://mike.place/talks/pygotham/#1

--document summarization

https://www.youtube.com/watch?v=0VTI1BBLydE

--stanford music generation with RNNs

https://github.com/MattVitelli/GRUV

--source code for the stanford music generation project

http://oubiwann.blogspot.com/2014/07/oscon-2014-theme-song-andrew-sorensen.html

--andrew sorensen keynote on music generation

http://pybee.org/

-- for mobile development

https://github.com/spotify/annoy

--approximate nearest neighbor implementation

http://www.cs.cmu.edu/~ggordon/singh-gordon-kdd-factorization.pdf

--collective matrix factorization

http://videolectures.net/cmulls08_singh_rlm/

--collective matrix factorization (video lecture)

http://www.benjamintd.com/blog/spynet/

--an rnn that writes Python!

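The `ad` package linked above computes derivatives for you. The following is not its API, just a minimal sketch of one common technique, forward-mode automatic differentiation with dual numbers, where every value carries its derivative along through `+`, `*` and `sin`. The `Dual` class and `derivative` helper are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A dual number val + der*eps with eps**2 == 0; `der` carries the derivative."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x: float) -> float:
    # Seed der = 1 so the result's `der` is df/dx evaluated at x.
    return f(Dual(x, 1.0)).der


def f(x):
    return 3 * x * x + sin(x)  # f'(x) = 6x + cos(x)


print(derivative(f, 0.0))  # 1.0
```

Only addition, multiplication and `sin` are implemented here; every other operation would need its own derivative rule.
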
http://askubuntu.com/questions/761180/wifi-doesnt-work-after-suspend-after-16-04-upgrade

-- fix wifi not working after suspend (ubuntu 16.04 upgrade)

https://www.opendatascience.com/blog/the-forgotton-optimization-topic-set-diversity/

--optimization technique: set diversity

https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/

--conv net theory

https://www.quora.com/How-can-I-prepare-myself-to-be-a-software-engineer-at-Google/answer/Gaurav-Jha-9?srid=0c9s

--how to prepare to be a software engineer at Google

http://multithreaded.stitchfix.com/blog/2016/07/21/skynet-salesman/

--RL deep Q

https://github.com/deepmind/rc-data

--reading comprehension data set for deep learning

https://github.com/rouseguy/europython2016_dl-nlp/tree/master/notebooks

--deep learning for nlp notebooks (europython 2016)

https://medium.com/@ageitgey/machine-learning-is-fun-part-4-modern-face-recognition-with-deep-learning-c3cffc121d78#.wp3bwd9ez

--face recognition with deep learning and an svm

https://www.reddit.com/r/textdatamining/

--textmining reddit

http://arxiv.org/abs/1503.04069

--an analysis of LSTM

https://web.stanford.edu/~arbenson/cme193.html

--scientific computation in python

http://cs231n.github.io/

--stanford convolutional neural networks course with numpy

http://jvns.ca/

-- a very awesome blog

https://github.com/tzutalin/labelImg

--graphical image labeling/annotation tool

https://www.nyu.edu/projects/bowman/bowman2016phd.pdf

--modeling natural language with learned representations

https://www.mapr.com/blog/design-patterns-recommendation-systems-%E2%80%93-everyone-wants-pony

--recommender system

https://www.technologyreview.com/s/539706/how-the-new-science-of-game-stories-could-change-the-future-of-sports/?utm_campaign=socialflow&utm_source=facebook&utm_medium=post

--interesting analysis and visualization of game stories in sports

http://buff.ly/2b5wvMm

--3D modeling in Python

http://www.rightrelevance.com/search/articles/hero?article=bb58e4504d119319a294fd269b5e1f61558cb26a&query=particle%20physics&taccount=parrticlephy

--simple flow equation

https://github.com/EricSchles/paper-notes

--from Karpathy, looks super cool

https://www.technologyreview.com/s/601774/data-mining-reveals-the-crucial-factors-that-determine-when-people-make-blunders/?utm_content=buffer2f2d5&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

--what determines when people make blunders

https://github.com/EricSchles/drmad

--hyperparameter tuning; some folks on reddit seem to think this is yet another useless technique.

https://github.com/EricSchles/reddit_crawlers

--reddit crawler that for some reason has a serious deep learning component, worth investigating

https://github.com/dyelax/Adversarial_Video_Generation/tree/master/Code

--an implementation of adversarial networks for video generation! Definitely need to read through in detail

http://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html

--part of a series on model selection, looks pretty good.

http://www.futurecrunch.com.au/writing/

--political economy writing

http://chrisalbon.com/

--sane examples of pandas and R

https://qbox.io/blog/sparse-matrix-multiplication-elasticsearch-apache-spark

--elasticsearch matrix multiplication

http://www.rightrelevance.com/search/articles/hero?article=5fb7a116712286ad60484e7f05d4fdeb75e26454&query=artificial%20intelligence&taccount=ml_toparticles

-- machine learning for first responders

http://blog.getstream.io/fast-recommendations-for-activity-streams-using-vowpal-wabbit?utm_content=buffera5da5&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

--fast recommendations with vowpal wabbit

http://mike.place/talks/pygotham/#p1

--Document summarization

http://github.com/coxlab/prednet
--recurrent convolutional net

https://github.com/MacLeek/trackmac
--tracking project time on mac

https://github.com/HackerHouseYT/Smart-Mirror
--smart mirror w/ raspberry pi

http://distill.pub/2016/augmented-rnns/
--attention and augmented RNNs

https://medium.com/@USCTO/public-input-and-next-steps-on-the-future-of-artificial-intelligence-458b82059fc3#.vad6ol11a
--interesting read on the future of AI

http://blog.quantopian.com/optimize_capacity/
--Sharpe ratio

https://unu.edu/fighting-human-trafficking-in-conflict
--human trafficking in conflict

https://www.datacamp.com/courses/intro-to-python-for-data-science?utm_content=buffer556a6&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--data camp python class

http://www.aosabook.org/en/500L/a-python-interpreter-written-in-python.html
--python interpreter written in python

http://sunlightfoundation.com/blog/2016/09/08/today-in-opengov-the-future-of-the-us-city-open-data-census-first-us-ciso-and-more/
--open data

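The Quantopian post above leans on the Sharpe ratio. Here's a sketch of the usual textbook definition: mean excess return divided by its standard deviation, annualized by the square root of the number of periods per year. The 252 trading days and the zero per-period risk-free rate are assumptions, and the post may compute it differently.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio from per-period (e.g. daily) returns."""
    excess = [r - risk_free for r in returns]  # return above the per-period risk-free rate
    # Mean excess return per unit of volatility (sample standard deviation),
    # scaled by sqrt(periods per year) to annualize.
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


daily_returns = [0.001, -0.002, 0.003, 0.0005, -0.001]
print(round(sharpe_ratio(daily_returns), 2))
```

The ratio is unchanged if every return is scaled by the same factor, which is why leverage alone doesn't improve it.
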
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
--WaveNet for audio

https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa#.3e9v5nggx
--machine translation with deep learning

http://www.rightrelevance.com/search/articles/hero?article=e036c156aa408a235aa740162e3b1cfd2e0e985c&query=python&taccount=pythonrr
--python intel distro

http://fusion.net/story/344884/sex-slave-bars-in-united-states/
--great set of visuals about human trafficking

https://m.youtube.com/playlist?list=PLmImxx8Char9Ig0ZHSyTqGsdhb9weEGam and https://m.youtube.com/watch?v=sU_Yu_USrNc
--Stanford nlp lectures

http://www.rightrelevance.com/search/articles/hero?article=4c40ce09cb544b00b68580b7866fe18ce48a27eb&query=python&taccount=pythonrr
--sandman library

https://www.facebook.com/inthenow/videos/681969348620104/
--ambulance drone

http://www.pyimagesearch.com/2016/09/26/a-simple-neural-network-with-python-and-keras/
--minimal neural network with Keras

https://github.com/datascopeanalytics/traces
--analysis of unevenly spaced time series

https://blog.monkeylearn.com/the-definitive-guide-to-natural-language-processing/
--high-level walkthrough of NLP concepts

https://www.yhat.com/ops-demos/
--ml demos with keras / opencv

http://bit.ly/2eNfcOs
--wrapper around Google charts API

https://github.com/metagrover/ES6-for-humans
--a good set of descriptions of javascript (ES6) conventions, symbols and syntax

https://github.com/wireservice/agate
--data discovery tool

http://www.primaryobjects.com/2013/01/27/using-artificial-intelligence-to-write-self-modifying-improving-programs/
--program that generates code

http://textminingonline.com/getting-started-with-word2vec-and-glove-in-python
--getting started with word2vec and GloVe in python

https://mostafa-samir.github.io/ml-theory-pt3/
--an introduction to the bias-variance tradeoff

https://www.opendatascience.com/blog/bayesian-deep-learning/ and https://www.opendatascience.com/blog/bayesian-deep-learning-part-ii-bridging-pymc3-and-lasagne-to-build-a-hierarchical-neural-network/
--neural nets and bayesian thinking

https://inviqa.com/blog/graphs-database-sql-meets-social-network
--loops in SQL, graph traversal in SQL

https://blog.bigchaindb.com/blockchains-for-artificial-intelligence-ec63b0284984#.dzilfvdfq
--blockchain ml

http://pytorch.org/
--neural nets (pytorch)

https://engineering.instagram.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172#.75j94rygt
--disabling Python garbage collection at Instagram

http://dustintran.com/talks/Tran_Edward.pdf
--probabilistic modeling with Edward

https://www.r-bloggers.com/outlier-detection-with-mahalanobis-distance/
--outlier detection with Mahalanobis distance

http://yann.readthedocs.io/en/master/
--yet another neural network library

https://arxiv.org/abs/1701.06538?utm_content=buffer26227&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--flow of control in neural networks (sparsely-gated mixture-of-experts layer)

http://peterdowns.com/posts/first-time-with-pypi.html
--making a pypi package

https://github.com/lenagroeger/gifs
--data visualization gifs

https://github.com/mbernico/snape
--realistic dummy data for testing algorithms.

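The "loops in SQL" entry above is about walking a graph with SQL. Here's a minimal sketch of that idea using a recursive common table expression in SQLite through Python's built-in `sqlite3`; the `follows` table and the names in it are hypothetical, and the post itself may target a different database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (src TEXT, dst TEXT);
    INSERT INTO follows VALUES
        ('ann', 'bob'), ('bob', 'cat'), ('cat', 'dan'), ('dan', 'ann'), ('eve', 'ann');
""")

# Everyone reachable from 'ann', with the fewest hops needed to get there.
rows = conn.execute("""
    WITH RECURSIVE reachable(person, hops) AS (
        SELECT 'ann', 0                          -- start node
        UNION                                    -- UNION drops duplicate rows
        SELECT f.dst, r.hops + 1                 -- follow one more edge
        FROM follows AS f JOIN reachable AS r ON f.src = r.person
        WHERE r.hops < 10                        -- depth limit: the graph has a cycle
    )
    SELECT person, MIN(hops) FROM reachable GROUP BY person ORDER BY MIN(hops)
""").fetchall()
print(rows)  # [('ann', 0), ('bob', 1), ('cat', 2), ('dan', 3)]
```

`eve` never shows up because edges are only followed from `src` to `dst`; the `hops < 10` limit is a crude but simple way to stop the `dan -> ann` cycle from recursing forever.
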
https://research.fb.com/prophet-forecasting-at-scale/
--facebook forecasting library

`sudo killall coreaudiod` -- for when Screenhero audio doesn't work (macOS)

https://pypi.python.org/pypi/ERAlchemy
--Create ER diagrams "for free"

http://students.brown.edu/seeing-theory/?vt=4
--visual descriptions of basic probability

https://blog.dominodatalab.com/fitting-gaussian-process-models-python/
--gaussian processes for prediction in python

http://www.kdnuggets.com/2017/03/yhat-beginner-guide-customer-segmentation.html
--pedagogical intro to clustering

http://dan.iel.fm/emcee/current/user/line/
--parameter estimation with MCMC

http://nipy.org/nitime/api/generated/nitime.timeseries.html
--time series analysis

http://fb09-pasig.umwelt.uni-giessen.de/spotpy/
--spotpy docs for doing simulation of data

https://github.com/slavivanov/Style-Tranfer
--style transfer code with a conv net in keras

http://www.datasciencecentral.com/profiles/blogs/top-10-ipython-tutorials-for-data-science-and-machine-learning
--whole bunch of ml notebooks

https://arstechnica.co.uk/information-technology/2017/03/google-jpeg-guetzli-encoder-file-size/
--file compression: Guetzli, Google's JPEG encoder for smaller files.

https://blog.jisungkim.com/machine-learning-and-art-9ea2c9342180#.2ve57gv6f
-- art and ml examples

http://www.kdnuggets.com/2017/03/simple-xgboost-tutorial-iris-dataset.html?utm_content=buffer8924b&utm_medium=social&utm_source=facebook.com&utm_campaign=buffer
--xgboost tutorial python

http://blog.yhat.com/posts/python-generated-powerpoint.html
--PowerPoint generator in python

https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf
--scikit-learn cheat sheet

http://deeplearning.net/tutorial/deeplearning.pdf
--deep learning tutorial book in python (theano, numpy)

http://www.markhneedham.com/blog/2017/03/25/luigi-externalprogramtask-example-converting-json-csv/
--luigi intro

https://github.com/fchollet/keras-resources
--keras resources

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5016890/pdf/12859_2016_Article_1236.pdf
-- generalized logistic regression

https://github.com/rajshah4/image_keras
-- image classification with keras

http://www.pyimagesearch.com/2017/04/17/real-time-facial-landmark-detection-opencv-python-dlib/
--real-time facial landmark detection in video

https://tech-forward-2.glitch.me/
--list of awesome tech orgs

http://www.datasciencecentral.com/profiles/blogs/introduction-to-outlier-detection-methods?utm_content=buffer0fb5c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--an introduction to outlier detection

http://theorangeduck.com/page/phase-functioned-neural-networks-character-control?utm_content=buffereda7e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--phase-functioned neural networks - might be useful for timeseries

https://www.quantinsti.com/blog/trading-using-machine-learning-python/#DataScience
-- timeseries prediction in data science parlance.

https://github.com/wayaai
--a very cool deep learning company

https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3
--how to work with keras and VGG16 (also `from keras.applications.vgg16 import VGG16; model = VGG16()`)

http://www.kdnuggets.com/2017/04/ai-maturity-model.html
--AI maturity model

https://medium.com/airbnb-engineering/automated-machine-learning-a-paradigm-shift-that-accelerates-data-scientist-productivity-airbnb-f1f8a10d61f8?from=timeline&isappinstalled=0
--automated machine learning at airbnb

http://p.migdal.pl/2017/04/30/teaching-deep-learning.html
--deep learning keras intro

https://www.xenonstack.com/blog/overview-of-artificial-intelligence-and-role-of-natural-language-processing-in-big-data
--great overview of nlp in big data

https://www.quantinsti.com/blog/trading-using-machine-learning-python/#DataScience
-- timeseries in python

http://www.kdnuggets.com/2017/03/naive-sharding-centroid-initialization-method.html?utm_content=buffer45425&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- k-means improvement: naive sharding centroid initialization

http://www.datasciencecentral.com/profiles/blogs/10-deep-learning-terms-explained-in-simple-english?utm_content=buffer6e829&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- list of deep learning terms in plain english

http://flowingdata.com/2017/05/02/shifting-incomes-for-young-people/
--job data: shifting incomes for young people

http://www.rightrelevance.com/search/articles/hero?article=b8c3fc25c7f0238393be0d0ad4fc93fa074be5f6&query=data%20science&taccount=ml_toparticles
--mortality data

http://cmawer.github.io/trainspotting/trainspotting-blog.html
--train detection and direction detection

http://www.kdnuggets.com/2017/04/datascience-introduction-anomaly-detection.html
--anomaly detection

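The naive sharding entry above is a k-means initialization trick. As we read the KDnuggets article, it ranks rows by the sum of their features, splits them into k equal shards, and uses each shard's mean as a starting centroid. The sketch below is our reconstruction under that reading, not the article's code; `naive_sharding_centroids` and the toy data are hypothetical.

```python
from typing import List, Sequence


def naive_sharding_centroids(data: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Initial k-means centroids via naive sharding (our reading of the article)."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of rows")
    ranked = sorted(data, key=sum)  # 1. rank rows by the sum of their features
    n = len(ranked)
    centroids = []
    for s in range(k):
        shard = ranked[s * n // k:(s + 1) * n // k]  # 2. k contiguous, near-equal shards
        # 3. each centroid is the per-feature mean of its shard
        centroids.append([sum(col) / len(shard) for col in zip(*shard)])
    return centroids


data = [[1, 2], [9, 9], [2, 1], [8, 10], [1, 1], [10, 8]]
print(naive_sharding_centroids(data, k=2))
# [[1.3333333333333333, 1.3333333333333333], [9.0, 9.0]]
```

The resulting centroids could be handed to a k-means implementation in place of random starting points; scikit-learn's `KMeans`, for example, accepts an array of initial centers through its `init` parameter.
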
https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python?utm_content=buffer85c3f&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--kalman and bayesian filters in python

http://www.kdnuggets.com/2016/06/open-source-machine-learning-degree.html?utm_content=bufferea858&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- open source machine learning degree

https://medium.com/merantix/picasso-a-free-open-source-visualizer-for-cnns-d8ed3a35cfc5?platform=hootsuite
--cnn visualizer

https://medium.com/intuitionmachine/deep-adversarial-learning-is-finally-ready-and-will-radically-change-the-game-f0cfda7b91d3
-- good basic description of generative adversarial neural networks.

http://www.pyimagesearch.com/2016/08/10/imagenet-classification-with-python-and-keras/
--ImageNet classification with keras tutorial

https://www.datascience.com/resources/tools/skater
-- model interpretation library

http://www.datasciencecentral.com/profiles/blogs/how-to-tell-a-compelling-story-with-data-6-rules-6-tools?overrideMobileRedirect=1
--telling data stories

https://github.com/gaojiuli/tomd
--converts HTML into markdown

https://medium.com/@karpathy/alphago-in-context-c47718cb95a5
--super awesome description of AlphaGo

https://opendatascience.com/rec-system/?utm_content=52586516&_hsenc=p2ANqtz-9jGizLlpsoa76ETOX2LRnsRKzzER0lIeENGuQuIvUflcllijdwfT6L6w-md3zQOEiTZp3xaIy1l0CsoeDgKVLRhzkPKg&_hsmi=52595398
--recommender system intro in Python

https://opendatascience.com/blog/factorization-machines-for-recommendation-systems/?utm_campaign=Newsletters&utm_source=hs_email&utm_medium=email&utm_content=52586516&_hsenc=p2ANqtz-_Vr8oIhp5ceuxkCEIrj9ccwSKBPIedXDF0ORf1j2E1dN6JzTR1RwAlSNVTU-eb6uHdMS4secVkw0s5ryG5qne6SioKVg&_hsmi=52595398
--factorization machines for recommender systems in Python

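The Kalman filter book linked above builds up to filters like this one. Here's a minimal scalar sketch that estimates a constant from noisy readings: each step predicts, then blends in the new measurement according to the Kalman gain. The noise variances and readings are made-up assumptions, and the book's own code is far more general.

```python
from typing import Iterable, List


def kalman_1d(measurements: Iterable[float], meas_var: float, process_var: float,
              x0: float = 0.0, p0: float = 1e3) -> List[float]:
    """Scalar Kalman filter for a quantity that is (nearly) constant over time."""
    x, p = x0, p0  # current estimate and its variance (large p0 = "no idea yet")
    estimates = []
    for z in measurements:
        p += process_var           # predict: value unchanged, uncertainty grows a little
        gain = p / (p + meas_var)  # how much to trust the new reading vs. the estimate
        x += gain * (z - x)        # update: move toward the reading by the gain
        p *= 1 - gain              # update: uncertainty shrinks after the reading
        estimates.append(x)
    return estimates


readings = [10.3, 9.8, 10.1, 10.4, 9.7, 10.0]
print(kalman_1d(readings, meas_var=0.1, process_var=1e-5)[-1])
```

With a tiny `process_var` the estimate behaves much like a running mean; a larger `process_var` lets it follow a value that actually drifts.
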
https://opendatascience.com/time-series-analysis-with-generalized-additive-models/?utm_content=52586516&_hsenc=p2ANqtz-9oWCL-QDRrDQOcDJdmmzUvRdBBnRf_L8cn5epiWWHWOdOVzwCEcWZUP8U-Hv6ZoUI1hrzfyt-Vf7jlEoFjxoqR7FeIGg&_hsmi=52595398
--timeseries analysis with generalized additive models

http://babble-rnn.consected.com/docs/babble-rnn-generating-speech-from-speech-post.html
--speech processing in keras

https://github.com/ZWMiller/svdRec
-- recommender system with SVD

http://blog.echen.me/2017/05/30/exploring-lstms/?utm_content=bufferb0490&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
-- recurrent neural networks in java

https://github.com/aredridel/how-to-read-code/blob/master/how-to-read-code.md
-- how to read code

https://2016.foss4g-na.org/sites/default/files/slides/FOSS4G_machine_learning.pdf
-- ml and geospatial

https://github.com/EricSchles/reveal.js
--js slides in browser

https://hilaryparker.com/
-- R programmer worth following

https://github.com/starcolon/vor-knowledge-graph
-- open knowledge graph generator from wikipedia

https://en.wikipedia.org/wiki/AIML
-- AI markup language

http://python-for-multivariate-analysis.readthedocs.io/a_little_book_of_python_for_multivariate_analysis.html
-- a fantastic introduction to multivariate analysis with a great explanation of LDA, PCA

https://help.gooddata.com/display/doc/Normality+Testing+-+Skewness+and+Kurtosis
--understanding the results of scipy's normality test (skewness and kurtosis)

http://www.statisticssolutions.com/correlation-pearson-kendall-spearman/
-- understanding different correlation tests

http://nbviewer.jupyter.org/gist/aflaxman/6871948
-- understanding bootstrapping

http://www.stat.pitt.edu/stoffer/tsa4/tsaEZ.pdf
--introduction to timeseries analysis

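For the bootstrapping notebook above, here's a minimal percentile-bootstrap sketch of a 95% confidence interval for the mean, using only the standard library. The sample values, 10,000 resamples and fixed seed are assumptions, and the notebook's own approach may differ.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(sample: Sequence[float], n_resamples: int = 10_000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        # Resample with replacement, same size as the original sample.
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


observations = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
print(bootstrap_ci(observations))
```

Swapping `statistics.fmean` for another statistic (median, standard deviation) gives an interval for that statistic instead, which is the main appeal of the bootstrap.
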
http://arch.readthedocs.io/en/latest/index.html
--advanced timeseries modeling (ARCH/GARCH volatility models)

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
-- timeseries modeling with keras

http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
-- timeseries modeling with keras (part 2)

https://www.amazon.com/Deep-Time-Forecasting-Python-Introduction-ebook/dp/B01N100IPR
-- timeseries forecasting with keras book

http://www.stata.com/meeting/5nasug/TSFiltering_beamer.pdf
--band-pass filtering for time series

https://thehackerdiary.wordpress.com/2017/06/09/it-is-ridiculously-easy-to-generate-any-audio-signal-using-python/
--make music (generate audio signals) with Python

https://semaphoreci.com/community/tutorials/generating-fake-data-for-python-unit-tests-with-faker
-- faker, a pretty decent data faking package.

https://sflscientific.com/blog/2017/2/10/predicting-stock-volume-with-lstm
-- stock market analysis with RNNs

https://kndrck.co/indexing-faces-on-instagram.html
--horrifying and creepy, but useful in the anti-trafficking context - scraping faces from instagram

https://serverlesscode.com/post/rich-jones-interview-django-zappa/
-- django on AWS lambda with zappa

http://www.kdnuggets.com/2017/03/working-numpy-matrices.html
-- tiny intro to numpy

https://github.com/rigetticomputing/pyquil
-- quantum cloud computing library for python

http://blog.aylien.com/understanding-customer-frustrations-in-the-airline-industry-with-aspect-based-sentiment-analysis/
-- aspect-based sentiment analysis

https://github.com/DistrictDataLabs/yellowbrick
-- Visual analysis and diagnostic tools to facilitate machine learning model selection

https://github.com/DistrictDataLabs/partisan-discourse
-- build your own nlp corpus

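The audio-signal post above shows how little code a tone takes. Here's a stdlib-only sketch that writes a pure sine wave to a 16-bit mono WAV with the `wave` module; the 440 Hz default, sample rate and amplitude are assumptions, and the post itself may use NumPy or other libraries.

```python
import io
import math
import struct
import wave


def sine_wav(freq_hz: float = 440.0, seconds: float = 1.0, rate: int = 44_100,
             amplitude: float = 0.5) -> bytes:
    """Return the bytes of a mono, 16-bit PCM WAV file holding a pure sine tone."""
    n_frames = int(seconds * rate)
    # One signed 16-bit sample per frame; amplitude (0..1) is a fraction of full scale.
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * t / rate))
        for t in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))  # little-endian
    return buf.getvalue()
```

Writing the returned bytes to a file such as `a440.wav` gives one second of concert A (440 Hz); summing several sines before packing is the first step toward chords.
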
https://pypi.python.org/pypi/baleen/0.3.3
-- build your own nlp corpus

https://github.com/DistrictDataLabs/minke
-- nlp feature extractor w/ metadata

https://github.com/ethereum/pyethereum
--python interface for ethereum

https://www.wired.com/2016/01/use-code-to-create-sweet-3-d-visualizations-of-electric-fields/
--3-D visualizations of electric fields, made with code

https://www.youtube.com/watch?v=oNf3I1fVmg8&feature=share
--tensorflow, spark, advanced algebra things

https://github.com/meetshah1995/pytorch-semseg
--semantic image segmentation

https://github.com/bokeh/datashader?utm_content=buffera606e&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- renders (rasterizes) big data accurately and geomaps it

http://www.kdnuggets.com/2017/07/strange-loop-deep-learning.html?utm_content=bufferdb453&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--ladder networks explained.

https://github.com/mehrdadn/SOTA-Py?utm_content=bufferd4663&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--routing problem algorithm - "How do you travel from point A to point B in T time under traffic?"

https://github.com/reinforceio/tensorforce
-- deep reinforcement learning

http://tensorflow-world-resources.readthedocs.io/en/latest/
--tensorflow intro

https://research.googleblog.com/2017/07/facets-open-source-visualization-tool.html?m=1
-- data viz library of winning and awesomeness.

http://bair.berkeley.edu/blog/2017/07/18/learning-to-learn/
-- learning to learn

https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
-- pandas optimizations

https://www.technologyreview.com/s/608387/an-algorithm-trained-on-emoji-knows-when-youre-being-sarcastic-on-twitter/?set=608492
-- detecting sarcasm with emojis

https://github.com/blue-yonder/tsfresh
--feature extraction for timeseries

https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
-- super good read - debugging neural networks.

https://stats.stackexchange.com/questions/18891/bagging-boosting-and-stacking-in-machine-learning
-- boosting, bagging and stacking explained!

https://www.buzzfeed.com/peteraldhous/hidden-spy-planes?utm_term=.uu8969pK9#.krQ0O0qe0
-- geo classification example

https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29
-- how to use the CatBoost (categorical boosting) library

https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf
-- a good general book on data science

https://labs.eleks.com/2016/10/combined-different-methods-create-advanced-time-series-prediction.html
-- a good use of timeseries techniques

https://repositorio-aberto.up.pt/bitstream/10216/82298/2/37884.pdf
-- spatial timeseries data analysis book

https://pdfs.semanticscholar.org/cb6d/e3eeb810a5fe3341118b492aa94ecd5b8c83.pdf
-- timeseries analysis

https://medium.com/towards-data-science/gradient-descend-with-free-monads-ebf9a23bece5
-- gradient descent in scala

http://www.paddlepaddle.org/
--baidu's deep learning library

https://ringtheory.herokuapp.com/
-- ring theory database.

https://medium.com/twentybn/visual-explanation-for-video-recognition-87e9ba2a675b
-- categorizing actions

https://github.com/adebayoj/fairml
-- detect racial bias

https://oneraynyday.github.io/2017/08/20/VC-Dimensions/
-- statistical learning blog

https://machinelearning.apple.com/2017/08/06/siri-voices.html?utm_content=buffer1ad8c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- text to speech generation

https://twitter.com/planarrowspace/status/901480960587218944/photo/1
-- reinforcement learning

https://hackernoon.com/docker-compose-gpu-tensorflow-%EF%B8%8F-a0e2011d36
--GPU + Docker + tensorflow

http://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources?utm_content=buffer6ddaa&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- list of lists of data science things

http://nuit-blanche.blogspot.fr/2017/08/projectionnet-learning-efficient-on.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+blogspot/wCeDd+(Nuit+Blanche
--projection networks - compressing large network architectures

http://nuit-blanche.blogspot.fr/2017/08/videos-deep-learning-dlss-and.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+blogspot/wCeDd+(Nuit+Blanche
-- reinforcement learning videos

http://allendowney.blogspot.com/2015/05/hypothesis-testing-is-only-mostly.html
--the true value of computing the p-value. This is very interesting because it gives us not only the use case of the p-value but also a path forward for testing for bias.

https://gmarti.gitlab.io/ml/2017/09/07/how-to-sort-distance-matrix.html
--agglomerative clustering algorithm visualization in action! The idea here is that by first reordering the distance matrix according to the hierarchical clustering, you can produce a strong and intuitive clustering visualization of your data.

https://medium.com/towards-data-science/a-brief-overview-of-outlier-detection-techniques-1e0b2c19e561
-- outlier detection - covers a nice overview including three specific examples - z-score, dbscan and isolation forests. Unfortunately it doesn't cover the other types of algorithms mentioned in the high-level overview.

https://medium.com/towards-data-science/deep-learning-for-object-detection-a-comprehensive-review-73930816d8d9
-- A good explanation of the current state of the art for object detection. Like most articles of this kind, it covers three techniques - R-CNN, Faster R-CNN and SSD. The computational architecture of each model is explained, with some mention of where you might find these models (namely TensorFlow). They all seem to have similar performance in terms of accuracy. The main area of interest in this article was speed - how fast the algorithms run. This may appear to be a subtle shift, but the explanations of these algorithms I've read in the past have typically only been concerned with accuracy. The fact that folks are now more concerned with speed suggests we may be approaching the upper limit of accuracy.

https://dzone.com/articles/machine-learning-measuring
-- a good set of distance metrics used in machine learning problems.
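
The outlier detection overview above mentions the z-score method; here is a minimal NumPy sketch of it, assuming roughly normal data and the common (but arbitrary) cutoff of 3 standard deviations. `zscore_outliers` is a hypothetical helper, not code from the article.

```python
import numpy as np


def zscore_outliers(x, threshold=3.0):
    """Indices of points more than `threshold` standard deviations from the mean."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.flatnonzero(np.abs(z) > threshold)


# Made-up data: 500 standard-normal points plus two planted outliers at the end.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, size=500), [8.0, -9.0]])
print(zscore_outliers(data))  # should include indices 500 and 501
```

Extreme points inflate the very mean and standard deviation they are measured against, so on small samples a large outlier can partly mask itself; that weakness is part of why density-based methods like DBSCAN and isolation forests are worth knowing.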

http://goodtables.okfnlabs.org/
-- data validation

https://userinput.io/#/#examples
-- userinput testing

https://blog.openai.com/unsupervised-sentiment-neuron/
-- really good sentiment classifier

https://machinelearningmastery.com/transduction-in-machine-learning/
-- transduction defined

https://www.digitaltrends.com/business/washington-post-robot-reporter-heliograf/?utm_content=buffer20089&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--an article on how machines write our news now

https://nlml.github.io/in-raw-numpy/in-raw-numpy-t-sne/
-- a great introduction to t-SNE

https://www.analyticsvidhya.com/blog/2017/09/pseudo-labelling-semi-supervised-learning-technique/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29

https://security-informatics.springeropen.com/articles/10.1186/s13388-017-0029-8
-- two articles on semi-supervised learning

https://www.twilio.com/blog/2017/08/geospatial-analysis-python-geojson-geopandas.html
-- a super good intro to geospatial analysis in python

https://github.com/dwillis/nyc-maps.git
--nyc maps in geojson format

http://jose-coto.com/plotting-geopandas
--an awesome analysis of plotting points with a geometry

https://www.datacamp.com/community/tutorials/preprocessing-in-data-science-part-2-centering-scaling-and-logistic-regression#gs.jzWZFRU
-- a good analysis of the trade-off between logistic regression and k-nearest neighbors. KNN needs scaled data; logistic regression does about the same with or without scaling.

https://monkeylearn.com/blog/beginners-guide-text-vectorization/
-- some text classification stuff. Specifically, skip-thought vectors versus bag of words, and then combining the techniques for better performance.

https://hackernoon.com/machine-learning-with-javascript-part-1-9b97f3ed4fe5
-- machine learning tutorial in javascript

http://www.kdnuggets.com/2017/10/upcoming-meetings-analytics-big-data-science-machine-learning.html?utm_content=buffer3ed10&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--a good explanation of boosting weak classifiers. Covers gradient boosting and extreme gradient boosting (xgboost)

http://www.kdnuggets.com/2017/10/understanding-machine-learning-algorithms.html?utm_content=buffer559a8&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- a good overview of decision trees, random forests, support vector machines, and neural networks. The kernel trick of SVMs is well explained, finally.

https://dzone.com/articles/breakthrough-research-papers-and-models-for-sentim
-- neural network sentiment analysis

http://stackabuse.com/parallel-processing-in-python/
-- a good introduction to parallel processing

https://jtsulliv.github.io/stock-movement/?utm_content=buffer0d87f&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--a good introduction to Brownian motion and Euler-Maruyama model time series analysis

https://www.oreilly.com/ideas/deep-matrix-factorization-using-apache-mxnet
-- recommender systems

https://becominghuman.ai/following-messi-with-tensorflow-and-object-detection-20ba6d75667
-- custom object detection in video using tensorflow

https://chatbotnewsdaily.com/since-the-initial-standpoint-of-science-technology-and-ai-scientists-following-blaise-pascal-and-804ac13d8151
-- a nice little history of machine learning

http://www.bodowinter.com/tutorial/bw_LME_tutorial1.pdf
-- a good introduction to fixed effects

http://www.bodowinter.com/tutorial/bw_LME_tutorial2.pdf
-- a good introduction to mixed effects

https://medium.com/towards-data-science/squeeze-and-excitation-networks-9ef5e71eacd7
-- holy crap! 25% performance jump on imagenet

https://github.com/MaxHalford/xam
-- interesting ml toolbox

https://journals.aps.org/pra/abstract/10.1103/PhysRevA.96.042113
-- solving problems in physics with precision without an analytic form.

https://wxs.ca/research/multiscale-neural-synthesis/?utm_content=buffer08f81&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--this is cool: multiscale neural style synthesis

https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2
--Good intro to the state of the art in reinforcement learning

https://github.com/asktree/Asymmetric-Hashing-ANN
-- asymmetric hashing algorithm from google - uses asymmetric hashing and beam search to speed up automatic reply

https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
-- a good introduction to multiple gpu training for keras

https://medium.com/towards-data-science/the-10-statistical-techniques-data-scientists-need-to-master-1ef6dbd531f7
-- some good model introspection techniques here, also a good basic understanding of splines, PCR and PLS

https://dzone.com/articles/optimizing-k-means-clustering-for-time-series-data
-- time series k-means clustering

https://medium.com/towards-data-science/15-stunning-data-visualizations-and-what-you-can-learn-from-them-fc5b78f21fb8
-- a good intro to data visualization best practice

https://twitter.com/AllenDowney/status/926960793261928449
-- an introduction to Bell's inequality
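
The Brownian motion / Euler-Maruyama link above is about simulating a stochastic differential equation one small time step at a time. A minimal NumPy sketch, assuming the common stock-price model of geometric Brownian motion, `dS = mu*S*dt + sigma*S*dW` (the post may use a different model or parameters); `euler_maruyama_gbm` is a hypothetical helper, and 252 steps (roughly trading days in a year) is an arbitrary default.

```python
import numpy as np


def euler_maruyama_gbm(s0, mu, sigma, T=1.0, n_steps=252, n_paths=1, seed=0):
    """Simulate dS = mu*S*dt + sigma*S*dW with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = s0
    for i in range(n_steps):
        # Brownian increments over one step are independent N(0, dt) draws.
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        s = paths[:, i]
        # Euler-Maruyama update: drift * dt plus diffusion * dW.
        paths[:, i + 1] = s + mu * s * dt + sigma * s * dW
    return paths


paths = euler_maruyama_gbm(s0=100.0, mu=0.05, sigma=0.2, n_paths=10_000)
print(paths.shape, paths[:, -1].mean())
```

With these parameters the average final value should land near `100 * exp(0.05)`, about 105, which is a quick sanity check on the simulation.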

https://www.newnorth.com/creating-a-predictive-churn-mode-part-1l/
-- churn modeling basics

https://www.datascience.com/blog/what-is-a-churn-analysis-and-why-is-it-valuable-for-business
-- churn modeling, high level

http://blog.yhat.com/posts/predicting-customer-churn-with-sklearn.html
-- modeling churn with scikit-learn

https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3
-- bayesian book

https://petewarden.com/2017/10/29/how-do-cnns-deal-with-position-differences/?utm_content=bufferd86a2&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- convolutional neural networks introduced in a detailed way.

https://github.com/tomlepaine/fast-wavenet
--fast convnets for timeseries analysis

https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765
-- how to test machine learning code

https://machinelearningmastery.com/prepare-photo-caption-dataset-training-deep-learning-model/
-- captioning text for images

https://github.com/Mic92/kshape
-- time series clustering

https://www.slideshare.net/HamdanAzhar1/open-data-science-west-introduction-to-emoji-data-science-hamdan-azhar-nov-3-2017-81595966?trk=v-feed&lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3Bzm0bIKk7TjaWWw5P7THNGA%3D%3D
--emojis are also data

http://vertex.ai/blog/announcing-plaidml?utm_content=buffereb80a&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
--an alternative to the tensorflow backend

https://www.datasciencecentral.com/forum/topics/k-means-clustering-effect-of-random-seed?utm_content=buffer9a2fb&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- seed matters for k-means

https://randomekek.github.io/deep/deeplearning.html
--deep learning reference

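To see the "seed matters for k-means" point directly, here is a minimal scikit-learn sketch (assuming scikit-learn is installed; the synthetic blobs are made up for illustration). It runs k-means with a single random initialization per seed and compares the final inertia, i.e. the within-cluster sum of squared distances.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 5 overlapping Gaussian blobs.
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=1.5, random_state=7)

# One random initialization per fit (n_init=1), so each seed's result is visible on its own.
inertias = {}
for seed in range(5):
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    inertias[seed] = km.inertia_  # within-cluster sum of squared distances

for seed, inertia in inertias.items():
    print(f"seed={seed}: inertia={inertia:.1f}")

# With n_init=10, scikit-learn runs 10 initializations and keeps the lowest-inertia one.
best = KMeans(n_clusters=5, init="random", n_init=10, random_state=0).fit(X)
print(f"best of 10: inertia={best.inertia_:.1f}")
```

If the inertias differ across seeds, some runs got stuck in worse local optima. Raising `n_init`, or using the `k-means++` initialization, is the usual remedy.
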
https://medium.com/singular-distillation/little-explanations-information-bottleneck-theory-its-possible-link-to-neural-networks-1d4df1badf72
-- mutual information used to study neural networks. I say, so what? But maybe this is a useful thing.

https://schedule.readthedocs.io/en/stable/
--a simple scheduler

https://www.youtube.com/watch?v=3VQ382QG-y4&feature=youtu.be
--an introduction to lambda calculus

https://github.com/stitchfix/diamond
-- mixed effects models in python

https://github.com/civisanalytics/civisml-extensions
-- scikit-learn classifier and regressor stacking

https://github.com/caseyclements/pennies
-- advanced time series modeling in python

https://arxiv.org/pdf/1607.06520.pdf
-- super good paper on identifying gender bias

https://github.com/ericmjl/bayesian-analysis-recipes
-- bayesian deep learning examples

https://github.com/mila-udem
-- a very neat collection of tools

https://github.com/bnaul/IrregularTimeSeriesAutoencoderPaper
-- A recurrent neural network for classification of unevenly sampled variable stars

https://www.youtube.com/user/PyDataTV/videos
-- pydata videos

https://www.kdnuggets.com/2017/11/automated-feature-engineering-time-series-data.html?utm_content=buffere2903&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- time series feature engineering

http://www.nehalemlabs.net/prototype/blog/2013/04/05/an-introduction-to-smoothing-time-series-in-python-part-i-filtering-theory/
-- a bunch of smoothing techniques

https://www.kdnuggets.com/2017/07/when-not-use-deep-learning.html
-- fantastic explanation of when not to use deep learning

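The smoothing post above covers filtering theory in depth; as a much smaller sketch, here are two of the simplest smoothers in NumPy, a simple moving average and simple exponential smoothing. Both function names are hypothetical, the noisy sine wave is made-up data, and `window`/`alpha` are arbitrary defaults.

```python
import numpy as np


def moving_average(x, window=5):
    """Mean of each run of `window` consecutive points (output is window - 1 shorter)."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


def exponential_smoothing(x, alpha=0.3):
    """s[t] = alpha * x[t] + (1 - alpha) * s[t-1]; higher alpha tracks the data more closely."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out


noisy = np.sin(np.linspace(0, 4 * np.pi, 200)) + np.random.default_rng(0).normal(0, 0.3, 200)
smooth_ma = moving_average(noisy, window=9)
smooth_exp = exponential_smoothing(noisy, alpha=0.2)
```

The moving average weights its window equally and drops the edges; exponential smoothing keeps every point but lags behind sudden changes, more so as `alpha` shrinks.
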
https://www.kdnuggets.com/2017/11/10-statistical-techniques-data-scientists-need-master.html?utm_content=bufferd0f6b&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- a survey of statistical techniques

https://www.fullstackpython.com/blog/first-steps-gitpython.html
-- python git client

http://pbpython.com/market-basket-analysis.html
-- apriori algorithm at work

https://towardsdatascience.com/using-word2vec-for-music-recommendations-bb9649ac2484
-- music word2vec

http://rlhick.people.wm.edu/posts/estimating-custom-mle.html
-- how to write a custom MLE with OLS as an example

https://github.com/ipython-books/cookbook-code
-- a cookbook of a lot of scientific computing stuff. Mostly a bunch of great patterns for using numpy.

https://pypi.python.org/pypi/thinkx/1.1.2
--thinkbayes package

https://brilliant.org/wiki/stationary-distributions/
-- a very good introduction to Markov Chains. Sadly, I now understand graphs, as a consequence, to be just another representation of matrices. Also, Markov Chains do finally make sense. And interestingly, you can find the steady states of Markov Chains from time to time. (joke)

https://github.com/scrat-online/pySTARMA
-- geospatial timeseries ARIMA algorithm. Looks out of date, consider updating.

https://github.com/wkentaro/labelme
-- an image annotation tool, which may be useful for annotating various images in image training sets.
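
The Markov chain link above boils down to one equation: a stationary distribution `pi` satisfies `pi P = pi`, so it is the eigenvector of `P.T` with eigenvalue 1 (the "graphs are just matrices" view). A minimal NumPy sketch, assuming a row-stochastic transition matrix with a unique stationary distribution; the two-state chain and the `stationary_distribution` name are made up for illustration.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution pi of a row-stochastic transition matrix P (pi @ P == pi)."""
    P = np.asarray(P, dtype=float)
    # pi P = pi  is the same as  P.T pi = pi: pi is an eigenvector of P.T for eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    i = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, i])
    return pi / pi.sum()  # rescale so the probabilities sum to 1


# Made-up two-state chain: rows are "from" states, columns are "to" states.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(P)
print(pi)  # approximately [0.8333, 0.1667], i.e. [5/6, 1/6]
```

For this chain you can check by hand: `pi_0 = 0.9*pi_0 + 0.5*pi_1` gives `pi_0 = 5*pi_1`, and the two must sum to 1.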

https://cupy.chainer.org/?utm_content=bufferc0bef&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- numpy written for cuda

https://einstein.ai/research/hierarchical-reinforcement-learning?utm_content=bufferdcdd1&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- hierarchical RL language models

https://github.com/artpar/languagecrunch
-- an NLP server ready to go

https://blog.dominodatalab.com/bias-policing-analysis-traffic-stop-data/?utm_content=buffer8976c&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
-- a great analysis of racial bias

https://towardsdatascience.com/how-to-create-data-products-that-are-magical-using-sequence-to-sequence-models-703f86a231f8
-- a good example of how to use sequence to sequence models in industry.

https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6
-- great intro to cross validation, k-fold for sklearn

https://github.com/chrispaulca/waterfall
--waterfall is an interesting visualization tool. Most interestingly, it can be used in conjunction with treeinterpreter to produce visualizations for tree-based model interpretation - since you can train a tree-based surrogate on any model's predictions, this can be used as a general interpretability visualization across feature space.

https://github.com/andosa/treeinterpreter
-- tree interpreter interprets tree-based models. Looks very promising for understanding various models.
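
For the cross-validation link, a minimal scikit-learn sketch of 5-fold cross-validation, assuming the bundled iris dataset and a logistic regression model purely as placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Shuffle before splitting: iris is stored sorted by class, so unshuffled folds
# would each be dominated by one or two classes.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores, scores.mean())
```

`cross_val_score` returns one score per fold (accuracy by default for classifiers); the spread across folds is as informative as the mean.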

https://openreview.net/
-- very interesting set of resources on the papers to understand and internalize within ML

https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8
-- a good explanation of feature engineering for logistic regression

https://en.wikipedia.org/wiki/Silhouette_(clustering)
-- used to assess the quality of clustering algorithms

https://www.youtube.com/watch?v=MIKYRZc9A1M
-- a fantastic deconstruction of superman

https://www.youtube.com/watch?v=R13BD8qKeTg
-- best introduction to bayes I've ever seen

https://github.com/bmabey/pyLDAvis
-- LDA visualization library

http://scikit-learn.org/stable/related_projects.html
-- a great list of related packages and tools

https://github.com/cytoscape/cytoscape.js
-- graph visualization js library

https://realpython.com/blog/python/python-matplotlib-guide/
-- a good introduction to matplotlib

https://gist.github.com/aronwc/8248457
-- gensim and sklearn together

https://en.wikipedia.org/wiki/Synthetic_control_method
-- a way of doing natural experiments

http://ecocontrol.readthedocs.io/en/latest/index.html
-- interesting timeseries forecasting system

http://www.cs.cornell.edu/~tomf/pyglpk/glpk.html
-- interesting looking package

https://github.com/laspy/laspy
-- LiDAR

https://medium.com/luminovo/a-refresher-on-batch-re-normalization-5e0a1e902960
-- batch renormalization, better than batch normalization

https://www.linkedin.com/pulse/4-reasons-your-machine-learning-model-wrong-how-fix-bilal-mahmood/
-- bias-variance trade-off and precision/recall

https://www.kaggle.com/marknagelberg/rmsle-function
-- root mean squared logarithmic error (RMSLE) function

http://www.business-science.io/code-tools/2017/10/28/demo_week_h2o.html
-- timeseries automl in R

https://towardsdatascience.com/how-i-learned-to-love-parallelized-applies-with-python-pandas-dask-and-numba-f06b0b367138
-- pandas numba dask performance benchmarking

https://machinelearningmastery.com/keras-functional-api-deep-learning/
-- shared layers neural network architecture for keras

https://github.com/titu1994/BatchRenormalization
-- batch renormalization in keras

https://www.programcreek.com/python/example/83247/sklearn.cross_validation.KFold
-- a good set of automl and cross validation techniques

https://github.com/Britefury/batchup
-- a program for batching datasets.

https://github.com/mdbloice/Augmentor
-- image augmentation library for deep learning

https://github.com/HIPS/molecule-autoencoder

https://brightthemag.com/legalizing-sex-work-spain-prostitution-human-rights-trafficking-immigration-gender-78b96c05e6fa
-- what happens when you decriminalize sex work

https://www.arxiv-vanity.com/papers/1803.04488/
-- concept2vec - embeddings for ontological concepts

https://www.oreilly.com/ideas/introducing-capsule-networks
-- capsule net introduction

https://medium.freecodecamp.org/understanding-capsule-networks-ais-alluring-new-architecture-bdb228173ddc
-- another great intro to capsule net

https://stackoverflow.com/questions/11404156/how-do-i-replace-text-in-a-selection
-- sublime magic - replace text in selected area

https://www.youtube.com/watch?v=CY3t11vuuOM
-- introduction to LIME

https://github.com/Ahmkel/Keras-Project-Template/blob/master/README.md
-- keras templates

http://sigmajs.org/
--sigma.js graph visualization library

List intersection:

https://stackoverflow.com/questions/6369527/python-list-intersection-efficiency-generator-or-filter

https://www.geeksforgeeks.org/python-intersection-two-lists/

-- efficiently intersect two lists

https://towardsdatascience.com/simple-and-multiple-linear-regression-in-python-c928425168f9
-- linear regression in Python, explained well

http://readingthemarkets.blogspot.com/2010/11/critique-of-granger-causality.html
--criticism of granger causality

http://www.statsoft.com/Textbook/Time-Series-Analysis#lags
-- statistics book

https://danielcscheer.files.wordpress.com/2012/03/food-stamps-and-poverty-irp-2012.pdf
-- a good explanation of a lot of things. A great explanation of the matching problem.

https://medium.com/@Francesco_AI/artificial-intelligence-verticals-ii-fintech-daf6f0bd302c
-- AI in fintech

http://brohrer.github.io/how_convolutional_neural_networks_work.html
--intro to conv nets

https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/
-- intro to conv nets

http://betatim.github.io/posts/bayesian-hyperparameter-search/
--smarter grid search

https://medium.freecodecamp.org/how-to-build-interactive-presentations-with-jupyter-notebook-and-reveal-js-c7e24f4bd9c5
--jupyter notebook to slides

https://explosion.ai/blog/sense2vec-with-spacy
-- sense2vec - part-of-speech-aware word2vec

https://homes.cs.washington.edu/~marcotcr/blog/lime/
-- LIME intro

https://github.com/TeamHG-Memex/eli5
--super interesting explainability of models

https://keras.io/getting-started/functional-api-guide/
--play with this for more sophisticated models

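For the list intersection links above: the usual trick is to turn one list into a `set` so each membership test is O(1) on average, making the whole intersection roughly O(len(a) + len(b)) instead of O(len(a) * len(b)). A minimal sketch that also keeps the first list's order and drops duplicates; `intersect_preserving_order` is a hypothetical name, and items must be hashable.

```python
def intersect_preserving_order(a, b):
    """Items of `a` that also appear in `b`, in `a`'s order, without duplicates."""
    lookup = set(b)  # built once; membership tests are O(1) on average
    seen = set()
    result = []
    for item in a:
        if item in lookup and item not in seen:
            result.append(item)
            seen.add(item)
    return result


print(intersect_preserving_order([3, 1, 4, 1, 5, 9], [9, 4, 4, 2]))  # [4, 9]
```

If order and duplicates don't matter, `set(a) & set(b)` is the one-liner.
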
https://github.com/farizrahman4u/seq2seq
-- seq2seq code keras

https://towardsdatascience.com/stochastic-weight-averaging-a-new-way-to-get-state-of-the-art-results-in-deep-learning-c639ccf36a
-- stochastic weight averaging, a new way to get state-of-the-art neural networks

https://stats.stackexchange.com/questions/84076/negative-values-for-aic-in-general-mixed-model
--A good interpretation of AIC and how to deal with negative values

https://www.datasciencecentral.com/profiles/blogs/swarm-optimization-goodbye-gradients
-- alternative to stochastic gradient descent

https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
-- boosting versus bagging

https://towardsdatascience.com/boosting-algorithm-xgboost-4d9ec0207d
-- subtle differences between xgboost and gradient boosted trees

http://www.swig.org/Doc1.3/Python.html
--Cython-like tool

https://blog.jle.im/entry/purely-functional-typed-models-1.html
-- machine learning in haskell

https://www.coursera.org/specializations/aml?siteID=lVarvwc5BD0-BShznKdc3CUauhfsM7_8xw&utm_content=2&utm_medium=partners&utm_source=linkshare&utm_campaign=lVarvwc5BD0
-- coursera advanced machine learning specialization

https://www.coursera.org/learn/machine-learning?utm_source=gg&utm_medium=sem&campaignid=685340575&adgroupid=32639001341&device=c&keyword=coursera%20machine%20learning%20course&matchtype=b&network=g&devicemodel=&adpostion=1t1&creativeid=176442054671&hide_mobile_promo&gclid=Cj0KCQjw5-TXBRCHARIsANLixNzthz0on3vVC1Vg9ldWyDzt0pY_0s2sdmUmKOPX7_H2UPH5GIA1vY4aAvDxEALw_wcB
-- machine learning coursera (Andrew Ng)

https://towardsdatascience.com/data2vis-automatic-generation-of-data-visualizations-using-sequence-to-sequence-recurrent-neural-5da8e9d3e43e
--data visualization automated with sequence-to-sequence recurrent neural networks

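On the negative AIC question above: `AIC = 2k - 2 ln(L)`, and for continuous data `L` is a product of density values, which can exceed 1, so `ln(L)` can be positive and AIC negative. That is harmless, since only differences in AIC between models fitted to the same data matter. A minimal NumPy sketch with a Gaussian straight-line regression; the data and the `gaussian_ols_aic` helper are made up for illustration.

```python
import numpy as np


def gaussian_ols_aic(x, y):
    """AIC and log-likelihood of a straight-line fit with Gaussian errors."""
    X = np.column_stack([np.ones_like(x), x])       # intercept + slope
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n                      # maximum-likelihood noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                              # 2 coefficients + the variance
    return 2 * k - 2 * loglik, loglik


# Made-up data with very little noise, so the fitted density is tall and narrow.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.01, size=x.size)
aic, ll = gaussian_ols_aic(x, y)
print(f"log-likelihood={ll:.1f}, AIC={aic:.1f}")  # positive log-likelihood, negative AIC
```

Rescaling `y` (say, from metres to millimetres) shifts every model's AIC by the same constant, which is another way to see why the sign carries no meaning on its own.
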
https://www.youtube.com/watch?v=jpNLp9SnTF8&t=1581s
--interesting neural network architecture - attention, memory

https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/?utm_source=dlvr.it&utm_medium=twitter
-- introduction to nonparametric tests

https://multithreaded.stitchfix.com/blog/2018/05/14/two-things-about-power/
-- really great post on statistical power

http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/
--feature selection with sklearn

https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0
-- reinterpretation of a very controversial paper... I don't think I completely agree

https://machinelearningmastery.com/grid-search-arima-hyperparameters-with-python/
--grid search of ARIMA hyperparameters for timeseries

https://www.aiworkbox.com/lessons/specify-pytorch-tensor-minimum-value-threshold
--aiworkbox deep learning tutorials

https://stats.stackexchange.com/questions/145566/how-to-calculate-area-under-the-curve-auc-or-the-c-statistic-by-hand
--AUC explained, in detail

https://www.kaggle.com/jayatou/xgbregressor-with-gridsearchcv
-- good basic xgboost example

https://python-graph-gallery.com/
--graph visualization examples galore!
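
The AUC-by-hand thread above rests on the fact that AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting half. A minimal NumPy sketch that computes exactly that by comparing every positive/negative pair; `auc_by_hand` is a hypothetical name, and the O(n_pos * n_neg) cost makes it suitable only for small data.

```python
import numpy as np


def auc_by_hand(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score via broadcasting.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


print(auc_by_hand([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (only 0.35 vs 0.4 is not), hence 0.75; `sklearn.metrics.roc_auc_score` gives the same value on this example.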

https://www.youtube.com/watch?v=-sIOMs4MSuA
-- non-parametric Bayesian modeling

https://opendatascience.com/lime-can-make-you-better-at-machine-learning/?utm_content=71572600&utm_medium=social&utm_source=twitter
-- LIME intro

https://github.com/marcotcr/lime/blob/master/doc/notebooks/Lime%20with%20Recurrent%20Neural%20Networks.ipynb
-- LIME with RNN

https://askubuntu.com/questions/1032850/display-and-cursor-are-out-of-sync-on-ubuntu-18-04-tablet
-- stop screen flips on ubuntu

https://towardsdatascience.com/precision-vs-recall-386cf9f89488
-- a great explanation of precision and recall

https://github.com/RobRomijnders/weight_uncertainty
--neural networks in a bayesian context

https://stackoverflow.com/questions/23415500/pandas-plotting-a-stacked-bar-chart
-- how to make a stacked bar chart, the easy way.

https://stackoverflow.com/questions/33271098/python-get-a-frequency-count-based-on-two-columns-variables-in-pandas-datafra
-- get frequency counts across multiple columns in pandas

https://www.kdnuggets.com/2018/05/10-more-free-must-read-books-for-machine-learning-and-data-science.html
-- free data science books!

http://pynash.org/2013/02/12/proxy-objects/
-- proxies in flask, turns out the request object is a proxy.

https://en.wikipedia.org/wiki/Mutual_information
-- a great intro to mutual information

https://en.wikipedia.org/wiki/Entropy_(information_theory)
-- a great intro to entropy

https://blog.google/topics/machine-learning/introducing-machine-learning-practica/
--Keras deep learning course!!!!
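
The two Wikipedia links above fit together: entropy is `H(X) = -sum p log p`, and mutual information can be written `I(X;Y) = H(X) + H(Y) - H(X,Y)`. A minimal NumPy sketch working in bits, assuming the joint distribution is given as a 2-D table of probabilities (or counts); the helper names and example tables are made up for illustration.

```python
import numpy as np


def entropy(p, base=2):
    """Shannon entropy of a discrete distribution (0 * log 0 is taken as 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)


def mutual_information(joint, base=2):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-D table of joint probabilities or counts."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()   # normalize, so raw counts work too
    px = joint.sum(axis=1)        # marginal of X (rows)
    py = joint.sum(axis=0)        # marginal of Y (columns)
    return entropy(px, base) + entropy(py, base) - entropy(joint.ravel(), base)


print(entropy([0.5, 0.5]))                                # 1 bit: a fair coin
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # ~0 bits: X and Y independent
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1 bit: Y determines X
```

Up to floating-point rounding, independence gives zero mutual information and a perfect one-to-one relationship gives the full entropy of either variable.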

https://pypi.org/project/opencv-python/
-- opencv prebuilt binaries (why would you install from source?)

https://github.com/slundberg/shap
--unified model interpretability package for classification

https://medium.com/huggingface/universal-word-sentence-embeddings-ce48ddc8fc3a
-- NLP cutting edge

https://github.com/NervanaSystems/nlp-architect
-- NLP repo for NLU

https://lwn.net/Archives/
-- updates on the source of some pretty important stuff (Linux and open source news)

https://github.com/plasticityai/magnitude
-- a very interesting embedding library with lots of utilities

https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html
-- ML orchestration

https://www.youtube.com/watch?v=jvwfDdgg93E
--property-based testing, amazing.

https://www.python-course.eu/index.php
-- an advanced python class

http://people.math.carleton.ca/~kcheung/math/notes/MATH1107/index.html
-- a nice course on linear algebra, very high level

https://www.dataquest.io/blog/data-science-project-style-guide/?utm_source=twitter&utm_medium=social%20share&utm_content=ds%20project%20style%20guide
--a great style guide for writing good data science analysis

https://medium.com/near-ai/are-we-close-to-having-machines-solve-topcoder-problems-cc86d33c4324
-- automatic coding without humans

https://github.com/gboeing/osmnx-examples/blob/master/notebooks/17-street-network-orientations.ipynb
-- really cool geospatial viz
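
Property-based testing (the video above) means stating a property that should hold for all inputs and letting the computer hunt for a counterexample with random inputs. Libraries such as Hypothesis do this properly, including shrinking failures down to minimal examples; the dependency-free sketch below only shows the core loop, and every name in it is hypothetical.

```python
import random
from collections import Counter


def check_property(prop, gen, n_cases=200, seed=0):
    """Run `prop` on `n_cases` random inputs from `gen`; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(n_cases):
        case = gen(rng)
        if not prop(case):
            return case
    return None


def random_int_list(rng):
    # Random length 0..20, random values -100..100.
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def sorted_is_ordered_permutation(xs):
    """True property: sorted() output is in order and has exactly the same items."""
    out = sorted(xs)
    in_order = all(a <= b for a, b in zip(out, out[1:]))
    return in_order and Counter(out) == Counter(xs)


def has_no_duplicates(xs):
    """Deliberately false property: random lists often repeat a value."""
    return len(set(xs)) == len(xs)


print(check_property(sorted_is_ordered_permutation, random_int_list))  # None: no counterexample
print(check_property(has_no_duplicates, random_int_list))  # some list containing a repeated value
```

The point of the pattern is that you write the property once and the generator explores inputs you would not have thought to hand-write as unit tests.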