# Great Tech Blog Posts From Companies
A collection of my favorite tech-related blog posts from companies.

Inspiration for this README comes from @kilimchoi's [repo](https://github.com/kilimchoi/engineering-blogs)

**Note: I am a Data Engineer, so the majority of these links will be data-related. However, there are plenty of general ones in here as well.**

## Links

### Adobe

[Taking Query Optimizations to the Next Level with Iceberg](https://blog.developer.adobe.com/taking-query-optimizations-to-the-next-level-with-iceberg-6c968b83cd6f) - As someone looking to learn more about Iceberg, this was a good primer on how to optimize queries.

[Iceberg at Adobe](https://blog.developer.adobe.com/iceberg-at-adobe-88cf1950e866) - A good overview of how Iceberg is used at a huge company.

[What Is CI/CD² (CI/CD Squared)? Continuous Integration and Continuous Delivery](https://blog.developer.adobe.com/factoring-continuous-destruction-into-a-ci-cd-pipeline-495be05fe1a6) - Continuous destruction isn't a concept we think about often, but it's definitely a useful one to consider.

[Adobe Customer Journey Management’s Journey into the World of GitOps](https://blog.developer.adobe.com/adobe-customer-journey-managements-journey-into-the-world-of-gitops-678a65743d8f) - It's great to see more and more companies embracing GitOps.

### Airbnb

[Data Quality at Airbnb](https://medium.com/airbnb-engineering/data-quality-at-airbnb-e582465f3ef7) - The gold standard for data quality and how to approach it.

[Visualizing Data Timeliness at Airbnb](https://medium.com/airbnb-engineering/visualizing-data-timeliness-at-airbnb-ee638fdf4710) - Having the insight to properly track SLAs is helpful for the operational side of things.

[Achieving Insights and Savings with Cost Data](https://medium.com/airbnb-engineering/achieving-insights-and-savings-with-cost-data-ec9a49fd74bc) - Cost dashboards are a great way to identify the biggest pain points for a team.

[How Airbnb Built “Wall” to prevent data bugs](https://medium.com/airbnb-engineering/how-airbnb-built-wall-to-prevent-data-bugs-ad1b081d6e8f) - A great framework for upholding data quality.

[Metis: Building Airbnb’s Next Generation Data Management Platform](https://medium.com/airbnb-engineering/metis-building-airbnbs-next-generation-data-management-platform-d2c5219edf19) - A very good implementation of a data management platform.

[Riverbed: Optimizing Data Access at Airbnb’s Scale](https://medium.com/airbnb-engineering/riverbed-optimizing-data-access-at-airbnbs-scale-c37ecf6456d9) - A very sensible way to combine Lambda and Kappa architectures.

[Data Quality Score: The next chapter of data quality at Airbnb](https://medium.com/airbnb-engineering/data-quality-score-the-next-chapter-of-data-quality-at-airbnb-851dccda19c3) - A great way to surface the need for data quality.

[Personal Data Classification](https://medium.com/airbnb-engineering/personal-data-classification-2d816d8ea516) - A good shift-left approach to data governance.

[Sandcastle: data/AI apps for everyone](https://medium.com/airbnb-engineering/sandcastle-data-ai-apps-for-everyone-439f3b78b223) - A very good approach to bringing data applications to life.

[From Data to Insights: Segmenting Airbnb’s Supply](https://medium.com/airbnb-engineering/from-data-to-insights-segmenting-airbnbs-supply-c88aa2bb9399) - A good overview of the importance of insights at Airbnb.

### Atomic Object

[The Benefits of a Spiking Phase in Agile Development](https://spin.atomicobject.com/2023/01/04/spiking-phase-agile-development/) - How spiking helps with proper planning in Agile.

### BBC

[Quality engineering for a shared codebase](https://medium.com/bbc-product-technology/quality-engineering-for-a-shared-codebase-f0ee6eb24d63) - Quality engineering is always an important topic.

### Babbel

[Should you move into management as a software engineer?](https://www.babbel.com/en/magazine/should-you-move-into-management-as-a-software-engineer) - Crossing over into management from the engineering world is a difficult choice, but it usually has to be made at some point.

[AWS Fargate for Data Engineering](https://www.babbel.com/en/magazine/aws-fargate-for-data-engineering) - Fargate is still a relevant compute option for big data jobs.

[How to Fight Retrospective Fatigue](https://www.babbel.com/en/magazine/how-to-fight-retrospective-fatigue) - Retrospectives are an important part of every sprint and should be done properly.

[Evolution of Babbel’s data pipeline on AWS: from SQS to Kinesis](https://www.babbel.com/en/magazine/evolution-of-babbels-data-pipeline-on-aws-from-sqs-to-kinesis) - A good look at the evolution of a data platform.

### Benchling

[Evolving to Enterprise-Grade Permissions](https://benchling.engineering/evolving-to-enterprise-grade-permissions-493eac166449) - The principle of least privilege can never be mentioned enough.

### Bigeng

[Why the way we look at technical debt is wrong](https://www.bigeng.io/why-the-way-we-look-at-technical-debt-is-wrong/) - Tech debt needs to be accepted rather than viewed as a negative.

### Blackrock

[Developer Engagement One Code Review at a Time](https://medium.com/blackrock-engineering/developer-engagement-one-code-review-at-a-time-e01fe5cfc36b) - A good overview of how to handle code review efficiently.

[Domain-Driven Asset Management](https://engineering.blackrock.com/domain-driven-asset-management-1d93ef63ba35) - Asset management doesn't get a lot of hype, but it's very relevant for systems like the one at BlackRock.

[Citizen Developer Cookbook: Python Multiprocessing](https://engineering.blackrock.com/citizen-developer-cookbook-python-multiprocessing-3dc3c8cab29a) - I've always wanted to know more about multiprocessing in Python, so this was a helpful tutorial.

[Telemetry and Observability at BlackRock](https://engineering.blackrock.com/telemetry-and-observability-at-blackrock-99cc6ed865ee) - A great primer for those wanting to understand more about telemetry and observability.

### Booking

[How Reliability and Product Teams Collaborate at Booking.com](https://medium.com/booking-com-infrastructure/how-reliability-and-product-teams-collaborate-at-booking-com-f6c317cc0aeb) - Collaborating with the product team needs to become more commonplace for development teams.

[What can Dungeons & Dragons teach us about User Experience?](https://medium.com/booking-writes/what-can-dungeons-dragons-teach-us-about-user-experience-be9840966875) - A very good analogy for UX.

[Empowering Data with Design](https://booking.design/empowering-data-with-design-f6cec4e35cd) - A post stressing the importance of visualization.

### Bumble

[Hourglass into Pyramid: how you can improve the structure of your tests](https://medium.com/bumble-tech/hourglass-into-pyramid-ccf4b4da7785) - E2E testing needs to stop being an afterthought and become part of the actual development process.

### Canva

[Service-aligned Data Platform Architecture](https://canvatechblog.com/service-aligned-data-platform-architecture-6b5a6fc366c4) - A good overview of Canva's data platform and CDC.

[Our journey to Snowflake monitoring mastery](https://www.canva.dev/blog/engineering/our-journey-to-snowflake-monitoring-mastery/) - Effective practices for managing Snowflake costs.

### Capital One

[The evolution of issue tracking](https://medium.com/@CapitalOneTech/the-evolution-of-issue-tracking-capital-one-937a0a7ce09) - How issue tracking, an important part of staying on top of everything, has evolved at Capital One.

[Data Profiler: Data Drift Model Monitoring Tool](https://medium.com/@CapitalOneTech/data-profiler-data-drift-model-monitoring-tool-capital-one-e69631f5a058) - Data drift is sometimes an underappreciated topic, so this was a good overview of how it can be properly monitored.

[Serverless Computing Reduces Collaboration Costs](https://medium.com/@CapitalOneTech/serverless-computing-reduces-collaboration-costs-capital-one-a60e8cba03c9) - Capital One is really all-in on the serverless revolution, and this is a good explanation of why.

[5 reasons to use ML for better data quality](https://medium.com/@CapitalOneTech/5-reasons-to-use-ml-for-better-data-quality-capital-one-75a45592f7d7) - A good post on how ML can be used to enhance data quality.

[Serverless architecture technology](https://medium.com/@CapitalOneTech/serverless-architecture-technology-80db4063d3df) - A good overview of different serverless technologies in AWS.

[So You Want to Be a Techie?](https://medium.com/capital-one-tech/so-you-want-to-be-a-techie-2a05df05ef57) - Tech is not only for people from a CS background. Anyone can thrive.

[How Machine Learning Can Help Fight Money Laundering](https://medium.com/capital-one-tech/how-machine-learning-can-help-fight-money-laundering-d22d8bed7cbc) - Applications of ML in the banking industry.

[Giving a Fast-Changing Data Ecosystem Room to Grow](https://medium.com/capital-one-tech/giving-a-fast-changing-data-ecosystem-room-to-grow-5d0cb24a8276) - A good post on how an always-evolving domain can be given room to grow properly.

[The “Why” of Inner Source](https://medium.com/capital-one-tech/the-why-of-inner-source-13ee805407f8) - As someone who's big on inner source, I found this a good post on why it's useful.

[From the CIO’s View: Building a Nimble Learning Organization](https://medium.com/capital-one-tech/from-the-cios-view-building-a-nimble-learning-organization-f0f2d64506a) - Learning done right within an organization.

[Batch and Streaming in the World of Data Science and Data Engineering](https://medium.com/capital-one-tech/batch-and-streaming-in-the-world-of-data-science-and-data-engineering-2cc029cdf554) - A good overview of both batch and stream processing as they relate to DS/DE.

[The Journey from Batch to Real-time with Change Data Capture](https://medium.com/capital-one-tech/the-journey-from-batch-to-real-time-with-change-data-capture-c598e56146be) - An explanation of the different technologies that can be used in CDC.

[CICD and Data](https://medium.com/capital-one-tech/cicd-pipelines-and-data-platforms-758b074b38b1) - An older post that helped give rise to DataOps.

[Doing The Hard Things First — Lessons From Our Cloud Journey](https://medium.com/capital-one-tech/doing-the-hard-things-first-lessons-from-our-cloud-journey-6f6da77ae147) - The financial industry is usually the last to take on bigger migration efforts, so this was a good overview of Capital One's cloud migration.

[Scaling to Billions of Requests-The Serverless Way at Capital One](https://medium.com/capital-one-tech/scaling-to-billions-of-requests-the-serverless-way-at-capital-one-e5958b4fa1b7) - How Capital One scales to support all the transactions it handles.

[Guardrails for AWS Event-Driven Serverless Architectures](https://medium.com/capital-one-tech/guardrails-for-aws-event-driven-serverless-architectures-f9bc12ad689f) - Best practices for using serverless technologies in AWS.

[The 3 R’s of SREs: Resiliency, Recovery & Reliability](https://medium.com/capital-one-tech/the-3-rs-of-sres-resiliency-recovery-reliability-5f2f5360a91b) - Even though this post is about SRE, many of the same principles apply to DRE as well.

[3 Considerations for Containers & Serverless Compute Options](https://medium.com/capital-one-tech/3-considerations-for-containers-serverless-compute-options-583d5d6ee93d) - Things to consider when migrating to serverless.

[Serverless Stream Consumers — Common Pitfalls and Best Practices](https://medium.com/capital-one-tech/serverless-stream-consumers-common-pitfalls-and-best-practices-8fd431a892f) - Best practices for properly ingesting stream data.

[4 Serverless Myths to Understand Before Getting Started with AWS](https://medium.com/capital-one-tech/4-serverless-myths-to-understand-before-getting-started-with-aws-48c4ab1203ab) - A good overview of misconceptions to set aside before getting started with serverless.

[Embrace the Chaos … Engineering](https://medium.com/capital-one-tech/embrace-the-chaos-engineering-203fd6fc6ff7) - A good overview of how to do chaos engineering properly.

[6 Principles of a Well Managed Change](https://medium.com/capital-one-tech/6-principles-of-a-well-managed-change-35a3f4fed28e) - Good principles to consider when it comes to bigger changes.

[Governance in a DevOps Environment](https://medium.com/capital-one-tech/devops-and-governance-56f6ecae1181) - Properly integrating DevOps with governance.

[4 Steps for Pairing the Cloud and DevOps to Improve Resiliency](https://medium.com/capital-one-tech/4-steps-for-pairing-cloud-and-devops-to-improve-resiliency-c72fe2e52b05) - A very good post on how DevOps can be used to improve the overall resiliency of an architecture.

[Focusing on the DevOps Pipeline](https://medium.com/capital-one-tech/focusing-on-the-devops-pipeline-topo-pal-833d15edf0bd) - The intersection of Agile and DevOps.

[Continuous Chaos — Introducing Chaos Engineering into DevOps Practices](https://medium.com/capital-one-tech/continuous-chaos-introducing-chaos-engineering-into-devops-practices-75757e1cca6d) - How chaos engineering and DevOps can feed off of one another.

[Continuous Delivery and What the Heck Happened to QA?](https://medium.com/capital-one-tech/continuous-delivery-and-what-the-heck-happened-to-qa-d6f999bc7643) - Why CD works and the importance of having a QA environment.

[DevOps is a State of Mind, Not Just a Role](https://medium.com/capital-one-tech/devops-is-a-state-of-mind-not-just-a-role-b58e577e9358) - I 100% agree with the premise of this post, as DevOps is a bigger adjustment than just learning a set of principles.

[No Testing Strategy, No DevOps](https://medium.com/capital-one-tech/no-testing-strategy-no-devops-915287e1b4fd) - A great overview of why proper testing is needed to successfully pull off DevOps.

[The Mon-ifesto Part 2: Alerting and Graphing](https://medium.com/capital-one-developers/the-mon-ifesto-part-2-alerting-and-graphing-bf51828a008f) - Proper alerting/graphing principles in application monitoring.

[The Mon-ifesto Part 3: Alert Response and Post-Mortem](https://medium.com/capital-one-tech/the-mon-ifesto-part-3-alert-response-and-post-mortem-cd227c684ac0) - Postmortems are an underappreciated aspect of monitoring and incident management, but they're key to ensuring that issues don't repeat themselves in the future.

### Chick-fil-A

[Enterprise Architecture at Chick-fil-A](https://medium.com/chick-fil-atech/enterprise-architecture-at-chick-fil-a-41bc263cf81d) - A great view of how important architecture can be in the restaurant industry.

[Site Reliability Engineering at Chick-fil-A](https://medium.com/chick-fil-atech/site-reliability-engineering-at-chick-fil-a-1e7239da7711) - An interesting overview of SRE at CFA.

[Decentralized Model Ops Platform w/ Apache Airflow](https://medium.com/chick-fil-atech/decentralized-model-ops-platform-w-apache-airflow-5c08fc4745ad) - A good overview of how Airflow is used to power MLOps at CFA.

### Clever

[Defining Clever’s Engineering Culture](https://engineering.clever.com/2017/09/06/defining-clevers-engineering-culture/) - The principles mentioned in this engineering culture should be the standard.

### Cloudbees

[DevOps Best Practices: Opinionated Software That Drives a Successful DevOps Culture](https://www.cloudbees.com/blog/devops-best-practices) - A solid collection of best practices when it comes to DevOps.

[DevOps Has Evolved Beyond Shift Left](https://www.cloudbees.com/blog/shift-left-done-right) - Shift left isn't enough when it comes to DevOps and other key practices. Time to think bigger.

### CockroachDB

[What is data partitioning, and how to do it right](https://www.cockroachlabs.com/blog/what-is-data-partitioning-and-how-to-do-it-right/) - Partitioning can make or break data applications, so it's important to know how to set it up properly.

### Codelitt

[Establishing Communication](https://blog.codelitt.com/establishing-communication/) - A good post on the importance of communication in effective graphic design.

### Coinbase

[Databricks cost management at Coinbase](https://www.coinbase.com/blog/databricks-cost-management-at-coinbase) - A good overview of various Databricks cost-saving practices, many of which I've implemented successfully.

### Commercetools

[Product & Tech — Better Together!](https://techblog.commercetools.com/product-tech-better-together-7d8ced10d83f) - Another post stressing the importance of the interaction between tech and product teams.

[Staff and Principal Engineers: why do we need them now? (Part 2)](https://techblog.commercetools.com/staff-and-principal-engineers-why-do-we-need-them-now-part-2-a6dcf2d6ad34) - Principal/staff engineers are essential to larger organizations, so this was a good summary of how they made a difference at Commercetools.

[How we Roadmap in 2021](https://techblog.commercetools.com/how-we-roadmap-in-2021-d024a4c6b180) - Effective roadmapping makes the Agile process a lot easier than it tends to be.

[It’s done! Or is it?](https://techblog.commercetools.com/its-done-or-is-it-9fe91b9a6981) - A proper definition of done removes the guesswork from sprint planning.

### Compass

[Building Great Products at Compass IDC](https://medium.com/compass-true-north/building-great-products-at-compass-idc-9c8c7587a5b3) - A good guide on what it takes to build "great" products.

[Repositories — One or Many](https://medium.com/compass-true-north/repositories-one-or-many-f9da590611af) - This is worth more of a discussion than you think. The monorepo vs. multiple repo debate can have cascading effects depending on which approach you choose.

[The Engineering Manager Guide: Spinning Up a Results Oriented Team](https://medium.com/compass-true-north/the-engineering-manager-guide-spinning-up-a-results-oriented-team-ed3337675a5b) - Teams need to be focused on impact/results. This was a great overview of the practices needed to get there.

[Writing Good Commit Messages](https://medium.com/compass-true-north/writing-good-commit-messages-fc33af9d6321) - Leaving this as more of a reminder to myself, as my commit messages are lazy (and occasionally involve curse words).

### Confluent

[Is Apache Kafka a Database? With ksqlDB, Most Definitely](https://www.confluent.io/blog/is-kafka-a-database-with-ksqldb/) - A good overview of ksqlDB and how it compares to traditional databases.

### Credit Karma

[Effectively attending a tech conference](https://engineering.creditkarma.com/effectively-attending-a-tech-conference) - Making the most out of a tech conference is harder than it seems.

[Credit Karma Data Explorer](https://engineering.creditkarma.com/credit-karma-data-explorer) - A good overview of how Credit Karma is making data discovery easier.

[How Engineering Rotation Programs Can Help Teams Scale](https://engineering.creditkarma.com/how-engineering-rotation-programs-can-help-teams-scale) - As someone who started out in a rotational program, I can say these can be really effective for developing the future of your workforce.

### Criteo

[How Much Can Bad Data Cost Us?](https://medium.com/criteo-engineering/how-much-can-bad-data-cost-us-9b12a3939170) - Bad data has many side effects, and that is why data quality is a fight you must always take on.

[DataDoc — The Criteo Data Observability Platform](https://medium.com/criteo-engineering/datadoc-the-criteo-data-observability-platform-2cd826a9a1af) - A wonderful overview of an effective data observability platform and how it met all of its use cases.

[Technical Data Roadmap: Why and how to build it using a maturity matrix?](https://medium.com/criteo-engineering/technical-data-roadmap-why-and-how-to-build-it-using-a-maturity-matrix-7fa4acb8cc77) - An effective technique for successful roadmapping.

[Big Data Quality at Criteo](https://medium.com/criteo-engineering/big-data-quality-at-criteo-66c6bd0d42d8) - You can never mention data quality enough.

[Data Governance at Criteo](https://medium.com/criteo-engineering/data-governance-at-criteo-e13b4d5047a1) - A good overview of how an effective data governance process can be put in place.

### dbt

[The Analytics Development Lifecycle (ADLC)](https://www.getdbt.com/resources/guides/the-analytics-development-lifecycle) - A good summary of how the SDLC can be applied to analytics.

### D2iq

[All Together Now: FinOps, Kubernetes, and Platform Engineering](https://d2iq.com/blog/all-together-finops-kubernetes-platform-engineering) - Applying FinOps to Kubernetes cost savings.

[The Best Way to Control Kubernetes and Cloud Costs](https://d2iq.com/blog/the-best-way-to-control-kubernetes-and-cloud-costs) - More FinOps with Kubernetes.

### Dagster

[Dagster at all 5 Steps of the Development Lifecycle](https://dagster.io/blog/dagster-five-stages-development-lifecycle) - Having an orchestrator involved at all steps of the development lifecycle is really helpful for data engineers. This is how Dagster tries to accomplish just that.

[Declarative Scheduling for Data Assets](https://dagster.io/blog/declarative-scheduling) - Declarative scheduling just makes sense when it comes to managing data assets.

[Partitions in Data Pipelines](https://dagster.io/blog/partitioned-data-pipelines) - Thinking in partitions helps set Dagster apart from its competitors.

[What Dagster Believes About Data Platforms](https://dagster.io/blog/what-dagster-believes-about-data-platforms) - "Data engineering is software engineering". Preach!

[Balancing the Data Scales: Centralization vs. Decentralization](https://dagster.io/blog/balancing-the-data-scales-centralization-vs-decentralization) - Centralization vs. decentralization is a never-ending topic, but this is a good primer on the pros and cons of each.

[How to Make Data a Team Sport](https://dagster.io/blog/how-to-make-data-a-team-sport) - Data democratization can be a big challenge at organizations, but when it's done well, it helps everyone.

[Data Visibility -- A Primer](https://dagster.io/blog/data-visibility-primer) - Data quality, lineage, and more are all key to proper data visibility.
260 | 261 | ### DataKitchen 262 | 263 | [Data Observability and Monitoring with DataOps](https://medium.com/data-ops/data-observability-and-monitoring-with-dataops-45fe822196f7) - A very useful overview of DataOps and a platform that supports it 264 | 265 | [Use DataOps With Your Data Mesh to Prevent Data Mush](https://medium.com/data-ops/use-dataops-with-your-data-mesh-to-prevent-data-mush-ec341492cebc) - A good post on how DataOps and data mesh are intertwined 266 | 267 | ### Deliveroo 268 | 269 | [The Emergence and Evolution of Analytics Engineering at Deliveroo](https://deliveroo.engineering/2022/01/19/the-emergence-and-evolution-of-analytics-engineering-at-deliveroo.html) - Analytics engineering is definitely on the rise as of late, so this is a good introduction to what exactly that is 270 | 271 | [CloudFormation To Terraform](https://deliveroo.engineering/2020/01/02/CloudFormation-To-Terraform.html) - I've always been a proponent of Terraform, and it was good to see that Deliveroo agrees with me 272 | 273 | [Data Sink](https://deliveroo.engineering/2017/06/15/data-sink.html) - An application of data sinks 274 | 275 | ### Discord 276 | 277 | [How Discord Stores Billions of Messages](https://discord.com/blog/how-discord-stores-billions-of-messages) - Discord is one of the fastest growing communities out there, so it's interesting to see how they manage to hold onto all those messages 278 | 279 | [How Discord Creates Insights From Trillions Of Data Points](https://discord.com/blog/how-discord-creates-insights-from-trillions-of-data-points) - Dealing with a lot of data isn't easy, so we can all take a page from Discord 280 | 281 | [How Data Science Informs Strategy Innovation At Discord](https://discord.com/blog/how-data-science-informs-strategy-innovation-at-discord) - A good post about how relevant Data Science needs to be in the grand scheme of things 282 | 283 | [How Discord Stores Trillions of 
Messages](https://discord.com/blog/how-discord-stores-trillions-of-messages) - An impressive view of how Discord has continued to grow and managed to sustain that growth 284 | 285 | [How Discord Uses Open-Source Tools for Scalable Data Orchestration & Transformation](https://discord.com/blog/how-discord-uses-open-source-tools-for-scalable-data-orchestration-transformation) - A good overview of how Discord overhauled their orchestration platform to Dagster and dbt 286 | 287 | [Overclocking dbt: Discord's Custom Solution in Processing Petabytes of Data](https://discord.com/blog/overclocking-dbt-discords-custom-solution-in-processing-petabytes-of-data) - How Discord runs at scale (shoutout to my former coworker Chris for this one!) 288 | 289 | ### Doordash 290 | 291 | [Building a Source of Truth for an Inventory with Disparate Data Sources](https://doordash.engineering/2022/06/21/building-a-source-of-truth-for-a-digital-inventory-with-disparate-data-sources/) - Bringing together a single source of truth in a massive organization is definitely a challenge 292 | 293 | [Meet Sibyl – DoorDash’s New Prediction Service – Learn about its Ideation, Implementation and Rollout](https://doordash.engineering/2020/06/29/doordashs-new-prediction-service/) - I've always been impressed by the role ML plays in the food service industry, so this was a cool implementation from Doordash 294 | 295 | [Lifecycle of a Successful ML Product: Reducing Dasher Wait Times](https://doordash.engineering/2023/02/15/lifecycle-of-a-successful-ml-product-reducing-dasher-wait-times/) - Another good overview of the role that ML plays in the food service industry 296 | 297 | [Ship to Production, Darkly: Moving Fast, Staying Safe with ML Deployments](https://doordash.engineering/2022/03/08/ship-to-production-darkly-moving-fast-staying-safe-with-ml-deployments/) - DevOps meets ML 298 | 299 | [Organizing Machine Learning: Every Flavor 
Welcome!](https://doordash.engineering/2020/02/12/organizing-machine-learning-every-flavor-welcome/) - A solid set of principles for growing ML 300 | 301 | [Using Metrics Layer to Standardize and Scale Experimentation at DoorDash](https://doordash.engineering/2023/04/12/using-metrics-layer-to-standardize-and-scale-experimentation-at-doordash/) - As someone wanting to know more about the metrics layer, this was a great post with a very detailed overview 302 | 303 | [How DoorDash Defines Great Engineering Management](https://doordash.engineering/2023/09/19/how-doordash-defines-great-engineering-management/) - I love the transparency behind how DoorDash wants to deliver on their management practices. This is a good template to follow. 304 | 305 | [How DoorDash Fosters Meaningful Engineering Career Development](https://doordash.engineering/2023/09/19/how-doordash-fosters-meaningful-engineering-career-development/) - A great model to follow for engineering development. 306 | 307 | [Five Common Data Quality Gotchas in Machine Learning and How to Detect Them Quickly](https://doordash.engineering/2022/09/27/five-common-data-quality-gotchas-in-machine-learning-and-how-to-detect-them-quickly/) - A good primer on a proper data quality framework 308 | 309 | ### Doximity 310 | 311 | [Finding Joy in Git Conflict Resolution](https://technology.doximity.com/articles/finding-joy-in-git-conflict-resolution) - A cool way to make merge conflicts a lot easier 312 | 313 | [Data Science & Analytics: Practitioner Insights](https://technology.doximity.com/articles/data-science-analytics-practitioner-insights) - A good set of principles for bringing the most out of data 314 | 315 | [Stars and Dimensions](https://technology.doximity.com/articles/stars-and-dimensions) - For those who want to learn more about data modeling, this post has a nice refresher 316 | 317 | ### Dropbox 318 | 319 | [Balancing quality and coverage with our data validation 
framework](https://dropbox.tech/infrastructure/balancing-quality-and-coverage-with-our-data-validation-framework) - Data validation frameworks are very helpful, and I like the way Dropbox has implemented theirs 320 | 321 | [Why we chose Apache Superset as our data exploration platform](https://dropbox.tech/application/why-we-chose-apache-superset-as-our-data-exploration-platform) - Superset seems to fly under the radar from time to time but is always a dependable choice 322 | 323 | [Lessons learned in incident management](https://dropbox.tech/infrastructure/lessons-learned-in-incident-management) - Oncall can be a messy process sometimes, but Dropbox has made it much more streamlined 324 | 325 | ### Duolingo 326 | 327 | [How we reduced our cloud spending by 20%](https://blog.duolingo.com/reducing-cloud-spending/) - A good overview on various ways you can reduce your Cloud spend 328 | 329 | [Meaningful metrics: How data sharpened the focus of product teams](https://blog.duolingo.com/growth-model-duolingo/) - A good review on the importance of metrics to product teams 330 | 331 | ### Entelo 332 | 333 | [Why You Should Make Everyone a Project Lead](https://engineering.entelo.com/why-you-should-make-everyone-a-project-lead-f167472b580d) - Leading a project is a great way to further your leadership skills 334 | 335 | ### Etsy 336 | 337 | [Towards Machine Learning Observability at Etsy](https://www.etsy.com/codeascraft/towards-machine-learning-observability-at-etsy) - A good overview on how Etsy is keeping all their ML models within scope 338 | 339 | ### Expedia 340 | 341 | [Software Architectural Patterns in Data Engineering](https://medium.com/expedia-group-tech/software-architectural-patterns-in-data-engineering-5d3bf22106a0) - A helpful expalantion of the software patterns underlying all our popular DE tools 342 | 343 | [Rethinking Data Visualization](https://medium.com/expedia-group-tech/rethinking-data-visualization-39e105cca4e8) - Product thinking applied to data 
visualization 344 | 345 | [Unified Machine Learning Platforms At Expedia Group](https://medium.com/expedia-group-tech/unified-machine-learning-platform-at-expedia-group-5aee72606c74) - Awesome overview of Expedia's ML journey 346 | 347 | [The Importance of Being a Code Reviewer](https://medium.com/expedia-group-tech/the-importance-of-being-a-code-reviewer-fdbd910fbce7) - A good set of practices to follow when it comes to code review. 348 | 349 | [Enhancing Data Reliability With An SLO Platform](https://medium.com/expedia-group-tech/enhancing-data-reliability-with-an-slo-platform-de00249756f6) - SLO platforms are very helpful for monitoring 350 | 351 | ### Facebook/Meta 352 | 353 | [Enabling static analysis of SQL queries at Meta](https://engineering.fb.com/2022/11/30/data-infrastructure/static-analysis-sql-queries/) - A really neat overview of how FB handles SQL linting, amongst other things 354 | 355 | [Move faster, wait less: Improving code review time at Meta](https://engineering.fb.com/2022/11/16/culture/meta-code-review-time-improving/) - FB's code review, especially considering it's a monorepo, is extremely well done 356 | 357 | [Tulip: Schematizing Meta’s data platform](https://engineering.fb.com/2022/11/09/developer-tools/tulip-schematizing-metas-data-platform/) - Logging is very important to FB, so this is good insight into how that performance is maintained 358 | 359 | [Scaling data ingestion for machine learning training at Meta](https://engineering.fb.com/2022/09/19/ml-applications/data-ingestion-machine-learning-training-meta/) - I didn't necessarily understand everything here, but this was an interesting read nonetheless 360 | 361 | [Improving Meta’s SLO workflows with data annotations](https://engineering.fb.com/2022/08/29/developer-tools/improving-metas-slo-workflows-with-data-annotations/) - Annotations can certainly give more insight into observability 362 | 363 | [Introducing Zelos: A ZooKeeper API leveraging 
Delos](https://engineering.fb.com/2022/06/08/developer-tools/zelos/) - Interesting overview of how FB plans on moving from ZooKeeper to something better suited to their scale 364 | 365 | [BellJar: A new framework for testing system recoverability at scale](https://engineering.fb.com/2022/05/05/developer-tools/belljar/) - Recovering from an outage can't be easy for something at the scale of FB, so this was a good overview of how they accomplish it 366 | 367 | [SLICK: Adopting SLOs for improved reliability](https://engineering.fb.com/2021/12/13/production-engineering/slick/) - FB's monitoring is top-notch, and it's overviews like these that show why 368 | 369 | [Nemo: Data discovery at Facebook](https://engineering.fb.com/2020/10/09/data-infrastructure/nemo/) - FB's data discovery, speaking from personal experience, is immensely impressive 370 | 371 | [Aria Presto: Making table scan more efficient](https://engineering.fb.com/2019/06/10/data-infrastructure/aria-presto/) - Table scans are a painful activity, so making them more efficient carries a lot of weight in SQL engines 372 | 373 | [Getafix: How Facebook tools learn to fix bugs automatically](https://engineering.fb.com/2018/11/06/developer-tools/getafix-how-facebook-tools-learn-to-fix-bugs-automatically/) - Obviously treading into dangerous territory, but automating bug squashing could be very useful in many places 374 | 375 | [Migrating Messenger storage to optimize performance](https://engineering.fb.com/2018/06/26/core-data/migrating-messenger-storage-to-optimize-performance/) - How a service the size of Messenger is able to stay afloat 376 | 377 | [Rapid release at massive scale](https://engineering.fb.com/2017/08/31/web/rapid-release-at-massive-scale/) - DevOps applied to FB 378 | 379 | [Facebook Chef cookbooks](https://engineering.fb.com/2016/04/15/core-data/facebook-chef-cookbooks/) - An older post, but a look at how Facebook puts CI/CD to use 380 | 381 | [Engineering Culture: Code
ownership](https://engineering.fb.com/2014/10/28/culture/engineering-culture-code-ownership/) - Code ownership is certainly a debatable topic 382 | 383 | [Scaling Mercurial at Facebook](https://engineering.fb.com/2014/01/07/core-data/scaling-mercurial-at-facebook/) - The Mercurial monorepo at FB is gigantic, so this is an interesting look into how it actually serves the thousands of engineers who work on it. 384 | 385 | [Presto: Interacting with petabytes of data at Facebook](https://engineering.fb.com/2013/11/06/core-data/presto-interacting-with-petabytes-of-data-at-facebook/) - Presto laid the foundation for what is now Trino, so understanding how Presto is as efficient as it is will help explain Starburst Galaxy and the like 386 | 387 | [Join Optimization in Apache Hive](https://engineering.fb.com/2010/12/15/core-data/join-optimization-in-apache-hive/) - Older article, but join optimizations in Hive are still a relevant topic 388 | 389 | [Scaling Out](https://engineering.fb.com/2008/08/20/core-data/scaling-out/) - An early post from before FB was the FB we know today, but still a good lesson to be learned 390 | 391 | [Scheduling Jupyter Notebooks at Meta](https://engineering.fb.com/2023/08/29/security/scheduling-jupyter-notebooks-meta/) - A bit specific to Meta due to Bento not being open-source, but good principles nonetheless 392 | 393 | [Data engineering at Meta: High-Level Overview of the internal tech stack](https://medium.com/@AnalyticsAtMeta/data-engineering-at-meta-high-level-overview-of-the-internal-tech-stack-a200460a44fe) - Best thing you'll ever read on comparing the Meta DE tech stack to an open-source one 394 | 395 | [The future of the data engineer — Part I](https://medium.com/@AnalyticsAtMeta/the-future-of-the-data-engineer-part-i-32bd125465be) - A great read on the future of DE 396 | 397 | [Four Analytics Best Practices We Adopted — and Why You should
Too](https://medium.com/@AnalyticsAtMeta/four-analytics-best-practices-we-adopted-and-why-you-should-too-a1058ce5f8af) - Good practices to follow for a successful analytics implementation 398 | 399 | [Analytics Career Development at Meta](https://medium.com/meta-analytics/analytics-career-development-at-meta-4327c011aaea) - What career advancement at Meta looks like 400 | 401 | [Automating data removal](https://engineering.fb.com/2023/10/31/data-infrastructure/automating-data-removal/) - A good system to remove data with reduced risk 402 | 403 | [What it takes to be a Senior IC at Meta](https://medium.com/@AnalyticsAtMeta/being-a-senior-ic-59ee705ba3c1) - A good breakdown of the senior IC level compared to other IC levels 404 | 405 | [Composable data management at Meta](https://engineering.fb.com/2024/05/22/data-infrastructure/composable-data-management-at-meta/) - A good introduction to setting up a composable data stack, which is becoming more and more relevant 406 | 407 | [How Meta discovers data flows via lineage at scale](https://engineering.fb.com/2025/01/22/security/how-meta-discovers-data-flows-via-lineage-at-scale/) - Nice overview of how Meta is able to successfully capture lineage across their gigantic codebase 408 | 409 | [How Meta understands data at scale](https://engineering.fb.com/2025/04/28/security/how-meta-understands-data-at-scale/) - Cool blog on how Meta makes the most of their sprawling data 410 | 411 | ### Flipkart 412 | 413 | [Transforming Data Analytics at Flipkart: Self Serve Insights on Petabytes scale data](https://blog.flipkart.tech/transforming-data-analytics-at-flipkart-self-serve-insights-on-petabytes-scale-data-fa59caf2bc54) - Self-serve analytics is all the rage in our current AI age. This is a good overview of how to build a platform to handle that.
414 | 415 | ### Funding Circle 416 | 417 | [How we manage documentation at Funding Circle for our Data Platform](https://medium.com/funding-circle/how-we-manage-documentation-at-funding-circle-for-our-data-platform-960a422b9b2e) - A great guide on properly handling documentation 418 | 419 | ### Future Processing 420 | 421 | [Data science and data analytics – know the difference](https://www.future-processing.com/blog/data-science-and-data-analytics-know-the-difference/) - These terms are sometimes used interchangeably but do differ, so it's important to distinguish them 422 | 423 | [7 common Big Data security issues](https://www.future-processing.com/blog/7-common-big-data-security-issues/) - Security is sometimes an afterthought when it comes to big data, so it's important to be aware of the various issues you may encounter while setting up these applications 424 | 425 | ### Fynd 426 | 427 | [Introducing Developer-less Data Workbench — Making business analysts, Masters of the data!](https://blog.gofynd.com/introducing-developer-less-data-workbench-making-business-analysts-masters-of-the-data-9eee49601d52) - Considering this was written in 2015, this is an impressive overview of data enablement with automation 428 | 429 | ### Gamechanger 430 | 431 | [Apache Airflow on AWS ECS](https://tech.gc.com/apache-airflow-on-aws-ecs/) - Many different implementations of Airflow are available, but I haven't seen too many leveraging ECS before 432 | 433 | [Let me automate that for you](https://tech.gc.com/let-me-automate-that-for-you/) - Obviously, we want automation wherever we can have it, so this was a simple walkthrough of how it's done at Gamechanger 434 | 435 | [Data Interruption Process](https://tech.gc.com/data-interruption-process/) - This is an unusual name for on-call, but it's an effective (albeit older) approach nonetheless 436 | 437 | [What Good Engineers Do](https://tech.gc.com/what-good-engineers-do/) - A solid set of principles for what makes a
good engineer 438 | 439 | ### Grab 440 | 441 | [How we store and process millions of orders daily](https://engineering.grab.com/how-we-store-millions-orders) - For those who want to know more about DynamoDB, this is helpful 442 | 443 | [Embracing a Docs-as-Code approach](https://engineering.grab.com/doc-as-code) - Documentation is an often overlooked area, but this is a good approach to making sure it remains a chief priority 444 | 445 | [Real-time data ingestion in Grab](https://engineering.grab.com/real-time-data-ingestion) - How Grab handles real-time data ingestion 446 | 447 | [Trident - Real-time Event Processing at Scale](https://engineering.grab.com/trident-real-time-event-processing-at-scale) - I was not too familiar with IFTTT (if this, then that) design before, so this was an interesting read 448 | 449 | ### Gusto 450 | 451 | [The Accidental Tech Lead](https://engineering.gusto.com/the-accidental-tech-lead/) - On growing into the tech lead role, which can sometimes happen by accident simply on account of experience 452 | 453 | [Cultivating Engineering Growth](https://engineering.gusto.com/cultivating-engineering-growth-2/) - Good tips on how to set engineers up for success through mentorship 454 | 455 | ### Haptik 456 | 457 | [Kubernetes Production Best Practices - Part I](https://www.haptik.ai/tech/k8s-prod-best-practices) - For those using Kubernetes in their workflows, a solid set of best practices 458 | 459 | ### Hashnode 460 | 461 | [How to Build Event-Driven Architecture on AWS](https://engineering.hashnode.com/how-to-build-event-driven-architecture-on-aws) - A good tutorial on setting up event-driven architectures in AWS, including the different routes that can be taken 462 | 463 | ### Heap 464 | 465 | [How I Learned to Stop Worrying and Love Tech Debt](https://www.heap.io/blog/how-i-learned-to-stop-worrying-and-love-tech-debt) - The term "papercuts" is definitely a reasonable way to pull tech debt items into planning 466 | 467 | ### 
HelloFresh 468 | 469 | [How HelloTech’s working and knowledge sharing culture supports a company on scale](https://engineering.hellofresh.com/how-hellotechs-working-and-knowledge-sharing-culture-supports-a-company-on-scale-cb37f1901947) - Companies with good knowledge-sharing cultures are the ones whose employees succeed the most IMO 470 | 471 | [SLOs for everyone with Sloth](https://engineering.hellofresh.com/slos-for-everyone-with-sloth-1704009b20a2) - A very well-detailed explanation of how HelloFresh has full-scale monitoring for their SLOs in place 472 | 473 | [How HelloFresh establishes Data Quality with an in-house tool](https://medium.com/hellofresh-dev/how-hellofresh-establishes-data-quality-with-an-in-house-tool-ecb6fe060ba2) - A very nice data quality implementation that also attempts to shift data quality left 474 | 475 | [Data driven Snowflake optimisation at HelloFresh](https://engineering.hellofresh.com/data-driven-snowflake-optimisation-at-hellofresh-55a5b56aa9af) - It's no secret in the DE world that Snowflake can be expensive. A good guide on how to tune down those costs.
476 | 477 | ### Helpshift 478 | 479 | [Building a Data warehouse with Hive at Helpshift — Part 1](https://medium.com/helpshift-engineering/building-a-data-warehouse-with-hive-at-helpshift-part-1-443046df6484) - A bit dated, but still a useful overview of how you can build a warehouse with Hive as your backbone 480 | 481 | ### Instacart 482 | 483 | [Building for Balance](https://tech.instacart.com/building-for-balance-e61fb9511893) - A very thorough overview of how Instacart finds the balance between fast deliveries and high-earning opportunities for their drivers 484 | 485 | [The Next Era of Data at Instacart](https://medium.com/tech-at-instacart/the-next-era-of-data-at-instacart-e081d8dfa162) - Good post on the future of the data org at Instacart 486 | 487 | [Adopting dbt as the Data Transformation Tool at Instacart](https://tech.instacart.com/adopting-dbt-as-the-data-transformation-tool-at-instacart-36c74bc407df) - Good to see bigger companies starting to adopt dbt 488 | 489 | ### Intuit 490 | 491 | [Democratizing AI to Accelerate ML Model Development in Weeks vs. Months](https://medium.com/intuit-engineering/democratizing-ai-to-accelerate-ml-model-development-in-weeks-vs-months-9e895e3239a9) - A good overview of how the ML development process has been sped up at Intuit 492 | 493 | [How to Ensure Release Candidates are Good2Go?
Automated Performance Pipelines.](https://medium.com/intuit-engineering/how-to-ensure-release-candidates-are-good2go-automated-performance-pipelines-9e2d28200ca0) - Proper performance testing as part of the CI/CD process doesn't happen often enough, but this is a good set of principles for getting there 494 | 495 | ### LINE 496 | 497 | [A story of introducing data lineage into LINE's large-scale data platform](https://engineering.linecorp.com/en/blog/data-lineage-on-line-big-data-platform) - A good implementation of lineage in many different capacities at LINE 498 | 499 | ### LinkedIn 500 | 501 | [Super Tables: The road to building reliable and discoverable data products](https://engineering.linkedin.com/blog/2022/super-tables--the-road-to-building-reliable-and-discoverable-dat) - LinkedIn's overview of "super tables" helps bring out the best in their data products 502 | 503 | [Open Sourcing Venice – LinkedIn’s Derived Data Platform](https://engineering.linkedin.com/blog/2022/open-sourcing-venice--linkedin-s-derived-data-platform) - An impressive data platform implementation from LinkedIn 504 | 505 | [Scalable Automated Config-Driven Data Validation with ValiData](https://www.linkedin.com/blog/engineering/data-management/scalable-automated-config-driven-data-validation) - A nice way to automate data validation 506 | 507 | [LakeChime: A Data Trigger Service for Modern Data Lakes](https://www.linkedin.com/blog/engineering/data-management/lakechime-a-data-trigger-service-for-modern-data-lakes) - A great approach to ingesting data as soon as it's available 508 | 509 | [Right-sizing Spark executor memory](https://www.linkedin.com/blog/engineering/infrastructure/right-sizing-spark-executor-memory) - A good overview of Spark tuning 510 | 511 | [Practical text-to-SQL for data analytics](https://www.linkedin.com/blog/engineering/ai/practical-text-to-sql-for-data-analytics) - Effective guide/overview of building an SQL bot and why it'd be helpful in a
larger organization 512 | 513 | [Journey of next generation control plane for data systems](https://www.linkedin.com/blog/engineering/infrastructure/journey-of-next-generation-control-plane-for-data-systems) - A really helpful control plane for data to cut down on time and effort 514 | 515 | ### Lyft 516 | 517 | [Securing Apache Airflow UI With DAG Level Access](https://eng.lyft.com/securing-apache-airflow-ui-with-dag-level-access-a7bc649a2821) - DAG-level access may be the next evolution of Airflow UI access combined with RBAC 518 | 519 | [Open Sourcing Amundsen: A Data Discovery And Metadata Platform](https://eng.lyft.com/open-sourcing-amundsen-a-data-discovery-and-metadata-platform-2282bb436234) - Amundsen is becoming a very popular data discoverability platform, and for good reason 520 | 521 | [Running Apache Airflow At Lyft](https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff) - Lyft is one of the big "power" users of Airflow, and their model can serve as a template for many 522 | 523 | [Big Savings On Big Data](https://medium.com/lyft-engineering/big-savings-on-big-data-9c74b7a35326) - A nice overview of how Lyft managed to bring down their costs in their processing 524 | 525 | [Gotchas of Streaming Pipelines: Profiling & Performance Improvements](https://medium.com/lyft-engineering/gotchas-of-streaming-pipelines-profiling-performance-improvements-301439f46412) - Good tips on optimizing streaming pipelines 526 | 527 | [From Big Data to Better Data: Ensuring Data Quality with Verity](https://medium.com/lyft-engineering/from-big-data-to-better-data-ensuring-data-quality-with-verity-a996b49343f6) - A very thorough overview of a great data quality platform 528 | 529 | [ETA (Estimated Time of Arrival) Reliability at Lyft](https://eng.lyft.com/eta-estimated-time-of-arrival-reliability-at-lyft-d4ca2720bda8) - A thorough overview of how Lyft tries to calculate ETA 530 | 531 | ### McDonald's 532 | 533 | [Searching for quality and speed? 
Observability can help](https://medium.com/mcdonalds-technical-blog/searching-for-quality-and-speed-observability-can-help-860e770ab1ce) - How observability is helping keep McDonald's development moving quickly 534 | 535 | [A single source of truth: Building a design system library](https://medium.com/mcdonalds-technical-blog/a-single-source-of-truth-building-a-design-system-library-8e12188809a8) - A good template for those who want to deliver a consistent user experience 536 | 537 | [Proactive monitoring: The why, what and how](https://medium.com/mcdonalds-technical-blog/proactive-monitoring-the-why-what-and-how-c280b117a835) - Proactive monitoring helps prevent bigger incidents from ever arising. It's the best way to pull off proper monitoring. 538 | 539 | ### Miro 540 | 541 | [Data Products Reliability: The Power of Metadata](https://miro.com/careers/life-at-miro/tech/data-products-reliability-the-power-of-metadata/) - A good overview of how Miro is implementing data contracts 542 | 543 | ### Netflix 544 | 545 | [Navigating the Netflix Data Deluge: The Imperative of Effective Data Management](https://medium.com/@netflixtechblog/navigating-the-netflix-data-deluge-the-imperative-of-effective-data-management-e39af70f81f7) - A great post on how Netflix manages storage costs at scale 546 | 547 | [ETL development life-cycle with Dataflow](https://medium.com/@netflixtechblog/etl-development-life-cycle-with-dataflow-9c70c64aba7b) - A very good overview of the E2E ETL process with Dataflow at Netflix 548 | 549 | [Part 1: A Survey of Analytics Engineering Work at Netflix](https://netflixtechblog.com/part-1-a-survey-of-analytics-engineering-work-at-netflix-d761cfd551ee) - A good overview of how analytics engineering is applied at Netflix 550 | 551 | ### New York Times 552 | 553 | [Congrats, You’re On Call!
Now What?](https://open.nytimes.com/congrats-youre-on-call-now-what-8d36c5ad60aa) - How to effectively handle an on-call rotation 554 | 555 | ### Nextdoor 556 | 557 | [Engineering Principles (v1) at Nextdoor](https://engblog.nextdoor.com/engineering-principles-at-nextdoor-e82743b2ef2f) - A gold standard for engineering principles 558 | 559 | ### NextRoll 560 | 561 | [Coordinated Cost Savings](https://tech.nextroll.com/blog/costs/2020/07/01/coordinated-cost-savings.html) - Cost savings are a team effort that takes a village, as this post details 562 | 563 | ### PayPal 564 | 565 | [The next generation of Data Platforms is the Data Mesh](https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522) - A very solid explanation of why data mesh is needed in data platforms 566 | 567 | [Next-Gen Data Movement Platform at PayPal](https://medium.com/paypal-tech/next-gen-data-movement-platform-at-paypal-100f70a7a6b) - Lots of moving parts, but a detailed look into everything that drives PayPal 568 | 569 | [The Journey of Metadata at PayPal](https://medium.com/paypal-tech/the-journey-of-metadata-at-paypal-c374ac66e2e6) - Bringing data ownership and discoverability to the masses at PayPal 570 | 571 | [Gimel: PayPal’s Analytics Data Processing Platform](https://medium.com/paypal-tech/gimel-paypals-analytics-data-processing-platform-22ec5890f4af) - The coolest part of this blog was realizing Romit now works on a related team at Disney :). But this platform is certainly impressive nonetheless.
572 | 573 | ### Pinterest 574 | 575 | [Improving efficiency and reducing runtime using S3 read optimization](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) - Reducing runtime with S3 reads is every data engineer's dream 576 | 577 | [How Pinterest runs Kafka at scale](https://medium.com/pinterest-engineering/how-pinterest-runs-kafka-at-scale-ff9c6f735be) - A good overview on how Kafka can be effectively scaled within an organization 578 | 579 | [500X Scalability of Experiment Metric Computing with Unified Dynamic Framework](https://medium.com/pinterest-engineering/500x-scalability-of-experiment-metric-computing-with-unified-dynamic-framework-9eb356fee676) - Running metrics at scale is tricky, but Pinterest has a good practice here with the use of Druid 580 | 581 | ### Postman 582 | 583 | [How (and Why) Postman Created a Data-Driven Hiring Process](https://medium.com/better-practices/how-and-why-postman-created-a-data-driven-hiring-process-d2269a025ae6) - I have never really liked the interviewing process, on both sides. Postman has a good model in place here. 
584 | 585 | [The Postman Data Team’s Hub-and-Spoke Model](https://medium.com/better-practices/the-postman-data-teams-hub-and-spoke-model-662707e0ef9e) - A good explanation of how the hub-and-spoke model works for Postman and its data teams 586 | 587 | [How Postman Does Data Democratization](https://medium.com/better-practices/how-postman-does-data-democratization-6aec096dc9bf) - A very thorough overview of how Postman gets more out of their data through proper democratization 588 | 589 | ### Quora 590 | 591 | [Trino at Quora Scale: Cost, Speed, and Reliability](https://quoraengineering.quora.com/Trino-at-Quora-Scale-Cost-Speed-and-Reliability) - For those using Trino/Presto, a good overview of how it's done in a larger environment 592 | 593 | ### REA Group 594 | 595 | [Accelerating experimentation with MLOps](https://www.rea-group.com/about-us/news-and-insights/blog/accelerating-experimentation-with-mlops/) - A great resource for those who want to know more about best practices in MLOps 596 | 597 | [Data Science: Principles for Success](https://www.rea-group.com/about-us/news-and-insights/blog/data-science-principles-for-success/) - A solid set of principles for enabling success in a Data Science team 598 | 599 | [Data Discovery](https://www.rea-group.com/about-us/news-and-insights/blog/data-discovery/) - A sensible implementation of Amundsen 600 | 601 | [Reflections On Designing An Enterprise Data Warehouse](https://www.rea-group.com/about-us/news-and-insights/blog/reflections-on-designing-an-enterprise-data-warehouse/) - Tips on how to design an effective data warehouse 602 | 603 | [The Ops Dojo](https://www.rea-group.com/about-us/news-and-insights/blog/the-ops-dojo/) - I'm all for using the term "dojo" to describe more of what we need to be doing 604 | 605 | ### Shopify 606 | 607 | [The 25 Percent Rule for Tackling Technical Debt](https://shopify.engineering/technical-debt-25-percent-rule) - A 25% allotment for tackling technical debt would be a dream, but Shopify raises a
very valid point on why it's necessary 608 | 609 | [The Hardest Part of Writing Tests is Getting Started](https://shopify.engineering/the-hardest-part-of-writing-tests-is-getting-started) - A very truthful title. TDD is needed, but actually getting to that initial state can be a challenge. 610 | 611 | [How Good Documentation Can Improve Productivity](https://shopify.engineering/good-documentation-productivity) - As a strong believer in good documentation, I couldn't agree more 612 | 613 | [Three Essential Remote Work Practices for Engineering Teams](https://shopify.engineering/three-essential-remote-work-practices-engineering) - Some of these are easier said than done, but they very much hold true for remote work these days 614 | 615 | [Reducing BigQuery Costs: How We Fixed A $1 Million Query](https://shopify.engineering/reducing-bigquery-costs) - Good tips on how to keep your costs low 616 | 617 | [A Software Engineer's Guide to Working Across Time Zones](https://shopify.engineering/a-software-engineers-guide-to-working-across-time-zones) - As someone who works on a team with teammates halfway around the world, very relatable points 618 | 619 | [How to Structure Your Data Team for Maximum Influence](https://shopify.engineering/how-to-structure-data-teams) - The "Diamond Defense" is not one I've ever seen before, but it makes sense as a team structure 620 | 621 | [On the Importance of Pull Request Discipline](https://shopify.engineering/on-the-importance-of-pull-request-discipline) - Good practices to follow for raising PRs 622 | 623 | [When Culture and Code Reviews Collide, Communication is Key](https://shopify.engineering/code-reviews-communication) - More relevant points than you might think 624 | 625 | [Six Tips for Staying Technical as a CTO](https://shopify.engineering/six-tips-staying-technical-cto) - My fear about moving into management is losing my technical edge, so it's cool to see this advice on how to "stay in the game" 626 | 627 | [5 Steps to Bounce Back
from a Negative Performance Review](https://shopify.engineering/five-steps-to-better-performance-reviews) - A bad performance review isn't the end of the world. It provides an opportunity to really grow. 628 | 629 | [Lessons Learned From Running Apache Airflow at Scale](https://shopify.engineering/lessons-learned-apache-airflow-scale) - Shopify has a good model in place for running Airflow 630 | 631 | [Asynchronous Communication is the Great Leveler in Engineering](https://shopify.engineering/asynchronous-communication-shopify-engineering) - Asynchronous communication is absolutely necessary in our current state of work 632 | 633 | [Data Is An Art, Not Just A Science—And Storytelling Is The Key](https://shopify.engineering/data-storytelling-shopify) - Absolutely agree with the title here. Telling a story with data is critical. 634 | 635 | [The Magic of Merlin: Shopify's New Machine Learning Platform](https://shopify.engineering/merlin-shopify-machine-learning-platform) - Merlin is a very cool implementation of ML 636 | 637 | [A Data Scientist’s Guide To Measuring Product Success](https://shopify.engineering/a-data-scientist-s-guide-to-measuring-product-success) - Good tips on how to better enable product success 638 | 639 | [Using Terraform to Manage Infrastructure](https://shopify.engineering/manage-infrastructure-with-terraform) - As a big Terraform proponent, this is a good overview on how Shopify is using it 640 | 641 | [Shopify's Playbook for Scaling Machine Learning](https://shopify.engineering/shopify-playbook-scaling-machine-learning) - A good model to follow for ML 642 | 643 | [Search at Shopify—Range in Data and Engineering is the Future](https://shopify.engineering/search-at-shopify) - A great post on why range is necessary for future development 644 | 645 | [Shopify’s Unique Data Science Hierarchy Of Needs](https://shopify.engineering/shopify-unique-data-science-hierarchy-of-needs) - Shopify has a good model in place here for Data Science 646 | 647 | 
[Five Tips for Growing Your Engineering Career](https://shopify.engineering/five-tips-for-engineer-career-growth) - A good set of tips for elevating your career 648 | 649 | [The AWARE Development Plan](https://shopify.engineering/aware-development-plan) - A very good acronym to follow for a successful career 650 | 651 | [5 Steps for Building Machine Learning Models for Business](https://shopify.engineering/building-business-machine-learning-models) - Good tips on getting ML into the picture 652 | 653 | [Modelling Developer Infrastructure Teams](https://shopify.engineering/modelling-developer-infrastructure-teams) - A good explanation of the difference between horizontal and vertical teams 654 | 655 | [Bridging the Gap Between Developers and End Users](https://shopify.engineering/bridging-gap-between-developers-and-users) - Very good tips on how to bring product and tech closer together 656 | 657 | [A Guide to Running an Engineering Program](https://shopify.engineering/running-engineering-program-guide) - Not sure if I'll ever get to this stage, but this seems like a very sensible guide if that day ever comes 658 | 659 | [Other Driven Developments](https://shopify.engineering/other-driven-developments) - Development approaches we'd never think about, but they're totally out there 660 | 661 | [How I Define My Boundaries to Prevent Burnout](https://shopify.engineering/define-boundaries-prevent-burnout) - Good tips here, including ones I honestly need to follow more 662 | 663 | [4 Tips for Shipping Data Products Fast](https://shopify.engineering/shipping-data-products-fast) - As someone who works with data products, I can attest that following these will make things go much smoother 664 | 665 | [How to Make Dashboards Using a Product Thinking Approach](https://shopify.engineering/make-dashboards-using-product-thinking-approach) - Good principles to follow for getting the most out of dashboards 666 | 667 | [How to Reliably Scale Your Data Platform for High
Volumes](https://shopify.engineering/reliably-scale-data-platform) - I feel like this approach isn't used as often as it should be, but it makes a lot of sense for making sure platforms scale 668 | 669 | [Software Release Culture at Shopify](https://shopify.engineering/software-release-culture-shopify) - This should set a standard for proper release culture 670 | 671 | [Great Code Reviews—The Superpower Your Team Needs](https://shopify.engineering/great-code-reviews) - Good practices to follow for successful code reviews 672 | 673 | [Successfully Merging the Work of 1000+ Developers](https://shopify.engineering/successfully-merging-work-1000-developers) - A good set of proper CI standards 674 | 675 | [How Shopify Scales Up Its Development Teams](https://shopify.engineering/how-shopify-scales-up-its-development-teams) - I very much agree with the points listed here on upleveling your team 676 | 677 | [Five Common Data Stores and When to Use Them](https://shopify.engineering/five-common-data-stores-usage) - For those who need to decide which type of data store to go with, this is a good reference 678 | 679 | [Implementing ChatOps into our Incident Management Procedure](https://shopify.engineering/implementing-chatops-into-our-incident-management-procedure) - I very much agree with the role of ChatOps in incident management 680 | 681 | [Code Style Consistency for Shopify’s Decade-Old Codebase](https://shopify.engineering/code-style-consistency-for-shopify-s-decade-old-codebase) - Code style is something I try to preach and uphold on our team 682 | 683 | [Why Shopify Moved to The Production Engineering Model](https://shopify.engineering/why-shopify-moved-to-the-production-engineering-model) - Having a model in place like this makes everyone's lives easier 684 | 685 | [Developer Onboarding at Shopify](https://shopify.engineering/developer-onboarding-at-shopify) - Proper onboarding can make a world of difference for engineers, and it seems like Shopify has it down pat
686 | 687 | [Unlocking Real-time Predictions with Shopify's Machine Learning Platform](https://shopifyengineering.myshopify.com/blogs/engineering/shopifys-machine-learning-platform-real-time-predictions) - Very well-done explanation of how Merlin is being used at scale today 688 | 689 | [What Being a Staff Developer Means at Shopify](https://shopify.engineering/what-being-a-staff-developer-means-at-shopify) - Being a staff developer is considered the ultimate rank, but what does it take to get there? This is a helpful guide to getting to that point. 690 | 691 | ### Sky Betting and Gambling 692 | 693 | [Team Size and Why It Matters](https://sbg.technology/2016/04/08/team-size-and-why-it-matters/) - A good breakdown of how smaller vs. bigger teams differ 694 | 695 | ### Skyscanner 696 | 697 | [Automating cloud governance at scale](https://medium.com/@SkyscannerEng/automating-cloud-governance-at-scale-895695fe4a1f) - For those who work in governance, this is a good way to keep guardrails on resource provisioning 698 | 699 | [Using engineering principles to create autonomous teams at scale](https://medium.com/@SkyscannerEng/using-engineering-principles-to-create-autonomous-teams-at-scale-a73120c4e252) - A good set of principles for ensuring teams are successful 700 | 701 | [Monoliths and Microservices](https://medium.com/@SkyscannerEng/monoliths-and-microservices-8c65708c3dbf) - How to move away from monoliths to microservices 702 | 703 | ### Slack 704 | 705 | [BuildRock: A Build Platform at Slack](https://slack.engineering/buildrock-a-build-platform-at-slack/) - Proper CI/CD platforms help unblock many teams, so it's imperative to get them right 706 | 707 | [Infrastructure Observability for Changing the Spend Curve](https://slack.engineering/infrastructure-observability-for-changing-the-spend-curve/) - Generally, it's not CI infrastructure that hogs costs, but it's always good to be aware of everything 708 | 709 | [Data Lineage at
Slack](https://slack.engineering/data-lineage-at-slack/) - Effective implementation of data lineage, especially with Slack notifications involved 710 | 711 | [How We Design Our APIs at Slack](https://slack.engineering/how-we-design-our-apis-at-slack/) - For those interested in API design, this is a good set of principles to follow 712 | 713 | [Starting an Initiative](https://slack.engineering/starting-an-initiative/) - Finding impact can be difficult at first, but persistence is key 714 | 715 | [How Big Technical Changes Happen at Slack](https://slack.engineering/how-big-technical-changes-happen-at-slack/) - Good discussion on when the hype is real and joining the trend 716 | 717 | [Deploys at Slack](https://slack.engineering/deploys-at-slack/) - A very solid CI/CD implementation 718 | 719 | [Disasterpiece Theater: Slack’s process for approachable Chaos Engineering](https://slack.engineering/disasterpiece-theater-slacks-process-for-approachable-chaos-engineering/) - Chaos engineering helps keep websites like Slack up around the clock 720 | 721 | [Data Wrangling at Slack](https://slack.engineering/data-wrangling-at-slack/) - An older article, but an effective implementation for data wrangling 722 | 723 | [Data Consistency Checks](https://slack.engineering/data-consistency-checks/) - An older article, but still covers valuable points related to data quality 724 | 725 | [Service Delivery Index: A Driver for Reliability](https://slack.engineering/service-delivery-index-a-driver-for-reliability/) - For those in SRE, this is a good primer. 726 | 727 | [Executing Cron Scripts Reliably At Scale](https://slack.engineering/executing-cron-scripts-reliably-at-scale/) - A bit strange not to see Slack using a service like Airflow to handle all of this, but a good overview nonetheless. 

[Unlocking Efficiency and Performance: Navigating the Spark 3 and EMR 6 Upgrade Journey at Slack](https://slack.engineering/unlocking-efficiency-and-performance-navigating-the-spark-3-and-emr-6-upgrade-journey-at-slack/) - A walkthrough of how Slack upgraded all of their processes to use more recent versions of EMR/Spark.

### Slalom

[Cloud Trends: A Mainstream Evolution to DataOps](https://medium.com/slalom-data-ai/cloud-trends-a-mainstream-evolution-to-dataops-219f2d7fd764) - A good overview of the relevance of DataOps in the current era

[The Building Blocks of Success: Is Data Mesh Right for My Organization?](https://medium.com/slalom-data-ai/the-building-blocks-of-success-is-data-mesh-right-for-my-organization-f05931f029db) - Data mesh is (rightfully) a buzzword right now, but that doesn't mean it's for everyone. This is a good guide on when data mesh makes sense.

[Data Is Everywhere. Is Yours Under Control?](https://medium.com/slalom-data-ai/data-is-everywhere-is-yours-under-control-7b77cc7df259) - A good post on the relevance of data governance

[Data Modelling is More than Documentation](https://medium.com/slalom-data-ai/data-modelling-is-more-than-documentation-3e9a1b73f511) - A good explanation of the different types of data models

[Deconstructing Data Mesh Principles](https://medium.com/slalom-data-ai/data-mesh-232e50f42e66) - A good overview of the key principles of a data mesh

[Data Mesh: is the argument a strawman?](https://medium.com/slalom-data-ai/data-mesh-is-the-argument-a-strawman-3cffaf55ce5e) - A post pushing back against the data mesh hype

[Building a Culture of Data and Insights](https://medium.com/slalom-data-ai/building-a-culture-of-data-and-insights-ed0c6a81f943) - A nice overview of how to enable a data-driven culture

### Soundcloud

[Building a Healthy On-Call Culture](https://developers.soundcloud.com/blog/building-a-healthy-on-call-culture) - Tips for helping ensure a smooth on-call process

[How (Not) to Build Datasets and Consume Data at Your Company](https://developers.soundcloud.com/blog/how-not-to-build-datasets-and-consume-data-at-your-company) - An effective approach to ensuring healthy data usage

[Getting a Team Back on Track](https://developers.soundcloud.com/blog/getting-a-team-back-on-track) - This is an under-discussed topic that should come up more often. A helpful set of tips for keeping teams afloat amid change.

[A Better Model of Data Ownership](https://developers.soundcloud.com/blog/a-better-model-of-data-ownership) - A helpful definition of what exactly ownership means in relation to data

### Spotify

[Why We Switched Our Data Orchestration Service](https://engineering.atspotify.com/2022/03/why-we-switched-our-data-orchestration-service/) - Flyte isn't necessarily at the level of Airflow or Prefect yet, but Spotify's explanation of why they're switching makes total sense

[Achieving Team Purpose and Pride with Scrum](https://engineering.atspotify.com/2021/05/achieving-team-purpose-and-pride-with-scrum/) - Getting the most out of Scrum by doing it the right way

[Managing Clouds from the Ground Up: Cost Engineering at Spotify](https://engineering.atspotify.com/2020/09/managing-clouds-from-the-ground-up-cost-engineering-at-spotify/) - We could all benefit from a dashboard tool like this (and many companies are now realizing how relevant it is)

[How We Improved Data Discovery for Data Scientists at Spotify](https://engineering.atspotify.com/2020/02/how-we-improved-data-discovery-for-data-scientists-at-spotify/) - A very thorough overview of how Spotify has implemented data discovery

[TC4D: Data Quality By Engineers, For Engineers](https://engineering.atspotify.com/2017/10/tc4d-data-quality-by-engineers-for-engineers-2/) - A fun initiative for bringing out the best in testing

[Qualities of Quality](https://engineering.atspotify.com/2014/04/qualities-of-quality/) - A very solid set of principles for upholding quality

[Analytics at Spotify](https://engineering.atspotify.com/2013/05/analytics-at-spotify/) - An old post, but that only goes to show how much Spotify embraces data

[Agile à la Spotify](https://engineering.atspotify.com/2013/03/agile-a-la-spotify/) - You don't see many places rewriting the Agile manifesto, but the principles Spotify is outlining make sense

[Fleet Management at Spotify (Part 1): Spotify’s Shift to a Fleet-First Mindset](https://engineering.atspotify.com/2023/04/spotifys-shift-to-a-fleet-first-mindset-part-1/) - Maintaining a lot of components is extremely difficult, but Spotify makes it look easy with this approach.

[Getting More from Your Team Health Checks](https://engineering.atspotify.com/2023/03/getting-more-from-your-team-health-checks/) - How to get the most out of your team health/pulse checks, something that's not done enough.

[Data Platform Explained](https://engineering.atspotify.com/2024/04/data-platform-explained/) - A data platform at a company that handles as much data as Spotify is bound to be interesting. I look forward to the continuation of this series.

[Data Platform Explained Part II](https://engineering.atspotify.com/2024/05/data-platform-explained-part-ii/) - A continuation of the data platform series

[Unlocking Insights with High-Quality Dashboards at Scale](https://engineering.atspotify.com/2024/08/unlocking-insights-with-high-quality-dashboards-at-scale/) - A good set of criteria for high-quality dashboards

[Are You a Dalia? How We Created Data Science Personas for Spotify’s Analytics Platform](https://engineering.atspotify.com/2024/09/are-you-a-dalia-how-we-created-data-science-personas-for-spotifys-analytics-platform/) - Using personas to make sure a platform is built appropriately is a smart model

### Squarespace

[Creating a Code Review Culture, Part 1: Organizations and Authors](https://engineering.squarespace.com/blog/2019/code-review-culture-part-1) - Good tips on how to more effectively put code together for review

[Creating a Code Review Culture, Part 2: Code Reviewers](https://engineering.squarespace.com/blog/2019/code-review-culture-part-2) - Good tips on how to be an effective code reviewer

[Data Traceability and Lineage](https://engineering.squarespace.com/blog/2016/date-traceability-and-lineage) - A bit older, but it sets the foundations for effective data lineage

### Stack Overflow

[Why Devs (Should) Like Estimates](https://stackoverflow.blog/2019/10/23/why-devs-should-like-estimates/) - Good tips on how to estimate more effectively when it comes to planning

[A Culture of Trust](https://stackoverflow.blog/2015/09/18/culture-of-trust/) - Trust is one of the most important things you need to have within a team, and I totally agree with Stack Overflow's discussion on it

[Developer Turned Manager](https://stackoverflow.blog/2015/08/07/developer-turned-manager/) - A good retrospective on transitioning from development to the management side of things

### Stitchfix

[Migrating Spark from EMR on EC2 to EMR on EKS](https://multithreaded.stitchfix.com/blog/2022/03/14/spark-eks/) - EKS is the "new" standard for Spark processing, so this is a helpful tutorial on moving Spark from EC2 to EKS

[Aggressively Helpful Platform Teams](https://multithreaded.stitchfix.com/blog/2021/02/09/aggressively-helpful-platform-teams/) - "Aggressively helpful" is exactly what platform teams need to be in order to better enable success within an organization

### Stride

[What is DevOps?](https://www.stridenyc.com/blog/what-is-devops) - A well-done primer on DevOps

[Creating Core Values that Actually Stick](https://www.stridenyc.com/blog/how-to-create-core-values-that-stick) - Core values are often brushed aside, but the organizations that really put time and effort into them are the ones that stand out from the crowd

### Target

[Chaos Leads to Resilience](https://tech.target.com/blog/chaos-leads-to-resilience) - Chaos engineering can better protect your system in the long run, so it's cool to see how Target is preparing for those scenarios

[Review Scrutiny](https://tech.target.com/blog/review-scrutiny) - Code review etiquette is an underappreciated topic, but a good one to revisit from time to time

### Thoughtworks

[Making the data dream a reality](https://www.thoughtworks.com/insights/blog/making-data-dream-reality) - The origins of data mesh and how it can better enable data-driven thinking

### Timescale

[Database Management: Behind-the-Scenes Lessons From a Data Architect](https://www.timescale.com/blog/database-management-behind-the-scenes-lessons-from-a-data-architect/) - For those who want to learn more about data centers and the ins and outs of big data, this is definitely a good post

### Toptal

[Big Data Architecture for the Masses: A ksqlDB and Kubernetes Tutorial](https://www.toptal.com/big-data/ksqldb-kubernetes-tutorial) - A good overview of ksqlDB

### Trivago

[SRE: On-Call Procedure at trivago](https://tech.trivago.com/post/2022-07-18-sre-on-call-procedure-at-trivago/) - On-call would be a lot better for everyone if more companies followed Trivago's approach

[Remastering Guilds After Five Years](https://tech.trivago.com/post/2021-05-17-remasteringguildsafterfiveyears/) - Guilds are a great way to bring out more collaboration within an organization

[Creating a Culture of Quality](https://tech.trivago.com/post/2015-08-31-culture_of_quality/) - A good post on maintaining quality in CI/CD

[Technical Decision-Making](https://tech.trivago.com/post/2023-02-22-technical-decision-making) - A good guide to help standardize the technical decision-making process

[What Have I Even Been Doing Today?](https://tech.trivago.com/post/2023-01-03-engineer-to-manager-three-mindset-shifts) - How to come to terms with moving from an IC into a management role

### Twitch

[Twitch Engineering: An Introduction and Overview](https://blog.twitch.tv/en/2015/12/18/twitch-engineering-an-introduction-and-overview-a23917b71a25/) - An older post, but still a cool overview of how Twitch is set up

### Twitter

[Data Quality Automation at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2022/data-quality-automation-at-twitter) - For those using Great Expectations, this is an effective look at how Twitter is doing it

[Powering real-time data analytics with Druid at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2022/powering-real-time-data-analytics-with-druid-at-twitter) - Druid may not be the most relevant platform anymore, but it's cool to see how Twitter is using it to power their use cases

[Next generation data insights using natural language queries](https://blog.twitter.com/engineering/en_us/topics/insights/2022/next-generation-data-insights-using-natural-language-queries) - This implementation of Qurious looks really, really cool

[Advancing Jupyter Notebooks at Twitter - Part 1](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/advancing-jupyter-notebooks-at-twitter---part-1--a-first-class-d) - How Twitter leverages Jupyter notebooks for true data-driven analysis

[Processing billions of events in real time at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/processing-billions-of-events-in-real-time-at-twitter-) - Processing 400 billion events per day is insane, so seeing how Twitter does it under the hood is very interesting

[Kafka as a storage system](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/kafka-as-a-storage-system) - You don't usually think of Kafka being used for storage, but Twitter seems to have done it effectively

[Building Twitter’s ad platform architecture for the future](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/building-twitters-ad-platform-architecture-for-the-future) - An AdServer per product is a lot, but it definitely enables better scaling

[Democratizing data analysis with Google BigQuery](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/democratizing-data-analysis-with-google-bigquery) - A very sensible approach to proper data democratization at Twitter

[Interactive Analytics at MoPub: Querying Terabytes of Data in Seconds](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2019/interactive-analytics-at-mopub) - An effective use of Druid and microservices to power interactive analytics

[ZooKeeper at Twitter](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter) - Similar to FB, a detailed breakdown of how a big platform uses ZooKeeper to stay afloat

[Productionizing ML with workflows at Twitter](https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows) - How Twitter uses Airflow to solve their ML use cases

[Using Deep Learning at Scale in Twitter’s Timelines](https://blog.twitter.com/engineering/en_us/topics/insights/2017/using-deep-learning-at-scale-in-twitters-timelines) - This is a really cool overview of how deep learning is used to power what we see on our Twitter timelines

[The Infrastructure Behind Twitter: Scale](https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale) - This gives a lot of context on how Twitter manages to scale, and you know it's only gotten more complex since then

[Discovery and Consumption of Analytics Data at Twitter](https://blog.twitter.com/engineering/en_us/topics/insights/2016/discovery-and-consumption-of-analytics-data-at-twitter) - A pretty detailed discussion on data discovery, especially for a post from 2016

### Uber

[Introducing WorkflowGuard: The Workflow Governance and Observability System That Oversees over 120,000 Data Workflows](https://www.uber.com/blog/introducing-workflowguard/) - Automated tools like these will become more of a reality, especially in larger organizations

[Crane: Uber’s Next-Gen Infrastructure Stack](https://www.uber.com/blog/crane-ubers-next-gen-infrastructure-stack/) - The future of big data processing at Uber

[Cost Efficiency @ Scale in Big Data File Format](https://www.uber.com/blog/cost-efficiency-big-data/) - A bit advanced, but a nice overview of how Uber keeps their costs in check

[Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework](https://www.uber.com/blog/streaming-real-time-analytics/) - A good implementation of real-time analytics

[How Data Shapes the Uber Rider App](https://www.uber.com/blog/how-data-shapes-the-uber-rider-app/) - A good overview of the role data plays in the Uber app

[How Uber Achieves Operational Excellence in the Data Quality Experience](https://www.uber.com/blog/operational-excellence-data-quality/) - +1 for operational excellence and proper data quality

[Continuous Integration and Deployment for Machine Learning Online Serving and Models](https://www.uber.com/blog/continuous-integration-deployment-ml/) - How Uber tackles some of their MLOps challenges

[Uber’s Journey Toward Better Data Culture From First Principles](https://www.uber.com/blog/ubers-journey-toward-better-data-culture-from-first-principles/) - I'm a big fan of the principles mentioned in this post

[Turning Metadata Into Insights with Databook](https://www.uber.com/blog/metadata-insights-databook/) - A data discovery/observability platform that can be the gold standard for others

[Monitoring Data Quality at Scale with Statistical Modeling](https://www.uber.com/blog/monitoring-data-quality-at-scale/) - Very useful applications of modeling for proper DQM

[Uber’s Data Platform in 2019: Transforming Information to Intelligence](https://www.uber.com/blog/uber-data-platform-2019/) - A bit outdated by DE standards, but it offers valuable insight into how Uber continues to perform at scale

[Solving Big Data Challenges with Data Science at Uber](https://www.uber.com/blog/solving-big-data-challenges-with-data-science-at-uber/) - Fun applications of data science within Uber

[Managing Uber’s Data Workflows at Scale](https://www.uber.com/blog/managing-data-workflows-at-scale/) - Eliminating single points of failure and converging on unified products when possible are very solid principles to consider for larger platforms

[Databook: Turning Big Data into Knowledge with Metadata at Uber](https://www.uber.com/blog/databook/) - A cool overview of how Uber approaches data discovery

[Turbocharging Analytics at Uber with our Data Science Workbench](https://www.uber.com/blog/dsw/) - Self-serve analytics platforms like the one Uber has built are the backbone of larger organizations

[Engineering Data Analytics with Presto and Apache Parquet at Uber](https://www.uber.com/blog/presto/) - How Uber uses Presto and Parquet for an efficient SQL engine

[ETA Phone Home: How Uber Engineers an Efficient Route](https://www.uber.com/blog/engineering-routing-engine/) - An interesting read on how Uber puts routes together

[Identifying Outages with Argos, Uber Engineering’s Real-Time Monitoring and Root-Cause Exploration Tool](https://www.uber.com/blog/argos-real-time-alerts/) - An earlier but still extremely relevant post on anomaly detection and the role it plays in monitoring

[The Pulse of a City: How People Move Using Uber Engineering](https://www.uber.com/blog/data-visualization-city-movement/) - For those into data visualization, a nice view into Uber transport in big cities

[Evolution of Data Lifecycle Management at Uber](https://www.uber.com/blog/evolution-of-data-lifecycle-management-at-uber/?uclick_id=add05cd1-3a86-4266-b7ca-083dc79f7105) - DLM is a very relevant topic these days, especially with an increased focus on costs. How Uber handles it is a good standard to follow.

[Dynamic Executor Core Resizing in Spark](https://www.uber.com/blog/dynamic-executor-core-resizing-in-spark/?uclick_id=add05cd1-3a86-4266-b7ca-083dc79f7105) - OOM errors in Spark are the worst. This is a good method for making that issue easier to deal with.

[Attribute-Based Access Control at Uber](https://www.uber.com/blog/attribute-based-access-control-at-uber/?uclick_id=add05cd1-3a86-4266-b7ca-083dc79f7105) - Proper access control is tricky when it comes to tables, so this is a good foundation for others to follow.

[Announcing Cadence 1.0: The Powerful Workflow Platform Built for Scale and Reliability](https://www.uber.com/blog/announcing-cadence/?uclick_id=add05cd1-3a86-4266-b7ca-083dc79f7105) - There's always more room for workflow engines, so it's cool to see what Cadence can bring to the table.

[Sparkle: Standardizing Modular ETL at Uber](https://www.uber.com/blog/sparkle-modular-etl/) - I am all for standardizing ETL development wherever possible. Sparkle seems like a very smart approach.

[Preon: Presto Query Analysis for Intelligent and Efficient Analytics](https://www.uber.com/blog/preon/) - An excellent approach to query optimization/analysis

[Genie: Uber’s Gen AI On-Call Copilot](https://www.uber.com/blog/genie-ubers-gen-ai-on-call-copilot/) - An excellent use of LLMs to cut down on manual on-call effort

[Presto® Express: Speeding up Query Processing with Minimal Resources](https://www.uber.com/blog/presto-express/) - A good read on chunking and speeding up query processing

### VTS

[Designing for Data](https://buildingvts.com/designing-for-data-e3758fb2dd2a) - Telling a story with data is underrated

### Walmart Global Tech

[Rapid & Reliable ML Experiments using MLOps Best Practices.](https://medium.com/walmartglobaltech/rapid-reliable-ml-experiments-using-mlops-best-practices-7f01e563cb3e) - A good application of MLOps principles

[element: Walmart’s Machine Learning Platform](https://medium.com/@bagavath/element-walmarts-machine-learning-platform-b8a1f7870784) - A very good overview of the ML platform that's in place at Walmart

[Unsung Saga of MLOps](https://medium.com/walmartglobaltech/unsung-saga-of-mlops-1b494f587638) - This is a good set of principles to really kick ML engineering into high gear

[MLOps — Is it a Buzzword??? Part -1](https://medium.com/walmartglobaltech/mlops-is-it-a-buzzword-part-1-8573fe95290e) - MLOps is more than just a buzzword or a trend. It's a cultural change.

[The Importance of Good Data](https://medium.com/walmartglobaltech/the-importance-of-good-data-c5d6c5d3095d) - For those who like to sleep on data quality, this one's for you

[Pillars of Walmart’s Demand Forecasting](https://medium.com/walmartglobaltech/pillars-of-walmarts-demand-forecasting-f6722de86e1a) - The pillars used to accomplish proper demand forecasting make sense for any company in the same line of work

[DataBathing — A Framework for Transferring the Query to Spark Code](https://medium.com/walmartglobaltech/databathing-a-framework-for-transferring-the-query-to-spark-code-484957a7e049) - We actually use a similar process to simplify SparkSQL queries. It's good to see others do the same.

[Engineering Acceleration with InnerSource Culture](https://medium.com/walmartglobaltech/engineering-acceleration-with-innersource-culture-5dfaeab32921) - InnerSource culture is a big deal at many companies.

[Unified Monitoring of ETL Performance with BumbleBee](https://medium.com/walmartglobaltech/unified-monitoring-of-etl-performance-with-bumblebee-bc2580954584) - A good overview of how to do effective ETL monitoring

[Resiliency Through Message-Driven Architecture](https://medium.com/walmartglobaltech/resiliency-through-message-driven-architecture-137c4547dc80) - A message/event-driven architecture is definitely a sensible choice, depending on the internals of your application

[Cloud Native Architecture Fundamentals](https://medium.com/walmartglobaltech/cloud-native-architecture-fundamentals-ac13f979916d) - A good overview of what it really means to be cloud-native

[Data as a Service](https://medium.com/walmartglobaltech/data-as-a-service-c75c0fe3660b) - A lot of these same concepts are a part of data products as we know them now

[Auditing Airflow Job Runs](https://medium.com/walmartglobaltech/auditing-airflow-batch-jobs-73b45100045) - Auditing Airflow job runs is a crucial part of proper observability

[The Keystone of Happy Teams](https://medium.com/walmartglobaltech/the-keystone-of-happy-teams-db071d076058) - Psychological safety is a great lens for distinguishing average teams from happy ones

[Building a Platform Team — Laying the Foundations](https://medium.com/walmartglobaltech/building-a-platform-team-d915221d5654) - A great overview of how to really set up a proper platform team

[Product Management 101: 8 Steps to Design Better Products](https://medium.com/walmartglobaltech/product-management-101-8-steps-to-design-better-products-b3a4436da27b) - Even as engineers, we should be familiar with many of these concepts so we can help our product stakeholders accordingly

[The Power of an Invisible Leader](https://medium.com/walmartglobaltech/the-power-of-an-invisible-leader-8c4a6143895) - I've never heard the term "invisible leader" before, but it's a sensible one based on the description

[Work Got You Stressed? Here Is My Secret To Controlling The Chaos.](https://medium.com/walmartglobaltech/work-got-you-stressed-here-is-my-secret-to-controlling-the-chaos-c5d19c4d450f) - A guide that's very applicable to me, as I struggle with work stress all the time

[5 Principles Guaranteed to Help Build a Strong Team Culture](https://medium.com/walmartglobaltech/5-principles-guaranteed-to-help-build-a-strong-team-culture-6055ab478c56) - As someone who is big on team culture, I thought this was a great read.

### Wayfair

[Introducing our Machine Learning and Data Platforms Team](https://www.aboutwayfair.com/careers/tech-blog/introducing-our-machine-learning-and-data-platforms-team) - Platform teams enable a lot of success within an organization

[Enabling Supplier Sales Through Real-time Data](https://www.aboutwayfair.com/careers/tech-blog/enabling-supplier-sales-through-real-time-data) - How real-time data unlocks more potential for Wayfair

### Wealthfront

[Rolling Back an Airflow Upgrade](https://eng.wealthfront.com/2021/02/26/rolling-back-an-airflow-upgrade/) - Things are never perfect, so this is a good post on how to recover from a failed Airflow upgrade.

### WePay

[Effective Software Design Documents](https://wecode.wepay.com/posts/effective-software-design-documents) - Effective design documents are super helpful when iterating on a product

[Improving Airflow UI Security](https://wecode.wepay.com/posts/improving-airflow-ui-security) - A good model for ensuring proper security within the Airflow UI

### Xandr

[Knowledge Transfer in Engineering: How to make it go smoothly](https://medium.com/xandr-tech/knowledge-transfer-in-engineering-both-a-challenge-and-an-opportunity-44c78fa43258) - Effective KTs are a game changer for engineers, so this is a solid set of principles for enabling them

### Yelp

[Spark Data Lineage](https://engineeringblog.yelp.com/2022/08/spark-data-lineage.html) - I've never really seen Spark being used for lineage purposes, but this is a cool implementation from Yelp showing how it can be done

[Engineering Career Series: How we onboard engineers across the world at Yelp](https://engineeringblog.yelp.com/2021/05/engineering-career-series-how-we-onboard-engineers-across-the-world-at-yelp.html) - Effective onboarding programs make the process so much easier for engineers as they start their journey

### Zalando

[Growth Engineering at Zalando](https://engineering.zalando.com/posts/2022/07/growth-engineering-at-zalando.html) - Mentoring and role frameworks help enable growth and success for engineers

[Accelerate testing in Apache Airflow through DAG versioning](https://engineering.zalando.com/posts/2022/06/accelerate-apache-airflow-testing-through-dag-versioning.html) - DAG versioning makes complete sense for testing alongside ongoing production processes that use the same DAGs

[Principal Engineering at Zalando](https://engineering.zalando.com/posts/2022/02/principal-engineering-at-zalando.html) - A good primer on what principal engineers mean to an organization

[A Systematic Approach to Reducing Technical Debt](https://engineering.zalando.com/posts/2021/11/technical-debt.html) - The concept of a tech debt rotation isn't a bad idea to help keep that area in check

[The Product Playbook](https://engineering.zalando.com/posts/2019/01/product-playbook.html) - The 4 D's framework is a sensible approach to product design

[Four Pillars Of Leading People](https://engineering.zalando.com/posts/2018/10/four-pillars-leadership.html) - Good principles to follow for those in leadership roles.

[The Democratization of ‘Data Science As A Service’](https://engineering.zalando.com/posts/2018/04/democratization-data-science.html) - I'm always a fan of promoting data science/engineering as a service

[Discovering Design Sprints](https://engineering.zalando.com/posts/2018/04/discovering-design-sprints.html) - Sometimes the war room doesn't have to be such a bad thing

[Data Analysis with Spark](https://engineering.zalando.com/posts/2018/03/data-analysis-spark.html) - For those newer to DE, a basic overview of Spark.

[Dedicated Ownership for Teams at Zalon](https://engineering.zalando.com/posts/2017/11/dedicated-ownership-for-teams-at-zalon.html) - A good model for team structure

### Zapier

[Thinking Fast and Estimating Wrong](https://zapier.com/engineering/estimating/) - Estimation never seems to be right when it comes to planning in software development, so I totally agree with the message of this post.

### Zillow

[Building a Data Streaming Platform - How Zillow Sends Data to its Data Lake](https://www.zillow.com/tech/building-a-data-streaming-platform/) - An interesting look at how Zillow combines all its data sources into the lake

[Airflow at Zillow: Easily Authoring and Managing ETL Pipelines](https://www.zillow.com/tech/airflow-at-zillow/) - Zillow has always had a strong Airflow presence, and this article from 2017 still holds up.

[Building a strong foundation to accelerate StreetEasy’s data science efforts](https://www.zillow.com/tech/building-a-strong-foundation-to-accelerate-streeteasys-data-science-efforts/) - A great post on what it takes to build a data foundation

### Zomato

[The Deep Tech Behind Estimating Food Preparation Time](https://www.zomato.com/blog/food-preparation-time) - An interesting look into the logic behind food service.

### Zumba

[Learning to Be a Tech Lead](https://medium.com/zumba-tech/learning-to-be-a-tech-lead-cc2fa870214d) - Becoming a tech lead is not a simple change and requires shifting your priorities/frame of reference.

--------------------------------------------------------------------------------