# Notes on Software Systems Engineering

These notes have been collected over time during my work as a software engineer.

Most of these notes are my own, though occasionally I quote from great books and
other resources. Notes with quotations always include references to their
sources.

These notes should not be seen as strict instructions that must be followed, but
rather as soft guidelines or recommendations. They are most effective when
considered as a whole. Taken in isolation, some may even contradict one another.
Trying to follow these notes too rigidly can diminish their value or even lead
to negative outcomes. There may also be some overlap between the notes, so it's
important not to take them too literally.

This is currently just a draft that is far from complete and organized
arbitrarily. Please don't expect it to be polished.

- [Day-to-Day Work of a Software Engineer](#day-to-day-work-of-a-software-engineer)
  - [Leave Work Better: Improving Today for a Simpler Tomorrow](#leave-work-better-improving-today-for-a-simpler-tomorrow)
  - [Fast Feedback](#fast-feedback)
  - [Start Simple](#start-simple)
  - [Look Outside Your Immediate Task, Maintain the Bigger Picture](#look-outside-your-immediate-task-maintain-the-bigger-picture)
  - [Avoid Work That Can Be Avoided](#avoid-work-that-can-be-avoided)
  - [Understand and Respect the Customer](#understand-and-respect-the-customer)
  - [Choose Where to Innovate (Carefully)](#choose-where-to-innovate-carefully)
  - [Automate everything](#automate-everything)
  - [Quick exploration](#quick-exploration)
  - [Task Sequencing: Group Related Activities for Efficiency](#task-sequencing-group-related-activities-for-efficiency)
  - [Strive for Clarity](#strive-for-clarity)
  - [Everything Explicit. No Magic.](#everything-explicit-no-magic)
  - [Close the loops, acknowledge communication](#close-the-loops-acknowledge-communication)
  - [Learn from Lessons](#learn-from-lessons)
  - [Use Diagrams](#use-diagrams)
- [Communication and Teamwork](#communication-and-teamwork)
  - [Agile Software Development Requires Strong Social Network](#agile-software-development-requires-strong-social-network)
  - [Sending Status Updates to the Team](#sending-status-updates-to-the-team)
  - [Keep Everyone in the Loop](#keep-everyone-in-the-loop)
  - [Recognize the ideas and achievements of your colleagues](#recognize-the-ideas-and-achievements-of-your-colleagues)
  - [Professional content](#professional-content)
  - [Loop in Experts for Important Actions](#loop-in-experts-for-important-actions)
- [Complexity and Cognitive Load](#complexity-and-cognitive-load)
  - [Solving Right Problems](#solving-right-problems)
  - [Solutions are Context-Driven](#solutions-are-context-driven)
  - [Weakest link](#weakest-link)
  - [Point of View](#point-of-view)
  - [Periphery](#periphery)
  - [Rational and Unconscious](#rational-and-unconscious)
  - [Humans are not designed for Big Numbers](#humans-are-not-designed-for-big-numbers)
  - [There is no such thing as Many](#there-is-no-such-thing-as-many)
  - [0-1-2-Many I](#0-1-2-many-i)
  - [0-1-2-Many II](#0-1-2-many-ii)
  - [Masking (Shadowing)](#masking-shadowing)
- [Design](#design)
  - [Poor Abstraction](#poor-abstraction)
  - [Cost of Abstraction](#cost-of-abstraction)
  - [Habitability](#habitability)
  - [Hard Feature](#hard-feature)
  - [True Name](#true-name)
  - [One Pattern per Class](#one-pattern-per-class)
  - [Archetype](#archetype)
  - [Prima Materia](#prima-materia)
  - [Mature automation](#mature-automation)
  - ["Magic" is automation that is not adequate](#magic-is-automation-that-is-not-adequate)
  - [Poisonous Systems](#poisonous-systems)
  - [Bad Design in House](#bad-design-in-house)
  - [Trade-off of Encapsulation](#trade-off-of-encapsulation)
  - [Unnecessary Flexibility](#unnecessary-flexibility)
  - [Black Box with a Green Play Button](#black-box-with-a-green-play-button)
  - [Single Source Concept and Its Exceptions](#single-source-concept-and-its-exceptions)
  - [Resilience to Change vs Fixed Perfect Solutions](#resilience-to-change-vs-fixed-perfect-solutions)
  - [Two Almost Identical Entities](#two-almost-identical-entities)
  - [Control](#control)
  - [Observable Control](#observable-control)
  - [Humans should dominate machines](#humans-should-dominate-machines)
  - [Overlapping control](#overlapping-control)
  - [Broken control loops](#broken-control-loops)
  - [Feedback](#feedback)
  - [Broken feedback loops](#broken-feedback-loops)
  - [Separation / partitioning](#separation--partitioning)
  - [Grouping](#grouping)
  - [Observability vs Correctness](#observability-vs-correctness)
  - [Don't Use RAII on a Business Logic Level](#dont-use-raii-on-a-business-logic-level)
- [Coding, code reviews, and maintenance programming](#coding-code-reviews-and-maintenance-programming)
  - [Code that Works](#code-that-works)
  - [Code Is Not Your Partner](#code-is-not-your-partner)
  - [Two Strategies for Replacing a Feature](#two-strategies-for-replacing-a-feature)
  - [Smallest Scope](#smallest-scope)
  - [Code Style as a Blocker](#code-style-as-a-blocker)
  - [Simplifying Complex Feature Branches](#simplifying-complex-feature-branches)
  - [The Moving and Changing Anti-pattern](#the-moving-and-changing-anti-pattern)
  - [Avoid Plural Names For Classes](#avoid-plural-names-for-classes)
  - [Fast Programming and Slow Programming](#fast-programming-and-slow-programming)
  - [Stable Components](#stable-components)
  - [Boring Code](#boring-code)
  - [Boring Code 2](#boring-code-2)
  - [Lack of Knowledge](#lack-of-knowledge)
  - [Lack of Knowledge II](#lack-of-knowledge-ii)
  - [Goodwill vs Pain](#goodwill-vs-pain)
- [Biases](#biases)
  - [If It Works, Then It Works Bias](#if-it-works-then-it-works-bias)
  - [Focusing only on what's most visible bias](#focusing-only-on-whats-most-visible-bias)
  - [The Fix Bias](#the-fix-bias)
  - [Resolving Merge Conflict Bias](#resolving-merge-conflict-bias)
- [Reliability](#reliability)
  - [Errors are not ok](#errors-are-not-ok)
  - [Errors must be understood and described](#errors-must-be-understood-and-described)
  - [Underlying errors shall not be hidden](#underlying-errors-shall-not-be-hidden)
  - [Critical errors vs non-critical errors](#critical-errors-vs-non-critical-errors)
  - [Assertions are better than no error handling](#assertions-are-better-than-no-error-handling)
  - [Assertions are shortcuts for a proper error handling](#assertions-are-shortcuts-for-a-proper-error-handling)
  - [Crash Early](#crash-early)
- [Testing](#testing)
  - [Write Tests, Even Bad Ones](#write-tests-even-bad-ones)
  - [TDD as a Toolbox](#tdd-as-a-toolbox)
  - [Legacy Code is Code Without Tests](#legacy-code-is-code-without-tests)
  - [Testing as a Way to Manage Complexity](#testing-as-a-way-to-manage-complexity)
  - [Test It to Engineer It](#test-it-to-engineer-it)
  - [Improve Testability](#improve-testability)
- [Distribution](#distribution)
  - [Provide Basic Test Sequences with Your Product](#provide-basic-test-sequences-with-your-product)
  - [Provide Drivers Alongside Your Hardware](#provide-drivers-alongside-your-hardware)
  - [Provide Simulators Alongside Your Hardware](#provide-simulators-alongside-your-hardware)
- [Documentation](#documentation)
  - [The Illusion of Easy Documentation](#the-illusion-of-easy-documentation)
  - [Less prose, more structure](#less-prose-more-structure)
  - [Too Much Structure Overload](#too-much-structure-overload)
  - [Encyclopedic Document](#encyclopedic-document)
- [Meetings](#meetings)
  - [Sound Check](#sound-check)
  - [Meeting Agenda](#meeting-agenda)
  - [Meeting Notes](#meeting-notes)
  - [Capturing Meeting Results](#capturing-meeting-results)
  - [Briefing In](#briefing-in)
  - [Sharing Screen & Presenting Material](#sharing-screen--presenting-material)
- [Systems](#systems)
  - [Good enough is often best](#good-enough-is-often-best)
  - [Designing Systems for Effective Work](#designing-systems-for-effective-work)
  - [The Risk of Default Outcomes](#the-risk-of-default-outcomes)
- [People and Organizations](#people-and-organizations)
  - [Everyone is busy](#everyone-is-busy)
  - [Solving Problems with Cash](#solving-problems-with-cash)
  - [The Paradox of Rushing in Software/Systems Engineering](#the-paradox-of-rushing-in-softwaresystems-engineering)
  - [Four seasons](#four-seasons)
- [Standards](#standards)
  - [Idealized standards vs. practical implementation](#idealized-standards-vs-practical-implementation)
  - [The challenge of standards implementation](#the-challenge-of-standards-implementation)
  - [Standards and best practices](#standards-and-best-practices)
  - [Standards favor good practice](#standards-favor-good-practice)
  - [Wrong is worse than early or incomplete](#wrong-is-worse-than-early-or-incomplete)
- [Requirements](#requirements)
  - [One-stop shopping](#one-stop-shopping)
- [Safety](#safety)
  - [Safety does not exist without blood, loss or failure](#safety-does-not-exist-without-blood-loss-or-failure)
  - [Safety is boring](#safety-is-boring)
  - [Safety is very hard to achieve but is very easy to lose](#safety-is-very-hard-to-achieve-but-is-very-easy-to-lose)
  - [Success breeds failure](#success-breeds-failure)
  - [Safety as a Defensive Discipline](#safety-as-a-defensive-discipline)
  - [Safety for Engineering is Like Medicine for People](#safety-for-engineering-is-like-medicine-for-people)
  - [User Interfaces and Critical Systems](#user-interfaces-and-critical-systems)
- [Books](#books)
- [Similar resources](#similar-resources)
- [Copyright](#copyright)

## Day-to-Day Work of a Software Engineer

### Leave Work Better: Improving Today for a Simpler Tomorrow

Always leave the work artifacts – whether code, documentation, diagrams, models,
or others – in a better state than they were before, giving future you or
someone else the opportunity to improve them even further.

### Fast Feedback

Fast feedback is essential for making progress and avoiding wasted effort.
It helps engineers quickly test ideas, catch mistakes early, and stay in the flow.
Useful ways to get fast feedback include test-driven development, fast-running
test suites, effective debugging tools, and simply asking a colleague for quick
advice. When starting on a new project, one of the first things to learn is how
to run the existing tests, write new ones, and figure out the quickest way to
debug. Investing in faster tools, clearer error messages, and smoother processes
pays off – the shorter the feedback loop, the more confidently and efficiently
you can work.

### Start Simple

Start with something simple, then extend it. Most often a complex problem is a
composition of simpler problems. If you are facing a problem and you are afraid
of its complexity, try to take the smallest possible step towards the solution
and see what you can do from there. Simple can also mean quick and dirty, and
that's OK, as it is only a start. Once you have something simple working, you
have a foundation to build on. Most likely this means you have an **archetype**
of the future thing: the real, complex system.

See also Kent Beck's
[Test-Driven Development book](https://en.wikipedia.org/wiki/Test-Driven_Development_by_Example),
where this approach of doing simple things is explained in great depth.

### Look Outside Your Immediate Task, Maintain the Bigger Picture

- When starting any task, take time to understand the rationale behind it (the
  WHY).
- See how the task connects to broader goals, milestones, or parallel efforts.
  It may be part of a chain where upstream or downstream effects matter.
- Maintain awareness of the bigger picture. A task that seems minor may be a
  critical blocker for a more visible effort.
  Conversely, something that appears simple might turn out to be time-consuming
  and affect teammates or dependencies.
- With a deep understanding of the task, you will start seeing how different
  strategies (e.g., Strategy X vs. Strategy Y) can lead to different outcomes.
  This kind of insight allows you to:

  - Escalate risks early.
  - Spot opportunities.
  - Align better with the system's or team's needs.

A practical application of this mindset in documentation writing is to start
each technical page with a clear problem statement and a description of its
surrounding context:

- Who or what benefits if this task is completed?
- Does it enable a system, a process, or a team?
- What is the strategic value of solving it?

Framing the problem this way helps readers, especially future engineers, orient
themselves and understand the significance of the solution that follows.

### Avoid Work That Can Be Avoided

Before starting or planning any work, always ask: Is this work truly necessary?

Sometimes, tasks are initiated based on uninformed decisions, leading to work
that ultimately provides little value or fails to achieve the desired outcomes.

"Busy work" refers to inefficient tasks that consume time and resources while
contributing little to the project's success. It can compromise schedules,
reduce technical consistency, lower team morale, and create the illusion of
progress. The ability to recognize and eliminate busy work is one of the skills
that distinguishes a senior engineer from a junior one.

Good engineering sometimes looks like cheating. Instead of implementing
something sophisticated, a smarter workaround can achieve the same result with
far less effort.
For example, rather than building a solution from scratch, reuse existing work –
whether by leveraging open-source software or buying an off-the-shelf system.

In software development, there's a well-known saying: "The best code is the code
that is never written."

### Understand and Respect the Customer

Take time to deeply understand and respect the customer, both the people and the
domain they operate in. Immerse yourself in their context. Know what they care
about, what problems they face, and how your work fits into their world.

When things go smoothly, this understanding helps you deliver real value. When
things get challenging, such as when delays or technical setbacks arise, this
relationship matters even more.

In such situations, transparency is better than defensiveness. A clear and
honest update, even when delivering bad news, builds trust. Customers almost
always prefer being informed early over being surprised later. A transparent
explanation of issues, trade-offs, and risks shows respect for their time,
planning, and decision-making.

### Choose Where to Innovate (Carefully)

Innovate where your business's focus lies and stay conservative in other areas
by using established technologies.

For example, a company focused on rocket software should likely avoid building
its own web framework or NoSQL database. Exceptions exist, but they are rare;
one example is a company that diversifies into a highly successful product
unrelated to its core business. Innovating in too many areas can compromise the
core product and cause missed deadlines.

For a great explanation, refer to this
[Boring Technology presentation](https://boringtechnology.club/).

### Automate everything

Seek opportunities to automate processes or tasks.
Automation eliminates busy work, freeing time for more valuable activities. It
reduces human error, increases efficiency, and helps maintain consistency. The
best workflows are automated ones.

### Quick exploration

The solution you're looking for might be just two clicks and a couple of Google
searches away.

When reading large documents, it can be helpful to "fly over" them to quickly
locate the most relevant section rather than reading from A to Z.

When exploring with code, a combination of quick-and-dirty scripts can sometimes
work wonders, giving immediate and valuable insights. Instead of discarding an
idea because it seems complex and time-consuming, try implementing a very basic
version first: it might provide useful insights or even a functional solution
right away.

### Task Sequencing: Group Related Activities for Efficiency

When sequencing tasks (especially repetitive ones), group related tasks together
and separate them from others.

One useful pattern is the "Inbox" approach, where input is first collected and
then acted upon. For example, when writing a technical document, separate the
task of gathering the document content (the "Inbox" of bullet points) from the
task of formulating and spelling out each individual content item.

### Strive for Clarity

Strive for clarity in everything you do. Put in the effort to make the products
of your work, or the aspects of the system you're working on, as clear as
possible. Simplify complexity – either by reducing the complexity itself through
development or, if that's not feasible, by explaining the details as clearly as
possible.

Avoid accumulating non-obvious details about your work that only you understand.
Do not hold onto esoteric knowledge – de-esoterize it.
Document it for everyone to access.

Encyclopedism or esotericism is an anti-pattern because it hides knowledge about
the system that should be common to everyone.

- Document everything, especially the most complex topics.
- Use plain English and diagrams to explain complex topics to your colleagues.
  Test your content with them to ensure it is accessible. If it's still unclear,
  ask for their feedback to improve it.

### Everything Explicit. No Magic.

Whenever you face a choice between explicit and magic, always choose explicit.

"Magic" is a term software engineers use for anything that is non-obvious,
hidden, overly complex, or no longer suited to the system's current state.

Making things explicit requires constant effort to ensure clarity, so that
others can understand your work without extra effort. A good test for
explicitness is whether understanding is immediate, with no mental effort or
blockers when going through the material.

### Close the loops, acknowledge communication

A "loop" refers to any situation where one action is followed by another that
resolves the first action in some way. Often, these loops are explicitly called
"feedback loops" because they are closed with feedback that resolves an
outstanding action or state, such as marking it Done, OK, ACK, or something
similar.

Loops exist both in the systems being developed and in the organizations that
develop them.

Examples of loops:

- Answering an email in an existing email thread closes the loop created by
  that thread.
- Closing a Pull Request finalizes its status, either as Done or Won't do.
- Moving a work item ticket to Done closes it.

A task manager is an excellent tool for tracking work items that need to be
completed and closed.
For tracking non-trivial project development topics and trade-offs, a useful
practice is to maintain an "Open Questions Log" – a table where each unresolved
or unclosed item is tracked by its current status until it is resolved.

Sometimes a loop may never be closed, or it may be closed with a significant
delay. Both scenarios can lead to problems or even hazards, depending on the
type of system being developed.

Note that "Won't do" is also a valid way to close a loop. For example, closing
a Jira ticket with "Won't do" or "Won't fix" explicitly acknowledges that this
work will no longer linger in someone's backlog.

Leaving loops open is often bad practice. Some examples include:

- Not answering an email can cause project delays or result in the
  implementation of a broken or inconsistent system, leading to incidents or
  accidents in the future.
- A missed or forgotten chat message may mean important information is never
  delivered to a critical person.
- A manager who neglects to follow up on an important topic raised by employees
  leaves it unresolved in an inbox without due attention.

### Learn from Lessons

Do something, then learn from the experience. Don't forget – take deliberate
time to reflect. The industry has developed several best practices for capturing
lessons learned:

- Standards: Organizational and industry knowledge is captured in standards,
  handbooks, guidelines, and best practices.
- Post-mortems: When something goes wrong, those involved produce a structured
  report about the event. Larger companies maintain databases of critical
  incidents that employees can study to educate themselves.
- Debriefs: After a meeting, the group discusses what went well or wrong.
- Lessons learned documentation and meetings: After completing an important
  activity, such as a project or milestone, the team takes time to reflect on
  what went well or wrong, learn from it, and document the findings.

Learning doesn't have to be only organizational – it can also be personal.

Examples:

- If a project was successful, what made it so? If a project failed, what were
  the key contributing factors? How can it be improved next time?
- Learning how to estimate software work better – what if a task was estimated
  to take X weeks but actually took 3X? Wouldn't it be valuable to improve
  estimation skills?
- If one colleague is significantly more effective than another, what makes them
  so? What tools, techniques, or habits contribute to their efficiency? Can
  something be learned from them?
- Observing bugs missed during code reviews – what types of bugs tend to escape
  static analysis or peer review? What patterns can be identified to prevent
  them in the future?

### Use Diagrams

Use diagrams as part of your daily work. A diagram can often explain far more
than several paragraphs of text.

Use diagrams for:

- Prototyping and documenting software
- Pair programming
- Hardware-software integration testing
- Meetings (including external meetings)
- Onboarding colleagues
- Everything else where a good visualization helps

There are standards and conventions for creating diagrams, such as UML, but in
practice, even very basic diagrams can be incredibly useful. Use simple shapes
like rectangles and arrows, avoid excessive colors or different shapes, and
express your concepts with the fewest visual elements possible. Creating
diagrams that are too visually complex hinders understanding and reduces their
effectiveness.
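One way to hold yourself to "rectangles and arrows only" is to keep simple diagrams as plain text. The sketch below is an illustration only, not a tool these notes prescribe. It assumes Graphviz and its DOT language as the renderer, and `to_dot` is a hypothetical helper. The sketch uses a single node shape (boxes) and one kind of arrow.

```python
# A minimal sketch, assuming Graphviz DOT as the diagram format:
# describe a diagram as (source, target) edges and emit DOT text that uses
# only one node shape (box) and plain arrows.

def to_dot(edges: list[tuple[str, str]], name: str = "G") -> str:
    """Return DOT source for a directed graph of rectangular nodes."""
    lines = [f"digraph {name} {{", "  node [shape=box];"]
    for source, target in edges:
        # Quote node names so labels containing spaces remain valid DOT.
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot([("Client", "API"), ("API", "Database")]))
```

If Graphviz is installed, the printed text can be saved to a file and rendered with, for example, `dot -Tsvg diagram.dot -o diagram.svg`. Names containing double quotes would need escaping, which this sketch does not handle.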

## Communication and Teamwork

### Agile Software Development Requires Strong Social Network

**Agile Software Development Requires Strong Social Network**. This statement is
a generalization of an idea that has been present since the inception of the
[Agile Manifesto](https://agilemanifesto.org/), and the following quote from
Kent Beck pinpoints it very clearly:

> In The Forest (more specifically on an XP-style team), we handle communication
> of design & implementation multiple ways:
>
> - Communicative code.
> - Readable & predictive tests.
> - A strong social network.
>
> It's only when there is a large audience for stable information (such as the
> JUnit API) that we resort to separate documentation.

See
[Kent Beck - Anatomy of Oscillation](https://tidyfirst.substack.com/p/anatomy-of-oscillation).

### Sending Status Updates to the Team

Software engineering teams often communicate daily via chat. A proven pattern is
for each team member to send updates about their work, allowing the entire team
to see these messages.

Examples of such messages include:

- "Task X is done, here's the PR link. @A and @B, could you take a look?"
- "This week my focus is... Next, I am going to work on..."
- "I see your PR, but I'm working on something else."
- "What does the team think about introducing the coding convention ABC?"

While this may seem obvious to some teams, in others the daily chat is
completely silent, reflecting a lack of communication between peers throughout
the day. When messages are exchanged, they create a certain "pulse" within the
team, signaling that the group is actively working on meaningful tasks and is
open to discussion, iteration, and improvement.

This activity not only serves an informational purpose (increasing awareness)
but also has learning, motivational, and even entertaining aspects.

### Keep Everyone in the Loop

Share regular updates with the people who rely on your work: your manager,
teammates, or anyone following your technical progress. In fast-moving projects,
keeping others informed helps avoid surprises and keeps everyone aligned.

In an office setting, updates often happen naturally. If the team is
well-connected, these updates may happen through casual conversations or small
talk over lunch. This kind of informal communication spreads useful information
without needing formal meetings.

One big advantage: by the time your work reaches a review, such as a code review,
documentation review, or project milestone, people will already know about it
and may have given input earlier. This makes reviews faster, smoother, and less
stressful.

Another reason to talk about your work is visibility and recognition. Others
might not know:

- what challenges you're facing
- how long something might take
- how your work connects to theirs

Your teammates are often busy with their own tasks. Clear communication helps
them understand what you are doing and helps your work get noticed and
appreciated.

Stay connected. Stay aligned.

### Recognize the ideas and achievements of your colleagues

Teamwork involves contributions from all team members. Whether you are a leader
or an individual contributor, it is essential to give credit where it's due when
expressing an idea that you know was authored by someone else.

This is a good practice because it fosters trust and respect within the team,
encouraging open collaboration and the free exchange of ideas.
Recognizing others' contributions also boosts morale, motivates continued input,
and strengthens the overall effectiveness of the team.

An anti-pattern is when the names of the original authors are omitted and the
work is presented in the first person, intentionally or unintentionally, as if
the content were one's own.

### Professional content

When writing an email or chat message, even if addressed to a select group,
compose it so that it would remain professional and consistent if shared with
a larger or unintended audience. Avoid vague references like "we" and "they",
especially when referring to internal teams or external parties such as
customers. Refrain from negative phrasing and excessive emotion. Your content
should be polished and ready to be forwarded by anyone, at any time, whether
intentionally or unintentionally.

### Loop in Experts for Important Actions

When making an important decision, involve the right experts. It is better to
include too many people than to miss someone who should have been part of it.

If you are writing an email or message that speaks for your team or group, check
it with others first. Make sure the message reflects what everyone agrees on.

When a message is aligned like this, it:

- Stays strong even if people question it.
- Builds trust inside and outside the team.
- Shows that the team is working together.

Taking the time to check with others makes your message clearer and more
powerful in the long run.

## Complexity and Cognitive Load

> "Complexity can be defined as intellectual unmanageability" (Nancy Leveson,
> Engineering a Safer World, p.4)

See also [Cognitive load](https://en.wikipedia.org/wiki/Cognitive_load) (and
cognitive overload).

### Solving Right Problems

"Engineers are great at solving problems but they are not always great at
identifying the right problems to be solved" (Dr. John Thomas, ESWC 2019).

### Solutions are Context-Driven

Even the best solution to a problem is valid only within a given context. A
slight change in the context can invalidate the solution, requiring one to start
from scratch. No solution is universally perfect; solutions address specific
problems or contexts in an "optimal enough" way. Understanding this also
encourages detachment from ego-driven perfection, allowing solutions to evolve
as the environment changes.

Examples:

- A clean architecture or pattern may shift to a completely different, sometimes
  opposite, solution due to changing requirements or system environments.
- A "perfect" solution might be discarded because a new team or team leader
  dislikes technology X and prefers technology Y, or simply to follow emerging
  industry trends.
- Perfectly clean code may be rewritten and become more obfuscated due to
  necessary performance optimizations.
- Highly efficient code might be rewritten to sacrifice performance in favor of
  better maintainability and readability, especially for a larger team.

### Weakest link

A piece of information is only as clear as its most ambiguous piece.
This is a 593 | generalization of the following fragment from "Patterns for Writing Effective 594 | Use Cases" by Steve Adolph et al., Chapter 6.6: 595 | 596 | > Like the old proverb, "A chain is only as strong as its weakest link", a use 597 | > case is only as clear as its most ambiguous step. 598 | 599 | ### Point of View 600 | 601 | [How NASA Builds Teams](https://www.wiley.com/en-us/How+NASA+Builds+Teams%3A+Mission+Critical+Soft+Skills+for+Scientists%2C+Engineers%2C+and+Project+Teams-p-9780470456484): 602 | 603 | > The right coordinate system can turn an impossible problem into two really 604 | > hard ones. 605 | 606 | [The Early History Of Smalltalk](https://worrydream.com/EarlyHistoryOfSmalltalk/): 607 | 608 | > Watching a famous guy much smarter than I struggle for more than 30 minutes to 609 | > not quite solve the problem his way (there was a bug) made quite an 610 | > impression. It brought home to me once again that "point of view is worth 80 611 | > IQ points." I wasn't smarter but I had a much better internal thinking tool to 612 | > amplify my abilities. This incident and others like it made paramount that any 613 | > tool for children should have great thinking patterns and deep beauty 614 | > "built-in." 615 | 616 | ### Periphery 617 | 618 | If your reasoning is hindered by cognitive overload while trying to solve a 619 | problem, and there's no clear first step toward a solution, take a step back and 620 | start working on the Periphery. By cleaning up the periphery, you'll often 621 | find that the core problem becomes clearer and more approachable. 622 | 623 | A good example is legacy code: issues in the periphery, such as poor variable 624 | names, incorrect class responsibilities (even those distant from your immediate 625 | problem), or a disorganized folder structure, may seem irrelevant to the core 626 | issue. However, they still contribute to the cognitive overload. Fixing them 627 | will help clear the path for your actual work.
628 | 629 | Another word for Periphery is Background; see also 630 | [Deconcentration of Attention](http://deconcentration-of-attention.com/). 631 | 632 | ### Rational and Unconscious 633 | 634 | Engineers create rational artifacts that may appear simple and mundane. However, 635 | the process behind their creation often involves deep reflection and can stem 636 | from the unconscious mind. 637 | 638 | ### Humans are not designed for Big Numbers 639 | 640 | If you have to work with a large number of entities, such as processing 10,000 641 | files or megabytes of data, start by reducing this quantity to the smallest 642 | number of entities that still makes sense for a prototype of your final work: 643 | make it work with 1 file instead of 10,000, or with 20 bytes instead of 20 644 | gigabytes. 645 | 646 | ### There is no such thing as Many 647 | 648 | Many does exist, but it is difficult for the human mind to cognize. Many needs 649 | an Umbrella that turns it into One in the way we think about it. Many can be 650 | homogeneous, like an array of objects of the same type, or heterogeneous, for 651 | example a bunch of instructions in the code, multiple functions in a test class, 652 | or a set of User Profile fields of various types: name (string), age (int), settings 653 | (object). Homogeneous collections are easier because they hide Many from us behind a 654 | well-defined interface: `containsObject`, `getAtIndex`, `enumerateWithIndex`, 655 | which saves us from dealing with Many directly. Heterogeneous Many is harder: 656 | you have to cognize and organize it yourself by grouping instructions into meaningful 657 | functions and fields into meaningful containers like structs or database 658 | tables.
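
A minimal Python sketch of the Umbrella idea for the User Profile example, using Python dataclasses in place of structs. The `Settings` and `UserProfile` names and their fields are hypothetical: the same heterogeneous fields appear once as a positional tuple and once grouped under named containers that can be handled as One.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    # Hypothetical settings fields, grouped so they travel together as One.
    theme: str = "light"
    notifications: bool = True


@dataclass
class UserProfile:
    # The heterogeneous Many (string, int, object) gets a single Umbrella.
    name: str
    age: int
    settings: Settings = field(default_factory=Settings)


# Without an Umbrella: positions carry the meaning, and readers must remember them.
loose_profile = ("John", 32, {"theme": "dark", "notifications": False})

# With an Umbrella: every field is named, and the profile is passed around as One.
profile = UserProfile(name="John", age=32, settings=Settings(theme="dark", notifications=False))

print(loose_profile[2]["theme"])  # which field was index 2 again?
print(profile.settings.theme)     # self-describing access
```

The container is also the natural place to add the next field: callers that pass `profile` around do not change.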
659 | 660 | One programming construct that fails to constrain Many is the tuple: you start with 661 | things like `let person = ("John", 32)` and `let (name, age) = person`, or 662 | `person.1`, but you quickly find yourself in a mess when the number of elements 663 | grows to a real Many (quick lesson: don't use tuples, use structs!). If you have 664 | Many, find a way to think and work with it like One. 665 | 666 | ### 0-1-2-Many I 667 | 668 | Most people start saying "so many" or "infinite" when there are actually 3 669 | or 4 (rarely more) things on the table. A variation like 1a, 1b, 2a, 2b is 670 | still within the limit of 3 or 4. This resembles an ancient way of counting: 0, 1, 2, 671 | and then "many". The arithmetic is fairly simple: 0 + 1 = 1, 1 + 1 = 2, 2 + 1 = 672 | many, 2 + 2 = many, etc. Consequence: people are quite receptive to small 673 | numbers. Say something like "this consists of 3 steps" and people will get it. 674 | Don't say "seven". See also **Humans are not designed for Big Numbers**. 675 | 676 | ### 0-1-2-Many II 677 | 678 | Don't start abstracting or DRYing from just two things. Wait until you have at 679 | least 3 of them. See also **Duplication is better than poor abstraction**. 680 | 681 | ### Masking (Shadowing) 682 | 683 | Masking/shadowing of all kinds is dangerous and should be avoided or treated 684 | with great care. 685 | 686 | Examples: 687 | 688 | - errors introduced into systems when overlapping requirements are implemented 689 | over time 690 | - masking in MC/DC 691 | - shadowing of variable declarations 692 | - typographically ambiguous symbols with overlapping visibility like `l` and 693 | `1`, `O` and `0` (see MISRA guidelines) 694 | - code reviews: real bugs can hide behind less important but more noticeable 695 | issues like typos or coding style details 696 | - bugs often hide themselves behind complexity 697 | 698 | See also Overlapping Control.
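
To make the "shadowing of variable declarations" item concrete, here is a hypothetical Python example. The nested helper declares its own `currency` parameter with a default, which masks the enclosing `currency`. The code runs without any error, so nothing signals that the caller's argument is ignored. The function names are illustrative only.

```python
def format_prices_shadowed(amounts, currency):
    # BUG: the helper's own `currency` parameter shadows the outer one,
    # so the caller's currency is silently ignored.
    def fmt(amount, currency="USD"):
        return f"{amount:.2f} {currency}"

    return [fmt(a) for a in amounts]


def format_prices(amounts, currency):
    # Fixed: the helper declares no name of its own that collides,
    # so it reads the enclosing `currency`.
    def fmt(amount):
        return f"{amount:.2f} {currency}"

    return [fmt(a) for a in amounts]


print(format_prices_shadowed([1.5, 2], "EUR"))  # ['1.50 USD', '2.00 USD']
print(format_prices([1.5, 2], "EUR"))           # ['1.50 EUR', '2.00 EUR']
```

Linters can flag this kind of redefinition, but the more robust habit is to avoid reusing names across nested scopes.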
699 | 700 | ## Design 701 | 702 | ### Poor Abstraction 703 | 704 | > Duplication is better than poor abstraction (Sandi Metz, Rails Club 2014, 705 | > Moscow). 706 | 707 | > "...ill-fitting structure is worse than none..." (Eric Evans - Domain-Driven 708 | > Design, p.446) 709 | 710 | A good example from https://www.sigbus.info/worse-is-better: 711 | 712 | > In lld v2, we decided not to use an intermediate representation. Instead, we 713 | > directly handle platform-dependent native file formats. lld v2 consists of 714 | > virtually three different linkers for Windows, macOS and Unix. They share the 715 | > same design but do not share code. Naturally, we sometimes had to write very 716 | > similar code for each target. This may seem like an amateur-level programming 717 | > mistake, but in reality, it's much easier to write straightforward code for 718 | > each target than writing unified one that covers all the details and corner 719 | > cases of all supported targets simultaneously. 720 | 721 | ### Cost of Abstraction 722 | 723 | Software engineering often involves creating abstractions. A solution to a 724 | problem can include more or fewer abstractions, but each introduced abstraction 725 | comes with a cost. This cost manifests as the cognitive burden placed on those 726 | who need to understand, maintain, and document it – not just in code, but also 727 | in models, documentation, and even organizational structures. 728 | 729 | Cognitively, an abstraction can be thought of as a mental gadget that one must 730 | "install" in order to work with it. Imagine an empty room that needs to be 731 | furnished according to a specific use case. If the chosen abstractions fit well 732 | within the team's mental model, the space remains functional – like a 733 | well-furnished room where people can move freely and use it as intended. 
734 | However, if abstractions are difficult to grasp or combine in contradictory 735 | ways, the mental space becomes cluttered, leaving little room to maneuver. This 736 | is similar to a room overloaded with furniture, making it difficult to navigate 737 | or even understand its intended purpose. 738 | 739 | For example, if a team introduces a new abstraction X, it incurs the following 740 | costs: 741 | 742 | - Every developer must understand and adopt X to work effectively within the 743 | system. 744 | - The system must be structured around X in a way that ensures maintainability 745 | over time. 746 | - Long-term maintenance will require keeping the code, file structure, and 747 | models aligned with X, often introducing additional overhead. 748 | 749 | Introducing too many incompatible abstractions – or a few abstractions that 750 | consume too much of the decision space – can quickly lead to over-engineering. 751 | Those responsible for maintaining such systems often find themselves 752 | disentangling unnecessary complexity, seeking a new balance that restores 753 | manageability by replacing or introducing more adequate abstractions. 754 | 755 | ### Habitability 756 | 757 | Habitable software is better than perfect software. 758 | 759 | [Richard Gabriel - Patterns of Software, Habitability and Piecemeal Growth](https://www.dreamsongs.com/Files/PatternsOfSoftware.pdf). 760 | 761 | > Habitability is the characteristic of source code that enables programmers, 762 | > coders, bug-fixers, and people coming to the code later in its life to 763 | > understand its construction and intentions and to change it comfortably and 764 | > confidently. Either there is more to habitability than clarity or the two 765 | > characteristics are different... 766 | 767 | > ...Habitability makes a place livable, like home. 
And this is what we want in 768 | > software – that developers feel at home, can place their hands on any item 769 | > without having to think deeply about where it is. It's something like clarity, 770 | > but clarity is too hard to come by. 771 | 772 | ### Hard Feature 773 | 774 | If a feature is hard to implement, it might indicate that something is wrong 775 | with the feature (or the product). 776 | 777 | ### True Name 778 | 779 | If you know the [True Name](https://en.wikipedia.org/wiki/True_name) of something, 780 | you have power over it. In OOP, a good class name is the True Name. 781 | 782 | > "A well-chosen word can save an enormous amount of thought" (said by Mach, 783 | > according to Santiago Ramón y Cajal, "Advice for a Young 784 | > Investigator") 785 | 786 | See also 787 | [Mass and Gravity](http://www.carlopescio.com/2008/12/notes-on-software-design-chapter-2-mass.html). 788 | 789 | ### One Pattern per Class 790 | 791 | A class violates the Single Responsibility Principle if it implements more 792 | than one design pattern. Of course, there are exceptions. 793 | 794 | ### Archetype 795 | 796 | An archetype is an umbrella concept for other concepts such as `prototype`, 797 | `proof of concept`, and `minimum viable product`. An archetype means something simple 798 | and coherent. If you know the archetype of something, you understand its essence. 799 | A complex system can be traced back to one or more underlying 800 | archetypes. 801 | 802 | Interesting side note: as far as I can see, as engineers grow their software 803 | bigger, they tend not to care much about its underlying 804 | archetypes. Imagine how easy it would be to learn a piece of software if it 805 | contained its own earliest forms (source code, documentation, 806 | drafts, etc.). A great example: the Rust programming language had to start from 807 | [somewhere](https://github.com/graydon/rust-prehistory).
808 | 809 | > "View the problem in its simplest forms ... An excellent method for 810 | > determining the meaning of something is to find out how it comes to be what it 811 | > is." (Santiago Ramón y Cajal, "Advice for a Young Investigator") 812 | 813 | ### Prima Materia 814 | 815 | Sometimes, to make further progress, you need to un-implement (break!) a particular 816 | pattern/architecture/solution, put it back into a 817 | [Prima Materia](https://en.wikipedia.org/wiki/Prima_materia) state, and only then 818 | transform it into something new. Metaphors similar to Prima Materia are 819 | "primordial soup" and "undifferentiated soup of ideas" (Eric Evans - DDD). 820 | 821 | ### Mature automation 822 | 823 | Mature automation allows itself to be observed, inspected, and overridden. Even 824 | if something is automated and usually works well, there should always be a way 825 | to turn it off or adjust it when needed. Good automation is transparent – you 826 | can see what it is doing, understand how it works, troubleshoot problems, and 827 | make changes if necessary. In some situations, it is important to bypass 828 | automation entirely and take manual control or use an alternative path. Systems 829 | that do not allow this create unnecessary friction and risk. Automation should 830 | support people, not trap them. 831 | 832 | ### "Magic" is automation that is not adequate 833 | 834 | In the beginning, there is no magic, but simply a desire to automate things to 835 | reduce repetition. Magic appears as a result of increasing complexity that makes 836 | the current solution inadequate for further progress. Magic can also emerge 837 | rather quickly as a result of automating the wrong things from the beginning. The 838 | holy grail is automation that is always adequate. 839 | 840 | ### Poisonous Systems 841 | 842 | Badly designed systems tend to poison the systems they interact with.
843 | 844 | ### Bad Design in House 845 | 846 | Do not overdesign your own software if you have a big producer of bad or overly 847 | opinionated designs nearby. A big producer can be a vendor or a team with 848 | authority that decided to rely on a given design a while ago. 849 | 850 | ### Trade-off of Encapsulation 851 | 852 | Strong ("tight") encapsulation is good, but don't forget about its users: 853 | operations people. Debugging facilities are a good example: if you close off 854 | everything, you leave the ops people, who might be you, without any tools to 855 | understand or tweak your system. Richard Cook explains this very well; see 856 | [Velocity 2012: Richard Cook, "How Complex Systems Fail"](https://www.youtube.com/watch?v=2S0k12uZR14). 857 | 858 | ### Unnecessary Flexibility 859 | 860 | (from [Writing Solid Code](http://writingsolidcode.com/)) 861 | 862 | > Flexibility breeds bugs. Another strategy you can use to prevent bugs is to 863 | > strip unnecessary flexibility from your designs... The trouble with flexible 864 | > designs is that the more flexible they are, the harder it is to detect bugs. 865 | 866 | > ...Flexible features are troublesome because they can lead to unexpected 867 | > "legal" situations that you didn't think to test for or even realize were 868 | > legal... 869 | 870 | > ...When you implement features in your own projects, make them easy to use; 871 | > don't make them unnecessarily flexible. There is a difference. Don't allow 872 | > unnecessary flexibility. 873 | 874 | ### Black Box with a Green Play Button 875 | 876 | The ideal interface for a system of arbitrary complexity is a black box with a green 877 | play button on it: you take the box, press the green button, and it just works. The 878 | next best thing is a box that also has a red button to stop the system.
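
A minimal Python sketch of the green and red buttons, assuming a hypothetical `Pipeline` whose internal stages (here just placeholder steps recorded in a log) stay hidden behind two public operations: `start()` as the green button and `stop()` as the red one. Both are idempotent, so pressing a button twice does no harm.

```python
class Pipeline:
    """Hypothetical system: many internal steps, two public buttons."""

    def __init__(self) -> None:
        self._running = False
        self._log: list[str] = []

    # Internal stages: callers never need to know these exist.
    def _load_config(self) -> None:
        self._log.append("config loaded")

    def _open_connections(self) -> None:
        self._log.append("connections opened")

    def _close_connections(self) -> None:
        self._log.append("connections closed")

    # Green button: one call does everything needed to get going.
    def start(self) -> None:
        if self._running:
            return  # pressing the button twice is harmless
        self._load_config()
        self._open_connections()
        self._running = True

    # Red button: one call brings the system back to an idle state.
    def stop(self) -> None:
        if not self._running:
            return
        self._close_connections()
        self._running = False

    @property
    def running(self) -> bool:
        return self._running


box = Pipeline()
box.start()
print(box.running)  # True
box.stop()
print(box.running)  # False
```

Everything behind the two buttons can be reorganized freely without affecting callers.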
879 | 880 | ### Single Source Concept and Its Exceptions 881 | 882 | The Single Source (of Truth) concept is one of the first principles beginner 883 | programmers learn and often becomes a rule they follow rigorously. However, like 884 | many principles in life, it has its exceptions. Blindly adhering to the Single 885 | Source rule can sometimes lead to suboptimal results. 886 | 887 | A good example of when this principle might fail is the 888 | [Poor Abstraction](#poor-abstraction) scenario. This happens when someone tries 889 | to consolidate similar elements into a single source while ignoring their 890 | significant differences. In such cases, forcing everything into one place can 891 | create an abstraction that is brittle, confusing, or overly complex, ultimately 892 | making the system harder to understand and maintain. 893 | 894 | Another example is 895 | [Two Almost Identical Entities](#two-almost-identical-entities). This occurs 896 | when someone tries to merge two seemingly identical entities into one, which 897 | results in an overly complicated "Single Source of Truth" codebase. This 898 | approach often leads to significant branching logic and reduced readability, 899 | making the code harder to work with and more prone to errors. 900 | 901 | Understanding when to apply the Single Source principle and when to allow for 902 | exceptions is crucial for achieving balance and maintaining flexibility in 903 | software design. Knowing where to follow and where to de-prioritize the Single 904 | Source principle is a skill that distinguishes a more experienced 905 | programmer from a beginner. 906 | 907 | ### Resilience to Change vs Fixed Perfect Solutions 908 | 909 | When designing a system, there is a trade-off between making it easier to change 910 | in the future and striving for perfection. In most cases, choosing flexibility 911 | is the better option.
If you anticipate changes in context or additional 912 | development work that could affect the system, avoid focusing too much on 913 | perfecting the existing solution, as it may not hold up under new pressures. 914 | Another important consideration is the ability to undo or disable a function 915 | that works perfectly now but could cause unforeseen issues in operation. Often, 916 | a perfectly working solution can create obstacles for other systems or people 917 | involved in operating the system. 918 | 919 | ### Two Almost Identical Entities 920 | 921 | Over the years, I have seen at least three large units of hardly manageable 922 | legacy code, each of them built on two almost identical entities. Such entities 923 | co-exist in one of three ways: 924 | 925 | 1. One is a subclass of the other. 926 | 2. Two almost identical hierarchies are maintained. 927 | 3. Two groups of helper functions without a clear separation of 928 | responsibilities between them. 929 | 930 | It seems that, historically, all three cases started with one entity that 931 | accumulated features along the way. Then came another one, so similar 932 | to the first that the programmer avoided extracting the modules both 933 | entities shared and instead went with subclassing, or with two 934 | parallel hierarchies, to get the result quickly. 935 | 936 | To this day, I have not seen or created an elegant solution to this problem. 937 | See also "Hard Feature". 938 | 939 | ### Control 940 | 941 | One of the key concerns is Control: where control should or should not be, what 942 | should have control (be active) and what should not (be passive). 943 | 944 | #### Observable Control 945 | 946 | Software should be designed so that there is always a 947 | dedicated place where it is obvious how control and work flow through the 948 | software.
This should hold at every level of abstraction, and at each 949 | level, such a dedicated place should be free of the lower-level 950 | implementation details that make the context harder to understand. 951 | 952 | If something creates low-level implementation noise at a given level, it might 953 | be a good sign that one or more lower layers exist underneath, in which that 954 | implementation can be represented as high-level workflow logic 955 | (a sequence of steps or an algorithm). 956 | 957 | #### Humans should dominate machines 958 | 959 | Lower-level modules should not control higher-level modules. It is 960 | not only about not importing higher-level modules into lower-level modules 961 | and making everything work through protocols/interfaces, but more about the 962 | flow of control: "what controls what". Two shortcuts: **humans should 963 | dominate machines**, **business logic should dominate the system's 964 | implementation details**. 965 | 966 | #### Overlapping control 967 | 968 | Overlapping things are a challenge for the human mind and are therefore bad for the 969 | whole software lifecycle: design, development, testing and maintenance. This 970 | might be two or more classes that do the same thing. This might be two or more 971 | people whose responsibilities overlap. Nancy Leveson says Overlapping Control is 972 | one of the greatest sources of safety problems: two controllers whose areas of 973 | responsibility overlap (see "Engineering a Safer World"). See also "Two almost 974 | identical entities" and "Shadowing/Masking". 975 | 976 | #### Broken control loops 977 | 978 | The top-level controllers should always have control over the bottom-level 979 | elements. If the controllers include both humans and automation, the humans 980 | should always be able to intervene and take over control from the 981 | automation.
982 | 983 | This heuristic can be turned into an explicit design constraint. 984 | 985 | ### Feedback 986 | 987 | #### Broken feedback loops 988 | 989 | Missing, insufficient or incorrect feedback is a major source of trouble for 990 | any system. 991 | 992 | "All feedback loops must be closed": this heuristic can be turned into an explicit 993 | design constraint. 994 | 995 | ### Separation / partitioning 996 | 997 | - Separate stable from unstable 998 | - Separate permanent from temporary 999 | - Separate synchronous from asynchronous 1000 | - Separate similar from different 1001 | - Separate symmetrical from asymmetrical 1002 | - Balance and symmetry: if one partition has far more items than the others, 1003 | this may indicate that the partitioning is incomplete. 1004 | - Separate construction from operation (one example: Factory vs Command) 1005 | - Separate content from presentation (applies to UI-heavy code, great example: 1006 | HTML/CSS) 1007 | - Separate easy from complex: isolate easy, isolate complex, repeat many times 1008 | - Separate stateless from stateful 1009 | - Separate data from behavior and behavior from data, unless you have a good 1010 | OOP class/object with a good data/behavior balance. 1011 | - Separate general-purpose from application-specific 1012 | - Separate application-level code from system-level code 1013 | - Separate methods that read from methods that write 1014 | - Separate decision from condition 1015 | - Separate One from Many, separate Many from Many. 1016 | 1017 | Example 1: "Monolithic test case files" 1018 | 1019 | In the following example, the `_feature1_` and `_feature2_` parts and the numbers in 1020 | the test function names help a lot with logically grouping the tested 1021 | functionality.
1022 | 1023 | ```c 1024 | // Many group #1 1025 | void test_feature1_1(void) {} 1026 | void test_feature1_2(void) {} 1027 | void test_feature1_3(void) {} 1028 | // Many group #2 1029 | void test_feature2_1(void) {} 1030 | void test_feature2_2(void) {} 1031 | void test_feature2_3(void) {} 1032 | ``` 1033 | 1034 | Example 2: the inner block contains a multi-line routine that could actually be 1035 | a separate function working on One (a single `instr`), while the block itself is invoked on Many. 1036 | Unless we extract that function, we have a conflict between the Many of the 1037 | enumeration and the Many of the instructions inside the block. 1038 | 1039 | ```cpp 1040 | EnumerateInstructions(*function, [&](Instruction &instr, int bbIndex, int iIndex) 1041 | { 1042 | // ... lots of lines working on `instr` ... 1043 | }); 1044 | ``` 1045 | 1046 | ### Grouping 1047 | 1048 | - Group together things that change at the same time. If possible, create 1049 | container data structures so that a change involves a change of only **one** of them. If 1050 | possible, group all the changes that happen at the same time together. 1051 | 1052 | - Group things that are used together. 1053 | 1054 | ### Observability vs Correctness 1055 | 1056 | Incorrect but observable code can be more valuable long-term than correct but 1057 | unobservable code. Observable code is easier to inspect, test, and improve, even 1058 | if it contains mistakes. In contrast, correct but hidden code can become 1059 | difficult to maintain and debug over time, creating technical debt. Visibility 1060 | allows for quicker fixes and ongoing improvement, making it more sustainable in 1061 | the long run. 1062 | 1063 | ### Don't Use RAII on a Business Logic Level 1064 | 1065 | RAII is good for resource management, such as handling memory, file handles, or 1066 | network connections, where resources need predictable acquisition and release.
1067 | However, applying RAII to business logic can lead to significant problems: 1068 | 1069 | - Reduced flexibility: RAII assumes that actions are tied directly to scope, but 1070 | business workflows may need to defer, combine, or otherwise manage actions 1071 | independently of object lifetimes. 1072 | 1073 | - Lack of transaction control: Business operations often involve external 1074 | systems, validation, or rollback mechanisms that require precise control. RAII 1075 | hides these processes behind object lifecycle management, making it harder to 1076 | handle errors or maintain consistency. 1077 | 1078 | - Unintended side effects: Business logic often involves workflows with complex 1079 | rules and dependencies. Tying actions like adding or removing data to the 1080 | lifecycle of objects can cause unexpected behaviors if those objects are 1081 | destroyed prematurely or unintentionally. 1082 | 1083 | - Debugging challenges: When business actions are implicitly triggered by object 1084 | lifetimes, it becomes harder to trace when and why specific operations occur. 1085 | This lack of clarity can lead to subtle bugs that are difficult to identify 1086 | and fix. 1087 | 1088 | Instead of using RAII, manage business logic explicitly through well-defined 1089 | methods or services. This approach keeps the logic transparent, easier to 1090 | understand, and more adaptable to changing requirements. 1091 | 1092 | ## Coding, code reviews, and maintenance programming 1093 | 1094 | ### Code that Works 1095 | 1096 | Working code with a good-enough architecture is better than buggy code with a 1097 | perfect but overly complex architecture. 1098 | 1099 | ### Code Is Not Your Partner 1100 | 1101 | Sometimes, you don't have to be nice to code. 1102 | 1103 | - It might be written for a different platform. 1104 | - It could be outdated or rely on ancient build tools. 1105 | - Some parts may be unnecessary for your needs. 1106 | - It may contain mistakes. 
1107 | 1108 | In such cases, it is perfectly fine to delete, modify, or hack the code – to 1109 | make it compile, test it, or simply understand how it works. 1110 | 1111 | ### Two Strategies for Replacing a Feature 1112 | 1113 | When replacing Feature A with Feature B, there are two broad approaches. 1114 | 1115 | 1\. Remove A, Then Implement B 1116 | 1117 | This strategy is best when: 1118 | 1119 | - Feature A is simple. 1120 | - Feature B can be developed quickly. 1121 | - Switching to B is straightforward. 1122 | 1123 | In such cases, removing A first and then building B works well, as the 1124 | transition is fast and manageable. 1125 | 1126 | 2\. Develop B in Parallel, Switch from A to B, Remove A 1127 | 1128 | This approach is necessary when the transition is complex or time-consuming. 1129 | Instead of removing A immediately, B is developed alongside it while the 1130 | existing system remains operational. The switch to B happens only when it is 1131 | fully developed and tested. A remains available as a fallback until B is proven 1132 | reliable, after which A can be removed. 1133 | 1134 | This method is particularly useful when: 1135 | 1136 | - Feature B requires significant development time. 1137 | - Switching from A to B is complex and requires a dedicated transition 1138 | mechanism. 1139 | 1140 | For already deployed systems where downtime is unacceptable, the second approach 1141 | is often the only viable way to ensure a smooth migration. 1142 | 1143 | ### Smallest Scope 1144 | 1145 | - Restrict the scope of data to the smallest possible. (The Power of 10: Rules 1146 | for Developing Safety-Critical Code by NASA) 1147 | 1148 | ### Code Style as a Blocker 1149 | 1150 | Sometimes code style can be a blocker. Poorly formatted code can be 1151 | extremely difficult to understand. Do everything to reduce your cognitive 1152 | load.
Real-world example: 1153 | 1154 | ```swift 1155 | let expectedRemainingLoops = Int(ceil( (expectedRemainingElements - Double(currentRemainingElementsForLoop)) / Double(PPENumberOfTasksInCurrentLoop) )) 1156 | ``` 1157 | 1158 | reads much better as 1159 | 1160 | ```swift 1161 | let expectedRemainingLoops = 1162 | Int( 1163 | ceil( 1164 | (expectedRemainingElements - Double(currentRemainingElementsForLoop)) / 1165 | Double(PPENumberOfTasksInCurrentLoop) 1166 | ) 1167 | ) 1168 | ``` 1169 | 1170 | ### Simplifying Complex Feature Branches 1171 | 1172 | When working on a non-trivial feature branch, consider breaking it down into its 1173 | core functionality while separating any trivial or unrelated changes that can be 1174 | integrated independently. 1175 | 1176 | A complex branch can often become more manageable, or even medium in scope, when 1177 | distilled into its essential parts and split into smaller, separate changes. In 1178 | some cases, breaking it down properly can eliminate the complexity entirely, 1179 | leaving only straightforward, incremental updates. 1180 | 1181 | ### The Moving and Changing Anti-pattern 1182 | 1183 | A common anti-pattern that complicates code reviews is creating a changeset that 1184 | involves both moving and changing things at the same time. This obscures the 1185 | diffs in the version control system, making it harder to track changes. The 1186 | solution: isolate moving and changing into separate commits or separate PRs. 1187 | 1188 | ### Avoid Plural Names For Classes 1189 | 1190 | Classes should represent a single entity or concept. Naming a class in the 1191 | plural form (e.g., `Users`) can confuse its responsibility, making it seem like 1192 | it manages multiple instances. Instead, use singular names (e.g., `User`) and 1193 | handle collections separately, such as in a `UserList` or `UserRepository`. This 1194 | ensures clear, focused class responsibilities.
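
A small Python sketch of that split, with hypothetical `User` and `UserRepository` classes. The singular class models one entity, and a separate class owns the collection and its lookups. The method names (`add`, `get`) are illustrative and not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    """One user: the class name describes a single entity."""
    user_id: int
    name: str


class UserRepository:
    """Owns the collection of users; the plural lives here, not in `User`."""

    def __init__(self) -> None:
        self._users: dict[int, User] = {}

    def add(self, user: User) -> None:
        self._users[user.user_id] = user

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)

    def __len__(self) -> int:
        return len(self._users)


repo = UserRepository()
repo.add(User(1, "Ann"))
repo.add(User(2, "Bob"))
print(len(repo), repo.get(2))  # 2 User(user_id=2, name='Bob')
```

Storage details and the lookup strategy can change inside the repository without touching `User`.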
1195 | 1196 | ### Fast Programming and Slow Programming 1197 | 1198 | This can be viewed as prototype vs. maintenance programming. Fast Programming is 1199 | crucial for rapid progress and is often encouraged by the business. However, it 1200 | rarely allows time to learn from mistakes due to the tunnel vision and "straight 1201 | ahead" thinking that often accompany it. Slow Programming, on the other hand, 1202 | has the virtue of reflection and deeper analysis, but it tends to be too slow to 1203 | launch a business from scratch. Business leaders typically start to appreciate 1204 | Slow Programming only when they hit the wall of complexity, realizing the need 1205 | for proper design. 1206 | 1207 | ### Stable Components 1208 | 1209 | Stable Components are a Maintenance Programmer's refuge. One way for a 1210 | developer to survive in a large legacy project is to create stable components or 1211 | extract them out of the existing mess of code. A stable component most likely means a 1212 | testable component: it can be a parsing module, an API layer, or string 1213 | manipulation helpers. Having such islands of stability helps a lot to overcome 1214 | the difficulties of maintenance programming. See also the Periphery and Prima 1215 | Materia heuristics. 1216 | 1217 | ### Boring Code 1218 | 1219 | (from [Writing Solid Code](http://writingsolidcode.com/)) 1220 | 1221 | > If your code feels tricky, that's your gut telling you that something isn't 1222 | > right. Listen to your gut. If you find yourself thinking of a piece of code as 1223 | > a neat trick, you're really saying to yourself that an algorithm produces 1224 | > correct results even though it is not apparent that it should. The bugs won't 1225 | > be apparent to you either. 1226 | 1227 | > Be truly clever; write boring code. You'll have fewer bugs, and the 1228 | > maintenance programmers will love you for it.
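
As a hypothetical illustration of the quote (the example is not from the book), here is a "neat" bit trick next to a boring loop in Python. The trick looks correct, yet it reports 0 as a power of two. The boring version makes every case explicit, so the edge case cannot hide.

```python
def is_power_of_two_tricky(n: int) -> bool:
    # A power of two has exactly one bit set, so clearing the lowest set bit
    # should leave zero. Looks right, but 0 & -1 == 0 as well, so 0 slips through.
    return (n & (n - 1)) == 0


def is_power_of_two_boring(n: int) -> bool:
    # Powers of two are 1, 2, 4, 8, ...: anything below 1 is rejected up front.
    if n < 1:
        return False
    # Keep halving while the number is even; a power of two ends at exactly 1.
    while n % 2 == 0:
        n //= 2
    return n == 1


print(is_power_of_two_tricky(0), is_power_of_two_boring(0))  # True False
```

The two versions agree for every positive integer; the disagreement only shows up at the edge case the trick made easy to overlook.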

### Boring Code 2

Complex software is not meant to be developed and used by average programmers.
It happens anyway because of production pressures. As people say: your mileage
may vary.

### Lack of Knowledge

Bad code stems from a lack of knowledge, not malice, even though bad code and
malice can share unawareness as their root cause. Sometimes it helps to put on a
"lack-of-knowledge hat" to better understand the intentions behind the code
you're reading.

### Lack of Knowledge II

An interesting feature of inexperience is that it imposes limits on a software
system's ability to scale. Software written with unawareness at its core will
eventually become rigid and nightmarish, to the point where team members start
avoiding the "dark forest" of its codebase. The natural consequence is that such
software reaches an upper bound of complexity. Paradoxically, this means that
someone tasked with re-engineering it will often find its complexity manageable
in the end.

### Goodwill vs Pain

Much of what we programmers learn over the years comes from pain, not from
goodwill.

## Biases

### If It Works, Then It Works Bias

One of the common cognitive biases in engineering is the assumption that if
something works, it must be good enough. This belief often surfaces during
reviews of code, designs, or systems that have passed tests or are known to
function under specific conditions. It takes conscious effort to question
something that already appears successful.

But just because something works under one set of constraints does not mean it
will hold up under others. Often, "it works" simply means "it works here and
now".
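
A classic illustration is the midpoint computation in binary search. `(low + high) / 2` works for every index a developer is likely to try by hand. It breaks only when the sum exceeds the integer range. The Swift sketch below uses `Int32` so the limit is easy to reach, and the function names are illustrative. Swift traps on integer overflow, so the naive version uses `addingReportingOverflow` to report the problem instead of crashing.

```swift
// Naive midpoint: correct for small values, but low + high can exceed Int32.max.
// addingReportingOverflow returns the wrapped sum plus a flag instead of trapping.
func naiveMidpoint(_ low: Int32, _ high: Int32) -> (value: Int32, overflowed: Bool) {
    let (sum, overflow) = low.addingReportingOverflow(high)
    return (sum / 2, overflow)
}

// Safe midpoint: high - low never overflows when 0 <= low <= high.
func safeMidpoint(_ low: Int32, _ high: Int32) -> Int32 {
    precondition(0 <= low && low <= high, "expects 0 <= low <= high")
    return low + (high - low) / 2
}

// Small inputs: both versions agree, so "it works".
let small = naiveMidpoint(10, 20)
print(small.value, small.overflowed, safeMidpoint(10, 20))  // prints: 15 false 15

// Large inputs: the naive sum overflows; the safe version still works.
let large = naiveMidpoint(2_000_000_000, 2_100_000_000)
print(large.overflowed, safeMidpoint(2_000_000_000, 2_100_000_000))  // prints: true 2050000000
```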

To counter this bias, reviewers should look beyond surface-level functionality
and ask:

- It works with a file of size X. What about 10X or 100X?
- It works under normal conditions. What about a slow network or high CPU load?
- It works on Linux. Will it behave the same on embedded hardware?

This bias also affects how we treat existing systems. A solution that "has
always worked" may be treated as correct by default. This leads to
investigations based on flawed assumptions and to missed problems that emerge
under different circumstances.

There's no silver bullet for overcoming this bias. The key is maintaining
deliberate skepticism and making a habit of viewing solutions from multiple
angles.

### Focusing only on what's most visible bias

This is the tendency to concentrate a review or investigation on the most
obvious, observable, or symptomatic parts of a system, rather than
systematically considering all potential contributing factors. It can lead to
overlooking the true root cause, especially if the cause is hidden in a less
familiar or less accessible area. Before jumping on a specific part of the
problem or solution, first step back and consider the bigger picture: which
blocks, in general, might be involved. As the common saying goes: "Don't look
only where there is light". In practice, this means listing all possible
contributors to a problem in a block diagram or any other simple sketch that
collects both the symptoms and the relevant system parts. It can also help to
annotate each block with relevant properties. For example, in a performance
investigation, adding performance characteristics per block can highlight
which parts are likely causes, not just the ones that appear most problematic.

### The Fix Bias

When reviewing a pull request titled "Fixes XYZ", there is a natural tendency to
trust the new change more than the existing code. This bias arises from the
assumption that the previous implementation was flawed simply because it is
being replaced. As a result, one might overlook the consequences of the fix or
fail to rigorously verify the correctness of the new change.

To mitigate this bias, it's important to evaluate both the old and new
implementations with equal scrutiny. Consider questions such as:

- Is the problem being solved accurately identified?
- Does the new change address the issue without introducing new problems?
- Are the trade-offs of this fix justified compared to the original
  implementation?

By being aware of this bias, reviewers can ensure a more balanced and thorough
review process.

### Resolving Merge Conflict Bias

Software engineers frequently resolve merge conflicts. The task is often
trivial, but it presents opportunities for introducing subtle bugs. One
contributing factor is the cognitive bias that favors accepting newly introduced
changes over preserving existing behavior.

The conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) that Git inserts can
obscure important details of the original code, making it easy to
unintentionally discard necessary logic.

A practical approach to mitigating this risk is to slow down and carefully
evaluate both conflicting versions. Consider not just the new change, but also
what might be lost if an existing line or code chunk is removed. Reviewing the
code in context and testing after resolving conflicts can help prevent
unintended regressions.

## Reliability

### Errors are not ok

Never ignore errors.
The presence of errors indicates that you don't understand your
system well enough and therefore don't have full control over it.

An error can be major or minor, but either way it contributes negatively to the
design and operation of your system and to your understanding of it (see
[Periphery](#periphery)).

Errors typically ignored by developers include:

- Configuration errors
- Compiler warnings
- Build system errors
- Errors produced by the test suites (flaky tests)

### Errors must be understood and described

Search for `Malfunction 54` (from the Therac-25 accidents) for a good example.

### Underlying errors shall not be hidden

If a higher-level error wraps some other underlying error, the information about
the underlying error shall not be lost. Instead, it should be fully available to
the higher-level error for error handling, logging, tracing, etc.

### Critical errors vs non-critical errors

Make a clear distinction between critical and non-critical errors on all levels:
source code, software design, error reporting, documentation.

### Assertions are better than no error handling

When there is no error handling, the presence of assertions gives at least a
basic guarantee that the software does not do what it is not supposed to do.

### Assertions are shortcuts for a proper error handling

Every assertion eventually becomes proper error handling.

### Crash Early

If you know how to avoid defensive programming in a particular situation, go
ahead! Otherwise, make your code crash early to catch bugs as early as possible:
use sensible assertions and stress edge cases with tests.
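
The following Swift sketch shows the progression described in the last few sections, using a hypothetical port setting.

- `portAsserting` is the crash-early shortcut. Its `precondition` stops the program at the point where the assumption breaks. Unlike `assert`, `precondition` stays active in optimized `-O` builds.
- `parsePort` is the proper error handling that eventually replaces the shortcut.
- `ConfigError` wraps the low-level error instead of hiding it.

All names are illustrative.

```swift
// Low-level errors with enough detail to act on.
enum ParseError: Error, Equatable {
    case notANumber(String)
    case outOfRange(Int)
}

// Shortcut: crash early if the assumption "the port is valid" is violated.
func portAsserting(_ text: String) -> Int {
    guard let port = Int(text), (1...65_535).contains(port) else {
        preconditionFailure("invalid port: \(text)")
    }
    return port
}

// Proper error handling that eventually replaces the shortcut.
func parsePort(_ text: String) throws -> Int {
    guard let port = Int(text) else { throw ParseError.notANumber(text) }
    guard (1...65_535).contains(port) else { throw ParseError.outOfRange(port) }
    return port
}

// Higher-level error: adds context but keeps the underlying error available
// for handling, logging, and tracing.
struct ConfigError: Error {
    let key: String
    let underlying: any Error
}

func loadPort(from settings: [String: String]) throws -> Int {
    do {
        return try parsePort(settings["port"] ?? "")
    } catch {
        throw ConfigError(key: "port", underlying: error)
    }
}

do {
    _ = try loadPort(from: ["port": "99999"])
} catch let error as ConfigError {
    // The caller sees both the context ("port") and the root cause.
    print("invalid \(error.key): \(error.underlying)")
} catch {
    print("unexpected error: \(error)")
}
```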
See
[Some notes C in 2016: Code offensively](http://blog.erratasec.com/2016/01/some-notes-c-in-2016.html#.VtGEKBg7T5c)
and
[Spotify engineering culture (part 2): "We aim to make mistakes faster than anyone else"](https://labs.spotify.com/2014/09/20/spotify-engineering-culture-part-2/).

## Testing

### Write Tests, Even Bad Ones

If you do not write tests, you will never learn how to write them. It's better
to write bad tests than to write none at all.

### TDD as a Toolbox

The ability to do Test-Driven Development (TDD) is not a binary "can or cannot"
skill. It's about having a wide range of techniques, patterns, tricks, and hacks
in your toolbox. When you have enough of them, you can test almost anything in a
reasonable amount of time.

### Legacy Code is Code Without Tests

As Michael Feathers puts it in Working Effectively with Legacy Code, "Legacy
code is code without tests."

### Testing as a Way to Manage Complexity

In addition to ensuring quality, testing is essential for simulations that help
manage complexity. If I can test and simulate every aspect of my program, I can
effectively manage its complexity. However, if there are blind spots – areas
that are difficult or impossible to test – I lose control over those areas and
must rely on real users to test in the wild.

### Test It to Engineer It

"If you can't measure it, then it can't be called engineering" (Ivar Jacobson,
Object-Oriented Software Engineering: A Use Case Driven Approach). We can
interpret "measure" as "test", with testing serving as both a form of
measurement and a core part of engineering.

### Improve Testability

Ideally, everything should be testable.
If something is difficult to test, it
often signals a need to improve code quality, the toolset, or the testing
infrastructure. With effort, these can be improved. If you are unsure how to
test something, start with a simple approach: stub everything, simplify the
network, assert what's necessary, then iterate on refining both the test and the
system under test (SUT).

## Distribution

### Provide Basic Test Sequences with Your Product

If you are a provider of software or hardware, consider going beyond the
standard "interface control document" (ICD) by including basic test sequences –
a "Hello World"-type program that allows users to quickly get started with your
product. Such examples help users bring the system online and get up to speed
without unnecessary guesswork.

The lack of clear "Hello World" or how-to documentation is especially prevalent
in the embedded software industry, where companies often rely solely on ICDs or
technical reference manuals. This forces end-user software engineers to engage
in guesswork and reverse-engineer the documentation to figure out how to bring
up a device. While the industry is gradually improving in this regard, there is
still a long way to go. By providing a clear and functional "Hello World"
example with every product, you empower your users and make adoption of your
product much smoother.

### Provide Drivers Alongside Your Hardware

If you are a hardware provider, consider supplying software drivers with your
device rather than just a technical reference manual for end users to decipher
and implement. As the developer of the device, you understand its functionality
better than anyone else. By providing ready-to-use drivers, you save your users
the time and effort of implementing the device's features themselves.
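
A driver can hide the register-level details of a technical reference manual behind a small, typed API. The Swift sketch below assumes a hypothetical temperature sensor. Its register map, device ID, and 1/16 °C temperature encoding are placeholders for what a real datasheet would specify.

- The driver depends on a `RegisterBus` protocol, so the same code can run against real hardware or against a simulated bus.
- `verifyDevice` plays the role of a minimal "Hello World" bring-up check.

```swift
// Hypothetical register addresses, standing in for a real register map.
enum SensorRegister: UInt8 {
    case deviceID = 0x00
    case temperatureRaw = 0x01
}

// The transport the driver talks to: a real I2C/SPI bus, or a simulator.
protocol RegisterBus {
    func read(register: UInt8) throws -> UInt16
}

enum SensorError: Error, Equatable {
    case unexpectedDeviceID(UInt16)
}

struct TemperatureSensorDriver {
    // Hypothetical value the device reports in its ID register.
    static let expectedDeviceID: UInt16 = 0x5A
    let bus: any RegisterBus

    // "Hello World" bring-up check: is the expected device answering?
    func verifyDevice() throws {
        let id = try bus.read(register: SensorRegister.deviceID.rawValue)
        guard id == Self.expectedDeviceID else {
            throw SensorError.unexpectedDeviceID(id)
        }
    }

    // Assumed encoding: signed 16-bit value in units of 1/16 degree Celsius.
    func readCelsius() throws -> Double {
        let raw = try bus.read(register: SensorRegister.temperatureRaw.rawValue)
        return Double(Int16(bitPattern: raw)) / 16.0
    }
}

// A simulated bus: lets users exercise the driver without the hardware.
struct SimulatedBus: RegisterBus {
    var registers: [UInt8: UInt16]

    func read(register: UInt8) -> UInt16 {
        registers[register] ?? 0
    }
}

let driver = TemperatureSensorDriver(bus: SimulatedBus(registers: [0x00: 0x5A, 0x01: 0x0190]))
do {
    try driver.verifyDevice()
    let celsius = try driver.readCelsius()
    print(celsius)  // prints: 25.0
} catch {
    print("bring-up failed: \(error)")
}
```

Users work with `verifyDevice()` and `readCelsius()` instead of decoding register values themselves.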

With some effort on your part, you can significantly improve the adoption of
your product by making it easier to integrate and use. A smooth setup process
not only enhances user satisfaction but also reduces the barriers to bringing
your hardware to market.

### Provide Simulators Alongside Your Hardware

If you supply hardware, consider providing a software simulator that mimics your
device. This greatly simplifies integration into users' SIL/PIL/HIL
(software-, processor-, and hardware-in-the-loop) setups, especially if the
target users have access to only a limited number of your devices (such as when
the device is very expensive).

For language choice, default to Python, as it is widely used for embedded
development tools. If performance is critical, a C/C++/Rust simulator is also a
great option, as these languages integrate well with embedded environments.

## Documentation

### The Illusion of Easy Documentation

Good documentation is dry and boring. This can create an illusion that writing
good documentation is easy when in fact it is not.

### Less prose, more structure

Technical documentation is supposed to focus an engineer's attention on
achieving a given goal, such as building a specific system. It is easier to
focus on things that have structure embedded in them than on things that are
hidden in several paragraphs of prose. Prose has no inherent structure. This is
why readers have to do the extra work of creating order out of what they are
reading. If the documentation already has order in it, readers can spend less
time mentally reconstructing the content and can focus on the technical facts
more easily.

Some of the important tools that communicate order in technical documentation:

- Document structure and table of contents
- Diagrams
- Tables

### Too Much Structure Overload

Excessively deep nesting in documents or folder structures can hinder the
understanding of the overall project or system structure, especially if the
principles used for organizing the sections lack consistency. Ideally, a good
structure should be intuitive. At the very least, the organizational principle
should be easy to understand and mentally map, which makes the content easier
to navigate.

### Encyclopedic Document

An encyclopedic document is created over time as a collection of inputs from
various ad hoc events, eventually becoming a generic repository of everything.
These documents often have complex, nested structures and lack a single
consistent narrative. Reading them feels more like going through a dictionary
from A to Z rather than following a coherent story. This can make it difficult
for readers to stay engaged, which might explain why many people shy away from
reading standards altogether.

Standards or guidelines are often structured in this encyclopedic way, as they
aim to encompass all aspects of product development or organizational processes.
Similarly, requirements specifications can easily take on an encyclopedic form,
making them hard to navigate and comprehend.

When creating such documents, it's important to establish a guiding principle
that helps readers mentally map and navigate the content. Ideally, the document
should include a unifying narrative or story that makes it easier to follow,
even if the underlying information is complex or diverse.
A clear structure and
logical flow can transform an overwhelming collection of information into a
useful and accessible resource.

## Meetings

### Sound Check

It's great when everyone joins a meeting on time, but an often-overlooked
practice is doing a quick sound and video check to ensure everything is working
smoothly. A good rule of thumb is to join:

- 5 minutes early for routine meetings.
- 15–30+ minutes early for important meetings, to handle any technical issues in
  advance.

### Meeting Agenda

A well-prepared meeting runs smoothly when attendees know what to expect.

- A strong meeting has a predefined agenda that allows participants to follow a
  clear execution plan.
- Is the agenda known in advance?
- Can you or your team define it?
- Are there questions or answers that can be prepared beforehand?

### Meeting Notes

Meetings often lack structure, and when no notes are taken, valuable discussions
can be lost. A better approach is for someone to take ownership of note-taking
in real time, ideally on a shared screen so everyone can see what is being
recorded.

- If your team owns the agenda, align meeting notes with the planned topics.
- Structure notes so key points and next steps are clear.

### Capturing Meeting Results

A meeting without tangible outcomes is just an expensive conversation. At a
minimum, meetings should result in:

- Action points: tasks, follow-ups, next meetings.
- Decisions made.
- Recognized trade-offs.

Whenever possible, capturing processes or architectures in a diagram is better
than a simple bullet point.
Even if no formal notes are recorded, every
participant leaves with takeaways and mental models – but written records
significantly increase the meeting's effectiveness.

Anti-pattern: Running meetings without documenting useful outcomes, leading to
wasted time and repeated discussions.

### Briefing In

Before the actual meeting, getting alignment among participants is key, whether
for internal team discussions or external events like conferences and large
review meetings. When a team participates in an external meeting, it is crucial
that everyone is on the same page and presents a unified front, avoiding any
visible disagreement or misalignment.

Good questions to determine if a pre-meeting briefing is needed:

- How many attendees already know what will be presented?
- Does the content introduce significant innovation that requires prior context?
  Could too much new information create confusion within the presenting team?

Common pitfalls:

- Discussing internal team matters in the presence of external participants.
- Asking too many unrelated questions that derail the focus of the meeting,
  particularly when this disrupts team cohesion and diverts attention from the
  main agenda. This is especially problematic when an individual undermines the
  shared position of the team by introducing misalignment.

### Sharing Screen & Presenting Material

- Share only the relevant content – close unrelated applications, especially
  internal company chats, before presenting to an external audience.
- If you need to access other files or perform actions outside the presentation,
  unshare your screen first, complete the task, then reshare only the necessary
  content.
- If your team is presenting to an external party, align on the materials
  beforehand to ensure consistency in messaging.

## Systems

### Good enough is often best

"Good enough for each part is often best for the whole system." ("The Art of
Systems Thinking")

In "Engineering a Safer World", Nancy Leveson discusses how, in air traffic
control, individual flight paths may deliberately not be optimized for each
aircraft, so that overall traffic stays in harmony. This approach is necessary
because optimizing each flight path individually could lead to conflicts and
inefficiencies. Instead, air traffic control systems manage traffic by
coordinating flight paths to maintain safe separation between aircraft,
ensuring the overall safety and efficiency of the airspace.

### Designing Systems for Effective Work

- "Rather than trying to find extraordinary people to do a job, design the job
  so that ordinary people can do it well." ("The Art of Systems Thinking")

> ...No one comes to work to do a bad job, but the structure of the system may
> make good work impossible. If management falls into the blame trap, they may
> fire the offending individual and hire someone else - who may do no better.
> Rather than trying to find extraordinary people to do a job, design the job so
> that ordinary people can do it well. It is the structure of the system that
> creates the results. For better results, change the structure of the system.

### The Risk of Default Outcomes

Unresolved trade-offs, especially those that persist over long periods, can be
risky. Decisions left undecided, such as whether to build or buy critical
hardware, will not remain open forever.
Instead, they tend to resolve themselves
by default, often in bad ways, whether due to inertia, external pressures, or
short-term needs. A tossed coin always lands on one side or the other. In the
same way, an undecided trade-off will eventually land on an outcome that might
not align with strategic goals.

To mitigate this risk, individuals, managers, teams, and organizations should
proactively track and resolve open decisions, ensuring that critical choices are
made deliberately rather than by default. Tools such as an Open Questions Log or
a Risk Register can support the structured resolution of such trade-offs.

## People and Organizations

### Everyone is busy

Everyone is busy, including you. The development of software products often
takes place in rushed environments, where everyone is focused on achieving
specific goals without having time to do things properly or fully explore all
the options for what is being built.

What about QA? A company may have a dedicated QA department, or even Safety &
Reliability teams in addition. They are most likely also busy, focusing on the
most critical tasks to the point that they probably don't have enough time to
interact with development teams, understand the real requirements, or provide
100% coverage and a complete assessment of the project scope.

Is it a problem that everyone is busy? Given how common it is, it doesn't seem
so. Some people even seem to thrive on being busy all the time. Organizations
appear to care little about "busyness" itself. What really matters is whether
the busy person or department can deliver results on schedule, and whether
something left uncovered by the busy teams could create serious problems for the
business.

One unfortunate observation is that it usually takes significant time before the
uncovered issues are revealed and addressed from the top down. During this
incubation period, a significant amount of money is often lost, unhappy
customers accumulate, and other losses may occur, depending on the type of
project.

Or, busy people themselves get tired... and create new methods and tools.
Sometimes, a new tool can eliminate much of the effort required to achieve a
goal, or it simply allows a busy person to focus on "what is most important"
rather than covering everything.

### Solving Problems with Cash

Every engineering problem can be solved with an infinite amount of cash.

### The Paradox of Rushing in Software/Systems Engineering

Attempting to accelerate development often leads to greater delays. In highly
complex systems, skipping thorough validation, testing, or review processes can
result in unforeseen issues. These require extensive rework and ultimately
prolong the timeline beyond what a steady, methodical approach would have
taken.

There is a [parable](https://howtopracticezen.org/Advanced%20Zen/) that goes
like this:

> Zen teachers often tell the story of a young monk who asked a Zen master:
>
> "How long will it take me to attain enlightenment?" The master thought for a
> few moments and replied: "About ten years." The young monk was upset and said:
> "But you are assuming I am like the other monks, and I am not. I will practice
> with great determination." "In that case", replied the Master, "twenty years."

and a [similar one](https://martialarts.stackexchange.com/a/7133/7133):

> ... "But if I work hard, how many years will it take to become a master?"
> persisted the youth.
>
> "Oh, maybe thirty years", said Banzo.
>
> "Why is that?" asked Matajuro. "First you say ten and now thirty years. I will
> undergo any hardship to master this art in the shortest time!"
>
> "Well", said Banzo, "in that case you will have to remain with me for seventy
> years. A man in such a hurry as you are to get results seldom learns quickly."

### Four seasons

Here is an amusing analogy: just as a year starts with spring and ends with
winter, a similar lifecycle can be observed in the growth of organizations.

Spring is a young company, a handful of people. Not much structure, no strict
policies, a startup atmosphere. No steady income yet, but possibly investment,
or a lack of it. More full-stack people with broad expertise. Spring is like a
village. Colleagues are fellow villagers.

Summer is a Spring that made it: a company that is flourishing. Exponential
growth, more people are hired, an extremely steep curve of everything: the
development of the company structure, more departments, more specialization. The
philosophy of the company is no longer about "finding its way" but rather about
accelerating what made the transition from Spring to Summer possible.

Autumn is already a company with legacy. The source of income is known and
stabilized. The responsibilities are defined. Few people, if any, are still busy
defining the product. Instead, more people are busy with optimization:
improving the product, doing sales, and increasing revenue.

Winter is a dangerous phase. The company has been making a profit and doing its
best by exhausting what was known to work well. At this point, the structure of
the company is the most fixed and therefore the least resilient.
The company may
cease to exist because there are younger and better-suited competitors, or it
may find a way to renew itself and make it into a new year.

Another interesting observation is that a transition from season to season
almost never goes smoothly. To accommodate change, the company has to adapt,
and this very often happens with a good deal of destruction and restructuring
(see the Prima Materia heuristic). Dropping what does not work and keeping or
creating what does might be crucial for such a transition. Not all Spring
companies make it into Summer. Not all companies end up in Winter. Not all
companies can survive their deep Winter.

One particular management mistake is trying to apply the best practices of
season A to season B when season B is too early or already too late for them.
Example: imposing a strict top-down management style on a company of 5-10
people who work in a flat hierarchy, and making them adhere to reporting lines,
might be extremely inadequate. Expecting a fully flat hierarchy to work in an
Autumn-like business would be just as inadequate.

Not only can we match seasons and companies, we can also match seasons and
personalities:

- Autumn is too boring for Spring people, who value creativity and individual
  contribution over hierarchies and defined processes.
- For Autumn people, Spring is too chaotic and unstructured. Working for a
  Spring company is inherently unsafe: the younger the company, the fewer
  guarantees it can provide to its employees.
- It may not be optimal for a company to have too many people who represent an
  incompatible season. It can be damaging for a person to get stuck working at a
  company that does not match their season type.
  On the other hand, a person who has found a matching season can be compared to
  a fish that has found its water.

See also Kent Beck's
[The Product Development Triathlon](https://medium.com/@kentbeck_7670/the-product-development-triathlon-6464e2763c46).
His three phases, Explore, Expand, and Extract, can be loosely mapped to the
Spring, Summer, and Autumn seasons.

## Standards

### Idealized standards vs. practical implementation

Standards provide an idealized or encyclopedic view of how systems should
function and how products should be developed. Frequently, a standard represents
the combined inputs of multiple companies, making it more extensive than what
any single company might realistically implement. For most companies,
implementing a standard is a "best effort" exercise.

Some standards are practical only for larger companies and can be
counterproductive or harmful for smaller organizations attempting to implement
them. Recognizing this, some standards explicitly account for a company's
maturity level and offer recommendations on which parts to implement at
different stages of development.

### The challenge of standards implementation

Implementing standards and managing their results within an organization can be
difficult and complex. However, without any standards, everything becomes 10 to
100 times harder and more chaotic.

### Standards and best practices

Standards seek out best practices, collect them, and generalize them.

### Standards favor good practice

Standards favor good practices. If a company has adopted a practice that is not
yet conventional but makes sense and adds value, it is unlikely that this
practice would be rejected or deemed inappropriate by any standard.

### Wrong is worse than early or incomplete

Sometimes it is worse to be wrong than to be early or to lack information. The
context: passing the project review milestones required by standards.

## Requirements

### One-stop shopping

> "One-stop shopping" is a useful requirements writing principle. Simply, people
> reading the requirements should be able to get all the information they need
> from one document or from one section of a document. They should not have to
> jump between different sections to understand the requirement. (Patterns for
> Effective Use Cases by Steve Adolph et al., Chapter 7.1)

## Safety

### Safety does not exist without blood, loss or failure

Safety is not there from the very beginning. A gloomy poet could say that safety
blooms on blood. Safety also does not exist on its own. First, someone builds
something that kills people or causes a loss. Then some people bother to learn
from this and take action. Only then does safety get recognized and truly
appreciated.

Consequence: safety is especially meaningful to those who have some experience
of dealing with blood, loss, or failure.

### Safety is boring

When implemented well enough, safety becomes boring. Everything is working, no
one complains. At that moment, it is easier than ever to forget why the safety
is there in the first place. Example: how often do we bother to look at the
safety manuals? Does it mean that the safety is there?

### Safety is very hard to achieve but is very easy to lose

Safety is an extremely fragile and sensitive property of systems. So much
effort is put into achieving it, and yet it is so easy to lose and to let the
whole system down.
Some of the very popular reasons for the failure are: 1853 | 1854 | - degradation of existing components 1855 | - changes to the system that do not take the current system's behavior into 1856 | account 1857 | - new unexpected factors coming outside the system boundary 1858 | 1859 | Consequence: safety requires continuous and intelligent effort. 1860 | 1861 | ### Success breeds failure 1862 | 1863 | Handbook of Walkthroughs, Inspections, and Technical Reviews, p.412: 1864 | 1865 | > ... however, we have to anticipate that we will in fact succeed once in a 1866 | > while - and we must also anticipate what that success will bring. For 1867 | > instance, one error-riddled system was seldom used by its several hundred 1868 | > potential users, so management decided to mount an effort to have the system 1869 | > repaired in a systematic fashion. The resulting system was so dependable and 1870 | > useful that usage suddenly increased by a factor of a thousand over previous 1871 | > usage. This increase in transaction volume made the file design of the system 1872 | > completely inadequate to the daily load - which soon meant that nobody could 1873 | > get results fast enough to be useful. The entire problem - and so many others 1874 | > like it - could have been avoided if the review group had only considered that 1875 | > unavoidable law of nature: **Success breeds failure**. So, ..., be prepared 1876 | > for the inevitable reaction. If you start making systems better, your users 1877 | > will want more of the same - the best side effect of all. 1878 | 1879 | ### Safety as a Defensive Discipline 1880 | 1881 | Safety is often seen as a defensive discipline, in contrast to fields focused on 1882 | creation, innovation, and action, which drive progress. While these fields push 1883 | forward with new ideas and developments, safety functions as a secondary, 1884 | backing force. 
Its role is to prevent harm, minimize risks, and ensure that these actions
happen within a secure framework. Safety doesn't seek to lead the charge but to
protect other processes and enable them to unfold without catastrophic failure.

However, the drive to "lead the charge" often means that safety is ignored or
sidelined until it is too late. Yet safety acts like a belt that holds
uncontrolled progress together, keeping it from falling apart when the
inevitable risks are not properly addressed.

### Safety for Engineering is Like Medicine for People

Medicine isn't the most exciting thing, and no one wants to spend all their time
thinking about it. But it's clear that humanity can't thrive without it, even
with all the amazing achievements of civilization.

In the same way, organizations focus on building things that work and often
don't think much about safety or quality as long as things are fine and
customers are happy. Over time, however, they may realize that the "health" of
their products, teams, and development processes also matters.

How safety and quality are handled depends heavily on experience and knowledge.
Not long ago, amputation was seen as the best treatment for many conditions.
This shows how much we've learned and how practices improve over time.
Engineering also needs to grow in this way, moving beyond quick fixes toward
stronger, longer-lasting solutions.

### User Interfaces and Critical Systems

Too much simplicity can be a problem. Overly simplistic interfaces may fail to
keep operators fully engaged, which can impair their performance in critical
situations. If an interface is too simple, operators can slip into automatism
and execute the wrong action due to a lack of alertness.
For this reason, software and interface designers should prioritize preventing
user mistakes rather than focusing solely on aesthetics.

## Books

- [The Art of Systems Thinking](https://www.google.de/search?q=the+art+of+systems+thinking+book&oq=the+art+of+systems+thinking+book)

## Similar resources

- [Kent Beck - Mastering Programming](https://www.facebook.com/notes/kent-beck/mastering-programming/1184427814923414/)
- [Heuristics of Software Testability](http://www.satisfice.com/tools/testable.pdf)
- [The Law of Leaky Abstractions](https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/)
- [Lessons Learned in Software Development](https://henrikwarne.com/2015/04/16/lessons-learned-in-software-development/)

## Copyright

Copyright (c) 2015-2025 Stanislav Pankevich s.pankevich@gmail.com.

--------------------------------------------------------------------------------
/tasks.py:
--------------------------------------------------------------------------------
# Invoke is broken on Python 3.11, which removed inspect.getargspec().
# Aliasing it to inspect.getfullargspec works around the problem.
# https://github.com/pyinvoke/invoke/issues/833#issuecomment-1293148106
import inspect
import re
import sys
from typing import Optional

if not hasattr(inspect, "getargspec"):
    inspect.getargspec = inspect.getfullargspec

import invoke  # pylint: disable=wrong-import-position
from invoke import task  # pylint: disable=wrong-import-position

# Specifying the encoding explicitly because, otherwise, Windows crashes when
# running the Invoke tasks below:
# UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd'
# in position 16: character maps to <undefined>
# Exporting PYTHONIOENCODING=utf8 might also work, but reopening stdout as
# below does the job.
# FIXME: If you are a Windows user and expert, please advise on how to do this
# properly.
sys.stdout = open(  # pylint: disable=consider-using-with
    1, "w", encoding="utf-8", closefd=False, buffering=1
)


def run_invoke(
    context,
    cmd,
    environment: Optional[dict] = None,
    warn: bool = False,
) -> invoke.runners.Result:
    # Collapse all whitespace (including newlines) into single spaces, so that
    # commands can be written as readable multi-line strings.
    def one_line_command(string):
        return re.sub("\\s+", " ", string).strip()

    return context.run(
        one_line_command(cmd),
        env=environment,
        hide=False,
        warn=warn,
        pty=False,
        echo=True,
    )


@task(default=True)
def list_tasks(context):
    """List the available tasks (runs when `invoke` is called without a task)."""
    list_command = """
        invoke --list
    """
    run_invoke(context, list_command)


@task
def toc(context):
    """Regenerate the table of contents of README.md with doctoc."""
    run_invoke(context, "doctoc README.md")


@task
def format(context):  # pylint: disable=redefined-builtin
    """Reformat README.md with Prettier, wrapping prose at 80 characters."""
    run_invoke(context, "prettier --write --print-width 80 --prose-wrap always README.md")


@task(aliases=["l"])
def lint(context):
    """Format README.md, then regenerate its table of contents."""
    format(context)
    toc(context)

--------------------------------------------------------------------------------