├── README.md ├── images ├── have_to_go_deeper.jpg └── i_know_what_to_do.jpeg ├── object_with_getstate.ipynb └── protocol_decorator.md /README.md: -------------------------------------------------------------------------------- 1 | # Unpatterns 2 | 3 | Too hacky to be a pattern, not bad enough to be an antipattern - Unpatterns! 4 | 5 | The things collected in this repository are ungodly monstrosities that I dreamt up in sleepless nights, 6 | and considered to be elegant but also a bit dangerous and very unusual. 7 | 8 | I don't recommend using them in production code! They will raise eyebrows. Your colleagues might shun you. 9 | Your partner could leave you, your dog might run away, and your plants will dry out! 10 | 11 | You have been warned! 12 | 13 | 1. [Protocol Decorator](protocol_decorator.md) 14 | One boring and two crazy ways to build a decorator of a class. 15 | 2. [Salvaging python's object](object_with_getstate.ipynb) 16 | Highlighting important problems of persisting objects in Python, 17 | and hacking through the builtins to solve them. Worth a read if you 18 | want to know more (and care) about backwards compatibility for saved objects. 19 | 3. 
Allowing operators to be applied to functions: TBA 20 | -------------------------------------------------------------------------------- /images/have_to_go_deeper.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MischaPanch/unpatterns/5d0de7fb4bbbbe7ea1749dc738ac270b719144d0/images/have_to_go_deeper.jpg -------------------------------------------------------------------------------- /images/i_know_what_to_do.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MischaPanch/unpatterns/5d0de7fb4bbbbe7ea1749dc738ac270b719144d0/images/i_know_what_to_do.jpeg -------------------------------------------------------------------------------- /object_with_getstate.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "metadata": {}, 5 | "cell_type": "markdown", 6 | "source": "# Salvaging Python's broken `object` by adding `__getstate__`", 7 | "id": "f9b369d090a67772" 8 | }, 9 | { 10 | "metadata": {}, 11 | "cell_type": "markdown", 12 | "source": [ 13 | "## Intro: Pickling and Backwards Compatibility\n", 14 | "\n", 15 | "This unpattern is concerned with the persistence of python objects via pickle. I\n", 16 | "realize that in the wild of python programming, comparatively few people care\n", 17 | "about backwards compatibility in general, not even talking about backwards\n", 18 | "compatibility of pickled objects.\n", 19 | "\n", 20 | "Ensuring the latter means that if you (or your user) saved an object of some\n", 21 | "class, say `A` with pickle, you should be able to load it with a newer version\n", 22 | "of the codebase. 
Curiously, this is especially important in the context of \n", 23 | "machine learning (precisely where it is routinely neglected) because one might\n", 24 | "want to load a previously saved model even after\n", 25 | "performing a seemingly innocent update of some ML library dependency.\n", 26 | "\n", 27 | "Alas, as most of us learn the hard way sooner or later, there is no such thing\n", 28 | "as an innocent update of a dependency in the python world..." 29 | ], 30 | "id": "ae15cddf34abd6a" 31 | }, 32 | { 33 | "metadata": {}, 34 | "cell_type": "markdown", 35 | "source": [ 36 | "The main mechanism for providing backwards compatibility for pickled objects is\n", 37 | "the `__setstate__` magic method. If your class definition has changed between\n", 38 | "the time it was pickled and the time it is loaded, implementing `__setstate__`\n", 39 | "allows you to modify how the state of the object is restored.\n", 40 | "\n", 41 | "It's not always sufficient (e.g., if the class name has changed, it won't work),\n", 42 | "but knowing about and using `__setstate__` already covers a lot of cases. In the\n", 43 | "last section of the article I'll add a short overview of additional techniques\n", 44 | "for backwards compatibility when just `__setstate__` is not enough. Since these\n", 45 | "things are actually reasonable or even required, they don't fit the \"unpattern\"\n", 46 | "scheme and are therefore not the main focus. If you want to learn useful stuff,\n", 47 | "give them a glance, but useful stuff is not what we're here for ;)." 
48 | ], 49 | "id": "250e7b55c983069a" 50 | }, 51 | { 52 | "metadata": {}, 53 | "cell_type": "markdown", 54 | "source": [ 55 | "### Example of Non-problematic Persistence\n", 56 | "\n", 57 | "Before outlining the problem, let's have a look at a case without issues:" 58 | ], 59 | "id": "d62245f771815fcd" 60 | }, 61 | { 62 | "metadata": { 63 | "ExecuteTime": { 64 | "end_time": "2024-05-09T21:50:02.155612Z", 65 | "start_time": "2024-05-09T21:50:02.146207Z" 66 | } 67 | }, 68 | "cell_type": "code", 69 | "source": [ 70 | "from typing import Any\n", 71 | "import pickle\n", 72 | "\n", 73 | "class A:\n", 74 | "    def __init__(self, foo: int):\n", 75 | "        self.foo = foo\n", 76 | "\n", 77 | "serialized_a = pickle.dumps(A(42))\n", 78 | "\n", 79 | "\n", 80 | "# changing the class definition to illustrate deserialization\n", 81 | "# We add a new default argument to the constructor and a new field\n", 82 | "\n", 83 | "class A:\n", 84 | "    def __init__(self, foo: int, baz: str = \"baz\"):\n", 85 | "        self.foo = foo\n", 86 | "        self.baz = baz\n", 87 | "        self.new_field = \"new_value\"\n" 88 | ], 89 | "id": "ccd1f66da156f330", 90 | "outputs": [], 91 | "execution_count": 1 92 | }, 93 | { 94 | "metadata": {}, 95 | "cell_type": "markdown", 96 | "source": [ 97 | "If we try to load the pickled object, we get a malformed object (note that there\n", 98 | "is no error on loading, which is a problem in itself, since no static\n", 99 | "code analysis will ever inform you about this):" 100 | ], 101 | "id": "a079a64dfb31f4a9" 102 | }, 103 | { 104 | "metadata": { 105 | "ExecuteTime": { 106 | "end_time": "2024-05-09T21:50:02.495120Z", 107 | "start_time": "2024-05-09T21:50:02.491099Z" 108 | } 109 | }, 110 | "cell_type": "code", 111 | "source": [ 112 | "def print_error(e: Exception):\n", 113 | "    print(f\"{e.__class__.__name__}: {e}\")\n" 114 | ], 115 | "id": "3bbf220674919e6e", 116 | "outputs": [], 117 | "execution_count": 2 118 | }, 119 | { 120 | "metadata": { 121 | "ExecuteTime": { 122 | "end_time": 
"2024-05-09T21:50:02.529311Z", 123 | "start_time": "2024-05-09T21:50:02.523883Z" 124 | } 125 | }, 126 | "cell_type": "code", 127 | "source": [ 128 | "a: A = pickle.loads(serialized_a)\n", 129 | "try:\n", 130 | "    a.baz\n", 131 | "except AttributeError as e:\n", 132 | "    print_error(e)" 133 | ], 134 | "id": "40763da11fabd6c8", 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "AttributeError: 'A' object has no attribute 'baz'\n" 141 | ] 142 | } 143 | ], 144 | "execution_count": 3 145 | }, 146 | { 147 | "metadata": {}, 148 | "cell_type": "markdown", 149 | "source": "With `__setstate__` we can fix this:", 150 | "id": "ac711ab4535b79f2" 151 | }, 152 | { 153 | "metadata": { 154 | "ExecuteTime": { 155 | "end_time": "2024-05-09T21:50:02.608977Z", 156 | "start_time": "2024-05-09T21:50:02.603936Z" 157 | } 158 | }, 159 | "cell_type": "code", 160 | "source": [ 161 | "class A:\n", 162 | "    def __init__(self, foo: int, baz: str = \"baz\"):\n", 163 | "        self.foo = foo\n", 164 | "        self.baz = baz\n", 165 | "        self.new_field = \"new_value\"\n", 166 | "\n", 167 | "    def __setstate__(self, state: dict[str, Any]):\n", 168 | "        full_state = {\"baz\": \"baz\", \"new_field\": \"new_value\"}\n", 169 | "        full_state.update(state)\n", 170 | "        self.__dict__.update(full_state)" 171 | ], 172 | "id": "be21004db1fbfc4", 173 | "outputs": [], 174 | "execution_count": 4 175 | }, 176 | { 177 | "metadata": { 178 | "ExecuteTime": { 179 | "end_time": "2024-05-09T21:50:02.632650Z", 180 | "start_time": "2024-05-09T21:50:02.627015Z" 181 | } 182 | }, 183 | "cell_type": "code", 184 | "source": [ 185 | "a: A = pickle.loads(serialized_a)\n", 186 | "a.new_field\n" 187 | ], 188 | "id": "72f8d99f52065f6c", 189 | "outputs": [ 190 | { 191 | "data": { 192 | "text/plain": [ 193 | "'new_value'" 194 | ] 195 | }, 196 | "execution_count": 5, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "execution_count": 5 202 | }, 203 | { 204 | 
"metadata": {}, 205 | "cell_type": "markdown", 206 | "source": [ 207 | "```{note} \n", 208 | "Be careful what you put into your classes' states! Any extension in\n", 209 | "fields, private or public, needs to be reflected in `__setstate__` if there's a\n", 210 | "chance that you or your users might have pickled objects of the old version.\n", 211 | "\n", 212 | "This is not true for methods, which are not part of the state and therefore\n", 213 | "don't need special consideration. \n", 214 | "```" 215 | ], 216 | "id": "ebe28209cfa78bdd" 217 | }, 218 | { 219 | "metadata": {}, 220 | "cell_type": "markdown", 221 | "source": [ 222 | "So far so good, `__setstate__` is a well-known mechanism,\n", 223 | "and it allows loading serialized objects of older code versions.\n", 224 | "Now we arrive at the central problem that this article deals with:\n", 225 | "\n", 226 | "**It is possible to run into non-recoverable situations!**" 227 | ], 228 | "id": "40c68d141c40308b" 229 | }, 230 | { 231 | "metadata": {}, 232 | "cell_type": "markdown", 233 | "source": "## Problem: Non-recoverable Serialization Errors", 234 | "id": "82f57b1419e9cc07" 235 | }, 236 | { 237 | "metadata": {}, 238 | "cell_type": "markdown", 239 | "source": [ 240 | "Let's change the original class definition to not include state, just methods, \n", 241 | "and then extend it with a new field." 
242 | ], 243 | "id": "cb802281072c3b89" 244 | }, 245 | { 246 | "metadata": { 247 | "ExecuteTime": { 248 | "end_time": "2024-05-09T21:50:02.678494Z", 249 | "start_time": "2024-05-09T21:50:02.673127Z" 250 | } 251 | }, 252 | "cell_type": "code", 253 | "source": [ 254 | "class A:\n", 255 | "    @staticmethod\n", 256 | "    def get_foo():\n", 257 | "        return \"foo\"\n", 258 | "\n", 259 | "\n", 260 | "serialized_a = pickle.dumps(A())\n", 261 | "\n", 262 | "class A:\n", 263 | "    def __init__(self, foo: str = \"foo\"):\n", 264 | "        self.foo = foo\n", 265 | "\n", 266 | "    def __setstate__(self, state):\n", 267 | "        full_state = {\"foo\": \"foo\"}\n", 268 | "        full_state.update(state)\n", 269 | "        self.__dict__.update(full_state)\n", 270 | "\n", 271 | "    def get_foo(self):\n", 272 | "        return self.foo" 273 | ], 274 | "id": "a1b7d2dffac3bbc3", 275 | "outputs": [], 276 | "execution_count": 6 277 | }, 278 | { 279 | "metadata": {}, 280 | "cell_type": "markdown", 281 | "source": [ 282 | "This is something that can easily happen in a real-world scenario. 
In a first version,\n", 283 | "the class might have been a simple container for methods, but later it was\n", 284 | "extended to include state as well (which is precisely how I ran into this\n", 285 | "issue and why I'm writing this article).\n", 286 | "\n", 287 | "Now, at deserialization, something unexpected happens:" 288 | ], 289 | "id": "4090b85591da0d8b" 290 | }, 291 | { 292 | "metadata": { 293 | "ExecuteTime": { 294 | "end_time": "2024-05-09T21:50:02.704633Z", 295 | "start_time": "2024-05-09T21:50:02.700352Z" 296 | } 297 | }, 298 | "cell_type": "code", 299 | "source": [ 300 | "a_deserialized: A = pickle.loads(serialized_a)\n", 301 | "\n", 302 | "try:\n", 303 | " a_deserialized.get_foo()\n", 304 | "except Exception as e:\n", 305 | " print_error(e)" 306 | ], 307 | "id": "699bbe9e8e73db32", 308 | "outputs": [ 309 | { 310 | "name": "stdout", 311 | "output_type": "stream", 312 | "text": [ 313 | "AttributeError: 'A' object has no attribute 'foo'\n" 314 | ] 315 | } 316 | ], 317 | "execution_count": 7 318 | }, 319 | { 320 | "metadata": {}, 321 | "cell_type": "markdown", 322 | "source": "The same if we try to access the attribute directly:", 323 | "id": "6cc35ac4c1da430" 324 | }, 325 | { 326 | "metadata": { 327 | "ExecuteTime": { 328 | "end_time": "2024-05-09T21:50:02.740879Z", 329 | "start_time": "2024-05-09T21:50:02.737546Z" 330 | } 331 | }, 332 | "cell_type": "code", 333 | "source": [ 334 | "try:\n", 335 | " a_deserialized.foo\n", 336 | "except Exception as e:\n", 337 | " print_error(e)" 338 | ], 339 | "id": "dd7cbd22ec4a8003", 340 | "outputs": [ 341 | { 342 | "name": "stdout", 343 | "output_type": "stream", 344 | "text": [ 345 | "AttributeError: 'A' object has no attribute 'foo'\n" 346 | ] 347 | } 348 | ], 349 | "execution_count": 8 350 | }, 351 | { 352 | "metadata": {}, 353 | "cell_type": "markdown", 354 | "source": [ 355 | "What's going on? We did the right thing and implemented `__setstate__`! Why is it\n", 356 | "not working?" 
357 | ], 358 | "id": "60215cae351dbdc7" 359 | }, 360 | { 361 | "metadata": {}, 362 | "cell_type": "markdown", 363 | "source": [ 364 | "The reason is that if `__getstate__` is not implemented explicitly\n", 365 | "in a stateless class, no state is pickled and the deserialization process will not call `__setstate__` at all!\n", 366 | "\n", 367 | "In all my years of python programming, I have never heard of this and I don't\n", 368 | "fully understand why the python developers decided to implement it this way.\n", 369 | "Optimization reasons don't really make sense here since setting an empty state\n", 370 | "would never be a performance bottleneck.\n", 371 | "\n", 372 | "This behavior, however, is documented and thus \"desired\" - see \n", 373 | "[here](https://docs.python.org/3/library/pickle.html#object.__setstate__)\n", 374 | "(the output of `__reduce__`, which I fortunately never had to use, is essentially\n", 375 | "controlled by `__getstate__`).\n", 376 | "\n", 377 | "Here the crux of this behavior is clearly demonstrated:" 378 | ], 379 | "id": "8a2de14d6b474c07" 380 | }, 381 | { 382 | "metadata": { 383 | "ExecuteTime": { 384 | "end_time": "2024-05-09T21:50:02.754246Z", 385 | "start_time": "2024-05-09T21:50:02.750703Z" 386 | } 387 | }, 388 | "cell_type": "code", 389 | "source": [ 390 | "class ClNoState:\n", 391 | "    def amethod(self):\n", 392 | "        pass\n", 393 | "\n", 394 | "class ClWithState:\n", 395 | "    def __init__(self):\n", 396 | "        self.a = \"a\"\n", 397 | "\n", 398 | "print(f\"{ClNoState().__getstate__()=}\")\n", 399 | "\n", 400 | "print(f\"{ClWithState().__getstate__()=}\")" 401 | ], 402 | "id": "19bfcd5591e89228", 403 | "outputs": [ 404 | { 405 | "name": "stdout", 406 | "output_type": "stream", 407 | "text": [ 408 | "ClNoState().__getstate__()=None\n", 409 | "ClWithState().__getstate__()={'a': 'a'}\n" 410 | ] 411 | } 412 | ], 413 | "execution_count": 9 414 | }, 415 | { 416 | "metadata": {}, 417 | "cell_type": "markdown", 418 | "source": [ 419 | "The technical reason behind this 
is probably that `object()` doesn't have a\n", 420 | "`__dict__`. That puts it among the few python objects without one (instances of\n", 421 | "builtin types like `int` are others), and there's a philosophical question whether `object()` is really an\n", 422 | "object...\n", 423 | "\n", 424 | "Interestingly, the `ClNoState` does have a `__dict__`, but `__getstate__` still\n", 425 | "returns `None`." 426 | ], 427 | "id": "7afdf59b0592fc16" 428 | }, 429 | { 430 | "metadata": { 431 | "ExecuteTime": { 432 | "end_time": "2024-05-09T21:50:02.765186Z", 433 | "start_time": "2024-05-09T21:50:02.760732Z" 434 | } 435 | }, 436 | "cell_type": "code", 437 | "source": [ 438 | "print(f\"{ClNoState().__dict__=}\")\n", 439 | "print(f\"{object().__getstate__()=}\")\n", 440 | "\n", 441 | "try:\n", 442 | "    object().__dict__\n", 443 | "except AttributeError as e:\n", 444 | "    print_error(e)" 445 | ], 446 | "id": "98733c62b36426ce", 447 | "outputs": [ 448 | { 449 | "name": "stdout", 450 | "output_type": "stream", 451 | "text": [ 452 | "ClNoState().__dict__={}\n", 453 | "object().__getstate__()=None\n", 454 | "AttributeError: 'object' object has no attribute '__dict__'\n" 455 | ] 456 | } 457 | ], 458 | "execution_count": 10 459 | }, 460 | { 461 | "metadata": {}, 462 | "cell_type": "markdown", 463 | "source": [ 464 | "I think it's fair to say that `object()` is not a proper object. Without the\n", 465 | "`__dict__` it can't store instance attributes, so there's some magic happening when a\n", 466 | "class inherits from `object` that adds all the functionality of python\n", 467 | "objects. As far as I can tell, this happens when the class is created (in `type.__new__`), not at compile time." 
468 | ], 469 | "id": "f9c9be7fb5dda780" 470 | }, 471 | { 472 | "metadata": { 473 | "ExecuteTime": { 474 | "end_time": "2024-05-09T21:50:02.824748Z", 475 | "start_time": "2024-05-09T21:50:02.821272Z" 476 | } 477 | }, 478 | "cell_type": "code", 479 | "source": [ 480 | "# Can't assign attributes\n", 481 | "try:\n", 482 | " object().a = \"a\"\n", 483 | "except AttributeError as e:\n", 484 | " print_error(e)" 485 | ], 486 | "id": "8416c2526b10e618", 487 | "outputs": [ 488 | { 489 | "name": "stdout", 490 | "output_type": "stream", 491 | "text": [ 492 | "AttributeError: 'object' object has no attribute 'a'\n" 493 | ] 494 | } 495 | ], 496 | "execution_count": 11 497 | }, 498 | { 499 | "metadata": {}, 500 | "cell_type": "markdown", 501 | "source": [ 502 | "### A small rant:\n", 503 | "\n", 504 | "Note that dealing with such problems from unpickling is pretty much a nightmare!\n", 505 | "You can't properly debug, because neither `__setstate__` nor `__init__` will\n", 506 | "ever be called. All you get is a malformed object, and you have to go figure on\n", 507 | "your own what's going on. Googling things like \"pickle not calling\n", 508 | "`__setstate__`\" does not provide immediate relief, and I was lucky enough that a\n", 509 | "colleague had found the right place in the python docs to understand what was\n", 510 | "going on.\n", 511 | "\n", 512 | "\n", 513 | "Even after understanding the problem, we are still in a bad spot. There's no way\n", 514 | "of fixing this! If any of your users have serialized an object of the old\n", 515 | "version (without state), you can't help them. It can never be loaded with the\n", 516 | "updated codebase. They would need to do some pretty nasty hacking on their side\n", 517 | "to overcome this.\n", 518 | "\n", 519 | "It seems almost as if python suggests that classes once defined without state,\n", 520 | "should remain without state forever. This is not a reasonable limitation, and a\n", 521 | "very unnecessary one at that." 
522 | ], 523 | "id": "5b02c6f4d253cdf3" 524 | }, 525 | { 526 | "metadata": {}, 527 | "cell_type": "markdown", 528 | "source": "## Avoiding this Mess", 529 | "id": "3c4d18df9aab5b35" 530 | }, 531 | { 532 | "metadata": {}, 533 | "cell_type": "markdown", 534 | "source": [ 535 | "Without going into too many details on `__reduce__` and `__getstate__`, the\n", 536 | "problem can be avoided by always implementing `__getstate__` in stateless classes that\n", 537 | "might be pickled. For classes with state it's not strictly necessary (see above).\n", 538 | "\n", 539 | "This is not an unpattern yet (have patience) but actual advice. Here it is in action:" 540 | ], 541 | "id": "60b339af0228f0a2" 542 | }, 543 | { 544 | "metadata": { 545 | "ExecuteTime": { 546 | "end_time": "2024-05-09T21:50:02.838860Z", 547 | "start_time": "2024-05-09T21:50:02.834014Z" 548 | } 549 | }, 550 | "cell_type": "code", 551 | "source": [ 552 | "class A:\n", 553 | "    def __getstate__(self):\n", 554 | "        return self.__dict__\n", 555 | "\n", 556 | "    @staticmethod\n", 557 | "    def get_foo():\n", 558 | "        return \"foo\"\n", 559 | "\n", 560 | "\n", 561 | "serialized_a = pickle.dumps(A())\n", 562 | "\n", 563 | "# we no longer need __getstate__ in the new version since we have state now\n", 564 | "class A:\n", 565 | "    def __init__(self, foo: str = \"foo\"):\n", 566 | "        self.foo = foo\n", 567 | "\n", 568 | "    def __setstate__(self, state):\n", 569 | "        full_state = {\"foo\": \"foo\"}\n", 570 | "        full_state.update(state)\n", 571 | "        self.__dict__.update(full_state)\n", 572 | "\n", 573 | "    def get_foo(self):\n", 574 | "        return self.foo" 575 | ], 576 | "id": "36bfadf576219490", 577 | "outputs": [], 578 | "execution_count": 12 579 | }, 580 | { 581 | "metadata": {}, 582 | "cell_type": "markdown", 583 | "source": "Now things work as expected:", 584 | "id": "8e4ec7d9b556d526" 585 | }, 586 | { 587 | "metadata": { 588 | "ExecuteTime": { 589 | "end_time": "2024-05-09T21:50:02.854334Z", 590 | "start_time": 
"2024-05-09T21:50:02.850298Z" 591 | } 592 | }, 593 | "cell_type": "code", 594 | "source": [ 595 | "a_deserialized: A = pickle.loads(serialized_a)\n", 596 | "a_deserialized.get_foo()" 597 | ], 598 | "id": "617bce15f63ebba5", 599 | "outputs": [ 600 | { 601 | "data": { 602 | "text/plain": [ 603 | "'foo'" 604 | ] 605 | }, 606 | "execution_count": 13, 607 | "metadata": {}, 608 | "output_type": "execute_result" 609 | } 610 | ], 611 | "execution_count": 13 612 | }, 613 | { 614 | "metadata": {}, 615 | "cell_type": "markdown", 616 | "source": "## The Unpattern: Overwriting builtins", 617 | "id": "ba18c53669fb9314" 618 | }, 619 | { 620 | "metadata": {}, 621 | "cell_type": "markdown", 622 | "source": [ 623 | "### Part 1: Overwriting `object`\n", 624 | "\n", 625 | "There is no real reason that I'm aware of for any class not to have the default\n", 626 | "of `__getstate__` returning `self.__dict__`.\n", 627 | "\n", 628 | "Well, if `object`, from which any class inherits, does not behave the way we want it to\n", 629 | "(does not implement `__getstate__` properly), then let's force it! We're in\n", 630 | "python after all - everything should be possible!" 631 | ], 632 | "id": "193ff106eaa778c8" 633 | }, 634 | { 635 | "metadata": {}, 636 | "cell_type": "markdown", 637 | "source": "![what-to-do](images/i_know_what_to_do.jpeg)", 638 | "id": "9c5652261f5af53c" 639 | }, 640 | { 641 | "metadata": {}, 642 | "cell_type": "markdown", 643 | "source": [ 644 | "### Disclaimer\n", 645 | "\n", 646 | "I did not in fact have the strength to do it... But not for lack of trying.\n", 647 | "\n", 648 | "What follows below is a mostly failed attempt to overwrite python builtin behavior\n", 649 | "of how classes are defined and objects instantiated\n", 650 | "(with only partial and unsatisfactory success). 
Note that even if it worked (I think\n", 651 | "in python 2.7 it was possible to fully overwrite the default metaclass), it would have\n", 652 | "been a **terrible, terrible idea**!\n", 653 | "\n", 654 | "With this sorted out, let's go ahead." 655 | ], 656 | "id": "a496c6fc174a7f14" 657 | }, 658 | { 659 | "metadata": {}, 660 | "cell_type": "markdown", 661 | "source": [ 662 | "The first thing to do is to define a class that will always have `__getstate__`.\n", 663 | "And the proper way of using this class (not the unpattern way) is to inherit\n", 664 | "from it when needed. We call it `Serializable`, just like the marker interface in\n", 665 | "java." 666 | ], 667 | "id": "32d9b4958e7e98b5" 668 | }, 669 | { 670 | "metadata": { 671 | "ExecuteTime": { 672 | "end_time": "2024-05-09T21:50:02.912747Z", 673 | "start_time": "2024-05-09T21:50:02.909594Z" 674 | } 675 | }, 676 | "cell_type": "code", 677 | "source": [ 678 | "class Serializable:\n", 679 | "    def __getstate__(self):\n", 680 | "        return self.__dict__" 681 | ], 682 | "id": "70b8eeba8cde6689", 683 | "outputs": [], 684 | "execution_count": 14 685 | }, 686 | { 687 | "metadata": {}, 688 | "cell_type": "markdown", 689 | "source": [ 690 | "Now let's try to overwrite the default behavior of `object`\n", 691 | "such that all classes inherit from `Serializable` by default." 692 | ], 693 | "id": "4d6a65c91120688d" 694 | }, 695 | { 696 | "metadata": { 697 | "ExecuteTime": { 698 | "end_time": "2024-05-09T21:50:02.924018Z", 699 | "start_time": "2024-05-09T21:50:02.921385Z" 700 | } 701 | }, 702 | "cell_type": "code", 703 | "source": [ 704 | "import builtins\n", 705 | "\n", 706 | "builtins.object = Serializable" 707 | ], 708 | "id": "974df4abb0344d1a", 709 | "outputs": [], 710 | "execution_count": 15 711 | }, 712 | { 713 | "metadata": {}, 714 | "cell_type": "markdown", 715 | "source": [ 716 | "Did that do the trick? Well, almost, but not good enough. 
We do get that classes that \n", 717 | "inherit from `object` now have `__getstate__`:" 718 | ], 719 | "id": "ce9b1c7087650c5f" 720 | }, 721 | { 722 | "metadata": { 723 | "ExecuteTime": { 724 | "end_time": "2024-05-09T21:50:02.943999Z", 725 | "start_time": "2024-05-09T21:50:02.939762Z" 726 | } 727 | }, 728 | "cell_type": "code", 729 | "source": [ 730 | "class AExplicitObject(object):\n", 731 | " pass" 732 | ], 733 | "id": "b9470f2030260c63", 734 | "outputs": [], 735 | "execution_count": 16 736 | }, 737 | { 738 | "metadata": { 739 | "ExecuteTime": { 740 | "end_time": "2024-05-09T21:50:03.001718Z", 741 | "start_time": "2024-05-09T21:50:02.996368Z" 742 | } 743 | }, 744 | "cell_type": "code", 745 | "source": "AExplicitObject().__getstate__()", 746 | "id": "c205907d351d2a0", 747 | "outputs": [ 748 | { 749 | "data": { 750 | "text/plain": [ 751 | "{}" 752 | ] 753 | }, 754 | "execution_count": 17, 755 | "metadata": {}, 756 | "output_type": "execute_result" 757 | } 758 | ], 759 | "execution_count": 17 760 | }, 761 | { 762 | "metadata": {}, 763 | "cell_type": "markdown", 764 | "source": [ 765 | "However, unfortunately, all that we have achieved is that now inheriting from `object` explicitly\n", 766 | "and implicitly leads to different behavior. 
So:" 767 | ], 768 | "id": "51bfb0c4c2d7720a" 769 | }, 770 | { 771 | "metadata": { 772 | "ExecuteTime": { 773 | "end_time": "2024-05-09T21:50:03.025553Z", 774 | "start_time": "2024-05-09T21:50:03.021715Z" 775 | } 776 | }, 777 | "cell_type": "code", 778 | "source": [ 779 | "class AImplicitObject:\n", 780 | " pass" 781 | ], 782 | "id": "f4e7d6ddbf373c2e", 783 | "outputs": [], 784 | "execution_count": 18 785 | }, 786 | { 787 | "metadata": {}, 788 | "cell_type": "markdown", 789 | "source": "will still lead to the old behavior of `__getstate__` returning `None`:", 790 | "id": "a9737ce3f82a955f" 791 | }, 792 | { 793 | "metadata": { 794 | "ExecuteTime": { 795 | "end_time": "2024-05-09T21:50:03.062391Z", 796 | "start_time": "2024-05-09T21:50:03.058412Z" 797 | } 798 | }, 799 | "cell_type": "code", 800 | "source": "print(f\"{AImplicitObject().__getstate__()=}\")", 801 | "id": "dabb13cca87cc52c", 802 | "outputs": [ 803 | { 804 | "name": "stdout", 805 | "output_type": "stream", 806 | "text": [ 807 | "AImplicitObject().__getstate__()=None\n" 808 | ] 809 | } 810 | ], 811 | "execution_count": 19 812 | }, 813 | { 814 | "metadata": {}, 815 | "cell_type": "markdown", 816 | "source": [ 817 | "This unexpected difference in behavior is a major WTF, and you should never do\n", 818 | "the hack outlined above!\n", 819 | "\n", 820 | "We can better understand why this happened by looking at the `__mro__` (method\n", 821 | "resolution order) attribute of the classes, which will list the parent classes\n", 822 | "in the order in which they are searched for attributes:" 823 | ], 824 | "id": "d46e408e68df850" 825 | }, 826 | { 827 | "metadata": { 828 | "ExecuteTime": { 829 | "end_time": "2024-05-09T21:50:03.080337Z", 830 | "start_time": "2024-05-09T21:50:03.076142Z" 831 | } 832 | }, 833 | "cell_type": "code", 834 | "source": "AExplicitObject.__mro__", 835 | "id": "d00d9b037daf37b7", 836 | "outputs": [ 837 | { 838 | "data": { 839 | "text/plain": [ 840 | "(__main__.AExplicitObject, 
__main__.Serializable, object)" 841 | ] 842 | }, 843 | "execution_count": 20, 844 | "metadata": {}, 845 | "output_type": "execute_result" 846 | } 847 | ], 848 | "execution_count": 20 849 | }, 850 | { 851 | "metadata": { 852 | "ExecuteTime": { 853 | "end_time": "2024-05-09T21:50:03.096085Z", 854 | "start_time": "2024-05-09T21:50:03.091747Z" 855 | } 856 | }, 857 | "cell_type": "code", 858 | "source": "AImplicitObject.__mro__", 859 | "id": "fb1635e74044fd8d", 860 | "outputs": [ 861 | { 862 | "data": { 863 | "text/plain": [ 864 | "(__main__.AImplicitObject, object)" 865 | ] 866 | }, 867 | "execution_count": 21, 868 | "metadata": {}, 869 | "output_type": "execute_result" 870 | } 871 | ], 872 | "execution_count": 21 873 | }, 874 | { 875 | "metadata": {}, 876 | "cell_type": "markdown", 877 | "source": [ 878 | "This makes clear: we didn't actually overwrite `object`.\n", 879 | "The real `object` class is added somewhere, I guess in the python compiler, and we can\n", 880 | "neither get rid of it nor overwrite it." 881 | ], 882 | "id": "37e9acc9716c3592" 883 | }, 884 | { 885 | "metadata": {}, 886 | "cell_type": "markdown", 887 | "source": [ 888 | "### Part 2: Overwriting `type`\n", 889 | "\n", 890 | "I'm not giving up yet! Since I had the misfortune of having to deal with\n", 891 | "metaclasses in the past, I know that there is something beyond inheritance to\n", 892 | "influence how classes are defined.\n", 893 | "\n", 894 | "A metaclass defines how a class is defined, thus acting before the constructor\n", 895 | "of the class is called, or before inheritance is carried out. The relevant\n", 896 | "method for metaclasses is `__new__`.\n", 897 | "\n", 898 | "The default metaclass that is used for all classes implicitly (just like object)\n", 899 | "is `type`. If we didn't succeed in overwriting `object`, maybe we can overwrite\n", 900 | "`type`? 
Let's try!\n", 901 | "\n", 902 | "![go-deeper](images/have_to_go_deeper.jpg)" 903 | ], 904 | "id": "fc2a6c97de208946" 905 | }, 906 | { 907 | "metadata": {}, 908 | "cell_type": "markdown", 909 | "source": [ 910 | "If we call `help(type)`, the first lines show its signature:\n", 911 | "\n", 912 | "```\n", 913 | "class type(object)\n", 914 | " | type(object) -> the object's type\n", 915 | " | type(name, bases, dict, **kwds) -> a new type\n", 916 | " | \n", 917 | "```\n", 918 | "\n", 919 | "So, if we want to sneak in our `Serializable`,\n", 920 | "we need to extend the `bases` to include it.\n", 921 | "Here is the extended type implementation:" 922 | ], 923 | "id": "714fa95873135e35" 924 | }, 925 | { 926 | "metadata": { 927 | "ExecuteTime": { 928 | "end_time": "2024-05-09T21:50:03.157936Z", 929 | "start_time": "2024-05-09T21:50:03.153188Z" 930 | } 931 | }, 932 | "cell_type": "code", 933 | "source": [ 934 | "class type_with_getstate(type):\n", 935 | "    def __new__(cls, *args):\n", 936 | "        args = list(args)\n", 937 | "        args[1] += (Serializable,)\n", 938 | "        return super().__new__(cls, *args)" 939 | ], 940 | "id": "2633bdd942244558", 941 | "outputs": [], 942 | "execution_count": 22 943 | }, 944 | { 945 | "metadata": {}, 946 | "cell_type": "markdown", 947 | "source": "Before overwriting the builtin, let's see whether this works:", 948 | "id": "cddcd7539fa72f85" 949 | }, 950 | { 951 | "metadata": { 952 | "ExecuteTime": { 953 | "end_time": "2024-05-09T21:50:03.173901Z", 954 | "start_time": "2024-05-09T21:50:03.170768Z" 955 | } 956 | }, 957 | "cell_type": "code", 958 | "source": [ 959 | "class AWithMeta(metaclass=type_with_getstate):\n", 960 | "    pass\n", 961 | "\n", 962 | "print(f\"{AWithMeta().__getstate__()=}\")\n", 963 | "print(f\"{AWithMeta.__mro__=}\")" 964 | ], 965 | "id": "c566ae05fcae6883", 966 | "outputs": [ 967 | { 968 | "name": "stdout", 969 | "output_type": "stream", 970 | "text": [ 971 | "AWithMeta().__getstate__()={}\n", 972 | "AWithMeta.__mro__=(<class '__main__.AWithMeta'>, <class '__main__.Serializable'>, <class 'object'>)\n" 973 | 
] 974 | } 975 | ], 976 | "execution_count": 23 977 | }, 978 | { 979 | "metadata": {}, 980 | "cell_type": "markdown", 981 | "source": [ 982 | "Looks good. Quick check on the other functionality of `type` (you know, retrieving\n", 983 | "the type of an object):" 984 | ], 985 | "id": "bf5395bfac62f007" 986 | }, 987 | { 988 | "metadata": { 989 | "ExecuteTime": { 990 | "end_time": "2024-05-09T21:50:03.196328Z", 991 | "start_time": "2024-05-09T21:50:03.192395Z" 992 | } 993 | }, 994 | "cell_type": "code", 995 | "source": [ 996 | "try:\n", 997 | "    type_with_getstate(5)\n", 998 | "except Exception as e:\n", 999 | "    print_error(e)" 1000 | ], 1001 | "id": "b535029ea53c7887", 1002 | "outputs": [ 1003 | { 1004 | "name": "stdout", 1005 | "output_type": "stream", 1006 | "text": [ 1007 | "IndexError: list index out of range\n" 1008 | ] 1009 | } 1010 | ], 1011 | "execution_count": 24 1012 | }, 1013 | { 1014 | "metadata": {}, 1015 | "cell_type": "markdown", 1016 | "source": [ 1017 | "What happened? Why did it stop doing its job? We only overwrote `__new__`, not `__call__`.\n", 1018 | "Is `__new__` being called when we use it to determine an object's type? 
Then this should work:\n" 1019 | ], 1020 | "id": "98302556ff3d07da" 1021 | }, 1022 | { 1023 | "metadata": { 1024 | "ExecuteTime": { 1025 | "end_time": "2024-05-09T21:50:03.239745Z", 1026 | "start_time": "2024-05-09T21:50:03.235177Z" 1027 | } 1028 | }, 1029 | "cell_type": "code", 1030 | "source": [ 1031 | "class type_with_getstate_attempt2(type):\n", 1032 | " def __new__(cls, *args):\n", 1033 | " if len(args) > 1:\n", 1034 | " args = list(args)\n", 1035 | " args[1] += (Serializable,)\n", 1036 | " return super().__new__(cls, *args)\n" 1037 | ], 1038 | "id": "6dbdb9da602c8610", 1039 | "outputs": [], 1040 | "execution_count": 25 1041 | }, 1042 | { 1043 | "metadata": { 1044 | "ExecuteTime": { 1045 | "end_time": "2024-05-09T21:50:03.258515Z", 1046 | "start_time": "2024-05-09T21:50:03.254447Z" 1047 | } 1048 | }, 1049 | "cell_type": "code", 1050 | "source": [ 1051 | "try:\n", 1052 | " type_with_getstate_attempt2(5)\n", 1053 | "except Exception as e:\n", 1054 | " print_error(e)" 1055 | ], 1056 | "id": "8ccd4c1ed0ad8424", 1057 | "outputs": [ 1058 | { 1059 | "name": "stdout", 1060 | "output_type": "stream", 1061 | "text": [ 1062 | "TypeError: type.__new__() takes exactly 3 arguments (1 given)\n" 1063 | ] 1064 | } 1065 | ], 1066 | "execution_count": 26 1067 | }, 1068 | { 1069 | "metadata": {}, 1070 | "cell_type": "markdown", 1071 | "source": [ 1072 | "This got too weird for me, so I gave up on trying to understand it exactly... 
Note that even this\n", 1073 | "won't work:" 1074 | ], 1075 | "id": "e1acc92da575b207" 1076 | }, 1077 | { 1078 | "metadata": { 1079 | "ExecuteTime": { 1080 | "end_time": "2024-05-09T21:50:03.275191Z", 1081 | "start_time": "2024-05-09T21:50:03.271632Z" 1082 | } 1083 | }, 1084 | "cell_type": "code", 1085 | "source": [ 1086 | "class type_extended_with_pass(type):\n", 1087 | " pass\n", 1088 | "\n", 1089 | "try:\n", 1090 | " type_extended_with_pass(5)\n", 1091 | "except Exception as e:\n", 1092 | " print_error(e)" 1093 | ], 1094 | "id": "6fbb92c57e7629f0", 1095 | "outputs": [ 1096 | { 1097 | "name": "stdout", 1098 | "output_type": "stream", 1099 | "text": [ 1100 | "TypeError: type.__new__() takes exactly 3 arguments (1 given)\n" 1101 | ] 1102 | } 1103 | ], 1104 | "execution_count": 27 1105 | }, 1106 | { 1107 | "metadata": {}, 1108 | "cell_type": "markdown", 1109 | "source": [ 1110 | "#### Bruteforcing the Solution\n", 1111 | "\n", 1112 | "As above, I take the attitude that if things don't want to behave my way, I will force them. 
\n", 1113 | "Here is an actually working extension of type:" 1114 | ], 1115 | "id": "668173721a1fc5b8" 1116 | }, 1117 | { 1118 | "metadata": { 1119 | "ExecuteTime": { 1120 | "end_time": "2024-05-09T21:50:03.313237Z", 1121 | "start_time": "2024-05-09T21:50:03.309443Z" 1122 | } 1123 | }, 1124 | "cell_type": "code", 1125 | "source": [ 1126 | "from copy import deepcopy\n", 1127 | "\n", 1128 | "_original_type = deepcopy(type)\n", 1129 | "\n", 1130 | "class extended_type(_original_type):\n", 1131 | " def __new__(cls, *args, **kwargs):\n", 1132 | " # type of an object\n", 1133 | " if len(args) == 1:\n", 1134 | " return _original_type(*args)\n", 1135 | " # used as metaclass\n", 1136 | " args = list(args)\n", 1137 | " args[1] += (Serializable,)\n", 1138 | " return super().__new__(cls, *args, **kwargs)\n", 1139 | " \n", 1140 | " \n", 1141 | "print(f\"{extended_type(3)=}\")" 1142 | ], 1143 | "id": "4562530b467aa480", 1144 | "outputs": [ 1145 | { 1146 | "name": "stdout", 1147 | "output_type": "stream", 1148 | "text": [ 1149 | "extended_type(3)=<class 'int'>\n" 1150 | ] 1151 | } 1152 | ], 1153 | "execution_count": 28 1154 | }, 1155 | { 1156 | "metadata": {}, 1157 | "cell_type": "markdown", 1158 | "source": "This works, so let's overwrite the builtin", 1159 | "id": "f16cf2544e4e5a28" 1160 | }, 1161 | { 1162 | "metadata": { 1163 | "ExecuteTime": { 1164 | "end_time": "2024-05-09T21:50:03.336213Z", 1165 | "start_time": "2024-05-09T21:50:03.332746Z" 1166 | } 1167 | }, 1168 | "cell_type": "code", 1169 | "source": "builtins.type = extended_type", 1170 | "id": "e9a1b489793dd4a3", 1171 | "outputs": [], 1172 | "execution_count": 29 1173 | }, 1174 | { 1175 | "metadata": {}, 1176 | "cell_type": "markdown", 1177 | "source": [ 1178 | "Unfortunately, this is only a partial success. Just like with overwriting `object`, all this\n", 1179 | "has done was to create a difference between classes that use `type` as metaclass explicitly \n", 1180 | "and classes that don't." 
1181 | ], 1182 | "id": "7d6a201b2d6a9378" 1183 | }, 1184 | { 1185 | "metadata": { 1186 | "ExecuteTime": { 1187 | "end_time": "2024-05-09T21:50:03.370598Z", 1188 | "start_time": "2024-05-09T21:50:03.366110Z" 1189 | } 1190 | }, 1191 | "cell_type": "code", 1192 | "source": [ 1193 | "class AExplicitMeta(metaclass=type):\n", 1194 | " pass\n", 1195 | "\n", 1196 | "class AImplicitMeta:\n", 1197 | " pass\n", 1198 | "\n", 1199 | "print(f\"{AExplicitMeta().__getstate__()=}\")\n", 1200 | "print(f\"{AExplicitMeta.__mro__=}\")\n", 1201 | "print(\"--------------------------------------\")\n", 1202 | "print(f\"{AImplicitMeta().__getstate__()=}\")\n", 1203 | "print(f\"{AImplicitMeta.__mro__=}\")" 1204 | ], 1205 | "id": "4b9059bf210a440c", 1206 | "outputs": [ 1207 | { 1208 | "name": "stdout", 1209 | "output_type": "stream", 1210 | "text": [ 1211 | "AExplicitMeta().__getstate__()={}\n", 1212 | "AExplicitMeta.__mro__=(<class '__main__.AExplicitMeta'>, <class '__main__.Serializable'>, <class 'object'>)\n", 1213 | "--------------------------------------\n", 1214 | "AImplicitMeta().__getstate__()=None\n", 1215 | "AImplicitMeta.__mro__=(<class '__main__.AImplicitMeta'>, <class 'object'>)\n" 1216 | ] 1217 | } 1218 | ], 1219 | "execution_count": 30 1220 | }, 1221 | { 1222 | "metadata": {}, 1223 | "cell_type": "markdown", 1224 | "source": [ 1225 | "We can't fight the compiler. Or maybe we can, but I don't know how. Feel free to\n", 1226 | "fire up a PR if you want to hack even deeper and find a solution!" 1227 | ], 1228 | "id": "b813c703d2b5bc21" 1229 | }, 1230 | { 1231 | "metadata": {}, 1232 | "cell_type": "markdown", 1233 | "source": [ 1234 | "## Conclusion\n", 1235 | "\n", 1236 | "We can't fully override builtin behavior because this behavior is not only\n", 1237 | "rooted in the `builtins` module but also somewhere else. We can only somehow\n", 1238 | "override it, by making explicit invocations of `object` and `type` behave\n", 1239 | "differently, but that's really not satisfactory...\n", 1240 | "\n", 1241 | "In any case, trying that was a bad idea from the start! 
Although having\n", 1242 | "`__getstate__` return `None` for objects without a state seems like a bad idea\n", 1243 | "as well, so the goal was a noble one. Note, however, that two times minus\n", 1244 | "usually doesn't turn to plus in software development." 1245 | ], 1246 | "id": "1de7dcc23140a54b" 1247 | }, 1248 | { 1249 | "metadata": {}, 1250 | "cell_type": "markdown", 1251 | "source": [ 1252 | "## Last Remarks: Advice on Backwards Compatibility with Pickling\n", 1253 | "\n", 1254 | "Here are some actual things you could and should do to prevent deserialization\n", 1255 | "errors:\n", 1256 | "\n", 1257 | "1. Use `__setstate__`. I usually use the [setstate utility from sensAI](https://github.com/aai-institute/sensAI/blob/1d5d3d3bcd2b041d0d3084076a863d2b19f179db/src/sensai/util/pickle.py#L154)\n", 1258 | "which provides a very convenient way of taking care of backwards compatibility.\n", 1259 | "2. Write a `Serializable` class and always inherit from it\n", 1260 | "for all objects that are meant to be persisted. This is also a useful marker\n", 1261 | "interface, so you can easily find all things that are expected to be serialized\n", 1262 | "by you or by users.\n", 1263 | "\n", 1264 | "```python \n", 1265 | "class Serializable: \n", 1266 | " def __getstate__(self): \n", 1267 | " return self.__dict__\n", 1268 | "```\n", 1269 | "3. 
If you rename classes, add the old names to keep backwards compatibility.\n", 1270 | "Note that you can do that inside a function that you then call in the module; this way\n", 1271 | "the old name won't exist for code-analysis tools and won't appear in suggestions to import.\n", 1272 | "This looks something like this (imagine code within some python module):\n", 1273 | "\n", 1274 | "```python\n", 1275 | "\n", 1276 | "# was previously called AOld\n", 1277 | "class ANew:\n", 1278 | " pass\n", 1279 | "\n", 1280 | "def _restore_backwards_compatibility():\n", 1281 | " global AOld\n", 1282 | " AOld = ANew\n", 1283 | "\n", 1284 | "# For new users and for yourself AOld has disappeared from all IDE suggestions, but\n", 1285 | "# by calling this you add it to the global scope, and thus in reality it will be there\n", 1286 | "_restore_backwards_compatibility()\n", 1287 | "```\n", 1288 | "\n", 1289 | "4. Set up tests that previously pickled objects can still be unpickled. It's fairly easy\n", 1290 | "to do: just save instances of some classes as test resources, load them \n", 1291 | "with pickle in tests and\n", 1292 | "test basic properties like access to attributes and `isinstance` checks." 
1293 | ], 1294 | "id": "874c25c733905937" 1295 | }, 1296 | { 1297 | "metadata": { 1298 | "ExecuteTime": { 1299 | "end_time": "2024-05-09T21:50:03.393314Z", 1300 | "start_time": "2024-05-09T21:50:03.390784Z" 1301 | } 1302 | }, 1303 | "cell_type": "code", 1304 | "source": "", 1305 | "id": "22a8af53c0909e8e", 1306 | "outputs": [], 1307 | "execution_count": 30 1308 | } 1309 | ], 1310 | "metadata": { 1311 | "kernelspec": { 1312 | "display_name": "Python 3", 1313 | "language": "python", 1314 | "name": "python3" 1315 | }, 1316 | "language_info": { 1317 | "codemirror_mode": { 1318 | "name": "ipython", 1319 | "version": 2 1320 | }, 1321 | "file_extension": ".py", 1322 | "mimetype": "text/x-python", 1323 | "name": "python", 1324 | "nbconvert_exporter": "python", 1325 | "pygments_lexer": "ipython2", 1326 | "version": "2.7.6" 1327 | } 1328 | }, 1329 | "nbformat": 4, 1330 | "nbformat_minor": 5 1331 | } 1332 | -------------------------------------------------------------------------------- /protocol_decorator.md: -------------------------------------------------------------------------------- 1 | - [Typed Decorators with Protocols](#typed-decorators-with-protocols) 2 | * [The Boring Way](#the-boring-way) 3 | * [Crazy Way No1: Inherit at Type-Check Time](#crazy-way-no1-inherit-at-type-check-time) 4 | * [Crazy Way No2: Protocols and Workarounds](#crazy-way-no2-protocols-and-workarounds) 5 | 6 | 7 | # Typed Decorators with Protocols 8 | 9 | Python is amazing for implementing the decorator pattern. It is so good at it that young 10 | pythonistas tend to abuse this pattern (I was definitely guilty of that in the good old days). 11 | 12 | The instantiation of the decorator problem I want to discuss is as follows: 13 | 14 | Say you have a class `A` and you want to extend it without inheriting from it. 
15 | 16 | ```python 17 | class A: 18 | def __init__(self, a_field: str = "a_field"): 19 | self.a_field = a_field 20 | 21 | def amethod(self): 22 | print(f"Greetings from A with {self.a_field=}") 23 | ``` 24 | 25 | This is a common requirement - for example, some mechanism might already give you instances of 26 | `A` that you just want to extend with new functionality. 27 | 28 | 29 | You can write a normal wrapper: 30 | 31 | ```python 32 | class AWrapper: 33 | def __init__(self, a: A): 34 | self.a = a 35 | 36 | def new_functionality(self): 37 | print("New functionality") 38 | ``` 39 | 40 | but then you can't use `AWrapper` in place of `A` because it doesn't have the same interface. 41 | 42 | 43 | In this unpattern, I will show three different ways to implement a typed decorator that truly 44 | extends `A` without inheritance. 45 | 46 | - The first will be sane, boring, and involve a lot of boilerplate. Could work in any language, and it's the one I'd recommend. Understandable, clear, and booooring. 47 | - The second will be more crazy and have some super-weird (pun intended) side effects that will almost never appear, but when they do, 48 | you'll be awfully confused. 49 | - The third one is even weirder, goes deep into python magic and might actually be sorta safe - though the WTF factor is off the charts! 50 | 51 | Let's go! 52 | 53 | ## The Boring Way 54 | 55 | You can make `AWrapper` have "the same interface" as `A` by copying 56 | all its methods and just forwarding them to the wrapped instance. At that point, 57 | it would be appropriate to make the member holding the wrapped instance private. 
58 | The result would look like this: 59 | 60 | ```python 61 | class AWrapper: 62 | def __init__(self, a: A): 63 | self._a = a 64 | 65 | @property 66 | def a_field(self): 67 | return self._a.a_field 68 | 69 | def new_functionality(self): 70 | print("New functionality") 71 | 72 | def amethod(self): 73 | return self._a.amethod() 74 | ``` 75 | 76 | Just having the "same interface" accidentally is not enough; we want to have 77 | a proper type associated with both `A` and `AWrapper`! 78 | 79 | The boring, Java-esque solution is to introduce an interface and make both 80 | `A` and `AWrapper` implement it: 81 | 82 | ```python 83 | from abc import ABC, abstractmethod 84 | 85 | class ABase(ABC): 86 | @abstractmethod 87 | def amethod(self) -> None: 88 | pass 89 | 90 | @property 91 | @abstractmethod 92 | def a_field(self) -> str: 93 | pass 94 | 95 | class A(ABase): 96 | def __init__(self, a_field: str = "a_field"): 97 | self._a_field = a_field 98 | 99 | @property 100 | def a_field(self) -> str: 101 | return self._a_field 102 | 103 | def amethod(self): 104 | print(f"Greetings from A with {self.a_field=}") 105 | 106 | class AWrapper(ABase): 107 | def __init__(self, a: A): 108 | self._a = a 109 | 110 | @property 111 | def a_field(self) -> str: 112 | return self._a.a_field 113 | 114 | def new_functionality(self): 115 | print("New functionality") 116 | 117 | def amethod(self): 118 | return self._a.amethod() 119 | ``` 120 | 121 | Notice that I had to actually adjust `A` in order to introduce an abstract property as 122 | part of the interface! I would also need to copy all the methods and public fields 123 | (the latter possibly as properties) from `A` to `AWrapper` and to `ABase`. 124 | This is just an actual, normal interface, resulting in a lot of boilerplate. If the class `A` is defined in an external library, 125 | you can't do that and will need to wrap it in some class of your own for this approach to work. 
126 | 127 | Moreover, it's not exactly the same as it was before, since now `a_field` is read-only. One could add 128 | a few more lines making it writable, but that'd be even more code. 129 | 130 | I want to repeat: this is the sensible thing to do! Sure, it's a bunch of boilerplate, but it's straightforward and 131 | there are no surprises. IDEs, type-checkers, and other tools will understand what's going on. 132 | 133 | Now, let's get to the fun stuff! 134 | 135 | ## Crazy Way No1: Inherit at Type-Check Time 136 | 137 | One of the first magic python things a beginner learns is the `__getattr__` method, 138 | which is perfect for forwarding calls to a wrapped object. Functionally, the 139 | decoration of `A` is achieved trivially: 140 | 141 | ```python 142 | class AWrapper: 143 | def __init__(self, a: A): 144 | self._a = a 145 | 146 | def new_functionality(self): 147 | print("New functionality") 148 | 149 | def __getattr__(self, item): 150 | return getattr(self._a, item) 151 | ``` 152 | 153 | The problem is that this will not work with type-checkers or IDEs. I don't know about you, 154 | but I can't live without autocompletion and type-safety catching my (many) simple mistakes. 155 | Whenever I see nothing proposed on typing `AWrapper(A()).am`, I get incredibly sad. 156 | 157 | 158 | With python, types are ignored by the interpreter anyway. So if we want to express that `AWrapper` 159 | has the same interface as `A`, why not do it at type-checking time only and 160 | "remove" the inheritance at runtime? 
In fact, it can be done: 161 | 162 | ```python 163 | 164 | from typing import TYPE_CHECKING 165 | 166 | class A: 167 | def __init__(self, a_field: str = "a_field"): 168 | self.a_field = a_field 169 | 170 | def amethod(self): 171 | print(f"Greetings from A with {self.a_field=}") 172 | 173 | ABase = A if TYPE_CHECKING else object 174 | 175 | class AWrapper(ABase): 176 | def __init__(self, a: A): 177 | self._a = a 178 | 179 | def new_functionality(self): 180 | print("New functionality") 181 | 182 | def __getattr__(self, item): 183 | return getattr(self._a, item) 184 | ``` 185 | 186 | Now you get full completion, mypy and IDE support, all while not inheriting from `A` at runtime! 187 | 188 | Neat, right? I stole this shamelessly from the very cool project 189 | [tinydb](https://github.com/msiemens/tinydb), 190 | where it's used to [pretend that a TinyDB instance is a Table](https://github.com/msiemens/tinydb/blob/master/tinydb/database.py). 191 | 192 | However, the downside is that super-weird things happen! 193 | (Which is the only reason I actually had to dig into that codebase and discover their 194 | brilliant `with_typehint` implementation). 195 | 196 | I was just an innocent user: I inherited from `TinyDB` without reading the docs, called `super`, and ran into one of the 197 | weirdest errors of my life: super told me that the method was not available in the parent. 198 | But the method was there, 199 | I saw it, I was using it! What was going on? 200 | 201 | Here's how it would look with the classes defined above: 202 | 203 | ```python 204 | class AWrapperExtension(AWrapper): 205 | def method_in_extension(self) -> None: 206 | super().amethod() 207 | ``` 208 | 209 | Running: 210 | ```python 211 | AWrapperExtension(A()).method_in_extension() 212 | ``` 213 | 214 | results in 215 | 216 | `AttributeError: 'super' object has no attribute 'amethod'`. 217 | 218 | I thought I was going crazy! The IDE autocompleted the method for me! 
Moreover, 219 | calling `AWrapperExtension(A()).amethod()` works and gives the expected result! 220 | 221 | The reason behind this failure is some very specific things that happen with `super()`. 222 | As it turns out, it returns only a proxy object: attribute lookup on it searches the class `__dict__`s along the *Method Resolution Order* (the inheritance chain) after the current class, and never falls back to the `__getattr__` of those classes. 223 | The proxy itself is an instance of Python's `super` class, which only inherits from `object`. 224 | We can see this with `super().__class__.__mro__`, which evaluates to `(<class 'super'>, <class 'object'>)`. 225 | Since `amethod` is not defined on any class in the runtime MRO of `AWrapper` (it is only reachable through `__getattr__`), the `super()` object cannot find it. 226 | 227 | Returning from this aside, it suffices to say that `super()` is weird, 228 | and playing around with python magic can easily break it. 229 | 230 | By trial and error I found a way to fix it, but this is a serious WTF: 231 | 232 | ```python 233 | class AWrapperExtension(AWrapper): 234 | def method_in_extension(self) -> None: 235 | super().__getattr__("amethod")() 236 | ``` 237 | will actually work, contrary to `getattr(super(), "amethod")()`: `__getattr__` is defined in `AWrapper`'s `__dict__`, so `super()` finds it like any other method, whereas `getattr` on the proxy only falls back to a `__getattr__` defined on `super` or `object`, and neither of them has one. 238 | 239 | Of course, this fix is ugly as hell. Note that you'd probably only run into this if 240 | you want to override a "forwarded" method from `AWrapper` in a subclass and use 241 | `super` inside of it. Otherwise, there is no need to call super: 242 | 243 | ```python 244 | class AWrapperExtension(AWrapper): 245 | def method_in_extension(self) -> None: 246 | self.amethod() 247 | ``` 248 | 249 | works perfectly fine. Since this kind of override tends to be rather rare, 250 | your code can work for a very long time until 251 | a confused colleague or user runs into the `AttributeError` 252 | described above. 
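The three call paths discussed in this section can be compared side by side. This is only a minimal sketch: `Wrapped`, `Wrapper` and `WrapperExtension` are hypothetical stand-ins for the classes of this article, and the methods return strings instead of printing so the results are easy to inspect.

```python
class Wrapped:
    """Hypothetical stand-in for the wrapped class."""

    def amethod(self) -> str:
        return "from wrapped"


class Wrapper:
    """Hypothetical stand-in for the forwarding wrapper (no runtime inheritance)."""

    def __init__(self, wrapped: Wrapped):
        self._wrapped = wrapped

    def __getattr__(self, item):
        # Only reached when normal attribute lookup on the instance fails.
        return getattr(self._wrapped, item)


class WrapperExtension(Wrapper):
    def via_self(self) -> str:
        # Normal lookup fails, so Wrapper.__getattr__ forwards the call.
        return self.amethod()

    def via_super(self) -> str:
        # Lookup on the super() proxy does not trigger Wrapper.__getattr__.
        try:
            return super().amethod()
        except AttributeError as e:
            return f"AttributeError: {e}"

    def via_explicit_getattr(self) -> str:
        # __getattr__ is a regular method of Wrapper, so super() can find it.
        return super().__getattr__("amethod")()


ext = WrapperExtension(Wrapped())
results = (ext.via_self(), ext.via_super(), ext.via_explicit_getattr())
```

Only the middle path fails; the other two end up at the wrapped instance.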
253 | 254 | ## Crazy Way No2: Protocols and Workarounds 255 | 256 | This way would actually not be crazy if `Protocol`s in python worked as they should. 257 | Unfortunately, they don't really. A `Protocol` is a perfect way to express behavior 258 | without enforcing an inheritance hierarchy - so exactly what we want to achieve! 259 | 260 | In an ideal world (or maybe in a future version of python), we could do: 261 | 262 | ```python 263 | from typing import Protocol 264 | 265 | class AProtocol(Protocol): 266 | a_field: str 267 | 268 | def amethod(self) -> None: 269 | ... 270 | 271 | 272 | class A: 273 | def __init__(self, a_field: str = "a_field"): 274 | self.a_field = a_field 275 | 276 | def amethod(self): 277 | print(f"Greetings from A with {self.a_field=}") 278 | 279 | 280 | class AWrapper(AProtocol): 281 | def __init__(self, a: A): 282 | self._a = a 283 | 284 | def new_functionality(self): 285 | print("New functionality") 286 | 287 | def __getattr__(self, item): 288 | return getattr(self._a, item) 289 | ``` 290 | 291 | Note that `A` doesn't have to inherit from `AProtocol`, so it could perfectly well 292 | come from an external library. In the protocol we could express only the part of `A` 293 | that we actually care about (though the overridden `__getattr__` would forward 294 | all methods and fields of `A`, of course...). 295 | 296 | Unfortunately, this doesn't work. Static analysis shows that everything is fine, but 297 | at runtime `AWrapper(A()).amethod()` doesn't do anything. `__getattribute__`, which 298 | is called before `__getattr__`, forwards the call 299 | to the empty implementation of `amethod` inside the protocol instead of forwarding it to 300 | the wrapped `self._a`. Since this didn't raise an `AttributeError`, 301 | `__getattr__` is never called, so our decorator falls apart. Infuriatingly, 302 | `AWrapper(A()).a_field` **does work**, since the field is just declared in the protocol, 303 | and not implemented. 
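The difference between the method and the field can be reproduced in a few lines. This is a hedged sketch with hypothetical names (`GreeterProtocol`, `Greeter`, `GreeterWrapper`), not the classes from this article; it only illustrates that `__getattr__` is a fallback for failed lookups.

```python
from typing import Protocol


class GreeterProtocol(Protocol):
    """Hypothetical protocol: one declared field, one stub method."""

    name: str  # annotation only, creates no class attribute

    def greet(self) -> str:
        ...  # empty stub; calling it returns None


class Greeter:
    def __init__(self) -> None:
        self.name = "real"

    def greet(self) -> str:
        return "hello from Greeter"


class GreeterWrapper(GreeterProtocol):
    def __init__(self, greeter: Greeter):
        self._greeter = greeter

    def __getattr__(self, item):
        # Fallback only: runs when normal lookup raised AttributeError.
        return getattr(self._greeter, item)


wrapper = GreeterWrapper(Greeter())

# `greet` is found on GreeterProtocol through the MRO, so the stub runs.
result_method = wrapper.greet()

# `name` exists nowhere on the wrapper's classes, so __getattr__ forwards it.
result_field = wrapper.name
```

Here `result_method` is `None` (the stub ran), while `result_field` comes from the wrapped `Greeter`.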
304 | 305 | The solution is obvious, right? We just need to raise an `AttributeError` in the 306 | protocol, then `__getattr__` will finally be called, and we can all go home happy! 307 | 308 | If only... This doesn't work either. I don't know why. It should work! The [documentation](https://docs.python.org/3/reference/datamodel.html#object.__getattribute__) 309 | of `__getattribute__` says it should work! But it doesn't... 310 | 311 | With 312 | 313 | ```python 314 | from typing import Protocol 315 | 316 | class AProtocol(Protocol): 317 | a_field: str 318 | 319 | def amethod(self) -> None: 320 | raise AttributeError 321 | 322 | 323 | class A: 324 | def __init__(self, a_field: str = "a_field"): 325 | self.a_field = a_field 326 | 327 | def amethod(self): 328 | print(f"Greetings from A with {self.a_field=}") 329 | 330 | 331 | class AWrapper(AProtocol): 332 | def __init__(self, a: A): 333 | self._a = a 334 | 335 | def new_functionality(self): 336 | print("New functionality") 337 | 338 | def __getattr__(self, item): 339 | return getattr(self._a, item) 340 | ``` 341 | 342 | we get that `AWrapper(A()).amethod()` starts to cause an `AttributeError` and 343 | `__getattr__` is still not called. The lookup of `amethod` itself still succeeds, and the error is only raised later, when the method found in the protocol is called, which is too late for the `__getattr__` fallback. 344 | 345 | Well, if `__getattribute__` doesn't want to play along, we will force it! Fortunately, 346 | we can check what methods are inside a class without instantiating it. This leads us 347 | to the following dirty and ugly hack, which however works and which is the main reason 348 | I wrote this article: 349 | 350 | ```python 351 | from typing import Protocol 352 | 353 | class AProtocol(Protocol): 354 | a_field: str 355 | 356 | def amethod(self) -> None: 357 | ... 
358 | 359 | 360 | class A: 361 | def __init__(self, a_field: str = "a_field"): 362 | self.a_field = a_field 363 | 364 | def amethod(self): 365 | print(f"Greetings from A with {self.a_field=}") 366 | 367 | 368 | class AWrapper(AProtocol): 369 | def __init__(self, a: A): 370 | self._a = a 371 | 372 | def new_functionality(self): 373 | print("New functionality") 374 | 375 | def __getattribute__(self, item): 376 | if (hasattr(AProtocol, item) or item in AProtocol.__annotations__) and item not in AWrapper.__dict__: 377 | return getattr(self._a, item) 378 | return super().__getattribute__(item) 379 | ``` 380 | 381 | With this everything works: static type checking, autocompletion, runtime functionality, and 382 | even inheritance: 383 | 384 | ```python 385 | class AWrapperExtension(AWrapper): 386 | def method_in_extension(self) -> None: 387 | super().amethod() 388 | ``` 389 | 390 | no longer leads to the super-weird error (although `super().amethod()` now resolves to the empty stub in `AProtocol`, so use `self.amethod()` if you want the wrapped implementation). 391 | 392 | But yeah, the hack will definitely raise eyebrows... Do try at home but not at work! 393 | --------------------------------------------------------------------------------