├── .github └── workflows │ └── build.yml ├── LICENSE ├── README.md ├── ab └── apache-bench-length-errors.md ├── amplitude └── export-events-to-datasette.md ├── asgi └── lifespan-test-httpx.md ├── auth0 ├── auth0-logout.md └── oauth-with-auth0.md ├── aws ├── athena-key-does-not-exist.md ├── athena-newline-json.md ├── boto-command-line.md ├── helper-for-boto-aws-pagination.md ├── instance-costs-per-month.md ├── ocr-pdf-textract.md ├── recovering-lightsail-data.md ├── s3-cors.md └── s3-triggers-dynamodb.md ├── awslambda └── asgi-mangum.md ├── azure └── all-traffic-to-subdomain.md ├── bash ├── escaping-a-string.md ├── escaping-sql-for-curl-to-datasette.md ├── finding-bom-csv-files-with-ripgrep.md ├── go-script.md ├── ignore-errors.md ├── loop-over-csv.md ├── multiple-servers.md ├── nullglob-in-bash.md ├── skip-csv-rows-with-odd-numbers.md ├── start-test-then-stop-server.md └── use-awk-to-add-a-prefix.md ├── build_database.py ├── caddy └── pause-retry-traffic.md ├── chrome └── headless.md ├── clickhouse ├── github-explorer.md └── github-public-history.md ├── cloudflare ├── cache-control-transform-rule.md ├── cloudflare-cache-html.md ├── domain-redirect-with-pages.md ├── redirect-rules.md ├── redirect-whole-domain.md ├── robots-txt-cloudflare-workers.md └── workers-github-oauth.md ├── cloudrun ├── billing-metrics-explorer.png ├── gcloud-run-services-list.md ├── increase-cloud-scheduler-time-limit.md ├── listing-cloudbuild-files.md ├── multiple-gcloud-accounts.md ├── ship-dockerfile-to-cloud-run.md ├── tailing-cloud-run-request-logs.md ├── use-labels-for-billing-breakdown-1.png ├── use-labels-for-billing-breakdown-2.png ├── use-labels-for-billing-breakdown.md └── using-build-args-with-cloud-run.md ├── cocktails ├── pisco-sour.md ├── tommys-margarita.md └── whisky-sour.md ├── cookiecutter ├── conditionally-creating-directories.md └── pytest-for-cookiecutter.md ├── cooking └── breakfast-tacos.md ├── cosmopolitan └── ecosystem.md ├── css ├── dialog-full-height.md ├── resizing-textarea.md └── simple-two-column-grid.md ├── datasette ├── baseline.md ├── cli-tool-that-is-also-a-plugin.md ├── crawling-datasette-with-datasette.md ├── datasette-on-replit.md ├── hugging-face-spaces.md ├── issues-open-for-less-than-x-seconds.md ├── playwright-tests-datasette-plugin.md ├── plugin-modifies-command.md ├── pytest-httpx-datasette.md ├── reddit-datasette-write.md ├── redirects-for-datasette.md ├── register-new-plugin-hooks.md ├── remember-to-commit.md ├── reuse-click-for-register-commands.md ├── row-selection-prototype.md ├── search-all-columns-trick.md ├── serving-mbtiles.md └── syntax-highlighted-code-examples.md ├── deno ├── annotated-deno-deploy-demo.md ├── deno-kv.md └── pyodide-sandbox.md ├── digitalocean └── datasette-on-digitalocean-app-platform.md ├── discord └── discord-github-issues-bot.md ├── django ├── almost-facet-counts-django-admin.md ├── building-a-blog-in-django.md ├── datasette-django.md ├── django-admin-horizontal-scroll.md ├── efficient-bulk-deletions-in-django.md ├── enabling-gin-index.md ├── export-csv-from-django-admin.md ├── extra-read-only-admin-information.md ├── filter-by-comma-separated-values.md ├── just-with-django.md ├── live-blog.md ├── migration-postgresql-fuzzystrmatch.md ├── migration-using-cte.md ├── migrations-runsql-noop.md ├── postgresql-full-text-search-admin.md ├── pretty-print-json-admin.md ├── pytest-django.md ├── show-timezone-in-django-admin.md └── testing-django-admin-with-pytest.md ├── docker ├── attach-bash-to-running-container.md ├── debian-unstable-packages.md 
├── docker-compose-for-django-development.md ├── docker-for-mac-container-to-postgresql-on-host.md ├── emulate-s390x-with-qemu.md ├── gdb-python-docker.md ├── pipenv-and-docker.md ├── pytest-docker.md └── test-fedora-in-docker.md ├── duckdb ├── parquet-to-json.md ├── parquet.md └── remote-parquet.md ├── electron ├── electrion-auto-update.md ├── electron-debugger-console.md ├── electron-external-links-system-browser.md ├── python-inside-electron.md ├── sign-notarize-electron-macos.md └── testing-electron-playwright.md ├── exif └── orientation-and-location.md ├── firefox ├── search-across-all-resources-2.jpg ├── search-across-all-resources.jpg └── search-across-all-resources.md ├── fly ├── clip-on-fly.md ├── custom-subdomain-fly.md ├── django-sql-dashboard.md ├── fly-docker-registry.md ├── fly-logs-to-s3.md ├── redbean-on-fly.md ├── scp.md ├── undocumented-graphql-api.md ├── varnish-on-fly.md └── wildcard-dns-ssl.md ├── generate_screenshots.py ├── gis ├── gdal-sql.md ├── mapzen-elevation-tiles.md ├── natural-earth-in-spatialite-and-datasette.md └── pmtiles.md ├── git ├── backdate-git-commits.md ├── git-archive.md ├── git-bisect.md ├── git-filter-repo.md ├── remove-commit-and-force-push.md ├── rewrite-repo-remove-secrets.md ├── rewrite-repo-specific-files.md └── size-of-lfs-files.md ├── github-actions ├── attach-generated-file-to-release.md ├── cache-setup-py.md ├── cog.md ├── commit-if-file-changed.md ├── conditionally-run-a-second-job.md ├── continue-on-error.md ├── creating-github-labels.md ├── daily-planner.md ├── debug-tmate.md ├── deploy-live-demo-when-tests-pass.md ├── different-postgresql-versions.md ├── different-steps-on-a-schedule.md ├── dump-context.md ├── ensure-labels.md ├── github-pages.md ├── grep-tests.md ├── job-summaries.md ├── markdown-table-of-contents.md ├── npm-cache-with-npx-no-package.md ├── only-master.md ├── oxipng.md ├── postgresq-service-container.md ├── prettier-github-actions.md ├── python-3-11.md ├── running-tests-against-multiple-verisons-of-dependencies.md ├── s3-bucket-github-actions.md ├── service-containers-docker.md ├── set-environment-for-all-steps.md └── vite-github-pages.md ├── github ├── bulk-edit-github-projects.md ├── bulk-repo-github-graphql.md ├── clone-and-push-gist.md ├── custom-subdomain-github-pages.md ├── dependabot-python-setup.md ├── dependencies-graphql-api.md ├── django-postgresql-codespaces.md ├── github-code-search-api-uses.md ├── github-pages.md ├── graphql-pagination-python.md ├── graphql-search-topics.md ├── migrate-github-wiki.md ├── release-note-assistance.md ├── reporting-bugs.md ├── syntax-highlighting-python-console.md └── transfer-issue-private-to-public.md ├── go └── installing-tools.md ├── google-sheets └── concatenate.md ├── google ├── gmail-compose-url.md └── json-api-programmable-search-engine.md ├── googlecloud ├── gcloud-error-workaround.md ├── google-cloud-spend-datasette.md ├── google-oauth-cli-application-oauth-client-id.png ├── google-oauth-cli-application.md ├── gsutil-bucket.md ├── recursive-fetch-google-drive.md └── video-frame-ocr.md ├── gpt3 ├── chatgpt-api.md ├── chatgpt-applescript.md ├── gpt4-api-design.md ├── guessing-amazon-urls.md ├── jq.md ├── open-api.md ├── openai-python-functions-data-extraction.md ├── picking-python-project-name-chatgpt.md ├── python-chatgpt-streaming-api.md ├── reformatting-text-with-copilot.md └── writing-test-with-copilot.md ├── graphql ├── get-graphql-schema.md ├── graphql-fragments.md └── graphql-with-curl.md ├── hacker-news └── recent-comments.md ├── ham-radio └── general.md ├── 
heroku ├── pg-pull.md ├── pg-upgrade.md └── programatic-access-postgresql.md ├── homebrew ├── auto-formulas-github-actions.md ├── homebrew-core-local-git-checkout.md ├── latest-sqlite.md ├── mysql-homebrew.md ├── no-verify-attestations.md ├── packaging-python-cli-for-homebrew.md └── upgrading-python-homebrew-packages.md ├── html ├── datalist.md ├── lazy-loading-images.md ├── scroll-to-text.md ├── video-preload-none.md └── video-with-subtitles.md ├── http └── testing-cors-max-age.md ├── httpx └── openai-log-requests-responses.md ├── hugo └── basic.md ├── ics └── google-calendar-ics-subscribe-link.md ├── imagemagick ├── compress-animated-gif.md └── set-a-gif-to-loop.md ├── ios └── listen-to-page.md ├── javascript ├── copy-button.md ├── copy-rich-text-to-clipboard.md ├── dropdown-menu-with-details-summary.md ├── dynamically-loading-assets.md ├── javascript-date-objects.md ├── javascript-that-responds-to-media-queries.md ├── jest-without-package-json.md ├── jsr-esbuild.md ├── lit-with-skypack.md ├── manipulating-query-params.md ├── minifying-uglify-npx.md ├── openseadragon.md ├── preventing-double-form-submission.md ├── scroll-to-form-if-errors.md ├── tesseract-ocr-javascript.md └── working-around-nodevalue-size-limit.md ├── jinja ├── autoescape-template.md ├── custom-jinja-tags-with-attributes.md └── format-thousands.md ├── jq ├── array-of-array-to-objects.md ├── combined-github-release-notes.md ├── convert-no-decimal-point-latitude-jq.md ├── extracting-objects-recursively.md ├── flatten-nested-json-objects-jq.md ├── git-log-json.md ├── radio-garden-jq.md └── reformatting-airtable-json.md ├── json ├── ijson-stream.md ├── json-pointer.md └── streaming-indented-json-array.md ├── jupyter ├── javascript-in-a-jupyter-notebook.md └── jupyterlab-uv-tool-install.md ├── kubernetes ├── basic-datasette-in-kubernetes.md └── kubectl-proxy.md ├── linux ├── allow-sudo-without-password-specific-command.md ├── basic-strace.md ├── echo-pipe-to-file-su.md └── iconv.md ├── llms ├── bert-ner.md ├── claude-hacker-news-themes.md ├── code-interpreter-expansions.md ├── colbert-ragatouille.md ├── docs-from-tests.md ├── dolly-2.md ├── embed-paragraphs.md ├── larger-context-openai-models-llm.md ├── llama-7b-m2.md ├── llama-cpp-python-grammars.md ├── mlc-chat-redpajama.md ├── nanogpt-shakespeare-m2.md ├── openai-embeddings-related-content.md ├── prompt-gemini.md ├── python-react-pattern.md ├── rg-pipe-llm-trick.md ├── streaming-llm-apis.md └── training-nanogpt-on-my-blog.md ├── machinelearning └── musicgen.md ├── macos ├── 1password-terminal.md ├── apple-photos-large-files.md ├── atuin.md ├── close-terminal-on-ctrl-d.md ├── close-terminal-on-ctrl-d.png ├── downloading-partial-youtube-videos.md ├── edit-ios-home-screen.md ├── external-display-laptop.md ├── find-largest-sqlite.md ├── fixing-compinit-insecure-directories.md ├── fs-usage.md ├── ifuse-iphone.md ├── imovie-slides-and-audio.md ├── impaste.md ├── lsof-macos.md ├── open-files-with-opensnoop.md ├── python-installer-macos.md ├── quick-whisper-youtube.md ├── quicktime-capture-script.md ├── running-docker-on-remote-m1.md ├── shrinking-pngs-with-pngquant-and-oxipng.md ├── sips.md ├── skitch-catalina-1.png ├── skitch-catalina-2.png ├── skitch-catalina.md ├── whisper-cpp.md ├── wildcard-dns-dnsmasq.md └── zsh-pip-install.md ├── markdown ├── converting-to-markdown.gif ├── converting-to-markdown.md ├── github-markdown-api.md └── markdown-extensions-python.md ├── mastodon ├── custom-domain-mastodon.md ├── export-timeline-to-sqlite.md ├── mastodon-bots-github-actions.md └── 
verifying-github-on-mastodon.md ├── mediawiki └── mediawiki-sqlite-macos.md ├── metadata.yaml ├── midjourney └── desktop-backgrounds.md ├── misc ├── hexdump.md └── voice-cloning.md ├── networking ├── ethernet-over-coaxial-cable.md └── http-ipv6.md ├── nginx └── proxy-domain-sockets.md ├── node └── constant-time-compare-strings.md ├── npm ├── annotated-package-json.md ├── npm-publish-github-actions.md ├── prettier-django.md ├── publish-web-component.md ├── self-hosted-quickjs.md └── upgrading-packages.md ├── observable-plot ├── histogram-with-tooltips.md └── wider-tooltip-areas.md ├── observable └── jq-in-observable.md ├── overture-maps └── overture-maps-parquet.md ├── pixelmator └── pixel-editing-favicon.md ├── playwright ├── expect-selector-count.md └── testing-tables.md ├── pluggy └── multiple-hooks-same-file.md ├── plugins ├── redirects.py └── template_vars.py ├── postgresql ├── closest-locations-to-a-point.md ├── constructing-geojson-in-postgresql.md ├── json-extract-path.md ├── read-only-postgresql-user.md ├── show-schema.md ├── unnest-csv.md └── upgrade-postgres-app.md ├── presenting ├── Tipsheet__https___bit_ly_…_and_New_File_and_Zoom.png └── stickies-for-workshop-links.md ├── purpleair └── purple-air-aqi.md ├── pyodide └── cryptography-in-pyodide.md ├── pypi ├── project-links.md ├── project-links.png └── pypi-releases-from-github.md ├── pytest ├── assert-dictionary-subset.md ├── async-fixtures.md ├── coverage-with-context.md ├── mock-httpx.md ├── mocking-boto.md ├── namedtuple-parameterized-tests.md ├── only-run-integration.md ├── playwright-pytest.md ├── pytest-argparse.md ├── pytest-code-coverage.md ├── pytest-httpx-debug.md ├── pytest-mock-calls.md ├── pytest-recording-vcr.md ├── pytest-stripe-signature.md ├── pytest-subprocess.md ├── pytest-uv.md ├── registering-plugins-in-tests.md ├── session-scoped-tmp.md ├── show-files-opened-by-tests.md ├── subprocess-server.md ├── syrupy.md ├── test-click-app-with-streaming-input.md └── treat-warnings-as-errors.md ├── python ├── annotated-dataklasses.md ├── build-official-docs.md ├── calendar-weeks.md ├── call-pip-programatically.md ├── callable.md ├── click-file-encoding.md ├── click-option-names.md ├── codespell.md ├── cog-to-update-help-in-readme.md ├── comparing-version-numbers.md ├── convert-to-utc-without-pytz.md ├── copy-file.md ├── csv-error-column-too-large.md ├── debug-click-with-pdb.md ├── decorators-with-optional-arguments.md ├── fabric-ssh-key.md ├── find-local-variables-in-exception-traceback.md ├── generate-nested-json-summary.md ├── graphlib-topologicalsorter.md ├── gtr-t5-large.md ├── ignore-both-flake8-and-mypy.md ├── init-subclass.md ├── inlining-binary-data.md ├── installing-flash-attention.md ├── installing-upgrading-plugins-with-pipx.md ├── introspect-function-parameters.md ├── io-bufferedreader.md ├── itry.md ├── json-floating-point.md ├── locust.md ├── lxml-m1-mac.md ├── macos-catalina-sort-of-ships-with-python3.md ├── md5-fips.md ├── os-remove-windows.md ├── output-json-array-streaming.md ├── packaging-pyinstaller.md ├── password-hashing-with-pbkdf2.md ├── pdb-interact.md ├── pip-cache.md ├── pip-tools.md ├── pipx-alpha.md ├── platform-specific-dependencies.md ├── pprint-no-sort-dicts.md ├── protocols.md ├── pyobjc-framework-corelocation.md ├── pyproject.md ├── pypy-macos.md ├── quick-testing-pyenv.md ├── rye.md ├── safe-output-json.md ├── setup-py-from-url.md ├── sqlite-in-pyodide.md ├── stdlib-cli-tools.md ├── struct-endianness.md ├── style-yaml-dump.md ├── subprocess-time-limit.md ├── toml.md ├── 
too-many-open-files-psutil.md ├── tracing-every-statement.md ├── tree-sitter.md ├── trying-free-threaded-python.md ├── using-c-include-path-to-install-python-packages.md ├── utc-warning-fix.md ├── uv-cli-apps.md └── yielding-in-asyncio.md ├── quarto └── trying-out-quarto.md ├── readthedocs ├── custom-sphinx-templates.md ├── custom-subdomain.md ├── documentation-seo-canonical.md ├── link-from-latest-to-stable.md ├── pip-install-docs.md ├── readthedocs-search-api.md └── stable-docs.md ├── reddit └── scraping-reddit-json.md ├── requirements.txt ├── script ├── bootstrap ├── build ├── server └── update ├── selenium ├── async-javascript-in-selenium.md └── selenium-python-macos.md ├── service-workers └── intercept-fetch.md ├── shot-scraper ├── axe-core.md ├── readability.md ├── scraping-flourish.md ├── social-media-cards.md └── subset-of-table-columns.md ├── spatialite ├── gunion-to-combine-geometries.md ├── knn.md ├── minimal-spatialite-database-in-python.md └── viewing-geopackage-data-with-spatialite-and-datasette.md ├── sphinx ├── blacken-docs.md ├── literalinclude-with-markers.md ├── sphinx-autodoc.md └── sphinx-ext-extlinks.md ├── sql ├── consecutive-groups.md ├── cumulative-total-over-time.md ├── django-group-permissions-markdown.md ├── finding-dupes-by-name-and-distance.md └── recursive-cte-twitter-threads.md ├── sqlite ├── blob-literals.md ├── build-specific-sqlite-pysqlite-macos.md ├── column-combinations.md ├── compare-before-after-json.md ├── comparing-datasets.md ├── compile-spellfix-osx.md ├── compile-sqlite3-rsync.md ├── compile-sqlite3-ubuntu.md ├── copy-tables-between-databases.md ├── counting-vm-ops.md ├── cr-sqlite-macos.md ├── cte-values.md ├── database-file-size.md ├── enabling-wal-mode.md ├── fixing-column-encoding-with-ftfy-and-sqlite-transform.md ├── floating-point-seconds.md ├── function-list.md ├── geopoly.md ├── import-csv.md ├── json-audit-log.md ├── json-extract-path.md ├── lag-window-function.md ├── ld-preload.md ├── list-all-columns-in-a-database.md ├── multiple-indexes.md ├── now-argument-stability.md ├── null-case.md ├── one-line-csv-operations.md ├── ordered-group-concat.md ├── pragma-function-list.md ├── pysqlite3-on-macos.md ├── python-sqlite-environment.md ├── python-sqlite-memory-to-file.md ├── related-content.md ├── related-rows-single-query.md ├── replicating-rqlite.md ├── simple-recursive-cte.md ├── sort-by-number-of-json-intersections.md ├── splitting-commas-sqlite.md ├── sqlite-aggregate-filter-clauses.md ├── sqlite-extensions-python-macos.md ├── sqlite-tg.md ├── sqlite-triggers.md ├── sqlite-vec.md ├── sqlite-version-macos-python.md ├── sqlite-version-websql-chrome.md ├── steampipe.md ├── subqueries-in-select.md ├── substr-instr.md ├── text-value-is-integer-or-float.md ├── track-timestamped-changes-to-a-table.md ├── triggers.py ├── trying-macos-extensions.md ├── unix-timestamp-milliseconds-sqlite.md ├── utc-items-on-thursday-in-pst.md └── vacum-disk-full.md ├── static └── github-light.css ├── svg └── dynamic-line-chart.md ├── tailscale ├── lock-down-sshd.md └── tailscale-github-actions.md ├── templates ├── index.html ├── pages │ ├── all.html │ ├── tools │ │ ├── annotated-presentations.html │ │ ├── aqi.html │ │ ├── byte-size-converter.html │ │ ├── clipboard.html │ │ ├── render-markdown.html │ │ └── resizing-textarea.html │ ├── {topic}.html │ └── {topic} │ │ └── {slug}.html ├── query-tils-search.html └── til_base.html ├── tesseract └── tesseract-cli.md ├── tiktok └── download-all-videos.md ├── twitter ├── birdwatch-sqlite.md ├── collecting-replies.md ├── 
credentials-twitter-bot.md └── export-edit-twitter-spaces.md ├── typescript └── basic-tsc.md ├── update_readme.py ├── valtown └── scheduled.md ├── vega └── bar-chart-ordering.md ├── vim └── mouse-support-in-vim.md ├── vscode ├── language-specific-indentation-settings.md └── vs-code-regular-expressions.md ├── web-components └── understanding-single-file-web-component.md ├── webassembly ├── compile-to-wasm-llvm-macos.md └── python-in-a-wasm-sandbox.md ├── webauthn └── webauthn-browser-support.md ├── wikipedia └── page-stats-api.md ├── yaml └── yamlfmt.md ├── youtube └── livestreaming.md ├── zeit-now ├── python-asgi-on-now-v2.md └── redirecting-all-paths-on-vercel.md └── zsh ├── argument-heredoc.md └── custom-zsh-prompt.md /amplitude/export-events-to-datasette.md: -------------------------------------------------------------------------------- 1 | # Exporting Amplitude events to SQLite 2 | 3 | [Amplitude](https://amplitude.com/) offers an "Export Data" button in the project settings page. This can export up to 365 days of events (up to 4GB per export), where the export is a zip file containing `*.json.gz` gzipped newline-delimited JSON. 4 | 5 | You can export multiple times, so if you have more than a year of events you can export them by specifying different date ranges. It's OK to overlap these ranges as each event has a uniue `uuid` that can be used to de-duplicate them. 6 | 7 | Here's how to import that into a SQLite database using `sqlite-utils`: 8 | ``` 9 | unzip export # The exported file does not have a .zip extension for some reason 10 | cd DIRECTORY_CREATED_FROM_EXPORT 11 | gzcat *.json.gz | sqlite-utils insert amplitude.db events --nl --alter --pk uuid --ignore - 12 | ``` 13 | Since we are using `--pk uuid` and `--ignore` it's safe to run this against as many exported `*.json.gz` files as you like, including exports that overlap each other. 14 | 15 | Then run `datasette amplitude.db` to browse the results. 16 | -------------------------------------------------------------------------------- /aws/boto-command-line.md: -------------------------------------------------------------------------------- 1 | # Using boto3 from the command line 2 | 3 | I found a useful pattern today for automating more complex AWS processes as pastable command line snippets, using [Boto3](https://aws.amazon.com/sdk-for-python/). 4 | 5 | The trick is to take advantage of the fact that `python3 -c '...'` lets you pass in a multi-line Python string which will be executed directly. 
6 | 7 | I used that to create a new IAM role by running the following: 8 | ```bash 9 | python3 -c ' 10 | import boto3, json 11 | 12 | iam = boto3.client("iam") 13 | create_role_response = iam.create_role( 14 | Description=("Description of my role"), 15 | RoleName="my-new-role", 16 | AssumeRolePolicyDocument=json.dumps( 17 | { 18 | "Version": "2012-10-17", 19 | "Statement": [ 20 | { 21 | "Effect": "Allow", 22 | "Principal": { 23 | "AWS": "arn:aws:iam::462092780466:user/s3.read-write.my-previously-created-user" 24 | }, 25 | "Action": "sts:AssumeRole", 26 | } 27 | ], 28 | } 29 | ), 30 | MaxSessionDuration=12 * 60 * 60, 31 | ) 32 | # Attach AmazonS3FullAccess to it - note that even though we use full access 33 | # on the role itself any time we call sts.assume_role() we attach an additional 34 | # policy to ensure reduced access for the temporary credentials 35 | iam.attach_role_policy( 36 | RoleName="my-new-role", 37 | PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess", 38 | ) 39 | print(create_role_response["Role"]["Arn"]) 40 | ' 41 | ``` 42 | -------------------------------------------------------------------------------- /aws/helper-for-boto-aws-pagination.md: -------------------------------------------------------------------------------- 1 | # Helper function for pagination using AWS boto3 2 | 3 | I noticed that a lot of my boto3 code in [s3-credentials](https://github.com/simonw/s3-credentials) looked like this: 4 | 5 | ```python 6 | paginator = iam.get_paginator("list_user_policies") 7 | for response in paginator.paginate(UserName=username): 8 | for policy_name in response["PolicyNames"]: 9 | print(policy_name) 10 | ``` 11 | This was enough verbosity that I was hesitating on implementing pagination properly for some method calls. 12 | 13 | I came up with this helper function to use instead: 14 | 15 | ```python 16 | def paginate(service, method, list_key, **kwargs): 17 | paginator = service.get_paginator(method) 18 | for response in paginator.paginate(**kwargs): 19 | yield from response[list_key] 20 | ``` 21 | Now the above becomes: 22 | ```python 23 | for policy_name in paginate(iam, "list_user_policies", "PolicyNames", UserName=username): 24 | print(policy_name) 25 | ``` 26 | Here's [the issue](https://github.com/simonw/s3-credentials/issues/63) and the [refactoring commit](https://github.com/simonw/s3-credentials/commit/fc1e06ca3ffa2c73e196cffe741ef4e950204240). 27 | -------------------------------------------------------------------------------- /aws/instance-costs-per-month.md: -------------------------------------------------------------------------------- 1 | # Display EC2 instance costs per month 2 | 3 | The [EC2 pricing page](https://aws.amazon.com/ec2/pricing/on-demand/) shows cost per hour, which is pretty much useless. I want cost per month. The following JavaScript, pasted into the browser developer console, modifies the page to show cost-per-month instead. 
4 | 5 | ```javascript 6 | Array.from( 7 | document.querySelectorAll('td') 8 | ).filter( 9 | el => el.textContent.toLowerCase().includes('per hour') 10 | ).forEach( 11 | el => el.textContent = '$' + (parseFloat( 12 | /\d+\.\d+/.exec(el.textContent)[0] 13 | ) * 24 * 30).toFixed(2) + ' per month' 14 | ) 15 | ``` 16 | -------------------------------------------------------------------------------- /azure/all-traffic-to-subdomain.md: -------------------------------------------------------------------------------- 1 | # Writing an Azure Function that serves all traffic to a subdomain 2 | 3 | [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/) default to serving traffic from a path like `/api/FunctionName` - for example `https://your-subdomain.azurewebsites.net/api/MyFunction`. 4 | 5 | If you want to serve an entire website through a single function (e.g. using [Datasette](https://datasette.io/)) you need that function to we called for any traffic to that subdomain. 6 | 7 | Here's how to do that - to capture all traffic to any path under `https://your-subdomain.azurewebsites.net/`. 8 | 9 | First add the following section to your `host.json` file: 10 | 11 | ``` 12 | "extensions": { 13 | "http": { 14 | "routePrefix": "" 15 | } 16 | } 17 | ``` 18 | Then add `"route": "{*route}"` to the `function.json` file for the function that you would like to serve all traffic. Mine ended up looking like this: 19 | ```json 20 | { 21 | "scriptFile": "__init__.py", 22 | "bindings": [ 23 | { 24 | "authLevel": "Anonymous", 25 | "type": "httpTrigger", 26 | "direction": "in", 27 | "name": "req", 28 | "route": "{*route}", 29 | "methods": [ 30 | "get", 31 | "post" 32 | ] 33 | }, 34 | { 35 | "type": "http", 36 | "direction": "out", 37 | "name": "$return" 38 | } 39 | ] 40 | } 41 | ``` 42 | See https://github.com/simonw/azure-functions-datasette for an example that uses this pattern. 43 | -------------------------------------------------------------------------------- /bash/escaping-a-string.md: -------------------------------------------------------------------------------- 1 | # Escaping strings in Bash using !:q 2 | 3 | TIL this trick, [via Pascal Hirsch](https://twitter.com/phphys/status/1311727268398465029) on Twitter. Enter a line of Bash starting with a `#` comment, then run `!:q` on the next line to see what that would be with proper Bash escaping applied. 4 | 5 | ``` 6 | bash-3.2$ # This string 'has single' "and double" quotes and a $ 7 | bash-3.2$ !:q 8 | '# This string '\''has single'\'' "and double" quotes and a $' 9 | bash: # This string 'has single' "and double" quotes and a $: command not found 10 | ``` 11 | How does this work? [James Coglan explains](https://twitter.com/mountain_ghosts/status/1311767073933099010): 12 | 13 | > The `!` character begins a history expansion; `!string` produces the last command beginning with `string`, and `:q` is a modifier that quotes the result; so I'm guessing this is equivalent to `!string` where `string` is `""`, so it produces the most recent command, just like `!!` does 14 | 15 | A bunch more useful tips in the [thread about this on Hacker News](https://news.ycombinator.com/item?id=24659282). 
16 | -------------------------------------------------------------------------------- /bash/finding-bom-csv-files-with-ripgrep.md: -------------------------------------------------------------------------------- 1 | # Finding CSV files that start with a BOM using ripgrep 2 | 3 | For [sqlite-utils issue 250](https://github.com/simonw/sqlite-utils/issues/250) I needed to locate some test CSV files that start with a UTF-8 BOM. 4 | 5 | Here's how I did that using [ripgrep](https://github.com/BurntSushi/ripgrep): 6 | ``` 7 | $ rg --multiline --encoding none '^(?-u:\xEF\xBB\xBF)' --glob '*.csv' . 8 | ``` 9 | The `--multiline` option means the search spans multiple lines - I only want to match entire files that begin with my search term, so this means that `^` will match the start of the file, not the start of individual lines. 10 | 11 | `--encoding none` runs the search against the raw bytes of the file, disabling ripgrep's default BOM detection. 12 | 13 | `--glob '*.csv'` causes ripgrep to search only CSV files. 14 | 15 | The regular expression itself looks like this: 16 | 17 | ^(?-u:\xEF\xBB\xBF) 18 | 19 | This is [rust regex](https://docs.rs/regex/1.5.4/regex/#syntax) syntax. 20 | 21 | `(?-u:` means "turn OFF the `u` flag for the duration of this block" - the `u` flag, which is on by default, causes the Rust regex engine to interpret input as unicode. So within the rest of that `(...)` block we can use escaped byte sequences. 22 | 23 | Finally, `\xEF\xBB\xBF` is the byte sequence for the UTF-8 BOM itself. 24 | -------------------------------------------------------------------------------- /bash/ignore-errors.md: -------------------------------------------------------------------------------- 1 | # Ignoring errors in a section of a Bash script 2 | 3 | For [simonw/museums#32](https://github.com/simonw/museums/issues/32) I wanted to have certain lines in my Bash script ignore any errors: lines that used `sqlite-utils` to add columns and configure FTS, but that might fail with an error if the column already existed or FTS had already been configured. 4 | 5 | [This tip](https://stackoverflow.com/a/60362732) on StackOverflow lead me to the [following recipe](https://github.com/simonw/museums/blob/d94410440a5c81a5cb3a0f0b886a8cd30941b8a9/build.sh): 6 | 7 | ```bash 8 | #!/bin/bash 9 | set -euo pipefail 10 | 11 | yaml-to-sqlite browse.db museums museums.yaml --pk=id 12 | python annotate_nominatum.py browse.db 13 | python annotate_timestamps.py 14 | # Ignore errors in following block until set -e: 15 | set +e 16 | sqlite-utils add-column browse.db museums country 2>/dev/null 17 | sqlite3 browse.db < set-country.sql 18 | sqlite-utils disable-fts browse.db museums 2>/dev/null 19 | sqlite-utils enable-fts browse.db museums \ 20 | name description country osm_city \ 21 | --tokenize porter --create-triggers 2>/dev/null 22 | set -e 23 | ``` 24 | Everything between the `set +e` and the `set -e` lines can now error without the Bash script itself failing. 25 | 26 | The failing lines were still showing a bunch of Python tracebacks. 
I fixed that by redirecting their standard error output to `/dev/null` like this: 27 | ```bash 28 | sqlite-utils disable-fts browse.db museums 2>/dev/null 29 | ``` 30 | -------------------------------------------------------------------------------- /bash/loop-over-csv.md: -------------------------------------------------------------------------------- 1 | # Looping over comma-separated values in Bash 2 | 3 | Given a file (or a process) that produces comma separated values, here's how to split those into separate variables and use them in a bash script. 4 | 5 | The trick is to set the Bash `IFS` to a delimiter, then use `my_array=($my_string)` to split on that delimiter. 6 | 7 | Create a text file called `data.txt` containing this: 8 | ``` 9 | first,1 10 | second,2 11 | ``` 12 | You can create that by doing: 13 | ```bash 14 | echo 'first,1 15 | second,2' > /tmp/data.txt 16 | ``` 17 | To loop over that file and print each line: 18 | ```bash 19 | for line in $(cat /tmp/data.txt); 20 | do 21 | echo $line 22 | done 23 | ``` 24 | To split each line into two separate variables in the loop, do this: 25 | ```bash 26 | for line in $(cat /tmp/data.txt); 27 | do 28 | IFS=$','; split=($line); unset IFS; 29 | # $split is now a bash array 30 | echo "Column 1: ${split[0]}" 31 | echo "Column 2: ${split[1]}" 32 | done 33 | ``` 34 | Outputs: 35 | ``` 36 | Column 1: first 37 | Column 2: 1 38 | Column 1: second 39 | Column 2: 2 40 | ``` 41 | Here's a script I wrote using this technique for the TIL [Use labels on Cloud Run services for a billing breakdown](https://til.simonwillison.net/til/til/cloudrun_use-labels-for-billing-breakdown.md): 42 | ```bash 43 | #!/bin/bash 44 | for line in $( 45 | gcloud run services list --platform=managed \ 46 | --format="csv(SERVICE,REGION)" \ 47 | --filter "NOT metadata.labels.service:*" \ 48 | | tail -n +2) 49 | do 50 | IFS=$','; service_and_region=($line); unset IFS; 51 | service=${service_and_region[0]} 52 | region=${service_and_region[1]} 53 | echo "service: $service region: $region" 54 | gcloud run services update $service \ 55 | --region=$region --platform=managed \ 56 | --update-labels service=$service 57 | echo 58 | done 59 | ``` 60 | -------------------------------------------------------------------------------- /bash/use-awk-to-add-a-prefix.md: -------------------------------------------------------------------------------- 1 | # Using awk to add a prefix 2 | 3 | I wanted to dynamically run the following command against all files in a directory: 4 | 5 | ```bash 6 | pypi-to-sqlite content.db -f /tmp/pypi-datasette-packages/packages/airtable-export.json \ 7 | -f /tmp/pypi-datasette-packages/packages/csv-diff.json \ 8 | --prefix pypi_ 9 | ``` 10 | 11 | I can't use `/tmp/pypi-datasette-packages/packages/*.json` here because I need each file to be processed using the `-f` option. 12 | 13 | I found a solution using `awk`. 
The `awk` program `'{print "-f "$0}'` adds a prefix to the input, for example: 14 | ``` 15 | % echo "blah" | awk '{print "-f "$0}' 16 | -f blah 17 | ``` 18 | I wanted that trailing backslash too, so I used this: 19 | 20 | ```awk 21 | {print "-f "$0 " \\"} 22 | ``` 23 | Piping to `awk` works, so I combined that with `ls ../*.json` like so: 24 | 25 | ``` 26 | % ls /tmp/pypi-datasette-packages/packages/*.json | awk '{print "-f "$0 " \\"}' 27 | -f /tmp/pypi-datasette-packages/packages/airtable-export.json \ 28 | -f /tmp/pypi-datasette-packages/packages/csv-diff.json \ 29 | -f /tmp/pypi-datasette-packages/packages/csvs-to-sqlite.json \ 30 | ``` 31 | Then I used `eval` to execute the command. The full recipe looks like this: 32 | ```bash 33 | args=$(ls /tmp/pypi-datasette-packages/packages/*.json | awk '{print "-f "$0 " \\"}') 34 | eval "pypi-to-sqlite content.db $args 35 | --prefix pypi_" 36 | ``` 37 | Full details in [datasette.io issue 98](https://github.com/simonw/datasette.io/issues/98). 38 | -------------------------------------------------------------------------------- /cloudflare/cloudflare-cache-html.md: -------------------------------------------------------------------------------- 1 | # How to get Cloudflare to cache HTML 2 | 3 | To my surprise, if you setup a [Cloudflare](https://www.cloudflare.com/) caching proxy in front of a website it won't cache HTML pages by default, even if they are served with `cache-control:` headers. 4 | 5 | This is [documented here](https://developers.cloudflare.com/cache/troubleshooting/customize-caching/): 6 | 7 | > Cloudflare does not cache HTML resources automatically. This prevents us from unintentionally caching pages that often contain dynamic elements. 8 | 9 | I figured out how to get caching to work using a "Cache Rule". Here's the rule I added: 10 | 11 | ![Cloudflare Cache Rule interface. Rule name: Cache everything including HTML. When incoming requests match… hostname contains .datasette.site. Expression preview: (http.host contains ".datasette.site"). Then... Cache Elegibility is set to Eligible for cache. Edge TTL is set to Use cache-control header if present, bypass cache if not.](https://static.simonwillison.net/static/2024/cloudflare-cache-rule.jpg) 12 | 13 | I've told it that for any incoming request with a hostname containing `.datasette.site` (see [background in my weeknotes](https://simonwillison.net/2024/Jan/7/page-caching-and-custom-templates-for-datasette-cloud/)) it should consider that page eligible for caching, and it should respect the `cache-control` header. 14 | 15 | With this configuration in place, my backend can now serve headers that look like this: 16 | 17 | `cache-control: s-maxage=15` 18 | 19 | This will cause Cloudflare to cache the page for 15 seconds. 20 | 21 | I tried to figure out a rule that would serve all requests no matter what they looked like, but the interface would not let me leave the rules blank - so `hostname contains .datasette.site` was the best I could figure out. 22 | -------------------------------------------------------------------------------- /cloudflare/redirect-rules.md: -------------------------------------------------------------------------------- 1 | # Cloudflare redirect rules with dynamic expressions 2 | 3 | I wanted to ensure `https://niche-museums.com/` would redirect to `https://www.niche-museums.com/` - including any path - using Cloudflare. 
4 | 5 | I've [solved this with page rules in the past](https://til.simonwillison.net/cloudflare/redirect-whole-domain), but this time I tried using a "redirect rule" instead. 6 | 7 | Creating a redirect rule that only fires for hits to the `niche-museums.com` (as opposed to `www.niche-museums.com`) hostname was easy. The harder part was figuring out how to assemble the URL. 8 | 9 | I eventually found the clues I needed [in this Cloudflare blog post](https://blog.cloudflare.com/dynamic-redirect-rules). The trick is to assemble a "dynamic" URL redirect using the `concat()` function in the Cloudflare expression language, [described here](https://developers.cloudflare.com/ruleset-engine/rules-language/functions/#transformation-functions). 10 | 11 | concat("https://www.niche-museums.com", http.request.uri) 12 | 13 | Here are the full configuration settings I used for my redirect rule: 14 | 15 | ![Configuration screen for setting up a custom URL redirection rule. Custom filter expression is selected, indicating the rule will only apply to traffic matching the custom expression. When incoming requests match... Field: Hostname, Operator: Equals, Value: niche-museums.com. Expression Preview: (http.host eq "niche-museums.com"). Then... URL redirect, Type: Dynamic, Expression: concat("https://www.niche-museums.com", http.request.uri), Status code: 301 (permanent redirect). Preserve query string: This option is unchecked. Buttons: Cancel, Save as Draft, Deploy.](https://static.simonwillison.net/static/2024/cloudflare-redirect-rule.jpg) 16 | -------------------------------------------------------------------------------- /cloudflare/redirect-whole-domain.md: -------------------------------------------------------------------------------- 1 | # Redirecting a whole domain with Cloudflare 2 | 3 | I had to run this site on `til.simonwillison.org` for 24 hours due to a domain registration mistake I made. 4 | 5 | Once I got `til.simonwillison.net` working again I wanted to permanently redirect the URLs on the temporary site to their equivalent on the correct domain. 6 | 7 | Since I was running the site behind [Cloudflare](https://www.cloudflare.com/), I could get Cloudflare to handle the redirects for me using a Page Rule, which support wildcards for redirects. 8 | 9 | I used these settings: 10 | 11 | - URL: `til.simonwillison.org/*` 12 | - Setting: Forwarding URL 13 | - Status code: 301 (permanent redirect) 14 | - Destination URL: `https://til.simonwillison.net/$1` 15 | 16 | This did the right thing - hits to e.g. https://til.simonwillison.org/cloudflare?a=1 redirect to https://til.simonwillison.net/cloudflare?a=1 17 | 18 | Here's a screenshot of the settings page I used to create the new Page Rule: 19 | 20 | ![Screenshot of the Cloudflare interface. Create a Page Rule for simonwillison.org. If the URL matches: URL (required) til.simonwillison.org/* Then the settings are: Forwarding URL https://til.simonwillison.net/$1 Select status code (required) 301 - Permanent Redirect. 
Save and Deploy Page Rule](https://github.com/simonw/til/assets/9599/6758a865-57fa-4da1-9e41-118f41e1d7b2) 21 | -------------------------------------------------------------------------------- /cloudrun/billing-metrics-explorer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/cloudrun/billing-metrics-explorer.png -------------------------------------------------------------------------------- /cloudrun/multiple-gcloud-accounts.md: -------------------------------------------------------------------------------- 1 | # Switching between gcloud accounts 2 | 3 | I have two different Google Cloud accounts active at the moment. Here's how to list them with `gcloud auth list`: 4 | 5 | ``` 6 | % gcloud auth list 7 | Credentialed Accounts 8 | ACTIVE ACCOUNT 9 | simon@example.com 10 | * me@gmail.com 11 | 12 | To set the active account, run: 13 | $ gcloud config set account `ACCOUNT` 14 | ``` 15 | And to switch between them with `gcloud config set account`: 16 | 17 | ``` 18 | % gcloud config set account me@gmail.com 19 | Updated property [core/account]. 20 | ``` 21 | -------------------------------------------------------------------------------- /cloudrun/use-labels-for-billing-breakdown-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/cloudrun/use-labels-for-billing-breakdown-1.png -------------------------------------------------------------------------------- /cloudrun/use-labels-for-billing-breakdown-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/cloudrun/use-labels-for-billing-breakdown-2.png -------------------------------------------------------------------------------- /cookiecutter/conditionally-creating-directories.md: -------------------------------------------------------------------------------- 1 | # Conditionally creating directories in cookiecutter 2 | 3 | I wanted my [datasette-plugin](https://github.com/simonw/datasette-plugin) cookiecutter template to create empty `static` and `templates` directories if the user replied `y` to the `include_static_directory` and `include_templates_directory` prompts. 4 | 5 | The solution was to add a `hooks/post_gen_project.py` script containing the following: 6 | 7 | ```python 8 | import os 9 | import shutil 10 | 11 | 12 | include_static_directory = bool("{{ cookiecutter.include_static_directory }}") 13 | include_templates_directory = bool("{{ cookiecutter.include_templates_directory }}") 14 | 15 | 16 | if include_static_directory: 17 | os.makedirs( 18 | os.path.join( 19 | os.getcwd(), 20 | "datasette_{{ cookiecutter.underscored }}", 21 | "static", 22 | ) 23 | ) 24 | 25 | 26 | if include_templates_directory: 27 | os.makedirs( 28 | os.path.join( 29 | os.getcwd(), 30 | "datasette_{{ cookiecutter.underscored }}", 31 | "templates", 32 | ) 33 | ) 34 | ``` 35 | 36 | Note that these scripts are run through the cookiecutter Jinja template system, so they can use `{{ }}` Jinja syntax to read cookiecutter inputs. 
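Because the hook file is rendered through Jinja before it runs, the same logic can also be written more compactly. Here is an equivalent sketch of the hook above (same cookiecutter variables, same behavior - a directory is only created when the corresponding prompt produced a non-empty answer):

```python
import os

package_dir = os.path.join(
    os.getcwd(), "datasette_{{ cookiecutter.underscored }}"
)

# bool("") is False, so each directory is only created if its prompt was answered
for wanted, name in (
    (bool("{{ cookiecutter.include_static_directory }}"), "static"),
    (bool("{{ cookiecutter.include_templates_directory }}"), "templates"),
):
    if wanted:
        os.makedirs(os.path.join(package_dir, name))
```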
37 | -------------------------------------------------------------------------------- /datasette/datasette-on-replit.md: -------------------------------------------------------------------------------- 1 | # Running Datasette on Replit 2 | 3 | I figured out how to run Datasette on https://replit.com/ 4 | 5 | The trick is to start a new Python project and then drop the following into the `main.py` file: 6 | 7 | ```python 8 | import uvicorn 9 | from datasette.app import Datasette 10 | 11 | ds = Datasette(memory=True, files=[]) 12 | 13 | 14 | if __name__ == "__main__": 15 | uvicorn.run(ds.app(), host="0.0.0.0", port=8000) 16 | ``` 17 | Replit is smart enough to automatically create a `pyproject.toml` file with `datasette` and `uvicorn` as dependencies. It will also notice that the application is running on port 8000 and set `https://name-of-prject.your-username.repl.co` to proxy to that port. Plus it will restart the server any time it recieves new traffic (and pause it in between groups of requests). 18 | 19 | To serve a database file, download it using `wget` in the Replit console and add it to the `files=[]` argument. I ran this: 20 | 21 | wget https://datasette.io/content.db 22 | 23 | Then changed that first line to: 24 | 25 | ```python 26 | ds = Datasette(files=["content.db"]) 27 | ``` 28 | And restarted the server. 29 | -------------------------------------------------------------------------------- /datasette/redirects-for-datasette.md: -------------------------------------------------------------------------------- 1 | # Redirects for Datasette 2 | 3 | I made some changes to my https://til.simonwillison.net/ site that resulted in cleaner URL designs, so I needed to setup some redirects. I configured the redirects using a one-off Datasette plugin called `redirects.py` which I dropped into the `plugins/` directory for the Datasette instance: 4 | 5 | ```python 6 | from datasette import hookimpl 7 | from datasette.utils.asgi import Response 8 | 9 | 10 | @hookimpl 11 | def register_routes(): 12 | return ( 13 | (r"^/til/til/(?P[^_]+)_(?P[^\.]+)\.md$", lambda request: Response.redirect( 14 | "/{topic}/{slug}".format(**request.url_vars), status=301 15 | )), 16 | ("^/til/feed.atom$", lambda: Response.redirect("/tils/feed.atom", status=301)), 17 | ( 18 | "^/til/search$", 19 | lambda request: Response.redirect( 20 | "/tils/search" 21 | + (("?" + request.query_string) if request.query_string else ""), 22 | status=301, 23 | ), 24 | ), 25 | ) 26 | ``` 27 | -------------------------------------------------------------------------------- /datasette/register-new-plugin-hooks.md: -------------------------------------------------------------------------------- 1 | # Registering new Datasette plugin hooks by defining them in other plugins 2 | 3 | I'm experimenting with a Datasette plugin that itself adds new plugin hooks which other plugins can then interact with. 4 | 5 | It's called [datasette-low-disk-space-hook](https://github.com/simonw/datasette-low-disk-space-hook), and it adds a new plugin hook called `low_disk_space(datasette)`, defined in the [datasette_low_disk_space_hook/hookspecs.py](https://github.com/simonw/datasette-low-disk-space-hook/blob/0.1a0/datasette_low_disk_space_hook/hookspecs.py) module. 
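A hookspec module for a new Datasette hook is essentially a function signature decorated with pluggy's `HookspecMarker`, using the same project name ("datasette") as Datasette's own plugin manager. A rough sketch of what such a module looks like (the real file may differ in its details):

```python
from pluggy import HookspecMarker

hookspec = HookspecMarker("datasette")


@hookspec
def low_disk_space(datasette):
    """The new low_disk_space plugin hook - implementations receive the Datasette instance"""
```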
6 | 7 | The hook is registered by this code in [datasette_low_disk_space_hook/\_\_init\_\_.py](https://github.com/simonw/datasette-low-disk-space-hook/blob/0.1a0/datasette_low_disk_space_hook/__init__.py) 8 | 9 | ```python 10 | from datasette.utils import await_me_maybe 11 | from datasette.plugins import pm 12 | from . import hookspecs 13 | 14 | pm.add_hookspecs(hookspecs) 15 | ``` 16 | This imports the plugin manager directly from Datasette and uses it to add the new hooks. 17 | 18 | I was worried that the `pm.add_hookspects(hookspecs)` line was not guaranteed to be executed if that module had not been imported. 19 | 20 | It turns out that having this `entrpoints=` line in [setup.py](https://github.com/simonw/datasette-low-disk-space-hook/blob/0.1a0/setup.py) is enough to ensure that the module is imported and the `pm.add_hookspecs()` line is executed: 21 | 22 | ```python 23 | from setuptools import setup 24 | 25 | setup( 26 | name="datasette-low-disk-space-hook", 27 | # ... 28 | entry_points={"datasette": ["low_disk_space_hook = datasette_low_disk_space_hook"]}, 29 | # ... 30 | ) 31 | ``` 32 | -------------------------------------------------------------------------------- /datasette/reuse-click-for-register-commands.md: -------------------------------------------------------------------------------- 1 | # Reusing an existing Click tool with register_commands 2 | 3 | The [register_commands](https://docs.datasette.io/en/stable/plugin_hooks.html#register-commands-cli) plugin hook lets you add extra sub-commands to the `datasette` CLI tool. 4 | 5 | I have a lot of existing tools that I'd like to also make available as plugins. I figured out this pattern for my [git-history](https://datasette.io/tools/git-history) tool today: 6 | 7 | ```python 8 | from datasette import hookimpl 9 | from git_history.cli import cli as git_history_cli 10 | 11 | @hookimpl 12 | def register_commands(cli): 13 | cli.add_command(git_history_cli, name="git-history") 14 | ``` 15 | Now I can run the following: 16 | 17 | ``` 18 | % datasette git-history --help 19 | Usage: datasette git-history [OPTIONS] COMMAND [ARGS]... 20 | 21 | Tools for analyzing Git history using SQLite 22 | 23 | Options: 24 | --version Show the version and exit. 25 | --help Show this message and exit. 26 | 27 | Commands: 28 | file Analyze the history of a specific file and write it to SQLite 29 | ``` 30 | 31 | I initially tried doing this: 32 | 33 | ```python 34 | @hookimpl 35 | def register_commands(cli): 36 | cli.command(name="git-history")(git_history_file) 37 | ``` 38 | But got the following error: 39 | 40 | TypeError: Attempted to convert a callback into a command twice. 41 | 42 | Using [cli.add_command()](https://click.palletsprojects.com/en/8.0.x/api/?highlight=add_command#click.Group.add_command) turns out to be the right way to do this. 43 | 44 | Research issue for this: [datasette#1538](https://github.com/simonw/datasette/issues/1538). 45 | -------------------------------------------------------------------------------- /datasette/serving-mbtiles.md: -------------------------------------------------------------------------------- 1 | # Serving MBTiles with datasette-media 2 | 3 | The [MBTiles](https://github.com/mapbox/mbtiles-spec) format uses SQLite to bundle map tiles for use with libraries such as Leaflet. 
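Because an MBTiles file is just a SQLite database, you can inspect one with Python's `sqlite3` module. A quick sketch (assuming the `San_Francisco.mbtiles` file linked below has been downloaded to the working directory) that lists a few rows from the `tiles` table queried later in this note:

```python
import sqlite3

conn = sqlite3.connect("San_Francisco.mbtiles")
# Tiles are stored in a "tiles" table keyed on zoom_level, tile_column and
# tile_row, with the image bytes in the tile_data column
for zoom_level, tile_column, tile_row, num_bytes in conn.execute(
    "select zoom_level, tile_column, tile_row, length(tile_data) from tiles limit 5"
):
    print(zoom_level, tile_column, tile_row, num_bytes, "bytes")
```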
4 | 5 | I figured out how to use the [datasette-media](https://datasette.io/plugins/datasette-media) to serve tiles from this MBTiles file containing two zoom levels of tiles for San Francisco: https://static.simonwillison.net/static/2021/San_Francisco.mbtiles 6 | 7 | This TIL is now entirely obsolete: I used this prototype to build the new [datasette-tiles](https://datasette.io/plugins/datasette-tiles) plugin. 8 | 9 | ```yaml 10 | plugins: 11 | datasette-cluster-map: 12 | tile_layer: "/-/media/tiles/{z},{x},{y}" 13 | tile_layer_options: 14 | attribution: "© OpenStreetMap contributors" 15 | tms: 1 16 | bounds: [[37.61746256103807, -122.57290320721465],[37.85395101481279, -122.27695899334748]] 17 | minZoom: 15 18 | maxZoom: 16 19 | datasette-media: 20 | tiles: 21 | database: San_Francisco 22 | sql: 23 | with comma_locations as ( 24 | select instr(:key, ',') as first_comma, 25 | instr(:key, ',') + instr(substr(:key, instr(:key, ',') + 1), ',') as second_comma 26 | ), 27 | variables as ( 28 | select 29 | substr(:key, 0, first_comma) as z, 30 | substr(:key, first_comma + 1, second_comma - first_comma - 1) as x, 31 | substr(:key, second_comma + 1) as y 32 | from comma_locations 33 | ) 34 | select 35 | tile_data as content, 36 | 'image/png' as content_type 37 | from 38 | tiles, variables 39 | where 40 | zoom_level = variables.z 41 | and tile_column = variables.x 42 | and tile_row = variables.y 43 | ``` 44 | -------------------------------------------------------------------------------- /django/efficient-bulk-deletions-in-django.md: -------------------------------------------------------------------------------- 1 | # Efficient bulk deletions in Django 2 | 3 | I needed to bulk-delete a large number of objects today. Django deletions are relatively inefficient by default, because Django implements its own version of cascading deletions and fires signals for each deleted object. 4 | 5 | I knew that I wanted to avoid both of these and run a bulk `DELETE` SQL operation. 6 | 7 | Django has an undocumented `queryset._raw_delete(db_connection)` method that can do this: 8 | 9 | ```python 10 | reports_qs = Report.objects.filter(public_id__in=report_ids) 11 | reports_qs._raw_delete(reports_qs.db) 12 | ``` 13 | But this failed for me, because my `Report` object has a many-to-many relationship with another table - and those records were not deleted. 14 | 15 | I could have hand-crafted a PostgreSQL cascading delete here, but I instead decided to manually delete those many-to-many records first. Here's what that looked like: 16 | 17 | ```python 18 | report_availability_tag_qs = ( 19 | Report.availability_tags.through.objects.filter( 20 | report__public_id__in=report_ids 21 | ) 22 | ) 23 | report_availability_tag_qs._raw_delete(report_availability_tag_qs.db) 24 | ``` 25 | This didn't quite work either, because I have another model `Location` with foreign key references to those reports. So I added this: 26 | ```python 27 | Location.objects.filter(latest_report__public_id__in=report_ids).update( 28 | latest_report=None 29 | ) 30 | ``` 31 | That combination worked! The Django debug toolbar confirmed that this executed one `UPDATE` followed by two efficient bulk `DELETE` operations. 
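Putting the pieces together, the order that ended up working was: null out the foreign key references, bulk delete the many-to-many rows, then raw-delete the reports themselves. As a single sketch, using the same models and `report_ids` list as above:

```python
# 1. Clear foreign key references from Location (one UPDATE)
Location.objects.filter(latest_report__public_id__in=report_ids).update(
    latest_report=None
)

# 2. Bulk delete the many-to-many join rows (one DELETE)
m2m_qs = Report.availability_tags.through.objects.filter(
    report__public_id__in=report_ids
)
m2m_qs._raw_delete(m2m_qs.db)

# 3. Bulk delete the reports themselves (one DELETE)
reports_qs = Report.objects.filter(public_id__in=report_ids)
reports_qs._raw_delete(reports_qs.db)
```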
32 | -------------------------------------------------------------------------------- /django/extra-read-only-admin-information.md: -------------------------------------------------------------------------------- 1 | # Adding extra read-only information to a Django admin change page 2 | 3 | I figured out this pattern today for adding templated extra blocks of information to the Django admin change page for an object. 4 | 5 | It's really simply and incredibly useful. I can see myself using this a lot in the future. 6 | 7 | ```python 8 | from django.contrib import admin 9 | from django.template.loader import render_to_string 10 | from django.utils.safestring import mark_safe 11 | from .models import Reporter 12 | 13 | 14 | @admin.register(Reporter) 15 | class ReporterAdmin(admin.ModelAdmin): 16 | # ... 17 | readonly_fields = ("recent_calls",) 18 | 19 | def recent_calls(self, instance): 20 | return mark_safe( 21 | render_to_string( 22 | "admin/_reporter_recent_calls.html", 23 | { 24 | "reporter": instance, 25 | "recent_calls": instance.call_reports.order_by("-created_at")[:20], 26 | "call_count": instance.call_reports.count(), 27 | }, 28 | ) 29 | ) 30 | ``` 31 | 32 | That's it! `recent_calls` is marked as a read-only field, then implemented as a method which returns HTML. That method passes the instance to a template using `render_to_string`. That template looks like this: 33 | 34 | ```html+jinja 35 |

<p>{{ reporter }} has made {{ call_count }} call{{ call_count|pluralize }}</p> 36 | 37 | <p>Recent calls (view all)</p> 38 | 39 | {% for call in recent_calls %} 40 | <p>{{ call.location }} on {{ call.created_at }}</p>
41 | {% endfor %} 42 | ``` 43 | -------------------------------------------------------------------------------- /django/migration-postgresql-fuzzystrmatch.md: -------------------------------------------------------------------------------- 1 | # Enabling the fuzzystrmatch extension in PostgreSQL with a Django migration 2 | 3 | The PostgreSQL [fuzzystrmatch extension](https://www.postgresql.org/docs/13/fuzzystrmatch.html) enables several functions for fuzzy string matching: `soundex()`, `difference()`, `levenshtein()`, `levenshtein_less_equal()`, `metaphone()`, `dmetaphone()` and `dmetaphone_alt()`. 4 | 5 | Enabling them for use with Django turns out to be really easy - it just takes a migration that looks something like this: 6 | 7 | ```python 8 | from django.contrib.postgres.operations import CreateExtension 9 | from django.db import migrations 10 | 11 | 12 | class Migration(migrations.Migration): 13 | 14 | dependencies = [ 15 | ("core", "0089_importrun_sourcelocation"), 16 | ] 17 | 18 | operations = [ 19 | CreateExtension(name="fuzzystrmatch"), 20 | ] 21 | ``` 22 | -------------------------------------------------------------------------------- /django/migrations-runsql-noop.md: -------------------------------------------------------------------------------- 1 | # migrations.RunSQL.noop for reversible SQL migrations 2 | 3 | `migrations.RunSQL.noop` provides an easy way to create "reversible" Django SQL migrations, where the reverse operation does nothing (but keeps it possible to reverse back to a previous migration state without being blocked by an irreversible migration). 4 | 5 | ```python 6 | from django.db import migrations 7 | 8 | 9 | class Migration(migrations.Migration): 10 | 11 | dependencies = [ 12 | ("app", "0114_last_migration"), 13 | ] 14 | 15 | operations = [ 16 | migrations.RunSQL( 17 | sql=""" 18 | update concordance_identifier 19 | set authority = replace(authority, ':', '_') 20 | where authority like '%:%' 21 | """, 22 | reverse_sql=migrations.RunSQL.noop, 23 | ) 24 | ] 25 | ``` 26 | -------------------------------------------------------------------------------- /django/postgresql-full-text-search-admin.md: -------------------------------------------------------------------------------- 1 | # PostgreSQL full-text search in the Django Admin 2 | 3 | Django 3.1 introduces PostgreSQL `search_type="websearch"` - which gives you search with advanced operators like `"phrase search" -excluding`. James Turk [wrote about this here](https://jamesturk.net/posts/websearch-in-django-31/), and it's also in [my weeknotes](https://simonwillison.net/2020/Jul/23/datasette-copyable-datasette-insert-api/). 4 | 5 | I decided to add it to my Django Admin interface. It was _really easy_ using the `get_search_results()` model admin method, [documented here](https://docs.djangoproject.com/en/3.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.get_search_results). 6 | 7 | My models already have a `search_document` full-text search column, as described in [Implementing faceted search with Django and PostgreSQL](https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/). 
So all I needed to add to my `ModelAdmin` subclasses was this: 8 | 9 | ```python 10 | def get_search_results(self, request, queryset, search_term): 11 | if not search_term: 12 | return super().get_search_results( 13 | request, queryset, search_term 14 | ) 15 | query = SearchQuery(search_term, search_type="websearch") 16 | rank = SearchRank(F("search_document"), query) 17 | queryset = ( 18 | queryset 19 | .annotate(rank=rank) 20 | .filter(search_document=query) 21 | .order_by("-rank") 22 | ) 23 | return queryset, False 24 | ``` 25 | Here's [the full implementation](https://github.com/simonw/simonwillisonblog/blob/6c0de9f9976ef831fe92106be662d77dfe80b32a/blog/admin.py) for my personal blog. 26 | -------------------------------------------------------------------------------- /django/pretty-print-json-admin.md: -------------------------------------------------------------------------------- 1 | # Pretty-printing all read-only JSON in the Django admin 2 | 3 | I have a bunch of models with JSON fields that are marked as read-only in the Django admin - usually because they're recording the raw JSON that was imported from an API somewhere to create an object, for debugging purposes. 4 | 5 | Here's a pattern I found for pretty-printing ANY JSON value that is displayed in a read-only field in the admin. Create a template called `admin/change_form.html` and populate it with the following: 6 | 7 | ```html+django 8 | {% extends "admin/change_form.html" %} 9 | {% block admin_change_form_document_ready %} 10 | {{ block.super }} 11 | 26 | {% endblock %} 27 | ``` 28 | This JavaScript will execute on every Django change form page, scanning for `div.readonly`, checking to see if the div contains a valid JSON value and pretty-printing it using JavaScript if it does. 29 | 30 | It's a cheap hack and it works great. 31 | -------------------------------------------------------------------------------- /docker/attach-bash-to-running-container.md: -------------------------------------------------------------------------------- 1 | # Attaching a bash shell to a running Docker container 2 | 3 | Use `docker ps` to find the container ID: 4 | 5 | $ docker ps 6 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7 | 81b2ad3194cb alexdebrie/livegrep-base:1 "/livegrep-github-re…" 2 minutes ago Up 2 minutes compassionate_yalow 8 | 9 | Run `docker exec -it ID bash` to start a bash session in that container: 10 | 11 | $ docker exec -it 81b2ad3194cb bash 12 | 13 | I made the mistake of using `docker attach 81b2ad3194cb` first, which attaches you to the command running as CMD in that conatiner, and means that if you hit `Ctrl+C` you exit that command and terminate the container! 14 | -------------------------------------------------------------------------------- /docker/gdb-python-docker.md: -------------------------------------------------------------------------------- 1 | # Running gdb against a Python process in a running Docker container 2 | 3 | While investigating [Datasette issue #1268](https://github.com/simonw/datasette/issues/1268) I found myself with a Python process that was hanging, and I decided to try running `gdb` against it based on tips in [Debugging of CPython processes with gdb](https://www.podoliaka.org/2016/04/10/debugging-cpython-gdb/) 4 | 5 | Here's the recipe that worked: 6 | 7 | 1. Find the Docker container ID using `docker ps` - in my case it was `16197781a7b5` 8 | 2. 
Attach a new bash shell to that process in privileged mode (needed to get `gdb` to work): `docker exec --privileged -it 16197781a7b5 bash` 9 | 3. Install `gdb` and the Python tooling for using it: `apt-get install gdb python3-dbg` 10 | 4. Use `top` to find the pid of the running Python process that was hanging. It was `20` for me. 11 | 5. Run `gdb /usr/bin/python3 -p 20` to launch `gdb` against that process 12 | 6. In the `(gdb)` prompt run `py-bt` to see a backtrace. 13 | 14 | I'm sure there's lots more that can be done in `gdb` at this point, but that's how I got to a place where I could interact with the Python process that was running in the Docker container. 15 | -------------------------------------------------------------------------------- /docker/test-fedora-in-docker.md: -------------------------------------------------------------------------------- 1 | # Testing things in Fedora using Docker 2 | 3 | I got [a report](https://twitter.com/peterjanes/status/1552407491819884544) of a bug with my [s3-ocr tool](https://simonwillison.net/2022/Jun/30/s3-ocr/) running on Fedora. 4 | 5 | I attempted to replicate the bug in a Fedora container using Docker, by running this command: 6 | 7 | ``` 8 | docker run -it fedora:latest /bin/bash 9 | ``` 10 | This downloaded [the official image](https://hub.docker.com/_/fedora) and dropped me into a Bash shell. 11 | 12 | It turns out Fedora won't let you run `pip install` with its default Python 3 without first creating a virtual environment: 13 | 14 | ``` 15 | [root@d1146e0061d1 /]# python3 -m pip install s3-ocr 16 | /usr/bin/python3: No module named pip 17 | [root@d1146e0061d1 /]# python3 -m venv project_venv 18 | [root@d1146e0061d1 /]# source project_venv/bin/activate 19 | (project_venv) [root@d1146e0061d1 /]# python -m pip install s3-ocr 20 | Collecting s3-ocr 21 | Downloading s3_ocr-0.5-py3-none-any.whl (14 kB) 22 | Collecting sqlite-utils 23 | ... 24 | ``` 25 | Having done that I could test out my `s3-ocr` command like so: 26 | 27 | ``` 28 | (project_venv) [root@d1146e0061d1 /]# s3-ocr start --help 29 | Usage: s3-ocr start [OPTIONS] BUCKET [KEYS]... 30 | 31 | Start OCR tasks for PDF files in an S3 bucket 32 | 33 | s3-ocr start name-of-bucket path/to/one.pdf path/to/two.pdf 34 | ... 35 | ``` 36 | -------------------------------------------------------------------------------- /electron/electron-debugger-console.md: -------------------------------------------------------------------------------- 1 | # Using the Chrome DevTools console as a REPL for an Electron app 2 | 3 | I figured out how to use the Chrome DevTools to execute JavaScript interactively inside the Electron main process. I always like having a REPL for exploring APIs, and this means I can explore the Electron and Node.js APIs interactively. 4 | 5 | Simon_Willison’s_Weblog_and_DevTools_-_Node_js_and_Inspect_with_Chrome_Developer_Tools 6 | 7 | https://www.electronjs.org/docs/tutorial/debugging-main-process#--inspectport says you need to run: 8 | 9 | electron --inspect=5858 your/app 10 | 11 | I start Electron by running `npm start`, so I modified my `package.json` to include this: 12 | 13 | ```json 14 | "scripts": { 15 | "start": "electron --inspect=5858 ." 16 | ``` 17 | Then I ran `npm start`. 18 | 19 | To connect the debugger, open Google Chrome and visit `chrome://inspect/` - then click the "Open dedicated DevTools for Node" link. 
20 | 21 | In that window, select the "Connection" tab and add a connection to `localhost:5858`: 22 | 23 | Screenshot of the DevTools Connection tab 24 | 25 | Switch back to the "Console" tab and you can start interacting with the Electron environment. 26 | 27 | I tried this and it worked: 28 | 29 | ```javascript 30 | const { app, Menu, BrowserWindow, dialog } = require("electron"); 31 | new BrowserWindow({height: 100, width: 100}).loadURL("https://simonwillison.net/"); 32 | ``` 33 | -------------------------------------------------------------------------------- /electron/electron-external-links-system-browser.md: -------------------------------------------------------------------------------- 1 | # Open external links in an Electron app using the system browser 2 | 3 | For [Datasette.app](https://github.com/simonw/datasette-app) I wanted to ensure that links to external URLs would [open in the system browser](https://github.com/simonw/datasette-app/issues/34). 4 | 5 | This recipe works: 6 | 7 | ```javascript 8 | function postConfigure(window) { 9 | window.webContents.on("will-navigate", function (event, reqUrl) { 10 | let requestedHost = new URL(reqUrl).host; 11 | let currentHost = new URL(window.webContents.getURL()).host; 12 | if (requestedHost && requestedHost != currentHost) { 13 | event.preventDefault(); 14 | shell.openExternal(reqUrl); 15 | } 16 | }); 17 | } 18 | ``` 19 | The `will-navigate` event fires before any in-browser navigations, which means they can be intercepted and cancelled if necessary. 20 | 21 | I use the `URL()` class to extract the `.host` so I can check if the host being navigated to differs from the host that the application is running against (which is probably `localhost:$port`). 22 | 23 | Initially I was using `require('url').URL` for this but that doesn't appear to be necessary - Node.js ships with `URL` as a top-level class these days. 24 | 25 | `event.preventDefault()` cancels the navigation and `shell.openExternal(reqUrl)` opens the URL using the system default browser. 26 | 27 | I call this function on any new window I create using `new BrowserWindow` - for example: 28 | 29 | ```javascript 30 | mainWindow = new BrowserWindow({ 31 | width: 800, 32 | height: 600, 33 | show: false, 34 | }); 35 | mainWindow.loadFile("loading.html"); 36 | mainWindow.once("ready-to-show", () => { 37 | mainWindow.show(); 38 | }); 39 | postConfigure(mainWindow); 40 | ``` 41 | 42 | -------------------------------------------------------------------------------- /firefox/search-across-all-resources-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/firefox/search-across-all-resources-2.jpg -------------------------------------------------------------------------------- /firefox/search-across-all-resources.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/firefox/search-across-all-resources.jpg -------------------------------------------------------------------------------- /firefox/search-across-all-resources.md: -------------------------------------------------------------------------------- 1 | # Search across all loaded resources in Firefox 2 | 3 | You can search for a string in any resource loaded by a page (including across HTML, JavaScript and CSS) in the Debugger pane by hitting Command+Shift+F.
4 | 5 | Screenshot of search interface 6 | 7 | This view doesn't search the body of any JSON assets that were fetched by code, presumably because JSON isn't automatically loaded into memory by the browser. 8 | 9 | But ([thanks, @digitarald](https://twitter.com/digitarald/status/1257748744352567296)) the Network pane DOES let you search for content in assets fetched via Ajax/fetch() etc - though you do have to run the search before you trigger the requests that the search should cover. Again, the shortcut is Command+Shift+F. 10 | 11 | Screenshot of search interface 12 | -------------------------------------------------------------------------------- /fly/scp.md: -------------------------------------------------------------------------------- 1 | # How to scp files to and from Fly 2 | 3 | I have a Fly instance with a 20GB volume, and I wanted to copy files to and from the instance from my computer using `scp`. 4 | 5 | Here's the process that worked for me. 6 | 7 | 1. Connect to Fly's WireGuard network. Fly have [step by step instructions](https://fly.io/docs/reference/private-networking/#step-by-step) for this - you need to install a WireGuard app (I used the [official WireGuard macOS app](https://www.wireguard.com/install/)) and use the `fly wireguard create` command to configure it. 8 | 2. Generate 24 hour limited SSH credentials for your Fly organization: Run `fly ssh issue`, follow the prompt to select your organization and then tell it where to put the credentials. I saved them to `/tmp/fly` since they will only work for 24 hours. 9 | 3. Find the IPv6 private address for the instance you want to connect to. My instance is in the `laion-aesthetic` application so I did this by running: `fly ips private -a laion-aesthetic` 10 | 4. If the image you used to build the instance doesn't have `scp` installed you'll need to install it. On Ubuntu or Debian machines you can do that by attaching using `fly ssh console -a name-of-app` and then running `apt-get update && apt-get install openssh-client -y`. Any time you restart the container you'll have to run this step again, so if you're going to do it often you should instead update the image you are using to include this package. 11 | 5. Run the `scp` like this: `scp -i /tmp/fly root@\[fdaa:0:4ef:a7b:ad0:1:9c23:2\]:/data/data.db /tmp` - note how the IPv6 address is enclosed in `\[...\]`. 12 | -------------------------------------------------------------------------------- /gis/mapzen-elevation-tiles.md: -------------------------------------------------------------------------------- 1 | # Downloading MapZen elevation tiles 2 | 3 | [Via Tony Hirst](https://twitter.com/psychemedia/status/1357280624319553537) I found out about [MapZen's elevation tiles](https://www.mapzen.com/blog/terrain-tile-service/), which encode elevation data in PNG and other formats.
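As an aside, the terrarium tiles pack elevation into the red, green and blue channels - the documented encoding is `elevation = (red * 256 + green + blue / 256) - 32768`. Here's a rough Python sketch (using Pillow, with a made-up local tile filename) of decoding one tile back into metres:

```python
from PIL import Image  # pip install pillow


def decode_terrarium(path):
    """Return a 2D list of elevations (in metres) for a terrarium PNG tile."""
    img = Image.open(path).convert("RGB")
    width, height = img.size
    pixels = img.load()
    return [
        [
            (pixels[x, y][0] * 256 + pixels[x, y][1] + pixels[x, y][2] / 256) - 32768
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical usage against a tile saved from the terrarium endpoint:
# elevations = decode_terrarium("terrarium-4-2-5.png")
```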
4 | 5 | These days they live at https://registry.opendata.aws/terrain-tiles/ 6 | 7 | I managed to download a subset of them using [download-tiles](https://datasette.io/tools/download-tiles) like so: 8 | 9 | ``` 10 | download-tiles elevation.mbtiles -z 0-4 \ 11 | --tiles-url='https://s3.amazonaws.com/elevation-tiles-prod/terrarium/{z}/{x}/{y}.png' 12 | ``` 13 | I'm worried I may have got the x and y the wrong way round though, see comments on https://github.com/simonw/datasette-tiles/issues/15 14 | -------------------------------------------------------------------------------- /git/git-archive.md: -------------------------------------------------------------------------------- 1 | # How to create a tarball of a git repository using "git archive" 2 | 3 | I figured this out in [a Gist in 2016](https://gist.github.com/simonw/a44af92b4b255981161eacc304417368) which has attracted a bunch of comments over the years. Now I'm upgrading it to a retroactive TIL. 4 | 5 | Run this in the repository folder: 6 | 7 | git archive --format=tar.gz -o /tmp/my-repo.tar.gz --prefix=my-repo/ main 8 | 9 | This will write out a file to `/tmp/my-repo.tar.gz`. 10 | 11 | When you `tar -xzvf my-repo.tar.gz` that file it will output a `my-repo/` directory with just the files - not the `.git` folder - from your repository. 12 | 13 | You can use a commit hash or tag or branch name instead of `main` to create an archive of a different point in that repository. 14 | 15 | Without the `--prefix` option you'll get a `.tar.gz` file which, when decompressed, writes a bunch of stuff to your current directory. This usually isn't what you want! 16 | 17 | Here's a version that picks up the name of the directory you run it in: 18 | 19 | git archive --format=tar.gz -o $(basename $PWD).tar.gz --prefix=$(basename $PWD)/ main 20 | 21 | Note the trailing `/` on `--prefix` - without this you'll get folders called things like `datasettetests`. 22 | 23 | `basename $PWD` gives you the name of your current folder. 24 | -------------------------------------------------------------------------------- /git/remove-commit-and-force-push.md: -------------------------------------------------------------------------------- 1 | # Removing a git commit and force pushing to remove it from history 2 | 3 | I accidentally triggered a commit which added a big chunk of unwanted data to my repository. I didn't want this to stick around in the history forever, and no-one else was pulling from the repo, so I decided to use force push to remove the rogue commit entirely. 4 | 5 | I figured out the commit hash of the previous version that I wanted to restore and ran: 6 | 7 | git reset --hard 1909f93 8 | 9 | Then I ran the force push like this: 10 | 11 | git push --force origin main 12 | 13 | See https://github.com/simonw/sf-tree-history/issues/1 14 | -------------------------------------------------------------------------------- /github-actions/continue-on-error.md: -------------------------------------------------------------------------------- 1 | # Skipping a GitHub Actions step without failing 2 | 3 | I wanted to have a GitHub Action step run that might fail, but if it failed the rest of the steps should still execute and the overall run should be treated as a success.
4 | 5 | `continue-on-error: true` does exactly that: 6 | 7 | ```yaml 8 | - name: Download previous database 9 | run: curl --fail -o tils.db https://til.simonwillison.net/tils.db 10 | continue-on-error: true 11 | - name: Build database 12 | run: python build_database.py 13 | ``` 14 | 15 | [From this workflow](https://github.com/simonw/til/blob/7d799a24921f66e585b8a6b8756b7f8040c899df/.github/workflows/build.yml#L32-L36) 16 | 17 | I'm using `curl --fail` here which returns an error code if the file download fails (without `--fail` it was writing out a two line error message to a file called `tils.db` which is not what I wanted). Then `continue-on-error: true` to keep on going even if the download failed. 18 | 19 | My `build_database.py` script updates the `tils.db` database file if it exists and creates it from scratch if it doesn't. 20 | -------------------------------------------------------------------------------- /github-actions/different-steps-on-a-schedule.md: -------------------------------------------------------------------------------- 1 | # Running different steps on a schedule 2 | 3 | Say you have a workflow that runs hourly, but once a day you want the workflow to run slightly differently - without duplicating the entire workflow. 4 | 5 | Thanks to @BrightRan, here's [the solution](https://github.community/t5/GitHub-Actions/Schedule-once-an-hour-but-do-something-different-once-a-day/m-p/54382/highlight/true#M9168). Use the following pattern in an `if:` condition for a step: 6 | 7 | github.event_name == 'schedule' && github.event.schedule == '20 17 * * *' 8 | 9 | Longer example: 10 | 11 | ```yaml 12 | name: Fetch updated data and deploy 13 | 14 | on: 15 | push: 16 | schedule: 17 | - cron: '5,35 * * * *' 18 | - cron: '20 17 * * *' 19 | 20 | jobs: 21 | build_and_deploy: 22 | runs-on: ubuntu-latest 23 | steps: 24 | # ... 25 | - name: Download existing .db files 26 | if: |- 27 | !(github.event_name == 'schedule' && github.event.schedule == '20 17 * * *') 28 | env: 29 | DATASETTE_TOKEN: ${{ secrets.DATASETTE_TOKEN }} 30 | run: |- 31 | datasette-clone https://biglocal.datasettes.com/ dbs -v --token=$DATASETTE_TOKEN 32 | ``` 33 | I used this [here](https://github.com/simonw/big-local-datasette/blob/35e1acd4d9859d3af2feb29d0744ce1550e5faec/.github/workflows/deploy.yml), see [#11](https://github.com/simonw/big-local-datasette/issues/11). 34 | -------------------------------------------------------------------------------- /github-actions/dump-context.md: -------------------------------------------------------------------------------- 1 | # Dump out all GitHub Actions context 2 | 3 | Useful for seeing what's available for `if: ` conditions (see [context and expression syntax](https://help.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions)). 4 | 5 | I copied this example action [from here](https://help.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#example-printing-context-information-to-the-log-file) and deployed it [here](https://github.com/simonw/playing-with-actions/blob/master/.github/workflows/dump-context.yml). Here's an [example run](https://github.com/simonw/playing-with-actions/runs/599575180?check_suite_focus=true).
6 | 7 | ```yaml 8 | on: push 9 | 10 | jobs: 11 | one: 12 | runs-on: ubuntu-16.04 13 | steps: 14 | - name: Dump GitHub context 15 | env: 16 | GITHUB_CONTEXT: ${{ toJson(github) }} 17 | run: echo "$GITHUB_CONTEXT" 18 | - name: Dump job context 19 | env: 20 | JOB_CONTEXT: ${{ toJson(job) }} 21 | run: echo "$JOB_CONTEXT" 22 | - name: Dump steps context 23 | env: 24 | STEPS_CONTEXT: ${{ toJson(steps) }} 25 | run: echo "$STEPS_CONTEXT" 26 | - name: Dump runner context 27 | env: 28 | RUNNER_CONTEXT: ${{ toJson(runner) }} 29 | run: echo "$RUNNER_CONTEXT" 30 | - name: Dump strategy context 31 | env: 32 | STRATEGY_CONTEXT: ${{ toJson(strategy) }} 33 | run: echo "$STRATEGY_CONTEXT" 34 | - name: Dump matrix context 35 | env: 36 | MATRIX_CONTEXT: ${{ toJson(matrix) }} 37 | run: echo "$MATRIX_CONTEXT" 38 | ``` 39 | -------------------------------------------------------------------------------- /github-actions/ensure-labels.md: -------------------------------------------------------------------------------- 1 | # Ensure labels exist in a GitHub repository 2 | 3 | I wanted to ensure that when [this template repository](https://github.com/simonw/action-transcription) was used to create a new repo that repo would have a specific set of labels. 4 | 5 | Here's the workflow I came up with, saved as `.github/workflows/ensure_labels.yml`: 6 | 7 | ```yaml 8 | name: Ensure labels 9 | on: [push] 10 | 11 | jobs: 12 | ensure_labels: 13 | runs-on: ubuntu-latest 14 | steps: 15 | - name: Create labels 16 | uses: actions/github-script@v6 17 | with: 18 | script: | 19 | try { 20 | await github.rest.issues.createLabel({ 21 | ...context.repo, 22 | name: 'captions' 23 | }); 24 | await github.rest.issues.createLabel({ 25 | ...context.repo, 26 | name: 'whisper' 27 | }); 28 | } catch(e) { 29 | // Ignore if labels exist already 30 | } 31 | ``` 32 | This creates `captions` and `whisper` labels, if they do not yet exist. 33 | 34 | It's wrapped in a `try/catch` so that if the labels exist already (as they will on subsequent runs) the error can be ignored. 35 | 36 | Note that you need to use `await ...` inside that `try/catch` block or exceptions thrown by those methods will still cause the action run to fail. 37 | 38 | The `...context.repo` trick saves on having to pass `owner` and `repo` explicitly. 39 | -------------------------------------------------------------------------------- /github-actions/grep-tests.md: -------------------------------------------------------------------------------- 1 | # Using grep to write tests in CI 2 | 3 | GitHub Actions workflows fail if any of the steps executes something that returns a non-zero exit code. 4 | 5 | Today I learned that `grep` returns a non-zero exit code if it fails to find any matches. 6 | 7 | This means that piping to grep is a really quick way to write a test as part of an Actions workflow. 8 | 9 | I wrote a quick soundness check today using the new `datasette --get /path` option, which runs a fake HTTP request for that path through Datasette and returns the response to standard out. Here's an example: 10 | 11 | ```yaml 12 | - name: Build database 13 | run: scripts/build.sh 14 | - name: Run tests 15 | run: | 16 | datasette . --get /us/pillar-point | grep 'Rocky Beaches' 17 | - name: Deploy to Vercel 18 | ``` 19 | I like this pattern a lot: build a database for a custom Datasette deployment in CI, run one or more quick soundness checks using grep, then deploy if those checks pass.
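If a soundness check ever outgrows a grep one-liner, the same idea works as a short Python script run from that step - a rough sketch, reusing the example path and string from the workflow above:

```python
import subprocess
import sys

# Run the same fake HTTP request through Datasette that the grep check uses
result = subprocess.run(
    ["datasette", ".", "--get", "/us/pillar-point"],
    capture_output=True,
    text=True,
    check=True,
)
if "Rocky Beaches" not in result.stdout:
    # Exiting with a message sets a non-zero exit code, which fails the step
    sys.exit("Soundness check failed: 'Rocky Beaches' not found in /us/pillar-point")
```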
20 | -------------------------------------------------------------------------------- /github-actions/only-master.md: -------------------------------------------------------------------------------- 1 | # Only run GitHub Action on push to master / main 2 | 3 | Spotted in [this Cloud Run example](https://github.com/GoogleCloudPlatform/github-actions/blob/20c294aabd5331f9f7b8a26e6075d41c31ce5e0d/example-workflows/cloud-run/.github/workflows/cloud-run.yml): 4 | 5 | ```yaml 6 | name: Build and Deploy to Cloud Run 7 | 8 | on: 9 | push: 10 | branches: 11 | - master 12 | ``` 13 | 14 | Useful if you don't want people opening pull requests against your repo that inadvertently trigger a deploy action! 15 | 16 | An alternative mechanism I've used is to gate the specific deploy steps in the action, [like this](https://github.com/simonw/cryptozoology/blob/8a86ec283823c91ad42c5f737a912d43791d427f/.github/workflows/deploy.yml#L31-L40). 17 | 18 | ```yaml 19 | # Only run the deploy if push was to master 20 | - name: Set up Cloud Run 21 | if: github.ref == 'refs/heads/master' 22 | uses: GoogleCloudPlatform/github-actions/setup-gcloud@v0 23 | with: 24 | version: '275.0.0' 25 | service_account_email: ${{ secrets.GCP_SA_EMAIL }} 26 | service_account_key: ${{ secrets.GCP_SA_KEY }} 27 | - name: Deploy to Cloud Run 28 | if: github.ref == 'refs/heads/master' 29 | run: |- 30 | gcloud config set run/region us-central1 31 | ``` 32 | -------------------------------------------------------------------------------- /github-actions/python-3-11.md: -------------------------------------------------------------------------------- 1 | # Testing against Python 3.11 preview using GitHub Actions 2 | 3 | I decided to run my CI tests against the Python 3.11 preview, to avoid the problem I had when Python 3.10 came out with [a bug that affected Datasette](https://simonwillison.net/2021/Oct/9/finding-and-reporting-a-bug/). 4 | 5 | I used the new [GitHub Code Search](https://cs.github.com/) to figure out how to do this. I searched for: 6 | 7 | 3.11 path:workflows/*.yml 8 | 9 | And found [this example](https://github.com/urllib3/urllib3/blob/7bec77e81aa0a194c98381053225813f5347c9d2/.github/workflows/ci.yml#L60) from `urllib3` which showed that the version tag to use is: 10 | 11 | 3.11-dev 12 | 13 | > **Update 28th November 2022**: `3.12-dev` now works for Python 3.12 preview 14 | 15 | I added that to my test matrix like so: 16 | 17 | ```yaml 18 | jobs: 19 | test: 20 | runs-on: ubuntu-latest 21 | strategy: 22 | matrix: 23 | python-version: ["3.7", "3.8", "3.9", "3.10", "3.11-dev"] 24 | steps: 25 | - uses: actions/checkout@v2 26 | - name: Set up Python ${{ matrix.python-version }} 27 | uses: actions/setup-python@v2 28 | with: 29 | python-version: ${{ matrix.python-version }} 30 | # ... 31 | ``` 32 | Here's the [full workflow](https://github.com/simonw/datasette/blob/a9d8824617268c4d214dd3be2174ac452044f737/.github/workflows/test.yml).
33 | 34 | -------------------------------------------------------------------------------- /github-actions/set-environment-for-all-steps.md: -------------------------------------------------------------------------------- 1 | # Set environment variables for all steps in a GitHub Action 2 | 3 | From [this example](https://github.com/GoogleCloudPlatform/github-actions/blob/20c294aabd5331f9f7b8a26e6075d41c31ce5e0d/example-workflows/cloud-run/.github/workflows/cloud-run.yml) I learned that you can set environment variables such that they will be available in ALL jobs once at the top of a workflow: 4 | 5 | ```yaml 6 | name: Build and Deploy to Cloud Run 7 | 8 | on: 9 | push: 10 | branches: 11 | - master 12 | 13 | env: 14 | PROJECT_ID: ${{ secrets.RUN_PROJECT }} 15 | RUN_REGION: us-central1 16 | SERVICE_NAME: helloworld-nodejs 17 | ``` 18 | 19 | I had previously been using this [much more verbose pattern](https://github.com/simonw/big-local-datasette/blob/181de90f1e7b59c7727595ee8cbe7626667fe05a/.github/workflows/deploy.yml#L30-L42): 20 | 21 | ```yaml 22 | - name: Fetch projects 23 | env: 24 | BIGLOCAL_TOKEN: ${{ secrets.BIGLOCAL_TOKEN }} 25 | run: python fetch_projects.py dbs/biglocal.db $BIGLOCAL_TOKEN --contact ... 26 | ``` 27 | -------------------------------------------------------------------------------- /github/bulk-edit-github-projects.md: -------------------------------------------------------------------------------- 1 | # Bulk editing status in GitHub Projects 2 | 3 | GitHub Projects has a mechanism for bulk updating the status of items, but it's pretty difficult to figure out how to do it. 4 | 5 | The trick is to use copy and paste. You can select a cell containing a status and hit `Command+C` to copy it - at which point a dotted border will be displayed around the cell. 6 | 7 | Then you can select and then shift-click a range of other cells, and hit `Command+V` to paste the value. 8 | 9 | Here's a demo: 10 | 11 | ![I click a In Progress cell and the border goes dotted when I hit the copy keyboard shortcut. Then I shift-click to select a range of cells and hit paste to update their status.](https://github.com/simonw/til/assets/9599/aedd6b5c-167e-40a1-9866-68410c0299d7) 12 | 13 | Here's where this feature was introduced [in the GitHub changelog](https://github.blog/changelog/2023-04-06-github-issues-projects-april-6th-update/#t-rex-bulk-editing-in-tables). See also [this community discussions thread](https://github.com/orgs/community/discussions/5465). 14 | -------------------------------------------------------------------------------- /github/bulk-repo-github-graphql.md: -------------------------------------------------------------------------------- 1 | # Bulk fetching repository details with the GitHub GraphQL API 2 | 3 | I wanted to be able to fetch details of a list of different repositories from the GitHub GraphQL API by name in a single operation. 4 | 5 | It turns out the `search()` operation can be used for this, given 100 repos at a time. The trick is to use the `repo:` search operator, e.g `repo:simonw/datasette repo:django/django` as demonstrated by [this search](https://github.com/search?q=repo%3Asimonw%2Fdatasette+repo:simonw/sqlite-utils&type=Repositories). 6 | 7 | Here's the GraphQL query, tried out using https://docs.github.com/en/graphql/overview/explorer 8 | 9 | ```graphql 10 | { 11 | search(type: REPOSITORY, query: "repo:simonw/datasette repo:django/django", first: 100) { 12 | nodes { 13 | ... 
on Repository { 14 | id 15 | nameWithOwner 16 | createdAt 17 | repositoryTopics(first: 100) { 18 | totalCount 19 | nodes { 20 | topic { 21 | name 22 | } 23 | } 24 | } 25 | openIssueCount: issues(states: [OPEN]) { 26 | totalCount 27 | } 28 | closedIssueCount: issues(states: [CLOSED]) { 29 | totalCount 30 | } 31 | releases(last: 1) { 32 | totalCount 33 | nodes { 34 | tagName 35 | } 36 | } 37 | } 38 | } 39 | } 40 | } 41 | ``` 42 | -------------------------------------------------------------------------------- /github/clone-and-push-gist.md: -------------------------------------------------------------------------------- 1 | # Clone, edit and push files that live in a Gist 2 | 3 | GitHub [Gists](https://gist.github.com/) are full Git repositories, and can be cloned and pushed to. 4 | 5 | You can clone them anonymously (read-only) just using their URL: 6 | 7 | git clone https://gist.github.com/simonw/0a30d52feeb3ff60f7d8636b0bde296b 8 | 9 | But if you want to be able to make local edits and then push them back, you need to use this recipe instead: 10 | 11 | git clone git@gist.github.com:0a30d52feeb3ff60f7d8636b0bde296b.git 12 | 13 | You can find this in the "Embed" menu, as the "Clone via SSH" option. 14 | 15 | This only uses the Gist's ID, the `simonw/` part from the URL is omitted. 16 | 17 | This uses your existing GitHub SSH credentials. 18 | 19 | You can then edit files in that repository and commit and push them like this: 20 | 21 | cd 0a30d52feeb3ff60f7d8636b0bde296b 22 | # Edit files here 23 | git commit -m "Edited some files" -a 24 | git push 25 | -------------------------------------------------------------------------------- /github/dependabot-python-setup.md: -------------------------------------------------------------------------------- 1 | # Configuring Dependabot for a Python project 2 | 3 | GitHub's Dependabot can automatically file PRs with bumps to dependencies when new versions of them are available. 4 | 5 | In June 2023 they added support for [Grouped version updates](https://github.blog/changelog/2023-06-30-grouped-version-updates-for-dependabot-public-beta/), so one PR will be filed that updates multiple dependencies at the same time. 6 | 7 | The [Dependabot setup instructions](https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates) don't explicitly mention projects which keep all of their dependency information in `setup.py`. 8 | 9 | It works just fine with those kinds of projects too. 10 | 11 | To start it working, create a file in `.github/dependabot.yml` with the following contents: 12 | 13 | ```yaml 14 | version: 2 15 | updates: 16 | - package-ecosystem: pip 17 | directory: "/" 18 | schedule: 19 | interval: daily 20 | time: "13:00" 21 | groups: 22 | python-packages: 23 | patterns: 24 | - "*" 25 | ``` 26 | Then navigate to https://github.com/simonw/s3-credentials/network/updates (but for your project) - that's Insights -> Dependency graph -> Dependabot - to confirm that it worked. 27 | 28 | This should work for projects that use `setup.py` or `pyproject.toml` or `requirements.txt`. 
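For context, here's a minimal hypothetical `setup.py` of the kind Dependabot will track - the package name and version pins are invented for illustration:

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical project name
    version="0.1",
    install_requires=[
        "click>=8.0",  # Dependabot proposes bumps for entries listed here
        "httpx",
    ],
    extras_require={"test": ["pytest"]},
)
```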
29 | -------------------------------------------------------------------------------- /github/graphql-search-topics.md: -------------------------------------------------------------------------------- 1 | # Searching for repositories by topic using the GitHub GraphQL API 2 | 3 | I wanted to use the GitHub GraphQL API to return all of the repositories on the https://github.com/topics/git-scraping page. 4 | 5 | At first glance there isn't a GraphQL field for that page - but it turns out you can access it using a GitHub search: 6 | 7 | topic:git-scraping sort:updated-desc 8 | 9 | An oddity of GitHub search is that sort order can be defined using tokens that form part of the search query! 10 | 11 | Here's a GraphQL query [tested here](https://developer.github.com/v4/explorer/) that returns the most recent 100 `git-scraping` tagged repos, sorted by most recently updated. 12 | 13 | ```graphql 14 | { 15 | search(query: "topic:git-scraping sort:updated-desc", type: REPOSITORY, first: 100) { 16 | repositoryCount 17 | nodes { 18 | ... on Repository { 19 | nameWithOwner 20 | description 21 | updatedAt 22 | createdAt 23 | diskUsage 24 | } 25 | } 26 | } 27 | } 28 | ``` 29 | -------------------------------------------------------------------------------- /github/migrate-github-wiki.md: -------------------------------------------------------------------------------- 1 | # Migrating a GitHub wiki from one repository to another 2 | 3 | I figured out how to migrate a [GitHub wiki](https://docs.github.com/en/communities/documenting-your-project-with-wikis/about-wikis) (public or private) from one repository to another while preserving all history. 4 | 5 | The trick is that GitHub wikis are just Git repositories. Which means you can clone them, edit them and push them. 6 | 7 | This means you can migrate them between their parent repos like so. `myorg/old-repo` is the repo you are moving from, and `myorg/new-repo` is the destination. 8 | 9 | git clone https://github.com/myorg/old-repo.wiki.git 10 | cd old-repo.wiki 11 | git remote remove origin 12 | git remote add origin https://github.com/myorg/new-repo.wiki.git 13 | git push --set-upstream origin master --force 14 | 15 | This will entirely over-write the content and history of the wiki attached to the `new-repo` repository with the content and history from the wiki in `old-repo`. 16 | -------------------------------------------------------------------------------- /github/reporting-bugs.md: -------------------------------------------------------------------------------- 1 | # Reporting bugs in GitHub to GitHub 2 | 3 | I found out today (via [this post](https://github.com/github-community/community/discussions/19988)) about a dedicated interface for reporting bugs in GitHub to GitHub: 4 | 5 | https://support.github.com/contact/bug-report 6 | 7 | It includes full markdown support, which means you can include animated GIFs that illustrate the bug. 
8 | 9 | Once reported, you can track the status of your bug reports here: 10 | 11 | https://support.github.com/tickets 12 | -------------------------------------------------------------------------------- /github/syntax-highlighting-python-console.md: -------------------------------------------------------------------------------- 1 | # Syntax highlighting Python console examples with GFM 2 | 3 | It turns out [GitHub Flavored Markdown](https://github.github.com/gfm/) can apply syntax highlighting to Python console examples, like this one: 4 | 5 | ```pycon 6 | >>> import csv 7 | >>> with open('eggs.csv', newline='') as csvfile: 8 | ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 9 | ... for row in spamreader: 10 | ... print(', '.join(row)) 11 | Spam, Spam, Spam, Spam, Spam, Baked Beans 12 | Spam, Lovely Spam, Wonderful Spam 13 | ``` 14 | 15 | The trick is to use the following: 16 | 17 | ```` 18 | ```pycon 19 | >>> import csv 20 | >>> with open('eggs.csv', newline='') as csvfile: 21 | ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 22 | ... for row in spamreader: 23 | ... print(', '.join(row)) 24 | Spam, Spam, Spam, Spam, Spam, Baked Beans 25 | Spam, Lovely Spam, Wonderful Spam 26 | ``` 27 | ```` 28 | I figured out the `pycon` code by scanning through the [languages.yml](https://github.com/github/linguist/blob/v7.12.2/lib/linguist/languages.yml#L4406-L4414) file for linguist, the library GitHub use for their syntax highlighting. 29 | 30 | While writing this TIL I also learned how to embed triple-backticks in a code block - you surround the block with more-than-three backticks (thanks to [this tip](https://github.com/jonschlinkert/remarkable/issues/146#issuecomment-85539428)): 31 | 32 | 33 | ````` 34 | ```` 35 | ```pycon 36 | >>> import csv 37 | >>> with open('eggs.csv', newline='') as csvfile: 38 | ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 39 | ... for row in spamreader: 40 | ... print(', '.join(row)) 41 | Spam, Spam, Spam, Spam, Spam, Baked Beans 42 | Spam, Lovely Spam, Wonderful Spam 43 | ``` 44 | ```` 45 | ````` 46 | -------------------------------------------------------------------------------- /github/transfer-issue-private-to-public.md: -------------------------------------------------------------------------------- 1 | # Transferring a GitHub issue from a private to a public repository 2 | 3 | I have my own private `notes` repository where I sometimes create research threads. Occasionally I want to transfer these to a public repository to publish their contents. 4 | 5 | https://docs.github.com/en/issues/tracking-your-work-with-issues/transferring-an-issue-to-another-repository says: 6 | 7 | > You can't transfer an issue from a private repository to a public repository. 8 | 9 | I found this workaround: 10 | 11 | 1. Create a new private repository. I called mine `simonw/temp` 12 | 2. Transfer the issue from your original repository to this new temporary repository 13 | 3. Use the "Settings" tab in the temporary repository to change the entire repository's visibility from private to public 14 | 4. Transfer the issue from the temporary repository to the public repository that you want it to live in 15 | 16 | ## Using the gh tool 17 | 18 | You can perform transfers using the web interface, but I also learned how to do it using the `gh` tool. 
19 | 20 | Install that with `brew install gh` 21 | 22 | Then you can run this: 23 | 24 | gh issue transfer https://github.com/simonw/temp/issues/1 simonw/datasette-tiddlywiki 25 | 26 | I used this trick today to transfer https://github.com/simonw/datasette-tiddlywiki/issues/2 out of my private `notes` repo. 27 | -------------------------------------------------------------------------------- /go/installing-tools.md: -------------------------------------------------------------------------------- 1 | # Installing tools written in Go 2 | 3 | Today I learned how to install tools from GitHub that are written in Go, using [github.com/icholy/semgrepx](https://github.com/icholy/semgrepx) as an example: 4 | 5 | go install github.com/icholy/semgrepx@latest 6 | 7 | Running this command grabs a copy of the GitHub repository, compiles the Go package in there and drops the resulting binary into the `~/go/bin` folder on your computer: 8 | 9 | ``` 10 | ls -lh ~/go/bin/semgrepx 11 | -rwxr-xr-x 1 simon staff 2.9M Mar 25 21:08 /Users/simon/go/bin/semgrepx 12 | ``` 13 | The `@latest` reference confused me, since the repo in question didn't have a branch or tag called that. 14 | 15 | I couldn't find the right documentation for that, but GPT-4 [confidently told me](https://chat.openai.com/share/06e62ec2-1ab3-495f-9e0c-914ef27c1e91): 16 | 17 | > `@latest`: This specifies the version of the package you want to install. In this case, latest means that the Go tool will install the latest version of the package available. The Go tool uses the versioning information from the repository's tags to determine the latest version. If the repository follows semantic versioning, the latest version is the one with the highest version number. If there are no version tags, latest will refer to the most recent commit on the default branch of the repository. 18 | 19 | In the absence of an official answer that looks like it might be right to me. 20 | -------------------------------------------------------------------------------- /googlecloud/google-oauth-cli-application-oauth-client-id.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/googlecloud/google-oauth-cli-application-oauth-client-id.png -------------------------------------------------------------------------------- /googlecloud/gsutil-bucket.md: -------------------------------------------------------------------------------- 1 | # Publishing to a public Google Cloud bucket with gsutil 2 | 3 | I decided to publish static CSV files to accompany my https://cdc-vaccination-history.datasette.io/ project, using a Google Cloud bucket (see [cdc-vaccination-history issue #9](https://github.com/simonw/cdc-vaccination-history/issues/9)). 4 | 5 | The Google Cloud tutorial on [https://cloud.google.com/storage/docs/hosting-static-website-http#gsutil](https://cloud.google.com/storage/docs/hosting-static-website-http#gsutil) was very helpful. 6 | 7 | ## Creating the bucket 8 | 9 | I used an authenticated `gsutil` session that I already had from my work with Google Cloud Run. 10 | 11 | To create a new bucket: 12 | 13 | gsutil mb gs://cdc-vaccination-history-csv.datasette.io/ 14 | 15 | `mb` is the [make bucket](https://cloud.google.com/storage/docs/gsutil/commands/mb) command. 16 | 17 | I had already verified my `datasette.io` bucket with Google, otherwise this step would not have worked. 
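The same bucket-creation step can also be scripted with the `google-cloud-storage` Python client - a sketch that assumes application default credentials are configured and the domain has already been verified:

```python
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()
# Domain-named buckets can only be created once the domain is verified with Google
bucket = client.create_bucket("cdc-vaccination-history-csv.datasette.io")
print("Created bucket:", bucket.name)
```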
18 | 19 | ## Uploading files 20 | 21 | gsutil cp *.csv gs://cdc-vaccination-history-csv.datasette.io 22 | 23 | Using the [gsutil cp command](https://cloud.google.com/storage/docs/gsutil/commands/cp). 24 | 25 | ## Making them available to the public 26 | 27 | This command allows anyone to download from the bucket: 28 | 29 | gsutil iam ch allUsers:objectViewer gs://cdc-vaccination-history-csv.datasette.io 30 | 31 | ## DNS 32 | 33 | I configured `cdc-vaccination-history-csv` as a `CNAME` pointing to `c.storage.googleapis.com.` 34 | 35 | https://cdc-vaccination-history-csv.datasette.io/ now shows an XML directory listing. 36 | -------------------------------------------------------------------------------- /homebrew/homebrew-core-local-git-checkout.md: -------------------------------------------------------------------------------- 1 | # Browsing your local git checkout of homebrew-core 2 | 3 | The [homebrew-core](https://github.com/Homebrew/homebrew-core) repository contains all of the default formulas for Homebrew. 4 | 5 | It's a huge repo, and if you browse it through the GitHub web interface you can run into errors like this one: 6 | 7 | https://github.com/Homebrew/homebrew-core/commits/master/Formula/libspatialite.rb 8 | 9 | > ### Sorry, this commit history is taking too long to generate. 10 | > 11 | > Refresh the page to try again, or view this history locally using the following command: 12 | > 13 | > git log master -- Formula/libspatialite.rb 14 | 15 | It turns out there's a full checkout of the repo (including history) in this folder on your computer already: 16 | 17 | /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core 18 | 19 | So you can browse the history for that file locally like so: 20 | 21 | cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core 22 | git log master -- Formula/libspatialite.rb 23 | -------------------------------------------------------------------------------- /homebrew/latest-sqlite.md: -------------------------------------------------------------------------------- 1 | # Running the latest SQLite in Datasette using Homebrew 2 | 3 | I made a pleasant discovery today: Homebrew are very quick to update to the latest SQLite release (here's [their formula](https://github.com/Homebrew/homebrew-core/blob/master/Formula/sqlite.rb)), and since [Datasette](https://datasette.io/) when installed via Homebrew uses that version, this means you can use `brew upgrade sqlite` to ensure you are running the most recent SQLite version within Datasette. 4 | 5 | If you've installed Datasette using Homebrew: 6 | 7 | brew install datasette 8 | 9 | You can see the version of SQLite it uses either by running `datasette` and navigating to http://127.0.0.1:8001/-/versions - or you can see it from the command-line using: 10 | 11 | % datasette --get /-/versions.json | jq .sqlite.version 12 | "3.37.2" 13 | 14 | To upgrade SQLite, run the following: 15 | 16 | brew upgrade sqlite 17 | 18 | After doing that I ran the above command again and confirmed I had been upgraded to SQLite 3.38.0: 19 | 20 | % datasette --get /-/versions.json | jq .sqlite.version 21 | "3.38.0" 22 | -------------------------------------------------------------------------------- /homebrew/mysql-homebrew.md: -------------------------------------------------------------------------------- 1 | # Running a MySQL server using Homebrew 2 | 3 | First, install MySQL like so: 4 | 5 | brew install mysql 6 | 7 | This installs the server but doesn't run it.
You can run it in the background like this: 8 | ``` 9 | % mysql.server start 10 | Starting MySQL 11 | .. SUCCESS! 12 | % 13 | ``` 14 | Then later on you can stop it like so: 15 | ``` 16 | % mysql.server stop 17 | Shutting down MySQL 18 | . SUCCESS! 19 | % 20 | ``` 21 | While it's running it defaults to having a root account that only accepts connections from localhost with no password: 22 | ``` 23 | % mysql -u root 24 | Welcome to the MySQL monitor. Commands end with ; or \g. 25 | Your MySQL connection id is 8 26 | ... 27 | mysql> 28 | ``` 29 | Running `mysql_secure_installation` runs a wizard that helps set up a password. 30 | 31 | When you first install it, Homebrew says: 32 | ``` 33 | To have launchd start mysql now and restart at login: 34 | brew services start mysql 35 | ``` 36 | You can re-display that message by running `brew reinstall mysql`. 37 | 38 | ## Installing the mysqlclient Python library 39 | 40 | This took me a long time to figure out. Eventually this worked: 41 | 42 | MYSQLCLIENT_CFLAGS=`pkg-config mysqlclient --cflags` \ 43 | MYSQLCLIENT_LDFLAGS=`pkg-config mysqlclient --libs` \ 44 | pip install mysqlclient 45 | -------------------------------------------------------------------------------- /homebrew/upgrading-python-homebrew-packages.md: -------------------------------------------------------------------------------- 1 | # Upgrading Python Homebrew packages using pip 2 | 3 | [VisiData 2.0](https://www.visidata.org/) came out today. I previously installed VisiData using Homebrew, but the VisiData tap has not yet been updated with the latest version. 4 | 5 | Homebrew Python packages (including the packages for [Datasette](https://formulae.brew.sh/formula/datasette) and [sqlite-utils](https://formulae.brew.sh/formula/sqlite-utils)) work by setting up their own package-specific virtual environments. This means you can upgrade them without waiting for the tap. 6 | 7 | To find the virtual environment, run `head -n 1` against the Homebrew-provided executable. VisiData is `vd`, so this works: 8 | ``` 9 | % head -n 1 $(which vd) 10 | #!/usr/local/Cellar/visidata/1.5.2/libexec/bin/python3.8 11 | ``` 12 | Now you can call `pip` within that virtual environment to perform the upgrade like so: 13 | ``` 14 | /usr/local/Cellar/visidata/1.5.2/libexec/bin/pip install -U visidata 15 | ``` 16 | -------------------------------------------------------------------------------- /html/lazy-loading-images.md: -------------------------------------------------------------------------------- 1 | # Lazy loading images in HTML 2 | 3 | [Most modern browsers](https://caniuse.com/loading-lazy-attr) now include support for the `loading="lazy"` image attribute, which causes images not to be loaded until the user scrolls them into view. 4 | 5 | ![Animated screenshot showing the network panel in the Firefox DevTools - as I scroll down a page more images load on demand just before they scroll into view.](https://user-images.githubusercontent.com/9599/204108097-6f385377-5daf-4895-9216-4ea0916a296a.gif) 6 | 7 | I used it for the slides on my annotated version of this presentation: [Massively increase your productivity on personal projects with comprehensive documentation and automated tests](https://simonwillison.net/2022/Nov/26/productivity/). 8 | 9 | There's one catch though: you need to provide the size of the image (I used `width=` and `height=` attributes) in order for it to work! Without those your browser still needs to fetch the images in order to calculate their dimensions for page layout.
10 | 11 | Here's the HTML I used for each slide image: 12 | 13 | ```html 14 | Issue driven development 21 | ``` 22 | -------------------------------------------------------------------------------- /html/video-preload-none.md: -------------------------------------------------------------------------------- 1 | # HTML video that loads when the user clicks play 2 | 3 | Today I figured out how to use the `