├── .github └── workflows │ └── build.yml ├── LICENSE ├── README.md ├── ab └── apache-bench-length-errors.md ├── amplitude └── export-events-to-datasette.md ├── asgi └── lifespan-test-httpx.md ├── auth0 ├── auth0-logout.md └── oauth-with-auth0.md ├── aws ├── athena-key-does-not-exist.md ├── athena-newline-json.md ├── boto-command-line.md ├── helper-for-boto-aws-pagination.md ├── instance-costs-per-month.md ├── ocr-pdf-textract.md ├── recovering-lightsail-data.md ├── s3-cors.md └── s3-triggers-dynamodb.md ├── awslambda └── asgi-mangum.md ├── azure └── all-traffic-to-subdomain.md ├── bash ├── escaping-a-string.md ├── escaping-sql-for-curl-to-datasette.md ├── finding-bom-csv-files-with-ripgrep.md ├── go-script.md ├── ignore-errors.md ├── loop-over-csv.md ├── multiple-servers.md ├── nullglob-in-bash.md ├── skip-csv-rows-with-odd-numbers.md ├── start-test-then-stop-server.md └── use-awk-to-add-a-prefix.md ├── build_database.py ├── caddy └── pause-retry-traffic.md ├── chrome └── headless.md ├── clickhouse ├── github-explorer.md └── github-public-history.md ├── cloudflare ├── cache-control-transform-rule.md ├── cloudflare-cache-html.md ├── domain-redirect-with-pages.md ├── redirect-rules.md ├── redirect-whole-domain.md ├── robots-txt-cloudflare-workers.md └── workers-github-oauth.md ├── cloudrun ├── billing-metrics-explorer.png ├── gcloud-run-services-list.md ├── increase-cloud-scheduler-time-limit.md ├── listing-cloudbuild-files.md ├── multiple-gcloud-accounts.md ├── ship-dockerfile-to-cloud-run.md ├── tailing-cloud-run-request-logs.md ├── use-labels-for-billing-breakdown-1.png ├── use-labels-for-billing-breakdown-2.png ├── use-labels-for-billing-breakdown.md └── using-build-args-with-cloud-run.md ├── cocktails ├── pisco-sour.md ├── tommys-margarita.md └── whisky-sour.md ├── cookiecutter ├── conditionally-creating-directories.md └── pytest-for-cookiecutter.md ├── cooking └── breakfast-tacos.md ├── cosmopolitan └── ecosystem.md ├── css ├── dialog-full-height.md ├── resizing-textarea.md └── simple-two-column-grid.md ├── datasette ├── baseline.md ├── cli-tool-that-is-also-a-plugin.md ├── crawling-datasette-with-datasette.md ├── datasette-on-replit.md ├── hugging-face-spaces.md ├── issues-open-for-less-than-x-seconds.md ├── playwright-tests-datasette-plugin.md ├── plugin-modifies-command.md ├── pytest-httpx-datasette.md ├── reddit-datasette-write.md ├── redirects-for-datasette.md ├── register-new-plugin-hooks.md ├── remember-to-commit.md ├── reuse-click-for-register-commands.md ├── row-selection-prototype.md ├── search-all-columns-trick.md ├── serving-mbtiles.md └── syntax-highlighted-code-examples.md ├── deno ├── annotated-deno-deploy-demo.md ├── deno-kv.md └── pyodide-sandbox.md ├── digitalocean └── datasette-on-digitalocean-app-platform.md ├── discord └── discord-github-issues-bot.md ├── django ├── almost-facet-counts-django-admin.md ├── building-a-blog-in-django.md ├── datasette-django.md ├── django-admin-horizontal-scroll.md ├── efficient-bulk-deletions-in-django.md ├── enabling-gin-index.md ├── export-csv-from-django-admin.md ├── extra-read-only-admin-information.md ├── filter-by-comma-separated-values.md ├── just-with-django.md ├── live-blog.md ├── migration-postgresql-fuzzystrmatch.md ├── migration-using-cte.md ├── migrations-runsql-noop.md ├── postgresql-full-text-search-admin.md ├── pretty-print-json-admin.md ├── pytest-django.md ├── show-timezone-in-django-admin.md └── testing-django-admin-with-pytest.md ├── docker ├── attach-bash-to-running-container.md ├── debian-unstable-packages.md 
├── docker-compose-for-django-development.md ├── docker-for-mac-container-to-postgresql-on-host.md ├── emulate-s390x-with-qemu.md ├── gdb-python-docker.md ├── pipenv-and-docker.md ├── pytest-docker.md └── test-fedora-in-docker.md ├── duckdb ├── parquet-to-json.md ├── parquet.md └── remote-parquet.md ├── electron ├── electrion-auto-update.md ├── electron-debugger-console.md ├── electron-external-links-system-browser.md ├── python-inside-electron.md ├── sign-notarize-electron-macos.md └── testing-electron-playwright.md ├── exif └── orientation-and-location.md ├── firefox ├── search-across-all-resources-2.jpg ├── search-across-all-resources.jpg └── search-across-all-resources.md ├── fly ├── clip-on-fly.md ├── custom-subdomain-fly.md ├── django-sql-dashboard.md ├── fly-docker-registry.md ├── fly-logs-to-s3.md ├── redbean-on-fly.md ├── scp.md ├── undocumented-graphql-api.md ├── varnish-on-fly.md └── wildcard-dns-ssl.md ├── generate_screenshots.py ├── gis ├── gdal-sql.md ├── mapzen-elevation-tiles.md ├── natural-earth-in-spatialite-and-datasette.md └── pmtiles.md ├── git ├── backdate-git-commits.md ├── git-archive.md ├── git-bisect.md ├── git-filter-repo.md ├── remove-commit-and-force-push.md ├── rewrite-repo-remove-secrets.md ├── rewrite-repo-specific-files.md └── size-of-lfs-files.md ├── github-actions ├── attach-generated-file-to-release.md ├── cache-setup-py.md ├── cog.md ├── commit-if-file-changed.md ├── conditionally-run-a-second-job.md ├── continue-on-error.md ├── creating-github-labels.md ├── daily-planner.md ├── debug-tmate.md ├── deploy-live-demo-when-tests-pass.md ├── different-postgresql-versions.md ├── different-steps-on-a-schedule.md ├── dump-context.md ├── ensure-labels.md ├── github-pages.md ├── grep-tests.md ├── job-summaries.md ├── markdown-table-of-contents.md ├── npm-cache-with-npx-no-package.md ├── only-master.md ├── oxipng.md ├── postgresq-service-container.md ├── prettier-github-actions.md ├── python-3-11.md ├── running-tests-against-multiple-verisons-of-dependencies.md ├── s3-bucket-github-actions.md ├── service-containers-docker.md ├── set-environment-for-all-steps.md └── vite-github-pages.md ├── github ├── bulk-edit-github-projects.md ├── bulk-repo-github-graphql.md ├── clone-and-push-gist.md ├── custom-subdomain-github-pages.md ├── dependabot-python-setup.md ├── dependencies-graphql-api.md ├── django-postgresql-codespaces.md ├── github-code-search-api-uses.md ├── github-pages.md ├── graphql-pagination-python.md ├── graphql-search-topics.md ├── migrate-github-wiki.md ├── release-note-assistance.md ├── reporting-bugs.md ├── syntax-highlighting-python-console.md └── transfer-issue-private-to-public.md ├── go └── installing-tools.md ├── google-sheets └── concatenate.md ├── google ├── gmail-compose-url.md └── json-api-programmable-search-engine.md ├── googlecloud ├── gcloud-error-workaround.md ├── google-cloud-spend-datasette.md ├── google-oauth-cli-application-oauth-client-id.png ├── google-oauth-cli-application.md ├── gsutil-bucket.md ├── recursive-fetch-google-drive.md └── video-frame-ocr.md ├── gpt3 ├── chatgpt-api.md ├── chatgpt-applescript.md ├── gpt4-api-design.md ├── guessing-amazon-urls.md ├── jq.md ├── open-api.md ├── openai-python-functions-data-extraction.md ├── picking-python-project-name-chatgpt.md ├── python-chatgpt-streaming-api.md ├── reformatting-text-with-copilot.md └── writing-test-with-copilot.md ├── graphql ├── get-graphql-schema.md ├── graphql-fragments.md └── graphql-with-curl.md ├── hacker-news └── recent-comments.md ├── ham-radio └── general.md ├── 
heroku ├── pg-pull.md ├── pg-upgrade.md └── programatic-access-postgresql.md ├── homebrew ├── auto-formulas-github-actions.md ├── homebrew-core-local-git-checkout.md ├── latest-sqlite.md ├── mysql-homebrew.md ├── no-verify-attestations.md ├── packaging-python-cli-for-homebrew.md └── upgrading-python-homebrew-packages.md ├── html ├── datalist.md ├── lazy-loading-images.md ├── scroll-to-text.md ├── video-preload-none.md └── video-with-subtitles.md ├── http └── testing-cors-max-age.md ├── httpx └── openai-log-requests-responses.md ├── hugo └── basic.md ├── ics └── google-calendar-ics-subscribe-link.md ├── imagemagick ├── compress-animated-gif.md └── set-a-gif-to-loop.md ├── ios └── listen-to-page.md ├── javascript ├── copy-button.md ├── copy-rich-text-to-clipboard.md ├── dropdown-menu-with-details-summary.md ├── dynamically-loading-assets.md ├── javascript-date-objects.md ├── javascript-that-responds-to-media-queries.md ├── jest-without-package-json.md ├── jsr-esbuild.md ├── lit-with-skypack.md ├── manipulating-query-params.md ├── minifying-uglify-npx.md ├── openseadragon.md ├── preventing-double-form-submission.md ├── scroll-to-form-if-errors.md ├── tesseract-ocr-javascript.md └── working-around-nodevalue-size-limit.md ├── jinja ├── autoescape-template.md ├── custom-jinja-tags-with-attributes.md └── format-thousands.md ├── jq ├── array-of-array-to-objects.md ├── combined-github-release-notes.md ├── convert-no-decimal-point-latitude-jq.md ├── extracting-objects-recursively.md ├── flatten-nested-json-objects-jq.md ├── git-log-json.md ├── radio-garden-jq.md └── reformatting-airtable-json.md ├── json ├── ijson-stream.md ├── json-pointer.md └── streaming-indented-json-array.md ├── jupyter ├── javascript-in-a-jupyter-notebook.md └── jupyterlab-uv-tool-install.md ├── kubernetes ├── basic-datasette-in-kubernetes.md └── kubectl-proxy.md ├── linux ├── allow-sudo-without-password-specific-command.md ├── basic-strace.md ├── echo-pipe-to-file-su.md └── iconv.md ├── llms ├── bert-ner.md ├── claude-hacker-news-themes.md ├── code-interpreter-expansions.md ├── colbert-ragatouille.md ├── docs-from-tests.md ├── dolly-2.md ├── embed-paragraphs.md ├── larger-context-openai-models-llm.md ├── llama-7b-m2.md ├── llama-cpp-python-grammars.md ├── mlc-chat-redpajama.md ├── nanogpt-shakespeare-m2.md ├── openai-embeddings-related-content.md ├── prompt-gemini.md ├── python-react-pattern.md ├── rg-pipe-llm-trick.md ├── streaming-llm-apis.md └── training-nanogpt-on-my-blog.md ├── machinelearning └── musicgen.md ├── macos ├── 1password-terminal.md ├── apple-photos-large-files.md ├── atuin.md ├── close-terminal-on-ctrl-d.md ├── close-terminal-on-ctrl-d.png ├── downloading-partial-youtube-videos.md ├── edit-ios-home-screen.md ├── external-display-laptop.md ├── find-largest-sqlite.md ├── fixing-compinit-insecure-directories.md ├── fs-usage.md ├── ifuse-iphone.md ├── imovie-slides-and-audio.md ├── impaste.md ├── lsof-macos.md ├── open-files-with-opensnoop.md ├── python-installer-macos.md ├── quick-whisper-youtube.md ├── quicktime-capture-script.md ├── running-docker-on-remote-m1.md ├── shrinking-pngs-with-pngquant-and-oxipng.md ├── sips.md ├── skitch-catalina-1.png ├── skitch-catalina-2.png ├── skitch-catalina.md ├── whisper-cpp.md ├── wildcard-dns-dnsmasq.md └── zsh-pip-install.md ├── markdown ├── converting-to-markdown.gif ├── converting-to-markdown.md ├── github-markdown-api.md └── markdown-extensions-python.md ├── mastodon ├── custom-domain-mastodon.md ├── export-timeline-to-sqlite.md ├── mastodon-bots-github-actions.md └── 
verifying-github-on-mastodon.md ├── mediawiki └── mediawiki-sqlite-macos.md ├── metadata.yaml ├── midjourney └── desktop-backgrounds.md ├── misc ├── hexdump.md └── voice-cloning.md ├── networking ├── ethernet-over-coaxial-cable.md └── http-ipv6.md ├── nginx └── proxy-domain-sockets.md ├── node └── constant-time-compare-strings.md ├── npm ├── annotated-package-json.md ├── npm-publish-github-actions.md ├── prettier-django.md ├── publish-web-component.md ├── self-hosted-quickjs.md └── upgrading-packages.md ├── observable-plot ├── histogram-with-tooltips.md └── wider-tooltip-areas.md ├── observable └── jq-in-observable.md ├── overture-maps └── overture-maps-parquet.md ├── pixelmator └── pixel-editing-favicon.md ├── playwright ├── expect-selector-count.md └── testing-tables.md ├── pluggy └── multiple-hooks-same-file.md ├── plugins ├── redirects.py └── template_vars.py ├── postgresql ├── closest-locations-to-a-point.md ├── constructing-geojson-in-postgresql.md ├── json-extract-path.md ├── read-only-postgresql-user.md ├── show-schema.md ├── unnest-csv.md └── upgrade-postgres-app.md ├── presenting ├── Tipsheet__https___bit_ly_…_and_New_File_and_Zoom.png └── stickies-for-workshop-links.md ├── purpleair └── purple-air-aqi.md ├── pyodide └── cryptography-in-pyodide.md ├── pypi ├── project-links.md ├── project-links.png └── pypi-releases-from-github.md ├── pytest ├── assert-dictionary-subset.md ├── async-fixtures.md ├── coverage-with-context.md ├── mock-httpx.md ├── mocking-boto.md ├── namedtuple-parameterized-tests.md ├── only-run-integration.md ├── playwright-pytest.md ├── pytest-argparse.md ├── pytest-code-coverage.md ├── pytest-httpx-debug.md ├── pytest-mock-calls.md ├── pytest-recording-vcr.md ├── pytest-stripe-signature.md ├── pytest-subprocess.md ├── pytest-uv.md ├── registering-plugins-in-tests.md ├── session-scoped-tmp.md ├── show-files-opened-by-tests.md ├── subprocess-server.md ├── syrupy.md ├── test-click-app-with-streaming-input.md └── treat-warnings-as-errors.md ├── python ├── annotated-dataklasses.md ├── build-official-docs.md ├── calendar-weeks.md ├── call-pip-programatically.md ├── callable.md ├── click-file-encoding.md ├── click-option-names.md ├── codespell.md ├── cog-to-update-help-in-readme.md ├── comparing-version-numbers.md ├── convert-to-utc-without-pytz.md ├── copy-file.md ├── csv-error-column-too-large.md ├── debug-click-with-pdb.md ├── decorators-with-optional-arguments.md ├── fabric-ssh-key.md ├── find-local-variables-in-exception-traceback.md ├── generate-nested-json-summary.md ├── graphlib-topologicalsorter.md ├── gtr-t5-large.md ├── ignore-both-flake8-and-mypy.md ├── init-subclass.md ├── inlining-binary-data.md ├── installing-flash-attention.md ├── installing-upgrading-plugins-with-pipx.md ├── introspect-function-parameters.md ├── io-bufferedreader.md ├── itry.md ├── json-floating-point.md ├── locust.md ├── lxml-m1-mac.md ├── macos-catalina-sort-of-ships-with-python3.md ├── md5-fips.md ├── os-remove-windows.md ├── output-json-array-streaming.md ├── packaging-pyinstaller.md ├── password-hashing-with-pbkdf2.md ├── pdb-interact.md ├── pip-cache.md ├── pip-tools.md ├── pipx-alpha.md ├── platform-specific-dependencies.md ├── pprint-no-sort-dicts.md ├── protocols.md ├── pyobjc-framework-corelocation.md ├── pyproject.md ├── pypy-macos.md ├── quick-testing-pyenv.md ├── rye.md ├── safe-output-json.md ├── setup-py-from-url.md ├── sqlite-in-pyodide.md ├── stdlib-cli-tools.md ├── struct-endianness.md ├── style-yaml-dump.md ├── subprocess-time-limit.md ├── toml.md ├── 
too-many-open-files-psutil.md ├── tracing-every-statement.md ├── tree-sitter.md ├── trying-free-threaded-python.md ├── using-c-include-path-to-install-python-packages.md ├── utc-warning-fix.md ├── uv-cli-apps.md └── yielding-in-asyncio.md ├── quarto └── trying-out-quarto.md ├── readthedocs ├── custom-sphinx-templates.md ├── custom-subdomain.md ├── documentation-seo-canonical.md ├── link-from-latest-to-stable.md ├── pip-install-docs.md ├── readthedocs-search-api.md └── stable-docs.md ├── reddit └── scraping-reddit-json.md ├── requirements.txt ├── script ├── bootstrap ├── build ├── server └── update ├── selenium ├── async-javascript-in-selenium.md └── selenium-python-macos.md ├── service-workers └── intercept-fetch.md ├── shot-scraper ├── axe-core.md ├── readability.md ├── scraping-flourish.md ├── social-media-cards.md └── subset-of-table-columns.md ├── spatialite ├── gunion-to-combine-geometries.md ├── knn.md ├── minimal-spatialite-database-in-python.md └── viewing-geopackage-data-with-spatialite-and-datasette.md ├── sphinx ├── blacken-docs.md ├── literalinclude-with-markers.md ├── sphinx-autodoc.md └── sphinx-ext-extlinks.md ├── sql ├── consecutive-groups.md ├── cumulative-total-over-time.md ├── django-group-permissions-markdown.md ├── finding-dupes-by-name-and-distance.md └── recursive-cte-twitter-threads.md ├── sqlite ├── blob-literals.md ├── build-specific-sqlite-pysqlite-macos.md ├── column-combinations.md ├── compare-before-after-json.md ├── comparing-datasets.md ├── compile-spellfix-osx.md ├── compile-sqlite3-rsync.md ├── compile-sqlite3-ubuntu.md ├── copy-tables-between-databases.md ├── counting-vm-ops.md ├── cr-sqlite-macos.md ├── cte-values.md ├── database-file-size.md ├── enabling-wal-mode.md ├── fixing-column-encoding-with-ftfy-and-sqlite-transform.md ├── floating-point-seconds.md ├── function-list.md ├── geopoly.md ├── import-csv.md ├── json-audit-log.md ├── json-extract-path.md ├── lag-window-function.md ├── ld-preload.md ├── list-all-columns-in-a-database.md ├── multiple-indexes.md ├── now-argument-stability.md ├── null-case.md ├── one-line-csv-operations.md ├── ordered-group-concat.md ├── pragma-function-list.md ├── pysqlite3-on-macos.md ├── python-sqlite-environment.md ├── python-sqlite-memory-to-file.md ├── related-content.md ├── related-rows-single-query.md ├── replicating-rqlite.md ├── simple-recursive-cte.md ├── sort-by-number-of-json-intersections.md ├── splitting-commas-sqlite.md ├── sqlite-aggregate-filter-clauses.md ├── sqlite-extensions-python-macos.md ├── sqlite-tg.md ├── sqlite-triggers.md ├── sqlite-vec.md ├── sqlite-version-macos-python.md ├── sqlite-version-websql-chrome.md ├── steampipe.md ├── subqueries-in-select.md ├── substr-instr.md ├── text-value-is-integer-or-float.md ├── track-timestamped-changes-to-a-table.md ├── triggers.py ├── trying-macos-extensions.md ├── unix-timestamp-milliseconds-sqlite.md ├── utc-items-on-thursday-in-pst.md └── vacum-disk-full.md ├── static └── github-light.css ├── svg └── dynamic-line-chart.md ├── tailscale ├── lock-down-sshd.md └── tailscale-github-actions.md ├── templates ├── index.html ├── pages │ ├── all.html │ ├── tools │ │ ├── annotated-presentations.html │ │ ├── aqi.html │ │ ├── byte-size-converter.html │ │ ├── clipboard.html │ │ ├── render-markdown.html │ │ └── resizing-textarea.html │ ├── {topic}.html │ └── {topic} │ │ └── {slug}.html ├── query-tils-search.html └── til_base.html ├── tesseract └── tesseract-cli.md ├── tiktok └── download-all-videos.md ├── twitter ├── birdwatch-sqlite.md ├── collecting-replies.md ├── 
credentials-twitter-bot.md └── export-edit-twitter-spaces.md ├── typescript └── basic-tsc.md ├── update_readme.py ├── valtown └── scheduled.md ├── vega └── bar-chart-ordering.md ├── vim └── mouse-support-in-vim.md ├── vscode ├── language-specific-indentation-settings.md └── vs-code-regular-expressions.md ├── web-components └── understanding-single-file-web-component.md ├── webassembly ├── compile-to-wasm-llvm-macos.md └── python-in-a-wasm-sandbox.md ├── webauthn └── webauthn-browser-support.md ├── wikipedia └── page-stats-api.md ├── yaml └── yamlfmt.md ├── youtube └── livestreaming.md ├── zeit-now ├── python-asgi-on-now-v2.md └── redirecting-all-paths-on-vercel.md └── zsh ├── argument-heredoc.md └── custom-zsh-prompt.md /amplitude/export-events-to-datasette.md: -------------------------------------------------------------------------------- 1 | # Exporting Amplitude events to SQLite 2 | 3 | [Amplitude](https://amplitude.com/) offers an "Export Data" button in the project settings page. This can export up to 365 days of events (up to 4GB per export), where the export is a zip file containing `*.json.gz` gzipped newline-delimited JSON. 4 | 5 | You can export multiple times, so if you have more than a year of events you can export them by specifying different date ranges. It's OK to overlap these ranges as each event has a uniue `uuid` that can be used to de-duplicate them. 6 | 7 | Here's how to import that into a SQLite database using `sqlite-utils`: 8 | ``` 9 | unzip export # The exported file does not have a .zip extension for some reason 10 | cd DIRECTORY_CREATED_FROM_EXPORT 11 | gzcat *.json.gz | sqlite-utils insert amplitude.db events --nl --alter --pk uuid --ignore - 12 | ``` 13 | Since we are using `--pk uuid` and `--ignore` it's safe to run this against as many exported `*.json.gz` files as you like, including exports that overlap each other. 14 | 15 | Then run `datasette amplitude.db` to browse the results. 16 | -------------------------------------------------------------------------------- /aws/boto-command-line.md: -------------------------------------------------------------------------------- 1 | # Using boto3 from the command line 2 | 3 | I found a useful pattern today for automating more complex AWS processes as pastable command line snippets, using [Boto3](https://aws.amazon.com/sdk-for-python/). 4 | 5 | The trick is to take advantage of the fact that `python3 -c '...'` lets you pass in a multi-line Python string which will be executed directly. 
6 | 7 | I used that to create a new IAM role by running the following: 8 | ```bash 9 | python3 -c ' 10 | import boto3, json 11 | 12 | iam = boto3.client("iam") 13 | create_role_response = iam.create_role( 14 | Description=("Description of my role"), 15 | RoleName="my-new-role", 16 | AssumeRolePolicyDocument=json.dumps( 17 | { 18 | "Version": "2012-10-17", 19 | "Statement": [ 20 | { 21 | "Effect": "Allow", 22 | "Principal": { 23 | "AWS": "arn:aws:iam::462092780466:user/s3.read-write.my-previously-created-user" 24 | }, 25 | "Action": "sts:AssumeRole", 26 | } 27 | ], 28 | } 29 | ), 30 | MaxSessionDuration=12 * 60 * 60, 31 | ) 32 | # Attach AmazonS3FullAccess to it - note that even though we use full access 33 | # on the role itself any time we call sts.assume_role() we attach an additional 34 | # policy to ensure reduced access for the temporary credentials 35 | iam.attach_role_policy( 36 | RoleName="my-new-role", 37 | PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess", 38 | ) 39 | print(create_role_response["Role"]["Arn"]) 40 | ' 41 | ``` 42 | -------------------------------------------------------------------------------- /aws/helper-for-boto-aws-pagination.md: -------------------------------------------------------------------------------- 1 | # Helper function for pagination using AWS boto3 2 | 3 | I noticed that a lot of my boto3 code in [s3-credentials](https://github.com/simonw/s3-credentials) looked like this: 4 | 5 | ```python 6 | paginator = iam.get_paginator("list_user_policies") 7 | for response in paginator.paginate(UserName=username): 8 | for policy_name in response["PolicyNames"]: 9 | print(policy_name) 10 | ``` 11 | This was enough verbosity that I was hesitating on implementing pagination properly for some method calls. 12 | 13 | I came up with this helper function to use instead: 14 | 15 | ```python 16 | def paginate(service, method, list_key, **kwargs): 17 | paginator = service.get_paginator(method) 18 | for response in paginator.paginate(**kwargs): 19 | yield from response[list_key] 20 | ``` 21 | Now the above becomes: 22 | ```python 23 | for policy_name in paginate(iam, "list_user_policies", "PolicyNames", UserName=username): 24 | print(policy_name) 25 | ``` 26 | Here's [the issue](https://github.com/simonw/s3-credentials/issues/63) and the [refactoring commit](https://github.com/simonw/s3-credentials/commit/fc1e06ca3ffa2c73e196cffe741ef4e950204240). 27 | -------------------------------------------------------------------------------- /aws/instance-costs-per-month.md: -------------------------------------------------------------------------------- 1 | # Display EC2 instance costs per month 2 | 3 | The [EC2 pricing page](https://aws.amazon.com/ec2/pricing/on-demand/) shows cost per hour, which is pretty much useless. I want cost per month. The following JavaScript, pasted into the browser developer console, modifies the page to show cost-per-month instead. 
4 | 5 | ```javascript 6 | Array.from( 7 | document.querySelectorAll('td') 8 | ).filter( 9 | el => el.textContent.toLowerCase().includes('per hour') 10 | ).forEach( 11 | el => el.textContent = '$' + (parseFloat( 12 | /\d+\.\d+/.exec(el.textContent)[0] 13 | ) * 24 * 30).toFixed(2) + ' per month' 14 | ) 15 | ``` 16 | -------------------------------------------------------------------------------- /azure/all-traffic-to-subdomain.md: -------------------------------------------------------------------------------- 1 | # Writing an Azure Function that serves all traffic to a subdomain 2 | 3 | [Azure Functions](https://docs.microsoft.com/en-us/azure/azure-functions/) default to serving traffic from a path like `/api/FunctionName` - for example `https://your-subdomain.azurewebsites.net/api/MyFunction`. 4 | 5 | If you want to serve an entire website through a single function (e.g. using [Datasette](https://datasette.io/)) you need that function to we called for any traffic to that subdomain. 6 | 7 | Here's how to do that - to capture all traffic to any path under `https://your-subdomain.azurewebsites.net/`. 8 | 9 | First add the following section to your `host.json` file: 10 | 11 | ``` 12 | "extensions": { 13 | "http": { 14 | "routePrefix": "" 15 | } 16 | } 17 | ``` 18 | Then add `"route": "{*route}"` to the `function.json` file for the function that you would like to serve all traffic. Mine ended up looking like this: 19 | ```json 20 | { 21 | "scriptFile": "__init__.py", 22 | "bindings": [ 23 | { 24 | "authLevel": "Anonymous", 25 | "type": "httpTrigger", 26 | "direction": "in", 27 | "name": "req", 28 | "route": "{*route}", 29 | "methods": [ 30 | "get", 31 | "post" 32 | ] 33 | }, 34 | { 35 | "type": "http", 36 | "direction": "out", 37 | "name": "$return" 38 | } 39 | ] 40 | } 41 | ``` 42 | See https://github.com/simonw/azure-functions-datasette for an example that uses this pattern. 43 | -------------------------------------------------------------------------------- /bash/escaping-a-string.md: -------------------------------------------------------------------------------- 1 | # Escaping strings in Bash using !:q 2 | 3 | TIL this trick, [via Pascal Hirsch](https://twitter.com/phphys/status/1311727268398465029) on Twitter. Enter a line of Bash starting with a `#` comment, then run `!:q` on the next line to see what that would be with proper Bash escaping applied. 4 | 5 | ``` 6 | bash-3.2$ # This string 'has single' "and double" quotes and a $ 7 | bash-3.2$ !:q 8 | '# This string '\''has single'\'' "and double" quotes and a $' 9 | bash: # This string 'has single' "and double" quotes and a $: command not found 10 | ``` 11 | How does this work? [James Coglan explains](https://twitter.com/mountain_ghosts/status/1311767073933099010): 12 | 13 | > The `!` character begins a history expansion; `!string` produces the last command beginning with `string`, and `:q` is a modifier that quotes the result; so I'm guessing this is equivalent to `!string` where `string` is `""`, so it produces the most recent command, just like `!!` does 14 | 15 | A bunch more useful tips in the [thread about this on Hacker News](https://news.ycombinator.com/item?id=24659282). 
16 | -------------------------------------------------------------------------------- /bash/finding-bom-csv-files-with-ripgrep.md: -------------------------------------------------------------------------------- 1 | # Finding CSV files that start with a BOM using ripgrep 2 | 3 | For [sqlite-utils issue 250](https://github.com/simonw/sqlite-utils/issues/250) I needed to locate some test CSV files that start with a UTF-8 BOM. 4 | 5 | Here's how I did that using [ripgrep](https://github.com/BurntSushi/ripgrep): 6 | ``` 7 | $ rg --multiline --encoding none '^(?-u:\xEF\xBB\xBF)' --glob '*.csv' . 8 | ``` 9 | The `--multiline` option means the search spans multiple lines - I only want to match entire files that begin with my search term, so this means that `^` will match the start of the file, not the start of individual lines. 10 | 11 | `--encoding none` runs the search against the raw bytes of the file, disabling ripgrep's default BOM detection. 12 | 13 | `--glob '*.csv'` causes ripgrep to search only CSV files. 14 | 15 | The regular expression itself looks like this: 16 | 17 | ^(?-u:\xEF\xBB\xBF) 18 | 19 | This is [rust regex](https://docs.rs/regex/1.5.4/regex/#syntax) syntax. 20 | 21 | `(?-u:` means "turn OFF the `u` flag for the duration of this block" - the `u` flag, which is on by default, causes the Rust regex engine to interpret input as unicode. So within the rest of that `(...)` block we can use escaped byte sequences. 22 | 23 | Finally, `\xEF\xBB\xBF` is the byte sequence for the UTF-8 BOM itself. 24 | -------------------------------------------------------------------------------- /bash/ignore-errors.md: -------------------------------------------------------------------------------- 1 | # Ignoring errors in a section of a Bash script 2 | 3 | For [simonw/museums#32](https://github.com/simonw/museums/issues/32) I wanted to have certain lines in my Bash script ignore any errors: lines that used `sqlite-utils` to add columns and configure FTS, but that might fail with an error if the column already existed or FTS had already been configured. 4 | 5 | [This tip](https://stackoverflow.com/a/60362732) on StackOverflow lead me to the [following recipe](https://github.com/simonw/museums/blob/d94410440a5c81a5cb3a0f0b886a8cd30941b8a9/build.sh): 6 | 7 | ```bash 8 | #!/bin/bash 9 | set -euo pipefail 10 | 11 | yaml-to-sqlite browse.db museums museums.yaml --pk=id 12 | python annotate_nominatum.py browse.db 13 | python annotate_timestamps.py 14 | # Ignore errors in following block until set -e: 15 | set +e 16 | sqlite-utils add-column browse.db museums country 2>/dev/null 17 | sqlite3 browse.db < set-country.sql 18 | sqlite-utils disable-fts browse.db museums 2>/dev/null 19 | sqlite-utils enable-fts browse.db museums \ 20 | name description country osm_city \ 21 | --tokenize porter --create-triggers 2>/dev/null 22 | set -e 23 | ``` 24 | Everything between the `set +e` and the `set -e` lines can now error without the Bash script itself failing. 25 | 26 | The failing lines were still showing a bunch of Python tracebacks. 
I fixed that by redirecting their standard error output to `/dev/null` like this: 27 | ```bash 28 | sqlite-utils disable-fts browse.db museums 2>/dev/null 29 | ``` 30 | -------------------------------------------------------------------------------- /bash/loop-over-csv.md: -------------------------------------------------------------------------------- 1 | # Looping over comma-separated values in Bash 2 | 3 | Given a file (or a process) that produces comma separated values, here's how to split those into separate variables and use them in a bash script. 4 | 5 | The trick is to set the Bash `IFS` to a delimiter, then use `my_array=($my_string)` to split on that delimiter. 6 | 7 | Create a text file called `data.txt` containing this: 8 | ``` 9 | first,1 10 | second,2 11 | ``` 12 | You can create that by doing: 13 | ```bash 14 | echo 'first,1 15 | second,2' > /tmp/data.txt 16 | ``` 17 | To loop over that file and print each line: 18 | ```bash 19 | for line in $(cat /tmp/data.txt); 20 | do 21 | echo $line 22 | done 23 | ``` 24 | To split each line into two separate variables in the loop, do this: 25 | ```bash 26 | for line in $(cat /tmp/data.txt); 27 | do 28 | IFS=$','; split=($line); unset IFS; 29 | # $split is now a bash array 30 | echo "Column 1: ${split[0]}" 31 | echo "Column 2: ${split[1]}" 32 | done 33 | ``` 34 | Outputs: 35 | ``` 36 | Column 1: first 37 | Column 2: 1 38 | Column 1: second 39 | Column 2: 2 40 | ``` 41 | Here's a script I wrote using this technique for the TIL [Use labels on Cloud Run services for a billing breakdown](https://til.simonwillison.net/til/til/cloudrun_use-labels-for-billing-breakdown.md): 42 | ```bash 43 | #!/bin/bash 44 | for line in $( 45 | gcloud run services list --platform=managed \ 46 | --format="csv(SERVICE,REGION)" \ 47 | --filter "NOT metadata.labels.service:*" \ 48 | | tail -n +2) 49 | do 50 | IFS=$','; service_and_region=($line); unset IFS; 51 | service=${service_and_region[0]} 52 | region=${service_and_region[1]} 53 | echo "service: $service region: $region" 54 | gcloud run services update $service \ 55 | --region=$region --platform=managed \ 56 | --update-labels service=$service 57 | echo 58 | done 59 | ``` 60 | -------------------------------------------------------------------------------- /bash/use-awk-to-add-a-prefix.md: -------------------------------------------------------------------------------- 1 | # Using awk to add a prefix 2 | 3 | I wanted to dynamically run the following command against all files in a directory: 4 | 5 | ```bash 6 | pypi-to-sqlite content.db -f /tmp/pypi-datasette-packages/packages/airtable-export.json \ 7 | -f /tmp/pypi-datasette-packages/packages/csv-diff.json \ 8 | --prefix pypi_ 9 | ``` 10 | 11 | I can't use `/tmp/pypi-datasette-packages/packages/*.json` here because I need each file to be processed using the `-f` option. 12 | 13 | I found a solution using `awk`. 
The `awk` program `'{print "-f "$0}'` adds a prefix to the input, for example: 14 | ``` 15 | % echo "blah" | awk '{print "-f "$0}' 16 | -f blah 17 | ``` 18 | I wanted that trailing backslash too, so I used this: 19 | 20 | ```awk 21 | {print "-f "$0 " \\"} 22 | ``` 23 | Piping to `awk` works, so I combined that with `ls ../*.json` like so: 24 | 25 | ``` 26 | % ls /tmp/pypi-datasette-packages/packages/*.json | awk '{print "-f "$0 " \\"}' 27 | -f /tmp/pypi-datasette-packages/packages/airtable-export.json \ 28 | -f /tmp/pypi-datasette-packages/packages/csv-diff.json \ 29 | -f /tmp/pypi-datasette-packages/packages/csvs-to-sqlite.json \ 30 | ``` 31 | Then I used `eval` to execute the command. The full recipe looks like this: 32 | ```bash 33 | args=$(ls /tmp/pypi-datasette-packages/packages/*.json | awk '{print "-f "$0 " \\"}') 34 | eval "pypi-to-sqlite content.db $args 35 | --prefix pypi_" 36 | ``` 37 | Full details in [datasette.io issue 98](https://github.com/simonw/datasette.io/issues/98). 38 | -------------------------------------------------------------------------------- /cloudflare/cloudflare-cache-html.md: -------------------------------------------------------------------------------- 1 | # How to get Cloudflare to cache HTML 2 | 3 | To my surprise, if you setup a [Cloudflare](https://www.cloudflare.com/) caching proxy in front of a website it won't cache HTML pages by default, even if they are served with `cache-control:` headers. 4 | 5 | This is [documented here](https://developers.cloudflare.com/cache/troubleshooting/customize-caching/): 6 | 7 | > Cloudflare does not cache HTML resources automatically. This prevents us from unintentionally caching pages that often contain dynamic elements. 8 | 9 | I figured out how to get caching to work using a "Cache Rule". Here's the rule I added: 10 | 11 | ![Cloudflare Cache Rule interface. Rule name: Cache everything including HTML. When incoming requests match… hostname contains .datasette.site. Expression preview: (http.host contains ".datasette.site"). Then... Cache Elegibility is set to Eligible for cache. Edge TTL is set to Use cache-control header if present, bypass cache if not.](https://static.simonwillison.net/static/2024/cloudflare-cache-rule.jpg) 12 | 13 | I've told it that for any incoming request with a hostname containing `.datasette.site` (see [background in my weeknotes](https://simonwillison.net/2024/Jan/7/page-caching-and-custom-templates-for-datasette-cloud/)) it should consider that page eligible for caching, and it should respect the `cache-control` header. 14 | 15 | With this configuration in place, my backend can now serve headers that look like this: 16 | 17 | `cache-control: s-maxage=15` 18 | 19 | This will cause Cloudflare to cache the page for 15 seconds. 20 | 21 | I tried to figure out a rule that would serve all requests no matter what they looked like, but the interface would not let me leave the rules blank - so `hostname contains .datasette.site` was the best I could figure out. 22 | -------------------------------------------------------------------------------- /cloudflare/redirect-rules.md: -------------------------------------------------------------------------------- 1 | # Cloudflare redirect rules with dynamic expressions 2 | 3 | I wanted to ensure `https://niche-museums.com/` would redirect to `https://www.niche-museums.com/` - including any path - using Cloudflare. 
4 | 5 | I've [solved this with page rules in the past](https://til.simonwillison.net/cloudflare/redirect-whole-domain), but this time I tried using a "redirect rule" instead. 6 | 7 | Creating a redirect rule that only fires for hits to the `niche-museums.com` (as opposed to `www.niche-museums.com`) hostname was easy. The harder part was figuring out how to assemble the URL. 8 | 9 | I eventually found the clues I needed [in this Cloudflare blog post](https://blog.cloudflare.com/dynamic-redirect-rules). The trick is to assemble a "dynamic" URL redirect using the `concat()` function in the Cloudflare expression language, [described here](https://developers.cloudflare.com/ruleset-engine/rules-language/functions/#transformation-functions). 10 | 11 | concat("https://www.niche-museums.com", http.request.uri) 12 | 13 | Here are the full configuration settings I used for my redirect rule: 14 | 15 | ![Configuration screen for setting up a custom URL redirection rule. Custom filter expression is selected, indicating the rule will only apply to traffic matching the custom expression. When incoming requests match... Field: Hostname, Operator: Equals, Value: niche-museums.com. Expression Preview: (http.host eq "niche-museums.com"). Then... URL redirect, Type: Dynamic, Expression: concat("https://www.niche-museums.com", http.request.uri), Status code: 301 (permanent redirect). Preserve query string: This option is unchecked. Buttons: Cancel, Save as Draft, Deploy.](https://static.simonwillison.net/static/2024/cloudflare-redirect-rule.jpg) 16 | -------------------------------------------------------------------------------- /cloudflare/redirect-whole-domain.md: -------------------------------------------------------------------------------- 1 | # Redirecting a whole domain with Cloudflare 2 | 3 | I had to run this site on `til.simonwillison.org` for 24 hours due to a domain registration mistake I made. 4 | 5 | Once I got `til.simonwillison.net` working again I wanted to permanently redirect the URLs on the temporary site to their equivalent on the correct domain. 6 | 7 | Since I was running the site behind [Cloudflare](https://www.cloudflare.com/), I could get Cloudflare to handle the redirects for me using a Page Rule, which support wildcards for redirects. 8 | 9 | I used these settings: 10 | 11 | - URL: `til.simonwillison.org/*` 12 | - Setting: Forwarding URL 13 | - Status code: 301 (permanent redirect) 14 | - Destination URL: `https://til.simonwillison.net/$1` 15 | 16 | This did the right thing - hits to e.g. https://til.simonwillison.org/cloudflare?a=1 redirect to https://til.simonwillison.net/cloudflare?a=1 17 | 18 | Here's a screenshot of the settings page I used to create the new Page Rule: 19 | 20 | ![Screenshot of the Cloudflare interface. Create a Page Rule for simonwillison.org. If the URL matches: URL (required) til.simonwillison.org/* Then the settings are: Forwarding URL https://til.simonwillison.net/$1 Select status code (required) 301 - Permanent Redirect. 
Save and Deploy Page Rule](https://github.com/simonw/til/assets/9599/6758a865-57fa-4da1-9e41-118f41e1d7b2) 21 | -------------------------------------------------------------------------------- /cloudrun/billing-metrics-explorer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/cloudrun/billing-metrics-explorer.png -------------------------------------------------------------------------------- /cloudrun/multiple-gcloud-accounts.md: -------------------------------------------------------------------------------- 1 | # Switching between gcloud accounts 2 | 3 | I have two different Google Cloud accounts active at the moment. Here's how to list them with `gcloud auth list`: 4 | 5 | ``` 6 | % gcloud auth list 7 | Credentialed Accounts 8 | ACTIVE ACCOUNT 9 | simon@example.com 10 | * me@gmail.com 11 | 12 | To set the active account, run: 13 | $ gcloud config set account `ACCOUNT` 14 | ``` 15 | And to switch between them with `gcloud config set account`: 16 | 17 | ``` 18 | % gcloud config set account me@gmail.com 19 | Updated property [core/account]. 20 | ``` 21 | -------------------------------------------------------------------------------- /cloudrun/use-labels-for-billing-breakdown-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/cloudrun/use-labels-for-billing-breakdown-1.png -------------------------------------------------------------------------------- /cloudrun/use-labels-for-billing-breakdown-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/cloudrun/use-labels-for-billing-breakdown-2.png -------------------------------------------------------------------------------- /cookiecutter/conditionally-creating-directories.md: -------------------------------------------------------------------------------- 1 | # Conditionally creating directories in cookiecutter 2 | 3 | I wanted my [datasette-plugin](https://github.com/simonw/datasette-plugin) cookiecutter template to create empty `static` and `templates` directories if the user replied `y` to the `include_static_directory` and `include_templates_directory` prompts. 4 | 5 | The solution was to add a `hooks/post_gen_project.py` script containing the following: 6 | 7 | ```python 8 | import os 9 | import shutil 10 | 11 | 12 | include_static_directory = bool("{{ cookiecutter.include_static_directory }}") 13 | include_templates_directory = bool("{{ cookiecutter.include_templates_directory }}") 14 | 15 | 16 | if include_static_directory: 17 | os.makedirs( 18 | os.path.join( 19 | os.getcwd(), 20 | "datasette_{{ cookiecutter.underscored }}", 21 | "static", 22 | ) 23 | ) 24 | 25 | 26 | if include_templates_directory: 27 | os.makedirs( 28 | os.path.join( 29 | os.getcwd(), 30 | "datasette_{{ cookiecutter.underscored }}", 31 | "templates", 32 | ) 33 | ) 34 | ``` 35 | 36 | Note that these scripts are run through the cookiecutter Jinja template system, so they can use `{{ }}` Jinja syntax to read cookiecutter inputs. 
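Because the hook file is rendered through Jinja before it runs, the same logic can also be written more compactly. Here is an equivalent sketch of the hook above (same cookiecutter variables, same behavior - a directory is only created when the corresponding prompt produced a non-empty answer):

```python
import os

package_dir = os.path.join(
    os.getcwd(), "datasette_{{ cookiecutter.underscored }}"
)

# bool("") is False, so each directory is only created if its prompt was answered
for wanted, name in (
    (bool("{{ cookiecutter.include_static_directory }}"), "static"),
    (bool("{{ cookiecutter.include_templates_directory }}"), "templates"),
):
    if wanted:
        os.makedirs(os.path.join(package_dir, name))
```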
37 | -------------------------------------------------------------------------------- /datasette/datasette-on-replit.md: -------------------------------------------------------------------------------- 1 | # Running Datasette on Replit 2 | 3 | I figured out how to run Datasette on https://replit.com/ 4 | 5 | The trick is to start a new Python project and then drop the following into the `main.py` file: 6 | 7 | ```python 8 | import uvicorn 9 | from datasette.app import Datasette 10 | 11 | ds = Datasette(memory=True, files=[]) 12 | 13 | 14 | if __name__ == "__main__": 15 | uvicorn.run(ds.app(), host="0.0.0.0", port=8000) 16 | ``` 17 | Replit is smart enough to automatically create a `pyproject.toml` file with `datasette` and `uvicorn` as dependencies. It will also notice that the application is running on port 8000 and set `https://name-of-prject.your-username.repl.co` to proxy to that port. Plus it will restart the server any time it recieves new traffic (and pause it in between groups of requests). 18 | 19 | To serve a database file, download it using `wget` in the Replit console and add it to the `files=[]` argument. I ran this: 20 | 21 | wget https://datasette.io/content.db 22 | 23 | Then changed that first line to: 24 | 25 | ```python 26 | ds = Datasette(files=["content.db"]) 27 | ``` 28 | And restarted the server. 29 | -------------------------------------------------------------------------------- /datasette/redirects-for-datasette.md: -------------------------------------------------------------------------------- 1 | # Redirects for Datasette 2 | 3 | I made some changes to my https://til.simonwillison.net/ site that resulted in cleaner URL designs, so I needed to setup some redirects. I configured the redirects using a one-off Datasette plugin called `redirects.py` which I dropped into the `plugins/` directory for the Datasette instance: 4 | 5 | ```python 6 | from datasette import hookimpl 7 | from datasette.utils.asgi import Response 8 | 9 | 10 | @hookimpl 11 | def register_routes(): 12 | return ( 13 | (r"^/til/til/(?P[^_]+)_(?P[^\.]+)\.md$", lambda request: Response.redirect( 14 | "/{topic}/{slug}".format(**request.url_vars), status=301 15 | )), 16 | ("^/til/feed.atom$", lambda: Response.redirect("/tils/feed.atom", status=301)), 17 | ( 18 | "^/til/search$", 19 | lambda request: Response.redirect( 20 | "/tils/search" 21 | + (("?" + request.query_string) if request.query_string else ""), 22 | status=301, 23 | ), 24 | ), 25 | ) 26 | ``` 27 | -------------------------------------------------------------------------------- /datasette/register-new-plugin-hooks.md: -------------------------------------------------------------------------------- 1 | # Registering new Datasette plugin hooks by defining them in other plugins 2 | 3 | I'm experimenting with a Datasette plugin that itself adds new plugin hooks which other plugins can then interact with. 4 | 5 | It's called [datasette-low-disk-space-hook](https://github.com/simonw/datasette-low-disk-space-hook), and it adds a new plugin hook called `low_disk_space(datasette)`, defined in the [datasette_low_disk_space_hook/hookspecs.py](https://github.com/simonw/datasette-low-disk-space-hook/blob/0.1a0/datasette_low_disk_space_hook/hookspecs.py) module. 
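A hookspec module for a new Datasette hook is essentially a function signature decorated with pluggy's `HookspecMarker`, using the same project name ("datasette") as Datasette's own plugin manager. A rough sketch of what such a module looks like (the real file may differ in its details):

```python
from pluggy import HookspecMarker

hookspec = HookspecMarker("datasette")


@hookspec
def low_disk_space(datasette):
    """The new low_disk_space plugin hook - implementations receive the Datasette instance"""
```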
6 | 7 | The hook is registered by this code in [datasette_low_disk_space_hook/\_\_init\_\_.py](https://github.com/simonw/datasette-low-disk-space-hook/blob/0.1a0/datasette_low_disk_space_hook/__init__.py) 8 | 9 | ```python 10 | from datasette.utils import await_me_maybe 11 | from datasette.plugins import pm 12 | from . import hookspecs 13 | 14 | pm.add_hookspecs(hookspecs) 15 | ``` 16 | This imports the plugin manager directly from Datasette and uses it to add the new hooks. 17 | 18 | I was worried that the `pm.add_hookspects(hookspecs)` line was not guaranteed to be executed if that module had not been imported. 19 | 20 | It turns out that having this `entrpoints=` line in [setup.py](https://github.com/simonw/datasette-low-disk-space-hook/blob/0.1a0/setup.py) is enough to ensure that the module is imported and the `pm.add_hookspecs()` line is executed: 21 | 22 | ```python 23 | from setuptools import setup 24 | 25 | setup( 26 | name="datasette-low-disk-space-hook", 27 | # ... 28 | entry_points={"datasette": ["low_disk_space_hook = datasette_low_disk_space_hook"]}, 29 | # ... 30 | ) 31 | ``` 32 | -------------------------------------------------------------------------------- /datasette/reuse-click-for-register-commands.md: -------------------------------------------------------------------------------- 1 | # Reusing an existing Click tool with register_commands 2 | 3 | The [register_commands](https://docs.datasette.io/en/stable/plugin_hooks.html#register-commands-cli) plugin hook lets you add extra sub-commands to the `datasette` CLI tool. 4 | 5 | I have a lot of existing tools that I'd like to also make available as plugins. I figured out this pattern for my [git-history](https://datasette.io/tools/git-history) tool today: 6 | 7 | ```python 8 | from datasette import hookimpl 9 | from git_history.cli import cli as git_history_cli 10 | 11 | @hookimpl 12 | def register_commands(cli): 13 | cli.add_command(git_history_cli, name="git-history") 14 | ``` 15 | Now I can run the following: 16 | 17 | ``` 18 | % datasette git-history --help 19 | Usage: datasette git-history [OPTIONS] COMMAND [ARGS]... 20 | 21 | Tools for analyzing Git history using SQLite 22 | 23 | Options: 24 | --version Show the version and exit. 25 | --help Show this message and exit. 26 | 27 | Commands: 28 | file Analyze the history of a specific file and write it to SQLite 29 | ``` 30 | 31 | I initially tried doing this: 32 | 33 | ```python 34 | @hookimpl 35 | def register_commands(cli): 36 | cli.command(name="git-history")(git_history_file) 37 | ``` 38 | But got the following error: 39 | 40 | TypeError: Attempted to convert a callback into a command twice. 41 | 42 | Using [cli.add_command()](https://click.palletsprojects.com/en/8.0.x/api/?highlight=add_command#click.Group.add_command) turns out to be the right way to do this. 43 | 44 | Research issue for this: [datasette#1538](https://github.com/simonw/datasette/issues/1538). 45 | -------------------------------------------------------------------------------- /datasette/serving-mbtiles.md: -------------------------------------------------------------------------------- 1 | # Serving MBTiles with datasette-media 2 | 3 | The [MBTiles](https://github.com/mapbox/mbtiles-spec) format uses SQLite to bundle map tiles for use with libraries such as Leaflet. 
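Because an MBTiles file is just a SQLite database, you can inspect one with Python's `sqlite3` module. A quick sketch (assuming the `San_Francisco.mbtiles` file linked below has been downloaded to the working directory) that lists a few rows from the `tiles` table queried later in this note:

```python
import sqlite3

conn = sqlite3.connect("San_Francisco.mbtiles")
# Tiles are stored in a "tiles" table keyed on zoom_level, tile_column and
# tile_row, with the image bytes in the tile_data column
for zoom_level, tile_column, tile_row, num_bytes in conn.execute(
    "select zoom_level, tile_column, tile_row, length(tile_data) from tiles limit 5"
):
    print(zoom_level, tile_column, tile_row, num_bytes, "bytes")
```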
4 | 5 | I figured out how to use the [datasette-media](https://datasette.io/plugins/datasette-media) to serve tiles from this MBTiles file containing two zoom levels of tiles for San Francisco: https://static.simonwillison.net/static/2021/San_Francisco.mbtiles 6 | 7 | This TIL is now entirely obsolete: I used this prototype to build the new [datasette-tiles](https://datasette.io/plugins/datasette-tiles) plugin. 8 | 9 | ```yaml 10 | plugins: 11 | datasette-cluster-map: 12 | tile_layer: "/-/media/tiles/{z},{x},{y}" 13 | tile_layer_options: 14 | attribution: "© OpenStreetMap contributors" 15 | tms: 1 16 | bounds: [[37.61746256103807, -122.57290320721465],[37.85395101481279, -122.27695899334748]] 17 | minZoom: 15 18 | maxZoom: 16 19 | datasette-media: 20 | tiles: 21 | database: San_Francisco 22 | sql: 23 | with comma_locations as ( 24 | select instr(:key, ',') as first_comma, 25 | instr(:key, ',') + instr(substr(:key, instr(:key, ',') + 1), ',') as second_comma 26 | ), 27 | variables as ( 28 | select 29 | substr(:key, 0, first_comma) as z, 30 | substr(:key, first_comma + 1, second_comma - first_comma - 1) as x, 31 | substr(:key, second_comma + 1) as y 32 | from comma_locations 33 | ) 34 | select 35 | tile_data as content, 36 | 'image/png' as content_type 37 | from 38 | tiles, variables 39 | where 40 | zoom_level = variables.z 41 | and tile_column = variables.x 42 | and tile_row = variables.y 43 | ``` 44 | -------------------------------------------------------------------------------- /django/efficient-bulk-deletions-in-django.md: -------------------------------------------------------------------------------- 1 | # Efficient bulk deletions in Django 2 | 3 | I needed to bulk-delete a large number of objects today. Django deletions are relatively inefficient by default, because Django implements its own version of cascading deletions and fires signals for each deleted object. 4 | 5 | I knew that I wanted to avoid both of these and run a bulk `DELETE` SQL operation. 6 | 7 | Django has an undocumented `queryset._raw_delete(db_connection)` method that can do this: 8 | 9 | ```python 10 | reports_qs = Report.objects.filter(public_id__in=report_ids) 11 | reports_qs._raw_delete(reports_qs.db) 12 | ``` 13 | But this failed for me, because my `Report` object has a many-to-many relationship with another table - and those records were not deleted. 14 | 15 | I could have hand-crafted a PostgreSQL cascading delete here, but I instead decided to manually delete those many-to-many records first. Here's what that looked like: 16 | 17 | ```python 18 | report_availability_tag_qs = ( 19 | Report.availability_tags.through.objects.filter( 20 | report__public_id__in=report_ids 21 | ) 22 | ) 23 | report_availability_tag_qs._raw_delete(report_availability_tag_qs.db) 24 | ``` 25 | This didn't quite work either, because I have another model `Location` with foreign key references to those reports. So I added this: 26 | ```python 27 | Location.objects.filter(latest_report__public_id__in=report_ids).update( 28 | latest_report=None 29 | ) 30 | ``` 31 | That combination worked! The Django debug toolbar confirmed that this executed one `UPDATE` followed by two efficient bulk `DELETE` operations. 
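Putting the pieces together, the order that ended up working was: null out the foreign key references, bulk delete the many-to-many rows, then raw-delete the reports themselves. As a single sketch, using the same models and `report_ids` list as above:

```python
# 1. Clear foreign key references from Location (one UPDATE)
Location.objects.filter(latest_report__public_id__in=report_ids).update(
    latest_report=None
)

# 2. Bulk delete the many-to-many join rows (one DELETE)
m2m_qs = Report.availability_tags.through.objects.filter(
    report__public_id__in=report_ids
)
m2m_qs._raw_delete(m2m_qs.db)

# 3. Bulk delete the reports themselves (one DELETE)
reports_qs = Report.objects.filter(public_id__in=report_ids)
reports_qs._raw_delete(reports_qs.db)
```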
32 | -------------------------------------------------------------------------------- /django/extra-read-only-admin-information.md: -------------------------------------------------------------------------------- 1 | # Adding extra read-only information to a Django admin change page 2 | 3 | I figured out this pattern today for adding templated extra blocks of information to the Django admin change page for an object. 4 | 5 | It's really simply and incredibly useful. I can see myself using this a lot in the future. 6 | 7 | ```python 8 | from django.contrib import admin 9 | from django.template.loader import render_to_string 10 | from django.utils.safestring import mark_safe 11 | from .models import Reporter 12 | 13 | 14 | @admin.register(Reporter) 15 | class ReporterAdmin(admin.ModelAdmin): 16 | # ... 17 | readonly_fields = ("recent_calls",) 18 | 19 | def recent_calls(self, instance): 20 | return mark_safe( 21 | render_to_string( 22 | "admin/_reporter_recent_calls.html", 23 | { 24 | "reporter": instance, 25 | "recent_calls": instance.call_reports.order_by("-created_at")[:20], 26 | "call_count": instance.call_reports.count(), 27 | }, 28 | ) 29 | ) 30 | ``` 31 | 32 | That's it! `recent_calls` is marked as a read-only field, then implemented as a method which returns HTML. That method passes the instance to a template using `render_to_string`. That template looks like this: 33 | 34 | ```html+jinja 35 |

<p>{{ reporter }} has made {{ call_count }} call{{ call_count|pluralize }}</p> 36 | 37 | <p>Recent calls (view all)</p> 38 | 39 | {% for call in recent_calls %} 40 | <p>{{ call.location }} on {{ call.created_at }}</p>
41 | {% endfor %} 42 | ``` 43 | -------------------------------------------------------------------------------- /django/migration-postgresql-fuzzystrmatch.md: -------------------------------------------------------------------------------- 1 | # Enabling the fuzzystrmatch extension in PostgreSQL with a Django migration 2 | 3 | The PostgreSQL [fuzzystrmatch extension](https://www.postgresql.org/docs/13/fuzzystrmatch.html) enables several functions for fuzzy string matching: `soundex()`, `difference()`, `levenshtein()`, `levenshtein_less_equal()`, `metaphone()`, `dmetaphone()` and `dmetaphone_alt()`. 4 | 5 | Enabling them for use with Django turns out to be really easy - it just takes a migration that looks something like this: 6 | 7 | ```python 8 | from django.contrib.postgres.operations import CreateExtension 9 | from django.db import migrations 10 | 11 | 12 | class Migration(migrations.Migration): 13 | 14 | dependencies = [ 15 | ("core", "0089_importrun_sourcelocation"), 16 | ] 17 | 18 | operations = [ 19 | CreateExtension(name="fuzzystrmatch"), 20 | ] 21 | ``` 22 | -------------------------------------------------------------------------------- /django/migrations-runsql-noop.md: -------------------------------------------------------------------------------- 1 | # migrations.RunSQL.noop for reversible SQL migrations 2 | 3 | `migrations.RunSQL.noop` provides an easy way to create "reversible" Django SQL migrations, where the reverse operation does nothing (but keeps it possible to reverse back to a previous migration state without being blocked by an irreversible migration). 4 | 5 | ```python 6 | from django.db import migrations 7 | 8 | 9 | class Migration(migrations.Migration): 10 | 11 | dependencies = [ 12 | ("app", "0114_last_migration"), 13 | ] 14 | 15 | operations = [ 16 | migrations.RunSQL( 17 | sql=""" 18 | update concordance_identifier 19 | set authority = replace(authority, ':', '_') 20 | where authority like '%:%' 21 | """, 22 | reverse_sql=migrations.RunSQL.noop, 23 | ) 24 | ] 25 | ``` 26 | -------------------------------------------------------------------------------- /django/postgresql-full-text-search-admin.md: -------------------------------------------------------------------------------- 1 | # PostgreSQL full-text search in the Django Admin 2 | 3 | Django 3.1 introduces PostgreSQL `search_type="websearch"` - which gives you search with advanced operators like `"phrase search" -excluding`. James Turk [wrote about this here](https://jamesturk.net/posts/websearch-in-django-31/), and it's also in [my weeknotes](https://simonwillison.net/2020/Jul/23/datasette-copyable-datasette-insert-api/). 4 | 5 | I decided to add it to my Django Admin interface. It was _really easy_ using the `get_search_results()` model admin method, [documented here](https://docs.djangoproject.com/en/3.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.get_search_results). 6 | 7 | My models already have a `search_document` full-text search column, as described in [Implementing faceted search with Django and PostgreSQL](https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/). 
So all I needed to add to my `ModelAdmin` subclasses was this: 8 | 9 | ```python 10 | def get_search_results(self, request, queryset, search_term): 11 | if not search_term: 12 | return super().get_search_results( 13 | request, queryset, search_term 14 | ) 15 | query = SearchQuery(search_term, search_type="websearch") 16 | rank = SearchRank(F("search_document"), query) 17 | queryset = ( 18 | queryset 19 | .annotate(rank=rank) 20 | .filter(search_document=query) 21 | .order_by("-rank") 22 | ) 23 | return queryset, False 24 | ``` 25 | Here's [the full implementation](https://github.com/simonw/simonwillisonblog/blob/6c0de9f9976ef831fe92106be662d77dfe80b32a/blog/admin.py) for my personal blog. 26 | -------------------------------------------------------------------------------- /django/pretty-print-json-admin.md: -------------------------------------------------------------------------------- 1 | # Pretty-printing all read-only JSON in the Django admin 2 | 3 | I have a bunch of models with JSON fields that are marked as read-only in the Django admin - usually because they're recording the raw JSON that was imported from an API somewhere to create an object, for debugging purposes. 4 | 5 | Here's a pattern I found for pretty-printing ANY JSON value that is displayed in a read-only field in the admin. Create a template called `admin/change_form.html` and populate it with the following: 6 | 7 | ```html+django 8 | {% extends "admin/change_form.html" %} 9 | {% block admin_change_form_document_ready %} 10 | {{ block.super }} 11 | 26 | {% endblock %} 27 | ``` 28 | This JavaScript will execute on every Django change form page, scanning for `div.readonly`, checking to see if the div contains a valid JSON value and pretty-printing it using JavaScript if it does. 29 | 30 | It's a cheap hack and it works great. 31 | -------------------------------------------------------------------------------- /docker/attach-bash-to-running-container.md: -------------------------------------------------------------------------------- 1 | # Attaching a bash shell to a running Docker container 2 | 3 | Use `docker ps` to find the container ID: 4 | 5 | $ docker ps 6 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7 | 81b2ad3194cb alexdebrie/livegrep-base:1 "/livegrep-github-re…" 2 minutes ago Up 2 minutes compassionate_yalow 8 | 9 | Run `docker exec -it ID bash` to start a bash session in that container: 10 | 11 | $ docker exec -it 81b2ad3194cb bash 12 | 13 | I made the mistake of using `docker attach 81b2ad3194cb` first, which attaches you to the command running as CMD in that conatiner, and means that if you hit `Ctrl+C` you exit that command and terminate the container! 14 | -------------------------------------------------------------------------------- /docker/gdb-python-docker.md: -------------------------------------------------------------------------------- 1 | # Running gdb against a Python process in a running Docker container 2 | 3 | While investigating [Datasette issue #1268](https://github.com/simonw/datasette/issues/1268) I found myself with a Python process that was hanging, and I decided to try running `gdb` against it based on tips in [Debugging of CPython processes with gdb](https://www.podoliaka.org/2016/04/10/debugging-cpython-gdb/) 4 | 5 | Here's the recipe that worked: 6 | 7 | 1. Find the Docker container ID using `docker ps` - in my case it was `16197781a7b5` 8 | 2. 
Attach a new bash shell to that process in privileged mode (needed to get `gdb` to work): `docker exec --privileged -it 16197781a7b5 bash` 9 | 3. Install `gdb` and the Python tooling for using it: `apt-get install gdb python3-dbg` 10 | 4. Use `top` to find the pid of the running Python process that was hanging. It was `20` for me. 11 | 5. Run `gdb /usr/bin/python3 -p 20` to launch `gdb` against that process 12 | 6. In the `(gdb)` prompt run `py-bt` to see a backtrace. 13 | 14 | I'm sure there's lots more that can be done in `gdb` at this point, but that's how I got to a place where I could interact with the Python process that was running in the Docker container. 15 | -------------------------------------------------------------------------------- /docker/test-fedora-in-docker.md: -------------------------------------------------------------------------------- 1 | # Testing things in Fedora using Docker 2 | 3 | I got [a report](https://twitter.com/peterjanes/status/1552407491819884544) of a bug with my [s3-ocr tool](https://simonwillison.net/2022/Jun/30/s3-ocr/) running on Fedora. 4 | 5 | I attempted to replicate the bug in a Fedora container using Docker, by running this command: 6 | 7 | ``` 8 | docker run -it fedora:latest /bin/bash 9 | ``` 10 | This downloaded [the official image](https://hub.docker.com/_/fedora) and dropped me into a Bash shell. 11 | 12 | It turns out Fedora won't let you run `pip install` with its default Python 3 without first creating a virtual environment: 13 | 14 | ``` 15 | [root@d1146e0061d1 /]# python3 -m pip install s3-ocr 16 | /usr/bin/python3: No module named pip 17 | [root@d1146e0061d1 /]# python3 -m venv project_venv 18 | [root@d1146e0061d1 /]# source project_venv/bin/activate 19 | (project_venv) [root@d1146e0061d1 /]# python -m pip install s3-ocr 20 | Collecting s3-ocr 21 | Downloading s3_ocr-0.5-py3-none-any.whl (14 kB) 22 | Collecting sqlite-utils 23 | ... 24 | ``` 25 | Having done that I could test out my `s3-ocr` command like so: 26 | 27 | ``` 28 | (project_venv) [root@d1146e0061d1 /]# s3-ocr start --help 29 | Usage: s3-ocr start [OPTIONS] BUCKET [KEYS]... 30 | 31 | Start OCR tasks for PDF files in an S3 bucket 32 | 33 | s3-ocr start name-of-bucket path/to/one.pdf path/to/two.pdf 34 | ... 35 | ``` 36 | -------------------------------------------------------------------------------- /electron/electron-debugger-console.md: -------------------------------------------------------------------------------- 1 | # Using the Chrome DevTools console as a REPL for an Electron app 2 | 3 | I figured out how to use the Chrome DevTools to execute JavaScript interactively inside the Electron main process. I always like having a REPL for exploring APIs, and this means I can explore the Electron and Node.js APIs interactively. 4 | 5 | Simon_Willison’s_Weblog_and_DevTools_-_Node_js_and_Inspect_with_Chrome_Developer_Tools 6 | 7 | https://www.electronjs.org/docs/tutorial/debugging-main-process#--inspectport says you need to run: 8 | 9 | electron --inspect=5858 your/app 10 | 11 | I start Electron by running `npm start`, so I modified my `package.json` to include this: 12 | 13 | ```json 14 | "scripts": { 15 | "start": "electron --inspect=5858 ." 16 | ``` 17 | Then I ran `npm start`. 18 | 19 | To connect the debugger, open Google Chrome and visit `chrome://inspect/` - then click the "Open dedicated DevTools for Node" link. 
20 | 21 | In that window, select the "Connection" tab and add a connection to `localhost:5858`: 22 | 23 | Screenshot of the DevTools Connection tab 24 | 25 | Switch back to the "Console" tab and you can start interacting with the Electron environment. 26 | 27 | I tried this and it worked: 28 | 29 | ```javascript 30 | const { app, Menu, BrowserWindow, dialog } = require("electron"); 31 | new BrowserWindow({height: 100, width: 100}).loadURL("https://simonwillison.net/"); 32 | ``` 33 | -------------------------------------------------------------------------------- /electron/electron-external-links-system-browser.md: -------------------------------------------------------------------------------- 1 | # Open external links in an Electron app using the system browser 2 | 3 | For [Datasette.app](https://github.com/simonw/datasette-app) I wanted to ensure that links to external URLs would [open in the system browser](https://github.com/simonw/datasette-app/issues/34). 4 | 5 | This recipe works: 6 | 7 | ```javascript 8 | function postConfigure(window) { 9 | window.webContents.on("will-navigate", function (event, reqUrl) { 10 | let requestedHost = new URL(reqUrl).host; 11 | let currentHost = new URL(window.webContents.getURL()).host; 12 | if (requestedHost && requestedHost != currentHost) { 13 | event.preventDefault(); 14 | shell.openExternal(reqUrl); 15 | } 16 | }); 17 | } 18 | ``` 19 | The `will-navigate` event fires before any in-browser navigations, which means they can be intercepted and cancelled if necessary. 20 | 21 | I use the `URL()` class to extract the `.host` so I can check if the host being navigated to differs from the host that the application is running against (which is probably `localhost:$port`). 22 | 23 | Initially I was using `require('url').URL` for this but that doesn't appear to be necessary - Node.js ships with `URL` as a top-level class these days. 24 | 25 | `event.preventDefault()` cancels the navigation and `shell.openExternal(reqUrl)` opens the URL using the system default browser. 26 | 27 | I call this function on any new window I create using `new BrowserWindow` - for example: 28 | 29 | ```javascript 30 | mainWindow = new BrowserWindow({ 31 | width: 800, 32 | height: 600, 33 | show: false, 34 | }); 35 | mainWindow.loadFile("loading.html"); 36 | mainWindow.once("ready-to-show", () => { 37 | mainWindow.show(); 38 | }); 39 | postConfigure(mainWindow); 40 | ``` 41 | 42 | -------------------------------------------------------------------------------- /firefox/search-across-all-resources-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/firefox/search-across-all-resources-2.jpg -------------------------------------------------------------------------------- /firefox/search-across-all-resources.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/firefox/search-across-all-resources.jpg -------------------------------------------------------------------------------- /firefox/search-across-all-resources.md: -------------------------------------------------------------------------------- 1 | # Search across all loaded resources in Firefox 2 | 3 | You can search for a string in any resource loaded by a page (including across HTML, JavaScript and CSS) in the Debugger pane by hitting Command+Shift+F.
4 | 5 | Screenshot of search interface 6 | 7 | This view doesn't search the body of any JSON assets that were fetched by code, presumably because JSON isn't automatically loaded into memory by the browser. 8 | 9 | But ([thanks, @digitarald](https://twitter.com/digitarald/status/1257748744352567296)) the Network pane DOES let you search for content in assets fetched via Ajax/fetch() etc - though you do have to run the search before you trigger the requests that the search should cover. Again, the shortcut is Command+Shift+F. 10 | 11 | Screenshot of search interface 12 | -------------------------------------------------------------------------------- /fly/scp.md: -------------------------------------------------------------------------------- 1 | # How to scp files to and from Fly 2 | 3 | I have a Fly instance with a 20GB volume, and I wanted to copy files to and from the instance from my computer using `scp`. 4 | 5 | Here's the process that worked for me. 6 | 7 | 1. Connect to Fly's WireGuard network. Fly have [step by step instructions](https://fly.io/docs/reference/private-networking/#step-by-step) for this - you need to install a WireGuard app (I used the [official WireGuard macOS app](https://www.wireguard.com/install/)) and use the `fly wireguard create` command to configure it. 8 | 2. Generate 24 hour limited SSH credentials for your Fly organization: Run `fly ssh issue`, follow the prompt to select your organization and then tell it where to put the credentials. I saved them to `/tmp/fly` since they will only work for 24 hours. 9 | 3. Find the IPv6 private address for the instance you want to connect to. My instance is in the `laion-aesthetic` application so I did this by running: `fly ips private -a laion-aesthetic` 10 | 4. If the image you used to build the instance doesn't have `scp` installed you'll need to install it. On Ubuntu or Debian machines you can do that by attaching using `fly ssh console -a name-of-app` and then running `apt-get update && apt-get install openssh-client -y`. Any time you restart the container you'll have to run this step again, so if you're going to do it often you should instead update the image you are using to include this package. 11 | 5. Run the `scp` like this: `scp -i /tmp/fly root@\[fdaa:0:4ef:a7b:ad0:1:9c23:2\]:/data/data.db /tmp` - note how the IPv6 address is enclosed in `\[...\]`. 12 | -------------------------------------------------------------------------------- /gis/mapzen-elevation-tiles.md: -------------------------------------------------------------------------------- 1 | # Downloading MapZen elevation tiles 2 | 3 | [Via Tony Hirst](https://twitter.com/psychemedia/status/1357280624319553537) I found out about [MapZen's elevation tiles](https://www.mapzen.com/blog/terrain-tile-service/), which encode elevation data in PNG and other formats.
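As an aside, the terrarium tiles pack elevation into the red, green and blue channels - the documented encoding is `elevation = (red * 256 + green + blue / 256) - 32768`. Here's a rough Python sketch (using Pillow, with a made-up local tile filename) of decoding one tile back into metres:

```python
from PIL import Image  # pip install pillow


def decode_terrarium(path):
    """Return a 2D list of elevations (in metres) for a terrarium PNG tile."""
    img = Image.open(path).convert("RGB")
    width, height = img.size
    pixels = img.load()
    return [
        [
            (pixels[x, y][0] * 256 + pixels[x, y][1] + pixels[x, y][2] / 256) - 32768
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical usage against a tile saved from the terrarium endpoint:
# elevations = decode_terrarium("terrarium-4-2-5.png")
```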
4 | 5 | These days they live at https://registry.opendata.aws/terrain-tiles/ 6 | 7 | I managed to download a subset of them using [download-tiles](https://datasette.io/tools/download-tiles) like so: 8 | 9 | ``` 10 | download-tiles elevation.mbtiles -z 0-4 \ 11 | --tiles-url='https://s3.amazonaws.com/elevation-tiles-prod/terrarium/{z}/{x}/{y}.png' 12 | ``` 13 | I'm worried I may have got the x and y the wrong way round though, see comments on https://github.com/simonw/datasette-tiles/issues/15 14 | -------------------------------------------------------------------------------- /git/git-archive.md: -------------------------------------------------------------------------------- 1 | # How to create a tarball of a git repository using "git archive" 2 | 3 | I figured this out in [a Gist in 2016](https://gist.github.com/simonw/a44af92b4b255981161eacc304417368) which has attracted a bunch of comments over the years. Now I'm upgrading it to a retroactive TIL. 4 | 5 | Run this in the repository folder: 6 | 7 | git archive --format=tar.gz -o /tmp/my-repo.tar.gz --prefix=my-repo/ main 8 | 9 | This will write out a file to `/tmp/my-repo.tar.gz`. 10 | 11 | When you `tar -xzvf my-repo.tar.gz` that file it will output a `my-repo/` directory with just the files - not the `.git` folder - from your repository. 12 | 13 | You can use a commit hash or tag or branch name instead of `main` to create an archive of a different point in that repository. 14 | 15 | Without the `--prefix` option you'll get a `.tar.gz` file which, when decompressed, writes a bunch of stuff to your current directory. This usually isn't what you want! 16 | 17 | Here's a version that picks up the name of the directory you run it in: 18 | 19 | git archive --format=tar.gz -o $(basename $PWD).tar.gz --prefix=$(basename $PWD)/ main 20 | 21 | Note the trailing `/` on `--prefix` - without this you'll get folders called things like `datasettetests`. 22 | 23 | `basename $PWD` gives you the name of your current folder. 24 | -------------------------------------------------------------------------------- /git/remove-commit-and-force-push.md: -------------------------------------------------------------------------------- 1 | # Removing a git commit and force pushing to remove it from history 2 | 3 | I accidentally triggered a commit which added a big chunk of unwanted data to my repository. I didn't want this to stick around in the history forever, and no-one else was pulling from the repo, so I decided to use force push to remove the rogue commit entirely. 4 | 5 | I figured out the commit hash of the previous version that I wanted to restore and ran: 6 | 7 | git reset --hard 1909f93 8 | 9 | Then I ran the force push like this: 10 | 11 | git push --force origin main 12 | 13 | See https://github.com/simonw/sf-tree-history/issues/1 14 | -------------------------------------------------------------------------------- /github-actions/continue-on-error.md: -------------------------------------------------------------------------------- 1 | # Skipping a GitHub Actions step without failing 2 | 3 | I wanted to have a GitHub Action step run that might fail, but if it failed the rest of the steps should still execute and the overall run should be treated as a success.
4 | 5 | `continue-on-error: true` does exactly that: 6 | 7 | ```yaml 8 | - name: Download previous database 9 | run: curl --fail -o tils.db https://til.simonwillison.net/tils.db 10 | continue-on-error: true 11 | - name: Build database 12 | run: python build_database.py 13 | ``` 14 | 15 | [From this workflow](https://github.com/simonw/til/blob/7d799a24921f66e585b8a6b8756b7f8040c899df/.github/workflows/build.yml#L32-L36) 16 | 17 | I'm using `curl --fail` here which returns an error code if the file download fails (without `--fail` it was writing out a two line error message to a file called `tils.db` which is not what I wanted). Then `continue-on-error: true` to keep on going even if the download failed. 18 | 19 | My `build_database.py` script updates the `tils.db` database file if it exists and creates it from scratch if it doesn't. 20 | -------------------------------------------------------------------------------- /github-actions/different-steps-on-a-schedule.md: -------------------------------------------------------------------------------- 1 | # Running different steps on a schedule 2 | 3 | Say you have a workflow that runs hourly, but once a day you want the workflow to run slightly differently - without duplicating the entire workflow. 4 | 5 | Thanks to @BrightRan, here's [the solution](https://github.community/t5/GitHub-Actions/Schedule-once-an-hour-but-do-something-different-once-a-day/m-p/54382/highlight/true#M9168). Use the following pattern in an `if:` condition for a step: 6 | 7 | github.event_name == 'schedule' && github.event.schedule == '20 17 * * *' 8 | 9 | Longer example: 10 | 11 | ```yaml 12 | name: Fetch updated data and deploy 13 | 14 | on: 15 | push: 16 | schedule: 17 | - cron: '5,35 * * * *' 18 | - cron: '20 17 * * *' 19 | 20 | jobs: 21 | build_and_deploy: 22 | runs-on: ubuntu-latest 23 | steps: 24 | # ... 25 | - name: Download existing .db files 26 | if: |- 27 | !(github.event_name == 'schedule' && github.event.schedule == '20 17 * * *') 28 | env: 29 | DATASETTE_TOKEN: ${{ secrets.DATASETTE_TOKEN }} 30 | run: |- 31 | datasette-clone https://biglocal.datasettes.com/ dbs -v --token=$DATASETTE_TOKEN 32 | ``` 33 | I used this [here](https://github.com/simonw/big-local-datasette/blob/35e1acd4d9859d3af2feb29d0744ce1550e5faec/.github/workflows/deploy.yml), see [#11](https://github.com/simonw/big-local-datasette/issues/11). 34 | -------------------------------------------------------------------------------- /github-actions/dump-context.md: -------------------------------------------------------------------------------- 1 | # Dump out all GitHub Actions context 2 | 3 | Useful for seeing what's available for `if: ` conditions (see [context and expression syntax](https://help.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions)). 4 | 5 | I copied this example action [from here](https://help.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#example-printing-context-information-to-the-log-file) and deployed it [here](https://github.com/simonw/playing-with-actions/blob/master/.github/workflows/dump-context.yml). Here's an [example run](https://github.com/simonw/playing-with-actions/runs/599575180?check_suite_focus=true).
6 | 7 | ```yaml 8 | on: push 9 | 10 | jobs: 11 | one: 12 | runs-on: ubuntu-16.04 13 | steps: 14 | - name: Dump GitHub context 15 | env: 16 | GITHUB_CONTEXT: ${{ toJson(github) }} 17 | run: echo "$GITHUB_CONTEXT" 18 | - name: Dump job context 19 | env: 20 | JOB_CONTEXT: ${{ toJson(job) }} 21 | run: echo "$JOB_CONTEXT" 22 | - name: Dump steps context 23 | env: 24 | STEPS_CONTEXT: ${{ toJson(steps) }} 25 | run: echo "$STEPS_CONTEXT" 26 | - name: Dump runner context 27 | env: 28 | RUNNER_CONTEXT: ${{ toJson(runner) }} 29 | run: echo "$RUNNER_CONTEXT" 30 | - name: Dump strategy context 31 | env: 32 | STRATEGY_CONTEXT: ${{ toJson(strategy) }} 33 | run: echo "$STRATEGY_CONTEXT" 34 | - name: Dump matrix context 35 | env: 36 | MATRIX_CONTEXT: ${{ toJson(matrix) }} 37 | run: echo "$MATRIX_CONTEXT" 38 | ``` 39 | -------------------------------------------------------------------------------- /github-actions/ensure-labels.md: -------------------------------------------------------------------------------- 1 | # Ensure labels exist in a GitHub repository 2 | 3 | I wanted to ensure that when [this template repository](https://github.com/simonw/action-transcription) was used to create a new repo that repo would have a specific set of labels. 4 | 5 | Here's the workflow I came up with, saved as `.github/workflows/ensure_labels.yml`: 6 | 7 | ```yaml 8 | name: Ensure labels 9 | on: [push] 10 | 11 | jobs: 12 | ensure_labels: 13 | runs-on: ubuntu-latest 14 | steps: 15 | - name: Create labels 16 | uses: actions/github-script@v6 17 | with: 18 | script: | 19 | try { 20 | await github.rest.issues.createLabel({ 21 | ...context.repo, 22 | name: 'captions' 23 | }); 24 | await github.rest.issues.createLabel({ 25 | ...context.repo, 26 | name: 'whisper' 27 | }); 28 | } catch(e) { 29 | // Ignore if labels exist already 30 | } 31 | ``` 32 | This creates `captions` and `whisper` labels, if they do not yet exist. 33 | 34 | It's wrapped in a `try/catch` so that if the labels exist already (as they will on subsequent runs) the error can be ignored. 35 | 36 | Note that you need to use `await ...` inside that `try/catch` block or exceptions thrown by those methods will still cause the action run to fail. 37 | 38 | The `...context.repo` trick saves on having to pass `owner` and `repo` explicitly. 39 | -------------------------------------------------------------------------------- /github-actions/grep-tests.md: -------------------------------------------------------------------------------- 1 | # Using grep to write tests in CI 2 | 3 | GitHub Actions workflows fail if any of the steps executes something that returns a non-zero exit code. 4 | 5 | Today I learned that `grep` returns a non-zero exit code if it fails to find any matches. 6 | 7 | This means that piping to grep is a really quick way to write a test as part of an Actions workflow. 8 | 9 | I wrote a quick soundness check today using the new `datasette --get /path` option, which runs a fake HTTP request for that path through Datasette and returns the response to standard out. Here's an example: 10 | 11 | ```yaml 12 | - name: Build database 13 | run: scripts/build.sh 14 | - name: Run tests 15 | run: | 16 | datasette . --get /us/pillar-point | grep 'Rocky Beaches' 17 | - name: Deploy to Vercel 18 | ``` 19 | I like this pattern a lot: build a database for a custom Datasette deployment in CI, run one or more quick soundness checks using grep, then deploy if those checks pass.
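If a soundness check ever outgrows a grep one-liner, the same idea works as a short Python script run from that step - a rough sketch, reusing the example path and string from the workflow above:

```python
import subprocess
import sys

# Run the same fake HTTP request through Datasette that the grep check uses
result = subprocess.run(
    ["datasette", ".", "--get", "/us/pillar-point"],
    capture_output=True,
    text=True,
    check=True,
)
if "Rocky Beaches" not in result.stdout:
    # Exiting with a message sets a non-zero exit code, which fails the step
    sys.exit("Soundness check failed: 'Rocky Beaches' not found in /us/pillar-point")
```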
20 | -------------------------------------------------------------------------------- /github-actions/only-master.md: -------------------------------------------------------------------------------- 1 | # Only run GitHub Action on push to master / main 2 | 3 | Spotted in [this Cloud Run example](https://github.com/GoogleCloudPlatform/github-actions/blob/20c294aabd5331f9f7b8a26e6075d41c31ce5e0d/example-workflows/cloud-run/.github/workflows/cloud-run.yml): 4 | 5 | ```yaml 6 | name: Build and Deploy to Cloud Run 7 | 8 | on: 9 | push: 10 | branches: 11 | - master 12 | ``` 13 | 14 | Useful if you don't want people opening pull requests against your repo that inadvertently trigger a deploy action! 15 | 16 | An alternative mechanism I've used is to gate the specific deploy steps in the action, [like this](https://github.com/simonw/cryptozoology/blob/8a86ec283823c91ad42c5f737a912d43791d427f/.github/workflows/deploy.yml#L31-L40). 17 | 18 | ```yaml 19 | # Only run the deploy if push was to master 20 | - name: Set up Cloud Run 21 | if: github.ref == 'refs/heads/master' 22 | uses: GoogleCloudPlatform/github-actions/setup-gcloud@v0 23 | with: 24 | version: '275.0.0' 25 | service_account_email: ${{ secrets.GCP_SA_EMAIL }} 26 | service_account_key: ${{ secrets.GCP_SA_KEY }} 27 | - name: Deploy to Cloud Run 28 | if: github.ref == 'refs/heads/master' 29 | run: |- 30 | gcloud config set run/region us-central1 31 | ``` 32 | -------------------------------------------------------------------------------- /github-actions/python-3-11.md: -------------------------------------------------------------------------------- 1 | # Testing against Python 3.11 preview using GitHub Actions 2 | 3 | I decided to run my CI tests against the Python 3.11 preview, to avoid the problem I had when Python 3.10 came out with [a bug that affected Datasette](https://simonwillison.net/2021/Oct/9/finding-and-reporting-a-bug/). 4 | 5 | I used the new [GitHub Code Search](https://cs.github.com/) to figure out how to do this. I searched for: 6 | 7 | 3.11 path:workflows/*.yml 8 | 9 | And found [this example](https://github.com/urllib3/urllib3/blob/7bec77e81aa0a194c98381053225813f5347c9d2/.github/workflows/ci.yml#L60) from `urllib3` which showed that the version tag to use is: 10 | 11 | 3.11-dev 12 | 13 | > **Update 28th November 2022**: `3.12-dev` now works for Python 3.12 preview 14 | 15 | I added that to my test matrix like so: 16 | 17 | ```yaml 18 | jobs: 19 | test: 20 | runs-on: ubuntu-latest 21 | strategy: 22 | matrix: 23 | python-version: ["3.7", "3.8", "3.9", "3.10", "3.11-dev"] 24 | steps: 25 | - uses: actions/checkout@v2 26 | - name: Set up Python ${{ matrix.python-version }} 27 | uses: actions/setup-python@v2 28 | with: 29 | python-version: ${{ matrix.python-version }} 30 | # ... 31 | ``` 32 | Here's the [full workflow](https://github.com/simonw/datasette/blob/a9d8824617268c4d214dd3be2174ac452044f737/.github/workflows/test.yml).
33 | 34 | -------------------------------------------------------------------------------- /github-actions/set-environment-for-all-steps.md: -------------------------------------------------------------------------------- 1 | # Set environment variables for all steps in a GitHub Action 2 | 3 | From [this example](https://github.com/GoogleCloudPlatform/github-actions/blob/20c294aabd5331f9f7b8a26e6075d41c31ce5e0d/example-workflows/cloud-run/.github/workflows/cloud-run.yml) I learned that you can set environment variables such that they will be available in ALL jobs once at the top of a workflow: 4 | 5 | ```yaml 6 | name: Build and Deploy to Cloud Run 7 | 8 | on: 9 | push: 10 | branches: 11 | - master 12 | 13 | env: 14 | PROJECT_ID: ${{ secrets.RUN_PROJECT }} 15 | RUN_REGION: us-central1 16 | SERVICE_NAME: helloworld-nodejs 17 | ``` 18 | 19 | I had previously been using this [much more verbose pattern](https://github.com/simonw/big-local-datasette/blob/181de90f1e7b59c7727595ee8cbe7626667fe05a/.github/workflows/deploy.yml#L30-L42): 20 | 21 | ```yaml 22 | - name: Fetch projects 23 | env: 24 | BIGLOCAL_TOKEN: ${{ secrets.BIGLOCAL_TOKEN }} 25 | run: python fetch_projects.py dbs/biglocal.db $BIGLOCAL_TOKEN --contact ... 26 | ``` 27 | -------------------------------------------------------------------------------- /github/bulk-edit-github-projects.md: -------------------------------------------------------------------------------- 1 | # Bulk editing status in GitHub Projects 2 | 3 | GitHub Projects has a mechanism for bulk updating the status of items, but it's pretty difficult to figure out how to do it. 4 | 5 | The trick is to use copy and paste. You can select a cell containing a status and hit `Command+C` to copy it - at which point a dotted border will be displayed around the cell. 6 | 7 | Then you can select and then shift-click a range of other cells, and hit `Command+V` to paste the value. 8 | 9 | Here's a demo: 10 | 11 | ![I click a In Progress cell and the border goes dotted when I hit the copy keyboard shortcut. Then I shift-click to select a range of cells and hit paste to update their status.](https://github.com/simonw/til/assets/9599/aedd6b5c-167e-40a1-9866-68410c0299d7) 12 | 13 | Here's where this feature was introduced [in the GitHub changelog](https://github.blog/changelog/2023-04-06-github-issues-projects-april-6th-update/#t-rex-bulk-editing-in-tables). See also [this community discussions thread](https://github.com/orgs/community/discussions/5465). 14 | -------------------------------------------------------------------------------- /github/bulk-repo-github-graphql.md: -------------------------------------------------------------------------------- 1 | # Bulk fetching repository details with the GitHub GraphQL API 2 | 3 | I wanted to be able to fetch details of a list of different repositories from the GitHub GraphQL API by name in a single operation. 4 | 5 | It turns out the `search()` operation can be used for this, given 100 repos at a time. The trick is to use the `repo:` search operator, e.g `repo:simonw/datasette repo:django/django` as demonstrated by [this search](https://github.com/search?q=repo%3Asimonw%2Fdatasette+repo:simonw/sqlite-utils&type=Repositories). 6 | 7 | Here's the GraphQL query, tried out using https://docs.github.com/en/graphql/overview/explorer 8 | 9 | ```graphql 10 | { 11 | search(type: REPOSITORY, query: "repo:simonw/datasette repo:django/django", first: 100) { 12 | nodes { 13 | ... 
on Repository { 14 | id 15 | nameWithOwner 16 | createdAt 17 | repositoryTopics(first: 100) { 18 | totalCount 19 | nodes { 20 | topic { 21 | name 22 | } 23 | } 24 | } 25 | openIssueCount: issues(states: [OPEN]) { 26 | totalCount 27 | } 28 | closedIssueCount: issues(states: [CLOSED]) { 29 | totalCount 30 | } 31 | releases(last: 1) { 32 | totalCount 33 | nodes { 34 | tagName 35 | } 36 | } 37 | } 38 | } 39 | } 40 | } 41 | ``` 42 | -------------------------------------------------------------------------------- /github/clone-and-push-gist.md: -------------------------------------------------------------------------------- 1 | # Clone, edit and push files that live in a Gist 2 | 3 | GitHub [Gists](https://gist.github.com/) are full Git repositories, and can be cloned and pushed to. 4 | 5 | You can clone them anonymously (read-only) just using their URL: 6 | 7 | git clone https://gist.github.com/simonw/0a30d52feeb3ff60f7d8636b0bde296b 8 | 9 | But if you want to be able to make local edits and then push them back, you need to use this recipe instead: 10 | 11 | git clone git@gist.github.com:0a30d52feeb3ff60f7d8636b0bde296b.git 12 | 13 | You can find this in the "Embed" menu, as the "Clone via SSH" option. 14 | 15 | This only uses the Gist's ID, the `simonw/` part from the URL is omitted. 16 | 17 | This uses your existing GitHub SSH credentials. 18 | 19 | You can then edit files in that repository and commit and push them like this: 20 | 21 | cd 0a30d52feeb3ff60f7d8636b0bde296b 22 | # Edit files here 23 | git commit -m "Edited some files" -a 24 | git push 25 | -------------------------------------------------------------------------------- /github/dependabot-python-setup.md: -------------------------------------------------------------------------------- 1 | # Configuring Dependabot for a Python project 2 | 3 | GitHub's Dependabot can automatically file PRs with bumps to dependencies when new versions of them are available. 4 | 5 | In June 2023 they added support for [Grouped version updates](https://github.blog/changelog/2023-06-30-grouped-version-updates-for-dependabot-public-beta/), so one PR will be filed that updates multiple dependencies at the same time. 6 | 7 | The [Dependabot setup instructions](https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/configuration-options-for-dependency-updates) don't explicitly mention projects which keep all of their dependency information in `setup.py`. 8 | 9 | It works just fine with those kinds of projects too. 10 | 11 | To start it working, create a file in `.github/dependabot.yml` with the following contents: 12 | 13 | ```yaml 14 | version: 2 15 | updates: 16 | - package-ecosystem: pip 17 | directory: "/" 18 | schedule: 19 | interval: daily 20 | time: "13:00" 21 | groups: 22 | python-packages: 23 | patterns: 24 | - "*" 25 | ``` 26 | Then navigate to https://github.com/simonw/s3-credentials/network/updates (but for your project) - that's Insights -> Dependency graph -> Dependabot - to confirm that it worked. 27 | 28 | This should work for projects that use `setup.py` or `pyproject.toml` or `requirements.txt`. 
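For context, here's a minimal hypothetical `setup.py` of the kind Dependabot will track - the package name and version pins are invented for illustration:

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical project name
    version="0.1",
    install_requires=[
        "click>=8.0",  # Dependabot proposes bumps for entries listed here
        "httpx",
    ],
    extras_require={"test": ["pytest"]},
)
```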
29 | -------------------------------------------------------------------------------- /github/graphql-search-topics.md: -------------------------------------------------------------------------------- 1 | # Searching for repositories by topic using the GitHub GraphQL API 2 | 3 | I wanted to use the GitHub GraphQL API to return all of the repositories on the https://github.com/topics/git-scraping page. 4 | 5 | At first glance there isn't a GraphQL field for that page - but it turns out you can access it using a GitHub search: 6 | 7 | topic:git-scraping sort:updated-desc 8 | 9 | An oddity of GitHub search is that sort order can be defined using tokens that form part of the search query! 10 | 11 | Here's a GraphQL query [tested here](https://developer.github.com/v4/explorer/) that returns the most recent 100 `git-scraping` tagged repos, sorted by most recently updated. 12 | 13 | ```graphql 14 | { 15 | search(query: "topic:git-scraping sort:updated-desc", type: REPOSITORY, first: 100) { 16 | repositoryCount 17 | nodes { 18 | ... on Repository { 19 | nameWithOwner 20 | description 21 | updatedAt 22 | createdAt 23 | diskUsage 24 | } 25 | } 26 | } 27 | } 28 | ``` 29 | -------------------------------------------------------------------------------- /github/migrate-github-wiki.md: -------------------------------------------------------------------------------- 1 | # Migrating a GitHub wiki from one repository to another 2 | 3 | I figured out how to migrate a [GitHub wiki](https://docs.github.com/en/communities/documenting-your-project-with-wikis/about-wikis) (public or private) from one repository to another while preserving all history. 4 | 5 | The trick is that GitHub wikis are just Git repositories. Which means you can clone them, edit them and push them. 6 | 7 | This means you can migrate them between their parent repos like so. `myorg/old-repo` is the repo you are moving from, and `myorg/new-repo` is the destination. 8 | 9 | git clone https://github.com/myorg/old-repo.wiki.git 10 | cd old-repo.wiki 11 | git remote remove origin 12 | git remote add origin https://github.com/myorg/new-repo.wiki.git 13 | git push --set-upstream origin master --force 14 | 15 | This will entirely over-write the content and history of the wiki attached to the `new-repo` repository with the content and history from the wiki in `old-repo`. 16 | -------------------------------------------------------------------------------- /github/reporting-bugs.md: -------------------------------------------------------------------------------- 1 | # Reporting bugs in GitHub to GitHub 2 | 3 | I found out today (via [this post](https://github.com/github-community/community/discussions/19988)) about a dedicated interface for reporting bugs in GitHub to GitHub: 4 | 5 | https://support.github.com/contact/bug-report 6 | 7 | It includes full markdown support, which means you can include animated GIFs that illustrate the bug. 
8 | 9 | Once reported, you can track the status of your bug reports here: 10 | 11 | https://support.github.com/tickets 12 | -------------------------------------------------------------------------------- /github/syntax-highlighting-python-console.md: -------------------------------------------------------------------------------- 1 | # Syntax highlighting Python console examples with GFM 2 | 3 | It turns out [GitHub Flavored Markdown](https://github.github.com/gfm/) can apply syntax highlighting to Python console examples, like this one: 4 | 5 | ```pycon 6 | >>> import csv 7 | >>> with open('eggs.csv', newline='') as csvfile: 8 | ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 9 | ... for row in spamreader: 10 | ... print(', '.join(row)) 11 | Spam, Spam, Spam, Spam, Spam, Baked Beans 12 | Spam, Lovely Spam, Wonderful Spam 13 | ``` 14 | 15 | The trick is to use the following: 16 | 17 | ```` 18 | ```pycon 19 | >>> import csv 20 | >>> with open('eggs.csv', newline='') as csvfile: 21 | ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 22 | ... for row in spamreader: 23 | ... print(', '.join(row)) 24 | Spam, Spam, Spam, Spam, Spam, Baked Beans 25 | Spam, Lovely Spam, Wonderful Spam 26 | ``` 27 | ```` 28 | I figured out the `pycon` code by scanning through the [languages.yml](https://github.com/github/linguist/blob/v7.12.2/lib/linguist/languages.yml#L4406-L4414) file for linguist, the library GitHub use for their syntax highlighting. 29 | 30 | While writing this TIL I also learned how to embed triple-backticks in a code block - you surround the block with more-than-three backticks (thanks to [this tip](https://github.com/jonschlinkert/remarkable/issues/146#issuecomment-85539428)): 31 | 32 | 33 | ````` 34 | ```` 35 | ```pycon 36 | >>> import csv 37 | >>> with open('eggs.csv', newline='') as csvfile: 38 | ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') 39 | ... for row in spamreader: 40 | ... print(', '.join(row)) 41 | Spam, Spam, Spam, Spam, Spam, Baked Beans 42 | Spam, Lovely Spam, Wonderful Spam 43 | ``` 44 | ```` 45 | ````` 46 | -------------------------------------------------------------------------------- /github/transfer-issue-private-to-public.md: -------------------------------------------------------------------------------- 1 | # Transferring a GitHub issue from a private to a public repository 2 | 3 | I have my own private `notes` repository where I sometimes create research threads. Occasionally I want to transfer these to a public repository to publish their contents. 4 | 5 | https://docs.github.com/en/issues/tracking-your-work-with-issues/transferring-an-issue-to-another-repository says: 6 | 7 | > You can't transfer an issue from a private repository to a public repository. 8 | 9 | I found this workaround: 10 | 11 | 1. Create a new private repository. I called mine `simonw/temp` 12 | 2. Transfer the issue from your original repository to this new temporary repository 13 | 3. Use the "Settings" tab in the temporary repository to change the entire repository's visibility from private to public 14 | 4. Transfer the issue from the temporary repository to the public repository that you want it to live in 15 | 16 | ## Using the gh tool 17 | 18 | You can perform transfers using the web interface, but I also learned how to do it using the `gh` tool. 
19 | 20 | Install that with `brew install gh` 21 | 22 | Then you can run this: 23 | 24 | gh issue transfer https://github.com/simonw/temp/issues/1 simonw/datasette-tiddlywiki 25 | 26 | I used this trick today to transfer https://github.com/simonw/datasette-tiddlywiki/issues/2 out of my private `notes` repo. 27 | -------------------------------------------------------------------------------- /go/installing-tools.md: -------------------------------------------------------------------------------- 1 | # Installing tools written in Go 2 | 3 | Today I learned how to install tools from GitHub that are written in Go, using [github.com/icholy/semgrepx](https://github.com/icholy/semgrepx) as an example: 4 | 5 | go install github.com/icholy/semgrepx@latest 6 | 7 | Running this command grabs a copy of the GitHub repository, compiles the Go package in there and drops the resulting binary into the `~/go/bin` folder on your computer: 8 | 9 | ``` 10 | ls -lh ~/go/bin/semgrepx 11 | -rwxr-xr-x 1 simon staff 2.9M Mar 25 21:08 /Users/simon/go/bin/semgrepx 12 | ``` 13 | The `@latest` reference confused me, since the repo in question didn't have a branch or tag called that. 14 | 15 | I couldn't find the right documentation for that, but GPT-4 [confidently told me](https://chat.openai.com/share/06e62ec2-1ab3-495f-9e0c-914ef27c1e91): 16 | 17 | > `@latest`: This specifies the version of the package you want to install. In this case, latest means that the Go tool will install the latest version of the package available. The Go tool uses the versioning information from the repository's tags to determine the latest version. If the repository follows semantic versioning, the latest version is the one with the highest version number. If there are no version tags, latest will refer to the most recent commit on the default branch of the repository. 18 | 19 | In the absence of an official answer that looks like it might be right to me. 20 | -------------------------------------------------------------------------------- /googlecloud/google-oauth-cli-application-oauth-client-id.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/simonw/til/0b27337d42e744a184742f8cbbfb44cddbeae008/googlecloud/google-oauth-cli-application-oauth-client-id.png -------------------------------------------------------------------------------- /googlecloud/gsutil-bucket.md: -------------------------------------------------------------------------------- 1 | # Publishing to a public Google Cloud bucket with gsutil 2 | 3 | I decided to publish static CSV files to accompany my https://cdc-vaccination-history.datasette.io/ project, using a Google Cloud bucket (see [cdc-vaccination-history issue #9](https://github.com/simonw/cdc-vaccination-history/issues/9)). 4 | 5 | The Google Cloud tutorial on [https://cloud.google.com/storage/docs/hosting-static-website-http#gsutil](https://cloud.google.com/storage/docs/hosting-static-website-http#gsutil) was very helpful. 6 | 7 | ## Creating the bucket 8 | 9 | I used an authenticated `gsutil` session that I already had from my work with Google Cloud Run. 10 | 11 | To create a new bucket: 12 | 13 | gsutil mb gs://cdc-vaccination-history-csv.datasette.io/ 14 | 15 | `mb` is the [make bucket](https://cloud.google.com/storage/docs/gsutil/commands/mb) command. 16 | 17 | I had already verified my `datasette.io` bucket with Google, otherwise this step would not have worked. 
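The same bucket-creation step can also be scripted with the `google-cloud-storage` Python client - a sketch that assumes application default credentials are configured and the domain has already been verified:

```python
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()
# Domain-named buckets can only be created once the domain is verified with Google
bucket = client.create_bucket("cdc-vaccination-history-csv.datasette.io")
print("Created bucket:", bucket.name)
```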
18 | 19 | ## Uploading files 20 | 21 | gsutil cp *.csv gs://cdc-vaccination-history-csv.datasette.io 22 | 23 | Using the [gsutil cp command](https://cloud.google.com/storage/docs/gsutil/commands/cp). 24 | 25 | ## Making them available to the public 26 | 27 | This command allows anyone to download from the bucket: 28 | 29 | gsutil iam ch allUsers:objectViewer gs://cdc-vaccination-history-csv.datasette.io 30 | 31 | ## DNS 32 | 33 | I configured `cdc-vaccination-history-csv` as a `CNAME` pointing to `c.storage.googleapis.com.` 34 | 35 | https://cdc-vaccination-history-csv.datasette.io/ now shows an XML directory listing. 36 | -------------------------------------------------------------------------------- /homebrew/homebrew-core-local-git-checkout.md: -------------------------------------------------------------------------------- 1 | # Browsing your local git checkout of homebrew-core 2 | 3 | The [homebrew-core](https://github.com/Homebrew/homebrew-core) repository contains all of the default formulas for Homebrew. 4 | 5 | It's a huge repo, and if you browse it through the GitHub web interface you can run into errors like this one: 6 | 7 | https://github.com/Homebrew/homebrew-core/commits/master/Formula/libspatialite.rb 8 | 9 | > ### Sorry, this commit history is taking too long to generate. 10 | > 11 | > Refresh the page to try again, or view this history locally using the following command: 12 | > 13 | > git log master -- Formula/libspatialite.rb 14 | 15 | It turns out there's a full checkout of the repo (including history) in this folder on your computer already: 16 | 17 | /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core 18 | 19 | So you can browse the history for that file locally like so: 20 | 21 | cd /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core 22 | git log master -- Formula/libspatialite.rb 23 | -------------------------------------------------------------------------------- /homebrew/latest-sqlite.md: -------------------------------------------------------------------------------- 1 | # Running the latest SQLite in Datasette using Homebrew 2 | 3 | I made a pleasant discovery today: Homebrew are very quick to update to the latest SQLite release (here's [their formula](https://github.com/Homebrew/homebrew-core/blob/master/Formula/sqlite.rb)), and since [Datasette](https://datasette.io/) when installed via Homebrew uses that version, this means you can use `brew upgrade sqlite` to ensure you are running the most recent SQLite version within Datasette. 4 | 5 | If you've installed Datasette using Homebrew: 6 | 7 | brew install datasette 8 | 9 | You can see the version of SQLite it uses either by running `datasette` and navigating to http://127.0.0.1:8001/-/versions - or you can see it from the command-line using: 10 | 11 | % datasette --get /-/versions.json | jq .sqlite.version 12 | "3.37.2" 13 | 14 | To upgrade SQLite, run the following: 15 | 16 | brew upgrade sqlite 17 | 18 | After doing that I ran the above command again and confirmed I had been upgraded to SQLite 3.38.0: 19 | 20 | % datasette --get /-/versions.json | jq .sqlite.version 21 | "3.38.0" 22 | -------------------------------------------------------------------------------- /homebrew/mysql-homebrew.md: -------------------------------------------------------------------------------- 1 | # Running a MySQL server using Homebrew 2 | 3 | First, install MySQL like so: 4 | 5 | brew install mysql 6 | 7 | This installs the server but doesn't run it.
You can run it in the background like this: 8 | ``` 9 | % mysql.server start 10 | Starting MySQL 11 | .. SUCCESS! 12 | % 13 | ``` 14 | Then later on you can stop it like so: 15 | ``` 16 | % mysql.server stop 17 | Shutting down MySQL 18 | . SUCCESS! 19 | % 20 | ``` 21 | While it's running it defaults to having a root account that only accepts connections from localhost with no password: 22 | ``` 23 | % mysql -u root 24 | Welcome to the MySQL monitor. Commands end with ; or \g. 25 | Your MySQL connection id is 8 26 | ... 27 | mysql> 28 | ``` 29 | Running `mysql_secure_installation` runs a wizard that helps set up a password. 30 | 31 | When you first install it, Homebrew says: 32 | ``` 33 | To have launchd start mysql now and restart at login: 34 | brew services start mysql 35 | ``` 36 | You can re-display that message by running `brew reinstall mysql`. 37 | 38 | ## Installing the mysqlclient Python library 39 | 40 | This took me a long time to figure out. Eventually this worked: 41 | 42 | MYSQLCLIENT_CFLAGS=`pkg-config mysqlclient --cflags` \ 43 | MYSQLCLIENT_LDFLAGS=`pkg-config mysqlclient --libs` \ 44 | pip install mysqlclient 45 | -------------------------------------------------------------------------------- /homebrew/upgrading-python-homebrew-packages.md: -------------------------------------------------------------------------------- 1 | # Upgrading Python Homebrew packages using pip 2 | 3 | [VisiData 2.0](https://www.visidata.org/) came out today. I previously installed VisiData using Homebrew, but the VisiData tap has not yet been updated with the latest version. 4 | 5 | Homebrew Python packages (including the packages for [Datasette](https://formulae.brew.sh/formula/datasette) and [sqlite-utils](https://formulae.brew.sh/formula/sqlite-utils)) work by setting up their own package-specific virtual environments. This means you can upgrade them without waiting for the tap. 6 | 7 | To find the virtual environment, run `head -n 1` against the Homebrew-provided executable. VisiData is `vd`, so this works: 8 | ``` 9 | % head -n 1 $(which vd) 10 | #!/usr/local/Cellar/visidata/1.5.2/libexec/bin/python3.8 11 | ``` 12 | Now you can call `pip` within that virtual environment to perform the upgrade like so: 13 | ``` 14 | /usr/local/Cellar/visidata/1.5.2/libexec/bin/pip install -U visidata 15 | ``` 16 | -------------------------------------------------------------------------------- /html/lazy-loading-images.md: -------------------------------------------------------------------------------- 1 | # Lazy loading images in HTML 2 | 3 | [Most modern browsers](https://caniuse.com/loading-lazy-attr) now include support for the `loading="lazy"` image attribute, which causes images not to be loaded until the user scrolls them into view. 4 | 5 | ![Animated screenshot showing the network panel in the Firefox DevTools - as I scroll down a page more images load on demand just before they scroll into view.](https://user-images.githubusercontent.com/9599/204108097-6f385377-5daf-4895-9216-4ea0916a296a.gif) 6 | 7 | I used it for the slides on my annotated version of this presentation: [Massively increase your productivity on personal projects with comprehensive documentation and automated tests](https://simonwillison.net/2022/Nov/26/productivity/). 8 | 9 | There's one catch though: you need to provide the size of the image (I used `width=` and `height=` attributes) in order for it to work! Without those your browser still needs to fetch the images in order to calculate their dimensions for page layout.
10 | 11 | Here's the HTML I used for each slide image: 12 | 13 | ```html 14 | Issue driven development 21 | ``` 22 | -------------------------------------------------------------------------------- /html/video-preload-none.md: -------------------------------------------------------------------------------- 1 | # HTML video that loads when the user clicks play 2 | 3 | Today I figured out how to use the `