├── .graphics
    ├── taupe-icon.png
    ├── noun-bird-233023.svg
    ├── README.md
    └── taupe-icon.svg
├── bin
    ├── README.md
    └── taupe
├── SUPPORT.md
├── requirements.txt
├── CITATION.cff
├── requirements-dev.txt
├── codemeta.json
├── CHANGES.md
├── .gitattributes
├── CONTRIBUTING.md
├── LICENSE
├── taupe
    ├── __init__.py
    ├── exit_codes.py
    └── __main__.py
├── setup.cfg
├── .flake8
├── setup.py
├── .gitignore
├── CODE_OF_CONDUCT.md
├── Makefile
└── README.md

/.graphics/taupe-icon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mhucka/taupe/HEAD/.graphics/taupe-icon.png
--------------------------------------------------------------------------------
/bin/README.md:
--------------------------------------------------------------------------------
1 | # About the Python script in this directory
2 |
3 | The Python script in this directory is mainly for testing and development. During development, I run Taupe from a terminal emulator by starting it simply like this:
4 |
5 | ```sh
6 | ./taupe
7 | ```
8 |
9 | When Taupe is installed on a computer using `pip` or `pipx`, a different wrapper script is installed, not the one that is in this directory. The one here is merely a convenience.
10 |
--------------------------------------------------------------------------------
/.graphics/noun-bird-233023.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
--------------------------------------------------------------------------------
/.graphics/README.md:
--------------------------------------------------------------------------------
1 | # Icon for Taupe
2 |
3 | The [vector artwork](https://thenounproject.com/icon/bird-233023/) of a bird, used as the icon for this repository, was created by [Noe Araujo](https://thenounproject.com/noearaujo/) from the Noun Project. It is licensed under the Creative Commons [CC-BY 3.0](https://creativecommons.org/licenses/by/3.0/) license.
4 |
5 | I edited the logo in [Boxy SVG](https://boxy-svg.com), a native SVG editor for macOS, to change the icon color to [taupe](https://en.wikipedia.org/wiki/Taupe).
6 |
--------------------------------------------------------------------------------
/SUPPORT.md:
--------------------------------------------------------------------------------
1 | Support
2 | =======
3 |
4 | Thank you for your interest in this project. If you are experiencing problems or have questions, the following are the preferred methods of reaching someone:
5 |
6 | 1. Report a new issue using the [issue tracker](https://github.com/mhucka/taupe/issues).
7 | 2. Send email to the primary maintainer: [mhucka@caltech.edu](mailto:mhucka@caltech.edu).
8 | 3. Send email to an individual involved in the project. People's names appear in the top-level `README.md` file in the source code repository.
9 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | # =============================================================================
2 | # @file requirements.txt
3 | # @brief Python dependencies for Taupe
4 | # @created 2022-11-18
5 | # @license Please see the file named LICENSE in the project directory
6 | # @website https://github.com/mhucka/taupe
7 | # =============================================================================
8 |
9 | aenum >= 3.1.0
10 | commonpy == 1.9.5
11 | plac == 1.3.5
12 | rich >= 12.6.0
13 | setuptools == 58.3.0
14 | sidetrack >= 2.0.1
15 |
--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
1 | cff-version: 1.2.0
2 | message: "If you use this software, please cite it as below."
3 | title: Taupe
4 | authors:
5 |   - family-names: Hucka
6 |     given-names: Michael
7 |     orcid: https://orcid.org/0000-0001-9105-5960
8 | abstract: Twitter archive URL parser
9 | repository-code: "https://github.com/mhucka/taupe"
10 | type: software
11 | version: 1.2.0
12 | license-url: "https://github.com/mhucka/taupe/blob/main/LICENSE"
13 | keywords:
14 |   - Twitter
15 |   - archiving
16 |   - data processing
17 |   - CSV
18 |   - comma separated values
19 |   - JSON
20 |   - software
21 | date-released: 2022-11-23
22 |
23 |
--------------------------------------------------------------------------------
/.graphics/taupe-icon.svg:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 | Bird
5 | Artwork created by Noe Araujo (https://thenounproject.com/noearaujo/) obtained on 2022-11-15 from https://thenounproject.com/icon/bird-233023/. Licensed CCBY.
6 |
7 |
--------------------------------------------------------------------------------
/requirements-dev.txt:
--------------------------------------------------------------------------------
1 | # =============================================================================
2 | # @file requirements-dev.txt
3 | # @brief Python dependencies for Taupe development
4 | # @created 2022-11-18
5 | # @license Please see the file named LICENSE in the project directory
6 | # @website https://github.com/mhucka/taupe
7 | # =============================================================================
8 |
9 | -r requirements.txt
10 |
11 | pytest >= 6.2.5
12 | pytest-cov >= 3.0.0
13 | pytest-mock >= 3.7.0
14 |
15 | flake8 >= 4.0.1
16 | flake8-bugbear >= 22.4.25
17 | flake8-builtins >= 1.5.3
18 | flake8-comprehensions >= 3.8.0
19 | flake8-executable >= 2.1.1
20 | flake8_implicit_str_concat >= 0.3.0
21 | flake8-pie >= 0.15.0
22 | flake8-simplify >= 0.19.2
23 |
24 | twine
25 |
--------------------------------------------------------------------------------
/codemeta.json:
--------------------------------------------------------------------------------
1 | {
2 |   "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
3 |   "@type": "SoftwareSourceCode",
4 |   "description": "Twitter archive URL parser",
5 |   "name": "taupe",
6 |   "codeRepository": "https://github.com/mhucka/taupe",
7 |   "issueTracker": "https://github.com/mhucka/taupe/issues",
8 |   "license": "https://github.com/mhucka/taupe/blob/main/LICENSE",
9 |   "version": "1.2.0",
10 |   "author": [
11 |     {
12 |       "@type": "Person",
13 |       "givenName": "Michael",
14 |       "familyName": "Hucka",
15 |       "affiliation": "California Institute of Technology Library",
16 |       "email": "mhucka@caltech.edu",
17 |       "@id": "https://orcid.org/0000-0001-9105-5960"
18 |     }],
19 |   "developmentStatus": "active",
20 |   "downloadUrl": "https://github.com/mhucka/taupe/archive/main.zip",
21 |   "keywords": [
22 |     "software"
23 |   ],
24 |   "maintainer": "https://orcid.org/0000-0001-9105-5960"
25 | }
26 |
--------------------------------------------------------------------------------
/CHANGES.md:
--------------------------------------------------------------------------------
1 | # Change log for Taupe
2 |
3 | ## Version 1.2.0 (2022-11-23)
4 |
5 | This release only corrects inconsistent statements about the license terms of the software. There are no functional or other changes in this release.
6 |
7 |
8 | ## Version 1.1.0 (2022-11-22)
9 |
10 | This update brings more output format options. The option `--extract` now accepts many more values to control the output. For example, it is now possible to produce a plain list of URLs of your tweets. Please see the help text or the [README](https://github.com/mhucka/taupe#the-structure-of-the-output) for the details.
11 |
12 |
13 | ## Version 1.0.0 (2022-11-18)
14 |
15 | Changes since the last release:
16 | * The [README](https://github.com/mhucka/taupe/blob/main/README.md) has been edited and enhanced.
17 | * The help text printed for `taupe --help` has been edited to (hopefully) improve clarity.
18 |
19 |
20 | ## Version 0.0.1
21 |
22 | First release of a complete, working version.
23 |
--------------------------------------------------------------------------------
/.gitattributes:
--------------------------------------------------------------------------------
1 | # -*- mode: sh; -*-
2 |
3 | # Set the default behavior, in case people don't have core.autocrlf set.
4 | # .............................................................................
5 |
6 | * text=auto
7 |
8 | # Specify what's text and should be normalized.
9 | # .............................................................................
10 |
11 | *.py text
12 | *.in text
13 | *.rst text
14 | *.cfg text
15 | *.ini text
16 | *.yml text
17 | *.json text
18 | *.bat text
19 | *.sh text
20 | LICENSE text
21 | CONTRIBUTING text
22 |
23 | # Denote all files that are truly binary and should not be modified.
24 | # .............................................................................
25 |
26 | *.png binary
27 | *.jpg binary
28 | *.xls binary
29 | *.doc binary
30 |
31 | # This next one is because in other projects, we've had problems with git
32 | # getting confused about line endings when people using Windows and Mac edit
33 | # the same files.
34 | # .............................................................................
35 |
36 | *.csv binary diff=csv
37 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Guidelines for contributing to this project
2 |
3 | Any constructive contributions – bug reports, pull requests (code or documentation), suggestions for improvements, and more – are welcome.
4 |
5 | ## Conduct
6 |
7 | Everyone is asked to read and respect the [code of conduct](CODE_OF_CONDUCT.md) before participating in this project.
8 |
9 | ## Coordinating work
10 |
11 | A quick way to find out what is currently in the near-term plans for this project is to look at the [GitHub issue tracker](https://github.com/mhucka/taupe/issues), but the possibilities are not limited to what you see there – if you have ideas for new features and enhancements, please feel free to write them up as a new issue or contact the developers directly!
12 |
13 | ## Submitting contributions
14 |
15 | Please feel free to contact the primary author (Mike Hucka) directly, or even better, jump right in and use the standard GitHub approach of forking the repo and creating a pull request. When committing code changes and submitting pull requests, please write a clear log message for your commits.
16 |
--------------------------------------------------------------------------------
/bin/taupe:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # =============================================================================
3 | # @file taupe
4 | # @brief Simple interface to run taupe, for testing and exploration
5 | # @author Michael Hucka
6 | # @license Please see the file named LICENSE in the project directory
7 | # @website https://github.com/mhucka/taupe
8 | # =============================================================================
9 |
10 | # Standard library and third-party imports.
11 | import os
12 | import sys
13 | import plac
14 |
15 | # Allow this program to be executed directly from the 'bin' directory.
16 | try:
17 |     thisdir = os.path.dirname(os.path.abspath(__file__))
18 |     sys.path.append(os.path.join(thisdir, '..'))
19 | except NameError:  # __file__ is undefined in some execution contexts.
20 |     sys.path.append('..')
21 |
22 | # Hand over to the command line interface.
23 | import taupe
24 | from taupe.__main__ import main as main
25 |
26 | if __name__ == '__main__':
27 |     if len(sys.argv) > 1 and sys.argv[1] == 'help':
28 |         plac.call(main, ['-h'])
29 |     else:
30 |         plac.call(main)
31 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2022 by Michael Hucka.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 |
10 | The above copyright notice and this permission notice shall be included in
11 | all copies or substantial portions of the Software.
12 |
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
20 |
--------------------------------------------------------------------------------
/taupe/__init__.py:
--------------------------------------------------------------------------------
1 | '''
2 | Taupe: Extract the URLs from your personal Twitter archive
3 |
4 | This file is part of https://github.com/mhucka/taupe/.
5 |
6 | Copyright (c) 2022 by Michael Hucka.
7 | This code is open-source software released under the MIT license.
8 | Please see the file "LICENSE" for more information.
9 | '''
10 |
11 | # Package metadata ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12 | #
13 | #  ╭────────────────────── Notice ── Notice ── Notice ─────────────────────╮
14 | #  |    The following values are automatically updated at every release    |
15 | #  |    by the Makefile. Manual changes to these values will be lost.      |
16 | #  ╰────────────────────── Notice ── Notice ── Notice ─────────────────────╯
17 |
18 | __version__ = '1.2.0'
19 | __description__ = 'Taupe: a tool to extract URLs from your personal Twitter archive'
20 | __url__ = 'https://github.com/mhucka/taupe'
21 | __author__ = 'Mike Hucka'
22 | __email__ = 'mhucka@caltech.edu'
23 | __license__ = 'MIT'
24 |
25 |
26 | # Miscellaneous utilities.
27 | # .............................................................................
28 |
29 | def print_version():
30 |     print(f'{__name__} version {__version__}')
31 |     print(f'Authors: {__author__}')
32 |     print(f'URL: {__url__}')
33 |     print(f'License: {__license__}')
34 |
--------------------------------------------------------------------------------
/taupe/exit_codes.py:
--------------------------------------------------------------------------------
1 | '''
2 | exit_codes.py: define exit codes for program return values
3 |
4 | This file is part of https://github.com/mhucka/taupe/.
5 |
6 | Copyright (c) 2022 by Michael Hucka.
7 | This code is open-source software released under the MIT license.
8 | Please see the file "LICENSE" for more information.
9 | '''
10 |
11 | from aenum import Enum, MultiValue
12 |
13 |
14 | # I adapted the clever approach posted by the author of the Python aenum
15 | # package, Ethan Furman, to Stack Overflow on 2016-03-13 at
16 | # https://stackoverflow.com/a/35964875/743730
17 | # The most important bit is realizing you can define __int__().
18 |
19 | class ExitCode(Enum):
20 |     '''Class of exit codes that this program may return.
21 |
22 |     The numeric value of a given code can be obtained by using int(). For
23 |     example, int(ExitCode.success) will produce 0.
24 |     '''
25 |
26 |     _init_ = 'value meaning'
27 |     _settings_ = MultiValue
28 |
29 |     success = 0, "success -- program completed normally"
30 |     user_interrupt = 1, "the user interrupted the program's execution"
31 |     bad_arg = 2, "encountered a bad or missing value for an option"
32 |     file_error = 3, "encountered a problem with a file or directory"
33 |     exception = 4, "a miscellaneous exception or fatal error occurred"
34 |
35 |     def __int__(self):
36 |         return self.value
37 |
--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
1 | # =============================================================================
2 | # @file setup.cfg
3 | # @brief Package metadata and PyPI configuration
4 | # @created 2021-10-16
5 | # @license Please see the file named LICENSE in the project directory
6 | # @website https://github.com/mhucka/taupe
7 | # =============================================================================
8 |
9 | [metadata]
10 | name = taupe
11 | version = 1.2.0
12 | description = Taupe: a tool to extract URLs from your personal Twitter archive
13 | author = Mike Hucka
14 | author_email = mhucka@caltech.edu
15 | license = MIT
16 | license_files = LICENSE
17 | url = https://github.com/mhucka/taupe
18 | # The remaining items below are used by PyPI.
19 | project_urls =
20 |   Source Code = https://github.com/mhucka/taupe
21 |   Bug Tracker = https://github.com/mhucka/taupe/issues
22 | keywords = Python, applications
23 | classifiers =
24 |   Development Status :: 3 - Alpha
25 |   Environment :: Console
26 |   License :: OSI Approved :: MIT License
27 |   Intended Audience :: Science/Research
28 |   Operating System :: MacOS :: MacOS X
29 |   Operating System :: POSIX
30 |   Operating System :: POSIX :: Linux
31 |   Operating System :: Unix
32 |   Programming Language :: Python
33 |   Programming Language :: Python :: 3.8
34 | long_description = file:README.md
35 | long_description_content_type = text/markdown
36 |
37 | [options]
38 | packages = find:
39 | zip_safe = False
40 | python_requires = >= 3.8
41 |
42 | [options.entry_points]
43 | console_scripts =
44 |   taupe = taupe.__main__:console_scripts_main
45 |
46 | [tool:pytest]
47 | pythonpath = .
48 |
49 |
--------------------------------------------------------------------------------
/.flake8:
--------------------------------------------------------------------------------
1 | # =========================================================== -*- conf-toml -*-
2 | # @file .flake8
3 | # @brief Project-wide Flake8 configuration
4 | # @created 2022-05-10
5 | # @license Please see the file named LICENSE in the project directory
6 | # @website https://github.com/mhucka/taupe
7 | #
8 | # Note: as of version 4.0, flake8 does NOT read global configuration files
9 | # from ~/.flake8 or ~/.config/flake8. If you had such a config file of your
10 | # own, and you're looking at this config file and wondering how the two will
11 | # interact, the answer is simple: they won't. Only this file matters.
12 | #
13 | # The following flake8 plugins are assumed to be installed:
14 | # flake8-bugbear
15 | # flake8-builtins
16 | # flake8-comprehensions
17 | # flake8-executable
18 | # flake8-implicit-str-concat
19 | # flake8-pie
20 | # flake8-simplify
21 | # =============================================================================
22 |
23 | [flake8]
24 | # I try to stick to 80 chars, but sometimes it's more readable to go longer.
25 | max-line-length = 90
26 |
27 | ignore =
28 |     # We prefer to put spaces around the = in keyword arg lists.
29 |     E251,
30 |     # We prefer two lines between methods of a class.
31 |     E303,
32 |     # Sometimes we want to align keywords, and these rules run counter to it.
33 |     E271,
34 |     E221,
35 |     # In some situations, it's more readable to omit spaces around operators
36 |     # and colons.
37 |     E203,
38 |     E226,
39 |     # According to the Flake8 Rules reference at https://www.flake8rules.com/rules/W503.html
40 |     # line breaks *should* come before a binary operator, but as of version 4,
41 |     # Flake8 still flags the breaks as bad. So:
42 |     W503,
43 |     # I disagree with this one.
44 |     B005
45 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # =============================================================================
3 | # @file setup.py
4 | # @brief Installation setup file
5 | # @created 2022-11-18
6 | # @license Please see the file named LICENSE in the project directory
7 | # @website https://github.com/mhucka/taupe
8 | #
9 | # Note: configuration metadata is maintained in setup.cfg. This file exists
10 | # primarily to hook in setup.cfg and requirements.txt.
11 | # =============================================================================
12 |
13 | from setuptools import setup
14 |
15 |
16 | def requirements(file):
17 |     from os import path
18 |     required = []
19 |     requirements_file = path.join(path.abspath(path.dirname(__file__)), file)
20 |     if path.exists(requirements_file):
21 |         with open(requirements_file, encoding='utf-8') as f:
22 |             required = [ln for ln in filter(str.strip, f.read().splitlines())
23 |                         if not ln.startswith('#')]
24 |         if any(item.startswith(('-', '.', '/')) for item in required):
25 |             # The requirements.txt uses pip features. Try to use pip's parser.
26 |             try:
27 |                 from pip._internal.req import parse_requirements
28 |                 from pip._internal.network.session import PipSession
29 |                 parsed = parse_requirements(requirements_file, PipSession())
30 |                 required = [item.requirement for item in parsed]
31 |             except ImportError:
32 |                 # No pip, or not the expected version. Give up & return as-is.
33 |                 pass
34 |     return required
35 |
36 |
37 | setup(
38 |     setup_requires = ['wheel'],
39 |     install_requires = requirements('requirements.txt'),
40 |     extras_require={'dev': requirements('requirements-dev.txt')},
41 | )
42 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # =========================================================== -*- gitignore -*-
2 | # @file .gitignore
3 | # @brief Files and patterns for files and subdirs that git should ignore
4 | # @date 2022-10-16
5 | # @license Please see the file named LICENSE in the project directory
6 | # @website https://github.com/mhucka/taupe
7 | #
8 | # The approach I suggest is to add ONLY project-specific rules here. Put
9 | # rules that apply to your way of doing things (and the particular tools you
10 | # happen to use) into a global git ignore file as described in the section
11 | # "Configuring ignored files for all repositories on your computer" here:
12 | # https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files
13 | # (accessed on 2022-07-14). For example, Emacs checkpoint and backup files are
14 | # things that are not specific to a given project; rather, Emacs users will
15 | # see them created everywhere, in all projects, because they're a byproduct
16 | # of using Emacs, not a consequence of working on a particular project. Thus,
17 | # they belong in a user's global ignores list, not in this project's .gitignore.
18 | #
19 | # A useful starting point for global .gitignore file contents can be found at
20 | # https://github.com/github/gitignore/tree/main/Global (as of 2022-07-14).
21 | # =============================================================================
22 |
23 | # Python-specific things to ignore (relevant because this is a Python project).
24 | # .............................................................................
25 |
26 | __pycache__/
27 | *.py[cod]
28 | *$py.class
29 | *.egg-info/
30 | .eggs/
31 | .pytest_cache
32 | .coverage
33 |
34 | # Project-specific things to ignore:
35 | # .............................................................................
36 |
37 | build
38 | dist
39 | *.tmp
40 | *.bak
41 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | Contributor Covenant Code of Conduct
2 | ====================================
3 |
4 | ## Our Pledge
5 |
6 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
7 |
8 | ## Our Standards
9 |
10 | Examples of behavior that contributes to creating a positive environment include:
11 |
12 | * Using welcoming and inclusive language
13 | * Being respectful of differing viewpoints and experiences
14 | * Gracefully accepting constructive criticism
15 | * Focusing on what is best for the community
16 | * Showing empathy towards other community members
17 |
18 | Examples of unacceptable behavior by participants include:
19 |
20 | * The use of sexualized language or imagery and unwelcome sexual attention or advances
21 | * Trolling, insulting/derogatory comments, and personal or political attacks
22 | * Public or private harassment
23 | * Publishing others' private information, such as a physical or electronic address, without explicit permission
24 | * Other conduct which could reasonably be considered inappropriate in a professional setting
25 |
26 | ## Our Responsibilities
27 |
28 | Project contributors are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior.
29 |
30 | Project contributors have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
31 |
32 | ## Scope
33 |
34 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project contributors.
35 |
36 | ## Enforcement
37 |
38 | If a contributor engages in harassing behaviour, the project organizer(s) may take any action they deem appropriate, including warning the offender or expelling them from online forums, online project resources, face-to-face meetings, or any other project-related activity or resource.
39 |
40 | If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of the project team immediately. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately.
41 |
42 | ## Attribution
43 |
44 | Portions of this Code of Conduct were adapted from Electron's [Contributor Covenant Code of Conduct](https://github.com/electron/electron/blob/master/CODE_OF_CONDUCT.md), which itself was adapted from the [Contributor Covenant](http://contributor-covenant.org/version/1/4), version 1.4.
45 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | # =============================================================================
2 | # @file Makefile
3 | # @brief Makefile for some steps in creating new releases on GitHub
4 | # @date 2021-10-16
5 | # @license Please see the file named LICENSE in the project directory
6 | # @website https://github.com/mhucka/taupe
7 | # =============================================================================
8 |
9 | .ONESHELL: # Run all commands in the same shell.
10 | .SHELLFLAGS += -e # Exit at the first error.
11 |
12 | # This Makefile uses syntax that needs at least GNU Make version 3.82.
13 | # The following test is based on the approach posted by Eldar Abusalimov to
14 | # Stack Overflow in 2012 at https://stackoverflow.com/a/12231321/743730
15 |
16 | ifeq ($(filter undefine,$(value .FEATURES)),)
17 | $(error Unsupported version of Make. \
18 |     This Makefile does not work properly with GNU Make $(MAKE_VERSION); \
19 |     it needs GNU Make version 3.82 or later)
20 | endif
21 |
22 | # Before we go any further, test if certain programs are available.
23 | # The following is based on the approach posted by Jonathan Ben-Avraham to
24 | # Stack Overflow in 2014 at https://stackoverflow.com/a/25668869
25 |
26 | programs_needed = awk curl gh git jq sed python3 pyinstaller pandoc inliner create-dmg
27 | TEST := $(foreach p,$(programs_needed),\
28 |     $(if $(shell which $(p)),_,$(error Cannot find program "$(p)")))
29 |
30 | # Set some basic variables. These are quick to set; we set additional
31 | # variables using "vars" but only when the others are needed.
32 |
33 | name := $(strip $(shell awk -F "=" '/^name/ {print $$2}' setup.cfg))
34 | version := $(strip $(shell awk -F "=" '/^version/ {print $$2}' setup.cfg))
35 | url := $(strip $(shell awk -F "=" '/^url/ {print $$2}' setup.cfg))
36 | desc := $(strip $(shell awk -F "=" '/^description / {print $$2}' setup.cfg))
37 | author := $(strip $(shell awk -F "=" '/^author / {print $$2}' setup.cfg))
38 | email := $(strip $(shell awk -F "=" '/^author_email/ {print $$2}' setup.cfg))
39 | license := $(strip $(shell awk -F "=" '/^license / {print $$2}' setup.cfg))
40 | platform := $(strip $(shell python3 -c 'import sys; print(sys.platform)'))
41 | os := $(subst darwin,macos,$(platform))
42 | branch := $(shell git rev-parse --abbrev-ref HEAD)
43 | initfile := $(name)/__init__.py
44 | distdir := dist/$(os)
45 | builddir := build/$(os)
46 |
47 |
48 | # Print help if no command is given ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
49 |
50 | help:
51 | 	@echo 'Available commands:'
52 | 	@echo ''
53 | 	@echo 'make'
54 | 	@echo 'make help'
55 | 	@echo ' Print this summary of available commands.'
56 | 	@echo ''
57 | 	@echo 'make report'
58 | 	@echo ' Print variables set in this Makefile from various sources.'
59 | 	@echo ' This is useful to verify the values that have been parsed.'
60 | 	@echo ''
61 | 	@echo 'make lint'
62 | 	@echo ' Run Python linters like flake8.'
63 | 	@echo ''
64 | 	@echo 'make test'
65 | 	@echo ' Run pytest.'
66 | 	@echo ''
67 | 	@echo 'make install'
68 | 	@echo ' Install the project in dev mode.'
69 | 	@echo ''
70 | 	@echo 'make release'
71 | 	@echo ' Do a release on GitHub. This will push changes to GitHub,'
72 | 	@echo ' open an editor to let you edit release notes, and run'
73 | 	@echo ' "gh release create" followed by "gh release upload".'
74 | 	@echo ' Note: this will NOT upload to PyPI, nor create binaries.'
75 | 	@echo ''
76 | 	@echo 'make packages'
77 | 	@echo ' Create the distribution files for PyPI.'
78 | 	@echo ' Do this manually to check that everything looks okay before uploading.'
79 | 	@echo ' After doing this, do a "make test-pypi".'
80 | 	@echo ''
81 | 	@echo 'make test-pypi'
82 | 	@echo ' Upload distribution to test.pypi.org.'
83 | 	@echo ' Do this before doing "make pypi" for real.'
84 | 	@echo ''
85 | 	@echo 'make pypi'
86 | 	@echo ' Upload distribution to pypi.org.'
87 | 	@echo ''
88 | 	@echo 'make clean'
89 | 	@echo ' Clean up various files generated by this Makefile.'
90 | 	@echo ''
91 | 	@echo 'make really-clean'
92 | 	@echo ' Like "make clean", but more so.'
93 | 	@echo ''
94 | 	@echo 'make completely-clean'
95 | 	@echo ' The ultimate in cleaning.'
96 |
97 |
98 | # Gather additional values we sometimes need ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
99 | #
100 | # These variables take longer to compute, and for some actions like "make help"
101 | # they are unnecessary and annoying to wait for.
102 |
103 | .SILENT: vars
104 | vars:
105 | 	$(info Gathering data -- this takes a few moments ...)
106 | 	$(eval repo := $(strip $(shell gh repo view | head -1 | cut -f2 -d':')))
107 | 	$(eval api_url := https://api.github.com)
108 | 	$(eval id := $(shell curl -s $(api_url)/repos/$(repo) | jq '.id'))
109 | 	$(info Gathering data -- this takes a few moments ... Done.)
110 | 111 | report: vars 112 | @echo name = $(name) 113 | @echo version = $(version) 114 | @echo url = $(url) 115 | @echo desc = $(desc) 116 | @echo author = $(author) 117 | @echo email = $(email) 118 | @echo license = $(license) 119 | @echo branch = $(branch) 120 | @echo repo = $(repo) 121 | @echo id = $(id) 122 | @echo initfile = $(initfile) 123 | @echo distdir = $(distdir) 124 | @echo builddir = $(builddir) 125 | 126 | 127 | # make lint & make test ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 128 | 129 | lint: 130 | flake8 taupe 131 | 132 | test tests:; 133 | pytest -v --cov=taupe -l tests/ 134 | 135 | 136 | # make install ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 137 | 138 | install: 139 | python3 -m pip install -e .[dev] 140 | 141 | 142 | # make release ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 143 | 144 | release: | test-branch release-on-github print-instructions 145 | 146 | test-branch:; 147 | ifneq ($(branch),main) 148 | $(error Current git branch != main. Merge changes into main first!)
149 | endif 150 | 151 | update-init:; 152 | @sed -i .bak -e "s|^\(__version__ *=\).*|\1 '$(version)'|" $(initfile) 153 | @sed -i .bak -e "s|^\(__description__ *=\).*|\1 '$(desc)'|" $(initfile) 154 | @sed -i .bak -e "s|^\(__url__ *=\).*|\1 '$(url)'|" $(initfile) 155 | @sed -i .bak -e "s|^\(__author__ *=\).*|\1 '$(author)'|" $(initfile) 156 | @sed -i .bak -e "s|^\(__email__ *=\).*|\1 '$(email)'|" $(initfile) 157 | @sed -i .bak -e "s|^\(__license__ *=\).*|\1 '$(license)'|" $(initfile) 158 | 159 | update-meta:; 160 | @sed -i .bak -e "/version/ s/[0-9].[0-9][0-9]*.[0-9][0-9]*/$(version)/" codemeta.json 161 | 162 | update-citation:; 163 | $(eval date := $(shell date "+%F")) 164 | @sed -i .bak -e "/^date-released/ s/[0-9][0-9-]*/$(date)/" CITATION.cff 165 | @sed -i .bak -e "/^version/ s/[0-9].[0-9][0-9]*.[0-9][0-9]*/$(version)/" CITATION.cff 166 | 167 | edited := codemeta.json $(initfile) CITATION.cff 168 | 169 | commit-updates:; 170 | git add $(edited) 171 | git diff-index --quiet HEAD $(edited) || \ 172 | git commit -m"Update stored version number" $(edited) 173 | 174 | release-on-github: | vars update-init update-meta update-citation commit-updates 175 | $(eval tmp_file := $(shell mktemp /tmp/release-notes-$(name).XXXX)) 176 | git push -v --all 177 | git push -v --tags 178 | @$(info ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓) 179 | @$(info ┃ Write release notes in the file that gets opened in your ┃) 180 | @$(info ┃ editor. Close the editor to complete the release process. ┃) 181 | @$(info ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛) 182 | sleep 2 183 | $(EDITOR) $(tmp_file) 184 | gh release create v$(version) -t "Release $(version)" -F $(tmp_file) 185 | 186 | print-instructions: vars 187 | @$(info ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓) 188 | @$(info ┃ Next steps: ┃) 189 | @$(info ┃ 1. Check https://github.com/$(repo)/releases ) 190 | @$(info ┃ 2. 
Wait a few seconds to let web services do their work ┃) 191 | @$(info ┃ 3. Run "make packages" & check the results ┃) 192 | @$(info ┃ 4. Run "make test-pypi" to push to test.pypi.org ┃) 193 | @$(info ┃ 5. Check https://test.pypi.org/project/$(name) ) 194 | @$(info ┃ 6. Run "make pypi" to push to pypi for real ┃) 195 | @$(info ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛) 196 | 197 | packages: vars 198 | -mkdir -p $(builddir) $(distdir) 199 | python3 setup.py sdist --dist-dir $(distdir) 200 | python3 setup.py bdist_wheel --dist-dir $(distdir) 201 | python3 -m twine check $(distdir)/$(name)-$(version).tar.gz 202 | 203 | # Note: for the next action to work, the repository "testpypi" needs to be 204 | # defined in your ~/.pypirc file. Here is an example file: 205 | # 206 | # [distutils] 207 | # index-servers = 208 | # pypi 209 | # testpypi 210 | # 211 | # [testpypi] 212 | # repository = https://test.pypi.org/legacy/ 213 | # username = YourPyPIlogin 214 | # password = YourPyPIpassword 215 | # 216 | # You could copy-paste the above to ~/.pypirc, substitute your user name and 217 | # password, and things should work after that. See the following for more info: 218 | # https://packaging.python.org/en/latest/specifications/pypirc/ 219 | 220 | test-pypi: packages 221 | python3 -m twine upload --verbose --repository testpypi \ 222 | $(distdir)/$(name)-$(version)*.{whl,gz} 223 | 224 | pypi: packages 225 | python3 -m twine upload $(distdir)/$(name)-$(version)*.{gz,whl} 226 | 227 | 228 | # Cleanup and miscellaneous directives ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 229 | 230 | clean: clean-dist clean-build clean-release clean-other 231 | @echo ✨ Cleaned! 
✨ 232 | 233 | really-clean: clean really-clean-dist really-clean-build 234 | 235 | completely-clean: really-clean clean-other 236 | rm -rf build dist 237 | 238 | clean-build:; 239 | rm -rf $(builddir)/lib $(builddir)/bdist.* 240 | 241 | clean-dist: vars 242 | rm -fr $(distdir)/$(name) $(distdir)/$(name)-$(version)-py3-none-any.whl 243 | 244 | really-clean-build: clean-build 245 | rm -rf $(builddir)/*.* 246 | 247 | really-clean-dist: clean-dist 248 | rm -fr $(distdir)/*.* 249 | 250 | clean-release:; 251 | rm -rf $(name).egg-info codemeta.json.bak CITATION.cff.bak $(initfile).bak README.md.bak 252 | 253 | clean-other:; 254 | rm -fr __pycache__ $(name)/__pycache__ .eggs 255 | rm -rf .cache 256 | rm -rf .pytest_cache 257 | 258 | .PHONY: release release-on-github update-init update-meta update-citation \ 259 | print-instructions packages clean test-pypi pypi extra-files dmg \ 260 | pyinstaller clean clean-dist clean-build clean-release clean-other \ 261 | really-clean really-clean-dist really-clean-build completely-clean 262 | 263 | .PHONY: help vars report lint test tests install release test-branch \ 264 | update-init update-meta update-citation commit-updates \ 265 | release-on-github print-instructions update-doi \ 266 | packages test-pypi pypi clean really-clean completely-clean \ 267 | clean-dist really-clean-dist clean-build really-clean-build \ 268 | clean-release clean-other dmg pyinstaller extra-files 269 | 270 | .SILENT: clean clean-dist clean-build clean-release clean-other really-clean \ 271 | really-clean-dist really-clean-build completely-clean 272 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Taupe 2 | 3 | A simple program to extract the URLs of your tweets, retweets, replies, quote tweets, and "likes" from a personal Twitter archive.
4 | 5 | [![License](https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square)](https://choosealicense.com/licenses/mit) 6 | [![Latest release](https://img.shields.io/github/v/release/mhucka/taupe.svg?style=flat-square&color=purple&label=Release)](https://github.com/mhucka/taupe/releases) 7 | 8 | 9 | ## Table of contents 10 | 11 | * [Introduction](#introduction) 12 | * [Installation](#installation) 13 | * [Usage](#usage) 14 | * [Known issues and limitations](#known-issues-and-limitations) 15 | * [Relationships to other similar tools](#relationships-to-other-similar-tools) 16 | * [Getting help](#getting-help) 17 | * [Contributing](#contributing) 18 | * [License](#license) 19 | * [Acknowledgments](#acknowledgments) 20 | 21 | 22 | ## Introduction 23 | 24 | When you [download your personal Twitter archive](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive), you receive a [ZIP](https://en.wikipedia.org/wiki/ZIP_(file_format)) file. The contents are not necessarily in a format convenient for doing something with them. For example, you may want to send the URLs to the [Wayback Machine at the Internet Archive](https://archive.org/web/) or do something else with the URLs. For tasks like that, you need to extract URLs from your Twitter archive. That's the purpose of Taupe. 25 | 26 | _Taupe_ (a loose acronym of Twitter archive URL parser) takes a Twitter archive ZIP file, extracts the URLs corresponding to your tweets, retweets, replies, quote tweets, and liked tweets, and outputs the results in a [comma-separated values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) format that you can easily use with other software tools.
Once you have [installed it](#installation), using `taupe` is easy: 27 | ```shell 28 | # Extract tweets, retweets, replies, and quote tweets: 29 | taupe /path/to/your/twitter-archive.zip 30 | 31 | # Extract likes: 32 | taupe --extract likes /path/to/your/twitter-archive.zip 33 | 34 | # Learn more: 35 | taupe --help 36 | ``` 37 | 38 | ## Installation 39 | 40 | There are multiple ways of installing Taupe. Please choose the alternative that suits you. 41 | 42 | ### _Alternative 1: installing Taupe using `pipx`_ 43 | 44 | [Pipx](https://pypa.github.io/pipx/) lets you install Python programs in a way that isolates Python dependencies, and yet the resulting `taupe` command can be run from any shell and directory – like any normal program on your computer. If you use `pipx` on your system, you can install Taupe with the following command: 45 | ```sh 46 | pipx install taupe 47 | ``` 48 | 49 | Pipx can also let you run Taupe directly using `pipx run taupe`, although in that case, you must always prefix every Taupe command with `pipx run`. Consult the [documentation for `pipx run`](https://github.com/pypa/pipx#walkthrough-running-an-application-in-a-temporary-virtual-environment) for more information. 50 | 51 | 52 | ### _Alternative 2: installing Taupe using `pip`_ 53 | 54 | You should be able to install `taupe` with [`pip`](https://pip.pypa.io/en/stable/installing/) for Python 3. To install `taupe` from the [Python package repository (PyPI)](https://pypi.org), run the following command: 55 | ```sh 56 | python3 -m pip install taupe 57 | ``` 58 | 59 | As an alternative to getting it from [PyPI](https://pypi.org), you can use `pip` to install `taupe` directly from GitHub: 60 | ```sh 61 | python3 -m pip install git+https://github.com/mhucka/taupe.git 62 | ``` 63 | 64 | _If you already installed Taupe once before_, and want to update to the latest version, add `--upgrade` to the end of either command line above. 
65 | 66 | 67 | ### _Alternative 3: installing Taupe from sources_ 68 | 69 | If you prefer to install Taupe directly from the source code, you can do that too. To get a copy of the files, you can clone the GitHub repository: 70 | ```sh 71 | git clone https://github.com/mhucka/taupe 72 | ``` 73 | 74 | Alternatively, you can download the software source files as a ZIP archive directly from your browser using this link: 75 | 76 | Next, after getting a copy of the files, run `setup.py` inside the code directory: 77 | ```sh 78 | cd taupe 79 | python3 setup.py install 80 | ``` 81 | 82 | 83 | ## Usage 84 | 85 | If the installation process described above is successful, you should end up with a program named `taupe` in a location where software is normally installed on your computer. Running `taupe` should be as simple as running any other command-line program. For example, the following command should print a helpful message to your terminal: 86 | ```shell 87 | taupe --help 88 | ``` 89 | 90 | If not given the option `--help` or `--version`, this program expects to be given a [personal Twitter archive file](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive), either on the command line (as an argument) or on standard input (from a pipe or file redirection). Here's an example (and note this path is fake – substitute a real path on your computer when you do this!): 91 | ```shell 92 | taupe /path/to/twitter-archive.zip 93 | ``` 94 | 95 | The URLs produced by `taupe` will be, by default, as they appear in the archive. 
If you want to [normalize the URLs](https://developer.twitter.com/en/blog/community/2020/getting-to-the-canonical-url-for-a-tweet) into the canonical form `https://twitter.com/twitter/status/TWEETID`, use the option `--canonical-urls` (`-c` for short): 96 | ```shell 97 | taupe -c /path/to/twitter-archive.zip 98 | ``` 99 | 100 | 101 | ### The structure of the output 102 | 103 | The option `--extract` controls both the content and the format of the output. The following options are recognized: 104 | 105 | | Value | Synonym | Output | 106 | |------------------|----------------|--------| 107 | | `all-tweets` | `tweets` | CSV table with all tweets and details (default) | 108 | | `my-tweets` | | list of URLs of only your original tweets | 109 | | `retweets` | | list of URLs of tweets that are retweets | 110 | | `quoted-tweets` | `quote-tweets` | list of URLs of other tweets you quoted | 111 | | `replied-tweets` | `reply-tweets` | list of URLs of other tweets you replied to | 112 | | `liked` | `likes` | list of URLs of tweets you "liked" | 113 | 114 | 115 | #### `all-tweets` 116 | 117 | When using `--extract all-tweets` (the default), `taupe` produces a table with four columns. Each row of the table corresponds to a type of event in the Twitter timeline: a tweet, a retweet, a reply to another tweet, or a quote tweet. The values in the columns provide details about the event. The following is a summary of the structure: 118 | 119 | | Column 1 | Column 2 | Column 3 | Column 4 | 120 | |:-------------:|----------|----------|----------| 121 | | tweet timestamp in ISO format | The URL of the tweet | The type; one of `tweet`, `reply`, `retweet`, or `quote` | (For type `reply` or `quote`.) The URL of the original or source tweet | 122 | 123 | The last column only has a value for replies and quote-tweets; in those cases, the URL in the column refers to the tweet being replied to or the tweet being quoted. 
The fourth column does not have a value for retweets even though it would be desirable, because the Twitter archive – strangely – does not provide the URLs of retweeted tweets. 124 | 125 | Here is an example of the output: 126 | ```text 127 | 2022-09-21T22:36:29+00:00,https://twitter.com/mhucka/status/1572716422857658368,quote,https://twitter.com/poppy_northcutt/status/1572714310077673472 128 | 2022-10-10T22:04:20+00:00,https://twitter.com/mhucka/status/1579593701965582336,reply,https://twitter.com/arfon/status/1579572453726355456 129 | 2022-10-14T04:17:01+00:00,https://twitter.com/mhucka/status/1580774654217625600,tweet 130 | 2022-10-25T14:49:06+00:00,https://twitter.com/mhucka/status/1584919989307715586,retweet 131 | ... 132 | ``` 133 | 134 | #### `my-tweets` 135 | 136 | When using `--extract my-tweets`, the output is just a single column (a list) of URLs, one per line, of just your original tweets. This list corresponds exactly to column 2 in the `--extract all-tweets` case above. 137 | 138 | 139 | #### `retweets` 140 | 141 | When using `--extract retweets`, the output is a single column (a list) of URLs, one per line, of tweets that are retweets of other tweets. This list corresponds to the values of column 2 above when the type is `retweet`. **Important**: the Twitter archive does not contain the original tweet's URL, only the URL of your retweet. Consequently, the output for `--extract retweets` is _your_ retweet's URL, not the URL of the source tweet. 142 | 143 | 144 | #### `quoted-tweets` 145 | 146 | When using `--extract quoted-tweets`, the output is a list of the URLs of other tweets that you have quoted. It corresponds to the subset of column 4 values above when the type is "quote". Note that these are the source tweet URLs, not the URLs of your tweets. 147 | 148 | 149 | #### `replied-tweets` 150 | 151 | When using `--extract replied-tweets`, the output is a list of the URLs of other tweets that you have replied to. 
It corresponds to the subset of column 4 values above when the type is "reply". Note that these are the source tweet URLs, not the URLs of your tweets. 152 | 153 | 154 | #### `likes` 155 | 156 | When using the option `--extract likes`, the output will only contain one column: the URLs of the "liked" tweets. `taupe` cannot provide more detail because the Twitter archive format does not contain date/time information for "likes". (This is also why "likes" are _not_ part of the output when `--extract all-tweets` is used – there is no possible value for column 1.) 157 | 158 | Here is an example of the output when using `--extract likes` in combination with `--canonical-urls`: 159 | ``` 160 | https://twitter.com/twitter/status/1588146224376463365 161 | https://twitter.com/twitter/status/1588349144803905536 162 | https://twitter.com/twitter/status/1590475356976578560 163 | ... 164 | ``` 165 | 166 | 167 | ### Other options recognized by `taupe` 168 | 169 | Running `taupe` with the option `--help` will make it print help text and exit without doing anything else. 170 | 171 | The option `--output` controls where `taupe` writes the output. If the value given to `--output` is `-` (a single dash), the output is written to the terminal (stdout). Otherwise, the value must be a file. 172 | 173 | If given the `--version` option, this program will print its version and other information, and exit without doing anything else. 174 | 175 | If given the `--debug` argument, `taupe` will output a detailed trace of what it is doing. The debug trace will be sent to the given destination, which can be `-` to indicate console output, or a file path to send the debug output to a file. 176 | 177 | ### _Summary of command-line options_ 178 | 179 | The following table summarizes all the command line options available. 
180 | 181 | | Short      | Long form opt   | Meaning | Default | | 182 | |---------------|------------------------|----------------------|---------|---| 183 | | `-c` | `--canonical-urls` | Normalize Twitter URLs | Leave as-is| | 184 | | `-h` | `--help` | Print help info and exit | | | 185 | | `-e` _E_ | `--extract` _E_ | Extract URL type _E_ | `all-tweets` | ⚑ | 186 | | `-o` _O_ | `--output` _O_ | Write output to file _O_ | Terminal | ✦ | 187 | | `-V` | `--version` | Print program version & exit | | | 188 | | `-@` _OUT_ | `--debug` _OUT_ | Write debug output to _OUT_ | | ⚐ | 189 | 190 | ⚑   Recognized values: `all-tweets`, `tweets`, `my-tweets`, `retweets`, `quoted-tweets`, `replied-tweets`, and `likes`. See [section above](#the-structure-of-the-output) for more information.
191 | ✦   To write to the console, you can also use the character `-` as the value of _O_; otherwise, _O_ must be the name of a file where the output should be written.
192 | ⚐   To write to the console, use the character `-` as the value of _OUT_; otherwise, _OUT_ must be the name of a file where the output should be written. 193 | 194 | 195 | ## Known issues and limitations 196 | 197 | This program assumes that the Twitter archive ZIP file is in the format which Twitter produced in mid-November 2022. Twitter probably used a different format in the past, and may change the format again in the future, so `taupe` may or may not work on Twitter archives obtained in different historical periods. 198 | 199 | The Twitter archive format for "likes" contains only the tweet identifier and the text of the tweet; consequently, `taupe` cannot provide date/time information for this case. 200 | 201 | This program does all its work in memory, which means that `taupe`'s ability to process a given archive depends on its size and how much RAM the computer has. It has only been tested with modest-sized archives. It is unknown how it will behave with exceptionally large archives. 202 | 203 | 204 | ## Relationships to other similar tools 205 | 206 | To the author's knowledge, Taupe is the only tool that will directly and easily extract the URLs of tweets and "likes" from a Twitter archive ZIP file. There do exist other software tools for working with Twitter archives; the following is a (possibly incomplete) list: 207 | * [twitter-archive-parser](https://github.com/timhutton/twitter-archive-parser) – convert the contents of a Twitter archive into other formats and extract other information such as lists of followers. 208 | * [Save Your Threads](https://archive.social) – lets you download signed PDFs of Twitter URLs. 209 | * [tweetback Twitter Archive](https://github.com/tweetback/tweetback) – "Take ownership of your Twitter data".
210 | * [twitter-tools](https://github.com/selfawaresoup/twitter-tools) – perform various operations, such as getting details about specific tweets, using the Twitter API. 211 | * [Twitter-Archive](https://github.com/jarulsamy/Twitter-Archive) – a Python CLI tool to download media from bookmarked tweets. 212 | * [get_twitter_bookmarks.py](https://gist.github.com/divyajyotiuk/9fb29c046e1dfcc8d5683684d7068efe#file-get_twitter_bookmarks_v3-py) – extract the URLs from bookmarked tweets; requires first using your web browser's developer interface to grab Twitter's bookmarks JSON data. 213 | * [archive.alt-text.org](https://github.com/alt-text-org/www.alt-text.org) – a tool for saving the alt text you've written on Twitter. 214 | * [twitter-archive-tweets](https://observablehq.com/@enjalot/twitter-archive-tweets) – a notebook to use as a starting point for processing tweets from your Twitter archive. 215 | * [fork of TWINT](https://github.com/woluxwolu/twint) – a fork of the now-defunct [Twitter Intelligence Tool](https://github.com/twintproject/twint). 216 | * [pleroma-bot](https://github.com/robertoszek/pleroma-bot) – a bot for mirroring your favorite Twitter accounts in the Fediverse as well as migrating your own to the Fediverse using a Twitter archive. 217 | * [twitter-archive-analysis](https://github.com/dangoldin/twitter-archive-analysis) – a script to analyze your Twitter archive. 218 | * [twitter-archive-reader](https://github.com/alkihis/twitter-archive-reader) – explore tweets, DMs, media and more in a Twitter archive. 219 | * [twitter-archive-converter](https://github.com/leandrojmp/twitter-archive-converter) – extract tweets from a Twitter archive. 220 | 221 | 222 | ## Getting help 223 | 224 | If you find a problem or have a request or suggestion, please submit it in [the GitHub issue tracker](https://github.com/mhucka/taupe/issues) for this repository.
225 | 226 | 227 | ## Contributing 228 | 229 | I would be happy to receive your help and participation if you are interested. Everyone is asked to read and respect the [code of conduct](CODE_OF_CONDUCT.md) when participating in this project. Please feel free to [report issues](https://github.com/mhucka/taupe/issues) or open a [pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) to fix bugs or add new features. 230 | 231 | 232 | ## License 233 | 234 | This software is Copyright (C) 2022, by Michael Hucka. This software is freely distributed under the MIT license. Please see the [LICENSE](LICENSE) file for more information. 235 | 236 | 237 | ## Acknowledgments 238 | 239 | This work is a personal project developed by the author, using computing equipment owned by the [California Institute of Technology Library](https://www.library.caltech.edu). 240 | 241 | The [vector artwork](https://thenounproject.com/icon/bird-233023/) of a bird, used as the icon for this repository, was created by [Noe Araujo](https://thenounproject.com/noearaujo/) from the Noun Project. It is licensed under the Creative Commons [CC-BY 3.0](https://creativecommons.org/licenses/by/3.0/) license. I manually changed the color to be a shade of taupe. 242 | 243 | Taupe uses multiple other open-source packages, without which it would have taken much longer to write the software. I want to acknowledge this debt.
In alphabetical order, the packages are: 244 | * [Aenum](https://github.com/ethanfurman/aenum) – Python package for advanced enumerations 245 | * [CommonPy](https://github.com/caltechlibrary/commonpy) – a collection of commonly-useful Python functions 246 | * [Plac](https://github.com/ialbert/plac) – a command line argument parser 247 | * [Rich](https://github.com/Textualize/rich) – library for writing styled text to the terminal 248 | * [Sidetrack](https://github.com/caltechlibrary/sidetrack) – simple debug logging/tracing package 249 | * [Twine](https://github.com/pypa/twine) – utilities for publishing Python packages on [PyPI](https://pypi.org) 250 | -------------------------------------------------------------------------------- /taupe/__main__.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Taupe: Extract the URLs from your personal Twitter archive 3 | 4 | This file is part of https://github.com/mhucka/taupe/. 5 | 6 | Copyright (c) 2022 by Michael Hucka. 7 | This code is open-source software released under the MIT license. 8 | Please see the file "LICENSE" for more information. 9 | ''' 10 | 11 | import sys 12 | if sys.version_info < (3, 8): 13 | print('taupe requires Python version 3.8 or higher,') 14 | print('but the current version is ' + str(sys.version_info.major) 15 | + '.' + str(sys.version_info.minor) + '.') 16 | sys.exit(1) 17 | 18 | # Note: this code uses lazy loading. Additional imports are made later. 19 | from commonpy.data_structures import CaseFoldDict 20 | import errno 21 | import plac 22 | from sidetrack import set_debug, log 23 | 24 | from .exit_codes import ExitCode 25 | 26 | 27 | # Constants. 28 | # ............................................................................. 29 | 30 | # Mapping of recognized --extract argument values to canonical names.
31 | EXTRACT_OPTIONS = CaseFoldDict({'all-tweets' : 'all-tweets', 32 | 'tweets' : 'all-tweets', 33 | 'my-tweets' : 'my-tweets', 34 | 'my-tweet' : 'my-tweets', 35 | 'my' : 'my-tweets', 36 | 'mine' : 'my-tweets', 37 | 'retweets' : 'retweets', 38 | 'retweet' : 'retweets', 39 | 'quoted-tweets' : 'quote-tweets', 40 | 'quote-tweets' : 'quote-tweets', 41 | 'quoted' : 'quote-tweets', 42 | 'replied-tweets' : 'reply-tweets', 43 | 'reply-tweets' : 'reply-tweets', 44 | 'replied' : 'reply-tweets', 45 | 'reply' : 'reply-tweets', 46 | 'likes' : 'likes', 47 | 'liked' : 'likes', 48 | 'like' : 'likes'}) 49 | 50 | # Main program. 51 | # ............................................................................. 52 | 53 | @plac.annotations( 54 | canonical_urls = ('convert URLs to canonical Twitter URL form' , 'flag' , 'c'), 55 | extract = ('extract info "E" (default: tweets)' , 'option', 'e'), 56 | output = ('write output to destination "O" (default: stdout)', 'option', 'o'), 57 | version = ('print program version info and exit' , 'flag' , 'V'), 58 | debug = ('write debug trace to "OUT" ("-" for console)' , 'option', '@'), 59 | archive_file = 'path to Twitter archive ZIP file', 60 | ) 61 | def main(canonical_urls = False, extract = 'E', output = 'O', version = False, 62 | debug = 'OUT', *archive_file): 63 | '''Taupe extracts URLs from your downloaded personal Twitter archive. 
64 | 65 | At its most basic, taupe ("Twitter Archive Url ParsEr") expects to be given 66 | the path to a Twitter archive ZIP file from which it should extract the URLs 67 | of tweets, replies, retweets, and quote tweets, and print the results: 68 | 69 | taupe /path/to/twitter-archive.zip 70 | 71 | If instead you want taupe to extract the URLs of "liked" tweets (see the next 72 | section for the difference), use the optional argument '--extract likes': 73 | 74 | taupe --extract likes /path/to/twitter-archive.zip 75 | 76 | The URLs produced by taupe will be, by default, as they appear in the archive, 77 | which means they will have account names in them. If you prefer to normalize 78 | the URLs to the canonical form https://twitter.com/twitter/status/TWEETID, use 79 | the optional argument '--canonical-urls': 80 | 81 | taupe --canonical-urls /path/to/twitter-archive.zip 82 | 83 | If you want to send the output to a file instead of the terminal, you can use 84 | the option '--output' and give it a destination file: 85 | 86 | taupe --output /tmp/urls.txt --canonical-urls /path/to/twitter-archive.zip 87 | 88 | The structure of the output 89 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 90 | 91 | The option '--extract' controls both the content and the format of the output. 92 | The following options are recognized: 93 | 94 | Value Synonym Output 95 | ------------ ------- ----------------------------------------------- 96 | all-tweets tweets CSV table with all tweets and details (default) 97 | my-tweets list of URLs of only your original tweets 98 | retweets list of URLs of tweets that are retweets 99 | quoted-tweets quote-tweets list of URLs of (other) tweets you quoted 100 | replied-tweets reply-tweets list of URLs of (other) tweets you replied to 101 | 102 | liked likes list of URLs of tweets you "liked" 103 | 104 | When using '--extract all-tweets' (the default), taupe produces a table with 105 | four columns. Each row of the table corresponds to a tweet of some kind. 
The 106 | values in the columns provide details: 107 | 108 | Column 1 Column 2 Column 3 Column 4 109 | -------- -------- ------------- --------------------------------- 110 | timestamp tweet URL type of tweet URL of quoted or replied-to tweet 111 | 112 | The last column only has a value for replies and quote-tweets; in those cases, 113 | it provides the URL of the tweet being replied to or the tweet being quoted. 114 | The fourth column does not have a value for retweets even though it would be 115 | desirable, because the Twitter archive (strangely) does not provide the 116 | URLs of retweeted tweets. Note also that this format does NOT include your 117 | "liked" tweets; those are available using a different option described below. 118 | 119 | When using '--extract my-tweets', the output is just a single column (a list) 120 | of URLs, one per line, corresponding to just your original tweets. This list 121 | corresponds exactly to column 2 in the '--extract all-tweets' case above. 122 | 123 | When using '--extract retweets', the output is a single column (a list) of 124 | URLs, one per line, of tweets that are retweets of other tweets. This list 125 | corresponds to the values of column 2 above when the type is 'retweet'. 126 | IMPORTANT: the Twitter archive does not contain the original tweet's URL, 127 | only the URL of your retweet. Consequently, the output of '--extract retweets' 128 | is YOUR retweet's URL, not the URL of the source tweet. 129 | 130 | When using '--extract quoted-tweets', the output is a list of the URLs of 131 | other people's tweets that you have quoted. It corresponds to the subset of 132 | column 4 values above when the type is "quote"; i.e., the source tweet URL, 133 | not the URL of your tweet. 134 | 135 | When using '--extract replied-tweets', the output is a list of the URLs of 136 | other people's tweets that you have replied to. 
It corresponds to the subset 137 | of column 4 values above when the type is "reply"; i.e., the source tweet URL, 138 | not the URL of your tweet. 139 | 140 | Finally, when using '--extract likes', the output will contain a list of the 141 | URLs of tweets you have "liked" on Twitter. Taupe cannot provide more details 142 | (not even timestamps) because the Twitter archive format does not contain the 143 | information. 144 | 145 | Other options recognized by taupe 146 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 147 | 148 | Running taupe with the option '--help' will make it print help text and exit 149 | without doing anything else. 150 | 151 | The option '--output' controls where taupe writes the output. If the value 152 | given to '--output' is "-" (a single dash), the output is written to the 153 | terminal (stdout). Otherwise, the value must be a file. 154 | 155 | If given the '--version' option, this program will print its version and other 156 | information, and exit without doing anything else. 157 | 158 | If given the '--debug' argument, taupe will output details about what it is 159 | doing. The debug trace will be sent to the given destination, which can be "-" 160 | to indicate console output, or a file path to send the debug output to a file. 161 | 162 | Return values 163 | ~~~~~~~~~~~~~ 164 | 165 | Taupe exits with a return code of 0 if no problem is encountered. Otherwise, 166 | it returns a nonzero value. 
The following table lists the possible values: 167 | 168 | 0 = success -- program completed normally 169 | 1 = the user interrupted the program's execution 170 | 2 = encountered a bad or missing value for an option 171 | 3 = file error -- encountered a problem with a file 172 | 4 = an exception or fatal error occurred 173 | 174 | Command-line options summary 175 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 176 | ''' 177 | 178 | # Process arguments & handle early exits ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 179 | 180 | debugging = (debug != 'OUT') 181 | if debugging: 182 | set_debug(True, debug) 183 | import faulthandler 184 | faulthandler.enable() 185 | 186 | if version: 187 | from taupe import print_version 188 | print_version() 189 | sys.exit(int(ExitCode.success)) 190 | 191 | log('starting.') 192 | log('command line: ' + str(sys.argv)) 193 | 194 | extract = 'all-tweets' if extract == 'E' else extract 195 | if extract not in EXTRACT_OPTIONS: 196 | stop('Unrecognized value for --extract option: ' + extract, ExitCode.bad_arg) 197 | else: 198 | requested = EXTRACT_OPTIONS[extract] 199 | 200 | archive_file = '-' if not archive_file else archive_file[0] 201 | if archive_file == '-' and sys.stdin.isatty(): 202 | stop('Need archive as argument or via pipe/redirection.', ExitCode.bad_arg) 203 | elif archive_file != '-': 204 | from commonpy.file_utils import readable 205 | from os.path import exists, isfile 206 | if not exists(archive_file): 207 | stop(f'Path does not appear to exist: {archive_file}', ExitCode.bad_arg) 208 | if not isfile(archive_file): 209 | stop(f'Path is not a file: {archive_file}', ExitCode.bad_arg) 210 | if not readable(archive_file): 211 | stop(f'File is not readable: {archive_file}', ExitCode.file_error) 212 | 213 | output = '-' if output == 'O' else output 214 | if output != '-': 215 | from commonpy.file_utils import writable 216 | if not writable(output): 217 | stop(f'Unable to write to destination: {output}', ExitCode.file_error) 218 | 219 | # Do the main work 
-------------------------------------------------------- 220 | 221 | exit_code = ExitCode.success 222 | try: 223 | if archive_file == '-': 224 | log('reading archive from stdin') 225 | import io 226 | archive_file = io.BytesIO(sys.stdin.buffer.read()) 227 | 228 | data = parsed_data(archive_file, requested, canonical_urls) 229 | filtered_data = filter(None, map(data_filter(requested), data)) 230 | write_data(filtered_data, output) 231 | except KeyboardInterrupt: 232 | # Catch it, but don't treat it as an error; just stop execution. 233 | log('keyboard interrupt received') 234 | exit_code = ExitCode.user_interrupt 235 | except Exception as ex: # noqa: PIE786 236 | exit_code = ExitCode.exception 237 | import traceback 238 | exception = sys.exc_info() 239 | details = ''.join(traceback.format_exception(*exception)) 240 | log('exception: ' + str(ex) + '\n\n' + details) 241 | if debugging and debug == '-': 242 | from rich.console import Console 243 | Console().print_exception() 244 | else: 245 | import taupe 246 | line = 'unknown' 247 | tb = ex.__traceback__ 248 | while tb.tb_next: 249 | tb = tb.tb_next 250 | line = tb.tb_lineno 251 | stop('Oh no! Taupe encountered an error. Please consider reporting' 252 | f' this to the developer. Your version of {taupe.__name__} is' 253 | f' {taupe.__version__} and the error occurred on line {line}.' 254 | f' For information about how to report this, please see the' 255 | f' project page at ' + taupe.__url__) 256 | 257 | # Exit with status code --------------------------------------------------- 258 | 259 | log(f'exiting with exit code {int(exit_code)}.') 260 | sys.exit(int(exit_code)) 261 | 262 | 263 | # Miscellaneous helpers. 264 | # ............................................................................. 
265 | 266 | # The functions for extracting URLs from the .js files (currently only like.js 267 | # and tweets.js) return a common intermediate format: an iterable (a generator 268 | # for likes, a sorted list for tweets) of 4-tuples: 269 | # 270 | # (date, url of my tweet, type, url of referenced tweet) 271 | # 272 | # The "type" can be one of "tweet", "reply", "retweet", "quote", or "like". 273 | # Some of the slots in the tuple are not filled in for all types. Notably, if 274 | # the type is "like", the date and tweet url are empty (because for a "liked" 275 | # tweet, it only makes sense to talk about the referenced tweet's URL). 276 | # Conversely, if we're not extracting "likes", then the referenced tweet url 277 | # slot only has a value for types "quote" and "reply". 278 | # 279 | # This kind of funneling of all types into a common intermediate form, even 280 | # though there is heterogeneity in the underlying data, is done to shorten 281 | # and simplify the code and not really for performance reasons. Performance 282 | # is currently not a concern because the expectation is that users won't run 283 | # this program very often anyway. 284 | 285 | def data_filter(requested): 286 | return { 287 | 'all-tweets' : lambda row: ','.join(row), 288 | 'my-tweets' : lambda row: row[1], 289 | 'retweets' : lambda row: row[1] if row[2] == 'retweet' else '', 290 | 'quote-tweets': lambda row: row[3] if row[2] == 'quote' else '', 291 | 'reply-tweets': lambda row: row[3] if row[2] == 'reply' else '', 292 | 'likes' : lambda row: row[3], 293 | }.get(requested) 294 | 295 | 296 | def likes_from(likes_file, username, canonical_urls = False): 297 | '''Return the URLs from the like.js file in a Twitter archive.''' 298 | import json 299 | # The file starts with "window.YTD.like.part0 = ". Skip that and it's json.
300 | likes_json = json.loads(likes_file[23:]) 301 | log(f'extracted {len(likes_json)} likes from the likes file') 302 | likes_urls = (item['like']['expandedUrl'] for item in likes_json) 303 | account = 'twitter' if canonical_urls else username 304 | # Return the same 4-tuple format as tweets_from(...). 305 | return (('', '', 'like', url.replace('i/web', account)) for url in likes_urls) 306 | 307 | 308 | def tweets_from(tweets_file, username, canonical_urls = False): 309 | '''Return tuples of parsed data from tweets.js in a Twitter archive.''' 310 | from dateutil.parser import parse 311 | import json 312 | import re 313 | 314 | ending_in_twitter_url = re.compile(r'.*(https://t\.co/\S+)$', re.DOTALL) 315 | 316 | # Helper functions. 317 | 318 | def user_from_tweet_url(url): 319 | if canonical_urls: 320 | return 'twitter' 321 | else: 322 | # Extract USERNAME from https://twitter.com/USERNAME/status/TWEETID 323 | fragment = url[20:] 324 | return fragment[: fragment.find('/')] 325 | 326 | def tweet_url(tweet): 327 | account = 'twitter' if canonical_urls else username 328 | return 'https://twitter.com/' + account + '/status/' + tweet['id_str'] 329 | 330 | def tweet_date(tweet): 331 | date = parse(tweet['created_at']) 332 | return date.isoformat() 333 | 334 | def tweet_data(tweet): 335 | tdate = tweet_date(tweet) 336 | turl = tweet_url(tweet) 337 | 338 | # Figure out the type and extract the reference URL. Look for specific 339 | # cases; default case is normal tweet, possibly with embedded media. 340 | ttype = 'tweet' 341 | tref = '' 342 | if tweet.get('in_reply_to_status_id_str', None): 343 | # Easiest case: replies. 344 | ttype = 'reply' 345 | if canonical_urls: 346 | author = 'twitter' 347 | elif 'in_reply_to_screen_name' not in tweet: 348 | # This happens if the tweet being replied to has been deleted.
349 | log(f'reply tweet {tweet["id"]} refers to a deleted tweet') 350 | author = 'twitter' 351 | else: 352 | author = tweet['in_reply_to_screen_name'] 353 | tweet_id = tweet['in_reply_to_status_id_str'] 354 | tref = 'https://twitter.com/' + author + '/status/' + tweet_id 355 | elif tweet['full_text'].startswith('RT @'): 356 | ttype = 'retweet' 357 | # In my archive, the full_text of retweeted tweets is truncated, 358 | # and the tweet object doesn't contain the retweeted tweet's id 359 | # or a URL. (This is so even though Twitter shows info about the 360 | # original tweet when I look up my retweet there.) The archive is 361 | # thus incomplete and I see no way to get the retweeted tweet's id. 362 | tref = '' 363 | elif (match := ending_in_twitter_url.match(tweet['full_text'])): 364 | # This can be either a quote tweet or just a tweet with media in it. 365 | embedded_url = match.group(1) 366 | for entity in tweet['entities']['urls']: 367 | if entity['url'] != embedded_url: 368 | continue 369 | # Found the entity info for the URL we pulled from the text. 370 | expanded_url = entity['expanded_url'] 371 | if not expanded_url.startswith('https://twitter.com'): 372 | # This is not a quote tweet after all. 373 | break 374 | author = user_from_tweet_url(expanded_url) 375 | tweet_id = expanded_url[expanded_url.rfind('/') + 1:] 376 | tref = 'https://twitter.com/' + author + '/status/' + tweet_id 377 | ttype = 'quote' 378 | break 379 | 380 | return (tdate, turl, ttype, tref) 381 | 382 | # The 26 is to skip the "window.YTD.tweets.part0 = " text at the start. 383 | all_tweets = json.loads(tweets_file[26:]) 384 | log(f'found a total of {len(all_tweets)} tweets in the tweets file') 385 | return sorted(tweet_data(tweet_json['tweet']) for tweet_json in all_tweets) 386 | 387 | 388 | def username_from(account_file): 389 | '''Return the "username" from the account.js file in a Twitter archive.''' 390 | import json 391 | # The file starts w/ "window.YTD.account.part0 = ".
Skip it; rest is json. 392 | account_json = json.loads(account_file[27:]) 393 | username = account_json[0]['account']['username'] 394 | log(f'found username "{username}"') 395 | return username 396 | 397 | 398 | def parsed_data(source_zip, requested, canonical_urls): 399 | from zipfile import is_zipfile, ZipFile, BadZipFile, LargeZipFile 400 | if not is_zipfile(source_zip): 401 | stop('The input does not appear to be a ZIP file.', ExitCode.bad_arg) 402 | log(f'parsing Twitter data to extract {requested}') 403 | try: 404 | username = None 405 | with ZipFile(source_zip) as zf: 406 | # First find the account name because we need it to construct URLs. 407 | for item in zf.namelist(): 408 | if item == 'data/account.js': 409 | with zf.open(item) as file_: 410 | username = username_from(file_.read()) 411 | break 412 | if not username: 413 | stop('Cannot find data/account.js in the archive.', ExitCode.file_error) 414 | 415 | # Now extract the tweets or the likes, as requested. 416 | for item in zf.namelist(): 417 | if item == 'data/like.js' and requested == 'likes': 418 | with zf.open(item) as file_: 419 | return likes_from(file_.read(), username, canonical_urls) 420 | break 421 | elif item == 'data/tweets.js' and requested != 'likes': 422 | with zf.open(item) as file_: 423 | return tweets_from(file_.read(), username, canonical_urls) 424 | break 425 | stop('Cannot find the requested data in the archive.', ExitCode.file_error) 426 | except BadZipFile: 427 | stop('Unable to parse ZIP archive.', ExitCode.file_error) 428 | except LargeZipFile: 429 | stop('Unable to parse very large ZIP archive.', ExitCode.file_error) 430 | 431 | 432 | def write_data(rows, dest): 433 | log(f'writing output to {dest}') 434 | try: 435 | if dest == '-': 436 | print(*rows, flush = True, sep = '\n') 437 | sys.stdout.flush() 438 | else: 439 | with open(dest, 'w') as output: 440 | output.write('\n'.join(rows)) 441 | except IOError as ex: 442 | # Check for broken pipe, as happens when the output is sent to "head".
443 | if ex.errno == errno.EPIPE: 444 | log('broken pipe') 445 | import os 446 | # This solution comes from a 2015-05-07 posting by user "mklement0" 447 | # to Stack Overflow at https://stackoverflow.com/a/30091579/743730. 448 | # Python flushes standard streams on exit, so redirect remaining 449 | # output to devnull to avoid another BrokenPipeError at shutdown. 450 | devnull = os.open(os.devnull, os.O_WRONLY) 451 | os.dup2(devnull, sys.stdout.fileno()) 452 | else: 453 | # A real error, not merely a broken pipe. Bubble up to caller. 454 | raise 455 | 456 | 457 | def stop(msg, err = ExitCode.exception): 458 | '''Print an error message and exit with an exit code.''' 459 | log('printing to terminal: ' + msg) 460 | from rich import print 461 | print('[red]' + msg + '[/]') 462 | log(f'exiting with exit code {int(err)}.') 463 | sys.exit(int(err)) 464 | 465 | 466 | # Main entry point. 467 | # ............................................................................. 468 | 469 | # The following entry point definition is for the console_scripts keyword 470 | # option to setuptools. The entry point for console_scripts has to be a 471 | # function that takes zero arguments. 472 | def console_scripts_main(): 473 | plac.call(main) 474 | 475 | 476 | # The following allows users to invoke this using "python3 -m taupe" and also 477 | # pass it an argument of "help" to get the help text. 478 | if __name__ == '__main__': 479 | if len(sys.argv) > 1 and sys.argv[1] == 'help': 480 | plac.call(main, ['-h']) 481 | else: 482 | plac.call(main) 483 | 484 | 485 | # For Emacs users 486 | # ............................................................................. 487 | # Local Variables: 488 | # mode: python 489 | # python-indent-offset: 4 490 | # End: 491 | --------------------------------------------------------------------------------
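The three archive parsers in `taupe/__main__.py` (`likes_from`, `tweets_from`, and `username_from`) skip the `window.YTD.<name>.part0 = ` prefix of each archive file using hard-coded offsets (23, 26, and 27 bytes). That silently breaks if the prefix ever changes length. As a hedged sketch of a less brittle alternative, the hypothetical helper `parse_archive_js` below (not part of Taupe) splits the file contents at the first `=` and parses the rest as JSON. It assumes, as Taupe's offsets already do, that the prefix starts with `window.YTD.` and contains no `=` of its own.

```python
import json


def parse_archive_js(data: bytes):
    """Return the JSON payload of a Twitter-archive .js file.

    Hypothetical helper, not part of Taupe. Assumes the file has the form
    'window.YTD.<name>.partN = <JSON>' and that the prefix contains no '='.
    """
    # Split once at the first '='; everything after it is taken to be JSON,
    # so any '=' characters inside the JSON (e.g., in URLs) are left alone.
    prefix, sep, payload = data.partition(b'=')
    if not sep or not prefix.startswith(b'window.YTD.'):
        raise ValueError('input does not look like a Twitter archive .js file')
    # json.loads accepts UTF-8 bytes and ignores the space after the '='.
    return json.loads(payload)
```

With such a helper, `username_from` could call `parse_archive_js(account_file)` in place of `json.loads(account_file[27:])`, and the like and tweet parsers could do the same, so that the prefix length no longer has to be counted by hand for each file type.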