├── .gitignore ├── TODO.md ├── Makefile ├── USAGE.md ├── README.md ├── bin ├── package_decompcheck_all.sh ├── populate_m4_db.sh ├── package_scan_all.sh ├── package_unpack_all.sh └── find_m4.sh └── LICENSE /.gitignore: -------------------------------------------------------------------------------- 1 | .targets 2 | m4.db 3 | m4.db-journal 4 | known_m4.db 5 | known_m4.db-journal 6 | unknown_m4.db 7 | unknown_m4.db-journal 8 | tmp 9 | -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | # Current TODOs 2 | 3 | * `package_unpack_all.sh` & `package_scan_all.sh`: 4 | 5 | - Investigate packages that do not unpack successfully (per-distro) 6 | 7 | - Better annotation/attribution of existing patterns 8 | 9 | - More search patterns 10 | 11 | - Add example output, triaging of false-positives 12 | 13 | - Smartly (re)scan different phases - after fresh unpack, then after 14 | applying distro patches; this will differ by distro 15 | 16 | * Add command-line knobs 17 | https://github.com/hlein/distro-backdoor-scanner/issues/19 18 | 19 | * Add fuzzy matching for m4 files 20 | https://github.com/hlein/distro-backdoor-scanner/issues/18 21 | 22 | * Compare git-tagged versions to Release assets 23 | See https://github.com/hlein/distro-backdoor-scanner/issues/17 24 | 25 | * Analyze `pkgconf` files? 26 | See https://github.com/hlein/distro-backdoor-scanner/issues/7 27 | 28 | * Analyze `IFUNC` use? 
29 | See https://github.com/hlein/distro-backdoor-scanner/issues/16 30 | 31 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | TARGETS := $(shell cat .targets 2>/dev/null) 2 | 3 | BIN_DIR=/usr/local/bin 4 | 5 | BIN_FILES=\ 6 | find_m4.sh \ 7 | package_decompcheck_all.sh \ 8 | package_scan_all.sh \ 9 | package_unpack_all.sh \ 10 | populate_m4_db.sh \ 11 | 12 | install: 13 | @mkdir -p $(BIN_DIR) && \ 14 | for BIN_FILE in $(BIN_FILES) ; do \ 15 | if [ -f $${BIN_FILE} ]; then \ 16 | install -m 755 $${BIN_FILE} $(BIN_DIR) ; \ 17 | else \ 18 | install -m 755 bin/$${BIN_FILE} $(BIN_DIR) ; \ 19 | fi ; \ 20 | done 21 | 22 | diff: 23 | @for A in $(BIN_FILES) ; do \ 24 | if [ -f $(BIN_DIR)/$${A} ]; then \ 25 | if [ -f $${A} ]; then \ 26 | diff -u $(BIN_DIR)/$${A} $${A} ; \ 27 | else \ 28 | diff -u $(BIN_DIR)/$${A} bin/$${A} ; \ 29 | fi ; \ 30 | else \ 31 | echo "File $${A} is new" ; \ 32 | fi ; \ 33 | done ; \ 34 | true 35 | 36 | dist: 37 | @for TARGET in $(TARGETS) ; do \ 38 | echo DEST: $${TARGET} ; \ 39 | rsync -aP Makefile bin/* $${TARGET}: ; \ 40 | done 41 | -------------------------------------------------------------------------------- /USAGE.md: -------------------------------------------------------------------------------- 1 | ## Usage 2 | 3 | ### Unpack and Scan Distro Sources 4 | 5 | Typical use: 6 | 1. Run `package_unpack_all.sh` 7 | 2. Run `package_scan_all.sh`. 8 | 9 | #### package_unpack_all.sh 10 | 11 | No command-line args or knobs; edit the script to add support for 12 | another distribution (PR please!), or to set `PACKAGE_DIR` and/or 13 | `UNPACK_DIR` to override per-distro defaults, or `JOBS` to change the 14 | level of parallelization. 15 | 16 | #### package_scan_all.sh 17 | 18 | No command-line args or knobs; same `PACKAGE_DIR`, `UNPACK_DIR`, 19 | `JOBS` as `package_unpack_all.sh`. 
Also add/update patterns in the 20 | perl script embedded in the `do_dirs()` function (again, PR please!). 21 | 22 | ### Compare Decompressor Implementations 23 | 24 | Typical use: 25 | 1. Run `package_unpack_all.sh`. 26 | 2. Run `package_decompcheck_all.sh`. 27 | 28 | #### package_decompcheck_all.sh 29 | 30 | No command-line args or knobs; edit the script to change the 31 | distro-determined `OBJ_DIR` path or to change the name & args of 32 | the alternate decompressor. 33 | 34 | ### Identify Modified `.m4` Macro Files 35 | 36 | Typical use: 37 | 1. Run `populate_m4_db.sh` to create a reference database. 38 | 2. Run `package_unpack_all.sh` to collect unpacked package sources. 39 | 3. Run `MODE=1 find_m4.sh` to compare unpacked packages' `.m4` files to the known references. 40 | 41 | #### populate_m4_db.sh 42 | 43 | Has some variables which can be overridden on the command line: 44 | 45 | * `KNOWN_REPOS_TOPDIR`: path to where "known good" Git repos are checked 46 | out. 47 | * `GNU_REPOS_TOPDIR`: path to where "known good" GNU Git repos are checked out. 48 | * `GNU_REPOS_TOPURL`: common URL prefix for "known good" GNU repos specifically. 49 | * `NO_NET`: set to non-zero to prevent any outbound network connections 50 | (requires that you have already cloned the needed repos). 51 | * `TMPDIR`: _lots_ of tempdirs will be created under here during 52 | repo-spelunking; the script should clean up after itself. 53 | 54 | And two which you must edit the script to change: 55 | * `GNU_REPOS`, `OTHER_REPOS`: lists of "known good" repos to check out. 56 | 57 | This script calls `MODE=0 find_m4.sh ...` to process the `.m4` files 58 | it finds. 59 | 60 | #### find_m4.sh 61 | 62 | ##### `MODE=0 find_m4.sh ...` 63 | 64 | Called with `MODE=0` set, creates a DB of known `.m4` files: their 65 | names, serial numbers, and checksums. (This is typically not done 66 | directly, but called by `populate_m4_db.sh`.) 
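For a concrete sense of what goes into that DB, here is a hypothetical, minimal sketch (not the real `find_m4.sh`, whose schema and normalization are richer) of pulling a macro file's name, embedded `serial` number, and `sha256` checksum:

```shell
#!/bin/bash
# Hypothetical sketch (NOT the real find_m4.sh): extract the kind of
# per-file metadata the DB records -- filename, embedded "serial"
# number (if any), and a sha256 checksum.
m4_meta()
{
  local f="$1" serial
  # Many macro files carry a "# serial N" line near the top
  serial=$(sed -n -E 's/^#[[:space:]]*serial[[:space:]]+([0-9]+).*/\1/p' "$f" | head -n1)
  printf '%s|%s|%s\n' "${f##*/}" "${serial:-none}" "$(sha256sum "$f" | cut -d' ' -f1)"
}

# Demo on a throwaway macro file
tmp=$(mktemp -d)
printf '# serial 7\nAC_DEFUN([AX_DEMO], [:])\n' > "${tmp}/ax_demo.m4"
m4_meta "${tmp}/ax_demo.m4"   # ax_demo.m4|7|<sha256>
rm -rf "${tmp}"
```

Records like these, keyed by name and checksum, are what make a later `MODE=1` comparison against arbitrary source trees cheap.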
67 | 68 | ##### `MODE=1 find_m4.sh ...` 69 | 70 | Called with `MODE=1` set, finds `.m4` files in source trees (set 71 | `M4_DIR` to specify the topdir; otherwise it is set to `UNPACK_DIR` 72 | using the same per-distro handling as `package_unpack_all.sh`). 73 | 74 | At the end of the run, outputs lists of not-seen-before `.m4` files, and `git 75 | diff` commands for macros that have been seen before but do not match. 76 | 77 | Set `VERBOSE=1` for more immediate runtime output (expected vs found 78 | hash values, `git diff` commands, etc.) rather than only waiting for the 79 | end. 80 | 81 | Set `DEBUG=1` to spam your console. 82 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # distro-backdoor-scanner 2 | 3 | Tools to scan OS distributions for backdoor indicators. 4 | 5 | See [USAGE](USAGE.md) for a rundown of how to use each script. 6 | 7 | The toolkit used for the `xz-utils` backdoor is far too sophisticated 8 | to be a first draft. Were there earlier iterations of this that 9 | shared some things in common but were slightly simpler, injected into 10 | other projects? Can we detect the style/"fist" of the author 11 | elsewhere? More so the delivery mechanics - backdooring codebases - 12 | than the contents of the extracted+injected malicious `.so`. 13 | 14 | There need to be more search patterns, among other things; see 15 | [TODO](TODO.md). 16 | 17 | Distros supported: 18 | - Gentoo Linux: Works 19 | - Rocky/RHEL/CentOS Linux: Works 20 | - Debian/Devuan/Ubuntu Linux: Works 21 | - EndeavourOS/Arch: Works 22 | 23 | ## Checking distfiles 24 | 25 | Tools: 26 | * `package_unpack_all.sh` 27 | * `package_scan_all.sh` 28 | 29 | These scripts unpack the source packages for all of a distro repo's 30 | current packages, then scan them for content similar to the malware 31 | that was added to `xz-utils`. 
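As a toy illustration of what "scan for similar content" means, the sketch below flags files matching a single indicator pattern (chained `head -c` calls, as seen in the stage1 loader); the real `package_scan_all.sh` applies a much larger pattern set, embedded in perl, under GNU `parallel`:

```shell
#!/bin/bash
# Toy single-pattern sketch of the scan step; NOT the real scanner,
# which uses many patterns with deliberate fuzz tolerance.
scan_tree()
{
  # Three "head -c" invocations on one line is one stage1 indicator
  grep -R -n -E 'head +-c.*head +-c.*head +-c' "$1" 2>/dev/null
}

# Demo against a throwaway tree
tmp=$(mktemp -d)
printf 'xz -dc "$f" | head -c +1024 | head -c +2048 | head -c +512\n' > "${tmp}/suspicious.sh"
printf 'echo harmless\n' > "${tmp}/clean.sh"
scan_tree "${tmp}"   # flags suspicious.sh only
rm -rf "${tmp}"
```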
32 | 33 | Have been run over: 34 | 35 | - ~11k EndeavourOS/Arch packages 36 | - ~40k Debian packages 37 | - ~19k Gentoo packages 38 | - ~9k Rocky/RPM packages 39 | 40 | This gives a manageable amount of results (~hundreds of hits), 41 | digestible by a human. So far the only confirmed malicious results 42 | are... from the backdoored `xz-utils` versions. 43 | 44 | ## Checking M4 macros 45 | 46 | Tools: 47 | * `populate_m4_db.sh` 48 | * `find_m4.sh` 49 | 50 | These scripts harvest every iteration of every `.m4` macro file ever 51 | committed to some specific repos considered "known good" (if, say, GNU 52 | `automake` upstream has already been trojaned, then the preppers were 53 | right, civilization is ending). They build an SQLite database of files: 54 | their embedded `serial` numbers (if any), their plain `sha256` 55 | checksums, plus a checksum of the file contents with comments and 56 | whitespace-only lines removed. 57 | 58 | Then, for a given tree of sources (such as unpacked by 59 | `package_unpack_all.sh`), bash every `.m4` file found against the 60 | known-good database. Alert on `.m4` files that differ from any known 61 | upstream, and emit cut-and-paste-able `git diff` commands for human 62 | review (maybe the package customized it for good reason... or maybe 63 | to hide a trojan). Also warn about new `.m4` files (nothing inherently 64 | wrong with a package shipping its own, but noteworthy). Generate a 65 | database of unknowns, so that package A's `new.m4` and package B's 66 | `nouveau.m4` can be recognized as the same, suggesting a shared 67 | upstream and/or developer. 68 | 69 | Running over the source trees of ~19k Gentoo packages containing 50k 70 | `.m4` files finds about 5k that are unrecognized (new, or modified), 71 | with a little under 1k `git diff` commands to compare mismatches to 72 | a candidate upstream file. 
Note: candidate selection in case of a 73 | mismatch is pretty basic for now; the plan is to implement fuzzy matching, see 74 | https://github.com/hlein/distro-backdoor-scanner/issues/18 75 | 76 | ## Comparing decompression output 77 | 78 | Tools: 79 | * `package_decompcheck_all.sh` 80 | 81 | Compare the output of the backdoored `xz-utils` decompressing 82 | a large corpus of `.xz` files vs the output of an independent 83 | implementation, just in case of some fancy 84 | [injection](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf) 85 | of malware into the output stream whenever a recognized block of 86 | tarred-up code is decompressed. Verrry unlikely to catch something, 87 | but easy to look for so why not. So far this has only caught minor 88 | bugs in other decompressors (upstream bugs will be filed, but not 89 | urgent). 90 | -------------------------------------------------------------------------------- /bin/package_decompcheck_all.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Decompress every .xz found in source dir(s) using xz and 4 | # an alternate implementation, error if any differences. 5 | # XXX: add .lzma as well? 6 | 7 | warn() 8 | { 9 | echo "$@" >&2 10 | } 11 | export -f warn 12 | 13 | die() 14 | { 15 | warn "$@" 16 | exit 1 17 | } 18 | export -f die 19 | 20 | CHECK_LOG=~/parallel_checksum.log 21 | PKG_LIST=~/package_list 22 | COMMANDS="parallel xz " 23 | BATCH_SIZE=20 24 | BATCH_NUM=0 25 | 26 | export ALT_XZ="gxz" 27 | COMMANDS+="$ALT_XZ" 28 | 29 | verbose() 30 | { 31 | test "$VERBOSE" = "1" && echo "$@" 32 | } 33 | export -f verbose 34 | 35 | do_compare() 36 | { 37 | THIS_COUNT=1 38 | for F in "$@" ; do 39 | verbose "### child $$ processing '$F'" 40 | 41 | # Force a single thread since we are parallelizing one level up 42 | XZ_SUM=$(xz -d -T1 < "$F" | sha256sum; exit ${PIPESTATUS[0]}) 43 | test $?
!= "0" && warn "### child $$ error on file $THIS_COUNT, '$F'" && continue 44 | 45 | XZ_SUM=${XZ_SUM%% *} 46 | 47 | # May need tweaking for other implementations w/different args 48 | ALT_SUM=$($ALT_XZ -d < "$F" | sha256sum; exit ${PIPESTATUS[0]}) 49 | test $? != "0" && warn "### child $$ error on file $THIS_COUNT, '$F'" && continue 50 | 51 | ALT_SUM=${ALT_SUM%% *} 52 | 53 | # Force a mismatch for testing 54 | #ALT_SUM+="wakkawakka" 55 | 56 | if [ "$XZ_SUM" != "$ALT_SUM" ]; then 57 | die "Mismatch for '$F': xz '$XZ_SUM' vs $ALT_XZ '$ALT_SUM'" 58 | fi 59 | let THIS_COUNT=$THIS_COUNT+1 60 | done 61 | } 62 | export -f do_compare 63 | 64 | test -f /etc/os-release || die "Required /etc/os-release not found" 65 | 66 | # Various locations, commands, etc. differ by distro 67 | 68 | OS_ID=$(sed -n -E 's/^ID="?([^ "]+)"? *$/\1/p' /etc/os-release 2>/dev/null) 69 | 70 | case "$OS_ID" in 71 | 72 | "") 73 | die "Could not extract an ID= line from /etc/os-release" 74 | ;; 75 | 76 | debian|devuan|ubuntu) 77 | # XXX: is there an equivalent of MAKE_OPTS that sets a -j factor? 78 | JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 79 | OBJ_DIR="/var/packages/" 80 | ;; 81 | 82 | gentoo) 83 | JOBS=$(sed -E -n 's/^MAKEOPTS="[^"#]*-j ?([0-9]+).*/\1/p' /etc/portage/make.conf 2>/dev/null) 84 | OBJ_DIR="$(portageq distdir)" 85 | if [ -z "$OBJ_DIR" ]; then 86 | OBJ_DIR="/usr/portage/distfiles/" 87 | else 88 | # Make sure there is a trailing slash 89 | OBJ_DIR="${OBJ_DIR%/}/" 90 | fi 91 | ;; 92 | 93 | # XXX: only actually tested on Rocky Linux yet 94 | centos|fedora|rhel|rocky) 95 | # XXX: is there an equivalent of MAKE_OPTS that sets a -j factor? 
96 | JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 97 | OBJ_DIR="/var/repo/BUILD/" 98 | ;; 99 | 100 | *) 101 | die "Unsupported OS '$OS_ID'" 102 | ;; 103 | esac 104 | 105 | for COMMAND in $COMMANDS ; do 106 | command -v ${COMMAND} >/dev/null || die "${COMMAND} not found in PATH" 107 | done 108 | 109 | test -d "$OBJ_DIR" || die "Object dir '$OBJ_DIR' does not exist" 110 | 111 | echo "### Building a list of '.xz' objects in '${OBJ_DIR}'..." 112 | 113 | # Some distros unpack tarballs in the same dir those tarballs live; 114 | # we are currently only concerned with checking those toplevel files. 115 | # For other distros, limiting depth will have no impact. 116 | mapfile -d '' OBJS < <(find "${OBJ_DIR}" -maxdepth 1 -type f -name \*.xz -print0) 117 | export OBJ_COUNT=${#OBJS[@]} 118 | BATCHES=$(( ($OBJ_COUNT + $BATCH_SIZE - 1) / $BATCH_SIZE)) 119 | 120 | COUNT=0 121 | echo "### Processing $OBJ_COUNT objects in $JOBS parallel decompress-compare jobs" 122 | 123 | printf "%s\0" "${OBJS[@]}" | \ 124 | parallel -0 -j$JOBS -n $BATCH_SIZE --line-buffer --joblog +${CHECK_LOG} \ 125 | 'echo "### child $$ processing batch {#}/'"$BATCHES"'" && do_compare {} && echo "### child $$ finished"' 126 | 127 | -------------------------------------------------------------------------------- /bin/populate_m4_db.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # For a set of git repos harvest every iteration of every .m4 file 4 | # and feed them to find_m4.sh to record their filename, serialno, 5 | # checksum, and other metadata. 6 | # 7 | # These will then be used as a corpus of "known good" .m4 files, 8 | # against which arbitrary repos' .m4 files can be compared. 
9 | 10 | # Override w/env vars 11 | KNOWN_REPOS_TOPDIR="${KNOWN_REPOS_TOPDIR:-${HOME}/known-repos/}" 12 | GNU_REPOS_TOPURL="${GNU_REPOS_TOPURL:-https://git.savannah.gnu.org/r/}" 13 | NO_NET="${NO_NET:-0}" 14 | 15 | # GNU autotools project repos we will harvest .m4 files from. 16 | # TODO: Add support for specifying arbitrary repos/dirs to include 17 | GNU_REPOS=( 18 | "autoconf" 19 | "autoconf-archive" 20 | "automake" 21 | "gettext" 22 | "gnulib" 23 | "libtool" 24 | ) 25 | 26 | # Other repos we will harvest. These should be full clonable URLs 27 | OTHER_REPOS=( 28 | "https://github.com/freetype/freetype" 29 | "https://gitlab.gnome.org/GNOME/gnome-common" 30 | "https://gitlab.gnome.org/GNOME/gobject-introspection" 31 | "https://gitlab.gnome.org/GNOME/gtk-doc" 32 | "https://github.com/pkgconf/pkgconf" 33 | "https://gitlab.gnome.org/GNOME/vala" 34 | "https://gitlab.xfce.org/xfce/xfce4-dev-tools" 35 | ) 36 | 37 | shopt -s expand_aliases 38 | alias tput=false 39 | test -f /lib/gentoo/functions.sh && . /lib/gentoo/functions.sh || { 40 | # Stubs for non-Gentoo systems 41 | eerror() { echo "$@"; } 42 | ewarn() { echo "$@"; } 43 | einfo() { echo "$@"; } 44 | eindent() { :; } 45 | eoutdent() { :; } 46 | } 47 | unalias tput 48 | 49 | cleanup() 50 | { 51 | [[ ${#CLEAN_DIRS[@]} == 0 ]] && return 52 | einfo "Cleaning up ${#CLEAN_DIRS[@]} tmpdirs..." 
53 | printf "%s\0" "${CLEAN_DIRS[@]}" | xargs -0 -I@ bash -c 'rm -f @/*.m4{,.gitcommit,.gitpath,.gitrepo} && rmdir @' 54 | CLEAN_DIRS=() 55 | } 56 | 57 | die() 58 | { 59 | echo "$@" >&2 60 | cleanup 61 | exit 1 62 | } 63 | 64 | COMMANDS=( git grep mktemp realpath ) 65 | 66 | if command -v find_m4.sh >/dev/null ; then 67 | FINDM4=find_m4.sh 68 | elif [[ -x "${BASH_SOURCE%/*}/find_m4.sh" ]]; then 69 | FINDM4="${BASH_SOURCE%/*}/find_m4.sh" 70 | FINDM4=$(realpath "${FINDM4}") 71 | else 72 | die "Could not find find_m4.sh in PATH or '${BASH_SOURCE%/*}/'" 73 | fi 74 | 75 | for COMMAND in "${COMMANDS[@]}" ; do 76 | command -v "${COMMAND}" >/dev/null || die "'${COMMAND}' not found in PATH" 77 | done 78 | 79 | # If TMPDIR is set, force it to be an absolute path 80 | [[ -n "${TMPDIR}" ]] && TMPDIR=$(realpath "${TMPDIR}") 81 | 82 | KNOWN_REPOS_TOPDIR="${KNOWN_REPOS_TOPDIR%/}/" 83 | [[ -d ${KNOWN_REPOS_TOPDIR} ]] || die "KNOWN_REPOS_TOPDIR directory '${KNOWN_REPOS_TOPDIR}' does not exist" 84 | cd ${KNOWN_REPOS_TOPDIR} || die "chdir KNOWN_REPOS_TOPDIR directory '${KNOWN_REPOS_TOPDIR}' failed" 85 | 86 | # Warn about cloning repos only the first time 87 | warn_clone_abort=1 88 | 89 | DIRS=() 90 | for repo in "${GNU_REPOS[@]}" "${OTHER_REPOS[@]}" ; do 91 | 92 | # canonicalize repo name and path 93 | repo="${repo%/}"; repo="${repo%.git}" 94 | 95 | repo_topurl="${repo%/*}" 96 | if [[ -z $repo_topurl ]]; then 97 | repo_topurl="${GNU_REPOS_TOPURL}" 98 | else 99 | repo="${repo##*/}" 100 | fi 101 | repo_topurl="${repo_topurl%/}/" 102 | 103 | if [[ ! 
-d "${KNOWN_REPOS_TOPDIR}/${repo}" ]]; then 104 | [[ ${NO_NET} != "0" ]] && die "Repo '${repo}' not found under '${KNOWN_REPOS_TOPDIR}' but NO_NET='${NO_NET}'" 105 | einfo "Repo '${repo}' not found under '${KNOWN_REPOS_TOPDIR}', cloning" 106 | if [[ ${warn_clone_abort} == 1 ]]; then 107 | ewarn "Hit ^C within 5 seconds to abort" 108 | sleep 5 109 | fi 110 | warn_clone_abort=0 111 | git clone ${repo_topurl}${repo}.git/ || die "Clone '${repo_topurl}${repo}.git' failed" 112 | fi 113 | git -C ${repo} branch | grep -E -q '^\* (master|main)$' || die "Repo ${repo} exists but not in master/main branch" 114 | DIRS+=( "${KNOWN_REPOS_TOPDIR}${repo}" ) 115 | done 116 | 117 | einfo "Checking for regular directories to be processed..." 118 | for dir in "${DIRS[@]}" ; do 119 | [[ -d "${dir}"/.git ]] && continue 120 | einfo "Processing .m4 files under ${dir##*/}..." 121 | MODE=0 ${FINDM4} "${dir}" || exit 1 122 | done 123 | 124 | CLEAN_DIRS=() 125 | 126 | trap 'die' SIGINT 127 | 128 | einfo "Checking for git repos to be processed..." 129 | # Now go further and check out every commit touching M4 serial numbers. 130 | # TODO: Use \x00 delimiter 131 | for dir in "${DIRS[@]}" ; do 132 | [[ -d "${dir}/.git" ]] || continue 133 | 134 | einfo "Processing all versions of all .m4 files in git repo ${dir##*/}..." 135 | 136 | batch_dirs=() 137 | 138 | # Make sure we have the latest, unless NO_NET was set 139 | [[ ${NO_NET} == "0" ]] && { git -C "${dir}" pull --tags || die "git pull failed"; } 140 | 141 | # TODO: Could this be parallelized, or does git do locking that 142 | # would defeat it? 143 | while read -d'|' gunk; do 144 | # Example text: 145 | # 1994-09-26T03:02:30+00:00 74cc3753fc2479e53045e952b3dcd908bbafef79 146 | # 147 | # M acgeneral.m4 148 | # M lib/autoconf/general.m4 149 | commit= 150 | files=() 151 | # TODO: Do we really need the read/printf here? 
152 | while read line ; do 153 | fragment="${line##*[[:space:]]}" 154 | if [[ -z ${commit} ]] ; then 155 | commit="${fragment}" 156 | continue 157 | fi 158 | 159 | if [[ ${line} =~ \.m4$ ]] ; then 160 | files+=( "${fragment}" ) 161 | continue 162 | fi 163 | done < <(printf "%s\n" "${gunk}") 164 | 165 | #einfo "Scraping ${dir##*/} at commit ${commit}" 166 | 167 | temp=$(mktemp -d) 168 | CLEAN_DIRS+=( "${temp}" ) 169 | for file in "${files[@]}" ; do 170 | filename=${file##*/} 171 | echo "${dir}" > "${temp}"/${filename}.gitrepo 172 | echo "${commit}" > "${temp}"/${filename}.gitcommit 173 | echo "${file#${dir}}" > "${temp}"/${filename}.gitpath 174 | 175 | git -C "${dir}" cat-file -p "${commit}:${file}" > "${temp}"/${filename} 176 | done 177 | 178 | batch_dirs+=( "${temp}" ) 179 | done < <(git -C "${dir}" log --diff-filter=ACMR --date-order --reverse --format='| %ad %H' --name-status --date=iso-strict -- '*.m4') 180 | 181 | MODE=0 ${FINDM4} "${batch_dirs[@]}" || die 182 | 183 | # remove all tempdirs created for this repo 184 | cleanup 185 | done 186 | -------------------------------------------------------------------------------- /bin/package_scan_all.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Process batches of package dirs in parallel; 4 | # process files under each set of dirs serially. 5 | 6 | die() 7 | { 8 | echo "$@" >&2 9 | exit 1 10 | } 11 | 12 | SCAN_LOG=~/parallel_scan.log 13 | PKG_LIST=~/package_list 14 | COMMANDS="parallel perl xargs" 15 | BATCH_SIZE=50 16 | BATCH_NUM=0 17 | 18 | # set DEBUG to non-0: print more individual status messages 19 | : "${DEBUG:=0}" 20 | export DEBUG 21 | 22 | test -f /etc/os-release || die "Required /etc/os-release not found" 23 | 24 | # Various locations, commands, etc. differ by distro 25 | 26 | OS_ID=$(sed -n -E 's/^ID="?([^ "]+)"? 
*$/\1/p' /etc/os-release 2>/dev/null) 27 | 28 | case "$OS_ID" in 29 | 30 | "") 31 | die "Could not extract an ID= line from /etc/os-release" 32 | ;; 33 | 34 | arch|endeavouros) 35 | JOBS=$(sed -E -n 's/^MAKEFLAGS="[^"#]*-j ?([0-9]+).*/\1/p' /etc/makepkg.conf 2>/dev/null) 36 | [[ -z $JOBS ]] && JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 37 | 38 | UNPACK_DIR="${HOME}/pkgs/sources/" 39 | 40 | make_dir_list() 41 | { 42 | # Arch package source trees are repo/pkg-ver/pkg/src/pkg-ver/ 43 | mapfile -d '' DIRS < <(find "${UNPACK_DIR}" -mindepth 5 -maxdepth 5 -type d -print0) 44 | # Scan a single package (XXX: hardcoded; should be an arg or env var) 45 | ##mapfile -d '' DIRS < <(find "${UNPACK_DIR}core/xz-5.6.1-3" -mindepth 3 -maxdepth 3 -type d -print0) 46 | } 47 | ;; 48 | 49 | debian|devuan|ubuntu) 50 | # XXX: is there an equivalent of MAKE_OPTS that sets a -j factor? 51 | JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 52 | UNPACK_DIR="/var/packages/" 53 | 54 | make_dir_list() 55 | { 56 | # Debian package source trees are simply packagename/ 57 | mapfile -d '' DIRS < <(find "${UNPACK_DIR}" -maxdepth 1 -type d -print0) 58 | # Scan a single package (XXX: hardcoded; should be an arg or env var) 59 | ##mapfile -d '' DIRS < <(find "${UNPACK_DIR}"xz-utils-5.6.0 -maxdepth 0 -type d -print0) 60 | } 61 | ;; 62 | 63 | gentoo) 64 | JOBS=$(sed -E -n 's/^MAKEOPTS="[^"#]*-j ?([0-9]+).*/\1/p' /etc/portage/make.conf 2>/dev/null) 65 | UNPACK_DIR="${PORTAGE_TMPDIR:-/var/tmp/portage/}" 66 | # We want to get the 'real' PORTAGE_TMPDIR, as PORTAGE_TMPDIR has confusing 67 | # semantics (PORTAGE_TMPDIR=/var/tmp -> stuff goes into /var/tmp/portage). 
68 | UNPACK_DIR="${UNPACK_DIR%%/portage/}/portage/" 69 | 70 | make_dir_list() 71 | { 72 | # Gentoo package source trees have the form category/packagename-ver/work/* 73 | mapfile -d '' DIRS < <(find "${UNPACK_DIR}" -mindepth 3 -maxdepth 3 -type d -name work -print0) 74 | # Scan a single package (XXX: hardcoded; should be an arg or env var) 75 | ##mapfile -d '' DIRS < <(find "${UNPACK_DIR}"app-arch/xz-utils-5.6.1 -mindepth 1 -maxdepth 1 -type d -name work -print0) 76 | } 77 | ;; 78 | 79 | # XXX: only actually tested on Rocky Linux yet 80 | centos|fedora|rhel|rocky) 81 | # XXX: is there an equivalent of MAKE_OPTS that sets a -j factor? 82 | JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 83 | UNPACK_DIR="/var/repo/BUILD/" 84 | 85 | make_dir_list() 86 | { 87 | # Unpacked RPMs have a particular fan-out structure per repo 88 | mapfile -d '' DIRS < <(find "${UNPACK_DIR}" -maxdepth 1 -type d -print0) 89 | # Scan a single package (XXX: hardcoded; should be an arg or env var) 90 | ##mapfile -d '' DIRS < <(find "${UNPACK_DIR}/xz-5.6.1" -maxdepth 0 -type d -print0) 91 | } 92 | ;; 93 | 94 | *) 95 | die "Unsupported OS '$OS_ID'" 96 | ;; 97 | esac 98 | 99 | for COMMAND in $COMMANDS ; do 100 | command -v ${COMMAND} >/dev/null || die "'${COMMAND}' not found in PATH" 101 | done 102 | 103 | test -d "$UNPACK_DIR" || die "Unpack target '$UNPACK_DIR' does not exist" 104 | 105 | cd "$UNPACK_DIR" || die "Could not cd '$UNPACK_DIR'" 106 | 107 | # Depends on $PKG_LIST generated previously by package_unpack_all.sh 108 | test -f "$PKG_LIST" || die "package list '$PKG_LIST' does not exist" 109 | test -s "$PKG_LIST" || die "package list '$PKG_LIST' is empty" 110 | 111 | # Function that parallel will kick off to do the work on batches of dirs 112 | do_dirs() 113 | { 114 | export GRANDPARENT=$$ 115 | # XXX: Can inject a -name argument for testing; should be an arg or env var 116 | NAMEARG='' 117 | ##NAMEARG='-name CMakeLists.txt' 118 | find "$@" -type f $NAMEARG -print0 | xargs -0 
perl -ne ' 119 | # Note: perl -n mode eats trailing spaces in filenames; 120 | # trying to escape by rewriting ARGV in BEGIN does not work. 121 | # So we expect handfuls of failures opening perfectly good paths 122 | # from tools with files ending in space (all appear to be test 123 | # files to make sure the tools handle such cases well!) 124 | BEGIN { $files_done=0; }; 125 | if 126 | ( 127 | # Patterns based loosely on indicators in known-trojaned xz-utils - 128 | # changes made to an .m4 file, strings from some binary files and 129 | # from fragments assembled at build time. 130 | # Add enough fuzz to catch slight variations, perhaps previous 131 | # generations, but not so much as to false positive much. 132 | m{ 133 | # Strings in build-to-host.m4 in the release tarballs (not in git repo) 134 | # https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27 135 | \#\s* build-to-host\.m4\s+ serial\s+ ([4-9]|3[0-9]) 136 | | \|\s* eval\s+ \$gl_path 137 | | \[\s* eval\s+ \$gl_config .* \|\s* \$SHELL 138 | | dnl\s+ If\s+ the\s+ host\s+ conversion\s+ code\s+ has\s+ been\s+ placed\s+ in 139 | | dnl\s+ Search\s+ for\s+ Automake-defined\s+ pkg\*\s+ macros 140 | | map='\''tr\s+ "\\t\s+ \\-_"\s+ "\s+ \\t_\\-"'\'' 141 | | HAVE_PKG_CONFIGMAKE=1 142 | # In build-to-host.m4 and in v5.6.1 stage2 extension loader 143 | # https://gynvael.coldwind.pl/?lang=en&id=782#stage2-ext 144 | | grep\s+ - (?: aErls | broaF ) 145 | # Seen in stage1 loader 146 | # https://gynvael.coldwind.pl/?lang=en&id=782 147 | | head\s+ -c.*head\s+ -c.*head\s+ -c 148 | | \# \{ [3-5] \} \[\[:alnum:\]\] \{ [3-6] \} \#\{ [3-5] \} \$ 149 | | eval\s+ \$[a-z]+\s* \|\s* tail\s+ -c 150 | | \#{2}\s* [Hh]ello\s* \#{2} 151 | | \#{2}\s* [Ww]orld\s* \#{2} 152 | # Stage1 substitution cipher implemented in tr 153 | # (generalized so other "keys" can match) 154 | | tr\s+ " (?: \\ [0-9]{1,3} - \\ [0-9]{1,3} ){3} 155 | # identifier bytes in different versions of stage1 156 | # 
https://gynvael.coldwind.pl/?lang=en&id=782 157 | \x86 \xf9 \x5a \xf7 \x2e \x68 \x6a \xbc 158 | \xe5 \x55 \x89 \xb7 \x24 \x04 \xd8 \x17 159 | # stage2 loader with some fuzz tolerances added 160 | # https://gynvael.coldwind.pl/?lang=en&id=782#stage2-backdoor 161 | | BEGIN\{FS="\\n";RS="\\n";ORS="";m=256;for\( ([a-z]) =0; $1 \s* /dev/null 172 | | if\s+ (!\s+ )?test\s+ -[a-z]\s+ "[^\s"]+ /tests/files/\$[^\s"]+ "\s* >\s*/dev/null 173 | | sed\s+ -i\s+ "/\$./i\$."\s+ src/[^\s]+/Makefile\s* \|\|\s* true 174 | }x 175 | or 176 | ( 177 | # One suspicious commit short-circuited cmake logic by entering a "." 178 | # in a line by itself in a CMakeLists.txt file; do we see that elsewhere? 179 | # https://git.tukaani.org/?p=xz.git;a=commit;h=328c52da8a2bbb81307644efdb58db2c422d9ba7 180 | $ARGV =~ /CMakeLists.txt$/ and 181 | m{ 182 | ^ \. \s* $ 183 | }x 184 | ) 185 | or 186 | ( 187 | # A key=value pair has been found to be an "off switch" env var; look 188 | # for others that match the observed pattern but are otherwise rare. 189 | /(?:^|['\''"\x0\s])([A-Za-z0-9]{12,18})=([A-Za-z0-9]{12,18})(?:$|['\''"\x0\s])/ && 190 | length($1) eq length($2) && 191 | lc($1) ne lc($2) && 192 | $1 !~ /$2/ && 193 | $2 !~ /$1/ && 194 | $1 !~ /^(?:[A-Z]+|[A-Z]?[a-z]+|[0-9]+)$/ && 195 | $2 !~ /^(?:[A-Z]+|[A-Z]?[a-z]+|[0-9]+)$/ && 196 | $1 !~ /^(?:min|max)?(?:[A-Z]+[a-z]+)+$/ && 197 | $2 !~ /^(?:min|max)?(?:[A-Z]+[a-z]+)+$/ && 198 | $1 !~ /^(?:0x)?[A-Fa-f0-9]+$/ && 199 | $2 !~ /^(?:0x)?[A-Fa-f0-9]+$/ 200 | ) 201 | ) 202 | { 203 | chomp; 204 | s/^[^!-~ ]+//; 205 | s/[^!-~ ]+$//; 206 | # Armor any non-ascii 207 | s/([^!-~ ])/sprintf("\\x%02x",unpack("C",$1))/eg; 208 | print "$ARGV $. $_\n"; 209 | }; 210 | # At the end of each file, reset $. 
and count the file as done 211 | 212 | if (eof) 213 | { 214 | close ARGV; 215 | $files_done++; 216 | # Do not enable this, very loud and noticeably slower 217 | print "### $$ finished $ARGV\n" if $ENV{DEBUG}; 218 | } 219 | END 220 | { 221 | # Log our heredity so post-processing can regroup if needed 222 | print "### grandchild $::ENV{GRANDPARENT} -> " . getppid() . 223 | " -> $$ processed $files_done files\n"; 224 | } 225 | ' 226 | } 227 | export -f do_dirs 228 | 229 | PKGS=$(wc -l "$PKG_LIST" | cut -d\ -f1) 230 | 231 | # Populate DIRS (distro-specific) 232 | make_dir_list 233 | 234 | # XXX: we should track paths that have been scanned so we can resume 235 | # without redundant re-scans. Either keep a list and filter against it, 236 | # or touch state files a layer up? 237 | 238 | export DIR_COUNT=${#DIRS[@]} 239 | BATCHES=$(( ($DIR_COUNT + $BATCH_SIZE - 1) / $BATCH_SIZE)) 240 | 241 | echo "### Found $DIR_COUNT dirs for $PKGS pkgs under $UNPACK_DIR, doing $BATCHES batches of $BATCH_SIZE ea with $JOBS parallel jobs" 242 | 243 | printf "%s\0" "${DIRS[@]}" | parallel -0 -j$JOBS -n $BATCH_SIZE --line-buffer --joblog +${SCAN_LOG} 'echo "### child $$ processing batch {#}/'"$BATCHES"'" && do_dirs {} && echo "### child $$ finished"' 244 | -------------------------------------------------------------------------------- /bin/package_unpack_all.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Unpack the latest valid version of every package 4 | 5 | warn() 6 | { 7 | echo "$@" >&2 8 | } 9 | export -f warn 10 | 11 | die() 12 | { 13 | warn "$@" 14 | exit 1 15 | } 16 | export -f die 17 | 18 | UNPACK_LOG=~/parallel_unpack.log 19 | PKG_LIST=~/package_list 20 | COMMANDS="df parallel " 21 | DOWNLOAD_ONLY=0 22 | # Only used on rpm-based distros, but keep this up here 23 | # for visibility since it's a tunable knob. 
24 | RPM_LIST=~/rpm_list 25 | FETCH_TIMEOUT=1800 26 | 27 | # set VERBOSE to non-0: print more individual status messages 28 | : "${VERBOSE:=0}" 29 | 30 | test -f /etc/os-release || die "Required /etc/os-release not found" 31 | 32 | verbose() 33 | { 34 | [[ -z ${VERBOSE} || ${VERBOSE} == "0" ]] && return 35 | local line 36 | 37 | for line in "$@" ; do 38 | warn "${line}" 39 | done 40 | } 41 | export -f verbose 42 | 43 | dfcheck() 44 | { 45 | local filesystem pct 46 | for filesystem in "$@" ; do 47 | pct=$(df "${filesystem}" | awk -F'[ %]+' '/[0-9]%/{print $5}') 48 | echo "${pct}" | grep -q -E '^[0-9]+$' || die "Unable to get '${filesystem}' full %, unsafe to continue" 49 | test "${pct}" -lt 90 || die "${filesystem} filesystem at ${pct}% full, refusing to continue" 50 | done 51 | } 52 | export -f dfcheck 53 | 54 | # On most OSs, this is a noop 55 | pre_parallel_hook() 56 | { 57 | : 58 | } 59 | 60 | # Various locations, commands, etc. differ by distro 61 | 62 | OS_ID=$(sed -n -E 's/^ID="?([^ "]+)"? *$/\1/p' /etc/os-release 2>/dev/null) 63 | 64 | case "${OS_ID}" in 65 | 66 | "") 67 | die "Could not extract an ID= line from /etc/os-release" 68 | ;; 69 | 70 | arch|endeavouros) 71 | COMMANDS+="antlr4 cargo cmake curl electron28 expac filterdiff gen-setup gendesk gnome-autogen.sh go gtkdocize intltoolize makepkg mate-autogen meson mlyacc npm opam pacman pipenv_to_requirements rustc setconf signify svn timeout uusi yarn yelp-build" 72 | # pacman -S $(pacman -Fq ... 
| sort -u | egrep -v '^extra/(nodejs-|go)') 73 | # Consistently 404s for me: 74 | #COMMANDS+=" composer " 75 | 76 | PACKAGE_DIR="${HOME}/pkgs/distfiles/" 77 | PKGBUILD_DIR="${HOME}/pkgs/pkgbuild/" 78 | UNPACK_DIR="${HOME}/pkgs/sources/" 79 | LOG_DIR="${HOME}/pkgs/logs/" 80 | 81 | JOBS=$(sed -E -n 's/^MAKEFLAGS="[^"#]*-j ?([0-9]+).*/\1/p' /etc/makepkg.conf 2>/dev/null) 82 | [[ -z $JOBS ]] && JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 83 | 87 | # makepkg will refuse to run as root 88 | [[ $EUID == "0" ]] && die "Must be run as non-root" 89 | 90 | # ...and yet root must have synced before we start 91 | [[ $(ls /var/lib/pacman/sync/*.db 2>/dev/null | wc -l) -gt 0 ]] || \ 92 | die "No .db files under /var/lib/pacman/sync/! Run pacman -Sy as root once first." 93 | 94 | # make our own config that tunes some settings 95 | if [[ ! -f ${HOME}/.package_unpack.conf ]]; then 96 | sed 's/curl -q/curl -s -S -/' /etc/makepkg.conf >${HOME}/.package_unpack.conf 97 | cat >>${HOME}/.package_unpack.conf </dev/null | grep -E -o '[A-Fa-f0-9]{40}' | sort -u) 118 | [[ -z $KEYS ]] && return 0 119 | gpg -q --recv-keys ${KEYS} 120 | } 121 | export -f import_keys 122 | 123 | # gitlab mangles package names (repository.git names) differently from versions 124 | # core|archlinux-keyring|20240313-1 - no mangling 125 | # extra|libsigc++-3.0|3.6.0-1 - name gets s/++/plusplus/ 126 | # extra|nicotine+|3.3.2-1 - name gets s/+$/plus/ 127 | # extra|dvd+rw-tools|7.1-9 - name gets s/+/-/g 128 | # core|grub|2:2.12-2 - ver gets s/:/-/g 129 | # extra|adobe-source-code-pro-fonts|2.042u+1.062i+1.026vf-1 - ver + left alone 130 | 131 | gitlab_pkg_mangle() 132 | { 133 | local pkg="$1" 134 | pkg="${pkg//++/plusplus}" 135 | pkg="${pkg/%+/plus}" 136 | pkg="${pkg//+/-}" 137 | pkg="${pkg//:/-}" 138 | echo "${pkg}" 139 | } 140 | export -f gitlab_pkg_mangle 141 
| 142 | gitlab_ver_mangle() 143 | { 144 | local ver="$1" 145 | ver="${ver//:/-}" 146 | echo "${ver}" 147 | } 148 | export -f gitlab_ver_mangle 149 | 150 | make_pkg_cmd() 151 | { 152 | local repo pkg ver repo_pkg repo_pkg_ver repo_pkg_ver_mangle pkg_unpack_dir pkg_build_dir tarball 153 | IFS='|' read -r repo pkg ver <<< "${PKG}" 154 | repo_pkg="${repo}/${pkg}" 155 | repo_pkg_ver="${repo_pkg}-${ver}" 156 | repo_pkg_ver_mangle="${repo}/$(gitlab_pkg_mangle "${pkg}")-$(gitlab_ver_mangle "${ver}")" 157 | 158 | 159 | if [[ ${repo} == "endeavouros" ]]; then 160 | # EndeavourOS packages' PKGBUILD files will already be present 161 | tarball= 162 | pkg_unpack_dir="${UNPACK_DIR}${repo_pkg}" 163 | pkg_build_dir="${PKGBUILD_DIR}${repo}/${pkg}" 164 | 165 | # Some EndeavourOS packages can't be resolved, skip silently 166 | # https://github.com/endeavouros-team/PKGBUILDS/issues/335 167 | [[ -d "${pkg_build_dir}" ]] || return 168 | else 169 | tarball="${repo_pkg_ver}.tar.bz2" 170 | pkg_unpack_dir="${UNPACK_DIR}${repo_pkg_ver_mangle}" 171 | pkg_build_dir="${PKGBUILD_DIR}${repo_pkg_ver_mangle}" 172 | fi 173 | [[ -d "${pkg_unpack_dir}" ]] && return 174 | 175 | # Wrap unpacking in timeout(1) so that we do not wait forever 176 | # for pathological downloads / git clones, see: 177 | # https://bugs.gentoo.org/930633 178 | 179 | echo "echo '### unpack ${COUNT}/${TOT} ${tarball}' && \ 180 | dfcheck "${PACKAGE_DIR}" "${UNPACK_DIR}" "${PKGBUILD_DIR}" && \ 181 | mkdir -p '${PKGBUILD_DIR}${repo}/' '${pkg_unpack_dir}/' && \ 182 | [[ -z '${tarball}' ]] || \ 183 | tar -C '${PKGBUILD_DIR}${repo}/' -xf '${PACKAGE_DIR}${tarball}' && \ 184 | cd '${pkg_build_dir}' && \ 185 | import_keys && \ 186 | BUILDDIR='${pkg_unpack_dir}' MAKEPKG_CONF='${HOME}/.package_unpack.conf' timeout -v --preserve-status ${FETCH_TIMEOUT} makepkg --nodeps --nobuild --noconfirm --noprogressbar || \ 187 | die '### unpack failed ${COUNT}/${TOT} ${repo_pkg_ver} ${tarball}'" 188 | } 189 | 190 | # We need to fetch individual pkgbuild repos from
Arch before 191 | # we can start actually fetching package sources + unpacking. 192 | # Cloning >10k repos from gitlab will kill them, and then they will kill us, 193 | # so just grab versioned tarballs, and don't parallelize. 194 | pre_parallel_hook() 195 | { 196 | local outfile pkg_mangle ver_mangle path_mangle 197 | local pkgbuild_count 198 | 199 | # Progress bar... note we exclude endeavouros which we are 200 | # skipping down in the loop. 201 | pkgbuild_count=$(grep -E -v '^endeavouros\|' "${PKG_LIST}" | wc -l) 202 | echo "### Fetching ${pkgbuild_count} pkgbuild repo tarballs" 203 | local fetched=0 204 | 205 | while IFS='|' read -r repo pkg ver ; do 206 | 207 | # All EnOS packages live in their own single repo; handle separately. 208 | # XXX: Are there other Arch family distros that do similar? 209 | [[ $repo == "endeavouros" ]] && continue 210 | 211 | [[ $(( ${fetched} % 1000 )) == 0 ]] && echo "### Fetched ${fetched} / ${pkgbuild_count}" 212 | let fetched=${fetched}+1 213 | 214 | outfile="${repo}/${pkg}-${ver}.tar.bz2" 215 | [[ -f "${PACKAGE_DIR}${outfile}" ]] && continue 216 | 217 | dfcheck "${PACKAGE_DIR}" "${PKGBUILD_DIR}" "${UNPACK_DIR}" || exit 1 218 | mkdir -p "${PACKAGE_DIR}${repo}/" 219 | pkg_mangle="$(gitlab_pkg_mangle "${pkg}")" 220 | ver_mangle="$(gitlab_ver_mangle "${ver}")" 221 | path_mangle="/archlinux/packaging/packages/${pkg_mangle}/-/archive/${ver_mangle}/${pkg_mangle}-${ver_mangle}.tar.bz2" 222 | 223 | verbose "### Fetching 'https://gitlab.archlinux.org${path_mangle}' -> '${PACKAGE_DIR}${outfile}'" 224 | 225 | curl -s -S --max-time ${FETCH_TIMEOUT} -o "${PACKAGE_DIR}${outfile}" \ 226 | "https://gitlab.archlinux.org${path_mangle}" || \ 227 | warn "### Error on ${pkg}-${ver}" 228 | # Increase if we hit rate limits 229 | sleep 1 230 | done <"${PKG_LIST}" 231 | 232 | # If there are any EnOS packages listed, get/update that repo 233 | if grep -q -E '^endeavouros\|' "${PKG_LIST}" ; then 234 | mkdir -p "${PKGBUILD_DIR}/endeavouros" 235 | if [[ ! 
-d "${PKGBUILD_DIR}/endeavouros/.git" ]]; then 236 | git -C "${PKGBUILD_DIR}" clone --quiet https://github.com/endeavouros-team/PKGBUILDS endeavouros 237 | else 238 | git -C "${PKGBUILD_DIR}/endeavouros" pull --quiet 239 | fi 240 | fi 241 | 242 | # Do not be fooled later by existing empty directories 243 | rmdir "${UNPACK_DIR}"/*/* 2>/dev/null 244 | } 245 | 246 | ;; 247 | 248 | debian|devuan|ubuntu) 249 | COMMANDS+=" apt-cache apt-get " 250 | # XXX: is there an equivalent of MAKE_OPTS that sets a -j factor? 251 | JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 252 | PACKAGE_DIR=/var/packages/ 253 | UNPACK_DIR="${PACKAGE_DIR}" 254 | 255 | DEB_SRC=$(grep -r '^deb-src' /etc/apt/sources.list* 2>/dev/null | wc -l) 256 | if [ "${DEB_SRC}" = "0" ]; then 257 | die 'No deb-src entries found in /etc/apt/sources.list*' 258 | fi 259 | 260 | if [ "${DOWNLOAD_ONLY}" = "1" ]; then 261 | DOWNLOAD_FLAG=--download-only 262 | fi 263 | 264 | make_pkg_list() 265 | { 266 | # List all available packages 267 | apt-cache search . 
| cut -d' ' -f1 268 | } 269 | 270 | make_pkg_cmd() 271 | { 272 | echo "echo '### unpack ${COUNT}/${TOT} ${PKG}' && \ 273 | dfcheck "${PACKAGE_DIR}" "${UNPACK_DIR}" && \ 274 | apt-get source ${DOWNLOAD_FLAG} '${PKG}'" 275 | } 276 | ;; 277 | 278 | gentoo) 279 | COMMANDS+=" ebuild portageq " 280 | JOBS=$(sed -E -n 's/^MAKEOPTS="[^"#]*-j ?([0-9]+).*/\1/p' /etc/portage/make.conf 2>/dev/null) 281 | [[ -z $JOBS ]] && JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 282 | 283 | for D in $(portageq get_repo_path "${EROOT:-/}" gentoo) /usr/portage/ /var/db/repos/gentoo/ ; do 284 | test -d "${D}" && PACKAGE_DIR="${D}" && break 285 | done 286 | test -n "${PACKAGE_DIR}" || die "Could not find package dir" 287 | UNPACK_DIR="${PORTAGE_TMPDIR:-/var/tmp/portage/}" 288 | 289 | if [ "${DOWNLOAD_ONLY}" = "1" ]; then 290 | EBUILD_CMD=fetch 291 | else 292 | EBUILD_CMD=unpack 293 | fi 294 | 295 | make_pkg_list() 296 | { 297 | # List the highest version of each package that is eligible 298 | # (skip non-keyworded/masked packages; skip older when newer exists) 299 | portageq all_best_visible / | sed -E '/^acct-(user|group)\//d' 300 | } 301 | 302 | make_pkg_cmd() 303 | { 304 | # Skip packages that come from overlays instead of ::gentoo 305 | if ! 
test -d "${PACKAGE_DIR}/$(qatom -C -F '%{CATEGORY}/%{PN}' "${PKG}")" ; then 306 | return 307 | fi 308 | EBUILD="$(qatom -C -F '%{CATEGORY}/%{PN}/%{PF}' "${PKG}").ebuild" 309 | echo "echo '### unpack ${COUNT}/${TOT} ${EBUILD}' && \ 310 | dfcheck "${PACKAGE_DIR}" "${UNPACK_DIR}" && \ 311 | ebuild '${EBUILD}' ${EBUILD_CMD}" 312 | } 313 | ;; 314 | 315 | # XXX: only actually tested on Rocky Linux yet 316 | centos|fedora|rhel|rocky) 317 | # %prep stage can require various development tools; best to do: 318 | # dnf groupinstall "Development Tools" 319 | # dnf install javapackages-tools jq 320 | COMMANDS+=" build-jar-repository cpio gcc git reposync rpm2cpio rpmbuild tar " 321 | # XXX: is there an equivalent of MAKE_OPTS that sets a -j factor? 322 | JOBS=$(grep -E '^processor.*: [0-9]+$' /proc/cpuinfo | wc -l) 323 | PACKAGE_DIR="/var/repo/dist/" 324 | UNPACK_DIR="/var/repo/" 325 | ENABLE_REPO='*-source' 326 | 327 | make_pkg_list() 328 | { 329 | dnf list --disablerepo='*' --enablerepo="${ENABLE_REPO}" --available | \ 330 | awk '/^(Last metadata|Available Packages)/{next}; /\.src/{print $1}' 331 | } 332 | 333 | make_pkg_cmd() 334 | { 335 | # Extract the package name from the path+RPM name 336 | PNAME=$(rpm --queryformat "%{NAME}" -qp "${PKG}") 337 | # We could/should rpm2cpio ... | cpio -i..., but then unpacking 338 | # the .tar files inside would be our job, reading from .spec. 339 | # For now just skip the intermediate step. Run the %prep stage, 340 | # which unpacks tars, applies patches, and conditionally does other things. 341 | echo "echo '### unpack ${COUNT}/${TOT} ${PNAME}' && \ 342 | dfcheck "${PACKAGE_DIR}" "${UNPACK_DIR}" && \ 343 | mkdir -p ${UNPACK_DIR}SOURCES/ && \ 344 | rpmbuild --define '_topdir ${UNPACK_DIR}' --quiet -rp '${PKG}'" 345 | } 346 | 347 | # We cannot really combine fetch+unpack, and reposync(1) is not 348 | # multiprocess (and if it were, we'd need to worry about beating up 349 | # the mirrors we talked to, anyway). 
So, call it once before entering 350 | # the parallel unpacks. Unfortunately because it is a oneshot we can't 351 | # monitor df between fetches. 352 | pre_parallel_hook() 353 | { 354 | # First, fetch every available distfile 355 | reposync --disablerepo='*' --enablerepo="${ENABLE_REPO}" --source || \ 356 | warn "reposync errored, attempting to continue" 357 | # Second, build a list of RPMs and use that instead of ${PKG_LIST}. 358 | # Ignore the bird, follow the river. 359 | find ${PACKAGE_DIR}${ENABLE_REPO}/Packages/ -type f -name \*.src.rpm >"${RPM_LIST}" || \ 360 | die "find RPMs failed" 361 | PKG_LIST="${RPM_LIST}" 362 | # Prepare the target directory structure, just once. 363 | mkdir -p ${UNPACK_DIR}{BUILD,BUILDROOT,RPMS,SOURCES,SRPMS} 364 | } 365 | ;; 366 | 367 | *) 368 | die "Unsupported OS '${OS_ID}'" 369 | ;; 370 | esac 371 | 372 | export -f make_pkg_list 373 | export -f make_pkg_cmd 374 | export -f pre_parallel_hook 375 | 376 | # Mirrors will hate you fetching too many in parallel 377 | test "${DOWNLOAD_ONLY}" = "1" && test "${JOBS}" -gt 4 && JOBS=4 378 | 379 | for COMMAND in ${COMMANDS} ; do 380 | command -v ${COMMAND} >/dev/null || die "${COMMAND} not found in PATH" 381 | done 382 | 383 | # On some OSs, these are the same 384 | test -d "${UNPACK_DIR}" || die "Unpack target ${UNPACK_DIR} does not exist" 385 | cd "${PACKAGE_DIR}" || die "Could not cd ${PACKAGE_DIR}" 386 | 387 | if ! 
test -s "${PKG_LIST}" ; then 388 | echo "### Generating package list" 389 | make_pkg_list >"${PKG_LIST}" 390 | fi 391 | 392 | pre_parallel_hook 393 | 394 | COUNT=0 395 | TOT=$(wc -l <"${PKG_LIST}") 396 | echo "### Processing ${TOT} packages in ${JOBS} parallel fetch+unpack jobs" 397 | while IFS= read -r PKG ; do 398 | make_pkg_cmd 399 | let COUNT=${COUNT}+1 400 | done <"${PKG_LIST}" | parallel -j${JOBS} --joblog +${UNPACK_LOG} 401 | 402 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The GNU General Public License (GPL-2.0) 2 | Version 2, June 1991 3 | Copyright (C) 1989, 1991 Free Software Foundation, Inc. 4 | 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 5 | 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your freedom to share 12 | and change it. By contrast, the GNU General Public License is intended to 13 | guarantee your freedom to share and change free software--to make sure the 14 | software is free for all its users. This General Public License applies to 15 | most of the Free Software Foundation's software and to any other program whose 16 | authors commit to using it. (Some other Free Software Foundation software is 17 | covered by the GNU Library General Public License instead.) You can apply 18 | it to your programs, too. 19 | 20 | When we speak of free software, we are referring to freedom, not price. 21 | Our General Public Licenses are designed to make sure that you have the 22 | freedom to distribute copies of free software (and charge for this service 23 | if you wish), that you receive source code or can get it if you want it, 24 | that you can change the software or use pieces of it in new free programs; 25 | and that you know you can do these things. 
26 | 27 | To protect your rights, we need to make restrictions that forbid anyone to 28 | deny you these rights or to ask you to surrender the rights. These restrictions 29 | translate to certain responsibilities for you if you distribute copies of the 30 | software, or if you modify it. 31 | 32 | For example, if you distribute copies of such a program, whether gratis or 33 | for a fee, you must give the recipients all the rights that you have. 34 | You must make sure that they, too, receive or can get the source code. 35 | And you must show them these terms so they know their rights. 36 | 37 | We protect your rights with two steps: (1) copyright the software, and 38 | (2) offer you this license which gives you legal permission to copy, 39 | distribute and/or modify the software. 40 | 41 | Also, for each author's protection and ours, we want to make certain that 42 | everyone understands that there is no warranty for this free software. 43 | If the software is modified by someone else and passed on, we want its 44 | recipients to know that what they have is not the original, so that any 45 | problems introduced by others will not reflect on the 46 | original authors' reputations. 47 | 48 | Finally, any free program is threatened constantly by software patents. 49 | We wish to avoid the danger that redistributors of a free program will 50 | individually obtain patent licenses, in effect making the program proprietary. 51 | To prevent this, we have made it clear that any patent must be licensed for 52 | everyone's free use or not licensed at all. 53 | 54 | The precise terms and conditions for copying, distribution 55 | and modification follow. 56 | 57 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 58 | 59 | 0. This License applies to any program or other work which contains a notice 60 | placed by the copyright holder saying it may be distributed under the terms 61 | of this General Public License. 
The "Program", below, refers to any such 62 | program or work, and a "work based on the Program" means either the Program 63 | or any derivative work under copyright law: that is to say, a work containing 64 | the Program or a portion of it, either verbatim or with modifications and/or 65 | translated into another language. (Hereinafter, translation is included 66 | without limitation in the term "modification".) 67 | Each licensee is addressed as "you". 68 | 69 | Activities other than copying, distribution and modification are not covered 70 | by this License; they are outside its scope. The act of running the Program 71 | is not restricted, and the output from the Program is covered only if its 72 | contents constitute a work based on the Program (independent of having been 73 | made by running the Program). Whether that is true depends on 74 | what the Program does. 75 | 76 | 1. You may copy and distribute verbatim copies of the Program's source code 77 | as you receive it, in any medium, provided that you conspicuously and 78 | appropriately publish on each copy an appropriate copyright notice and 79 | disclaimer of warranty; keep intact all the notices that refer to this 80 | License and to the absence of any warranty; and give any other recipients 81 | of the Program a copy of this License along with the Program. 82 | 83 | You may charge a fee for the physical act of transferring a copy, and you 84 | may at your option offer warranty protection in exchange for a fee. 85 | 86 | 2. You may modify your copy or copies of the Program or any portion of it, 87 | thus forming a work based on the Program, and copy and distribute such 88 | modifications or work under the terms of Section 1 above, provided that you 89 | also meet all of these conditions: 90 | 91 | a) You must cause the modified files to carry prominent notices stating 92 | that you changed the files and the date of any change. 
93 | 94 | b) You must cause any work that you distribute or publish, that in whole 95 | or in part contains or is derived from the Program or any part thereof, 96 | to be licensed as a whole at no charge to all third parties under 97 | the terms of this License. 98 | 99 | c) If the modified program normally reads commands interactively when run, 100 | you must cause it, when started running for such interactive use in the 101 | most ordinary way, to print or display an announcement including an 102 | appropriate copyright notice and a notice that there is no warranty 103 | (or else, saying that you provide a warranty) and that users may 104 | redistribute the program under these conditions, and telling the user how 105 | to view a copy of this License. (Exception: if the Program itself is 106 | interactive but does not normally print such an announcement, your work 107 | based on the Program is not required to print an announcement.) 108 | 109 | These requirements apply to the modified work as a whole. If identifiable 110 | sections of that work are not derived from the Program, and can be reasonably 111 | considered independent and separate works in themselves, then this License, 112 | and its terms, do not apply to those sections when you distribute them as 113 | separate works. But when you distribute the same sections as part of a whole 114 | which is a work based on the Program, the distribution of the whole must be 115 | on the terms of this License, whose permissions for other licensees extend 116 | to the entire whole, and thus to each and every part 117 | regardless of who wrote it. 118 | 119 | Thus, it is not the intent of this section to claim rights or contest your 120 | rights to work written entirely by you; rather, the intent is to exercise 121 | the right to control the distribution of derivative or collective 122 | works based on the Program. 
123 | 124 | In addition, mere aggregation of another work not based on the Program with 125 | the Program (or with a work based on the Program) on a volume of a storage 126 | or distribution medium does not bring the other work under 127 | the scope of this License. 128 | 129 | 3. You may copy and distribute the Program (or a work based on it, 130 | under Section 2) in object code or executable form under the terms of 131 | Sections 1 and 2 above provided that you also do one of the following: 132 | 133 | a) Accompany it with the complete corresponding machine-readable source 134 | code, which must be distributed under the terms of Sections 1 and 2 above 135 | on a medium customarily used for software interchange; or, 136 | 137 | b) Accompany it with a written offer, valid for at least three years, to 138 | give any third party, for a charge no more than your cost of physically 139 | performing source distribution, a complete machine-readable copy of the 140 | corresponding source code, to be distributed under the terms of 141 | Sections 1 and 2 above on a medium customarily used 142 | for software interchange; or, 143 | 144 | c) Accompany it with the information you received as to the offer to 145 | distribute corresponding source code. (This alternative is allowed only 146 | for noncommercial distribution and only if you received the program in 147 | object code or executable form with such an offer, in accord 148 | with Subsection b above.) 149 | 150 | The source code for a work means the preferred form of the work for making 151 | modifications to it. For an executable work, complete source code means all 152 | the source code for all modules it contains, plus any associated interface 153 | definition files, plus the scripts used to control compilation and 154 | installation of the executable. 
However, as a special exception, the source 155 | code distributed need not include anything that is normally distributed 156 | (in either source or binary form) with the major components (compiler, kernel, 157 | and so on) of the operating system on which the executable runs, unless that 158 | component itself accompanies the executable. 159 | 160 | If distribution of executable or object code is made by offering access to 161 | copy from a designated place, then offering equivalent access to copy the 162 | source code from the same place counts as distribution of the source code, 163 | even though third parties are not compelled to copy the source 164 | along with the object code. 165 | 166 | 4. You may not copy, modify, sublicense, or distribute the Program except as 167 | expressly provided under this License. Any attempt otherwise to copy, modify, 168 | sublicense or distribute the Program is void, and will automatically terminate 169 | your rights under this License. However, parties who have received copies, 170 | or rights, from you under this License will not have their licenses 171 | terminated so long as such parties remain in full compliance. 172 | 173 | 5. You are not required to accept this License, since you have not signed it. 174 | However, nothing else grants you permission to modify or distribute the 175 | Program or its derivative works. These actions are prohibited by law if you 176 | do not accept this License. Therefore, by modifying or distributing the Program 177 | (or any work based on the Program), you indicate your acceptance of this 178 | License to do so, and all its terms and conditions for copying, distributing 179 | or modifying the Program or works based on it. 180 | 181 | 6. Each time you redistribute the Program (or any work based on the Program), 182 | the recipient automatically receives a license from the original licensor 183 | to copy, distribute or modify the Program subject to these terms 184 | and conditions. 
You may not impose any further restrictions on the recipients' 185 | exercise of the rights granted herein. You are not responsible for enforcing 186 | compliance by third parties to this License. 187 | 188 | 7. If, as a consequence of a court judgment or allegation of patent 189 | infringement or for any other reason (not limited to patent issues), 190 | conditions are imposed on you (whether by court order, agreement or otherwise) 191 | that contradict the conditions of this License, they do not excuse you from 192 | the conditions of this License. If you cannot distribute so as to satisfy 193 | simultaneously your obligations under this License and any other pertinent 194 | obligations, then as a consequence you may not distribute the Program at all. 195 | For example, if a patent license would not permit royalty-free redistribution 196 | of the Program by all those who receive copies directly or indirectly 197 | through you, then the only way you could satisfy both it and this License 198 | would be to refrain entirely from distribution of the Program. 199 | 200 | If any portion of this section is held invalid or unenforceable under any 201 | particular circumstance, the balance of the section is intended to apply 202 | and the section as a whole is intended to apply in other circumstances. 203 | 204 | It is not the purpose of this section to induce you to infringe any patents 205 | or other property right claims or to contest validity of any such claims; 206 | this section has the sole purpose of protecting the integrity of the free 207 | software distribution system, which is implemented by public license practices. 208 | Many people have made generous contributions to the wide range of software 209 | distributed through that system in reliance on consistent application of 210 | that system; it is up to the author/donor to decide if he or she is willing 211 | to distribute software through any other system and a licensee 212 | cannot impose that choice. 
213 | 214 | This section is intended to make thoroughly clear what is believed to be a 215 | consequence of the rest of this License. 216 | 217 | 8. If the distribution and/or use of the Program is restricted in certain 218 | countries either by patents or by copyrighted interfaces, the original 219 | copyright holder who places the Program under this License may add an explicit 220 | geographical distribution limitation excluding those countries, so that 221 | distribution is permitted only in or among countries not thus excluded. 222 | In such case, this License incorporates the limitation as if written 223 | in the body of this License. 224 | 225 | 9. The Free Software Foundation may publish revised and/or new versions of the 226 | General Public License from time to time. Such new versions will be similar 227 | in spirit to the present version, but may differ in detail to address 228 | new problems or concerns. 229 | 230 | Each version is given a distinguishing version number. If the Program 231 | specifies a version number of this License which applies to it and 232 | "any later version", you have the option of following the terms and conditions 233 | either of that version or of any later version published by the 234 | Free Software Foundation. If the Program does not specify a version number of 235 | this License, you may choose any version ever published 236 | by the Free Software Foundation. 237 | 238 | 10. If you wish to incorporate parts of the Program into other free programs 239 | whose distribution conditions are different, write to the author to ask for 240 | permission. For software which is copyrighted by the Free Software Foundation, 241 | write to the Free Software Foundation; we sometimes make exceptions for this. 242 | Our decision will be guided by the two goals of preserving the free status of 243 | all derivatives of our free software and of promoting the sharing 244 | and reuse of software generally. 245 | 246 | NO WARRANTY 247 | 248 | 11. 
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR 249 | THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE 250 | STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE 251 | PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, 252 | INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 253 | FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND 254 | PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, 255 | YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 256 | 257 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL 258 | ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE 259 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY 260 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE 261 | OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA 262 | OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES 263 | OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF 264 | SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 265 | 266 | END OF TERMS AND CONDITIONS 267 | 268 | How to Apply These Terms to Your New Programs 269 | 270 | If you develop a new program, and you want it to be of the greatest possible 271 | use to the public, the best way to achieve this is to make it free software 272 | which everyone can redistribute and change under these terms. 273 | 274 | To do so, attach the following notices to the program. It is safest to attach 275 | them to the start of each source file to most effectively convey the exclusion 276 | of warranty; and each file should have at least the "copyright" line and a 277 | pointer to where the full notice is found. 
278 | 279 | One line to give the program's name and a brief idea of what it does. 280 | Copyright (C) {{ year }} {{ organization }} 281 | 282 | This program is free software; you can redistribute it and/or modify it 283 | under the terms of the GNU General Public License as published by the 284 | Free Software Foundation; either version 2 of the License, or 285 | (at your option) any later version. 286 | 287 | This program is distributed in the hope that it will be useful, but 288 | WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY 289 | or FITNESS FOR A PARTICULAR PURPOSE. 290 | See the GNU General Public License for more details. 291 | 292 | You should have received a copy of the GNU General Public License along 293 | with this program; if not, write to the Free Software Foundation, Inc., 294 | 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA 295 | 296 | Also add information on how to contact you by electronic and paper mail. 297 | 298 | If the program is interactive, make it output a short notice like this when 299 | it starts in an interactive mode: 300 | 301 | Gnomovision version 69, Copyright (C) year name of author Gnomovision 302 | comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is 303 | free software, and you are welcome to redistribute it under certain 304 | conditions; type `show c' for details. 305 | 306 | The hypothetical commands `show w' and `show c' should show the appropriate 307 | parts of the General Public License. Of course, the commands you use may be 308 | called something other than `show w' and `show c'; they could even be 309 | mouse-clicks or menu items--whatever suits your program. 310 | 311 | You should also get your employer (if you work as a programmer) or your school, 312 | if any, to sign a "copyright disclaimer" for the program, if necessary. 
313 | Here is a sample; alter the names: 314 | 315 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 316 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 317 | 318 | signature of Ty Coon, 1 April 1989 319 | Ty Coon, President of Vice 320 | 321 | This General Public License does not permit incorporating your program into 322 | proprietary programs. If your program is a subroutine library, you may 323 | consider it more useful to permit linking proprietary applications with 324 | the library. If this is what you want to do, use the GNU Library General 325 | Public License instead of this License. 326 | -------------------------------------------------------------------------------- /bin/find_m4.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # TODO: Integrate with package_scan_all(?) 3 | # TODO: Avoid adding duplicate entries? 4 | 5 | die() 6 | { 7 | echo "$@" >&2 8 | exit 1 9 | } 10 | 11 | COMMANDS=( cut find gawk git grep sha256sum sqlite3 ) 12 | 13 | KNOWN_M4_DBPATH="known_m4.db" 14 | UNKNOWN_M4_DBPATH="unknown_m4.db" 15 | 16 | # Various locations, commands, etc. differ by distro 17 | 18 | OS_ID=$(sed -n -E 's/^ID="?([^ "]+)"? *$/\1/p' /etc/os-release 2>/dev/null) 19 | 20 | case "${OS_ID}" in 21 | 22 | "") 23 | die "Could not extract an ID= line from /etc/os-release" 24 | ;; 25 | 26 | debian|devuan|ubuntu) 27 | UNPACK_DIR="/var/packages/" 28 | ;; 29 | 30 | gentoo) 31 | UNPACK_DIR="${PORTAGE_TMPDIR:-/var/tmp/portage/}" 32 | # We want to get the 'real' PORTAGE_TMPDIR, as PORTAGE_TMPDIR has confusing 33 | # semantics (PORTAGE_TMPDIR=/var/tmp -> stuff goes into /var/tmp/portage). 
34 | UNPACK_DIR="${UNPACK_DIR%%/portage/}/portage/" 35 | ;; 36 | 37 | centos|fedora|rhel|rocky) 38 | UNPACK_DIR="/var/repo/BUILD/" 39 | ;; 40 | 41 | *) 42 | die "Unsupported OS '${OS_ID}'" 43 | ;; 44 | esac 45 | 46 | # Use the distro-specific unpack dir unless told otherwise 47 | M4_DIR="${M4_DIR:-${UNPACK_DIR}}" 48 | 49 | shopt -s expand_aliases 50 | alias tput=false 51 | test -f /lib/gentoo/functions.sh && . /lib/gentoo/functions.sh || { 52 | # Stubs for non-Gentoo systems 53 | eerror() { echo "$@"; } 54 | ewarn() { echo "$@"; } 55 | einfo() { echo "$@"; } 56 | eindent() { :; } 57 | eoutdent() { :; } 58 | } 59 | unalias tput 60 | 61 | # unset DEBUG or 0: only display mismatches and other actionable items 62 | # set DEBUG to non-0: very noisy 63 | : "${DEBUG:=0}" 64 | 65 | # Enabling DEBUG also enables VERBOSE by default 66 | [[ -z ${DEBUG} || ${DEBUG} == "0" ]] || VERBOSE=1 67 | 68 | debug() 69 | { 70 | [[ -z ${DEBUG} || ${DEBUG} == "0" ]] && return 71 | # Deliberately treating this as a 'printf with debug check' function 72 | # shellcheck disable=2059 73 | printf "$@" >&2 74 | } 75 | 76 | # unset VERBOSE or 0: only print details at the end 77 | # set VERBOSE to non-0: print any time a new or unmatched m4 is found, 78 | # including git diff commands, etc. 79 | : "${VERBOSE:=0}" 80 | 81 | # Enable verbose flag for commands like rm 82 | [[ -z ${VERBOSE} || ${VERBOSE} == "0" ]] || VERBOSE_FLAG=-v 83 | 84 | verbose() 85 | { 86 | [[ -z ${VERBOSE} || ${VERBOSE} == "0" ]] && return 87 | local cmd line 88 | cmd=$1 89 | shift 90 | 91 | eindent 92 | for line in "$@" ; do 93 | ${cmd} "${line}" 94 | done 95 | eoutdent 96 | } 97 | 98 | # Extract M4 serial number from an M4 macro. 
99 | extract_serial() 100 | { 101 | local file=$1 102 | local serial serial_int 103 | local filename="${file##*/}" 104 | 105 | # https://www.gnu.org/software/automake/manual/html_node/Serials.html 106 | # We have to cope with: 107 | # - '#serial 1234 a.m4' 108 | # - '# serial 1234 b.m4' 109 | # TODO: pretty sure this can be optimized with sed(?) (less important now it uses gawk) 110 | # TODO: missed opportunity to diagnose multiple serial lines here, see https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00266.html 111 | serial=$(gawk 'match($0, /^#(.* )?serial ([[:digit:]]+).*$/, a) {print a[2]; exit;}' "${file}") 112 | 113 | if [[ -z ${serial} ]] ; then 114 | # Some (old) macros may use an invalid format: 'x.m4 serial n' 115 | # https://lists.gnu.org/archive/html/bug-gnulib/2024-04/msg00051.html 116 | # TODO: pretty sure this can be optimized with sed 117 | # TODO: since that was fixed, there may be 2 valid checksums for each serial. How do we handle that 118 | # in the DB queries later on? 119 | serial=$(grep -m 1 -Pr "#(.+ )?(${filename} )?serial (\d+[^ ]*).*$" "${file}") 120 | serial="${serial#* }" 121 | fi 122 | 123 | # Fallbacks and warnings in case of no/bad serial number 124 | if [[ -z ${serial} ]] ; then 125 | serial="NULL" 126 | serial_int=0 127 | debug "[%s] No serial found, recording 'NULL' and for arithmetic ops using '0'\n" "${filename}" 128 | else 129 | serial_int="${serial//[!0-9]/}" 130 | [[ -z ${serial_int} ]] && serial_int=0 131 | [[ ${serial_int} != "${serial}" ]] && eerror "File '${file}': Non-numeric serial '${serial}', arithmetic ops will use '${serial_int}'" 132 | fi 133 | 134 | echo "${serial_int}" "${serial}" 135 | } 136 | 137 | # For a given file, get a comment-stripped checksum. 138 | # If the file contained 'changecom', we give up, don't try to strip. 
139 | # https://www.gnu.org/software/m4/manual/html_node/Comments.html 140 | # https://www.gnu.org/software/m4/manual/html_node/Changecom.html 141 | # https://lists.gnu.org/archive/html/m4-discuss/2014-06/msg00000.html 142 | 143 | make_stripped_checksum() 144 | { 145 | local file="$1" 146 | local plain_checksum="$2" 147 | local strip_checksum 148 | 149 | # TODO: dnl can follow something other than whitespace, like 150 | # foo)dnl, bar]dnl. Broaden our match? We'd have to restore or 151 | # not consume such chars, unlike the whitespace we currently consume 152 | 153 | strip_checksum=$(gawk '/changecom/{exit 77}; { gsub(/#.*/,""); gsub(/(^| )dnl.*/,"");}; /^ *$/{next}; {print};' "${file}" 2>/dev/null \ 154 | | sha256sum - \ 155 | | cut -d' ' -f1 ; \ 156 | exit ${PIPESTATUS[0]}) 157 | local ret=$? 158 | if [[ ${ret} != 0 ]] ; then 159 | strip_checksum="${plain_checksum}" 160 | if [[ ${ret} != 77 ]]; then 161 | eerror "File '${file}': Got error ${ret} from gawk?" 162 | fi 163 | fi 164 | echo "${strip_checksum}" 165 | } 166 | 167 | # Initial creation of known M4 macros database. 168 | # Creates a table called `m4` with fields: 169 | # `name` 170 | # `serial` 171 | # `plain_checksum` (SHA256), 172 | # `strip_checksum` (SHA256), (checksum of comment-stripped contents) 173 | # `repository` (name of git repo) 174 | # `commit` (git commit in `repository`) 175 | create_known_db() 176 | { 177 | sqlite3 "${KNOWN_M4_DBPATH}" <<-EOF | grep -v '^wal$' 178 | PRAGMA journal_mode=WAL; 179 | CREATE table m4 (name TEXT, serial TEXT, plain_checksum TEXT, strip_checksum TEXT, repository TEXT, gitcommit TEXT, gitpath TEXT); 180 | EOF 181 | [[ ${PIPESTATUS[0]} == 0 ]] || die "SQLite ${KNOWN_M4_DBPATH} DB creation failed" 182 | } 183 | 184 | # Initial creation of unknown M4 macros database. 
185 | # Creates a table called `m4` with fields: 186 | # `name` 187 | # `serial` 188 | # `plain_checksum` (SHA256), 189 | # `strip_checksum` (SHA256), (checksum of comment-stripped contents) 190 | # `projectfile` (path under M4_DIR for this specific file, incl project dir) 191 | # `reason` (what kind of check led to us adding it here) 192 | create_unknown_db() 193 | { 194 | sqlite3 "${UNKNOWN_M4_DBPATH}" <<-EOF | grep -v '^wal$' 195 | PRAGMA journal_mode=WAL; 196 | CREATE table m4 (name TEXT, serial TEXT, plain_checksum TEXT, strip_checksum TEXT, projectfile TEXT, reason TEXT); 197 | EOF 198 | [[ ${PIPESTATUS[0]} == 0 ]] || die "SQLite ${UNKNOWN_M4_DBPATH} DB creation failed" 199 | } 200 | 201 | # Remember per-run unrecognized macros, so that we can then cross-ref 202 | # across all analyzed projects, find common matching unknowns, etc. 203 | record_unknown() 204 | { 205 | local filename="$1" 206 | local serial="$2" 207 | local plain_checksum="$3" 208 | local strip_checksum="$4" 209 | local project_filepath="$5" 210 | local reason="$6" 211 | 212 | sqlite3 "${UNKNOWN_M4_DBPATH}" <<-EOF || die "SQLite insert into ${UNKNOWN_M4_DBPATH} failed" 213 | $(printf "PRAGMA synchronous = OFF;\nINSERT INTO \ 214 | m4 (name, serial, plain_checksum, strip_checksum, projectfile, reason) \ 215 | VALUES ('%s', '%s', '%s', '%s', '%s', '%s');\n" \ 216 | "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" "${project_filepath:-NULL}" "${reason}" 217 | ) 218 | EOF 219 | } 220 | 221 | # Search passed directories for M4 macros and populate `M4_FILES` with the result. 222 | find_macros() 223 | { 224 | # What .m4 files are there in the wild? 225 | # TODO: exclude list for aclocal.m4 and so on? 226 | mapfile -d '' M4_FILES < <(find "$@" -iname "*.m4" -type f -print0) 227 | } 228 | 229 | # Populate the DB with the contents of `M4_FILES`. 
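Each indexed file gets a plain SHA-256 before insertion. A minimal sketch of that one step, using a throwaway file (name and contents invented):

```shell
# Hypothetical demo of the plain-checksum step: sha256sum prints
# '<hex>  <path>', and cut keeps only the 64-character hex digest.
f=$(mktemp)
printf 'AC_DEFUN([AX_DEMO], [:])\n' > "${f}"
plain_checksum=$(sha256sum "${f}" | cut -d' ' -f 1)
echo "${plain_checksum}"
rm -f "${f}"
```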
230 | populate_known_db() 231 | { 232 | local queries=() 233 | local serial serial_int 234 | local file filename 235 | local plain_checksum strip_checksum 236 | local processed=0 237 | 238 | for file in "${M4_FILES[@]}" ; do 239 | 240 | [[ $(( ${processed} % 1000 )) == 0 ]] && einfo "Processed ${processed} / ${#M4_FILES[@]} macro files" 241 | let processed=${processed}+1 242 | 243 | filename="${file##*/}" 244 | [[ ${filename} == @(aclocal.m4|acinclude.m4|m4sugar.m4) ]] && continue 245 | 246 | # TODO: reject pathological filenames? spaces, shell metacharacters, etc. 247 | 248 | dirname="${file%/*}" 249 | 250 | read -r serial_int serial <<< $(extract_serial "${file}") 251 | 252 | # TODO: we used to skip files w/no serial, should we again? 253 | # [[ ${serial} == NULL ]] && continue 254 | 255 | repository=$(git -C "${dirname}" rev-parse --show-toplevel 2>/dev/null || cat "${file}.gitrepo") 256 | commit=$(git -C "${dirname}" rev-parse HEAD 2>/dev/null || cat "${file}.gitcommit") 257 | path=$(cat "${file}".gitpath 2>/dev/null || echo "${file}") 258 | 259 | plain_checksum=$(sha256sum "${file}" | cut -d' ' -f 1) 260 | strip_checksum=$(make_stripped_checksum "${file}" "${plain_checksum}") 261 | 262 | queries+=( 263 | "$(printf "INSERT INTO \ 264 | m4 (name, serial, plain_checksum, strip_checksum, repository, gitcommit, gitpath) \ 265 | VALUES ('%s', '%s', '%s', '%s', '%s', '%s', '%s');\n" \ 266 | "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" "${repository:-NULL}" "${commit:-NULL}" "${path:-NULL}")" 267 | ) 268 | 269 | debug "[%s] Got serial %s with checksum %s stripped %s\n" "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" 270 | done 271 | 272 | sqlite3 "${KNOWN_M4_DBPATH}" <<-EOF || die "SQLite batched insert into ${KNOWN_M4_DBPATH} failed" 273 | PRAGMA synchronous = OFF; 274 | ${queries[@]} 275 | EOF 276 | } 277 | 278 | # Compare `M4_FILES` found on disk with the contents of the database (known M4 serials/hashes). 
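The comparison pass that follows preloads known checksums and names into bash associative arrays so each scanned file costs one hash lookup rather than one SQL query. A toy sketch of the either-checksum membership test (all values invented):

```shell
# Hypothetical demo of the known-checksum lookup: a file passes if
# either its plain or its comment-stripped checksum is already indexed.
declare -A valid_checksums=()
valid_checksums[aaaa]=1   # made-up plain checksum of a known macro
valid_checksums[bbbb]=1   # made-up stripped checksum of a known macro

plain=cccc                # this file's plain checksum: not indexed
strip=bbbb                # this file's stripped checksum: indexed

# Default-chaining: first array hit wins, else 0.
indexed=${valid_checksums[${plain}]:-${valid_checksums[${strip}]:-0}}
echo "indexed=${indexed}"
```

Here `indexed` ends up `1` because the stripped checksum matched even though the plain one did not, which is exactly the case comment-only edits produce.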
279 | compare_with_db() 280 | { 281 | # We have `M4_FILES` as a bunch of macros pending verification that we found 282 | # unpacked in archives. 283 | local file filename 284 | local max_serial_seen max_serial_seen_int serial serial_int 285 | local plain_checksum strip_checksum 286 | local delta absolute_delta 287 | local processed=0 288 | local known_filename known_filename_query known_checksum_query checksum 289 | 290 | declare -A valid_checksums=() 291 | declare -A known_filenames=() 292 | declare -A bad_checksums=() 293 | 294 | # Load up a list of all known checksums, for future reference 295 | known_checksum_query=$(sqlite3 "${KNOWN_M4_DBPATH}" <<-EOF || die "SQLite lookup of known checksums failed" 296 | SELECT DISTINCT plain_checksum,strip_checksum FROM m4 297 | EOF 298 | ) 299 | for checksum in ${known_checksum_query} ; do 300 | IFS='|' read -ra known_checksum_query_parsed <<< "${checksum}" 301 | plain_checksum=${known_checksum_query_parsed[0]} 302 | strip_checksum=${known_checksum_query_parsed[1]} 303 | valid_checksums[${plain_checksum}]=1 304 | valid_checksums[${strip_checksum}]=1 305 | done 306 | 307 | # Load up a list of all observed filenames, for future reference 308 | known_filename_query=$(sqlite3 "${KNOWN_M4_DBPATH}" <<-EOF || die "SQLite lookup of known names failed" 309 | SELECT DISTINCT name FROM m4 310 | EOF 311 | ) 312 | for known_filename in ${known_filename_query} ; do 313 | known_filenames[${known_filename}]=1 314 | done 315 | 316 | for file in "${M4_FILES[@]}" ; do 317 | 318 | [[ $(( ${processed} % 1000 )) == 0 ]] && einfo "Compared ${processed} / ${#M4_FILES[@]} macro files" 319 | let processed=${processed}+1 320 | 321 | filename="${file##*/}" 322 | [[ ${filename} == @(aclocal.m4|acinclude.m4|m4sugar.m4) ]] && continue 323 | 324 | # TODO: reject pathological filenames? spaces, shell metacharacters, etc. 
325 | 326 | project_filepath=${file#"${M4_DIR}"} 327 | 328 | read -r serial_int serial <<< $(extract_serial "${file}") 329 | 330 | plain_checksum=$(sha256sum "${file}" | cut -d' ' -f 1) 331 | strip_checksum=$(make_stripped_checksum "${file}" "${plain_checksum}") 332 | 333 | debug "\n" 334 | debug "[%s] Got serial %s with checksum %s stripped %s\n" \ 335 | "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" 336 | debug "[%s] Checking database...\n" "${filename}" 337 | 338 | # Have we seen this checksum before (stripped or otherwise)? 339 | # If yes, it's only (mildly) interesting if it has a different name than we know it by. 340 | # If not, we need to see if it's a known serial number or not. 341 | local indexed_checksum=${valid_checksums[${plain_checksum}]:-${valid_checksums[${strip_checksum}]:-0}} 342 | 343 | if [[ ${indexed_checksum} == 1 ]] ; then 344 | # We know the checksum, we can move on. 345 | # TODO: Should we mention if only stripped matched, not raw? 346 | # TODO: Check if the filename and/or serial matched, make a note if they did not? 347 | let MATCH_COUNT=${MATCH_COUNT}+1 348 | continue 349 | fi 350 | 351 | # 352 | # If we get here, this checksum is not known-good. 353 | # 354 | 355 | if ! [[ ${known_filenames[${filename}]} ]] ; then 356 | # Have we seen this filename before during this scan, even though 357 | # it's not in our index? 358 | # 359 | # We've seen it before as an "unseen" macro, so not very interesting, 360 | # but remember we saw it in this project/path too. 361 | if [[ ${NEW_MACROS[${filename}]} ]] ; then 362 | record_unknown "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" "${project_filepath:-NULL}" "unknown-repeat" 363 | continue 364 | fi 365 | 366 | # We didn't see this filename before when indexing. 
367 | NEW_MACROS[${filename}]=1 368 | 369 | ewarn "$(printf "Found new macro %s\n" "${file}")" 370 | 371 | debug "[%s] Got serial %s with checksum %s stripped %s\n" "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" 372 | 373 | # This filename isn't in the index, so no point in carrying on. 374 | record_unknown "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" "${project_filepath:-NULL}" "new-filename" 375 | continue 376 | fi 377 | 378 | # 379 | # If we get here, the filename is known but the checksum is new. 380 | # 381 | 382 | # Is it a new checksum for an existing known serial? 383 | # Find the maximum serial number we've ever seen for this macro. 384 | # TODO: This could be optimized by preloading it into an assoc array 385 | # ... and save many repeated forks & queries (to avoid looking up same macro repeatedly) 386 | max_serial_seen_query=$(sqlite3 "${KNOWN_M4_DBPATH}" <<-EOF || die "SQLite lookup of max serial for '${filename}' failed" 387 | SELECT MAX(CAST(serial AS INT)),name,serial,plain_checksum,strip_checksum,repository,gitcommit,gitpath FROM m4 WHERE name='${filename}'; 388 | EOF 389 | ) 390 | 391 | # Check for discontinuities in serial number. Linear increase is OK, 392 | # like N+1 or so (likely just a genuinely new version), but something 393 | # like +20 is suspicious as they really want theirs to take priority... 394 | # TODO: Make this more intelligent? 
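The serial-discontinuity heuristic can be illustrated with made-up numbers: a file claiming serial 30 when the highest serial ever indexed for that macro is 5 yields a delta of -25, which lands in the "large jump" (error) branch rather than the ordinary "newer serial" (warning) one. A standalone sketch:

```shell
# Hypothetical demo of the delta classification; both serials invented.
max_serial_seen_int=5   # highest serial ever indexed for this macro
serial_int=30           # serial claimed by the file on disk

delta=$(( max_serial_seen_int - serial_int ))
absolute_delta=$(( delta >= 0 ? delta : -delta ))

# Negative delta means the file claims a newer serial than any indexed;
# a small step is probably a real update, a big jump is suspicious.
if [ "${delta}" -lt -10 ] ; then
  verdict=serialjump
elif [ "${delta}" -lt 0 ] ; then
  verdict=serialinc
else
  verdict=ok
fi
echo "delta=${delta} abs=${absolute_delta} verdict=${verdict}"
```

The -10 threshold mirrors the script's cutoff; it is a heuristic, which is what the TODO above is about.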
395 | if [[ -n ${max_serial_seen_query} ]] ; then 396 | print_diff_cmd() { 397 | local cmd=$1 398 | local reason=$2 399 | 400 | IFS='|' read -ra parsed_results <<< "${max_serial_seen_query}" 401 | expected_repository=${parsed_results[5]} 402 | expected_gitcommit=${parsed_results[6]} 403 | expected_gitpath=${parsed_results[7]} 404 | verbose ${cmd} "diff using:"$'\n\t'"git diff --no-index <(git -C ${expected_repository} show '${expected_gitcommit}:${expected_gitpath}') '${file}'" 405 | 406 | DIFF_CMDS+=( "git diff --no-index <(git -C ${expected_repository} show '${expected_gitcommit}:${expected_gitpath}') '${file}' # ${reason}" ) 407 | # We don't want to emit loads of diff commands for the same thing 408 | bad_checksums[${plain_checksum}]=1 409 | bad_checksums[${strip_checksum}]=1 410 | } 411 | 412 | # We don't want to emit loads of diff commands for the same thing 413 | if [[ ${bad_checksums[${plain_checksum}]} == 1 || ${bad_checksums[${strip_checksum}]} == 1 ]] ; then 414 | record_unknown "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" "${project_filepath:-NULL}" "known-bad-checksum" 415 | continue 416 | fi 417 | 418 | IFS='|' read -ra max_serial_seen_parsed <<< "${max_serial_seen_query}" 419 | max_serial_seen=${max_serial_seen_parsed[2]} 420 | # What even are numbers 421 | max_serial_seen_int="${max_serial_seen//[!0-9]/}" 422 | [[ -z ${max_serial_seen_int} ]] && max_serial_seen_int=0 423 | delta=$(( max_serial_seen_int - serial_int )) 424 | absolute_delta=$(( delta >= 0 ? delta : -delta )) 425 | 426 | # Call out large deltas at a higher priority 427 | if [[ ${delta} -lt -10 ]] ; then 428 | BAD_SERIAL_MACROS+=( "${filename}" ) 429 | 430 | eerror "$(printf "Large serial delta found in %s!\n" "${project_filepath}")" 431 | verbose eerror \ 432 | "$(printf "full path: %s" "${file}")" $'\n' \ 433 | "$(printf "serial=%s" "${serial}")" $'\n' \ 434 | "$(printf "max_serial_seen=%s" "${max_serial_seen}")" $'\n' \ 435 | "$(printf "absolute_delta=%s" "${absolute_delta}")" $'\n' 436 | print_diff_cmd eerror "serialjump" 437 | elif [[ ${delta} -lt 0 ]] ; then 438 | NEW_SERIAL_MACROS+=( "${filename}" ) 439 | 440 | ewarn "$(printf "Newer macro serial found in %s\n" "${project_filepath}")" 441 | verbose ewarn \ 442 | "$(printf "serial=%s" "${serial}")" $'\n' \ 443 | "$(printf "max_serial_seen=%s" "${max_serial_seen}")" $'\n' \ 444 | "$(printf "absolute_delta=%s" "${absolute_delta}")" $'\n' 445 | print_diff_cmd ewarn "serialinc" 446 | fi 447 | fi 448 | 449 | # We know this filename, but not its checksum and maybe not serial number. 450 | # Look up all the checksums for this macro & serial. 
451 | known_macro_query=$(sqlite3 "${KNOWN_M4_DBPATH}" <<-EOF || die "SQLite lookup of known records for '${filename}' failed" 452 | SELECT name,serial,plain_checksum,strip_checksum,repository,gitcommit,gitpath FROM m4 WHERE name='${filename}'; 453 | EOF 454 | ) 455 | 456 | local line expected_serial checksum_ok 457 | for line in ${known_macro_query} ; do 458 | IFS='|' read -ra parsed_results <<< "${line}" 459 | expected_serial=${parsed_results[1]} 460 | expected_plain_checksum=${parsed_results[2]} 461 | expected_strip_checksum=${parsed_results[3]} 462 | expected_repository=${parsed_results[4]} 463 | expected_gitcommit=${parsed_results[5]} 464 | expected_gitpath=${parsed_results[6]} 465 | 466 | debug "[%s] Checking candidate w/ expected_serial=%s, expected_plain_checksum=%s, expected_strip_checksum=%s\n" \ 467 | "${filename}" "${expected_serial}" "${expected_plain_checksum}" "${expected_strip_checksum}" 468 | 469 | # TODO: In the case of multiple knowns for this file w/different 470 | # serials & checksums, we are picking the first one with a serial 471 | # match. That doesn't necessarily mean the closest content match. 472 | # Add fuzzy hashes & find the candidate with the closest fuzzy hash? 473 | if [[ ${expected_serial} == "${serial}" ]] ; then 474 | # We know this serial, so we can assert what its checksum ought to be. 
475 | if [[ ${expected_plain_checksum} == "${plain_checksum}" ]]; then 476 | checksum_ok=plain 477 | elif [[ ${expected_strip_checksum} == "${strip_checksum}" ]]; then 478 | checksum_ok=strip 479 | else 480 | checksum_ok=no 481 | fi 482 | 483 | debug "[%s] checksum_ok=%s\n" "${filename}" "${checksum_ok}" 484 | 485 | if [[ ${checksum_ok} == no ]] ; then 486 | BAD_MACROS+=( "${file}" ) 487 | 488 | eerror "$(printf "Found mismatch in %s\n" "${file}")" 489 | verbose eerror \ 490 | "$(printf "full path: %s" "${file}")" \ 491 | "$(printf "expected_serial=%s vs serial=%s" \ 492 | "${expected_serial}" "${serial}")" \ 493 | "$(printf "expected_plain_checksum=%s vs plain_checksum=%s" \ 494 | "${expected_plain_checksum}" "${plain_checksum}")" \ 495 | "$(printf "expected_strip_checksum=%s vs strip_checksum=%s" \ 496 | "${expected_strip_checksum}" "${strip_checksum}")" \ 497 | "diff using:"$'\n\t'"git diff --no-index <(git -C ${expected_repository} show '${expected_gitcommit}:${expected_gitpath}') '${file}'" 498 | 499 | DIFF_CMDS+=( "git diff --no-index <(git -C ${expected_repository} show '${expected_gitcommit}:${expected_gitpath}') '${file}' # mismatch" ) 500 | 501 | # We don't want to emit loads of diff commands for the same thing 502 | bad_checksums[${plain_checksum}]=1 503 | bad_checksums[${strip_checksum}]=1 504 | 505 | # No point in checking this one against other checksums 506 | break 507 | fi 508 | fi 509 | done 510 | record_unknown "${filename}" "${serial}" "${plain_checksum}" "${strip_checksum}" "${project_filepath:-NULL}" "new-serial" 511 | debug "[%s] Got %s\n" "${filename}" "unknown" 512 | done 513 | } 514 | 515 | for COMMAND in "${COMMANDS[@]}" ; do 516 | command -v "${COMMAND}" >/dev/null || die "'${COMMAND}' not found in PATH" 517 | done 518 | 519 | # MODE=0: create database 520 | # MODE=1: search against the db 521 | : "${MODE:=0}" 522 | 523 | declare -Ag NEW_MACROS=() 524 | 525 | M4_FILES=() 526 | NEW_MACROS=() 527 | NEW_SERIAL_MACROS=() 528 | BAD_MACROS=() 
529 | BAD_SERIAL_MACROS=() 530 | DIFF_CMDS=() 531 | MATCH_COUNT=0 532 | 533 | if [[ ${MODE} == 0 ]] ; then 534 | if [[ "$#" -le 3 ]] ; then 535 | label="$*" 536 | else 537 | label="$1 $2 ...[$#]" 538 | fi 539 | einfo "Running in create mode, scraping ${label}" 540 | 541 | if [[ -f "${KNOWN_M4_DBPATH}" ]] ; then 542 | debug "Using existing database...\n" 543 | else 544 | debug "Creating database...\n" 545 | create_known_db 546 | fi 547 | 548 | einfo "Finding macros to index..." 549 | find_macros "$@" 550 | 551 | einfo "Adding ${#M4_FILES[@]} macros to database..." 552 | populate_known_db 553 | else 554 | einfo "Running in comparison mode..." 555 | [[ -f "${KNOWN_M4_DBPATH}" ]] || die "error: running in DB comparison mode but '${KNOWN_M4_DBPATH}' not found!" 556 | 557 | einfo "Purging old (if any) unknown db..." 558 | rm ${VERBOSE_FLAG} -f "${UNKNOWN_M4_DBPATH}" 559 | einfo "Creating new unknown db..." 560 | create_unknown_db 561 | 562 | # Which of these files are new? 563 | einfo "Finding macros in '${M4_DIR}' to compare..." 564 | find_macros "${M4_DIR}" 565 | 566 | einfo "Comparing ${#M4_FILES[@]} macros with database..." 567 | compare_with_db 568 | 569 | printf "\n" 570 | 571 | einfo "Scanning complete." 572 | 573 | einfo "Found ${MATCH_COUNT} matched m4, ${#NEW_MACROS[@]} new m4, ${#NEW_SERIAL_MACROS[@]} new serial, ${#BAD_MACROS[@]} differing m4, ${#BAD_SERIAL_MACROS[@]} serial jumps, ${#DIFF_CMDS[@]} diff commands." 
574 | 575 | if (( ${#NEW_MACROS[@]} > 0 )) || (( ${#NEW_SERIAL_MACROS[@]} > 0 )) || (( ${#BAD_MACROS[@]} > 0 )) \ 576 | || (( ${#BAD_SERIAL_MACROS[@]} > 0 )) || (( ${#DIFF_CMDS[@]} > 0 )) ; then 577 | 578 | # Sort our lists of new/modified/bad m4's 579 | 580 | (( ${#NEW_MACROS[@]} > 0 )) && \ 581 | mapfile -d '' _sorted < <(printf '%s\0' "${!NEW_MACROS[@]}" | sort -z) && \ 582 | ewarn "New macros: ${_sorted[@]}" 583 | 584 | (( ${#NEW_SERIAL_MACROS[@]} > 0 )) && \ 585 | mapfile -d '' _sorted < <(printf '%s\0' "${NEW_SERIAL_MACROS[@]}" | sort -z) && \ 586 | ewarn "Updated macros: ${_sorted[@]}" 587 | 588 | (( ${#BAD_MACROS[@]} > 0 )) && \ 589 | mapfile -d '' _sorted < <(printf '%s\0' "${BAD_MACROS[@]}" | sort -z) && \ 590 | eerror "Miscompared macros: ${_sorted[@]}" 591 | 592 | (( ${#BAD_SERIAL_MACROS[@]} > 0 )) && \ 593 | mapfile -d '' _sorted < <(printf '%s\0' "${BAD_SERIAL_MACROS[@]}" | sort -z) && \ 594 | eerror "Significant serial diff. macros: ${_sorted[@]}" 595 | 596 | # DIFF_CMDS is already in a logical order (grouped by project) 597 | (( ${#DIFF_CMDS[@]} > 0 )) && { 598 | eerror "Collected diff cmds for review:" ; 599 | printf "%s\n" "${DIFF_CMDS[@]}" ; 600 | } 601 | fi 602 | fi 603 | --------------------------------------------------------------------------------
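For reference, a hedged usage sketch inferred from the `MODE` logic above. `MODE` and `M4_DIR` are the script's actual environment knobs; the repository paths below are illustrative placeholders, not documented defaults:

```shell
# Build the known-macro DB from checked-out upstream repos (MODE=0),
# then compare unpacked distro sources against it (MODE=1):
#   MODE=0 find_m4.sh ~/src/autoconf-archive ~/src/gnulib
#   MODE=1 M4_DIR=/var/tmp/portage/ find_m4.sh
#
# The default-assignment idiom the script uses for its knobs:
unset MODE
: "${MODE:=0}"
echo "MODE=${MODE}"
```

With `MODE` unset, `: "${MODE:=0}"` assigns the default `0` (create mode) without any explicit conditional.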