├── LICENSE ├── README.md └── sources └── clui /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Joel Bruner 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # clui 2 | clui (Command Line Unicode Information) for macOS reports the description, code point(s), and UTF encoding of Unicode characters and sequences in a variety of output formats and encodings. 3 | 4 | Check out [blog entries on clui at brunerd.com](https://www.brunerd.com/blog/category/projects/clui/) 5 | 6 | ## clui demo video 7 | 8 | See clui in action: 9 | [![clui walkthrough](https://img.youtube.com/vi/KhNblOSffz4/0.jpg)](https://www.youtube.com/watch?v=KhNblOSffz4) 10 | 11 | ## clui usage 12 | `./clui -u` for usage/help output in `less` 13 | ``` 14 | clui (1.0) - Command Line Unicode Info (https://github.com/brunerd/clui) 15 | Usage: clui [options] ... 16 | 17 | Input can be: 18 | * Unicode characters, space or comma delimited (use -x to expand non-delimited strings) 19 | * Hexadecimal codepoint representations (U+hhhhh or 0xhhhhh), double-quoted multi-point sequences 20 | * Hyphenated ranges (ascending or descending): z-a, U+A1-U+BF or 0x20-0x7E 21 | * Category or Group names (see Input Options) 22 | * Descriptive words or phrases (see Input Options) 23 | 24 | Output Options 25 | 26 | -D Discrete info fields for CharacterDB and localized AppleName.strings 27 | 28 | -H Hide characters lacking info descriptions 29 | 30 | -l 31 | Show localized info (Emoji only). Use -Ll to list available localizations.
32 | 33 | -p Preserve case of CharacterDB info 34 | 35 | Encoding style for UTF field 36 | -E 37 | h* UTF-8 hexadecimal, space delimited and capitalized (NN) (default) 38 | H Hex HTML Entity UTF-32 (&#xnnnn;) 39 | 0 Octal UTF-8 with leading 0 (\0nnn) 40 | o Octal UTF-8 (\nnn) 41 | x Shell style UTF-8 hex (\xnn) 42 | u JS style UTF-16 (\unnnn) 43 | U zsh style UTF-32 Unicode Code Point (\Unnnnnnnn) 44 | w Web/URL UTF-8 encoding (%nn) 45 | 46 | Output format 47 | -O 48 | C* CSV (default) 49 | c Character-only, space delimited 50 | j JSON output (array of objects) 51 | J JSON Sequence output (objects delimited by 0x1E and 0x0A) 52 | p Plain output (no field descriptions) 53 | r RTF output (plain output with large sized characters) 54 | y YAML output 55 | 56 | Format dependent output options 57 | -f set font size for RTF output of char and info fields (default: 256,32) 58 | -h Hide headers for CSV output 59 | 60 | Input Options 61 | 62 | -C [,Subsection] 63 | Treat input as a Category name with a possible subsection (see -L for listing) 64 | 65 | -F Remove Fitzpatrick skin tone modifier and process, then process as-is 66 | 67 | -G [,Category] 68 | Treat input as a Group name with a possible category name (see -L for listing) 69 | 70 | -l 71 | Search localized descriptions (Emoji only), use -Ll to list available localizations 72 | 73 | -S 74 | Treat input as search criteria 75 | d Search descriptions in CharacterDB and AppleName.strings (case insensitive) 76 | c Search for character in other Unicode sequences 77 | C Search for character plus "related characters" 78 | 79 | -x Expand and describe each individual code point in a sequence 80 | -X Expand plus display original sequence prior to expansion 81 | 82 | -V Verbatim, process input raw/as-is, no additional interpretation or delimitation 83 | 84 | Other Modes 85 | 86 | List categories and groups 87 | -L List categories or groups in CSV (use -h to suppress header) 88 | c Category list (* after a name denotes subsections)
89 | C Category list, with subsections expanded 90 | g Groups of categories, top level name 91 | G Group name with member categories expanded 92 | l Locales available to search and display results from (Emoji only) 93 | 94 | -u Display usage info (aka help) with less (press q to quit) 95 | 96 | Examples: 97 | 98 | Search for characters a to z plus "related characters" and output as CSV (default) 99 | clui -SC a-z 100 | 101 | Look up all available Categories 102 | clui -Lc 103 | 104 | Get every character in Emoji category and output in RTF to a file 105 | clui -Or -C Emoji > Emoji.rtf 106 | 107 | All characters in Emoji category with discrete info fields in Spanish to a CSV to a file 108 | clui -D -l es -C Emoji > Emoji-es.csv 109 | 110 | Search descriptions for substring "family" and expand multi-code point ZWJ sequences 111 | clui -X -Sd "family" 112 | ``` 113 | -------------------------------------------------------------------------------- /sources/clui: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | : <<-LICENSE_BLOCK 3 | clui - Command Line Unicode Info, Copyright (c) 2023 Joel Bruner (https://github.com/brunerd/clui) 4 | Licensed under the MIT License 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 8 | LICENSE_BLOCK 9 | 10 | ############# 11 | # FUNCTIONS # 12 | ############# 13 | 14 | function outputCharInfo(){ 15 | 16 | #GLOBAL booleans: FONTSIZE_LARGE, FONTSIZE_SMALL, HIDE_UNDESCRIBED, LOCALE, and HAD_OUTPUT 17 | #positional parameters 18 | local string="${1}"; [ -z "${string}" ] && return 19 | local outputOption="${2:-CSV}" 20 | local encodingScheme="${3:-h}" 21 | 22 | #font size defaults for RTF output (-Or) 23 | #the character/sequence 24 | local default_fontSize_large="256" 25 | #the info, code points, and UTF 26 | local default_fontSize_small="32" 27 | 28 | #member names for JSON and YAML objects 29 | #the star of the show 30 | local fieldName_string="char" 31 | #info/description from both CharacterDB.sqlite3 and AppleName.strings, can be broken out individually or ; delimited (if more than one) 32 | local fieldName_info="info" 33 | 34 | #later if DISCRETE_DESCRIPTIONS is true use these field names 35 | [ "${LOCALE:=en}" != "en" ] && local locale_suffix="-${LOCALE}" 36 | #make discrete names used for JSON and YAML 37 | local fieldName_info1="${fieldName_info}-chardb" 38 | local fieldName_info2="${fieldName_info}-emoji${locale_suffix}" 39 | 40 | local fieldName_codePoints="cp" 41 | #this will have the bit depth and encoding type appended 42 | local fieldName_encoding="utf" 43 | 44 | #internal variables 45 | local IFS=$'\n'; local i; local encodedString; local str_codePoint; local byte 46 | 47 | #jse JSON String encoder 48 | function jse() ( 49 | #jse - JSON String Encoder 
(https://github.com/brunerd/jse) Copyright (c) 2022 Joel Bruner, Licensed under the MIT License 50 | set +x; read -r -d '' JSCode <<-'EOT' 51 | var argument=decodeURIComponent(escape(arguments[0]));var fileFlag=decodeURIComponent(escape(arguments[1]));if(fileFlag){try{var text=readFile(argument)} catch(error){throw new Error(error);quit()};if(argument==="/dev/stdin"){text=text.slice(0,-1)}}else{var text=argument};print(JSON.stringify(text,null,0)) 52 | EOT 53 | jsc=$(find "/System/Library/Frameworks/JavaScriptCore.framework/Versions/Current/" -name 'jsc');[ -z "${jsc}" ] && jsc=$(which jsc);if ([ -n "${ARGC}" ] && [ "${ARGC}" = "0" ]) || ([ -z "${ARGC}" ] && [ "${#BASH_ARGC[@]}" = "0" ]); then if [ -f '/dev/stdin' ]; then fileFlag=1; argument="/dev/stdin"; elif [ -t '0' ]; then exit 0; fi; elif [ "${1}" = "-f" ] && [ -n "${2}" ]; then if [ ! -f "${2}" ]; then echo "File not found: ${2}";exit 1; else fileFlag=1; argument="${2}"; fi; else argument="${1}"; fi; if [ -z "${argument}" ] && [ !
-t '0' ]; then "${jsc}" -e "${JSCode}" -- "/dev/stdin" "1" <<< "$(cat)"; else "${jsc}" -e "${JSCode}" -- "${argument}" "${fileFlag}"; fi 54 | ) 55 | 56 | function encodeRTF(){ 57 | local needsCtrlSeq=1 58 | local OPTIND 59 | local OPTARG 60 | 61 | while getopts ":f:" option; do 62 | case "${option}" in 63 | f) 64 | local fontsize="${OPTARG}" 65 | ((${fontsize})) && echo "\\fs${fontsize} " 66 | ;; 67 | esac 68 | done 69 | 70 | #if required shift parameters, OPTIND is only set after getopts 71 | [ ${OPTIND} -ge 2 ] && shift $((OPTIND-1)) 72 | 73 | local string="${1}" 74 | 75 | #go through each character 76 | for ((i=0;;i++)); do 77 | local char="${string:i:1}" 78 | [ -z "${char}" ] && break 79 | local charDecimal=$(/usr/bin/printf "%d" "'${char}") 80 | #encode anything in these ranges (control characters and everything above printable ASCII) 81 | if ((charDecimal <= 0x1F)) || ((charDecimal > 0x7E)); then 82 | ((needsCtrlSeq)) && echo -En "\\uc0" 83 | #make UTF-16 in hex, add 0x prefix with sed, then convert to decimal and prefix with \u for RTF 84 | printf '\\u%04d ' $(echo -n "${char}" | iconv -t UTF-16BE | xxd -p -c2 | sed 's/..../0x&/g') 85 | needsCtrlSeq=0 86 | else 87 | #a few ascii chars need escaping 88 | case "${char}" in 89 | #just add a backslash before { } and \ 90 | '{'|'}'|'\') 91 | echo -En $(sed 's/\\/\\\\/g;s/[{}]/\\&/g' <<< "${char}") 92 | ;; 93 | #output all else as-is 94 | *) 95 | printf "%s" "${char}" 96 | ;; 97 | esac 98 | fi 99 | done 100 | #end of line 101 | echo "\\" 102 | } 103 | 104 | #build variable str_codePoint (U+nnnn ...)
105 | while [ -n "${string:$i:1}" ]; do 106 | #code point (6 padded for max 10FFFF value) 107 | local codePoint=$(/usr/bin/printf "%06X" \'"${string:$i:1}") 108 | #space depending 109 | [ -z "${str_codePoint}" ] && space="" || space=" " 110 | #trim off leading zeroes except for single digit characters 111 | str_codePoint+="${space}U+$(sed -E 's/^0{0,4}//g' <<< "${codePoint}")" 112 | let i++ 113 | done 114 | 115 | #build encodedString encoded one of these ways 116 | case "${encodingScheme}" in 117 | #utf-8 octal \nnn (-Eo) or \0nnn (-E0) 118 | o|0) 119 | #append bit depth and encoding type 120 | [ "${encodingScheme}" = "0" ] && { zero="0"; fieldName_encoding+="-8 0-octal-sh"; } || fieldName_encoding+="-8 octal-sh" 121 | for byte in $(echo -En "${string}" | xxd -p -c1 -u); do 122 | #use shell hex conversion along with printf 123 | encodedString+="$(printf "\\\\${zero}%03o" $((0x${byte})))" 124 | done 125 | ;; 126 | #Web (URL) encoding %nn (-Ew) 127 | w) 128 | #append bit depth and encoding type 129 | fieldName_encoding+="-8 url" 130 | #print encodedString encoded \x escape style, leave xxd output unquoted to leverage each line as argument for printf 131 | encodedString="$(printf "%%%s" $(echo -En "${string}" | xxd -p -c1 -u))" 132 | ;; 133 | #zsh UTF-32 code point \Unnnnnnnn (-EU) 134 | U) 135 | fieldName_encoding+="-32 zsh" 136 | for ((i=0;;i++)); do 137 | [ -z "${string:i:1}" ] && break 138 | #printf in bash 3.x cannot print Code points but /usr/bin/printf can 139 | encodedString+="$(/usr/bin/printf "\\\\U%08X" \'"${string:i:1}")" 140 | done 141 | ;; 142 | #JS UTF-16 \unnnn (-Eu) 143 | u) 144 | fieldName_encoding+="-16 JS" 145 | for (( i=0;; i++ )); do 146 | [ -z "${string:i:1}" ] && break 147 | #print UTF16 encoded \u escape style, leave xxd output unquoted to leverage each line as argument for printf 148 | encodedString+="$(printf "\\\\u%s%s" $(echo -n "${string:$i:1}" | iconv -f utf-8 -t utf-16be | xxd -p -c1))" 149 | done 150 | ;; 151 | #utf-8 shell escaped 
\xnn (-Ex) 152 | x) 153 | fieldName_encoding+="-8 hex-sh" 154 | #print encodedString encoded \x escape style, leave xxd output unquoted to leverage each line as argument for printf 155 | encodedString="$(printf "\\\\x%s" $(echo -En "${string}" | xxd -p -c1 -u))" 156 | ;; 157 | #HTML Entity &#xnnnn; 158 | H) 159 | fieldName_encoding+="-32 HTML Entity" 160 | for ((i=0;;i++)); do 161 | [ -z "${string:i:1}" ] && break 162 | #printf in bash 3.x cannot print Code points but /usr/bin/printf can 163 | encodedString+="$(/usr/bin/printf "&#x%X;" \'"${string:i:1}")" 164 | done 165 | ;; 166 | ## default (or -Eh) capitalized space delimited hex bytes NN NN NN NN 167 | 'h'|*) 168 | fieldName_encoding+="-8 hex" 169 | #UTF8 Hex string: capture each byte (two hex digits) and follow it with a space 170 | encodedString="$(echo -En "${string}" | xxd -c0 -p -u | sed -e 's/\(..\)/\1 /g' -e 's/ $//') " 171 | ;; 172 | esac 173 | 174 | #clean up trailing spaces 175 | encodedString=$(echo -n "${encodedString}" | sed -E 's/ +$//') 176 | 177 | #get description 178 | local description=$(getDescriptionByCharMatch "${string}") 179 | #break apart if needed 180 | if ((DISCRETE_DESCRIPTIONS)); then 181 | local info1=$(echo -n "${description}" | sed -n "1p") 182 | local info2=$(echo -n "${description}" | sed -n "2p") 183 | fi 184 | 185 | #if no description and global "HIDE_UNDESCRIBED" is true (1), return 186 | if [ -z "${description}" ] && ((${HIDE_UNDESCRIBED})); then 187 | return 188 | fi 189 | 190 | case ${outputOption} in 191 | "JSON"*) 192 | #JSON formatting variables 193 | case "${outputOption}" in 194 | "JSONSEQ") 195 | local recSep=$'\x1E' 196 | local seq_nl=$'\n' 197 | local spc=' ' 198 | ;; 199 | "JSON") 200 | local spc=' ' 201 | local objspc=' ' 202 | local comma=, 203 | 204 | #JSON object(s) will be inside an array and comma separated 205 | if ((${NEEDSHEADER:=1})); then 206 | #start the array 207 | JSON_format=$'[\n' 208 | else 209 | JSON_format=$',\n' 210 | fi 211 | ;; 212 | esac 213 | 214 | if ((DISCRETE_DESCRIPTIONS)); then 215 | local
infoblock="${spc}\"${fieldName_info1}\": $(jse "${info1}" 2>/dev/null), 216 | ${spc}\"${fieldName_info2}\": $(jse "${info2}" 2>/dev/null)," 217 | else 218 | local infoblock="${spc}\"${fieldName_info}\": $(jse "${description}" 2>/dev/null)," 219 | fi 220 | 221 | #create JSON output 222 | local output="${JSON_format}${recSep}${objspc}{"$'\n'"${spc}\"${fieldName_string:=string}\": $(jse "${string}" 2>/dev/null), 223 | ${infoblock} 224 | ${spc}\"${fieldName_codePoints}\": \"${str_codePoint}\", 225 | ${spc}\"${fieldName_encoding}\": $(jse "${encodedString}" 2>/dev/null) 226 | ${objspc}}${seq_nl}" 227 | 228 | #output in one atomic chunk so if TERM is trapped the closing ] for JSON array will always be correct 229 | echo -n "${output}" 230 | ;; 231 | "YAML") 232 | if ((DISCRETE_DESCRIPTIONS)); then 233 | local infoblock="${fieldName_info1}: $(jse "${info1}" 2>/dev/null) 234 | ${fieldName_info2}: $(jse "${info2}" 2>/dev/null)" 235 | else 236 | local infoblock="${fieldName_info}: $(jse "${description}" 2>/dev/null)" 237 | fi 238 | 239 | local output="- ${fieldName_string}: $(jse "${string}" 2>/dev/null) 240 | ${infoblock} 241 | ${fieldName_codePoints}: ${str_codePoint} 242 | ${fieldName_encoding}: ${encodedString}" 243 | echo "${output}" 244 | ;; 245 | "PLAIN") 246 | output="-- 247 | ${string} 248 | ${description} 249 | ${str_codePoint} 250 | ${encodedString}" 251 | echo "${output}" 252 | ;; 253 | "CHARACTER") 254 | ((HAD_OUTPUT)) && local char_space=" " 255 | #just the character 256 | echo -n "${char_space}${string}" 257 | ;; 258 | "RTF") 259 | #use global var to keep track of this 260 | if ((${NEEDSHEADER:=1})); then 261 | echo '{\rtf1\ansi\ansicpg1252\cocoartf2709 262 | \cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;} 263 | {\colortbl;\red255\green255\blue255;} 264 | {\*\expandedcolortbl;;} 265 | \margl1440\margr1440\vieww15000\viewh16000\viewkind0 266 | 
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0 267 | ' 268 | fi 269 | output="$(encodeRTF -f ${FONTSIZE_LARGE:=$default_fontSize_large} "${string}") 270 | $(encodeRTF -f ${FONTSIZE_SMALL:=$default_fontSize_small} "${description}") 271 | ${str_codePoint}\\ 272 | ${encodedString}\\ 273 | --\\ 274 | " 275 | echo -n "${output}" 276 | ;; 277 | "CSV"|*) 278 | #if any of these characters are present, enclose character in quotes and escape any existing quotes 279 | if [ "${string}" = $'\n' ]; then 280 | string="\"${string}\"" 281 | elif grep -q -e \" -e "," -e " " -e $'\a' -e $'\e' -e $'\f' -e $'\r' -e $'\t' -e $'\v' -e $'\x7F' <<< "${string}" || [[ "${string}" == *$'\n'* ]]; then 282 | string="\"$(sed -e 's/"/""/g' <<< "${string}")\"" 283 | fi 284 | 285 | #if a description has a comma then enclose it in quotes 286 | if ((DISCRETE_DESCRIPTIONS)); then 287 | if grep -q -e "," <<< "${info1}"; then 288 | info1="\"${info1}\"" 289 | fi 290 | 291 | if grep -q -e "," <<< "${info2}"; then 292 | info2="\"${info2}\"" 293 | fi 294 | 295 | description="${info1},${info2}" 296 | else 297 | #no descriptions contain straight quotes, curly quotes only 298 | #description=$(sed -e 's/"/""/' <<< "${description}") 299 | 300 | #quote if comma found 301 | if grep -q -e "," <<< "${description}"; then 302 | description="\"${description}\"" 303 | fi 304 | fi 305 | 306 | #print header if needed and not explicitly blocked 307 | if !
((headerOff)) && ((${NEEDSHEADER:=1})); then 308 | if ((DISCRETE_DESCRIPTIONS)); then 309 | echo "${fieldName_string},${fieldName_info1},${fieldName_info2},${fieldName_codePoints},${fieldName_encoding}" 310 | else 311 | echo "${fieldName_string},${fieldName_info},${fieldName_codePoints},${fieldName_encoding}" 312 | fi 313 | fi 314 | 315 | if ((DISCRETE_DESCRIPTIONS)); then 316 | echo "${string},${info1},${info2},${str_codePoint},${encodedString}" 317 | else 318 | echo "${string},${description},${str_codePoint},${encodedString}" 319 | fi 320 | ;; 321 | esac 322 | 323 | #GLOBALS 324 | NEEDSHEADER=0 325 | HAD_OUTPUT=1 326 | } 327 | 328 | #get the description 329 | function getDescriptionByCharMatch(){ 330 | 331 | local char="${1}" 332 | [ -z "${char}" ] && return 333 | 334 | #database lookup info 335 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3" 336 | local dbTable="unihan_dict" 337 | local char_fieldName="uchr" 338 | local desc_fieldName="info" 339 | 340 | #both kinds of quotes in the string?! forget about it! 
341 | (grep -q -e \" <<< "${char}" && grep -q -e \' <<< "${char}") && return 342 | #set quote type depending on char 343 | grep -q -e \" <<< "${char}" && local quote="'" || local quote='"' 344 | 345 | #remove trailing/repeating | only 346 | local chardb_description=$(sqlite3 "${databasePath}" "SELECT ${desc_fieldName} FROM \`${dbTable}\` where ${char_fieldName} == ${quote}${char}${quote};" | sed -E -e 's/\|*$//') 347 | #lowercase for comparison 348 | local chardb_description_lower=$(awk '{print tolower($0)}' <<< "${chardb_description}") 349 | 350 | #if has fitzpatrick, remove and try again 351 | if [ -z "${chardb_description}" ] && hasFitzpatrickMod "${char}"; then 352 | chardb_description=$(sqlite3 "${databasePath}" "SELECT ${desc_fieldName} FROM \`${dbTable}\` where ${char_fieldName} == ${quote}$(filterFitzpatrickModifier "${char}")${quote};" | sed -E -e 's/\|*$//') 353 | chardb_description_lower=$(awk '{print tolower($0)}' <<< "${chardb_description}") 354 | fi 355 | 356 | #escape characters problematic for PlistBuddy: backslash, space, colon, tab (even though they shouldn't be in here, you never know what the future may bring) 357 | [ "${char}" != $'\n' ] && char=$(sed -e 's/\\/\\\\/g;s/ /\\ /g;s/:/\\:/g;s/\t/\\\t/g' <<< "${char}") 358 | 359 | #try a CoreEmoji lookup in AppleName.strings, an XML doc where each character is its own dictionary entry 360 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings" 361 | 362 | #get description from CoreEmoji AppleName.strings in whatever locale 363 | local emoji_description=$(/usr/libexec/PlistBuddy -c "print :${char}" "${PlistPath}" 2>/dev/null) 364 | #for comparison only 365 | local emoji_description_lower=$(awk '{print tolower($0)}' <<< "${emoji_description}") 366 | 367 | #try without fitzpatrick if present 368 | if [ -z "${emoji_description}" ] && hasFitzpatrickMod "${char}"; then 369 | local tempchar="$(filterFitzpatrickModifier "${char}")"
370 | #just in case it's fitzpatrick(s) only and nothing is left 371 | if [ -n "${tempchar}" ]; then 372 | emoji_description=$(/usr/libexec/PlistBuddy -c "print :${tempchar}" "${PlistPath}" 2>/dev/null | awk '{print tolower($0)}') 373 | emoji_description_lower=$(awk '{print tolower($0)}' <<< "${emoji_description}") 374 | fi 375 | fi 376 | 377 | #output 378 | if ((DISCRETE_DESCRIPTIONS)); then 379 | if ((preserveCase)); then 380 | echo -n "${chardb_description}"$'\n'"${emoji_description}" 381 | else 382 | echo -n "${chardb_description_lower}"$'\n'"${emoji_description}" 383 | fi 384 | #if the lowercased descriptions are the same, use AppleName.strings b/c its capitalization is sane 385 | elif [ "${emoji_description_lower}" = "${chardb_description_lower}" ]; then 386 | echo -n "${emoji_description}" 387 | else 388 | ([ -n "${chardb_description}" ] && [ -n "${emoji_description}" ]) && local separator=";" 389 | ((preserveCase)) && echo -n "${chardb_description}${separator}${emoji_description}" || echo -n "${chardb_description_lower}${separator}${emoji_description}" 390 | fi 391 | } 392 | 393 | #search in descriptions of CharacterDB.sqlite3 and AppleName.strings 394 | function getCharsByDescriptionSearch(){ 395 | local searchstring="${1}" 396 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3" 397 | local dbTable="unihan_dict" 398 | local char_fieldName="uchr" 399 | local desc_fieldName="info" 400 | 401 | #strip newlines - otherwise weird search results can happen 402 | searchstring=$(echo -n "${searchstring}" | tr -d $'\n') 403 | 404 | #nothing left?
go back 405 | [ -z "${searchstring}" ] && return 406 | 407 | #both quotes forget about it 408 | (grep -q -e \" <<< "${searchstring}" && grep -q -e \' <<< "${searchstring}") && return 409 | #set quote depending on string 410 | grep -q -e \" <<< "${searchstring}" && local quote="'" || local quote='"' 411 | 412 | #CharacterDB results, info is upper so convert search to upper 413 | local matchingChars=$(sqlite3 "${databasePath}" "SELECT uchr FROM \`${dbTable}\` WHERE instr(${desc_fieldName}, UPPER(${quote}${searchstring}${quote})) > 0 OR instr(${desc_fieldName}, LOWER(${quote}${searchstring}${quote})) > 0;") 414 | [ -n "${matchingChars}" ] && matchingChars+=$'\n' 415 | 416 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings" 417 | #add results from coreemoji 418 | matchingChars+=$(plutil -convert json -o - "${PlistPath}" | sed -e 's/","/\n/g' -e '1 s/^{"//' -e '$ s/"}$//' -e 's/":"/\x1E/g' | awk '{print tolower($0)}' | awk -F $'\x1E' '$2 ~ /'"${searchstring}"'/ {print $1}')$'\n' 419 | 420 | sed 's/^$//g' <<< "${matchingChars}" | sort | uniq 421 | } 422 | 423 | #find character usage in all glyphs 424 | function findMatchingChars(){ 425 | local char="${1}" 426 | 427 | [ -z "${char}" ] && return 1 428 | [ "${char}" = $'\n' ] && return 1 429 | 430 | #strip newlines - otherwise weird search results can happen 431 | char=$(echo -n "${char}" | tr -d $'\n') 432 | 433 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings" 434 | #coreemoji search, convert to json then massage down to CSV 435 | #\ escape any regex conflicting characters: \ | .
/ + * ^ $ [ ( 436 | local awk_char="$(echo -n "${char}" | sed -e 's/[\\\|\./\+\*^$[(?]/\\&/g')" 437 | local matchingChars=$(plutil -convert json -o - "${PlistPath}" | sed -e $'s/",/\\n/g' | sed -e '1 s/^{//' -e 's/^"//g' -e 's/":"/,/g' -e '$ s/}$//' | awk -F , '$1 ~ /'"${awk_char}"'/ {print $1}') 438 | [ -n "${matchingChars}" ] && matchingChars+=$'\n' 439 | 440 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3" 441 | local dbTable="unihan_dict" 442 | local char_fieldName="uchr" 443 | local desc_fieldName="info" 444 | 445 | #both quotes forget about it 446 | (grep -q -e \" <<< "${char}" && grep -q -e \' <<< "${char}") && return 447 | #set quote depending on string 448 | grep -q -e \" <<< "${char}" && local quote="'" || local quote='"' 449 | 450 | #characterdb 451 | #local query_char=$(echo -n "${char}" | sed -e 's/[\*]/\\&/g') 452 | matchingChars+=$(sqlite3 "${databasePath}" "SELECT uchr FROM \`${dbTable}\` WHERE instr(${char_fieldName}, ${quote}${char}${quote}) > 0;") 453 | 454 | #sort and uniq 455 | local matchingChars=$(echo -n "${matchingChars}" | sort | uniq) 456 | echo "${matchingChars}" 457 | } 458 | 459 | #Apple maintains a db with look-a-like characters for Latin chars (and some others) 460 | function getRelatedCharacters(){ 461 | local char="${1}" 462 | 463 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/RelatedCharDB.sqlite3" 464 | local dbTable="related_dict" 465 | local fieldName="relatedChars" 466 | 467 | #no db file? return! 468 | ! 
[ -f "${databasePath}" ] && return 469 | 470 | #both quotes forget about it 471 | (grep -q -e \" <<< "${char}" && grep -q -e \' <<< "${char}") && return 472 | #set quote depending on string 473 | grep -q -e \" <<< "${char}" && local quote="'" || local quote='"' 474 | 475 | local characters=$(sqlite3 "${databasePath}" "SELECT ${fieldName} FROM \`${dbTable}\` WHERE instr(${fieldName}, ${quote}${char}${quote}) > 0;") 476 | 477 | #return newline delimited list (all these chars are only 1 code point) 478 | echo "${characters}" | sed $'s/./&\\n/g' 479 | } 480 | 481 | #list locales for AppleName.strings 482 | function listLocales(){ 483 | ls -1 "/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources" | awk -F . '/lproj$/ {print $1}' 484 | } 485 | 486 | #macOS has groupings within the categories plist 487 | function listcategoryGroups(){ 488 | local expandGroup="${1}" 489 | local cfile="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/Categories.plist" 490 | 491 | #no category file? return! 492 | ! [ -f "${cfile}" ] && return 493 | 494 | if ! ((headerOff)); then 495 | if ((expandGroup)); then 496 | echo "Group Name,Category Name" 497 | else 498 | echo "Group Name" 499 | fi 500 | fi 501 | 502 | local alength=$(plutil -extract CVAvailableCategories raw "${cfile}" -o -) 503 | for ((i=0;i < alength;i++)); do 504 | groupname=$(plutil -extract CVAvailableCategories.$i.Group raw "${cfile}" | sed 's/CategoryGroup-//g') 505 | if !
((expandGroup)); then 506 | #strip off prefix 507 | echo "${groupname/#CategoryGroup-}" 508 | else 509 | local alength_2=$(plutil -extract CVAvailableCategories.$i.Categories raw "${cfile}" -o -) 510 | for ((j=0; j < alength_2; j++)); do 511 | catName=$(plutil -extract CVAvailableCategories.$i.Categories.$j raw "${cfile}" -o - | sed -e 's/^Category-//') 512 | echo "${groupname},${catName}" 513 | done 514 | fi 515 | done | sort | sed -e '/^$/d' -e '/Dingbats/d' 516 | } 517 | 518 | #macos has category plist files with comma delimited characters and 0x entries and ranges 519 | function listCategories(){ 520 | #bool 521 | local expandCategory="${1}" 522 | local sectionName 523 | local cfile 524 | local cname 525 | local list 526 | 527 | local searchPath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources" 528 | 529 | #simple list of all the Category files 530 | local categoryPaths_raw=$(find "${searchPath}" -name 'Category*.plist' | sort) 531 | 532 | #nothing found? return! 533 | [ -z "${categoryPaths_raw}" ] && return 534 | 535 | #find the ones with CVCategoryData 536 | local IFS=$'\n' 537 | for cfile in ${categoryPaths_raw}; do 538 | if plutil -type CVCategoryData "${cfile}" >/dev/null; then 539 | categoryPaths+="${cfile}"$'\n' 540 | fi 541 | done 542 | 543 | #header 544 | if ! 
((headerOff)); then
        if ((expandCategory)); then
            echo "Category Name,Section Name"
        else
            echo "Category Name"
        fi
    fi

    #we need to further investigate each member for array within CVCategoryData
    for cfile in ${categoryPaths}; do
        #the name without path or prefix "Category-" or extension ".plist"
        cname=$(sed -e "s|${searchPath}/Category-||g" -e 's/\.plist$//g' -e '/^$/d' <<< "${cfile}")

        #these have an array named DataArray instead of a string named Data
        if plutil -type CVCategoryData.DataArray -expect array "${cfile}" >/dev/null; then
            if ((expandCategory)); then
                #loop through all the sections and name them
                local alength=$(plutil -extract CVCategoryData.DataArray raw "${cfile}")
                for ((i=0;i < alength;i++)); do
                    sectionName=$(plutil -extract CVCategoryData.DataArray.$i.CVDataTitle raw "${cfile}" | sed 's/SectionTitle-//g')
                    list+="${cname},${sectionName}"$'\n'
                done
            else
                list+="${cname}*"$'\n'
            fi
        else
            list+="${cname}"$'\n'
        fi
    done
    #manually add Emoji category
    list+="Emoji"$'\n'
    sort <<< "${list}" | sed -e '/Favorites/d' -e '/Recents/d' -e '/^$/d'
}

#macOS has category plist files with comma delimited characters and 0x entries and ranges
function getCategoryCharacters(){

    #strip off trailing, leading whitespace, and * from end (list function)
    local categoryArgument=$(sed -e $'s/^[ \t]*//' -e $'s/[ \t]*$//' -e 's/\*$//' <<< "${1}")
    local categoryName=$(awk -F ',' '{print $1}' <<< "${categoryArgument}")
    local categorySection=$(awk -F ',' '{print $2}' <<< "${categoryArgument}")
    local data

    #our category file in the file system
    local cfile="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/Category-${categoryName}.plist"

    #emoji shows up but does not have a category file
    if [ "${categoryName}" = "Emoji" ]; then
        #add from coreemoji AppleName.strings
        local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings"
        #get all the object names with sed, delimit using 0x1E (record separator) and get the first field
        data=$(plutil -convert json -o - "${PlistPath}" | sed -e 's/","/\n/g' -e '1 s/^{"//' -e '$ s/"}$//' -e 's/":"/\x1E/g' | awk -F $'\x1E' '{print $1}')
        #remove empty lines, sort, uniq, comma delimit and output results
        sed '/^$/d' <<< "${data}" | sort | uniq | tr $'\n' ","
        return
    elif ! [ -f "${cfile}" ]; then
        echo "Category not found: ${categoryName}" >&2
        return
    fi

    #if a section name is specified
    if [ -n "${categorySection}" ]; then
        local alength=$(plutil -extract CVCategoryData.DataArray raw "${cfile}")

        for ((i=0;i < alength;i++)); do
            local sectionName=$(plutil -extract CVCategoryData.DataArray.$i.CVDataTitle raw "${cfile}" | sed 's/SectionTitle-//')
            if [ "$sectionName" = "${categorySection}" ]; then
                data=$(plutil -extract CVCategoryData.DataArray.$i.Data raw "${cfile}" | sed 's/SectionTitle-//g')
                echo "${data}"
                return
            fi
        done
    #else only category specified (which may or may not contain sections)
    else
        #just a single sectioned category file
        if data=$(plutil -extract CVCategoryData.Data raw -o - "${cfile}"); then
            echo "${data}"
        #if we have a dataArray we'll need to loop
        elif plutil -type CVCategoryData.DataArray -expect array "${cfile}" >/dev/null; then
            local alength=$(plutil -extract CVCategoryData.DataArray raw "${cfile}")
            for ((i=0;i < alength;i++)); do
                data=$(plutil -extract CVCategoryData.DataArray.$i.Data raw "${cfile}" | sed 's/SectionTitle-//g')
                echo "${data}"
            done
        fi
    fi
}

#groups are comprised of categories, clui can iterate through all of them or individual categories
function getCategoryGroupCharacters(){

    #strip off leading and trailing whitespace
    local groupArgument=$(sed -e $'s/^[ \t]*//' -e $'s/[ \t]*$//' <<< "${1}")
    local groupName=$(awk -F ',' '{print $1}' <<< "${groupArgument}")
    local groupCategory=$(awk -F ',' '{print $2}' <<< "${groupArgument}")

    local cfile="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/Categories.plist"

    ! [ -f "${cfile}" ] && return

    #if a category name is also specified
    if [ -n "${groupCategory}" ]; then
        echo $(getCategoryCharacters "${groupCategory}")
    #else only group name specified, retrieve all categories within
    else
        #get array length
        local alength=$(plutil -extract CVAvailableCategories raw "${cfile}")
        #iterate over
        for ((i=0;i < alength;i++)); do
            #find group name
            local gname=$(plutil -extract CVAvailableCategories.$i.Group raw "${cfile}" | sed 's/^CategoryGroup-//')
            if [ "${gname}" = "${groupName}" ]; then
                local alength_2=$(plutil -extract CVAvailableCategories.$i.Categories raw "${cfile}")
                #loop through this array and print each category's chars
                for ((j=0; j < alength_2; j++)); do
                    local groupCategory=$(plutil -extract CVAvailableCategories.$i.Categories.$j raw "${cfile}" | sed 's/^Category-//')
                    [ -z "${groupCategory}" ] && break
                    echo "$(getCategoryCharacters "${groupCategory}")"
                done
                break
            fi
        done
    fi

}

function filterFitzpatrickModifier(){
    echo -n "${1}" | sed $'s/[\xF0\x9F\x8F\xBB\xF0\x9F\x8F\xBC\xF0\x9F\x8F\xBD\xF0\x9F\x8F\xBE\xF0\x9F\x8F\xBF]//g'
}

function hasFitzpatrickMod(){
    if grep -q -E $'[\xF0\x9F\x8F\xBB\xF0\x9F\x8F\xBC\xF0\x9F\x8F\xBD\xF0\x9F\x8F\xBE\xF0\x9F\x8F\xBF]' <<< "${1}"; then
        return 0
    else
        return 1
    fi
}

#turn code point representations into literal characters
function processCodePoints(){
    local string="${1}"
    local newString=""
    local IFS=' '
    local element

    #space delimited
    for element in ${string}; do
        case "${element}" in
            0[Xx]*|[Uu]+*)
                #get the first and (if this is a range) second element, trim leading zeroes (but not 0x0 or 0x00 so we can handle them later)
                local first=$(awk -F '-' '{print $1}' <<< "${element}" | sed -E -e 's/^[Uu]\+//;s/^0[Xx]//' -e 's/^0{3,}//')
                local second=$(awk -F '-' '{print $2}' <<< "${element}" | sed -E -e 's/^[Uu]\+//;s/^0[Xx]//' -e 's/^0{3,}//')

                #validate 1-6 hex characters only
                if ! grep -q -E "^[0-9a-fA-F]{1,6}$" <<< "${first}" || ([ -n "${second}" ] && ! grep -q -E "^[0-9a-fA-F]{1,6}$" <<< "${second}"); then
                    echo "Invalid hexadecimal range: ${element}" >&2
                    continue
                fi

                #ensure values lower than limit of Unicode
                if ((0x${first} > 0x10ffff)) || ((0x${second} > 0x10ffff)); then
                    echo "Values cannot be greater than 10ffff: ${element}" >&2
                    continue
                fi

                #from 0x0-0x0?! you weirdo. bash vars can't handle nulls, while zsh does: it's not in chardb anyway so we just ignore!
                [ $((0x${first})) -eq 0 ] && [ $((0x${second})) -eq 0 ] && echo "" && return

                #if this is a range and an entry is 0 (NULL) we are going to fudge it to 1 so category "Unicode,Basic Latin" doesn't fail
                if [ -n "${first}" ] && [ -n "${second}" ] ; then
                    [ $((0x${first})) -eq 0 ] && first=1
                    [ $((0x${second})) -eq 0 ] && second=1
                fi

                newString+=$(unichr $((0x${first})))
                #note if this is a range include hyphen
                [ -n "${second}" ] && newString+="-$(unichr $((0x${second})))"
                ;;
            #literal character or character range
            *)
                newString+="${element}"
                ;;
        esac
    done
    printf "%s" "${newString}"
}

#produces Unicode given a code point value (in decimal)
function unichr(){
    #based on https://stackoverflow.com/a/16509364 from Orwellophile

    #used by fast_char
    local CHAR

    local charDecimal="${1}"
    local byteCounter=0
    #ceiling decreases from 63 to 31, 15, and finally 7
    local ceiling=63
    #accumulated lead-byte bits out of 256: starts at 128, grows to 192, 224, and 240 (increments of 64, 32, and 16)
    local accumBits=128
    #output string
    local str=''

    #convert decimal to character
    function fast_char() {
        local __octal
        printf -v __octal '%03o' $1
        printf -v CHAR \\$__octal
    }

    ! ((charDecimal)) && echo $'\0' && return

    #if this is in the surrogate range (0xD800-0xDFFF), bail
    (( charDecimal >= 0xD800 && charDecimal <= 0xDFFF )) && return

    #if it's under 0x80 quickly print it out and return
    (( charDecimal < 0x80 )) && { fast_char "${charDecimal}"; printf "%s" "${CHAR}"; return; }

    #work through each byte, as long as the ordinal is bigger than the ceiling (which decreases from 63 to 31, 15, and finally 7)
    while (( charDecimal > ceiling )); do
        #emit a continuation byte: 0x80 plus the low six bits
        fast_char $(( 0x80 | charDecimal & 0x3f ))
        #prepend the reply (we are working backward through the encoding)
        str="${CHAR}${str}"
        (( charDecimal >>= 6, byteCounter++, accumBits += ceiling+1, ceiling>>=1 ))
    done

    #final (leading) byte
    fast_char $(( accumBits | charDecimal ))

    #put the leading byte in front of the continuation bytes and output
    echo -nE "${CHAR}${str}"
}

function finishUp(){
    #no need to close up?
    ! ((HAD_OUTPUT)) && exit

    #close output
    case "${outputOption}" in
        #output breaks on objects, close out the array of them
        "JSON") echo -e '\n]' ;;
        #rtf can deal with sudden closure anywhere
        "RTF") echo '}';;
        #it just looks nice yeah?
        "CHARACTER") echo "" ;;
    esac
    exit
}

function printUsage(){
    helpText="clui (1.0) - Command Line Unicode Info (https://github.com/brunerd/clui)\nUsage: clui [options] ...\n\nInput can be:\n * Unicode characters, space or comma delimited (use -x to expand non-delimited strings)\n * Hexadecimal codepoint representations (U+hhhhh or 0xhhhhh), double-quoted multi-point sequences\n * Hyphenated ranges (ascending or descending): z-a, U+A1-U+BF or 0x20-0x7E\n * Category or Group names (see Input Options)\n * Descriptive words or phrases (see Input Options)\n\nOutput Options\n\n -D Discrete info fields for CharacterDB and localized AppleName.strings\n\n -H Hide characters lacking info descriptions\n\n -l \n Show localized info (Emoji only). Use -Ll to list available localizations.\n\n -p Preserve case of CharacterDB info \n \n Encoding style for UTF field\n -E \n h* UTF-8 hexadecimal, space delimited and capitalized (NN) (default)\n H Hex HTML Entity UTF-32 (&#xnnnn;)\n 0 Octal UTF-8 with leading 0 (\\\\0nnn)\n o Octal UTF-8 (\\\\nnn)\n x Shell style UTF-8 hex(\\\\xnn)\n u JS style UTF-16 (\\\\unnnn)\n U zsh style UTF-32 Unicode Code Point (\\\\Unnnnnnnn)\n w Web/URL UTF-8 encoding (%nn)\n\n Output format \n -O \n C* CSV (default)\n c Character-only, space delimited\n j JSON output (array of objects)\n J JSON Sequence output (objects delimited by 0x1E and 0x0A)\n p Plain output (no field descriptions)\n r RTF output (plain output with large sized characters)\n y YAML output\n\n Format dependent output options\n -f set font size for RTF output of char and info fields (default: 256,32)\n -h Hide headers for CSV output\n \nInput Options\n\n -C [,Subsection]\n Treat input as a Category name with a possible subsection (see -L for listing)\n\n -F Remove Fitzpatrick skin tone modifier and process, then process as-is\n\n -G [,Category]\n Treat input as a Group name with possible category name (see -L for listing)\n\n -l \n Search localized descriptions (Emoji only), use -Ll to list available localizations\n\n -S \n Treat input as search criteria\n d Search descriptions in CharacterDB and AppleName.strings (case insensitive)\n c Search for character in other Unicode sequences\n C Search for character plus \"related characters\"\n\n -x Expand and describe each individual code point in a sequence\n -X Expand plus display original sequence prior to expansion\n\n -V Verbatim, process input raw/as-is, no additional interpretation or delimitation\n \nOther Modes\n\n List categories and groups\n -L List categories or groups in CSV (use -h to suppress header)\n c Category list (* after a name denotes subsections)\n C Category list, with subsections expanded\n g Groups of categories, top level name\n G Group name with member categories expanded\n l Locales available to search and display results from (Emoji only) \n \n -u Display usage info (aka help) with less (press q to quit)\n \nExamples:\n\n Search for characters a to z plus \"related characters\" and output as CSV (default)\n clui -SC a-z\n\n Look up all available Categories\n clui -Lc\n\n Get every character in Emoji category and output in RTF to a file\n clui -Or -C Emoji > Emoji.rtf\n\n All characters in Emoji category with discrete info fields in Spanish to a CSV to a file\n clui -D -l es -C Emoji > Emoji-es.csv\n\n Search descriptions for substring \"family\" and expand multi-code point ZWJ sequences \n clui -X -Sd \"family\"\n\t"
    echo -e "${helpText}" | less
    exit 0
}

########
# MAIN #
########

#turn off globbing so * is not expanded in character search
set -f

#if control-c or other interruption, we can close up our JSON and RTF output
trap 'finishUp' TERM INT HUP

#turn on debug output if Shift key held down
shiftKeyDown="$(osascript -l JavaScript -e "ObjC.import('Cocoa'); ($.NSEvent.modifierFlags & $.NSEventModifierFlagShift) > 1" 2>/dev/null)"
#if /tmp/debug file found or Shift key is held down then xtrace debug output
if [ -f /tmp/debug ] || [ "${shiftKeyDown}" = "true" ] || [ "${shiftKeyDown}" = "True" ]; then xtraceFlag=1; fi
((${xtraceFlag})) && set -x

#macOS version check (plutil raw output is the limiting factor for OS)
#use -lt so that Monterey (12) itself is accepted, matching the message below
if [ "$(sw_vers -productVersion | cut -d. -f1)" -lt 12 ]; then
    echo "macOS Monterey (12) or higher required, exiting" >&2
    exit 1
fi

#process any options
while getopts ":E:f:L:l:O:U:S:CDFGhHpruVxX" option; do
    case "${option}" in
        p) preserveCase=1
        ;;
        #discrete descriptions
        D)
            #GLOBAL used in outputCharInfo
            DISCRETE_DESCRIPTIONS=1
            ;;
        #locale
        l)
            if [ -d "/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${OPTARG}.lproj" ]; then
                #GLOBAL used in: getCharsByDescriptionSearch() getDescriptionByCharMatch() outputCharInfo() findMatchingChars() (defaults to en)
                LOCALE="${OPTARG}"
            else
                echo "Invalid locale: ${OPTARG}" >&2
                exit 1
            fi
            ;;
        #font size -f [large,small]
        f)
            #get CSV data for font sizes
            #match the size scale in TextEdit where the GUI is 2x what is in the RTF doc
            f_param1=$(awk -F , '{print $1}' <<< "${OPTARG}")
            ((${f_param1})) && FONTSIZE_LARGE="$((2 * f_param1))"
            f_param2=$(awk -F , '{print $2}' <<< "${OPTARG}")
            ((${f_param2})) && FONTSIZE_SMALL="$((2 * f_param2))"
            ;;
        #usage
        u)
            printUsage
            exit 0
            ;;
        #list categories or groups
        L)
            case "${OPTARG}" in
                C|c)
                    [ "${OPTARG}" = "C" ] && expandCategory=1
                    listType="CATEGORY"
                    ;;
                #list Groups of categories
                G|g)
                    [ "${OPTARG}" = "G" ] && expandGroup=1
                    listType="GROUP"
                    ;;
                l)
                    listType="LOCALE"
                    ;;
            esac
            ;;
        G) [ -z "${searchMethod}" ] && searchMethod="GROUP" ;;
        C) [ -z "${searchMethod}" ] && searchMethod="CATEGORY" ;;
        S)
            case ${OPTARG} in
                d) [ -z "${searchMethod}" ] && searchMethod="DESCRIPTION" ;;
                c) [ -z "${searchMethod}" ] && searchMethod="CHARACTER" ;;
                C) [ -z "${searchMethod}" ] && searchMethod="CHARACTER" && relatedChars=1 ;;
            esac
            ;;
        F) filterFitzpatrick=1 ;;
        H) HIDE_UNDESCRIBED=1 ;; #GLOBAL used in outputCharInfo
        h) headerOff=1 ;;
        E) encodingScheme="${OPTARG}";;
        V) verbatimMode=1 ;;
        X|x)
            #summarize strings before expansion
            [ "${option}" = "X" ] && summarize=1
            #put space between points
            expand=1
            ;;
        O)
            case ${OPTARG} in
                J) outputOption="JSONSEQ" ;;
                j) outputOption="JSON" ;;
                y) outputOption="YAML" ;;
                p) outputOption="PLAIN" ;;
                c) outputOption="CHARACTER" ;;
                r|R) outputOption="RTF" ;;
                C|*) outputOption="CSV" ;;
            esac
            ;;
        #do nothing for unknown options
        '?') : ;;
    esac
done

#shift away the parsed options if there were any, OPTIND is only set after getopts
[ ${OPTIND} -ge 2 ] && shift $((OPTIND-1))

#list categories or groups and exit
case "${listType}" in
    "CATEGORY")
        listCategories "${expandCategory}"
        exit 0
        ;;
    "GROUP")
        listcategoryGroups "${expandGroup}"
        exit 0
        ;;
    "LOCALE")
        listLocales
        exit 0
        ;;
esac

#if we are piping in input, use that
if ! [ -t '0' ]; then
    #set positional parameters from cat input (IFS delimited)
    set -- $(cat)
#otherwise see if any input is provided, if not offer help
elif [ "${1}" != $'\n' ] && [ -z "$(echo -n "$@")" ]; then
    echo -e "clui (1.0) - Command Line Unicode Info (https://github.com/brunerd/clui)\nUsage: clui [options] ... (option -u for usage help)"
    exit 0
fi

#process each parameter, quote "$@" to split by positional parameter (unquoted $@ splits by IFS)
for parameter in "$@"; do
    #replace parameter with results for these cases
    case "${searchMethod}" in
        "DESCRIPTION")
            #get characters, turning line feeds into commas to process each individually later
            parameter=$(getCharsByDescriptionSearch "${parameter}")
            [ -n "$parameter" ] && parameter=$(echo "${parameter}" | tr $'\n' ',') || continue
            ;;
        "CATEGORY")
            parameter=$(getCategoryCharacters "${parameter}")
            ;;
        "GROUP")
            parameter=$(getCategoryGroupCharacters "${parameter}")
            ;;
    esac

    #verbatim (-V) do not processCodePoints()
    if ((verbatimMode)); then
        #-X show as-is first, if more than 1 char
        if ((summarize)) && [ -n "${parameter:1:1}" ]; then
            outputCharInfo "${parameter}" "${outputOption}" "${encodingScheme}"
        fi

        #-x expand each and every code point in string
        if ((expand)); then
            #go through each char
            for ((i=0;;i++)); do
                char="${parameter:$i:1}"
                [ -z "${char}" ] && break
                outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
            done
        #or treat entire parameter as a single entity
        else
            outputCharInfo "${parameter}" "${outputOption}" "${encodingScheme}"
        fi
        #process next parameter
        continue
    fi

    #treat comma and newline delimited items discretely
    IFS=$',\n'
    for chunk in ${parameter}; do
        #transform any U+hhhh/0xhhhh representation to the literal character/sequence, space is not significant
        chunk=$(processCodePoints "${chunk}")

        #only two ways to handle it
        case "${chunk}" in
            #if it has a hyphen treat as range (if misinterpreted use -V)
            *-*)
                #if it is simply a "-" alone
                if [ "${chunk}" = "-" ]; then
                    start="-"
                    #set end too, otherwise a value left over from an earlier range would be reused
                    end="-"
                else
                    #Get our start and end (if it exists)
                    start="$(awk -F - '{print $1}' <<< "${chunk}")"
                    [ -z "${start}" ] && continue
                    end="$(awk -F - '{print $2}' <<< "${chunk}")"
                fi
                [ -z "${end}" ] && end="${start}"

                #ensure they are only a single code point, otherwise ignore and move on
                [ "${start:1:1}" ] && continue
                [ "${end:1:1}" ] && continue

                #get decimal of character
                start=$(/usr/bin/printf "%d" "'${start}")
                end=$(/usr/bin/printf "%d" "'${end}")

                #to increment or decrement, that is the question
                if ((start <= end)); then
                    comparison="<="
                    inc_dec="i++"
                else
                    inc_dec="i--"
                    comparison=">="
                fi

                ## loop through code point(s)
                for ((i=${start}; i ${comparison} end; ${inc_dec})); do
                    #if searching by character or related chars, get them and go through them
                    if [ "${searchMethod}" = "CHARACTER" ]; then
                        #get characters (newline delimited)
                        newChunk=$(findMatchingChars "$(unichr ${i})")
                        #get related characters also, if applicable (newline delimited also)
                        ((relatedChars)) && newChunk+=$'\n'$(getRelatedCharacters "$(unichr ${i})")
                        #sort and uniq the results and output
                        for char in $(sort -h <<< "${newChunk}" | uniq); do
                            outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
                        done
                    else
                        [ "${i}" = 10 ] && char=$'\n' || char="$(unichr ${i})"
                        outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
                    fi
                done
                ;;
            #anything else, not a range
            *)
                #search for the character in other sequences
                if [ "${searchMethod}" = "CHARACTER" ]; then
                    #get characters (newline delimited)
                    newChunk=$(findMatchingChars "${chunk}")
                    #get related characters also, if applicable (newline delimited also)
                    ((relatedChars)) && newChunk+=$'\n'$(getRelatedCharacters "${chunk}")
                    #sort and uniq the results and output
                    for char in $(sort -h <<< "${newChunk}" | uniq); do
                        outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
                    done
                    continue
                fi

                #if over one code point
                if [ -n "${chunk:1:1}" ]; then
                    #output version without fitzpatrick modifier if option on and modifier found
                    if ((filterFitzpatrick)) && hasFitzpatrickMod "${chunk}"; then
                        outputCharInfo "$(filterFitzpatrickModifier "${chunk}")" "${outputOption}" "${encodingScheme}"
                    fi

                    #if (-X) "summarize" the chunk first, display and get info as-is
                    if ((summarize)); then
                        outputCharInfo "${chunk}" "${outputOption}" "${encodingScheme}"
                    fi
                fi

                # if -x or -X go through each character
                if ((expand)); then
                    for ((i=0;;i++)); do
                        char="${chunk:$i:1}"
                        [ -z "${char}" ] && break
                        outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
                    done
                #if not expand, pass entire chunk to our string
                else
                    outputCharInfo "${chunk}" "${outputOption}" "${encodingScheme}"
                fi
                ;;
        esac
    done
done

#close up any output formats that need it
finishUp

--------------------------------------------------------------------------------