├── LICENSE
├── README.md
└── sources
    └── clui
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Joel Bruner
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # clui
2 | Get the description, code point(s) and UTF encoding of Unicode characters and sequences, in a variety of formats and encodings with clui (Command Line Unicode Information) for macOS
3 |
4 | Check out [blog entries on clui at brunerd.com](https://www.brunerd.com/blog/category/projects/clui/)
5 |
6 | ## clui demo video
7 |
8 | See clui in action:
9 | [Watch the clui demo video on YouTube](https://www.youtube.com/watch?v=KhNblOSffz4)
10 |
11 | ## clui usage
12 | `./clui -u` for usage/help output in `less`
13 | ```
14 | clui (1.0) - Command Line Unicode Info (https://github.com/brunerd/clui)
15 | Usage: clui [options] ...
16 |
17 | Input can be:
18 | * Unicode characters, space or comma delimited (use -x to expand non-delimited strings)
19 | * Hexadecimal codepoint representations (U+hhhhh or 0xhhhhh), double-quoted multi-point sequences
20 | * Hyphenated ranges (ascending or descending): z-a, U+A1-U+BF or 0x20-0x7E
21 | * Category or Group names (see Input Options)
22 | * Descriptive words or phrases (see Input Options)
23 |
24 | Output Options
25 |
26 | -D Discrete info fields for CharacterDB and localized AppleName.strings
27 |
28 | -H Hide characters lacking info descriptions
29 |
30 | -l
31 | Show localized info (Emoji only). Use -Ll to list available localizations.
32 |
33 | -p Preserve case of CharacterDB info
34 |
35 | Encoding style for UTF field
36 | -E
37 | h* UTF-8 hexadecimal, space delimited and capitalized (NN) (default)
38 | H Hex HTML Entity UTF-32 (&#xnnnn;)
39 | 0 Octal UTF-8 with leading 0 (\0nnn)
40 | o Octal UTF-8 (\nnn)
41 | x Shell style UTF-8 hex(\xnn)
42 | u JS style UTF-16 (\unnnn)
43 | U zsh style UTF-32 Unicode Code Point (\Unnnnnnnn)
44 | w Web/URL UTF-8 encoding (%nn)
45 |
46 | Output format
47 | -O
48 | C* CSV (default)
49 | c Character-only, space delimited
50 | j JSON output (array of objects)
51 | J JSON Sequence output (objects delimited by 0x1E and 0x0A)
52 | p Plain output (no field descriptions)
53 | r RTF output (plain output with large sized characters)
54 | y YAML output
55 |
56 | Format dependent output options
57 | -f set font size for RTF output of char and info fields (default: 256,32)
58 | -h Hide headers for CSV output
59 |
60 | Input Options
61 |
62 | -C [,Subsection]
63 | Treat input as a Category name with a possible subsection (see -L for listing)
64 |
65 | -F Remove Fitzpatrick skin tone modifier and process, then process as-is
66 |
67 | -G [,Category]
68 | Treat input as a Group name with a possible category name (see -L for listing)
69 |
70 | -l
71 | Search localized descriptions (Emoji only), use -Ll to list available localizations
72 |
73 | -S
74 | Treat input as search criteria
75 | d Search descriptions in CharacterDB and AppleName.strings (case insensitive)
76 | c Search for character in other Unicode sequences
77 | C Search for character plus "related characters"
78 |
79 | -x Expand and describe each individual code point in a sequence
80 | -X Expand plus display original sequence prior to expansion
81 |
82 | -V Verbatim, process input raw/as-is, no additional interpretation or delimitation
83 |
84 | Other Modes
85 |
86 | List categories and groups
87 | -L List categories or groups in CSV (use -h to suppress header)
88 | c Category list (* after a name denotes subsections)
89 | C Category list, with subsections expanded
90 | g Groups of categories, top level name
91 | G Group name with member categories expanded
92 | l Locales available to search and display results from (Emoji only)
93 |
94 | -u Display usage info (aka help) with less (press q to quit)
95 |
96 | Examples:
97 |
98 | Search for characters a to z plus "related characters" and output as CSV (default)
99 | clui -SC a-z
100 |
101 | Look up all available Categories
102 | clui -Lc
103 |
104 | Get every character in Emoji category and output in RTF to a file
105 | clui -Or -C Emoji > Emoji.rtf
106 |
107 | All characters in Emoji category with discrete info fields in Spanish, output as CSV to a file
108 | clui -D -l es -C Emoji > Emoji-es.csv
109 |
110 | Search descriptions for substring "family" and expand multi-code point ZWJ sequences
111 | clui -X -Sd "family"
112 | ```
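
The hyphenated range input described above (ascending or descending) can be pictured with a small sketch. This is not clui's own range handling; `expand_range` is a hypothetical helper written only for illustration, it covers ASCII code points in `0xhh` notation only, and it assumes a POSIX-style shell such as bash.

```shell
# Hypothetical helper, not taken from clui: expand an ASCII code point range
# such as 0x41-0x45 (ascending) or 0x7A-0x78 (descending) into characters.
# The function body runs in a subshell so its variables do not leak.
expand_range() (
    start=$(( ${1%%-*} ))   # text before the hyphen, e.g. 0x41 -> 65
    end=$(( ${1##*-} ))     # text after the hyphen, e.g. 0x45 -> 69
    step=1
    [ "$start" -gt "$end" ] && step=-1
    cp=$start
    while :; do
        # build a \NNN octal escape for this code point and let printf emit it
        printf "\\$(printf '%03o' "$cp")"
        [ "$cp" -eq "$end" ] && break
        cp=$(( cp + step ))
    done
    printf '\n'
)

expand_range 0x41-0x45   # prints ABCDE
expand_range 0x7A-0x78   # prints zyx
```

Walking the range with a signed step keeps ascending and descending input on the same code path.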
113 |
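
Several of the `-E` styles listed in the usage text are different spellings of the same UTF-8 bytes. The sketch below shows that relationship for three of them (`h`, `x` and `w`). It is an illustration under assumptions, not clui's code: `utf8_styles` is a hypothetical helper, and it uses `od` rather than the `xxd` calls clui relies on so that it runs on systems without `xxd`.

```shell
# Hypothetical helper, not part of clui: show one string's UTF-8 bytes in
# three of the styles listed under -E (h, x and w).
utf8_styles() (
    # od prints each byte as lowercase hex; normalize spacing and uppercase it
    hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '  ' | tr 'a-f' 'A-F' | sed 's/^ //; s/ $//')
    printf 'hex:   %s\n' "$hex"
    # swap each separating space for the next prefix: \x (shell) or % (URL)
    printf 'shell: \\x%s\n' "$(printf '%s' "$hex" | sed 's/ /\\x/g')"
    printf 'url:   %%%s\n' "$(printf '%s' "$hex" | sed 's/ /%/g')"
)

# U+00E9 (e with acute accent) is the two UTF-8 bytes C3 A9
utf8_styles "$(printf '\303\251')"
# hex:   C3 A9
# shell: \xC3\xA9
# url:   %C3%A9
```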
--------------------------------------------------------------------------------
/sources/clui:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | : <<-LICENSE_BLOCK
3 | clui - Command Line Unicode Info, Copyright (c) 2023 Joel Bruner (https://github.com/brunerd/clui)
4 | Licensed under the MIT License
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8 | LICENSE_BLOCK
9 |
10 | #############
11 | # FUNCTIONS #
12 | #############
13 |
14 | function outputCharInfo(){
15 |
16 | #GLOBAL booleans: FONTSIZE_LARGE, FONTSIZE_SMALL, HIDE_UNDESCRIBED, LOCALE, and HAD_OUTPUT
17 | #positional parameters
18 | local string="${1}"; [ -z "${string}" ] && return
19 | local outputOption="${2:-CSV}"
20 | local encodingScheme="${3:-h}"
21 |
22 | #font size defaults for RTF output (-Or)
23 | #the character/sequence
24 | local default_fontSize_large="256"
25 | #the info, code points, and UTF
26 | local default_fontSize_small="32"
27 |
28 | #member names for JSON and YAML objects
29 | #the star of the show
30 | local fieldName_string="char"
31 | #info/description from both CharacterDB.sqlite3 and AppleName.strings, can be broken out individually or ; delimited (if more than one)
32 | local fieldName_info="info"
33 |
34 | #later if DISCRETE_DESCRIPTIONS is true use these field names
35 | [ "${LOCALE:=en}" != "en" ] && local locale_suffix="-${LOCALE}"
36 | #make discrete names used for JSON and YAML
37 | local fieldName_info1="${fieldName_info}-chardb"
38 | local fieldName_info2="${fieldName_info}-emoji${locale_suffix}"
39 |
40 | local fieldName_codePoints="cp"
41 | #this will have the bit depth and encoding type appended
42 | local fieldName_encoding="utf"
43 |
44 | #internal variables
45 | local IFS=$'\n'; local i; local encodedString; local str_codePoint; local byte
46 |
47 | #jse JSON String encoder
48 | function jse() (
49 | #jse - JSON String Encoder (https://github.com/brunerd/jse) Copyright (c) 2022 Joel Bruner, Licensed under the MIT License
50 | set +x; read -r -d '' JSCode <<-'EOT'
51 | var argument=decodeURIComponent(escape(arguments[0]));var fileFlag=decodeURIComponent(escape(arguments[1]));if(fileFlag){try{var text=readFile(argument)} catch(error){throw new Error(error);quit()};if(argument==="/dev/stdin"){text=text.slice(0,-1)}}else{var text=argument};print(JSON.stringify(text,null,0))
52 | EOT
53 | jsc=$(find "/System/Library/Frameworks/JavaScriptCore.framework/Versions/Current/" -name 'jsc');[ -z "${jsc}" ] && jsc=$(which jsc);if ([ -n "${ARGC}" ] && [ "${ARGC}" = "0" ]) || ([ -z "${ARGC}" ] && [ "${#BASH_ARGC[@]}" = "0" ]); then if [ -f '/dev/stdin' ]; then fileFlag=1; argument="/dev/stdin"; elif [ -t '0' ]; then exit 0; fi; elif [ "${1}" = "-f" ] && [ -n "${2}" ]; then if [ ! -f "${2}" ]; then echo "File not found: ${2}";exit 1; else fileFlag=1; argument="${2}"; fi; else argument="${1}"; fi; if [ -z "${argument}" ] && [ ! -t '0' ]; then "${jsc}" -e "${JSCode}" -- "/dev/stdin" "1" <<< "$(cat)"; else "${jsc}" -e "${JSCode}" -- "${argument}" "${fileFlag}"; fi
54 | )
55 |
56 | function encodeRTF(){
57 | local needsCtrlSeq=1
58 | local OPTIND
59 | local OPTARG
60 |
61 | while getopts ":f:" option; do
62 | case "${option}" in
63 | f)
64 | local fontsize="${OPTARG}"
65 | ((${fontsize})) && echo "\\fs${fontsize} "
66 | ;;
67 | esac
68 | done
69 |
70 | #if required shift parameters, OPTIND is only set after getopts
71 | [ ${OPTIND} -ge 2 ] && shift $((OPTIND-1))
72 |
73 | local string="${1}"
74 |
75 | #go through each character
76 | for ((i=0;;i++)); do
77 | local char="${string:i:1}"
78 | [ -z "${char}" ] && break
79 | local charDecimal=$(/usr/bin/printf "%d" "'${char}")
80 | #anything in these ranges encode
81 | if ((charDecimal <= 0x1F)) || ((charDecimal > 0x7E)); then
82 | ((needsCtrlSeq)) && echo -En "\\uc0"
83 | #make UTF-16 in hex, add 0x prefix with sed, then convert to decimal and prefix with \u for RTF
84 | printf '\\u%04d ' $(echo -n "${char}" | iconv -t UTF-16BE | xxd -p -c2 | sed 's/..../0x&/g')
85 | needsCtrlSeq=0
86 | else
87 | #a few ascii chars need escaping
88 | case "${char}" in
89 | #just add a backslash before { } and \
90 | '{'|'}'|'\')
91 | echo -En $(sed 's/\\/\\\\/g;s/[{}]/\\&/g' <<< "${char}")
92 | ;;
93 | #output all else as-is
94 | *)
95 | printf "%s" "${char}"
96 | ;;
97 | esac
98 | fi
99 | done
100 | #end of line
101 | echo "\\"
102 | }
103 |
104 | #build variable str_codePoint (U+nnnn ...)
105 | while [ -n "${string:$i:1}" ]; do
106 | #code point (6 padded for max 10FFFF value)
107 | local codePoint=$(/usr/bin/printf "%06X" \'"${string:$i:1}")
108 | #space depending
109 | [ -z "${str_codePoint}" ] && space="" || space=" "
110 | #trim up to four leading zeroes, leaving at least two hex digits (e.g. U+41)
111 | str_codePoint+="${space}U+$(sed -E 's/^0{0,4}//g' <<< "${codePoint}")"
112 | let i++
113 | done
114 |
115 | #build encodedString encoded one of these ways
116 | case "${encodingScheme}" in
117 | #utf-8 octal \nnn (-Eo) or \0nnn (-E0)
118 | o|0)
119 | #append bit depth and encoding type
120 | [ "${encodingScheme}" = "0" ] && { zero="0"; fieldName_encoding+="-8 0-octal-sh"; } || fieldName_encoding+="-8 octal-sh"
121 | for byte in $(echo -En "${string}" | xxd -p -c1 -u); do
122 | #use shell hex conversion along with printf
123 | encodedString+="$(printf "\\\\${zero}%03o" $((0x${byte})))"
124 | done
125 | ;;
126 | #Web (URL) encoding %nn (-Ew)
127 | w)
128 | #append bit depth and encoding type
129 | fieldName_encoding+="-8 url"
130 | #print encodedString URL-encoded (%nn) style, leave xxd output unquoted to leverage each line as argument for printf
131 | encodedString="$(printf "%%%s" $(echo -En "${string}" | xxd -p -c1 -u))"
132 | ;;
133 | #zsh UTF-32 code point \Unnnnnnnn (-EU)
134 | U)
135 | fieldName_encoding+="-32 zsh"
136 | for ((i=0;;i++)); do
137 | [ -z "${string:i:1}" ] && break
138 | #printf in bash 3.x cannot print Code points but /usr/bin/printf can
139 | encodedString+="$(/usr/bin/printf "\\\\U%08X" \'"${string:i:1}")"
140 | done
141 | ;;
142 | #JS UTF-16 \unnnn (-Eu)
143 | u)
144 | fieldName_encoding+="-16 JS"
145 | for (( i=0;; i++ )); do
146 | [ -z "${string:i:1}" ] && break
147 | #print UTF16 encoded \u escape style, leave xxd output unquoted to leverage each line as argument for printf
148 | encodedString+="$(printf "\\\\u%s%s" $(echo -n "${string:$i:1}" | iconv -f utf-8 -t utf-16be | xxd -p -c1))"
149 | done
150 | ;;
151 | #utf-8 shell escaped \xnn (-Ex)
152 | x)
153 | fieldName_encoding+="-8 hex-sh"
154 | #print encodedString encoded \x escape style, leave xxd output unquoted to leverage each line as argument for printf
155 | encodedString="$(printf "\\\\x%s" $(echo -En "${string}" | xxd -p -c1 -u))"
156 | ;;
157 | #HTML Entity &#xnnnn;
158 | H)
159 | fieldName_encoding+="-32 HTML Entity"
160 | for ((i=0;;i++)); do
161 | [ -z "${string:i:1}" ] && break
162 | #printf in bash 3.x cannot print Code points but /usr/bin/printf can
163 | encodedString+="$(/usr/bin/printf "&#x%X;" \'"${string:i:1}")"
164 | done
165 | ;;
166 | ## default (or -Eh) capitalized space delimited hex bytes NN NN NN NN
167 | 'h'|*)
168 | fieldName_encoding+="-8 hex"
169 | #UTF8 Hex string
170 | encodedString="$(echo -En "${string}" | xxd -c0 -p -u | sed -e 's/\(..\)/\1 /g' -e 's/ $//') "
171 | ;;
172 | esac
173 |
174 | #clean up trailing spaces
175 | encodedString=$(echo -n "${encodedString}" | sed -E 's/ +$//')
176 |
177 | #get description
178 | local description=$(getDescriptionByCharMatch "${string}")
179 | #break apart if needed
180 | if ((DISCRETE_DESCRIPTIONS)); then
181 | local info1=$(echo -n "${description}" | sed -n "1p")
182 | local info2=$(echo -n "${description}" | sed -n "2p")
183 | fi
184 |
185 | #if no description and global "HIDE_UNDESCRIBED" is true (1), return
186 | if [ -z "${description}" ] && ((${HIDE_UNDESCRIBED})); then
187 | return
188 | fi
189 |
190 | case ${outputOption} in
191 | "JSON"*)
192 | #JSON formatting variables
193 | case "${outputOption}" in
194 | "JSONSEQ")
195 | local recSep=$'\x1E'
196 | local seq_nl=$'\n'
197 | local spc=' '
198 | ;;
199 | "JSON")
200 | local spc=' '
201 | local objspc=' '
202 | local comma=,
203 |
204 | #JSON object(s) will be inside an array and comma separated
205 | if ((${NEEDSHEADER:=1})); then
206 | #start the array
207 | JSON_format=$'[\n'
208 | else
209 | JSON_format=$',\n'
210 | fi
211 | ;;
212 | esac
213 |
214 | if ((DISCRETE_DESCRIPTIONS)); then
215 | local infoblock="${spc}\"${fieldName_info1}\": $(jse "${info1}" 2>/dev/null),
216 | ${spc}\"${fieldName_info2}\": $(jse "${info2}" 2>/dev/null),"
217 | else
218 | local infoblock="${spc}\"${fieldName_info}\": $(jse "${description}" 2>/dev/null),"
219 | fi
220 |
221 | #create JSON output
222 | local output="${JSON_format}${recSep}${objspc}{"$'\n'"${spc}\"${fieldName_string:=string}\": $(jse "${string}" 2>/dev/null),
223 | ${infoblock}
224 | ${spc}\"${fieldName_codePoints}\": \"${str_codePoint}\",
225 | ${spc}\"${fieldName_encoding}\": $(jse "${encodedString}" 2>/dev/null)
226 | ${objspc}}${seq_nl}"
227 |
228 | #output in one atomic chunk so if TERM is trapped the closing ] for JSON array will always be correct
229 | echo -n "${output}"
230 | ;;
231 | "YAML")
232 | if ((DISCRETE_DESCRIPTIONS)); then
233 | local infoblock="${fieldName_info1}: $(jse "${info1}" 2>/dev/null)
234 | ${fieldName_info2}: $(jse "${info2}" 2>/dev/null)"
235 | else
236 | local infoblock="${fieldName_info}: $(jse "${description}" 2>/dev/null)"
237 | fi
238 |
239 | local output="- ${fieldName_string}: $(jse "${string}" 2>/dev/null)
240 | ${infoblock}
241 | ${fieldName_codePoints}: ${str_codePoint}
242 | ${fieldName_encoding}: ${encodedString}"
243 | echo "${output}"
244 | ;;
245 | "PLAIN")
246 | output="--
247 | ${string}
248 | ${description}
249 | ${str_codePoint}
250 | ${encodedString}"
251 | echo "${output}"
252 | ;;
253 | "CHARACTER")
254 | ((HAD_OUTPUT)) && local char_space=" "
255 | #just the character
256 | echo -n "${char_space}${string}"
257 | ;;
258 | "RTF")
259 | #use global var to keep track of this
260 | if ((${NEEDSHEADER:=1})); then
261 | echo '{\rtf1\ansi\ansicpg1252\cocoartf2709
262 | \cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
263 | {\colortbl;\red255\green255\blue255;}
264 | {\*\expandedcolortbl;;}
265 | \margl1440\margr1440\vieww15000\viewh16000\viewkind0
266 | \pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
267 | '
268 | fi
269 | output="$(encodeRTF -f ${FONTSIZE_LARGE:=$default_fontSize_large} "${string}")
270 | $(encodeRTF -f ${FONTSIZE_SMALL:=$default_fontSize_small} "${description}")
271 | ${str_codePoint}\\
272 | ${encodedString}\\
273 | --\\
274 | "
275 | echo -n "${output}"
276 | ;;
277 | "CSV"|*)
278 | #if any of these characters are present, enclose the string in quotes and escape any existing quotes
279 | if [ "${string}" = $'\n' ]; then
280 | string="\"${string}\""
281 | elif grep -q -e \" -e "," -e " " -e $'\a' -e $'\e' -e $'\f' -e $'\r' -e $'\t' -e $'\v' -e $'\x7F' <<< "${string}" || [[ "${string}" == *$'\n'* ]]; then
282 | string="\"$(sed -e 's/"/""/g' <<< "${string}")\""
283 | fi
284 |
285 | #if description has comma or newline then enclose in quotes
286 | if ((DISCRETE_DESCRIPTIONS)); then
287 | if grep -q -e "," <<< "${info1}"; then
288 | info1="\"${info1}\""
289 | fi
290 |
291 | if grep -q -e "," <<< "${info2}"; then
292 | info2="\"${info2}\""
293 | fi
294 |
295 | description="${info1},${info2}"
296 | else
297 | #no descriptions contain straight quotes, curly quotes only
298 | #description=$(sed -e 's/"/""/' <<< "${description}")
299 |
300 | #quote if comma found
301 | if grep -q -e "," <<< "${description}"; then
302 | description="\"${description}\""
303 | fi
304 | fi
305 |
306 | #print header if needed and not explicitly blocked
307 | if ! ((headerOff)) && ((${NEEDSHEADER:=1})); then
308 | if ((DISCRETE_DESCRIPTIONS)); then
309 | echo "${fieldName_string},${fieldName_info1},${fieldName_info2},${fieldName_codePoints},${fieldName_encoding}"
310 | else
311 | echo "${fieldName_string},${fieldName_info},${fieldName_codePoints},${fieldName_encoding}"
312 | fi
313 | fi
314 |
315 | if ((DISCRETE_DESCRIPTIONS)); then
316 | echo "${string},${info1},${info2},${str_codePoint},${encodedString}"
317 | else
318 | echo "${string},${description},${str_codePoint},${encodedString}"
319 | fi
320 | ;;
321 | esac
322 |
323 | #GLOBALS
324 | NEEDSHEADER=0
325 | HAD_OUTPUT=1
326 | }
327 |
328 | #get the description
329 | function getDescriptionByCharMatch(){
330 |
331 | local char="${1}"
332 | [ -z "${char}" ] && return
333 |
334 | #database lookup info
335 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3"
336 | local dbTable="unihan_dict"
337 | local char_fieldName="uchr"
338 | local desc_fieldName="info"
339 |
340 | #both kinds of quotes in the string?! forget about it!
341 | (grep -q -e \" <<< "${char}" && grep -q -e \' <<< "${char}") && return
342 | #set quote type depending on char
343 | grep -q -e \" <<< "${char}" && local quote="'" || local quote='"'
344 |
345 | #remove trailing/repeating | only
346 | local chardb_description=$(sqlite3 "${databasePath}" "SELECT ${desc_fieldName} FROM \`${dbTable}\` where ${char_fieldName} == ${quote}${char}${quote};" | sed -E -e 's/\|*$//')
347 | #lowercase for comparison
348 | local chardb_description_lower=$(awk '{print tolower($0)}' <<< "${chardb_description}")
349 |
350 | #if has fitzpatrick, remove and try again
351 | if [ -z "${chardb_description}" ] && hasFitzpatrickMod "${char}"; then
352 | chardb_description=$(sqlite3 "${databasePath}" "SELECT ${desc_fieldName} FROM \`${dbTable}\` where ${char_fieldName} == ${quote}$(filterFitzpatrickModifier "${char}")${quote};" | sed -E -e 's/\|*$//')
353 | chardb_description_lower=$(awk '{print tolower($0)}' <<< "${chardb_description}")
354 | fi
355 |
356 | #escape characters problematic for PlistBuddy: colon, space, solidus, tab (even though they shouldn't be in here, you never know what the future may bring)
357 | [ "${char}" != $'\n' ] && char=$(sed -e 's/\\/\\\\/g;s/ /\\ /g;s/:/\\:/g;s/\t/\\\t/g' <<< "${char}")
358 |
359 | #try a CoreEmoji lookup in AppleName.strings, an XML doc in which each character is its own dictionary entry
360 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings"
361 |
362 | #get description from CoreEmoji AppleName.strings in whatever locale
363 | local emoji_description=$(/usr/libexec/PlistBuddy -c "print :${char}" "${PlistPath}" 2>/dev/null)
364 | #for comparison only
365 | local emoji_description_lower=$(awk '{print tolower($0)}' <<< "${emoji_description}")
366 |
367 | #try without fitzpatrick if present
368 | if [ -z "${emoji_description}" ] && hasFitzpatrickMod "${char}"; then
369 | local tempchar="$(filterFitzpatrickModifier "${char}")"
370 | #just in case it's fitzpatrick(s) only and nothing is left
371 | if [ -n "${tempchar}" ]; then
372 | emoji_description=$(/usr/libexec/PlistBuddy -c "print :${tempchar}" "${PlistPath}" 2>/dev/null | awk '{print tolower($0)}')
373 | emoji_description_lower=$(awk '{print tolower($0)}' <<< "${emoji_description}")
374 | fi
375 | fi
376 |
377 | #output
378 | if ((DISCRETE_DESCRIPTIONS)); then
379 | if ((preserveCase)); then
380 | echo -n "${chardb_description}"$'\n'"${emoji_description}"
381 | else
382 | echo -n "${chardb_description_lower}"$'\n'"${emoji_description}"
383 | fi
384 | #if the lowercase is the same, use AppleName.strings b/c capitalization is sane
385 | elif [ "${emoji_description_lower}" = "${chardb_description_lower}" ]; then
386 | echo -n "${emoji_description}"
387 | else
388 | ([ -n "${chardb_description}" ] && [ -n "${emoji_description}" ]) && local separator=";"
389 | ((preserveCase)) && echo -n "${chardb_description}${separator}${emoji_description}" || echo -n "${chardb_description_lower}${separator}${emoji_description}"
390 | fi
391 | }
392 |
393 | #search in descriptions of CharacterDB.sqlite3 and AppleName.strings
394 | function getCharsByDescriptionSearch(){
395 | local searchstring="${1}"
396 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3"
397 | local dbTable="unihan_dict"
398 | local char_fieldName="uchr"
399 | local desc_fieldName="info"
400 |
401 | #strip newlines - otherwise weird search results can happen
402 | searchstring=$(echo -n "${searchstring}" | tr -d $'\n')
403 |
404 | #nothing left? go back
405 | [ -z "${searchstring}" ] && return
406 |
407 | #both quotes forget about it
408 | (grep -q -e \" <<< "${searchstring}" && grep -q -e \' <<< "${searchstring}") && return
409 | #set quote depending on string
410 | grep -q -e \" <<< "${searchstring}" && local quote="'" || local quote='"'
411 |
412 | #CharacterDB results, info is upper so convert search to upper
413 | local matchingChars=$(sqlite3 "${databasePath}" "SELECT uchr FROM \`${dbTable}\` WHERE instr(${desc_fieldName}, UPPER(${quote}${searchstring}${quote})) > 0 OR instr(${desc_fieldName}, LOWER(${quote}${searchstring}${quote})) > 0;")
414 | [ -n "${matchingChars}" ] && matchingChars+=$'\n'
415 |
416 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings"
417 | #add results from coreemoji
418 | matchingChars+=$(plutil -convert json -o - "${PlistPath}" | sed -e 's/","/\n/g' -e '1 s/^{"//' -e '$ s/"}$//' -e 's/":"/\x1E/g' | awk '{print tolower($0)}' | awk -F $'\x1E' '$2 ~ /'"${searchstring}"'/ {print $1}')$'\n'
419 |
420 | sed 's/^$//g' <<< "${matchingChars}" | sort | uniq
421 | }
422 |
423 | #find character usage in all glyphs
424 | function findMatchingChars(){
425 | local char="${1}"
426 |
427 | [ -z "${char}" ] && return 1
428 | [ "${char}" = $'\n' ] && return 1
429 |
430 | #strip newlines - otherwise weird search results can happen
431 | char=$(echo -n "${char}" | tr -d $'\n')
432 |
433 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings"
434 | #coreemoji search, convert to json then massage down to CSV
435 | #\ escape any regex conflicting characters: \ | . / + * ^ $ [ (
436 | local awk_char="$(echo -n "${char}" | sed -e 's/[\\\|\./\+\*^$[(?]/\\&/g')"
437 | local matchingChars=$(plutil -convert json -o - "${PlistPath}" | sed -e $'s/",/\\n/g' | sed -e '1 s/^{//' -e 's/^"//g' -e 's/":"/,/g' -e '$ s/}$//' | awk -F , '$1 ~ /'"${awk_char}"'/ {print $1}')
438 | [ -n "${matchingChars}" ] && matchingChars+=$'\n'
439 |
440 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/CharacterDB.sqlite3"
441 | local dbTable="unihan_dict"
442 | local char_fieldName="uchr"
443 | local desc_fieldName="info"
444 |
445 | #both quotes forget about it
446 | (grep -q -e \" <<< "${char}" && grep -q -e \' <<< "${char}") && return
447 | #set quote depending on string
448 | grep -q -e \" <<< "${char}" && local quote="'" || local quote='"'
449 |
450 | #characterdb
451 | #local query_char=$(echo -n "${char}" | sed -e 's/[\*]/\\&/g')
452 | matchingChars+=$(sqlite3 "${databasePath}" "SELECT uchr FROM \`${dbTable}\` WHERE instr(${char_fieldName}, ${quote}${char}${quote}) > 0;")
453 |
454 | #sort and uniq
455 | local matchingChars=$(echo -n "${matchingChars}" | sort | uniq)
456 | echo "${matchingChars}"
457 | }
458 |
459 | #Apple maintains a db with lookalike characters for Latin chars (and some others)
460 | function getRelatedCharacters(){
461 | local char="${1}"
462 |
463 | local databasePath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/RelatedCharDB.sqlite3"
464 | local dbTable="related_dict"
465 | local fieldName="relatedChars"
466 |
467 | #no db file? return!
468 | ! [ -f "${databasePath}" ] && return
469 |
470 | #both quotes forget about it
471 | (grep -q -e \" <<< "${char}" && grep -q -e \' <<< "${char}") && return
472 | #set quote depending on string
473 | grep -q -e \" <<< "${char}" && local quote="'" || local quote='"'
474 |
475 | local characters=$(sqlite3 "${databasePath}" "SELECT ${fieldName} FROM \`${dbTable}\` WHERE instr(${fieldName}, ${quote}${char}${quote}) > 0;")
476 |
477 | #return newline delimited list (all these chars are only 1 code point)
478 | echo "${characters}" | sed $'s/./&\\n/g'
479 | }
480 |
481 | #list locales for AppleName.strings
482 | function listLocales(){
483 | ls -1 "/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources" | awk -F . '/lproj$/ {print $1}'
484 | }
485 |
486 | #macOS has groupings within the categories plist
487 | function listcategoryGroups(){
488 | local expandGroup="${1}"
489 | local cfile="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/Categories.plist"
490 |
491 | #no category file? return!
492 | ! [ -f "${cfile}" ] && return
493 |
494 | if ! ((headerOff)); then
495 | if ((expandGroup)); then
496 | echo "Group Name,Category Name"
497 | else
498 | echo "Group Name"
499 | fi
500 | fi
501 |
502 | local alength=$(plutil -extract CVAvailableCategories raw "${cfile}" -o -)
503 | for ((i=0;i < alength;i++)); do
504 | groupname=$(plutil -extract CVAvailableCategories.$i.Group raw "${cfile}" | sed 's/CategoryGroup-//g')
505 | if ! ((expandGroup)); then
506 | #strip off prefix
507 | echo "${groupname/#CategoryGroup-}"
508 | else
509 | local alength_2=$(plutil -extract CVAvailableCategories.$i.Categories raw "${cfile}" -o -)
510 | for ((j=0; j < alength_2; j++)); do
511 | catName=$(plutil -extract CVAvailableCategories.$i.Categories.$j raw "${cfile}" -o - | sed -e 's/^Category-//')
512 | echo "${groupname},${catName}"
513 | done
514 | fi
515 | done | sort | sed -e '/^$/d' -e '/Dingbats/d'
516 | }
517 |
518 | #macOS has category plist files with comma delimited characters and 0x entries and ranges
519 | function listCategories(){
520 | #bool
521 | local expandCategory="${1}"
522 | local sectionName
523 | local cfile
524 | local cname
525 | local list
526 |
527 | local searchPath="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources"
528 |
529 | #simple list of all the Category files
530 | local categoryPaths_raw=$(find "${searchPath}" -name 'Category*.plist' | sort)
531 |
532 | #nothing found? return!
533 | [ -z "${categoryPaths_raw}" ] && return
534 |
535 | #find the ones with CVCategoryData
536 | local IFS=$'\n'
537 | for cfile in ${categoryPaths_raw}; do
538 | if plutil -type CVCategoryData "${cfile}" >/dev/null; then
539 | categoryPaths+="${cfile}"$'\n'
540 | fi
541 | done
542 |
543 | #header
544 | if ! ((headerOff)); then
545 | if ((expandCategory)); then
546 | echo "Category Name,Section Name"
547 | else
548 | echo "Category Name"
549 | fi
550 | fi
551 |
552 | #we need to further investigate each member for array within CVCategoryData
553 | for cfile in ${categoryPaths}; do
554 | #the name without path or prefix "Category-" or extension ".plist"
555 | cname=$(sed -e "s|${searchPath}/Category-||g" -e 's/\.plist$//g' -e '/^$/d' <<< "${cfile}")
556 |
557 | #these have an array named DataArray instead of a string named Data
558 | if plutil -type CVCategoryData.DataArray -expect array "${cfile}" >/dev/null; then
559 | if ((expandCategory)); then
560 | #loop through all the sections and name them
561 | local alength=$(plutil -extract CVCategoryData.DataArray raw "${cfile}")
562 | for ((i=0;i < alength;i++)); do
563 | sectionName=$(plutil -extract CVCategoryData.DataArray.$i.CVDataTitle raw "${cfile}" | sed 's/SectionTitle-//g')
564 | list+="${cname},${sectionName}"$'\n'
565 | done
566 | else
567 | list+="${cname}*"$'\n'
568 | fi
569 | else
570 | list+="${cname}"$'\n'
571 | fi
572 | done
573 | #manually add Emoji category
574 | list+="Emoji"$'\n'
575 | sort <<< "${list}" | sed -e '/Favorites/d' -e '/Recents/d' -e '/^$/d'
576 | }
577 |
578 | #macOS has category plist files with comma delimited characters and 0x entries and ranges
579 | function getCategoryCharacters(){
580 |
581 | #strip off trailing, leading whitespace, and * from end (list function)
582 | local categoryArgument=$(sed -e $'s/^[ \t]*//' -e $'s/[ \t]*$//' -e 's/\*$//' <<< "${1}")
583 | local categoryName=$(awk -F ',' '{print $1}' <<< "${categoryArgument}")
584 | local categorySection=$(awk -F ',' '{print $2}' <<< "${categoryArgument}")
585 | local data
586 |
587 | #our category file in the file system
588 | local cfile="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/Category-${categoryName}.plist"
589 |
590 | #emoji shows up but does not have a category file
591 | if [ "${categoryName}" = "Emoji" ]; then
592 | #add from coreemoji AppleName.strings
593 | local PlistPath="/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${LOCALE:=en}.lproj/AppleName.strings"
594 | #get all the object names with sed, delimit using 0x1E (record separator) and get the first field
595 | data=$(plutil -convert json -o - "${PlistPath}" | sed -e 's/","/\n/g' -e '1 s/^{"//' -e '$ s/"}$//' -e 's/":"/\x1E/g' | awk -F $'\x1E' '{print $1}')
596 | #remove empty line, sort, uniq, comma delimit and output results
597 | sed '/^$/d' <<< "${data}" | sort | uniq | tr $'\n' ","
598 | return
599 | elif ! [ -f "${cfile}" ]; then
600 | echo "Category not found: ${categoryName}" >&2
601 | return
602 | fi
603 |
604 | #if a section name is specified
605 | if [ -n "${categorySection}" ]; then
606 | local alength=$(plutil -extract CVCategoryData.DataArray raw "${cfile}")
607 |
608 | for ((i=0;i < alength;i++)); do
609 | local sectionName=$(plutil -extract CVCategoryData.DataArray.$i.CVDataTitle raw "${cfile}" | sed 's/SectionTitle-//')
610 | if [ "$sectionName" = "${categorySection}" ]; then
611 | data=$(plutil -extract CVCategoryData.DataArray.$i.Data raw "${cfile}" | sed 's/SectionTitle-//g')
612 | echo "${data}"
613 | return
614 | fi
615 | done
616 | #else only category specified (which may or may not contain sections)
617 | else
618 | #just a single sectioned category file
619 | if data=$(plutil -extract CVCategoryData.Data raw -o - "${cfile}"); then
620 | echo "${data}"
621 | #if we have a dataArray we'll need to loop
622 | elif plutil -type CVCategoryData.DataArray -expect array "${cfile}" >/dev/null; then
623 | local alength=$(plutil -extract CVCategoryData.DataArray raw "${cfile}")
624 | for ((i=0;i < alength;i++)); do
625 | data=$(plutil -extract CVCategoryData.DataArray.$i.Data raw "${cfile}" | sed 's/SectionTitle-//g')
626 | echo "${data}"
627 | done
628 | fi
629 | fi
630 | }
631 |
632 | #groups are comprised of categories, clui can iterate through all of them or individual categories
633 | function getCategoryGroupCharacters(){
634 |
635 | #strip off leading and trailing whitespace
636 | local groupArgument=$(sed -e $'s/^[ \t]*//' -e $'s/[ \t]*$//' <<< "${1}")
637 | local groupName=$(awk -F ',' '{print $1}' <<< "${groupArgument}")
638 | local groupCategory=$(awk -F ',' '{print $2}' <<< "${groupArgument}")
639 |
640 | local cfile="/System/Library/Input Methods/CharacterPalette.app/Contents/Resources/Categories.plist"
641 |
642 | ! [ -f "${cfile}" ] && return
643 |
644 | #if a category name is also specified
645 | if [ -n "${groupCategory}" ]; then
646 | echo "$(getCategoryCharacters "${groupCategory}")"
647 | #else only group name specified, retrieve all categories within
648 | else
649 | #get array length
650 | local alength=$(plutil -extract CVAvailableCategories raw "${cfile}")
651 | #iterate over
652 | for ((i=0;i < alength;i++)); do
653 | #find group name
654 | local gname=$(plutil -extract CVAvailableCategories.$i.Group raw "${cfile}" | sed 's/^CategoryGroup-//')
655 | if [ "${gname}" = "${groupName}" ]; then
656 | local alength_2=$(plutil -extract CVAvailableCategories.$i.Categories raw "${cfile}")
657 | #loop through this array and print each categories chars
658 | for ((j=0; j < alength_2; j++)); do
659 | local groupCategory=$(plutil -extract CVAvailableCategories.$i.Categories.$j raw "${cfile}" | sed 's/^Category-//')
660 | [ -z "${groupCategory}" ] && break
661 | echo "$(getCategoryCharacters "${groupCategory}")"
662 | done
663 | break
664 | fi
665 | done
666 | fi
667 |
668 | }
669 |
670 | function filterFitzpatrickModifier(){
671 | echo -n "${1}" | sed $'s/[\xF0\x9F\x8F\xBB\xF0\x9F\x8F\xBC\xF0\x9F\x8F\xBD\xF0\x9F\x8F\xBE\xF0\x9F\x8F\xBF]//g'
672 | }
673 |
674 | function hasFitzpatrickMod(){
675 | if grep -q -E $'[\xF0\x9F\x8F\xBB\xF0\x9F\x8F\xBC\xF0\x9F\x8F\xBD\xF0\x9F\x8F\xBE\xF0\x9F\x8F\xBF]' <<< "${1}"; then
676 | return 0
677 | else
678 | return 1
679 | fi
680 | }
681 |
682 | #turn code point representations into literal characters
683 | function processCodePoints(){
684 | local string="${1}"
685 | local newString=""
686 | local IFS=' '
687 | local element
688 |
689 | #space delimited
690 | for element in ${string}; do
691 | case "${element}" in
692 | 0[Xx]*|[Uu]+*)
693 | #get the first and (if this is a range) second element, trim leading zeroes (but not 0x0 or 0x00 so we can handle later)
694 | local first=$(awk -F '-' '{print $1}' <<< "${element}" | sed -E -e 's/^[Uu]\+//;s/^0[Xx]//' -e 's/^0{3,}//')
695 | local second=$(awk -F '-' '{print $2}' <<< "${element}" | sed -E -e 's/^[Uu]\+//;s/^0[Xx]//' -e 's/^0{3,}//')
696 |
697 | #validate 1-6 hex character only
698 | if ! grep -q -E "^[0-9a-fA-F]{1,6}$" <<< "${first}" || ([ -n "${second}" ] && ! grep -q -E "^[0-9a-fA-F]{1,6}$" <<< "${second}"); then
699 | echo "Invalid hexadecimal range: ${element}" >&2
700 | continue
701 | fi
702 |
703 | #ensure values lower than limit of Unicode
704 | if ((0x${first} > 0x10ffff)) || ((0x${second} > 0x10ffff)); then
705 | echo "Values cannot be greater than 10ffff: ${element}" >&2
706 | continue
707 | fi
708 |
709 | #from 0x0-0x0?! you weirdo. bash vars can't handle nulls, while zsh does: it's not in chardb anyway so we just ignore!
710 | [ $((0x${first})) -eq 0 ] && [ $((0x${second})) -eq 0 ] && echo "" && return
711 |
712 | #if this is a range and an entry is 0 (NULL) we are going to fudge it to 1 so category "Unicode,Basic Latin" doesn't fail
713 | if [ -n "${first}" ] && [ -n "${second}" ] ; then
714 | [ $((0x${first})) -eq 0 ] && first=1
715 | [ $((0x${second})) -eq 0 ] && second=1
716 | fi
717 |
718 | newString+=$(unichr $((0x${first})))
719 | #note if this is a range include hyphen
720 | [ -n "${second}" ] && newString+="-$(unichr $((0x${second})))"
721 | ;;
722 | #literal character or character range
723 | *)
724 | newString+="${element}"
725 | ;;
726 | esac
727 | done
728 | printf "%s" "${newString}"
729 | }
730 |
731 | #produces Unicode given a code point value (in decimal)
732 | function unichr(){
733 | #based on https://stackoverflow.com/a/16509364 from Orwellophile
734 |
735 | #used by fast_char
736 | local CHAR
737 |
738 | local charDecimal="${1}"
739 | local byteCounter=0
740 | #ceiling for the lead byte's payload: decreases from 63 to 31, 15, and finally 7
741 | local ceiling=63
742 | #lead byte marker bits: starts at 128 and increases to 192, 224, and 240 (increments of 64, 32, and 16)
743 | local accumBits=128
744 | #output string
745 | local str=''
746 |
747 | #convert decimal to character
748 | function fast_char() {
749 | local __octal
750 | printf -v __octal '%03o' $1
751 | printf -v CHAR \\$__octal
752 | }
753 |
754 | ! ((charDecimal)) && echo $'\0' && return
755 |
756 | #if this is in the surrogate range (0xD800-0xDFFF), bail
757 | ((( charDecimal >= 0xD800 )) && (( charDecimal <= 0xDFFF ))) && return
758 |
759 | #if it's under 0x80 quickly print it out and return
760 | (( charDecimal < 0x80 )) && { fast_char "${charDecimal}"; printf "%s" "${CHAR}"; return; }
761 |
762 | #work through each byte, as long as the ordinal is bigger than the ceiling (which decreases from 63 to 31, 15, and finally 7)
763 | while (( charDecimal > ceiling )); do
764 | #fast_char $(( t = 0x80 | charDecimal & 0x3f ))
765 | fast_char $(( 0x80 | charDecimal & 0x3f ))
766 | #prepend the reply (we are working backward through the encoding)
767 | str="${CHAR}${str}"
768 | (( charDecimal >>= 6, byteCounter++, accumBits += ceiling+1, ceiling>>=1 ))
769 | done
770 |
771 | #final byte
772 | fast_char $(( accumBits | charDecimal ))
773 |
774 | #append final char for output
775 | echo -nE "${CHAR}${str}"
776 | }
777 |
778 | function finishUp(){
779 | #no need to close up?
780 | ! ((HAD_OUTPUT)) && exit
781 |
782 | #close output
783 | case "${outputOption}" in
784 | #output breaks on objects, close out the array of them
785 | "JSON") echo -e '\n]' ;;
786 | #rtf can deal with sudden closure anywhere
787 | "RTF") echo '}';;
788 | #it just looks nice yeah?
789 | "CHARACTER") echo "" ;;
790 | esac
791 | exit
792 | }
793 |
794 | function printUsage(){
795 | helpText="clui (1.0) - Command Line Unicode Info (https://github.com/brunerd/clui)\nUsage: clui [options] ...\n\nInput can be:\n * Unicode characters, space or comma delimited (use -x to expand non-delimited strings)\n * Hexadecimal codepoint representations (U+hhhhh or 0xhhhhh), double-quoted multi-point sequences\n * Hyphenated ranges (ascending or descending): z-a, U+A1-U+BF or 0x20-0x7E\n * Category or Group names (see Input Options)\n * Descriptive words or phrases (see Input Options)\n\nOutput Options\n\n -D Discrete info fields for CharacterDB and localized AppleName.strings\n\n -H Hide characters lacking info descriptions\n\n -l \n Show localized info (Emoji only). Use -Ll to list available localizations.\n\n -p Preserve case of CharacterDB info \n \n Encoding style for UTF field\n -E \n h* UTF-8 hexadecimal, space delimited and capitalized (NN) (default)\n H Hex HTML Entity UTF-32 (nnnn;)\n 0 Octal UTF-8 with leading 0 (\\\\0nnn)\n o Octal UTF-8 (\\\\nnn)\n x Shell style UTF-8 hex (\\\\xnn)\n u JS style UTF-16 (\\\\unnnn)\n U zsh style UTF-32 Unicode Code Point (\\\\Unnnnnnnn)\n w Web/URL UTF-8 encoding (%nn)\n\n Output format \n -O \n C* CSV (default)\n c Character-only, space delimited\n j JSON output (array of objects)\n J JSON Sequence output (objects delimited by 0x1E and 0x0A)\n p Plain output (no field descriptions)\n r RTF output (plain output with large sized characters)\n y YAML output\n\n Format dependent output options\n -f set font size for RTF output of char and info fields (default: 256,32)\n -h Hide headers for CSV output\n \nInput Options\n\n -C [,Subsection]\n Treat input as a Category name with a possible subsection (see -L for listing)\n\n -F Remove Fitzpatrick skin tone modifier and process, then process as-is\n\n -G [,Category]\n Treat input as a Group name with a possible category name (see -L for listing)\n\n -l \n Search localized descriptions (Emoji only), use -Ll to list available localizations\n\n -S \n Treat input as search criteria\n d Search descriptions in CharacterDB and AppleName.strings (case insensitive)\n c Search for character in other Unicode sequences\n C Search for character plus \"related characters\"\n\n -x Expand and describe each individual code point in a sequence\n -X Expand plus display original sequence prior to expansion\n\n -V Verbatim, process input raw/as-is, no additional interpretation or delimitation\n \nOther Modes\n\n List categories and groups\n -L List categories or groups in CSV (use -h to suppress header)\n c Category list (* after a name denotes subsections)\n C Category list, with subsections expanded\n g Groups of categories, top level name\n G Group name with member categories expanded\n l Locales available to search and display results from (Emoji only) \n \n -u Display usage info (aka help) with less (press q to quit)\n \nExamples:\n\n Search for characters a to z plus \"related characters\" and output as CSV (default)\n clui -SC a-z\n\n Look up all available Categories\n clui -Lc\n\n Get every character in Emoji category and output in RTF to a file\n clui -Or -C Emoji > Emoji.rtf\n\n All characters in Emoji category with discrete info fields in Spanish to a CSV to a file\n clui -D -l es -C Emoji > Emoji-es.csv\n\n Search descriptions for substring \"family\" and expand multi-code point ZWJ sequences \n clui -X -Sd \"family\"\n\t"
796 | echo -e "${helpText}" | less
797 | exit 0
798 | }
799 |
800 | ########
801 | # MAIN #
802 | ########
803 |
804 | #turn off globbing so * is not expanded in character search
805 | set -f
806 |
807 | #if control-c or other interruption, we can close up our JSON and RTF output
808 | trap 'finishUp' TERM INT HUP
809 |
810 | #turn on debug output if Shift key held down
811 | shiftKeyDown="$(osascript -l JavaScript -e "ObjC.import('Cocoa'); ($.NSEvent.modifierFlags & $.NSEventModifierFlagShift) > 1" 2>/dev/null)"
812 | #if /tmp/debug file found or Shift key is held down then xtrace debug output
813 | if [ -f /tmp/debug ] || [ "${shiftKeyDown}" = "true" ] || [ "${shiftKeyDown}" = "True" ]; then xtraceFlag=1; fi
814 | ((${xtraceFlag})) && set -x
815 |
816 | #macOS version check (plutil raw output is the limiting factor for OS)
817 | if [ "$(sw_vers -productVersion | cut -d. -f1)" -lt 12 ]; then
818 | echo "macOS Monterey (12) or higher required, exiting" >&2
819 | exit 1
820 | fi
821 |
822 | #process any options
823 | while getopts ":E:f:L:l:O:U:S:CDFGhHpruVxX" option; do
824 | case "${option}" in
825 | p) preserveCase=1
826 | ;;
827 | #discrete descriptions
828 | D)
829 | #GLOBAL used in outputCharInfo
830 | DISCRETE_DESCRIPTIONS=1
831 | ;;
832 | #locale
833 | l)
834 | if [ -d "/System/Library/PrivateFrameworks/CoreEmoji.framework/Versions/A/Resources/${OPTARG}.lproj" ]; then
835 | #GLOBAL used in: getCharsByDescriptionSearch() getDescriptionByCharMatch() outputCharInfo() findMatchingChars() (defaults to en)
836 | LOCALE="${OPTARG}"
837 | else
838 | echo "Invalid locale: ${OPTARG}" >&2
839 | exit 1
840 | fi
841 | ;;
842 | #font size -f [large,small]
843 | f)
844 | #get CSV data for font sizes
845 | #match the size scale in TextEdit where the GUI is 2x what is in the RTF doc
846 | f_param1=$(awk -F , '{print $1}' <<< "${OPTARG}")
847 | ((${f_param1})) && FONTSIZE_LARGE="$((2 * f_param1))"
848 | f_param2=$(awk -F , '{print $2}' <<< "${OPTARG}")
849 | ((${f_param2})) && FONTSIZE_SMALL="$((2 * f_param2))"
850 | ;;
851 | #usage
852 | u)
853 | printUsage
854 | exit 0
855 | ;;
856 | #list categories or groups
857 | L)
858 | case "${OPTARG}" in
859 | C|c)
860 | [ "${OPTARG}" = "C" ] && expandCategory=1
861 | listType="CATEGORY"
862 | ;;
863 | #list Groups of categories
864 | G|g)
865 | [ "${OPTARG}" = "G" ] && expandGroup=1
866 | listType="GROUP"
867 | ;;
868 | l)
869 | listType="LOCALE"
870 | ;;
871 | esac
872 | ;;
873 | G) [ -z "${searchMethod}" ] && searchMethod="GROUP" ;;
874 | C) [ -z "${searchMethod}" ] && searchMethod="CATEGORY" ;;
875 | S)
876 | case ${OPTARG} in
877 | d) [ -z "${searchMethod}" ] && searchMethod="DESCRIPTION" ;;
878 | c) [ -z "${searchMethod}" ] && searchMethod="CHARACTER" ;;
879 | C) [ -z "${searchMethod}" ] && searchMethod="CHARACTER" && relatedChars=1 ;;
880 | esac
881 | ;;
882 | F) filterFitzpatrick=1 ;;
883 | H) HIDE_UNDESCRIBED=1 ;; #GLOBAL used in outputCharInfo
884 | h) headerOff=1 ;;
885 | E) encodingScheme="${OPTARG}";;
886 | V) verbatimMode=1 ;;
887 | X|x)
888 | #summarize strings before expansion
889 | [ "${option}" = "X" ] && summarize=1
890 | #put space between points
891 | expand=1
892 | ;;
893 | O)
894 | case ${OPTARG} in
895 | J) outputOption="JSONSEQ" ;;
896 | j) outputOption="JSON" ;;
897 | y) outputOption="YAML" ;;
898 | p) outputOption="PLAIN" ;;
899 | c) outputOption="CHARACTER" ;;
900 | r|R) outputOption="RTF" ;;
901 | C|*) outputOption="CSV" ;;
902 | esac
903 | ;;
904 | #do nothing for unknown options
905 | '?') : ;;
906 | esac
907 | done
908 |
909 | #if required shift parameters, OPTIND is only set after getopts
910 | [ ${OPTIND} -ge 2 ] && shift $((OPTIND-1))
911 |
912 | #list categories or groups and exit
913 | case "${listType}" in
914 | "CATEGORY")
915 | listCategories "${expandCategory}"
916 | exit 0
917 | ;;
918 | "GROUP")
919 | listcategoryGroups "${expandGroup}"
920 | exit 0
921 | ;;
922 | "LOCALE")
923 | listLocales
924 | exit 0
925 | ;;
926 | esac
927 |
928 | #if we are piping in input, use that
929 | if ! [ -t '0' ]; then
930 | #set positional parameters from cat input (IFS delimited)
931 | set -- $(cat)
932 | #otherwise see if any input is provided, if not offer help
933 | elif [ "${1}" != $'\n' ] && [ -z "$(echo -n "$@")" ]; then
934 | echo -e "clui (1.0) - Command Line Unicode Info (https://github.com/brunerd/clui)\nUsage: clui [options] ... (option -u for usage help)"
935 | exit 0
936 | fi
937 |
938 | #process each parameter, quote "$@" to split by positional parameter (unquoted $@ splits by IFS)
939 | for parameter in "$@"; do
940 | #replace parameter with results for these cases
941 | case "${searchMethod}" in
942 | "DESCRIPTION")
943 | #get characters, turning line feeds into commas to process each individually later
944 | parameter=$(getCharsByDescriptionSearch "${parameter}")
945 | [ -n "$parameter" ] && parameter=$(echo "${parameter}" | tr $'\n' ',') || continue
946 | ;;
947 | "CATEGORY")
948 | parameter=$(getCategoryCharacters "${parameter}")
949 | ;;
950 | "GROUP")
951 | parameter=$(getCategoryGroupCharacters "${parameter}")
952 | ;;
953 | esac
954 |
955 | #verbatim (-V) do not processCodePoints()
956 | if ((verbatimMode)); then
957 | #-X show as-is first, if more than 1 char
958 | if ((summarize)) && [ -n "${parameter:1:1}" ]; then
959 | outputCharInfo "${parameter}" "${outputOption}" "${encodingScheme}"
960 | fi
961 |
962 | #-x expand each and every code point in string
963 | if ((expand)); then
964 | #go through each char
965 | for ((i=0;;i++)); do
966 | char="${parameter:$i:1}"
967 | [ -z "${char}" ] && break
968 | outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
969 | done
970 | #or treat entire parameter as a single entity
971 | else
972 | outputCharInfo "${parameter}" "${outputOption}" "${encodingScheme}"
973 | fi
974 | #process next parameter
975 | continue
976 | fi
977 |
978 | #treat comma and newline delimited items discretely
979 | IFS=$',\n'
980 | for chunk in ${parameter}; do
981 | #transform any U+hhhh/0xhhhh representation to the literal character/sequence, space is not significant
982 | chunk=$(processCodePoints "${chunk}")
983 |
984 | #only two ways to handle it
985 | case "${chunk}" in
986 | #if it has a hyphen treat as range (if misinterpreted use -V)
987 | *-*)
988 | #if it is simply a "-" alone
989 | if [ "${chunk}" = "-" ]; then
990 | start="-"; end="-" #also reset end so a value left over from a previous chunk is not reused
991 | else
992 | #Get our start and end (if it exists)
993 | start="$(awk -F - '{print $1}' <<< "${chunk}")"
994 | [ -z "${start}" ] && continue
995 | end="$(awk -F - '{print $2}' <<< "${chunk}")"
996 | fi
997 | [ -z "${end}" ] && end="${start}"
998 |
999 | #ensure they are only a single code point, otherwise ignore and move on
1000 | [ "${start:1:1}" ] && continue
1001 | [ "${end:1:1}" ] && continue
1002 |
1003 | #get decimal of character
1004 | start=$(/usr/bin/printf "%d" "'${start}")
1005 | end=$(/usr/bin/printf "%d" "'${end}")
1006 |
1007 | #to increment or decrement, that is the question
1008 | if ((start <= end)); then
1009 | comparison="<="
1010 | inc_dec="i++"
1011 | else
1012 | inc_dec="i--"
1013 | comparison=">="
1014 | fi
1015 |
1016 | ## loop through code point(s)
1017 | for ((i=${start}; i ${comparison} end; ${inc_dec})); do
1018 | #if searching by character or related chars, get them and go through them
1019 | if [ "${searchMethod}" = "CHARACTER" ]; then
1020 | #get characters (newline delimited)
1021 | newChunk=$(findMatchingChars "$(unichr ${i})")
1022 | #get related characters also, if applicable (newline delimited also)
1023 | ((relatedChars)) && newChunk+=$'\n'$(getRelatedCharacters "$(unichr ${i})")
1024 | #sort and uniq the results and turn into csv
1025 | for char in $(sort -h <<< "${newChunk}" | uniq); do
1026 | outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
1027 | done
1028 | else
1029 | [ "${i}" = 10 ] && char=$'\n' || char="$(unichr ${i})"
1030 | outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
1031 | fi
1032 | done
1033 | ;;
1034 | #anything else, not a range
1035 | *)
1036 | #search for the character in other sequences
1037 | if [ "${searchMethod}" = "CHARACTER" ]; then
1038 | #get characters (newline delimited)
1039 | newChunk=$(findMatchingChars "${chunk}")
1040 | #get related characters also, if applicable (newline delimited also)
1041 | ((relatedChars)) && newChunk+=$'\n'$(getRelatedCharacters "${chunk}")
1042 | #sort and uniq the results and output
1043 | for char in $(sort -h <<< "${newChunk}" | uniq); do
1044 | outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
1045 | done
1046 | continue
1047 | fi
1048 |
1049 | #if over one code point
1050 | if [ -n "${chunk:1:1}" ]; then
1051 | #output version without fitzpatrick modifier if option on and modifier found
1052 | if ((filterFitzpatrick)) && hasFitzpatrickMod "${chunk}"; then
1053 | outputCharInfo "$(filterFitzpatrickModifier "${chunk}")" "${outputOption}" "${encodingScheme}"
1054 | fi
1055 |
1056 | #if (-X) "summarize" the chunk first, display and get info as-is
1057 | if ((summarize)); then
1058 | outputCharInfo "${chunk}" "${outputOption}" "${encodingScheme}"
1059 | fi
1060 | fi
1061 |
1062 | # if -x or -X go through each character
1063 | if ((expand)); then
1064 | for ((i=0;;i++)); do
1065 | char="${chunk:$i:1}"
1066 | [ -z "${char}" ] && break
1067 | outputCharInfo "${char}" "${outputOption}" "${encodingScheme}"
1068 | done
1069 | #if not expand, pass entire chunk to our string
1070 | else
1071 | outputCharInfo "${chunk}" "${outputOption}" "${encodingScheme}"
1072 | fi
1073 | ;;
1074 | esac
1075 | done
1076 | done
1077 |
1078 | #close up any output formats that need it
1079 | finishUp
1080 |
--------------------------------------------------------------------------------
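
The `unichr` function in `sources/clui` builds UTF-8 output by working backward through the continuation bytes and then prepending the lead byte. As a rough illustration of the same encoding rules, here is a minimal standalone sketch that prints the UTF-8 bytes of a code point as hex. The function name `utf8_hex` is hypothetical and is not part of clui; it uses one explicit branch per byte length rather than the loop used in the script, and it assumes a shell whose arithmetic accepts `0x` constants.

```shell
#!/bin/sh
# utf8_hex (hypothetical helper, not part of clui):
# print the UTF-8 encoding of a code point as space-delimited hex bytes.
# Accepts decimal or 0x-prefixed hexadecimal input.
utf8_hex() {
    cp=$(( $1 ))

    # reject values past the Unicode limit and the surrogate range
    if [ "$cp" -gt 1114111 ]; then return 1; fi
    if [ "$cp" -ge 55296 ] && [ "$cp" -le 57343 ]; then return 1; fi

    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02X\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02X %02X\n' \
            $(( 0xC0 | (cp >> 6) )) \
            $(( 0x80 | (cp & 0x3F) ))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X\n' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '%02X %02X %02X %02X\n' \
            $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_hex 0x41      # LATIN CAPITAL LETTER A
utf8_hex 0x20AC    # EURO SIGN
utf8_hex 0x1F3FB   # EMOJI MODIFIER FITZPATRICK TYPE-1-2
```

The last call prints `F0 9F 8F BB`, which matches the first byte sequence hard-coded in `filterFitzpatrickModifier` and `hasFitzpatrickMod`.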