├── docs ├── .nojekyll ├── yard │ ├── css │ │ ├── common.css │ │ └── full_list.css │ ├── frames.html │ ├── file.about.html │ ├── top-level-namespace.html │ ├── Unisec │ │ ├── Utils.html │ │ ├── CLI.html │ │ ├── Bidi.html │ │ └── CLI │ │ │ ├── Commands │ │ │ ├── Bidi.html │ │ │ ├── Surrogates.html │ │ │ ├── Normalize.html │ │ │ ├── Confusables.html │ │ │ ├── Properties.html │ │ │ ├── Properties │ │ │ │ └── List.html │ │ │ ├── Versions.html │ │ │ ├── Grep.html │ │ │ └── Size.html │ │ │ └── Commands.html │ ├── file.documentation.html │ ├── file.LICENSE.html │ ├── file_list.html │ ├── file.quick-start.html │ ├── file.publishing.html │ ├── file.CHANGELOG.html │ ├── Unisec.html │ ├── index.html │ ├── file.README.html │ └── file.install.html ├── _media │ ├── unisec-logo.png │ └── unisec-favicon.ico ├── _navbar.md ├── about.md ├── _coverpage.md ├── _sidebar.md ├── pages │ ├── documentation.md │ ├── quick-start.md │ ├── publishing.md │ ├── install.md │ └── usage.md ├── CHANGELOG.md ├── index.html └── vendor │ ├── plugins │ ├── docsify-image-caption.min.js │ └── docsify-sidebar-collapse.min.js │ └── prismjs │ └── components │ └── prism-ruby.min.js ├── .tool-versions ├── docs-tools ├── .tool-versions ├── package.json └── gulpfile.mjs ├── .gitignore ├── lib ├── unisec │ ├── version.rb │ ├── cli │ │ ├── rugrep.rb │ │ ├── size.rb │ │ ├── versions.rb │ │ ├── cli.rb │ │ ├── hexdump.rb │ │ ├── confusables.rb │ │ ├── surrogates.rb │ │ ├── properties.rb │ │ ├── bidi.rb │ │ └── normalization.rb │ ├── confusables.rb │ ├── versions.rb │ ├── hexdump.rb │ ├── utils.rb │ ├── surrogates.rb │ ├── normalization.rb │ ├── rugrep.rb │ └── size.rb └── unisec.rb ├── bin └── unisec ├── .yardopts ├── Rakefile ├── .editorconfig ├── test ├── test_versions.rb ├── test_confusables.rb ├── test_normalization.rb ├── test_rugrep.rb ├── test_surrogates.rb ├── test_hexdump.rb ├── test_size.rb ├── test_properties.rb └── test_bidi.rb ├── .github ├── dependabot.yml └── workflows │ └── ruby.yml ├── .rubocop.yml 
├── LICENSE ├── Gemfile ├── unisec.gemspec ├── Gemfile.lock └── README.md /docs/.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.tool-versions: -------------------------------------------------------------------------------- 1 | ruby 3.3.0 2 | -------------------------------------------------------------------------------- /docs-tools/.tool-versions: -------------------------------------------------------------------------------- 1 | nodejs 20.0.0 2 | -------------------------------------------------------------------------------- /docs/yard/css/common.css: -------------------------------------------------------------------------------- 1 | /* Override this file with custom rules */ -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Documentation 2 | docs-tools/node_modules 3 | .yardoc 4 | pkg 5 | -------------------------------------------------------------------------------- /docs/_media/unisec-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Acceis/Unisec/HEAD/docs/_media/unisec-logo.png -------------------------------------------------------------------------------- /docs/_media/unisec-favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Acceis/Unisec/HEAD/docs/_media/unisec-favicon.ico -------------------------------------------------------------------------------- /lib/unisec/version.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | module Unisec 4 | # Version of unisec library and app 5 | VERSION = '0.0.6' 6 | end 7 | 
-------------------------------------------------------------------------------- /docs/_navbar.md: -------------------------------------------------------------------------------- 1 | - [Home](/) 2 | - [Library documentation](https://acceis.github.io/unisec/yard/Unisec) 3 | - [Source](https://github.com/Acceis/unisec) 4 | -------------------------------------------------------------------------------- /bin/unisec: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env ruby 2 | # frozen_string_literal: true 3 | 4 | require 'unisec' 5 | require 'unisec/cli/cli' 6 | 7 | Dry::CLI.new(Unisec::CLI::Commands).call 8 | -------------------------------------------------------------------------------- /.yardopts: -------------------------------------------------------------------------------- 1 | --output-dir docs/yard 2 | --markup markdown 3 | --markup-provider commonmarker 4 | --plugin coderay 5 | - 6 | --main README.md 7 | docs/pages/*.md 8 | docs/CHANGELOG.md 9 | docs/about.md 10 | LICENSE 11 | -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'rake/testtask' 4 | require 'bundler/gem_tasks' 5 | 6 | Rake::TestTask.new(:test) do |t| 7 | t.libs << 'test' 8 | t.libs << 'lib' 9 | t.test_files = FileList['test/**/test_*.rb'] 10 | end 11 | 12 | desc 'Run tests' 13 | task default: :test 14 | -------------------------------------------------------------------------------- /docs/about.md: -------------------------------------------------------------------------------- 1 | # About 2 | 3 | ## Logo 4 | 5 | Logo made with [DesignEvo](https://www.designevo.com). 
6 | 7 | ## User documentation 8 | 9 | The user documentation is made with [docsify](https://docsify.js.org), the theme 10 | used is [docsify-themeable](https://jhildenbiddle.github.io/docsify-themeable) 11 | (Simple Dark scheme). 12 | -------------------------------------------------------------------------------- /docs/_coverpage.md: -------------------------------------------------------------------------------- 1 | logo 2 | 3 | # unisec 4 | 5 | > Unicode Security Toolkit 6 | 7 | [Quick start](pages/quick-start?id=quick-start) 8 | [Install](pages/install) 9 | [Usage](pages/usage) 10 | [Changelog](CHANGELOG) 11 | 12 | ![color](#ffffff) 13 | 14 | -------------------------------------------------------------------------------- /docs/_sidebar.md: -------------------------------------------------------------------------------- 1 | - Getting started 2 | 3 | - [Quick start](pages/quick-start.md) 4 | - [Installation](pages/install.md) 5 | - [Usage](pages/usage.md) 6 | 7 | - Development 8 | 9 | - [Documentation](pages/documentation.md) 10 | - [Publishing](pages/publishing.md) 11 | 12 | - [About](about.md) 13 | - [Changelog](CHANGELOG.md) 14 | -------------------------------------------------------------------------------- /lib/unisec.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'unisec/version' 4 | 5 | require 'unisec/bidi' 6 | require 'unisec/confusables' 7 | require 'unisec/hexdump' 8 | require 'unisec/normalization' 9 | require 'unisec/properties' 10 | require 'unisec/rugrep' 11 | require 'unisec/size' 12 | require 'unisec/surrogates' 13 | require 'unisec/versions' 14 | -------------------------------------------------------------------------------- /.editorconfig: -------------------------------------------------------------------------------- 1 | # EditorConfig: https://EditorConfig.org 2 | 3 | # top-most EditorConfig file 4 | root = true 5 | 6 | # Unix-style newlines with a 
newline ending every file 7 | [*] 8 | end_of_line = lf 9 | insert_final_newline = true 10 | 11 | # ruby 12 | [*.rb] 13 | charset = utf-8 14 | indent_style = space 15 | indent_size = 2 16 | trim_trailing_whitespace = true 17 | -------------------------------------------------------------------------------- /test/test_versions.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def test_unisec_versions_versions 8 | data = Unisec::Versions.versions 9 | assert_kind_of(Hash, data) 10 | data.each do |_k, v| 11 | assert(v.key?(:version)) 12 | assert(v.key?(:label)) 13 | end 14 | end 15 | end 16 | -------------------------------------------------------------------------------- /docs/pages/documentation.md: -------------------------------------------------------------------------------- 1 | # Documentation 2 | 3 | ## CLI doc 4 | 5 | See [Usage](pages/usage.md?id=cli). 6 | 7 | ### Serve locally 8 | 9 | ``` 10 | $ npm i docsify-cli gulp-cli -g 11 | $ cd docs-tools 12 | $ npm i 13 | $ gulp 14 | $ docsify serve ../docs 15 | ``` 16 | 17 | ## Library doc 18 | 19 | The library documentation is generated in the `docs/yard` directory. 20 | 21 | You can consult it online [here](https://acceis.github.io/unisec/yard/). 22 | 23 | ### Build & serve locally 24 | 25 | ``` 26 | $ bundle exec yard doc && bundle exec yard server 27 | ``` 28 | -------------------------------------------------------------------------------- /docs/pages/quick-start.md: -------------------------------------------------------------------------------- 1 | # Quick start 2 | 3 | ## Quick install 4 | 5 | ``` 6 | $ gem install unisec 7 | ``` 8 | 9 | ## Default usage: CLI 10 | 11 | Example: converting a surrogate pair to a code point.
12 | 13 | ``` 14 | $ unisec surrogates from 0xD801 0xDC37 15 | Char: 𐐷 16 | Code Point: 0x10437, 0d66615, 0b10000010000110111 17 | High Surrogate: 0xD801, 0d55297, 0b1101100000000001 18 | Low Surrogate: 0xDC37, 0d56375, 0b1101110000110111 19 | ``` 20 | 21 | ## Default usage: library 22 | 23 | ```ruby 24 | require 'unisec' 25 | 26 | surr = Unisec::Surrogates.new(55357, 56489) 27 | surr.code_point # => 128169 28 | ``` 29 | -------------------------------------------------------------------------------- /test/test_confusables.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def test_unisec_confusables_list 8 | data = Unisec::Confusables.list('!') 9 | assert_kind_of(Array, data) 10 | assert_kind_of(String, data.first) 11 | assert_equal(['!', 'ǃ', 'ⵑ', '‼', '⁉', '⁈'], data) 12 | end 13 | 14 | def test_unisec_confusables_randomize 15 | assert_kind_of(String, Unisec::Confusables.randomize('noraj')) 16 | # Should not fail when there is no confusable alternative 17 | assert_equal('é🚀', Unisec::Confusables.randomize('é🚀')) 18 | end 19 | end 20 | -------------------------------------------------------------------------------- /docs/yard/frames.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Documentation by YARD 0.9.36 6 | 7 | 18 | 22 | 23 | -------------------------------------------------------------------------------- /.github/dependabot.yml: -------------------------------------------------------------------------------- 1 | # To get started with Dependabot version updates, you'll need to specify which 2 | # package ecosystems to update and where the package manifests are located.
3 | # Please see the documentation for all configuration options: 4 | # https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file 5 | 6 | version: 2 7 | updates: 8 | - package-ecosystem: "bundler" 9 | directory: "/" 10 | schedule: 11 | interval: "daily" 12 | labels: 13 | - "dependency::update" 14 | - package-ecosystem: "npm" 15 | directory: "/docs-tools" 16 | schedule: 17 | interval: "daily" 18 | labels: 19 | - "dependency::update" 20 | -------------------------------------------------------------------------------- /test/test_normalization.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def test_unisec_normalization 8 | assert_equal("\u{1E9B 0323}", Unisec::Normalization.nfc("\u{1E9B 0323}")) 9 | assert_equal("\u{1E69}", Unisec::Normalization.nfkc("\u{1E9B 0323}")) 10 | assert_equal("\u{017F 0323 0307}", Unisec::Normalization.nfd("\u{1E9B 0323}")) 11 | assert_equal("\u{0073 0323 0307}", Unisec::Normalization.nfkd("\u{1E9B 0323}")) 12 | assert_equal("\u{2126}", Unisec::Normalization.new("\u{2126}").original) 13 | 14 | payload = "" 15 | assert_equal(payload, Unisec::Normalization.replace_bypass(payload).unicode_normalize(:nfkc)) 16 | end 17 | end 18 | -------------------------------------------------------------------------------- /test/test_rugrep.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def test_unisec_rugrep_regrep 8 | search = Unisec::Rugrep.regrep('large \w+ square') 9 | assert_kind_of(Array, search) 10 | assert(search.first.has_key?(:char)) 11 | assert(search.first.has_key?(:codepoint)) 12 | assert(search.first.has_key?(:name)) 13 | 
assert_kind_of(String, search.first[:char]) 14 | assert_kind_of(Integer, search.first[:codepoint]) 15 | assert_kind_of(String, search.first[:name]) 16 | search2 = Unisec::Rugrep.regrep('azerty') 17 | assert_kind_of(Array, search2) 18 | assert_empty(search2) 19 | end 20 | 21 | def test_unisec_rugrep_ucd_derivedname_version 22 | assert(/\A\d+\.\d+\.\d+\Z/.match?(Unisec::Rugrep.ucd_derivedname_version)) 23 | end 24 | end 25 | -------------------------------------------------------------------------------- /.rubocop.yml: -------------------------------------------------------------------------------- 1 | inherit_mode: 2 | merge: 3 | - Exclude 4 | AllCops: 5 | TargetRubyVersion: 3.0 6 | NewCops: enable 7 | Exclude: 8 | - 'test/*.rb' 9 | SuggestExtensions: false 10 | Layout/HashAlignment: 11 | Exclude: 12 | - '*.gemspec' 13 | Layout/LineLength: 14 | AllowedPatterns: 15 | - !ruby/regexp /\A\s*# / 16 | Lint/MissingSuper: 17 | Exclude: 18 | - 'lib/tls_map/cli/cli.rb' 19 | Metrics/AbcSize: 20 | Exclude: 21 | - 'lib/unisec/properties.rb' 22 | Metrics/ClassLength: 23 | Exclude: 24 | - 'lib/unisec/properties.rb' 25 | Metrics/MethodLength: 26 | Max: 20 27 | Exclude: 28 | - 'lib/unisec/properties.rb' 29 | Naming/MethodParameterName: 30 | Exclude: 31 | - 'lib/unisec/surrogates.rb' 32 | Style/Documentation: 33 | Exclude: 34 | - 'lib/unisec/cli/surrogates.rb' 35 | - 'lib/unisec/utils.rb' 36 | Gemspec/AddRuntimeDependency: 37 | Enabled: false # https://github.com/rubocop/rubocop/pull/13030#discussion_r1674791776 38 | -------------------------------------------------------------------------------- /docs/pages/publishing.md: -------------------------------------------------------------------------------- 1 | # Publishing 2 | 3 | Be sure all **tests** pass! 4 | 5 | ``` 6 | $ bundle exec rake test 7 | ``` 8 | 9 | Also check the **linter**: 10 | 11 | ``` 12 | $ bundle exec rubocop 13 | ``` 14 | 15 | Update the version in `lib/unisec/version.rb`. 
16 | 17 | Update the documentation, at least: 18 | 19 | - `README.md` 20 | - `docs/CHANGELOG.md` 21 | - `docs/pages/usage.md` 22 | 23 | On a new release, don't forget to rebuild the **library documentation**: 24 | 25 | ``` 26 | $ bundle exec yard doc 27 | ``` 28 | 29 | Create an **annotated git tag**: 30 | 31 | ``` 32 | $ git tag -a v1.5.0 33 | ``` 34 | 35 | Push the changes, including the tags: 36 | 37 | ``` 38 | $ git push --follow-tags 39 | ``` 40 | 41 | Build the **gem**: 42 | 43 | ``` 44 | $ gem build unisec.gemspec 45 | # or 46 | $ bundle exec rake build 47 | ``` 48 | 49 | Push the new gem release to **RubyGems**. See https://guides.rubygems.org/publishing/. 50 | 51 | ``` 52 | $ gem push unisec-1.5.0.gem 53 | ``` 54 | -------------------------------------------------------------------------------- /docs-tools/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "unisec docs", 3 | "version": "1.0.0", 4 | "description": "Documentation for unisec", 5 | "main": "gulpfile.mjs", 6 | "scripts": { 7 | "build-only": "gulp build", 8 | "build": "gulp" 9 | }, 10 | "repository": { 11 | "type": "git", 12 | "url": "git+https://github.com/Acceis/unisec.git" 13 | }, 14 | "keywords": [ 15 | "cybersecurity", 16 | "security", 17 | "infosec", 18 | "unicode" 19 | ], 20 | "author": "Alexandre ZANNI (ACCEIS)", 21 | "license": "Copyright", 22 | "bugs": { 23 | "url": "https://github.com/Acceis/unisec/issues" 24 | }, 25 | "homepage": "https://github.com/Acceis/unisec", 26 | "dependencies": { 27 | "@h-hg/docsify-image-caption": "^0.1.2", 28 | "del": "^7.1.0", 29 | "docsify": "^4.13.1", 30 | "docsify-sidebar-collapse": "^1.3.5", 31 | "docsify-tabs": "^1.6.3", 32 | "docsify-themeable": "^0.9.0", 33 | "gulp": "^5.0.0", 34 | "prismjs": "^1.29.0" 35 | } 36 | } 37 | -------------------------------------------------------------------------------- /lib/unisec/cli/rugrep.rb:
-------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'dry/cli' 4 | require 'unisec' 5 | 6 | module Unisec 7 | module CLI 8 | module Commands 9 | # CLI command `unisec grep` for the class {Unisec::Rugrep} from the lib. 10 | # 11 | # Example: 12 | # 13 | # ```plaintext 14 | # $ unisec grep 'FRENCH \w+' 15 | # U+20A3 ₣ FRENCH FRANC SIGN 16 | # U+1F35F 🍟 FRENCH FRIES 17 | # ``` 18 | class Grep < Dry::CLI::Command 19 | desc 'Search for Unicode code point names by regular expression' 20 | 21 | argument :regexp, required: true, 22 | desc: 'regular expression' 23 | 24 | # Search for Unicode code point names by regular expression. 25 | # @param regexp [String] Regular expression without delimiters or modifiers. 26 | # Supports everything Ruby Regexp supports. 27 | def call(regexp: nil, **) 28 | puts Unisec::Rugrep.regrep_display(regexp) 29 | end 30 | end 31 | end 32 | end 33 | end 34 | -------------------------------------------------------------------------------- /lib/unisec/cli/size.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'dry/cli' 4 | require 'unisec' 5 | 6 | module Unisec 7 | module CLI 8 | module Commands 9 | # CLI command `unisec size` for the class {Unisec::Size} from the lib. 10 | # 11 | # Example: 12 | # 13 | # ```plaintext 14 | # $ unisec size 🧑🏼‍🔬 15 | # Code point(s): 4 16 | # Grapheme(s): 1 17 | # UTF-8 byte(s): 15 18 | # UTF-16 byte(s): 14 19 | # UTF-32 byte(s): 16 20 | # UTF-8 unit(s): 15 21 | # UTF-16 unit(s): 7 22 | # UTF-32 unit(s): 4 23 | # ``` 24 | class Size < Dry::CLI::Command 25 | desc 'All kinds of size information about a Unicode string' 26 | 27 | argument :input, required: true, 28 | desc: 'String input' 29 | 30 | # All kinds of size information about a Unicode string.
31 | # @param input [String] Input string we want to know the size of 32 | def call(input: nil, **) 33 | puts Unisec::Size.new(input).display 34 | end 35 | end 36 | end 37 | end 38 | end 39 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Alexandre ZANNI at ACCEIS 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /lib/unisec/cli/versions.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'dry/cli' 4 | require 'unisec' 5 | 6 | module Unisec 7 | module CLI 8 | module Commands 9 | # CLI command `unisec versions` for the class {Unisec::Versions} from the lib.
10 | # 11 | # Example: 12 | # 13 | # ```plaintext 14 | # $ unisec versions 15 | # Unicode: 16 | # Unicode (Ruby) 15.0.0 17 | # Unicode (twitter_cldr gem) 14.0.0 18 | # Unicode (unicode-confusable gem) 15.0.0 19 | # ICU (twitter_cldr gem) 70.1 20 | # CLDR (twitter_cldr gem) 40 21 | # Unicode emoji (Ruby) 15.0 22 | # 23 | # Gems: 24 | # unisec 0.0.1 25 | # twitter_cldr gem 6.11.5 26 | # unicode-confusable gem 1.9.0 27 | # ``` 28 | class Versions < Dry::CLI::Command 29 | desc 'Version of anything related to Unicode as used in unisec' 30 | 31 | # Version of anything related to Unicode as used in unisec. 32 | def call(**) 33 | puts Unisec::Versions.display 34 | end 35 | end 36 | end 37 | end 38 | end 39 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | source 'https://rubygems.org' 4 | 5 | # Specify your gem's dependencies in .gemspec 6 | gemspec 7 | 8 | # Needed for the CLI only 9 | group :runtime, :cli do 10 | gem 'dry-cli', '~> 1.1' # for arg parsing 11 | gem 'paint', '~> 2.3' # for colorized output 12 | end 13 | 14 | # Needed for the CLI & library 15 | group :runtime, :all do 16 | gem 'ctf-party', '~> 3.0' # string conversion 17 | gem 'twitter_cldr', '~> 6.12' # ICU / CLDR 18 | gem 'unicode-confusable', '~> 1.10' # confusable chars 19 | end 20 | 21 | # Needed to install dependencies 22 | group :development, :install do 23 | gem 'bundler', '~> 2.1' 24 | end 25 | 26 | # Needed to run tests 27 | group :development, :test do 28 | gem 'minitest', '~> 5.24' 29 | gem 'minitest-skip', '~> 0.0' # skip dummy tests 30 | gem 'rake', '~> 13.2' 31 | end 32 | 33 | # Needed for linting 34 | group :development, :lint do 35 | gem 'rubocop', '~> 1.64' 36 | end 37 | 38 | group :development, :docs do 39 | gem 'commonmarker', '~> 0.23' # for markdown support in YARD 40 | gem 'webrick', '~> 1.8', '>= 1.8.1' # for yard server
41 | gem 'yard', ['>= 0.9.27', '< 0.10'] 42 | gem 'yard-coderay', '~> 0.1' # for syntax highlight support in YARD 43 | end 44 | -------------------------------------------------------------------------------- /.github/workflows/ruby.yml: -------------------------------------------------------------------------------- 1 | # This workflow uses actions that are not certified by GitHub. 2 | # They are provided by a third-party and are governed by 3 | # separate terms of service, privacy policy, and support 4 | # documentation. 5 | # This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake 6 | # For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby 7 | 8 | name: Ruby 9 | 10 | on: 11 | push: 12 | branches: [ master ] 13 | pull_request: 14 | branches: [ master ] 15 | 16 | jobs: 17 | test: 18 | 19 | runs-on: ubuntu-latest 20 | strategy: 21 | matrix: 22 | ruby-version: ['3.3', '3.2', '3.1', '3.0'] 23 | env: 24 | BUNDLE_WITHOUT: docs development # https://bundler.io/v1.5/groups.html 25 | steps: 26 | - uses: actions/checkout@v4 # https://github.com/actions/checkout 27 | - name: Set up Ruby 28 | # To automatically get bug fixes and new Ruby versions for ruby/setup-ruby, 29 | # change this to (see https://github.com/ruby/setup-ruby#versioning): 30 | uses: ruby/setup-ruby@v1 31 | with: 32 | ruby-version: ${{ matrix.ruby-version }} 33 | bundler-cache: true # runs 'bundle install' and caches installed gems automatically 34 | - name: Run tests 35 | run: bundle exec rake test 36 | - name: Run lint 37 | run: bundle exec rubocop 38 | -------------------------------------------------------------------------------- /lib/unisec/cli/cli.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'unisec/cli/bidi' 4 | require 'unisec/cli/confusables' 5 | require 'unisec/cli/hexdump' 6 | require 'unisec/cli/normalization' 7 | require 
'unisec/cli/properties' 8 | require 'unisec/cli/rugrep' 9 | require 'unisec/cli/size' 10 | require 'unisec/cli/surrogates' 11 | require 'unisec/cli/versions' 12 | 13 | module Unisec 14 | # Module used to create the CLI for the executable 15 | module CLI 16 | # Registered commands for the CLI 17 | module Commands 18 | extend Dry::CLI::Registry 19 | 20 | # Mapping between the (sub-)commands as seen by the user 21 | # on the command-line interface and the CLI modules in the lib 22 | register 'bidi spoof', Bidi::Spoof 23 | register 'confusables list', Confusables::List 24 | register 'confusables randomize', Confusables::Randomize 25 | register 'grep', Grep 26 | register 'hexdump', Hexdump 27 | register 'normalize all', Normalize::All 28 | register 'normalize replace', Normalize::Replace 29 | register 'properties char', Properties::Char 30 | register 'properties codepoints', Properties::Codepoints 31 | register 'properties list', Properties::List 32 | register 'size', Size 33 | register 'surrogates from', Surrogates::From 34 | register 'surrogates to', Surrogates::To 35 | register 'versions', Versions 36 | end 37 | end 38 | end 39 | -------------------------------------------------------------------------------- /test/test_surrogates.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def test_unisec_surrogates_equals_attrs 8 | surr1 = Unisec::Surrogates.new(128169) # 💩 9 | surr2 = Unisec::Surrogates.new(55357, 56489) # 💩 10 | 11 | assert_equal(surr1.cp, surr2.cp) 12 | assert_equal(surr1.hs, surr2.hs) 13 | assert_equal(surr1.ls, surr2.ls) 14 | assert_equal(surr1.code_point, surr2.code_point) 15 | assert_equal(surr1.high_surrogate, surr2.high_surrogate) 16 | assert_equal(surr1.low_surrogate, surr2.low_surrogate) 17 | end 18 | 19 | def test_unisec_surrogates_code_point 20 | assert_equal(0x10000, 
Unisec::Surrogates.code_point(0xD800, 0xDC00)) # 𐀀 21 | assert_equal(0x1FF00, Unisec::Surrogates.code_point(0xD83F, 0xDF00)) # 🼀 22 | assert_equal(0x1F600, Unisec::Surrogates.code_point(0xD83D, 0xDE00)) # 😀 23 | end 24 | 25 | def test_unisec_surrogates_high_surrogate 26 | assert_equal(0xD835, Unisec::Surrogates.high_surrogate(0x1D400)) # 𝐀 27 | assert_equal(0xD837, Unisec::Surrogates.high_surrogate(0x1DF00)) # 𝼀 28 | assert_equal(0xD83C, Unisec::Surrogates.high_surrogate(0x1F300)) # 🌀 29 | end 30 | 31 | def test_unisec_surrogates_low_surrogate 32 | assert_equal(0xDF00, Unisec::Surrogates.low_surrogate(0x10300)) # 𐜀 33 | assert_equal(0xDE00, Unisec::Surrogates.low_surrogate(0x11A00)) # 𑨀 34 | assert_equal(0xDD00, Unisec::Surrogates.low_surrogate(0x12500)) # 𒔀 35 | end 36 | end 37 | -------------------------------------------------------------------------------- /test/test_hexdump.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def test_unisec_hexdump_utf8 8 | assert_equal('f0 9f a6 93', Unisec::Hexdump.utf8('🦓')) 9 | assert_equal('f0 9f a7 91 e2 80 8d f0 9f 9a 80 20 67 6f 65 73 20 69 6e 20 f0 9f 9a 80 e2 9c a8', Unisec::Hexdump.utf8('🧑‍🚀 goes in 🚀✨')) 10 | end 11 | 12 | def test_unisec_hexdump_utf16be 13 | assert_equal('d83e dd93', Unisec::Hexdump.utf16be('🦓')) 14 | assert_equal('d83e ddd1 200d d83d de80 0020 0067 006f 0065 0073 0020 0069 006e 0020 d83d de80 2728', Unisec::Hexdump.utf16be('🧑‍🚀 goes in 🚀✨')) 15 | end 16 | 17 | def test_unisec_hexdump_utf16le 18 | assert_equal('3ed8 93dd', Unisec::Hexdump.utf16le('🦓')) 19 | assert_equal('3ed8 d1dd 0d20 3dd8 80de 2000 6700 6f00 6500 7300 2000 6900 6e00 2000 3dd8 80de 2827', Unisec::Hexdump.utf16le('🧑‍🚀 goes in 🚀✨')) 20 | end 21 | 22 | def test_unisec_hexdump_utf32be 23 | assert_equal('0001f993', Unisec::Hexdump.utf32be('🦓')) 24 | 
assert_equal('0001f9d1 0000200d 0001f680 00000020 00000067 0000006f 00000065 00000073 00000020 00000069 0000006e 00000020 0001f680 00002728', Unisec::Hexdump.utf32be('🧑‍🚀 goes in 🚀✨')) 25 | end 26 | 27 | def test_unisec_hexdump_utf32le 28 | assert_equal('93f90100', Unisec::Hexdump.utf32le('🦓')) 29 | assert_equal('d1f90100 0d200000 80f60100 20000000 67000000 6f000000 65000000 73000000 20000000 69000000 6e000000 20000000 80f60100 28270000', Unisec::Hexdump.utf32le('🧑‍🚀 goes in 🚀✨')) 30 | end 31 | end 32 | -------------------------------------------------------------------------------- /test/test_size.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: false 2 | 3 | require 'minitest/autorun' 4 | require 'unisec' 5 | 6 | class UnisecTest < Minitest::Test 7 | def setup 8 | @zwj = '👩‍❤️‍👩' 9 | @josé = 'J̲o̲s̲é̲' 10 | end 11 | 12 | def test_unisec_size_code_points_size 13 | assert_equal(6, Unisec::Size.code_points_size(@zwj)) 14 | assert_equal(9, Unisec::Size.code_points_size(@josé)) 15 | end 16 | 17 | def test_unisec_size_grapheme_size 18 | assert_equal(1, Unisec::Size.grapheme_size(@zwj)) 19 | assert_equal(4, Unisec::Size.grapheme_size(@josé)) 20 | end 21 | 22 | def test_unisec_size_utf8_bytesize 23 | assert_equal(20, Unisec::Size.utf8_bytesize(@zwj)) 24 | assert_equal(14, Unisec::Size.utf8_bytesize(@josé)) 25 | end 26 | 27 | def test_unisec_size_utf16_bytesize 28 | assert_equal(16, Unisec::Size.utf16_bytesize(@zwj)) 29 | assert_equal(18, Unisec::Size.utf16_bytesize(@josé)) 30 | end 31 | 32 | def test_unisec_size_utf32_bytesize 33 | assert_equal(24, Unisec::Size.utf32_bytesize(@zwj)) 34 | assert_equal(36, Unisec::Size.utf32_bytesize(@josé)) 35 | end 36 | 37 | def test_unisec_size_utf8_unitsize 38 | assert_equal(20, Unisec::Size.utf8_unitsize(@zwj)) 39 | assert_equal(14, Unisec::Size.utf8_unitsize(@josé)) 40 | end 41 | 42 | def test_unisec_size_utf16_unitsize 43 | assert_equal(8, 
Unisec::Size.utf16_unitsize(@zwj)) 44 | assert_equal(9, Unisec::Size.utf16_unitsize(@josé)) 45 | end 46 | 47 | def test_unisec_size_utf32_unitsize 48 | assert_equal(6, Unisec::Size.utf32_unitsize(@zwj)) 49 | assert_equal(9, Unisec::Size.utf32_unitsize(@josé)) 50 | end 51 | end 52 | -------------------------------------------------------------------------------- /lib/unisec/cli/hexdump.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'dry/cli' 4 | require 'unisec' 5 | 6 | module Unisec 7 | module CLI 8 | module Commands 9 | # CLI command `unisec hexdump` for the class {Unisec::Hexdump} from the lib. 10 | # 11 | # Example: 12 | # 13 | # ```plaintext 14 | # $ unisec hexdump "ACCEIS" 15 | # UTF-8: 41 43 43 45 49 53 16 | # UTF-16BE: 0041 0043 0043 0045 0049 0053 17 | # UTF-16LE: 4100 4300 4300 4500 4900 5300 18 | # UTF-32BE: 00000041 00000043 00000043 00000045 00000049 00000053 19 | # UTF-32LE: 41000000 43000000 43000000 45000000 49000000 53000000 20 | # 21 | # $ unisec hexdump "ACCEIS" --enc utf16le 22 | # 4100 4300 4300 4500 4900 5300 23 | # ``` 24 | class Hexdump < Dry::CLI::Command 25 | desc 'Hexdump in all Unicode encodings' 26 | 27 | argument :input, required: true, 28 | desc: 'String input. Read from STDIN if equal to -.' 29 | 30 | option :enc, default: nil, values: %w[utf8 utf16be utf16le utf32be utf32le], 31 | desc: 'Output only in the specified encoding.' 32 | 33 | # Hexdump of all Unicode encodings. 34 | # @param input [String] Input string to encode 35 | def call(input: nil, **options) 36 | input = $stdin.read.chomp if input == '-' 37 | if options[:enc].nil?
            puts Unisec::Hexdump.new(input).display
          else
            # using send() is safe here thanks to the value whitelist
            puts Unisec::Hexdump.send(options[:enc], input)
          end
        end
      end
    end
  end
end

--------------------------------------------------------------------------------
/docs/CHANGELOG.md:
--------------------------------------------------------------------------------
## [unreleased]

## [0.0.6]

**Features**

- _Prepare an XSS payload for HTML escape bypass (HTML escape followed by NFKC / NFKD normalization)_
  - Rename CLI command `normalize` to `normalize all`
  - Add a new method `replace_bypass` in the class `Unisec::Normalization`
  - Add a new CLI command `normalize replace` (using the new `replace_bypass` method)

## [0.0.5]

**Features**

- Add a new class `Unisec::Normalization` and CLI command `normalize` to output all normalization forms

**Chore**

- Enhance documentation
- Dependencies update

## [0.0.4]

**Features**

- Add a new class `Unisec::Bidi::Spoof` and CLI command `bidi spoof` to craft payloads for attacks using BiDi code points like RtLO, for example, for spoofing a domain name or a file name
- Add a new helper method: `Unisec::Utils::String.grapheme_reverse`: Reverse a string by graphemes (not by code points)
- Add an `--enc` option for `unisec hexdump` to output only in the specified encoding
- `unisec hexdump` can now read from STDIN if the input equals `-`

## [0.0.3]

**Features**

- Add a new class `Unisec::Rugrep` and CLI command `grep` to search for Unicode code point names by regular expression
- Add a new method `Unisec::Properties.deccp2stdhexcp`: Convert from decimal code point to standardized format hexadecimal code point

**Chore**

- Enhance tests: `assert_equal(true, test)` ➡️ `assert(test)`
- Enhance SEO: better description

## [0.0.2]

- Add 2 new classes (and corresponding CLI commands):
  - `Unisec::Versions`: Version of Unicode, ICU, CLDR, gems used in Unisec
  - `Unisec::Size`: Code point, grapheme, UTF-8/UTF-16/UTF-32 byte/unit size

## [0.0.1]

- Initial version

--------------------------------------------------------------------------------
/unisec.gemspec:
--------------------------------------------------------------------------------
# frozen_string_literal: true

require_relative 'lib/unisec/version'

Gem::Specification.new do |s|
  s.name = 'unisec'
  s.version = Unisec::VERSION
  s.platform = Gem::Platform::RUBY
  s.summary = 'Unicode Security Toolkit'
  s.description = 'Toolkit for security research manipulating Unicode: '
  s.description += 'confusables, homoglyphs, hexdump, code point, UTF-8, UTF-16, UTF-32, properties, regexp search, '
  s.description += 'size, grapheme, surrogates, version, ICU, CLDR, UCD, BiDi, normalization'
  s.authors = ['Alexandre ZANNI']
  s.email = 'alexandre.zanni@europe.com'
  s.homepage = 'https://github.com/Acceis/unisec'
  s.license = 'MIT'

  s.files = Dir['bin/*', 'lib/**/*.rb', 'data/*', 'LICENSE']
  s.bindir = 'bin'
  s.executables = s.files.grep(%r{^bin/}) { |f| File.basename(f) }
  s.require_paths = ['lib']

  s.metadata = {
    'yard.run' => 'yard',
    'bug_tracker_uri' => 'https://github.com/Acceis/unisec/issues',
    'changelog_uri' => 'https://github.com/Acceis/unisec/releases',
    'documentation_uri' => 'https://acceis.github.io/unisec/',
    'homepage_uri' => 'https://github.com/Acceis/unisec',
    'source_code_uri' => 'https://github.com/Acceis/unisec/',
    'rubygems_mfa_required' => 'true'
  }

  s.required_ruby_version = ['>= 3.0.0', '< 4.0']

  s.add_runtime_dependency('ctf-party', '~> 3.0') # string conversion
  s.add_runtime_dependency('dry-cli', '~> 1.0') # CLI
  s.add_runtime_dependency('paint', '~> 2.3') # colorized output
  s.add_runtime_dependency('twitter_cldr', '~> 6.11', '>= 6.11.5') # ICU / CLDR
  s.add_runtime_dependency('unicode-confusable', '~> 1.9') # confusable chars
end

--------------------------------------------------------------------------------
/test/test_properties.rb:
--------------------------------------------------------------------------------
# frozen_string_literal: false

require 'minitest/autorun'
require 'unisec'

class UnisecTest < Minitest::Test
  def test_unisec_properties_list
    assert_kind_of(Array, Unisec::Properties.list)
    assert_kind_of(String, Unisec::Properties.list.first)
  end

  def test_unisec_properties_codepoints
    cps = Unisec::Properties.codepoints('Quotation_Mark')
    assert_kind_of(Array, cps)
    assert_kind_of(Hash, cps.first)
    assert(cps.first.has_key?(:char))
    assert(cps.first.has_key?(:codepoint))
    assert(cps.first.has_key?(:name))
  end

  def test_unisec_properties_char
    data = Unisec::Properties.char('é')
    assert_kind_of(Hash, data)
    assert(data.has_key?(:age))
    assert(data.has_key?(:block))
    assert(data.has_key?(:category))
    assert(data.has_key?(:subcategory))
    assert(data.has_key?(:codepoint))
    assert(data.has_key?(:name))
    assert(data.has_key?(:script))
    assert(data.has_key?(:case))
    assert(data.has_key?(:normalization))
    assert(data.has_key?(:other_properties))
    assert_equal('LATIN SMALL LETTER E WITH ACUTE', data[:name])
    assert_equal('U+00E9', data[:codepoint])
  end

  def test_unisec_properties_char2codepoint
    assert_equal('U+00E9', Unisec::Properties.char2codepoint('é'))
    assert_equal('U+0041', Unisec::Properties.char2codepoint('AZ'))
  end

  def test_unisec_properties_chars2codepoints
    assert_equal('U+00E9', Unisec::Properties.chars2codepoints('é'))
    assert_equal('U+0041 U+005A', Unisec::Properties.chars2codepoints('AZ'))
  end

  def test_unisec_properties_deccp2stdhexcp
    assert_equal('U+1F680', Unisec::Properties.deccp2stdhexcp(128640))
    assert_equal('U+0020', Unisec::Properties.deccp2stdhexcp(32))
  end
end

--------------------------------------------------------------------------------
/docs/pages/install.md:
--------------------------------------------------------------------------------
# Installation

## Production

### **rubygems.org (universal)**

```
$ gem install unisec
```

Gem: [unisec](https://rubygems.org/gems/unisec)

### **BlackArch**

From the repository:

```
# pacman -S unisec
```

From git:

```
# blackman -i unisec
```

PKGBUILD: [unisec](https://github.com/BlackArch/blackarch/blob/master/packages/unisec/PKGBUILD)

### **ArchLinux**

Manually:

```
$ git clone https://aur.archlinux.org/unisec.git
$ cd unisec
$ makepkg -sic
```

With an AUR helper ([Pacman wrappers](https://wiki.archlinux.org/index.php/AUR_helpers#Pacman_wrappers)), e.g. pikaur:

```
$ pikaur -S unisec
```

AUR: [unisec](https://aur.archlinux.org/packages/unisec/)

## Development

It's better to use [ASDF-VM](https://asdf-vm.com/) to get the latest version of Ruby and to avoid trashing your system Ruby.

### **rubygems.org**

```
$ gem install --development unisec
```

### **git**

Just replace `x.x.x` with the gem version you see after `gem build`.

```
$ git clone https://github.com/acceis/unisec.git unisec
$ cd unisec
$ gem install bundler
$ bundler install
$ gem build unisec.gemspec
$ gem install unisec-x.x.x.gem
```

Note: if an automatic install is needed, you can get the version with `$ gem build unisec.gemspec | grep Version | cut -d' ' -f4`.

### **No install**

Run the library in irb without installing the gem.

From local file:

```
$ irb -Ilib -runisec
```

Same for the CLI tool:

```
$ ruby -Ilib -runisec bin/unisec
```

--------------------------------------------------------------------------------
/docs-tools/gulpfile.mjs:
--------------------------------------------------------------------------------
// Load plugins
import gulp from 'gulp';
const { series, parallel, src, dest, task } = gulp;

task('copy',
  parallel(
    docsify_js, docsify_plugins, docsify_themeable_css, docsify_themeable_js,
    docsify_tabs_js, docsify_image_caption_js, docsify_sidebar_collapse_js,
    prismjs_js
  )
);
task('copy').description = 'Copy dependencies';
task('default', series('copy'));
task('default').description = 'Default';

function docsify_js() {
  return src('node_modules/docsify/lib/docsify.min.js')
    .pipe(dest('../docs/vendor'));
};

function docsify_plugins() {
  return src(['node_modules/docsify/lib/plugins/emoji.min.js',
    'node_modules/docsify/lib/plugins/search.min.js'])
    .pipe(dest('../docs/vendor/plugins'));
};

function docsify_themeable_css() {
  return src(['node_modules/docsify-themeable/dist/css/theme-simple.css',
    'node_modules/docsify-themeable/dist/css/theme-simple-dark.css'])
    .pipe(dest('../docs/vendor/themes'));
};

function docsify_themeable_js() {
  return src('node_modules/docsify-themeable/dist/js/docsify-themeable.min.js')
    .pipe(dest('../docs/vendor/plugins'));
};

function docsify_tabs_js() {
  return src('node_modules/docsify-tabs/dist/docsify-tabs.min.js')
    .pipe(dest('../docs/vendor/plugins'));
};

function docsify_image_caption_js() {
  return src('node_modules/@h-hg/docsify-image-caption/dist/docsify-image-caption.min.js')
    .pipe(dest('../docs/vendor/plugins'));
};

function docsify_sidebar_collapse_js() {
  return src('node_modules/docsify-sidebar-collapse/dist/docsify-sidebar-collapse.min.js')
    .pipe(dest('../docs/vendor/plugins'));
};

function prismjs_js() {
  return src(['node_modules/prismjs/components/prism-ruby.min.js'])
    .pipe(dest('../docs/vendor/prismjs/components'));
};

--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/docs/vendor/plugins/docsify-image-caption.min.js:
--------------------------------------------------------------------------------
parcelRequire=function(e,r,t,n){var i,o="function"==typeof parcelRequire&&parcelRequire,u="function"==typeof require&&require;function f(t,n){if(!r[t]){if(!e[t]){var i="function"==typeof parcelRequire&&parcelRequire;if(!n&&i)return i(t,!0);if(o)return o(t,!0);if(u&&"string"==typeof t)return u(t);var c=new Error("Cannot find module '"+t+"'");throw c.code="MODULE_NOT_FOUND",c}p.resolve=function(r){return e[t][1][r]||r},p.cache={};var l=r[t]=new f.Module(t);e[t][0].call(l.exports,p,l,l.exports,this)}return r[t].exports;function p(e){return f(p.resolve(e))}}f.isParcelRequire=!0,f.Module=function(e){this.id=e,this.bundle=f,this.exports={}},f.modules=e,f.cache=r,f.parent=o,f.register=function(r,t){e[r]=[function(e,r){r.exports=t},{}]};for(var c=0;c

File: about — Documentation by YARD 0.9.36

About

Logo

Logo made with DesignEvo.

User documentation

The user documentation is made with docsify, the theme used is docsify-themeable (Simple Dark scheme).

--------------------------------------------------------------------------------
/Gemfile.lock:
--------------------------------------------------------------------------------
PATH
  remote: .
  specs:
    unisec (0.0.6)
      ctf-party (~> 3.0)
      dry-cli (~> 1.0)
      paint (~> 2.3)
      twitter_cldr (~> 6.11, >= 6.11.5)
      unicode-confusable (~> 1.9)

GEM
  remote: https://rubygems.org/
  specs:
    ast (2.4.2)
    camertron-eprun (1.1.1)
    cldr-plurals-runtime-rb (1.1.0)
    coderay (1.1.3)
    commonmarker (0.23.10)
    concurrent-ruby (1.3.3)
    ctf-party (3.0.0)
      docopt (~> 0.6)
      uri (>= 0.12.1, < 0.14.0)
    docopt (0.6.1)
    dry-cli (1.1.0)
    json (2.7.2)
    language_server-protocol (3.17.0.3)
    minitest (5.24.1)
    minitest-skip (0.0.3)
      minitest (~> 5.0)
    paint (2.3.0)
    parallel (1.25.1)
    parser (3.3.4.0)
      ast (~> 2.4.1)
      racc
    racc (1.8.0)
    rainbow (3.1.1)
    rake (13.2.1)
    regexp_parser (2.9.2)
    rexml (3.3.1)
      strscan
    rubocop (1.65.0)
      json (~> 2.3)
      language_server-protocol (>= 3.17.0)
      parallel (~> 1.10)
      parser (>= 3.3.0.2)
      rainbow (>= 2.2.2, < 4.0)
      regexp_parser (>= 2.4, < 3.0)
      rexml (>= 3.2.5, < 4.0)
      rubocop-ast (>= 1.31.1, < 2.0)
      ruby-progressbar (~> 1.7)
      unicode-display_width (>= 2.4.0, < 3.0)
    rubocop-ast (1.31.3)
      parser (>= 3.3.1.0)
    ruby-progressbar (1.13.0)
    strscan (3.1.0)
    twitter_cldr (6.12.1)
      camertron-eprun
      cldr-plurals-runtime-rb (~> 1.1)
      tzinfo
    tzinfo (2.0.6)
      concurrent-ruby (~> 1.0)
    unicode-confusable (1.10.0)
    unicode-display_width (2.5.0)
    uri (0.13.0)
    webrick (1.8.1)
    yard (0.9.36)
    yard-coderay (0.1.0)
      coderay
      yard

PLATFORMS
  x86_64-linux

DEPENDENCIES
  bundler (~> 2.1)
  commonmarker (~> 0.23)
  ctf-party (~> 3.0)
  dry-cli (~> 1.1)
  minitest (~> 5.24)
  minitest-skip (~> 0.0)
  paint (~> 2.3)
  rake (~> 13.2)
  rubocop (~> 1.64)
  twitter_cldr (~> 6.12)
  unicode-confusable (~> 1.10)
  unisec!
  webrick (~> 1.8, >= 1.8.1)
  yard (>= 0.9.27, < 0.10)
  yard-coderay (~> 0.1)

BUNDLED WITH
   2.5.6

--------------------------------------------------------------------------------
/lib/unisec/cli/confusables.rb:
--------------------------------------------------------------------------------
# frozen_string_literal: true

require 'dry/cli'
require 'unisec'
require 'unisec/utils'

module Unisec
  module CLI
    module Commands
      # CLI sub-commands `unisec confusables xxx` for the class {Unisec::Confusables} from the lib.
      module Confusables
        # Command `unisec confusables list`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec confusables list '!'
        # U+FF01 ! FULLWIDTH EXCLAMATION MARK
        # U+01C3 ǃ LATIN LETTER RETROFLEX CLICK
        # …
        # ```
        class List < Dry::CLI::Command
          desc 'List confusables characters for a given character'

          argument :character, required: true, desc: 'Unicode code point (as string)'
          option :map, default: true, values: %w[true false],
                       desc: 'Allows partial mapping, includes confusable where the given chart is a part of'

          # List confusable characters for a given character
          # @param character [String] the character to search confusables for
          # @option options [Boolean] :map allows partial mapping, includes confusables where the given character
          #   is a part of
          def call(character: nil, **options)
            to_bool = ->(str) { ['true', true].include?(str) }
            Unisec::Confusables.list_display(character, map: to_bool.call(options.fetch(:map)))
          end
        end

        # Command `unisec confusables randomize`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec confusables randomize noraj
        # Original: noraj
        # Transformed: ռ໐𝘳𝜶𝙟
        # …
        # ```
        class Randomize < Dry::CLI::Command
          desc 'Replace all characters from a string with random confusables when possible'

          argument :str, required: true, desc: 'Unicode string'

          # Replace all characters from a string with random confusables when possible
          # @param str [String] Unicode string
          def call(str: nil, **)
            Unisec::Confusables.randomize_display(str)
          end
        end
      end
    end
  end
end

--------------------------------------------------------------------------------
/lib/unisec/confusables.rb:
--------------------------------------------------------------------------------
# frozen_string_literal: true

require 'unicode/confusable'
require 'twitter_cldr'

module Unisec
  # Operations about Unicode confusable characters (homoglyphs).
  class Confusables
    # List confusable characters for a given character
    # @param chr [String] the character to search confusables for
    # @param map [Boolean] allows partial mapping, includes confusables where the given character is a part of
    # @return [Array] list of confusables
    # @example
    #   Unisec::Confusables.list('!') # => ["!", "ǃ", "ⵑ", "‼", "⁉", "⁈"]
    #   Unisec::Confusables.list('!', map: false) # => ["!", "ǃ", "ⵑ"]
    def self.list(chr, map: true)
      Unicode::Confusable.list(chr, map)
    end

    # Display a CLI-friendly output listing all confusables corresponding to a character (code point)
    # @param chr [String] the character to search confusables for
    # @param map [Boolean] allows partial mapping, includes confusables where the given character is a part of
    def self.list_display(chr, map: true)
      Confusables.list(chr, map: map).each do |confu|
        puts "#{Properties.char2codepoint(confu).ljust(9)} #{confu.ljust(4)} " \
             "#{TwitterCldr::Shared::CodePoint.get(confu.codepoints.first).name}"
      end
      nil
    end

    # Replace all characters with random confusables when possible.
    # @param str [String] Unicode string
    # @return [String] input randomized with confusables
    # @example
    #   Unisec::Confusables.randomize('noraj') # => "𝓃ⲟ𝓇𝒶j"
    #   Unisec::Confusables.randomize('noraj') # => "𝗻૦𝚛⍺𝐣"
    #   Unisec::Confusables.randomize('noraj') # => "𝔫𞺄𝕣⍺j"
    def self.randomize(str)
      out = ''
      str.each_char do |chr|
        confu = Confusables.list(chr, map: false).sample
        out += confu.nil? ? chr : confu
      end
      out
    end

    # Display a CLI-friendly output of a string where characters are replaced with random confusables
    # @param str [String] Unicode string
    def self.randomize_display(str)
      display = ->(key, value) { puts Paint[key, :red, :bold].ljust(23) + " #{value}" }
      display.call('Original:', str)
      display.call('Transformed:', Unisec::Confusables.randomize(str))
    end
  end
end

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# unisec

[![GitHub forks](https://img.shields.io/github/forks/acceis/unisec)](https://github.com/acceis/unisec/network)
[![GitHub stars](https://img.shields.io/github/stars/acceis/unisec)](https://github.com/acceis/unisec/stargazers)
[![GitHub license](https://img.shields.io/github/license/acceis/unisec)](https://github.com/acceis/unisec/blob/master/LICENSE)
[![Rawsec's CyberSecurity Inventory](https://inventory.raw.pm/img/badges/Rawsec-inventoried-FF5050_flat.svg)](https://inventory.raw.pm/tools.html#unisec)

![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/acceis/unisec/ruby.yml?branch=master)
![GitHub commit activity](https://img.shields.io/github/commit-activity/y/acceis/unisec)

![](https://acceis.github.io/unisec/_media/unisec-logo.png)

> Unicode Security Toolkit

## What is it?

A CLI tool and library to play with Unicode security.

## Features

- **BiDi spoofing**
  - Craft payloads for attacks using BiDi code points (e.g. spoofing a domain name or a file name)
- **Confusables / homoglyphs**
  - List confusable characters for a given character
  - Replace all characters from a string with random confusables
- **Hexdump**
  - UTF-8, UTF-16, UTF-32 hexadecimal dumps
- **Normalization**
  - NFC, NFKC, NFD, NFKD normalization forms, HTML escape bypass for XSS
- **Properties**
  - Get all properties of a given Unicode character
  - List code points matching a Unicode property
  - List all Unicode property names
- **Regexp search**
  - Search for Unicode code point names by regular expression
- **Size**
  - Code point, grapheme, UTF-8/UTF-16/UTF-32 byte/unit size
- **Surrogates**
  - Code point ↔️ Surrogates conversion
- **Versions**
  - Version of Unicode, ICU, CLDR, UCD, gems used in Unisec

## Installation

```plaintext
$ gem install unisec
```

Check the [installation](https://acceis.github.io/unisec/#/pages/install) page in the documentation to discover more methods.

[![Packaging status](https://repology.org/badge/vertical-allrepos/unisec.svg)](https://repology.org/project/unisec/versions)
[![Gem Version](https://badge.fury.io/rb/unisec.svg)](https://badge.fury.io/rb/unisec)
![GitHub tag (latest SemVer)](https://img.shields.io/github/tag/acceis/unisec)

## Documentation

Homepage / Documentation: https://acceis.github.io/unisec/

## Author

Made by Alexandre ZANNI ([@noraj](https://pwn.by/noraj/)) at [ACCEIS](https://www.acceis.fr/).
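The **Size** feature listed in the README distinguishes several ways of measuring one string. As a minimal sketch (Ruby core `String` methods only, not unisec's own implementation; `size_report` is a hypothetical helper), the metrics can be computed for the two strings used in `test/test_size.rb`, here spelled out with escapes so the combining marks and joiners are visible:

```ruby
# Sketch: Unicode size metrics using only Ruby core String methods.
# size_report is a hypothetical helper, not part of unisec's API.
def size_report(str)
  {
    code_points: str.codepoints.size,                 # Unicode scalar values
    graphemes: str.grapheme_clusters.size,            # user-perceived characters (UAX #29)
    utf8_bytes: str.encode('UTF-8').bytesize,         # UTF-8 code units are bytes
    utf16_units: str.encode('UTF-16LE').bytesize / 2, # 16-bit code units
    utf32_units: str.encode('UTF-32LE').bytesize / 4  # 32-bit code units
  }
end

# "J̲o̲s̲é̲": each letter is followed by U+0332 COMBINING LOW LINE, and the
# "é" is decomposed into "e" + U+0301 COMBINING ACUTE ACCENT.
jose = "J\u0332o\u0332s\u0332e\u0301\u0332"
# "👩‍❤️‍👩": woman, ZWJ, heavy black heart, VS16, ZWJ, woman
couple = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F469}"

report_jose = size_report(jose)
report_couple = size_report(couple)
report_jose.each { |metric, value| puts "#{metric}: #{value}" }
report_couple.each { |metric, value| puts "#{metric}: #{value}" }
```

For `jose` this yields 9 code points, 4 graphemes, 14 UTF-8 bytes and 9 UTF-16 units, which agrees with the test expectations. Grapheme segmentation depends on the Unicode version bundled with your Ruby, so emoji ZWJ sequences such as `couple` are the case most worth double-checking.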

--------------------------------------------------------------------------------
/docs/yard/top-level-namespace.html:
--------------------------------------------------------------------------------
Top Level Namespace — Documentation by YARD 0.9.36

Top Level Namespace

Defined Under Namespace

Modules: Unisec

Classes: Integer
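The `unisec surrogates from` and `unisec surrogates to` commands in the file that follows convert between a supplementary code point and its UTF-16 surrogate pair. A minimal sketch of the standard UTF-16 arithmetic (RFC 2781) in plain Ruby reproduces the values shown in that file's examples. The helpers `to_surrogates` and `from_surrogates` are hypothetical and are not the `Unisec::Surrogates` API:

```ruby
# Hypothetical helpers implementing the UTF-16 surrogate formulas (RFC 2781).
def to_surrogates(cp)
  raise ArgumentError, 'not a supplementary code point' unless (0x10000..0x10FFFF).cover?(cp)

  offset = cp - 0x10000                           # 20-bit value
  [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)] # top 10 bits, bottom 10 bits
end

def from_surrogates(high, low)
  0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
end

high, low = to_surrogates(0x1F4A9)
puts format('0x%X 0x%X', high, low)                  # prints "0xD83D 0xDCA9"
puts format('0x%X', from_surrogates(0xD801, 0xDC37)) # prints "0x10437"
```

Ruby's own UTF-16 encoder gives the same pair, since `"\u{1F4A9}".encode('UTF-16BE').unpack('n*')` returns `[0xD83D, 0xDCA9]`, which makes it a convenient cross-check.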
--------------------------------------------------------------------------------
/lib/unisec/cli/surrogates.rb:
--------------------------------------------------------------------------------
# frozen_string_literal: true

require 'dry/cli'
require 'unisec'
require 'unisec/utils'

module Unisec
  module CLI
    module Commands
      # CLI sub-commands `unisec surrogates xxx` for the class {Unisec::Surrogates} from the lib.
      module Surrogates
        # Command `unisec surrogates from`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec surrogates from 0xD801 0xDC37
        # Char: 𐐷
        # Code Point: 0x10437, 0d66615, 0b10000010000110111
        # High Surrogate: 0xD801, 0d55297, 0b1101100000000001
        # Low Surrogate: 0xDC37, 0d56375, 0b1101110000110111
        # ```
        class From < Dry::CLI::Command
          desc 'Code point ⬅️ Surrogates'

          argument :high, required: true,
                          desc: 'High surrogate (in hexadecimal (0xXXXX), decimal (0dXXXX), binary (0bXXXX) or as text)'
          argument :low, required: true,
                         desc: 'Low surrogate (in hexadecimal (0xXXXX), decimal (0dXXXX), binary (0bXXXX) or as text)'

          # Calculate the Unicode code point based on the surrogates.
          # @param high [String] high surrogate (hexadecimal, decimal, binary or as text)
          # @param low [String] low surrogate (hexadecimal, decimal, binary or as text)
          def call(high: nil, low: nil, **)
            puts Unisec::Surrogates.new(Unisec::Utils::String.convert(high, :integer),
                                        Unisec::Utils::String.convert(low, :integer)).display
          end
        end

        # Command `unisec surrogates to`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec surrogates to 0x1F4A9
        # Char: 💩
        # Code Point: 0x1F4A9, 0d128169, 0b11111010010101001
        # High Surrogate: 0xD83D, 0d55357, 0b1101100000111101
        # Low Surrogate: 0xDCA9, 0d56489, 0b1101110010101001
        # ```
        class To < Dry::CLI::Command
          desc 'Code point ➡️ Surrogates'

          argument :codepoint, required: true,
                               desc: 'One code point (character) (in hexadecimal (0xXXXX), decimal (0dXXXX), binary ' \
                                     '(0bXXXX) or as text)'

          # Calculate the surrogates based on the Unicode code point.
          # @param codepoint [String] code point (hexadecimal, decimal, binary or as text)
          def call(codepoint: nil, **)
            puts Unisec::Surrogates.new(Unisec::Utils::String.convert(codepoint, :integer)).display
          end
        end
      end
    end
  end
end

--------------------------------------------------------------------------------
/lib/unisec/cli/properties.rb:
--------------------------------------------------------------------------------
# frozen_string_literal: true

require 'dry/cli'
require 'unisec'
require 'unisec/utils'

module Unisec
  module CLI
    module Commands
      # CLI sub-commands `unisec properties xxx` for the class {Unisec::Properties} from the lib.
      module Properties
        # Command `unisec properties list`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec properties list
        # ASCII_Hex_Digit
        # Age
        # Alphabetic
        # …
        # ```
        class List < Dry::CLI::Command
          desc 'List all Unicode properties'

          # List Unicode property names
          def call(**)
            Unisec::Properties.list.each do |p|
              puts p
            end
          end
        end

        # Command `unisec properties codepoints`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec properties codepoints Bidi_Control
        # U+61C ؜ ARABIC LETTER MARK
        # …
        # ```
        class Codepoints < Dry::CLI::Command
          desc 'List all code points for a given property'

          argument :property, required: true, desc: 'Unicode property name'

          # List code points matching a Unicode property
          # @param property [String] property name
          def call(property: nil, **)
            Unisec::Properties.codepoints_display(property)
          end
        end

        # Command `unisec properties char`
        #
        # Example:
        #
        # ```plaintext
        # $ unisec properties char é
        # Name: LATIN SMALL LETTER E WITH ACUTE
        # Code Point: U+00E9
        #
        # Block: Latin-1 Supplement
        # …
        # ```
        class Char < Dry::CLI::Command
          desc 'Returns all properties of a given Unicode character (code point as string)'

          argument :character, required: true, desc: 'Unicode character'
          option :extended, default: false, values: %w[true false], desc: 'Show all properties'

          # Returns all properties of a given Unicode character (code point as string)
          # @param character [String] Unicode code point (as character / string)
          # @option options [Boolean] :extended Show all properties
          def call(character: nil, **options)
            to_bool = ->(str) { str == 'true' }
            Unisec::Properties.char_display(character, extended: to_bool.call(options.fetch(:extended)))
          end
        end
      end
    end
  end
end

--------------------------------------------------------------------------------
/test/test_bidi.rb:
--------------------------------------------------------------------------------
# frozen_string_literal: false

require 'minitest/autorun'
require 'unisec'

class UnisecTest < Minitest::Test
  def test_unisec_bidi_spoof_init
    input = 'https://moc.example.org//:sptth'
    bd = Unisec::Bidi::Spoof.new(input)
    rlo = "\u{202E}"
    pdf = "\u{202C}"

    assert_equal(input, bd.target_display)
    assert_kind_of(String, bd.spoof_string)
    assert_kind_of(String, bd.spoof_payload)
    refute_includes(bd.spoof_string, rlo)
    refute_includes(bd.spoof_string, pdf)
    assert_includes(bd.spoof_payload, rlo)
    assert_includes(bd.spoof_payload, pdf)
  end

  def test_unisec_bidi_spoof_init_options
    input = 'acceis'
    rli = "\u{2067}"
    pdi = "\u{2069}"

    bd1 = Unisec::Bidi::Spoof.new(input, prefix: rli, suffix: pdi)

    refute_includes(bd1.spoof_string, rli)
    refute_includes(bd1.spoof_string, pdi)
    assert_includes(bd1.spoof_payload, rli)
    assert_includes(bd1.spoof_payload, pdi)
  end

  def test_unisec_bidi_spoof_reverse
    text = "aeiou\u0308yz🇫🇷🐓👦🏻👋"
    # With default options, applying it twice should give back the original input
    assert_equal(text, Unisec::Bidi::Spoof.reverse(Unisec::Bidi::Spoof.reverse(text)))

    # sub-string reverse
    assert_equal('document_annexe.txt', Unisec::Bidi::Spoof.reverse('document_anntxt.exe', index: 12))
  end

  def test_unisec_bidi_spoof_bidi_affix
    # By default inject an RLO prefix, a PDF suffix and no infix.
    assert_equal('‮acceis‬', Unisec::Bidi::Spoof.bidi_affix('acceis'))
    # RLI ... PDI
    assert_equal('⁧acceis⁩', Unisec::Bidi::Spoof.bidi_affix('acceis', prefix: "\u{2067}", suffix: "\u{2069}"))
    # RLE ... PDF
    assert_equal('‫acceis‬', Unisec::Bidi::Spoof.bidi_affix('acceis', prefix: "\u{202B}", suffix: "\u{202C}"))
    # RLO ... PDF
    assert_equal('‮https://moc.example.org//:sptth‬',
                 Unisec::Bidi::Spoof.bidi_affix('https://moc.example.org//:sptth', prefix: "\u{202E}", suffix: "\u{202C}"))
    # FSI RLO ... PDF PDI
    assert_equal('⁨‮https://moc.example.org//:sptth‬⁩',
                 Unisec::Bidi::Spoof.bidi_affix('https://moc.example.org//:sptth', prefix: "\u{2068 202E}", suffix: "\u{202C 2069}"))
    # RLM ...
    assert_equal('‏unicode', Unisec::Bidi::Spoof.bidi_affix('unicode', prefix: "\u{200F}", suffix: ''))
    # For file name spoofing, it is useful to be able to inject just an RLO before the fake extension
    # so we can void the prefix and suffix and just set the position of an infix
    assert_equal('document_ann‮txt.exe',
                 Unisec::Bidi::Spoof.bidi_affix('document_anntxt.exe', prefix: '', suffix: '', infix_bidi: "\u{202E}", infix_pos: 12))
  end
end

--------------------------------------------------------------------------------
/docs/yard/Unisec/Utils.html:
--------------------------------------------------------------------------------
Module: Unisec::Utils — Documentation by YARD 0.9.36

Module: Unisec::Utils

Defined in: lib/unisec/utils.rb

Overview

Generic stuff not Unicode-related that can be re-used.

Defined Under Namespace

Modules: String

--------------------------------------------------------------------------------
/docs/yard/file.documentation.html:
--------------------------------------------------------------------------------
File: documentation — Documentation by YARD 0.9.36

Documentation

CLI doc

See Usage.

Serve locally

$ npm i docsify-cli gulp-cli -g
$ cd docs-tools
$ npm i
$ gulp
$ docsify serve ../docs

Library doc

The output directory of the library documentation will be docs/yard.

You can consult it online here.

Build & serve locally

$ bundle exec yard doc && bundle exec yard server
85 | 86 | -------------------------------------------------------------------------------- /docs/yard/file.LICENSE.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: LICENSE 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

MIT License

61 |

Copyright (c) 2023 Alexandre ZANNI at ACCEIS

62 |

Permission is hereby granted, free of charge, to any person obtaining a copy 63 | of this software and associated documentation files (the "Software"), to deal 64 | in the Software without restriction, including without limitation the rights 65 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 66 | copies of the Software, and to permit persons to whom the Software is 67 | furnished to do so, subject to the following conditions:

68 |

The above copyright notice and this permission notice shall be included in all 69 | copies or substantial portions of the Software.

70 |

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 71 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 72 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 73 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 74 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 75 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 76 | SOFTWARE.

77 |
78 | 79 | 84 | 85 |
86 | 87 | -------------------------------------------------------------------------------- /lib/unisec/cli/bidi.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'dry/cli' 4 | require 'unisec' 5 | require 'unisec/utils' 6 | 7 | module Unisec 8 | module CLI 9 | module Commands 10 | # CLI sub-commands `unisec bidi xxx` for the class {Unisec::Bidi} from the lib. 11 | module Bidi 12 | # Command `unisec bidi spoof` 13 | # 14 | # Example: 15 | # 16 | # ```plaintext 17 | # $ unisec bidi spoof noraj 18 | # Target string: noraj 19 | # Spoof payload (display) ⚠: ‮jaron‬ 20 | # Spoof string 🛈: jaron 21 | # Spoof payload (hex): e280ae6a61726f6ee280ac 22 | # Spoof payload (hex, escaped): \xe2\x80\xae\x6a\x61\x72\x6f\x6e\xe2\x80\xac 23 | # Spoof payload (base64): 4oCuamFyb27igKw= 24 | # Spoof payload (urlencode): %E2%80%AEjaron%E2%80%AC 25 | # Spoof payload (code points): U+202E U+006A U+0061 U+0072 U+006F U+006E U+202C 26 | # 27 | # 28 | # 29 | # ⚠: for the spoof payload to display correctly, be sure your VTE has RTL support, e.g. see https://wiki.archlinux.org/title/Bidirectional_text#Terminal. 30 | # 🛈: Does not contain the BiDi character (e.g. RtLO). 31 | # 32 | # $ unisec bidi spoof 'document_annexe.txt' --prefix '' --suffix '' --infix-bidi $'\U202E' --infix-pos 12 --light=true 33 | # document_ann‮txt.exe 34 | # ``` 35 | class Spoof < Dry::CLI::Command 36 | desc 'Craft a payload for BiDi attacks (for example, for spoofing a domain name or a file name)' 37 | 38 | argument :input, required: true, 39 | desc: 'String input' 40 | option :light, default: false, values: %w[true false], 41 | desc: 'true = light display (displays only the spoof payload for easy piping with other ' \ 42 | 'commands), false = full display' 43 | option :prefix, default: nil, desc: 'Prefix Bidi. Default: RLO (U+202E).' 44 | option :suffix, default: nil, desc: 'Suffix Bidi. Default: PDF (U+202C).' 
45 | option :infix_bidi, default: nil, desc: 'Bidi injected at a chosen position. Default: none (empty string).' 46 | option :infix_pos, default: nil, desc: 'Position (index in the input string) where the infix BiDi is injected' 47 | 48 | # Craft a payload for BiDi attacks 49 | # @param input [String] Input string to spoof 50 | # @param options [Hash] optional parameters, see {Unisec::Bidi::Spoof.bidi_affix} 51 | def call(input: nil, **options) 52 | to_bool = ->(str) { ['true', true].include?(str) } 53 | light = to_bool.call(options.fetch(:light)) 54 | infix_pos = options[:infix_pos].to_i unless options[:infix_pos].nil? 55 | puts Unisec::Bidi::Spoof.new(input, prefix: options[:prefix], suffix: options[:suffix], 56 | infix_bidi: options[:infix_bidi], 57 | infix_pos: infix_pos).display(light: light) 58 | end 59 | end 60 | end 61 | end 62 | end 63 | end 64 | -------------------------------------------------------------------------------- /docs/yard/file_list.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | File List 19 | 20 | 21 | 22 |
23 |
24 |

File List

25 |
26 | 27 | 28 | Classes 29 | 30 | 31 | 32 | Methods 33 | 34 | 35 | 36 | Files 37 | 38 | 39 |
40 | 41 | 42 |
43 | 44 | 94 |
95 | 96 | 97 | -------------------------------------------------------------------------------- /lib/unisec/versions.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'twitter_cldr' 4 | require 'unicode/confusable' 5 | require 'paint' 6 | 7 | module Unisec 8 | # Version information related to Unicode used in Unisec 9 | class Versions 10 | # Version and label of anything related to Unicode used in Unisec 11 | # @return [Hash] versions of each component 12 | # @example 13 | # Unisec::Versions.versions 14 | # # => 15 | # # {:unisec=>{:version=>"0.0.1", :label=>"unisec"}, 16 | # # … } 17 | def self.versions # rubocop:disable Metrics/MethodLength 18 | { 19 | unisec: { 20 | version: Unisec::VERSION, 21 | label: 'unisec' 22 | }, 23 | ruby_unicode: { 24 | version: RbConfig::CONFIG['UNICODE_VERSION'], 25 | label: 'Unicode (Ruby)' 26 | }, 27 | ruby_unicode_emoji: { 28 | version: RbConfig::CONFIG['UNICODE_EMOJI_VERSION'], 29 | label: 'Unicode emoji (Ruby)' 30 | }, 31 | twittercldr_cldr: { 32 | version: TwitterCldr::Versions::CLDR_VERSION, 33 | label: 'CLDR (twitter_cldr gem)' 34 | }, 35 | twittercldr_icu: { 36 | version: TwitterCldr::Versions::ICU_VERSION, 37 | label: 'ICU (twitter_cldr gem)' 38 | }, 39 | twittercldr_unicode: { 40 | version: TwitterCldr::Versions::UNICODE_VERSION, 41 | label: 'Unicode (twitter_cldr gem)' 42 | }, 43 | twittercldr: { 44 | version: TwitterCldr::VERSION, 45 | label: 'twitter_cldr gem' 46 | }, 47 | unicodeconfusable: { 48 | version: Unicode::Confusable::VERSION, 49 | label: 'unicode-confusable gem' 50 | }, 51 | unicodeconfusable_unicode: { 52 | version: Unicode::Confusable::UNICODE_VERSION, 53 | label: 'Unicode (unicode-confusable gem)' 54 | }, 55 | ucd_derivedname: { 56 | version: Unisec::Rugrep.ucd_derivedname_version, 57 | label: 'UCD (data/DerivedName.txt)' 58 | } 59 | } 60 | end 61 | 62 | # Display a CLI-friendly output of the version of anything related to 
Unicode used in unisec 63 | # @example 64 | # Unisec::Versions.display 65 | # # => 66 | # # Unicode: 67 | # # Unicode (Ruby) 15.0.0 68 | # # … 69 | # # 70 | # # Gems: 71 | # # unisec 0.0.1 72 | # # … 73 | def self.display # rubocop:disable Metrics/AbcSize 74 | data = versions 75 | colorize = ->(node) { Paint[data[node][:label], :red, :bold].ljust(44) + " #{data[node][:version]}\n" } 76 | Paint["Unicode:\n", :underline] + 77 | colorize.call(:ruby_unicode) + 78 | colorize.call(:twittercldr_unicode) + 79 | colorize.call(:unicodeconfusable_unicode) + 80 | colorize.call(:twittercldr_icu) + 81 | colorize.call(:twittercldr_cldr) + 82 | colorize.call(:ruby_unicode_emoji) + 83 | colorize.call(:ucd_derivedname) + 84 | Paint["\nGems:\n", :underline] + 85 | colorize.call(:unisec) + 86 | colorize.call(:twittercldr) + 87 | colorize.call(:unicodeconfusable) 88 | end 89 | end 90 | end 91 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/cli/cli.rb,
82 | lib/unisec/cli/bidi.rb,
lib/unisec/cli/size.rb,
lib/unisec/cli/rugrep.rb,
lib/unisec/cli/hexdump.rb,
lib/unisec/cli/versions.rb,
lib/unisec/cli/properties.rb,
lib/unisec/cli/surrogates.rb,
lib/unisec/cli/confusables.rb,
lib/unisec/cli/normalization.rb
83 |
84 |
85 | 86 |
87 | 88 |

Overview

89 |
90 |

Module used to create the CLI for the executable

91 | 92 | 93 |
94 |
95 |
96 | 97 | 98 |

Defined Under Namespace

99 |

100 | 101 | 102 | Modules: Commands 103 | 104 | 105 | 106 | 107 |

108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 |
118 | 119 | 124 | 125 |
126 | 127 | -------------------------------------------------------------------------------- /docs/yard/Unisec/Bidi.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Class: Unisec::Bidi 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Class: Unisec::Bidi 63 | 64 | 65 | 66 |

67 |
68 | 69 |
70 |
Inherits:
71 |
72 | Object 73 | 74 |
    75 |
  • Object
  • 76 | 77 | 78 | 79 |
80 | show all 81 | 82 |
83 |
84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 |
96 |
Defined in:
97 |
lib/unisec/bidi.rb
98 |
99 | 100 |
101 | 102 |

Overview

103 |
104 |

Manipulation of bidirectional related content
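BiDi spoofing wraps the target string in BiDi control characters (a prefix and a suffix). It can also inject a single control character at a chosen index (an infix). The plain-Ruby sketch below reproduces the expectations of the `Unisec::Bidi::Spoof.bidi_affix` test cases shown earlier. Its RLO/PDF defaults follow the `unisec bidi spoof` option descriptions. The method name `bidi_affix_sketch` is hypothetical, and this is not the library's implementation.

```ruby
RLO = "\u202E" # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C" # POP DIRECTIONAL FORMATTING

# Hypothetical re-implementation: prefix + input (optionally with a BiDi
# character inserted at index infix_pos) + suffix.
def bidi_affix_sketch(input, prefix: RLO, suffix: PDF, infix_bidi: nil, infix_pos: nil)
  body = input.dup # never mutate the caller's string
  body.insert(infix_pos, infix_bidi) if infix_bidi && infix_pos
  "#{prefix}#{body}#{suffix}"
end

# RLO ... PDF: "jaron" is stored as-is but rendered right-to-left, as "noraj"
bidi_affix_sketch('jaron')
# File name spoofing: no prefix/suffix, just an RLO before the fake extension
bidi_affix_sketch('document_anntxt.exe', prefix: '', suffix: '', infix_bidi: RLO, infix_pos: 12)
```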

105 | 106 | 107 |
108 |
109 |
110 | 111 | 112 |

Defined Under Namespace

113 |

114 | 115 | 116 | 117 | 118 | Classes: Spoof 119 | 120 | 121 |

122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 |
132 | 133 | 138 | 139 |
140 | 141 | -------------------------------------------------------------------------------- /docs/vendor/prismjs/components/prism-ruby.min.js: -------------------------------------------------------------------------------- 1 | !function(e){e.languages.ruby=e.languages.extend("clike",{comment:{pattern:/#.*|^=begin\s[\s\S]*?^=end/m,greedy:!0},"class-name":{pattern:/(\b(?:class|module)\s+|\bcatch\s+\()[\w.\\]+|\b[A-Z_]\w*(?=\s*\.\s*new\b)/,lookbehind:!0,inside:{punctuation:/[.\\]/}},keyword:/\b(?:BEGIN|END|alias|and|begin|break|case|class|def|define_method|defined|do|each|else|elsif|end|ensure|extend|for|if|in|include|module|new|next|nil|not|or|prepend|private|protected|public|raise|redo|require|rescue|retry|return|self|super|then|throw|undef|unless|until|when|while|yield)\b/,operator:/\.{2,3}|&\.|===||[!=]?~|(?:&&|\|\||<<|>>|\*\*|[+\-*/%<>!^&|=])=?|[?:]/,punctuation:/[(){}[\].,;]/}),e.languages.insertBefore("ruby","operator",{"double-colon":{pattern:/::/,alias:"punctuation"}});var n={pattern:/((?:^|[^\\])(?:\\{2})*)#\{(?:[^{}]|\{[^{}]*\})*\}/,lookbehind:!0,inside:{content:{pattern:/^(#\{)[\s\S]+(?=\}$)/,lookbehind:!0,inside:e.languages.ruby},delimiter:{pattern:/^#\{|\}$/,alias:"punctuation"}}};delete e.languages.ruby.function;var 
t="(?:"+["([^a-zA-Z0-9\\s{(\\[<=])(?:(?!\\1)[^\\\\]|\\\\[^])*\\1","\\((?:[^()\\\\]|\\\\[^]|\\((?:[^()\\\\]|\\\\[^])*\\))*\\)","\\{(?:[^{}\\\\]|\\\\[^]|\\{(?:[^{}\\\\]|\\\\[^])*\\})*\\}","\\[(?:[^\\[\\]\\\\]|\\\\[^]|\\[(?:[^\\[\\]\\\\]|\\\\[^])*\\])*\\]","<(?:[^<>\\\\]|\\\\[^]|<(?:[^<>\\\\]|\\\\[^])*>)*>"].join("|")+")",i='(?:"(?:\\\\.|[^"\\\\\r\n])*"|(?:\\b[a-zA-Z_]\\w*|[^\\s\0-\\x7F]+)[?!]?|\\$.)';e.languages.insertBefore("ruby","keyword",{"regex-literal":[{pattern:RegExp("%r"+t+"[egimnosux]{0,6}"),greedy:!0,inside:{interpolation:n,regex:/[\s\S]+/}},{pattern:/(^|[^/])\/(?!\/)(?:\[[^\r\n\]]+\]|\\.|[^[/\\\r\n])+\/[egimnosux]{0,6}(?=\s*(?:$|[\r\n,.;})#]))/,lookbehind:!0,greedy:!0,inside:{interpolation:n,regex:/[\s\S]+/}}],variable:/[@$]+[a-zA-Z_]\w*(?:[?!]|\b)/,symbol:[{pattern:RegExp("(^|[^:]):"+i),lookbehind:!0,greedy:!0},{pattern:RegExp("([\r\n{(,][ \t]*)"+i+"(?=:(?!:))"),lookbehind:!0,greedy:!0}],"method-definition":{pattern:/(\bdef\s+)\w+(?:\s*\.\s*\w+)?/,lookbehind:!0,inside:{function:/\b\w+$/,keyword:/^self\b/,"class-name":/^\w+/,punctuation:/\./}}}),e.languages.insertBefore("ruby","string",{"string-literal":[{pattern:RegExp("%[qQiIwWs]?"+t),greedy:!0,inside:{interpolation:n,string:/[\s\S]+/}},{pattern:/("|')(?:#\{[^}]+\}|#(?!\{)|\\(?:\r\n|[\s\S])|(?!\1)[^\\#\r\n])*\1/,greedy:!0,inside:{interpolation:n,string:/[\s\S]+/}},{pattern:/<<[-~]?([a-z_]\w*)[\r\n](?:.*[\r\n])*?[\t ]*\1/i,alias:"heredoc-string",greedy:!0,inside:{delimiter:{pattern:/^<<[-~]?[a-z_]\w*|\b[a-z_]\w*$/i,inside:{symbol:/\b\w+/,punctuation:/^<<[-~]?/}},interpolation:n,string:/[\s\S]+/}},{pattern:/<<[-~]?'([a-z_]\w*)'[\r\n](?:.*[\r\n])*?[\t 
]*\1/i,alias:"heredoc-string",greedy:!0,inside:{delimiter:{pattern:/^<<[-~]?'[a-z_]\w*'|\b[a-z_]\w*$/i,inside:{symbol:/\b\w+/,punctuation:/^<<[-~]?'|'$/}},string:/[\s\S]+/}}],"command-literal":[{pattern:RegExp("%x"+t),greedy:!0,inside:{interpolation:n,command:{pattern:/[\s\S]+/,alias:"string"}}},{pattern:/`(?:#\{[^}]+\}|#(?!\{)|\\(?:\r\n|[\s\S])|[^\\`#\r\n])*`/,greedy:!0,inside:{interpolation:n,command:{pattern:/[\s\S]+/,alias:"string"}}}]}),delete e.languages.ruby.string,e.languages.insertBefore("ruby","number",{builtin:/\b(?:Array|Bignum|Binding|Class|Continuation|Dir|Exception|FalseClass|File|Fixnum|Float|Hash|IO|Integer|MatchData|Method|Module|NilClass|Numeric|Object|Proc|Range|Regexp|Stat|String|Struct|Symbol|TMS|Thread|ThreadGroup|Time|TrueClass)\b/,constant:/\b[A-Z][A-Z0-9_]*(?:[?!]|\b)/}),e.languages.rb=e.languages.ruby}(Prism); -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Bidi.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI::Commands::Bidi 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI::Commands::Bidi 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/cli/bidi.rb
82 |
83 | 84 |
85 | 86 |

Overview

87 |
88 |

CLI sub-commands unisec bidi xxx for the class Bidi from the lib.
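Besides the raw payload, `unisec bidi spoof` prints several encodings of it: hex, escaped hex, Base64, URL encoding and code points. The Ruby standard library can reproduce these representations. The sketch below shows one way to compute them and is not the command's actual code; the method name `spoof_representations_sketch` is hypothetical. For the RLO + `jaron` + PDF payload, it returns the same values as the example in `lib/unisec/cli/bidi.rb`.

```ruby
require 'uri'

# Hypothetical helper: the encodings displayed for a spoof payload.
def spoof_representations_sketch(payload)
  {
    hex: payload.unpack1('H*'),                                    # raw UTF-8 bytes as hex
    hex_escaped: payload.bytes.map { |b| format('\\x%02x', b) }.join,
    base64: [payload].pack('m0'),                                  # strict Base64, no newline
    urlencode: URI.encode_www_form_component(payload),
    code_points: payload.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
  }
end

spoof_representations_sketch("\u202Ejaron\u202C")[:base64] # => "4oCuamFyb27igKw="
```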

89 | 90 | 91 |
92 |
93 |
94 | 95 | 96 |

Defined Under Namespace

97 |

98 | 99 | 100 | 101 | 102 | Classes: Spoof 103 | 104 | 105 |

106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 |
116 | 117 | 122 | 123 |
124 | 125 | -------------------------------------------------------------------------------- /docs/yard/file.quick-start.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: quick-start 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

Quick start

61 |

Quick install

62 |
$ gem install unisec
63 | 
64 |

Default usage: CLI

65 |

Example: converting surrogates to a code point.

66 |
$ unisec surrogates from 0xD801 0xDC37
67 | Char: 𐐷
68 | Code Point: 0x10437, 0d66615, 0b10000010000110111
69 | High Surrogate: 0xD801, 0d55297, 0b1101100000000001
70 | Low Surrogate: 0xDC37, 0d56375, 0b1101110000110111
71 | 
72 |

Default usage: library

73 |
require 'unisec'
74 | 
75 | surr = Unisec::Surrogates.new(55357, 56489)
76 | surr.code_point # => 128169
77 | 
78 |
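Both quick-start examples rely on the standard UTF-16 surrogate-pair decoding. Subtract 0xD800 from the high surrogate and 0xDC00 from the low one. Then join the two 10-bit values and add 0x10000. Below is a minimal plain-Ruby sketch. The method name `code_point_from_surrogates` is hypothetical and not part of Unisec's API.

```ruby
# Hypothetical helper: decode a UTF-16 surrogate pair into a code point.
def code_point_from_surrogates(high, low)
  raise ArgumentError, 'high surrogate must be in 0xD800..0xDBFF' unless (0xD800..0xDBFF).cover?(high)
  raise ArgumentError, 'low surrogate must be in 0xDC00..0xDFFF' unless (0xDC00..0xDFFF).cover?(low)

  ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000
end

code_point_from_surrogates(55357, 56489)   # => 128169 (U+1F4A9)
cp = code_point_from_surrogates(0xD801, 0xDC37)
format('0x%X, 0d%d, 0b%b', cp, cp, cp)     # => "0x10437, 0d66615, 0b10000010000110111"
[cp].pack('U')                             # => "𐐷"
```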
79 | 80 | 85 | 86 |
87 | 88 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Surrogates.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI::Commands::Surrogates 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI::Commands::Surrogates 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/cli/surrogates.rb
82 |
83 | 84 |
85 | 86 |

Overview

87 |
88 |

CLI sub-commands unisec surrogates xxx for the class Surrogates from the lib.
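The `To` direction is the inverse computation. Subtract 0x10000 from a supplementary-plane code point and split the remaining 20 bits in two. The high 10 bits are added to 0xD800 and the low 10 bits to 0xDC00. Below is a minimal sketch with a hypothetical method name, cross-checked against Ruby's own UTF-16BE encoder.

```ruby
# Hypothetical helper: encode a code point above U+FFFF as a UTF-16 surrogate pair.
def surrogates_from_code_point(code_point)
  unless (0x10000..0x10FFFF).cover?(code_point)
    raise ArgumentError, 'only code points U+10000..U+10FFFF are encoded with surrogates'
  end

  offset = code_point - 0x10000
  [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
end

surrogates_from_code_point(0x1F4A9)         # => [55357, 56489], i.e. [0xD83D, 0xDCA9]
"\u{1F4A9}".encode('UTF-16BE').unpack('n*') # => [55357, 56489], same pair from Ruby's encoder
```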

89 | 90 | 91 |
92 |
93 |
94 | 95 | 96 |

Defined Under Namespace

97 |

98 | 99 | 100 | 101 | 102 | Classes: From, To 103 | 104 | 105 |

106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 |
116 | 117 | 122 | 123 |
124 | 125 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Normalize.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI::Commands::Normalize 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI::Commands::Normalize 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/cli/normalization.rb
82 |
83 | 84 |
85 | 86 |

Overview

87 |
88 |

CLI sub-commands unisec normalize xxx for the class Normalization from the lib.
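Ruby's built-in `String#unicode_normalize` covers the four normalization forms. That is enough to check the `ẛ̣` (U+1E9B U+0323) example shown for `unisec normalize all`. The second half sketches the idea behind `unisec normalize replace`. It swaps `<`, `>`, `"` and `'` for the compatibility look-alikes U+FE64, U+FE65, U+FF02 and U+FF07 listed in that command's example. An HTML escaper leaves those look-alikes untouched, but NFKC/NFKD folds them back to ASCII. The helper names are hypothetical. The scenario assumes an application that HTML-escapes its input first and normalizes it afterwards.

```ruby
require 'cgi'

# Code points of a string, formatted as U+XXXX.
def code_points(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
end

input = "\u{1E9B 0323}"
%i[nfc nfkc nfd nfkd].each do |form|
  puts "#{form.upcase}: #{code_points(input.unicode_normalize(form))}"
end
# NFC: U+1E9B U+0323 / NFKC: U+1E69 / NFD: U+017F U+0323 U+0307 / NFKD: U+0073 U+0323 U+0307

# Hypothetical mapping: characters targeted by HTML escaping => compatibility
# characters whose NFKC/NFKD decomposition is the ASCII original.
HTML_LOOKALIKES = { '<' => "\u{FE64}", '>' => "\u{FE65}", '"' => "\u{FF02}", "'" => "\u{FF07}" }.freeze

def html_escape_bypass_sketch(payload)
  payload.gsub(/[<>"']/, HTML_LOOKALIKES)
end

payload = %q(<svg onload="alert('XSS')">)
bypass  = html_escape_bypass_sketch(payload)
escaped = CGI.escapeHTML(bypass) # nothing left for the escaper to touch
puts escaped.unicode_normalize(:nfkc) == payload # => true: normalization restores the payload
```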

89 | 90 | 91 |
92 |
93 |
94 | 95 | 96 |

Defined Under Namespace

97 |

98 | 99 | 100 | 101 | 102 | Classes: All, Replace 103 | 104 | 105 |

106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 |
116 | 117 | 122 | 123 |
124 | 125 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Confusables.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI::Commands::Confusables 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI::Commands::Confusables 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/cli/confusables.rb
82 |
83 | 84 |
85 | 86 |

Overview

87 |
88 |

CLI sub-commands unisec confusables xxx for the class Unisec::Confusables from the lib.
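Confusables are characters from other scripts that render like the original ones, for example Cyrillic `а` (U+0430) and Latin `a`. A tiny hand-picked table is enough to sketch the randomize idea. Unisec's real data comes from the `unicode-confusable` gem it depends on (see `lib/unisec/versions.rb`). The table, `CONFUSABLES_SKETCH` and `randomize_sketch` below are hypothetical illustrations only.

```ruby
# Hypothetical, deliberately tiny table: Latin letter => look-alikes from other scripts.
CONFUSABLES_SKETCH = {
  'a' => ["\u0430"],           # CYRILLIC SMALL LETTER A
  'c' => ["\u0441"],           # CYRILLIC SMALL LETTER ES
  'e' => ["\u0435"],           # CYRILLIC SMALL LETTER IE
  'o' => ["\u043E", "\u03BF"], # CYRILLIC SMALL LETTER O, GREEK SMALL LETTER OMICRON
  'p' => ["\u0440"]            # CYRILLIC SMALL LETTER ER
}.freeze

# Replace every character that has a known look-alike with a random one.
def randomize_sketch(str, rng: Random.new)
  str.each_char.map do |char|
    lookalikes = CONFUSABLES_SKETCH.fetch(char, [])
    lookalikes.empty? ? char : lookalikes.sample(random: rng)
  end.join
end

spoofed = randomize_sketch('acceis', rng: Random.new(1))
spoofed == 'acceis' # => false: same look, different code points
spoofed.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
# => "U+0430 U+0441 U+0441 U+0435 U+0069 U+0073"
```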

89 | 90 | 91 |
92 |
93 |
94 | 95 | 96 |

Defined Under Namespace

97 |

98 | 99 | 100 | 101 | 102 | Classes: List, Randomize 103 | 104 | 105 |

106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 |
116 | 117 | 122 | 123 |
124 | 125 | -------------------------------------------------------------------------------- /docs/pages/usage.md: -------------------------------------------------------------------------------- 1 | # Usage 2 | 3 | ## CLI 4 | 5 | ### General help 6 | 7 | List commands: 8 | 9 | ``` 10 | $ unisec --help 11 | Commands: 12 | unisec bidi [SUBCOMMAND] 13 | unisec confusables [SUBCOMMAND] 14 | unisec grep REGEXP # Search for Unicode code point names by regular expression 15 | unisec hexdump INPUT # Hexdump in all Unicode encodings 16 | unisec properties [SUBCOMMAND] 17 | unisec size INPUT # All kinds of size information about a Unicode string 18 | unisec surrogates [SUBCOMMAND] 19 | unisec versions # Version of anything related to Unicode as used in unisec 20 | ``` 21 | 22 | List sub-commands: 23 | 24 | ``` 25 | $ unisec surrogates --help 26 | Commands: 27 | unisec surrogates from HIGH LOW # Code point ⬅️ Surrogates 28 | unisec surrogates to NAME # Code point ➡️ Surrogates 29 | ``` 30 | 31 | Sub-command help: 32 | 33 | ``` 34 | $ unisec surrogates from --help 35 | Command: 36 | unisec surrogates from 37 | 38 | Usage: 39 | unisec surrogates from HIGH LOW 40 | 41 | Description: 42 | Code point ⬅️ Surrogates 43 | 44 | Arguments: 45 | HIGH # REQUIRED High surrogate (in hexadecimal (0xXXXX), decimal (0dXXXX), binary (0bXXXX) or as text) 46 | LOW # REQUIRED Low surrogate (in hexadecimal (0xXXXX), decimal (0dXXXX), binary (0bXXXX) or as text) 47 | 48 | Options: 49 | --help, -h # Print this help 50 | ``` 51 | 52 | ### Examples 53 | 54 | - **BiDi** 55 | - [Spoof](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Bidi/Spoof) 56 | - **Confusables** 57 | - [List](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Confusables/List) 58 | - [Randomize](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Confusables/Randomize) 59 | - [Grep](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Grep) 60 | -
[Hexdump](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Hexdump) 61 | - **Normalize** 62 | - [All](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Normalize/All) 63 | - [Replace](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Normalize/Replace) 64 | - **Properties** 65 | - [Char](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Properties/Char) 66 | - [Codepoints](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Properties/Codepoints) 67 | - [List](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Properties/List) 68 | - [Size](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Size) 69 | - **Surrogates** 70 | - [From](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Surrogates/From) 71 | - [To](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Surrogates/To) 72 | - [Versions](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands/Versions) 73 | 74 | [Library documentation for commands](https://acceis.github.io/unisec/yard/Unisec/CLI/Commands). 75 | 76 | ## Library 77 | 78 | See examples in [the library documentation](https://acceis.github.io/unisec/yard/Unisec). 
79 | 80 | - [Unisec::Bidi](https://acceis.github.io/unisec/yard/Unisec/Bidi) 81 | - [Unisec::Confusables](https://acceis.github.io/unisec/yard/Unisec/Confusables) 82 | - [Unisec::Hexdump](https://acceis.github.io/unisec/yard/Unisec/Hexdump) 83 | - [Unisec::Normalization](https://acceis.github.io/unisec/yard/Unisec/Normalization) 84 | - [Unisec::Properties](https://acceis.github.io/unisec/yard/Unisec/Properties) 85 | - [Unisec::Rugrep](https://acceis.github.io/unisec/yard/Unisec/Rugrep) 86 | - [Unisec::Size](https://acceis.github.io/unisec/yard/Unisec/Size) 87 | - [Unisec::Surrogates](https://acceis.github.io/unisec/yard/Unisec/Surrogates) 88 | - [Unisec::Versions](https://acceis.github.io/unisec/yard/Unisec/Versions) 89 | -------------------------------------------------------------------------------- /lib/unisec/cli/normalization.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'dry/cli' 4 | require 'unisec' 5 | require 'unisec/utils' 6 | 7 | module Unisec 8 | module CLI 9 | module Commands 10 | # CLI sub-commands `unisec normalize xxx` for the class {Unisec::Normalization} from the lib. 11 | module Normalize 12 | # Command `unisec normalize all "example"` 13 | # 14 | # Example: 15 | # 16 | # ```plaintext 17 | # ➜ unisec normalize all ẛ̣ 18 | # Original: ẛ̣ 19 | # U+1E9B U+0323 20 | # NFC: ẛ̣ 21 | # U+1E9B U+0323 22 | # NFKC: ṩ 23 | # U+1E69 24 | # NFD: ẛ̣ 25 | # U+017F U+0323 U+0307 26 | # NFKD: ṩ 27 | # U+0073 U+0323 U+0307 28 | # 29 | # ➜ unisec normalize all ẛ̣ --form nfkd 30 | # ṩ 31 | # ``` 32 | class All < Dry::CLI::Command 33 | desc 'Normalize in all forms' 34 | 35 | argument :input, required: true, 36 | desc: 'String input. Read from STDIN if equal to -.' 37 | 38 | option :form, default: nil, values: %w[nfc nfkc nfd nfkd], 39 | desc: 'Output only in the specified normalization form.' 
40 | 41 | # Normalize in all forms 42 | # @param input [String] Input string to normalize 43 | def call(input: nil, **options) 44 | input = $stdin.read.chomp if input == '-' 45 | if options[:form].nil? 46 | puts Unisec::Normalization.new(input).display 47 | else 48 | # using send() is safe here thanks to the value whitelist 49 | puts Unisec::Normalization.send(options[:form], input) 50 | end 51 | end 52 | end 53 | 54 | # Command `unisec normalize replace "example"` 55 | # 56 | # Example: 57 | # 58 | # ```plaintext 59 | # ➜ unisec normalize replace "<svg onload=\"alert('XSS')\">" 60 | # Original: <svg onload="alert('XSS')"> 61 | # U+003C U+0073 U+0076 U+0067 U+0020 U+006F U+006E U+006C U+006F U+0061 U+0064 U+003D U+0022 U+0061 U+006C U+0065 U+0072 U+0074 U+0028 U+0027 U+0058 U+0053 U+0053 U+0027 U+0029 U+0022 U+003E 62 | # Bypass payload: ﹤svg onload=＂alert(＇XSS＇)＂﹥ 63 | # U+FE64 U+0073 U+0076 U+0067 U+0020 U+006F U+006E U+006C U+006F U+0061 U+0064 U+003D U+FF02 U+0061 U+006C U+0065 U+0072 U+0074 U+0028 U+FF07 U+0058 U+0053 U+0053 U+FF07 U+0029 U+FF02 U+FE65 64 | # NFKC: <svg onload="alert('XSS')"> 65 | # U+003C U+0073 U+0076 U+0067 U+0020 U+006F U+006E U+006C U+006F U+0061 U+0064 U+003D U+0022 U+0061 U+006C U+0065 U+0072 U+0074 U+0028 U+0027 U+0058 U+0053 U+0053 U+0027 U+0029 U+0022 U+003E 66 | # NFKD: <svg onload="alert('XSS')"> 67 | # U+003C U+0073 U+0076 U+0067 U+0020 U+006F U+006E U+006C U+006F U+0061 U+0064 U+003D U+0022 U+0061 U+006C U+0065 U+0072 U+0074 U+0028 U+0027 U+0058 U+0053 U+0053 U+0027 U+0029 U+0022 U+003E 68 | # 69 | # ➜ echo -n "<svg onload=\"alert('XSS')\">" | unisec normalize replace - 70 | # ``` 71 | class Replace < Dry::CLI::Command 72 | desc 'Prepare a XSS payload for HTML escape bypass (HTML escape followed by NFKC / NFKD normalization)' 73 | 74 | argument :input, required: true, 75 | desc: 'String input. Read from STDIN if equal to -.'
76 | 77 | # Prepare a XSS payload for HTML escape bypass (HTML escape followed by NFKC / NFKD normalization) 78 | # @param input [String] Input string to normalize 79 | def call(input: nil, **_options) 80 | input = $stdin.read.chomp if input == '-' 81 | puts Unisec::Normalization.new(input).display_replace 82 | end 83 | end 84 | end 85 | end 86 | end 87 | end 88 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Properties.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI::Commands::Properties 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI::Commands::Properties 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/cli/properties.rb
82 |
83 | 84 |
85 | 86 |

Overview

87 |
88 |

CLI sub-commands unisec properties xxx for the class Properties from the lib.
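Some character properties can be inspected with Ruby's regular-expression property classes alone. The sketch below finds a character's General_Category by testing the `\p{Lu}`-style classes one after another. It is an illustration with a hypothetical method name and makes no claim about how the `Properties` class is implemented.

```ruby
# General_Category short names usable in Ruby's \p{...} regexp syntax.
# Cs (surrogates) is omitted: surrogate code points cannot appear in a valid UTF-8 string.
GENERAL_CATEGORY_PATTERNS = %w[
  Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po
  Sm Sc Sk So Zs Zl Zp Cc Cf Co Cn
].to_h { |gc| [gc, Regexp.new("\\A\\p{#{gc}}\\z")] }.freeze

# Hypothetical helper: two-letter General_Category of a single character (nil otherwise).
def general_category_sketch(char)
  GENERAL_CATEGORY_PATTERNS.find { |_gc, pattern| pattern.match?(char) }&.first
end

['A', "\u00E9", "\u0663", "\u202E", "\u{1F4A9}"].each do |char|
  puts format('U+%04X %s', char.ord, general_category_sketch(char))
end
# U+0041 Lu, U+00E9 Ll, U+0663 Nd, U+202E Cf, U+1F4A9 So
```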

89 | 90 | 91 |
92 |
93 |
94 | 95 | 96 |

Defined Under Namespace

97 |

98 | 99 | 100 | 101 | 102 | Classes: Char, Codepoints, List 103 | 104 | 105 |

106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 |
116 | 117 | 122 | 123 |
124 | 125 | -------------------------------------------------------------------------------- /lib/unisec/hexdump.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'ctf_party' 4 | 5 | module Unisec 6 | # Hexdump of all Unicode encodings. 7 | class Hexdump 8 | # UTF-8 hexdump 9 | # @return [String] UTF-8 hexdump 10 | attr_reader :utf8 11 | 12 | # UTF-16BE hexdump 13 | # @return [String] UTF-16BE hexdump 14 | attr_reader :utf16be 15 | 16 | # UTF-16LE hexdump 17 | # @return [String] UTF-16LE hexdump 18 | attr_reader :utf16le 19 | 20 | # UTF-32BE hexdump 21 | # @return [String] UTF-32BE hexdump 22 | attr_reader :utf32be 23 | 24 | # UTF-32LE hexdump 25 | # @return [String] UTF-32LE hexdump 26 | attr_reader :utf32le 27 | 28 | # Init the hexdump. 29 | # @param str [String] Input string to encode 30 | # @example 31 | # hxd = Unisec::Hexdump.new('I 💕 Ruby 💎') 32 | # hxd.utf8 # => "49 20 f0 9f 92 95 20 52 75 62 79 20 f0 9f 92 8e" 33 | # hxd.utf16be # => "0049 0020 d83d dc95 0020 0052 0075 0062 0079 0020 d83d dc8e" 34 | # hxd.utf32be # => "00000049 00000020 0001f495 00000020 00000052 00000075 00000062 00000079 00000020 0001f48e" 35 | def initialize(str) 36 | @utf8 = Hexdump.utf8(str) 37 | @utf16be = Hexdump.utf16be(str) 38 | @utf16le = Hexdump.utf16le(str) 39 | @utf32be = Hexdump.utf32be(str) 40 | @utf32le = Hexdump.utf32le(str) 41 | end 42 | 43 | # Encode to UTF-8 in hexdump format (spaced at every code unit = every byte) 44 | # @param str [String] Input string to encode 45 | # @return [String] hexdump (UTF-8 encoded) 46 | # @example 47 | # Unisec::Hexdump.utf8('🐋') # => "f0 9f 90 8b" 48 | def self.utf8(str) 49 | str.encode('UTF-8').to_hex.scan(/.{2}/).join(' ') 50 | end 51 | 52 | # Encode to UTF-16BE in hexdump format (spaced at every code unit = every 2 bytes) 53 | # @param str [String] Input string to encode 54 | # @return [String] hexdump (UTF-16BE encoded) 55 | # @example 56 | # 
Unisec::Hexdump.utf16be('🐋') # => "d83d dc0b" 57 | def self.utf16be(str) 58 | str.encode('UTF-16BE').to_hex.scan(/.{4}/).join(' ') 59 | end 60 | 61 | # Encode to UTF-16LE in hexdump format (spaced at every code unit = every 2 bytes) 62 | # @param str [String] Input string to encode 63 | # @return [String] hexdump (UTF-16LE encoded) 64 | # @example 65 | # Unisec::Hexdump.utf16le('🐋') # => "3dd8 0bdc" 66 | def self.utf16le(str) 67 | str.encode('UTF-16LE').to_hex.scan(/.{4}/).join(' ') 68 | end 69 | 70 | # Encode to UTF-32BE in hexdump format (spaced at every code unit = every 4 bytes) 71 | # @param str [String] Input string to encode 72 | # @return [String] hexdump (UTF-32BE encoded) 73 | # @example 74 | # Unisec::Hexdump.utf32be('🐋') # => "0001f40b" 75 | def self.utf32be(str) 76 | str.encode('UTF-32BE').to_hex.scan(/.{8}/).join(' ') 77 | end 78 | 79 | # Encode to UTF-32LE in hexdump format (spaced at every code unit = every 4 bytes) 80 | # @param str [String] Input string to encode 81 | # @return [String] hexdump (UTF-32LE encoded) 82 | # @example 83 | # Unisec::Hexdump.utf32le('🐋') # => "0bf40100" 84 | def self.utf32le(str) 85 | str.encode('UTF-32LE').to_hex.scan(/.{8}/).join(' ') 86 | end 87 | 88 | # Display a CLI-friendly output summarizing the hexdump in all Unicode encodings 89 | # @return [String] CLI-ready output 90 | # @example 91 | # puts Unisec::Hexdump.new('K').display # => 92 | # # UTF-8: e2 84 aa 93 | # # UTF-16BE: 212a 94 | # # UTF-16LE: 2a21 95 | # # UTF-32BE: 0000212a 96 | # # UTF-32LE: 2a210000 97 | def display 98 | "UTF-8: #{@utf8}\n" \ 99 | "UTF-16BE: #{@utf16be}\n" \ 100 | "UTF-16LE: #{@utf16le}\n" \ 101 | "UTF-32BE: #{@utf32be}\n" \ 102 | "UTF-32LE: #{@utf32le}" 103 | end 104 | end 105 | end 106 | -------------------------------------------------------------------------------- /docs/yard/file.publishing.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: publishing 8 | 9 | —
Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

Publishing

61 |

Be sure all tests pass!

62 |
$ bundle exec rake test
 63 | 
64 |

Also check the linter:

65 |
$ bundle exec rubocop
 66 | 
67 |

Update the version in lib/unisec/version.rb.

68 |

Update the documentation, at least:

69 |
    70 |
  • README.md
  • 71 |
  • docs/CHANGELOG.md
  • 72 |
  • docs/pages/usage.md
  • 73 |
74 |

On a new release, don't forget to rebuild the library documentation:

75 |
$ bundle exec yard doc
 76 | 
77 |

Create an annotated git tag:

78 |
$ git tag -a v1.5.0
 79 | 
80 |

Push the changes including the tags:

81 |
$ git push --follow-tags
 82 | 
83 |

Build the gem:

84 |
$ gem build unisec.gemspec
 85 | # or
 86 | $ bundle exec rake build
 87 | 
88 |

Push the new gem release to RubyGems. See https://guides.rubygems.org/publishing/.

89 |
$ gem push unisec-1.5.0.gem
 90 | 
91 |
92 | 93 | 98 | 99 |
100 | 101 | -------------------------------------------------------------------------------- /lib/unisec/utils.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'ctf_party' 4 | 5 | class Integer 6 | # Convert an integer to a hexadecimal string 7 | # @return [String] The integer converted to hexadecimal and cast to an uppercase string 8 | # @example 9 | # 42.to_hex # => "2A" 10 | def to_hex 11 | to_s(16).upcase 12 | end 13 | 14 | # Convert an integer to a binary string 15 | # @return [String] The integer converted to binary and cast to a string 16 | # @example 17 | # 42.to_bin # => "101010" 18 | def to_bin 19 | to_s(2) 20 | end 21 | end 22 | 23 | module Unisec 24 | # Generic stuff not Unicode-related that can be re-used. 25 | module Utils 26 | # About string conversion and manipulation. 27 | module String 28 | # Convert a string input into the chosen type. 29 | # @param input [String] If the target type is `:integer`, the string must represent a number encoded in 30 | # hexadecimal, decimal or binary. If it's a Unicode string, only the first code point will be taken into account. 31 | # @param target_type [Symbol] Convert to the chosen type. Currently only supports `:integer`. 32 | # @return [Variable] The type of the output depends on the chosen `target_type`. 33 | # @example 34 | # Unisec::Utils::String.convert('0x1f4a9', :integer) # => 128169 35 | def self.convert(input, target_type) 36 | case target_type 37 | when :integer 38 | convert_to_integer(input) 39 | else 40 | raise TypeError, "Target type \"#{target_type}\" not available" 41 | end 42 | end 43 | 44 | # Internal method used for {.convert}. 45 | # 46 | # Convert a string input into an integer. 47 | # @param input [String] The string must represent a number encoded in hexadecimal, decimal or binary. If it's a 48 | # Unicode string, only the first code point will be taken into account.
The input type is determined 49 | # automatically based on the prefix. 50 | # @return [Integer] 51 | # @example 52 | # # Hexadecimal 53 | # Unisec::Utils::String.convert_to_integer('0x1f4a9') # => 128169 54 | # # Decimal 55 | # Unisec::Utils::String.convert_to_integer('0d128169') # => 128169 56 | # # Binary 57 | # Unisec::Utils::String.convert_to_integer('0b11111010010101001') # => 128169 58 | # # Unicode string 59 | # Unisec::Utils::String.convert_to_integer('💩') # => 128169 60 | def self.convert_to_integer(input) 61 | case autodetect(input) 62 | when :hexadecimal 63 | input.hex2dec(prefix: '0x').to_i 64 | when :decimal 65 | input.to_i 66 | when :binary 67 | input.bin2hex.hex2dec.to_i 68 | when :string 69 | input.codepoints.first 70 | else 71 | raise TypeError, "Input \"#{input}\" is not of the expected type" 72 | end 73 | end 74 | 75 | # Internal method used for {.convert}. 76 | # 77 | # Autodetect the representation type of the string input. 78 | # @param str [String] Input. 79 | # @return [Symbol] the detected type: `:hexadecimal`, `:decimal`, `:binary`, `:string`. 
80 | # @example 81 | # # Hexadecimal 82 | # Unisec::Utils::String.autodetect('0x1f4a9') # => :hexadecimal 83 | # # Decimal 84 | # Unisec::Utils::String.autodetect('0d128169') # => :decimal 85 | # # Binary 86 | # Unisec::Utils::String.autodetect('0b11111010010101001') # => :binary 87 | # # Unicode string 88 | # Unisec::Utils::String.autodetect('💩') # => :string 89 | def self.autodetect(str) 90 | case str 91 | when /0x[0-9a-fA-F]/ 92 | :hexadecimal 93 | when /0d[0-9]+/ 94 | :decimal 95 | when /0b[0-1]+/ 96 | :binary 97 | else 98 | :string 99 | end 100 | end 101 | 102 | # Reverse a string by graphemes (not by code points) 103 | # @return [String] the reversed string 104 | # @example 105 | # b = "\u{1f1eb}\u{1f1f7}\u{1F413}" # => "🇫🇷🐓" 106 | # b.reverse # => "🐓🇷🇫" 107 | # Unisec::Utils::String.grapheme_reverse(b) # => "🐓🇫🇷" 108 | def self.grapheme_reverse(str) 109 | str.grapheme_clusters.reverse.join 110 | end 111 | end 112 | end 113 | end 114 | -------------------------------------------------------------------------------- /docs/vendor/plugins/docsify-sidebar-collapse.min.js: -------------------------------------------------------------------------------- 1 | !function(e){("object"!=typeof exports||"undefined"==typeof module)&&"function"==typeof define&&define.amd?define(e):e()}(function(){"use strict";function e(e,n){var t,a=(n=void 0===n?{}:n).insertAt;e&&"undefined"!=typeof document&&(t=document.head||document.getElementsByTagName("head")[0],(n=document.createElement("style")).type="text/css","top"===a&&t.firstChild?t.insertBefore(n,t.firstChild):t.appendChild(n),n.styleSheet?n.styleSheet.cssText=e:n.appendChild(document.createTextNode(e)))}var t;function a(e){e&&null!=t&&(e=e.getBoundingClientRect().top,document.querySelector(".sidebar").scrollBy(0,e-t))}function n(){requestAnimationFrame(function(){var e=document.querySelector(".app-sub-sidebar > .active");if(e)for(e.parentNode.parentNode.querySelectorAll(".app-sub-sidebar").forEach(function(e){return 
e.classList.remove("open")});e.parentNode.classList.contains("app-sub-sidebar")&&!e.parentNode.classList.contains("open");)e.parentNode.classList.add("open"),e=e.parentNode})}function o(e){t=e.target.getBoundingClientRect().top;var n=d(e.target,"LI",2);n&&(n.classList.contains("open")?(n.classList.remove("open"),setTimeout(function(){n.classList.add("collapse")},0)):(function(e){if(e)for(e.classList.remove("open","active");e&&"sidebar-nav"!==e.className&&e.parentNode;)"LI"!==e.parentNode.tagName&&"app-sub-sidebar"!==e.parentNode.className||e.parentNode.classList.remove("open"),e=e.parentNode}(s()),i(n),setTimeout(function(){n.classList.remove("collapse")},0)),a(n))}function s(){var e=document.querySelector(".sidebar-nav .active");return e||(e=d(document.querySelector('.sidebar-nav a[href="'.concat(decodeURIComponent(location.hash).replace(/ /gi,"%20"),'"]')),"LI",2))&&e.classList.add("active"),e}function i(e){if(e)for(e.classList.add("open","active");e&&"sidebar-nav"!==e.className&&e.parentNode;)"LI"!==e.parentNode.tagName&&"app-sub-sidebar"!==e.parentNode.className||e.parentNode.classList.add("open"),e=e.parentNode}function d(e,n,t){if(e&&e.tagName===n)return e;for(var a=0;e;){if(t<++a)return;if(e.parentNode.tagName===n)return e.parentNode;e=e.parentNode}}e(".sidebar-nav > ul > li ul {\n display: none;\n}\n\n.app-sub-sidebar {\n display: none;\n}\n\n.app-sub-sidebar.open {\n display: block;\n}\n\n.sidebar-nav .open > ul:not(.app-sub-sidebar),\n.sidebar-nav .active:not(.collapse) > ul {\n display: block;\n}\n\n/* 抖动 */\n.sidebar-nav li.open:not(.collapse) > ul {\n display: block;\n}\n\n.active + ul.app-sub-sidebar {\n display: block;\n}\n"),document.addEventListener("scroll",n);e("@media screen and (max-width: 768px) {\n /* 移动端适配 */\n .markdown-section {\n max-width: none;\n padding: 16px;\n }\n /* 改变原来按钮热区大小 */\n .sidebar-toggle {\n padding: 0 0 10px 10px;\n }\n /* my pin */\n .sidebar-pin {\n appearance: none;\n outline: none;\n position: fixed;\n bottom: 0;\n 
border: none;\n width: 40px;\n height: 40px;\n background: transparent;\n }\n}\n");var r,c="DOCSIFY_SIDEBAR_PIN_FLAG";function l(){var e="true"===(e=localStorage.getItem(c));localStorage.setItem(c,!e),e?(document.querySelector(".sidebar").style.transform="translateX(0)",document.querySelector(".content").style.transform="translateX(0)"):(document.querySelector(".sidebar").style.transform="translateX(300px)",document.querySelector(".content").style.transform="translateX(300px)")}768 ul"),1),a(t),n(e)}),e.ready(function(){document.querySelector(".sidebar-nav").addEventListener("click",o)})})}); -------------------------------------------------------------------------------- /docs/yard/file.CHANGELOG.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: CHANGELOG 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

[unreleased]

61 |

[0.0.6]

62 |

Features

63 |
    64 |
  • Prepare an XSS payload for HTML escape bypass (HTML escape followed by NFKC / NFKD normalization) 65 |
      66 |
    • Rename CLI command normalize into normalize all
    • 67 |
    • Add a new method replace_bypass in the class Unisec::Normalization
    • 68 |
    • Add a new CLI command normalize replace (using the new replace_bypass method)
    • 69 |
    70 |
  • 71 |
72 |
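The HTML escape bypass works because compatibility normalization folds fullwidth forms back into ASCII markup characters. The following is a minimal standalone sketch in plain Ruby, not unisec's implementation. `BYPASS`, `bypass` and `naive_html_escape` are hypothetical names. The table uses only the fullwidth variants, whereas unisec also has small-form variants and picks one at random. The escaper stands in for whatever ASCII-only escaping an application might perform before normalizing.

```ruby
# Hypothetical mapping: ASCII markup characters -> fullwidth forms whose
# NFKC/NFKD compatibility decomposition is the ASCII character itself.
BYPASS = {
  '<' => "\uFF1C", # FULLWIDTH LESS-THAN SIGN
  '>' => "\uFF1E", # FULLWIDTH GREATER-THAN SIGN
  '"' => "\uFF02", # FULLWIDTH QUOTATION MARK
  "'" => "\uFF07", # FULLWIDTH APOSTROPHE
  '&' => "\uFF06"  # FULLWIDTH AMPERSAND
}.freeze

# Swap every markup character for its fullwidth counterpart.
def bypass(str)
  str.gsub(/[<>"'&]/, BYPASS)
end

# Stand-in for an ASCII-only HTML escaper (escapes & first to avoid double escaping).
def naive_html_escape(str)
  str.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
     .gsub('"', '&quot;').gsub("'", '&#39;')
end

payload = bypass('<svg onload="alert(1)">')
escaped = naive_html_escape(payload) # nothing to escape: no ASCII markup left
normalized = escaped.unicode_normalize(:nfkc)
puts normalized
```

The ordering is what matters. Escaping before normalizing lets the fullwidth characters through, whereas normalizing first and escaping afterwards would neutralize them.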

[0.0.5]

73 |

Features

74 |
    75 |
  • Add a new class Unisec::Normalization and CLI command normalize to output all normalization forms
  • 76 |
77 |
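The four normalization forms can be compared with Ruby's built-in `String#unicode_normalize`, which `Unisec::Normalization` wraps. This sketch reuses the `"\u{1E9B 0323}"` input from the `display` example in `lib/unisec/normalization.rb`. `format_codepoints` is a hypothetical helper that mimics the `U+XXXX` listing.

```ruby
# Hypothetical helper: render a string as space-separated U+XXXX code points.
def format_codepoints(str)
  str.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
end

input = "\u{1E9B 0323}" # LONG S WITH DOT ABOVE + COMBINING DOT BELOW
forms = %i[nfc nfkc nfd nfkd].to_h { |form| [form, input.unicode_normalize(form)] }

forms.each do |form, str|
  puts "#{form.upcase}: #{format_codepoints(str)}"
end
```

This prints `NFC: U+1E9B U+0323`, `NFKC: U+1E69`, `NFD: U+017F U+0323 U+0307` and `NFKD: U+0073 U+0323 U+0307`. These are the same sequences listed in the `display` example.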

Chore

78 |
    79 |
  • Enhance documentation
  • 80 |
  • Dependencies update
  • 81 |
82 |

[0.0.4]

83 |

Features

84 |
    85 |
  • Add a new class Unisec::Bidi::Spoof and CLI command bidi spoof to craft payloads for attacks using BiDi code points like RtLO, for example for spoofing a domain name or a file name
  • 86 |
  • Add a new helper method: Unisec::Utils::String.grapheme_reverse: Reverse a string by graphemes (not by code points)
  • 87 |
  • Add an --enc option for unisec hexdump to output only in the specified encoding
  • 88 |
  • unisec hexdump can now read from STDIN if the input is -
  • 89 |
90 |
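To see why reversing by graphemes matters, compare `String#reverse` with a reversal over `String#grapheme_clusters`. The latter is the same one-liner used by `Unisec::Utils::String.grapheme_reverse` in `lib/unisec/utils.rb`; here it is written as a standalone top-level method for illustration only.

```ruby
# Reverse by extended grapheme clusters rather than by code points.
def grapheme_reverse(str)
  str.grapheme_clusters.reverse.join
end

input = "\u{1F1EB}\u{1F1F7}\u{1F413}" # French flag (two regional indicators) + rooster

by_codepoint = input.reverse          # swaps the regional indicators: the flag breaks into R+F
by_grapheme = grapheme_reverse(input) # keeps the flag intact
```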

[0.0.3]

91 |

Features

92 |
    93 |
  • Add a new class Unisec::Rugrep and CLI command grep to search for Unicode code point names by regular expression
  • 94 |
  • Add a new method Unisec::Properties.deccp2stdhexcp: Convert a decimal code point to a hexadecimal code point in standard format
  • 95 |
96 |

Chore

97 |
    98 |
  • Enhance tests: assert_equal(true, test) ➡️ assert(test)
  • 99 |
  • Enhance SEO: better description
  • 100 |
101 |

[0.0.2]

102 |
    103 |
  • Add 2 new classes (and corresponding CLI commands): 104 |
      105 |
    • Unisec::Versions: Version of Unicode, ICU, CLDR, gems used in Unisec
    • 106 |
    • Unisec::Size: Code point, grapheme, UTF-8/UTF-16/UTF-32 byte/unit size
    • 107 |
    108 |
  • 109 |
110 |

[0.0.1]

111 |
    112 |
  • Initial version
  • 113 |
114 |
115 | 116 | 121 | 122 |
123 | 124 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec::CLI::Commands 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec::CLI::Commands 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 |
73 |
Extended by:
74 |
Dry::CLI::Registry
75 |
76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 |
85 |
Defined in:
86 |
lib/unisec/cli/cli.rb,
87 | lib/unisec/cli/bidi.rb,
lib/unisec/cli/size.rb,
lib/unisec/cli/rugrep.rb,
lib/unisec/cli/hexdump.rb,
lib/unisec/cli/versions.rb,
lib/unisec/cli/properties.rb,
lib/unisec/cli/surrogates.rb,
lib/unisec/cli/confusables.rb,
lib/unisec/cli/normalization.rb
88 |
89 |
90 | 91 |
92 | 93 |

Overview

94 |
95 |

Registered commands for the CLI

96 | 97 | 98 |
99 |
100 |
101 | 102 | 103 |

Defined Under Namespace

104 |

105 | 106 | 107 | Modules: Bidi, Confusables, Normalize, Properties, Surrogates 108 | 109 | 110 | 111 | Classes: Grep, Hexdump, Size, Versions 112 | 113 | 114 |

115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 |
126 | 127 | 132 | 133 |
134 | 135 | -------------------------------------------------------------------------------- /lib/unisec/surrogates.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'unisec/utils' 4 | require 'ctf_party' 5 | 6 | module Unisec 7 | # UTF-16 surrogates conversion. 8 | class Surrogates 9 | # Unicode code point 10 | # @return [Integer] decimal codepoint 11 | attr_reader :cp 12 | 13 | # High surrogate (1st code unit of a surrogate pair). Also called lead surrogate. 14 | # @return [Integer] decimal high surrogate 15 | attr_reader :hs 16 | 17 | # Low surrogate (2nd code unit of a surrogate pair). Also called trail surrogate. 18 | # @return [Integer] decimal low surrogate 19 | attr_reader :ls 20 | 21 | # Init the surrogate pair. 22 | # @param args [Integer] If one argument is provided, it's evaluated as the 23 | # code point and the two surrogates will be calculated automatically. 24 | # If two arguments are provided, they are evaluated as a surrogate pair (high 25 | # then low) and the code point will be calculated. 26 | # @example 27 | # surr = Unisec::Surrogates.new(128169) 28 | # # => # 29 | # surr.cp # => 128169 30 | # surr.hs # => 55357 31 | # surr.ls # => 56489 32 | # Unisec::Surrogates.new(55357, 56489) 33 | # # => # 34 | def initialize(*args) 35 | if args.size == 1 36 | @cp = args[0] 37 | @hs = high_surrogate 38 | @ls = low_surrogate 39 | elsif args.size == 2 40 | @hs = args[0] 41 | @ls = args[1] 42 | @cp = code_point 43 | else 44 | raise ArgumentError 45 | end 46 | end 47 | 48 | # Calculate the high surrogate based on the Unicode code point. 
49 | # @param codepoint [Integer] decimal codepoint 50 | # @return [Integer] decimal high surrogate 51 | # @example 52 | # Unisec::Surrogates.high_surrogate(128169) # => 55357 53 | def self.high_surrogate(codepoint) 54 | (((codepoint - 0x10000) / 0x400).floor + 0xd800) 55 | end 56 | 57 | # Calculate the low surrogate based on the Unicode code point. 58 | # @param codepoint [Integer] decimal codepoint 59 | # @return [Integer] decimal low surrogate 60 | # @example 61 | # Unisec::Surrogates.low_surrogate(128169) # => 56489 62 | def self.low_surrogate(codepoint) 63 | (((codepoint - 0x10000) % 0x400) + 0xdc00) 64 | end 65 | 66 | # Calculate the Unicode code point based on the surrogates. 67 | # @param hs [Integer] decimal high surrogate 68 | # @param ls [Integer] decimal low surrogate 69 | # @return [Integer] decimal code point 70 | # @example 71 | # Unisec::Surrogates.code_point(55357, 56489) # => 128169 72 | def self.code_point(hs, ls) 73 | (((hs - 0xd800) * 0x400) + ls - 0xdc00 + 0x10000) 74 | end 75 | 76 | # Same as accessing {#hs}. Calculate the {.high_surrogate}. 77 | # @return [Integer] decimal high surrogate 78 | # @example 79 | # surr = Unisec::Surrogates.new(128169) 80 | # surr.high_surrogate # => 55357 81 | def high_surrogate 82 | @hs = Surrogates.high_surrogate(@cp) 83 | end 84 | 85 | # Same as accessing {#ls}. Calculate the {.low_surrogate}. 86 | # @return [Integer] decimal low surrogate 87 | # @example 88 | # surr = Unisec::Surrogates.new(128169) 89 | # surr.low_surrogate # => 56489 90 | def low_surrogate 91 | @ls = Surrogates.low_surrogate(@cp) 92 | end 93 | 94 | # Same as accessing {#cp}. Calculate the {.code_point}.
95 | # @return [Integer] decimal code point 96 | # surr = Unisec::Surrogates.new(55357, 56489) 97 | # surr.code_point # => 128169 98 | def code_point 99 | @cp = Surrogates.code_point(@hs, @ls) 100 | end 101 | 102 | # Display a CLI-friendly output summarizing everything about the surrogates: 103 | # the corresponding character, code point, high and low surrogates 104 | # (each displayed as hexadecimal, decimal and binary). 105 | # @return [String] CLI-ready output 106 | # @example 107 | # surr = Unisec::Surrogates.new(128169) 108 | # puts surr.display # => 109 | # # Char: 💩 110 | # # Code Point: 0x1F4A9, 0d128169, 0b11111010010101001 111 | # # High Surrogate: 0xD83D, 0d55357, 0b1101100000111101 112 | # # Low Surrogate: 0xDCA9, 0d56489, 0b1101110010101001 113 | def display 114 | "Char: #{[@cp].pack('U*')}\n" \ 115 | "Code Point: 0x#{@cp.to_hex}, 0d#{@cp}, 0b#{@cp.to_bin}\n" \ 116 | "High Surrogate: 0x#{@hs.to_hex}, 0d#{@hs}, 0b#{@hs.to_bin}\n" \ 117 | "Low Surrogate: 0x#{@ls.to_hex}, 0d#{@ls}, 0b#{@ls.to_bin}" 118 | end 119 | end 120 | end 121 | -------------------------------------------------------------------------------- /docs/yard/Unisec.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Module: Unisec 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Module: Unisec 63 | 64 | 65 | 66 |

67 |
68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 |
80 |
Defined in:
81 |
lib/unisec/bidi.rb,
82 | lib/unisec/size.rb,
lib/unisec/utils.rb,
lib/unisec/rugrep.rb,
lib/unisec/cli/cli.rb,
lib/unisec/hexdump.rb,
lib/unisec/version.rb,
lib/unisec/cli/bidi.rb,
lib/unisec/cli/size.rb,
lib/unisec/versions.rb,
lib/unisec/cli/rugrep.rb,
lib/unisec/properties.rb,
lib/unisec/surrogates.rb,
lib/unisec/cli/hexdump.rb,
lib/unisec/confusables.rb,
lib/unisec/cli/versions.rb,
lib/unisec/normalization.rb,
lib/unisec/cli/properties.rb,
lib/unisec/cli/surrogates.rb,
lib/unisec/cli/confusables.rb,
lib/unisec/cli/normalization.rb
83 |
84 |
85 | 86 |
87 | 88 |

Defined Under Namespace

89 |

90 | 91 | 92 | Modules: CLI, Utils 93 | 94 | 95 | 96 | Classes: Bidi, Confusables, Hexdump, Normalization, Properties, Rugrep, Size, Surrogates, Versions 97 | 98 | 99 |

100 | 101 | 102 |

103 | Constant Summary 104 | collapse 105 |

106 | 107 |
108 | 109 |
VERSION = 110 |
111 |
112 |

Version of unisec library and app

113 | 114 | 115 |
116 |
117 |
118 | 119 | 120 |
121 |
122 |
'0.0.6'
123 | 124 |
125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 |
136 | 137 | 142 | 143 |
144 | 145 | -------------------------------------------------------------------------------- /lib/unisec/normalization.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'ctf_party' 4 | 5 | module Unisec 6 | # Normalization Forms 7 | class Normalization 8 | # HTML escapable characters mapped to their Unicode counterparts that 9 | # normalize back to themselves under the compatibility forms (NFKC / NFKD). 10 | HTML_ESCAPE_BYPASS = { 11 | '<' => ['﹤', '＜'], 12 | '>' => ['﹥', '＞'], 13 | '"' => ['＂'], 14 | "'" => ['＇'], 15 | '&' => ['﹠', '＆'] 16 | }.freeze 17 | 18 | # Original input 19 | # @return [String] untouched input 20 | attr_reader :original 21 | 22 | # Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition 23 | # @return [String] input normalized with NFC 24 | attr_reader :nfc 25 | 26 | # Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition 27 | # @return [String] input normalized with NFKC 28 | attr_reader :nfkc 29 | 30 | # Normalization Form D (NFD) - Canonical Decomposition 31 | # @return [String] input normalized with NFD 32 | attr_reader :nfd 33 | 34 | # Normalization Form KD (NFKD) - Compatibility Decomposition 35 | # @return [String] input normalized with NFKD 36 | attr_reader :nfkd 37 | 38 | # Generate all normalization forms for a given input 39 | # @param str [String] the target string 40 | # @return [nil] 41 | def initialize(str) 42 | @original = str 43 | @nfc = Normalization.nfc(str) 44 | @nfkc = Normalization.nfkc(str) 45 | @nfd = Normalization.nfd(str) 46 | @nfkd = Normalization.nfkd(str) 47 | end 48 | 49 | # Normalization Form C (NFC) - Canonical Decomposition, followed by Canonical Composition 50 | # @param str [String] the target string 51 | # @return [String] input normalized with NFC 52 | def self.nfc(str) 53 | str.unicode_normalize(:nfc) 54 | end 55 | 56 | # Normalization Form
KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition 57 | # @param str [String] the target string 58 | # @return [String] input normalized with NFKC 59 | def self.nfkc(str) 60 | str.unicode_normalize(:nfkc) 61 | end 62 | 63 | # Normalization Form D (NFD) - Canonical Decomposition 64 | # @param str [String] the target string 65 | # @return [String] input normalized with NFD 66 | def self.nfd(str) 67 | str.unicode_normalize(:nfd) 68 | end 69 | 70 | # Normalization Form KD (NFKD) - Compatibility Decomposition 71 | # @param str [String] the target string 72 | # @return [String] input normalized with NFKD 73 | def self.nfkd(str) 74 | str.unicode_normalize(:nfkd) 75 | end 76 | 77 | # Replace HTML escapable characters with their Unicode counterparts that 78 | # normalize back to themselves under the compatibility forms (NFKC / NFKD). 79 | # Useful for XSS, to bypass HTML escaping. 80 | # If several counterparts exist, one is picked randomly per character and reused for all its occurrences. 81 | # @param str [String] the target string 82 | # @return [String] the input with escapable characters replaced 83 | def self.replace_bypass(str) 84 | str = str.dup 85 | HTML_ESCAPE_BYPASS.each do |k, v| 86 | str.gsub!(k, v.sample) 87 | end 88 | str 89 | end 90 | 91 | # Instance version of {Normalization.replace_bypass}.
92 | def replace_bypass 93 | Normalization.replace_bypass(@original) 94 | end 95 | 96 | # Display a CLI-friendly output summurizing all normalization forms 97 | # @return [String] CLI-ready output 98 | # @example 99 | # puts Unisec::Normalization.new("\u{1E9B 0323}").display 100 | # # => 101 | # # Original: ẛ̣ 102 | # # U+1E9B U+0323 103 | # # NFC: ẛ̣ 104 | # # U+1E9B U+0323 105 | # # NFKC: ṩ 106 | # # U+1E69 107 | # # NFD: ẛ̣ 108 | # # U+017F U+0323 U+0307 109 | # # NFKD: ṩ 110 | # # U+0073 U+0323 U+0307 111 | def display 112 | colorize = lambda { |form_title, form_attr| 113 | "#{Paint[form_title.to_s, :underline, 114 | :bold]}: #{form_attr}\n #{Paint[Unisec::Properties.chars2codepoints(form_attr), :red]}\n" 115 | } 116 | colorize.call('Original', @original) + 117 | colorize.call('NFC', @nfc) + 118 | colorize.call('NFKC', @nfkc) + 119 | colorize.call('NFD', @nfd) + 120 | colorize.call('NFKD', @nfkd) 121 | end 122 | 123 | # Display a CLI-friendly output of the XSS payload to bypass HTML escape and 124 | # what it does once normalized in NFKC & NFKD. 125 | def display_replace 126 | colorize = lambda { |form_title, form_attr| 127 | "#{Paint[form_title.to_s, :underline, 128 | :bold]}: #{form_attr}\n #{Paint[Unisec::Properties.chars2codepoints(form_attr), :red]}\n" 129 | } 130 | payload = replace_bypass 131 | colorize.call('Original', @original) + 132 | colorize.call('Bypass payload', payload) + 133 | colorize.call('NFKC', Normalization.nfkc(payload)) + 134 | colorize.call('NFKD', Normalization.nfkd(payload)) 135 | end 136 | end 137 | end 138 | -------------------------------------------------------------------------------- /docs/yard/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: README 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

unisec

61 |

GitHub forks 62 | GitHub stars 63 | GitHub license 64 | Rawsec's CyberSecurity Inventory

65 |

GitHub Workflow Status 66 | GitHub commit activity

67 |

68 |
69 |

Unicode Security Toolkit

70 |
71 |

What is it?

72 |

A CLI tool and library to play with Unicode security.

73 |

Features

74 |
    75 |
  • BiDi spoofing 76 |
      77 |
    • Craft payloads for attack using BiDi code points (e.g. spoofing a domain name or a file name)
    • 78 |
    79 |
  • 80 |
  • Confusables / homoglyphs 81 |
      82 |
    • List confusable characters for a given character
    • 83 |
    • Replace all characters from a string with random confusables
    • 84 |
    85 |
  • 86 |
  • Hexdump 87 |
      88 |
    • UTF-8, UTF-16, UTF-32 hexadecimal dumps
    • 89 |
    90 |
  • 91 |
  • Normalization 92 |
      93 |
    • NFC, NFKC, NFD, NFKD normalization forms, HTML escape bypass for XSS
    • 94 |
    95 |
  • 96 |
  • Properties 97 |
      98 |
    • Get all properties of a given Unicode character
    • 99 |
    • List code points matching a Unicode property
    • 100 |
    • List all Unicode property names
    • 101 |
    102 |
  • 103 |
  • Regexp search 104 |
      105 |
    • Search for Unicode code point names by regular expression
    • 106 |
    107 |
  • 108 |
  • Size 109 |
      110 |
    • Code point, grapheme, UTF-8/UTF-16/UTF-32 byte/unit size
    • 111 |
    112 |
  • 113 |
  • Surrogates 114 |
      115 |
    • Code point ↔️ Surrogates conversion
    • 116 |
    117 |
  • 118 |
  • Versions 119 |
      120 |
    • Version of Unicode, ICU, CLDR, UCD, gems used in Unisec
    • 121 |
    122 |
  • 123 |
124 |
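The code point ↔ surrogates conversion is plain arithmetic on the offset from U+10000. The high 10 bits go into the high surrogate (base 0xD800) and the low 10 bits into the low surrogate (base 0xDC00). A minimal standalone sketch follows. `to_surrogates` and `from_surrogates` are hypothetical names, not the unisec API (the gem exposes `Unisec::Surrogates`). Shifts and masks replace the division and modulo by 0x400 used in `lib/unisec/surrogates.rb`, which is equivalent for valid inputs.

```ruby
# Split a supplementary-plane code point into a UTF-16 surrogate pair.
def to_surrogates(cp)
  raise ArgumentError, 'code point must be in U+10000..U+10FFFF' unless (0x10000..0x10FFFF).cover?(cp)

  offset = cp - 0x10000
  [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
end

# Rebuild the code point from a high/low surrogate pair.
def from_surrogates(high, low)
  unless (0xD800..0xDBFF).cover?(high) && (0xDC00..0xDFFF).cover?(low)
    raise ArgumentError, 'invalid surrogate pair'
  end

  ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000
end

pair = to_surrogates(0x1F4A9) # PILE OF POO
back = from_surrogates(*pair)
# Cross-check with Ruby's own UTF-16 encoder (big-endian 16-bit units).
native = "\u{1F4A9}".encode('UTF-16BE').unpack('n*')
```

For U+1F4A9 this yields the pair 0xD83D / 0xDCA9 (55357 / 56489 in decimal), the same values shown by `Unisec::Surrogates`.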

Installation

125 |
$ gem install unisec
126 | 
127 |

Check the installation page in the documentation to discover more installation methods.

128 |

Packaging status 129 | Gem Version 130 | GitHub tag (latest SemVer)

131 |

Documentation

132 |

Homepage / Documentation: https://acceis.github.io/unisec/

133 |

Author

134 |

Made by Alexandre ZANNI (@noraj) at ACCEIS.

135 |
136 | 137 | 142 | 143 |
144 | 145 | -------------------------------------------------------------------------------- /docs/yard/file.README.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: README 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

unisec

61 |

GitHub forks 62 | GitHub stars 63 | GitHub license 64 | Rawsec's CyberSecurity Inventory

65 |

GitHub Workflow Status 66 | GitHub commit activity

67 |

68 |
69 |

Unicode Security Toolkit

70 |
71 |

What is it?

72 |

A CLI tool and library to play with Unicode security.

73 |

Features

74 |
    75 |
  • BiDi spoofing 76 |
      77 |
    • Craft payloads for attack using BiDi code points (e.g. spoofing a domain name or a file name)
    • 78 |
    79 |
  • 80 |
  • Confusables / homoglyphs 81 |
      82 |
    • List confusable characters for a given character
    • 83 |
    • Replace all characters from a string with random confusables
    • 84 |
    85 |
  • 86 |
  • Hexdump 87 |
      88 |
    • UTF-8, UTF-16, UTF-32 hexadecimal dumps
    • 89 |
    90 |
  • 91 |
  • Normalization 92 |
      93 |
    • NFC, NFKC, NFD, NFKD normalization forms, HTML escape bypass for XSS
    • 94 |
    95 |
  • 96 |
  • Properties 97 |
      98 |
    • Get all properties of a given Unicode character
    • 99 |
    • List code points matching a Unicode property
    • 100 |
    • List all Unicode property names
    • 101 |
    102 |
  • 103 |
  • Regexp search 104 |
      105 |
    • Search for Unicode code point names by regular expression
    • 106 |
    107 |
  • 108 |
  • Size 109 |
      110 |
    • Code point, grapheme, UTF-8/UTF-16/UTF-32 byte/unit size
    • 111 |
    112 |
  • 113 |
  • Surrogates 114 |
      115 |
    • Code point ↔️ Surrogates conversion
    • 116 |
    117 |
  • 118 |
  • Versions 119 |
      120 |
    • Version of Unicode, ICU, CLDR, UCD, gems used in Unisec
    • 121 |
    122 |
  • 123 |
124 |
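The size feature counts the same text in several units. The sketch below computes them with core Ruby only. `unicode_sizes` is a hypothetical helper, not the API of `Unisec::Size`. The sample string (a flag followed by `e` plus U+0301 COMBINING ACUTE ACCENT) is chosen so that the code point, grapheme, UTF-8 and UTF-16 counts all differ.

```ruby
# Hypothetical helper: size of a string in several Unicode units.
def unicode_sizes(str)
  {
    code_points: str.codepoints.size,
    graphemes: str.grapheme_clusters.size,
    utf8_bytes: str.encode('UTF-8').bytesize,
    utf16_units: str.encode('UTF-16LE').bytesize / 2, # 16-bit code units
    utf32_units: str.encode('UTF-32LE').bytesize / 4  # always equals code_points
  }
end

sample = "\u{1F1EB}\u{1F1F7}e\u0301" # French flag + "é" spelled as e + combining acute
sizes = unicode_sizes(sample)
```

For `sample`, this gives 4 code points, 2 graphemes, 11 UTF-8 bytes, 6 UTF-16 units and 4 UTF-32 units. Each regional indicator needs 4 UTF-8 bytes and a surrogate pair in UTF-16, while U+0301 takes 2 UTF-8 bytes.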

Installation

125 |
$ gem install unisec
126 | 
127 |

Check the installation page in the documentation to discover more installation methods.

128 |

Packaging status 129 | Gem Version 130 | GitHub tag (latest SemVer)

131 |

Documentation

132 |

Homepage / Documentation: https://acceis.github.io/unisec/

133 |

Author

134 |

Made by Alexandre ZANNI (@noraj) at ACCEIS.

135 |
136 | 137 | 142 | 143 |
144 | 145 | -------------------------------------------------------------------------------- /lib/unisec/rugrep.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'twitter_cldr' 4 | require 'paint' 5 | 6 | module Unisec 7 | # Ruby grep : Ruby regular expression search for Unicode code point names 8 | class Rugrep 9 | # UCD Derived names file location 10 | # @see https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedName.txt 11 | UCD_DERIVEDNAME = File.join(__dir__, '../../data/DerivedName.txt') 12 | 13 | # Search code points by (Ruby) regexp 14 | # @param regexp [Regexp] Regular expression without delimiters or modifiers. 15 | # Supports everything Ruby Regexp supports 16 | # @return [Array] Array of code points (`{char: String, codepoint: Integer, name: String}`) 17 | # @example 18 | # Unisec::Rugrep.regrep('snowman|snowflake') 19 | # # => 20 | # # [{:char=>"☃", :codepoint=>9731, :name=>"SNOWMAN"}, 21 | # # {:char=>"⛄", :codepoint=>9924, :name=>"SNOWMAN WITHOUT SNOW"}, 22 | # # {:char=>"⛇", :codepoint=>9927, :name=>"BLACK SNOWMAN"}, 23 | # # {:char=>"❄", :codepoint=>10052, :name=>"SNOWFLAKE"}, 24 | # # {:char=>"❅", :codepoint=>10053, :name=>"TIGHT TRIFOLIATE SNOWFLAKE"}, 25 | # # {:char=>"❆", :codepoint=>10054, :name=>"HEAVY CHEVRON SNOWFLAKE"}] 26 | # Unisec::Rugrep.regrep('greek small letter \w+') 27 | # # => 28 | # # [{:char=>"ͱ", :codepoint=>881, :name=>"GREEK SMALL LETTER HETA"}, 29 | # # {:char=>"ͳ", :codepoint=>883, :name=>"GREEK SMALL LETTER ARCHAIC SAMPI"}, 30 | # # {:char=>"ͷ", :codepoint=>887, :name=>"GREEK SMALL LETTER PAMPHYLIAN DIGAMMA"}, 31 | # # …] 32 | def self.regrep(regexp) 33 | out = [] 34 | file = File.new(UCD_DERIVEDNAME) 35 | file.each_line(chomp: true) do |line| 36 | # Skip if the line is empty or a comment 37 | next if line.empty? 
|| line[0] == '#' 38 | 39 | # parse the line to extract code point as integer and the name 40 | cp_int, name = line.split(';') 41 | cp_int = cp_int.chomp.to_i(16) 42 | name.lstrip! 43 | next unless /#{regexp}/i.match?(name) # compiling regexp once is surprisingly not faster 44 | 45 | out << { 46 | char: TwitterCldr::Utils::CodePoints.to_string([cp_int]), 47 | codepoint: cp_int, 48 | name: name 49 | } 50 | end 51 | out 52 | end 53 | 54 | # Display a CLI-friendly output listing all code points corresponding to a regular expression. 55 | # @example 56 | # Unisec::Rugrep.regrep_display('snowman|snowflake') 57 | # # => 58 | # # U+2603 ☃ SNOWMAN 59 | # # U+26C4 ⛄ SNOWMAN WITHOUT SNOW 60 | # # U+26C7 ⛇ BLACK SNOWMAN 61 | # # U+2744 ❄ SNOWFLAKE 62 | # # U+2745 ❅ TIGHT TRIFOLIATE SNOWFLAKE 63 | # # U+2746 ❆ HEAVY CHEVRON SNOWFLAKE 64 | def self.regrep_display(regexp) 65 | codepoints = regrep(regexp) 66 | codepoints.each do |cp| 67 | puts "#{Properties.deccp2stdhexcp(cp[:codepoint]).ljust(7)} #{cp[:char].ljust(4)} #{cp[:name]}" 68 | end 69 | nil 70 | end 71 | 72 | # Returns the version of Unicode used in UCD local file (data/DerivedName.txt) 73 | # @return [String] Unicode version 74 | # @example 75 | # Unisec::Rugrep.ucd_derivedname_version # => "15.1.0" 76 | def self.ucd_derivedname_version 77 | first_line = File.open(UCD_DERIVEDNAME, &:readline) 78 | first_line.match(/-(\d+\.\d+\.\d+)\.txt/).captures.first 79 | end 80 | 81 | # Search code points by (Ruby) regexp 82 | # @param regexp [Regexp] Regular expression without delimiters or modifiers 83 | # @return [Array] Array of code points (`{char: String, codepoint: Integer, name: String}`) 84 | # @example 85 | # Unisec::Rugrep.regrep_slow('snowman|snowflake') 86 | # # => 87 | # # [{:char=>"☃", :codepoint=>9731, :name=>"SNOWMAN"}, 88 | # # {:char=>"⛄", :codepoint=>9924, :name=>"SNOWMAN WITHOUT SNOW"}, 89 | # # {:char=>"⛇", :codepoint=>9927, :name=>"BLACK SNOWMAN"}, 90 | # # {:char=>"❄", :codepoint=>10052, :name=>"SNOWFLAKE"}, 
91 | # # {:char=>"❅", :codepoint=>10053, :name=>"TIGHT TRIFOLIATE SNOWFLAKE"}, 92 | # # {:char=>"❆", :codepoint=>10054, :name=>"HEAVY CHEVRON SNOWFLAKE"}] 93 | # @note ⚠ This command is very time consuming (~ 1min) and unoptimized (execute one regexp per code point…) 94 | def self.regrep_slow(regexp) 95 | out = [] 96 | TwitterCldr::Shared::CodePoint.each do |cp| 97 | next unless /#{regexp}/i.match?(cp.name) # compiling regexp once is surprisingly not faster 98 | 99 | out << { 100 | char: TwitterCldr::Utils::CodePoints.to_string([cp.code_point]), 101 | codepoint: cp.code_point, 102 | name: cp.name 103 | } 104 | end 105 | out 106 | end 107 | 108 | # Display a CLI-friendly output listing all code points corresponding to a regular expression. 109 | # @example 110 | # Unisec::Rugrep.regrep_display_slow('snowman|snowflake') 111 | # # => 112 | # # U+2603 ☃ SNOWMAN 113 | # # U+26C4 ⛄ SNOWMAN WITHOUT SNOW 114 | # # U+26C7 ⛇ BLACK SNOWMAN 115 | # # U+2744 ❄ SNOWFLAKE 116 | # # U+2745 ❅ TIGHT TRIFOLIATE SNOWFLAKE 117 | # # U+2746 ❆ HEAVY CHEVRON SNOWFLAKE 118 | def self.regrep_display_slow(regexp) 119 | codepoints = regrep_slow(regexp) 120 | codepoints.each do |cp| 121 | puts "#{Properties.deccp2stdhexcp(cp[:codepoint]).ljust(7)} #{cp[:char].ljust(4)} #{cp[:name]}" 122 | end 123 | nil 124 | end 125 | end 126 | end 127 | -------------------------------------------------------------------------------- /docs/yard/file.install.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | File: install 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 59 | 60 |

Installation

61 |

Production

62 | 63 |

rubygems.org (universal)

64 |
$ gem install unisec
 65 | 
66 |

Gem: unisec

67 |

BlackArch

68 |

From the repository:

69 |
# pacman -S unisec
 70 | 
71 |

From git:

72 |
# blackman -i unisec
 73 | 
74 |

PKGBUILD: unisec

75 |

ArchLinux

76 |

Manually:

77 |
$ git clone https://aur.archlinux.org/unisec.git
 78 | $ cd unisec
 79 | $ makepkg -sic
 80 | 
81 |

With an AUR helper (Pacman wrappers), e.g. pikaur:

82 |
$ pikaur -S unisec
 83 | 
84 |

AUR: unisec

85 | 86 |

Development

87 |

It's better to use ASDF-VM to get the latest version of Ruby and to avoid trashing your system Ruby.

88 | 89 |

rubygems.org

90 |
$ gem install --development unisec
 91 | 
92 |

git

93 |

Just replace x.x.x with the gem version you see after gem build.

94 |
$ git clone https://github.com/acceis/unisec.git unisec
 95 | $ cd unisec
 96 | $ gem install bundler
 97 | $ bundler install
 98 | $ gem build unisec.gemspec
 99 | $ gem install unisec-x.x.x.gem
100 | 
101 |

Note: if an automatic install is needed, you can get the version with $ gem build unisec.gemspec | grep Version | cut -d' ' -f4.

102 |

No install

103 |

Run the library in irb without installing the gem.

104 |

From local file:

105 |
$ irb -Ilib -runisec
106 | 
107 |

Same for the CLI tool:

108 |
$ ruby -Ilib -runisec bin/unisec
109 | 
110 | 111 |
112 | 113 | 118 | 119 |
120 | 121 | -------------------------------------------------------------------------------- /docs/yard/css/full_list.css: -------------------------------------------------------------------------------- 1 | body { 2 | margin: 0; 3 | font-family: "Lucida Sans", "Lucida Grande", Verdana, Arial, sans-serif; 4 | font-size: 13px; 5 | height: 101%; 6 | overflow-x: hidden; 7 | background: #fafafa; 8 | } 9 | 10 | h1 { padding: 12px 10px; padding-bottom: 0; margin: 0; font-size: 1.4em; } 11 | .clear { clear: both; } 12 | .fixed_header { position: fixed; background: #fff; width: 100%; padding-bottom: 10px; margin-top: 0; top: 0; z-index: 9999; height: 70px; } 13 | #search { position: absolute; right: 5px; top: 9px; padding-left: 24px; } 14 | #content.insearch #search, #content.insearch #noresults { background: url(data:image/gif;base64,R0lGODlhEAAQAPYAAP///wAAAPr6+pKSkoiIiO7u7sjIyNjY2J6engAAAI6OjsbGxjIyMlJSUuzs7KamppSUlPLy8oKCghwcHLKysqSkpJqamvT09Pj4+KioqM7OzkRERAwMDGBgYN7e3ujo6Ly8vCoqKjY2NkZGRtTU1MTExDw8PE5OTj4+PkhISNDQ0MrKylpaWrS0tOrq6nBwcKysrLi4uLq6ul5eXlxcXGJiYoaGhuDg4H5+fvz8/KKiohgYGCwsLFZWVgQEBFBQUMzMzDg4OFhYWBoaGvDw8NbW1pycnOLi4ubm5kBAQKqqqiQkJCAgIK6urnJyckpKSjQ0NGpqatLS0sDAwCYmJnx8fEJCQlRUVAoKCggICLCwsOTk5ExMTPb29ra2tmZmZmhoaNzc3KCgoBISEiIiIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH/C05FVFNDQVBFMi4wAwEAAAAh/hpDcmVhdGVkIHdpdGggYWpheGxvYWQuaW5mbwAh+QQJCAAAACwAAAAAEAAQAAAHaIAAgoMgIiYlg4kACxIaACEJCSiKggYMCRselwkpghGJBJEcFgsjJyoAGBmfggcNEx0flBiKDhQFlIoCCA+5lAORFb4AJIihCRbDxQAFChAXw9HSqb60iREZ1omqrIPdJCTe0SWI09GBACH5BAkIAAAALAAAAAAQABAAAAdrgACCgwc0NTeDiYozCQkvOTo9GTmDKy8aFy+NOBA7CTswgywJDTIuEjYFIY0JNYMtKTEFiRU8Pjwygy4ws4owPyCKwsMAJSTEgiQlgsbIAMrO0dKDGMTViREZ14kYGRGK38nHguHEJcvTyIEAIfkECQgAAAAsAAAAABAAEAAAB2iAAIKDAggPg4iJAAMJCRUAJRIqiRGCBI0WQEEJJkWDERkYAAUKEBc4Po1GiKKJHkJDNEeKig4URLS0ICImJZAkuQAhjSi/wQyNKcGDCyMnk8u5rYrTgqDVghgZlYjcACTA1sslvtHRgQAh+QQJCAAAACwAAAAAEAAQAAAHZ4AAgoOEhYaCJSWHgxGDJCQARAtOUoQRGRiFD
0kJUYWZhUhKT1OLhR8wBaaFBzQ1NwAlkIszCQkvsbOHL7Y4q4IuEjaqq0ZQD5+GEEsJTDCMmIUhtgk1lo6QFUwJVDKLiYJNUd6/hoEAIfkECQgAAAAsAAAAABAAEAAAB2iAAIKDhIWGgiUlh4MRgyQkjIURGRiGGBmNhJWHm4uen4ICCA+IkIsDCQkVACWmhwSpFqAABQoQF6ALTkWFnYMrVlhWvIKTlSAiJiVVPqlGhJkhqShHV1lCW4cMqSkAR1ofiwsjJyqGgQAh+QQJCAAAACwAAAAAEAAQAAAHZ4AAgoOEhYaCJSWHgxGDJCSMhREZGIYYGY2ElYebi56fhyWQniSKAKKfpaCLFlAPhl0gXYNGEwkhGYREUywag1wJwSkHNDU3D0kJYIMZQwk8MjPBLx9eXwuETVEyAC/BOKsuEjYFhoEAIfkECQgAAAAsAAAAABAAEAAAB2eAAIKDhIWGgiUlh4MRgyQkjIURGRiGGBmNhJWHm4ueICImip6CIQkJKJ4kigynKaqKCyMnKqSEK05StgAGQRxPYZaENqccFgIID4KXmQBhXFkzDgOnFYLNgltaSAAEpxa7BQoQF4aBACH5BAkIAAAALAAAAAAQABAAAAdogACCg4SFggJiPUqCJSWGgkZjCUwZACQkgxGEXAmdT4UYGZqCGWQ+IjKGGIUwPzGPhAc0NTewhDOdL7Ykji+dOLuOLhI2BbaFETICx4MlQitdqoUsCQ2vhKGjglNfU0SWmILaj43M5oEAOwAAAAAAAAAAAA==) no-repeat center left; } 15 | #full_list { padding: 0; list-style: none; margin-left: 0; margin-top: 80px; font-size: 1.1em; } 16 | #full_list ul { padding: 0; } 17 | #full_list li { padding: 0; margin: 0; list-style: none; } 18 | #full_list li .item { padding: 5px 5px 5px 12px; } 19 | #noresults { padding: 7px 12px; background: #fff; } 20 | #content.insearch #noresults { margin-left: 7px; } 21 | li.collapsed ul { display: none; } 22 | li a.toggle { cursor: default; position: relative; left: -5px; top: 4px; text-indent: -999px; width: 10px; height: 9px; margin-left: -10px; display: block; float: left; background: 
url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAK8AAACvABQqw0mAAAABx0RVh0U29mdHdhcmUAQWRvYmUgRmlyZXdvcmtzIENTM5jWRgMAAAAVdEVYdENyZWF0aW9uIFRpbWUAMy8xNC8wOeNZPpQAAAE2SURBVDiNrZTBccIwEEXfelIAHUA6CZ24BGaWO+FuzZAK4k6gg5QAdGAq+Bxs2Yqx7BzyL7Llp/VfzZeQhCTc/ezuGzKKnKSzpCxXJM8fwNXda3df5RZETlIt6YUzSQDs93sl8w3wBZxCCE10GM1OcWbWjB2mWgEH4Mfdyxm3PSepBHibgQE2wLe7r4HjEidpnXMYdQPKEMJcsZ4zs2POYQOcaPfwMVOo58zsAdMt18BuoVDPxUJRacELbXv3hUIX2vYmOUvi8C8ydz/ThjXrqKqqLbDIAdsCKBd+Wo7GWa7o9qzOQHVVVXeAbs+yHHCH4aTsaCOQqunmUy1yBUAXkdMIfMlgF5EXLo2OpV/c/Up7jG4hhHcYLgWzAZXUc2b2ixsfvc/RmNNfOXD3Q/oeL9axJE1yT9IOoUu6MGUkAAAAAElFTkSuQmCC) no-repeat bottom left; } 23 | li.collapsed a.toggle { opacity: 0.5; cursor: default; background-position: top left; } 24 | li { color: #888; cursor: pointer; } 25 | li.deprecated { text-decoration: line-through; font-style: italic; } 26 | li.odd { background: #f0f0f0; } 27 | li.even { background: #fafafa; } 28 | .item:hover { background: #ddd; } 29 | li small:before { content: "("; } 30 | li small:after { content: ")"; } 31 | li small.search_info { display: none; } 32 | a, a:visited { text-decoration: none; color: #05a; } 33 | li.clicked > .item { background: #05a; color: #ccc; } 34 | li.clicked > .item a, li.clicked > .item a:visited { color: #eee; } 35 | li.clicked > .item a.toggle { opacity: 0.5; background-position: bottom right; } 36 | li.collapsed.clicked a.toggle { background-position: top right; } 37 | #search input { border: 1px solid #bbb; border-radius: 3px; } 38 | #full_list_nav { margin-left: 10px; font-size: 0.9em; display: block; color: #aaa; } 39 | #full_list_nav a, #nav a:visited { color: #358; } 40 | #full_list_nav a:hover { background: transparent; color: #5af; } 41 | #full_list_nav span:after { content: ' | '; } 42 | #full_list_nav span:last-child:after { content: ''; } 43 | 44 | #content h1 { margin-top: 0; } 45 | li { white-space: nowrap; cursor: normal; } 46 | li small { display: block; 
font-size: 0.8em; } 47 | li small:before { content: ""; } 48 | li small:after { content: ""; } 49 | li small.search_info { display: none; } 50 | #search { width: 170px; position: static; margin: 3px; margin-left: 10px; font-size: 0.9em; color: #888; padding-left: 0; padding-right: 24px; } 51 | #content.insearch #search { background-position: center right; } 52 | #search input { width: 110px; } 53 | 54 | #full_list.insearch ul { display: block; } 55 | #full_list.insearch .item { display: none; } 56 | #full_list.insearch .found { display: block; padding-left: 11px !important; } 57 | #full_list.insearch li a.toggle { display: none; } 58 | #full_list.insearch li small.search_info { display: block; } 59 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Properties/List.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Class: Unisec::CLI::Commands::Properties::List 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Class: Unisec::CLI::Commands::Properties::List 63 | 64 | 65 | 66 |

67 |
68 | 69 |
70 |
Inherits:
71 |
72 | Dry::CLI::Command 73 | 74 |
    75 |
  • Object
  • 76 | 77 | 78 | 79 | 80 | 81 |
82 | show all 83 | 84 |
85 |
86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 |
98 |
Defined in:
99 |
lib/unisec/cli/properties.rb
100 |
101 | 102 |
103 | 104 |

Overview

105 |
106 |

Command unisec properties list

107 |

Example:

108 |
$ unisec properties list
109 | ASCII_Hex_Digit
110 | Age
111 | Alphabetic
112 | …
113 | 
114 | 115 | 116 |
117 |
118 |
119 | 120 | 121 |
122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 |

130 | Instance Method Summary 131 | collapse 132 |

133 | 134 |
    135 | 136 |
  • 137 | 138 | 139 | #call ⇒ Object 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 |

    List Unicode property names.

    154 |
    155 | 156 |
  • 157 | 158 | 159 |
160 | 161 | 162 | 163 | 164 | 165 |
166 |

Instance Method Details

167 | 168 | 169 |
170 |

171 | 172 | #callObject 173 | 174 | 175 | 176 | 177 | 178 |

179 |
180 |

List Unicode property names.

181 | 182 | 183 |
184 |
185 |
186 | 187 | 188 |
189 | 190 | 200 | 209 | 210 |
191 |
192 | 
193 | 
194 | 27
195 | 28
196 | 29
197 | 30
198 | 31
199 |
201 |
# File 'lib/unisec/cli/properties.rb', line 27
202 | 
203 | def call(**)
204 |   Unisec::Properties.list.each do |p|
205 |     puts p
206 |   end
207 | end
208 |
211 |
212 | 213 |
214 | 215 |
216 | 217 | 222 | 223 |
224 | 225 | -------------------------------------------------------------------------------- /lib/unisec/size.rb: -------------------------------------------------------------------------------- 1 | # frozen_string_literal: true 2 | 3 | require 'paint' 4 | 5 | module Unisec 6 | # All kinds of size information about a Unicode string 7 | class Size 8 | # Number of code points 9 | # @return [Integer] number of code points 10 | # @example 11 | # us = Unisec::Size.new('👩‍❤️‍👩') 12 | # us.code_points_size # => 6 13 | attr_reader :code_points_size 14 | 15 | # Number of graphemes 16 | # @return [Integer] number of graphemes 17 | # @example 18 | # us = Unisec::Size.new('👩‍❤️‍👩') 19 | # us.grapheme_size # => 1 20 | attr_reader :grapheme_size 21 | 22 | # UTF-8 size in bytes 23 | # @return [Integer] UTF-8 size in bytes 24 | # @example 25 | # us = Unisec::Size.new('👩‍❤️‍👩') 26 | # us.utf8_bytesize # => 20 27 | attr_reader :utf8_bytesize 28 | 29 | # UTF-16 size in bytes 30 | # @return [Integer] UTF-16 size in bytes 31 | # @example 32 | # us = Unisec::Size.new('👩‍❤️‍👩') 33 | # us.utf16_bytesize # => 16 34 | attr_reader :utf16_bytesize 35 | 36 | # UTF-32 size in bytes 37 | # @return [Integer] UTF-32 size in bytes 38 | # @example 39 | # us = Unisec::Size.new('👩‍❤️‍👩') 40 | # us.utf32_bytesize # => 24 41 | attr_reader :utf32_bytesize 42 | 43 | # Number of UTF-8 units 44 | # @return [Integer] number of UTF-8 units 45 | # @example 46 | # us = Unisec::Size.new('👩‍❤️‍👩') 47 | # us.utf8_unitsize # => 20 48 | attr_reader :utf8_unitsize 49 | 50 | # Number of UTF-16 units 51 | # @return [Integer] number of UTF-16 units 52 | # @example 53 | # us = Unisec::Size.new('👩‍❤️‍👩') 54 | # us.utf16_unitsize # => 8 55 | attr_reader :utf16_unitsize 56 | 57 | # Number of UTF-32 units 58 | # @return [Integer] number of UTF-32 units 59 | # @example 60 | # us = Unisec::Size.new('👩‍❤️‍👩') 61 | # us.utf32_unitsize # => 6 62 | attr_reader :utf32_unitsize 63 | 64 | def initialize(str) 65 | @code_points_size =
Size.code_points_size(str) 66 | @grapheme_size = Size.grapheme_size(str) 67 | @utf8_bytesize = Size.utf8_bytesize(str) 68 | @utf16_bytesize = Size.utf16_bytesize(str) 69 | @utf32_bytesize = Size.utf32_bytesize(str) 70 | @utf8_unitsize = Size.utf8_unitsize(str) 71 | @utf16_unitsize = Size.utf16_unitsize(str) 72 | @utf32_unitsize = Size.utf32_unitsize(str) 73 | end 74 | 75 | # Number of code points 76 | # @param str [String] Input string we want to know the size of 77 | # @return [Integer] number of code points 78 | # @example 79 | # Unisec::Size.code_points_size('👩‍❤️‍👩') # => 6 80 | def self.code_points_size(str) 81 | str.size 82 | end 83 | 84 | # Number of graphemes 85 | # @param str [String] Input string we want to know the size of 86 | # @return [Integer] number of graphemes 87 | # @example 88 | # Unisec::Size.grapheme_size('👩‍❤️‍👩') # => 1 89 | def self.grapheme_size(str) 90 | str.grapheme_clusters.size 91 | end 92 | 93 | # UTF-8 size in bytes 94 | # @param str [String] Input string we want to know the size of 95 | # @return [Integer] UTF-8 size in bytes 96 | # @example 97 | # Unisec::Size.utf8_bytesize('👩‍❤️‍👩') # => 20 98 | def self.utf8_bytesize(str) 99 | str.bytesize 100 | end 101 | 102 | # UTF-16 size in bytes 103 | # @param str [String] Input string we want to know the size of 104 | # @return [Integer] UTF-16 size in bytes 105 | # @example 106 | # Unisec::Size.utf16_bytesize('👩‍❤️‍👩') # => 16 107 | def self.utf16_bytesize(str) 108 | str.encode('UTF-16BE').bytesize 109 | end 110 | 111 | # UTF-32 size in bytes 112 | # @param str [String] Input string we want to know the size of 113 | # @return [Integer] UTF-32 size in bytes 114 | # @example 115 | # Unisec::Size.utf32_bytesize('👩‍❤️‍👩') # => 24 116 | def self.utf32_bytesize(str) 117 | str.encode('UTF-32BE').bytesize 118 | end 119 | 120 | # Number of UTF-8 units 121 | # @param str [String] Input string we want to know the size of 122 | # @return [Integer] number of UTF-8 units 123 | # @example 124 | #
Unisec::Size.utf8_unitsize('👩‍❤️‍👩') # => 20 125 | def self.utf8_unitsize(str) 126 | utf8_bytesize(str) 127 | end 128 | 129 | # Number of UTF-16 units 130 | # @param str [String] Input string we want to know the size of 131 | # @return [Integer] number of UTF-16 units 132 | # @example 133 | # Unisec::Size.utf16_unitsize('👩‍❤️‍👩') # => 8 134 | def self.utf16_unitsize(str) 135 | utf16_bytesize(str) / 2 136 | end 137 | 138 | # Number of UTF-32 units 139 | # @param str [String] Input string we want to know the size of 140 | # @return [Integer] number of UTF-32 units 141 | # @example 142 | # Unisec::Size.utf32_unitsize('👩‍❤️‍👩') # => 6 143 | def self.utf32_unitsize(str) 144 | utf32_bytesize(str) / 4 145 | end 146 | 147 | # Display a CLI-friendly output summarizing the size information about a Unicode string. 148 | # @example 149 | # Unisec::Size.new('👩‍❤️‍👨').display 150 | # # => 151 | # # Code point(s): 6 152 | # # Grapheme(s): 1 153 | # # UTF-8 byte(s): 20 154 | # # UTF-16 byte(s): 16 155 | # # UTF-32 byte(s): 24 156 | # # UTF-8 unit(s): 20 157 | # # UTF-16 unit(s): 8 158 | # # UTF-32 unit(s): 6 159 | def display 160 | display = ->(key, value) { puts Paint[key, :red, :bold].ljust(27) + " #{value}" } 161 | display.call('Code point(s):', @code_points_size) 162 | display.call('Grapheme(s):', @grapheme_size) 163 | display.call('UTF-8 byte(s):', @utf8_bytesize) 164 | display.call('UTF-16 byte(s):', @utf16_bytesize) 165 | display.call('UTF-32 byte(s):', @utf32_bytesize) 166 | display.call('UTF-8 unit(s):', @utf8_unitsize) 167 | display.call('UTF-16 unit(s):', @utf16_unitsize) 168 | display.call('UTF-32 unit(s):', @utf32_unitsize) 169 | end 170 | end 171 | end 172 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Versions.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Class: Unisec::CLI::Commands::Versions 8 | 9 | — Documentation by YARD 0.9.36 10 
| 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
| 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Class: Unisec::CLI::Commands::Versions 63 | 64 | 65 | 66 |

67 |
68 | 69 |
70 |
Inherits:
71 |
72 | Dry::CLI::Command 73 | 74 |
    75 |
  • Object
  • 76 | 77 | 78 | 79 | 80 | 81 |
82 | show all 83 | 84 |
85 |
86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 |
98 |
Defined in:
99 |
lib/unisec/cli/versions.rb
100 |
101 | 102 |
103 | 104 |

Overview

105 |
106 |

CLI command unisec versions for the class Versions from the lib.

107 |

Example:

108 |
$ unisec versions
109 | Unicode:
110 | Unicode (Ruby)                    15.0.0
111 | Unicode (twitter_cldr gem)        14.0.0
112 | Unicode (unicode-confusable gem)  15.0.0
113 | ICU (twitter_cldr gem)            70.1
114 | CLDR (twitter_cldr gem)           40
115 | Unicode emoji (Ruby)              15.0
116 | 
117 | Gems:
118 | unisec                            0.0.1
119 | twitter_cldr gem                  6.11.5
120 | unicode-confusable gem            1.9.0
121 | 
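The two "(Ruby)" rows describe the Unicode data compiled into the interpreter. A minimal sketch of reading those values with the standard `RbConfig` module; that unisec obtains them this way is an assumption, the helper name `ruby_unicode_versions` is hypothetical, and the gem-provided rows (twitter_cldr, unicode-confusable) are not covered:

```ruby
require 'rbconfig'

# Unicode data versions compiled into the running Ruby interpreter.
# Both keys are standard RbConfig entries; mapping them to the
# "Unicode (Ruby)" and "Unicode emoji (Ruby)" rows is an assumption.
def ruby_unicode_versions
  {
    'Unicode (Ruby)' => RbConfig::CONFIG['UNICODE_VERSION'],
    'Unicode emoji (Ruby)' => RbConfig::CONFIG['UNICODE_EMOJI_VERSION']
  }
end

# Two-column layout modeled on the example output (labels padded to 34 chars).
ruby_unicode_versions.each do |label, version|
  puts "#{label.ljust(34)}#{version}"
end
```

On Ruby 3.3 this should report Unicode 15.0.0 and emoji 15.0, matching the example; other interpreter versions will print their own values.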
122 | 123 | 124 |
125 |
126 |
127 | 128 | 129 |
130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 |

138 | Instance Method Summary 139 | collapse 140 |

141 | 142 |
    143 | 144 |
  • 145 | 146 | 147 | #call ⇒ Object 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 |

    Version of anything related to Unicode as used in unisec.

    162 |
    163 | 164 |
  • 165 | 166 | 167 |
168 | 169 | 170 | 171 | 172 | 173 |
174 |

Instance Method Details

175 | 176 | 177 |
178 |

179 | 180 | #callObject 181 | 182 | 183 | 184 | 185 | 186 |

187 |
188 |

Version of anything related to Unicode as used in unisec.

189 | 190 | 191 |
192 |
193 |
194 | 195 | 196 |
197 | 198 | 206 | 213 | 214 |
199 |
200 | 
201 | 
202 | 32
203 | 33
204 | 34
205 |
207 |
# File 'lib/unisec/cli/versions.rb', line 32
208 | 
209 | def call(**)
210 |   puts Unisec::Versions.display
211 | end
212 |
215 |
216 | 217 |
218 | 219 |
220 | 221 | 226 | 227 |
228 | 229 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Grep.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Class: Unisec::CLI::Commands::Grep 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Class: Unisec::CLI::Commands::Grep 63 | 64 | 65 | 66 |

67 |
68 | 69 |
70 |
Inherits:
71 |
72 | Dry::CLI::Command 73 | 74 |
    75 |
  • Object
  • 76 | 77 | 78 | 79 | 80 | 81 |
82 | show all 83 | 84 |
85 |
86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 |
98 |
Defined in:
99 |
lib/unisec/cli/rugrep.rb
100 |
101 | 102 |
103 | 104 |

Overview

105 |
106 |

CLI command unisec grep for the class Rugrep from the lib.

107 |

Example:

108 |
$ unisec grep 'FRENCH \w+'
109 | U+20A3  ₣    FRENCH FRANC SIGN
110 | U+1F35F 🍟    FRENCH FRIES
111 | 
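Conceptually, the command matches a regular expression against every code point name and prints one row per hit. The sketch below illustrates that with a hypothetical three-entry table (`SAMPLE_NAMES`) standing in for the full Unicode name database unisec gets from the twitter_cldr gem. `grep_names` and `format_row` are hypothetical names, matching is assumed to be case-insensitive as in `regrep_slow`, and the row layout is modeled on the example output rather than on unisec's own formatter:

```ruby
# Hypothetical miniature name table; the real command searches every code point.
SAMPLE_NAMES = {
  0x20A3 => 'FRENCH FRANC SIGN',
  0x1F35F => 'FRENCH FRIES',
  0x2603 => 'SNOWMAN'
}.freeze

# Return [code point, name] pairs whose name matches +pattern+ (case-insensitive).
def grep_names(pattern, table = SAMPLE_NAMES)
  re = Regexp.new(pattern, Regexp::IGNORECASE) # compiled once, reused for every name
  table.select { |_cp, name| re.match?(name) }.to_a
end

# Format one row: "U+XXXX" padded to 7, the character padded to 4, then the name.
def format_row(codepoint, name)
  hex = "U+#{codepoint.to_s(16).upcase.rjust(4, '0')}"
  "#{hex.ljust(7)} #{[codepoint].pack('U').ljust(4)} #{name}"
end

grep_names('FRENCH \w+').each { |cp, name| puts format_row(cp, name) }
```

With this sample table, the two printed rows are identical to the `unisec grep 'FRENCH \w+'` example above.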
112 | 113 | 114 |
115 |
116 |
117 | 118 | 119 |
120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 |

128 | Instance Method Summary 129 | collapse 130 |

131 | 132 |
    133 | 134 |
  • 135 | 136 | 137 | #call(regexp: nil) ⇒ Object 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 |

    List the Unicode code points whose name matches a regular expression.

    152 |
    153 | 154 |
  • 155 | 156 | 157 |
158 | 159 | 160 | 161 | 162 | 163 |
164 |

Instance Method Details

165 | 166 | 167 |
168 |

169 | 170 | #call(regexp: nil) ⇒ Object 171 | 172 | 173 | 174 | 175 | 176 |

177 |
178 |

List the Unicode code points whose name matches a regular expression.

179 | 180 | 181 |
182 |
183 |
184 |

Parameters:

185 |
    186 | 187 |
  • 188 | 189 | regexp 190 | 191 | 192 | (Regexp) 193 | 194 | 195 | (defaults to: nil) 196 | 197 | 198 | — 199 |

    Regular expression without delimiters or modifiers. 200 | Supports everything Ruby Regexp supports.

    201 |
    202 | 203 |
  • 204 | 205 |
206 | 207 | 208 |
209 | 210 | 218 | 225 | 226 |
211 |
212 | 
213 | 
214 | 27
215 | 28
216 | 29
217 |
219 |
# File 'lib/unisec/cli/rugrep.rb', line 27
220 | 
221 | def call(regexp: nil, **)
222 |   puts Unisec::Rugrep.regrep_display(regexp)
223 | end
224 |
227 |
228 | 229 |
230 | 231 |
232 | 233 | 238 | 239 |
240 | 241 | -------------------------------------------------------------------------------- /docs/yard/Unisec/CLI/Commands/Size.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Class: Unisec::CLI::Commands::Size 8 | 9 | — Documentation by YARD 0.9.36 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 34 | 35 |
36 | 61 | 62 |

Class: Unisec::CLI::Commands::Size 63 | 64 | 65 | 66 |

67 |
68 | 69 |
70 |
Inherits:
71 |
72 | Dry::CLI::Command 73 | 74 |
    75 |
  • Object
  • 76 | 77 | 78 | 79 | 80 | 81 |
82 | show all 83 | 84 |
85 |
86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 |
98 |
Defined in:
99 |
lib/unisec/cli/size.rb
100 |
101 | 102 |
103 | 104 |

Overview

105 |
106 |

CLI command unisec size for the class Size from the lib.

107 |

Example:

108 |
$ unisec size 🧑🏼‍🔬
109 | Code point(s):   4
110 | Grapheme(s):     1
111 | UTF-8 byte(s):   15
112 | UTF-16 byte(s):  14
113 | UTF-32 byte(s):  16
114 | UTF-8 unit(s):   15
115 | UTF-16 unit(s):  7
116 | UTF-32 unit(s):  4
117 | 
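Every figure above can be recomputed with core `String` methods, mirroring the calls `Unisec::Size` makes (`size`, `grapheme_clusters`, `bytesize`, and `encode` to UTF-16BE/UTF-32BE). A minimal stdlib-only sketch; the string is built from explicit code points (U+1F9D1 ADULT, U+1F3FC skin tone modifier, U+200D ZERO WIDTH JOINER, U+1F52C MICROSCOPE) so it does not depend on how the emoji is copy-pasted:

```ruby
# 🧑🏼‍🔬 built from its four code points.
scientist = [0x1F9D1, 0x1F3FC, 0x200D, 0x1F52C].pack('U*')

sizes = {
  'Code point(s):' => scientist.size,
  'Grapheme(s):' => scientist.grapheme_clusters.size,
  'UTF-8 byte(s):' => scientist.bytesize,
  'UTF-16 byte(s):' => scientist.encode('UTF-16BE').bytesize, # BE: no BOM added
  'UTF-32 byte(s):' => scientist.encode('UTF-32BE').bytesize
}
# Code units have a fixed width per encoding: 1, 2 and 4 bytes.
sizes['UTF-8 unit(s):'] = sizes['UTF-8 byte(s):']
sizes['UTF-16 unit(s):'] = sizes['UTF-16 byte(s):'] / 2
sizes['UTF-32 unit(s):'] = sizes['UTF-32 byte(s):'] / 4

sizes.each { |label, value| puts "#{label.ljust(16)} #{value}" }
```

The UTF-16 figures exceed the code point count because the three astral code points each need a surrogate pair, while U+200D fits in a single unit.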
118 | 119 | 120 |
121 |
122 |
123 | 124 | 125 |
126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 |

134 | Instance Method Summary 135 | collapse 136 |

137 | 138 |
    139 | 140 |
  • 141 | 142 | 143 | #call(input: nil) ⇒ Object 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 |

    All kinds of size information about a Unicode string.

    158 |
    159 | 160 |
  • 161 | 162 | 163 |
164 | 165 | 166 | 167 | 168 | 169 |
170 |

Instance Method Details

171 | 172 | 173 |
174 |

175 | 176 | #call(input: nil) ⇒ Object 177 | 178 | 179 | 180 | 181 | 182 |

183 |
184 |

All kinds of size information about a Unicode string.

185 | 186 | 187 |
188 |
189 |
190 |

Parameters:

191 |
    192 | 193 |
  • 194 | 195 | input 196 | 197 | 198 | (String) 199 | 200 | 201 | (defaults to: nil) 202 | 203 | 204 | — 205 |

    Input string we want to know the size of

    206 |
    207 | 208 |
  • 209 | 210 |
211 | 212 | 213 |
214 | 215 | 223 | 230 | 231 |
216 |
217 | 
218 | 
219 | 32
220 | 33
221 | 34
222 |
224 |
# File 'lib/unisec/cli/size.rb', line 32
225 | 
226 | def call(input: nil, **)
227 |   puts Unisec::Size.new(input).display
228 | end
229 |
232 |
233 | 234 |
235 | 236 |
237 | 238 | 243 | 244 |
245 | 246 | --------------------------------------------------------------------------------