├── README.md
├── ReflectionTypeHint.php
├── ReflectionTypeHint_example.php
├── UTF8-CHANGELOG.txt
├── UTF8.php
└── php.ini.error_prepend_string.example


/README.md:
--------------------------------------------------------------------------------
  1 | # UTF8 support in PHP5
  2 | PHP5 UTF8 is a UTF-8 aware library of functions mirroring PHP's own string functions.
  3 | A powerful solution for UTF-8 support in your framework or CMS, written in PHP.
  4 | This library is an advancement over http://sourceforge.net/projects/phputf8 (last updated in 2007).
  5 | 
  6 | ## Features and benefits
  7 | 
  8 | 1. Compatibility with the interface of the standard PHP functions that deal with single-byte encodings
  9 | 1. Works without the PHP extensions ICONV and MBSTRING; if they are installed, they are actively used. The fastest available method is chosen among MBSTRING, ICONV, a native PHP implementation, and hacks.
 10 | 1. Useful features that are missing from ICONV and MBSTRING
 11 | 1. Methods that take and return a string can also take and return null. This is useful for values selected from a database.
 12 | 1. Several methods can process arrays recursively: `array_change_key_case()`, `convert_from()`, `convert_to()`, `strict()`, `is_utf8()`, `blocks_check()`, `convert_case()`, `lowercase()`, `uppercase()`, `unescape()`
 13 | 1. Validation of method parameters against allowed types via reflection (can be disabled)
 14 | 1. A single interface and encapsulation; you can inherit from the class and override its methods
 15 | 1. Test coverage
 16 | 1. PHP >= 5.3.x
 17 | 
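The "fastest available method" idea from the feature list can be illustrated with a small fallback chain. This is only a sketch under stated assumptions: `utf8_strlen_sketch()` is a hypothetical helper written for this README, not a method of the `UTF8` class, and the real library may choose between extensions differently.

```php
<?php
// Hypothetical helper (not part of the library): counts UTF-8 characters
// using whichever facility happens to be available.
function utf8_strlen_sketch($s)
{
    if ($s === null) return null;  // "null in, null out", as the feature list describes
    if (function_exists('mb_strlen')) return mb_strlen($s, 'UTF-8');
    if (function_exists('iconv_strlen')) return iconv_strlen($s, 'UTF-8');
    // Pure-PHP fallback: in valid UTF-8 every character contains exactly one
    // byte that is not a continuation byte (0x80..0xBF), so count those bytes.
    return preg_match_all('/[^\x80-\xBF]/', $s, $m);
}

echo utf8_strlen_sketch('Hello, Привет'), "\n";  // 13
```

The fallback branch assumes the input is already valid UTF-8; on broken input the three branches may disagree.
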
 18 | Example:
 19 | 
 20 |     $s = 'Hello, Привет';
 21 |     if (UTF8::is_utf8($s)) echo UTF8::strlen($s);
 22 | 
 23 | ## Standard PHP functions implemented for UTF-8 encoded strings
 24 | 
 25 | ### Alphabetical list
 26 | 
 27 | 1. `array_change_key_case()`
 28 | 1. `chr()`: Converts a Unicode codepoint to a UTF-8 character
 29 | 1. `chunk_split()`
 30 | 1. `ltrim()`
 31 | 1. `ord()`: Converts a UTF-8 character to a Unicode codepoint
 32 | 1. `preg_match_all()`: Calls `preg_match_all()` and converts byte offsets into character offsets for the `PREG_OFFSET_CAPTURE` flag, regardless of whether the `/u` modifier is used.
 33 | 1. `range()`
 34 | 1. `rtrim()`
 35 | 1. `str_pad()`
 36 | 1. `str_split()`
 37 | 1. `strcasecmp()`
 38 | 1. `strcmp()`
 39 | 1. `stripos()`
 40 | 1. `strlen()`
 41 | 1. `strncmp()`
 42 | 1. `strpos()`
 43 | 1. `strrev()`
 44 | 1. `strspn()`
 45 | 1. `strtolower()` (`lowercase()` is an alias)
 46 | 1. `strtoupper()` (`uppercase()` is an alias)
 47 | 1. `strtr()`
 48 | 1. `substr()`
 49 | 1. `substr_replace()`
 50 | 1. `trim()`
 51 | 1. `ucfirst()`
 52 | 1. `ucwords()`
 53 | 
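The offset conversion mentioned for `preg_match_all()` can be sketched as follows. PHP reports `PREG_OFFSET_CAPTURE` offsets in bytes even with the `/u` modifier, so a character offset has to be derived from the bytes that precede the match. `byte_to_char_offset_sketch()` is a hypothetical name used only for this illustration and assumes valid UTF-8 input; it is not the library's implementation.

```php
<?php
// Hypothetical helper: converts a byte offset into a character offset
// by counting the non-continuation bytes that precede it.
function byte_to_char_offset_sketch($s, $byte_offset)
{
    return preg_match_all('/[^\x80-\xBF]/', substr($s, 0, $byte_offset), $m);
}

$s = 'Привет, мир';
preg_match_all('/мир/u', $s, $matches, PREG_OFFSET_CAPTURE);

$byte_offset = $matches[0][0][1];                             // 14: six 2-byte letters plus ", "
$char_offset = byte_to_char_offset_sketch($s, $byte_offset);  // 8: six letters plus ", "

echo $byte_offset, ' ', $char_offset, "\n";
```
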
 54 | ## Extra useful functions for UTF-8 encoded strings
 55 | 
 56 | ### Alphabetical list
 57 | 
 58 | 1. `blocks_check()`: Checks that UTF-8 data falls within the given ranges (blocks) of the Unicode standard. A suitable alternative to regular expressions.
 59 | 1. `convert_case()`: Converts the letter case of UTF-8 data. Arrays are traversed recursively; only the values of array elements are converted, while the keys are left unchanged.
 60 | 1. `convert_files_from()`: Recodes the text files in a specified folder to UTF-8. Binary files, files already encoded in UTF-8, and files that could not be converted are skipped.
 61 | 1. `convert_from()`: Encodes data from another character encoding to UTF-8.
 62 | 1. `convert_to()`: Encodes data from UTF-8 to another character encoding.
 63 | 1. `diactrical_remove()`: Removes combining diacritical marks from text, optionally keeping the information needed to restore them.
 64 | 1. `diactrical_restore()`: Restores combining diacritical marks removed by `diactrical_remove()`, provided that their character positions and the number of characters have not changed.
 65 | 1. `from_unicode()`: Converts Unicode codepoints to a UTF-8 string
 66 | 1. `has_binary()`: Checks whether the data contains characters from the ASCII control character class.
 67 | 1. `html_entity_decode()`: Converts all HTML entities to native UTF-8 characters
 68 | 1. `html_entity_encode()`: Converts special UTF-8 characters to HTML entities.
 69 | 1. `is_ascii()`: Checks whether the data consists only of ASCII characters.
 70 | 1. `is_utf8()`: Returns true if the data is valid UTF-8 and false otherwise. For null, integer, float, and boolean values it returns true.
 71 | 1. `preg_quote_case_insensitive()`: Builds a regular expression for case-insensitive matching
 72 | 1. `str_limit()`, `truncate()`: Truncates UTF-8 text to a given length; the last word is shown whole rather than cut off in the middle. HTML entities are handled correctly.
 73 | 1. `strict()`: Strips out device control codes in the ASCII range.
 74 | 1. `textarea_rows()`: Calculates the height of the text in a `<textarea>` HTML tag from its value and width.
 75 | 1. `to_unicode()`: Converts a UTF-8 string to Unicode codepoints
 76 | 1. `unescape()`: Decodes a string from several escape formats (which may be mixed) to a UTF-8 string
 77 | 1. `unescape_request()`: Corrects the values in the global arrays `$_GET`, `$_POST`, `$_COOKIE`, `$_REQUEST`, `$_FILES` by decoding the `%XX` format and the extended `%uXXXX` / `%u{XXXXXX}` formats produced, for example, by the outdated JavaScript function `escape()`. Standard PHP5 cannot do this. If necessary, recodes these arrays from the `$charset` encoding to UTF-8. A positive side effect is protection against XSS attacks that use non-printable characters against vulnerable PHP functions. Web forms can therefore be sent to the server in either of two encodings: `$charset` or UTF-8. For example: `?тест[тест]=тест`
 78 | If `HTTP_COOKIE` contains several parameters with the same name, the last value is taken (as in `QUERY_STRING`), not the first. Creates the `$_POST` array for non-standard Content-Types, for example `"Content-Type: application/octet-stream"`. Standard PHP5 creates the array only for `"Content-Type: application/x-www-form-urlencoded"` and `"Content-Type: multipart/form-data"`.
 79 | 
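The behaviour described for `is_utf8()` (valid UTF-8 check, `true` for non-string scalars and null, recursive handling of arrays) can be approximated in a few lines. This is a sketch, not the library's code: `is_utf8_sketch()` is a hypothetical name, it relies on PCRE rejecting invalid subjects under the `/u` modifier rather than on the library's own byte patterns, and it assumes that only array values (not keys) are checked.

```php
<?php
// Hypothetical helper: validates UTF-8 by letting PCRE inspect the subject.
// With the /u modifier, preg_match() returns false for invalid UTF-8.
function is_utf8_sketch($data)
{
    if (is_array($data)) {
        foreach ($data as $value) {
            if (! is_utf8_sketch($value)) return false;  // values only; keys are an open assumption
        }
        return true;
    }
    if (! is_string($data)) return true;  // null, int, float, bool: nothing to validate
    return preg_match('//u', $data) === 1;
}

var_dump(is_utf8_sketch('Hello, Привет'));  // bool(true)
var_dump(is_utf8_sketch("\xD0"));           // bool(false): truncated 2-byte sequence
```
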
 80 | Examples of `unescape()`
 81 | 
 82 |     '%D1%82%D0%B5%D1%81%D1%82'        => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #binary (regular)
 83 |     '0xD182D0B5D181D182'              => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #binary (compact)
 84 |     '%u0442%u0435%u0441%u0442'        => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #UCS-2  (U+0 — U+FFFF)
 85 |     '%u{442}%u{435}%u{0441}%u{00442}' => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #UTF-8  (U+0 — U+FFFFFF)
 86 | 
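The `%uXXXX` and `%u{...}` rows above come down to encoding a Unicode codepoint as UTF-8 bytes. The following sketch shows only that part, under stated assumptions: both function names are hypothetical, the `%XX` and `0x...` formats are not handled, and out-of-range or surrogate codepoints are simply left untouched, which may differ from what `unescape()` really does.

```php
<?php
// Hypothetical helper: encodes one codepoint using the 1-4 byte UTF-8 scheme.
function codepoint_to_utf8_sketch($cp)
{
    if ($cp < 0x80)    return chr($cp);
    if ($cp < 0x800)   return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    if ($cp < 0x10000) return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: decodes %uXXXX and %u{X...} sequences only.
function unescape_u_sketch($s)
{
    return preg_replace_callback(
        '/%u(?:([0-9a-fA-F]{4})|\{([0-9a-fA-F]{1,6})\})/',
        function ($m) {
            $cp = hexdec($m[1] !== '' ? $m[1] : $m[2]);
            // Leave surrogates and values above U+10FFFF as they were.
            if ($cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) return $m[0];
            return codepoint_to_utf8_sketch($cp);
        },
        $s
    );
}

echo unescape_u_sketch('%u0442%u0435%u0441%u0442'), "\n";         // тест
echo unescape_u_sketch('%u{442}%u{435}%u{0441}%u{00442}'), "\n";  // тест
```
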
 87 | Examples of `unescape_request()`
 88 | 
 89 |     '%F2%E5%F1%F2'                    => 'тест'  #CP1251 (regular)
 90 |     '0xF2E5F1F2'                      => 'тест'  #CP1251 (compact)
 91 |     '%D1%82%D0%B5%D1%81%D1%82'        => 'тест'  #UTF-8 (regular)
 92 |     '0xD182D0B5D181D182'              => 'тест'  #UTF-8 (compact)
 93 |     '%u0442%u0435%u0441%u0442'        => 'тест'  #UCS-2 (U+0 — U+FFFF)
 94 |     '%u{442}%u{435}%u{0441}%u{00442}' => 'тест'  #UTF-8 (U+0 — U+FFFFFF)
 95 | 
115 | Project was exported from http://code.google.com/p/php5-utf8
116 | 


--------------------------------------------------------------------------------
/ReflectionTypeHint.php:
--------------------------------------------------------------------------------
  1 | <?php
  2 | /**
  3 |  * A class for validating method parameters to allowed types via reflection.
  4 |  *
  5 |  * Purpose
  6 |  *   * A more convenient mechanism than bulky type-checking code
  7 |  *     placed right after each method declaration.
  8 |  *   * Requires correct phpDoc to be written
  9 |  *
 10 |  * Features
 11 |  *   * Very easy to use
 12 |  *   * Ability to turn off on the production server
 13 |  *
 14 |  * Understanding
 15 |  *   All built-in PHP functions check the types of their input variables and complain if the wrong type is given.
 16 |  *   ReflectionTypeHint does too.
 17 |  *   Previously, I wrote this (the correct way, but a lot of code):
 18 |  *   if (! is_bool($b)) {
 19 |  *       trigger_error('A bool type expected in 1-st parameter, ' . gettype($b)   . ' type given!', E_USER_WARNING);
 20 |  *       return false;
 21 |  *   }
 22 |  *   if (! is_string($s)) {
 23 |  *       trigger_error('A string type expected in 2-nd parameter, ' . gettype($s)   . ' type given!', E_USER_WARNING);
 24 |  *       return false;
 25 |  *   }
 26 |  *   Now I do this with one line of code:
 27 |  *   if (! ReflectionTypeHint::isValid()) return false;
 28 |  *
 29 |  * WARNING
 30 |  *   On a production server, it is important to disable assert in order to save server resources.
 31 |  *   To do this, call assert_options(ASSERT_ACTIVE, false) or use the INI setting "assert.active = 0".
 32 |  *   In this case ReflectionTypeHint::isValid() always returns TRUE immediately!
 33 |  *
 34 |  * Useful links
 35 |  *   http://www.ilia.ws/archives/205-Type-hinting-for-PHP-5.3.html
 36 |  *   http://php.net/manual/en/language.oop5.typehinting.php
 37 |  * 
 38 |  * @example  ReflectionTypeHint_example.php
 39 |  * @link     http://code.google.com/p/php5-reflection-type-hint/
 40 |  * @license  http://creativecommons.org/licenses/by-sa/3.0/
 41 |  * @author   Nasibullin Rinat
 42 |  * @version  1.1.0
 43 |  */
 44 | class ReflectionTypeHint
 45 | {
 46 | 	protected static $hints = array(
 47 | 		'int'      => 'is_int',
 48 | 		'integer'  => 'is_int',
 49 | 		'digit'    => 'ctype_digit',
 50 | 		'number'   => 'ctype_digit',
 51 | 		'float'    => 'is_float',
 52 | 		'double'   => 'is_float',
 53 | 		'real'     => 'is_float',
 54 | 		'numeric'  => 'is_numeric',
 55 | 		'str'      => 'is_string',
 56 | 		'string'   => 'is_string',
 57 | 		'char'     => 'is_string',
 58 | 		'bool'     => 'is_bool',
 59 | 		'boolean'  => 'is_bool',
 60 | 		'null'     => 'is_null',
 61 | 		'array'    => 'is_array',
 62 | 		'obj'      => 'is_object',
 63 | 		'object'   => 'is_object',
 64 | 		'res'      => 'is_resource',
 65 | 		'resource' => 'is_resource',
 66 | 		'scalar'   => 'is_scalar',  #integer, float, string or boolean
 67 | 		'cb'       => 'is_callable',
 68 | 		'callback' => 'is_callable',
 69 | 	);
 70 | 
 71 | 	#the methods of this class may only be called statically!
 72 | 	private function __construct() {}
 73 | 
 74 | 	public static function isValid()
 75 | 	{
 76 | 		if (! assert_options(ASSERT_ACTIVE)) return true;
 77 | 		$bt = self::debugBacktrace(null, 1);
 78 | 		extract($bt);  //to $file, $line, $function, $class, $object, $type, $args
 79 | 		if (! $args) return true; #speed improve
 80 | 		$r = new ReflectionMethod($class, $function);
 81 | 		$doc = $r->getDocComment();
 82 | 		$cache_id = $class. $type. $function;
 83 | 		preg_match_all('~	[\r\n]++ [\x20\t]++ \* [\x20\t]++
 84 | 							@param
 85 | 							[\x20\t]++
 86 | 							\K #memory reduce
 87 | 							( [_a-z]++[_a-z\d]*+
 88 | 								(?>[|/,][_a-z]+[_a-z\d]*)*+
 89 | 							) #1 types
 90 | 							[\x20\t]++
 91 | 							&?+\$([_a-z]++[_a-z\d]*+) #2 name
 92 | 						~sixSX', $doc, $params, PREG_SET_ORDER);
 93 | 		$parameters = $r->getParameters();
 94 | 		//d($args, $params, $parameters);
 95 | 		if (count($parameters) > count($params))
 96 | 		{
 97 | 			$message = 'phpDoc %d piece(s) @param description expected in %s%s%s(), %s given, ' . PHP_EOL
 98 | 					 . 'called in %s on line %d ' . PHP_EOL
 99 | 					 . 'and defined in %s on line %d';
100 | 			$message = sprintf($message, count($parameters), $class, $type, $function, count($params), $file, $line, $r->getFileName(), $r->getStartLine());
101 | 			trigger_error($message, E_USER_NOTICE);
102 | 		}
103 | 		foreach ($args as $i => $value)
104 | 		{
105 | 			if (! isset($params[$i], $parameters[$i])) return true; #extra arguments beyond the declared parameters are not checked
106 | 			if ($parameters[$i]->name !== $params[$i][2])
107 | 			{
108 | 				$param_num = $i + 1;
109 | 				$message = 'phpDoc @param %d in %s%s%s() must be named as $%s, $%s given, ' . PHP_EOL
110 | 						 . 'called in %s on line %d ' . PHP_EOL
111 | 						 . 'and defined in %s on line %d';
112 | 				$message = sprintf($message, $param_num, $class, $type, $function, $parameters[$i]->name, $params[$i][2], $file, $line, $r->getFileName(), $r->getStartLine());
113 | 				trigger_error($message, E_USER_NOTICE);
114 | 			}
115 | 
116 | 			$hints = preg_split('~[|/,]~sSX', $params[$i][1]);
117 | 			if (! self::checkValueTypes($hints, $value))
118 | 			{
119 | 				$param_num = $i + 1;
120 | 				$message = 'Argument %d passed to %s%s%s() must be an %s, %s given, ' . PHP_EOL
121 | 						 . 'called in %s on line %d ' . PHP_EOL
122 | 						 . 'and defined in %s on line %d';
123 | 				$message = sprintf($message, $param_num, $class, $type, $function, implode('|', $hints), (is_object($value) ? get_class($value) . ' ' : '') . gettype($value), $file, $line, $r->getFileName(), $r->getStartLine());
124 | 				trigger_error($message, E_USER_WARNING);
125 | 				return false;
126 | 			}
127 | 		}
128 | 		return true;
129 | 	}
130 | 
131 | 	/**
132 | 	 * Returns the stack trace. Works correctly with call_user_func*()
133 | 	 * (those frames are skipped entirely and the caller references are corrected).
134 | 	 * If $return_frame is given, returns only the frame with that index instead of the whole stack trace.
135 | 	 *
136 | 	 * @param   string|null  $re_ignore     example: '~^' . preg_quote(__CLASS__, '~') . '(?![a-zA-Z\d])~sSX'
137 | 	 * @param   int|null     $return_frame
138 | 	 * @return  array
139 | 	 */
140 | 	public static function debugBacktrace($re_ignore = null, $return_frame = null)
141 | 	{
142 | 		$trace = debug_backtrace();
143 | 
144 | 		$a = array();
145 | 		$frames = 0;
146 | 		for ($i = 0, $n = count($trace); $i < $n; $i++)
147 | 		{
148 | 			$t = $trace[$i];
149 | 			if (! $t) continue;
150 | 
151 | 			// Next frame.
152 | 			$next = isset($trace[$i+1])? $trace[$i+1] : null;
153 | 
154 | 			// Dummy frame before call_user_func*() frames.
155 | 			if (! isset($t['file']) && $next)
156 | 			{
157 | 				$t['over_function'] = $trace[$i+1]['function'];
158 | 				$t = $t + $trace[$i+1];
159 | 				$trace[$i+1] = null; // skip call_user_func on next iteration
160 | 			}
161 | 
162 | 			// Skip myself frame.
163 | 			if (++$frames < 2) continue;
164 | 
165 | 			// The 'class' and 'function' fields of the next frame define where this frame's function is located.
166 | 			// Skip frames for functions located in ignored places.
167 | 			if ($re_ignore && $next)
168 | 			{
169 | 				// Name of function "inside which" frame was generated.
170 | 				$frame_caller = (isset($next['class']) ? $next['class'] . $next['type'] : '')
171 | 							  . (isset($next['function']) ? $next['function'] : '');
172 | 				if (preg_match($re_ignore, $frame_caller)) continue;
173 | 			}
174 | 
175 | 			// On each iteration we consider ability to add PREVIOUS frame to $a stack.
176 | 			if (count($a) === $return_frame) return $t;
177 | 			$a[] = $t;
178 | 		}
179 | 		return $a;
180 | 	}
181 | 
182 | 	/**
183 | 	 * Checks a value against the allowed types
184 | 	 *
185 | 	 * @param   array  $types
186 | 	 * @param   mixed  $value
187 | 	 * @return  bool
188 | 	 */
189 | 	public static function checkValueTypes(array $types, $value)
190 | 	{
191 | 		foreach ($types as $type)
192 | 		{
193 | 			$type = strtolower($type);
194 | 			if (array_key_exists($type, self::$hints) && call_user_func(self::$hints[$type], $value)) return true;
195 | 			if (is_object($value) && @is_a($value, $type)) return true;
196 | 			if ($type === 'mixed') return true;
197 | 		}
198 | 		return false;
199 | 	}
200 | }
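
The core idea of `isValid()`, reading the allowed types from the `@param` tags of a method's phpDoc via reflection, can be shown in isolation. This is a simplified sketch under stated assumptions: `DocDemo` and `doc_param_types_sketch()` are hypothetical names, the pattern is much looser than the one used in the class above, and no backtrace inspection or error reporting is attempted.

```php
<?php
// Hypothetical demo class; only its phpDoc matters here.
class DocDemo
{
    /**
     * @param   string|array  $s  first parameter
     * @param   int           $i  second parameter
     */
    public function run($s, $i) {}
}

// Hypothetical helper: maps each documented parameter name to its allowed types.
function doc_param_types_sketch($class, $method)
{
    $r = new ReflectionMethod($class, $method);
    $doc = (string) $r->getDocComment();  // false (no doc comment) becomes ''
    preg_match_all('~@param\s+([\w|/,]+)\s+&?\$(\w+)~', $doc, $matches, PREG_SET_ORDER);
    $types = array();
    foreach ($matches as $m) {
        // The same separators as in the class above: "|", "/" and ","
        $types[$m[2]] = preg_split('~[|/,]~', $m[1]);
    }
    return $types;
}

print_r(doc_param_types_sketch('DocDemo', 'run'));
```

The resulting map could then be fed, type list by type list, to a check such as `ReflectionTypeHint::checkValueTypes()`.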


--------------------------------------------------------------------------------
/ReflectionTypeHint_example.php:
--------------------------------------------------------------------------------
 1 | <?php
 2 | class Example
 3 | {
 4 | 	/**
 5 | 	 * This is myMethod!
 6 | 	 *
 7 | 	 * @param   string|array  $s  param1
 8 | 	 * @param   int           $i  param2
 9 | 	 * @param   Example|null  $e  param3
10 | 	 * @param   bool          $b  param4
11 | 	 * @param   array/null    $a  param5
12 | 	 * @return  array|bool    Returns FALSE if error occurred
13 | 	 */
14 | 	public function myMethod($s, $i, $e = null, $b = true, array $a = null)
15 | 	{
16 | 		if (! ReflectionTypeHint::isValid()) return false;
17 | 		//...
18 | 	}
19 | }
20 | 
21 | $e = new Example();  // create the instance first: myMethod() is not static
22 | $e->myMethod('sss', 75467, $e, true);
23 | //$e->myMethod('sss', 75467, new Exception(), true);
24 | 


--------------------------------------------------------------------------------
/UTF8-CHANGELOG.txt:
--------------------------------------------------------------------------------
 1 | 2.3.1 / 2012-03-11
 2 | 
 3 | 	* UTF8::QUOTATION_MARK_RE new constant added
 4 | 	* UTF8::$html_quotation_mark_table added
 5 | 	* UTF8::ucfirst() improved
 6 | 	* UTF8::ucwords() improved
 7 | 	* UTF8::convert_files_from() improved
 8 | 	* UTF8::array_change_key_case() recursive support added
 9 | 	* UTF8::html_entity_encode() binary support
10 | 	* UTF8::html_entity_decode() &apos; entity support
11 | 	* UTF8::str_limit() syntax error fixed: "preg_relace" instead of "preg_replace"
12 | 	* Small bugs fixed
13 | 
14 | 2.3.0 / 2011-10-06
15 | 
16 | 	* Constants BOM, CHAR_UPPER_RE, CHAR_LOWER_RE, HTML_ENTITY_RE added
17 | 	* UTF8::has_binary() - new method added
18 | 	* UTF8::strict() - recursive support added
19 | 	* UTF8::$char_re renamed to constant CHAR_RE,
20 | 	  UTF8::$diactrical_re renamed to constant DIACTRICAL_RE
21 | 	* UTF8::unescape_request() - improved, $charset parameter added
22 | 	* UTF8::unescape() - improved and interface changed from
23 | 	  ($data, $is_rawurlencode = false) to ($data, $is_hex2bin = false, $is_urldecode = true)
24 | 	* UTF8::autoconvert_request() removed, use UTF8::unescape_request() instead
25 | 	* UTF8::is_ascii() - recursive support removed (it was ambiguous),
26 | 	  second parameter added, for non string/int/float always returns FALSE
27 | 	* UTF8::blocks_check() - for non string/int/float always returns FALSE
28 | 	* UTF8::str_limit() - small internal improvement
29 | 	* UTF8::preg_quote_case_insensitive() - speed improved
30 | 
31 | 2.2.2 / 2011-06-24
32 | 
33 | 	* Case conversion functions improved: native support for conversion from all Russian charsets to UTF8 was added
34 | 	* UTF8::stripos() speed improved
35 | 	* Constant REPLACEMENT_CHAR added
36 | 
37 | 2.2.1 / 2011-06-08
38 | 
39 | 	* UTF8::preg_quote_case_insensitive() added
40 | 	* UTF8::stripos() speed improved
41 | 
42 | 2.2.0 / 2011-06-06
43 | 
44 | 	* UTF8::strlen(), UTF8::substr(), UTF8::strpos(),
45 | 	  UTF8::html_entity_encode(), UTF8::html_entity_decode(),
46 |       UTF8::convert_case(), UTF8::lowercase(), UTF8::uppercase() speed improved
47 | 	* UTF8::stripos(), UTF8::to_unicode(), UTF8::from_unicode() added
48 | 	* UTF8::strtolower(), UTF8::strtoupper() as wrapper to UTF8::convert_case() added
49 | 	* Unicode Character Database updated to 6.0.0 (2010-06-04)
50 | 	* UTF8::$convert_case_table improved
51 | 
52 | 2.1.3 / 2011-05-31
53 | 
54 | 	* UTF8::truncate() small bug fixed
55 | 
56 | 2.1.2 / 2011-03-25
57 | 
58 | 	* The class requires PHP 5.3.x
59 | 	* UTF8::$char_re deprecated
60 | 	* Method UTF8::tests() added; it tests the class methods for correct operation
61 | 	* Methods UTF8::strcmp(), UTF8::strncmp(), UTF8::strcasecmp() added
62 | 	* UTF8::is_utf8(), UTF8::str_limit(), UTF8::str_split() speed improved
63 | 	* 2nd parameter added to UTF8::html_entity_encode()
64 | 	* 3rd parameter added to UTF8::ucwords()
65 | 	* Methods UTF8::convert_case(), UTF8::lowercase(), UTF8::uppercase() can accept an array as the 1st parameter
66 | 	* Minor improvements in UTF8::strtr()
67 | 	* ReflectionTypeHint class modernized
69 | 2.1.1 / 2010-07-19
70 | 
71 | 	* Methods array_change_key_case(), range(), strtr() added
72 | 	* Method convert_files_from() improved
73 | 	* Unicode Character Database 5.2.0
74 | 	* Bugs fixed in trim(), ltrim(), rtrim(), str_pad() that could occur in some cases
75 | 
76 | 2.1.0 / 2010-03-26
77 | 
78 | 	* Method unescape_recursive() removed
79 | 	* Method convert_files_from() added
80 | 	* Several methods can now accept an array and traverse it recursively
81 | 	* Almost all string processing methods can accept and return NULL
82 | 
83 | 2.0.2 / 2010-02-13
84 | 
85 | 	* New methods is_ascii(), ltrim(), rtrim(), trim(), str_pad(), strspn()
86 | 	* Small bug fixed in str_limit()
87 | 	* Bug fixed in convert_from() and convert_to(): they erroneously returned FALSE
88 | 	  when given an array containing boolean elements with the value FALSE
89 | 
90 | 2.0.1 / 2010-02-08
91 | 
92 | 	* Method convert_from_cp1259() removed, use convert_from('cp1251')
93 | 	* Method convert_from_utf16() is now private, use convert_from('UTF-16')
94 | 	* Methods convert_to(), diactrical_remove(), diactrical_restore() added
95 | 	* Other minor fixes
96 | 


--------------------------------------------------------------------------------
/UTF8.php:
--------------------------------------------------------------------------------
   1 | <?php
   2 | /**
   3 |  * UTF8 support in PHP5.
   4 |  * PHP5 UTF8 is a UTF8 aware library of functions mirroring PHP's own string functions.
   5 |  *
   6 |  * A powerful solution for UTF-8 support in your framework or CMS, written in PHP.
   7 |  * This library is an advancement over http://sourceforge.net/projects/phputf8 (last updated in 2007).
   8 |  *
   9 |  * Features and benefits
  10 |  *   * Compatibility with the interface of the standard PHP functions that deal with single-byte encodings
  11 |  *   * Works without the PHP extensions ICONV and MBSTRING; if they are installed, they are actively used.
  12 |  *     The fastest available method is chosen among MBSTRING, ICONV, a native PHP implementation, and hacks.
  13 |  *   * Useful features that are missing from ICONV and MBSTRING
  14 |  *   * Methods that take and return a string can also take and return null.
  15 |  *     This is useful for values selected from a database.
  16 |  *   * Several methods can process arrays recursively:
  17 |  *     array_change_key_case(), convert_from(), convert_to(), strict(), is_utf8(), blocks_check(), convert_case(), lowercase(), uppercase(), unescape()
  18 |  *   * Validation of method parameters against allowed types via reflection (can be disabled)
  19 |  *   * A single interface and encapsulation; you can inherit from the class and override its methods
  20 |  *   * Test coverage
  21 |  *   * PHP >= 5.3.x
  22 |  *
  41 |  * Example:
  42 |  *   $s = 'Hello, Привет';
  43 |  *   if (UTF8::is_utf8($s)) echo UTF8::strlen($s);
  44 |  *
  45 |  * UTF-8 encoding scheme:
  46 |  *   2^7   0x00000000 — 0x0000007F  0xxxxxxx
  47 |  *   2^11  0x00000080 — 0x000007FF  110xxxxx 10xxxxxx
  48 |  *   2^16  0x00000800 — 0x0000FFFF  1110xxxx 10xxxxxx 10xxxxxx
  49 |  *   2^21  0x00010000 — 0x001FFFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  50 |  *   1-4 bytes length: 2^7 + 2^11 + 2^16 + 2^21 = 2 164 864
  51 |  *
  52 |  * If I were the owner of the world, I would leave only 2 encodings: UTF-8 and UTF-32 ;-)
  53 |  *
  54 |  * Useful links
  55 |  *   http://ru.wikipedia.org/wiki/UTF8
  56 |  *   http://www.madore.org/~david/misc/unitest/   A Unicode Test Page
  57 |  *   http://www.unicode.org/
  58 |  *   http://www.unicode.org/reports/
  59 |  *   http://www.unicode.org/reports/tr10/      Unicode Collation Algorithm
  60 |  *   http://www.unicode.org/Public/UCA/6.0.0/  Unicode Collation Algorithm
  61 |  *   http://www.unicode.org/reports/tr6/       A Standard Compression Scheme for Unicode
  62 |  *   http://www.fileformat.info/info/unicode/char/search.htm  Unicode Character Search
  63 |  *
  64 |  * @link     http://code.google.com/p/php5-utf8/
  65 |  * @license  http://creativecommons.org/licenses/by-sa/3.0/
  66 |  * @author   Nasibullin Rinat
  67 |  * @version  2.3.1
  68 |  */
  69 | class UTF8
  70 | {
  71 | 	/**
  72 | 	 * REPLACEMENT CHARACTER (for broken char)
  73 | 	 *
  74 | 	 * @var string
  75 | 	 */
  76 | 	const REPLACEMENT_CHAR = "\xEF\xBF\xBD"; #U+FFFD
  77 | 
  78 | 	/**
  79 | 	 * Byte order mark, http://en.wikipedia.org/wiki/Byte_Order_Mark
  80 | 	 *
  81 | 	 * @var string
  82 | 	 */
  83 | 	const BOM = "\xEF\xBB\xBF";
  84 | 
  85 | 	/**
  86 | 	 * Regular expression for a character in UTF-8.
  87 | 	 * For engines that don't support UTF-8 mode.
  88 | 	 * In PCRE use a dot (".") and the flag /u, it works much faster!
  89 | 	 *
  90 | 	 * @var string
  91 | 	 */
  92 | 	const CHAR_RE =
  93 | 		'[\x09\x0A\x0D\x20-\x7E]            # ASCII strict
  94 | 		# [\x00-\x7F]                       # ASCII non-strict (including control chars)
  95 | 		| [\xC2-\xDF][\x80-\xBF]            # non-overlong 2-byte
  96 | 		|  \xE0[\xA0-\xBF][\x80-\xBF]       # excluding overlongs
  97 | 		| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
  98 | 		|  \xED[\x80-\x9F][\x80-\xBF]       # excluding surrogates
  99 | 		|  \xF0[\x90-\xBF][\x80-\xBF]{2}    # planes 1-3
 100 | 		| [\xF1-\xF3][\x80-\xBF]{3}         # planes 4-15
 101 | 		|  \xF4[\x80-\x8F][\x80-\xBF]{2}    # plane 16
 102 | 		';
 103 | 
 104 | 	/**
 105 | 	 * Combining diacritical marks (Unicode 5.1).
 106 | 	 * \p{M} in PCRE terms.
 107 | 	 * For engines that don't support UTF-8 mode.
 108 | 	 *
 109 | 	 * For example, Russian letters in composed form: "Ё" (U+0401), "Й" (U+0419),
 110 | 	 * decomposed form: (U+0415 U+0308), (U+0418 U+0306)
 111 | 	 *
 112 | 	 * @link http://www.unicode.org/charts/PDF/U0300.pdf
 113 | 	 * @link http://www.unicode.org/charts/PDF/U1DC0.pdf
 114 | 	 * @link http://www.unicode.org/charts/PDF/UFE20.pdf
 115 | 	 * @var  string
 116 | 	 */
 117 | 	const DIACTRICAL_RE =
 118 | 		'   \xcc[\x80-\xb9]|\xcd[\x80-\xaf]  #UNICODE range: U+0300 — U+036F (for letters)
 119 | 		  | \xe2\x83[\x90-\xbf]              #UNICODE range: U+20D0 — U+20FF (for symbols)
 120 | 		  | \xe1\xb7[\x80-\xbf]              #UNICODE range: U+1DC0 — U+1DFF (supplement)
 121 | 		  | \xef\xb8[\xa0-\xaf]              #UNICODE range: U+FE20 — U+FE2F (combining half marks)
 122 | 		';
 123 | 
 124 | 	/**
 125 | 	 * \p{Lu} in PCRE terms.
 126 | 	 * For engines that don't support UTF-8 mode.
 127 | 	 *
 128 | 	 * @var string
 129 | 	 */
 130 | 	const CHAR_UPPER_RE = '[\x41-\x5a]
 131 | 							| \xc3[\x80-\x9e]
 132 | 							| \xc4[\x80-\xbf]
 133 | 							| \xc5[\x81-\xbd]
 134 | 							| \xc6[\x81-\xbc]
 135 | 							| \xc7[\x85-\xbe]
 136 | 							| \xc8[\x80-\xb2]
 137 | 							| \xce[\x86-\xab]
 138 | 							| \xcf[\x98-\xae]
 139 | 							| \xd0[\x80-\xaf]
 140 | 							| \xd1[\xa0-\xbe]
 141 | 							| \xd2[\x80-\xbe]
 142 | 							| \xd3[\x81-\xb8]
 143 | 							| \xd4[\x80-\xbf]
 144 | 							| \xd5[\x80-\x96]
 145 | 							| \xe1[\xb8\xb9\xba][\x80-\xbe]
 146 | 							| \xe1\xbb[\x80-\xb8]
 147 | 							| \xe1\xbc[\x88-\xbf]
 148 | 							| \xe1\xbd[\x88-\xaf]
 149 | 							| \xe1[\xbe\xbf][\x88-\xbc]
 150 | 							| \xef\xbc[\xa1-\xba]
 151 | 							';
 152 | 
 153 | 	/**
 154 | 	 * \p{Ll} in PCRE terms.
 155 | 	 * For engines that don't support UTF-8 mode.
 156 | 	 *
 157 | 	 * @var string
 158 | 	 */
 159 | 	const CHAR_LOWER_RE = '[\x61-\x7a]
 160 | 							| \xc2\xb5
 161 | 							| \xc3[\xa0-\xbf]
 162 | 							| \xc4[\x81-\xbe]
 163 | 							| \xc5[\x80-\xbe]
 164 | 							| \xc6[\x83-\xbf]
 165 | 							| \xc7[\x86-\xbf]
 166 | 							| \xc8[\x81-\xb3]
 167 | 							| \xc9[\x93-\xb5]
 168 | 							| \xca[\x80-\x92]
 169 | 							| \xce[\xac-\xbf]
 170 | 							| \xcf[\x80-\xaf]
 171 | 							| \xd0[\xb0-\xbf]
 172 | 							| \xd1[\x80-\xbf]
 173 | 							| \xd2[\x81-\xbf]
 174 | 							| \xd3[\x82-\xb9]
 175 | 							| \xd4[\x81-\x8f]
 176 | 							| \xd5[\xa1-\xbf]
 177 | 							| \xd6[\x80-\x86]
 178 | 							| \xe1[\xb8\xb9\xba][\x81-\xbf]
 179 | 							| \xe1\xbb[\x81-\xb9]
 180 | 							| \xe1\xbc[\x80-\xb7]
 181 | 							| \xe1\xbd[\x80-\xbd]
 182 | 							| \xe1\xbe[\x80-\xb3]
 183 | 							| \xe1\xbf[\x83-\xb3]
 184 | 							| \xef\xbd[\x81-\x9a]
 185 | 							';
 186 | 
 187 | 	/**
 188 | 	 * HTML entities, examples: &gt; &Ouml; &#x02DC; &#34;
 189 | 	 *
 190 | 	 * @var string
 191 | 	 */
 192 | 	const HTML_ENTITY_RE = '&(?> [a-zA-Z][a-zA-Z\d]++
 193 | 							   | \#(?> \d{1,4}+
 194 | 									 | x[\da-fA-F]{2,4}+
 195 | 								   )
 196 | 							 );
 197 | 							';
 198 | 
 199 | 	/**
 200 | 	 * Quotation marks.
 201 | 	 * For engines that don't support UTF-8 mode.
 202 | 	 *
 203 | 	 * @var string
 204 | 	 */
 205 | 	const QUOTATION_MARK_RE = '\x22|\xc2[\xab\xbb]|\xe2\x80[\x98\x99\x9a\x9c\x9d\x9e\xb9\xba]';
 206 | 
 207 | 	/**
 208 | 	 * HTML quotation mark entities table
 209 | 	 *
 210 | 	 * @var array
 211 | 	 */
 211 | 	public static $html_quotation_mark_table = array(
 212 | 		'&quot;'   => "\x22",          #U+0022 ["] &#34; quotation mark = APL quote
 213 | 		'&laquo;'  => "\xc2\xab",      #U+00AB [«] left-pointing double angle quotation mark = left pointing guillemet
 214 | 		'&raquo;'  => "\xc2\xbb",      #U+00BB [»] right-pointing double angle quotation mark = right pointing guillemet
 215 | 		'&lsquo;'  => "\xe2\x80\x98",  #U+2018 [‘] left single quotation mark
 216 | 		'&rsquo;'  => "\xe2\x80\x99",  #U+2019 [’] right single quotation mark (and apostrophe!)
 217 | 		'&sbquo;'  => "\xe2\x80\x9a",  #U+201A [‚] single low-9 quotation mark
 218 | 		'&ldquo;'  => "\xe2\x80\x9c",  #U+201C [“] left double quotation mark
 219 | 		'&rdquo;'  => "\xe2\x80\x9d",  #U+201D [”] right double quotation mark
 220 | 		'&bdquo;'  => "\xe2\x80\x9e",  #U+201E [„] double low-9 quotation mark
 221 | 		'&lsaquo;' => "\xe2\x80\xb9",  #U+2039 [‹] single left-pointing angle quotation mark
 222 | 		'&rsaquo;' => "\xe2\x80\xba",  #U+203A [›] single right-pointing angle quotation mark
 223 | 	);
 224 | 
 225 | 	/**
 226 | 	 * HTML special chars table
 227 | 	 *
 228 | 	 * @var array
 229 | 	 */
 230 | 	public static $html_special_chars_table = array(
 231 | 		'&quot;' => "\x22",  #U+0022 ["] &#34; quotation mark = APL quote
 232 | 		'&amp;'  => "\x26",  #U+0026 [&] &#38; ampersand
 233 | 		'&lt;'   => "\x3c",  #U+003C [<] &#60; less-than sign
 234 | 		'&gt;'   => "\x3e",  #U+003E [>] &#62; greater-than sign
 235 | 		#&apos; entity is only available in XHTML/HTML5 and not in plain HTML, see http://www.w3.org/TR/xhtml1/#C_16
 236 | 		#'&apos;' => "\x27",  #U+0027 ['] &#39; apostrophe
 237 | 	);
 238 | 
 239 | 	/**
 240 | 	 * @link http://www.fileformat.info/format/w3c/entitytest.htm?sort=Unicode%20Character  HTML Entity Browser Test Page
 241 | 	 * @var  array
 242 | 	 */
 243 | 	public static $html_entity_table = array(
 244 | 		#Latin-1 Entities:
 245 | 		'&nbsp;'   => "\xc2\xa0",  #U+00A0 [ ] no-break space = non-breaking space
 246 | 		'&iexcl;'  => "\xc2\xa1",  #U+00A1 [¡] inverted exclamation mark
 247 | 		'&cent;'   => "\xc2\xa2",  #U+00A2 [¢] cent sign
 248 | 		'&pound;'  => "\xc2\xa3",  #U+00A3 [£] pound sign
 249 | 		'&curren;' => "\xc2\xa4",  #U+00A4 [¤] currency sign
 250 | 		'&yen;'    => "\xc2\xa5",  #U+00A5 [¥] yen sign = yuan sign
 251 | 		'&brvbar;' => "\xc2\xa6",  #U+00A6 [¦] broken bar = broken vertical bar
 252 | 		'&sect;'   => "\xc2\xa7",  #U+00A7 [§] section sign
 253 | 		'&uml;'    => "\xc2\xa8",  #U+00A8 [¨] diaeresis = spacing diaeresis
 254 | 		'&copy;'   => "\xc2\xa9",  #U+00A9 [©] copyright sign
 255 | 		'&ordf;'   => "\xc2\xaa",  #U+00AA [ª] feminine ordinal indicator
 256 | 		'&laquo;'  => "\xc2\xab",  #U+00AB [«] left-pointing double angle quotation mark = left pointing guillemet
 257 | 		'&not;'    => "\xc2\xac",  #U+00AC [¬] not sign
 258 | 		'&shy;'    => "\xc2\xad",  #U+00AD [ ] soft hyphen = discretionary hyphen
 259 | 		'&reg;'    => "\xc2\xae",  #U+00AE [®] registered sign = registered trade mark sign
 260 | 		'&macr;'   => "\xc2\xaf",  #U+00AF [¯] macron = spacing macron = overline = APL overbar
 261 | 		'&deg;'    => "\xc2\xb0",  #U+00B0 [°] degree sign
 262 | 		'&plusmn;' => "\xc2\xb1",  #U+00B1 [±] plus-minus sign = plus-or-minus sign
 263 | 		'&sup2;'   => "\xc2\xb2",  #U+00B2 [²] superscript two = superscript digit two = squared
 264 | 		'&sup3;'   => "\xc2\xb3",  #U+00B3 [³] superscript three = superscript digit three = cubed
 265 | 		'&acute;'  => "\xc2\xb4",  #U+00B4 [´] acute accent = spacing acute
 266 | 		'&micro;'  => "\xc2\xb5",  #U+00B5 [µ] micro sign
 267 | 		'&para;'   => "\xc2\xb6",  #U+00B6 [¶] pilcrow sign = paragraph sign
 268 | 		'&middot;' => "\xc2\xb7",  #U+00B7 [·] middle dot = Georgian comma = Greek middle dot
 269 | 		'&cedil;'  => "\xc2\xb8",  #U+00B8 [¸] cedilla = spacing cedilla
 270 | 		'&sup1;'   => "\xc2\xb9",  #U+00B9 [¹] superscript one = superscript digit one
 271 | 		'&ordm;'   => "\xc2\xba",  #U+00BA [º] masculine ordinal indicator
 272 | 		'&raquo;'  => "\xc2\xbb",  #U+00BB [»] right-pointing double angle quotation mark = right pointing guillemet
 273 | 		'&frac14;' => "\xc2\xbc",  #U+00BC [¼] vulgar fraction one quarter = fraction one quarter
 274 | 		'&frac12;' => "\xc2\xbd",  #U+00BD [½] vulgar fraction one half = fraction one half
 275 | 		'&frac34;' => "\xc2\xbe",  #U+00BE [¾] vulgar fraction three quarters = fraction three quarters
 276 | 		'&iquest;' => "\xc2\xbf",  #U+00BF [¿] inverted question mark = turned question mark
 277 | 		#Latin capital letter
 278 | 		'&Agrave;' => "\xc3\x80",  #Latin capital letter A with grave = Latin capital letter A grave
 279 | 		'&Aacute;' => "\xc3\x81",  #Latin capital letter A with acute
 280 | 		'&Acirc;'  => "\xc3\x82",  #Latin capital letter A with circumflex
 281 | 		'&Atilde;' => "\xc3\x83",  #Latin capital letter A with tilde
 282 | 		'&Auml;'   => "\xc3\x84",  #Latin capital letter A with diaeresis
 283 | 		'&Aring;'  => "\xc3\x85",  #Latin capital letter A with ring above = Latin capital letter A ring
 284 | 		'&AElig;'  => "\xc3\x86",  #Latin capital letter AE = Latin capital ligature AE
 285 | 		'&Ccedil;' => "\xc3\x87",  #Latin capital letter C with cedilla
 286 | 		'&Egrave;' => "\xc3\x88",  #Latin capital letter E with grave
 287 | 		'&Eacute;' => "\xc3\x89",  #Latin capital letter E with acute
 288 | 		'&Ecirc;'  => "\xc3\x8a",  #Latin capital letter E with circumflex
 289 | 		'&Euml;'   => "\xc3\x8b",  #Latin capital letter E with diaeresis
 290 | 		'&Igrave;' => "\xc3\x8c",  #Latin capital letter I with grave
 291 | 		'&Iacute;' => "\xc3\x8d",  #Latin capital letter I with acute
 292 | 		'&Icirc;'  => "\xc3\x8e",  #Latin capital letter I with circumflex
 293 | 		'&Iuml;'   => "\xc3\x8f",  #Latin capital letter I with diaeresis
 294 | 		'&ETH;'    => "\xc3\x90",  #Latin capital letter ETH
 295 | 		'&Ntilde;' => "\xc3\x91",  #Latin capital letter N with tilde
 296 | 		'&Ograve;' => "\xc3\x92",  #Latin capital letter O with grave
 297 | 		'&Oacute;' => "\xc3\x93",  #Latin capital letter O with acute
 298 | 		'&Ocirc;'  => "\xc3\x94",  #Latin capital letter O with circumflex
 299 | 		'&Otilde;' => "\xc3\x95",  #Latin capital letter O with tilde
 300 | 		'&Ouml;'   => "\xc3\x96",  #Latin capital letter O with diaeresis
 301 | 		'&times;'  => "\xc3\x97",  #U+00D7 [×] multiplication sign
 302 | 		'&Oslash;' => "\xc3\x98",  #Latin capital letter O with stroke = Latin capital letter O slash
 303 | 		'&Ugrave;' => "\xc3\x99",  #Latin capital letter U with grave
 304 | 		'&Uacute;' => "\xc3\x9a",  #Latin capital letter U with acute
 305 | 		'&Ucirc;'  => "\xc3\x9b",  #Latin capital letter U with circumflex
 306 | 		'&Uuml;'   => "\xc3\x9c",  #Latin capital letter U with diaeresis
 307 | 		'&Yacute;' => "\xc3\x9d",  #Latin capital letter Y with acute
 308 | 		'&THORN;'  => "\xc3\x9e",  #Latin capital letter THORN
 309 | 		#Latin small letter
 310 | 		'&szlig;'  => "\xc3\x9f",  #Latin small letter sharp s = ess-zed
 311 | 		'&agrave;' => "\xc3\xa0",  #Latin small letter a with grave = Latin small letter a grave
 312 | 		'&aacute;' => "\xc3\xa1",  #Latin small letter a with acute
 313 | 		'&acirc;'  => "\xc3\xa2",  #Latin small letter a with circumflex
 314 | 		'&atilde;' => "\xc3\xa3",  #Latin small letter a with tilde
 315 | 		'&auml;'   => "\xc3\xa4",  #Latin small letter a with diaeresis
 316 | 		'&aring;'  => "\xc3\xa5",  #Latin small letter a with ring above = Latin small letter a ring
 317 | 		'&aelig;'  => "\xc3\xa6",  #Latin small letter ae = Latin small ligature ae
 318 | 		'&ccedil;' => "\xc3\xa7",  #Latin small letter c with cedilla
 319 | 		'&egrave;' => "\xc3\xa8",  #Latin small letter e with grave
 320 | 		'&eacute;' => "\xc3\xa9",  #Latin small letter e with acute
 321 | 		'&ecirc;'  => "\xc3\xaa",  #Latin small letter e with circumflex
 322 | 		'&euml;'   => "\xc3\xab",  #Latin small letter e with diaeresis
 323 | 		'&igrave;' => "\xc3\xac",  #Latin small letter i with grave
 324 | 		'&iacute;' => "\xc3\xad",  #Latin small letter i with acute
 325 | 		'&icirc;'  => "\xc3\xae",  #Latin small letter i with circumflex
 326 | 		'&iuml;'   => "\xc3\xaf",  #Latin small letter i with diaeresis
 327 | 		'&eth;'    => "\xc3\xb0",  #Latin small letter eth
 328 | 		'&ntilde;' => "\xc3\xb1",  #Latin small letter n with tilde
 329 | 		'&ograve;' => "\xc3\xb2",  #Latin small letter o with grave
 330 | 		'&oacute;' => "\xc3\xb3",  #Latin small letter o with acute
 331 | 		'&ocirc;'  => "\xc3\xb4",  #Latin small letter o with circumflex
 332 | 		'&otilde;' => "\xc3\xb5",  #Latin small letter o with tilde
 333 | 		'&ouml;'   => "\xc3\xb6",  #Latin small letter o with diaeresis
 334 | 		'&divide;' => "\xc3\xb7",  #U+00F7 [÷] division sign
 335 | 		'&oslash;' => "\xc3\xb8",  #Latin small letter o with stroke = Latin small letter o slash
 336 | 		'&ugrave;' => "\xc3\xb9",  #Latin small letter u with grave
 337 | 		'&uacute;' => "\xc3\xba",  #Latin small letter u with acute
 338 | 		'&ucirc;'  => "\xc3\xbb",  #Latin small letter u with circumflex
 339 | 		'&uuml;'   => "\xc3\xbc",  #Latin small letter u with diaeresis
 340 | 		'&yacute;' => "\xc3\xbd",  #Latin small letter y with acute
 341 | 		'&thorn;'  => "\xc3\xbe",  #Latin small letter thorn
 342 | 		'&yuml;'   => "\xc3\xbf",  #Latin small letter y with diaeresis
 343 | 		#Symbols and Greek Letters:
 344 | 		'&fnof;'    => "\xc6\x92",  #U+0192 [ƒ] Latin small f with hook = function = florin
 345 | 		'&Alpha;'   => "\xce\x91",  #Greek capital letter alpha
 346 | 		'&Beta;'    => "\xce\x92",  #Greek capital letter beta
 347 | 		'&Gamma;'   => "\xce\x93",  #Greek capital letter gamma
 348 | 		'&Delta;'   => "\xce\x94",  #Greek capital letter delta
 349 | 		'&Epsilon;' => "\xce\x95",  #Greek capital letter epsilon
 350 | 		'&Zeta;'    => "\xce\x96",  #Greek capital letter zeta
 351 | 		'&Eta;'     => "\xce\x97",  #Greek capital letter eta
 352 | 		'&Theta;'   => "\xce\x98",  #Greek capital letter theta
 353 | 		'&Iota;'    => "\xce\x99",  #Greek capital letter iota
 354 | 		'&Kappa;'   => "\xce\x9a",  #Greek capital letter kappa
 355 | 		'&Lambda;'  => "\xce\x9b",  #Greek capital letter lambda
 356 | 		'&Mu;'      => "\xce\x9c",  #Greek capital letter mu
 357 | 		'&Nu;'      => "\xce\x9d",  #Greek capital letter nu
 358 | 		'&Xi;'      => "\xce\x9e",  #Greek capital letter xi
 359 | 		'&Omicron;' => "\xce\x9f",  #Greek capital letter omicron
 360 | 		'&Pi;'      => "\xce\xa0",  #Greek capital letter pi
 361 | 		'&Rho;'     => "\xce\xa1",  #Greek capital letter rho
 362 | 		'&Sigma;'   => "\xce\xa3",  #Greek capital letter sigma
 363 | 		'&Tau;'     => "\xce\xa4",  #Greek capital letter tau
 364 | 		'&Upsilon;' => "\xce\xa5",  #Greek capital letter upsilon
 365 | 		'&Phi;'     => "\xce\xa6",  #Greek capital letter phi
 366 | 		'&Chi;'     => "\xce\xa7",  #Greek capital letter chi
 367 | 		'&Psi;'     => "\xce\xa8",  #Greek capital letter psi
 368 | 		'&Omega;'   => "\xce\xa9",  #Greek capital letter omega
 369 | 		'&alpha;'   => "\xce\xb1",  #Greek small letter alpha
 370 | 		'&beta;'    => "\xce\xb2",  #Greek small letter beta
 371 | 		'&gamma;'   => "\xce\xb3",  #Greek small letter gamma
 372 | 		'&delta;'   => "\xce\xb4",  #Greek small letter delta
 373 | 		'&epsilon;' => "\xce\xb5",  #Greek small letter epsilon
 374 | 		'&zeta;'    => "\xce\xb6",  #Greek small letter zeta
 375 | 		'&eta;'     => "\xce\xb7",  #Greek small letter eta
 376 | 		'&theta;'   => "\xce\xb8",  #Greek small letter theta
 377 | 		'&iota;'    => "\xce\xb9",  #Greek small letter iota
 378 | 		'&kappa;'   => "\xce\xba",  #Greek small letter kappa
 379 | 		'&lambda;'  => "\xce\xbb",  #Greek small letter lambda
 380 | 		'&mu;'      => "\xce\xbc",  #Greek small letter mu
 381 | 		'&nu;'      => "\xce\xbd",  #Greek small letter nu
 382 | 		'&xi;'      => "\xce\xbe",  #Greek small letter xi
 383 | 		'&omicron;' => "\xce\xbf",  #Greek small letter omicron
 384 | 		'&pi;'      => "\xcf\x80",  #Greek small letter pi
 385 | 		'&rho;'     => "\xcf\x81",  #Greek small letter rho
 386 | 		'&sigmaf;'  => "\xcf\x82",  #Greek small letter final sigma
 387 | 		'&sigma;'   => "\xcf\x83",  #Greek small letter sigma
 388 | 		'&tau;'     => "\xcf\x84",  #Greek small letter tau
 389 | 		'&upsilon;' => "\xcf\x85",  #Greek small letter upsilon
 390 | 		'&phi;'     => "\xcf\x86",  #Greek small letter phi
 391 | 		'&chi;'     => "\xcf\x87",  #Greek small letter chi
 392 | 		'&psi;'     => "\xcf\x88",  #Greek small letter psi
 393 | 		'&omega;'   => "\xcf\x89",  #Greek small letter omega
 394 | 		'&thetasym;'=> "\xcf\x91",  #Greek small letter theta symbol
 395 | 		'&upsih;'   => "\xcf\x92",  #Greek upsilon with hook symbol
 396 | 		'&piv;'     => "\xcf\x96",  #U+03D6 [ϖ] Greek pi symbol
 397 | 
 398 | 		'&bull;'    => "\xe2\x80\xa2",  #U+2022 [•] bullet = black small circle
 399 | 		'&hellip;'  => "\xe2\x80\xa6",  #U+2026 […] horizontal ellipsis = three dot leader
 400 | 		'&prime;'   => "\xe2\x80\xb2",  #U+2032 [′] prime = minutes = feet (used to denote minutes and feet)
 401 | 		'&Prime;'   => "\xe2\x80\xb3",  #U+2033 [″] double prime = seconds = inches (used to denote seconds and inches)
 402 | 		'&oline;'   => "\xe2\x80\xbe",  #U+203E [‾] overline = spacing overscore
 403 | 		'&frasl;'   => "\xe2\x81\x84",  #U+2044 [⁄] fraction slash
 404 | 		'&weierp;'  => "\xe2\x84\x98",  #U+2118 [℘] script capital P = power set = Weierstrass p
 405 | 		'&image;'   => "\xe2\x84\x91",  #U+2111 [ℑ] blackletter capital I = imaginary part
 406 | 		'&real;'    => "\xe2\x84\x9c",  #U+211C [ℜ] blackletter capital R = real part symbol
 407 | 		'&trade;'   => "\xe2\x84\xa2",  #U+2122 [™] trade mark sign
 408 | 		'&alefsym;' => "\xe2\x84\xb5",  #U+2135 [ℵ] alef symbol = first transfinite cardinal
 409 | 		'&larr;'    => "\xe2\x86\x90",  #U+2190 [←] leftwards arrow
 410 | 		'&uarr;'    => "\xe2\x86\x91",  #U+2191 [↑] upwards arrow
 411 | 		'&rarr;'    => "\xe2\x86\x92",  #U+2192 [→] rightwards arrow
 412 | 		'&darr;'    => "\xe2\x86\x93",  #U+2193 [↓] downwards arrow
 413 | 		'&harr;'    => "\xe2\x86\x94",  #U+2194 [↔] left right arrow
 414 | 		'&crarr;'   => "\xe2\x86\xb5",  #U+21B5 [↵] downwards arrow with corner leftwards = carriage return
 415 | 		'&lArr;'    => "\xe2\x87\x90",  #U+21D0 [⇐] leftwards double arrow
 416 | 		'&uArr;'    => "\xe2\x87\x91",  #U+21D1 [⇑] upwards double arrow
 417 | 		'&rArr;'    => "\xe2\x87\x92",  #U+21D2 [⇒] rightwards double arrow
 418 | 		'&dArr;'    => "\xe2\x87\x93",  #U+21D3 [⇓] downwards double arrow
 419 | 		'&hArr;'    => "\xe2\x87\x94",  #U+21D4 [⇔] left right double arrow
 420 | 		'&forall;'  => "\xe2\x88\x80",  #U+2200 [∀] for all
 421 | 		'&part;'    => "\xe2\x88\x82",  #U+2202 [∂] partial differential
 422 | 		'&exist;'   => "\xe2\x88\x83",  #U+2203 [∃] there exists
 423 | 		'&empty;'   => "\xe2\x88\x85",  #U+2205 [∅] empty set = null set = diameter
 424 | 		'&nabla;'   => "\xe2\x88\x87",  #U+2207 [∇] nabla = backward difference
 425 | 		'&isin;'    => "\xe2\x88\x88",  #U+2208 [∈] element of
 426 | 		'&notin;'   => "\xe2\x88\x89",  #U+2209 [∉] not an element of
 427 | 		'&ni;'      => "\xe2\x88\x8b",  #U+220B [∋] contains as member
 428 | 		'&prod;'    => "\xe2\x88\x8f",  #U+220F [∏] n-ary product = product sign
 429 | 		'&sum;'     => "\xe2\x88\x91",  #U+2211 [∑] n-ary summation
 430 | 		'&minus;'   => "\xe2\x88\x92",  #U+2212 [−] minus sign
 431 | 		'&lowast;'  => "\xe2\x88\x97",  #U+2217 [∗] asterisk operator
 432 | 		'&radic;'   => "\xe2\x88\x9a",  #U+221A [√] square root = radical sign
 433 | 		'&prop;'    => "\xe2\x88\x9d",  #U+221D [∝] proportional to
 434 | 		'&infin;'   => "\xe2\x88\x9e",  #U+221E [∞] infinity
 435 | 		'&ang;'     => "\xe2\x88\xa0",  #U+2220 [∠] angle
 436 | 		'&and;'     => "\xe2\x88\xa7",  #U+2227 [∧] logical and = wedge
 437 | 		'&or;'      => "\xe2\x88\xa8",  #U+2228 [∨] logical or = vee
 438 | 		'&cap;'     => "\xe2\x88\xa9",  #U+2229 [∩] intersection = cap
 439 | 		'&cup;'     => "\xe2\x88\xaa",  #U+222A [∪] union = cup
 440 | 		'&int;'     => "\xe2\x88\xab",  #U+222B [∫] integral
 441 | 		'&there4;'  => "\xe2\x88\xb4",  #U+2234 [∴] therefore
 442 | 		'&sim;'     => "\xe2\x88\xbc",  #U+223C [∼] tilde operator = varies with = similar to
 443 | 		'&cong;'    => "\xe2\x89\x85",  #U+2245 [≅] approximately equal to
 444 | 		'&asymp;'   => "\xe2\x89\x88",  #U+2248 [≈] almost equal to = asymptotic to
 445 | 		'&ne;'      => "\xe2\x89\xa0",  #U+2260 [≠] not equal to
 446 | 		'&equiv;'   => "\xe2\x89\xa1",  #U+2261 [≡] identical to
 447 | 		'&le;'      => "\xe2\x89\xa4",  #U+2264 [≤] less-than or equal to
 448 | 		'&ge;'      => "\xe2\x89\xa5",  #U+2265 [≥] greater-than or equal to
 449 | 		'&sub;'     => "\xe2\x8a\x82",  #U+2282 [⊂] subset of
 450 | 		'&sup;'     => "\xe2\x8a\x83",  #U+2283 [⊃] superset of
 451 | 		'&nsub;'    => "\xe2\x8a\x84",  #U+2284 [⊄] not a subset of
 452 | 		'&sube;'    => "\xe2\x8a\x86",  #U+2286 [⊆] subset of or equal to
 453 | 		'&supe;'    => "\xe2\x8a\x87",  #U+2287 [⊇] superset of or equal to
 454 | 		'&oplus;'   => "\xe2\x8a\x95",  #U+2295 [⊕] circled plus = direct sum
 455 | 		'&otimes;'  => "\xe2\x8a\x97",  #U+2297 [⊗] circled times = vector product
 456 | 		'&perp;'    => "\xe2\x8a\xa5",  #U+22A5 [⊥] up tack = orthogonal to = perpendicular
 457 | 		'&sdot;'    => "\xe2\x8b\x85",  #U+22C5 [⋅] dot operator
 458 | 		'&lceil;'   => "\xe2\x8c\x88",  #U+2308 [⌈] left ceiling = APL upstile
 459 | 		'&rceil;'   => "\xe2\x8c\x89",  #U+2309 [⌉] right ceiling
 460 | 		'&lfloor;'  => "\xe2\x8c\x8a",  #U+230A [⌊] left floor = APL downstile
 461 | 		'&rfloor;'  => "\xe2\x8c\x8b",  #U+230B [⌋] right floor
 462 | 		'&lang;'    => "\xe2\x8c\xa9",  #U+2329 [〈] left-pointing angle bracket = bra
 463 | 		'&rang;'    => "\xe2\x8c\xaa",  #U+232A [〉] right-pointing angle bracket = ket
 464 | 		'&loz;'     => "\xe2\x97\x8a",  #U+25CA [◊] lozenge
 465 | 		'&spades;'  => "\xe2\x99\xa0",  #U+2660 [♠] black spade suit
 466 | 		'&clubs;'   => "\xe2\x99\xa3",  #U+2663 [♣] black club suit = shamrock
 467 | 		'&hearts;'  => "\xe2\x99\xa5",  #U+2665 [♥] black heart suit = valentine
 468 | 		'&diams;'   => "\xe2\x99\xa6",  #U+2666 [♦] black diamond suit
 469 | 		#Other Special Characters:
 470 | 		'&OElig;'  => "\xc5\x92",  #U+0152 [Œ] Latin capital ligature OE
 471 | 		'&oelig;'  => "\xc5\x93",  #U+0153 [œ] Latin small ligature oe
 472 | 		'&Scaron;' => "\xc5\xa0",  #U+0160 [Š] Latin capital letter S with caron
 473 | 		'&scaron;' => "\xc5\xa1",  #U+0161 [š] Latin small letter s with caron
 474 | 		'&Yuml;'   => "\xc5\xb8",  #U+0178 [Ÿ] Latin capital letter Y with diaeresis
 475 | 		'&circ;'   => "\xcb\x86",  #U+02C6 [ˆ] modifier letter circumflex accent
 476 | 		'&tilde;'  => "\xcb\x9c",  #U+02DC [˜] small tilde
 477 | 		'&ensp;'   => "\xe2\x80\x82",  #U+2002 [ ] en space
 478 | 		'&emsp;'   => "\xe2\x80\x83",  #U+2003 [ ] em space
 479 | 		'&thinsp;' => "\xe2\x80\x89",  #U+2009 [ ] thin space
 480 | 		'&zwnj;'   => "\xe2\x80\x8c",  #U+200C [‌] zero width non-joiner
 481 | 		'&zwj;'    => "\xe2\x80\x8d",  #U+200D [‍] zero width joiner
 482 | 		'&lrm;'    => "\xe2\x80\x8e",  #U+200E [‎] left-to-right mark
 483 | 		'&rlm;'    => "\xe2\x80\x8f",  #U+200F [‏] right-to-left mark
 484 | 		'&ndash;'  => "\xe2\x80\x93",  #U+2013 [–] en dash
 485 | 		'&mdash;'  => "\xe2\x80\x94",  #U+2014 [—] em dash
 486 | 		'&lsquo;'  => "\xe2\x80\x98",  #U+2018 [‘] left single quotation mark
 487 | 		'&rsquo;'  => "\xe2\x80\x99",  #U+2019 [’] right single quotation mark (and apostrophe!)
 488 | 		'&sbquo;'  => "\xe2\x80\x9a",  #U+201A [‚] single low-9 quotation mark
 489 | 		'&ldquo;'  => "\xe2\x80\x9c",  #U+201C [“] left double quotation mark
 490 | 		'&rdquo;'  => "\xe2\x80\x9d",  #U+201D [”] right double quotation mark
 491 | 		'&bdquo;'  => "\xe2\x80\x9e",  #U+201E [„] double low-9 quotation mark
 492 | 		'&dagger;' => "\xe2\x80\xa0",  #U+2020 [†] dagger
 493 | 		'&Dagger;' => "\xe2\x80\xa1",  #U+2021 [‡] double dagger
 494 | 		'&permil;' => "\xe2\x80\xb0",  #U+2030 [‰] per mille sign
 495 | 		'&lsaquo;' => "\xe2\x80\xb9",  #U+2039 [‹] single left-pointing angle quotation mark
 496 | 		'&rsaquo;' => "\xe2\x80\xba",  #U+203A [›] single right-pointing angle quotation mark
 497 | 		'&euro;'   => "\xe2\x82\xac",  #U+20AC [€] euro sign
 498 | 	);
 499 | 
 500 | 	/**
 501 | 	 * This table maps cp1259 characters to Unicode (UTF-8).
 502 | 	 * The cp1259 map describes the standard Tatar Cyrillic charset and is based on the cp1251 table.
 503 | 	 * cp1259 is an outdated single-byte encoding for the Tatar language
 504 | 	 * that includes all the Russian letters from cp1251.
 505 | 	 *
 506 | 	 * @link  http://search.cpan.org/CPAN/authors/id/A/AM/AMICHAUER/Lingua-TT-Yanalif-0.08.tar.gz
 507 | 	 * @link  http://www.unicode.org/charts/PDF/U0400.pdf
 508 | 	 * @var   array
 509 | 	 */
 510 | 	public static $cp1259_table = array(
 511 | 		#bytes from 0x00 to 0x7F (ASCII) are kept as is
 512 | 		"\x80" => "\xd3\x98",      #U+04d8 CYRILLIC CAPITAL LETTER SCHWA
 513 | 		"\x81" => "\xd0\x83",      #U+0403 CYRILLIC CAPITAL LETTER GJE
 514 | 		"\x82" => "\xe2\x80\x9a",  #U+201a SINGLE LOW-9 QUOTATION MARK
 515 | 		"\x83" => "\xd1\x93",      #U+0453 CYRILLIC SMALL LETTER GJE
 516 | 		"\x84" => "\xe2\x80\x9e",  #U+201e DOUBLE LOW-9 QUOTATION MARK
 517 | 		"\x85" => "\xe2\x80\xa6",  #U+2026 HORIZONTAL ELLIPSIS
 518 | 		"\x86" => "\xe2\x80\xa0",  #U+2020 DAGGER
 519 | 		"\x87" => "\xe2\x80\xa1",  #U+2021 DOUBLE DAGGER
 520 | 		"\x88" => "\xe2\x82\xac",  #U+20ac EURO SIGN
 521 | 		"\x89" => "\xe2\x80\xb0",  #U+2030 PER MILLE SIGN
 522 | 		"\x8a" => "\xd3\xa8",      #U+04e8 CYRILLIC CAPITAL LETTER BARRED O
 523 | 		"\x8b" => "\xe2\x80\xb9",  #U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
 524 | 		"\x8c" => "\xd2\xae",      #U+04ae CYRILLIC CAPITAL LETTER STRAIGHT U
 525 | 		"\x8d" => "\xd2\x96",      #U+0496 CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
 526 | 		"\x8e" => "\xd2\xa2",      #U+04a2 CYRILLIC CAPITAL LETTER EN WITH HOOK
 527 | 		"\x8f" => "\xd2\xba",      #U+04ba CYRILLIC CAPITAL LETTER SHHA
 528 | 		"\x90" => "\xd3\x99",      #U+04d9 CYRILLIC SMALL LETTER SCHWA
 529 | 		"\x91" => "\xe2\x80\x98",  #U+2018 LEFT SINGLE QUOTATION MARK
 530 | 		"\x92" => "\xe2\x80\x99",  #U+2019 RIGHT SINGLE QUOTATION MARK
 531 | 		"\x93" => "\xe2\x80\x9c",  #U+201c LEFT DOUBLE QUOTATION MARK
 532 | 		"\x94" => "\xe2\x80\x9d",  #U+201d RIGHT DOUBLE QUOTATION MARK
 533 | 		"\x95" => "\xe2\x80\xa2",  #U+2022 BULLET
 534 | 		"\x96" => "\xe2\x80\x93",  #U+2013 EN DASH
 535 | 		"\x97" => "\xe2\x80\x94",  #U+2014 EM DASH
 536 | 		#"\x98"                    #UNDEFINED
 537 | 		"\x99" => "\xe2\x84\xa2",  #U+2122 TRADE MARK SIGN
 538 | 		"\x9a" => "\xd3\xa9",      #U+04e9 CYRILLIC SMALL LETTER BARRED O
 539 | 		"\x9b" => "\xe2\x80\xba",  #U+203a SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
 540 | 		"\x9c" => "\xd2\xaf",      #U+04af CYRILLIC SMALL LETTER STRAIGHT U
 541 | 		"\x9d" => "\xd2\x97",      #U+0497 CYRILLIC SMALL LETTER ZHE WITH DESCENDER
 542 | 		"\x9e" => "\xd2\xa3",      #U+04a3 CYRILLIC SMALL LETTER EN WITH HOOK
 543 | 		"\x9f" => "\xd2\xbb",      #U+04bb CYRILLIC SMALL LETTER SHHA
 544 | 		"\xa0" => "\xc2\xa0",      #U+00a0 NO-BREAK SPACE
 545 | 		"\xa1" => "\xd0\x8e",      #U+040e CYRILLIC CAPITAL LETTER SHORT U
 546 | 		"\xa2" => "\xd1\x9e",      #U+045e CYRILLIC SMALL LETTER SHORT U
 547 | 		"\xa3" => "\xd0\x88",      #U+0408 CYRILLIC CAPITAL LETTER JE
 548 | 		"\xa4" => "\xc2\xa4",      #U+00a4 CURRENCY SIGN
 549 | 		"\xa5" => "\xd2\x90",      #U+0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN
 550 | 		"\xa6" => "\xc2\xa6",      #U+00a6 BROKEN BAR
 551 | 		"\xa7" => "\xc2\xa7",      #U+00a7 SECTION SIGN
 552 | 		"\xa8" => "\xd0\x81",      #U+0401 CYRILLIC CAPITAL LETTER IO
 553 | 		"\xa9" => "\xc2\xa9",      #U+00a9 COPYRIGHT SIGN
 554 | 		"\xaa" => "\xd0\x84",      #U+0404 CYRILLIC CAPITAL LETTER UKRAINIAN IE
 555 | 		"\xab" => "\xc2\xab",      #U+00ab LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
 556 | 		"\xac" => "\xc2\xac",      #U+00ac NOT SIGN
 557 | 		"\xad" => "\xc2\xad",      #U+00ad SOFT HYPHEN
 558 | 		"\xae" => "\xc2\xae",      #U+00ae REGISTERED SIGN
 559 | 		"\xaf" => "\xd0\x87",      #U+0407 CYRILLIC CAPITAL LETTER YI
 560 | 		"\xb0" => "\xc2\xb0",      #U+00b0 DEGREE SIGN
 561 | 		"\xb1" => "\xc2\xb1",      #U+00b1 PLUS-MINUS SIGN
 562 | 		"\xb2" => "\xd0\x86",      #U+0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
 563 | 		"\xb3" => "\xd1\x96",      #U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
 564 | 		"\xb4" => "\xd2\x91",      #U+0491 CYRILLIC SMALL LETTER GHE WITH UPTURN
 565 | 		"\xb5" => "\xc2\xb5",      #U+00b5 MICRO SIGN
 566 | 		"\xb6" => "\xc2\xb6",      #U+00b6 PILCROW SIGN
 567 | 		"\xb7" => "\xc2\xb7",      #U+00b7 MIDDLE DOT
 568 | 		"\xb8" => "\xd1\x91",      #U+0451 CYRILLIC SMALL LETTER IO
 569 | 		"\xb9" => "\xe2\x84\x96",  #U+2116 NUMERO SIGN
 570 | 		"\xba" => "\xd1\x94",      #U+0454 CYRILLIC SMALL LETTER UKRAINIAN IE
 571 | 		"\xbb" => "\xc2\xbb",      #U+00bb RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
 572 | 		"\xbc" => "\xd1\x98",      #U+0458 CYRILLIC SMALL LETTER JE
 573 | 		"\xbd" => "\xd0\x85",      #U+0405 CYRILLIC CAPITAL LETTER DZE
 574 | 		"\xbe" => "\xd1\x95",      #U+0455 CYRILLIC SMALL LETTER DZE
 575 | 		"\xbf" => "\xd1\x97",      #U+0457 CYRILLIC SMALL LETTER YI
 576 | 		"\xc0" => "\xd0\x90",      #U+0410 CYRILLIC CAPITAL LETTER A
 577 | 		"\xc1" => "\xd0\x91",      #U+0411 CYRILLIC CAPITAL LETTER BE
 578 | 		"\xc2" => "\xd0\x92",      #U+0412 CYRILLIC CAPITAL LETTER VE
 579 | 		"\xc3" => "\xd0\x93",      #U+0413 CYRILLIC CAPITAL LETTER GHE
 580 | 		"\xc4" => "\xd0\x94",      #U+0414 CYRILLIC CAPITAL LETTER DE
 581 | 		"\xc5" => "\xd0\x95",      #U+0415 CYRILLIC CAPITAL LETTER IE
 582 | 		"\xc6" => "\xd0\x96",      #U+0416 CYRILLIC CAPITAL LETTER ZHE
 583 | 		"\xc7" => "\xd0\x97",      #U+0417 CYRILLIC CAPITAL LETTER ZE
 584 | 		"\xc8" => "\xd0\x98",      #U+0418 CYRILLIC CAPITAL LETTER I
 585 | 		"\xc9" => "\xd0\x99",      #U+0419 CYRILLIC CAPITAL LETTER SHORT I
 586 | 		"\xca" => "\xd0\x9a",      #U+041a CYRILLIC CAPITAL LETTER KA
 587 | 		"\xcb" => "\xd0\x9b",      #U+041b CYRILLIC CAPITAL LETTER EL
 588 | 		"\xcc" => "\xd0\x9c",      #U+041c CYRILLIC CAPITAL LETTER EM
 589 | 		"\xcd" => "\xd0\x9d",      #U+041d CYRILLIC CAPITAL LETTER EN
 590 | 		"\xce" => "\xd0\x9e",      #U+041e CYRILLIC CAPITAL LETTER O
 591 | 		"\xcf" => "\xd0\x9f",      #U+041f CYRILLIC CAPITAL LETTER PE
 592 | 		"\xd0" => "\xd0\xa0",      #U+0420 CYRILLIC CAPITAL LETTER ER
 593 | 		"\xd1" => "\xd0\xa1",      #U+0421 CYRILLIC CAPITAL LETTER ES
 594 | 		"\xd2" => "\xd0\xa2",      #U+0422 CYRILLIC CAPITAL LETTER TE
 595 | 		"\xd3" => "\xd0\xa3",      #U+0423 CYRILLIC CAPITAL LETTER U
 596 | 		"\xd4" => "\xd0\xa4",      #U+0424 CYRILLIC CAPITAL LETTER EF
 597 | 		"\xd5" => "\xd0\xa5",      #U+0425 CYRILLIC CAPITAL LETTER HA
 598 | 		"\xd6" => "\xd0\xa6",      #U+0426 CYRILLIC CAPITAL LETTER TSE
 599 | 		"\xd7" => "\xd0\xa7",      #U+0427 CYRILLIC CAPITAL LETTER CHE
 600 | 		"\xd8" => "\xd0\xa8",      #U+0428 CYRILLIC CAPITAL LETTER SHA
 601 | 		"\xd9" => "\xd0\xa9",      #U+0429 CYRILLIC CAPITAL LETTER SHCHA
 602 | 		"\xda" => "\xd0\xaa",      #U+042a CYRILLIC CAPITAL LETTER HARD SIGN
 603 | 		"\xdb" => "\xd0\xab",      #U+042b CYRILLIC CAPITAL LETTER YERU
 604 | 		"\xdc" => "\xd0\xac",      #U+042c CYRILLIC CAPITAL LETTER SOFT SIGN
 605 | 		"\xdd" => "\xd0\xad",      #U+042d CYRILLIC CAPITAL LETTER E
 606 | 		"\xde" => "\xd0\xae",      #U+042e CYRILLIC CAPITAL LETTER YU
 607 | 		"\xdf" => "\xd0\xaf",      #U+042f CYRILLIC CAPITAL LETTER YA
 608 | 		"\xe0" => "\xd0\xb0",      #U+0430 CYRILLIC SMALL LETTER A
 609 | 		"\xe1" => "\xd0\xb1",      #U+0431 CYRILLIC SMALL LETTER BE
 610 | 		"\xe2" => "\xd0\xb2",      #U+0432 CYRILLIC SMALL LETTER VE
 611 | 		"\xe3" => "\xd0\xb3",      #U+0433 CYRILLIC SMALL LETTER GHE
 612 | 		"\xe4" => "\xd0\xb4",      #U+0434 CYRILLIC SMALL LETTER DE
 613 | 		"\xe5" => "\xd0\xb5",      #U+0435 CYRILLIC SMALL LETTER IE
 614 | 		"\xe6" => "\xd0\xb6",      #U+0436 CYRILLIC SMALL LETTER ZHE
 615 | 		"\xe7" => "\xd0\xb7",      #U+0437 CYRILLIC SMALL LETTER ZE
 616 | 		"\xe8" => "\xd0\xb8",      #U+0438 CYRILLIC SMALL LETTER I
 617 | 		"\xe9" => "\xd0\xb9",      #U+0439 CYRILLIC SMALL LETTER SHORT I
 618 | 		"\xea" => "\xd0\xba",      #U+043a CYRILLIC SMALL LETTER KA
 619 | 		"\xeb" => "\xd0\xbb",      #U+043b CYRILLIC SMALL LETTER EL
 620 | 		"\xec" => "\xd0\xbc",      #U+043c CYRILLIC SMALL LETTER EM
 621 | 		"\xed" => "\xd0\xbd",      #U+043d CYRILLIC SMALL LETTER EN
 622 | 		"\xee" => "\xd0\xbe",      #U+043e CYRILLIC SMALL LETTER O
 623 | 		"\xef" => "\xd0\xbf",      #U+043f CYRILLIC SMALL LETTER PE
 624 | 		"\xf0" => "\xd1\x80",      #U+0440 CYRILLIC SMALL LETTER ER
 625 | 		"\xf1" => "\xd1\x81",      #U+0441 CYRILLIC SMALL LETTER ES
 626 | 		"\xf2" => "\xd1\x82",      #U+0442 CYRILLIC SMALL LETTER TE
 627 | 		"\xf3" => "\xd1\x83",      #U+0443 CYRILLIC SMALL LETTER U
 628 | 		"\xf4" => "\xd1\x84",      #U+0444 CYRILLIC SMALL LETTER EF
 629 | 		"\xf5" => "\xd1\x85",      #U+0445 CYRILLIC SMALL LETTER HA
 630 | 		"\xf6" => "\xd1\x86",      #U+0446 CYRILLIC SMALL LETTER TSE
 631 | 		"\xf7" => "\xd1\x87",      #U+0447 CYRILLIC SMALL LETTER CHE
 632 | 		"\xf8" => "\xd1\x88",      #U+0448 CYRILLIC SMALL LETTER SHA
 633 | 		"\xf9" => "\xd1\x89",      #U+0449 CYRILLIC SMALL LETTER SHCHA
 634 | 		"\xfa" => "\xd1\x8a",      #U+044a CYRILLIC SMALL LETTER HARD SIGN
 635 | 		"\xfb" => "\xd1\x8b",      #U+044b CYRILLIC SMALL LETTER YERU
 636 | 		"\xfc" => "\xd1\x8c",      #U+044c CYRILLIC SMALL LETTER SOFT SIGN
 637 | 		"\xfd" => "\xd1\x8d",      #U+044d CYRILLIC SMALL LETTER E
 638 | 		"\xfe" => "\xd1\x8e",      #U+044e CYRILLIC SMALL LETTER YU
 639 | 		"\xff" => "\xd1\x8f",      #U+044f CYRILLIC SMALL LETTER YA
 640 | 	);
 641 | 
 642 | 	/**
 643 | 	 * UTF-8 Case lookup table
 644 | 	 *
 645 | 	 * This lookup table maps upper case letters to their corresponding
 646 | 	 * lower case letters in UTF-8.
 647 | 	 *
 648 | 	 * @author Andreas Gohr <andi@splitbrain.org>
 649 | 	 * @var array
 650 | 	 */
 651 | 	public static $convert_case_table = array(
 652 | 		#CASE_UPPER => case_lower
 653 | 		"\x41" => "\x61", #A a
 654 | 		"\x42" => "\x62", #B b
 655 | 		"\x43" => "\x63", #C c
 656 | 		"\x44" => "\x64", #D d
 657 | 		"\x45" => "\x65", #E e
 658 | 		"\x46" => "\x66", #F f
 659 | 		"\x47" => "\x67", #G g
 660 | 		"\x48" => "\x68", #H h
 661 | 		"\x49" => "\x69", #I i
 662 | 		"\x4a" => "\x6a", #J j
 663 | 		"\x4b" => "\x6b", #K k
 664 | 		"\x4c" => "\x6c", #L l
 665 | 		"\x4d" => "\x6d", #M m
 666 | 		"\x4e" => "\x6e", #N n
 667 | 		"\x4f" => "\x6f", #O o
 668 | 		"\x50" => "\x70", #P p
 669 | 		"\x51" => "\x71", #Q q
 670 | 		"\x52" => "\x72", #R r
 671 | 		"\x53" => "\x73", #S s
 672 | 		"\x54" => "\x74", #T t
 673 | 		"\x55" => "\x75", #U u
 674 | 		"\x56" => "\x76", #V v
 675 | 		"\x57" => "\x77", #W w
 676 | 		"\x58" => "\x78", #X x
 677 | 		"\x59" => "\x79", #Y y
 678 | 		"\x5a" => "\x7a", #Z z
 679 | 		"\xc3\x80" => "\xc3\xa0",
 680 | 		"\xc3\x81" => "\xc3\xa1",
 681 | 		"\xc3\x82" => "\xc3\xa2",
 682 | 		"\xc3\x83" => "\xc3\xa3",
 683 | 		"\xc3\x84" => "\xc3\xa4",
 684 | 		"\xc3\x85" => "\xc3\xa5",
 685 | 		"\xc3\x86" => "\xc3\xa6",
 686 | 		"\xc3\x87" => "\xc3\xa7",
 687 | 		"\xc3\x88" => "\xc3\xa8",
 688 | 		"\xc3\x89" => "\xc3\xa9",
 689 | 		"\xc3\x8a" => "\xc3\xaa",
 690 | 		"\xc3\x8b" => "\xc3\xab",
 691 | 		"\xc3\x8c" => "\xc3\xac",
 692 | 		"\xc3\x8d" => "\xc3\xad",
 693 | 		"\xc3\x8e" => "\xc3\xae",
 694 | 		"\xc3\x8f" => "\xc3\xaf",
 695 | 		"\xc3\x90" => "\xc3\xb0",
 696 | 		"\xc3\x91" => "\xc3\xb1",
 697 | 		"\xc3\x92" => "\xc3\xb2",
 698 | 		"\xc3\x93" => "\xc3\xb3",
 699 | 		"\xc3\x94" => "\xc3\xb4",
 700 | 		"\xc3\x95" => "\xc3\xb5",
 701 | 		"\xc3\x96" => "\xc3\xb6",
 702 | 		"\xc3\x98" => "\xc3\xb8",
 703 | 		"\xc3\x99" => "\xc3\xb9",
 704 | 		"\xc3\x9a" => "\xc3\xba",
 705 | 		"\xc3\x9b" => "\xc3\xbb",
 706 | 		"\xc3\x9c" => "\xc3\xbc",
 707 | 		"\xc3\x9d" => "\xc3\xbd",
 708 | 		"\xc3\x9e" => "\xc3\xbe",
 709 | 		"\xc4\x80" => "\xc4\x81",
 710 | 		"\xc4\x82" => "\xc4\x83",
 711 | 		"\xc4\x84" => "\xc4\x85",
 712 | 		"\xc4\x86" => "\xc4\x87",
 713 | 		"\xc4\x88" => "\xc4\x89",
 714 | 		"\xc4\x8a" => "\xc4\x8b",
 715 | 		"\xc4\x8c" => "\xc4\x8d",
 716 | 		"\xc4\x8e" => "\xc4\x8f",
 717 | 		"\xc4\x90" => "\xc4\x91",
 718 | 		"\xc4\x92" => "\xc4\x93",
 719 | 		"\xc4\x94" => "\xc4\x95",
 720 | 		"\xc4\x96" => "\xc4\x97",
 721 | 		"\xc4\x98" => "\xc4\x99",
 722 | 		"\xc4\x9a" => "\xc4\x9b",
 723 | 		"\xc4\x9c" => "\xc4\x9d",
 724 | 		"\xc4\x9e" => "\xc4\x9f",
 725 | 		"\xc4\xa0" => "\xc4\xa1",
 726 | 		"\xc4\xa2" => "\xc4\xa3",
 727 | 		"\xc4\xa4" => "\xc4\xa5",
 728 | 		"\xc4\xa6" => "\xc4\xa7",
 729 | 		"\xc4\xa8" => "\xc4\xa9",
 730 | 		"\xc4\xaa" => "\xc4\xab",
 731 | 		"\xc4\xac" => "\xc4\xad",
 732 | 		"\xc4\xae" => "\xc4\xaf",
 733 | 		"\xc4\xb2" => "\xc4\xb3",
 734 | 		"\xc4\xb4" => "\xc4\xb5",
 735 | 		"\xc4\xb6" => "\xc4\xb7",
 736 | 		"\xc4\xb9" => "\xc4\xba",
 737 | 		"\xc4\xbb" => "\xc4\xbc",
 738 | 		"\xc4\xbd" => "\xc4\xbe",
 739 | 		"\xc4\xbf" => "\xc5\x80",
 740 | 		"\xc5\x81" => "\xc5\x82",
 741 | 		"\xc5\x83" => "\xc5\x84",
 742 | 		"\xc5\x85" => "\xc5\x86",
 743 | 		"\xc5\x87" => "\xc5\x88",
 744 | 		"\xc5\x8a" => "\xc5\x8b",
 745 | 		"\xc5\x8c" => "\xc5\x8d",
 746 | 		"\xc5\x8e" => "\xc5\x8f",
 747 | 		"\xc5\x90" => "\xc5\x91",
 748 | 		"\xc5\x92" => "\xc5\x93",
 749 | 		"\xc5\x94" => "\xc5\x95",
 750 | 		"\xc5\x96" => "\xc5\x97",
 751 | 		"\xc5\x98" => "\xc5\x99",
 752 | 		"\xc5\x9a" => "\xc5\x9b",
 753 | 		"\xc5\x9c" => "\xc5\x9d",
 754 | 		"\xc5\x9e" => "\xc5\x9f",
 755 | 		"\xc5\xa0" => "\xc5\xa1",
 756 | 		"\xc5\xa2" => "\xc5\xa3",
 757 | 		"\xc5\xa4" => "\xc5\xa5",
 758 | 		"\xc5\xa6" => "\xc5\xa7",
 759 | 		"\xc5\xa8" => "\xc5\xa9",
 760 | 		"\xc5\xaa" => "\xc5\xab",
 761 | 		"\xc5\xac" => "\xc5\xad",
 762 | 		"\xc5\xae" => "\xc5\xaf",
 763 | 		"\xc5\xb0" => "\xc5\xb1",
 764 | 		"\xc5\xb2" => "\xc5\xb3",
 765 | 		"\xc5\xb4" => "\xc5\xb5",
 766 | 		"\xc5\xb6" => "\xc5\xb7",
 767 | 		"\xc5\xb8" => "\xc3\xbf",
 768 | 		"\xc5\xb9" => "\xc5\xba",
 769 | 		"\xc5\xbb" => "\xc5\xbc",
 770 | 		"\xc5\xbd" => "\xc5\xbe",
 771 | 		"\xc6\x81" => "\xc9\x93",
 772 | 		"\xc6\x82" => "\xc6\x83",
 773 | 		"\xc6\x84" => "\xc6\x85",
 774 | 		"\xc6\x86" => "\xc9\x94",
 775 | 		"\xc6\x87" => "\xc6\x88",
 776 | 		"\xc6\x89" => "\xc9\x96",
 777 | 		"\xc6\x8a" => "\xc9\x97",
 778 | 		"\xc6\x8b" => "\xc6\x8c",
 779 | 		"\xc6\x8e" => "\xc7\x9d",
 780 | 		"\xc6\x8f" => "\xc9\x99",
 781 | 		"\xc6\x90" => "\xc9\x9b",
 782 | 		"\xc6\x91" => "\xc6\x92",
 783 | 		"\xc6\x94" => "\xc9\xa3",
 784 | 		"\xc6\x96" => "\xc9\xa9",
 785 | 		"\xc6\x97" => "\xc9\xa8",
 786 | 		"\xc6\x98" => "\xc6\x99",
 787 | 		"\xc6\x9c" => "\xc9\xaf",
 788 | 		"\xc6\x9d" => "\xc9\xb2",
 789 | 		"\xc6\x9f" => "\xc9\xb5",
 790 | 		"\xc6\xa0" => "\xc6\xa1",
 791 | 		"\xc6\xa2" => "\xc6\xa3",
 792 | 		"\xc6\xa4" => "\xc6\xa5",
 793 | 		"\xc6\xa6" => "\xca\x80",
 794 | 		"\xc6\xa7" => "\xc6\xa8",
 795 | 		"\xc6\xa9" => "\xca\x83",
 796 | 		"\xc6\xac" => "\xc6\xad",
 797 | 		"\xc6\xae" => "\xca\x88",
 798 | 		"\xc6\xaf" => "\xc6\xb0",
 799 | 		"\xc6\xb1" => "\xca\x8a",
 800 | 		"\xc6\xb2" => "\xca\x8b",
 801 | 		"\xc6\xb3" => "\xc6\xb4",
 802 | 		"\xc6\xb5" => "\xc6\xb6",
 803 | 		"\xc6\xb7" => "\xca\x92",
 804 | 		"\xc6\xb8" => "\xc6\xb9",
 805 | 		"\xc6\xbc" => "\xc6\xbd",
 806 | 		"\xc7\x85" => "\xc7\x86",
 807 | 		"\xc7\x88" => "\xc7\x89",
 808 | 		"\xc7\x8b" => "\xc7\x8c",
 809 | 		"\xc7\x8d" => "\xc7\x8e",
 810 | 		"\xc7\x8f" => "\xc7\x90",
 811 | 		"\xc7\x91" => "\xc7\x92",
 812 | 		"\xc7\x93" => "\xc7\x94",
 813 | 		"\xc7\x95" => "\xc7\x96",
 814 | 		"\xc7\x97" => "\xc7\x98",
 815 | 		"\xc7\x99" => "\xc7\x9a",
 816 | 		"\xc7\x9b" => "\xc7\x9c",
 817 | 		"\xc7\x9e" => "\xc7\x9f",
 818 | 		"\xc7\xa0" => "\xc7\xa1",
 819 | 		"\xc7\xa2" => "\xc7\xa3",
 820 | 		"\xc7\xa4" => "\xc7\xa5",
 821 | 		"\xc7\xa6" => "\xc7\xa7",
 822 | 		"\xc7\xa8" => "\xc7\xa9",
 823 | 		"\xc7\xaa" => "\xc7\xab",
 824 | 		"\xc7\xac" => "\xc7\xad",
 825 | 		"\xc7\xae" => "\xc7\xaf",
 826 | 		"\xc7\xb2" => "\xc7\xb3",
 827 | 		"\xc7\xb4" => "\xc7\xb5",
 828 | 		"\xc7\xb6" => "\xc6\x95",
 829 | 		"\xc7\xb7" => "\xc6\xbf",
 830 | 		"\xc7\xb8" => "\xc7\xb9",
 831 | 		"\xc7\xba" => "\xc7\xbb",
 832 | 		"\xc7\xbc" => "\xc7\xbd",
 833 | 		"\xc7\xbe" => "\xc7\xbf",
 834 | 		"\xc8\x80" => "\xc8\x81",
 835 | 		"\xc8\x82" => "\xc8\x83",
 836 | 		"\xc8\x84" => "\xc8\x85",
 837 | 		"\xc8\x86" => "\xc8\x87",
 838 | 		"\xc8\x88" => "\xc8\x89",
 839 | 		"\xc8\x8a" => "\xc8\x8b",
 840 | 		"\xc8\x8c" => "\xc8\x8d",
 841 | 		"\xc8\x8e" => "\xc8\x8f",
 842 | 		"\xc8\x90" => "\xc8\x91",
 843 | 		"\xc8\x92" => "\xc8\x93",
 844 | 		"\xc8\x94" => "\xc8\x95",
 845 | 		"\xc8\x96" => "\xc8\x97",
 846 | 		"\xc8\x98" => "\xc8\x99",
 847 | 		"\xc8\x9a" => "\xc8\x9b",
 848 | 		"\xc8\x9c" => "\xc8\x9d",
 849 | 		"\xc8\x9e" => "\xc8\x9f",
 850 | 		"\xc8\xa0" => "\xc6\x9e",
 851 | 		"\xc8\xa2" => "\xc8\xa3",
 852 | 		"\xc8\xa4" => "\xc8\xa5",
 853 | 		"\xc8\xa6" => "\xc8\xa7",
 854 | 		"\xc8\xa8" => "\xc8\xa9",
 855 | 		"\xc8\xaa" => "\xc8\xab",
 856 | 		"\xc8\xac" => "\xc8\xad",
 857 | 		"\xc8\xae" => "\xc8\xaf",
 858 | 		"\xc8\xb0" => "\xc8\xb1",
 859 | 		"\xc8\xb2" => "\xc8\xb3",
 860 | 		"\xce\x86" => "\xce\xac",
 861 | 		"\xce\x88" => "\xce\xad",
 862 | 		"\xce\x89" => "\xce\xae",
 863 | 		"\xce\x8a" => "\xce\xaf",
 864 | 		"\xce\x8c" => "\xcf\x8c",
 865 | 		"\xce\x8e" => "\xcf\x8d",
 866 | 		"\xce\x8f" => "\xcf\x8e",
 867 | 		"\xce\x91" => "\xce\xb1",
 868 | 		"\xce\x92" => "\xce\xb2",
 869 | 		"\xce\x93" => "\xce\xb3",
 870 | 		"\xce\x94" => "\xce\xb4",
 871 | 		"\xce\x95" => "\xce\xb5",
 872 | 		"\xce\x96" => "\xce\xb6",
 873 | 		"\xce\x97" => "\xce\xb7",
 874 | 		"\xce\x98" => "\xce\xb8",
 875 | 		"\xce\x99" => "\xce\xb9",
 876 | 		"\xce\x9a" => "\xce\xba",
 877 | 		"\xce\x9b" => "\xce\xbb",
 878 | 		"\xce\x9c" => "\xce\xbc", #U+039C GREEK CAPITAL LETTER MU => U+03BC GREEK SMALL LETTER MU (was U+00B5 MICRO SIGN)
 879 | 		"\xce\x9d" => "\xce\xbd",
 880 | 		"\xce\x9e" => "\xce\xbe",
 881 | 		"\xce\x9f" => "\xce\xbf",
 882 | 		"\xce\xa0" => "\xcf\x80",
 883 | 		"\xce\xa1" => "\xcf\x81",
 884 | 		"\xce\xa3" => "\xcf\x83", #U+03A3 GREEK CAPITAL LETTER SIGMA => U+03C3 GREEK SMALL LETTER SIGMA (was U+03C2 FINAL SIGMA)
 885 | 		"\xce\xa4" => "\xcf\x84",
 886 | 		"\xce\xa5" => "\xcf\x85",
 887 | 		"\xce\xa6" => "\xcf\x86",
 888 | 		"\xce\xa7" => "\xcf\x87",
 889 | 		"\xce\xa8" => "\xcf\x88",
 890 | 		"\xce\xa9" => "\xcf\x89",
 891 | 		"\xce\xaa" => "\xcf\x8a",
 892 | 		"\xce\xab" => "\xcf\x8b",
 893 | 		"\xcf\x98" => "\xcf\x99",
 894 | 		"\xcf\x9a" => "\xcf\x9b",
 895 | 		"\xcf\x9c" => "\xcf\x9d",
 896 | 		"\xcf\x9e" => "\xcf\x9f",
 897 | 		"\xcf\xa0" => "\xcf\xa1",
 898 | 		"\xcf\xa2" => "\xcf\xa3",
 899 | 		"\xcf\xa4" => "\xcf\xa5",
 900 | 		"\xcf\xa6" => "\xcf\xa7",
 901 | 		"\xcf\xa8" => "\xcf\xa9",
 902 | 		"\xcf\xaa" => "\xcf\xab",
 903 | 		"\xcf\xac" => "\xcf\xad",
 904 | 		"\xcf\xae" => "\xcf\xaf",
 905 | 		"\xd0\x80" => "\xd1\x90",
 906 | 		"\xd0\x81" => "\xd1\x91",
 907 | 		"\xd0\x82" => "\xd1\x92",
 908 | 		"\xd0\x83" => "\xd1\x93",
 909 | 		"\xd0\x84" => "\xd1\x94",
 910 | 		"\xd0\x85" => "\xd1\x95",
 911 | 		"\xd0\x86" => "\xd1\x96",
 912 | 		"\xd0\x87" => "\xd1\x97",
 913 | 		"\xd0\x88" => "\xd1\x98",
 914 | 		"\xd0\x89" => "\xd1\x99",
 915 | 		"\xd0\x8a" => "\xd1\x9a",
 916 | 		"\xd0\x8b" => "\xd1\x9b",
 917 | 		"\xd0\x8c" => "\xd1\x9c",
 918 | 		"\xd0\x8d" => "\xd1\x9d",
 919 | 		"\xd0\x8e" => "\xd1\x9e",
 920 | 		"\xd0\x8f" => "\xd1\x9f",
 921 | 		"\xd0\x90" => "\xd0\xb0",
 922 | 		"\xd0\x91" => "\xd0\xb1",
 923 | 		"\xd0\x92" => "\xd0\xb2",
 924 | 		"\xd0\x93" => "\xd0\xb3",
 925 | 		"\xd0\x94" => "\xd0\xb4",
 926 | 		"\xd0\x95" => "\xd0\xb5",
 927 | 		"\xd0\x96" => "\xd0\xb6",
 928 | 		"\xd0\x97" => "\xd0\xb7",
 929 | 		"\xd0\x98" => "\xd0\xb8",
 930 | 		"\xd0\x99" => "\xd0\xb9",
 931 | 		"\xd0\x9a" => "\xd0\xba",
 932 | 		"\xd0\x9b" => "\xd0\xbb",
 933 | 		"\xd0\x9c" => "\xd0\xbc",
 934 | 		"\xd0\x9d" => "\xd0\xbd",
 935 | 		"\xd0\x9e" => "\xd0\xbe",
 936 | 		"\xd0\x9f" => "\xd0\xbf",
 937 | 		"\xd0\xa0" => "\xd1\x80",
 938 | 		"\xd0\xa1" => "\xd1\x81",
 939 | 		"\xd0\xa2" => "\xd1\x82",
 940 | 		"\xd0\xa3" => "\xd1\x83",
 941 | 		"\xd0\xa4" => "\xd1\x84",
 942 | 		"\xd0\xa5" => "\xd1\x85",
 943 | 		"\xd0\xa6" => "\xd1\x86",
 944 | 		"\xd0\xa7" => "\xd1\x87",
 945 | 		"\xd0\xa8" => "\xd1\x88",
 946 | 		"\xd0\xa9" => "\xd1\x89",
 947 | 		"\xd0\xaa" => "\xd1\x8a",
 948 | 		"\xd0\xab" => "\xd1\x8b",
 949 | 		"\xd0\xac" => "\xd1\x8c",
 950 | 		"\xd0\xad" => "\xd1\x8d",
 951 | 		"\xd0\xae" => "\xd1\x8e",
 952 | 		"\xd0\xaf" => "\xd1\x8f",
 953 | 		"\xd1\xa0" => "\xd1\xa1",
 954 | 		"\xd1\xa2" => "\xd1\xa3",
 955 | 		"\xd1\xa4" => "\xd1\xa5",
 956 | 		"\xd1\xa6" => "\xd1\xa7",
 957 | 		"\xd1\xa8" => "\xd1\xa9",
 958 | 		"\xd1\xaa" => "\xd1\xab",
 959 | 		"\xd1\xac" => "\xd1\xad",
 960 | 		"\xd1\xae" => "\xd1\xaf",
 961 | 		"\xd1\xb0" => "\xd1\xb1",
 962 | 		"\xd1\xb2" => "\xd1\xb3",
 963 | 		"\xd1\xb4" => "\xd1\xb5",
 964 | 		"\xd1\xb6" => "\xd1\xb7",
 965 | 		"\xd1\xb8" => "\xd1\xb9",
 966 | 		"\xd1\xba" => "\xd1\xbb",
 967 | 		"\xd1\xbc" => "\xd1\xbd",
 968 | 		"\xd1\xbe" => "\xd1\xbf",
 969 | 		"\xd2\x80" => "\xd2\x81",
 970 | 		"\xd2\x8a" => "\xd2\x8b",
 971 | 		"\xd2\x8c" => "\xd2\x8d",
 972 | 		"\xd2\x8e" => "\xd2\x8f",
 973 | 		"\xd2\x90" => "\xd2\x91",
 974 | 		"\xd2\x92" => "\xd2\x93",
 975 | 		"\xd2\x94" => "\xd2\x95",
 976 | 		"\xd2\x96" => "\xd2\x97",
 977 | 		"\xd2\x98" => "\xd2\x99",
 978 | 		"\xd2\x9a" => "\xd2\x9b",
 979 | 		"\xd2\x9c" => "\xd2\x9d",
 980 | 		"\xd2\x9e" => "\xd2\x9f",
 981 | 		"\xd2\xa0" => "\xd2\xa1",
 982 | 		"\xd2\xa2" => "\xd2\xa3",
 983 | 		"\xd2\xa4" => "\xd2\xa5",
 984 | 		"\xd2\xa6" => "\xd2\xa7",
 985 | 		"\xd2\xa8" => "\xd2\xa9",
 986 | 		"\xd2\xaa" => "\xd2\xab",
 987 | 		"\xd2\xac" => "\xd2\xad",
 988 | 		"\xd2\xae" => "\xd2\xaf",
 989 | 		"\xd2\xb0" => "\xd2\xb1",
 990 | 		"\xd2\xb2" => "\xd2\xb3",
 991 | 		"\xd2\xb4" => "\xd2\xb5",
 992 | 		"\xd2\xb6" => "\xd2\xb7",
 993 | 		"\xd2\xb8" => "\xd2\xb9",
 994 | 		"\xd2\xba" => "\xd2\xbb",
 995 | 		"\xd2\xbc" => "\xd2\xbd",
 996 | 		"\xd2\xbe" => "\xd2\xbf",
 997 | 		"\xd3\x81" => "\xd3\x82",
 998 | 		"\xd3\x83" => "\xd3\x84",
 999 | 		"\xd3\x85" => "\xd3\x86",
1000 | 		"\xd3\x87" => "\xd3\x88",
1001 | 		"\xd3\x89" => "\xd3\x8a",
1002 | 		"\xd3\x8b" => "\xd3\x8c",
1003 | 		"\xd3\x8d" => "\xd3\x8e",
1004 | 		"\xd3\x90" => "\xd3\x91",
1005 | 		"\xd3\x92" => "\xd3\x93",
1006 | 		"\xd3\x94" => "\xd3\x95",
1007 | 		"\xd3\x96" => "\xd3\x97",
1008 | 		"\xd3\x98" => "\xd3\x99",
1009 | 		"\xd3\x9a" => "\xd3\x9b",
1010 | 		"\xd3\x9c" => "\xd3\x9d",
1011 | 		"\xd3\x9e" => "\xd3\x9f",
1012 | 		"\xd3\xa0" => "\xd3\xa1",
1013 | 		"\xd3\xa2" => "\xd3\xa3",
1014 | 		"\xd3\xa4" => "\xd3\xa5",
1015 | 		"\xd3\xa6" => "\xd3\xa7",
1016 | 		"\xd3\xa8" => "\xd3\xa9",
1017 | 		"\xd3\xaa" => "\xd3\xab",
1018 | 		"\xd3\xac" => "\xd3\xad",
1019 | 		"\xd3\xae" => "\xd3\xaf",
1020 | 		"\xd3\xb0" => "\xd3\xb1",
1021 | 		"\xd3\xb2" => "\xd3\xb3",
1022 | 		"\xd3\xb4" => "\xd3\xb5",
1023 | 		"\xd3\xb8" => "\xd3\xb9",
1024 | 		"\xd4\x80" => "\xd4\x81",
1025 | 		"\xd4\x82" => "\xd4\x83",
1026 | 		"\xd4\x84" => "\xd4\x85",
1027 | 		"\xd4\x86" => "\xd4\x87",
1028 | 		"\xd4\x88" => "\xd4\x89",
1029 | 		"\xd4\x8a" => "\xd4\x8b",
1030 | 		"\xd4\x8c" => "\xd4\x8d",
1031 | 		"\xd4\x8e" => "\xd4\x8f",
1032 | 		"\xd4\xb1" => "\xd5\xa1",
1033 | 		"\xd4\xb2" => "\xd5\xa2",
1034 | 		"\xd4\xb3" => "\xd5\xa3",
1035 | 		"\xd4\xb4" => "\xd5\xa4",
1036 | 		"\xd4\xb5" => "\xd5\xa5",
1037 | 		"\xd4\xb6" => "\xd5\xa6",
1038 | 		"\xd4\xb7" => "\xd5\xa7",
1039 | 		"\xd4\xb8" => "\xd5\xa8",
1040 | 		"\xd4\xb9" => "\xd5\xa9",
1041 | 		"\xd4\xba" => "\xd5\xaa",
1042 | 		"\xd4\xbb" => "\xd5\xab",
1043 | 		"\xd4\xbc" => "\xd5\xac",
1044 | 		"\xd4\xbd" => "\xd5\xad",
1045 | 		"\xd4\xbe" => "\xd5\xae",
1046 | 		"\xd4\xbf" => "\xd5\xaf",
1047 | 		"\xd5\x80" => "\xd5\xb0",
1048 | 		"\xd5\x81" => "\xd5\xb1",
1049 | 		"\xd5\x82" => "\xd5\xb2",
1050 | 		"\xd5\x83" => "\xd5\xb3",
1051 | 		"\xd5\x84" => "\xd5\xb4",
1052 | 		"\xd5\x85" => "\xd5\xb5",
1053 | 		"\xd5\x86" => "\xd5\xb6",
1054 | 		"\xd5\x87" => "\xd5\xb7",
1055 | 		"\xd5\x88" => "\xd5\xb8",
1056 | 		"\xd5\x89" => "\xd5\xb9",
1057 | 		"\xd5\x8a" => "\xd5\xba",
1058 | 		"\xd5\x8b" => "\xd5\xbb",
1059 | 		"\xd5\x8c" => "\xd5\xbc",
1060 | 		"\xd5\x8d" => "\xd5\xbd",
1061 | 		"\xd5\x8e" => "\xd5\xbe",
1062 | 		"\xd5\x8f" => "\xd5\xbf",
1063 | 		"\xd5\x90" => "\xd6\x80",
1064 | 		"\xd5\x91" => "\xd6\x81",
1065 | 		"\xd5\x92" => "\xd6\x82",
1066 | 		"\xd5\x93" => "\xd6\x83",
1067 | 		"\xd5\x94" => "\xd6\x84",
1068 | 		"\xd5\x95" => "\xd6\x85",
1069 | 		"\xd5\x96" => "\xd6\x86",
1070 | 		"\xe1\xb8\x80" => "\xe1\xb8\x81",
1071 | 		"\xe1\xb8\x82" => "\xe1\xb8\x83",
1072 | 		"\xe1\xb8\x84" => "\xe1\xb8\x85",
1073 | 		"\xe1\xb8\x86" => "\xe1\xb8\x87",
1074 | 		"\xe1\xb8\x88" => "\xe1\xb8\x89",
1075 | 		"\xe1\xb8\x8a" => "\xe1\xb8\x8b",
1076 | 		"\xe1\xb8\x8c" => "\xe1\xb8\x8d",
1077 | 		"\xe1\xb8\x8e" => "\xe1\xb8\x8f",
1078 | 		"\xe1\xb8\x90" => "\xe1\xb8\x91",
1079 | 		"\xe1\xb8\x92" => "\xe1\xb8\x93",
1080 | 		"\xe1\xb8\x94" => "\xe1\xb8\x95",
1081 | 		"\xe1\xb8\x96" => "\xe1\xb8\x97",
1082 | 		"\xe1\xb8\x98" => "\xe1\xb8\x99",
1083 | 		"\xe1\xb8\x9a" => "\xe1\xb8\x9b",
1084 | 		"\xe1\xb8\x9c" => "\xe1\xb8\x9d",
1085 | 		"\xe1\xb8\x9e" => "\xe1\xb8\x9f",
1086 | 		"\xe1\xb8\xa0" => "\xe1\xb8\xa1",
1087 | 		"\xe1\xb8\xa2" => "\xe1\xb8\xa3",
1088 | 		"\xe1\xb8\xa4" => "\xe1\xb8\xa5",
1089 | 		"\xe1\xb8\xa6" => "\xe1\xb8\xa7",
1090 | 		"\xe1\xb8\xa8" => "\xe1\xb8\xa9",
1091 | 		"\xe1\xb8\xaa" => "\xe1\xb8\xab",
1092 | 		"\xe1\xb8\xac" => "\xe1\xb8\xad",
1093 | 		"\xe1\xb8\xae" => "\xe1\xb8\xaf",
1094 | 		"\xe1\xb8\xb0" => "\xe1\xb8\xb1",
1095 | 		"\xe1\xb8\xb2" => "\xe1\xb8\xb3",
1096 | 		"\xe1\xb8\xb4" => "\xe1\xb8\xb5",
1097 | 		"\xe1\xb8\xb6" => "\xe1\xb8\xb7",
1098 | 		"\xe1\xb8\xb8" => "\xe1\xb8\xb9",
1099 | 		"\xe1\xb8\xba" => "\xe1\xb8\xbb",
1100 | 		"\xe1\xb8\xbc" => "\xe1\xb8\xbd",
1101 | 		"\xe1\xb8\xbe" => "\xe1\xb8\xbf",
1102 | 		"\xe1\xb9\x80" => "\xe1\xb9\x81",
1103 | 		"\xe1\xb9\x82" => "\xe1\xb9\x83",
1104 | 		"\xe1\xb9\x84" => "\xe1\xb9\x85",
1105 | 		"\xe1\xb9\x86" => "\xe1\xb9\x87",
1106 | 		"\xe1\xb9\x88" => "\xe1\xb9\x89",
1107 | 		"\xe1\xb9\x8a" => "\xe1\xb9\x8b",
1108 | 		"\xe1\xb9\x8c" => "\xe1\xb9\x8d",
1109 | 		"\xe1\xb9\x8e" => "\xe1\xb9\x8f",
1110 | 		"\xe1\xb9\x90" => "\xe1\xb9\x91",
1111 | 		"\xe1\xb9\x92" => "\xe1\xb9\x93",
1112 | 		"\xe1\xb9\x94" => "\xe1\xb9\x95",
1113 | 		"\xe1\xb9\x96" => "\xe1\xb9\x97",
1114 | 		"\xe1\xb9\x98" => "\xe1\xb9\x99",
1115 | 		"\xe1\xb9\x9a" => "\xe1\xb9\x9b",
1116 | 		"\xe1\xb9\x9c" => "\xe1\xb9\x9d",
1117 | 		"\xe1\xb9\x9e" => "\xe1\xb9\x9f",
1118 | 		"\xe1\xb9\xa0" => "\xe1\xb9\xa1",
1119 | 		"\xe1\xb9\xa2" => "\xe1\xb9\xa3",
1120 | 		"\xe1\xb9\xa4" => "\xe1\xb9\xa5",
1121 | 		"\xe1\xb9\xa6" => "\xe1\xb9\xa7",
1122 | 		"\xe1\xb9\xa8" => "\xe1\xb9\xa9",
1123 | 		"\xe1\xb9\xaa" => "\xe1\xb9\xab",
1124 | 		"\xe1\xb9\xac" => "\xe1\xb9\xad",
1125 | 		"\xe1\xb9\xae" => "\xe1\xb9\xaf",
1126 | 		"\xe1\xb9\xb0" => "\xe1\xb9\xb1",
1127 | 		"\xe1\xb9\xb2" => "\xe1\xb9\xb3",
1128 | 		"\xe1\xb9\xb4" => "\xe1\xb9\xb5",
1129 | 		"\xe1\xb9\xb6" => "\xe1\xb9\xb7",
1130 | 		"\xe1\xb9\xb8" => "\xe1\xb9\xb9",
1131 | 		"\xe1\xb9\xba" => "\xe1\xb9\xbb",
1132 | 		"\xe1\xb9\xbc" => "\xe1\xb9\xbd",
1133 | 		"\xe1\xb9\xbe" => "\xe1\xb9\xbf",
1134 | 		"\xe1\xba\x80" => "\xe1\xba\x81",
1135 | 		"\xe1\xba\x82" => "\xe1\xba\x83",
1136 | 		"\xe1\xba\x84" => "\xe1\xba\x85",
1137 | 		"\xe1\xba\x86" => "\xe1\xba\x87",
1138 | 		"\xe1\xba\x88" => "\xe1\xba\x89",
1139 | 		"\xe1\xba\x8a" => "\xe1\xba\x8b",
1140 | 		"\xe1\xba\x8c" => "\xe1\xba\x8d",
1141 | 		"\xe1\xba\x8e" => "\xe1\xba\x8f",
1142 | 		"\xe1\xba\x90" => "\xe1\xba\x91",
1143 | 		"\xe1\xba\x92" => "\xe1\xba\x93",
1144 | 		"\xe1\xba\x94" => "\xe1\xba\x95",
1145 | 		"\xe1\xba\xa0" => "\xe1\xba\xa1",
1146 | 		"\xe1\xba\xa2" => "\xe1\xba\xa3",
1147 | 		"\xe1\xba\xa4" => "\xe1\xba\xa5",
1148 | 		"\xe1\xba\xa6" => "\xe1\xba\xa7",
1149 | 		"\xe1\xba\xa8" => "\xe1\xba\xa9",
1150 | 		"\xe1\xba\xaa" => "\xe1\xba\xab",
1151 | 		"\xe1\xba\xac" => "\xe1\xba\xad",
1152 | 		"\xe1\xba\xae" => "\xe1\xba\xaf",
1153 | 		"\xe1\xba\xb0" => "\xe1\xba\xb1",
1154 | 		"\xe1\xba\xb2" => "\xe1\xba\xb3",
1155 | 		"\xe1\xba\xb4" => "\xe1\xba\xb5",
1156 | 		"\xe1\xba\xb6" => "\xe1\xba\xb7",
1157 | 		"\xe1\xba\xb8" => "\xe1\xba\xb9",
1158 | 		"\xe1\xba\xba" => "\xe1\xba\xbb",
1159 | 		"\xe1\xba\xbc" => "\xe1\xba\xbd",
1160 | 		"\xe1\xba\xbe" => "\xe1\xba\xbf",
1161 | 		"\xe1\xbb\x80" => "\xe1\xbb\x81",
1162 | 		"\xe1\xbb\x82" => "\xe1\xbb\x83",
1163 | 		"\xe1\xbb\x84" => "\xe1\xbb\x85",
1164 | 		"\xe1\xbb\x86" => "\xe1\xbb\x87",
1165 | 		"\xe1\xbb\x88" => "\xe1\xbb\x89",
1166 | 		"\xe1\xbb\x8a" => "\xe1\xbb\x8b",
1167 | 		"\xe1\xbb\x8c" => "\xe1\xbb\x8d",
1168 | 		"\xe1\xbb\x8e" => "\xe1\xbb\x8f",
1169 | 		"\xe1\xbb\x90" => "\xe1\xbb\x91",
1170 | 		"\xe1\xbb\x92" => "\xe1\xbb\x93",
1171 | 		"\xe1\xbb\x94" => "\xe1\xbb\x95",
1172 | 		"\xe1\xbb\x96" => "\xe1\xbb\x97",
1173 | 		"\xe1\xbb\x98" => "\xe1\xbb\x99",
1174 | 		"\xe1\xbb\x9a" => "\xe1\xbb\x9b",
1175 | 		"\xe1\xbb\x9c" => "\xe1\xbb\x9d",
1176 | 		"\xe1\xbb\x9e" => "\xe1\xbb\x9f",
1177 | 		"\xe1\xbb\xa0" => "\xe1\xbb\xa1",
1178 | 		"\xe1\xbb\xa2" => "\xe1\xbb\xa3",
1179 | 		"\xe1\xbb\xa4" => "\xe1\xbb\xa5",
1180 | 		"\xe1\xbb\xa6" => "\xe1\xbb\xa7",
1181 | 		"\xe1\xbb\xa8" => "\xe1\xbb\xa9",
1182 | 		"\xe1\xbb\xaa" => "\xe1\xbb\xab",
1183 | 		"\xe1\xbb\xac" => "\xe1\xbb\xad",
1184 | 		"\xe1\xbb\xae" => "\xe1\xbb\xaf",
1185 | 		"\xe1\xbb\xb0" => "\xe1\xbb\xb1",
1186 | 		"\xe1\xbb\xb2" => "\xe1\xbb\xb3",
1187 | 		"\xe1\xbb\xb4" => "\xe1\xbb\xb5",
1188 | 		"\xe1\xbb\xb6" => "\xe1\xbb\xb7",
1189 | 		"\xe1\xbb\xb8" => "\xe1\xbb\xb9",
1190 | 		"\xe1\xbc\x88" => "\xe1\xbc\x80",
1191 | 		"\xe1\xbc\x89" => "\xe1\xbc\x81",
1192 | 		"\xe1\xbc\x8a" => "\xe1\xbc\x82",
1193 | 		"\xe1\xbc\x8b" => "\xe1\xbc\x83",
1194 | 		"\xe1\xbc\x8c" => "\xe1\xbc\x84",
1195 | 		"\xe1\xbc\x8d" => "\xe1\xbc\x85",
1196 | 		"\xe1\xbc\x8e" => "\xe1\xbc\x86",
1197 | 		"\xe1\xbc\x8f" => "\xe1\xbc\x87",
1198 | 		"\xe1\xbc\x98" => "\xe1\xbc\x90",
1199 | 		"\xe1\xbc\x99" => "\xe1\xbc\x91",
1200 | 		"\xe1\xbc\x9a" => "\xe1\xbc\x92",
1201 | 		"\xe1\xbc\x9b" => "\xe1\xbc\x93",
1202 | 		"\xe1\xbc\x9c" => "\xe1\xbc\x94",
1203 | 		"\xe1\xbc\x9d" => "\xe1\xbc\x95",
1204 | 		"\xe1\xbc\xa9" => "\xe1\xbc\xa1",
1205 | 		"\xe1\xbc\xaa" => "\xe1\xbc\xa2",
1206 | 		"\xe1\xbc\xab" => "\xe1\xbc\xa3",
1207 | 		"\xe1\xbc\xac" => "\xe1\xbc\xa4",
1208 | 		"\xe1\xbc\xad" => "\xe1\xbc\xa5",
1209 | 		"\xe1\xbc\xae" => "\xe1\xbc\xa6",
1210 | 		"\xe1\xbc\xaf" => "\xe1\xbc\xa7",
1211 | 		"\xe1\xbc\xb8" => "\xe1\xbc\xb0",
1212 | 		"\xe1\xbc\xb9" => "\xe1\xbc\xb1",
1213 | 		"\xe1\xbc\xba" => "\xe1\xbc\xb2",
1214 | 		"\xe1\xbc\xbb" => "\xe1\xbc\xb3",
1215 | 		"\xe1\xbc\xbc" => "\xe1\xbc\xb4",
1216 | 		"\xe1\xbc\xbd" => "\xe1\xbc\xb5",
1217 | 		"\xe1\xbc\xbe" => "\xe1\xbc\xb6",
1218 | 		"\xe1\xbc\xbf" => "\xe1\xbc\xb7",
1219 | 		"\xe1\xbd\x88" => "\xe1\xbd\x80",
1220 | 		"\xe1\xbd\x89" => "\xe1\xbd\x81",
1221 | 		"\xe1\xbd\x8a" => "\xe1\xbd\x82",
1222 | 		"\xe1\xbd\x8b" => "\xe1\xbd\x83",
1223 | 		"\xe1\xbd\x8c" => "\xe1\xbd\x84",
1224 | 		"\xe1\xbd\x8d" => "\xe1\xbd\x85",
1225 | 		"\xe1\xbd\x99" => "\xe1\xbd\x91",
1226 | 		"\xe1\xbd\x9b" => "\xe1\xbd\x93",
1227 | 		"\xe1\xbd\x9d" => "\xe1\xbd\x95",
1228 | 		"\xe1\xbd\x9f" => "\xe1\xbd\x97",
1229 | 		"\xe1\xbd\xa9" => "\xe1\xbd\xa1",
1230 | 		"\xe1\xbd\xaa" => "\xe1\xbd\xa2",
1231 | 		"\xe1\xbd\xab" => "\xe1\xbd\xa3",
1232 | 		"\xe1\xbd\xac" => "\xe1\xbd\xa4",
1233 | 		"\xe1\xbd\xad" => "\xe1\xbd\xa5",
1234 | 		"\xe1\xbd\xae" => "\xe1\xbd\xa6",
1235 | 		"\xe1\xbd\xaf" => "\xe1\xbd\xa7",
1236 | 		"\xe1\xbe\x88" => "\xe1\xbe\x80",
1237 | 		"\xe1\xbe\x89" => "\xe1\xbe\x81",
1238 | 		"\xe1\xbe\x8a" => "\xe1\xbe\x82",
1239 | 		"\xe1\xbe\x8b" => "\xe1\xbe\x83",
1240 | 		"\xe1\xbe\x8c" => "\xe1\xbe\x84",
1241 | 		"\xe1\xbe\x8d" => "\xe1\xbe\x85",
1242 | 		"\xe1\xbe\x8e" => "\xe1\xbe\x86",
1243 | 		"\xe1\xbe\x8f" => "\xe1\xbe\x87",
1244 | 		"\xe1\xbe\x98" => "\xe1\xbe\x90",
1245 | 		"\xe1\xbe\x99" => "\xe1\xbe\x91",
1246 | 		"\xe1\xbe\x9a" => "\xe1\xbe\x92",
1247 | 		"\xe1\xbe\x9b" => "\xe1\xbe\x93",
1248 | 		"\xe1\xbe\x9c" => "\xe1\xbe\x94",
1249 | 		"\xe1\xbe\x9d" => "\xe1\xbe\x95",
1250 | 		"\xe1\xbe\x9e" => "\xe1\xbe\x96",
1251 | 		"\xe1\xbe\x9f" => "\xe1\xbe\x97",
1252 | 		"\xe1\xbe\xa9" => "\xe1\xbe\xa1",
1253 | 		"\xe1\xbe\xaa" => "\xe1\xbe\xa2",
1254 | 		"\xe1\xbe\xab" => "\xe1\xbe\xa3",
1255 | 		"\xe1\xbe\xac" => "\xe1\xbe\xa4",
1256 | 		"\xe1\xbe\xad" => "\xe1\xbe\xa5",
1257 | 		"\xe1\xbe\xae" => "\xe1\xbe\xa6",
1258 | 		"\xe1\xbe\xaf" => "\xe1\xbe\xa7",
1259 | 		"\xe1\xbe\xb8" => "\xe1\xbe\xb0",
1260 | 		"\xe1\xbe\xb9" => "\xe1\xbe\xb1",
1261 | 		"\xe1\xbe\xba" => "\xe1\xbd\xb0",
1262 | 		"\xe1\xbe\xbb" => "\xe1\xbd\xb1",
1263 | 		"\xe1\xbe\xbc" => "\xe1\xbe\xb3",
1264 | 		"\xe1\xbf\x88" => "\xe1\xbd\xb2",
1265 | 		"\xe1\xbf\x89" => "\xe1\xbd\xb3",
1266 | 		"\xe1\xbf\x8a" => "\xe1\xbd\xb4",
1267 | 		"\xe1\xbf\x8b" => "\xe1\xbd\xb5",
1268 | 		"\xe1\xbf\x8c" => "\xe1\xbf\x83",
1269 | 		"\xe1\xbf\x98" => "\xe1\xbf\x90",
1270 | 		"\xe1\xbf\x99" => "\xe1\xbf\x91",
1271 | 		"\xe1\xbf\x9a" => "\xe1\xbd\xb6",
1272 | 		"\xe1\xbf\x9b" => "\xe1\xbd\xb7",
1273 | 		"\xe1\xbf\xa9" => "\xe1\xbf\xa1",
1274 | 		"\xe1\xbf\xaa" => "\xe1\xbd\xba",
1275 | 		"\xe1\xbf\xab" => "\xe1\xbd\xbb",
1276 | 		"\xe1\xbf\xac" => "\xe1\xbf\xa5",
1277 | 		"\xe1\xbf\xb8" => "\xe1\xbd\xb8",
1278 | 		"\xe1\xbf\xb9" => "\xe1\xbd\xb9",
1279 | 		"\xe1\xbf\xba" => "\xe1\xbd\xbc",
1280 | 		"\xe1\xbf\xbb" => "\xe1\xbd\xbd",
1281 | 		"\xe1\xbf\xbc" => "\xe1\xbf\xb3",
1282 | 		"\xef\xbc\xa1" => "\xef\xbd\x81",
1283 | 		"\xef\xbc\xa2" => "\xef\xbd\x82",
1284 | 		"\xef\xbc\xa3" => "\xef\xbd\x83",
1285 | 		"\xef\xbc\xa4" => "\xef\xbd\x84",
1286 | 		"\xef\xbc\xa5" => "\xef\xbd\x85",
1287 | 		"\xef\xbc\xa6" => "\xef\xbd\x86",
1288 | 		"\xef\xbc\xa7" => "\xef\xbd\x87",
1289 | 		"\xef\xbc\xa8" => "\xef\xbd\x88",
1290 | 		"\xef\xbc\xa9" => "\xef\xbd\x89",
1291 | 		"\xef\xbc\xaa" => "\xef\xbd\x8a",
1292 | 		"\xef\xbc\xab" => "\xef\xbd\x8b",
1293 | 		"\xef\xbc\xac" => "\xef\xbd\x8c",
1294 | 		"\xef\xbc\xad" => "\xef\xbd\x8d",
1295 | 		"\xef\xbc\xae" => "\xef\xbd\x8e",
1296 | 		"\xef\xbc\xaf" => "\xef\xbd\x8f",
1297 | 		"\xef\xbc\xb0" => "\xef\xbd\x90",
1298 | 		"\xef\xbc\xb1" => "\xef\xbd\x91",
1299 | 		"\xef\xbc\xb2" => "\xef\xbd\x92",
1300 | 		"\xef\xbc\xb3" => "\xef\xbd\x93",
1301 | 		"\xef\xbc\xb4" => "\xef\xbd\x94",
1302 | 		"\xef\xbc\xb5" => "\xef\xbd\x95",
1303 | 		"\xef\xbc\xb6" => "\xef\xbd\x96",
1304 | 		"\xef\xbc\xb7" => "\xef\xbd\x97",
1305 | 		"\xef\xbc\xb8" => "\xef\xbd\x98",
1306 | 		"\xef\xbc\xb9" => "\xef\xbd\x99",
1307 | 		"\xef\xbc\xba" => "\xef\xbd\x9a",
1308 | 	);
1309 | 
1310 | 	/**
1311 | 	 * Unicode Character Database 6.0.0 (2010-06-04)
1312 | 	 * Autogenerated by unicode_blocks_txt2php() PHP function at 2011-06-04 00:19:39, 209 blocks total
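1313 | 	 * Each entry: 'block name' => array(0 => first code point, 1 => last code point, 2 => zero-based block index)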
1314 | 	 * @var array
1315 | 	 */
1316 | 	public static $unicode_blocks = array(
1317 | 		'Basic Latin' => array(
1318 | 			0 => 0x0000,
1319 | 			1 => 0x007F,
1320 | 			2 => 0,
1321 | 		),
1322 | 		'Latin-1 Supplement' => array(
1323 | 			0 => 0x0080,
1324 | 			1 => 0x00FF,
1325 | 			2 => 1,
1326 | 		),
1327 | 		'Latin Extended-A' => array(
1328 | 			0 => 0x0100,
1329 | 			1 => 0x017F,
1330 | 			2 => 2,
1331 | 		),
1332 | 		'Latin Extended-B' => array(
1333 | 			0 => 0x0180,
1334 | 			1 => 0x024F,
1335 | 			2 => 3,
1336 | 		),
1337 | 		'IPA Extensions' => array(
1338 | 			0 => 0x0250,
1339 | 			1 => 0x02AF,
1340 | 			2 => 4,
1341 | 		),
1342 | 		'Spacing Modifier Letters' => array(
1343 | 			0 => 0x02B0,
1344 | 			1 => 0x02FF,
1345 | 			2 => 5,
1346 | 		),
1347 | 		'Combining Diacritical Marks' => array(
1348 | 			0 => 0x0300,
1349 | 			1 => 0x036F,
1350 | 			2 => 6,
1351 | 		),
1352 | 		'Greek and Coptic' => array(
1353 | 			0 => 0x0370,
1354 | 			1 => 0x03FF,
1355 | 			2 => 7,
1356 | 		),
1357 | 		'Cyrillic' => array(
1358 | 			0 => 0x0400,
1359 | 			1 => 0x04FF,
1360 | 			2 => 8,
1361 | 		),
1362 | 		'Cyrillic Supplement' => array(
1363 | 			0 => 0x0500,
1364 | 			1 => 0x052F,
1365 | 			2 => 9,
1366 | 		),
1367 | 		'Armenian' => array(
1368 | 			0 => 0x0530,
1369 | 			1 => 0x058F,
1370 | 			2 => 10,
1371 | 		),
1372 | 		'Hebrew' => array(
1373 | 			0 => 0x0590,
1374 | 			1 => 0x05FF,
1375 | 			2 => 11,
1376 | 		),
1377 | 		'Arabic' => array(
1378 | 			0 => 0x0600,
1379 | 			1 => 0x06FF,
1380 | 			2 => 12,
1381 | 		),
1382 | 		'Syriac' => array(
1383 | 			0 => 0x0700,
1384 | 			1 => 0x074F,
1385 | 			2 => 13,
1386 | 		),
1387 | 		'Arabic Supplement' => array(
1388 | 			0 => 0x0750,
1389 | 			1 => 0x077F,
1390 | 			2 => 14,
1391 | 		),
1392 | 		'Thaana' => array(
1393 | 			0 => 0x0780,
1394 | 			1 => 0x07BF,
1395 | 			2 => 15,
1396 | 		),
1397 | 		'NKo' => array(
1398 | 			0 => 0x07C0,
1399 | 			1 => 0x07FF,
1400 | 			2 => 16,
1401 | 		),
1402 | 		'Samaritan' => array(
1403 | 			0 => 0x0800,
1404 | 			1 => 0x083F,
1405 | 			2 => 17,
1406 | 		),
1407 | 		'Mandaic' => array(
1408 | 			0 => 0x0840,
1409 | 			1 => 0x085F,
1410 | 			2 => 18,
1411 | 		),
1412 | 		'Devanagari' => array(
1413 | 			0 => 0x0900,
1414 | 			1 => 0x097F,
1415 | 			2 => 19,
1416 | 		),
1417 | 		'Bengali' => array(
1418 | 			0 => 0x0980,
1419 | 			1 => 0x09FF,
1420 | 			2 => 20,
1421 | 		),
1422 | 		'Gurmukhi' => array(
1423 | 			0 => 0x0A00,
1424 | 			1 => 0x0A7F,
1425 | 			2 => 21,
1426 | 		),
1427 | 		'Gujarati' => array(
1428 | 			0 => 0x0A80,
1429 | 			1 => 0x0AFF,
1430 | 			2 => 22,
1431 | 		),
1432 | 		'Oriya' => array(
1433 | 			0 => 0x0B00,
1434 | 			1 => 0x0B7F,
1435 | 			2 => 23,
1436 | 		),
1437 | 		'Tamil' => array(
1438 | 			0 => 0x0B80,
1439 | 			1 => 0x0BFF,
1440 | 			2 => 24,
1441 | 		),
1442 | 		'Telugu' => array(
1443 | 			0 => 0x0C00,
1444 | 			1 => 0x0C7F,
1445 | 			2 => 25,
1446 | 		),
1447 | 		'Kannada' => array(
1448 | 			0 => 0x0C80,
1449 | 			1 => 0x0CFF,
1450 | 			2 => 26,
1451 | 		),
1452 | 		'Malayalam' => array(
1453 | 			0 => 0x0D00,
1454 | 			1 => 0x0D7F,
1455 | 			2 => 27,
1456 | 		),
1457 | 		'Sinhala' => array(
1458 | 			0 => 0x0D80,
1459 | 			1 => 0x0DFF,
1460 | 			2 => 28,
1461 | 		),
1462 | 		'Thai' => array(
1463 | 			0 => 0x0E00,
1464 | 			1 => 0x0E7F,
1465 | 			2 => 29,
1466 | 		),
1467 | 		'Lao' => array(
1468 | 			0 => 0x0E80,
1469 | 			1 => 0x0EFF,
1470 | 			2 => 30,
1471 | 		),
1472 | 		'Tibetan' => array(
1473 | 			0 => 0x0F00,
1474 | 			1 => 0x0FFF,
1475 | 			2 => 31,
1476 | 		),
1477 | 		'Myanmar' => array(
1478 | 			0 => 0x1000,
1479 | 			1 => 0x109F,
1480 | 			2 => 32,
1481 | 		),
1482 | 		'Georgian' => array(
1483 | 			0 => 0x10A0,
1484 | 			1 => 0x10FF,
1485 | 			2 => 33,
1486 | 		),
1487 | 		'Hangul Jamo' => array(
1488 | 			0 => 0x1100,
1489 | 			1 => 0x11FF,
1490 | 			2 => 34,
1491 | 		),
1492 | 		'Ethiopic' => array(
1493 | 			0 => 0x1200,
1494 | 			1 => 0x137F,
1495 | 			2 => 35,
1496 | 		),
1497 | 		'Ethiopic Supplement' => array(
1498 | 			0 => 0x1380,
1499 | 			1 => 0x139F,
1500 | 			2 => 36,
1501 | 		),
1502 | 		'Cherokee' => array(
1503 | 			0 => 0x13A0,
1504 | 			1 => 0x13FF,
1505 | 			2 => 37,
1506 | 		),
1507 | 		'Unified Canadian Aboriginal Syllabics' => array(
1508 | 			0 => 0x1400,
1509 | 			1 => 0x167F,
1510 | 			2 => 38,
1511 | 		),
1512 | 		'Ogham' => array(
1513 | 			0 => 0x1680,
1514 | 			1 => 0x169F,
1515 | 			2 => 39,
1516 | 		),
1517 | 		'Runic' => array(
1518 | 			0 => 0x16A0,
1519 | 			1 => 0x16FF,
1520 | 			2 => 40,
1521 | 		),
1522 | 		'Tagalog' => array(
1523 | 			0 => 0x1700,
1524 | 			1 => 0x171F,
1525 | 			2 => 41,
1526 | 		),
1527 | 		'Hanunoo' => array(
1528 | 			0 => 0x1720,
1529 | 			1 => 0x173F,
1530 | 			2 => 42,
1531 | 		),
1532 | 		'Buhid' => array(
1533 | 			0 => 0x1740,
1534 | 			1 => 0x175F,
1535 | 			2 => 43,
1536 | 		),
1537 | 		'Tagbanwa' => array(
1538 | 			0 => 0x1760,
1539 | 			1 => 0x177F,
1540 | 			2 => 44,
1541 | 		),
1542 | 		'Khmer' => array(
1543 | 			0 => 0x1780,
1544 | 			1 => 0x17FF,
1545 | 			2 => 45,
1546 | 		),
1547 | 		'Mongolian' => array(
1548 | 			0 => 0x1800,
1549 | 			1 => 0x18AF,
1550 | 			2 => 46,
1551 | 		),
1552 | 		'Unified Canadian Aboriginal Syllabics Extended' => array(
1553 | 			0 => 0x18B0,
1554 | 			1 => 0x18FF,
1555 | 			2 => 47,
1556 | 		),
1557 | 		'Limbu' => array(
1558 | 			0 => 0x1900,
1559 | 			1 => 0x194F,
1560 | 			2 => 48,
1561 | 		),
1562 | 		'Tai Le' => array(
1563 | 			0 => 0x1950,
1564 | 			1 => 0x197F,
1565 | 			2 => 49,
1566 | 		),
1567 | 		'New Tai Lue' => array(
1568 | 			0 => 0x1980,
1569 | 			1 => 0x19DF,
1570 | 			2 => 50,
1571 | 		),
1572 | 		'Khmer Symbols' => array(
1573 | 			0 => 0x19E0,
1574 | 			1 => 0x19FF,
1575 | 			2 => 51,
1576 | 		),
1577 | 		'Buginese' => array(
1578 | 			0 => 0x1A00,
1579 | 			1 => 0x1A1F,
1580 | 			2 => 52,
1581 | 		),
1582 | 		'Tai Tham' => array(
1583 | 			0 => 0x1A20,
1584 | 			1 => 0x1AAF,
1585 | 			2 => 53,
1586 | 		),
1587 | 		'Balinese' => array(
1588 | 			0 => 0x1B00,
1589 | 			1 => 0x1B7F,
1590 | 			2 => 54,
1591 | 		),
1592 | 		'Sundanese' => array(
1593 | 			0 => 0x1B80,
1594 | 			1 => 0x1BBF,
1595 | 			2 => 55,
1596 | 		),
1597 | 		'Batak' => array(
1598 | 			0 => 0x1BC0,
1599 | 			1 => 0x1BFF,
1600 | 			2 => 56,
1601 | 		),
1602 | 		'Lepcha' => array(
1603 | 			0 => 0x1C00,
1604 | 			1 => 0x1C4F,
1605 | 			2 => 57,
1606 | 		),
1607 | 		'Ol Chiki' => array(
1608 | 			0 => 0x1C50,
1609 | 			1 => 0x1C7F,
1610 | 			2 => 58,
1611 | 		),
1612 | 		'Vedic Extensions' => array(
1613 | 			0 => 0x1CD0,
1614 | 			1 => 0x1CFF,
1615 | 			2 => 59,
1616 | 		),
1617 | 		'Phonetic Extensions' => array(
1618 | 			0 => 0x1D00,
1619 | 			1 => 0x1D7F,
1620 | 			2 => 60,
1621 | 		),
1622 | 		'Phonetic Extensions Supplement' => array(
1623 | 			0 => 0x1D80,
1624 | 			1 => 0x1DBF,
1625 | 			2 => 61,
1626 | 		),
1627 | 		'Combining Diacritical Marks Supplement' => array(
1628 | 			0 => 0x1DC0,
1629 | 			1 => 0x1DFF,
1630 | 			2 => 62,
1631 | 		),
1632 | 		'Latin Extended Additional' => array(
1633 | 			0 => 0x1E00,
1634 | 			1 => 0x1EFF,
1635 | 			2 => 63,
1636 | 		),
1637 | 		'Greek Extended' => array(
1638 | 			0 => 0x1F00,
1639 | 			1 => 0x1FFF,
1640 | 			2 => 64,
1641 | 		),
1642 | 		'General Punctuation' => array(
1643 | 			0 => 0x2000,
1644 | 			1 => 0x206F,
1645 | 			2 => 65,
1646 | 		),
1647 | 		'Superscripts and Subscripts' => array(
1648 | 			0 => 0x2070,
1649 | 			1 => 0x209F,
1650 | 			2 => 66,
1651 | 		),
1652 | 		'Currency Symbols' => array(
1653 | 			0 => 0x20A0,
1654 | 			1 => 0x20CF,
1655 | 			2 => 67,
1656 | 		),
1657 | 		'Combining Diacritical Marks for Symbols' => array(
1658 | 			0 => 0x20D0,
1659 | 			1 => 0x20FF,
1660 | 			2 => 68,
1661 | 		),
1662 | 		'Letterlike Symbols' => array(
1663 | 			0 => 0x2100,
1664 | 			1 => 0x214F,
1665 | 			2 => 69,
1666 | 		),
1667 | 		'Number Forms' => array(
1668 | 			0 => 0x2150,
1669 | 			1 => 0x218F,
1670 | 			2 => 70,
1671 | 		),
1672 | 		'Arrows' => array(
1673 | 			0 => 0x2190,
1674 | 			1 => 0x21FF,
1675 | 			2 => 71,
1676 | 		),
1677 | 		'Mathematical Operators' => array(
1678 | 			0 => 0x2200,
1679 | 			1 => 0x22FF,
1680 | 			2 => 72,
1681 | 		),
1682 | 		'Miscellaneous Technical' => array(
1683 | 			0 => 0x2300,
1684 | 			1 => 0x23FF,
1685 | 			2 => 73,
1686 | 		),
1687 | 		'Control Pictures' => array(
1688 | 			0 => 0x2400,
1689 | 			1 => 0x243F,
1690 | 			2 => 74,
1691 | 		),
1692 | 		'Optical Character Recognition' => array(
1693 | 			0 => 0x2440,
1694 | 			1 => 0x245F,
1695 | 			2 => 75,
1696 | 		),
1697 | 		'Enclosed Alphanumerics' => array(
1698 | 			0 => 0x2460,
1699 | 			1 => 0x24FF,
1700 | 			2 => 76,
1701 | 		),
1702 | 		'Box Drawing' => array(
1703 | 			0 => 0x2500,
1704 | 			1 => 0x257F,
1705 | 			2 => 77,
1706 | 		),
1707 | 		'Block Elements' => array(
1708 | 			0 => 0x2580,
1709 | 			1 => 0x259F,
1710 | 			2 => 78,
1711 | 		),
1712 | 		'Geometric Shapes' => array(
1713 | 			0 => 0x25A0,
1714 | 			1 => 0x25FF,
1715 | 			2 => 79,
1716 | 		),
1717 | 		'Miscellaneous Symbols' => array(
1718 | 			0 => 0x2600,
1719 | 			1 => 0x26FF,
1720 | 			2 => 80,
1721 | 		),
1722 | 		'Dingbats' => array(
1723 | 			0 => 0x2700,
1724 | 			1 => 0x27BF,
1725 | 			2 => 81,
1726 | 		),
1727 | 		'Miscellaneous Mathematical Symbols-A' => array(
1728 | 			0 => 0x27C0,
1729 | 			1 => 0x27EF,
1730 | 			2 => 82,
1731 | 		),
1732 | 		'Supplemental Arrows-A' => array(
1733 | 			0 => 0x27F0,
1734 | 			1 => 0x27FF,
1735 | 			2 => 83,
1736 | 		),
1737 | 		'Braille Patterns' => array(
1738 | 			0 => 0x2800,
1739 | 			1 => 0x28FF,
1740 | 			2 => 84,
1741 | 		),
1742 | 		'Supplemental Arrows-B' => array(
1743 | 			0 => 0x2900,
1744 | 			1 => 0x297F,
1745 | 			2 => 85,
1746 | 		),
1747 | 		'Miscellaneous Mathematical Symbols-B' => array(
1748 | 			0 => 0x2980,
1749 | 			1 => 0x29FF,
1750 | 			2 => 86,
1751 | 		),
1752 | 		'Supplemental Mathematical Operators' => array(
1753 | 			0 => 0x2A00,
1754 | 			1 => 0x2AFF,
1755 | 			2 => 87,
1756 | 		),
1757 | 		'Miscellaneous Symbols and Arrows' => array(
1758 | 			0 => 0x2B00,
1759 | 			1 => 0x2BFF,
1760 | 			2 => 88,
1761 | 		),
1762 | 		'Glagolitic' => array(
1763 | 			0 => 0x2C00,
1764 | 			1 => 0x2C5F,
1765 | 			2 => 89,
1766 | 		),
1767 | 		'Latin Extended-C' => array(
1768 | 			0 => 0x2C60,
1769 | 			1 => 0x2C7F,
1770 | 			2 => 90,
1771 | 		),
1772 | 		'Coptic' => array(
1773 | 			0 => 0x2C80,
1774 | 			1 => 0x2CFF,
1775 | 			2 => 91,
1776 | 		),
1777 | 		'Georgian Supplement' => array(
1778 | 			0 => 0x2D00,
1779 | 			1 => 0x2D2F,
1780 | 			2 => 92,
1781 | 		),
1782 | 		'Tifinagh' => array(
1783 | 			0 => 0x2D30,
1784 | 			1 => 0x2D7F,
1785 | 			2 => 93,
1786 | 		),
1787 | 		'Ethiopic Extended' => array(
1788 | 			0 => 0x2D80,
1789 | 			1 => 0x2DDF,
1790 | 			2 => 94,
1791 | 		),
1792 | 		'Cyrillic Extended-A' => array(
1793 | 			0 => 0x2DE0,
1794 | 			1 => 0x2DFF,
1795 | 			2 => 95,
1796 | 		),
1797 | 		'Supplemental Punctuation' => array(
1798 | 			0 => 0x2E00,
1799 | 			1 => 0x2E7F,
1800 | 			2 => 96,
1801 | 		),
1802 | 		'CJK Radicals Supplement' => array(
1803 | 			0 => 0x2E80,
1804 | 			1 => 0x2EFF,
1805 | 			2 => 97,
1806 | 		),
1807 | 		'Kangxi Radicals' => array(
1808 | 			0 => 0x2F00,
1809 | 			1 => 0x2FDF,
1810 | 			2 => 98,
1811 | 		),
1812 | 		'Ideographic Description Characters' => array(
1813 | 			0 => 0x2FF0,
1814 | 			1 => 0x2FFF,
1815 | 			2 => 99,
1816 | 		),
1817 | 		'CJK Symbols and Punctuation' => array(
1818 | 			0 => 0x3000,
1819 | 			1 => 0x303F,
1820 | 			2 => 100,
1821 | 		),
1822 | 		'Hiragana' => array(
1823 | 			0 => 0x3040,
1824 | 			1 => 0x309F,
1825 | 			2 => 101,
1826 | 		),
1827 | 		'Katakana' => array(
1828 | 			0 => 0x30A0,
1829 | 			1 => 0x30FF,
1830 | 			2 => 102,
1831 | 		),
1832 | 		'Bopomofo' => array(
1833 | 			0 => 0x3100,
1834 | 			1 => 0x312F,
1835 | 			2 => 103,
1836 | 		),
1837 | 		'Hangul Compatibility Jamo' => array(
1838 | 			0 => 0x3130,
1839 | 			1 => 0x318F,
1840 | 			2 => 104,
1841 | 		),
1842 | 		'Kanbun' => array(
1843 | 			0 => 0x3190,
1844 | 			1 => 0x319F,
1845 | 			2 => 105,
1846 | 		),
1847 | 		'Bopomofo Extended' => array(
1848 | 			0 => 0x31A0,
1849 | 			1 => 0x31BF,
1850 | 			2 => 106,
1851 | 		),
1852 | 		'CJK Strokes' => array(
1853 | 			0 => 0x31C0,
1854 | 			1 => 0x31EF,
1855 | 			2 => 107,
1856 | 		),
1857 | 		'Katakana Phonetic Extensions' => array(
1858 | 			0 => 0x31F0,
1859 | 			1 => 0x31FF,
1860 | 			2 => 108,
1861 | 		),
1862 | 		'Enclosed CJK Letters and Months' => array(
1863 | 			0 => 0x3200,
1864 | 			1 => 0x32FF,
1865 | 			2 => 109,
1866 | 		),
1867 | 		'CJK Compatibility' => array(
1868 | 			0 => 0x3300,
1869 | 			1 => 0x33FF,
1870 | 			2 => 110,
1871 | 		),
1872 | 		'CJK Unified Ideographs Extension A' => array(
1873 | 			0 => 0x3400,
1874 | 			1 => 0x4DBF,
1875 | 			2 => 111,
1876 | 		),
1877 | 		'Yijing Hexagram Symbols' => array(
1878 | 			0 => 0x4DC0,
1879 | 			1 => 0x4DFF,
1880 | 			2 => 112,
1881 | 		),
1882 | 		'CJK Unified Ideographs' => array(
1883 | 			0 => 0x4E00,
1884 | 			1 => 0x9FFF,
1885 | 			2 => 113,
1886 | 		),
1887 | 		'Yi Syllables' => array(
1888 | 			0 => 0xA000,
1889 | 			1 => 0xA48F,
1890 | 			2 => 114,
1891 | 		),
1892 | 		'Yi Radicals' => array(
1893 | 			0 => 0xA490,
1894 | 			1 => 0xA4CF,
1895 | 			2 => 115,
1896 | 		),
1897 | 		'Lisu' => array(
1898 | 			0 => 0xA4D0,
1899 | 			1 => 0xA4FF,
1900 | 			2 => 116,
1901 | 		),
1902 | 		'Vai' => array(
1903 | 			0 => 0xA500,
1904 | 			1 => 0xA63F,
1905 | 			2 => 117,
1906 | 		),
1907 | 		'Cyrillic Extended-B' => array(
1908 | 			0 => 0xA640,
1909 | 			1 => 0xA69F,
1910 | 			2 => 118,
1911 | 		),
1912 | 		'Bamum' => array(
1913 | 			0 => 0xA6A0,
1914 | 			1 => 0xA6FF,
1915 | 			2 => 119,
1916 | 		),
1917 | 		'Modifier Tone Letters' => array(
1918 | 			0 => 0xA700,
1919 | 			1 => 0xA71F,
1920 | 			2 => 120,
1921 | 		),
1922 | 		'Latin Extended-D' => array(
1923 | 			0 => 0xA720,
1924 | 			1 => 0xA7FF,
1925 | 			2 => 121,
1926 | 		),
1927 | 		'Syloti Nagri' => array(
1928 | 			0 => 0xA800,
1929 | 			1 => 0xA82F,
1930 | 			2 => 122,
1931 | 		),
1932 | 		'Common Indic Number Forms' => array(
1933 | 			0 => 0xA830,
1934 | 			1 => 0xA83F,
1935 | 			2 => 123,
1936 | 		),
1937 | 		'Phags-pa' => array(
1938 | 			0 => 0xA840,
1939 | 			1 => 0xA87F,
1940 | 			2 => 124,
1941 | 		),
1942 | 		'Saurashtra' => array(
1943 | 			0 => 0xA880,
1944 | 			1 => 0xA8DF,
1945 | 			2 => 125,
1946 | 		),
1947 | 		'Devanagari Extended' => array(
1948 | 			0 => 0xA8E0,
1949 | 			1 => 0xA8FF,
1950 | 			2 => 126,
1951 | 		),
1952 | 		'Kayah Li' => array(
1953 | 			0 => 0xA900,
1954 | 			1 => 0xA92F,
1955 | 			2 => 127,
1956 | 		),
1957 | 		'Rejang' => array(
1958 | 			0 => 0xA930,
1959 | 			1 => 0xA95F,
1960 | 			2 => 128,
1961 | 		),
1962 | 		'Hangul Jamo Extended-A' => array(
1963 | 			0 => 0xA960,
1964 | 			1 => 0xA97F,
1965 | 			2 => 129,
1966 | 		),
1967 | 		'Javanese' => array(
1968 | 			0 => 0xA980,
1969 | 			1 => 0xA9DF,
1970 | 			2 => 130,
1971 | 		),
1972 | 		'Cham' => array(
1973 | 			0 => 0xAA00,
1974 | 			1 => 0xAA5F,
1975 | 			2 => 131,
1976 | 		),
1977 | 		'Myanmar Extended-A' => array(
1978 | 			0 => 0xAA60,
1979 | 			1 => 0xAA7F,
1980 | 			2 => 132,
1981 | 		),
1982 | 		'Tai Viet' => array(
1983 | 			0 => 0xAA80,
1984 | 			1 => 0xAADF,
1985 | 			2 => 133,
1986 | 		),
1987 | 		'Ethiopic Extended-A' => array(
1988 | 			0 => 0xAB00,
1989 | 			1 => 0xAB2F,
1990 | 			2 => 134,
1991 | 		),
1992 | 		'Meetei Mayek' => array(
1993 | 			0 => 0xABC0,
1994 | 			1 => 0xABFF,
1995 | 			2 => 135,
1996 | 		),
1997 | 		'Hangul Syllables' => array(
1998 | 			0 => 0xAC00,
1999 | 			1 => 0xD7AF,
2000 | 			2 => 136,
2001 | 		),
2002 | 		'Hangul Jamo Extended-B' => array(
2003 | 			0 => 0xD7B0,
2004 | 			1 => 0xD7FF,
2005 | 			2 => 137,
2006 | 		),
2007 | 		'High Surrogates' => array(
2008 | 			0 => 0xD800,
2009 | 			1 => 0xDB7F,
2010 | 			2 => 138,
2011 | 		),
2012 | 		'High Private Use Surrogates' => array(
2013 | 			0 => 0xDB80,
2014 | 			1 => 0xDBFF,
2015 | 			2 => 139,
2016 | 		),
2017 | 		'Low Surrogates' => array(
2018 | 			0 => 0xDC00,
2019 | 			1 => 0xDFFF,
2020 | 			2 => 140,
2021 | 		),
2022 | 		'Private Use Area' => array(
2023 | 			0 => 0xE000,
2024 | 			1 => 0xF8FF,
2025 | 			2 => 141,
2026 | 		),
2027 | 		'CJK Compatibility Ideographs' => array(
2028 | 			0 => 0xF900,
2029 | 			1 => 0xFAFF,
2030 | 			2 => 142,
2031 | 		),
2032 | 		'Alphabetic Presentation Forms' => array(
2033 | 			0 => 0xFB00,
2034 | 			1 => 0xFB4F,
2035 | 			2 => 143,
2036 | 		),
2037 | 		'Arabic Presentation Forms-A' => array(
2038 | 			0 => 0xFB50,
2039 | 			1 => 0xFDFF,
2040 | 			2 => 144,
2041 | 		),
2042 | 		'Variation Selectors' => array(
2043 | 			0 => 0xFE00,
2044 | 			1 => 0xFE0F,
2045 | 			2 => 145,
2046 | 		),
2047 | 		'Vertical Forms' => array(
2048 | 			0 => 0xFE10,
2049 | 			1 => 0xFE1F,
2050 | 			2 => 146,
2051 | 		),
2052 | 		'Combining Half Marks' => array(
2053 | 			0 => 0xFE20,
2054 | 			1 => 0xFE2F,
2055 | 			2 => 147,
2056 | 		),
2057 | 		'CJK Compatibility Forms' => array(
2058 | 			0 => 0xFE30,
2059 | 			1 => 0xFE4F,
2060 | 			2 => 148,
2061 | 		),
2062 | 		'Small Form Variants' => array(
2063 | 			0 => 0xFE50,
2064 | 			1 => 0xFE6F,
2065 | 			2 => 149,
2066 | 		),
2067 | 		'Arabic Presentation Forms-B' => array(
2068 | 			0 => 0xFE70,
2069 | 			1 => 0xFEFF,
2070 | 			2 => 150,
2071 | 		),
2072 | 		'Halfwidth and Fullwidth Forms' => array(
2073 | 			0 => 0xFF00,
2074 | 			1 => 0xFFEF,
2075 | 			2 => 151,
2076 | 		),
2077 | 		'Specials' => array(
2078 | 			0 => 0xFFF0,
2079 | 			1 => 0xFFFF,
2080 | 			2 => 152,
2081 | 		),
2082 | 		'Linear B Syllabary' => array(
2083 | 			0 => 0x10000,
2084 | 			1 => 0x1007F,
2085 | 			2 => 153,
2086 | 		),
2087 | 		'Linear B Ideograms' => array(
2088 | 			0 => 0x10080,
2089 | 			1 => 0x100FF,
2090 | 			2 => 154,
2091 | 		),
2092 | 		'Aegean Numbers' => array(
2093 | 			0 => 0x10100,
2094 | 			1 => 0x1013F,
2095 | 			2 => 155,
2096 | 		),
2097 | 		'Ancient Greek Numbers' => array(
2098 | 			0 => 0x10140,
2099 | 			1 => 0x1018F,
2100 | 			2 => 156,
2101 | 		),
2102 | 		'Ancient Symbols' => array(
2103 | 			0 => 0x10190,
2104 | 			1 => 0x101CF,
2105 | 			2 => 157,
2106 | 		),
2107 | 		'Phaistos Disc' => array(
2108 | 			0 => 0x101D0,
2109 | 			1 => 0x101FF,
2110 | 			2 => 158,
2111 | 		),
2112 | 		'Lycian' => array(
2113 | 			0 => 0x10280,
2114 | 			1 => 0x1029F,
2115 | 			2 => 159,
2116 | 		),
2117 | 		'Carian' => array(
2118 | 			0 => 0x102A0,
2119 | 			1 => 0x102DF,
2120 | 			2 => 160,
2121 | 		),
2122 | 		'Old Italic' => array(
2123 | 			0 => 0x10300,
2124 | 			1 => 0x1032F,
2125 | 			2 => 161,
2126 | 		),
2127 | 		'Gothic' => array(
2128 | 			0 => 0x10330,
2129 | 			1 => 0x1034F,
2130 | 			2 => 162,
2131 | 		),
2132 | 		'Ugaritic' => array(
2133 | 			0 => 0x10380,
2134 | 			1 => 0x1039F,
2135 | 			2 => 163,
2136 | 		),
2137 | 		'Old Persian' => array(
2138 | 			0 => 0x103A0,
2139 | 			1 => 0x103DF,
2140 | 			2 => 164,
2141 | 		),
2142 | 		'Deseret' => array(
2143 | 			0 => 0x10400,
2144 | 			1 => 0x1044F,
2145 | 			2 => 165,
2146 | 		),
2147 | 		'Shavian' => array(
2148 | 			0 => 0x10450,
2149 | 			1 => 0x1047F,
2150 | 			2 => 166,
2151 | 		),
2152 | 		'Osmanya' => array(
2153 | 			0 => 0x10480,
2154 | 			1 => 0x104AF,
2155 | 			2 => 167,
2156 | 		),
2157 | 		'Cypriot Syllabary' => array(
2158 | 			0 => 0x10800,
2159 | 			1 => 0x1083F,
2160 | 			2 => 168,
2161 | 		),
2162 | 		'Imperial Aramaic' => array(
2163 | 			0 => 0x10840,
2164 | 			1 => 0x1085F,
2165 | 			2 => 169,
2166 | 		),
2167 | 		'Phoenician' => array(
2168 | 			0 => 0x10900,
2169 | 			1 => 0x1091F,
2170 | 			2 => 170,
2171 | 		),
2172 | 		'Lydian' => array(
2173 | 			0 => 0x10920,
2174 | 			1 => 0x1093F,
2175 | 			2 => 171,
2176 | 		),
2177 | 		'Kharoshthi' => array(
2178 | 			0 => 0x10A00,
2179 | 			1 => 0x10A5F,
2180 | 			2 => 172,
2181 | 		),
2182 | 		'Old South Arabian' => array(
2183 | 			0 => 0x10A60,
2184 | 			1 => 0x10A7F,
2185 | 			2 => 173,
2186 | 		),
2187 | 		'Avestan' => array(
2188 | 			0 => 0x10B00,
2189 | 			1 => 0x10B3F,
2190 | 			2 => 174,
2191 | 		),
2192 | 		'Inscriptional Parthian' => array(
2193 | 			0 => 0x10B40,
2194 | 			1 => 0x10B5F,
2195 | 			2 => 175,
2196 | 		),
2197 | 		'Inscriptional Pahlavi' => array(
2198 | 			0 => 0x10B60,
2199 | 			1 => 0x10B7F,
2200 | 			2 => 176,
2201 | 		),
2202 | 		'Old Turkic' => array(
2203 | 			0 => 0x10C00,
2204 | 			1 => 0x10C4F,
2205 | 			2 => 177,
2206 | 		),
2207 | 		'Rumi Numeral Symbols' => array(
2208 | 			0 => 0x10E60,
2209 | 			1 => 0x10E7F,
2210 | 			2 => 178,
2211 | 		),
2212 | 		'Brahmi' => array(
2213 | 			0 => 0x11000,
2214 | 			1 => 0x1107F,
2215 | 			2 => 179,
2216 | 		),
2217 | 		'Kaithi' => array(
2218 | 			0 => 0x11080,
2219 | 			1 => 0x110CF,
2220 | 			2 => 180,
2221 | 		),
2222 | 		'Cuneiform' => array(
2223 | 			0 => 0x12000,
2224 | 			1 => 0x123FF,
2225 | 			2 => 181,
2226 | 		),
2227 | 		'Cuneiform Numbers and Punctuation' => array(
2228 | 			0 => 0x12400,
2229 | 			1 => 0x1247F,
2230 | 			2 => 182,
2231 | 		),
2232 | 		'Egyptian Hieroglyphs' => array(
2233 | 			0 => 0x13000,
2234 | 			1 => 0x1342F,
2235 | 			2 => 183,
2236 | 		),
2237 | 		'Bamum Supplement' => array(
2238 | 			0 => 0x16800,
2239 | 			1 => 0x16A3F,
2240 | 			2 => 184,
2241 | 		),
2242 | 		'Kana Supplement' => array(
2243 | 			0 => 0x1B000,
2244 | 			1 => 0x1B0FF,
2245 | 			2 => 185,
2246 | 		),
2247 | 		'Byzantine Musical Symbols' => array(
2248 | 			0 => 0x1D000,
2249 | 			1 => 0x1D0FF,
2250 | 			2 => 186,
2251 | 		),
2252 | 		'Musical Symbols' => array(
2253 | 			0 => 0x1D100,
2254 | 			1 => 0x1D1FF,
2255 | 			2 => 187,
2256 | 		),
2257 | 		'Ancient Greek Musical Notation' => array(
2258 | 			0 => 0x1D200,
2259 | 			1 => 0x1D24F,
2260 | 			2 => 188,
2261 | 		),
2262 | 		'Tai Xuan Jing Symbols' => array(
2263 | 			0 => 0x1D300,
2264 | 			1 => 0x1D35F,
2265 | 			2 => 189,
2266 | 		),
2267 | 		'Counting Rod Numerals' => array(
2268 | 			0 => 0x1D360,
2269 | 			1 => 0x1D37F,
2270 | 			2 => 190,
2271 | 		),
2272 | 		'Mathematical Alphanumeric Symbols' => array(
2273 | 			0 => 0x1D400,
2274 | 			1 => 0x1D7FF,
2275 | 			2 => 191,
2276 | 		),
2277 | 		'Mahjong Tiles' => array(
2278 | 			0 => 0x1F000,
2279 | 			1 => 0x1F02F,
2280 | 			2 => 192,
2281 | 		),
2282 | 		'Domino Tiles' => array(
2283 | 			0 => 0x1F030,
2284 | 			1 => 0x1F09F,
2285 | 			2 => 193,
2286 | 		),
2287 | 		'Playing Cards' => array(
2288 | 			0 => 0x1F0A0,
2289 | 			1 => 0x1F0FF,
2290 | 			2 => 194,
2291 | 		),
2292 | 		'Enclosed Alphanumeric Supplement' => array(
2293 | 			0 => 0x1F100,
2294 | 			1 => 0x1F1FF,
2295 | 			2 => 195,
2296 | 		),
2297 | 		'Enclosed Ideographic Supplement' => array(
2298 | 			0 => 0x1F200,
2299 | 			1 => 0x1F2FF,
2300 | 			2 => 196,
2301 | 		),
2302 | 		'Miscellaneous Symbols And Pictographs' => array(
2303 | 			0 => 0x1F300,
2304 | 			1 => 0x1F5FF,
2305 | 			2 => 197,
2306 | 		),
2307 | 		'Emoticons' => array(
2308 | 			0 => 0x1F600,
2309 | 			1 => 0x1F64F,
2310 | 			2 => 198,
2311 | 		),
2312 | 		'Transport And Map Symbols' => array(
2313 | 			0 => 0x1F680,
2314 | 			1 => 0x1F6FF,
2315 | 			2 => 199,
2316 | 		),
2317 | 		'Alchemical Symbols' => array(
2318 | 			0 => 0x1F700,
2319 | 			1 => 0x1F77F,
2320 | 			2 => 200,
2321 | 		),
2322 | 		'CJK Unified Ideographs Extension B' => array(
2323 | 			0 => 0x20000,
2324 | 			1 => 0x2A6DF,
2325 | 			2 => 201,
2326 | 		),
2327 | 		'CJK Unified Ideographs Extension C' => array(
2328 | 			0 => 0x2A700,
2329 | 			1 => 0x2B73F,
2330 | 			2 => 202,
2331 | 		),
2332 | 		'CJK Unified Ideographs Extension D' => array(
2333 | 			0 => 0x2B740,
2334 | 			1 => 0x2B81F,
2335 | 			2 => 203,
2336 | 		),
2337 | 		'CJK Compatibility Ideographs Supplement' => array(
2338 | 			0 => 0x2F800,
2339 | 			1 => 0x2FA1F,
2340 | 			2 => 204,
2341 | 		),
2342 | 		'Tags' => array(
2343 | 			0 => 0xE0000,
2344 | 			1 => 0xE007F,
2345 | 			2 => 205,
2346 | 		),
2347 | 		'Variation Selectors Supplement' => array(
2348 | 			0 => 0xE0100,
2349 | 			1 => 0xE01EF,
2350 | 			2 => 206,
2351 | 		),
2352 | 		'Supplementary Private Use Area-A' => array(
2353 | 			0 => 0xF0000,
2354 | 			1 => 0xFFFFF,
2355 | 			2 => 207,
2356 | 		),
2357 | 		'Supplementary Private Use Area-B' => array(
2358 | 			0 => 0x100000,
2359 | 			1 => 0x10FFFF,
2360 | 			2 => 208,
2361 | 		),
2362 | 	);
2363 | 
2364 | 	#the methods of this class may only be called statically!
2365 | 	private function __construct() {}
2366 | 
2367 | 	/**
2368 | 	 * Removes combining diacritical marks from a string.
2369 | 	 * Optionally fills $restore_table so that self::diactrical_restore() can put them back.
2370 | 	 *
2371 | 	 * @param   string|null       $s
2372 | 	 * @param   array|null        $additional_chars   extra characters to remove, for example: array("\xc2\xad")  #soft hyphen = discretionary hyphen
2373 | 	 * @param   bool              $is_can_restored    fill $restore_table for a later restore?
2374 | 	 * @param   array|null        &$restore_table
2375 | 	 * @return  string|bool|null  Returns FALSE if error occurred
2376 | 	 */
2377 | 	public static function diactrical_remove($s, $additional_chars = null, $is_can_restored = false, &$restore_table = null)
2378 | 	{
2379 | 		if (! ReflectionTypeHint::isValid()) return false;
2380 | 		if (! is_string($s) || $s === '') return $s;
2381 | 
2382 | 		if ($additional_chars)
2383 | 		{
2384 | 			foreach ($additional_chars as $k => &$v) $v = preg_quote($v, '/');
2385 | 			$re = '/((?>' . self::DIACTRICAL_RE . '|' . implode('|', $additional_chars) . ')+)/sxSX';
2386 | 		}
2387 | 		else $re = '/((?>' . self::DIACTRICAL_RE . ')+)/sxSX';
2388 | 		if (! $is_can_restored) return preg_replace($re, '', $s);
2389 | 
2390 | 		$restore_table = array();
2391 | 		$a = preg_split($re, $s, -1, PREG_SPLIT_DELIM_CAPTURE);
2392 | 		$c = count($a);
2393 | 		if ($c === 1) return $s;
2394 | 		$pos = 0;
2395 | 		$s2 = '';
2396 | 		for ($i = 0; $i < $c - 1; $i += 2)
2397 | 		{
2398 | 			$s2 .= $a[$i];
2399 | 			#remember character (not byte!) positions
2400 | 			$pos += self::strlen($a[$i]);
2401 | 			$restore_table['offsets'][$pos] = $a[$i + 1];
2402 | 		}
2403 | 		$restore_table['length'] = $pos + self::strlen(end($a));
2404 | 		return $s2 . end($a);
2405 | 	}
2406 | 
2407 | 	/**
2408 | 	 * Restores combining diacritical marks removed by self::diactrical_remove().
2409 | 	 * This works only if the character positions and the total number of characters
2410 | 	 * in the string have not changed since the marks were removed!
2411 | 	 *
2412 | 	 * @see     self::diactrical_remove()
2413 | 	 * @param   string|null       $s
2414 | 	 * @param   array             $restore_table
2415 | 	 * @return  string|bool|null  Returns FALSE if error occurred (broken $restore_table)
2416 | 	 */
2417 | 	public static function diactrical_restore($s, array $restore_table)
2418 | 	{
2419 | 		if (! ReflectionTypeHint::isValid()) return false;
2420 | 		if (! is_string($s) || $s === '') return $s;
2421 | 
2422 | 		if (! $restore_table) return $s;
2423 | 		if (! is_int(@$restore_table['length']) ||
2424 | 			! is_array(@$restore_table['offsets']) ||
2425 | 			$restore_table['length'] !== self::strlen($s)) return false;
2426 | 		$a = array();
2427 | 		$length = $offset = 0;
2428 | 		$s2 = '';
2429 | 		foreach ($restore_table['offsets'] as $pos => $diactricals)
2430 | 		{
2431 | 			$length = $pos - $offset;
2432 | 			$s2 .= self::substr($s, $offset, $length) . $diactricals;
2433 | 			$offset = $pos;
2434 | 		}
2435 | 		return $s2 . self::substr($s, $offset, strlen($s));
2436 | 	}
2437 | 
2438 | 	/**
2439 | 	 * Encodes data from another character encoding to UTF-8.
2440 | 	 *
2441 | 	 * @param   array|scalar|null  $data
2442 | 	 * @param   string             $charset
2443 | 	 * @return  array|scalar|null  Returns FALSE if error occurred
2444 | 	 */
2445 | 	public static function convert_from($data, $charset = 'cp1251')
2446 | 	{
2447 | 		if (! ReflectionTypeHint::isValid()) return false;
2448 | 		$charset = strtoupper($charset);
2449 | 		return self::_convert($data, $charset, 'UTF-8');
2450 | 	}
2451 | 
2452 | 	/**
2453 | 	 * Encodes data from UTF-8 to another character encoding.
2454 | 	 *
2455 | 	 * @param   array|scalar|null  $data
2456 | 	 * @param   string             $charset
2457 | 	 * @return  array|scalar|null  Returns FALSE if error occurred
2458 | 	 */
2459 | 	public static function convert_to($data, $charset = 'cp1251')
2460 | 	{
2461 | 		if (! ReflectionTypeHint::isValid()) return false;
2462 | 		$charset = strtoupper($charset);
2463 | 		return self::_convert($data, 'UTF-8', $charset);
2464 | 	}
2465 | 
2466 | 	/**
2467 | 	 * Recodes data of any structure to/from UTF-8.
2468 | 	 * Arrays are traversed recursively; both keys and values are recoded.
2469 | 	 *
2470 | 	 * @see mb_encoding_aliases()
2471 | 	 * @param   array|scalar|null  $data
2472 | 	 * @param   string             $charset_from
2473 | 	 * @param   string             $charset_to
2474 | 	 * @return  array|scalar|null  Returns FALSE if error occurred
2475 | 	 */
2476 | 	private static function _convert($data, $charset_from, $charset_to)
2477 | 	{
2478 | 		if (! ReflectionTypeHint::isValid()) return false;  #for recursive calls
2479 | 		if ($charset_from === $charset_to) return $data; #speed improvement
2480 | 		if (is_array($data))
2481 | 		{
2482 | 			$d = array();
2483 | 			foreach ($data as $k => &$v)
2484 | 			{
2485 | 				if (is_string($k))
2486 | 				{
2487 | 					$k = self::_convert($k, $charset_from, $charset_to);
2488 | 					if (! is_string($k)) return false;
2489 | 				}
2490 | 				$d[$k] = self::_convert($v, $charset_from, $charset_to);
2491 | 				if ($d[$k] === false && ! is_bool($v)) return false;
2492 | 			}
2493 | 			return $d;
2494 | 		}
2495 | 		if (is_string($data))
2496 | 		{
2497 | 			#smart behaviour: skip data that is not in the source encoding or is already in the target one (error protection + speed improvement)
2498 | 			if ($charset_from === 'UTF-8' && ! self::is_utf8($data)) return $data;
2499 | 			if ($charset_to === 'UTF-8' && self::is_utf8($data)) return $data;
2500 | 
2501 | 			#since PHP-5.3.x iconv() is faster than mb_convert_encoding()
2502 | 			if (function_exists('iconv')) return iconv($charset_from, $charset_to . '//IGNORE//TRANSLIT', $data);
2503 | 			if (function_exists('mb_convert_encoding')) return mb_convert_encoding($data, $charset_to, $charset_from);
2504 | 
2505 | 			#charset_from
2506 | 			if ($charset_from === 'ISO-8859-1') return utf8_encode($data);
2507 | 			if ($charset_from === 'UTF-16' || $charset_from === 'UCS-2') return self::_convert_from_utf16($data);
2508 | 			if ($charset_from === 'CP1251' || $charset_from === 'CP1259') return strtr($data, self::$cp1259_table);
2509 | 			if ($charset_from === 'KOI8-R') return strtr(convert_cyr_string($data, 'k', 'w'), self::$cp1259_table);
2510 | 			if ($charset_from === 'ISO-8859-5') return strtr(convert_cyr_string($data, 'i', 'w'), self::$cp1259_table);
2511 | 			if ($charset_from === 'CP866') return strtr(convert_cyr_string($data, 'a', 'w'), self::$cp1259_table);
2512 | 			if ($charset_from === 'MAC-CYRILLIC') return strtr(convert_cyr_string($data, 'm', 'w'), self::$cp1259_table);
2513 | 
2514 | 			#charset_to
2515 | 			if ($charset_to === 'ISO-8859-1') return utf8_decode($data);
2516 | 			if ($charset_to === 'CP1251' || $charset_to === 'CP1259') return strtr($data, array_flip(self::$cp1259_table));
2517 | 
2518 | 			#last attempt
2519 | 			if (function_exists('recode_string'))
2520 | 			{
2521 | 				$s = @recode_string($charset_from . '..' . $charset_to, $data);
2522 | 				if (is_string($s)) return $s;
2523 | 			}
2524 | 
2525 | 			trigger_error('Conversion "' . $charset_from . '" --> "' . $charset_to . '" is not supported natively, "iconv" or "mbstring" extension required', E_USER_WARNING);
2526 | 			return false;
2527 | 		}
2528 | 		if (is_scalar($data) || is_null($data)) return $data;  #~ null, integer, float, boolean
2529 | 		return false; #object or resource
2530 | 	}
2531 | 
2532 | 	/**
2533 | 	 * Converts a UTF-16 / UCS-2 encoded string to UTF-8.
2534 | 	 * UTF-16 surrogate pairs are supported!
2535 | 	 *
2536 | 	 * Uses iconv() or mb_convert_encoding() when available;
2537 | 	 * otherwise unpacks the 16-bit code units and decodes them manually,
2538 | 	 * replacing a high surrogate that is not followed by a low surrogate with U+FFFD.
2539 | 	 *
2540 | 	 * @param    string        $s
2541 | 	 * @param    string        $type      'BE' -- big endian byte order
2542 | 	 *                                    'LE' -- little endian byte order
2543 | 	 * @param    bool          $to_array  return an array of characters instead of a whole string?
2544 | 	 * @return   string|array|bool        UTF-8 string, array of characters, or FALSE if an error occurred
2545 | 	 */
2546 | 	private static function _convert_from_utf16($s, $type = 'BE', $to_array = false)
2547 | 	{
2548 | 		static $types = array(
2549 | 			'BE' => 'n',  #unsigned short (always 16 bit, big endian byte order)
2550 | 			'LE' => 'v',  #unsigned short (always 16 bit, little endian byte order)
2551 | 		);
2552 | 		if (! array_key_exists($type, $types))
2553 | 		{
2554 | 			trigger_error('Unexpected value in 2-nd parameter, "' . $type . '" given!', E_USER_WARNING);
2555 | 			return false;
2556 | 		}
2557 | 		#the fastest way:
2558 | 		if (function_exists('iconv') || function_exists('mb_convert_encoding'))
2559 | 		{
2560 | 			if (function_exists('iconv'))                   $s = iconv('UTF-16' . $type, 'UTF-8', $s);
2561 | 			elseif (function_exists('mb_convert_encoding')) $s = mb_convert_encoding($s, 'UTF-8', 'UTF-16' . $type);
2562 | 			if (! $to_array) return $s;
2563 | 			return self::str_split($s);
2564 | 		}
2565 | 
2566 | 		/*
2567 | 		http://en.wikipedia.org/wiki/UTF-16
2568 | 
2569 | 		The improvement that UTF-16 made over UCS-2 is its ability to encode
2570 | 		characters in planes 1-16, not just those in plane 0 (BMP).
2571 | 
2572 | 		UTF-16 represents non-BMP characters (those from U+10000 through U+10FFFF)
2573 | 		using a pair of 16-bit words, known as a surrogate pair.
2574 | 		First 0x10000 is subtracted from the code point to give a 20-bit value.
2575 | 		This is then split into two separate 10-bit values each of which is represented
2576 | 		as a surrogate with the most significant half placed in the first surrogate.
2577 | 		To allow safe use of simple word-oriented string processing, separate ranges
2578 | 		of values are used for the two surrogates: 0xD800-0xDBFF for the first, most
2579 | 		significant surrogate and 0xDC00-0xDFFF for the second, least significant surrogate.
2580 | 
2581 | 		For example, the character at code point U+10000 becomes the code unit sequence 0xD800 0xDC00,
2582 | 		and the character at U+10FFFD, the upper limit of Unicode, becomes the sequence 0xDBFF 0xDFFD.
2583 | 		Unicode and ISO/IEC 10646 do not, and will never, assign characters to any of the code points
2584 | 		in the U+D800-U+DFFF range, so an individual code value from a surrogate pair does not ever
2585 | 		represent a character.
2586 | 
2587 | 		http://www.russellcottrell.com/greek/utilities/SurrogatePairCalculator.htm
2588 | 		http://www.russellcottrell.com/greek/utilities/UnicodeRanges.htm
2589 | 
2590 | 		Conversion of a Unicode scalar value S to a surrogate pair <H, L>:
2591 | 		  H = Math.floor((S - 0x10000) / 0x400) + 0xD800;
2592 | 		  L = ((S - 0x10000) % 0x400) + 0xDC00;
2593 | 		The conversion of a surrogate pair <H, L> to a scalar value:
2594 | 		  N = ((H - 0xD800) * 0x400) + (L - 0xDC00) + 0x10000;
2595 | 		*/
2596 | 		$a = array();
2597 | 		$hi = false;
2598 | 		foreach (unpack($types[$type] . '*', $s) as $codepoint)
2599 | 		{
2600 | 			#surrogate process
2601 | 			if ($hi !== false)
2602 | 			{
2603 | 				$lo = $codepoint;
2604 | 				if ($lo < 0xDC00 || $lo > 0xDFFF) $a[] = "\xEF\xBF\xBD"; #U+FFFD REPLACEMENT CHARACTER (for broken char)
2605 | 				else
2606 | 				{
2607 | 					$codepoint = (($hi - 0xD800) * 0x400) + ($lo - 0xDC00) + 0x10000;
2608 | 					$a[] = self::chr($codepoint);
2609 | 				}
2610 | 				$hi = false;
2611 | 			}
2612 | 			elseif ($codepoint < 0xD800 || $codepoint > 0xDBFF) $a[] = self::chr($codepoint); #not surrogate
2613 | 			else $hi = $codepoint; #surrogate was found
2614 | 		}
2615 | 		return $to_array ? $a : implode('', $a);
2616 | 	}
2617 | 
2618 | 	/**
2619 | 	 * Strips out device control codes in the ASCII range.
2620 | 	 *
2621 | 	 * @param   array|scalar|null  $data  Data to clean
2622 | 	 * @return  array|scalar|null  Returns FALSE if error occurred
2623 | 	 */
2624 | 	public static function strict($data)
2625 | 	{
2626 | 		if (! ReflectionTypeHint::isValid()) return false;
2627 | 		if (is_array($data))
2628 | 		{
2629 | 			$d = array();
2630 | 			foreach ($data as $k => &$v)
2631 | 			{
2632 | 				if (is_string($k))
2633 | 				{
2634 | 					$k = self::strict($k);
2635 | 					if (! is_string($k)) return false;
2636 | 				}
2637 | 				$d[$k] = self::strict($v);
2638 | 				if ($d[$k] === false && ! is_bool($v)) return false;
2639 | 			}
2640 | 			return $d;
2641 | 		}
2642 | 		if (is_string($data)) return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]+/sSX', '', $data);
2643 | 		if (is_scalar($data) || is_null($data)) return $data;  #int/float/bool/null
2644 | 		return false; #object or resource
2645 | 	}
2646 | 
2647 | 	/**
2648 | 	 * Checks whether the data contains ASCII control (binary) characters other than tab, line feed and carriage return.
2649 | 	 * For non-strings always returns FALSE.
2650 | 	 *
2651 | 	 * @param   scalar|null  $data
2652 | 	 * @param   int|null     $found_char_offset  Receives the character offset of the first binary character found
2653 | 	 * @return  bool
2654 | 	 */
2655 | 	public static function has_binary($data, &$found_char_offset = null)
2656 | 	{
2657 | 		if (! ReflectionTypeHint::isValid()) return false;
2658 | 		#[\t\n\r] = [\x09\x0a\x0d]
2659 | 		#[\x00-\x1f\x7f](?<![\t\n\r]) = [\x00-\x08\x0b\x0c\x0e-\x1f\x7f] = [^\x09\x0a\x0d\x20-\x7e\x80-\xff]
2660 | 		if (! is_string($data) ||
2661 | 			#search a binary char
2662 | 			! preg_match('~[\x00-\x1f\x7f](?<![\t\n\r])~sSX', $data, $m, PREG_OFFSET_CAPTURE)) return false;
2663 | 		$found_char_offset = self::strlen(substr($data, 0, $m[0][1]));
2664 | 		return true;
2665 | 	}
2666 | 
2667 | 	/**
2668 | 	 * Checks whether the data consists only of ASCII characters.
2669 | 	 * For anything other than string/int/float always returns FALSE.
2670 | 	 *
2671 | 	 * @param   scalar|null  $data
2672 | 	 * @param   int|null     $error_char_offset  Receives the offset of the first non-ASCII character found
2673 | 	 * @return  bool
2674 | 	 */
2675 | 	public static function is_ascii($data, &$error_char_offset = null)
2676 | 	{
2677 | 		if (! ReflectionTypeHint::isValid()) return false;
2678 | 		if (is_string($data))
2679 | 		{
2680 | 			if (! preg_match('~[\x80-\xff]~sSX', $data, $m, PREG_OFFSET_CAPTURE)) return true;
2681 | 			$error_char_offset = $m[0][1];
2682 | 			return false;
2683 | 		}
2684 | 		if (is_int($data) || is_float($data)) return true;
2685 | 		return false;
2686 | 	}
2687 | 
2688 | 	/**
2689 | 	 * Returns true if data is valid UTF-8 and false otherwise.
2690 | 	 * For null, integer, float, boolean returns TRUE.
2691 | 	 *
2692 | 	 * Arrays are traversed recursively (keys and values); if at least one key or value
2693 | 	 * is not valid UTF-8, returns FALSE.
2694 | 	 *
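     | 	 * @example
     | 	 *   #A usage sketch (assumes ReflectionTypeHint validation passes):
     | 	 *   UTF8::is_utf8("\xD0\x9F");           #TRUE, a valid 2-byte sequence (П)
     | 	 *   UTF8::is_utf8("\xD0");               #FALSE, truncated sequence
     | 	 *   UTF8::is_utf8("a\x00b");             #FALSE, NUL is rejected in strict mode
     | 	 *   UTF8::is_utf8("a\x00b", false);      #TRUE, well-formed UTF-8 without the strict ASCII check
     | 	 *   UTF8::is_utf8(array('k' => "\xFF")); #FALSE, arrays are checked recursively
     | 	 *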
2695 | 	 * @link    http://www.w3.org/International/questions/qa-forms-utf-8.html
2696 | 	 * @link    http://ru3.php.net/mb_detect_encoding
2697 | 	 * @link    http://webtest.philigon.ru/articles/utf8/
2698 | 	 * @link    http://unicode.coeurlumiere.com/
2699 | 	 * @param   array|scalar|null  $data
2700 | 	 * @param   bool               $is_strict  also reject ASCII control characters other than \b, \t, \n, \f, \r?
2701 | 	 * @return  bool
2702 | 	 */
2703 | 	public static function is_utf8($data, $is_strict = true)
2704 | 	{
2705 | 		if (! ReflectionTypeHint::isValid()) return false;
2706 | 		if (is_string($data))
2707 | 		{
2708 | 			if (preg_match('~~suSX', $data) !== 1) return false;
2709 | 			//if (function_exists('preg_last_error') && preg_last_error() !== PREG_NO_ERROR) return false;
2710 | 			//preg_match('~~suSX') is much faster (up to 4 times) than mb_check_encoding($data, 'UTF-8')!
2711 | 			//if (function_exists('mb_check_encoding') && ! mb_check_encoding($data, 'UTF-8')) return false; #DEPRECATED
2712 | 			/**
2713 | 			 * Special characters per the JSON specification (http://json.org/)
2714 | 			 *   \b represents the backspace character (U+0008)
2715 | 			 *   \t represents the character tabulation character (U+0009)
2716 | 			 *   \n represents the line feed character (U+000A)
2717 | 			 *   \f represents the form feed character (U+000C)
2718 | 			 *   \r represents the carriage return character (U+000D)
2719 | 			 */
2720 | 			//with this regular expression preg_match() is twice as fast as strpbrk()
2721 | 			if ($is_strict && preg_match('/[^\x08\x09\x0A\x0C\x0D\x20-\xBF\xC2-\xF7]/sSX', $data)) {
2722 | 				return false;
2723 | 			}
2724 | 			return true;
2725 | 		}
2726 | 		if (is_scalar($data) || is_null($data)) return true;  #int/float/bool/null
2727 | 		if (is_array($data))
2728 | 		{
2729 | 			foreach ($data as $k => &$v)
2730 | 			{
2731 | 				if (! self::is_utf8($k, $is_strict) || ! self::is_utf8($v, $is_strict)) return false;
2732 | 			}
2733 | 			return true;
2734 | 		}
2735 | 		return false; #object or resource
2736 | 	}
2737 | 
2738 | 	/**
2739 | 	 * Tries to detect whether a string is UTF-8 encoded
2740 | 	 *
2741 | 	 * @deprecated  Slow, use self::is_utf8() instead
2742 | 	 * @see     self::is_utf8()
2743 | 	 * @param   string   $s          text
2744 | 	 * @param   bool     $is_strict  strictly check the ASCII range?
2745 | 	 * @return  bool
2746 | 	 */
2747 | 	public static function check($s, $is_strict = true)
2748 | 	{
2749 | 		if (! ReflectionTypeHint::isValid()) return false;
2750 | 		for ($i = 0, $len = strlen($s); $i < $len; $i++)
2751 | 		{
2752 | 			$c = ord($s[$i]);
2753 | 			if ($c < 0x80) #1 byte  0bbbbbbb
2754 | 			{
2755 | 				if ($is_strict === false || ($c > 0x1F && $c < 0x7F) || $c == 0x09 || $c == 0x0A || $c == 0x0D) continue;
2756 | 			}
2757 | 			if (($c & 0xE0) == 0xC0) $n = 1; #2 bytes 110bbbbb 10bbbbbb
2758 | 			elseif (($c & 0xF0) == 0xE0) $n = 2; #3 bytes 1110bbbb 10bbbbbb 10bbbbbb
2759 | 			elseif (($c & 0xF8) == 0xF0) $n = 3; #4 bytes 11110bbb 10bbbbbb 10bbbbbb 10bbbbbb
2760 | 			elseif (($c & 0xFC) == 0xF8) $n = 4; #5 bytes 111110bb 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb
2761 | 			elseif (($c & 0xFE) == 0xFC) $n = 5; #6 bytes 1111110b 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb 10bbbbbb
2762 | 			else return false; #does not match any model
2763 | 			#n bytes matching 10bbbbbb follow ?
2764 | 			for ($j = 0; $j < $n; $j++)
2765 | 			{
2766 | 				$i++;
2767 | 				if ($i == $len || ((ord($s[$i]) & 0xC0) != 0x80) ) return false;
2768 | 			}
2769 | 		}
2770 | 		return true;
2771 | 	}
2772 | 
2773 | 	/**
2774 | 	 * Checks that every character of UTF-8 data falls within the given ranges of the Unicode standard.
2775 | 	 * A suitable alternative to regular expressions.
2776 | 	 *
2777 | 	 * Integers and floats are checked as strings; for null and boolean returns FALSE.
2778 | 	 *
2779 | 	 * Arrays are traversed recursively (keys and values).
2780 | 	 * If at least one array key or value fails the check, returns FALSE.
2781 | 	 *
2782 | 	 * @example
2783 | 	 *   #A simple check against standard named blocks:
2784 | 	 *   UTF8::blocks_check('поисковые системы Google и Yandex', array('Basic Latin', 'Cyrillic'));
2785 | 	 *   #You can mix named blocks, explicit ranges and single codepoints:
2786 | 	 *   UTF8::blocks_check('поисковые системы Google и Yandex', array(array(0x20, 0x7E),     #[\x20-\x7E]
2787 | 	 *                                                                 array(0x0410, 0x044F), #[А-Яа-я]
2788 | 	 *                                                                 0x0401, #Russian capital letter Yo (Ё)
2789 | 	 *                                                                 0x0451, #Russian small letter yo (ё)
2790 | 	 *                                                                 'Arrows',
2791 | 	 *                                                                ));
2792 | 	 *
2793 | 	 * @link    http://www.unicode.org/charts/
2794 | 	 * @param   array|scalar|null  $data
2795 | 	 * @param   array|string       $blocks
2796 | 	 * @return  bool               Returns TRUE if all characters of the text belong to the given ranges,
2797 | 	 *                             and FALSE otherwise or for broken UTF-8.
2798 | 	 */
2799 | 	public static function blocks_check($data, $blocks)
2800 | 	{
2801 | 		if (! ReflectionTypeHint::isValid()) return false;
2802 | 
2803 | 		if (is_array($data))
2804 | 		{
2805 | 			foreach ($data as $k => &$v)
2806 | 			{
2807 | 				if (! self::blocks_check($k, $blocks) || ! self::blocks_check($v, $blocks)) return false;
2808 | 			}
2809 | 			return true;
2810 | 		}
2811 | 
2812 | 		if (is_int($data)) $data = strval($data);
2813 | 		elseif (is_float($data)) $data = str_replace(',', '.', strval($data));
2814 | 		elseif (! is_string($data)) return false;
2815 | 
2816 | 		$chars = self::str_split($data);
2817 | 		if ($chars === false) return false; #broken UTF-8
2818 | 		unset($data); #memory free
2819 | 		$skip = array(); #save to cache already checked symbols
2820 | 		foreach ($chars as $i => $char)
2821 | 		{
2822 | 			if (array_key_exists($char, $skip)) continue; #speed improve
2823 | 			$codepoint = self::ord($char);
2824 | 			if (! is_int($codepoint)) return false; #broken UTF-8?
2825 | 			$is_valid = false;
2826 | 			$blocks = (array)$blocks;
2827 | 			foreach ($blocks as $j => $block)
2828 | 			{
2829 | 				if (is_string($block))
2830 | 				{
2831 | 					if (! array_key_exists($block, self::$unicode_blocks))
2832 | 					{
2833 | 						trigger_error('Unknown block "' . $block . '"!', E_USER_WARNING);
2834 | 						return false;
2835 | 					}
2836 | 					list ($min, $max) = self::$unicode_blocks[$block];
2837 | 				}
2838 | 				elseif (is_array($block)) list ($min, $max) = $block;
2839 | 				elseif (is_int($block)) $min = $max = $block;
2840 | 				else trigger_error('A string/array/int type expected for block[' . $j . ']!', E_USER_ERROR);
2841 | 				if ($codepoint >= $min && $codepoint <= $max)
2842 | 				{
2843 | 					$is_valid = true;
2844 | 					break;
2845 | 				}
2846 | 			}
2847 | 			if (! $is_valid) return false;
2848 | 			$skip[$char] = null;
2849 | 		}
2850 | 		return true;
2851 | 	}
2852 | 
2853 | 	/**
2854 | 	 * Compares two UTF-8 strings, using the intl Collator for the given locale when it is available
2855 | 	 *
2856 | 	 * @param   string|null    $s1
2857 | 	 * @param   string|null    $s2
2858 | 	 * @param   string         $locale   For example, 'en_CA', 'ru_RU'
2859 | 	 * @return  int|bool|null  Returns FALSE if error occurred
2860 | 	 *                         Returns < 0 if $s1 is less than $s2;
2861 | 	 *                                 > 0 if $s1 is greater than $s2;
2862 | 	 *                                 0 if they are equal.
2863 | 	 */
2864 | 	public static function strcmp($s1, $s2, $locale = '')
2865 | 	{
2866 | 		if (! ReflectionTypeHint::isValid()) return false;
2867 | 		if (! is_string($s1) || ! is_string($s2)) return null;
2868 | 		if (! function_exists('collator_create')) return strcmp($s1, $s2);
2869 | 		# PHP 5 >= 5.3.0, PECL intl >= 1.0.0
2870 | 		# If empty string ("") or "root" are passed, UCA rules will be used.
2871 | 		$c = collator_create($locale);
2872 | 		if (! $c)
2873 | 		{
2874 | 			# collator_create() returns NULL on error. You can use intl_get_error_code() and/or intl_get_error_message() to know what happened.
2875 | 			trigger_error(intl_get_error_message(), E_USER_WARNING);
2876 | 			return false;
2877 | 		}
2878 | 		return $c->compare($s1, $s2);
2879 | 	}
2880 | 
2881 | 	/**
2882 | 	 * Compares the first N characters of two UTF-8 strings
2883 | 	 *
2884 | 	 * @param   string|null    $s1
2885 | 	 * @param   string|null    $s2
2886 | 	 * @param   int            $length
2887 | 	 * @return  int|bool|null  Returns FALSE if error occurred
2888 | 	 *                         Returns < 0 if $s1 is less than $s2;
2889 | 	 *                                 > 0 if $s1 is greater than $s2;
2890 | 	 *                                 0 if they are equal.
2891 | 	 */
2892 | 	public static function strncmp($s1, $s2, $length)
2893 | 	{
2894 | 		if (! ReflectionTypeHint::isValid()) return false;
2895 | 		if (! is_string($s1) || ! is_string($s2)) return null;
2896 | 		return self::strcmp(self::substr($s1, 0, $length), self::substr($s2, 0, $length));
2897 | 	}
2898 | 
2899 | 	/**
2900 | 	 * Implementation of the strcasecmp() function for UTF-8 encoded strings.
2901 | 	 *
2902 | 	 * @param   string|null    $s1
2903 | 	 * @param   string|null    $s2
2904 | 	 * @return  int|bool|null  Returns FALSE if error occurred
2905 | 	 *                         Returns < 0 if $s1 is less than $s2;
2906 | 	 *                                 > 0 if $s1 is greater than $s2;
2907 | 	 *                                 0 if they are equal.
2908 | 	 */
2909 | 	public static function strcasecmp($s1, $s2)
2910 | 	{
2911 | 		if (! ReflectionTypeHint::isValid()) return false;
2912 | 		if (! is_string($s1) || ! is_string($s2)) return null;
2913 | 		return self::strcmp(self::lowercase($s1), self::lowercase($s2));
2914 | 	}
2915 | 
2916 | 	/**
2917 | 	 * Converts a UTF-8 string to an array of Unicode codepoints
2918 | 	 *
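     | 	 * @example
     | 	 *   #A usage sketch (assumes ReflectionTypeHint validation passes and the iconv extension is available):
     | 	 *   UTF8::to_unicode('Aя');                  #array(0x41, 0x044F)
     | 	 *   UTF8::from_unicode(array(0x41, 0x044F)); #'Aя'
     | 	 *   UTF8::to_unicode("\xD0");                #FALSE, broken UTF-8
     | 	 *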
2919 | 	 * @param   string|null     $s  UTF-8 string
2920 | 	 * @return  array|bool|null     Unicode codepoints
2921 | 	 *                              Returns FALSE if $s is broken (not UTF-8)
2922 | 	 */
2923 | 	public static function to_unicode($s)
2924 | 	{
2925 | 		if (! ReflectionTypeHint::isValid()) return false;
2926 | 		if (! is_string($s) || $s === '') return $s;
2927 | 
2928 | 		$s2 = null;
2929 | 		#since PHP-5.3.x iconv() is a little faster than mb_convert_encoding()
2930 | 		if (function_exists('iconv')) $s2 = @iconv('UTF-8', 'UCS-4BE', $s);
2931 | 		elseif (function_exists('mb_convert_encoding')) $s2 = @mb_convert_encoding($s, 'UCS-4BE', 'UTF-8');
2932 | 		if (is_string($s2)) return array_values(unpack('N*', $s2));
2933 | 		if ($s2 !== null) return false;
2934 | 
2935 | 		$a = self::str_split($s);
2936 | 		if (! is_array($a)) return false;
2937 | 		return array_map(array(__CLASS__, 'ord'), $a);
2938 | 	}
2939 | 
2940 | 	/**
2941 | 	 * Converts an array of Unicode codepoints to a UTF-8 string
2942 | 	 *
2943 | 	 * @param   array|null       $a  Unicode codepoints
2944 | 	 * @return  string|bool|null     UTF-8 string
2945 | 	 *                               Returns FALSE if error occurred
2946 | 	 */
2947 | 	public static function from_unicode($a)
2948 | 	{
2949 | 		if (! ReflectionTypeHint::isValid()) return false;
2950 | 		if (! is_array($a)) return $a;
2951 | 
2952 | 		#since PHP-5.3.x iconv() is a little faster than mb_convert_encoding()
2953 | 		if (function_exists('iconv'))
2954 | 		{
2955 | 			array_walk($a, function(&$cp) { $cp = pack('N', $cp); });
2956 | 			$s = @iconv('UCS-4BE', 'UTF-8', implode('', $a));
2957 | 			if (! is_string($s)) return false;
2958 | 			return $s;
2959 | 		}
2960 | 		if (function_exists('mb_convert_encoding'))
2961 | 		{
2962 | 			array_walk($a, function(&$cp) { $cp = pack('N', $cp); });
2963 | 			$s = mb_convert_encoding(implode('', $a), 'UTF-8', 'UCS-4BE');
2964 | 			if (! is_string($s)) return false;
2965 | 			return $s;
2966 | 		}
2967 | 
2968 | 		return implode('', array_map(array(__CLASS__, 'chr'), $a));
2969 | 	}
2970 | 
2971 | 	/**
2972 | 	 * Converts a UTF-8 character to a Unicode codepoint
2973 | 	 *
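     | 	 * @example
     | 	 *   #A usage sketch (assumes ReflectionTypeHint validation passes):
     | 	 *   UTF8::ord("\xE2\x82\xAC"); #0x20AC, the euro sign
     | 	 *   UTF8::chr(0x20AC);         #"\xE2\x82\xAC"
     | 	 *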
2974 | 	 * @param   string|null    $char  UTF-8 character
2975 | 	 * @return  int|bool|null         Unicode codepoint
2976 | 	 *                                Returns FALSE if $char is broken (not UTF-8)
2977 | 	 */
2978 | 	public static function ord($char)
2979 | 	{
2980 | 		if (! ReflectionTypeHint::isValid()) return false;
2981 | 		if (! is_string($char)) return $char;
2982 | 
2983 | 		static $cache = array();
2984 | 		if (array_key_exists($char, $cache)) return $cache[$char]; #speed improve
2985 | 
2986 | 		switch (strlen($char))
2987 | 		{
2988 | 			case 1 : return $cache[$char] = ord($char);
2989 | 			case 2 : return $cache[$char] = (ord($char[1]) & 63) |
2990 | 											((ord($char[0]) & 31) << 6);
2991 | 			case 3 : return $cache[$char] = (ord($char[2]) & 63) |
2992 | 											((ord($char[1]) & 63) << 6) |
2993 | 											((ord($char[0]) & 15) << 12);
2994 | 			case 4 : return $cache[$char] = (ord($char[3]) & 63) |
2995 | 											((ord($char[2]) & 63) << 6) |
2996 | 											((ord($char[1]) & 63) << 12) |
2997 | 											((ord($char[0]) & 7)  << 18);
2998 | 			default :
2999 | 				trigger_error('Character 0x' . bin2hex($char) . ' is not UTF-8!', E_USER_WARNING);
3000 | 				return false;
3001 | 		}
3002 | 	}
3003 | 
3004 | 	/**
3005 | 	 * Converts a Unicode codepoint to a UTF-8 character
3006 | 	 *
3007 | 	 * @param   int|digit|null  $cp  Unicode codepoint
3008 | 	 * @return  string|bool|null     UTF-8 character
3009 | 	 *                               Returns FALSE if error occurred
3010 | 	 */
3011 | 	public static function chr($cp)
3012 | 	{
3013 | 		if (! ReflectionTypeHint::isValid()) return false;
3014 | 		if (! is_int($cp) && ! ctype_digit($cp)) return $cp;
3015 | 
3016 | 		static $cache = array();
3017 | 		if (array_key_exists($cp, $cache)) return $cache[$cp]; #speed improve
3018 | 
3019 | 		if ($cp <= 0x7f)     return $cache[$cp] = chr($cp);
3020 | 		if ($cp <= 0x7ff)    return $cache[$cp] = chr(0xc0 | ($cp >> 6))  .
3021 | 												  chr(0x80 | ($cp & 0x3f));
3022 | 		if ($cp <= 0xffff)   return $cache[$cp] = chr(0xe0 | ($cp >> 12)) .
3023 | 												  chr(0x80 | (($cp >> 6) & 0x3f)) .
3024 | 												  chr(0x80 | ($cp & 0x3f));
3025 | 		if ($cp <= 0x10ffff) return $cache[$cp] = chr(0xf0 | ($cp >> 18)) .
3026 | 												  chr(0x80 | (($cp >> 12) & 0x3f)) .
3027 | 												  chr(0x80 | (($cp >> 6) & 0x3f)) .
3028 | 												  chr(0x80 | ($cp & 0x3f));
3029 | 		#U+FFFD REPLACEMENT CHARACTER
3030 | 		return $cache[$cp] = "\xEF\xBF\xBD";
3031 | 	}
3032 | 
3033 | 	/**
3034 | 	 * Implementation of the chunk_split() function for UTF-8 encoded strings.
3035 | 	 *
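     | 	 * @example
     | 	 *   #A usage sketch (assumes ReflectionTypeHint validation passes and UTF8::str_split() splits by characters).
     | 	 *   #Unlike the native chunk_split(), the glue is put only between chunks, not after the last one:
     | 	 *   UTF8::chunk_split('абвгд', 2, '-'); #'аб-вг-д'
     | 	 *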
3036 | 	 * @param   string|null       $s
3037 | 	 * @param   int|digit|null    $length
3038 | 	 * @param   string|null       $glue
3039 | 	 * @return  string|bool|null  Returns FALSE if error occurred
3040 | 	 */
3041 | 	public static function chunk_split($s, $length = null, $glue = null)
3042 | 	{
3043 | 		if (! ReflectionTypeHint::isValid()) return false;
3044 | 		if (! is_string($s) || $s === '') return $s;
3045 | 
3046 | 		$length = intval($length);
3047 | 		$glue   = strval($glue);
3048 | 		if ($length < 1) $length = 76;
3049 | 		if ($glue === '') $glue = "\r\n";
3050 | 		$a = self::str_split($s, $length);
3051 | 		if (! is_array($a)) return false;
3052 | 		return implode($glue, $a);
3053 | 	}
3054 | 
3055 | 	/**
3056 | 	 * Changes the case of all keys in an array
3057 | 	 *
3058 | 	 * @param   array|null       $a
3059 | 	 * @param   int              $mode  {CASE_LOWER|CASE_UPPER}
3060 | 	 * @param   bool             $is_recursive
3061 | 	 * @return  array|bool|null  Returns FALSE if error occurred
3062 | 	 */
3063 | 	public static function array_change_key_case($a, $mode, $is_recursive = false)
3064 | 	{
3065 | 		if (! ReflectionTypeHint::isValid()) return false;
3066 | 		if (! is_array($a)) return $a;
3067 | 
3068 | 		$a2 = array();
3069 | 		foreach ($a as $k => $v)
3070 | 		{
3071 | 			if (is_string($k))
3072 | 			{
3073 | 				$k = self::convert_case($k, $mode);
3074 | 				if ($k === false) return false;
3075 | 			}
3076 | 			if ($is_recursive && is_array($v)) #recursive support
3077 | 			{
3078 | 				$v = self::array_change_key_case($v, $mode, $is_recursive);
3079 | 				if (! is_array($v)) return false;
3080 | 			}
3081 | 			$a2[$k] = $v;
3082 | 		}
3083 | 		return $a2;
3084 | 	}
3085 | 
3086 | 	/**
3087 | 	 * Converts the letter case of UTF-8 encoded data.
3088 | 	 * Arrays are traversed recursively; only the values of the array elements
3089 | 	 * are converted, while the keys are left unchanged.
3090 | 	 * To convert only the keys, use the self::array_change_key_case() method.
3091 | 	 *
3092 | 	 * @see     self::array_change_key_case()
3093 | 	 * @link    http://www.unicode.org/charts/PDF/U0400.pdf
3094 | 	 * @link    http://ru.wikipedia.org/wiki/ISO_639-1
3095 | 	 * @param   array|scalar|null $data  Data of arbitrary structure
3096 | 	 * @param   int               $mode  {CASE_LOWER|CASE_UPPER}
3097 | 	 * @param   bool              $is_ascii_optimization    use the native functions for pure ASCII strings (faster)
3098 | 	 * @return  scalar|bool|null  Returns FALSE if error occurred
3099 | 	 */
3100 | 	public static function convert_case($data, $mode, $is_ascii_optimization = true)
3101 | 	{
3102 | 		if (! ReflectionTypeHint::isValid()) return false;
3103 | 
3104 | 		if (is_array($data)) #recursive support
3105 | 		{
3106 | 			foreach ($data as $k => $v)
3107 | 			{
3108 | 				$data[$k] = self::convert_case($v, $mode, $is_ascii_optimization);
3109 | 				if ($data[$k] === false && ! is_bool($v)) return false;
3110 | 			}
3111 | 			return $data;
3112 | 		}
3113 | 		if (! is_string($data) || ! $data) return $data;
3114 | 
3115 | 		if ($mode === CASE_UPPER)
3116 | 		{
3117 | 			if ($is_ascii_optimization && self::is_ascii($data)) return strtoupper($data); #speed improve!
3118 | 			#deprecated, since PHP-5.3.x strtr() is 2-3 times faster than mb_strtoupper()
3119 | 			#if (function_exists('mb_strtoupper')) return mb_strtoupper($data, 'utf-8');
3120 | 			return strtr($data, array_flip(self::$convert_case_table));
3121 | 		}
3122 | 		if ($mode === CASE_LOWER)
3123 | 		{
3124 | 			if ($is_ascii_optimization && self::is_ascii($data)) return strtolower($data); #speed improve!
3125 | 			#deprecated, since PHP-5.3.x strtr() is 2-3 times faster than mb_strtolower()
3126 | 			#if (function_exists('mb_strtolower')) return mb_strtolower($data, 'utf-8');
3127 | 			return strtr($data, self::$convert_case_table);
3128 | 		}
3129 | 		trigger_error('Parameter 2 should be a constant of CASE_LOWER or CASE_UPPER!', E_USER_WARNING);
3130 | 		return $data;
3131 | 	}
3132 | 
3133 | 	/**
3134 | 	 * Converts data to lower case
3135 | 	 *
3136 | 	 * @param   array|scalar|null  $data
3137 | 	 * @return  scalar|bool|null   Returns FALSE if error occurred	 */
3138 | 	public static function lowercase($data)
3139 | 	{
3140 | 		if (! ReflectionTypeHint::isValid()) return false;
3141 | 		return self::convert_case($data, CASE_LOWER);
3142 | 	}
3143 | 
3144 | 	/**
3145 | 	 * Converts data to upper case
3146 | 	 *
3147 | 	 * @param   array|scalar|null  $data
3148 | 	 * @return  scalar|bool|null   Returns FALSE if error occurred
3149 | 	 */
3150 | 	public static function uppercase($data)
3151 | 	{
3152 | 		if (! ReflectionTypeHint::isValid()) return false;
3153 | 		return self::convert_case($data, CASE_UPPER);
3154 | 	}
3155 | 
3156 | 	/**
3157 | 	 * Converts data to lower case (alias of self::lowercase())
3158 | 	 *
3159 | 	 * @param   array|scalar|null  $data
3160 | 	 * @return  scalar|bool|null   Returns FALSE if error occurred
3161 | 	 */
3162 | 	public static function strtolower($data)
3163 | 	{
3164 | 		if (! ReflectionTypeHint::isValid()) return false;
3165 | 		return self::convert_case($data, CASE_LOWER);
3166 | 	}
3167 | 
3168 | 	/**
3169 | 	 * Converts data to upper case (alias of self::uppercase())
3170 | 	 *
3171 | 	 * @param   array|scalar|null  $data
3172 | 	 * @return  scalar|bool|null   Returns FALSE if error occurred
3173 | 	 */
3174 | 	public static function strtoupper($data)
3175 | 	{
3176 | 		if (! ReflectionTypeHint::isValid()) return false;
3177 | 		return self::convert_case($data, CASE_UPPER);
3178 | 	}
3179 | 
3180 | 
3181 | 	/**
3182 | 	 * Converts all HTML entities to native UTF-8 characters.
3183 | 	 * Decodes many more named entities than the standard html_entity_decode().
3184 | 	 * All decimal and hexadecimal entities are converted to UTF-8 as well.
3185 | 	 *
3186 | 	 * Example: '&quot;' or '&#34;' or '&#x22;' will be converted to '"'.
3187 | 	 *
3188 | 	 * @link  http://www.htmlhelp.com/reference/html40/entities/
3189 | 	 * @link  http://www.alanwood.net/demos/ent4_frame.html (HTML 4.01 Character Entity References)
3190 | 	 * @link  http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset1.asp?frame=true
3191 | 	 * @link  http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset2.asp?frame=true
3192 | 	 * @link  http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset3.asp?frame=true
3193 | 	 *
3194 | 	 * @param   scalar|null  $s
3195 | 	 * @param   bool         $is_special_chars   Also decode the special HTML entities? (&lt; &gt; &amp; &quot; &apos;)
3196 | 	 * @return  scalar|bool|null  Returns FALSE if error occurred
3197 | 	 */
3198 | 	public static function html_entity_decode($s, $is_special_chars = false)
3199 | 	{
3200 | 		if (! ReflectionTypeHint::isValid()) return false;
3201 | 		if (! is_string($s) || $s === '') return $s;
3202 | 
3203 | 		#speed improve
3204 | 		if (strlen($s) < 4  #the minimum entity length is 4 bytes: &#d; &xx;
3205 | 			|| ($pos = strpos($s, '&')) === false || strpos($s, ';', $pos) === false) return $s;
3206 | 
3207 | 		$table = self::$html_entity_table;
3208 | 		if ($is_special_chars)
3209 | 		{
3210 | 			$table += self::$html_special_chars_table
3211 | 					+ array(
3212 | 						#&apos; entity is only available in XHTML/HTML5 and not in plain HTML, see http://www.w3.org/TR/xhtml1/#C_16
3213 | 						'&apos;' => "\x27",  #U+0027 ['] &#39; apostrophe
3214 | 					);  
3215 | 		}
3216 | 		#replace named entities
3217 | 		$s = strtr($s, $table);
3218 | 		#block below deprecated, since PHP-5.3.x strtr() 1.5 times faster
3219 | 		if (0 && preg_match_all('/&[a-zA-Z]++\d*+;/sSX', $s, $m, null, $pos))
3220 | 		{
3221 | 			foreach (array_unique($m[0]) as $entity)
3222 | 			{
3223 | 				if (array_key_exists($entity, $table)) $s = str_replace($entity, $table[$entity], $s);
3224 | 			}
3225 | 		}
3226 | 
3227 | 		#replace numeric decimal and hexadecimal entities:
3228 | 		if (strpos($s, '&#') !== false)  #speed improve
3229 | 		{
3230 | 			$class = __CLASS__;
3231 | 			$html_special_chars_table_flipped = array_flip(self::$html_special_chars_table);
3232 | 			$s = preg_replace_callback('/&#((x)[\da-fA-F]{1,6}+|\d{1,7}+);/sSX',
3233 | 										function (array $m) use ($class, $html_special_chars_table_flipped, $is_special_chars)
3234 | 										{
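3235 | 											$codepoint = isset($m[2]) && $m[2] === 'x' ? hexdec(substr($m[1], 1)) : intval($m[1]); #$m[1] of a hex entity starts with "x"
3236 | 											if (! $is_special_chars && $codepoint < 0x80) #pack('C') keeps only the low byte, so limit the lookup to ASCII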
3237 | 											{
3238 | 												$char = pack('C', $codepoint);
3239 | 												if (array_key_exists($char, $html_special_chars_table_flipped)) return $html_special_chars_table_flipped[$char];
3240 | 											}
3241 | 											return $class::chr($codepoint);
3242 | 										}, $s);
3243 | 		}
3244 | 		return $s;
3245 | 	}
3246 | 
3247 | 	/**
3248 | 	 * Converts special UTF-8 characters to HTML entities.
3249 | 	 * Encodes many more named entities than the standard htmlentities().
3250 | 	 *
3251 | 	 * @link  http://www.htmlhelp.com/reference/html40/entities/
3252 | 	 * @link  http://www.alanwood.net/demos/ent4_frame.html (HTML 4.01 Character Entity References)
3253 | 	 * @link  http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset1.asp?frame=true
3254 | 	 * @link  http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset2.asp?frame=true
3255 | 	 * @link  http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset3.asp?frame=true
3256 | 	 *
3257 | 	 * @param   scalar|null  $s
3258 | 	 * @param   bool         $is_special_chars_only          Encode only the special HTML entities? (&lt; &gt; &amp; &quot;)
3259 | 	 * @return  scalar|bool|null  Returns FALSE if error occurred
3260 | 	 */
3261 | 	public static function html_entity_encode($s, $is_special_chars_only = false)
3262 | 	{
3263 | 		if (! ReflectionTypeHint::isValid()) return false;
3264 | 		if (! is_string($s) || $s === '') return $s;
3265 | 
3266 | 		if ($is_special_chars_only) return strtr($s, array_flip(self::$html_special_chars_table));  #binary support
3267 | 		#if ($is_special_chars_only) return htmlspecialchars($s);  #DEPRECATED, charset dependent
3268 | 
3269 | 		#replace UTF-8 chars with named entities:
3270 | 		$s = strtr($s, array_flip(self::$html_entity_table));
3271 | 
3272 | 		#block below deprecated, since PHP-5.3.x strtr() 3 times faster
3273 | 		if (0 && preg_match_all('~(?>	[\xc2\xc3\xc5\xc6\xcb\xce\xcf][\x80-\xbf]  #2 bytes
3274 | 									|	\xe2[\x80-\x99][\x82-\xac]                 #3 bytes
3275 | 								  )
3276 | 								~sxSX', $s, $m))
3277 | 		{
3278 | 			$table = array_flip(self::$html_entity_table);
3279 | 			foreach (array_unique($m[0]) as $char)
3280 | 			{
3281 | 				if (array_key_exists($char, $table)) $s = str_replace($char, $table[$char], $s);
3282 | 			}
3283 | 		}
3284 | 
3285 | 		return $s;
3286 | 	}
3287 | 
3288 | 	/**
3289 | 	 * Builds a regular expression for a case-insensitive match
3290 | 	 * Example (only digits): "123" => "123"
3291 | 	 * Example (only ASCII):  "123_test" => "(?i:123_test)"
3292 | 	 * Example (non-ASCII):   "123_слово_test" => "123_(с|С)(л|Л)(о|О)(в|В)(о|О)_[tT][eE][sS][tT]"
3293 | 	 *
3294 | 	 * @param  string|null $s
3295 | 	 * @param  string|null $delimiter  If the optional delimiter is specified, it will also be escaped.
3296 | 	 *                                 This is useful for escaping the delimiter that is required by the PCRE functions.
3297 | 	 *                                 The / is the most commonly used delimiter.
3298 | 	 * @return string|bool|null        Returns FALSE if error occurred
3299 | 	 */
3300 | 	public static function preg_quote_case_insensitive($s, $delimiter = null)
3301 | 	{
3302 | 		if (! ReflectionTypeHint::isValid()) return false;
3303 | 		if (! is_string($s) || $s === '') return $s;
3304 | 
3305 | 		if (ctype_digit($s)) return preg_quote($s, $delimiter); #speed improve
3306 | 		if (self::is_ascii($s)) return '(?i:' . preg_quote($s, $delimiter) . ')'; #speed improve
3307 | 
3308 | 		$s_lc = self::convert_case($s, CASE_LOWER, false); if ($s_lc === false) return false;
3309 | 		$s_uc = self::convert_case($s, CASE_UPPER, false); if ($s_uc === false) return false;
3310 | 		if ($s_lc === $s_uc) return preg_quote($s, $delimiter); #speed improve
3311 | 
3312 | 		$chars_lc = self::str_split($s_lc); if ($chars_lc === false) return false;
3313 | 		$chars_uc = self::str_split($s_uc); if ($chars_uc === false) return false;
3314 | 
3315 | 		$s_re = '';
3316 | 		foreach ($chars_lc as $i => $char)
3317 | 		{
3318 | 			if ($chars_lc[$i] === $chars_uc[$i])
3319 | 				$s_re .= preg_quote($chars_lc[$i], $delimiter);
3320 | 			elseif (strlen($chars_lc[$i]) === 1 /*self::is_ascii($chars_lc[$i])*/)
3321 | 				$s_re .= '[' . self::_preg_quote_class($chars_lc[$i] . $chars_uc[$i], $delimiter) . ']';
3322 | 			else
3323 | 				#for Russian and other letters, since the /u flag and (?i:слово) do not help :(
3324 | 				$s_re .= '(' . preg_quote($chars_lc[$i], $delimiter) . '|'
3325 | 							 . preg_quote($chars_uc[$i], $delimiter) . ')';
3326 | 		}
3327 | 		return $s_re;
3328 | 	}
3329 | 
3330 | 	/**
3331 | 	 * Calls preg_match_all() and converts byte offsets into character offsets for the PREG_OFFSET_CAPTURE flag.
3332 | 	 * This works regardless of whether you use the /u modifier.
3333 | 	 *
3334 | 	 * @link  http://bolknote.ru/2010/09/08/~2704
3335 | 	 *
3336 | 	 * @param   string           $pattern
3337 | 	 * @param   string|null      $subject
3338 | 	 * @param   array            $matches
3339 | 	 * @param   int              $flags
3340 | 	 * @param   int              $char_offset
3341 | 	 * @return  int|bool|null    Number of full pattern matches; returns FALSE if error occurred
3342 | 	 */
3343 | 	public static function preg_match_all($pattern, $subject, &$matches, $flags = PREG_PATTERN_ORDER, $char_offset = 0)
3344 | 	{
3345 | 		if (! ReflectionTypeHint::isValid()) return false;
3346 | 		if (! is_string($subject)) return $subject;
3347 | 
3348 | 		$byte_offset = ($char_offset > 0) ? strlen(self::substr($subject, 0, $char_offset)) : $char_offset;
3349 | 
3350 | 		$return = preg_match_all($pattern, $subject, $matches, $flags, $byte_offset);
3351 | 		if ($return === false) return false;
3352 | 
3353 | 		if ($flags & PREG_OFFSET_CAPTURE)
3354 | 		{
3355 | 			foreach ($matches as &$match)
3356 | 			{
3357 | 				foreach ($match as &$a) $a[1] = self::strlen(substr($subject, 0, $a[1]));
3358 | 			}
3359 | 		}
3360 | 
3361 | 		return $return;
3362 | 	}
3363 | 
3364 | 	#alias for self::str_limit()
3365 | 	public static function truncate($s, $maxlength = null, $continue = "\xe2\x80\xa6", &$is_cutted = null, $tail_min_length = 20)
3366 | 	{
3367 | 		return self::str_limit($s, $maxlength, $continue, $is_cutted, $tail_min_length);
3368 | 	}
3369 | 
3370 | 	/**
3371 | 	 * Truncates UTF-8 encoded text to the given length;
3372 | 	 * the last word is kept whole rather than cut off in the middle.
3373 | 	 * HTML entities are handled correctly.
3374 | 	 *
3375 | 	 * @param   string|null     $s                UTF-8 encoded text
3376 | 	 * @param   int|null|digit  $maxlength        Maximum text length
3377 | 	 * @param   string          $continue         Trailing string appended to the text if it gets truncated
3378 | 	 * @param   bool|null       &$is_cutted       Was the text truncated?
3379 | 	 * @param   int|digit       $tail_min_length  If the "tail" left over after truncation is shorter than $tail_min_length,
3380 | 	 *                                            the text is returned unchanged
3381 | 	 * @return  string|bool|null                  Returns FALSE if error occurred
3382 | 	 */
3383 | 	public static function str_limit($s, $maxlength = null, $continue = "\xe2\x80\xa6", &$is_cutted = null, $tail_min_length = 20) #"\xe2\x80\xa6" = "&hellip;"
3384 | 	{
3385 | 		if (! ReflectionTypeHint::isValid()) return false;
3386 | 		if (! is_string($s) || $s === '') return $s;
3387 | 
3388 | 		$is_cutted = false;
3389 | 		if ($continue === null) $continue = "\xe2\x80\xa6";
3390 | 		if (! $maxlength) $maxlength = 256;
3391 | 
3392 | 		#speed improve block
3393 | 		#{{{
3394 | 		if (strlen($s) <= $maxlength) return $s;
3395 | 		$s2 = str_replace("\r\n", '?', $s);
3396 | 		$s2 = preg_replace('~' . self::HTML_ENTITY_RE . '~sxSX', '?', $s2);
3397 | 		if (strlen($s2) <= $maxlength || self::strlen($s2) <= $maxlength) return $s;
3398 | 		#}}}
3399 | 
3400 | 		$r = preg_match_all('~(?> \r\n   # next line
3401 | 								   | ' . self::HTML_ENTITY_RE . '
3402 | 								   | .
3403 | 								 )
3404 | 								~sxuSX', $s, $m);
3405 | 		if ($r === false) return false;
3406 | 
3407 | 		#d($m);
3408 | 		if (count($m[0]) <= $maxlength) return $s;
3409 | 
3410 | 		$left = implode('', array_slice($m[0], 0, $maxlength));
3411 | 		#from the ASCII range, exclude letters, digits, opening paired characters [a-zA-Z\d\(\{\[] and some other characters
3412 | 		#a trailing ";" must not be trimmed, because it is used in entities like &xxx;
3413 | 		$left2 = rtrim($left, "\x00..\x28\x2A..\x2F\x3A\x3C..\x3E\x40\x5B\x5C\x5E..\x60\x7B\x7C\x7E\x7F");
3414 | 		if (strlen($left) !== strlen($left2)) $return = $left2 . $continue;
3415 | 		else
3416 | 		{
3417 | 			#append the rest of the cut word
3418 | 			$right = implode('', array_slice($m[0], $maxlength));
3419 | 			preg_match('/^(?>
3420 | 							#digits, closing paired characters, hyphen for compound words, dates, times, IP addresses, URLs like www.ya.ru:80!
3421 | 								[\d\)\]\}\-\.:]+
3422 | 							#letters
3423 | 							|	\p{L}+
3424 | 							#quotation marks
3425 | 							|	[' . implode('', self::$html_quotation_mark_table) . ']+
3426 | 						  )+
3427 | 						/suxSX', $right, $m);
3428 | 			#d($m);
3429 | 			$right = isset($m[0]) ? rtrim($m[0], '.-') : '';
3430 | 			$return = $left . $right;
3431 | 			if (strlen($return) !== strlen($s)) $return .= $continue;
3432 | 		}
3433 | 		if (self::strlen($s) - self::strlen($return) < $tail_min_length) return $s;
3434 | 
3435 | 		$is_cutted = true;
3436 | 		return $return;
3437 | 	}
3438 | 
3439 | 	/**
3440 | 	 * Implementation str_split() function for UTF-8 encoding string.
3441 | 	 *
3442 | 	 * @param   string|null      $s
3443 | 	 * @param   int|null|digit   $length
3444 | 	 * @return  array|bool|null  Returns FALSE if error occurred
3445 | 	 */
3446 | 	public static function str_split($s, $length = null)
3447 | 	{
3448 | 		if (! ReflectionTypeHint::isValid()) return false;
3449 | 		if (! is_string($s)) return $s;
3450 | 
3451 | 		$length = ($length === null) ? 1 : intval($length);
3452 | 		if ($length < 1) return false;
3453 | 		#there are limits in regexp for {min,max}!
3454 | 		if (preg_match_all('~.~suSX', $s, $m) === false) return false;
3455 | 		if (function_exists('preg_last_error') && preg_last_error() !== PREG_NO_ERROR) return false;
3456 | 		if ($length === 1) $a = $m[0];
3457 | 		else
3458 | 		{
3459 | 			$a = array();
3460 | 			for ($i = 0, $c = count($m[0]); $i < $c; $i += $length) $a[] = implode('', array_slice($m[0], $i, $length));
3461 | 		}
3462 | 		return $a;
3463 | 	}
3464 | 
3465 | 	/**
3466 | 	 * Implementation strlen() function for UTF-8 encoding string.
3467 | 	 *
3468 | 	 * @param   string|null    $s
3469 | 	 * @return  int|bool|null  Returns FALSE if error occurred
3470 | 	 */
3471 | 	public static function strlen($s)
3472 | 	{
3473 | 		if (! ReflectionTypeHint::isValid()) return false;
3474 | 		if (! is_string($s)) return $s;
3475 | 
3476 | 		//since PHP-5.3.x mb_strlen() is faster than strlen(utf8_decode())
3477 | 		if (function_exists('mb_strlen')) return mb_strlen($s, 'utf-8');
3478 | 
3479 | 		/*
3480 | 		  utf8_decode() converts characters that are not in ISO-8859-1 to '?', which, for the purpose of counting, is quite alright.
3481 | 		  It's much faster than iconv_strlen()
3482 | 		  Note: this function does not count bad UTF-8 bytes in the string - these are simply ignored
3483 | 		*/
3484 | 		return strlen(utf8_decode($s));
3485 | 
3486 | 		/*
3487 | 		#iconv_strlen() is slower than strlen(utf8_decode())
3488 | 		if (function_exists('iconv_strlen')) return iconv_strlen($s, 'utf-8');
3489 | 
3490 | 		#Do not count UTF-8 continuation bytes
3491 | 		#return strlen(preg_replace('/[\x80-\xBF]/sSX', '', $s));
3492 | 
3493 | 		#slower than strlen(utf8_decode())
3494 | 		preg_match_all('~.~suSX', $s, $m);
3495 | 		return count($m[0]);
3496 | 
3497 | 		#slower than preg_match_all() + count()
3498 | 		$n = 0;
3499 | 		for ($i = 0, $len = strlen($s); $i < $len; $i++)
3500 | 		{
3501 | 			$c = ord(substr($s, $i, 1));
3502 | 			if ($c < 0x80) $n++;                 #single-byte (0xxxxxxx)
3503 | 			elseif (($c & 0xC0) == 0xC0) $n++;   #multi-byte starting byte (11xxxxxx)
3504 | 		}
3505 | 		return $n;
3506 | 		*/
3507 | 	}
3508 | 
3509 | 	/**
3510 | 	 * Implementation strpos() function for UTF-8 encoding string
3511 | 	 *
3512 | 	 * @param   string|null    $s       The entire string
3513 | 	 * @param   string|int     $needle  The searched substring
3514 | 	 * @param   int|null       $offset  The optional offset parameter specifies the position from which the search should be performed
3515 | 	 * @return  int|bool|null           Returns the numeric position of the first occurrence of needle in haystack.
3516 | 	 *                                  If needle is not found, will return FALSE.
3517 | 	 */
3518 | 	public static function strpos($s, $needle, $offset = null)
3519 | 	{
3520 | 		if (! ReflectionTypeHint::isValid()) return false;
3521 | 		if (! is_string($s)) return $s;
3522 | 
3523 | 		if ($offset === null || $offset < 0) $offset = 0;
3524 | 		#mb_strpos() is faster than iconv_strpos()
3525 | 		if (function_exists('mb_strpos')) return mb_strpos($s, $needle, $offset, 'utf-8');
3526 | 		#iconv_strpos() is not used, because it is slower than self::strlen(substr())
3527 | 		#if (function_exists('iconv_strpos')) return iconv_strpos($s, $needle, $offset, 'utf-8');
3528 | 		$byte_pos = $offset;
3529 | 		do if (($byte_pos = strpos($s, $needle, $byte_pos)) === false) return false;
3530 | 		while (($char_pos = self::strlen(substr($s, 0, $byte_pos++))) < $offset);
3531 | 		return $char_pos;
3532 | 	}
3533 | 
3534 | 	/**
3535 | 	 * Find position of first occurrence of a case-insensitive string.
3536 | 	 *
3537 | 	 * @param   string|null    $s       The entire string
3538 | 	 * @param   string|int     $needle  The searched substring
3539 | 	 * @param   int|null       $offset  The optional offset parameter specifies the position from which the search should be performed
3540 | 	 * @return  int|bool|null           Returns the numeric position of the first occurrence of needle in haystack.
3541 | 	 *                                  If needle is not found, will return FALSE.
3542 | 	 */
3543 | 	public static function stripos($s, $needle, $offset = null)
3544 | 	{
3545 | 		if (! ReflectionTypeHint::isValid()) return false;
3546 | 		if (! is_string($s)) return $s;
3547 | 
3548 | 		if ($offset === null || $offset < 0) $offset = 0;
3549 | 		if (function_exists('mb_stripos')) return mb_stripos($s, $needle, $offset, 'utf-8');
3550 | 
3551 | 		#optimization block (speed improve)
3552 | 		#{{{
3553 | 		$is_ascii_s = self::is_ascii($s);
3554 | 		if ($is_ascii_s && ! self::is_ascii($needle)) return false; #a non-ASCII needle cannot occur in an ASCII haystack
3555 | 		if ($is_ascii_s) return stripos($s, $needle, $offset);      #both are ASCII: byte offsets equal character offsets
3556 | 		#}}}
3557 | 
3558 | 		$s = self::convert_case($s, CASE_LOWER, false);
3559 | 		if ($s === false) return false;
3560 | 		$needle = self::convert_case($needle, CASE_LOWER, false);
3561 | 		if ($needle === false) return false;
3562 | 		return self::strpos($s, $needle, $offset);
3563 | 	}
3564 | 
3565 | 	/**
3566 | 	 * Implementation strrev() function for UTF-8 encoding string
3567 | 	 *
3568 | 	 * @param   string|null       $s
3569 | 	 * @return  string|bool|null  Returns FALSE if error occurred
3570 | 	 */
3571 | 	public static function strrev($s)
3572 | 	{
3573 | 		if (! ReflectionTypeHint::isValid()) return false;
3574 | 		if (! is_string($s) || $s === '') return $s;
3575 | 
3576 | 		if (0) #TODO test speed
3577 | 		{
3578 | 			$s = self::_convert($s, 'UTF-8', 'UTF-32');
3579 | 			if (! is_string($s)) return false;
3580 | 			$s = implode('', array_reverse(str_split($s, 4)));
3581 | 			return self::_convert($s, 'UTF-32', 'UTF-8');
3582 | 		}
3583 | 
3584 | 		if (! is_array($a = self::str_split($s))) return false;
3585 | 		return implode('', array_reverse($a));
3586 | 	}
3587 | 
3588 | 	/**
3589 | 	 * Implementation substr() function for UTF-8 encoding string.
3590 | 	 *
3591 | 	 * @link     http://www.w3.org/International/questions/qa-forms-utf-8.html
3592 | 	 * @param    string|null       $s
3593 | 	 * @param    int|digit         $offset
3594 | 	 * @param    int|null|digit    $length
3595 | 	 * @return   string|bool|null             Returns FALSE if error occurred
3596 | 	 */
3597 | 	public static function substr($s, $offset, $length = null)
3598 | 	{
3599 | 		if (! ReflectionTypeHint::isValid()) return false;
3600 | 		if (! is_string($s)) return $s;
3601 | 
3602 | 		#since PHP-5.3.x mb_substr() is faster than iconv_substr()
3603 | 		if (function_exists('mb_substr'))
3604 | 		{
3605 | 			if ($length === null) $length = self::strlen($s);
3606 | 			return mb_substr($s, $offset, $length, 'utf-8');
3607 | 		}
3608 | 		if (function_exists('iconv_substr'))
3609 | 		{
3610 | 			if ($length === null) $length = self::strlen($s);
3611 | 			return iconv_substr($s, $offset, $length, 'utf-8');
3612 | 		}
3613 | 
3614 | 		static $_s = null;
3615 | 		static $_a = null;
3616 | 
3617 | 		if ($_s !== $s) $_a = self::str_split($_s = $s);
3618 | 		if (! is_array($_a)) return false;
3619 | 		if ($length !== null) $a = array_slice($_a, $offset, $length);
3620 | 		else                  $a = array_slice($_a, $offset);
3621 | 		return implode('', $a);
3622 | 	}
3623 | 
3624 | 	/**
3625 | 	 * Implementation substr_replace() function for UTF-8 encoding string.
3626 | 	 *
3627 | 	 * @param   string|null       $s
3628 | 	 * @param   string|int        $replacement
3629 | 	 * @param   int|digit         $start
3630 | 	 * @param   int|null          $length
3631 | 	 * @return  string|bool|null  Returns FALSE if error occurred
3632 | 	 */
3633 | 	public static function substr_replace($s, $replacement, $start, $length = null)
3634 | 	{
3635 | 		if (! ReflectionTypeHint::isValid()) return false;
3636 | 		if (! is_string($s) || $s === '') return $s;
3637 | 
3638 | 		$a = self::str_split($s);
3639 | 		if (! is_array($a)) return false;
3640 | 		array_splice($a, $start, $length, $replacement);
3641 | 		return implode('', $a);
3642 | 	}
3643 | 
3644 | 	/**
3645 | 	 * Implementation ucfirst() function for UTF-8 encoding string.
3646 | 	 * Converts the first character of a UTF-8 string to upper case.
3647 | 	 * Correctly handles words in quotation marks, for example: «северный поток» --> «Северный поток»
3648 | 	 *
3649 | 	 * @param   string|null       $s
3650 | 	 * @param   bool              $is_other_to_lowercase  convert the remaining characters to lower case?
3651 | 	 * @return  string|bool|null  Returns FALSE if error occurred
3652 | 	 */
3653 | 	public static function ucfirst($s, $is_other_to_lowercase = true)
3654 | 	{
3655 | 		if (! ReflectionTypeHint::isValid()) return false;
3656 | 		if ($s === '' || ! is_string($s)) return $s;
3657 | 
3658 | 		if (! preg_match('/^([' . implode('', self::$html_quotation_mark_table) . ']{0,2}+)  #1 optional quotation marks
3659 | 							(\p{L})     #2 first letter
3660 | 							(.*+)       #3 next letters
3661 | 							$/sxuSX', $s, $m)) return $s; #letters not found
3662 | 		return $m[1] . self::uppercase($m[2]) . ($is_other_to_lowercase ? self::lowercase($m[3]) : $m[3]);
3663 | 	}
3664 | 
3665 | 	/**
3666 | 	 * Implementation ucwords() function for UTF-8 encoding string.
3667 | 	 * Converts the first character of each word in a UTF-8 string to upper case;
3668 | 	 * the remaining characters of each word are converted to lower case.
3669 | 	 *
3670 | 	 * @param   string|null       $s
3671 | 	 * @param   bool              $is_other_to_lowercase  convert the remaining characters to lower case?
3672 | 	 * @param   string            $spaces_re
3673 | 	 * @return  string|bool|null  Returns FALSE if error occurred
3674 | 	 */
3675 | 	public static function ucwords($s, $is_other_to_lowercase = true, $spaces_re = '~([\p{Z}\s]+)~suSX')
3676 | 	{
3677 | 		if (! ReflectionTypeHint::isValid()) return false;
3678 | 		if ($s === '' || ! is_string($s)) return $s;
3679 | 
3680 | 		$words = preg_split($spaces_re, $s, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
3681 | 		foreach ($words as $k => $word)
3682 | 		{
3683 | 			$words[$k] = self::ucfirst($word, $is_other_to_lowercase);
3684 | 			if ($words[$k] === false) return false;
3685 | 		}
3686 | 		return implode('', $words);
3687 | 	}
3688 | 
3689 | 	/**
3690 | 	 * Decodes a string to UTF-8 string from some formats (can be mixed)
3691 | 	 * Examples
3692 | 	 *   '%D1%82%D0%B5%D1%81%D1%82'        => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #binary (regular)
3693 | 	 *   '0xD182D0B5D181D182'              => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #binary (compact)
3694 | 	 *   '%u0442%u0435%u0441%u0442'        => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #UCS-2  (U+0 — U+FFFF)
3695 | 	 *   '%u{442}%u{435}%u{0441}%u{00442}' => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"  #UTF-8  (U+0 — U+FFFFFF)
3696 | 	 *
3697 | 	 * It is used to decode data in the %uXXXX format produced by the deprecated
3698 | 	 * JavaScript function escape(). encodeURIComponent() is recommended instead.
3699 | 	 * The obsolete %uXXXX format allows Unicode only in the UCS-2 range, i.e. U+0 to U+FFFF.
3700 | 	 *
3701 | 	 * @see     urldecode()
3702 | 	 * @param   array|scalar|null  $data
3703 | 	 * @param   bool               $is_hex2bin  Decode the HEX-data?
3704 | 	 *                                          Example: '0xD182D0B5D181D182' => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"
3705 | 	 *                                          Hint: it is sometimes convenient to encode
3706 | 	 *                                          URL parameters not with rawurlencode($string)
3707 | 	 *                                          but with the following mechanism (the encoded data is more compact):
3708 | 	 *                                          '0x' . bin2hex($string)
3709 | 	 * @param   bool               $is_urldecode
3710 | 	 * @return  array|scalar|null  Returns FALSE if error occurred
3711 | 	 */
3712 | 	public static function unescape($data, $is_hex2bin = false, $is_urldecode = true)
3713 | 	{
3714 | 		if (! ReflectionTypeHint::isValid()) return false;
3715 | 		if (is_array($data))
3716 | 		{
3717 | 			$d = array();
3718 | 			foreach ($data as $k => &$v)
3719 | 			{
3720 | 				if (is_string($k))
3721 | 				{
3722 | 					$k = self::unescape($k, $is_hex2bin, $is_urldecode);
3723 | 					if (! is_string($k)) return false;
3724 | 				}
3725 | 				$d[$k] = self::unescape($v, $is_hex2bin, $is_urldecode);
3726 | 				if ($d[$k] === false && ! is_bool($v)) return false;
3727 | 			}
3728 | 			return $d;
3729 | 		}
3730 | 		if (is_string($data))
3731 | 		{
3732 | 			#use strpos() for speed improving of regexp
3733 | 			if ($is_hex2bin && strpos($data, '0x') !== false)
3734 | 			{
3735 | 				$data = preg_replace_callback(
3736 | 							'~0x((?:[\da-fA-F]{2})+)~sSX',
3737 | 							function (array $m)
3738 | 							{
3739 | 								$s = pack('H' . strlen($m[1]), $m[1]); #hex2bin()
3740 | 								return rawurlencode($s);
3741 | 							},
3742 | 							$data);
3743 | 			}
3744 | 			if (strpos($data, '%u') !== false)
3745 | 			{
3746 | 				$class = __CLASS__;
3747 | 				$data = preg_replace_callback(
3748 | 							'~%u(   [\da-fA-F]{4}+          #%uXXXX     only UCS-2
3749 | 								  | \{ [\da-fA-F]{1,6}+ \}  #%u{XXXXXX} extended form for all UNICODE charts
3750 | 								)
3751 | 							 ~sxSX',
3752 | 							function (array $m) use ($class)
3753 | 							{
3754 | 								$codepoint = hexdec(trim($m[1], '{}'));
3755 | 								$char = $class::chr($codepoint);
3756 | 								return rawurlencode($char);
3757 | 							},
3758 | 							$data);
3759 | 			}
3760 | 			return $is_urldecode ? urldecode($data) : $data;
3761 | 		}
3762 | 		if (is_scalar($data) || is_null($data)) return $data;  #~ null, integer, float, boolean
3763 | 		return false; #object or resource
3764 | 	}
3765 | 
3766 | 	/**
3767 | 	 * 1) Corrects the global arrays $_GET, $_POST, $_COOKIE, $_REQUEST, $_FILES
3768 | 	 *    by decoding values from the %XX and extended %uXXXX / %u{XXXXXX} formats,
3769 | 	 *    produced, for example, by the outdated JavaScript function escape().
3770 | 	 *    Standard PHP5 cannot do it.
3771 | 	 * 2) Recode $_GET, $_POST, $_COOKIE, $_REQUEST, $_FILES from $charset
3772 | 	 *    encoding to UTF-8, if necessary.
3773 | 	 *    A side effect is a positive protection against XSS attacks with
3774 | 	 *    non-printable characters on the vulnerable PHP function.
3775 | 	 *    Thus web forms can be sent to the server in 2-encoding: $charset and UTF-8.
3776 | 	 *    For example: ?тест[тест]=тест
3777 | 	 * 3) If in the HTTP_COOKIE there are parameters with the same name,
3778 | 	 *    takes the last value (as in the QUERY_STRING), not the first.
3779 | 	 * 4) Creates an array of $_POST for non-standard Content-Type, for example,
3780 | 	 *    "Content-Type: application/octet-stream". Standard PHP5 creates
3781 | 	 *    an array for "Content-Type: application/x-www-form-urlencoded"
3782 | 	 *    and "Content-Type: multipart/form-data".
3783 | 	 *
3784 | 	 * Examples
3785 | 	 *   '%F2%E5%F1%F2'                    => 'тест'  #CP1251 (regular)
3786 | 	 *   '0xF2E5F1F2'                      => 'тест'  #CP1251 (compact)
3787 | 	 *   '%D1%82%D0%B5%D1%81%D1%82'        => 'тест'  #UTF-8 (regular)
3788 | 	 *   '0xD182D0B5D181D182'              => 'тест'  #UTF-8 (compact)
3789 | 	 *   '%u0442%u0435%u0441%u0442'        => 'тест'  #UCS-2 (U+0 — U+FFFF)
3790 | 	 *   '%u{442}%u{435}%u{0441}%u{00442}' => 'тест'  #UTF-8 (U+0 — U+FFFFFF)
3791 | 	 *
3792 | 	 * Sessions, cookies and independent authorization on subdomains.
3793 | 	 *
3794 | 	 * EXAMPLE 1
3795 | 	 * The production site http://domain.com acquired subdomains.
3796 | 	 * For cross-domain authorization via the session mechanism, the COOKIE host name was changed from "domain.com" to ".domain.com"
3797 | 	 * As a result, authorization stopped working. Solution: change the session name.
3798 | 	 * Clearing the COOKIE also helps, but forcing a cleanup on thousands of user computers is problematic.
3799 | 	 * PHP handles the HTTP_COOKIE header incorrectly (?) if it contains parameters with the same name but different values.
3800 | 	 * Example of an HTTP request header sent by the client: "Cookie: sid=chpgs2fiak-330mzqza; sid=cmz5tnp5zz-xlbbgqp"
3801 | 	 * In this case the server takes the first value, not the last.
3802 | 	 * Yet when the same situation occurs in QUERY_STRING, the last parameter is always taken.
3803 | 	 * Two parameters with the same name can appear in HTTP_COOKIE if the following HTTP headers are sent to the client:
3804 | 	 * "Set-Cookie: sid=chpgs2fiak-330mzqza; expires=Thu, 15 Oct 2009 14:23:42 GMT; path=/; domain=domain.com"  (domain.com only)
3805 | 	 * "Set-Cookie: sid=cmz6uqorzv-1bn35110; expires=Thu, 15 Oct 2009 14:23:42 GMT; path=/; domain=.domain.com" (domain.com and all its subdomains)
3806 | 	 *
3807 | 	 * EXAMPLE 2
3808 | 	 * There are production sites: http://domain.com (main), http://admin.domain.com (admin panel),
3809 | 	 * http://sub1.domain.com (subproject 1), http://sub2.domain.com (subproject 2).
3810 | 	 * There is also a development server http://dev.domain.com, which may have its own subdomains.
3811 | 	 * Independent cross-domain authorization is required for http://*.domain.com and http://*.dev.domain.com.
3812 | 	 * The authorization status is stored in a session whose name and value are written to the COOKIE.
3813 | 	 * Since the domains http://*.dev.domain.com overlap with the domains http://*.domain.com,
3814 | 	 * different session names must be used for independent authorization!
3815 | 	 * Example of HTTP server response headers:
3816 | 	 * "Set-Cookie: sid=chpgs2fiak-330mzqza; expires=Thu, 15 Oct 2009 14:23:42 GMT; path=/; domain=.domain.com" (.domain.com and all its subdomains)
3817 | 	 * "Set-Cookie: sid.dev=cmz6uqorzv-1bn35110; expires=Thu, 15 Oct 2009 14:23:42 GMT; path=/; domain=.dev.domain.com" (dev.domain.com and all its subdomains)
3818 | 	 *
3819 | 	 * @link    http://tools.ietf.org/html/rfc2965  RFC 2965 - HTTP State Management Mechanism
3820 | 	 * @param   bool               $is_hex2bin  Decode the HEX-data?
3821 | 	 *                                          Example: '0xD182D0B5D181D182' => "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"
3822 | 	 *                                          Hint: it is sometimes convenient to encode
3823 | 	 *                                          URL parameters not with rawurlencode($string)
3824 | 	 *                                          but with the following mechanism (the encoded data is more compact):
3825 | 	 *                                          '0x' . bin2hex($string)
3826 | 	 * @param   string  $charset
3827 | 	 * @return  bool
3828 | 	 */
3829 | 	public static function unescape_request($is_hex2bin = false, $charset = 'ISO-8859-1')
3830 | 	{
3831 | 		$fixed = false;
3832 | 		#ATTENTION! HTTP_RAW_POST_DATA is only accessible when Content-Type of POST request is NOT default "application/x-www-form-urlencoded"!
3833 | 		$HTTP_RAW_POST_DATA = isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] === 'POST' ? (isset($GLOBALS['HTTP_RAW_POST_DATA']) ? $GLOBALS['HTTP_RAW_POST_DATA'] : @file_get_contents('php://input')) : null;
3834 | 		if (ini_get('always_populate_raw_post_data')) $GLOBALS['HTTP_RAW_POST_DATA'] = $HTTP_RAW_POST_DATA;
3835 | 		foreach (array( '_GET'    => isset($_SERVER['QUERY_STRING']) ? $_SERVER['QUERY_STRING'] : null,
3836 | 						'_POST'   => $HTTP_RAW_POST_DATA,
3837 | 						'_COOKIE' => isset($_SERVER['HTTP_COOKIE']) ? $_SERVER['HTTP_COOKIE'] : null,
3838 | 						'_FILES'  => isset($_FILES) ? $_FILES : null,
3839 | 						) as $k => $v)
3840 | 		{
3841 | 			if (! is_string($v)) continue;
3842 | 
3843 | 			if ($k === '_COOKIE')
3844 | 			{
3845 | 				$v = preg_replace('/; *+/sSX', '&', $v);
3846 | 				unset($_COOKIE); #parse HTTP_COOKIE ourselves, so that it is processed the same way as QUERY_STRING
3847 | 			}
3848 | 
3849 | 			$v = self::unescape($v, $is_hex2bin, false);
3850 | 			if ($v === false) return false;
3851 | 			parse_str($v, $GLOBALS[$k]);
3852 | 
3853 | 			$GLOBALS[$k] = self::convert_from($GLOBALS[$k], $charset);
3854 | 			if ($GLOBALS[$k] === false)
3855 | 			{
3856 | 				trigger_error('Array $' . $k . ' does not have keys/values in UTF-8 charset!', E_USER_WARNING);
3857 | 				return false;
3858 | 			}
3859 | 
3860 | 			$fixed = true;
3861 | 		}
3862 | 		if ($fixed)
3863 | 		{
3864 | 			$_REQUEST =
3865 | 				(isset($_COOKIE) ? $_COOKIE : array()) +
3866 | 				(isset($_POST) ? $_POST : array()) +
3867 | 				(isset($_GET) ? $_GET : array());
3868 | 		}
3869 | 		return true;
3870 | 	}
3871 | 
3872 | 	/**
3873 | 	 * Calculates the height of the edit text in <textarea> html tag by value and width.
3874 | 	 *
3875 | 	 * In most cases it works correctly for monospaced fonts.
3876 | 	 * Because the browser wraps the last word that does not fit on a line
3877 | 	 * onto the next line, the calculated height may be less than expected.
3878 | 	 * This algorithm is simple (and fast) and does not track word wrapping.
3879 | 	 *
3880 | 	 * @param   string|null     $s         Text
3881 | 	 * @param   int|digit       $cols      Width of the editing area (columns)
3882 | 	 * @param   int|digit       $min_rows  Minimum number of rows
3883 | 	 * @param   int|digit       $max_rows  Maximum number of rows
3884 | 	 * @return  int|bool|null              Number of rows (lines)
3885 | 	 */
3886 | 	public static function textarea_rows($s, $cols, $min_rows = 3, $max_rows = 32)
3887 | 	{
3888 | 		if (! ReflectionTypeHint::isValid()) return false;
3889 | 		if (! is_string($s)) return $s;
3890 | 
3891 | 		if (strlen($s) == 0) return $min_rows;  #speed improve
3892 | 		$rows = 0;
3893 | 		#utf8_decode() converts characters that are not in ISO-8859-1 to '?'
3894 | 		foreach (preg_split('/\r\n|[\r\n]/sSX', utf8_decode($s)) as $line)
3895 | 		{
3896 | 			$rows += ceil((strlen($line) + 1) / $cols);
3897 | 			if ($rows > $max_rows) return $max_rows;
3898 | 		}
3899 | 		return ($rows < $min_rows) ? $min_rows : $rows;
3900 | 	}
3901 | 
3902 | 	/**
3903 | 	 * @param   string|null       $s
3904 | 	 * @param   string|null       $charlist
3905 | 	 * @return  string|bool|null
3906 | 	 */
3907 | 	public static function ltrim($s, $charlist = null)
3908 | 	{
3909 | 		if (! ReflectionTypeHint::isValid()) return false;
3910 | 		if (! is_string($s) || $s === '') return $s;
3911 | 		if ($charlist === null || self::is_ascii($charlist)) return ($charlist === null) ? ltrim($s) : ltrim($s, $charlist);
3912 | 		return preg_replace('~^[' . self::_preg_quote_class($charlist, '~') . ']+~suSX', '', $s);
3913 | 	}
3914 | 
3915 | 	/**
3916 | 	 * @param   string|null       $s
3917 | 	 * @param   string|null       $charlist
3918 | 	 * @return  string|bool|null
3919 | 	 */
3920 | 	public static function rtrim($s, $charlist = null)
3921 | 	{
3922 | 		if (! ReflectionTypeHint::isValid()) return false;
3923 | 		if (! is_string($s) || $s === '') return $s;
3924 | 		if ($charlist === null || self::is_ascii($charlist)) return ($charlist === null) ? rtrim($s) : rtrim($s, $charlist);
3925 | 		return preg_replace('~[' . self::_preg_quote_class($charlist, '~') . ']+$~suSX', '', $s);
3926 | 	}
3927 | 
3928 | 	/**
3929 | 	 * @param   scalar|null  $s
3930 | 	 * @param   string|null  $charlist
3931 | 	 * @return  scalar|null
3932 | 	 */
3933 | 	public static function trim($s, $charlist = null)
3934 | 	{
3935 | 		if (! ReflectionTypeHint::isValid()) return false;
3936 | 		if (! is_string($s) || $s === '') return $s;
3937 | 		if ($charlist === null || self::is_ascii($charlist)) return ($charlist === null) ? trim($s) : trim($s, $charlist);
3938 | 		$charlist_re = self::_preg_quote_class($charlist, '~');
3939 | 		$s = preg_replace('~^[' . $charlist_re . ']+~suSX', '', $s);
3940 | 		return preg_replace('~[' . $charlist_re . ']+$~suSX', '', $s);
3941 | 	}
3942 | 
3943 | 	/**
3944 | 	 * @param  string      $charlist
3945 | 	 * @param  string|null $delimiter
3946 | 	 * @return string
3947 | 	 */
3948 | 	private static function _preg_quote_class($charlist, $delimiter = null)
3949 | 	{
3950 | 		#return preg_quote($charlist, $delimiter); #DEPRECATED
3951 | 		$quote_table = array(
3952 | 			'\\' => '\\\\',
3953 | 			'-'  => '\-',
3954 | 			']'  => '\]',
3955 | 		);
3956 | 		if (is_string($delimiter)) $quote_table[$delimiter] = '\\' . $delimiter;
3957 | 		return strtr($charlist, $quote_table);
3958 | 	}
3959 | 
3960 | 	/**
3961 | 	 * @param   string|null       $s
3962 | 	 * @param   int|digit         $length
3963 | 	 * @param   string            $pad_str
3964 | 	 * @param   int               $type     STR_PAD_LEFT, STR_PAD_RIGHT or STR_PAD_BOTH
3965 | 	 * @return  string|bool|null
3966 | 	 */
3967 | 	public static function str_pad($s, $length, $pad_str = ' ', $type = STR_PAD_RIGHT)
3968 | 	{
3969 | 		if (! ReflectionTypeHint::isValid()) return false;
3970 | 		if (! is_string($s)) return $s;
3971 | 
3972 | 		$input_len = self::strlen($s);
3973 | 		if ($length <= $input_len) return $s;
3974 | 
3975 | 		$pad_str_len = self::strlen($pad_str);
3976 | 		$pad_len = $length - $input_len;
3977 | 
3978 | 		if ($type == STR_PAD_RIGHT)
3979 | 		{
3980 | 			$repeat_num = ceil($pad_len / $pad_str_len);
3981 | 			return self::substr($s . str_repeat($pad_str, $repeat_num), 0, $length);
3982 | 		}
3983 | 
3984 | 		if ($type == STR_PAD_LEFT)
3985 | 		{
3986 | 			$repeat_num = ceil($pad_len / $pad_str_len);
3987 | 			return self::substr(str_repeat($pad_str, $repeat_num), 0, intval(floor($pad_len))) . $s;
3988 | 		}
3989 | 
3990 | 		if ($type == STR_PAD_BOTH)
3991 | 		{
3992 | 			$pad_len /= 2;
3993 | 			$pad_amount_left  = intval(floor($pad_len));
3994 | 			$pad_amount_right = intval(ceil($pad_len));
3995 | 			$repeat_times_left  = ceil($pad_amount_left  / $pad_str_len);
3996 | 			$repeat_times_right = ceil($pad_amount_right / $pad_str_len);
3997 | 
3998 | 			$padding_left  = self::substr(str_repeat($pad_str, $repeat_times_left),  0, $pad_amount_left);
3999 | 			$padding_right = self::substr(str_repeat($pad_str, $repeat_times_right), 0, $pad_amount_right);
4000 | 			return $padding_left . $s . $padding_right;
4001 | 		}
4002 | 
4003 | 		trigger_error('Parameter 4 should be a constant of STR_PAD_RIGHT, STR_PAD_LEFT or STR_PAD_BOTH!', E_USER_WARNING);
4004 | 		return false;
4005 | 	}
4006 | 
4007 | 	/**
4008 | 	 * @param   string    $str
4009 | 	 * @param   string    $mask
4010 | 	 * @param   int|null  $start
4011 | 	 * @param   int|null  $length
4012 | 	 * @return  int|bool
4013 | 	 */
4014 | 	public static function strspn($str, $mask, $start = null, $length = null)
4015 | 	{
4016 | 		if (! ReflectionTypeHint::isValid()) return false;
4017 | 		#if (self::is_ascii($str) && self::is_ascii($mask)) return strspn($str, $mask, $start, $length);
4018 | 		if ($start !== null || $length !== null) $str = self::substr($str, ($start === null) ? 0 : $start, $length);
4019 | 		if (preg_match('~^[' . preg_quote($mask, '~') . ']+~uSX', $str, $m)) return self::strlen($m[0]);
4020 | 		return 0;
4021 | 	}
4022 | 
4023 | 	/**
4024 | 	 * Recodes the text files in the specified folder to UTF-8.
4025 | 	 * Binary files, files already encoded in UTF-8 and files that could not be converted are skipped,
4026 | 	 * so the method works reliably enough.
4027 | 	 *
4028 | 	 *
4029 | 	 * @param   string       $dir             Directory to scan
4030 | 	 * @param   string|null  $files_re        Regular expression for the file name pattern,
4031 | 	 *                                        for example: '~\.(?:txt|sql|php|pl|py|sh|tpl|xml|xsl|html|xhtml|phtml|htm|js|json|css|conf|cfg|ini|htaccess)$~sSX'
4032 | 	 * @param   bool         $is_recursive    Process nested folders and files?
4033 | 	 * @param   string       $charset         Source encoding
4034 | 	 * @param   string|null  $dirs_ignore_re  Regular expression for excluding folders from processing,
4035 | 	 *                                        for example: '~^(?:cache|images?|photos?|fonts?|img|ico|\.svn|\.hg|\.cvs)$~siSX'
4036 | 	 * @param   bool         $is_echo         Print the names of processed files and their processing status to the output stream?
4037 | 	 * @param   bool         $is_simulate     Simulate the run without actually rewriting files?
4038 | 	 * @return  int|bool                      Returns the number of recoded files
4039 | 	 *                                        Returns FALSE if error occurred
4040 | 	 */
4041 | 	public static function convert_files_from(
4042 | 		$dir,
4043 | 		$files_re = null,
4044 | 		$is_recursive = true,
4045 | 		$charset = 'CP1251',
4046 | 		$dirs_ignore_re = null,
4047 | 		$is_echo = false,
4048 | 		$is_simulate = false)
4049 | 	{
4050 | 		if (! ReflectionTypeHint::isValid()) return false;
4051 | 
4052 | 		$dh = opendir($dir);
4053 | 		if (! is_resource($dh)) return false;
4054 | 		$counter = 0;
4055 | 		while (($name = readdir($dh)) !== false)
4056 | 		{
4057 | 			if ($name == '.' || $name == '..') continue;
4058 | 			$file = $dir . '/' . $name;
4059 | 			if (is_file($file))
4060 | 			{
4061 | 				if (is_string($files_re) && ! preg_match($files_re, $name)) continue;
4062 | 				if ($is_echo) echo $file;
4063 | 
4064 | 				$s = @file_get_contents($file);
4065 | 				if (! is_string($s))
4066 | 				{
4067 | 					if ($is_echo) echo '  Read error' . PHP_EOL;
4068 | 					return false;
4069 | 				}
4070 | 
4071 | 				if (self::is_utf8($s))
4072 | 				{
4073 | 					if ($is_echo) echo '  Already UTF-8, skipped' . PHP_EOL;
4074 | 					continue;
4075 | 				}
4076 | 
4077 | 				if (self::has_binary($s))
4078 | 				{
4079 | 					if ($is_echo) echo '  Binary file, skipped' . PHP_EOL;
4080 | 					continue;
4081 | 				}
4082 | 
4083 | 				$s = self::convert_from($s, $charset);
4084 | 				if (! is_string($s) || ! self::is_utf8($s))
4085 | 				{
4086 | 					if ($is_echo) echo '  Conversion error (source file not in ' . $charset . '?)' . PHP_EOL;
4087 | 					continue;
4088 | 				}
4089 | 
4090 | 				$ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
4091 | 				if ($ext === 'htm' || $ext === 'html' || $ext === 'xhtml' || $ext === 'phtml' || $ext === 'tpl')
4092 | 				{
4093 | 					$s = preg_replace('~(<meta  [\x00-\x20]++
4094 | 												(?:  content="text/html; [\x00-\x20]++ charset= #HTML4
4095 | 												  |  charset="                                  #HTML5
4096 | 												)
4097 | 										)               #1
4098 | 											[-a-z\d]++  #charset name
4099 | 										(" [^>]* >)     #2
4100 | 										~sixSX', '$1utf-8$2', $s);
4101 | 				}
4102 | 				if ($ext === 'xml' || $ext === 'xsl' || $ext === 'tpl')
4103 | 				{
4104 | 					$s = preg_replace('~(<\?xml [\x00-\x20]++ encoding=") #1
4105 | 											[-a-z\d]++                    #charset name
4106 | 										(" .*? \?>)                       #2
4107 | 										~sixSX', '$1utf-8$2', $s);
4108 | 				}
4109 | 
4110 | 				if (! $is_simulate)
4111 | 				{
4112 | 					$bytes = @file_put_contents($file, $s);
4113 | 					if ($bytes === false)
4114 | 					{
4115 | 						if ($is_echo) echo '  Write error' . PHP_EOL;
4116 | 						return false;
4117 | 					}
4118 | 				}
4119 | 				if ($is_echo) echo '  Converted from ' . $charset . ' to UTF-8' . PHP_EOL;
4120 | 				$counter++;
4121 | 			}
4122 | 			elseif ($is_recursive && is_dir($file))
4123 | 			{
4124 | 				if (! is_string($dirs_ignore_re) || ! preg_match($dirs_ignore_re, $name))
4125 | 				{
4126 | 					$c = self::convert_files_from($file, $files_re, $is_recursive, $charset, $dirs_ignore_re, $is_echo, $is_simulate);
4127 | 					if ($c === false) return false;
4128 | 					$counter += $c;
4129 | 				}
4130 | 			}
4131 | 		}
4132 | 		closedir($dh);
4133 | 		return $counter;
4134 | 	}
4135 | 
4136 | 	/**
4137 | 	 * Implementation of range() for UTF-8 encoded characters.
4138 | 	 * @param   int|string  $low
4139 | 	 * @param   int|string  $high
4140 | 	 * @param   int         $step
4141 | 	 * @return  array|bool         Returns FALSE if error occurred
4142 | 	 */
4143 | 	public static function range($low, $high, $step = 1)
4144 | 	{
4145 | 		if (! ReflectionTypeHint::isValid()) return false;
4146 | 		if (is_int($low) || is_int($high)) return range($low, $high, $step);  #speed improvement: integers need no UTF-8 handling
4147 | 		$low_cp  = self::ord($low);
4148 | 		$high_cp = self::ord($high);
4149 | 		if (! is_int($low_cp) || ! is_int($high_cp)) return false;
4150 | 		$a = range($low_cp, $high_cp, $step);
4151 | 		return array_map(array(__CLASS__, 'chr'), $a);
4152 | 	}
4153 | 
4154 | 	/**
4155 | 	 * Implementation of strtr() for UTF-8 encoded strings.
4156 | 	 * @param   string|null       $s
4157 | 	 * @param   string|array      $from
4158 | 	 * @param   string|null       $to
4159 | 	 * @return  string|bool|null         Returns FALSE if error occurred
4160 | 	 */
4161 | 	public static function strtr($s, $from, $to = null)
4162 | 	{
4163 | 		if (! ReflectionTypeHint::isValid()) return false;
4164 | 		if (! is_string($s) || $s === '') return $s;
4165 | 		if (is_array($from)) return strtr($s, $from); #speed improvement: array form of strtr() works on whole substrings
4166 | 		$keys   = self::str_split($from);
4167 | 		$values = self::str_split($to);
4168 | 		if (! is_array($keys) || ! is_array($values)) return false;
4169 | 		$table = array_combine($keys, $values);
4170 | 		if (! is_array($table)) return false;
4171 | 		return strtr($s, $table);
4172 | 	}
4173 | 
4174 | 	public static function tests()
4175 | 	{
4176 | 		assert_options(ASSERT_ACTIVE,   true);
4177 | 		assert_options(ASSERT_BAIL,     true);
4178 | 		assert_options(ASSERT_WARNING,  true);
4179 | 		assert_options(ASSERT_QUIET_EVAL, false);
4180 | 		$a = array(
4181 | 			'self::html_entity_decode("&quot;&amp;&lt;&gt;", true) === "\"&<>"',
4182 | 			'self::html_entity_decode("&quot;&amp;&lt;&gt;", false) === "&quot;&amp;&lt;&gt;"',
4183 | 			'self::html_entity_decode("&amp;amp;", true) === "&amp;"',
4184 | 			'self::html_entity_decode("&amp;amp;", false) === "&amp;amp;"',
4185 | 			'self::html_entity_decode("&#034;", true) === "\""',
4186 | 			'self::html_entity_decode("&#034;", false) === "&quot;"',
4187 | 			'self::html_entity_decode("&#039;", true) === "\'"',
4188 | 			'self::html_entity_decode("&#039;", false) === "\'"',
4189 | 			'self::html_entity_decode("&#x22;", true) === "\""',
4190 | 			'self::html_entity_decode("&#x22;", false) === "&quot;"',
4191 | 
4192 | 			'self::array_change_key_case(array("АБВГД" => "АБВГД"), CASE_LOWER) === array("абвгд" => "АБВГД")',
4193 | 			'self::array_change_key_case(array("абвгд" => "абвгд"), CASE_UPPER) === array("АБВГД" => "абвгд")',
4194 | 
4195 | 			'self::blocks_check("Яндекс", "Cyrillic") === true',
4196 | 			'self::blocks_check("Google", "Basic Latin") === true',
4197 | 			'self::blocks_check("Google & Яндекс", array("Basic Latin", "Cyrillic")) === true',
4198 | 			'self::blocks_check("Ё-моё, Yandex!", array(array(0x20, 0x7E),    #[\x20-\x7E]
4199 | 														array(0x0410, 0x044F), #[A-Яa-я]
4200 | 														0x0401, #Russian capital yo (Ё)
4201 | 														0x0451, #Russian small yo (ё)
4202 | 													)) === true',
4203 | 
4204 | 			'self::chunk_split("абвг", 2) === "аб\r\nвг"',
4205 | 			'self::chunk_split("абвг", 2, "|") === "аб|вг"',
4206 | 
4207 | 			'self::lowercase("1234-ABCD-АБВГ") === "1234-abcd-абвг"',
4208 | 			'self::lowercase(array("1234-ABCD-АБВГ" => "1234-ABCD-АБВГ")) === array("1234-ABCD-АБВГ" => "1234-abcd-абвг")',
4209 | 			'self::uppercase("1234-abcd-абвг") === "1234-ABCD-АБВГ"',
4210 | 			'self::uppercase(array("1234-abcd-абвг" => "1234-abcd-абвг")) === array("1234-abcd-абвг" => "1234-ABCD-АБВГ")',
4211 | 
4212 | 			'self::convert_from(self::convert_to("123-ABC-abc-АБВ-абв", $charset = "cp1251"), $charset = "cp1251") === "123-ABC-abc-АБВ-абв"',
4213 | 
4214 | 			'self::diactrical_remove("вдох\xc2\xadно\xc2\xadве\xcc\x81\xc2\xadние") === "вдох\xc2\xadно\xc2\xadве\xc2\xadние"',
4215 | 			'self::diactrical_remove("вдох\xc2\xadно\xc2\xadве\xcc\x81\xc2\xadние", array("\xc2\xad")) === "вдохновение"',
4216 | 			'self::diactrical_remove("вдох\xc2\xadно\xc2\xadве\xcc\x81\xc2\xadние", array("\xc2\xad"), true, $restore_table) === "вдохновение"',
4217 | 			'self::diactrical_restore("вдохновение", $restore_table) === "вдох\xc2\xadно\xc2\xadве\xcc\x81\xc2\xadние"',
4218 | 
4219 | 			'self::is_utf8(file_get_contents(' . var_export(__FILE__, true) . ', true)) === true',
4220 | 			'self::is_utf8(file_get_contents(' . var_export(__FILE__, true) . ', false)) === true',
4221 | 			'self::is_ascii(file_get_contents(' . var_export(__FILE__, true) . ')) === false',
4222 | 			'self::is_ascii("_\x01\x02абв", $error_char_offset) === false && $error_char_offset === 3',
4223 | 			'self::has_binary(file_get_contents(' . var_export(__FILE__, true) . ')) === false',
4224 | 			'self::has_binary("_аб\x01вг", $found_char_offset) === true && $found_char_offset === 3',
4225 | 
4226 | 			#range() uses ord() and chr()
4227 | 			'self::range("A", "D") === array("A", "B", "C", "D")',
4228 | 			'self::range("а", "г") === array("а", "б", "в", "г")',
4229 | 			'self::range(1, 3) === array(1, 2, 3)',
4230 | 
4231 | 			'"↔" === self::chr(self::ord("↔"))',
4232 | 			'"123-ABC-abc-АБВ-абв" === self::from_unicode(self::to_unicode("123-ABC-abc-АБВ-абв"))',
4233 | 			'self::strpos("123-ABC-abc-абв-АБВ-где", "АБВ") === 16',
4234 | 			'self::stripos("123-ABC-abc-абд-АБВ-где", "абв") === 16',
4235 | 			'self::strpos("123-ABC-abc", "АБВ") === false',
4236 | 			'self::strpos("123-АБВ-абв", "abc") === false',
4237 | 
4238 | 			'self::preg_quote_case_insensitive("123_слово_test") === "123_(с|С)(л|Л)(о|О)(в|В)(о|О)_[tT][eE][sS][tT]"',
4239 | 			'self::preg_quote_case_insensitive("123_test") === "(?i:123_test)"',
4240 | 			'self::preg_quote_case_insensitive("123") === "123"',
4241 | 
4242 | 			'self::unescape("%D1%82%D0%B5%D1%81%D1%82")        === "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"',
4243 | 			'self::unescape("0xD182D0B5D181D182", true)        === "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"',
4244 | 			'self::unescape("%u0442%u0435%u0441%u0442")        === "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"',
4245 | 			'self::unescape("%u{442}%u{435}%u{0441}%u{00442}") === "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82"',
4246 | 			'self::unescape("%u0025%u0032%u0035+%25%75%30%30%32%35") === "%25 %u0025"',
4247 | 
4248 | 			'self::ucfirst("!@#$", true)      === "!@#$"',
4249 | 			'self::ucfirst("!@#$ test", true) === "!@#$ test"',
4250 | 			'self::ucfirst("«северный Поток»", true)  === "«Северный поток»"',
4251 | 			'self::ucfirst("«северный Поток»", false) === "«Северный Поток»"',
4252 | 
4253 | 			//'self::strlen(file_get_contents(' . var_export(__FILE__, true) . ', true))'
4254 | 		);
4255 | 		foreach ($a as $k => $v) if (! assert($v)) return false;
4256 | 
4257 | 		//$start_time = microtime(true);
4258 | 		//$s = file_get_contents(__FILE__);
4259 | 		//for ($i = 0; $i < 10; $i++) $r = self::html_entity_encode($s);
4260 | 		//$time = microtime(true) - $start_time;
4261 | 		//d($time, $r);
4262 | 
4263 | 		return true;
4264 | 	}
4265 | 
4266 | }
4267 | 


--------------------------------------------------------------------------------
/php.ini.error_prepend_string.example:
--------------------------------------------------------------------------------
 1 | ; Strings to output before and after an error message. PHP's default behavior
 2 | ; is to leave these settings blank.
 3 | ; http://php.net/error-prepend-string
 4 | ; http://php.net/error-append-string
 5 | ; Example: error_append_string = "</span>"
 6 |  
 7 | error_prepend_string = "<!-- > \" ></script></title></head>-->
 8 |             <script_error>
 9 |             <noindex>
10 |             <div style=\"z-index:9999;
11 |                     position:relative;
12 |                     display: block;
13 |                     clear:both;
14 |                     float:left;
15 |                     border:1px dashed red;
16 |                     padding:5px;
17 |                     margin:5px;
18 |                     background:#fff;
19 |                     color:#000;
20 |                     font:normal 10pt monospace;
21 |                     text-align:left;\">
22 |             <pre>"
23 |  
24 | error_append_string = "</pre>
25 |             </div>
26 |             <div style=\"clear:both;\"></div>
27 |             </noindex>
28 |             </script_error>"
29 | 


--------------------------------------------------------------------------------