├── LICENSE ├── README.md ├── Utf8Conv.sln └── Utf8Conv ├── Utf8Conv.hpp ├── Utf8Conv.vcxproj ├── Utf8Conv.vcxproj.filters └── Utf8ConvTest.cpp /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016-2022 by Giovanni Dicanio 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Utf8Conv v1.0.0 2 | ## Unicode UTF-8 <-> Unicode UTF-16 Conversion Utility Functions for Windows C++ Code 3 | 4 | by Giovanni Dicanio 5 | 6 | Code that implements utility functions to convert between Unicode UTF-8 and Unicode UTF-16. 7 | 8 | **UTF-8** strings are stored as `std::string` instances; **UTF-16** strings are represented 9 | using `std::wstring`. 
10 | 11 | There are also conversion overloads that take `std::[w]string_view`s 12 | and C-style NUL-terminated string pointers. 13 | 14 | This code is currently being developed using **Visual Studio 2019** (v16.9.1) with C++17 features 15 | enabled (`/std:c++17`). 16 | The code compiles cleanly at warning level 4 (`/W4`) in both 32-bit and 64-bit builds. 17 | 18 | This is a **header-only** library, implemented in the **[`Utf8Conv.hpp`](Utf8Conv/Utf8Conv.hpp)** 19 | header file. 20 | 21 | `Utf8ConvTest.cpp` contains some test code for the library: check it out for some sample usage. 22 | 23 | The library exposes two main conversion functions: **`Utf8ToUtf16()`** and **`Utf16ToUtf8()`**. 24 | Conversion errors are signaled by throwing exceptions. 25 | 26 | For example, you can simply convert a string from UTF-8 to UTF-16 using code like this: 27 | 28 | ```c++ 29 | // From UTF-8 to UTF-16 30 | wstring utf16String = Utf8ToUtf16(utf8String); 31 | ``` 32 | 33 | and vice versa: 34 | 35 | ```c++ 36 | // From UTF-16 to UTF-8 37 | string utf8String = Utf16ToUtf8(utf16String); 38 | ``` 39 | 40 | The library code lives in the `Utf8Conv` namespace. 41 | 42 | See the **[`Utf8Conv.hpp`](Utf8Conv/Utf8Conv.hpp)** header file for more details 43 | and **documentation** about the implemented conversion functions. 
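
The header file also documents a safety check: the Win32 conversion APIs take string lengths as `int`, while STL strings report them as `size_t`, so by default the library throws `std::overflow_error` when a length doesn't fit. A minimal portable sketch of that kind of guard could look like this. `CheckedLengthToInt` is a hypothetical name used only for illustration; it is not part of the library.

```cpp
#include <cstddef>    // std::size_t
#include <limits>     // std::numeric_limits
#include <stdexcept>  // std::overflow_error

// Hypothetical helper (an assumption for illustration, not a library function):
// converts a size_t length to int, throwing if the value would not fit.
inline int CheckedLengthToInt(std::size_t length)
{
    // Extra parentheses around max keep this safe if a max() macro is defined
    constexpr std::size_t kIntMax =
        static_cast<std::size_t>((std::numeric_limits<int>::max)());

    if (length > kIntMax)
    {
        throw std::overflow_error(
            "Input string too long: size_t length doesn't fit into int.");
    }

    return static_cast<int>(length);
}
```

A length of `6` comes back unchanged as an `int`, while a length one past the largest `int` value throws `std::overflow_error`.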
44 | -------------------------------------------------------------------------------- /Utf8Conv.sln: -------------------------------------------------------------------------------- 1 |  2 | Microsoft Visual Studio Solution File, Format Version 12.00 3 | # Visual Studio Version 16 4 | VisualStudioVersion = 16.0.31105.61 5 | MinimumVisualStudioVersion = 10.0.40219.1 6 | Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "Utf8Conv", "Utf8Conv\Utf8Conv.vcxproj", "{87E1B526-29FB-4AF8-AC01-9A78544A3B15}" 7 | EndProject 8 | Global 9 | GlobalSection(SolutionConfigurationPlatforms) = preSolution 10 | Debug|x64 = Debug|x64 11 | Debug|x86 = Debug|x86 12 | Release|x64 = Release|x64 13 | Release|x86 = Release|x86 14 | EndGlobalSection 15 | GlobalSection(ProjectConfigurationPlatforms) = postSolution 16 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Debug|x64.ActiveCfg = Debug|x64 17 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Debug|x64.Build.0 = Debug|x64 18 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Debug|x86.ActiveCfg = Debug|Win32 19 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Debug|x86.Build.0 = Debug|Win32 20 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Release|x64.ActiveCfg = Release|x64 21 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Release|x64.Build.0 = Release|x64 22 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Release|x86.ActiveCfg = Release|Win32 23 | {87E1B526-29FB-4AF8-AC01-9A78544A3B15}.Release|x86.Build.0 = Release|Win32 24 | EndGlobalSection 25 | GlobalSection(SolutionProperties) = preSolution 26 | HideSolutionNode = FALSE 27 | EndGlobalSection 28 | GlobalSection(ExtensibilityGlobals) = postSolution 29 | SolutionGuid = {64517626-0433-4621-8F38-7E4AD249FF0C} 30 | EndGlobalSection 31 | EndGlobal 32 | -------------------------------------------------------------------------------- /Utf8Conv/Utf8Conv.hpp: -------------------------------------------------------------------------------- 1 | #ifndef GIOVANNI_DICANIO_UTF8CONV_HPP_INCLUDED 2 | #define GIOVANNI_DICANIO_UTF8CONV_HPP_INCLUDED 3 | 
4 | 5 | //////////////////////////////////////////////////////////////////////////////// 6 | // 7 | // *** Unicode UTF-8 <-> UTF-16 Conversion Helpers *** 8 | // 9 | // Copyright (C) by Giovanni Dicanio 10 | // 11 | // 12 | // First version: 2016, September 1st 13 | // Last update: 2022, October 17th 14 | // 15 | // E-mail: . AT REMOVE_THIS gmail.com 16 | // 17 | // This header-only C++ library implements conversion helper functions 18 | // to convert strings between Unicode UTF-8 and Unicode UTF-16. 19 | // 20 | // Unicode UTF-16 (LE) is the de facto standard Unicode encoding 21 | // of the Windows APIs, while UTF-8 is widely used to store and exchange 22 | // text across the Internet and in a multi-platform way. 23 | // So, in Windows C++ applications, many times there's a need to convert 24 | // between UTF-16 and UTF-8. 25 | // 26 | // Unicode UTF-8 strings are represented using the std::string class; 27 | // Unicode UTF-16 strings are represented using the std::wstring class. 28 | // ATL's CString is not used, to avoid dependencies from ATL or MFC. 29 | // 30 | // --------------------------------------------------------------------------- 31 | // 32 | // For more information, please read the article I wrote that was published 33 | // on the 2016 September issue of MSDN Magazine: 34 | // 35 | // C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs 36 | // https://msdn.microsoft.com/magazine/mt763237 37 | // 38 | // --------------------------------------------------------------------------- 39 | // 40 | // Compiler: Visual Studio 2019 41 | // C++ Language Standard: ISO C++17 Standard (/std:c++17) 42 | // Code compiles cleanly at warning level 4 (/W4) on both 32-bit and 64-bit builds. 43 | // 44 | // Requires building in Unicode mode (which has been the default since VS2005). 
45 | // 46 | // =========================================================================== 47 | // 48 | // The MIT License(MIT) 49 | // 50 | // Copyright(c) 2016-2022 by Giovanni Dicanio 51 | // 52 | // Permission is hereby granted, free of charge, to any person obtaining a copy 53 | // of this software and associated documentation files(the "Software"), to deal 54 | // in the Software without restriction, including without limitation the rights 55 | // to use, copy, modify, merge, publish, distribute, sublicense, and / or sell 56 | // copies of the Software, and to permit persons to whom the Software is 57 | // furnished to do so, subject to the following conditions : 58 | // 59 | // The above copyright notice and this permission notice shall be included in all 60 | // copies or substantial portions of the Software. 61 | // 62 | // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 63 | // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 64 | // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.IN NO EVENT SHALL THE 65 | // AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 66 | // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 67 | // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 68 | // SOFTWARE. 69 | // 70 | //////////////////////////////////////////////////////////////////////////////// 71 | 72 | 73 | //------------------------------------------------------------------------------ 74 | // 75 | // Safe Integer Overflow Checks 76 | // ============================ 77 | // 78 | // Windows APIs used to convert between UTF-8 and UTF-16 79 | // (e.g. MultiByteToWideChar) require string lengths expressed as int, 80 | // while STL strings store their lengths using size_t. 81 | // In case of *VERY* long strings there can be an overflow when converting 82 | // from size_t (STL std::[w]string) to int (for Win32 API calls). 
83 | // 84 | // The default behavior for this library is to *always* perform those checks 85 | // at runtime, and throw a std::overflow_error exception on overflow. 86 | // 87 | // If you want to change this default behavior and want to _disable_ 88 | // those runtime checks in Release Builds, you can #define the following macro: 89 | // 90 | // // If defined, integer overflow checks happen in debug builds only 91 | // #define GIOVANNI_DICANIO_UTF8CONV_CHECK_INTEGER_OVERFLOWS_ONLY_IN_DEBUG 92 | // 93 | //------------------------------------------------------------------------------ 94 | 95 | 96 | //------------------------------------------------------------------------------ 97 | // Includes 98 | //------------------------------------------------------------------------------ 99 | 100 | #include <Windows.h> // Windows Platform SDK 101 | #include <crtdbg.h> // _ASSERTE 102 | 103 | #include <limits> // For std::numeric_limits 104 | #include <stdexcept> // For std::overflow_error 105 | #include <string> // For std::string, std::wstring 106 | #include <string_view> // For std::string_view, std::wstring_view 107 | #include <system_error> // For std::system_error 108 | 109 | 110 | namespace Utf8Conv 111 | { 112 | 113 | // 114 | // Forward declarations and Function Prototypes 115 | // 116 | 117 | // Exception class representing an error that occurred during a Unicode conversion 118 | class Utf8ConversionException; 119 | 120 | 121 | //============================================================================== 122 | // *** UTF-16 --> UTF-8 *** 123 | //============================================================================== 124 | 125 | //------------------------------------------------------------------------------ 126 | // Convert the input string from UTF-16 to UTF-8. 127 | // Throws Utf8ConversionException on conversion errors 128 | // (e.g. invalid UTF-16 sequence found in input string). 
129 | //------------------------------------------------------------------------------ 130 | [[nodiscard]] std::string Utf16ToUtf8(const std::wstring& utf16); 131 | 132 | //------------------------------------------------------------------------------ 133 | // Convert the input string *view* from UTF-16 to UTF-8. 134 | // Throws Utf8ConversionException on conversion errors 135 | // (e.g. invalid UTF-16 sequence found in input string). 136 | //------------------------------------------------------------------------------ 137 | [[nodiscard]] std::string Utf16ToUtf8(std::wstring_view utf16); 138 | 139 | //------------------------------------------------------------------------------ 140 | // Convert the input NUL-terminated C-style string pointer from UTF-16 to UTF-8. 141 | // Throws Utf8ConversionException on conversion errors 142 | // (e.g. invalid UTF-16 sequence found in input string). 143 | //------------------------------------------------------------------------------ 144 | [[nodiscard]] std::string Utf16ToUtf8(_In_opt_z_ const wchar_t* utf16); 145 | 146 | 147 | //============================================================================== 148 | // *** UTF-8 --> UTF-16 *** 149 | //============================================================================== 150 | 151 | //------------------------------------------------------------------------------ 152 | // Convert the input string from UTF-8 to UTF-16. 153 | // Throws Utf8ConversionException on conversion errors 154 | // (e.g. invalid UTF-8 sequence found in input string). 155 | //------------------------------------------------------------------------------ 156 | [[nodiscard]] std::wstring Utf8ToUtf16(const std::string& utf8); 157 | 158 | //------------------------------------------------------------------------------ 159 | // Convert the input string *view* from UTF-8 to UTF-16. 160 | // Throws Utf8ConversionException on conversion errors 161 | // (e.g. invalid UTF-8 sequence found in input string). 
162 | //------------------------------------------------------------------------------ 163 | [[nodiscard]] std::wstring Utf8ToUtf16(std::string_view utf8); 164 | 165 | //------------------------------------------------------------------------------ 166 | // Convert the input NUL-terminated C-style string pointer from UTF-8 to UTF-16. 167 | // Throws Utf8ConversionException on conversion errors 168 | // (e.g. invalid UTF-8 sequence found in input string). 169 | //------------------------------------------------------------------------------ 170 | [[nodiscard]] std::wstring Utf8ToUtf16(_In_opt_z_ const char* utf8); 171 | 172 | //============================================================================== 173 | 174 | 175 | //------------------------------------------------------------------------------ 176 | // Error occurred during UTF-8 conversions 177 | //------------------------------------------------------------------------------ 178 | class Utf8ConversionException 179 | : public std::system_error 180 | { 181 | public: 182 | 183 | // Possible conversion "directions" 184 | enum class ConversionType 185 | { 186 | FromUtf8ToUtf16 = 0, 187 | FromUtf16ToUtf8 188 | }; 189 | 190 | 191 | // Initialize with last Win32 error code, error message raw C-string, and conversion direction 192 | Utf8ConversionException(DWORD errorCode, const char* message, ConversionType type); 193 | 194 | // Initialize with last Win32 error code, error message string, and conversion direction 195 | Utf8ConversionException(DWORD errorCode, const std::string& message, ConversionType type); 196 | 197 | // Direction of the conversion (e.g. 
from UTF-8 to UTF-16) 198 | [[nodiscard]] ConversionType Direction() const noexcept; 199 | 200 | 201 | private: 202 | 203 | // Direction of the conversion 204 | ConversionType m_conversionType; 205 | }; 206 | 207 | 208 | //------------------------------------------------------------------------------ 209 | // Utf8ConversionException Inline Method Implementations 210 | //------------------------------------------------------------------------------ 211 | 212 | inline Utf8ConversionException::Utf8ConversionException( 213 | const DWORD errorCode, 214 | const char* const message, 215 | const ConversionType type 216 | ) 217 | : std::system_error(errorCode, std::system_category(), message) 218 | , m_conversionType(type) 219 | { 220 | } 221 | 222 | 223 | inline Utf8ConversionException::Utf8ConversionException( 224 | const DWORD errorCode, 225 | const std::string& message, 226 | const ConversionType type 227 | ) 228 | : std::system_error(errorCode, std::system_category(), message) 229 | , m_conversionType(type) 230 | { 231 | } 232 | 233 | 234 | inline Utf8ConversionException::ConversionType Utf8ConversionException::Direction() const noexcept 235 | { 236 | return m_conversionType; 237 | } 238 | 239 | 240 | //------------------------------------------------------------------------------ 241 | // Private Helper Functions 242 | //------------------------------------------------------------------------------ 243 | 244 | namespace detail 245 | { 246 | 247 | //------------------------------------------------------------------------------ 248 | // Returns true if the input 'size_t' value overflows the maximum value 249 | // that can be stored in an 'int' 250 | //------------------------------------------------------------------------------ 251 | [[nodiscard]] inline bool ValueOverflowsInt(size_t value) 252 | { 253 | if (value > static_cast<size_t>((std::numeric_limits<int>::max)())) 254 | { 255 | return true; 256 | } 257 | else 258 | { 259 | return false; 260 | } 261 | } 262 | 263 | 264 | 
//------------------------------------------------------------------------------ 265 | // Returns true if the input string pointer is null or it points to an empty 266 | // string ('\0') 267 | //------------------------------------------------------------------------------ 268 | [[nodiscard]] inline bool IsNullOrEmpty(_In_opt_z_ const char* psz) 269 | { 270 | if (psz == nullptr) 271 | { 272 | return true; 273 | } 274 | 275 | if (*psz == '\0') 276 | { 277 | return true; 278 | } 279 | 280 | return false; 281 | } 282 | 283 | 284 | //------------------------------------------------------------------------------ 285 | // Returns true if the input string pointer is null or it points to an empty 286 | // string (L'\0') 287 | //------------------------------------------------------------------------------ 288 | [[nodiscard]] inline bool IsNullOrEmpty(_In_opt_z_ const wchar_t* psz) 289 | { 290 | if (psz == nullptr) 291 | { 292 | return true; 293 | } 294 | 295 | if (*psz == L'\0') 296 | { 297 | return true; 298 | } 299 | 300 | return false; 301 | } 302 | 303 | } // namespace detail 304 | 305 | 306 | // 307 | // Convenience macro: 308 | // 309 | // ..._ALWAYS_CHECK_INTEGER_OVERFLOWS = !(..._CHECK_INTEGER_OVERFLOWS_ONLY_IN_DEBUG) 310 | // 311 | // NOTE: Users of this library should #define (or undefine) the macro 312 | // GIOVANNI_DICANIO_UTF8CONV_CHECK_INTEGER_OVERFLOWS_ONLY_IN_DEBUG 313 | // (see comments at the beginning of this header file). 314 | // 315 | // This ...UTF8CONV_ALWAYS_CHECK_INTEGER_OVERFLOWS macro here 316 | // is for library's *private code* only. 
317 | // 318 | #ifndef GIOVANNI_DICANIO_UTF8CONV_CHECK_INTEGER_OVERFLOWS_ONLY_IN_DEBUG 319 | #define GIOVANNI_DICANIO_UTF8CONV_ALWAYS_CHECK_INTEGER_OVERFLOWS 320 | #endif 321 | 322 | 323 | //------------------------------------------------------------------------------ 324 | // Unicode Encoding Conversion Function Implementations 325 | //------------------------------------------------------------------------------ 326 | 327 | inline std::string Utf16ToUtf8(const std::wstring& utf16) 328 | { 329 | if (utf16.empty()) 330 | { 331 | return std::string{}; 332 | } 333 | 334 | return Utf16ToUtf8(std::wstring_view{ utf16.data(), utf16.size() }); 335 | } 336 | 337 | 338 | inline std::string Utf16ToUtf8(_In_opt_z_ const wchar_t* utf16) 339 | { 340 | if (detail::IsNullOrEmpty(utf16)) 341 | { 342 | return std::string{}; 343 | } 344 | 345 | // 346 | // The following line generates a Warning C6387 347 | // when compiled with VS2019 v16.9.1: 348 | // 349 | // --------------------------------------------------------------------------------------- 350 | // 'utf16' could be '0': this does not adhere to the specification 351 | // for the function 'std::basic_string_view<wchar_t,std::char_traits<wchar_t> >::{ctor}'. 352 | // --------------------------------------------------------------------------------------- 353 | // 354 | // But the code analyzer was unable to understand that I *did* a proper check 355 | // for nullptr in the above detail::IsNullOrEmpty() call. 
356 | // 357 | // So, this is actually a spurious warning, which I'm suppressing here: 358 | // 359 | #pragma warning (suppress: 6387) 360 | return Utf16ToUtf8(std::wstring_view{ utf16 }); 361 | } 362 | 363 | 364 | inline std::string Utf16ToUtf8(std::wstring_view utf16) 365 | { 366 | if (utf16.empty()) 367 | { 368 | return std::string{}; 369 | } 370 | 371 | // Safely fail if an invalid UTF-16 character sequence is encountered 372 | constexpr DWORD kFlags = WC_ERR_INVALID_CHARS; 373 | 374 | #ifdef GIOVANNI_DICANIO_UTF8CONV_ALWAYS_CHECK_INTEGER_OVERFLOWS 375 | // Safely cast the length of the source UTF-16 string from size_t 376 | // (returned by std::wstring_view::length()) to int 377 | // for the WideCharToMultiByte API. 378 | // If the size_t value is too big, throw an exception to prevent overflows. 379 | if (detail::ValueOverflowsInt(utf16.length())) 380 | { 381 | throw std::overflow_error( 382 | "[Utf8Conv::Utf16ToUtf8] Input string too long: size_t-length doesn't fit into int."); 383 | } 384 | #else 385 | // Only check in debug-builds 386 | _ASSERTE(! detail::ValueOverflowsInt(utf16.length())); 387 | #endif 388 | 389 | const int utf16Length = static_cast<int>(utf16.length()); 390 | 391 | // Get the length, in chars, of the resulting UTF-8 string 392 | const int utf8Length = ::WideCharToMultiByte( 393 | CP_UTF8, // convert to UTF-8 394 | kFlags, // conversion flags 395 | utf16.data(), // source UTF-16 string 396 | utf16Length, // length of source UTF-16 string, in wchar_ts 397 | nullptr, // unused - no conversion required in this step 398 | 0, // request size of destination buffer, in chars 399 | nullptr, nullptr // unused 400 | ); 401 | if (utf8Length == 0) 402 | { 403 | // Conversion error: capture error code and throw 404 | const DWORD error = ::GetLastError(); 405 | throw Utf8ConversionException( 406 | error, 407 | error == ERROR_NO_UNICODE_TRANSLATION ? 408 | "[Utf8Conv::Utf16ToUtf8] Invalid UTF-16 sequence found in input string." 
409 | : 410 | "[Utf8Conv::Utf16ToUtf8] Cannot get result string length when converting "\ 411 | "from UTF-16 to UTF-8 (WideCharToMultiByte failed).", 412 | Utf8ConversionException::ConversionType::FromUtf16ToUtf8); 413 | } 414 | 415 | // Make room in the destination string for the converted bits 416 | std::string utf8(utf8Length, ' '); 417 | 418 | // Do the actual conversion from UTF-16 to UTF-8 419 | int result = ::WideCharToMultiByte( 420 | CP_UTF8, // convert to UTF-8 421 | kFlags, // conversion flags 422 | utf16.data(), // source UTF-16 string 423 | utf16Length, // length of source UTF-16 string, in wchar_ts 424 | utf8.data(), // pointer to destination buffer 425 | utf8Length, // size of destination buffer, in chars 426 | nullptr, nullptr // unused 427 | ); 428 | if (result == 0) 429 | { 430 | // Conversion error: capture error code and throw 431 | const DWORD error = ::GetLastError(); 432 | throw Utf8ConversionException( 433 | error, 434 | error == ERROR_NO_UNICODE_TRANSLATION ? 435 | "[Utf8Conv::Utf16ToUtf8] Invalid UTF-16 sequence found in input string." 
436 | : 437 | "[Utf8Conv::Utf16ToUtf8] Cannot convert from UTF-16 to UTF-8 "\ 438 | "(WideCharToMultiByte failed).", 439 | Utf8ConversionException::ConversionType::FromUtf16ToUtf8); 440 | } 441 | 442 | return utf8; 443 | } 444 | 445 | 446 | inline std::wstring Utf8ToUtf16(const std::string& utf8) 447 | { 448 | if (utf8.empty()) 449 | { 450 | return std::wstring{}; 451 | } 452 | 453 | return Utf8ToUtf16(std::string_view{ utf8.data(), utf8.size() }); 454 | } 455 | 456 | 457 | inline std::wstring Utf8ToUtf16(_In_opt_z_ const char* utf8) 458 | { 459 | if (detail::IsNullOrEmpty(utf8)) 460 | { 461 | return std::wstring{}; 462 | } 463 | 464 | // 465 | // The following line generates a Warning C6387 466 | // when compiled with VS2019 v16.9.1: 467 | // 468 | // --------------------------------------------------------------------------------- 469 | // 'utf8' could be '0': this does not adhere to the specification 470 | // for the function 'std::basic_string_view<char,std::char_traits<char> >::{ctor}'. 471 | // --------------------------------------------------------------------------------- 472 | // 473 | // But the code analyzer was unable to understand that I *did* a proper check 474 | // for nullptr in the above detail::IsNullOrEmpty() call. 475 | // 476 | // So, this is actually a spurious warning, which I'm suppressing here: 477 | // 478 | #pragma warning (suppress: 6387) 479 | return Utf8ToUtf16(std::string_view{ utf8 }); 480 | } 481 | 482 | 483 | inline std::wstring Utf8ToUtf16(std::string_view utf8) 484 | { 485 | if (utf8.empty()) 486 | { 487 | return std::wstring{}; 488 | } 489 | 490 | // Safely fail if an invalid UTF-8 character sequence is encountered 491 | constexpr DWORD kFlags = MB_ERR_INVALID_CHARS; 492 | 493 | #ifdef GIOVANNI_DICANIO_UTF8CONV_ALWAYS_CHECK_INTEGER_OVERFLOWS 494 | // Safely cast the length of the source UTF-8 string from size_t 495 | // (returned by std::string_view::length()) to int 496 | // for the MultiByteToWideChar API. 
497 | // If the size_t value is too big, throw an exception to prevent overflows. 498 | if (detail::ValueOverflowsInt(utf8.length())) 499 | { 500 | throw std::overflow_error( 501 | "[Utf8Conv::Utf8ToUtf16] Input string too long: size_t-length doesn't fit into int."); 502 | } 503 | #else 504 | // Only check in debug-builds 505 | _ASSERTE(! detail::ValueOverflowsInt(utf8.length())); 506 | #endif 507 | 508 | const int utf8Length = static_cast<int>(utf8.length()); 509 | 510 | // Get the size of the destination UTF-16 string 511 | const int utf16Length = ::MultiByteToWideChar( 512 | CP_UTF8, // source string is in UTF-8 513 | kFlags, // conversion flags 514 | utf8.data(), // source UTF-8 string pointer 515 | utf8Length, // length of the source UTF-8 string, in chars 516 | nullptr, // unused - no conversion done in this step 517 | 0 // request size of destination buffer, in wchar_ts 518 | ); 519 | if (utf16Length == 0) 520 | { 521 | // Conversion error: capture error code and throw 522 | const DWORD error = ::GetLastError(); 523 | throw Utf8ConversionException( 524 | error, 525 | error == ERROR_NO_UNICODE_TRANSLATION ? 526 | "[Utf8Conv::Utf8ToUtf16] Invalid UTF-8 sequence found in input string." 
527 | : 528 | "[Utf8Conv::Utf8ToUtf16] Cannot get result string length when converting " \ 529 | "from UTF-8 to UTF-16 (MultiByteToWideChar failed).", 530 | Utf8ConversionException::ConversionType::FromUtf8ToUtf16); 531 | } 532 | 533 | // Make room in the destination string for the converted bits 534 | std::wstring utf16(utf16Length, ' '); 535 | 536 | // Do the actual conversion from UTF-8 to UTF-16 537 | int result = ::MultiByteToWideChar( 538 | CP_UTF8, // source string is in UTF-8 539 | kFlags, // conversion flags 540 | utf8.data(), // source UTF-8 string pointer 541 | utf8Length, // length of source UTF-8 string, in chars 542 | utf16.data(), // pointer to destination buffer 543 | utf16Length // size of destination buffer, in wchar_ts 544 | ); 545 | if (result == 0) 546 | { 547 | // Conversion error: capture error code and throw 548 | const DWORD error = ::GetLastError(); 549 | throw Utf8ConversionException( 550 | error, 551 | error == ERROR_NO_UNICODE_TRANSLATION ? 552 | "[Utf8Conv::Utf8ToUtf16] Invalid UTF-8 sequence found in input string." 
553 | : 554 | "[Utf8Conv::Utf8ToUtf16] Cannot convert from UTF-8 to UTF-16 "\ 555 | "(MultiByteToWideChar failed).", 556 | Utf8ConversionException::ConversionType::FromUtf8ToUtf16); 557 | } 558 | 559 | return utf16; 560 | } 561 | 562 | } // namespace Utf8Conv 563 | 564 | 565 | #endif // GIOVANNI_DICANIO_UTF8CONV_HPP_INCLUDED 566 | -------------------------------------------------------------------------------- /Utf8Conv/Utf8Conv.vcxproj: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Debug 6 | Win32 7 | 8 | 9 | Release 10 | Win32 11 | 12 | 13 | Debug 14 | x64 15 | 16 | 17 | Release 18 | x64 19 | 20 | 21 | 22 | 16.0 23 | Win32Proj 24 | {87e1b526-29fb-4af8-ac01-9a78544a3b15} 25 | Utf8Conv 26 | 10.0 27 | 28 | 29 | 30 | Application 31 | true 32 | v142 33 | Unicode 34 | 35 | 36 | Application 37 | false 38 | v142 39 | true 40 | Unicode 41 | 42 | 43 | Application 44 | true 45 | v142 46 | Unicode 47 | 48 | 49 | Application 50 | false 51 | v142 52 | true 53 | Unicode 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | true 75 | 76 | 77 | false 78 | 79 | 80 | true 81 | 82 | 83 | false 84 | 85 | 86 | 87 | Level4 88 | true 89 | WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions) 90 | true 91 | stdcpp17 92 | 93 | 94 | Console 95 | true 96 | 97 | 98 | 99 | 100 | Level4 101 | true 102 | true 103 | true 104 | WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 105 | true 106 | stdcpp17 107 | 108 | 109 | Console 110 | true 111 | true 112 | true 113 | 114 | 115 | 116 | 117 | Level4 118 | true 119 | _DEBUG;_CONSOLE;%(PreprocessorDefinitions) 120 | true 121 | stdcpp17 122 | 123 | 124 | Console 125 | true 126 | 127 | 128 | 129 | 130 | Level4 131 | true 132 | true 133 | true 134 | NDEBUG;_CONSOLE;%(PreprocessorDefinitions) 135 | true 136 | stdcpp17 137 | 138 | 139 | Console 140 | true 141 | true 142 | true 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 
| 157 | 158 | -------------------------------------------------------------------------------- /Utf8Conv/Utf8Conv.vcxproj.filters: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | {4FC737F1-C7A5-4376-A066-2A32D752A2FF} 6 | cpp;c;cc;cxx;c++;cppm;ixx;def;odl;idl;hpj;bat;asm;asmx 7 | 8 | 9 | {93995380-89BD-4b04-88EB-625FBE52EBFB} 10 | h;hh;hpp;hxx;h++;hm;inl;inc;ipp;xsd 11 | 12 | 13 | {67DA6AB6-F800-4c08-8B7A-83BB121AAD01} 14 | rc;ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe;resx;tiff;tif;png;wav;mfcribbon-ms 15 | 16 | 17 | 18 | 19 | Source Files 20 | 21 | 22 | 23 | 24 | Header Files 25 | 26 | 27 | 28 | 29 | 30 | 31 | -------------------------------------------------------------------------------- /Utf8Conv/Utf8ConvTest.cpp: -------------------------------------------------------------------------------- 1 | //////////////////////////////////////////////////////////////////////////////// 2 | // 3 | // Utf8ConvTest.cpp -- by Giovanni Dicanio 4 | // 5 | // Unit test the UTF-8 <-> UTF-16 conversion functions declared in Utf8Conv.hpp 6 | // 7 | //////////////////////////////////////////////////////////////////////////////// 8 | 9 | 10 | // 11 | // You can also test the code by defining (uncommenting) the following overflow check macro 12 | // *before* including "Utf8Conv.hpp": 13 | // 14 | //#define GIOVANNI_DICANIO_UTF8CONV_CHECK_INTEGER_OVERFLOWS_ONLY_IN_DEBUG 15 | // 16 | 17 | #include "Utf8Conv.hpp" // Library to test 18 | 19 | #include <iostream> // Console output 20 | 21 | using std::cout; 22 | using std::string; 23 | using std::wstring; 24 | 25 | using Utf8Conv::Utf16ToUtf8; 26 | using Utf8Conv::Utf8ToUtf16; 27 | 28 | 29 | void Check(bool condition, const char* message) 30 | { 31 | cout << "[Check]: " << message; 32 | if (condition) 33 | { 34 | cout << " [OK]\n"; 35 | } 36 | else 37 | { 38 | cout << " [Failed]\n"; 39 | } 40 | } 41 | 42 | 43 | void TestEmptyStrings() 44 | { 45 | Check(Utf16ToUtf8(L"").empty(), "Input empty UTF-16 
C-string converted to empty UTF-8"); 46 | Check(Utf8ToUtf16("").empty(), "Input empty UTF-8 C-string converted to empty UTF-16"); 47 | 48 | Check(Utf16ToUtf8(nullptr).empty(), "Input NULL UTF-16 pointer converted to empty UTF-8"); 49 | Check(Utf8ToUtf16(nullptr).empty(), "Input NULL UTF-8 pointer converted to empty UTF-16"); 50 | 51 | Check(Utf16ToUtf8(wstring{}).empty(), "Input empty UTF-16 wstring converted to empty UTF-8"); 52 | Check(Utf8ToUtf16(string{}).empty(), "Input empty UTF-8 string converted to empty UTF-16"); 53 | } 54 | 55 | 56 | void TestSimpleStrings() 57 | { 58 | { 59 | string s1 = "Connie"; 60 | wstring ws = Utf8ToUtf16(s1); 61 | string s2 = Utf16ToUtf8(ws); 62 | Check(s1 == s2, "Conversion loop UTF-8 -> UTF-16 -> UTF-8"); 63 | } 64 | 65 | { 66 | wstring ws1 = L"Connie Plus Plus"; 67 | string s = Utf16ToUtf8(ws1); 68 | wstring ws2 = Utf8ToUtf16(s); 69 | Check(ws1 == ws2, "Conversion loop UTF-16 -> UTF-8 -> UTF-16"); 70 | } 71 | 72 | 73 | // 74 | // Unicode UTF-16 string with a Japanese kanji 75 | // 76 | // https://www.compart.com/en/unicode/U+5B66 77 | // 78 | { 79 | wstring ws1 = L"Connie \u5B66 C++"; 80 | string s = Utf16ToUtf8(ws1); 81 | wstring ws2 = Utf8ToUtf16(s); 82 | Check(ws1 == ws2, "Conversion loop UTF-16 -> UTF-8 -> UTF-16"); 83 | } 84 | } 85 | 86 | 87 | void TestConversionErrors() 88 | { 89 | // 90 | // Try an invalid UTF-8 sequence 91 | // 92 | string utf8Invalid = "Invalid UTF-8 *** @ *** Invalid UTF-8"; 93 | size_t pos = utf8Invalid.find('@'); 94 | utf8Invalid[pos] = static_cast<char>(0xC1); // Invalid UTF-8 byte 95 | 96 | // I expect the following conversion to throw an exception: 97 | bool exceptionRaised = false; 98 | try 99 | { 100 | wstring impossible = Utf8ToUtf16(utf8Invalid); 101 | } 102 | catch (Utf8Conv::Utf8ConversionException const& ex) 103 | { 104 | if (ex.Direction() == Utf8Conv::Utf8ConversionException::ConversionType::FromUtf8ToUtf16) 105 | { 106 | exceptionRaised = true; 107 | } 108 | } 109 | Check(exceptionRaised, 
"Utf8ConversionException caught from invalid UTF-8"); 110 | 111 | 112 | // 113 | // Try an invalid UTF-16 sequence 114 | // 115 | wstring utf16Invalid = L"Invalid UTF-16 *** @ *** Invalid UTF-16"; 116 | pos = utf16Invalid.find(L'@'); 117 | utf16Invalid[pos] = static_cast<wchar_t>(0xD800); // Invalid UTF-16 (unpaired high surrogate) 118 | 119 | // I expect the following conversion to throw an exception: 120 | exceptionRaised = false; 121 | try 122 | { 123 | string impossible = Utf16ToUtf8(utf16Invalid); 124 | } 125 | catch (Utf8Conv::Utf8ConversionException const& ex) 126 | { 127 | if (ex.Direction() == Utf8Conv::Utf8ConversionException::ConversionType::FromUtf16ToUtf8) 128 | { 129 | exceptionRaised = true; 130 | } 131 | } 132 | Check(exceptionRaised, "Utf8ConversionException caught from invalid UTF-16"); 133 | 134 | } 135 | 136 | 137 | // 138 | // Run the various tests 139 | // 140 | int main() 141 | { 142 | constexpr int kExitOk = 0; 143 | constexpr int kExitError = 1; 144 | 145 | try 146 | { 147 | cout << "===============================================\n"; 148 | cout << "*** Giovanni Dicanio's Utf8Conv C++ Library ***\n"; 149 | cout << "===============================================\n"; 150 | 151 | cout << "\nRunning tests...\n\n"; 152 | 153 | TestEmptyStrings(); 154 | TestSimpleStrings(); 155 | TestConversionErrors(); 156 | } 157 | catch (const std::exception& ex) 158 | { 159 | cout << "\n*** Error ***\n"; 160 | cout << ex.what() << '\n'; 161 | 162 | return kExitError; 163 | } 164 | 165 | return kExitOk; 166 | } 167 | --------------------------------------------------------------------------------