Basic Information
General Category
Get the type of a character:
Canonical Combining Classes
The canonical combining classes are returned as plain integers, since Unicode already defines them as integer values:
Bidirectional Category
Decimal/Digit/Numeric Values
If the character has numeric properties:
#include <cmath>
using namespace unicode;
UChar code = static_cast<UChar>('3');
assert(fraction.first == 3);
assert(fraction.second == 1);
The character ༳ (U+0F33) represents -1/2:
#include <cmath>
using namespace unicode;
UChar code = 0x0F33;
assert(fraction.first == -1);
assert(fraction.second == 2);
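The numerator/denominator pair matches how the Unicode Character Database stores numeric values: the numeric field of UnicodeData.txt holds either an integer such as `3` or a rational such as `-1/2`. A minimal sketch of parsing such a field is shown below. `parseNumericField` is a hypothetical helper written for illustration and is not part of this library.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not part of the library): parses a UnicodeData.txt
// numeric field such as "3" or "-1/2" into {numerator, denominator}.
std::pair<long, long> parseNumericField(const std::string& field) {
    assert(!field.empty());
    const auto slash = field.find('/');
    if (slash == std::string::npos) {
        // Plain integer: the denominator is implicitly 1.
        return {std::stol(field), 1};
    }
    const long numerator = std::stol(field.substr(0, slash));
    const long denominator = std::stol(field.substr(slash + 1));
    assert(denominator != 0);
    return {numerator, denominator};
}
```

Keeping the value as a pair of integers avoids the rounding that a floating-point representation would introduce for values such as 1/3.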
Mirrored
Upper/Lower/Title Cases
using namespace unicode;
assert(getUpperCase(static_cast<UChar>('a')) == static_cast<UChar>('A'));
assert(getLowerCase(static_cast<UChar>('A')) == static_cast<UChar>('a'));
assert(getTitleCase(static_cast<UChar>('a')) == static_cast<UChar>('A'));
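For most characters the title case equals the upper case, as with `a` above. They differ for a few digraphs: dž (U+01C6) has the upper case DŽ (U+01C4) but the title case Dž (U+01C5). The sketch below illustrates the distinction with a two-row table. The names `CaseEntry`, `upperCaseSketch` and `titleCaseSketch` are hypothetical, and `char32_t` is assumed as the code point type; the library's own functions work from the full Unicode data.

```cpp
// Hypothetical stand-ins for illustration only, not the library's API.
struct CaseEntry {
    char32_t code;
    char32_t upper;
    char32_t title;
};

// Two rows taken from UnicodeData.txt: 'a' and the digraph U+01C6.
constexpr CaseEntry kCaseTable[] = {
    {0x0061, 0x0041, 0x0041},  // a  -> A  (upper), A  (title)
    {0x01C6, 0x01C4, 0x01C5},  // dž -> DŽ (upper), Dž (title)
};

// Returns the table entry for `code`, or nullptr if the sketch table lacks it.
const CaseEntry* findCaseEntry(char32_t code) {
    for (const CaseEntry& entry : kCaseTable) {
        if (entry.code == code) return &entry;
    }
    return nullptr;
}

// Characters without an entry map to themselves.
char32_t upperCaseSketch(char32_t code) {
    const CaseEntry* entry = findCaseEntry(code);
    return entry ? entry->upper : code;
}

char32_t titleCaseSketch(char32_t code) {
    const CaseEntry* entry = findCaseEntry(code);
    return entry ? entry->title : code;
}
```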
Decomposition Mapping
The character ㉐ (U+3250) is the partnership sign, and its decomposition is PTE:
using namespace unicode;
assert(decomposition.size() == 3u);
assert(decomposition[0] == 0x50);
assert(decomposition[1] == 0x54);
assert(decomposition[2] == 0x45);
UChar buffer[16];
assert(buffer[0] == 0x50);
assert(buffer[1] == 0x54);
assert(buffer[2] == 0x45);
assert(buffer[3] == 0);
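The buffer variant above ends with a 0 after the last code point, so the caller must reserve one extra slot. The sketch below shows that contract in isolation. `copyDecompositionSketch` is a hypothetical helper, `char32_t` is assumed as the code point type, and the behavior on a too-small buffer is an assumption of this sketch, not a statement about the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the library's API: copies a decomposition into a
// caller-provided buffer and appends a 0 terminator.
// Returns false, leaving the buffer untouched, if the result does not fit.
bool copyDecompositionSketch(const std::vector<char32_t>& decomposition,
                             char32_t* buffer, std::size_t capacity) {
    assert(buffer != nullptr);
    // One extra slot is needed for the terminator.
    if (decomposition.size() + 1 > capacity) return false;
    for (std::size_t i = 0; i < decomposition.size(); ++i) {
        buffer[i] = decomposition[i];
    }
    buffer[decomposition.size()] = 0;
    return true;
}
```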
Encoding & Decoding
Code points can be converted to UTF-8 or UTF-16 strings, and vice versa.
An example of converting a UTF-8 encoded string to code points:
std::cout << std::hex;
for (auto code : codes) {
std::cout << "0x" << code << " ";
}
std::cout << std::endl;
You can convert the code points back to a UTF-8 string:
std::cout << str << std::endl;
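To make the round trip concrete, here is a minimal standalone sketch of UTF-8 decoding and encoding. `decodeUtf8Sketch` and `encodeUtf8Sketch` are hypothetical names and do not reflect the library's actual functions or signatures. The sketch assumes `char32_t` code points and, for brevity, does not reject overlong forms or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decodes UTF-8 bytes into code points.
std::vector<char32_t> decodeUtf8Sketch(const std::string& text) {
    std::vector<char32_t> codes;
    std::size_t i = 0;
    while (i < text.size()) {
        const unsigned char lead = static_cast<unsigned char>(text[i]);
        char32_t code = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) { code = lead; }
        else if ((lead & 0xE0) == 0xC0) { code = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { code = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { code = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= text.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char byte = static_cast<unsigned char>(text[i + k]);
            if ((byte & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            code = (code << 6) | (byte & 0x3F);  // append 6 payload bits
        }
        codes.push_back(code);
        i += extra + 1;
    }
    return codes;
}

// Hypothetical helper: encodes code points as UTF-8 bytes.
std::string encodeUtf8Sketch(const std::vector<char32_t>& codes) {
    std::string out;
    for (const char32_t code : codes) {
        if (code < 0x80) {
            out += static_cast<char>(code);
        } else if (code < 0x800) {
            out += static_cast<char>(0xC0 | (code >> 6));
            out += static_cast<char>(0x80 | (code & 0x3F));
        } else if (code < 0x10000) {
            out += static_cast<char>(0xE0 | (code >> 12));
            out += static_cast<char>(0x80 | ((code >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (code & 0x3F));
        } else if (code <= 0x10FFFF) {
            out += static_cast<char>(0xF0 | (code >> 18));
            out += static_cast<char>(0x80 | ((code >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((code >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (code & 0x3F));
        } else {
            throw std::invalid_argument("code point out of range");
        }
    }
    return out;
}
```

Decoding the six bytes `E4 BD A0 E5 A5 BD` yields the two code points U+4F60 and U+597D, and encoding them again reproduces the original bytes.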
UTF-16 works the same way, but the related functions use std::u16string as input and output.
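The main difference from UTF-8 is that code points above U+FFFF are stored as a surrogate pair of two 16-bit units. A minimal sketch of that conversion with `std::u16string` follows. `encodeUtf16Sketch` and `decodeUtf16Sketch` are hypothetical names, not the library's functions, and `char32_t` is assumed as the code point type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: encodes code points as UTF-16 code units.
std::u16string encodeUtf16Sketch(const std::vector<char32_t>& codes) {
    std::u16string out;
    for (const char32_t code : codes) {
        if (code < 0x10000) {
            out += static_cast<char16_t>(code);
        } else if (code <= 0x10FFFF) {
            // Split the 20-bit offset into a high and a low surrogate.
            const char32_t offset = code - 0x10000;
            out += static_cast<char16_t>(0xD800 + (offset >> 10));
            out += static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
        } else {
            throw std::invalid_argument("code point out of range");
        }
    }
    return out;
}

// Hypothetical helper: decodes UTF-16 code units into code points.
std::vector<char32_t> decodeUtf16Sketch(const std::u16string& text) {
    std::vector<char32_t> codes;
    std::size_t i = 0;
    while (i < text.size()) {
        const char16_t unit = text[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: a low surrogate must follow.
            if (i + 1 >= text.size()) {
                throw std::invalid_argument("truncated surrogate pair");
            }
            const char16_t low = text[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) {
                throw std::invalid_argument("invalid low surrogate");
            }
            codes.push_back(static_cast<char32_t>(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)));
            i += 2;
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::invalid_argument("unexpected low surrogate");
        } else {
            codes.push_back(unit);
            i += 1;
        }
    }
    return codes;
}
```

For example, U+1F600 encodes to the pair `D83D DE00`, while U+0041 stays a single unit.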