Unicode Char
UChar

Basic Information

General Category

Get the general category of a character, or its coarser base category:

#include <cassert>
#include "unicode_char.h"
using namespace unicode;
assert(getGeneralCategory(0x370) == GeneralCategory::Lu);
assert(getBaseGeneralCategory(0x21) == BaseGeneralCategory::P);

Canonical Combining Classes

Canonical combining classes are returned as plain integers, since the Unicode standard already defines them numerically:

#include <cassert>
#include "unicode_char.h"
using namespace unicode;
assert(getCanonicalCombiningClass(0x302C) == 232);

Bidirectional Category

#include <cassert>
#include "unicode_char.h"
using namespace unicode;
assert(getBidirectionalCategory(0x0030) == BidirectionalCategory::EN);

Decimal/Digit/Numeric Values

For a character that has numeric properties, the decimal digit value, digit value, numeric value, and numeric fraction can all be queried:

#include <cassert>
#include <cmath>
#include "unicode_char.h"
using namespace unicode;
UChar code = static_cast<UChar>('3');
assert(getDecimalDigitValue(code) == 3);
assert(getDigitValue(code) == 3);
assert(std::fabs(getNumericValue(code) - 3.0) <= 1e-8);
auto fraction = getNumericFraction(code);
assert(fraction.first == 3);
assert(fraction.second == 1);

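Numeric values can also be negative fractions. U+0F33 (TIBETAN DIGIT HALF ZERO) represents -1/2:

#include <cassert>
#include <cmath>
#include "unicode_char.h"
using namespace unicode;
UChar code = 0x0F33;
assert(std::fabs(getNumericValue(code) - (-0.5)) <= 1e-8);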
auto fraction = getNumericFraction(code);
assert(fraction.first == -1);
assert(fraction.second == 2);
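
The pair returned by getNumericFraction holds a numerator and a denominator, so the floating-point value can be recovered from it when needed. The following is a minimal sketch that does not depend on the library; the helper names fractionToDouble and fractionExample are hypothetical, and treating a zero denominator as "not a number" is an assumption made for this example only:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of the library): convert a
// (numerator, denominator) pair into a floating-point value.
inline double fractionToDouble(const std::pair<std::int64_t, std::int64_t>& fraction) {
    // Assumption of this sketch: a zero denominator means "no numeric value".
    if (fraction.second == 0) {
        return std::nan("");
    }
    return static_cast<double>(fraction.first) /
           static_cast<double>(fraction.second);
}

// Usage: (-1, 2) is the pair shown above for U+0F33.
inline void fractionExample() {
    const std::pair<std::int64_t, std::int64_t> half{-1, 2};
    assert(std::fabs(fractionToDouble(half) - (-0.5)) <= 1e-8);
}
```

Keeping the exact fraction around avoids rounding for values such as 1/3, and the conversion to double can be deferred until a floating-point value is actually required.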

Mirrored

#include <cassert>
#include "unicode_char.h"
using namespace unicode;
assert(isMirrored(0x3C));

Upper/Lower/Title Cases

#include <cassert>
#include "unicode_char.h"
using namespace unicode;
assert(getUpperCase(static_cast<UChar>('a')) == static_cast<UChar>('A'));
assert(getLowerCase(static_cast<UChar>('A')) == static_cast<UChar>('a'));
assert(getTitleCase(static_cast<UChar>('a')) == static_cast<UChar>('A'));

Decomposition Mapping

U+3250 is the PARTNERSHIP SIGN. Its decomposition tag is <square> and its decomposition mapping is the three letters PTE:

#include <cassert>
#include "unicode_char.h"
using namespace unicode;
assert(getDecompositionMappingTag(0x3250) == DecompositionMappingTag::SQUARE);
// Overload returning the mapping as a vector
auto decomposition = getDecompositionMapping(0x3250);
assert(decomposition.size() == 3u);
assert(decomposition[0] == 0x50);
assert(decomposition[1] == 0x54);
assert(decomposition[2] == 0x45);
// Overload writing into a caller-provided buffer, followed by a 0 terminator
UChar buffer[16];
getDecompositionMapping(0x3250, buffer);
assert(buffer[0] == 0x50);
assert(buffer[1] == 0x54);
assert(buffer[2] == 0x45);
assert(buffer[3] == 0);

Encoding & Decoding

Code points can be converted to UTF-8 or UTF-16 strings, and vice versa.

The following example converts a UTF-8 encoded string to code points:

#include <iostream>
#include "unicode_char.h"
auto codes = unicode::fromUTF8("你好，世界！");
std::cout << std::hex;
for (auto code : codes) {
    std::cout << "0x" << code << " ";
}
std::cout << std::endl;
// Expected output: "0x4f60 0x597d 0xff0c 0x4e16 0x754c 0xff01 "

The code points can be converted back to a UTF-8 string:

auto str = unicode::toUTF8(codes);
std::cout << str << std::endl;
// Expected output: "你好，世界！"
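
To illustrate what a code-point-to-UTF-8 conversion involves, here is a minimal self-contained sketch of the standard UTF-8 bit layout. It is not the library's implementation: CodePoint, appendUtf8, encodeUtf8 and utf8Example are hypothetical names, and replacing invalid input with U+FFFD is a choice made for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a code point type; the library's UChar may differ.
using CodePoint = std::uint32_t;

// Append the UTF-8 encoding of one code point to `out`.
// Assumption of this sketch: surrogates and values above U+10FFFF are
// replaced with U+FFFD (REPLACEMENT CHARACTER).
inline void appendUtf8(CodePoint cp, std::string& out) {
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        cp = 0xFFFD;
    }
    if (cp < 0x80) {           // 1 byte:  0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {   // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) { // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                   // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Encode a whole sequence of code points as a UTF-8 string.
inline std::string encodeUtf8(const std::vector<CodePoint>& codes) {
    std::string out;
    for (CodePoint cp : codes) {
        appendUtf8(cp, out);
    }
    return out;
}

// Usage: U+4F60 U+597D are the first two code points of the example above.
inline void utf8Example() {
    assert(encodeUtf8({0x4F60, 0x597D}) == "\xE4\xBD\xA0\xE5\xA5\xBD");
}
```

The expected bytes are written as hex escapes so that the check does not depend on the encoding of the source file.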

The same applies to UTF-16 strings, except that the corresponding functions use std::u16string for input and output.
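
The main difference from UTF-8 is that code points above U+FFFF are stored in UTF-16 as a surrogate pair of two 16-bit units. The sketch below shows that mapping in both directions without using the library. The names CodePoint, encodeUtf16 and decodeUtf16 are hypothetical, and the sketch deliberately skips validation: unpaired surrogates are passed through unchanged and input is assumed to be at most U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a code point type; the library's UChar may differ.
using CodePoint = std::uint32_t;

// Encode code points as UTF-16 (assumes every value is <= U+10FFFF).
inline std::u16string encodeUtf16(const std::vector<CodePoint>& codes) {
    std::u16string out;
    for (CodePoint cp : codes) {
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one 16-bit unit
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split the 20-bit offset into a surrogate pair
            const CodePoint v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}

// Decode UTF-16 into code points; unpaired surrogates are copied as-is.
inline std::vector<CodePoint> decodeUtf16(const std::u16string& str) {
    std::vector<CodePoint> codes;
    for (std::size_t i = 0; i < str.size(); ++i) {
        const CodePoint unit = str[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < str.size()) {
            const CodePoint next = str[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                codes.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        codes.push_back(unit);
    }
    return codes;
}
```

For example, U+1F600 becomes the two units 0xD83D 0xDE00, while U+4F60 stays a single unit.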
