Rezha Julio

The Hard Coded Chemist

Unicode Character Database at Your Hand


Python’s self-explanatory unicodedata module gives you access to the Unicode Character Database, and through it to the properties defined for every character.

Look up a character by name with lookup:

>>> import unicodedata
>>> unicodedata.lookup('RIGHT SQUARE BRACKET')
']'
>>> three_wise_monkeys = ["SEE-NO-EVIL MONKEY",
...                       "HEAR-NO-EVIL MONKEY",
...                       "SPEAK-NO-EVIL MONKEY"]
>>> ''.join(map(unicodedata.lookup, three_wise_monkeys))
'🙈🙉🙊'

Get a character’s name with name:
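As a minimal sketch of what that might look like (the variable names and the 'UNNAMED' default are illustrative choices, not part of the original post), name is the inverse of lookup and accepts an optional fallback for characters that have no name:

```python
import unicodedata

# name() maps a single character back to its official Unicode name.
bracket_name = unicodedata.name(']')
monkey_name = unicodedata.name('\U0001F648')

# Control characters such as '\n' have no name; without a default,
# name() raises ValueError. 'UNNAMED' is an arbitrary placeholder.
newline_name = unicodedata.name('\n', 'UNNAMED')

print(bracket_name)   # RIGHT SQUARE BRACKET
print(monkey_name)    # SEE-NO-EVIL MONKEY
print(newline_name)   # UNNAMED
```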


Get the category of a character:

>>> unicodedata.category('X')
'Lu'
# L = letter, u = uppercase
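To see how the two-letter codes vary, here is a small sketch that runs category over a handful of sample characters. The sample list and the dictionary name are assumptions made for illustration; the first letter of each code is the major class (letter, number, punctuation, symbol, separator) and the second letter refines it.

```python
import unicodedata

# Arbitrary sample characters chosen for illustration.
samples = ['X', 'x', '7', ' ', '!', '€']

# Map each character to its general category code.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, code in categories.items():
    print(repr(ch), code)
# 'X' Lu  (letter, uppercase)
# 'x' Ll  (letter, lowercase)
# '7' Nd  (number, decimal digit)
# ' ' Zs  (separator, space)
# '!' Po  (punctuation, other)
# '€' Sc  (symbol, currency)
```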

The unicodedata module also makes it easy to normalize Unicode strings, for example to strip accents:

>>> import unicodedata
>>> data = 'ïnvéntìvé'
>>> unicodedata.normalize('NFKD', data).encode('ascii', 'ignore')
b'inventive'

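NFKD stands for Normalization Form Compatibility Decomposition: each character is replaced by its compatibility decomposition (so an accented letter becomes a base letter followed by combining marks), and runs of combining characters are put into a defined order. Encoding to ASCII with 'ignore' then drops the combining marks, along with any other non-ASCII characters.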
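The sketch below illustrates the decomposition step on its own and shows an alternative way to drop accents. The helper strip_accents is hypothetical (it is not part of unicodedata); it filters out combining marks with unicodedata.combining instead of encoding to ASCII, so non-ASCII letters without accents are kept rather than discarded.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', text)
    # combining() returns 0 for base characters and non-zero for combining marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

# 'é' (U+00E9) decomposes into 'e' followed by COMBINING ACUTE ACCENT (U+0301).
decomposed_e = unicodedata.normalize('NFKD', '\u00e9')

# The 'fi' ligature (U+FB01) only has a compatibility decomposition, so
# NFD leaves it alone while NFKD splits it into two plain letters.
ligature = '\ufb01'
canonical = unicodedata.normalize('NFD', ligature)
compatibility = unicodedata.normalize('NFKD', ligature)

print(strip_accents('ïnvéntìvé'))  # inventive
print(strip_accents('naïve 日本'))  # naive 日本
```

Filtering on combining marks is a design choice: it is gentler than encode('ascii', 'ignore'), which would also remove the Japanese characters in the second call.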

To get the version of the Unicode Database currently used:

>>> unicodedata.unidata_version

Read about the module's other functions in the Python documentation.