Python’s aptly named unicodedata module provides access to the Unicode Character Database (UCD), and with it the properties of every character.
Look up a character by name with lookup:
>>> import unicodedata
>>> unicodedata.lookup('RIGHT SQUARE BRACKET')
']'
>>> three_wise_monkeys = ["SEE-NO-EVIL MONKEY", "HEAR-NO-EVIL MONKEY", "SPEAK-NO-EVIL MONKEY"]
>>> ''.join(map(unicodedata.lookup, three_wise_monkeys))
'🙈🙉🙊'
Get a character’s name with name:
>>> unicodedata.name('~')
'TILDE'
Get the category of a character:
>>> unicodedata.category('X')  # L = letter, u = uppercase
'Lu'
The unicodedata module also makes it easy to normalize Unicode strings (for example, to strip accents):
>>> import unicodedata
>>> data = 'ïnvéntìvé'
>>> normal = unicodedata.normalize('NFKD', data).encode('ASCII', 'ignore')
>>> print(normal)
b'inventive'
NFKD stands for Normalization Form Compatibility Decomposition: characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order. Encoding to ASCII with 'ignore' then drops every non-ASCII character, including the now-separate combining marks, so only the base letters remain (as a bytes object).
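The ASCII round trip above also discards any non-ASCII letter that has no decomposition. As a minimal sketch of an alternative, the decomposed string can be filtered with unicodedata.combining, which returns 0 for characters that are not combining marks. The helper name strip_accents is hypothetical and used here for illustration only.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize('NFKD', text)
    # combining() returns 0 for base characters and non-zero for combining marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


# A precomposed e-acute (U+00E9) decomposes into 'e' plus a combining accent.
print(len('\u00e9'), len(unicodedata.normalize('NFKD', '\u00e9')))  # 1 2

# Compatibility decomposition also expands the 'fi' ligature (U+FB01).
print(unicodedata.normalize('NFKD', '\ufb01'))  # fi

# The result stays a str, and letters without a decomposition (here U+00DF) survive.
print(strip_accents('\u00efnv\u00e9nt\u00ecv\u00e9 \u00df'))  # inventive ß
```

Whether this is preferable depends on the goal: the encode('ASCII', 'ignore') approach guarantees pure ASCII bytes, while this sketch only removes the marks and keeps everything else.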
To get the version of the Unicode Database currently used:
>>> unicodedata.unidata_version
'8.0.0'
The exact value depends on your Python version. Read about the module’s other functions in the Python documentation.