brenhamp / regex_japanese_tutorial.MD
Last active April 25, 2023 17:06

Regular Expressions and Japanese

In our modules, we have covered how to read and write regular expressions that use numbers, special characters, and letters from the Roman alphabet (A-Z). But what about characters from other languages? Regular expressions can match those too, and there are different ways to do it!

Most writing systems have their own ranges in Unicode, so you can usually take the first and last code points of a range and use them as a character class, as long as your regex engine supports Unicode. Japanese is more complicated: it uses three different "alphabets," one of which is borrowed from Chinese. Additionally, Japanese writers will sometimes use English words when they are talking about something specific, quoting something written in English, or describing something that has no good translation. So how can we account for all of this? Regex is flexible enough to cover it all!

A Quick Breakdown of Japanese Characters

You don't need to know Japanese to understand this tutorial, but you'll benefit from having some small idea of the different types of Japanese