Last active: December 1, 2021 03:52
normalized_and_casefold_NFKD_in_bigquery
Reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#normalize_and_casefold
-- Compare the four Unicode normalization modes, with and without case folding.
-- The trailing comment on each input row shows its NFKC/NFKD + casefold result.
with subset as (
  select '123−4567' as x union all -- 123−4567
  select '㌶' as x union all -- ヘクタール
  select '㈱㈲' as x union all -- (株)(有)
  select 'アイウエオABCabc①' as x union all -- アイウエオabcabc1
  select '〒〠〶' as x union all -- 〒〠〒
  select '()()' as x union all -- ()()
  select 'ABCabc' as x
)
select x,
  NORMALIZE_AND_CASEFOLD(x, NFC) as fixed_x_NFC,
  NORMALIZE_AND_CASEFOLD(x, NFKC) as fixed_x_NFKC,
  NORMALIZE_AND_CASEFOLD(x, NFD) as fixed_x_NFD,
  NORMALIZE_AND_CASEFOLD(x, NFKD) as fixed_x_NFKD,
  NORMALIZE(x, NFKD) as fixed_x_NFKD_without_casefold
from subset
Comment by u110 (author), Dec 1, 2021:
Row | x | fixed_x_NFC | fixed_x_NFKC | fixed_x_NFD | fixed_x_NFKD | fixed_x_NFKD_without_casefold
---|---|---|---|---|---|---
1 | 123−4567 | 123−4567 | 123−4567 | 123−4567 | 123−4567 | 123−4567
2 | ㌶ | ㌶ | ヘクタール | ㌶ | ヘクタール | ヘクタール
3 | ㈱㈲ | ㈱㈲ | (株)(有) | ㈱㈲ | (株)(有) | (株)(有)
4 | アイウエオABCabc① | アイウエオabcabc① | アイウエオabcabc1 | アイウエオabcabc① | アイウエオabcabc1 | アイウエオABCabc1
5 | 〒〠〶 | 〒〠〶 | 〒〠〒 | 〒〠〶 | 〒〠〒 | 〒〠〒
6 | ()() | ()() | ()() | ()() | ()() | ()()
7 | ABCabc | abcabc | abcabc | abcabc | abcabc | ABCabc
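The table shows that the canonical modes (NFC, NFD) only lower-case the input, while the compatibility modes (NFKC, NFKD) also fold characters such as ①, ㌶ and ㈱. A typical use of this is case- and width-insensitive comparison. The query below is a minimal sketch of that, assuming a hypothetical full-width input 'ＡＢＣ①' that is not one of the rows above.

```sql
-- Sketch: compare a full-width, upper-case string with its ASCII, lower-case form.
-- 'ＡＢＣ①' is a hypothetical test value, not part of the original data.
select
  -- Plain comparison: the code points differ, so this is false.
  'ＡＢＣ①' = 'abc1' as plain_equal,
  -- NFC + casefold lower-cases to full-width 'ａｂｃ' and keeps ①, so still no match.
  NORMALIZE_AND_CASEFOLD('ＡＢＣ①', NFC) = NORMALIZE_AND_CASEFOLD('abc1', NFC) as equal_nfc,
  -- NFKC + casefold also maps full-width letters to ASCII and ① to 1, so this matches.
  NORMALIZE_AND_CASEFOLD('ＡＢＣ①', NFKC) = NORMALIZE_AND_CASEFOLD('abc1', NFKC) as equal_nfkc
```

Consistent with row 4 of the table, plain_equal and equal_nfc should come back false and equal_nfkc true. Choose NFKC (or NFKD) only when treating compatibility variants as equal is actually what you want, because the folding cannot be undone.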