Text document containing all characters of the Multilingual European Subsets of Unicode and some other common Unicode subsets (and a small Java program to verify the file has not been garbled)
Common Unicode Subsets
ASCII
Not exactly known as a Unicode subset; the Unicode character set starts with
ASCII, though, so ASCII is the smallest widely used subset of Unicode:
|Latin uppercase letters |0041-5A(26)|ABCDEFGHIJKLMNOPQRSTUVWXYZ|-----|
|Latin lowercase letters |0061-7A(26)|abcdefghijklmnopqrstuvwxyz|-----|
|Decimal digits |0030-39(10)|0123456789|---------------------|
|Symbols and special characters |0020-2F(16)| !"#$%&'()*+,-./|---------------|
|'-> |003A-40,5B-60,7B-7E(17)|:;<=>?@[\]^_`{|}~|--------------|
ASCII is defined as:
>00 20-7E
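Each definition line starts with `>` and the high byte of a 256-character block. It is followed by low bytes or inclusive `from-to` ranges of low bytes, so `>00 20-7E` stands for U+0020 to U+007E. The minimal sketch below expands such lines into a `java.util.BitSet` of code points. The class `SubsetDefinition` and its `parse` method are hypothetical names and are not part of the verifier at the end of this file.

```java
import java.util.BitSet;

/** Hypothetical helper: expands definition lines such as ">00 20-7E" into code points. */
final class SubsetDefinition {

    private static final String SYNTAX = ">[0-9A-F]{2}( [0-9A-F]{2}(-[0-9A-F]{2})?)+";

    /** Returns the set of code points covered by all given definition lines. */
    static BitSet parse(String... lines) {
        BitSet codePoints = new BitSet(0x10000);
        for (String line : lines) {
            if (!line.matches(SYNTAX))
                throw new IllegalArgumentException("Invalid definition: " + line);
            String[] tokens = line.substring(1).split(" ");
            // The first token is the high byte, i.e. the block U+xx00..U+xxFF.
            int base = Integer.parseInt(tokens[0], 16) << 8;
            for (int t = 1; t < tokens.length; t++) {
                // Every further token is a single low byte ("C7") or an inclusive range ("D8-DB").
                String[] bounds = tokens[t].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                int to = bounds.length == 2 ? Integer.parseInt(bounds[1], 16) : from;
                codePoints.set(base + from, base + to + 1); // the upper index is exclusive
            }
        }
        return codePoints;
    }

    public static void main(String[] args) {
        System.out.println(parse(">00 20-7E").cardinality()); // prints 95
    }
}
```

A `BitSet` turns comparisons between subsets (union, intersection, difference) into single method calls.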
Multilingual European Subset 1 (MES-1)
On top of ASCII, this charset contains common Latin letters and symbols used
in Europe (or by European character sets):
|Latin-1 symbols |00A0-BF(32)| ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿|
|Latin-1 uppercase letters |00C0-DF(32)|ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß|
|Latin-1 lowercase letters |00E0-FF(32)|àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ|
|Latin extended |0100-13(20)|ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒē|-----------|
|'-> |0116-2B(22)|ĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪī|---------|
|'-> |012E-4D(32)|ĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌō|
|'-> |0150-67(24)|ŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧ|-------|
|'-> |0168-7E(23)|ŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽž|--------|
|Accents |02C7-C7(01)|ˇ|------------------------------|
|'-> |02D8-DB,DD-DD(05)|˘˙˚˛˝|--------------------------|
|Typographic special characters |2015-15(01)|―|------------------------------|
|'-> |2018-19,1C-1D(04)|‘’“”|---------------------------|
|Euro symbol |20AC-AC(01)|€|------------------------------|
|Trademark symbol |2122-22(01)|™|------------------------------|
|Ohm symbol |2126-26(01)|Ω|------------------------------|
|Vulgar fractions |215B-5E(04)|⅛⅜⅝⅞|---------------------------|
|Arrow symbols |2190-93(04)|←↑→↓|---------------------------|
|Musical note symbol |266A-6A(01)|♪|------------------------------|
MES-1 is defined as:
>00 20-7E A0-FF
>01 00-13 16-2B 2E-4D 50-7E
>02 C7 D8-DB DD
>20 15 18-19 1C-1D AC
>21 22 26 5B-5E 90-93
>26 6A
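With the definition above, you can check whether a text uses only MES-1 characters by testing each of its code points. The sketch below is self-contained. The class name `Mes1Check` and its methods are hypothetical, and the definition lines are copied verbatim from the text.

```java
import java.util.BitSet;

/** Hypothetical sketch: tests whether a string only uses MES-1 characters. */
final class Mes1Check {

    // The MES-1 definition lines, copied from the text.
    static final String[] MES_1 = {
        ">00 20-7E A0-FF", ">01 00-13 16-2B 2E-4D 50-7E", ">02 C7 D8-DB DD",
        ">20 15 18-19 1C-1D AC", ">21 22 26 5B-5E 90-93", ">26 6A"
    };

    /** Expands definition lines (">hh ll ll-ll ...") into a set of code points. */
    static BitSet parse(String... lines) {
        BitSet set = new BitSet(0x10000);
        for (String line : lines) {
            String[] tokens = line.substring(1).split(" ");
            int base = Integer.parseInt(tokens[0], 16) << 8;   // high byte of the block
            for (int t = 1; t < tokens.length; t++) {
                String[] bounds = tokens[t].split("-");         // "C7" or inclusive "D8-DB"
                int from = Integer.parseInt(bounds[0], 16);
                int to = bounds.length == 2 ? Integer.parseInt(bounds[1], 16) : from;
                set.set(base + from, base + to + 1);            // upper index is exclusive
            }
        }
        return set;
    }

    /** True if every code point of the text is contained in the subset. */
    static boolean covers(BitSet subset, String text) {
        return text.codePoints().allMatch(subset::get);
    }

    public static void main(String[] args) {
        BitSet mes1 = parse(MES_1);
        System.out.println(mes1.cardinality());                                  // prints 335
        System.out.println(covers(mes1, "Gr\u00fc\u00dfe, 5 \u20ac"));           // "Grüße, 5 €": prints true
        System.out.println(covers(mes1, "\u0391\u03b8\u03ae\u03bd\u03b1"));      // "Αθήνα": prints false
    }
}
```

The count of 335 matches the number of characters in the ASCII and MES-1 tables above combined (95 + 240).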
Multilingual European Subset 2 (MES-2)
On top of MES-1, this subset contains more "exotic" European Latin characters,
Greek and Cyrillic letters, and more symbols:
|Latin extended |0114-15(02)|Ĕĕ|-----------------------------|
|'-> |012C-2D,4E-4F(04)|ĬĭŎŏ|---------------------------|
|'-> |0192-92,FA-FF(07)|ƒǺǻǼǽǾǿ|------------------------|
|'-> |1E80-85,F2-F3(08)|ẀẁẂẃẄẅỲỳ|-----------------------|
|'-> (*) |01DE-EF(18)|ǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯ|-------------|
|'-> (*) |0218-1B,1E-1F(06)|ȘșȚțȞȟ|-------------------------|
|'-> (*) |1E02-03,0A-0B,1E-1F,40-41(08)|ḂḃḊḋḞḟṀṁ|-----------------------|
|'-> (*) |1E56-57,60-61,6A-6B(06)|ṖṗṠṡṪṫ|-------------------------|
|More exotic Latin letters |017F-7F(01)|ſ|------------------------------|
|'-> (*) |018F-8F,B7-B7(02)|ƏƷ|-----------------------------|
|'-> (*) |0259-59,7C-7C,92-92(03)|əɼʒ|----------------------------|
|'-> (*) |1E9B-9B(01)|ẛ|------------------------------|
|Latin Modifier letters |02C6-C6(01)|ˆ|------------------------------|
|'-> |02C9-C9,DC-DC(02)|ˉ˜|-----------------------------|
|'-> (*) |02BB-BD,EE-EE(04)|ʻʼʽˮ|---------------------------|
|Greek uppercase letters |0391-A1(17)|ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ|--------------|
|'-> |03A3-A9(07)|ΣΤΥΦΧΨΩ|------------------------|
|Greek lowercase letters |03B1-C9(25)|αβγδεζηθικλμνξοπρςστυφχψω|------|
|Greek extended |0384-8A(07)|΄΅Ά·ΈΉΊ|------------------------|
|'-> |038C-8C,8E-90(04)|ΌΎΏΐ|---------------------------|
|'-> |03AA-B0,CA-CE(12)|ΪΫάέήίΰϊϋόύώ|-------------------|
|'-> (*) |0374-75,7A-7A,7E-7E(04)|ʹ͵ͺ;|---------------------------|
|'-> (*) |03D7-D7,DA-E1(09)|ϗϚϛϜϝϞϟϠϡ|----------------------|
|'-> (*) |1F00-15,18-1D(28)|ἀἁἂἃἄἅἆἇἈἉἊἋἌἍἎἏἐἑἒἓἔἕἘἙἚἛἜἝ|---|
|'-> (*) |1F20-3F(32)|ἠἡἢἣἤἥἦἧἨἩἪἫἬἭἮἯἰἱἲἳἴἵἶἷἸἹἺἻἼἽἾἿ|
|'-> (*) |1F40-45,48-4D,50-57(20)|ὀὁὂὃὄὅὈὉὊὋὌὍὐὑὒὓὔὕὖὗ|-----------|
|'-> (*) |1F59-59,5B-5B,5D-5D(03)|ὙὛὝ|----------------------------|
|'-> (*) |1F5F-7D(31)|ὟὠὡὢὣὤὥὦὧὨὩὪὫὬὭὮὯὰάὲέὴήὶίὸόὺύὼώ||
|'-> (*) |1F80-9F(32)|ᾀᾁᾂᾃᾄᾅᾆᾇᾈᾉᾊᾋᾌᾍᾎᾏᾐᾑᾒᾓᾔᾕᾖᾗᾘᾙᾚᾛᾜᾝᾞᾟ|
|'-> (*) |1FA0-B4(21)|ᾠᾡᾢᾣᾤᾥᾦᾧᾨᾩᾪᾫᾬᾭᾮᾯᾰᾱᾲᾳᾴ|----------|
|'-> (*) |1FB6-C4,C6-D3(29)|ᾶᾷᾸᾹᾺΆᾼ᾽ι᾿῀῁ῂῃῄῆῇῈΈῊΉῌ῍῎῏ῐῑῒΐ|--|
|'-> (*) |1FD6-DB,DD-EF(25)|ῖῗῘῙῚΊ῝῞῟ῠῡῢΰῤῥῦῧῨῩῪΎῬ῭΅`|------|
|'-> (*) |1FF2-F4,F6-FE(12)|ῲῳῴῶῷῸΌῺΏῼ´῾|-------------------|
|Cyrillic letters |0400-1F(32)|ЀЁЂЃЄЅІЇЈЉЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОП|
|'-> |0420-3F(32)|РСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмноп|
|'-> |0440-5F(32)|рстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ|
|'-> |0490-91(02)|Ґґ|-----------------------------|
|'-> (*) |0492-B1(32)|ҒғҔҕҖҗҘҙҚқҜҝҞҟҠҡҢңҤҥҦҧҨҩҪҫҬҭҮүҰұ|
|'-> (*) |04B2-C4,C7-C8(21)|ҲҳҴҵҶҷҸҹҺһҼҽҾҿӀӁӂӃӄӇӈ|----------|
|'-> (*) |04CB-CC,D0-EB(30)|ӋӌӐӑӒӓӔӕӖӗӘәӚӛӜӝӞӟӠӡӢӣӤӥӦӧӨөӪӫ|-|
|'-> (*) |04EE-F5,F8-F9(10)|ӮӯӰӱӲӳӴӵӸӹ|---------------------|
|Typographic symbols |2013-14(02)|–—|-----------------------------|
|'-> |2017-17,1A-1B,1E-1E,20-22(07)|‗‚‛„†‡•|------------------------|
|'-> |2026-26,30-30,32-33,39-3A(06)|…‰′″‹›|-------------------------|
|'-> |203C-3C,3E-3E,44-44,7F-7F(04)|‼‾⁄ⁿ|---------------------------|
|'-> (*) |204A-4A,82-82(02)|⁊₂|-----------------------------|
|Currency symbols |20A3-A4(02)|₣₤|-----------------------------|
|'-> |20A7-A7(01)|₧|------------------------------|
|'-> (*) |20AF-AF(01)|₯|------------------------------|
|Business symbols |2105-05(01)|℅|------------------------------|
|'-> |2116-16(01)|№|------------------------------|
|Arrow symbols |2194-95(02)|↔↕|-----------------------------|
|'-> |21A8-A8(01)|↨|------------------------------|
|Mathematical symbols |2202-02(01)|∂|------------------------------|
|'-> |2206-06,0F-0F,11-12,19-1A(06)|∆∏∑−∙√|-------------------------|
|'-> |221E-1F,29-29,2B-2B(04)|∞∟∩∫|---------------------------|
|'-> |2248-48,60-61,64-65(05)|≈≠≡≤≥|--------------------------|
|'-> |2302-02,10-10,20-21(04)|⌂⌐⌠⌡|---------------------------|
|'-> (*) |2200-00,03-03,08-09(04)|∀∃∈∉|---------------------------|
|'-> (*) |2227-28,2A-2A,59-59(04)|∧∨∪≙|---------------------------|
|'-> (*) |2282-83,95-95,97-97(04)|⊂⊃⊕⊗|---------------------------|
|'-> (*) |2329-2A(02)|〈〉|-----------------------------|
|Box drawing characters |2500-00(01)|─|------------------------------|
|'-> |2502-02,0C-0C,10-10(03)|│┌┐|----------------------------|
|'-> |2514-14,18-18,1C-1C(03)|└┘├|----------------------------|
|'-> |2524-24,2C-2C,34-34,3C-3C(04)|┤┬┴┼|---------------------------|
|'-> |2550-6C(29)|═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥╦╧╨╩╪╫╬|--|
|Block graphic characters |2580-80(01)|▀|------------------------------|
|'-> |2584-84,88-88,8C-8C(03)|▄█▌|----------------------------|
|'-> |2590-93(04)|▐░▒▓|---------------------------|
|Shapes |25A0-A0(01)|■|------------------------------|
|'-> |25AC-AC(01)|▬|------------------------------|
|'-> |25B2-B2,BA-BA,BC-BC,C4-C4(04)|▲►▼◄|---------------------------|
|'-> |25CA-CB,D8-D9(04)|◊○◘◙|---------------------------|
|Miscellaneous symbols |263A-3C(03)|☺☻☼|----------------------------|
|'-> |2640-40,42-42(02)|♀♂|-----------------------------|
|'-> |2660-60,63-63,65-66(04)|♠♣♥♦|---------------------------|
|'-> |266B-6B(01)|♫|------------------------------|
|Ligatures |FB01-02(02)|ﬁﬂ|-----------------------------|
|Replacement character (*) |FFFD-FD(01)|�|------------------------------|
MES-2 is defined as:
>00 20-7E A0-FF
>01 00-7F 8F 92 B7 DE-EF FA-FF
>02 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
>03 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
>04 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
>1E 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB
>1F DD-EF F2-F4 F6-FE
>20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
>21 05 16 22 26 5B-5E 90-95 A8
>22 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
>23 02 10 20-21 29-2A
>25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4
>25 CA-CB D8-D9
>26 3A-3C 40 42 60 63 65-66 6A-6B
>FB 01-02
>FF FD
Windows Glyph List 4 (WGL-4)
A superset of MES-1 and mostly a subset of MES-2 (everything not marked with
(*) above), plus a few additional characters. Microsoft defined it as the
character set that is supposed to be displayable on all major Windows versions
without installing additional fonts.
|Special letters |2113-13(01)|ℓ|------------------------------|
|'-> |212E-2E(01)|℮|------------------------------|
|Special symbols |2215-15(01)|∕|------------------------------|
|'-> |25A1-A1,AA-AB,CF-CF,E6-E6(05)|□▪▫●◦|--------------------------|
WGL-4 is defined as:
>00 20-7E A0-FF
>01 00-7F 92 FA-FF
>02 C6-C7 C9 D8-DD
>03 84-8A 8C 8E-A1 A3-CE
>04 00-5F 90-91
>1E 80-85 F2-F3
>20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 7F A3-A4 A7 AC
>21 05 13 16 22 26 2E 5B-5E 90-95 A8
>22 02 06 0F 11-12 15 19-1A 1E-1F 29 2B 48 60-61 64-65
>23 02 10 20-21
>25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0-A1 AA-AC B2 BA
>25 BC C4 CA-CB CF D8-D9 E6
>26 3A-3C 40 42 60 63 65-66 6A-6B
>FB 01-02
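You can check the description of WGL-4 as "mostly a subset of MES-2" against the two definitions. The sketch below computes the WGL-4 characters that MES-2 lacks. The class and method names are hypothetical. The definition lines are copied from the text, plus `>FF FD` for the replacement character that the MES-2 table lists.

```java
import java.util.BitSet;

/** Hypothetical sketch: lists the WGL-4 characters that are not part of MES-2. */
final class Wgl4VersusMes2 {

    static final String[] MES_2 = {
        ">00 20-7E A0-FF", ">01 00-7F 8F 92 B7 DE-EF FA-FF",
        ">02 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE",
        ">03 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1",
        ">04 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9",
        ">1E 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3",
        ">1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB",
        ">1F DD-EF F2-F4 F6-FE",
        ">20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF",
        ">21 05 16 22 26 5B-5E 90-95 A8",
        ">22 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97",
        ">23 02 10 20-21 29-2A",
        ">25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4",
        ">25 CA-CB D8-D9", ">26 3A-3C 40 42 60 63 65-66 6A-6B", ">FB 01-02",
        ">FF FD" // U+FFFD, "Replacement character (*)" in the MES-2 table
    };
    static final String[] WGL_4 = {
        ">00 20-7E A0-FF", ">01 00-7F 92 FA-FF", ">02 C6-C7 C9 D8-DD",
        ">03 84-8A 8C 8E-A1 A3-CE", ">04 00-5F 90-91", ">1E 80-85 F2-F3",
        ">20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 7F A3-A4 A7 AC",
        ">21 05 13 16 22 26 2E 5B-5E 90-95 A8",
        ">22 02 06 0F 11-12 15 19-1A 1E-1F 29 2B 48 60-61 64-65",
        ">23 02 10 20-21",
        ">25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0-A1 AA-AC B2 BA",
        ">25 BC C4 CA-CB CF D8-D9 E6", ">26 3A-3C 40 42 60 63 65-66 6A-6B", ">FB 01-02"
    };

    /** Expands definition lines (">hh ll ll-ll ...") into a set of code points. */
    static BitSet parse(String... lines) {
        BitSet set = new BitSet(0x10000);
        for (String line : lines) {
            String[] tokens = line.substring(1).split(" ");
            int base = Integer.parseInt(tokens[0], 16) << 8;
            for (int t = 1; t < tokens.length; t++) {
                String[] bounds = tokens[t].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                int to = bounds.length == 2 ? Integer.parseInt(bounds[1], 16) : from;
                set.set(base + from, base + to + 1);
            }
        }
        return set;
    }

    /** Code points defined by the first list of lines but not by the second. */
    static BitSet difference(String[] a, String[] b) {
        BitSet result = parse(a);
        result.andNot(parse(b));
        return result;
    }

    public static void main(String[] args) {
        BitSet extra = difference(WGL_4, MES_2);
        for (int cp = extra.nextSetBit(0); cp >= 0; cp = extra.nextSetBit(cp + 1))
            System.out.printf("U+%04X ", cp);
        System.out.println();
        // prints: U+2113 U+212E U+2215 U+25A1 U+25AA U+25AB U+25CF U+25E6
    }
}
```

The eight code points are exactly the characters of the WGL-4 table above (ℓ ℮ ∕ □ ▪ ▫ ● ◦). Swapping the arguments of `difference` lists the MES-2 characters marked (*) instead.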
Multilingual European Subset 3 (MES-3) and its variants
MES-3 contains even more characters. There are several versions of this subset:
MES-3A is an open subset (it may receive more characters as they are added to
the respective code ranges), so it is not included in this file.
MES-3B and MES-3KS are two fixed subsets. The latter omits some characters that
are not used by languages of European origin and is therefore shown first here
(as the difference from MES-2 and WGL-4):
|MES-3KS |0180-81(02)|ƀƁ|-----------------------------|
|'-> |018B-8C(02)|Ƌƌ|-----------------------------|
|'-> |0195-95(01)|ƕ|------------------------------|
|'-> |019A-9B(02)|ƚƛ|-----------------------------|
|'-> |019E-9F(02)|ƞƟ|-----------------------------|
|'-> |01A2-A3(02)|Ƣƣ|-----------------------------|
|'-> |01A6-A6(01)|Ʀ|------------------------------|
|'-> |01AA-AB(02)|ƪƫ|-----------------------------|
|'-> |01B5-B6(02)|Ƶƶ|-----------------------------|
|'-> |01B8-BB(04)|Ƹƹƺƻ|---------------------------|
|'-> |01BE-CC(15)|ƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌ|----------------|
|'-> |01D5-D6(02)|Ǖǖ|-----------------------------|
|'-> |01F0-F7(08)|ǰǱǲǳǴǵǶǷ|-----------------------|
|'-> |0200-17(24)|ȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗ|-------|
|'-> |021C-1D(02)|Ȝȝ|-----------------------------|
|'-> |0224-27(04)|ȤȥȦȧ|---------------------------|
|'-> |022A-33(10)|ȪȫȬȭȮȯȰȱȲȳ|---------------------|
|'-> |0250-58(09)|ɐɑɒɓɔɕɖɗɘ|----------------------|
|'-> |025A-79(32)|ɚɛɜɝɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹ|
|'-> |027A-7B(02)|ɺɻ|-----------------------------|
|'-> |027D-91(21)|ɽɾɿʀʁʂʃʄʅʆʇʈʉʊʋʌʍʎʏʐʑ|----------|
|'-> |0293-AD(27)|ʓʔʕʖʗʘʙʚʛʜʝʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭ|----|
|'-> |02B0-BA(11)|ʰʱʲʳʴʵʶʷʸʹʺ|--------------------|
|'-> |02BE-C5(08)|ʾʿˀˁ˂˃˄˅|-----------------------|
|'-> |02C8-C8(01)|ˈ|------------------------------|
|'-> |02CA-D7(14)|ˊˋˌˍˎˏːˑ˒˓˔˕˖˗|-----------------|
|'-> |02DE-ED(16)|˞˟ˠˡˢˣˤ˥˦˧˨˩˪˫ˬ˭|---------------|
|'-> |0300-1F(32)|̛̖̗̘̙̜̝̞̟̀́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̕̚|
|'-> |0320-3F(32)|̴̵̶̷̸̡̢̧̨̠̣̤̥̦̩̪̫̬̭̮̯̰̱̲̳̹̺̻̼̽̾̿|
|'-> |0340-4E(15)|͇͈͉͍͎̀́͂̓̈́͆͊͋͌ͅ|----------------|
|'-> |0360-62(03)|͢͠͡|----------------------------|
|'-> |03D0-D6(07)|ϐϑϒϓϔϕϖ|------------------------|
|'-> |03E2-F3(18)|ϢϣϤϥϦϧϨϩϪϫϬϭϮϯϰϱϲϳ|-------------|
|'-> |0460-7F(32)|ѠѡѢѣѤѥѦѧѨѩѪѫѬѭѮѯѰѱѲѳѴѵѶѷѸѹѺѻѼѽѾѿ|
|'-> |0480-86(07)|Ҁҁ҂҃҄҅҆|------------------------|
|'-> |0488-89(02)|҈҉|-----------------------------|
|'-> |048C-8F(04)|ҌҍҎҏ|---------------------------|
|'-> |04EC-ED(02)|Ӭӭ|-----------------------------|
|'-> |0531-50(32)|ԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՏՐ|
|'-> |0551-56(06)|ՑՒՓՔՕՖ|-------------------------|
|'-> |0559-5F(07)|ՙ՚՛՜՝՞՟|------------------------|
|'-> |0561-80(32)|աբգդեզէըթժիլխծկհձղճմյնշոչպջռսվտր|
|'-> |0581-87(07)|ցւփքօֆև|------------------------|
|'-> |0589-8A(02)|։֊|-----------------------------|
|'-> |10D0-EF(32)|აბგდევზთიკლმნოპჟრსტუფქღყშჩცძწჭხჯ|
|'-> |10F0-F6(07)|ჰჱჲჳჴჵჶ|------------------------|
|'-> |10FB-FB(01)|჻|------------------------------|
|'-> |1E00-01(02)|Ḁḁ|-----------------------------|
|'-> |1E04-09(06)|ḄḅḆḇḈḉ|-------------------------|
|'-> |1E0C-1D(18)|ḌḍḎḏḐḑḒḓḔḕḖḗḘḙḚḛḜḝ|-------------|
|'-> |1E20-3F(32)|ḠḡḢḣḤḥḦḧḨḩḪḫḬḭḮḯḰḱḲḳḴḵḶḷḸḹḺḻḼḽḾḿ|
|'-> |1E42-55(20)|ṂṃṄṅṆṇṈṉṊṋṌṍṎṏṐṑṒṓṔṕ|-----------|
|'-> |1E58-5F(08)|ṘṙṚṛṜṝṞṟ|-----------------------|
|'-> |1E62-69(08)|ṢṣṤṥṦṧṨṩ|-----------------------|
|'-> |1E6C-7F(20)|ṬṭṮṯṰṱṲṳṴṵṶṷṸṹṺṻṼṽṾṿ|-----------|
|'-> |1E86-9A(21)|ẆẇẈẉẊẋẌẍẎẏẐẑẒẓẔẕẖẗẘẙẚ|----------|
|'-> |2000-12(19)|           ​‌‍‎‏‐‑‒|------------|
|'-> |2016-16(01)|‖|------------------------------|
|'-> |201F-1F(01)|‟|------------------------------|
|'-> |2023-25(03)|‣․‥|----------------------------|
|'-> |2027-2F(09)|‧

‪‫‬‭‮ |----------------------|
‭The previous line contains line and paragraph separators as well as bidirectional formatting characters (including a right-to-left override), so it may look strange.
|'-> |2031-31(01)|‱|------------------------------|
|'-> |2034-38(05)|‴‵‶‷‸|--------------------------|
|'-> |203B-3B(01)|※|------------------------------|
|'-> |203D-3D(01)|‽|------------------------------|
|'-> |203F-43(05)|‿⁀⁁⁂⁃|--------------------------|
|'-> |2045-46(02)|⁅⁆|-----------------------------|
|'-> |2048-49(02)|⁈⁉|-----------------------------|
|'-> |204B-4D(03)|⁋⁌⁍|----------------------------|
|'-> |206A-70(07)|⁰|------------------------|
|'-> |2074-7E(11)|⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾|--------------------|
|'-> |2080-81(02)|₀₁|-----------------------------|
|'-> |2083-8E(12)|₃₄₅₆₇₈₉₊₋₌₍₎|-------------------|
|'-> |20A0-A2(03)|₠₡₢|----------------------------|
|'-> |20A5-A6(02)|₥₦|-----------------------------|
|'-> |20A8-AB(04)|₨₩₪₫|---------------------------|
|'-> |20AD-AE(02)|₭₮|-----------------------------|
|'-> |20D0-E3(20)|⃒⃓⃘⃙⃚⃐⃑⃔⃕⃖⃗⃛⃜⃝⃞⃟⃠⃡⃢⃣|-----------|
|'-> |2100-04(05)|℀℁ℂ℃℄|--------------------------|
|'-> |2106-12(13)|℆ℇ℈℉ℊℋℌℍℎℏℐℑℒ|------------------|
|'-> |2114-15(02)|℔ℕ|-----------------------------|
|'-> |2117-21(11)|℗℘ℙℚℛℜℝ℞℟℠℡|--------------------|
|'-> |2123-25(03)|℣ℤ℥|----------------------------|
|'-> |2127-2D(07)|℧ℨ℩KÅℬℭ|------------------------|
|'-> |212F-3A(12)|ℯℰℱℲℳℴℵℶℷℸℹ℺|-------------------|
|'-> |2153-5A(08)|⅓⅔⅕⅖⅗⅘⅙⅚|-----------------------|
|'-> |215F-7E(32)|⅟ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾ|
|'-> |217F-83(05)|ⅿↀↁↂↃ|--------------------------|
|'-> |2196-A7(18)|↖↗↘↙↚↛↜↝↞↟↠↡↢↣↤↥↦↧|-------------|
|'-> |21A9-C8(32)|↩↪↫↬↭↮↯↰↱↲↳↴↵↶↷↸↹↺↻↼↽↾↿⇀⇁⇂⇃⇄⇅⇆⇇⇈|
|'-> |21C9-E8(32)|⇉⇊⇋⇌⇍⇎⇏⇐⇑⇒⇓⇔⇕⇖⇗⇘⇙⇚⇛⇜⇝⇞⇟⇠⇡⇢⇣⇤⇥⇦⇧⇨|
|'-> |21E9-F3(11)|⇩⇪⇫⇬⇭⇮⇯⇰⇱⇲⇳|--------------------|
|'-> |2201-01(01)|∁|------------------------------|
|'-> |2204-05(02)|∄∅|-----------------------------|
|'-> |2207-07(01)|∇|------------------------------|
|'-> |220A-0E(05)|∊∋∌∍∎|--------------------------|
|'-> |2210-10(01)|∐|------------------------------|
|'-> |2213-14(02)|∓∔|-----------------------------|
|'-> |2216-18(03)|∖∗∘|----------------------------|
|'-> |221B-1D(03)|∛∜∝|----------------------------|
|'-> |2220-26(07)|∠∡∢∣∤∥∦|------------------------|
|'-> |222C-47(28)|∬∭∮∯∰∱∲∳∴∵∶∷∸∹∺∻∼∽∾∿≀≁≂≃≄≅≆≇|---|
|'-> |2249-58(16)|≉≊≋≌≍≎≏≐≑≒≓≔≕≖≗≘|---------------|
|'-> |225A-5F(06)|≚≛≜≝≞≟|-------------------------|
|'-> |2262-63(02)|≢≣|-----------------------------|
|'-> |2266-81(28)|≦≧≨≩≪≫≬≭≮≯≰≱≲≳≴≵≶≷≸≹≺≻≼≽≾≿⊀⊁|---|
|'-> |2284-94(17)|⊄⊅⊆⊇⊈⊉⊊⊋⊌⊍⊎⊏⊐⊑⊒⊓⊔|--------------|
|'-> |2296-96(01)|⊖|------------------------------|
|'-> |2298-B7(32)|⊘⊙⊚⊛⊜⊝⊞⊟⊠⊡⊢⊣⊤⊥⊦⊧⊨⊩⊪⊫⊬⊭⊮⊯⊰⊱⊲⊳⊴⊵⊶⊷|
|'-> |22B8-D7(32)|⊸⊹⊺⊻⊼⊽⊾⊿⋀⋁⋂⋃⋄⋅⋆⋇⋈⋉⋊⋋⋌⋍⋎⋏⋐⋑⋒⋓⋔⋕⋖⋗|
|'-> |22D8-F1(26)|⋘⋙⋚⋛⋜⋝⋞⋟⋠⋡⋢⋣⋤⋥⋦⋧⋨⋩⋪⋫⋬⋭⋮⋯⋰⋱|-----|
|'-> |2300-01(02)|⌀⌁|-----------------------------|
|'-> |2303-0F(13)|⌃⌄⌅⌆⌇⌈⌉⌊⌋⌌⌍⌎⌏|------------------|
|'-> |2311-1F(15)|⌑⌒⌓⌔⌕⌖⌗⌘⌙⌚⌛⌜⌝⌞⌟|----------------|
|'-> |2322-28(07)|⌢⌣⌤⌥⌦⌧⌨|------------------------|
|'-> |232B-4A(32)|⌫⌬⌭⌮⌯⌰⌱⌲⌳⌴⌵⌶⌷⌸⌹⌺⌻⌼⌽⌾⌿⍀⍁⍂⍃⍄⍅⍆⍇⍈⍉⍊|
|'-> |234B-6A(32)|⍋⍌⍍⍎⍏⍐⍑⍒⍓⍔⍕⍖⍗⍘⍙⍚⍛⍜⍝⍞⍟⍠⍡⍢⍣⍤⍥⍦⍧⍨⍩⍪|
|'-> |236B-7B(17)|⍫⍬⍭⍮⍯⍰⍱⍲⍳⍴⍵⍶⍷⍸⍹⍺⍻|--------------|
|'-> |237D-9A(30)|⍽⍾⍿⎀⎁⎂⎃⎄⎅⎆⎇⎈⎉⎊⎋⎌⎍⎎⎏⎐⎑⎒⎓⎔⎕⎖⎗⎘⎙⎚|-|
|'-> |2440-4A(11)|⑀⑁⑂⑃⑄⑅⑆⑇⑈⑉⑊|--------------------|
|'-> |2501-01(01)|━|------------------------------|
|'-> |2503-0B(09)|┃┄┅┆┇┈┉┊┋|----------------------|
|'-> |250D-0F(03)|┍┎┏|----------------------------|
|'-> |2511-13(03)|┑┒┓|----------------------------|
|'-> |2515-17(03)|┕┖┗|----------------------------|
|'-> |2519-1B(03)|┙┚┛|----------------------------|
|'-> |251D-23(07)|┝┞┟┠┡┢┣|------------------------|
|'-> |2525-2B(07)|┥┦┧┨┩┪┫|------------------------|
|'-> |252D-33(07)|┭┮┯┰┱┲┳|------------------------|
|'-> |2535-3B(07)|┵┶┷┸┹┺┻|------------------------|
|'-> |253D-4F(19)|┽┾┿╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏|------------|
|'-> |256D-7F(19)|╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿|------------|
|'-> |2581-83(03)|▁▂▃|----------------------------|
|'-> |2585-87(03)|▅▆▇|----------------------------|
|'-> |2589-8B(03)|▉▊▋|----------------------------|
|'-> |258D-8F(03)|▍▎▏|----------------------------|
|'-> |2594-95(02)|▔▕|-----------------------------|
|'-> |25A2-A9(08)|▢▣▤▥▦▧▨▩|-----------------------|
|'-> |25AD-B1(05)|▭▮▯▰▱|--------------------------|
|'-> |25B3-B9(07)|△▴▵▶▷▸▹|------------------------|
|'-> |25BB-BB(01)|▻|------------------------------|
|'-> |25BD-C3(07)|▽▾▿◀◁◂◃|------------------------|
|'-> |25C5-C9(05)|◅◆◇◈◉|--------------------------|
|'-> |25CC-CE(03)|◌◍◎|----------------------------|
|'-> |25D0-D7(08)|◐◑◒◓◔◕◖◗|-----------------------|
|'-> |25DA-E5(12)|◚◛◜◝◞◟◠◡◢◣◤◥|-------------------|
|'-> |25E7-F7(17)|◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷|--------------|
|'-> |2600-13(20)|☀☁☂☃☄★☆☇☈☉☊☋☌☍☎☏☐☑☒☓|-----------|
|'-> |2619-38(32)|☙☚☛☜☝☞☟☠☡☢☣☤☥☦☧☨☩☪☫☬☭☮☯☰☱☲☳☴☵☶☷☸|
|'-> |2639-39(01)|☹|------------------------------|
|'-> |263D-3F(03)|☽☾☿|----------------------------|
|'-> |2641-41(01)|♁|------------------------------|
|'-> |2643-5F(29)|♃♄♅♆♇♈♉♊♋♌♍♎♏♐♑♒♓♔♕♖♗♘♙♚♛♜♝♞♟|--|
|'-> |2661-62(02)|♡♢|-----------------------------|
|'-> |2664-64(01)|♤|------------------------------|
|'-> |2667-69(03)|♧♨♩|----------------------------|
|'-> |266C-71(06)|♬♭♮♯♰♱|-------------------------|
|'-> |FB00-00(01)|ﬀ|------------------------------|
|'-> |FB03-06(04)|ﬃﬄﬅﬆ|---------------------------|
|'-> |FB13-17(05)|ﬓﬔﬕﬖﬗ|--------------------------|
|'-> |FE20-23(04)|︠︡︢︣|---------------------------|
|'-> |FFF9-FC(04)||---------------------------|
MES-3KS is defined as:
>00 20-7E A0-FF
>01 00-81 8B-8C 8F 92 95 9A-9B 9E-9F A2-A3 A6 AA-AB B5-BB BE-CC D5-D6 DE-F7
>01 FA-FF
>02 00-1F 24-27 2A-33 50-AD B0-EE
>03 00-4E 60-62 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D0-D7 DA-F3
>04 00-86 88-89 8C-8F 90-C4 C7-C8 CB-CC D0-ED EE-F5 F8-F9
>05 31-56 59-5F 61-87 89-8A
>10 D0-F6 FB
>1E 00-9B F2-F3
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF
>1F F2-F4 F6-FE
>20 00-46 48-4D 6A-70 74-8E A0-AF D0-E3
>21 00-3A 53-83 90-F3
>22 00-F1
>23 00-7B 7D-9A
>24 40-4A
>25 00-95 A0-F7
>26 00-13 19-6F 70-71
>FB 00-06 13-17
>FE 20-23
>FF F9-FD
Here are the characters that are missing from MES-3KS but are included in MES-3B:
|MES-3B |0182-8A(09)|ƂƃƄƅƆƇƈƉƊ|----------------------|
|'-> |018D-8E(02)|ƍƎ|-----------------------------|
|'-> |0190-91(02)|ƐƑ|-----------------------------|
|'-> |0193-94(02)|ƓƔ|-----------------------------|
|'-> |0196-99(04)|ƖƗƘƙ|---------------------------|
|'-> |019C-9D(02)|ƜƝ|-----------------------------|
|'-> |01A0-A1(02)|Ơơ|-----------------------------|
|'-> |01A4-A5(02)|Ƥƥ|-----------------------------|
|'-> |01A7-A9(03)|ƧƨƩ|----------------------------|
|'-> |01AC-B4(09)|ƬƭƮƯưƱƲƳƴ|----------------------|
|'-> |01BC-BD(02)|Ƽƽ|-----------------------------|
|'-> |01CD-D4(08)|ǍǎǏǐǑǒǓǔ|-----------------------|
|'-> |01D7-DD(07)|ǗǘǙǚǛǜǝ|------------------------|
|'-> |01F8-F9(02)|Ǹǹ|-----------------------------|
|'-> |0222-23(02)|Ȣȣ|-----------------------------|
|'-> |0228-29(02)|Ȩȩ|-----------------------------|
|'-> |1EA0-BF(32)|ẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾế|
|'-> |1EC0-DF(32)|ỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞở|
|'-> |1EE0-F1(18)|ỠỡỢợỤụỦủỨứỪừỬửỮữỰự|-------------|
|'-> |1EF4-F9(06)|ỴỵỶỷỸỹ|-------------------------|
MES-3B is defined as:
>00 20-7E A0-FF
>01 00-FF
>02 00-1F 22-33 50-AD B0-EE
>03 00-4E 60-62 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D0-D7 DA-F3
>04 00-86 88-89 8C-C4 C7-C8 CB-CC D0-F5 F8-F9
>05 31-56 59-5F 61-87 89-8A
>10 D0-F6 FB
>1E 00-9B A0-F9
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF
>1F F2-F4 F6-FE
>20 00-46 48-4D 6A-70 74-8E A0-AF D0-E3
>21 00-3A 53-83 90-F3
>22 00-F1
>23 00-7B 7D-9A
>24 40-4A
>25 00-95 A0-F7
>26 00-13 19-71
>FB 00-06 13-17
>FE 20-23
>FF F9-FD
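You can check that MES-3KS is a proper subset of MES-3B by comparing only the definition lines for the 01, 02 and 1E blocks. The other lines of the two definitions describe the same code points; the 04 and 26 lines only split their ranges differently. The sketch below does this, with lines copied from the text and hypothetical class and method names.

```java
import java.util.BitSet;

/** Hypothetical sketch: compares the lines in which the MES-3KS and MES-3B definitions differ. */
final class Mes3Compare {

    static final String[] MES_3KS = {
        ">01 00-81 8B-8C 8F 92 95 9A-9B 9E-9F A2-A3 A6 AA-AB B5-BB BE-CC D5-D6 DE-F7",
        ">01 FA-FF",
        ">02 00-1F 24-27 2A-33 50-AD B0-EE",
        ">1E 00-9B F2-F3"
    };
    static final String[] MES_3B = {
        ">01 00-FF",
        ">02 00-1F 22-33 50-AD B0-EE",
        ">1E 00-9B A0-F9"
    };

    /** Expands definition lines (">hh ll ll-ll ...") into a set of code points. */
    static BitSet parse(String... lines) {
        BitSet set = new BitSet(0x10000);
        for (String line : lines) {
            String[] tokens = line.substring(1).split(" ");
            int base = Integer.parseInt(tokens[0], 16) << 8;
            for (int t = 1; t < tokens.length; t++) {
                String[] bounds = tokens[t].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                int to = bounds.length == 2 ? Integer.parseInt(bounds[1], 16) : from;
                set.set(base + from, base + to + 1);
            }
        }
        return set;
    }

    /** Code points defined by the first list of lines but not by the second. */
    static BitSet difference(String[] a, String[] b) {
        BitSet result = parse(a);
        result.andNot(parse(b));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(difference(MES_3B, MES_3KS).cardinality()); // prints 148
        System.out.println(difference(MES_3KS, MES_3B).isEmpty());     // prints true
    }
}
```

The 148 characters are exactly the ones in the MES-3B table above: 56 in U+01xx, 4 in U+02xx and 88 in U+1Exx.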
Some unrelated subsets:
Adobe also uses two subsets to define glyph names: the Adobe Glyph List (AGL)
and the Adobe Glyph List For New Fonts (AGLFN). Their characters are not shown
here (yet); for now, only their definitions are provided for reference:
>00 20-7E A1-AC AE-B1 B4-B4 B6-B8 BA-FF
>01 00-7F 92-92 A0-A1 AF-B0 E6-E7 FA-FF
>02 18-19 BC-BD C6-C7 D8-DD
>03 00-01 03-03 09-09 23-23 84-8A 8C-8C 8E-A1 A3-CE D1-D2 D5-D6
>04 01-0C 0E-4F 51-5C 5E-5F 62-63 72-75 90-91 D9-D9
>05 B0-B9 BB-C3 D0-EA F0-F2
>06 0C-0C 1B-1B 1F-1F 21-3A 40-52 60-6A 6D-6D 79-79 7E-7E 86-86 88-88 91-91
>06 98-98 A4-A4 AF-AF BA-BA D2-D2 D5-D5
>1E 80-85 F2-F3
>20 0C-0F 12-15 17-1E 20-22 24-26 2C-2E 30-30 32-33 39-3A 3C-3C 44-44 A1-A1
>20 A3-A4 A7-A7 AA-AC
>21 05-05 11-11 13-13 16-16 18-18 1C-1C 1E-1E 22-22 2E-2E 35-35 53-54 5B-5E
>21 90-95 A8-A8 B5-B5 D0-D4
>22 00-00 02-03 05-05 07-09 0B-0B 0F-0F 11-12 17-17 1A-1A 1D-20 27-2B 34-34
>22 3C-3C 45-45 48-48 60-61 64-65 82-84 86-87 95-95 97-97 A5-A5 C5-C5
>23 02-02 10-10 20-21 29-2A
>25 00-00 02-02 0C-0C 10-10 14-14 18-18 1C-1C 24-24 2C-2C 34-34 3C-3C 50-6C
>25 80-80 84-84 88-88 8C-8C 90-93 A0-A1 AA-AC B2-B2 BA-BA BC-BC C4-C4 CA-CB
>25 CF-CF D8-D9 E6-E6
>26 3A-3C 40-40 42-42 60-60 63-63 65-66 6A-6B
>00 01-7F A0-FF
>01 00-F5 FA-FF
>02 00-19 50-61 63-69 6B-73 75-75 77-7F 81-8E 90-98 9A-9B 9D-9E A0-A8 B0-B2
>02 B4-DE E0-E0 E3-E9
>03 00-25 27-45 60-61 74-75 7A-7A 7E-7E 84-8A 8C-8C 8E-A1 A3-CE D0-D6 DA-DA
>03 DC-DC DE-DE E0-E0 E2-F3
>04 01-0C 0E-4F
>05 31-87 89-89 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
>06 0C-0C 1B-1B 1F-1F 21-3A 40-52 60-6D 79-79 7E-7E 86-86 88-88 91-91 98-98
>06 A4-A4 AF-AF BA-BA C1-C1 D1-D2 D5-D5 F0-F9
>09 01-03 05-39 3C-4D 50-54 58-70 81-83 85-8C 8F-90 93-A8 AA-B0 B2-B2 B6-B9
>09 BC-BC BE-C4 C7-C8 CB-CD D7-D7 DC-DD DF-E3 E6-FA
>0A 02-02 05-0A 0F-10 13-28 2A-30 32-32 35-36 38-39 3C-3C 3E-42 47-48 4B-4D
>0A 59-5C 5E-5E 66-74 81-83 85-8B 8D-8D 8F-91 93-A8 AA-B0 B2-B3 B5-B9 BC-BC
>0A BE-C5 C7-C9 CB-CD D0-D0 E0-E0 E6-EF
>0E 01-3A 3F-5B
>1E 00-9B A0-F9
>20 02-02 0B-10 12-1E 20-22 24-26 2C-2E 30-30 32-33 35-35 39-3C 3E-3E 42-42
>20 44-44 70-70 74-7A 7C-89 8D-8E A1-A4 A7-A7 A9-AC
>21 03-03 05-05 09-09 11-11 13-13 16-16 18-18 1C-1C 1E-1E 21-22 26-26 2B-2B
>21 2E-2E 35-35 53-54 5B-5E 60-6B 70-7B 90-99 A8-A8 B5-B5 BC-BC C0-C0 C4-C6
>22 00-00 02-03 05-09 0B-0C 0F-0F 11-13 15-15 17-17 19-1A 1D-20 23-23 25-2C
>22 2E-2E 34-37 3C-3D 43-43 45-45 48-48 4C-4C 50-53 60-62 64-67 6A-6B 6E-73
>22 76-77 79-7B 80-87 8A-8B 95-97 99-99 A3-A5 BF-BF C5-C5 CE-CF DA-DB EE-EE
>23 02-03 05-05 10-10 12-12 18-18 20-21 25-27 29-2B
>24 23-23 60-E9
>25 00-00 02-02 0C-0C 10-10 14-14 18-18 1C-1C 24-24 2C-2C 34-34 3C-3C 50-6C
>25 80-80 84-84 88-88 8C-8C 90-93 A0-A1 A3-AC B2-B7 B9-BA BC-BD BF-C1 C3-C4
>25 C6-CC CE-D1 D8-D9 E2-E6 EF-EF
>26 05-06 0E-0F 1C-1F 2F-2F 3A-3C 40-42 60-6D 6F-6F
>27 13-13 8A-92 9E-9E
>30 00-19 1C-1E 20-29 36-36 41-94 9B-9E A1-FE
>31 05-29 31-8E
>32 00-1C 20-40 42-43 60-7B 7F-7F 8A-90 94-94 96-96 98-99 9D-9E A3-A9
>33 00-00 03-03 05-05 0D-0D 14-16 18-18 1E-1E 22-23 26-27 2A-2B 31-31 33-33
>33 36-36 39-39 3B-3B 42-42 47-47 49-4A 4D-4E 51-51 57-57 7B-CB CD-D6 D8-D8
>33 DB-DD
>53 44-44
>F6 BE-C0 C3-FF
>F7 21-21 24-24 26-26 30-39 3F-3F 60-7A A1-A2 A8-A8 AF-AF B4-B4 B8-B8 BF-BF
>F7 E0-F6 F8-FF
>F8 84-99 E5-FF
>FB 00-04 1F-20 2A-36 38-3C 3E-3E 40-41 43-44 46-4F 57-59 67-69 6B-6D 7B-7D
>FB 89-89 8B-8B 8D-8D 93-95 9F-9F A4-A5 A7-A9 AF-AF
>FC 08-08 0B-0C 0E-0E 48-48 4B-4B 4E-4E 58-58 5E-62 6D-6D 73-73 8D-8D 94-94
>FC 9F-9F A1-A2 A4-A4 C9-CC D1-D2 D5-D5 DD-DD
>FD 3E-3F 88-88 F2-F2 FA-FA
>FE 30-44 49-50 52-52 54-55 59-5F 61-66 69-6B 82-82 84-84 86-86 88-88 8A-8C
>FE 8E-8E 90-92 94-94 96-98 9A-9C 9E-A0 A2-A4 A6-A8 AA-AA AC-AC AE-AE B0-B0
>FE B2-B4 B6-B8 BA-BC BE-C0 C2-C4 C6-C8 CA-CC CE-D0 D2-D4 D6-D8 DA-DC DE-E0
>FF 01-5E 61-9F E0-E1 E3-E3 E5-E6
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubsetTextVerifier {

    static String line;
    static int lineNumber;

    public static void main(String[] args) throws IOException {
        // Path of this text file (assumed to be saved as UTF-8); defaults to the original location.
        String path = args.length > 0 ? args[0] : "C:\\Users\\Michi\\Desktop\\subsets.txt";
        // One 256-entry flag array per high byte: characters of all table rows (seen),
        // of the rows not marked (*) (nostars), and of the definition lines since the last check (current).
        boolean[][] seen = new boolean[256][], nostars = new boolean[256][], current = new boolean[256][];
        Pattern rulePattern = Pattern.compile("([0-9A-F]{4}-[0-9A-F]{2}(?:,[0-9A-F]{2}-[0-9A-F]{2})*)\\(([0-9]{2})\\)");
        Pattern definitionPattern = Pattern.compile(">[0-9A-F]{2}( [0-9A-F]{2}(-[0-9A-F]{2})?)+");
        try (BufferedReader br = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            while ((line = br.readLine()) != null) {
                lineNumber++;
                if (line.startsWith("|")) {
                    // Table row: |label|ranges(count)|characters, padded to 32 columns|
                    if (line.length() != 78)
                        fail("Invalid line length " + line.length());
                    String[] parts = line.split("\\|", 4);
                    boolean star = parts[1].contains("(*)");
                    Matcher m = rulePattern.matcher(parts[2]);
                    if (!m.matches())
                        fail("Invalid rule: '" + parts[2] + "'");
                    int count = Integer.parseInt(m.group(2));
                    StringBuilder chars = new StringBuilder();
                    String ranges = m.group(1);
                    int base = Integer.parseInt(ranges.substring(0, 2), 16) * 0x100;
                    for (int i = 2; i < ranges.length(); i += 6) {
                        int from = Integer.parseInt(ranges.substring(i, i + 2), 16);
                        int to = Integer.parseInt(ranges.substring(i + 3, i + 5), 16);
                        for (int j = from; j <= to; j++) {
                            char ch = (char) (base + j);
                            addChar(seen, ch);
                            if (!star)
                                addChar(nostars, ch);
                            chars.append(ch);
                        }
                    }
                    if (chars.length() != count)
                        fail("Invalid count: " + count + " (should be " + chars.length() + ")");
                    // Short character lists are closed with '|' and filled with '-' up to 32 characters.
                    if (chars.length() < 32)
                        chars.append('|');
                    while (chars.length() < 32)
                        chars.append('-');
                    if (chars.length() < 33)
                        chars.append('|');
                    if (!chars.toString().equals(parts[3]))
                        fail("Invalid character list '" + parts[3] + "' should be '" + chars + "'");
                } else if (line.startsWith(">#")) {
                    // Check line ">#n", ">#*n" or ">#?n": compares the definition read since the last
                    // check with all rows, with the rows not marked (*), or only counts it;
                    // n is the expected number of characters.
                    boolean[][] check = seen;
                    if (line.length() > 2 && line.charAt(2) == '*') {
                        check = nostars;
                        line = line.substring(1);
                    } else if (line.length() > 2 && line.charAt(2) == '?') {
                        check = current;
                        line = line.substring(1);
                    }
                    int count = Integer.parseInt(line.substring(2));
                    for (int i = 0; i < check.length; i++) {
                        if (check[i] == null ^ current[i] == null)
                            fail("U+" + Integer.toHexString(i) + "xx is missing from " + (check[i] != null ? "definitions" : "rules"));
                        if (check[i] == null)
                            continue;
                        for (int j = 0; j < check[i].length; j++) {
                            if (check[i][j] != current[i][j])
                                fail("U+" + Integer.toHexString(i * 0x100 + j) + " is missing from " + (check[i][j] ? "definitions" : "rules"));
                            if (check[i][j])
                                count--;
                        }
                    }
                    if (count != 0)
                        fail("Count off by " + count);
                    current = new boolean[256][];
                } else if (line.startsWith(">")) {
                    // Definition line ">hh ll ll-ll ...": high byte, then low bytes or inclusive ranges.
                    if (!definitionPattern.matcher(line).matches())
                        fail("Invalid definition");
                    int base = Integer.parseInt(line.substring(1, 3), 16) * 0x100;
                    for (int i = 4; i < line.length(); i += 3) {
                        int from = Integer.parseInt(line.substring(i, i + 2), 16), to = from;
                        if (i + 2 < line.length() && line.charAt(i + 2) == '-') {
                            i += 3;
                            to = Integer.parseInt(line.substring(i, i + 2), 16);
                        }
                        for (int j = from; j <= to; j++)
                            addChar(current, (char) (base + j));
                    }
                }
            }
        }
        System.out.println("OK: no errors found in " + lineNumber + " lines.");
    }

    private static void addChar(boolean[][] flags, char ch) throws IOException {
        if (flags[ch >> 8] == null)
            flags[ch >> 8] = new boolean[256];
        if (flags[ch >> 8][ch & 0xFF])
            fail("Add char twice: U+" + Integer.toHexString(ch));
        flags[ch >> 8][ch & 0xFF] = true;
    }

    private static IOException fail(String message) throws IOException {
        throw new IOException("In line " + lineNumber + ": " + line + System.lineSeparator() + message);
    }
}
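The verifier expects every table row to be exactly 78 characters long. The character column is closed with `|` and filled with `-` up to 32 characters. Since the rule column must match exactly, only the label column can absorb the remaining width. The sketch below generates such a row for a single range, as the inverse of that check. The class `RowFormatter` is hypothetical, and padding the label column with spaces is an assumption.

```java
/** Hypothetical sketch: formats a table row the way SubsetTextVerifier expects it. */
final class RowFormatter {

    /** Builds the row for the single range first..last, which must lie in one 256-character block. */
    static String row(String label, int first, int last) {
        int count = last - first + 1;
        if (first >> 8 != last >> 8 || last > 0xFFFF || count < 1 || count > 32)
            throw new IllegalArgumentException("Range must hold 1 to 32 BMP characters of one block");
        String rule = String.format("%04X-%02X(%02d)", first, last & 0xFF, count);
        StringBuilder chars = new StringBuilder();
        for (int cp = first; cp <= last; cp++)
            chars.append((char) cp);
        // Same padding as the verifier: short lists get '|' and '-' up to 32, then a final '|'.
        if (chars.length() < 32)
            chars.append('|');
        while (chars.length() < 32)
            chars.append('-');
        chars.append('|');
        // Assumption: the label column is padded with spaces so that the row is 78 characters long.
        int labelWidth = 78 - 3 - rule.length() - chars.length();
        if (label.length() > labelWidth)
            throw new IllegalArgumentException("Label too long for rule " + rule);
        StringBuilder row = new StringBuilder("|").append(label);
        while (row.length() < 1 + labelWidth)
            row.append(' ');
        return row.append('|').append(rule).append('|').append(chars).toString();
    }

    public static void main(String[] args) {
        System.out.println(row("Latin uppercase letters", 0x41, 0x5A));
        // prints: |Latin uppercase letters        |0041-5A(26)|ABCDEFGHIJKLMNOPQRSTUVWXYZ|-----|
    }
}
```

Rows that combine several ranges, such as `003A-40,5B-60,7B-7E(17)`, would need a list of ranges instead of `first` and `last`. The padding rules stay the same.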
