|
Character code according to ISO 10646 UCS-2 (not UTF-16).
All Closed Caption characters can be represented in Unicode, but unfortunately not all Teletext characters.
ETS 300 706 Table 36 Latin National Subset Turkish, character 0x23 "Turkish currency symbol" is not representable in Unicode and is therefore translated to private code U+E800. I was unable to identify all Arabic glyphs in Tables 44 and 45 (Arabic G0 and G2), so for now these are mapped to private codes U+E620 ... U+E67F and U+E720 ... U+E77F respectively. Table 47 G1 Block Mosaic is not representable in Unicode and is translated to private codes U+EE00 ... U+EE7F. Within this range the contiguous form has bit 5 (0x20) set, and the separate form has it cleared. Table 48 G3 "Smooth Mosaics and Line Drawing Set" is not representable in Unicode and is translated to private codes U+EF20 ... U+EF7F.
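The private code ranges above can be told apart with simple range checks. The following is a minimal sketch, not libzvbi code: the enum `pua_class` and the functions `classify_private_code` and `block_mosaic_is_contiguous` are hypothetical names introduced here for illustration, and the ranges are copied from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the private codes described above.
 * None of these names are part of the libzvbi API. */
typedef enum {
    PUA_NONE,             /* not one of the private codes listed above */
    PUA_ARABIC_G0,        /* Table 44, U+E620 ... U+E67F */
    PUA_ARABIC_G2,        /* Table 45, U+E720 ... U+E77F */
    PUA_TURKISH_CURRENCY, /* Table 36 character 0x23, U+E800 */
    PUA_BLOCK_MOSAIC,     /* Table 47 G1, U+EE00 ... U+EE7F */
    PUA_SMOOTH_MOSAIC     /* Table 48 G3, U+EF20 ... U+EF7F */
} pua_class;

/* Map a UCS-2 code to the private range it falls into, if any. */
static pua_class classify_private_code(uint16_t c)
{
    if (c >= 0xE620 && c <= 0xE67F) return PUA_ARABIC_G0;
    if (c >= 0xE720 && c <= 0xE77F) return PUA_ARABIC_G2;
    if (c == 0xE800)                return PUA_TURKISH_CURRENCY;
    if (c >= 0xEE00 && c <= 0xEE7F) return PUA_BLOCK_MOSAIC;
    if (c >= 0xEF20 && c <= 0xEF7F) return PUA_SMOOTH_MOSAIC;
    return PUA_NONE;
}

/* For a block mosaic code: bit 5 (0x20) set means the contiguous form,
 * cleared means the separate form. Only meaningful when
 * classify_private_code(c) == PUA_BLOCK_MOSAIC. */
static bool block_mosaic_is_contiguous(uint16_t c)
{
    return (c & 0x20) != 0;
}
```

A renderer might call `classify_private_code()` first and substitute its own glyphs for any result other than `PUA_NONE`, since ordinary Unicode fonts will not cover these codes.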
Teletext Level 2.5+ DRCS are represented by private codes U+F000 ... U+F7FF. The 6 least significant bits select character 0x00 ... 0x3F within a DRCS plane; the next 5 bits select DRCS plane 0 ... 31. See vbi_page for details.
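The DRCS bit layout can be sketched as follows. This is an illustration under the assumption that the layout is exactly as described above (6 character bits, then 5 plane bits, offset from U+F000); the helper names `is_drcs_code`, `drcs_plane`, `drcs_char` and `drcs_code` are hypothetical and not part of libzvbi.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; not part of the libzvbi API. */

/* True if c lies in the DRCS private range U+F000 ... U+F7FF. */
static bool is_drcs_code(uint16_t c)
{
    return c >= 0xF000 && c <= 0xF7FF;
}

/* DRCS plane 0 ... 31: the 5 bits above the 6 character bits. */
static unsigned drcs_plane(uint16_t c)
{
    return (c >> 6) & 0x1F;
}

/* Character 0x00 ... 0x3F within the plane: the 6 lowest bits. */
static unsigned drcs_char(uint16_t c)
{
    return c & 0x3F;
}

/* Inverse operation: build the private code from plane and character.
 * Out-of-range arguments are masked to their valid bit widths. */
static uint16_t drcs_code(unsigned plane, unsigned ch)
{
    return (uint16_t)(0xF000 | ((plane & 0x1F) << 6) | (ch & 0x3F));
}
```

Because the low 11 bits of 0xF000 are all zero, the plane and character can be extracted by masking directly, without subtracting the range base first.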
- Bug:
- Some Teletext character sets contain complementary Latin characters. For example, the Greek capital letters Alpha and Beta are re-used as Latin capital letters A and B, while a separate code exists for Latin capital letter C. libzvbi does not analyse the page contents, so Greek A and B are always translated to Greek Alpha and Beta, and C to Latin C, even if they appear in a purely Latin word.
|