Text 354, 199 lines
Written 2007-01-30 00:11:42 by Michiel van der Vlist (2:280/5555)
Comment on a text by Stas Degteff (2:5080/102.1)
Subject: Golded and Unicode
===========================
Hello Stas!
On Sunday January 28 2007 04:46, you wrote to Richard Menedetter:
RM>> Is it now possible to view unicode messages correctly? How???
SD> Messages in the UTF-8 charset can be read (see charsets.cfg and the *.CHR
SD> files in the gpc*.zip archive). To write in UTF-8, you should use an
SD> external editor.
I found 850_u8.chs. It looks a bit like the 850_UTF8.CHS that I created myself,
but it is a conversion file for *writing* a subset of UTF-8, which is what I am
doing now. Reading UTF-8 is not possible AFAIK.
One problem is that the level is displayed as 2 (it should be 4), yet changing
the level parameter to 4 effectively disables the conversion. I also doubt that
the three-byte entries in 850_u8.chs will work, though I admit I have not tested
that yet. This is what I use; it does not support three- or four-byte
characters.
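For context on those byte counts: UTF-8 stores U+0080 to U+07FF in two bytes (110xxxxx 10xxxxxx) and U+0800 to U+FFFF in three. The accented letters and Latin-1 symbols in CP850 all fall in the two-byte range, while the box-drawing and block characters (U+2500 and up) need three. A minimal Python sketch of that rule; the function names are illustrative only and not part of GoldED.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 needs for code point cp."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx."""
    if not 0x80 <= cp < 0x800:
        raise ValueError(f"U+{cp:04X} does not take exactly two bytes")
    lead = 0xC0 | (cp >> 6)       # 110 + top 5 bits of the code point
    trail = 0x80 | (cp & 0x3F)    # 10 + low 6 bits of the code point
    return bytes([lead, trail])


if __name__ == "__main__":
    # CP850 0x80 is U+00C7 (C with cedilla): two bytes.
    print(utf8_two_byte(0xC7).hex(" ").upper())   # C3 87
    # CP850 0xC4 is U+2500 (box drawings light horizontal): three bytes.
    print(utf8_length(0x2500))                    # 3
```

This is why a table of "first & second byte" pairs can cover the letters but has to fall back to a substitute such as "?" for the line-drawing characters.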
File 850_UTF8.CHS:
=== Cut ===
;
; This file is a charset conversion module in text form.
;
; This module converts IBM CP850 characters to UTF-8 characters.
;
; Format: ID, version, level,
; from charset, to charset,
; 128 entries: first & second byte
; "END"
; Lines beginning with ";" are comments, as is anything after a ";" that follows an entry.
;
; Unknown characters are mapped to the "?" character.
;
; cedilla = , ; dieresis = .. ; acute = '
; grave = ` ; circumflex = ^ ; ring = o
; tilde = ~ ; caron = v
; All of these are above the character, apart from the cedilla which is below.
;
; \ is the escape character: \0 means decimal zero,
; \dnnn where nnn is a decimal number is the ordinal value of the character
; \xnn where nn is a hexadecimal number is the character with that hexadecimal value
; e.g.: \d32 is the ASCII space character
; \\ (two backslashes) is the character "\" itself.
;
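The escape rules above can be read as a small tokenizer. The sketch below is a hypothetical Python reading of those rules, not GoldED's own parser; it also assumes, from the way this file uses "\0 ?" for unknown characters, that a zero first byte means the entry yields a single character.

```python
def decode_token(tok: str) -> int:
    """Byte value of one entry token, per the escape rules in the header."""
    if tok == "\\\\":                 # "\\" -> the backslash itself
        return ord("\\")
    if tok == "\\0":                  # "\0" -> decimal zero
        return 0
    if tok.startswith("\\d"):         # "\dnnn" -> decimal ordinal
        return int(tok[2:], 10)
    if tok.startswith("\\x"):         # "\xnn" -> hexadecimal ordinal
        return int(tok[2:], 16)
    if len(tok) == 1:                 # any other single character is literal
        return ord(tok)
    raise ValueError(f"unrecognised token: {tok!r}")


def decode_entry(line: str) -> bytes:
    """Decode the two tokens of an entry line, ignoring the ';' comment.
    Assumption: a zero first byte marks a one-character replacement."""
    tokens = line.split(";", 1)[0].split()
    first, second = (decode_token(t) for t in tokens[:2])
    return bytes([second]) if first == 0 else bytes([first, second])
```

Under these assumptions "\xC3 \x87" yields the two bytes C3 87, "\0 ?" yields a lone "?", and the "= =" entry for 0xF2 would emit two equals signs.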
0 ; ID number
0 ; version number
;
2 ; level number
;
CP850 ; from set
UTF-8 ; to set
; ; dec hx description
\xC3 \x87 ; 128 80 latin capital letter c with cedilla
\xC3 \xBC ; 129 81 latin small letter u with diaeresis
\xC3 \xA9 ; 130 82 latin small letter e with acute
\xC3 \xA2 ; 131 83 latin small letter a with circumflex
\xC3 \xA4 ; 132 84 latin small letter a with diaeresis
\xC3 \xA0 ; 133 85 latin small letter a with grave
\xC3 \xA5 ; 134 86 latin small letter a with ring above
\xC3 \xA7 ; 135 87 latin small letter c with cedilla
\xC3 \xAA ; 136 88 latin small letter e with circumflex
\xC3 \xAB ; 137 89 latin small letter e with diaeresis
\xC3 \xA8 ; 138 8A latin small letter e with grave
\xC3 \xAF ; 139 8B latin small letter i with diaeresis
\xC3 \xAE ; 140 8C latin small letter i with circumflex
\xC3 \xAC ; 141 8D latin small letter i with grave
\xC3 \x84 ; 142 8E latin capital letter a with diaeresis
\xC3 \x85 ; 143 8F latin capital letter a with ring above
\xC3 \x89 ; 144 90 latin capital letter e with acute
\xC3 \xA6 ; 145 91 latin small letter ae
\xC3 \x86 ; 146 92 latin capital letter ae
\xC3 \xB4 ; 147 93 latin small letter o with circumflex
\xC3 \xB6 ; 148 94 latin small letter o with diaeresis
\xC3 \xB2 ; 149 95 latin small letter o with grave
\xC3 \xBB ; 150 96 latin small letter u with circumflex
\xC3 \xB9 ; 151 97 latin small letter u with grave
\xC3 \xBF ; 152 98 latin small letter y with diaeresis
\xC3 \x96 ; 153 99 latin capital letter o with diaeresis
\xC3 \x9C ; 154 9A latin capital letter u with diaeresis
\xC3 \xB8 ; 155 9B latin small letter o with stroke
\xC2 \xA3 ; 156 9C pound sign
\xC3 \x98 ; 157 9D latin capital letter o with stroke
\xC3 \x97 ; 158 9E multiplication sign
\0 f ; 159 9F dutch guilder sign (ibm437 159)
\xC3 \xA1 ; 160 A0 latin small letter a with acute
\xC3 \xAD ; 161 A1 latin small letter i with acute
\xC3 \xB3 ; 162 A2 latin small letter o with acute
\xC3 \xBA ; 163 A3 latin small letter u with acute
\xC3 \xB1 ; 164 A4 latin small letter n with tilde
\xC3 \x91 ; 165 A5 latin capital letter n with tilde
\xC2 \xAA ; 166 A6 feminine ordinal indicator
\xC2 \xBA ; 167 A7 masculine ordinal indicator
\xC2 \xBF ; 168 A8 inverted question mark
\xC2 \xAE ; 169 A9 registered sign
\xC2 \xAC ; 170 AA not sign
\xC2 \xBD ; 171 AB vulgar fraction one half
\xC2 \xBC ; 172 AC vulgar fraction one quarter
\xC2 \xA1 ; 173 AD inverted exclamation mark
\xC2 \xAB ; 174 AE left-pointing double angle quotation mark
\xC2 \xBB ; 175 AF right-pointing double angle quotation mark
\0 ? ; 176 B0 light shade
\0 ? ; 177 B1 medium shade
\0 ? ; 178 B2 dark shade
\0 ? ; 179 B3 box drawings light vertical
\0 ? ; 180 B4 box drawings light vertical and left
\xC3 \x81 ; 181 B5 latin capital letter a with acute
\xC3 \x82 ; 182 B6 latin capital letter a with circumflex
\xC3 \x80 ; 183 B7 latin capital letter a with grave
\xC2 \xA9 ; 184 B8 copyright sign
\0 ? ; 185 B9 box drawings double vertical and left
\0 ? ; 186 BA box drawings double vertical
\0 ? ; 187 BB box drawings double down and left
\0 ? ; 188 BC box drawings double up and left
\xC2 \xA2 ; 189 BD cent sign
\xC2 \xA5 ; 190 BE yen sign
\0 ? ; 191 BF box drawings light down and left
\0 ? ; 192 C0 box drawings light up and right
\0 ? ; 193 C1 box drawings light up and horizontal
\0 ? ; 194 C2 box drawings light down and horizontal
\0 ? ; 195 C3 box drawings light vertical and right
\0 ? ; 196 C4 box drawings light horizontal
\0 ? ; 197 C5 box drawings light vertical and horizontal
\xC3 \xA3 ; 198 C6 latin small letter a with tilde
\xC3 \x83 ; 199 C7 latin capital letter a with tilde
\0 ? ; 200 C8 box drawings double up and right
\0 ? ; 201 C9 box drawings double down and right
\0 ? ; 202 CA box drawings double up and horizontal
\0 ? ; 203 CB box drawings double down and horizontal
\0 ? ; 204 CC box drawings double vertical and right
\0 ? ; 205 CD box drawings double horizontal
\0 ? ; 206 CE box drawings double vertical and horizontal
\xC2 \xA4 ; 207 CF currency sign
\xC3 \xB0 ; 208 D0 latin small letter eth (icelandic)
\xC3 \x90 ; 209 D1 latin capital letter eth (icelandic)
\xC3 \x8A ; 210 D2 latin capital letter e with circumflex
\xC3 \x8B ; 211 D3 latin capital letter e with diaeresis
\xC3 \x88 ; 212 D4 latin capital letter e with grave
\0 i ; 213 D5 latin small letter i dotless
\xC3 \x8D ; 214 D6 latin capital letter i with acute
\xC3 \x8E ; 215 D7 latin capital letter i with circumflex
\xC3 \x8F ; 216 D8 latin capital letter i with diaeresis
\0 ? ; 217 D9 box drawings light up and left
\0 ? ; 218 DA box drawings light down and right
\0 ? ; 219 DB full block
\0 ? ; 220 DC lower half block
\xC2 \xA6 ; 221 DD broken bar
\xC3 \x8C ; 222 DE latin capital letter i with grave
\0 ? ; 223 DF upper half block
\xC3 \x93 ; 224 E0 latin capital letter o with acute
\xC3 \x9F ; 225 E1 latin small letter sharp s (german)
\xC3 \x94 ; 226 E2 latin capital letter o with circumflex
\xC3 \x92 ; 227 E3 latin capital letter o with grave
\xC3 \xB5 ; 228 E4 latin small letter o with tilde
\xC3 \x95 ; 229 E5 latin capital letter o with tilde
\xC2 \xB5 ; 230 E6 greek small letter mu
\xC3 \xBE ; 231 E7 latin small letter thorn (icelandic)
\xC3 \x9E ; 232 E8 latin capital letter thorn (icelandic)
\xC3 \x9A ; 233 E9 latin capital letter u with acute
\xC3 \x9B ; 234 EA latin capital letter u with circumflex
\xC3 \x99 ; 235 EB latin capital letter u with grave
\xC3 \xBD ; 236 EC latin small letter y with acute
\xC3 \x9D ; 237 ED latin capital letter y with acute
\xC2 \xAF ; 238 EE macron
\xC2 \xB4 ; 239 EF acute accent
\xC2 \xAD ; 240 F0 soft hyphen
\xC2 \xB1 ; 241 F1 plus-minus sign
= = ; 242 F2 double low line
\xC2 \xBE ; 243 F3 vulgar fraction three quarters
\xC2 \xB6 ; 244 F4 pilcrow sign
\xC2 \xA7 ; 245 F5 section sign
\xC3 \xB7 ; 246 F6 division sign
\xC2 \xB8 ; 247 F7 cedilla
\xC2 \xB0 ; 248 F8 degree sign
\xC2 \xA8 ; 249 F9 diaeresis
\xC2 \xB7 ; 250 FA middle dot
\xC2 \xB9 ; 251 FB superscript one
\xC2 \xB3 ; 252 FC superscript three
\xC2 \xB2 ; 253 FD superscript two
\xC2 \xB7 ; 254 FE black square
\xC2 \xA0 ; 255 FF no-break space
END
=== Cut ===
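Hand-typed tables like this one are easy to get wrong by a single byte. A minimal Python sketch for cross-checking such a file, assuming Python's built-in cp850 codec as the reference mapping; it only checks entries written as two \xNN escapes and skips single-character substitutions such as "\0 ?". The function names are illustrative, not part of GoldED.

```python
import re
from typing import List, Optional, Tuple

# An entry whose two tokens are both \xNN escapes, e.g. "\xC3 \x87 ; ..."
HEX_PAIR = re.compile(r"\\x([0-9A-Fa-f]{2})\s+\\x([0-9A-Fa-f]{2})\b")


def expected_utf8(cp850_byte: int) -> Optional[bytes]:
    """UTF-8 bytes for one CP850 byte according to Python's cp850 codec,
    or None if the character needs three or more bytes."""
    encoded = bytes([cp850_byte]).decode("cp850").encode("utf-8")
    return encoded if len(encoded) <= 2 else None


def check_chs(text: str) -> List[Tuple[int, bytes, bytes]]:
    """Return (cp850_byte, found, expected) for each two-escape entry that
    disagrees with the codec. Entries for characters needing 3+ bytes are
    treated as deliberate approximations and not flagged."""
    data = [ln.strip() for ln in text.splitlines()]
    data = [ln for ln in data if ln and not ln.startswith(";")]
    # data[0:5] = ID, version, level, from set, to set; then 128 entries.
    entries = data[5:5 + 128]
    problems = []
    for offset, line in enumerate(entries):
        m = HEX_PAIR.match(line)
        if not m:
            continue  # single-character substitution or other form
        found = bytes([int(m.group(1), 16), int(m.group(2), 16)])
        want = expected_utf8(0x80 + offset)
        if want is not None and found != want:
            problems.append((0x80 + offset, found, want))
    return problems
```

Pass it the text between the cut lines; each tuple it returns names a CP850 byte whose entry disagrees with the codec and is worth checking against a CP850 table.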
SD> Stas
SD> --- GoldED+/LNX 1.1.5-b20070101
SD> * Origin: Golded+, Husky, RNTrack maintainer (2:5080/102.1)
Cheers, Michiel
--- GoldED+/W32-MSVC 1.1.5-b20060315
* Origin: http://www.vlist.org (2:280/5555)