Conference FIDONEWS_OLD3, 30874 messages
Message 24295, 98 lines
Written 2012-01-04 15:01:57 by Peter Krefting (2:203/0.222)
  Reply to message 23986 by FidoNews Robot (2:2/2.0)
Subject: A plea for UTF-8 in Fidonet (was: FidoNews 28:52 [02/05]: General Artic
===============================================================================
On 2011-12-26 03:34:25, FidoNews Robot <0@2.2.2> wrote:

>                  A PLEA FOR UTF-8 IN FIDONET  Part 1
>                  By Michiel van der Vlist. 2:280/5555

Some notes (I've been working on Unicode and legacy encodings professionally
for the better part of the last decade):

> The "A" in "ASCII" stands for "American". So it is no surprise that as
> far as the letters go, once again it only covers the 26 letters found in
> American English. ASCII is much richer than all of its predecessors, it
> has many punctuation and special characters, 32 - now mostly obsolete -
> control codes and as a new feature, the distinction between upper and
> lower case.

Initially, it also *did* have some limited support for characters from other
languages -- by using the backspace control code, you could produce some
diacritics. For instance, an 'ö' could be produced by emitting "o", backspace,
'"'. This did work for paper-based terminals, but the first screen-based
terminals never really supported this, so the functionality was mostly lost
there. This is why ASCII was changed to include '^' and '_' (for underlining),
instead of the up-arrow and left-arrow of the original version (compare the
Commodore PET character set, which was based on the earlier version of ASCII).
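
To make the overstrike idea concrete, here is a minimal Python sketch that
simulates a printing terminal. The function name overstrike_cells is made up
for this illustration and does not model any real terminal; it only records
which glyphs end up struck on the same column.

```python
def overstrike_cells(stream: str) -> list:
    """Return, per print column, the list of glyphs struck at that column."""
    cells = []
    column = 0
    for ch in stream:
        if ch == "\b":
            # Backspace only moves the carriage one position to the left.
            column = max(0, column - 1)
            continue
        if column == len(cells):
            cells.append([])
        # On paper the new glyph is printed on top of whatever is there.
        cells[column].append(ch)
        column += 1
    return cells


# "o", backspace, double quote: both glyphs land on the third column.
cells = overstrike_cells('Bjo\b"rn')
assert cells == [["B"], ["j"], ["o", '"'], ["r"], ["n"]]
```

A screen terminal that simply replaces the cell contents would keep only the
last glyph of each list, which is why the trick stopped working there.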

> Anyway, at the end of the DOS era, there were dozens of code pages,
> covering the needs for hundreds of languages. One could write in
> German, Swedish, Russian and Greek without problems. Well, one could
> not write in Greek and Russian in the same article because one could
> not change code pages in mid stream. But who wanted that?

Your history leaves out the MBCSes (multi-byte character sets) used for
Chinese, Japanese and Korean here.

> Enter Unicode.

Unicode does predate the web, IIRC, though.

> Unicode introduces the concept of The Universal Character Set. It is not
> a static entity, it is still growing. Presently there are over a million
> characters defined. While in the code page concept, character set and
> character encoding scheme are one and the same, in Unicode they are
> decoupled. There is ONE character set: the Universal Character Set.

You're mixing things up a bit. Universal Character Set (UCS) is ISO/IEC 10646, not
Unicode. The two (ISO and the Unicode consortium) do work together to
coordinate their character sets, though, making sure they are always
compatible, so it is easy enough to mistake one for the other. The surrounding
documentation is different between the two, however.

> There are several encoding schemes that all have their merits.
>
> First there is UTF-7. Designed for stone age transport layers that are 7
> bits only.

UTF-7 is an encoding of Unicode that is not specified by Unicode itself. It was
devised to work around problems with seven-bit email links, but is not
officially "sanctioned" by Unicode (and it is a horrible encoding to work with,
trust me on that).
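
For a feel of what UTF-7 does to text, here is a small sketch using the
"utf-7" codec that ships with Python (an implementation of RFC 2152). The
sample string is just an example chosen for this illustration.

```python
text = "Bj\u00f6rn"  # "Björn", one non-ASCII letter
encoded = text.encode("utf-7")

# Every byte stays in the seven-bit range, so it survives a 7-bit mail link.
assert all(byte < 0x80 for byte in encoded)

# The non-ASCII letter is wrapped in a "+...-" run of modified base64, so
# the encoded form cannot be read, searched or sliced like plain text.
print(encoded)

# Decoding restores the original string.
assert encoded.decode("utf-7") == text
```

The shifted runs are what make the encoding awkward: the same character can
be written in more than one way, and a byte offset means nothing without
decoding from the start of the run.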

> Next there is UTF-8. This is an 8 byte multibyte encoding that takes one
> to six bytes to encode a character.

The five and six byte forms are not used in modern UTF-8, as Unicode has been
defined as ending at U+10FFFF. So UTF-8 is at most four bytes per character.
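
The four-byte limit can be sketched in a few lines of Python. The helper name
utf8_length is made up for this example; the ranges are the ones from RFC
3629, and the loop cross-checks them against Python's own encoder.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point < 0x80:
        return 1  # ASCII
    if code_point < 0x800:
        return 2  # e.g. Latin letters with diacritics, Greek, Cyrillic
    if code_point < 0x10000:
        return 3  # rest of the Basic Multilingual Plane
    return 4      # supplementary planes, up to U+10FFFF


# Cross-check against the built-in encoder: "A", "ö", "€" and U+1F600.
for ch in ("A", "\u00f6", "\u20ac", "\U0001F600"):
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
```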

> Next there is UTF-16. Not suitable for byte orientated transport media
> that use NULL as a special character, but it is used internally by
> Windows from XP and up.

This is what originally was the encoding of Unicode. It is used (in its unnamed
form, or just called "Unicode") in all versions of Windows NT, from NT 3.1 up
to and including Windows 7. I don't remember which version first supported
the UTF-16 features (i.e., the surrogate pairs), but I do believe that Windows
NT 4 had some support for that already. UTF-16 is two or four bytes per
character.
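
As an illustration of the "two or four bytes", here is a minimal Python
sketch of the surrogate pair arithmetic. The function name utf16_units is
made up for this example, and it assumes the input is a valid code point
that is not itself a surrogate.

```python
def utf16_units(code_point: int) -> list:
    """Split one code point into its 16-bit UTF-16 code units."""
    if code_point < 0x10000:
        return [code_point]              # one unit, two bytes
    offset = code_point - 0x10000        # 20 bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits: high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # low 10 bits: low surrogate
    return [high, low]                   # two units, four bytes


assert utf16_units(0x00F6) == [0x00F6]
assert utf16_units(0x1F600) == [0xD83D, 0xDE00]

# Cross-check against Python's encoder (big-endian, no byte order mark).
assert "\U0001F600".encode("utf-16-be") == bytes([0xD8, 0x3D, 0xDE, 0x00])
```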

> And finally there is UTF-32.

Which is a nice internal representation if you're not concerned with memory.
Each Unicode character fits nicely in a 32-bit data unit. Of course, this
wastes several bits per character, as values over 0x10FFFF are not used, but it
makes working on the characters a breeze.
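
A short Python sketch of why the fixed width is convenient: the n-th
character sits at byte offset 4*n, so no scanning is needed. The helper
char_at is made up for this illustration.

```python
# "A", "ö", "€" and U+1F600: one to four bytes each in UTF-8.
text = "A\u00f6\u20ac\U0001F600"
utf32 = text.encode("utf-32-be")  # big-endian, no byte order mark

# Every character occupies exactly four bytes.
assert len(utf32) == 4 * len(text)


def char_at(buffer: bytes, index: int) -> str:
    """Fetch character number `index` with plain arithmetic."""
    unit = buffer[4 * index : 4 * index + 4]
    return chr(int.from_bytes(unit, "big"))


assert char_at(utf32, 1) == "\u00f6"
assert char_at(utf32, 3) == "\U0001F600"
```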

> The obvious choice for FidoNet is UTF-8.

Indeed.


Fortunately, interest in ISO 2022, which is a lot older, has dwindled since
Unicode appeared. ISO 2022 can be used to mix characters from any character
sets, using a stateful encoding. But writing support for that in software is
not fun (trust me, I know this from experience).
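
To show what "stateful" means in practice, here is a small Python sketch
using ISO-2022-JP. That is only one profile of ISO 2022, picked here because
Python ships a codec for it ("iso2022_jp"); the sample string is an example
chosen for this illustration.

```python
# ASCII, three kanji (U+65E5 U+672C U+8A9E), then ASCII again.
text = "abc\u65e5\u672c\u8a9eabc"
encoded = text.encode("iso2022_jp")

# ESC $ B switches the decoder to JIS X 0208, ESC ( B switches back to ASCII.
assert b"\x1b$B" in encoded
assert b"\x1b(B" in encoded

# All bytes are seven-bit; what a byte means depends on the last escape
# sequence seen, so a decoder has to track state from the start of the text.
assert all(byte < 0x80 for byte in encoded)

assert encoded.decode("iso2022_jp") == text
```

Cut the stream in the middle and the bytes of the kanji are indistinguishable
from ordinary ASCII letters and punctuation, which is the root of most of the
pain.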


\\// Peter

--- Opera Mail/11.60 (Linux)
 * Origin: Softwolves Software @ Oslo, Norway (2:203/0.222)