Text 23986, 152 lines
Written 2011-12-26 03:34:25 by FidoNews Robot (2:2/2.0)
Subject: FidoNews 28:52 [02/05]: General Articles
================================================
=================================================================
GENERAL ARTICLES
=================================================================
A PLEA FOR UTF-8 IN FIDONET Part 1
By Michiel van der Vlist, 2:280/5555
First there was the spoken word. That was a long time ago, nobody
knows exactly how long, but it must have been in the order of some
hundred thousand years ago. Later, much, much later, came the written
word, in the order of five thousand years ago. To get a message from
one place to another, a messenger needed to physically transport an
object with the text written on it from A to B.
Forget about the semaphore and let us jump straight to transporting
messages over electric wire. With that came the need for an encoding
scheme. One of the first encoding schemes was Morse code, named after
its (co)inventor Samuel Morse. This was around 1840. Since it was
invented in the Western world, mostly the USA, it is no surprise that
Morse code only covers the digits 0-9, a few special characters such
as the question mark and the period, plus the 26 letters of the Latin
alphabet. Nowadays Morse code is used only by a small group of radio
amateurs, but for over a century it was a mainstream coding method
for telecommunication.
The next step was Baudot code, used in the telex communication
system: a five bit code that covered the 26 letters of the Roman
alphabet plus the digits 0-9 and some punctuation and control
signals. Like Morse code, it made no distinction between upper and
lower case.
In the fifties of the previous century, the first computers entered
the scene. At first these were bulky pieces of machinery filling an
entire room. They were programmed by entering the binary code
directly into memory with so-called sense switches. This was
cumbersome and error prone. Soon the need arose for a way to enter
the mnemonics used to memorise the instructions directly into the
computer and let the computer itself do the translation into binary
form, instead of the operator manually entering the binary code.
With that came the need for a character encoding scheme for
computers. Several encoding schemes were used in the beginning, but
in the end it converged into an 8 bit code that seemed to fit
computers like a glove. Or, to be more precise, a seven bit code,
used on 8 bit transport media: only the lower seven bits were used
for encoding text, and the highest bit was used as an error detection
mechanism, the parity bit. This was ASCII, the American Standard Code
for Information Interchange. Work on it started in 1960 and the first
edition of the standard was published in 1963.
The "A" is "ASCII" stands for "American". So it is no surprise that as
far as the letters go, once again it only covers the 26 letters found
in American English. ASCII is much richer that all of its
predecessors, it has many punctuation and special characters, 32 - now
mostly obsolete - control codes and as a new feature, the distinction
between upper and lower case.
That the character set is limited to what is found in American
English was no great limitation in the beginning of the history of
data processing. Computers, because of their bulk and cost, were only
to be found at government institutes, large companies and
universities. They were used by scientists and engineers. Those could
deal with ASCII-only machines.
What nobody could foresee when ASCII was devised, happened some two
decades later. Computers became small enough and cheap enough to
allow individuals to have their own private computer (a PC) all for
themselves in their own homes. With affordable home computers came
affordable printers, and that was the end of the classic typewriter.
Computer use was no longer limited to research workers whose
employers could afford tons of research equipment, but extended to
people who could afford typewriters. And when those "new typewriters"
spread around the world, the need arose for more than just ASCII.
While ASCII was enough for US Americans using typewriters, it was not
enough for the rest of the world. ASCII-only became a stranglehold.
Those new computer users wanted to write in their own language: a
language that used characters with accents, umlauts and slashes, or
even characters not at all resembling the Roman alphabet: Cyrillic,
or the even more complex Asian and Arabic scripts.
Microsoft and IBM were quick to respond. They introduced the concept
of code pages. ASCII is seven bit, but computers store information in
lumps of eight bits called a byte. The most significant bit,
originally meant as a parity bit but obsoleted by more robust error
checking mechanisms, was free to define another 128 characters. IBM
chose not only to include language specific characters in that set of
128, but also some 30+ so-called "graphic characters" for line
drawing. That may have been a good idea at the time, but in
retrospect it may have been a waste of valuable coding space.
Anyway, at the end of the DOS era there were dozens of code pages,
covering the needs of hundreds of languages. One could write in
German, Swedish, Russian and Greek without problems. Well, one could
not write in Greek and Russian in the same article, because one could
not change code pages in midstream. But who wanted that?
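To make the problem tangible, here is a minimal sketch in Python,
using the cp437 and cp866 codecs from its standard library: one and
the same byte value decodes to a Greek letter under the original IBM
PC code page and to a Cyrillic letter under the Russian one, which is
exactly why the two could not share a single eight bit document.

  raw = bytes([0xE0])          # one byte with the high bit set
  print(raw.decode("cp437"))   # Greek small alpha on the original IBM PC
  print(raw.decode("cp866"))   # Cyrillic small er under the Russian code page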
And then came the Internet. And with the Internet came the World Wide
Web. In the beginning the web just copied the solution to language
issues from DOS: code pages and more code pages. It did not take much
more than a decade to realise that the eight bit barrier was the
second stranglehold. Not being able to write Russian and Greek in one
and the same article was NOT acceptable. Eight bits for a character
set was NOT good enough.
Fortunately the price of memory had also dropped spectacularly, and
the price of transporting bits had dropped steadily as well. Memory
had become so cheap that it became affordable to store pictures in
digital form. Pictures take orders of magnitude more storage space
than text. So increasing the required storage space for text by a
factor of two, by going from a one byte character encoding scheme to
a multibyte encoding scheme, no longer met with economic
restrictions.
Enter Unicode.
Unicode introduces the concept of the Universal Character Set. It is
not a static entity, it is still growing. It provides room for over a
million code points, of which well over a hundred thousand characters
have presently been assigned. While in the code page concept
character set and character encoding scheme are one and the same, in
Unicode they are decoupled. There is ONE character set: the Universal
Character Set. There are several encoding schemes that all have their
merits.
First there is UTF-7, designed for stone age transport layers that
are 7 bits only. Next there is UTF-8. This is an 8 bit multibyte
encoding that takes one to four bytes to encode a character. Next
there is UTF-16. It is not suitable for byte oriented transport media
that use NULL as a special character, but it is used internally by
Windows from XP and up. And finally there is UTF-32.
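A minimal sketch in Python, using the encoders from its standard
library, shows the decoupling at work: one and the same character
from the Universal Character Set yields a different byte sequence
under each encoding scheme, and the NULL bytes that UTF-16 and UTF-32
produce show why they do not survive NULL terminated transport
layers.

  ch = "\u042F"                    # CYRILLIC CAPITAL LETTER YA, one UCS character
  print(ch.encode("utf-8"))        # two bytes:  0xD0 0xAF
  print(ch.encode("utf-16-be"))    # two bytes:  0x04 0x2F
  print(ch.encode("utf-32-be"))    # four bytes: 0x00 0x00 0x04 0x2F
  print("A".encode("utf-16-be"))   # two bytes:  0x00 0x41  <- contains a NULL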
The obvious choice for FidoNet is UTF-8. The transport layer of
FidoNet is fully 8 bit transparent, with the exception of the NULL
byte that is used as a termination character. UTF-8 is fully downward
compatible with ASCII: the first 128 characters of the Universal
Character Set are the same as the ASCII set and they are encoded in
exactly the same way. So the NULL in UTF-8 is the same as the NULL in
ASCII: no problem. Also, there will be no conflict with those that
have no need for anything other than good old 7 bit ASCII. They can
keep using the software that they have been using all the time and
everyone will see the same text on his/her screen.
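A small sketch in Python makes both properties visible: plain ASCII
text produces exactly the same bytes whether it is encoded as ASCII
or as UTF-8, and the multibyte sequences that UTF-8 generates for
non-ASCII characters (a few are picked here purely for illustration)
consist only of bytes with the high bit set, so a NULL byte can never
appear by accident.

  text = "Hello FidoNet"
  # byte for byte identical to the plain ASCII encoding
  assert text.encode("ascii") == text.encode("utf-8")

  # every byte of a multibyte UTF-8 sequence is 0x80 or higher,
  # so the NULL terminator of the transport layer stays unambiguous
  assert all(b >= 0x80 for b in "\u0416\u00EB\u00E9".encode("utf-8"))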
Next week we will go into some details on how to get UTF-8 encoded
FidoNet messages on your screen.
To be continued....
© Michiel van der Vlist, all rights reserved.
Permission to publish in the FIDONEWS file echo and the FIDONEWS
discussion echo as originating from 2:2/2.
-----------------------------------------------------------------
--- Azure/NewsPrep 3.0
* Origin: Home of the Fidonews (2:2/2.0)