Conference OSDEBATE, 18996 messages
Message 2625, 325 lines
Written 2005-02-19 23:32:36 by Rich (1:379/45)
   Comment on message 2624 by Ellen K. (1:379/45)
Subject: Re: ESB / XML / Unicode vs 8-bit characters ?
=====================================================
From: "Rich" <@>


   The UTF in UTF-8/16/32 stands for Unicode Transformation Format.  You
can find these defined in section 2.5 of
http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.

   It's not clear to me how you are creating the XML from the templates.
If ANSI data is emitted into an XML document declared as UTF-8 then you
would have problems only for non-ASCII characters.  UTF-8 and Windows-1252
are identical for 0x00 to 0x7F, which is ASCII in both.
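
   To make the point about non-ASCII characters concrete, here is a small Python sketch.  The sample string and the helper name decodes_as_utf8 are made up for illustration, and Python's "cp1252" codec stands in for the Windows ANSI code page.  This is not what SQL Server or any particular XML parser does internally.

```python
# Hypothetical sample data containing one non-ASCII character (n with tilde).
text = "a\u00f1o 2005"

cp1252_bytes = text.encode("cp1252")  # what an "ANSI" Windows app would emit
utf8_bytes = text.encode("utf-8")     # what a document declared as UTF-8 should contain

# The pure-ASCII part is byte-for-byte identical in both encodings.
ascii_only = "2005"
assert ascii_only.encode("cp1252") == ascii_only.encode("utf-8")

# The non-ASCII character differs: one byte in Windows-1252, two in UTF-8.
print(cp1252_bytes)  # b'a\xf1o 2005'
print(utf8_bytes)    # b'a\xc3\xb1o 2005'


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(utf8_bytes))    # True
print(decodes_as_utf8(cp1252_bytes))  # False
```

   The Windows-1252 bytes are rejected as UTF-8 because 0xF1 is not followed by the continuation bytes UTF-8 requires.  A strict XML parser would be expected to complain in the same situation, while pure-ASCII data passes through unchanged.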

   I do not know how SQL Server maps from char to nchar, specifically
what conversion is performed.  Also, in some (maybe all released) versions
of SQL Server nchar and nvarchar are encoded in UCS-2.  UCS-2 is a 16-bit
encoding like UTF-16.  It dates back to when Unicode was defined as having
2**16 characters instead of the 2**20+ that it has now.  You can not
express characters >= U+10000 in UCS-2, not that you care about these.
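
   A rough Python sketch of the difference between UCS-2 and UTF-16 follows.  The helper names utf16_code_units and fits_in_ucs2 are hypothetical, and the two sample characters are arbitrary choices.  It only illustrates the encoding forms, not how SQL Server stores nchar data.

```python
def utf16_code_units(ch: str) -> list:
    """Return the 16-bit code units UTF-16 uses for a single character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_in_ucs2(ch: str) -> bool:
    """UCS-2 uses exactly one 16-bit unit per character, so only code
    points below U+10000 can be represented."""
    return ord(ch) < 0x10000


bmp_char = "\u00f1"         # U+00F1, n with tilde
astral_char = "\U0001D11E"  # U+1D11E, musical symbol G clef

print([hex(u) for u in utf16_code_units(bmp_char)])     # ['0xf1']
print([hex(u) for u in utf16_code_units(astral_char)])  # ['0xd834', '0xdd1e']
print(fits_in_ucs2(bmp_char), fits_in_ucs2(astral_char))  # True False
```

   For characters below U+10000 the UTF-16 code unit is the same 16-bit value UCS-2 would use.  Above that, UTF-16 needs a surrogate pair, which UCS-2 has no way to express.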

   I don't know whether those systems you describe being written in Java
makes a difference.  They can do what they want.  The native Java string
is Unicode, though I don't remember if it is UCS-2 or UTF-16.  My guess is
that it was once the former and is now the latter.  One of the documents on
this on Sun's site suggests that Java used UCS-2 until the recently released
1.5, which is the first to use UTF-16.

Rich

  "Ellen K." <72322.1016@compuserve.com> wrote in message
  news:aqag115606i9g8bmh3lst66une1f1sotth@4ax.com...
  UTF-8 is unicode?!?   Sheesh, all this time I thought it meant 8-bit.
  In fact I could swear I read that somewhere.

  My question was coming from the database perspective, where I always use
  char and varchar, as opposed to nchar and nvarchar.  I give the
  front-end guys little templates for creating the XML documents for all
  my SQL Server stored procedures that take XML input, and I always
  specify UTF-8 in the header... and my char and varchar columns always
  end up normal, so since you're now telling me UTF-8 is really unicode, I
  guess that would answer my question for XML data I would be getting from
  the apps...?    Or would the answer be different if the incoming XML is
  some other encoding?

  To simulate getting nvarchar data from somewhere, I just tried creating
  two dummy tables, one with an nvarchar column and the other with a
  varchar column, typed stuff into the nvarchar one, then inserted to the
  varchar one select from the nvarchar one and it looks normal.

  If all this means I was worrying about nothing, excellent!   OTOH, is
  there something I should be worrying about that I didn't ask?

  The only pieces whose names I know so far are Sonic and SalesForce, both
  of which are written in Java, if that makes any difference.  I know
  there is at least one other external piece but I think that is the next
  phase.

  On Sat, 19 Feb 2005 21:37:15 -0800, "Rich" <@> wrote in message
  <421821c1$1@w3.nls.net>:

  >   You need to be more specific than "8-bit characters".  There are many
  >8-bit character encodings.  If you are using Windows to generate your
  >data you most likely are using Windows-1252 which is the default 8-bit
  >character set for U.S. English in Windows.  Windows supports many 8-bit
  >encodings so you could be using something else too.
  >
  >   Unicode is a character set not an encoding.  There are multiple
  >encodings the main ones being UTF-8, UTF-16, and UTF-32.  You can use any
  >of these for XML as well as non-Unicode encodings.  For interoperability
  >you should use Unicode preferably UTF-8.
  >
  >   What comes out when the XML is parsed depends on the XML parser.  XML
  >is logically expressed in Unicode.  The Windows XML parsers provide a
  >Unicode interface.  Other parsers could do differently.
  >
  >Rich
  >
  >
  >  "Ellen K." <72322.1016@compuserve.com> wrote in message
  >  news:4o2g11pu048kafbdilg46u77vs5ls0be55@4ax.com...
  >  Our new enterprise system is going to be built around an Enterprise
  >  Service Bus.  I don't have the full specs yet but as I understand it the
  >  main apps (starting with SalesForce) are going to be out on the internet
  >  and the Sonic ESB will be the messaging piece.  There will  be an
  >  Operational Data Store in house that will get updated every night on a
  >  batch basis from the main apps.
  >
  >  My data warehouse will continue to be the data warehouse and will remain
  >  in house.  The dimensions will stay the same but I might have to create
  >  separate measures for the data from the new apps and then create views
  >  to keep everything transparent to the users.
  >
  >  I'm thinking if we're going to have an ODS in house already, I may as
  >  well do the ETL from there.   But I'm worrying that the new data will
  >  probably be unicode (because Java defaults to that and SalesForce is
  >  written in Java).  Right now I am storing everything (except our blobs
  >  of course) in 8-bit characters.
  >
  >  Anyone here who's up on this stuff, can the XML that goes back and forth
  >  convert between unicode and 8-bit characters, or am I gonna have to
  >  redefine all my data?   For example, if unicode data is put into an XML
  >  document that specifies UTF-8, what comes out when the document is
  >  parsed?  How about vice versa?  If this is too simplistic to work, what
  >  is needed?
  >
  >  (We actually have no substantive need for unicode -- we are bilingual
  >  Spanish but all the special Spanish characters exist in the ascii
  >  character set.)


--- BBBS/NT v4.01 Flag-5
 * Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)