Conference OSDEBATE, 18996 texts
Text 2636, 495 lines
Written 2005-02-20 12:52:24 by Rich (1:379/45)
   Comment on text 2629 by Ellen K. (1:379/45)
Subject: Re: ESB / XML / Unicode vs 8-bit characters ?
=====================================================
From: "Rich" <@>

   From what you describe below, if the values you emit to XML have
non-ASCII characters, I would expect you to have a problem.

Rich

  "Ellen K." <72322.1016@compuserve.com> wrote in message
  news:eanh11h4vv6b9v21fiaounii3f5dunjl3g@4ax.com...
  On Sat, 19 Feb 2005 23:32:37 -0800, "Rich" <@> wrote in message
  <42183ccd@w3.nls.net>:

  >   The UTF in UTF-8/16/32 stands for Unicode Transformation Format.
  >   You can find these defined in section 2.5 of
  >   http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.

  THANK YOU SO MUCH!!!    :)
  >
  >   It's not clear to me how you are creating the XML from the
  >   templates.  If ANSI data is emitted into an XML document declared as
  >   UTF-8 then you would have problems only for non-ASCII characters.
  >   UTF-8 and Windows-1252 are identical for 0x00 to 0x7F, which is ASCII
  >   in both.
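
A minimal Python sketch of the point above. The sample strings are assumptions made up for illustration, not data from this thread. It shows that ASCII text has the same bytes in Windows-1252 and UTF-8, while a non-ASCII character such as ñ does not, so Windows-1252 bytes placed in a document declared as UTF-8 do not decode.

```python
# Sketch: compare Windows-1252 and UTF-8 byte sequences.
ascii_text = "Contract 0012345"   # hypothetical sample value, ASCII only
spanish_text = "Se\u00f1or"       # "Señor", contains the non-ASCII ñ (U+00F1)

# ASCII characters encode to identical bytes in both encodings.
same_for_ascii = ascii_text.encode("cp1252") == ascii_text.encode("utf-8")

cp1252_bytes = spanish_text.encode("cp1252")  # ñ becomes the single byte 0xF1
utf8_bytes = spanish_text.encode("utf-8")     # ñ becomes the two bytes 0xC3 0xB1


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(same_for_ascii)
print(decodes_as_utf8(cp1252_bytes), decodes_as_utf8(utf8_bytes))
```

A strict XML parser that trusts the UTF-8 declaration would be expected to reject the Windows-1252 bytes in the same way.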

  I don't have a copy of a template here at home, but I have them create
  it by string concatenation because that seems to be the only way to be
  able to have CDATA sections, which I have to have because in the
  legacy data numeric-appearing identifiers are actually 10-character
  strings with leading spaces, and if these are not specified as CDATA
  the spaces get lost even with xml:space="preserve" included in the
  header.  Here is a code snippet from one of my apps that creates an XML
  document which is passed as a parameter to a SQL Server stored
  procedure:

  >      strXM = "<?xml version =" & Chr(34) & "1.0" & Chr(34) & "  encoding=" & Chr(34) & "UTF-8" & Chr(34) & "?>" & vbCrLf _
  >        & "<ROOT xml:space=" & Chr(34) & "preserve" & Chr(34) & ">" & vbCrLf
  >
  >      Do While Not .EOF
  >        strXM = strXM & "<M><A>" & !Ofc & "</A><B><![CDATA[" & !Contract & "]]></B><C>" & !TCode & "</C><D>" & !Date & "</D>" _
  >                & "<E><![CDATA[" & !TransNo & "]]></E></M>" & vbCrLf
  >        .MoveNext
  >      Loop
  >
  >      strXM = strXM & "</ROOT>"
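
For comparison, here is a hedged sketch in Python of building the same document shape with an XML library instead of string concatenation. The row values are hypothetical stand-ins for the recordset fields in the snippet above. It shows only that the library escapes markup characters and writes real UTF-8 bytes, and that a generic parser returns the leading spaces intact. It says nothing about how SQL Server treats that whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the recordset fields (Ofc, Contract, ...).
rows = [
    {"Ofc": "01", "Contract": "      1234", "TCode": "A&B",
     "Date": "2005-02-20", "TransNo": "        42"},
]

# Same single-character element aliases as in the snippet above.
aliases = (("A", "Ofc"), ("B", "Contract"), ("C", "TCode"),
           ("D", "Date"), ("E", "TransNo"))

root = ET.Element("ROOT")
for row in rows:
    m = ET.SubElement(root, "M")
    for tag, field in aliases:
        ET.SubElement(m, tag).text = row[field]

# Serialize to UTF-8 bytes; special characters such as & are escaped for us.
xml_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Round-trip through a parser to check what comes back out.
parsed = ET.fromstring(xml_bytes)
print(repr(parsed.find("M/B").text))
```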

  (The vbCrLf's are there so if there is a problem the document can be
  printed to a text file and be easier for humans to read -- SQL Server
  ignores them.  The single-character aliases for entity and attribute
  names are for performance -- for most of the stuff we use these for it
  doesn't really matter because we are only sending a few rows, but the
  first time I did it it was for something that was sending about 5000
  rows and there it made a huge difference, so I stuck with it.  We
  comment both the front-end code and the stored procedure with the
  mappings of these aliases.)

  >   I do not know how SQL Server maps from char to nchar, specifically
  >   what conversion is performed.  Also, in some (maybe all released)
  >   versions of SQL Server nchar and nvarchar are encoded in UCS-2.  UCS-2
  >   is a 16-bit encoding like UTF-16.  It dates back to when Unicode was
  >   defined as having 2**16 characters instead of the 2**20+ that it has
  >   now.  You cannot express characters >= U+10000 in UCS-2, not that you
  >   care about these.
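
A short Python sketch of the UCS-2 / UTF-16 difference described above; the two sample characters are chosen only for illustration. A character below U+10000 takes one 16-bit unit, while U+10000 needs two units (a surrogate pair) in UTF-16 and therefore has no UCS-2 representation.

```python
# Sketch: count 16-bit code units needed by UTF-16 for two characters.
bmp_char = "\u00f1"         # ñ, below U+10000
astral_char = "\U00010000"  # first code point at or above U+10000

bmp_units = len(bmp_char.encode("utf-16-be")) // 2
astral_bytes = astral_char.encode("utf-16-be")
astral_units = len(astral_bytes) // 2

print(bmp_units, astral_units)
print(astral_bytes.hex())
```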

  Thankfully, no.   :)
  >
  >   I don't know whether those systems you describe being written in
  >   Java makes a difference.  They can do what they want.  The native Java
  >   string is Unicode, though I don't remember if it is UCS-2 or UTF-16.  My
  >   guess is that it was once the former and is now the latter.  One of the
  >   documents on this on Sun's site suggests that Java used UCS-2 until the
  >   recently released 1.5, which is the first to use UTF-16.

  The Java native string being Unicode is exactly what made me start
  worrying -- when I was learning Java a couple of years ago (because I
  wanted to port an app to it so as to be able to run it right on the Unix
  box where the Oracle database was) I was horrified the first time I
  tried reading back what I had written to a text file when I saw spaces
  between all the characters.
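
One plausible cause of those "spaces", sketched in Python. This is an assumption: the thread does not say how the file was written. If text is written as 16-bit units and then read back as an 8-bit encoding, every ASCII character is accompanied by a zero byte, which many viewers render as a blank.

```python
# Sketch (assumed scenario): 16-bit text read back as if it were 8-bit.
text = "abc"

utf16_bytes = text.encode("utf-16-be")   # two bytes per character
misread = utf16_bytes.decode("latin-1")  # reader assumes one byte per character

# Show the NUL characters the way a simple viewer might, as blanks.
visible = misread.replace("\x00", " ")

# Written as UTF-8, the same ASCII text has no padding bytes at all.
utf8_bytes = text.encode("utf-8")

print(repr(visible))
```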
  >
  >Rich
  >
  >  "Ellen K." <72322.1016@compuserve.com> wrote in message
  >  news:aqag115606i9g8bmh3lst66une1f1sotth@4ax.com...
  >  UTF-8 is unicode?!?   Sheesh, all this time I thought it meant 8-bit.
  >  In fact I could swear I read that somewhere.
  >
  >  My question was coming from the database perspective, where I always use
  >  char and varchar, as opposed to nchar and nvarchar.  I give the
  >  front-end guys little templates for creating the XML documents for all
  >  my SQL Server stored procedures that take XML input, and I always
  >  specify UTF-8 in the header... and my char and varchar columns always
  >  end up normal, so since you're now telling me UTF-8 is really unicode, I
  >  guess that would answer my question for XML data I would be getting from
  >  the apps...?    Or would the answer be different if the incoming XML is
  >  some other encoding?
  >
  >  To simulate getting nvarchar data from somewhere, I just tried creating
  >  two dummy tables, one with an nvarchar column and the other with a
  >  varchar column, typed stuff into the nvarchar one, then inserted to the
  >  varchar one select from the nvarchar one and it looks normal.
  >
  >  If all this means I was worrying about nothing, excellent!   OTOH, is
  >  there something I should be worrying about that I didn't ask?
  >
  >  The only pieces whose names I know so far are Sonic and SalesForce, both
  >  of which are written in Java, if that makes any difference.  I know
  >  there is at least one other external piece but I think that is the next
  >  phase.
  >
  >  On Sat, 19 Feb 2005 21:37:15 -0800, "Rich" <@> wrote in message
  >  <421821c1$1@w3.nls.net>:
  >
  >  >   You need to be more specific than "8-bit characters".  There are
  >  >   many 8-bit character encodings.  If you are using Windows to generate
  >  >   your data you most likely are using Windows-1252, which is the default
  >  >   8-bit character set for U.S. English in Windows.  Windows supports many
  >  >   8-bit encodings so you could be using something else too.
  >  >
  >  >   Unicode is a character set, not an encoding.  There are multiple
  >  >   encodings, the main ones being UTF-8, UTF-16, and UTF-32.  You can use
  >  >   any of these for XML as well as non-Unicode encodings.  For
  >  >   interoperability you should use Unicode, preferably UTF-8.
  >  >
  >  >   What comes out when the XML is parsed depends on the XML parser.
  >  >   XML is logically expressed in Unicode.  The Windows XML parsers
  >  >   provide a Unicode interface.  Other parsers could do differently.
  >  >
  >  >Rich
  >  >
  >  >
  >  >  "Ellen K." <72322.1016@compuserve.com> wrote in message
  >  >  news:4o2g11pu048kafbdilg46u77vs5ls0be55@4ax.com...
  >  >  Our new enterprise system is going to be built around an Enterprise
  >  >  Service Bus.  I don't have the full specs yet but as I understand it the
  >  >  main apps (starting with SalesForce) are going to be out on the internet
  >  >  and the Sonic ESB will be the messaging piece.  There will be an
  >  >  Operational Data Store in house that will get updated every night on a
  >  >  batch basis from the main apps.
  >  >
  >  >  My data warehouse will continue to be the data warehouse and will remain
  >  >  in house.  The dimensions will stay the same but I might have to create
  >  >  separate measures for the data from the new apps and then create views
  >  >  to keep everything transparent to the users.
  >  >
  >  >  I'm thinking if we're going to have an ODS in house already, I may as
  >  >  well do the ETL from there.   But I'm worrying that the new data will
  >  >  probably be unicode (because Java defaults to that and SalesForce is
  >  >  written in Java).  Right now I am storing everything (except our blobs
  >  >  of course) in 8-bit characters.
  >  >
  >  >  Anyone here who's up on this stuff, can the XML that goes back and forth
  >  >  convert between unicode and 8-bit characters, or am I gonna have to
  >  >  redefine all my data?   For example, if unicode data is put into an XML
  >  >  document that specifies UTF-8, what comes out when the document is
  >  >  parsed?  How about vice versa?  If this is too simplistic to work, what
  >  >  is needed?
  >  >
  >  >  (We actually have no substantive need for unicode -- we are bilingual
  >  >  Spanish but all the special Spanish characters exist in the ascii
  >  >  character set.)


--- BBBS/NT v4.01 Flag-5
 * Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)