Tillbaka till svenska Fidonet
English   Information   Debug  
OS2BBS   0/787
OS2DOSBBS   0/580
OS2HW   0/42
OS2INET   0/37
OS2LAN   0/134
OS2PROG   0/36
OS2REXX   0/113
OS2USER-L   207
OS2   0/4786
OSDEBATE   2854/18996
PASCAL   0/490
PERL   0/457
PHP   0/45
POINTS   0/405
POLITICS   0/29554
POL_INC   0/14731
PSION   103
R20_ADMIN   1121
R20_AMATORRADIO   0/2
R20_BEST_OF_FIDONET   13
R20_CHAT   0/893
R20_DEPP   0/3
R20_DEV   399
R20_ECHO2   1379
R20_ECHOPRES   0/35
R20_ESTAT   0/719
R20_FIDONETPROG...
...RAM.MYPOINT
  0/2
R20_FIDONETPROGRAM   0/22
R20_FIDONET   0/248
R20_FILEFIND   0/24
R20_FILEFOUND   0/22
R20_HIFI   0/3
R20_INFO2   3205
R20_INTERNET   0/12940
R20_INTRESSE   0/60
R20_INTR_KOM   0/99
R20_KANDIDAT.CHAT   42
R20_KANDIDAT   28
R20_KOM_DEV   112
R20_KONTROLL   0/13258
R20_KORSET   0/18
R20_LOKALTRAFIK   0/24
R20_MODERATOR   0/1852
R20_NC   76
R20_NET200   245
R20_NETWORK.OTH...
...ERNETS
  0/13
R20_OPERATIVSYS...
...TEM.LINUX
  0/44
R20_PROGRAMVAROR   0/1
R20_REC2NEC   534
R20_SFOSM   0/340
R20_SF   0/108
R20_SPRAK.ENGLISH   0/1
R20_SQUISH   107
R20_TEST   2
R20_WORST_OF_FIDONET   12
RAR   0/9
RA_MULTI   106
RA_UTIL   0/162
REGCON.EUR   0/2056
REGCON   0/13
SCIENCE   0/1206
SF   0/239
SHAREWARE_SUPPORT   0/5146
SHAREWRE   0/14
SIMPSONS   0/169
STATS_OLD1   0/2539.065
STATS_OLD2   0/2530
STATS_OLD3   0/2395.095
STATS_OLD4   0/1692.25
SURVIVOR   0/495
SYSOPS_CORNER   0/3
SYSOP   0/84
TAGLINES   0/112
TEAMOS2   0/4530
TECH   0/2617
TEST.444   0/105
TRAPDOOR   0/19
TREK   0/755
TUB   0/290
UFO   0/40
UNIX   0/1316
USA_EURLINK   0/102
USR_MODEMS   0/1
VATICAN   0/2740
VIETNAM_VETS   0/14
VIRUS   0/378
VIRUS_INFO   0/201
VISUAL_BASIC   0/473
WHITEHOUSE   0/5187
WIN2000   0/101
WIN32   0/30
WIN95   0/4288
WIN95_OLD1   0/70272
WINDOWS   0/1517
WWB_SYSOP   0/419
WWB_TECH   0/810
ZCC-PUBLIC   0/1
ZEC   4

 
4DOS   0/134
ABORTION   0/7
ALASKA_CHAT   0/506
ALLFIX_FILE   0/1313
ALLFIX_FILE_OLD1   0/7997
ALT_DOS   0/152
AMATEUR_RADIO   0/1039
AMIGASALE   0/14
AMIGA   0/331
AMIGA_INT   0/1
AMIGA_PROG   0/20
AMIGA_SYSOP   0/26
ANIME   0/15
ARGUS   0/924
ASCII_ART   0/340
ASIAN_LINK   0/651
ASTRONOMY   0/417
AUDIO   0/92
AUTOMOBILE_RACING   0/105
BABYLON5   0/17862
BAG   135
BATPOWER   0/361
BBBS.ENGLISH   0/382
BBSLAW   0/109
BBS_ADS   0/5290
BBS_INTERNET   0/507
BIBLE   0/3563
BINKD   0/1119
BINKLEY   0/215
BLUEWAVE   0/2173
CABLE_MODEMS   0/25
CBM   0/46
CDRECORD   0/66
CDROM   0/20
CLASSIC_COMPUTER   0/378
COMICS   0/15
CONSPRCY   0/899
COOKING   32677
COOKING_OLD1   0/24719
COOKING_OLD2   0/40862
COOKING_OLD3   0/37489
COOKING_OLD4   0/35496
COOKING_OLD5   9370
C_ECHO   0/189
C_PLUSPLUS   0/31
DIRTY_DOZEN   0/201
DOORGAMES   0/2053
DOS_INTERNET   0/196
duplikat   6002
ECHOLIST   0/18295
EC_SUPPORT   0/318
ELECTRONICS   0/359
ELEKTRONIK.GER   1534
ENET.LINGUISTIC   0/13
ENET.POLITICS   0/4
ENET.SOFT   0/11701
ENET.SYSOP   33888
ENET.TALKS   0/32
ENGLISH_TUTOR   0/2000
EVOLUTION   0/1335
FDECHO   0/217
FDN_ANNOUNCE   0/7068
FIDONEWS   24094
FIDONEWS_OLD1   0/49742
FIDONEWS_OLD2   0/35949
FIDONEWS_OLD3   0/30874
FIDONEWS_OLD4   0/37224
FIDO_SYSOP   12852
FIDO_UTIL   0/180
FILEFIND   0/209
FILEGATE   0/212
FILM   0/18
FNEWS_PUBLISH   4393
FN_SYSOP   41678
FN_SYSOP_OLD1   71952
FTP_FIDO   0/2
FTSC_PUBLIC   0/13598
FUNNY   0/4886
GENEALOGY.EUR   0/71
GET_INFO   105
GOLDED   0/408
HAM   0/16069
HOLYSMOKE   0/6791
HOT_SITES   0/1
HTMLEDIT   0/71
HUB203   466
HUB_100   264
HUB_400   39
HUMOR   0/29
IC   0/2851
INTERNET   0/424
INTERUSER   0/3
IP_CONNECT   719
JAMNNTPD   0/233
JAMTLAND   0/47
KATTY_KORNER   0/41
LAN   0/16
LINUX-USER   0/19
LINUXHELP   0/1155
LINUX   0/22090
LINUX_BBS   0/957
mail   18.68
mail_fore_ok   249
MENSA   0/341
MODERATOR   0/102
MONTE   0/992
MOSCOW_OKLAHOMA   0/1245
MUFFIN   0/783
MUSIC   0/321
N203_STAT   924
N203_SYSCHAT   313
NET203   321
NET204   69
NET_DEV   0/10
NORD.ADMIN   0/101
NORD.CHAT   0/2572
NORD.FIDONET   189
NORD.HARDWARE   0/28
NORD.KULTUR   0/114
NORD.PROG   0/32
NORD.SOFTWARE   0/88
NORD.TEKNIK   0/58
NORD   0/453
OCCULT_CHAT   0/93
Möte OSDEBATE, 18996 texter
 lista första sista föregående nästa
Text 2668, 1342 rader
Skriven 2005-02-21 21:58:24 av Rich (1:379/45)
   Kommentar till text 2653 av Ellen K (1:379/45)
Ärende: Re: ESB / XML / Unicode vs 8-bit characters ?
=====================================================
From: "Rich" <@>

This is a multi-part message in MIME format.

------=_NextPart_000_0A6B_01C51860.770EC1B0
Content-Type: text/plain;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

   If SalesForce creates UTF-8 encoded XML, and it really is UTF-8 =
encoded, then you should have no problem.  The XML is valid and any = software
you have that consumes XML should accept it just fine.

   I don't see a problem with UTF-16 if you prefer this to UTF-8.  Both =
can represent exactly the same characters.  The only difference is the =
mechanics of the encoding.

Rich

  "Ellen K" <Ellen.K@harborwebs.com> wrote in message =
news:908e44.556f2c@harborwebs.com...
  First of all I have to say you have once again earned my eternal =
gratitude, I
  am so glad you are here.   :)

  I still have to make a decision regarding the standard I want to =
request.  Are
  you saying that if for example I get XML generated by say SalesForce =
with UTF-8
  specified and it includes these characters, I would not have a =
problem?  Stuff
  I write myself I'm not worried about because I have the opportunity to =
tweak
  it, what I'm worried about is what the ESB will try to feed my =
databases.=20
  Since even the hand-typed XML was accepted when UTF-16 was specified, =
I'm kind
  of leaning toward that, especially since the Oracle guy told me that =
the TJ
  accounting manager used to complain that the Spanish characters =
weren't coming
  out on reports when we were on Oracle 8.x which used UTF-8 but as soon =
as we
  moved to 9.x which uses UTF-16 there were no more problems.  What (if =
any) do
  you see as a potential downside to UTF-16?


  > From: "Rich" <@>
  > This is a multi-part message in MIME format.
  > ------=3D_NextPart_000_08FA_01C517AB.C3617E00
  > Content-Type: text/plain;
  > charset=3D"iso-8859-1"
  > Content-Transfer-Encoding: quoted-printable
  > The problem is not that UTF-8 won't work for what you call Spanish =
=3D
  > characters.  UTF-8 can encode anything that you consider a =
character.  =3D
  > The problem is that when these characters are present they are not =
being =3D
  > encoded in UTF-8.  There are two straight forward solutions I see.  =
One =3D
  > is to encode the XML correctly in UTF-8.  The other is to encode the =
XML =3D
  > in the Windows ANSI encoding I suspect you are using and to tag it =
=3D
  > correctly.  On a Windows U.S. English system this would be =3D
  > "Windows-1252".  I would expect SQL Server to support this.  Other =
=3D
  > applications may or may not.  You could also use UTF-16 as long as =
you =3D
  > generate your XML in UTF-16.  I don't know if this is simple for =
your =3D
  > application or not.  If you are using VB 7.0 it should be.
  > Rich
  > "Ellen K." <72322.1016@compuserve.com> wrote in message =3D
  > news:of2j11lqf4r4h1dsnvok8i1dv7c9e3rlb2@4ax.com...
  > Well.   You just helped me learn a lot.   For one thing, I guess I
  > thought that because the Spanish characters can be expressed as 8 =
bits
  > they were ASCII. =3D20
  > I just made a little test stored procedure taking an XML document as =
a
  > parameter, created the document manually in Notepad with UTF-8 =3D
  > specified
  > in the header and tried including some of the Spanish characters... =
=3D
  > and
  > it failed.  SQL Server could not execute sp_xml_preparedocument =3D
  > because
  > "an invalid character was found in text content".   Just to make =
sure
  > that was the problem I substituted non-Spanish characters for the
  > Spanish ones and it executed fine.  However, I can manually type the
  > text with the Spanish characters into the varchar field if I open =
the
  > table in EnterpriseManager and SQL Server is perfectly happy.  OTOH, =
=3D
  > if
  > I specify UTF-16, I get "Switch from current encoding to specified
  > encoding not supported."   Next thing I tried was cloning the sproc =
to
  > write to the table with the nvarchar column, still no joy, same =
error
  > message... but on changing the datatype of the input parameter from =
=3D
  > text
  > to ntext it worked fine.  BUT here's the surprise (OK, to me it was =
a
  > surprise):  If I again clone the sproc to point to the table with =
the
  > varchar column, but leave the input parameter as ntext and specify
  > UTF-16 in the document header, it works.  In other words, a varchar =
=3D
  > (and
  > presumably a char) column can successfully accept unicode data even
  > though char and varchar are explicitly defined as non-unicode =3D
  > datatypes!
  > Now I have to understand whether I have a problem at work.   I never
  > experienced this problem in real life because none of the data we
  > currently send using XML includes any of the Spanish characters.  Is
  > the problem only going to occur if the XML document is constructed =
=3D
  > using
  > the concatenated-string method?   Or would it happen any time an XML
  > document specified as UTF-8 included Spanish characters?   (The data
  > sent by SalesForce to the ODS is likely to include Spanish =
characters,
  > but it probably creates the XML some other way.)  Do I need to tell =
=3D
  > the
  > consulting outfit to specify all XML as UTF-16? =3D20
  > For the ETL from the ODS to the data warehouse I am not planning to =
=3D
  > use
  > Sonic, but rather probably to link the databases and use a bunch of
  > stored procedures controlled by some VB code, IOW I will not need =
XML
  > because all the extract sprocs will look like INSERT INTO....SELECT
  > FROM.
  > I don't yet understand why UTF-8 can't work for the Spanish
  > characters... (unless it only doesn't work when the characters are
  > manually typed into the document).  If I correctly understand the
  > document to which you referred me, an 8-bit character can't fit in =
one
  > UTF-8 byte because the first bit is reserved for indicating which is =
=3D
  > the
  > first byte of a UTF-8 multi-byte character.  (This was your point =
=3D
  > about
  > not greater than 0x7F.)   But why wouldn't it just make two bytes =
out =3D
  > of
  > the Spanish characters then?   The documentation says UTF-8 uses
  > multiple bytes for the characters that it can't fit into one byte. =
=3D20
  > ???
  > On Sun, 20 Feb 2005 13:10:40 -0800, "Rich" <@> wrote in message
  > <4218fc91@w3.nls.net>:
  >> The Spanish accented characters are not part of ASCII.  They are =
=3D
  > part of Windows calls ANSI of which ASCII is the subset (0x00 to =
0x7F).  =3D
  > Any character in the 0x80 to 0xFF range is not compatible between =
ANSI =3D
  > and UTF-8.
  >>=20
  >> Rich
  >>=20
  >> "Ellen K." <72322.1016@compuserve.com> wrote in message =3D
  > news:7ouh119ivmuk26icg3mqqqk2ss1lfm5c10@4ax.com...
  >> Should not have any non-ASCII characters, as previously noted all =
=3D
  > the
  >> special Spanish characters are available in the ASCII character =3D
  > set.
  >> And since the company is built on our understanding of the Hispanic
  >> market, I don't see any use of, say, pictograph-based languages in =
=3D
  > the
  >> foreseeable future.   If 10 years down the road something like that
  >> happens, well, by then we will no longer need compatibility with =
=3D
  > the
  >> current legacy system because it will long since have been =3D
  > replaced.
  >>=20
  >> On Sun, 20 Feb 2005 12:52:25 -0800, "Rich" <@> wrote in message
  >> <4218f849$1@w3.nls.net>:
  >>=20
  >> >   From what you describe below, if the values you emit to XML =3D
  > have non-ASCII characters I would expect you to have a problem.
  >> >
  >> >Rich
  >> >
  >> >  "Ellen K." <72322.1016@compuserve.com> wrote in message =3D
  > news:eanh11h4vv6b9v21fiaounii3f5dunjl3g@4ax.com...
  >> >  On Sat, 19 Feb 2005 23:32:37 -0800, "Rich" <@> wrote in message
  >> >  <42183ccd@w3.nls.net>:
  >> >
  >> >  >   The UTF in UTF-8/16/32 stands for Unicode Transformation =3D
  > Format.  You can find these defined in section 2.5 of =3D
  > http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.
  >> >
  >> >  THANK YOU SO MUCH!!!    :)
  >> >  >
  >> >  >   It's not clear to me how you are creating the XML from the =
=3D
  > templates.  If ANSI data is emitted into an XML document declared as =
=3D
  > UTF-8 then you would have problems only for non-ASCII characters.  =
UTF-8 =3D
  > and Windows-1252 are identical for 0x00 to 0x7F which is ASCII in =
both.
  >> >
  >> >  I don't have a copy of a template here at home, but I have them =
=3D
  > create
  >> >  it by string concatenation because that seems to be the only way =
=3D
  > to be
  >> >  able to have CDATA attributes, which I have to have because in =
=3D
  > the
  >> >  legacy data numeric-appearing identifiers are actually =3D
  > 10-character
  >> >  strings with leading spaces, and if these are not specified as =
=3D
  > CDATA
  >> >  the spaces go lost even with "xml:space=3D3D"preserve"" included =
=3D
  > in the
  >> >  header.  Here is a code snippet from one of my apps that creates =
=3D
  > an XML
  >> >  document which is passed as a parameter to a SQL Server stored
  >> >  procedure:
  >> >
  >> >  >      strXM =3D3D "<?xml version =3D3D" & Chr(34) & "1.0" & =
Chr(34) =3D
  > & "  encoding=3D3D" & Chr(34) & "UTF-8" & Chr(34) & "?>" & vbCrLf _
  >> >  >        & "<ROOT xml:space=3D3D" & Chr(34) & "preserve" & =
Chr(34) =3D
  > & ">" & vbCrLf
  >> >  >
  >> >  >      Do While Not .EOF
  >> >  >        strXM =3D3D strXM & "<M><A>" & !Ofc & =
"</A><B><![CDATA[" =3D
  > & !Contract & "]]></B><C>" & !TCode & "</C><D>" & !Date & "</D>" _
  >> >  >                & "<E><![CDATA[" & !TransNo & "]]></E></M>" & =
=3D
  > vbCrLf
  >> >  >        .MoveNext
  >> >  >      Loop
  >> >  >
  >> >  >      strXM =3D3D strXM & "</ROOT>"
  >> >
  >> >  (The vbCrLf's are there so if there is a problem the document =
=3D
  > can be
  >> >  printed to a text file and be easier for humans to read -- SQL =
=3D
  > Server
  >> >  ignores them.  The single-character aliases for entity and =3D
  > attribute
  >> >  names are for performance -- for most of the stuff we use these =
=3D
  > for it
  >> >  doesn't really matter because we are only sending a few rows, =
=3D
  > but the
  >> >  first time I did it it was for something that was sending about =
=3D
  > 5000
  >> >  rows and there it made a huge difference, so I stuck with it.  =
=3D
  > We
  >> >  comment both the front-end code and the stored procedure with =
=3D
  > the
  >> >  mappings of these aliases.)
  >> >
  >> >  >   I do not know how SQL Server maps from char to nchar, =3D
  > specifically what conversion is performed.  Also, in some (maybe all =
=3D
  > released) versions of SQL Server nchar and nvarchar are encoded in =
=3D
  > UCS-2.  UCS-2 is a 16-bit encoding like UTF-16.  It dates back to =
when =3D
  > Unicode was defined as having 2**16 characters instead of the 2**20+ =
=3D
  > that it has now.  You can not express characters >=3D3D U+10000 in =
UCS-2 =3D
  > not that you care about these.
  >> >
  >> >  Thankfully, no.   :)
  >> >  >
  >> >  >   I don't know if whether those systems you describe being =3D
  > written in java make a difference.  They can do what they want.  The =
=3D
  > native java string is Unicode though I don't remember if it is UCS-2 =
or =3D
  > UTF-16.  My guess is that it was once the former and is now the =
latter.  =3D
  > One of the documents on this on sun's site suggests that java used =
UCS-2 =3D
  > until the recently released 1.5 which is the first to use UTF-16.
  >> >
  >> >  The Java native string being unicode is exactly what made me =3D
  > start
  >> >  worrying -- when I was learning Java a couple of years ago =3D
  > (because I
  >> >  wanted to port an app to it so as to be able to run it right on =
=3D
  > the Unix
  >> >  box where the Oracle database was) I was horrified the first =3D
  > time I
  >> >  tried reading back what I had written to a text file when I saw =
=3D
  > spaces
  >> >  between all the characters.
  >> >  >
  >> >  >Rich
  >> >  >
  >> >  >  "Ellen K." <72322.1016@compuserve.com> wrote in message =3D
  > news:aqag115606i9g8bmh3lst66une1f1sotth@4ax.com...
  >> >  >  UTF-8 is unicode?!?   Sheesh, all this time I thought it =3D
  > meant 8-bit.
  >> >  >  In fact I could swear I read that somewhere.
  >> >  >
  >> >  >  My question was coming from the database perspective, where I =
=3D
  > always use
  >> >  >  char and varchar, as opposed to nchar and nvarchar.  I give =
=3D
  > the
  >> >  >  front-end guys little templates for creating the XML =3D
  > documents for all
  >> >  >  my SQL Server stored procedures that take XML input, and I =
=3D
  > always
  >> >  >  specify UTF-8 in the header... and my char and varchar =3D
  > columns always
  >> >  >  end up normal, so since you're now telling me UTF-8 is really =
=3D
  > unicode, I
  >> >  >  guess that would answer my question for XML data I would be =
=3D
  > getting from
  >> >  >  the apps...?    Or would the answer be different if the =3D
  > incoming XML is
  >> >  >  some other encoding?
  >> >  >
  >> >  >  To simulate getting nvarchar data from somewhere, I just =3D
  > tried creating
  >> >  >  two dummy tables, one with an nvarchar column and the other =
=3D
  > with a
  >> >  >  varchar column, typed stuff into the nvarchar one, then =3D
  > inserted to the
  >> >  >  varchar one select from the nvarchar one and it looks normal. =
=3D
  > =3D20
  >> >  >
  >> >  >  If all this means I was worrying about nothing, excellent!   =
=3D
  > OTOH, is
  >> >  >  there something I should be worrying about that I didn't ask?
  >> >  >
  >> >  >  The only pieces whose names I know so far are Sonic and =3D
  > SalesForce, both
  >> >  >  of which are written in Java, if that makes any difference.  =
=3D
  > I know
  >> >  >  there is at least one other external piece but I think that =
=3D
  > is the next
  >> >  >  phase.
  >> >  >
  >> >  >  On Sat, 19 Feb 2005 21:37:15 -0800, "Rich" <@> wrote in =3D
  > message
  >> >  >  <421821c1$1@w3.nls.net>:
  >> >  >
  >> >  >  >   You need to be more specific than "8-bit characters".  =
=3D
  > There are many 8-bit character encodings.  If you are using Windows =
to =3D
  > generate your data you most likely are using Windows-1252 which is =
the =3D
  > default 8-bit character set for U.S. English in Windows.  Windows =
=3D
  > supports many 8-bit encodings so you could be using something else =
too.
  >> >  >  >
  >> >  >  >   Unicode is a character set not an encoding.  There are =
=3D
  > multiple encodings the main ones being UTF-8, UTF-16, and UTF-32.  =
You =3D
  > can use any of these for XML as well as non-Unicode encodings.  For =
=3D
  > interoperability you should use Unicode preferably UTF-8.
  >> >  >  >
  >> >  >  >   What comes out when the XML is parsed depends on the XML =
=3D
  > parser.  XML is logically expressed in Unicode.  The Windows XML =
parsers =3D
  > provide a Unicode interface.  Other parsers could do differently.
  >> >  >  >
  >> >  >  >Rich
  >> >  >  >
  >> >  >  >
  >> >  >  >  "Ellen K." <72322.1016@compuserve.com> wrote in message =
=3D
  > news:4o2g11pu048kafbdilg46u77vs5ls0be55@4ax.com...
  >> >  >  >  Our new enterprise system is going to be built around an =
=3D
  > Enterprise
  >> >  >  >  Service Bus.  I don't have the full specs yet but as I =3D
  > understand it the
  >> >  >  >  main apps (starting with SalesForce) are going to be out =
=3D
  > on the internet
  >> >  >  >  and the Sonic ESB will be the messaging piece.  There will =
=3D
  > be an
  >> >  >  >  Operational Data Store in house that will get updated =3D
  > every night on a
  >> >  >  >  batch basis from the main apps. =3D20
  >> >  >  >
  >> >  >  >  My data warehouse will continue to be the data warehouse =
=3D
  > and will remain
  >> >  >  >  in house.  The dimensions will stay the same but I might =
=3D
  > have to create
  >> >  >  >  separate measures for the data from the new apps and then =
=3D
  > create views
  >> >  >  >  to keep everything transparent to the users.  =3D20
  >> >  >  >
  >> >  >  >  I'm thinking if we're going to have an ODS in house =3D
  > already, I may as
  >> >  >  >  well do the ETL from there.   But I'm worrying that the =
=3D
  > new data will
  >> >  >  >  probably be unicode (because Java defaults to that and =3D
  > SalesForce is
  >> >  >  >  written in Java).  Right now I am storing everything =3D
  > (except our blobs
  >> >  >  >  of course) in 8-bit characters.  =3D20
  >> >  >  >
  >> >  >  >  Anyone here who's up on this stuff, can the XML that goes =
=3D
  > back and forth
  >> >  >  >  convert between unicode and 8-bit characters, or am I =3D
  > gonna have to
  >> >  >  >  redefine all my data?   For example, if unicode data is =
=3D
  > put into an XML
  >> >  >  >  document that specifies UTF-8, what comes out when the =3D
  > document is
  >> >  >  >  parsed?  How about vice versa?  If this is too simplistic =
=3D
  > to work, what
  >> >  >  >  is needed?
  >> >  >  >
  >> >  >  >  (We actually have no substantive need for unicode -- we =
=3D
  > are bilingual
  >> >  >  >  Spanish but all the special Spanish characters exist in =
=3D
  > the ascii
  >> >  >  >  character set.)
  > ------=3D_NextPart_000_08FA_01C517AB.C3617E00
  > Content-Type: text/html;
  > charset=3D"iso-8859-1"
  > Content-Transfer-Encoding: quoted-printable
  > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  > <HTML><HEAD>
  > <META http-equiv=3D3DContent-Type content=3D3D"text/html; =3D
  > charset=3D3Diso-8859-1">
  > <META content=3D3D"MSHTML 6.00.3790.1289" name=3D3DGENERATOR>
  > <STYLE></STYLE>
  > </HEAD>
  > <BODY bgColor=3D3D#ffffff>
  > <DIV><FONT face=3D3DArial size=3D3D2>&nbsp;&nbsp; The problem is not =
that =3D
  > UTF-8 won't=3D20
  > work for what you call Spanish characters.&nbsp; UTF-8 can encode =
=3D
  > anything that=3D20
  > you consider a character.&nbsp; The problem is that when these =3D
  > characters are=3D20
  > present they are not being encoded in UTF-8.&nbsp; There are two =3D
  > straight=3D20
  > forward solutions I see.&nbsp; One is to encode the XML correctly =
in=3D20
  > UTF-8.&nbsp; The other is to encode the XML in the Windows ANSI =
encoding =3D
  > I=3D20
  > suspect you are using and to tag it correctly.&nbsp; On a Windows =
U.S. =3D
  > English=3D20
  > system this would be "Windows-1252".&nbsp; I would expect SQL Server =
to =3D
  > support=3D20
  > this.&nbsp; Other applications may or may not.&nbsp; You could also =
use =3D
  > UTF-16=3D20
  > as long as you generate your XML in UTF-16.&nbsp; I don't know if =
this =3D
  > is simple=3D20
  > for your application or not.&nbsp; If you are using VB 7.0 it =
should=3D20
  > be.</FONT></DIV>
  > <DIV><FONT face=3D3DArial size=3D3D2></FONT>&nbsp;</DIV>
  > <DIV><FONT face=3D3DArial size=3D3D2>Rich</FONT></DIV>
  > <DIV><FONT face=3D3DArial size=3D3D2></FONT>&nbsp;</DIV>
  > <BLOCKQUOTE=3D20
  > style=3D3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
=3D
  > BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
  > <DIV>"Ellen K." &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>=
&g=3D
  > t;=3D20
  > wrote in message <A=3D20
  > =3D
  > =
href=3D3D"news:of2j11lqf4r4h1dsnvok8i1dv7c9e3rlb2@4ax.com">news:of2j11lqf=
4r=3D
  > 4h1dsnvok8i1dv7c9e3rlb2@4ax.com</A>...</DIV>Well.&nbsp;&nbsp;=3D20
  > You just helped me learn a lot.&nbsp;&nbsp; For one thing, I =
guess=3D20
  > I<BR>thought that because the Spanish characters can be expressed as =
8 =3D
  > bits<BR>they were ASCII.&nbsp; <BR><BR>I just made a little test =3D
  > stored=3D20
  > procedure taking an XML document as a<BR>parameter, created the =3D
  > document=3D20
  > manually in Notepad with UTF-8 specified<BR>in the header and tried =
=3D
  > including=3D20
  > some of the Spanish characters... and<BR>it failed.&nbsp; SQL Server =
=3D
  > could not=3D20
  > execute sp_xml_preparedocument because<BR>"an invalid character was =
=3D
  > found in=3D20
  > text content".&nbsp;&nbsp; Just to make sure<BR>that was the problem =
I =3D
  > substituted non-Spanish characters for the<BR>Spanish ones and it =
=3D
  > executed=3D20
  > fine.&nbsp; However, I can manually type the<BR>text with the =
Spanish=3D20
  > characters into the varchar field if I open the<BR>table in =3D
  > EnterpriseManager=3D20
  > and SQL Server is perfectly happy.&nbsp; OTOH, if<BR>I specify =
UTF-16, =3D
  > I get=3D20
  > "Switch from current encoding to specified<BR>encoding not=3D20
  > supported."&nbsp;&nbsp; Next thing I tried was cloning the sproc =3D
  > to<BR>write=3D20
  > to the table with the nvarchar column, still no joy, same =3D
  > error<BR>message...=3D20
  > but on changing the datatype of the input parameter from text<BR>to =
=3D
  > ntext it=3D20
  > worked fine.&nbsp; BUT here's the surprise (OK, to me it was=3D20
  > a<BR>surprise):&nbsp; If I again clone the sproc to point to the =
table =3D
  > with=3D20
  > the<BR>varchar column, but leave the input parameter as ntext =
and=3D20
  > specify<BR>UTF-16 in the document header, it works.&nbsp; In other =
=3D
  > words, a=3D20
  > varchar (and<BR>presumably a char) column can successfully accept =
=3D
  > unicode data=3D20
  > even<BR>though char and varchar are explicitly defined as =
non-unicode=3D20
  > datatypes!<BR><BR>Now I have to understand whether I have a problem =
at =3D
  > work.&nbsp;&nbsp; I never<BR>experienced this problem in real life =
=3D
  > because=3D20
  > none of the data we<BR>currently send using XML includes any of the =
=3D
  > Spanish=3D20
  > characters.&nbsp; Is<BR>the problem only going to occur if the XML =
=3D
  > document is=3D20
  > constructed using<BR>the concatenated-string method?&nbsp;&nbsp; Or =
=3D
  > would it=3D20
  > happen any time an XML<BR>document specified as UTF-8 included =
Spanish =3D
  > characters?&nbsp;&nbsp; (The data<BR>sent by SalesForce to the ODS =
is =3D
  > likely=3D20
  > to include Spanish characters,<BR>but it probably creates the XML =
some =3D
  > other=3D20
  > way.)&nbsp; Do I need to tell the<BR>consulting outfit to specify =
all =3D
  > XML as=3D20
  > UTF-16?&nbsp; <BR><BR>For the ETL from the ODS to the data warehouse =
I =3D
  > am not=3D20
  > planning to use<BR>Sonic, but rather probably to link the databases =
=3D
  > and use a=3D20
  > bunch of<BR>stored procedures controlled by some VB code, IOW I will =
=3D
  > not need=3D20
  > XML<BR>because all the extract sprocs will look like INSERT=3D20
  > INTO....SELECT<BR>FROM.<BR><BR>I don't yet understand why UTF-8 =
can't =3D
  > work for=3D20
  > the Spanish<BR>characters... (unless it only doesn't work when the =
=3D
  > characters=3D20
  > are<BR>manually typed into the document).&nbsp; If I correctly =3D
  > understand=3D20
  > the<BR>document to which you referred me, an 8-bit character can't =
fit =3D
  > in=3D20
  > one<BR>UTF-8 byte because the first bit is reserved for indicating =
=3D
  > which is=3D20
  > the<BR>first byte of a UTF-8 multi-byte character.&nbsp; (This was =
=3D
  > your point=3D20
  > about<BR>not greater than 0x7F.)&nbsp;&nbsp; But why wouldn't it =
just =3D
  > make two=3D20
  > bytes out of<BR>the Spanish characters then?&nbsp;&nbsp; The =3D
  > documentation=3D20
  > says UTF-8 uses<BR>multiple bytes for the characters that it can't =
fit =3D
  > into=3D20
  > one byte.&nbsp; <BR><BR>???<BR><BR><BR><BR>On Sun, 20 Feb 2005 =3D
  > 13:10:40 -0800,=3D20
  > "Rich" &lt;@&gt; wrote in message<BR>&lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:4218fc91@w3.nls.net">4218fc91@w3.nls.net</A>&gt;:<BR><BR=
>&=3D
  > gt;&nbsp;&nbsp;=3D20
  > The Spanish accented characters are not part of ASCII.&nbsp; They =
are =3D
  > part of=3D20
  > Windows calls ANSI of which ASCII is the subset (0x00 to =
0x7F).&nbsp; =3D
  > Any=3D20
  > character in the 0x80 to 0xFF range is not compatible between ANSI =
and =3D
  > UTF-8.<BR>&gt;<BR>&gt;Rich<BR>&gt;<BR>&gt;&nbsp; "Ellen K." =
&lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>=
&g=3D
  > t;=3D20
  > wrote in message <A=3D20
  > =3D
  > =
href=3D3D"news:7ouh119ivmuk26icg3mqqqk2ss1lfm5c10@4ax.com">news:7ouh119iv=
mu=3D
  > k26icg3mqqqk2ss1lfm5c10@4ax.com</A>...<BR>&gt;&nbsp;=3D20
  > Should not have any non-ASCII characters, as previously noted =
all=3D20
  > the<BR>&gt;&nbsp; special Spanish characters are available in the =
=3D
  > ASCII=3D20
  > character set.<BR>&gt;&nbsp; And since the company is built on =
our=3D20
  > understanding of the Hispanic<BR>&gt;&nbsp; market, I don't see any =
=3D
  > use of,=3D20
  > say, pictograph-based languages in the<BR>&gt;&nbsp; =
foreseeable=3D20
  > future.&nbsp;&nbsp; If 10 years down the road something like=3D20
  > that<BR>&gt;&nbsp; happens, well, by then we will no longer need =3D
  > compatibility=3D20
  > with the<BR>&gt;&nbsp; current legacy system because it will long =
=3D
  > since have=3D20
  > been replaced.<BR>&gt;<BR>&gt;&nbsp; On Sun, 20 Feb 2005 12:52:25 =
=3D
  > -0800,=3D20
  > "Rich" &lt;@&gt; wrote in message<BR>&gt;&nbsp; &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:4218f849$1@w3.nls.net">4218f849$1@w3.nls.net</A>&gt;:<BR=
>&=3D
  > gt;<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp;&nbsp; From what you describe below, if the values you =
emit =3D
  > to XML=3D20
  > have non-ASCII characters I would expect you to have a =3D
  > problem.<BR>&gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;Rich<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; =3D
  > &gt;&nbsp; "Ellen=3D20
  > K." &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>=
&g=3D
  > t;=3D20
  > wrote in message <A=3D20
  > =3D
  > =
href=3D3D"news:eanh11h4vv6b9v21fiaounii3f5dunjl3g@4ax.com">news:eanh11h4v=
v6=3D
  > b9v21fiaounii3f5dunjl3g@4ax.com</A>...<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; On Sat, 19 Feb 2005 23:32:37 -0800, "Rich" &lt;@&gt; =
wrote =3D
  > in=3D20
  > message<BR>&gt;&nbsp; &gt;&nbsp; &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:42183ccd@w3.nls.net">42183ccd@w3.nls.net</A>&gt;:<BR>&gt=
;&=3D
  > nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp; The UTF in =
UTF-8/16/32 =3D
  > stands=3D20
  > for Unicode Transformation Format.&nbsp; You can find these defined =
in =3D
  > section=3D20
  > 2.5 of <A=3D20
  > =3D
  > =
href=3D3D"http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf">http://w=
ww=3D
  > .unicode.org/versions/Unicode4.0.0/ch02.pdf</A>.<BR>&gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; THANK YOU SO =
MUCH!!!&nbsp;&nbsp;&nbsp;=3D20
  > :)<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp;&nbsp;=3D20
  > It's not clear to me how you are creating the XML from the =3D
  > templates.&nbsp; If=3D20
  > ANSI data is emitted into an XML document declared as UTF-8 then you =
=3D
  > would=3D20
  > have problems only for non-ASCII characters.&nbsp; UTF-8 and =3D
  > Windows-1252 are=3D20
  > identical for 0x00 to 0x7F which is ASCII in =
both.<BR>&gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; I don't have a copy of a template here =
=3D
  > at home,=3D20
  > but I have them create<BR>&gt;&nbsp; &gt;&nbsp; it by string =3D
  > concatenation=3D20
  > because that seems to be the only way to be<BR>&gt;&nbsp; &gt;&nbsp; =
=3D
  > able to=3D20
  > have CDATA attributes, which I have to have because in =3D
  > the<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; legacy data numeric-appearing identifiers are =
actually=3D20
  > 10-character<BR>&gt;&nbsp; &gt;&nbsp; strings with leading spaces, =
and =3D
  > if=3D20
  > these are not specified as CDATA<BR>&gt;&nbsp; &gt;&nbsp; the spaces =
=3D
  > go lost=3D20
  > even with "xml:space=3D3D"preserve"" included in the<BR>&gt;&nbsp; =
=3D
  > &gt;&nbsp;=3D20
  > header.&nbsp; Here is a code snippet from one of my apps that =
creates =3D
  > an=3D20
  > XML<BR>&gt;&nbsp; &gt;&nbsp; document which is passed as a parameter =
=3D
  > to a SQL=3D20
  > Server stored<BR>&gt;&nbsp; &gt;&nbsp; procedure:<BR>&gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
strXM =3D
  > =3D3D=3D20
  > "&lt;?xml version =3D3D" &amp; Chr(34) &amp; "1.0" &amp; Chr(34) =
&amp; =3D
  > "&nbsp;=3D20
  > encoding=3D3D" &amp; Chr(34) &amp; "UTF-8" &amp; Chr(34) &amp; =
"?&gt;" =3D
  > &amp;=3D20
  > vbCrLf _<BR>&gt;&nbsp; &gt;&nbsp;=3D20
  > &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &amp; "&lt;ROOT =3D
  > xml:space=3D3D"=3D20
  > &amp; Chr(34) &amp; "preserve" &amp; Chr(34) &amp; "&gt;" &amp;=3D20
  > vbCrLf<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp;=3D20
  > &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Do While Not .EOF<BR>&gt;&nbsp; =
=3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strXM =3D3D strXM =
&amp;=3D20
  > "&lt;M&gt;&lt;A&gt;" &amp; !Ofc &amp; =3D
  > "&lt;/A&gt;&lt;B&gt;&lt;![CDATA[" &amp;=3D20
  > !Contract &amp; "]]&gt;&lt;/B&gt;&lt;C&gt;" &amp; !TCode &amp;=3D20
  > "&lt;/C&gt;&lt;D&gt;" &amp; !Date &amp; "&lt;/D&gt;" =
_<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp;=3D20
  > =3D
  > =
&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
=3D
  > sp;&nbsp;&nbsp;&nbsp;=3D20
  > &amp; "&lt;E&gt;&lt;![CDATA[" &amp; !TransNo &amp;=3D20
  > "]]&gt;&lt;/E&gt;&lt;/M&gt;" &amp; vbCrLf<BR>&gt;&nbsp; =
&gt;&nbsp;=3D20
  > &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
.MoveNext<BR>&gt;&nbsp; =3D
  > &gt;&nbsp; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Loop<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
strXM =3D
  > =3D3D strXM=3D20
  > &amp; "&lt;/ROOT&gt;"<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; =
(The =3D
  > vbCrLf's are there so if there is a problem the document can =3D
  > be<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; printed to a text file and be easier for humans to read =
-- =3D
  > SQL=3D20
  > Server<BR>&gt;&nbsp; &gt;&nbsp; ignores them.&nbsp; The =3D
  > single-character=3D20
  > aliases for entity and attribute<BR>&gt;&nbsp; &gt;&nbsp; names are =
=3D
  > for=3D20
  > performance -- for most of the stuff we use these for =
it<BR>&gt;&nbsp; =3D
  > &gt;&nbsp; doesn't really matter because we are only sending a few =
=3D
  > rows, but=3D20
  > the<BR>&gt;&nbsp; &gt;&nbsp; first time I did it it was for =
something =3D
  > that was=3D20
  > sending about 5000<BR>&gt;&nbsp; &gt;&nbsp; rows and there it made a =
=3D
  > huge=3D20
  > difference, so I stuck with it.&nbsp; We<BR>&gt;&nbsp; &gt;&nbsp; =
=3D
  > comment both=3D20
  > the front-end code and the stored procedure with the<BR>&gt;&nbsp; =
=3D
  > &gt;&nbsp;=3D20
  > mappings of these aliases.)<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp;&nbsp; I do not know how SQL Server maps from char to =
nchar, =3D
  > specifically what conversion is performed.&nbsp; Also, in some =
(maybe =3D
  > all=3D20
  > released) versions of SQL Server nchar and nvarchar are encoded =
in=3D20
  > UCS-2.&nbsp; UCS-2 is a 16-bit encoding like UTF-16.&nbsp; It dates =
=3D
  > back to=3D20
  > when Unicode was defined as having 2**16 characters instead of the =
=3D
  > 2**20+ that=3D20
  > it has now.&nbsp; You can not express characters &gt;=3D3D U+10000 =
in =3D
  > UCS-2 not=3D20
  > that you care about these.<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; =
&gt;&nbsp; =3D
  > Thankfully, no.&nbsp;&nbsp; :)<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp;&nbsp; I don't know if whether those systems =
you =3D
  > describe=3D20
  > being written in java make a difference.&nbsp; They can do what =
they=3D20
  > want.&nbsp; The native java string is Unicode though I don't =
remember =3D
  > if it is=3D20
  > UCS-2 or UTF-16.&nbsp; My guess is that it was once the former and =
is =3D
  > now the=3D20
  > latter.&nbsp; One of the documents on this on sun's site suggests =
that =3D
  > java=3D20
  > used UCS-2 until the recently released 1.5 which is the first to =
use=3D20
  > UTF-16.<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; The Java native =
=3D
  > string=3D20
  > being unicode is exactly what made me start<BR>&gt;&nbsp; &gt;&nbsp; =
=3D
  > worrying=3D20
  > -- when I was learning Java a couple of years ago (because =3D
  > I<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; wanted to port an app to it so as to be able to run it =
=3D
  > right on the=3D20
  > Unix<BR>&gt;&nbsp; &gt;&nbsp; box where the Oracle database was) I =
was =3D
  > horrified the first time I<BR>&gt;&nbsp; &gt;&nbsp; tried reading =
back =3D
  > what I=3D20
  > had written to a text file when I saw spaces<BR>&gt;&nbsp; =
&gt;&nbsp; =3D
  > between=3D20
  > all the characters.<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;Rich<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > "Ellen K." &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>=
&g=3D
  > t;=3D20
  > wrote in message <A=3D20
  > =3D
  > =
href=3D3D"news:aqag115606i9g8bmh3lst66une1f1sotth@4ax.com">news:aqag11560=
6i=3D
  > 9g8bmh3lst66une1f1sotth@4ax.com</A>...<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; UTF-8 is unicode?!?&nbsp;&nbsp; Sheesh, all =
this =3D
  > time I=3D20
  > thought it meant 8-bit.<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; In fact =
I =3D
  > could=3D20
  > swear I read that somewhere.<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; My question was coming from the database =3D
  > perspective,=3D20
  > where I always use<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; char and =3D
  > varchar, as=3D20
  > opposed to nchar and nvarchar.&nbsp; I give the<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; front-end guys little templates for creating the XML =3D
  > documents for=3D20
  > all<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; my SQL Server stored =3D
  > procedures that=3D20
  > take XML input, and I always<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =3D
  > specify UTF-8=3D20
  > in the header... and my char and varchar columns =
always<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; end up normal, so since you're now telling me =
=3D
  > UTF-8 is=3D20
  > really unicode, I<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; guess that =
would =3D
  > answer=3D20
  > my question for XML data I would be getting from<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; the apps...?&nbsp;&nbsp;&nbsp; Or would the answer be =3D
  > different if=3D20
  > the incoming XML is<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; some =
other=3D20
  > encoding?<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp; To=3D20
  > simulate getting nvarchar data from somewhere, I just tried=3D20
  > creating<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; two dummy tables, one =
=3D
  > with an=3D20
  > nvarchar column and the other with a<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > varchar column, typed stuff into the nvarchar one, then inserted =
to=3D20
  > the<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; varchar one select from the =
=3D
  > nvarchar=3D20
  > one and it looks normal.&nbsp; <BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; If all this means I was worrying about =
nothing,=3D20
  > excellent!&nbsp;&nbsp; OTOH, is<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =
=3D
  > there=3D20
  > something I should be worrying about that I didn't =
ask?<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; The only pieces =
=3D
  > whose=3D20
  > names I know so far are Sonic and SalesForce, both<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; of which are written in Java, if that makes any =3D
  > difference.&nbsp; I=3D20
  > know<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; there is at least one other =
=3D
  > external=3D20
  > piece but I think that is the next<BR>&gt;&nbsp; &gt;&nbsp; =
&gt;&nbsp; =3D
  > phase.<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp; On=3D20
  > Sat, 19 Feb 2005 21:37:15 -0800, "Rich" &lt;@&gt; wrote in=3D20
  > message<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:421821c1$1@w3.nls.net">421821c1$1@w3.nls.net</A>&gt;:<BR=
>&=3D
  > gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp; =
=3D
  > You need=3D20
  > to be more specific than "8-bit characters".&nbsp; There are many =
=3D
  > 8-bit=3D20
  > character encodings.&nbsp; If you are using Windows to generate your =
=3D
  > data you=3D20
  > most likely are using Windows-1252 which is the default 8-bit =3D
  > character set=3D20
  > for U.S. English in Windows.&nbsp; Windows supports many 8-bit =3D
  > encodings so=3D20
  > you could be using something else too.<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp; Unicode is =
a =3D
  > character set not an encoding.&nbsp; There are multiple encodings =
the =3D
  > main=3D20
  > ones being UTF-8, UTF-16, and UTF-32.&nbsp; You can use any of these =
=3D
  > for XML=3D20
  > as well as non-Unicode encodings.&nbsp; For interoperability you =3D
  > should use=3D20
  > Unicode preferably UTF-8.<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp; What comes =
=3D
  > out when=3D20
  > the XML is parsed depends on the XML parser.&nbsp; XML is logically =
=3D
  > expressed=3D20
  > in Unicode.&nbsp; The Windows XML parsers provide a Unicode =3D
  > interface.&nbsp;=3D20
  > Other parsers could do differently.<BR>&gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;Rich<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =
&gt;<BR>&gt;&nbsp; =3D
  > &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; "Ellen K." &lt;<A=3D20
  > =3D
  > =
href=3D3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>=
&g=3D
  > t;=3D20
  > wrote in message <A=3D20
  > =3D
  > =
href=3D3D"news:4o2g11pu048kafbdilg46u77vs5ls0be55@4ax.com">news:4o2g11pu0=
48=3D
  > kafbdilg46u77vs5ls0be55@4ax.com</A>...<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; Our new enterprise system is going =
to =3D
  > be=3D20
  > built around an Enterprise<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > Service Bus.&nbsp; I don't have the full specs yet but as I =
understand =3D
  > it=3D20
  > the<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; main apps =
(starting =3D
  > with=3D20
  > SalesForce) are going to be out on the internet<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; and the Sonic ESB will be the messaging =3D
  > piece.&nbsp;=3D20
  > There will&nbsp; be an<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =
&gt;&nbsp;=3D20
  > Operational Data Store in house that will get updated every night =
on=3D20
  > a<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; batch basis from =
the =3D
  > main=3D20
  > apps.&nbsp; <BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; =
=3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; My data warehouse will continue to be the data =
=3D
  > warehouse=3D20
  > and will remain<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; in =
=3D
  > house.&nbsp;=3D20
  > The dimensions will stay the same but I might have to =3D
  > create<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; separate measures for the data from =
=3D
  > the new=3D20
  > apps and then create views<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp; to=3D20
  > keep everything transparent to the users.&nbsp;&nbsp; =
<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =3D
  > &gt;&nbsp; I'm=3D20
  > thinking if we're going to have an ODS in house already, I may=3D20
  > as<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; well do the ETL =
from =3D
  > there.&nbsp;&nbsp; But I'm worrying that the new data =3D
  > will<BR>&gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; probably be unicode (because Java =
=3D
  > defaults to=3D20
  > that and SalesForce is<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =
&gt;&nbsp; =3D
  > written=3D20
  > in Java).&nbsp; Right now I am storing everything (except our=3D20
  > blobs<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; of course) in =
=3D
  > 8-bit=3D20
  > characters.&nbsp;&nbsp; <BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;=3D20
  > &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; Anyone here =
who's =3D
  > up on=3D20
  > this stuff, can the XML that goes back and forth<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; convert between unicode and 8-bit characters, =
or =3D
  > am I=3D20
  > gonna have to<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; =
redefine =3D
  > all my=3D20
  > data?&nbsp;&nbsp; For example, if unicode data is put into an=3D20
  > XML<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; document that =3D
  > specifies=3D20
  > UTF-8, what comes out when the document is<BR>&gt;&nbsp; &gt;&nbsp; =
=3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; parsed?&nbsp; How about vice versa?&nbsp; If this is too =
=3D
  > simplistic=3D20
  > to work, what<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; is=3D20
  > needed?<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; (We actually have no substantive need for =3D
  > unicode -- we=3D20
  > are bilingual<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp; &gt;&nbsp; Spanish =
=3D
  > but all=3D20
  > the special Spanish characters exist in the ascii<BR>&gt;&nbsp; =3D
  > &gt;&nbsp;=3D20
  > &gt;&nbsp; &gt;&nbsp; character set.)<BR></BLOCKQUOTE></BODY></HTML>
------=_NextPart_000_0A6B_01C51860.770EC1B0
Content-Type: text/html;
        charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; =
charset=3Diso-8859-1">
<META content=3D"MSHTML 6.00.3790.1289" name=3DGENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=3D#ffffff>
<DIV><FONT face=3DArial size=3D2>&nbsp;&nbsp; If SalesForce creates =
UTF-8 encoded=20
XML, and it really is UTF-8 encoded, then you should have no = problem.&nbsp;
The=20
XML is valid and any software you have that consumes XML should accept = it
just=20
fine.</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>&nbsp;&nbsp; I don't see a problem with =
UTF-16 if=20
you prefer this to UTF-8.&nbsp; Both can represent exactly the same=20
characters.&nbsp; The only difference is the mechanics of the=20
encoding.</FONT></DIV>
<DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
<DIV><FONT face=3DArial size=3D2>Rich</FONT></DIV>
<DIV>&nbsp;</DIV>
<BLOCKQUOTE=20
style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
  <DIV>"Ellen K" &lt;<A=20
  href=3D"mailto:Ellen.K@harborwebs.com">Ellen.K@harborwebs.com</A>&gt; =
wrote in=20
  message <A=20
  =
href=3D"news:908e44.556f2c@harborwebs.com">news:908e44.556f2c@harborwebs.=
com</A>...</DIV>First=20
  of all I have to say you have once again earned my eternal gratitude, =
I<BR>am=20
  so glad you are here.&nbsp;&nbsp; :)<BR><BR>I still have to make a =
decision=20
  regarding the standard I want to request.&nbsp; Are<BR>you saying that =
if for=20
  example I get XML generated by say SalesForce with UTF-8<BR>specified =
and it=20
  includes these characters, I would not have a problem?&nbsp; =
Stuff<BR>I write=20
  myself I'm not worried about because I have the opportunity to =
tweak<BR>it,=20
  what I'm worried about is what the ESB will try to feed my databases.=20
  <BR>Since even the hand-typed XML was accepted when UTF-16 was =
specified, I'm=20
  kind<BR>of leaning toward that, especially since the Oracle guy told =
me that=20
  the TJ<BR>accounting manager used to complain that the Spanish =
characters=20
  weren't coming<BR>out on reports when we were on Oracle 8.x which used =
UTF-8=20
  but as soon as we<BR>moved to 9.x which uses UTF-16 there were no more =

  problems.&nbsp; What (if any) do<BR>you see as a potential downside to =

  UTF-16?<BR><BR><BR>&gt; From: "Rich" &lt;@&gt;<BR>&gt; This is a =
multi-part=20
  message in MIME format.<BR>&gt;=20
  ------=3D_NextPart_000_08FA_01C517AB.C3617E00<BR>&gt; Content-Type:=20
  text/plain;<BR>&gt; charset=3D"iso-8859-1"<BR>&gt; =
Content-Transfer-Encoding:=20
  quoted-printable<BR>&gt; The problem is not that UTF-8 won't work for =
what you=20
  call Spanish =3D<BR>&gt; characters.&nbsp; UTF-8 can encode anything =
that you=20
  consider a character.&nbsp; =3D<BR>&gt; The problem is that when these =

  characters are present they are not being =3D<BR>&gt; encoded in =
UTF-8.&nbsp;=20
  There are two straight forward solutions I see.&nbsp; One =3D<BR>&gt; =
is to=20
  encode the XML correctly in UTF-8.&nbsp; The other is to encode the =
XML=20
  =3D<BR>&gt; in the Windows ANSI encoding I suspect you are using and =
to tag it=20
  =3D<BR>&gt; correctly.&nbsp; On a Windows U.S. English system this =
would be=20
  =3D<BR>&gt; "Windows-1252".&nbsp; I would expect SQL Server to support =

  this.&nbsp; Other =3D<BR>&gt; applications may or may not.&nbsp; You =
could also=20
  use UTF-16 as long as you =3D<BR>&gt; generate your XML in =
UTF-16.&nbsp; I don't=20
  know if this is simple for your =3D<BR>&gt; application or not.&nbsp; =
If you are=20
  using VB 7.0 it should be.<BR>&gt; Rich<BR>&gt; "Ellen K." &lt;<A=20
  =
href=3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>&g=
t;=20
  wrote in message =3D<BR>&gt; <A=20
  =
href=3D"news:of2j11lqf4r4h1dsnvok8i1dv7c9e3rlb2@4ax.com">news:of2j11lqf4r=
4h1dsnvok8i1dv7c9e3rlb2@4ax.com</A>...<BR>&gt;=20
  Well.&nbsp;&nbsp; You just helped me learn a lot.&nbsp;&nbsp; For one =
thing, I=20
  guess I<BR>&gt; thought that because the Spanish characters can be =
expressed=20
  as 8 bits<BR>&gt; they were ASCII. =3D20<BR>&gt; I just made a little =
test=20
  stored procedure taking an XML document as a<BR>&gt; parameter, =
created the=20
  document manually in Notepad with UTF-8 =3D<BR>&gt; specified<BR>&gt; =
in the=20
  header and tried including some of the Spanish characters... =
=3D<BR>&gt;=20
  and<BR>&gt; it failed.&nbsp; SQL Server could not execute=20
  sp_xml_preparedocument =3D<BR>&gt; because<BR>&gt; "an invalid =
character was=20
  found in text content".&nbsp;&nbsp; Just to make sure<BR>&gt; that was =
the=20
  problem I substituted non-Spanish characters for the<BR>&gt; Spanish =
ones and=20
  it executed fine.&nbsp; However, I can manually type the<BR>&gt; text =
with the=20
  Spanish characters into the varchar field if I open the<BR>&gt; table =
in=20
  EnterpriseManager and SQL Server is perfectly happy.&nbsp; OTOH, =
=3D<BR>&gt;=20
  if<BR>&gt; I specify UTF-16, I get "Switch from current encoding to=20
  specified<BR>&gt; encoding not supported."&nbsp;&nbsp; Next thing I =
tried was=20
  cloning the sproc to<BR>&gt; write to the table with the nvarchar =
column,=20
  still no joy, same error<BR>&gt; message... but on changing the =
datatype of=20
  the input parameter from =3D<BR>&gt; text<BR>&gt; to ntext it worked =
fine.&nbsp;=20
  BUT here's the surprise (OK, to me it was a<BR>&gt; surprise):&nbsp; =
If I=20
  again clone the sproc to point to the table with the<BR>&gt; varchar =
column,=20
  but leave the input parameter as ntext and specify<BR>&gt; UTF-16 in =
the=20
  document header, it works.&nbsp; In other words, a varchar =3D<BR>&gt; =

  (and<BR>&gt; presumably a char) column can successfully accept unicode =
data=20
  even<BR>&gt; though char and varchar are explicitly defined as =
non-unicode=20
  =3D<BR>&gt; datatypes!<BR>&gt; Now I have to understand whether I have =
a problem=20
  at work.&nbsp;&nbsp; I never<BR>&gt; experienced this problem in real =
life=20
  because none of the data we<BR>&gt; currently send using XML includes =
any of=20
  the Spanish characters.&nbsp; Is<BR>&gt; the problem only going to =
occur if=20
  the XML document is constructed =3D<BR>&gt; using<BR>&gt; the=20
  concatenated-string method?&nbsp;&nbsp; Or would it happen any time an =

  XML<BR>&gt; document specified as UTF-8 included Spanish=20
  characters?&nbsp;&nbsp; (The data<BR>&gt; sent by SalesForce to the =
ODS is=20
  likely to include Spanish characters,<BR>&gt; but it probably creates =
the XML=20
  some other way.)&nbsp; Do I need to tell =3D<BR>&gt; the<BR>&gt; =
consulting=20
  outfit to specify all XML as UTF-16? =3D20<BR>&gt; For the ETL from =
the ODS to=20
  the data warehouse I am not planning to =3D<BR>&gt; use<BR>&gt; Sonic, =
but=20
  rather probably to link the databases and use a bunch of<BR>&gt; =
stored=20
  procedures controlled by some VB code, IOW I will not need XML<BR>&gt; =
because=20
  all the extract sprocs will look like INSERT INTO....SELECT<BR>&gt;=20
  FROM.<BR>&gt; I don't yet understand why UTF-8 can't work for the=20
  Spanish<BR>&gt; characters... (unless it only doesn't work when the =
characters=20
  are<BR>&gt; manually typed into the document).&nbsp; If I correctly =
understand=20
  the<BR>&gt; document to which you referred me, an 8-bit character =
can't fit in=20
  one<BR>&gt; UTF-8 byte because the first bit is reserved for =
indicating which=20
  is =3D<BR>&gt; the<BR>&gt; first byte of a UTF-8 multi-byte =
character.&nbsp;=20
  (This was your point =3D<BR>&gt; about<BR>&gt; not greater than=20
  0x7F.)&nbsp;&nbsp; But why wouldn't it just make two bytes out =
=3D<BR>&gt;=20
  of<BR>&gt; the Spanish characters then?&nbsp;&nbsp; The documentation =
says=20
  UTF-8 uses<BR>&gt; multiple bytes for the characters that it can't fit =
into=20
  one byte. =3D20<BR>&gt; ???<BR>&gt; On Sun, 20 Feb 2005 13:10:40 =
-0800, "Rich"=20
  &lt;@&gt; wrote in message<BR>&gt; &lt;<A=20
  =
href=3D"mailto:4218fc91@w3.nls