Conference OSDEBATE, 18996 texts
Text 2653, 828 lines
Written 2005-02-21 13:57:24 by Ellen K (1:379/45)
   Comment on text 2650 by Rich (1:379/45)
Subject: Re: ESB / XML / Unicode vs 8-bit characters ?
=====================================================
First of all, I have to say you have once again earned my eternal gratitude. I
am so glad you are here.   :)

I still have to decide which standard I want to request.  Are you saying that
if, for example, I get XML generated by, say, SalesForce with UTF-8 specified
and it includes these characters, I would not have a problem?  Stuff I write
myself I'm not worried about, because I have the opportunity to tweak it;
what I'm worried about is what the ESB will try to feed my databases.
Since even the hand-typed XML was accepted when UTF-16 was specified, I'm kind
of leaning toward that.  The Oracle guy also told me that the TJ accounting
manager used to complain that the Spanish characters weren't coming out on
reports when we were on Oracle 8.x, which used UTF-8, but as soon as we moved
to 9.x, which uses UTF-16, there were no more problems.  What, if anything, do
you see as a potential downside to UTF-16?


> From: "Rich" <@>
> The problem is not that UTF-8 won't work for what you call Spanish
> characters.  UTF-8 can encode anything that you consider a character.  The
> problem is that when these characters are present they are not being
> encoded in UTF-8.  There are two straightforward solutions I see.  One is to
> encode the XML correctly in UTF-8.  The other is to encode the XML in the
> Windows ANSI encoding I suspect you are using and to tag it correctly.  On a
> Windows U.S. English system this would be "Windows-1252".  I would expect
> SQL Server to support this.  Other applications may or may not.  You could
> also use UTF-16 as long as you generate your XML in UTF-16.  I don't know if
> this is simple for your application or not.  If you are using VB 7.0 it
> should be.
>
> Rich
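Rich's two fixes can be sketched in Python, chosen only for illustration (the thread's own code is VB, and nothing here models SQL Server or MSXML). The sketch builds a tiny document around the hypothetical sample value `año` and parses it with the standard library's expat-based `xml.etree.ElementTree`, which honors the encoding declaration when it is handed raw bytes. In both fixes what matters is that the bytes really are in the encoding the declaration names.

```python
import xml.etree.ElementTree as ET

TEXT = "año"  # hypothetical sample value containing a Spanish letter

def make_doc(encoding: str) -> bytes:
    """Build a tiny XML document whose declaration names the same
    encoding that is used to produce its bytes."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><a>{TEXT}</a>'
    return doc.encode(encoding)

utf8_doc = make_doc("UTF-8")         # fix 1: really encode the XML as UTF-8
ansi_doc = make_doc("windows-1252")  # fix 2: keep ANSI bytes, tag them correctly

# Both documents parse to the same Unicode text.
print(ET.fromstring(utf8_doc).text)
print(ET.fromstring(ansi_doc).text)
```

Both print `año`; the documents differ only in the declaration and in how `ñ` is stored (`c3 b1` in UTF-8, the single byte `f1` in Windows-1252).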
> "Ellen K." <72322.1016@compuserve.com> wrote in message
> news:of2j11lqf4r4h1dsnvok8i1dv7c9e3rlb2@4ax.com...
> Well.   You just helped me learn a lot.   For one thing, I guess I
> thought that because the Spanish characters can be expressed as 8 bits
> they were ASCII.
>
> I just made a little test stored procedure taking an XML document as a
> parameter, created the document manually in Notepad with UTF-8 specified
> in the header, and tried including some of the Spanish characters... and
> it failed.  SQL Server could not execute sp_xml_preparedocument because
> "an invalid character was found in text content".   Just to make sure
> that was the problem, I substituted non-Spanish characters for the
> Spanish ones and it executed fine.  However, I can manually type the
> text with the Spanish characters into the varchar field if I open the
> table in Enterprise Manager, and SQL Server is perfectly happy.  OTOH, if
> I specify UTF-16, I get "Switch from current encoding to specified
> encoding not supported."   The next thing I tried was cloning the sproc to
> write to the table with the nvarchar column; still no joy, same error
> message... but on changing the datatype of the input parameter from text
> to ntext it worked fine.  BUT here's the surprise (OK, to me it was a
> surprise):  if I again clone the sproc to point to the table with the
> varchar column, but leave the input parameter as ntext and specify
> UTF-16 in the document header, it works.  In other words, a varchar (and
> presumably a char) column can successfully accept Unicode data even
> though char and varchar are explicitly defined as non-Unicode datatypes!
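What Ellen found, a varchar column accepting Unicode input, is consistent with the server decoding the XML to Unicode and then converting the text into the column's 8-bit code page on insert. The Python sketch below models only that last step, under two assumptions not confirmed in this thread: the column uses Windows-1252, and characters outside the code page are replaced with `?`. The function names `to_varchar` and `from_varchar` are hypothetical.

```python
def to_varchar(text: str, code_page: str = "cp1252") -> bytes:
    """Model storing Unicode text in an 8-bit column: characters that
    exist in the code page survive, anything else becomes '?'."""
    return text.encode(code_page, errors="replace")

def from_varchar(raw: bytes, code_page: str = "cp1252") -> str:
    """Read the stored bytes back as text using the same code page."""
    return raw.decode(code_page)

spanish = "¿Qué año?"  # every character exists in Windows-1252
mixed = "año 中文"      # the CJK characters do not

print(from_varchar(to_varchar(spanish)))  # round-trips unchanged
print(from_varchar(to_varchar(mixed)))    # CJK characters come back as '?'
```

Spanish text round-trips because every accented letter has a Windows-1252 byte; under this assumption, characters outside the code page would be silently lost rather than rejected.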
>
> Now I have to understand whether I have a problem at work.   I never
> experienced this problem in real life because none of the data we
> currently send using XML includes any of the Spanish characters.  Is
> the problem only going to occur if the XML document is constructed using
> the concatenated-string method?   Or would it happen any time an XML
> document specified as UTF-8 included Spanish characters?   (The data
> sent by SalesForce to the ODS is likely to include Spanish characters,
> but it probably creates the XML some other way.)  Do I need to tell the
> consulting outfit to specify all XML as UTF-16?
>
> For the ETL from the ODS to the data warehouse I am not planning to use
> Sonic; rather, I will probably link the databases and use a bunch of
> stored procedures controlled by some VB code.  IOW, I will not need XML,
> because all the extract sprocs will look like INSERT INTO ... SELECT
> FROM.
>
> I don't yet understand why UTF-8 can't work for the Spanish
> characters... (unless it only fails when the characters are
> typed manually into the document).  If I correctly understand the
> document to which you referred me, an 8-bit character can't fit in one
> UTF-8 byte because the first bit is reserved for indicating which is the
> first byte of a UTF-8 multi-byte character.  (This was your point about
> not greater than 0x7F.)   But why wouldn't it just make two bytes out of
> the Spanish characters then?   The documentation says UTF-8 uses
> multiple bytes for the characters that it can't fit into one byte.
>
> ???
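On the question just asked: UTF-8 does turn each Spanish letter into two bytes, but only when the text actually passes through a UTF-8 encoder. A file saved from Notepad in ANSI mode presumably holds the single Windows-1252 byte instead (0xF1 for `ñ`), and that byte cannot stand alone in UTF-8. The hand-written Python encoder below covers only U+0080 to U+07FF, the range containing the Latin-1 accented letters; it illustrates the bit layout and is not a general UTF-8 implementation.

```python
def utf8_two_byte(ch: str) -> bytes:
    """Encode one character in the range U+0080..U+07FF as UTF-8.

    Layout: 110xxxxx 10xxxxxx, where the eleven x bits are the code
    point, most significant bits first.
    """
    cp = ord(ch)
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("this sketch only handles U+0080..U+07FF")
    lead = 0xC0 | (cp >> 6)     # 110 marker + top 5 bits
    trail = 0x80 | (cp & 0x3F)  # 10 marker + low 6 bits
    return bytes([lead, trail])

for ch in "ñé¿":
    print(f"U+{ord(ch):04X} {ch}: UTF-8 {utf8_two_byte(ch).hex(' ')}, "
          f"Windows-1252 {ch.encode('cp1252').hex()}")

# The single Windows-1252 byte 0xF1 is not valid UTF-8 on its own.
try:
    b"a\xf1o".decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```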
> On Sun, 20 Feb 2005 13:10:40 -0800, "Rich" <@> wrote in message
> <4218fc91@w3.nls.net>:
>> The Spanish accented characters are not part of ASCII.  They are part of
>> what Windows calls ANSI, of which ASCII is a subset (0x00 to 0x7F).  No
>> character in the 0x80 to 0xFF range is encoded compatibly between ANSI
>> and UTF-8.
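Rich's point about the shared ASCII range can be checked directly. This Python sketch compares Windows-1252 (Python codec name `cp1252`) with UTF-8 for every byte from 0x00 to 0x7F, then checks an illustrative sample of Spanish letters.

```python
# Bytes 0x00..0x7F decode to the same characters in Windows-1252 and UTF-8.
ascii_same = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("utf-8")
    for b in range(0x80)
)

# Above 0x7F they diverge: each Spanish letter is one byte (0x80..0xFF) in
# Windows-1252 but a two-byte sequence in UTF-8.
spanish = "áéíóúüñÁÉÍÓÚÜÑ¿¡"
diverges = all(
    len(ch.encode("cp1252")) == 1 and len(ch.encode("utf-8")) == 2
    for ch in spanish
)

print(ascii_same, diverges)  # True True
```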
>> 
>> Rich
>> 
>> "Ellen K." <72322.1016@compuserve.com> wrote in message
>> news:7ouh119ivmuk26icg3mqqqk2ss1lfm5c10@4ax.com...
>> Should not have any non-ASCII characters; as previously noted, all the
>> special Spanish characters are available in the ASCII character set.
>> And since the company is built on our understanding of the Hispanic
>> market, I don't see any use of, say, pictograph-based languages in the
>> foreseeable future.   If 10 years down the road something like that
>> happens, well, by then we will no longer need compatibility with the
>> current legacy system because it will long since have been replaced.
>> 
>> On Sun, 20 Feb 2005 12:52:25 -0800, "Rich" <@> wrote in message
>> <4218f849$1@w3.nls.net>:
>> 
>> >   From what you describe below, if the values you emit to XML have
>> > non-ASCII characters I would expect you to have a problem.
>> >
>> >Rich
>> >
>> >  "Ellen K." <72322.1016@compuserve.com> wrote in message
>> >  news:eanh11h4vv6b9v21fiaounii3f5dunjl3g@4ax.com...
>> >  On Sat, 19 Feb 2005 23:32:37 -0800, "Rich" <@> wrote in message
>> >  <42183ccd@w3.nls.net>:
>> >
>> >  >   The UTF in UTF-8/16/32 stands for Unicode Transformation Format.
>> >  > You can find these defined in section 2.5 of
>> >  > http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf.
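The three transformation formats are different byte serializations of the same code points. The Python sketch below encodes one hypothetical sample string all three ways; the `-le` (little-endian) codec variants are used so that Python does not prepend a byte order mark, which keeps the byte counts easy to compare.

```python
text = "Año 2005"  # hypothetical sample: seven ASCII characters plus "ñ"

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(codec)
    assert data.decode(codec) == text  # every form round-trips losslessly
    print(f"{codec:>9}: {len(data):2d} bytes")
```

UTF-8 needs 9 bytes (one extra for `ñ`), UTF-16 needs 16 and UTF-32 needs 32, yet all three decode to the same string.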
>> >
>> >  THANK YOU SO MUCH!!!    :)
>> >  >
>> >  >   It's not clear to me how you are creating the XML from the
>> >  > templates.  If ANSI data is emitted into an XML document declared as
>> >  > UTF-8, then you would have problems only for non-ASCII characters.
>> >  > UTF-8 and Windows-1252 are identical for 0x00 to 0x7F, which is ASCII
>> >  > in both.
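The failure mode described here, ANSI bytes inside a document that claims to be UTF-8, can be reproduced with Python's expat-based parser. The helper `parse_ansi_as_utf8` is hypothetical: it encodes the whole document as Windows-1252 while the declaration says UTF-8, which is assumed (not confirmed in the thread) to be roughly what happens when a concatenated VB string reaches an 8-bit parameter. Only the value containing a non-ASCII letter is rejected.

```python
import xml.etree.ElementTree as ET

def parse_ansi_as_utf8(value: str) -> str:
    """Hypothetical helper: build a document that declares UTF-8 but whose
    bytes are Windows-1252, then try to parse it."""
    doc = '<?xml version="1.0" encoding="UTF-8"?><a>' + value + "</a>"
    data = doc.encode("cp1252")  # mismatch between declaration and bytes
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

print(parse_ansi_as_utf8("Juan Perez"))  # ASCII only: bytes are valid UTF-8
print(parse_ansi_as_utf8("Juan Pérez"))  # 0xE9 for 'é' is not valid UTF-8
```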
>> >
>> >  I don't have a copy of a template here at home, but I have them create
>> >  it by string concatenation because that seems to be the only way to be
>> >  able to have CDATA sections, which I have to have because in the
>> >  legacy data numeric-appearing identifiers are actually 10-character
>> >  strings with leading spaces, and if these are not specified as CDATA
>> >  the spaces get lost even with xml:space="preserve" included in the
>> >  header.  Here is a code snippet from one of my apps that creates an XML
>> >  document which is passed as a parameter to a SQL Server stored
>> >  procedure:
>> >
>> >  >      strXM = "<?xml version =" & Chr(34) & "1.0" & Chr(34) & "  encoding=" & Chr(34) & "UTF-8" & Chr(34) & "?>" & vbCrLf _
>> >  >        & "<ROOT xml:space=" & Chr(34) & "preserve" & Chr(34) & ">" & vbCrLf
>> >  >
>> >  >      Do While Not .EOF
>> >  >        strXM = strXM & "<M><A>" & !Ofc & "</A><B><![CDATA[" & !Contract & "]]></B><C>" & !TCode & "</C><D>" & !Date & "</D>" _
>> >  >                & "<E><![CDATA[" & !TransNo & "]]></E></M>" & vbCrLf
>> >  >        .MoveNext
>> >  >      Loop
>> >  >
>> >  >      strXM = strXM & "</ROOT>"
>> >
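An alternative to string concatenation is to let an XML library produce the document, so that escaping and the encoding declaration always agree with the bytes actually written. The Python sketch below mirrors the element layout of the VB snippet above (ROOT, M, A to E) using invented row values; `rows`, `FIELDS` and `build_xml` are hypothetical. Note that `xml.etree.ElementTree` does not emit CDATA sections: values are written as escaped character data, and ElementTree's own parser keeps their leading spaces. Whether SQL Server's parser treats them the same way is not established here.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows standing in for the recordset fields !Ofc, !Contract,
# !TCode, !Date and !TransNo; the values are invented for illustration.
rows = [
    {"Ofc": "TJ", "Contract": "    001234", "TCode": "Ñ1",
     "Date": "2005-02-19", "TransNo": "      0042"},
]

# Single-letter element names, mapped to fields as in the VB snippet.
FIELDS = (("A", "Ofc"), ("B", "Contract"), ("C", "TCode"),
          ("D", "Date"), ("E", "TransNo"))

def build_xml(rows) -> bytes:
    root = ET.Element("ROOT")
    for row in rows:
        m = ET.SubElement(root, "M")
        for tag, field in FIELDS:
            ET.SubElement(m, tag).text = row[field]
    # ElementTree escapes &, < and > in the values and writes a declaration
    # that matches the encoding of the bytes it returns.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

doc = build_xml(rows)
print(doc.decode("utf-8"))

parsed = ET.fromstring(doc)
print(repr(parsed.find("M/B").text))  # leading spaces survive the round trip
```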
>> >  (The vbCrLf's are there so that if there is a problem the document can
>> >  be printed to a text file and be easier for humans to read; SQL Server
>> >  ignores them.  The single-character aliases for entity and attribute
>> >  names are for performance.  For most of the stuff we use these for it
>> >  doesn't really matter because we are only sending a few rows, but the
>> >  first time I did it, it was for something that was sending about 5000
>> >  rows, and there it made a huge difference, so I stuck with it.  We
>> >  comment both the front-end code and the stored procedure with the
>> >  mappings of these aliases.)
>> >
>> >  >   I do not know how SQL Server maps from char to nchar, specifically
>> >  > what conversion is performed.  Also, in some (maybe all released)
>> >  > versions of SQL Server, nchar and nvarchar are encoded in UCS-2.  UCS-2
>> >  > is a 16-bit encoding like UTF-16.  It dates back to when Unicode was
>> >  > defined as having 2**16 characters instead of the 2**20+ that it has
>> >  > now.  You cannot express characters >= U+10000 in UCS-2, not that you
>> >  > care about these.
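The UCS-2 limitation comes down to surrogate pairs. In UTF-16, a character at or above U+10000 is split into two 16-bit code units drawn from reserved ranges; UCS-2 has no such mechanism, so it can represent only characters below U+10000, one unit each. A short Python sketch of the split (the function name is hypothetical), checked against Python's `utf-16-be` codec:

```python
def utf16_code_units(ch: str) -> list:
    """Return the UTF-16 code units (16-bit integers) for one character.

    Characters below U+10000 need a single unit, identical to their UCS-2
    form. Anything above needs a surrogate pair, which UCS-2 cannot express.
    """
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000            # 20 bits remain to be encoded
    high = 0xD800 + (v >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low surrogate
    return [high, low]

# U+00F1 LATIN SMALL LETTER N WITH TILDE, U+1D11E MUSICAL SYMBOL G CLEF
for ch in ("\u00f1", "\U0001D11E"):
    units = utf16_code_units(ch)
    print(f"U+{ord(ch):04X}:", " ".join(f"{u:04X}" for u in units))
```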
>> >
>> >  Thankfully, no.   :)
>> >  >
>> >  >   I don't know whether it makes a difference that those systems you
>> >  > describe are written in Java.  They can do what they want.  The native
>> >  > Java string is Unicode, though I don't remember if it is UCS-2 or
>> >  > UTF-16.  My guess is that it was once the former and is now the
>> >  > latter.  One of the documents on this on Sun's site suggests that Java
>> >  > used UCS-2 until the recently released 1.5, which is the first to use
>> >  > UTF-16.
>> >
>> >  The Java native string being Unicode is exactly what made me start
>> >  worrying.  When I was learning Java a couple of years ago (because I
>> >  wanted to port an app to it so as to be able to run it right on the Unix
>> >  box where the Oracle database was), I was horrified the first time I
>> >  tried reading back what I had written to a text file when I saw spaces
>> >  between all the characters.
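Spaces between the characters are what UTF-16 text typically looks like when viewed as 8-bit text: every ASCII character becomes two bytes, one of them 0x00, and many viewers show that byte as a blank. Ellen's Java code is not shown, so this is a plausible explanation rather than a confirmed one. The Python sketch uses little-endian UTF-16; big-endian output simply puts the 0x00 before each letter instead.

```python
text = "Hola"
data = text.encode("utf-16-le")  # two bytes per character, low byte first

print(data.hex(" "))  # 48 00 6f 00 6c 00 61 00

# A viewer that assumes one byte per character (Latin-1 here) sees a 0x00
# after every letter; showing those as blanks gives the "spaced out" look.
as_8bit = data.decode("latin-1").replace("\x00", " ")
print(repr(as_8bit))  # 'H o l a '
```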
>> >  >
>> >  >Rich
>> >  >
>> >  >  "Ellen K." <72322.1016@compuserve.com> wrote in message
>> >  >  news:aqag115606i9g8bmh3lst66une1f1sotth@4ax.com...
>> >  >  UTF-8 is Unicode?!?   Sheesh, all this time I thought it meant 8-bit.
>> >  >  In fact I could swear I read that somewhere.
>> >  >
>> >  >  My question was coming from the database perspective, where I always
>> >  >  use char and varchar, as opposed to nchar and nvarchar.  I give the
>> >  >  front-end guys little templates for creating the XML documents for all
>> >  >  my SQL Server stored procedures that take XML input, and I always
>> >  >  specify UTF-8 in the header... and my char and varchar columns always
>> >  >  end up normal.  So since you're now telling me UTF-8 is really
>> >  >  Unicode, I guess that would answer my question for XML data I would be
>> >  >  getting from the apps...?    Or would the answer be different if the
>> >  >  incoming XML is in some other encoding?
>> >  >
>> >  >  To simulate getting nvarchar data from somewhere, I just tried
>> >  >  creating two dummy tables, one with an nvarchar column and the other
>> >  >  with a varchar column, typed stuff into the nvarchar one, then
>> >  >  inserted into the varchar one with a select from the nvarchar one, and
>> >  >  it looks normal.
>> >  >
>> >  >  If all this means I was worrying about nothing, excellent!   OTOH, is
>> >  >  there something I should be worrying about that I didn't ask?
>> >  >
>> >  >  The only pieces whose names I know so far are Sonic and SalesForce,
>> >  >  both of which are written in Java, if that makes any difference.  I
>> >  >  know there is at least one other external piece, but I think that is
>> >  >  the next phase.
>> >  >
>> >  >  On Sat, 19 Feb 2005 21:37:15 -0800, "Rich" <@> wrote in message
>> >  >  <421821c1$1@w3.nls.net>:
>> >  >
>> >  >  >   You need to be more specific than "8-bit characters".  There are
>> >  >  > many 8-bit character encodings.  If you are using Windows to generate
>> >  >  > your data, you are most likely using Windows-1252, which is the
>> >  >  > default 8-bit character set for U.S. English in Windows.  Windows
>> >  >  > supports many 8-bit encodings, so you could be using something else
>> >  >  > too.
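Why "8-bit" alone is not specific enough can be shown with a single byte. In this Python sketch, byte 0xF1 is `ñ` in Windows-1252 but `±` in code page 437, the original IBM PC/DOS code page, and `ñ` itself is stored as a different byte in each.

```python
raw = bytes([0xF1])  # a single byte; nothing in it says which encoding it uses

# The same byte decodes to different characters in different code pages:
# cp1252 is Windows U.S. English, cp437 is the original IBM PC/DOS code page.
for code_page in ("cp1252", "cp437"):
    print(code_page, "decodes 0xF1 as", raw.decode(code_page))

# Conversely, the same character is stored as a different byte in each.
for code_page in ("cp1252", "cp437"):
    print(code_page, "stores 'ñ' as", "ñ".encode(code_page).hex())
```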
>> >  >  >
>> >  >  >   Unicode is a character set, not an encoding.  There are multiple
>> >  >  > encodings, the main ones being UTF-8, UTF-16, and UTF-32.  You can use
>> >  >  > any of these for XML, as well as non-Unicode encodings.  For
>> >  >  > interoperability you should use Unicode, preferably UTF-8.
>> >  >  >
>> >  >  >   What comes out when the XML is parsed depends on the XML parser.
>> >  >  > XML is logically expressed in Unicode.  The Windows XML parsers
>> >  >  > provide a Unicode interface.  Other parsers could do differently.
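That XML is logically Unicode can be seen from the parser's side: the same text serialized as UTF-8 or as UTF-16 yields the same string after parsing. This Python sketch uses a hypothetical sample value and the standard library's `xml.etree.ElementTree`; other parsers, as noted above, may expose the result differently.

```python
import xml.etree.ElementTree as ET

TEXT = "Señor Muñoz"  # hypothetical sample value

docs = {
    "UTF-8": f'<?xml version="1.0" encoding="UTF-8"?><n>{TEXT}</n>'.encode("utf-8"),
    # Python's "utf-16" codec prepends a byte order mark, which the parser
    # uses to detect UTF-16 input.
    "UTF-16": f'<?xml version="1.0" encoding="UTF-16"?><n>{TEXT}</n>'.encode("utf-16"),
}

for name, data in docs.items():
    # Different bytes in each document, same Unicode string after parsing.
    print(f"{name}: {len(data)} bytes -> {ET.fromstring(data).text}")
```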
>> >  >  >
>> >  >  >Rich
>> >  >  >
>> >  >  >
>> >  >  >  "Ellen K." <72322.1016@compuserve.com> wrote in message
>> >  >  >  news:4o2g11pu048kafbdilg46u77vs5ls0be55@4ax.com...
>> >  >  >  Our new enterprise system is going to be built around an Enterprise
>> >  >  >  Service Bus.  I don't have the full specs yet, but as I understand it
>> >  >  >  the main apps (starting with SalesForce) are going to be out on the
>> >  >  >  internet and the Sonic ESB will be the messaging piece.  There will
>> >  >  >  be an Operational Data Store in house that will get updated every
>> >  >  >  night on a batch basis from the main apps.
>> >  >  >
>> >  >  >  My data warehouse will continue to be the data warehouse and will
>> >  >  >  remain in house.  The dimensions will stay the same, but I might have
>> >  >  >  to create separate measures for the data from the new apps and then
>> >  >  >  create views to keep everything transparent to the users.
>> >  >  >
>> >  >  >  I'm thinking that if we're going to have an ODS in house already, I
>> >  >  >  may as well do the ETL from there.   But I'm worried that the new
>> >  >  >  data will probably be Unicode (because Java defaults to that and
>> >  >  >  SalesForce is written in Java).  Right now I am storing everything
>> >  >  >  (except our blobs, of course) in 8-bit characters.
>> >  >  >
>> >  >  >  Anyone here who's up on this stuff: can the XML that goes back and
>> >  >  >  forth convert between Unicode and 8-bit characters, or am I gonna
>> >  >  >  have to redefine all my data?   For example, if Unicode data is put
>> >  >  >  into an XML document that specifies UTF-8, what comes out when the
>> >  >  >  document is parsed?  How about vice versa?  If this is too simplistic
>> >  >  >  to work, what is needed?
>> >  >  >
>> >  >  >  (We actually have no substantive need for Unicode; we are bilingual
>> >  >  >  Spanish, but all the special Spanish characters exist in the ASCII
>> >  >  >  character set.)
> ------=_NextPart_000_08FA_01C517AB.C3617E00
> Content-Type: text/html;
> charset="iso-8859-1"
> Content-Transfer-Encoding: quoted-printable
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML><HEAD>
> <META http-equiv=3DContent-Type content=3D"text/html; =
> charset=3Diso-8859-1">
> <META content=3D"MSHTML 6.00.3790.1289" name=3DGENERATOR>
> <STYLE></STYLE>
> </HEAD>
> <BODY bgColor=3D#ffffff>
> <DIV><FONT face=3DArial size=3D2>&nbsp;&nbsp; The problem is not that =
> UTF-8 won't=20
> work for what you call Spanish characters.&nbsp; UTF-8 can encode =
> anything that=20
> you consider a character.&nbsp; The problem is that when these =
> characters are=20
> present they are not being encoded in UTF-8.&nbsp; There are two =
> straight=20
> forward solutions I see.&nbsp; One is to encode the XML correctly in=20
> UTF-8.&nbsp; The other is to encode the XML in the Windows ANSI encoding =
> I=20
> suspect you are using and to tag it correctly.&nbsp; On a Windows U.S. =
> English=20
> system this would be "Windows-1252".&nbsp; I would expect SQL Server to =
> support=20
> this.&nbsp; Other applications may or may not.&nbsp; You could also use =
> UTF-16=20
> as long as you generate your XML in UTF-16.&nbsp; I don't know if this =
> is simple=20
> for your application or not.&nbsp; If you are using VB 7.0 it should=20
> be.</FONT></DIV>
> <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
> <DIV><FONT face=3DArial size=3D2>Rich</FONT></DIV>
> <DIV><FONT face=3DArial size=3D2></FONT>&nbsp;</DIV>
> <BLOCKQUOTE=20
> style=3D"PADDING-RIGHT: 0px; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; =
> BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
> <DIV>"Ellen K." &lt;<A=20
> =
> href=3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>&g=
> t;=20
> wrote in message <A=20
> =
> href=3D"news:of2j11lqf4r4h1dsnvok8i1dv7c9e3rlb2@4ax.com">news:of2j11lqf4r=
> 4h1dsnvok8i1dv7c9e3rlb2@4ax.com</A>...</DIV>Well.&nbsp;&nbsp;=20
> You just helped me learn a lot.&nbsp;&nbsp; For one thing, I guess=20
> I<BR>thought that because the Spanish characters can be expressed as 8 =
> bits<BR>they were ASCII.&nbsp; <BR><BR>I just made a little test =
> stored=20
> procedure taking an XML document as a<BR>parameter, created the =
> document=20
> manually in Notepad with UTF-8 specified<BR>in the header and tried =
> including=20
> some of the Spanish characters... and<BR>it failed.&nbsp; SQL Server =
> could not=20
> execute sp_xml_preparedocument because<BR>"an invalid character was =
> found in=20
> text content".&nbsp;&nbsp; Just to make sure<BR>that was the problem I =
> substituted non-Spanish characters for the<BR>Spanish ones and it =
> executed=20
> fine.&nbsp; However, I can manually type the<BR>text with the Spanish=20
> characters into the varchar field if I open the<BR>table in =
> EnterpriseManager=20
> and SQL Server is perfectly happy.&nbsp; OTOH, if<BR>I specify UTF-16, =
> I get=20
> "Switch from current encoding to specified<BR>encoding not=20
> supported."&nbsp;&nbsp; Next thing I tried was cloning the sproc =
> to<BR>write=20
> to the table with the nvarchar column, still no joy, same =
> error<BR>message...=20
> but on changing the datatype of the input parameter from text<BR>to =
> ntext it=20
> worked fine.&nbsp; BUT here's the surprise (OK, to me it was=20
> a<BR>surprise):&nbsp; If I again clone the sproc to point to the table =
> with=20
> the<BR>varchar column, but leave the input parameter as ntext and=20
> specify<BR>UTF-16 in the document header, it works.&nbsp; In other =
> words, a=20
> varchar (and<BR>presumably a char) column can successfully accept =
> unicode data=20
> even<BR>though char and varchar are explicitly defined as non-unicode=20
> datatypes!<BR><BR>Now I have to understand whether I have a problem at =
> work.&nbsp;&nbsp; I never<BR>experienced this problem in real life =
> because=20
> none of the data we<BR>currently send using XML includes any of the =
> Spanish=20
> characters.&nbsp; Is<BR>the problem only going to occur if the XML =
> document is=20
> constructed using<BR>the concatenated-string method?&nbsp;&nbsp; Or =
> would it=20
> happen any time an XML<BR>document specified as UTF-8 included Spanish =
> characters?&nbsp;&nbsp; (The data<BR>sent by SalesForce to the ODS is =
> likely=20
> to include Spanish characters,<BR>but it probably creates the XML some =
> other=20
> way.)&nbsp; Do I need to tell the<BR>consulting outfit to specify all =
> XML as=20
> UTF-16?&nbsp; <BR><BR>For the ETL from the ODS to the data warehouse I =
> am not=20
> planning to use<BR>Sonic, but rather probably to link the databases =
> and use a=20
> bunch of<BR>stored procedures controlled by some VB code, IOW I will =
> not need=20
> XML<BR>because all the extract sprocs will look like INSERT=20
> INTO....SELECT<BR>FROM.<BR><BR>I don't yet understand why UTF-8 can't =
> work for=20
> the Spanish<BR>characters... (unless it only doesn't work when the =
> characters=20
> are<BR>manually typed into the document).&nbsp; If I correctly =
> understand=20
> the<BR>document to which you referred me, an 8-bit character can't fit =
> in=20
> one<BR>UTF-8 byte because the first bit is reserved for indicating =
> which is=20
> the<BR>first byte of a UTF-8 multi-byte character.&nbsp; (This was =
> your point=20
> about<BR>not greater than 0x7F.)&nbsp;&nbsp; But why wouldn't it just =
> make two=20
> bytes out of<BR>the Spanish characters then?&nbsp;&nbsp; The =
> documentation=20
> says UTF-8 uses<BR>multiple bytes for the characters that it can't fit =
> into=20
> one byte.&nbsp; <BR><BR>???<BR><BR><BR><BR>On Sun, 20 Feb 2005 =
> 13:10:40 -0800,=20
> "Rich" &lt;@&gt; wrote in message<BR>&lt;<A=20
> =
> href=3D"mailto:4218fc91@w3.nls.net">4218fc91@w3.nls.net</A>&gt;:<BR><BR>&=
> gt;&nbsp;&nbsp;=20
> The Spanish accented characters are not part of ASCII.&nbsp; They are =
> part of=20
> Windows calls ANSI of which ASCII is the subset (0x00 to 0x7F).&nbsp; =
> Any=20
> character in the 0x80 to 0xFF range is not compatible between ANSI and =
> UTF-8.<BR>&gt;<BR>&gt;Rich<BR>&gt;<BR>&gt;&nbsp; "Ellen K." &lt;<A=20
> =
> href=3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>&g=
> t;=20
> wrote in message <A=20
> =
> href=3D"news:7ouh119ivmuk26icg3mqqqk2ss1lfm5c10@4ax.com">news:7ouh119ivmu=
> k26icg3mqqqk2ss1lfm5c10@4ax.com</A>...<BR>&gt;&nbsp;=20
> Should not have any non-ASCII characters, as previously noted all=20
> the<BR>&gt;&nbsp; special Spanish characters are available in the =
> ASCII=20
> character set.<BR>&gt;&nbsp; And since the company is built on our=20
> understanding of the Hispanic<BR>&gt;&nbsp; market, I don't see any =
> use of,=20
> say, pictograph-based languages in the<BR>&gt;&nbsp; foreseeable=20
> future.&nbsp;&nbsp; If 10 years down the road something like=20
> that<BR>&gt;&nbsp; happens, well, by then we will no longer need =
> compatibility=20
> with the<BR>&gt;&nbsp; current legacy system because it will long =
> since have=20
> been replaced.<BR>&gt;<BR>&gt;&nbsp; On Sun, 20 Feb 2005 12:52:25 =
> -0800,=20
> "Rich" &lt;@&gt; wrote in message<BR>&gt;&nbsp; &lt;<A=20
> =
> href=3D"mailto:4218f849$1@w3.nls.net">4218f849$1@w3.nls.net</A>&gt;:<BR>&=
> gt;<BR>&gt;&nbsp;=20
> &gt;&nbsp;&nbsp; From what you describe below, if the values you emit =
> to XML=20
> have non-ASCII characters I would expect you to have a =
> problem.<BR>&gt;&nbsp;=20
> &gt;<BR>&gt;&nbsp; &gt;Rich<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; =
> &gt;&nbsp; "Ellen=20
> K." &lt;<A=20
> =
> href=3D"mailto:72322.1016@compuserve.com">72322.1016@compuserve.com</A>&g=
> t;=20
> wrote in message <A=20
> =
> href=3D"news:eanh11h4vv6b9v21fiaounii3f5dunjl3g@4ax.com">news:eanh11h4vv6=
> b9v21fiaounii3f5dunjl3g@4ax.com</A>...<BR>&gt;&nbsp;=20
> &gt;&nbsp; On Sat, 19 Feb 2005 23:32:37 -0800, "Rich" &lt;@&gt; wrote =
> in=20
> message<BR>&gt;&nbsp; &gt;&nbsp; &lt;<A=20
> =
> href=3D"mailto:42183ccd@w3.nls.net">42183ccd@w3.nls.net</A>&gt;:<BR>&gt;&=
> nbsp;=20
> &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp; The UTF in UTF-8/16/32 =
> stands=20
> for Unicode Transformation Format.&nbsp; You can find these defined in =
> section=20
> 2.5 of <A=20
> =
> href=3D"http://www.unicode.org/versions/Unicode4.0.0/ch02.pdf">http://www=
> .unicode.org/versions/Unicode4.0.0/ch02.pdf</A>.<BR>&gt;&nbsp;=20
> &gt;<BR>&gt;&nbsp; &gt;&nbsp; THANK YOU SO MUCH!!!&nbsp;&nbsp;&nbsp;=20
> :)<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; =
> &gt;&nbsp;&nbsp;=20
> It's not clear to me how you are creating the XML from the =
> templates.&nbsp; If=20
> ANSI data is emitted into an XML document declared as UTF-8 then you =
> would=20
> have problems only for non-ASCII characters.&nbsp; UTF-8 and =
> Windows-1252 are=20
> identical for 0x00 to 0x7F which is ASCII in both.<BR>&gt;&nbsp;=20
> &gt;<BR>&gt;&nbsp; &gt;&nbsp; I don't have a copy of a template here =
> at home,=20
> but I have them create<BR>&gt;&nbsp; &gt;&nbsp; it by string =
> concatenation=20
> because that seems to be the only way to be<BR>&gt;&nbsp; &gt;&nbsp; =
> able to=20
> have CDATA attributes, which I have to have because in =
> the<BR>&gt;&nbsp;=20
> &gt;&nbsp; legacy data numeric-appearing identifiers are actually=20
> 10-character<BR>&gt;&nbsp; &gt;&nbsp; strings with leading spaces, and =
> if=20
> these are not specified as CDATA<BR>&gt;&nbsp; &gt;&nbsp; the spaces =
> go lost=20
> even with "xml:space=3D"preserve"" included in the<BR>&gt;&nbsp; =
> &gt;&nbsp;=20
> header.&nbsp; Here is a code snippet from one of my apps that creates =
> an=20
> XML<BR>&gt;&nbsp; &gt;&nbsp; document which is passed as a parameter =
> to a SQL=20
> Server stored<BR>&gt;&nbsp; &gt;&nbsp; procedure:<BR>&gt;&nbsp;=20
> &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strXM =
> =3D=20
> "&lt;?xml version =3D" &amp; Chr(34) &amp; "1.0" &amp; Chr(34) &amp; =
> "&nbsp;=20
> encoding=3D" &amp; Chr(34) &amp; "UTF-8" &amp; Chr(34) &amp; "?&gt;" =
> &amp;=20
> vbCrLf _<BR>&gt;&nbsp; &gt;&nbsp;=20
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &amp; "&lt;ROOT =
> xml:space=3D"=20
> &amp; Chr(34) &amp; "preserve" &amp; Chr(34) &amp; "&gt;" &amp;=20
> vbCrLf<BR>&gt;&nbsp; &gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp;=20
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Do While Not .EOF<BR>&gt;&nbsp; =
> &gt;&nbsp;=20
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strXM =3D strXM &amp;=20
> "&lt;M&gt;&lt;A&gt;" &amp; !Ofc &amp; =
> "&lt;/A&gt;&lt;B&gt;&lt;![CDATA[" &amp;=20
> !Contract &amp; "]]&gt;&lt;/B&gt;&lt;C&gt;" &amp; !TCode &amp;=20
> "&lt;/C&gt;&lt;D&gt;" &amp; !Date &amp; "&lt;/D&gt;" _<BR>&gt;&nbsp;=20
> &gt;&nbsp;=20
> =
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
> sp;&nbsp;&nbsp;&nbsp;=20
> &amp; "&lt;E&gt;&lt;![CDATA[" &amp; !TransNo &amp;=20
> "]]&gt;&lt;/E&gt;&lt;/M&gt;" &amp; vbCrLf<BR>&gt;&nbsp; &gt;&nbsp;=20
> &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; .MoveNext<BR>&gt;&nbsp; =
> &gt;&nbsp; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Loop<BR>&gt;&nbsp; =
> &gt;&nbsp;=20
> &gt;<BR>&gt;&nbsp; &gt;&nbsp; &gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; strXM =
> =3D strXM=20
> &amp; "&lt;/ROOT&gt;"<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; &gt;&nbsp; (The =
> vbCrLf's are there so if there is a problem the document can =
> be<BR>&gt;&nbsp;=20
> &gt;&nbsp; printed to a text file and be easier for humans to read -- =
> SQL=20
> Server<BR>&gt;&nbsp; &gt;&nbsp; ignores them.&nbsp; The =
> single-character=20
> aliases for entity and attribute<BR>&gt;&nbsp; &gt;&nbsp; names are =
> for=20
> performance -- for most of the stuff we use these for it<BR>&gt;&nbsp; =
> &gt;&nbsp; doesn't really matter because we are only sending a few =
> rows, but=20
> the<BR>&gt;&nbsp; &gt;&nbsp; first time I did it it was for something =
> that was=20
> sending about 5000<BR>&gt;&nbsp; &gt;&nbsp; rows and there it made a =
> huge=20
> difference, so I stuck with it.&nbsp; We<BR>&gt;&nbsp; &gt;&nbsp; =
> comment both=20
> the front-end code and the stored procedure with the<BR>&gt;&nbsp; =
> &gt;&nbsp;=20
> mappings of these aliases.)<BR>&gt;&nbsp; &gt;<BR>&gt;&nbsp; =
> > > >  I do not know how SQL Server maps from char to nchar, specifically
> > > >  what conversion is performed. Also, in some (maybe all released)
> > > >  versions of SQL Server, nchar and nvarchar are encoded in UCS-2.
> > > >  UCS-2 is a 16-bit encoding like UTF-16, but without surrogate pairs.
> > > >  It dates back to when Unicode was defined as having 2**16 code points
> > > >  instead of the 1,114,112 (just over 2**20) that it has now. You cannot
> > > >  express characters >= U+10000 in UCS-2, not that you care about these.
> > >
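Rich's UCS-2 limit in concrete terms: a string fits in UCS-2 only if every code point is at or below U+FFFF, while UTF-16 stores anything above that as a surrogate pair of two 16-bit units. A small Python sketch; the helper names are made up, and the arithmetic is the standard UTF-16 surrogate calculation:

```python
from typing import Tuple


def fits_in_ucs2(text: str) -> bool:
    """True if every code point is in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf16_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
print(fits_in_ucs2("mañana"))                             # True
print(fits_in_ucs2(clef))                                 # False
print([hex(u) for u in utf16_surrogate_pair(ord(clef))])  # ['0xd834', '0xdd1e']
print(clef.encode("utf-16-be").hex())                     # d834dd1e
```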
> > > Thankfully, no. :)
> > > >
> > > >  I don't know whether it makes a difference that those systems you
> > > >  describe are written in Java. They can do what they want. The native
> > > >  Java string is Unicode, though I don't remember if it is UCS-2 or
> > > >  UTF-16. My guess is that it was once the former and is now the
> > > >  latter. One of the documents on this on Sun's site suggests that Java
> > > >  used UCS-2 until the recently released 1.5, which is the first to use
> > > >  UTF-16.
> > >
> > > The Java native string being Unicode is exactly what made me start
> > > worrying -- when I was learning Java a couple of years ago (because I
> > > wanted to port an app to it so as to be able to run it right on the Unix
> > > box where the Oracle database was) I was horrified the first time I
> > > tried reading back what I had written to a text file when I saw spaces
> > > between all the characters.
> > > >
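What Ellen describes is most likely text written out as UTF-16 (a Java `char` is one 16-bit UTF-16 code unit): each ASCII letter becomes its byte followed by a zero byte, and many text viewers of the time showed those zero bytes as blanks. A short Python sketch of the effect, using an arbitrary sample word:

```python
text = "Hola"

# UTF-16 little-endian without a byte-order mark: two bytes per character here.
utf16 = text.encode("utf-16-le")
print(utf16)  # b'H\x00o\x00l\x00a\x00'

# A viewer that assumes one byte per character sees a NUL after every letter;
# showing each NUL as a blank gives the "spaces between all the characters".
naive_view = utf16.decode("latin-1").replace("\x00", " ")
print(repr(naive_view))  # 'H o l a '

# Reading the bytes back with the encoding they were written in restores the text.
print(utf16.decode("utf-16-le"))  # Hola
```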
> > > > Rich
> > > >
> > > > "Ellen K." <72322.1016@compuserve.com> wrote in message news:aqag115606i9g8bmh3lst66une1f1sotth@4ax.com...
> > > > UTF-8 is Unicode?!? Sheesh, all this time I thought it meant 8-bit.
> > > > In fact I could swear I read that somewhere.
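Ellen's memory is half right: the "8" in UTF-8 refers to its 8-bit code units, but UTF-8 encodes all of Unicode using one to four bytes per character, and plain ASCII text is byte-for-byte identical in UTF-8. A Python illustration with arbitrarily chosen characters, plus what happens when UTF-8 bytes are mistakenly read as Windows-1252:

```python
samples = ["A", "ñ", "€", "\U0001D11E"]  # ASCII, Latin-1 range, rest of the BMP, supplementary

utf8_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    utf8_lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
# U+0041 -> 1 byte(s): 41
# U+00F1 -> 2 byte(s): c3 b1
# U+20AC -> 3 byte(s): e2 82 ac
# U+1D11E -> 4 byte(s): f0 9d 84 9e

# Decoding UTF-8 bytes as Windows-1252 turns the two bytes of "ñ" into two unrelated characters.
mojibake = "mañana".encode("utf-8").decode("cp1252")
print(mojibake)  # maÃ±ana
```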
> > > >
> > > > My question was coming from the database perspective, where I always use
> > > > char and varchar, as opposed to nchar and nvarchar. I give the
> > > > front-end guys little templates for creating the XML documents for all
> > > > my SQL Server stored procedures that take XML input, and I always
> > > > specify UTF-8 in the header... and my char and varchar columns always
> > > > end up normal, so since you're now telling me UTF-8 is really Unicode, I
> > > > guess that would answer my question for XML data I would be getting from
> > > > the apps...? Or would the answer be different if the incoming XML is
> > > > some other encoding?
> > > >
> > > > To simulate getting nvarchar data from somewhere, I just tried creating
> > > > two dummy tables, one with an nvarchar column and the other with a
> > > > varchar column, typed stuff into the nvarchar one, then inserted into
> > > > the varchar one with a select from the nvarchar one, and it looks normal.
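Ellen's test looks normal because everything she typed exists in the 8-bit code page behind her varchar column; the conversion is only lossless for characters that code page contains. A rough Python model, assuming a Windows-1252 code page on the varchar side. `to_code_page` is a made-up helper, and SQL Server itself may substitute "best fit" look-alike characters rather than `?` for some inputs, so this illustrates the risk rather than reproducing the server's exact behavior:

```python
from typing import Tuple


def to_code_page(text: str, code_page: str = "cp1252") -> Tuple[str, bool]:
    """Round-trip text through an 8-bit code page.

    Returns the text as it would read back and whether it survived unchanged.
    Characters the code page lacks are replaced with '?'.
    """
    stored = text.encode(code_page, errors="replace")  # what an 8-bit column could hold
    read_back = stored.decode(code_page)
    return read_back, read_back == text


print(to_code_page("Señor Muñoz, ¿qué tal?"))  # ('Señor Muñoz, ¿qué tal?', True)
print(to_code_page("Ωmega"))                    # ('?mega', False)
```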
> > > >
> > > > If all this means I was worrying about nothing, excellent! OTOH, is
> > > > there something I should be worrying about that I didn't ask?
> > > >
> > > > The only pieces whose names I know so far are Sonic and SalesForce, both
> > > > of which are written in Java, if that makes any difference. I know
> > > > there is at least one other external piece but I think that is the next
> > > > phase.
> > > >
> > > > On Sat, 19 Feb 2005 21:37:15 -0800, "Rich" <@> wrote in message
> > > > <421821c1$1@w3.nls.net>:
> > > >
> > > > >  You need to be more specific than "8-bit characters". There are many
> > > > >  8-bit character encodings. If you are using Windows to generate your
> > > > >  data, you are most likely using Windows-1252, which is the default
> > > > >  8-bit character set for U.S. English in Windows. Windows supports
> > > > >  many 8-bit encodings, so you could be using something else too.
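Rich's point that "8-bit characters" is ambiguous is easy to demonstrate: the same two bytes mean different characters depending on which single-byte code page decodes them. A Python sketch with byte values chosen for illustration:

```python
raw = bytes([0x80, 0xF1])

decoded = {code_page: raw.decode(code_page) for code_page in ("cp1252", "latin-1", "cp437")}
for code_page, text in decoded.items():
    print(f"{code_page:8} -> {text!r}")
# cp1252   -> '€ñ'
# latin-1  -> '\x80ñ'   (0x80 is an unprintable C1 control code in ISO-8859-1)
# cp437    -> 'Ç±'      (the original IBM PC / DOS code page)
```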
> > > > >
> > > > >  Unicode is a character set, not an encoding. There are multiple
> > > > >  encodings, the main ones being UTF-8, UTF-16, and UTF-32. You can use
> > > > >  any of these for XML, as well as non-Unicode encodings. For
> > > > >  interoperability you should use Unicode, preferably UTF-8.
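Rich's distinction between the character set and its encodings, in a few lines of Python: one Unicode string gives a different byte sequence under each of UTF-8, UTF-16 and UTF-32, and every one of them decodes back to the same text. The sample string is arbitrary:

```python
text = "Año 2005 €"

# One Unicode string, three encodings. The -le variants omit the byte-order mark.
encoded = {name: text.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}
for name, data in encoded.items():
    print(f"{name:9} {len(data):2} bytes")
# utf-8     13 bytes
# utf-16-le 20 bytes
# utf-32-le 40 bytes

# Different bytes, identical characters after decoding.
assert all(data.decode(name) == text for name, data in encoded.items())
```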
> > > > >
> > > > >  What comes out when the XML is parsed depends on the XML parser. XML
> > > > >  is logically expressed in Unicode. The Windows XML parsers provide a
> > > > >  Unicode interface. Other parsers could behave differently.
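As one non-Windows example of Rich's point, Python's standard `xml.etree.ElementTree` parser (built on expat) honors the encoding declaration and returns ordinary Unicode strings, so the same text comes out whether the document arrived as UTF-8 or as ISO-8859-1. A minimal sketch; `parse_text` and the sample documents are illustrative:

```python
import xml.etree.ElementTree as ET


def parse_text(document: bytes) -> str:
    """Parse an XML byte string and return the root element's text as a str."""
    return ET.fromstring(document).text


utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><n>Muñoz</n>'.encode("utf-8")
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><n>Muñoz</n>'.encode("latin-1")

print(utf8_doc == latin1_doc)                                       # False: different bytes
print(parse_text(utf8_doc) == parse_text(latin1_doc) == "Muñoz")   # True: same text
```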
> > > > >
> > > > > Rich
> > > > >
> > > > > "Ellen K." <72322.1016@compuserve.com> wrote in message news:4o2g11pu048kafbdilg46u77vs5ls0be55@4ax.com...
> > > > > Our new enterprise system is going to be built around an Enterprise
> > > > > Service Bus. I don't have the full specs yet, but as I understand it the
> > > > > main apps (starting with SalesForce) are going to be out on the internet
> > > > > and the Sonic ESB will be the messaging piece. There will be an
> > > > > Operational Data Store in house that will get updated every night on a
> > > > > batch basis from the main apps.
> > > > >
> > > > > My data warehouse will continue to be the data warehouse and will remain
> > > > > in house. The dimensions will stay the same, but I might have to create
> > > > > separate measures for the data from the new apps and then create views
> > > > > to keep everything transparent to the users.
> > > > >
> > > > > I'm thinking if we're going to have an ODS in house already, I may as
> > > > > well do the ETL from there. But I'm worrying that the new data will
> > > > > probably be Unicode (because Java defaults to that and SalesForce is
> > > > > written in Java). Right now I am storing everything (except our blobs,
> > > > > of course) in 8-bit characters.
> > > > >
> > > > > Anyone here who's up on this stuff, can the XML that goes back and forth
> > > > > convert between Unicode and 8-bit characters, or am I gonna have to
> > > > > redefine all my data? For example, if Unicode data is put into an XML
> > > > > document that specifies UTF-8, what comes out when the document is
> > > > > parsed? How about vice versa? If this is too simplistic to work, what
> > > > > is needed?
> > > > >
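On the "vice versa" question: an XML document can be written in a narrow encoding and still carry any Unicode character, because characters the encoding lacks can be written as numeric character references such as `&#241;`. A Python sketch with `xml.etree.ElementTree`, whose serializer emits such references when the target encoding cannot represent a character; the element name and text are made up:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("name")
elem.text = "Muñoz Ω"

as_ascii = ET.tostring(elem, encoding="us-ascii")  # ñ and Ω become character references
as_utf8 = ET.tostring(elem, encoding="utf-8")      # ñ and Ω stored as raw UTF-8 bytes

print(as_ascii)  # b'<name>Mu&#241;oz &#937;</name>'

# Both serializations parse back to the same Unicode text.
print(ET.fromstring(as_ascii).text == ET.fromstring(as_utf8).text == "Muñoz Ω")  # True
```

The receiving side does not need to know which form was used: after parsing, both are the same string, and whether that string fits an 8-bit column is a separate question.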
> > > > > (We actually have no substantive need for Unicode -- we are bilingual
> > > > > Spanish but all the special Spanish characters exist in the ASCII
> > > > > character set.)
> </BLOCKQUOTE></BODY></HTML>
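A quick check of that last claim: strictly speaking, 7-bit ASCII (codes 0 to 127) has none of the accented Spanish letters or the inverted punctuation marks. They live in the upper half of 8-bit code pages such as Windows-1252 and ISO-8859-1, which is presumably the character set meant here. A small Python sketch; the character list is illustrative, not exhaustive:

```python
SPANISH_EXTRAS = "áéíóúüñÁÉÍÓÚÜÑ¿¡"


def encodable(text: str, codec: str) -> bool:
    """True if every character in text can be encoded with the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for codec in ("ascii", "cp1252", "latin-1"):
    print(codec, encodable(SPANISH_EXTRAS, codec))
# ascii False
# cp1252 True
# latin-1 True
```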

--- BBBS/NT v4.01 Flag-5
 * Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)