Text 21305, 231 lines
Written 2015-02-23 01:17:21 by FidoNews Robot (2:2/2.0)
Subject: FidoNews 32:08 [02/07]: General Articles
================================================
=================================================================
GENERAL ARTICLES
=================================================================
How to create and edit UTF-8 text files
By Michiel van der Vlist, 2:280/5555
Last week I wrote about the Z2 UTF-8 nodelist project. In order to
participate, nodelist clerks and maybe individual sysops need to
create and edit UTF-8 text files. Whatever way is used to create a
nodelist segment, it always starts with a text editor to create the
master input; in this case, a UTF-8 capable editor.
So what UTF-8 capable editors do we have?
This week we will limit ourselves to Windows:
Notepad
=======
An obvious choice is Notepad. It comes with the OS. This is how one
creates a UTF-8 text file with Notepad:
1) Start Notepad by clicking on its icon, via the Start menu, or by
typing notepad on the command line.
2) Open an existing file by clicking on "File" and "Open", or start a
new file by clicking on "File" and "New".
3) Make whatever changes you want.
4) Click on "File" and "Save As".
5) In the "Encoding" drop-down at the bottom of the dialog, which now
displays "ANSI", select "UTF-8".
6) Enter the file name and click on "Save". In the case of an
existing file, the file name may be the same as that of the existing
file. A window will pop up to ask if the existing file may be
overwritten.
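For those who would rather script this, the following is a minimal Python sketch of what "Save As UTF-8" amounts to for a file that was saved as "ANSI". It assumes the ANSI code page is cp1252 ("Western"); the function names are made up for this example. Unlike Notepad, it does not write a BOM, and because it works on raw bytes the line endings are left untouched.

```python
from pathlib import Path


def ansi_to_utf8_bytes(data: bytes, ansi_codepage: str = "cp1252") -> bytes:
    """Decode bytes from a Windows "ANSI" code page and re-encode them as UTF-8.

    cp1252 ("Western") is an assumption; use your own system's ANSI code page.
    No BOM is prepended, and CR/LF line endings pass through unchanged.
    """
    return data.decode(ansi_codepage).encode("utf-8")


def convert_file(src: Path, dst: Path, ansi_codepage: str = "cp1252") -> None:
    """Convert one file on disk; dst may be the same path as src."""
    dst.write_bytes(ansi_to_utf8_bytes(src.read_bytes(), ansi_codepage))
```

A file that is pure ASCII comes out byte-for-byte identical, which is one reason such a file cannot be told apart from "ANSI" afterwards.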
If non-ASCII characters have been added to the file, the next time
Notepad is used to open the file, it will recognise it as a UTF-8
encoded file and open it in that mode. When saving the file, it is no
longer necessary to use the "Save As" method to retain the encoding.
Just clicking "Save" will do.
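Notepad's actual detection logic is not described here, so the following is only a rough Python approximation of the idea: a file is treated as UTF-8 if it starts with a BOM, or if it contains non-ASCII bytes and still decodes as valid UTF-8. The function name is hypothetical. It also shows why a pure-ASCII file gives the editor nothing to go on.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic guess whether a byte string is UTF-8 encoded text."""
    if data.startswith(UTF8_BOM):
        return True  # an explicit UTF-8 signature
    try:
        data.decode("utf-8")  # strict decoding raises on malformed sequences
    except UnicodeDecodeError:
        return False
    # Pure ASCII is valid UTF-8 but also valid in every ANSI/OEM code page,
    # so it is reported as "not recognisably UTF-8".
    return any(byte >= 0x80 for byte in data)
```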
Notepad has one little quirk that requires a workaround if the
edited file is a nodelist segment. Notepad adds a so-called
"Byte Order Mark" (BOM) to the start of the file. Adding a BOM to a
UTF-8 encoded file is an outdated concept, but Notepad insists on
adding it anyway. The BOM consists of the three-byte sequence
0xEF 0xBB 0xBF. Most applications will simply ignore the BOM, so
there it does no harm, but for nodelist processing it is a show
stopper. So...
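The three bytes are simply the character U+FEFF encoded in UTF-8. This short Python sketch (names are illustrative) shows that, uses Python's "utf-8-sig" codec to produce Notepad-style output, and shows what a program that decodes with plain UTF-8 ends up with: an invisible extra character glued to the front of the first line.

```python
BOM = "\ufeff".encode("utf-8")  # U+FEFF encoded as UTF-8


def has_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(BOM)


# The "utf-8-sig" codec writes a BOM, much like Notepad does.
with_bom = "hello".encode("utf-8-sig")
plain = "hello".encode("utf-8")

# Decoding with plain "utf-8" keeps U+FEFF as the first character.
first_char = with_bom.decode("utf-8")[0]
```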
How to remove a BOM?
Method A.
1) Open the file with a classic editor like EDIT. Even EDLIN will do.
2) Remove the first three "funny" characters of the file and save it.
Method B.
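Use a stream editor like SED:
sed "1s/^\xEF\xBB\xBF//" {infile} >{outfile}
The "1" limits the substitution to the first line and "^" anchors it
to the start of that line, so only a leading BOM is removed. GNU sed
understands the \xHH escapes; the double quotes are for the Windows
command prompt.
A Windows version of SED can be found here: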
http://gnuwin32.sourceforge.net/packages/sed.htm
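If Python happens to be installed, the same job can be done without SED. This is a minimal sketch, not a MakeNl feature; the script and function names are hypothetical. Like the sed command, it removes only a BOM at the very start of the file and leaves everything else, including line endings, untouched.

```python
import sys
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> bytes:
    """Remove one leading UTF-8 BOM; identical bytes elsewhere are left alone."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data


def strip_bom_file(infile: str, outfile: str) -> None:
    """Read infile as raw bytes and write it to outfile without a leading BOM."""
    Path(outfile).write_bytes(strip_bom(Path(infile).read_bytes()))


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python strip_bom.py infile outfile
    strip_bom_file(sys.argv[1], sys.argv[2])
```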
Method C....
Maybe, just maybe a future version of MakeNl will have the ability
to remove the BOM all by itself, so you need not worry about it any
more when using Notepad or another editor that insists on adding a
BOM.
Notepad++
=========
Notepad++ is an open source free software project under the GPL
licence. It may be a bit of overkill for just editing nodelist
segments, but it is highly recommended by some. I haven't tried it
myself (yet), so I can't really comment on the details, but I do
know that it does not insist on adding a BOM to the edited file.
http://en.wikipedia.org/wiki/Notepad%2B%2B
Winvi
=====
Winvi is a Windows version of vi, an editor originally designed for
Unix. It is a bit of an oddball in that it does things "differently".
Some love it, some hate it. I like it because it does not require
jumping through hoops to create and edit UTF-8 files. Just click on
the UTF-8 button at the top and go ahead. It has lots of bells and
whistles: you can also create and edit files in other encodings, like
the common DOS and Windows code pages, you can define profiles and
other nice things, but we need not go into that just now.
http://www.winvi.de/en/
There are plenty of other UTF-8 capable editors for Windows. Google
is your friend. Your mileage may vary.
How to enter non-ASCII characters with the keyboard
===================================================
Finding a UTF-8 capable editor is one thing, entering non-ASCII
characters via the keyboard is another. Those whose native language
is not US English are usually familiar with how to enter the
characters that are used frequently in their own language.
In many countries keyboards are used that have a different layout
than the standard US layout. They have keys for characters that are
used often in those languages. The German keyboard has keys for the
A and O with umlaut (Ä, Ö). The Belgian keyboard has keys for letters
with accents. The Russian keyboard has dual labelling. It has the
normal US layout labelled on the upper left of each key, more or less
the same as a US keyboard. In addition it has Cyrillic characters on
the lower right of each key, usually in a different colour.
Also there is an Alt-Gr key on many non-US keyboards. This is
actually the right Alt key with just another label. It acts as a sort
of second shift key. On my keyboard, for example, pressing Alt-Gr
together with the '5' key on the upper row produces a Euro sign.
And then there is the "dead key" method. In some countries they use a
standard US keyboard but deploy a driver that allows some shortcuts
for often used characters. On my US keyboard I first type the key for
the single quote followed by the e to get an e with an acute accent:
é. To get the single quote itself, I have to type it twice or follow
it with a space. It sounds awkward, but once one is used to it, it is
convenient.
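The dead key behaviour described above can be modelled as a tiny state machine. This Python sketch is purely illustrative: the DEAD_KEYS table is a hypothetical subset, not the actual table of any Windows keyboard driver, and the rule "dead key followed by a key it cannot combine with gives both characters" is an assumption.

```python
# Hypothetical subset of a dead-key table: (dead key, next key) -> character.
DEAD_KEYS = {
    ("'", "e"): "é",
    ("'", "a"): "á",
    ("'", "o"): "ó",
    ('"', "o"): "ö",
    ('"', "u"): "ü",
}
DEAD = {dead for dead, _ in DEAD_KEYS}


def type_keys(keys: str) -> str:
    """Simulate what a dead-key keyboard driver produces for a keystroke sequence."""
    out = []
    pending = ""  # dead key waiting for the next keystroke, "" if none
    for key in keys:
        if not pending:
            if key in DEAD:
                pending = key  # produce nothing yet
            else:
                out.append(key)
            continue
        if key == pending or key == " ":
            out.append(pending)  # typed twice, or followed by space: the symbol itself
        elif (pending, key) in DEAD_KEYS:
            out.append(DEAD_KEYS[(pending, key)])
        else:
            out.append(pending + key)  # no combination defined: emit both
        pending = ""
    out.append(pending)  # a trailing dead key is emitted as-is
    return "".join(out)
```

For example, typing the keys c a f ' e gives "café", while i t ' ' s gives "it's".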
Note that the keyboards are all the same under the hood. It is just
the labelling on the keys that is different. It is the keyboard
driver that maps the keys to all the different characters. I won't
go into the details. Everyone knows how to enter the non-ASCII
characters that he needs for his own language.
But how does one type characters that are not on the keyboard and
for which the keyboard driver has no known shortcuts?
There are ways to enter a character by its numeric value. One method
is to hold the Alt key and enter the three-digit decimal number of
the character on the numeric keypad. The result depends on the
active code page. If the code page is set to 437 or 850, typing
Alt 148 will result in a small o with umlaut: ö. When the code page
is set to 866, it will result in the Russian capital letter "ef": Ф.
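The code page dependence is easy to check with Python's built-in codecs: the same byte value 148 decodes to different characters depending on which OEM code page is used to interpret it. The helper name is made up for this example. Note that in cp866, 148 is the capital Ф; the small ф is 228.

```python
def alt_decimal(code: int, codepage: str) -> str:
    """Return the character that byte value `code` stands for in a given code page."""
    return bytes([code]).decode(codepage)


results = {cp: alt_decimal(148, cp) for cp in ("cp437", "cp850", "cp866")}
# {'cp437': 'ö', 'cp850': 'ö', 'cp866': 'Ф'}
small_ef = alt_decimal(228, "cp866")  # 'ф'
```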
One can also enter a four-digit number starting with 0. In that case
one will get the corresponding character in the Windows character
set. Typing Alt 0148 results in a right double quote. Typing Alt 0128
gives a Euro sign. That is of course when the system is set for
"Western". Your mileage will vary with other language settings.
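Putting the two decimal methods together, here is a rough Python model of the rule: a leading 0 selects the Windows ("ANSI") code page, otherwise the DOS ("OEM") code page applies. The defaults cp437 and cp1252 are assumptions matching a "Western" system, the function name is hypothetical, and values above 255 are not handled.

```python
def alt_numpad(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Approximate the character produced by Alt + decimal digits on the keypad."""
    codepage = ansi if digits.startswith("0") else oem
    return bytes([int(digits)]).decode(codepage)  # raises ValueError above 255


print(f"U+{ord(alt_numpad('0148')):04X}")  # right double quote
print(f"U+{ord(alt_numpad('0128')):04X}")  # Euro sign
```

This prints U+201D and U+20AC, while alt_numpad("148") gives ö (U+00F6).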
For Windows systems that use Unicode internally (Windows XP and up
do), there is another method, independent of language and code page.
One can directly enter the value of the code point in hex:
1) Press and hold down the Alt key.
2) Press the + (plus) key on the numeric keypad.
3) Type the hexadecimal Unicode value.
4) Release the Alt key.
To get the small Dutch ij ligature (ĳ, U+0133) on your screen, type
Alt +133. In this case one does not have to enter leading zeros;
they may be omitted.
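In code, the hex method is just "code point to character". This Python sketch (hypothetical helper name) shows that leading zeros make no difference and what bytes a UTF-8 editor will actually store for the characters in question.

```python
def alt_plus_hex(hex_digits: str) -> str:
    """Character for Alt, keypad +, then hex digits; leading zeros are optional."""
    return chr(int(hex_digits, 16))


ij = alt_plus_hex("133")     # U+0133 LATIN SMALL LIGATURE IJ
euro = alt_plus_hex("20AC")  # U+20AC EURO SIGN

# What ends up in a UTF-8 encoded file:
ij_bytes = ij.encode("utf-8")      # b'\xc4\xb3'
euro_bytes = euro.encode("utf-8")  # b'\xe2\x82\xac'
```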
If it does not work, you need to change a setting in the registry:
under HKEY_CURRENT_USER\Control Panel\Input Method, set the value
EnableHexNumpad to "1". If you have to add it, make it of type
REG_SZ (a string value).
You have to reboot the machine for the change to take effect.
Source:
http://www.fileformat.info/tip/microsoft/enter_unicode.htm
On a laptop or other device without a numeric keypad, it is a bit
more complicated but still possible:
Activate Num Lock.
Press the blue Fn key and hold it while hunting for the "numeric" +
key. Sometimes it is marked on the key in blue, sometimes it is not.
It usually is the key that in normal mode is the key for / and ?. If
not, try neighbouring keys.
Locate the "numeric" keys for 0-6. They usually correspond to the
normal keys M J K L U I O and usually are labelled on those keys in
blue. The keys for 7, 8 and 9 are the same as the normal number keys
in the top row.
Once you have located the "numeric" keys, you can enter the codes by
pressing and holding the Fn and Alt keys together and entering the
code using the keys just located.
Ehhh, one more thing. When entering a hexadecimal number it gets even
more complex. The digits A-F are entered using the normal A-F keys on
the keyboard but you have to release the Fn key and keep holding the
Alt key while doing so.
So... to get a Euro sign on the screen of your laptop, press the
following sequence:
1) Hold down the Fn and Alt keys
2) Press "numeric" +
3) Press K
4) Press M
5) Release the Fn key while keeping the Alt key pressed
6) Press A
7) Press C
8) Release Alt.
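Since the recipe is mechanical, a small Python sketch can spell it out for any character. The EMBEDDED_KEYPAD mapping is the "usual" layout described above and will differ on some laptops. Pressing Fn again when a digit follows a letter A-F is an assumption, since that case is not covered above. The function name is hypothetical.

```python
# Assumed embedded-keypad layout; check the blue labels on your own laptop.
EMBEDDED_KEYPAD = {"0": "M", "1": "J", "2": "K", "3": "L", "4": "U",
                   "5": "I", "6": "O", "7": "7", "8": "8", "9": "9"}


def laptop_key_sequence(char: str) -> list:
    """List the keystrokes for Alt + hex entry of `char` on a laptop."""
    steps = ["Hold down the Fn and Alt keys", 'Press "numeric" +']
    fn_down = True
    for digit in format(ord(char), "X"):  # upper-case hex, no leading zeros
        if digit in EMBEDDED_KEYPAD:
            if not fn_down:  # assumption: re-pressing Fn works mid-sequence
                steps.append("Press Fn again while keeping Alt pressed")
                fn_down = True
            steps.append("Press " + EMBEDDED_KEYPAD[digit])
        else:  # A-F use the normal letter keys with Fn released
            if fn_down:
                steps.append("Release the Fn key while keeping Alt pressed")
                fn_down = False
            steps.append("Press " + digit)
    steps.append("Release Alt")
    return steps
```

For the Euro sign (U+20AC) this reproduces the eight steps listed above.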
Wow...
OK, enough for this week.
-----------------------------------------------------------------
--- Azure/NewsPrep 3.0
* Origin: Home of the Fidonews (2:2/2.0)