Text 657, 181 lines
Written 2007-02-10 13:53:10 by Mithgol the Webmaster (2:5063/88)
Subject: [2/11] FidoURL.txt
==========================
* written in FTSC_PUBLIC
* also sent to CU.TALK
* also sent to GANJANET.LOCAL
* also sent to RU.FIDO.WWW
* also sent to RU.FIDONET.TODAY
* also sent to RU.FTN.DEVELOP
* also sent to SU.FIDOTECH
* also sent to TITANIC.BEST
textsection 2 of 11 of file FidoURL.txt
textbegin.section
5.1. The main parts of URLs
-+-------------------------
In general, Fidonet URLs are written as follows:
<scheme>:<scheme-specific-part>
Any URL contains the name of the scheme being used (<scheme>)
followed by a colon and then a string (the <scheme-specific-part>)
whose interpretation depends on the scheme.
Scheme names consist of a sequence of characters. The lower case
letters "a"--"z", digits, and the characters plus ("+"), period
("."), and hyphen ("-") are allowed. For the sake of resiliency,
programs interpreting Fidonet URLs SHOULD treat upper case letters
in scheme names as equivalent to the corresponding lower case
letters (e.g., allow "AREA" scheme name as well as "area").
Only the first colon of the URL plays the role of delimiter
between <scheme> and <scheme-specific-part>. The scheme-specific
part of any URL MAY contain other colons.
The colon delimiter between <scheme> and <scheme-specific-part>
MAY be immediately followed by an optional double slash ("//").
Fidonet programs interpreting URLs MUST treat the delimiter "://"
as equivalent to the simple colon before <scheme-specific-part>.
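
The splitting rules above can be illustrated with a minimal Python sketch. It is not part of the specification: the function name split_fidonet_url is hypothetical, and the sample URLs in the usage note only illustrate the delimiter handling, since the syntax of each scheme-specific part is defined elsewhere in the document.

```python
import re

# Scheme names: letters, digits, "+", "." and "-". Upper case letters are
# tolerated on input and folded to lower case, as the text recommends.
_SCHEME_RE = re.compile(r"^([A-Za-z0-9+.\-]+):(.*)$", re.DOTALL)


def split_fidonet_url(url):
    """Return (scheme, scheme_specific_part) for a URL string."""
    match = _SCHEME_RE.match(url)
    if match is None:
        raise ValueError("no valid scheme found in %r" % url)
    scheme, rest = match.group(1), match.group(2)
    # Only the first colon delimits the scheme; an optional "//" right
    # after it is treated as equivalent to the bare colon.
    if rest.startswith("//"):
        rest = rest[2:]
    return scheme.lower(), rest
```

Under these assumptions, split_fidonet_url("AREA://Ru.FTN.Develop") and split_fidonet_url("area:Ru.FTN.Develop") both return ("area", "Ru.FTN.Develop"), while split_fidonet_url("netmail:2:5063/88") keeps the later colon inside the scheme-specific part.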
5.1.1. Conformance note
-+---------------------
This subsection is informative.
The Fidonet URL schemes defined in this document consist of
lower case letters "a"--"z" only. However, digits, and the
characters plus ("+"), period ("."), and hyphen ("-") MUST also
be allowed in scheme names, so that Internet schemes conforming
with the specifications of RFC 1738 are correctly dealt with.
5.1.2. Delimiter guidelines
-+-------------------------
Current Internet practice distinguishes between the delimiters
":" and "://". The delimiter "://" is often used after scheme
names that designate objects and resources ("http://", "ftp://",
"gopher://", "nntp://", "ed2k://", "file://", etc.).
The delimiter ":" is often used after scheme names that
designate actions (e.g. "mailto:", "skype:").
The same difference exists between Fidonet resources (objects)
and actions. That is why, though these delimiters MUST always
be interpreted as equivalent, it is still RECOMMENDED that ":"
be used after schemes that designate actions ("netmail:",
"echomail:", "areafix:") and that "://" be used after schemes
that designate resources ("area://", "freq://", "fecho://",
"faqserv://", etc.).
5.2. URL character encoding
-+-------------------------
URLs are sequences of characters (i.e., letters, digits, and/or
special characters). URLs may be represented in a variety of ways:
e.g., ink on paper, or a sequence of octets in a coded character
set. The interpretation of a URL depends only on the identity of the
characters used.
It is useful to distinguish between a "character" (distinguishable
semantic entity) and an "octet" (an 8-bit byte).
In most URL schemes, the character sequences in different parts
of a URL are used to represent sequences of octets used in Fidonet
services. For example, in the "netmail:" scheme, the Fidonet
address, netmail subject and addressee name are such sequences of
octets, represented by parts of the URL. Those sequences of octets,
in turn, represent the original characters (of subject line, or of
sysop's name, etc.); each original character is represented by one
or more octets.
So there are always two mappings, one from URL characters to
octets, and the second from octets to original characters:
URL character sequence <-> octet sequence <-> original character sequence
5.2.1. Encoding of original characters
-+------------------------------------
The following paragraph is informative.
The sequence of octets defined by a component of the URL
is subsequently used to represent a sequence of original
characters. That process could have a very volatile nature.
Being an international network, Fidonet always needs to deal
with hundreds of national characters and dozens of available
encoding traditions and character sets. There are a number of
FSC (Fidonet Standard Proposal documents) suggesting several
kludge-based methods to define which character set is used.
However, it is not wise to implement any equivalents to kludges
as a required part of every Fidonet URL; and it could be hard to
maintain complete lists of all possible character sets inside all
programs interpreting Fidonet URLs. (Remember, it should also be
possible for Fidonet URLs to appear and be correctly interpreted
in the traditional HTML hypertext environment of the Web, in
Internet e-mail, instant messaging, etc.) That is why a single
encoding, with a large enough character set, has to be chosen.
The following paragraphs of this subsection are normative.
The sequence of octets used in Fidonet URLs MUST always contain
UTF-8 encoded representation of original characters.
ISO/IEC 10646-1 defines a multi-octet character set called the
Universal Character Set (UCS), which encompasses most of the
world's writing systems. And UTF-8, one of a few so-called UCS
transformation formats (UTF), preserves the 7-bit ASCII range,
thus providing some compatibility with file systems, parsers and
other software elements that rely on 7-bit ASCII values but are
transparent to other values.
UTF-8 is defined in RFC 2279. Its description can also be found
in Unicode Technical Report #4 and in the Unicode Standard,
version 2.0.
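
The mapping between original characters and octets can be sketched in Python as follows. This is only an illustration of the UTF-8 requirement above; the helper names characters_to_octets and octets_to_characters are hypothetical and not defined by this document.

```python
def characters_to_octets(text):
    """Map original characters to the octet sequence a URL represents."""
    return text.encode("utf-8")


def octets_to_characters(octets):
    """Map an octet sequence back to the original characters.

    Raises UnicodeDecodeError if the octets are not valid UTF-8.
    """
    return octets.decode("utf-8")


# 7-bit ASCII characters are preserved as single octets.
ascii_octets = characters_to_octets("Fido")        # b"Fido"
# A Cyrillic capital letter EF (U+0424) becomes two octets.
cyrillic_octets = characters_to_octets("\u0424")   # b"\xd0\xa4"
```

The example shows both properties mentioned above: ASCII text is unchanged, and each non-ASCII character is represented by more than one octet.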
5.2.2. Encoding of octets
-+-----------------------
The character sequences in different parts of a URL are used
to represent sequences of octets.
It is possible to represent an octet by the character
which has that octet as its code within the pure 7-bit ASCII
character set. However, there are some exceptions (see below).
Alternatively, octets MAY be encoded by a character triplet
consisting of the character "%" followed by the two hexadecimal
digits (from "0123456789ABCDEF") which form the hexadecimal
value of the octet. (The characters "abcdef" MAY also be used in
hexadecimal encodings.)
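
As an illustration of the triplet form, the following Python sketch turns a URL component back into octets, accepting hexadecimal digits in either case. The function name decode_triplets is hypothetical, and rejecting malformed triplets with an error is an assumption of this sketch rather than a rule stated here.

```python
import string


def decode_triplets(text):
    """Convert a URL component into the octet sequence it represents."""
    octets = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            pair = text[i + 1:i + 3]
            # Both "0123456789ABCDEF" and "abcdef" are accepted.
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("malformed triplet at position %d" % i)
            octets.append(int(pair, 16))
            i += 3
        elif 0x20 <= ord(ch) <= 0x7E:
            # A printable 7-bit ASCII character stands for its own code.
            octets.append(ord(ch))
            i += 1
        else:
            raise ValueError("character %r is not printable 7-bit ASCII" % ch)
    return bytes(octets)
```

For example, decode_triplets("%D0%A4ido") and decode_triplets("%d0%a4ido") yield the same five octets, which decode from UTF-8 to the Cyrillic letter EF followed by "ido".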
Hexadecimal encoding of any octet MAY be used even when it is
not REQUIRED or RECOMMENDED. However, it is RECOMMENDED to avoid
unnecessary hexadecimal encoding, thus keeping URLs reasonably
short.
It is either REQUIRED or RECOMMENDED to use the hexadecimal
encoding of octets if they have no corresponding graphic
character within the 7-bit ASCII character set, or if the use
of the corresponding character is unsafe, or if the
corresponding character is reserved for some other
interpretation within the particular URL scheme. These
requirements and recommendations are detailed below.
5.2.2.1. No corresponding graphic 7-bit character
-+-----------------------------------------------
URLs are written only with the graphic printable characters
of the 7-bit ASCII coded character set.
The octets 80-FF hexadecimal do not belong to 7-bit ASCII,
and the octets 00-1F and 7F hexadecimal represent control
characters; these octets MUST be encoded.
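
The rule above can be sketched in Python. The function name encode_octets and its extra parameter are hypothetical. Encoding "%" and the space character as well is an assumption of this sketch (so that the output stays unambiguous); it is not a rule taken from this subsection, and the unsafe and reserved characters are not covered here.

```python
def encode_octets(octets, extra=b"% "):
    """Represent an octet sequence using printable 7-bit ASCII only.

    Octets 00-1F, 7F and 80-FF are always written as "%XX" triplets.
    Octets listed in `extra` are encoded too (an assumption of this
    sketch, not a rule from this subsection).
    """
    parts = []
    for octet in octets:
        must_encode = octet <= 0x1F or octet >= 0x7F or octet in extra
        parts.append("%%%02X" % octet if must_encode else chr(octet))
    return "".join(parts)
```

With this sketch, encode_octets("\u0424ido".encode("utf-8")) gives "%D0%A4ido": the two UTF-8 octets of the Cyrillic letter are encoded, while the ASCII letters are left as they are, which keeps the URL reasonably short.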
textend.section
With best Fidonet 2.0 regards,
Mithgol the Webmaster. [Real nodelisted name: Sergey Sokoloff]
... 204. I will hire an entire squad of blind guards.
--- Something is rotten in the state of Denmark. (Shakespeare, Hamlet, I, IV)
* Origin: I have a strange feeling, as if I already had a deja vu (2:5063/88)