Text 1051, 365 lines
Written 2007-10-14 00:25:06 by Mithgol the Webmaster (2:5063/88)
Subject: [2/11] FidoURL.txt
==========================
* originally in FTSC_Public
* also sent to GanjaNet.Local
* also sent to Ru.Fido.WWW
* also sent to Ru.FTN.Develop
* also sent to SU.FidoTech
* also sent to Titanic.Best
textsection 2 of 11 of file FidoURL.txt
textbegin.section
5.2.2. Encoding of octets
-+-----------------------
The character sequences in different parts of a URL are used
to represent sequences of octets.
It is possible to represent an octet by the character
which has that octet as its code within the pure 7-bit ASCII
character set. However, there are some exceptions (see below).
Alternatively, octets MAY be encoded by a character triplet
consisting of the character "%" followed by the two hexadecimal
digits (from "0123456789ABCDEF") which form the hexadecimal
value of the octet. (The characters "abcdef" MAY also be used in
hexadecimal encodings.)
Hexadecimal encoding of any octet MAY be used even when it is
not REQUIRED or RECOMMENDED. However, it is RECOMMENDED to avoid
unnecessary hexadecimal encoding, thus keeping URLs reasonably
short.
It is either REQUIRED or RECOMMENDED to use the hexadecimal
encoding of octets if they have no corresponding graphic
character within the 7-bit ASCII character set, or if the use
of the corresponding character is unsafe, or if the
corresponding character is reserved for some other
interpretation within the particular URL scheme. These
requirements and recommendations are detailed below.
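As an illustration of the triplet notation described above, here is
a minimal Python sketch. The function names encode_octet and
decode_triplet are hypothetical; they are not defined anywhere in this
document and only show one possible reading of the rules.

```python
import string


def encode_octet(octet: int) -> str:
    """Return the "%XX" triplet for one octet, using uppercase digits."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 00-FF hexadecimal")
    return "%{:02X}".format(octet)


def decode_triplet(triplet: str) -> int:
    """Return the octet encoded by a "%XX" triplet ("abcdef" accepted)."""
    digits = triplet[1:]
    if (len(triplet) != 3 or triplet[0] != "%"
            or any(d not in string.hexdigits for d in digits)):
        raise ValueError("expected '%' followed by two hexadecimal digits")
    return int(digits, 16)
```

The encoder always emits uppercase digits, while the decoder accepts
both cases, which matches the MAY given for "abcdef" above.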
5.2.2.1. No corresponding graphic 7-bit character
-+-----------------------------------------------
URLs are written only with the graphic printable characters
of the 7-bit ASCII coded character set.
The octets 80-FF hexadecimal do not belong to 7-bit ASCII,
and the octets 00-1F and 7F hexadecimal represent control
characters; these octets MUST be encoded.
5.2.2.2. Unsafe characters
-+------------------------
Characters can be unsafe for a number of reasons.
The space character is unsafe because significant spaces may
disappear and insignificant spaces may be introduced when URLs
are transcribed or typeset or subjected to the treatment of
word-processing programs. The octet 20 hexadecimal MUST always
be encoded.
The characters "<" and ">" are unsafe because they are used
as the delimiters around tags in HTML hypertext and XML data.
The octets 3C and 3E hexadecimal MUST always be encoded.
The quote mark (""") is used to delimit URLs in some systems,
including valid XHTML and XML. The octet 22 hexadecimal
MUST always be encoded.
The character "#" is unsafe because it is used in the World
Wide Web and in other systems to delimit a URL from a fragment
or anchor identifier that might follow it.
The octet 23 hexadecimal MUST always be encoded.
The character "%" is unsafe because it is used for encodings
of other characters. The octet 25 hexadecimal MUST always be
encoded.
The character sequence of triple minus ("-" repeated thrice)
has a special meaning in Fidonet and can accidentally start
a tearline in some cases (e.g. when a line is wrapped).
At least one of the three corresponding octets
(2D 2D 2D hexadecimal) MUST be encoded if they follow
each other in a sequence.
Other characters were declared unsafe in RFC 1738 because some
gateways and other transport agents were known to sometimes
modify such characters. These characters are "{", "}", "|",
"\", "^", "~", "[", "]", and "`". The corresponding octets
(7B 7D 7C 5C 5E 7E 5B 5D 60 hexadecimal) MUST always be
encoded for the sake of Internet compatibility.
All unsafe characters MUST always be encoded within a URL.
For example, the character "#" MUST be encoded within URLs
even in software programs that do not normally deal with
fragment or anchor identifiers, so that if the URL is copied
into another program that does use them, it will not be
necessary to change the URL encoding.
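The rules of subsections 5.2.2.1 and 5.2.2.2 can be sketched in Python
as follows. This is only an illustration under stated assumptions: the
name encode_unsafe is hypothetical, the input is taken to be raw
octets, and the choice to encode the third minus of every triple is
arbitrary (the text only requires that at least one of the three be
encoded). Reserved characters (subsection 5.2.2.3) are deliberately
not handled here, because their treatment depends on where they
appear in the URL.

```python
# Octets that subsection 5.2.2.2 lists as unsafe (as integer values).
UNSAFE = set(b' <>"#%{}|\\^~[]`')


def encode_unsafe(octets: bytes) -> str:
    """Encode control, non-ASCII and unsafe octets as "%XX" triplets."""
    out = []
    for octet in octets:
        # 00-1F, 7F and 80-FF have no graphic 7-bit ASCII character.
        if octet <= 0x1F or octet >= 0x7F or octet in UNSAFE:
            out.append("%{:02X}".format(octet))
        else:
            out.append(chr(octet))
    # Break up every run of three minus signs so that it cannot be
    # mistaken for the start of a tearline.
    return "".join(out).replace("---", "--%2D")
```

For example, encode_unsafe(b"a b") gives "a%20b", and
encode_unsafe(b"x---y") gives "x--%2Dy".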
5.2.2.3. Reserved characters
-+--------------------------
Many URL schemes reserve certain characters for a special
meaning: the appearance of those characters in the
scheme-specific part of the URL (in <scheme-specific-part>
after the scheme name) has designated semantics.
Usually a URL has the same interpretation when an octet is
represented by a character and when it is encoded. However,
this is not true for reserved characters: encoding a character
that is reserved for a particular scheme may change
the meaning of a URL if the character is used according
to its designated semantics. Conversely, leaving such
a character unencoded where it is meant literally may change
the meaning as well.
The character "?" is used as the delimiter between required
and optional parts of the URL. The delimiter itself MUST NOT
be encoded. If the character "?" appears in any other part of
a URL, it MUST be encoded, so it won't be confused with the
delimiter.
The character "=" is used as the delimiter between parameter
names and parameter values. The delimiters themselves MUST NOT
be encoded. If the character "=" appears in any other part
of a URL, it MUST be encoded, so it won't be confused with
any of the delimiters.
The character "&" is used as the delimiter between
"parameter=value" pairs. The delimiters themselves MUST NOT
be encoded. If the character "&" appears in any other part
of a URL, it MUST be encoded, so it won't be confused with
any of the delimiters.
The character "@" is used as the delimiter between an areatag
and its FTN domain suffix (see subsection 5.2.2.3.1 for
details). The delimiters themselves MUST NOT be encoded. If
the character "@" appears inside the areatag itself (i.e. not
between the areatag and its suffix), it MUST be encoded,
though in any other part of a URL this character MAY be left
as it is.
The character "/" is scheme-specific:
*) In some schemes ("netmail:", for example) the character "/"
has its own (literal) meaning, as it is widely used
in standard Fidonet addressing notation
<zone>:<net>/<node>.<point> (see FSP-1004 for details).
*) In some other schemes the character "/" is reserved
to be used in the file path
(<directory>/<directory>/...<directory>/<filename>),
and its corresponding octet (2F hexadecimal)
MUST be encoded if it does not delimit parts of the path.
See the scheme-specific details below (in scheme sections).
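The handling of "?", "=" and "&" described above can be sketched as
follows. The helper names and the parameter names used in the example
are hypothetical. The sketch encodes only "%" and these three reserved
characters inside names and values; the unsafe characters of
subsection 5.2.2.2 and the other reserved characters would need
the same treatment in a real implementation.

```python
# Triplets for "%" itself and for the three delimiters discussed above.
RESERVED = {"%": "%25", "?": "%3F", "=": "%3D", "&": "%26"}


def encode_reserved(text: str) -> str:
    """Encode reserved characters that are meant literally."""
    return "".join(RESERVED.get(ch, ch) for ch in text)


def optional_part(params) -> str:
    """Build "?name=value&name=value" from (name, value) pairs.

    The delimiters themselves are written unencoded; the same
    characters inside names and values are encoded.
    """
    pairs = [encode_reserved(name) + "=" + encode_reserved(value)
             for name, value in params]
    if not pairs:
        return ""
    return "?" + "&".join(pairs)
```

For example, optional_part([("subject", "a=b&c?")]) gives
"?subject=a%3Db%26c%3F", so a parser splitting at the unencoded
delimiters recovers exactly one name and one value.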
5.2.2.3.1. Using domain suffixes in areatags
-+------------------------------------------
Different domains of Fidonet (in the "@<domain>" sense,
see FSP-1004 for details), also known as Fidonet Technology
Networks, MAY have common echomail areas (i.e. areas that
are gated between some FTNs) and MAY have internal
echomail areas (i.e. areas distributed only inside
the domain).
If a Fidonet station has access to echomail areas in
different domains, it MAY encounter areas of the same name
(of the same areatag) in different FTN domains. This is
harmless if it is the same common area. However, different
internal areas can share a name merely by coincidence, so
the Uniform Resource Locator MAY contain an optional
"@<domain>" suffix after the areatag to distinguish between
such areas. The suffix consists of the "@" symbol followed
by the domain name of the FTN of the designated echo area.
The same rule applies to areatags of file echoes.
Examples:
area://jabber@fidonet
area://jabber@othernet
areafix:sysop.talks@fidonet
areafix:sysop.talks@othernet
fecho://common.files@fidonet
fecho://common.files@othernet
Domain suffixes are intentionally OPTIONAL, because FTNs
generally have their own means to ensure that the names of
echomail areas are unique. Some FTNs, for example, use
their domain names as prefixes or suffixes for echomail area
names (e.g. othernet.areaname, or areaname.othernet), thus
eliminating the need for a special URL element that
would otherwise be needed for the same purpose.
The character "@" is a reserved character. When it is used
as the delimiter between an areatag and its FTN domain
name, the character "@" MUST NOT be encoded. However,
if the character "@" appears inside the areatag itself (e.g.
when the area name is something like SETI@home), then
the character MUST be encoded, so it won't be confused with
the delimiters.
Outside of the areatags, however, the character "@" is not
reserved, so it MAY be either encoded or left intact in any
other part of the URL (e.g. in an object's path, in
a parameter's name, in a parameter's value, etc.).
5.2.2.4. The plus ("+") and the encoding of white spaces
-+------------------------------------------------------
White spaces (octets 20 hexadecimal) are the most common
unsafe characters in Fidonet, and so they play a significant
role in some scheme-specific parts of the URL: they appear in
MSGID kludges, they are used as delimiters between words
in lines of text, etc.
To enhance human readability of Fidonet URLs, and to make them
shorter, a shorter synonym for the "%20" hexadecimal triplet
is available. It is the plus sign ("+").
Programs interpreting the scheme-specific part of a Fidonet URL
MUST treat the plus character ("+") there as equivalent
to the white space hexadecimal triplet ("%20").
Because of that, the plus character itself is reserved, and
its own corresponding octet (2B hexadecimal) MUST be encoded
if it appears in the scheme-specific part of a Fidonet URL.
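The equivalence of "+" and "%20" can be sketched in Python as follows.
The name decode_component is hypothetical. The sketch assumes that its
argument is a single component of the scheme-specific part that has
already been separated from its neighbours at the reserved delimiters,
so that decoding cannot turn an encoded delimiter into a real one.

```python
import string


def decode_component(text: str) -> bytes:
    """Decode "+" and "%XX" triplets in one URL component into octets."""
    octets = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            octets.append(0x20)          # "+" is a synonym for "%20"
            i += 1
        elif ch == "%":
            digits = text[i + 1:i + 3]
            if (len(digits) != 2
                    or any(d not in string.hexdigits for d in digits)):
                raise ValueError("'%' must be followed by two hex digits")
            octets.append(int(digits, 16))
            i += 3
        else:
            octets.append(ord(ch))       # plain 7-bit ASCII character
            i += 1
    return bytes(octets)
```

With this sketch, "Fidonet+URL" and "Fidonet%20URL" both decode to
the octets of "Fidonet URL", while "Fidonet%2BURL" decodes to
the octets of "Fidonet+URL".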
5.2.2.4.1. Specificity note
-+-------------------------
The rule of equivalence between "+" and "%20" does not apply
outside of the scheme-specific part of a URL; the plus sign
has no special meaning in the scheme name, since white spaces
are not allowed in scheme names.
5.2.2.4.2. Internet practice note
-+-------------------------------
The same shortening is already common on the Internet. Open
the URL http://www.google.ru/search?q=Fidonet+URL and you'll
get the Google search for "Fidonet URL" (not "Fidonet+URL");
http://www.google.ru/search?hl=ru&q=Fidonet%2BURL is needed
if you're looking for "Fidonet+URL".
This practice is not documented in RFC 1738. It is, however,
documented in RFC 1630.
5.2.2.5. URLs that span several lines of text in Fidonet
-+------------------------------------------------------
Some Fidonet mail editors and other pieces of software do not
permit lines of text to be longer than some limit, e.g. longer
than 78 or 80 characters (or a lesser limit, especially inside
quotes). If text is longer than the limit, it spans several
lines. Usually a line break is inserted in place of a white
space; however, if a run of characters longer than the limit
(80, or fewer, since the limit MAY vary) contains no white
spaces, the line MAY be broken anyway.
Sometimes it MAY become necessary for a long enough URL
to span several lines as well. To distinguish between URLs
that span several lines and URLs that just end (by chance)
before some end of line, a special mark is needed.
Two successive "%" characters MUST NOT appear in URLs (because
"%" MUST be followed by two hexadecimal digits), and they are
also rare in ordinary text. That's why the "%%" character
sequence MUST be used before and after a line break in a URL,
to mark that the line break does not end the URL.
If a URL parser encounters the "%%" character sequence in the
URL it parses, then the parser MUST skip the "%%" sequence, and
all characters after it and before the line break, and
the line break, and all characters after the line break
and before the next "%%" sequence, and that "%%" sequence.
Then the URL continues.
Quote decoration MAY be encountered after the line break and
before the "%%" sequence marking the place where the URL
resumes. Fidonet mail editors MAY rearrange the "%%" sequences
and line breaks when quoting the quotes.
Example:
MtW>> To track Fidonet software development in Russian,
MtW>> a newsreel like area://Ru.FTN.Develop+Ru.FTN.Win%%
MtW>> %%Soft+Ru.FIPS/ is often used.
MtW>>>>> To track Fidonet software development in Russian,
MtW>>>>> a newsreel like area://Ru.FTN.Develop+Ru.FTN.W%%
MtW>>>>> %%inSoft+Ru.FIPS/ is often used.
The URL used in this example:
area://Ru.FTN.Develop+Ru.FTN.WinSoft+Ru.FIPS/
(the meaning of area:// URLs is explained in section 7.2)
Frame decoration MAY be encountered after the line break and
before the "%%" sequence marking the place where the URL
resumes, or before the line break and after the "%%" sequence
marking the place where the URL pauses.
Example:
+==========================================================+
+                                                          +
+ To track Fidonet software development in Russian,        +
+ a newsreel like area://Ru.FTN.Develop+Ru.FTN.Win%%       +
+ %%Soft+Ru.FIPS/ is often used.                           +
+                                                          +
+==========================================================+
Any other decoration is also possible, so the URL parser MUST
expect it. For example, the URL parser MUST allow more than
one line break between the URL-pausing "%%" and the next "%%",
because additional line breaks MAY be introduced by quoting.
Example:
***********************************************************
***********************************************************
**                                                       **
** ATTENTION! Grab the N5019 pointlist at fecho://p%%    **
** %%ntlist/pnt5019.zip                                  **
**                                                       **
***********************************************************
***********************************************************
MtW> ******************************************************
MtW> *****
MtW> ******************************************************
MtW> *****
MtW> **
MtW> **
MtW> ** ATTENTION! Grab the N5019 pointlist at fecho://p%%
MtW> **
MtW> ** %%ntlist/pnt5019.zip
MtW> **
MtW> **
MtW> **
MtW> ******************************************************
MtW> *****
MtW> ******************************************************
MtW> *****
The URL used in this example:
fecho://pntlist/pnt5019.zip
(the meaning of fecho:// URLs is explained in section 7.4)
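The skipping rule of this subsection can be sketched in Python with
a regular expression. This is a simplified illustration, not
a complete parser: the name unfold_urls is hypothetical, and the sketch
assumes that every "%%" in the given text belongs to a URL, whereas
a real parser would apply the rule only while it is parsing a URL.

```python
import re

# "%%", the rest of that line, one line break, then everything (which
# may be several more lines of quote or frame decoration) up to and
# including the next "%%".
_CONTINUATION = re.compile(r"%%[^\r\n]*[\r\n].*?%%", re.DOTALL)


def unfold_urls(text: str) -> str:
    """Remove "%%" ... line break ... "%%" spans to rejoin split URLs."""
    return _CONTINUATION.sub("", text)
```

Applied to the quoted examples above, this rejoins the two halves into
area://Ru.FTN.Develop+Ru.FTN.WinSoft+Ru.FIPS/ and
fecho://pntlist/pnt5019.zip respectively, because the non-greedy
match is allowed to cross more than one line break.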
textend.section
With best Fidonet 2.0 regards,
Mithgol the Webmaster. [Real nodelisted name: Sergey Sokoloff]
... Never judge an iBook by its cover. (Bugzilla Quip System)
--- Come with me in the twilight of a summer night for a while... .hack//SIGN
* Origin: Be careful, the paranoid ones are always wathing you!.. (2:5063/88)