Document: FSC-0007
Version:  002
Date:     17-Apr-90

               FidoNet(r) RFC822-Style Message Format
  (Informal Proposed Message Format Specification - Draft, revised)
                    Robert Heller @ 1:321/153.0
                          April 17, 1990

Status of this document:

This FSC suggests a proposed protocol for the FidoNet(r) community,
and requests discussion and suggestions for improvements.
Distribution of this document is unlimited.

Fido and FidoNet are registered marks of Tom Jennings and Fido
Software.

1. Purpose.
===========

The purpose of this document is to outline my ideas concerning
FidoNet(r) message format, both as stored on disk as message files
handled by BBS (or other "conferencing" programs) and as these
messages exist packed into "bundles" or "packets" as transmitted from
machine to machine. I think using a uniform format for normal message
storage will make things easier, at least as far as "standardized"
message bundling and transmitting software is concerned. If done
right, it also makes things easier for writers of BBS and
conferencing software.

This specification is only a first draft proposal, just something to
put on the table for discussion. Feel free to comment on it. I am
open to suggestions.

2. Preliminary Definitions.
===========================

I will be using BNF notation to describe the format of data fields.
This is a fairly standard notation and should be familiar to anyone
who has taken a compiler design course. To make things a little
briefer, I will be using some pre-defined pseudo-terminal symbols.
These symbols are defined as:

    o The symbol ALPHA refers to any ASCII alphabetic character,
      including the uppercase letters ('A' thru 'Z', 41H thru 5AH),
      the lowercase letters ('a' thru 'z', 61H thru 7AH) and these
      characters: '#' (23H), '$' (24H), '&' (26H), '*' (2AH),
      '+' (2BH), '-' (2DH), '=' (3DH), '^' (5EH), and '_' (5FH).

    o The symbol DIGIT refers to any of the ASCII characters '0' thru
      '9' (30H thru 39H).
|   o The symbol NEWLINE refers to the single ASCII character LF (0AH)
|     when the message is in transit. For stored messages it refers to
|     the local O/S's newline convention for text files (i.e. LF under
|     UNIX, CRLF under MS-DOS and CP/M, CR under OS-9, etc.), or to
|     whatever is convenient for the BBS software.

    o The symbol WHITESPACE refers to one or more ASCII space (20H) or
      tab (09H) characters.

    o The symbol OPTWHITESPACE refers to zero or more ASCII space
      (20H) or tab (09H) characters.

    o The symbol TEXT refers to zero or more printable ASCII
      characters not including a NEWLINE sequence.

    o The symbol NULL refers to the null string (no characters at
      all).

Oh, one other thing: message files contain only printable ASCII
characters and NEWLINE sequences (packed messages will have
non-printable bytes). Also, I'll number the definitions. I am also
only using six BNF operator characters: "::=" for definition, angle
brackets (<>) for non-terminal symbols, a vertical bar (|) for
alternation, braces ({}) for comments, single quotes (') for character
and string literals, and parens (()) for expression grouping.

3. Definitions 1: Stored Message.
=================================

Changed or added definitions are indicated by an '*' after the def
number. The goal symbol is "".

{ A message consists of a header followed by a NEWLINE followed by a
  message body. }

::=
NEWLINE  {Def 1.1}

{ A message body is just unbounded text. }

::= NULL | (TEXT NEWLINE )  {Def 1.2}

{ A header is more complicated: There are a series of header line
  types. }

::= NULL | ( NEWLINE )  {Def 1.3} *

{ This syntax defines the possibility of a null header - this needs to
  be checked for by semantic routines, since it makes no sense. }

::=|||| |||| |||| |  {Def 1.4} *

{ Now for the header line formats themselves. Some notes: certain
  header lines are required (, , and ), and some can only occur once
  (, , , and ). Except for these restrictions, most header lines can
  either be omitted or can occur more than once. }

::= 'To: '  {Def 1.5}

::= 'From: '  {Def 1.6}

::= OPTWHITESPACE '@' OPTWHITESPACE  {Def 1.7}

::= ALPHA  {Def 1.8}

::= (ALPHA | DIGIT | WHITESPACE | NULL)  {Def 1.9}

{ Note: this is the full blown FidoNet node address - includes
  optional zone and point numbers. It does not include the "domain".
  I am not sure about this - I think more discussion on the whole idea
  of "domains" and "zones" is needed. My feeling is we should look
  into a symbolic addressing system, similar to what the InterNet
  uses. }

::= (( ':') | NULL)    {zone}
    '/'                {basic net/node}
    (('.' ) | NULL)    {point}
    {Def 1.10}

::= DIGIT | (DIGIT )  {Def 1.11}

::= 'Date: '  {Def 1.12}

{ Here it is: my idea for a *standard* date string }
{ day-of-week month date, year hour:minute AM/PM time-zone }
{ Although not specified, hours and minutes are zero padded to two
  digits. The date and year are not padded at all. }

::= ' ' ' ' ', ' ' ' ':' ((' ' | NULL)  {Def 1.13}

::= 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat' | 'Sun'  {Def 1.14}

::= 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' |
    'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec'  {Def 1.15}

{ If the AM/PM indicator is missing (null), the hours field is assumed
  to be in 24-hour format (i.e. 00 to 23) }

::= 'AM' | 'PM' | NULL  {Def 1.16}

{ This field is optional. It makes sense given that FidoNet is
  international. }

::= ALPHA | (ALPHA )  {Def 1.17}

::= ('Subject: ' | 'Subject (Private): ') TEXT  {Def 1.18}

::= 'Cost: ' (('.' ) | NULL)  {Def 1.19}

{ This is tricky, given the internationalness of FidoNet(r). I guess
  it isn't critical. }

::= '$' | '#' | NULL  {Def 1.20}

::= 'Via: ' ', '  {Def 1.21}

::= NULL | (' ' TEXT)  {Def 1.22}

::= 'Processed-by: ' TEXT  {Def 1.22.1} *

{ This replaces the 'tear' line. }

::= 'Origin: ' TEXT '(' ')'  {Def 1.23} *

::= 'Area: '  {Def 1.24}

{ I'm leaving the question of all caps for the area name open: other
  than ease of comparison, is it necessary to be all caps? }

::= ALPHA | (ALPHA )  {Def 1.25}

::= 'Seen-By: '  {Def 1.26} *

::= | ( )  {Def 1.27} *

::= (( ':') | NULL) (( '/') | NULL) ( | NULL)
    (('.' ) | NULL)  {Def 1.28} *

{ This is also open-ended. Should there be a standard format for this?
  The syntax here is somewhat ambiguous - it allows for certain bogus
  forms. It needs semantic routines to handle these forms (raise an
  error or whatever). Writing the grammar to avoid these problems
  would add complexity not needed at this level. }

::= 'Path: '  {Def 1.28.1} *

::= 'Message-id: ' ' '  {Def 1.29} *

{ This is the syntax proposed by Jim Nutt }

::= {8 hex digits}  {Def 1.29.1} *

{ I've left out a proper grammar rule or token for a hexadecimal
  number. }

::= 'Attributes: '  {Def 1.30}

::= | ( ', ' )  {Def 1.31}

{ This is probably not complete, but... }

::= 'Kill Sent' | 'File Attached' | 'File Request' |
    'Sent' | 'Crash' | 'Audit'  {Def 1.32}

{ Maybe we should forget about an 'Attributes: ' header tag and
  instead have a collection of additional header tags to handle each
  possible attribute - i.e. 'File-Attached: ', 'File-Request: ',
  'Sent: ', etc. header lines. }

::= ': ' (TEXT | NULL)  {Def 1.33}

{ This is the expansion hook. }

::= ALPHA  {Def 1.34}

::= NULL | ((ALPHA | WHITESPACE | DIGIT | )  {Def 1.35}

{ This is also open-ended. Restriction: colon (:) cannot be allowed! }

::= '(' | ')' | '.' | ',' | ';'  {Def 1.36}

4. Packed Message Format.
=========================

A packed message is simply a regular message with some binary header
(i.e. an "envelope") info and a NUL (00H) byte after the message text:

  Offset
 dec hex
         .-----------------------------------------------.
   0   0 |     0     |     3     |     0     |     0     |
         +-----------------------+-----------------------+
   2   2 | origZone (low order)  | origZone (high order) |
         +-----------------------+-----------------------+
   4   4 | origNet (low order)   | origNet (high order)  |
         +-----------------------+-----------------------+
   6   6 | origNode (low order)  | origNode (high order) |
         +-----------------------+-----------------------+
   8   8 | origPoint (low order) | origPoint (high order)|
         +-----------------------+-----------------------+
  10   A | destZone (low order)  | destZone (high order) |
         +-----------------------+-----------------------+
  12   C | destNet (low order)   | destNet (high order)  |
         +-----------------------+-----------------------+
  14   E | destNode (low order)  | destNode (high order) |
         +-----------------------+-----------------------+
  16  10 | destPoint (low order) | destPoint (high order)|
         +-----------------------+-----------------------+
  18  12 | Attribute (low order) | Attribute (high order)|
         +-----------------------+-----------------------+
  20  14 |     message text (includes ASCII header)      |
         ~                   unbounded                   ~
         |                null terminated                |
         `-----------------------------------------------'

Some notes:

I've included both the Zone and Point addresses in the packed message
headers. This does not really affect things like routing and point
mapping. The packets themselves have addressing info in their headers
(as described in FSC001). The addressing in the packet header is used
by the transmitting programs. The internal addressing info is
processed by re-packing programs - that is, programs which peel routed
messages (messages that are "just passing through") and re-packet them
for later re-transmission to another node during a future mail event.
Messages destined for the current node (one whose address exactly
matches all four destination address words) get extracted from the
packet and stored in the message base. Note that only the ASCII
message text is stored. The binary header is discarded at this point.

5. Conclusions.
===============

It is my idea that FidoNet(r) is sooner or later going to need some of
the extensibility provided by this sort of message format. In fact it
already needs some of these fields, and has been "faking it" for some
time now: things like EchoMail ("Area: ", "Origin: ", "Seen-By: ", and
"Path: " header lines), points and zones (extra addressing hacks),
uucp gatewaying (more extra addressing hacks), and routing ("Via: "
header lines). Going to an RFC822-style message format also helps to
increase the variety of BBS and conferencing software - this will help
improve the "state of the art" in this regard. Also, using an
RFC822-style message format allows indefinite extensibility - as new
ideas regarding messages and conferencing develop, the message format
can easily be extended to handle them.

6. Contact Info.
================

Comments, suggestions, gripes, etc. can be sent to me at any of these
addresses:

    ARPANet:     Heller@CS.UMass.Edu
    BITNET:      Heller@UMass.BitNet
    Genie:       RHELLER
    BIX:         lockshill.bbs
    CompuServe:  71450,3432
    FidoNet:     Robert Heller @ 1:321/153.0
    USMail:      HC82 Box 29 LH1, Locks Hill Road, Wendell, MA 01379
    Voice Phone: Home: 617-544-6933, Work: 413-545-0528
    Data Phone:  617-544-8337 at 300, 1200, or 2400 BAUD, 24 hours,
                 except during FidoNet(r) mail periods.

7. More Information.
====================

I have written a set of EchoMail processing programs using the message
format described in this document. The code is in C and is freely
available for evaluation. If you would like a copy, let me know and I
will get a copy to you. I developed the code under OS-9/68000, but the
code should easily port to other platforms.
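
As an illustration of the packed message layout in section 4, here is
a minimal sketch in Python. It is not part of the proposal and is not
taken from the C code mentioned above. It assumes that the first word
(bytes 03H 00H at offset 0) is a message type of 3, and that every
16-bit word is stored low-order byte first, as the table shows. The
names ENVELOPE, MESSAGE_TYPE, pack_message, unpack_message and
is_for_node are hypothetical, chosen only for this example.

```python
import struct

# Ten little-endian 16-bit words: the type word, then the orig and dest
# zone/net/node/point words, then the attribute word (offsets 0 to 19).
ENVELOPE = struct.Struct("<10H")
MESSAGE_TYPE = 3  # assumption: bytes 03H 00H at offset 0 read as type 3


def pack_message(orig, dest, attribute, text):
    """Build a packed message from two (zone, net, node, point) tuples."""
    header = ENVELOPE.pack(MESSAGE_TYPE, *orig, *dest, attribute)
    # The ASCII message (header lines, blank line, body) follows the
    # binary envelope and is terminated by a single NUL byte.
    return header + text.encode("ascii") + b"\x00"


def unpack_message(data):
    """Split a packed message into its envelope fields and ASCII text."""
    words = ENVELOPE.unpack_from(data, 0)
    if words[0] != MESSAGE_TYPE:
        raise ValueError("unexpected message type %d" % words[0])
    end = data.index(b"\x00", ENVELOPE.size)  # NUL terminates the text
    return {
        "orig": words[1:5],
        "dest": words[5:9],
        "attribute": words[9],
        "text": data[ENVELOPE.size:end].decode("ascii"),
    }


def is_for_node(message, node):
    """True when all four destination words match this node's address."""
    return message["dest"] == tuple(node)
```

A re-packing program along these lines would call unpack_message on
each packed message, store message["text"] in the message base when
is_for_node is true, and otherwise re-packet the message unchanged.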