Conference OSDEBATE, 18996 messages
Message 17164, 141 lines
Written 2007-03-29 18:13:52 by mike (1:379/45)
Subject: Linux to help the Library of Congress save American history
===================================================================
From: mike <mike@barkto.com>


http://www.linux.com/article.pl?sid=07/03/26/1157212

===
The Library of Congress, where thousands of rare public domain documents
relating to America's history are stored and slowly decaying, is about to begin
an ambitious project to digitize these fragile documents using Linux-based
systems and publish the results online in multiple formats.


Thanks to a $2 million grant from the Sloan Foundation, "Digitizing American
Imprints at the Library of Congress" will begin the task of digitizing these
rare materials -- including Civil War and genealogical documents, technical and
artistic works concerning photography, scores of books, and the 850 titles
written, printed, edited, or published by Benjamin Franklin. According to
Brewster Kahle of the Internet Archive, which developed the digitizing
technology, open source software will play an "absolutely critical" role in
getting the job done.

The main component is Scribe, a combination of hardware and free software.
"Scribe is a book-scanning system that takes high-quality images of books and
then does a set of manipulations, gets them in optical character recognition
and compressed, so you can get beautiful, printable versions of the book that
are also searchable," says Kahle.

While previous versions were written for both Linux and Windows, the Internet
Archive has migrated Scribe entirely to Linux, and Windows support has been
dropped. Kahle says the project uses Ubuntu now.

When asked why the Library of Congress chose Scribe for this project, Dr.
Jeremy E. A. Adamson, the library's director for collections and services,
replies that the Internet Archive has already demonstrated "the efficient
production of high-quality images" with it.

Kahle says that a Linux-based Scribe workstation at the Library of Congress
will hold the material to be scanned in a V-shaped cradle -- it doesn't crack
books all the way open -- while two cameras take images of it. A human operator
performs quality assurance, then Scribe sends the digital images across the
breadth of the country to the Internet Archive in San Francisco, where they are
processed and eventually posted online in various formats. Free software is
used almost every step of the way.

"[It's a] Linux-based station out there in the field. It rsyncs the files up to
the servers, [and then] it goes and does the processing on a Linux cluster of
over 1,000 machines, and then posts it online -- also on Linux machines," Kahle
says.
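
As a rough illustration of the upload step Kahle describes, here is a minimal
sketch in Python that only builds an rsync command line for one scanned book;
it does not run it. The directory, host name, and function name are made up
for the example, and the article does not say which rsync options Scribe
actually uses.

```python
import shlex


def build_rsync_command(scan_dir, remote):
    """Return an rsync argument list that mirrors scan_dir to remote.

    Hypothetical helper: the options are common rsync flags chosen for
    illustration, not the ones the Scribe station is known to use.
    """
    # -a preserves timestamps and permissions, -z compresses data in transit,
    # --partial keeps partly transferred files so an interrupted upload
    # does not have to start over.
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    source = scan_dir.rstrip("/") + "/"
    return ["rsync", "-a", "-z", "--partial", source, remote]


# Placeholder paths and host, assumed for the example only.
cmd = build_rsync_command(
    "/scans/book-0001",
    "uploader@ingest.example.org:/incoming/book-0001/",
)
print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the sketch safe to
try on a machine that has no scanning station or upload server.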

Image processing for an average book takes about 10 hours on the cluster, and
while the project still uses proprietary optical character recognition (OCR)
software, Kahle says that many open source applications come into play,
including the netpbm utilities and ImageMagick, and the software performs "a
lot of image manipulation, cropping, deskewing, correcting color to normalize
it -- [it] does compression, optical character recognition, and packaging into
a searchable, downloadable PDF; searchable, downloadable DjVu files; and an
on-screen representation we call the Flip Book."
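
The deskewing, cropping, and normalizing steps Kahle lists can be sketched
with ImageMagick, one of the tools the article names. The snippet below only
assembles a "convert" command line for a single page image. The file names and
the function name are invented for the example, the 40% deskew threshold is
the value ImageMagick's documentation suggests rather than anything from the
article, and the real Scribe pipeline may look quite different.

```python
def build_cleanup_command(raw_image, clean_image):
    """Return an ImageMagick argument list for basic page cleanup.

    Hypothetical helper: the article names the operations but not the
    commands, so this is one plausible mapping, not Scribe's own.
    """
    return [
        "convert", raw_image,
        "-deskew", "40%",     # straighten a slightly rotated page
        "-trim", "+repage",   # crop the uniform border, reset the canvas
        "-normalize",         # stretch contrast to the full range
        clean_image,
    ]


# Placeholder file names, assumed for the example only.
page_cmd = build_cleanup_command("page-0001-raw.png", "page-0001-clean.png")
print(" ".join(page_cmd))
```

OCR and packaging into PDF or DjVu are left out on purpose: the article says
the OCR step uses proprietary software and gives no detail about either step.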

The Flip Book is used at The Open Library, a charmingly retro Web interface for
online books that mimics old technologies (clicking "Details" for a title
brings up a yellowed card catalog entry), which the Internet Archive says was
"inspired by a British Library kiosk."

The books are stored in the PetaBox, which is the Internet Archive's massive
million-gigabyte storage system -- a system that Kahle says is "all built on
open source software."

Caring for brittle books

A good number of the historic materials in question are old, fragile, and in
such rough shape that placing them in Scribe's cradle, or even attempting to
read them, could irreparably damage them. Adamson says that some of the books,
for example, have pages "that have become brittle with age"; although these
materials are in a broad range of conditions that limit their physical
handling, he uses the general term "brittle books" to describe them. No list of
such brittle materials at the Library of Congress has been made, but Adamson
says that "they comprise a percentage of virtually every collection." The
project's objectives, he says, are to develop a more formal classification and
description of these "brittle" materials and to "establish digitization
workflows based on that classification of condition."

If scanning the brittle materials demands new software and digitization
techniques, the Library of Congress will work in conjunction with the Internet
Archive to make the innovations available to the public. But there's no way to
know at this point what they may be, because the project is only getting
underway.

"The project proposal calls for months of planning before any scanning or
engineering is to begin," Adamson says. And the planning, he says, is
"significant": "Space needs to be prepared to accommodate the physical scanning
of books, server storage allocated, project plans need to be written, project
team members briefed, along with myriad other details required for a project of
this magnitude and complexity."

Eventually, Adamson says, when the scanning and processing of materials has
been completed, the high-quality digitized versions of these historic documents
(and metadata associated with them, such as indices and contents) will be
freely accessible online -- which Kahle says is a "huge step" in broadening the
reach of the ever-too-small public domain.

"There may be public domain books that are sitting on shelves, but if you can't
get access to [something], what good does it do to be in the public domain?"
says Kahle. "The Library of Congress is dedicated to keeping [these digitized
holdings] public domain, which I think is a great step that's not being
followed by everybody else."

The program is part of larger efforts at both institutions: at the Library of
Congress, to preserve old media and records, and at the Internet Archive, which
is already
scanning public domain materials with its Open Content Alliance, a consortium
of about 40 libraries. Kahle says that the alliance is presently operating in
five cities, using the Scribe software, at a brisk clip of 12,000 books a
month.

"We're part of the 'open world' through and through -- we use open source
software, we generate open source software, we generate open content," says
Kahle. "We're trying to take this open source idea to the next level, which is
open content and open access to cultural materials, which means 'publicly
downloadable in bulk.' I think we're really seeing the next level up of this
whole movement -- we had the open network, then open source software, now we're
starting to see open source content."


Links

"Library of Congress" - http://loc.gov/
"Sloan Foundation" - http://www.sloan.org/
"previous versions" - http://sourceforge.net/projects/scribesw/
"Ubuntu" - http://ubuntu.com/
"Internet Archive" - http://archive.org/
"netpbm utilities" - http://netpbm.sourceforge.net/
"ImageMagick" - http://applications.linux.com/article.pl?sid=05/03/29/1525217&tid=39
"The Open Library" - http://www.openlibrary.org/
"PetaBox" - http://www.archive.org/web/petabox.php
"preserve old media and records" - http://www.digitalpreservation.gov/
"Open Content Alliance" - http://www.opencontentalliance.org/

===

   /m

--- BBBS/NT v4.01 Flag-5
 * Origin: Barktopia BBS Site http://HarborWebs.com:8081 (1:379/45)