Conference LINUX, 22120 texts
Text 7912, 815 lines
Written 2006-11-11 09:34:04 by Robert Wolfe (1:261/1)
Subject: Archiving and Compression
=================================
 
Archiving and Compression
By Scott Granneman 
Created 2006-10-23 01:00 

Chapter 8 from Scott Granneman's new book "Linux Phrasebook: The Pocket Guide
Every Linux User Needs". Linux Phrasebook offers a concise reference that,
like a language phrasebook, can be used "in the street." The book goes
straight to practical Linux uses, providing immediate solutions for day-to-day
tasks.
Chapter 8: Archiving and Compression

Although the two terms are often blurred in casual conversation, archiving
files and compressing them are completely different operations. Archiving
means that you take 10 files and combine them into one file, with no
difference in size. If you start with 10 100KB files and archive them, the
resulting single file is 1000KB. On the other hand, if you compress those 10
files, you might find that the resulting files range from only a few kilobytes
to close to the original size of 100KB, depending upon the original file type.
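
To see the difference for yourself, here's a minimal sketch, assuming tar,
gzip, and a head that understands -c are installed. The ten sample files and
the names bundle.tar and bundle.tar.gz are made up for illustration, and
because the sample data is pure repetition, it compresses far better than real
files would.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create ten 100KB files filled with the letter x (hypothetical sample data).
for n in 1 2 3 4 5 6 7 8 9 10; do
    head -c 102400 /dev/zero | tr '\0' 'x' > "file$n.txt"
done

# Archive only: ten files become one, with no compression.
tar -cf bundle.tar file*.txt

# Archive, then compress the result.
gzip -c bundle.tar > bundle.tar.gz

tar_size=$(wc -c < bundle.tar)
gz_size=$(wc -c < bundle.tar.gz)
echo "tar alone:  $tar_size bytes"
echo "tar + gzip: $gz_size bytes"
```

The first number should come out a little over 1000KB (ten files plus tar's
bookkeeping), while the second should be a small fraction of that.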

Note - In fact, you might end up with a bigger file during compression! If the
file is already compressed, compressing it again adds extra overhead, resulting
in a slightly bigger file.
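
You can check this claim with a quick experiment. This sketch assumes
/dev/urandom is available and uses random bytes as a stand-in for an
already-compressed file, since neither contains any redundancy for gzip to
squeeze out. The file names are made up.

```shell
workdir=$(mktemp -d)

# Random bytes stand in for an already-compressed file (hypothetical sample data).
head -c 100000 /dev/urandom > "$workdir/photo.bin"

# Compress once, then compress the compressed result again.
gzip -c "$workdir/photo.bin" > "$workdir/once.gz"
gzip -c "$workdir/once.gz" > "$workdir/twice.gz"

plain=$(wc -c < "$workdir/photo.bin")
once=$(wc -c < "$workdir/once.gz")
twice=$(wc -c < "$workdir/twice.gz")
echo "original: $plain bytes, gzipped: $once bytes, gzipped twice: $twice bytes"
```

Each pass should add a few dozen bytes of headers and block overhead instead
of saving any space.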

All of the archive and compression formats in this chapter - zip, gzip, bzip2,
and tar - are popular, but zip is probably the world's most widely used format.
That's because of its almost universal use on Windows, but zip and unzip are
well supported among all major (and most minor) operating systems, so things
compressed using zip also work on Linux and Mac OS. If you're sending archives
out to users and you don't know which operating systems they're using, zip is a
safe choice to make.

gzip was designed as an open-source replacement for an older Unix program,
compress. It's found on virtually every Unix-based system in the world,
including Linux and Mac OS X, but it is much less common on Windows. If you're
sending files back and forth to users of Unix-based machines, gzip is a safe
choice.

The bzip2 command is the new kid on the block. Designed to supersede gzip,
bzip2 creates smaller files, but at the cost of speed. That said, computers are
so fast nowadays that most users won't notice much of a difference between the
times it takes gzip or bzip2 to compress a group of files.

Note - Linux Magazine published a good article comparing several different
compression formats, which you can find at
www.linux-mag.com/content/view/1678/43/.

zip, gzip, and bzip2 are focused on compression (although zip also archives).
The tar command does one thing - archive - and it has been doing it for a long
time. It's found almost solely on Unix-based machines. You'll definitely run
into tar files (also called tarballs) if you download source code, but almost
every Linux user can expect to encounter a tarball some time in his career.
Archive and Compress Files Using zip

     zip

zip both archives and compresses files, thus making it great for sending
multiple files as email attachments, backing up items, or for saving disk
space. Using it is simple. Let's say you want to send a TIFF to someone via
email. A TIFF image is often stored uncompressed, so it tends to be pretty
large. Zipping it up should help make the email attachment a bit smaller.

Note - When using ls -l, I'm only showing the information needed for each
example.
   $ ls -lh
   -rw-r--r-- scott scott 1006K young_edgar_scott.tif
   $ zip grandpa.zip young_edgar_scott.tif
   adding: young_edgar_scott.tif (deflated 19%)
   $ ls -lh
   -rw-r--r-- scott scott 1006K young_edgar_scott.tif
   -rw-r--r-- scott scott 819K grandpa.zip


In this case, you shaved off about 200KB on the resulting zip file, or 19%, as
zip helpfully informs you. Not bad. You can do the same thing for several
images.
   $ ls -l
   -rw-r--r-- scott scott 251980 edgar_intl_shoe.tif
   -rw-r--r-- scott scott 1130922 edgar_baby.tif
   -rw-r--r-- scott scott 1029224 young_edgar_scott.tif
   $ zip grandpa.zip edgar_intl_shoe.tif edgar_baby.tif young_edgar_scott.tif
   adding: edgar_intl_shoe.tif (deflated 4%)
   adding: edgar_baby.tif (deflated 12%)
   adding: young_edgar_scott.tif (deflated 19%)
   $ ls -l
   -rw-r--r-- scott scott 251980 edgar_intl_shoe.tif
   -rw-r--r-- scott scott 1130922 edgar_baby.tif
   -rw-r--r-- scott scott 2074296 grandpa.zip
   -rw-r--r-- scott scott 1029224 young_edgar_scott.tif


It's not too polite, however, to zip up individual files this way. For three
files, it's not so bad. The recipient will unzip grandpa.zip and end up with
three individual files. If the payload was 50 files, however, the user would
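end up with files strewn everywhere. Better to zip up a directory containing
those 50 files so when the user unzips it, he's left with a tidy directory
instead. Add the -r (recursive) option so that zip descends into the directory;
without it, only the empty directory entry is stored.
   $ ls -lF
   drwxr-xr-x scott scott edgar_scott/
   $ zip -r grandpa.zip edgar_scott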
   adding: edgar_scott/ (stored 0%)
   adding: edgar_scott/edgar_baby.tif (deflated 12%)
   adding: edgar_scott/young_edgar_scott.tif (deflated 19%)
   adding: edgar_scott/edgar_intl_shoe.tif (deflated 4%)
   $ ls -lF
   drwxr-xr-x scott scott   160 edgar_scott/
   -rw-r--r-- scott scott 2074502 grandpa.zip


Whether you're zipping up a file, several files, or a directory, the pattern is
the same: the zip command, followed by the name of the Zip file you're
creating, and finished with the item(s) you're adding to the Zip file.
Get the Best Compression Possible with zip

     -[0-9]

It's possible to adjust the level of compression that zip uses when it does its
job. The zip command uses a scale from 0 to 9, in which 0 means "no compression
at all" (which is like tar, as you'll see later), 1 means "do the job quickly,
but don't bother compressing very much," and 9 means "compress the heck out of
the files, and I don't mind waiting a bit longer to get the job done." The
default is 6, but modern computers are fast enough that it's probably just fine
to use 9 all the time.

Say you're interested in researching Herman Melville's Moby-Dick, so you want
to collect key texts to help you understand the book: Moby-Dick itself,
Milton's Paradise Lost, and the Bible's book of Job. Let's compare the results
of different compression rates.
   $ ls -l
   -rw-r--r-- scott scott 102519 job.txt
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   $ zip -0 moby.zip *.txt
   adding: job.txt (stored 0%)
   adding: moby-dick.txt (stored 0%)
   adding: paradise_lost.txt (stored 0%)
   $ ls -l
   -rw-r--r-- scott scott 102519 job.txt
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 1848444 moby.zip
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   $ zip -1 moby.zip *.txt
   updating: job.txt (deflated 58%)
   updating: moby-dick.txt (deflated 54%)
   updating: paradise_lost.txt (deflated 50%)
   $ ls -l
   -rw-r--r-- scott scott 102519 job.txt
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 869946 moby.zip
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   $ zip -9 moby.zip *.txt
   updating: job.txt (deflated 65%)
   updating: moby-dick.txt (deflated 61%)
   updating: paradise_lost.txt (deflated 56%)
   $ ls -l
   -rw-r--r-- scott scott 102519 job.txt
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 747730 moby.zip
   -rw-r--r-- scott scott 508925 paradise_lost.txt


In tabular format, the results look like this:

Book               zip -0    zip -1    zip -9
Moby-Dick              0%       54%       61%
Paradise Lost          0%       50%       56%
Job                    0%       58%       65%
Total (in bytes)  1848444    869946    747730


The results you see here would vary depending on the file types (text files
typically compress well) and the sizes of the original files, but this gives
you a good idea of what you can expect. Unless you have a really slow machine
or you're just naturally impatient, you should just use -9 all the time to get
the maximum compression.

Note - If you want to be clever, define an alias in your .bashrc file that
looks like this:

alias zip='zip -9'

That way you'll always use -9 and won't have to think about it.
Password-Protect Compressed Zip Archives

     -P

     -e

The Zip program allows you to password-protect your Zip archives using the -P
option. You shouldn't use this option. It's completely insecure, as you can see
in the following example (the actual password is 12345678):
   $ zip -P 12345678 moby.zip *.txt


Because you had to specify the password on the command line, anyone viewing
your shell's history (and you might be surprised how easy it is for other users
to do so) can see your password in all its glory. On a multiuser system, it can
also show up in the process list while zip is running. Don't use the -P option!

Instead, just use the -e option, which encrypts the contents of your Zip file
and also uses a password. The difference, however, is that you're prompted to
type the password in, so it won't be saved in the history of your shell events.
   $ zip -e moby.zip *.txt
   Enter password:
   Verify password:
   adding: job.txt (deflated 65%)
   adding: moby-dick.txt (deflated 61%)
   adding: paradise_lost.txt (deflated 56%)


The only part of this that's saved in the shell history is zip -e moby.zip
*.txt. The actual password you type disappears into the ether, unavailable to
anyone viewing your shell history.

Caution - The security offered by the Zip program's password protection isn't
that great. In fact, it's pretty easy to find a multitude of tools floating
around the Internet that can quickly crack a password-protected Zip archive.
Think of password-protecting a Zip file as the difference between writing a
message on a postcard and sealing it in an envelope: It's good enough for
ordinary folks, but it won't stop a determined attacker.

Also, the version of zip included with some Linux distros may not support
encryption, in which case you'll see a zip error: "encryption not supported."
The only solution: recompile zip from source. Ugh.
Unzip Files

     unzip

Expanding a Zip archive isn't hard at all. To create a zipped archive, use the
zip command; to expand that archive, use the unzip command.
   $ unzip moby.zip
   Archive: moby.zip
   inflating: job.txt
   inflating: moby-dick.txt
   inflating: paradise_lost.txt


The unzip command helpfully tells you what it's doing as it works. To get even
more information, add the -v option (which stands, of course, for verbose).
   $ unzip -v moby.zip
   Archive: moby.zip
   Length   Method  Size   Ratio  CRC-32   Name
   -------  ------  ------ -----  ------   ----
   102519   Defl:X   35747  65%  fabf86c9  job.txt
   1236574  Defl:X  487553  61%  34a8cc3a  moby-dick.txt
   508925   Defl:X  224004  56%  6abe1d0f  paradise_lost.t
   -------          ------  ---            -------
   1848018          747304  60%            3 files


There's quite a bit of useful data here, including the method used to compress
the files, the percentage by which each file shrank, and the cyclic redundancy
check (CRC) used for error detection.
List Files That Will Be Unzipped

     -l

Sometimes you might find yourself looking at a Zip file and not remembering
what's in that file. Or perhaps you want to make sure that a file you need is
contained within that Zip file. To list the contents of a zip file without
unzipping it, use the -l option (which stands for "list").
   $ unzip -l moby.zip
   Archive: moby.zip
   Length     Date    Time   Name
   --------   ----    ----   ----
         0  01-26-06  18:40  bible/
    207254  01-26-06  18:40  bible/genesis.txt
    102519  01-26-06  18:19  bible/job.txt
   1236574  01-26-06  18:19  moby-dick.txt
    508925  01-26-06  18:19  paradise_lost.txt
   --------                  -------
   2055272                   5 files


From these results, you can see that moby.zip contains two files -
moby-dick.txt and paradise_lost.txt - and a directory (bible), which itself
contains two files, genesis.txt and job.txt. Now you know exactly what will
happen when you expand moby.zip. Using the -l option helps prevent
inadvertently unzipping a file that spews out 100 files instead of unzipping a
directory that contains 100 files. The first leaves you with files strewn
pell-mell, while the second is far easier to handle.
Test Files That Will Be Unzipped

     -t

Sometimes zipped archives become corrupted. The worst time to discover this is
after you've unzipped the archive and deleted it, only to discover that some or
even all of the unzipped contents are damaged and won't open. Better to test
the archive first before you actually unzip it by using the -t (for test)
option.
   $ unzip -t moby.zip
   Archive: moby.zip
   testing: bible/               OK
   testing: bible/genesis.txt    OK
   testing: bible/job.txt        OK
   testing: moby-dick.txt        OK
   testing: paradise_lost.txt    OK
   No errors detected in compressed data of moby.zip.


You really should use -t every time you work with a zipped file. It's the smart
thing to do, and although it might take some extra time, it's worth it in the
end.
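
One way to make the habit stick is to wrap both steps in a small shell
function. This is only a sketch: safe_unzip is a made-up name, not part of the
Zip tools, and it relies on unzip returning a nonzero exit status when the test
fails.

```shell
# safe_unzip ARCHIVE [DEST]: extract only if the archive passes unzip's own test.
# safe_unzip is a hypothetical helper name used for illustration.
safe_unzip() {
    archive=$1
    dest=${2:-.}
    # -t tests the archive; -q suppresses the per-file report.
    if unzip -tq "$archive" > /dev/null 2>&1; then
        unzip -q "$archive" -d "$dest"
    else
        echo "$archive failed the integrity test; not extracting" >&2
        return 1
    fi
}
```

With that defined in your .bashrc, safe_unzip moby.zip extracts into the
current directory only when the test comes back clean; a damaged archive is
left alone and reported.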
Archive and Compress Files Using gzip

     gzip

Using gzip is a bit easier than zip in some ways. With zip, you need to specify
the name of the newly created Zip file or zip won't work; with gzip, though,
you can just type the command and the name of the file you want to compress.
   $ ls -l
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   $ gzip paradise_lost.txt
   $ ls -l
   -rw-r--r-- scott scott 224425 paradise_lost.txt.gz


You should be aware of a very big difference between zip and gzip: When you zip
a file, zip leaves the original behind so you have both the original and the
newly zipped file, but when you gzip a file, you're left with only the new
gzipped file. The original is gone.

If you want gzip to leave behind the original file, you need to use the -c (or
--stdout or --to-stdout) option, which writes the results of gzip to standard
output, so you need to redirect that output to another file. If you use -c and
forget to redirect your output, the raw compressed data is aimed at your
terminal, where it shows up as a screenful of nonsense (some versions of gzip
refuse to write compressed data to a terminal at all).

Not good. Instead, output to a file.
   $ ls -l
   -rw-r--r-- 1 scott scott 508925 paradise_lost.txt
   $ gzip -c paradise_lost.txt > paradise_lost.txt.gz
   $ ls -lh
   -rw-r--r-- 1 scott scott 497K paradise_lost.txt
   -rw-r--r-- 1 scott scott 220K paradise_lost.txt.gz


Much better! Now you have both your original file and the zipped version.
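
Newer releases of GNU gzip (1.6 and later, which appeared after this chapter
was written) also accept a -k (or --keep) option that keeps the original file
without any redirection. The sketch below assumes such a version is installed;
on older systems the -c approach is still the way to go. The file sample.txt is
made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 50000 > sample.txt    # hypothetical sample file

# -k (--keep) writes sample.txt.gz and leaves sample.txt in place.
# Assumption: GNU gzip 1.6 or later; older versions reject -k.
gzip -k sample.txt

ls -l sample.txt sample.txt.gz
```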

Tip: If you accidentally use the -c option without specifying an output file,
just start pressing Ctrl+C several times until gzip stops.
Archive and Compress Files Recursively Using gzip

     -r

If you want to use gzip on several files in a directory, just use a wildcard.
You might not end up gzipping everything you think you will, however, as this
example shows.
   $ ls -F
   bible/ moby-dick.txt paradise_lost.txt
   $ ls -l *
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   bible:
   -rw-r--r-- scott scott 207254 genesis.txt
   -rw-r--r-- scott scott 102519 job.txt
   $ gzip *
   gzip: bible is a directory -- ignored
   $ ls -l *
   -rw-r--r-- scott scott 489609 moby-dick.txt.gz
   -rw-r--r-- scott scott 224425 paradise_lost.txt.gz
   bible:
   -rw-r--r-- scott scott 207254 genesis.txt
   -rw-r--r-- scott scott 102519 job.txt


Notice that the wildcard didn't do anything for the files inside the bible
directory because gzip by default doesn't walk down into subdirectories. To get
that behavior, you need to use the -r (or --recursive) option along with your
wildcard.
   $ ls -F
   bible/ moby-dick.txt paradise_lost.txt
   $ ls -l *
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   bible:
   -rw-r--r-- scott scott 207254 genesis.txt
   -rw-r--r-- scott scott 102519 job.txt
   $ gzip -r *
   $ ls -l *
   -rw-r--r-- scott scott 489609 moby-dick.txt.gz
   -rw-r--r-- scott scott 224425 paradise_lost.txt.gz
   bible:
   -rw-r--r-- scott scott 62114 genesis.txt.gz
   -rw-r--r-- scott scott 35984 job.txt.gz


This time, every file - even those in subdirectories - was gzipped. However,
note that each file is individually gzipped. The gzip command cannot combine
all the files into one big file, like you can with the zip command. To do that,
you need to incorporate tar, as you'll see in "Archive and Compress Files with
tar and gzip."
Get the Best Compression Possible with gzip

     -[0-9]

Just as with zip, it's possible to adjust the level of compression that gzip
uses when it does its job. The gzip command uses a scale from 1 to 9 (unlike
zip, it has no 0 for "no compression at all"), in which 1 means "do the job
quickly, but don't bother compressing very much," and 9 means "compress the
heck out of the files, and I don't mind waiting a bit longer to get the job
done." The default is 6, but modern computers are fast enough that it's
probably just fine to use 9 all the time.
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   $ gzip -c -1 moby-dick.txt > moby-dick.txt.gz
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 571005 moby-dick.txt.gz
   $ gzip -c -9 moby-dick.txt > moby-dick.txt.gz
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 487585 moby-dick.txt.gz


Remember to use the -c option and redirect the output into the actual .gz file
due to the way gzip works, as discussed in "Archive and Compress Files Using
gzip."
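
If you'd like to compare several levels in one go, a short loop does the job.
This is a sketch under a few assumptions: sample.txt is a made-up file
generated with seq, and the output names sample-1.gz and so on are invented for
the example. Substitute a file of your own to get numbers that mean something
to you.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 200000 > sample.txt    # hypothetical sample file

# -c writes to standard output, so sample.txt is never removed.
for level in 1 6 9; do
    gzip -c "-$level" sample.txt > "sample-$level.gz"
    echo "gzip -$level: $(wc -c < "sample-$level.gz") bytes"
done
```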

Note - If you want to be clever, define an alias in your .bashrc file that
looks like this:

alias gzip='gzip -9'

That way, you'll always use -9 and won't have to think about it.
Uncompress Files Compressed with gzip

     gunzip

Getting files out of a gzipped archive is easy with the gunzip command.
   $ ls -l
   -rw-r--r-- scott scott 224425 paradise_lost.txt.gz
   $ gunzip paradise_lost.txt.gz
   $ ls -l
   -rw-r--r-- scott scott 508925 paradise_lost.txt


In the same way that gzip removes the original file, leaving you solely with
the gzipped result, gunzip removes the .gz file, leaving you with the final
gunzipped result. If you want to ensure that you have both, you need to use the
-c option (or --stdout or --to-stdout) and redirect the results to the file you
want to create.
   $ ls -l
   -rw-r--r-- scott scott 224425 paradise_lost.txt.gz
   $ gunzip -c paradise_lost.txt.gz > paradise_lost.txt
   $ ls -l
   -rw-r--r-- scott scott 508925 paradise_lost.txt
   -rw-r--r-- scott scott 224425 paradise_lost.txt.gz


It's probably a good idea to use -c, especially if you plan to keep the .gz
file around or pass it along to someone else. Sure, you could run gzip again
and re-create the archive, but why go to the extra work?

Note - If you don't like the gunzip command, you can also use gzip -d (or
--decompress or --uncompress).
Test Files That Will Be Unzipped with gunzip

     -t

Before gunzipping a file (or files) with gunzip, you might want to verify that
they're going to gunzip correctly without any file corruption. To do this, use
the -t (or --test) option.
   $ gzip -t paradise_lost.txt.gz
   $


That's right: If nothing is wrong with the archive, gzip reports nothing back
to you. If there's a problem, you'll know, but if there's not a problem, gzip
is silent. That can be a bit disconcerting, but that's how Unix-based systems
work. They're generally only noisy if there's an issue you should know about,
not if everything is working as it should.
Archive and Compress Files Using bzip2

     bzip2

Working with bzip2 is pretty easy if you're comfortable with gzip, as the
creators of bzip2 deliberately made the options and behavior of the new command
as similar to its progenitor as possible.
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   $ bzip2 moby-dick.txt
   $ ls -l
   -rw-r--r-- scott scott 367248 moby-dick.txt.bz2


Just like gzip, bzip2 leaves you with just the .bz2 file. The original
moby-dick.txt is gone. To keep the original file, use the -c (or --stdout)
option and redirect the output to a filename that ends with .bz2.
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   $ bzip2 -c moby-dick.txt > moby-dick.txt.bz2
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 367248 moby-dick.txt.bz2


If you look back at "Archive and Compress Files Using gzip," you'll see that
gzip and bzip2 are incredibly similar, which is by design.
Get the Best Compression Possible with bzip2

     -[0-9]

Just as with zip and gzip, it's possible to adjust the level of compression
that bzip2 uses when it does its job. The bzip2 command uses a scale from 1 to
9, which sets the size of the blocks it compresses from 100KB (-1) to 900KB
(-9): 1 means "do the job with less memory, but don't bother compressing very
much," and 9 means "compress the heck out of the files." Unlike zip and gzip,
bzip2 has no 0 level, and its default is already 9, so you get the best
compression without asking for it.
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   $ bzip2 -c -1 moby-dick.txt > moby-dick.txt.bz2
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 424084 moby-dick.txt.bz2
   $ bzip2 -c -9 moby-dick.txt > moby-dick.txt.bz2
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 367248 moby-dick.txt.bz2


From 424KB with 1 to 367KB with 9 - that's quite a difference! Also notice the
difference in ultimate file size between gzip and bzip2. At -9, gzip compressed
moby-dick.txt down to 488KB, while bzip2 mashed it even further to 367KB. The
bzip2 command is noticeably slower than the gzip command, but on a fast machine
that means that bzip2 takes two or three seconds longer than gzip, which
frankly isn't much to worry about.
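
To run the same comparison on a file of your own, something like the following
will do. It's a rough sketch: compare_compressors is a made-up helper name, and
it only reports sizes, not timings. Because both commands are run with -c, the
file you pass in is left untouched.

```shell
# compare_compressors FILE: print the size of FILE as-is, under gzip -9,
# and under bzip2 -9. compare_compressors is a hypothetical helper name.
compare_compressors() {
    file=$1
    echo "original: $(wc -c < "$file") bytes"
    echo "gzip -9:  $(gzip -9 -c "$file" | wc -c) bytes"
    echo "bzip2 -9: $(bzip2 -9 -c "$file" | wc -c) bytes"
}
```

For example, compare_compressors moby-dick.txt prints three lines, one per
size. Which tool wins, and by how much, depends on the kind of data in the
file.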

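Note - Because -9 is already the default for bzip2, the kind of alias suggested
for zip and gzip isn't needed here. If you want to make the choice explicit in
your .bashrc file anyway, it looks like this:

alias bzip2='bzip2 -9'

That way, the level is spelled out wherever the alias is used.
Uncompress Files Compressed with bzip2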

     bunzip2

In the same way that bzip2 was purposely designed to emulate gzip as closely as
possible, the way bunzip2 works is very close to that of gunzip.
   $ ls -l
   -rw-r--r-- scott scott 367248 moby-dick.txt.bz2
   $ bunzip2 moby-dick.txt.bz2
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt


You'll notice that bunzip2 is similar to gunzip in another way: Both commands
remove the original compressed file, leaving you with the final uncompressed
result. If you want to ensure that you have both the compressed and
uncompressed files, you need to use the -c option (or --stdout) and redirect
the results to the file you want to create.
   $ ls -l
   -rw-r--r-- scott scott 367248 moby-dick.txt.bz2
   $ bunzip2 -c moby-dick.txt.bz2 > moby-dick.txt
   $ ls -l
   -rw-r--r-- scott scott 1236574 moby-dick.txt
   -rw-r--r-- scott scott 367248 moby-dick.txt.bz2


It's a good thing when commands copy each other's options and behavior, as it
makes them easier to learn. In this, the creators of bzip2 and bunzip2 showed
remarkable foresight.

Note - If you're not feeling favorable toward bunzip2, you can also use bzip2
-d (or --decompress).
Test Files That Will Be Unzipped with bunzip2

     -t

Before bunzipping a file (or files) with bunzip2, you might want to verify that
they're going to bunzip correctly without any file corruption. To do this, use
the -t (or --test) option.
   $ bunzip2 -t paradise_lost.txt.bz2
   $


Just as with gunzip, if there's nothing wrong with the archive, bunzip2 doesn't
report anything back to you. If there's a problem, you'll know, but if there's
not a problem, bunzip2 is silent.
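
Because silence is hard to read at a glance, you may prefer to let the exit
status do the talking when you have a pile of files to check. The following is
a sketch only: check_bz2 is a made-up helper name, and it assumes that bzip2 -t
exits with a nonzero status when a file fails the test.

```shell
# check_bz2 FILE...: report each file as OK or DAMAGED using bzip2 -t.
# check_bz2 is a hypothetical helper; it returns nonzero if any file failed.
check_bz2() {
    failed=0
    for f in "$@"; do
        if bzip2 -t "$f" 2> /dev/null; then
            echo "OK      $f"
        else
            echo "DAMAGED $f"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_bz2 *.bz2 in a directory then gives you one line per file, and
the function's own exit status tells a script whether everything passed.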
Archive Files with tar

     -cf

Remember, tar doesn't compress; it merely archives (the resulting archives are
known as tarballs, by the way). Instead, tar uses other programs, such as gzip
or bzip2, to compress the archives that tar creates. Even if you're not going
to compress the tarball, you still create it the same way with the same basic
options: -c (or --create), which tells tar that you're making a tarball, and -f
(or --file), which specifies the filename for the tarball.
   $ ls -l
   scott scott 102519 job.txt
   scott scott 1236574 moby-dick.txt
   scott scott 508925 paradise_lost.txt
   $ tar -cf moby.tar *.txt
   $ ls -l
   scott scott 102519 job.txt
   scott scott 1236574 moby-dick.txt
   scott scott 1853440 moby.tar
   scott scott 508925 paradise_lost.txt


Pay attention to two things here. First, add up the file sizes of job.txt,
moby-dick.txt, and paradise_lost.txt, and you get 1848018 bytes. Compare that
to the size of moby.tar, and you see that the tarball is only 5422 bytes
bigger. Remember that tar is an archive tool, not a compression tool, so the
result is at least the same size as the individual files put together, plus a
little bit for overhead to keep track of what's in the tarball. Second, notice
that tar, unlike gzip and bzip2, leaves the original files behind. This isn't a
surprise, considering the tar command's background as a backup tool.
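
That 5422-byte overhead can be accounted for. The sketch below assumes the
layout that GNU tar writes with its default settings: one 512-byte header per
file, file data padded up to a multiple of 512 bytes, two 512-byte
end-of-archive blocks, and the whole archive padded to a multiple of 10240
bytes. Other tar implementations, other options, or very long filenames can
give different totals. tar_size is a made-up helper name.

```shell
# tar_size SIZE...: estimate the size of an uncompressed tarball that holds
# regular files of the given sizes (in bytes). Hypothetical helper.
tar_size() {
    total=0
    for size in "$@"; do
        blocks=$(( (size + 511) / 512 ))          # data rounded up to 512-byte blocks
        total=$(( total + 512 + blocks * 512 ))   # plus one header block per file
    done
    total=$(( total + 1024 ))                     # two end-of-archive blocks
    records=$(( (total + 10239) / 10240 ))        # round up to whole 10240-byte records
    echo $(( records * 10240 ))
}

# Sizes of job.txt, moby-dick.txt, and paradise_lost.txt from the listing above.
tar_size 102519 1236574 508925    # prints 1853440
```

The result matches the size of moby.tar in the listing, so the extra bytes are
nothing more than headers and padding.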

What's really cool about tar is that it's designed to archive entire directory
structures, so you can bundle a large number of files and subdirectories in
one fell swoop.
   $ ls -lF
   drwxr-xr-x scott scott 168 moby-dick/
   $ ls -l moby-dick/*
   scott scott 102519 moby-dick/job.txt
   scott scott 1236574 moby-dick/moby-dick.txt
   scott scott 508925 moby-dick/paradise_lost.txt
   moby-dick/bible:
   scott scott 207254 genesis.txt
   scott scott 102519 job.txt
   $ tar -cf moby.tar moby-dick/
   $ ls -lF
   scott scott   168 moby-dick/
   scott scott 2170880 moby.tar


The tar command has been around forever, and it's obvious why: It's so darn
useful! But it gets even more useful when you start factoring in compression
tools, as you'll see in the next section.
Archive and Compress Files with tar and gzip

     -zcvf

If you look back at "Archive and Compress Files Using gzip" and "Archive and
Compress Files Using bzip2" and think about what was discussed there, you'll
probably start to figure out a problem. What if you want to compress a
directory that contains 100 files, contained in various subdirectories? If you
use gzip or bzip2 with the -r (for recursive) option, you'll end up with 100
individually compressed files, each stored neatly in its original subdirectory.
This is undoubtedly not what you want. How would you like to attach 100 .gz or
.bz2 files to an email? Yikes!

That's where tar comes in. First you'd use tar to archive the directory and its
contents (those 100 files inside various subdirectories) and then you'd use
gzip or bzip2 to compress the resulting tarball. Because gzip is the most
common compression program used in concert with tar, we'll focus on that.

You could do it this way (the - after -f tells tar to write the archive to
standard output so that it can be piped to gzip):
   $ ls -l moby-dick/*
   scott scott 102519 moby-dick/job.txt
   scott scott 1236574 moby-dick/moby-dick.txt
   scott scott 508925 moby-dick/paradise_lost.txt
   moby-dick/bible:
   scott scott 207254 genesis.txt
   scott scott 102519 job.txt
   $ tar -cf - moby-dick/ | gzip -c > moby.tar.gz
   $ ls -l
   scott scott 168 moby-dick/
   scott scott 846049 moby.tar.gz


That method works, but it's just too much typing! There's a much easier way
that should be your default. It involves two new options for tar: -z (or
--gzip), which invokes gzip from within tar so you don't have to do so
manually, and -v (or --verbose), which isn't required here but is always
useful, as it keeps you notified as to what tar is doing as it runs.
   $ ls -l moby-dick/*
   scott scott 102519 moby-dick/job.txt
   scott scott 1236574 moby-dick/moby-dick.txt
   scott scott 508925 moby-dick/paradise_lost.txt
   moby-dick/bible:
   scott scott 207254 genesis.txt
   scott scott 102519 job.txt
   $ tar -zcvf moby.tar.gz moby-dick/
   moby-dick/
   moby-dick/job.txt
   moby-dick/bible/
   moby-dick/bible/genesis.txt
   moby-dick/bible/job.txt
   moby-dick/moby-dick.txt
   moby-dick/paradise_lost.txt
   $ ls -l
   scott scott  168 moby-dick
   scott scott 846049 moby.tar.gz
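
One way to convince yourself that -z really does run the archive through gzip is to undo the compression by hand and let plain tar read the result. This is a sketch under assumed names (a temporary directory and a small sample tree), not the moby-dick data shown above:

```shell
# Throwaway sample directory (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p sample/sub
printf 'one\n' > sample/a.txt
printf 'two\n' > sample/sub/b.txt

# Create and compress in one step: -z gzip, -c create, -f archive name.
tar -zcf sample.tar.gz sample/

# gzip -t checks the integrity of the compressed stream.
gzip -t sample.tar.gz && echo "gzip stream OK"

# Decompress by hand and let plain tar (no -z) list the contents.
gzip -dc sample.tar.gz | tar -tf -
```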


The usual extension for a file that has had the tar and then the gzip commands
used on it is .tar.gz; however, you could use .tgz and .tar.gzip if you like.

Note - It's entirely possible to use bzip2 with tar instead of gzip. Your
command would look like this (note the -j option, which is where bzip2 comes
in):

     $ tar -jcvf moby.tar.bz2 moby-dick/

In that case, the extension should be .tar.bz2, although you may also use
.tar.bzip2, .tbz2, or .tbz. Yes, it's very confusing that using gzip or bzip2
might both result in a file ending with .tbz. This is a strong argument for
using anything but that particular extension to keep confusion to a minimum.

Test Files That Will Be Untarred and Uncompressed

     -zvtf

Before you take apart a tarball (whether or not it was also compressed using
gzip), it's a really good idea to test it. First, you'll know if the tarball is
corrupted, saving yourself hair pulling when files don't seem to work. Second,
you'll know if the person who created the tarball thoughtfully tarred up a
directory containing 100 files, or instead thoughtlessly tarred up 100
individual files, which you're just about to spew all over your desktop.

To test your tarball (once again assuming it was also compressed using gzip),
use the -t (or --list) option.
   $ tar -zvtf moby.tar.gz
   scott/scott 0 moby-dick/
   scott/scott 102519 moby-dick/job.txt
   scott/scott 0 moby-dick/bible/
   scott/scott 207254 moby-dick/bible/genesis.txt
   scott/scott 102519 moby-dick/bible/job.txt
   scott/scott 1236574 moby-dick/moby-dick.txt
   scott/scott 508925 moby-dick/paradise_lost.txt


In full, this tells you the permissions, ownership, file size, and time for
each file (the listing here is trimmed to ownership, size, and name).
In addition, because every line begins with moby-dick/, you can see that you're
going to end up with a directory that contains within it all the files and
subdirectories that accompany the tarball, which is a relief.
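
The same check can be done mechanically instead of by eye. The sketch below, which assumes a temporary directory and an illustrative archive named sample.tar.gz, reduces the listing to its first path component and counts the distinct names:

```shell
# Throwaway sample archive (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p sample/sub
printf 'one\n' > sample/a.txt
printf 'two\n' > sample/sub/b.txt
tar -zcf sample.tar.gz sample/

# List member names only (no -v), keep the first path component, de-duplicate.
top_level=$(tar -ztf sample.tar.gz | cut -d/ -f1 | sort -u)

# One unique top-level name means everything unpacks into a single directory.
if [ "$(printf '%s\n' "$top_level" | grep -c .)" -eq 1 ]; then
    echo "unpacks into: $top_level/"
else
    echo "warning: archive has several top-level entries"
fi
```

Leaving out -v matters here: without it, tar prints bare member names, which are easier to cut apart than the verbose listing.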

Be sure that -f is the last option in the group, because tar treats whatever
comes right after -f as the name of the .tar.gz file. If it isn't last, tar
complains:
   $ tar -zvft moby.tar.gz
   tar: You must specify one of the '-Acdtrux' options
   Try 'tar --help' or 'tar --usage' for more information.
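
The other options can appear in any order, as long as the archive name directly follows -f. As a sketch (assuming a temporary directory and an illustrative sample.tar.gz), the long option --file=NAME makes that pairing explicit:

```shell
# Throwaway sample archive (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p sample/sub
printf 'one\n' > sample/a.txt
printf 'two\n' > sample/sub/b.txt
tar -zcf sample.tar.gz sample/

# -f is last in the bundle, so the next word is taken as the archive name.
tar -ztvf sample.tar.gz

# Long form of the same command: the archive name is attached to --file.
tar --gzip --list --verbose --file=sample.tar.gz
```

Both commands should print the same listing; the long form is more typing but harder to get wrong in a script.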


Now that you've ensured that your .tar.gz file isn't corrupted, it's time to
actually open it up, as you'll see in the following section.

Note - If you're testing a tarball that was compressed using bzip2, just use
this command instead:

     $ tar -jvtf moby.tar.bz2

Untar and Uncompress Files

     -zxvf

To create a .tar.gz file, you used a set of options: -zcvf. To untar and
uncompress the resulting file, you only make one substitution: -x (or
--extract) for -c (or --create).
   $ ls -l
   rsgranne rsgranne 846049 moby.tar.gz
   $ tar -zxvf moby.tar.gz
   moby-dick/
   moby-dick/job.txt
   moby-dick/bible/
   moby-dick/bible/genesis.txt
   moby-dick/bible/job.txt
   moby-dick/moby-dick.txt
   moby-dick/paradise_lost.txt
   $ ls -l
   rsgranne rsgranne  168 moby-dick
   rsgranne rsgranne 846049 moby.tar.gz


Make sure you always test the file before you open it, as covered in the
previous section, "Test Files That Will Be Untarred and Uncompressed." That
means the order of commands you should run will look like this:
   $ tar -zvtf moby.tar.gz
   $ tar -zxvf moby.tar.gz
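
The two commands can be chained so that extraction happens only when the test pass succeeds. This is a minimal sketch that relies on tar's exit status; the temporary directory and the sample names are assumptions for illustration:

```shell
# Throwaway sample archive (all names here are illustrative).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p sample/sub
printf 'one\n' > sample/a.txt
printf 'two\n' > sample/sub/b.txt
tar -zcf sample.tar.gz sample/
rm -r sample    # remove the originals so extraction has something to restore

# Extract only if the listing pass exits with status 0.
if tar -ztf sample.tar.gz > /dev/null; then
    tar -zxf sample.tar.gz
    echo "extracted"
else
    echo "archive failed the listing test; not extracting" >&2
fi
```

The listing is sent to /dev/null because only the exit status matters here; drop the redirection if you also want to read the contents first.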


Note - If you're opening a tarball that was compressed using bzip2, just use
this command instead:

      $ tar -jxvf moby.tar.bz2

Conclusion

Back in the days of slow modems and tiny hard drives, archiving and compression
were necessities. These days, they're more of a convenience, but they're still
something you'll find yourself using all the time. For instance, if you ever
download source code to compile it, more than likely you'll find yourself
face-to-face with a file such as sourcecode.tar.gz. In the future, you'll
probably see more and more of those files ending with .tar.bz2. And if you
exchange files with Windows users, you're going to run into files that end with
.zip. Learn how to use your archival and compression tools because you're going
to be using them far more than you think.

About the Author:

Scott Granneman is a monthly columnist for SecurityFocus and Linux Magazine, as
well as a professional blogger on The Open Source Weblog. He is an adjunct
Professor at Washington University, St. Louis and at Webster University,
teaching a variety of courses about technology and the Internet.



         "Linux Phrasebook" by Scott Granneman
         ISBN: 0-672-32838-0
         http://www.samspublishing.com/bookstore/product.asp?isbn=0672328380&rl=1
         (C) Copyright Pearson Education.  All rights reserved.
         Chapter excerpt provided by Sams Publishing, an imprint of Pearson
         Education.

         Reprinted with permission.

Source URL: http://interactive.linuxjournal.com/article/9370

--- BBBS/NT v4.00 MP
 * Origin: Omicron Theta (1:261/1)