bzip2 man page

NAME


bzip2, bunzip2 - a block-sorting file compressor, v0.9.5
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files

SYNOPSIS

bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
bunzip2 [ -fkvsVL ] [ filenames ... ]
bzcat [ -s ] [ filenames ... ]
bzip2recover filename

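DESCRIPTION

bzip2 compresses files using the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding. Compression is generally considerably better than that achieved by LZ77/LZ78-based compressors, and approaches the performance of the PPM family of statistical compressors.

The command-line options are deliberately very similar to those of GNU gzip, but they are not identical.

bzip2 reads file names and flags from the command line. Each file is replaced by a compressed version of itself, named "original_name.bz2". Each compressed file has the same modification time and permissions as the original and, when possible, the same ownership, so that these properties can be correctly restored at decompression time. File name handling is naive in the sense that bzip2 has no mechanism for preserving original file names, permissions, ownerships or dates on filesystems which lack these concepts or have serious file name length restrictions, such as MS-DOS.

bzip2 and bunzip2 will by default not overwrite existing files. If you want this to happen, specify the -f flag.

If no file names are specified, bzip2 compresses from standard input to standard output. In this case, bzip2 will decline to write compressed output to a terminal, as this would be entirely incomprehensible and therefore pointless.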

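bunzip2 (or bzip2 -d) decompresses all specified files. Files which were not created by bzip2 are ignored, and a warning is issued. bzip2 derives the name of the decompressed file from that of the compressed file as follows: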


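       filename.bz2    becomes   filename
       filename.bz     becomes   filename
       filename.tbz2   becomes   filename.tar
       filename.tbz    becomes   filename.tar
       anyothername    becomes   anyothername.out

If the file name does not end in one of the recognised endings, .bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot guess the name of the original file, and uses the original name with .out appended.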

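As with compression, supplying no file names causes decompression from standard input to standard output.

bunzip2 will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing (-t) of concatenated compressed files is also supported.

You can also compress or decompress files to the standard output by giving the -c flag. Multiple files may be compressed and decompressed like this. The resulting outputs are fed sequentially to stdout. Compression of multiple files in this manner generates a stream containing multiple compressed file representations. Such a stream can be decompressed correctly only by bzip2 version 0.9.0 or later. Earlier versions of bzip2 will stop after decompressing the first file in the stream.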

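bzcat (or bzip2 -dc) decompresses all specified files to the standard output.

bzip2 will read arguments from the environment variables BZIP2 and BZIP, in that order, and will process them before any arguments read from the command line. This gives a convenient way to supply default arguments.

Compression is always performed, even if the compressed file is slightly larger than the original. Files of less than about one hundred bytes tend to get larger, since the compression mechanism has a constant overhead in the region of 50 bytes. Random data (including the output of most file compressors) is coded at about 8.05 bits per byte, giving an expansion of around 0.5%.

As a self-check, bzip2 uses 32-bit CRCs to make sure that the decompressed version of a file is identical to the original. This guards against corruption of the compressed data, and against undetected bugs in bzip2 (hopefully very unlikely). The chance of data corruption going undetected is microscopic, about one in four billion for each file processed. Be aware, though, that the check occurs upon decompression, so it can only tell you that something is wrong. It cannot help you recover the original uncompressed data. You can use bzip2recover to try to recover data from damaged files.

Return values: 0 for a normal exit, 1 for environmental problems (file not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt compressed file, 3 for an internal consistency error (e.g. a bug) which caused bzip2 to panic.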

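OPTIONS

-c --stdout
Compress or decompress to standard output.
-d --decompress
Force decompression. bzip2, bunzip2 and bzcat are really the same program, and the action taken is decided by the name under which it is invoked. This flag overrides that mechanism and forces bzip2 to decompress.
-z --compress
The complement to -d: forces compression, regardless of the invocation name.
-t --test
Check the integrity of the specified files, but don't decompress them. This really performs a trial decompression and throws away the result.
-f --force
Force overwrite of output files. Normally, bzip2 will not overwrite existing output files. Also forces bzip2 to break hard links to files, which it otherwise wouldn't do.
-k --keep
Keep (don't delete) input files during compression or decompression.
-s --small
Reduce memory usage for compression, decompression and testing. Files are decompressed and tested using a modified algorithm which only requires 2.5 bytes per block byte. This means any file can be decompressed in 2300k of memory, albeit at about half the normal speed.

During compression, -s selects a block size of 200k, which limits memory use to around the same figure, at the expense of compression ratio. In short, if your machine is low on memory (8 megabytes or less), use -s for everything. See MEMORY MANAGEMENT below.

-q --quiet
Suppress non-essential warning messages. Messages pertaining to I/O errors and other critical events will not be suppressed.
-v --verbose
Verbose mode -- show the compression ratio for each file processed. Further -v's increase the verbosity level, producing a lot of information which is primarily of interest for diagnostic purposes.
-L --license -V --version
Display the software version, license terms and conditions.
-1 to -9
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no effect when decompressing. See MEMORY MANAGEMENT below.
--
Treats all subsequent arguments as file names, even if they start with a dash. This is so you can handle files with names beginning with a dash, for example: bzip2 -- -myfilename.
--repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and above. They provided some coarse control over the behaviour of the sorting algorithm in earlier versions, which was sometimes useful. 0.9.5 and above have an improved algorithm which renders these flags irrelevant.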

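MEMORY MANAGEMENT

bzip2 compresses large files in blocks. The block size affects both the compression ratio achieved and the amount of memory needed for compression and decompression. The flags -1 through -9 specify the block size to be 100,000 bytes through 900,000 bytes (the default) respectively. At decompression time, the block size used for compression is read from the header of the compressed file, and bunzip2 then allocates itself just enough memory to decompress the file. Since block sizes are stored in compressed files, the flags -1 to -9 are irrelevant to decompression and are ignored.

Compression and decompression requirements, in bytes, can be estimated as:


       Compression:   400k + ( 8 x block size )


       Decompression: 100k + ( 4 x block size ), or
                      100k + ( 2.5 x block size )

Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size, a fact worth bearing in mind when using bzip2 on small machines. It is also important to appreciate that the decompression memory requirement is set at compression time by the choice of block size.

For files compressed with the default 900k block size, bunzip2 will require about 3700 kbytes to decompress. To support decompression of any file on a 4 megabyte machine, bunzip2 has an option to decompress using approximately half this amount of memory, about 2300 kbytes. Decompression speed is also halved, so you should use this option only where necessary. The relevant flag is -s.

In general, try to use the largest block size memory constraints allow, since that maximises the compression achieved. Compression and decompression speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single block, which means most files you would encounter when using a large block size. The amount of real memory touched is proportional to the size of the file, since the file is smaller than a block. For example, compressing a file 20,000 bytes long with the flag -9 will cause the compressor to allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor will allocate 3700k but only touch 100k + 20000 * 4 = 180 kbytes.

Here is a table which summarises the maximum memory usage for different block sizes. Also recorded is the total compressed size for 14 files of the Calgary Text Compression Corpus totalling 3,141,622 bytes. This column gives some feel for how compression varies with block size. These figures tend to understate the advantage of larger block sizes for larger files, since the Corpus is dominated by smaller files.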


           Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size


     -1      1200k       500k         350k      914704
     -2      2000k       900k         600k      877703
     -3      2800k      1300k         850k      860338
     -4      3600k      1700k        1100k      846899
     -5      4400k      2100k        1350k      845160
     -6      5200k      2500k        1600k      838626
     -7      6100k      2900k        1850k      834096
     -8      6800k      3300k        2100k      828642
     -9      7600k      3700k        2350k      828642

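RECOVERING DATA FROM DAMAGED FILES

bzip2 compresses files in blocks, usually 900 kbytes long. Each block is handled independently. If a media or transmission error causes a multi-block .bz2 file to become damaged, it may be possible to recover data from the undamaged blocks in the file.

The compressed representation of each block is delimited by a 48-bit pattern, which makes it possible to find the block boundaries with reasonable certainty. Each block also carries its own 32-bit CRC, so damaged blocks can be distinguished from undamaged ones.

bzip2recover is a simple program whose purpose is to search for blocks in .bz2 files, and write each block out into its own .bz2 file. You can then use bzip2 -t to test the integrity of the resulting files, and decompress those which are undamaged.

bzip2recover takes a single argument, the name of the damaged file, and writes a number of files "rec0001file.bz2", "rec0002file.bz2", etc, containing the extracted blocks. The output file names are designed so that the use of wildcards in subsequent processing, for example "bzip2 -dc rec*file.bz2 > recovered_data", lists the files in the correct order.

bzip2recover should be of most use dealing with large .bz2 files, as these will contain many blocks. It is clearly futile to use it on damaged single-block files, since a damaged block cannot be recovered. If you wish to minimise any potential data loss through media or transmission errors, you might consider compressing with a smaller block size.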

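PERFORMANCE NOTES

The sorting phase of compression gathers together similar strings in the file. Because of this, files containing very long runs of repeated symbols, like "aabaabaabaab ..." (repeated several hundred times) may compress much more slowly than normal. Versions 0.9.5 and above fare much better than previous versions in this respect. The ratio between worst-case and average-case compression time is in the region of 10:1. For previous versions, this figure was more like 100:1. You can use the -vvvv option to monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

bzip2 usually allocates several megabytes of memory to operate in, and then accesses it in a fairly random fashion. This means that performance, both for compressing and decompressing, is largely determined by the speed at which your machine can service cache misses. Because of this, small changes to the code to reduce the miss rate have been observed to give disproportionately large performance improvements. I imagine bzip2 will perform best on machines with very large caches.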

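CAVEATS

I/O error messages are not as helpful as they could be. bzip2 tries hard to detect I/O errors and exit cleanly, but the details of what the problem is sometimes seem rather misleading.

This manual page pertains to version 0.9.5 of bzip2. Compressed data created by this version is entirely compatible with the previous public releases, versions 0.1pl2 and 0.9.0, but with the following exception: 0.9.0 and above can correctly decompress multiple concatenated compressed files. 0.1pl2 cannot do this; it will stop after decompressing just the first file in the stream.

bzip2recover uses 32-bit integers to represent bit positions in compressed files, so it cannot handle compressed files more than 512 megabytes long. This could easily be fixed.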

AUTHOR

Julian Seward, jseward@acm.org.

http://www.muraroa.demon.co.uk

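The ideas embodied in bzip2 are due to (at least) the following people: Michael Burrows and David Wheeler (for the block sorting transformation), David Wheeler (again, for the Huffman coder), Peter Fenwick (for the structured coding model in the original bzip, and many refinements), and Alistair Moffat and Ian Witten (for the arithmetic coder in the original bzip). I am much indebted for their help, support and advice. See the manual in the source distribution for pointers to sources of documentation. Christian von Roques encouraged me to look for faster sorting algorithms, so as to speed up compression. Bela Lubkin encouraged me to improve the worst-case compression performance. Many people sent patches, helped with portability problems, lent machines, gave advice and were generally helpful.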


NAME

bzip2, bunzip2 - a block-sorting file compressor, v1.0.2
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files

SYNOPSIS

bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
bunzip2 [ -fkvsVL ] [ filenames ... ]
bzcat [ -s ] [ filenames ... ]
bzip2recover filename

DESCRIPTION

bzip2 compresses files using the Burrows-Wheeler block sorting text compression algorithm, and Huffman coding. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors, and approaches the performance of the PPM family of statistical compressors.

The command-line options are deliberately very similar to those of GNU gzip, but they are not identical.

bzip2 expects a list of file names to accompany the command-line flags. Each file is replaced by a compressed version of itself, with the name "original_name.bz2". Each compressed file has the same modification date, permissions, and, when possible, ownership as the corresponding original, so that these properties can be correctly restored at decompression time. File name handling is naive in the sense that there is no mechanism for preserving original file names, permissions, ownerships or dates in filesystems which lack these concepts, or have serious file name length restrictions, such as MS-DOS.

bzip2 and bunzip2 will by default not overwrite existing files. If you want this to happen, specify the -f flag.

If no file names are specified, bzip2 compresses from standard input to standard output. In this case, bzip2 will decline to write compressed output to a terminal, as this would be entirely incomprehensible and therefore pointless.

bunzip2 (or bzip2 -d) decompresses all specified files. Files which were not created by bzip2 will be detected and ignored, and a warning issued. bzip2 attempts to guess the filename for the decompressed file from that of the compressed file as follows:


       filename.bz2    becomes   filename
       filename.bz     becomes   filename
       filename.tbz2   becomes   filename.tar
       filename.tbz    becomes   filename.tar
       anyothername    becomes   anyothername.out

If the file does not end in one of the recognised endings, .bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot guess the name of the original file, and uses the original name with .out appended.
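
The naming rules above can be sketched in a few lines of Python. This is only an illustration of the table, not bzip2's own code; the function name guess_output_name and the SUFFIX_RULES list are hypothetical.

```python
# Suffix rules copied from the table above: (recognised ending, replacement).
SUFFIX_RULES = [
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
    (".bz2", ""),
    (".bz", ""),
]


def guess_output_name(compressed_name: str) -> str:
    """Return the decompressed file name implied by the compressed name."""
    for suffix, replacement in SUFFIX_RULES:
        if compressed_name.endswith(suffix):
            return compressed_name[: -len(suffix)] + replacement
    # No recognised ending: fall back to appending ".out".
    return compressed_name + ".out"


for name in ("notes.bz2", "backup.tbz2", "archive.tbz", "data.bin"):
    print(name, "->", guess_output_name(name))
```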

As with compression, supplying no filenames causes decompression from standard input to standard output.

bunzip2 will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing (-t) of concatenated compressed files is also supported.
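
The concatenation behaviour can be observed without the command-line tools. The sketch below assumes Python's standard bz2 module, whose decompress function also accepts multi-stream input; the sample byte strings are made up for the example.

```python
import bz2

# Two independently compressed "files".
first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")

# Concatenating the compressed streams byte for byte...
combined = first + second

# ...decompresses to the concatenation of the original contents.
restored = bz2.decompress(combined)
print(restored)
```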

You can also compress or decompress files to the standard output by giving the -c flag. Multiple files may be compressed and decompressed like this. The resulting outputs are fed sequentially to stdout. Compression of multiple files in this manner generates a stream containing multiple compressed file representations. Such a stream can be decompressed correctly only by bzip2 version 0.9.0 or later. Earlier versions of bzip2 will stop after decompressing the first file in the stream.

bzcat (or bzip2 -dc) decompresses all specified files to the standard output.

bzip2 will read arguments from the environment variables BZIP2 and BZIP, in that order, and will process them before any arguments read from the command line. This gives a convenient way to supply default arguments.
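
The order in which arguments are gathered might be pictured as follows. This is a simplified model, assuming the variables are split on whitespace; the helper name effective_arguments is hypothetical and the real program's parsing may differ in detail.

```python
import os


def effective_arguments(command_line, environ=None):
    """Arguments from BZIP2, then BZIP, then the command line, in that order."""
    env = os.environ if environ is None else environ
    args = []
    for name in ("BZIP2", "BZIP"):
        # Assumption: the variable's value is a whitespace-separated list.
        args.extend(env.get(name, "").split())
    return args + list(command_line)


print(effective_arguments(["-k", "report.txt"], {"BZIP2": "-9 -v", "BZIP": "-q"}))
```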

Compression is always performed, even if the compressed file is slightly larger than the original. Files of less than about one hundred bytes tend to get larger, since the compression mechanism has a constant overhead in the region of 50 bytes. Random data (including the output of most file compressors) is coded at about 8.05 bits per byte, giving an expansion of around 0.5%.
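
Both effects are easy to reproduce. The sketch below assumes Python's standard bz2 module rather than the bzip2 program itself, so the exact byte counts may differ slightly from the figures quoted above.

```python
import bz2
import os


def expansion_ratio(data: bytes) -> float:
    """Compressed size divided by original size (above 1.0 means growth)."""
    return len(bz2.compress(data)) / len(data)


tiny = b"hello, world\n"               # far below one hundred bytes
random_block = os.urandom(200_000)     # incompressible input

print(f"tiny input: {len(tiny)} -> {len(bz2.compress(tiny))} bytes")
print(f"random input: ratio {expansion_ratio(random_block):.4f}")
```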

As a self-check for your protection, bzip2 uses 32-bit CRCs to make sure that the decompressed version of a file is identical to the original. This guards against corruption of the compressed data, and against undetected bugs in bzip2 (hopefully very unlikely). The chance of data corruption going undetected is microscopic, about one in four billion for each file processed. Be aware, though, that the check occurs upon decompression, so it can only tell you that something is wrong. It can't help you recover the original uncompressed data. You can use bzip2recover to try to recover data from damaged files.
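
A small experiment shows the check at work. It assumes Python's standard bz2 module, which reports a failed decompression as an exception; the helper name is_intact and the sample data are hypothetical. Note that the check only says that something is wrong, not where.

```python
import bz2


def is_intact(stream: bytes) -> bool:
    """Trial decompression: True if the stream decodes without error."""
    try:
        bz2.decompress(stream)
    except (OSError, ValueError):
        return False
    return True


# Made-up sample data, large enough that the middle of the compressed
# stream is ordinary payload rather than header material.
original = b"".join(b"record %d: value %d\n" % (i, i * i % 9973) for i in range(20000))
good = bz2.compress(original)

# Flip a single bit in the middle of the compressed stream.
damaged = bytearray(good)
damaged[len(damaged) // 2] ^= 0x01
damaged = bytes(damaged)

print("undamaged stream intact:", is_intact(good))
print("damaged stream intact:", is_intact(damaged))
```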

Return values: 0 for a normal exit, 1 for environmental problems (file not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt compressed file, 3 for an internal consistency error (e.g. a bug) which caused bzip2 to panic.
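
A wrapper script could turn these codes into readable messages. The table below restates the list above; the names EXIT_MEANINGS and describe_exit_status are hypothetical.

```python
# Exit codes as listed above.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error)",
    2: "corrupt compressed file",
    3: "internal consistency error (bzip2 panicked)",
}


def describe_exit_status(code: int) -> str:
    """Map a bzip2 exit status to the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


for status in (0, 1, 2, 3):
    print(status, "->", describe_exit_status(status))
```

In practice the status would come from running the tool, for instance from the returncode attribute of the result of subprocess.run(["bzip2", "-t", path]), assuming bzip2 is installed and path names a real file.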

OPTIONS

-c --stdout
Compress or decompress to standard output.
-d --decompress
Force decompression. bzip2, bunzip2 and bzcat are really the same program, and the decision about what actions to take is done on the basis of which name is used. This flag overrides that mechanism, and forces bzip2 to decompress.
-z --compress
The complement to -d: forces compression, regardless of the invocation name.
-t --test
Check integrity of the specified file(s), but don't decompress them. This really performs a trial decompression and throws away the result.
-f --force
Force overwrite of output files. Normally, bzip2 will not overwrite existing output files. Also forces bzip2 to break hard links to files, which it otherwise wouldn't do.

bzip2 normally declines to decompress files which don't have the correct magic header bytes. If forced (-f), however, it will pass such files through unmodified. This is how GNU gzip behaves.

-k --keep
Keep (don't delete) input files during compression or decompression.
-s --small
Reduce memory usage, for compression, decompression and testing. Files are decompressed and tested using a modified algorithm which only requires 2.5 bytes per block byte. This means any file can be decompressed in 2300k of memory, albeit at about half the normal speed.

During compression, -s selects a block size of 200k, which limits memory use to around the same figure, at the expense of your compression ratio. In short, if your machine is low on memory (8 megabytes or less), use -s for everything. See MEMORY MANAGEMENT below.

-q --quiet
Suppress non-essential warning messages. Messages pertaining to I/O errors and other critical events will not be suppressed.
-v --verbose
Verbose mode -- show the compression ratio for each file processed. Further -v's increase the verbosity level, spewing out lots of information which is primarily of interest for diagnostic purposes.
-L --license -V --version
Display the software version, license terms and conditions.
-1 (or --fast) to -9 (or --best)
Set the block size to 100 k, 200 k .. 900 k when compressing. Has no effect when decompressing. See MEMORY MANAGEMENT below. The --fast and --best aliases are primarily for GNU gzip compatibility. In particular, --fast doesn't make things significantly faster. And --best merely selects the default behaviour.
--
Treats all subsequent arguments as file names, even if they start with a dash. This is so you can handle files with names beginning with a dash, for example: bzip2 -- -myfilename.
--repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and above. They provided some coarse control over the behaviour of the sorting algorithm in earlier versions, which was sometimes useful. 0.9.5 and above have an improved algorithm which renders these flags irrelevant.

MEMORY MANAGEMENT

bzip2 compresses large files in blocks. The block size affects both the compression ratio achieved, and the amount of memory needed for compression and decompression. The flags -1 through -9 specify the block size to be 100,000 bytes through 900,000 bytes (the default) respectively. At decompression time, the block size used for compression is read from the header of the compressed file, and bunzip2 then allocates itself just enough memory to decompress the file. Since block sizes are stored in compressed files, it follows that the flags -1 to -9 are irrelevant to and so ignored during decompression.
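
The stored block size can be seen in the first bytes of a compressed stream. The sketch assumes Python's standard bz2 module, whose compresslevel parameter plays the role of the -1 to -9 flags, and relies on the stream starting with "BZh" followed by the level digit, a detail of the file format that this page does not spell out.

```python
import bz2

data = b"example payload " * 1000

for level in (1, 5, 9):
    stream = bz2.compress(data, compresslevel=level)
    # The fourth header byte records the block size in units of 100k.
    print(f"level {level}: header {stream[:4]!r}")
    # Decompression needs no level argument; it is read from the header.
    assert bz2.decompress(stream) == data
```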

Compression and decompression requirements, in bytes, can be estimated as:


       Compression:   400k + ( 8 x block size )


       Decompression: 100k + ( 4 x block size ), or
                      100k + ( 2.5 x block size )

Larger block sizes give rapidly diminishing marginal returns. Most of the compression comes from the first two or three hundred k of block size, a fact worth bearing in mind when using bzip2 on small machines. It is also important to appreciate that the decompression memory requirement is set at compression time by the choice of block size.

For files compressed with the default 900k block size, bunzip2 will require about 3700 kbytes to decompress. To support decompression of any file on a 4 megabyte machine, bunzip2 has an option to decompress using approximately half this amount of memory, about 2300 kbytes. Decompression speed is also halved, so you should use this option only where necessary. The relevant flag is -s.

In general, try to use the largest block size memory constraints allow, since that maximises the compression achieved. Compression and decompression speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single block -- that means most files you'd encounter using a large block size. The amount of real memory touched is proportional to the size of the file, since the file is smaller than a block. For example, compressing a file 20,000 bytes long with the flag -9 will cause the compressor to allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor will allocate 3700k but only touch 100k + 20000 * 4 = 180 kbytes.
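
The estimates and the worked example can be checked mechanically. The sketch below treats "k" as 1000 bytes, an assumption that matches the arithmetic 400k + 20000 * 8 = 560 kbytes; the function names are hypothetical.

```python
K = 1000  # assumption: "k" in the formulas means 1000 bytes


def compress_memory(block_bytes: int) -> int:
    """Estimated bytes needed to compress: 400k + 8 x block size."""
    return 400 * K + 8 * block_bytes


def decompress_memory(block_bytes: int, small: bool = False) -> int:
    """Estimated bytes needed to decompress; small=True models the -s flag."""
    factor = 2.5 if small else 4
    return int(100 * K + factor * block_bytes)


full_block = 900 * K   # block size selected by -9
small_file = 20_000    # the 20,000-byte file from the example

print("allocated for -9 compression:", compress_memory(full_block))
print("touched for the small file:", compress_memory(small_file))
print("allocated for decompression:", decompress_memory(full_block))
print("touched for the small file:", decompress_memory(small_file))
print("decompression with -s:", decompress_memory(full_block, small=True))
```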

Here is a table which summarises the maximum memory usage for different block sizes. Also recorded is the total compressed size for 14 files of the Calgary Text Compression Corpus totalling 3,141,622 bytes. This column gives some feel for how compression varies with block size. These figures tend to understate the advantage of larger block sizes for larger files, since the Corpus is dominated by smaller files.


           Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size


     -1      1200k       500k         350k      914704
     -2      2000k       900k         600k      877703
     -3      2800k      1300k         850k      860338
     -4      3600k      1700k        1100k      846899
     -5      4400k      2100k        1350k      845160
     -6      5200k      2500k        1600k      838626
     -7      6100k      2900k        1850k      834096
     -8      6800k      3300k        2100k      828642
     -9      7600k      3700k        2350k      828642
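
To make the diminishing returns easier to see, the Corpus column can be converted into compression ratios. The sizes are copied from the table; the variable names are hypothetical.

```python
CORPUS_BYTES = 3_141_622  # uncompressed size of the 14 Calgary Corpus files

# Compressed sizes from the table above, keyed by block-size flag.
COMPRESSED_BYTES = {
    1: 914704, 2: 877703, 3: 860338, 4: 846899, 5: 845160,
    6: 838626, 7: 834096, 8: 828642, 9: 828642,
}

for flag, size in COMPRESSED_BYTES.items():
    fraction = size / CORPUS_BYTES
    bits_per_byte = 8 * size / CORPUS_BYTES
    print(f"-{flag}: {fraction:.1%} of original, {bits_per_byte:.3f} bits per byte")
```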

RECOVERING DATA FROM DAMAGED FILES

bzip2 compresses files in blocks, usually 900kbytes long. Each block is handled independently. If a media or transmission error causes a multi-block .bz2 file to become damaged, it may be possible to recover data from the undamaged blocks in the file.

The compressed representation of each block is delimited by a 48-bit pattern, which makes it possible to find the block boundaries with reasonable certainty. Each block also carries its own 32-bit CRC, so damaged blocks can be distinguished from undamaged ones.
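
Locating block boundaries amounts to scanning the stream bit by bit for the delimiter. The sketch below is an illustration, not bzip2recover's code. It assumes the 48-bit block-start pattern is 0x314159265359, a value taken from descriptions of the bzip2 format rather than from this page, and it uses Python's standard bz2 module to produce test input.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # assumed 48-bit block-start pattern
MASK_48 = (1 << 48) - 1


def find_block_starts(stream: bytes) -> list:
    """Return the bit offsets at which the block-start pattern occurs."""
    window = 0       # the most recent 48 bits seen
    bits_read = 0
    starts = []
    for byte in stream:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bits_read += 1
            if bits_read >= 48 and window == BLOCK_MAGIC:
                starts.append(bits_read - 48)
    return starts


stream = bz2.compress(b"hello world\n" * 100)
print(find_block_starts(stream))
```

Blocks after the first generally do not start on a byte boundary, which is why the scan works on bits rather than bytes.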

bzip2recover is a simple program whose purpose is to search for blocks in .bz2 files, and write each block out into its own .bz2 file. You can then use bzip2 -t to test the integrity of the resulting files, and decompress those which are undamaged.

bzip2recover takes a single argument, the name of the damaged file, and writes a number of files "rec00001file.bz2", "rec00002file.bz2", etc, containing the extracted blocks. The output filenames are designed so that the use of wildcards in subsequent processing -- for example, "bzip2 -dc rec*file.bz2 > recovered_data" -- processes the files in the correct order.
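
The zero-padded numbering is what makes the wildcard expand in block order. The sketch below mimics the naming scheme for illustration only; the helper recovered_name is hypothetical.

```python
def recovered_name(index: int, original_name: str = "file.bz2") -> str:
    """Build a name in the style described above, e.g. rec00001file.bz2."""
    return f"rec{index:05d}{original_name}"


indices = [1, 2, 10, 100]
padded = [recovered_name(i) for i in indices]
unpadded = [f"rec{i}file.bz2" for i in indices]

# Shell wildcards expand in sorted order, which sorted() imitates here.
print(sorted(padded))
print(sorted(unpadded))
```

With padding, sorted order matches block order; without it, "rec100" and "rec10" sort ahead of "rec2" and the recovered data would be reassembled out of sequence.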

bzip2recover should be of most use dealing with large .bz2 files, as these will contain many blocks. It is clearly futile to use it on damaged single-block files, since a damaged block cannot be recovered. If you wish to minimise any potential data loss through media or transmission errors, you might consider compressing with a smaller block size.

PERFORMANCE NOTES

The sorting phase of compression gathers together similar strings in the file. Because of this, files containing very long runs of repeated symbols, like "aabaabaabaab ..." (repeated several hundred times) may compress more slowly than normal. Versions 0.9.5 and above fare much better than previous versions in this respect. The ratio between worst-case and average-case compression time is in the region of 10:1. For previous versions, this figure was more like 100:1. You can use the -vvvv option to monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

bzip2 usually allocates several megabytes of memory to operate in, and then charges all over it in a fairly random fashion. This means that performance, both for compressing and decompressing, is largely determined by the speed at which your machine can service cache misses. Because of this, small changes to the code to reduce the miss rate have been observed to give disproportionately large performance improvements. I imagine bzip2 will perform best on machines with very large caches.

CAVEATS

I/O error messages are not as helpful as they could be. bzip2 tries hard to detect I/O errors and exit cleanly, but the details of what the problem is sometimes seem rather misleading.

This manual page pertains to version 1.0.2 of bzip2. Compressed data created by this version is entirely forwards and backwards compatible with the previous public releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, but with the following exception: 0.9.0 and above can correctly decompress multiple concatenated compressed files. 0.1pl2 cannot do this; it will stop after decompressing just the first file in the stream.

bzip2recover versions prior to this one, 1.0.2, used 32-bit integers to represent bit positions in compressed files, so it could not handle compressed files more than 512 megabytes long. Version 1.0.2 and above uses 64-bit ints on some platforms which support them (GNU supported targets, and Windows). To establish whether or not bzip2recover was built with such a limitation, run it without arguments. In any event you can build yourself an unlimited version if you can recompile it with MaybeUInt64 set to be an unsigned 64-bit integer.

AUTHOR

Julian Seward, jseward@acm.org.

http://sources.redhat.com/bzip2

The ideas embodied in bzip2 are due to (at least) the following people: Michael Burrows and David Wheeler (for the block sorting transformation), David Wheeler (again, for the Huffman coder), Peter Fenwick (for the structured coding model in the original bzip, and many refinements), and Alistair Moffat, Radford Neal and Ian Witten (for the arithmetic coder in the original bzip). I am much indebted for their help, support and advice. See the manual in the source distribution for pointers to sources of documentation. Christian von Roques encouraged me to look for faster sorting algorithms, so as to speed up compression. Bela Lubkin encouraged me to improve the worst-case compression performance. The bz* scripts are derived from those of GNU gzip. Many people sent patches, helped with portability problems, lent machines, gave advice and were generally helpful.

Editor: Han Yashan. Source: CMPP.net