Commit 977101ad authored by Julian Seward

bzip2-0.9.0c

parent 1eb67a9d
Bzip2 is not research work, in the sense that it doesn't present any
new ideas. Rather, it's an engineering exercise based on existing
ideas.
Four documents describe essentially all the ideas behind bzip2:

Michael Burrows and D. J. Wheeler:
   "A block-sorting lossless data compression algorithm"
   10th May 1994.
   Digital SRC Research Report 124.
   ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz

Daniel S. Hirschberg and Debra A. Lelewer:
   "Efficient Decoding of Prefix Codes"
   Communications of the ACM, April 1990, Vol 33, Number 4.
   You might be able to get an electronic copy of this
   from the ACM Digital Library.

David J. Wheeler:
   Program bred3.c and accompanying document bred3.ps.
   This contains the idea behind the multi-table Huffman
   coding scheme.
   ftp://ftp.cl.cam.ac.uk/pub/user/djw3/

Jon L. Bentley and Robert Sedgewick:
   "Fast Algorithms for Sorting and Searching Strings"
   Available from Sedgewick's web page,
   www.cs.princeton.edu/~rs

The following paper gives valuable additional insights into the
algorithm, but is not immediately the basis of any code
used in bzip2.

Peter Fenwick:
   "Block Sorting Text Compression"
   Proceedings of the 19th Australasian Computer Science Conference,
   Melbourne, Australia. Jan 31 - Feb 2, 1996.
   ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps

All of these are well written, and make fascinating reading. If you want
to modify bzip2 in any non-trivial way, I strongly suggest you obtain,
read and understand these papers.

I am much indebted to the various authors for their help, support and
advice.
0.9.0
~~~~~
First version.
0.9.0a
~~~~~~
Removed 'ranlib' from Makefile, since most modern Unix-es
don't need it, or even know about it.
0.9.0b
~~~~~~
Fixed a problem with error reporting in bzip2.c. This does not affect
the library in any way. Problem is: versions 0.9.0 and 0.9.0a (of the
program proper) compress and decompress correctly, but give misleading
error messages (internal panics) when an I/O error occurs, instead of
reporting the problem correctly. This shouldn't give any data loss
(as far as I can see), but is confusing.
Made the inline declarations disappear for non-GCC compilers.
0.9.0c
~~~~~~
Fixed some problems in the library pertaining to some boundary cases.
This makes the library behave more correctly in those situations. The
fixes apply only to features (calls and parameters) not used by
bzip2.c, so the non-fixedness of them in previous versions has no
effect on reliability of bzip2.c.
In bzlib.c:
* made zero-length BZ_FLUSH work correctly in bzCompress().
* fixed bzWrite/bzRead to ignore zero-length requests.
* fixed bzread to correctly handle read requests after EOF.
* wrong parameter order in call to bzDecompressInit in
bzBuffToBuffDecompress. Fixed.
In compress.c:
* changed setting of nGroups in sendMTFValues() so as to
do a bit better on small files. This _does_ affect
bzip2.c.
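The compress.c entry above does not say what the new nGroups rule is. As a hedged illustration of the idea (use fewer Huffman coding tables for blocks that produce few MTF symbols, so small files pay less table overhead), the sketch below uses the thresholds found in later bzip2 releases (1.0.x). Whether 0.9.0c uses exactly these values is an assumption, and `choose_n_groups` is a hypothetical name, not a function in compress.c.

```c
#include <assert.h>

/* Hypothetical helper (not the bzip2 source): choose how many Huffman
   coding tables a block uses from nMTF, the number of symbols the
   MTF/run-length stage produced for it. Thresholds as in bzip2 1.0.x's
   sendMTFValues(); the 0.9.0c values are not shown in this commit. */
static int choose_n_groups(int nMTF)
{
    assert(nMTF > 0);
    if (nMTF < 200)  return 2;
    if (nMTF < 600)  return 3;
    if (nMTF < 1200) return 4;
    if (nMTF < 2400) return 5;
    return 6;                 /* bzip2 uses at most 6 tables */
}
```

Under these thresholds a block yielding 500 MTF symbols gets 3 tables, while a large block gets the maximum of 6.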
CC = gcc
SH = /bin/sh
CFLAGS = -O3 -fomit-frame-pointer -funroll-loops
CC=gcc
CFLAGS=-Wall -O2 -fomit-frame-pointer -fno-strength-reduce
OBJS= blocksort.o \
huffman.o \
crctable.o \
randtable.o \
compress.o \
decompress.o \
bzlib.o
all: lib bzip2 test
bzip2: lib
$(CC) $(CFLAGS) -c bzip2.c
$(CC) $(CFLAGS) -o bzip2 bzip2.o -L. -lbz2
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.c
lib: $(OBJS)
rm -f libbz2.a
ar clq libbz2.a $(OBJS)
all:
cat words0
$(CC) $(CFLAGS) -o bzip2 bzip2.c
$(CC) $(CFLAGS) -o bzip2recover bzip2recover.c
rm -f bunzip2
ln -s ./bzip2 ./bunzip2
cat words1
test: bzip2
@cat words1
./bzip2 -1 < sample1.ref > sample1.rb2
./bzip2 -2 < sample2.ref > sample2.rb2
./bunzip2 < sample1.bz2 > sample1.tst
./bunzip2 < sample2.bz2 > sample2.tst
cat words2
./bzip2 -d < sample1.bz2 > sample1.tst
./bzip2 -d < sample2.bz2 > sample2.tst
@cat words2
cmp sample1.bz2 sample1.rb2
cmp sample2.bz2 sample2.rb2
cmp sample1.tst sample1.ref
cmp sample2.tst sample2.ref
cat words3
@cat words3
clean:
rm -f *.o libbz2.a bzip2 bzip2recover sample1.rb2 sample2.rb2 sample1.tst sample2.tst
.c.o: $*.o bzlib.h bzlib_private.h
$(CC) $(CFLAGS) -c $*.c -o $*.o
clean:
rm -f bzip2 bunzip2 bzip2recover sample*.tst sample*.rb2
tarfile:
tar cvf interim.tar *.c *.h Makefile manual.texi manual.ps LICENSE bzip2.1 bzip2.1.preformatted bzip2.txt words1 words2 words3 sample1.ref sample2.ref sample1.bz2 sample2.bz2 *.html README CHANGES libbz2.def libbz2.dsp dlltest.dsp
As of today (3 March 1998) I've removed the
Win95/NT executables from this distribution, sorry.
You can still get an executable from
http://www.muraroa.demon.co.uk, or (as a last
resort) by mailing me at jseward@acm.org.
The reason for this change of packaging is that it
makes it easier for me to fix problems with specific
executables if they are not included in the main
distribution.
J
@@ -7,43 +7,63 @@
/*--
This program is bzip2recover, a program to attempt data
salvage from damaged files created by the accompanying
bzip2-0.1 program.
Copyright (C) 1996, 1997 by Julian Seward.
Guildford, Surrey, UK
email: jseward@acm.org
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
The GNU General Public License is contained in the file LICENSE.
bzip2-0.9.0c program.
Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. The origin of this software must not be misrepresented; you must
not claim that you wrote the original software. If you use this
software in a product, an acknowledgment in the product
documentation would be appreciated but is not required.
3. Altered source versions must be plainly marked as such, and must
not be misrepresented as being the original software.
4. The name of the author may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Julian Seward, Guildford, Surrey, UK.
jseward@acm.org
bzip2/libbzip2 version 0.9.0c of 18 October 1998
--*/
/*--
This program is a complete hack and should be rewritten
properly. It isn't very complicated.
--*/
#include <stdio.h>
#include <errno.h>
#include <malloc.h>
#include <stdlib.h>
#include <strings.h> /*-- or try string.h --*/
#include <string.h>
#define UInt32 unsigned int
#define Int32 int
#define UChar unsigned char
#define Char char
#define Bool unsigned char
#define True 1
#define False 0
typedef unsigned int UInt32;
typedef int Int32;
typedef unsigned char UChar;
typedef char Char;
typedef unsigned char Bool;
#define True ((Bool)1)
#define False ((Bool)0)
Char inFileName[2000];
@@ -191,8 +211,9 @@ void bsClose ( BitStream* bs )
if (retVal == EOF) writeError();
}
retVal = fclose ( bs->handle );
if (retVal == EOF)
if (retVal == EOF) {
if (bs->mode == 'w') writeError(); else readError();
}
free ( bs );
}
@@ -248,13 +269,19 @@ Int32 main ( Int32 argc, Char** argv )
UInt32 bitsRead;
UInt32 bStart[20000];
UInt32 bEnd[20000];
UInt32 rbStart[20000];
UInt32 rbEnd[20000];
Int32 rbCtr;
UInt32 buffHi, buffLo, blockCRC;
Char* p;
strcpy ( progName, argv[0] );
inFileName[0] = outFileName[0] = 0;
fprintf ( stderr, "bzip2recover: extracts blocks from damaged .bz2 files.\n" );
fprintf ( stderr, "bzip2recover v0.9.0c: extracts blocks from damaged .bz2 files.\n" );
if (argc != 2) {
fprintf ( stderr, "%s: usage is `%s damaged_file_name'.\n",
@@ -278,6 +305,8 @@ Int32 main ( Int32 argc, Char** argv )
currBlock = 0;
bStart[currBlock] = 0;
rbCtr = 0;
while (True) {
b = bsGetBit ( bsIn );
bitsRead++;
@@ -303,19 +332,25 @@ Int32 main ( Int32 argc, Char** argv )
if (bitsRead > 49)
bEnd[currBlock] = bitsRead-49; else
bEnd[currBlock] = 0;
if (currBlock > 0)
if (currBlock > 0 &&
(bEnd[currBlock] - bStart[currBlock]) >= 130) {
fprintf ( stderr, " block %d runs from %d to %d\n",
currBlock, bStart[currBlock], bEnd[currBlock] );
rbCtr+1, bStart[currBlock], bEnd[currBlock] );
rbStart[rbCtr] = bStart[currBlock];
rbEnd[rbCtr] = bEnd[currBlock];
rbCtr++;
}
currBlock++;
bStart[currBlock] = bitsRead;
}
}
bsClose ( bsIn );
/*-- identified blocks run from 1 to currBlock inclusive. --*/
/*-- identified blocks run from 1 to rbCtr inclusive. --*/
if (currBlock < 1) {
if (rbCtr < 1) {
fprintf ( stderr,
"%s: sorry, I couldn't find any block boundaries.\n",
progName );
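The hunks above change how bzip2recover records candidate blocks: a boundary is any position where the last 48 bits read equal the block-header marker or the end-of-stream marker, and a candidate is now kept (in rbStart/rbEnd) only if its end position is at least 130 bits after its start. A minimal sketch of that first pass over an in-memory buffer follows. The marker values are those of the .bz2 format; `get_bit` and `find_blocks` are hypothetical names, and the sketch omits the real program's handling of a truncated final block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 48-bit .bz2 markers split into high 16 and low 32 bits:
   block header 0x314159265359 and end-of-stream 0x177245385090. */
#define BLOCK_HEADER_HI  0x00003141u
#define BLOCK_HEADER_LO  0x59265359u
#define BLOCK_ENDMARK_HI 0x00001772u
#define BLOCK_ENDMARK_LO 0x45385090u

/* Hypothetical helper: bit i of buf, most significant bit of each byte first. */
static int get_bit(const unsigned char *buf, size_t i)
{
    return (buf[i / 8] >> (7 - i % 8)) & 1;
}

/* Hypothetical sketch of bzip2recover's first pass. Bit positions are
   0-based; right after a marker ends, bitsRead is the position of the
   first bit after it, and the previous candidate ended 49 bits earlier
   (just before the 48 marker bits). Candidates shorter than 130 bits are
   dropped. Returns the number of blocks stored (at most max). */
static int find_blocks(const unsigned char *buf, size_t nbytes,
                       uint32_t *rbStart, uint32_t *rbEnd, int max)
{
    uint32_t buffHi = 0, buffLo = 0, bitsRead = 0, start = 0;
    int rbCtr = 0;

    assert(buf != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes * 8; i++) {
        uint32_t b = (uint32_t)get_bit(buf, i);
        bitsRead++;
        buffHi = (buffHi << 1) | (buffLo >> 31);
        buffLo = (buffLo << 1) | b;
        if (((buffHi & 0xffffu) == BLOCK_HEADER_HI && buffLo == BLOCK_HEADER_LO) ||
            ((buffHi & 0xffffu) == BLOCK_ENDMARK_HI && buffLo == BLOCK_ENDMARK_LO)) {
            uint32_t end = (bitsRead > 49) ? bitsRead - 49 : 0;
            /* end >= start guards the unsigned subtraction */
            if (end >= start && end - start >= 130 && rbCtr < max) {
                rbStart[rbCtr] = start;
                rbEnd[rbCtr] = end;
                rbCtr++;
            }
            start = bitsRead;     /* next candidate begins after the marker */
        }
    }
    return rbCtr;
}
```

For a buffer holding "BZh9", one block-header marker, 20 zero bytes, an end-of-stream marker and 4 CRC bytes, the 32-bit stream header is discarded as too short and one block is kept, running from bit 80 (just after the block-header marker) to bit 239.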
@@ -336,23 +371,23 @@ Int32 main ( Int32 argc, Char** argv )
bitsRead = 0;
outFile = NULL;
wrBlock = 1;
wrBlock = 0;
while (True) {
b = bsGetBit(bsIn);
if (b == 2) break;
buffHi = (buffHi << 1) | (buffLo >> 31);
buffLo = (buffLo << 1) | (b & 1);
if (bitsRead == 47+bStart[wrBlock])
if (bitsRead == 47+rbStart[wrBlock])
blockCRC = (buffHi << 16) | (buffLo >> 16);
if (outFile != NULL && bitsRead >= bStart[wrBlock]
&& bitsRead <= bEnd[wrBlock]) {
if (outFile != NULL && bitsRead >= rbStart[wrBlock]
&& bitsRead <= rbEnd[wrBlock]) {
bsPutBit ( bsWr, b );
}
bitsRead++;
if (bitsRead == bEnd[wrBlock]+1) {
if (bitsRead == rbEnd[wrBlock]+1) {
if (outFile != NULL) {
bsPutUChar ( bsWr, 0x17 ); bsPutUChar ( bsWr, 0x72 );
bsPutUChar ( bsWr, 0x45 ); bsPutUChar ( bsWr, 0x38 );
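In the second pass the program needs each block's stored CRC, which sits in the 32 bits immediately after the block-header marker, that is, starting at rbStart[wrBlock]. Once bit 47+rbStart[wrBlock] has been shifted in, buffHi:buffLo holds those 48 bits in its low end, and `(buffHi << 16) | (buffLo >> 16)` picks out the first 32 of them. The sketch below isolates that step; `crc_at` is a hypothetical helper. The value matters because the next hunk writes it with bsPutUInt32 after the end-of-stream marker, as the CRC of the one-block output file; for a single block, bzip2's combined stream CRC equals the block CRC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the 32 bits stored at bit positions
   start..start+31 (0-based, MSB-first) by shifting the buffer through a
   buffHi:buffLo pair the way bzip2recover does. When bit start+47 has
   just been shifted in, bits start..start+47 occupy the low 48 bits of
   the pair: 16 in buffHi, 32 in buffLo. */
static uint32_t crc_at(const unsigned char *buf, uint32_t nbits, uint32_t start)
{
    uint32_t buffHi = 0, buffLo = 0, blockCRC = 0;

    assert(start + 48 <= nbits);   /* all 48 bits must be in the buffer */
    for (uint32_t bitsRead = 0; bitsRead < nbits; bitsRead++) {
        uint32_t b = (buf[bitsRead / 8] >> (7 - bitsRead % 8)) & 1u;
        buffHi = (buffHi << 1) | (buffLo >> 31);
        buffLo = (buffLo << 1) | b;
        if (bitsRead == 47 + start)
            blockCRC = (buffHi << 16) | (buffLo >> 16);
    }
    return blockCRC;
}
```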
@@ -360,18 +395,18 @@ Int32 main ( Int32 argc, Char** argv )
bsPutUInt32 ( bsWr, blockCRC );
bsClose ( bsWr );
}
if (wrBlock >= currBlock) break;
if (wrBlock >= rbCtr) break;
wrBlock++;
} else
if (bitsRead == bStart[wrBlock]) {
if (bitsRead == rbStart[wrBlock]) {
outFileName[0] = 0;
sprintf ( outFileName, "rec%4d", wrBlock );
sprintf ( outFileName, "rec%4d", wrBlock+1 );
for (p = outFileName; *p != 0; p++) if (*p == ' ') *p = '0';
strcat ( outFileName, inFileName );
if ( !endsInBz2(outFileName)) strcat ( outFileName, ".bz2" );
fprintf ( stderr, " writing block %d to `%s' ...\n",
wrBlock, outFileName );
wrBlock+1, outFileName );
outFile = fopen ( outFileName, "wb" );
if (outFile == NULL) {
......
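The last hunk renumbers recovered files so that numbering starts at 1: block wrBlock is written to "rec" plus wrBlock+1 padded to four digits, followed by the input file name, with ".bz2" appended if the result does not already end in it. A minimal sketch of that naming follows. `ends_in_bz2` stands in for the program's endsInBz2(), whose body is not shown here, and `make_rec_name` is a hypothetical name. It uses "%04d" with snprintf, which for non-negative numbers yields the same digits as the original "rec%4d" followed by replacing spaces with '0', while bounding writes to the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for bzip2recover's endsInBz2(): does name end in ".bz2"? */
static int ends_in_bz2(const char *name)
{
    size_t n = strlen(name);
    return n >= 4 && strcmp(name + n - 4, ".bz2") == 0;
}

/* Hypothetical helper: build the output name for 0-based block wrBlock as
   "rec" + (wrBlock+1) in at least four zero-padded digits + inFileName,
   then add ".bz2" if the result does not already end in it. */
static void make_rec_name(char *out, size_t outSize, int wrBlock,
                          const char *inFileName)
{
    assert(wrBlock >= 0 && outSize > 0);
    snprintf(out, outSize, "rec%04d%s", wrBlock + 1, inFileName);
    if (!ends_in_bz2(out) && strlen(out) + 4 < outSize)
        strcat(out, ".bz2");
}
```

For example, the first block of `damaged.bz2` goes to `rec0001damaged.bz2`, and the twelfth block of a file named `data` goes to `rec0012data.bz2`.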
/*-------------------------------------------------------------*/
/*--- Public header file for the library. ---*/
/*--- bzlib.h ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
Julian Seward, Guildford, Surrey, UK.
jseward@acm.org
bzip2/libbzip2 version 0.9.0c of 18 October 1998
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley
For more information on these sources, see the manual.
--*/
#ifndef _BZLIB_H
#define _BZLIB_H
#define BZ_RUN 0
#define BZ_FLUSH 1
#define BZ_FINISH 2
#define BZ_OK 0
#define BZ_RUN_OK 1
#define BZ_FLUSH_OK 2
#define BZ_FINISH_OK 3
#define BZ_STREAM_END 4
#define BZ_SEQUENCE_ERROR (-1)
#define BZ_PARAM_ERROR (-2)
#define BZ_MEM_ERROR (-3)
#define BZ_DATA_ERROR (-4)
#define BZ_DATA_ERROR_MAGIC (-5)
#define BZ_IO_ERROR (-6)
#define BZ_UNEXPECTED_EOF (-7)
#define BZ_OUTBUFF_FULL (-8)
typedef
struct {
char *next_in;
unsigned int avail_in;
unsigned int total_in;
char *next_out;
unsigned int avail_out;
unsigned int total_out;
void *state;
void *(*bzalloc)(void *,int,int);
void (*bzfree)(void *,void *);
void *opaque;
}
bz_stream;
#ifndef BZ_IMPORT
#define BZ_EXPORT
#endif
#ifdef _WIN32
# include <stdio.h>
# include <windows.h>
# ifdef small
/* windows.h defines small as char */
# undef small
# endif
# ifdef BZ_EXPORT
# define BZ_API(func) WINAPI func
# define BZ_EXTERN extern
# else
/* import windows dll dynamically */
# define BZ_API(func) (WINAPI * func)
# define BZ_EXTERN
# endif
#else
# define BZ_API(func) func
# define BZ_EXTERN extern
#endif
/*-- Core (low-level) library functions --*/
BZ_EXTERN int BZ_API(bzCompressInit) (
bz_stream* strm,
int blockSize100k,
int verbosity,
int workFactor
);
BZ_EXTERN int BZ_API(bzCompress) (
bz_stream* strm,
int action
);
BZ_EXTERN int BZ_API(bzCompressEnd) (
bz_stream* strm
);
BZ_EXTERN int BZ_API(bzDecompressInit) (
bz_stream *strm,
int verbosity,
int small
);
BZ_EXTERN int BZ_API(bzDecompress) (
bz_stream* strm
);
BZ_EXTERN int BZ_API(bzDecompressEnd) (
bz_stream *strm
);
/*-- High(er) level library functions --*/
#ifndef BZ_NO_STDIO
#define BZ_MAX_UNUSED 5000
typedef void BZFILE;
BZ_EXTERN BZFILE* BZ_API(bzReadOpen) (
int* bzerror,
FILE* f,
int verbosity,
int small,
void* unused,
int nUnused
);
BZ_EXTERN void BZ_API(bzReadClose) (
int* bzerror,
BZFILE* b
);
BZ_EXTERN void BZ_API(bzReadGetUnused) (
int* bzerror,
BZFILE* b,
void** unused,
int* nUnused
);
BZ_EXTERN int BZ_API(bzRead) (
int* bzerror,
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN BZFILE* BZ_API(bzWriteOpen) (
int* bzerror,
FILE* f,
int blockSize100k,
int verbosity,
int workFactor
);
BZ_EXTERN void BZ_API(bzWrite) (
int* bzerror,
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN void BZ_API(bzWriteClose) (
int* bzerror,
BZFILE* b,
int abandon,
unsigned int* nbytes_in,
unsigned int* nbytes_out
);
#endif
/*-- Utility functions --*/
BZ_EXTERN int BZ_API(bzBuffToBuffCompress) (
char* dest,
unsigned int* destLen,
char* source,
unsigned int sourceLen,
int blockSize100k,
int verbosity,
int workFactor
);
BZ_EXTERN int BZ_API(bzBuffToBuffDecompress) (
char* dest,
unsigned int* destLen,
char* source,
unsigned int sourceLen,
int small,
int verbosity
);
/*--
Code contributed by Yoshioka Tsuneo
(QWF00133@niftyserve.or.jp/tsuneo-y@is.aist-nara.ac.jp),
to support better zlib compatibility.
This code is not _officially_ part of libbzip2 (yet);
I haven't tested it, documented it, or considered the
threading-safeness of it.
If this code breaks, please contact both Yoshioka and me.
--*/
BZ_EXTERN const char * BZ_API(bzlibVersion) (
void
);
#ifndef BZ_NO_STDIO
BZ_EXTERN BZFILE * BZ_API(bzopen) (
const char *path,
const char *mode
);
BZ_EXTERN BZFILE * BZ_API(bzdopen) (
int fd,
const char *mode
);
BZ_EXTERN int BZ_API(bzread) (
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN int BZ_API(bzwrite) (
BZFILE* b,
void* buf,
int len
);
BZ_EXTERN int BZ_API(bzflush) (
BZFILE* b
);
BZ_EXTERN void BZ_API(bzclose) (
BZFILE* b
);
BZ_EXTERN const char * BZ_API(bzerror) (
BZFILE *b,
int *errnum
);
#endif
#endif
/*-------------------------------------------------------------*/
/*--- end bzlib.h ---*/
/*-------------------------------------------------------------*/
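bzlib.h declares every entry point as `BZ_EXTERN int BZ_API(name)(...)`. Outside Win32, BZ_API(func) is just func. On Win32 it is `WINAPI func` when BZ_EXPORT is defined (the default, since BZ_IMPORT is normally undefined), giving ordinary prototypes; otherwise it is `(WINAPI * func)` with BZ_EXTERN empty, so the same line instead defines a function-pointer variable that a program loading the DLL at run time must fill in (for example with GetProcAddress). The sketch below shows both expansions side by side under two hypothetical macro names, with WINAPI (a Windows calling-convention macro) left out so it compiles on any platform.

```c
#include <assert.h>
#include <stddef.h>

/* The two shapes BZ_API(func) can take in bzlib.h, under hypothetical
   names and without WINAPI. */
#define API_AS_FUNCTION(func) func        /* like BZ_API when exporting */
#define API_AS_POINTER(func)  (* func)    /* like BZ_API for dynamic import */

/* Expands to: int demo_version(void);   an ordinary prototype */
int API_AS_FUNCTION(demo_version)(void);

/* Expands to: int (* demo_version_ptr)(void);   a pointer, NULL until set */
int API_AS_POINTER(demo_version_ptr)(void);

/* Definition matching the prototype above. */
int demo_version(void) { return 90; }

/* A caller using the pointer form must fill the pointer in first. */
static int call_via_pointer(void)
{
    assert(demo_version_ptr != NULL);
    return demo_version_ptr();
}
```

After `demo_version_ptr = demo_version;`, `call_via_pointer()` returns 90; calling it before the assignment trips the assertion.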
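crctable.c, whose header follows, holds a precomputed 256-entry table for the CRC-32 that bzip2 computes over each block's data: generator polynomial 0x04c11db7, processed most-significant bit first, initial value 0xffffffff, final complement. A minimal sketch that generates an equivalent table at run time and uses it follows; the names are hypothetical. This CRC variant is catalogued as CRC-32/BZIP2, whose standard check value for the ASCII string "123456789" is 0xfc891918.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run-time equivalent of a CRC table like crctable.c's:
   entry i is the CRC of byte value i for polynomial 0x04c11db7, MSB first. */
static uint32_t crc32Table[256];

static void make_crc_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int k = 0; k < 8; k++)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04c11db7u : (c << 1);
        crc32Table[i] = c;
    }
}

/* CRC of n bytes: start at 0xffffffff, shift each byte in from the top
   via the table, complement the result. */
static uint32_t bz_crc(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xffffffffu;
    assert(crc32Table[1] == 0x04c11db7u);   /* make_crc_table() was called */
    for (size_t i = 0; i < n; i++)
        crc = (crc << 8) ^ crc32Table[(crc >> 24) ^ p[i]];
    return ~crc;
}
```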
/*-------------------------------------------------------------*/
/*--- Table for doing CRCs ---*/
/*--- crctable.c ---*/
/*-------------------------------------------------------------*/
/*--
This file is a part of bzip2 and/or libbzip2, a program and
library for lossless, block-sorting data compression.
Copyright (C) 1996-1998 Julian R Seward. All rights reserved.
Julian Seward, Guildford, Surrey, UK.
jseward@acm.org
bzip2/libbzip2 version 0.9.0c of 18 October 1998
This program is based on (at least) the work of:
Mike Burrows
David Wheeler
Peter Fenwick
Alistair Moffat
Radford Neal
Ian H. Witten
Robert Sedgewick
Jon L. Bentley