Commit 2ae06858 authored by Devon Kearns

Imported Upstream version 1.3

bulk_extractor is a group effort from many authors and contributors, including:
Simson L. Garfinkel (overall design)
Bruce Allen (BEViewer, EXIF analyzer, Windows prefetch rewrite)
Alex Eubanks (PE and ELF scanners)
Luis E. Garcia II (initial Windows prefetch implementation)
Michael Shick (odds and ends)
LIFT was developed at CMU under contract to the Department of Defense. The team members were:
Siddharth Gopal
Yiming Yang
Konstantin Salomatin
Jaime Carbonell
Public Domain Software
Simson L. Garfinkel
Naval Postgraduate School
Release: 1.3 Beta 1
Release Date: June 30, 2012
Where noted, bulk_extractor source code files are public domain
software. Because some of the included
The software provided here is released by the Naval Postgraduate
School, an agency of the U.S. Department of Navy. The software bears
no warranty, either expressed or implied. NPS does not assume legal
liability nor responsibility for a User's use of the software or the
results of such use.
Please note that within the United States, copyright protection, under
Section 105 of the United States Code, Title 17, is not available for
any work of the United States Government and/or for any works created
by United States Government employees.
However, because one of the bulk_extractor source modules (pyxpress.c)
is covered by the GNU General Public License, the compiled bulk_extractor
executable is covered by the GPL.
tsk3 includes are Copyright (C) 2010 Brian Carrier and covered under
the Common Public License 1.0
MyFlexLexer.h is (C) 1993 by the Regents of the University of California.
utf8.h is Copyright 2006 Nemanja Trifunovic
base64_forensic.cpp is Copyright (C) 1996-1999 by Internet Software Consortium, with
portions Copyright (c) 1995 by International Business Machines, Inc.
scan_ascii85.cpp is Copyright (C) 2011 Remy Oukaour
scan_json.cpp is Copyright (c) 2005
pyxpress.c is Copyright 2008 (c) Matthieu Suiche. <msuiche[at]>
Installation Instructions
Copyright (C) 1994, 1995, 1996, 1999, 2000, 2001, 2002, 2004, 2005,
2006 Free Software Foundation, Inc.
This file is free documentation; the Free Software Foundation gives
unlimited permission to copy, distribute and modify it.
Basic Installation
Briefly, the shell commands `./configure; make; make install' should
configure, build, and install this package. The following
more-detailed instructions are generic; see the `README' file for
instructions specific to this package.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, and a
file `config.log' containing compiler output (useful mainly for
debugging `configure').
It can also use an optional file (typically called `config.cache'
and enabled with `--cache-file=config.cache' or simply `-C') that saves
the results of its tests to speed up reconfiguring. Caching is
disabled by default to prevent problems with accidental use of stale
cache files.
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If you are using the cache, and at
some point `config.cache' contains results you don't want to keep, you
may remove or edit it.
The file `' (or `') is used to create
`configure' by a program called `autoconf'. You need `' if
you want to change it or regenerate `configure' using a newer version
of `autoconf'.
The simplest way to compile this package is:
1. `cd' to the directory containing the package's source code and type
`./configure' to configure the package for your system.
Running `configure' might take a while. While running, it prints
some messages telling which features it is checking for.
2. Type `make' to compile the package.
3. Optionally, type `make check' to run any self-tests that come with
the package.
4. Type `make install' to install the programs and any data files and
documentation.
5. You can remove the program binaries and object files from the
source code directory by typing `make clean'. To also remove the
files that `configure' created (so you can compile the package for
a different kind of computer), type `make distclean'. There is
also a `make maintainer-clean' target, but that is intended mainly
for the package's developers. If you use it, you may have to get
all sorts of other programs in order to regenerate files that came
with the distribution.
Compilers and Options
Some systems require unusual options for compilation or linking that the
`configure' script does not know about. Run `./configure --help' for
details on some of the pertinent environment variables.
You can give `configure' initial values for configuration parameters
by setting variables in the command line or in the environment. Here
is an example:
./configure CC=c99 CFLAGS=-g LIBS=-lposix
See `Defining Variables', below, for more details.
Compiling For Multiple Architectures
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you can use GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
With a non-GNU `make', it is safer to compile the package for one
architecture at a time in the source code directory. After you have
installed the package for one architecture, use `make distclean' before
reconfiguring for another architecture.
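The separate-directory build described above can be sketched as a small shell function. This is a minimal illustration, not part of the package: the name `configure_out_of_tree' is hypothetical, and it assumes SRCDIR contains an executable `configure' script and that GNU `make' will be used for the build.

```shell
#!/bin/sh
# configure_out_of_tree SRCDIR BUILDDIR [CONFIGURE-ARGS...]
# Hypothetical helper: create BUILDDIR and run SRCDIR's configure from
# inside it, so object files land in BUILDDIR instead of the source tree.
configure_out_of_tree() {
    srcdir=$1
    builddir=$2
    shift 2
    # Resolve SRCDIR to an absolute path before changing directory.
    srcdir=$(cd "$srcdir" && pwd) || return 1
    mkdir -p "$builddir" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$builddir" && "$srcdir/configure" "$@")
}
```

For example, `configure_out_of_tree ~/src/pkg build-native' followed by `make -C build-native' keeps one architecture's object files out of the source tree; repeat with another build directory for each additional architecture.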
Installation Names
By default, `make install' installs the package's commands under
`/usr/local/bin', include files under `/usr/local/include', etc. You
can specify an installation prefix other than `/usr/local' by giving
`configure' the option `--prefix=PREFIX'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
pass the option `--exec-prefix=PREFIX' to `configure', the package uses
PREFIX as the prefix for installing programs and libraries.
Documentation and other data files still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=DIR' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
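To see how `--prefix' and `--exec-prefix' interact, the shell function below prints a few installation directories using the usual Autoconf defaults: the exec prefix defaults to the prefix, `bindir' and `libdir' live under the exec prefix, and `includedir' and `datadir' live under the prefix. It is a hypothetical illustration only; `configure --help' lists the directories a given package actually uses.

```shell
#!/bin/sh
# install_dirs [PREFIX [EXEC_PREFIX]]
# Hypothetical helper: print where a few file classes end up under the
# usual Autoconf defaults. PREFIX defaults to /usr/local and EXEC_PREFIX
# defaults to PREFIX.
install_dirs() {
    prefix=${1:-/usr/local}
    exec_prefix=${2:-$prefix}
    echo "bindir=$exec_prefix/bin"       # programs: architecture-specific
    echo "libdir=$exec_prefix/lib"       # libraries: architecture-specific
    echo "includedir=$prefix/include"    # headers: architecture-independent
    echo "datadir=$prefix/share"         # data files: architecture-independent
}
```

Running `install_dirs /opt/pkg /opt/pkg-x86_64' shows programs going under `/opt/pkg-x86_64' while headers and data stay under `/opt/pkg'.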
Optional Features
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
There may be some features `configure' cannot figure out automatically,
but needs to determine by the type of machine the package will run on.
Usually, assuming the package is built to be run on the _same_
architectures, `configure' can figure that out, but if it prints a
message saying it cannot guess the machine type, give it the
`--build=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name which has the form:
CPU-COMPANY-SYSTEM
where SYSTEM can have one of these forms:
OS
KERNEL-OS
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the machine type.
If you are _building_ compiler tools for cross-compiling, you should
use the option `--target=TYPE' to select the type of system they will
produce code for.
If you want to _use_ a cross compiler that generates code for a
platform different from the build platform, you should specify the
"host" platform (i.e., that on which the generated programs will
eventually be run) with `--host=TYPE'.
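Canonical system names follow the CPU-COMPANY-SYSTEM pattern, as in `x86_64-pc-linux-gnu' or the `i686-w64-mingw32' host used for Windows cross-compiles. The shell function below is a simplified, hypothetical illustration that splits such a name on its first two hyphens. It does not handle short aliases like `sun4' or names that omit the COMPANY field; expanding those into canonical form is the job of `config.sub'.

```shell
#!/bin/sh
# split_triplet NAME
# Hypothetical helper: split a canonical CPU-COMPANY-SYSTEM name on its
# first two hyphens. Everything after the second hyphen is SYSTEM, which
# may itself be KERNEL-OS (e.g. linux-gnu).
split_triplet() {
    name=$1
    case $name in
        *-*-*) ;;                                   # at least two hyphens
        *) echo "not a canonical name: $name" >&2; return 1 ;;
    esac
    cpu=${name%%-*}          # text before the first hyphen
    rest=${name#*-}          # text after the first hyphen
    company=${rest%%-*}      # text between the first and second hyphen
    system=${rest#*-}        # everything after the second hyphen
    echo "cpu=$cpu company=$company system=$system"
}
```

For example, `split_triplet i686-w64-mingw32' prints `cpu=i686 company=w64 system=mingw32'.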
Sharing Defaults
If you want to set default values for `configure' scripts to share, you
can create a site shell script called `' that gives default
values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/' if it exists, then
`PREFIX/etc/' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
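A site script is ordinary shell code that `configure' reads before running its tests, so it only needs plain variable assignments. The fragment below is a minimal sketch with hypothetical values. It relies on `configure' initializing `prefix' to `NONE' when no `--prefix' option was given, so an explicit `--prefix' on the command line still takes precedence.

```shell
# Site defaults sketch: configure sources this file, so it contains only
# plain shell assignments. The values below are hypothetical examples.

# Install under /opt/site unless the user passed --prefix explicitly.
if test "$prefix" = NONE; then
    prefix=/opt/site
fi

# Use gcc unless CC was already set in the environment.
if test -z "$CC"; then
    CC=gcc
fi
```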
Defining Variables
Variables not defined in a site shell script can be set in the
environment passed to `configure'. However, some packages may run
configure again during the build, and the customized values of these
variables may be lost. In order to avoid this problem, you should set
them in the `configure' command line, using `VAR=value'. For example:
./configure CC=/usr/local2/bin/gcc
causes the specified `gcc' to be used as the C compiler (unless it is
overridden in the site shell script).
Unfortunately, this technique does not work for `CONFIG_SHELL' due to
an Autoconf bug. Until the bug is fixed you can use this workaround:
CONFIG_SHELL=/bin/bash /bin/bash ./configure CONFIG_SHELL=/bin/bash
`configure' Invocation
`configure' recognizes the following options to control how it operates.
`--help', `-h'
Print a summary of the options to `configure', and exit.
`--version', `-V'
Print the version of Autoconf used to generate the `configure'
script, and exit.
`--cache-file=FILE'
Enable the cache: use and save the results of the tests in FILE,
traditionally `config.cache'. FILE defaults to `/dev/null' to
disable caching.
`--config-cache', `-C'
Alias for `--cache-file=config.cache'.
`--quiet', `--silent', `-q'
Do not print messages saying which checks are being made. To
suppress all normal output, redirect it to `/dev/null' (any error
messages will still be shown).
`--srcdir=DIR'
Look for the package's source code in directory DIR. Usually
`configure' can determine that directory automatically.
`configure' also accepts some other, not widely useful, options. Run
`configure --help' for more details.
BE_VIEWER_DIR = java_gui
SRC_WIN_DIR = src_win
SUBDIRS = doc doc/user_manual src man python specfiles tests $(BE_VIEWER_DIR) $(SRC_WIN_DIR)
RELEASE_USER = simsong@
VERSION_FN = $(PACKAGE)_version.txt
make check_release_version
make dist
make distcheck
make the_release
@echo Checking version on server for $(VERSION_FN)
/bin/rm -f $(VERSION_FN)
wget -q http://$(RELEASE_HOST)/downloads/$(VERSION_FN)
@echo Version `cat $(VERSION_FN)` is on the server.
@sh -c "if [ `cat $(VERSION_FN)` = $(RELEASE).tar.gz ]; then echo ; echo ; echo $(RELEASE) is already on the server; exit 1; fi"
/bin/rm -f $(VERSION_FN)
gpg --detach-sign $(RELEASE).tar.gz
scp $(RELEASE).tar.gz $(RELEASE).tar.gz.sig $(RELEASE_SSH)
ssh $(RELEASE_HOST) 'cd $(RELEASE_LOC);/bin/rm $(PACKAGE).tar.gz;ln -s $(RELEASE).tar.gz $(PACKAGE).tar.gz'
ssh $(RELEASE_HOST) 'echo $(RELEASE).tar.gz > $(RELEASE_PATH)'
@echo Release $(RELEASE) uploaded to server
# config.h doesn't work right with .flex so run configure before make for win32 and win64
rm -rf win32
mkdir win32
cp config.status temp_config.status
make distclean;
cd win32; mingw32-configure
cd win32/src && make
cp win32/src/bulk_extractor.exe win32/src/bulk_extractor32.exe
mv temp_config.status config.status
rm -rf win64
mkdir win64
cp config.status temp_config.status
make distclean;
cd win64; mingw64-configure
cd win64/src && make
cp win64/src/bulk_extractor.exe win64/src/bulk_extractor64.exe
mv temp_config.status config.status
make windist
# windist makes bulk_extractor32.exe and bulk_extractor64.exe
# and puts them in a zip file
windist: win32 win64
@echo checking to see if there are uncommitted sources
(if (svn status | grep '^M') ; then exit 1 ; fi)
@echo nope
rm -rf $(distdir).zip $(distdir) src/*.exe
mkdir $(distdir)
mkdir $(distdir)/python
cp win32/src/bulk_extractor32.exe $(distdir)
cp win64/src/bulk_extractor64.exe $(distdir)
@echo ====================================
@echo making documentation
make man/bulk_extractor.txt
mv man/bulk_extractor.txt $(distdir)
@echo ====================================
@echo Creating ZIP archive
zip -r9 $(distdir).zip $(distdir)
@echo ====================================
@echo Adding text files to $(distdir).zip
cp python/*.py python/*.txt $(distdir)/python
cp ChangeLog $(distdir)/ChangeLog.txt
cp NEWS $(distdir)/NEWS.txt
cp COPYING $(distdir)/COPYING.txt
md5deep -r $(distdir) > md5list.txt
md5deep -rd $(distdir) > md5list.xml
mv md5list.txt md5list.xml $(distdir)
zip --to-crlf $(distdir).zip $(distdir)/*.txt $(distdir)/*.xml $(distdir)/python/*
rm -rf $(distdir) $(WINDOWSDOCS)
@echo "***********************"
@echo "*** WINDIST IS MADE ***"
@echo "***********************"
@echo ""
ls -l $(distdir).*
@echo ""
@unzip -l $(distdir).zip
$(BEZIP): $(distdir).zip
mv $(distdir).zip $(BEZIP)
winrelease: $(BEZIP)
@echo checking to see if there are uncommitted sources
(if (svn status | grep '^M') ; then exit 1 ; fi)
@echo nope
echo make $(distdir).zip
echo these files will be deleted:
svn status | grep '^[?]' | awk '{print $2;}'
echo hit return to continue
/bin/rm -rf `svn status | grep '^[?]' | awk '{print $2;}'`
CLEANFILES = man/bulk_extractor.txt
# rm -rf win32
# rm -rf win64
SUFFIXES = .txt .1
/usr/bin/tbl $< | /usr/bin/groff -S -Wall -mtty-char -mandoc -Tascii | /usr/bin/col -bx > $@
.PHONY: windist win32 win64 doxygen
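The release recipes above contain two guards: the version check run via `make check_release_version' refuses to publish a release whose tarball name is already recorded in the server's version file, and the `winrelease' target stops when `svn status' reports modified (M) files. Both guards can be sketched as standalone shell functions. The function names are hypothetical, and the version file is assumed to have been downloaded already (the Makefile uses wget for that).

```shell
#!/bin/sh
# release_is_published VERSION_FILE RELEASE
# Hypothetical helper: succeed (exit 0) if the downloaded version file
# already names RELEASE.tar.gz, mirroring the Makefile's version check.
release_is_published() {
    [ "$(cat "$1")" = "$2.tar.gz" ]
}

# has_modified_files
# Hypothetical helper: read `svn status` output on stdin and succeed if
# any line starts with M (a locally modified, uncommitted file).
has_modified_files() {
    grep -q '^M'
}
```

For example, `svn status | has_modified_files && exit 1' reproduces the winrelease guard in a standalone script.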
Welcome to bulk_extractor!
To install on a Linux/MacOS/Mingw system, use:
$ ./configure
$ make
$ sudo make install
The following directories will NOT be installed with the above commands:
python/ - bulk_extractor python tools.
Copy them where you wish and run them directly.
These tools are experimental.
plugins/ - This is for C/C++ developers only. You can develop your own
bulk_extractor plugins which will then be run at run-time
if the .so or .dll files are in the same directory as
the bulk_extractor executable.
By default, "make install" puts bulk_extractor in /usr/local/bin.
To get started, run bulk_extractor on the disk image image.raw and write its results to the directory OUTPUT:
$ /usr/local/bin/bulk_extractor -o OUTPUT image.raw
This will create a directory called OUTPUT that contains lots of files you should examine.
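One quick way to triage OUTPUT is to list only the result files that actually contain something. The sketch below assumes, as an illustration, that the results of interest are .txt files directly inside OUTPUT; the helper name nonempty_features is hypothetical and not part of bulk_extractor.

```shell
#!/bin/sh
# nonempty_features DIR
# Hypothetical helper: print the names of non-empty .txt files directly
# inside DIR, sorted, so empty result files can be skipped during review.
nonempty_features() {
    for f in "$1"/*.txt; do
        # -s is true only for files that exist and have size > 0; this also
        # skips the literal pattern when no .txt files match.
        [ -s "$f" ] && basename "$f"
    done | sort
}
```

Usage: nonempty_features OUTPUT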
Additional Packages used by bulk_extractor:
The TRE or libgnurx regular expression library is required.
TRE is preferred because experiments indicate that it is about 10 times faster than libgnurx.
The libgnurx-static package is required.
The LIBEWF library is recommended for access to E01 files.
Packages may be installed by running the script in src_win/.
The additional libraries may be installed by running the script in src_win/.
Compiling bulk_extractor:
bulk_extractor builds with the GNU auto tools. The maintainer has
previously run automake and autoconf to produce the script
"configure". This script *should* be able to configure the build of
bulk_extractor for your platform.
We recommend compiling bulk_extractor with -O3 and that is the
default. You can disable all optimization flags by specifying the
configure option --with-noopt.
On Fedora, these commands should install the appropriate packages:
$ sudo yum update
$ sudo yum groupinstall development-tools
$ sudo yum install flex
On Ubuntu 12.04, this was sufficient:
$ sudo apt-get -y install gcc g++ flex libewf-dev
We recommend installing Mac dependencies using the MacPorts system. Once that is installed, try:
$ sudo port install flex autoconf automake libewf-devel
Note that MacPorts installs programs in /opt/local/bin, so the file /etc/paths may need
to be updated to include /opt/local/bin.
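As an alternative to editing /etc/paths, the MacPorts directory can be added to PATH for the current shell session only. The function name below is hypothetical; it prepends /opt/local/bin only when it is not already present, so calling it twice is harmless.

```shell
#!/bin/sh
# add_macports_path
# Hypothetical helper: prepend /opt/local/bin (where MacPorts installs
# programs such as flex and autoconf) to PATH if it is not already listed.
add_macports_path() {
    case ":$PATH:" in
        *:/opt/local/bin:*) ;;            # already present; leave PATH alone
        *) PATH="/opt/local/bin:$PATH" ;;
    esac
    export PATH
}
```

The change lasts only for the shell that runs it; editing /etc/paths remains the persistent option.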
Note that libewf-devel may not be available in ports. If it is not, please download
the libewf source and run:
$ ./configure && make && sudo make install
TRE is faster than libgnurx, so we recommend downloading the TRE source and running:
$ ./configure && make && sudo make install
If you need to read AFFLIB files, you will also need to install openssldev.
== Compiling for Windows ==
Please see src_win/README for instructions on cross-compiling for Windows from Fedora
using automated scripts.
There are three ways to compile for Windows:
1 - Cross-compiling from a Linux or Mac system with mingw.
2 - Compiling natively on Windows using mingw.
3 - Compiling natively on Windows using cygwin (untested)
Cross-compiling for Windows from Ubuntu 12.04 LTS:
You will need to install mingw-w64 and then zlib-dev:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get -y install mingw-w64
Next, download zlib from
$ ./configure --host=i686-w64-mingw32
This allows cross-compiling both the 64-bit and the 32-bit
bulk_extractor.exe, although we do not recommend running the 32-bit
Cross-compiling for Windows from Fedora
Set up mingw and the cross-compilation environment:
$ sudo yum -y install mingw64-gcc-c++ mingw64-zlib-static mingw64-pthreads flex
$ sudo yum -y install autoconf automake # not strictly needed, but necessary to build from SVN/GIT
$ sudo yum -y install zlib-devel zlib-static
Run script found in directory src_win/.
Run script found in directory src_win/ to install libewf and TRE.
Type "make win32" or "make win64".
Naval Postgraduate School Digital Evaluation and Exploitation
June 30, 2012
Attached please find our DFRWS 2012 challenge submission.
1. We are entering as a team. The team members are:
Simson Garfinkel (team leader)
Bruce Allen
Alex Eubanks
Kristina Foster
Tony Melaragno
Joel Young
Siddharth Gopal (*)
Yiming Yang (*)
Konstantin Salomatin (*)
Jaime Carbonell (*)
(*) Provided the LIFT file type identification system, developed under DOD contract in 2011.
2. Our tool has a command line interface and will work out of the box
on MacOS and Linux.
3. The tool has a corresponding API that can be incorporated as part
of other tools. The library is the bulk_extractor plug-in API. This
API is documented in the file src/bulk_extractor.h and
src/bulk_extractor_i.h. Briefly, the scanners provided by the tool can
be linked into any C++ program and called.