Commit da1c1dec authored by Sophie Brun

Imported Upstream version 2.12

ssdeep was written by Jesse Kornblum and Helmut Grohne.
14 Aug 2006 - Initial version (jk)
15 Jul 2010 - Adding quotation marks to filenames
The first line of the file is a header, like this:
ssdeep - Identifies the file type
1.1 - The version of the file format, NOT the version of the program
-- - Separator
The remainder of the line identifies the format of the file.
Note that for version 1.1 these values must be given EXACTLY as shown above
Each line represents the hash of one file as listed in the header.
Specifically, we have the blocksize used by the program, the hash
for this blocksize and twice the blocksize, and the filename. Filenames
are enclosed in quotation marks. Filenames which contain a quotation mark
will have those quotes slash escaped. For example, the file ma"in.c
will be listed as:
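The quoting rule itself can be sketched in C. This is a minimal illustration, not code taken from ssdeep: the helper name quote_filename is hypothetical, and it assumes that "slash escaped" means a backslash is written before each embedded quotation mark and that no other character needs escaping.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of ssdeep or libfuzzy.
 * Writes `name` into `out` surrounded by quotation marks, with a
 * backslash placed before every embedded quotation mark (assumed
 * escape character). Returns 0 on success, or -1 if an argument is
 * NULL or `out` (of size out_len) is too small.
 */
int quote_filename(const char *name, char *out, size_t out_len)
{
    size_t pos = 0;
    size_t i;

    /* The shortest possible output is two quotes plus the terminator. */
    if (name == NULL || out == NULL || out_len < 3)
        return -1;

    out[pos++] = '"';
    for (i = 0; name[i] != '\0'; i++) {
        /* A quotation mark takes two bytes, anything else takes one. */
        size_t need = (name[i] == '"') ? 2 : 1;

        /* Keep room for the closing quote and the terminator. */
        if (pos + need + 2 > out_len)
            return -1;
        if (name[i] == '"')
            out[pos++] = '\\';
        out[pos++] = name[i];
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return 0;
}
```

Under these assumptions, the name ma"in.c comes out as "ma\"in.c", including the surrounding quotation marks.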
Installation Instructions
Copyright (C) 1994, 1995, 1996, 1999, 2000, 2001, 2002, 2004, 2005,
2006 Free Software Foundation, Inc.
This file is free documentation; the Free Software Foundation gives
unlimited permission to copy, distribute and modify it.
Basic Installation
Briefly, the shell commands `./configure; make; make install' should
configure, build, and install this package. The following
more-detailed instructions are generic; see the `README' file for
instructions specific to this package.
The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, and a
file `config.log' containing compiler output (useful mainly for
debugging `configure').
It can also use an optional file (typically called `config.cache'
and enabled with `--cache-file=config.cache' or simply `-C') that saves
the results of its tests to speed up reconfiguring. Caching is
disabled by default to prevent problems with accidental use of stale
cache files.
If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If you are using the cache, and at
some point `config.cache' contains results you don't want to keep, you
may remove or edit it.
The file `' (or `') is used to create
`configure' by a program called `autoconf'. You need `' if
you want to change it or regenerate `configure' using a newer version
of `autoconf'.
The simplest way to compile this package is:
1. `cd' to the directory containing the package's source code and type
`./configure' to configure the package for your system.
Running `configure' might take a while. While running, it prints
some messages telling which features it is checking for.
2. Type `make' to compile the package.
3. Optionally, type `make check' to run any self-tests that come with
the package.
4. Type `make install' to install the programs and any data files and
documentation.
5. You can remove the program binaries and object files from the
source code directory by typing `make clean'. To also remove the
files that `configure' created (so you can compile the package for
a different kind of computer), type `make distclean'. There is
also a `make maintainer-clean' target, but that is intended mainly
for the package's developers. If you use it, you may have to get
all sorts of other programs in order to regenerate files that came
with the distribution.
Compilers and Options
Some systems require unusual options for compilation or linking that the
`configure' script does not know about. Run `./configure --help' for
details on some of the pertinent environment variables.
You can give `configure' initial values for configuration parameters
by setting variables in the command line or in the environment. Here
is an example:
./configure CC=c99 CFLAGS=-g LIBS=-lposix
*Note Defining Variables::, for more details.
Compiling For Multiple Architectures
You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you can use GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.
With a non-GNU `make', it is safer to compile the package for one
architecture at a time in the source code directory. After you have
installed the package for one architecture, use `make distclean' before
reconfiguring for another architecture.
Installation Names
By default, `make install' installs the package's commands under
`/usr/local/bin', include files under `/usr/local/include', etc. You
can specify an installation prefix other than `/usr/local' by giving
`configure' the option `--prefix=PREFIX'.
You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
pass the option `--exec-prefix=PREFIX' to `configure', the package uses
PREFIX as the prefix for installing programs and libraries.
Documentation and other data files still use the regular prefix.
In addition, if you use an unusual directory layout you can give
options like `--bindir=DIR' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.
If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.
Optional Features
Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.
For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.
Specifying the System Type
There may be some features `configure' cannot figure out automatically,
but needs to determine by the type of machine the package will run on.
Usually, assuming the package is built to be run on the _same_
architectures, `configure' can figure that out, but if it prints a
message saying it cannot guess the machine type, give it the
`--build=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name which has the form:
where SYSTEM can have one of these forms:
See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the machine type.
If you are _building_ compiler tools for cross-compiling, you should
use the option `--target=TYPE' to select the type of system they will
produce code for.
If you want to _use_ a cross compiler, that generates code for a
platform different from the build platform, you should specify the
"host" platform (i.e., that on which the generated programs will
eventually be run) with `--host=TYPE'.
Sharing Defaults
If you want to set default values for `configure' scripts to share, you
can create a site shell script called `' that gives default
values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/' if it exists, then
`PREFIX/etc/' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.
Defining Variables
Variables not defined in a site shell script can be set in the
environment passed to `configure'. However, some packages may run
configure again during the build, and the customized values of these
variables may be lost. In order to avoid this problem, you should set
them in the `configure' command line, using `VAR=value'. For example:
./configure CC=/usr/local2/bin/gcc
causes the specified `gcc' to be used as the C compiler (unless it is
overridden in the site shell script).
Unfortunately, this technique does not work for `CONFIG_SHELL' due to
an Autoconf bug. Until the bug is fixed you can use this workaround:
CONFIG_SHELL=/bin/bash /bin/bash ./configure CONFIG_SHELL=/bin/bash
`configure' Invocation
`configure' recognizes the following options to control how it operates.
Print a summary of the options to `configure', and exit.
Print the version of Autoconf used to generate the `configure'
script, and exit.
Enable the cache: use and save the results of the tests in FILE,
traditionally `config.cache'. FILE defaults to `/dev/null' to
disable caching.
Alias for `--cache-file=config.cache'.
Do not print messages saying which checks are being made. To
suppress all normal output, redirect it to `/dev/null' (any error
messages will still be shown).
Look for the package's source code in directory DIR. Usually
`configure' can determine that directory automatically.
`configure' also accepts some other, not widely useful, options. Run
`configure --help' for more details.
libfuzzy_la_SOURCES=fuzzy.c edit_dist.c find-file-size.c
libfuzzy_la_LDFLAGS=-no-undefined -version-info 2:0:0
include_HEADERS=fuzzy.h edit_dist.h
ssdeep_SOURCES = main.cpp match.cpp engine.cpp filedata.cpp \
dig.cpp cycles.cpp helpers.cpp ui.cpp edit_dist.h \
main.h fuzzy.h tchar-local.h ssdeep.h filedata.h match.h
dll: $(libfuzzy_la_SOURCES)
$(CC) $(CFLAGS) -shared -o fuzzy.dll $(libfuzzy_la_SOURCES) \
$(STRIP) fuzzy.dll
CLEANFILES=fuzzy.dll fuzzy.def
EXTRA_DIST=$(man_MANS) config.guess config.sub sample.c FILEFORMAT
README.TXT: ssdeep.1
	man ./ssdeep.1 | col -bx > README.TXT
win-docs: $(WINDOWSDOCS)
# flip -d $(WINDOWSDOCS)
# unix2dos $(WINDOWSDOCS)
win-package: win-docs
	rm -rf $(distdir).zip $(distdir)
	make dll
	$(STRIP) ssdeep.exe
	mkdir $(distdir)
	cp $(WINDOWSDOCS) ssdeep.exe fuzzy.dll fuzzy.def sample.c $(distdir)
# flip -d $(distdir)/{sample.c,fuzzy.def}
# unix2dos $(distdir)/{sample.c,fuzzy.def}
	zip -lr9 $(distdir).zip $(distdir)
	rm -rf $(distdir) $(WINDOWSDOCS)
world: distclean
	./configure --host=i686-w64-mingw32
	make win-package
	make dist
# Only generic routines go below this line
# ------------------------------------------------------------------
rm -f *~
** Version 2.12 - 24 Oct 2014
* Bug Fixes
- Fixed issue when comparing identical hashes but with different
block sizes.
** Version 2.11.1 - 27 Sep 2014
* Bug Fixes
- Made libfuzzy compile as a shared library again.
** Version 2.11 - 11 Sep 2014
* New Features
- Added fuzzy_clone function to the API.
- Moved to modern Win32 compiler.
* Bug Fixes
- Fixed edge case on signature generation. Behavior now matches v2.9 again.
** Version 2.10 - 17 Jul 2013
* New Features
- Fuzzy Hashing engine re-written to be thread safe.
* Bug Fixes
- Able to handle long file paths on Win32.
- Fixed bug on comparing signatures with the same block size.
- Fixed crash on comparing short signatures.
- Fixed memory leak
** Version 2.9 - 23 Jul 2012
* New Features
- Added warning message for when some data on stdin is not hashed.
- Can now hash up to 512MB of data on stdin.
- Added clustering mode to group together matching files
* Bug Fixes
- Fixed incorrect match scores for hashes with long filenames.
** Version 2.8 - 25 May 2012
* New Features
- Converted to C++
* Bug Fixes
- Fixed filename display on Win32.
- Fixed support for large files on some platforms.
- Fixed errors in handling command line argument processing.
** Version 2.7 - 30 Sep 2011
* New Features
- Added the capability to process the first 100MB of data
from standard input.
- Added a warning message when the program does not process
any file large enough to produce a meaningful result.
* Bug Fixes
- Standard errors are now sent to stderr, not stdout.
** Version 2.6 - 28 Sep 2010
* New Features
- Modified the output file format to allow for proper escaping of
filenames with quotation marks in them.
* Bug Fixes
- Added quotation marks to filenames in CSV matching mode.
** Version 2.5 - 6 May 2010
* New Features
- Added API documentation
- Added return values indicating errors in API functions
- Added compatibility for compiling with C++
* Bug Fixes
- Added parameter validation to API functions
- Fixed some cosmetic errors in error handling
** Version 2.4 - 25 Feb 2010
* New Features
- Added -k mode to compare unknown signatures against known signatures.
** Version 2.3 - 10 Jan 2010
* New Features
- Added -a mode to display all 'matches', regardless of score.
** Version 2.2 - 22 Jul 2009
* New Features
- Added capability to compare two or more files containing signatures
against one another.
* Bug Fixes
- Changed default behavior to exit program on invalid command line flags
** Version 2.1 - 1 Jan 2009
* New Features
- Added fuzzy_hash_filename function to hash an entire file given
only its filename. Avoids issues on Win32 systems.
* Bug Fixes
- Fixed -p mode to display output
** Version 2.0 - 2 Apr 2008
* New Features
- Created fuzzy hashing API/DLL
- Added support for filenames with Unicode characters on Win32
- Added threshold mode
- Added CSV mode
* Bug Fixes
- Fixed extra characters appearing during verbose mode
** Version 1.1 - 14 Aug 2006
* New Features
- First public release
- Added verbose mode to display filenames as they're being hashed
- Added -d mode to make finding similar files in the same directory tree
both easier and faster. Removes the need for two command lines and
many extraneous lines of output.
- Added -p mode to improve -d mode. Prints bi-directional matches together
and omits self matches.
- Added LARGEFILE_SOURCE define to Linux version to allow processing
of large files. (You never know...)
* Bug Fixes
- Fixed cosmetic errors in usage message. Updated man page.
** Version 1.0 - 31 Mar 2006
* New Features
- Released internally
- Added silent mode, -s. All error messages are suppressed.
* Bug Fixes
- Fixed failure to close files after reading in engine.c
- Fixed routine to read headers of matching hashes on Windows.
- Fixed handling of symbolic links
- Fixed cosmetic bug to display error messages if file open fails
(e.g. Permission denied, etc)
- Removed quotation marks from the signatures but not the file names.
Filenames may contain spaces, but signatures may not. Two bytes
per line adds up when we start compiling large hash sets.
- Redirected all error messages to stderr instead of stdout
- Removed duplicate defines at the start of engine.c
- Replaced all references to u32 with C99 standard uint32_t
- Added error checking for memory allocation in main.c:main() and
- Removed useless logical AND of 0xFFFFFFFF from rolling hash update
** Version 0.1 - 4 Nov 2005
* New Features
- Proof of concept
- This version supports recursion, relative and bare file names, and
can perform positive matching using a previous output.
This file documents the fuzzy hashing API. Information on how to use the
fuzzy hashing program ssdeep can be found in the man page. On *nix
systems you can view this file with:
$ man ./ssdeep.1
Windows users can get the ssdeep usage information from README.TXT.
** Using the API in Your Own Programs **
You can use the fuzzy hashing API in your own programs by doing
the following:
1. Include the fuzzy hashing header
#include <fuzzy.h>
2. Call one of the functions:
* Fuzzy hashing a buffer of text:
int fuzzy_hash_buf(const unsigned char *buf,
uint32_t buf_len,
char *result);
This function computes the fuzzy hash of the buffer 'buf' and stores the
result in result. You MUST allocate result to hold FUZZY_MAX_RESULT
characters before calling this function. The length of the buffer should
be passed in via buf_len. It is the user's responsibility to append the
filename, if any, to the output. The function returns zero on success,
one on error.
* Fuzzy hashing a file:
There are in fact two ways to fuzzy hash a file. If you already
have an open file handle you can use:
int fuzzy_hash_file(FILE *handle,
char *result);
This function computes the fuzzy hash of the file pointed to by handle
and stores the result in result. You MUST allocate result to hold
FUZZY_MAX_RESULT characters before calling this function. It is the
user's responsibility to append the filename to the output.
The function returns zero on success, one on error.
The other function to hash a file takes a file name:
int fuzzy_hash_filename(const char * filename,
char * result);
Like the function above, this function stores the fuzzy hash result
in the parameter result. You MUST allocate result to hold
FUZZY_MAX_RESULT characters before calling this function.
* Compare two fuzzy hash signatures:
int fuzzy_compare(const char *sig1, const char *sig2);
This function returns a value from 0 to 100 indicating the match
score of the two signatures. A match score of zero indicates the
signatures did not match.
3. Compile
To compile the program using gcc:
$ gcc -Wall -I/usr/local/include -L/usr/local/lib sample.c -lfuzzy
Using mingw:
C:\> gcc -Wall -Ic:\path\to\includes sample.c fuzzy.dll
Using Microsoft Visual C (MSVC):
To paraphrase the MinGW documentation,
The Windows ssdeep package includes a Win32 DLL and a .def file. Although
MSVC users can't use the DLL directly, they can easily create a .lib file
using the Microsoft LIB tool:
C:\> lib /machine:i386 /def:fuzzy.def
You can then compile your program using the resulting library:
C:\> cl sample.c fuzzy.lib
** Sample Program **
A sample program that uses the API is in sample.c.
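When comparing many signatures, it can help to skip pairs that cannot match before calling fuzzy_compare. The sketch below is an illustration only and is not part of libfuzzy. It assumes a signature is laid out as blocksize:hash:hash with colons as separators, and that two signatures are worth comparing only when their block sizes are equal or one is exactly twice the other (each signature carries a hash for its block size and for twice that block size). Both points are assumptions to check against fuzzy.h and FILEFORMAT. The names sig_blocksize and sigs_comparable are hypothetical.

```c
#include <stdlib.h>

/*
 * Hypothetical helper. Parses the leading block size of a signature
 * assumed to look like "blocksize:hash:hash".
 * Returns the block size, or 0 if the signature is malformed.
 */
unsigned long sig_blocksize(const char *sig)
{
    char *end = NULL;
    unsigned long bs;

    /* Require a leading digit so strtoul cannot accept signs or spaces. */
    if (sig == NULL || sig[0] < '0' || sig[0] > '9')
        return 0;

    bs = strtoul(sig, &end, 10);

    /* The block size must be followed by the (assumed) colon separator. */
    if (*end != ':')
        return 0;
    return bs;
}

/*
 * Hypothetical helper. Returns 1 if the two signatures have block sizes
 * that are equal or differ by a factor of two (assumed rule), else 0.
 */
int sigs_comparable(const char *sig1, const char *sig2)
{
    unsigned long a = sig_blocksize(sig1);
    unsigned long b = sig_blocksize(sig2);

    if (a == 0 || b == 0)
        return 0;
    return a == b || a == 2 * b || b == 2 * a;
}
```

A caller could run sigs_comparable first and treat a result of 0 as a score of zero, passing only the remaining pairs to fuzzy_compare.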
** See Also **
- Jesse D. Kornblum, "Identifying almost identical files using context
triggered piecewise hashing", Digital Investigation, 3(S):91-97,
September 2006. The Proceedings of the 6th Annual Digital Forensic
Research Workshop.
- Update man page
- Update web page, to include new man page
- Write README
- Find a way to estimate device sizes on Windows
- See if Windows Vista's symbolic links create problems for dig.c
#! /bin/sh
# Wrapper for compilers which do not understand '-c -o'.
scriptversion=2012-10-14.11; # UTC
# Copyright (C) 1999-2013 Free Software Foundation, Inc.
# Written by Tom Tromey <[email protected]>.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <>.
# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.
# This file is maintained in Automake, please report
# bugs to <[email protected]> or send patches to
# <[email protected]>.
# We need space, tab and new line, in precisely that order. Quoting is
# there to prevent tools from complaining about whitespace usage.
IFS=" "" $nl"
# func_file_conv build_file lazy
# Convert a $build file to $host form and store it in $file
# Currently only supports Windows hosts. If the determined conversion
# type is listed in (the comma separated) LAZY, no conversion will
# take place.
func_file_conv ()
case $file in
/ | /[!/]*) # absolute file, and not a UNC file
if test -z "$file_conv"; then
# lazily determine how to convert abs files
case `uname -s` in