# CRUST

CRUST is a C static analyzer that brings to C some of
the memory checks available in RUST, making it possible to write more
secure code without the overhead of libraries or other
programming techniques such as reference counters or garbage
collectors.

CRUST works by adding several tags to the source code. These
tags specify which pointers must be tracked by CRUST
and their properties.

The tags are automagically removed by the C Preprocessor, so
they are completely transparent to the compiler. That also
means that there are no fancy libraries, runtimes or expandable
macros: the source code is just standard, plain C. The tags just
annotate specific information for the CRUST preprocessor,
but they aren't used during compilation.

It can be useful for projects where RUST is not feasible, such as
code for microcontrollers or kernel drivers.


## INSTALLATION

After downloading the source code, just run:

        sudo ./setup.py install

It will compile the *flex* and *bison* code and generate the main
library with the parser, and install the static analyzer.

## USAGE

### Preparing the source code

First, each C source file must include the file **crust.h**.
This file ensures that the tags are removed at compile time
during the C Preprocessor pass.

Then, each C source file must be annotated with the CRUST tags.
The available tags are:

 * __crust_t__ : specifies that this is a CRUST element, and must
 be tracked by the CRUST static analyzer. It can be applied only to
 single-indirection pointers.
 * __crust_borrow__ : when used with an argument in a function
 definition, it means that the pointer is lent to that function,
 so the calling function retains the ownership. When used with the
 return value of a function, it means that the calling function won't
 receive the ownership, which is retained by the called function.
 * __crust_not_null__ : specifies that an argument in a function or
 the return value will never be NULL.
 * __crust_recycle__ : when a function returns a CRUST pointer, and
 has at least one CRUST argument, it is possible to mark one (and
 only one) of those CRUST arguments as *recyclable*. That means that
 the memory block passed as that argument will be reused for the
 return value if available. In other words: if the *recycle* argument
 is not NULL, the return value will never be NULL, but if the *recycle*
 argument is NULL, then the return value can be NULL.
 * __crust_alias__ : an alias variable is a variable that can point
 to the same block pointed to by another CRUST-type variable, in which
 case it will have exactly the same state as the main pointer. But
 it can also point to a block not pointed to by any other CRUST pointer.
 It is useful for FOR loops that operate over a linked list of CRUST
 elements without freeing them.
 * __crust_disable__ : disables the error messages. Useful when there
 is no way of implementing something without violating the rules.
 * __crust_enable__ : after a __crust_disable__ statement, enables
 the error messages again. If there are nested tags, as many *enable*
 tags as *disable* tags are needed to re-enable the messages.
 * __crust_full_enable__ : enables the error messages without taking
 into account the disable counter (three *disable* tags require three
 *enable* tags to re-enable the messages, but only one *full_enable*
 tag).
 * __crust_no_0__ : by default, when analyzing FOR and WHILE loops,
 CRUST presumes that the code inside may never run (if the loop
 condition is already false before the first iteration). If it is known
 for sure that the loop will run at least once, but CRUST can't infer it,
 it is possible to append this tag to the loop to specify that the
 code inside must be evaluated at least once.
 * __crust_debug__ : shows the status of the analysis. Useful when
 writing unit tests, to ensure that the pointers have the right
 status.
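
As a rough illustration of how some of these tags might look in code, the following sketch combines __crust_t__, __crust_borrow__, __crust_not_null__ and __crust_recycle__. Several things here are assumptions, not taken from CRUST itself: the empty macros stand in for **crust.h** (so the snippet compiles on its own), the placement of each tag next to the type or argument is a guess, and *item_t*, *item_make*, *item_read*, *item_free* and *recycle_demo* are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for crust.h (assumption: every tag expands to nothing). */
#define __crust_t__
#define __crust_borrow__
#define __crust_not_null__
#define __crust_recycle__

/* Hypothetical CRUST-tracked type: a single-indirection pointer. */
typedef __crust_t__ struct {
    int value;
} *item_t;

/* Reuses `old` when it is not NULL, so the result can only be NULL
 * when `old` was NULL and the allocation failed. */
item_t item_make(item_t __crust_recycle__ old, int value) {
    item_t item = old;
    if (item == NULL) {
        item = malloc(sizeof(*item));
    }
    if (item != NULL) {
        item->value = value;
    }
    return item;
}

/* Borrowed, never-NULL argument: the caller keeps the ownership. */
int item_read(item_t __crust_borrow__ __crust_not_null__ item) {
    assert(item != NULL); /* run-time mirror of the not_null promise */
    return item->value;
}

/* Untagged argument: the ownership moves here and the block is released. */
void item_free(item_t item) {
    free(item);
}

/* Returns the last stored value, or -1 if the first allocation failed. */
int recycle_demo(void) {
    item_t first = item_make(NULL, 1);   /* may be NULL */
    if (first == NULL) {
        return -1;
    }
    item_t second = item_make(first, 2); /* block of `first` is reused */
    int value = item_read(second);       /* borrowed: still owned here */
    item_free(second);                   /* ownership handed over */
    return value;
}
```

Because the tags vanish after preprocessing, the compiler sees ordinary C; only the analyzer gives them a meaning.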

### Calling CRUST

To check a source file, just call CRUST with:

        crust file_name.c

If there are headers located in other directories, they can be
specified from the command line using the **-I** option.
Example: to include the headers located at */usr/share/a_program/includes*
when analyzing the file *source.c*, just use:

        crust -I/usr/share/a_program/includes source.c

It is possible to use as many *-I* options as needed, one per
directory.

It is also possible to make definitions from the command line
using the **-D** option. In the previous case, if we want to
define DEBUG_ALL, it can be done with:

        crust -I/usr/share/a_program/includes -DDEBUG_ALL source.c

It is possible to specify several source files, or even to use
wildcards:

        crust "src/*.c"

The quotes are needed to prevent BASH from expanding the wildcards itself.
It is also possible to specify filenames that must be skipped (useful
when using wildcards, but there are files that should not be analyzed).
This can be done with the **-e** option:

        crust "src/*.c" -esourcefile.c

will process all files whose names end with *.c* except *sourcefile.c*.


## MANAGEMENT MODEL

Before reading this, it is strongly recommended to read the chapter
"understanding ownership" from the RUST tutorial, because the model
used in CRUST is mainly the same.

https://doc.rust-lang.org/beta/book/second-edition/ch04-00-understanding-ownership.html

It can also be useful to check the **unitest** folder, which contains
all the tests, with comments that explain why something is right or wrong.

CRUST presumes that every CRUST-type pointer can be in one of the
following states:

 * UNINITIALIZED: when it was defined but still hasn't been assigned
 to a block
 * NULL: when it is known for sure that it points to NULL
 * NOT_NULL: when it is known for sure that it points to a block
 * NOT_NULL_OR_NULL: when it is initialized but it is unknown if it
 points to NULL or to a block
 * FREED: when the ownership has been passed to another variable or
 function, so it is presumed that the block has been freed and this
 is a dangling pointer.
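
To make the states more concrete, here is a small sketch in which the comments show the state a pointer would be expected to have after each statement, following the list above. The empty macro stands in for **crust.h**, and *cell_t*, *cell_new*, *cell_free* and *state_walkthrough* are hypothetical names; the state annotations are a reading of this document, not output from the tool.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for crust.h (assumption: the tag expands to nothing). */
#define __crust_t__

/* Hypothetical CRUST-tracked type. */
typedef __crust_t__ struct {
    int value;
} *cell_t;

/* Returns a new block, or NULL if the allocation fails. */
cell_t cell_new(int value) {
    cell_t cell = malloc(sizeof(*cell));
    if (cell != NULL) {
        cell->value = value;
    }
    return cell;
}

/* Takes the ownership of the block and releases it. */
void cell_free(cell_t cell) {
    free(cell);
}

/* Returns the stored value, or -1 if the allocation failed. */
int state_walkthrough(void) {
    cell_t cell;             /* UNINITIALIZED: defined, no block yet */
    cell = cell_new(7);      /* NOT_NULL_OR_NULL: allocation may fail */
    if (cell == NULL) {
        return -1;           /* NULL on this branch */
    }
    assert(cell != NULL);    /* NOT_NULL on this branch */
    int value = cell->value;
    cell_free(cell);         /* FREED: ownership passed to cell_free() */
    return value;
}
```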

It also mandates that each memory block must be pointed to by one, and
only one, CRUST variable (there are exceptions for aliases and
global variables).

CRUST follows all the possible execution branches for each of the
functions in the source file. When it starts with a function, each of
the arguments is set to NOT_NULL_OR_NULL, unless it is tagged
with __crust_not_null__.

Every time a CRUST variable is assigned inside the code (for example, when
a function's return value is assigned to it), the assignment is allowed only if
its current status is UNINITIALIZED, NULL or FREED. If it is NOT_NULL
or NOT_NULL_OR_NULL, it is considered an error, because the block it
pointed to would leak.

If a function returns a CRUST value, it is also an error not to store it
in a CRUST variable, because that also results in a memory leak. The only
exception is when the return value is marked as *borrowed*, in which case
it must be stored in a CRUST *borrowed* variable.
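
The two rules above can be sketched as follows. The commented-out lines are the ones that, according to this description, would be reported; they are left out so that the snippet runs without leaking. The empty macro stands in for **crust.h**, and *buf_t*, *buf_new*, *buf_free* and *reassign_demo* are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for crust.h (assumption: the tag expands to nothing). */
#define __crust_t__

/* Hypothetical CRUST-tracked type. */
typedef __crust_t__ struct {
    int value;
} *buf_t;

/* Returns a new block, or NULL if the allocation fails. */
buf_t buf_new(int value) {
    buf_t buf = malloc(sizeof(*buf));
    if (buf != NULL) {
        buf->value = value;
    }
    return buf;
}

/* Takes the ownership of the block and releases it. */
void buf_free(buf_t buf) {
    free(buf);
}

/* Returns the value of the second block, or -1 if its allocation failed. */
int reassign_demo(void) {
    buf_t buf = buf_new(1); /* buf was UNINITIALIZED: assignment allowed */
    /* buf = buf_new(2);       would be reported: buf is NOT_NULL_OR_NULL,
                               so its current block would leak */
    buf_free(buf);          /* ownership passed on: buf is now FREED */
    buf = buf_new(2);       /* buf was FREED: assignment allowed again */
    /* buf_new(3);             would be reported: the returned block is
                               not stored in a CRUST variable */
    if (buf == NULL) {
        return -1;
    }
    assert(buf->value == 2);
    int value = buf->value;
    buf_free(buf);
    return value;
}
```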

Every time a CRUST variable is passed as an argument to a function, the
ownership is passed too, so, from that point, it will be in the FREED
state inside the calling function until it is assigned a new block,
because it is assumed that the block has been freed in the called
function. The exception is when the argument in the
called function is marked as *borrowed*. In that case, the called
function can't free the block, but can modify it, because the
ownership is retained by the calling function.
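
A minimal sketch of the difference between passing ownership and lending a block, under the same assumptions as before: the empty macros stand in for **crust.h**, the position of the tags in the argument list is a guess, and *msg_t*, *msg_new*, *msg_bump*, *msg_consume* and *transfer_demo* are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for crust.h (assumption: every tag expands to nothing). */
#define __crust_t__
#define __crust_borrow__
#define __crust_not_null__

/* Hypothetical CRUST-tracked type. */
typedef __crust_t__ struct {
    int count;
} *msg_t;

/* Returns a new block with count 0, or NULL if the allocation fails. */
msg_t msg_new(void) {
    msg_t msg = malloc(sizeof(*msg));
    if (msg != NULL) {
        msg->count = 0;
    }
    return msg;
}

/* Borrowed argument: may modify the block, must not free it. */
void msg_bump(msg_t __crust_borrow__ __crust_not_null__ msg) {
    assert(msg != NULL);
    msg->count++;
}

/* Untagged argument: receives the ownership and releases the block. */
void msg_consume(msg_t msg) {
    free(msg);
}

/* Returns the number of bumps, or -1 if the allocation failed. */
int transfer_demo(void) {
    msg_t msg = msg_new();
    if (msg == NULL) {
        return -1;
    }
    msg_bump(msg);          /* borrowed: msg stays NOT_NULL here */
    msg_bump(msg);
    int count = msg->count;
    msg_consume(msg);       /* ownership passed: msg is FREED from here on */
    /* msg_bump(msg);          would be reported: msg is in FREED state */
    return count;
}
```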

Trying to use a CRUST variable that is in UNINITIALIZED or FREED status is
an error, either by passing it as an argument when calling a function or by
accessing it through indirection (in the case of a pointer to a struct), because,
in the case of FREED status, that block has already been freed (it is a
dangling pointer).

Global variables are a special case: since it is not possible to know
their state (because it can be changed outside the current function), they
are assumed to be in NOT_NULL_OR_NULL status by default, and it is checked
at exit that they are in NOT_NULL, NULL or NOT_NULL_OR_NULL state
(never in FREED or UNINITIALIZED state). Assigning a CRUST variable to a global
variable is allowed, and the variable will not be marked as freed, but the
global variable will work like an alias variable (that is: freeing the
block pointed to by a local variable copied to a global variable will result
in the global variable being also freed). It is possible to have several global
variables pointing to the same block during the life of the function.

When the execution reaches the end of the function, all variables are checked.
It is an error to reach the end with CRUST variables in NOT_NULL or
NOT_NULL_OR_NULL state. There is an exception: when a local variable has been
copied to a global variable, it does not need to be freed. But at the end
of the execution, each block pointed to by global variables must be pointed to by
only one global variable (it is an error to reach the end of the function
and have two or more global variables pointing to the same block).
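
The global-variable rules could look like this in practice. This is only a sketch of the behaviour described above, not verified against the analyzer: the empty macro stands in for **crust.h**, and *node_t*, *registry*, *node_free* and *registry_store* are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for crust.h (assumption: the tag expands to nothing). */
#define __crust_t__

/* Hypothetical CRUST-tracked type. */
typedef __crust_t__ struct {
    int id;
} *node_t;

/* Global CRUST variable: assumed NOT_NULL_OR_NULL when a function starts. */
node_t registry = NULL;

/* Takes the ownership of the block and releases it. */
void node_free(node_t node) {
    free(node);
}

/* Stores a new node in the global; returns its id, or -1 on failure. */
int registry_store(int id) {
    node_t node = malloc(sizeof(*node));
    if (node == NULL) {
        return -1;           /* registry is left untouched */
    }
    node->id = id;
    node_free(registry);     /* release whatever the global held before */
    registry = node;         /* node is not marked as freed; the global
                                now works like an alias of it */
    assert(registry != NULL);
    return registry->id;
    /* At exit, node was copied to a global, so it does not need to be
       freed, and exactly one global points to the block. */
}
```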

CRUST understands some comparisons, so this code:

    if (crust_variable == NULL) {
        crust_variable = function();
    }

will never produce an error, because if *crust_variable* has a NOT_NULL_OR_NULL
status before the IF statement, the code evaluation will branch in two: one
branch where it has the NULL status (which will evaluate the call to the function), and
another where it has the NOT_NULL status, which will evaluate only the code after the **if**
statement.

Finally, when the evaluation reaches the end of the function, all the
CRUST-type variables are checked, and it is an error for them to be in NOT_NULL or
NOT_NULL_OR_NULL status (unless they are *borrowed*), because that would again
lead to memory leaks.


# CURRENT STATUS

The code is still in *alpha* status. The C parser is quite complete, but there
are still some obscure cases that aren't handled yet (they can be found in the
bison file, *c99.y*, marked with a call to *show_error*). In case you receive
the message:

    Undefined statement at line...
    Please, contact the author

Just paste the code that generated the error and send it to the author.

## GNU extensions

The parser recognizes the following non-standard statements (it doesn't use them,
it just doesn't fail if they are present in the code):

* #pragma
* __builtin_va_list
* __signed__
* __extension__
* __prog__
* __restrict
* __inline

It also recognizes the following syntax extensions (but they are handled as stubs):

* statements inside parentheses
* ellipsis syntax in CASE statements ( CASE n1 ... n2: )
* __attribute__ (...)
* __asm__ [XXXX] (...)
* asm [XXXXX] (...)
* __alignof__(...)
* __typeof__(...)
* __builtin_offsetof(...)


# CONTACTING THE AUTHOR
Sergio Costas-Rodriguez (Raster Software Vigo)
http://www.rastersoft.com
https://github.com/rastersoft/crust
rastersoft@gmail.com