Skip to content
  • Jeff King's avatar
    automatically ban strcpy() · c8af66ab
    Jeff King authored and Junio C Hamano's avatar Junio C Hamano committed
    There are a few standard C functions (like strcpy) which are
    easy to misuse. E.g.:
    
      char path[PATH_MAX];
      strcpy(path, arg);
    
    may overflow the "path" buffer. Sometimes there's an earlier
    constraint on the size of "arg", but even in such a case
    it's hard to verify that the code is correct. If the size
    really is unbounded, you're better off using a dynamic
    helper like strbuf:
    
      struct strbuf path = STRBUF_INIT;
      strbuf_addstr(path, arg);
    
    or if it really is bounded, then use xsnprintf to show your
    expectation (and get a run-time assertion):
    
      char path[PATH_MAX];
      xsnprintf(path, sizeof(path), "%s", arg);
    
    which makes further auditing easier.
    
    We'd usually catch undesirable code like this in a review,
    but there's no automated enforcement. Adding that
    enforcement can help us be more consistent and save effort
    (and a round-trip) during review.
    
    This patch teaches the compiler to report an error when it
    sees strcpy (and will become a model for banning a few other
    functions). This has a few advantages over a separate
    linting tool:
    
      1. We know it's run as part of a build cycle, so it's
         hard to ignore. Whereas an external linter is an extra
         step the developer needs to remember to do.
    
      2. Likewise, it's basically free since the compiler is
         parsing the code anyway.
    
      3. We know it's robust against false positives (unlike a
         grep-based linter).
    
    The two big disadvantages are:
    
      1. We'll only check code that is actually compiled, so it
         may miss code that isn't triggered on your particular
         system. But since presumably people don't add new code
         without compiling it (and if they do, the banned
         function list is the least of their worries), we really
         only care about failing to clean up old code when
         adding new functions to the list. And that's easy
         enough to address with a manual audit when adding a new
         function (which is what I did for the functions here).
    
      2. If this ends up generating false positives, it's going
         to be harder to disable (as opposed to a separate
         linter, which may have mechanisms for overriding a
         particular case).
    
         But the intent is to only ban functions which are
         obviously bad, and for which we accept using an
         alternative even when this particular use isn't buggy
         (e.g., the xsnprintf alternative above).
    
    The implementation here is simple: we'll define a macro for
    the banned function which replaces it with a reference to a
    descriptively named but undeclared identifier.  Replacing it
    with any invalid code would work (since we just want to
    break compilation).  But ideally we'd meet these goals:
    
     - it should be portable; ideally this would trigger
       everywhere, and does not need to be part of a DEVELOPER=1
       setup (because unlike warnings which may depend on the
       compiler or system, this is a clear indicator of
       something wrong in the code).
    
     - it should generate a readable error that gives the
       developer a clue what happened
    
     - it should avoid generating too much other cruft that
       makes it hard to see the actual error
    
     - it should mention the original callsite in the error
    
    The output with this patch looks like this (using gcc 7, on
    a checkout with 022d2ac1
    
     reverted, which removed the final
    strcpy from blame.c):
    
          CC builtin/blame.o
      In file included from ./git-compat-util.h:1246,
                       from ./cache.h:4,
                       from builtin/blame.c:8:
      builtin/blame.c: In function ‘cmd_blame’:
      ./banned.h:11:22: error: ‘sorry_strcpy_is_a_banned_function’ undeclared (first use in this function)
       #define BANNED(func) sorry_##func##_is_a_banned_function
                            ^~~~~~
      ./banned.h:14:21: note: in expansion of macro ‘BANNED’
       #define strcpy(x,y) BANNED(strcpy)
                           ^~~~~~
      builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
          strcpy(repeated_meta_color, GIT_COLOR_CYAN);
          ^~~~~~
      ./banned.h:11:22: note: each undeclared identifier is reported only once for each function it appears in
       #define BANNED(func) sorry_##func##_is_a_banned_function
                            ^~~~~~
      ./banned.h:14:21: note: in expansion of macro ‘BANNED’
       #define strcpy(x,y) BANNED(strcpy)
                           ^~~~~~
      builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
          strcpy(repeated_meta_color, GIT_COLOR_CYAN);
          ^~~~~~
    
    This prominently shows the phrase "strcpy is a banned
    function", along with the original callsite in blame.c and
    the location of the ban code in banned.h. Which should be
    enough to get even a developer seeing this for the first
    time pointed in the right direction.
    
    This doesn't match our ideals perfectly, but it's a pretty
    good balance. A few alternatives I tried:
    
      1. Instead of using an undeclared variable, using an
         undeclared function. This shortens the message, because
         the "each undeclared identifier" message is not needed
         (and as you can see above, it triggers a separate
         mention of each of the expansion points).
    
         But it doesn't actually stop compilation unless you use
         -Werror=implicit-function-declaration in your CFLAGS.
         This is the case for DEVELOPER=1, but not for a default
         build (on the other hand, we'd eventually produce a
         link error pointing to the correct source line with the
         descriptive name).
    
      2. The linux kernel uses a similar mechanism in its
         BUILD_BUG_ON_MSG(), where they actually declare the
         function but do so with gcc's error attribute. But
         that's not portable to other compilers (and it also
         runs afoul of our error() macro).
    
         We could make a gcc-specific technique and fallback on
         other compilers, but it's probably not worth the
         complexity. It also isn't significantly shorter than
         the error message shown above.
    
      3. We could drop the BANNED() macro, which would shorten
         the number of lines in the error. But curiously,
         removing it (and just expanding strcpy directly to the
         bogus identifier) causes gcc _not_ to report the
         original line of code.
    
    So this strategy seems to be an acceptable mix of
    information, portability, simplicity, and robustness,
    without _too_ much extra clutter. I also tested it with
    clang, and it looks as good (actually, slightly less
    cluttered than with gcc).
    
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    c8af66ab