1. 08 May, 2019 1 commit
  2. 28 Feb, 2019 1 commit
    • setup: fix memory leaks with `struct repository_format` · e8805af1
      Martin Ågren authored
      After we set up a `struct repository_format`, it owns various pieces of
      allocated memory. We then either use those members, because we decide we
      want to use the "candidate" repository format, or we discard the
      candidate / scratch space. In the first case, we transfer ownership of
      the memory to a few global variables. In the latter case, we just
      silently drop the struct and end up leaking memory.
      
      Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a
      function `clear_repository_format()`, to be used on each side of
      `read_repository_format()`. To have a clear and simple memory ownership,
      let all users of `struct repository_format` duplicate the strings that
      they take from it, rather than stealing the pointers.
      
      Call `clear_...()` at the start of `read_...()` instead of just zeroing
      the struct, since we sometimes enter the function multiple times. Thus,
      it is important to initialize the struct before calling `read_...()`, so
      document that. It's also important because we might not even call
      `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c.
      
      Teach `read_...()` to clear the struct on error, so that it is reset to
      a safe state, and document this. (In `setup_git_directory_gently()`, we
      look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
      weren't actually supposed to do per the API. After this commit, that's
      ok.)
      
      We inherit the existing code's combining "error" and "no version found".
      Both are signalled through `version == -1` and now both cause us to
      clear any partial configuration we have picked up. For "extensions.*",
      that's fine, since they require a positive version number. For
      "core.bare" and "core.worktree", we're already verifying that we have a
      non-negative version number before using them.
      Signed-off-by: Martin Ågren <martin.agren@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
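
The ownership rule in this commit (initialize, read, copy what you need, clear) can be sketched roughly as follows. This is a simplified stand-in and not git's code: `struct repo_format_sketch`, `REPO_FORMAT_SKETCH_INIT`, `clear_format_sketch()`, `read_format_sketch()` and `copy_work_tree()` are hypothetical names, the struct has only two fields, and a negative `version` argument merely simulates a failed read.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down analogue of `struct repository_format`. */
struct repo_format_sketch {
	int version;     /* -1 means "error" or "no version found" */
	char *work_tree; /* owned by the struct; may be NULL */
};

/* Must be applied before the first read or clear. */
#define REPO_FORMAT_SKETCH_INIT { -1, NULL }

/* Portable string duplication (strdup() is not in every C standard). */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free owned memory and return to the initial, safe state. */
static void clear_format_sketch(struct repo_format_sketch *fmt)
{
	assert(fmt);
	free(fmt->work_tree);
	fmt->work_tree = NULL;
	fmt->version = -1;
}

/*
 * Stand-in reader: clears on entry because it may be called more than
 * once on the same struct, and leaves the struct cleared on error.
 */
static int read_format_sketch(struct repo_format_sketch *fmt,
			      int version, const char *work_tree)
{
	clear_format_sketch(fmt);
	if (version < 0)
		return -1; /* simulated failure; struct stays cleared */
	if (work_tree) {
		fmt->work_tree = dup_string(work_tree);
		if (!fmt->work_tree)
			return -1;
	}
	fmt->version = version;
	return 0;
}

/* Users duplicate the string instead of stealing the pointer. */
static char *copy_work_tree(const struct repo_format_sketch *fmt)
{
	return fmt->work_tree ? dup_string(fmt->work_tree) : NULL;
}
```

Because callers hold their own copies, the struct can be re-read or cleared at any point without invalidating strings that were already handed out.
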
  3. 22 Feb, 2019 1 commit
    • trace2: create new combined trace facility · ee4512ed
      Jeff Hostetler authored
      Create a new unified tracing facility for git.  The eventual intent is to
      replace the current trace_printf* and trace_performance* routines with a
      unified set of git_trace2* routines.
      
      In addition to the usual printf-style API, trace2 provides higher-level
      event verbs with fixed fields, allowing structured data to be written.
      This makes post-processing and analysis easier for external tools.
      
      Trace2 defines 3 output targets.  These are set using the environment
      variables "GIT_TR2", "GIT_TR2_PERF", and "GIT_TR2_EVENT".  These may be
      set to "1" or to an absolute pathname (just like the current GIT_TRACE).
      
      * GIT_TR2 is intended to be a replacement for GIT_TRACE and logs command
        summary data.
      
      * GIT_TR2_PERF is intended as a replacement for GIT_TRACE_PERFORMANCE.
        It extends the output with columns for the command process, thread,
        repo, absolute and relative elapsed times.  It reports events for
        child process start/stop, thread start/stop, and per-thread function
        nesting.
      
      * GIT_TR2_EVENT is a new structured format. It writes event data as a
        series of JSON records.
      
      Calls to trace2 functions log to any of the 3 output targets enabled
      without the need to call different trace_printf* or trace_performance*
      routines.
      Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
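
As a rough illustration of the three targets, the sketch below classifies the value of each environment variable. It is not the trace2 implementation: `classify_target_value()`, `read_target_kinds()` and `count_enabled()` are hypothetical helpers. It assumes that "1" selects stderr (as with GIT_TRACE), that an absolute pathname starts with `/`, and that any other value leaves the target disabled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum target_kind { TARGET_DISABLED, TARGET_STDERR, TARGET_FILE };

/* The three environment variables named in the commit message. */
static const char *const target_env_names[3] = {
	"GIT_TR2", "GIT_TR2_PERF", "GIT_TR2_EVENT"
};

/* Decide how one target is configured from its variable's value. */
static enum target_kind classify_target_value(const char *value)
{
	if (!value || !*value)
		return TARGET_DISABLED;
	if (!strcmp(value, "1"))
		return TARGET_STDERR;
	if (value[0] == '/')
		return TARGET_FILE;
	return TARGET_DISABLED; /* assumption of this sketch */
}

/* Look up each variable once and record the result per target. */
static void read_target_kinds(enum target_kind kinds[3])
{
	size_t i;

	assert(kinds);
	for (i = 0; i < 3; i++)
		kinds[i] = classify_target_value(getenv(target_env_names[i]));
}

/* A single call site can then fan out to every enabled target. */
static int count_enabled(const enum target_kind kinds[3])
{
	int n = 0;
	size_t i;

	for (i = 0; i < 3; i++)
		if (kinds[i] != TARGET_DISABLED)
			n++;
	return n;
}
```

A caller would run `read_target_kinds()` once at startup and have each event function loop over the enabled entries, so call sites never choose between targets themselves.
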
  4. 24 Jan, 2019 1 commit
  5. 14 Jan, 2019 1 commit
  6. 05 Dec, 2018 1 commit
    • repository: repo_submodule_init to take a submodule struct · d5498e08
      Stefan Beller authored
      When constructing a struct repository for a submodule for some revision
      of the superproject where the submodule is not contained in the index,
      it may not be present in the working tree currently either. In that
      situation giving a 'path' argument is not useful. Upgrade the
      repo_submodule_init function to take a struct submodule instead.
      The submodule struct can be obtained via submodule_from_{path, name} or
      an artificial submodule struct can be passed in.
      
      While we are at it, rename the repository struct in the repo_submodule_init
      function, which is to be initialized, to a name that is not confused with
      the struct submodule as easily. Perform such renames in similar functions
      as well.
      
      Also move its documentation into the header file.
      Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
      Signed-off-by: Stefan Beller <sbeller@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  7. 13 Nov, 2018 1 commit
    • sha1-file: use an object_directory for the main object dir · f0eaf638
      Jeff King authored
      Our handling of alternate object directories is needlessly different
      from the main object directory. As a result, many places in the code
      basically look like this:
      
        do_something(r->objects->objectdir);
      
        for (odb = r->objects->alt_odb_list; odb; odb = odb->next)
              do_something(odb->path);
      
      That gets annoying when do_something() is non-trivial, and we've
      resorted to gross hacks like creating fake alternates (see
      find_short_object_filename()).
      
      Instead, let's give each raw_object_store a unified list of
      object_directory structs. The first will be the main store, and
      everything after is an alternate. Very few callers even care about the
      distinction, and can just loop over the whole list (and those who care
      can just treat the first element differently).
      
      A few observations:
      
        - we don't need r->objects->objectdir anymore, and can just
          mechanically convert that to r->objects->odb->path
      
        - object_directory's path field needs to become a real pointer rather
          than a FLEX_ARRAY, in order to fill it with expand_base_dir()
      
        - we'll call prepare_alt_odb() earlier in many functions (i.e.,
          outside of the loop). This may result in us calling it even when our
          function would be satisfied looking only at the main odb.
      
          But this doesn't matter in practice. It's not a very expensive
          operation in the first place, and in the majority of cases it will
          be a noop. We call it already (and cache its results) in
          prepare_packed_git(), and we'll generally check packs before loose
          objects. So essentially every program is going to call it
          immediately once per program.
      
          Arguably we should just prepare_alt_odb() immediately upon setting
          up the repository's object directory, which would save us sprinkling
          calls throughout the code base (and forgetting to do so has been a
          source of subtle bugs in the past). But I've stopped short of that
          here, since there are already a lot of other moving parts in this
          patch.
      
        - Most call sites just get shorter. The check_and_freshen() functions
          are an exception, because they have entry points to handle local and
          nonlocal directories separately.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
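
The unified list described in this commit lets one loop replace the "main directory, then alternates" pattern. A minimal sketch of that idea, with hypothetical names (`struct odb_sketch`, `for_each_odb_sketch()`, `count_odb()`, `is_path()`) rather than git's actual types and helpers:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of an object directory list node. */
struct odb_sketch {
	const char *path;        /* a real pointer, not a flexible array */
	struct odb_sketch *next; /* first node is the main store */
};

typedef int (*odb_fn)(const char *path, void *data);

/*
 * Visit the main object directory and every alternate with one loop.
 * Stops early and returns the callback's value if it is non-zero.
 */
static int for_each_odb_sketch(struct odb_sketch *head, odb_fn fn, void *data)
{
	struct odb_sketch *odb;

	assert(fn);
	for (odb = head; odb; odb = odb->next) {
		int ret = fn(odb->path, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count the directories visited. */
static int count_odb(const char *path, void *data)
{
	(void)path;
	++*(int *)data;
	return 0;
}

/* Example callback: stop when a given path is found. */
static int is_path(const char *path, void *data)
{
	return !strcmp(path, (const char *)data);
}
```

Callers that do care about the distinction can still special-case the head of the list, since the main store is always the first element.
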
  8. 10 May, 2018 1 commit
    • repository: fix free problem with repo_clear(the_repository) · 74373b5f
      Duy Nguyen authored
      the_repository is special. One of the special things about it is that
      it does not allocate a new index_state object like submodules but
      points to the global the_index variable instead. As a global variable,
      the_index cannot be free()'d.
      
      Add an exception for this in repo_clear(). In the future perhaps we
      will be able to allocate the_repository's index on the heap too. Then
      we can revert this.
      
      the_repository->index keeps pointing to a clean the_index even after
      repo_clear() so that it can still be used next time (e.g. in a crazy
      use case where a dev switches repo in the same process).
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
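
The exception can be pictured with the following sketch, which only illustrates the idea: the index contents are always discarded, but the storage is freed only when it was heap-allocated. All names here (`struct index_sketch`, `struct repo_idx_sketch`, `the_index_sketch`, `the_repo_idx_sketch`, `repo_idx_clear_sketch()`) are hypothetical stand-ins for git's real ones.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the index and repository structs. */
struct index_sketch {
	int entry_count;
};

struct repo_idx_sketch {
	struct index_sketch *index;
};

/* The global index: static storage, so it must never be free()'d. */
static struct index_sketch the_index_sketch;
static struct repo_idx_sketch the_repo_idx_sketch = { &the_index_sketch };

static void repo_idx_clear_sketch(struct repo_idx_sketch *repo)
{
	assert(repo);
	if (!repo->index)
		return;
	/* Always discard the contents so the index ends up clean. */
	memset(repo->index, 0, sizeof(*repo->index));
	/* Only heap-allocated indexes (e.g. a submodule's) are freed. */
	if (repo->index != &the_index_sketch) {
		free(repo->index);
		repo->index = NULL;
	}
}
```

After clearing, the main repository still points at a clean global index and can be reused, while any other repository gives its index back to the allocator.
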
  9. 09 May, 2018 1 commit
    • repository: introduce parsed objects field · 99bf115c
      Stefan Beller authored
      Convert the existing global cache for parsed objects (obj_hash) into
      repository-specific parsed object caches. Existing code that uses
      obj_hash is modified to use the parsed object cache of
      the_repository; future patches will use the parsed object caches of
      other repositories.
      
      Another future use case for a pool of objects is ease of memory management
      in revision walking: If we can free the rev-list related memory early in
      pack-objects (e.g. part of repack operation) then it could lower memory
      pressure significantly when running on large repos. While this has been
      discussed on the mailing list lately, this series doesn't implement this.
      Signed-off-by: Stefan Beller <sbeller@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  10. 29 Mar, 2018 2 commits
  11. 23 Mar, 2018 1 commit
    • repository: introduce raw object store field · 90c62155
      Stefan Beller authored
      The raw object store field will contain any objects needed for access
      to objects in a given repository.
      
      This patch introduces the raw object store and populates it with the
      `objectdir`, which used to be part of the repository struct.
      
      As the struct gains members, we'll also populate the function to clear
      the memory for these members.
      
      In a later step, we'll introduce a struct object_parser that will
      complement the object parsing in a repository struct: The raw object
      parser is the layer that will provide access to raw object content,
      while the higher level object parser code will parse raw objects and
      keep track of parenthood and other object relationships using 'struct
      object'.  For now only add the lower level to the repository struct.
      Signed-off-by: Stefan Beller <sbeller@google.com>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  12. 05 Mar, 2018 5 commits
  13. 19 Jan, 2018 2 commits
    • repository: pre-initialize hash algo pointer · e26f7f19
      brian m. carlson authored
      There are various git subcommands (among them, clone) which don't set up
      the repository (that is, they lack RUN_SETUP or RUN_SETUP_GENTLY) but
      end up needing to have information about the hash algorithm in use.
      Because the hash algorithm is part of struct repository and it's only
      initialized in repository setup, we can end up dereferencing a NULL
      pointer in some cases if we call one of these subcommands and look up
      the empty blob or empty tree values.
      
      A "git clone" of a project that has two paths that differ only in
      case suffers from this if it is run on a case insensitive platform.
      When the command attempts to check out one of these two paths after
      checking out the other one, the checkout codepath needs to see if
      the version that is already on the filesystem (which should not
      happen if the FS were case sensitive) is dirty, and it needs to
      exercise the hashing code at that point.
      
      In the future, we can add a command line option for this or read it
      from the configuration, but until we're ready to expose that
      functionality to the user, simply initialize the repository
      structure to use the current hash algorithm, SHA-1.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • read-cache: fix reading the shared index for other repos · a125a223
      Thomas Gummerer authored
      read_index_from() takes a path argument for the location of the index
      file.  For reading the shared index in split index mode however it just
      ignores that path argument, and reads it from the gitdir of the current
      repository.
      
      This works as long as an index in the_repository is read.  Once that
      changes, such as when we read the index of a submodule, or of a
      different working tree than the current one, the gitdir of
      the_repository will no longer contain the appropriate shared index,
      and git will fail to read it.
      
      For example t3007-ls-files-recurse-submodules.sh was broken with
      GIT_TEST_SPLIT_INDEX set in 188dce13 ("ls-files: use repository
      object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
      broken in a similar manner, probably by introducing struct repository
      there, although I didn't track down the exact commit for that.
      
      be489d02 ("revision.c: --indexed-objects add objects from all
      worktrees", 2017-08-23) breaks with split index mode in a similar
      manner, not erroring out when it can't read the index, but instead
      carrying on with pruning, without taking the index of the worktree into
      account.
      
      Fix this by passing an additional gitdir parameter to read_index_from,
      to indicate where it should look for and read the shared index from.
      
      read_cache_from() defaults to using the gitdir of the_repository.  As it
      is mostly a convenience macro, having to pass get_git_dir() for every
      call seems overkill, and if necessary users can have more control by
      using read_index_from().
      Helped-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  14. 28 Nov, 2017 1 commit
  15. 13 Nov, 2017 1 commit
    • Integrate hash algorithm support with repo setup · 78a67668
      brian m. carlson authored
      In future versions of Git, we plan to support an additional hash
      algorithm.  Integrate the enumeration of hash algorithms with repository
      setup, and store a pointer to the enumerated data in struct repository.
      Of course, we currently only support SHA-1, so hard-code this value in
      read_repository_format.  In the future, we'll enumerate this value from
      the configuration.
      
      Add a constant, the_hash_algo, which points to the hash_algo structure
      pointer in the repository global.  Note that this is the hash which is
      used to serialize data to disk, not the hash which is used to display
      items to the user.  The transition plan anticipates that these may be
      different.  We can add an additional element in the future (say,
      ui_hash_algo) to provide for this case.
      
      Include repository.h in cache.h since we now need to have access to
      these struct and variable definitions.
      Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  16. 02 Oct, 2017 1 commit
  17. 06 Sep, 2017 2 commits
    • set_git_dir: handle feeding gitdir to itself · 1fb2b636
      Jeff King authored
      Ideally we'd free the existing gitdir field before assigning
      the new one, to avoid a memory leak. But we can't do so
      safely because some callers do the equivalent of:
      
        set_git_dir(get_git_dir());
      
      We can detect that case as a noop, but there are even more
      complicated cases like:
      
        set_git_dir(remove_leading_path(worktree, get_git_dir()));
      
      where we really do need to do some work, but the original
      string must remain valid.
      
      Rather than put the burden on callers to make a copy of the
      string (only to free it later, since we'll make a copy of it
      ourselves), let's solve the problem inside set_git_dir(). We
      can make a copy of the pointer for the old gitdir, and then
      avoid freeing it until after we've made our new copy.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
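
The ordering described above (remember the old pointer, copy the new value, free the old one last) can be sketched like this. It is a simplified illustration, not git's `set_git_dir()`: `gitdir_sketch`, `get_dir_sketch()` and `set_dir_sketch()` are hypothetical names, and the real function does more than store a string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the stored gitdir string. */
static char *gitdir_sketch;

static const char *get_dir_sketch(void)
{
	return gitdir_sketch;
}

/*
 * Store a copy of `path`. The argument may be the currently stored
 * string, or a pointer into it, so the old buffer has to stay alive
 * until the copy has been made.
 */
static int set_dir_sketch(const char *path)
{
	char *old = gitdir_sketch;
	size_t len;
	char *copy;

	assert(path);
	len = strlen(path) + 1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, path, len); /* `path` may alias `old` here */
	gitdir_sketch = copy;
	free(old);               /* safe: nothing reads `old` anymore */
	return 0;
}
```

Freeing `old` before the `memcpy()` would turn both `set_dir_sketch(get_dir_sketch())` and the "pointer into the old string" case into a use-after-free.
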
    • repository: free fields before overwriting them · f9b7573f
      Jeff King authored
      It's possible that the repository data may be initialized
      twice (e.g., after doing a chdir() to the top of the
      worktree we may have to adjust a relative git_dir path). We
      should free() any existing fields before assigning to them
      to avoid leaks.
      
      This should be safe, as the fields are set based on the
      environment or on other strings like the gitdir or
      commondir. That makes it impossible that we are feeding an
      alias to the just-freed string.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  18. 18 Jul, 2017 2 commits
  19. 17 Jul, 2017 1 commit
  20. 24 Jun, 2017 5 commits
    • repository: enable initialization of submodules · 96dc883b
      Brandon Williams authored
      Introduce 'repo_submodule_init()' which performs initialization of a
      'struct repository' as a submodule of another 'struct repository'.
      
      The resulting submodule 'struct repository' can be in one of three states:
      
        1. The submodule is initialized and has a worktree.
      
        2. The submodule is initialized but does not have a worktree.  This
           would occur when the submodule's gitdir is present in the
           superproject's 'gitdir/modules/' directory yet the submodule has not
           been checked out in superproject's worktree.
      
        3. The submodule remains uninitialized due to an error in the
           initialization process or there is no matching submodule at the
           provided path in the superproject.
      Signed-off-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • submodule-config: store the_submodule_cache in the_repository · bf12fcdf
      Brandon Williams authored
      Refactor how 'the_submodule_cache' is handled so that it can be stored
      inside of a repository object.  Also migrate 'the_submodule_cache' to be
      stored in 'the_repository'.
      Signed-off-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • 639e30b5
    • config: read config from a repository object · 3b256228
      Brandon Williams authored
      Teach the config machinery to read config information from a repository
      object.  This involves storing a 'struct config_set' inside the
      repository object and adding a number of functions (repo_config*) to be
      able to query a repository's config.
      
      The current config API enables lazy-loading of the config.  This means
      that when 'git_config_get_int()' is called, if the_config_set hasn't
      been populated yet, then it will be populated and properly initialized by
      reading the necessary config files (the system-wide gitconfig, the user's
      ~/.gitconfig, and the repository's config).  To maintain this paradigm,
      the new API to read from a repository object's config will also perform
      this lazy-initialization.
      
      Since both APIs (git_config_get* and repo_config_get*) have the same
      semantics we can migrate the default config to be stored within
      'the_repository' and just have the 'git_config_get*' family of functions
      redirect to the 'repo_config_get*' functions.
      Signed-off-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
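
The lazy-loading and redirection described in this commit can be sketched as below. This is only an illustration under stated assumptions: every name ending in `_sketch` is hypothetical, the "config files" are replaced by a hard-coded table with made-up values, and the return convention (0 if the key is found, 1 otherwise) is chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct config_entry_sketch {
	const char *key;
	int value;
};

/* Hypothetical analogue of a per-repository config set. */
struct config_set_sketch {
	int initialized;
	int load_count; /* only here so the laziness is observable */
	const struct config_entry_sketch *entries;
	size_t nr;
};

struct repo_cfg_sketch {
	struct config_set_sketch config;
};

/* Stand-in for reading the config files; the values are made up. */
static const struct config_entry_sketch fake_entries[] = {
	{ "core.repositoryformatversion", 0 },
	{ "core.bare", 1 },
};

static void load_config_sketch(struct config_set_sketch *cs)
{
	cs->entries = fake_entries;
	cs->nr = sizeof(fake_entries) / sizeof(fake_entries[0]);
	cs->load_count++;
	cs->initialized = 1;
}

/* Repository-aware getter: populates the config on first use. */
static int repo_config_get_int_sketch(struct repo_cfg_sketch *repo,
				      const char *key, int *out)
{
	size_t i;

	assert(repo && key && out);
	if (!repo->config.initialized)
		load_config_sketch(&repo->config);
	for (i = 0; i < repo->config.nr; i++) {
		if (!strcmp(repo->config.entries[i].key, key)) {
			*out = repo->config.entries[i].value;
			return 0;
		}
	}
	return 1;
}

/* The default repository; zero-initialized, so not yet loaded. */
static struct repo_cfg_sketch the_repo_cfg_sketch;

/* The legacy-style getter simply redirects to the default repository. */
static int git_config_get_int_sketch(const char *key, int *out)
{
	return repo_config_get_int_sketch(&the_repo_cfg_sketch, key, out);
}
```

Since both getters share one code path, the default repository's config is loaded at most once regardless of which family of functions asks first.
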
    • repository: introduce the repository object · 359efeff
      Brandon Williams authored
      Introduce the repository object 'struct repository' which can be used to
      hold all state pertaining to a git repository.
      
      Some of the benefits of object-ifying a repository are:
      
        1. Make the code base more readable and easier to reason about.
      
        2. Allow for working on multiple repositories, specifically
           submodules, within the same process.  Currently the process for
           working on a submodule involves setting up an argv_array of options
           for a particular command and then launching a child process to
           execute the command in the context of the submodule.  This is
           clunky and can require lots of little hacks in order to ensure
           correctness.  Ideally it would be nice to simply pass a repository
           and an options struct to a command.
      
        3. Eliminating reliance on global state will make it easier to
           enable the use of threading to improve performance.
      Signed-off-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>