1. 04 Oct, 2018 1 commit
    • fsnotify: generalize handling of extra event flags · 007d1e83
      Amir Goldstein authored
      FS_EVENT_ON_CHILD gets a special treatment in fsnotify() because it is
      not a flag specifying an event type, but rather an extra flag that may
      be reported along with another event and control the handling of the
      event by the backend.
      FS_ISDIR is also an "extra flag" and not an "event type" and therefore
      deserves the same treatment. With inotify/dnotify backends it was never
      possible to set FS_ISDIR in mark masks, so it did not matter.
      With fanotify backend, mark adding code jumps through hoops to avoid
      setting the FS_ISDIR in the cumulative object mask.
      Separate the constant ALL_FSNOTIFY_EVENTS to ALL_FSNOTIFY_FLAGS and
      ALL_FSNOTIFY_EVENTS, so the latter can be used to test for specific
      event types.
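
      The split between event types and extra flags can be sketched in
      user-space C as below. This is a minimal illustration, not the kernel
      code: the bit values are illustrative, only two event types are listed,
      and mark_wants_event() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; the real definitions live in the kernel headers. */
#define FS_ACCESS          0x00000001u  /* event type: file was accessed */
#define FS_MODIFY          0x00000002u  /* event type: file was modified */
#define FS_EVENT_ON_CHILD  0x08000000u  /* extra flag: interested in child events */
#define FS_ISDIR           0x40000000u  /* extra flag: event occurred on a directory */

/* Extra flags that may accompany an event but are not event types. */
#define ALL_FSNOTIFY_FLAGS  (FS_EVENT_ON_CHILD | FS_ISDIR)
/* Event types only (a two-event subset here, for brevity). */
#define ALL_FSNOTIFY_EVENTS (FS_ACCESS | FS_MODIFY)

/* Hypothetical helper: does the mark select one of the reported event types? */
static bool mark_wants_event(uint32_t event_mask, uint32_t mark_mask)
{
    /* Strip the extra flags so that they can never match on their own. */
    uint32_t test_mask = event_mask & ALL_FSNOTIFY_EVENTS;

    assert(test_mask != 0); /* callers must report at least one event type */
    return (test_mask & mark_mask) != 0;
}
```

      With this split, a mark whose mask happens to contain FS_ISDIR does not
      match a directory event unless it also selects the event type itself.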
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  2. 27 Sep, 2018 1 commit
  3. 03 Sep, 2018 1 commit
  4. 17 Aug, 2018 1 commit
    • fs: fsnotify: account fsnotify metadata to kmemcg · d46eb14b
      Shakeel Butt authored
      Patch series "Directed kmem charging", v8.
      The Linux kernel's memory cgroup allows limiting the memory usage of the
      jobs running on the system to provide isolation between the jobs.  All
      the kernel memory allocated in the context of the job and marked with
      __GFP_ACCOUNT will also be included in the memory usage and be limited
      by the job's limit.
      The kernel memory can only be charged to the memcg of the process in
      whose context kernel memory was allocated.  However there are cases
      where the allocated kernel memory should be charged to the memcg
      different from the current process's memcg.  This patch series
      contains two such concrete use-cases i.e.  fsnotify and buffer_head.
      The fsnotify event objects can consume a lot of system memory for large
      or unlimited queues if the listener is slow or absent.  The events
      are allocated in the context of the event producer.  However they should
      be charged to the event consumer.  Similarly the buffer_head objects can
      be allocated in a memcg different from the memcg of the page for which
      buffer_head objects are being allocated.
      To solve this issue, this patch series introduces a mechanism to charge
      kernel memory to a given memcg.  In case of fsnotify events, the memcg
      of the consumer can be used for charging and for buffer_head, the memcg
      of the page can be charged.  For directed charging, the caller can use
      the scope API memalloc_[un]use_memcg() to specify the memcg to charge
      for all the __GFP_ACCOUNT allocations within the scope.
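
      The scope idea can be modelled in user space as below. This is only a
      sketch under stated assumptions: struct memcg, current_memcg and
      charge_alloc() are hypothetical stand-ins, and the two scope functions
      merely borrow the names from the text above without reproducing the
      kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a byte counter. */
struct memcg {
    const char *name;
    size_t charged;
};

static struct memcg *current_memcg; /* memcg of the running task (stand-in) */
static struct memcg *active_memcg;  /* override set by the scope API, or NULL */

/* Open a scope: accounted allocations are charged to 'memcg' until closed. */
static void memalloc_use_memcg(struct memcg *memcg)
{
    assert(active_memcg == NULL); /* this sketch does not support nesting */
    active_memcg = memcg;
}

/* Close the scope: charging falls back to the current task's memcg. */
static void memalloc_unuse_memcg(void)
{
    active_memcg = NULL;
}

/* Stand-in for a __GFP_ACCOUNT allocation; it only does the accounting. */
static void charge_alloc(size_t size)
{
    struct memcg *target = active_memcg ? active_memcg : current_memcg;

    if (target)
        target->charged += size;
}
```

      In this model an event producer would wrap the event allocation in a
      use/unuse pair naming the listener's memcg, so the producer's own
      counter is left untouched for that allocation.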
      This patch (of 2):
      A lot of memory can be consumed by the events generated for huge or
      unlimited queues if the listener is slow or absent.  This can cause
      system level memory pressure or OOMs.  So, it's better to account the
      fsnotify kmem caches to the memcg of the listener.
      However the listener can be in a different memcg than the memcg of the
      producer and these allocations happen in the context of the event
      producer.  This patch introduces a remote memcg charging API which the
      producer can use to charge the allocations to the memcg of the listener.
      There are seven fsnotify kmem caches and among them allocations from
      dnotify_struct_cache, dnotify_mark_cache, fanotify_mark_cache and
      inotify_inode_mark_cachep happen in the context of a syscall from the
      listener.  So, SLAB_ACCOUNT is enough for these caches.
      The objects from fsnotify_mark_connector_cachep are not accounted as
      they are small compared to the notification mark or events and it is
      unclear whom to account the connector to since it is shared by all events
      attached to the inode.
      The allocations from the event caches happen in the context of the event
      producer.  For such caches we will need to remote charge the allocations
      to the listener's memcg.  Thus we save the memcg reference in the
      fsnotify_group structure of the listener.
      This patch also moves the members of fsnotify_group to fill the holes,
      so that its size stays the same, at least for 64-bit builds, even with
      the additional member.
      [shakeelb@google.com: use GFP_KERNEL_ACCOUNT rather than open-coding it]
        Link: http://lkml.kernel.org/r/20180702215439.211597-1-shakeelb@google.com
      Link: http://lkml.kernel.org/r/20180627191250.209150-2-shakeelb@google.com
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 27 Jun, 2018 4 commits
  6. 18 May, 2018 4 commits
    • fsnotify: add fsnotify_add_inode_mark() wrappers · b249f5be
      Amir Goldstein authored
      Before changing the arguments of the functions fsnotify_add_mark()
      and fsnotify_add_mark_locked(), convert most callers to use a wrapper.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: generalize iteration of marks by object type · 47d9c7cc
      Amir Goldstein authored
      Make some code that handles marks of object types inode and vfsmount
      generic, so it can handle other object types.
      Introduce fsnotify_foreach_obj_type macro to iterate marks by object type
      and fsnotify_iter_{should|set}_report_type macros to set/test report_mask.
      This is going to be used for adding mark of another object type
      (super block mark).
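
      A user-space sketch of iterating marks by object type with a report
      mask follows. The names are shortened, hypothetical versions of the
      macros mentioned above, implemented here as a macro and two small
      functions; the struct layouts are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Object types that can carry marks; adding one extends every loop below. */
enum obj_type { OBJ_TYPE_INODE, OBJ_TYPE_VFSMOUNT, OBJ_TYPE_COUNT };

struct mark {
    unsigned int mask;
};

/* One mark slot per object type, plus a bitmask of the slots to report. */
struct iter_info {
    struct mark *marks[OBJ_TYPE_COUNT];
    unsigned int report_mask;
};

/* Visit every object type in order. */
#define foreach_obj_type(type) \
    for ((type) = 0; (type) < OBJ_TYPE_COUNT; (type)++)

/* Flag the mark of the given type as one that should be reported. */
static void iter_set_report_type(struct iter_info *info, int type)
{
    assert(type >= 0 && type < OBJ_TYPE_COUNT);
    info->report_mask |= 1u << type;
}

/* Test whether the mark of the given type should be reported. */
static bool iter_should_report_type(const struct iter_info *info, int type)
{
    return (info->report_mask & (1u << type)) != 0;
}

/* Count the marks that are present and flagged for reporting. */
static int count_reported_marks(const struct iter_info *info)
{
    int type, n = 0;

    foreach_obj_type(type) {
        if (info->marks[type] && iter_should_report_type(info, type))
            n++;
    }
    return n;
}
```

      A mark can thus stay in its slot without being reported, and a new
      object type only needs a new enum value before OBJ_TYPE_COUNT.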
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: remove redundant arguments to handle_event() · 5b0457ad
      Amir Goldstein authored
      inode_mark and vfsmount_mark arguments are passed to handle_event()
      operation as function arguments as well as on iter_info struct.
      The difference is that iter_info struct may contain marks that should
      not be handled and are represented as NULL arguments to inode_mark or
      Instead of passing the inode_mark and vfsmount_mark arguments, add
      a report_mask member to iter_info struct to indicate which marks should
      be handled, versus marks that should only be kept alive during user
      This change is going to be used for passing more mark types
      with handle_event() (i.e. super block marks).
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • fsnotify: use type id to identify connector object type · d6f7b98b
      Amir Goldstein authored
      An fsnotify_mark_connector is referencing a single type of object
      (either inode or vfsmount). Instead of storing a type mask in
      connector->flags, store a single type id in connector->type to
      identify the type of object.
      When a connector object is detached from the object, its type is set
      to FSNOTIFY_OBJ_TYPE_DETACHED and this object is not going to be
      The function fsnotify_clear_marks_by_group() is the only place where
      type mask was used, so use type flags instead of type id to this
      This change is going to be more convenient when adding a new object
      type (super block).
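
      The relationship between a type id and a type flag can be sketched as
      follows. The identifiers are hypothetical simplifications, and placing
      the detached id just past the valid ids is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* A connector stores exactly one type id. */
enum obj_type_id {
    OBJ_TYPE_INODE,
    OBJ_TYPE_VFSMOUNT,
    OBJ_TYPE_COUNT,
    OBJ_TYPE_DETACHED = OBJ_TYPE_COUNT /* assumed: first id past the valid ones */
};

/* A type flag is derived from an id, so callers can still pass a mask. */
#define OBJ_TYPE_FL(id)    (1u << (id))
#define OBJ_ALL_TYPES_MASK ((1u << OBJ_TYPE_COUNT) - 1)

struct connector {
    unsigned int type; /* one of enum obj_type_id */
};

/* Does the connector's single type fall inside the caller's type mask? */
static bool connector_matches(const struct connector *conn, unsigned int type_mask)
{
    assert(conn->type <= OBJ_TYPE_DETACHED);
    return (OBJ_TYPE_FL(conn->type) & type_mask) != 0;
}
```

      In this sketch a detached connector matches no valid type mask, and a
      further object type costs one more enum value rather than a new flag.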
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  7. 19 Apr, 2018 1 commit
    • fsnotify: Fix fsnotify_mark_connector race · d90a10e2
      Robert Kolchmeyer authored
      fsnotify() acquires a reference to a fsnotify_mark_connector through
      the SRCU-protected pointer to_tell->i_fsnotify_marks. However, it
      appears that no precautions are taken in fsnotify_put_mark() to
      ensure that fsnotify() drops its reference to this
      fsnotify_mark_connector before assigning a value to its 'destroy_next'
      field. This can result in fsnotify_put_mark() assigning a value
      to a connector's 'destroy_next' field right before fsnotify() tries to
      traverse the linked list referenced by the connector's 'list' field.
      Since these two fields are members of the same union, this behavior
      results in a kernel panic.
      This issue is resolved by moving the connector's 'destroy_next' field
      into the object pointer union. This should work since the object pointer
      access is protected by both a spinlock and the value of the 'flags'
      field, and the 'flags' field is cleared while holding the spinlock in
      fsnotify_put_mark() before 'destroy_next' is updated. It shouldn't be
      possible for another thread to accidentally read from the object pointer
      after the 'destroy_next' field is updated.
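
      The layout change can be illustrated with two simplified structs. This
      is a sketch only: locking is omitted, the field set is reduced, and the
      struct and function names are hypothetical. It uses C11 anonymous
      unions.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_head {
    void *first;
};

/* Before: 'destroy_next' shares storage with 'list', which readers traverse. */
struct connector_before {
    unsigned int flags;
    void *obj; /* inode or vfsmount */
    union {
        struct hlist_head list;
        struct connector_before *destroy_next;
    };
};

/* After: 'destroy_next' shares storage with the object pointer instead. */
struct connector_after {
    unsigned int flags;
    union {
        void *obj; /* only valid while 'flags' is set */
        struct connector_after *destroy_next;
    };
    struct hlist_head list;
};

/* Queue a detached connector for destruction without touching 'list'. */
static void queue_for_destroy(struct connector_after *conn,
                              struct connector_after **destroy_list)
{
    assert(conn->flags == 0); /* object pointer was invalidated first */
    conn->destroy_next = *destroy_list;
    *destroy_list = conn;
}
```

      With the second layout, a reader that still holds the connector and
      walks 'list' sees an intact list head even after the connector has been
      queued for destruction.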
      The offending behavior here is extremely unlikely; since
      fsnotify_put_mark() removes references to a connector (specifically,
      it ensures that the connector is unreachable from the inode it was
      formerly attached to) before updating its 'destroy_next' field, a
      sizeable chunk of code in fsnotify_put_mark() has to execute in the
      short window between when fsnotify() acquires the connector reference
      and saves the value of its 'list' field. On the HEAD kernel, I've only
      been able to reproduce this by inserting a udelay(1) in fsnotify().
      However, I've been able to reproduce this issue without inserting a
      udelay(1) anywhere on older unmodified release kernels, so I believe
      it's worth fixing at HEAD.
      References: https://bugzilla.kernel.org/show_bug.cgi?id=199437
      Fixes: 08991e83
      CC: stable@vger.kernel.org
      Signed-off-by: Robert Kolchmeyer <rkolchmeyer@google.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  8. 13 Apr, 2018 1 commit
  9. 27 Feb, 2018 1 commit
    • fsnotify: Let userspace know about lost events due to ENOMEM · 7b1f6417
      Jan Kara authored
      Currently, if a notification event is lost because event allocation fails
      with ENOMEM, we just silently continue (except for fanotify permission
      events where we deny the access). This is undesirable as userspace has
      no way of knowing whether the notifications it got are complete or not.
      Treat lost events due to ENOMEM the same way as lost events due to queue
      overflow so that userspace knows something bad happened and it likely
      needs to rescan the filesystem.
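
      On the listener side, an inotify user sees a queue overflow as an event
      with IN_Q_OVERFLOW set. A minimal sketch of detecting it in a buffer
      returned by read() follows; buffer_has_overflow() is a hypothetical
      helper, and what a rescan involves is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/inotify.h>

/*
 * Return true if a buffer filled by read() on an inotify descriptor
 * contains a queue-overflow event, meaning some events were lost.
 */
static bool buffer_has_overflow(const char *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off + sizeof(struct inotify_event) <= len) {
        struct inotify_event ev;

        /* Copy the fixed-size header out to avoid unaligned access. */
        memcpy(&ev, buf + off, sizeof(ev));
        if (ev.mask & IN_Q_OVERFLOW)
            return true;
        /* Skip the header and the variable-length name that follows it. */
        off += sizeof(ev) + ev.len;
    }
    return false;
}
```

      A caller would typically run this check on every read() and, when it
      returns true, discard its cached state and rescan the watched tree.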
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  10. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      How this work was done:
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
      All documentation files were explicitly excluded.
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
         For non */uapi/* files that summary was:
         SPDX license identifier                            # files
         GPL-2.0                                              11139
         and resulted in the first patch in this series.
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                        930
         and resulted in the second patch in this series.
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
         and that resulted in the third patch in this series.
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
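
      The header/source distinction can be sketched as below. This is not the
      script described above: spdx_line() is a hypothetical helper that only
      handles .c and .h files, using the comment style the kernel documents
      for each.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the SPDX tag line for 'path' into 'out', choosing the comment
 * style by file extension. Returns the length written, or -1 if the
 * file type is not handled by this sketch.
 */
static int spdx_line(const char *path, const char *license,
                     char *out, size_t outlen)
{
    const char *ext = strrchr(path, '.');

    assert(license != NULL);
    if (ext && strcmp(ext, ".h") == 0) /* headers use a block comment */
        return snprintf(out, outlen,
                        "/* SPDX-License-Identifier: %s */", license);
    if (ext && strcmp(ext, ".c") == 0) /* C sources use a line comment */
        return snprintf(out, outlen,
                        "// SPDX-License-Identifier: %s", license);
    return -1;
}
```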
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  11. 31 Oct, 2017 3 commits
  12. 10 Oct, 2017 1 commit
    • audit: Record fanotify access control decisions · de8cd83e
      Steve Grubb authored
      The fanotify interface allows user space daemons to make access
      control decisions. Under common criteria requirements, we need to
      optionally record decisions based on policy. This patch adds a bit mask,
      FAN_AUDIT, that a user space daemon can 'or' into the response decision
      which will tell the kernel that it made a decision and record it.
      It would be used something like this in user space code:
        response.response = FAN_DENY | FAN_AUDIT;
        write(fd, &response, sizeof(struct fanotify_response));
      When the syscall ends, the audit system will record the decision as an
      AUDIT_FANOTIFY auxiliary record to denote that the reason this event
      occurred is the result of an access control decision from fanotify
      rather than DAC or MAC policy.
      A sample event looks like this:
      type=PATH msg=audit(1504310584.332:290): item=0 name="./evil-ls"
      inode=1319561 dev=fc:03 mode=0100755 ouid=1000 ogid=1000 rdev=00:00
      obj=unconfined_u:object_r:user_home_t:s0 nametype=NORMAL
      type=CWD msg=audit(1504310584.332:290): cwd="/home/sgrubb"
      type=SYSCALL msg=audit(1504310584.332:290): arch=c000003e syscall=2
      success=no exit=-1 a0=32cb3fca90 a1=0 a2=43 a3=8 items=1 ppid=901
      pid=959 auid=1000 uid=1000 gid=1000 euid=1000 suid=1000
      fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts1 ses=3 comm="bash"
      exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:
      s0-s0:c0.c1023 key=(null)
      type=FANOTIFY msg=audit(1504310584.332:290): resp=2
      Prior to using the audit flag, the developer needs to call
      fanotify_init or'ing in FAN_ENABLE_AUDIT to ensure that the kernel
      supports auditing. The calling process must also have the CAP_AUDIT_WRITE
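
      A slightly fuller sketch of building the response in user space
      follows. make_response() is a hypothetical helper; it assumes headers
      recent enough to define FAN_AUDIT, and that the descriptor came from a
      permission event read on a fanotify group created with FAN_ENABLE_AUDIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/fanotify.h>

/*
 * Build the reply for one permission event. 'event_fd' is the file
 * descriptor carried in the event metadata; 'audit' asks the kernel to
 * record the decision in the audit log.
 */
static struct fanotify_response make_response(int event_fd, bool allow, bool audit)
{
    struct fanotify_response r;

    assert(event_fd >= 0);
    r.fd = event_fd;
    r.response = allow ? FAN_ALLOW : FAN_DENY;
    if (audit)
        r.response |= FAN_AUDIT; /* only honoured with FAN_ENABLE_AUDIT */
    return r;
}
```

      The returned struct is then passed to write() on the fanotify
      descriptor, as in the two-line example in the commit message.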
      Signed-off-by: sgrubb <sgrubb@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  13. 10 Apr, 2017 17 commits
  14. 23 Jan, 2017 1 commit
    • inotify: Convert to using per-namespace limits · 1cce1eea
      Nikolay Borisov authored
      This patchset converts inotify to using the newly introduced
      per-userns sysctl infrastructure.
      Currently the inotify instances/watches are being accounted in the
      user_struct structure. This means that in setups where multiple
      users in unprivileged containers map to the same underlying
      real user (i.e. pointing to the same user_struct) the inotify limits
      are going to be shared as well, allowing one user (or application) to
      exhaust all others' limits.
      Fix this by switching the inotify sysctls to using the
      per-namespace/per-user limits. This will allow the server admin to
      set sensible global limits, which can further be tuned inside every
      individual user namespace. Additionally, in order to preserve the
      sysctl ABI make the existing inotify instances/watches sysctls
      modify the values of the initial user namespace.
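
      One way to picture per-namespace limits is a chain of counters, one per
      namespace, each with its own limit. The sketch below assumes that a
      watch is charged to a namespace and all of its ancestors; the struct
      and function names are hypothetical and this is not the kernel's
      implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace counter with its own limit. */
struct userns_counter {
    struct userns_counter *parent; /* NULL for the initial namespace */
    long max_watches;
    long watches;
};

/*
 * Charge one watch to 'ns' and every ancestor. If any level is already
 * at its limit, undo the partial charge and fail.
 */
static bool inc_watches(struct userns_counter *ns)
{
    struct userns_counter *iter, *undo;

    assert(ns != NULL);
    for (iter = ns; iter; iter = iter->parent) {
        if (iter->watches >= iter->max_watches)
            goto fail;
        iter->watches++;
    }
    return true;

fail:
    /* Roll back the levels that were charged before the failing one. */
    for (undo = ns; undo != iter; undo = undo->parent)
        undo->watches--;
    return false;
}
```

      In this model two containers that map to the same real user each get
      their own counter, while the limit in the initial namespace still caps
      their combined usage.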
      Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  15. 23 Dec, 2016 1 commit
    • fsnotify: Remove fsnotify_duplicate_mark() · e3ba7307
      Jan Kara authored
      There are only two call sites of fsnotify_duplicate_mark(). Those are
      in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
      for audit tree, inode pointer and group gets set in
      fsnotify_add_mark_locked() later anyway, mask and free_mark are already
      set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
      actively harmful because the following fsnotify_add_mark_locked() will leak
      group reference by overwriting the group pointer. So just remove the two
      calls to fsnotify_duplicate_mark() and the function.
      Signed-off-by: Jan Kara <jack@suse.cz>
      [PM: line wrapping to fit in 80 chars]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  16. 05 Dec, 2016 1 commit