1. 08 Jun, 2018 31 commits
  2. 06 Jun, 2018 1 commit
    • Kees Cook's avatar
      treewide: Use struct_size() for kmalloc()-family · acafe7e3
      Kees Cook authored
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct foo {
          int stuff;
          void *entry[];
      };
      
      instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
      
      This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
      uses. It was done via automatic conversion with manual review for the
      "CHECKME" non-standard cases noted below, using the following Coccinelle
      script:
      
      // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
      //                      sizeof *pkey_cache->table, GFP_KERNEL);
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      identifier VAR, ELEMENT;
      expression COUNT;
      @@
      
      - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
      + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
      
      // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      identifier VAR, ELEMENT;
      expression COUNT;
      @@
      
      - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
      + alloc(struct_size(VAR, ELEMENT, COUNT), GFP)
      
      // Same pattern, but can't trivially locate the trailing element name,
      // or variable name.
      @@
      identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
      expression GFP;
      expression SOMETHING, COUNT, ELEMENT;
      @@
      
      - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
      + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      acafe7e3
  3. 04 Jun, 2018 8 commits
    • David Howells's avatar
      rxrpc: Fix handling of call quietly cancelled out on server · 1a025028
      David Howells authored
      Sometimes an in-progress call will stop responding on the fileserver when
      the fileserver quietly cancels the call with an internally marked abort
      (RX_CALL_DEAD), without sending an ABORT to the client.
      
      This causes the client's call to eventually expire from lack of incoming
      packets directed its way, which currently leads to it being cancelled
      locally with ETIME.  Note that it's not currently clear as to why this
      happens as it's really hard to reproduce.
      
      The rotation policy implement by kAFS, however, doesn't differentiate
      between ETIME meaning we didn't get any response from the server and ETIME
      meaning the call got cancelled mid-flow.  The latter leads to an oops when
      fetching data as the rotation partially resets the afs_read descriptor,
      which can result in a cleared page pointer being dereferenced because that
      page has already been filled.
      
      Handle this by the following means:
      
       (1) Set a flag on a call when we receive a packet for it.
      
       (2) Store the highest packet serial number so far received for a call
           (bearing in mind this may wrap).
      
       (3) If, when the "not received anything recently" timeout expires on a
           call, we've received at least one packet for a call and the connection
           as a whole has received packets more recently than that call, then
           cancel the call locally with ECONNRESET rather than ETIME.
      
           This indicates that the call was definitely in progress on the server.
      
       (4) In kAFS, if the rotation algorithm sees ECONNRESET rather than ETIME,
           don't try the next server, but rather abort the call.
      
           This avoids the oops as we don't try to reuse the afs_read struct.
           Rather, as-yet ungotten pages will be reread at a later data.
      
      Also:
      
       (5) Add an rxrpc tracepoint to log detection of the call being reset.
      
      Without this, I occasionally see an oops like the following:
      
          general protection fault: 0000 [#1] SMP PTI
          ...
          RIP: 0010:_copy_to_iter+0x204/0x310
          RSP: 0018:ffff8800cae0f828 EFLAGS: 00010206
          RAX: 0000000000000560 RBX: 0000000000000560 RCX: 0000000000000560
          RDX: ffff8800cae0f968 RSI: ffff8800d58b3312 RDI: 0005080000000000
          RBP: ffff8800cae0f968 R08: 0000000000000560 R09: ffff8800ca00f400
          R10: ffff8800c36f28d4 R11: 00000000000008c4 R12: ffff8800cae0f958
          R13: 0000000000000560 R14: ffff8800d58b3312 R15: 0000000000000560
          FS:  00007fdaef108080(0000) GS:ffff8800ca680000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fb28a8fa000 CR3: 00000000d2a76002 CR4: 00000000001606e0
          Call Trace:
           skb_copy_datagram_iter+0x14e/0x289
           rxrpc_recvmsg_data.isra.0+0x6f3/0xf68
           ? trace_buffer_unlock_commit_regs+0x4f/0x89
           rxrpc_kernel_recv_data+0x149/0x421
           afs_extract_data+0x1e0/0x798
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_deliver_fs_fetch_data+0x33a/0x5ab
           afs_deliver_to_call+0x1ee/0x5e0
           ? afs_wait_for_call_to_complete+0xc9/0x52e
           afs_wait_for_call_to_complete+0x12b/0x52e
           ? wake_up_q+0x54/0x54
           afs_make_call+0x287/0x462
           ? afs_fs_fetch_data+0x3e6/0x3ed
           ? rcu_read_lock_sched_held+0x5d/0x63
           afs_fs_fetch_data+0x3e6/0x3ed
           afs_fetch_data+0xbb/0x14a
           afs_readpages+0x317/0x40d
           __do_page_cache_readahead+0x203/0x2ba
           ? ondemand_readahead+0x3a7/0x3c1
           ondemand_readahead+0x3a7/0x3c1
           generic_file_buffered_read+0x18b/0x62f
           __vfs_read+0xdb/0xfe
           vfs_read+0xb2/0x137
           ksys_read+0x50/0x8c
           do_syscall_64+0x7d/0x1a0
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Note the weird value in RDI which is a result of trying to kmap() a NULL
      page pointer.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a025028
    • Andreas Gruenbacher's avatar
      gfs2: Iomap cleanups and improvements · 628e366d
      Andreas Gruenbacher authored
      Clean up gfs2_iomap_alloc and gfs2_iomap_get.  Document how
      gfs2_iomap_alloc works: it now needs to be called separately after
      gfs2_iomap_get where necessary; this will be used later by iomap write.
      Move gfs2_iomap_ops into bmap.c.
      
      Introduce a new gfs2_iomap_get_alloc helper and use it in
      fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
      with proper iomap write support.
      
      In gfs2_block_map and fallocate_chunk, zero-initialize struct iomap.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      628e366d
    • Andreas Gruenbacher's avatar
      gfs2: Remove ordered write mode handling from gfs2_trans_add_data · 845802b1
      Andreas Gruenbacher authored
      In journaled data mode, we need to add each buffer head to the current
      transaction.  In ordered write mode, we only need to add the inode to
      the ordered inode list.  So far, both cases are handled in
      gfs2_trans_add_data.  This makes the code look misleading and is
      inefficient for small block sizes as well.  Handle both cases separately
      instead.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      845802b1
    • Andreas Gruenbacher's avatar
      gfs2: gfs2_stuffed_write_end cleanup · d6382a35
      Andreas Gruenbacher authored
      First, change the sanity check in gfs2_stuffed_write_end to check for
      the actual write size instead of the requested write size.
      
      Second, use the existing teardown code in gfs2_write_end instead of
      duplicating it in gfs2_stuffed_write_end.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      d6382a35
    • Andreas Gruenbacher's avatar
      gfs2: hole_size improvement · 7841b9f0
      Andreas Gruenbacher authored
      Reimplement function hole_size based on a generic function for walking
      the metadata tree and rename hole_size to gfs2_hole_size.  While
      previously, multiple invocations of hole_size were sometimes needed to
      walk across the entire hole, the new implementation always returns the
      entire hole at once (provided that the caller is interested in the total
      size).
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      7841b9f0
    • Bob Peterson's avatar
      GFS2: gfs2_free_extlen can return an extent that is too long · dc8fbb03
      Bob Peterson authored
      Function gfs2_free_extlen calculates the length of an extent of
      free blocks that may be reserved. The end pointer was calculated as
      end = start + bh->b_size but b_size is incorrect because the
      bitmap usually stops prior to the end of the buffer data on
      the last bitmap.
      
      What this means is that when you do a write, you can reserve a
      chunk of blocks that runs off the end of the last bitmap. For
      example, I've got a file system where there is only one bitmap
      for each rgrp, so ri_length==1. I saw cases in which iozone
      tried to do a big write, grabbed a large block reservation,
      chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188.
      So 5464153 + 8188 = 5472341 which is the end of the rgrp.
      
      When it grabbed a reservation it got back: 5470936, length 7229.
      But 5470936 + 7229 = 5478165. So the reservation starts inside
      the rgrp but runs 5824 blocks past the end of the bitmap.
      
      This patch fixes the calculation so it won't exceed the last
      bitmap. It also adds a BUG_ON to guard against overflows in the
      future.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      dc8fbb03
    • Andreas Gruenbacher's avatar
      GFS2: Fix allocation error bug with recursive rgrp glocking · 7b5747f4
      Andreas Gruenbacher authored
      Before this patch function gfs2_write_begin, upon discovering an
      error, called gfs2_trim_blocks while the rgrp glock was still held.
      That's because gfs2_inplace_release is not called until later.
      This patch reorganizes the logic a bit so gfs2_inplace_release
      is called to release the lock prior to the call to gfs2_trim_blocks,
      thus preventing the glock recursion.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      7b5747f4
    • Andreas Gruenbacher's avatar
      07e23d68