1. 23 Oct, 2018 4 commits
      afs: Probe multiple fileservers simultaneously · 3bf0fb6f
      David Howells authored
      Send probes to all the unprobed fileservers in a fileserver list on all
      addresses simultaneously in an attempt to find out the fastest route whilst
      not getting stuck for 20s on any server or address that we don't get a
      reply from.
      This alleviates the problem whereby attempting to access a new server can
      take a long time because the rotation algorithm ends up rotating through
      all servers and addresses until it finds one that responds.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Eliminate the address pointer from the address list cursor · 2feeaf84
      David Howells authored
      Eliminate the address pointer from the address list cursor as it's
      redundant (ac->addrs[ac->index] can be used to find the same address) and
      address lists must be replaced rather than being rearranged, so is of
      limited value.
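
      As a rough illustration of why the pointer is redundant, the following userspace sketch keeps only an index in the cursor and derives the current address on demand. The structure and function names (`addr_list`, `addr_cursor`, `cursor_current`, `cursor_replace_list`) are hypothetical simplifications, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct addr_list {
    unsigned int nr_addrs;
    const char *addrs[8];   /* textual addresses, for illustration only */
};

struct addr_cursor {
    const struct addr_list *alist;
    unsigned int index;     /* the only position state the cursor keeps */
};

/* Derive the current address from the list and the index on demand. */
const char *cursor_current(const struct addr_cursor *ac)
{
    assert(ac != NULL);
    if (!ac->alist || ac->index >= ac->alist->nr_addrs)
        return NULL;
    return ac->alist->addrs[ac->index];
}

/* Swap in a replacement list; there is no cached pointer to go stale. */
void cursor_replace_list(struct addr_cursor *ac, const struct addr_list *new_list)
{
    assert(ac != NULL && new_list != NULL);
    ac->alist = new_list;
    if (ac->index >= new_list->nr_addrs)
        ac->index = 0;      /* restart if the new list is shorter */
}
```

      Because lists are replaced wholesale rather than rearranged, an index plus the list pointer is enough to recover the address after every replacement.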
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS · 3b6492df
      David Howells authored
      Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
      number equivalent) to 96 bits to allow the support of YFS.
      This requires the iget comparator to check the vnode->fid rather than i_ino
      and i_generation as i_ino is not sufficiently capacious.  It also requires
      this data to be placed into the vnode cache key for fscache.
      For the moment, just discard the top 32 bits of the vnode ID when returning
      it through stat.
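
      A minimal sketch of the idea, assuming a file ID that splits the 96-bit vnode ID into a 64-bit low part and a 32-bit high part. The field and function names here are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; field names are illustrative only. */
struct fid {
    uint64_t vid;        /* 64-bit volume ID */
    uint64_t vnode_lo;   /* low 64 bits of the 96-bit vnode ID */
    uint32_t vnode_hi;   /* top 32 bits of the 96-bit vnode ID */
    uint32_t unique;     /* uniquifier */
};

/* Comparator that checks the whole file ID instead of an inode number. */
bool fid_equal(const struct fid *a, const struct fid *b)
{
    assert(a != NULL && b != NULL);
    return a->vid == b->vid &&
           a->vnode_lo == b->vnode_lo &&
           a->vnode_hi == b->vnode_hi &&
           a->unique == b->unique;
}

/* Value reported through stat: the top 32 bits are simply discarded. */
uint64_t fid_to_ino(const struct fid *fid)
{
    assert(fid != NULL);
    return fid->vnode_lo;
}
```

      Two files that differ only in the high 32 bits report the same inode number in this sketch, which is why the comparator cannot rely on the inode number alone.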
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Implement VL server rotation · 0a5143f2
      David Howells authored
      Track VL servers as independent entities rather than lumping all their
      addresses together into one set and implement server-level rotation by:
       (1) Add the concept of a VL server list, where each server has its own
           separate address list.  This code is similar to the FS server list.
       (2) Use the DNS resolver to retrieve a set of servers and their associated
           addresses, ports, preference and weight ratings.
       (3) In the case of a legacy DNS resolver or an address list given directly
           through /proc/net/afs/cells, create a list containing just a dummy
           server record and attach all the addresses to that.
       (4) Implement a simple rotation policy, for the moment ignoring the
           priorities and weights assigned to the servers.
       (5) Show the address list through /proc/net/afs/<cell>/vlservers.  This
           also displays the source and status of the data as indicated by the
      Signed-off-by: David Howells <dhowells@redhat.com>
  2. 06 Apr, 2018 1 commit
  3. 04 Apr, 2018 1 commit
      fscache: Attach the index key and aux data to the cookie · 402cb8dd
      David Howells authored
      Attach copies of the index key and auxiliary data to the fscache cookie so
      that:
       (1) The callbacks to the netfs for this stuff can be eliminated.  This
           can simplify things in the cache as the information is still
           available, even after the cache has relinquished the cookie.
       (2) Simplifies the locking requirements of accessing the information as we
           don't have to worry about the netfs object going away on us.
       (3) The cache can do lazy updating of the coherency information on disk.
           As long as the cache is flushed before reboot/poweroff, there's no
           need to update the coherency info on disk every time it changes.
       (4) Cookies can be hashed or put in a tree as the index key is easily
           available.  This allows:
           (a) Checks for duplicate cookies can be made at the top fscache layer
           	 rather than down in the bowels of the cache backend.
           (b) Caching can be added to a netfs object that has a cookie if the
           	 cache is brought online after the netfs object is allocated.
      A certain amount of space is made in the cookie for inline copies of the
      data, but if it won't fit there, extra memory will be allocated for it.
      The downside of this is that live cache operation requires more memory.
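
      The inline-or-allocated storage described above can be sketched in userspace C roughly as follows. The structure, the `INLINE_KEY_SIZE` constant and the function names are assumptions made for illustration; they are not the fscache API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_KEY_SIZE 16   /* hypothetical inline capacity */

struct cookie {
    size_t key_len;
    unsigned char inline_key[INLINE_KEY_SIZE];
    unsigned char *heap_key;  /* used only when the key does not fit inline */
};

/* Copy the key into the cookie; returns 0 on success, -1 on allocation failure. */
int cookie_set_key(struct cookie *c, const void *key, size_t len)
{
    assert(c != NULL && key != NULL);
    c->heap_key = NULL;
    c->key_len = len;
    if (len <= INLINE_KEY_SIZE) {
        memcpy(c->inline_key, key, len);
        return 0;
    }
    c->heap_key = malloc(len);        /* extra memory for oversized keys */
    if (!c->heap_key) {
        c->key_len = 0;
        return -1;
    }
    memcpy(c->heap_key, key, len);
    return 0;
}

/* Return whichever copy holds the key. */
const unsigned char *cookie_key(const struct cookie *c)
{
    assert(c != NULL);
    return c->heap_key ? c->heap_key : c->inline_key;
}

void cookie_free_key(struct cookie *c)
{
    assert(c != NULL);
    free(c->heap_key);
    c->heap_key = NULL;
    c->key_len = 0;
}
```

      Since the cookie owns its copy, the key remains readable without calling back into the owner of the original data.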
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
      Tested-by: Steve Dickson <steved@redhat.com>
  4. 06 Feb, 2018 2 commits
      afs: Fix server list handling · 45df8462
      David Howells authored
      Fix server list handling in the following ways:
       (1) In afs_alloc_volume(), remove duplicate server list build code.  This
           was already done by afs_alloc_server_list() which afs_alloc_volume()
           previously called.  This just results in twice as many VL RPCs.
       (2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server
           records indicated by ->nServers in the UVLDB record returned by the
           VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS
           slots.  Unused slots may contain garbage.
       (3) In afs_alloc_server_list(), don't stop converting a UVLDB record into
           a server list just because we can't look up one of the servers.  Just
           skip that server and go on to the next.  If we can't look up any of
           the servers then we'll fail at the end.
      Without this patch, an attempt to view the umich.edu root cell using
      something like "ls /afs/umich.edu" on a dynamic root (future patch) mount
      or an autocell mount will result in ENOMEDIUM.  The failure is due to kafs
      not stopping after nServers' worth of records have been read, but then
      trying to access a server with a garbage UUID and getting an error, which
      aborts the server list build.
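
      Points (2) and (3) above can be illustrated with a small userspace sketch: only the slots covered by the server count are examined, and a failed lookup skips that server instead of aborting the build. All names here (`vldb_entry`, `MAX_SLOTS`, `lookup_fn`, `lookup_positive`, `build_server_list`) are hypothetical stand-ins, not the kafs code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 13   /* hypothetical slot count standing in for NMAXNSERVERS */

struct vldb_entry {
    unsigned int n_servers;      /* number of valid slots */
    int server_id[MAX_SLOTS];    /* slots past n_servers may hold garbage */
};

/* Hypothetical lookup callback: 0 on success, -1 if the server is unknown. */
typedef int (*lookup_fn)(int server_id);

/* Demo lookup used for illustration: only positive IDs are "known". */
int lookup_positive(int server_id)
{
    return server_id > 0 ? 0 : -1;
}

/*
 * Build the list from the valid slots only.  Returns the number of servers
 * stored in out[], or -1 if none of them could be looked up.
 */
int build_server_list(const struct vldb_entry *e, lookup_fn lookup, int *out)
{
    unsigned int n, i;
    int count = 0;

    assert(e != NULL && lookup != NULL && out != NULL);
    n = e->n_servers < MAX_SLOTS ? e->n_servers : MAX_SLOTS;
    for (i = 0; i < n; i++) {
        if (lookup(e->server_id[i]) < 0)
            continue;            /* skip this server and go on to the next */
        out[count++] = e->server_id[i];
    }
    return count > 0 ? count : -1;   /* fail only at the end */
}
```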
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: stable@vger.kernel.org
      afs: Add missing afs_put_cell() · e4415015
      David Howells authored
      afs_alloc_volume() needs to release the cell ref it obtained in the case of
      an error.  Fix this by adding an afs_put_cell() call into the error path.
      This can be triggered when a lookup for a cell in a dynamic root or an
      autocell mount returns an error whilst trying to look up the server (such
      as ENOMEDIUM).  This results in an assertion failure oops when the module
      is unloaded due to outstanding refs on a cell record.
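
      The shape of the fix, a reference taken early and dropped again on the error path, can be sketched as follows. This is a userspace illustration with hypothetical names (`cell`, `volume`, `get_cell`, `put_cell`, `alloc_volume`), and the `simulate_failure` flag stands in for a failed server lookup.

```c
#include <assert.h>
#include <stdlib.h>

struct cell { int refs; };
struct volume { struct cell *cell; };

struct cell *get_cell(struct cell *c)
{
    assert(c != NULL);
    c->refs++;
    return c;
}

void put_cell(struct cell *c)
{
    assert(c != NULL && c->refs > 0);
    c->refs--;
}

/* simulate_failure stands in for an error while looking up the servers. */
struct volume *alloc_volume(struct cell *cell, int simulate_failure)
{
    struct volume *v = malloc(sizeof(*v));

    if (!v)
        return NULL;
    v->cell = get_cell(cell);    /* reference obtained here ... */
    if (simulate_failure)
        goto error;
    return v;

error:
    put_cell(v->cell);           /* ... must be released on failure */
    free(v);
    return NULL;
}
```

      Without the `put_cell()` call in the error path, the count never returns to its starting value, which corresponds to the outstanding refs reported at module unload.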
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: stable@vger.kernel.org
  5. 13 Nov, 2017 9 commits
      afs: Make use of the YFS service upgrade to fully support IPv6 · bf99a53c
      David Howells authored
      YFS VL servers offer an upgraded Volume Location service that can return
      IPv6 addresses to fileservers and volume servers in addition to IPv4
      addresses using the YFSVL.GetEndpoints operation which we should use if
      it's available.
      To this end:
       (1) Make rxrpc_kernel_recv_data() return the call's current service ID so
           that the caller can detect service upgrade and see what the service
           was upgraded to.
       (2) When we see a VL server address we haven't seen before, send a
           VL.GetCapabilities operation to it with the service upgrade bit set.
           If we get an upgrade to the YFS VL service, change the service ID in
           the address list for that address to use the upgraded service and set
           a flag to note that this appears to be a YFS-compatible server.
       (3) If, when a server's addresses are being looked up, we note that we
           previously detected a YFS-compatible server, then send the
           YFSVL.GetEndpoints operation rather than VL.GetAddrsU.
       (4) Build a fileserver address list from the reply of YFSVL.GetEndpoints,
           including both IPv4 and IPv6 addresses.  Volume server addresses are
       (5) The address list is sorted by address and port now, instead of just
           address.  This allows multiple servers on the same host sitting on
           different ports.
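
      Point (5) amounts to a two-level sort key. A minimal userspace sketch, assuming a simplified endpoint record rather than the kernel's address structures, might look like this; `endpoint`, `endpoint_cmp` and `sort_endpoints` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical endpoint record. */
struct endpoint {
    uint8_t addr[16];   /* address bytes in network order */
    uint16_t port;      /* host byte order, for simplicity */
};

/* Order by address first, then by port. */
int endpoint_cmp(const void *a, const void *b)
{
    const struct endpoint *x = a;
    const struct endpoint *y = b;
    int diff = memcmp(x->addr, y->addr, sizeof(x->addr));

    if (diff != 0)
        return diff;
    return (x->port > y->port) - (x->port < y->port);
}

void sort_endpoints(struct endpoint *eps, size_t n)
{
    assert(eps != NULL || n == 0);
    qsort(eps, n, sizeof(*eps), endpoint_cmp);
}
```

      With the port included in the key, two servers on the same host but different ports remain distinct entries instead of colliding on the address.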
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Overhaul volume and server record caching and fileserver rotation · d2ddc776
      David Howells authored
      The current code assumes that volumes and servers are per-cell and are
      never shared, but this is not enforced, and, indeed, public cells do exist
      that are aliases of each other.  Further, an organisation can, say, set up
      a public cell and a private cell with overlapping, but not identical, sets
      of servers.  The difference is purely in the database attached to the VL
      The current code will malfunction if it sees a server in two cells as it
      assumes global address -> server record mappings and that each server is in
      just one cell.
      Further, each server may have multiple addresses - and may have addresses
      of different families (IPv4 and IPv6, say).
      To this end, the following structural changes are made:
       (1) Server record management is overhauled:
           (a) Server records are made independent of cell.  The namespace keeps
           	 track of them, volume records have lists of them and each vnode
           	 has a server on which its callback interest currently resides.
           (b) The cell record no longer keeps a list of servers known to be in
           	 that cell.
           (c) The server records are now kept in a flat list because there's no
           	 single address to sort on.
           (d) Server records are now keyed by their UUID within the namespace.
           (e) The addresses for a server are obtained with the VL.GetAddrsU
           	 rather than with VL.GetEntryByName, using the server's UUID as a
           (f) Cached server records are garbage collected after a period of
           	 non-use and are counted out of existence before purging is allowed
           	 to complete.  This protects the work functions against rmmod.
           (g) The servers list is now in /proc/fs/afs/servers.
       (2) Volume record management is overhauled:
           (a) An RCU-replaceable server list is introduced.  This tracks both
           	 servers and their corresponding callback interests.
           (b) The superblock is now keyed on cell record and numeric volume ID.
           (c) The volume record is now tied to the superblock which mounts it,
           	 and is activated when mounted and deactivated when unmounted.
           	 This makes it easier to handle the cache cookie without causing a
           	 double-use in fscache.
           (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
           	 to get the server UUID list.
           (e) The volume name is updated if it is seen to have changed when the
           	 volume is updated (the update is keyed on the volume ID).
       (3) The vlocation record is got rid of and VLDB records are no longer
           cached.  Sufficient information is stored in the volume record, though
           an update to a volume record is now no longer shared between related
           volumes (volumes come in bundles of three: R/W, R/O and backup).
      and the following procedural changes are made:
       (1) The fileserver cursor introduced previously is now fleshed out and
           used to iterate over fileservers and their addresses.
       (2) Volume status is checked during iteration, and the server list is
           replaced if a change is detected.
       (3) Server status is checked during iteration, and the address list is
           replaced if a change is detected.
       (4) The abort code is saved into the address list cursor and -ECONNABORTED
           returned in afs_make_call() if a remote abort happened rather than
           translating the abort into an error message.  This allows actions to
           be taken depending on the abort code more easily.
           (a) If a VMOVED abort is seen then this is handled by rechecking the
           	 volume and restarting the iteration.
           (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
               handled by sleeping for a short period and retrying and/or trying
               other servers that might serve that volume.  A message is also
               displayed once until the condition has cleared.
           (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
           (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
           	 see if it has been deleted; if not, the fileserver is probably
           	 indicating that the volume couldn't be attached and needs
           (e) If statfs() sees one of these aborts, it does not sleep, but
           	 rather returns an error, so as not to block the umount program.
       (5) The fileserver iteration functions in vnode.c are now merged into
           their callers and more heavily macroised around the cursor.  vnode.c
           is removed.
       (6) Operations on a particular vnode are serialised on that vnode because
           the server will lock that vnode whilst it operates on it, so a second
           op sent will just have to wait.
       (7) Fileservers are probed with FS.GetCapabilities before being used.
           This is where service upgrade will be done.
       (8) A callback interest on a fileserver is set up before an FS operation
           is performed and passed through to afs_make_call() so that it can be
           set on the vnode if the operation returns a callback.  The callback
           interest is passed through to afs_iget() also so that it can be set
           there too.
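
      The abort handling in procedural change (4) is essentially a dispatch on the saved abort code. The sketch below is a loose userspace illustration: the enum values are placeholders rather than the wire values, the action names are hypothetical, VOFFLINE is simply treated like VBUSY, and the statfs exception is assumed to apply to the aborts that would otherwise sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Symbolic abort codes; numeric values are placeholders, not wire values. */
enum abort_code {
    AB_VMOVED = 1, AB_VBUSY, AB_VRESTARTING, AB_VSALVAGING,
    AB_VOFFLINE, AB_VNOVOL, AB_OTHER
};

/* Hypothetical actions the iteration loop could take. */
enum action {
    ACT_RECHECK_VOLUME_AND_RESTART,
    ACT_SLEEP_AND_RETRY,
    ACT_RECHECK_VLDB,
    ACT_RETURN_ERROR
};

struct op_state {
    enum abort_code abort_code;   /* saved by the call layer */
    bool is_statfs;               /* statfs must not sleep */
};

enum action next_action(const struct op_state *op)
{
    assert(op != NULL);
    switch (op->abort_code) {
    case AB_VMOVED:
        return ACT_RECHECK_VOLUME_AND_RESTART;
    case AB_VBUSY:
    case AB_VRESTARTING:
    case AB_VSALVAGING:
    case AB_VOFFLINE:             /* assumption: handled like VBUSY */
        return op->is_statfs ? ACT_RETURN_ERROR : ACT_SLEEP_AND_RETRY;
    case AB_VNOVOL:
        return ACT_RECHECK_VLDB;
    default:
        return ACT_RETURN_ERROR;
    }
}
```

      Keeping the raw abort code available to the caller, rather than translating it to an errno immediately, is what makes this kind of per-code decision straightforward.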
      In general, record updating is done on an as-needed basis when we try to
      access servers, volumes or vnodes rather than offloading it to work items
      and special threads.
       (1) Pre AFS-3.4 servers are no longer supported, though this can be added
           back if necessary (AFS-3.4 was released in 1998).
       (2) VBUSY is retried forever for the moment at intervals of 1s.
       (3) /proc/fs/afs/<cell>/servers no longer exists.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Move server rotation code into its own file · 9cc6fc50
      David Howells authored
      Move server rotation code into its own file.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Add an address list concept · 8b2a464c
      David Howells authored
      Add an RCU replaceable address list structure to hold a list of server
      addresses.  The list also holds the
      To this end:
       (1) A cell's VL server address list can be loaded directly via insmod or
           echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB
           or SRV records.
       (2) Anyone wanting to use a cell's VL server address must wait until the
           cell record comes online and has tried to obtain some addresses.
       (3) An FS server's address list, for the moment, has a single entry that
           is the key to the server list.  This will change in the future when a
           server is instead keyed on its UUID and the VL.GetAddrsU operation is
       (4) An 'address cursor' concept is introduced to handle iteration through
           the address list.  This is passed to the afs_make_call() as, in the
           future, stuff (such as abort code) that doesn't outlast the call will
           be returned in it.
      In the future, we might want to annotate the list with information about
      how each address fares.  We might then want to propagate such annotations
      over address list replacement.
      Whilst we're at it, we allow IPv6 addresses to be specified in
      colon-delimited lists by enclosing them in square brackets.
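
      To show why the brackets are needed, here is a small userspace tokenizer for a colon-delimited address list in which IPv6 addresses are wrapped in square brackets. It is only a sketch of the syntax described above; `next_address` is a hypothetical helper, and it does not validate the addresses themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the next address from *pp into buf and advance *pp past it.
 * Returns 1 if an address was extracted, 0 at end of input and
 * -1 on malformed input or if buf is too small.
 */
int next_address(const char **pp, char *buf, size_t buflen)
{
    const char *p, *end;
    size_t len;

    assert(pp != NULL && *pp != NULL && buf != NULL);
    p = *pp;
    if (*p == '\0')
        return 0;

    if (*p == '[') {
        p++;                          /* colons inside brackets are data */
        end = strchr(p, ']');
        if (!end)
            return -1;
        len = (size_t)(end - p);
        end++;                        /* step past ']' */
        if (*end != '\0' && *end != ':')
            return -1;
    } else {
        end = strchr(p, ':');         /* an unbracketed colon is a delimiter */
        if (!end)
            end = p + strlen(p);
        len = (size_t)(end - p);
    }

    if (len == 0 || len >= buflen)
        return -1;
    memcpy(buf, p, len);
    buf[len] = '\0';

    if (*end == ':')
        end++;                        /* consume the delimiter */
    *pp = end;
    return 1;
}
```

      Called repeatedly on `10.0.0.1:[2001:db8::1]:10.0.0.2`, the helper yields the three addresses in order; without the brackets the IPv6 address would be split at each of its own colons.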
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Overhaul the callback handling · c435ee34
      David Howells authored
      Overhaul the AFS callback handling by the following means:
       (1) Don't give up callback promises on vnodes that we are no longer using,
           rather let them just expire on the server or let the server break
           them.  This is actually more efficient for the server as the callback
           lookup is expensive if there are lots of extant callbacks.
       (2) Only give up the callback promises we have from a server when the
           server record is destroyed.  Then we can just give up *all* the
           callback promises on it in one go.
       (3) Servers can end up being shared between cells if cells are aliased, so
           don't add all the vnodes being backed by a particular server into a
           big FID-indexed tree on that server as there may be duplicates.
           Instead have each volume instance (~= superblock) register an interest
           in a server as it starts to make use of it and use this to allow the
           processor for callbacks from the server to find the superblock and
           thence the inode corresponding to the FID being broken by means of
       (4) Rather than iterating over the entire callback list when a mass-break
           comes in from the server, maintain a counter of mass-breaks in
           afs_server (cb_seq) and make afs_validate() check it against the copy
           in afs_vnode.
           It would be nice not to have to take a read_lock whilst doing this,
           but that's tricky without using RCU.
       (5) Save a ref on the fileserver we're using for a call in the afs_call
           struct so that we can access its cb_s_break during call decoding.
       (6) Write-lock around callback and status storage in a vnode and read-lock
           around getattr so that we don't see the status mid-update.
      This has the following consequences:
       (1) Data invalidation isn't seen until someone calls afs_validate() on a
           vnode.  Unfortunately, we need to use a key to query the server, but
           getting one from a background thread is tricky without caching loads
           of keys all over the place.
       (2) Mass invalidation isn't seen until someone calls afs_validate().
       (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
           Could this be replaced with rcu_read_lock(), since inodes are destroyed
           under RCU conditions?
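
      The counter scheme in point (4) can be illustrated with a short userspace sketch: a mass break only bumps a per-server counter, and each vnode compares its saved copy against it at validation time. The structures and names below (`server`, `vnode`, `mass_breaks`, `seen_breaks`) are hypothetical and omit all locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct server {
    unsigned int mass_breaks;     /* incremented on every mass break */
};

struct vnode {
    const struct server *server;
    unsigned int seen_breaks;     /* server counter when the promise was stored */
    bool has_promise;
};

/* Record a callback promise together with the server's current counter. */
void vnode_record_callback(struct vnode *v, const struct server *s)
{
    assert(v != NULL && s != NULL);
    v->server = s;
    v->seen_breaks = s->mass_breaks;
    v->has_promise = true;
}

/* A mass break is O(1): no walk over the list of callbacks. */
void server_mass_break(struct server *s)
{
    assert(s != NULL);
    s->mass_breaks++;
}

/* Validation notices the break lazily by comparing the two counters. */
bool vnode_needs_revalidation(const struct vnode *v)
{
    assert(v != NULL);
    return !v->has_promise || v->seen_breaks != v->server->mass_breaks;
}
```

      This also shows the consequence noted above: nothing happens to a vnode at the moment of the mass break; the invalidation is only observed when the vnode is next validated.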
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr · 4d9df986
      David Howells authored
      Keep and pass sockaddr_rxrpc addresses around rather than keeping and
      passing in_addr addresses to allow for the use of IPv6 and non-standard
      port numbers in future.
      This also allows the port and service_id fields to be removed from the
      afs_call struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Update the cache index structure · ad6a942a
      David Howells authored
      Update the cache index structure in the following ways:
       (1) Don't use the volume name followed by the volume type as levels in the
           cache index.  Volumes can be renamed.  Use the volume ID instead.
       (2) Don't store the VLDB data for a volume in the tree.  If the volume
           database should be cached locally, then it should be done in a separate
       (3) Expand the volume ID stored in the cache to 64 bits.
       (4) Expand the file/vnode ID stored in the cache to 96 bits.
       (5) Increment the cache structure version number to 1.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Push the net ns pointer to more places · 9ed900b1
      David Howells authored
      Push the network namespace pointer to more places in AFS, including the
      afs_server structure (which doesn't hold a ref on the netns).
      In particular, afs_put_cell() now requires a net ns parameter so that
      it can safely alter the netns after decrementing the cell usage count - the
      cell will be deallocated by a background thread after being cached for a
      period, which means that it's not safe to access it after reducing its
      usage count.
      Signed-off-by: David Howells <dhowells@redhat.com>
      afs: Lay the groundwork for supporting network namespaces · f044c884
      David Howells authored
      Lay the groundwork for supporting network namespaces (netns) to the AFS
      filesystem by moving various global features to a network-namespace struct
      (afs_net) and providing an instance of this as a temporary global variable
      that everything uses via accessor functions for the moment.
      The following changes have been made:
       (1) Store the netns in the superblock info.  This will be obtained from
           the mounter's nsproxy on a manual mount and inherited from the parent
           superblock on an automount.
       (2) The cell list is made per-netns.  It can be viewed through
           /proc/net/afs/cells and also be modified by writing commands to that
       (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
           This is unset by default.
       (4) The 'rootcell' module parameter, which sets a cell and VL server list,
           modifies the init net namespace, thereby allowing an AFS root fs to be
           theoretically used.
       (5) The volume location lists and the file lock manager are made
       (6) The AF_RXRPC socket and associated I/O bits are made per-ns.
      The various workqueues remain global for the moment.
      Changes still to be made:
       (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
           from the old name.
       (2) A per-netns subsys needs to be registered for AFS into which it can
           store its per-netns data.
       (3) Rather than the AF_RXRPC socket being opened on module init, it needs
           to be opened on the creation of a superblock in that netns.
       (4) The socket needs to be closed when the last superblock using it is
           destroyed and all outstanding client calls on it have been completed.
           This prevents a reference loop on the namespace.
       (5) It is possible that several namespaces will want to use AFS, in which
           case each one will need its own UDP port.  These can either be set
           through /proc/net/afs/cm_port or the kernel can pick one at random.
           The init_ns gets 7001 by default.
      Other issues that need resolving:
       (1) The DNS keyring needs net-namespacing.
       (2) Where do upcalls go (eg. DNS request-key upcall)?
       (3) Need something like open_socket_in_file_ns() syscall so that AFS
           command line tools attempting to operate on an AFS file/volume have
           their RPC calls go to the right place.
      Signed-off-by: David Howells <dhowells@redhat.com>
  6. 20 Apr, 2017 1 commit
  7. 06 Jan, 2017 1 commit
  8. 20 Jan, 2015 1 commit
  9. 27 Sep, 2013 1 commit
      FS-Cache: Provide the ability to enable/disable cookies · 94d30ae9
      David Howells authored
      Provide the ability to enable and disable fscache cookies.  A disabled cookie
      will reject or ignore further requests to:
      	Acquire a child cookie
      	Invalidate and update backing objects
      	Check the consistency of a backing object
      	Allocate storage for backing page
      	Read backing pages
      	Write to backing pages
      but still allows:
      	Checks/waits on the completion of already in-progress objects
      	Uncaching of pages
      	Relinquishment of cookies
      Two new operations are provided:
       (1) Disable a cookie:
      	void fscache_disable_cookie(struct fscache_cookie *cookie,
      				    bool invalidate);
           If the cookie is not already disabled, this locks the cookie against other
           dis/enablement ops, marks the cookie as being disabled, discards or
           invalidates any backing objects and waits for cessation of activity on any
           associated object.
           This is a wrapper around a chunk split out of fscache_relinquish_cookie(),
           but it reinitialises the cookie such that it can be reenabled.
           All possible failures are handled internally.  The caller should consider
           calling fscache_uncache_all_inode_pages() afterwards to make sure all page
           markings are cleared up.
       (2) Enable a cookie:
      	void fscache_enable_cookie(struct fscache_cookie *cookie,
      				   bool (*can_enable)(void *data),
      				   void *data);
           If the cookie is not already enabled, this locks the cookie against other
           dis/enablement ops, invokes can_enable() and, if the cookie is not an
           index cookie, will begin the procedure of acquiring backing objects.
           The optional can_enable() function is passed the data argument and returns
           a ruling as to whether or not enablement should actually be permitted to
           All possible failures are handled internally.  The cookie will only be
           marked as enabled if provisional backing objects are allocated.
      A later patch will introduce these to NFS.  Cookie enablement during nfs_open()
      is then contingent on i_writecount <= 0.  can_enable() checks for a race
      between open(O_RDONLY) and open(O_WRONLY/O_RDWR).  This simplifies NFS's cookie
      handling and allows us to get rid of open(O_RDONLY) accidentally introducing
      caching to an inode that's open for writing already.
      One operation has its API modified:
       (3) Acquire a cookie.
      	struct fscache_cookie *fscache_acquire_cookie(
      		struct fscache_cookie *parent,
      		const struct fscache_cookie_def *def,
      		void *netfs_data,
      		bool enable);
           This now has an additional argument that indicates whether the requested
           cookie should be enabled by default.  It doesn't need the can_enable()
           function because the caller must prevent multiple calls for the same netfs
           object and it doesn't need to take the enablement lock because no one else
           can get at the cookie before this returns.
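
      The enable/veto flow can be mimicked in userspace to make the role of can_enable() concrete. The sketch below is a mock, not the fscache implementation: `mock_cookie`, `mock_enable_cookie`, `mock_disable_cookie` and `no_writers` are hypothetical names, locking and backing objects are omitted, and the veto is modelled on the NFS write-count condition mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock cookie: only tracks the enabled state. */
struct mock_cookie {
    bool enabled;
};

/* Enable unless already enabled or the optional callback vetoes it. */
void mock_enable_cookie(struct mock_cookie *cookie,
                        bool (*can_enable)(void *data),
                        void *data)
{
    assert(cookie != NULL);
    if (cookie->enabled)
        return;
    if (can_enable && !can_enable(data))
        return;                   /* enablement not permitted */
    cookie->enabled = true;
}

/* Disable; the invalidate flag is accepted but ignored by this mock. */
void mock_disable_cookie(struct mock_cookie *cookie, bool invalidate)
{
    assert(cookie != NULL);
    (void)invalidate;
    cookie->enabled = false;
}

/* Example veto: permit caching only while nobody has the file open for writing. */
bool no_writers(void *data)
{
    const int *writecount = data;

    assert(writecount != NULL);
    return *writecount <= 0;
}
```

      Passing the decision as a callback lets the check run while the cookie is locked against other enable and disable operations, which is the point at which a racing open for write would otherwise slip through.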
      Signed-off-by: David Howells <dhowells@redhat.com>
  10. 22 Apr, 2010 1 commit
  11. 03 Apr, 2009 1 commit
  12. 21 May, 2007 1 commit
      Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan authored
      The first thing mm.h does is include sched.h, solely for the can_do_mlock()
      inline function, which dereferences "current". By dealing with can_do_mlock(),
      mm.h can be detached from sched.h, which is good; see below for why.
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      Cross-compile tested on
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      as well as my two usual configs.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 26 Apr, 2007 4 commits
  14. 27 Sep, 2006 1 commit
  15. 16 Apr, 2005 1 commit
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!