Commit 126e76ff authored by Linus Torvalds

Merge branch 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block

Pull followup block layer updates from Jens Axboe:
 "I ended up splitting the main pull request for this series into two,
  mainly because of clashes between NVMe fixes that went into 4.13 after
  the for-4.14 branches were split off. This pull request is mostly
  NVMe, but not exclusively. In detail, it contains:

   - Two pull requests for NVMe changes from Christoph. Nothing new on
     the feature front, basically just fixes all over the map for the
     core bits, transport, rdma, etc.

   - Series from Bart, cleaning up various bits in the BFQ scheduler.

   - Series of bcache fixes, which has been lingering for a release or
     two. Coly sent this in, but it contains patches from various
     people in this area.

   - Set of patches for BFQ from Paolo himself, updating both
     documentation and fixing some corner cases in performance.

   - Series from Omar, attempting to now get the 4k loop support
     correct. Our confidence level is higher this time.

   - Series from Shaohua for loop as well, improving O_DIRECT
     performance and fixing a use-after-free"

* 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block: (74 commits)
  bcache: initialize dirty stripes in flash_dev_run()
  loop: set physical block size to logical block size
  bcache: fix bch_hprint crash and improve output
  bcache: Update continue_at() documentation
  bcache: silence static checker warning
  bcache: fix for gc and write-back race
  bcache: increase the number of open buckets
  bcache: Correct return value for sysfs attach errors
  bcache: correct cache_dirty_target in __update_writeback_rate()
  bcache: gc does not work when triggering by manual command
  bcache: Don't reinvent the wheel but use existing llist API
  bcache: do not subtract sectors_to_gc for bypassed IO
  bcache: fix sequential large write IO bypass
  bcache: Fix leak of bdev reference
  block/loop: remove unused field
  block/loop: fix use after free
  bfq: Use icq_to_bic() consistently
  bfq: Suppress compiler warnings about comparisons
  bfq: Check kstrtoul() return value
  bfq: Declare local functions static
  ...
parents fbd01410 175206cf
......@@ -16,14 +16,16 @@ throughput. So, when needed for achieving a lower latency, BFQ builds
schedules that may lead to a lower throughput. If your main or only
goal, for a given device, is to achieve the maximum-possible
throughput at all times, then do switch off all low-latency heuristics
for that device, by setting low_latency to 0. Full details in Section 3.
for that device, by setting low_latency to 0. See Section 3 for
details on how to configure BFQ for the desired tradeoff between
latency and throughput, or on how to maximize throughput.
On average CPUs, the current version of BFQ can handle devices
performing at most ~30K IOPS; at most ~50 KIOPS on faster CPUs. As a
reference, 30-50 KIOPS correspond to very high bandwidths with
sequential I/O (e.g., 8-12 GB/s if I/O requests are 256 KB large), and
to 120-200 MB/s with 4KB random I/O. BFQ has not yet been tested on
multi-queue devices.
to 120-200 MB/s with 4KB random I/O. BFQ is currently being tested on
multi-queue devices too.
The table of contents follows. The impatient can jump straight to Section 3.
......@@ -33,7 +35,7 @@ CONTENTS
1-1 Personal systems
1-2 Server systems
2. How does BFQ work?
3. What are BFQ's tunable?
3. What are BFQ's tunables and how to properly configure BFQ?
4. BFQ group scheduling
4-1 Service guarantees provided
4-2 Interface
......@@ -145,19 +147,28 @@ plus a lot of code, are borrowed from CFQ.
contrast, BFQ may idle the device for a short time interval,
giving the process the chance to go on being served if it issues
a new request in time. Device idling typically boosts the
throughput on rotational devices, if processes do synchronous
and sequential I/O. In addition, under BFQ, device idling is
also instrumental in guaranteeing the desired throughput
fraction to processes issuing sync requests (see the description
of the slice_idle tunable in this document, or [1, 2], for more
details).
throughput on rotational devices and on non-queueing flash-based
devices, if processes do synchronous and sequential I/O. In
addition, under BFQ, device idling is also instrumental in
guaranteeing the desired throughput fraction to processes
issuing sync requests (see the description of the slice_idle
tunable in this document, or [1, 2], for more details).
- With respect to idling for service guarantees, if several
processes are competing for the device at the same time, but
all processes (and groups, after the following commit) have
the same weight, then BFQ guarantees the expected throughput
distribution without ever idling the device. Throughput is
thus as high as possible in this common scenario.
all processes and groups have the same weight, then BFQ
guarantees the expected throughput distribution without ever
idling the device. Throughput is thus as high as possible in
this common scenario.
- On flash-based storage with internal queueing of commands
(typically NCQ), device idling happens to be always detrimental
for throughput. So, with these devices, BFQ performs idling
only when strictly needed for service guarantees, i.e., for
guaranteeing low latency or fairness. In these cases, overall
throughput may be sub-optimal. No solution currently exists to
provide both strong service guarantees and optimal throughput
on devices with internal queueing.
- If low-latency mode is enabled (default configuration), BFQ
executes some special heuristics to detect interactive and soft
......@@ -191,10 +202,7 @@ plus a lot of code, are borrowed from CFQ.
- Queues are scheduled according to a variant of WF2Q+, named
B-WF2Q+, and implemented using an augmented rb-tree to preserve an
O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
also ready for hierarchical scheduling. However, for a cleaner
logical breakdown, the code that enables and completes
hierarchical support is provided in the next commit, which focuses
exactly on this feature.
also ready for hierarchical scheduling, details in Section 4.
- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
perfectly fair, and smooth service. In particular, B-WF2Q+
......@@ -249,13 +257,24 @@ plus a lot of code, are borrowed from CFQ.
the Idle class, to prevent it from starving.
3. What are BFQ's tunable?
==========================
3. What are BFQ's tunables and how to properly configure BFQ?
=============================================================
Most BFQ tunables affect service guarantees (basically latency and
fairness) and throughput. For full details on how to choose the
desired tradeoff between service guarantees and throughput, see the
parameters slice_idle, strict_guarantees and low_latency. For details
on how to maximise throughput, see slice_idle, timeout_sync and
max_budget. The other performance-related parameters have been
inherited from, and have been preserved mostly for compatibility with,
CFQ. So far, no performance improvement has been reported after
changing the latter parameters in BFQ.
The tunables back_seek-max, back_seek_penalty, fifo_expire_async and
fifo_expire_sync below are the same as in CFQ. Their description is
just copied from that for CFQ. Some considerations in the description
of slice_idle are copied from CFQ too.
In particular, the tunables back_seek_max, back_seek_penalty,
fifo_expire_async and fifo_expire_sync below are the same as in
CFQ. Their description is just copied from that for CFQ. Some
considerations in the description of slice_idle are copied from CFQ
too.
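As a rough illustration (not from the original text), the tunables mentioned above are exposed per device under the I/O scheduler's sysfs directory; a minimal sketch that dumps their current values, assuming sysfs is mounted at /sys and /dev/sda is using bfq:

#include <stdio.h>

#define IOSCHED_DIR "/sys/block/sda/queue/iosched/"	/* hypothetical device */

int main(void)
{
	const char *tunables[] = {
		"slice_idle", "strict_guarantees", "low_latency",
		"timeout_sync", "max_budget",
	};
	char path[256], buf[64];

	for (unsigned int i = 0; i < sizeof(tunables) / sizeof(tunables[0]); i++) {
		FILE *f;

		snprintf(path, sizeof(path), IOSCHED_DIR "%s", tunables[i]);
		f = fopen(path, "r");
		if (!f)
			continue;	/* attribute missing: bfq not active here */
		if (fgets(buf, sizeof(buf), f))
			printf("%-20s %s", tunables[i], buf);
		fclose(f);
	}
	return 0;
}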
per-process ioprio and weight
-----------------------------
......@@ -285,15 +304,17 @@ number of seeks and see improved throughput.
Setting slice_idle to 0 will remove all the idling on queues and one
should see an overall improved throughput on faster storage devices
like multiple SATA/SAS disks in hardware RAID configuration.
like multiple SATA/SAS disks in hardware RAID configuration, as well
as flash-based storage with internal command queueing (and
parallelism).
So depending on storage and workload, it might be useful to set
slice_idle=0. In general for SATA/SAS disks and software RAID of
SATA/SAS disks keeping slice_idle enabled should be useful. For any
configurations where there are multiple spindles behind single LUN
(Host based hardware RAID controller or for storage arrays), setting
slice_idle=0 might end up in better throughput and acceptable
latencies.
(Host based hardware RAID controller or for storage arrays), or with
flash-based fast storage, setting slice_idle=0 might end up in better
throughput and acceptable latencies.
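For instance, a minimal sketch of the throughput-oriented setup just described (device name and sysfs layout assumed; only meaningful if bfq is the active scheduler for that disk):

#include <stdio.h>

/* Write a value to a bfq tunable of the (hypothetical) disk sda. */
static int bfq_set(const char *tunable, const char *val)
{
	char path[128];
	FILE *f;
	int ret;

	snprintf(path, sizeof(path),
		 "/sys/block/sda/queue/iosched/%s", tunable);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = fputs(val, f) >= 0 ? 0 : -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

int main(void)
{
	/* Throughput first: no idling, no low-latency heuristics. */
	if (bfq_set("slice_idle", "0") || bfq_set("low_latency", "0"))
		fprintf(stderr, "failed to update bfq tunables\n");
	return 0;
}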
Idling is however necessary to have service guarantees enforced in
case of differentiated weights or differentiated I/O-request lengths.
......@@ -312,13 +333,14 @@ There is an important flipside for idling: apart from the above cases
where it is beneficial also for throughput, idling can severely impact
throughput. One important case is random workload. Because of this
issue, BFQ tends to avoid idling as much as possible, when it is not
beneficial also for throughput. As a consequence of this behavior, and
of further issues described for the strict_guarantees tunable,
short-term service guarantees may be occasionally violated. And, in
some cases, these guarantees may be more important than guaranteeing
maximum throughput. For example, in video playing/streaming, a very
low drop rate may be more important than maximum throughput. In these
cases, consider setting the strict_guarantees parameter.
beneficial also for throughput (as detailed in Section 2). As a
consequence of this behavior, and of further issues described for the
strict_guarantees tunable, short-term service guarantees may be
occasionally violated. And, in some cases, these guarantees may be
more important than guaranteeing maximum throughput. For example, in
video playing/streaming, a very low drop rate may be more important
than maximum throughput. In these cases, consider setting the
strict_guarantees parameter.
strict_guarantees
-----------------
......@@ -420,6 +442,13 @@ The default value is 0, which enables auto-tuning: BFQ sets max_budget
to the maximum number of sectors that can be served during
timeout_sync, according to the estimated peak rate.
For specific devices, some users have occasionally reported reaching a
higher throughput by setting max_budget explicitly, i.e., by setting
max_budget to a value higher than 0. In particular, they have set
max_budget to higher values than those to which BFQ would have set it
with auto-tuning. An alternative way to achieve this goal is to just
increase the value of timeout_sync, leaving max_budget equal to 0.
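A back-of-the-envelope version of that auto-tuning rule, with purely illustrative numbers (the in-kernel computation is more involved):

#include <stdio.h>

int main(void)
{
	double peak_rate_mb_s = 500.0;	/* assumed estimated peak rate */
	double timeout_sync_s = 0.125;	/* assumed timeout_sync of 125 ms */
	double sector_bytes = 512.0;

	/* Sectors that can be served at peak rate within timeout_sync. */
	printf("auto-tuned max_budget ~= %.0f sectors\n",
	       peak_rate_mb_s * 1e6 * timeout_sync_s / sector_bytes);
	return 0;
}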
weights
-------
......@@ -427,51 +456,6 @@ Read-only parameter, used to show the weights of the currently active
BFQ queues.
wr_ tunables
------------
BFQ exports a few parameters to control/tune the behavior of
low-latency heuristics.
wr_coeff
Factor by which the weight of a weight-raised queue is multiplied. If
the queue is deemed soft real-time, then the weight is further
multiplied by an additional, constant factor.
wr_max_time
Maximum duration of a weight-raising period for an interactive task
(ms). If set to zero (default value), then this value is computed
automatically, as a function of the peak rate of the device. In any
case, when the value of this parameter is read, it always reports the
current duration, regardless of whether it has been set manually or
computed automatically.
wr_max_softrt_rate
Maximum service rate below which a queue is deemed to be associated
with a soft real-time application, and is then weight-raised
accordingly (sectors/sec).
wr_min_idle_time
Minimum idle period after which interactive weight-raising may be
reactivated for a queue (in ms).
wr_rt_max_time
Maximum weight-raising duration for soft real-time queues (in ms). The
start time from which this duration is considered is automatically
moved forward if the queue is detected to be still soft real-time
before the current soft real-time weight-raising period finishes.
wr_min_inter_arr_async
Minimum period between I/O request arrivals after which weight-raising
may be reactivated for an already busy async queue (in ms).
4. Group scheduling with BFQ
============================
......
......@@ -206,7 +206,7 @@ static void bfqg_get(struct bfq_group *bfqg)
bfqg->ref++;
}
void bfqg_put(struct bfq_group *bfqg)
static void bfqg_put(struct bfq_group *bfqg)
{
bfqg->ref--;
......@@ -385,7 +385,7 @@ static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg *blkcg)
return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
}
struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
static struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
{
struct bfq_group_data *bgd;
......@@ -395,7 +395,7 @@ struct blkcg_policy_data *bfq_cpd_alloc(gfp_t gfp)
return &bgd->pd;
}
void bfq_cpd_init(struct blkcg_policy_data *cpd)
static void bfq_cpd_init(struct blkcg_policy_data *cpd)
{
struct bfq_group_data *d = cpd_to_bfqgd(cpd);
......@@ -403,12 +403,12 @@ void bfq_cpd_init(struct blkcg_policy_data *cpd)
CGROUP_WEIGHT_DFL : BFQ_WEIGHT_LEGACY_DFL;
}
void bfq_cpd_free(struct blkcg_policy_data *cpd)
static void bfq_cpd_free(struct blkcg_policy_data *cpd)
{
kfree(cpd_to_bfqgd(cpd));
}
struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
{
struct bfq_group *bfqg;
......@@ -426,7 +426,7 @@ struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, int node)
return &bfqg->pd;
}
void bfq_pd_init(struct blkg_policy_data *pd)
static void bfq_pd_init(struct blkg_policy_data *pd)
{
struct blkcg_gq *blkg = pd_to_blkg(pd);
struct bfq_group *bfqg = blkg_to_bfqg(blkg);
......@@ -445,7 +445,7 @@ void bfq_pd_init(struct blkg_policy_data *pd)
bfqg->rq_pos_tree = RB_ROOT;
}
void bfq_pd_free(struct blkg_policy_data *pd)
static void bfq_pd_free(struct blkg_policy_data *pd)
{
struct bfq_group *bfqg = pd_to_bfqg(pd);
......@@ -453,7 +453,7 @@ void bfq_pd_free(struct blkg_policy_data *pd)
bfqg_put(bfqg);
}
void bfq_pd_reset_stats(struct blkg_policy_data *pd)
static void bfq_pd_reset_stats(struct blkg_policy_data *pd)
{
struct bfq_group *bfqg = pd_to_bfqg(pd);
......@@ -740,7 +740,7 @@ static void bfq_reparent_active_entities(struct bfq_data *bfqd,
* blkio already grabs the queue_lock for us, so no need to use
* RCU-based magic
*/
void bfq_pd_offline(struct blkg_policy_data *pd)
static void bfq_pd_offline(struct blkg_policy_data *pd)
{
struct bfq_service_tree *st;
struct bfq_group *bfqg = pd_to_bfqg(pd);
......
......@@ -239,7 +239,7 @@ static int T_slow[2];
static int T_fast[2];
static int device_speed_thresh[2];
#define RQ_BIC(rq) ((struct bfq_io_cq *) (rq)->elv.priv[0])
#define RQ_BIC(rq) icq_to_bic((rq)->elv.priv[0])
#define RQ_BFQQ(rq) ((rq)->elv.priv[1])
struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
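For reference, icq_to_bic() is presumably just a typed container_of() wrapper around the io_cq embedded in bfq_io_cq, so the new RQ_BIC() performs the same pointer conversion as the old open-coded cast, only through the helper; roughly:

/* Sketch of the assumed helper (bfq_io_cq embeds its io_cq as ->icq). */
static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
{
	return container_of(icq, struct bfq_io_cq, icq);
}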
......@@ -720,7 +720,7 @@ static void bfq_updated_next_req(struct bfq_data *bfqd,
entity->budget = new_budget;
bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
new_budget);
bfq_requeue_bfqq(bfqd, bfqq);
bfq_requeue_bfqq(bfqd, bfqq, false);
}
}
......@@ -2563,7 +2563,7 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
bfq_del_bfqq_busy(bfqd, bfqq, true);
} else {
bfq_requeue_bfqq(bfqd, bfqq);
bfq_requeue_bfqq(bfqd, bfqq, true);
/*
* Resort priority tree of potential close cooperators.
*/
......@@ -3780,6 +3780,7 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
default:
dev_err(bfqq->bfqd->queue->backing_dev_info->dev,
"bfq: bad prio class %d\n", ioprio_class);
/* fall through */
case IOPRIO_CLASS_NONE:
/*
* No prio set, inherit CPU scheduling settings.
......@@ -4801,13 +4802,15 @@ static ssize_t bfq_var_show(unsigned int var, char *page)
return sprintf(page, "%u\n", var);
}
static void bfq_var_store(unsigned long *var, const char *page)
static int bfq_var_store(unsigned long *var, const char *page)
{
unsigned long new_val;
int ret = kstrtoul(page, 10, &new_val);
if (ret == 0)
*var = new_val;
if (ret)
return ret;
*var = new_val;
return 0;
}
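One user-visible effect of propagating the kstrtoul() error (a sketch; device and tunable chosen arbitrarily): a malformed write to a bfq sysfs attribute should now fail with the parse error instead of being silently accepted with an indeterminate value.

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* Hypothetical path; any bfq tunable behaves the same way. */
	const char *path = "/sys/block/sda/queue/iosched/low_latency";
	FILE *f = fopen(path, "w");

	if (!f)
		return 1;
	/* kstrtoul() rejects this string, and the store function returns the error. */
	if (fputs("not-a-number", f) < 0 || fflush(f) != 0)
		printf("write rejected: %s\n", strerror(errno));
	fclose(f);
	return 0;
}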
#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
......@@ -4848,12 +4851,16 @@ static ssize_t \
__FUNC(struct elevator_queue *e, const char *page, size_t count) \
{ \
struct bfq_data *bfqd = e->elevator_data; \
unsigned long uninitialized_var(__data); \
bfq_var_store(&__data, (page)); \
if (__data < (MIN)) \
__data = (MIN); \
else if (__data > (MAX)) \
__data = (MAX); \
unsigned long __data, __min = (MIN), __max = (MAX); \
int ret; \
\
ret = bfq_var_store(&__data, (page)); \
if (ret) \
return ret; \
if (__data < __min) \
__data = __min; \
else if (__data > __max) \
__data = __max; \
if (__CONV == 1) \
*(__PTR) = msecs_to_jiffies(__data); \
else if (__CONV == 2) \
......@@ -4876,12 +4883,16 @@ STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 2);
static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)\
{ \
struct bfq_data *bfqd = e->elevator_data; \
unsigned long uninitialized_var(__data); \
bfq_var_store(&__data, (page)); \
if (__data < (MIN)) \
__data = (MIN); \
else if (__data > (MAX)) \
__data = (MAX); \
unsigned long __data, __min = (MIN), __max = (MAX); \
int ret; \
\
ret = bfq_var_store(&__data, (page)); \
if (ret) \
return ret; \
if (__data < __min) \
__data = __min; \
else if (__data > __max) \
__data = __max; \
*(__PTR) = (u64)__data * NSEC_PER_USEC; \
return count; \
}
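For one concrete tunable, the rewritten STORE_FUNCTION() now expands to roughly the following (a sketch with __CONV == 0, so the msecs/usecs conversions above are elided):

/* Approximate expansion of STORE_FUNCTION(bfq_back_seek_max_store,
 * &bfqd->bfq_back_seek_max, 0, INT_MAX, 0) after this change.
 */
static ssize_t bfq_back_seek_max_store(struct elevator_queue *e,
				       const char *page, size_t count)
{
	struct bfq_data *bfqd = e->elevator_data;
	unsigned long __data, __min = 0, __max = INT_MAX;
	int ret;

	ret = bfq_var_store(&__data, page);
	if (ret)
		return ret;
	if (__data < __min)
		__data = __min;
	else if (__data > __max)
		__data = __max;
	bfqd->bfq_back_seek_max = __data;
	return count;
}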
......@@ -4893,9 +4904,12 @@ static ssize_t bfq_max_budget_store(struct elevator_queue *e,
const char *page, size_t count)
{
struct bfq_data *bfqd = e->elevator_data;
unsigned long uninitialized_var(__data);
unsigned long __data;
int ret;
bfq_var_store(&__data, (page));
ret = bfq_var_store(&__data, (page));
if (ret)
return ret;
if (__data == 0)
bfqd->bfq_max_budget = bfq_calc_max_budget(bfqd);
......@@ -4918,9 +4932,12 @@ static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
const char *page, size_t count)
{
struct bfq_data *bfqd = e->elevator_data;
unsigned long uninitialized_var(__data);
unsigned long __data;
int ret;
bfq_var_store(&__data, (page));
ret = bfq_var_store(&__data, (page));
if (ret)
return ret;
if (__data < 1)
__data = 1;
......@@ -4938,9 +4955,12 @@ static ssize_t bfq_strict_guarantees_store(struct elevator_queue *e,
const char *page, size_t count)
{
struct bfq_data *bfqd = e->elevator_data;
unsigned long uninitialized_var(__data);
unsigned long __data;
int ret;
bfq_var_store(&__data, (page));
ret = bfq_var_store(&__data, (page));
if (ret)
return ret;
if (__data > 1)
__data = 1;
......@@ -4957,9 +4977,12 @@ static ssize_t bfq_low_latency_store(struct elevator_queue *e,
const char *page, size_t count)
{
struct bfq_data *bfqd = e->elevator_data;
unsigned long uninitialized_var(__data);
unsigned long __data;
int ret;
bfq_var_store(&__data, (page));
ret = bfq_var_store(&__data, (page));
if (ret)
return ret;
if (__data > 1)
__data = 1;
......
......@@ -817,7 +817,6 @@ extern const int bfq_timeout;
struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
void bfq_requeue_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
struct rb_root *root);
......@@ -917,7 +916,8 @@ void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd);
void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool ins_into_idle_tree, bool expiration);
void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_requeue_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_requeue_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool expiration);
void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool expiration);
void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq);
......
......@@ -44,7 +44,8 @@ static unsigned int bfq_class_idx(struct bfq_entity *entity)
BFQ_DEFAULT_GRP_CLASS - 1;
}
static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd);
static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
bool expiration);
static bool bfq_update_parent_budget(struct bfq_entity *next_in_service);
......@@ -54,6 +55,8 @@ static bool bfq_update_parent_budget(struct bfq_entity *next_in_service);
* @new_entity: if not NULL, pointer to the entity whose activation,
* requeueing or repositioning triggered the invocation of
* this function.
* @expiration: if true, this function is being invoked after the
* expiration of the in-service entity
*
* This function is called to update sd->next_in_service, which, in
* its turn, may change as a consequence of the insertion or
......@@ -72,19 +75,20 @@ static bool bfq_update_parent_budget(struct bfq_entity *next_in_service);
* entity.
*/
static bool bfq_update_next_in_service(struct bfq_sched_data *sd,
struct bfq_entity *new_entity)
struct bfq_entity *new_entity,
bool expiration)
{
struct bfq_entity *next_in_service = sd->next_in_service;
bool parent_sched_may_change = false;
bool change_without_lookup = false;
/*
* If this update is triggered by the activation, requeueing
* or repositioning of an entity that does not coincide with
* sd->next_in_service, then a full lookup in the active tree
* can be avoided. In fact, it is enough to check whether the
* just-modified entity has a higher priority than
* sd->next_in_service, or, even if it has the same priority
* as sd->next_in_service, is eligible and has a lower virtual
* just-modified entity has the same priority as
* sd->next_in_service, is eligible and has a lower virtual
* finish time than sd->next_in_service. If this compound
* condition holds, then the new entity becomes the new
* next_in_service. Otherwise no change is needed.
......@@ -96,13 +100,12 @@ static bool bfq_update_next_in_service(struct bfq_sched_data *sd,
* set to true, and left as true if
* sd->next_in_service is NULL.
*/
bool replace_next = true;
change_without_lookup = true;
/*
* If there is already a next_in_service candidate
* entity, then compare class priorities or timestamps
* to decide whether to replace sd->service_tree with
* new_entity.
* entity, then compare timestamps to decide whether
* to replace sd->service_tree with new_entity.
*/
if (next_in_service) {
unsigned int new_entity_class_idx =
......@@ -110,32 +113,26 @@ static bool bfq_update_next_in_service(struct bfq_sched_data *sd,
struct bfq_service_tree *st =
sd->service_tree + new_entity_class_idx;
/*
* For efficiency, evaluate the most likely
* sub-condition first.
*/
replace_next =
change_without_lookup =
(new_entity_class_idx ==
bfq_class_idx(next_in_service)
&&
!bfq_gt(new_entity->start, st->vtime)
&&
bfq_gt(next_in_service->finish,
new_entity->finish))
||
new_entity_class_idx <
bfq_class_idx(next_in_service);
new_entity->finish));
}
if (replace_next)
if (change_without_lookup)
next_in_service = new_entity;
} else /* invoked because of a deactivation: lookup needed */
next_in_service = bfq_lookup_next_entity(sd);
}
if (!change_without_lookup) /* lookup needed */
next_in_service = bfq_lookup_next_entity(sd, expiration);
if (next_in_service) {
if (next_in_service)
parent_sched_may_change = !sd->next_in_service ||
bfq_update_parent_budget(next_in_service);
}
sd->next_in_service = next_in_service;
......@@ -1127,10 +1124,12 @@ static void __bfq_activate_requeue_entity(struct bfq_entity *entity,
* @requeue: true if this is a requeue, which implies that bfqq is
* being expired; thus ALL its ancestors stop being served and must
* therefore be requeued
* @expiration: true if this function is being invoked in the expiration path
* of the in-service queue
*/
static void bfq_activate_requeue_entity(struct bfq_entity *entity,
bool non_blocking_wait_rq,
bool requeue)
bool requeue, bool expiration)
{
struct bfq_sched_data *sd;
......@@ -1138,7 +1137,8 @@ static void bfq_activate_requeue_entity(struct bfq_entity *entity,
sd = entity->sched_data;
__bfq_activate_requeue_entity(entity, sd, non_blocking_wait_rq);
if (!bfq_update_next_in_service(sd, entity) && !requeue)
if (!bfq_update_next_in_service(sd, entity, expiration) &&
!requeue)
break;
}
}
......@@ -1194,6 +1194,8 @@ bool __bfq_deactivate_entity(struct bfq_entity *entity, bool ins_into_idle_tree)
* bfq_deactivate_entity - deactivate an entity representing a bfq_queue.
* @entity: the entity to deactivate.
* @ins_into_idle_tree: true if the entity can be put into the idle tree
* @expiration: true if this function is being invoked in the expiration path
* of the in-service queue
*/
static void bfq_deactivate_entity(struct bfq_entity *entity,
bool ins_into_idle_tree,
......@@ -1222,7 +1224,7 @@ static void bfq_deactivate_entity(struct bfq_entity *entity,
* then, since entity has just been
* deactivated, a new one must be found.
*/
bfq_update_next_in_service(sd, NULL);
bfq_update_next_in_service(sd, NULL, expiration);
if (sd->next_in_service || sd->in_service_entity) {
/*
......@@ -1281,7 +1283,7 @@ static void bfq_deactivate_entity(struct bfq_entity *entity,
__bfq_requeue_entity(entity);
sd = entity->sched_data;
if (!bfq_update_next_in_service(sd, entity) &&
if (!bfq_update_next_in_service(sd, entity, expiration) &&
!expiration)
/*
* next_in_service unchanged or not causing
......@@ -1416,12 +1418,14 @@ __bfq_lookup_next_entity(struct bfq_service_tree *st, bool in_service)
/**
* bfq_lookup_next_entity - return the first eligible entity in @sd.
* @sd: the sched_data.
* @expiration: true if we are on the expiration path of the in-service queue
*
* This function is invoked when there has been a change in the trees
* for sd, and we need to know what is the new next entity after this
* change.