Matthieu Jedor (ENS Paris-Saclay & Cdiscount) · Vianney Perchet (ENS Paris-Saclay & Criteo AI Lab) · Jonathan Louedec (Cdiscount)

We introduce a new stochastic multi-armed bandit setting where arms are grouped inside "ordered" categories. The motivating example comes from e-commerce, where a customer typically has a stronger preference for items from one specific, well-identified but unknown category than for items from any other. We introduce three notions of ordering between categories, inspired by stochastic dominance between random variables; each is weaker than the last, so that progressively more bandit scenarios satisfy at least one of them. …

https://www.endtoend.ai/blog/neurips2019-rl