Stochastic Bandit Models for Delayed Conversions

Claire Vernade; Olivier Cappé; Vianney Perchet

Conference paper. Year: 2017

Claire Vernade

Role: Author
PersonId: 1010869
IdHAL: vernade
ORCID: 0000-0002-1305-2702

Laboratoire Traitement et Communication de l'Information

Université Paris-Saclay

Olivier Cappé

Role: Author
PersonId: 1534
IdHAL: olivier-cappe
ORCID: 0000-0001-7415-8669
IdRef: 057106878

Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur

Université Paris-Saclay

Centre National de la Recherche Scientifique

Vianney Perchet

Role: Author
PersonId: 871881

Centre de Mathématiques et de Leurs Applications

Université Paris-Saclay

Criteo AI Lab

Abstract

Online advertising and product recommendation are important application domains for multi-armed bandit methods. In these fields, the reward that is immediately available is most often only a proxy for the actual outcome of interest, which we refer to as a conversion. For instance, in web advertising, clicks can be observed within a few seconds after an ad display, but the corresponding sale, if any, will take hours, if not days, to happen. This paper proposes and investigates a new stochastic multi-armed bandit model in the framework proposed by Chapelle (2014), based on empirical studies in the field of web advertising, in which each action may trigger a future reward that will then happen with a stochastic delay. We assume that the probability of conversion associated with each action is unknown while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored. We provide performance lower bounds as well as two simple but efficient algorithms based on the UCB and KLUCB frameworks. The latter algorithm, which is preferable when conversion rates are low, is based on a Poissonization argument, of independent interest in other settings where aggregation of Bernoulli observations with different success probabilities is required.
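The idealized (uncensored) setting described in the abstract can be illustrated with a small simulation. The sketch below is not the algorithm from the paper: it is a generic UCB-style index in which each past play of an arm is weighted by the probability that its conversion, if any, would already be visible under the known delay distribution. The exponential delay, the form of the exploration bonus, and all names (`simulate`, `delay_rate`, `n_eff`) are assumptions made for illustration only.

```python
import math
import random


def simulate(theta, delay_rate, horizon, seed=0):
    """Simulate a UCB-style policy with delay-corrected conversion estimates.

    theta      -- true conversion probability of each arm (hidden from the learner)
    delay_rate -- rate of the exponential conversion delay (assumed known)
    horizon    -- number of rounds to play
    Returns (pulls per arm, conversions observed per arm).
    """
    rng = random.Random(seed)
    n_arms = len(theta)
    pull_times = [[] for _ in range(n_arms)]  # rounds at which each arm was played
    pulls = [0] * n_arms
    conversions = [0] * n_arms                # conversions observed so far
    pending = []                              # (arrival time, arm) of future conversions

    def delay_cdf(elapsed):
        # P(delay <= elapsed) for the assumed exponential delay distribution.
        return 1.0 - math.exp(-delay_rate * elapsed)

    for t in range(1, horizon + 1):
        # Reveal every conversion whose delay has elapsed by round t.
        still_pending = []
        for arrival, arm in pending:
            if arrival <= t:
                conversions[arm] += 1
            else:
                still_pending.append((arrival, arm))
        pending = still_pending

        # Index of each arm: corrected mean plus an exploration bonus.
        indices = []
        for a in range(n_arms):
            # Effective sample size: a play made at round s counts for the
            # probability that its conversion would be visible by round t.
            n_eff = sum(delay_cdf(t - s) for s in pull_times[a])
            if n_eff < 1e-9:
                indices.append(float("inf"))  # arm not played yet
            else:
                mean = conversions[a] / n_eff
                bonus = math.sqrt(2.0 * math.log(t) / n_eff)
                indices.append(mean + bonus)

        arm = max(range(n_arms), key=indices.__getitem__)
        pull_times[arm].append(t)
        pulls[arm] += 1

        # The environment draws a conversion and, if one occurs, its delay.
        if rng.random() < theta[arm]:
            pending.append((t + rng.expovariate(delay_rate), arm))

    return pulls, conversions


if __name__ == "__main__":
    pulls, conversions = simulate([0.05, 0.5], delay_rate=0.1, horizon=2000, seed=1)
    print("pulls per arm:", pulls)
    print("observed conversions per arm:", conversions)
```

Dividing by the effective sample size rather than the raw play count keeps the estimate from being biased downward by conversions that have not yet arrived. Handling censored late conversions, as in the more realistic setting mentioned above, would require a different correction that this sketch does not attempt.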

Keywords

Multi-Armed Bandit, Delayed Feedback, Online Advertising

Domains

Machine Learning [cs.LG]

Main file

Paper.pdf (943.31 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01545667

Submitted on: Tuesday, July 4, 2017, 00:59:33

Last modified on: Tuesday, April 9, 2024, 03:24:56

Long-term archiving on: Tuesday, January 23, 2018, 18:01:11

Dates and versions

hal-01545667, version 1 (28-06-2017)

hal-01545667, version 2 (04-07-2017)

Identifiers

HAL Id: hal-01545667, version 2
arXiv: 1706.09186

Cite

Claire Vernade, Olivier Cappé, Vianney Perchet. Stochastic Bandit Models for Delayed Conversions. Conference on Uncertainty in Artificial Intelligence, Aug 2017, Sydney, Australia. ⟨hal-01545667v2⟩

Collections

INSTITUT-TELECOM CNRS ENS-CACHAN PARISTECH LIMSI SORBONNE-UNIVERSITE LTCI ENS-PARIS-SACLAY LISN GS-ENGINEERING GS-COMPUTER-SCIENCE GS-SPORT-HUMAN-MOVEMENT
