Analysis of Thompson sampling for the Multi-Armed Bandit problem, JMLR, Conference On Learning Theory, 2012. ,
Opportunistic Spectrum Access with Multiple Users: Learning under Competition, 2010 Proceedings IEEE INFOCOM, 2010. ,
DOI : 10.1109/INFCOM.2010.5462144
URL : http://www.mit.edu/%7Eanimakum/pubs/AnandkumarInfocom10.pdf
Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret, IEEE Journal on Selected Areas in Communications, vol.29, issue.4, pp.731-745, 2011. ,
DOI : 10.1109/JSAC.2011.110406
Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards, IEEE Transactions on Automatic Control, vol.32, issue.11, pp.968-976, 1987. ,
DOI : 10.1109/TAC.1987.1104491
Finite-time Analysis of the Multi-armed Bandit Problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002. ,
DOI : 10.1137/S0097539701398375
URL : http://homepages.math.uic.edu/%7Elreyzin/f14_mcs548/auer02.pdf
Learning to Coordinate Without Communication in Multi-User Multi-Armed Bandit Problems, 2015. ,
Multi-user lax communications: A multi-armed bandit approach, IEEE INFOCOM 2016, The 35th Annual IEEE International Conference on Computer Communications, 2016. ,
DOI : 10.1109/INFOCOM.2016.7524557
Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings, 12th EAI Conference on Cognitive Radio Oriented Wireless Network and Communication, CROWNCOM Proceedings, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01575419
Regret Analysis of Stochastic and Non-Stochastic Multi-Armed Bandit Problems, Foundations and Trends in Machine Learning, 2012. ,
Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013. ,
DOI : 10.1214/13-AOS1119SUPP
Simple and scalable response prediction for display advertising, ACM Transactions on Intelligent Systems and Technology, 2014. ,
DOI : 10.1145/2532128
URL : http://olivier.chapelle.cc/pub/ngdstone.pdf
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016. ,
URL : https://hal.archives-ouvertes.fr/hal-01276324
Multi-armed bandit based policies for cognitive radio's decision making issues, 2009 3rd International Conference on Signals, Circuits and Systems (SCS), 2009. ,
DOI : 10.1109/ICSCS.2009.5412697
Upper Confidence Bound Based Decision Making Strategies and Dynamic Spectrum Access, 2010 IEEE International Conference on Communications, 2010. ,
DOI : 10.1109/ICC.2010.5502014
URL : https://hal.archives-ouvertes.fr/hal-00489331
Decentralized learning for multi-player multi-armed bandits, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012. ,
DOI : 10.1109/CDC.2012.6426587
URL : http://arxiv.org/abs/1206.3582
On Bayesian Upper Confidence Bounds for Bandit Problems, AISTATS, pp.592-600, 2012. ,
Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, pp.199-213, 2012. ,
DOI : 10.1007/978-3-642-34106-9_18
URL : https://hal.archives-ouvertes.fr/hal-00830033
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-Armed Bandit Problem with Multiple Plays, International Conference on Machine Learning, pp.1152-1161, 2015. ,
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8
A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th international conference on World wide web, WWW '10, pp.661-670, 2010. ,
DOI : 10.1145/1772690.1772758
URL : http://www.cs.rutgers.edu/~lihong/pub/Li10Contextual.pdf
Distributed Learning in Multi-Armed Bandit With Multiple Players, IEEE Transactions on Signal Processing, vol.58, issue.11, pp.5667-5681, 2010. ,
DOI : 10.1109/TSP.2010.2062509
Cognitive radio: making software radios more personal, IEEE Personal Communications, vol.6, issue.4, pp.13-18, 1999. ,
DOI : 10.1109/98.788210
Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952. ,
DOI : 10.1090/S0002-9904-1952-09620-8
Multi-Player Bandits - A Musical Chairs Approach, International Conference on Machine Learning, pp.155-163, 2016. ,
Online learning in decentralized multi-user spectrum access with synchronized explorations, MILCOM 2012, 2012 IEEE Military Communications Conference, 2012. ,
DOI : 10.1109/MILCOM.2012.6415693
On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, Biometrika, vol.25, 1933. ,
Bandits with Movement Costs and Adaptive Pricing, 30th Annual Conference on Learning Theory (COLT) Conference Proceedings, pp.1242-1268, 2017. ,