S. Agrawal and N. Goyal, Analysis of Thompson sampling for the Multi-Armed Bandit problem, JMLR, Conference On Learning Theory, 2012.

A. Anandkumar, N. Michael, and A. K. Tang, Opportunistic Spectrum Access with Multiple Users: Learning under Competition, 2010 Proceedings IEEE INFOCOM, 2010.
DOI : 10.1109/INFCOM.2010.5462144
URL : http://www.mit.edu/%7Eanimakum/pubs/AnandkumarInfocom10.pdf

A. Anandkumar, N. Michael, A. K. Tang, and A. Swami, Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret, IEEE Journal on Selected Areas in Communications, vol.29, issue.4, pp.731-745, 2011.
DOI : 10.1109/JSAC.2011.110406

V. Anantharam, P. Varaiya, and J. Walrand, Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part I: I.I.D. rewards, IEEE Transactions on Automatic Control, vol.32, issue.11, pp.968-976, 1987.
DOI : 10.1109/TAC.1987.1104491

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time Analysis of the Multi-armed Bandit Problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375
URL : http://homepages.math.uic.edu/%7Elreyzin/f14_mcs548/auer02.pdf

O. Avner and S. Mannor, Learning to Coordinate Without Communication in Multi-User Multi-Armed Bandit Problems, 2015.

O. Avner and S. Mannor, Multi-user lax communications: A multi-armed bandit approach, IEEE INFOCOM 2016, The 35th Annual IEEE International Conference on Computer Communications, 2016.
DOI : 10.1109/INFOCOM.2016.7524557

R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, and J. Palicot, Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings, 12th EAI Conference on Cognitive Radio Oriented Wireless Network and Communication, CROWNCOM Proceedings, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01575419

S. Bubeck and N. Cesa-Bianchi, Regret Analysis of Stochastic and Non-Stochastic Multi-Armed Bandit Problems, Foundations and Trends in Machine Learning, vol.5, issue.1, pp.1-122, 2012.

O. Cappé, A. Garivier, O. Maillard, R. Munos, and G. Stoltz, Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, vol.41, issue.3, pp.1516-1541, 2013.
DOI : 10.1214/13-AOS1119

O. Chapelle, E. Manavoglu, and R. Rosales, Simple and scalable response prediction for display advertising, ACM Transactions on Intelligent Systems and Technology, 2014.
DOI : 10.1145/2532128
URL : http://olivier.chapelle.cc/pub/ngdstone.pdf

A. Garivier, P. Ménard, and G. Stoltz, Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01276324

W. Jouini, D. Ernst, C. Moy, and J. Palicot, Multi-armed bandit based policies for cognitive radio's decision making issues, 2009 3rd International Conference on Signals, Circuits and Systems (SCS), 2009.
DOI : 10.1109/ICSCS.2009.5412697

W. Jouini, D. Ernst, C. Moy, and J. Palicot, Upper Confidence Bound Based Decision Making Strategies and Dynamic Spectrum Access, 2010 IEEE International Conference on Communications, 2010.
DOI : 10.1109/ICC.2010.5502014
URL : https://hal.archives-ouvertes.fr/hal-00489331

D. Kalathil, N. Nayyar, and R. Jain, Decentralized learning for multi-player multi-armed bandits, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 2012.
DOI : 10.1109/CDC.2012.6426587
URL : http://arxiv.org/abs/1206.3582

E. Kaufmann, O. Cappé, and A. Garivier, On Bayesian Upper Confidence Bounds for Bandit Problems, AISTATS, pp.592-600, 2012.

E. Kaufmann, N. Korda, and R. Munos, Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, Algorithmic Learning Theory (ALT), pp.199-213, 2012.
DOI : 10.1007/978-3-642-34106-9_18
URL : https://hal.archives-ouvertes.fr/hal-00830033

J. Komiyama, J. Honda, and H. Nakagawa, Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-Armed Bandit Problem with Multiple Plays, International Conference on Machine Learning, pp.1152-1161, 2015.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : https://doi.org/10.1016/0196-8858(85)90002-8

L. Li, W. Chu, J. Langford, and R. E. Schapire, A contextual-bandit approach to personalized news article recommendation, Proceedings of the 19th international conference on World wide web, WWW '10, pp.661-670, 2010.
DOI : 10.1145/1772690.1772758
URL : http://www.cs.rutgers.edu/~lihong/pub/Li10Contextual.pdf

K. Liu and Q. Zhao, Distributed Learning in Multi-Armed Bandit With Multiple Players, IEEE Transactions on Signal Processing, vol.58, issue.11, pp.5667-5681, 2010.
DOI : 10.1109/TSP.2010.2062509

J. Mitola and G. Q. Maguire, Cognitive radio: making software radios more personal, IEEE Personal Communications, vol.6, issue.4, pp.13-18, 1999.
DOI : 10.1109/98.788210

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

J. Rosenski, O. Shamir, and L. Szlak, Multi-Player Bandits - a Musical Chairs Approach, International Conference on Machine Learning, pp.155-163, 2016.

C. Tekin and M. Liu, Online learning in decentralized multi-user spectrum access with synchronized explorations, MILCOM 2012, 2012 IEEE Military Communications Conference, 2012.
DOI : 10.1109/MILCOM.2012.6415693

W. R. Thompson, On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, Biometrika, vol.25, issue.3/4, pp.285-294, 1933.

T. Koren, R. Livni, and Y. Mansour, Bandits with Movement Costs and Adaptive Pricing, 30th Annual Conference on Learning Theory (COLT) Conference Proceedings, pp.1242-1268, 2017.