imported>Ivlevin: Migrated current public revision from wiki.cs.hse.ru

2023-11-25T16:19:35Z

== Lecturers and Seminarists ==

{| class="wikitable" style="text-align:center"
|-
|| Lecturer || [https://www.hse.ru/staff/anaumov Alexey Naumov] || [mailto:anaumov@hse.ru anaumov@hse.ru] || T924
|-
|| Seminarist || Ilya Levin || Telegram: @levensons || T926
|-
|}

== About the course ==
This page contains materials for the 2023 course Mathematical Foundations of Reinforcement Learning, an elective for second-year Master's students of the Math of Machine Learning program (HSE and Skoltech).

== Grading ==
The final grade consists of two components, each a real number from 0 to 10 with no intermediate rounding:
* OHW for the homework assignments
* OExam for the exam
The formula for the final grade is
* OFinal = 0.6*OHW + 0.4*OExam
with the usual (arithmetic) rounding rule.
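
As an illustration, the formula can be sketched in Python. The function name <code>final_grade</code> is hypothetical. The sketch assumes that the "usual" rounding means rounding half up to the nearest integer, which the page does not state explicitly. Python's built-in <code>round</code> rounds halves to the nearest even number, so the sketch uses <code>decimal</code> instead.

```python
from decimal import Decimal, ROUND_HALF_UP


def final_grade(ohw, oexam):
    """Return OFinal = 0.6*OHW + 0.4*OExam, rounded half up.

    Assumption (not stated on the page): the result is rounded to an integer.
    """
    for name, value in (("OHW", ohw), ("OExam", oexam)):
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")

    # Decimal arithmetic avoids binary floating-point error at the .5 boundary.
    raw = Decimal("0.6") * Decimal(str(ohw)) + Decimal("0.4") * Decimal(str(oexam))
    return int(raw.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


if __name__ == "__main__":
    # The components are not rounded before they enter the formula.
    print(final_grade(8.3, 6.75))
```

Under this assumption, a raw score of exactly 7.5 becomes 8, while the built-in <code>round(7.5)</code> would also give 8 but <code>round(6.5)</code> would give 6.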

== Course materials ==
*[https://www.overleaf.com/read/kbzmvxdzbrxq '''Lecture and seminar notes''']
*[https://colab.research.google.com/drive/10qBq7Ot_1ZpnTeD11P5AnE8jFVj0OLXl?usp=sharing '''Notebook for the first seminar''']

== Homeworks ==
*[https://disk.yandex.ru/i/C8hwvvS5us09sA '''HW 1''']

== Recommended literature ==

* Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf
* Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html
* Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247
* Aleksandrs Slivkins. Introduction to Multi-Armed Bandits. Chapter 1. https://arxiv.org/abs/1904.07272

== Projects ==
