<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ru">
	<id>https://www.wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=Reinforcement_learning_2021_2022</id>
	<title>Reinforcement learning 2021 2022 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=Reinforcement_learning_2021_2022"/>
	<link rel="alternate" type="text/html" href="https://www.wikicshse.ru/index.php?title=Reinforcement_learning_2021_2022&amp;action=history"/>
	<updated>2026-06-06T12:14:12Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://www.wikicshse.ru/index.php?title=Reinforcement_learning_2021_2022&amp;diff=634&amp;oldid=prev</id>
		<title>imported&gt;Svsamsonov: Migrated current public revision from wiki.cs.hse.ru</title>
		<link rel="alternate" type="text/html" href="https://www.wikicshse.ru/index.php?title=Reinforcement_learning_2021_2022&amp;diff=634&amp;oldid=prev"/>
		<updated>2021-12-14T19:49:23Z</updated>

		<summary type="html">&lt;p&gt;Migrated current public revision from wiki.cs.hse.ru&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;== Lecturers and Seminar Instructors ==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align:center&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
|| Lecturer || [https://www.hse.ru/staff/anaumov Alexey Naumov] || [anaumov@hse.ru] || T924&lt;br /&gt;
|- &lt;br /&gt;
|| Lecturer || [https://www.hse.ru/org/persons/93130881 Denis Belomestny] || [dbelomestny@hse.ru] || T924&lt;br /&gt;
|- &lt;br /&gt;
|| Seminar instructor || [https://www.hse.ru/org/persons/219484540 Sergey Samsonov] || [svsamsonov@hse.ru] || T926&lt;br /&gt;
|-&lt;br /&gt;
|| Seminar instructor || [https://www.hse.ru/staff/mkaledin Maxim Kaledin] || [mkaledin@hse.ru] || T926&lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== About the course ==&lt;br /&gt;
This page contains materials for the Mathematical Foundations of Reinforcement Learning course in the 2021/2022 academic year, an optional course for 2nd-year Master&amp;#039;s students of the Math of Machine Learning program (HSE and Skoltech).&lt;br /&gt;
&lt;br /&gt;
== Grading == &lt;br /&gt;
The final grade consists of two components (each is a real number from 0 to 10, with no intermediate rounding):&lt;br /&gt;
* O&amp;lt;sub&amp;gt;HW&amp;lt;/sub&amp;gt; for the homework assignments&lt;br /&gt;
* O&amp;lt;sub&amp;gt;Project&amp;lt;/sub&amp;gt; for the course project&lt;br /&gt;
The final grade is computed as&lt;br /&gt;
* O&amp;lt;sub&amp;gt;Final&amp;lt;/sub&amp;gt; = 0.5*O&amp;lt;sub&amp;gt;HW&amp;lt;/sub&amp;gt; + 0.5*O&amp;lt;sub&amp;gt;Project&amp;lt;/sub&amp;gt;&lt;br /&gt;
and rounded by the usual arithmetic rule.&lt;br /&gt;
&lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1MPWVIkgxyotHU-P5cE7Gik4C6RTWxTnAVK8Btl7Fw3Y/edit?usp=sharing &amp;#039;&amp;#039;&amp;#039;Table with grades&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
&lt;br /&gt;
== Lectures ==&lt;br /&gt;
*[https://www.dropbox.com/s/a69ql9duo5jf5gt/Math%20of%20RL%20Lecture%201.pdf?dl=0 &amp;#039;&amp;#039;&amp;#039;Lecture 09.11&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
*[https://www.dropbox.com/s/7zkirk1xykua890/Math_of_RL_Le%20cture_2.pdf?dl=0 &amp;#039;&amp;#039;&amp;#039;Lecture 16.11&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
&lt;br /&gt;
== Seminars ==&lt;br /&gt;
*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 09.11&amp;#039;&amp;#039;&amp;#039;], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 09.11, Video&amp;#039;&amp;#039;&amp;#039;], [https://www.dropbox.com/s/bxa8h9vjrnegsql/Bandit_intro_strategies_09_11_2021.ipynb?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 09.11, Notebook&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
*[https://www.dropbox.com/s/cq0t2o6n4yn6oag/Seminar_16_11_RL.mp4?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 16.11, Video&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
*[https://www.dropbox.com/s/ex8v9w3smar70m7/Seminar_23_11_RL.mp4?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 23.11, Video&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
*[https://www.dropbox.com/s/v1ywnk8eyhourjq/Seminar_07_12_RL.mp4?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 07.12, Video&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
&lt;br /&gt;
== Recommended literature ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Lecture and seminar 09.11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
* Sébastien Bubeck, Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Chapter 2. http://sbubeck.com/SurveyBCB12.pdf&lt;br /&gt;
* Richard S. Sutton, Andrew G. Barto. Reinforcement Learning: An Introduction. Chapter 2. http://incompleteideas.net/book/the-book-2nd.html&lt;br /&gt;
* Botao Hao et al. Bootstrapping Upper Confidence Bound. https://arxiv.org/abs/1906.05247&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Lecture and seminar 16.11&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
*[https://www.dropbox.com/s/wc951vseud1q1p2/Seminar_09_11_RL.pdf?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 09.11&amp;#039;&amp;#039;&amp;#039;], [https://www.dropbox.com/s/2h83vbjgew1inen/Seminar_1_RL.mp4?dl=0 &amp;#039;&amp;#039;&amp;#039;Seminar 09.11, Video&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
&lt;br /&gt;
== Homework ==&lt;br /&gt;
*[https://www.dropbox.com/s/k2at9lixvshpcbw/HW_1_RL_2021.pdf?dl=0 &amp;#039;&amp;#039;&amp;#039;Homework №1, deadline 19.12.2021, 23:59&amp;#039;&amp;#039;&amp;#039;], [https://www.dropbox.com/s/l7pma6kwnopl856/HW_1_task_2.ipynb?dl=0 &amp;#039;&amp;#039;&amp;#039;Environment for task №2&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
*[https://www.dropbox.com/s/jynwji3dw3xxjww/HW_2_RL_2021.pdf?dl=0 &amp;#039;&amp;#039;&amp;#039;Homework №2, deadline 19.12.2021, 23:59&amp;#039;&amp;#039;&amp;#039;]&lt;br /&gt;
&lt;br /&gt;
== Projects ==&lt;/div&gt;</summary>
		<author><name>imported&gt;Svsamsonov</name></author>
	</entry>
</feed>