<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ru">
	<id>https://www.wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_%28%D0%98%D0%9824%2C_7_%D0%BC%D0%BE%D0%B4%D1%83%D0%BB%D1%8C%29</id>
	<title>Reinforcement Learning (AI24, Module 7) - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_%28%D0%98%D0%9824%2C_7_%D0%BC%D0%BE%D0%B4%D1%83%D0%BB%D1%8C%29"/>
	<link rel="alternate" type="text/html" href="https://www.wikicshse.ru/index.php?title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_(%D0%98%D0%9824,_7_%D0%BC%D0%BE%D0%B4%D1%83%D0%BB%D1%8C)&amp;action=history"/>
	<updated>2026-06-09T02:36:22Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://www.wikicshse.ru/index.php?title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_(%D0%98%D0%9824,_7_%D0%BC%D0%BE%D0%B4%D1%83%D0%BB%D1%8C)&amp;diff=1514&amp;oldid=prev</id>
		<title>imported&gt;Murrcha: Migrated current public revision from wiki.cs.hse.ru</title>
		<link rel="alternate" type="text/html" href="https://www.wikicshse.ru/index.php?title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_(%D0%98%D0%9824,_7_%D0%BC%D0%BE%D0%B4%D1%83%D0%BB%D1%8C)&amp;diff=1514&amp;oldid=prev"/>
		<updated>2026-04-13T12:17:05Z</updated>

		<summary type="html">&lt;p&gt;Migrated current public revision from wiki.cs.hse.ru&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==About the course==&lt;br /&gt;
&lt;br /&gt;
Classes are held on [https://us06web.zoom.us/j/82360571226?pwd=QcdTZQvEba8tBWx2FrfK7v4P4k2Jra.1 Zoom] &amp;#039;&amp;#039;&amp;#039;on Saturdays starting at 13:00 MSK&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
==Contacts==&lt;br /&gt;
&lt;br /&gt;
Course chat on Telegram: [https://t.me/+cl-7RP37Ulw4YTE6 link]&lt;br /&gt;
&lt;br /&gt;
Instructors: Sergey Laktionov, Vyacheslav Buchkov&lt;br /&gt;
&lt;br /&gt;
==Course materials==&lt;br /&gt;
Course playlist on YouTube: [[https://www.youtube.com/playlist?list=PLmA-1xX7IuzDe8CEWijYwsgmdHXyaEQsg YouTube-playlist]]&lt;br /&gt;
&lt;br /&gt;
Course playlist on VKVideo: [[https://vkvideo.ru/playlist/-227011779_68 VKVideo-playlist]]&lt;br /&gt;
&lt;br /&gt;
GitHub repository with the course materials: [[https://github.com/laktionov/RL-course/tree/2026 GitHub repository]]&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Lecture !! Topic !! Date !! Additional materials&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;1&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=KLEcPmdR87U YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239732?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week1_intro_dynamic_programming Materials]] Intro to RL, Dynamic Programming || 10/01/2026 || &lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;2&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=Uf-KHdRh3zs YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239762?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week2_model_free_rl Materials]] Model-Free Tabular RL: Q-Learning, SARSA || 17/01/2026 || &lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;3&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=1wqbiJEB5ok YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239789?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week3_dqn Materials]] Intro to Deep RL: DQN, RAINBOW and beyond || 24/01/2026 || [[https://arxiv.org/pdf/1507.06527.pdf#:~:text=The%20resulting%20Deep%20Recurrent%20Q,equivalents%20featuring%20flickering%20game%20screens. DQN]], [[https://arxiv.org/abs/1507.06527 DRQN]], [[https://arxiv.org/abs/1710.02298 RAINBOW]], [[https://arxiv.org/abs/1803.00933 APE-X]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;4&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=Je_20lKuBSM YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239808?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week4_policy_based Materials]] Policy-Based Methods: Policy Gradient, REINFORCE, A2C || 31/01/2026 || [[https://papers.nips.cc/paper_files/paper/1999/hash/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html Policy Gradient]], [[https://arxiv.org/abs/1602.01783 Actor-Critic]], [[https://arxiv.org/abs/2402.14740 REINFORCE in 2024]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;5&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=8KU9nnp1PMo YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239827?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week5_advanced_policy_based Materials]] Advanced Policy-Based: TRPO, PPO and beyond || 07/02/2026 || [[https://arxiv.org/pdf/1502.05477.pdf TRPO]], [[https://arxiv.org/pdf/1707.06347.pdf PPO]], [[https://vitalab.github.io/article/2020/01/14/Implementation_Matters.html TRPO vs PPO]]&lt;br /&gt;
&lt;br /&gt;
[[https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ 37 implementation details of PPO]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/abs/2010.05380 Wasserstein distance instead of KL]]&lt;br /&gt;
&lt;br /&gt;
[[https://openreview.net/forum?id=Mlwe37htstv Sinkhorn distance instead of KL]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/pdf/1705.10528 Improvement Lower Bound in TRPO]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/abs/2401.16025 TV Distance]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/abs/2205.10047 Sigmoid Soft-Clipping]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/abs/2511.20347 Soft-Clipping in LLM]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;6&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=ujESxbK1uI0 YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239845?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week6_continuous_control Materials]] Continuous Control: DDPG, SAC and beyond || 14/02/2026 || [[https://arxiv.org/abs/1509.02971 DDPG]], [[https://arxiv.org/abs/1802.09477 TD3]], [[https://arxiv.org/abs/1801.01290 SAC]], [[https://arxiv.org/abs/2005.04269 TQC]]&lt;br /&gt;
&lt;br /&gt;
[[https://gymnasium.farama.org/environments/mujoco/ MuJoCo]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;7&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=z705MwrjrEU YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239857?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week7_offline_rl Materials]] Offline RL || 21/02/2026 || [[https://arxiv.org/abs/2005.01643 Offline RL Tutorial]], [[https://arxiv.org/abs/2203.01387 A Survey on Offline RL]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/abs/2110.06169 IQL]], [[https://arxiv.org/abs/2006.04779 CQL]], [[https://arxiv.org/abs/2305.09836 ReBRAC]]&lt;br /&gt;
&lt;br /&gt;
[[https://arxiv.org/pdf/2106.01345.pdf Decision Transformers]], [[https://arxiv.org/abs/2106.02039 Trajectory Transformers]]&lt;br /&gt;
&lt;br /&gt;
[[https://github.com/tinkoff-ai/CORL CORL Library]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;8&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=roVyJyAGCtM YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239873?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week8_bandits Materials]] Multi-Armed Bandits || 28/02/2026 || [[https://arxiv.org/abs/1911.04462 Neural UCB]], [[https://arxiv.org/abs/2010.00827 Neural Thompson Sampling]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;9&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=bE_rLccqGXI YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239891?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week9_model_based_rl Materials]] Model-based RL: AlphaZero and friends || 07/03/2026 || [[https://arxiv.org/abs/1712.01815 AlphaZero]], [[https://deepmind.google/discover/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules/ MuZero]], [[https://arxiv.org/abs/2111.00210 EfficientZero]]&lt;br /&gt;
&lt;br /&gt;
[[https://worldmodels.github.io/ World Models]], [[https://blog.research.google/2020/03/introducing-dreamer-scalable.html Dreamer-V1]]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;10&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=QBkqOB65WPg YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239912?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week10_rl_for_llm Materials]] RL in the context of LLMs || 14/03/2026 || &lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;11&amp;#039;&amp;#039;&amp;#039; [[https://www.youtube.com/watch?v=tn9C5VbqsY0 YouTube]] [[https://vkvideo.ru/playlist/-227011779_68/video-227011779_456239939?linked=1 VKVideo]] || [[https://github.com/laktionov/RL-course/tree/2026/week11_practical_rl Materials]] Practical RL || 21/03/2026 || &lt;br /&gt;
|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Grading formula==&lt;br /&gt;
&lt;br /&gt;
Grade = MIN(10, 10*(0.65*HW + 0.10*TA + 0.25*RC)), where &lt;br /&gt;
&lt;br /&gt;
* HW - the total score for (at least) 5 homework assignments;&lt;br /&gt;
* RC - the grade for presenting a paper on new algorithms or unexpected applications of the RL paradigm in industry;&lt;br /&gt;
* TA - the total score for the weekly quizzes (10 quizzes in total).&lt;br /&gt;
&lt;br /&gt;
Each homework assignment has a soft deadline. A submission made after it, during the week leading up to the hard deadline, is penalized by 5% of the assignment grade for each day of delay.&lt;br /&gt;
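&lt;br /&gt;
The formula and the late penalty can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the instructors' grading script: the page does not say how HW, TA and RC are scaled, so the sketch assumes each is a fraction between 0 and 1, that delay is counted in whole days, and that nothing is accepted after the hard deadline. The names final_grade and late_factor are hypothetical.&lt;br /&gt;

```python
def late_factor(days_late):
    """Multiplier applied to a homework score handed in days_late whole days
    after the soft deadline (5% of the score is lost per day)."""
    if days_late not in range(0, 8):
        # Assumption: the hard deadline is 7 days after the soft one and
        # later submissions earn nothing; the page does not state this.
        return 0.0
    return 1.0 - 0.05 * days_late


def final_grade(hw, ta, rc):
    """Course grade on the 10-point scale, capped at 10.

    Assumption: hw, ta and rc are already normalized to fractions of the
    maximum score; the page does not specify the scale.
    """
    return min(10.0, 10.0 * (0.65 * hw + 0.10 * ta + 0.25 * rc))


if __name__ == "__main__":
    # Hypothetical student: 90% of homework points, 80% of quiz points,
    # full marks for the paper presentation.
    print(round(final_grade(0.9, 0.8, 1.0), 2))
    # Share of the score kept by a homework submitted 3 days late.
    print(round(late_factor(3), 2))
```

Under these assumptions the weights 0.65, 0.10 and 0.25 sum to 1, so the cap at 10 only matters if a component can exceed its nominal maximum.&lt;br /&gt;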
&lt;br /&gt;
== Homework assignments ==
# HW-1 &amp;quot;Value- and policy-iteration algorithms&amp;quot; (&amp;#039;&amp;#039;2 points&amp;#039;&amp;#039;) | &amp;#039;&amp;#039;&amp;#039;Soft deadline - 23/01/26 23:59&amp;#039;&amp;#039;&amp;#039;, hard deadline - 30/01/26 23:59 | [[https://github.com/laktionov/RL-course/tree/2026/hw1 Notebook]]&lt;br /&gt;
# HW-2 &amp;quot;SARSA(\lambda) and EV-SARSA(\lambda)&amp;quot; (&amp;#039;&amp;#039;3 points&amp;#039;&amp;#039;) | &amp;#039;&amp;#039;&amp;#039;Soft deadline&amp;#039;&amp;#039;&amp;#039; - &amp;lt;strike&amp;gt;31/01/26 23:59&amp;lt;/strike&amp;gt; &amp;#039;&amp;#039;&amp;#039;04/02/26 23:59&amp;#039;&amp;#039;&amp;#039;, hard deadline - &amp;lt;strike&amp;gt;07/02/26 23:59&amp;lt;/strike&amp;gt; &amp;#039;&amp;#039;&amp;#039;11/02/26 23:59&amp;#039;&amp;#039;&amp;#039; | [[https://github.com/laktionov/RL-course/tree/2026/hw2 Notebook]]&lt;br /&gt;
# HW-3 &amp;quot;DQN Implementation&amp;quot; (&amp;#039;&amp;#039;6 points&amp;#039;&amp;#039;) | &amp;#039;&amp;#039;&amp;#039;Soft deadline - 20/02/26 23:59&amp;#039;&amp;#039;&amp;#039;, hard deadline - &amp;lt;strike&amp;gt;27/02/26 23:59&amp;lt;/strike&amp;gt; &amp;#039;&amp;#039;&amp;#039;01/03/26 23:59&amp;#039;&amp;#039;&amp;#039; | [[https://github.com/laktionov/RL-course/tree/2026/hw3 Notebook]]&lt;br /&gt;
# HW-4 &amp;quot;PPO Implementation&amp;quot; (&amp;#039;&amp;#039;5 points&amp;#039;&amp;#039;) | &amp;#039;&amp;#039;&amp;#039;Soft deadline - 08/03/26 23:59&amp;#039;&amp;#039;&amp;#039;, hard deadline - 15/03/26 23:59 | [[https://github.com/laktionov/RL-course/blob/main/hw4/ppo.ipynb Notebook]]&lt;br /&gt;
# HW-5 &amp;quot;SAC Implementation&amp;quot; (&amp;#039;&amp;#039;5 points&amp;#039;&amp;#039;) | &amp;#039;&amp;#039;&amp;#039;Soft deadline - 15/03/26 23:59&amp;#039;&amp;#039;&amp;#039;, hard deadline - 22/03/26 23:59 | [[https://github.com/laktionov/RL-course/blob/2026/hw5/sac.ipynb Notebook]]&lt;br /&gt;
&lt;br /&gt;
Deadline for getting the paper approved - &amp;#039;&amp;#039;&amp;#039;15/03/26 23:59&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
Deadline for submitting the paper - &amp;lt;strike&amp;gt;21/03/26 23:59&amp;lt;/strike&amp;gt; &amp;lt;strike&amp;gt;24/03/26 23:59&amp;lt;/strike&amp;gt; &amp;#039;&amp;#039;&amp;#039;25/03/26 8:59&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
== References ==
# [http://incompleteideas.net/book/the-book-2nd.html Reinforcement Learning: An Introduction by R.Sutton and A.Barto]&lt;br /&gt;
# [https://github.com/yandexdataschool/Practical_RL Practical RL course by YSDA]&lt;br /&gt;
# [https://www.davidsilver.uk/teaching/ David Silver&amp;#039;s course]&lt;br /&gt;
# [https://rail.eecs.berkeley.edu/deeprlcourse/ Sergey Levine&amp;#039;s course]&lt;br /&gt;
# [https://arxiv.org/abs/2201.09746 Reinforcement Learning Textbook (in Russian)]&lt;/div&gt;</summary>
		<author><name>imported&gt;Murrcha</name></author>
	</entry>
</feed>