<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="ru">
	<id>https://www.wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_24%2F25_%28%D0%9C%D0%9E%D0%92%D0%A123%29</id>
	<title>Reinforcement Learning 24/25 (МОВС23) - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.wikicshse.ru/index.php?action=history&amp;feed=atom&amp;title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_24%2F25_%28%D0%9C%D0%9E%D0%92%D0%A123%29"/>
	<link rel="alternate" type="text/html" href="https://www.wikicshse.ru/index.php?title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_24/25_(%D0%9C%D0%9E%D0%92%D0%A123)&amp;action=history"/>
	<updated>2026-06-06T22:54:01Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://www.wikicshse.ru/index.php?title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_24/25_(%D0%9C%D0%9E%D0%92%D0%A123)&amp;diff=1515&amp;oldid=prev</id>
		<title>imported&gt;Ekantonistova 2: Migrated current public revision from wiki.cs.hse.ru</title>
		<link rel="alternate" type="text/html" href="https://www.wikicshse.ru/index.php?title=%D0%9E%D0%B1%D1%83%D1%87%D0%B5%D0%BD%D0%B8%D0%B5_%D1%81_%D0%BF%D0%BE%D0%B4%D0%BA%D1%80%D0%B5%D0%BF%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5%D0%BC_24/25_(%D0%9C%D0%9E%D0%92%D0%A123)&amp;diff=1515&amp;oldid=prev"/>
		<updated>2025-02-17T02:21:32Z</updated>

		<summary type="html">&lt;p&gt;Migrated current public revision from wiki.cs.hse.ru&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
==About the course==&lt;br /&gt;
&lt;br /&gt;
Classes are held on [https://us06web.zoom.us/j/83989277435?pwd=bWZqj4WbblAPbsJaE0KSbgMmJNgnWY.1 Zoom] &amp;#039;&amp;#039;&amp;#039;on Saturdays at 14:30.&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
==Contacts==&lt;br /&gt;
&lt;br /&gt;
Course chat on Telegram: [https://t.me/+m2pVU4F3nsU2YmIy link]&lt;br /&gt;
&lt;br /&gt;
Instructor: Лактионов Сергей Дмитриевич&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Teaching assistant !! Contact&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Оленина Александра || [https://t.me/alex_deer @alex_deer]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Сивых Егор || [https://t.me/EgorSivykh @EgorSivykh]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Прошин Александр || [https://t.me/Alex_Pro_7 @Alex_Pro_7]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Максутова Айза || [https://t.me/aiziks @aiziks]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Разин Арслан || [https://t.me/CrazyBadRedCat @CrazyBadRedCat]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Демиденко Никита || [https://t.me/kalxon @kalxon]&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | Никита || [https://t.me/Nn_holt @Nn_holt]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Course materials==&lt;br /&gt;
Course playlist on YouTube: [[YouTube-playlist]]&lt;br /&gt;
&lt;br /&gt;
Recordings of lectures and seminars from the previous cohort: [[https://www.youtube.com/playlist?list=PLmA-1xX7IuzAO3gkubS2I6LuqDNBs1xcP YouTube-playlist]]&lt;br /&gt;
&lt;br /&gt;
GitHub repository with the course materials: [[https://github.com/laktionov/RL-course/tree/2025 GitHub repository]]&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Session !! Topic !! Date&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;1&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[https://github.com/laktionov/RL-course/blob/2025/week1_intro_dynamic_programming/solve_rl_tasks_without_rl.ipynb Notebook]] Introduction to RL, Bellman equations, Dynamic Programming || 18/01/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;2&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[https://github.com/laktionov/RL-course/blob/2025/week2_model_free_rl/tabular_rl.ipynb Notebook]] Model-free RL, tabular case || 25/01/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;3&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[https://github.com/laktionov/RL-course/blob/2025/week3_dqn/dqn_for_cartpole.ipynb Notebook]] Intro to deep RL: from DQN to RAINBOW and beyond || 01/02/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;4&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[https://github.com/laktionov/RL-course/blob/2025/week4_policy_based/reinforce.ipynb Notebook 1] [https://github.com/laktionov/RL-course/blob/2025/week4_policy_based/a2c.ipynb Notebook 2]] Policy Gradient Methods, Actor-Critic || 08/02/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;5&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[https://github.com/laktionov/RL-course/blob/2025/week5_advanced_policy_based/trpo.ipynb Notebook]] Advanced Actor-Critic Algorithms: TRPO, PPO || 15/02/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;6&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[Notebook]] Continuous Control: DDPG, TD3, SAC || 22/02/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;7&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[Notebook]] Offline RL || 01/03/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;8&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[Notebook]] Multi-armed Bandits || 08/03/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;9&amp;#039;&amp;#039;&amp;#039; [[Recording]] || [[Notebook]] Model-based RL || 15/03/25&lt;br /&gt;
|-&lt;br /&gt;
| style=&amp;quot;background:#eaecf0;&amp;quot; | &amp;#039;&amp;#039;&amp;#039;10&amp;#039;&amp;#039;&amp;#039; [[Recording]] || RL in the context of LLMs || 22/03/25&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Grading formula==&lt;br /&gt;
&lt;br /&gt;
Grade = MIN(10, 10*(0.65*HW/20 + 0.25*RC/5 + 0.1*TA/9)), where HW is the total score for the 5 homework assignments (2 easy and 3 hard), RC is the score for the paper presentation, and TA is the total score for the weekly quizzes.&lt;br /&gt;
&lt;br /&gt;
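The grading formula can be sketched in Python as follows. This is only an illustration: the function name course_grade, its argument names, and the sample scores are hypothetical, and the page does not say how the final value is rounded, so the rounding in the last line is for display only.&lt;br /&gt;
&lt;br /&gt;

```python
def course_grade(hw, rc, ta):
    """Grade = MIN(10, 10*(0.65*HW/20 + 0.25*RC/5 + 0.1*TA/9)).

    hw: total homework points (the five assignments sum to 20)
    rc: score for the paper presentation (divided by 5 in the formula)
    ta: total score for the weekly quizzes (divided by 9 in the formula)
    """
    weighted = 0.65 * hw / 20 + 0.25 * rc / 5 + 0.1 * ta / 9
    # The grade is capped at 10 even if a component exceeds its divisor.
    return min(10.0, 10.0 * weighted)


# Hypothetical student: 16 homework points, 4 for the presentation, 6 for quizzes.
example = course_grade(hw=16, rc=4, ta=6)
print(round(example, 2))  # 7.87
```

The three weights sum to 1, so full marks in every component give exactly the cap of 10.&lt;br /&gt;
&lt;br /&gt;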
== Homework ==&lt;br /&gt;
# HW-1 &amp;quot;Value- and policy-iteration algorithms&amp;quot; (&amp;#039;&amp;#039;2 points&amp;#039;&amp;#039;)  | &amp;#039;&amp;#039;&amp;#039;Soft deadline: 26/01/25&amp;#039;&amp;#039;&amp;#039;, hard deadline: 02/02/25 | [[https://github.com/laktionov/RL-course/blob/2025/hw1/hw-1-value-policy-iteration.ipynb Notebook]]&lt;br /&gt;
# HW-2 &amp;quot;Tabular RL&amp;quot; (&amp;#039;&amp;#039;2 points&amp;#039;&amp;#039;)  | &amp;#039;&amp;#039;&amp;#039;Soft deadline: 09/02/25&amp;#039;&amp;#039;&amp;#039;, hard deadline: 16/02/25 | [[https://github.com/laktionov/RL-course/blob/2025/hw2/advanced_tabular_rl.ipynb Notebook]]&lt;br /&gt;
# HW-3 &amp;quot;Dueling DDQN&amp;quot; (&amp;#039;&amp;#039;6 points&amp;#039;&amp;#039;)  | &amp;#039;&amp;#039;&amp;#039;Soft deadline: 02/03/25&amp;#039;&amp;#039;&amp;#039;, hard deadline: 09/03/25 | [[https://github.com/laktionov/RL-course/blob/2025/hw3/dueling_ddqn.ipynb Notebook]]&lt;br /&gt;
# HW-4 &amp;quot;PPO&amp;quot; (&amp;#039;&amp;#039;5 points&amp;#039;&amp;#039;)  | &amp;#039;&amp;#039;&amp;#039;Soft deadline: 09/03/25&amp;#039;&amp;#039;&amp;#039;, hard deadline: 16/03/25 | [[Notebook]]&lt;br /&gt;
# HW-5 &amp;quot;SAC&amp;quot; (&amp;#039;&amp;#039;5 points&amp;#039;&amp;#039;)  | &amp;#039;&amp;#039;&amp;#039;Soft deadline: 23/03/25&amp;#039;&amp;#039;&amp;#039;, hard deadline: 30/03/25 | [[Notebook]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
# [http://incompleteideas.net/book/the-book-2nd.html Reinforcement Learning: An Introduction by R.Sutton and A.Barto]&lt;br /&gt;
# [https://github.com/yandexdataschool/Practical_RL Practical RL course by YSDA]&lt;br /&gt;
# [https://www.davidsilver.uk/teaching/ David Silver&amp;#039;s course]&lt;br /&gt;
# [https://rail.eecs.berkeley.edu/deeprlcourse/ Sergey Levine&amp;#039;s course]&lt;br /&gt;
# [https://arxiv.org/abs/2201.09746 Reinforcement Learning Textbook (in Russian)]&lt;/div&gt;</summary>
		<author><name>imported&gt;Ekantonistova 2</name></author>
	</entry>
</feed>