Practical Reinforcement Learning
Course Lectures
001. Why should you care (9:43)
002. Reinforcement learning vs all (3:11)
003. Multi-armed bandit (4:52)
004. Decision process & applications (6:50)
005. Markov Decision Process (5:20)
006. Crossentropy method (9:57)
007. Approximate crossentropy method (5:15)
008. More on approximate crossentropy method (6:24)
009. Evolution strategies core idea (6:32)
010. Evolution strategies math problems (5:36)
011. Evolution strategies log-derivative trick (8:21)
012. Evolution strategies duct tape (6:44)
013. Blackbox optimization drawbacks (4:54)
014. Reward design (15:44)
015. State and Action Value Functions (13:06)
016. Measuring Policy Optimality (6:15)
017. Policy evaluation & improvement (10:44)
018. Policy and value iteration (8:13)
019. Model-based vs model-free (8:22)
020. Monte-Carlo & Temporal Difference; Q-learning (8:51)
021. Exploration vs Exploitation (8:25)
022. Footnote Monte-Carlo vs Temporal Difference (2:56)
023. Accounting for exploration. Expected Value SARSA. (11:16)
024. On-policy vs off-policy; Experience replay (7:33)
025. Supervised & Reinforcement Learning (17:01)
026. Loss functions in value based RL (11:21)
027. Difficulties with Approximate Methods (15:29)
028. DQN bird's eye view (9:14)
029. DQN the internals (9:49)
030. DQN statistical issues (6:09)
031. Double Q-learning (6:22)
032. More DQN tricks (10:54)
033. Partial observability (17:56)
034. Intuition (9:48)
035. All Kinds of Policies (4:26)
036. Policy gradient formalism (8:20)
037. The log-derivative trick (3:40)
038. REINFORCE (8:38)
039. Advantage actor-critic (6:35)
040. Duct tape zone (4:48)
041. Policy-based vs Value-based (4:22)
042. Case study A3C (6:53)
043. A3C case study (2/2) (3:46)
044. Combining supervised & reinforcement learning (6:40)
045. Recap bandits (7:48)
046. Regret measuring the quality of exploration (6:39)
047. The message just repeats. 'Regret, Regret, Regret.' (5:44)
048. Intuitive explanation (7:04)
049. Thompson Sampling (5:13)
050. Optimism in face of uncertainty (5:24)
051. UCB-1 (6:58)
052. Bayesian UCB (11:44)
053. Introduction to planning (17:46)
054. Monte Carlo Tree Search (10:39)