Summary

In this chapter, we learned about OpenAI Gym: how to install it, and the important functions used to load and render an environment and to inspect its state and action spaces. We learned about the Epsilon-Greedy approach as a solution to the exploration-exploitation dilemma. We then implemented basic Q-learning and Q-network algorithms to train a reinforcement-learning agent to navigate an environment from OpenAI Gym.
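The sketch below recaps how Epsilon-Greedy action selection and the tabular Q-learning update fit together. It is not the chapter's own code. To keep it self-contained, it assumes a hypothetical five-state corridor in place of a Gym environment. The agent starts in state 0 and earns a reward of 1 on reaching state 4. The helpers `step`, `epsilon_greedy`, and `train`, and the hyperparameter values, are illustrative choices only.

```python
import random

# Hypothetical toy environment (an assumption, standing in for a Gym task):
# states 0..4 on a line, action 0 moves left, action 1 moves right,
# and reaching state 4 ends the episode with reward 1.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties randomly so an untrained agent does not always pick action 0.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an Epsilon-Greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrap from the greedy value of the next state (zero at the goal).
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

if __name__ == "__main__":
    q_table = train()
    # Greedy action in each non-terminal state after training.
    print([max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(GOAL)])
```

Epsilon controls the trade-off discussed in the chapter. With `epsilon=0` the agent only exploits its current estimates, while larger values spend more steps exploring. A Q-network replaces the table `q` with a function approximator that maps a state to one value per action.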

In the next chapter, we will cover the most fundamental concepts in Reinforcement Learning, including Markov Decision Processes (MDPs), the Bellman Equation, and Markov Chain Monte Carlo.