Designed for: State dimensionality.
Description: We reuse the Catch environment from DeepMind's dm_env: the agent sees a grid with a ball dropping from the top and a paddle at the bottom. The player can move the paddle and has to catch the ball when it reaches the bottom row.
Variable parameters:
Grid Size (rows & columns): Influences the range of values within an observation type and its memory footprint.
Observation Type (observation_type): Determines the observation space for deep-learning approaches. Choice between: Default (Grid) and RGB-Image.
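A minimal configuration sketch tying these parameters together; the class name, the field names other than observation_type, and the defaults are illustrative assumptions, not the notebook's actual interface:

```python
from dataclasses import dataclass

@dataclass
class CatchConfig:
    """Hypothetical configuration object for illustration only;
    the notebook's actual parameter handling may differ."""
    rows: int = 10                  # grid height (10x5 is dm_env's Catch default)
    columns: int = 5                # grid width
    observation_type: str = "grid"  # "grid" (default) or "rgb"
```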
State space:
Vectorized: MultiDiscrete([rows, rows, columns])
Grid: Box(0, 1, [rows, columns])
Image: Box(0, 255, [rows, columns, 3])
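For reference, the three spaces above can be written down with Gymnasium's space classes; a sketch, assuming a Gymnasium-style wrapper, with rows and columns being the grid dimensions from above:

```python
import numpy as np
from gymnasium import spaces

rows, columns = 10, 5  # example grid size

# Vectorized observation: three discrete components, as specified above.
vectorized = spaces.MultiDiscrete([rows, rows, columns])

# Grid observation: binary grid marking the ball and paddle cells.
grid = spaces.Box(low=0, high=1, shape=(rows, columns))

# Image observation: RGB rendering of the grid.
image = spaces.Box(low=0, high=255, shape=(rows, columns, 3), dtype=np.uint8)
```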
Action space: Discrete(3). The agent can move left, right, or stay put.
Dynamics: The ball drops by one cell along the y-axis at every step.
Reward function: 0 at every intermediate step. At the end of the episode, +1 if the ball was caught, otherwise -1.
Initial state: The paddle always starts in the middle of the bottom row. The ball starts in a random column of the top row.
Termination: The episode ends when the ball reaches the paddle's row.
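Taken together, the dynamics, reward, and termination rules fit in a few lines. A minimal sketch of a transition function, where the state tuple, the variable names, and the action encoding (0 = left, 1 = stay, 2 = right) are assumptions for illustration:

```python
def step(state, action, rows, columns):
    """One transition of Catch. `state` is (ball_row, ball_col, paddle_col);
    the action encoding 0/1/2 = left/stay/right is an assumption."""
    ball_row, ball_col, paddle_col = state

    # Move the paddle, clipped to the grid boundaries.
    paddle_col = min(max(paddle_col + action - 1, 0), columns - 1)

    # The ball drops by one cell along the y-axis.
    ball_row += 1

    # The episode ends when the ball reaches the paddle's (bottom) row.
    terminated = ball_row == rows - 1
    reward = 0.0
    if terminated:
        reward = 1.0 if ball_col == paddle_col else -1.0

    return (ball_row, ball_col, paddle_col), reward, terminated
```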
Compared algorithms (notebook): Tabular Q-Learning, Deep Q-Learning (Stable-Baselines3)
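As a usage sketch for the deep-learning baseline, Stable-Baselines3's DQN can be trained on a Gymnasium-compatible version of the environment; CatchEnv here is the hypothetical wrapper from the configuration sketch above, not the notebook's actual class:

```python
from stable_baselines3 import DQN

env = CatchEnv(rows=10, columns=5, observation_type="grid")  # hypothetical wrapper
model = DQN("MlpPolicy", env, verbose=1)  # "CnnPolicy" would suit the RGB-Image observation
model.learn(total_timesteps=50_000)
```

The tabular variant instead indexes a Q-table directly by the vectorized MultiDiscrete state, which is why the grid size drives the table's memory footprint.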