Designed for: State/action spaces, discrete versus continuous.
Description: The agent (a robotic arm with 2 degrees of freedom) has to move red boxes to the container (green box). The main idea illustrated by this environment is that the discretisation of continuous action spaces into an appropriate number of unique actions increases sample efficiency. Additionally, a lower number of possible actions further speeds up learning, as long as the precision level that is necessary to achieve the task is not compromised.
Variable parameters:
Discretization level: when a discrete action space is used, determines how many bins the continuous action range is divided into.
Width of the container: a smaller green container makes the problem harder by increasing the level of precision required in order to obtain the final reward.
State space: Box(-1, 1, [7,]): (4) current positions of the motor joints, (2) coordinates of the magnet, (1) binary variable indicating whether the arm is holding a box.
Action space:
Box(-1, 1, [2,]): the change applied to each joint angle, in radians, within the range [-1, 1].
MultiDiscrete([num_bins, num_bins]): the same continuous range discretized into 3, 5, 7, 9, or 11 bins per joint.
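The relationship between the two action formulations can be sketched as a mapping from bin indices back to joint-angle changes. This is a minimal illustration, not the environment's actual code: it assumes the bins are evenly spaced over [-1, 1] with both endpoints included, and the helper names bin_to_delta and decode_action are hypothetical.

```python
from typing import List, Sequence


def bin_to_delta(index: int, num_bins: int,
                 low: float = -1.0, high: float = 1.0) -> float:
    """Map one bin index to a joint-angle change in [low, high].

    Assumption: bins are evenly spaced and include both endpoints,
    so an odd num_bins always contains the "no movement" value 0.
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    if not 0 <= index < num_bins:
        raise ValueError("index must lie in [0, num_bins)")
    step = (high - low) / (num_bins - 1)
    return low + index * step


def decode_action(action: Sequence[int], num_bins: int) -> List[float]:
    """Convert a MultiDiscrete-style action (one index per joint)
    into the equivalent continuous action."""
    return [bin_to_delta(i, num_bins) for i in action]


if __name__ == "__main__":
    # With 3 bins, each joint can only move by -1, 0, or +1 radians.
    print(decode_action([0, 2], num_bins=3))
    # With 11 bins, the same range is covered in finer steps.
    print(decode_action([5, 6], num_bins=11))
```

Under this assumed mapping, fewer bins give the agent fewer actions to explore but coarser movements, which is the trade-off against the container width described above.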
Dynamics: At every time step the two controllable angles of the motors are updated based on the selected action values. Collision checks between the agent and the rest of the environment are performed to grant rewards and to check for terminal conditions.
Reward function: +1 reward for picking up a red box. +2 reward for putting the red box into the green box. Up to +2 for placing the red box at the exact centre of the green container (incentive for precise movement). -2 for colliding with the other environment objects. -1 for exceeding the max step limit (150).
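The reward terms listed above can be combined as in the following sketch. It is an illustration under stated assumptions, not the environment's implementation: the text only says the precision bonus is "up to +2", so the linear falloff with distance from the container centre is assumed, as is treating the step limit as hit once the step count reaches 150. The names step_reward, precision_bonus, offset, and half_width are hypothetical.

```python
PICKUP_REWARD = 1.0        # picking up a red box
PLACE_REWARD = 2.0         # putting the red box into the green box
MAX_PRECISION_BONUS = 2.0  # box placed at the exact centre
COLLISION_PENALTY = -2.0   # colliding with other objects
TIMEOUT_PENALTY = -1.0     # hitting the step limit
MAX_STEPS = 150


def precision_bonus(offset: float, half_width: float) -> float:
    """Bonus for placing the box close to the container centre.

    Assumption: the bonus decays linearly from MAX_PRECISION_BONUS at
    the centre (offset 0) to 0 at the container edge (|offset| equal
    to half_width). The source does not specify the actual shape.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    closeness = max(0.0, 1.0 - abs(offset) / half_width)
    return MAX_PRECISION_BONUS * closeness


def step_reward(picked_up: bool = False, placed: bool = False,
                offset: float = 0.0, half_width: float = 1.0,
                collided: bool = False, step_count: int = 0) -> float:
    """Sum the reward terms that apply on a single time step."""
    reward = 0.0
    if picked_up:
        reward += PICKUP_REWARD
    if placed:
        reward += PLACE_REWARD + precision_bonus(offset, half_width)
    if collided:
        reward += COLLISION_PENALTY
    if step_count >= MAX_STEPS:  # assumed threshold, see lead-in
        reward += TIMEOUT_PENALTY
    return reward


if __name__ == "__main__":
    # Perfectly centred placement: 2 for placing plus the full bonus.
    print(step_reward(placed=True, offset=0.0, half_width=0.5))
    # Placement halfway between the centre and the edge.
    print(step_reward(placed=True, offset=0.25, half_width=0.5))
```

With the assumed linear bonus, a narrower container (smaller half_width) makes the same absolute offset cost more of the bonus, which matches the idea that container width controls the precision required.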
Initial state: The initial state of the environment is shown in the figure above.
Termination: The episode ends when the agent or the red box collides with other objects, or when the maximum step count (150) is reached.
Compared algorithms (notebook): Proximal Policy Optimisation (PPO) - continuous and discrete formulations.