Designed for: Dynamics, stochasticity.
Description: The ball (white dot) starts on the bottom side of the golf course. At each step, the agent hits the ball with a chosen type of swing. The goal of the agent is to get the ball onto the green, after which the game finishes and the agent receives a reward. Reaching the maximum number of allowed hits or hitting the ball off the course also ends the game, but the agent receives a negative reward instead. A stochasticity level is used to mimic the skill level of the agent: it causes the ball to deflect from the coordinates it was hit towards.
Variable parameters:
Stochasticity level (stochasticity_level). Sets the standard deviation of the zero-centred Gaussian noise from which the deflection of the ball is sampled.
State space: MultiDiscrete([width, length]). The coordinates of the ball on the field.
Action space: Discrete(3). The action is a drive, chip, or putt, i.e. a long, medium, or short shot.
Dynamics: The ball moves forward towards the flag over a distance that depends on the type of swing performed. A random deflection is applied transverse to the direction of movement; its magnitude is determined by the stochasticity level.
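A minimal sketch of how a single hit might be simulated is shown below; the swing distances, the orientation of the course (flag in the +y direction), and all names are illustrative assumptions, not taken from the environment's implementation.

```python
import numpy as np

# Assumed forward distances for drive, chip, and putt (actions 0, 1, 2).
SWING_DISTANCES = {0: 10, 1: 5, 2: 1}

def hit(ball_xy, action, stochasticity_level, rng=None):
    """Advance the ball by one hit: forward motion plus transverse noise."""
    rng = rng if rng is not None else np.random.default_rng()
    x, y = ball_xy
    # Move towards the flag, assumed to lie in the +y direction.
    y += SWING_DISTANCES[action]
    # Transverse deflection sampled from a zero-centred Gaussian whose
    # standard deviation is the stochasticity level.
    x += rng.normal(loc=0.0, scale=stochasticity_level)
    return int(round(x)), int(round(y))
```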
Reward function: -1 is obtained if the ball goes off the course or if the maximum number of hits is reached. If the green is reached, the reward is between 0 and 1, proportional to the number of hits needed (see the sketch below the termination entry).
Initial state: The ball starts at (width / 2, 0).
Termination: When the ball goes off the course, when the ball reaches the green, or when the maximum number of hits is reached.
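The reward and termination rules above could be combined as in the following hedged sketch; the `on_green` predicate and the exact scaling of the on-green reward are assumptions rather than the environment's actual code.

```python
def evaluate(ball_xy, hits, width, length, max_hits, on_green):
    """Return (reward, done) after a hit; the reward scaling is illustrative."""
    x, y = ball_xy
    off_course = not (0 <= x < width and 0 <= y < length)
    if off_course or hits >= max_hits:
        return -1.0, True                         # negative reward, game over
    if on_green(x, y):
        return 1.0 - (hits - 1) / max_hits, True  # fewer hits -> higher reward
    return 0.0, False                             # game continues
```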
Compared algorithms (notebook): Q-learning with a risk-sensitive factor.
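The notebook itself is not reproduced here; the sketch below illustrates one common way a risk-sensitive factor can be folded into tabular Q-learning, by weighting positive and negative TD errors asymmetrically (in the style of Mihatsch and Neuneier). The function name, parameters, and exact update form are assumptions.

```python
import numpy as np

def risk_sensitive_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, kappa=0.5):
    """One tabular update; kappa in (-1, 1) is the risk-sensitive factor.

    kappa > 0 amplifies negative TD errors (risk-averse), kappa < 0 amplifies
    positive ones (risk-seeking), and kappa = 0 recovers plain Q-learning.
    """
    td_error = r + gamma * np.max(Q[s_next]) - Q[s][a]
    weight = (1.0 - kappa) if td_error > 0 else (1.0 + kappa)
    Q[s][a] += alpha * weight * td_error
    return Q
```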