The task here is to implement the value iteration planning algorithm. The environment is a simple grid-based world which is simulated in ROS through a node called gridsim. The agent who lives in this world can move to one of the 8 adjacent cells in the grid, as long as the map does not indicate that the destination cell is an obstacle.
Download the following zip file which contains all necessary components:
Copy value_it_template.py to value_it.py. Your job is to fill in the value_it method in value_it.py.
For task 1, you can make the assumption that the movement of the robot is deterministic. This means that in any state x_i when an action u is taken, only one new state x_j is possible.
You can use the payoff function suggested in the class notes for value iteration. However, instead of using a payoff of -1 for any single movement, you should use -sqrt(2) for diagonal movements (think about why).
To test your code execute the following: roslaunch a5_p2 office.launch. Take a look at the launch file to see what is going on. The following nodes are created:
- The simulator, gridsim.py
- The map server, map_server, which is used to provide access to the map to the other nodes
- A Planner object, defined in planner.py. The Planner will call your value_it function
Once the launch file is executed you should see the map, overlaid with a green square indicating the agent’s position. Nothing will happen until a goal position is selected. Click on the “2D Nav Goal” button then click somewhere within the map (this tool allows a pose to be selected, but our agents have no orientation, so the selected direction is ignored). If you uncomment the starter code provided for value_it then you will see a random value function superimposed over the map in blue. You should also see randomly oriented arrows which are intended to show the control policy. The agent may make a few movements. However, since the control policy is random, we cannot expect much to happen.
Once your implementation is complete run the launch file again and you should hopefully see a sensible value function and control policy.
- Use gamma = 0.999
- The data stored in occupancy_grid is a one-dimensional list that holds two-dimensional data. This data is stored in row-major order which means that it is filled row-by-row. To access column i, row j, do the following:
occupancy_grid.data[width * j + i]
where width is defined as follows:
width = occupancy_grid.info.width
The occupancy grid contains 0 for free space, 100 for obstacles, and -1 for ice (see task 2).
Completing this part will give you up to 15% bonus marks which can offset marks lost on other assignments (but not on other aspects of the course).
Expand on your solution from to handle a probabilistic motion model. Motion will be deterministic unless the robot is on an icey surface. On an icey surface the robot goes in its intended direction 50% of the time. The other 50% of the time it will go in a different randomly selected direction (but will not stand still). On a non-icey surface the robot goes in its intended direction 100% of the time (just as in task 1). Note that iciness is indicated with when the occupancy grid’s value is -1.
Test your code on icey.launch. You should be able to demonstrate that the algorithm tends to avoid icey patches where possible. Also, it should be demonstrated to move through a small icey patch, but to avoid a larger patch.
Submit value_it.py via D2L by the deadline given here.