The task here is to implement the value iteration planning algorithm. The environment is a simple grid-based world which is simulated in ROS through a node called gridsim. The agent who lives in this world can move to one of the 8 adjacent cells in the grid, as long as the map does not indicate that the destination cell is an obstacle.
Download the following zip file which contains all necessary components:
Copy value_it_template.py to value_it.py. Your job is to fill in the value_it method in value_it.py.
For Task 1 (but not Task 2) you can make the assumption that the movement of the robot is deterministic. This means that in any state x_i when an action u is taken, only one new state x_j is possible.
You can use the payoff function suggested in the class notes for value iteration. However, instead of using a payoff of -1 for any single movement, you should use -sqrt(2) for diagonal movements (think about why).
To test your code execute the following: roslaunch a5_value_it office.launch. Take a look at the launch file to see what is going on. The following nodes are created:
- The simulator, gridsim.py
- The map server, map_server, which is used to provide access to the map to the other nodes
- A Planner object, defined in planner.py. The Planner will call your value_it function
Once the launch file is executed you should see the map, overlaid with a green square indicating the agent’s position. Nothing will happen until a goal position is selected. Click on the “2D Nav Goal” button then click somewhere within the map (this tool allows a pose to be selected, but our agents have no orientation, so the selected direction is ignored). If you uncomment the starter code provided for value_it then you will see a random value function superimposed over the map in blue. You should also see randomly oriented arrows which are intended to show the control policy. The agent may make a few movements. However, since the control policy is random, we cannot expect much to happen.
Once your implementation is complete run the launch file again and you should hopefully see a sensible value function and control policy.
Your code should complete in a timely manner. Implement some maximum number of iterations, but you should also check for convergence. Convergence means that the maximum change in the value function has fallen below some threshold value. Experiment to achieve reasonable performance (e.g. 5 seconds from the time the “2D Nav Goal” is set to the formation of the plan).
- Use gamma = 0.999
- The data stored in occupancy_grid is a one-dimensional list that holds two-dimensional data. This data is stored in row-major order which means that it is filled row-by-row. To access column i, row j, do the following:
occupancy_grid.data[width * j + i]
where width is defined as follows:
width = occupancy_grid.info.width
The occupancy grid contains 0 for free space, 100 for obstacles, and -1 for ice (see task 2).
- Similar to the Bayes Filter assignment, when updating the value function you will need to make use of a temporary array to avoid modifying the values array while reading it.
Expand on your solution to handle a probabilistic motion model. Motion will be deterministic unless the robot is on an icey surface. On an icey surface the robot goes in its intended direction 50% of the time. The other 50% of the time it will go in a different randomly selected direction (but will not stand still). On a non-icey surface the robot goes in its intended direction 100% of the time (just as in task 1). Note that iciness is indicated when the occupancy grid’s value is -1.
Test your code by launching icey.launch. You should be able to demonstrate that the algorithm tends to avoid icey patches where possible. Also, it should be demonstrated to move through a small icey patch, but to avoid a larger patch.
Submit value_it.py via D2L by 9:00 on Wednesday July 18. Demonstrate it with Rory during your assigned time slot on July 18.