Assignment 5, Part 2

The task here is to implement the value iteration planning algorithm.  The environment is a simple grid-based world which is simulated in ROS through a node called gridsim.  The agent who lives in this world can move to one of the 8 adjacent cells in the grid, as long as the map does not indicate that the destination cell is an obstacle.

Download the following zip file which contains all necessary components:

Copy value_it_template.py to value_it.py.  Your job is to fill in the value_it method in value_it.py.

For task 1, you can make the assumption that the movement of the robot is deterministic.  This means that in any state x_i when an action u is taken, only one new state x_j is possible.

You can use the payoff function suggested in the class notes for value iteration.  However, instead of using a payoff of -1 for any single movement, you should use -sqrt(2) for diagonal movements (think about why).

Testing

To test your code execute the following: roslaunch a5_p2 office.launch.  Take a look at the launch file to see what is going on.  The following nodes are created:

  • The simulator, gridsim.py
  • The map server, map_server, which is used to provide access to the map to the other nodes
  • rviz
  • A Planner object, defined in planner.py.  The Planner will call your value_it function

Once the launch file is executed you should see the map, overlaid with a green square indicating the agent’s position.  Nothing will happen until a goal position is selected.  Click on the “2D Nav Goal” button then click somewhere within the map (this tool allows a pose to be selected, but our agents have no orientation, so the selected direction is ignored).  If you uncomment the starter code provided for value_it then you will see a random value function superimposed over the map in blue.  You should also see randomly oriented arrows which are intended to show the control policy.  The agent may make a few movements.  However, since the control policy is random, we cannot expect much to happen.

Once your implementation is complete run the launch file again and you should hopefully see a sensible value function and control policy.

Hints

  • Use gamma = 0.999
  • The data stored in occupancy_grid is a one-dimensional list that holds two-dimensional data.  This data is stored in row-major order which means that it is filled row-by-row.  To access column i, row j, do the following:
    occupancy_grid.data[width * j + i]
    where width is defined as follows:
    width = occupancy_grid.info.width
    The occupancy grid contains 0 for free space, 100 for obstacles, and -1 for ice (see task 2).

Bonus [15%]

Completing this part will give you up to 15% bonus marks which can offset marks lost on other assignments (but not on other aspects of the course).

Expand on your solution from to handle a probabilistic motion model.  Motion will be deterministic unless the robot is on an icey surface.  On an icey surface the robot goes in its intended direction 50% of the time.  The other 50% of the time it will go in a different randomly selected direction (but will not stand still).  On a non-icey surface the robot goes in its intended direction 100% of the time (just as in task 1).  Note that iciness is indicated with when the occupancy grid’s value is -1.

Test your code on icey.launch.  You should be able to demonstrate that the algorithm tends to avoid icey patches where possible.  Also, it should be demonstrated to move through a small icey patch, but to avoid a larger patch.

Submission

Submit value_it.py via D2L by the deadline given here.