Assignment 6
Due: 11:00 AM on Wednesday, November 4, 2009
Question #1
Write and document a Python script index2.py which takes as
command-line arguments the names of an ignored-word file (one word per line),
a text-file, and an index-file, and writes to the index-file a sorted
index listing, for each word in the text-file that is not in the
ignored-word file, the lines of the text-file on which that word
occurs.
You may assume that the only non-word characters in a text-file are
apostrophes, quotation marks (single and double), parentheses, exclamation
marks, question marks, colons, semi-colons, commas, and periods.
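As an illustration only, the non-word characters listed above could be removed with a translation table. The sketch below assumes Python 3 and assumes that each non-word character is replaced by a space before the line is split into words; the names NON_WORD_CHARS and words_in_line are hypothetical and not part of the assignment.

```python
# The characters the assignment lists as the only non-word characters:
# apostrophe / single quote, double quote, parentheses, ! ? : ; , and period.
NON_WORD_CHARS = "'\"()!?:;,."

# Translation table mapping each non-word character to a space.
# Replacing with a space (rather than deleting) is an assumption.
_TO_SPACE = str.maketrans(NON_WORD_CHARS, " " * len(NON_WORD_CHARS))


def words_in_line(line):
    """Return the list of words in one line of text, punctuation removed."""
    return line.translate(_TO_SPACE).split()


print(words_in_line('Hello, world! (Is this "fine"?)'))
```

Whether an apostrophe inside a word should split that word is not stated here, so check the behaviour against index2.script.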
Your script must work on ignored-word file word1.txt and text-file
text1.txt to produce index-file index1.txt, as specified in
typescript-file index2.script.
You may assume that all given files are formatted correctly.
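One possible way to organize the index in Question #1 is a dictionary that maps each word to the list of line numbers on which it occurs. The sketch below is not a complete solution: it assumes Python 3, assumes that lines are numbered from 1, that punctuation has already been removed from each line, that matching is case-sensitive, and that a line is recorded at most once per word. The function name build_index is hypothetical, and the required output format is the one given in index2.script, which is not reproduced here.

```python
def build_index(text_lines, ignored_words):
    """Map each non-ignored word to the list of line numbers where it occurs."""
    index = {}
    for line_number, line in enumerate(text_lines, start=1):
        for word in line.split():
            if word in ignored_words:
                continue
            lines = index.setdefault(word, [])
            # Record each line at most once per word (an assumption).
            if not lines or lines[-1] != line_number:
                lines.append(line_number)
    return index


index = build_index(["the cat sat", "the dog sat on the cat"], {"the", "on"})
for word in sorted(index):
    print(word, index[word])
```

Because lines are processed in order, each list of line numbers is already sorted; only the words need to be sorted when the index is written out.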
Question #2
A market-basket file is a file in which each line has the form
X: L and describes a shopping trip made by customer X
in which the items in list L were purchased. Such
market-basket files are used by on-line retail sites such as Amazon.ca to
suggest additional items that a customer X might want to purchase
based on past purchases by other customers that share one or more
items with X's current purchase.
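As a rough illustration of the X: L line format, the sketch below splits one line into a customer name and a list of items. It assumes Python 3 and assumes that the items in L are separated by whitespace; the actual separator should be checked against basketTrain1.dat. The function name parse_basket_line and the sample line are hypothetical.

```python
def parse_basket_line(line):
    """Split a line of the form 'X: L' into (customer, list of items)."""
    customer, _, item_text = line.partition(":")
    # Items are assumed to be whitespace-separated (an assumption).
    return customer.strip(), item_text.split()


print(parse_basket_line("alice: milk bread eggs\n"))
```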
Write and document a Python script market.py which takes
as command-line arguments a training market-basket file and a current
market-basket file and, for each transaction T in the current
market-basket file, prints a list (sorted in reverse order by total
number of times purchased) of all additional items purchased
any time an item from T was purchased in the transactions
listed in the training market-basket file.
Your script must work on training market-basket file basketTrain1.dat
and current market-basket file basketCurrent1.dat to produce the
output given in typescript-file market.script.
You may assume that all given files are formatted correctly.
Hints
You may find the various example scripts in the course notes of use.
In Q1, you may find it useful to store words in the index and their
associated line-occurrences as a dictionary of lists.
In Q2, you may find it useful to create and store indices of the customers
purchasing particular items (as a dictionary of sets) and of the items
purchased by particular customers, along with the number of times each
item was purchased (as a dictionary of dictionaries).
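The two indices suggested for Q2 might be built along the following lines. This is a minimal sketch, assuming Python 3 and assuming that the training file has already been read into (customer, items) pairs; the names build_purchase_indices, customers_by_item, and items_by_customer, and the sample data, are hypothetical. It builds only the data structures and does not produce the output required in market.script.

```python
def build_purchase_indices(transactions):
    """Build both indices from an iterable of (customer, items) pairs."""
    customers_by_item = {}  # item -> set of customers who purchased it
    items_by_customer = {}  # customer -> {item: number of times purchased}
    for customer, items in transactions:
        counts = items_by_customer.setdefault(customer, {})
        for item in items:
            customers_by_item.setdefault(item, set()).add(customer)
            counts[item] = counts.get(item, 0) + 1
    return customers_by_item, items_by_customer


training = [
    ("alice", ["milk", "bread"]),
    ("bob", ["milk", "eggs"]),
    ("alice", ["milk"]),
]
customers_by_item, items_by_customer = build_purchase_indices(training)
print(sorted(customers_by_item["milk"]))
print(items_by_customer["alice"])
```

Using setdefault keeps the code free of explicit "is this key present?" checks; collections.defaultdict would be an equally reasonable choice.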
Submission
Please hand in printed copies of all of your Python script files.
You must also submit these files electronically using the
submit-assignment command.
Note that each script file must have the following comment
block at the top, where the X's are replaced with the appropriate
information, followed by a docstring briefly describing the program in that
script. For instance, my script for Question #1 of this assignment would
begin with the following comment block:
#########################################################
## CS 2500 (Fall 2009), Assignment #6, Question #1 ##
## Script File Name: index2.py ##
## Student Name: Todd Wareham ##
## Login Name: harold ##
## MUN #: 8008765 ##
#########################################################
You do not have to develop your code on our CS departmental systems.
However, as your code will be interpreted and tested on our CS departmental
systems as part of the assignment marking process,
you should ensure that your code interprets and runs correctly on at
least one of these systems.
- August 24, 2:00pm
Assignment #6 posted.
Created: August 24, 2009
Last Modified: August 24, 2009