Assignment 2
Due: 11:59 PM via Brightspace on Tuesday, October 22, 2024
Program and test a Python-based system for creating and using finite-state
transducers by performing the following tasks:
- Write the following Python functions:
- readFST(filename): Read in a description of a FST stored
in file filename and return a stored version of that
FST.
The format of a FST description file is as follows: The first
line gives the number of states in the FST and a string
containing all characters in the lower and upper strings
encoded in the FST. The special symbol "-" will be used to
denote the symbol epsilon. This is followed by one or more
groups of lines, one per state, which encode the transitions going
outwards from that state. In each such group, the first line
gives the state number and an character indicating whether
that state is ("F") or is not ("N") final and each subsequent line
specifies a transition by the lower and upper symbols in that
transition and the number of the state towards which that
transition is directed. The states in an n-state FST will
be numbered from 1 to n and state 1 is assumed to be the
start state. Two example FST descriptions for FST over the
alphabet {a,b,c} that (1) transform all b's following the leftmost a
into c's and (2) delete all b's except the leftmost one are given in
files test1.fst and
test2.fst, respectively.
- composeFST(F1,F2): Given two stored versions F1 and
F2, create and return the FST that is F1
composed with F2.
- reconstructUpper(l, F): Print the set of upper strings
associated with lower string l by FST F.
- reconstructLower(u, F): Print the set of lower strings
associated with upper string u by FST F.
- Using the functions described above, write and document a Python script
reconstruct.py which performs different functions depending
on the following command-line argument-sets:
- reconstruct.py surface wlf F1 F2 ... Fn: Print the
surface (upper) forms associated with each of the lexical (lower) forms
in lexical-form file wlf relative to the FST
created by composing the FST described in files F1
through Fn in that order. It is possible that only
a single FST may be specified.
- reconstruct.py lexical wsf F1 F2 ... Fn: Print the
lexical (lower) forms associated with each of the surface (upper) forms
in surface-form file wsf relative to the FST
created by composing the FST described in files F1
through Fn in that order. It is possible that only
a single FST may be specified.
Prior to printing any reconstructed word-forms, each of these
commands must print the number of states and the number of transitions
in the composed FST (which may or may not include a composed lexicon
FST, depending on the command executed) used to reconstruct word-forms.
Moreover, for each given form in a lexical- or surface-form file, that
form must be printed immediately before the list of reconstructed forms
associated with that form.
- Using the Python script reconstruct.py described above, perform
the following tests relative to given lexical-form
file word.lex, surface-form file
word.srf, and FST description files
addVowPlu.fst,
delPlu1.fst,
delPlu2.fst, and
vcePlu.fst:
- reconstruct.py surface word.lex vcePlu.fst
- reconstruct.py lexical word.srf vcePlu.fst
- reconstruct.py surface word.lex addVowPlu.fst
- reconstruct.py lexical word.srf addVowPlu.fst
- reconstruct.py surface word.lex vcePlu.fst addVowPlu.fst
- reconstruct.py lexical word.srf vcePlu.fst addVowPlu.fst
- reconstruct.py surface word.lex vcePlu.fst addVowPlu.fst delPlu1.fst
- reconstruct.py lexical word.srf vcePlu.fst addVowPlu.fst delPlu1.fst
- reconstruct.py surface word.lex vcePlu.fst addVowPlu.fst delPlu2.fst
- reconstruct.py lexical word.srf vcePlu.fst addVowPlu.fst delPlu2.fst
Your script must work on the requested tests and provided datafiles
as command-line arguments to produce the output given in
typescript-file reconstruct.script.
Your Python code is only allowed to use language-internal features and
the language-standard packages "io" and "sys". Any use of packages such
as "nltk" or "openFST" written in Python or any other language that
gives NLP, FSA, Trie, and/or FST functionality will result in an
assignment mark of zero.
Hints
You may find it useful to store an FST as a dictionary of dictionaries,
where each sub-dictionary encodes the transitions outward from a
particular state. In the sub-dictionaries, as there may be multiple
transitions outwards from a state labelled with the same l/u symbol-pair,
consider having lists of state-designators rather than single state-designators
as the values associated with an l/u symbol-pair key.
You may find it useful to implement the required functions composeFST,
reconstructUpper, and reconstructLower
using algorithms given in Lecture #8.
Relative to the symbols present in the word-forms in files word.lex and
word.srf, symbol P is the plural marker, symbols [p], [t], [s], and [S]
are voiceless sounds (with [S] corresponding to [sh] in the ARPAbet), symbols
[b], [d], [e], [o], [z], and [Z] are voiced sounds (with [Z] corresponding to
[zh] in the ARPAbet), and symbols [s], [z], [S], and [Z] are fricatives.
Submission
Please submit copies of all of your Python code electronically
through the course D2L / Brightspace
shell by 11:59pm on the assignment due date.
Submission of a script file
whose results cannot be replicated by the associated submitted Python
code will result in an assignment mark of zero.
Note that each Python script file must have the following
comment block at the top, where the X's are replaced
with the appropriate information, followed by a docstring briefly describing
the program in that script. For instance, my reconstruct.py script for
this assignment would begin with the following comment block:
#########################################################
## CS 4750 (Fall 2023), Assignment #2 ##
## Script File Name: reconstruct.py ##
## Student Name: Todd Wareham ##
## Login Name: harold ##
## MUN #: 8008765 ##
#########################################################
You do not have to develop your code on our CS departmental systems.
However, as your code will be compiled and tested on our CS departmental
systems as part of the assignment marking process,
you should ensure that your code compiles and runs correctly on at
least one of these systems.
- September 26, 8:20am
Assignment #2 due date changed to October 22.
- June 30, 11:15am
Assignment #2 posted.
Created: June 30, 2024
Last Modified: September 27, 2024