Create two Python scripts CKYdet.py and CKYprb.py that implement deterministic and probabilistic versions of the Cocke-Kasami-Younger algorithm for parsing context-tree grammars in extended Chomsky Normal Form (eCNF). Each of these scripts will take two arguments, namely, the filenames of an eCNF grammar file and an utterance file. The output associated with these arguments is, for each utterance in the utterance file, the set of all valid parses for that utterance relative to the given grammar (in the case of CKYdet.py) or the most probable valid parse and the probability of that parse (in the case of CKYprb.py), where a valid parse is one whose topmost non-terminal is "S" (see below). In cases where an utterance does not have a valid parse, the output for that utterance will be the sentence "No valid parse".
An eCNF grammar file will consist of one or more lines, where each line describes a single rule of the form LHS -> RHS (in the case of a deterministic eCNF grammar) or LHS -> RHS prb (in the case of a probabilistic eCNF grammar), LHS is a non-blank string denoting a non-terminal, RHS can be either (1) one or two non-blank strings denoting non-terminals or (2) a double-quote enclosed string denoting a word, and prb is a floating-point number x such that 0.0 lt x lte 1.0. An example deterministic eCNF grammar is
S -> VP S -> NP VP VP -> V NP VP -> V NP -> Det N Det -> "the" N -> "man" N -> "elephant" V -> "shot" V -> "shoot" V -> "book" V -> "booked"
and an example probabilistic eCNF grammar is
S -> VP 0.3 S -> NP VP 0.7 VP -V NP 0.9 VP -> V 0.1 NP -> Det N 1.0 Det -> "the" 1.0 N -> "man" 0.5 N -> "elephant" 0.5 V -> "shot" 0.35 V -> "shoot" 0.25 V -> "book" 0.15 V -> "booked" 0.25
Non-terminal "S" is the start / utterance non-terminal in every grammar. A parse of an utterance relative to a grammar will be output as a bracketed representation of the derivation-tree associated with that parse. For example, the parses of the utterances "the man shot the elephant" and "shoot the man" relative to the example eCNF grammar above have the bracketed representations
[S [NP [Det "the"] [N "man"]] [VP [V "shot] [NP [Det "the"] [N "elephant"]]]]
and
[S [VP [V "shoot"] [NP [Det "the"] [N "man"]]]]
respectively.
An utterance file consists of one or more utterances (one utterance per line). For simplicity, you may assume that each utterance consists only of words and will not include punctuation or capitalization (except in the case of proper nouns). An example utterance file is
the man shot the elephant shoot the elephant see the elephant I shot the elephant in my pajamas
and the output produced by CKYdet.py for this utterance relative to the example deterministic eCNF grammar given above would be
Utterance #1: the man shot the elephant Parse #1: [S [NP [Det "the"] [N "man"]] [VP [V "shot] [NP [Det "the"] [N "elephant"]]]] Utterance #2: shoot the elephant Parse #1: [S [VP [V "shoot"] [NP [Det "the"] [N "elephant"]]]] Utterance #3: see the elephant No valid parse Utterance #4: I shot the elephant in my pajamas No valid parse
Given your scripts CKYdet.py and CKYprb.py, you will perform the tests
relative to the deterministic eCNF grammar file g1.dcf, probabilistic eCNF grammar files g1.pcf, g2.pcf, and g3.pcf, and utterance files u1.utt and u2.utt.
Your Python code is only allowed to use language-internal features and the language-standard packages "io" and "sys". Any use of packages such as "nltk" written in Python or any other language that gives CKY functionality will result in an assignment mark of zero.
You may find it useful to have multiple internal representations of an eCNF grammar depending on how you want to access that grammar.
######################################################### ## CS 4750 (Fall 2014), Assignment #3 ## ## Script File Name: CKYdet.py ## ## Student Name: Todd Wareham ## ## Login Name: harold ## ## MUN #: 8008765 ## #########################################################You do not have to develop your code on our CS departmental systems. However, as your code will be compiled and tested on our CS departmental systems as part of the assignment marking process, you should ensure that your code compiles and runs correctly on at least one of these systems.
Created: November 2, 2014
Last Modified: November 27, 2014