Assignment 3

Due: 12:00 PM on Friday, November 28, 2014


Create two Python scripts CKYdet.py and CKYprb.py that implement deterministic and probabilistic versions of the Cocke-Kasami-Younger algorithm for parsing context-tree grammars in extended Chomsky Normal Form (eCNF). Each of these scripts will take two arguments, namely, the filenames of an eCNF grammar file and an utterance file. The output associated with these arguments is, for each utterance in the utterance file, the set of all valid parses for that utterance relative to the given grammar (in the case of CKYdet.py) or the most probable valid parse and the probability of that parse (in the case of CKYprb.py), where a valid parse is one whose topmost non-terminal is "S" (see below). In cases where an utterance does not have a valid parse, the output for that utterance will be the sentence "No valid parse".

An eCNF grammar file will consist of one or more lines, where each line describes a single rule of the form LHS -> RHS (in the case of a deterministic eCNF grammar) or LHS -> RHS prb (in the case of a probabilistic eCNF grammar), LHS is a non-blank string denoting a non-terminal, RHS can be either (1) one or two non-blank strings denoting non-terminals or (2) a double-quote enclosed string denoting a word, and prb is a floating-point number x such that 0.0 lt x lte 1.0. An example deterministic eCNF grammar is

and an example probabilistic eCNF grammar is

Non-terminal "S" is the start / utterance non-terminal in every grammar. A parse of an utterance relative to a grammar will be output as a bracketed representation of the derivation-tree associated with that parse. For example, the parses of the utterances "the man shot the elephant" and "shoot the man" relative to the example eCNF grammar above have the bracketed representations

and

respectively.

An utterance file consists of one or more utterances (one utterance per line). For simplicity, you may assume that each utterance consists only of words and will not include punctuation or capitalization (except in the case of proper nouns). An example utterance file is

and the output produced by CKYdet.py for this utterance relative to the example deterministic eCNF grammar given above would be

Given your scripts CKYdet.py and CKYprb.py, you will perform the tests

relative to the deterministic eCNF grammar file g1.dcf, probabilistic eCNF grammar files g1.pcf, g2.pcf, and g3.pcf, and utterance files u1.utt and u2.utt.

Your Python code is only allowed to use language-internal features and the language-standard packages "io" and "sys". Any use of packages such as "nltk" written in Python or any other language that gives CKY functionality will result in an assignment mark of zero.


Hints

Submission


Additional Notes:


Created: November 2, 2014
Last Modified: November 27, 2014