Assignment 3

Due: 3pm on Wednesday, November 14, 2018


Create a Python script CKYdet.py that implements the deterministic version of the Cocke-Kasami-Younger algorithm for parsing context-tree grammars in extended Chomsky Normal Form (eCNF). This script will take two arguments, namely, the filenames of an eCNF grammar file and an utterance file. The output associated with these arguments is, for each utterance in the utterance file, the set of all valid parses for that utterance relative to the given grammar, where a valid parse is one whose topmost non-terminal is "S" (see below). In cases where an utterance does not have a valid parse, the output for that utterance will be the sentence "No valid parse".

An eCNF grammar file will consist of one or more lines, where each line describes a single rule of the form LHS -> RHS such that LHS is a non-blank string denoting a non-terminal and RHS can be either (1) one or two non-blank strings denoting non-terminals or (2) a double-quote enclosed string denoting a word. An example eCNF grammar is

Non-terminal "S" is the start / utterance non-terminal in every grammar. A parse of an utterance relative to a grammar will be output as a bracketed representation of the derivation-tree associated with that parse. For example, the parses of the utterances "the man shot the elephant" and "shoot the man" relative to the example eCNF grammar above have the bracketed representations

and

respectively.

An utterance file consists of one or more utterances (one utterance per line). For simplicity, you may assume that each utterance consists only of words and will not include punctuation or capitalization (except in the case of proper nouns). An example utterance file is

and the output produced by CKYdet.py for this utterance relative to the example eCNF grammar given above would be

Given your script CKYdet.py, you will perform the tests

relative to the deterministic eCNF grammar file g1.ecfg, g2.ecfg, and g3.ecfg and utterance files u1a.utt, u1b.utt, u2a.utt, u2b.utt, u3a.utt, and u3b.utt to produce the output given in typescript-file CKYdet.script.

Your Python code is only allowed to use language-internal features and the language-standard packages "io" and "sys". Any use of packages such as "nltk" written in Python or any other language that gives CKY functionality will result in an assignment mark of zero.


Hints

Submission


Additional Notes:


Created: October 1, 2018
Last Modified: November 5, 2018