Create a Python script CKYdet.py that implements the deterministic version of the Cocke-Kasami-Younger algorithm for parsing context-tree grammars in extended Chomsky Normal Form (eCNF). This script will take two arguments, namely, the filenames of an eCNF grammar file and an utterance file. The output associated with these arguments is, for each utterance in the utterance file, the set of all valid parses for that utterance relative to the given grammar, where a valid parse is one whose topmost non-terminal is "S" (see below). In cases where an utterance does not have a valid parse, the output for that utterance will be the sentence "No valid parse".
An eCNF grammar file will consist of one or more lines, where each line describes a single rule of the form LHS -> RHS such that LHS is a non-blank string denoting a non-terminal and RHS can be either (1) one or two non-blank strings denoting non-terminals or (2) a double-quote enclosed string denoting a word. An example eCNF grammar is
S -> VP S -> NP VP VP -> V NP VP -> V NP -> Det N Det -> "the" N -> "man" N -> "elephant" V -> "shot" V -> "shoot" V -> "book" V -> "booked"
Non-terminal "S" is the start / utterance non-terminal in every grammar. A parse of an utterance relative to a grammar will be output as a bracketed representation of the derivation-tree associated with that parse. For example, the parses of the utterances "the man shot the elephant" and "shoot the man" relative to the example eCNF grammar above have the bracketed representations
[S [NP [Det "the"] [N "man"]] [VP [V "shot] [NP [Det "the"] [N "elephant"]]]]
and
[S [VP [V "shoot"] [NP [Det "the"] [N "man"]]]]
respectively.
An utterance file consists of one or more utterances (one utterance per line). For simplicity, you may assume that each utterance consists only of words and will not include punctuation or capitalization (except in the case of proper nouns). An example utterance file is
the man shot the elephant shoot the elephant see the elephant I shot the elephant in my pajamas
and the output produced by CKYdet.py for this utterance relative to the example eCNF grammar given above would be
Utterance #1: the man shot the elephant Parse #1: [S [NP [Det "the"] [N "man"]] [VP [V "shot] [NP [Det "the"] [N "elephant"]]]] Utterance #2: shoot the elephant Parse #1: [S [VP [V "shoot"] [NP [Det "the"] [N "elephant"]]]] Utterance #3: see the elephant No valid parse Utterance #4: I shot the elephant in my pajamas No valid parse
Given your script CKYdet.py, you will perform the tests
relative to the deterministic eCNF grammar file g1.ecfg, g2.ecfg, and g3.ecfg and utterance files u1a.utt, u1b.utt, u2a.utt, u2b.utt, u3a.utt, and u3b.utt as command-line arguments to produce the output given in typescript-file CKYdet.script.
Your Python code is only allowed to use language-internal features and the language-standard packages "io" and "sys". Any use of packages such as "nltk" written in Python or any other language that gives CKY functionality will result in an assignment mark of zero.
You may find it useful to have multiple internal representations of an eCNF grammar depending on how you want to access that grammar.
######################################################### ## CS 4750 (Fall 2024), Assignment #3 ## ## Script File Name: CKYdet.py ## ## Student Name: Todd Wareham ## ## Login Name: harold ## ## MUN #: 8008765 ## #########################################################You do not have to develop your code on our CS departmental systems. However, as your code will be compiled and tested on our CS departmental systems as part of the assignment marking process, you should ensure that your code compiles and runs correctly on at least one of these systems.
Created: June 30, 2024
Last Modified: November 7, 2024