Assignment 3

Due: 11:59 PM via Brightspace on Thursday, November 7, 2024

Create a Python script CKYdet.py that implements the deterministic version of the Cocke-Kasami-Younger algorithm for parsing context-tree grammars in extended Chomsky Normal Form (eCNF). This script will take two arguments, namely, the filenames of an eCNF grammar file and an utterance file. The output associated with these arguments is, for each utterance in the utterance file, the set of all valid parses for that utterance relative to the given grammar, where a valid parse is one whose topmost non-terminal is "S" (see below). In cases where an utterance does not have a valid parse, the output for that utterance will be the sentence "No valid parse".

An eCNF grammar file will consist of one or more lines, where each line describes a single rule of the form LHS -> RHS such that LHS is a non-blank string denoting a non-terminal and RHS can be either (1) one or two non-blank strings denoting non-terminals or (2) a double-quote enclosed string denoting a word. An example eCNF grammar is

S -> VP
S -> NP VP
VP -> V NP
VP -> V
NP -> Det N
Det -> "the"
N -> "man"
N -> "elephant"
V -> "shot"
V -> "shoot"
V -> "book"
V -> "booked"

Non-terminal "S" is the start / utterance non-terminal in every grammar. A parse of an utterance relative to a grammar will be output as a bracketed representation of the derivation-tree associated with that parse. For example, the parses of the utterances "the man shot the elephant" and "shoot the man" relative to the example eCNF grammar above have the bracketed representations

[S [NP [Det "the"] [N "man"]] [VP [V "shot] [NP [Det "the"] [N "elephant"]]]]

and

[S [VP [V "shoot"] [NP [Det "the"] [N "man"]]]]

respectively.

An utterance file consists of one or more utterances (one utterance per line). For simplicity, you may assume that each utterance consists only of words and will not include punctuation or capitalization (except in the case of proper nouns). An example utterance file is

the man shot the elephant
shoot the elephant
see the elephant
I shot the elephant in my pajamas

and the output produced by CKYdet.py for this utterance relative to the example eCNF grammar given above would be

Utterance #1: the man shot the elephant
  Parse #1: [S [NP [Det "the"] [N "man"]] [VP [V "shot] [NP [Det "the"] [N "elephant"]]]]
Utterance #2: shoot the elephant
  Parse #1: [S [VP [V "shoot"] [NP [Det "the"] [N "elephant"]]]]
Utterance #3: see the elephant
  No valid parse
Utterance #4: I shot the elephant in my pajamas
  No valid parse

Given your script CKYdet.py, you will perform the tests

CKYdet.py g1.ecfg u1a.utt
CKYdet.py g1.ecfg u1b.utt
CKYdet.py g2.ecfg u2a.utt
CKYdet.py g2.ecfg u2b.utt
CKYdet.py g3.ecfg u3a.utt
CKYdet.py g3.ecfg u3b.utt

relative to the deterministic eCNF grammar file g1.ecfg, g2.ecfg, and g3.ecfg and utterance files u1a.utt, u1b.utt, u2a.utt, u2b.utt, u3a.utt, and u3b.utt as command-line arguments to produce the output given in typescript-file CKYdet.script.

Your Python code is only allowed to use language-internal features and the language-standard packages "io" and "sys". Any use of packages such as "nltk" written in Python or any other language that gives CKY functionality will result in an assignment mark of zero.

Hints

You may find it useful to have multiple internal representations of an eCNF grammar depending on how you want to access that grammar.

Submission

Submission of a script file whose results cannot be replicated by the associated submitted Python code will result in an assignment mark of zero.

comment block

CKYdet.py

#########################################################
##  CS 4750 (Fall 2024), Assignment #3                 ##
##   Script File Name: CKYdet.py                       ##
##       Student Name: Todd Wareham                    ##
##         Login Name: harold                          ##
##              MUN #: 8008765                         ##
#########################################################

assignment marking process

Additional Notes:

June 30, 11:15am
Assignment #3 posted.

Assignment 3

Hints

Submission

Additional Notes:

Created: June 30, 2024 Last Modified: November 7, 2024

Created: June 30, 2024
Last Modified: November 7, 2024