adv = vi\vi
-- adverbial phrase (adv) prefixed by vi produces vi
With more base categories in [GLT]:
noun phrases = NP: John, Mary, the man
noun = N: man
article = NP/N: the
infinitive = VP: to-go
adjective = A = N/N: tall
intransitive verb = V = NP\S: walks, likes Mary
transitive verb = V/NP: likes -- or is it NP\(S/NP)?
verb with object and adjective = V/(NP·A): considers -- using Lambek's (1958) category-pairing operator "·"
verb with subclause = V/S: thinks
verb with infinitive (intransitive) = V/VP: wants
verb with infinitive (transitive) = V/(NP·VP): wants
verb with ?-preposition = V/PP: votes
?-preposition: PP/NP: for
adnominal preposition = (N\N)/NP: with
adverbial preposition = (V\V)/NP: with
relative = (N\N)/(S/NP): who
reflexive = (V/NP)\V: himself -- observed by Szabolcsi 1987
overloaded noun/adjective = N^A: square -- using intersection types
Compared to phrase structure grammars,
no production rules need to be specified.
The name of the phrase class specifies what to do.
In terms of phrase structure grammars, categorial grammar
is like having only two fixed basic implicit rule templates:
- application of prefix constructors (X/Y):
X -> X/Y + Y
- application of postfix constructors (Z\X):
X -> Z + Z\X
- a slight generalization: infix constructors (Z\X/Y):
X -> Z + Z\X/Y + Y
Categorial grammars may produce the same constituent structures as phrase structure grammars
(esp. if generalized to allow more than one element ``below the fraction bar'').
But they always identify one of a constituent's sub-constituents
as the dominant one, on which the others depend (dependence) [EML 234].
|
Grammatical analysis for a sentence
with a categorial grammar is similar to (direction sensitive) calculation with fractions:
"Poor John ran away" :
a · n · vi · adv
= (n/n · n) · (vi · vi\vi)
= n · vi
= n · n\s
= s
Expressed as category inference rules, they define a categorial calculus
(Ajdukiewicz 1935, Bar-Hillel 1953)
which can be completed to Lambek's associative calculus L of 1958
[GLT 2].
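The fraction-style calculation for "Poor John ran away" can be sketched in Python. This is a minimal sketch restricted to the two application rules (X/Y + Y => X and Z + Z\X => X); the tuple encoding, the helper names `fwd`, `bwd`, `combine`, `derives`, and the assignment vi = n\s (with s as sentence category) are illustrative assumptions, not taken from the cited sources.

```python
from functools import lru_cache

# Categories: a string is atomic; ("/", X, Y) encodes X/Y (expects Y to its right);
# ("\\", Z, X) encodes Z\X (expects Z to its left).
def fwd(x, y):
    return ("/", x, y)

def bwd(z, x):
    return ("\\", z, x)

def combine(left, right):
    """Categories obtainable from two adjacent categories by application."""
    out = []
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.append(left[1])            # X/Y + Y => X
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.append(right[2])           # Z + Z\X => X
    return out

def derives(cats, goal):
    """CKY-style check: can the whole category sequence be reduced to goal?"""
    cats = tuple(cats)

    @lru_cache(maxsize=None)
    def span(i, j):
        if j - i == 1:
            return frozenset([cats[i]])
        found = set()
        for k in range(i + 1, j):
            for l in span(i, k):
                for r in span(k, j):
                    found.update(combine(l, r))
        return frozenset(found)

    return goal in span(0, len(cats))

n, s = "n", "s"
vi = bwd(n, s)        # assumed: intransitive verb vi = n\s
a = fwd(n, n)         # adjective a = n/n
adv = bwd(vi, vi)     # adv = vi\vi

print(derives([a, n, vi, adv], s))   # True
```

Trying all split points, rather than reducing greedily from the left, keeps the check independent of the order in which the "fractions" are cancelled.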
- Mary thinks John votes for the man =>
[S: [NP:Mary] [V: [V/S:thinks] [S: [NP:John] [V: [V/PP:votes] [PP: [PP/NP:for] [NP: [NP/N:the] [N:man]]]]]]]
- the man who Mary likes =>
[NP: [NP/N:the] [N: [N:man] [N\N: [(N\N)/(S/NP):who] [S/NP: [NP:Mary] [V/NP:likes]]]]]
- try it yourself with "the man who John thinks Mary likes"
- John votes for himself =>
[S: [NP:John] [V: [V/NP: [V/PP:votes] [PP/NP:for]] [(V/NP)\V:himself]]]
- Mary considers John tall =>
[S: [NP:Mary] [V: [V/(NP·A):considers] [NP·A: [NP:John] [A:tall]]]]
- John meets the man with Mary =>
[S: [NP:John] [V: [V: [V/NP:meets] [NP: [NP/N:the] [N:man]]] [V\V: [(V\V)/NP:with] [NP:Mary]]]]
- Mary wants to-go =>
[S: [NP:Mary] [V: [V/VP: wants] [VP: to-go]]]
- Mary wants John to-go =>
[S: [NP:Mary] [V: [V/(NP·VP): wants] [NP·VP: [NP:John] [VP: to-go]]]]
Categorial grammars are based on works of the Polish logician Ajdukiewicz
["Die syntaktische Konnexität" 1935]
(he followed Leśniewski);
further developed by Bar-Hillel ["Logical syntax and semantics" 1954],
Lambek ["On the calculus of syntactic types" 1961]
and other logicians and linguists [ETL 230]
-> categorial explanation of parts of speech
| Cf. PL analogies to categorial parts of speech
Maybe kind-systems can be understood as a categorial grammar for the abstract syntax.
If Fi are functions/expressions with i parameters, t are types and x are names then
F0/F0: 1st ord. functions
Fi/t: polymorphic functions
t/t: unary type constructors
t\t/t: infix type constructors
x\decl/t:
infix constructors for declarations of variables (x = «:»)
or type aliases (x = «=»)
rec/{decl}: record type constructors,
(rec = «record», «struct», ...)
clause (Teilsatz) [grammatical]
|
| |
Word sequence with subject and predicate
(like a sentence and unlike a syntagma)
but which is contained in a sentence [EML 173].
|
closed class
|
| |
A class of words or morphemes whose membership is fixed and can be listed.
E.g. there is a closed class of determiners (the, this, etc.).
| | Control-flow construct keywords like if, then, else, while, for,
case, break, continue, return
|
|
constituent structure [grammatical]
|
| |
A purely structural description of phrases: which parts belong (closer) together.
It does not build on grammatical classifications (cf. morphemes).
See grammatical ambiguity by ambiguous constituent structure.
IC-analysis = the analysis of the immediate constituents (Bloomfield 1933) [EML 213]
(defined by distributional criteria [EML 215]).
E.g. ICs of "poor John ran away" are "poor John" and "ran away", each of which has two constituents again.
This constituent structure can be made explicit by bracketing: [(Poor John) (ran away)],
or by drawing a (binary) tree above the sentence with the four words as the tree's leaves
[EML 213].
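The bracketing and the binary tree are two views of the same unlabelled structure. A minimal Python sketch, assuming a nested-tuple encoding; the function names are illustrative:

```python
# A constituent is either a word (str) or a tuple of sub-constituents.
tree = (("Poor", "John"), ("ran", "away"))

def leaves(node):
    """Words of a constituent, left to right (the leaves of the tree)."""
    if isinstance(node, str):
        return [node]
    return [w for child in node for w in leaves(child)]

def bracket(node, top=True):
    """Render the structure with [...] at the top level and (...) below."""
    if isinstance(node, str):
        return node
    inner = " ".join(bracket(child, top=False) for child in node)
    return f"[{inner}]" if top else f"({inner})"

def immediate_constituents(node):
    """The ICs of a compound constituent, as word strings."""
    return [" ".join(leaves(child)) for child in node]

print(bracket(tree))                  # [(Poor John) (ran away)]
print(immediate_constituents(tree))   # ['Poor John', 'ran away']
```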
"Belonging (closer) together" does not always imply
"standing (closer) together":
The constituent "called up" in sentences like "John called up Bill"
becomes a discontinuous constituent in sentences like "John called him up".
A compound constituent is endocentric
if it belongs to the same distribution class as one of its immediate subconstituents
(e.g., "poor John" is a noun-phrase, just like "John" alone).
Otherwise a compound constituent is exocentric
(e.g., "in Vancouver" distributes like adverbs of place,
not like preposition "in" and not like noun "Vancouver")
[EML 235].
| | ((abstract?) syntax tree without node labels)
|
|
deep structure [grammatical]
| | |
A hypothetical construct between visible structure and semantics:
- The active and passive versions of a sentence derive from the same deep structure.
The object of the active sentence corresponds to the subject of the passive sentence [EML 258].
- Some ambiguities can be understood
as a surface structure S deriving from different deep structures Di,
which can be made visible by alternative surface structures Si derived from them:
word w in the genitive <-> sentence with subject w (and intransitive verb) or object w (and transitive verb);
direct <-> indirect speech, etc.
- "They denied the existence of God" and "They denied that God exists"
have the same deep structure [EML 258].
Cf. Chomsky's transformation grammar.
Chomsky's new theory of 1981 replaced deep structure by "D-structure" [x],
which was in turn abandoned in the early 1990s in Chomsky's minimalist programme [x].
| | abstract syntax (tree)
grammar
| |
The grammar of a language is parameterized by the language's lexicon
and describes (among others)
the syntax of sentences and the morphology of words.
The grammar can be specified by a combination of several sets of rules [EML 227]:
- formation rules, or PS-rules, define the phrase structure
(e.g. in form of categorial grammars or phrase-structure grammars).
In a transformation grammar the phrase structure
is the deep structure.
- Several ways of compensating for the shortcomings of formation rules (PS-rules)
have been proposed [x]:
- transformation rules convert
phrase structures with structural combinator "+"
into proper strings of words with concatenation as combinator.
They deal with discontinuous constituents
and linearize words in languages whose phrase structure has free word order.
In a transformation grammar the transformation rules
transform deep structure into surface structure,
e.g., to obtain a passive sentence from an active one
(the passive rules).
- Generalized Phrase Structure Grammar (G J M Gazdar et al., from the end of the 1970s)
«with additional devices, in particular the use of metarules and slash categories,
that removed the need ... for transformations»
[x].
«The constraint of transformational rules has been carried to its logical conclusion within GPSG,
since transformational rules have been abolished altogether.
Instead, the phrase structure of a sentence permits the flow of information
from one part of the sentence to another, in a tightly constrained manner»
[x].
- Lexical-Functional Grammar (J Bresnan, R Kaplan, et al., from the end of the 1970s) [x]
«has also abandoned transformational rules, and relies instead on the
properties of lexical items to explain the connections between sentence types.
For example, the relationship between active and passive sentences
is established via the lexical properties of active and passive verb forms
(for example, eat/is eaten; see/was seen)»
[x].
- morphophonetic and morphographemic rules
realize morpheme combinations (grammatical words)
at the phonologic/orthographic substance level
as strings of phonemes or graphemes, respectively.
-> see morphology.
For example the "morphological spelling" module
of the Lexeme-Morpheme Base Morphology approach [LMBM].
By the ``rule to rule hypothesis'' [x]
each rule of syntax is paired with a corresponding rule of semantics.
Cf. compositional meaning of sentences.
Variant of transformational grammar :
Case Grammar (C J Fillmore, late 1960s) [x].
Universal grammar theories [x]:
Principles and Parameters Theory (Chomsky, early 1980s onwards) [x],
Government and Binding Theory (Chomsky's version which was current in the 1980s) [x,
x].
Relational grammar theories:
Relational Grammar (D M Perlmutter, P M Postal, et al., 1970s) [x],
Arc Pair Grammar (D E Johnson, P M Postal, 1980) [x].
Models of functional syntax [x]:
Tagmemics (Pike, R E Longacre, et al., 1950s) [x],
Systemic Grammar (Halliday, late 1950s) [x],
Functional Grammar (S C Dik et al., late 1970s) [x],
Role and Reference Grammar (W A Foley, R D Van Valin, mid-1980s) [x]
| | grammar, syntax rules
-- cf. Describing Morphology and Syntax of PLs
|
|
grammatical meaning [grammatical]
|
| |
Any aspect of meaning described as part of the syntax and morphology of a
language as distinct from its lexicon. Thus, in particular, the meanings of
constructions; of inflections; of other units forming closed classes.
[x]
|
graph [physical:graphic]
|
| |
A symbol for an idea, sound, or linguistic expression
[x]
in the graphic medium [x].
Graphs such as the letter "t" rendered in different typefaces or handwriting styles
all represent the same grapheme t [x],
they are allographs [x].
grapheme [substance:orthography]
|
| |
Minimal unit in a writing system.
«Graphemes are classes of similar marks,
each of which functions in an identical way in the [written] language»
[Realms 65].
Each grapheme is realized in writing or print by its graphs,
such as the different ways of writing and printing an a or a t.
[x].
One grapheme in written language approximately
encodes one phoneme of spoken language.
In an alphabetic script, a grapheme is a letter [more at x].
A digraph or trigraph consists of
two or three separate letters that represent one phoneme,
such as ch, gh, sh, th, wh, ng, ck,
and such as tch and sch
[x].
Diacritical marks can modify a base letter [more at x].
A ligature is the close combination of two letters
cast in the same piece of printer type [x].
A ligature may represent a digraph,
e.g., Æ/æ (ash) = A+E/a+e.
In that case, its constituents may be forgotten,
so that the ligature becomes a letter of its own,
e.g., German ß (sharp s) = s (in its long form) + z.
Or one of the constituents may be converted to a diacritical,
e.g., Ç/ç (cedilla) = C/c + z below [x];
German Ä/ä (a-umlaut) = A/a + e atop [x].
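Character encodings draw similar distinctions. As a hedged illustration (an aside on Unicode, not part of the cited definitions), Python's standard `unicodedata` module shows that ä and ç decompose canonically into base letter plus combining mark, whereas æ and ß are treated as letters of their own, and a typographic ligature such as ﬁ only decomposes under compatibility normalization. The helper name `decompose` is illustrative.

```python
import unicodedata

def decompose(ch):
    """Canonical decomposition (NFD) as a list of Unicode character names."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

print(decompose("\u00e4"))   # ä: ['LATIN SMALL LETTER A', 'COMBINING DIAERESIS']
print(decompose("\u00e7"))   # ç: ['LATIN SMALL LETTER C', 'COMBINING CEDILLA']
print(decompose("\u00e6"))   # æ: ['LATIN SMALL LETTER AE']
print(decompose("\u00df"))   # ß: ['LATIN SMALL LETTER SHARP S']

# The typographic ligature U+FB01 splits only under compatibility normalization.
print(unicodedata.normalize("NFKD", "\ufb01"))   # fi
```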
| character
|
|
graphology [substance:orthography]
|
| |
The study of writing and print as systems [x],
e.g. letters and how they are used to express sounds and form words
(also called orthography) [x].
Parallel to phonology.
The spelling system of an alphabetic language consists of the conventions
by which its letters represent sounds and words (E-G-G spells egg)
and the way(s) in which words are spelt/spelled
(How d'you spell 'accommodation' -- one m or two?) [x]
isogloss
| |
[from x]
- In dialect geography, an area within which a feature is used predominantly or exclusively.
Such a feature (phonological, morphological, syntactic, semantic, lexical, or other)
usually contrasts with some similar feature in adjoining areas.
- More commonly, the line on a dialect map which bounds the area of a certain usage.
|
|
isolating language (or "analytical language", as opposed to synthetic)
|
| |
A language whose (lexical) words do not occur in different forms (grammatical words)
(e.g. Vietnamese).
The degree of isolation can be measured by the ratio of morphemes to words:
English 1.68, Sanskrit 2.59, Eskimo 3.72 (1.0 would be an ideal isolating language) [EML 191].
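The measure is simply morphemes divided by words. A minimal sketch, assuming the Bloomfield-style segmentation of the sample sentence «The cats were sitting unhappily in the rain» given under the morpheme entry; a single sentence is of course not a representative sample of a language, and the function name is illustrative.

```python
def morphemes_per_word(segmented):
    """segmented: one list of morphemes per word."""
    words = len(segmented)
    morphemes = sum(len(word) for word in segmented)
    return morphemes / words

sentence = [
    ["the"], ["cat", "s"], ["were"], ["sit", "ing"],
    ["un", "happy", "ly"], ["in"], ["the"], ["rain"],
]
print(morphemes_per_word(sentence))   # 1.5  (12 morphemes / 8 words)
```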
| | All PLs are isolating
langue
| |
The language as a system captured in its grammar.
Main components of langue:
lexicon, semantics, syntax, morphology, phonology.
Linguists try to infer the langue from the observable parole.
lexeme
| |
- lexical word
- semantic morpheme (Vendryes's "semanteme")
lexical rule
| |
A rule that expresses generalization over sets of entries in a lexicon.
For example in derivational morphology:
«[A] lexical rule can state that,
if there is an entry for X, where X is an adjective, there is also, barring exceptions,
an entry for X + -ness, where the whole is an abstract noun with a meaning corresponding to it.
Hence happiness or blackness»
[x].
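Such a rule can be read as a function from lexicon entries to further entries. A minimal sketch; the dictionary encoding, the exception set, and the y -> i spelling adjustment are simplifying assumptions made for illustration only.

```python
def ness_rule(lexicon, exceptions=frozenset()):
    """For each adjective X (barring exceptions), derive the noun X + -ness."""
    derived = {}
    for word, cls in lexicon.items():
        if cls == "A" and word not in exceptions:
            # Simplified spelling adjustment: happy -> happi-ness.
            stem = word[:-1] + "i" if word.endswith("y") else word
            derived[stem + "ness"] = "N"
    return derived

lexicon = {"happy": "A", "black": "A", "run": "V"}
print(ness_rule(lexicon))   # {'happiness': 'N', 'blackness': 'N'}
```

Passing an `exceptions` set models the "barring exceptions" clause: listed adjectives simply receive no derived entry.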
|
lexicon [lexical]
|
| |
Contains grammatical and semantic information on each word [EML 170].
Associates words with their ``distribution classes''
or ``grammatical features'' on which the grammar is based [EML 161].
The lexicon contains (for phrase-structure grammars)
the following information (which can be presented in two alternative ways) [EML 220]:
N = { John, ... } away: Adv
V = { ran, ... } John: N
A = { poor, ... } poor: A
Adv = { away, ... } ran: V
However, closer analysis of word distributivity
shows that there are overlapping subclassifications of word classes.
This is better handled by assigning grammatical features to the entries in the lexicon [EML 169]:
door: [appellative] [-human] [neuter]
boy: [appellative] [+human] [masculine]
child: [appellative] [+human]
There is often an alignment of grammatical features
and words' semantic components.
However, note that [masculine] [feminine] [neuter] are grammatical,
whereas [male] [female] [asexual] are semantic.
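The two presentations of the lexicon, and the feature-based alternative, can be sketched as plain dictionaries. This is an illustrative encoding only; it assumes one class per word and uses just a subset of the features listed above.

```python
# Presentation 1: each distribution class lists its members.
by_class = {
    "N": ["John"],
    "V": ["ran"],
    "A": ["poor"],
    "Adv": ["away"],
}

def invert(by_class):
    """Presentation 2: each word is tagged with its class."""
    return {word: cls for cls, words in by_class.items() for word in words}

by_word = invert(by_class)
print(by_word["poor"])   # A

# Feature-based entries: classes may overlap, and features may be left unspecified.
features = {
    "door":  {"human": False},
    "boy":   {"human": True, "gender": "masculine"},
    "child": {"human": True},
}

def select(features, **wanted):
    """Words whose entries carry all the wanted feature values."""
    return sorted(
        word for word, fs in features.items()
        if all(fs.get(k) == v for k, v in wanted.items())
    )

print(select(features, human=True))                       # ['boy', 'child']
print(select(features, human=True, gender="masculine"))   # ['boy']
```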
| |
In language formalization, the environment
associates words with their meaning ("value"),
and the type assignment associates words
with their grammatical class ("static type").
In compiler construction: the symbol table
associates words with their token class.
grammatical features
morph
|
| |
The smallest segment of the phonological/orthographic word [EML 187]
into which it is divided in an analysis of morphemes [x].
(why not the morphologic word?)
E.g. «unacceptable» = un·(accept·able) [EML 187].
Simplified, a morph realizes a morpheme at the substance level
-> see morphology.
Allomorphs are variants of realizing a morpheme [EML 187].
The same (allo)morph can represent different morphemes [EML 194].
E.g. allomorphs s and es (or /s/, /z/, /iz/)
can realize the morphemes {plural} or {3rd pers. sg.}
A portmanteau morph realizes two or more successive morphemes.
E.g. in French «au théâtre», au is a single morph ([o])
which simultaneously realizes preposition {à} and the definite article {le}.
[x]
An empty morph does not directly realize a morpheme.
E.g. in «children», r is an empty morph
whereas the morph child realizes the morpheme {child},
and a morph {en} may be said to realize the plural morpheme {s} (as also in oxen).
[x]
A zero morph is the invisible realization of a morpheme.
E.g. in three sheep the noun's plural morpheme {s} is realized by a zero morph:
«sheep» = sheep·ø.
[x]
| Subsegments of words can be identified in PLs,
but are they realizations of content-distinguishing morphemes?
- «12.34E-56» = (12 · . · 34)
· E · (- · 56)
= float with mantissa 12.34 and exponent -56
Meaning and grammatical properties do not change by varying between 'E' and 'e'
- «010» = 0 · 10 = octal 10 = 8
«0x10» = 0x · 10 = hexadecimal 10 = 16
Meaning and grammatical properties do not change by varying between '0x' and '0X'
- «" string text "» = «"» · « string text » · «"»
The second morph of a string literal never contains unmasked «"»
- «/* comment text */» = «/*» · « comment text » · «*/»
The second morph of a comment never contains '*/'
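The segmentation of numeric literals can be tried out in Python. A minimal sketch, assuming C-style literals; the regular expression covers only the simple mantissa/exponent shape shown above, and the name `float_morphs` is illustrative. Bases are passed explicitly to `int`, since a leading 0 does not mark octal in Python source.

```python
import re

# integer part, point, fraction, exponent marker, optional sign, exponent digits
FLOAT = re.compile(r"(\d+)(\.)(\d+)([eE])([+-]?)(\d+)")

def float_morphs(literal):
    """Split a float literal into its 'morphs', or return None if it does not fit."""
    m = FLOAT.fullmatch(literal)
    return list(m.groups()) if m else None

print(float_morphs("12.34E-56"))                  # ['12', '.', '34', 'E', '-', '56']
print(float("12.34E-56") == float("12.34e-56"))   # True: 'E' and 'e' are variants
print(int("010", 8))                              # 8
print(int("0x10", 16), int("0X10", 16))           # 16 16
```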
morpheme
|
| |
A minimal unit of form and meaning in the language system.
There are many variations in how the term is used and understood
[x]:
- In structural linguistics (Bloomfield 1933),
a morpheme is a grammatical factor of a word
(irrespective of its realization at the phonological or orthographic level)
which determines its distributability [EML 185].
E.g. «oxen» is composed of the morphemes {ox} and {s} (or {plural}),
and «worse» = {bad} + {er} (or {comparative}).
Note that this is in fact a matter of distributability since
only certain words can be qualified with the {bad}-words bad, worse, worst,
and only certain contexts allow {er}-words like worse or taller.
The {bad} and {er} dimensions of distributability can be shown and compared as follows
[EML 185]:
bad : worse : worst = tall : taller : tallest
{bad} : {bad}+{er} : {bad}+{st} = {tall} : {tall}+{er} : {tall}+{st}
The 8-word sentence «The cats were sitting unhappily in the rain»
consists of 12 morphemes, all of equal status:
{the} + {cat} + {s} + {were} + {sit} + {ing} + {un} + {happy} + {ly} + {in} + {the} + {rain}
[x].
-
The French linguist Joseph Vendryes (1921) called
morpheme the minimal grammatical unit (a grammatical morpheme),
as opposed to the "semanteme" (now known as lexeme(b) = lexical word?),
the minimal semantical unit (a lexical morpheme).
The former are the glue that hold the latter together
[x]:
The sentence «The cats were sitting unhappily in the rain»
consists of 8 morphemes and 5 lexemes:
{the} + CAT + {s} + (BE + {past/plural}) + SIT + {ing} + {un} + HAPPY + {ly} + {in} + {the} + RAIN
-
For Bloomfield, morphemes have meanings called sememes.
Bloomfield's morphemes have been divided into
"system morphemes" (glue) and "source morphemes" (content) by Dwight Bolinger (1968);
while André Martinet (1970) subsumed Vendryes's morpheme and lexeme under the "moneme"
[x].
«[W]e distinguish three kinds of morphemes:» [Mor 126]
- «semantic morphemes which have lexical meaning»
(Vendryes's "semanteme" or lexemes(b),
Bolinger's "source morpheme",
lexical morphemes (root, stem) [ESW ch.VII]).
They are the units of semantic meaning:
have a meaning (content),
always have a substance realization,
form an open unlimited class
[LMBM]
- «grammatical morphemes which have grammatical meaning»
(also in [ESW ch.VII], Vendryes's "morpheme", "system morpheme").
They are the units of grammatical meaning:
have only grammatical value but no meaning (they are ``empty forms''),
they may have no substance realization,
they belong to a small closed class
(around 200; fewer than 100 in Indo-European languages)
[LMBM].
- «and grammatical-semantic morphemes which have both
grammatical indication and semantic (lexical) meaning.»
In lioness the morpheme {ess}
not only changes the grammatical gender to feminine [my dictionary says, lioness is neutral!]
but also fixes the sex of the animal designated by {lion} to be female.
On the other hand,
plural {s} changes the grammatical category of the whole word lions
but not the semantics of the semantic/lexical morpheme {lion}.
-
The morpheme (in `langue' [ESW]) is an invariant lexical or grammatical unit
(the ``ultimate constituent'' of a sentence [x])
realized by one or more configurations of phonological units (in `parole' [ESW]).
Conversely, a morpheme is the abstraction behind a morph,
possibly subsuming several allomorphs [x].
See example in morphology.
E.g., the {negative} morpheme may be realized by allomorphs
dis in distasteful or by un in unpleasant
[x]
or by in-, im-, and il- in insincere, impolite, illogical.
An unusual example:
The future tense in Tagalog (or Filipino) is indicated by a prefix
-- the reduplication of the first syllable of the verb stem:
bi-bili (will buy), ku-kuha (will get), pu-punta (will go), su-sulat (will write), ta-tawa (will laugh)
[LMBM]
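The reduplication in these five forms can be sketched as copying the stem up to and including its first vowel. This is a simplification that covers only the examples listed and is not a general account of Tagalog morphology; the function names are illustrative.

```python
VOWELS = "aeiou"

def first_cv(stem):
    """Leading consonant(s) plus the first vowel of the stem."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[: i + 1]
    return stem

def future(stem):
    """Prefix the reduplicated first consonant-vowel sequence."""
    return f"{first_cv(stem)}-{stem}"

for stem in ["bili", "kuha", "punta", "sulat", "tawa"]:
    print(future(stem))
# bi-bili, ku-kuha, pu-punta, su-sulat, ta-tawa
```

The morpheme {future} thus has no fixed phonological shape of its own: its morph is computed from the stem it attaches to.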
|
|
morphology [grammatical]
|
| |
The study of words (their structure, their phonological and orthographic realization),
and the grammatical and semantical categories they convey
[LMBM].
E.g., inflection [x];
derivation of one word in the lexicon from another
(hang => hanger, count => countless) [x]
-- cf. lexical rule.
Free morph(eme)s like 'prawn' and 'alabaster', can exist independently,
whereas bound morph(eme)s, like '-ed' and '-ing', cannot stand alone
and only find legitimate expression as part of a word
[x].
Lexical morph(eme)s, such as 'non-', 'conform' and '-ist',
can be used to build up forms of (new) lexical words (for example, 'nonconformist'),
whereas grammatical morph(eme)s, like the third person singular marker '-s',
do not result in new lexical words but only change the grammatical meaning
[x].
free grammatical (articles, prepositions, conjunctions, ...):
the, in, therefore
bound grammatical (suffixes, prefixes):
-ly, un- (derivational), and -s, -ed (inflectional)
free lexical: house
bound lexical: rasp- (as in raspberry)? (in German: Him-, Brom-)
An example of realization at the substance level
[EML 268]:
orthographic: «the» «man» «will» «have» «read» «the» «books»
phonological: /ðe/ /mAn/ /wil/ /haf/ /red/ /ðe/ /bu:ks/
grammatical: {the} {man}+ø {will}+{s} {have} {read}+{en} {the} {book}+{s}
orthographic: «the» «books» «will» «have» «been» «read» «by» «the» «man»
phonological: /ðe/ /bu:ks/ /wil/ /haf/ /bi:n/ /red/ /bAi/ /ðe/ /mAn/
grammatical: {the} {book}+{s} {will}+ø {have} {be}+{en} {read}+{en} {by} {the} {man}+ø
(the "{s}" is the noun-plural or the 3rd pers. sg. morpheme,
the "ø" is the morpheme for unmarked noun-singular or non-3rd p. sg.;
"ø" is an artefact of the PS-grammar which produced the morpheme string in Lyons's example)
In an agglutinating language (e.g. Turkish), each morpheme of a word is represented by a morph of its own [EML 191].
In an inflecting (fusional) language (e.g. Latin), several morphemes of a word are represented in one morph
(e.g. '-us' represents {singular} and {nominative}) [EML 192]
(cf. ``cumulative exponence'' [LMBM]).
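The step from the grammatical line to the orthographic line in the realization example can be sketched as a table lookup. A deliberately naive sketch: a real morphographemic component would use rules rather than a finite table, and the names `SPELLING` and `realize` are illustrative.

```python
# Lookup table from grammatical words (morpheme tuples) to spellings.
# "ø" marks the unmarked singular / non-3rd-person morpheme, as in the example.
SPELLING = {
    ("the",): "the",
    ("man", "ø"): "man",
    ("book", "s"): "books",
    ("will", "s"): "will",      # {s} has no visible realization here
    ("will", "ø"): "will",
    ("have",): "have",
    ("be", "en"): "been",
    ("read", "en"): "read",     # {en} has no separate orthographic segment here
    ("by",): "by",
}

def realize(grammatical_words):
    """Spell out a sequence of grammatical words."""
    return " ".join(SPELLING[tuple(w)] for w in grammatical_words)

active = [["the"], ["man", "ø"], ["will", "s"], ["have"],
          ["read", "en"], ["the"], ["book", "s"]]
passive = [["the"], ["book", "s"], ["will", "ø"], ["have"], ["be", "en"],
           ["read", "en"], ["by"], ["the"], ["man", "ø"]]

print(realize(active))    # the man will have read the books
print(realize(passive))   # the books will have been read by the man
```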
|
open class [lexical]
|
| |
A class of words or morphemes to which new members can readily be added.
E.g., one cannot list all the nouns since it will always be possible for
speakers to coin ones that are new, or to borrow them from another language.
[x]
| | The class of valid (defined) identifiers is an open class in PLs.
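The contrast between the closed class of keywords and the open class of identifiers can be checked in Python itself. A minimal sketch using the standard `keyword` module and `str.isidentifier`; the function name `word_class` is illustrative.

```python
import keyword

def word_class(token):
    """Classify a token as a closed-class keyword or an open-class identifier."""
    if keyword.iskeyword(token):
        return "closed"      # member of the fixed, listable keyword set
    if token.isidentifier():
        return "open"        # new identifiers can always be coined
    return None              # not a word of the language at all

print(word_class("while"))        # closed
print(word_class("frobnicate"))   # open
print(word_class("3x"))           # None
```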
paradigm
| |
A relationship between signs.
More precisely, signs form a paradigm
if they are mutually substitutable in a syntagma [KL, ch.4].
A closed paradigm: e.g. an inflection group.
An open paradigm: e.g. "relative" + "kin" + "friend" ... [ELT].
parole
| |
The language as a body of linguistic artifacts (observables).
From them, linguists try to construct the langue
defining them.
part of speech
| |
- «the role that a word (or sometimes a phrase) plays in a sentence»
[wiki].
Traditionally: Parts of speech are verbs, nouns, adjectives, adverbs, prepositions, articles, conjunctions, etc.
This classification is a compromise between
morphological and grammatical distinctions.
«Words are not inherently one part of speech or another.
Many verbs and nouns are spelled identically,
for instance, denoting the action and the name of the action, respectively:
neigh, break, outlaw»
[wiki].
«In transformational-generative grammar, parts of speech are known as lexical categories»
[wiki]
-> syntactic categories
Categories (NB: not the same as the categories of categorial grammar):
The main parts of speech have been explained as follows by Hjelmslev and Jespersen:
- Nouns (n) are the `first grade' category.
- Verbs (in the broad semantic, not the strict syntactic sense, ie. including adjectives)
are the `second grade' category:
They further determine a noun, are `adnominal'.
- Adverbs (in the broad sense, ie. including adadjectives)
are the `third grade' category:
They further determine a verb or adjective,
they are ad-verbs or ad-adjectives.
More categories are not needed, since there is no further part of speech
whose main[!] purpose it is to determine adverbs (in all checked languages).
[EML 333].
Specifying parts of speech using categorial grammar
helps to find correspondences to programming language concepts.
But the categorial classification is less powerful than a type system
and cannot even capture the full natural syntactic categories,
but only subcases.
|
In APL jargon, first-order functions are verbs,
higher-order functions are adverbs
(cf. adverbial programming).
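The verb/adverb reading can be imitated in Python with higher-order functions. A minimal sketch in the spirit of APL's reduce and scan operators; the names `over` and `scan` are illustrative and not APL's own notation.

```python
from functools import reduce
from itertools import accumulate
import operator

def over(verb):
    """'Adverb': turn a binary verb into a reduction over a list."""
    return lambda xs: reduce(verb, xs)

def scan(verb):
    """'Adverb': turn a binary verb into a running reduction."""
    return lambda xs: list(accumulate(xs, verb))

plus_over = over(operator.add)        # the modified verb "sum"
print(plus_over([1, 2, 3, 4]))        # 10
print(scan(operator.add)([1, 2, 3, 4]))   # [1, 3, 6, 10]
print(over(operator.mul)([1, 2, 3, 4]))   # 24
```

Just as an adverb modifies a verb and yields a verb phrase, `over` and `scan` take a function and yield a function.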
In object-oriented analysis, nouns in problem statements
are seen as candidates for objects, and
verbs as candidates for object operations.
Interface types can be regarded as adjectives,
describing objects' capabilities and not objects per-se.
Analyzing problem statements (also) for adjectives allows better modelling than looking only for nouns
[Michael C Feathers: Factoring Class Capabilities with Adjectives; JOOP Mar/Apr 1999]