Category theory in theoretical linguistics: A monadic semantics for root syntax

❶ In a nutshell

Root syntax has been a popular branch of Chomskyan linguistics since the 1990s.
But so far it has received almost no attention in formal semantics.
A principled compositional semantics for root syntax is hard to obtain (see Song 2021).
Category theory offers a neat solution (via the writer monad).

Learn more from my 4-min video or blog post.

❷ Root syntax & its generalization

Two major incarnations (I use DM):

Distributed Morphology (DM)
Exoskeletal Syntax (XS)

Mainly for decomposition of content words:

Schema: [_X X √R ]
X = categorizer (purely functional)
√R = root (purely idiosyncratic)
Ex.1: [_N n √DOG] => (/dɔg/, 🐶)
Ex.2: [_V v √RUN] => (/rʌn/, 🏃)
Roots have no fixed sound/meaning.

Generalized root syntax (Song 2019):

Extended def. of X (from a few major parts of speech [P] to any functional category [F])
[X = P] => content word
[X = non-P F] => semigrammatical word
Root categorization <=> root support
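
The two-way split above can be sketched as a small lookup. This is only an illustration: the inventory `MAJOR_POS` and the function name `word_kind` are assumptions of the sketch (the source says only "a few major parts of speech"), and the input is assumed to be the label of some functional category.

```python
# Hypothetical inventory of major parts of speech (P).
# The source does not list them; n, v, a are an assumption of this sketch.
MAJOR_POS = frozenset({"n", "v", "a"})


def word_kind(categorizer: str) -> str:
    """Classify a root-supported word by its categorizer X.

    X in P            -> content word
    X a non-P F       -> semigrammatical word
    """
    if categorizer in MAJOR_POS:
        return "content word"
    return "semigrammatical word"


print(word_kind("n"))    # content word
print(word_kind("Neg"))  # semigrammatical word
print(word_kind("Cl"))   # semigrammatical word
```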

❸ Examples of semigrammaticality

Ex.3: Chinese classifiers [Cl_{\(\lambda P \lambda x. x \in \mathrm{Atom}(P)\)}]

a. bǎ 'grip' (objects with handle-like bars), běn 'volume' (bound print matter), dòng 'pillar' (buildings), miàn 'surface' (flat objects), etc.

b. yī wèi/míng/gè lǎoshī 'one Cl_r/o/n teacher'
(r = respectful, o = official, n = neutral)

Ex.4: Vietnamese negators [Neg_{\(\lambda t. \neg t\)}]

a. không 'empty' (default), đâu 'where' (emphatic, colloquial), nào 'which' (colloquial but elevated), đếch 'fuck' (mildly vulgar), đéo 'penis, fuck' (very vulgar), etc.

b. Em không cần anh giúp.
'I Neg_n need your help.' (n = neutral)

c. Tao đéo cần mày giúp.
'I_v Neg_v need your_v help.' (v = vulgar)

❹ Monadic semantics

Point of departure: formal semantics for generative grammar (Heim & Kratzer 1998)

natural language expressions =>
syntactic structures (binary trees) =>
semantic denotations (\(\lambda\)-calculus)
We focus on the semantic side, which is usually taken to be a set-theoretic structure. It can be viewed as a category \(\mathbf{Sem}\).
Then, we define a monad on \(\mathbf{Sem}\).

Def. 1 (cf. Asudeh & Giorgolo 2020):
Let \(\langle T, \eta, \mu\rangle\) be a monad on \(\mathbf{Sem}\), such that \(\forall A. TA = \langle A, \{\langle X, \surd_1\rangle, \langle Y, \surd_2\rangle, \dots \} \rangle\), where \(\langle X, \surd_1\rangle\) etc. record the root-supported types in the syntactic structure denoting \(A\). Then \(\forall f: A \rightarrow B. Tf = \lambda\langle x, Q\rangle.\langle f(x), Q\rangle\), where \(Q\) is also a set of type–root pairs. The two natural transformations are \(\eta_A(x) = \langle x, \emptyset\rangle\) and \(\mu_A(\langle\langle x, P\rangle, Q\rangle) = \langle x, P\cup Q\rangle\). With \(\mu\), we can further define >>= on \(ta: TA\) and \(f: A \rightarrow TB\) as \(ta\) >>= \(f = \mu_B(Tf(ta))\).
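
Def. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any cited work: the names `T`, `fmap`, `eta`, `mu`, and `bind` are chosen here for readability, log entries are modelled as pairs of plain strings, and the values 3 and 4 are toy stand-ins for denotations.

```python
from typing import Any, Callable, FrozenSet, NamedTuple, Tuple

# A log entry is a (grammatical type, root) pair; using strings is an assumption.
Entry = Tuple[str, str]


class T(NamedTuple):
    """A monadic term <value, log>, i.e. an element of TA."""
    value: Any
    log: FrozenSet[Entry]


def fmap(f: Callable[[Any], Any]) -> Callable[[T], T]:
    """Tf = λ<x, Q>. <f(x), Q>: map the value, keep the log."""
    return lambda ta: T(f(ta.value), ta.log)


def eta(x: Any) -> T:
    """η_A(x) = <x, ∅>: wrap a value with an empty log."""
    return T(x, frozenset())


def mu(tta: T) -> T:
    """μ_A(<<x, P>, Q>) = <x, P ∪ Q>: flatten a nested term."""
    inner = tta.value
    return T(inner.value, inner.log | tta.log)


def bind(ta: T, f: Callable[[Any], T]) -> T:
    """ta >>= f = μ_B(Tf(ta))."""
    return mu(fmap(f)(ta))


# Toy check: the values are composed, the logs are unioned.
ta = T(3, frozenset({("n", "√DOG")}))
f = lambda x: T(x + 1, frozenset({("v", "√RUN")}))
result = bind(ta, f)
print(result.value, sorted(result.log))  # 4 [('n', '√DOG'), ('v', '√RUN')]
```

Because the log is a set combined by union with the empty set as unit, the usual monad laws hold for this sketch; for instance, `bind(eta(5), f) == f(5)` and `bind(ta, eta) == ta`.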

Remark 1: The set of grammatical type–root pairs serves as a record of the root support situation in an expression. The log set is "inert" in composition and only gets "opened" at the final stage of semantic interpretation.
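
The inertness described in Remark 1 can be illustrated with a small sketch. Everything named here is an assumption of the illustration: `apply` is a hypothetical helper (function application lifted to monadic terms in the standard way, not taken from the cited works), `forget` simply discards the log, and the proposition `p` with its log entry is a toy value.

```python
from typing import Any, Callable, FrozenSet, NamedTuple, Tuple

Entry = Tuple[str, str]  # (grammatical type, root), modelled as strings


class T(NamedTuple):
    """A monadic term <value, log>."""
    value: Any
    log: FrozenSet[Entry]


def eta(x: Any) -> T:
    """Wrap a value with an empty log."""
    return T(x, frozenset())


def bind(ta: T, f: Callable[[Any], T]) -> T:
    """ta >>= f, written directly: run f on the value and union the logs."""
    tb = f(ta.value)
    return T(tb.value, ta.log | tb.log)


def apply(tf: T, ta: T) -> T:
    """Hypothetical helper: apply a logged function to a logged argument."""
    return bind(tf, lambda f: bind(ta, lambda a: eta(f(a))))


def forget(t: T) -> Any:
    """Drop the log and keep only the pure value."""
    return t.value


neg = lambda t: not t  # the pure function λt. ¬t

neg_term = T(neg, frozenset({("Neg", "√ĐÉO")}))
p = T(True, frozenset({("v", "√RUN")}))  # toy proposition with a toy log

result = apply(neg_term, p)

# The value slot is computed exactly as without the monad...
print(forget(result) == neg(forget(p)))  # True
# ...and the log is only inspected ("opened") at the very end.
print(sorted(result.log))  # [('Neg', '√ĐÉO'), ('v', '√RUN')]
```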

We complete the monadic semantics with the ancillary function \(\mathrm{write}\) (Asudeh & Giorgolo 2020), which wraps a grammatical type–root pair into a dummy monadic term:

Def. 2: \(\mathrm{write}\)(X, √) = \(\langle\)1, \(\{\langle\)X, √\(\rangle\}\rangle\).

Together with >>=, this gives us a way to compose the root categorization schema:

Def. 3: ⟦[_X X √ ]⟧ = \(\mathrm{write}\)(X, √) >>= \(\lambda y. \eta\)(⟦X⟧)

This writes a grammatical type–root pair into the log slot of a vacuous monadic term.
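
Defs. 2 and 3 can be sketched as follows. The sketch makes several assumptions: `UNIT` stands in for the dummy value 1, `categorize` is a name chosen here for the schema in Def. 3, `bind` is written directly rather than via \(\mu\), and the denotation ⟦X⟧ is passed in as an argument because its content depends on the categorizer.

```python
from typing import Any, Callable, FrozenSet, NamedTuple, Tuple

Entry = Tuple[str, str]  # (grammatical type, root), modelled as strings


class T(NamedTuple):
    """A monadic term <value, log>."""
    value: Any
    log: FrozenSet[Entry]


UNIT = ()  # stands in for the dummy value 1 of Def. 2


def eta(x: Any) -> T:
    """Wrap a value with an empty log."""
    return T(x, frozenset())


def bind(ta: T, f: Callable[[Any], T]) -> T:
    """ta >>= f: run f on the value and union the two logs."""
    tb = f(ta.value)
    return T(tb.value, ta.log | tb.log)


def write(x: str, root: str) -> T:
    """Def. 2: a dummy term whose log holds the single pair <X, √>."""
    return T(UNIT, frozenset({(x, root)}))


def categorize(x: str, root: str, denotation: Any) -> T:
    """Def. 3: write(X, √) >>= λy. η(⟦X⟧); the dummy value y is discarded."""
    return bind(write(x, root), lambda _y: eta(denotation))


# "⟦n⟧" is an opaque placeholder string, not a real denotation.
term = categorize("n", "√DOG", "⟦n⟧")
print(term.value, sorted(term.log))  # ⟦n⟧ [('n', '√DOG')]
```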

❺ Examples of monadic composition

Ex.5: the English noun dog

⟦dog⟧ = ⟦[_N n √DOG]⟧ = \(\mathrm{write}\)(n, √DOG) >>= \(\lambda y. \eta\)(⟦n⟧) = \(\langle\)⟦n⟧, \(\{\langle\)n, √DOG\(\rangle\}\rangle\)
(an entity enriched by √DOG)

Ex.6: the Vietnamese negator đéo

⟦đéo⟧ = ⟦[_Neg Neg √ĐÉO]⟧ = \(\mathrm{write}\)(Neg, √ĐÉO) >>= \(\lambda y. \eta\)(⟦Neg⟧) = \(\langle\)⟦Neg⟧, \(\{\langle\)Neg, √ĐÉO\(\rangle\}\rangle\)
(a boolean function enriched by √ĐÉO)
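
Ex.5 and Ex.6 can be replayed in a short sketch. The assumptions are labelled in the code: `ENTITY` is an opaque placeholder for ⟦n⟧ (the source does not spell it out), the log stores category labels as strings, and the root label `√KHÔNG` is named here by analogy with √ĐÉO to compare the neutral and the vulgar negator from Ex.4.

```python
from typing import Any, Callable, FrozenSet, NamedTuple, Tuple

Entry = Tuple[str, str]  # (grammatical type, root), modelled as strings


class T(NamedTuple):
    """A monadic term <value, log>."""
    value: Any
    log: FrozenSet[Entry]


def eta(x: Any) -> T:
    return T(x, frozenset())


def bind(ta: T, f: Callable[[Any], T]) -> T:
    tb = f(ta.value)
    return T(tb.value, ta.log | tb.log)


def write(x: str, root: str) -> T:
    return T((), frozenset({(x, root)}))  # () stands in for the dummy value 1


def categorize(x: str, root: str, denotation: Any) -> T:
    """The root categorization schema: write(X, √) >>= λy. η(⟦X⟧)."""
    return bind(write(x, root), lambda _y: eta(denotation))


def neg(t: bool) -> bool:
    """Denotation of Neg as given in Ex.4: λt. ¬t."""
    return not t


ENTITY = "⟦n⟧"  # opaque placeholder; an assumption of this sketch

dog = categorize("n", "√DOG", ENTITY)
deo = categorize("Neg", "√ĐÉO", neg)
khong = categorize("Neg", "√KHÔNG", neg)  # hypothetical root label

print(dog.value, sorted(dog.log))       # ⟦n⟧ [('n', '√DOG')]
print(deo.value(True), sorted(deo.log))  # False [('Neg', '√ĐÉO')]

# Same pure function, different root support:
print(deo.value is khong.value, deo.log == khong.log)  # True False
```

The last line shows the intended division of labour: the two negators share the value slot and differ only in the logged root.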

See Song (2021, 2022) for larger examples.

❻ Categorical setting

The category \(\mathbf{Sem}\), being a subcategory of \(\mathbf{Set}\), is cartesian closed. We do not need extra structures, such as left/right directionality, because in Chomskyan linguistics word order is not regulated by syntax/semantics (it is handled in phonology instead).

Def. 1 is given from the perspective of semantics. We could also start from the syntactic side, but it is unclear whether Chomskyan syntax defines a category. Thus, focusing on the semantic side seems to be the "shortest path" for us at this stage.

Despite the name \(\mathbf{Sem}\), the category in Def. 1 is actually more like the syntax (i.e., formal calculus) category in other categorical linguistic works, since its objects/morphisms can also be viewed as types/terms. Due to its Montagovian foundation and \(\lambda\)-calculus implementation, formal semantics as practiced in generative grammar is still quite syntax-y from a categorical perspective.

❼ Conclusion

The writer monad can help linguists develop a compositional semantics for root syntax.
Most (if not all) semantic composition is monadic if we take root syntax seriously, simply because every sentence contains content words, whose idiosyncratic contribution to semantic interpretation must be carried along via the monadic shell.
But since the root support log does not interfere with computation in the pure function slot, we can safely ignore the logging or even root syntax altogether when subatomic structure is not the focus.
Overall, the ordinary-to-monadic jump in composition has global consequences reminiscent of those of the extensional-to-intensional jump.