Context Free Grammars And Generating Functions

We saw how to get the number of ways to generate a string of length n, matching or avoiding certain patterns, from Regular Expressions (RE).

But it can't help when more expressive power is required. We'll see examples on how to use Context Free Grammars (CFG) to obtain the generating functions (GF), whose coefficients in the formal power series indicate the number of parse trees possible for n-letter string for the given CFG.

So, in a way, it can be used to know whether the grammar is ambiguous or not.

Let's look at relatively simple examples, for the strings constructed over the set of symbols \(\left\{\left(, \right)\right\}\)

1. Find the number of ways of constructing balanced parentheses

The grammar is

\begin{equation*} S \to (S)S\; \big| \; \epsilon \end{equation*}

Then, to get its GF, replace each symbol with \(x\), and \(\epsilon\) with 1. Hence, the GF for the RE is

\begin{align*} S(x) &= x^2\, S(x)^2 + 1\\ \implies S(x)&= \frac{1-\sqrt{1-4 x^2}}{2 x^2}\\ &= \sum_{n\ge 0} \frac{1}{n+1}\binom{2\, n}{n} x^{2 n} \end{align*}

and the series is the well known Catalan numbers.

2. Find the number of ways of constructing balanced parentheses, which can have more opening parentheses

e.g. for \(n=3\), \((((, ((), ()(\) are valid

We may obtain the CFG as

\begin{equation*} S \to (S)S \; \big|\; (S \; \big| \; \epsilon \end{equation*}

Even though the grammar describes the language, it's actually ambiguous, and the GF obtained from this counts all the extra parse trees.

The right CFG is

\begin{align*} S &\to B \;\big|\; U \\ B &\to (\, B\, )\, B\; \big|\; \epsilon \\ U &\to (\, S \;\big| \;(\, B\, )\, U \\ \end{align*}

and we derive the GF

\begin{align*} S(x) &= B(x) + U(x)\\ B(x) &= B(x)^2 x^2 + 1\\ U(x) &= S(x) x + B(x) U(x) x^2\\ \implies S(x) &= \frac{-1+2\,x+\sqrt{1-4\,x^2}}{2\,x-4\,x^2}\\ &= \sum_{n\ge 0} \binom{n}{\lfloor n/2 \rfloor} x^{n} \end{align*}

Read Gruber, Lee & Shallit for the theory.