Interactive Regexp and Replacing Regexp with Emacs

If we need to replace text based on patterns, then replace-regexp is the command.

But we may not always be sure that the regular expression (RE) entered does what we wanted. Not to worry, emacs has it covered: with re-builder! As we keep entering the text, the editor progressively highlights the text in the buffer that matches. Once we are certain that all the required text has matched, we can move to enter the command to replace it.

We'll look at a usecase where we need to uppercase some sql keywords, and cast the output of a function as float

In replace-regexp, the matched expressions are grouped using \( ... \), and the matched groups can be accessed in the result as \1, \2, etc.

Context Free Grammars And Generating Functions

We saw how to get the number of ways to generate a string of length n, matching or avoiding certain patterns, from Regular Expressions (RE).

But it can't help when more expressive power is required. We'll see examples on how to use Context Free Grammars (CFG) to obtain the generating functions (GF), whose coefficients in the formal power series indicate the number of parse trees possible for n-letter string for the given CFG.

So, in a way, it can be used to know whether the grammar is ambiguous or not.

Let's look at relatively simple examples, for the strings constructed over the set of symbols \(\left\{\left(, \right)\right\}\)

1. Find the number of ways of constructing balanced parentheses

The grammar is

\begin{equation*} S \to (S)S\; \big| \; \epsilon \end{equation*}

Then, to get its GF, replace each symbol with \(x\), and \(\epsilon\) with 1. Hence, the GF for the RE is

\begin{align*} S(x) &= x^2\, S(x)^2 + 1\\ \implies S(x)&= \frac{1-\sqrt{1-4 x^2}}{2 x^2}\\ &= \sum_{n\ge 0} \frac{1}{n+1}\binom{2\, n}{n} x^{2 n} \end{align*}

and the series is the well known Catalan numbers.

2. Find the number of ways of constructing balanced parentheses, which can have more opening parentheses

e.g. for \(n=3\), \((((, ((), ()(\) are valid

We may obtain the CFG as

\begin{equation*} S \to (S)S \; \big|\; (S \; \big| \; \epsilon \end{equation*}

Even though the grammar describes the language, it's actually ambiguous, and the GF obtained from this counts all the extra parse trees.

The right CFG is

\begin{align*} S &\to B \;\big|\; U \\ B &\to (\, B\, )\, B\; \big|\; \epsilon \\ U &\to (\, S \;\big| \;(\, B\, )\, U \\ \end{align*}

and we derive the GF

\begin{align*} S(x) &= B(x) + U(x)\\ B(x) &= B(x)^2 x^2 + 1\\ U(x) &= S(x) x + B(x) U(x) x^2\\ \implies S(x) &= \frac{-1+2\,x+\sqrt{1-4\,x^2}}{2\,x-4\,x^2}\\ &= \sum_{n\ge 0} \binom{n}{\lfloor n/2 \rfloor} x^{n} \end{align*}

Read Gruber, Lee & Shallit for the theory.

Regular Expressions And Generating Functions

We'll discuss about the number of ways to generate a string of length n, matching or avoiding certain patterns as its substring.

This can be easily achieved deriving an unambiguous regular expression (RE) by constructing a minimal Deterministic Finite Automaton (DFA), and then the ordinary Generating Function (GF) whose coefficients are the number of n-letter strings having the pattern.

Let's look at a few examples, for the strings constructed over the set of symbols \(\{0, 1\}\) such that

1. There are no two consecutive 1's

The RE is: \((0+10)^*(1+\epsilon)\)

Then, to get its GF, replace each symbol with \(x\), and \(\epsilon\) with 1. Hence, the GF for the RE is

\begin{align*} G_1(x) &= \frac{1}{1-x-x^2}\cdot (1+x)\\ &= \sum_{n\ge 0} \frac{ \left(\left(\sqrt{5}-3\right) \left(1-\sqrt{5}\right)^n+\left(\sqrt{5}+3\right) \left(1+\sqrt{5}\right)^n\right)}{\sqrt{5}\, 2^{n+1}} x^n\\ &= \sum_{n\ge 0} F_{n+2} x^n \end{align*}

where \(F_{n}\) is the fibonacci number

2. At least one pair of consecutive 1's as its substring

RE: \((0+10)^*11(0+1)^*\)

\begin{align*} G_2(x) &= \frac{1}{1-x-x^2} \cdot x^2\cdot \frac{1}{1-2\, x}\\ &= \sum_{n\ge 0} \left(\frac{ \left(\left(\sqrt{5}-3\right) \left(-\left(1-\sqrt{5}\right)^n\right)-\left(3+\sqrt{5}\right) \left(1+\sqrt{5}\right)^n\right)}{\sqrt{5}\, 2^{n+1}}+2^n\right)\, x^n\\ &= \sum_{n\ge 0}\left(2^{n} - F_{n+2}\right) x^n \end{align*}

which can also be confirmed from the first example by taking its complement, since the number of possible words are \(2^n\)

3. At least one 01 as its substring

RE: \(1^*00^*1(0+1)^*\)

\begin{align*} G_3(x) &= \frac{x^2}{(1-2 x) (1-x)^2}\\ &= \sum_{n\ge 0} \left(2^n-n-1\right) x^n \end{align*}

For more, refer Analytic Combinatorics, which is a treatise on Generating Functions.

Communicating With Serial Ports Using Emacs

Emacs can even be used to communicate with serial ports! Hence, it can replace softwares like minicom, hyperterminal, putty etc. that are used for serial port communication.

Making a connection is simple,

M-x serial-term

enter port name and baud rate, and it's connected!

E.g. In Linux based systems, if we need to connect to a GSM modem, find its port name by listing /dev/serial/by-id/

ls -l /dev/serial/by-id/

which will show the symbolic link to the serial port to be used. Suppose the name is /dev/ttyUSB3

Then, using that name with the default baud rate (9600 bps) connects to the modem. The settings can be changed at runtime without requiring to reconnect to the serial port.

And we may issue the standard AT commands

at
OK
at+cpin?
+CPIN: READY

OK
at+csq
+CSQ: 14,99

OK
at+cmgf=1
OK