Using directed graphs to count the number of patterns

How many n-digit numbers are there which do not contain 122 and 213 in them? Numbers start with a nonzero digit.

We can write a directed graph to solve that, the weighted adjacency matrix for which is given by:

\begin{equation*} \displaystyle D = \left(\begin{array}{ccccccc} & I & A & 1 & 2 & 12 & 21 \\ I & 0 & 7 & 1 & 1 & 0 & 0 \\ A & 0 & 8 & 1 & 1 & 0 & 0 \\ 1 & 0 & 8 & 1 & 0 & 1 & 0 \\ 2 & 0 & 8 & 0 & 1 & 0 & 1 \\ 12 & 0 & 8 & 0 & 0 & 0 & 1 \\ 21 & 0 & 7 & 1 & 0 & 1 & 0 \end{array}\right) \end{equation*}

'I' is the initial state, '1' and '2' are the states if those digits are the last encountered. \(A\) is for any other valid digits. It goes to state \(12\) or \(21\) if we see a sequence of 12 or 21, respectively. The number of ways of valid transitions are given as the weights.

There, the hard work is done. Leave the remaining to Sage!

Obtain the characteristic polynomial of the matrix, which also is the characteristic equation of the required recurrence equation.

1
2
3
4
5
6
7
8
9
am=matrix([
[0, 7, 1, 1, 0, 0],
[0, 8, 1, 1, 0, 0],
[0, 8, 1, 0, 1, 0],
[0, 8, 0, 1, 0, 1],
[0, 8, 0, 0, 0, 1],
[0, 7, 1, 0, 1, 0]
])
print am.characteristic_polynomial()

The characteristic equation is:

\begin{equation*} \displaystyle x^{5} - 10x^{4} + 2x^{2} - 1 = 0 \end{equation*}

So, the recurrence would be:

\begin{align*} \displaystyle a_{i+5}&=10\, a_{i+4}-2\, a_{i+2}+a_{i} \\ a_{0}&=1\\ a_{1}&= 9\\ a_{2}&= 90\\ a_{3}&= 898\\ a_{4}&= 8962\\ \end{align*}

Wonder if anyone can come up with a combinatorial argument for that equation?!

Initial conditions can be easily obtained by matrix exponentiation or using a program to iterate through the sequences, e.g.

print sum((am^4)[0,:].list())

or

1
2
3
4
5
cnt = 0
for i in range(1000,10000):
    if str(i).find('122')==-1 and str(i).find('213')==-1:
        cnt += 1
print cnt

We can also automatically get the generating function of the obtained recurrence by using a small program (method is given in "Lectures on generating functions" by S. K. Lando):

1
2
3
4
5
6
def get_gf(alst,initvals):
    nn=len(alst)
    Am = zero_matrix(QQ,nn-1,1).augment(identity_matrix(nn-1)).stack(matrix(QQ,alst[::-1]))
    Bm = matrix(QQ,initvals)
    return (((identity_matrix(nn)-x*Am).inverse()*Bm.transpose())[0,0]).full_simplify()
print get_gf([10,0,-2,0,1],[1,9,90,898,8962])

which gives the g.f.

\begin{equation*} \displaystyle G(x)=\frac{x - 1}{x^{5} - 2 \, x^{3} + 10 \, x - 1} \end{equation*}

There are tremendous uses of generating functions, one of which is to obtain an asymptotic formula. (See William Feller's book on probability for a brief explanation on the topic)

If we have a generating function of the form \(G(x)=U(x)/V(x)\), then the asymptotic form is given by

\begin{equation*} \displaystyle a_n \sim \dfrac{\rho_1}{s_1^{n+1}} \displaystyle \textrm{where }\rho_1=\dfrac{-U(s_1)}{V^{'}(s_1)} \displaystyle \textrm{and }s_1 \textrm{ is the root of }V(x)\textrm{ nearest to origin} \end{equation*}

We will visually inspect where the roots lie, to get an idea about the closest root to the origin

complex_plot(x^5 - 2*x^3 + 10*x - 1,(-2, 2), (-2, 2))
../../images/complexroot.jpg

complex plot of the equation

and we see that there is only one real root (also the nearest to origin) and other four are complex.

We can proceed with the following steps in Sage:

1
2
3
4
s1=find_root(x^5 - 2*x^3 + 10*x - 1, 0, 4)
rho1=(1-s1)/diff(x^5 - 2*x^3 + 10*x - 1,x).subs(x=s1)
f(n)=rho1/s1^(n+1)
print int(f(15)),f(n)

We find the approximation to be

\begin{equation*} \displaystyle a_n \sim \frac{0.0905207193521}{0.100200193518^{n + 1}} \end{equation*}

The \(15^{th}\) term using the asymptotic formula gives about \(876700051238642\), which is only a little more than the actual value of \(876700051238641\).

Expected number of marbles to be picked till one of the colours are repeated

There is a bag having 3 red, 3 black, 5 white and 7 green marbles. A marble is randomly picked one after another without replacement till the colour of the picked marble matches with one of the marbles in hand. What is the expected number of marbles we need to pick?

Before trying out the analytical solution, let us get an approximate answer from a simulation. It's just some tens of characters in J:

1
2
3
a=:(3#0 1),(5#2),(7#3)
sim =: 3 : '{.1+I.-.~:(5?#a){a'
(+/%#)(sim "0) 100000#0 NB. about 3.25279

sample size of 5 is chosen by pigeonhole principle.

  • ~: returns 1 for items which are distinct till that position.
  • -. flips the 1's and 0's.
  • I. fetches the indices of non-zero items. 1 added since indexing starts from 0.
  • {. gets the head of the array.

For the analytical solution, it can be simply expressed as a recurrence relation:

\begin{equation*} \displaystyle f_{a,b,c,d} = \begin{cases} A+B+C+D-(a+b+c+d) & A-a = 2 \lor B-b = 2 \lor C-c = 2 \lor D-d = 2\\ & \\ \dfrac{1}{a+b+c+d}\left(a\cdot f_{a-1,b,c,d} + b\cdot f_{a,b-1,c,d} + c\cdot f_{a,b,c-1,d} + d\cdot f_{a,b,c,d-1}\right) & \text{otherwise} \end{cases} \end{equation*}

where \(A,B,C,D\) are the initial number of marbles of four colours.

And it can be directly translated to code.

In Sage:

A = 3
B = 3
C = 5
D = 7
def f(a,b,c,d):
    if (A-a == 2 or B-b == 2 or C-c == 2 or D-d == 2):
        return (A+B+C+D-(a+b+c+d))
    return (a*f(a-1,b,c,d) + b*f(a,b-1,c,d) + c*f(a,b,c-1,d) + d*f(a,b,c,d-1))/(a+b+c+d)
ans= f(A,B,C,D)
print ans, N(ans) # = 3979/1224 and N() for the numerical approximation = 3.25081699346405

Here's an exercise for you to try:

Repeat the same problem, this time with the replacement of the marbles. What's the expected number of number of picks till you see the same coloured marble again?

Simulation is easy:

1
2
3
a=:(3#0 1),(5#2),(7#3)
sim =: 3 : '{.1+I.-.~:(?5##a){a'
(+/%#)(sim "0) 100000#0

In a similar fashion, the recurrence can be modified. (Ans: \(757/243\))

Simulating Buffon's needle problem

Buffon's needle problem is a classic problem on geometric probability. mathworld describes the problem quite well, and gives the following formula for the probability that the needle touches a line:

\begin{equation*} \displaystyle \mathbb{P}(x)=\begin{cases} \dfrac{2}{\pi}\, x& x\le 1\\ & \\ \dfrac{2}{\pi}\, \left(x-\sqrt{x^2-1}+\sec^{-1}{x}\right)& x>1 \end{cases} \end{equation*}

How can we convince ourselves that the formula has been derived correctly? How to simulate geometric shapes?

The answer is to use the parametric representation of the points. If the needle of length \(l\) is dropped, then it makes an angle \(\theta\) with the horizontal. For simulating, take one end of the needle as a reference, and it must be randomly and uniformly distributed in \([0,d)\). Call the random number as \(h\). The other end of the needle will then be at a distance \(h+l\cdot sin(\theta)\).

Now, for the condition that the needle touches the line, the other end must be either higher than \(d\) or less than zero. Hence, we can simulate the experiment by picking \(h\) in \(U(0,d)\) and \(\theta \in U(0,2\, \theta)\).

Following is the code in J:

1
2
3
4
5
6
7
8
load 'trig'
'l d'=:4 1
    sim =: 3 : 0
h=:d*?1000000#0
th=:2p1*?1000000#0
(+/%#)((h+l*sin th)>d)+.((h+l*sin th)<0)
)
    sim 0

It gives an answer about \(0.919978\), and changing \(l\) and \(d\) to

'l d'=:1 4

gives about \(0.15907\), which are close to actual answers \(0.920000066713994\) and \(0.159154943091895\), respectively.

Monte-Carlo simulation for an expected value of cards

Suppose we have a standard deck of 52 cards, and we pick 13 cards randomly and arrange them in a row, what is the expected number of adjacent pairs that are of same suit?

E.g. in ♠♥♥♣♦♣♣♦♦♣♠♠♣ , there are 4 adjacent pairs that are of same suit. On an average, what would be the expected number of such pairs?

Newcomers to probability theory would find such a question a bit tricky. In such a situation, using a computer for simulation/enumeration would ease the job. For this problem, enumeration can yield the exact answer, but getting all the combinations is awkward.

In such cases, simulation is there for our rescue! Even an approximate answer would be sufficient to conjecture a formula.

For the simulation, we will be using a language called J. The programs written in J can be very short compared to other well known languages. So, we can focus on the problem at hand instead of the program.

Let's see how it can be used for our simulation (there can be other ways, here’s my shot):

1
2
3
a=:13#(i.4)
sim=: 3 : '+/2=/\(13?52){a'
(+/%#)(sim "0) 1000000#0

That's it, less than 70 characters! It output \(2.82489\) for me. When we are proceeding with an analytical method, if we get an answer around \(2.82\), then we can be pretty sure that it's right.

Some explanation about the program:

In J, every operation is performed right to left, if no parentheses are provided. So, if we write 2*5+3, answer would be 16 and not 13. No operator precedence here.

In J terminology, the operators are called verbs. They can be monadic or dyadic. Check the help files for more info.

  1. i.4 is array of integers 0 1 2 3, representing 4 suits. 13# repeats each element 13 times.
  2. sim is the function for our simulation. Read it right to left. { is the verb for indexing.
    • (13?52) gets 13 random integers in 0..51 without replacement, to simulate 13 card draws.
    • 2=/\ compares two adjacent values from the selected list, and returns 1 or 0 for true or false, respectively.
    • +/ gives us the sum of the array, which is the total number of pairs with same adjacent suit.
  3. " is a verb for rank. "0 gets atomic values in the rhs, i.e. a million zeros, the rhs is not used in our simulation, it's just for performing the experiment a million times. The outcome of each experiment is then averaged, by using the tacit definition (+/%#). This last line almost always remains the same for any similar kind of simulation.

Lastly, experiment with different number of picks and observe how answer is changing according to that.

Now, for the analytical result, the linearity of expectation is used.

The probability that the card in positions i and i+1 are of the same suit is given by

\begin{equation*} \displaystyle \mathbb{P}[c_i=c_{i+1} | c_i \text{ is spade}] = \frac{13}{52}\cdot \frac{12}{51} \end{equation*}

and similarly for three other suits. The total probability is then

\begin{equation*} \displaystyle \mathbb{P}[c_i=c_{i+1}] = \frac{13}{52}\cdot \frac{12}{51} \times 4 = \frac{4}{17} \end{equation*}

This is also the expected number of pairs of the same suit when two cards are picked. We write the expected value as

\begin{equation*} \displaystyle \mathbb{E}[c_i=c_{i+1}] = \frac{4}{17} \end{equation*}

Hence, by linearity of expectation, when ‘n’ cards are picked, we can expect

\begin{equation*} \displaystyle \sum_{i=1}^{n}\mathbb{E}[c_i=c_{i+1}] = \frac{4\, (n-1)}{17} \end{equation*}

adjacent pairs to be of the same suit.

For \(n=13\), it would be \(48/17\), or \(2.82352941176471\) cards, which agrees with the simulation.