Differentiation under the integral sign and a sum

We will see how differentiation under the integral sign, which Mr. Feynman loved, can be applied to derive an otherwise difficult integral. The integral can also be expanded as a Taylor series, thus obtaining an infinite sum.

So, this is the integral:

\begin{equation*} \displaystyle I(a)=\int_0^1 \, \frac{\ln{(1+a\,(x-x^2))}}{x-x^2}\, dx \end{equation*}

Applying the differentiation under the integral sign to this:

\begin{align*} \displaystyle \frac{\partial}{\partial a} I(a) &= \int_0^1\, \frac{1}{1+a\,(x-x^2)}\, dx\\ &=\frac{\log\left(\frac{1}{2} \, a + \frac{1}{2} \, \sqrt{a^{2} + 4 \, a} + 1\right)}{\sqrt{a^{2} + 4 \, a}} - \frac{\log\left(\frac{1}{2} \, a - \frac{1}{2} \, \sqrt{a^{2} + 4 \, a} + 1\right)}{\sqrt{a^{2} + 4 \, a}} \end{align*}

Integrating w.r.t \(a\) gives us (answer by friCAS)

\begin{equation*} \displaystyle I(a)=\log\left( \frac{a+2 -\sqrt{a^{2} + 4 \, a}}{2}\right)^{2} + C \end{equation*}

Putting \(a=0\) in the above gives \(I(0)=0+C\) and original integral is also 0. Hence, \(C=0\) and the final answer to the integral is:

\begin{equation*} \displaystyle I(a)=\log\left( \frac{a+2 -\sqrt{a^{2} + 4 \, a}}{2}\right)^{2} \end{equation*}

Now, we may also expand the integral as a taylor series:

\begin{align*} \displaystyle I(a)&=\int_0^1 \, \sum_{n\ge 1}\, \frac{(-1)^{n-1}\, a^{n} (x-x^2)^n}{n\, (x-x^2)} \, dx\\ &=\int_0^1 \, \sum_{n\ge 1}\, \frac{(-1)^{n-1}\, a^{i} (x-x^2)^{n-1}}{n} \, dx\\ &=\int_0^1 \, \sum_{n\ge 0}\, \frac{(-1)^{n}\, a^{n+1} (x-x^2)^{n}}{n+1} \, dx\\ &=\sum_{n\ge 0}\, \frac{(-1)^{n}\, a^{n+1} B(n+1,n+1)}{n+1}\\ &=\sum_{n\ge 0}\, \frac{(-1)^{n}\, a^{n+1} \Gamma\left(n+1\right)\Gamma\left(n+1\right)}{(n+1)\,\Gamma\left(2\, n+1\right)}\\ &=\sum_{n\ge 0}\, \frac{(-1)^{n}\, a^{n+1} \, n!^2}{(n+1)\,(2\, n+1)\, (2n)!}\\ &=\sum_{n\ge 0}\, \frac{(-1)^{n}\, a^{n+1}}{(n+1)\,(2\, n+1)\, \binom{2n}{n}} \end{align*}

Therefore, we have the following sum!

\begin{equation*} \displaystyle \sum_{n\ge 0}\, \frac{(-1)^{n}\, a^{n+1}}{(n+1)\,(2\, n+1)\, \binom{2n}{n}}=\log\left( \frac{a+2 -\sqrt{a^{2} + 4 \, a}}{2}\right)^{2} \end{equation*}

This has got real values for \(a>-4\)

E.g. when \(a=-2\)

\begin{equation*} \displaystyle \sum_{n\ge 0}\, \frac{2^{n+1}}{(n+1)\,(2\, n+1)\, \binom{2n}{n}}=\frac{\pi^2}{4} \end{equation*}

and when \(a=1/2\)

\begin{equation*} \displaystyle \sum_{n\ge 0}\, \frac{(-1)^{n}}{(n+1)\,(2\, n+1)\, \binom{2n}{n}\, 2^{n+1}}=\log\left(2\right)^{2} \end{equation*}

A restricted arrangement of letters in a word

Find number of ways to arrange letters in word "benzine" such that no two same letters are adjacent to each other

This is quite a tough nut to crack. But, there are multiple ways to solve this: by summation, generating function, a recurrence and by integration; summation being the most efficient one.

1 By summation

We'll start with a generalized formula for 5 distinct objects, each with \(n_i\) repetitions. The summation formula can be written as:

\begin{equation*} \displaystyle N(n_1,n_2,n_3,n_4,n_5)=\sum_{x_{1}=1}^{n_{1}}\sum_{x_{2}=1}^{n_{2}}\sum_{x_{3}=1}^{n_{3}}\sum_{x_{4}=1}^{n_{4}}\sum_{x_{5}=1}^{n_{5}}\, (-1)^{n_{1}+n_{2}+n_{3}+n_{4}+n_{5}-x_{1}-x_{2}-x_{3}-x_{4}-x_{5}}\, \dfrac{(x_{1}+x_{2}+x_{3}+x_{4}+x_{5})!}{x_1!x_2!x_3!x_4!x_5!} \cdot \dbinom{n_{1}-1}{x_{1}-1}\,\dbinom{n_{2}-1}{x_{2}-1}\,\dbinom{n_{3}-1}{x_{3}-1}\,\dbinom{n_{4}-1}{x_{4}-1}\,\dbinom{n_{5}-1}{x_{5}-1} \end{equation*}

and for our problem, we need to find \(N(2,2,1,1,1)\). Putting that in Sage:

1
2
3
var('x1 x2 x3 x4 x5')
n1,n2,n3,n4,n5 = 2,2,1,1,1
sum(sum(sum(sum(sum((-1)^(n1+n2+n3+n4+n5-x1-x2-x3-x4-x5)*factorial(x1+x2+x3+x4+x5)/(factorial(x1)*factorial(x2)*factorial(x3)*factorial(x4)*factorial(x5))*binomial(SR(n1)-1,x1-1)*binomial(SR(n2)-1,x2-1)*binomial(SR(n3)-1,x3-1)*binomial(SR(n4)-1,x4-1)*binomial(SR(n5)-1,x5-1),x1,1,n1),x2,1,n2),x3,1,n3),x4,1,n4),x5,1,n5)

\(=660\)

2 Generating function

This is actually just a part of the generating function (which I think can be derived from the recurrence relation)

\begin{equation*} \displaystyle G_n(v,w,x,y,z)=\left(\begin{array}{rrrrr} v & w & x & y & z \end{array}\right)\cdot \left(\begin{array}{rrrrr} 0 & w & x & y & z \\ v & 0 & x & y & z \\ v & w & 0 & y & z \\ v & w & x & 0 & z \\ v & w & x & y & 0 \end{array}\right)^{n-1}\cdot \left(\begin{array}{r} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{array}\right) \end{equation*}

The coefficient of \(v^2 w^2 x y z\) is the answer (evidently 660).

In Sage:

1
2
3
4
5
6
var('v w x y z')
a = matrix([v, w, x, y, z])
A = matrix([[0, w, x, y, z], [v, 0, x, y, z], [v, w, 0, y, z], [v, w, x, 0, z], [v, w, x, y, 0]])
ones = ones_matrix(QQ,5,1)
fxt = expand(a*(A^6)*ones)[0,0]
fxt.coefficient(x,2).coefficient(y,2).coefficient(z,1).coefficient(v,1).coefficient(w,1)

3 By recurrence

\begin{equation*} \displaystyle f_{p,q,r,s,t}=\left\{\begin{matrix} 0 & \text{if any of }p,q,r,s,t<0\\ 1 & \text{if any one of }p,q,r,s,t = 1 \text{ and others } 0\\ 2 & \text{if any two of }p,q,r,s,t = 1 \text{ and others } 0\\ 6 & \text{if any three of }p,q,r,s,t = 1 \text{ and the other's } 0\\ 24 & \text{if any four of }p,q,r,s,t = 1 \text{ and the other } 0\\ 120 & \text{if all of }p,q,r,s,t = 1 \\ A & \text{otherwise} \end{matrix}\right. \end{equation*}

where

\begin{align*} \displaystyle A=f_{p-1,q-1,r,s,t} + f_{p-1,q,r-1,s,t} + f_{p-1,q,r,s-1,t} + f_{p-1,q,r,s,t-1} + f_{p,q-1,r-1,s,t}+\\ f_{p,q-1,r,s-1,t} + f_{p,q-1,r,s,t-1} + f_{p,q,r-1,s-1,t} + f_{p,q,r-1,s,t-1} + f_{p,q,r,s-1,t-1}+\\ 2\cdot (f_{p-1,q-1,r-1,s,t} + f_{p-1,q-1,r,s-1,t} + f_{p-1,q-1,r,s,t-1} + f_{p-1,q,r-1,s-1,t}+ f_{p-1,q,r-1,s,t-1}+\\ f_{p-1,q,r,s-1,t-1} + f_{p,q-1,r-1,s-1,t} + f_{p,q-1,r-1,s,t-1} + f_{p,q-1,r,s-1,t-1} + f_{p,q,r-1,s-1,t-1})+\\ 3\cdot (f_{p-1,q-1,r-1,s-1,t} + f_{p-1,q-1,r-1,s,t-1} + f_{p-1,q-1,r,s-1,t-1} + f_{p-1,q,r-1,s-1,t-1} + f_{p,q-1,r-1,s-1,t-1})+4\cdot f_{p-1,q-1,r-1,s-1,t-1} \end{align*}

In Sage (python will also do), to print out the number of arrangements and its probability of occuring:

import numpy as np
def f(p,q,r,s,t) :
    if sum([0]>np.array([p,q,r,s,t]))>0:
        return 0
    if sum([1]==np.array([p,q,r,s,t]))==5:
        return 120
    if sum([1]==np.array([p,q,r,s,t]))==4 and sum([0]==np.array([p,q,r,s,t]))==1:
        return 24
    if sum([1]==np.array([p,q,r,s,t]))==3 and sum([0]==np.array([p,q,r,s,t]))==2:
        return 6
    if sum([1]==np.array([p,q,r,s,t]))==2 and sum([0]==np.array([p,q,r,s,t]))==3:
        return 2
    if sum([1]==np.array([p,q,r,s,t]))==1 and sum([0]==np.array([p,q,r,s,t]))==4:
        return 1
    return f(p-1,q-1,r,s,t)+ f(p-1,q,r-1,s,t)+ f(p-1,q,r,s-1,t)+ f(p-1,q,r,s,t-1)+ f(p,q-1,r-1,s,t)+ f(p,q-1,r,s-1,t)+ f(p,q-1,r,s,t-1)+ f(p,q,r-1,s-1,t)+ f(p,q,r-1,s,t-1)+ f(p,q,r,s-1,t-1)+2*(f(p-1,q-1,r-1,s,t)+ f(p-1,q-1,r,s-1,t)+ f(p-1,q-1,r,s,t-1)+ f(p-1,q,r-1,s-1,t)+ f(p-1,q,r-1,s,t-1)+ f(p-1,q,r,s-1,t-1)+ f(p,q-1,r-1,s-1,t)+ f(p,q-1,r-1,s,t-1)+ f(p,q-1,r,s-1,t-1)+ f(p,q,r-1,s-1,t-1)) +3*(f(p-1,q-1,r-1,s-1,t)+ f(p-1,q-1,r-1,s,t-1)+ f(p-1,q-1,r,s-1,t-1)+ f(p-1,q,r-1,s-1,t-1)+ f(p,q-1,r-1,s-1,t-1))+4*f(p-1,q-1,r-1,s-1,t-1)
a1,a2,a3,a4,a5 = 2,2,1,1,1
tot = f(a1,a2,a3,a4,a5)
print tot, tot/(factorial(a1+a2+a3+a4+a5)/(factorial(a1)*factorial(a2)*factorial(a3)*factorial(a4)*factorial(a5)))

NumPy is used here, since it allows array to be manipulated just like in an array programming language like J. The condition checking is made much shorter.

The recurrence is too slow if used for higher values. This can be sped up by caching the computed values, e.g. by dynamic programming.

We may back up our analytical results with a simulation (always a good thing to do, when possible)

1
2
3
a=.1 1 2 2 3 4 5
sim=: 3 : '0=+/0=2-/\(7?7){a'
(+/%#)(sim"0)1000000#0

about \(0.523729\), which is close to the actual result \(0.523809523809524\).

4 Using Integrals

One more way is to make use of integrals, which actually conveys the summation in a compact representation.

\begin{equation*} \displaystyle N(\{n_i\})=\int_0^\infty \prod_i q_{n_i}(x)\, e^{-x} \, dx \end{equation*}

where

\begin{equation*} \displaystyle q_{n_i}(x) = \sum_{i=1}^{n_i} \frac{(-1)^{i-n_i}}{i!} {n_i-1 \choose i-1}x^i \text{ for }n_i\geq 1 \end{equation*}

\(n_i\) is the number of repetitions of each character in the set.

For our example, the list of \(n_i\) can be written as [2,2,1,1,1]

Hence, in Sage:

1
2
3
4
var('i')
def q(n): return sum((-1)^(i-n)/factorial(i)*binomial(SR(n)-1,SR(i)-1)*x^i, i, 1, n)
lst = [2,2,1,1,1]
integrate(prod([q(l) for l in lst])*e^-x, x, 0, oo)

which displays our expected answer.

References:

1. Brilliant.org discussion

2. Stackexchange question

3. Another stackexchange question

Generalized Derangements

Derangement problems are quite easy to do when there are no restrictions. Suppose we want to extend the problem with restrictions, e.g.:

There are bins numbered 1,2,3,4,1,2,3,4,1,2 , and there are balls numbered 1,2,3,4,5,1,2,3,4,5 (distinguishable, say, two sets with different colors), and each bin is to contain a single ball such that the number of the bin and the ball should not match. In how many ways can that be done? (or what is the probability of that happening?)

This can be solved by using Rook polynomials.

The rook polynomial for a \(m\times n\) rectangle is:

\begin{equation*} \displaystyle r_{m,n}(x)=\sum_{k=0}^n{m\choose k}\, {n!\over (n-k)!}\, x^k \end{equation*}

For this problem:

\(m=\) number of bins numbered 's'

\(n=\) number of balls numbered 's'

and the rook polynomial we require to solve our problem is:

\begin{align*} \displaystyle R(x)&=r_{3,2}(x)\, r_{3,2}(x)\, r_{2,2}(x)\, r_{2,2}(x)\\\\ R(x)&=(6\, x^2 + 6\, x + 1)^2\, (2\, x^2 + 4\, x + 1)^2 \end{align*}

Now, with this rook polynomial, the number of ways of derangements can be found in two ways:

\begin{equation*} \displaystyle \int_0^\infty \, x^n\, R\left(-\dfrac{1}{x}\right)\, e^{-x}\, dx \end{equation*}

where \(n\) is the number of bins.

Another way is to expand \(R(x)\) and replace each \(x^i\) with \(i\cdot (n-i)!\)

The answer divided by \(n!\) gives the probability.

Both ways are described in the following Sage code:

var('k')
def rp(m,n): return sum(binomial(SR(m),SR(k))*factorial(n)/factorial(n-k)*x^k,k,0,n)
summ = 0
balls = range(1,6)*2
bins = ([1,2,3,4]*30)[:len(balls)]
stbin = set(bins)
nums = [bins.count(s) for s in stbin]
rook(x) = expand(prod([rp(bins.count(s),balls.count(s)) for s in stbin]))
nn = sum(nums)
for i in range(nn+1):
    summ += (-1)^i*rook(x).coefficient(x,i)*factorial(nn-i)
print summ/factorial(nn),"=",integrate(x^nn*rook(-1/x)*e^(-x),x,0,oo)/factorial(nn),"~",N(summ/factorial(nn)),summ

283/2520 = 283/2520 ~ 0.112301587301587 407520

which agrees with a simulation:

1
2
3
4
lst=.10$1 2 3 4
a=.2#1+i.5
sim=: 3 : '0=+/((10?10){a)=lst'
(+/%#)(sim"0)1000000#0

which is about \(0.112101\)

If we now turn our attention to the classic derangement problem, e.g. 10 bins and balls, each numbered 1 to 10, we change the variables accordingly:

balls = range(1,11)
bins = balls

We see that the summ is indeed \(1334961\), which is the number of derangements, \(D(10)\).

References:

1. Stackexchange problem

2. Notes on rook polynomial

Monte-Carlo simulations in J

We will see some more problems on probability, and how to do it in J.

1 Derangement problem

There are 30 boxes numbered 1 to 30, and 30 balls numbered 1 to 30. If the balls are put into the boxes (one in each) randomly, what's the probability that none of the balls are put in the boxes with matching number?

By simulation:

sim=: 3 : '0=+/(i.30)=30?30'
(+/%#)(sim"0)100000#0

The analytical answer:

+/((30$(1 _1))*(%!i.30))

2 Urns, balls and a gamble.

A game is played with the following rules: There is an urn with 20 balls, 10 red and 10 white. You need pick 10 balls out of these 20.

  1. If 10 balls are of the same color, you get $300
  2. If 9 balls are of the same color, you get $30
  3. If 8 balls are of the same color, you get $3
  4. If 7 balls are of the same color, you get $2
  5. If 6 balls are of the same color, you get $1
  6. If 5 balls are of the same color, you lose $5

What's your expected amount?

Simulation answer:

1
2
3
a=: 10#0 1
sim=: 3 : '((_5 * 5 = ]) + ([: +/ 4 6 =/ ]) + (2 * [: +/ 3 7 =/ ]) + (3 * [: +/ 2 8 =/ ]) + (30 * [: +/ 1 9 =/ ]) + 300 * [: +/ 10 0 =/ ])+/(10?20){a'
(+/%#)(sim"0)1000000#0  NB. = _0.826702

Analytical answer (hypergeometric distribution):

\(A=\{1, 2, 3, 30, 300, -5\}\)

\begin{equation*} \displaystyle \sum_{i=0}^{4}\dfrac{A_i\cdot 2\, \dbinom{10}{i}\dbinom{10}{10-i}}{\dbinom{20}{10}}+\dfrac{A_5\cdot \dbinom{10}{5}^2}{\dbinom{20}{10}} =-\dfrac{76485}{92378}=-0.827956872848514 \end{equation*}
_5 1 2 3 30 300)*(((5!10)^2), (2 * (10 !~ ]) * 10 !~ 10 - ]) 6+i.5)%10!20

3 Birthday problem

How many people should be in a room such that the probability of at least two of them sharing the same birthday is more than 0.5? (Assume 365 days in an year)

By Simulation:

sim=: 3 : '23>+/~:?23#365'
(+/%#)(sim"0)100000#0

Analytically:

1-*/((365-i.23)%(365)) NB. = 0.507297234323985

To see a plot of the probabilities, up to 100 people in a room:

load'plot'
'marker' plot (1+i.100);((1 - [: */ 365 %~ 365 - i.)"0) 1+i.100

4 4 dice

Four dice are rolled, what's the probability that no two faces are repeated? (E.g. 6 4 2 5 is okay, but 3 6 5 6 is forbidden)

By simulation:

sim=: 3 : '(4 = [: +/ [: +/ =/~)?4#6'

By permutation:

\begin{equation*} \displaystyle \mathbb{P}=\dfrac{\dbinom{6}{4}\cdot 4!}{6^4} = \dfrac{5}{18} = 0.2777777 \end{equation*}

Putting that in J console:

((4!6)*!4)%6^4

By counting:

(+/4=+/"1~:"1(6 6 6 6#:i.1296))%6^4 NB. this uses base-6 representation till 6^4

5 The ballot problem

In an election, candidate A receives n votes, and candidate B receives m votes where \(n > m\). Assuming that all orderings are equally likely, show that the probability that A is always ahead in the count of votes is \(\dfrac{n - m}{n + m}\).

Here's a simulation to see that it may be true:

1
2
3
4
5
n=:55
m=:45
a=:(m#_1),n#1
sim=: 3 : '100=+/0<+/\(100?100){a'
(+/%#)(sim"0)100000#0 NB. = 0.0993

which is close to 0.1 computed from the formula.

6 An ace from a regular deck of 52 cards

A deck of cards is well shuffled, and placed face down on a table. The cards are picked one after another, what's the probability that you get to see the first ace when \(k^{th}\) card is picked?

Finding the answer is quite easy, which is

\begin{equation*} \displaystyle \mathbb{P}(k)=\dfrac{\dbinom{48}{k-1}}{\dbinom{52}{k-1}}\cdot \dfrac{4}{52-k} \end{equation*}

and a simulation can back up our claim, e.g. for \(k=5\):

1
2
3
4
pos=: 5
a=:(4#1),48#0
sim=: 3 : 'pos=1+{.I.(52?52){a'
(+/%#)(sim"0)100000#0

and the probabilities for first five positions:

(((48 !~ 1 -~ ]) % 52 !~ 1 -~ ]) * 4 % 52 - 1 -~ ]) 1+i.5