The Classic Coupon Collector's Problem
0. Introduction
At the moment a large German grocery store hands out 35 collector cards with the faces of the German national football team including staff.
For every 10 Euro worth of groceries you receive two separate packaged cards. The cards you receive are random, so there is no opportunity to specifically choose cards that are still missing in the collection.
All collected cards are then stored in a corresponding booklet.
A classic question in probability theory asks how many cards you need to receive (or in other words, money you need to spend) on average in order to obtain all cards and hence complete the collection. In the following, I will extensively discuss that question and draw mainly on [1] and [2].
1. Preliminaries
1.1 Assumptions and Notation
In order to make the model mathematically tractable, we need to introduce assumptions on the problem structure. The assumptions (a) and (b) are necessary as otherwise the mathematics become way more complicated.
(a). We assume that there is no trading of duplicates possible.
(b). The probability of being drawn is equally likely for all cards. Also, all drawings are independent of one another.
(c’).
(d).
(e). Define
1.2 An auxiliary lemma
The general approach to solving the problem follows [2]:
We want to count the number of instances in which there is at least one card not drawn yet by converting it to a sum of events for which there are exactly
Proposition A: Inclusion-Exclusion Principle.
For a probability space
Proof.
The following proof is the one I wrote up in the course of solving Georgii’s chapter one exercises and treats the more general claim
for
Following [1, p. 26] we first show that for all
This identity follows at once as soon as we can establish the identity
For each element
A more straightforward argument shows the disjointness directly. Let
Next, we show the equality of the left- and right-hand side.
Let
Finally, we prove the general equality by inserting the identity that we just proved and replace
3. Solving the problem theoretically
3.1 The exact probability
Consider the event
Next, we use Proposition A (Inclusion-Exclusion-principle) to transform the union of events into a sum of intersection of events.
Now observe that
For some element
Because of the assumed independence of the draws, the formula further simplifies to
Reinserting this in formula (3) gives
Since there are
To obtain the final formula, first note that
and
Hence, the final result is
3.2 The expected value of draws
From basic calculus it is known that for
Differentiating both sides with respect to
This fact together with proposition A.2 then allows us the derive the expected value as follows.
The last step follows from the identity
Finally, we can answer the initial question. For
cards to complete the album. This makes
4. Solving the problem computationally in Python
The following Python code simulates repeated drawing of cards and hence delivers an empirical solution of the problem.
import numpy
import random
i = 0
All_lengths = []
random.seed(a=12345, version=2)
while i < 10000:
All_numbers = []
LEN = len(numpy.unique(All_numbers))
while LEN < 35:
A = random.randint(1, 35)
All_numbers.append(A)
LEN = len(numpy.unique(All_numbers))
All_lengths.append(len(All_numbers))
i = i + 1
print(sum(All_lengths)/len(All_lengths))
Running the above code on my machine gives a value of 144.9484 in a couple of seconds which is already pretty impressively close to the true value.
5. Epilogue
There are other approaches to the coupon collector problem. For instance,
(i). the resulting distribution can be shown to be a convolution of geometric distributions (cf. [2]).
(ii). the derivation based on Stirling numbers of the second kind
Finally, the mathematics get a lot more difficult when giving up assumption (b), that is, the uniformness assumption. If there are some player cards that are rarer (e.g. for the most popular players), then there is no known closed-form solution that is built-up of elementary functions.
References
[1] H.-O. Georgii (2014): Stochastik - Eine Einführung. Spektrum.
[2] Peter Ruckdeschel (2011): Das Coupon-Collector Problem und warum und in welcher Hinsicht das letzte fehlende Panini-Bild am “teuersten” ist.