Notes
2023-07-14
It is folklore that, under some conditions,
open-addressed hash tables with quadratic
probing do not miss free slots. This note
explains why with a proof.
Let me break things down a little. With hash
tables, the prime problem to solve when
implementing insertion is conflicts: You are
handed a key value pair (K₁,V₁) and, after
hasing the key K₁, you realize that the table
slot designated by the hash value is already
filled with a binding (K₂,V₂). Chaining
implementations will just cram the binding
in the slot anyway, hence creating a bucket
with more than one binding in a single
table slot. Open-addressing implementations
will instead go look for another slot.
One obvious strategy for searching is to
go look at the next slot, and the next one,
and so on, potentially wrapping around when the
last slot of the table is visited.
Quadratic probing offers an alternative scheme:
instead of scanning slot h+i mod N
at the
i-th step, it scans the slot h+P(i) mod N
where P
is a polynomial function of degree 2.
This strategy helps with the clustering problem
that linear probing may create.
The polynomial P
of interest to us here is
P(X) = (X + X²)/2
.
The polynomial P
outputs so-called
triangular numbers (they can be stacked in a
triangle shape), and the sequence P(0)
, P(1)
,
P(2)
, ... is easily computed:
p_i = 0;
for (i=0; i<N; i++) {
assert(p_i == (i + i*i)/2);
p_i += i;
}
But, more importantly, if the table size is a
power of 2, and there is a single free slot,
quadratic probing will find it. It is not an
obvious fact and I prove it below.
Formally, we want to show that as i
ranges
over the interval I=[0,2ⁿ)
, h+P(i) mod 2ⁿ
ranges over I
as well. Said differently, we
want to show that the function mapping x
to
h+P(x) mod 2ⁿ
is a bijection over I
.
Since I
is finite, it is sufficent to show
that the map is injective; that is, it maps
different values of I
to different values.
To prove injectivity we consider i
and j
in I
such that h+P(i) mod 2ⁿ
= h+P(j) mod 2ⁿ
and show
that i
must be equal to j
. We first reason by
equivalence:
(h + P(i) mod 2ⁿ) = (h + P(j) mod 2ⁿ)
⟺ (h + (i + i²)/2) - (h + (j + j²)/2) = k2ⁿ
⟺ (i + i²) - (j + j²) = k2ⁿ⁺¹
⟺ (i - j)(1 + i + j) = k2ⁿ⁺¹
Breaking the products on both sides we get that
there must be three integers k₁
, k₂
, and m
such that:
A := i - j = k₁2ᵐ with k = k₁k₂
B := 1 + i + j = k₂2ⁿ⁺¹⁻ᵐ and 0 ≤ m ≤ n + 1
We now consider three cases:
- If
m
is not 0
nor n+1
then it must be that
both A
and B
are even. That is however impossible
because B = 1 + 2j + A
, so when A
is even, B
is
odd, and vice versa.
- If
m
is n+1
, we must then have i - j = k₁2ⁿ⁺¹
,
but since i
and j
are in the interval I
, i - j
is in (-2ⁿ,2ⁿ)
, so k₁
can only be zero and thus
i
equals j
.
- If
m
is 0
, we again use a size argument. Since i
and j
are in I
, we have that 1 + i + j
is in
(0,2ⁿ⁺¹-1)
. So k₂
must be positive and less
than 1; these requirements prove that the present
case is impossible.
The only possible outcome for the case analysis
is thus that i
equals j
. Unwinding everything, the
map h+P(x) mod 2ⁿ
is injective on I
, so it is
also bijective on I
, so when i
ranges from 0
to 2ⁿ-1
, the map ranges over all of I
, and thus
quadratic probing indeed vists all slots!