A hash function defines the mapping relationship of (key, value) pairs. Good hash functions should be very fast to compute (easy for hardware implementation), and should minimize duplication of output values (less collisions).
How to Implement a Hash Function?
One common approach to implement a hash function, is to use modulo operations:
value = mod(key, d) = key % d
To minimize collision, d is often a prime number. If d is in the form of (2n – 1), we call it Mersenne prime, and it is quite friendly for hardware implementation. For example, when key = 272, n = 2:
value = ‘d272 % ‘d3 = ‘b100010000 % ‘b11
The lower 2-bit of ‘d272 is part of the remainder; The upper 7-bit of ‘d272 represents how many ‘d4 this number has, and each ‘d4 contributes 1 to the remainder, thus:
value = (‘b1000100 + ‘b00) % ‘b11 = ‘b1000100 % ‘b11
Continue this process:
value = (‘b10001 + ‘b00) % ‘b11 = ‘b10001 % ‘b11
= (‘b100 + ‘b01) % ‘b11 = ‘b101 % ‘b11
= (‘b1 + ‘b01) % ‘b11 = ‘b10 % ‘b11
= ‘b10 = ‘d2
The proof of this process is shown below:
value = key % (2n – 1) = ((key / 2n) + (key % 2n)) % (2n – 1)
And,
(key % 2n) = key & (2n – 1)
In terms of RTL coding, it can be easily implemented using recursion or iteration.
Hash CAM
RTL designers often use Content Addressable Memories, or CAMs, as tag RAMs in cache. CAMs compare the input address tag against the stored data, and returns the address of matching data. The address indicates the matching cache sets and ways.
One can use hash functions to index tag RAMs in cache. Instead of CAM look up, input address serves as key, and the hash function output value serves as tag RAM index.
The hash function should be carefully evaluated based on the cache entry number. For example, in a 16K-line cache, one can use mod 32767 (214 – 1), mod 16383 (213 – 1), or mod 8191 (211 – 1).
Bloom filter is an efficient data structure for performing approximate set membership queries. Its main characteristics include:
- No false negative, i.e., if Bloom filter tells you something is not in the set, then it is definitely not in the set
- Possible false positive, i.e., if Bloom filter tells you something is in the set, then it may or may not be in the set
- Unlike a standard hash table, a Bloom filter of a fixed size can represent a set with an arbitrarily large number of elements, i.e., adding an element never fails due to the data structure “fill up”, however, you pay the price of increasing false positive rate
The following diagram shows a Bloom filter example:

There are 3 elements in the set, {x, y, z} (n = 3). Each element insertion invokes 3 different hash functions (k = 3), and hash function outputs serve as indices to the 18-bit array (m = 18), where the corresponding bit is set to 1.
When performing a query to check if element w is in the set, the same hash functions are invoked. Since it hashes to an array position containing 0, we can conclude w is not in the set {x, y, z}.
It is easy to observe that:
- The more elements that are added to the set, the larger chances of false positives
- The more hash functions you have, the slower the Bloom filter performs, and the quicker it fills up; However, if you have too few hash functions, you will suffer from too many false positives
Given parameters n, m, k, the false positive probability is approximated as:
(1 – e-kn/m)k
And the optimal number of hash functions:
k = (m / n) ln2
Bloom filters are used in wide varieties of applications. For example, snoop bypass filters in a cache-coherent multi-socket system.
In a cache-coherent multi-socket system, a snoop request can be sent to one or more remote cache devices to determine if any other cache has a copy of the content, and a link between the cache and the remote cache devices can include a snoop bypass filter. The snoop bypass filter monitors data cached by the remote cache devices on a cache line basis or a coarser granularity. The snoop bypass filter can respond to the snoop request early if it knows the data does not exist in the remote cache devices. The snoop bypass filter reduces snoop response latency, and in turn improves the system performance. It is easy to observe that Bloom filters are perfect candidates to implement the snoop bypass filter.
Since you cannot “evict” an element from the Bloom filter, RTL designers often keep multiple copies of the Bloom filter bit arrays, which refresh / reset at alternating cycles.
References:

Leave a comment