binaryfuse
I built binaryfuse to bring the efficiency of Binary Fuse Filters (based on Lemire et al.) to Python.
If you’re working with large, immutable datasets (like blocklists or static dictionaries), Bloom filters are often the default choice. However, Binary Fuse Filters offer a significant upgrade: they are smaller (~1.13x overhead vs ~1.44x for Bloom) and faster for lookups.
The library is implemented in Rust to ensure memory safety and performance, but I ship abi3 wheels so you can install it on any modern Python version (3.8+) without needing a Rust toolchain.
How it Works
Unlike Bloom filters which set bits in a single array, Binary Fuse Filters use a fused structure of three segments. Keys are mapped to a location in each segment, and their values are XORed to store a fingerprint. To check if a key exists, we compute the same hash locations and verifying if the XOR sum matches the fingerprint.
flowchart LR
K[Key] --> H1[Hash 1]
K --> H2[Hash 2]
K --> H3[Hash 3]
subgraph Segments
direction TB
S1[Segment 0]
S2[Segment 1]
S3[Segment 2]
end
H1 --> S1
H2 --> S2
H3 --> S3
S1 & S2 & S3 -->|XOR| R[Result]
R -->|Match Fingerprint?| D{In Set?}
Here’s the logic expressed in simplified Python:
def fingerprint(key):
# Reduce key to an 8-bit or 16-bit value
return hash(key) & 0xFF
def contains(key, segments):
# 1. Map key to three segments
h1, h2, h3 = hash_to_locations(key)
# 2. Retrieve values from the fused structure
val1 = segments[0][h1]
val2 = segments[1][h2]
val3 = segments[2][h3]
# 3. Combine them
computed_fingerprint = val1 ^ val2 ^ val3
# 4. Check against key's fingerprint
return computed_fingerprint == fingerprint(key)
Performance
The main trade-off is immutability: you generally construct the filter once and query it many times. In exchange, you get better space efficiency and cache-friendly lookups.
| Metric | Binary Fuse 8-bit | Bloom Filter |
|---|---|---|
| Bits per key | ~9 | ~10-14 (depending on FPR) |
| False Positive Rate | ~0.4% | ~1% (at 10 bits) |
| Lookup Speed | ~20ns | ~50+ns |
usage
import binaryfuse
# Create once from a list of unique strings
data = ["apple", "banana", "cherry"]
filter = binaryfuse.BinaryFuse8(data)
# Fast lookups
"apple" in filter # True
"grape" in filter # False