Part I: Bloom Filters

7 min readMay 13, 2022

Introduction

As bitcoin adoption grows, so has the growth of light clients, i.e. nodes on the network incapable of independent transaction verification. To ensure that these clients can verify transactions related to them, the concept of bloom filters was proposed for usage in what is known as simple payment verification.

The introduction and usage of bloom filters made it possible for light clients to verify transactions in interested blocks in ways that, to a certain degree, preserved their privacy. However, this “privacy-preserving” method of payment verification is not without leaks, requiring trust in honest peers (full nodes) to forward Merkle blocks containing the desired transactions, for work they (full nodes) have no incentive to do.

For all the benefits they provide, the problems with bloom filters initiated the drive towards client-side transaction filtering based on trustless peer connections, and with no privacy leaks. In this article, I provide a basic overview of how light clients utilize bloom filters for payment verification, some of the challenges associated with their use, and why there was a need for a different way to conduct transaction verification.

Light Client Transaction Verification

Not all devices have the “luxury” of performing independent verification of transactions, which can only be done with access to the entire bitcoin ledger. This is primarily because of device-specific constraints on space, bandwidth, and processing power. These constrained devices are, in truth, only interested in the transactions related to the addresses in their wallets and should not be downloading multiple GBs of unrelated data. Also, with the growth of bitcoin, it has become increasingly more expensive to run a full node. Beyond just downloading the blockchain, nodes must remain connected to different peers on the network all the time. The memory, processing, and bandwidth requirements are quite “steep” and unreasonable for all nodes on the network to have.

Figure 1: A block of transactions with a transaction of interest for a light client

For devices like mobile phones and tablets, it is possible to partake in the network by running lightweight clients — nodes that do not have access to the full blockchain but can conduct verification of payments by depending on full nodes, and their access to the complete ledger. To verify that a transaction of interest, i.e. paying bitcoins to address(es) on their wallet, is valid, the light clients create and send bloom filters to full nodes, who, in turn, forward just about enough block information (in a Merkle block) for the light clients to recalculate the Merkle root in the block header. The light clients check a transaction’s proof of inclusion if the Merkle roots match. This is the only way a light client can be sure that the transaction was included in this block.

The lack of access to the full blockchain translates to light clients delegating double-spend detection to full nodes and miners. Light clients are not exactly susceptible to double-spend attacks because they outsource the responsibility of detecting transactions spending the same UTXOs to miners. As long as honest miners have >51% of the hashing power, there will be no double-spends, as enforced by the miners.

Full nodes can hide transactions from them, thereby denying them service. With the hope of peering with an honest full node, light clients randomly connect to several nodes on the network to combat the aforementioned attack. This makes them vulnerable to Sybil attacks that encircle them with fake nodes that disconnect them from the real network, stalling the light clients. This attack cannot steal funds but will impact usability.

Light clients conduct transaction verification by using tune-able bloom filters so that they do not leak information about the addresses in their wallets.

Bloom Filters:

Light clients cannot share their addresses/script_pubkeys to full nodes. Well, they can, but should not because of the loss of privacy in doing so. What they can do is create a superset of transactions they might be interested in. This is done using bloom filters — a probabilistic search filter. These filters are n-sized bit field created by passing transactions through a set number of hash functions, noting the output number q (between 1 and n) of each hash function, and flicking the bit at position q on (0 to 1).

This is how light clients use bloom filters. They:

create a list of script hashes, and transaction IDs from any UTXO its wallet can spend: For any transaction paying bitcoins to addresses in the light client’s wallet, the light client creates a list containing the transaction’s ID, the pub keys, and hashes in the transaction output’s script_pubkey, each input in the transaction input, and signatures in the witness scripts (or script_sig). See the figure below.

Figure 2: Creating list of transaction objects

add each list item to the bloom filter: Each item in the list is passed through the set number of hash functions to the bloom filter, as shown below.

Figure 3: Adding transaction objects to bloom filter

send bloom filter to peers: The bloom filter is then sent to peers on the network. Full nodes match each incoming transaction to the bloom filter, sending a Merkle block to the light client only when there is a match.
verify transaction: With the Merkle block, the light client checks for matched transactions, and updates its view of the UTXO set, wallet balance, and the bloom filter to match future transactions that will reference the found UTXO.

The way bloom filters work seems pretty straightforward. Light clients get to, to some degree, protect their privacy, and keep bandwidth and space requirements at a minimum. These benefits are not without their costs though. The risk of Sybil attacks remain for the light client. Full nodes also bear the brunt of the work when they match each incoming transaction to the bloom filters they received. It is work done without reward that also exposes them to denial of service attacks if they get spammed with fake bloom filters. Having one bloom filter per connected client does not scale.

For all the benefits of bloom filters, it appears most light clients rely on data from wallet vendors for transaction verification (Song, 2019). The drawback to using bloom filters heralded a new way for light clients to perform client-side transaction filtering and verification. This way has to provide the following benefits:

Privacy: Bloom filters are tune-able and you can achieve varying degrees of privacy by how they are constructed. The larger the size of the bitfield, and the number of hash functions, the greater the accuracy and specificity of the filter, and the worse the privacy score. With a lower specificity, the better the privacy score. Light clients should be able to request for block headers without worrying about how much privacy is lost with the specificity of their bloom filters. The new way must ensure that light clients can request for block data without compromising on their privacy.
Trust: Light clients need to connect with at least one honest peer that would neither deny them service by hiding transactions nor send them double-spent transactions. The new way should encourage bitcoin’s philosophy of trustlessness. Light clients should request and get filter information for transactions they want to verify without having to trust any one honest peer.
Work: Full nodes are in a state of perpetual work, scanning and re-scanning bloom filters for every incoming transaction they receive. Apart from providing a service that keeps the bitcoin network active for growing light clients, there is no reward for server-side transaction filtering. The risks of DOS attacks also need consideration. If a new solution to the transaction verification problem for light clients reduces the work of scanning to one-time computation and eliminates the attack vector, then that should be explored.
Size: The size of the new filter should be lightweight. Small enough to compute and save to disk, and to relay along the network.

The proposal and implementation of compact block filters provided these benefits. In the follow-up article to this, I will explore compact block filters and how it is a better alternative to bloom filters.

Conclusion

As bitcoin adoption grows, so has the growth of light clients. These clients, without access to the full ledger, have particular requirements concerning transaction verification for addresses in their wallets. They rely on a payment verification method that creates and sends bloom filters to full node peers. These filters are tune-able, giving light clients privacy control, but makes them vulnerable to Sybil attacks. These filters also place a disproportionate amount of work on full nodes. To counter the disadvantages, a new kind of filtering system called compact block filters was proposed and will be the topic of discussion in my article.

Contact me

I would deeply appreciate any feedback you can provide. If you found this article helpful/useful or found factual misrepresentation, please do not hesitate to comment or contact me here or on Twitter @engb_os.

References

Antonopoulos, A. (2017). Mastering Bitcoin: Programming the open blockchain
Song, J. (2019). Programming bitcoin: Learn how to program bitcoin from scratch
Rosenbaum, K. (2019). Grokking Bitcoin
Mouton, E (2021): Compact Block Filters Deep Dive (BIP 158). https://bitcoin-dev.blog/blog/bip158-deep-dive/. Accessed 2 May 2022.