Cuckoo Filter

https://en.wikipedia.org/wiki/Cuckoo_filter

From Wikipedia, the free encyclopedia

A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". A cuckoo filter can also delete existing items, which is not supported by Bloom filters. In addition, for applications that store many items and target moderately low false positive rates, cuckoo filters can achieve lower space overhead than space-optimized Bloom filters.[1]

Cuckoo filters were first described in 2014.[2]

Algorithm description

[edit]

A cuckoo filter uses a hash table based on cuckoo hashing to store the fingerprints of items.[2] The data structure is broken into buckets of some size {\displaystyle b}. To insert the fingerprint of an item {\displaystyle x}, one first computes two potential buckets {\displaystyle h_{1}(x)} and {\displaystyle h_{2}(x)} where {\displaystyle x} could go. These buckets are calculated using the formula

{\displaystyle h_{1}(x)={\text{hash}}(x)}
{\displaystyle h_{2}(x)=h_{1}(x)\oplus {\text{hash}}({\text{fingerprint}}(x))}

Note that, due to the symmetry of the XOR operation, one can compute {\displaystyle h_{2}(x)} from {\displaystyle h_{1}(x)}, and {\displaystyle h_{1}(x)} from {\displaystyle h_{2}(x)}. As defined above, {\displaystyle h_{2}(x)=h_{1}(x)\oplus {\text{hash}}({\text{fingerprint}}(x))}; it follows that {\displaystyle h_{1}(x)=h_{2}(x)\oplus {\text{hash}}({\text{fingerprint}}(x))}. These properties are what make it possible to store the fingerprints with cuckoo hashing.

The fingerprint of {\displaystyle x} is placed into one of buckets {\displaystyle h_{1}(x)} and {\displaystyle h_{2}(x)}. If the buckets are full, then one of the fingerprints in the bucket is evicted using cuckoo hashing, and placed into the other bucket where it can go. If that bucket, in turn, is also full, then that may trigger another eviction, etc.

The hash table can achieve both high utilization (thanks to cuckoo hashing), and compactness because only fingerprints are stored. Lookup and delete operations of a cuckoo filter are straightforward.[2]

There are a maximum of two buckets to check by {\displaystyle h_{1}(x)} and {\displaystyle h_{2}(x)}. If found, the appropriate lookup or delete operation can be performed in {\displaystyle O(b)} time. Often, in practice, {\displaystyle b} is a constant.

In order for the hash table to offer theoretical guarantees, the fingerprint size {\displaystyle f} must be at least {\displaystyle \Omega ((\log n)/b)} bits.[2][3][4] Subject to this constraint, cuckoo filters guarantee a false-positive rate of at most {\displaystyle \epsilon \leq b/2^{f-1}}.[2]

Comparison to Bloom filters

[edit]

A cuckoo filter is similar to a Bloom filter in that they both are fast and compact, and they may both return false positives as answers to set-membership queries:

  1. ^ Michael D. Mitzenmacher. "Bloom Filters, Cuckoo Hashing, Cuckoo Filters, Adaptive Cuckoo Filters, and Learned Bloom Filters".
  2. ^ a b c d e f g Fan, Bin; Andersen, Dave G.; Kaminsky, Michael; Mitzenmacher, Michael D. (2014). Cuckoo filter: Practically better than Bloom. Proc. 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT '14). Sydney, Australia. pp. 75–88. doi:10.1145/2674005.2674994. ISBN 9781450332798.
  3. ^ Eppstein, David (22 June 2016). Cuckoo filter: Simplification and analysis. Proc. 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs). Vol. 53. Reykjavik, Iceland. pp. 8:1–8:12. arXiv:1604.06067. doi:10.4230/LIPIcs.SWAT.2016.8.
  4. ^ Fleming, Noah (17 May 2018). Cuckoo Hashing and Cuckoo Filters (PDF) (Technical report). University of Toronto.
  5. ^ Pagh, Rasmus; Rodler, Flemming Friche (2001). "Cuckoo hashing". Proc. 9th Annual European Symposium on Algorithms (ESA 2001). Lecture Notes in Computer Science. Vol. 2161. Århus, Denmark. pp. 121–133. doi:10.1007/3-540-44676-1_10. ISBN 978-3-540-42493-2.
{
"by": "thunderbong",
"descendants": 1,
"id": 40248838,
"kids": [
40248972
],
"score": 2,
"time": 1714750348,
"title": "Cuckoo Filter",
"type": "story",
"url": "https://en.wikipedia.org/wiki/Cuckoo_filter"
}
{
"author": "Contributors to Wikimedia projects",
"date": "2024-07-28T22:05:27.000Z",
"description": null,
"image": "https://wikimedia.org/api/rest_v1/media/math/render/svg/f11423fbb2e967f986e36804a8ae4271734917c3",
"logo": "https://logo.clearbit.com/wikipedia.org",
"publisher": "Wikipedia",
"title": "Cuckoo filter - Wikipedia",
"url": "https://en.wikipedia.org/wiki/Cuckoo_filter"
}
{
"url": "https://en.wikipedia.org/wiki/Cuckoo_filter",
"title": "Cuckoo filter - Wikipedia",
"description": "From Wikipedia, the free encyclopedia \t\t\t\t\t A cuckoo filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set, like a Bloom filter does. False...",
"links": [
"https://en.wikipedia.org/wiki/Cuckoo_filter"
],
"image": "",
"content": "<div>\n\t\t\t\t\t\t<p>From Wikipedia, the free encyclopedia</p>\n\t\t\t\t\t</div><div>\n<p>A <b>cuckoo filter</b> is a space-efficient <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Probabilistic\" title=\"Probabilistic\">probabilistic</a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Data_structure\" title=\"Data structure\">data structure</a> that is used to test whether an <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Element_(mathematics)\" title=\"Element (mathematics)\">element</a> is a member of a <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Set_(computer_science)\" title=\"Set (computer science)\">set</a>, like a <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Bloom_filter\" title=\"Bloom filter\">Bloom filter</a> does. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Type_I_and_type_II_errors\" title=\"Type I and type II errors\">False positive</a> matches are possible, but <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Type_I_and_type_II_errors\" title=\"Type I and type II errors\">false negatives</a> are not – in other words, a query returns either \"possibly in set\" or \"definitely not in set\". A cuckoo filter can also delete existing items, which is\nnot supported by Bloom filters.\nIn addition, for applications that store many items and\ntarget moderately low false positive rates, cuckoo filters can achieve\nlower space overhead than space-optimized Bloom filters.<sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-1\"><span>[</span>1<span>]</span></a></sup>\n</p><p>Cuckoo filters were first described in 2014.<sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-CuckooFilter-2\"><span>[</span>2<span>]</span></a></sup>\n</p>\n<div><h2 id=\"Algorithm_description\">Algorithm description</h2><p><span><span>[</span><a target=\"_blank\" href=\"https://en.wikipedia.org/w/index.php?title=Cuckoo_filter&amp;action=edit&amp;section=1\" title=\"Edit section: Algorithm description\"><span>edit</span></a><span>]</span></span></p></div>\n<p>A cuckoo filter uses a hash table based on <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_hashing\" title=\"Cuckoo hashing\">cuckoo hashing</a> to store the <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Fingerprint_(computing)\" title=\"Fingerprint (computing)\">fingerprints</a> of items.<sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-CuckooFilter-2\"><span>[</span>2<span>]</span></a></sup> The data structure is broken into buckets of some size <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/f11423fbb2e967f986e36804a8ae4271734917c3\" alt=\"{\\displaystyle b}\" /></span>. To insert the fingerprint of an item <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/87f9e315fd7e2ba406057a97300593c4802b53e4\" alt=\"{\\displaystyle x}\" /></span>, one first computes two potential buckets <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5a54192749baa8ca6069d3c58b30722fd7eae55d\" alt=\"{\\displaystyle h_{1}(x)}\" /></span> and <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/d4960c74f885bc90838e4ed3abc4ae97ae838948\" alt=\"{\\displaystyle h_{2}(x)}\" /></span> where <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/87f9e315fd7e2ba406057a97300593c4802b53e4\" alt=\"{\\displaystyle x}\" /></span> could go. These buckets are calculated using the formula\n</p>\n<dl><dd><span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/e51e461ab35b3d70a776adf7d1e8cf7d3558ce3f\" alt=\"{\\displaystyle h_{1}(x)={\\text{hash}}(x)}\" /></span></dd>\n<dd><span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5966395dc87893ec398a67e9c4f06001e878d970\" alt=\"{\\displaystyle h_{2}(x)=h_{1}(x)\\oplus {\\text{hash}}({\\text{fingerprint}}(x))}\" /></span></dd></dl>\n<p>Note that, due to the symmetry of the <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/XOR\" title=\"XOR\">XOR</a> operation, one can compute <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/d4960c74f885bc90838e4ed3abc4ae97ae838948\" alt=\"{\\displaystyle h_{2}(x)}\" /></span> from <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5a54192749baa8ca6069d3c58b30722fd7eae55d\" alt=\"{\\displaystyle h_{1}(x)}\" /></span>, and <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5a54192749baa8ca6069d3c58b30722fd7eae55d\" alt=\"{\\displaystyle h_{1}(x)}\" /></span> from <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/d4960c74f885bc90838e4ed3abc4ae97ae838948\" alt=\"{\\displaystyle h_{2}(x)}\" /></span>. As defined above, <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5966395dc87893ec398a67e9c4f06001e878d970\" alt=\"{\\displaystyle h_{2}(x)=h_{1}(x)\\oplus {\\text{hash}}({\\text{fingerprint}}(x))}\" /></span>; it follows that <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/cb7df14fb242d4a2852e3354a6248babeeee8084\" alt=\"{\\displaystyle h_{1}(x)=h_{2}(x)\\oplus {\\text{hash}}({\\text{fingerprint}}(x))}\" /></span>. These properties are what make it possible to store the fingerprints with cuckoo hashing.\n</p><p>The fingerprint of <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/87f9e315fd7e2ba406057a97300593c4802b53e4\" alt=\"{\\displaystyle x}\" /></span> is placed into one of buckets <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5a54192749baa8ca6069d3c58b30722fd7eae55d\" alt=\"{\\displaystyle h_{1}(x)}\" /></span> and <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/d4960c74f885bc90838e4ed3abc4ae97ae838948\" alt=\"{\\displaystyle h_{2}(x)}\" /></span>. If the buckets are full, then one of the fingerprints in the bucket is evicted using cuckoo hashing, and placed into the other bucket where it can go. If that bucket, in turn, is also full, then that may trigger another eviction, etc.\n</p><p>The hash table can achieve both high utilization (thanks to <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_hashing\" title=\"Cuckoo hashing\">cuckoo hashing</a>), and compactness because only fingerprints are stored. Lookup and delete operations of a cuckoo filter are straightforward.<sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-CuckooFilter-2\"><span>[</span>2<span>]</span></a></sup> \n</p><p>There are a maximum of two buckets to check by <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/5a54192749baa8ca6069d3c58b30722fd7eae55d\" alt=\"{\\displaystyle h_{1}(x)}\" /></span> and <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/d4960c74f885bc90838e4ed3abc4ae97ae838948\" alt=\"{\\displaystyle h_{2}(x)}\" /></span>. If found, the appropriate lookup or delete operation can be performed in <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/617e0fcc33be0730c40d7c458cd4951c60b4d66a\" alt=\"{\\displaystyle O(b)}\" /></span> time. Often, in practice, <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/f11423fbb2e967f986e36804a8ae4271734917c3\" alt=\"{\\displaystyle b}\" /></span> is a constant.\n</p><p>In order for the hash table to offer theoretical guarantees, the fingerprint size <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/132e57acb643253e7810ee9702d9581f159a1c61\" alt=\"{\\displaystyle f}\" /></span> must be at least <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/94905321b7398823add475e1562d88c869bea128\" alt=\"{\\displaystyle \\Omega ((\\log n)/b)}\" /></span> bits.<sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-CuckooFilter-2\"><span>[</span>2<span>]</span></a></sup><sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-3\"><span>[</span>3<span>]</span></a></sup><sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-4\"><span>[</span>4<span>]</span></a></sup> Subject to this constraint, cuckoo filters guarantee a false-positive rate of at most <span><img src=\"https://wikimedia.org/api/rest_v1/media/math/render/svg/a25f5e35a055e0079a11bf04e85acfee8859aae8\" alt=\"{\\displaystyle \\epsilon \\leq b/2^{f-1}}\" /></span>.<sup><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_note-CuckooFilter-2\"><span>[</span>2<span>]</span></a></sup>\n</p>\n<div><h2 id=\"Comparison_to_Bloom_filters\">Comparison to Bloom filters</h2><p><span><span>[</span><a target=\"_blank\" href=\"https://en.wikipedia.org/w/index.php?title=Cuckoo_filter&amp;action=edit&amp;section=2\" title=\"Edit section: Comparison to Bloom filters\"><span>edit</span></a><span>]</span></span></p></div>\n<p>A cuckoo filter is similar to a <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Bloom_filter\" title=\"Bloom filter\">Bloom filter</a> in that they both are fast and compact, and they may both return false positives as answers to set-membership queries:\n</p>\n<div><ol>\n<li><span><b><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-1\">^</a></b></span> <span>\n<a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Michael_Mitzenmacher\" title=\"Michael Mitzenmacher\">Michael D. Mitzenmacher</a>. <a target=\"_blank\" href=\"https://smartech.gatech.edu/handle/1853/60577\">\"Bloom Filters, Cuckoo Hashing, Cuckoo Filters, Adaptive Cuckoo Filters, and Learned Bloom Filters\"</a>.<span></span></span>\n</li>\n<li><span>^ <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-0\"><sup><i><b>a</b></i></sup></a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-1\"><sup><i><b>b</b></i></sup></a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-2\"><sup><i><b>c</b></i></sup></a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-3\"><sup><i><b>d</b></i></sup></a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-4\"><sup><i><b>e</b></i></sup></a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-5\"><sup><i><b>f</b></i></sup></a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooFilter_2-6\"><sup><i><b>g</b></i></sup></a></span> <span>\nFan, Bin; Andersen, Dave G.; Kaminsky, Michael; <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Michael_Mitzenmacher\" title=\"Michael Mitzenmacher\">Mitzenmacher, Michael D.</a> (2014). <i>Cuckoo filter: Practically better than Bloom</i>. Proc. 10th ACM International on Conference on Emerging Networking Experiments and Technologies (CoNEXT '14). Sydney, Australia. pp. <span>75–</span>88. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Doi_(identifier)\" title=\"Doi (identifier)\">doi</a>:<span><a target=\"_blank\" href=\"https://doi.org/10.1145%2F2674005.2674994\">10.1145/2674005.2674994</a></span>. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/ISBN_(identifier)\" title=\"ISBN (identifier)\">ISBN</a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Special:BookSources/9781450332798\" title=\"Special:BookSources/9781450332798\">9781450332798</a>.<span></span></span>\n</li>\n<li><span><b><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-3\">^</a></b></span> <span>\n<a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/David_Eppstein\" title=\"David Eppstein\">Eppstein, David</a> (22 June 2016). <i>Cuckoo filter: Simplification and analysis</i>. Proc. 15th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2016). Leibniz International Proceedings in Informatics (LIPIcs). Vol. 53. Reykjavik, Iceland. pp. 8:1–8:12. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/ArXiv_(identifier)\" title=\"ArXiv (identifier)\">arXiv</a>:<span><a target=\"_blank\" href=\"https://arxiv.org/abs/1604.06067\">1604.06067</a></span>. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Doi_(identifier)\" title=\"Doi (identifier)\">doi</a>:<span><a target=\"_blank\" href=\"https://doi.org/10.4230%2FLIPIcs.SWAT.2016.8\">10.4230/LIPIcs.SWAT.2016.8</a></span>.<span></span></span>\n</li>\n<li><span><b><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-4\">^</a></b></span> <span>\nFleming, Noah (17 May 2018). <a target=\"_blank\" href=\"http://www.cs.toronto.edu/~noahfleming/CuckooHashing.pdf\"><i>Cuckoo Hashing and Cuckoo Filters</i></a> <span>(PDF)</span> (Technical report). University of Toronto.<span></span></span>\n</li>\n<li><span><b><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Cuckoo_filter#cite_ref-CuckooHashing_5-0\">^</a></b></span> <span><a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Rasmus_Pagh\" title=\"Rasmus Pagh\">Pagh, Rasmus</a>; Rodler, Flemming Friche (2001). \"Cuckoo hashing\". <i>Proc. 9th Annual European Symposium on Algorithms (ESA 2001)</i>. Lecture Notes in Computer Science. Vol. 2161. Århus, Denmark. pp. <span>121–</span>133. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Doi_(identifier)\" title=\"Doi (identifier)\">doi</a>:<a target=\"_blank\" href=\"https://doi.org/10.1007%2F3-540-44676-1_10\">10.1007/3-540-44676-1_10</a>. <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/ISBN_(identifier)\" title=\"ISBN (identifier)\">ISBN</a> <a target=\"_blank\" href=\"https://en.wikipedia.org/wiki/Special:BookSources/978-3-540-42493-2\" title=\"Special:BookSources/978-3-540-42493-2\">978-3-540-42493-2</a>.<span></span></span>\n</li>\n</ol></div>\n<ul><li><a target=\"_blank\" href=\"https://bdupras.github.io/filter-tutorial/\">Probabilistic Filters By Example – A tutorial comparing Cuckoo and Bloom filters.</a></li></ul>\n</div>",
"author": "",
"favicon": "https://en.wikipedia.org/static/favicon/wikipedia.ico",
"source": "en.wikipedia.org",
"published": "2018-11-25t05:11:57z",
"ttr": 109,
"type": "website"
}