Class Shape


  • public final class Shape
    extends java.lang.Object
    The definition of a Bloom filter shape.

    This class contains the values for the filter configuration and is used to convert a Hasher into a BloomFilter as well as verify that two Bloom filters are compatible. (i.e. can be compared or merged)

    Interrelatedness of values

    Number of Items (n)
    n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))
    Probability of False Positives (p)
    p = pow(1 - exp(-k / (m / n)), k)
    Number of Bits (m)
    m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2))))
    Number of Functions (k)
    k = round((m / n) * ln(2))

    Estimations from cardinality based on shape

    Several estimates can be calculated from the Shape and the cardinality of a Bloom filter.

    In the calculation below the following values are used:

    • double c = the cardinality of the Bloom filter.
    • double m = numberOfBits as specified in the shape.
    • double k = numberOfHashFunctions as specified in the shape.

    Estimate N - n()

    The calculation for the estimate of N is: -(m/k) * ln(1 - (c/m)). This is the calculation performed by the Shape.estimateN(cardinality) method below. This estimate is roughly equivalent to the number of hashers that have been merged into a filter to create the cardinality specified.

    Note:

    • if cardinality == numberOfBits, then result is infinity.
    • if cardinality > numberOfBits, then result is NaN.

    Estimate N of Union - n(A ∪ B)

    To estimate the number of items in the union of two Bloom filters with the same shape, merge them together and calculate the estimated N from the result.

    Estimate N of the Intersection - n(A ∩ B)

    To estimate the number of items in the intersection of two Bloom filters A and B with the same shape the calculation is: n(A) + n(b) - n(A ∪ B).

    Care must be taken when any of the n(x) returns infinity. In general the following assumptions are true:

    • If n(A) = ∞ and n(B) < ∞ then n(A ∩ B) = n(B)
    • If n(A) < ∞ and n(B) = ∞ then n(A ∩ B) = n(A)
    • If n(A) = ∞ and n(B) = ∞ then n(A ∩ B) = ∞
    • If n(A) < ∞ and n(B) < ∞ and n(A ∪ B) = ∞ then n(A ∩ B) is undefined.
    Since:
    4.5.0-M1
    See Also:
    Bloom Filter calculator, Bloom filter [Wikipedia]
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      boolean equals​(java.lang.Object obj)  
      double estimateMaxN()
      Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.
      double estimateN​(int cardinality)
      Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.
      static Shape fromKM​(int numberOfHashFunctions, int numberOfBits)
      Constructs a filter configuration with the specified number of hashFunctions (k) and bits (m).
      static Shape fromNM​(int numberOfItems, int numberOfBits)
      Constructs a filter configuration with the specified number of items (n) and bits (m).
      static Shape fromNMK​(int numberOfItems, int numberOfBits, int numberOfHashFunctions)
      Constructs a filter configuration with the specified number of items, bits and hash functions.
      static Shape fromNP​(int numberOfItems, double probability)
      Constructs a filter configuration with the specified number of items (n) and desired false-positive probability (p).
      static Shape fromPMK​(double probability, int numberOfBits, int numberOfHashFunctions)
      Constructs a filter configuration with a desired false-positive probability (p) and the specified number of bits (m) and hash functions (k).
      int getNumberOfBits()
      Gets the number of bits in the Bloom filter.
      int getNumberOfHashFunctions()
      Gets the number of hash functions used to construct the filter.
      double getProbability​(int numberOfItems)
      Calculates the probability of false positives (p) given numberOfItems (n), numberOfBits (m) and numberOfHashFunctions (k).
      int hashCode()  
      boolean isSparse​(int cardinality)
      Determines if a cardinality is sparse based on the shape.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
    • Method Detail

      • fromKM

        public static Shape fromKM​(int numberOfHashFunctions,
                                   int numberOfBits)
        Constructs a filter configuration with the specified number of hashFunctions (k) and bits (m).
        Parameters:
        numberOfHashFunctions - Number of hash functions to use for each item placed in the filter.
        numberOfBits - The number of bits in the filter
        Returns:
        a valid Shape.
        Throws:
        java.lang.IllegalArgumentException - if numberOfHashFunctions < 1 or numberOfBits < 1
      • fromNM

        public static Shape fromNM​(int numberOfItems,
                                   int numberOfBits)
        Constructs a filter configuration with the specified number of items (n) and bits (m).

        The optimal number of hash functions (k) is computed.

        k = round((m / n) * ln(2))

        The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

        Parameters:
        numberOfItems - Number of items to be placed in the filter
        numberOfBits - The number of bits in the filter
        Returns:
        a valid Shape.
        Throws:
        java.lang.IllegalArgumentException - if numberOfItems < 1, numberOfBits < 1, the calculated number of hash function is < 1, or if the actual probability is >= 1.0
      • fromNMK

        public static Shape fromNMK​(int numberOfItems,
                                    int numberOfBits,
                                    int numberOfHashFunctions)
        Constructs a filter configuration with the specified number of items, bits and hash functions.

        The false-positive probability is computed using the number of items, bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

        Parameters:
        numberOfItems - Number of items to be placed in the filter
        numberOfBits - The number of bits in the filter.
        numberOfHashFunctions - The number of hash functions in the filter
        Returns:
        a valid Shape.
        Throws:
        java.lang.IllegalArgumentException - if numberOfItems < 1, numberOfBits < 1, numberOfHashFunctions < 1, or if the actual probability is >= 1.0.
      • fromNP

        public static Shape fromNP​(int numberOfItems,
                                   double probability)
        Constructs a filter configuration with the specified number of items (n) and desired false-positive probability (p).

        The number of bits (m) for the filter is computed.

        m = ceil(n * ln(p) / ln(1 / 2^ln(2)))

        The optimal number of hash functions (k) is computed.

        k = round((m / n) * ln(2))

        The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated number of bits and hash functions. An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

        Parameters:
        numberOfItems - Number of items to be placed in the filter
        probability - The desired false-positive probability in the range (0, 1)
        Returns:
        a valid Shape
        Throws:
        java.lang.IllegalArgumentException - if numberOfItems < 1, if the desired probability is not in the range (0, 1) or if the actual probability is >= 1.0.
      • fromPMK

        public static Shape fromPMK​(double probability,
                                    int numberOfBits,
                                    int numberOfHashFunctions)
        Constructs a filter configuration with a desired false-positive probability (p) and the specified number of bits (m) and hash functions (k).

        The number of items (n) to be stored in the filter is computed.

        n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))

        The actual probability will be approximately equal to the desired probability but will be dependent upon the calculated Bloom filter capacity (number of items). An exception is raised if this is greater than or equal to 1 (i.e. the shape is invalid for use as a Bloom filter).

        Parameters:
        probability - The desired false-positive probability in the range (0, 1)
        numberOfBits - The number of bits in the filter
        numberOfHashFunctions - The number of hash functions in the filter
        Returns:
        a valid Shape.
        Throws:
        java.lang.IllegalArgumentException - if the desired probability is not in the range (0, 1), numberOfBits < 1, numberOfHashFunctions < 1, or the actual probability is >= 1.0
      • equals

        public boolean equals​(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • estimateMaxN

        public double estimateMaxN()
        Estimates the maximum number of elements that can be merged into a filter of this shape before the false positive rate exceeds the desired rate.

        The formula for deriving k when m and n are known is:

        k = ln2 * m / n

        Solving for n yields:

        n = ln2 * m / k

        Returns:
        An estimate of max N.
      • estimateN

        public double estimateN​(int cardinality)
        Estimate the number of items in a Bloom filter with this shape and the specified number of bits enabled.

        Note:

        • if cardinality == numberOfBits, then result is infinity.
        • if cardinality > numberOfBits, then result is NaN.
        Parameters:
        cardinality - the number of enabled bits also known as the hamming value.
        Returns:
        An estimate of the number of items in the Bloom filter.
      • getNumberOfBits

        public int getNumberOfBits()
        Gets the number of bits in the Bloom filter. This is also known as m.
        Returns:
        the number of bits in the Bloom filter (m).
      • getNumberOfHashFunctions

        public int getNumberOfHashFunctions()
        Gets the number of hash functions used to construct the filter. This is also known as k.
        Returns:
        the number of hash functions used to construct the filter (k).
      • getProbability

        public double getProbability​(int numberOfItems)
        Calculates the probability of false positives (p) given numberOfItems (n), numberOfBits (m) and numberOfHashFunctions (k).
        p = pow(1 - exp(-k / (m / n)), k)

        This is the probability that a Bloom filter will return true for the presence of an item when it does not contain the item.

        The probability assumes that the Bloom filter is filled with the expected number of items. If the filter contains fewer items then the actual probability will be lower. Thus, this returns the worst-case false positive probability for a filter that has not exceeded its expected number of items.

        Parameters:
        numberOfItems - the number of items hashed into the Bloom filter.
        Returns:
        the probability of false positives.
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • isSparse

        public boolean isSparse​(int cardinality)
        Determines if a cardinality is sparse based on the shape.

        This method assumes that bit maps are 64bits and indexes are 32bits. If the memory necessary to store the cardinality as indexes is less than the estimated memory for bit maps, the cardinality is determined to be sparse.

        Parameters:
        cardinality - the cardinality to check.
        Returns:
        true if the cardinality is sparse within the shape.
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object