BDZ Algorithm
Introduction
Coming soon...
The Algorithm
Coming soon...
Mapping Step
Coming soon...
Assigning Step
Coming soon...
Ranking Step
Coming soon...
Memory Consumption
Now we detail the memory consumption to generate and to store minimal perfect hash functions
using the BDZ algorithm. The structures responsible for memory consumption are in the
following:
- 3-graph:
- first: is a vector that stores cn integer numbers, each one representing
the first edge (index in the vector edges) in the list of
incident edges of each vertex. The integer numbers are 4 bytes long. Therefore,
the vector first is stored in 4cn bytes.
- edges: is a vector to represent the edges of the graph. As each edge
is compounded by three vertices, each entry stores three integer numbers
of 4 bytes that represent the vertices. As there are n edges, the
vector edges is stored in 12n bytes.
- next: given a vertex
, we can discover the edges that
contain
following its list of incident edges,
which starts on first[
] and the next
edges are given by next[...first[
]...]. Therefore, the vectors first and next represent
the linked lists of edges of each vertex. As there are three vertices for each edge,
when an edge is iserted in the 3-graph, it must be inserted in the three linked lists
of the vertices in its composition. Therefore, there are 3n entries of integer
numbers in the vector next, so it is stored in 4*3n = 12n bytes.
- Vertices degree (vert_degree vector): is a vector of cn bytes
that represents the degree of each vertex. We can use just one byte for each
vertex because the 3-graph is sparse, once it has more vertices than edges.
Therefore, the vertices degree is represented in cn bytes.
- Acyclicity test:
- List of deleted edges obtained when we test whether the 3-graph is a forest (queue vector):
is a vector of n integer numbers containing indexes of vector edges. Therefore, it
requires 4n bytes in internal memory.
- Marked edges in the acyclicity test (marked_edges vector):
is a bit vector of n bits to indicate the edges that have already been deleted during
the acyclicity test. Therefore, it requires n/8 bytes in internal memory.
- MPHF description
- function g: is represented by a vector of 2cn bits. Therefore, it is
stored in 0.25cn bytes
- ranktable: is a lookup table used to store some precomputed ranking information.
It has (cn)/(2^b) entries of 4-byte integer numbers. Therefore it is stored in
(4cn)/(2^b) bytes. The larger is b, the more compact is the resulting MPHFs and
the slower are the functions. So b imposes a trade-of between space and time.
- Total: 0.25cn + (4cn)/(2^b) bytes
Thus, the total memory consumption of BDZ algorithm for generating a minimal
perfect hash function (MPHF) is: (28.125 + 5c)n + 0.25cn + (4cn)/(2^b) + O(1) bytes.
As the value of constant c may be larger than or equal to 1.23 we have:
| c |
b |
Memory consumption to generate a MPHF (in bytes) |
| 1.23 |
7 |
34.62n + O(1) |
| 1.23 |
8 |
34.60n + O(1) |
| Table 1: Memory consumption to generate a MPHF using the BDZ algorithm. |
Now we present the memory consumption to store the resulting function.
So we have:
| c |
b |
Memory consumption to store a MPHF (in bits) |
| 1.23 |
7 |
2.77n + O(1) |
| 1.23 |
8 |
2.61n + O(1) |
| Table 2: Memory consumption to store a MPHF generated by the BDZ algorithm. |
Experimental Results
Experimental results to compare the BDZ algorithm with the other ones in the CMPH
library are presented in Botelho, Pagh and Ziviani [1,2].
Papers
- F. C. Botelho, R. Pagh, N. Ziviani. Simple and space-efficient minimal perfect hash functions. 10th International Workshop on Algorithms and Data Structures (WADs'07), Springer-Verlag Lecture Notes in Computer Science, vol. 4619, Halifax, Canada, August 2007, 139-150.
- F. C. Botelho. Near Space-Optimal Perfect Hashing Algorithms. Thesis Proposal, Department of Computer Science, Federal University of Minas Gerais, July 2007.
Enjoy!
Davi de Castro Reis
Djamel Belazzougui
Fabiano Cupertino Botelho