Comparison Between BMZ And CHM Algorithms


Characteristics

Table 1 presents the main characteristics of the two algorithms. The number of edges in the graph $G$ is $\vert E(G)\vert = n$, the number of keys in the input set. The number of vertices of $G$ is $1.15n$ for the BMZ algorithm and $2.09n$ for the CHM algorithm. This measure determines the amount of space needed to store the array $g$, since $\vert g\vert = \vert V(G)\vert$. As a result, the space required to store a function generated by the BMZ algorithm is about 55% ($1.15/2.09$) of the space required by the CHM algorithm. The number of critical edges is $0.5\vert E(G)\vert$ for the BMZ algorithm and 0 for the CHM algorithm. The BMZ algorithm generates random graphs that contain cycles, whereas the CHM algorithm generates acyclic random graphs. Finally, the CHM algorithm generates order-preserving functions, while the BMZ algorithm does not preserve order.

Characteristic                   BMZ                      CHM
$c$                              1.15                     2.09
$\vert E(G)\vert$                $n$                      $n$
$\vert V(G)\vert=\vert g\vert$   $cn$                     $cn$
$\vert E(G_{\rm crit})\vert$     $0.5\vert E(G)\vert$     0
$G$                              cyclic                   acyclic
Order preserving                 no                       yes
Table 1: Main characteristics of the algorithms.
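Table 1 states that about half of the edges of a BMZ graph are critical. The sketch below shows one way to find them, assuming that the critical subgraph $G_{\rm crit}$ is the 2-core of $G$, i.e. what remains after repeatedly deleting a vertex of degree 1 together with its only edge. The function name `critical_edges` and the edge-list representation are illustrative assumptions, not part of any library API.

```python
from collections import deque

def critical_edges(num_vertices, edges):
    """Return the indices of the edges in the 2-core of an undirected multigraph.

    Assumption: G_crit is the 2-core, i.e. what is left after repeatedly
    deleting a vertex of degree 1 together with its only edge.
    """
    degree = [0] * num_vertices
    incident = [[] for _ in range(num_vertices)]  # vertex -> incident edge ids
    for i, (u, v) in enumerate(edges):
        degree[u] += 1
        degree[v] += 1
        incident[u].append(i)
        incident[v].append(i)

    alive = [True] * len(edges)
    queue = deque(v for v in range(num_vertices) if degree[v] == 1)
    while queue:
        v = queue.popleft()
        if degree[v] != 1:  # degree dropped to 0 after v was queued
            continue
        for i in incident[v]:
            if alive[i]:  # the single remaining edge at v
                alive[i] = False
                u, w = edges[i]
                other = w if u == v else u
                degree[v] -= 1
                degree[other] -= 1
                if degree[other] == 1:
                    queue.append(other)
                break
    return {i for i, is_alive in enumerate(alive) if is_alive}

if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant edge 2-3 and an isolated edge 4-5:
    # only the three triangle edges are critical.
    print(sorted(critical_edges(6, [(0, 1), (1, 2), (2, 0), (2, 3), (4, 5)])))
```

Under this assumption, paths that connect two cycles also stay critical, while trees hanging off the cycles are peeled away. Running the function on random graphs with $1.15n$ vertices and $n$ edges gives an empirical check of the $0.5\vert E(G)\vert$ entry; the sketch itself makes no claim about that figure.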

Memory Consumption

Algorithm   $c$    Memory consumption to generate an MPHF
BMZ         0.93   24.80n + O(1)
BMZ         1.15   26.42n + O(1)
CHM         2.09   33.00n + O(1)
Table 2: Memory consumption to generate a minimal perfect hash function (MPHF) using the BMZ and CHM algorithms.
Algorithm   $c$    Memory consumption to store an MPHF (bytes)
BMZ         0.93   3.72n
BMZ         1.15   4.60n
CHM         2.09   8.36n
Table 3: Memory consumption to store an MPHF generated by the BMZ and CHM algorithms.
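The figures in Table 3 are consistent with storing each of the $cn$ entries of $g$ as a 4-byte (32-bit) word: $0.93 \times 4 = 3.72$, $1.15 \times 4 = 4.60$ and $2.09 \times 4 = 8.36$. The sketch below assumes exactly that (the 4-byte entry size is inferred from the table, not stated in the text) and turns the per-key figures into absolute sizes for a given $n$. The helper names are hypothetical.

```python
import math
from fractions import Fraction

BYTES_PER_ENTRY = 4  # assumption: each entry of g is stored as a 32-bit integer

def g_entries(n: int, c: str) -> int:
    """|g| = ceil(c*n); c is given as a decimal string so the product is exact."""
    return math.ceil(Fraction(c) * n)

def storage_bytes(n: int, c: str) -> int:
    """Bytes needed to store the array g of an MPHF built with load factor c."""
    return g_entries(n, c) * BYTES_PER_ENTRY

if __name__ == "__main__":
    n = 100_000_000
    for name, c in [("BMZ", "0.93"), ("BMZ", "1.15"), ("CHM", "2.09")]:
        b = storage_bytes(n, c)
        print(f"{name} c={c}: {b / n:.2f}n bytes, {b / 1e6:.0f} MB for n={n:,}")
    ratio = storage_bytes(n, "1.15") / storage_bytes(n, "2.09")
    print(f"BMZ(1.15) / CHM(2.09) = {ratio:.0%}")
```

For the 100 million URLs used below, this gives 460 MB for BMZ with $c=1.15$ against 836 MB for CHM, the same 55% ratio that follows from Table 1.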

Run times

We now present some experimental results comparing the BMZ and CHM algorithms. The data consist of a collection of 100 million uniform resource locators (URLs) collected from the Web. The average length of a URL in the collection is 63 bytes. All experiments were carried out on a computer running Linux (kernel version 2.6.7), with a 2.4 GHz processor and 4 GB of main memory.

Table 4 presents time measurements. All times are in seconds, and the table entries are averages over 50 trials. The column labelled $N_i$ gives the number of iterations needed to generate the random graph $G$ in the mapping step of each algorithm. The next columns give, for each algorithm, the run time of the mapping and ordering steps together (Map+Ord), the run time of the searching step (Search), and their sum (Total). The last column gives the percentage gain of our algorithm (BMZ) over the CHM algorithm.

$n$            BMZ                                     CHM                                      Gain
               $N_i$  Map+Ord   Search    Total        $N_i$  Map+Ord    Search    Total        (%)
1,562,500      2.28      8.54     2.37    10.91        2.70     14.56      1.57     16.13        48
3,125,000      2.16     15.92     4.88    20.80        2.85     30.36      3.20     33.56        61
6,250,000      2.20     33.09    10.48    43.57        2.90     62.26      6.76     69.02        58
12,500,000     2.00     63.26    23.04    86.30        2.60    117.99     14.94    132.92        54
25,000,000     2.00    130.79    51.55   182.34        2.80    262.05     33.68    295.73        62
50,000,000     2.07    273.75   114.12   387.87        2.90    577.59     73.97    651.56        68
100,000,000    2.07    567.47   243.13   810.60        2.80  1,131.06    157.23  1,288.29        59
Table 4: Time measurements (in seconds, averaged over 50 trials) for the BMZ and CHM algorithms.
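Every entry of the Gain column is consistent with $\text{gain} = 100 \times (T_{\rm CHM} - T_{\rm BMZ}) / T_{\rm BMZ}$, where $T$ is the Total column; this formula is inferred from the numbers rather than stated in the text. The sketch below recomputes the column from the Table 4 totals and also reports the reduction relative to the CHM time, which is the other natural way to express the same difference.

```python
# Total times (seconds) from Table 4: n -> (BMZ total, CHM total, reported gain %)
TABLE4 = {
    1_562_500: (10.91, 16.13, 48),
    3_125_000: (20.80, 33.56, 61),
    6_250_000: (43.57, 69.02, 58),
    12_500_000: (86.30, 132.92, 54),
    25_000_000: (182.34, 295.73, 62),
    50_000_000: (387.87, 651.56, 68),
    100_000_000: (810.60, 1288.29, 59),
}

def gain_percent(t_bmz, t_chm):
    """Extra time taken by CHM, as a percentage of BMZ's time (inferred formula)."""
    return 100.0 * (t_chm - t_bmz) / t_bmz

def time_reduction_percent(t_bmz, t_chm):
    """Time saved by BMZ, as a percentage of CHM's time."""
    return 100.0 * (t_chm - t_bmz) / t_chm

if __name__ == "__main__":
    gains = [gain_percent(b, c) for b, c, _ in TABLE4.values()]
    cuts = [time_reduction_percent(b, c) for b, c, _ in TABLE4.values()]
    for n, g in zip(TABLE4, gains):
        print(f"n={n:>11,}  gain={g:5.1f}%")
    print(f"mean gain: {sum(gains) / len(gains):.0f}%")
    print(f"mean time reduction: {sum(cuts) / len(cuts):.0f}%")
```

The mean gain comes out at 59%, matching the average quoted below; measured against the CHM time instead, BMZ needs about 37% less time on average.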

The mapping step of the BMZ algorithm is faster for two reasons. First, the expected number of iterations needed to generate $G$ is 2.13 for the BMZ algorithm and 2.92 for the CHM algorithm (see [2] for details). Second, the graph generated by the BMZ algorithm has $1.15n$ vertices, against $2.09n$ for the CHM algorithm. The ordering step of the BMZ algorithm takes approximately the same time as checking whether $G$ is acyclic in the CHM algorithm. The searching step of the CHM algorithm is faster, but overall the CHM algorithm takes, on average, approximately 59% more total time than the BMZ algorithm (the mean of the Gain column). Note that for both algorithms the searching step does not dominate the total time, and the experimental results show that its run time grows approximately linearly with $n$.
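The CHM mapping step described above (build $G$, retry until it is acyclic) can be sketched as follows. This is a simplified illustration under our own assumptions, not the CMPH implementation: the hash function `_hash` (SHA-256 over a seed and the key), the seed selection and the union-find acyclicity test are all hypothetical choices. Each key becomes an edge between two vertices chosen by two seeded hash functions on $m = \lceil cn \rceil$ vertices; a self-loop, a repeated edge or a cycle makes the graph unusable and triggers another iteration.

```python
import hashlib
import math
import random

def _hash(key: bytes, seed: int, m: int) -> int:
    # Hypothetical hash function: SHA-256 of (seed || key), reduced modulo m.
    digest = hashlib.sha256(seed.to_bytes(8, "little") + key).digest()
    return int.from_bytes(digest[:8], "little") % m

def is_acyclic(num_vertices, edges):
    # Union-find: an edge whose endpoints are already connected closes a cycle
    # (this also catches self-loops and repeated edges).
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def chm_style_mapping(keys, c=2.09, rng=None, max_iterations=1000):
    """Return (number of vertices, edge list, iterations used)."""
    rng = rng if rng is not None else random.Random(0)
    m = math.ceil(c * len(keys))
    for iteration in range(1, max_iterations + 1):
        s1, s2 = rng.getrandbits(32), rng.getrandbits(32)
        edges = [(_hash(k, s1, m), _hash(k, s2, m)) for k in keys]
        if is_acyclic(m, edges):
            return m, edges, iteration
    raise RuntimeError("no acyclic graph found; try a larger c")
```

Averaging the returned iteration count over many random seeds gives an empirical counterpart of the $N_i$ column; with this toy hash function there is no guarantee of reproducing the 2.92 figure exactly.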

We now present run times for the BMZ algorithm using a heuristic that reduces the space requirement to any given value between $1.15n$ words and $0.93n$ words. Reducing $c$ increases the number of iterations needed in the mapping step: for $n = 12{,}500{,}000$, the measured numbers of iterations are 2.78 for $c = 1.00$ and 3.04 for $c = 0.93$, against 2.00 for $c = 1.15$ (see Table 4). Table 5 presents the total times to construct a function for $n = 12{,}500{,}000$, with an increase from 86.30 seconds for $c = 1.15$ (see Table 4) to 101.74 seconds for $c = 1.00$ and to 102.19 seconds for $c = 0.93$.

$n$            BMZ $c=1.00$                            BMZ $c=0.93$
               $N_i$  Map+Ord   Search    Total        $N_i$  Map+Ord   Search    Total
12,500,000     2.78     76.68    25.06   101.74        3.04     76.39    25.80   102.19
Table 5: Time measurements (in seconds) for the tuned BMZ algorithm with $c=1.00$ and $c=0.93$.
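The space/time trade-off of the tuned BMZ algorithm can be quantified with a short calculation that uses only the numbers already given in Tables 4 and 5 for $n = 12{,}500{,}000$; no new measurements are involved. The space saving is taken relative to $\vert g\vert = 1.15n$, and the function names are hypothetical.

```python
from fractions import Fraction

# Total construction times (seconds) for n = 12,500,000:
# c = 1.15 from Table 4, c = 1.00 and c = 0.93 from Table 5.
TOTAL_SECONDS = {"1.15": 86.30, "1.00": 101.74, "0.93": 102.19}

def space_saving_percent(c, base="1.15"):
    """Reduction in |g| = c*n relative to the default c (exact decimal arithmetic)."""
    return float(100 * (1 - Fraction(c) / Fraction(base)))

def time_increase_percent(c, base="1.15"):
    """Increase in total construction time relative to the default c."""
    return 100.0 * (TOTAL_SECONDS[c] / TOTAL_SECONDS[base] - 1)

if __name__ == "__main__":
    for c in ("1.00", "0.93"):
        print(f"c={c}: {space_saving_percent(c):.1f}% less space, "
              f"{time_increase_percent(c):.1f}% more time")
```

It reports 13.0% less space for 17.9% more time at $c=1.00$, and 19.1% less space for 18.4% more time at $c=0.93$, so most of the time penalty is already paid when moving from $c=1.15$ to $c=1.00$.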


Enjoy!

Davi de Castro Reis

Djamel Belazzougui

Fabiano Cupertino Botelho

Nivio Ziviani
