LibDS: Heap

HASH TABLE

Introduction

Using a heap or a balanced binary tree to store and retrieve data is a very fast and efficient way to deal with data that needs to be sorted in one way or another. We know that construction of a heap or binary tree is of the order of O(n lg n), while data access can happen in O(lg n), n being the number of data items stored in the data structure. These limits are lower limits, and as long as key comparisons are required in order to access the data, we know that they cannot be made lower (I won't go about giving the proof here, check out a book on this if you need the exact proof).

But, strictly speaking, we don't always need to store data in any particular order, nor do we need to sort it. If we don't need to sort the data, then we can dispense with the need for key comparisons, and can in fact make data access much faster. This is the principle behind using a table: you can access any element of the table in O(1), which is independent of the number of data elements contained in it. With a lot of elements and frequent data access, we can achieve some noticeable speed differences this way.

Mapping a data element with a key to a location in the table is achieved by using a hash function. A hash function will always map a given key to a particular location in the table. Accessing the element simply means run its key through the hash function and read the table index that the function returned. From here stems the advantage in speed over a balance binary tree.

LibDS refers to a hashtable as a HASHTABLE. In the current implementation, a hash table is represented by a fixed size array, while collisions are resolved through chaining. Keys are as usual of type DSKEY. You need to define your own hash function, and hash key comparison function (by now this should sound familiar). However, since strings are quite often used as keys, LibDS provides a default hash and hash comparison function that work on C-strings.

As usual, items stored in the hash table are void pointers to actual user data.

All data is stored in virtual memory.

Initialization

HASHTABLE htMake(int size)

Creates a default hash table of size size. Passing a value for size smaller than 1 lets LibDS choose a default value, normally something in the order of less than a 1000 elements. There is some debate whether the size should be a primary number or not, which can supposedly reflect on the amount of collisions that will occur. Experiment with this to find out.
The default hash table assumes C-strings as keys, and uses a predefined hash function. If you got a better function, see the next function on how to tell LibDS to use that instead.

HASHTABLE htMakeHashTable(int size,HASHFUNC hashfun,HASHCMPFUNC cmpfun)
typedef unsigned int (*HASHFUNC)(DSKEY);
typedef int (*HASHCMPFUNC)(DSKEY,DSKEY);

Creates a new hash table. It has a size of size, and uses the given functions for hashing and comparing the keys. LibDS assumes a key of 32 bits, so hashfun must return a 32-bit unsigned value. Comparing the keys is necessary only when LibDS is resolving collisions, and cmpfun is expected to return -1 when the first key is smaller than the second, 0 when the two keys are equal, and 1 whenever the second key is larger than the first.

Upon success, the above functions return a valid HASHTABLE pointer. If they fail, NULL is returned.

Closing

void htClose(HASHTABLE hashtable)

Closes the hash table, effectively releasing all memory used by it. User data remains untouched.

void htCloseWithFunction(HASHTABLE hashtable,void (*fun)(void*))

Closes the hashtable, only this time call the function fun on each data element in the table as it is being closed.

Adding

int htAdd(HASHTABLE table,DSKEY key,void *data)

Adds a new element with key key to the hashtable. The function returns 0 on success, -1 on failure. Failure is signalled whenever a duplicate key is detected, or there is a memory allocation problem.

Deleting

void* htRemove(HASHTABLE table,DSKEY key)

Delete the element with the given key from the hash table.

Fetching

As usual, whenever a void* is returned, it contains user data, or NULL if something went wrong. Therefore, storing NULL as a value will not be distinguished from an error.

void * htFind(HASHTABLE table,DSKEY key)

Return the element with the given key. Fails if the key is not in the table.

Parameters

int htSize(HASHTABLE table)

Return the size of the table. This is equal to the argument passed to the htMake*() functions, unless that argument was -1, in which case this function returns the actual size LibDS selected for the table. Does not fail.

int htItems(HASHTABLE table)

Returns the actual number of items currently stored in the table. Never fails.

int htConflicts(HASHTABLE table)

Returns the number of conflicts (items that are mapped to the same location in the table). Never failes.

int htEmpty(HASHTABLE table)

Returns true (value not equal to zero) if the table has no (more) elements. Else, false (value equal to zero) is returned. (Equivalent to htItems() == 0).

Advanced funtionality

Visiting the elements of a HASHTABLE

You can visit all elements in the hash table and call a function to be performed on each data item found in each element.

void htWalk(HASHTABLE table,int all,void (*fun)(DSKEY key,void * data,int is_conflict))

Visits all the elements in the table (in no predefined order), and calls fun on each element found. The function receives as first argument the key of the element. As second argument the user data that was stored with the element is passed. The third argument is non-zero if the current element is a conflict of the previous one. The third argument is zero if the current element is not conflicting with any other elements. If all is true (not equal to zero), all elements are visited. If all is equal to zero, only the conflicting elements are visited.

NOTE: htWalk() will happily visit all slots in the hashtable, including those that are empty. Therefore, it is not ideal for sequential access to your data.

Obscure functionality

Nothing.

Thu Dec 27 15:37:04 EST 2001