COMP1927 - Hash Tables

Introduction

As far as assessable course content goes, I don't think Hash Tables are in that category. But just to be on the safe side and learn some cool things, here are notes!

Hash tables are a data type that provide a way of pairing data with a particular 'key'. This allows for quick access, insertion and deletion.

Commonly implemented as an array, a hash table applies a hash function to a variable and then stores said variable at that index of the array. Essentially it gives you a number for your string or struct or any other data, and that tells you where to look for it.

The number is always the same for any piece of input with the same hash function. This has to be the case or storage and look-up wouldn't work.

Collisions

Depending on the algorithm used to convert data to a number (the hash function), multiple different values may hash to the same number, causing what is termed as a "collision". There are many ways to deal with this, but I think the simplest method is to create an array of linked lists, where each node stores the hash'd data.

Sizing

Hash table size can be approached in one of two main strategies;

  1. Make the table big enough to start with, allowing for lack of optimal memory use.
  2. Make the table resize when it nears full

One may be more appropriate than the other depending on the hash algorithm and strategy for dealing with collisions

Code (for storing structs)

Creation

// createHash creates a HashTable
// Initialising every cell to be a NULL-pointing List
HashTable createHash () {
 
   HashTable h = malloc(sizeof(struct hashTable));
   if (h == NULL) {
      fprintf (stderr, "Out of memory, exiting\n");
      exit(1);
   }
 
   h->array = malloc(sizeof(List)*HT_SIZE);
   if (h->array == NULL) {
      fprintf (stderr, "Out of memory, exiting\n");
      exit(1);
   }
 
   int i;
   for (i = 0; i < HT_SIZE; i++) {
      h->array[i] = createList();
   }
 
   return h;
 
}

Deletion

void deleteHashTable (HashTable h) {
 
   if (h != NULL) {
      if (h->array != NULL) {
         int i;
         for (i = 0; i < HT_SIZE; i++) {
            deleteList(h->array[i]);
         }
         free(h->array);
      }
      free (h);
   }
 
}

Insertion

// insertHashTable takes in a Data d
// If d is not in the HashTable it inserts it at the start of the list
void insertHashTable (HashTable h, Data d) {
 
   if (h != NULL) {
      if (inTable(h, d) == NULL) {
         int index = hash(d);
         Node current = h->array[index]->first;
         h->array[index]->first = createNode(d);
         h->array[index]->first->next = current;
      }
   }
 
}

Removal

// removeNode finds the node before the matching one
// It then relinks the nodes correctly, freeing the matching one
Data removeHashTable (HashTable h, Data d) {
 
   if (h != NULL) {
      Node found = inTable(h,d);
      if (found != NULL) {
         int index = hash(d);
         Node next = found->next->next;
         deleteNode(found->next);
         found->next = next;
      }
   }
 
}

Find Data

// inTable returns the node before the matching one
// that way removeHashTable can quickly delete the correct node
Node inTable (HashTable h, Data d) {
 
   if (h != NULL) {
 
      Node found = NULL;
 
      int index = hash(d);
      Node current = h->array[index]->first;
      Node next;
 
      while ((current->next != NULL) && (found == NULL)) {
         if (match(current->next->data, d)) {
            found = current;
         } else {
            current = current->next;
         }
      }
 
   }
 
   return found;
 
}