How would you implement HashMap?


#1

Design Cache Q: How would you implement HashMap?


#2

We can create thread safe Hashmap similar to concurrenthashmap in Java, so that even write locking can be improved unless write's are in different segments


#3

Got it , we block a piece of code , not the data structure. This should help to understand chaining scheme: https://www.coursera.org/learn/data-structures/lecture/2yY2h/chaining-scheme


#4

newEntry.next = H[g] makes it a little confusing to understand. It gives an illusion that a cycle is being created here until one realizes that H[g] points to NULL initially.

The way I see it newEntry.next = NULL will make it easier to read and understand.


#5

I am really having hard time understanding this par, can someone please break it down in simple way.Thanks in advance


#6

Rishi, Here H[g] is not always null. H[g] stores a linked list which grows every time we add a node for the same bucket. Read [http://opendatastructures.org/ods-cpp/5_1_Hashing_with_Chaining.html] for further understanding.


#7

Hi Pratik, though a little late, still this might help someone else. Breaking down this implementation-

  1. HashMap is an array of size N.
  2. Each array element is a pointer storing head of a linked list.
  3. In other words there are N linked list whose head are stored in array.
  4. Linked list heads are initialized to null.
  5. For storing (Key, Val) in HashMap, hash of Key is calculated.
  6. Hash can be any number and we have array of size N.
  7. We do hash(Key)%N to find position in array.
  8. Let g = hash(Key)%N. g is a number between 0 to N-1.
  9. Create a LinkedListNode storing Val. Call it newEntry.
  10. Add newEntry to the LinkedList at position g.
  11. H[g] stores the head of that LinkedList.
  12. Add newEntry to beginning. newEntry.next = H[g]; H[g] = newEntry;
    Since this implementation has N individual LinkedLists, we can have a lock on a particular LinkedList instead of a lock on the array itself.

#8

Hi Mukund,

Can you elaborate on point 12.I didn’t quite get it.


#9

@sandeep_suthari: This might be a little late for the explanation you needed but it might helpful for someone else in your position. I will try to explain it as much as I can.

If the list is empty ( no head present ), then create a new entry and set it as head.
else
(we will push the new node at the head)
make a new entry
new_entry->next = current_head
head = new_entry

So, to make it thread safe, we only need to take lock on the linkedlist rather than the array of linked list


#10

If you implement a LinkedLink for each HashMap entry, don’t you lose the LRU nature of the cache? My understanding is that by storing a single LinkedList for all entries in the HashMap, you know exactly which node should be evicted when exceeding the capacity (the oldest or last node in the LinkedList). The idea to use a LinkedList for EACH HashMap entry is interesting, but then you lose knowing the oldest node in the entire LinkedList. I suppose this might be okay if you want to evict the oldest node ONLY AT the key. But that’s no longer a LRU cache.


#12

For who is trying to understand “For a given key k, generate g = hash(k) % N newEntry = LinkedList Node with value = v newEntry.next = H[g]
H[g] = newEntry
– Since we are using a linked list to avoid collision for Hashed key. if the H(g) is already pointing to other value then , in order to insert the new node in the head. we need to point the newentry.next to previous value stored in H(g) and make the h(g) points to newly created node.


#13

We are using 2 data structures in LRU cache problem -

  1. hashmap - used in read/write queries. We are implementing hashmap using array of linked list. (which has nothing to do with point no 2.)
  2. doubly linked list - used to implement least recently used policy.