Why is a Trie suggested?


It doesn’t really seem like a trie is a good fit here. The claim is that a trie will improve the run time of the review word checks. Maybe, if we were searching for words that share stems (cooler, cool, ice, icy, etc.), but judging from the expected output, the grader is not doing that. It seems like it would be better to just place all the good words into a set and, while parsing the reviews, check whether each word shows up there and increment the review’s count accordingly.


A set is usually implemented as a red-black tree. Since it is a balanced tree, a lookup performs at worst O(log n) comparisons, and each string comparison takes O(m), where m is the length of the string we are searching for. Therefore a set lookup takes O(m log n) time. With a trie there are only O(m) character comparisons.

An easy alternative would be an unordered_map, which gives expected O(m) time but can degrade to O(n·m) in the worst case, where all words hash to the same bucket. That could be fixed with perfect hashing, since our input is static (it guarantees no collisions), but perfect hashing is harder to implement than a trie and there is no STL support for it.


I understood why a set shouldn’t be used here, due to the increased time complexity. But I didn’t understand why an unordered_set isn’t preferred, when its lookup takes only O(1) expected time. Can you please elaborate on that part?
Thank you.


An unordered_set uses hashing and is expected O(1) per lookup, but it can still be O(n) in the worst case. The trie turns that expected-time bound into a worst-case guarantee.


Actually, a trie and a hash map have the same time complexity for a lookup: O(n), where n is the number of characters in the word.

(Note: while a hash map is O(1) for the search itself, computing the hash is O(n), where n is the length of the input.)

A trie is possibly better because the search can terminate early as soon as a character fails to match. But I doubt that will help in practice because of locality-of-reference problems: a trie uses a lot of memory, hence poor locality of reference.

So my conclusion is: we need some real-life data to test it.


Hash tables and hash maps also exhibit poor locality of reference. The access patterns vary a lot, leading to cache misses. Interesting article: https://archive.siam.org/meetings/alenex05/papers/13gheileman.pdf