10 - Hashing
10.1 - Introduction
- Hashing = A technique that retrieves the value using the index obtained from the key without performing a search.
- Hashing is superefficient. Takes O(1) time to search, insert and delete an element using hashing.
- Hashing uses a hashing function to map a key to an index.
- Key is used to search for the corresponding value.
- Hash Table = Array that stores the values that can map a key to an index in an array.
- Has 3 states: Occupied, Marked or Empty.
- Hash Function = The function that maps a key to an index in the hash table.
- Converts a search key to an integer value (hashcode)
- Compresses the hashcode to an index in the hash table.
- Perfect Hash Function = A function that maps each search key to a different index in the hash table. Difficult because of collision.
- Hash Collision = When two or more keys are mapped to the same hash value.
- HashMaps have low time complexities. (get has O(1), values/rehash has O(capacity)
10.2 - Hash Functions and Hash Codes
- Hashcode = An integer value of the search key from an object.
- Java`s root class Object has the hashCode method.
- hashCode() returns the memoryAddress of the object when not overridden.
- HashCode rules:
- Override the hashCode() method when overriding the equals() method to make sure two equal objects have the same hashcode.
- Invoking the hashCode() method multiple times during a program execution should result in the same integer when the objects data has not changed.
- Two unequal objects may have the same hashcode, when they are not equal. The hashCode() method needs to be overwritten.
- Folding = Converting a 64-bit into two 32-bit hashes and using the XOR (java bitwise: ^ )operation. To create one 32-bit hashcode from which all 64-bits have been taken in consideration. (for example Long.hashCode() works like this)
- Polynomial hash code = ???
10.3 - Handling Collisions Using Open Addressing
Open Addressing = The process of finding an open location in the hash table in the event of a collision. In one of 3 ways:
- Linear Probing = Finding the next available location sequentially.
- Generally causes clusters.
- Collision at hashTable[(k) % n]? Then k++;
- K is the hashvalue.
- N is a chosen prime.
- Will always find a location, as long as the table is not full.
- Quadratic Probing = Finding the next available location while changing the location modifier quadratic.
- Causes Secondary Clustering
- Collision at hashTable[(k + j^2) % N]? then j++, starting with 1;
- K is the hashvalue.
- J is an integer 1,2,3,4,5, etc.
- N is a chosen prime.
- Possible that is does not find a spot for the key.
- Double Hashing = Uses a secondairy hash function h`(key) on the keys to determine increments.
- Collision at hashTable[(k+j*h`(key))]? Then j++, starting with 0.
- h`(key) should not return 0.
- Linear Probing = Finding the next available location sequentially.
Cluster = A group of consecutive cells in a hashtable to be occupied.
Secondary Clustering = Entries can collide with an occupied entry using the same probe sequence.
10.4 - Handeling Collisions Using Seperate Chaining
- Seperate Chaining = Placing all entries with the same hash index in the same location. Each location uses a bucket (ArrayList or LinkedList) to hold the multiple values.
- Rehashing = Reloading of entries in a new hashtable after the previous one has exceeded its load factor.
- Load factor is displayed as the greek lambda symbol.