These are the most basic hash based collections that are used in every day programming, so here I will visit both HashSet and HashMap and explain their internal working. First I will start with HashMap and then continue with HashSet.
HashMap
HashMap stores its elements in the form of key value pairs but no order is maintained inside the map. Each key value pair in a map corresponds to an Entry object which is static inner class inside a HashMap class. Moreover HashMap is not a thread safe implementation of a Map interface like the Hashtable, also HashMap can store one null key and multiple null values unlike the Hashtable.
How the objects are stored inside the HashMap?
HashMap is made up of two collection implementations i.e. Array and LinkedList.
It stores its key value pairs in the buckets where each bucket corresponds to an index in the array. The array index is calculated by taking into account key's hashcode value and the size of the array. Each hashcode value can fall into the same bucket so these key-value pairs are stored in a LinkedList structure, this is known as collision in hash based collections. To get a value from a hashmap first the hashcode value of the key is computed and then the index position in the array is calculated which is also called bucket and if the bucket contains multiple values, all the values are iterated to find the correct value using the object(key) equals() method. So, proper implementation of hashcode() and equals() is necessary to ensure correct working of hash based collections.
Immutability
Immutable objects act as great building blocks for other objects whether mutable or immutable. So, immutable object make good map keys and set elements as their values will not change once they are in a map or set.
Initial Capacity and Load Factor
Initial Capacity and load factor plays an important part in deciding the performance of the hash based collections. The capacity is the number of buckets and the initial capacity is the capacity that is defined while creating a hashmap. The load factor decides how full the hashmap can get before its capacity is increased. The default capacity and load factor of hashmap are 16 and 0.75 respectively this means initially hashmap will contain 16 buckets and the capacity of the hashmap is doubled when it is 75% filled.
Higher values of load factor will decrease the space overhead but it will eventually increase the look up cost. While lower values of load factor will increase the rehashing operations thereby decreasing the performance. So, a good tradeoff between initial capacity and load factor is necessary to improve the performance of the hashmap by reducing the rehashing operations. Rehashing means to create a new array and map the old elements into the new array by applying the hashing function.
HashSet
HashSet is backed by HashMap instance and it stores unique elements but order in which the elements are stored is not defined, it also allows null element.
The value that is added to the HashSet acts as a key to the backing HashMap with a dummy object as the value to that key which is a static constant so that multiple add operations does not result in multiple instances of object. For example if we add a value to the hashset like
Set<String> collections= new HashSet<String>();
collections.add("HashSet");
then internally it is stored in a map as hashmap.put("HashSet",dummy) where dummy is
private static final Object dummy = new Object();
When iterating over a HashSet a keyset iterator is returned not the values. Since the elements stored in the HashSet are stored as keys into the HashMap.
HashMap
HashMap stores its elements in the form of key value pairs but no order is maintained inside the map. Each key value pair in a map corresponds to an Entry object which is static inner class inside a HashMap class. Moreover HashMap is not a thread safe implementation of a Map interface like the Hashtable, also HashMap can store one null key and multiple null values unlike the Hashtable.
How the objects are stored inside the HashMap?
HashMap is made up of two collection implementations i.e. Array and LinkedList.
It stores its key value pairs in the buckets where each bucket corresponds to an index in the array. The array index is calculated by taking into account key's hashcode value and the size of the array. Each hashcode value can fall into the same bucket so these key-value pairs are stored in a LinkedList structure, this is known as collision in hash based collections. To get a value from a hashmap first the hashcode value of the key is computed and then the index position in the array is calculated which is also called bucket and if the bucket contains multiple values, all the values are iterated to find the correct value using the object(key) equals() method. So, proper implementation of hashcode() and equals() is necessary to ensure correct working of hash based collections.
Immutability
Immutable objects act as great building blocks for other objects whether mutable or immutable. So, immutable object make good map keys and set elements as their values will not change once they are in a map or set.
Initial Capacity and Load Factor
Initial Capacity and load factor plays an important part in deciding the performance of the hash based collections. The capacity is the number of buckets and the initial capacity is the capacity that is defined while creating a hashmap. The load factor decides how full the hashmap can get before its capacity is increased. The default capacity and load factor of hashmap are 16 and 0.75 respectively this means initially hashmap will contain 16 buckets and the capacity of the hashmap is doubled when it is 75% filled.
Higher values of load factor will decrease the space overhead but it will eventually increase the look up cost. While lower values of load factor will increase the rehashing operations thereby decreasing the performance. So, a good tradeoff between initial capacity and load factor is necessary to improve the performance of the hashmap by reducing the rehashing operations. Rehashing means to create a new array and map the old elements into the new array by applying the hashing function.
HashSet
HashSet is backed by HashMap instance and it stores unique elements but order in which the elements are stored is not defined, it also allows null element.
The value that is added to the HashSet acts as a key to the backing HashMap with a dummy object as the value to that key which is a static constant so that multiple add operations does not result in multiple instances of object. For example if we add a value to the hashset like
Set<String> collections= new HashSet<String>();
collections.add("HashSet");
then internally it is stored in a map as hashmap.put("HashSet",dummy) where dummy is
private static final Object dummy = new Object();
When iterating over a HashSet a keyset iterator is returned not the values. Since the elements stored in the HashSet are stored as keys into the HashMap.