Monday, 27 June 2011

Internal Working Of HashSet and HashMap

These are the most basic hash based collections that are used in every day programming, so here I will visit both HashSet and HashMap and explain their internal working.  First I will start with HashMap and then continue with HashSet.

HashMap 

HashMap stores its elements in the form of key value pairs but no order is maintained inside the map. Each key value pair in a map corresponds to an Entry object which is static inner class inside a HashMap class. Moreover HashMap is not a thread safe implementation of a Map interface like the Hashtable, also HashMap can store one null key and multiple null values unlike the Hashtable.


How the objects are stored inside the HashMap?


HashMap is made up of two collection implementations i.e. Array and LinkedList. 

It stores its key value pairs in the buckets where each bucket corresponds to an index in the array. The array index is calculated by taking into account key's hashcode value and the size of the array. Each hashcode value can fall into the same bucket so these key-value pairs are stored in a LinkedList structure, this is known as collision in hash based collections. To get a value from a hashmap first the hashcode value of the key is computed and then the index position in the array is calculated which is also called bucket and if the bucket contains multiple values, all the values are iterated to find the correct value using the object(key) equals() method. So, proper implementation of hashcode() and equals() is necessary to ensure correct working of hash based collections.


Immutability

Immutable objects act as great building blocks for other objects whether mutable or immutable. So, immutable object make good map keys and set elements as their values will not change once they are in a map or set.





Initial Capacity and Load Factor

Initial Capacity and load factor plays an important part in deciding the performance of the hash based collections. The capacity is the number of buckets and the initial capacity is the capacity that is defined while creating a hashmap. The load factor decides how full the hashmap can get before its capacity is increased. The default capacity and load factor of hashmap are 16 and 0.75 respectively this means initially hashmap will contain 16 buckets and the capacity of the hashmap is doubled when it is 75% filled.

Higher values of load factor will decrease the space overhead but it will eventually increase the look up cost. While lower values of load factor will increase the rehashing operations thereby decreasing the performance. So, a good tradeoff between initial capacity and load factor is necessary to improve the performance of the hashmap by reducing the rehashing operations. Rehashing means to create a new array and map the old elements into the new array by applying the hashing function.


HashSet


HashSet is backed by HashMap instance and it stores unique elements but order in which the elements are stored is not defined, it also allows null element.


The value that is added to the HashSet acts as a key to the backing HashMap with a dummy object as the value to that key which is a static constant so that multiple add operations does not result in multiple instances of object. For example if we add a value to the hashset like
Set<String> collections= new HashSet<String>();
collections.add("HashSet");
then internally it is stored in a map as hashmap.put("HashSet",dummy) where dummy is 
private static final Object dummy = new Object();


When iterating over a HashSet a keyset iterator is returned not the values. Since the elements stored in the HashSet are stored as keys into the HashMap. 

Friday, 24 June 2011

Strings and String Literal Pool

Strings are immutable


String objects are immutable objects i.e. a string object cannot be changed once it is created. For example in the below code string object is not changed instead a new string object is created on concatenation. So whenever some operations are performed on the String objects a new string object is created.

String s  = "Java ";
s = s.concat("Blog");


String Literal Pool


Whenever a class is loaded, java virtual machine checks for all the string literals in the class and for all the literals it finds whether they are already on the heap, if not a new instance is created on the heap and a reference is stored in the string literal pool. Subsequently any reference to that string literal in the program is replaced with the reference from the string literal pool. 

For example in the below code 

String s1 = "Blog";
String s2 = "Blog";
1. s1.equals(s2) : true
2. s1==s2 : true

the first statement returns true and so the second statement because the second time a new string object is not created instead it is referenced from the String literal pool.


But, in the below code 


String s1 = "Blog";
String s2 = new String("Blog");
1. s1.equals(s2) : true
2. s1 == s2 : false

second statement returned false because java virtual machine is bound to create a new string object no matter it was there in the String Literal Pool. So, avoid using new  when creating a string literal otherwise new string objects are created unnecessarily.