Oct 10, 2014

When to use which Java collection or data structure? and why?

List, Set, Map, and Queue(access the ends FIFO or LIFO) are the basic Java data structure interfaces for which there are different implementations to cater for different usage patterns. Java data structure interview questions are very popular, and pays to know the differences.

#1 Array (Employee[]) Vs List (List<Employee>): 
  • Array is a fixed length data structure whilst a List is a variable length Collection class.
  • java.util.Arrays has the helper classes for the arrays and java.util.Collections has the helper classes for the Collection classes.
  • An array can use primitive data types, but the Collection classes can only use objects. 

Q. Which one to favor/use?
A. Favor  List.

  • Arrays are inflexible and do not have the expressive power of generic types.  
  • List gives you the data abstraction as you can swap ArrayList, LinkedList, CopyOnWriteArrayList, etc depending on the requirements.
  • List allows you to add and subtract elements even it is an O(n) operation in worst case scenario.

#2: ArrayList (ArrayList<employee>)Vs LinkedList (LinkedList<employee>)

  • Insertions and deletions are faster in LinkedList compared to an ArrayList as LinkedList uses links (i.e. before and next reference) as opposed to an ArrayList, which uses an array under the covers, and may need to resize the array if the array gets full.  Adding to an ArrayList has a worst case scenario of O(n) whilst LinkedList has O(1). 
  • LinkedList has more memory footprint than ArrayList. An ArrayList only holds actual object whereas LinkedList holds both data and reference of next and previous node.  
  • Random access has the worst case scenario of O(n) in LinkedList as to access 6th element in a LinkedList with 8 elements, you need to traverse through 1 to 5th element before you can get to the 6th element, whereas in an ArrayList, you can get the 6th element with O(1) with list.get(5)

Q. Which one to favor/use?
A. Almost always favor an ArrayList. ArrayLists are good for write once and randomly read many times, and for adding elements at the end, but bad at add/remove from the front or middle. LinkedLists are useful in rare usage patterns where insertion and deletion in the front and middle is significantly more than the retrievals, and the LinkedList can live with the O(n) cost of random access.

If you are after adding/removing from both ends, ArrayDeque is better than a LinkedList. The Deque interface is pronounced as "deck", and represents a double-ended queue. ArrayDeque is backed by an array. ArrayDeque can be used as both a Queue (i.e. FIFO) and a Stack (i.e. LIFO). ArrayDeque performs better than LinkedList, as  ArrayDeque is backed by an array and Array is more cache friendly. Also, ArrayDeque is more memory efficient than LinkedList since it does not have to keep an additional reference to previous or next node. ArrayDeque does not take null elements, but a LinkedList does. The best operation in a LinkedList implementation is removing the current element during the iteration. LinkedList implementations are also not ideal to iterate. So, LinkedList is very rarely used.

#3: List Vs Set

Set can't have duplicates whereas a List can have duplicates.

#4: HasMap Vs TreeMap

TreeMap is an implementation of a SortedMap, where the order of the keys can be sorted, and when iterating over the keys, you can expect that keys will be in order. HashMap on the other hand, makes no such guarantee on the order.

Q. Which one to favor/use?
A. In general, favor HashMap as it is more efficient in general, and use a Comparator to sort when required. Use a TreeMap only when it is required to keep the keys in a sorted order to iterate over them.

#5: HashSet, ArrayList Vs. CopyOnWriteSet, CopyOnWriteArrayList

HashSet and ArryList are not thread-safe and you need to provide your own synchronization, whereas CopyOnWriteArraySet and CopyOnWriteArrayList are not only thread-safe, but more efficient as they allow concurrent multiple  reads and single write. This concurrent read and write behavior is accomplished by  making a brand new copy of the list every time it is altered.

Q. Which one to favor/use?
A. Favor CopyOnWriteArraySet or CopyOnWriteArrayList only when the number of reads is significantly more than the number of writes. It also gives you fail safe iteration when you want to add/remove elements during iteration.

#6: HashMap Vs ConcurrentHashMap

HashMap is not thread-safe and you need to provide your own synchronization with Collections.synchornizedMap(hashMap), which will return a collection which is almost equivalent to the legacy Hashtable, where every modification operation on Map is locked. As the name implies, ConcurrentHashMap provides thread-safety by dividing the whole map into different partitions based upon concurrency level and locking only particular portion instead of locking the whole map.  ConcurrentHashMap does not allow NULL key values, whereas HashMap there can only be one null key.

Q. Which one to favor/use?
A. Favor ConcurrentHashMap  for scalability.

#7: new SynchronousQueue( ) Vs. new ArrayBlockingQueue(1) or LinkedBlockingQueue(1)

SynchronousQueue is a queue that can only contain a single element internally. A thread inserting an element into a SynchronousQueue blocked until another thread takes that element from the queue. Similarly, if a thread tries to take an element, and no element is currently present, that thread will be blocked until a thread inserts an element into the SynchronousQueue.

SynchronousQueue provides extremely good throughput when compared to a unit sized ArrayBlockingQueue(1) or LinkedBlockingQueue(1). SynchronousQueue  is the default BlockingQueue used in the Executor framework for the Executors.newCachedThreadPool( ).

Q. Which one to favor/use?
  • Use SynchronousQueue in scenarios where a task to be executed asynchronously while discarding any further requests until the task is finished. 
  • executor = new ThreadPoolExecutor(
        1, 1, 
        2000, TimeUnit.SECONDS, 
        new SynchronousQueue<runnable>(),
        new ThreadPoolExecutor.DiscardPolicy());
  • Use SynchronousQueue to improve application performance. 

#8: HashMap Vs LinkedHashMap

LinkedHashMap will iterate in the order in which the entries were put into the map. HashMap does not provide any grantees about the iteration order.

Q. What is a BlockingQueue?
A. BlockingQueue is a Queue that supports additional operations that wait for the queue to become non-empty when retrieving an element, and wait for space to become available in the queue when storing an element. The main advantage is that a BlockingQueue is that it provides a correct, thread-safe implementation with  throttling.

  • The producers are throttled to add elements if the consumers get too far behind. 
  • If the Queue capacity is limited, the memory consumption will be limited as well. 

Q. Why Vector, Stack, and Hashtable are legacy data structures and not to be used?
A. All methods in these classes are synchronized (i.e. coarse grained lock), hence not efficient. Favor the concurrent data structures for concurrent read and single write.



Post a Comment

Subscribe to Post Comments [Atom]

<< Home