Caching Data in Java and the LRU Strategy
Q. What is the purpose of using a cache?
A. A cache is an area of local memory that holds a copy of frequently accessed data that is otherwise expensive to fetch or compute. Examples of such data include the result of a database query, a disk file, or a report. Caching is used primarily for performance and scalability: it is a mechanism by which subsequent requests for the same data are served faster. Avoid the anti-pattern or pitfall of "cache them all", and cache only the data that is hard to get.
Q. What are some of the requirements you need to consider when writing your own cache, or when using a caching framework like EHCache, OSCache, etc.?
A.
This is why you need to apply the "don't reinvent the wheel" principle and favor a proven, tested caching framework like EHCache, which takes care of memory management with LRU and FIFO eviction strategies, disk overflow, data expiration, and many other optional advanced features, as opposed to writing your own. Having said this, there are cases where you might want to write your own simple caching solution, or you might be asked in a job interview: how would you implement an LRU cache in Java?
This is very similar to the question: if Java did not have a HashMap class, how would you go about implementing your own? That question was discussed, with the answer and key considerations, in the book "Core Java Career Essentials".
Q. What are the differences between HashMap, TreeMap (i.e. SortedMap), and LinkedHashMap?
A. All three classes implement the Map interface and offer mostly the same functionality. The differences lie in how the entries are iterated:
- HashMap does not offer any guarantees about the iteration order. Its iteration order can change completely when new elements are added.
- TreeMap will iterate according to the "natural ordering" of the keys, as defined by their compareTo() method, or based on an externally supplied Comparator. Additionally, it implements the SortedMap interface, which contains methods that depend on this sort order.
- LinkedHashMap will iterate in the order in which the entries were put into the map. LinkedHashMap also provides a great starting point for creating an LRU cache: by overriding the removeEldestEntry() method, you can create a cache that expires entries using criteria that you define.
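The iteration-order differences above can be seen side by side in a short, self-contained sketch (the class name and sample keys are illustrative):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapIterationOrderDemo {
    public static void main(String[] args) {
        // Identical insertions into each map; only the iteration order differs.
        String[] keys = {"banana", "apple", "cherry"};

        Map<String, Integer> hash = new HashMap<>();         // no order guarantee
        Map<String, Integer> tree = new TreeMap<>();         // natural (sorted) key order
        Map<String, Integer> linked = new LinkedHashMap<>(); // insertion order

        for (String k : keys) {
            hash.put(k, k.length());
            tree.put(k, k.length());
            linked.put(k, k.length());
        }

        System.out.println("TreeMap keys:       " + tree.keySet());   // [apple, banana, cherry]
        System.out.println("LinkedHashMap keys: " + linked.keySet()); // [banana, apple, cherry]
        System.out.println("HashMap keys:       " + hash.keySet());   // unspecified order
    }
}
```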
Q. What are the different types of algorithms that can be used to remove objects from the cache?
A. A variety of cache expiration mechanisms can remove objects from a cache. These algorithms are based on criteria such as least frequently used (LFU), least recently used (LRU), most recently used (MRU), first in first out (FIFO), last access time, and object size. Each algorithm has advantages and disadvantages. So, analyse your requirements and choose the most appropriate one.
In enterprise applications, object caching in a cluster is important because multiple JVMs run in a cluster, and it is crucial to keep all the cluster members' cached data in sync.
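Two of the eviction policies above, LRU and FIFO, can both be demonstrated with LinkedHashMap: the third constructor argument selects access order (LRU) or insertion order (FIFO), and removeEldestEntry() bounds the size. This is a minimal sketch; the class and helper names are my own:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EvictionOrderDemo {
    // A size-bounded map: accessOrder=true gives LRU eviction, false gives FIFO.
    static <K, V> Map<K, V> boundedMap(int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the eldest entry once over capacity
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> lru = boundedMap(2, true);
        Map<String, Integer> fifo = boundedMap(2, false);

        for (Map<String, Integer> m : List.of(lru, fifo)) {
            m.put("a", 1);
            m.put("b", 2);
            m.get("a");    // touch "a": promotes it in the LRU map only
            m.put("c", 3); // forces an eviction in both maps
        }

        System.out.println("LRU kept:  " + lru.keySet());  // [a, c] -- "b" was least recently used
        System.out.println("FIFO kept: " + fifo.keySet()); // [b, c] -- "a" was inserted first
    }
}
```

The same data and the same operations produce different survivors, which is exactly the trade-off the eviction-policy choice is about.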
Q. How will you implement an LRU cache in Java?
A. Here is the sample code in Java
Firstly, define an interface so that the implementation can be swapped out, for example if a ConcurrentLinkedHashMap class becomes available in the future. A typical interface for a Java cache will look like:
public interface LRUCache<K, V> {
    V put(K key, V item);
    V get(K key);
    V atomicGetAndSet(K key, V item);
}
Now, the implementation class.
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

public class LRUCacheImpl<K, V> implements LRUCache<K, V> {

    // SoftReference is used for a memory-friendly cache: under memory
    // shortage the referenced values are garbage collected, and lookups
    // for those keys will then simply return null.
    private final Map<K, SoftReference<V>> cache;

    public LRUCacheImpl(final int cacheSize) {
        // 'true' uses the access order instead of the insertion order.
        this.cache = new LinkedHashMap<K, SoftReference<V>>(cacheSize, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
                // Remove the eldest, i.e. least recently used (LRU), entry
                // once the maximum allowed size is exceeded.
                return size() > cacheSize;
            }
        };
    }

    @Override
    public synchronized V put(K key, V value) {
        SoftReference<V> previousValueReference = cache.put(key, new SoftReference<V>(value));
        return previousValueReference != null ? previousValueReference.get() : null;
    }

    @Override
    public synchronized V get(K key) {
        SoftReference<V> valueReference = cache.get(key);
        return valueReference != null ? valueReference.get() : null;
    }

    @Override
    public synchronized V atomicGetAndSet(K key, V value) {
        V result = get(key);
        put(key, value);
        return result;
    }
}
Here are some key points about the above code that are worth mentioning, not only in job interviews to stand out from your competition, but also to write quality code and get a thumbs-up in code reviews.
- The above code could have been written as LRUCacheImpl<K, V> extends LinkedHashMap<K, V>, but the GoF design principle favors composition over inheritance, as inheritance is more fragile to changes.
- SoftReference is used as opposed to a hard (strong) reference, to allow the cached values to be garbage collected when the Java heap runs low on memory.
- The methods are synchronized to make the cache thread-safe, since LinkedHashMap itself is not. If, in the future, a ConcurrentLinkedHashMap were available, it could be used instead of the LinkedHashMap.
- If performance is of utmost importance, then the public V get(K key) method, which will be executed most frequently, could be offloaded to a thread pool and executed asynchronously.
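One consequence of the SoftReference design is that a lookup can return null even for a key that was previously put, so callers should always treat a lookup as a possible miss and reload on null (the classic cache-aside pattern). A minimal, self-contained sketch, where loadFromDb and the DATABASE map are hypothetical stand-ins for an expensive data source, and a plain HashMap stands in for the LRUCacheImpl above:

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAsideDemo {
    // Hypothetical stand-in for an expensive lookup such as a database query.
    static final Map<String, String> DATABASE = Map.of("42", "Jane Doe");
    static int dbHits = 0;

    static String loadFromDb(String id) {
        dbHits++; // count the expensive calls so the cache benefit is visible
        return DATABASE.get(id);
    }

    public static void main(String[] args) {
        // Stand-in for LRUCacheImpl; the null-on-miss contract is the same.
        Map<String, String> cache = new HashMap<>();
        String id = "42";

        String name = cache.get(id);
        if (name == null) {            // miss: load from the source, then cache it
            name = loadFromDb(id);
            cache.put(id, name);
        }
        cache.get(id);                 // second lookup is served from the cache

        System.out.println(name + ", dbHits=" + dbHits); // Jane Doe, dbHits=1
    }
}
```

The point of the null check is not just correctness on a cold cache: with soft references, any entry can silently disappear between a put and the next get.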
Q. What are the different types of caches?
A. There are several types of caches:
- An application cache is a cache that an application accesses directly. An application benefits from a cache by keeping its most frequently accessed data in memory. This is also known as a first-level cache; in Hibernate, for example, the Session acts as the first-level cache, while providers like EHCache, OSCache, SwarmCache, or JBoss TreeCache plug in at the second level.
- Level 2 (aka L2) cache provides caching services to an object-relational mapping (ORM) framework or a data mapping (DM) framework such as Hibernate or iBatis respectively by reducing the number of trips to the database. An L2 cache hides the complexity of the caching logic from an application.
- In-memory data grids/clouds provide distributed caching, with powerful analysis and management tools, for managing fast-changing data across multiple servers, a compute grid, or the cloud. Examples include GridGain, Hazelcast, and Oracle Coherence. A data grid is a reliable distributed cache that uses an external data source to retrieve data that is not present in the cache, and an external data store to write back updates. You could roll your own distributed setup, for example an HTTP server farming out requests RESTfully, or messaging with JMS, RMI, or even old-school CORBA, but why reinvent the wheel when there are proven open-source frameworks like GridGain, Hazelcast, etc.?
Q. Will a WeakHashMap make a good cache? Explain your answer.
A. No, for two reasons:
- It uses weak references as the underlying memory-management mechanism. If the garbage collector determines that an object is only weakly reachable, it will atomically clear all weak references to that object. In contrast, if the collector determines that an object is softly reachable (i.e. held via a SoftReference rather than a WeakReference), it may atomically clear all soft references to it when it finds that memory is running low, or at its own discretion, which is much better suited to caching.
- In a WeakHashMap, the weak references are applied to the keys, not the values. For a cache, you want the values, which hold the bulk of the memory, to be the weakly (or softly) referenced part.
Q. Where would you use a WeakHashMap?
A. It is a good fit for canonical maps, where you associate some extra information with an object to which you hold a strong reference elsewhere. You put an entry in a WeakHashMap with the object as the key and the extra information as the map value. As long as you keep a strong reference to the object, you can check the map to retrieve the extra information; once you release the object, the map entry will be cleared and the memory used by the extra information can be reclaimed.
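The canonical-map behaviour can be observed directly: once the last strong reference to the key is dropped, the entry becomes eligible for removal. A small sketch (the class name is illustrative; note that garbage collection timing is not guaranteed, so the code only nudges the collector and waits):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class CanonicalMapDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> extraInfo = new WeakHashMap<>();
        Object key = new Object();               // strong reference we control
        extraInfo.put(key, "metadata for this object");

        System.out.println(extraInfo.size());    // 1 while 'key' is strongly reachable

        key = null;                              // drop the only strong reference
        // Entry removal is not instantaneous or strictly guaranteed, but it
        // normally happens promptly once the key is only weakly reachable.
        for (int i = 0; i < 50 && !extraInfo.isEmpty(); i++) {
            System.gc();
            Thread.sleep(10);
        }
        System.out.println(extraInfo.size());    // typically 0 after collection
    }
}
```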
5 Comments:
Nicely done ! Bookmarked.
Simple explanation on nontrivial stuff, well done!
Thanks for sharing your valuable insights about caching. Could you please point me to how to implement object caching in a cluster environment, and what considerations need to be noted? Can the above LRU cache in Java be used in a cluster environment? If two JVMs run on different machines, where can I share the objects between the two JVMs, and how do I keep the cached objects synchronized when new objects are added? I hope you got my questions. Thanks.
You need to look at distributed caching, for example EHCache with Terracotta, or EHCache with RMI. Have a look at http://ehcache.org/documentation/2.5/configuration/distributed-cache-configuration
This article is very helpful, but does it solve my problem?
http://stackoverflow.com/questions/24245989/ehcache-compex-query-for-different-conditions