Jul 22, 2014

11 tips for writing low-latency real-time applications in Java

Q. Have you seen job advertisements requiring Java candidates to work in real-time or high volume transaction processing systems? Wondering what questions you will be asked?

Real-time and low latency are distinct subjects, although they are often related. Real-time is about being predictable more than about being fast. Low-latency systems need to be fast enough to meet SLAs (Service Level Agreements) measured in sub-millisecond (e.g. microsecond) response times.

Tip #1: Use an RTSJ (Real-Time Specification for Java) JVM. IBM, Oracle, and other smaller vendors have implemented this, but it comes at a cost. Oracle's JavaRT, IBM's WebSphere Real Time, and aicas JamaicaVM are a few popular ones. In a real-time JVM, you write javax.realtime.RealtimeThread instead of java.lang.Thread.

Tip #2: Big O notation for algorithms: Ensure that the algorithms behind your data structures are O(1), or at least O(log n). Poor algorithmic complexity is probably the biggest cause of performance issues. Make sure that you have performance tests with realistically sized data. Also, make sure that your algorithms are cache friendly. It is imperative to use proper caching strategies to minimize garbage collection pauses: have a proper cache expiry strategy, use weak references for cached entries, reduce the cache by carefully deciding what to cache, increase the cache size along with the heap memory to reduce object eviction from the cache, etc. Related post: Understanding Big O notations through Java examples
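As a rough illustration of a bounded cache with O(1) average lookups, here is a minimal sketch built on java.util.LinkedHashMap. The class name LruCache and the maxEntries cap are assumptions made for this example, not something from a particular library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded LRU cache: get/put are O(1) on average, and the size cap bounds heap usage. */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // hypothetical cap; tune it against real-size data

    LruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put: evict the least recently used entry once over the cap
        return size() > maxEntries;
    }
}
```

Note that LinkedHashMap is not thread-safe, so a cache like this would need external synchronization or a concurrent alternative if it is shared between threads.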

Tip #3: Lock free: Use lock-free algorithms and non-blocking I/O. Even the most well-designed concurrent application that uses locks is at risk of blocking, and blocking is not good for low-latency applications. For example, use the java.util.concurrent package, whose data structures allow concurrent reads, and Java NIO (New I/O using non-blocking multiplexers), respectively. Minimize context switching among threads by running no more threads than the number of CPU cores in your machine.
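A minimal sketch of both ideas, assuming Java 8 lambdas: a counter that is updated with a compare-and-set retry loop instead of a lock, driven by a thread pool sized to the number of cores. The class and method names (LockFreeCounter, runDemo) are made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class LockFreeCounter {
    private final AtomicLong value = new AtomicLong();

    /** Lock-free increment: retry the compare-and-set until no other thread interfered. */
    long increment() {
        while (true) {
            long current = value.get();
            if (value.compareAndSet(current, current + 1)) {
                return current + 1;
            }
        }
    }

    long get() {
        return value.get();
    }

    /** Runs the given number of tasks on a pool with one thread per CPU core. */
    static long runDemo(int tasks, int incrementsPerTask) {
        LockFreeCounter counter = new LockFreeCounter();
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerTask; i++) {
                    counter.increment();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

In real code AtomicLong.incrementAndGet() does the same job in one call; the explicit loop is only there to show what a lock-free update looks like.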

Tip #4: Reduce memory size: Reduce the number of objects you create. Apply the flyweight design pattern where applicable. Favor stateless objects. Where applicable, write immutable objects that can be shared between threads. Fewer objects mean less work for the GC.
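One possible shape for the flyweight idea is sketched below, assuming Java 8. CurrencyCode is a hypothetical immutable value class invented for this example; the factory method hands out one shared instance per code instead of allocating a new object on every call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Immutable value object whose instances are shared through a flyweight factory. */
final class CurrencyCode {
    private static final Map<String, CurrencyCode> POOL = new ConcurrentHashMap<>();

    private final String code; // final field, no setters: safe to share between threads

    private CurrencyCode(String code) {
        this.code = code;
    }

    /** Returns the single shared instance for this code, creating it on first use. */
    static CurrencyCode of(String code) {
        return POOL.computeIfAbsent(code, CurrencyCode::new);
    }

    String code() {
        return code;
    }
}
```

Because every caller asking for "USD" gets the same object, a million trades referencing it cost one allocation rather than a million.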

Tip #5: Tune your JVM: Tune your JVM with appropriate heap sizes and GC configuration. Before tuning, profile your application with real-life data. Basically, you want to avoid GC pauses and increase GC throughput. GC throughput is the percentage of time not spent on GC over a long period of time. Specialist collectors like the Azul collector can in many cases solve this problem for you out of the box, but for the many of you who use Oracle's GC, you need to understand how GC works and tune it to minimize the pauses. The default JVM options optimize for throughput, and latencies could be improved by switching to the concurrent garbage collector.
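To get a rough feel for GC throughput from inside a running application, something like the sketch below can be used. It relies on the standard java.lang.management MXBeans; the class name GcThroughput is made up for this example, and the collection times reported by the JVM are approximate, so treat the number as an indication rather than a measurement of pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcThroughput {
    /** Approximate percentage of JVM uptime that was not spent in garbage collection. */
    static double percent() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // accumulated time in ms, or -1 if undefined
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 100.0;
        }
        double nonGcMillis = Math.max(0, uptimeMillis - gcMillis);
        return 100.0 * nonGcMillis / uptimeMillis;
    }
}
```

Logging this value periodically under real-life load gives a baseline to compare against after each tuning change.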

GC tuning is very application specific. It is imperative to understand the following topics.

-- You first need to understand how your application uses garbage collection. Memory is cheap and abundant on modern servers, but garbage collector pauses are a serious obstacle to using larger memory sizes. When configuring GC:
  • Enable diagnostic options (-XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps).
  • Decide the total amount of memory you can afford for the JVM, then graph your own performance metric against young generation sizes to find the best setting.
  • Make plenty of memory available to the young (i.e. eden) generation. The default is calculated from NewRatio and the -Xmx setting.
  • Make the survivor spaces the same size as eden (-XX:SurvivorRatio=1) and increase the new space to account for the growth of the survivor spaces (-XX:MaxNewSize= -XX:NewSize=).
  • Larger young generation spaces increase the spacing between full GCs, but young space collections could take proportionally longer. In general, keep the eden size between one fourth and one third of the maximum heap size. The old generation must be larger than the new generation.

Tip #6: Favor primitives over wrapper classes to eliminate auto-boxing and unboxing: Where getter and setter methods are called very frequently on wrapper types like Integer, Float, Double, etc., performance is adversely impacted by auto-boxing and unboxing. Operations like x++ also perform poorly if x is an Integer and not an int. So, avoid using wrappers in performance-critical loops.
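The difference is easy to see in a small sketch. BoxingDemo and its two methods are hypothetical names for this example; both loops compute the same sum, but the first one unboxes and re-boxes the accumulator on every iteration.

```java
class BoxingDemo {
    /** Boxed accumulator: each += unboxes sum, adds, and boxes the result back into a Long. */
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Primitive accumulator: same arithmetic with no boxing and no extra objects. */
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}
```

The only change between the two methods is the declared type of sum, which is why this kind of slowdown is easy to introduce by accident.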

Tip #7: Good caching strategy and applying the short-circuit pattern

The short-circuit pattern is handy for I/O-bound work such as socket or URL based calls, database operations, and complex file I/O. I/O operations need to complete within a short amount of time, so on low-latency Web sites the short-circuit pattern can be applied to time out long-running I/O tasks and then either display an error message or show cached results.
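A minimal sketch of the idea, assuming the I/O call can be wrapped in a Callable and that a cached value is already at hand. ShortCircuit and callWithTimeout are hypothetical names for this example; the timeout itself is done with the standard Future.get(timeout, unit).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ShortCircuit {
    // Daemon threads so that a stuck I/O task cannot keep the JVM alive
    private static final ExecutorService IO_POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "io-worker");
        t.setDaemon(true);
        return t;
    });

    /** Runs the I/O task, but falls back to the cached value if it is too slow or fails. */
    static <T> T callWithTimeout(Callable<T> ioTask, long timeoutMillis, T cachedFallback) {
        Future<T> future = IO_POOL.submit(ioTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the long-running task
            return cachedFallback;
        } catch (ExecutionException e) {
            return cachedFallback; // the I/O task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return cachedFallback;
        }
    }
}
```

A caller would pass something like a database lookup as ioTask and the last known good result as cachedFallback; whether to show the cached result or an error message is then a decision for the calling code.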

Tip #8: Coding best practices to avoid performance issues due to death by 1000 cuts.

  • When copying arrays, System.arraycopy( ) is generally more efficient than copying element by element in a loop.
  • When using the short-circuit operator &&, place the expression that is most likely to evaluate to false on the extreme left.
  • Do not use exception handling inside loops.
  • Avoid using method calls to check for the termination condition in a loop.
  • Short-circuit equals( ) in large object graphs by comparing for identity first:

public boolean equals(Object other) {
    if (this == other) return true;   // same instance: skip the deep comparison
    if (other == null) return false;  // nothing to compare against
    // Rest of equality logic...
}
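The array copy and short-circuit operator tips can be sketched as follows. CodingTipsDemo, isValidOrder, and the order rule it checks are all invented for this example.

```java
class CodingTipsDemo {
    /** Element-by-element copy; the array length is read once, not on every iteration. */
    static int[] copyWithLoop(int[] src) {
        int length = src.length;
        int[] dest = new int[length];
        for (int i = 0; i < length; i++) {
            dest[i] = src[i];
        }
        return dest;
    }

    /** Bulk copy: a single call that the JVM can perform as a block memory move. */
    static int[] copyWithArraycopy(int[] src) {
        int[] dest = new int[src.length];
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }

    /**
     * Hypothetical validation rule: the cheap quantity check is on the far left,
     * so the more expensive regular expression match is skipped whenever it fails.
     */
    static boolean isValidOrder(String symbol, int quantity) {
        return quantity > 0 && symbol != null && symbol.matches("[A-Z]{1,5}");
    }
}
```

As always with micro-optimizations like these, measure with real-size data before and after; the JIT compiler already optimizes many simple loops.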

Tip #9: Experience and knowledge with some of the libraries listed below

These libraries are aimed at providing reduced memory size, less impact on GC, lock-free concurrent processing, data structure algorithmic efficiency, etc.
  • NIO-based scalable server applications, written directly against the java.nio package or with a framework like Apache MINA.
  • The FIX protocol and commercial FIX libraries like Cameron FIX.
  • The Java 5 concurrency utilities and locks.
  • The lock-free Java Disruptor library for high throughput.
  • The Chronicle Java library for low latency and high throughput, which uses almost no heap and hence has a trivial impact on GC.
  • The Trove collection libraries for primitives. They are an alternative to collections of JDK wrapper classes like java.lang.Integer, requiring less space and providing better performance.
  • The Javolution library with real-time classes. For example, Javolution XML provides real-time marshaling and unmarshaling.

Tip #10: How is your data stored? Are you using a SQL database? How will that scale? Can you use a NoSQL data store instead? Transactional systems need SQL for transaction demarcation.

Relational and NoSQL data models are very different.

SQL Model:

The relational model takes data and stores it in many normalized, interrelated tables that contain rows and columns. Tables relate to each other through foreign keys. When looking up data, the desired information needs to be collected by joining many related tables and combined before it can be provided to the application.

NoSQL Model:

NoSQL databases have a very different model. They have been built from the ground up to be distributed, scale-out technologies and therefore fit better with the highly distributed nature of the three-tier Internet architecture. A document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. A single document might hold data aggregated from 10+ tables in a SQL model.

Tip #11: Pay attention to network round trips, payload sizes and type, protocols used, service timeouts and retries.
