Java-Success Blog

Jul 22, 2014

11 tips to writing low latency real-time applications in Java

Q. Have you seen job advertisements requiring Java candidates to work in real-time or high volume transaction processing systems? Wondering what questions you will be asked?

Real-time and low-latency are distinctly separate subjects although often related. Real-time is about being more predictable than fast. Low latency systems need to be fast to meet SLAs (Service Level Acceptances) in sub milliseconds (e.g. micro seconds).

Tip #1: Use a RTSJ (Real Time Specification for Java ) JVM. IBM, Oracle, and other smaller vendors have implemented this, but it comes at a cost. Oracle's JavaRT, IBM's real-time WebSpere, and aicas JamaicaVM to name a few popular ones. In real time JVM, instead of writing java.lang.Thread you just have to write, javax.realtime.RealtimeThread.

Tip #2: Big O notation for algorithms: Ensure all your data structures related algorithms are O(1) or at least O(log n). This is probably the biggest cause of performance issues. Make sure that you have performance tests with real size data. Also, make sure that your algorithms are cache friendly. It is imperative to use proper cache strategies to minimize garbage collection pauses by having proper cache expiry strategy, using weak references for cache, reducing cache by carefully deciding what to cache, increasing the cache size along with the heap memory to reduce object eviction from cache, etc. Understanding Big O notations through Java examples

Tip #3: Lock free: Use lock free algorithms and I/O. Even the most well designed concurrent application that uses locks is at risk of blocking. For example, the java.util.concurrent package that allows concurrent reads and the Java NIO (New I/O using non-blocking multiplexers) respectively. Blocking is not good for low latency applications. Minimize context switching among threads by having threads not more than the number of CPU cores in your machine.

Tip #4: Reduce memory size: Reduce the number of objects you create. Apply the flyweight design pattern where applicable. Favor stateless objects. Where applicable write immutable objects that can be shared between threads. Fewer objects mean lesser GC.

Tip #5: Tune your JVM: Tune your JVM with appropriate heap sizes and GC configuration. Before tuning profile your application with real life data. Basically you want to avoid GC pauses and increase GC throughput. GC throughput is a measure of % of time not spent on GC over a long period of time. Specialist GC collectors like the Azul collector can in many cases solve this problem for you out of the box, but for many you who use the Oracle's GC, you need to understand how GC works and tune it to minimize the pauses. The default JVM options optimize for throughput, and latencies could be improved by switching to the Concurrent Garbarge Collector.

GC tuning is very application specific. It is imperative to understand following topics

-- You need to first understand how your application uses the garbage collection. Memory is cheap and abundant on modern servers, but garbage collector pauses is a serious obstacle for using larger memory sizes. You should configure GC so that

Enable diagnostic options (-XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+PrintGCTimestamps).
Decide the total amount of memory you can afford for the JVM by graphing your own performance metric against young generation sizes to find the best setting.
Make plenty of memory available to the younger (i.e eden) generation. The default is calculated from NewRatio and the -Xmx setting.
Make the survival space to be same size as Eden (-XX:SurvivorRatio=1) and increase new space to account for growth of the survivor spaces (-XX:MaxNewSize= -XX:NewSize=)
Larger younger generation spaces increase the spacing between full GCs. But young space collections could take a proportionally longer time. In general, keep the eden size between one fourth and one third the maximum heap size. The old generation must be larger than the new generation.

Tip #6: Favor primitives to wrapper classes to eliminate auto-boxing and un-boxing: In situations where getter and setter methods are called very frequently for the wrapper classes like Integer, Float, Double, etc the performance is going to be adversely impacted due to auto boxing and unboxing. The operations like x++ will also provide poor performance if x is an Integer and not an int. So, avoid using wrappers in performance critical loops.

Tip #7: Good caching strategy and applying the short-circuit pattern

Short-circuit pattern is handy for I/O related patterns like socket or URL based, database operations, and complex File I/O operations. I/O operations need to complete within a short amount of time, but with low latency Web sites, the short-circuit pattern can be applied to time-out long running I/O tasks, and then can either display an error message or show cached results.

Tip #8: Coding best practices to avoid performance issues due to death by 1000 cuts.

When using arrays it is always efficient to copy arrays using System.arraycopy( ) than using a loop. The following example shows the difference.
When using short circuit operators place the expression which is likely to evaluate to false on extreme left if the expression contains &&.
Do not use exception handling inside loops.
Avoid using method calls to check for termination condition in a loop.
Short-circuit equals( ) in large object graphs where it compares for identity first

@Override
public boolean equals(Object other) {
    if (this == other) return true;
    if (other == null) return false;
 
    // Rest of equality logic...
}

Tip #9: Experience and knowledge with some of the libraries like

These libraries are aimed at providing reduced memory size, less impact on GC, lock free concurrent processing, data structure algorithmic efficiency, etc.

NIO-based scalable server applications by directly using java.nio package or framework like Apache MINA.
FIX protocol and commercial FIX libraries like Cameron FIX.
Use Java 5 concurrency utilities, and locks.
Lock free Java disruptor library for high throughput.
Chronicle Java library for low latency and high throughput, which almost uses no heap, hence has trivial impact on GC.
Trove collection libraries for primitives. Alternative for the JDK wrapper classes like java.lang.Integer for primitives requiring less space and providing better performance.
Javolution library with real-time classes. For example, Javolution XML provides real-time marshaling and unmarshaling.

Tip #10: How is your data stored? Are you using a SQL database? How will that scale? Can you use a NoSQL data tore instead. Transactional systems need SQL for transaction demarcation.

Relational and NoSQL data models are very different.

SQL Model:

The relational model takes data and store them in many normalized interrelated tables that contain rows and columns. Tables relate with each other through foreign keys. When looking up data, the desired information needs to be collected by joining many related tables and combined before it can be provided to the application.

NoSQL Model

NoSQL databases have a very different model. NoSQL databases have been built from the ground up to be distributed, scale-out technologies and therefore fit better with the highly distributed nature of the three-tier Internet architecture. A document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format. Each JSON document can be thought of as an object to be used by your application. This might relate to data agregated from 10+ tables in an SQL model.

Tip #11: Pay attention to network round trips, payload sizes and type, protocols used, service timeouts and retries.

Labels: Performance, Scalability

Dec 11, 2013

Scalable Straight Through Processing System (OLTP) vs OLAP in Java

Large mission critical applications use Straight Through Processing, and these systems need to be highly scalable. So, when you apply for these high paying jobs, it really pays to prepare for your job interviews with the following questions and answers.

Q. What is Straight Through Processing (STP)?
A. This is the definition from INVESTOPEDIA.

"An initiative used by companies in the financial world to optimize the speed at which transactions are processed. This is performed by allowing information that has been electronically entered to be transferred from one party to another in the settlement process without manually re-entering the same pieces of information repeatedly over the entire sequence of events."

Q. How will you go about designing a STP system?
A. Most conceptual architectures use a hybrid approach using a combination of different architectures based on the benefits of each approach and its pertinence to your situation. Here is a sample hybrid approach depicting an online trading system, which is a STP system.

The above system is designed for:

Placing Buy/Sell online trades real time. The trades are validated first and then sent all the way to Stock Exchange system using the FIX (Financial Information eXchange) protocol. Here some usefil links on FIX.

Are you going for programming jobs at investment banks or financial institutions? Knowing some investment and trading terms will be useful?

Once the trade is matched, the contract notes are asynchrnously issued via the SETTLEMENT queue, and processed by an ESB (Enterprise Service Bus) system like web Methods, Tibco, or Websophere MQ. Here are some useful links on asynchronous processing.

The above system is an operational OLTP (i.e. On-Line Transaction Processing) system. These systems are also known as STP (i.e. Straight Through Processing) system. This leads to another question.

Q. What is the difference between OLTP and OLAP?

A. OLTP stands for On-Line Transaction Processing and OLAP stands for On-Line Analytical Processing. OLAP contains a multidimensional or relational data store designed to provide quick access to pre-summarized data & multidimensional analysis.

MOLAP: Multidimensional OLAP – enabling OLAP by providing cubes.
ROLAP: Relational OLAP – enabling OLAP using a relational database management system

OLTP	OLAP
Creates operational source data from transactional systems as shown in the above diagram. This data is the source of truth for many other systems.	Data comes from various OLTP data sources as shown in the above diagram.
Transactional and normalized data is used for daily operational business activities.	Historical, de-normalized and aggregated multidimensional data is used for analysis and decision making. This is also known as for BI (i.e. Business Intelligence).
Data is inserted via short inserts and updates. The data is normally captured via user actions.	Periodic (i.e. scheduled) and long running (i.e. during off-peak) batch jobs refresh the data. Also, known as ETL process as shown in the below diagram.
The database design involves highly normalized tables.	The database design involves de-normalized tables for speed. Also, requires more indexes for the aggregated data.
Regular backup of data is required to prevent any loss of data, monetary loss, and legal liability.	Data can be reloaded from the OLTP systems if required. Hence, stringent backup is not required.
Transactional data older than certain period can be archived and purged based on the compliance requirements.	Contains historical data. The volume of this data will be higher as well due to its requirement to maintain historical data.
The typical users are operational staff.	The typical users are management and executives to make business decisions.
The space requirement is relatively small if the historical data is regularly archived.	The space requirement is larger due to the existence of aggregation structures and historical data. Also requires more spaces for the indexes.

Labels: Architecture, Scalability

Oct 3, 2013

Scaling your application -- Vertical Vs Horizontal scaling

Many organization face scalability and performance issues. So, it is really worth knowing your way around these topics.

Q. What is the difference between performance and scalability?
A. The performance and scalability are two different things.

For example, if you are in the business of transporting people in a horse carriage, the performance is all about utilizing more powerful horses to transporting your people quicker to their destination. Scalability is all about catering for increase in demand for such transportation as your business grows by either increasing the capacity of individual actors (e.g. carriage capacity) or adding more actors (e.g. horses and carriages).

Q. What are the different types of scalability?
A. Vertical and Horizontal scaling.

Vertical Scaling: You can increase the capacity of a horse carriage or use more powerful horses to reduce the time it takes to reach the destination. In a computer term, increase CPU, memory, etc to increase the capacity or tune the code/ database to reduce the time it takes to process. This means we have just increased the capacity of each actor -- horse and/or carriage. You can also vertically scale an application via multi-threading or using non-blocking I/O.

Horizontal Scaling: In a horizontal scaling model, instead of increasing the capacity of each individual actor in the system, we simply add more actors to the system. This means more horses and carriages. In terms of the computers, adding more nodes and servers.

Q. How will you scale your data store?
A. The scalability of database is critical because data is often a shared resource, and it becomes the main contact point for nearly every web request. The most important question you have to ask when considering the scalability of your database is, “What kind of system am I working with?” Are you working with a read-heavy or a write-heavy system?

Scaling Reads: If your website is primarily a read-centric system, vertically scale your data store with a caching strategy that uses memory cache (e.g. ehcache) or a CDN (Content Delivery Newtork). You can also add more CPU/RAM/Disk to scale vertically.

Scaling Writes: If your website is primarily a write-heavy system, you want to think about using a horizontally scalable datastore such as MongoDB (NoSQL database), Riak, Cassandra or HBase. MongoDB is a NoSQL database with great features like replication and sharding built in. This allows you to scale your database to as many servers as you would like by distributing content among them. A database shard ("sharding") is the phrase used to describe a horizontal partition in a database or search engine. The idea behind sharding is to split data among multiple machines while ensuring that the data is always accessed from the correct place. Since sharding spreads the database across multiple machines, the database programmer specifies explicit sharding rules to determine which machines any piece of data will be stored on. Sharding may also be referred to as horizontal scaling or horizontal partitioning. Oracle uses (RAC - Real Application Cluster) where small server blades are genned-in to an Oracle RAC cluster over a high-speed interconnect.

Q. What is BigData?
A. Big data is the term for a collection of data sets so large and complex that becomes very difficult to work with using most relational database management systems and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers". Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.

Hadoop uses MapReduce to understand and assign work to nodes in the cluster and HDFS(Hadoop Distributed File System), which is file system that spans all the nodes in a Hadoop cluster for data storage.

Q. What are the general scaling practices for a medium size system in Java?
A.

Using non-blocking IO and favoring multi-threading.
Vertical scaling -- more CPU, RAM, etc.
Caching data.
Favor stateless idempotent methods.
Using big JVM heaps
Using JMS -- publish/subscribe model
Using resource pooling - e.g. database connection pooling, JMS connection factory pooling, thread pooling, etc.

Q. What are the general scaling practices for a large size system in Java?
A.

Use RTSJ (Real Time Specification for Java): Java has the following real time difficulties:

During garbage collection all threads are blocked and the garbage collection time can expand to minutes. These huge latencies effectively limit memory which limits scalability.
Increased garbage collection latencies make Java less useful for application that use heart beats, make real-time trades, etc.
Java supports a strict priority based threading model.

To overcome this, the Java Community introduced a specification for real-time Java, JSR001 (RTSJ -- Real Time Specification for Java)

RTSJ addressed these critical issues by mandating a minimum specification for the threading model (and allowing other models to be plugged into the VM) and by providing for areas of memory that are not subject to garbage collection, along with threads that are not preemptable by the garbage collector. These areas are instead managed using region-based memory management.

Use Big Data like MongDB.
Use distributed cache.
Use Server clusters/JVM clustering (e.g. terracotta).
SEDA based architecture.
Cloud computing.