
Jan 31, 2013

Hibernate automatic dirty checking of persistent objects and handling detached objects



Q. What do you understand by automatic dirty checking in Hibernate?
A. Dirty checking is a Hibernate feature that saves you the effort of explicitly updating the database when the state of a persistent object is modified inside a transaction. Hibernate monitors all persistent objects, detects which ones have been modified, and issues UPDATE statements for them.

The Hibernate Session contains a PersistenceContext object that maintains a cache, as a Map, of all the objects read from the database. So when you modify an object within the same session, Hibernate compares its current state against the cached snapshot and triggers the necessary updates when the session is flushed. The objects in the PersistenceContext are persistent objects.

Q. How do you perform dirty checks for detached objects?
A. When the session is closed, the PersistenceContext is lost and so is the cached copy, and the persistent object becomes a detached object. Detached objects can be passed all the way up to the presentation layer. When you reattach a detached object through the merge( ), update( ), or saveOrUpdate( ) methods, a new session is created with an empty PersistenceContext, hence there is nothing to compare against to perform the dirty check. You can overcome this with the annotation attribute selectBeforeUpdate = true (it defaults to false), as described below.


To save changes to a detached object, you do something like this:


employee.setLastname("Smith");              // modifying a detached object
Session sess = sessionFactory.openSession(); // open a new session with an empty PersistenceContext
Transaction tx = sess.beginTransaction();   //begin a new transaction
sess.update(employee);                      //The detached object becomes a persistent object by reattaching
employee.setFirstName("John");              //modify the persistent object 
tx.commit();


When the update() call is made, Hibernate issues an SQL UPDATE statement. This happens irrespective of whether the object has changed since it was detached. One way to avoid this redundant UPDATE while reattaching is to set select-before-update = "true". With this setting, Hibernate determines whether the UPDATE is needed by first executing a SELECT statement.

@Entity
@org.hibernate.annotations.Entity(selectBeforeUpdate = true)
@Table(name = "tbl_employee")
public class Employee extends MyAppDomainObject implements Serializable {
    .....
}


In rare scenarios, where you are confident that a particular object can never be modified, you can tell Hibernate that an UPDATE statement will never be needed by setting the following annotation:

@Entity
@org.hibernate.annotations.Entity(mutable = false)
@Table(name = "tbl_employee")
public class Employee extends MyAppDomainObject implements Serializable {
    .....
}


Alternatively, if you want to decide for yourself whether an object's state has changed, and hence whether a redundant update call should be made, you can implement your own DirtyCheckInterceptor by either implementing Hibernate's Interceptor interface or extending the EmptyInterceptor class. The interceptor can be bootstrapped as shown below:

sessionFactory.openSession( new DirtyCheckInterceptor() );


Hibernate's FlushEntityEventListener, in its onFlushEntity implementation, then calls the registered interceptor before making an update call.

Another possible approach is to clone the detached objects, store them somewhere like the HttpSession, and then use them to populate the PersistenceContext when reattaching the object. For example,

Session session = sessionFactory.openSession();
PersistenceContext persistenceContext = session instanceof SessionImpl ? ((SessionImpl) session).getPersistenceContext() : null;

if (persistenceContext != null) {
  addPreviouslyStoredEntitiesToPersistenceContext(persistenceContext, storedObjects);
}



Q. What do you understand by the terms optimistic locking versus pessimistic locking?
A. Optimistic locking means a specific record in the database table is open to all users/sessions. Optimistic locking uses a strategy where you read a record, make a note of the version number, and check that the version number hasn't changed before you write the record back. When you write the record back, you filter the update on the version to make sure it hasn't been updated between when you checked the version and when you wrote the record. If the record is dirty (i.e. has a different version from yours), you abort the transaction and the user can restart it.
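The read-check-write cycle described above can be sketched in plain Java. This is an in-memory simulation of the version check, not Hibernate's implementation; the class and method names are made up for illustration:

```java
// Minimal in-memory record with a version column, simulating optimistic locking.
class VersionedRecord {
    private String value;
    private long version = 0;

    public synchronized long getVersion() { return version; }
    public synchronized String getValue() { return value; }

    // Mimics "UPDATE ... SET value = ?, version = version + 1 WHERE version = ?":
    // the write succeeds only if the version read earlier is still current.
    public synchronized boolean updateIfVersionMatches(String newValue, long expectedVersion) {
        if (version != expectedVersion) {
            return false; // someone else updated in between; the caller must retry
        }
        value = newValue;
        version++;
        return true;
    }
}

public class OptimisticLockDemo {
    public static void main(String[] args) {
        VersionedRecord record = new VersionedRecord();

        long v = record.getVersion();                           // read, note the version
        boolean first = record.updateIfVersionMatches("A", v);  // version still matches: write succeeds
        boolean second = record.updateIfVersionMatches("B", v); // stale version: write rejected

        System.out.println(first + " " + second + " " + record.getValue()); // prints "true false A"
    }
}
```

The WHERE-clause filter on the version column is what makes the second, stale write fail instead of silently overwriting the first.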

You could also use other strategies, like checking a timestamp or all the modified fields (useful for legacy tables that don't have a version number or timestamp column). Note: the version-number and timestamp strategies work well with detached Hibernate objects too. Hibernate automatically manages the version numbers.

In Hibernate, you can use either a long number or a Date for versioning:

@Version
private long version;


or

@Version
private Date version;


and mark the entity:

@Entity
@org.hibernate.annotations.Entity(selectBeforeUpdate = true, optimisticLock=OptimisticLockType.VERSION)
@Table(name = "tbl_employee")
public class Employee extends MyAppDomainObject implements Serializable {
    .....
}


If you have a legacy table that does not have a version or timestamp column, then use either

@Entity
@org.hibernate.annotations.Entity(selectBeforeUpdate = true, optimisticLock=OptimisticLockType.ALL)
@Table(name = "tbl_employee")
public class Employee extends MyAppDomainObject implements Serializable {
    .....
}


for all fields and

@Entity
@org.hibernate.annotations.Entity(selectBeforeUpdate = true, optimisticLock=OptimisticLockType.DIRTY)
@Table(name = "tbl_employee")
public class Employee extends MyAppDomainObject implements Serializable {
    .....
}


for dirty fields only.

Pessimistic locking means a specific record in the database table is open for read/write only within the current session. Other session users cannot edit the same record because you lock it for your exclusive use until you have finished with it. It has much better integrity than optimistic locking, but requires you to be careful with your application design to avoid deadlocks. With pessimistic locking, appropriate transaction isolation levels need to be set so that records can be locked at different levels. The general isolation levels are

  • Read uncommitted isolation
  • Read committed isolation
  • Repeatable read isolation
  • Serializable isolation


It can be dangerous to use "read uncommitted isolation" as it exposes one transaction's uncommitted changes to a different transaction. "Serializable isolation" protects against phantom reads, but phantom reads are not usually problematic and this isolation level tends to scale very poorly. So, if you are using pessimistic locking, read committed and repeatable read are the most common levels.
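In plain JDBC, these four isolation levels map to constants on java.sql.Connection. A minimal sketch; the surrounding connection and transaction handling is assumed:

```java
import java.sql.Connection;
import java.sql.SQLException;

public class IsolationLevelExample {

    // Sets "read committed" isolation on a JDBC connection before locking work begins.
    public static void useReadCommitted(Connection connection) throws SQLException {
        connection.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    }

    public static void main(String[] args) {
        // The four general isolation levels listed above, as JDBC constants:
        System.out.println("READ_UNCOMMITTED = " + Connection.TRANSACTION_READ_UNCOMMITTED);
        System.out.println("READ_COMMITTED   = " + Connection.TRANSACTION_READ_COMMITTED);
        System.out.println("REPEATABLE_READ  = " + Connection.TRANSACTION_REPEATABLE_READ);
        System.out.println("SERIALIZABLE     = " + Connection.TRANSACTION_SERIALIZABLE);
    }
}
```

Note that not every database supports every level; the driver may silently substitute the nearest supported one.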


Jan 29, 2013

Java coding -- reverse enum lookup and sorting objects with a comparator

This blog post covers three core Java concepts:

1) Writing value objects to store data.
2) Writing enum classes with reverse look-up.
3) Lastly, but most importantly, custom sorting your value objects with a comparator.


Step 1: Writing your value object class.

package com.myapp.model;

import java.io.Serializable;
import java.math.BigDecimal;
import java.util.Date;

public class CashForecastSummary implements Serializable
{
 private static final long serialVersionUID = 2663449836220393299L;
 
 private String portfolioCode;
 private Date valuationDate;
 private String accountCd; 
 private BigDecimal totalAmount = BigDecimal.ZERO;
 private String currencyCode;
 private String transactionTypeDesc;
 private ForecastDay foreCastDay;
 private Date foreCastDate;  
 private RecordType recordType;
 
 public String getPortfolioCode() {
  return portfolioCode;
 }
 public void setPortfolioCode(String portfolioCode) {
  this.portfolioCode = portfolioCode;
 }
 public Date getValuationDate() {
  return valuationDate;
 }
 public void setValuationDate(Date valuationDate) {
  this.valuationDate = valuationDate;
 }
 public String getAccountCd() {
  return accountCd;
 }
 public void setAccountCd(String accountCd) {
  this.accountCd = accountCd;
 }
 public BigDecimal getTotalAmount() {
  return totalAmount;
 }
 public void setTotalAmount(BigDecimal totalAmount) {
  this.totalAmount = totalAmount;
 }
 public String getCurrencyCode() {
  return currencyCode;
 }
 public void setCurrencyCode(String currencyCode) {
  this.currencyCode = currencyCode;
 }
 public String getTransactionTypeDesc() {
  return transactionTypeDesc;
 }
 public void setTransactionTypeDesc(String transactionTypeDesc) {
  this.transactionTypeDesc = transactionTypeDesc;
 }
 public ForecastDay getForeCastDay() {
  return foreCastDay;
 }
 public void setForeCastDay(ForecastDay foreCastDay) {
  this.foreCastDay = foreCastDay;
 } 
 public Date getForeCastDate() {
  return foreCastDate;
 }
 public void setForeCastDate(Date foreCastDate) {
  this.foreCastDate = foreCastDate;
 }
 public RecordType getRecordType() {
  return recordType;
 }
 public void setRecordType(RecordType recordType) {
  this.recordType = recordType;
 }
 
 @Override
 public String toString() 
 {
  return "CashForecastSummary [portfolioCode=" + portfolioCode
    + ", valuationDate=" + valuationDate + ", accountCd="
    + accountCd + ", totalAmount=" + totalAmount
    + ", currencyCode=" + currencyCode + ", transactionTypeDesc="
    + transactionTypeDesc + ", foreCastDay=" + foreCastDay
    + ", recordType=" + recordType + "]";
 }
 
 @Override
 public int hashCode() {
  final int prime = 31;
  int result = 1;
  result = prime * result
    + ((accountCd == null) ? 0 : accountCd.hashCode());
  result = prime * result
    + ((foreCastDay == null) ? 0 : foreCastDay.hashCode());
  result = prime
    * result
    + ((transactionTypeDesc == null) ? 0 : transactionTypeDesc
      .hashCode());
  return result;
 }
 
 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  CashForecastSummary other = (CashForecastSummary) obj;
  if (accountCd == null) {
   if (other.accountCd != null)
    return false;
  } else if (!accountCd.equals(other.accountCd))
   return false;
  if (foreCastDay != other.foreCastDay)
   return false;
  if (transactionTypeDesc == null) {
   if (other.transactionTypeDesc != null)
    return false;
  } else if (!transactionTypeDesc.equals(other.transactionTypeDesc))
   return false;
  return true;
 }

}



The hashCode( ), equals(Object obj), and toString( ) methods from the java.lang.Object class are overridden. The class also implements the Serializable interface so that its instances can be serialized (i.e. flattened to bytes).

Step 2a: Writing a simple enum class.

package com.myapp.model;

public enum RecordType 
{
 OPENING_BALANCE(1),
    DETAILS(2),
    CLOSING_BALANCE(3);

    private final int recordType;
    
    private RecordType(int recordType)
    {
     this.recordType = recordType;
    }
    
    public int getRecordType()
    {
        return recordType;
    }

}



Step 2b: Writing an enum class with reverse look-up using a Map.

package com.myapp.model;

import java.util.HashMap;
import java.util.Map;

public enum ForecastDay {
 T(0), T1(1), T2(2), T3(3), T4(4), T5(5), T6(6), T7(7), T8(8), T9(9), T10(10), T11(11), T12(12), T13(13), TN(14);

 // Reverse-lookup map for getting a day from an abbreviation
 private static final Map<Integer, ForecastDay> lookup = new HashMap<Integer, ForecastDay>();
 static {
  for (ForecastDay d : ForecastDay.values())
   lookup.put(d.getDaycount(), d);
 }

 private final int dayCount;

 private ForecastDay(int dayCount) {
  this.dayCount = dayCount;
 }

 public int getDaycount() {
  return dayCount;
 }

 public static ForecastDay getDayFromCount(int dayCount) {
  if (dayCount > 13) {
   dayCount = 14;
  }
  return lookup.get(dayCount);
 }

}


Step 3: Writing a comparator class to sort a collection of value objects.



package com.myapp.model.sort;

import java.util.Comparator;

import com.myapp.model.CashForecastSummary;

public class CashForecastSummaryComparator implements Comparator<CashForecastSummary> {

 @Override
 public int compare(CashForecastSummary o1, CashForecastSummary o2) {

  int result = 1;

  String accountCd1 = o1.getAccountCd();
  String accountCd2 = o2.getAccountCd();

  if (accountCd1 == null || accountCd2 == null) {
   throw new IllegalArgumentException("The account codes cannot be null " + " accountCd1=" + accountCd1 + ", accountCd2=" + accountCd2);
  }
  
  result = accountCd1.compareTo(accountCd2);
  if (result != 0) {
   return result;
  }
  
  Integer forecastDay1 = o1.getForeCastDay().getDaycount();
  Integer forecastDay2 = o2.getForeCastDay().getDaycount();

  if (forecastDay1 == null || forecastDay2 == null) {
   throw new IllegalArgumentException("The forecastDay cannot be null " + " forecastDay1=" + forecastDay1 + ", forecastDay2=" + forecastDay2);
  }

  result = forecastDay1.compareTo(forecastDay2);
  if (result != 0) {
   return result;
  }

  Integer recType1 = o1.getRecordType().getRecordType();
  Integer recType2 = o2.getRecordType().getRecordType();

  if (recType1 == null || recType2 == null) {
   throw new IllegalArgumentException("The recordType cannot be null " + " recType1=" + recType1 + ", recType2=" + recType2);
  }

  result = recType1.compareTo(recType2);
  if (result != 0) {
   return result;
  }
  
  result = o1.getTransactionTypeDesc().compareTo(o2.getTransactionTypeDesc());
  
  return result;
 }

}


As you can see, you need to implement the compare(CashForecastSummary o1, CashForecastSummary o2) method. This method has been written with fail-fast in mind: the input fields are validated, and an exception is thrown if validation fails.


Step 4:  Finally, how to use the comparator.

    
 ....

 /**
  * Takes unsorted values as an argument and returns the sorted values.
  */
 public List<CashForecastSummary> getSortedSummariesForBalanceUpdate(List<CashForecastSummary> values) {
  Collections.sort(values, new CashForecastSummaryComparator());
  return values;
 }
 
 
 ....
 



Jan 21, 2013

Top 20+ Java EE interview questions and answers that experienced Java developers must know

Java Interview Questions and Answers


If you are an interviewer -- it really pays to ask well rounded questions to judge the real experience of the candidate as opposed to just relying on the number of years.

If you are an interviewee -- you have no control over what questions will be asked in Java job interviews, but a few Java interview questions come up very frequently, and it really pays to prepare for them. Here are a few such questions, based on some of my recent interviews. Brushing up on these answers will boost your success rate in Java job interviews.

The focus is on judging your experience and technical strength. So, go through your resume and reflect back on your hands-on experience. Many of these answers will also serve you well in selling your technical strengths to open ended questions like -- tell me about yourself? , why do you like software development?, when reviewing others' work, what do you look for?, what was your most significant accomplishment?, etc

Q1. Can you describe the high-level architecture of the application that you worked on recently? Or, can you give a 100-foot bird's-eye view of the application you were involved in as a developer or designer?

A1. The purpose is to judge your overall experience. Here are a few links that will prepare you for this most sought-after question, especially for experienced professionals.
  • 3 to n-tier enterprise Java architecture, the MVC-2 design pattern, the distinction between physical and logical tiers, and the layered architecture. The logical tiers shown in the diagrams for the above link are basically layers, and layers are used for organizing your code. Typical layers include Presentation, Business, and Data, the same as the traditional 3-tier model. But when we talk about layers, we are only talking about the logical organization of code. Physical tiers, however, are about where the code gets deployed and run as a war, ear, or jar. Specifically, tiers are places where layers are deployed and run. In other words, tiers are the physical deployment of layers.
  • High-level architectural patterns and the pros and cons of each approach. Discusses a number of different architectural styles like MVC, SOA, UI Component, RESTful data composition, HTML data composition, plug-in architecture, and Event Driven Architecture (EDA). Also covers a sample enterprise architecture, which is a hybrid of a number of architectural styles.
  • Single-page web design. A single rich web page loads different sections (e.g. tabs) by making ajax calls to the relevant RESTful web services, which return JSON data used to render those sections.
  • Understanding of various types of application integration styles. Enterprise applications need to be integrated with other systems, and this post covers different integration styles like sharing the same database, feed based, RPC style (e.g. web services), and exchanging JMS messages.


Q2. How would you go about deciding between SOAP based web service and RESTful web service?
A2. Web services are very popular and widely used to integrate disparate systems. It is imperative to understand the differences, pros, and cons between each approach. Recently, I have been asked these questions very frequently, here is the answer that discusses the pros and cons of SOAP vs. RESTful web services in detail.  

Q3. How will you go about ensuring that you build a more robust application? Or, how do you improve the quality of your application?
A3. This question is very popular with the interviewers because they want to hire candidates who write good quality application. Here are some useful links to prepare for this question.
Q4. What are some of the best practices regarding SDLC? Have you worked in an agile development environment? Did you use test driven development and continuous integration?
A4


Q5: Can you write code? How would you go about implementing .....?
A5:  You will be asked to write a small function or a program. The interviewer will be looking for things like
  • How well do you analyze the requirements and the solution? Ask relevant questions if the requirements are not clear. Think out loud so that the interviewer knows how you are approaching the problem. The approach is more important than arriving at the correct solution.
  • List all possible alternatives.
  • Write pseudo code.
  • Write unit tests.
  • Where required, brainstorm with the interviewer.
  • Find ways to improve your initial solution. Talk through thread safety and best practices where applicable.
Here are some links to popular coding questions and answers.
  • Java coding interview questions and answers: covers a number of coding questions with answers. These are some of the popular coding interview questions, like reversing a string, calculating Fibonacci numbers, recursion vs. iteration, equals( ) versus ==, etc.
Coding Questions and Answers worth brushing up on, especially before the screening written tests.
Q. Can you write Java code to perform a computation over a collection of numbers supplied to it? The computation could be addition, subtraction, etc. Use recursion to compute the result. Here are some requirements to take into consideration.

1. It should be flexible enough to convert from Recursion to iteration if required.
2. Computation will initially be "addition", but should be extendable to multiplication, subtraction, etc.
3. Should handle integers and floating point numbers.
4. Make use of generics.

A. Java coding with recursion and generics

Q. The SimpleDateFormatter and DecimalFormatter are not thread-safe. How will you use them in a thread-safe manner?
A. Using the ThreadLocal class as demonstrated in this tutorial.
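A minimal sketch of the ThreadLocal approach (the class name is illustrative, not from the linked tutorial): each thread lazily gets its own SimpleDateFormat instance, so the non-thread-safe formatter is never shared.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class ThreadSafeFormatter {

    // One SimpleDateFormat per thread; initialValue() runs on first access per thread.
    private static final ThreadLocal<SimpleDateFormat> FORMATTER =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    return new SimpleDateFormat("yyyy-MM-dd");
                }
            };

    public static String format(Date date) {
        // get() returns the calling thread's own instance, so no synchronization is needed.
        return FORMATTER.get().format(date);
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // the epoch, formatted by this thread's instance
    }
}
```

The same pattern applies to DecimalFormat or any other stateful, non-thread-safe helper.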


Q. If Java did not have its own Map or Set implementation, how will you go about implementing your own?
A. Covered in detail with diagrams and code in "Core Java Career Essentials" book.


Labels: , ,

Jan 18, 2013

Identifying Java concurrency issues -- thread starvation, deadlock, and contention


Concurrency is very important in any modern system, and it is a topic many software engineers struggle to grasp well. The complexity of concurrent programming stems from the fact that threads often need to operate on common data. Each thread has its own sequence of execution, but accesses the common data.


Debugging concurrency issues and fixing thread starvation, deadlock, and contention requires skill and experience to identify and reproduce these hard-to-resolve issues. Here are some techniques to detect concurrency issues.

  • Manually reviewing the code for any obvious thread-safety issues. There are static analysis tools like Sonar, ThreadCheck, etc. that catch concurrency bugs at compile time by analyzing byte code.
  • List all possible causes and add extensive log statements and write test cases to prove or disprove your theories.
  • Thread dumps are very useful for diagnosing synchronization problems such as deadlocks. The trick is to take 5 or 6 thread dumps at 5-second intervals, so you have 25 to 30 seconds' worth of runtime action in the log. To trigger a thread dump, use kill -3 in Unix and CTRL+BREAK in Windows. There are tools like Thread Dump Analyzer (TDA), Samurai, etc. to derive useful information from the thread dumps and find where the problem is. For example, Samurai colors idle threads in grey, blocked threads in red, and running threads in green; pay most attention to the red threads.
  • There are tools like JDB (the Java DeBugger) where a "watch" can be set up on a suspected variable. Whenever the application modifies that variable, a thread dump will be printed.
  • There are dynamic analysis tools like jstack, and JConsole, a JMX-compliant GUI tool, to get a thread dump on the fly. JConsole has handy features like a "detect deadlock" button to perform deadlock detection and the ability to inspect threads and objects in error states. Similar tools are available for other languages as well.

Here are some best practices to keep in mind relating to writing concurrent programs.

  • Favor immutable objects as they are inherently thread-safe.
  • If you need to use mutable objects, and share them among threads, then a key element of thread-safety is locking access to shared data while it is being operated on by a thread. For example, in Java you can use the synchronized keyword.
  • Generally, try to keep your locking for as short a duration as possible to minimize thread contention when many threads are running. Putting a big, fat lock right at the start of the function and unlocking it at the end is acceptable for functions that are rarely called, but can adversely impact performance on frequently called functions. Putting one or more smaller locks in the function around the data that actually needs protection is a finer-grained approach that works better than the coarse-grained one, especially when only a few places in the function actually need protection and larger areas are thread-safe and can be executed concurrently.
  • Use proven concurrency libraries (e.g. java.util.concurrent) as opposed to writing your own. Well-written concurrency libraries provide concurrent access for reads while restricting concurrent writes.
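The last two points can be illustrated with a counter. A hand-rolled synchronized counter takes a lock on every call, while AtomicInteger from java.util.concurrent achieves the same thread safety lock-free (class names here are for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {

    // Hand-rolled: every increment and read contends for the same monitor lock.
    static class SynchronizedCounter {
        private int count;
        public synchronized void increment() { count++; }
        public synchronized int get() { return count; }
    }

    // Proven library alternative: lock-free compare-and-swap under the hood.
    static class AtomicCounter {
        private final AtomicInteger count = new AtomicInteger();
        public void increment() { count.incrementAndGet(); }
        public int get() { return count.get(); }
    }

    public static void main(String[] args) throws InterruptedException {
        final AtomicCounter counter = new AtomicCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        counter.increment();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get()); // prints 4000 -- no updates lost
    }
}
```

With a plain unsynchronized int, the same test would typically lose updates and print less than 4000.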




Fixing production issues

There could be general runtime production issues that either slow a system down or make it hang. In these situations, the general troubleshooting approach is to analyze the thread dumps to isolate the threads causing the slow-down or hang. A Java thread dump gives you a snapshot of all threads running inside a Java Virtual Machine, and graphical tools like Samurai help you analyze thread dumps more effectively.


  • Application seems to consume 100% CPU and throughput has drastically reduced – get a series of thread dumps, say 7 to 10, at a particular interval, say 5 to 8 seconds, and analyze them by closely inspecting the "runnable" threads to check whether a particular thread is progressing. If a particular thread is executing the same method through all the thread dumps, then that method could be the root cause, and you can continue your investigation by inspecting the code.

  • Application consumes very little CPU and response times are very poor due to heavy I/O operations like file or database read/write operations – get a series of thread dumps and inspect for threads in the "blocked" status. This analysis can also be used when the application server hangs because it has run out of runnable threads, due to a deadlock or to a thread holding a lock on an object and never returning it while other threads wait for the same lock.
The solution to the above problems can vary widely, from fixing the thread-safety issue(s) to reducing the synchronization granularity, and from implementing appropriate caching strategies to setting appropriate connection timeouts, among the other remedies discussed in the last section.





Jan 8, 2013

XML processing in Java and reading XML data with a Stax Reader



Q. What APIs does Java provide to process XML? What are the pros and cons of each, and when should you use which?
A.

SAX: Pros: Memory efficient and faster than the DOM parser. Good for very large files. Supports schema validation.
SAX: Cons: Read-only. No XPath support. You have to write more code to get things done, as there is no object model mapping; you have to tap into the events and build the model yourself.

DOM: Pros: Simple to use, bi-directional (read and write), and supports schema validation. The parsed XML is kept in memory, allowing XML manipulation, and element order is preserved. Supports CRUD operations.
DOM: Cons: Not suited for large XML files as it consumes more memory. You have to map the generic node-based model to your own object model.

StAX: Pros: Gives you the best of SAX (memory efficiency) and DOM (ease of use). Supports both reading and writing. Very efficient processing: it can read multiple documents at the same time in one single thread, and can also process XML in parallel on multiple threads. If you need speed, a StAX parser is the best way to go.
StAX: Cons: You have to write more code to get things done, and you have to get used to processing XML as a stream.

JAXB: Pros: Allows you to access and process XML data without having to know XML, by binding Java objects to XML through annotations. Supports both reading and writing, and is more memory efficient than DOM (as DOM is a generic model).
JAXB: Cons: It can only parse valid XML documents.

SAX, DOM, StAX, and JAXB are just specifications. There are many open source and commercial implementations of these specifications. For example, the JAXB API has implementations like Metro (the reference implementation, included in Java SE 6), EclipseLink MOXy, and JaxMe.

There are other XML-to-object mapping frameworks like XStream from ThoughtWorks and JiBX (very efficient; uses byte code injection).

The StAX specification has the following implementations: the reference implementation (http://stax.codehaus.org), the Woodstox implementation (http://woodstox.codehaus.org), and Sun's SJSXP implementation (https://sjsxp.dev.java.net/).

If you want to transform XML from one format to another, use TrAX. TrAX is based on XSLT, which is a rule-based language. A TrAX source document may be created via SAX or DOM. TrAX requires both Java and XSLT knowledge.
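A minimal TrAX sketch using the JDK's javax.xml.transform API. Calling newTransformer( ) with no stylesheet gives the identity transform, which simply copies source to result; a real transformation would pass a StreamSource wrapping an XSLT file:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IdentityTransformExample {

    public static String transform(String xml) throws Exception {
        // No stylesheet supplied, so this is the identity transform;
        // supply new StreamSource(new File("stylesheet.xsl")) to newTransformer() for real work.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting>hello</greeting>"));
    }
}
```

The Source here is stream-based, but a SAXSource or DOMSource would work just as well, as noted above.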



Q. What is the main difference between the StAX and SAX APIs?
A. The main differences between the StAX and SAX APIs are:

  • StAX is a "pull" API. SAX is a "push" API. 
  • StAX can do both XML reading and writing. SAX can only do reading.

SAX is a push style API: the SAX parser iterates through the XML and calls methods on the handler object you provide. For instance, when the SAX parser encounters the beginning of an XML element, it calls startElement( ) on your handler object. It "pushes" the information from the XML into your object, hence the name "push" style API. This is also referred to as an "event driven" API.
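The push style can be sketched with the JDK's built-in SAX API: the parser drives, invoking startElement( ) on the handler you register (the class names below are illustrative):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPushExample {

    // The parser "pushes" events into this handler; here we merely collect element names.
    static class ElementCollector extends DefaultHandler {
        final List<String> elementNames = new ArrayList<String>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            elementNames.add(qName); // called by the parser, not by our code
        }
    }

    public static List<String> collectElementNames(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCollector handler = new ElementCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.elementNames;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collectElementNames("<order><item/><item/></order>")); // prints [order, item, item]
    }
}
```

Notice that the application never asks for the next event; control stays with the parser until the document ends.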

StAX is a pull style API, which means that you move the StAX parser from item to item in the XML file yourself, just like you do with a standard Iterator or JDBC ResultSet. You can then access the XML information via the StAX parser for each such "item" encountered in the XML file.

Q. Why do you need StAX when you already have SAX and DOM?

A. The primary goal of the StAX API is to give "parsing control to the programmer by exposing a simple iterator based API. This allows the programmer to ask for the next event (pull the event) and allows state to be stored in procedural fashion." StAX was created to address limitations in the two most prevalent parsing APIs, SAX and DOM.

Q. What is the issue with the DOM parser?
A. The DOM model involves creating in-memory objects representing the entire document tree and the complete state of an XML document. Once in memory, DOM trees can be navigated freely and parsed arbitrarily, and as such provide maximum flexibility for developers. However, the cost of this flexibility is a potentially large memory footprint and significant processor requirements, as the entire representation of the document must be held in memory as objects for the duration of the document processing. So this approach is not suited to larger XML documents.


Q. What is the issue with a "push" parser?
A. Pull parsing provides several advantages over push parsing when working with XML streams:
  • With pull parsing, the invoking application controls the application thread, and can call methods on the parser when needed. By contrast, with push processing, the parser controls the application thread, and the client can only accept invocations from the parser.
  • Pull parsing libraries can be much smaller and the client code to interact with those libraries much simpler than with push libraries, even for more complex documents.
  • Pull clients can read multiple documents at one time with a single thread.
  • A StAX pull parser can filter XML documents such that elements unnecessary to the client can be ignored, and it can support XML views of non-XML data.
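The filtering point above can be sketched with the standard javax.xml.stream EventFilter; the wrapped reader only ever surfaces the events the filter accepts (the element names here are made up for illustration):

```java
import java.io.StringReader;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

public class StaxFilterExample {

    public static int countStartElements(String xml, final String nameOfInterest) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));

        // The filter hides every event except start tags with the name we care about.
        XMLEventReader filtered = factory.createFilteredReader(reader, new EventFilter() {
            public boolean accept(XMLEvent event) {
                return event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(nameOfInterest);
            }
        });

        int count = 0;
        while (filtered.hasNext()) { // only accepted events are visible here
            filtered.nextEvent();
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countStartElements("<jobs><job/><job/><other/></jobs>", "job")); // prints 2
    }
}
```

The client loop never sees the uninteresting elements at all, which keeps the processing code simple.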


Now that you know why a StAX parser is useful, here is an example of using the StAX API to read an XML snippet. The XML snippet is shown below:

 
<metadata BatchJobId='17232674' ParentBatchJobId='17232675' BatchCode='SOME_JOB_NAME' />


Now the StAX parser code

package com.myapp.item.reader;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Iterator;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.myapp.model.BatchJobMeta;

public class CashForecastJobMetaStaxReader {

 private final static Logger logger = LoggerFactory.getLogger(CashForecastJobMetaStaxReader.class);

 private static final String BATCH_JOB_ID = "BatchJobId";
 private static final String PARENT_BATCH_JOB_ID = "ParentBatchJobId";
 private static final String BATCH_CODE = "BatchCode";
 

 public BatchJobMeta readBatchJobMetaInfo(String input) {

  //First create a new XMLInputFactory
  XMLInputFactory inputFactory = XMLInputFactory.newInstance();
  InputStream in = new ByteArrayInputStream(input.getBytes());
  BatchJobMeta item = null;
  try {
   XMLEventReader eventReader = inputFactory.createXMLEventReader(in);
   item = new BatchJobMeta();

   while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();

    if (event.isStartElement()) {
     StartElement startElement = event.asStartElement();
     @SuppressWarnings("unchecked")
     Iterator<Attribute> attributes = startElement.getAttributes();
     while (attributes.hasNext()) {
       Attribute attribute = attributes.next();
      if (attribute.getName().toString().equals(BATCH_JOB_ID)) {
       item.setBatchJobId(Long.valueOf(attribute.getValue()));
      }

      if (attribute.getName().toString().equals(PARENT_BATCH_JOB_ID)) {
       item.setParentBatchJobId(Long.valueOf(attribute.getValue()));
      }

      if (attribute.getName().toString().equals(BATCH_CODE)) {
       item.setBatchCode(attribute.getValue());
      }
     }
    }
   }

  } catch (XMLStreamException e) {
   logger.error("", e);
  }

  return item;

 }

}



Finally, the JUnit test class that tests the above XML reader code.

 

package com.myapp.item.reader;

import org.junit.Assert;
import org.junit.Test;

import com.myapp.model.BatchJobMeta;

public class CashForecastJobMetaStaxReaderTest {
 private static final String META_SNIPPET = "<metadata BatchJobId='17232674' ParentBatchJobId='17232675' BatchCode='CSHFR' />";

 @Test
 public void testReadBatchJobMetaInfo() {
  CashForecastJobMetaStaxReader staxReader = new CashForecastJobMetaStaxReader();
  BatchJobMeta readBatchJobMetaInfo = staxReader.readBatchJobMetaInfo(META_SNIPPET);

  Assert.assertNotNull(readBatchJobMetaInfo);

  Assert.assertEquals("Failed on Job Id", 17232674L, readBatchJobMetaInfo.getBatchJobId());
  Assert.assertEquals("Failed on Parent Job Id", 17232675L, readBatchJobMetaInfo.getParentBatchJobId());
  Assert.assertEquals("Failed on batch code", "CSHFR", readBatchJobMetaInfo.getBatchCode());
 }

}



Jan 7, 2013

Working with Java Calendar and Dates - coding Q&A



Q. Can you write a Java program to return the next week day for a given date?
A.

package com.myapp.accounting.util;

import java.util.Calendar;
import java.util.Date;

public class CashforecastingUtil {
 
 public static Date getNextWeekday(Date inputDate) {
  Calendar cal = Calendar.getInstance();
  cal.setTime(inputDate);
  int dayOfWeek = cal.get(Calendar.DAY_OF_WEEK);
  switch (dayOfWeek) {
  case Calendar.FRIDAY:
   cal.add(Calendar.DAY_OF_WEEK, 3);
   break;
  case Calendar.SATURDAY:
   cal.add(Calendar.DAY_OF_WEEK, 2);
   break;
  default:
   cal.add(Calendar.DAY_OF_WEEK, 1);
   break;
  }
  
  return cal.getTime();
 }

}


The unit test class will be something like


import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import junit.framework.Assert;

import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class CashforecastingUtilTest {
 
 private final static Logger logger = LoggerFactory.getLogger(CashforecastingUtilTest.class);
 
 
 @Test
 public void testGetNextWeekday() {
  SimpleDateFormat sdf = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss");
  Calendar cal = Calendar.getInstance();
  cal.set(2012,Calendar.DECEMBER,5);
  
  Date nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  logger.info(sdf.format(nextCalcWkday));
  cal.set(2012,Calendar.DECEMBER,6);
  logger.info(sdf.format(cal.getTime()));
  Assert.assertTrue("Dates don't match-1", cal.getTime().equals(nextCalcWkday));
  
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,7);
  Assert.assertTrue("Dates don't match-2", cal.getTime().equals(nextCalcWkday));
  
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,10);
  Assert.assertTrue("Dates don't match-3", cal.getTime().equals(nextCalcWkday));
  
  cal.set(2012,Calendar.DECEMBER,8);
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,10);
  Assert.assertTrue("Dates don't match-4", cal.getTime().equals(nextCalcWkday));
  
  cal.set(2012,Calendar.DECEMBER,9);
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,10);
  Assert.assertTrue("Dates don't match-5", cal.getTime().equals(nextCalcWkday));
  
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,11);
  Assert.assertTrue("Dates don't match-6", cal.getTime().equals(nextCalcWkday));
  
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,12);
  Assert.assertTrue("Dates don't match-7", cal.getTime().equals(nextCalcWkday));
  
  nextCalcWkday = CashforecastingUtil.getNextWeekday(cal.getTime());
  cal.set(2012,Calendar.DECEMBER,13);
  Assert.assertTrue("Dates don't match-8", cal.getTime().equals(nextCalcWkday));
 }

}

Note that Calendar.DECEMBER is used instead of 12 for the month because the months are zero-based. So, it is 11 for December. If you have to perform a lot of date manipulation, use the Joda-Time library, which is more intuitive and easier to work with. 
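To make the zero-based pitfall concrete, here is a small sketch (the class name is made up for this example) showing that Calendar's month constants do not match human month numbers:

```java
import java.util.Calendar;

public class MonthPitfall {

    // Returns the Calendar MONTH field after setting a date; months are zero-based
    public static int monthField(int year, int calendarMonth, int day) {
        Calendar cal = Calendar.getInstance();
        cal.set(year, calendarMonth, day);
        return cal.get(Calendar.MONTH);
    }

    public static void main(String[] args) {
        // Calendar.DECEMBER is 11, not 12
        System.out.println(Calendar.DECEMBER);                      // prints 11
        System.out.println(monthField(2012, Calendar.DECEMBER, 5)); // prints 11
    }
}
```

Passing a literal 12 as the month would silently roll over into January of the next year, because Calendar is lenient by default.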

Q. Why are there two Date classes in Java -- one in java.util package and another in java.sql package?

A. A java.util.Date represents both the date and the time of day, whereas java.sql.Date represents only a date without the time. java.sql.Time represents only a time of day, and java.sql.Timestamp represents both date and time.
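A quick sketch to illustrate (the class name is made up for this example): both java.sql.Date and java.sql.Timestamp extend java.util.Date and wrap the same millisecond value, but they render differently:

```java
public class SqlDateDemo {

    // java.sql.Date extends java.util.Date but its toString() renders only the date part
    public static String asSqlDateString(long epochMillis) {
        return new java.sql.Date(epochMillis).toString();      // yyyy-MM-dd
    }

    // java.sql.Timestamp renders both date and time (with nanosecond precision)
    public static String asTimestampString(long epochMillis) {
        return new java.sql.Timestamp(epochMillis).toString(); // yyyy-MM-dd hh:mm:ss.fffffffff
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(asSqlDateString(now));   // date only
        System.out.println(asTimestampString(now)); // date and time
    }
}
```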

Q. Can you write a function to compare two given dates and return the difference between the two dates in number of days?
A. This is where the Joda-Time date library shines.

Step 1: Write the function that takes two dates and return the difference in days. The assumption is to ignore the time and reset both times to midnight.

package com.myapp.util;

import java.util.Calendar;
import java.util.Date;

import org.joda.time.DateTime;
import org.joda.time.Days;


public class MyAppUtil {
 
  
 /**
  * returns the number of days between two given dates
  * @param currentDate
  * @param curOrFutureDate
  * @return
  */
 public static int diffDays(Date currentDate, Date curOrFutureDate) {
  if(currentDate == null || curOrFutureDate == null){
   throw new IllegalArgumentException("Dates to be compared cannot be null! date1=" + currentDate + ", date2=" + curOrFutureDate );
  }
  
  if(curOrFutureDate.before(currentDate)) {
   throw new IllegalArgumentException("The curOrFutureDate cannot be before currentDate. date1=" + currentDate + ", date2=" + curOrFutureDate );
  }
  
  currentDate = resetTimeToMidnight(currentDate);
  curOrFutureDate = resetTimeToMidnight(curOrFutureDate);
  
  return Days.daysBetween(new DateTime(currentDate), new DateTime(curOrFutureDate)).getDays();
 }
 
 
 private static Date resetTimeToMidnight(Date inputDate){
  Calendar cal = Calendar.getInstance();
  cal.setTime(inputDate);
  cal.set(Calendar.HOUR_OF_DAY, 00);
  cal.set(Calendar.MINUTE, 00);
  cal.set(Calendar.SECOND, 00);
  cal.set(Calendar.MILLISECOND, 00);
  return cal.getTime();
 }

}



Step 2: Write the unit test class to verify the diffDays(..) behavior.

package com.myapp.util;

import java.util.Calendar;
import java.util.Date;

import junit.framework.Assert;

import org.junit.Test;

public class MyAppUtilTest {
 
 
 @Test
 public void testDiffDays() {
  Date now = new Date();
  
  Calendar cal = Calendar.getInstance();
  cal.setTime(now);
  
  cal.add(Calendar.DAY_OF_MONTH, -1);
  cal.set(Calendar.HOUR_OF_DAY, 00);
  Date yesterday = cal.getTime();
  
  cal.add(Calendar.DAY_OF_MONTH, -2);
  cal.set(Calendar.HOUR_OF_DAY, 5);
  Date _3daysAgo = cal.getTime();
  
  cal.add(Calendar.DAY_OF_MONTH, -4);
  cal.set(Calendar.HOUR_OF_DAY, 15);
  Date _7daysAgo = cal.getTime();
  
  Assert.assertEquals(0, MyAppUtil.diffDays(now, now));
  Assert.assertEquals(1, MyAppUtil.diffDays(yesterday,now));
  Assert.assertEquals(3, MyAppUtil.diffDays(_3daysAgo,now));
  Assert.assertEquals(7, MyAppUtil.diffDays(_7daysAgo,now));
  Assert.assertEquals(4, MyAppUtil.diffDays(_7daysAgo,_3daysAgo));
 }
 
 
 @Test(expected=IllegalArgumentException.class)
 public void testNegative1DiffDays() {
  Date now = new Date();
  
  Calendar cal = Calendar.getInstance();
  cal.setTime(now);
  
  cal.add(Calendar.DAY_OF_MONTH, -1);
  cal.set(Calendar.HOUR_OF_DAY, 00);
  Date yesterday = cal.getTime();
  
  Assert.assertEquals(1, MyAppUtil.diffDays(now,yesterday));
 }
 
 @Test(expected=IllegalArgumentException.class)
 public void testNegative2DiffDays() {
  Date now = new Date();
  
  Assert.assertEquals(1, MyAppUtil.diffDays(now,null));
 }

}


Note: Working with dates can be tricky and you will need to consider things like:

1. Time zones.
2. Daylight saving time.
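For instance, the same instant in time falls on a different wall-clock hour (and possibly a different date) depending on the time zone. A minimal sketch (the class name is made up for this example):

```java
import java.util.Calendar;
import java.util.TimeZone;

public class TimeZoneDemo {

    // Returns the hour-of-day of the same instant as seen in the given time zone
    public static int hourOfDayIn(long epochMillis, String zoneId) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(zoneId));
        cal.setTimeInMillis(epochMillis);
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        long epoch = 0L; // 1 Jan 1970 00:00:00 UTC
        System.out.println(hourOfDayIn(epoch, "UTC"));              // 0
        System.out.println(hourOfDayIn(epoch, "America/New_York")); // 19, the previous evening (EST)
    }
}
```

Daylight saving transitions add a further twist: adding 24 hours to a Date is not always the same as adding one calendar day, which is why Calendar.add(Calendar.DAY_OF_WEEK, 1) is preferable to raw millisecond arithmetic.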


Jan 3, 2013

Spring Integration -- polling for a file and processing the file



Q. What are the main differences and similarities between light-weight integration frameworks like Spring Integration, Apache Camel, etc and an ESB like Oracle Service Bus, Mule, etc?


A. The core concepts are about the same and are based on the Enterprise Integration Patterns (EIP) book: connectivity using different protocols, routing and messaging patterns, transformation, orchestration, business rules engines, and business and technical monitoring. The real difference is that ESBs are more powerful in the areas of orchestration, business rules engines, and business monitoring compared to the light-weight integration frameworks.

Some of the commercial ESBs provide graphical drag-and-drop features for routing and orchestration to model the system. So, the ESBs are better suited for more complex orchestration and BPM (Business Process Management). This extra enterprise-level flexibility does come at additional cost and effort.

Q. Why do you need integration frameworks? Can you give an example where you used an integration framework?

A. Data exchanges between and within companies are very common. The many applications that must be integrated require different technologies, protocols and data formats to be handled uniformly and efficiently. There are 3 integration frameworks available in the JVM environment: Spring Integration, Mule ESB and Apache Camel. These frameworks implement the well-known Enterprise Integration Patterns (EIP) and therefore offer a standardized, domain-specific language (DSL) to integrate applications. These integration frameworks can be used in almost every integration project within the JVM environment – no matter which technologies, transport protocols or data formats are used. If you know one of these frameworks, you can learn the others very easily as they use similar concepts. Each framework has its own pros and cons, and it always pays to do a "proof of concept" to confirm that it serves your purpose.

This blog entry provides a simple example of how to use Spring Integration to write a file-polling task. The Spring Integration framework watches for a file that ends with ".end" or ".END" in a given folder, and once the file arrives, it kicks off a method to process that file.

Step 1: Define the applicationContext-myapp.xml file to define the Spring integration channel, MyappInputFileHandler bean, etc.

 
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:batch="http://www.springframework.org/schema/batch"
 xmlns:task="http://www.springframework.org/schema/task" xmlns:context="http://www.springframework.org/schema/context"
 xmlns:tx="http://www.springframework.org/schema/tx" xmlns:aop="http://www.springframework.org/schema/aop"
 xmlns:int="http://www.springframework.org/schema/integration"
 xmlns:file="http://www.springframework.org/schema/integration/file"
 xmlns:util="http://www.springframework.org/schema/util"
 xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
 http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
 http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task-3.0.xsd
 http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch.xsd
 http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-3.0.xsd
 http://www.springframework.org/schema/integration http://www.springframework.org/schema/integration/spring-integration-2.0.xsd
 http://www.springframework.org/schema/integration/file http://www.springframework.org/schema/integration/file/spring-integration-file-2.0.xsd
 http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd
 http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-3.0.xsd">


 <context:property-placeholder location="classpath:myapp.properties" />
 <tx:annotation-driven />

 <!-- INTEGRATION BEANS -->

 <bean id="myappInputFileHandler"
  class="com.myapp.MyappInputFileHandler">
  <constructor-arg value="${myapp.file.path}" />
  <constructor-arg value="${myapp.file.regex}" />
 </bean>
 
 <int:channel id="fileIn"></int:channel>

 <file:inbound-channel-adapter id="inputChannelAdapter"
  channel="fileIn" directory="${myapp.file.path}"
  prevent-duplicates="false" filename-regex="${myapp.file.regex}">

  <int:poller id="poller" fixed-delay="5000" />

 </file:inbound-channel-adapter>


 <int:service-activator id="inputFileServiceActivator"
  input-channel="fileIn" method="processFile" ref="myappInputFileHandler" />

  
</beans> 


Step 2: Define the myapp.properties file. These properties are used by both the spring context file and the MyappInputFileHandler Java class.
 
#read from within spring context file
myapp.file.path=C:\\TEMP\\myapp\\
myapp.file.regex=.*\\.(end|END)

#properties read from the MyappInputFileHandler (names must match the @Value expressions)
myapp.job.lock.retry.count=5
myapp.job.lock.retry.wait.duration=20000


Step 3: The final step is to define the MyappInputFileHandler class and the processFile(..) method that gets invoked.
 
package com.myapp;

import java.io.File;
import java.io.IOException;
import java.util.concurrent.Semaphore;

import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;


public class MyappInputFileHandler {

 private final static Logger logger = LoggerFactory.getLogger(MyappInputFileHandler.class);

 private static final String CSV_EXT = ".CSV";
 private static final String METACSV_EXT = ".META_CSV";

 private static final Semaphore mutex = new Semaphore(1);

 @Value("${myapp.job.lock.retry.wait.duration}")
 private long lockRetryWaitDuration;

 @Value("${myapp.job.lock.retry.count}")
 private long lockRetryCount;

 public MyappInputFileHandler(String path, String fileRegEx) {

  logger.info("Launching Job CashForecasting. Polling Folder: ".concat(path).concat(" for file ").concat(fileRegEx));
 }

 public void processFile(File file) {
  try {
    // strip the 8-character ".CSV.end" suffix to get the base file name
    String fileNameNoExtension = file.getAbsolutePath().substring(0, file.getAbsolutePath().length() - 8);

   // Delete .end file and Move .META_CSV and .CSV File to process
   // folder
   mutex.acquire();

   File csvFile = new File(fileNameNoExtension.concat(CSV_EXT));
   File metaCsvFile = new File(fileNameNoExtension.concat(METACSV_EXT));
   file.delete();

   mutex.release();

   // Create Lock File
   File lockFile = createLockFile(csvFile);

   // further file processing ....................

  } catch (Exception ex) {
   logger.error("Error processing myapp: ", ex);
   throw new RuntimeException(ex);
  }
 }

 private File createLockFile(File csvFile) {

  if (!csvFile.exists()) {
   throw new RuntimeException("File not found: " + csvFile.getAbsolutePath());
  }

  String feedKey = "some key";

  File lockFile = new File(csvFile.getParent() + File.separator + feedKey.concat(".lock"));
  boolean fileCreated = false;

  int count = 0;
  while (lockFile.exists()) {
   try {
    Thread.sleep(lockRetryWaitDuration);
   } catch (InterruptedException e) {
    logger.error("Interrupted ", e);
   }

   if (++count > lockRetryCount) {
     throw new RuntimeException("Timed out acquiring a lock file for file : " + csvFile.getAbsolutePath() + " and "
      + "input data: " + feedKey);
   }
  }

  try {
      //Apache commons-io library 
   FileUtils.touch(lockFile);
   fileCreated = true;
  } catch (IOException e) {
   logger.error("Error creating a lock file: " + lockFile.getAbsolutePath(), e);
   throw new RuntimeException(e);
  }

  if (logger.isDebugEnabled() && fileCreated) {
   logger.debug("Lock file created: " + lockFile.getAbsolutePath());
  }

  return lockFile;

 }

}


Another alternative for achieving similar behavior is Apache Camel. Both Apache Camel and Spring Integration are light-weight integration frameworks. Polling for a file is very common in batch processes, and Spring Batch is a good fit for writing batch jobs in Java. You could also use this approach to transfer internal data files to/from external parties, which usually requires format conversions and sending the files via ftp/sftp/scp, etc., or attaching them to an email and sending them out.
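If pulling in a framework is overkill for the watch-for-a-file case, plain Java 7's java.nio WatchService can cover it. A minimal sketch (the class name and handler wiring are assumed), though you lose the channel/transformer/poller plumbing the frameworks provide:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class EndFileWatcher {

    // Blocks until a file ending in ".end"/".END" is created in the given directory
    public static Path waitForEndFile(Path dir) throws Exception {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        while (true) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path created = dir.resolve((Path) event.context());
                if (created.getFileName().toString().toLowerCase().endsWith(".end")) {
                    return created; // hand off to something like processFile(..)
                }
            }
            key.reset();
        }
    }
}
```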

Another typical use case is to write your own automated custom end-to-end application integration testing framework using a light-weight integration framework like Apache Camel. Your custom integration testing framework will have routes defined for testing different tasks in an end-to-end manner, for example:

1. Extracting files from System A.
2. Transforming the extracts to format that Systems B and C  can understand.
3. Publish the message to a JMS queue so that Systems B and C can load that data   into their databases.
4. Invoke a RESTful Web Service on System B to produce a report that needs to be emailed to a number of recipients.

So, good knowledge of integration frameworks like Spring Integration and Apache Camel, and of enterprise service buses like Mule, will be a plus when selling yourself in job interviews.



Jan 2, 2013

Java writing code -- compare two CSV files in Java

Q. How would you go about writing code for a scenario where you need to compare two CSV files? Assume that both CSV files, target and generated, are converted to a List<String[]>. Duplicate entries are allowed, and you need to list the differences between the target file and the generated file.

A. You can use a CSV framework like OpenCSV to convert a CSV file to a List<String[]>. The method below takes these lists and compares the values.

Possible scenarios are

  • The target and generated files could have a different number of records
  • The target and generated files could have the same number of records but different content
  • The generated file could have a few rows removed (treated as deleted records)
  • The generated file could have a few new rows added (treated as inserted records)
  • With or without duplicate records


One approach would be to use Java's Set interface (e.g. HashSet) and then do a removeAll() with the target set on the generated set, thus retaining the rows which differ. This, of course, assumes that there are no duplicate rows in the files. Note also that the set elements need value-based equals()/hashCode(), so compare the raw line Strings (or the Arrays.deepToString of each row) rather than String[] arrays, which only compare by identity.


 
// read each file in as a set of raw lines (e.g. via FileUtils.readLines)
Set<String> target = new HashSet<String>();
//...populate target
Set<String> generated = new HashSet<String>();
//...populate generated
generated.removeAll(target); // generated now contains only the lines which are not in target


The above solution would not work if duplicates are allowed. Here is one possible solution for when there are duplicates.
First, write down the logical steps before you start coding:

1. Compare the number of items in both the target and generated lists.
2. If the number of items is the same, compare each item.
3. If the number of items or the contents of any individual items differ, the files are not the same; identify the contents that differ.


3.1. To identify the contents that differ, both the target and generated lists need to be sorted.
3.2. Loop through and compare each item from both lists. Exit the loop once either the target or the generated list has been fully processed.
3.3. If any target or generated items have not been processed yet, process them in a new loop.

3.2.1 Inside loop 3.2, there are 3 possible outcomes:

A. The item is in the target list but not in the generated list.
B. The item is in both the target and generated lists.
C. The item is in the generated list but not in the target list.


The compareTo(..) method is used because it returns zero, a negative value, or a positive value, meaning the contents are equal, the target is less than the generated, or the target is greater than the generated, respectively.

 

package com.myapp.compare;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.springframework.stereotype.Component;

@Component(value = "simpleCSVCompare")
public class SimpleCSVCompareImpl implements SimpleCSVCompare {

 
 //compare target and generated CSV lines
 public CSVCompareResult compareCSVLines(List<String[]> targetLines, List<String[]> generatedLines, CSVCompareResult result){
  
  //Step1: Number of lines differ
  if(targetLines.size() != generatedLines.size()){
   result.addMismatchingLines("Line sizes don't match: " + " target=" + targetLines.size() + ", generated=" + generatedLines.size());
   result.setStatus(CSVCompareResult.FileCompareStatus.UNMATCHED);
  }
  
  //Step 2: Contents differ
  if(targetLines.size() == generatedLines.size()){
   for (int i = 0; i < targetLines.size(); i++) {
    String[] targetLine = targetLines.get(i);
    String[] genLine = generatedLines.get(i);
    
    if(!Arrays.deepEquals(targetLine, genLine)){
     result.addMismatchingLines("Line contents don't match.");
     result.setStatus(CSVCompareResult.FileCompareStatus.UNMATCHED);
     break;
    } 
   }
  }
  
  //Step 3: Identify the differing lines
  if(CSVCompareResult.FileCompareStatus.UNMATCHED == result.getStatus()){
   sortedList(targetLines);
   sortedList(generatedLines);
   evaluateCSVLineDifferences(targetLines,generatedLines,result);
  }
  
  return result;
  
 }
 
 public CSVCompareResult evaluateCSVLineDifferences(List<String[]> targetLines, List<String[]> generatedLines, CSVCompareResult result) {

  result.setNoOfGeneratedLines(generatedLines.size());
  result.setNoOfTargetLines(targetLines.size());

  int genIndex = 0;
  int targIndex = 0;

  String[] lineTarget = targetLines.get(targIndex);
  String[] lineGen = generatedLines.get(genIndex);

  boolean targetDone = false;
  boolean generatedDone = false;

  while (!targetDone && !generatedDone) {

   //target line is less than the generated line
   if (Arrays.deepToString(lineTarget).compareTo(Arrays.deepToString(lineGen)) < 0) {
    while (Arrays.deepToString(lineTarget).compareTo(Arrays.deepToString(lineGen)) < 0 && !targetDone) {
     result.addMismatchingLines("TARGET:" + Arrays.deepToString(lineTarget));
     if (targIndex < targetLines.size() - 1) {
      lineTarget = targetLines.get(++targIndex);
     } else {
      targetDone = true;
     }
    }

   //target and generated lines are same 
   } else if (Arrays.deepToString(lineTarget).compareTo(Arrays.deepToString(lineGen)) == 0) {
    if (targIndex < targetLines.size() - 1) {
     lineTarget = targetLines.get(++targIndex);
    } else {
     targetDone = true;
    }
    if (genIndex < generatedLines.size() - 1) {
     lineGen = generatedLines.get(++genIndex);
    } else {
     generatedDone = true;
    }

   //target line is greater than the generated line 
   } else if (Arrays.deepToString(lineTarget).compareTo(Arrays.deepToString(lineGen)) > 0) {
    while (Arrays.deepToString(lineTarget).compareTo(Arrays.deepToString(lineGen)) > 0 && !generatedDone) {
     result.addMismatchingLines("GENERATED:" + Arrays.deepToString(lineGen));
     if (genIndex < generatedLines.size() - 1) {
      lineGen = generatedLines.get(++genIndex);
     } else {
      generatedDone = true;
     }
    }
   }

  }

  //process any target lines not processed
  while (!targetDone) {
   result.addMismatchingLines("TARGET:" + Arrays.deepToString(lineTarget));
   if (targIndex < targetLines.size() - 1) {
    lineTarget = targetLines.get(++targIndex);
   } else {
    targetDone = true;
   }
  }

  //process any generated lines not processed
  while (!generatedDone) {
   result.addMismatchingLines("GENERATED:" + Arrays.deepToString(lineGen));
   if (genIndex < generatedLines.size() - 1) {
    lineGen = generatedLines.get(++genIndex);
   } else {
    generatedDone = true;
   }
  }

  return result;
 }
 
 
 public void sortedList(List<String[]> input){
   Collections.sort(input, new Comparator<String[]>() {

   @Override
   public int compare(String[] o1, String[] o2) {
    return Arrays.deepToString(o1).compareTo(Arrays.deepToString(o2));
   }
  });
   
 }

 public static void main(String[] args) {
  String[] targA1 = { "a1" };
  String[] genA1 = { "a1" };

  String[] targA2 = { "a2" };
  String[] genA2 = { "a2" };

  String[] targA3 = { "a3" };
  String[] genA3 = { "a3" };

  String[] targA4 = { "a4" };
  String[] genA4 = { "a4" };

  String[] targA5 = { "a5" };

  String[] genA6 = { "a6" };

  List<String[]> targetLines = new ArrayList<String[]>();
  List<String[]> generatedLines = new ArrayList<String[]>();

  targetLines.add(targA1);
  targetLines.add(targA2);
  targetLines.add(targA2);
  targetLines.add(targA3);
  targetLines.add(targA4);
  targetLines.add(targA5);

  generatedLines.add(genA1);
  generatedLines.add(genA2);
  generatedLines.add(genA3);
  generatedLines.add(genA4);
  generatedLines.add(genA6);

  CSVCompareResult result = new CSVCompareResult();

  new SimpleCSVCompareImpl().evaluateCSVLineDifferences(targetLines, generatedLines, result);

  System.out.println(result.getMismatchingLines());

 }

}


The results can be added to a value object like

 
package com.myapp.compare;

import java.util.ArrayList;
import java.util.List;

public class CSVCompareResult {

 public enum FileCompareStatus {
  MATCHED, UNMATCHED
 };

 private String generatedFileName;
 private String targetFileName;
 private int noOfTargetLines;
 private int noOfGeneratedLines;
 private FileCompareStatus status = FileCompareStatus.MATCHED;

 private List<String> mismatchingLines = new ArrayList<String>(20);

 public String getGeneratedFileName() {
  return generatedFileName;
 }

 public void setGeneratedFileName(String generatedFileName) {
  this.generatedFileName = generatedFileName;
 }

 public String getTargetFileName() {
  return targetFileName;
 }

 public void setTargetFileName(String targetFileName) {
  this.targetFileName = targetFileName;
 }

 public int getNoOfTargetLines() {
  return noOfTargetLines;
 }

 public void setNoOfTargetLines(int noOfTargetLines) {
  this.noOfTargetLines = noOfTargetLines;
 }

 public int getNoOfGeneratedLines() {
  return noOfGeneratedLines;
 }

 public void setNoOfGeneratedLines(int noOfGeneratedLines) {
  this.noOfGeneratedLines = noOfGeneratedLines;
 }

 public List<String> getMismatchingLines() {
  return mismatchingLines;
 }

 public void setMismatchingLines(List<String> mismatchingLineNumbers) {
  this.mismatchingLines = mismatchingLineNumbers;
 }

 public void addMismatchingLines(String lineNumber) {
  mismatchingLines.add(lineNumber);
 }

 public FileCompareStatus getStatus() {
  return status;
 }

 public void setStatus(FileCompareStatus status) {
  this.status = status;
 }
 
 public String outputResultsAsString(){
  StringBuilder sb = new StringBuilder();
  sb.append("Files Compared: " + " target=" + targetFileName + ", generated=" + generatedFileName);
  sb.append("\n");
  sb.append("Status:" + status);
  sb.append("\n");
  
  List<String> mismatchingLines = getMismatchingLines();
  
  for (String msg : mismatchingLines) {
   sb.append(msg);
   sb.append("\n");
  }
  
  return sb.toString();
 }

}


