Monday, 3 March 2014

Tracking Application Exceptions With Spring

A few weeks ago a colleague asked me to spend a week doing a support role as he needed cover whilst he took a well earned holiday and he couldn't find anyone else. As I'd just completed a particularly complex coding project and was feeling a little burnt out, I said 'yes'; after all the change would do me good.

Part of the job consisted of monitoring a collection of fairly critical backed processes, to see how well they were performing and whether or not they were going wrong.

We developers spend a lot of time and energy adding logging to our application in order to prove that it's working okay and to figure out what went wrong when an exception occurs. These log files are often used to tell us how well, or badly, our application is performing on a daily basis.

I'm ignoring other techniques such as adding probes to your application by whatever method you choose, such as HTTP or JMX. These provide immediate information on your application rather than the second level monitoring under discussion here.

There are at least three ways of monitoring log files:
  • Never
  • Reactively
  • Proactively

"Never" means what it says. I'm betting that there are those applications that go live, never seem to fail and are never checked.

The "reactive" strategy is very common; Mrs Smith from Doncaster phones up and complains that the web site crashed when she was trying to buy a new pair of shoes. She was charged twice and never received any shoes. In this traditional company, where the roles of DEV and OPS are totally segregated, the developer investigating the problem requests the logs for the time and date of the incident from OPS. The developer, of course, gets back a totally unrelated section of the log and has to re-request the correct section - several times. By that time, a few weeks have gone by and Mrs Smith is irate. The log eventually arrives and the problem is solved.

In this "reactive" scenario, I'm assuming that this is one of those companies when developers are under NO CIRCUMSTANCES trusted or allowed to touch the live servers. It's all too common and most of us will have been there at some point. Developers should be trusted with access to live systems; however, as a developer, you should remember that there are two golden rules when dealing with live systems:
  1. Don't break anything.
  2. If you do break something, make sure you have someone else to blame.

"Proactively" means that log files are checked on a regular basis: daily, hourly or what ever. Even if your application contains a multitude of JMX, http or other probes, there's no guarantee that they'll spot a problem. Probes can only probe what you tell them to probe, any other problem is beyond them.

Getting back to my spell on support… for some, probably historical, reason most of this monitoring consisted of manually checking log files for errors using a set of documented 'grep' commands via a bit of cut and paste. If you you add the time up spent doing repetitive cut and paste job then I recon that it was consuming approximately ½ a man day per week.

And this got me thinking… I really don’t like doing this kind of task. I’m not very good at at, it’s manual, prone to error, repetitive and very boring. In burning ½ a man day per week it’s obviously going to to be cost effective to automate this task, so long as time isn’t spent faffing about making he solution perfect. So, what options are there?

If you look at the professional end of the scale, then there's the likes of Splunk which can monitor messages from a multitude of sources such as the syslog daemon.

This means that a simple way of sending errors to Splunk is to simply setup the Log4j syslog appender - but that’s beyond the scope of this blog...

At the quick and dirty end of the spectrum, you could very quickly write a shell script to do a few 'grep’ commands, writing the results to a file and mailing it your Unix mailbox.

Then I thought about the middle ground. How hard can it be to write a Spring app that contains as many generic, reusable classes as possible, runs periodically checking a whole bunch of error logs and emailing you the results. For this kind of thing, emails are,good, because you generally read them out of habit.

Getting Started

In any project like this, there are those intangible first steps that get the whole thing moving. They form that answer to the question of what classes do you need to write to get this ‘thing’ released? There are many methods of determining what classes you need to write, such as simply picking them out of thin air (gut feel), using a formal design technique such as UML, Rapid Prototyping or Test Driven Design. In each case all you really do is to come up with a few requirements from which the names of some classes pop up. For example, in this case I need to:
  1. Search a given directory and its sub-directories (possibly) looking for files of a particular type.
  2. If a file is found then check its date: does it need to be searched for errors?
  3. If the file is young enough to be checked then validate it, looking for exceptions.
  4. If it contains exceptions, are they the ones we’re looking for or have they been excluded?
  5. If it contains the kind of exceptions we’re after, then add the details to a report.
  6. When all the files have been checked, format the report ready for publishing.
  7. Publish the report using email or some other technique.
  8. The whole thing will run at a given time every day

This throws up several names of classes. I need a FileLocator, FileValidator, RegexValidator, FileAgeValidator and a Report.

Given that the word "Validator" crops up several times in the above list indicates that I can use an interface, presumably called Validator and create several implementations to perform the above validation tasks. These implementations could then be combined to create a coherent application.

Note that this is only a preliminary list of ideas. If you look at the code, you won’t find a class named Report, it was renamed Results and refactored to create a Formatter interface, the TextFormatter and HtmlFormatter classes together with the Publisher interface and EmailPublisher class.

In terms of running this periodically there are a couple of choices. Firstly you could write the Java code together with a simple script to call it and offer it to your Unix machines crontab. This would be okay, but it means that the app wouldn’t run on Windows and would be limited to running as a standalone application. That may be a big deal, so that alternative would be to use Spring and the Quartz schedular; making setting up a cron schedule fairly simple.

As for the emailing of the report; again, Spring provides a really good email template that simplifies using the Java email classes.

Okay, so that’s the starting point: a set of vague ideas for classes that can be linked together in some loosely coupled way to create our application. If you took a formal route, then you may want to spend time documenting all this, perhaps even producing a class diagram, which is then added to a Word doc, and reviewed several times until it all becomes set in stone; however, I don’t have to bother with all that here…

Configuring the Application

As with any application, there's a need to figure out how we're going to set up what is usually a whole raft of property values and how these are used by the application. This application is driven by the app.properties file located, as usual, in the src/main/resources directory.

# The path to the log file directory to scan for errors 
scan.in=/Library/Tomcat/logs
# A regex defining what to look for - or what not to include
scan.for=^.*Exception.*
exclude=^.*IllegalStateException.*
# The number of following lines to add to the report
following.lines=10
# Where to email the report
[email protected]
# The max age of a file in days
max.days=1000

The first of these properties that we're interested in is the scan.in property. This is the location of our web server's log directory and is used as a starting point to the FileLocator class outlined in the next section.

Searching for Files

In writing a FileLocator I've gone a little beyond the requirements and intentionally done a little 'gold plating'.

Gold plating is a really bad idea. You should write just enough code to meet the functional requirements - that is to get the job done. In this case, being a blog... my blog, I can justify it by saying that it meets the non-functional requirement of documenting a technique that I've used many times before.

The following code doesn't just search for log files in a nominated log file directory, it searches all/any sub-directories too.

@Service
public class FileLocator {

 
private static final Logger logger = LoggerFactory.getLogger(FileLocator.class);

 
@Value("${scan.in}")
 
private String scanIn;

 
@Autowired
  @Qualifier
("fileValidator")
 
private Validator validator;

 
/** Search for the files requested */
 
public void findFile() {

   
logger.info("Searching in... {}", scanIn);
    File file = createFile
(scanIn);
    search
(file);
 
}

 
@VisibleForTesting
 
File createFile(String name) {
   
return new File(name);
 
}

 
private void search(File file) {

   
if (file.isDirectory()) {
     
logger.debug("Searching directory: {}", file.getName());
      File
[] files = file.listFiles();
      searchFiles
(files);
   
} else {
     
logger.debug("Validating file: {}", file.getName());
      validator.validate
(file);
   
}
  }

 
private void searchFiles(File[] files) {
   
for (File file : files) {
     
search(file);
   
}
  }
}

The code above uses the old faithful technique of recursion to search for log files. The main entry point is findFile(). It creates a File object from the Spring @Value annotated scanIn instance variable and passes it to the search() method. It then checks whether or not this object is a directory and, if it is, it gets hold of all the files in the directory and loops around calling search() on each File object in the list. If the directory check reveals that the File object is a file, then file validation is called.

So far so good, with the application's first class we can now search a log file directory; however, if you want to know what happens when a file has been found, then you'll have to wait until my next blog is published.

One final thought: do you need to investigate every error in your system? There's an old philosophical saying: if a tree falls over in a forest and there's no one there to hear it, does it still make a sound? In the same way, if your application throws an exception and the user isn't affected, is it still an error? Do you need to spend time investigating it? Let me know what you think?


The code for this blog is available on Github at: https://github.com/roghughe/captaindebug/tree/master/error-track.


3 comments:

Melani said...

Awesome Artical Really i have searching this type of valuable information From a lot of days i found satisfaction when Read your blog Thanks for giving this type blog and also please Read link bvba Woodstone which provide information server monitoring software & Network monitoring tools: http://www.woodstone.nu/salive

Józsi said...

A full blown Spring application just for detecting some patterns in the logs? You must love Spring a lot and must have been really bored :)

Roger Hughes said...

Jozsi,
It might seem over the top, but there are reasons... you'll have to read to the end to see the final twist to the plot...