
Using ShiftLeft in Open Source

After years of being involved in open-source development at Apache, we’ve seen security issues pop up in Apache Commons, such as arbitrary remote code execution and denial-of-service attacks (CVE-2016-3092 and CVE-2014-0050). While some threats are real, others are just FUD. Even when they are real, it is important to consider context: there may be problems that end users never see because the “hole” is not reachable through users’ actions.

I started to wonder how we can short-circuit FUD and proactively deal with security issues before we hear about them in the press and in panicked mailing list posts. Is there something we can do at build time? A FindBugs or PMD on steroids? Surely someone else must be thinking about this. Can we apply the principle of shift left testing to security?

It turns out there is a tool that does exactly that, and it’s called, not coincidentally, ShiftLeft.io. The idea behind ShiftLeft is to break the old habit of building a product and then, later, figuring out how to fend off attacks and plug up security holes. Today, we take for granted that unit testing, integration testing, continuous integration, and continuous delivery are commonplace. ShiftLeft proposes to make security analysis just as ubiquitous.

Getting started

Since ShiftLeft is free for open-source projects, I decided to look at what it reports for Apache Commons IO 2.5, an Apache Commons Java component. While I won’t divulge any real security issues right now, I’ll show what is safe to report about Commons IO. I’ll also show the cool stuff ShiftLeft points out to help you write more bullet-proof software. To use ShiftLeft, you go to their web site and submit the URL of your GitHub repository. ShiftLeft has a 30-day disclosure policy, so you have plenty of time to fix up your FOSS project.

What I got is a JSON report file containing all sorts of data and what ShiftLeft calls conclusions. These are the potentially actionable items. As we’ll see, even if you find some conclusions non-actionable, they will do a great deal to raise your awareness of potential security issues, both for code that you’ll write tomorrow and for code you need to maintain today.

Let’s start with a simple conclusion and get deeper in the weeds after that.

No memory for you!

The following example shows what “untrusted” data is and how it can affect your application.

I have a ShiftLeft conclusion that tells me the method IOUtils.buffer(Writer, int) does not support handling untrusted data to be passed as parameter size because it controls the size of a buffer, giving an attacker the chance to starve the system of memory:

/**
 * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
 * given Writer.
 *
 * @param writer the Writer to wrap or return (not null)
 * @param size the buffer size, if a new BufferedWriter is created.
 * @return the given Writer or a new {@link BufferedWriter} for the given Writer
 * @throws NullPointerException if the input parameter is null
 * @since 2.5
 */
 public static BufferedWriter buffer(final Writer writer, int size) {
  return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
 }

While this example may seem trivial, ShiftLeft shows that it understands what the code does in this method: we are allowing call sites to control memory usage in an unbounded manner. ShiftLeft also gives us the exact method and line number.

While not every ShiftLeft conclusion may be actionable, I must say that the report has made me more aware of what to potentially secure when writing new code or maintaining an old code base.

Tangent: What could you do in this case? You could hard-code an upper bound of, say, 10 MB. You could make the bound configurable with a static non-final variable (effectively creating a global variable, an anti-pattern for sure). You could move all the static methods to the instance side, create a memory profile class to define this bound, and then build IOUtils instances with this profile in the constructor. You could also use a different buffer class internally to make sure the buffer does not grow beyond a given size. And so on. I am not proposing these changes in the context of the Apache Commons IO 2.x line, since we take great care to maintain binary compatibility within a major release across all of Apache Commons. But I can definitely see proposing changes for 3.0!
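To make the first option concrete, here is a minimal sketch of a bounded variant (not actual Commons IO code; the class name, the 10 MB bound, and the clamping policy are all made up for illustration):

import java.io.BufferedWriter;
import java.io.Writer;

public final class BoundedIoUtils {

    // Hypothetical upper bound; 10 MB is only an example value.
    private static final int MAX_BUFFER_SIZE = 10 * 1024 * 1024;

    /**
     * Like IOUtils.buffer(Writer, int), but clamps the requested size so an
     * untrusted caller cannot ask for an arbitrarily large buffer.
     */
    public static BufferedWriter buffer(final Writer writer, final int size) {
        final int safeSize = Math.max(1, Math.min(size, MAX_BUFFER_SIZE));
        return writer instanceof BufferedWriter
                ? (BufferedWriter) writer
                : new BufferedWriter(writer, safeSize);
    }

    private BoundedIoUtils() {
        // no instances
    }
}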

ShiftLeft’s understanding of code goes deeper than this example, thanks to Enhanced Code Property Graphs, so let’s look at a more complex example.

Digging deeper

Here is a conclusion in raw form (edited for brevity):

{
 "id": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)/srcFile/2",
 "description": "The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)",
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Looking at the methodId tells us to go look at FileUtils.copyFileToDirectory(File, File) where we find:

/**
 * Copies a file to a directory preserving the file date.
 * <p>
 * This method copies the contents of the specified source file
 * to a file of the same name in the specified destination directory.
 * The destination directory is created if it does not exist.
 * If the destination file exists, then this method will overwrite it.
 * <p>
 * <strong>Note:</strong> This method tries to preserve the file's last
 * modified date/times using {@link File#setLastModified(long)}, however
 * it is not guaranteed that the operation will succeed.
 * If the modification operation fails, no indication is provided.
 *
 * @param srcFile an existing file to copy, must not be {@code null}
 * @param destDir the directory to place the copy in, must not be {@code null}
 *
 * @throws NullPointerException if source or destination is null
 * @throws IOException if source or destination is invalid
 * @throws IOException if an IO error occurs during copying
 * @see #copyFile(File, File, boolean)
 */
 public static void copyFileToDirectory(final File srcFile, final File destDir) throws IOException {
  copyFileToDirectory(srcFile, destDir, true);
 }

This method just delegates to another copyFileToDirectory() with an added parameter, no big deal. What is interesting is that the codeLocationUrl points to code not in this method but to a private utility method:

https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141

Line 1141 of FileUtils is in the guts of a private method called org.apache.commons.io.FileUtils.doCopyFile(File, File, boolean), which is where ShiftLeft flags the creation of a new FileInputStream. To be honest, this is a beta and I am not convinced that the line numbers are always dead on. What is important here is that ShiftLeft is working with a code graph. Therefore, when I search the JSON conclusions for this URL, I find a total of 14 conclusions that use it. This tells me that this code fragment creates 14 possible attack points in the component, with a careful emphasis on possible, since context is important.

Another key point is that there are two key technologies at work here, and I expect both to get better as the beta progresses. First, building a code graph gives us the power to see that, once a problem has been identified on a line of code, all (I assume public) call sites can be flagged. Second, what constitutes a problem, or a conclusion in ShiftLeft’s neutral parlance, will improve and become configurable, filterable, and sortable.

In this example, the conclusion description reads:

The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.

What goes through my head when I read that is: yeah, I do not want just anybody to be able to copy any file anywhere, like overwriting a password vault a la copyFileToDirectory(myFile, "/etc/shadow"). Granted, Commons IO is a library, not an application, so there is no alarm bell to ring here, but you get the idea.

Stepping back, I think it is important to reiterate what happened here: ShiftLeft found an issue (less dramatic than a problem) on a line of code in a private method, then, using its code graph, created conclusions (report items) for each public-facing method that may eventually call this private method in its code path.

Are you Trusted and Sensitive?

ShiftLeft uses two terms in conclusion descriptions that I want to review. Based on the limited subset of 255 (a very computer-friendly number!) conclusions I saw for all of Commons IO, we can see two types of issues: untrusted and sensitive.

Conclusions described as dealing with sensitive data: This says: look out, if you have a password in this variable, it’s in plain text. Now, it’s up to me to make sure that this password does not end up in a clear-text file or anywhere else that is not secure. This is where context matters: you are the SME of your code, and you know how much trouble you can get yourself and your users into. ShiftLeft has no opinion; it offers ‘conclusions.’

Conclusions referring to untrusted data: This tells me I should take precautions before doing anything with that data. Should I just execute this script? Do I need to worry about JSON hijacking? See “Why does Google prepend while(1); to their JSON responses?”

Flipping it around on ShiftLeft

Let’s flip it around on ShiftLeft for a minute. I am now thinking about not writing sensitive data, like passwords and financial reports, to disk insecurely. I know we have many APIs in FileUtils that write strings to files. Will ShiftLeft tell me about, for example, FileUtils.write(File, CharSequence, Charset)? Here is the method I should never use to write passwords or any sensitive data:

/**
 * Writes a CharSequence to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void write(final File file, final CharSequence data, final Charset encoding) throws IOException {
  write(file, data, encoding, false);
 }

This in turn calls:

/**
 * Writes a CharSequence to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @param append if {@code true}, then the data will be added to the
 * end of the file rather than overwriting
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void write(final File file, final CharSequence data, final Charset encoding, final boolean append)
 throws IOException {
  final String str = data == null ? null : data.toString();
  writeStringToFile(file, str, encoding, append);
 }

Which calls:

/**
 * Writes a String to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @param append if {@code true}, then the String will be added to the
 * end of the file rather than overwriting
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void writeStringToFile(final File file, final String data, final Charset encoding, final boolean
 append) throws IOException {
   OutputStream out = null;
   try {
     out = openOutputStream(file, append);
     IOUtils.write(data, out, encoding);
     out.close(); // don't swallow close Exception if copy completes normally
   } finally {
     IOUtils.closeQuietly(out);
   }
 }

Which calls this IOUtils method:

/**
 * Writes chars from a <code>String</code> to bytes on an
 * <code>OutputStream</code> using the specified character encoding.
 * <p>
 * This method uses {@link String#getBytes(String)}.
 *
 * @param data the <code>String</code> to write, null ignored
 * @param output the <code>OutputStream</code> to write to
 * @param encoding the encoding to use, null means platform default
 * @throws NullPointerException if output is null
 * @throws IOException if an I/O error occurs
 * @since 2.3
 */
 public static void write(final String data, final OutputStream output, final Charset encoding) throws IOException {
   if (data != null) {
     output.write(data.getBytes(Charsets.toCharset(encoding)));
   }
 }

Knowing what I know, I expect ShiftLeft to conclude that all these methods do not support sensitive data. Working back up the stack, I find:

  • org.apache.commons.io.IOUtils.write(String, OutputStream, Charset)
    Nothing on that one; did I miss it due to the 255 conclusion limit?
  • org.apache.commons.io.FileUtils.writeStringToFile(File, String, Charset, boolean)
    Got it:

     {
     "id": "org.apache.commons.io.FileUtils.writeStringToFile:void(java.io.File,java.lang.String,boolean)/file/1",
     "description": "The method `writeStringToFile` does not support handling **untrusted data** to be passed as parameter `file` because it controls access to I/O **File** in a manner that would allow an attacker to abuse it.",
     "unsupportedDataType": "ATTACKER_CONTROLLED",
     "interfaceId": "FILE/false",
     "methodId": "org.apache.commons.io.FileUtils.writeStringToFile:void(java.io.File,java.lang.String,boolean)",
     "codeLocation": {
       "file": "org/apache/commons/io/FileUtils.java",
       "lineNumber": 360,
       "symbol": "parent"
     },
     "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L360",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"
     }
    
  • org.apache.commons.io.FileUtils.write(File, CharSequence, Charset, boolean)
    Got it:

    {
     "id": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset,boolean)/file/2",
     "description": "The method `write` does not support handling **sensitive data** to be passed as parameter `file` because it is leaked over I/O **File**.",
     "unsupportedDataType": "SENSITIVE",
     "interfaceId": "FILE/false",
     "methodId": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset,boolean)",
     "codeLocation": {
       "file": "org/apache/commons/io/FileUtils.java",
       "lineNumber": 355,
       "symbol": "parent"
     },
     "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L355",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"
     }
    
  • org.apache.commons.io.FileUtils.write(File, CharSequence, Charset)
    Got it:

    {
     "id": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset)/file/2",
     "description": "The method `write` does not support handling **sensitive data** to be passed as parameter `file` because it is leaked over I/O **File**.",
     "unsupportedDataType": "SENSITIVE",
     "interfaceId": "FILE/false",
     "methodId": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset)",
     "codeLocation": {
       "file": "org/apache/commons/io/FileUtils.java",
       "lineNumber": 355,
       "symbol": "parent"
     },
     "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L355",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"
     }
    

So yeah, that all hangs nicely together, thank you code graphs!

Finding arbitrary code attacks

Did I mention there are 255 conclusions in the JSON report file? It takes a while to go through them all. I am hoping that ShiftLeft will have their UI in place soon so I can filter and sort all this information! Now that I am about 20% through the file, I see:

The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.


Yikes! Let’s take a deeper look and see if this is for real or a false positive.

Here is the raw conclusion:

{
 "id": "org.apache.commons.io.Java7Support.createSymbolicLink:java.io.File(java.io.File,java.io.File)/symlink/1",
 "description": "The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.",
 "unsupportedDataType": "ATTACKER_CONTROLLED",
 "methodId": "org.apache.commons.io.Java7Support.createSymbolicLink:java.io.File(java.io.File,java.io.File)",
 "codeLocation": {
 "file": "org/apache/commons/io/Java7Support.java",
 "lineNumber": 128,
 "symbol": "file"
 },
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/Java7Support.java#L128",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

The codeLocationUrl for this conclusion points us to the Java7Support class at line 128, where we find:

/**
 * Indicates if a symlink target exists
 * @param file The symlink file
 * @return true if the target exists
 * @throws IOException upon error
 */
 private static boolean exists(File file)
 throws IOException {
   try {
     Object path = toPath.invoke(file);
     final Boolean result = (Boolean) exists.invoke(null, path, emptyLinkOpts);
     return result.booleanValue();
   } catch (IllegalAccessException e) {
     throw new RuntimeException(e);
   } catch (InvocationTargetException e) {
     throw (RuntimeException) e.getTargetException();
   }
}

ShiftLeft points to the line:

Object path = toPath.invoke(file);

The variable toPath is a java.lang.reflect.Method, which can and does execute code, as shown above. Looking narrowly at the code so far, we can say that yes, this code can run anything, since toPath is a Method.

However, widening our view to the field declaration we see the following in the class static initialization block:

toPath = File.class.getMethod("toPath");

This makes sense in the context of the class: Java7Support is used to access Java 7 features while running on pre-Java 7 platforms. Here we are setting up toPath to invoke one specific method. I would expect toPath to be declared final, but it is not:

private static Method toPath;

Why is it not final? Well, the way the static initializer block is written does not allow you to simply add final to the declaration. The static block needs to be rewritten to allow toPath to be final, which we will leave as ‘an exercise for the reader’ 😉 as it is out of scope for an already long blog post.
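For those who do not want to do the homework, here is a minimal sketch of one way such a rewrite could look, shown for the toPath field only (this is not the actual Commons IO code):

import java.io.File;
import java.lang.reflect.Method;

final class Java7SupportSketch {

    // Initialize into a local variable, then assign the final field exactly once.
    private static final Method TO_PATH;

    static {
        Method toPathMethod = null;
        try {
            toPathMethod = File.class.getMethod("toPath");
        } catch (NoSuchMethodException e) {
            // Running on a pre-Java 7 platform: leave the method reference null.
        }
        TO_PATH = toPathMethod;
    }

    private Java7SupportSketch() {
        // no instances
    }
}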

I would be curious to see how ShiftLeft responds to such a code change.

I am not sure if this is really a problem, though. The variable is private, but not final. Yes, its type (Method) is all about executing code. Under normal circumstances, this value cannot be changed outside this class. Of course, I could use reflection to force a new value into toPath. Does that mean that any time I use a Method field I am going to get an arbitrary code execution conclusion? Another corner case to examine.
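To make the reflection scenario concrete, here is a rough, purely illustrative sketch of forcing a different Method into that private, non-final static field (the class and field names come from the code above; nothing here is production code, and a security manager would stop it):

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ToPathOverwriteSketch {

    public static void main(String[] args) throws Exception {
        // Look up the package-private class and its private static field.
        Class<?> java7Support = Class.forName("org.apache.commons.io.Java7Support");
        Field toPathField = java7Support.getDeclaredField("toPath");
        toPathField.setAccessible(true); // bypass the private modifier

        // Any Method will do; this one is just a stand-in for attacker-chosen code.
        Method replacement = Runtime.class.getMethod("exec", String.class);

        // This assignment works only because the field is static but not final.
        toPathField.set(null, replacement);
    }
}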

What if I rewrote the static block and declared the variable final? Would ShiftLeft still reach the same conclusion? If yes, would that be because I could still use reflection to hammer any value into the field?

Concluding on this first arbitrary code attack

The more I explore these test results, the more I realize how tricky security is and how much context matters. I now know that the Java7Support class in Apache Commons IO 2.5 is open to an arbitrary code attack under the narrow use case of another piece of code using reflection. But if that code is allowed to use reflection, surely it could achieve its goal without going through the extra hoop of hacking Java7Support.

Stepping back, the realization is that I should think twice about using the Method class because I could open my application up to an attack unless that Method field is properly protected.

Looking for more arbitrary code attacks

Now that ShiftLeft has whetted my appetite, I wonder if there are more arbitrary code attacks lurking. A quick search through the file reveals a total of five. Not surprisingly, these are all in the Java7Support class and all follow the same pattern as above: calling the invoke method of a Method object, where the Method is initialized in the static block.

Flipping it around once more, let’s look at the Java7Support class variable declarations and see if all Method objects end up being accounted for by ShiftLeft:

/**
 * Java7 feature detection and reflection based feature access.
 * <p/>
 * Taken from maven-shared-utils, only for private usage until we go full java7
 */
class Java7Support {

  private static final boolean IS_JAVA7;

  private static Method isSymbolicLink;

  private static Method delete;

  private static Method toPath;

  private static Method exists;

  private static Method toFile;

  private static Method readSymlink;

  private static Method createSymlink;
  ...

We have seven static Method declarations which I see initialized in the static block:

static {
    boolean isJava7x = true;
    try {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        Class<?> files = cl.loadClass("java.nio.file.Files");
        Class<?> path = cl.loadClass("java.nio.file.Path");
        Class<?> fa = cl.loadClass("java.nio.file.attribute.FileAttribute");
        Class<?> linkOption = cl.loadClass("java.nio.file.LinkOption");
        isSymbolicLink = files.getMethod("isSymbolicLink", path);
        delete = files.getMethod("delete", path);
        readSymlink = files.getMethod("readSymbolicLink", path);
        emptyFileAttributes = Array.newInstance(fa, 0);
        createSymlink = files.getMethod("createSymbolicLink", path, path, emptyFileAttributes.getClass());
        emptyLinkOpts = Array.newInstance(linkOption, 0);
        exists = files.getMethod("exists", path, emptyLinkOpts.getClass());
        toPath = File.class.getMethod("toPath");
        toFile = path.getMethod("toFile");
    } catch (ClassNotFoundException e) {
        isJava7x = false;
    } catch (NoSuchMethodException e) {
        isJava7x = false;
    }
    IS_JAVA7 = isJava7x;
}

ShiftLeft gives me five conclusions:

The method `isSymLink` does not support handling **untrusted data** to be passed as parameter `file` because it could allow an attacker to run arbitrary code.
The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.
The method `delete` does not support handling **untrusted data** to be passed as parameter `file` because it could allow an attacker to run arbitrary code.
The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `target` because it could allow an attacker to run arbitrary code.
The method `readSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.

All of the Method declarations in this class are used by all of the methods listed above. Nice.

Fin

I’d like to wrap up this exploration of ShiftLeft with a quick summary of what we found: a tool we can add to our build pipelines to find potential security issues. There is a lot of data here, and this is just for Apache Commons IO 2.5, so another lesson is that context matters. ShiftLeft currently provides ‘conclusions’ based on a code graph. We found conclusions about untrusted data (I’m not sure what’s in here, so don’t go executing it), sensitive data (don’t save passwords in plain text!), and arbitrary code attacks.

I hope to revisit this story and run ShiftLeft on other Apache Commons projects soon. This sure is fun!

Happy Coding,
Gary Gregory

The Art of Test Driven Development: Running unit tests conditionally

There comes a time when Java’s promise of “write once, run anywhere” breaks down for some of us. In my case, and for my Windows users, file paths present platform-specific challenges. This article will show you how to run unit tests conditionally, for example only on Windows.

I talked about conditionally ignoring tests in JUnit in the past, but I’d like to revisit this topic in a little more detail.

I have an application that uses URIs for file references, but for most sane Windows users, a file reference is something like E:\some\folder\foo.txt, not file:///E:/some/folder/foo.txt. Is it ‘file’, a colon, and two slashes, or three, or four? Darn!

To make life easier for normal and power users, the application supports both Windows paths and URIs. The application converts Windows file paths to URIs internally and everyone’s happy. How do you unit test this? Just as always, you say. The trick comes when your build runs on a farm of Windows and Linux boxes, and some of your carefully crafted tests pass on Windows but fail on Linux for obtuse reasons that are not the point of this write up. The bottom line is that I want to run certain tests on Windows only. Fortunately, JUnit provides a nice way to do this using its Assume class.

Here’s how you do it:

private static final boolean IS_WINDOWS =
  System.getProperty("os.name").startsWith("Windows");

@Test
public void testMeOnWindows() {
  Assume.assumeTrue(IS_WINDOWS);
  …
}

We first set a boolean constant that indicates whether we are running on Windows. To do that, we simply look up the os.name system property and check that it starts with the String “Windows”.

The JUnit Assume class causes JUnit to ignore a test if the given condition fails. In our example, the test method will only proceed on Windows; but the test will not fail.

A failed assumption does not mean you have a bug or that the test failed; it means that JUnit should not continue running the test and should not mark it as a failure. JUnit test runners will skip tests with failing assumptions. Your mileage may vary with other test runners.

Assume checks can be for any condition, giving you and your tests a lot of flexibility.
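For example, you could gate a test on an external resource being available rather than on the operating system (a hypothetical sketch; the test.db.available system property is made up for illustration):

import org.junit.Assume;
import org.junit.Test;

public class ConditionalDatabaseTest {

    // Hypothetical flag, e.g. passed on the command line with -Dtest.db.available=true
    private static final boolean DB_AVAILABLE = Boolean.getBoolean("test.db.available");

    @Test
    public void testNeedsDatabase() {
        Assume.assumeTrue(DB_AVAILABLE); // skipped, not failed, when the flag is absent
        // test against the database...
    }
}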

Under the hood

How does JUnit do this? When you call an Assume method like assumeTrue(), the following happens:

public static void assumeTrue(boolean b) {
  assumeThat(b, is(true));
}

JUnit calls the more general assumeThat() method with a Hamcrest matcher from org.hamcrest.CoreMatchers.is(T):

public static <T> void assumeThat(T actual, Matcher<T> matcher) {
  if (!matcher.matches(actual)) {
    throw new AssumptionViolatedException(actual, matcher);
  }
}

The expression matcher.matches(actual) uses Hamcrest to evaluate your condition and returns true if it passes. If your condition fails, assumeThat() throws an AssumptionViolatedException, which is how JUnit implements assumptions. The JUnit internals catch these exceptions and handle them differently from other exceptions, adding the assumption violation to a list it uses in its reports.

There you have it, JUnit assumptions give your tests an extra level of flexibility by allowing you to run tests conditionally.

Happy Coding,
Gary Gregory

The Art of Test Driven Development: Logging JUnit Test Results

JUnit saves your test results to a file (using XML, plain, brief or failure formatters). If you use Maven, you can convert these files into an HTML report easily. However, that means waiting for a build to complete. If you want to see your test results while a Maven or Ant build is running, you must watch the command line.

In an IDE like Eclipse, you can watch the JUnit view to see tests progress. Wouldn’t it be nice to be able to do whatever you want with tests results on the fly?

What we’ll show you here as an alternative is logging your test results while JUnit runs your tests. This gives you all the power of Log4j to direct test results all over the place, from a simple console, to a database, a JMS queue, or any of the long list of appenders Log4j provides. You can provide your own Log4j appender as well for custom processing.

The basic mechanism that lets us do this is JUnit’s TestWatcher rule. Let’s introduce the TestWatcher with a simple example that logs test results to a StringBuilder. The example dumps the builder’s contents on the console when the test finishes.

package com.garygregory.log4j;

import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestWatcher;
import org.junit.runner.Description;

public class StringBuilderTestWatcherTest {
    private static final String EOL = 
        System.getProperty("line.separator");

    private static StringBuilder builder = new StringBuilder();

    @AfterClass
    public static void afterClass() {
        System.out.print(builder);
    }

    @Rule
    public TestWatcher watchman = new TestWatcher() {

        @Override
        protected void failed(Throwable e, Description description) {
            if (description != null) {
                builder.append(description);
            }
            if (e != null) {
                builder.append(' ');
                builder.append(e);
            }
            builder.append(" FAIL");
            builder.append(EOL);
        }

        @Override
        protected void succeeded(Description description) {
            if (description != null) {
                builder.append(description);
            }
            builder.append(" OK");
            builder.append(EOL);
        }
    };

    @Test
    public void testFails() {
        // test
        Assert.fail();
    }

    @Test
    public void testSucceeds() {
        // test
    }
}

In this example, the test class implements the TestWatcher rule directly. JUnit calls each method you annotate with @Test, one after the next. JUnit calls the failed() method when a @Test method fails, giving it a description and an exception. JUnit calls the succeeded() method when a @Test method completes normally: no JUnit Assert calls failed and no unexpected exceptions were thrown.

The method afterClass() (annotated with JUnit’s @AfterClass) runs after all test methods have run and outputs the contents of the StringBuilder.

The console output for the StringBuilder example is:

testSucceeds(com.garygregory.log4j.StringBuilderTestWatcherTest) OK
testFails(com.garygregory.log4j.StringBuilderTestWatcherTest) java.lang.AssertionError FAIL

It’s easy to imagine growing and morphing this StringBuilder implementation into all sorts of things: writing the results to a file in your own format, sending results to a database, and so on.

Instead of going that way, though, you should bridge to Log4j and reuse its powerful and rich feature set, which lets you do all that and much more. The power of Log4j comes from the simplicity of its API combined with the rich configurations you create in XML, JSON, or YAML. Log4j 2.4 also provides a programmatic configuration API, but that’s a separate topic.

To log our results to Log4j, we replace the StringBuilder with a Logger:

package com.garygregory.log4j;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestWatcher;
import org.junit.runner.Description;

public class Log4jTestWatcherTest {

    private static Logger logger = LogManager.getLogger();

    @Rule
    public TestWatcher watchman = new TestWatcher() {

        @Override
        protected void failed(Throwable e, Description description) {
            logger.error(description, e);
        }

        @Override
        protected void succeeded(Description description) {
            logger.info(description);
        }
    };

    @Test
    public void testFails() {
        // test
        Assert.fail();
    }

    @Test
    public void testSucceeds() {
        // test
    }
}

In this example, we still embed an implementation of TestWatcher in the test itself, but this time the methods delegate to a Logger. When a test fails, we log at the ERROR level, and when a test succeeds, we log at the INFO level.

The console output for the Log4j example is:

2015-09-14 20:25:57,173 INFO [main][com.garygregory.log4j.Log4jTestWatcherTest] testSucceeds(com.garygregory.log4j.Log4jTestWatcherTest)
2015-09-14 20:25:57,175 ERROR [main][com.garygregory.log4j.Log4jTestWatcherTest] testFails(com.garygregory.log4j.Log4jTestWatcherTest)
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.fail(Assert.java:95)
  at com.garygregory.log4j.Log4jTestWatcherTest.testFails(Log4jTestWatcherTest.java:32)

The next obvious step is to abstract this TestWatcher in a class:

package com.garygregory.log4j;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.junit.rules.TestWatcher;
import org.junit.runner.Description;

public class Log4jTestWatcher extends TestWatcher {

    private final Logger logger;

    public Log4jTestWatcher() {
        logger = LogManager.getLogger();
    }

    public Log4jTestWatcher(String loggerName) {
        logger = LogManager.getLogger(loggerName);
    }

    @Override
    protected void failed(Throwable e, Description description) {
        logger.error(description, e);
    }

    @Override
    protected void succeeded(Description description) {
        logger.info(description);
    }
}

And reuse it in the test:

package com.garygregory.log4j;

import org.junit.Assert;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TestWatcher;

public class Log4jTestWatcherTest2 {

    @Rule
    public TestWatcher watchman = new Log4jTestWatcher();

    @Test
    public void testFails() {
        // test
        Assert.fail();
    }

    @Test
    public void testSucceeds() {
        // test
    }
}

This is short and sweet; compare it to our first listing!

Boom! All the power of Log4j and JUnit combined.

Happy Coding,
Gary Gregory

The Art of Test Driven Development: Understanding Logging

Modern logging frameworks include the concepts of levels, loggers, markers, and filters. How and when should you use each? This is what I want to clarify, based on discussions recently taking place on the Log4j user mailing list.

While logging is obviously a Martha Stewart-like “good thing”, what, when, and how to log might not be obvious to some. The how can be quite tricky, or not even possible with older frameworks. This article will clarify basic and advanced logging concepts using Java and Apache Log4j 2 as our reference platform.

Log4j 2 is the best Java logging library out there, I claim. Yes, better than Slf4j, Logback, and without a doubt, JUL. Granted, I am partial, as one of its contributors.

Central to any logging system is the log event; this is the piece of information you are sending to the log system, like “hello, I just started running.” Let’s start with the simplest concept: the log event’s level.

Understanding Log Levels

I think of levels as “how important is this event” and “who needs to know this”. Some of the art of logging is deciding who your audience is. To help your development team chase down a bug, you want to make sure that your app logs interesting events at the DEBUG and TRACE levels. This is different from “what is it I am telling my audience”, which is where loggers come in.

To help you rate the importance of log events, Log4j provides the following levels: FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. When I enable a level, Log4j logs events at that level and all levels above it. For example, enabling WARN events will show me WARN through FATAL events, but not INFO through TRACE. In addition, Log4j bookends these levels with two special levels: OFF and ALL.

Let’s look at logging events in an authentication module. A user successfully logging in is normal and logs an INFO event. The module rejects a user after he enters the wrong password and logs a WARN event. After three unsuccessful attempts, the module locks out this user, and logs an ERROR event:

INFO : User alice logged in.
WARN : User bob entered an invalid password.
ERROR: User bob entered an invalid password three times, user locked out.

Then from a different part of the app, the email agent, for example:

DEBUG: Loading user bob from database jdbc:...
INFO : Emailed user bob: three login attempts rejected, user locked out.
ERROR: Email to alice bounced back; subject: Your weekly summary.

Depending on who does the logging, the meaning of what is normal vs. an error can be very different.

To wrap up this section: I usually configure logging when I start the application, through a configuration file, for example telling Log4j to log all events at the WARN level and above. This lets me see only non-mundane events, stuff I should probably pay attention to. It is also possible to reconfigure Log4j while an application is up and running, a topic I’ll leave outside the scope of this article; it’s worth noting that Log4j can watch its configuration file for changes.

Using Custom Log Levels

Advanced logging frameworks like Log4j allow you to define custom levels. Custom levels can help you understand levels better by applying the level concept to your application domain. Let’s take a little detour and look at some interesting level nomenclatures.

DEFCON levels

My favorite example is a War Games module that implements the five defense readiness conditions (DEFCONs), the alert states used by the United States Armed Forces:

“It prescribes five graduated levels of readiness (or states of alert) for the U.S. military, and increase in severity from DEFCON 5 (least severe) to DEFCON 1 (most severe) to match varying military situations.”[1]

The U.S. Department of Defense uses exercise terms when referring to DEFCON levels, the intent being to preclude confusing exercise commands with actual operational commands. The exercise terms are: COCKED PISTOL (DEFCON 1), FAST PACE, ROUND HOUSE, DOUBLE TAKE, and FADE OUT (DEFCON 5).

DEFCON levels
Name       Description                                                     Color
DEFCON 1   Maximum force readiness.                                        White
DEFCON 2   Armed Forces ready to deploy and engage in less than 6 hours.   Red
DEFCON 3   Air Force ready to mobilize in 15 minutes.                      Yellow
DEFCON 4   Above normal readiness.                                         Green
DEFCON 5   Normal peacetime readiness.                                     Blue

The DEFCONs are perfect candidates for custom log levels, as are their exercise terms.
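Here is what defining and using such levels could look like in Log4j 2 (a minimal sketch; the numeric values are my own choices that simply slot the custom levels between the built-in ones, where FATAL is 100 and INFO is 400):

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class DefconLevels {

    // Custom levels; lower numbers are more severe in Log4j's ordering.
    public static final Level DEFCON_1 = Level.forName("DEFCON_1", 150);
    public static final Level DEFCON_5 = Level.forName("DEFCON_5", 350);

    private static final Logger logger = LogManager.getLogger(DefconLevels.class);

    public static void main(String[] args) {
        logger.log(DEFCON_5, "Normal peacetime readiness.");
        logger.log(DEFCON_1, "Maximum force readiness.");
    }
}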

You can also imagine a set of custom levels based on the Saffir-Simpson hurricane wind scale.

This graphic shows all North Atlantic and Eastern North Pacific major hurricanes (at least Category 3 on the Saffir-Simpson Hurricane Scale)

The National Hurricane Center defines the Saffir-Simpson Hurricane Wind Scale as “a 1 to 5 rating based on a hurricane’s sustained wind speed. This scale estimates potential property damage. Hurricanes reaching Category 3 and higher are considered major hurricanes because of their potential for significant loss of life and damage. Category 1 and 2 storms are still dangerous, however, and require preventative measures. In the western North Pacific, the term ‘super typhoon’ is used for tropical cyclones with sustained winds exceeding 150 mph.”

Saffir-Simpson Hurricane Wind Scale (more)
Category    Sustained Winds                              Damage
5 (major)   157 mph (137 kt; 252 km/h) or higher         Catastrophic damage will occur.
4 (major)   130-156 mph (113-136 kt; 209-251 km/h)       Catastrophic damage will occur.
3 (major)   111-129 mph (96-112 kt; 178-208 km/h)        Devastating damage will occur.
2           96-110 mph (83-95 kt; 154-177 km/h)          Extremely dangerous winds will cause extensive damage.
1           74-95 mph (64-82 kt; 119-153 km/h)           Very dangerous winds will produce some damage.

This is another great example of a domain whose categories would make good custom log levels.

I hope these examples make it clear, if it was not already, that the level is about the relative importance of log events, regardless of who sends them.

To recap: A log level describes how important an event is: Is this a low-level step, or a rare and dangerous once-in-a-100-year event?

Next up are loggers, the kind of logging object an application interfaces with most often.

Understanding Loggers

An application sends log events like “I woke up” to the logging system through a logger.

The log event includes a level, so what the application is really saying is, “Here is an interesting bit of information (nothing alarming); I just woke up.”

A car could log a warning event saying, “Hey, driver, I’m running low on gas!” In code, this usually looks something like this:

logger.warn("I'm low on gas.");

A better message would include how much gas remains:

logger.warn("I'm low on gas; {} gallons remaining.", gallonsRemaining);

In this example, the {} is a parameter marker.

A logger has a name; in this case, we could call ours “car”.

If you imagine a large enough system, using the same logger to log all events will make it harder to find any single event, or an interesting subset, out of the whole event set. Let’s say my system includes my house, office, home office, kitchen, each room in the house, people, cars, toys, appliances, and so on. You can see that we will end up with a lot of logging events coming from all over the place. To remedy this, you use a distinct logger for each kind of object and give each one a unique name.
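In Log4j 2 code, a named logger is simply looked up by its name; a minimal sketch of the car example could look like this (the Car class and the fuel threshold are made up for illustration):

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class Car {

    // A logger named "car"; other objects get their own names ("kitchen", "office", ...).
    private static final Logger logger = LogManager.getLogger("car");

    public void checkFuel(double gallonsRemaining) {
        if (gallonsRemaining < 2.0) {
            logger.warn("I'm low on gas; {} gallons remaining.", gallonsRemaining);
        }
    }
}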

If I want to analyze my driving and still be alerted of all other important events, I can now configure Log4j to log all car events at the DEBUG level and all other loggers at the WARN level.

Refining Logging with Hierarchical Loggers

Let’s take this a step further, looking at logging for our kitchen subsystem, and imagine that we look at all events for one day, regardless of the levels involved.

Every time the refrigerator door opens, I get an event, when it closes, another event. Same for the kitchen door, the oven door, the dishwasher door; you get the idea. The refrigerator can also warn me when its water filter needs replacing. The oven can tell me when preheating to a preset temperature is done, when the timer goes off and my cake is done.

What I then get is a flood of events from the kitchen logger; even if I can throttle the importance of events through levels, I still get to hear from all my appliances. What if I am only interested in the oven? Now, I know what you’re thinking: “Gary, just create one logger for each appliance.” “Well, yes, you’re right,” I’d say, “but what would you call each logger?” “I’d call them refrigerator, dishwasher, oven door, oven timer, and so on.”

OK, but what if I want to listen to all kitchen events again, not just the one appliance? I’d have to set the level on each appliance logger; if I have dozens (imagine hundreds) of loggers, that’s quite a pain. There should be a way to set the level for the whole kitchen, including everything in it.

And there is: By using hierarchical loggers.

To create a logger hierarchy, all you have to do is use periods in the name of your loggers, kind of like with internet domains. For example:

  • Kitchen
  • Kitchen.Oven
  • Kitchen.Oven.BottomOven
  • Kitchen.Oven.BottomOven.Door
  • Kitchen.Oven.BottomOven.Timer
  • Kitchen.Oven.BottomOven.Light
  • Kitchen.Oven.TopOven.Door
  • Kitchen.Oven.TopOven.Timer
  • Kitchen.Oven.TopOven.Light
  • Kitchen.Dishwasher
  • Kitchen.Refrigerator
  • Kitchen.Refrigerator.Door
  • Kitchen.Refrigerator.Filter
  • Kitchen.Refrigerator.Light
  • Kitchen.Refrigerator.Freezer.Door
  • Kitchen.Refrigerator.Freezer.Light
  • And so on

If my configuration only sets the Kitchen level to INFO, then all loggers that start with the string Kitchen inherit its level, INFO. In addition, if I configure the Kitchen.Oven logger to DEBUG, then all loggers that start with the string Kitchen.Oven will log at the DEBUG level. That’s how level inheritance works. Pretty cool and very powerful.
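In a configuration file, that hierarchy could be expressed along these lines (a minimal sketch in the style of the configuration shown later in the markers section; the appender is just a console):

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT" />
  </Appenders>
  <Loggers>
    <!-- Everything under Kitchen inherits INFO... -->
    <Logger name="Kitchen" level="INFO" />
    <!-- ...except the Kitchen.Oven subtree, which logs at DEBUG. -->
    <Logger name="Kitchen.Oven" level="DEBUG" />
    <Root level="WARN">
      <AppenderRef ref="Console" />
    </Root>
  </Loggers>
</Configuration>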

On the coding side, a common and pragmatic practice is to use one static logger for each class, named with the fully qualified name of the class (meaning the package name is included, like com.example.appliance.oven).
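In Log4j 2, that convention is usually a one-liner; here is a minimal sketch (the package and class names are made up):

package com.example.appliance.oven;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class Oven {

    // The no-argument getLogger() names the logger after the calling class,
    // here com.example.appliance.oven.Oven.
    private static final Logger logger = LogManager.getLogger();

    public void preheat(int fahrenheit) {
        logger.debug("Preheating to {} degrees.", fahrenheit);
    }
}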

By carefully naming your loggers, you can get precisely defined log events, suitable for all manners of clever filtering, searching, and analysis.

You can already see that with just two concepts, levels and hierarchical loggers, you get a world of well-defined log events at your fingertips, allowing you to analyze and debug your system to your heart’s content, in live applications or post-mortem.

Top Gun Logging with Markers

The next ingredient in our logging soup is the event marker, which adds another level (pardon the pun) of information (again, sorry) to a log event. A Log4j marker has a name and, yes, it too can be hierarchical, but let’s leave that bit for another article. So, what is a marker? The way I like to look at it is that a marker qualifies the kind of event I am signaling, like a tag on a blog post. This is very handy when many parts of the system, using different loggers, log the same kind of event: for example, every time a user opens and closes a door, whether that be an appliance door, a garage door, a doggy door, or a POD (a Plain Old Door). This sure would be handy for security monitoring.

How would you log only open and close door events using loggers and levels? You can assume that door events are logged at… what? Maybe some appliances log door events at the INFO level and others at the DEBUG level. What if doors from a third-party library log this kind of event at TRACE? Another twist!

Noodle on that one for a minute…

Well, tricky, isn’t it? If I have loggers like Kitchen.Refrigerator.Door and Kitchen.Oven.BottomOven.Door, the common root is Kitchen, not Door, so setting Kitchen to INFO is not what I want because I will get INFO events from all kitchen loggers! And what about the appliance that logs door events at the DEBUG level? That approach is dead on arrival.

Well, what about the reverse domain name trick? For each logger, I could create another logger with the name in reverse, one word at a time, so Kitchen.Refrigerator.Door becomes Door.Refrigerator.Kitchen. Then I can set the Door logger to INFO and get all Door events! That would work for a given level, but it sure is nasty! Two loggers for each appliance? It’s even worse: I’d need more reversed names for my Filters, Lights, and so on. Worst of all, this does not account for appliances that log door events at the DEBUG and TRACE levels! Nope, it can’t be done with this reverse hack.

That’s when markers come to the rescue. The code in all your classes that contain a door changes from:

logger.info("Door opened");
...
logger.info("Door closed");

to:

logger.info(DOOR, "Door opened");
...
logger.info(DOOR, "Door closed");

Where DOOR is a constant that defines a marker. In Log4j, you do it like this:

public static final Marker DOOR = MarkerManager.getMarker("DOOR");

No new loggers needed, just an additional parameter to your log call, regardless of the level API used.

Now, I can configure Log4j to log only events that contain the marker DOOR (if that level is enabled). Truly awesome. In order to do this, we must use filters:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN" monitorInterval="5">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT" />
  </Appenders>
  <Loggers>
    <Root level="WARN">
      <AppenderRef ref="Console" />
    </Root>
    <Logger name="com.example.oven" level="DEBUG" />
    <Logger name="com.example.garage" level="INFO" />
  </Loggers>
  <Filters>
    <MarkerFilter marker="DOOR" onMatch="ACCEPT" onMismatch="DENY" />
  </Filters>
</Configuration>

Here, we start by setting the root logger to WARN; then we set com.example.oven to log at DEBUG because the oven logs door events at the DEBUG level, and com.example.garage to log at INFO because the garage logs door events at the INFO level. Next, we define filters, and here we only have one: a marker filter set to accept only DOOR markers and reject all others.

If instead, we wanted to hide DOOR events, we’d say:

<MarkerFilter marker="DOOR" onMatch="DENY" onMismatch="NEUTRAL" />

I’ve used markers for:

  • Different kinds of hex dumps:
    • Compressed buffers
    • Decompressed buffers
    • IO buffers
  • Different kinds of SQL statements:
    • Queries
    • Batches
    • Prepared statements
  • Flow tracing (built-in Log4j)
  • and more

That, my friends, is what I call Top Gun logging!

Summary

Levels tell you how important an event is. Loggers define who does the logging. Markers refine what it is you are logging. Filters let you control what ends up in your logs. If you combine these concepts with care, your applications will create logs that help you debug, analyze, monitor, and audit them with ease and grace.

Happy coding,
Gary Gregory

[1] DEFCON: https://en.wikipedia.org/wiki/DEFCON

The Art of Test Driven Development: Per-Test Logging

All sufficiently complex applications have a logging subsystem. When writing tests, it is often useful to enable a specific logging configuration for a given test case. In this article, I will show how to use Apache Log4j and JUnit to achieve this goal.

Log4j is the granddaddy of logging frameworks and, yes, I am partial to it as one of its developers and a member of its Project Management Committee. I find it far superior, as we are about to release version 2.4, to all of its cousins. JUnit and the best practice of Test-Driven Development should need no introduction; if you use neither, then I welcome you to the 21st century and its ‘extreme’ or ‘agile’ development practices.

When you are running a large build with thousands of tests, or even a small build with dozens, you do not want to set the logging level to debug or trace for the whole build. A small subsystem can end up generating a large number of log events, leaving you with a task equivalent to finding a needle in a haystack in order to locate the few logging events relevant to your development session.

What I want is to tell JUnit that a given test case should run with a specific Log4j configuration and then forget this configuration when the test finishes. To accomplish this, we will use a JUnit Rule and Log4j’s LoggerContextRule (this class is located in Log4j’s test jar; Log4j 2.3 named this class InitialContextRule).

Here is our test boilerplate:

public class MyTest {

  @Rule
  public LoggerContextRule ctx = 
    new LoggerContextRule("resources/MyConfig.xml");

  @Test
  public void testSomething() throws Exception {
    // testing…
  }

}

In JUnit, a @Rule annotates a field that points to a rule or a method that returns a rule. In our case, we store our rule in an instance variable, which must be public and non-static. A rule will run methods annotated with @Before, then @Test methods, and finally @After methods. JUnit throws an exception when any of these fail. The code for the rule itself runs before each @Before method. This means we get a fresh logging configuration for each @Test method.

The LoggerContextRule argument is a URI or path to a configuration file on your classpath.

If you want the rule to run only once for the whole test class, then use a class rule instead:

@ClassRule
public static LoggerContextRule ctx = 
  new LoggerContextRule("resources/MyConfig.xml");

Note that the variable is now static (and still public).

Whether you should use a rule or a class rule depends on how much isolation you want between test methods. In the case of logging, it should not matter, and using a class rule should be preferred since your build will be faster.

You can find the Log4j rule class in the Core’s test jar here.

Happy Coding,
Gary Gregory

JUnit Tip: Use rules to manage temporary files and folders

You can remove the burden of deleting temporary files and folders from your test code by using a JUnit TemporaryFolder Rule. Rules themselves are new in JUnit 4.7 and are used to change the behavior of tests. For example, the following test creates and deletes a temporary file and folder:

package test;

import java.io.File;
import java.io.IOException;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class TestTemporaryFolderRule {
    @Rule
    public TemporaryFolder testFolder = new TemporaryFolder();

    @Test
    public void testInTempFolder() throws IOException {
        File tempFile = testFolder.newFile("file.txt");
        File tempFolder = testFolder.newFolder("folder");
        System.out.println("Test folder: " + testFolder.getRoot());
        // test...
    }
}

In order to enable this feature, you must use the annotation @Rule on the declaration of the TemporaryFolder instance variable. The TemporaryFolder creates a folder in the default temporary file directory specified by the system property java.io.tmpdir. The method newFile creates a new file in the temporary directory and newFolder creates a new folder.

When the test method finishes, JUnit automatically deletes all files and directories in, and including, the TemporaryFolder. JUnit guarantees to delete these resources whether the test passes or fails.