How to mitigate CVE-2021-44228 in Log4j 2

Leave a reply

This post describes how to mitigate against CVE-2021-44228: In Apache Log4j2 2.0-beta9 through 2.14.1, the JNDI features used in configurations, log messages, and parameters do not protect against an attacker-controlled LDAP and other JNDI related endpoints. An attacker who can control log messages or log message parameters can execute arbitrary code loaded from LDAP servers when message lookup substitution is enabled.

The Log4j JNDI Attack

Log4j 2.16.0 addresses CVE-2021-45046: Apache Log4j2 Thread Context Message Pattern and Context Lookup Pattern vulnerable to a denial of service attack.

Log4j 1.x mitigation

Log4j 1.x does not have Lookups so the risk is lower.

Applications using Log4j 1.x are only vulnerable to this attack when they use JNDI in their configuration. A separate CVE-2021-4104 has been filed for this vulnerability. To mitigate: audit your logging configuration to ensure it has no JMSAppender configured. Log4j 1.x configurations without JMSAppender are not impacted by this vulnerability.

Log4j 2.x mitigation

Implement one of the mitigation techniques below.

Java 8 users should upgrade to release 2.17.0 (latest).
Java 7 users should upgrade to release 2.12.2.
Otherwise, remove the JndiLookup class from the classpath:
zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class

Note: The log4j-api JAR file in Log4j2 is not impacted by this vulnerability.

See also the Apache Log4j Security page.

How we handle binary compatibility at Apache Commons

Leave a reply

As a maintainer of the free and open source software project Apache Commons, I review pull requests on GitHub. Since the libraries we produce in components like Commons Lang, Commons IO, and Commons VFS are used directly and transitively in countless applications, open and closed source alike, we want to be careful when releasing new versions to cause the least headaches as possible. One way to enforce our least-headache policy is a recurring reason PRs are rejected: The PR breaks binary compatibility. I’ll outline here how we deal with this issue while still allowing our APIs to evolve.

What is binary compatibility?

Maintaining binary compatibility, in a nutshell, is the ability to drop a new version of a class file (in a jar for example) on top of your application and have it run without the Java Virtual Machine throwing an error trying to call the updated code.

At the JLS level, this means binary compatibility is preserved when a binary that previously linked without error continues to link without error.

Binary compatibility is… well, a binary quality: A release either is or isn’t binary compatible with its previous version.

If you remove a class or method and your app tries to call it, you break binary compatibility and your app blows up.

If you add or remove a parameter to a method, you break binary compatibility and, your app blows up.

This is all detailed in the Java Language Specification.

Note that this is different from source compatibility and behavioral compatibility.

Benefiting from binary compatibility

Why do this? To avoid Jar hell and needlessly breaking applications. See also DLL hell.

Defining guidelines at Apache Commons

Our guideline at Apache Commons on binary compatibility is:

Don’t break binary compatibility in a minor release (e.g. 2.0 to 2.1) or a maintenance release (e.g. 2.1.2 to 2.1.3).

For a major release (e.g. 2.2 to 3.0), you may break binary compatibility; and if you do, you must change the Java package name and Maven coordinate’s artifact ID.

For example, when Apache Commons Lang 3.0 was released, we updated the Java package from org.apache.commons.lang to org.apache.commons.lang3 and the Maven artifact ID from commons-lang to commons-lang3.

In the case of Apache Commons Lang, the 1.x and 2.x series of releases live in the same package and Maven coordinates.

This system allows, and this is key, for version 2 and 3 to co-exist in the same class loader. While the same is true for version 1 and 3, it is not for 1 and 2.

What am I allowed to change?

Almost everything, according to the Java Language Specification, you may:

Rewrite the body of methods, constructors, and initializers (like static blocks).
Rewrite code in the above that previously threw exceptions to no longer do so.
Add fields, methods, and constructors.
Delete items declared private.
Reorder fields, methods, and constructors.
Move a method higher in a class hierarchy.
Reorder the list of direct super-interfaces in a class or interface.
Insert new class or interface types in a type hierarchy.
Add generics (since the compiler erases them).

In addition, our guidelines allow for updating package private element.

Checking for binary compatibility

When a release candidate is put up for a vote, it is proposed along with a web site generated by Maven which contains a report on binary compatibility. There are several tools you can use to do this, we usually use the JApiCmp Maven plugin. For example, this is the report for the current version of Apache Commons Lang.

A pattern I like using is providing a default Maven goal which includes JApiCmp which I invoke for builds on Travis-CI and GitHub Actions. For example:

<defaultGoal>clean verify apache-rat:check japicmp:cmp checkstyle:check spotbugs:check javadoc:javadoc</defaultGoal>

Wrapping up

All of this to say, as a library author:

Don’t give your users headaches or put their apps in Jar hell by maintaining binary compatibility within a major release line.
Use a build tool to fail a build that breaks binary compatibility.
When you must break binary compatibility, change the Java package name and Maven coordinates (this works for Ivy, Gradle, and so on).

Explaining Terms at Apache

At Apache, the term project refers to a whole top level project (TLP).

Apache Commons is a TLP.

Apache Commons is made up of over 20 components.

Apache Commons components produce one or or more jar files. The components are listed on the Commons’ project main page.

Enabling Long Path Names on Windows

Leave a reply

Here is how you enable long path names on Windows (I tested this on Window 10.0.16299.1268):

Run:
```
regedit.exe
```

Open the key:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

Set LongPathsEnabled to 1:
Reboot

Then, if you are a git user, you will need to run:

git config --system core.longpaths true

This usually requires admin rights.

That’s it.

Visualizing Code Vulnerabilities with the new ShilftLeft UI

Leave a reply

This post is a follow up to Using ShiftLeft in Open Source, where I was looking to see if I could apply the principle of shift left testing to security. Now that ShiftLeft has a user interface, I want to come back to it and revisit looking at results from the UI instead of pouring through JSON reports. You’ll find that this write up parallels my original post so reading the original is not required to get up to speed.

Getting Rid of FUD and Panic

To get us started, allow me to go through the premise from my initial post: My long term goal is to formally insert security awareness into my development practices and eventually into my continuous integration-based builds.

After years of being involved in open-source development at Apache, we’ve seen security issues pop up in Apache Commons like arbitrary remote code execution, and denial of service attacks (CVE-2016-3092 and CVE-2014-0050). While some threats are real, other are just FUD. Even when they are real, it is important to consider context. There may be problems that end users never see because the “hole” is not reachable by users’ actions.

The idea behind ShiftLeft is to break old habits of building a product and then, later, figuring out how to fend off attacks and plug up security holes. Today, we take for granted that unit testing, integration testing, continuous integration, and continuous delivery are common place. ShiftLeft propose to make security analysis as ubiquitous.

By DonFiresmith (Own work) [CC BY-SA 4.0], via Wikimedia Commons

Getting Started

Since ShiftLeft is free for open source projects, I decided to look what it reports for Apache Commons IO, an Apache Commons Java component.

To get started, go to https://www.shiftleft.io/developers/ and enter a GitHub repository URL.

ShiftLeft then asks you for your name and email address:

And you are off to the races.

It’s important to note that ShiftLeft has a 30 day disclosure policy so you have plenty of time to fix up your FOSS projects.

My previous post looked at the 2.5 release tag for Apache Commons IO; here I am working with my GitHub fork of the master branch, which I’ve kept up-to-date. While my initial experiment with ShiftLeft gave me a 150 KB JSON report to pour over, here, I have a nice web UI to explore:

What does it all mean? We have three areas in the UI that we will explore:

The top-left shows a summary for the current state of the repository’s master branch: the latest commit details and a summary of conclusions (in white boxes.)
The dark-colored list on the left shows what ShiftLeft calls conclusions. These are our potentially actionable items. As we’ll see, even if you find some conclusions non-actionable, these will do a great deal to raise your awareness of potential security issues for code that you’ll write tomorrow or need to maintain today. You can expand each item (dark box) to reveal more information.
On the right-hand-side, you see a tree with paths of all public classes organized by package. On the left of that pane is a list of packages. You can expand each package to reveal of the public classes it contains. You can then expand each class to show its methods. We’ll see of this later. Leading away from tree item that have a conclusion, you’ll see light-colored path to its category. In other words, if you see a path leading away from an item, be it a package or class, that means one of its containing items carries with it a conclusion.

The first thing to notice of course is that I no longer have to consider the whole JSON report file. In the UI, the conclusions are presented in an expandable list without having to filter out the graph data (and thank goodness for that.) There is also a heading called “Issues” you will use to track which conclusions you want to track for changes. Since we’ve not marked any conclusions as issues, the UI presents the expected “0” count and that “No conclusions marked as issues”.

The first UI elements to notice are the two summary boxes for “Sensitive Data” and “Untrusted Data”. ShiftLeft uses these two terms in conclusion descriptions to organize its findings.

The Trusted and Sensitive Kind

Lets describe “Sensitive Data” and “Untrusted Data”.

Conclusions described as dealing with Sensitive Data tell you: Lookout, if you have a password in this variable, it’s in plain text. Now, it’s up to me to make sure that this password does not end up in a clear text file or anywhere else that is not secure. This is where context matters, you are the SME of your code, you know how much trouble you can get yourself and your users into, ShiftLeft has no opinion, it offers ‘conclusions.’

Conclusions referring to Untrusted Data: This tells me I should take precautions before doing anything with that data. Should I just execute this script? Should I need to worry about JSON Hijacking? See Why does Google prepend while(1); to their JSON responses?

Looking for Trouble Again

Let’s start with a simple conclusion and get deeper in the weeds after that. When you click on “Sensitive Data” and “Untrusted Data”, you filter the list of conclusions. I choose “Untrusted Data” because I am looking for the first interesting conclusion I found while writing Using ShiftLeft in Open Source: The method IOUtils.buffer(Writer, int) does not support handling untrusted data to be passed as parameter size because it controls the size of a buffer, giving an attacker the chance to starve the system of memory. I find it quickly using a page search:

I can click on the link to open a page on exact line of code in GitHub:

While this example may seem trivial, ShiftLeft shows understanding of what the code does in this method: We are allowing call sites to control memory usage in an unbounded manner.

Let’s imagine an application that would allow an unbound value to be used, for example, to process a 2 GB file and that would care about this API and the conclusion rendered by ShiftLeft. To track this conclusion, we mark it as an issue to have it tracked in our Issues list:

Now, for the fun part. Let’s edit the code to guard against unbounded usage. Let’s institute an arbitrary 10 MB limit. We’ll change the code from:

    /**
     * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
     * given Writer.
     *
     * @param writer the Writer to wrap or return (not null)
     * @param size the buffer size, if a new BufferedWriter is created.
     * @return the given Writer or a new {@link BufferedWriter} for the given Writer
     * @throws NullPointerException if the input parameter is null
     * @since 2.5
     */
    public static BufferedWriter buffer(final Writer writer, int size) {
        return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
    }

to:

    private static final int MAX_BUFFER_SIZE = 10 * 1024 * 1024; // 10 MB

    /**
     * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
     * given Writer.
     *
     * @param writer the Writer to wrap or return (not null)
     * @param size the buffer size, if a new BufferedWriter is created.
     * @return the given Writer or a new {@link BufferedWriter} for the given Writer
     * @throws NullPointerException if the input parameter is null
     * @since 2.5
     */
    public static BufferedWriter buffer(final Writer writer, int size) {
    	if (size &amp;amp;amp;amp;amp;amp;amp;amp;gt; MAX_BUFFER_SIZE) {
            throw new IllegalArgumentException("Request buffer cannot exceed " + MAX_BUFFER_SIZE);
    	}
        return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
    }

After pushing this change to GitHub, I do not see a change in my ShiftLeft report; ah, this is a beta still, should I chalk this up to work in progress or is there still potential trouble ahead?

I wonder if this method shouldn’t be always flagged anyway. Yes, I changed the code so that the memory allocation is no longer unbounded, but who is to decide if my MAX_BUFFER_SIZE is reasonable or not? It might be fine for a simple use case like a single threaded app does does it once. What if I have ten thousand concurrently tasks that want to do this? Is that still reasonable? I’m not so sure. So for now, I think I like being notified of this memory allocation.

Digging deeper

In my previous ShiftLeft post — based on Apache Commons IO 2.5, not master — I had found this conclusion (in raw form edited for brevity):

{
 "id": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)/srcFile/2",
 "description": "The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)",
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Looking at the methodId tells us to go look at FileUtils.copyFileToDirectory(File, File) where we find:

/**
 * Copies a file to a directory preserving the file date.
 *
 * This method copies the contents of the specified source file
 * to a file of the same name in the specified destination directory.
 * The destination directory is created if it does not exist.
 * If the destination file exists, then this method will overwrite it.
 *
 * &amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;strong&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;Note:&amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;/strong&amp;amp;amp;amp;amp;amp;amp;amp;amp;gt; This method tries to preserve the file's last
 * modified date/times using {@link File#setLastModified(long)}, however
 * it is not guaranteed that the operation will succeed.
 * If the modification operation fails, no indication is provided.
 *
 * @param srcFile an existing file to copy, must not be {@code null}
 * @param destDir the directory to place the copy in, must not be {@code null}
 *
 * @throws NullPointerException if source or destination is null
 * @throws IOException if source or destination is invalid
 * @throws IOException if an IO error occurs during copying
 * @see #copyFile(File, File, boolean)
 */
 public static void copyFileToDirectory(final File srcFile, final File destDir) throws IOException {
  copyFileToDirectory(srcFile, destDir, true);
 }

This method just delegates to another copyFileToDirectory() with an added parameter, no big deal. What is interesting is that the codeLocationUrl points to code not in this method but to a private utility method:

https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141

FileUtils at line 1141 is in the guts of a private method called org.apache.commons.io.FileUtils.doCopyFile(File, File, boolean) which is where ShiftLeft flagged an issue where the method creates a new FileInputStream. Because ShiftLeft is working with a code graph, when I search the JSON conclusions for this URL, I find a total of 14 conclusions that use this URL. This tells me that this code fragment creates 14 possible vulnerabilities in the component; with a careful emphasis on possible since context is important.

If I search in the Conclusions list on the left f the page, I find several hits for “FileUtils.copyFileToDirectory”. Then, I can click to expand each one so see the exact location and hyperlink to GitHub. What I hope is coming is the ability to filter sort so I create a mental picture like I was able with the JSON report.

ShiftLeft also has a user friendly way to discover this information: the tree view:

In this view, the “” node is the topmost package in Apache Commons IO. You can see that it has a path that leads to all three different categories: Generic, File, and Child process. This means that the root package contains conclusions and that these conclusions are in the linked categories.

When I expand the root node, I find the FileUtils class (highlighted):

You can see that the class has a path leading away from it, so I know it contains conclusions. At that point, it’s a little harder to make sense of the categories as they’ve scrolled off the top of the screen. It would be nice if the categories floated down as you scroll. Version 2 I hope! You can also see that some classes like FilenameUtils and IOCase do not have paths leading away from them and therefore do not carry conclusions. A relief I suppose, but I’d like to ability to filter out items that are conclusion-free.

I now expand the FileUtils class:

Here, some methods have paths, some don’t; scrolling down, we get to copyFileToDirectory:

As expected, the method has a path leading away from it which indicates a conclusion but we do not know which kind or which one. We do get a description of its parameters though, a nice touch.

For now, clicking on the method does not do anything where I would expect to be able perform the same operations as in the list. This view lets you explore the whole library but I do not find it terribly useful beyond the path to categories. I’d like to see hyperlinks to code and also the use of color to distinguish which methods are flagged as Untrusted Data and Sensitive Data as well as an indication as to which categories are involved that does not scroll of the screen.

The nice thing though is that I have two paths of exploration in the UI: the conclusion list and the explorer tree.

There are two key technologies at work here and that I expect both to get better as the beta progresses: First, building a code graph to give us the power to see that once a problem has been identified on a line of code, that all (I assume public) call-sites can be flagged. Second, what constitutes a problem or a conclusion in ShiftLeft’s neutral parlance will improve and be configurable, filterable and sortable.

In this example, the conclusion description reads:

The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.

What goes through my head when I read that is: Yeah, I do not want just anybody to be able to copy any file anywhere like overwriting a password vault a la copyFileToDirectory(myFile, "/etc/shadow"). Granted, Apache Commons IO is a library, not an application, so there is no alarm bells to ring here, but you get the idea.

Stepping back, I think it is important to reiterate what happened here: ShiftLeft found an issue (less dramatic than a problem) on a line of code in a private methods, then, using its code graph, created conclusions (report items) for each public facing method that may eventually call this private method in its code path.

Working from a baseline

If you think that having a list over 200 hundred conclusions to sift through is daunting, I would agree with you. This is why I look forward to using some sorting and filtering in the UI!

What matters just as much is how to use ShiftLeft when your code evolves. I want to track differences from commit to commit and from build to build: Did I create or squash vulnerabilities? This I can tell by watching the Conclusions and Issues list in the UI. I am hoping that ShiftLeft will implement a similar feature to Coveralls where you get an email that tells how much your test code coverage has changed in a build.

As an experiment, let’s see what happens when I add some possibly malicious code, a method to delete all files and directories from a given directory:

package org.apache.commons.io;

import java.io.File;
import java.io.IOException;

public class ADangerousClass {

    public void deleteAll(File directory) throws IOException {
        FileUtils.deleteDirectory(directory);
    }

}

Note that all this method does is delegate to another method. I hit refresh in my browser and I see my commit:

My commit comment, date, and commit hash are there. ShiftLeft goes to work for about two minutes (the two counts are reset to 0 as ShiftLeft is analyzing.) Then the Sensitive Data and Untrusted Data conclusion counts have gone up. Scrolling down I see my new class:

I also see it in the tree of course:

Notice that the deleteAll method has a path to the File category on the right hand side, this makes sense based on my previous findings.

Now I really want to click on the categories on the right as filters! I am especially intrigued by the “Child process” category.

What is worth noting here is that my new class and method do not in themselves actually do anything dangerous. But since we are working with a code graph, and that graph leads to a dangerous place, the new code is flagged.

Now for a bit of fun, let’s change the method to make the dangerous bits unreachable:

    public void deleteAll(File directory) throws IOException {
        if (false) {
            FileUtils.deleteDirectory(directory);
        }
    }

The dangerous class is gone from the list but present in the tree since it is a public API. What if it’s something more tricky? Let’s make some code unreachable through a local variable, and we will make it final to make it obvious to the code graph that the value is immutable:

    public void deleteAll(File directory) throws IOException {
        final boolean test = 1 == 2;
        if (test) {
            FileUtils.deleteDirectory(directory);
        }
    }

The dangerous class is still gone from the list. Pretty clever it is. Let’s see about delegating the test to a method:

    public void deleteAll(File directory) throws IOException {
        final boolean test = test();
        if (test) {
            FileUtils.deleteDirectory(directory);
        }
    }

    private boolean test() {
        return 1 == 2;
    }

ShiftLeft now shows the deleteAll() method in both the Untrusted Data and Sensitive Data lists. So that’s a false positive. Let’s get away from using a method and use two local variables instead:

    public void deleteAll(File directory) throws IOException {
        final Object obj = null;
        boolean test = true;
        if (obj == null) {
            test = false;
        }
        if (test) {
            FileUtils.deleteDirectory(directory);
        }
    }

With this change, ShilfLeft still puts the method as Untrusted Data and Sensitive Data lists. OK, so this is a bit like Eclipse’s compiler warnings for null analysis, it flags what it can see without really evaluating, fair enough.

Linking to the root cause

Let’s go back to the conclusions list for a minute. My deleteAll experiment created two conclusions: one untrusted data, one senstive data. Let’s take a closer look at these.

Untrusted Data

.ADangerousClass.deleteAll

The method deleteAll does not support handling untrusted data to be passed as parameter directory because it controls access to I/O File in a manner that would allow an attacker to abuse it.

When I click on the GitHub link for Untrusted Data, I see:

https://github.com/garydgregory/commons-io/blob/4e21badfa902f4d7cb80b5db0abb287dd43963b7/src/main/java/org/apache/commons/io/FileUtils.java#L1749

Note that we are not in the deleteAll method here, rather we are where the ShiftLeft code graph flags as the root issue. In other words, if I wrote a public method that called deleteAll, I would get the same conclusion and link. Graph Power!

Why is calling directory.listFiles() labeled untrusted? Well, passing a sensitive file path should not be considered a problem, because the file path you are searching for would not end up written on the disk. It is however considered dangerous if attackers were to control the input path, because they could be able to list arbitrary directories on the system. That’s a breach.

Only considering the method verifiedListFiles(), ShiftLeft does not know that the method is used in an operation to delete files. That’s up next:

Sensitive Data

.ADangerousClass.deleteAll

The method deleteAll does not support handling sensitive data to be passed as parameter directory because it is leaked over I/O File.

When I click on the GitHub link for Sensitive Data, I see:

https://github.com/garydgregory/commons-io/blob/4e21badfa902f4d7cb80b5db0abb287dd43963b7/src/main/java/org/apache/commons/io/FileUtils.java#L1620

Clearly calling File.delete() can be trouble but using the sensitive data category may be a bit of a stretch. If any sensitive data is used in a file operation, (for example, as the path of the file, like “path/to/my-secrets”,) then that data will end up on disk. For a delete operation, you could say that that’s not the case because you’re doing the reverse, but actually just the fact that you are deleting a file with a sensitive name is interesting. It’s also possible that you already had previously written sensitive data unencrypted to the disk. That’s a roundabout way to get there but it feels justifiable.

Finding arbitrary code attacks

When I first ran ShiftLeft on Apache Commons 2.5, I found a few conclusions for arbitrary code attacks in the Java7Support class. Now that Apache Commons in Git master requires Java 7, the Java7Support class is gone. At the moment, I’ve not found a way to run ShiftLeft on anything but the master branch of a repository, so let’s make our own trouble with Method.invoke() to call BigInteger.intValueExact() on Java 8 and intValue() on older versions of Java:

package org.apache.commons.io;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.math.BigInteger;

public class BigIntHelper {

    private static Method intValueExactMethod;

    static {
        try {
            intValueExactMethod = BigInteger.class.getMethod("intValueExact");
        } catch (NoSuchMethodException | SecurityException e) {
            intValueExactMethod = null;
            e.printStackTrace();
        }
    }

    public static int getExactInt(BigInteger bigInt) {
        try {
            return (int) (intValueExactMethod != null
                ? intValueExactMethod.invoke(bigInt)
                : bigInt.intValue());
        } catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException e) {
            e.printStackTrace();
            return bigInt.intValue();
        }
    }

    public static void main(String[] args) {
        System.out.println(getExactInt(BigInteger.TEN));
    }
}

This code is OK by ShiftLeft even though our intValueExactMethod variable is private but not final:

Let’s open things up by making the variable by changing:

private static Method intValueExactMethod;

to:

public static Method intValueExactMethod;

For the Java7Support class in Apache Commons 2.5, ShiftLeft reports several arbitrary code attack vulnerabilities. Unfortunately, ShiftLeft does not report any such vulnerabilities for this example. Growing pains I suppose. Well, that’s all I have for now. A fun exploration in an area I’d like to get back to soon.

Fin

I’d like to wrap up this exploration of ShiftLeft with a quick summary of what we found: a tool we can add to our build pipelines to find potential security vulnerabilities.

There are a lot of data here, and this is just for Apache Commons IO! Another lesson is that context matters. This is low-level library as opposed to an application. Finding vulnerabilities in a low level library is good but this may not be vulnerabilities for your application. ShiftLeft conclusions can at least make you aware of how to use this library safely. ShiftLeft currently provides conclusions based on a code graph, this is powerful, as the examples show. We found conclusions about untrusted data (I’m not sure what’s in here so don’t go executing it) and sensitive data (don’t save passwords in plain text!)

I hope revisit this story and run ShiftLeft on other Apache Commons projects soon. This sure is fun!

Happy Coding,
Gary Gregory

Using ShiftLeft in Open Source

1 Reply

I started to wonder how we can short-circuit FUD and proactively deal with security issues before we hear about them in the press and in panicked mailing list posts. Is there something we can do at build time? A FindBugs or PMD on steroids? Surely someone else must be thinking about this. Can we apply the principle of shift left testing to security?

It turns out there is a tool out there that does and it’s called, not coincidentally, ShiftLeft.io. The idea behind ShiftLeft is to break old habits of building a product and then, later, figuring out how to fend off attacks and plug up security holes. Today, we take for granted that unit testing, integration testing, continuous integration, and continuous delivery are common place. ShiftLeft propose to make security analysis as ubiquitous.

Getting started

Since ShiftLeft is free for open source projects, I decided to look what it reports for Apache Commons IO 2.5, an Apache Commons Java component. While I won’t divulge right now any real security issues, I’ll show what is safe to report about Commons IO. I’ll also show the cool stuff ShiftLeft points out to write more bullet-proof software. To use ShiftLeft, you go to their web site and submit the URL to your GitHub repository. ShiftLeft has a 30 day disclosure policy so you have plenty of time to fix up your FOSS project.

What I got is a JSON report file containing all sorts of data and what ShiftLeft calls conclusions. These are the potentially actionable items. As we’ll see, even if you find some conclusions non-actionable, these will do a great deal to raise your awareness of potential security issues for code that you’ll write tomorrow or need to maintain today.

Let’s start with a simple conclusion and get deeper in the weeds after that.

No memory for you!

The following example shows what “untrusted” data is and how it can affect your application.

I have a ShiftLeft conclusion that tells me the method IOUtils.buffer(Writer, int) does not support handling untrusted data to be passed as parameter size because it controls the size of a buffer, giving an attacker the chance to starve the system of memory:

/**
 * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
 * given Writer.
 *
 * @param writer the Writer to wrap or return (not null)
 * @param size the buffer size, if a new BufferedWriter is created.
 * @return the given Writer or a new {@link BufferedWriter} for the given Writer
 * @throws NullPointerException if the input parameter is null
 * @since 2.5
 */
 public static BufferedWriter buffer(final Writer writer, int size) {
  return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
 }

While this example may seem trivial, ShiftLeft shows understanding of what the code does in this method: We are allowing call sites to control memory usage in an unbounded manner. ShiftLeft gives us the exact method and line number:

org.apache.commons.io.IOUtils.buffer:java.io.BufferedWriter(java.io.Writer,int)
https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/IOUtils.java#L628

While some ShiftLeft conclusions may not all be actionable, I must say that the report has made me more aware of what to potentially secure when writing new code or maintaining old an code base.

zzfireplace Tangent: What could you do in this case? You could hard-code an upper bound of say 10 MB. You could make the bound configurable with a static non-final variable (effectively creating a global variable, an anti-pattern for sure.) You could move all the static methods to the instance side, create a memory profile class to define this bound, and then build IOUtils instances with this profile in the constructor. In addition, you could also use a different buffer class internally to make sure the buffer does not grow beyond a given size. And so on. I am not proposing these changes in the context of the Apache Commons IO 2.x line since we take great care to maintain binary compatibility within a major release across all of Apache Commons. But I can definitively see proposing changes for 3.0!

ShiftLeft’s understanding of code is deeper than this example thanks to Enhanced Code Property Graphs so let’s look at a more complex example.

Digging deeper

Here is a conclusion in raw form (edited for brevity):

{
 "id": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)/srcFile/2",
 "description": "The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)",
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Looking at the methodId tells us to go look at FileUtils.copyFileToDirectory(File, File) where we find:

/**
 * Copies a file to a directory preserving the file date.
 *

 * This method copies the contents of the specified source file
 * to a file of the same name in the specified destination directory.
 * The destination directory is created if it does not exist.
 * If the destination file exists, then this method will overwrite it.
 *

 * &amp;amp;lt;strong&amp;amp;gt;Note:&amp;amp;lt;/strong&amp;amp;gt; This method tries to preserve the file's last
 * modified date/times using {@link File#setLastModified(long)}, however
 * it is not guaranteed that the operation will succeed.
 * If the modification operation fails, no indication is provided.
 *
 * @param srcFile an existing file to copy, must not be {@code null}
 * @param destDir the directory to place the copy in, must not be {@code null}
 *
 * @throws NullPointerException if source or destination is null
 * @throws IOException if source or destination is invalid
 * @throws IOException if an IO error occurs during copying
 * @see #copyFile(File, File, boolean)
 */
 public static void copyFileToDirectory(final File srcFile, final File destDir) throws IOException {
  copyFileToDirectory(srcFile, destDir, true);
 }

https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141

FileUtils at line 1141 is in the guts of a private method called org.apache.commons.io.FileUtils.doCopyFile(File, File, boolean) which is where ShiftLeft flags an issue where the method creates a new FileInputStream. To be honest, this is a beta and I am not convinced that the line numbers are always dead on. What is important here is that ShiftLeft is working with a code graph. Therefore, when I search the JSON conclusions for this URL, I find a total of 14 conclusions that use this URL. This tells me that this code fragment creates 14 possible attack points in the component; with a careful emphasis on possible since context is important.

Another key point is to realize that there are two key technologies at work here and that I expect both to get better as the beta progresses: First, building a code graph to give us the power to see that once a problem has been identified on a line of code, that all (I assume public) call-sites can be flagged. Second, what constitutes a problem or a conclusion in ShiftLeft’s neutral parlance will improve and be configurable, filterable and sortable.

In this example, the conclusion description reads:

The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.

What goes through my head when I read that is: Yeah, I do not want just anybody to be able to copy any file anywhere like overwriting a password vault a la copyFileToDirectory(myFile, "/etc/shadow"). Granted, Commons IO is a library, not an application, so there is no alarm bell to ring here, but you get the idea.

Are you Trusted and Sensitive?

ShiftLeft uses two terms in conclusion descriptions that I want to review. Based on the limited subset of 255 (a very computer friendly number!) conclusions I saw for all of Commons IO, we can see two types of issues: trusted and sensitive.

sensitive-data-in-the-cloud-blog-image-1 Conclusions described as dealing with sensitive data: This says: Lookout, if you have a password in this variable, it’s in plain text. Now, it’s up to me to make sure that this password does not end up in a clear text file or anywhere else that is not secure. This is where context matters, you are the SME of your code, you know how much trouble you can get yourself and your users into, ShiftLeft has no opinion, it offers ‘conclusions.’

Conclusions referring to untrusted data: This tells me I should take precautions before doing anything with that data. Should I just execute this script? Should I need to worry about JSON Hijacking? See Why does Google prepend while(1); to their JSON responses?

Flipping it around on ShiftLeft

Let’s flip it around on ShiftLeft for a minute. I am now thinking about not writing sensitive data like passwords and financial reports to disk insecurely. I know we have many API in FileUtils that write strings to files. Will ShiftLeft tell me about, for example FileUtils.write(File, CharSequence, Charset)? Here is the method I should never use to write passwords or any sensitive data:

/**
 * Writes a CharSequence to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void write(final File file, final CharSequence data, final Charset encoding) throws IOException {
  write(file, data, encoding, false);
 }

This in turn calls:

/**
 * Writes a CharSequence to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @param append if {@code true}, then the data will be added to the
 * end of the file rather than overwriting
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void write(final File file, final CharSequence data, final Charset encoding, final boolean append)
 throws IOException {
  final String str = data == null ? null : data.toString();
  writeStringToFile(file, str, encoding, append);
 }

Which calls:

/**
 * Writes a String to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @param append if {@code true}, then the String will be added to the
 * end of the file rather than overwriting
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void writeStringToFile(final File file, final String data, final Charset encoding, final boolean
 append) throws IOException {
   OutputStream out = null;
   try {
     out = openOutputStream(file, append);
     IOUtils.write(data, out, encoding);
     out.close(); // don't swallow close Exception if copy completes normally
   } finally {
     IOUtils.closeQuietly(out);
   }
 }

Which calls IOUtils‘:

/**
 * Writes chars from a &amp;lt;code&amp;gt;String&amp;lt;/code&amp;gt; to bytes on an
 * &amp;lt;code&amp;gt;OutputStream&amp;lt;/code&amp;gt; using the specified character encoding.
 *

 * This method uses {@link String#getBytes(String)}.
 *
 * @param data the &amp;lt;code&amp;gt;String&amp;lt;/code&amp;gt; to write, null ignored
 * @param output the &amp;lt;code&amp;gt;OutputStream&amp;lt;/code&amp;gt; to write to
 * @param encoding the encoding to use, null means platform default
 * @throws NullPointerException if output is null
 * @throws IOException if an I/O error occurs
 * @since 2.3
 */
 public static void write(final String data, final OutputStream output, final Charset encoding) throws IOException {
   if (data != null) {
     output.write(data.getBytes(Charsets.toCharset(encoding)));
   }
 }

Knowing what I know, I expect ShiftLeft to conclude that all these methods do not support sensitive data. Working back up the stack, I find:

org.apache.commons.io.IOUtils.write(String, OutputStream, Charset)
Nothing on that one; did I miss it due to the 255 conclusion limit?

org.apache.commons.io.FileUtils.writeStringToFile(File, String, Charset, boolean)
Got it:

 {
 "id": "org.apache.commons.io.FileUtils.writeStringToFile:void(java.io.File,java.lang.String,boolean)/file/1",
 "description": "The method `writeStringToFile` does not support handling **untrusted data** to be passed as parameter `file` because it controls access to I/O **File** in a manner that would allow an attacker to abuse it.",
 "unsupportedDataType": "ATTACKER_CONTROLLED",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.writeStringToFile:void(java.io.File,java.lang.String,boolean)",
 "codeLocation": {
   "file": "org/apache/commons/io/FileUtils.java",
   "lineNumber": 360,
   "symbol": "parent"
 },
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L360",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

org.apache.commons.io.FileUtils.write(File, CharSequence, Charset, boolean)
Got it:

{
 "id": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset,boolean)/file/2",
 "description": "The method `write` does not support handling **sensitive data** to be passed as parameter `file` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset,boolean)",
 "codeLocation": {
   "file": "org/apache/commons/io/FileUtils.java",
   "lineNumber": 355,
   "symbol": "parent"
 },
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L355",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

org.apache.commons.io.FileUtils.write(File, CharSequence, Charset)
Got it:

{
 "id": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset)/file/2",
 "description": "The method `write` does not support handling **sensitive data** to be passed as parameter `file` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset)",
 "codeLocation": {
   "file": "org/apache/commons/io/FileUtils.java",
   "lineNumber": 355,
   "symbol": "parent"
 },
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L355",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

So yeah, that all hangs nicely together, thank you code graphs!

Finding arbitrary code attacks

Did I mention there 255 are conclusions in the JSON report file? It takes a while to go through these. I am hoping that ShiftLeft will have their UI in place soon so I can filter and sort all this information! Now that I am about 20% through the file, I see:

The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.

runaway_train

Yikes! let’s take a deeper look and see if this is for real or a false positive.

Here is the raw conclusion:

{
 "id": "org.apache.commons.io.Java7Support.createSymbolicLink:java.io.File(java.io.File,java.io.File)/symlink/1",
 "description": "The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.",
 "unsupportedDataType": "ATTACKER_CONTROLLED",
 "methodId": "org.apache.commons.io.Java7Support.createSymbolicLink:java.io.File(java.io.File,java.io.File)",
 "codeLocation": {
 "file": "org/apache/commons/io/Java7Support.java",
 "lineNumber": 128,
 "symbol": "file"
 },
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/Java7Support.java#L128",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Our codeLocationUrl for this conclusion points us to the Java7Support class at line 128: where we find:

/**
 * Indicates if a symlunk target exists
 * @param file The symlink file
 * @return true if the target exists
 * @throws IOException upon error
 */
 private static boolean exists(File file)
 throws IOException {
   try {
     Object path = toPath.invoke(file);
     final Boolean result = (Boolean) exists.invoke(null, path, emptyLinkOpts);
     return result.booleanValue();
   } catch (IllegalAccessException e) {
     throw new RuntimeException(e);
   } catch (InvocationTargetException e) {
     throw (RuntimeException) e.getTargetException();
   }
}

ShiftLeft points to the line:

Object path = toPath.invoke(file);

The instance variable toPath is a java.lang.reflect.Method which can and does execute code as shown above. Looking narrowly at the code so far we can say that yes, this code run anything since toPath is a Method.

However, widening our view to the field declaration we see the following in the class static initialization block:

toPath = File.class.getMethod("toPath");

This makes sense in the context of the class: Java7Support is used to access Java 7 features while running on pre-Java 7 platforms. Here we are setting up toPath to run one method. I would expect toPath to be a static final but it is not:

private static Method toPath;

Why is it not static? Well, it’s just that the way the static initialize block is written does not allow you to just add final to the declaration. The static block needs to be rewritten to allow for toPath to be final which we will leave as ‘an exercise to the reader’ 😉 as it is out of scope for an already long blog post.

I would be curious to see how ShiftLeft responds to such a code change.

I am not sure if this is really a problem though. The variable is private now, but not final. Yes its type (Method) is all about executing code. Under normal circumstances, this value cannot be changed outside this class. I can use reflection of course to force a new value in toPath. Does that mean that anytime I use a Method instance variable I am going to get an arbitrary code execution conclusion? Another corner-case to examine.

What if I rewrote the static block and declared the variable final. Would ShiftLeft still reach the same conclusion? If yes, would that be because I could still use reflection to hammer any value in the field.

Concluding on this first arbitrary code attack

The more I explore these test results, the more I realize how tricky security is and how much context matters. I now know that the Java7Support class in Apache Commons IO 2.5 is open to an arbitrary code attack under the narrow use case of another piece of code using reflection. But if that code is allowed to use reflection, surely it could achieve its goal without going through the extra hoop of Java7Support hacking.

Stepping back, the realization is that I should think twice about using the Method class because I could open my application up to an attack unless that Method field is properly protected.

Looking for more arbitrary code attacks

Now that ShiftLeft has whetted my appetite, I wonder if there are more arbitrary code attacks lurking. A quick search through the file reveals to total of five. Not surprisingly, these are all in the Java7Support class and all follow the same pattern as above: calling the invoke method of a Method object where the Methodis initialized in the static block.

Flipping it around once more, let’s look at the Java7Support class variable declarations and see if all Method objects end up being accounted for by ShiftLeft:

/**
 * Java7 feature detection and reflection based feature access.
 * <p/>
 * Taken from maven-shared-utils, only for private usage until we go full java7
 */
class Java7Support {

  private static final boolean IS_JAVA7;

  private static Method isSymbolicLink;

  private static Method delete;

  private static Method toPath;

  private static Method exists;

  private static Method toFile;

  private static Method readSymlink;

  private static Method createSymlink;
  ...

We have seven static Method declarations which I see initialized in the static block:

static {
 boolean isJava7x = true;
 try {
   ClassLoader cl = Thread.currentThread().getContextClassLoader();
   Class<?> files = cl.loadClass("java.nio.file.Files");
   Class<?> path = cl.loadClass("java.nio.file.Path");
   Class<?> fa = cl.loadClass("java.nio.file.attribute.FileAttribute");
   Class<?> linkOption = cl.loadClass("java.nio.file.LinkOption");
   isSymbolicLink = files.getMethod("isSymbolicLink", path);
   delete = files.getMethod("delete", path);
   readSymlink = files.getMethod("readSymbolicLink", path);
   emptyFileAttributes = Array.newInstance(fa, 0);
   createSymlink = files.getMethod("createSymbolicLink", path, path, emptyFileAttributes.getClass());
   emptyLinkOpts = Array.newInstance(linkOption, 0);
   exists = files.getMethod("exists", path, emptyLinkOpts.getClass());
   toPath = File.class.getMethod("toPath");
   toFile = path.getMethod("toFile");
   } catch (ClassNotFoundException e) {
     isJava7x = false;
   } catch (NoSuchMethodException e) {
     isJava7x = false;
   }
   IS_JAVA7 = isJava7x;
 }

ShiftLeft gives me five conclusions:

The method `isSymLink` does not support handling **untrusted data** to be passed as parameter `file` because it could allow an attacker to run arbitrary code.
The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.
The method `delete` does not support handling **untrusted data** to be passed as parameter `file` because it could allow an attacker to run arbitrary code.
The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `target` because it could allow an attacker to run arbitrary code.
The method `readSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.

All of the Method declarations in this class are used by all of the methods listed above. Nice.

Fin

fin_3_0 I’d like to wrap up this exploration of ShiftLeft with a quick summary of what we found: a tool we can add to our build pipelines to find potential security issues. There are a lot of data here, and this is just for Apache Commons IO 2.5, so another lesson is that context matters. ShiftLeft currently provides ‘conclusions’ based on a code graph. We found conclusions about untrusted data (I’m not sure what’s in here so don’t go executing it), sensitive data (don’t save passwords in plain text!) and arbitrary code attacks.

I hope revisit this story and run ShiftLeft on other Apache Commons projects soon. This sure is fun!

Happy Coding,
Gary Gregory

Understanding Java Generics’ super and extends

7 Replies

Do you know the difference in Java generics between <? super E> and <? extends E>? If you have any doubt, read on.

In Java, classes and interfaces can have type parameters, like a List of Numbers instead of just a List:

List list = new ArrayList();

With the diamond notation <>, the Java 7 compiler uses type inference to deduce the argument type so you can abbreviate instantiation to new ArrayList<>().

You can also type the variable that holds the reference to the ArrayList instance as a List, a Collection, or a Serializable.

Let’s now say you have a pair of methods that take a List of Numbers. The first method printNumbers() gets Numbers out of the List:

public void printNumbers(List list) {
    for (Number number : list) {
        System.out.print(number);
        System.out.print(", ");
    }
}

The second method, fillNumbers(), adds Numbers to the List:

public void fillNumbers(List list) {
    Number n = Integer.MAX_VALUE;
    list.add(n);
    list.add(3);
}

The method getting Numbers out of the List could just as easily work with a List of Integers or a List of Floats:

List floats = new ArrayList();
floats.add(Float.valueOf(1.2f));
printNumbers(floats);

What we really want though is a List, not a List:

List floats = new ArrayList();
floats.add(Float.valueOf(1.2f));
printNumbers(floats);

But that causes the compiler to fail with:

The method printNumbers(List) in the type Test is not applicable for the arguments (List)

To use a List, the method must be typed with List<? extends Number>.

public void printNumbers(List list)

The ? extends tells the compiler that we want to use some unknown (the ?) subtype of Number. We explicitly constrain the wildcard (the ?) to represent the unknown subtype of Number. We say the method is a covariant use of the List of Numbers; it works with any List of any subtype of Number.

On the other hand, you should be able to give the method adding Numbers to the List any old List of Objects:

List objects = new ArrayList();
fillNumbers(objects);

That does not compile:

The method fillNumbers(List) in the type GenericsSuperExtendsDemo is not applicable for the arguments (List)You get this to work with List<? super Number>:

public void fillNumbers(List list)

The ? super Number is an explicitly constrained wildcard that represents some unknown supertype of Number. We call that one a contravariant use of the List of Numbers and it works for any List of any supertype of Number.Here is the complete example:

package test;import java.util.ArrayList;import java.util.List;public class GenericsSuperExtendsDemo {    public static void main(String[] args) {        new GenericsSuperExtendsDemo().go();    }    public void go() {        System.out.println("Hello World!");        List numbers = new ArrayList();        fillNumbers(numbers);        printNumbers(numbers);        {            List floats = new ArrayList();            floats.add(Float.valueOf(1.2f));            printNumbers(floats);        }        {            List floats = new ArrayList();            floats.add(Float.valueOf(1.2f));            printNumbers(floats);        }        {            List integers = new ArrayList();            printNumbers(integers);        }        {            List objects = new ArrayList();            fillNumbers(objects);        }    }    public void printNumbers(List list) {        for (Number number : list) {            System.out.print(number);            System.out.print(", ");        }    }    public void fillNumbers(List list) {        Number n = Integer.MAX_VALUE;        list.add(n);        list.add(3);    }}

The inspiration for this article is a paper presented at OOPSLA 2016 titled “Java and Scala’s Type Systems are Unsound.”Happy Coding,Gary Gregory

Loading a Log4j Configuration for a specific EJB

Leave a reply

You can load and unload a specific Log4j 2 configuration file for a given EJB. How? Use @PreDestroy and @PostConstruct. This gives you separately deployable EJBs with separate Log4j configurations. Ka-Pow!

For example:

public class MySessionEJB implements SessionBean {

    private static final String LOGGER_CONFIG = "/path/to/log4j2.xml";
    private static final String LOGGER_CONTEXT_NAME = "MySessionEJB";
    private static LoggerContext logctx;
    private static Logger logger;

    @PostConstruct
    @TransactionAttribute(value=TransactionAttributeType.NOT_SUPPORTED)
    private void postConstruct() {
        logctx = Configurator.initialize(LOGGER_CONTEXT_NAME, LOGGER_CONFIG);
        logger = logctx.getLogger("com.whatever.myejb");
    }

    @PreDestroy
    @TransactionAttribute(value=TransactionAttributeType.NOT_SUPPORTED)
    private void preDestroy() {
        Configurator.shutdown(logctx);
    }
}

Happy Coding,
Gary Gregory

How to catch up my git fork to master

1 Reply

TL;DR

git fetch upstream
git checkout master
git merge upstream/master

Here I am in one of my forked git repositories on GitHub. I have a patch to contribute, or a bug to fix but I want to make sure that my local copy of the repository is not stale. If I do not, the project I want to contribute to might not be able to apply my patch or merge my pull request cleanly. The first thing I need to do is to catch up my git repository to whatever the current code is in the master branch of the original repository. Some development workflows will use a different branch than master for day-to-day development but the same steps apply using whatever that branch name is. In this post, I’ll assume you are using master. You’ll also need a git command line tool.

There are two steps required. First, you must configure a git remote for a fork. Then you can catch up that fork to the current master.

Open a command line prompt and change the current directory to your project’s directory.

Configuring a git remote for a fork

You only need to do this once: Add a new remote upstream repository to sync with the fork where ORIGINAL_OWNER is the original GitHub account and ORIGINAL_REPOSITORY is the original repository name.

git remote add upstream https://github.com/ORIGINAL_OWNER/ORIGINAL_REPOSITORY.git

For example:

git remote add upstream https://github.com/apache/thrift.git

You can verify that all went well:

git remote -v
origin https://github.com/garydgregory/thrift.git (fetch)
origin https://github.com/garydgregory/thrift.git (push)
upstream https://github.com/apache/thrift.git (fetch)
upstream https://github.com/apache/thrift.git (push)

Catching up a git fork to master

Fetch project branches from the upstream repository to get all the commits. Your commits to master will be stored in the local branch upstream/master.

git fetch upstream

Check out the master branch from your local fork.

git checkout master

Now merge the changes from upstream/master into your local master branch. Your fork’s master branch will be in sync with the upstream repository. You will not lose your local changes.

git merge upstream/master

All done!

You can cut and paste this into a script:

git fetch upstream
git checkout master
git merge upstream/master

Happy Forking,
Gary

Of the demise of FindBugs and Monty Python

1 Reply

It turns out that FindBugs, the Java bug hunting tool used by legions of Java developers, after being proclaimed dead, has issued a Monty Python-like “I’m not dead yet” rejoinder on Hacker News.

What is going on here?

FindBugs is a development-time tool that finds bugs in Java programs using static analysis. It’s free and distributed under the Lesser GNU Public License out of The University of Maryland. Most developers use FindBugs through Maven and the Maven Findbugs Plugin. David Hovemeyer who did his Ph.D. thesis on FindBugs founded the project. Since then Bill Pugh has been the project lead and primary developer. The current developers listed for FindBugs are Bill Pugh and Andrey Loskutov.

Reviewing Andrey’s November 2nd post (edited in quotes for minor typos and clarity), here what seems to be the Good, the Bad and the Ugly, no not that one, but the film is a classic and the images below are all in good fun.

The Good

The Good is thin: FindBugs has two committers with push karma: Andrey Loskutov and Tagir Valeev. That’s not much for such a complex project but better than none. Tagir’s last commit (Google Code Archive) was on 2015-04-09 and Andrey does not consider him active. Andrey has stated that he himself “has no free time to work on the project”. (There is code on the Google Code Archive and then on GitHub.)

The Bad

Bill Pugh, the project lead, has not been active and is the bottleneck for access rights to critical parts of the project. According to Andrey:

“Only the project leader Bill Pugh has admin rights for the project web page and the GitHub project group and page. We cannot deploy any website update, we can’t add new project members, we can’t manage code access rights, we can’t publish releases to the well-known update sites without his help. Without him, we have no admin rights to anything; we can only push to the repository.”

This seems to be par for the course in many FOSS projects, the benevolent dictator acts as a gatekeeper to resources and once a bus or vacation enters the pictures, this locks out contributors. Bill Pugh, the project lead, has not been active or responsive it seems (until now, more on this later).

The Ugly

FindBugs has been around for a long time now (ten years) and has accrued a fair amount of technical debt.

Andrey states that the code is very complex, has “organically grown” over a decade. Not surprisingly and like many other FOSS projects, and authors have not done much in terms of documentation. Poor public interfaces apparently compound the hurt.

“Most of the code consists of the very low level bytecode related stuff, tightly coupled with the ancient BCEL library, which doesn’t scale and is not multi-thread safe.”

I hear you there. I help out over at Apache Commons, the current home of the Apache Commons BCEL component, where we released in July a long overdue version 6.0. Apache published the previous version 5.2; wait for it… in June 2006! Ouch, FindBugs, I feel your pain. It might be too late for FindBugs and Apache Commons BCEL developers to work together to salvage this dependency:

“No one enjoys maintaining this code, at least not me. I see no future for FindBugs with the BCEL approach, and see no way to get rid of it without investing lot of effort, and without breaking every detector and possibly many third party tools. This is the biggest issue we have with FindBugs today, and most likely the root cause for all the evil. This code can’t be fixed, it must be rewritten.”

What are the alternatives?

Luckily, there are other byte code fiddling tools out there. We have (quoted with minor edits from their respective sites):

ASM (INRIA, France Telecom license) from the OW2 consortium. ASM is an all-purpose Java bytecode manipulation and analysis framework. You can use ASM to modify existing classes or dynamically generate classes, directly in binary form. ASM provides common transformations and analysis algorithms allow you to assemble easily custom complex transformations and code analysis tools.
Apache Commons Weaver (Apache 2.0 license) manipulates existing class files. It looks like FINDBUGS has started to work on some integration with ASM based on looking at a few commits.
Javassist (Mozilla Public License 1.1, LGPL 2.1, Apache 2.0 licenses) is a class library for editing bytecodes in Java; it enables Java programs to define a new class at runtime and to modify a class file when the JVM loads it. Unlike other similar bytecode editors, Javassist provides two levels of API: source level and bytecode level. If the users use the source-level API, they can edit a class file without knowledge of the specifications of the Java bytecode. Javassist designed the whole API with only the vocabulary of the Java language. You can even specify inserted bytecode in the form of source text; Javassist compiles it on the fly. On the other hand, the bytecode-level API allows the users to edit directly a class file as other editors.
Byte Buddy (Apache 2.0 license) is a code generation and manipulation library for creating and modifying Java classes during the runtime of a Java application and without the help of a compiler. Other than the code generation utilities that ship with the Java Class Library, Byte Buddy allows the creation of arbitrary classes and is not limited to implementing interfaces for the creation of runtime proxies. Furthermore, Byte Buddy offers a convenient API for changing classes either manually, using a Java agent or during a build.

From FindBugs to HuntBugs

Andrey thinks that because the code is as it is, there are not so many people willing to contribute. GitHub sees some pull requests, but most of them are smaller fixes or enhancements which are not reviewed or tested.

These conditions has led Tagir to start his own project called HuntBugs, a new Java bytecode static analyzer tool based on Procyon Compiler Tools aimed to supersede FindBugs. HuntBugs is currently in early development stage and published under the Apache 2.0 license. It also sports seven contributors on GitHub, an almost good sign, but all of these contributors only have a handful of commits each. Almost all commits are from Tagir, so the word “community” seems generous. The Procyon Compiler Tools are published on BitBucket also under the Apache 2.0 license.

The lack of community around FindBugs is what killed it’s momentum. This happens over and over in the FOSS world.

Like many FOSS projects and again to no one’s surprise, there is little support from any companies or organizations. The FindBugs sponsors page states that the most recent funding for FindBugs comes from a Google Faculty Research Awards. No code patches, no testing, no paid developers. Of course, Andrey remarks, some companies do use FindBugs in commercial products like SonarSource and Coverity, not to mention countless companies using FindBugs in their build processes.

As it is today, even without further development, FindBugs is still useful. Sadly it cannot find bugs that are specific to Java 8 or Java 9’s early releases.

What’s on the horizon?

There are two paths here: FindBugs will raise from its ashes or a fork will take over. There are a number of steps Andrey outlines in his post to allow the current contributors (himself really) to continue the project. These are mostly project governance and access rights issues. However, having stated that his time is limited, does this seem realistic? This assumes that Bill follows the steps to further open up FindBugs and that this will lead to a community forming around a more open FindBugs . Summarizing from Andrey’s post, the following needs to happen:

Update the site to point to GitHub instead of SourceForge.
Shut down the old SourceForge bug tracker and forums and point to GitHub instead.
Allow to grant access rights to the GitHub project.
Grant right to publish the new releases all known download sites.
Configure automated build and test (hello Travis CI and Coveralls)
Attract contributors

None of these seems difficult or insurmountable; it just needs Bill’s blessing and a new project governance model that allows other contributors to come in have proper access rights. Perhaps FindBugs could adopt some of Apache’s nomenclature: You can be a committer with push rights and above that a Project Management Committee (PMC) member, with full access to the project. Only votes from PMC members vote are binding on release votes. This would allow many committers to come in under the mentorship of a PMC.

Hello Bill!

Bill eating a bug.

Then after all this, Bill pops up on Hacker News:

“FindBugs isn’t dead (although my participation had been in hibernation for a while).

I’ve been juggling far too many projects, but I’m now working to move FindBugs back into the active rotation.

I also want announce I’ll be working with GrammaTech as part of the Swamp Project, and they will be helping with rebooting the FindBugs project. This has been in the works for a long time (almost a year), and although I’ve known that GrammaTech was likely to win an award, this hasn’t been official and something I could talk about until recently. Was hoping to have something a little more concrete to talk about as far as that goes; but I don’t yet have the information I wanted to share.

Thanks to all the FindBugs fans and supporters who lobbied for me to return to active maintenance of FindBugs. Give me a week to get up to speed on current project needs.”

So that sounds good…

Now what?

This is all fine and good and I hope that Bill can work with the current contributors, or the one contributor to rejuvenate FindBugs. What Andrey asks sounds reasonable, but that is not enough. I believe that better project governance would really help FindBugs move forward again.

What about bringing FindBugs under the auspices of an Apache top-level project? Bringing a project under the Apache umbrella is done through the Apache Incubator under a well-defined process. This would really deal with all project governance issues and infrastructure in one go. The project sources could still live in GitHub, but the Apache-way of doing things would address many if not all of Andey’s concerns.

What do you think?

Cheers and Happy Coding,
Gary

Apache Log4j 2.7 is out!

Leave a reply

Apache Log4j 2.7 is heading out to Maven Central. Here’s are the highlights of what’s new since 2.6.2.

The RoutingAppender can be configured with scripts.
A new Appender, the ScriptAppenderSelector can create another Appender as specified by a Script.
Users can now inject context data from other sources than ThreadContext. Values can be any Object, not just Strings.
The SocketAppender now supports IO buffering.
Added the ability to generate Log4j 2 XML configuration file from a ConfigurationBuilder.
Hello Scala! We’ve added Logging APIs for Scala 2.10 and 2.11.
Added options to exclude stack trace from JSON, XML and YAML layouts.
Added Core API Configurator.shutdown(LoggerContext, long, TimeUnit).
FileAppender and RollingFileAppender can create files on-demand.
PatternLayout added a color ANSI option to %message and %xThrowable.
org.apache.logging.log4j.core.LoggerContext now implements Closeable.
Add ThreadContextMap2 interface supporting method putAll(Map<String, String>).
Add JUnit Rule implementations to manage the thread context.
The Core AbstractConfiguration and AbstractManager now track its LoggerContext; add Configuration.getLoggerContext().

Continuing the Asynchronous Epic

We’ve added support for java.util.concurrent.LinkedTransferQueue to the AsyncAppender.
Added optional support for the Conversant DisruptorBlockingQueue in AsyncAppender.
Added optional support for JCTools MPSC bounded lock-free queue in AsyncAppender.

Continuing the GC-free Epic

Continuing the GC-free epic, we’ve added support for garbage-free ThreadContext map. This is disabled by default, and users need to enable this explicitly.
Also in GC-free-land, we changed LogEvent‘s internal data structure for context data to be garbage-free. We added the method LogEvent#getContextData() and deprecated getContextMap().

Continuing the Builder Epic

Added Builders for the ConsoleAppender, FileAppender, RoutingAppender, SocketAppender, and ServletAppender (and deprecated factory methods).
Builders can be generic.
A Builder can subclass another Builder.

And bug fixes as well, for which you can just see the Log4j site.

Happy Log4j Logging!
Gary

The Log4j JNDI Attack

Log4j 1.x mitigation

Log4j 2.x mitigation

Share this:

What is binary compatibility?

Benefiting from binary compatibility

Defining guidelines at Apache Commons

What am I allowed to change?

Checking for binary compatibility

Wrapping up

Explaining Terms at Apache

Share this:

Share this:

Getting Rid of FUD and Panic

Getting Started

The Trusted and Sensitive Kind

Looking for Trouble Again

Digging deeper

Working from a baseline

Linking to the root cause

Finding arbitrary code attacks

Fin

Share this:

Getting started

No memory for you!

Digging deeper

Are you Trusted and Sensitive?

Flipping it around on ShiftLeft

Finding arbitrary code attacks

Concluding on this first arbitrary code attack

Looking for more arbitrary code attacks

Fin

Share this:

Share this:

Share this:

TL;DR

Configuring a git remote for a fork

Catching up a git fork to master

Share this:

The Good

The Bad

The Ugly

What are the alternatives?

From FindBugs to HuntBugs

What’s on the horizon?

Hello Bill!

Now what?

Share this:

Continuing the Asynchronous Epic

Continuing the GC-free Epic

Continuing the Builder Epic

Share this: