
Visualizing Code Vulnerabilities with the new ShiftLeft UI

This post is a follow-up to Using ShiftLeft in Open Source, where I looked at whether I could apply the principle of shift left testing to security. Now that ShiftLeft has a user interface, I want to revisit the results in the UI instead of poring over JSON reports. This write-up parallels my original post, so reading the original is not required to get up to speed.

Getting Rid of FUD and Panic

To get us started, allow me to go through the premise from my initial post: My long term goal is to formally insert security awareness into my development practices and eventually into my continuous integration-based builds.

After years of being involved in open-source development at Apache, we’ve seen security issues pop up in Apache Commons, like arbitrary remote code execution and denial of service attacks (CVE-2016-3092 and CVE-2014-0050). While some threats are real, others are just FUD. Even when they are real, it is important to consider context. There may be problems that end users never see because the “hole” is not reachable by users’ actions.

The idea behind ShiftLeft is to break old habits of building a product and then, later, figuring out how to fend off attacks and plug up security holes. Today, we take for granted that unit testing, integration testing, continuous integration, and continuous delivery are commonplace. ShiftLeft proposes to make security analysis just as ubiquitous.

traditional-shift-left

By DonFiresmith (Own work) [CC BY-SA 4.0], via Wikimedia Commons

Getting Started

Since ShiftLeft is free for open source projects, I decided to look at what it reports for Apache Commons IO, an Apache Commons Java component.

To get started, go to https://www.shiftleft.io/developers/ and enter a GitHub repository URL.

ShiftLeft-enter-url

ShiftLeft then asks you for your name and email address:

ShiftLeft-name-emaill

And you are off to the races.

It’s important to note that ShiftLeft has a 30-day disclosure policy, so you have plenty of time to fix up your FOSS projects.

My previous post looked at the 2.5 release tag for Apache Commons IO; here I am working with my GitHub fork of the master branch, which I’ve kept up-to-date. While my initial experiment with ShiftLeft gave me a 150 KB JSON report to pore over, here I have a nice web UI to explore:

ShiftLeft-main

What does it all mean? We have three areas in the UI that we will explore:

  • The top-left shows a summary for the current state of the repository’s master branch: the latest commit details and a summary of conclusions (in white boxes.)
  • The dark-colored list on the left shows what ShiftLeft calls conclusions. These are our potentially actionable items. As we’ll see, even if you find some conclusions non-actionable, these will do a great deal to raise your awareness of potential security issues for code that you’ll write tomorrow or need to maintain today. You can expand each item (dark box) to reveal more information.
  • On the right-hand side, you see a tree with paths of all public classes organized by package. On the left of that pane is a list of packages. You can expand each package to reveal all of the public classes it contains. You can then expand each class to show its methods. We’ll see more of this later. Leading away from each tree item that has a conclusion, you’ll see a light-colored path to its category. In other words, if you see a path leading away from an item, be it a package or class, that means one of the items it contains carries a conclusion.

The first thing to notice of course is that I no longer have to consider the whole JSON report file. In the UI, the conclusions are presented in an expandable list without having to filter out the graph data (and thank goodness for that.) There is also a heading called “Issues” that you will use to track the conclusions you want to watch for changes. Since we’ve not marked any conclusions as issues, the UI presents the expected “0” count and the message “No conclusions marked as issues”.

The first UI elements to notice are the two summary boxes for “Sensitive Data” and “Untrusted Data”. ShiftLeft uses these two terms in conclusion descriptions to organize its findings.

The Trusted and Sensitive Kind

Let’s describe “Sensitive Data” and “Untrusted Data”.

ShiftLeft-sensitive-and-trusted

Conclusions described as dealing with Sensitive Data tell you: Look out, if you have a password in this variable, it’s in plain text. Now, it’s up to me to make sure that this password does not end up in a clear text file or anywhere else that is not secure. This is where context matters: you are the SME of your code, and you know how much trouble you can get yourself and your users into. ShiftLeft has no opinion; it offers ‘conclusions.’

Conclusions referring to Untrusted Data: This tells me I should take precautions before doing anything with that data. Should I just execute this script? Do I need to worry about JSON Hijacking? See Why does Google prepend while(1); to their JSON responses?

sensitive-data-in-the-cloud-blog-image-1

Looking for Trouble Again

Let’s start with a simple conclusion and get deeper in the weeds after that. When you click on “Sensitive Data” and “Untrusted Data”, you filter the list of conclusions. I choose “Untrusted Data” because I am looking for the first interesting conclusion I found while writing Using ShiftLeft in Open Source: The method IOUtils.buffer(Writer, int) does not support handling untrusted data to be passed as parameter size because it controls the size of a buffer, giving an attacker the chance to starve the system of memory. I find it quickly using a page search:

ShiftLeft-IOUtils-buffer

I can click on the link to open a page on the exact line of code in GitHub:

GitHub-IOUtils-write

While this example may seem trivial, ShiftLeft shows understanding of what the code does in this method: We are allowing call sites to control memory usage in an unbounded manner.

Let’s imagine an application that would allow an unbounded value to be used, for example, to process a 2 GB file, and that would therefore care about this API and the conclusion rendered by ShiftLeft. To track this conclusion, we mark it as an issue so that it appears in our Issues list:

ShiftLeft-issues-1

Now, for the fun part. Let’s edit the code to guard against unbounded usage. Let’s institute an arbitrary 10 MB limit. We’ll change the code from:

    /**
     * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
     * given Writer.
     *
     * @param writer the Writer to wrap or return (not null)
     * @param size the buffer size, if a new BufferedWriter is created.
     * @return the given Writer or a new {@link BufferedWriter} for the given Writer
     * @throws NullPointerException if the input parameter is null
     * @since 2.5
     */
    public static BufferedWriter buffer(final Writer writer, int size) {
        return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
    }

to:

    private static final int MAX_BUFFER_SIZE = 10 * 1024 * 1024; // 10 MB

    /**
     * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
     * given Writer.
     *
     * @param writer the Writer to wrap or return (not null)
     * @param size the buffer size, if a new BufferedWriter is created.
     * @return the given Writer or a new {@link BufferedWriter} for the given Writer
     * @throws NullPointerException if the input parameter is null
     * @since 2.5
     */
    public static BufferedWriter buffer(final Writer writer, int size) {
        if (size > MAX_BUFFER_SIZE) {
            throw new IllegalArgumentException("Request buffer cannot exceed " + MAX_BUFFER_SIZE);
        }
        return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
    }

After pushing this change to GitHub, I do not see a change in my ShiftLeft report; ah, this is a beta still, should I chalk this up to work in progress or is there still potential trouble ahead?

I wonder if this method shouldn’t always be flagged anyway. Yes, I changed the code so that the memory allocation is no longer unbounded, but who is to decide if my MAX_BUFFER_SIZE is reasonable or not? It might be fine for a simple use case like a single-threaded app that does it once. What if I have ten thousand concurrent tasks that want to do this? Is that still reasonable? I’m not so sure. So for now, I think I like being notified of this memory allocation.
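To put rough numbers on that question, here is a back-of-the-envelope sketch. It assumes every task allocates the full 10 MB limit at the same moment and that the buffer is backed by a char array at 2 bytes per char; the BufferBudget class and its method are hypothetical names used only for illustration.

```java
class BufferBudget {

    // Same limit as in the edited method above, counted in chars.
    static final int MAX_BUFFER_SIZE = 10 * 1024 * 1024;

    // Worst case: every task holds a maximum-size buffer at the same time.
    // Assumption: the buffer is a char array, so each element costs Character.BYTES (2) bytes.
    static long worstCaseBytes(int concurrentTasks) {
        return (long) concurrentTasks * MAX_BUFFER_SIZE * Character.BYTES;
    }

    public static void main(String[] args) {
        long bytes = worstCaseBytes(10_000);
        System.out.println(bytes / (1024L * 1024 * 1024) + " GiB");
    }
}
```

Under those assumptions, ten thousand concurrent tasks could ask for roughly 195 GiB in total, which is why a per-call limit alone may not settle the question.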

Digging deeper

In my previous ShiftLeft post — based on Apache Commons IO 2.5, not master — I had found this conclusion (in raw form edited for brevity):

{
 "id": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)/srcFile/2",
 "description": "The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)",
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Looking at the methodId tells us to go look at FileUtils.copyFileToDirectory(File, File) where we find:

/**
 * Copies a file to a directory preserving the file date.
 *
 * This method copies the contents of the specified source file
 * to a file of the same name in the specified destination directory.
 * The destination directory is created if it does not exist.
 * If the destination file exists, then this method will overwrite it.
 *
 * <strong>Note:</strong> This method tries to preserve the file's last
 * modified date/times using {@link File#setLastModified(long)}, however
 * it is not guaranteed that the operation will succeed.
 * If the modification operation fails, no indication is provided.
 *
 * @param srcFile an existing file to copy, must not be {@code null}
 * @param destDir the directory to place the copy in, must not be {@code null}
 *
 * @throws NullPointerException if source or destination is null
 * @throws IOException if source or destination is invalid
 * @throws IOException if an IO error occurs during copying
 * @see #copyFile(File, File, boolean)
 */
 public static void copyFileToDirectory(final File srcFile, final File destDir) throws IOException {
  copyFileToDirectory(srcFile, destDir, true);
 }

This method just delegates to another copyFileToDirectory() with an added parameter, no big deal. What is interesting is that the codeLocationUrl points to code not in this method but to a private utility method:

https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141

FileUtils at line 1141 is in the guts of a private method called org.apache.commons.io.FileUtils.doCopyFile(File, File, boolean) which is where ShiftLeft flagged an issue where the method creates a new FileInputStream. Because ShiftLeft is working with a code graph, when I search the JSON conclusions for this URL, I find a total of 14 conclusions that use this URL. This tells me that this code fragment creates 14 possible vulnerabilities in the component; with a careful emphasis on possible since context is important.

If I search in the Conclusions list on the left of the page, I find several hits for “FileUtils.copyFileToDirectory”. Then, I can click to expand each one to see the exact location and hyperlink to GitHub. What I hope is coming is the ability to filter and sort so I can create a mental picture like I was able to with the JSON report.

ShiftLeft also has a user friendly way to discover this information: the tree view:

ShiftLeft-explore-commons-io

In this view, the “” node is the topmost package in Apache Commons IO. You can see that it has a path that leads to all three different categories: Generic, File, and Child process. This means that the root package contains conclusions and that these conclusions are in the linked categories.

When I expand the root node, I find the FileUtils class (highlighted):

ShiftLeft-explore-FileUtils-class.png

You can see that the class has a path leading away from it, so I know it contains conclusions. At that point, it’s a little harder to make sense of the categories as they’ve scrolled off the top of the screen. It would be nice if the categories floated down as you scroll. Version 2 I hope! You can also see that some classes like FilenameUtils and IOCase do not have paths leading away from them and therefore do not carry conclusions. A relief I suppose, but I’d like the ability to filter out items that are conclusion-free.

I now expand the FileUtils class:

ShiftLeft-explore-FileUtils

Here, some methods have paths, some don’t; scrolling down, we get to copyFileToDirectory:

ShiftLeft-explore-copyFileToDirectory.png

As expected, the method has a path leading away from it which indicates a conclusion but we do not know which kind or which one. We do get a description of its parameters though, a nice touch.

For now, clicking on the method does not do anything, where I would expect to be able to perform the same operations as in the list. This view lets you explore the whole library, but I do not find it terribly useful beyond the path to categories. I’d like to see hyperlinks to code and the use of color to distinguish which methods are flagged as Untrusted Data and Sensitive Data, as well as an indication of which categories are involved that does not scroll off the screen.

The nice thing though is that I have two paths of exploration in the UI: the conclusion list and the explorer tree.

There are two key technologies at work here, and I expect both to get better as the beta progresses. First, building a code graph gives us the power to see that once a problem has been identified on a line of code, all (I assume public) call sites can be flagged. Second, what constitutes a problem, or a conclusion in ShiftLeft’s neutral parlance, will improve and become configurable, filterable and sortable.

In this example, the conclusion description reads:

The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.

What goes through my head when I read that is: Yeah, I do not want just anybody to be able to copy any file anywhere like overwriting a password vault a la copyFileToDirectory(myFile, "/etc/shadow"). Granted, Apache Commons IO is a library, not an application, so there are no alarm bells to ring here, but you get the idea.

Stepping back, I think it is important to reiterate what happened here: ShiftLeft found an issue (less dramatic than a problem) on a line of code in a private method, then, using its code graph, created conclusions (report items) for each public-facing method that may eventually call this private method in its code path.
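The propagation described above can be sketched as a backwards walk over a call graph. This is only my illustration of the idea, not ShiftLeft’s implementation: the CallGraphSketch class is hypothetical, and the call chain in demo() is a simplified assumption about how the public methods reach doCopyFile.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CallGraphSketch {

    // Reverse call graph: callee -> methods that call it directly.
    private final Map<String, List<String>> callersOf = new HashMap<>();
    private final Set<String> publicMethods = new HashSet<>();

    void addCall(String caller, String callee) {
        callersOf.computeIfAbsent(callee, k -> new ArrayList<>()).add(caller);
    }

    void markPublic(String method) {
        publicMethods.add(method);
    }

    // Walks backwards from the flagged method and collects every public method that can reach it.
    Set<String> publicEntryPoints(String flagged) {
        Set<String> visited = new HashSet<>();
        Set<String> entryPoints = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(flagged);
        while (!pending.isEmpty()) {
            String method = pending.remove();
            if (!visited.add(method)) {
                continue; // already handled; also guards against recursive call cycles
            }
            if (publicMethods.contains(method)) {
                entryPoints.add(method);
            }
            pending.addAll(callersOf.getOrDefault(method, Collections.<String>emptyList()));
        }
        return entryPoints;
    }

    // Assumed, simplified call chain ending in the private doCopyFile method.
    static Set<String> demo() {
        CallGraphSketch graph = new CallGraphSketch();
        graph.addCall("copyFileToDirectory(File,File)", "copyFileToDirectory(File,File,boolean)");
        graph.addCall("copyFileToDirectory(File,File,boolean)", "copyFile(File,File,boolean)");
        graph.addCall("copyFile(File,File,boolean)", "doCopyFile(File,File,boolean)");
        graph.markPublic("copyFileToDirectory(File,File)");
        graph.markPublic("copyFileToDirectory(File,File,boolean)");
        graph.markPublic("copyFile(File,File,boolean)");
        return graph.publicEntryPoints("doCopyFile(File,File,boolean)");
    }
}
```

In this toy graph, one flagged line in the private method yields one entry per public method on the path, which matches the shape of what I saw in the report: many conclusions sharing a single code location.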

Working from a baseline

If you think that having a list of over 200 conclusions to sift through is daunting, I would agree with you. This is why I look forward to using some sorting and filtering in the UI!

What matters just as much is how to use ShiftLeft when your code evolves. I want to track differences from commit to commit and from build to build: Did I create or squash vulnerabilities? This I can tell by watching the Conclusions and Issues list in the UI. I am hoping that ShiftLeft will implement a similar feature to Coveralls where you get an email that tells how much your test code coverage has changed in a build.
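Until such a notification exists, the comparison itself is simple set arithmetic. A minimal sketch, assuming each conclusion keeps a stable id between two analyses (like the "id" field in the JSON report, though I have not verified that ids are stable); ConclusionDiff is a hypothetical helper, not part of ShiftLeft.

```java
import java.util.Set;
import java.util.TreeSet;

class ConclusionDiff {

    // Conclusions present after the commit but not before: newly created.
    static Set<String> introduced(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return result;
    }

    // Conclusions present before the commit but gone afterwards: squashed.
    static Set<String> resolved(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(before);
        result.removeAll(after);
        return result;
    }
}
```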

As an experiment, let’s see what happens when I add some possibly malicious code, a method to delete all files and directories from a given directory:

package org.apache.commons.io;

import java.io.File;
import java.io.IOException;

public class ADangerousClass {

    public void deleteAll(File directory) throws IOException {
        FileUtils.deleteDirectory(directory);
    }

}

Note that all this method does is delegate to another method. I hit refresh in my browser and I see my commit:

ShiftLeft-ADangerousClass-main

My commit comment, date, and commit hash are there. ShiftLeft goes to work for about two minutes (the two counts are reset to 0 as ShiftLeft is analyzing.) Then the Sensitive Data and Untrusted Data conclusion counts have gone up. Scrolling down I see my new class:

ShiftLeft-ADangerousClass-list

I also see it in the tree of course:

ShiftLeft-ADangerousClass-tree.png

Notice that the deleteAll method has a path to the File category on the right-hand side; this makes sense based on my previous findings.

Now I really want to click on the categories on the right as filters! I am especially intrigued by the “Child process” category.

What is worth noting here is that my new class and method do not in themselves actually do anything dangerous. But since we are working with a code graph, and that graph leads to a dangerous place, the new code is flagged.

Now for a bit of fun, let’s change the method to make the dangerous bits unreachable:

    public void deleteAll(File directory) throws IOException {
        if (false) {
            FileUtils.deleteDirectory(directory);
        }
    }

The dangerous class is gone from the list but present in the tree since it is a public API. What if it’s something more tricky? Let’s make some code unreachable through a local variable, and we will make it final to make it obvious to the code graph that the value is immutable:

    public void deleteAll(File directory) throws IOException {
        final boolean test = 1 == 2;
        if (test) {
            FileUtils.deleteDirectory(directory);
        }
    }

The dangerous class is still gone from the list. Pretty clever it is. Let’s see about delegating the test to a method:

    public void deleteAll(File directory) throws IOException {
        final boolean test = test();
        if (test) {
            FileUtils.deleteDirectory(directory);
        }
    }

    private boolean test() {
        return 1 == 2;
    }

ShiftLeft now shows the deleteAll() method in both the Untrusted Data and Sensitive Data lists. So that’s a false positive. Let’s get away from using a method and use two local variables instead:

    public void deleteAll(File directory) throws IOException {
        final Object obj = null;
        boolean test = true;
        if (obj == null) {
            test = false;
        }
        if (test) {
            FileUtils.deleteDirectory(directory);
        }
    }

With this change, ShiftLeft still puts the method in the Untrusted Data and Sensitive Data lists. OK, so this is a bit like Eclipse’s compiler warnings for null analysis: it flags what it can see without really evaluating. Fair enough.

Linking to the root cause

Let’s go back to the conclusions list for a minute. My deleteAll experiment created two conclusions: one untrusted data, one sensitive data. Let’s take a closer look at these.

Untrusted Data

.ADangerousClass.deleteAll

The method deleteAll does not support handling untrusted data to be passed as parameter directory because it controls access to I/O File in a manner that would allow an attacker to abuse it.

When I click on the GitHub link for Untrusted Data, I see:

Note that we are not in the deleteAll method here; rather, we are at the spot the ShiftLeft code graph flags as the root issue. In other words, if I wrote a public method that called deleteAll, I would get the same conclusion and link. Graph Power!

Why is calling directory.listFiles() labeled untrusted? Well, passing a sensitive file path should not be considered a problem, because the file path you are searching for would not end up written on the disk. It is however considered dangerous if attackers were to control the input path, because they could be able to list arbitrary directories on the system. That’s a breach.
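One way a caller could act on this conclusion is to confine untrusted paths to a known base directory before handing them to the library. A minimal sketch; PathGuard and isInside are hypothetical names, and treating an I/O error as "outside" is my own fail-closed choice.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

class PathGuard {

    // Returns true only when candidate resolves to baseDir itself or to something beneath it.
    // Canonical paths collapse ".." segments, so "base/../other" is not treated as inside "base".
    static boolean isInside(File baseDir, File candidate) {
        try {
            Path base = baseDir.getCanonicalFile().toPath();
            Path target = candidate.getCanonicalFile().toPath();
            return target.startsWith(base); // compares whole path elements, not raw string prefixes
        } catch (IOException e) {
            return false; // fail closed: a path we cannot resolve is treated as untrusted
        }
    }
}
```

A check like this narrows what an attacker-controlled path can reach, but it does not remove races between the check and the later file operation.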

Only considering the method verifiedListFiles(), ShiftLeft does not know that the method is used in an operation to delete files. That’s up next:

Sensitive Data
.ADangerousClass.deleteAll
The method deleteAll does not support handling sensitive data to be passed as parameter directory because it is leaked over I/O File.

When I click on the GitHub link for Sensitive Data, I see:

Clearly calling File.delete() can be trouble but using the sensitive data category may be a bit of a stretch. If any sensitive data is used in a file operation, (for example, as the path of the file, like “path/to/my-secrets”,) then that data will end up on disk. For a delete operation, you could say that that’s not the case because you’re doing the reverse, but actually just the fact that you are deleting a file with a sensitive name is interesting. It’s also possible that you already had previously written sensitive data unencrypted to the disk. That’s a roundabout way to get there but it feels justifiable.

Finding arbitrary code attacks

When I first ran ShiftLeft on Apache Commons IO 2.5, I found a few conclusions for arbitrary code attacks in the Java7Support class. Now that Apache Commons IO in Git master requires Java 7, the Java7Support class is gone. At the moment, I’ve not found a way to run ShiftLeft on anything but the master branch of a repository, so let’s make our own trouble with Method.invoke() to call BigInteger.intValueExact() on Java 8 and intValue() on older versions of Java:

package org.apache.commons.io;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.math.BigInteger;

public class BigIntHelper {

    private static Method intValueExactMethod;

    static {
        try {
            intValueExactMethod = BigInteger.class.getMethod("intValueExact");
        } catch (NoSuchMethodException | SecurityException e) {
            intValueExactMethod = null;
            e.printStackTrace();
        }
    }

    public static int getExactInt(BigInteger bigInt) {
        try {
            return (int) (intValueExactMethod != null
                ? intValueExactMethod.invoke(bigInt)
                : bigInt.intValue());
        } catch (IllegalAccessException | IllegalArgumentException | InvocationTargetException e) {
            e.printStackTrace();
            return bigInt.intValue();
        }
    }

    public static void main(String[] args) {
        System.out.println(getExactInt(BigInteger.TEN));
    }
}

This code is OK by ShiftLeft even though our intValueExactMethod variable is private but not final:

ShiftLeft-BigIntHelper-1

Let’s open things up by making the variable public, changing:

private static Method intValueExactMethod;

to:

public static Method intValueExactMethod;
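Why might a public, non-final Method field matter? My reading, which is an assumption about the risk rather than anything ShiftLeft told me, is that any code on the classpath can reassign the field and redirect the reflective call. A small sketch with a hypothetical OpenFieldDemo class:

```java
import java.lang.reflect.Method;
import java.math.BigInteger;

class OpenFieldDemo {

    // Same shape as the modified field: public and not final, so anyone can replace it.
    public static Method target = lookup("intValueExact");

    // Looks up a public no-argument BigInteger method by name.
    static Method lookup(String name) {
        try {
            return BigInteger.class.getMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes whatever method the field currently points to.
    static Object call(BigInteger value) {
        try {
            return target.invoke(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(BigInteger.TEN)); // uses intValueExact
        target = lookup("negate");                // any other class could do this too
        System.out.println(call(BigInteger.TEN)); // now runs a different method
    }
}
```

Declaring the field private static final closes that door, since the target can then be set only once, in the class's own initializer.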

For the Java7Support class in Apache Commons IO 2.5, ShiftLeft reports several arbitrary code attack vulnerabilities. Unfortunately, ShiftLeft does not report any such vulnerabilities for this example. Growing pains I suppose. Well, that’s all I have for now. A fun exploration in an area I’d like to get back to soon.

Fin

I’d like to wrap up this exploration of ShiftLeft with a quick summary of what we found: a tool we can add to our build pipelines to find potential security vulnerabilities.

There is a lot of data here, and this is just for Apache Commons IO! Another lesson is that context matters. This is a low-level library as opposed to an application. Finding vulnerabilities in a low-level library is good, but these may not be vulnerabilities for your application. ShiftLeft conclusions can at least make you aware of how to use this library safely. ShiftLeft currently provides conclusions based on a code graph; this is powerful, as the examples show. We found conclusions about untrusted data (I’m not sure what’s in here so don’t go executing it) and sensitive data (don’t save passwords in plain text!)

I hope to revisit this story and run ShiftLeft on other Apache Commons projects soon. This sure is fun!

Happy Coding,
Gary Gregory


Using ShiftLeft in Open Source

After years of being involved in open-source development at Apache, we’ve seen security issues pop up in Apache Commons, like arbitrary remote code execution and denial of service attacks (CVE-2016-3092 and CVE-2014-0050). While some threats are real, others are just FUD. Even when they are real, it is important to consider context. There may be problems that end users never see because the “hole” is not reachable by users’ actions.

I started to wonder how we can short-circuit FUD and proactively deal with security issues before we hear about them in the press and in panicked mailing list posts. Is there something we can do at build time? A FindBugs or PMD on steroids? Surely someone else must be thinking about this. Can we apply the principle of shift left testing to security?

It turns out there is a tool out there that does and it’s called, not coincidentally, ShiftLeft.io. The idea behind ShiftLeft is to break old habits of building a product and then, later, figuring out how to fend off attacks and plug up security holes. Today, we take for granted that unit testing, integration testing, continuous integration, and continuous delivery are commonplace. ShiftLeft proposes to make security analysis just as ubiquitous.

Getting started

Since ShiftLeft is free for open source projects, I decided to look at what it reports for Apache Commons IO 2.5, an Apache Commons Java component. While I won’t divulge any real security issues right now, I’ll show what is safe to report about Commons IO. I’ll also show the cool stuff ShiftLeft points out to help write more bullet-proof software. To use ShiftLeft, you go to their web site and submit the URL to your GitHub repository. ShiftLeft has a 30-day disclosure policy, so you have plenty of time to fix up your FOSS project.

What I got is a JSON report file containing all sorts of data and what ShiftLeft calls conclusions. These are the potentially actionable items. As we’ll see, even if you find some conclusions non-actionable, these will do a great deal to raise your awareness of potential security issues for code that you’ll write tomorrow or need to maintain today.

Let’s start with a simple conclusion and get deeper in the weeds after that.

No memory for you!

The following example shows what “untrusted” data is and how it can affect your application.

I have a ShiftLeft conclusion that tells me the method IOUtils.buffer(Writer, int) does not support handling untrusted data to be passed as parameter size because it controls the size of a buffer, giving an attacker the chance to starve the system of memory:

/**
 * Returns the given Writer if it is already a {@link BufferedWriter}, otherwise creates a BufferedWriter from the
 * given Writer.
 *
 * @param writer the Writer to wrap or return (not null)
 * @param size the buffer size, if a new BufferedWriter is created.
 * @return the given Writer or a new {@link BufferedWriter} for the given Writer
 * @throws NullPointerException if the input parameter is null
 * @since 2.5
 */
 public static BufferedWriter buffer(final Writer writer, int size) {
  return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
 }

While this example may seem trivial, ShiftLeft shows understanding of what the code does in this method: We are allowing call sites to control memory usage in an unbounded manner. ShiftLeft gives us the exact method and line number:

While not all ShiftLeft conclusions may be actionable, I must say that the report has made me more aware of what to potentially secure when writing new code or maintaining an old code base.

Tangent: What could you do in this case? You could hard-code an upper bound of, say, 10 MB. You could make the bound configurable with a static non-final variable (effectively creating a global variable, an anti-pattern for sure.) You could move all the static methods to the instance side, create a memory profile class to define this bound, and then build IOUtils instances with this profile in the constructor. In addition, you could also use a different buffer class internally to make sure the buffer does not grow beyond a given size. And so on. I am not proposing these changes in the context of the Apache Commons IO 2.x line since we take great care to maintain binary compatibility within a major release across all of Apache Commons. But I can definitely see proposing changes for 3.0!
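The instance-based option from the tangent could look something like this. It is only a sketch of the idea, not a proposal for Commons IO: BufferPolicy and its constructor parameter are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.Writer;

class BufferPolicy {

    // Upper bound chosen by whoever creates the policy, instead of a global constant.
    private final int maxBufferSize;

    BufferPolicy(int maxBufferSize) {
        if (maxBufferSize <= 0) {
            throw new IllegalArgumentException("Maximum buffer size must be positive: " + maxBufferSize);
        }
        this.maxBufferSize = maxBufferSize;
    }

    // Same behavior as the static method above, but the requested size is checked against this policy.
    BufferedWriter buffer(Writer writer, int size) {
        if (size > maxBufferSize) {
            throw new IllegalArgumentException("Requested buffer cannot exceed " + maxBufferSize);
        }
        return writer instanceof BufferedWriter ? (BufferedWriter) writer : new BufferedWriter(writer, size);
    }
}
```

Each application can then pick a bound that fits its own memory budget, for example new BufferPolicy(64 * 1024) for a service with many concurrent writers.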

ShiftLeft’s understanding of code is deeper than this example thanks to Enhanced Code Property Graphs so let’s look at a more complex example.

Digging deeper

Here is a conclusion in raw form (edited for brevity):

{
 "id": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)/srcFile/2",
 "description": "The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.",
 "unsupportedDataType": "SENSITIVE",
 "interfaceId": "FILE/false",
 "methodId": "org.apache.commons.io.FileUtils.copyFileToDirectory:void(java.io.File,java.io.File)",
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Looking at the methodId tells us to go look at FileUtils.copyFileToDirectory(File, File) where we find:

/**
 * Copies a file to a directory preserving the file date.
 *
 * This method copies the contents of the specified source file
 * to a file of the same name in the specified destination directory.
 * The destination directory is created if it does not exist.
 * If the destination file exists, then this method will overwrite it.
 *
 * <strong>Note:</strong> This method tries to preserve the file's last
 * modified date/times using {@link File#setLastModified(long)}, however
 * it is not guaranteed that the operation will succeed.
 * If the modification operation fails, no indication is provided.
 *
 * @param srcFile an existing file to copy, must not be {@code null}
 * @param destDir the directory to place the copy in, must not be {@code null}
 *
 * @throws NullPointerException if source or destination is null
 * @throws IOException if source or destination is invalid
 * @throws IOException if an IO error occurs during copying
 * @see #copyFile(File, File, boolean)
 */
 public static void copyFileToDirectory(final File srcFile, final File destDir) throws IOException {
  copyFileToDirectory(srcFile, destDir, true);
 }

This method just delegates to another copyFileToDirectory() with an added parameter, no big deal. What is interesting is that the codeLocationUrl points to code not in this method but to a private utility method:

https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L1141

FileUtils at line 1141 is in the guts of a private method called org.apache.commons.io.FileUtils.doCopyFile(File, File, boolean) which is where ShiftLeft flags an issue where the method creates a new FileInputStream. To be honest, this is a beta and I am not convinced that the line numbers are always dead on. What is important here is that ShiftLeft is working with a code graph. Therefore, when I search the JSON conclusions for this URL, I find a total of 14 conclusions that use this URL. This tells me that this code fragment creates 14 possible attack points in the component; with a careful emphasis on possible since context is important.
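Searching the report for a shared URL amounts to grouping conclusions by their codeLocationUrl. A minimal sketch, assuming the JSON has already been parsed into simple objects (parsing is left out, and the ConclusionIndex and Conclusion classes are hypothetical); only the methodId and codeLocationUrl field names come from the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConclusionIndex {

    // Holds just the two report fields this sketch needs.
    static final class Conclusion {
        final String methodId;
        final String codeLocationUrl;

        Conclusion(String methodId, String codeLocationUrl) {
            this.methodId = methodId;
            this.codeLocationUrl = codeLocationUrl;
        }
    }

    // Maps each code location to the methods whose conclusions point at it.
    static Map<String, List<String>> methodsByLocation(List<Conclusion> conclusions) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Conclusion conclusion : conclusions) {
            index.computeIfAbsent(conclusion.codeLocationUrl, k -> new ArrayList<>())
                 .add(conclusion.methodId);
        }
        return index;
    }
}
```

A location with a long list of methods is a good place to start reading, since one change there may affect many conclusions at once.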

Another key point is that there are two key technologies at work here, and I expect both to get better as the beta progresses. First, building a code graph gives us the power to see that once a problem has been identified on a line of code, all (I assume public) call sites can be flagged. Second, what constitutes a problem, or a conclusion in ShiftLeft’s neutral parlance, will improve and become configurable, filterable and sortable.

In this example, the conclusion description reads:

The method `copyFileToDirectory` does not support handling **sensitive data** to be passed as parameter `srcFile` because it is leaked over I/O **File**.

What goes through my head when I read that is: Yeah, I do not want just anybody to be able to copy any file anywhere like overwriting a password vault a la copyFileToDirectory(myFile, "/etc/shadow"). Granted, Commons IO is a library, not an application, so there is no alarm bell to ring here, but you get the idea.
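If I were calling copyFileToDirectory from an application with a caller-supplied destination, I would want a guard in front of it. The following is a sketch of one such check, not something Commons IO provides: SafeCopyGuard and requireInside are hypothetical names, and the check only normalizes paths, so it does not resolve symbolic links.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;

final class SafeCopyGuard {

    /**
     * Returns destDir as a normalized absolute path if it lies inside
     * allowedRoot, otherwise throws. Hypothetical helper for illustration.
     */
    static Path requireInside(File allowedRoot, File destDir) throws IOException {
        // normalize() removes "." and ".." segments without touching the disk,
        // so symbolic links are NOT resolved here.
        Path root = allowedRoot.toPath().toAbsolutePath().normalize();
        Path dest = destDir.toPath().toAbsolutePath().normalize();
        // Path.startsWith compares whole path elements, not string prefixes.
        if (!dest.startsWith(root)) {
            throw new IOException("Destination " + dest + " is outside " + root);
        }
        return dest;
    }
}
```

The application would call requireInside first and only then hand the validated directory to copyFileToDirectory.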

Stepping back, I think it is important to reiterate what happened here: ShiftLeft found an issue (less dramatic than a problem) on a line of code in a private method, then, using its code graph, created conclusions (report items) for each public-facing method that may eventually call this private method in its code path.
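To make the propagation idea concrete, here is a toy version of it. This is only my guess at the general technique, not ShiftLeft’s implementation: the class name and the string labels used for methods are illustrative assumptions. It stores the call graph with its edges reversed and walks from the flagged method back to every public method that can reach it.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CallGraphSketch {

    // callee -> direct callers (the call graph with its edges reversed)
    private final Map<String, Set<String>> callersOf = new HashMap<>();
    private final Set<String> publicMethods = new HashSet<>();

    void addCall(String caller, String callee) {
        callersOf.computeIfAbsent(callee, k -> new HashSet<>()).add(caller);
    }

    void markPublic(String method) {
        publicMethods.add(method);
    }

    /** Returns every public method that reaches flaggedMethod, directly or transitively. */
    Set<String> publicEntryPoints(String flaggedMethod) {
        Set<String> visited = new HashSet<>();
        Set<String> result = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(flaggedMethod);
        visited.add(flaggedMethod);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (publicMethods.contains(current)) {
                result.add(current);
            }
            for (String caller : callersOf.getOrDefault(current, Collections.<String>emptySet())) {
                // visited.add returns false for methods already seen, which also guards against cycles
                if (visited.add(caller)) {
                    queue.add(caller);
                }
            }
        }
        return result;
    }
}
```

Assuming a chain in which copyFileToDirectory(File, File) calls copyFileToDirectory(File, File, boolean), which calls copyFile, which calls the private doCopyFile, asking for the entry points of doCopyFile would return the three public methods and leave doCopyFile itself out.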

Are you Trusted and Sensitive?

ShiftLeft uses two terms in conclusion descriptions that I want to review. Based on the limited subset of 255 (a very computer friendly number!) conclusions I saw for all of Commons IO, we can see two types of issues: untrusted and sensitive.

Conclusions described as dealing with sensitive data: This says: Look out, if you have a password in this variable, it’s in plain text. Now, it’s up to me to make sure that this password does not end up in a clear text file or anywhere else that is not secure. This is where context matters: you are the SME of your code and you know how much trouble you can get yourself and your users into. ShiftLeft has no opinion; it offers ‘conclusions.’

Conclusions referring to untrusted data: This tells me I should take precautions before doing anything with that data. Should I just execute this script? Do I need to worry about JSON Hijacking? See Why does Google prepend while(1); to their JSON responses?

Flipping it around on ShiftLeft

Let’s flip it around on ShiftLeft for a minute. I am now thinking about not writing sensitive data like passwords and financial reports to disk insecurely. I know we have many APIs in FileUtils that write strings to files. Will ShiftLeft tell me about, for example, FileUtils.write(File, CharSequence, Charset)? Here is the method I should never use to write passwords or any sensitive data:

/**
 * Writes a CharSequence to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void write(final File file, final CharSequence data, final Charset encoding) throws IOException {
  write(file, data, encoding, false);
 }

This in turn calls:

/**
 * Writes a CharSequence to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @param append if {@code true}, then the data will be added to the
 * end of the file rather than overwriting
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void write(final File file, final CharSequence data, final Charset encoding, final boolean append)
 throws IOException {
  final String str = data == null ? null : data.toString();
  writeStringToFile(file, str, encoding, append);
 }

Which calls:

/**
 * Writes a String to a file creating the file if it does not exist.
 *
 * @param file the file to write
 * @param data the content to write to the file
 * @param encoding the encoding to use, {@code null} means platform default
 * @param append if {@code true}, then the String will be added to the
 * end of the file rather than overwriting
 * @throws IOException in case of an I/O error
 * @since 2.3
 */
 public static void writeStringToFile(final File file, final String data, final Charset encoding,
 final boolean append) throws IOException {
   OutputStream out = null;
   try {
     out = openOutputStream(file, append);
     IOUtils.write(data, out, encoding);
     out.close(); // don't swallow close Exception if copy completes normally
   } finally {
     IOUtils.closeQuietly(out);
   }
 }

Which calls this IOUtils method:

/**
 * Writes chars from a <code>String</code> to bytes on an
 * <code>OutputStream</code> using the specified character encoding.
 *
 * This method uses {@link String#getBytes(String)}.
 *
 * @param data the <code>String</code> to write, null ignored
 * @param output the <code>OutputStream</code> to write to
 * @param encoding the encoding to use, null means platform default
 * @throws NullPointerException if output is null
 * @throws IOException if an I/O error occurs
 * @since 2.3
 */
 public static void write(final String data, final OutputStream output, final Charset encoding) throws IOException {
   if (data != null) {
     output.write(data.getBytes(Charsets.toCharset(encoding)));
   }
 }

Knowing what I know, I expect ShiftLeft to conclude that all these methods do not support sensitive data. Working back up the stack, I find:

  • org.apache.commons.io.IOUtils.write(String, OutputStream, Charset)
    Nothing on that one; did I miss it due to the 255 conclusion limit?
  • org.apache.commons.io.FileUtils.writeStringToFile(File, String, Charset, boolean)
    Got it:

     {
     "id": "org.apache.commons.io.FileUtils.writeStringToFile:void(java.io.File,java.lang.String,boolean)/file/1",
     "description": "The method `writeStringToFile` does not support handling **untrusted data** to be passed as parameter `file` because it controls access to I/O **File** in a manner that would allow an attacker to abuse it.",
     "unsupportedDataType": "ATTACKER_CONTROLLED",
     "interfaceId": "FILE/false",
     "methodId": "org.apache.commons.io.FileUtils.writeStringToFile:void(java.io.File,java.lang.String,boolean)",
     "codeLocation": {
       "file": "org/apache/commons/io/FileUtils.java",
       "lineNumber": 360,
       "symbol": "parent"
     },
     "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L360",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"
     }
    
  • org.apache.commons.io.FileUtils.write(File, CharSequence, Charset, boolean)
    Got it:

    {
     "id": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset,boolean)/file/2",
     "description": "The method `write` does not support handling **sensitive data** to be passed as parameter `file` because it is leaked over I/O **File**.",
     "unsupportedDataType": "SENSITIVE",
     "interfaceId": "FILE/false",
     "methodId": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset,boolean)",
     "codeLocation": {
       "file": "org/apache/commons/io/FileUtils.java",
       "lineNumber": 355,
       "symbol": "parent"
     },
     "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L355",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"
     }
    
  • org.apache.commons.io.FileUtils.write(File, CharSequence, Charset)
    Got it:

    {
     "id": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset)/file/2",
     "description": "The method `write` does not support handling **sensitive data** to be passed as parameter `file` because it is leaked over I/O **File**.",
     "unsupportedDataType": "SENSITIVE",
     "interfaceId": "FILE/false",
     "methodId": "org.apache.commons.io.FileUtils.write:void(java.io.File,java.lang.CharSequence,java.nio.charset.Charset)",
     "codeLocation": {
       "file": "org/apache/commons/io/FileUtils.java",
       "lineNumber": 355,
       "symbol": "parent"
     },
     "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/FileUtils.java#L355",
     "state": "NEUTRAL",
     "externalIssueUrl": "https://todo"
     }
    

So yeah, that all hangs nicely together, thank you code graphs!

Finding arbitrary code attacks

Did I mention there are 255 conclusions in the JSON report file? It takes a while to go through these. I am hoping that ShiftLeft will have their UI in place soon so I can filter and sort all this information! Now that I am about 20% through the file, I see:

The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.

runaway_train

Yikes! Let’s take a deeper look and see if this is for real or a false positive.

Here is the raw conclusion:

{
 "id": "org.apache.commons.io.Java7Support.createSymbolicLink:java.io.File(java.io.File,java.io.File)/symlink/1",
 "description": "The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.",
 "unsupportedDataType": "ATTACKER_CONTROLLED",
 "methodId": "org.apache.commons.io.Java7Support.createSymbolicLink:java.io.File(java.io.File,java.io.File)",
 "codeLocation": {
 "file": "org/apache/commons/io/Java7Support.java",
 "lineNumber": 128,
 "symbol": "file"
 },
 "codeLocationUrl": "https://github.com/apache/commons-io/blob/commons-io-2.5/src/main/java/org/apache/commons/io/Java7Support.java#L128",
 "state": "NEUTRAL",
 "externalIssueUrl": "https://todo"
 }

Our codeLocationUrl for this conclusion points us to the Java7Support class at line 128, where we find:

/**
 * Indicates if a symlunk target exists
 * @param file The symlink file
 * @return true if the target exists
 * @throws IOException upon error
 */
 private static boolean exists(File file)
 throws IOException {
   try {
     Object path = toPath.invoke(file);
     final Boolean result = (Boolean) exists.invoke(null, path, emptyLinkOpts);
     return result.booleanValue();
   } catch (IllegalAccessException e) {
     throw new RuntimeException(e);
   } catch (InvocationTargetException e) {
     throw (RuntimeException) e.getTargetException();
   }
}

ShiftLeft points to the line:

Object path = toPath.invoke(file);

The static field toPath is a java.lang.reflect.Method, which can and does execute code as shown above. Looking narrowly at the code so far, we can say that yes, this code could run anything since toPath is a Method.

However, widening our view to the field declaration we see the following in the class static initialization block:

toPath = File.class.getMethod("toPath");

This makes sense in the context of the class: Java7Support is used to access Java 7 features while running on pre-Java 7 platforms. Here we are setting up toPath to run one method. I would expect toPath to be a static final but it is not:

private static Method toPath;

Why is it not final? Well, it’s just that the way the static initializer block is written does not allow you to just add final to the declaration. The static block needs to be rewritten to allow for toPath to be final, which we will leave as ‘an exercise to the reader’ 😉 as it is out of scope for an already long blog post.
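For the curious, one possible shape of that rewrite is sketched here. It is reduced to the single toPath field, and the class and accessor names are hypothetical. The trick is to assign to local variables inside the try block and then assign each final field exactly once after it, which satisfies the compiler’s definite-assignment rules.

```java
import java.io.File;
import java.lang.reflect.Method;

final class FinalMethodSketch {

    private static final boolean IS_JAVA7;
    private static final Method TO_PATH;

    static {
        // Work on locals inside try/catch so that each final field
        // is assigned exactly once, whichever path is taken.
        Method toPath = null;
        boolean isJava7x = true;
        try {
            toPath = File.class.getMethod("toPath");
        } catch (NoSuchMethodException e) {
            isJava7x = false;
        }
        TO_PATH = toPath;
        IS_JAVA7 = isJava7x;
    }

    static boolean isAtLeastJava7() {
        return IS_JAVA7;
    }

    /** Invokes File.toPath() reflectively; callers should check isAtLeastJava7() first. */
    static Object toPath(File file) throws ReflectiveOperationException {
        return TO_PATH.invoke(file);
    }
}
```

Assigning the final fields directly inside the try block would not compile, because the catch branches would leave them unassigned.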

I would be curious to see how ShiftLeft responds to such a code change.

I am not sure if this is really a problem though. The variable is private now, but not final. Yes, its type (Method) is all about executing code. Under normal circumstances, this value cannot be changed outside this class. I can, of course, use reflection to force a new value into toPath. Does that mean that any time I use a Method field I am going to get an arbitrary code execution conclusion? Another corner case to examine.
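Here is a minimal sketch of what forcing a new value through reflection looks like. Holder is a hypothetical stand-in for Java7Support, and String.trim and String.toUpperCase are harmless placeholders for the original and the substituted method. It assumes that no security manager or module boundary blocks setAccessible.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class ReflectionOverwriteSketch {

    /** Stand-in for Java7Support: a private, static, non-final Method field. */
    static final class Holder {
        private static Method target;

        static {
            try {
                target = String.class.getMethod("trim");
            } catch (NoSuchMethodException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        /** Invokes whatever Method the field currently holds. */
        static Object run(String input) throws ReflectiveOperationException {
            return target.invoke(input);
        }
    }

    /** Swaps Holder.target for another Method using reflection. */
    static void overwriteTarget(Method replacement) throws ReflectiveOperationException {
        Field field = Holder.class.getDeclaredField("target");
        field.setAccessible(true);     // bypasses the private modifier
        field.set(null, replacement);  // null receiver because the field is static
    }
}
```

After calling overwriteTarget with String.class.getMethod("toUpperCase"), Holder.run no longer trims its input and upper-cases it instead. According to the Field.set Javadoc, the same call on a static final field fails with an IllegalAccessException.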

What if I rewrote the static block and declared the variable final? Would ShiftLeft still reach the same conclusion? If yes, would that be because I could still use reflection to hammer any value into the field?

Concluding on this first arbitrary code attack

The more I explore these test results, the more I realize how tricky security is and how much context matters. I now know that the Java7Support class in Apache Commons IO 2.5 is open to an arbitrary code attack under the narrow use case of another piece of code using reflection. But if that code is allowed to use reflection, surely it could achieve its goal without going through the extra hoop of Java7Support hacking.

Stepping back, the realization is that I should think twice about using the Method class because I could open my application up to an attack unless that Method field is properly protected.

Looking for more arbitrary code attacks

Now that ShiftLeft has whetted my appetite, I wonder if there are more arbitrary code attacks lurking. A quick search through the file reveals a total of five. Not surprisingly, these are all in the Java7Support class and all follow the same pattern as above: calling the invoke method of a Method object where the Method is initialized in the static block.

Flipping it around once more, let’s look at the Java7Support class variable declarations and see if all Method objects end up being accounted for by ShiftLeft:

/**
 * Java7 feature detection and reflection based feature access.
 * <p/>
 * Taken from maven-shared-utils, only for private usage until we go full java7
 */
class Java7Support {

  private static final boolean IS_JAVA7;

  private static Method isSymbolicLink;

  private static Method delete;

  private static Method toPath;

  private static Method exists;

  private static Method toFile;

  private static Method readSymlink;

  private static Method createSymlink;
  ...

We have seven static Method declarations which I see initialized in the static block:

static {
 boolean isJava7x = true;
 try {
   ClassLoader cl = Thread.currentThread().getContextClassLoader();
   Class<?> files = cl.loadClass("java.nio.file.Files");
   Class<?> path = cl.loadClass("java.nio.file.Path");
   Class<?> fa = cl.loadClass("java.nio.file.attribute.FileAttribute");
   Class<?> linkOption = cl.loadClass("java.nio.file.LinkOption");
   isSymbolicLink = files.getMethod("isSymbolicLink", path);
   delete = files.getMethod("delete", path);
   readSymlink = files.getMethod("readSymbolicLink", path);
   emptyFileAttributes = Array.newInstance(fa, 0);
   createSymlink = files.getMethod("createSymbolicLink", path, path, emptyFileAttributes.getClass());
   emptyLinkOpts = Array.newInstance(linkOption, 0);
   exists = files.getMethod("exists", path, emptyLinkOpts.getClass());
   toPath = File.class.getMethod("toPath");
   toFile = path.getMethod("toFile");
   } catch (ClassNotFoundException e) {
     isJava7x = false;
   } catch (NoSuchMethodException e) {
     isJava7x = false;
   }
   IS_JAVA7 = isJava7x;
 }

ShiftLeft gives me five conclusions:

The method `isSymLink` does not support handling **untrusted data** to be passed as parameter `file` because it could allow an attacker to run arbitrary code.
The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.
The method `delete` does not support handling **untrusted data** to be passed as parameter `file` because it could allow an attacker to run arbitrary code.
The method `createSymbolicLink` does not support handling **untrusted data** to be passed as parameter `target` because it could allow an attacker to run arbitrary code.
The method `readSymbolicLink` does not support handling **untrusted data** to be passed as parameter `symlink` because it could allow an attacker to run arbitrary code.

Between them, the methods listed above use every Method declaration in this class. Nice.
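Rather than counting the Method fields by eye, I could ask reflection for them. This is a sketch with hypothetical names: Sample stands in for Java7Support, which is package-private, so I would expect to load the real class by name with Class.forName before passing it in.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MethodFieldLister {

    /** Returns the sorted names of all static fields of type Method declared by the given class. */
    static List<String> staticMethodFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.getType() == Method.class && Modifier.isStatic(field.getModifiers())) {
                names.add(field.getName());
            }
        }
        // getDeclaredFields() makes no ordering promise, so sort for a stable result
        Collections.sort(names);
        return names;
    }

    /** Hypothetical stand-in with the same kind of fields as Java7Support. */
    static final class Sample {
        private static Method delete;
        private static Method toPath;
        private static boolean flag;
    }
}
```

The resulting list can then be compared against the fields touched by the methods named in the conclusions.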

Fin

I’d like to wrap up this exploration of ShiftLeft with a quick summary of what we found: a tool we can add to our build pipelines to find potential security issues. There is a lot of data here, and this is just for Apache Commons IO 2.5, so another lesson is that context matters. ShiftLeft currently provides ‘conclusions’ based on a code graph. We found conclusions about untrusted data (I’m not sure what’s in here so don’t go executing it), sensitive data (don’t save passwords in plain text!) and arbitrary code attacks.

I hope to revisit this story and run ShiftLeft on other Apache Commons projects soon. This sure is fun!

Happy Coding,
Gary Gregory