Masking sensitive data in Log4j 2

Automatically mask sensitive fields in Log4j 2 to protect your users’ privacy and comply with PCI standards.

Igor Shults

A growing practice across many organizations is to log as much information as is feasible, to allow for better debugging and auditing. Tools like Splunk and ELK make it even easier to index the logs, treating them almost like databases. However, with PCI and HIPAA standards, those same organizations may want to mask much of that data to prevent unauthorized or unprotected access to sensitive information. In this blog post I’ll detail one potential approach to masking that data, so developers do not need to worry about filtering individual log statements.

Prerequisites

You’re going to need to use Log4j 2 (potentially with SLF4J as well).  A sample pom.xml for just these dependencies would include the lines:

<properties>
    <log4j.version>2.7</log4j.version>
    <slf4j.version>1.7.22</slf4j.version>
</properties>

<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${log4j.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>${log4j.version}</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>${slf4j.version}</version>
</dependency>

If you’re using a different logging framework, then I imagine this guide may not be very helpful.

Setup

I’m going to dive right in, as there are a few different files we need to create or modify to get log masking to work. The first file we will create is a pretty basic one. It’s going to hold all of our logging markers, so that we can tell Log4j to only run the masking on the log statements that need it. Masking our logs means we’re taking a performance hit, so we should not do it any more than we need to:

import org.slf4j.Marker
import org.slf4j.MarkerFactory

class LoggingMarkers {
    static final Marker JSON = MarkerFactory.getMarker('JSON-MASK')
    static final Marker XML = MarkerFactory.getMarker('XML-MASK')
}

I’ve got two basic Markers in that class, one for JSON, and one for XML. You can define as many as you need — for different content types, data types, etc. For this tutorial we’re only going to be using the JSON marker.

Let’s continue by extending the LogEventPatternConverter:

import org.apache.logging.log4j.core.LogEvent
import org.apache.logging.log4j.core.config.plugins.Plugin
import org.apache.logging.log4j.core.pattern.ConverterKeys
import org.apache.logging.log4j.core.pattern.LogEventPatternConverter

import java.util.regex.Matcher
import java.util.regex.Pattern

@Plugin(name = 'logmask', category = 'Converter')
@ConverterKeys(['cm'])
class LogMaskingConverter extends LogEventPatternConverter {
    private static final String NAME = 'cm'
    private static final String JSON_REPLACEMENT_REGEX = "\"\$1\": \"****\""
    private static final String JSON_KEYS = ['ssn', 'private', 'creditCard'].join('|')
    private static final Pattern JSON_PATTERN = Pattern.compile(/"(${JSON_KEYS})": "([^"]+)"/)

    LogMaskingConverter(String[] options) {
        super(NAME, NAME)
    }

    static LogMaskingConverter newInstance(final String[] options) {
        return new LogMaskingConverter(options)
    }

    @Override
    void format(LogEvent event, StringBuilder outputMessage) {
        String message = event.message.formattedMessage
        String maskedMessage = message

        if (event.marker?.name == LoggingMarkers.JSON.name) {
            try {
                maskedMessage = mask(message)

            } catch (Exception e) {
                maskedMessage = message // Although if this fails, it may be better to not log the message
            }
        }

        outputMessage.append(maskedMessage)
    }

    private String mask(String message) {
        StringBuffer buffer = new StringBuffer()
        Matcher matcher = JSON_PATTERN.matcher(message)

        while (matcher.find()) {
            matcher.appendReplacement(buffer, JSON_REPLACEMENT_REGEX)
        }

        matcher.appendTail(buffer)

        return buffer.toString()
    }
}

OK, let’s stop and analyze the important bits. The “ConverterKeys” value and the “NAME” field we pass to the LogEventPatternConverter define the pattern that we will include in our “log4j2.xml” config; it’s what we need to reference for our masking to run at all. I believe you cannot override the default “%m”, so we are defining our own custom pattern, “cm”. In fact, we will use “cm” INSTEAD of the default “m” in our configuration.

Next, the constructor and the “newInstance()” method are required for our converter to be properly invoked by Log4j. The “format()” method holds the crux of our work. You can see that it takes the formatted message and returns it unchanged if the current logging statement has no matching Marker. Only if we DO have a matching Marker (for example, our JSON one) will we attempt to mask the message.

I’ve implemented a simple JSON regex replacement for the mask method, but there are many different approaches you can take: you can hydrate the JSON and replace the values based on name/path, you can inspect an object to see if it’s annotated with a “DoNotMask” annotation, or you can even define simple regex values to replace (e.g. credit cards, SSNs). The implementation I provide is meant as a proof-of-concept example, and is not prod-ready. Also, if you DO decide to implement multiple strategies for different markers, it makes sense to move that logic into specific classes (I have included everything in one file for simplicity).
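As an illustration of one of those alternative strategies, masking by value shape rather than by key could be sketched roughly as follows (shown in plain Java; the class name and patterns are hypothetical, not part of the converter above):

```java
import java.util.regex.Pattern;

// Illustrative value-shape masking: rather than matching known JSON keys,
// match the *shape* of sensitive values (card numbers, SSNs) anywhere in
// the message.
public class ValuePatternMasker {
    // 14-16 digits, optionally separated by spaces or dashes (card numbers)
    private static final Pattern CARD = Pattern.compile("\\b(?:\\d[ -]?){13,15}\\d\\b");
    // the common 3-2-4 SSN format
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    public static String mask(String message) {
        String masked = CARD.matcher(message).replaceAll("****");
        return SSN.matcher(masked).replaceAll("****");
    }
}
```

A strategy like this catches sensitive values even under key names you did not anticipate, at the cost of occasional false positives on other long digit runs.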

As a simple demonstration of this class, let’s also include the tests:

import org.apache.logging.log4j.MarkerManager
import org.apache.logging.log4j.core.LogEvent
import org.apache.logging.log4j.core.impl.Log4jLogEvent
import org.apache.logging.log4j.message.SimpleMessage
import spock.lang.Specification
import spock.lang.Unroll

class LogMaskingConverterSpec extends Specification {

    LogMaskingConverter converter

    void setup() {
        converter = LogMaskingConverter.newInstance(null)
    }

    @Unroll
    void 'format() should mask sensitive data'() {
        setup:
            SimpleMessage message = new SimpleMessage(input)
            LogEvent logEvent = new Log4jLogEvent('LogMaskingConverterSpecLogger', new MarkerManager.Log4jMarker(LoggingMarkers.JSON.name), null, null, message, null)
            StringBuilder builder = new StringBuilder()

        when:
            converter.format(logEvent, builder)

        then:
            assert builder.toString() == expectedOutput

        where:
            input                                                          | expectedOutput
            '{"noMask": "foo"}'                                            | '{"noMask": "foo"}'
            '{"ssn": "1234567890", "id": "ABC-123", "private": "someKey"}' | '{"ssn": "****", "id": "ABC-123", "private": "****"}'
            'invalidJson'                                                  | 'invalidJson'
    }
}

At this point, however, we are still not ready to use our class, as Log4j does not know to look for it. To fix that, we need to update the log4j2.xml file:

<Configuration packages='com.path.to.logging, com.your.other.packages'>
    <Properties>
        <Property name="maskingPattern">
            %d, level=%p, %cm
        </Property>
    </Properties>
    ...
</Configuration>

The key parts here are to update the “Configuration packages” attribute to include the package (or a parent package) of your LogEventPatternConverter, and to replace or append “cm” rather than “m” in the pattern. If your logs should be masked but are instead prefixed by a “c”, then Log4j has not picked up your converter; make sure the names are correct, and that the package is included in the “Configuration” node!
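For context, a fuller (hypothetical) log4j2.xml shows where the maskingPattern property ends up being used; the appender and logger setup here is illustrative, so adapt it to your own configuration:

```xml
<Configuration packages='com.path.to.logging'>
    <Properties>
        <Property name="maskingPattern">%d, level=%p, %cm%n</Property>
    </Properties>
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="${maskingPattern}"/>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>
```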

Usage

So hopefully now we have everything hooked up so that our log statements can be masked. In order to take advantage of our converter, we need to log our statements with the appropriate Marker (here, log is an SLF4J logger, e.g. one generated by Groovy’s @Slf4j annotation):

log.info(LoggingMarkers.JSON, '{"ssn": "1234567890"}') // Will mask
log.info('{"ssn": "1234567890"}') // Will NOT mask
log.info(LoggingMarkers.XML, '{"ssn": "1234567890"}') // Will NOT mask, since our converter only checks for the JSON marker

If all went well, you should now see your sensitive data being replaced with your mask. As a final note, if you are using Spring Boot, Log4j is by default configured BEFORE Spring Boot components and @Value fields, so if you put your fields-to-mask into a properties file, it may take some extra configuration to make sure Log4j picks them up.
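One workaround for that initialization-order problem is to pass the keys as a JVM system property, which is available as soon as the converter class loads, long before Spring binds any @Value fields. A minimal sketch (in plain Java; the property name “log.mask.keys” and the class are assumptions, not a convention):

```java
import java.util.regex.Pattern;

// Sketch: read the fields-to-mask from a system property supplied on the
// command line, e.g. -Dlog.mask.keys=ssn,private,creditCard, so Log4j can
// see them before Spring Boot's property binding runs.
public class MaskKeyLoader {
    public static String keyAlternation() {
        String keys = System.getProperty("log.mask.keys", "ssn,private,creditCard");
        // turn "a,b,c" into the regex alternation "a|b|c"
        return String.join("|", keys.split(","));
    }

    public static Pattern jsonPattern() {
        return Pattern.compile("\"(" + keyAlternation() + ")\": \"([^\"]+)\"");
    }
}
```

The converter could then build its JSON_PATTERN from jsonPattern() in a static initializer instead of hard-coding the key list.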

