Building an event-driven AppSec Service

Over the last few years, application security (AppSec) and cybersecurity have emerged as some of the most essential disciplines in DevOps. Protecting against cyber-attacks has become a necessity, particularly in larger organizations, both for apps developed internally and for apps available to the external world. In parallel, DevOps attempts to reduce the gap between development and operations, and application security is one of those operations. Integrating security within DevOps is what is now commonly known as DevSecOps (Development, Security, and Operations). Implementing DevSecOps has its own set of challenges. Below is an example of one such challenge that Addteq faced while implementing DevSecOps for one of our financial clients.

The Challenge

Our client is a leading multinational investment bank with a robust portfolio of services including asset management, asset servicing, and wealth management. They have a large number of application teams working on an equally large and diverse set of applications, developed in different technologies including but not limited to Java/J2EE, Python, and Node.js. With application security in mind, all of these applications need to undergo "security scans" to ensure awareness of any vulnerabilities and their timely mitigation. Specialized tools do the job of scanning the applications, depending on the type of security scan required. Common security scan types are Static Scans, Software Composition Analysis (SCA) for Open Source components, Dynamic Scans, and more.

While these scans are important, the process of mandating them was complicated, manual, and time-consuming. Because of the stringent deadlines on Application teams to deliver new features and bug fixes, very little time remained to address existing security vulnerabilities, let alone to run periodic security scans. Without a process for these security scans in place, there were frequent escalations and frequent back-and-forth between the AppSec and App teams, which delayed releases and caused further frustration.

Initial Analysis

After some meetings with the Application Security team, we agreed on a mandate that enforces running security scans as part of the release process. Running the security scans needed to be automated so that it became an integral part of the software build and release process. Around the same time, the client was in the process of introducing GitLab CI/CD to its teams. GitLab CI/CD introduced the notion of pipelines that ensured proper separation of stages in the build and release process. This led us to a few questions:

Can Addteq somehow integrate automated scans within the GitLab pipeline itself?
How does Addteq ensure the solution we build is capable of integrating new scan tools that are introduced over time (extensibility)?
How can the GitLab pipeline reliably "send a message" to the Addteq-built automation to initiate scans?
Where can the scan results be published for the client’s App teams to view?
How does Addteq ensure the issues pointed out by scans are being addressed and tracked in a timely manner?

The Addteq Solution

To design a solution to this problem, we needed an event-driven IFTTT (if-this-then-that) framework or tool that could continuously listen for messages requesting scans. Once a message is received, the framework should leverage the API or CLI provided by the scanning tools to initiate the scans. After the scans are complete, Addteq needed a place to publish the results for the client’s App teams to review.
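The if-this-then-that idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the StackStorm implementation: the registry, the handler names, and the message fields `scan_type` and `app` are all hypothetical. It also shows one way to get the extensibility asked for earlier, since adding a new scan tool only means registering another handler.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a scan type to the function that starts it.
ScanHandler = Callable[[dict], str]
HANDLERS: Dict[str, ScanHandler] = {}


def register(scan_type: str) -> Callable[[ScanHandler], ScanHandler]:
    """Decorator that registers a handler for one scan type."""
    def decorator(func: ScanHandler) -> ScanHandler:
        HANDLERS[scan_type] = func
        return func
    return decorator


@register("static")
def start_static_scan(message: dict) -> str:
    # A real handler would call the scan tool's API or CLI here.
    return f"static scan queued for {message['app']}"


@register("sca")
def start_sca_scan(message: dict) -> str:
    # Placeholder for a Software Composition Analysis tool.
    return f"sca scan queued for {message['app']}"


def on_message(message: dict) -> str:
    """'If this' (a message arrives) 'then that' (start the matching scan)."""
    scan_type = message.get("scan_type")
    handler = HANDLERS.get(scan_type)
    if handler is None:
        return f"ignored: no handler for scan type {scan_type!r}"
    return handler(message)
```

Calling `on_message({"scan_type": "static", "app": "payments"})` routes the request to `start_static_scan`, while an unregistered scan type is reported and ignored rather than raising an error.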

Enter the tools

After going through a couple of options, while keeping in mind the existing infrastructure available with the client, Addteq came up with the following set of tools to achieve our goal outlined above:

StackStorm to be used as the event-driven framework.
Kafka to be used as the "message bus": a tool that can reliably publish scan initiation messages and receive scan results. We ensured the scan initiation messages followed a strict JSON structure prior to suggesting this tool.
Grafana to be used as the reporting tool that will display scan results. Since we chose Kafka to publish results, any other reporting tool downstream wishing to consume the results can simply subscribe to the concerned Kafka topic.
Jira to be used as the tool to create tickets for the App teams to address vulnerabilities provided in the scan results. This tool will also ensure timely tracking and resolution of those vulnerabilities.
Nexus was already being used as the repository management tool, where the artifacts to be scanned are stored. Addteq’s automation can then simply download the concerned artifacts from Nexus and initiate/run the necessary scans.
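To illustrate what a "strict JSON structure" for scan initiation messages could look like, here is a minimal validation sketch using only the Python standard library. The field names (`app_id`, `scan_type`, `artifact_url`) and the allowed scan types are assumptions made for this example; the actual message schema used for the client is not shown in this post.

```python
import json

# Hypothetical schema: every field is required and no extra fields are allowed.
REQUIRED_FIELDS = {"app_id": str, "scan_type": str, "artifact_url": str}
ALLOWED_SCAN_TYPES = {"static", "sca", "dynamic"}


def parse_scan_request(raw: str) -> dict:
    """Parse a scan initiation message and reject anything off-schema."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")

    missing = sorted(REQUIRED_FIELDS.keys() - message.keys())
    if missing:
        raise ValueError(f"missing fields: {missing}")

    extra = sorted(message.keys() - REQUIRED_FIELDS.keys())
    if extra:
        raise ValueError(f"unexpected fields: {extra}")

    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(message[name], expected_type):
            raise ValueError(f"field {name!r} must be {expected_type.__name__}")

    if message["scan_type"] not in ALLOWED_SCAN_TYPES:
        raise ValueError(f"unknown scan type: {message['scan_type']!r}")

    return message
```

Rejecting malformed messages at the edge means the automation never starts a scan from a request it only half understands, and the pipeline that sent the message can be told exactly which field was wrong.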

The above tools used in conjunction with GitLab CI/CD would result in the following flow for the client’s infrastructure:

As seen above, we were able to leverage StackStorm to act as an orchestration layer that takes care of:

Reading scan-related data from Kafka and publishing it back to Kafka.
Downloading artifacts from Nexus.
Initiating scans in the concerned AppSec scan tools and receiving scan results.
Keeping up-to-date with the necessary information about the apps by constantly polling information from the client’s application metadata service.
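The responsibilities listed above can be pictured as one loop. The sketch below is a simplified, assumption-heavy illustration: the Kafka consumer, Nexus download, scan tool, and Kafka producer are passed in as plain Python callables so that no real client library or API is implied, and the message fields are hypothetical.

```python
from typing import Callable, Iterable


def orchestrate(
    requests: Iterable[dict],
    download: Callable[[str], bytes],
    scan: Callable[[str, bytes], list],
    publish: Callable[[dict], None],
) -> int:
    """Process scan requests end to end and return how many were handled.

    requests: scan initiation messages (stand-in for a Kafka topic)
    download: fetches an artifact by URL (stand-in for Nexus)
    scan:     runs one scan type on an artifact and returns its findings
    publish:  sends a result message (stand-in for a Kafka results topic)
    """
    handled = 0
    for request in requests:
        artifact = download(request["artifact_url"])
        findings = scan(request["scan_type"], artifact)
        publish({
            "app_id": request["app_id"],
            "scan_type": request["scan_type"],
            "findings": findings,
        })
        handled += 1
    return handled
```

Passing the integrations in as parameters keeps the flow testable with in-memory fakes, and swapping one scan tool for another does not change the loop itself.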

We used a PostgreSQL relational database to store reporting information about the scans that were initiated from StackStorm. This database was then linked to Grafana to produce useful dashboards for the App teams, as well as higher management. To facilitate reading scan initiation messages, initiating scans, and publishing scan results, Addteq wrote customized StackStorm packs.
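As a rough picture of the reporting side, the sketch below stores scan summaries in a table and runs the kind of aggregate query a dashboard panel might use. It is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs anywhere, and the table and column names are hypothetical rather than the client's actual schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one row per scan summary."""
    conn.execute(
        """
        CREATE TABLE scan_results (
            app_id        TEXT    NOT NULL,
            scan_type     TEXT    NOT NULL,
            severity      TEXT    NOT NULL,
            finding_count INTEGER NOT NULL
        )
        """
    )


def record_result(conn, app_id: str, scan_type: str, severity: str, count: int) -> None:
    """Insert one scan summary row using a parameterized query."""
    conn.execute(
        "INSERT INTO scan_results VALUES (?, ?, ?, ?)",
        (app_id, scan_type, severity, count),
    )


def findings_per_app(conn) -> list:
    """Total findings per application, the sort of query a dashboard could chart."""
    cursor = conn.execute(
        "SELECT app_id, SUM(finding_count) FROM scan_results "
        "GROUP BY app_id ORDER BY app_id"
    )
    return cursor.fetchall()
```

With an in-memory connection from `sqlite3.connect(":memory:")`, a few calls to `record_result` followed by `findings_per_app` return one `(app_id, total)` tuple per application.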

Conclusion

Using an event-driven automation tool like StackStorm, coupled with a reliable means of message passing like the Kafka message bus, Addteq was able to build a bridge between the AppSec scanning tools and the applications that need to be scanned. With this solution in place, we were able to ensure that the applications underwent a security scan as part of their CI/CD process itself. This ensured timely mitigation of the security vulnerabilities and reduced the back-and-forth between the AppSec and App teams. In addition, since the App teams get an early heads-up about the vulnerabilities, they now have enough time to address the issues without getting overwhelmed right before the release. It’s a win-win situation for both the Application and Application Security teams.

Next steps

This solution currently addresses a small fraction of the total applications on the instance, but the goal is for the implementation to cover the majority of mission-critical applications over time. At a high level, here are some of the improvements and features Addteq has planned for our client’s pipeline:

Converting this service into a Kubernetes-like structure that ensures each component in StackStorm runs as its own microservice.
Integrating more security tools onto the client’s instance.
Integrating more repository management tools onto the client’s instance.
Enabling integrations with cloud-based services.