Robo Red Team: Training Platform

Written: March 7, 2021

During the first semester of my MSc in cybersecurity at Aalborg University, my group's semester project was to develop a training platform for blue team professionals. In this article, I describe the outcome of this project, which we have decided to open source under a GPL-3.0 license.

Screenshot of the website for the Robo Red Team Training platform.

From a user's perspective, the platform works like this: You visit the webpage, where you choose which campaign to begin, set how much time you have to harden the system, and enter the API-key. Once you start the campaign, the associated Docker containers are spawned, and the SSH credentials for these boxes are shown on the webpage. Before you can SSH into a machine, you first need to be on the local network, which is achieved via a WireGuard VPN connection to the server running the platform. Once you have access to the machine, the goal is to quickly identify and mitigate the security vulnerabilities present in the system before the timer runs out. When the time is up, an attacker machine runs attack-scripts against the machine to test whether all the vulnerabilities have been mitigated. Finally, the machines are stopped, and the result of the attack-script is displayed on the webpage.
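The campaign flow described above can be read as a small state machine. The following is a minimal sketch of that idea and is not taken from the project's code; the names `Campaign`, `Phase`, `tick`, and `finish` are hypothetical, and time is passed in explicitly so the example runs without waiting.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Phase(Enum):
    CONFIGURED = "configured"  # campaign and hardening time chosen
    HARDENING = "hardening"    # containers running, user mitigates vulnerabilities
    ATTACK = "attack"          # timer expired, attack-scripts are run
    FINISHED = "finished"      # containers stopped, result displayed


@dataclass
class Campaign:
    """Hypothetical model of one campaign; not the project's actual class."""
    name: str
    hardening_seconds: int
    phase: Phase = Phase.CONFIGURED
    started_at: Optional[float] = None

    def start(self, now: float) -> None:
        """Begin the hardening phase; the timer counts from `now`."""
        if self.phase is not Phase.CONFIGURED:
            raise RuntimeError("campaign has already been started")
        self.started_at = now
        self.phase = Phase.HARDENING

    def seconds_left(self, now: float) -> float:
        """Remaining hardening time, never negative."""
        if self.started_at is None:
            return float(self.hardening_seconds)
        return max(0.0, self.started_at + self.hardening_seconds - now)

    def tick(self, now: float) -> Phase:
        """Switch to the attack phase once the hardening time is used up."""
        if self.phase is Phase.HARDENING and self.seconds_left(now) == 0:
            self.phase = Phase.ATTACK
        return self.phase

    def finish(self) -> None:
        """Mark the campaign as done after the attack has been run."""
        if self.phase is not Phase.ATTACK:
            raise RuntimeError("the attack phase has not been reached")
        self.phase = Phase.FINISHED


campaign = Campaign(name="demo", hardening_seconds=600)
campaign.start(now=0.0)
print(campaign.tick(now=300.0))  # still hardening
print(campaign.tick(now=600.0))  # timer expired, attack phase
```

In a real deployment the `now` values would come from a clock such as `time.monotonic()`, and the phase changes would trigger the container and attack actions.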

With this, we have laid the foundation for a platform that can be used to learn how to quickly identify vulnerabilities in a system and to test whether the mitigations are effective. We chose to develop this type of platform because we felt it would be a useful contribution, since most similar training platforms and competitions focus mainly on the offensive side of cybersecurity.

The cybersecurity MSc I am studying: Aalborg University

Technical Details:

During design and development, we wanted to make it easy to scale the platform up so that many users can use it simultaneously. Therefore, we decided to use a micro-service architecture for this project. While we chose not to implement Kubernetes due to time limitations, this architecture should allow the platform to be adapted to Kubernetes in the future, for efficient scaling based on demand.

Architecture overview of the platform.

For the overall design, we set up a Flask-RESTful server to temporarily serve the website. If this project is expanded into a production-grade product, this should be changed to either use the template engine that Flask provides or move the static site to a hosting service, as the current setup is not an optimal way to serve a webpage. However, we chose this approach because the product is still in an early "proof of concept" state, where we felt we could use our time more productively on further development instead of changing the way the frontend is served.

For the backend, we use a micro-service architecture for the various services, along with one main Flask-RESTful API that runs natively on the machine, as it needs to be able to control Docker:

Visualization of the various micro-services in the backend.

As seen above, we have a few different micro-services. The first is a "DataStore" service, which functions as a static database: we did not have time to integrate the management of user sessions, so there was no need for a full database during development. The second is an "Authentication" service, which holds the hashed and salted API-key and is used by the primary backend API to validate the key that users provide.
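To illustrate what checking a key against a hashed and salted value can look like, here is a minimal sketch using only Python's standard library. It is not the project's implementation: the choice of PBKDF2-HMAC-SHA256, the iteration count, and the function names `hash_api_key` and `verify_api_key` are assumptions made for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor for this sketch; tune it for the hardware in use.
PBKDF2_ITERATIONS = 200_000


def hash_api_key(api_key: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for an API-key, generating a random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", api_key.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_api_key(candidate: str, salt: bytes, expected_digest: bytes) -> bool:
    """Hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_api_key(candidate, salt)
    return hmac.compare_digest(digest, expected_digest)


# Only the salt and digest are stored; the plain-text key is never kept.
stored_salt, stored_digest = hash_api_key("example-key")
print(verify_api_key("example-key", stored_salt, stored_digest))
print(verify_api_key("wrong-key", stored_salt, stored_digest))
```

Using `hmac.compare_digest` rather than `==` avoids leaking information through comparison timing.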

Finally, we have the various services and containers associated with a single running campaign. The "CampaignManager" is responsible for keeping track of the campaign and is the point of communication between the backend API and the campaign machines. The "Attacker" will, once the timer runs out, attack the "Blue machines" by following the pre-defined attack-script and report the result of the attack back to the "CampaignManager". Lastly, the "Blue machines" are the machines that users gain access to in order to find the vulnerabilities and mitigate them.
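The role of the "Attacker" can be sketched as running a list of exploit attempts against each blue machine and collecting which vulnerabilities were mitigated. This is an illustrative sketch only: `AttackStep`, `run_attack_script`, the host names, and the stub exploits are hypothetical and do not reflect the format of the project's attack-scripts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AttackStep:
    """One entry of a hypothetical attack-script."""
    vulnerability: str
    # Returns True if the exploit still works against the given host.
    exploit: Callable[[str], bool]


def run_attack_script(
    steps: List[AttackStep], blue_hosts: List[str]
) -> Dict[str, Dict[str, bool]]:
    """Return, per host, whether each vulnerability was mitigated."""
    report: Dict[str, Dict[str, bool]] = {}
    for host in blue_hosts:
        report[host] = {}
        for step in steps:
            try:
                exploited = step.exploit(host)
            except Exception:
                # In this sketch, an exploit that errors out (for example a
                # refused connection) is counted as not having succeeded.
                exploited = False
            report[host][step.vulnerability] = not exploited
    return report


# Stub exploits standing in for real attacks.
def weak_ssh_password(host: str) -> bool:
    return host == "blue-1"  # pretend only blue-1 is still vulnerable


def anonymous_ftp(host: str) -> bool:
    return False  # pretend this was fixed everywhere


script = [
    AttackStep("weak-ssh-password", weak_ssh_password),
    AttackStep("anonymous-ftp", anonymous_ftp),
]
print(run_attack_script(script, ["blue-1", "blue-2"]))
```

A report of this shape is what the "CampaignManager" could pass on to the backend API for display on the webpage.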

Communication in the platform, from the user visiting the frontend until the containers are stopped.

A micro-service architecture involves quite a lot of communication between the various elements of the platform, as displayed in the figure above. The figure shows the communication that takes place from the point where a user accesses the webpage and spawns two machines until the "CampaignManager" requests that the machines be stopped once the timer has run out.

Future Work:

We do not intend to develop this platform further, which is why we have decided to open source it under a GPL-3.0 license. That said, we would be happy if anyone gets inspired by our project and decides to finalize such a training platform for blue team professionals, whether by building further on our code-base or simply by taking inspiration from the project. If you want to develop such a platform, feel free to reach out, as I will be happy to share my experiences from developing this "proof of concept" platform.

In case you wish to continue using our platform, we have compiled a list of improvements that we would highly recommend making:

  • Build a mechanism that enforces a "one campaign per user/API-key" policy, as it is currently possible to DoS the service by spawning a large number of containers if you know a valid API-key.
  • Implement a database and support for users.
  • Develop more campaigns, to give more content to the users.
  • Ensure that all communication is encrypted via HTTPS.
  • Build protection into the networking of the system, so that the "blue machines" do not have internet access (or only limited internet access).
  • Choose a way to serve the backend API and the frontend that is more scalable than the Flask development server.
  • Utilize Kubernetes to make the platform scale automatically.
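As an example of the first recommendation, a "one campaign per API-key" policy could be enforced with a small registry of active campaigns. The sketch below is one possible approach under stated assumptions: `CampaignRegistry` and its methods are hypothetical names, and the state is kept in memory, whereas a real multi-process deployment would need shared storage such as the database mentioned in the list.

```python
import threading
from typing import Dict


class CampaignRegistry:
    """Tracks at most one active campaign per API-key identifier (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, str] = {}  # key identifier -> campaign id

    def try_start(self, key_id: str, campaign_id: str) -> bool:
        """Register a campaign; refuse if this key already has one running."""
        with self._lock:
            if key_id in self._active:
                return False
            self._active[key_id] = campaign_id
            return True

    def finish(self, key_id: str) -> None:
        """Release the slot once the campaign's containers are stopped."""
        with self._lock:
            self._active.pop(key_id, None)


registry = CampaignRegistry()
print(registry.try_start("key-a", "campaign-1"))  # accepted
print(registry.try_start("key-a", "campaign-2"))  # refused, one already running
registry.finish("key-a")
print(registry.try_start("key-a", "campaign-3"))  # accepted again
```

The backend API would call `try_start` before spawning containers and `finish` when the campaign is stopped, so a known API-key can no longer be used to spawn an unbounded number of containers.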

See the source code on: GitHub