This article summarizes my (Elias Lindkvist) thesis, in which I explored the functionality, accuracy, performance and usability of function call tracing in Python to detect runtime usage of known vulnerable code. I found that the approach works with 100% accuracy, but at the cost of an unfortunate 50% decrease in the performance of the traced application. Ease of use proved to be very high when the solution was deployed in a GitLab CI/CD pipeline.
Society is becoming increasingly dependent on software-based solutions in critical infrastructure, such as healthcare and finance. With this dependence comes the immense problem of software vulnerabilities, which pose an ever-growing risk of causing real harm to society.
There are plenty of recent examples of how these vulnerabilities can affect society, such as the attack on a mental health startup in Finland where the attackers gained access to patient records, or the ransomware attack targeting a company in Florida which forced the Coop supermarket chain in Sweden to temporarily close around 500 stores.
Luckily, many of these vulnerabilities are known through the Common Vulnerabilities and Exposures (CVE) database. However, CVEs are quite coarse-grained, and there are almost 200,000 of them, which makes current vulnerability management efforts difficult.
Since CVEs most often contain only a short description of a vulnerability and the specific versions of a library that are affected, it can be very difficult to know whether your application is vulnerable. Simply using a library version listed in a CVE does not mean your application is actually affected by the vulnerability, since it might not use the specific library functionality that is vulnerable.
ARVOS is a joint project between Elastisys and Debricked that aims to improve vulnerability management by narrowing the relevant vulnerabilities down to only those whose code actually executes in an application running in a critical environment.
The ARVOS approach is to build a more fine-grained database from CVEs that also lists the relevant vulnerable symbols, and then to use this database with eBPF to detect when these symbols execute at runtime.
Symbols, in the context of programming languages, are unique identifiers for the various definitions in a program, such as variables, functions and classes. In the case of ARVOS, the relevant symbols are the names of vulnerable functions and the class they belong to (if any).
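To make the idea concrete, Python itself exposes this kind of name for every function: the module it was defined in, plus its `__qualname__`, which includes the enclosing class. The small sketch below builds a dotted `module.Class.function` string from those two attributes. The dotted format and the local stand-in class are assumptions for illustration only; the thesis does not specify how the ARVOS database encodes symbols.

```python
"""Sketch: build a fully qualified symbol name for a Python function or method."""
import inspect


class MultiPartParser:
    """Local stand-in; Django's real class lives in django.http.multipartparser."""

    def parse(self):
        return "parsed"


def helper():
    return None


def symbol_of(func) -> str:
    """Join the defining module with __qualname__ (e.g. 'Class.method')."""
    func = inspect.unwrap(func)  # look through functools.wraps-style decorators
    return f"{func.__module__}.{func.__qualname__}"


print(symbol_of(MultiPartParser.parse))  # __main__.MultiPartParser.parse (when run as a script)
print(symbol_of(helper))                 # __main__.helper (when run as a script)
```

Note that `__qualname__` is what distinguishes two methods with the same name in different classes of the same module, which becomes important later when matching traced calls against the database.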
The work is split into two parts, with Debricked being responsible for creating the database, and Elastisys being responsible for creating an eBPF solution which uses the database to detect runtime use of vulnerable code.
Since my thesis was done in collaboration with Elastisys, it focused solely on the latter part of ARVOS, with a database provided by Debricked.
The Extended Berkeley Packet Filter, or eBPF, is a relatively new addition to the Linux kernel which grants an unprecedented view into the runtime behavior of applications. eBPF has many use cases, but the aspect relevant to ARVOS is its tracing capabilities. To understand this, one needs to understand two important concepts: probes and tracing.
A probe is essentially a data source: a place in a program to which an eBPF program running inside the kernel can be attached. When the probe fires, it triggers a callback to a function in the eBPF program. A probe can be defined with any number of arguments, which provide relevant information about the context in which it fired.
Tracing is the process of attaching an eBPF program to a specific probe.
Static and dynamic probes in eBPF
There are two types of probes: dynamic and static. Dynamic probes can be inserted into running programs, essentially by inserting extra instructions at specific locations in the program's instruction memory. You could, for example, insert a probe into a specific function and get a callback every time that function is executed.
Static probes, on the other hand, are compiled into the original program, which means extra code has to be added in order to use them. Dynamic probes work well for statically typed, compiled languages, where symbol tables can be found in the executables. For high-level languages that run on a language runtime, like Python or Java, dynamic probes are far less useful. In these languages the runtime abstracts away the program's own functions, and without detailed knowledge of the runtime's implementation it is simply impossible to place a dynamic probe on an application-level function. A dynamic probe would instrument the runtime itself rather than the actual application.
Goal of the thesis
The goal of my thesis was to explore the possibilities of using eBPF for function call tracing in Python. I evaluated the functionality and accuracy of eBPF in Python, the performance of eBPF on the traced application, and lastly how to package the solution with Docker and Kubernetes.
Important note about eBPF probes in Python
While dynamic probes are impossible in Python, the CPython developers have added several static probes to the Python interpreter that can be used for tracing. These markers must be explicitly enabled when Python is compiled.
The probe relevant to the ARVOS approach was the function__entry probe, which fires for every single function call. The resulting events were then filtered against the vulnerability database to detect whether any vulnerable symbols had been executed. This is not the most efficient way of detecting vulnerable code, but it is the only one available in Python.
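Because the markers are a build-time option, it can help to check an interpreter before trying to trace it. The sketch below assumes the interpreter records its `./configure` arguments in the `CONFIG_ARGS` build variable, which is the case for typical Unix builds but is not guaranteed on every platform; the function name `has_static_markers` is hypothetical.

```python
"""Sketch: check whether this CPython build was configured with --with-dtrace."""
import sysconfig


def has_static_markers() -> bool:
    # CONFIG_ARGS holds the ./configure arguments used to build this interpreter.
    # It can be None (e.g. on Windows builds), so fall back to an empty string.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return "--with-dtrace" in config_args


if __name__ == "__main__":
    print("Static markers compiled in:", has_static_markers())
```

On Linux, an alternative check is to look for a `.note.stapsdt` section in the interpreter binary with `readelf -S`.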
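As an illustration of attaching to this probe, here is a minimal sketch using the BCC toolkit's Python bindings. BCC, the names `on_function_entry`, `calls` and `trace_python_calls`, and the 128/64-byte string buffers are all assumptions for illustration, not the thesis implementation. The probe's three arguments (filename, function name, line number) are read in the kernel and forwarded to user space, where the filtering against the vulnerability database would take place. Running it requires root privileges, kernel headers, a reasonably recent kernel and a CPython build with the markers enabled.

```python
"""Sketch: stream CPython function__entry events to user space with BCC."""

# eBPF program in restricted C, compiled by BCC when the tracer starts.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

struct call_t {
    int lineno;
    char filename[128];
    char funcname[64];
};

BPF_PERF_OUTPUT(calls);

int on_function_entry(struct pt_regs *ctx) {
    struct call_t call = {};
    u64 filename_ptr = 0, funcname_ptr = 0;

    /* function__entry(str filename, str funcname, int lineno) */
    bpf_usdt_readarg(1, ctx, &filename_ptr);
    bpf_usdt_readarg(2, ctx, &funcname_ptr);
    bpf_usdt_readarg(3, ctx, &call.lineno);
    bpf_probe_read_user_str(&call.filename, sizeof(call.filename), (void *)filename_ptr);
    bpf_probe_read_user_str(&call.funcname, sizeof(call.funcname), (void *)funcname_ptr);

    calls.perf_submit(ctx, &call, sizeof(call));
    return 0;
}
"""


def trace_python_calls(pid, on_call):
    """Attach to function__entry in process `pid`; call on_call(filename, funcname, lineno)."""
    # Imported lazily: needs the BCC Python bindings and root privileges.
    from bcc import BPF, USDT

    usdt = USDT(pid=pid)
    usdt.enable_probe(probe="function__entry", fn_name="on_function_entry")
    bpf = BPF(text=BPF_PROGRAM, usdt_contexts=[usdt])

    def handle_event(cpu, data, size):
        call = bpf["calls"].event(data)  # struct generated from the C definition
        on_call(call.filename.decode(errors="replace"),
                call.funcname.decode(errors="replace"),
                call.lineno)

    bpf["calls"].open_perf_buffer(handle_event)
    while True:  # runs until interrupted
        bpf.perf_buffer_poll()
```

For example, `trace_python_calls(1234, print)` would print one line per Python function call in the (hypothetical) process 1234. Every single call crosses from the kernel to user space here, which hints at why this design is expensive.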
The experiment was divided into several parts. I first examined the baseline performance and accuracy of an implementation of an eBPF program. I then attempted to evaluate and improve the performance impact. Lastly I attempted to package the solution in an easy-to-use way.
Using the function__entry probe supplied by the Python developers proved to be successful in implementing the ARVOS approach to detect runtime invocations of vulnerable code.
The following plot shows the throughput of a Django application as the number of clients increases. The results clearly show that merely enabling the static probes in the Python runtime has almost no effect on the performance of the application. However, once eBPF tracing is enabled, performance decreases by over 60%.
Comparing different sizes of vulnerability databases showed no change in the performance of the application.
To evaluate the accuracy of the tracing solution, one specific vulnerability relating to file uploads in the Django library was selected. An analysis of the Django source code showed that for one multipart file upload, the vulnerable function is called only once. This means that the tracing program should detect one invocation for each request sent by the workload generator. Comparing the number of file upload requests sent with the number of detected invocations of the vulnerable function gave an accuracy of 100%.
| Requests sent | Invocations detected | Accuracy |
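The accuracy check above boils down to counting matching trace events and dividing by the number of requests. The sketch below assumes events arrive as `(filename, funcname, lineno)` tuples and that the database entry is matched on a file path suffix plus function name; the event list, paths and line numbers are made up for illustration.

```python
"""Sketch: count detected invocations of one vulnerable function and compute accuracy."""
from typing import Iterable, Tuple

# (filename, funcname, lineno): the three arguments of the function__entry probe.
Event = Tuple[str, str, int]

# Hypothetical database entry, matched on file path suffix and function name.
VULNERABLE_FILE_SUFFIX = "/django/http/multipartparser.py"
VULNERABLE_FUNCTION = "parse"


def count_detections(events: Iterable[Event]) -> int:
    """Count trace events that match the vulnerable file and function name."""
    return sum(
        1
        for filename, funcname, _lineno in events
        if filename.endswith(VULNERABLE_FILE_SUFFIX) and funcname == VULNERABLE_FUNCTION
    )


def accuracy(requests_sent: int, detections: int) -> float:
    """One upload calls the vulnerable function once, so a perfect tracer yields 1.0."""
    return detections / requests_sent


# Made-up trace of two uploads plus one unrelated call.
events = [
    ("/venv/lib/site-packages/django/http/multipartparser.py", "parse", 120),
    ("/venv/lib/site-packages/django/http/request.py", "read", 340),
    ("/venv/lib/site-packages/django/http/multipartparser.py", "parse", 120),
]
print(accuracy(requests_sent=2, detections=count_detections(events)))  # 1.0
```

Because this matcher ignores the class name, it is also the source of the false positives discussed below.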
* In the vulnerability database, symbols are expressed with a complete package name, class name (in this case MultiPartParser) and function name (parse). The Python probe, however, only supplies the name of the executed function, the file in which the function is defined, and the line number at which it is defined in that file. The file name and function name are not enough for effective symbol matching, since they do not contain the class name in any way, for example:
file: /some/path/django/http/multipartparser.py, function: parse
This creates a risk of false positives when a file contains multiple classes with functions of the same name. So while every actual invocation of a vulnerable function is matched, the tracer might also flag functions that are not vulnerable. A crude analysis of the Django library used in this thesis found that around 11% of all functions were defined multiple times in the same file, which gives some indication of how often false positive matches might occur, assuming every function is executed equally often. The line number would make it possible to recover the complete symbol name by parsing each file to find the class that a function defined on a given line belongs to, but this solution is considerably more complicated and there was not enough time to attempt it.
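The line-number approach mentioned above could look roughly like the following sketch, which uses Python's standard `ast` module to find the definition at a given line and rebuild its `__qualname__`-style name. It was not implemented in the thesis; the function name `qualified_name_at` and the toy source are hypothetical. Which line the probe reports for decorated functions may vary between CPython versions, so both the `def` line and the first decorator line are accepted here as an assumption.

```python
"""Sketch: recover Class.method from (source, function name, line number) via ast."""
import ast
from typing import Optional


def qualified_name_at(source: str, funcname: str, lineno: int) -> Optional[str]:
    """Return the __qualname__-style name of the def `funcname` at `lineno`, if any."""

    def visit(node: ast.AST, prefix: str) -> Optional[str]:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                name = prefix + child.name
                # Accept the `def` line or the first decorator line (assumption,
                # since the reported line for decorated functions may differ).
                first_line = min([child.lineno] + [d.lineno for d in child.decorator_list])
                if child.name == funcname and lineno in (child.lineno, first_line):
                    return name
                # Functions nested in this one are named "outer.<locals>.inner".
                found = visit(child, name + ".<locals>.")
            elif isinstance(child, ast.ClassDef):
                found = visit(child, prefix + child.name + ".")
            else:
                found = visit(child, prefix)
            if found:
                return found
        return None

    return visit(ast.parse(source), "")


SOURCE = """\
class MultiPartParser:
    def parse(self):
        pass


class OtherParser:
    def parse(self):
        pass
"""

# Both events carry the same function name; only the line number differs.
print(qualified_name_at(SOURCE, "parse", 2))  # MultiPartParser.parse
print(qualified_name_at(SOURCE, "parse", 7))  # OtherParser.parse
```

Combined with a mapping from file path to module name, this would yield a complete symbol that can be compared against the database without the same-name ambiguity, at the cost of parsing every traced file (ideally once, with the results cached).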
Several attempts were made to reduce the massive performance impact of the tracing solution. The most notable is the topmost line in the following plot, for which a completely empty eBPF function was used. It shows that simply attaching to the Python probe decreases the performance of the application by 50%, so no smarter or more efficient eBPF program could significantly address this problem.
In theory, there are two ways of fixing this problem: using dynamic probes, or manually defining static probes only in the relevant vulnerable functions. The first is, as previously explained, not possible in Python. The second would require a lot of manual work and would break whenever the vulnerability database or the libraries in use are updated; most importantly, it would require modifying the code of external libraries, which is probably not something you want to do.
Packaging and CI/CD
Through packaging with Docker and Kubernetes and running ARVOS through a GitLab CI/CD pipeline it was possible to automatically run the ARVOS tracer when new application code was pushed. If any vulnerable function was detected the pipeline would simply fail and notify the developer with relevant information.
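The failing-pipeline behavior relies on CI systems such as GitLab CI treating a non-zero exit code from a job script as a failed job. A minimal sketch of such a gate step is shown below; the JSON report format with `symbol` and `cve` fields, the function name `ci_gate` and the report file name are hypothetical, not the actual ARVOS output.

```python
"""Sketch: turn tracer findings into a CI job result (exit code 0 = pass, 1 = fail)."""
import json
import sys


def ci_gate(report_json: str) -> int:
    """Print each finding and return the exit code for the CI job."""
    # Hypothetical report format: a JSON list of {"symbol": ..., "cve": ...} objects.
    findings = json.loads(report_json)
    for finding in findings:
        print(f"Vulnerable code executed: {finding['symbol']} ({finding['cve']})", file=sys.stderr)
    return 1 if findings else 0


if __name__ == "__main__":
    # Example: python ci_gate.py < tracer-report.json
    sys.exit(ci_gate(sys.stdin.read()))
```

Printing the findings to the job log is what gives the developer the relevant information when the pipeline fails.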
This turned ARVOS from something quite complicated to use into an entirely seamless tool, since the application developer does not need any specific knowledge of eBPF.
Due to the nature of function call tracing in dynamic, interpreted languages, there is no easy way to avoid the massive performance impact that eBPF tracing has been shown to have, since the only usable static probe fires for every single Python function call. Ideally, callbacks into the kernel would fire only when a vulnerable function executes, but as explained above, this is not feasible with the current state of tracing in Python.
This massive performance degradation made ARVOS unsuitable for production environments which would otherwise be the perfect environment for runtime tracing. Deploying ARVOS in a CI/CD environment however proved beneficial in several ways. First of all, the performance of the application is not as important in such an environment. Secondly, eBPF is quite tricky to install and get working, and very particular about which operating system it's running on and which specific packages are installed. By running ARVOS in a CI/CD pipeline all of these complexities are removed from the application developer so that they can focus completely on their application code.
Ultimately, eBPF has proved incredibly accurate at detecting actual use of vulnerable code, which heavily reduces the work application developers need to do.
ARVOS can pinpoint exactly which function in a library is affected, and ignores vulnerabilities that do not actually affect the application. All of this can run automatically in a CI/CD environment, which simply notifies the developer of any issues and prevents new vulnerabilities from being merged into the application.