SENSOR: Graph-based Revision History Analysis for Code Evolution Introspection

During the summer of 2022, I worked as a Research Intern at the Computer Science Laboratory at SRI International. My research focused on securing open-source software from malicious actors and influence operations within developer communities.

Research Objective

How can we protect the integrity of open-source software projects from malicious actors and influence operations within the community?

With open-source software playing a crucial role in modern infrastructure, securing its development process has become a pressing challenge. Several high-profile attacks on open-source projects have led to supply chain vulnerabilities and downstream security incidents. To make progress on this broader issue, we first tackled a more specific problem:

Can we detect malicious patches in the Linux kernel repository?

Approach

The team had already developed a graph-based AI model that analyzed the social dynamics of the Linux kernel community. This model identified incidents like Hypocrite Commits and other influence operations with a low false-positive rate.

My role was to enrich this model by incorporating information from actual code changes. My approach involved:

Repo Mining (Step 1) - straightforward once we had settled on the right abstractions and tools.
Change Graphs for Code Representation (Step 2) - more complex than anticipated, primarily because the kernel's build dependencies keep evolving. We partially solved this with TuxMake by Linaro, an excellent build tool for anyone who compiles the kernel frequently.
Feature Engineering for Patch Classification (Step 3) - the most engaging part. Defining meaningful features from code changes and implementing analysis passes to extract them was key to improving our classification accuracy.
Patch Classification (Step 4) - the step I learnt the most from. We used Graph Neural Networks (GNNs) for malicious patch detection, training a GNN on the existing graph structure to classify patches as malicious or benign.
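
To give a flavour of the feature engineering and patch classification steps, here is a minimal Python sketch. It is not the code or the feature set we used at SRI. The feature names, the sample diff, the file path drivers/foo.c, and the idea of linking patches that share an author are all assumptions made for illustration. The second function is a plain mean-aggregation round, a simplified stand-in for what a single GNN layer does before any learned weights are applied.

```python
import re
from collections import defaultdict

# Matches a unified-diff hunk header such as "@@ -10,3 +10,4 @@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")

# Hypothetical patch, made up for this example.
SAMPLE_DIFF = """diff --git a/drivers/foo.c b/drivers/foo.c
--- a/drivers/foo.c
+++ b/drivers/foo.c
@@ -10,3 +10,4 @@ static int foo(void)
     int ret;
-    ret = bar();
+    ret = bar();
+    if (ret) return ret;
     return 0;
"""


def patch_features(diff_text):
    """Count a few simple, illustrative features in a unified diff."""
    feats = {"files": 0, "hunks": 0, "added": 0, "removed": 0}
    in_hunk = False  # the "---" / "+++" file headers come before the first hunk
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            feats["files"] += 1
            in_hunk = False
        elif HUNK_RE.match(line):
            feats["hunks"] += 1
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            feats["added"] += 1
        elif in_hunk and line.startswith("-"):
            feats["removed"] += 1
    return feats


def aggregate_neighbors(features, edges):
    """One round of mean aggregation over an undirected graph.

    Each node's new vector is the average of its own vector and the
    vectors of its direct neighbours. There are no learned weights here.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    out = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in sorted(neighbors[node]) if n in features]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


if __name__ == "__main__":
    print(patch_features(SAMPLE_DIFF))
    # Assumed graph: patches p1 and p2 share an author, p3 is isolated.
    vectors = {"p1": [2.0, 0.0], "p2": [4.0, 2.0], "p3": [0.0, 4.0]}
    print(aggregate_neighbors(vectors, [("p1", "p2")]))
```

In a real pipeline the aggregated vectors would feed learned layers and a classifier head, typically built with a GNN library rather than by hand. The sketch only shows how per-patch features and graph structure combine.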