Detection and prevention of malicious commits to the PHP repository

Technical

April 5 2021

The Motivation

On Sunday, March 28th, 2021, members of the PHP team identified malicious commits to the PHP interpreter. The commits were made under legitimate developer identities that had been compromised, along with the git.php.net server.

Our Approach

One of the features in the Apiiro Code Risk Platform™ is the ability to detect and prevent malicious commits to code repositories using UEBA and Anomaly Detection technologies (patent-pending). This capability is based on Machine Learning and Artificial Intelligence algorithms that analyze the behavior of different entities in the organization (e.g., code components, security controls, data types, contributor knowledge, organizational behavior, repositories, projects and more).

The algorithms extract dozens of domain-oriented features (including logical, contextual, and time-series features) to build a multi-dimensional characterization of each entity. Various sources are used for the feature extraction. For example, both the metadata and the content of historical commits, pull requests, and tickets are thoroughly analyzed, and their numerical, time-series and textual features are extracted. Another source of data for the algorithms is the set of historical cross-repository code analysis features produced by our own platform. Once the features are extracted and enriched with our domain expertise, Apiiro builds and trains an adaptive behavioral model in real time.
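
To make the idea of feature extraction concrete, here is a minimal Python sketch that turns one commit into a handful of numerical, time-series, textual and contextual features. It is an illustration only and not Apiiro's implementation: the `Commit` record, its field names, and the specific features are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    """Hypothetical commit record; the field names are illustrative."""
    author: str
    timestamp: datetime  # must be timezone-aware
    message: str
    files: list[str]
    lines_added: int
    lines_deleted: int


def extract_features(commit: Commit, history: list[Commit]) -> dict[str, float]:
    """Turn one commit into a small feature vector (illustrative features only)."""
    ts = commit.timestamp.astimezone(timezone.utc)
    # Earlier commits by the same author form that author's baseline.
    own = [c for c in history
           if c.author == commit.author and c.timestamp < commit.timestamp]
    last = max((c.timestamp for c in own), default=None)
    seen_files = {f for c in own for f in c.files}
    new_files = [f for f in commit.files if f not in seen_files]
    return {
        # Numerical features: size of the change.
        "lines_added": float(commit.lines_added),
        "lines_deleted": float(commit.lines_deleted),
        "files_touched": float(len(commit.files)),
        # Time-series features: when the commit landed.
        "hour_utc": float(ts.hour),
        "weekday": float(ts.weekday()),  # Monday is 0, Sunday is 6
        # -1.0 is a sentinel meaning "no earlier commit by this author".
        "days_since_last_commit": (
            (commit.timestamp - last).total_seconds() / 86400.0
            if last is not None else -1.0
        ),
        # Textual feature: commit message length in words.
        "message_words": float(len(commit.message.split())),
        # Contextual feature: share of touched files that are new to the author.
        "new_file_ratio": (
            len(new_files) / len(commit.files) if commit.files else 0.0
        ),
    }
```

A real system would extract far more features and feed them into a trained model; the point here is only that metadata and history together yield a per-commit vector that can be compared with a baseline.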

In addition to individual models for each entity in the organization, Apiiro’s algorithms train higher-level models, which are used to strengthen the confidence of the detected events. This way we can achieve a high detection rate of malicious activities while reducing false detections caused by irrelevant anomalies. For example, comparing a developer’s behavior to their peer group’s behavior can shed light on the legitimacy of an individual’s activity.
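
The peer-group comparison can be sketched with plain standard scores. The snippet below is a simplified stand-in, not the platform's model: it assumes a single numeric feature (for example, lines added per commit), and the rule of taking the smaller of the two deviations is our own illustrative choice. The effect is that a value has to be unusual both for the individual and for their peers before it scores high.

```python
from statistics import mean, pstdev


def z_score(value: float, sample: list[float]) -> float:
    """Standard score of value against sample; 0.0 when the sample has no spread."""
    if len(sample) < 2:
        return 0.0
    spread = pstdev(sample)
    return 0.0 if spread == 0 else (value - mean(sample)) / spread


def combined_anomaly(value: float,
                     own_history: list[float],
                     peer_history: list[float]) -> float:
    """Score limited by the weaker of the individual and peer-group deviations."""
    own = abs(z_score(value, own_history))
    peers = abs(z_score(value, peer_history))
    return min(own, peers)


# A 500-line commit is unusual for this developer but ordinary for the peers,
# so the combined score stays low.
low = combined_anomaly(500, [10, 12, 8, 10], [400, 600, 500, 450])
# The same commit in a repository where everyone makes small changes scores high.
high = combined_anomaly(500, [10, 12, 8, 10], [9, 11, 10, 12])
```

With these inputs, `low` comes out well below 1 and `high` far above 3, which is the behavior the paragraph above describes: the peer context suppresses an anomaly that is merely personal.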

Obviously, we cannot publish the malicious activities that were successfully detected by our platform in our client environments. However, a few days ago, while running our platform on the PHP repository, we were able to detect the attack against php-src (the PHP interpreter), a popular open-source repository on GitHub.

The Attack

On March 28th at 03:57 AM UTC, a malicious commit was pushed to the repository under the name of a legitimate contributor, “rlerdorf” (Rasmus Lerdorf). The malicious code, added to zlib.c, checked the HTTP_USER_AGENTT header in incoming HTTP requests (the attacker deliberately spelled the name with a double T). If the header value started with the string “zerodium”, the remainder of the string was executed on the server as PHP code. This would have allowed the attacker to inject and run arbitrary PHP code on any server running the updated version of php-src.
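
The trigger logic described above is small enough to model in a few lines. The following Python sketch is our own illustration of the control flow, not the attacker's C code, and it deliberately executes nothing: it only returns the text that the backdoor would have passed to the PHP evaluator. The function and constant names are assumptions.

```python
from typing import Mapping, Optional

TRIGGER_HEADER = "HTTP_USER_AGENTT"  # double T, as described in the text
TRIGGER_PREFIX = "zerodium"


def backdoor_payload(server_vars: Mapping[str, str]) -> Optional[str]:
    """Return the text the backdoor would have run as PHP, or None.

    Nothing is executed here; the function only models the trigger check.
    """
    value = server_vars.get(TRIGGER_HEADER)
    if value is None or not value.startswith(TRIGGER_PREFIX):
        return None
    # Everything after the prefix is what would have been executed.
    return value[len(TRIGGER_PREFIX):]
```

A request carrying only the correctly spelled HTTP_USER_AGENT variable yields `None`, which is why ordinary traffic would never have triggered the code.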

Thanks to a responsible PR review by the repository contributors, an investigation of this commit was opened, and the malicious code was removed within a couple of hours, before the updated code went online.

Apiiro’s Code Risk Platform detected the original malicious commit (out of 4400 commits learned) to the PHP code repository, using UEBA and Anomaly Detection technologies.

Detection and Prevention

Apiiro was able to detect the malicious commit based on the abnormal activity of the compromised user (rlerdorf). Our anomaly detection algorithms flagged this commit as suspicious because it deviated from the normal activity of this user, both with respect to the user’s own past activity and with respect to the activity of other contributors in the repository. Some of the indicators that triggered an abnormality alert by our algorithms were:

The commit did not match the contributor’s recent activity with regard to the intensity of commits
The commit time deviated from the expected activity day and time of the user
The contributor deviated from the patterns common among his peers in the repository
There was a major discrepancy between the commit’s information and the analyzed code
The added code was significantly different from what the model predicted for this repository
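
As an illustration of how one of these indicators might be scored, the sketch below measures how surprising a commit hour is given the author's past commit hours, and then lists which indicators crossed their thresholds. This is a simplified assumption on our part, not the platform's algorithm: the smoothed histogram, the indicator names, and the threshold values are all hypothetical.

```python
import math
from collections import Counter


def hour_surprise(hour: int, past_hours: list[int], smoothing: float = 1.0) -> float:
    """Bits of surprise for a commit at `hour`, given the author's past commit hours.

    Uses an additively smoothed 24-bin histogram, so hours never seen before
    get a small non-zero probability instead of an infinite score.
    """
    counts = Counter(past_hours)
    prob = (counts[hour] + smoothing) / (len(past_hours) + 24 * smoothing)
    return -math.log2(prob)


def triggered_indicators(scores: dict[str, float],
                         thresholds: dict[str, float]) -> list[str]:
    """Names of the indicators whose score reaches its threshold, sorted."""
    return sorted(name for name, score in scores.items()
                  if score >= thresholds.get(name, math.inf))


# Hypothetical author who normally commits in the afternoon (UTC).
past = [14] * 50 + [15] * 40 + [16] * 10
scores = {
    "commit_time": hour_surprise(3, past),  # a commit at 03:00 UTC
    "commit_intensity": 0.5,                # placeholder score
}
thresholds = {"commit_time": 5.0, "commit_intensity": 3.0}
flagged = triggered_indicators(scores, thresholds)
```

In this example only `commit_time` is flagged. A commit becomes interesting when several independent indicators fire together, which is what keeps a single odd-hour commit from raising an alert on its own.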

Apiiro’s detection is done automatically in real time, and it is agnostic to both the programming language and the hosting platform. Once we detect an attack, our platform is able to automatically comment on the PR of the suspicious commit or even trigger an alert in Slack for the security operations center.
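
For readers who want to wire a similar alert into their own tooling, here is a minimal sketch of posting a message through a Slack incoming webhook, which accepts a JSON body with a `text` field. It is not how the Apiiro platform sends its alerts: the function names, the message layout, and the example values are assumptions, and the webhook URL has to come from your own Slack workspace.

```python
import json
import urllib.request


def build_slack_alert(repo: str, commit_sha: str, author: str,
                      indicators: list[str]) -> dict:
    """Build the JSON payload for a Slack incoming webhook (illustrative layout)."""
    lines = [f"Suspicious commit {commit_sha} by {author} in {repo}"]
    lines += [f"- {name}" for name in indicators]
    return {"text": "\n".join(lines)}


def send_slack_alert(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical values; the commit id is a placeholder, not the real one.
payload = build_slack_alert(
    repo="php/php-src",
    commit_sha="abc1234",
    author="rlerdorf",
    indicators=["commit_time", "peer_deviation"],
)
```

Keeping payload construction separate from the network call makes the message format easy to test without contacting Slack.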

False Alarms

One of the key challenges of any anomaly detection system is to maintain a low false alarm rate. During the analysis of the php-src repository over the last two years, our platform triggered 4 suspicious events, and only one of them combined a major code addition with several flagged high-risk indicators. That event was the malicious commit described in this post. The low false alarm rate of our platform enables operators to focus on a small number of suspicious events instead of being flooded with endless irrelevant anomalies.
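
To put these numbers in proportion, the short calculation below relates the 4 flagged events to the 4400 commits the platform learned. It assumes that both figures refer to the same two-year window, which the text does not state explicitly, so treat the result as a rough estimate.

```python
def flag_rate(flagged: int, total: int) -> float:
    """Fraction of commits that raised a suspicious event."""
    if total <= 0:
        raise ValueError("total must be positive")
    return flagged / total


# Assumption: the 4 events and the 4400 learned commits cover the same period.
rate = flag_rate(4, 4400)
print(f"{rate:.4%} of commits flagged")             # 0.0909% of commits flagged
print(f"one event per {4400 // 4} commits")         # one event per 1100 commits
```

Under that assumption, an operator would review roughly one event per 1100 commits.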

Summary

Apiiro’s anomaly detection algorithms were able to successfully detect a malicious commit by analyzing the behavior of different types of entities, while keeping an extremely low false alarm rate. This post demonstrates some of Apiiro’s anomaly detection capabilities that are used by our clients to protect and secure their repositories.

Gil David

Head of AI