A team of researchers from the University of California, Berkeley School of Information has discovered that the Log4Shell vulnerability, a widely reported cybersecurity issue discovered in late 2021, still has not been patched in many applications, especially those that use Log4j indirectly. A fixed version of Log4j has been available for over four months.
“This is a call to action for the open-source software industry to fix critical vulnerabilities,” says Muhammad Akhtar, one of the researchers who conducted the study, which was funded in part by the Center for Long-Term Cybersecurity. “Otherwise there is a good chance an attacker will be able to exploit the vulnerability on a library or application that uses Log4j indirectly.”
Log4j is a popular logging framework used by Java application developers to log important contextual data to the application console for analysis, debugging, and other purposes. In December 2021, many versions of Log4j version 2 were found to be affected by a remote code execution vulnerability called Log4Shell, which allowed an attacker to send an arbitrary payload that compromised the underlying application and gave the attacker full access to the system.
While a patch was made available shortly after the discovery of Log4Shell, the research team, called V-Cube, hypothesized that many applications remained vulnerable because a dependent library may be accessing Log4j indirectly. When there are multiple layers of dependencies between libraries, the resulting dependency chain puts multiple “hops” between an application and the vulnerability.
The researchers’ study focused on whether a greater number of “hops” makes it harder to detect and fix a vulnerability. They conducted their analysis by downloading libraries from Maven Central, a popular public repository of open source Java libraries, and building a graph structure that connects them.
Their analysis confirmed the hypothesis: when vulnerable libraries are buried deep inside the software dependency chain, developers are less likely to patch, possibly because they are unaware that a vulnerability is present in their application, or because no patched version of a direct dependency is available. Their research identified that fewer than five percent of open source library versions on Maven Central use the fixed version of the Log4j library. (See a more detailed description of the research methodology below.)
The team estimates that several thousand organizations could be impacted as these libraries, when referenced, become part of an organization’s internal source code. “We believe that developers find it difficult to fix open-source library vulnerabilities in their applications due to lack of visibility surrounding which libraries are getting used,” Akhtar says. “Our research shows that, even after four months, most of the libraries remain unpatched. Developers need better visibility tools that provide software bills of material to better understand the supply chain and identify indirect dependencies in their application.”
Research Methodology
By Muhammad Akhtar
A series of four vulnerabilities affected Log4j versions 2.0 through 2.16, with few exceptions. We looked at all versions of Log4j with major version 2 and found approximately 53,684 library versions that referenced a vulnerable log4j-core version. Of these, 14.0% (7,516) were direct dependencies, and the remaining 86% (46,268) were transitive (indirect) dependencies. (A transitive dependency does not use the library directly, but reaches it through an intermediate library or set of libraries.)
Next, we looked at the fixed versions (2.17 to 2.17.2) to check how many libraries had been patched to use them, and found that only 1,507 (2.8%) had fixed the issue.
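To make the two version ranges concrete, here is a minimal Python sketch of how a log4j-core version string could be sorted into the "vulnerable" and "fixed" groups described above. It is an illustration, not the code used in the study: the function name classify_log4j_core is hypothetical, and the sketch deliberately ignores the few exceptions mentioned above, which the text does not enumerate.

```python
import re


def classify_log4j_core(version):
    """Return 'vulnerable', 'fixed', or 'other' for a log4j-core version string.

    Assumption: 2.0 through 2.16.x counts as vulnerable and 2.17 through
    2.17.2 counts as fixed, with no exceptions handled.
    """
    # Read the leading numeric part, so "2.0-beta9" is treated as 2.0.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if not match:
        return "other"
    major, minor, patch = (int(g) if g else 0 for g in match.groups())
    if major != 2:
        return "other"
    if minor <= 16:
        return "vulnerable"
    if minor == 17 and patch <= 2:
        return "fixed"
    return "other"


for v in ["2.0-beta9", "2.14.1", "2.17.1", "1.2.17"]:
    print(v, "->", classify_log4j_core(v))
```

Counting the library versions that depend on each group would then give totals comparable to the ones reported here.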
Since we built a graph database, we could also find the total number of hops, as well as the number of dependencies at each hop. The dependencies for all Log4j2 versions went up to nine hops deep, with a maximum of 28% of references at the third hop. Hops 2–5 had the most dependencies, with 12,293 (22%), 15,155 (28%), 11,794 (21.9%), and 4,917 (9.1%) respectively. Plotted by hop, the dependency counts form a roughly bell-shaped curve with positive skew.
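The hop counts can be illustrated with a breadth-first search over "is used by" edges. The Python sketch below is a simplified stand-in for a graph database query, not the study's implementation. The library names in TOY_DEPENDENTS are hypothetical, and the sketch assumes each library is counted once, at its shortest distance from the vulnerable library.

```python
from collections import Counter, deque


def hops_from(target, dependents):
    """Breadth-first search over reverse dependency ("is used by") edges.

    `dependents` maps a library to the libraries that declare it as a
    direct dependency. Returns {library: minimum hop count to target}.
    """
    hops = {target: 0}
    queue = deque([target])
    while queue:
        lib = queue.popleft()
        for user in dependents.get(lib, ()):
            if user not in hops:  # first visit is the shortest distance
                hops[user] = hops[lib] + 1
                queue.append(user)
    del hops[target]  # the target itself is not one of its dependents
    return hops


# Hypothetical toy graph; all names except log4j-core are made up.
TOY_DEPENDENTS = {
    "log4j-core": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": ["lib-c", "lib-d"],
    "lib-c": ["app-e"],
}

hops = hops_from("log4j-core", TOY_DEPENDENTS)
per_hop = Counter(hops.values())
print(sorted(per_hop.items()))  # [(1, 2), (2, 2), (3, 1)]
```

Hop 1 corresponds to direct dependencies and every higher hop to transitive ones, so the per-hop counter is the toy equivalent of the distribution reported above.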
Next, we wanted to verify our hypothesis that transitive dependencies (hops >= 2) are harder to fix, so we looked at dependencies on vulnerable versions and fixed versions by hop. We found that 45% of the 1,507 fixes were direct and 55% were transitive (indirect) dependencies. At hop 1 (direct dependency), around 9% of dependencies were fixed, but the percentage drops at an alarming rate from hops 2 to 5 (transitive dependencies), with values of 3.6%, 1.5%, 0.8%, and 0.75% respectively. These results confirmed our hypothesis that as hops increase, the fix rate drops. This suggests that developers find it increasingly difficult to fix issues due to a lack of visibility around dependency chains.
As an example, we randomly picked one vulnerable Log4j version to analyze the chain up to level 5. We found this chain:
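As a rough arithmetic check, the hop 1 figure can be reproduced from numbers quoted in this section. The Python sketch below assumes the fix rate at a hop is the number of fixed library versions divided by the number of vulnerable library versions at that hop. That definition is not stated in the text; it is simply one reading that matches the roughly 9% figure.

```python
def fix_rate(fixed, vulnerable):
    """Fixed versions as a percentage of vulnerable versions at one hop.

    Assumption: this is the definition behind the quoted fix rates.
    """
    return 100.0 * fixed / vulnerable


# Figures quoted in the text.
TOTAL_FIXED = 1507            # library versions using a fixed Log4j
DIRECT_SHARE_OF_FIXES = 0.45  # 45% of fixes were direct dependencies
DIRECT_VULNERABLE = 7516      # direct dependencies on vulnerable versions

direct_fixed = round(TOTAL_FIXED * DIRECT_SHARE_OF_FIXES)
hop1_rate = fix_rate(direct_fixed, DIRECT_VULNERABLE)
print(f"hop 1: {direct_fixed} fixed, {hop1_rate:.1f}% fix rate")
```

The same function applied to per-hop counts of fixed and vulnerable dependents would give the rates for the deeper hops.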
“log4j-core -> azure-storage-queue -> spring-cloud-azure-service -> spring-messaging-azure -> spring-messaging-azure-servicebus -> spring-integration-azure-servicebus”
This means that any app that uses spring-integration-azure-servicebus is impacted by Log4Shell.
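A chain like this can be recovered from a dependency graph by recording, during a breadth-first search, which library led to each newly discovered dependent. The Python sketch below is illustrative only. Its edges are copied from the chain quoted above, so the graph is a single path rather than real Maven Central data, and the function name dependency_chain is hypothetical.

```python
from collections import deque


def dependency_chain(target, start, dependents):
    """Return the shortest "is used by" path from target to start, or None."""
    parent = {target: None}
    queue = deque([target])
    while queue:
        lib = queue.popleft()
        if lib == start:
            break
        for user in dependents.get(lib, ()):
            if user not in parent:
                parent[user] = lib  # remember how we reached this library
                queue.append(user)
    if start not in parent:
        return None
    chain = []
    node = start
    while node is not None:  # walk back from start to target
        chain.append(node)
        node = parent[node]
    return chain[::-1]


# Edges taken from the chain quoted in the text.
CHAIN_EDGES = {
    "log4j-core": ["azure-storage-queue"],
    "azure-storage-queue": ["spring-cloud-azure-service"],
    "spring-cloud-azure-service": ["spring-messaging-azure"],
    "spring-messaging-azure": ["spring-messaging-azure-servicebus"],
    "spring-messaging-azure-servicebus": ["spring-integration-azure-servicebus"],
}

chain = dependency_chain(
    "log4j-core", "spring-integration-azure-servicebus", CHAIN_EDGES
)
print(" -> ".join(chain))
print("hops:", len(chain) - 1)  # hops: 5
```

In a real graph each library has many dependents, so the search would return one of possibly several shortest chains.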
We also looked at the older Log4j version 1 and found 282K libraries that depend on it, a staggering 94% of them indirectly. Usage peaks at 58K at level 4 and extends up to 13 levels deep. It was a similar story for another popular library, the JSON processing library jackson-databind, which had 250K dependencies, 88% of them indirect, going 11 levels deep.
Conclusion
Based on our research, it is clear that many libraries have not patched the Log4Shell vulnerability, especially those that use Log4j indirectly, even though a fixed version of Log4j has been available for over four months. This is a call to action for the open source software industry to fix critical vulnerabilities. Otherwise, there is a good chance an attacker will be able to exploit the vulnerability on a library or application that uses Log4j indirectly. Developers also need better visibility tools that provide software bills of material, so they can better understand the supply chain and identify indirect dependencies in their applications.