The Devil's in the Dependency: Data-Driven Software Composition Analysis

Oct 8, 2020

We all know that lurking within even the most popular open source packages are flaws that can leave carefully constructed applications vulnerable. In fact, 71% of all applications contain flawed open source libraries, and 70.7% of those flawed libraries arrive through indirect (transitive) dependencies, which can easily escape developers' notice. Using graph analytics and a broad data science toolkit, we untangle the web of open source dependencies and flaws and show the best way for developers to navigate this seemingly intractable game of whack-a-mole.

In this analysis, we examine over 85,000 applications and their use of more than 500,000 open source libraries. We provide an overview of open source usage showing that typical applications have hundreds or thousands of libraries, with most coming from a cascade of transitive dependencies. We find that proof-of-concept exploits exist for 21.7% of libraries with flaws, and that even very small (162 lines of code) and very popular (included in 89% of applications) JavaScript libraries can contain exploitable flaws.
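The distinction between direct and transitive dependencies can be made concrete with a minimal Python sketch. This is not the authors' analysis pipeline. It walks a dependency graph breadth-first from an application and separates flawed libraries the application imports directly from those that only arrive through other libraries. The package names, the graph, and the set of flawed libraries are all hypothetical, invented for illustration.

```python
from collections import deque

# Hypothetical dependency graph: each package maps to the packages it depends on.
# Names and flaw data are illustrative assumptions, not data from the study.
DEPENDENCIES = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["string-utils"],
    "logger": ["string-utils"],
    "http-parser": [],
    "string-utils": [],
}

# Hypothetical set of libraries with known flaws.
FLAWED = {"logger", "http-parser", "string-utils"}


def classify_flawed_libraries(graph, root, flawed):
    """Breadth-first search from the application root.

    Returns (flawed_direct, flawed_transitive): flawed libraries listed as
    direct dependencies of root, and flawed libraries reachable only through
    other libraries. A library that is both direct and transitive counts as direct.
    """
    direct = set(graph.get(root, []))
    seen = {root}                 # guards against revisiting nodes and against cycles
    queue = deque([root])
    reachable = set()             # every library pulled in, at any depth
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                reachable.add(dep)
                queue.append(dep)
    flawed_direct = direct & flawed
    flawed_transitive = (reachable - direct) & flawed
    return flawed_direct, flawed_transitive


if __name__ == "__main__":
    d, t = classify_flawed_libraries(DEPENDENCIES, "my-app", FLAWED)
    print("direct:", sorted(d))       # direct: ['logger']
    print("transitive:", sorted(t))   # transitive: ['http-parser', 'string-utils']
```

In this toy graph only `logger` is a flawed direct dependency, while `http-parser` and `string-utils` arrive transitively, so auditing only the packages named in the application's manifest would miss two of the three flaws.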