Shadow Vulnerabilities in AI/ML Data Stacks - What You Don’t Know CAN Hurt You

Track: Security
Abstract
The adoption of open-source AI software introduces a new family of vulnerabilities to organizations. Some AI components, such as model-serving frameworks, allow Remote Code Execution (RCE) by design, for example when loading pre-trained models from external sources.
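
For illustration, a minimal sketch of why this is "by design": pickle-based checkpoints can carry executable payloads, and simply deserializing them runs that code. The payload class and command below are hypothetical.

```python
# Minimal sketch (hypothetical payload): pickle-based model files can embed
# code that runs during deserialization - no memory-corruption exploit needed.
import os
import pickle

class MaliciousCheckpoint:
    def __reduce__(self):
        # Tells pickle to call os.system("echo pwned") when the object is loaded
        return (os.system, ("echo pwned",))

blob = pickle.dumps(MaliciousCheckpoint())
pickle.loads(blob)  # the command executes here, on load
```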

Traditional SCA and SAST approaches were not built for the AI ecosystem, leaving a huge, insecure attack surface exposed.

AI models are often downloaded from the public web, from untrusted sources on common platforms like HuggingFace, and loaded with the “trust_remote_code=True” flag. So how do we better secure our AI stacks?
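
As a hedged example (the model name below is a placeholder, not a real repository), the anti-pattern typically looks like this with the Hugging Face transformers API:

```python
# Sketch of the common anti-pattern; "some-org/some-model" is a placeholder.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "some-org/some-model",
    trust_remote_code=True,  # executes Python code shipped alongside the model repo
)
```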

In this talk, we’ll examine some of the common security anti-patterns prevalent in AI engineering, such as security issues that are never classified as CVEs because the behavior is by design, or security patches that introduce breaking changes.

We’ll review the methods introduced for better security hygiene, such as new checkpoint formats (model files on disk) like SavedModel and SafeTensors. Because SCA, SAST, and other traditional approaches don't analyze model checkpoints, these silent vulnerabilities remain in your stacks; we’ll demonstrate, through real code examples, why runtime context is crucial for detecting these security issues.
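
As a rough sketch of the difference (file names are placeholders): loading a pickle-based PyTorch checkpoint can execute code embedded in the file, while a SafeTensors checkpoint only deserializes tensor data.

```python
# Sketch: why the checkpoint format matters. File names are placeholders.
import torch
from safetensors.torch import load_file

# Pickle-based checkpoint: full unpickling can run arbitrary embedded code.
state = torch.load("pytorch_model.bin", weights_only=False)

# SafeTensors checkpoint: a plain data format - tensors are read, no code runs.
state = load_file("model.safetensors")
```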
Gal Elbaz
Co-founder & CTO at Oligo Security with 10+ years of experience in vulnerability research and practical hacking. He previously worked as a Security Researcher at Check Point and served in IDF Intelligence. In his free time, he enjoys playing CTFs.