Research & Work
minRLM: AI Inference Acceleration via Recursive Language Models
- Research on oken-efficient inference: Recursive Language Models (RLMs) keep data in a REPL and query it via generated code instead of stuffing context into the prompt — attention runs only on search results, so cost stays flat regardless of context size
- Implementation and benchmark across 12 tasks, 3 model sizes (GPT-5-nano, mini, 5.2): 72.7% accuracy on GPT-5-mini with 3.6× fewer tokens than the reference RLM and 2.6× fewer than vanilla; on GPT-5.2, +30pp over vanilla, 11 of 12 tasks won
- Article: minRLM: A Token-Efficient Recursive Language Model Implementation and Benchmark
- Code: github.com/avilum/minrlm — open-source client, evals, DockerREPL sandbox
ShadowRay 2.0 - AI Attacks AI: Self-Propagating Botnet Campaign
- Discovery: Active global campaign exploiting CVE-2023-48022 in Ray to hijack AI compute clusters into a self-replicating botnet - the first documented use of AI infrastructure to autonomously attack other AI infrastructure
- Scale: 230,000+ Ray servers exposed globally (10× the original ShadowRay discovery); active since at least September 2024
- Sophistication: DevOps-style delivery via GitLab/GitHub, LLM-generated payloads, CPU throttling at ~60% to avoid detection, processes disguised as kernel workers
- Blog: ShadowRay 2.0: Attackers Turn AI Against Itself in Global Campaign
- Coverage: [Forbes] [Dark Reading]
- Demo: Live RCE demo
ShadowMQ - Systemic RCE Across AI Inference Frameworks
- Insecure ZeroMQ + pickle deserialization reused across the AI ecosystem, affecting Meta, NVIDIA, Modular, vLLM, SGLang and others
- CVE-2025-60455: RCE in Modular Max Server
- CVE-2025-30165: RCE in vLLM li>CVE-2025-23254: RCE in NVIDIA TensorRT-LLM [CybersecurityNews]
- CVE-2026-3059: RCE in SGLang [SGLang]
- CVE-2024-50050: RCE in Meta llama-stack [The Hacker News]
- Blog: ShadowMQ: How Code Reuse Spread Critical Vulnerabilities Across the AI Ecosystem
Airborne - Wormable Zero-Click RCE in AirPlay
- Critical wormable zero-click RCE in Apple's AirPlay protocol. No user interaction required. Affects iPhones, iPads, Macs, Apple TVs, HomePods, and millions of third-party IoT devices
- Talk: A Worm in the Apple: Wormable Zero-Click RCE in AirPlay (w/ Uri Katz, Gal Elbaz) - Black Hat USA 2025
- Blog: Airborne: Wormable Zero-Click RCE in AirPlay Protocol [Wired]
- Checker: Test your device for AirPlay vulnerability
Pwn My Ride - CarPlay Attack Surface & Jailbreaking
- Security analysis of Apple CarPlay revealing attack vectors that enable car jailbreaking and vehicle system access via infotainment interfaces
- Talk: Pwn My Ride: Jailbreaking Cars with CarPlay - DEF CON 33 / AppSec Village 2025
- Blog: Pwn My Ride: Exploring the CarPlay Attack Surface [Wired]
React & Next.js Critical RCE
- CVEs: CVE-2025-55182 & CVE-2025-66478 - critical RCE in the world's most widely used frontend framework (w/ Gal Elbaz, Uri Katz)
OpenAI Codex Supply Chain Research
Application Attack Matrix
Anthropic MCP Inspector RCE
- CVE: CVE-2025-49596 - RCE in Anthropic MCP Inspector [The Hacker News]
- Exposed API Tokens in .cursor/mcp.json Files
0.0.0.0 Day
Shadow Vulnerabilities in AI
Ollama Vulnerabilities
- CVEs: CVE-2024-39719/20/21/22
- Blog: More Models, More ProbLLMs [The Hacker News]
ShadowRay - First Known Attack Campaign on AI Infrastructure
- Campaign: MITRE ATT&CK C0045 - first attack campaign targeting AI workloads identified in the wild
- Blog: ShadowRay: First Known Attack Campaign Targeting AI Workloads [Forbes]
- Checker: Test your Ray cluster for vulnerabilities
- Demo: Live RCE demo (THOTCON)
Shining a Light on Shadow Vulnerabilities
- Foundational research defining the shadow vulnerability class - real, exploitable risks that exist at runtime but are invisible to static analysis and dependency scanners
- Blog: Shining a Light on Shadow Vulnerabilities (w/ Gal Elbaz)
TensorFlow Keras Downgrade Attack
- Discovered a bypass for CVE-2024-3660 that forces Keras model files to load with a downgraded, vulnerable version of TensorFlow - enabling RCE even on patched systems through a dependency confusion path
- Blog: CVE-2024-3660 Bypass: TensorFlow Keras Downgrade Attack
eBPF Runtime Security
- Podcast: Interview: AI Security Researcher at Oligo Security - eBPFChirp FM (Oct 2025)
- Talk: Scaling Runtime Application Security with eBPF - BSides Budapest 2024
- Talk: Secimport: Tailor-Made eBPF Sandbox for Python Applications - PyCon Israel 2024
- Talk: Discovering Shadow Vulnerabilities via Reverse-Fuzzing - AppSec Village 2023 / OWASP Global 2023
- Blogs: App-Level eBPF Applications · Secure PyTorch Models with eBPF · Secure FastAPI with eBPF
ShellTorch - PyTorch TorchServe RCE
- Multiple critical vulnerabilities (CVSS 9.9, 9.8) in TorchServe, PyTorch's production model serving framework used by Amazon, Google, and Microsoft. Vulnerabilities allowed unauthenticated remote code execution ia SSRF and unsafe deserialization, affecting any publicly exposed TorchServe instance
- Talk: ShellTorch: The Next Evolution in *4Shell Executions - CNCF
- Blog: ShellTorch Explained: Multiple Vulnerabilities in PyTorch Model Server
Building LLM Agents with Minimal Dependencies
- Talk: Agent with Minimal Dependencies - LangTalks Webinar 2024
- Code: github.com/avilum/agent - a working agent loop in ~100 lines, no framework required
Early Research
Deci AI NVIDIA
- Part of the founding team as Deep Learning Software Engineer → Software Architect. Worked on inference acceleration and model architecture optimization across hardware targets — NVIDIA GPUs, mobile (iOS/Android), Jetson, TPUs, CPUs, and browsers
- Built research pipelines and orchestration infrastructure that enabled research at scale, including the automation layer for Neural Architecture Search (NAS) across any device and hardware stack
- Deci acquired by NVIDIA in 2024
- Writing: Infery: Deep Learning Inference in 3 Lines of Python
- Feb 2026Docling RCE: A Shadow Vulnerability Introduced via PyYAML (CVE-2026-24009)
- Nov 2025ShadowRay 2.0: Attackers Turn AI Against Itself in Global Campaign
- 2025ShadowMQ: How Code Reuse Spread Critical Vulnerabilities Across the AI Ecosystem
- 2025Critical React & Next.js RCE Vulnerability (CVE-2025-55182 & CVE-2025-66478)
- 2025Airborne: Wormable Zero-Click RCE in AirPlay Protocol
- 2025Critical Vulnerabilities in AirPlay Protocol Affecting Multiple Apple Devices
- 2025Pwn My Ride: Exploring the CarPlay Attack Surface
- 2025Critical RCE in Anthropic MCP Inspector (CVE-2025-49596)
- 2025CVE Funding Almost Expired: What You Need to Know
- 2025The Hidden Risks of NPM Supply Chain Attacks on AI Agents
- 2025The Application Attack Matrix
- 2025Uncovering the Hidden Risks: How Oligo Identifies 1100% More Vulnerable Functions
- 2025Shadow Vulnerabilities in AI: The Hidden Perils Beyond CVEs
- 2025CVE-2024-50050: Critical Vulnerability in Meta's llama-stack
- 2024More Models, More ProbLLMs: Vulnerabilities in Ollama
- 20240.0.0.0 Day: Exploiting Localhost APIs From the Browser
- 2024Shining a Light on Shadow Vulnerabilities
- 2024ShadowRay: First Known Attack Campaign Targeting AI Workloads
- 2024TensorFlow Keras Downgrade Attack: CVE-2024-3660 Bypass
- 2023ShellTorch: Multiple Vulnerabilities in PyTorch TorchServe
- 2023App-Level eBPF Applications
- 2023Secure FastAPI with eBPF
- 2023Secure PyTorch Models with eBPF
- 2022Sandboxing Python Dependencies in Your Code
- 2022How I Discovered Thousands of Open Databases on AWS
- 2021Infery: Run Deep Learning Inference with Only 3 Lines of Python Code
- 2021Identify Website Users By Client Port Scanning Using WebAssembly and Go
- 2021Facebook Knows What You Eat: Visualizing the Data Facebook Has About You
- 2020POC For Google Phishing In 10 Minutes: ɢoogletranslate.com
Projects & Tools
- minrlm - Research: AI inference acceleration via recursive language odels (3.6× fewer tokens, flat cost). Article
- airplay-checker - browser tool to test AirPlay vulnerability exposure on your local network
- uvify - converts Python repos into uv-managed environments automatically (87★)
- ray-checker - browser tool to test whether a Ray cluster is exposed to CVE-2023-48022
- yalla - fast CLI task runner and shell alias managerpan>
- agent - minimal LLM agent loop, ~100 lines, no framework dependencies
- semantic-search - in-browser semantic search via TensorFlow.js, fully client-side
- llama-saas - client/server for running LLaMA models as a local service (61★)
- secimport - library-level eBPF sandbox for Python; syscall control per module (234★)
- docker-downloaywhere - pull Docker images from registries in restricted environments
- audio2text - batch audio transcription CLI built on Whisper
- portsscan - web client port-scanner in Go/WASM; the research that led to 0.0.0.0 Day (156★)
- jsafer - sandbox and safe eval for untrusted JavaScript
- facebook-archive-analyzer - parses and visualizes the data export Facebook provides
- waycup - hides web assets from automated security scanners (117★)
- syscalls - Linux syscall reference for building eBPF policies
- smart-url-fuzzer - context-aware URL fuzzer based on discovered application structure
- linqit - LINQ-style list operations for Python (251★)
All projects: github.com/avilum