Avi Lumelsky

Security Researcher · Software Architect

Security researcher at Oligo Security, focused on AI infrastructure and open-source software. Previously at Deci AI (acquired by NVIDIA), working on inference acceleration and model architecture optimization across hardware and software stacks. I enjoy runtime challenges on both sides: optimizing ML pipelines for low latency or high throughput, building eBPF-based security tooling, identifying overlooked attack vectors, and finding exploitable bugs in software that most of the industry relies on.

Speaker at Black Hat USA & Asia · DEF CON · BlueHat · CNCF · AppSec Village · BSides · THOTCON · OWASP · AI Summit · PyCon

Research & Work

minRLM: AI Inference Acceleration via Recursive Language Models

  • Research on token-efficient inference: Recursive Language Models (RLMs) keep data in a REPL and query it via generated code instead of stuffing context into the prompt, so attention runs only on search results and cost stays flat regardless of context size
  • Implementation and benchmark across 12 tasks and 3 model sizes (GPT-5-nano, GPT-5-mini, GPT-5.2): 72.7% accuracy on GPT-5-mini with 3.6× fewer tokens than the reference RLM and 2.6× fewer than vanilla; on GPT-5.2, +30 pp over vanilla, winning 11 of 12 tasks
  • Article: minRLM: A Token-Efficient Recursive Language Model Implementation and Benchmark
  • Code: github.com/avilum/minrlm — open-source client, evals, DockerREPL sandbox
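A minimal, single-step sketch of the REPL idea described above, assuming a stubbed model: `fake_model_generate_code`, `run_in_repl`, and `rlm_step` are hypothetical names, not minRLM's API. The large input lives only in the REPL namespace, and only the short printed output would go back into a prompt. Note that `exec` here runs in-process with no isolation; the repo lists a DockerREPL sandbox for that purpose.

```python
import contextlib
import io


def fake_model_generate_code(task: str) -> str:
    # Hypothetical stub: a real RLM would ask an LLM to write this code after
    # seeing only the task and a short description of the `context` variable.
    return (
        "hits = [line for line in context.splitlines() if 'error' in line.lower()]\n"
        "print(len(hits))\n"
        "print('\\n'.join(hits[:3]))"
    )


def run_in_repl(code: str, env: dict) -> str:
    """Execute generated code against a persistent namespace and capture stdout.

    WARNING: plain exec() with no sandbox; only for illustration.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, env)
    return buf.getvalue()


def rlm_step(task: str, context: str) -> str:
    env = {"context": context}            # the data lives in the REPL, not the prompt
    code = fake_model_generate_code(task)
    return run_in_repl(code, env)         # only this short output would reach the model


small = "line 0: ok\nline X: ERROR disk full"
big = "\n".join(f"line {i}: ok" for i in range(100_000)) + "\nline X: ERROR disk full"

obs_small = rlm_step("How many error lines are there?", small)
obs_big = rlm_step("How many error lines are there?", big)
print(repr(obs_big))  # prints '1\nline X: ERROR disk full\n'
```

In this sketch the model-facing text is the same for a two-line input and a 100,000-line input, which is the intuition behind "flat cost"; a full loop would feed the observation back to the model and repeat until it produces an answer.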

ShadowRay 2.0 - AI Attacks AI: Self-Propagating Botnet Campaign

  • Discovery: Active global campaign exploiting CVE-2023-48022 in Ray to hijack AI compute clusters into a self-replicating botnet - the first documented use of AI infrastructure to autonomously attack other AI infrastructure
  • Scale: 230,000+ Ray servers exposed globally (10× the original ShadowRay discovery); active since at least September 2024
  • Sophistication: DevOps-style delivery via GitLab/GitHub, LLM-generated payloads, CPU throttling at ~60% to avoid detection, processes disguised as kernel workers
  • Blog: ShadowRay 2.0: Attackers Turn AI Against Itself in Global Campaign
  • Coverage: [Forbes] [Dark Reading]
  • Demo: Live RCE demo

ShadowMQ - Systemic RCE Across AI Inference Frameworks

Airborne - Wormable Zero-Click RCE in AirPlay

Pwn My Ride - CarPlay Attack Surface & Jailbreaking

React & Next.js Critical RCE

Anthropic MCP Inspector RCE

Ollama Vulnerabilities

ShadowRay - First Known Attack Campaign on AI Infrastructure

Shining a Light on Shadow Vulnerabilities

  • Foundational research defining the shadow vulnerability class - real, exploitable risks that exist at runtime but are invisible to static analysis and dependency scanners
  • Blog: Shining a Light on Shadow Vulnerabilities (w/ Gal Elbaz)

TensorFlow Keras Downgrade Attack

ShellTorch - PyTorch TorchServe RCE

Building LLM Agents with Minimal Dependencies

Deci AI (acquired by NVIDIA)

  • Part of the founding team as Deep Learning Software Engineer → Software Architect. Worked on inference acceleration and model architecture optimization across hardware targets — NVIDIA GPUs, mobile (iOS/Android), Jetson, TPUs, CPUs, and browsers
  • Built research pipelines and orchestration infrastructure for running experiments at scale, including the automation layer for Neural Architecture Search (NAS) across any device and hardware stack
  • Deci acquired by NVIDIA in 2024
  • Writing: Infery: Deep Learning Inference in 3 Lines of Python

Projects & Tools

  • minrlm - Research: AI inference acceleration via recursive language models (3.6× fewer tokens, flat cost). Article
  • airplay-checker - browser tool to test AirPlay vulnerability exposure on your local network
  • uvify - converts Python repos into uv-managed environments automatically (87★)
  • ray-checker - browser tool to test whether a Ray cluster is exposed to CVE-2023-48022
  • yalla - fast CLI task runner and shell alias manager
  • agent - minimal LLM agent loop, ~100 lines, no framework dependencies
  • semantic-search - in-browser semantic search via TensorFlow.js, fully client-side
  • llama-saas - client/server for running LLaMA models as a local service (61★)
  • secimport - library-level eBPF sandbox for Python; syscall control per module (234★)
  • docker-download-anywhere - pull Docker images from registries in restricted environments
  • audio2text - batch audio transcription CLI built on Whisper
  • portsscan - browser-based port scanner in Go/WASM; the research that led to 0.0.0.0 Day (156★)
  • jsafer - sandbox and safe eval for untrusted JavaScript
  • facebook-archive-analyzer - parses and visualizes the data export Facebook provides
  • waycup - hides web assets from automated security scanners (117★)
  • syscalls - Linux syscall reference for building eBPF policies
  • smart-url-fuzzer - context-aware URL fuzzer based on discovered application structure
  • linqit - LINQ-style list operations for Python (251★)
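The linqit entry refers to LINQ-style chaining on Python lists. Below is a minimal sketch of that style, assuming lazily evaluated generator steps. The `Query` class and its method names are hypothetical and are not linqit's actual API.

```python
from __future__ import annotations

from typing import Any, Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class Query(Generic[T]):
    """Hypothetical LINQ-style wrapper (illustration only, NOT linqit's API)."""

    def __init__(self, items: Iterable[T]) -> None:
        self._items = items

    def __iter__(self) -> Iterator[T]:
        return iter(self._items)

    def where(self, pred: Callable[[T], bool]) -> Query[T]:
        # Lazy filter: nothing is evaluated until the query is consumed.
        return Query(x for x in self._items if pred(x))

    def select(self, fn: Callable[[T], U]) -> Query[U]:
        # Lazy projection, like LINQ's Select.
        return Query(fn(x) for x in self._items)

    def order_by(self, key: Callable[[T], Any]) -> Query[T]:
        # Sorting has to materialize the items, so this step is eager.
        return Query(sorted(self._items, key=key))

    def to_list(self) -> list[T]:
        return list(self._items)


people = [
    {"name": "Ada", "age": 36},
    {"name": "Linus", "age": 28},
    {"name": "Grace", "age": 45},
]
names = (
    Query(people)
    .where(lambda p: p["age"] > 30)
    .order_by(lambda p: p["age"])
    .select(lambda p: p["name"])
    .to_list()
)
print(names)  # prints ['Ada', 'Grace']
```

Because the filter and projection steps wrap generators, a `Query` built this way can be consumed only once. That is a deliberate simplification for the sketch.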

All projects: github.com/avilum