Signal Hub

Artificial Intelligence news and articles


VentureBeat
~9 min read · May 6, 2026

The app store for robots has arrived: Hugging Face launches open-source Reachy Mini App Store with 200+ apps

There's an app for nearly every imaginable user and use case these days, but one thing they all have in common is that they're centered around one device: the smartphone. That changes today as Hugging Face, the 10-year-old New York City startup best known for being the go-to place online to host and use cutting-edge, open-source AI models, agents and applications, launches a new App Store for Reachy Mini, its low-cost ($299) open-source physical robot that debuted back in July 2025 (itself the fruit of Hugging Face's acquisition of another startup, Pollen Robotics).

The new Hugging Face Reachy Mini App Store already hosts a library of over 200 community-built applications, and Reachy Mini owners will be able to download any of these free of charge to start (unlike smartphone apps, there's no monetization option for app creators on this store — yet). The store will also offer Reachy Mini owners — around 10,000 units have been sold since last year — an easy way to build their own custom apps for the tiny, stationary desktop robot with built-in camera eyes, speaker, and microphone, via Hugging Face's existing AI-powered agent, "ML Intern."

The significance lies not just in the hardware, but in the removal of the "roboticist" barrier: for the first time, individuals without a background in engineering or coding are shipping functional robotics software in under an hour.

"Anyone can build the apps," said Clément Delangue, CEO and co-founder of Hugging Face, in a video interview with VentureBeat. "My intuition is that more and more [AI] model builders will release on Reachy Mini as a way to test the robotics ability of new models."

Make robots as accessible to laypeople as PCs and smartphones

The technical bottleneck in robotics has historically been the scarcity of high-quality training data. While large language models (LLMs) have mastered general-purpose coding by training on massive repositories like Microsoft's GitHub, the volume of code specific to robotics remains "tiny" by comparison (though GitHub likely contains the largest publicly accessible library of robotics code to date, with more than 17,000 repositories dedicated to the field). This lack of data has meant that, until now, AI agents were relatively poor at understanding the physical abstractions and firmware requirements of hardware.

Hugging Face's solution is an agentic toolkit that acts as an intermediary. Rather than forcing a user to learn a specific robotics SDK or master the nuances of a robot's firmware, the toolkit allows a user to describe a desired behavior in plain English — for instance, "wave when someone says good morning." An AI agent then handles the heavy lifting: it writes the code, tests it against the robot's specific constraints, and ships the final package.

"Historically, it's been extremely hard," Delangue told VentureBeat of building robotics applications. "But we've worked really hard on the topic with a mix of open sourcing everything we do, working on the right abstractions for robotics, and making it easier for agents to understand and use it."

The platform is model-agnostic, supporting a wide range of leading intelligence engines. Users can build apps using Hugging Face's own ML Intern agent or leverage external models including GPT-5.5, Claude Opus 4.6, Kimmy 2.6, Mini Max GM5, and Deep Sig V4 Pro. For real-time interaction, the official conversation apps utilize OpenAI Realtime and Gemini Live. By providing these high-level abstractions, Hugging Face has collapsed the traditional "integration weeks" of robotics work into a process that takes minutes. A rough sense of what such a generated app might look like is sketched below.
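To make that workflow concrete, here is a minimal sketch of the kind of script an agent might generate for the "wave when someone says good morning" request. The reachy_mini package and every call on it (ReachyMini, speaker.say, microphone.listen, play_move) are assumptions for illustration, not the documented SDK; whatever ML Intern actually emits would target Hugging Face's real abstractions.

```python
# Hypothetical sketch of an agent-generated Reachy Mini app.
# The reachy_mini module and its entire API are assumed for illustration only;
# the real SDK surface may differ.
from reachy_mini import ReachyMini  # assumed package and class name

TRIGGER = "good morning"

def main():
    robot = ReachyMini()                 # connect to the desktop robot (assumed constructor)
    robot.speaker.say("Ready!")          # assumed text-to-speech helper
    while True:
        heard = robot.microphone.listen()            # assumed speech-to-text helper
        if heard and TRIGGER in heard.lower():
            robot.play_move("wave")                  # assumed built-in gesture
            robot.speaker.say("Good morning to you too!")

if __name__ == "__main__":
    main()
```

The point is less the specific calls than the shape: a few dozen lines of glue that a non-programmer never has to read, which is what collapses the integration work to minutes.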
Low-cost Reachy Mini is a hit

To take advantage of the new Hugging Face Reachy Mini App Store, users are encouraged to purchase Reachy Mini, the cute desktop robot Hugging Face launched back in July 2025 as an affordable, open-source alternative to existing, commercially available robots from the likes of Boston Dynamics, whose infamous Spot robot dog retails for around $70,000. Even Chinese competitors start at $1,900+. In contrast, the Reachy Mini is accessibly priced for hobbyists and developers. It comes in two variants:

Reachy Mini Lite ($299 plus shipping): A tethered version that connects via USB and uses an external computer for processing.

Reachy Mini Wireless ($449 plus shipping): A standalone version featuring an on-board Raspberry Pi CM4 and Wi-Fi connectivity.

Delangue said that of the 10,000 Reachy Mini units sold so far, 3,000 were sold in just the past two weeks. Hugging Face expects to ship another 1,000 units within the next 30 days.

Even those who don't own a Reachy Mini can still develop apps for it, using the Reachy Mini App Store and the Reachy App, which contains a 3D simulation of the robot and its responses.

The App Store itself is hosted on the Hugging Face Hub. It functions much like a standard software repository, but for hardware behaviors (a short programmatic sketch of pulling an app from the Hub appears at the end of this section):

Search and Install: Users can find apps, click a button, and install them directly to their robot.

Forkability: Every app is "forkable," meaning a user can duplicate an existing app and ask an AI agent to modify it (e.g., "make it answer in French").

Simulation Mode: Crucially, the store includes a browser-based simulator. This allows users who do not own a physical Reachy Mini to build, test, and play with the catalog in a virtual environment.

Both the robot and the store are part of Hugging Face's ongoing "LeRobot" effort — a project that began in 2024 with Hugging Face researchers specializing in robotics and AI developing and publishing their own open-source code, tutorials, and hardware on the web to make robotics development accessible to a wider audience. And unlike GitHub, which is designed for a developer audience, the Hugging Face Reachy Mini App Store is designed for robot owners and users who may have no technical experience or training whatsoever.

Continuing with the open-source ethos and practice

Hugging Face's strategy is rooted in the belief that closed-source hardware and software are "almost impossible" to build for at scale. Delangue notes that closed systems prevent the training of agents and limit the ability of the community to innovate. Consequently, the entire Reachy Mini platform is open-source. This open licensing model has two primary implications for the ecosystem:

Accelerated Development: Because the code is public and integrated with the Hugging Face ecosystem via "Spaces" — Hugging Face's feature for hosting AI-powered web apps, launched in 2021 — agents can more easily learn how to interact with the hardware.

Community Sovereignty: Apps are not locked behind a proprietary wall. Currently, all 200+ apps on the store are free, though the platform's foundation on Spaces provides the flexibility for creators to potentially monetize their work in the future.

"For the moment, all the apps are free," Delangue noted. "It's flexible, it's built on [Hugging Face] Spaces, so at some point maybe people are going to make them paid."
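Because the store is hosted on the Hugging Face Hub and built on Spaces, discovering and pulling an app programmatically should look much like working with any other Hub repository. The sketch below uses the real huggingface_hub client; the search term and the example repo id are assumptions, since the article does not describe a dedicated store API.

```python
# Minimal sketch: browse and download Reachy Mini app Spaces from the Hugging Face Hub.
# Assumes the store's apps are ordinary Spaces discoverable via search; the search
# term and the repo id below are guesses, not documented identifiers.
from huggingface_hub import HfApi, snapshot_download

api = HfApi()

# List candidate Reachy Mini app Spaces (search term is an assumption).
for space in api.list_spaces(search="reachy mini", limit=10):
    print(space.id)

# Pull one app's files locally to inspect, fork, or run against the simulator.
local_dir = snapshot_download(
    repo_id="some-user/reachy-mini-example-app",  # hypothetical repo id
    repo_type="space",
)
print("Downloaded to", local_dir)
```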
Robotics enters its accessible hobbyist era

Hugging Face's Reachy Mini App Store is launching with 200 apps already available. So who built them, and how did they do it before this platform existed? Delangue told VentureBeat that more than 150 different creators have contributed to the store, most of whom had never written a line of robotics code before. They were able to do so thanks to Hugging Face's ML Intern and GitHub. The new Hugging Face Reachy Mini App Store now puts the tools and existing apps in one place for easier access.

Delangue was keen to highlight one early Reachy app developer in particular to VentureBeat: Joel Cohen, a 78-year-old retired marketing executive. Cohen, who is colorblind and has no technical background, spent two weeks assembling his Reachy Mini Lite (a task that usually takes three hours). Despite these challenges, he used an AI agent to build a "VP of Future Thinking" facilitator for his Zoom-based CEO peer groups. The app enables the robot to:

Greet 29 members by name.

Fact-check discussions in real time.

Summarize key themes and push back on surface-level answers.

"I built this by describing what I needed in plain English," Cohen stated in a press release provided to VentureBeat ahead of the launch. "No SDK. No robotics background. No developer experience."

Other community-driven applications include:

Emotional Damage Chess: A robot that plays chess and mocks the user's blunders.

Reachy Phone Home: An anti-procrastination tool that detects when a user picks up their phone and tells them to get back to work.

Language Tutor: A physical companion that listens to speech and corrects accents.

F1 Race Commentator: A desk companion that calls Formula 1 races live as they happen.

Delangue himself told VentureBeat that in only a few hours, he built an app for his own Reachy Mini robot at the Hugging Face Miami office to have the robot act as a receptionist.

"It basically does face recognition to detect when you arrive in the office, and then it looks at you and onboards you," Delangue related. "It says, 'Hey, welcome to the office. Who are you here to see?' Then it sends me a message: 'Carl just arrived at the office. He's here to meet you, and for these reasons.' It works a little bit as my welcoming booth at the office, and it took me less than two hours to build that."

Even for an experienced founder and developer like Delangue, building apps for a robot was out of the question until the combination of Reachy Mini and ML Intern. "For me, it would have been impossible," the Hugging Face CEO said. "If you weren't a robotics developer, it probably would have been impossible, or it would have taken a few months."

Democratizing robotics

The launch of the agentic App Store signals a fundamental shift in how we interact with machines. For sixty years, the field was gated by the requirement for deep technical expertise. By combining low-cost open hardware with the reasoning capabilities of modern AI agents, Hugging Face is moving toward a future where the hardware is a commodity and the behavior is limited only by what a user can describe.

As Delangue noted during the launch, the goal was to provide a platform for people who "want to get into robotics but don't have the hardware or the skills." With nearly 10,000 robots now "in the wild" and a burgeoning store of agent-written apps, the Reachy Mini has become the most widely deployed open-source desktop robot in history.
The question is no longer how to build a robot, but — now that the gate is open — what we will ask these robots to do.

VentureBeat
~11 min read · May 5, 2026

Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof.

A little-known Miami-based startup called Subquadratic emerged from stealth on Tuesday with a sweeping claim: that it has built the first large language model to fully escape the mathematical constraint that has defined — and limited — every major AI system since 2017. The company claims its first model, SubQ 1M-Preview, is the first LLM built on a fully subquadratic architecture — one where compute grows linearly with context length. If that claim holds, it would be a genuine inflection point in how AI systems scale.

At 12 million tokens, the company says, its architecture reduces attention compute by almost 1,000 times compared to other frontier models — a figure that, if validated independently, would dwarf the efficiency gains of any existing approach.

The company is also launching three products into private beta: an API exposing the full context window, a command-line coding agent called SubQ Code, and a search tool called SubQ Search. It has raised $29 million in seed funding from investors including Tinder co-founder Justin Mateen, former SoftBank Vision Fund partner Javier Villamizar, and early investors in Anthropic, OpenAI, Stripe, and Brex. The New Stack reported that the raise values the company at $500 million.

The numbers Subquadratic is publishing are extraordinary. The reaction from the AI research community has been, to put it mildly, mixed — ranging from genuine curiosity to open accusations of vaporware. Understanding why requires understanding what the company claims to have solved, and why so many prior attempts to solve the same problem have fallen short.

The quadratic scaling problem has shaped the economics of the entire AI industry

Every transformer-based AI model — which includes virtually every frontier system from OpenAI, Anthropic, Google, and others — relies on an operation called "attention." Every token is compared against every other token, so as inputs grow, the number of interactions — and the compute required to process them — scales quadratically. In plain terms: double the input size, and the cost doesn't double. It quadruples.

This relationship has shaped what gets built and what doesn't. The industry standard is 128,000 tokens for many AI models and up to 1 million tokens for frontier cloud models such as Claude Sonnet 4.7 and Gemini 3.1 Pro. Even at those sizes, the cost of processing long inputs becomes punishing.

The industry built an elaborate stack of workarounds to cope. Retrieval-augmented generation (RAG) systems use a search engine to pull a small number of relevant results before sending them to the model, because sending the full corpus isn't feasible. Developers layer retrieval pipelines, chunking strategies, prompt engineering techniques, and multi-agent orchestration systems on top of models — all to route around the fundamental constraint that the model itself can't efficiently process everything at once.

Subquadratic's argument is that these workarounds are expensive, brittle, and ultimately limiting. As CTO Alexander Whedon told SiliconANGLE in an interview, "I used to manually curate prompts and retrieval systems and evals and conditional logic to chain together the workflows. And I think that that is kind of a waste of human intelligence and also limiting to the product quality." The scaling arithmetic the company is targeting is sketched below.
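To make the quadratic-versus-linear distinction concrete, here is a back-of-the-envelope sketch (illustrative only, not Subquadratic's published math). It counts pairwise comparisons for dense attention and for a hypothetical scheme with a fixed per-token budget; the budget of 64 comparisons is an arbitrary assumption, and real sparse schemes use larger, often length-dependent budgets, which is why reported speedups are far smaller than this toy ratio.

```python
# Back-of-the-envelope illustration of why long context is expensive.
# Dense attention compares every token with every other token (n * n pairs);
# a hypothetical linear-cost scheme grows proportionally to n.

def dense_attention_pairs(n_tokens: int) -> int:
    """Token-to-token comparisons in standard (quadratic) attention."""
    return n_tokens * n_tokens

def linear_cost_pairs(n_tokens: int, budget_per_token: int = 64) -> int:
    """Hypothetical subquadratic scheme: a fixed comparison budget per token."""
    return n_tokens * budget_per_token

for n in (128_000, 1_000_000, 12_000_000):
    dense = dense_attention_pairs(n)
    sparse = linear_cost_pairs(n)
    print(f"{n:>12,} tokens: dense {dense:.2e} pairs, "
          f"budgeted {sparse:.2e} pairs, ratio {dense / sparse:,.0f}x")
```

Doubling the input doubles the budgeted count but quadruples the dense count, which is the scaling relationship Whedon describes.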
Subquadratic's fix is deceptively simple: stop doing the math that doesn't matter

The company's approach, called Subquadratic Sparse Attention, or SSA, is built on a straightforward premise: most of the token-to-token comparisons in standard attention are wasted compute. Instead of comparing every token to every other token, SSA learns to identify which comparisons actually matter and computes attention only over those positions. Crucially, the selection is content-dependent — the model decides where to look based on meaning, not on fixed positional patterns. This allows it to retrieve specific information from arbitrary positions across a very long context without paying the quadratic tax.

The practical payoff scales with context length — exactly the inverse of the problem it's trying to solve. According to the company's technical blog, SSA achieves a 7.2x prefill speedup over dense attention at 128,000 tokens, rising to 52.2x at 1 million tokens. As Whedon put it: "If you double the input size with quadratic scaling laws, you need four times the compute; with linear scaling laws, you need just twice."

The company says it trained the model in three stages — pretraining, supervised fine-tuning, and a reinforcement learning stage specifically targeting long-context retrieval failures — teaching the model to aggressively use distant context rather than defaulting to nearby information, a subtle failure mode that quietly degrades performance in existing systems. A generic sketch of the content-dependent selection idea appears below.
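For intuition about content-dependent selection, here is a generic top-k sparse attention sketch: each query attends only to its highest-scoring keys. This illustrates the idea described in the article, not Subquadratic's SSA implementation, and a naive top-k like this still scores every pair; making the selection itself cheap is the hard part a truly subquadratic system has to solve.

```python
# Generic content-dependent sparse attention: each query attends only to its
# top_k keys by similarity. An illustration of the idea, not Subquadratic's SSA.
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """q, k, v: (n, d) arrays. Output uses only the top_k keys per query."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) similarity scores
    # Content-dependent selection: keep the top_k highest-scoring keys per query.
    # NOTE: a naive top-k still computes all n*n scores; a real subquadratic
    # system must also make this selection step cheap.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    out = np.zeros_like(q)
    for i in range(q.shape[0]):
        sel = idx[i]
        w = np.exp(scores[i, sel] - scores[i, sel].max())
        w /= w.sum()                                  # softmax over selected keys only
        out[i] = w @ v[sel]
    return out

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # (16, 8)
```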
Three benchmarks paint a strong picture, but what they leave out may matter more

On the surface, SubQ's benchmark numbers are competitive with or superior to models built by organizations spending billions of dollars. On SWE-Bench Verified, it scored 81.8%, compared to Opus 4.6's 80.8% and DeepSeek 4.0 Pro's 80.0%. On RULER at 128,000 tokens, a standard benchmark for reasoning over extended inputs, SubQ scored 95% — edging out Claude Opus 4.6 at 94.8%. On MRCR v2, a demanding test of multi-hop retrieval across long contexts, SubQ posted a third-party verified score of 65.9%, compared with Claude Opus 4.7 at 32.2%, GPT-5.5 at 74%, and Gemini 3.1 Pro at 26.3%.

But several details warrant scrutiny. The benchmark selection is narrow — exactly three tests, all emphasizing long-context retrieval and coding, the precise tasks SubQ is designed for. Broader evaluations across general reasoning, math, multilingual performance, and safety have not been published. The company says a comprehensive model card is "coming soon." According to The New Stack, each benchmark model was run only once due to high inference cost, and the SWE-Bench margin is, as the company's own paper acknowledges, "harness as much as model." In benchmark methodology, single runs without confidence intervals leave room for variance.

There is also a significant gap between SubQ's research results and its production model. On MRCR v2, the company reported a research score of 83 — but the third-party verified production model scored 65.9. That 17-point gap between the lab result and the shipping product is notable and largely unexplained.

Subquadratic also told SiliconANGLE that on the RULER 128K benchmark, SubQ scored 95% accuracy at a cost of $8, compared with 94% accuracy and about $2,600 for Claude Opus — a remarkable cost claim. But the company has not publicly disclosed specific API pricing, making it impossible to independently verify the cost-per-task comparisons.

The AI research community's verdict ranges from 'genuine breakthrough' to 'AI Theranos'

Within hours of the announcement, the AI research community erupted into a debate that crystallized around a single question: Is this real? AI commentator Dan McAteer captured the binary mood in a widely shared post: "SubQ is either the biggest breakthrough since the Transformer... or it's AI Theranos." The comparison to the infamous blood-testing fraud company may be unfair, but it reflects the scale of the claims being made.

Skeptics zeroed in on several pressure points. Prominent AI engineer Will Depue initially noted that SubQ is "almost surely a sparse attention finetune of Kimi or DeepSeek," referring to existing open-source models. Whedon confirmed this on X, writing that the company is "using weights from open-source models as a starting point, as a function of our funding and maturity as a company." Depue later escalated his criticism, writing that the company's O(n) scaling claims and the speedup numbers "don't seem to line up," and called the communication "either incredibly poorly communicated or just not real."

Others raised structural questions. One developer noted that if SubQ truly reduces compute by 1,000x and costs less than 5% of Opus, the company should have no trouble serving it at scale — so why gate access through an early-access program? Developer Stepan Goncharov called the benchmarks "very interesting cherry-picked benchmarks," while another commenter described them as "suspiciously perfect."

But not everyone was dismissive. AI researcher John Rysana pushed back on the Theranos framing, writing that the work is "just subquadratic attention done well which is very meaningful for long context workloads," and that "odds of it being BS are extremely low." Linus Ekenstam, a tech commentator, said he was "extremely intrigued to see the real-world implications," particularly for complex AI-powered software.

Magic.dev made strikingly similar claims two years ago — and then went quiet

Perhaps the most pointed critique of SubQ's launch comes not from its specific claims but from recent history. Magic.dev announced a 100-million-token context-window model, LTM-2-mini, in August 2024 with a claimed 1,000x efficiency advantage, and raised roughly $500 million on the strength of those claims. As of early 2026, there is no public evidence of LTM-2-mini being used outside Magic.

The parallels are uncomfortable. Both companies claimed massive context windows. Both touted roughly 1,000x efficiency gains. Both targeted software engineering as their primary use case. And both launched with limited external access.

The broader research landscape reinforces the caution. Kimi Linear, DeepSeek Sparse Attention, Mamba, and RWKV all promised subquadratic scaling, and all faced the same problem: architectures that achieve linear complexity in theory often underperform quadratic attention on downstream benchmarks at frontier scale, or they end up hybrid — mixing subquadratic layers with standard attention and losing the pure scaling benefits. A widely cited LessWrong analysis argued that these approaches "are all better thought of as 'incremental improvement number 93595 to the transformer architecture'" because practical implementations remain quadratic and "only improve attention by a constant factor."

Subquadratic is directly aware of this history. Its own technical blog specifically addresses each prior approach — fixed-pattern sparse attention, state space models, hybrid architectures, and DeepSeek Sparse Attention — and argues that SSA avoids their tradeoffs. Whether it actually does remains an empirical question that only independent evaluation can settle.

A five-time founder, a former Meta engineer, and $29 million to prove the doubters wrong

The team behind the claims matters in evaluating them.
CEO Justin Dangel is a five-time founder and CEO with a track record across health tech, insurance tech, and consumer goods; his companies have scaled to hundreds of employees, attracted institutional backing, and reached liquidity. CTO Alexander Whedon previously worked as a software engineer at Meta and served as Head of Generative AI at TribeAI, where he led over 40 enterprise AI implementations. The team includes 11 PhD researchers with backgrounds from Meta, Google, Oxford, Cambridge, ByteDance, and Adobe.

That is a credible collection of talent for an architecture-level research effort. But neither co-founder has published foundational AI research, and the company has not yet released a peer-reviewed paper. The technical report is listed as "coming soon."

The funding profile is unusual for a company making frontier AI claims. Subquadratic raised $29 million at a reported $500 million valuation — a steep price for a seed-stage company with no publicly available model, no peer-reviewed research, and no disclosed revenue. The investor base, led by Tinder co-founder Mateen and former SoftBank partner Villamizar, skews toward consumer tech and growth investing rather than deep technical AI research.

The company is not open-sourcing its weights, but plans to offer training tools for enterprises to do their own post-training, and has set a 50-million-token context window target for Q4.

The real test for SubQ isn't benchmarks — it's whether the math survives independent scrutiny

Strip away the marketing language and the social media drama, and the underlying question Subquadratic is asking is genuinely important: Can AI systems break free of quadratic scaling without sacrificing the quality that makes them useful?

The stakes are enormous. If attention can be made truly linear without degrading retrieval and reasoning, the economics of AI shift fundamentally. Enterprise applications that today require elaborate retrieval pipelines — processing entire codebases, contracts, regulatory filings, medical records — become single-pass operations. The billions of dollars currently spent on RAG infrastructure, context management, and agentic orchestration become partially redundant.

Whedon's willingness to engage publicly with technical criticism — posting a technical blog within hours of pushback — suggests a team that understands it needs to show its work, not just describe it. And to its credit, the company has openly acknowledged that it builds on open-source foundations and that its model is smaller than those at the major labs.

Every frontier model in 2026 advertises a context window of at least a million tokens, but almost none of them are actually great at making use of all that information. The gap between a nominal context window and a functional one — between what a model accepts and what it reliably reasons over — remains one of the most important unsolved problems in AI. Subquadratic says it has closed that gap.

If independent evaluation confirms that claim, the implications would ripple far beyond a single startup's valuation. If it doesn't, the company joins a growing list of long-context promises that sounded revolutionary on launch day and unremarkable six months later. In computing, every fundamental constraint eventually falls. When it does, the breakthrough rarely comes from the direction the industry expected.
The question hanging over Subquadratic is whether a team of 11 PhDs and a $29 million seed round actually found the answer that has eluded organizations spending thousands of times more — or whether they just found a better way to describe the problem.

VentureBeat
~5 min read · May 5, 2026

GPT-5.5 Instant shows you what it remembered — just not all of it

OpenAI updated the default model for ChatGPT to its new GPT-5.5 Instant, along with a new memory capability that finally shows which context shaped responses — at least some of them. This limitation signals that models are starting to create a second, incomplete memory observability layer that could conflict with existing audit systems and agent logs.

GPT-5.5 Instant replaces GPT-5.3 Instant as the default ChatGPT model and is a version of the company's new flagship GPT-5.5 LLM. It's supposed to be more dependable, more accurate and smarter than 5.3. But it's the introduction of memory sources, which will be enabled across all models on the platform, that could help enterprises in their projects.

"When a response is personalized, you can see what context was used, such as saved memories or past chats, and delete or correct it if something is outdated or no longer relevant," OpenAI said in a blog post.

When a user asks ChatGPT something, they can tap the sources button (at the bottom of the response) to see which files or past chats the model tapped to find the answer. Users also have full control over the sources models can cite, and these sources will not be shared if the conversation is sent to others.

The company said memory sources should make it easier to personalize model responses. Still, OpenAI admitted that the models "may not show every factor that shaped an answer" and promised to make the capability more comprehensive over time. What this means is that memory sources offer a semblance of observability in ChatGPT answers, but not full auditability yet.

Competing memory systems

Enterprises already have a system in place to solve part of the memory and context problem with models and agents. Models are exposed to context through retrieval-augmented generation (RAG) pipelines; whatever the agent fetches from the vector databases is logged, and the agent's state is stored in a memory layer. All of this is tracked in application logs, usually in an orchestration or management layer with built-in observability. Ideally, this allows teams to trace failure back through the stack. The current system is imperfect; sometimes it's not easy to trace failure points, but it's at least internally consistent.

For enterprises using ChatGPT, whether the default GPT-5.5 Instant or their model of choice, that's no longer the case. The model now surfaces its own account of the context it used — memory sources that are wholly separate from existing retrieval logs. In short, a model-reported context. A problem arises if these records cannot be reconciled reliably. And because memory sources only give users part of the picture — it's unclear what ChatGPT's limit on citing memory sources is — it becomes even harder to match what GPT-5.5 Instant said it tapped to what it actually did in the production environment.

This situation creates a new failure mode: a competing context log. When something goes wrong, the two records can disagree, and enterprises have to decide which one to trust. Malcolm Harkins, chief trust and security officer at HiddenLayer, told VentureBeat that memory sources "look like a pragmatic middle ground" in offering some transparency, though their value is not yet easy to pin down. "For enterprises, it's directionally useful but insufficient on its own," Harkins said. "Real value will depend on how it integrates with security, governance, access controls and audit systems." A sketch of what reconciling the two records might look like appears below.
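To illustrate the reconciliation problem, here is a minimal sketch that compares a model-reported list of memory sources against an application's own retrieval log and flags anything only one side claims. The identifiers and data shapes are hypothetical; ChatGPT does not expose memory sources in this form, so treat it as a sketch of the bookkeeping, not an integration.

```python
# Minimal sketch: reconcile model-reported memory sources against the app's own
# retrieval log. Identifiers and data shapes are hypothetical, for illustration only.

def reconcile(model_reported: list[str], retrieval_log: list[str]) -> dict[str, list[str]]:
    """Compare what the model says it used with what the pipeline actually fetched."""
    reported, logged = set(model_reported), set(retrieval_log)
    return {
        "confirmed": sorted(reported & logged),        # both records agree
        "unlogged_claims": sorted(reported - logged),  # model cites something never fetched
        "silent_context": sorted(logged - reported),   # fetched but not surfaced by the model
    }

# Hypothetical example, with the retrieval log treated as the source of truth.
model_reported = ["memory:project-alpha-notes", "chat:2026-04-30-budget"]
retrieval_log = ["memory:project-alpha-notes", "doc:vendor-contract.pdf"]
for bucket, items in reconcile(model_reported, retrieval_log).items():
    print(f"{bucket}: {items}")
```

Because OpenAI concedes the model may not show every factor that shaped an answer, the silent_context bucket can never be assumed empty, which is why designating one log as the source of truth (discussed below) matters.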
A more capable default model

However GPT-5.5 Instant handles memory, OpenAI calls it an improvement over GPT-5.3 Instant. Internal evaluations showed GPT-5.5 Instant returned 52.5% fewer hallucinated claims than the previous default model, especially in high-stakes domains such as medicine, law, and finance. Inaccurate claims fell by 37.3% on challenging conversations. The company said the model improved at analyzing photos and image uploads, answering STEM questions, and knowing when to tap its own knowledge base or use web search.

Peter Gostev of independent model evaluator Arena explained to VentureBeat in an email that the key result to watch for GPT-5.5 Instant is how it performs on the overall text rankings, especially because its predecessor did not have a strong showing.

"Since GPT-4o, the strongest-performing OpenAI chat model on the Arena has been GPT-5.2-Chat, which still ranks 12th on the Overall Text Arena months after release," Gostev said. Notably, users preferred it even over the higher-reasoning GPT-5.2-High variant, which is currently ranked 52nd on the Arena. "By comparison, GPT-5.3-Chat, the previous default model in ChatGPT, was significantly less competitive, ranking 44th overall, 32 places below GPT-5.2-Chat."

What enterprises need to do about memory sources

Organizations that rely on ChatGPT for some tasks will need to formalize how memory works for their stack. Memory sources are not limited to GPT-5.5 Instant; the capability is enabled for all models on the ChatGPT platform.

To address the problem of competing memory sources, enterprises have to audit their memory management. Model-reported context could overlap with or contradict their existing retrieval and application logs, so it's best to define a clear source of truth; in the event of a failure, administrators then know which log to believe.

It would also be a good idea to decide whether or not to expose memory sources to users. ChatGPT only shows a select number of chats or files it used to complete a request, and some users may find the added transparency reassuring even though the list is not exhaustive.

Ultimately, the number one thing for enterprises to remember about memory sources is that what the model reports as its context is not the full picture for auditing. It's a form of observability, but it cannot yet support a full audit.

VentureBeat
~10 min read · May 5, 2026

One command turns any open-source repo into an AI agent backdoor. OpenClaw proved no supply-chain scanner has a detection category for it

Just two months ago, researchers at the Data Intelligence Lab at the University of Hong Kong introduced CLI-Anything, a new state-of-the-art tool that analyzes any repo's source code and generates a structured command line interface (CLI) that AI coding agents can operate with a single command. Claude Code, Codex, OpenClaw, Cursor, and GitHub Copilot CLI are all supported, and since its launch in March, CLI-Anything has climbed to more than 30,000 GitHub stars.

But the same mechanism that makes software agent-native opens the door to agent-level poisoning. The attack community is already discussing the implications on X and security forums, translating CLI-Anything's architecture into offensive playbooks.

The security problem is not what CLI-Anything does. It is what CLI-Anything represents. CLI-Anything generates SKILL.md files, the same instruction-layer artifacts that Snyk's ToxicSkills research found laced with 76 confirmed malicious payloads across ClawHub and skills.sh in February 2026. A poisoned skill definition does not trigger a CVE and never appears in a software bill of materials (SBOM). No mainstream security scanner has a detection category for malicious instructions embedded in agent skill definitions, because the category simply did not exist eighteen months ago.

Cisco confirmed the gap in April. "Traditional application security tools were not designed for this," Cisco's engineering team wrote in a blog post announcing its AI Agent Security Scanner for IDEs. "SAST [static application security testing] scanners analyze source code syntax. SCA [software composition analysis] tools check dependency versions. Neither understands the semantic layer where MCP [Model Context Protocol] tool descriptions, agent prompts, and skill definitions operate."

Merritt Baer, CSO of Enkrypt AI and former Deputy CISO at Amazon Web Services (AWS), told VentureBeat in an exclusive interview: "SAST and SCA were built for code and dependencies. They don't inspect instructions."

This is not a single-vendor vulnerability. It is a structural gap in how the entire security industry monitors software supply chains. This is the pre-exploitation window: CLI-Anything is live, the attack community is discussing it, and security directors who act now get ahead of the first incident report.

The integration layer no stack can see

Traditional supply-chain security operates on two layers. The code layer is where SAST works, scanning source files for insecure patterns, injection flaws, and hardcoded secrets. The dependency layer is where SCA works, checking package versions against known vulnerabilities, generating SBOMs, and flagging outdated libraries.

Agent bridge tools like CLI-Anything, MCP connectors, Cursor rules files, and Claude Code skills operate on a third layer between the other two. Call it the agent integration layer: configuration files, skill definitions, and natural-language instruction sets that tell an AI agent what software can do and how to operate it. None of it looks like code. All of it executes like code. Simply enumerating that layer is the first step toward governing it; a minimal sketch follows below.
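As a first concrete step, here is a minimal sketch of what enumerating that layer can look like: walking a repository for the artifact types named above. The glob patterns are assumptions about common naming conventions, not an exhaustive or authoritative list.

```python
# Minimal sketch: inventory agent-integration-layer artifacts in a repository.
# The glob patterns are assumptions about common naming conventions and are not
# exhaustive; a real inventory should be driven by the tools actually in use.
from pathlib import Path

AGENT_LAYER_PATTERNS = [
    "**/SKILL.md",          # agent skill definitions
    "**/.cursorrules",      # Cursor rules files
    "**/.cursor/rules/*",   # newer Cursor rules layout (assumed path)
    "**/mcp.json",          # MCP server configuration (assumed filename)
    "**/CLAUDE.md",         # Claude Code project instructions
]

def inventory(repo_root: str) -> list[Path]:
    root = Path(repo_root)
    found = []
    for pattern in AGENT_LAYER_PATTERNS:
        found.extend(root.glob(pattern))
    return sorted(set(found))

if __name__ == "__main__":
    for path in inventory("."):
        print(path)
```

An inventory like this does not judge intent; it only makes the layer visible, which is the precondition for the controls discussed later in the piece.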
Carter Rees, VP of AI at Reputation, told VentureBeat in an exclusive interview: "Modern LLMs [large language models] rely on third-party plugins, introducing supply chain vulnerabilities where compromised tools can inject malicious data into the conversation flow, bypassing internal safety training."

Researchers at Griffith University, Nanyang Technological University, the University of New South Wales, and the University of Tokyo documented the attack chain in an April paper, "Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems." The team introduced Document-Driven Implicit Payload Execution (DDIPE), a technique that embeds malicious logic inside code examples within skill documentation. Across four agent frameworks and five large language models, DDIPE achieved bypass rates between 11.6% and 33.5%. Static analysis caught most samples, but 2.5% evaded all four detection layers. Responsible disclosure led to four confirmed vulnerabilities and two vendor fixes.

The kill chain security leaders need to audit

Here is the anatomy of the kill chain:

1. An attacker submits a SKILL.md file to an open-source project containing setup instructions, code examples, and configuration templates. It looks like standard documentation. A code reviewer would wave it through because none of it is executable. But the code examples contain embedded instructions that an agent will parse as operational directives.

2. A developer uses an agent bridge tool to connect their coding agent to the repository. The agent ingests the skill definition and trusts it, because no verification layer exists to distinguish benign from malicious intent at the instruction level.

3. The agent executes the embedded instruction using its own legitimate credentials. Endpoint detection and response (EDR) sees an approved API call from an authorized process and passes it. Data exfiltration, configuration changes, and credential harvesting all move through channels that the monitoring stack considers normal traffic.

Rees identified the structural flaw that makes this chain lethal. "A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions," he told VentureBeat. A compromised skill definition riding that flat authorization plane does not need to escalate privileges. It already has them. Every link in that chain is invisible to the current security stack.

Pillar Security demonstrated a variant of this chain against Cursor in January 2026 (CVE-2026-22708). Implicitly trusted shell built-in commands could be poisoned through indirect prompt injection, converting benign developer commands into arbitrary code execution vectors. Users saw only the final command. The poisoning happened through other commands the IDE never surfaced for approval.

The evidence is already in production

In a documented attack chain from April 2026, a crafted GitHub issue title triggered an AI triage bot wired into Cline. The bot exfiltrated a GITHUB_TOKEN, which the attacker used to publish a compromised npm dependency that installed a second agent on roughly 4,000 developer machines for eight hours. There was just one issue title. Attackers had eight hours of access. No human approved the action.

Snyk's ToxicSkills audit scanned 3,984 agent skills from ClawHub, the public marketplace for the OpenClaw agent framework, and skills.sh in February 2026. The results: 13.4% of all skills contained at least one critical security issue.
Daily skill submissions jumped from less than 50 in mid-January to more than 500 by early February. The barrier to publishing was a SKILL.md markdown file and a GitHub account one week old. No code signing. No security review. No sandbox. OpenClaw is not an outlier. It is the pattern.

"The bar to entry is extremely low," Baer said. "Adding a skill can be as simple as uploading a Word doc or lightweight config file. That's a radically different risk profile than compiled code." She pointed to projects like ClawPatrol that have started cataloging and scanning for malicious skills, evidence that the ecosystem is moving faster than enterprise defenses.

The ClawHavoc campaign, first reported by Koi Security in late January 2026, initially identified 341 malicious skills on ClawHub. A follow-up analysis by Antiy CERT expanded the count to 1,184 compromised packages across the platform. The campaign delivered Atomic Stealer (AMOS) through skill definitions with professional documentation. Skills named solana-wallet-tracker and polymarket-trader matched what developers actively searched for.

The MCP protocol layer carries similar exposure. OX Security reported in April that researchers poisoned nine out of 11 MCP marketplaces using proof-of-concept servers. Trend Micro initially found 492 MCP servers exposed to the internet with zero authentication; by April, that number had grown to 1,467. As The Register reported, the root issue lies in Anthropic's MCP software development kit (SDK) transport mechanism. Any developer using the official SDK inherits the vulnerability class.

VentureBeat Prescriptive Matrix: Three-layer agent supply-chain audit

VentureBeat developed a Prescriptive Matrix by mapping the three attack layers documented in the research and incident reports above against the detection capabilities of current SAST, SCA, and agent-layer tools. Each row identifies what security teams should verify and where no scanner has coverage today.

Layer 1: Code
Threat: Prompt injection in AI-generated code.
Current detection: SAST scanners.
Why it misses: Most SAST tools have no detection category for prompt injection in AI-generated code.
Recommended action: Confirm that SAST scans AI-generated code for prompt injection. If not, have an open vendor conversation this quarter.

Layer 2: Dependencies
Threat: Malicious MCP servers, agent skills, plugin registries.
Current detection: SCA tools.
Why it misses: SCA generates no AI-specific bill of materials. Agent-layer dependencies are invisible.
Recommended action: Confirm SCA includes MCP servers, agent skills, and plugin registries in the dependency inventory.

Layer 3: Agent integration
Threat: Poisoned SKILL.md files, malicious instruction sets, adversarial rules files.
Current detection: None until April 2026.
Why it misses: No tool inspects the semantic meaning of agent instruction files. Baer: "We're not inspecting intent."
Recommended action: Deploy Cisco Skill Scanner or Snyk mcp-scan. Assign a team to own this layer.

Baer's diagnosis of Layer 3 applies across the entire matrix: "Current scanners look for known bad artifacts, not adversarial instructions embedded in otherwise valid skills." Cisco's open-source Skill Scanner and Snyk's mcp-scan represent the first tools purpose-built for this layer. Where neither is deployed yet, even a crude keyword pass over skill files (sketched below) is better than leaving the layer unexamined.
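For teams with no dedicated tooling yet, a crude first pass is possible with nothing more than pattern matching over skill files. The sketch below flags a few obviously suspicious constructs; the patterns are illustrative assumptions and nowhere near the semantic analysis that purpose-built scanners such as Cisco's Skill Scanner or Snyk's mcp-scan aim to perform.

```python
# Crude heuristic pass over agent skill/instruction files. Illustrative only:
# simple pattern matching cannot catch adversarial intent hidden inside
# valid-looking documentation, which is exactly the gap described above.
import re
from pathlib import Path

SUSPICIOUS_PATTERNS = {
    "shell_pipe_to_interpreter": re.compile(r"curl[^\n]*\|\s*(sh|bash|python)", re.I),
    "base64_decode": re.compile(r"base64\s+(-d|--decode)", re.I),
    "credential_reference": re.compile(r"(GITHUB_TOKEN|AWS_SECRET|API[_-]?KEY)", re.I),
    "override_instruction": re.compile(r"ignore (all|any|previous) instructions", re.I),
    "outbound_url": re.compile(r"https?://[^\s)\"']+", re.I),
}

def scan_file(path: Path) -> list[str]:
    text = path.read_text(errors="ignore")
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)]

def scan_repo(root: str = ".") -> None:
    for path in Path(root).glob("**/SKILL.md"):
        hits = scan_file(path)
        if hits:
            print(f"{path}: review before install -> {', '.join(hits)}")

if __name__ == "__main__":
    scan_repo()
```

An outbound URL or an environment-variable reference is not malicious by itself; the point of a pass like this is to route files to a human reviewer, which mirrors the fallback advice in the action plan of having a second engineer read every SKILL.md before installation.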
Security director action plan

Here's how security leaders can get ahead of the problem.

Inventory every agent bridge tool in the environment. This includes CLI-Anything, MCP connectors, Cursor rules files, Claude Code skills, and GitHub Copilot extensions. If the development team is using agent bridge tools that have not been inventoried, the risk cannot be assessed.

Audit agent skill sources the same way package registries get audited. Baer's framing is precise: "A skill is effectively untrusted executable intent, even if it's just text." Shut off ungoverned ingestion paths until controls are in place.

Stand up a review and allowlisting process for skills. The OWASP Agentic Skills Top 10 (AST01: Malicious Skills) provides the procurement framework to align controls against.

Deploy agent-layer scanning. Evaluate Cisco's open-source Skill Scanner and Snyk's mcp-scan for behavioral analysis of agent instruction files. If dedicated tooling is unavailable, require a second engineer to read every SKILL.md before installation.

Restrict agent execution privileges and instrument runtime. AI coding agents should not run with the same credential scope as the developer who invoked them. Rees confirmed the structural flaw: the flat authorization plane means a compromised skill does not need to escalate privileges. Baer's prescription: "Instrument runtime observability. What data is the agent accessing, what actions is it taking, and are those aligned with expected behavior?"

Assign ownership for the gap between layers. The most dangerous attacks succeed because they fall between detection categories. Assign a team to own the agent integration layer, reviewing every SKILL.md, MCP config, and rules file before it enters the environment.

The gap that already has a name

Baer underscored the dangers of this new attack vector. "This feels very similar to early container security, but we're still in the 'we'll get to it' phase across most orgs," she said. She added that, at AWS, it took a few high-profile wake-up calls before container security became table stakes. The difference this time is speed. "There's no build pipeline, no compilation barrier. Just content," she said.

CLI-Anything is not the threat. It is the proof case that the agent integration layer exists, that it is growing fast, and that the attacker community has already found it. The 33,000 developers who starred the repository are telling security teams where software development is heading.

Eighteen months ago, the detection category for agent-integration-layer poisoning did not exist. Cisco and Snyk shipped the first tools for it in April. The window between those two facts is closing. Security directors who have not begun inventory are already behind.