Startups news and articles

Sheryl Sandberg leads $10 million investment in AI-powered vehicle inspection service

The startup, founded in 2021, lets enterprise customers use smartphones to scan and spot vehicle damage.

Why AMI Labs’ Alexandre LeBrun won’t call his AI ‘AGI’ or ‘superintelligence’

While everyone in AI is chasing "superintelligence," Alexandre LeBrun, CEO of Yann LeCun’s world model startup, AMI Labs, dismisses the word.

Lululemon backs nylon-recycling startup Syntetica in $30M Series A

Syntetica, a French startup that has developed a novel approach to recycling nylon, has already obtained big-name partners and investors.

Applied Computing wants to give oil and gas operators an AI model for the entire plant

Applied Computing has raised a $20M Series A to build a foundation AI model for the oil, gas and petrochemical industry.

~13 min read · Jul 15, 2026

Agentic orchestration: Enterprise AI organizations have a deployment problem, not a platform problem — and most are calling chatbots agents

Across 101 enterprises, agent orchestration is consolidating onto model-provider platforms — Anthropic’s Claude leads by a wide margin — chosen for the gravity of the underlying model and judged on reliable multi-step execution. But the ambition runs well ahead of the reality: most deployed “agents” are still chatbot wrappers, the control plane enterprises expect is deliberately hybrid to avoid lock-in, and real-time fiscal control over token burn remains the exception. This wave of VentureBeat Pulse Research examines enterprise agent orchestration: which platforms enterprises run on, what drives the choice, what they optimize for, how they expect agent control to be structured, and — most revealingly — how orchestrated their deployed “agents” actually are and how tightly they control the cost of running them. The central finding is a gap between orchestration ambition and orchestration reality. Enterprises are consolidating fast onto the major model platforms: Anthropic’s Claude is the primary platform for 40%, more than double any rival, followed by Microsoft (18%) and OpenAI (13%). The choice is driven by “model gravity” — native alignment with a state-of-the-art base model (21%) — and success is judged by reliable, multi-step execution (task completion reliability 32%, multi-step workflow management 28%). Yet asked to assess their portfolios honestly, 71% say a quarter or fewer of their deployed “agents” are true multi-step orchestrated workflows rather than single-prompt chatbot wrappers, and only 10% have crossed the halfway mark. The orchestration layer is being built well ahead of the orchestrated portfolio it is meant to run. That gap shapes the architecture enterprises are putting in place. 
By the end of 2026 a clear majority (51%) expect a hybrid control plane — provider-native plus external orchestration — and only 6% expect to hand control to a provider-managed service, because vendor lock-in (35%) is the risk they fear most if control lives inside a model provider. Investment follows the build-out: agent workflow tooling leads the spend (34%), with security and permissions enforcement (25%) behind. And fiscal control lags throughout — more than a quarter (27%) have no real-time way to stop a runaway agent before the bill arrives.

Methodology

VentureBeat fielded this survey as part of its ongoing Pulse Research series; this instrument focused on enterprise agent orchestration. Responses are filtered to organizations with 100 or more employees (n=101), drawn from a single June 2026 wave; because this is one wave rather than a pooled multi-month sample, the report reads cross-sectionally and does not infer month-over-month trends. By organization size the sample is spread evenly across the enterprise bands: 100–499 employees, 2,500–9,999, and 50,000+ (21% each), with 10,000–49,999 and 500–2,499 (19% each). By role it is senior and buyer-credible: product and program managers (15%), CIO/CTO/CISO (13%), consultants and advisors (13%), and a spread of data, AI, and engineering directors and VPs, with an “Other” function at 18%. On purchasing, 81% are recommenders, influencers, or final decision-makers for AI solutions (66% recommender/influencer, 15% final decision-maker). Technology/Software is the largest industry at 44%, followed by Financial Services (17%) and Healthcare/Life Sciences (8%). At 101 respondents the sample is robust enough to read directionally with reasonable confidence, though it remains self-selected and is not a probability sample.

Finding 1: Orchestration runs on model-provider platforms

Anthropic’s Claude leads; open frameworks are marginal

We asked which agent orchestration platform enterprises primarily use today.
The answer concentrates on the major model providers — and on one in particular. A note on reading these shares. As described in the methodology section, the respondents are self-selected, and this question asked them for a single primary platform — so the figures measure which platform leads each enterprise's deployment, within a self-selected audience of AI-active technical decision-makers. A sample built this way can diverge substantially from spend-weighted market measures, and each VB Pulse survey draws its own sample with its own company-size mix, so vendor figures should not be compared across our surveys either. Read these shares as a portrait of where this cohort has placed its primary orchestration bet today, rather than as market share. The model platforms dominate. Anthropic, Microsoft, OpenAI, Google, and Amazon together account for roughly 80% of deployments (81 of 101), while the open frameworks (LangChain/LangGraph) and custom in-house builds that anchor engineering discussion sit in single digits. Anthropic’s lead — 40%, more than double the next platform — mirrors the “model gravity” selection logic in Finding 2: enterprises are choosing the orchestration layer that comes with the model they want to build on. As with the security vendors in the prior agent-security wave, the tools that define the category in technical circles are not yet where enterprise deployment concentrates. A small 3% are not orchestrating at all. Respondents rate the platforms they run at 3.94 out of 5 overall (109 answered), with “value for money” specifically at 3.94 and “ease of implementation” the weakest score, at 3.85 — placing orchestration near the bottom of our five-tracker satisfaction range, ahead of only evaluation tooling. 
A rating just under 4 out of 5, from users of whom 96% plan to change their orchestration approach within the year, reads as provisional acceptance: the platforms work well enough to run today, and not well enough to stop the search for something better. The ratings sit alongside near-universal intent to change; this is a layer enterprises tolerate more than they love.

Finding 2: Model gravity drives platform selection

The base model, not the tooling, decides the platform

We asked what most influenced the orchestration platform choice. The single largest factor is the pull of the underlying model — though flexibility and ease of development follow close behind. Model gravity leading is the selection-side explanation for Anthropic’s platform lead: enterprises pick the orchestration environment closest to the frontier model they have standardized on. But the next tier complicates the picture — flexibility across models and tools (17%) and ease of development (17%) say enterprises also want to avoid being trapped by that choice, foreshadowing the lock-in fear in Finding 6. Security and permissions (14%) and total cost of ownership (11%) round out a pragmatic buying logic. Performance (latency/memory) sits last at 4%, a reminder that at this stage of adoption the binding constraints are model fit and optionality, not raw speed.

Finding 3: The job is reliable multi-step execution

Enterprises judge orchestration by whether it completes the work

We asked what enterprises optimize for — their primary success metric for orchestration. Reliability and multi-step workflow management dominate; developer- and user-facing metrics trail. Task completion reliability (32%) and multi-step workflow management (28%) together account for 59% of responses (60 of 101): orchestration succeeds, in the enterprise view, when it reliably carries a task through multiple steps to completion.
Developer productivity (17%) matters but is secondary — the inverse of its prominence in framework discussion — and end-user experience (9%) is a minor concern, consistent with orchestration being an internal execution problem rather than a UX one. This reliability-first standard is exactly what makes the Chatbot Trap finding so pointed: enterprises define success as dependable multi-step execution, yet most of their deployed “agents” do not yet do multi-step work at all.

The trap is not evenly distributed. Splitting the sample by organization size, 77% of smaller enterprises say a quarter or fewer of their agents do true multi-step work, against 62% of larger ones. Larger enterprises are meaningfully further into genuine multi-step deployment; the chatbot trap is, directionally, a mid-market condition.

Finding 4: Consolidate, productionize, and build in-house

Three strategic moves are nearly tied for the year ahead

We asked what major change enterprises anticipate in their orchestration strategy over the next 12 months. Three moves cluster at the top, almost evenly split. The top three — building in-house control (25%), standardizing on one framework (24%), and moving agents from sandbox to production (23%) — are statistically indistinguishable and tell a single story: enterprises are moving from experimentation to operational consolidation. They want fewer frameworks, more production exposure, and more ownership of the control layer; only 4% expect no change. The appetite for custom in-house control planes is notable alongside the platform concentration in Finding 1 — enterprises are standardizing on model-provider platforms while simultaneously planning to wrap them in control logic they own, the hybrid posture that Finding 6 makes explicit.

Finding 5: Investment flows to workflow tooling

Tooling and permissions lead the spend; monitoring trails

We asked which orchestration-related investment will grow most next year.
Agent workflow tooling leads, with security and permissions enforcement behind. Workflow tooling leading (34%) is the budget-side expression of the reliability-and-multi-step priority in Finding 3: the money is going to the machinery that strings steps together dependably. Security and permissions enforcement (25%) and scaling infrastructure (20%) follow — the investments required to take agents from sandbox into production, the strategic move in Finding 4. Monitoring and debugging draws a smaller 11%, with another 11% reporting flat budgets. The weight on tooling, permissions, and scaling over pure observability signals that enterprises are spending to build and harden orchestration, not merely to watch it run.

Finding 6: The control plane will be hybrid — and lock-in is why

Enterprises expect to split control between providers and their own layer

We asked where enterprises expect the primary control plane for agents to live by the end of 2026, and what worries them most if that control sits inside a model-provider platform. A clear majority expect a hybrid model — and vendor lock-in is the reason. Hybrid control is the dominant expectation by a wide margin (51%), and only 6% expect to hand control to a provider-managed service outright. Read together, the hybrid, custom, and externally-abstracted options — every architecture that keeps control at least partly outside the provider — sum to 88% (89 of 101). The reason surfaces directly when we asked about the risk of provider-resident control: vendor lock-in leads at 35% (35 of 101), ahead of security and permissioning limitations (28%) and inflexibility across models and tools (21%). The pattern echoes the prior wave’s “don’t trust the model to police itself” posture — here, enterprises will build on a provider’s platform but decline to be governed entirely by it. The hybrid control plane is the architectural hedge against the lock-in they most fear.
The June preference for a hybrid control plane marks a shift from the earlier wave. In the April–May survey (n=145), only 34% expected a hybrid control plane, and a larger share (12%) expected to hand control fully to a provider-managed service. These two snapshots don’t yet measure a confirmed longitudinal trend — but the direction of the conversation is unambiguous: toward keeping control.

Lock-in is also a new arrival as a top concern. In the April–May wave, the leading concern was security and permissioning limitations (32%), with lock-in second at 24%; by June the two had traded places. The worry about provider platforms appears to be maturing from whether they can be secured to whether they can be replaced.

Finding 7: The chatbot trap — most “agents” aren’t agents yet

Enterprises admit most deployments are still chatbot wrappers

We asked enterprises to assess their portfolios honestly: what share of their deployed “agents” are true multi-step orchestrated workflows versus simple single-prompt chatbot wrappers. The answer is the defining finding of this wave. This is the gap at the center of the report. Combining the bottom two bands, 71% of enterprises (72 of 101) say a quarter or fewer of their deployed “agents” are genuinely orchestrated — and just 10% (10 of 101) have crossed the halfway mark. The ambition documented in the earlier findings — model-provider platforms, reliability-first success metrics, production rollouts, a deliberate control architecture — runs well ahead of the deployed reality, which remains overwhelmingly single-prompt assistants dressed as agents. This is less a contradiction than a roadmap: the platforms, budgets, and strategies are being put in place precisely because the orchestrated portfolio is still so thin. The open question for later waves is how fast the reality closes on the ambition.
Finding 8: Fiscal control is still reactive

Only a minority can stop a runaway agent before the bill arrives

Finally, we asked how enterprises enforce fiscal control over agent token consumption — the risk that an autonomous loop exhausts a budget before anyone intervenes. Most rely on native caps or after-the-fact monitoring; real-time programmatic control is the exception. More than a quarter of enterprises (27%) admit they have no real-time, programmatic way to stop an agent before a budget-breaking bill arrives — they learn of it from the logs afterward. Another 32% lean entirely on the native caps and throttles built into their primary platform, a control only as good as the provider’s tooling and one that ties back to the lock-in concern of Finding 6. The enterprises building custom gateways (23%) or exploiting cross-model routing to arbitrage cost (19%) are the ones treating token burn as an engineering problem to be controlled deterministically. As with orchestration maturity, fiscal control is an area where the operational reality lags the ambition: agents are moving toward production faster than the cost-control plane around them is being built.

A split also appears by company size: roughly one in three enterprises under 2,500 employees (34%) exercises only reactive control of agent spend, against 20% of larger enterprises — directional figures, but consistent with the chatbot-trap split. The mid-market is running the least mature agents on the least instrumented budgets.

The bottom line: The layer is real; most of the agents aren't yet

Organizations with 100 or more employees describe an orchestration strategy that is consolidating quickly and maturing slowly. They are standardizing on model-provider platforms — Anthropic’s Claude leads at 40% — chosen for the gravity of the underlying model, and they judge success by reliable multi-step execution.
Investment is flowing to workflow tooling and permissions, the strategy is to consolidate frameworks and push agents into production, and the control plane they expect is deliberately hybrid, because vendor lock-in is the risk they fear most. But the honest self-assessment punctures the ambition. Seventy-one percent say a quarter or fewer of their deployed “agents” are truly orchestrated, only 10% are past the halfway mark, and more than a quarter cannot stop a runaway agent in real time. The orchestration layer — the platforms, the budgets, the control architecture — is being built ahead of the orchestrated portfolio it is meant to run. At 101 respondents in a single June wave this reads as a clear directional signal rather than a precise measurement: enterprises have decided how they want to orchestrate agents well before most of their agents are doing anything an orchestration layer is for. The question for subsequent waves is whether the deployed reality closes the gap on the ambition — or whether the chatbot trap proves stickier than the roadmap assumes. Based on survey responses from 101 qualified enterprise respondents (100+ employees), drawn from a single June 2026 wave. Because this is one wave rather than a pooled multi-month sample, results read directionally rather than as a confirmed trend. Respondents include product and program managers, CIOs, CTOs and CISOs, consultants and advisors, and directors and VPs of data, AI, and engineering, across Technology/Software, Financial Services, Healthcare, and other sectors.
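As an illustration of what Finding 8 calls real-time, programmatic fiscal control, the following is a minimal sketch of a hard token cap checked before each agent step. It is not drawn from any surveyed enterprise or vendor: `TokenBudget`, `BudgetExceeded`, `run_agent`, the 10,000-token cap, and the fixed per-step cost are all hypothetical, and a real gateway would have to estimate a step's cost before the call or reconcile it afterward.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when the next call would push spend past the hard cap."""


@dataclass
class TokenBudget:
    # Hypothetical guard: the cap and per-call accounting are assumptions.
    max_tokens: int
    used: int = 0

    def charge(self, tokens: int) -> None:
        # Refuse before recording the spend, so the cap is never crossed.
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if self.used + tokens > self.max_tokens:
            remaining = self.max_tokens - self.used
            raise BudgetExceeded(f"step needs {tokens} tokens, {remaining} left")
        self.used += tokens


def run_agent(budget: TokenBudget, step_costs: list[int]) -> int:
    """Run steps until done or the budget blocks one; return steps completed."""
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)  # pre-flight check standing in for a gateway
        except BudgetExceeded:
            break  # halt the loop now instead of finding out from the logs
        completed += 1
    return completed


budget = TokenBudget(max_tokens=10_000)
# Stand-in for a runaway loop: fifty steps that each ask for 3,000 tokens.
steps_done = run_agent(budget, [3_000] * 50)
print(steps_done, budget.used)  # 3 9000
```

The check runs before the spend is recorded, which is the difference between the reactive posture the survey describes (reading the logs afterward) and a deterministic stop.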

~11 min read · Jul 15, 2026

Thinking Machines open sources first multimodal language model, Inkling, focused on low cost and 'resistance to censorship'

Enterprises looking to move more of their agentic AI workloads to open weights models they can customize, control and run on-premises or in virtual private clouds have a strong new contender to consider. Today, Thinking Machines—the highly capitalized American AI startup founded by former OpenAI CTO Mira Murati—released Inkling, its first major language model under an enterprise-friendly Apache 2.0 open source license. It posts high, if sub-state-of-the-art, performance for open weights models on third-party benchmarks, specifically software engineering (77.6% on SWE-bench Verified, where it beats fellow U.S. open rival Nvidia Nemotron 3's 71.9%) and voice understanding (91.4% on VoiceBench compared to 94.4% for Gemini 3.1 Pro on high reasoning effort). Another differentiator: Thinking Machines notes that Inkling was designed "to answer directly on topics that may be subject to censorship," offering enterprises concerned about factual outputs, irrespective of controversy or sensitivity, a more trustworthy option.

Coming in at 975 billion total parameters, Inkling is a natively multimodal, open-weights Mixture-of-Experts (MoE) system capable of reasoning across text, images, and audio. The weights are already available on Hugging Face and the company's own model training application programming interface (API), Tinker. Designed to balance cost against performance through a novel "controllable thinking effort" mechanism, the model represents a significant departure from the black-box scaling strategies of frontier competitors. Alongside the flagship model, Thinking Machines also announced a preview of Inkling-Small, a lighter 276-billion-parameter alternative optimized for workloads where low latency and cost are paramount.

Benchmarks Show a Powerful, High-End, Sub State-of-the-Art Model

While Inkling is a formidable multimodal engine, it lands in a fiercely competitive 2026 open-weight landscape characterized by highly specialized MoE architectures.
Rather than attempting to dominate every leaderboard, Thinking Machines explicitly designed Inkling—with 975 billion total and 41 billion active parameters—as a broad, balanced generalist. For example, it lands in the upper-middle range with a score of 1257 on Design Arena’s Agentic Web Dev leaderboard, which measures human ratings of frontend web design. But China’s leading AI labs have produced models with elite reasoning and coding capabilities, posing a stiff challenge to Inkling's generalist approach and ultimately outperforming it on general and coding benchmarks.

GLM 5.2: Widely considered the top open-weight reasoning model available in the benchmark set, GLM 5.2 outperforms Inkling on pure coding, agentic, and complex reasoning tasks. It scores 62.1% on SWEBench Pro (Public) compared to Inkling’s 54.3%, and a massive 82.7 on Terminal Bench 2.1 against Inkling’s 63.8. GLM 5.2 also holds the edge in text-only reasoning, scoring 40.1% on HLE (text only) versus Inkling's 30.0%.

DeepSeek V4 Pro: DeepSeek maintains an edge in several strict coding and factuality domains, beating Inkling on SWEBench Verified (80.6% vs. 77.6%) and SimpleQA Verified (57.0% vs. 43.9%). However, Inkling successfully overtakes DeepSeek V4 Pro in mathematical problem-solving, achieving 97.1% on AIME 2026 compared to DeepSeek's 96.7%.

Kimi K2.6: This model outpaces Inkling across multiple technical benchmarks, delivering higher scores on GPQA Diamond (91.1% vs. 87.9%), BrowseComp (83.2% vs. 77.1%), and HLE with tools (54.0% vs. 46.0%). Yet Inkling proves more resilient on general chat instruction following, scoring 79.8% on IFBench compared to Kimi K2.6's 76.0%.

Against its primary U.S.-based open-weight competition, Inkling demonstrates strong parity and frequent superiority.

Nemotron 3 Ultra: Inkling consistently outperforms this U.S. rival across reasoning and coding.
Inkling posts 97.1% on AIME 2026 and 77.6% on SWEBench Verified, beating Nemotron's 94.2% and 70.7%, respectively. Furthermore, Inkling significantly leads in agentic workflows, scoring 74.1% on MCP Atlas against Nemotron's 44.7%.

When compared to closed-source juggernauts like Claude Fable 5, GPT 5.6 Sol, and Gemini 3.1 Pro, Inkling trails in peak reasoning and software engineering autonomy, but remains highly competitive in multimodality.

Coding and Reasoning: Closed models maintain a commanding lead. Claude Fable 5 (max) hits 95.0% on SWEBench Verified and 53.3% on HLE (text only), far outpacing Inkling's 77.6% and 30.0%. GPT 5.6 Sol dominates Terminal Bench 2.1 with an 89.5, easily clearing Inkling's 63.8.

Native Multimodality: Inkling's native visual and audio capabilities hold their own. On the MMMU Pro (Standard 10) vision benchmark, Inkling's 73.3% is competitive, though trailing Claude Fable 5's 84.2% and GPT 5.6 Sol's 83.0%. In audio processing, Inkling scores a highly respectable 77.2% on MMAU, keeping it within striking distance of Gemini 3.1 Pro's 82.5%.

If an enterprise workflow demands elite software engineering autonomy or the highest bounds of text-only reasoning, models like GLM 5.2 or proprietary systems like Claude Fable 5 maintain the edge. However, Inkling carves out a unique and highly defensible position: it is the most capable open-weight foundation model that natively fuses text, vision, and audio, while simultaneously offering developers direct programmatic control over the cost-to-performance ratio.

The Shift from Static Reasoning to Controllable Thinking

Rather than attempting to build a singular "god model" optimized strictly for state-of-the-art benchmark domination, Thinking Machines engineered Inkling for adaptability and efficiency in real-world workflows. The standout feature of this release is Inkling's "controllable thinking effort."
Developers can programmatically adjust the model's reasoning budget—scaling from 0.2 to 0.99—to dictate how hard the AI should "think" before generating an output. As the company noted, "Inkling's continuous thinking effort lets you pick your point on the cost/performance curve—reaching the same score with a fraction of the tokens". In practical terms, this allows enterprises to deploy Inkling with lower token expenditure for simpler tasks, while cranking up the compute overhead for complex, multi-step reasoning challenges. By keeping the thinking effort low and generating fewer tokens, a cost-conscious enterprise can still get high-quality results on simple tasks while spending less money or, for those running models locally, less energy and compute.

During the model’s large-scale reinforcement learning (RL) training over 30 million rollouts, researchers observed an emergent phenomenon they called "chain of thought condensation". Over time, Inkling naturally learned to compress its internal reasoning steps—dropping grammatical overhead and connectives—while reaching the same accurate conclusions, resulting in drastically reduced latency.

Epistemics and Censorship Resistance

A notable element of Thinking Machines' release is its explicit focus on the model's epistemics—specifically its calibration, instruction following, and resistance to censorship. In an ecosystem where open-weight models adopt either overly restrictive safety guardrails or echo state-aligned ideological talking points, Inkling was intentionally trained to answer directly on politically sensitive or heavily censored topics. To validate this approach, Thinking Machines submitted Inkling to the Propaganda and Censorship Eval developed by AI startup Cognition.
According to the published findings, Inkling demonstrated "strong patterns of censorship non-compliance," effectively resisting ideological capture or boilerplate refusals when presented with sensitive subjects.

Despite its resistance to censorship, the model maintains a robust defense against genuinely malicious, dangerous, or illegal queries. On the StrongREJECT benchmark—which tests responses to unambiguous harmful requests—Inkling scored 98.6%, placing it in line with strict frontier safety standards. Furthermore, on the FORTRESS benchmark, Inkling successfully navigated the line between safety and over-refusal: it achieved a 78.0% refusal rate on adversarial queries (such as those involving weapons, cyberattacks, or violence) while maintaining a 95.9% compliance rate on benign, look-alike queries.

Thinking Machines noted that typical open-weight vulnerabilities remain within the architecture. Internal safety evaluations revealed an "occasional tendency to comply with role-play and indirectly framed prompts concerning harmful topics". The company advised enterprise developers to treat the model's built-in refusals as just one layer of security, recommending the downstream deployment of external moderation tools—such as Llama Guard—to filter adversarial jailbreaks and enforce use-case-specific safety policies at the application level.

Under the Hood: Architecture and Multimodality

Inkling's scale is staggering, yet sparse. The MoE architecture features 975 billion total parameters, but only 41 billion parameters are active during any given token generation. It supports a massive context window of 1 million tokens and diverges from typical transformer models by using relative positional embeddings instead of the industry-standard Rotary Positional Embedding (RoPE). True to the company's foundational vision, Inkling was trained from scratch to be natively multimodal.
Unlike models that rely on bolted-on external encoders, Inkling uses an encoder-free early fusion approach. It directly ingests audio as discrete dMel spectrograms and visual data as 40x40 pixel patches via a hierarchical multi-layer perceptron (hMLP), projecting all modalities into a shared hidden space.

Licensing: True Open-Source for the Enterprise

For enterprise IT teams and developers, the most disruptive aspect of Inkling may be its licensing. Inkling is released under the permissive Apache 2.0 license. In an ecosystem where many so-called "open" models from Western labs are tethered to dual-use commercial licenses, acceptable use restrictions, or revenue caps, an Apache 2.0 designation makes Inkling a true open-source foundation. This gives developers the legal freedom to download, modify, integrate, and commercialize the model weights entirely royalty-free. The model is readily deployable across major open-source inference libraries—including SGLang, vLLM, TokenSpeed, and llama.cpp—and comes with a native NVFP4 quantized checkpoint optimized for NVIDIA Blackwell systems.

Community Reactions: The Engineering Feat

The AI community's response has been swift, praising both the model's openness and the underlying engineering execution. In a post on X, Thinking Machines co-founder John Schulman reflected on the rapid development cycle: "Inkling is out today, with open weights and in Tinker. It's been fun to watch this one come together: pretraining began last winter, and starting in mid-January a small team built up the coding, reasoning, and agentic training from there. We learned a lot building it, and I hope people find good uses for it."

Horace He, a researcher at Thinking Machines (previously from PyTorch), underscored the difficulty of the task in another post on X: "It truly takes a village to release a model, perhaps especially an open weights model.
Actually doing the entire process from scratch, from data to pretraining to posttraining to actual release, gives a lot of appreciation for anyone who does it!"

The broader open-source ecosystem has also embraced the technical integrations. Lysandre Debut, the Chief Open-Source Officer at Hugging Face, shared his enthusiasm regarding the model's optimization in his own X post: "One thing I find quite striking is how much easier accelerating models has become... We replaced the model's causal Conv1D with the `causal-conv1d` kernel. One line changed, +4% tokens per second. We then replaced its attention implementation with FlashAttention-4. Another single change, another +11%. That's a total throughput improvement of about 15%, without changing the model architecture or retraining anything."

Tiezhen Wang, an ecosystem growth expert and ex-Googler, celebrated the release as a massive win for the open-source community, listing the model's impressive specifications on X, highlighting its "975B total, 41B active" size, "Native MTP support," and the highly coveted "Apache 2.0 license."

Background: The Road to Inkling

To understand the significance of Inkling, one has to look back at the rapid trajectory of Thinking Machines over the past 18 months. When Mira Murati departed OpenAI in late 2024 to found Thinking Machines alongside industry veterans like John Schulman and Barret Zoph, the stated goal was to pivot away from building isolated autonomous agents. Instead, the company aimed to build flexible, multimodal systems designed for genuine human-AI collaboration and open science.

By July 2025, the startup had secured a historic $2 billion seed round led by Andreessen Horowitz at a $12 billion valuation. At the time, Murati promised the impending release of a product with a "significant open source component" to empower researchers and startups.
The company’s philosophy began coming into sharper focus in October 2025 with the launch of Tinker, a Python-based API for large language model fine-tuning that gave researchers granular control over training pipelines without the friction of distributed compute management. That same month, Thinking Machines researcher Rafael Rafailov delivered a provocative critique of the AI industry at TED AI. He argued that the current trajectory of simply throwing more compute at models was fundamentally flawed, noting that today's systems take shortcuts—like wrapping code in try/except blocks—because they are trained strictly for task completion rather than genuine learning. Rafailov posited that the first artificial superintelligence would not be a "god model," but rather a "superhuman learner" capable of meta-learning and internalizing abstractions. Inkling’s architecture—specifically its controllable thinking effort and its ability to organically compress its chain of thought during RL—feels like the first tangible realization of Rafailov's thesis. In May 2026, the lab teased its technical prowess with the research preview of TML-Interaction-Small, a system that eliminated "turn-based" chat by processing inputs and outputs simultaneously in 200ms chunks. This "full-duplex" breakthrough proved the company could build highly responsive, natively multimodal models from scratch. Now, with Inkling out in the wild, Thinking Machines has delivered on its foundational promises. By offering a massive, natively multimodal model under a true open-source license, they aren't just giving developers a new tool—they are attempting to fundamentally rewrite the economics and accessibility of frontier AI development.
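Two of the figures quoted in this article can be checked with a few lines of arithmetic. The sketch below assumes the two speedups in Lysandre Debut's post stack multiplicatively, which the post does not state; the input numbers themselves come from the article.

```python
# Figures quoted in the article.
conv_gain = 0.04        # +4% from the causal-conv1d kernel swap
attn_gain = 0.11        # +11% from the FlashAttention-4 swap
total_params_b = 975    # total parameters, in billions
active_params_b = 41    # active parameters per token, in billions

# Assumption: independent speedups compound multiplicatively.
combined = (1 + conv_gain) * (1 + attn_gain) - 1

# Share of the MoE's parameters that are active for a given token.
active_fraction = active_params_b / total_params_b

print(f"combined throughput gain: {combined:.1%}")   # 15.4%
print(f"active parameter share: {active_fraction:.1%}")  # 4.2%
```

Under that assumption the combined gain comes to roughly 15.4%, in line with the "about 15%" in the quote, and only about 4.2% of the model's parameters are active per token, which is the sense in which the architecture is described as sparse.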

~1 min read · Jul 15, 2026

Daniel Ek’s body-scanning startup Neko Health raises another $700M

Neko Health has developed proprietary body-scanning technology, which it couples with bloodwork, to assess a person's health.

~4 min read · Jul 15, 2026

Cohere VP says enterprise AI sovereignty requires control of the full agent stack at VB Transform 2026

Hundreds of enterprise leaders and technical experts packed the main ballroom of the luxurious Hotel Nia in Menlo Park this week for VB Transform 2026, the year's preeminent conference on using generative AI agents to drive business outcomes. Rachad Alao, vice president of product engineering at the rising Canadian enterprise AI startup Cohere, joined VentureBeat CEO and editor-in-chief Matt Marshall for a fireside chat about building agentic systems without surrendering sensitive data, infrastructure control, or the ability to change vendors. Alao, who previously led responsible AI and trust and safety engineering teams at Google and Meta, argued that AI sovereignty means more than downloading an open model or running an application behind a corporate firewall. Asked how Cohere defines sovereignty, Alao pointed to organizations operating mission-critical systems, including banks, hospitals and governments. “It is important to have very tight control on where the data resides, have tight control on the AI,” he said, adding that AI operations should take place in jurisdictions an organization understands or directly controls. That extends from GPUs and private-cloud infrastructure through governance systems that route requests among models, as well as the connectors, search tools and agent frameworks acting on enterprise data. “You want to have control on the entire stack,” Alao said.

Agent workloads could outrun falling token prices

Marshall challenged one of the central economic arguments for smaller, locally deployed models: Inference prices continue to fall rapidly, potentially weakening the case for optimizing every token. Alao countered that total consumption is climbing even faster as enterprises move from relatively simple chatbots to agents that reason through problems, call tools, search internal systems and take multiple steps before returning an answer.
“Your token utilization is going exponentially up, because you’re dealing with more and more complex agentic use cases,” he said. Those workflows require “a lot of processing, thinking, tools interaction” to complete their objectives, he added. Alao also drew a contrast between providers that bill customers according to token consumption and Cohere’s approach. “If your whole way of charging customers is for token utilization, you want to maximize token utilization,” he said. “We do not sell our models and our platform that way.” Instead, Alao said Cohere tries to help enterprises solve their hardest problems privately and securely while reducing unnecessary model usage. His prescription was straightforward: “Use the right model for the task at hand.” Rather than sending every request to the largest available frontier model, enterprises should route work according to the intelligence required and the sensitivity or regulatory burden attached to the task. Alao cited an unnamed Canadian bank that uses Cohere’s on-premises models for highly regulated workloads, while sending less sensitive tasks requiring greater intelligence through Cohere’s North platform to larger frontier models. “So model routing can become super useful,” he said.

Smaller models for most enterprise work

Asked by an audience member how Cohere’s open-source North Mini Code, released last month, could compete against proprietary coding models, Alao acknowledged that larger frontier models may perform somewhat better on the hardest tasks. But that advantage may not justify using them indiscriminately. “For 80% of the use cases that they needed, this was a lot more effective, a lot cheaper,” Alao said of developers adopting the model. Cohere’s North Mini Code runs on a single Nvidia H100 GPU and targets agentic software engineering, including terminal work, code review and tool use.
The company has also released Command A+, a 218-billion-parameter mixture-of-experts model with only 25 billion parameters active during each generation step. Its compressed four-bit version reduces the hardware required for private deployment, while its Apache 2.0 license gives enterprises broad freedom to operate and modify it.

Search becomes part of the agent

Asked about Cohere’s longstanding work on embeddings and enterprise search, Alao said the field is moving beyond retrieving text and inserting it into a model’s context window. “Today, the state of the art is around multimodal search,” he said. “It’s beyond just the text modality.” Search across documents, images and other forms of information is becoming “an integral component of your agentic workflow,” Alao added, with the model deciding when and how to use retrieval like any other tool. Asked what would persuade enterprises to move beyond bundled AI services from existing cloud providers, Alao returned to data control and portability. “If you’re interested in sovereignty, you want to have more control on your data,” he said. Cohere’s governance layer, he added, lets customers route traffic to appropriate models, “breaking that vendor lock-in concern that a lot of our customers have.”
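
The routing idea Alao describes, choosing a model by the sensitivity of the data and the intelligence the task requires, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Task` fields, the tier names and the difficulty cutoff are hypothetical and do not reflect Cohere's platform or any real product's configuration.

```python
from dataclasses import dataclass

# Hypothetical illustration only: tier names, fields, and the difficulty
# cutoff are assumptions, not any vendor's actual API or policy.

@dataclass(frozen=True)
class Task:
    description: str
    sensitive: bool   # True if regulated or confidential data is involved
    difficulty: int   # 1 (simple) to 5 (hardest reasoning)

def route(task: Task) -> str:
    """Pick a model tier: check data sensitivity first, then difficulty."""
    if task.sensitive:
        # Regulated workloads stay on infrastructure the organization controls.
        return "on_prem_model"
    if task.difficulty >= 4:
        # Less sensitive but demanding work can go to a larger hosted model.
        return "frontier_model"
    # Everything else goes to the smaller, cheaper model.
    return "small_model"

tasks = [
    Task("summarize customer account records", sensitive=True, difficulty=2),
    Task("plan a multi-step refactor of public docs tooling", sensitive=False, difficulty=5),
    Task("rename variables in a helper script", sensitive=False, difficulty=1),
]

for task in tasks:
    print(f"{task.description} -> {route(task)}")
```

Checking sensitivity before difficulty is a deliberate choice in this sketch: a hard task touching regulated data still stays on the controlled tier, mirroring the bank example in the article.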

~13 min read · Jul 14, 2026

1Password moves into AI cost management, betting that token spend is the next enterprise budget crisis

1Password on Tuesday launched AI Spend and Consumption Management, a new capability embedded in its SaaS Manager platform that gives IT and finance teams a unified, real-time view of how their organizations consume and spend on AI services from vendors including Anthropic, Cursor, and OpenAI. The move marks the latest strategic expansion for a company that built its reputation on password management for consumers and, over the past three years, has aggressively repositioned itself as a broader identity security and SaaS governance platform for enterprise buyers. With this release, 1Password is staking a claim in one of enterprise technology's newest and most chaotic budget categories: the consumption-based cost of large language models. "Executives want teams to build faster with AI, but that speed is creating a new kind of spending pressure," Greg Henry, 1Password's chief financial officer, said in an exclusive interview with VentureBeat. "Developers are consuming tokens at a pace that traditional budgets weren't built to manage, and IT and finance teams are being asked to forecast and justify AI investments without a clear view of what's actually driving costs." The product, now in public preview with broad availability planned for fall 2026, connects directly to vendor admin APIs to pull token-level consumption data daily. It normalizes that data across providers into a single dashboard and allows organizations to set vendor-level spend limits, configure threshold-based alerts via Slack and email, and break down usage by team, user, vendor, and model.

Why traditional software budgets can't keep up with AI token pricing

The core challenge 1Password is targeting is structural. Traditional SaaS pricing operates on a per-seat, per-year model that is easy to budget and reconcile. AI pricing does not.
Every API call to Claude, GPT-5.6, or a Cursor-powered coding assistant consumes tokens, and the cost of those tokens varies by model, by input versus output, and by the complexity of the task. A single engineering team running agentic workflows can burn through a prepaid token budget in weeks — and the finance team may not notice until the invoice arrives. Henry drew a sharp analogy to a problem enterprises have already lived through once. "Consumption-based pricing isn't new," he said. "We saw it arrive with cloud infrastructure, and it took years to build the tools and disciplines to manage it. AI is the next version of that shift." That comparison resonates across the industry. When Amazon Web Services, Microsoft Azure, and Google Cloud popularized consumption-based pricing for compute and storage in the 2010s, enterprises initially lacked the tooling to monitor and optimize their cloud bills. That gap spawned an entire FinOps ecosystem — companies like CloudHealth, Spot.io, and Apptio built multi-billion-dollar businesses helping organizations understand what they were spending on cloud and why. Henry is explicitly betting that AI token spend will follow the same trajectory, and that organizations that fail to build visibility now will end up, as he put it, "paying far more than they needed to, for far longer than they should have." The scale of the coming wave lends credibility to that bet. Goldman Sachs has estimated that token consumption from AI agents alone will grow 24 times by 2030, a projection driven by the expectation that autonomous AI systems will increasingly execute multi-step workflows — booking travel, writing and deploying code, managing customer service interactions — that generate vastly more API calls than a human sitting at a chat interface. 
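
To make the pricing mechanics concrete, here is a minimal Python sketch of how a single call's cost depends on the model and on the split between input and output tokens. The model names and per-million-token prices are invented for illustration and are not any vendor's actual rates.

```python
# Hypothetical per-million-token prices in USD. These figures are assumptions
# for illustration only and do not reflect any vendor's real pricing.
PRICES = {
    "large_model": {"input": 10.00, "output": 30.00},
    "small_model": {"input": 0.50, "output": 1.50},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: input and output tokens are priced separately."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

def workflow_cost(model: str, steps: int, input_tokens: int, output_tokens: int) -> float:
    """Cost of an agentic workflow that makes `steps` similar calls."""
    return steps * call_cost(model, input_tokens, output_tokens)

# The same 40-step workflow priced on two different models.
for model in PRICES:
    cost = workflow_cost(model, steps=40, input_tokens=200_000, output_tokens=50_000)
    print(f"{model}: ${cost:,.2f}")
```

Under these assumed prices the identical workflow differs in cost by a factor of 20 between the two models, which is why model choice becomes a budgeting decision rather than a purely technical one.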
How 1Password's new dashboard tracks every token across Anthropic, Cursor, and OpenAI

The new capability extends 1Password SaaS Manager's existing foundation of application discovery, license management, and spend analytics. It is not a standalone product. Existing SaaS Manager customers can activate it by connecting their supported AI vendor API keys, at which point consumption data flows into a dedicated AI Consumption Management dashboard. Henry confirmed that there is no separate product or add-on fee: "AI Spend and Consumption Management is available to all 1Password SaaS Manager customers." The system provides four core functions. First, it aggregates token usage and spend across Anthropic, Cursor, and OpenAI into a single, normalized view — eliminating the need to toggle between three separate vendor dashboards with three different reporting formats. Second, it enables budget controls: organizations can set vendor-level spend limits, configure percentage-based thresholds, and receive automated alerts when prepaid balances approach depletion. Third, it disaggregates consumption by team, user, vendor, and model, allowing finance and IT to understand not just how much is being spent, but where and by whom. Fourth, it situates AI spend within the broader SaaS portfolio, helping organizations see how token costs relate to their total software investment. Notably, the system captures consumption regardless of whether a human or an AI agent generated it. "Token consumption is captured at the API level regardless of whether a human or an agent is generating it," Henry explained. "Organizations get the total consumption picture, including the spikes that agent loops can create, which can be some of the hardest usage to catch before it becomes a problem." That agent-level visibility matters because autonomous AI systems can generate runaway costs in ways that human users typically cannot.
An agentic coding assistant stuck in a retry loop, for example, can consume thousands of dollars in tokens in minutes — with no human in the loop to notice. For now, the product alerts but does not enforce. When asked whether 1Password will eventually give organizations the ability to automatically cut off spending when a threshold is crossed, Henry said the company is "actively evaluating" automatic enforcement but emphasized that visibility must come first: "You can't enforce what you can't see."

The choice of launch partners reveals where enterprise AI budgets are under the most pressure

The decision to start with Anthropic, Cursor, and OpenAI — rather than casting a wider net — reflects where enterprise AI adoption and budget strain are most concentrated right now. Henry said the choice was driven entirely by customer demand. "Anthropic, Cursor, and OpenAI are where we're seeing the highest adoption, and where token consumption can move fast and get ahead of the teams responsible for managing it," he said. The company plans to add additional vendors based on customer demand, API availability, and budget impact, though it has not committed to a specific timeline or vendor list. The inclusion of Cursor alongside the two major foundation model providers is telling. Cursor, an AI-powered code editor that has rapidly gained traction among developers, represents a category of AI tool where consumption is particularly difficult to forecast. Unlike a chatbot interface where a user consciously types a prompt, Cursor integrates AI suggestions directly into the development workflow, generating token consumption continuously as developers write code. That ambient, always-on consumption pattern makes it especially prone to budget overruns. Henry also addressed who inside an organization should actually own this problem — and acknowledged that the honest answer right now is no one.
"When spend is fragmented across vendor dashboards and finance teams are reconciling it monthly, you're always behind," he said. "AI spend can't be treated as a finance-only or IT-only problem." He noted that the pricing differences between models have become significant enough that the choice of which AI model a team uses is now a meaningful financial decision, one that is pulling CFOs into conversations with IT, product, and engineering leaders "in ways they never had to before." Steve May, director of IT at ServiceTrade, a 1Password customer that has been using the capability, said it addressed a concrete planning gap. "Forecasting tools for AI consumption and spend was one of our biggest gaps in planning because we didn't have a reliable way to track it," May said. He added that the visibility has "prevented overages that would have cost far more to fix after the fact." Where 1Password fits in the fast-consolidating SaaS management market 1Password is not the only company racing to solve the AI cost management problem, but the competitive landscape is still fragmented and the category is far from mature. Zylo, a SaaS management platform that Gartner has also recognized as a leader in the space, published its 2026 SaaS Management Index in January showing that AI-native application spend surged 393% year over year in organizations with more than 10,000 employees and 108% overall. Zylo's data also revealed that ChatGPT has become the most expensed application in enterprise environments, highlighting how AI tools are entering organizations through employee credit cards and expense reports — outside formal procurement and governance workflows. Zylo has added its own token-level cost tracking for AI vendors including Anthropic, OpenAI, Cursor, and Perplexity. 
Meanwhile, according to a comparison published by Coommit in May, Vendr — which focuses more on SaaS negotiation than discovery — tracks AI tools at the contract level but does not yet offer consumption-level visibility. And the FinOps Foundation reported in its 2026 State of FinOps survey that 98% of organizations now actively manage AI costs, up from just 31% in 2024. The broader SaaS management market is also consolidating rapidly. In May, Deel acquired Sastrify, a German SaaS management vendor, and began folding it into its HR platform — a signal that SaaS management capabilities are increasingly being absorbed into adjacent enterprise platforms rather than remaining standalone products. 1Password's approach differs from pure-play SaaS management competitors in one important respect: it is building AI cost management on top of an identity security platform, not a FinOps or procurement tool. The company's SaaS Manager product grew out of its 2025 acquisition of Trelica, a UK-based SaaS access management startup whose technology enabled the discovery of unsanctioned applications — so-called shadow IT. As BetaKit reported at the time of that deal, 1Password co-CEO Jeff Shiner described Trelica as "a pioneer in modern SaaS access management" and said the acquisition would accelerate 1Password's Extended Access Management product roadmap by more than a year. CRN noted that Trelica brought more than 300 SaaS integrations to the platform. That identity-first lineage gives 1Password a natural advantage in connecting spend data to specific users and teams — a linkage that matters when the question shifts from "how much are we spending on AI?" to "who is spending it, and is it delivering value?"

From password manager to platform company: 1Password's $6.8 billion bet on enterprise identity

The launch raises a question that Henry addressed head-on: whether a company that started as a consumer password manager can credibly compete in enterprise AI cost management.
"It doesn't feel like a stretch to us. It feels like a natural progression," he said. "For more than 20 years, 1Password has evolved alongside how our customers work. We started by protecting passwords. Then we helped organizations manage secrets, control access, and get visibility into the applications their teams rely on." The company's evolution has been rapid. 1Password raised a $620 million Series C in January 2022 led by ICONIQ Growth, reaching a $6.8 billion valuation — at the time, the largest funding round ever raised by a Canadian company, according to Crunchbase. The round also attracted celebrity investors including Ryan Reynolds, Scarlett Johansson, and Robert Downey Jr. As of early 2025, BetaKit reported that 1Password had surpassed $250 million in annual recurring revenue, with B2B sales accounting for nearly three-quarters of total revenue and the company claiming to be cash-flow positive. In May 2024, 1Password launched Extended Access Management, a platform designed to secure sign-ins across both managed and unmanaged applications and devices. That same year, it acquired Kolide for device trust and, in early 2025, Trelica for SaaS discovery. In June 2026, Gartner named 1Password a Leader in its Magic Quadrant for SaaS Management Platforms. According to 1Password's own blog post on the recognition, its SaaS Manager now supports over 400 integrations and provides visibility into a library of more than 40,000 pre-populated application profiles. Each step has moved the company further from its consumer roots and deeper into enterprise infrastructure. The AI Spend and Consumption Management launch extends that trajectory into financial operations territory — a domain where 1Password will compete not only with SaaS management vendors but potentially with dedicated FinOps platforms and the AI vendors' own billing dashboards. 
Why high AI token consumption doesn't always mean wasted money

Perhaps the most revealing part of Henry's commentary concerns what organizations should actually do with the consumption data once they have it. He pushed back forcefully against the assumption that high token consumption automatically signals waste. "A team burning through tokens may be building something genuinely valuable," he said. "A lower-usage project might not be moving the business forward at all. What matters is whether that consumption is producing enough business value to justify the spend." Henry drew a distinction between personal productivity — "having a bot summarize your meeting or draft a quick email" — and genuine business outcomes. "What organizations need to see is where consumption is actually driving revenue, efficiency, or something that moves the needle." That framing positions AI Spend and Consumption Management not just as a cost-cutting tool but as a decision-support system for AI investment allocation. If a CFO can see that one engineering team's heavy Claude usage is powering a product feature that drives revenue, while another team's OpenAI spend is funding low-value internal automation, the organization can reallocate budget accordingly rather than imposing across-the-board cuts. "When costs rise faster than expected, the instinct is to cut," Henry said. "But most organizations can't yet tell which teams, models, or tools are responsible for the increase, so they end up cutting across the board rather than directing investment toward the AI projects that are actually delivering business value. Blunt cuts on a technology you're counting on for competitive advantage is not a management strategy, it's a missed opportunity."

The next enterprise budget crisis is already here — and it's priced per token

The product's current scope — three vendor integrations, alerting but not enforcement — is clearly a starting point.
Henry signaled that automatic spend limits are on the roadmap and that additional vendor integrations will follow based on customer demand. But the broader trajectory he described suggests 1Password sees this launch as a wedge into a much larger opportunity. "As traditional SaaS products add AI capabilities, their pricing models are going to follow," he said. "Organizations that build visibility and management discipline around consumption now are going to be in a much better position when that happens across the rest of their software portfolio." If Henry is right, the chaos currently confined to AI token budgets is not a temporary growing pain but a preview of how all enterprise software will eventually be priced. A decade ago, companies scrambled to understand their cloud bills. Today, they are scrambling to understand their AI bills. The question is whether the organizations building the dashboards this time around can get ahead of the curve — or whether, as Henry warned, they will end up where so many companies ended up with cloud, realizing too late how much they were overpaying, and for how long. AI Spend and Consumption Management is available now in public preview for 1Password SaaS Manager customers. Broad availability is planned for fall 2026.
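
For readers who want a feel for the mechanics, the alerting behavior described in this article (vendor-level spend limits, percentage-based thresholds, and breakdowns by team) can be approximated in a few lines of Python. This is a sketch under stated assumptions, not 1Password's implementation: the record format, vendor and team names, limits and thresholds are all hypothetical, and the usage records are assumed to be already normalized into a common shape.

```python
from collections import defaultdict

# Hypothetical, already-normalized usage records. Vendor names, team names,
# amounts, limits, and thresholds are invented for illustration.
USAGE = [
    {"vendor": "vendor_a", "team": "platform", "usd": 620.0},
    {"vendor": "vendor_a", "team": "search", "usd": 250.0},
    {"vendor": "vendor_b", "team": "platform", "usd": 120.0},
]
LIMITS = {"vendor_a": 1000.0, "vendor_b": 500.0}  # vendor-level spend limits
THRESHOLDS = (0.5, 0.8, 1.0)                      # fractions of each limit

def spend_by(records, key):
    """Total spend grouped by one field, such as 'vendor' or 'team'."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["usd"]
    return dict(totals)

def alerts(records, limits, thresholds=THRESHOLDS):
    """List (vendor, highest threshold crossed, spend) for vendors over a threshold.

    Like the product as described, this only reports; it does not block spending.
    """
    triggered = []
    for vendor, spent in sorted(spend_by(records, "vendor").items()):
        limit = limits.get(vendor)
        if not limit:
            continue  # no limit configured for this vendor
        crossed = [t for t in thresholds if spent >= t * limit]
        if crossed:
            triggered.append((vendor, max(crossed), spent))
    return triggered

print(spend_by(USAGE, "team"))
for vendor, threshold, spent in alerts(USAGE, LIMITS):
    print(f"ALERT: {vendor} has used {threshold:.0%} or more of its limit (${spent:,.2f})")
```

With the sample data, only the first vendor triggers an alert, at the 80% threshold. A real system would also need to deduplicate alerts and deliver them through a channel such as email or chat.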