AI agents are getting better at writing application logic.
That is no longer the surprising part.
The harder question is what happens after the local prototype works.
The agent can write the app. The hard part is making the app real.
What ScaleMule provides
ScaleMule answers the MenuGen problem, described below, with a single backend product surface: identity, organizations, tenant boundaries, storage, events, background work, policy, audit logs, and production configuration, all exposed through interfaces agents can use directly.
Instead of asking an AI agent to glue together auth, data, storage, queues, usage, environment state, and operational controls across many dashboards, ScaleMule gives the agent one production substrate to build against.
The goal is not to replace every specialized tool. The goal is to make the core backend layer coherent, inspectable, and safe enough for agentic engineering.
Andrej Karpathy's MenuGen story is one of the clearest public descriptions of this gap. MenuGen began as a vibe-coded app: take a photo of a restaurant menu and generate helpful visual explanations of the dishes. The first local version came together quickly. But making the app real meant dealing with much more than generated UI and model calls.
The work moved into the production substrate: deployment, environment variables, API keys, authentication, OAuth, DNS, payments, database state, work queues, rate limits, dev and prod configuration, and service-specific settings pages.
That is the MenuGen problem.
AI can increasingly write the app. The surrounding infrastructure is still designed for humans clicking through dashboards.
The app was not the hard part
The most important detail in the MenuGen story is not that an AI-assisted workflow produced a working application.
It is that the local demo created a false sense of completion.
The product felt close because the visible interface existed. But production software is more than an interface and generated code. A real app needs durable identity, permission boundaries, data persistence, background work, billing, secure configuration, auditability, and deployment state.
Today, much of that state lives across separate services.
One dashboard owns deployment. Another owns authentication. Another owns OAuth. Another owns payments. Another owns storage. Another owns background jobs. Another owns rate limits. Another owns environment variables. Another owns production keys. Another owns team and security settings.
For a human developer, this is annoying but familiar.
For an AI agent, it is a broken surface.
The agent can modify files, run commands, and reason over text. But the critical production state often lives behind browser flows, nested settings pages, copied secrets, inconsistent documentation, and interfaces designed for humans in a pre-agent world.
Dashboard-native infrastructure does not compose well with agents
Most modern infrastructure was designed around a human operator.
That made sense for the last era of software. Developers read docs, opened tabs, clicked through settings, copied keys, configured DNS, pasted environment variables, and stitched together products across specialized SaaS tools.
Agentic engineering changes the interface requirement.
If an AI agent is expected to build, deploy, operate, and improve software, the infrastructure has to expose itself through surfaces agents can use reliably:
- structured APIs
- deterministic CLIs
- Markdown-native documentation
- clear configuration contracts
- inspectable state
- safe defaults
- scoped permissions
- audit trails
- environment-aware provisioning
- production guardrails
The issue is not that Vercel, Clerk, Stripe, Supabase, Google Cloud, or other services are weak products. They solve important pieces of the stack.
The issue is that the agent has to operate across too many fragmented human interfaces to make a product real.
The next infrastructure layer needs to reduce that fragmentation.
From vibe coding to production systems
Vibe coding lowers the cost of creating software-shaped artifacts.
Agentic engineering raises the question of how those artifacts become reliable systems.
That requires a different kind of backend layer.
Not just hosting.
Not just auth.
Not just a database.
Not just a queue.
Not just billing.
Not just logs.
The missing layer is a production substrate that combines the core backend primitives an AI-generated product needs, while exposing them through interfaces agents can operate directly.
That is the category ScaleMule is building toward.
ScaleMule is our answer to this infrastructure gap. We are building agent-native backend infrastructure for AI and API products: identity, tenant isolation, data, storage, events, functions, policy, auditability, and operational controls in one layer designed for the agent era.
The goal is simple:
The agent writes the product logic. ScaleMule carries the production substrate.
What an agent-native backend needs
An agent-native backend should not require the user to assemble production software through ten disconnected dashboards.
It should provide the primitives that most AI and API products need from the beginning.
Identity and access
Agents need a reliable way to create applications with users, organizations, roles, permissions, API keys, service accounts, and non-human actors.
Authentication is not enough. Agent-generated products need identity systems that understand tenants, teams, applications, and delegated actions.
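To make the distinction between authentication and identity concrete, here is a minimal sketch in Python. It is an illustration only, not ScaleMule's API: the `Actor` type, the `ActorKind` values, the scope strings, and `is_allowed` are all hypothetical names chosen for this example. It shows one way a non-human actor could carry a tenant, a set of scopes, and a record of who it acts for.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, Optional


class ActorKind(Enum):
    """Hypothetical actor kinds; a real system may define more."""
    USER = "user"
    SERVICE_ACCOUNT = "service_account"
    AGENT = "agent"


@dataclass(frozen=True)
class Actor:
    actor_id: str
    kind: ActorKind
    tenant_id: str
    scopes: FrozenSet[str] = field(default_factory=frozenset)
    on_behalf_of: Optional[str] = None  # delegating user, if any


def is_allowed(actor: Actor, tenant_id: str, scope: str) -> bool:
    """Allow an action only inside the actor's own tenant and granted scopes."""
    return actor.tenant_id == tenant_id and scope in actor.scopes


# An agent acting for a user, limited to two scopes in one tenant.
agent = Actor(
    actor_id="agent-1",
    kind=ActorKind.AGENT,
    tenant_id="acme",
    scopes=frozenset({"menus:read", "jobs:create"}),
    on_behalf_of="user-42",
)

print(is_allowed(agent, "acme", "menus:read"))     # same tenant, granted scope
print(is_allowed(agent, "globex", "menus:read"))   # different tenant
print(is_allowed(agent, "acme", "secrets:write"))  # scope not granted
```

The point of the sketch is that the permission check takes the tenant and the scope together, so "logged in" is never sufficient on its own.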
Tenant-aware data
Many AI products are multi-tenant from day one.
Customer data, organization boundaries, user permissions, usage records, generated content, and operational metadata need to be isolated by default. Tenant isolation should be a core primitive, not an afterthought.
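One way to picture "isolated by default" is a data handle that cannot be obtained without a tenant. The sketch below is a toy in-memory version under that assumption; `TenantScopedStore` and `TenantView` are hypothetical names for illustration and do not describe ScaleMule's implementation.

```python
from collections import defaultdict
from typing import Any, Dict


class TenantView:
    """Handle that can only see rows belonging to its own tenant."""

    def __init__(self, rows: Dict[str, Any]) -> None:
        self._rows = rows

    def put(self, key: str, value: Any) -> None:
        self._rows[key] = value

    def get(self, key: str) -> Any:
        return self._rows.get(key)


class TenantScopedStore:
    """In-memory store where every read and write is bound to one tenant."""

    def __init__(self) -> None:
        # Rows are partitioned by tenant ID; there is no cross-tenant accessor.
        self._rows: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def for_tenant(self, tenant_id: str) -> TenantView:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        return TenantView(self._rows[tenant_id])


store = TenantScopedStore()
acme = store.for_tenant("acme")
globex = store.for_tenant("globex")

acme.put("menu:1", {"dish": "ramen"})
print(acme.get("menu:1"))    # visible to the tenant that wrote it
print(globex.get("menu:1"))  # not visible to another tenant
```

Because the only way to read or write goes through `for_tenant`, forgetting the tenant filter is not a mistake calling code can make.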
Storage and media
Modern AI products often handle files, images, transcripts, generated media, user uploads, and derived assets.
Agents should not have to wire object storage, metadata records, permission checks, and lifecycle behavior from scratch every time.
Events and background work
The MenuGen story points directly at the need for durable background workflows.
If an operation takes time, the system needs a job record, queue, state transitions, retries, progress updates, and failure handling. The user should not lose everything because a request timed out.
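The pieces named above (job record, state transitions, retries, failure handling) can be sketched in a few lines of Python. This is a simplified, in-process illustration and not ScaleMule's job system: `Job`, `JobState`, and `run_job` are hypothetical names, and a real implementation would persist the record and add backoff between attempts.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    max_attempts: int = 3
    state: JobState = JobState.QUEUED
    attempts: int = 0
    last_error: Optional[str] = None
    history: List[JobState] = field(default_factory=lambda: [JobState.QUEUED])

    def move(self, new_state: JobState) -> None:
        """Record every state transition so progress can be inspected later."""
        self.state = new_state
        self.history.append(new_state)


def run_job(job: Job, work: Callable[[], None]) -> Job:
    """Run `work` until it succeeds or the attempt budget is exhausted."""
    while job.attempts < job.max_attempts:
        job.attempts += 1
        job.move(JobState.RUNNING)
        try:
            work()
        except Exception as exc:  # keep the failure on the record
            job.last_error = str(exc)
            if job.attempts < job.max_attempts:
                job.move(JobState.QUEUED)  # re-queue for another attempt
            else:
                job.move(JobState.FAILED)
        else:
            job.move(JobState.SUCCEEDED)
            break
    return job


# Simulated work that times out twice before succeeding.
calls = {"n": 0}


def flaky() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("model call timed out")


job = run_job(Job(job_id="menu-123"), flaky)
print(job.state, job.attempts, job.last_error)
```

The job record outlives any single attempt, so a timeout becomes a recorded transition rather than lost work.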
Policy and auditability
Agent-generated systems need strong operational boundaries.
Who created this key? Which user triggered this action? Which agent changed this configuration? What tenant did it affect? What happened before the failure?
Agent-native infrastructure needs audit events and policy controls by default.
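Those questions map naturally onto the fields of an audit event. The following Python sketch assumes a simple append-only log; `AuditEvent`, `AuditLog`, and the field and action names are hypothetical and chosen only to mirror the questions above, not taken from ScaleMule.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    actor_id: str     # who, or which agent, acted
    actor_kind: str   # "user", "agent", or "service_account"
    action: str       # what happened, e.g. "api_key.create"
    tenant_id: str    # which tenant was affected
    target: str       # the resource that changed
    occurred_at: str  # ISO 8601 timestamp in UTC


class AuditLog:
    """Append-only log: events are recorded and queried, never edited."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, actor_id: str, actor_kind: str, action: str,
               tenant_id: str, target: str) -> AuditEvent:
        event = AuditEvent(actor_id, actor_kind, action, tenant_id, target,
                           datetime.now(timezone.utc).isoformat())
        self._events.append(event)
        return event

    def for_tenant(self, tenant_id: str) -> List[AuditEvent]:
        """Return one tenant's events in the order they were recorded."""
        return [e for e in self._events if e.tenant_id == tenant_id]

    def to_jsonl(self) -> str:
        """Serialize as one JSON object per line, easy for agents to parse."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


log = AuditLog()
log.record("user-42", "user", "api_key.create", "acme", "key-7")
log.record("agent-1", "agent", "config.update", "acme", "rate_limit")
log.record("agent-9", "agent", "job.create", "globex", "job-3")

for event in log.for_tenant("acme"):
    print(event.actor_id, event.action, event.target)
```

A line-oriented text format such as JSON Lines is used here because it can be read by both humans and agents without a dashboard.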
Production configuration
Environment variables, secrets, webhooks, domains, rate limits, API keys, and deploy configuration should not be scattered across hidden dashboard state.
They should be inspectable, scriptable, and understandable by agents.
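As a rough illustration of "inspectable and scriptable", the Python sketch below declares a configuration contract and checks an environment against it without printing secret values. The `Setting` type, the setting names, and the example values are assumptions made for this example, not a real ScaleMule configuration format.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence


@dataclass(frozen=True)
class Setting:
    name: str
    secret: bool = False
    required: bool = True


# Hypothetical contract for a MenuGen-style app.
CONTRACT: List[Setting] = [
    Setting("APP_DOMAIN"),
    Setting("MODEL_API_KEY", secret=True),
    Setting("WEBHOOK_URL", required=False),
]


def inspect_config(contract: Sequence[Setting],
                   values: Mapping[str, str]) -> Dict[str, str]:
    """Report the status of each declared setting without leaking secrets."""
    report: Dict[str, str] = {}
    for setting in contract:
        value = values.get(setting.name)
        if not value:
            report[setting.name] = (
                "missing" if setting.required else "unset (optional)"
            )
        elif setting.secret:
            report[setting.name] = "set (redacted)"
        else:
            report[setting.name] = value
    return report


def missing_required(contract: Sequence[Setting],
                     values: Mapping[str, str]) -> List[str]:
    """List required settings that have no value in this environment."""
    return [s.name for s in contract if s.required and not values.get(s.name)]


dev = {"APP_DOMAIN": "localhost", "MODEL_API_KEY": "sk-dev-example"}
prod = {"APP_DOMAIN": "menugen.example"}

print(inspect_config(CONTRACT, dev))
print(missing_required(CONTRACT, prod))
```

An agent can run a check like this before deploying and get a definite answer about what production is missing, instead of discovering it through a failed request.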
MenuGen pain vs. ScaleMule primitives
| MenuGen pain | ScaleMule primitive |
|---|---|
| API keys and environment variables scattered across services | Scoped configuration and secret surfaces |
| Authentication and OAuth setup across dashboards | Tenant-aware identity and access |
| Database records and background queues wired manually | Durable data and event workflows |
| Timeout-prone processing | Background jobs with state transitions, retries, and failure handling |
| Hidden production state in service dashboards | Agent-readable API, CLI, and documentation surfaces |
| Debugging production behavior after deployment | Audit logs and operational state |
| Usage limits and commercial controls added late | Billing, metering, and policy primitives |
The opposite of dashboard glue
The MenuGen problem is not only a developer-experience problem.
It is a category signal.
As AI agents become more capable, the bottleneck shifts from generating code to safely connecting that code to the real world.
Production infrastructure has to become legible to agents.
That means fewer browser-only configuration flows. Fewer copied secrets. Fewer hidden production states. Fewer disconnected service contracts. Fewer brittle handoffs between local code and deployed systems.
It means infrastructure that an agent can inspect, provision, modify, and verify through explicit contracts.
Karpathy's MenuGen writeup points toward more than another marketplace of disconnected parts. It points toward a more opinionated, preconfigured layer with surfaces LLMs can actually operate: APIs, CLIs, curl examples, Markdown documentation, and explicit configuration. That is the direction ScaleMule is built around.
ScaleMule exists for that world.
Our thesis
We believe the next generation of software will be created by a mix of human intent and agent execution.
But those agents need a production layer they can rely on.
The winning infrastructure will not only make developers faster. It will make agents safer, more capable, and more operationally useful.
That is why ScaleMule is building agent-native backend infrastructure for AI and API products.
A product should not feel eighty percent done when it is really twenty percent production-ready.
The agent should be able to move from prompt to product against a backend substrate designed for deployment, identity, data, storage, events, policy, auditability, and operations from the start.
That is the MenuGen problem.
And it is exactly the kind of infrastructure gap ScaleMule was built to close.