★ VENTURE TAKES

Pick Your Stack, Not Just the Model

AI startups aren't differentiated by which model they use anymore. The real decision is the stack around it — how they access a model, layer retrieval, routing, and fine-tuning on top, and manage the cost, data, and trust trade-offs that follow.

1P · JUDY DUONG · JUNE 24, 2026 · 8 MIN READ

For a while, the easiest way to describe an AI startup was: “They use GPT for X.”

That was good enough when the market was early. The first wave of AI apps was mostly about proving that large language models could be useful inside real workflows: writing emails, summarizing documents, generating code, answering customer questions, drafting legal clauses, analyzing sales calls, and so on.

But now the question has changed.

It is no longer just:

Which model is the smartest?

It is:

Which model strategy gives this startup the right mix of quality, cost, control, privacy, speed, and defensibility?

That distinction matters because model choice quietly shapes the whole company. It affects gross margin, enterprise sales, data security, infrastructure ownership, procurement, and whether the product is easy to copy or hard to replace.

The AI stack splits into two layers. The first is model access: which model the startup uses, who hosts it, and where the data goes. The second is product intelligence: what the startup builds around the model to make it useful, reliable, and specific to the customer's workflow.

A legal AI startup might use Claude through AWS Bedrock, add RAG over legal contracts, build permission-aware retrieval, log every answer, and run evaluations before deployment.
A consumer writing app might use OpenAI’s API, strong prompting, user memory, a beautiful interface, and later add model routing to reduce cost.
A healthcare workflow startup might use a self-hosted open model, retrieve private documents through RAG, add strict access control, and require human review before outputs are used.

Same AI category. Very different company shapes.

1. Layer one: model access strategy

The model access layer answers the most basic infrastructure question:

Where does the intelligence come from, and who runs it?

This is where startups decide whether to rent a frontier model, access models through a cloud platform, use an open model through a hosted provider, self-host an open model, or build their own.

Model access options
The access layer determines who runs the model, who sees the data, and what cost or control trade-off the startup inherits.

Model access strategy	How it works	Best for	Main risk
Closed frontier API	The startup calls providers like OpenAI, Anthropic, Google, Mistral, or Cohere directly through an API	Fast MVPs, frontier reasoning, coding, research, general AI apps	API cost, vendor dependency, limited control
Cloud-hosted model	The startup accesses models through AWS, Azure, Google Cloud, or similar cloud platforms	Enterprise customers, regulated industries, existing cloud buyers	Cloud lock-in, procurement complexity, sometimes slower model access
Managed open-model API	The startup uses open models through inference providers without managing GPUs itself	Teams wanting open-model flexibility without running infrastructure	Still pays usage fees, less control than self-hosting
Self-hosted open model	The startup downloads open-weight models like Llama, Mistral, Qwen, DeepSeek, Gemma, or Stable Diffusion-style models and runs them on its own infrastructure	Privacy, control, high-volume use cases, custom deployment	GPU infrastructure, engineering burden, model quality trade-offs
Own trained or heavily post-trained model	The startup trains, post-trains, or deeply customizes its own model	Foundation model labs, robotics/world models, scientific AI, deeply specialized model companies	Extremely expensive, talent-heavy, high execution risk

Closed APIs are the fastest path, but the startup is renting intelligence from someone else. Cloud-hosted models matter for enterprise sales because large buyers already have security reviews, cloud contracts, and procurement workflows attached to their existing provider. Managed open-model APIs are an underrated middle ground: model variety without managing GPUs. Self-hosting buys control, but it is not cheap: the weights may be free to download, while the GPUs and engineers needed to run them are not. Training from scratch only makes sense when the model itself is the company.

2. Layer two: product intelligence strategy

The product intelligence layer answers a different question:

How does the startup make a general model useful for a specific customer workflow?

This is where RAG, fine-tuning, agents, and routing belong. They are not replacements for closed APIs or open models; they are product layers built around whichever model access strategy the startup chooses.

Product intelligence layers
RAG, fine-tuning, agents, and routing are product layers built around a model.

Product layer	What it does	Works with which model?	Best for	Main risk
Prompting	Gives the model instructions, examples, constraints, or output formats	Any model	Fast iteration, simple workflows, early products	Brittle outputs, long prompts, weak consistency
RAG	Retrieves relevant documents or data before the model answers	Any model: closed API, cloud-hosted, open, self-hosted	Enterprise search, legal, finance, support, internal knowledge tools	Bad retrieval, stale sources, permission leakage
Fine-tuning	Trains an existing model further on examples so it behaves more consistently	Usually open models or provider-supported fine-tuning APIs	Repeated formats, classification, extraction, brand voice, domain behaviour	Needs clean data, evals, maintenance
Tool use / agents	Lets the model call tools, search, calculate, update systems, or follow multi-step workflows	Any capable model with tool-calling or orchestration support	Workflow automation, research agents, coding agents, operational tasks	Unsafe actions, tool misuse, harder debugging
Model routing	Sends different tasks to different models based on quality, cost, speed, or risk	Multiple models	Cost control, multi-model products, enterprise optimization	Complexity, evaluation burden, routing mistakes

RAG stands for retrieval-augmented generation. It doesn't change the model's weights — it gives the model a private library to look at before answering. Most enterprise value isn't in a model's general knowledge; it's in the customer's private data: contracts, policies, tickets, transcripts, manuals.

A general model can say:

Here is how employee expense policies usually work.

A RAG product can say:

According to your company's 2025 expense policy, meals over £45 require manager approval.

That second answer comes from retrieval, not the model magically knowing the company.
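
A minimal sketch of that retrieval step, in Python. Everything in it is hypothetical and for illustration only: the two policy snippets, the word-overlap scoring (a stand-in for a real embedding or search index), and the `build_prompt` helper. No model is called; the sketch only shows how retrieved text ends up in the prompt.

```python
import re

# Hypothetical private documents; a real product would index far more.
DOCUMENTS = {
    "expense-policy-2025": "Meals over £45 require manager approval.",
    "travel-policy-2025": "Flights must be booked at least 14 days ahead.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9£]+", text.lower()))

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the k documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        DOCUMENTS.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Place retrieved text in the prompt; the model's weights never change."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("Do meals over £45 need manager approval?")
print(prompt)
```

The document id travels with the text, which is what lets the final answer cite the 2025 expense policy rather than a generic rule.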

Fine-tuning is different: it changes how the model tends to behave, not what it knows. It is best suited to stable behaviour rather than constantly changing knowledge, for example classifying claims, extracting invoice fields, or holding a brand voice.
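
Fine-tuning data for a behaviour such as invoice-field extraction is usually a set of input/output pairs. The sketch below writes two made-up examples as JSON Lines. The `input` and `output` field names and the invoices themselves are assumptions; the exact schema depends on the provider or training library.

```python
import json

# Hypothetical training examples: each pairs raw text with the exact
# structured output the tuned model should learn to produce.
examples = [
    {
        "input": "Invoice 1041 from Acme Ltd, total £1,200.00, due 2025-03-01",
        "output": {"vendor": "Acme Ltd", "total": "1200.00", "due": "2025-03-01"},
    },
    {
        "input": "Globex invoice #77. Amount payable: £89.50. Pay by 2025-04-15",
        "output": {"vendor": "Globex", "total": "89.50", "due": "2025-04-15"},
    },
]

def to_jsonl(rows: list[dict]) -> str:
    """Serialize one example per line, the usual layout for training files."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)

jsonl = to_jsonl(examples)
print(jsonl)
```

Both examples teach the same output shape from differently worded inputs. That is the behaviour being trained; the specific vendors and amounts are not knowledge the model is meant to retain.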

RAG gives the model information to read. Fine-tuning changes how it behaves.

Agents go a layer further. An agent is a system in which the model decides what action to take, calls tools, observes results, and continues until the task is done. That makes agents riskier, because the model is no longer only generating text: it may be searching, editing files, sending requests, or triggering workflows.
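
The decide, act, observe loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production pattern: `fake_model` is a scripted stand-in for a real model call, and the `TOOLS` registry, the `lookup_order` tool, and the step cap are all hypothetical.

```python
from typing import Callable

# Hypothetical tool registry: the only actions the agent is allowed to take.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_order": lambda order_id: f"order {order_id}: shipped",
}

def fake_model(history: list[str]) -> dict:
    """Stand-in for a real model call: asks for one tool, then finishes."""
    if not any(line.startswith("observation:") for line in history):
        return {"action": "lookup_order", "argument": "A17"}
    return {"action": "finish", "argument": history[-1].removeprefix("observation: ")}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Decide, act, observe, and repeat until the model finishes or the cap hits."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        decision = fake_model(history)
        if decision["action"] == "finish":
            return decision["argument"]
        tool = TOOLS.get(decision["action"])
        if tool is None:
            # Refuse anything outside the registry instead of guessing.
            history.append(f"observation: unknown tool {decision['action']}")
            continue
        history.append(f"observation: {tool(decision['argument'])}")
    return "stopped: step limit reached"

result = run_agent("Where is order A17?")
print(result)
```

The tool allow-list and the step limit are where the extra risk gets contained: the model chooses among actions, but the surrounding code decides which actions exist and when to stop.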

Model routing is what happens when a startup stops treating "the model" as one fixed choice. A cheap small model handles simple classification. A frontier model handles complex reasoning. An open model handles privacy-sensitive workflows.

The future isn't one model for every task. It's model portfolios.
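
A rule-based router is the simplest version of a model portfolio. In the sketch below, the `Task` fields and the three tier names are hypothetical placeholders rather than real model identifiers, and a real router would likely score tasks with a classifier instead of relying on hand-set flags.

```python
from dataclasses import dataclass

@dataclass
class Task:
    text: str
    contains_private_data: bool = False
    needs_complex_reasoning: bool = False

def route(task: Task) -> str:
    """Pick a model tier for a task, checking privacy before anything else."""
    if task.contains_private_data:
        # Sensitive inputs stay on infrastructure the startup controls.
        return "self-hosted-open-model"
    if task.needs_complex_reasoning:
        return "frontier-model"
    # Default to the cheapest tier for simple work such as classification.
    return "small-cheap-model"

for task in [
    Task("Tag this ticket as billing or technical"),
    Task("Draft a migration plan for this codebase", needs_complex_reasoning=True),
    Task("Summarize this patient note", contains_private_data=True),
]:
    print(f"{route(task):24} <- {task.text}")
```

The order of the checks is the design choice: privacy is tested first, so a task that is both sensitive and hard still stays on the self-hosted tier.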

3. How the full architecture actually works

A modern AI startup does not always send a user question to one model and immediately return the answer. The app may decide which model to use, retrieve context, check permissions, call tools, evaluate outputs, and log the result.

Simple closed API app
User → Startup app → Closed model API → Model output → back to user

Enterprise RAG app
User question → Startup app → Permission check → Retrieval layer → Company documents → Selected model → Source-grounded answer → Audit log

Self-hosted open-model app
User → Startup app → Own GPU server / private cloud → Open model inference → Output review → back to user

Agentic workflow app
User task → Planner/router → Model chooses action → Tool/database call → Observation → Continue or stop (loops back until done) → Final output
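
The enterprise RAG flow above can be sketched end to end. This is a simplified illustration: the documents, group names, and `stub_model` are hypothetical, and retrieval is reduced to "every document the user may read" so that the ordering of permission check, model call, and audit log stays visible.

```python
from datetime import datetime, timezone

# Hypothetical document store: each document lists the groups allowed to read it.
DOCS = [
    {"id": "hr-handbook", "groups": {"all-staff"}, "text": "Annual leave is 25 days."},
    {"id": "board-minutes", "groups": {"executives"}, "text": "Acquisition approved."},
]
AUDIT_LOG: list[dict] = []

def stub_model(question: str, sources: list[dict]) -> str:
    """Stand-in for the selected model: echoes the sources it was given."""
    return " ".join(doc["text"] for doc in sources) or "No accessible source found."

def answer(user: str, user_groups: set[str], question: str) -> dict:
    """Permission check, retrieval, model call, then an audit log entry."""
    # Filtering happens before the model call, so restricted text
    # never reaches the model's context.
    visible = [doc for doc in DOCS if doc["groups"] & user_groups]
    response = {
        "answer": stub_model(question, visible),
        "sources": [doc["id"] for doc in visible],
    }
    AUDIT_LOG.append({
        "user": user,
        "question": question,
        "sources": response["sources"],
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return response

staff_view = answer("sam", {"all-staff"}, "How much leave do I get?")
print(staff_view)
```

Checking permissions before retrieval, rather than filtering the answer afterwards, is what keeps the board minutes out of the staff answer entirely.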

"What model do you use?" is only the starting question. A stronger diligence question is: what is the full model stack, from access to retrieval to evaluation to security? That tells you more about the startup's defensibility than the model name alone.

4. The market is becoming multi-model

Companies are no longer betting on one model for everything. Enterprise AI buyers increasingly use multiple models because different models are good at different jobs — one stronger at code, another at long documents, another cheaper for simple classification, another safer for regulated workflows.

A16z's 2025 CIO survey found 37% of enterprise respondents using five or more models, up from 29% the year before. Menlo Ventures estimated enterprise generative AI spend reached $37 billion in 2025, up from $11.5 billion in 2024 — $12.5 billion of that going to foundation model APIs. Linux Foundation Research reported that 89% of organizations using AI incorporate open-source AI somewhere in their infrastructure, and 63% use an open model.

Companies want model choice. But they also need control over complexity. The more models companies use, the more they need routing, observability, security, retrieval, evaluation, cost control, and deployment tooling.

The model war creates the infrastructure market around the model.

5. What kinds of startups choose what?

A consumer AI app usually starts with a closed API plus prompting. Speed matters more than infrastructure control — the startup needs to test whether users care and whether it can grow distribution. The risk is becoming a thin wrapper: if the model provider adds the same feature natively, the startup needs another moat — community, brand, workflow, proprietary data, habit.

An enterprise knowledge assistant usually uses RAG on top of a closed, cloud-hosted, or self-hosted model. The value isn't that it knows the internet — it's that it knows the company, with traceable sources. The hard part is permissions, data cleaning, parsing, and auditability, not the model.

A regulated vertical AI startup often prefers cloud-hosted models, private deployment, or self-hosted open models, with RAG and strict governance on top. In healthcare, finance, legal, insurance, and defense-adjacent workflows, buyers care deeply about data movement — where it's stored, whether it trains the model, whether outputs are auditable. The model architecture becomes part of the sales process.

A high-volume AI workflow tool may start with closed APIs, then add routing, caching, batching, or smaller open models once usage scales and API costs become a gross margin problem.

An AI infrastructure startup doesn't compete by building a better chatbot. It competes by solving the mess created by model adoption: routing, inference optimization, observability, evaluation, RAG infrastructure, security, and cost monitoring.
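
The gross margin pressure on a high-volume tool can be made concrete with simple arithmetic. Every number in this sketch is hypothetical (the subscription price, the request volume, the per-call costs, and the 70% routing share), and it counts model calls as the only cost of revenue.

```python
def blended_cost(frontier_cost: float, small_cost: float, share_to_small: float) -> float:
    """Average cost per request when a share of traffic goes to the small model."""
    return share_to_small * small_cost + (1 - share_to_small) * frontier_cost

def gross_margin(price: float, requests: int, cost_per_request: float) -> float:
    """Gross margin per user, counting model calls as the only cost of revenue."""
    return (price - requests * cost_per_request) / price

# All figures below are hypothetical, chosen only to make the arithmetic visible.
PRICE = 20.00          # flat monthly subscription per user
REQUESTS = 2_000       # model calls per user per month
FRONTIER_COST = 0.010  # cost per call to a frontier model
SMALL_COST = 0.001     # cost per call to a small model

before = gross_margin(PRICE, REQUESTS, FRONTIER_COST)
after = gross_margin(PRICE, REQUESTS, blended_cost(FRONTIER_COST, SMALL_COST, 0.7))
print(f"frontier only: {before:.0%}, with 70% routed to the small model: {after:.0%}")
```

Under these made-up inputs, sending most traffic to the cheaper model moves the margin from roughly zero to roughly 63%. The specific figures matter less than the shape: flat pricing plus per-call costs means margin depends directly on which model handles each request.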

A foundation model startup trains or heavily post-trains its own models — the hardest category, because the company isn't building an app that uses AI, it's building the AI capability itself. These startups need capital, research talent, data access, compute strategy, and a real reason to believe their model is differentiated.

6. Data risk: will proprietary data train someone else’s model?

Two very different things happen when a company uses an AI model.

Inference means the model reads your prompt or document and generates an answer; the weights don't change. It is like asking a consultant to read a memo and summarize it.

Training means the model provider uses data to update the model's weights so it improves over time.

Inference is necessary — if you ask a model to summarize a contract, it has to see the contract. The more strategic question is whether that data gets stored, reused, trained on, leaked, or used to build a competing product later.

Business/API products from major providers often state that customer inputs and outputs aren't used to train base models by default. But startups still need to check the exact terms, plan, deployment method, retention settings, and whether any third-party tools touch the data.

The risk isn't only whether the model updates its weights. It's the entire data path around the model.

7. Where the startup opportunities are

Model adoption creates six bottlenecks: infrastructure (serving open models reliably at scale), cybersecurity (prompt injection, tool misuse, RAG leakage), evaluation (AI is probabilistic, and confidently wrong is still wrong), data quality (the AI product can't be better than the knowledge layer underneath it), gross margin (expensive model calls set against unlimited-usage pricing), and procurement (enterprise buyers asking where data goes, whether it's auditable, what happens if terms change, and how to switch models later).

Every model strategy creates pain somewhere. Pain becomes tooling. Tooling becomes startup opportunity.

Where model strategy creates startup opportunities
The best infrastructure opportunities often sit around the pain created by model adoption.

Bottleneck	Why it matters	Startup opportunity
API cost	Usage growth can squeeze gross margin	Model routing, caching, smaller model orchestration, inference optimization
Data leakage	Enterprises need to protect proprietary or regulated data	AI security, access control, private deployment, data governance
Cybersecurity	Prompt injection and agents taking unsafe actions are new attack surfaces	Guardrails, sandboxing, agent permissioning, red-teaming
Bad retrieval	RAG fails when the wrong context is retrieved	Better document parsing, indexing, search, knowledge-layer infrastructure
Evaluation	AI is probabilistic, not deterministic — outputs can be confidently wrong	Evals, monitoring, observability, human review tooling
GPU complexity	Open models are hard to serve reliably	GPU cloud, inference platforms, deployment tooling
Vendor lock-in	Startups depend on model providers	Multi-model orchestration, abstraction layers, routing systems
Compliance burden	Enterprise buyers need auditability	AI governance, logging, policy enforcement, model risk management
Messy enterprise data	AI cannot use knowledge it cannot access cleanly	Connectors, data cleaning, permission-aware retrieval
Agent risk	Tool-using models can take incorrect or unsafe actions	Agent guardrails, workflow approvals, sandboxing, permission controls
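
The evaluation row is the easiest to make concrete: a fixed set of test cases run before every deployment. The sketch below is a minimal version under stated assumptions. `stub_model`, the two test cases, the exact-match scoring, and the 0.9 threshold are all hypothetical; real evals usually need fuzzier scoring and far more cases.

```python
from typing import Callable

def stub_model(prompt: str) -> str:
    """Stand-in for a real model; deliberately wrong on one case."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}
    return answers.get(prompt, "I don't know")

# Hypothetical regression set with known-good answers.
EVAL_CASES = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]

def run_evals(model: Callable[[str], str], cases: list[dict], threshold: float = 0.9) -> dict:
    """Score the model by exact match and decide whether it may ship."""
    failures = [
        case["prompt"]
        for case in cases
        if model(case["prompt"]).strip() != case["expected"]
    ]
    pass_rate = 1 - len(failures) / len(cases)
    return {"pass_rate": pass_rate, "failures": failures, "deploy": pass_rate >= threshold}

report = run_evals(stub_model, EVAL_CASES)
print(report)
```

The stub answers the second question fluently and incorrectly, which is the "confidently wrong" failure the harness exists to catch before a customer does.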

The first generation of AI startups asked what they could build with GPT. The next generation asks how to make AI reliable, private, cheap, auditable, and deeply embedded in real workflows. That's where many of the best infrastructure opportunities are.

The actual conclusion

The winning AI startups will not simply be the ones using the smartest model. They will be the ones that know how to build the right model stack.

The model is not the moat by default. In many cases, everyone can access the same model, or a similar one, within weeks. The moat comes from what surrounds the model: workflow, proprietary data, customer trust, distribution.

In the AI era, choosing a model is not just an engineering decision. It is choosing what kind of company you are building.
