VENTURE TAKES

Pick Your Stack, Not Just the Model

AI startups aren't differentiated by which model they use anymore. The real decision is the stack around it — how they access a model, layer retrieval, routing, and fine-tuning on top, and manage the cost, data, and trust trade-offs that follow.

1P · JUDY DUONG · JUNE 24, 2026 · 8 MIN READ

For a while, the easiest way to describe an AI startup was: “They use GPT for X.”

That was good enough when the market was early. The first wave of AI apps was mostly about proving that large language models could be useful inside real workflows: writing emails, summarizing documents, generating code, answering customer questions, drafting legal clauses, analyzing sales calls, and so on.

But now the question has changed.

It is no longer just:

Which model is the smartest?

It is:

Which model strategy gives this startup the right mix of quality, cost, control, privacy, speed, and defensibility?

That distinction matters because model choice quietly shapes the whole company. It affects gross margin, enterprise sales, data security, infrastructure ownership, procurement, and whether the product is easy to copy or hard to replace.

The AI stack splits into two layers. Model access: which model the startup uses, who hosts it, and where the data goes. Product intelligence: what the startup builds around the model to make it useful, reliable, and specific to the customer's workflow.

  • A legal AI startup might use Claude through AWS Bedrock, add RAG over legal contracts, build permission-aware retrieval, log every answer, and run evaluations before deployment.
  • A consumer writing app might use OpenAI’s API, strong prompting, user memory, a beautiful interface, and later add model routing to reduce cost.
  • A healthcare workflow startup might use a self-hosted open model, retrieve private documents through RAG, add strict access control, and require human review before outputs are used.

Same AI category. Very different company shapes.

1. Layer one: model access strategy

The model access layer answers the most basic infrastructure question:

Where does the intelligence come from, and who runs it?

This is where startups decide whether to rent a frontier model, access models through a cloud platform, use an open model through a hosted provider, self-host an open model, or build their own.

Model access options
The access layer determines who runs the model, who sees the data, and what cost or control trade-off the startup inherits.

| Model access strategy | How it works | Best for | Main risk |
|---|---|---|---|
| Closed frontier API | The startup calls providers like OpenAI, Anthropic, Google, Mistral, or Cohere directly through an API | Fast MVPs, frontier reasoning, coding, research, general AI apps | API cost, vendor dependency, limited control |
| Cloud-hosted model | The startup accesses models through AWS, Azure, Google Cloud, or similar cloud platforms | Enterprise customers, regulated industries, existing cloud buyers | Cloud lock-in, procurement complexity, sometimes slower model access |
| Managed open-model API | The startup uses open models through inference providers without managing GPUs itself | Teams wanting open-model flexibility without running infrastructure | Still pays usage fees, less control than self-hosting |
| Self-hosted open model | The startup downloads open-weight models like Llama, Mistral, Qwen, DeepSeek, Gemma, or Stable Diffusion-style models and runs them on its own infrastructure | Privacy, control, high-volume use cases, custom deployment | GPU infrastructure, engineering burden, model quality trade-offs |
| Own trained or heavily post-trained model | The startup trains, post-trains, or deeply customizes its own model | Foundation model labs, robotics/world models, scientific AI, deeply specialized model companies | Extremely expensive, talent-heavy, high execution risk |

Closed APIs are the fastest path, but the startup is renting intelligence from someone else. Cloud-hosted models matter for enterprise sales because large buyers already have security reviews, cloud contracts, and procurement workflows attached to their existing provider. Managed open-model APIs are an underrated middle ground: model variety without managing GPUs. Self-hosting buys control, but only the weights are free; running them is not. Training from scratch only makes sense when the model itself is the company.
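To make the "weights are free, running them is not" trade-off concrete, here is a rough break-even sketch. Every figure in it (API price, GPU hourly rate, engineering cost) is a hypothetical placeholder rather than real pricing, and it assumes the GPUs can actually serve the volume in question.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Usage-based cost: the startup pays per token processed."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def monthly_self_host_cost(gpu_count: int, gpu_hourly_rate: float, engineer_cost: float) -> float:
    """Mostly fixed cost: GPUs billed around the clock plus engineering time."""
    hours_per_month = 24 * 30
    return gpu_count * gpu_hourly_rate * hours_per_month + engineer_cost


def break_even_tokens(price_per_million_tokens: float, self_host_cost: float) -> float:
    """Monthly token volume above which self-hosting is cheaper than the API."""
    return self_host_cost / price_per_million_tokens * 1_000_000


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
API_PRICE = 5.0  # dollars per million tokens (assumption)
fixed = monthly_self_host_cost(gpu_count=2, gpu_hourly_rate=2.5, engineer_cost=6_400.0)
threshold = break_even_tokens(API_PRICE, fixed)

print(f"Self-hosting costs about ${fixed:,.0f} per month")
print(f"Break-even volume: {threshold:,.0f} tokens per month")
```

Below the break-even volume the API is the cheaper option; above it, the fixed cost of self-hosting starts to pay off, provided quality and reliability hold up.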

2. Layer two: product intelligence strategy

The product intelligence layer answers a different question:

How does the startup make a general model useful for a specific customer workflow?

This is where RAG, fine-tuning, agents, and routing belong — not replacements for closed APIs or open models, but product layers built around whichever model access strategy the startup chooses.

Product intelligence layers
RAG, fine-tuning, agents, and routing are product layers built around a model.

| Product layer | What it does | Works with which model? | Best for | Main risk |
|---|---|---|---|---|
| Prompting | Gives the model instructions, examples, constraints, or output formats | Any model | Fast iteration, simple workflows, early products | Brittle outputs, long prompts, weak consistency |
| RAG | Retrieves relevant documents or data before the model answers | Any model: closed API, cloud-hosted, open, self-hosted | Enterprise search, legal, finance, support, internal knowledge tools | Bad retrieval, stale sources, permission leakage |
| Fine-tuning | Trains an existing model further on examples so it behaves more consistently | Usually open models or provider-supported fine-tuning APIs | Repeated formats, classification, extraction, brand voice, domain behaviour | Needs clean data, evals, maintenance |
| Tool use / agents | Lets the model call tools, search, calculate, update systems, or follow multi-step workflows | Any capable model with tool-calling or orchestration support | Workflow automation, research agents, coding agents, operational tasks | Unsafe actions, tool misuse, harder debugging |
| Model routing | Sends different tasks to different models based on quality, cost, speed, or risk | Multiple models | Cost control, multi-model products, enterprise optimization | Complexity, evaluation burden, routing mistakes |

RAG stands for retrieval-augmented generation. It doesn't change the model's weights — it gives the model a private library to look at before answering. Most enterprise value isn't in a model's general knowledge; it's in the customer's private data: contracts, policies, tickets, transcripts, manuals.

A general model can say:

Here is how employee expense policies usually work.

A RAG product can say:

According to your company's 2025 expense policy, meals over £45 require manager approval.

That second answer comes from retrieval, not the model magically knowing the company.
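A minimal sketch of that retrieve-then-answer flow might look like the following. It uses naive word overlap as a stand-in for real vector search, and the document names and the prompt wording are illustrative assumptions. The final model call is deliberately left out, since it depends on the provider.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric words; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], top_k: int = 1) -> list[str]:
    """Rank document IDs by word overlap with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda doc_id: len(q & tokenize(documents[doc_id])), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, documents: dict[str, str], doc_ids: list[str]) -> str:
    """Put the retrieved text in front of the question so the model can cite it."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


# Hypothetical private documents the base model has never seen.
company_docs = {
    "expense_policy_2025": "Meals over £45 require manager approval.",
    "travel_policy_2025": "Flights must be booked through the travel portal.",
}

question = "Do meals over 45 pounds need approval?"
top_ids = retrieve(question, company_docs)
prompt = build_prompt(question, company_docs, top_ids)
print(prompt)
```

The model's weights never change here. The only thing that changes is what the model is given to read before it answers.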

Fine-tuning is different: it changes how the model tends to behave, not what it knows. It is best suited to stable behaviour rather than constantly changing knowledge, such as classifying claims, extracting invoice fields, or holding a brand voice.

RAG gives the model information to read. Fine-tuning changes how it behaves.

Agents go a layer further: the model decides what action to take, calls tools, observes results, and continues until the task is done. That makes them riskier, because the model is no longer only generating text; it may be searching, editing files, sending requests, or triggering workflows.
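The decide, act, observe loop can be sketched in a few lines. In this illustration the "model" is a scripted function (`scripted_policy`, a hypothetical stand-in for a real model call), and the tool allowlist and step limit are two simple guardrails, not a complete safety design.

```python
from typing import Callable

# A policy returns (action_name, argument); "finish" ends the loop.
Policy = Callable[[str, list[str]], tuple[str, str]]
Tool = Callable[[str], str]


def run_agent(task: str, choose_action: Policy, tools: dict[str, Tool], max_steps: int = 5) -> str:
    """Loop: ask the policy for an action, run the tool, record the observation."""
    observations: list[str] = []
    for _ in range(max_steps):
        action, argument = choose_action(task, observations)
        if action == "finish":
            return argument
        if action not in tools:
            # Guardrail: tools outside the allowlist are never executed.
            observations.append(f"error: tool '{action}' is not allowed")
            continue
        observations.append(tools[action](argument))
    return "stopped: step limit reached"


def scripted_policy(task: str, observations: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for a model: look up once, then report."""
    if not observations:
        return ("lookup_invoice", "INV-7")
    return ("finish", f"Invoice status: {observations[-1]}")


def reckless_policy(task: str, observations: list[str]) -> tuple[str, str]:
    """A policy that keeps requesting a tool that was never granted."""
    return ("delete_records", "all")


allowed_tools = {"lookup_invoice": lambda invoice_id: "paid" if invoice_id == "INV-7" else "unknown"}

result = run_agent("Check invoice INV-7", scripted_policy, allowed_tools)
blocked = run_agent("Clean up the database", reckless_policy, allowed_tools)
print(result)
print(blocked)
```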

Model routing is what happens when a startup stops treating "the model" as one fixed choice. A cheap small model handles simple classification. A frontier model handles complex reasoning. An open model handles privacy-sensitive workflows.

The future isn't one model for every task. It's model portfolios.
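As a rough illustration, a router can start life as a handful of rules. The model names below are placeholder labels rather than real endpoints, and a production router would likely also weigh measured quality, latency, and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str                           # e.g. "classification" or "reasoning"
    contains_private_data: bool = False


def route(task: Task) -> str:
    """Pick a model tier by risk first, then by task difficulty."""
    if task.contains_private_data:
        return "self-hosted-open-model"   # privacy-sensitive work stays in-house
    if task.kind == "classification":
        return "small-cheap-model"        # simple work goes to the cheap tier
    return "frontier-model"               # everything else gets the strongest model


print(route(Task("classification")))
print(route(Task("reasoning")))
print(route(Task("classification", contains_private_data=True)))
```

Checking privacy before cost is a design choice: a cheap model is never worth a data leak.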

3. How the full architecture actually works

A modern AI startup does not always send a user question to one model and immediately return the answer. The app may decide which model to use, retrieve context, check permissions, call tools, evaluate outputs, and log the result.

Simple closed API app
User → Startup app → Closed model API → Model output → back to user

Enterprise RAG app
User question → Startup app → Permission check → Retrieval layer → Company documents → Selected model → Source-grounded answer → Audit log

Self-hosted open-model app
User → Startup app → Own GPU server / private cloud → Open model inference → Output review → back to user

Agentic workflow app
User task → Planner/router → Model chooses action → Tool/database call → Observation → Continue or stop (loops back until done) → Final output
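The enterprise RAG flow above can be sketched as follows, focusing on the two steps a simple demo usually skips: the permission check before retrieval and the audit log afterwards. The `Document` class, group names, and log fields are assumptions for illustration, and keyword matching again stands in for real search.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set[str]


def words(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {w.strip("?.,!").lower() for w in text.split()}


def retrieve_for_user(question: str, user_groups: set[str], documents: list[Document]) -> list[Document]:
    """Filter by permission first, so restricted text never reaches the model."""
    visible = [d for d in documents if d.allowed_groups & user_groups]
    return [d for d in visible if words(question) & words(d.text)]


def answer(user: str, user_groups: set[str], question: str,
           documents: list[Document], audit_log: list[dict]) -> list[str]:
    """Return the source IDs an answer would cite, and record the request."""
    sources = [d.doc_id for d in retrieve_for_user(question, user_groups, documents)]
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "question": question,
        "sources": sources,
    })
    return sources


docs = [
    Document("handbook", "Holiday policy allows 25 days.", {"staff", "hr"}),
    Document("hr_severance", "Severance policy for executives.", {"hr"}),
]
audit_log: list[dict] = []

staff_sources = answer("sam", {"staff"}, "What is the severance policy?", docs, audit_log)
hr_sources = answer("hana", {"hr"}, "What is the severance policy?", docs, audit_log)
print(staff_sources, hr_sources)
```

The same question returns different sources for different users, and both requests leave a trace. That is the behaviour enterprise buyers tend to ask about.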

"What model do you use?" is only the starting question. A stronger diligence question is: what is the full model stack, from access to retrieval to evaluation to security? That tells you more about the startup's defensibility than the model name alone.

4. The market is becoming multi-model

Companies are no longer betting on one model for everything. Enterprise AI buyers increasingly use multiple models because different models are good at different jobs — one stronger at code, another at long documents, another cheaper for simple classification, another safer for regulated workflows.

A16z's 2025 CIO survey found 37% of enterprise respondents using five or more models, up from 29% the year before. Menlo Ventures estimated enterprise generative AI spend reached $37 billion in 2025, up from $11.5 billion in 2024 — $12.5 billion of that going to foundation model APIs. Linux Foundation Research reported that 89% of organizations using AI incorporate open-source AI somewhere in their infrastructure, and 63% use an open model.

Companies want model choice. But they also need control over complexity. The more models companies use, the more they need routing, observability, security, retrieval, evaluation, cost control, and deployment tooling.

The model war creates the infrastructure market around the model.

5. What kinds of startups choose what?

A consumer AI app usually starts with a closed API plus prompting. Speed matters more than infrastructure control — the startup needs to test whether users care and whether it can grow distribution. The risk is becoming a thin wrapper: if the model provider adds the same feature natively, the startup needs another moat — community, brand, workflow, proprietary data, habit.

An enterprise knowledge assistant usually uses RAG on top of a closed, cloud-hosted, or self-hosted model. The value isn't that it knows the internet — it's that it knows the company, with traceable sources. The hard part is permissions, data cleaning, parsing, and auditability, not the model.

A regulated vertical AI startup often prefers cloud-hosted models, private deployment, or self-hosted open models, with RAG and strict governance on top. In healthcare, finance, legal, insurance, and defense-adjacent workflows, buyers care deeply about data movement — where it's stored, whether it trains the model, whether outputs are auditable. The model architecture becomes part of the sales process.

A high-volume AI workflow tool may start with closed APIs, then add routing, caching, batching, or smaller open models once usage scales and API costs become a gross margin problem. An AI infrastructure startup doesn't compete by building a better chatbot — it competes by solving the mess created by model adoption: routing, inference optimization, observability, evaluation, RAG infrastructure, security, cost monitoring.
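A back-of-the-envelope calculation shows why API costs become a gross margin problem as usage scales, and why caching helps. The subscription price, call counts, and per-call cost below are hypothetical, and the sketch counts model spend as the only cost of revenue.

```python
def gross_margin(price_per_user: float, calls_per_user: int,
                 cost_per_call: float, cache_hit_rate: float = 0.0) -> float:
    """Gross margin per user, with model calls as the only cost of revenue."""
    billable_calls = calls_per_user * (1 - cache_hit_rate)  # cached answers cost nothing here
    model_cost = billable_calls * cost_per_call
    return (price_per_user - model_cost) / price_per_user


# Hypothetical flat plan: $20 per user per month, $0.01 per model call.
typical = gross_margin(price_per_user=20.0, calls_per_user=1_000, cost_per_call=0.01)
heavy = gross_margin(price_per_user=20.0, calls_per_user=3_000, cost_per_call=0.01)
heavy_cached = gross_margin(price_per_user=20.0, calls_per_user=3_000,
                            cost_per_call=0.01, cache_hit_rate=0.5)

print(f"typical user: {typical:.0%}")
print(f"heavy user: {heavy:.0%}")
print(f"heavy user with 50% cache hits: {heavy_cached:.0%}")
```

Under a flat price, the heaviest users can push margin below zero, which is the point at which routing, caching, or smaller models stop being optional.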

A foundation model startup trains or heavily post-trains its own models — the hardest category, because the company isn't building an app that uses AI, it's building the AI capability itself. These startups need capital, research talent, data access, compute strategy, and a real reason to believe their model is differentiated.

6. Data risk: will proprietary data train someone else’s model?

Two very different things happen when a company uses an AI model.

Inference means the model reads your prompt or document and generates an answer — the weights don't change. Like asking a consultant to read a memo and summarize it.

Training means the model provider uses data to update the model's weights so it improves over time.

Inference is necessary — if you ask a model to summarize a contract, it has to see the contract. The more strategic question is whether that data gets stored, reused, trained on, leaked, or used to build a competing product later.

Business/API products from major providers often state that customer inputs and outputs aren't used to train base models by default. But startups still need to check the exact terms, plan, deployment method, retention settings, and whether any third-party tools touch the data.

The risk isn't only whether the model updates its weights. It's the entire data path around the model.

7. Where the startup opportunities are

Model adoption creates six bottlenecks: infrastructure (serving open models reliably at scale), cybersecurity (prompt injection, tool misuse, RAG leakage), evaluation (AI is probabilistic, and confidently wrong is still wrong), data quality (the AI product can't be better than the knowledge layer underneath it), gross margin (expensive model calls vs. unlimited-usage pricing), and procurement (enterprise buyers asking where data goes, whether it's auditable, what happens if terms change, how to switch models later).

Every model strategy creates pain somewhere. Pain becomes tooling. Tooling becomes startup opportunity.

Where model strategy creates startup opportunities
The best infrastructure opportunities often sit around the pain created by model adoption.

| Bottleneck | Why it matters | Startup opportunity |
|---|---|---|
| API cost | Usage growth can squeeze gross margin | Model routing, caching, smaller model orchestration, inference optimization |
| Data leakage | Enterprises need to protect proprietary or regulated data | AI security, access control, private deployment, data governance |
| Cybersecurity | Prompt injection and agents taking unsafe actions are new attack surfaces | Guardrails, sandboxing, agent permissioning, red-teaming |
| Bad retrieval | RAG fails when the wrong context is retrieved | Better document parsing, indexing, search, knowledge-layer infrastructure |
| Evaluation | AI is probabilistic, not deterministic; outputs can be confidently wrong | Evals, monitoring, observability, human review tooling |
| GPU complexity | Open models are hard to serve reliably | GPU cloud, inference platforms, deployment tooling |
| Vendor lock-in | Startups depend on model providers | Multi-model orchestration, abstraction layers, routing systems |
| Compliance burden | Enterprise buyers need auditability | AI governance, logging, policy enforcement, model risk management |
| Messy enterprise data | AI cannot use knowledge it cannot access cleanly | Connectors, data cleaning, permission-aware retrieval |
| Agent risk | Tool-using models can take incorrect or unsafe actions | Agent guardrails, workflow approvals, sandboxing, permission controls |
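For the evaluation row in particular, the simplest useful tool is a fixed set of test cases run before each deployment. The sketch below is a bare-bones version: `stub_model` is a hypothetical stand-in for a real model call, exact match is only one possible metric, and the 0.9 threshold is an arbitrary assumption.

```python
from typing import Callable


def exact_match_rate(predict: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Share of cases where the output matches the expected answer (case-insensitive)."""
    passed = sum(1 for prompt, expected in cases
                 if predict(prompt).strip().lower() == expected.lower())
    return passed / len(cases)


def ready_to_deploy(score: float, threshold: float = 0.9) -> bool:
    """Gate a release on the eval score."""
    return score >= threshold


# Hypothetical canned outputs standing in for a real model.
canned = {
    "Invoice total?": "120.00",
    "Currency?": "GBP",
    "Vendor?": "Acme Ltd",
    "Due date?": "2025-01-31",
}


def stub_model(prompt: str) -> str:
    return canned.get(prompt, "")


cases = [
    ("Invoice total?", "120.00"),
    ("Currency?", "gbp"),
    ("Vendor?", "Acme Ltd"),
    ("Due date?", "2025-02-28"),  # the stub gets this one wrong
]

score = exact_match_rate(stub_model, cases)
print(f"score: {score:.2f}, deploy: {ready_to_deploy(score)}")
```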

The first generation of AI startups asked what they could build with GPT. The next generation asks how to make AI reliable, private, cheap, auditable, and deeply embedded in real workflows. That's where many of the best infrastructure opportunities are.

The actual conclusion

The winning AI startups will not simply be the ones using the smartest model. They will be the ones that know how to build the right model stack.

The model is not the moat by default. In many cases, everyone can access the same model, or a similar one, within weeks. The moat comes from what surrounds the model: workflow, proprietary data, customer trust, distribution.

In the AI era, choosing a model is not just an engineering decision. It is choosing what kind of company you are building.

#AI · #STARTUPS · #VENTURE CAPITAL · #AI INFRASTRUCTURE · #LLMS · #RAG · #FINE-TUNING · #OPEN SOURCE AI · #ENTERPRISE AI · #MODEL ROUTING