Pick Your Stack, Not Just the Model
AI startups aren't differentiated by which model they use anymore. The real decision is the stack around it — how they access a model, layer retrieval, routing, and fine-tuning on top, and manage the cost, data, and trust trade-offs that follow.

For a while, the easiest way to describe an AI startup was: “They use GPT for X.”
That was good enough when the market was early. The first wave of AI apps was mostly about proving that large language models could be useful inside real workflows: writing emails, summarizing documents, generating code, answering customer questions, drafting legal clauses, analyzing sales calls, and so on.
But now the question has changed.
It is no longer just:
Which model is the smartest?
It is:
Which model strategy gives this startup the right mix of quality, cost, control, privacy, speed, and defensibility?
That distinction matters because model choice quietly shapes the whole company. It affects gross margin, enterprise sales, data security, infrastructure ownership, procurement, and whether the product is easy to copy or hard to replace.
The AI stack splits into two layers. The first is model access: which model the startup uses, who hosts it, and where the data goes. The second is product intelligence: what the startup builds around the model to make it useful, reliable, and specific to the customer's workflow. Three examples show how the layers combine:
- A legal AI startup might use Claude through AWS Bedrock, add RAG over legal contracts, build permission-aware retrieval, log every answer, and run evaluations before deployment.
- A consumer writing app might use OpenAI’s API, strong prompting, user memory, a beautiful interface, and later add model routing to reduce cost.
- A healthcare workflow startup might use a self-hosted open model, retrieve private documents through RAG, add strict access control, and require human review before outputs are used.
Same AI category. Very different company shapes.
1. Layer one: model access strategy
The model access layer answers the most basic infrastructure question:
Where does the intelligence come from, and who runs it?
This is where startups decide whether to rent a frontier model, access models through a cloud platform, use an open model through a hosted provider, self-host an open model, or build their own.
Model access options
The access layer determines who runs the model, who sees the data, and what cost or control trade-off the startup inherits.
| Model access strategy | How it works | Best for | Main risk |
|---|---|---|---|
| Closed frontier API | The startup calls providers like OpenAI, Anthropic, Google, Mistral, or Cohere directly through an API | Fast MVPs, frontier reasoning, coding, research, general AI apps | API cost, vendor dependency, limited control |
| Cloud-hosted model | The startup accesses models through AWS, Azure, Google Cloud, or similar cloud platforms | Enterprise customers, regulated industries, existing cloud buyers | Cloud lock-in, procurement complexity, sometimes slower model access |
| Managed open-model API | The startup uses open models through inference providers without managing GPUs itself | Teams wanting open-model flexibility without running infrastructure | Still pays usage fees, less control than self-hosting |
| Self-hosted open model | The startup downloads open-weight models like Llama, Mistral, Qwen, DeepSeek, Gemma, or Stable Diffusion-style models and runs them on its own infrastructure | Privacy, control, high-volume use cases, custom deployment | GPU infrastructure, engineering burden, model quality trade-offs |
| Own trained or heavily post-trained model | The startup trains, post-trains, or deeply customizes its own model | Foundation model labs, robotics/world models, scientific AI, deeply specialized model companies | Extremely expensive, talent-heavy, high execution risk |
Closed APIs are the fastest path, but the startup is renting intelligence from someone else. Cloud-hosted models matter for enterprise sales because large buyers already have security reviews, cloud contracts, and procurement workflows attached to their existing provider. Managed open-model APIs are an underrated middle ground: model variety without managing GPUs. Self-hosting buys control, but only the weights are free; running them is not. Training from scratch only makes sense when the model itself is the company.
2. Layer two: product intelligence strategy
The product intelligence layer answers a different question:
How does the startup make a general model useful for a specific customer workflow?
This is where RAG, fine-tuning, agents, and routing belong — not replacements for closed APIs or open models, but product layers built around whichever model access strategy the startup chooses.
Product intelligence layers
RAG, fine-tuning, agents, and routing are product layers built around a model.
| Product layer | What it does | Works with which model? | Best for | Main risk |
|---|---|---|---|---|
| Prompting | Gives the model instructions, examples, constraints, or output formats | Any model | Fast iteration, simple workflows, early products | Brittle outputs, long prompts, weak consistency |
| RAG | Retrieves relevant documents or data before the model answers | Any model: closed API, cloud-hosted, open, self-hosted | Enterprise search, legal, finance, support, internal knowledge tools | Bad retrieval, stale sources, permission leakage |
| Fine-tuning | Trains an existing model further on examples so it behaves more consistently | Usually open models or provider-supported fine-tuning APIs | Repeated formats, classification, extraction, brand voice, domain behaviour | Needs clean data, evals, maintenance |
| Tool use / agents | Lets the model call tools, search, calculate, update systems, or follow multi-step workflows | Any capable model with tool-calling or orchestration support | Workflow automation, research agents, coding agents, operational tasks | Unsafe actions, tool misuse, harder debugging |
| Model routing | Sends different tasks to different models based on quality, cost, speed, or risk | Multiple models | Cost control, multi-model products, enterprise optimization | Complexity, evaluation burden, routing mistakes |
RAG stands for retrieval-augmented generation. It doesn't change the model's weights — it gives the model a private library to look at before answering. Most enterprise value isn't in a model's general knowledge; it's in the customer's private data: contracts, policies, tickets, transcripts, manuals.
A general model can say:
Here is how employee expense policies usually work.
A RAG product can say:
According to your company's 2025 expense policy, meals over £45 require manager approval.
That second answer comes from retrieval, not the model magically knowing the company.
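To make the retrieval step concrete, here is a minimal sketch of the idea, not a production design. It assumes a tiny in-memory document list and ranks documents by simple word overlap, which stands in for a real search index or vector store. The names `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical, the travel and leave policy texts are invented for illustration, and no real model is called.

```python
import re

# Hypothetical "private library". A real product would use a search index or vector store.
DOCUMENTS = [
    {"id": "expense-policy-2025", "text": "Meals over £45 require manager approval."},
    {"id": "travel-policy-2025", "text": "Economy class is required for flights under six hours."},
    {"id": "leave-policy-2025", "text": "Annual leave requests need two weeks of notice."},
]


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (a stand-in for real search)."""
    question_tokens = tokenize(question)
    scored = [(len(question_tokens & tokenize(doc["text"])), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, sources):
    """Put the retrieved text in front of the question so the model can read it."""
    context = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


question = "Do meals need manager approval?"
prompt = build_prompt(question, retrieve(question, DOCUMENTS))
print(prompt)
```

The model's weights never change here. The only thing that changes is what the model is given to read, which is why the quality of the answer depends so heavily on the quality of retrieval.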
Fine-tuning is different: it changes how the model tends to behave, not what it knows. It is best suited to stable behaviour rather than constantly changing knowledge, for example classifying claims, extracting invoice fields, or holding a brand voice.
RAG gives the model information to read. Fine-tuning changes how it behaves.
Agents go a layer further: a system where the model decides what action to take, calls tools, observes results, and continues until the task is done. That makes agents riskier, because the model is no longer only generating text; it may be searching, editing files, sending requests, or triggering workflows.
Model routing is what happens when a startup stops treating "the model" as one fixed choice. A cheap small model handles simple classification. A frontier model handles complex reasoning. An open model handles privacy-sensitive workflows.
The future isn't one model for every task. It's model portfolios.
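A routing rule like the one described above can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended policy: the `Task` fields, the tier names in `ROUTES`, and the rule order are all hypothetical, and a real router would also weigh latency, cost budgets, and evaluation results.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str                # e.g. "classification" or "reasoning" (hypothetical labels)
    sensitive: bool = False  # True if the data must stay on infrastructure the startup controls


# Hypothetical model tiers; the names are placeholders, not real model identifiers.
ROUTES = {
    "private": "self-hosted-open-model",
    "simple": "small-cheap-model",
    "complex": "frontier-model",
}

SIMPLE_KINDS = {"classification"}


def route(task):
    """Pick a model tier: privacy first, then cost, then quality."""
    if task.sensitive:
        return ROUTES["private"]
    if task.kind in SIMPLE_KINDS:
        return ROUTES["simple"]
    return ROUTES["complex"]


for task in [Task("classification"), Task("reasoning"), Task("reasoning", sensitive=True)]:
    print(task.kind, task.sensitive, "->", route(task))
```

The privacy check comes first on purpose: in this sketch a sensitive task never reaches an external model, even when a cheaper or stronger one is available.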
3. How the full architecture actually works
A modern AI startup does not always send a user question to one model and immediately return the answer. The app may decide which model to use, retrieve context, check permissions, call tools, evaluate outputs, and log the result.
Simple closed API app
User → Startup app → Closed model API → Model output → back to user
Enterprise RAG app
User question → Startup app → Permission check → Retrieval layer → Company documents → Selected model → Source-grounded answer → Audit log
Self-hosted open-model app
User → Startup app → Own GPU server / private cloud → Open model inference → Output review → back to user
Agentic workflow app
User task → Planner/router → Model chooses action → Tool/database call → Observation → Continue or stop (loops back until done) → Final output
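The agentic loop can be sketched as follows. This is a toy under loud assumptions: `scripted_model` is a hard-coded stand-in for a real model's decision step, `calculator` is a deliberately narrow hypothetical tool, and `max_steps` is one simple way to guarantee the loop stops.

```python
def calculator(expression):
    """Tool: add two integers written as 'a+b'. Kept narrow so no eval() is needed."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


# Registry of tools the agent is allowed to call.
TOOLS = {"calculator": calculator}


def scripted_model(task, observations):
    """Stand-in for a model: call the calculator once, then finish with the result."""
    if not observations:
        return {"action": "calculator", "input": task}
    return {"action": "finish", "output": f"The answer is {observations[-1]}"}


def run_agent(task, model=scripted_model, max_steps=5):
    """Loop: the model chooses an action, a tool runs, the observation is fed back."""
    observations = []
    for _ in range(max_steps):
        decision = model(task, observations)
        if decision["action"] == "finish":
            return decision["output"]
        tool = TOOLS.get(decision["action"])
        if tool is None:
            # Unknown tools are reported back to the model instead of being executed.
            observations.append(f"unknown tool: {decision['action']}")
            continue
        observations.append(tool(decision["input"]))
    return "Stopped: step limit reached"


print(run_agent("2+3"))
```

The tool registry and the step limit are the two places where this sketch limits what the model can do. Real agent systems add approvals, sandboxing, and logging around the same loop.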
"What model do you use?" is only the starting question. A stronger diligence question is: what is the full model stack, from access to retrieval to evaluation to security? That tells you more about the startup's defensibility than the model name alone.
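As one illustration of what sits between the user and the model, the enterprise RAG flow above can be sketched as a small pipeline. Everything here is hypothetical: the documents, the group-based permission model, the `call_model` placeholder (which returns a canned string instead of calling a real model), and the in-memory `AUDIT_LOG`.

```python
from datetime import datetime, timezone

# Hypothetical documents, each tagged with the groups allowed to read it.
DOCUMENTS = [
    {"id": "hr-salaries", "allowed_groups": {"hr"},
     "text": "salary bands are reviewed every april"},
    {"id": "eng-handbook", "allowed_groups": {"engineering", "hr"},
     "text": "deploys need one code review"},
]

AUDIT_LOG = []


def permitted(user_groups, doc):
    """Permission check: the user must share at least one group with the document."""
    return bool(user_groups & doc["allowed_groups"])


def retrieve(question, user_groups):
    """Search only documents the user may read, using naive word overlap."""
    words = set(question.lower().split())
    return [
        doc for doc in DOCUMENTS
        if permitted(user_groups, doc) and words & set(doc["text"].split())
    ]


def call_model(question, sources):
    """Placeholder for the selected model; returns a source-grounded canned answer."""
    if not sources:
        return "No accessible source answers this question."
    return f"Based on {sources[0]['id']}: {sources[0]['text']}"


def answer(user, user_groups, question):
    """Run the flow: permission-aware retrieval, model call, then audit logging."""
    sources = retrieve(question, user_groups)
    reply = call_model(question, sources)
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "question": question,
        "source_ids": [doc["id"] for doc in sources],
    })
    return reply


print(answer("dana", {"hr"}, "when are salary bands reviewed"))
print(answer("sam", {"engineering"}, "when are salary bands reviewed"))
```

In this sketch the permission filter runs before retrieval, so a document the user cannot read never reaches the model. Filtering the answer afterwards would be too late, because the model would already have seen the text.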
4. The market is becoming multi-model
Companies are no longer betting on one model for everything. Enterprise AI buyers increasingly use multiple models because different models are good at different jobs — one stronger at code, another at long documents, another cheaper for simple classification, another safer for regulated workflows.
The 2025 a16z CIO survey found 37% of enterprise respondents using five or more models, up from 29% the year before. Menlo Ventures estimated that enterprise generative AI spend reached $37 billion in 2025, up from $11.5 billion in 2024, with $12.5 billion of that going to foundation model APIs. Linux Foundation Research reported that 89% of organizations using AI incorporate open-source AI somewhere in their infrastructure, and 63% use an open model.
Companies want model choice. But they also need control over complexity. The more models companies use, the more they need routing, observability, security, retrieval, evaluation, cost control, and deployment tooling.
The model war creates the infrastructure market around the model.
5. What kinds of startups choose what?
A consumer AI app usually starts with a closed API plus prompting. Speed matters more than infrastructure control — the startup needs to test whether users care and whether it can grow distribution. The risk is becoming a thin wrapper: if the model provider adds the same feature natively, the startup needs another moat — community, brand, workflow, proprietary data, habit.
An enterprise knowledge assistant usually uses RAG on top of a closed, cloud-hosted, or self-hosted model. The value isn't that it knows the internet — it's that it knows the company, with traceable sources. The hard part is permissions, data cleaning, parsing, and auditability, not the model.
A regulated vertical AI startup often prefers cloud-hosted models, private deployment, or self-hosted open models, with RAG and strict governance on top. In healthcare, finance, legal, insurance, and defense-adjacent workflows, buyers care deeply about data movement — where it's stored, whether it trains the model, whether outputs are auditable. The model architecture becomes part of the sales process.
A high-volume AI workflow tool may start with closed APIs, then add routing, caching, batching, or smaller open models once usage scales and API costs become a gross margin problem. An AI infrastructure startup doesn't compete by building a better chatbot — it competes by solving the mess created by model adoption: routing, inference optimization, observability, evaluation, RAG infrastructure, security, cost monitoring.
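The gross margin pressure on a high-volume tool can be made concrete with some back-of-the-envelope arithmetic. The figures below are invented purely for illustration and are not real provider prices; `blended_cost` and `gross_margin` are hypothetical helper names.

```python
def blended_cost(frontier_cost, small_cost, share_to_small):
    """Average cost per request when a share of traffic is routed to the cheaper model."""
    if not 0.0 <= share_to_small <= 1.0:
        raise ValueError("share_to_small must be between 0 and 1")
    return share_to_small * small_cost + (1 - share_to_small) * frontier_cost


def gross_margin(revenue_per_request, cost_per_request):
    """Gross margin as a fraction of revenue."""
    return (revenue_per_request - cost_per_request) / revenue_per_request


# Hypothetical per-request figures in dollars, chosen only for illustration.
REVENUE = 0.05
FRONTIER_COST = 0.03
SMALL_COST = 0.003

before = gross_margin(REVENUE, FRONTIER_COST)
after = gross_margin(REVENUE, blended_cost(FRONTIER_COST, SMALL_COST, share_to_small=0.7))
print(f"margin before routing: {before:.0%}, after routing: {after:.0%}")
```

With these made-up inputs, sending 70% of requests to the smaller model lifts gross margin from roughly 40% to roughly 78%. The real numbers depend on actual prices and on how many requests the smaller model can handle without hurting quality.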
A foundation model startup trains or heavily post-trains its own models — the hardest category, because the company isn't building an app that uses AI, it's building the AI capability itself. These startups need capital, research talent, data access, compute strategy, and a real reason to believe their model is differentiated.
6. Data risk: will proprietary data train someone else’s model?
Two very different things happen when a company uses an AI model.
Inference means the model reads your prompt or document and generates an answer — the weights don't change. Like asking a consultant to read a memo and summarize it.
Training means the model provider uses data to update the model's weights so it improves over time.
Inference is necessary — if you ask a model to summarize a contract, it has to see the contract. The more strategic question is whether that data gets stored, reused, trained on, leaked, or used to build a competing product later.
Business/API products from major providers often state that customer inputs and outputs aren't used to train base models by default. But startups still need to check the exact terms, plan, deployment method, retention settings, and whether any third-party tools touch the data.
The risk isn't only whether the model updates its weights. It's the entire data path around the model.
7. Where the startup opportunities are
Model adoption creates six recurring bottlenecks: infrastructure (serving open models reliably at scale), cybersecurity (prompt injection, tool misuse, RAG leakage), evaluation (AI is probabilistic, and confidently wrong is still wrong), data quality (the AI product can't be better than the knowledge layer underneath it), gross margin (expensive model calls set against unlimited-usage pricing), and procurement (enterprise buyers asking where data goes, whether it's auditable, what happens if terms change, and how to switch models later).
Every model strategy creates pain somewhere. Pain becomes tooling. Tooling becomes startup opportunity.
Where model strategy creates startup opportunities
The best infrastructure opportunities often sit around the pain created by model adoption.
| Bottleneck | Why it matters | Startup opportunity |
|---|---|---|
| API cost | Usage growth can squeeze gross margin | Model routing, caching, smaller model orchestration, inference optimization |
| Data leakage | Enterprises need to protect proprietary or regulated data | AI security, access control, private deployment, data governance |
| Cybersecurity | Prompt injection and agents taking unsafe actions are new attack surfaces | Guardrails, sandboxing, agent permissioning, red-teaming |
| Bad retrieval | RAG fails when the wrong context is retrieved | Better document parsing, indexing, search, knowledge-layer infrastructure |
| Evaluation | AI is probabilistic, not deterministic — outputs can be confidently wrong | Evals, monitoring, observability, human review tooling |
| GPU complexity | Open models are hard to serve reliably | GPU cloud, inference platforms, deployment tooling |
| Vendor lock-in | Startups depend on model providers | Multi-model orchestration, abstraction layers, routing systems |
| Compliance burden | Enterprise buyers need auditability | AI governance, logging, policy enforcement, model risk management |
| Messy enterprise data | AI cannot use knowledge it cannot access cleanly | Connectors, data cleaning, permission-aware retrieval |
| Agent risk | Tool-using models can take incorrect or unsafe actions | Agent guardrails, workflow approvals, sandboxing, permission controls |
The first generation of AI startups asked what they could build with GPT. The next generation asks how to make AI reliable, private, cheap, auditable, and deeply embedded in real workflows. That's where many of the best infrastructure opportunities are.
The actual conclusion
The winning AI startups will not simply be the ones using the smartest model. They will be the ones that know how to build the right model stack.
The model is not the moat by default. In many cases, everyone can access the same model, or a similar one, within weeks. The moat comes from what surrounds the model: workflow, proprietary data, customer trust, distribution.
In the AI era, choosing a model is not just an engineering decision. It is choosing what kind of company you are building.


