How to Add AI Features Without Being an ML Engineer

Everyone wants AI in their product now. And if you're a frontend or full-stack engineer, the expectation is increasingly that you can build those features — even if you've never trained a model, touched Python's data stack, or learned what a transformer actually is.

The good news: you don't need any of that.

Adding AI features to a web app in 2025 is largely an API integration problem. The hard parts — the models, the infrastructure, the inference — are handled by providers like Anthropic, OpenAI, and Google. Your job is to call those APIs intelligently and build a good experience around the results.

Here's how to do that.


The Mental Model: LLMs Are Just APIs

The first thing to internalise is that an LLM, from your perspective as a web engineer, behaves like a slow, probabilistic API. You send it text, it returns text. That's the core contract.

Everything else — tools, function calling, RAG, agents — is built on top of this. Start with the basics and add complexity only when you need it.

// This is the core of almost every AI feature
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  messages: [{ role: 'user', content: userMessage }],
})

const text = response.content[0].text

Once you have that working, you understand 80% of AI integration. The remaining 20% is mostly about streaming, context management, and UX.


Setting Up: Pick One Provider and Start

Don't paralysis-by-analysis your way through provider comparisons. Pick one, get something working, and switch later if you need to.

For most Next.js projects, the setup looks like this:

npm install @anthropic-ai/sdk

Add your API key to .env.local:

ANTHROPIC_API_KEY=sk-ant-...

Create an API route in src/app/api/chat/route.ts:

import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic()

export async function POST(req: Request) {
  const { message } = await req.json()

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })

  return Response.json({ text: response.content[0].text })
}

That's a working AI feature. It's not fancy, but it's real. Now you can build on it.


Streaming: The UX Difference-Maker

The biggest UX mistake I see in early AI integrations is waiting for the full response before showing anything. LLMs can take 5–15 seconds to generate a long response. Showing a spinner for that long feels broken.

Streaming fixes this. Instead of waiting for the full response, you send tokens as they're generated. The user sees text appearing in real time — which feels fast even when it isn't.

Here's how to stream from an API route:

export async function POST(req: Request) {
  const { message } = await req.json()

  const stream = anthropic.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })

  // stream.toReadableStream() emits JSON event objects, so instead
  // re-emit just the text deltas as plain chunks the client can append
  const encoder = new TextEncoder()
  return new Response(
    new ReadableStream({
      async start(controller) {
        for await (const event of stream) {
          if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
            controller.enqueue(encoder.encode(event.delta.text))
          }
        }
        controller.close()
      },
    })
  )
}

And on the client:

const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ message }),
})

if (!response.body) throw new Error('No response body')

const reader = response.body.getReader()
const decoder = new TextDecoder()

while (true) {
  const { done, value } = await reader.read()
  if (done) break
  // { stream: true } handles multi-byte characters split across chunks
  setOutput((prev) => prev + decoder.decode(value, { stream: true }))
}

This transforms your AI feature from "feels slow" to "feels alive."


Prompt Design: The Engineering Part

Here's where most of the actual work is. The model is smart, but it needs good instructions to be useful. Writing those instructions well is called prompt engineering — and despite the fancy name, it's mostly just being clear and specific.

A few patterns that work:

System prompts set the rules. Use the system field to define the model's behaviour, not the first user message.

await anthropic.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system:
    'You are a concise code reviewer. Review the provided code snippet and return only a bulleted list of issues. Be direct and specific.',
  messages: [{ role: 'user', content: codeSnippet }],
})

Be explicit about output format. If you need JSON back, say so. If you need a specific structure, show an example. LLMs follow formatting instructions well when they're clear.
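Here's a sketch of that pattern (the extraction task and the JSON shape are hypothetical, not from any library): the system prompt shows the exact shape you want back, and a small helper parses the reply defensively, since models occasionally wrap JSON in code fences even when told not to.

```typescript
// Hypothetical task: extract action items. Showing a concrete example
// of the expected JSON shape works better than a vague "respond in JSON".
const extractionSystemPrompt = [
  "Extract the action items from the user's notes.",
  'Respond with ONLY a JSON array shaped exactly like this example:',
  '[{"task": "Ship the login fix", "owner": "dana"}]',
  'Use null for owner when no one is named. No prose, no code fences.',
].join('\n')

// Defensive parse: strip stray code fences, return null on bad JSON
// instead of throwing mid-request
function parseModelJson<T = unknown>(raw: string): T | null {
  const cleaned = raw
    .trim()
    .replace(/^`{3}(?:json)?\s*/, '')
    .replace(/\s*`{3}$/, '')
  try {
    return JSON.parse(cleaned) as T
  } catch {
    return null
  }
}
```

Pass the prompt as the system field, run the model's text through parseModelJson, and treat a null result as a retry-or-fallback case rather than a crash.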

Give context, not just questions. "Fix this bug" is worse than "This function is supposed to debounce a search input. It's currently firing on every keystroke. Fix it." The more context, the better the output.


Keeping Conversation Context

If you're building anything conversational, you need to send the message history with each request. LLMs are stateless — they don't remember previous turns unless you include them.

type Message = { role: 'user' | 'assistant'; content: string }

async function chat(history: Message[], newMessage: string) {
  const messages = [...history, { role: 'user' as const, content: newMessage }]

  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages,
  })

  return response.content[0].text
}

Watch your token usage here. Long conversations get expensive fast. A common pattern is to summarise older context and drop early messages from the history once it exceeds a certain length.
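One rough sketch of the trimming half of that pattern, using the common heuristic of roughly four characters per token for English text (the budget number is made up — tune it against your model's context window and your costs):

```typescript
type Message = { role: 'user' | 'assistant'; content: string }

// Rough estimate: ~4 characters per token for English prose.
// Good enough for budgeting; use a real tokenizer if you need precision.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0)
  return Math.ceil(chars / 4)
}

// Drop the oldest turns until the history fits the budget,
// always keeping at least the most recent message
function trimHistory(messages: Message[], maxTokens = 2000): Message[] {
  const trimmed = [...messages]
  while (trimmed.length > 1 && estimateTokens(trimmed) > maxTokens) {
    trimmed.shift()
  }
  return trimmed
}
```

Call trimHistory(history) before building the messages array for each request; summarising the dropped turns into a single message is the natural next step up from this.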


Error Handling and Rate Limits

AI APIs fail. They hit rate limits, time out, return malformed output. Handle this properly from the start.

try {
  const response = await anthropic.messages.create({ ... })
  return response.content[0].text
} catch (error) {
  if (error instanceof Anthropic.RateLimitError) {
    return 'Too many requests — try again in a moment.'
  }
  if (error instanceof Anthropic.APIConnectionTimeoutError) {
    return 'The request timed out. Please try again.'
  }
  }
  throw error
}

Also: validate that response.content[0] exists and is a text block before accessing .text. The API can return other content types depending on the request.
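A small helper makes that check reusable. The structural type below is a simplified stand-in for the SDK's richer content-block union, just enough to show the shape of the guard:

```typescript
// Simplified stand-in for the SDK's content block union —
// only text blocks carry a .text field
type ContentBlock = { type: 'text'; text: string } | { type: string }

// Concatenate every text block; returns '' when the response
// contained none (e.g. a tool-use-only response)
function extractText(content: ContentBlock[]): string {
  return content
    .filter((b): b is { type: 'text'; text: string } => b.type === 'text')
    .map((b) => b.text)
    .join('')
}
```

Using extractText(response.content) everywhere instead of reaching for content[0].text keeps the "other content types" failure mode in one place.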


UX Principles for AI Features

The engineering is only half the job. AI features need different UX thinking than normal CRUD operations.

Show generation progress. Streaming is one way. A subtle "thinking..." indicator while waiting for the first token is another. Users forgive latency when they know something is happening.

Make outputs editable. AI output is a starting point, not a final product. Letting users edit generated text reduces the stakes of imperfect output.

Be honest about uncertainty. If the model might be wrong — and it might — say so. "Here's a suggestion — double-check before using it" builds more trust than presenting AI output as fact.

Give users control. Regenerate buttons, tone sliders, length controls — small affordances that let users guide the output without re-prompting from scratch.
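As a sketch of the regenerate affordance (the API here is invented for illustration, not from any library): remember the last prompt so a "Regenerate" button can re-run it without the user retyping anything.

```typescript
// Framework-free sketch. T is whatever your generate function returns —
// in a real app, likely a Promise<string> that hits your /api/chat route.
function createRegenerator<T>(generate: (prompt: string) => T) {
  let lastPrompt: string | null = null

  return {
    // Wire this to the normal submit action
    run(prompt: string): T {
      lastPrompt = prompt
      return generate(prompt)
    },
    // Wire this to the "Regenerate" button — sampling means the same
    // prompt can produce a different result on the next call
    regenerate(): T | null {
      return lastPrompt === null ? null : generate(lastPrompt)
    },
  }
}
```

Tone and length controls follow the same shape: keep the last request around, tweak one parameter, and re-send it.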


You Don't Need to Be an ML Engineer

The barrier to shipping AI features has never been lower. The models are accessible, the SDKs are well-documented, and the integration patterns are familiar to anyone who has worked with REST APIs.

What you do need is good engineering judgment — about prompt design, about context management, about UX. Those are frontend and full-stack skills. You already have them.

Start simple. Get something working. Then layer in streaming, context, and better prompts. That's the whole playbook.