AI Tutorial

Run Your Own Coding Agent Locally (For Free)

Set up a local coding agent with Ollama and Claude Code. Save on API costs, keep your code private, and run everything on your machine.

Editorial StaffJune 12, 20264 min read

In this guide, you’ll learn how to download and run a coding LLM directly on your laptop using Ollama, and connect it to Claude Code or Codex with a single command. The result: a fully functional coding agent that runs locally, costs nothing per token, and keeps your code private.

Who This Is For

Solo developers and indie founders are spending heavily on Claude Code Max or Codex Pro for tasks that don’t require top-tier reasoning
Students, hobbyists, and beginner programmers who want a hands-on agentic setup without paying monthly fees
Anyone working with sensitive or proprietary code who prefers everything to stay on their own machine

What You’ll Build

By the end, you’ll have a local coding agent (Claude Code, Codex, or OpenCode) connected to a free Ollama model running on your hardware. You’ll use the same interface and workflow, just without API costs or external dependencies.

Requirements

Ollama installed (follow the install guide if needed)
Claude Code set up, or OpenCode as an alternative
A terminal and an active project folder

Hardware:

8 GB RAM (minimum, for smaller models)
16 GB RAM (recommended)
32 GB+ or GPU (ideal for larger models)

Step 1: Check Your Hardware and Choose the Right Model

Before downloading anything, understand what your machine can handle.

On Mac: Apple menu and go to About This Mac
On Windows: Settings, and then System, and lastly About

Take a screenshot of your specs and ask an LLM:

“Which Ollama coding models can I run on this machine?”

It will suggest suitable models based on your RAM and processor.

General guideline:

RAM Model Size Example

8 GB ~3B params qwen3-coder:3b

12 GB 4–7B params gemma4:e2b

16 GB 7–12B params qwen3-coder:7b

32 GB+ / GPU 20B+ gpt-oss:20b

RAM	Model Size	Example
8 GB	~3B params	qwen3-coder:3b
12 GB	4–7B params	gemma4:e2b
16 GB	7–12B params	qwen3-coder:7b
32 GB+ / GPU	20B+	gpt-oss:20b

Pro tip: Use the largest model your system can comfortably handle. Bigger models are more reliable when executing tool-based coding tasks.

Step 2: Choose a Model That Supports Agent Workflows

Go to Ollama’s model directory and look for coding models.

Before selecting one, check:

The Applications section on the model page
Ensure it supports tools like Claude Code, Codex, or OpenCode

If it doesn’t, skip it, agentic workflows won’t function properly.

Top-performing options:

qwen3-coder: strong code generation for its size
gemma4: better at multi-step reasoning and tool usage
gpt-oss: powerful open-weights model with solid agent support

Pro tip: If unsure, download two models and compare. Storage is cheap; performance matters more.

Step 3: Launch Your Local Coding Agent

From the model page, copy the launch command. Example:

Then,

Open your terminal inside your project folder
Paste and run the command
Confirm the model download

Once complete, you’ll be inside a Claude Code session powered by your local model.

Run /model to confirm which model is active. From here, every response is free.

Pro tip: Use ollama ps in another terminal to monitor performance (RAM, GPU usage, active model).

Step 4: Increase the Context Window

This is a critical step most people miss.

By default, Ollama sets a 4K context window, which is too small for coding agents. It causes the model to forget earlier parts of your session.

To fix it:

Open the Ollama app
Go to Settings
Increase context to 32K or higher (depending on your system)

Pro tip: Don’t max it out blindly. Ask an LLM what your hardware can safely support to avoid crashes.

Step 5: Try OpenCode for a Simpler Setup

If Claude Code feels too complex, switch to OpenCode, a lighter alternative designed for smaller models.

Install on Mac:

curl -fsSL https://opencode.ai/install | bash

Run it with:

ollama launch opencode --model gemma4:e4b

Pro tip: Smaller models perform better when they “think less.” Use plan mode (Shift + Tab) or reduce reasoning settings if tasks feel overcomplicated.

Going Further

Once everything is working, you can extend your setup:

Use a hybrid workflow: Let Claude Code or Codex handle planning, and your local model handle repetitive tasks
Control your agent remotely: Use Claude remote control and connect via mobile
Run on a dedicated machine: Set up an old PC or Mac mini as a local inference server

Final Takeaway

Local coding models aren’t yet a complete replacement for cloud-based agents, but they’re incredibly useful for:

Practicing agent workflows
Protecting sensitive code
Reducing costs on routine tasks

And with rapid improvements, the gap between local and cloud models continues to shrink.

Editorial Staff

The Editorial Staff at AIChief is a team of Professional Content writers with extensive experience in the field of AI and Marketing. AIChief was Founded in 2025, AIChief has quickly grown to become the largest free AI resource hub in the industry. Stay connected with them on Facebook, Instagram and X for the latest updates.

View All Posts

User Comments

Filter:

No comments yet. Be the first to comment!

Latest Reads