News · #gemma4 #ollama #open-source #agents #tutorial

Gemma 4 Is Out: Run a Full Agentic AI Stack on Your Laptop for Free (Apache 2.0, Ollama, Function-Calling)

Google's Gemma 4 is the first open model under Apache 2.0 with native function-calling and 256K context. Run it locally with Ollama in 5 minutes. Here's the builder's guide.

AI Builder Club · April 6, 2026 · 2 min read

Google just released Gemma 4 — and it changes the math on self-hosted AI agents.

Here's the quick version: a 31B open-weight model ranked #3 globally among open models, with native function-calling built in, a 256K context window, and an Apache 2.0 license that lets you use it commercially without restrictions. You can run it on your laptop right now using Ollama.

What Makes Gemma 4 Different for Builders

1. Native function-calling — no fine-tuning required

Most open models need fine-tuning or careful prompt engineering to reliably call tools. Gemma 4 ships with function-calling, structured JSON output, and system-instruction support baked in.
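As a sketch of what structured output looks like in practice: recent Ollama releases accept a JSON schema via the `format` parameter of `chat`, which constrains decoding to that schema. The schema and the `gemma4:26b` tag below are illustrative, and this assumes a local Ollama server with the model pulled.

```python
import json

# Illustrative JSON schema we want the model's reply to conform to.
framework_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Framework name"},
        "language": {"type": "string", "description": "Primary language"},
    },
    "required": ["name", "language"],
}

def extract_framework(prompt: str) -> dict:
    """Ask the local model for schema-constrained JSON and parse it.

    Requires a running Ollama server with the model pulled.
    """
    import ollama  # deferred so the schema above is usable without a server

    response = ollama.chat(
        model="gemma4:26b",       # hypothetical tag from this post
        messages=[{"role": "user", "content": prompt}],
        format=framework_schema,  # constrain decoding to the schema
    )
    return json.loads(response.message.content)
```

Because decoding is constrained, the parse step can't fail on free-form prose, which is what makes structured output usable in pipelines without retry loops.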

2. Apache 2.0 — genuinely commercial-use open

Llama 4 has commercial-use restrictions above a certain usage threshold. Gemma 4 ships under Apache 2.0 — no usage caps, no restrictions, no "contact Meta for a license."

3. Runs where your users are

Four model sizes cover the whole spectrum:

  • E2B / E4B (edge) — runs on Android, iOS, Raspberry Pi 5. Under 1.5GB RAM for E2B. 128K context.
  • 26B MoE — latency-optimized, activates only 3.8B parameters during inference.
  • 31B Dense — best quality, runs quantized on a gaming GPU. 256K context.

4. The ecosystem is already there

Day-one support from: Ollama, LM Studio, vLLM, Hugging Face Transformers, llama.cpp, MLX, NVIDIA NIM.

Quick Start: Gemma 4 Agent with Ollama

If you have Ollama installed:

ollama pull gemma4:26b

Basic function-calling agent in Python:

import ollama

# Tool definition in the JSON-schema format Ollama expects.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

# Typed responses and tool calling require ollama-python 0.4 or newer.
response = ollama.chat(
    model="gemma4:26b",
    messages=[{"role": "user", "content": "What are the latest AI agent frameworks?"}],
    tools=tools
)

# The model either answers directly or requests one or more tool calls.
if response.message.tool_calls:
    for tool_call in response.message.tool_calls:
        print(f"Calling: {tool_call.function.name}")
        print(f"Args: {tool_call.function.arguments}")
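The snippet above stops at printing the requested call. A real agent executes the tool, appends the result as a `tool`-role message, and lets the model answer with that context. A minimal sketch, assuming the same Ollama API as above (the `search_web` stub is a placeholder for a real search backend):

```python
import json

def search_web(query: str) -> str:
    """Placeholder tool: a real agent would call a search API here."""
    return json.dumps({"query": query, "results": ["example result"]})

def run_agent(messages, tools):
    """One tool round-trip; requires a running Ollama server."""
    import ollama  # deferred so the stub above works without a server

    response = ollama.chat(model="gemma4:26b", messages=messages, tools=tools)
    for tool_call in response.message.tool_calls or []:
        if tool_call.function.name == "search_web":
            result = search_web(**tool_call.function.arguments)
            messages.append(response.message)  # keep the assistant turn
            messages.append({"role": "tool",
                             "name": tool_call.function.name,
                             "content": result})
    # Second round-trip: the model now answers using the tool output.
    return ollama.chat(model="gemma4:26b", messages=messages, tools=tools)
```

Two round-trips is the minimum viable agent loop; production agents wrap this in a while-loop until the model stops requesting tools.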

When to Use Gemma 4 vs. Claude Sonnet

Use Gemma 4 when:

  • Building offline-first (field tools, low-connectivity, IoT)
  • Privacy requirements prevent sending data to external APIs
  • Prototyping without API credits
  • Building commercial products (Apache 2.0 matters)

Stick with Claude Sonnet 4.6 when:

  • You need best reasoning quality on hard problems
  • Running multi-agent coordination
  • Latency from local inference is a UX problem

The Bigger Picture

Gemma 4 + Ollama is what "AI agent for everyone" actually looks like. Not a cloud subscription. Not a monthly API bill. A capable, commercially usable agent model that runs on the hardware your users already have.

The builders who figure out the on-device agent use cases in the next 6 months will define a category that nobody has named yet.

AI Builder Club is where we figure out those use cases first.

Join AIBC and share what you build with Gemma 4.


Go deeper with AI Builder Club

Join 1,000+ ambitious professionals and builders learning to use AI at work.

  • Expert-led courses on Cursor, MCP, AI agents, and more
  • Weekly live workshops with industry builders
  • Private community for feedback, collaboration, and accountability