💰 Token Saving Strategies

Claude Sonnet 4 is powerful but expensive. Here's how to minimize costs while maximizing efficiency:

1. Use the Right Model for the Job

| Task Type | Best Model | Cost | When to Use |
|---|---|---|---|
| Simple queries | Ollama (qwen2.5:3b) | $0 (FREE) | Basic questions, quick lookups, simple tasks |
| Code tasks | Qwen Coder 32B | NVIDIA API | Python, JavaScript, bash scripting |
| Quick image analysis | Llama 11B Vision | NVIDIA API | Fast screenshot review |
| Deep analysis | Llama 90B Vision | NVIDIA API | Long documents, complex forms |
| Screenshot debugging | Kimi K2.5 | NVIDIA API | Error analysis with thinking mode |
| Complex conversations | Claude Sonnet 4 | $$$ EXPENSIVE | When you need the best |
💡 Pro Tip: Use local Ollama (free) first for simple queries; escalate to Claude only when needed.
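The escalate-only-when-needed idea can be sketched as a small wrapper. This is a hypothetical helper, not part of the gateway: it assumes `ollama` is installed locally, and the fallback behavior is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: try the free local model first, escalate only on failure.
# Assumes `ollama` is installed; model name and fallback message are placeholders.
ask_cheap_first() {
    prompt="$1"
    # Try the free local model first
    if answer=$(ollama run qwen2.5:3b "$prompt" 2>/dev/null) && [ -n "$answer" ]; then
        printf '%s\n' "$answer"
    else
        # Escalate to the paid model only when the local one fails
        printf 'local model unavailable; escalate to Claude\n' >&2
        return 1
    fi
}
```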

2. Spawn Sub-Agents for Heavy Work

Sub-agents run in isolated sessions with their own token budgets, making them ideal for heavy, self-contained work like research or bulk summarization:

```python
sessions_spawn(
    task="Research AI job market 2026",
    model="local",       # or "gemini" for cheap cloud
    cleanup="delete"     # auto-cleanup when done
)
```
⚠️ Warning: Sub-agents don't have your conversation context. Keep task descriptions clear and self-contained.

3. Batch Operations

Instead of multiple back-and-forth exchanges:

❌ Inefficient (burns tokens):

```
"Check my email" → Response
"Now check calendar" → Response
"Now check weather" → Response
```

✅ Efficient (one request):

"Check email for urgent messages, upcoming calendar events today, and weather forecast"

4. Use Heartbeat for Periodic Checks

Instead of asking repeatedly, let heartbeat handle routine monitoring:
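If your setup does not already schedule heartbeats for you, one possible sketch is a small cron-driven check script. The path, log location, schedule, and checks below are all assumptions about one possible setup, not the actual heartbeat implementation:

```shell
#!/bin/sh
# Sketch of a heartbeat check script; schedule it from cron, for example:
#   */30 * * * * ~/clawd/scripts/heartbeat-check.sh
log="$HOME/clawd/memory/heartbeat.log"
mkdir -p "$(dirname "$log")"              # make sure the log directory exists
{
    date "+%Y-%m-%d %H:%M heartbeat"      # timestamp each run
    if ollama ps >/dev/null 2>&1; then
        echo "ollama: up"
    else
        echo "ollama: DOWN"               # flag for the next real check-in
    fi
} >> "$log"
```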

This runs automatically without burning your conversation tokens.

🤖 Smart Model Selection Guide

LLM Gateway Commands (Telegram)

```
/ask "your question"       → Auto-routes to best model
/code "write a script"     → Forces Qwen Coder (code specialist)
/vision "image-url"        → Forces Llama 11B (fast vision)
/analyze "doc-url"         → Forces Llama 90B (deep analysis)
/screenshot "error-url"    → Forces Kimi (debug + thinking)
/think "complex problem"   → Deep reasoning mode
/usage                     → Check today's API usage
```

LLM Gateway CLI

```bash
# Quick commands
~/dta/gateway/ask "question"
~/dta/gateway/think-deep "problem"
~/dta/gateway/analyze-screenshot "image-url"
~/dta/gateway/llm-usage

# Force specific model
python3 ~/dta/gateway/llm-gateway.py --force qwen_coder "write code"
python3 ~/dta/gateway/llm-gateway.py --force llama_90b --image "url" "analyze"
```
💡 Smart Routing Logic:
• Code keywords → Qwen Coder
• Complex/long docs → Llama 90B
• Screenshots with errors → Kimi + thinking
• Fast vision → Llama 11B
• Simple queries → Ollama (free!)
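The routing rules above can be pictured as a shell `case` statement. This is only an illustration of the rules, not the gateway's actual logic (which lives in `llm-gateway.py`); the model labels other than `qwen_coder` and `llama_90b` are guesses.

```shell
#!/bin/sh
# Illustrative router mirroring the bullet list above; keyword patterns and
# model labels are assumptions, checked in priority order.
route_model() {
    case "$1" in
        *screenshot*|*error*)            echo "kimi" ;;        # debug + thinking
        *"write code"*|*script*|*bash*)  echo "qwen_coder" ;;  # code specialist
        *analyze*|*document*)            echo "llama_90b" ;;   # deep analysis
        *image*)                         echo "llama_11b" ;;   # fast vision
        *)                               echo "ollama" ;;      # simple → free
    esac
}

route_model "write code to parse logs"   # → qwen_coder
route_model "what is the capital of NL"  # → ollama
```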

NVIDIA API Limits

50 calls/day total, shared across Kimi K2.5, Llama 90B, Llama 11B, and Qwen Coder.

Check remaining calls: ~/dta/gateway/llm-usage
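One way to picture the shared budget is a hypothetical local counter that refuses calls past the daily limit. `llm-usage` is the real checker; the counter file, path, and fallback behavior here are illustrative assumptions only.

```shell
#!/bin/sh
# Hypothetical budget guard for the 50-call/day NVIDIA limit.
LIMIT=50
counter="/tmp/nvidia-calls-$(date +%Y-%m-%d)"   # date-stamped, so it resets daily
used=$(cat "$counter" 2>/dev/null || echo 0)
if [ "$used" -ge "$LIMIT" ]; then
    echo "NVIDIA budget exhausted ($used/$LIMIT) -- fall back to ollama"
else
    echo $((used + 1)) > "$counter"             # record one more call
    echo "call allowed ($((used + 1))/$LIMIT)"
fi
```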

🤖 Using Sub-Agents Effectively

When to Spawn a Sub-Agent

Spawn a sub-agent when the work is heavy and self-contained: long research runs, bulk file summarization, or anything that doesn't need your main conversation's context and would otherwise burn its token budget.

How to Spawn

```python
sessions_spawn(
    task="Your detailed task description",
    model="local",            # or "gemini", "claude", etc.
    label="Task-Name",        # optional: friendly name
    cleanup="delete",         # or "keep" to preserve logs
    runTimeoutSeconds=600     # optional: max runtime
)
```

Sub-Agent Returns

When a sub-agent completes, its results are automatically announced back to your main session.

Cost Comparison

| Approach | Tokens Used | Cost |
|---|---|---|
| Main agent (Claude) does research | ~50,000 | 💰💰💰 High |
| Sub-agent (Gemini) does research | ~50,000 (Gemini) | 💰 Lower |
| Sub-agent (Ollama local) does research | ~50,000 (local) | 🆓 FREE |
⚠️ Trade-off: Cheaper models = lower quality. Use Claude when quality matters, sub-agents for grunt work.

📦 Batch Operations & Efficiency

Combine Multiple Queries

Reduce round-trips by asking for multiple things at once:

```
# Instead of separate messages
"Check email" → "Check calendar" → "Check weather"

# Do this
"Morning briefing: check email for urgent items, calendar for today's events, and weather forecast"
```

File Operations

Batch read/write instead of one-at-a-time:

```
# Efficient
"Read all .md files in memory/ and summarize key points"

# Inefficient
"Read memory/2026-02-10.md" → "Now read 2026-02-11.md" → etc.
```

Script Execution

Write scripts for repeated operations:

```
# Instead of asking me every time
"Create a script: ~/clawd/scripts/daily-check.sh"

# Then just run it
"Run daily-check.sh"
```
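As a sketch, a hypothetical `daily-check.sh` could bundle checks that already appear elsewhere in this guide; the exact selection and order are assumptions, not the script's actual contents.

```shell
#!/bin/sh
# Sketch of what daily-check.sh might contain; each command is covered
# elsewhere in this guide, but this particular bundle is an assumption.
daily_check() {
    echo "=== Daily check: $(date +%Y-%m-%d) ==="
    ollama ps 2>/dev/null || echo "ollama: not running"                   # local models
    ~/dta/gateway/llm-usage 2>/dev/null || echo "llm-usage: unavailable"  # NVIDIA budget
}
daily_check
```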

⚡ Quick Command Reference

System

```bash
# Check system health
vm_stat | head -10

# Check RAM
sysctl hw.memsize
top -l 1 | grep PhysMem

# Check Ollama status
ollama ps

# Dashboard
open http://100.82.234.66:8080
```

LLM Gateway

```bash
# Quick query (Telegram)
/ask "your question"

# CLI query
~/dta/gateway/ask "your question"

# Check usage
~/dta/gateway/llm-usage
/usage

# Force model
~/dta/gateway/ask --force ollama "simple query"
```

Email

```bash
# List inbox
himalaya envelope list

# Search
himalaya envelope list from:recruiter subject:job

# Read
himalaya message read <id>
```

Tasks

```bash
# Add to Things
things add "Task description"

# View today
things show today

# Reminders
remindctl add "Reminder" --date tomorrow
```

Notes

```bash
# Apple Notes
memo new "Note content"
memo list
memo search "query"

# Memory: write to today's log
echo "Event happened" >> ~/clawd/memory/$(date +%Y-%m-%d).md
```

✨ Best Practices

Do's ✅

- Start simple queries on free local Ollama
- Batch related requests into one message
- Spawn sub-agents for heavy, self-contained work
- Check /usage before spending NVIDIA API calls

Don'ts ❌

- Don't send simple lookups to Claude Sonnet 4
- Don't repeat routine checks by hand; let heartbeat handle them
- Don't burn the 50-call NVIDIA budget on trivial tasks

Token Budget Guidelines

Conservative: ~100k tokens/day
Moderate: ~200k tokens/day
Heavy: 300k+ tokens/day

Current session limit: 1M tokens (~200k used at the time of writing)

When Quality Matters More Than Cost

Use Claude Sonnet 4 for:

- Complex, multi-step conversations
- Work where output quality directly matters

Use cheaper models for:

- Simple queries and quick lookups
- Research and other grunt work (via sub-agents)
- Routine monitoring and periodic checks

🤖 The Bot Team

Tommie's AI infrastructure includes multiple specialized bots working together:

| Bot | Host | Telegram | Role |
|---|---|---|---|
| 🧠 Main Agent | Mac Mini (100.82.234.66) | @tommie77bot | Primary orchestrator, Clawdbot gateway |
| 🍑 Bottom Bitch | Dell (100.119.87.108) | @Thats_My_Bottom_Bitch_bot | Cross-node coordinator, infrastructure helper |
| 🐭 Pinky | Mac Pro (100.64.58.30) | @Pinkypickles_bot | Compute node assistant, heavy inference |
| 🥜 Deez Nutz | TBD | @look_at_deeznutszbot | Needs setup |

Group Chat: "The Bot Chat"

All bots communicate in a shared Telegram group for coordination.

💡 Team Rule: Each bot monitors and responds to @ mentions in The Bot Chat. Check it on every heartbeat!