💰 Token Saving Strategies

Claude Sonnet 4 is powerful but expensive. Here's how to minimize costs while maximizing efficiency:

1. Use the Right Model for the Job

| Task Type | Best Model | Cost | When to Use |
|---|---|---|---|
| Simple queries | Ollama (qwen2.5:3b) | $0 (FREE) | Basic questions, quick lookups, simple tasks |
| Code tasks | Qwen Coder 32B | NVIDIA API | Python, JavaScript, bash scripting |
| Quick image analysis | Llama 11B Vision | NVIDIA API | Fast screenshot review |
| Deep analysis | Llama 90B Vision | NVIDIA API | Long documents, complex forms |
| Screenshot debugging | Kimi K2.5 | NVIDIA API | Error analysis with thinking mode |
| Complex conversations | Claude Sonnet 4 | $$$ EXPENSIVE | When you need the best |
💡 Pro Tip: Use local Ollama (free) first for simple queries; escalate to Claude only when needed.
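The escalate-only-when-needed idea can be sketched as a small wrapper. This is a hypothetical helper, not part of the gateway: it assumes `ollama` is installed locally, and the fallback behavior is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper: try the free local model first, escalate only on failure.
# Assumes `ollama` is installed; model name and fallback message are placeholders.
ask_cheap_first() {
    prompt="$1"
    # Try the free local model first
    if answer=$(ollama run qwen2.5:3b "$prompt" 2>/dev/null) && [ -n "$answer" ]; then
        printf '%s\n' "$answer"
    else
        # Escalate to the paid model only when the local one fails
        printf 'local model unavailable; escalate to Claude\n' >&2
        return 1
    fi
}
```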

2. Spawn Sub-Agents for Heavy Work

Sub-agents run in isolated sessions with their own token budgets, making them ideal for heavy, self-contained work like research or bulk summarization:

```python
sessions_spawn(
    task="Research AI job market 2026",
    model="local",       # or "gemini" for cheap cloud
    cleanup="delete"     # auto-cleanup when done
)
```
⚠️ Warning: Sub-agents don't have your conversation context. Keep task descriptions clear and self-contained.

3. Batch Operations

Instead of multiple back-and-forth exchanges:

❌ Inefficient (burns tokens):

```
"Check my email" → Response
"Now check calendar" → Response
"Now check weather" → Response
```

✅ Efficient (one request):

"Check email for urgent messages, upcoming calendar events today, and weather forecast"

4. Use Heartbeat for Periodic Checks

Instead of asking repeatedly, let heartbeat handle routine monitoring:
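If your setup does not already schedule heartbeats for you, one possible sketch is a small cron-driven check script. The path, log location, schedule, and checks below are all assumptions about one possible setup, not the actual heartbeat implementation:

```shell
#!/bin/sh
# Sketch of a heartbeat check script; schedule it from cron, for example:
#   */30 * * * * ~/clawd/scripts/heartbeat-check.sh
log="$HOME/clawd/memory/heartbeat.log"
mkdir -p "$(dirname "$log")"              # make sure the log directory exists
{
    date "+%Y-%m-%d %H:%M heartbeat"      # timestamp each run
    if ollama ps >/dev/null 2>&1; then
        echo "ollama: up"
    else
        echo "ollama: DOWN"               # flag for the next real check-in
    fi
} >> "$log"
```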

This runs automatically without burning your conversation tokens.

🤖 Smart Model Selection Guide

LLM Gateway Commands (Telegram)

```
/ask "your question"       → Auto-routes to best model
/code "write a script"     → Forces Qwen Coder (code specialist)
/vision "image-url"        → Forces Llama 11B (fast vision)
/analyze "doc-url"         → Forces Llama 90B (deep analysis)
/screenshot "error-url"    → Forces Kimi (debug + thinking)
/think "complex problem"   → Deep reasoning mode
/usage                     → Check today's API usage
```

LLM Gateway CLI

```bash
# Quick commands
~/dta/gateway/ask "question"
~/dta/gateway/think-deep "problem"
~/dta/gateway/analyze-screenshot "image-url"
~/dta/gateway/llm-usage

# Force specific model
python3 ~/dta/gateway/llm-gateway.py --force qwen_coder "write code"
python3 ~/dta/gateway/llm-gateway.py --force llama_90b --image "url" "analyze"
```
💡 Smart Routing Logic:
• Code keywords → Qwen Coder
• Complex/long docs → Llama 90B
• Screenshots with errors → Kimi + thinking
• Fast vision → Llama 11B
• Simple queries → Ollama (free!)
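The routing rules above can be pictured as a shell `case` statement. This is only an illustration of the rules, not the gateway's actual logic (which lives in `llm-gateway.py`); the model labels other than `qwen_coder` and `llama_90b` are guesses.

```shell
#!/bin/sh
# Illustrative router mirroring the bullet list above; keyword patterns and
# model labels are assumptions, checked in priority order.
route_model() {
    case "$1" in
        *screenshot*|*error*)            echo "kimi" ;;        # debug + thinking
        *"write code"*|*script*|*bash*)  echo "qwen_coder" ;;  # code specialist
        *analyze*|*document*)            echo "llama_90b" ;;   # deep analysis
        *image*)                         echo "llama_11b" ;;   # fast vision
        *)                               echo "ollama" ;;      # simple → free
    esac
}

route_model "write code to parse logs"   # → qwen_coder
route_model "what is the capital of NL"  # → ollama
```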

NVIDIA API Limits

50 calls/day total, shared across Kimi K2.5, Llama 90B, Llama 11B, and Qwen Coder.

Check remaining calls: ~/dta/gateway/llm-usage
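One way to picture the shared budget is a hypothetical local counter that refuses calls past the daily limit. `llm-usage` is the real checker; the counter file, path, and fallback behavior here are illustrative assumptions only.

```shell
#!/bin/sh
# Hypothetical budget guard for the 50-call/day NVIDIA limit.
LIMIT=50
counter="/tmp/nvidia-calls-$(date +%Y-%m-%d)"   # date-stamped, so it resets daily
used=$(cat "$counter" 2>/dev/null || echo 0)
if [ "$used" -ge "$LIMIT" ]; then
    echo "NVIDIA budget exhausted ($used/$LIMIT) -- fall back to ollama"
else
    echo $((used + 1)) > "$counter"             # record one more call
    echo "call allowed ($((used + 1))/$LIMIT)"
fi
```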

🤖 Using Sub-Agents Effectively

When to Spawn a Sub-Agent

Spawn a sub-agent when the work is heavy and self-contained: long research runs, bulk file summarization, or anything that doesn't need your main conversation's context and would otherwise burn its token budget.

How to Spawn

```python
sessions_spawn(
    task="Your detailed task description",
    model="local",            # or "gemini", "claude", etc.
    label="Task-Name",        # optional: friendly name
    cleanup="delete",         # or "keep" to preserve logs
    runTimeoutSeconds=600     # optional: max runtime
)
```

Sub-Agent Returns

When a sub-agent completes, its results are automatically announced back to your main session.

Cost Comparison

| Approach | Tokens Used | Cost |
|---|---|---|
| Main agent (Claude) does research | ~50,000 | 💰💰💰 High |
| Sub-agent (Gemini) does research | ~50,000 (Gemini) | 💰 Lower |
| Sub-agent (Ollama local) does research | ~50,000 (local) | 🆓 FREE |
⚠️ Trade-off: Cheaper models = lower quality. Use Claude when quality matters, sub-agents for grunt work.

📦 Batch Operations & Efficiency

Combine Multiple Queries

Reduce round-trips by asking for multiple things at once:

```
# Instead of separate messages
"Check email" → "Check calendar" → "Check weather"

# Do this
"Morning briefing: check email for urgent items, calendar for today's events, and weather forecast"
```

File Operations

Batch read/write instead of one-at-a-time:

```
# Efficient
"Read all .md files in memory/ and summarize key points"

# Inefficient
"Read memory/2026-02-10.md" → "Now read 2026-02-11.md" → etc.
```

Script Execution

Write scripts for repeated operations:

```
# Instead of asking me every time
"Create a script: ~/clawd/scripts/daily-check.sh"

# Then just run it
"Run daily-check.sh"
```
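As a sketch, a hypothetical `daily-check.sh` could bundle checks that already appear elsewhere in this guide; the exact selection and order are assumptions, not the script's actual contents.

```shell
#!/bin/sh
# Sketch of what daily-check.sh might contain; each command is covered
# elsewhere in this guide, but this particular bundle is an assumption.
daily_check() {
    echo "=== Daily check: $(date +%Y-%m-%d) ==="
    ollama ps 2>/dev/null || echo "ollama: not running"                   # local models
    ~/dta/gateway/llm-usage 2>/dev/null || echo "llm-usage: unavailable"  # NVIDIA budget
}
daily_check
```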

⚡ Quick Command Reference

System

```bash
# Check system health
vm_stat | head -10

# Check RAM
sysctl hw.memsize
top -l 1 | grep PhysMem

# Check Ollama status
ollama ps

# Dashboard
open http://100.82.234.66:8080
```

LLM Gateway

```bash
# Quick query (Telegram)
/ask "your question"

# CLI query
~/dta/gateway/ask "your question"

# Check usage
~/dta/gateway/llm-usage
/usage

# Force model
~/dta/gateway/ask --force ollama "simple query"
```

Email

```bash
# List inbox
himalaya envelope list

# Search
himalaya envelope list from:recruiter subject:job

# Read
himalaya message read <id>
```

Tasks

```bash
# Add to Things
things add "Task description"

# View today
things show today

# Reminders
remindctl add "Reminder" --date tomorrow
```

Notes

```bash
# Apple Notes
memo new "Note content"
memo list
memo search "query"

# Memory: write to today's log
echo "Event happened" >> ~/clawd/memory/$(date +%Y-%m-%d).md
```

✨ Best Practices

Do's ✅

- Start simple queries on free local Ollama
- Batch related requests into one message
- Spawn sub-agents for heavy, self-contained work
- Check /usage before spending NVIDIA API calls

Don'ts ❌

- Don't send simple lookups to Claude Sonnet 4
- Don't repeat routine checks by hand; let heartbeat handle them
- Don't burn the 50-call NVIDIA budget on trivial tasks

Token Budget Guidelines

Conservative: ~100k tokens/day
Moderate: ~200k tokens/day
Heavy: 300k+ tokens/day

Current session limit: 1M tokens (~200k used at the time of writing)

When Quality Matters More Than Cost

Use Claude Sonnet 4 for:

- Complex, multi-step conversations
- Work where output quality directly matters

Use cheaper models for:

- Simple queries and quick lookups
- Research and other grunt work (via sub-agents)
- Routine monitoring and periodic checks

🤖 The Bot Team

Tommie's AI infrastructure includes multiple specialized bots working together:

| Bot | Host | Telegram | Role |
|---|---|---|---|
| 🧠 Main Agent | Mac Mini (100.82.234.66) | @tommie77bot | Primary orchestrator, Clawdbot gateway |
| 🍑 Bottom Bitch | Dell (100.119.87.108) | @Thats_My_Bottom_Bitch_bot | Cross-node coordinator, infrastructure helper |
| 🐭 Pinky | Mac Pro (100.64.58.30) | @Pinkypickles_bot | Compute node assistant, heavy inference |
| 🥜 Deez Nutz | TBD | @look_at_deeznutszbot | Needs setup |

Group Chat: "The Bot Chat"

All bots communicate in a shared Telegram group for coordination.

💡 Team Rule: Each bot monitors and responds to @ mentions in The Bot Chat. Check it on every heartbeat!