🤖 Clawd Dashboard
💰 Token Saving Strategies
Claude Sonnet 4 is powerful but expensive. Here's how to minimize costs while maximizing efficiency:
1. Use the Right Model for the Job
| Task Type | Best Model | Cost | When to Use |
|---|---|---|---|
| Simple queries | Ollama (qwen2.5:3b) | $0 (FREE) | Basic questions, quick lookups, simple tasks |
| Code tasks | Qwen Coder 32B | NVIDIA API | Python, JavaScript, bash scripting |
| Quick image analysis | Llama 11B Vision | NVIDIA API | Fast screenshot review |
| Deep analysis | Llama 90B Vision | NVIDIA API | Long documents, complex forms |
| Screenshot debugging | Kimi K2.5 | NVIDIA API | Error analysis with thinking mode |
| Complex conversations | Claude Sonnet 4 | $$$ EXPENSIVE | When you need the best |
💡 Pro Tip: Use Ollama local (free) first for simple queries. Only escalate to Claude if needed.
2. Spawn Sub-Agents for Heavy Work
Sub-agents run in isolated sessions with their own token budgets. Perfect for:
Research tasks (use Gemini CLI - cheaper)
Long-running jobs
Parallel processing
Tasks that don't need conversation context
```python
sessions_spawn(
    task="Research AI job market 2026",
    model="local",      # or "gemini" for cheap cloud
    cleanup="delete"    # auto-cleanup when done
)
```
⚠️ Warning: Sub-agents don't have your conversation context. Keep task descriptions clear and self-contained.
3. Batch Operations
Instead of multiple back-and-forth exchanges:
❌ Inefficient (burns tokens):

```
"Check my email"      → Response
"Now check calendar"  → Response
"Now check weather"   → Response
```

✅ Efficient (one request):

```
"Check email for urgent messages, upcoming calendar events today, and weather forecast"
```
4. Use Heartbeat for Periodic Checks
Instead of asking repeatedly, let heartbeat handle routine monitoring:
System health checks
Email monitoring
Calendar reminders
Background maintenance
This runs automatically without burning your conversation tokens.
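The heartbeat idea boils down to a plain polling loop. A minimal sketch is below; the check functions (`check_system_health`, `check_email`) are hypothetical placeholders, not Clawd's actual heartbeat implementation.

```python
import time

# Hypothetical check functions -- illustrative names, not real APIs.
def check_system_health():
    return "ok"

def check_email():
    return []  # no urgent messages

def heartbeat(interval_seconds=1800, max_beats=None):
    """Run routine checks on a timer instead of in-conversation requests."""
    beats = 0
    results = None
    while max_beats is None or beats < max_beats:
        results = {
            "health": check_system_health(),
            "urgent_email": check_email(),
        }
        beats += 1
        if max_beats is not None and beats >= max_beats:
            return results
        time.sleep(interval_seconds)
    return results
```

The point is that the loop runs outside the conversation, so routine checks never touch your Claude token budget.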
🤖 Smart Model Selection Guide
LLM Gateway Commands (Telegram)
```
/ask "your question"       → Auto-routes to best model
/code "write a script"     → Forces Qwen Coder (code specialist)
/vision "image-url"        → Forces Llama 11B (fast vision)
/analyze "doc-url"         → Forces Llama 90B (deep analysis)
/screenshot "error-url"    → Forces Kimi (debug + thinking)
/think "complex problem"   → Deep reasoning mode
/usage                     → Check today's API usage
```
LLM Gateway CLI
```bash
# Quick commands
~/dta/gateway/ask "question"
~/dta/gateway/think-deep "problem"
~/dta/gateway/analyze-screenshot "image-url"
~/dta/gateway/llm-usage

# Force specific model
python3 ~/dta/gateway/llm-gateway.py --force qwen_coder "write code"
python3 ~/dta/gateway/llm-gateway.py --force llama_90b --image "url" "analyze"
```
💡 Smart Routing Logic:
• Code keywords → Qwen Coder
• Complex/long docs → Llama 90B
• Screenshots with errors → Kimi + thinking
• Fast vision → Llama 11B
• Simple queries → Ollama (free!)
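Those routing rules amount to a small keyword dispatcher. Here is a sketch of how such routing could work; the keyword list, length threshold, and model labels are illustrative assumptions, not the gateway's actual code.

```python
def route(query: str, has_image: bool = False, has_error: bool = False) -> str:
    """Pick a model per the routing rules above (a sketch, not the
    gateway's real implementation)."""
    q = query.lower()
    code_words = ("script", "python", "javascript", "bash", "function", "code")
    if has_image:
        # Screenshots with errors get Kimi's thinking mode; otherwise fast vision.
        return "kimi_k2.5" if has_error else "llama_11b"
    if any(w in q for w in code_words):
        return "qwen_coder"
    if len(q) > 500:  # long/complex documents
        return "llama_90b"
    return "ollama"   # simple queries stay free
```

For example, `route("write a bash script to rotate logs")` lands on the code specialist, while a short factual question falls through to free local Ollama.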
NVIDIA API Limits
50 calls/day total shared across: Kimi K2.5, Llama 90B, Llama 11B, Qwen Coder
Check remaining calls: ~/dta/gateway/llm-usage
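One way to track a shared daily quota like this is a small dated counter file. The sketch below shows the idea; the JSON layout and `record_call` helper are assumptions for illustration, not `llm-usage`'s real format.

```python
import datetime
import json
import pathlib

DAILY_LIMIT = 50  # shared across Kimi K2.5, Llama 90B, Llama 11B, Qwen Coder

def record_call(counter_file: pathlib.Path, model: str) -> int:
    """Increment today's shared NVIDIA call count; return calls remaining.
    Resets automatically when the stored date rolls over."""
    today = datetime.date.today().isoformat()
    data = {"date": today, "calls": []}
    if counter_file.exists():
        stored = json.loads(counter_file.read_text())
        if stored.get("date") == today:
            data = stored  # same day: keep counting
    data["calls"].append(model)
    counter_file.write_text(json.dumps(data))
    return DAILY_LIMIT - len(data["calls"])
```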
🤖 Using Sub-Agents Effectively
When to Spawn a Sub-Agent
Long research tasks that would burn too many Claude tokens
Background jobs (video processing, batch operations)
Tasks requiring a different model (Gemini for research)
Parallel processing (multiple tasks simultaneously)
How to Spawn
```python
sessions_spawn(
    task="Your detailed task description",
    model="local",           # or "gemini", "claude", etc.
    label="Task-Name",       # optional: friendly name
    cleanup="delete",        # or "keep" to preserve logs
    runTimeoutSeconds=600    # optional: max runtime
)
```
Sub-Agent Returns
When a sub-agent completes, its results are automatically announced back to your main session.
Cost Comparison
| Approach | Tokens Used | Cost |
|---|---|---|
| Main agent (Claude) does research | ~50,000 | 💰💰💰 High |
| Sub-agent (Gemini) does research | ~50,000 (Gemini) | 💰 Lower |
| Sub-agent (Ollama local) does research | ~50,000 (local) | 🆓 FREE |
⚠️ Trade-off: Cheaper models = lower quality. Use Claude when quality matters, sub-agents for grunt work.
📦 Batch Operations & Efficiency
Combine Multiple Queries
Reduce round-trips by asking for multiple things at once:
```
# Instead of separate messages
"Check email" → "Check calendar" → "Check weather"

# Do this
"Morning briefing: check email for urgent items, calendar for today's events, and weather forecast"
```
File Operations
Batch read/write instead of one-at-a-time:
```
# Efficient
"Read all .md files in memory/ and summarize key points"

# Inefficient
"Read memory/2026-02-10.md" → "Now read 2026-02-11.md" → etc.
```
Script Execution
Write scripts for repeated operations:
```
# Instead of asking me every time
"Create a script: ~/clawd/scripts/daily-check.sh"

# Then just run it
"Run daily-check.sh"
```
⚡ Quick Command Reference
System
```bash
# Check system health
vm_stat | head -10

# Check RAM
sysctl hw.memsize
top -l 1 | grep PhysMem

# Check Ollama status
ollama ps

# Dashboard
open http://100.82.234.66:8080
```
LLM Gateway
```bash
# Quick query (Telegram)
/ask "your question"

# CLI query
~/dta/gateway/ask "your question"

# Check usage
~/dta/gateway/llm-usage
/usage

# Force model
~/dta/gateway/ask --force ollama "simple query"
```
Email
```bash
# List inbox
himalaya envelope list

# Search
himalaya envelope list from:recruiter subject:job

# Read
himalaya message read <id>
```
Tasks
```bash
# Add to Things
things add "Task description"

# View today
things show today

# Reminders
remindctl add "Reminder" --date tomorrow
```
Notes
```bash
# Apple Notes
memo new "Note content"
memo list
memo search "query"

# Memory: write to today's log
echo "Event happened" >> ~/clawd/memory/$(date +%Y-%m-%d).md
```
✨ Best Practices
Do's ✅
Use Ollama local for simple queries (FREE)
Spawn sub-agents for research/heavy work
Batch multiple requests into one message
Let heartbeat handle routine monitoring
Use LLM Gateway for non-Claude tasks
Write scripts for repeated operations
Check token usage periodically
Don'ts ❌
Don't use Claude for simple lookups (use Ollama/LLM Gateway)
Don't make separate requests when you can batch
Don't keep asking the same thing (write it once, reference later)
Don't spawn sub-agents for quick tasks (overhead not worth it)
Don't forget to check NVIDIA API limits (50/day)
Token Budget Guidelines
Conservative: ~100k tokens/day
Moderate: ~200k tokens/day
Heavy: 300k+ tokens/day
Current session limit: 1M tokens (200k used right now)
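Those guidelines map to a trivial classifier, sketched here only to make the arithmetic explicit; the cutoffs mirror the approximate tier numbers above, and the session limit is the 1M figure stated.

```python
SESSION_LIMIT = 1_000_000  # current session token limit

def budget_tier(tokens_per_day: int) -> str:
    """Classify daily usage against the guideline tiers above."""
    if tokens_per_day <= 100_000:
        return "conservative"
    if tokens_per_day <= 200_000:
        return "moderate"
    return "heavy"

def remaining(used: int) -> int:
    """Tokens left in the current session."""
    return SESSION_LIMIT - used
```

For example, with 200k used, `remaining(200_000)` leaves 800k in the session.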
When Quality Matters More Than Cost
Use Claude Sonnet 4 for:
Important decisions
Complex problem-solving
Creative work
Sensitive communications
When sub-agents have failed
Use cheaper models for:
Information retrieval
Summarization
Simple transformations
Background research
Routine monitoring
🤖 The Bot Team
Tommie's AI infrastructure includes multiple specialized bots working together:
| Bot | Host | Telegram | Role |
|---|---|---|---|
| 🧠 Main Agent | Mac Mini (100.82.234.66) | @tommie77bot | Primary orchestrator, Clawdbot gateway |
| 🍑 Bottom Bitch | Dell (100.119.87.108) | @Thats_My_Bottom_Bitch_bot | Cross-node coordinator, infrastructure helper |
| 🐭 Pinky | Mac Pro (100.64.58.30) | @Pinkypickles_bot | Compute node assistant, heavy inference |
| 🥜 Deez Nutz | TBD | @look_at_deeznutszbot | Needs setup |
Group Chat: "The Bot Chat"
All bots communicate in a shared Telegram group for coordination.
Chat ID: -1003779327245 / -5052671848
@ mention each other for tasks
Share knowledge and reports
Don't step on each other's work
💡 Team Rule: Each bot monitors and responds to @ mentions in The Bot Chat. Check it on every heartbeat!
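The @ mention rule can be sketched as a simple substring check on incoming Bot Chat messages; `should_respond` and the default handle here are illustrative only, not the bots' actual heartbeat code.

```python
def should_respond(message: str, my_handle: str = "@Pinkypickles_bot") -> bool:
    """Heartbeat-check sketch: act only when this bot is @ mentioned,
    so bots don't step on each other's work."""
    return my_handle.lower() in message.lower()
```

Each bot checks its own handle, so a task addressed to `@Pinkypickles_bot` is ignored by the other three.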