PRIVATE, LOCAL & OFFLINE AI. AD-FREE WITH NO SUBSCRIPTION
TokForge runs large language models directly on your Android device fast. No cloud, no subscription, and no data leaving your pocket.
Whether you need a local AI assistant for productivity or a talking AI friend offline, TokForge delivers high-performance inference without an internet connection.
WHAT CAN IT DO? TOKFORGE FEATURES:
Chat with AI Characters
💬 Your offline AI chat experience just got an upgrade. Import TavernAI V2 character cards (PNG/JSON), customize personalities, and have real conversations with streaming generation. TokForge is the ultimate AI friend offline, featuring Lorebooks, alternate greetings, and world info. Reasoning models even include collapsible thinking blocks for deep logic.
Attach Documents & Ask Questions
📄 Turn TokForge into a powerful local AI research tool. Drop in a PDF, DOCX, EPUB, or text file and ask me anything app offline style. Using RAPTOR tree indexing and BGE-small embeddings, the app finds relevant passages instantly. Follow-up questions stay fast thanks to delta KV cache preservation.
Hear Responses Read Aloud
🔊 A true voice assistant for Android offline. Featuring on-device Kokoro TTS with 11 voices and two quality tiers, your offline assistant can read responses back to you with no latency and zero data usage.
2x Faster with Speculative Decoding
⚡ Experience the fastest LLM performance on mobile. A small draft model predicts ahead while the main model verifies in batch. With a live tok/s indicator and smart backend routing, it’s the most efficient AI on-device solution available.
Three Backends, Five GPU Paths
· MNN with OpenCL and Vulkan GPU: Tuned kernels for Mali and Adreno. TQ4 TurboQuant hits 46–57 tok/s on small models.
· GGUF via llama.cpp: ARM i8mm, Vulkan cooperative matrix, flash attention, and full quantization range.
· Remote API: OpenAI-compatible streaming to Ollama, vLLM, or llama.cpp servers.
· SoC-Aware Auto-Routing: This ai local assistant automatically picks the fastest path for your specific chipset.
ADVANCED AI OFFLINE CHAT FEATURES:
• Your AI Remembers You: Per-character persistent memory with background extraction. Knowledge graphs track entity relationships using hybrid keyword and semantic search.
• Tune Your Device: ForgeLab benchmarks every ai model and backend combo on your hardware. AutoForge sweeps all configs to pick the fastest settings for your offline ai app.
• Developer API: 120+ endpoints for full local control over HTTP. Load models, manage memory, and send messages programmatically.
TESTED ON REAL HARDWARE
- RedMagic 11 Pro: 21.0 tok/s — Qwen3-8B
- Galaxy S24 Ultra: 13.58 tok/s — Qwen3-4B
- OnePlus Ace 5 Ultra: 11.88 tok/s — Qwen3-8B
- Xiaomi Pad 7 Pro: 11.81 tok/s — Qwen3-4B
WHY TOKFORGE?
►This is the AI all in one app for users who refuse to compromise on speed or security.
►Zero analytics, zero telemetry, zero cloud dependency.
►Free ai chatbot offline: All inference happens on-device—airplane mode works perfectly.
►No accounts, no sign-up.
►17 curated models (0.6B–14B): Choose from Qwen3, DeepSeek-R1, Llama 3, Phi-4, and more.
Your smartphone is smarter and more powerful than you think. And by moving the brain of the AI directly onto your silicon, we've eliminated the lag, the costs, and the prying eyes of the cloud.
☑️Download this free offline AI powerhouse today and take control of your data.
Latest Version
3.4.7Uploaded by
Andrei Ancheta
Requires Android
Android 8.0+
Category
Free Entertainment AppContent Rating
Everyone
Security Report
Check Now
Report
Flag as inappropriateLast updated on Apr 7, 2026
Lot's of changes vs last upload. TurboQuant added under advanced settings, Cache clearing, RAG + Attachment support (Very Beta), Metrics/API work, UI/UX cleaning and improvements from beta tester feedback