1. What This App Can Do
This app lets you run large language models (LLMs) entirely on your Android device, enabling private text generation without external servers.
You can load GGUF models from HuggingFace or from local storage, providing a flexible offline AI environment.
The app supports a wide range of llama.cpp‑compatible models, including Gemma‑4 and Bonsai.
An Ollama‑compatible API server is also included, allowing other apps or scripts to access your local LLM via standard HTTP endpoints.
Web UI is Available. Vision as multimodel is Available with Gemma-4.
Upload mmproj file from local device. Only one mmproj model is acceptable at same time.
MCP and Function Calling are available.
---
2. Intended Users and Supported Devices
Ideal for:
- Users who want fully local LLMs
- Those using GGUF models from HuggingFace or local files
- Advanced users needing detailed parameter control
- Developers calling a local LLM from their apps
- Privacy‑focused users
The app works on a wide range of Android devices. By adjusting context size, threads, and batch size, you can tune performance for your hardware.
---
3. Key Features
- Load GGUF models from HuggingFace or local storage
- Fully offline inference
- Supports Gemma‑4 and other llama.cpp‑compatible models
- Detailed parameter settings (Mirostat, DRY, XTC, etc.)
- Ollama‑compatible API: /api/chat, /api/generate, /api/tags
- Automatic prompt template selection
- Streaming output option
- Comprehensive logs and UI safeguards
- Various minor improvements for stability and usability
- Web UI is Available.
- Vision as multimodel is Available with Gemma-4.
---
4. Getting Started
1. Open Settings.
2. Enter a HuggingFace GGUF URL or choose a local GGUF file, then tap Load Model.
3. Adjust parameters and tap Save Config.
4. Tap SAVE & CLOSE to apply settings.
---
5. Main Screen Functions
- Enter Prompt: Input your instruction
- Send: Start generation
- Re‑init Model: Reload current model
- View Log / Clear Log
- Start/Stop API Server
- Copy output or logs
- View timestamped processing logs
---
6. Settings Screen Highlights
- Save/load/delete configurations
- Model selection from URL or local storage
- Context size, temperature, Mirostat, DRY, XTC, etc.
- Streaming output toggle
- Custom or auto‑selected prompt templates
- API server port settings
- Log verbosity options
- Manual and privacy policy
---
7. Prompt Templates and Stop Sequences
The app detects the model family from the filename and selects an appropriate template.
It also stops generation when common delimiters appear to prevent runaway output.
Tips:
Gemma‑4 tends to repeat short phrases. Adding explicit anti‑repetition instructions in the system prompt or using stricter stop sequences can improve output quality.
---
8. API Server Capabilities
Provides:
- /api/chat
- /api/generate
- /api/tags
- /v1/chat/completions,
- /v1/models
- /props, /slots
- / for Web UI http://localhost:11434/
Only one generation request is processed at a time. Android 13+ may require notification permission.
---
9. How This App Stands Out
- GGUF loading from HuggingFace and local device
- Support for Gemma‑4 as multimodel via Web UI
- More detailed parameter control than typical local LLM apps
- Built‑in Ollama‑compatible API server
- Automatic template selection
- Flexible performance tuning
Latest Version
1.400Uploaded by
Emma Diaz
Requires Android
Android 7.0+
Category
Free Business AppContent Rating
Everyone
Security Report
Check Now
Report
Flag as inappropriateLast updated on May 5, 2026
MCP and Function Calling are Available.
Improve Settings usability.