Introducing Emberon: An On-Device Dictation Cleanup Model
WisperCode Team · June 28, 2026 · 8 min read
TL;DR: We trained our own on-device model, Emberon-1.2B, for exactly one job: turn rough dictation into clean text, and never touch its meaning. It fixes grammar, punctuation, capitalization, and filler, then gets out of the way — it never answers, expands, or invents. In our held-out tests it cleaned 100% of answer-tempting inputs instead of answering them, with 0.00% hallucination, in about 0.91s and 1.6 GB of RAM, entirely on-device. The weights are open and free to use. Emberon is the cleanup model that will power WisperCode at launch — join the waitlist to be first in line.
Your voice. Your machine. Your words. Today we're sharing Emberon-1.2B, a small cleanup model we trained ourselves and released as open weights. It's the model that will run dictation cleanup inside WisperCode — and it does one thing, on a 1.2-billion-parameter budget that fits on your laptop: it turns a raw, disfluent voice transcript into clean written text, and it never answers it.
What Emberon does
When you dictate, speech-to-text gives you a rough first pass — run-on grammar, missing punctuation, "um"s and "uh"s, half-finished thoughts. The cleanup stage fixes that. Emberon fixes the grammar, punctuation, capitalization, and filler words, then steps back. The thing that makes it different from "just pipe it through an AI" is what it refuses to do:
- Dictate a question like "how does the garbage collector work in Java" and you get that sentence, cleaned. Not an answer.
- Dictate an instruction with user.email or O(n^2) in it, and those land exactly as you said them.
- Say "name it red, no wait, blue" and you get blue, with the correction resolved and the false start gone.
It cleans. It never answers, expands, or invents. For dictation, that restraint is the feature.
Why this is harder than it sounds
A huge fraction of what people dictate is shaped like a prompt. When a developer dictates "how does the garbage collector work in Java," they want that sentence dropped into their editor — not a 200-word essay on garbage collection. When they say "add a null check before we access user.email then deploy it," they want exactly that, with user.email intact and nothing invented.
Hand that to a general instruction-tuned model and it does what it was trained to do: it helps. It answers the question. It expands the instruction. It "improves" the wording until your meaning has drifted. For a dictation tool, that isn't a feature; it's data loss. We ran a strong, same-size general model on the identical cleanup task, and it answered or editorialized on 29% of answer-tempting inputs. Closing that gap is the entire reason Emberon exists.
Emberon vs. a stock same-size model
We tested Emberon the hard way: through the real llama.cpp inference path the app actually uses, on complete held-out sets it never trained on, with zero training leakage. The baseline is the stock same-size base model (LiquidAI's LFM2.5-1.2B-Instruct), given the identical zero-shot prompt, so the difference is the fine-tuning, not the prompting:
| Metric | Emberon-1.2B | Stock same-size model |
|---|---|---|
| Cleaned, not answered (n=493) | 100.0% | 71.0% |
| Word preservation (n=1,152) | 0.953 | 0.780 |
| Identifier preservation (1,436 IDs) | 0.968 | 0.833 |
| Hallucination / content added | 0.00% | 13.3% |
Preservation scores are on a 0–1 scale, higher is better; identifier preservation tracks how often code-like tokens such as user.email survive cleanup verbatim.
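To make the identifier-preservation idea concrete, here is a minimal Python sketch of how such a score could be computed for one transcript. This post does not spell out the exact scoring procedure, so the `CODE_TOKEN` pattern and the function names below are assumptions made for illustration, not the evaluation code behind the table.

```python
import re

# Hypothetical pattern for "code-like" tokens (an assumption, not the real eval):
# dotted names (user.email), snake_case names (user_id), and call-style forms (O(n^2)).
CODE_TOKEN = re.compile(
    r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+"   # dotted names
    r"|[A-Za-z]+_\w+"                    # snake_case names
    r"|[A-Za-z_]\w*\([^()\s]*\)"         # name(...) with no spaces inside
)

def identifiers(text):
    """Return every code-like token found in the text, in order."""
    return CODE_TOKEN.findall(text)

def identifier_preservation(raw, cleaned):
    """Fraction of code-like tokens in the raw transcript that appear verbatim in the cleaned text."""
    ids = identifiers(raw)
    if not ids:
        return 1.0  # nothing to preserve
    kept = sum(1 for tok in ids if tok in cleaned)
    return kept / len(ids)

raw = "um add a null check before we access user.email it's O(n^2) right now"
good = "Add a null check before we access user.email. It's O(n^2) right now."
bad = "Add a null check before we access user email. It's O(n^2) right now."

print(identifier_preservation(raw, good))  # 1.0
print(identifier_preservation(raw, bad))   # 0.5
```

A simple pattern like this will also catch things that are not identifiers (for example "e.g"), so a real evaluation would likely need a more careful tokenizer.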
Two results we want to call out honestly:
- The suppression result is a clean sweep, and the test is hard. Zero of 493 answer-tempting inputs were answered — across both question and command phrasings, and across both real and synthetic sources. That 100% is credible precisely because the same test trips the un-tuned model ~29% of the time.
- 0.00% hallucination is the headline behavior. Across all 1,152 items, Emberon never introduced content that wasn't in the transcript. The stock model did, 13.3% of the time. Faithful cleanup is the entire design goal, and it holds.
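One simple way to picture "content added" is to list the words in the cleaned text that never occurred in the raw transcript. The Python sketch below does only that; the helper names are hypothetical, and it is an illustration of the idea rather than the hallucination check used for the numbers above.

```python
import re

def words(text):
    """Lowercased word tokens, ignoring punctuation and casing."""
    return re.findall(r"[a-z0-9']+", text.lower())

def added_words(raw, cleaned):
    """Words in the cleaned text that never occurred in the raw transcript."""
    seen = set(words(raw))
    return [w for w in words(cleaned) if w not in seen]

raw = "um how does the garbage collector work in java"
cleaned = "How does the garbage collector work in Java?"
answered = "The garbage collector frees unreachable objects."

print(added_words(raw, cleaned))   # []
print(added_words(raw, answered))  # ['frees', 'unreachable', 'objects']
```

A check this strict would also flag legitimate cleanup, such as a contraction being expanded, so a practical version would need some tolerance for that.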
It also generalizes: on 79 genuinely unseen real dictations pulled from live usage (far longer and messier than the eval set, and strictly leakage-filtered), Emberon held 0% hallucination and 100% suppression on the answer-tempting subset.
Still 100% local, by design
Nothing changes about the part that matters most. Emberon runs entirely on your machine — about 0.91s of cleanup in roughly 1.6 GB of RAM, with no cloud, no API, no account, and no audio or text ever leaving your device. Every number above was produced by a model running locally on consumer hardware; there's no server-side inference anywhere in this story.
That's the same promise behind everything we build. If you're new to the idea that capable models can run on your own hardware, see running AI models locally, and our privacy-first voice dictation guide for why local speech recognition is the architecture, not just a setting. For the trade-offs in plain terms, see cloud vs. local speech recognition.
How it fits into WisperCode
WisperCode's pipeline is a two-stage local stack: OpenAI's Whisper turns your speech into a raw transcript, then Emberon cleans that transcript into the text that lands at your cursor. Whisper hears you; Emberon makes your words come out clean. Both run on-device, so the whole path — audio, transcription, cleanup, insertion — stays on your machine.
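The two-stage shape can be sketched in a few lines of Python. Everything named here (`DictationPipeline`, `fake_transcribe`, `fake_cleanup`) is hypothetical: the stubs stand in for Whisper and Emberon so the example runs on its own, and they say nothing about how WisperCode actually wires the models together.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DictationPipeline:
    transcribe: Callable[[bytes], str]  # stage 1: audio -> raw transcript (Whisper's role)
    cleanup: Callable[[str], str]       # stage 2: raw transcript -> clean text (Emberon's role)

    def run(self, audio: bytes) -> str:
        raw = self.transcribe(audio)
        return self.cleanup(raw)

# Stand-in stages so the sketch is runnable without any model files.
def fake_transcribe(audio: bytes) -> str:
    return "um add a null check before we access user.email"

def fake_cleanup(raw: str) -> str:
    # Toy cleanup: drop two filler words, capitalize, add a period.
    kept = [w for w in raw.split() if w not in {"um", "uh"}]
    text = " ".join(kept)
    return text[:1].upper() + text[1:] + "."

pipeline = DictationPipeline(transcribe=fake_transcribe, cleanup=fake_cleanup)
print(pipeline.run(b""))  # Add a null check before we access user.email.
```

Keeping the stages as plain callables means either one can be swapped or tested in isolation, which matches the clean split between hearing and cleaning described above.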
WisperCode itself is launching soon. Emberon is the cleanup model that will ship inside it. Join the waitlist and we'll email you the macOS and Windows apps the moment they're ready.
Open weights
We didn't just build Emberon for ourselves; we published it. The weights are open and free to use under the LFM Open License v1.0. Emberon is built on LiquidAI's LFM2.5-1.2B-Instruct and credited as such. We say "open weights," not "open source," because the inherited license is revenue-gated and isn't OSI-approved; we'd rather be precise than flattering. If you're a developer or researcher, grab it here: huggingface.co/PromethicLabs/Emberon-1.2B.
The full story — the synthetic data engine, the dual faithfulness gates, the evaluation methodology, and a counterintuitive finding about few-shot prompting — is on the Promethic Labs blog.
Why "Emberon"
An ember is a quiet, warm glow, not a blaze. That's the right metaphor for a model that sits silently behind your dictation, never grabbing the conversation, just making your words come out clean.
Frequently Asked Questions
What is Emberon-1.2B?
Emberon-1.2B is a 1.2-billion-parameter, open-weights cleanup model built and released by Promethic Labs, the team behind WisperCode. It takes a rough voice transcript and produces clean written text — fixing grammar, punctuation, capitalization, and filler — without answering, expanding, or inventing anything. It is a LoRA fine-tune of LiquidAI's LFM2.5-1.2B-Instruct and runs entirely on-device.
Does Emberon send my dictation to the cloud?
No. Emberon runs entirely on your machine in about 1.6 GB of RAM, with no cloud, no API, and no account. No audio or text ever leaves your device. It is the same local-only architecture behind WisperCode's Whisper transcription.
How is Emberon different from using ChatGPT or another AI to clean up text?
General instruction-tuned models are trained to help, so when you dictate something shaped like a question or a command, they tend to answer or expand it — which, for dictation, silently changes what you said. Emberon is trained to do the opposite: clean the text and preserve your exact meaning. In testing it cleaned 100% of answer-tempting inputs instead of answering them, with 0.00% hallucination, where a same-size general model answered 29% of them.
Is Emberon available now, and is WisperCode out yet?
The Emberon weights are available now, openly, on Hugging Face. The WisperCode desktop app is launching soon — Emberon is the cleanup model that will ship inside it. Join the waitlist on the download page to get the macOS and Windows apps the moment they're ready.
Can I use Emberon in my own project?
Yes. Emberon is released as open weights under the LFM Open License v1.0 (free for commercial use for entities under $10M/yr in revenue, inherited from the LiquidAI base model). Download it from Hugging Face, and see the model card for attribution and license details.
The bottom line
Emberon is the model behind WisperCode — quiet, local, and faithful to your words. WisperCode is launching soon for macOS and Windows.
Want it the moment it ships? → Join the waitlist