Fully offline text-to-speech via Piper TTS. Self-contained setup, automatic language detection for 20+ languages, and per-call voice selection. Writes audio files into the OpenClaw workspace for easy attachment and sending.

Features

Fully offline synthesis: no API keys, and no network access except during setup and voice downloads
Self-contained setup — setup() installs Piper into an isolated venv, no system packages modified
Automatic language detection for 20+ languages with English as default
Per-call voice and speed selection: pass voice: "voice-stem" and lengthScale: 0.85 to tts()
Dynamic voice discovery: listVoices() returns whatever is installed — no hardcoded assumptions
On-demand voice download: downloadVoices(["en_US-ryan-medium", ...]) fetches models from HuggingFace
Voice removal: removeVoice("en_US-ryan-medium") deletes models you no longer need
Extensible: add any language by dropping in a Piper .onnx model
Writes outputs into the OpenClaw workspace for easy attachment
Default output: OGG/Opus (compact, widely compatible)

Requirements

python3 (3.8+) — for the one-time setup() step
ffmpeg — for WAV → OGG/Opus conversion
espeak-ng — system library used by Piper for phonemization (see note below)

No API keys. No system-wide package installation. Everything stays inside the skill directory.
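
As an illustration of the ffmpeg requirement, the WAV → OGG/Opus step could be driven from Node roughly as follows. This is a sketch, not the skill's actual code: the helper names `opusArgs` and `convertToOpus` and the 32k bitrate are assumptions, while the ffmpeg flags themselves (`-y`, `-i`, `-c:a libopus`, `-b:a`) are standard.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: build the ffmpeg argument list for WAV -> OGG/Opus.
// The default bitrate is an assumption, not a value taken from the skill.
function opusArgs(wavPath, oggPath, bitrate = '32k') {
  return ['-y', '-i', wavPath, '-c:a', 'libopus', '-b:a', bitrate, oggPath];
}

// Hypothetical wrapper: run ffmpeg without a shell and resolve with the output path.
function convertToOpus(wavPath, oggPath) {
  return new Promise((resolve, reject) => {
    execFile('ffmpeg', opusArgs(wavPath, oggPath), (err) =>
      (err ? reject(err) : resolve(oggPath)));
  });
}
```

Passing the arguments as an array to execFile means file names are never interpreted by a shell.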

Platform support

Platform	Status
Linux x86_64	Fully supported
macOS x86_64 / arm64	Fully supported
Linux ARM (Raspberry Pi, etc.)	May require building piper-tts from source
Windows	Not supported (bash dependency)

espeak-ng

Piper uses espeak-ng internally for text-to-phoneme conversion. On many systems it is already installed. setup() checks for it and warns if missing. If needed, install via your package manager:

# Debian / Ubuntu
sudo apt install espeak-ng

# Fedora / RHEL
sudo dnf install espeak-ng

# macOS
brew install espeak

After installing, TTS should work without re-running setup().

Installation

cp -r local-piper-tts-multilang-secure ~/.openclaw/skills/local-piper-tts-multilang-secure

Then ask your agent to set it up — it will call setup() after asking for your confirmation. setup() is a one-time operation that:

Creates a Python venv inside the skill directory
Installs piper-tts from PyPI into that venv
Checks for espeak-ng and warns if missing
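
The three steps above can be pictured as a short list of commands. The sketch below only builds that list; it does not run anything. The function name `setupCommands`, the venv directory name `venv`, and the use of `espeak-ng --version` as the probe are assumptions for illustration, not the skill's confirmed behaviour.

```javascript
const path = require('path');

// Hypothetical: the commands a setup() like the one described might run, in order.
function setupCommands(skillDir) {
  const venvDir = path.join(skillDir, 'venv'); // assumed directory name
  const pip = path.join(venvDir, 'bin', 'pip');
  return [
    ['python3', ['-m', 'venv', venvDir]], // 1. create the isolated venv
    [pip, ['install', 'piper-tts']],      // 2. install piper-tts from PyPI into it
    ['espeak-ng', ['--version']],         // 3. probe for espeak-ng; warn if this fails
  ];
}
```

Each entry is a `[command, args]` pair suitable for `child_process.execFile`, so no shell is involved.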

First run

After installation, tell your agent:

"Set up the local TTS skill"

The agent will:

Call status() and explain what needs to be done
Ask for confirmation, then run setup()
Offer to download English voice models (ryan-medium and/or amy-medium)
Ask if you need any other languages (German, French, Spanish, Polish, Italian, Russian, …)
Download your chosen voices, generate a short sample for each, and send them to you
Ask which voice you prefer
Ask for your preferred speech speed in % (100% = normal, the default; 125% = faster), then play a sample at that speed

Voice models

The skill ships with no voice models — you choose what to install. English is recommended as a baseline. Browse available models at: https://github.com/rhasspy/piper/blob/master/VOICES.md

Recommended English defaults

Stem	Voice	Size
`en_US-ryan-medium`	Male, American	~65 MB
`en_US-amy-medium`	Female, American	~65 MB

Download programmatically:

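const { downloadVoices } = require('./index');
// CommonJS modules have no top-level await, so handle the returned promise directly.
downloadVoices(['en_US-ryan-medium', 'en_US-amy-medium'])
  .catch(console.error);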

Or just ask your agent: "Download the English voices" — it will handle everything including playing samples so you can choose.
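
A voice stem such as en_US-ryan-medium encodes locale, voice name, and quality. The sketch below shows one way a stem could be parsed and mapped to file paths. It is an assumption-laden illustration: `parseVoiceStem` and `voiceFilePaths` are hypothetical names, the accepted quality levels are a guess, and the directory layout is assumed to mirror the rhasspy/piper-voices repository on HuggingFace. The skill's real download code may differ.

```javascript
// Hypothetical pattern for stems like "en_US-ryan-medium".
const STEM_RE = /^([a-z]{2,3})_([A-Z]{2})-([A-Za-z0-9_]+)-(x_low|low|medium|high)$/;

// Split a stem into its parts, rejecting anything that does not match.
function parseVoiceStem(stem) {
  const m = STEM_RE.exec(stem);
  if (!m) throw new Error(`Invalid voice stem: ${stem}`);
  const [, lang, region, name, quality] = m;
  return { lang, locale: `${lang}_${region}`, name, quality };
}

// Assumed repository layout: <lang>/<locale>/<name>/<quality>/<stem>.onnx(.json)
function voiceFilePaths(stem) {
  const { lang, locale, name, quality } = parseVoiceStem(stem);
  const dir = `${lang}/${locale}/${name}/${quality}`;
  return [`${dir}/${stem}.onnx`, `${dir}/${stem}.onnx.json`];
}
```

Because the pattern is anchored and excludes slashes and dots, a stem like "../x" is rejected before any path is built.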

To see what is installed:

require('./index').listVoices()
// ["en_US-ryan-medium", "de_DE-thorsten-medium", ...]

Or ask your agent: "What voices do you have available?"
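
Discovery of installed voices can be as simple as scanning a directory. The following is a minimal sketch of that idea, assuming voices are stored as `<stem>.onnx` plus `<stem>.onnx.json` in one folder; `discoverVoices` is a hypothetical name and not the skill's listVoices() implementation.

```javascript
const fs = require('fs');

// Hypothetical discovery: a voice counts as installed when both the .onnx model
// and its .onnx.json config are present in the given directory.
function discoverVoices(voicesDir) {
  if (!fs.existsSync(voicesDir)) return [];
  const files = new Set(fs.readdirSync(voicesDir));
  return [...files]
    .filter((f) => f.endsWith('.onnx') && files.has(`${f}.json`))
    .map((f) => f.slice(0, -'.onnx'.length)) // strip the extension to get the stem
    .sort();
}
```

Requiring both files means a half-copied model is not reported as available.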

Changing voices

Just tell your agent:

"I don't like this voice, use a different one"
"Download a female English voice"
"Switch to British accent"
"Get a German voice"

The agent will check what is installed, download what is needed, play a sample, and use the right model.

Removing voices

Just tell your agent:

"Remove the German voice"
"Delete the Ryan voice, I only use Amy"
"Clean up unused voices"

The agent will confirm which voice to remove and delete the model files. Each voice takes ~65 MB, so removing unused ones can free significant disk space.

Programmatically:

require('./index').removeVoice('en_US-ryan-medium')
// { removed: 'en_US-ryan-medium', filesDeleted: ['en_US-ryan-medium.onnx', 'en_US-ryan-medium.onnx.json'] }

Changing speech speed

Just tell your agent:

"Speak faster"
"Too slow, speed it up"
"Use 120% speed"
"Back to normal"

The agent will suggest options in %, play a sample, and apply the change. Speed is expressed as a percentage — 100% is normal. lengthScale is the inverse: lengthScale = 1 / (speed% / 100).

Speed	lengthScale
125% (fast)	0.8
115%	0.87
100% (normal)	1.0
80% (slow)	1.25

Default is 100% (lengthScale 1.0).
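
The conversion in the table above is easy to reproduce. The helper names below are hypothetical; the arithmetic is just the formula lengthScale = 1 / (speed% / 100), rounded to two decimals.

```javascript
// Convert a speed percentage (100 = normal) to a lengthScale value.
function speedToLengthScale(speedPercent) {
  if (!Number.isFinite(speedPercent) || speedPercent <= 0) {
    throw new RangeError('speed must be a positive percentage');
  }
  return Math.round((100 / speedPercent) * 100) / 100; // round to two decimals
}

// Inverse direction: lengthScale back to a whole-number speed percentage.
function lengthScaleToSpeed(lengthScale) {
  if (!Number.isFinite(lengthScale) || lengthScale <= 0) {
    throw new RangeError('lengthScale must be positive');
  }
  return Math.round(100 / lengthScale);
}

console.log(speedToLengthScale(125)); // 0.8
console.log(speedToLengthScale(115)); // 0.87
console.log(speedToLengthScale(80));  // 1.25
```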

To persist your preferred speed across sessions, ask your agent to save it. It will call saveConfig({ lengthScale: 0.8 }), which writes to config.json inside the skill directory. The skill reads this file on every subsequent call, so you do not need to repeat your preference each session.
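
Persisting a preference in config.json could look like the sketch below. It is not the skill's saveConfig(): the names `saveConfigTo` and `loadConfigFrom`, the explicit directory argument, and the merge-with-defaults behaviour are assumptions chosen to keep the example self-contained.

```javascript
const fs = require('fs');
const path = require('path');

const DEFAULTS = { lengthScale: 1.0 }; // 100% speed

// Hypothetical: read config.json from dir, falling back to defaults
// when the file is absent or cannot be parsed.
function loadConfigFrom(dir) {
  try {
    const raw = fs.readFileSync(path.join(dir, 'config.json'), 'utf8');
    return { ...DEFAULTS, ...JSON.parse(raw) };
  } catch {
    return { ...DEFAULTS };
  }
}

// Hypothetical: merge new settings into the existing config and write it back.
function saveConfigTo(dir, patch) {
  const merged = { ...loadConfigFrom(dir), ...patch };
  fs.writeFileSync(path.join(dir, 'config.json'), JSON.stringify(merged, null, 2));
  return merged;
}
```

Merging rather than overwriting keeps any other saved settings intact when only the speed changes.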

Language detection

Detection logic lives in piper-tts.sh and works automatically based on character and script analysis:

Non-Latin scripts (unambiguous):

Cyrillic → Russian (with Ukrainian detection via і/ї/є/ґ), Bulgarian, Serbian
Greek → Greek
Arabic script → Arabic (with Persian detection via پ/چ/ژ/گ)
CJK ideographs → Chinese (with Japanese detection via Hiragana/Katakana)
Hangul → Korean
Georgian → Georgian

Latin-script languages (by distinctive characters):

Vietnamese (ăơưđ)
Polish (ąćęłńśźż)
Romanian (șț)
Turkish (ğışİ)
Czech/Slovak (ěščřžďťň, ů for Czech)
Hungarian (őű)
Portuguese (ãõ)
Spanish (ñ¿¡)
Catalan (l·l)
German (ß, äöü)
Finnish (äö, when no Scandinavian markers)
Scandinavian — Norwegian/Danish (æø), Swedish (åäö)
French (œçèêëïî)
Italian (àèìòù)
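
To make the rule ordering concrete, here is a JavaScript sketch covering a subset of the rules listed above. The real logic lives in piper-tts.sh and may differ in detail; the function name `detectLanguage`, the returned language codes, and the choice of subset are assumptions for illustration.

```javascript
// Illustrative subset of the detection rules; not a port of piper-tts.sh.
function detectLanguage(input) {
  const text = input.normalize('NFC'); // compose accents so ą, ñ, etc. match as single characters

  // Non-Latin scripts first: they are unambiguous.
  if (/\p{Script=Cyrillic}/u.test(text)) {
    // Ukrainian markers: і ї є ґ (lower and upper case)
    return /[\u0456\u0457\u0454\u0491\u0406\u0407\u0404\u0490]/u.test(text) ? 'uk' : 'ru';
  }
  if (/\p{Script=Greek}/u.test(text)) return 'el';
  if (/\p{Script=Arabic}/u.test(text)) {
    // Persian markers: پ چ ژ گ
    return /[\u067E\u0686\u0698\u06AF]/u.test(text) ? 'fa' : 'ar';
  }
  // Kana must be checked before Han, because Japanese text contains both.
  if (/[\p{Script=Hiragana}\p{Script=Katakana}]/u.test(text)) return 'ja';
  if (/\p{Script=Han}/u.test(text)) return 'zh';
  if (/\p{Script=Hangul}/u.test(text)) return 'ko';
  if (/\p{Script=Georgian}/u.test(text)) return 'ka';

  // Latin-script languages by distinctive characters, in the order listed above.
  if (/[ąćęłńśźż]/iu.test(text)) return 'pl';
  if (/[ñ¿¡]/iu.test(text)) return 'es';
  if (/[ßäöü]/iu.test(text)) return 'de';
  if (/[œçèêëïî]/iu.test(text)) return 'fr';

  return 'en'; // default
}
```

Order matters: a character such as è appears in both the French and Italian lists, so whichever rule runs first wins.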

Fallback: English keywords → first English model → any installed model.

No detection needed when voice is specified explicitly.
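
Once a language is known, a voice still has to be chosen from whatever is installed. The sketch below shows one plausible selection order: an explicit voice wins, then a model for the detected language, then the first English model, then any installed model. `pickVoice` and its argument shape are hypothetical, and the English-keyword step of the real fallback is not modelled here.

```javascript
// Hypothetical: choose a voice stem from the installed list.
function pickVoice({ voice, lang }, installed) {
  if (voice) {
    if (!installed.includes(voice)) throw new Error(`Voice not installed: ${voice}`);
    return voice; // explicit voice: detection is skipped entirely
  }
  // Stems start with the locale, e.g. "de_DE-...", so match on the "<lang>_" prefix.
  const forLang = (code) => installed.find((stem) => stem.startsWith(`${code}_`));
  const chosen = forLang(lang) || forLang('en') || installed[0];
  if (!chosen) throw new Error('No voice models installed');
  return chosen;
}
```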

Security

execFile throughout — no shell interpreter, user text cannot inject commands
Voice path validated to stay within the skill directory — no path traversal
Output filename sanitised with path.basename() — no directory traversal
HTTPS-only downloads — non-HTTPS URLs and redirects are rejected
URL path components validated against expected patterns
Atomic downloads (write to .tmp, rename on success) — no corrupt models from interrupted downloads
Piper installed in isolated venv — no system Python packages touched
No credentials, no network calls during TTS (only during setup and voice downloads)
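
Several of the checks above follow well-known patterns. The sketch below illustrates four of them with hypothetical helper names (`resolveVoicePath`, `safeOutputName`, `assertHttps`, `writeAtomic`); it shows the general technique under those assumptions and is not a copy of the skill's code.

```javascript
const fs = require('fs');
const path = require('path');

// Path containment: resolve the model path and refuse anything outside voicesDir.
function resolveVoicePath(voicesDir, stem) {
  const root = path.resolve(voicesDir);
  const candidate = path.resolve(root, `${stem}.onnx`);
  if (!candidate.startsWith(root + path.sep)) {
    throw new Error(`Voice path escapes skill directory: ${stem}`);
  }
  return candidate;
}

// Filename sanitising: keep only the final path component.
function safeOutputName(name) {
  const base = path.basename(name);
  if (!base || base === '.' || base === '..') throw new Error('Invalid output filename');
  return base;
}

// HTTPS-only: parse the URL and reject any other protocol.
function assertHttps(url) {
  const parsed = new URL(url); // throws on malformed input
  if (parsed.protocol !== 'https:') throw new Error(`Refusing non-HTTPS URL: ${url}`);
  return parsed;
}

// Atomic write: write to a .tmp sibling, then rename over the destination.
function writeAtomic(dest, data) {
  const tmp = `${dest}.tmp`;
  fs.writeFileSync(tmp, data);
  fs.renameSync(tmp, dest); // the destination only ever holds a complete file
}
```

A download helper following this pattern would run the same HTTPS check on every redirect target, not just on the initial URL.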

Remove

rm -rf ~/.openclaw/skills/local-piper-tts-multilang-secure

This removes everything: skill code, venv, and all voice models.

License

MIT
