Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc).
OpenRouter Audio Transcription
Transcribe audio files using OpenRouter's chat completions API with input_audio content type. Works with any audio-capable model.
Quick start
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
Output goes to stdout.
Useful flags
Custom model (default: google/gemini-2.5-flash)
{baseDir}/scripts/transcribe.sh audio.ogg --model openai/gpt-4o-audio-preview
Custom instructions
{baseDir}/scripts/transcribe.sh audio.m4a --prompt "Transcribe with speaker labels"
Save to file
{baseDir}/scripts/transcribe.sh audio.m4a --out /tmp/transcript.txt
Custom caller identifier (for OpenRouter dashboard)
{baseDir}/scripts/transcribe.sh audio.m4a --title "MyApp"
How it works
1. Converts audio to WAV (mono, 16kHz) using ffmpeg
2. Base64 encodes the audio
3. Sends to OpenRouter chat completions with input_audio content
4. Extracts transcript from response
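Steps 1 and 2 might look roughly like the sketch below. The function names `to_wav` and `encode_audio` are hypothetical and not taken from the script. The ffmpeg flags `-ac 1` and `-ar 16000` select mono and a 16 kHz sample rate.

```shell
# Sketch only: function names are illustrative, not the script's own.

# Step 1: convert any input ffmpeg can read to mono, 16 kHz WAV.
to_wav() {
  local input="$1" output="$2"
  ffmpeg -hide_banner -loglevel error -y -i "$input" -ac 1 -ar 16000 "$output"
}

# Step 2: base64-encode a file into another file, without line wrapping.
# Reading from stdin and stripping newlines avoids the flag differences
# between the GNU and BSD/macOS base64 tools.
encode_audio() {
  local input="$1" output="$2"
  base64 < "$input" | tr -d '\n' > "$output"
}

# Example (paths are placeholders):
#   to_wav /path/to/audio.m4a /tmp/work/audio.wav
#   encode_audio /tmp/work/audio.wav /tmp/work/audio.b64
```

Writing the encoded audio to a file rather than a shell variable keeps it out of later command lines.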
API key
Set the OPENROUTER_API_KEY environment variable, or configure the key in ~/.clawdbot/clawdbot.json:
{
  skills: {
    "openrouter-transcribe": {
      apiKey: "YOUR_OPENROUTER_KEY"
    }
  }
}
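A lookup along these lines would honor both sources. This is a rough sketch: the function name `resolve_api_key`, the precedence (environment variable first), and the sed-based extraction are all assumptions, since the source does not show how the script reads the config. The sed expression is a crude line match, not a real parser. It is used here because the config shown has unquoted keys, which strict JSON tools such as jq reject.

```shell
# Sketch only: precedence and parsing approach are assumptions.
resolve_api_key() {
  local config="${1:-$HOME/.clawdbot/clawdbot.json}"
  local key=""

  # Assumed precedence: the environment variable wins if it is set.
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    printf '%s\n' "$OPENROUTER_API_KEY"
    return 0
  fi

  [ -f "$config" ] || return 1

  # Look only between the "openrouter-transcribe" line and the next line
  # containing "}", and print the quoted value that follows apiKey.
  key="$(sed -n '/"openrouter-transcribe"/,/}/ s/.*apiKey"\{0,1\}[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)"

  [ -n "$key" ] || return 1
  printf '%s\n' "$key"
}

# Example:
#   OPENROUTER_API_KEY="$(resolve_api_key)" || echo "No API key found" >&2
```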
Headers
The script sends identification headers to OpenRouter:
- X-Title: Caller name (default: "Peanut/Clawdbot")
- HTTP-Referer: Reference URL (default: "https://clawdbot.com")
These show up in your OpenRouter dashboard for tracking.
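As an illustration, a request carrying these headers could be sent with curl as sketched here. The function name `send_request` and its argument order are hypothetical. The endpoint is OpenRouter's chat completions URL, and the defaults mirror the values listed above.

```shell
# Sketch only: send_request and its parameters are illustrative.
# Expects OPENROUTER_API_KEY to be set and $1 to be a JSON payload file.
send_request() {
  local payload_file="$1"
  local title="${2:-Peanut/Clawdbot}"
  local referer="${3:-https://clawdbot.com}"

  curl -sS "https://openrouter.ai/api/v1/chat/completions" \
    -H "Authorization: Bearer ${OPENROUTER_API_KEY}" \
    -H "Content-Type: application/json" \
    -H "X-Title: ${title}" \
    -H "HTTP-Referer: ${referer}" \
    --data-binary "@${payload_file}"
}

# Example: send_request /tmp/work/payload.json "MyApp" > /tmp/work/response.json
```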
Troubleshooting
ffmpeg format errors: The script uses a temp directory (not mktemp -t file.wav) because macOS's mktemp appends a random suffix after the extension, which breaks ffmpeg's format detection.
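The temp-directory approach can be sketched as follows. The variable names and the `transcribe.XXXXXX` template are assumptions made for illustration. The point is that the random part goes into the directory name, so the file inside keeps `.wav` as its final extension.

```shell
# Sketch only: names and template are illustrative.
# Create a private temp directory; the explicit template form works with
# both GNU and BSD/macOS mktemp.
workdir="$(mktemp -d "${TMPDIR:-/tmp}/transcribe.XXXXXX")"

# Remove the directory and everything in it when the shell exits.
trap 'rm -rf "$workdir"' EXIT

# Fixed file name inside the random directory: the extension stays last,
# so ffmpeg can infer the output format from it.
wav_path="$workdir/audio.wav"
```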
Argument list too long: Large audio files produce huge base64 strings that exceed shell argument limits. The script writes to temp files (--rawfile for jq, @file for curl) instead of passing data as arguments.
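A minimal sketch of the file-based approach is shown here, assuming jq 1.6 or later for `--rawfile`. The function name `build_payload` and its parameters are hypothetical. The message layout follows the input_audio content type described earlier in this document.

```shell
# Sketch only: build_payload and its parameters are illustrative.
# jq reads the base64 text from disk via --rawfile, so the large string
# never appears as a command-line argument.
build_payload() {
  local model="$1" prompt="$2" audio_b64_file="$3" payload_file="$4"

  jq -n \
    --arg model "$model" \
    --arg prompt "$prompt" \
    --rawfile audio "$audio_b64_file" \
    '{
      model: $model,
      messages: [{
        role: "user",
        content: [
          {type: "text", text: $prompt},
          {type: "input_audio",
           input_audio: {data: ($audio | rtrimstr("\n")), format: "wav"}}
        ]
      }]
    }' > "$payload_file"
}

# The resulting file can then be handed to curl by reference:
#   curl ... --data-binary "@$payload_file"
```

Only short values (model name, prompt) travel as arguments. The audio data moves from file to file.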
Empty response: If you get "Empty response from API", the script will dump the raw response for debugging. Common causes:
- Invalid API key
- Model doesn't support audio input
- Audio file too large or corrupted
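A check of this kind can be sketched as follows. The function name `extract_transcript` is hypothetical, and the jq path `.choices[0].message.content` is an assumption based on the usual chat completions response shape, not taken from the script.

```shell
# Sketch only: extract_transcript is illustrative.
# Print the transcript, or fail and show the raw response on stderr.
extract_transcript() {
  local response_file="$1"
  local transcript=""

  # "// empty" turns a missing or null field into no output at all;
  # "|| true" keeps a jq parse failure from aborting under set -e.
  transcript="$(jq -r '.choices[0].message.content // empty' "$response_file" 2>/dev/null || true)"

  if [ -z "$transcript" ]; then
    echo "Empty response from API" >&2
    cat "$response_file" >&2
    return 1
  fi

  printf '%s\n' "$transcript"
}
```

An error body such as an invalid-key message has no `choices` field, so it takes the failure branch and the raw response is printed for inspection.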