# 🎵 Stem Separation (Demix)
Powered by Demucs, an AI model that separates audio into vocal and instrument tracks.
## Why stem separation?
Without demix, a segment where someone sings over a backing track is labeled `speech_over_music`, which is ambiguous.
With demix, that segment is classified precisely as `singing`, enabling clean extraction.
```mermaid
flowchart LR
    A[Concert recording] --> B{Without --demix}
    B --> C["speech_over_music 🤷"]
    A --> D{With --demix}
    D --> E["Demucs separates\nvocals + instruments"]
    E --> F["singing ✅"]
    E --> G["music ✅"]
```
## Install
## Usage
```bash
praisonai-editor edit concert.mp3 \
  --preset songs_only \
  --detector ensemble \
  --demix \
  --primary-zone \
  -v
```
## What Demucs separates
| Stem | File | Description |
|---|---|---|
| `vocals` | `vocals.wav` | Isolated vocal track |
| `no_vocals` | `no_vocals.wav` | All instruments (no voice) |
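Once the two stems exist as WAV files, standard-library tools are enough to inspect them, for example by comparing RMS amplitude to confirm which stem carries the voice. The helper below is illustrative; since real stems aren't assumed to be on disk, the demo synthesizes a short tone instead of reading `vocals.wav`.

```python
import math
import struct
import wave

def rms_of_wav(path: str) -> float:
    """RMS amplitude of a 16-bit mono WAV (e.g. a Demucs stem)."""
    with wave.open(path, "rb") as w:
        frames = w.readframes(w.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Demo input: a 1-second 440 Hz tone at amplitude 12000. In practice
# you would pass the cached vocals.wav / no_vocals.wav paths instead.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    for i in range(16000):
        sample = int(12000 * math.sin(2 * math.pi * 440 * i / 16000))
        w.writeframes(struct.pack("<h", sample))

print(rms_of_wav("tone.wav"))  # ~8485, i.e. 12000 / sqrt(2) for a sine
```

A near-silent `vocals.wav` alongside a loud `no_vocals.wav` is a quick sanity check that a segment is instrumental.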
## Stem cache
The first run is slow (CPU inference on a 40-min file takes roughly 10 min). After that, stems are cached forever:
```
~/.praisonai/editor/.demix_cache/{sha256_hash}/
├── vocals.wav
└── no_vocals.wav
```

The hash is the SHA-256 of the file's first 8 MiB, so it is unique per file.
Second run is ~8 seconds. See Stem Cache for details.
## Supported models
| Model | Notes |
|---|---|
| `mdx_extra` | Default; best quality |
| `htdemucs` | More stems (drums, bass, etc.) |
| `mdx_extra_q` | Quantized; lower memory |