
# 🎵 Stem Separation (Demix)

Powered by Demucs, an AI model that separates audio into vocal and instrument tracks.

## Why stem separation?

Without demix, a segment where someone sings over a backing track is labeled `speech_over_music`, which is ambiguous.

With demix, that segment is precisely classified as `singing`, enabling clean extraction.

```mermaid
flowchart LR
    A[Concert recording] --> B{Without --demix}
    B --> C["speech_over_music 🤷"]
    A --> D{With --demix}
    D --> E["Demucs separates\nvocals + instruments"]
    E --> F["singing ✅"]
    E --> G["music ✅"]
```

## Install

```bash
pip install "praisonai-editor[demix]"
```

## Usage

```bash
praisonai-editor edit concert.mp3 \
  --preset songs_only \
  --detector ensemble \
  --demix \
  --primary-zone \
  -v
```

## What Demucs separates

| Stem | File | Description |
|------|------|-------------|
| `vocals` | `vocals.wav` | Isolated vocal track |
| `no_vocals` | `no_vocals.wav` | All instruments (no voice) |
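The two stems are complementary: summed, they reconstruct (approximately) the original mixture. A toy illustration with synthetic sample lists rather than real audio, just to show the relationship:

```python
# Toy samples standing in for audio: in a two-stem separation, the
# "vocals" and "no_vocals" stems partition the mix, so adding them
# back together approximately recovers the original signal.
mix = [0.5, -0.2, 0.9, 0.1]       # original mixture (toy values)
vocals = [0.3, -0.1, 0.4, 0.0]    # toy stand-in for vocals.wav
no_vocals = [m - v for m, v in zip(mix, vocals)]  # everything else

reconstructed = [v + n for v, n in zip(vocals, no_vocals)]
assert all(abs(r - m) < 1e-9 for r, m in zip(reconstructed, mix))
```

Real separation is not this exact (the model introduces small artifacts), but the partition intuition holds.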

## Stem cache

The first run is slow: CPU inference on a 40-minute file takes roughly 10 minutes. After that, stems are cached indefinitely:

```text
~/.praisonai/editor/.demix_cache/{sha256_hash}/
  ├── vocals.wav       (first 8 MiB SHA-256 hash → unique per file)
  └── no_vocals.wav
```

Second run is ~8 seconds. See Stem Cache for details.
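Hashing only the first 8 MiB keeps cache lookups fast even for long recordings while remaining effectively unique per file. A sketch of how such a content-addressed key could be derived (`demix_cache_key` is a hypothetical helper; the library's actual implementation may differ):

```python
import hashlib
from pathlib import Path

CHUNK = 8 * 1024 * 1024  # hash only the first 8 MiB of the file

def demix_cache_key(audio_path: str) -> str:
    """SHA-256 hex digest of the file's first 8 MiB.

    Sketch of deriving a content-addressed cache directory name:
    the same file always maps to the same key, so stems can be
    reused across runs without re-running separation.
    """
    data = Path(audio_path).read_bytes()[:CHUNK]
    return hashlib.sha256(data).hexdigest()
```

The key doubles as the directory name under `.demix_cache/`, so a cache hit is a single directory-existence check.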

## Supported models

| Model | Notes |
|-------|-------|
| `mdx_extra` | Default; best quality |
| `htdemucs` | More stems (drums, bass, etc.) |
| `mdx_extra_q` | Quantized; lower memory |
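Choosing between these comes down to two questions: do you need more than two stems, and is memory constrained? A hypothetical picker built from the table above (the model names are real; the helper itself is illustrative only):

```python
# Model names from the table above; the selection helper is a
# hypothetical convenience, not part of the library's API.
MODELS = {
    "mdx_extra": "default, best quality",
    "htdemucs": "more stems (drums, bass, etc.)",
    "mdx_extra_q": "quantized, lower memory",
}

def pick_model(low_memory: bool = False, need_extra_stems: bool = False) -> str:
    """Pick a Demucs model name based on simple constraints."""
    if need_extra_stems:
        return "htdemucs"   # only option with drums/bass stems
    if low_memory:
        return "mdx_extra_q"  # quantized variant trades quality for RAM
    return "mdx_extra"        # default: best quality

print(pick_model(low_memory=True))  # -> mdx_extra_q
```

Pass the chosen name as `model_name` to the Python API below, or rely on the default.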

## Python API

```python
from praisonai_editor._demix import isolate_vocals, has_demucs

if has_demucs():
    vocals_path, inst_path = isolate_vocals(
        "concert.mp3",
        model_name="mdx_extra",  # default
        device="cpu",            # or "mps" for Apple Silicon
        verbose=True,
    )
```