Welcome to .voicery Documentation
.voicery is a modern voice changer powered by the W-Okada engine, adapted for simplicity, ease of use, and English-speaking users.
Use the menu on the left to navigate through the documentation.
If you prefer an all-in-one preconfigured version with included models, download Voicery.
Overview
.voicery is built on top of the powerful open-source project W-Okada Voice Changer.
While W-Okada offers incredible real-time AI voice conversion, it can be complex to install and use for non-technical users. Voicery solves that by packaging the core technology into an easy-to-use interface for English-speaking users, with built-in voice models and one-click installation.
If you want full control and flexibility, you can follow this documentation to configure and use W-Okada directly.
Or, you can download the Voicery version with all settings pre-configured.
Features
- Real-time voice conversion using state-of-the-art AI models.
- Support for RVC and Beatrice v2 voice models.
- Custom voice training and model management.
- Clean and responsive graphical interface.
- CLI support for advanced users:
python server.py --voice="model_name"
Use Cases
Voicery and W-Okada-based voice changers can be used in many areas:
- Gaming: Modify your voice to match in-game characters in real time.
- Streaming: Create fun or anonymous experiences on Twitch, YouTube, etc.
- Meetings & Calls: Maintain privacy or simulate accents during virtual meetings.
- Content Creation: Add voiceovers with unique styles to videos, music, or podcasts.
Operating System Support
Voicery is primarily optimized for Windows platforms. While support for macOS and Linux is planned for future releases, users on these systems can currently utilize the underlying W-Okada Voice Changer directly.
For Windows users, Voicery offers a streamlined experience with pre-configured settings and an intuitive interface.
Hardware Specifications
Minimum Requirements
- CPU: Modern multi-core processor (e.g., Intel i5 or AMD Ryzen 5)
- RAM: 8 GB
- GPU:
- NVIDIA: GTX 1060 or higher with CUDA support
- AMD: Radeon RX 580 or higher with DirectML support
- Intel: Integrated GPUs are not recommended due to performance limitations
Recommended Requirements
- CPU: Intel i7 or AMD Ryzen 7
- RAM: 16 GB or more
- GPU:
- NVIDIA: RTX 3060 (12GB VRAM) or higher for optimal performance and training capabilities
- AMD: Radeon RX 6700 XT or higher with DirectML support
Note: High-VRAM GPUs are beneficial for training custom voice models and ensuring low-latency real-time voice conversion.
Audio Setup
For seamless voice routing and integration with applications like Discord, OBS, or Zoom, it's recommended to use virtual audio cables.
Recommended Virtual Audio Cable Software
- VB-Audio Virtual Cable: A popular choice for creating virtual audio devices.
- Voicemeeter: Offers advanced mixing capabilities alongside virtual audio routing.
Setup Instructions
- Download and install the virtual audio cable software of your choice.
- Set the virtual cable as the default output device in your system's sound settings.
- Configure Voicery to use the virtual cable as its output device.
- In your target application (e.g., Discord), set the input device to the virtual cable.
This setup ensures that the modified voice output from Voicery is correctly routed into your communication or recording applications.
Windows Installation
There are two different builds available for Windows depending on your GPU:
Option 1: CUDA Version (for NVIDIA GPUs)
Recommended if you have a CUDA-compatible NVIDIA GPU with drivers for CUDA 11.7 or 11.8.
Download
Requirements
- NVIDIA GPU with CUDA 11.7 or 11.8 support
- CUDA Toolkit 11.7 / 11.8
- Latest NVIDIA graphics drivers
Option 2: Standard Version (for AMD / Intel or CPU-only)
Use this if your system does not have an NVIDIA GPU. This version relies on CPU or DirectML acceleration (for AMD / Intel GPUs).
Download
Requirements
- Any modern CPU (quad-core recommended) or AMD/Intel GPU with DirectML support
- Windows 10 or newer
- No need for CUDA Toolkit
Installation Steps (for both versions)
- Download and unzip the appropriate version to a folder of your choice (e.g., Desktop).
- Open the folder and double-click `start_http.bat` to launch the voice changer.
- Wait for first-time setup (it may install Python packages or dependencies).
- If the window closes immediately, open CMD manually and run:
cd C:\Path\To\UnzippedFolder
start_http.bat
Optional: Virtual Audio Setup
To route the AI-processed voice into apps like Discord or OBS:
- Recommended: VAC Lite (Line 1)
- Not recommended: VB-Audio Virtual Cable (prone to delay and instability)
- After installation, run `mmsys.cpl` via Win+R to manage default input/output devices.
macOS Installation
macOS builds are available for both Intel and Apple Silicon (M1/M2). Apple Silicon is recommended.
Download
- `vcclient_mac_2.0.76-beta.zip`
- For M1 Macs: `vcclient_mac_2.0.70-beta.zip` (lightweight, better compatibility)
Installation Steps
- Unzip the downloaded archive.
- Control+Click `MMVCServerSIO` → Open → Allow in System Preferences.
- Control+Click `startHTTP.command` → Open.
- Important: Launch `MMVCServerSIO` before `startHTTP.command`.
Virtual Audio Cable
- BlackHole — install 2ch version.
- Configure routing via Audio MIDI Setup.
Linux Installation
There is no precompiled version for Linux. You must build from source using Python and manually configure dependencies.
Install Instructions
sudo apt update
sudo apt install git python3.10 python3.10-venv ffmpeg
git clone https://github.com/w-okada/voice-changer.git
cd voice-changer
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
python MMVCServerSIO.py
GPU Acceleration
If using NVIDIA GPU:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Audio Routing
- Use `pavucontrol` to route input/output devices.
- Or configure JACK Audio for advanced setups.
Version Notes
- 2.0.76-beta (macOS): Latest mac build, works on M1/M2. May show warnings — allow via Gatekeeper.
- 2.0.70-beta (macOS M1-only): Optimized for Apple Silicon, fewer crashes, less GPU usage.
- 2.0.61-alpha (Windows CUDA): Stable CUDA build for NVIDIA GPUs (recommended).
Release Source
All downloads are available on Hugging Face:
https://huggingface.co/wok000/vcclient000
Model Migration When Updating
- Back up `model_dir` and copy it into the new version's folder.
- Keep your shortcut to `start_http.bat` or `startHTTP.command`.
- After launching the new version, reset audio/output/chunk settings as needed.
Launching the Application
After installation, follow these steps to launch the Voicery app:
- Navigate to the folder where you extracted the ZIP archive.
- On Windows: Double-click `start_http.bat`.
- On macOS: Hold Control and click `MMVCServerSIO` → choose "Open". Repeat for `startHTTP.command`.
- Wait for the app to initialize and launch the web-based user interface in your default browser.
Basic Configuration
Input/Output Devices
To use Voicery with your mic and speakers or route output into apps like Discord or OBS, configure audio devices properly:
- In the Web UI, open the audio settings panel.
- Select your main microphone under Input Device.
- For Output Device, choose your virtual cable (e.g., `[MME] CABLE Input (VB-Audio)`).
- To hear the modified voice, set your headphones or speakers under Monitor Output.
💡 Press Win + R and type `mmsys.cpl` to open Windows sound settings and make sure defaults are correctly configured.
Voice Models
Voicery supports real-time voice conversion using RVC (Retrieval-based Voice Conversion) and Beatrice v2 models. To change or upload a model:
- Click the Edit button next to the model slot in the Web UI.
- Click Upload and choose a voice model file: `.pth` (PyTorch) or `.onnx` (ONNX format).
- RVC v1/v2 is supported by default — no need to change the type unless you're using a custom SVC/Beatrice model.
- For AMD/Intel users, ONNX models are recommended for better performance.
- You may assign an image to the model by clicking the placeholder icon labeled "no image".
Testing
Follow these steps to test voice conversion in real-time:
- Ensure your microphone and headphones are correctly selected in both Voicery and your OS sound settings.
- Speak into the microphone — your transformed voice should play back in real time.
- Use apps like Discord, OBS, or Zoom and select the virtual audio cable as the microphone input.
- Adjust pitch shift, noise reduction, gain, and buffer size inside the Voicery UI for optimal quality.
- Test different models and switch voices live during usage.
RVC (Retrieval-based Voice Conversion)
Overview: RVC is an open-source voice conversion framework that enables realistic speech-to-speech transformations, preserving the intonation and audio characteristics of the original speaker. It allows for real-time voice conversion with low latency, making it suitable for various applications such as gaming, streaming, and content creation.
Use Cases:
- Real-time voice modulation during live streams or gaming sessions.
- Creating AI-generated singing or speaking voices for content production.
- Voice anonymization for privacy during online communications.
Adding and Managing RVC Models:
- Download RVC models from trusted sources, ensuring they are compatible with .voicery.
- Place the model files (typically `.pth` and `.index`) into the designated models directory within .voicery.
- In the .voicery interface, navigate to the model management section and load the desired RVC model.
- Configure any additional settings as required for optimal performance.
Beatrice v2
Overview: Beatrice v2 is a voice conversion model designed for real-time applications, offering low latency and high-quality voice transformation. It's optimized for efficiency, requiring minimal computational resources, and supports both speech and singing voice conversion.
Use Cases:
- Live voice changing with minimal delay, suitable for interactive applications.
- Voice synthesis for virtual characters in games or virtual reality environments.
- Creating personalized voice assistants or chatbots with unique voices.
Integrating Beatrice v2 Models:
- Obtain Beatrice v2 models from official repositories or trusted sources.
- Ensure the model files are compatible with .voicery and place them in the appropriate directory.
- Within the .voicery interface, access the model management section and load the Beatrice v2 model.
- Adjust settings as necessary to achieve the desired voice transformation quality.
Model Management
Importing Models:
- Click on the 'Import Model' button within the .voicery interface.
- Select the model file (`.pth`, `.onnx`, or other supported formats) from your local system.
- Assign a name and optional description to the model for easy identification.
- Confirm the import to add the model to your library.
Exporting Models:
- Navigate to the model management section and select the model you wish to export.
- Click on the 'Export' option and choose the destination folder on your system.
- The model files will be saved in the selected location for backup or sharing purposes.
Organizing Models:
- Use folders or tags within the .voicery interface to categorize models based on type, usage, or other criteria.
- Rename models for clarity and ease of access.
- Delete unused or outdated models to maintain an organized library.
How to Add Voice Models to Voicery
Both RVC and Beatrice v2 models can be easily added to Voicery using the Web UI.
Supported File Types
- `.pth` — standard PyTorch models (used by RVC and Beatrice)
- `.onnx` — recommended for AMD/Intel GPU users (a converted version of `.pth`)
- `.index` — optional index file for better accuracy (used by RVC v1/v2)
Step-by-Step Instructions
- Download a model from a trusted source like Hugging Face or AIHub.
- Open the Voicery interface in your browser (after launching `start_http.bat`).
- Click the Edit button next to any voice slot.
- Click Upload and select:
  - `model.pth` (main model)
  - `model.index` (optional, for indexing)
  - Optionally, a `.png` avatar image
- After upload, assign a name and confirm the slot.
- Test the model in real time to ensure it's working correctly.
For ONNX Users (AMD / Intel)
- Once a `.pth` model is uploaded, an "Export to ONNX" button will appear.
- Click it and wait for the export to finish.
- Go back to Edit → Upload and load the ONNX file instead of the `.pth`.
- Switch to a different model, then back again, to apply the ONNX changes.
How to Add RVC Voice Models to W-Okada
Where to find voice models:
- 🌐 AI Hub — RVC Model Search
- 📁 List of AI Voice Models (voice-models.com)
- 🔗 Beatrice v2 Beta Models (official)
- 📄 rentry.co/VoiceChangerOG — community-verified voice models
Audio Settings
Fine-tuning audio parameters is crucial for achieving optimal voice conversion quality. Below are detailed explanations of key settings:
Gain Controls
- Input Gain: Adjusts the volume of the incoming microphone signal. A typical value is `1.0`.
- Output Gain: Controls the volume of the processed audio output. Values above `3.0` may cause clipping; consider reducing it if distortion occurs.
- Monitor Gain: Sets the volume level for monitoring the processed audio in real time.
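To see why a high output gain distorts: digital samples are bounded (here normalized to the range -1.0 to 1.0), so any gain that pushes a sample past the bound flattens it. A minimal sketch of this effect, not Voicery's actual audio code:

```python
def apply_gain(samples, gain):
    """Scale normalized float samples by `gain`, hard-clipping to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

# A peak at 0.25 survives gain 3.0, but a peak at 0.5 would clip (distort):
print(apply_gain([0.25, -0.5], 1.0))  # [0.25, -0.5]
print(apply_gain([0.25, -0.5], 3.0))  # [0.75, -1.0]
```

The second call shows the -0.5 sample slammed into the -1.0 floor, which is what you hear as clipping distortion.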
Pitch Adjustment (f0Factor)
The `f0Factor` parameter modifies the pitch of the voice:
- Male to Female: Set `f0Factor` to approximately `12`.
- Female to Male: Set `f0Factor` to approximately `-12`.
- Same Gender: Adjust within the range of `-4` to `4` based on the desired pitch shift.
Note: Extreme values may result in unnatural or robotic-sounding voices.
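The ±12 values above are not arbitrary: pitch shifts in these tools behave like semitone offsets, and 12 semitones is exactly one octave (a doubling or halving of frequency, which is roughly the gap between typical male and female speaking pitch). A quick check of the math, assuming the standard equal-temperament formula:

```python
def shift_pitch(freq_hz, semitones):
    """Equal-temperament pitch shift: +12 semitones doubles the frequency."""
    return freq_hz * 2 ** (semitones / 12)

print(shift_pitch(110.0, 12))   # 220.0 (one octave up, e.g. male -> female)
print(shift_pitch(220.0, -12))  # 110.0 (one octave down)
```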
Advanced Audio Parameters
- numTrancateTreshold: Determines the threshold for truncating input audio chunks. Default is `100`.
- volTrancateThreshold: Sets the volume threshold for truncation. Default is `0.0005`.
- volTrancateLength: Specifies the length of audio to truncate. Default is `32`.
- Protocol: Communication protocol used. Recommended setting is `"sio"`.
- sendingSampleRate: Sample rate for sending audio. Default is `48000` Hz.
- inputChunkNum: Number of audio chunks processed per input. Adjusting this affects latency and performance.
- downSamplingMode: Method for downsampling audio. Options include `"average"`.
- sampleRate: Overall sample rate for processing. Default is `48000` Hz.
- echoCancel: Enables echo cancellation. Set to `true` or `false`.
- noiseSuppression: Enables noise suppression. Set to `true` or `false`.
- noiseSuppression2: Alternative noise suppression method. Set to `true` or `false`.
- passThroughConfirmationSkip: Skips the pass-through confirmation. Set to `false` by default.
- f0Detector: Algorithm used for pitch detection. Options include `rmvpe_onnx`, `crepe_tiny`, `dio`, and `harvest`. The choice depends on hardware and desired quality.
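For reference, the defaults listed above can be collected into a single settings object. This is an illustrative sketch built from the parameter names in this section; it is not the client's actual configuration file, whose schema may differ between versions:

```python
import json

# Illustrative defaults taken from the parameter list above.
# The real client stores its settings internally; treat this as a
# cheat sheet, not a drop-in config file.
defaults = {
    "numTrancateTreshold": 100,
    "volTrancateThreshold": 0.0005,
    "volTrancateLength": 32,
    "protocol": "sio",
    "sendingSampleRate": 48000,
    "sampleRate": 48000,
    "echoCancel": False,
    "noiseSuppression": True,
    "noiseSuppression2": False,
    "passThroughConfirmationSkip": False,
    "f0Detector": "rmvpe_onnx",
}

print(json.dumps(defaults, indent=2))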
Performance Tuning
Optimizing performance ensures low latency and efficient CPU/GPU usage during real-time voice conversion.
Reducing Latency
- Chunk Size: Lowering `inputChunkNum` can reduce latency but may increase CPU usage. Experiment to find a balance.
- Extra Buffer: Adjusting the `Extra` parameter can help manage processing delays. Higher values may increase latency but reduce glitches.
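To build intuition for the chunk-size/latency trade-off: the audio buffered per processing pass is roughly the chunk count times the samples per chunk, divided by the sample rate. The sketch below assumes a hypothetical 128 samples per chunk at 48 kHz purely for illustration; the actual per-chunk sample count depends on your build:

```python
def chunk_latency_ms(input_chunk_num, samples_per_chunk=128, sample_rate=48000):
    """Approximate buffering latency (ms) contributed by the chunk setting.

    samples_per_chunk is an assumed illustrative value, not a documented
    constant of the voice changer.
    """
    return input_chunk_num * samples_per_chunk * 1000 / sample_rate

print(chunk_latency_ms(192))  # 512.0 ms of buffered audio
print(chunk_latency_ms(48))   # 128.0 ms (lower latency, more CPU pressure)
```

Halving the chunk count halves the buffered audio, which is why small chunk values feel snappier but leave the CPU less headroom per pass.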
CPU/GPU Optimization
- GPU Acceleration: Utilize GPU processing by selecting appropriate models and ensuring compatible drivers are installed.
- Model Selection: Choose models optimized for your hardware. For instance, `rmvpe_onnx` is suitable for AMD GPUs.
- Resource Monitoring: Use system monitoring tools to observe CPU and GPU usage, and adjust settings to prevent overloading.
Custom Shortcuts
Setting up custom keyboard shortcuts enhances workflow efficiency, especially for streamers and content creators.
Configuring Hotkeys
- Model Switching: Assign shortcuts like `CTRL + SHIFT + [Number]` to switch between voice models quickly.
- Pitch Adjustment: Use shortcuts such as `CTRL + [` and `CTRL + ]` to decrease or increase pitch on the fly.
Implementation
To set up custom shortcuts:
- Access the settings or preferences menu within the application.
- Navigate to the "Shortcuts" or "Hotkeys" section.
- Assign desired key combinations to specific functions.
- Save changes and test the shortcuts to ensure they work as intended.
Note: The ability to configure shortcuts may depend on the specific version of the software being used.
RVC Model Training
RVC (Retrieval-based Voice Conversion) is an open-source framework that allows training custom voice models for real-time conversion. This section covers all the essentials.
Prerequisites
- Python 3.8+ installed (recommended: use Anaconda or venv)
- FFmpeg installed and added to PATH
- At least 10 minutes of clean speech from the target voice (no background music, noise, or effects)
Dataset Preparation
- Use UVR to isolate vocals if needed.
- Split audio into multiple short WAV files (5–15 seconds each).
- Sample rate must be 44100Hz or 48000Hz (mono preferred).
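A quick way to sanity-check a prepared dataset is to verify each clip's format with Python's standard `wave` module. This helper is illustrative and not part of the RVC tooling; run it over every file before starting feature extraction:

```python
import wave

def check_clip(path, min_s=5.0, max_s=15.0):
    """Return (ok, message) for one WAV clip against the dataset rules above."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    if rate not in (44100, 48000):
        return False, f"sample rate {rate} Hz (expected 44100 or 48000)"
    if channels != 1:
        return False, f"{channels} channels (mono preferred)"
    if not (min_s <= duration <= max_s):
        return False, f"duration {duration:.1f}s (expected {min_s}-{max_s}s)"
    return True, "ok"
```

Note that `wave` only reads uncompressed PCM WAV files; anything it rejects outright (MP3, FLAC) needs converting with FFmpeg first anyway.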
Training Steps (via WebUI)
- Clone the WebUI:
git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- Install dependencies:
pip install -r requirements.txt
- Launch WebUI:
python infer-web.py
- Upload dataset via UI and run "Feature Extraction".
- Start training and monitor loss — it may take 1–3 hours depending on dataset size and GPU.
Resources
Beatrice v2 Model Training
Beatrice v2 is a powerful and lightweight voice conversion model. It can be trained from scratch or fine-tuned using the web-based Beatrice Trainer.
Prerequisites
- Python 3.8+ with pip
- FFmpeg installed
- Clean speech dataset (10+ minutes recommended)
Training Steps (via WebUI)
- Clone the Beatrice trainer UI:
git clone https://github.com/JarodMica/beatrice_trainer_webui
- Install dependencies (see README for your OS):
pip install -r requirements.txt
- Prepare your dataset and place it in the appropriate folder.
- Run the UI:
python app.py
- Use the web interface to start training and monitor progress.
Resources
Best Practices
To ensure you achieve high-quality voice models during training, follow these industry best practices:
Data Quality
- Use lossless audio formats like WAV, mono preferred
- Avoid reverberation, compression, or excessive noise
- Remove silences and normalize volume across clips
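"Normalize volume across clips" usually means peak normalization: scale each clip so its loudest sample hits the same target level. A minimal sketch on normalized float samples (in practice you would use FFmpeg or an audio editor rather than hand-rolled code):

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    factor = target_peak / peak
    return [s * factor for s in samples]

print(peak_normalize([0.1, -0.45, 0.3]))  # loudest sample becomes -0.9
```

Applying the same `target_peak` to every clip keeps loudness consistent across the dataset, which is the point of the best practice above.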
Recording Consistency
- Record in the same room and with the same microphone setup
- Do not mix voice styles or accents in one model
Model Training Tips
- Use a GPU (NVIDIA CUDA 11.7+ recommended) to reduce training time
- Train in short sessions and validate outputs regularly
- Try ONNX conversion if using AMD or Intel hardware
After Training
- Test in Voicery by uploading the model (`.pth` or `.onnx`)
- Assign a name and image in the Voicery interface
- Test in real time and compare to reference audio
Common Issues
Model doesn't start after double-clicking start_http.bat
- ✔ Make sure your antivirus didn't block Python or the bat file.
- ✔ Try launching it manually via command line:
cd C:\Path\To\vcclient
start_http.bat
Web UI doesn't load in the browser
- ✔ Wait for full package download and Python dependency installation (first launch may take several minutes).
- ✔ Ensure no other app is using port 18888. Try:
netstat -ano | findstr :18888
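If you prefer not to parse netstat output, the same check can be done from Python's standard library. This snippet is an illustrative alternative, not part of the voice changer itself:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

print(port_in_use(18888))  # True means another process already holds the UI port
```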
No voice conversion / silence in output
- ✔ Check if your input/output devices are set correctly in the UI.
- ✔ Test your mic in `mmsys.cpl` or macOS "Sound" settings.
- ✔ Set input/output channels to mono or stereo only — 5.1/7.1 is not supported.
Voice is robotic, distorted, or broken
- ✔ Lower the pitch shift or `f0Factor` (try values between -4 and 4).
- ✔ Choose a model that matches your natural vocal range.
- ✔ Try the rmvpe or crepe pitch detector instead of `dio` or `harvest`.
Black window flashes then closes instantly
- ✔ Likely Python not installed or environment not set up.
- ✔ Run the bat file via terminal to see the actual error message.
Error Codes
Most common runtime and training errors
| Error | Description | Fix |
|---|---|---|
| `RuntimeError: CUDA error: illegal memory access` | GPU overload or driver mismatch | Lower the model size or batch size. Update drivers. |
| `IndexError: size mismatch in tensor` | Broken or invalid training audio | Recreate your dataset and re-split the audio files properly. |
| `ONNXRuntimeError: Load model failed` | Bad ONNX export or incompatible format | Export again from Voicery and test in ONNX Runtime first. |
| `UnpicklingError: invalid load key` | Corrupted or wrong `.pth` model file | Redownload or retrain your model. Avoid `.pth` files from unknown sources. |
Support Resources
Official Help & Documentation
Community Support
Contact Voicery Team
If you're using the official .voicery build and encounter issues, reach out to:
- Email: [email protected]
- Website: voicery.net
Frequently Asked Questions (FAQ)
Q: Do I need a GPU to run .voicery?
A: A dedicated GPU (preferably NVIDIA with CUDA 11.7/11.8) is recommended for real-time performance. However, .voicery can also run on CPUs or with DirectML for AMD/Intel GPUs, but with higher latency.
Q: Where can I find compatible voice models?
A: You can download pre-trained models from:
Q: Why is my converted voice robotic or broken?
A: This is often caused by an unsuitable pitch setting or an incompatible model. Try adjusting the `f0Factor` (pitch shift), change the pitch detector to `rmvpe` or `crepe`, and ensure the voice model matches your vocal type (male/female).
Q: What model format does .voicery support?
A: You can use models in `.pth` (PyTorch) and `.onnx` (ONNX - great for AMD/Intel) formats, plus optional `.index` files for enhanced indexing (RVC only).
Q: Can I use .voicery for live streaming or gaming?
A: Absolutely. .voicery supports virtual audio routing. We recommend installing VAC Lite on Windows or BlackHole on macOS for microphone input/output routing.
Q: Does .voicery work on macOS and Linux?
A: Yes, partial support is available:
- macOS: Download the `vcclient_mac_*.zip` version from Hugging Face.
- Linux: You need to build from source using W-Okada's GitHub project. See our Linux Installation section.
Q: How do I update .voicery without losing models?
A: Simply back up the `model_dir` folder, download the new version, and move your models into the new folder. Your shortcut to `start_http.bat` or `startHTTP.command` can also be reused.
Q: What pitch detector should I use?
A: It depends on your system and voice type:
- rmvpe: Highest quality (default)
- crepe_tiny: Good balance of speed and quality
- dio / harvest: Fastest, but lower quality
Q: Can I train my own voice model?
A: Yes! See our section "Training Custom Voice Models" for a full step-by-step guide on RVC and Beatrice model training using WebUI or Colab.
Q: I hear myself delayed when using the mic. Is that normal?
A: Some delay (latency) is expected depending on your CPU/GPU. You can reduce it by lowering `inputChunkNum` or increasing model speed settings. For best results, use a low-latency audio driver and a fast GPU.
Still have questions?
Check out our Support Resources section or contact [email protected].
Tutorial Videos
1. How to Set Up W-Okada Voice Changer
Beatrice v2
Resources:
RVC
Resources:
Optimizing Settings in AI Voice Changer Client
Resources:
2. Changing Your Voice for Free
How to Get AI Voice Models
Resources:
3. How to Train Voices for the Realtime AI Voice Changer
Beatrice V2
Resources:
RVC
Resources:
Community Contributions
Official GitHub Repository
Access the source code, documentation, and latest updates for the W-Okada Voice Changer.