Documentation

Welcome to .voicery Documentation

.voicery is a modern voice changer powered by the W-Okada engine, adapted for simplicity, ease of use, and English-speaking users.

Use the menu on the left to navigate through the documentation.

If you prefer an all-in-one preconfigured version with included models, download Voicery.

Overview

.voicery is built on top of the powerful open-source project W-Okada Voice Changer.

While W-Okada offers incredible real-time AI voice conversion, it can be complex to install and use for non-technical users. Voicery solves that by packaging the core technology into an easy-to-use interface for English-speaking users, with built-in voice models and one-click installation.

If you want full control and flexibility, you can follow this documentation to configure and use W-Okada directly.
Or, you can download the Voicery version with all settings pre-configured.

Features

  • Real-time voice conversion using state-of-the-art AI models.
  • Support for RVC and Beatrice v2 voice models.
  • Custom voice training and model management.
  • Clean and responsive graphical interface.
  • CLI support for advanced users: python server.py --voice="model_name"

Use Cases

Voicery and W-Okada-based voice changers can be used in many areas:

  • Gaming: Modify your voice to match in-game characters in real time.
  • Streaming: Create fun or anonymous experiences on Twitch, YouTube, etc.
  • Meetings & Calls: Maintain privacy or simulate accents during virtual meetings.
  • Content Creation: Add voiceovers with unique styles to videos, music, or podcasts.

Operating System Support

Voicery is primarily optimized for Windows platforms. While support for macOS and Linux is planned for future releases, users on these systems can currently utilize the underlying W-Okada Voice Changer directly.

For Windows users, Voicery offers a streamlined experience with pre-configured settings and an intuitive interface.

Hardware Specifications

Minimum Requirements

  • CPU: Modern multi-core processor (e.g., Intel i5 or AMD Ryzen 5)
  • RAM: 8 GB
  • GPU:
    • NVIDIA: GTX 1060 or higher with CUDA support
    • AMD: Radeon RX 580 or higher with DirectML support
    • Intel: Integrated GPUs are not recommended due to performance limitations

Recommended Requirements

  • CPU: Intel i7 or AMD Ryzen 7
  • RAM: 16 GB or more
  • GPU:
    • NVIDIA: RTX 3060 (12GB VRAM) or higher for optimal performance and training capabilities
    • AMD: Radeon RX 6700 XT or higher with DirectML support

Note: High VRAM GPUs are beneficial for training custom voice models and ensuring low-latency real-time voice conversion.

Audio Setup

For seamless voice routing and integration with applications like Discord, OBS, or Zoom, it's recommended to use virtual audio cables.

Recommended Virtual Audio Cable Software

Setup Instructions

  1. Download and install the virtual audio cable software of your choice.
  2. Set the virtual cable as the default output device in your system's sound settings.
  3. Configure Voicery to use the virtual cable as its output device.
  4. In your target application (e.g., Discord), set the input device to the virtual cable.

This setup ensures that the modified voice output from Voicery is correctly routed into your communication or recording applications.

Windows Installation

There are two different builds available for Windows depending on your GPU:

Option 1: CUDA Version (for NVIDIA GPUs)

Recommended if you have a CUDA-compatible NVIDIA GPU with drivers for CUDA 11.7 or 11.8.

Download

Requirements

Option 2: Standard Version (for AMD / Intel or CPU-only)

Use this if your system does not have an NVIDIA GPU. This version relies on CPU or DirectML acceleration (for AMD / Intel GPUs).

Download

Requirements

  • Any modern CPU (quad-core recommended) or AMD/Intel GPU with DirectML support
  • Windows 10 or newer
  • No need for CUDA Toolkit

Installation Steps (for both versions)

  1. Download and unzip the appropriate version to a folder of your choice (e.g., Desktop).
  2. Open the folder and double-click start_http.bat to launch the voice changer.
  3. If the window closes immediately, open CMD manually and run:
     cd C:\Path\To\UnzippedFolder
     start_http.bat
  4. Wait for first-time setup (it may install Python packages or dependencies).

Optional: Virtual Audio Setup

To route the AI-processed voice into apps like Discord or OBS:

  • Recommended: VAC Lite (Line 1)
  • Not recommended: VB-Audio Virtual Cable (prone to delay and instability)
  • After installation, run mmsys.cpl via Win+R to manage default input/output devices

macOS Installation

macOS builds are available for both Intel and Apple Silicon (M1/M2). Apple Silicon is recommended.

Download

Installation Steps

  1. Unzip the downloaded archive.
  2. Control+Click MMVCServerSIO → Open → Allow in System Preferences.
  3. Control+Click startHTTP.command → Open.
  4. Important: Launch MMVCServerSIO before startHTTP.command.

Virtual Audio Cable

  • BlackHole — install 2ch version.
  • Configure routing via Audio MIDI Setup.

Linux Installation

There is no precompiled version for Linux. You must build from source using Python and manually configure dependencies.

Install Instructions

sudo apt update
sudo apt install git python3.10 python3.10-venv ffmpeg

git clone https://github.com/w-okada/voice-changer.git
cd voice-changer

python3.10 -m venv venv
source venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt

python MMVCServerSIO.py

GPU Acceleration

If using NVIDIA GPU:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Audio Routing

  • Use pavucontrol to route input/output devices.
  • Or configure JACK Audio for advanced setups.

Version Notes

  • 2.0.76-beta (macOS): Latest mac build, works on M1/M2. May show warnings — allow via Gatekeeper.
  • 2.0.70-beta (macOS M1-only): Optimized for Apple Silicon, fewer crashes, less GPU usage.
  • 2.0.61-alpha (Windows CUDA): Stable CUDA build for NVIDIA GPUs (recommended).

Release Source

All downloads are available on Hugging Face:

https://huggingface.co/wok000/vcclient000

Model Migration When Updating

  • Backup model_dir and copy it into the new version’s folder.
  • Keep your shortcut to start_http.bat or startHTTP.command.
  • After launching the new version, reset audio/output/chunk settings as needed.
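The backup step above can be scripted; a minimal sketch in Python (the `model_dir` folder name follows this section; adjust the paths to wherever you unzipped each version):

```python
import shutil
from pathlib import Path

def backup_models(old_install: str, new_install: str, model_dir: str = "model_dir") -> Path:
    """Copy the model directory from an old install into a new one.

    `model_dir` is the folder name used in this section; adjust the paths
    to match where each version was unzipped.
    """
    src = Path(old_install) / model_dir
    dst = Path(new_install) / model_dir
    # dirs_exist_ok lets the copy merge into a folder the new version created
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```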

Launching the Application

After installation, follow these steps to launch the Voicery app:

  1. Navigate to the folder where you extracted the ZIP archive.
  2. On Windows: Double-click start_http.bat.
  3. On macOS: Hold Control and click MMVCServerSIO → choose "Open". Repeat for startHTTP.command.
  4. Wait for the app to initialize and launch the web-based user interface in your default browser.

Basic Configuration

Input/Output Devices

To use Voicery with your mic and speakers or route output into apps like Discord or OBS, configure audio devices properly:

  1. In the Web UI, open the audio settings panel.
  2. Select your main microphone under Input Device.
  3. For Output Device, choose your virtual cable (e.g., [MME] CABLE Input (VB-Audio)).
  4. To hear the modified voice, set your headphones or speakers under Monitor Output.

💡 Press Win + R and type mmsys.cpl to open Windows sound settings and make sure defaults are correctly configured.

Voice Models

Voicery supports real-time voice conversion using RVC (Retrieval-based Voice Conversion) and Beatrice v2 models. To change or upload a model:

  1. Click the Edit button next to the model slot in the Web UI.
  2. Click Upload and choose a voice model file: .pth (PyTorch) or .onnx (ONNX format).
  3. RVC v1/v2 is supported by default — no need to change type unless you're using a custom SVC/Beatrice model.
  4. For AMD/Intel users, ONNX models are recommended for better performance.
  5. You may assign an image to the model by clicking the placeholder icon labeled no image.

Testing

Follow these steps to test voice conversion in real-time:

  1. Ensure your microphone and headphones are correctly selected in both Voicery and your OS sound settings.
  2. Speak into the microphone — your transformed voice should play back in real time.
  3. Use apps like Discord, OBS, or Zoom and select the virtual audio cable as the microphone input.
  4. Adjust pitch shift, noise reduction, gain, and buffer size inside the Voicery UI for optimal quality.
  5. Test different models and switch voices live during usage.

RVC (Retrieval-based Voice Conversion)

Overview: RVC is an open-source voice conversion framework that enables realistic speech-to-speech transformations, preserving the intonation and audio characteristics of the original speaker. It allows for real-time voice conversion with low latency, making it suitable for various applications such as gaming, streaming, and content creation.

Use Cases:

  • Real-time voice modulation during live streams or gaming sessions.
  • Creating AI-generated singing or speaking voices for content production.
  • Voice anonymization for privacy during online communications.

Adding and Managing RVC Models:

  1. Download RVC models from trusted sources, ensuring they are compatible with .voicery.
  2. Place the model files (typically .pth and .index) into the designated models directory within .voicery.
  3. In the .voicery interface, navigate to the model management section and load the desired RVC model.
  4. Configure any additional settings as required for optimal performance.

Beatrice v2

Overview: Beatrice v2 is a voice conversion model designed for real-time applications, offering low latency and high-quality voice transformation. It's optimized for efficiency, requiring minimal computational resources, and supports both speech and singing voice conversion.

Use Cases:

  • Live voice changing with minimal delay, suitable for interactive applications.
  • Voice synthesis for virtual characters in games or virtual reality environments.
  • Creating personalized voice assistants or chatbots with unique voices.

Integrating Beatrice v2 Models:

  1. Obtain Beatrice v2 models from official repositories or trusted sources.
  2. Ensure the model files are compatible with .voicery and place them in the appropriate directory.
  3. Within the .voicery interface, access the model management section and load the Beatrice v2 model.
  4. Adjust settings as necessary to achieve the desired voice transformation quality.

Model Management

Importing Models:

  1. Click on the 'Import Model' button within the .voicery interface.
  2. Select the model file (.pth, .onnx, or other supported formats) from your local system.
  3. Assign a name and optional description to the model for easy identification.
  4. Confirm the import to add the model to your library.

Exporting Models:

  1. Navigate to the model management section and select the model you wish to export.
  2. Click on the 'Export' option and choose the destination folder on your system.
  3. The model files will be saved in the selected location for backup or sharing purposes.

Organizing Models:

  • Use folders or tags within the .voicery interface to categorize models based on type, usage, or other criteria.
  • Rename models for clarity and ease of access.
  • Delete unused or outdated models to maintain an organized library.

How to Add Voice Models to Voicery

Both RVC and Beatrice v2 models can be easily added to Voicery using the Web UI.

Supported File Types

  • .pth — standard PyTorch models (used by RVC, Beatrice)
  • .onnx — recommended for AMD/Intel GPU users (converted version of .pth)
  • .index — optional index file for better accuracy (used by RVC v1/v2)
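As an illustration of the list above, a small helper that maps a model file to its role by extension (this helper is not part of Voicery; it simply mirrors the supported types):

```python
from pathlib import Path

# Illustrative mapping, mirroring the supported file types listed above.
ROLE_BY_SUFFIX = {
    ".pth": "PyTorch model (RVC, Beatrice)",
    ".onnx": "ONNX model (recommended on AMD/Intel GPUs)",
    ".index": "optional RVC retrieval index",
}

def classify_model_file(path: str) -> str:
    """Return the role a model file would play, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ROLE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported model file type: {suffix!r}")
```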

Step-by-Step Instructions

  1. Download a model from a trusted source like Hugging Face or AIHub.
  2. Open the Voicery interface in your browser (after launching start_http.bat).
  3. Click the Edit button next to any voice slot.
  4. Click Upload and select:
    • model.pth (main model)
    • model.index (optional for indexing)
    • Optionally, upload a .png avatar image
  5. After upload, assign a name and confirm the slot.
  6. Test the model in real time to ensure it's working correctly.

For ONNX Users (AMD / Intel)

  1. Once a .pth model is uploaded, an "Export to ONNX" button will appear.
  2. Click it and wait for the export to finish.
  3. Go back to Edit → Upload and load the ONNX file instead of the .pth.
  4. Switch to a different model, then back again, to apply the ONNX changes.

How to Add RVC Voice Models to W-Okada

Where to find voice models:

Audio Settings

Fine-tuning audio parameters is crucial for achieving optimal voice conversion quality. Below are detailed explanations of key settings:

Gain Controls

  • Input Gain: Adjusts the volume of the incoming microphone signal. A typical value is 1.0.
  • Output Gain: Controls the volume of the processed audio output. Values above 3.0 may cause clipping; consider reducing if distortion occurs.
  • Monitor Gain: Sets the volume level for monitoring the processed audio in real-time.

Pitch Adjustment (f0Factor)

The f0Factor parameter modifies the pitch of the voice:

  • Male to Female: Set f0Factor to approximately 12.
  • Female to Male: Set f0Factor to approximately -12.
  • Same Gender: Adjust within the range of -4 to 4 based on desired pitch shift.

Note: Extreme values may result in unnatural or robotic-sounding voices.
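The f0Factor values above are semitone offsets: twelve semitones is one octave. A small sketch of the underlying math (assuming f0Factor maps directly to semitones, as the male/female presets above suggest):

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift given in semitones.

    Twelve semitones is one octave, so +12 doubles the fundamental
    frequency (male to female) and -12 halves it (female to male).
    """
    return 2.0 ** (semitones / 12.0)

# A same-gender tweak of +4 raises the fundamental by about 26%,
# which is why small values sound far more natural than extremes.
```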

Advanced Audio Parameters

  • numTrancateTreshold: Determines the threshold for truncating input audio chunks. Default is 100.
  • volTrancateThreshold: Sets the volume threshold for truncation. Default is 0.0005.
  • volTrancateLength: Specifies the length of audio to truncate. Default is 32.
  • Protocol: Communication protocol used. Recommended setting is "sio".
  • sendingSampleRate: Sample rate for sending audio. Default is 48000 Hz.
  • inputChunkNum: Number of audio chunks processed per input. Adjusting this can affect latency and performance.
  • downSamplingMode: Method for downsampling audio. Options include "average".
  • sampleRate: Overall sample rate for processing. Default is 48000 Hz.
  • echoCancel: Enables echo cancellation. Set to true or false.
  • noiseSuppression: Enables noise suppression. Set to true or false.
  • noiseSuppression2: Alternative noise suppression method. Set to true or false.
  • passThroughConfirmationSkip: Skips confirmation for pass-through. Set to false by default.
  • f0Detector: Algorithm used for pitch detection. Options include rmvpe_onnx, crepe_tiny, dio, and harvest. Selection depends on hardware and desired quality.
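For reference, the defaults listed above gathered into one settings map. This is a sketch: the field names and defaults mirror this list, and should be verified against the W-Okada version you are running before relying on them.

```python
# Default audio settings from the list above, collected in one place.
# Field names follow the W-Okada client's parameter names as documented
# in this section; verify against your installed version.
DEFAULT_AUDIO_SETTINGS = {
    "numTrancateTreshold": 100,
    "volTrancateThreshold": 0.0005,
    "volTrancateLength": 32,
    "protocol": "sio",
    "sendingSampleRate": 48000,
    "downSamplingMode": "average",
    "sampleRate": 48000,
    "echoCancel": False,
    "noiseSuppression": False,
    "noiseSuppression2": False,
    "passThroughConfirmationSkip": False,
    "f0Detector": "rmvpe_onnx",
}
```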

Performance Tuning

Optimizing performance ensures low latency and efficient CPU/GPU usage during real-time voice conversion.

Reducing Latency

  • Chunk Size: Lowering the inputChunkNum can reduce latency but may increase CPU usage. Experiment to find a balance.
  • Extra Buffer: Adjusting the Extra parameter can help manage processing delays. Higher values may increase latency but reduce glitches.
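The chunk-size trade-off can be estimated directly: one chunk of audio spans chunk_samples / sample_rate seconds, which is a lower bound on the latency that buffering adds. A sketch:

```python
def buffer_latency_ms(chunk_samples: int, sample_rate: int = 48000) -> float:
    """Rough time, in milliseconds, that one audio chunk spans.

    Smaller chunks let the engine respond sooner but force it to run more
    often, raising CPU load; this is the trade-off described above.
    """
    return 1000.0 * chunk_samples / sample_rate

# e.g. a 4096-sample chunk at 48 kHz spans roughly 85 ms
```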

CPU/GPU Optimization

  • GPU Acceleration: Utilize GPU processing by selecting appropriate models and ensuring compatible drivers are installed.
  • Model Selection: Choose models optimized for your hardware. For instance, rmvpe_onnx is suitable for AMD GPUs.
  • Resource Monitoring: Use system monitoring tools to observe CPU and GPU usage, adjusting settings accordingly to prevent overloading.

Custom Shortcuts

Setting up custom keyboard shortcuts enhances workflow efficiency, especially for streamers and content creators.

Configuring Hotkeys

  • Model Switching: Assign shortcuts like CTRL + SHIFT + [Number] to switch between different voice models quickly.
  • Pitch Adjustment: Use shortcuts such as CTRL + [ and CTRL + ] to decrease or increase pitch on the fly.

Implementation

To set up custom shortcuts:

  1. Access the settings or preferences menu within the application.
  2. Navigate to the "Shortcuts" or "Hotkeys" section.
  3. Assign desired key combinations to specific functions.
  4. Save changes and test the shortcuts to ensure they work as intended.

Note: The ability to configure shortcuts may depend on the specific version of the software being used.

RVC Model Training

RVC (Retrieval-based Voice Conversion) is an open-source framework that allows training custom voice models for real-time conversion. This section covers all the essentials.

Prerequisites

  • Python 3.8+ installed (recommended: use Anaconda or venv)
  • FFmpeg installed and added to PATH
  • At least 10 minutes of clean speech from the target voice (no background music, noise or effects)

Dataset Preparation

  1. Use UVR to isolate vocals if needed.
  2. Split audio into multiple short WAV files (5–15 seconds each).
  3. Sample rate must be 44100Hz or 48000Hz (mono preferred).
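Step 2 above can be automated with the Python standard library alone; a sketch that cuts a WAV file into fixed-length clips (interactive tools like Audacity or UVR work just as well):

```python
import wave
from pathlib import Path

def split_wav(src: str, out_dir: str, seconds: float = 10.0) -> list:
    """Split a WAV file into fixed-length clips of `seconds` each.

    Stdlib-only sketch of dataset step 2; the final clip keeps whatever
    audio remains, so it may be shorter than the others.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    clips = []
    with wave.open(src, "rb") as w:
        params = w.getparams()
        frames_per_clip = int(seconds * w.getframerate())
        index = 0
        while True:
            data = w.readframes(frames_per_clip)
            if not data:
                break
            clip_path = out / f"clip_{index:04d}.wav"
            with wave.open(str(clip_path), "wb") as clip:
                # reuse the source's channel count, sample width, and rate;
                # the wave module fixes up the frame count on close
                clip.setparams(params)
                clip.writeframes(data)
            clips.append(clip_path)
            index += 1
    return clips
```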

Training Steps (via WebUI)

  1. Clone the WebUI:
    git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
  2. Install dependencies:
    pip install -r requirements.txt
  3. Launch WebUI:
    python infer-web.py
  4. Upload dataset via UI and run "Feature Extraction".
  5. Start training and monitor loss — it may take 1–3 hours depending on dataset size and GPU.

Resources

Beatrice v2 Model Training

Beatrice v2 is a powerful and lightweight voice conversion model. It can be trained from scratch or fine-tuned using the web-based Beatrice Trainer.

Prerequisites

  • Python 3.8+ with pip
  • FFmpeg installed
  • Clean speech dataset (10+ minutes recommended)

Training Steps (via WebUI)

  1. Clone the Beatrice trainer UI:
    git clone https://github.com/JarodMica/beatrice_trainer_webui
  2. Install dependencies (see README for your OS):
    pip install -r requirements.txt
  3. Prepare your dataset and place it in the appropriate folder.
  4. Run the UI:
    python app.py
  5. Use the web interface to start training and monitor progress.

Resources

Best Practices

To ensure you achieve high-quality voice models during training, follow these industry best practices:

Data Quality

  • Use lossless audio formats like WAV, mono preferred
  • Avoid reverberation, compression, or excessive noise
  • Remove silences and normalize volume across clips
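Normalizing volume across clips (the last point above) amounts to scaling each clip so its peak matches a common target; a pure-Python sketch operating on float samples in [-1, 1]:

```python
def normalize_peak(samples, target_peak: float = 0.9):
    """Scale a clip so its loudest sample reaches target_peak.

    Applying this to every clip keeps loudness consistent across a
    dataset, as recommended above. In practice, a tool such as ffmpeg
    or Audacity does this far faster on real files.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent clip: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]
```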

Recording Consistency

  • Record in the same room and with the same microphone setup
  • Do not mix voice styles or accents in one model

Model Training Tips

  • Use a GPU (NVIDIA CUDA 11.7+ recommended) to reduce training time
  • Train in short sessions and validate outputs regularly
  • Try ONNX conversion if using AMD or Intel hardware

After Training

  • Test in Voicery by uploading the model (.pth or .onnx)
  • Assign a name and image in the Voicery interface
  • Test in real-time and compare to reference audio

Common Issues

Model doesn't start after double-clicking start_http.bat

  • ✔ Make sure your antivirus didn't block Python or the bat file.
  • ✔ Try launching it manually via command line:
    cd C:\Path\To\vcclient\
    start_http.bat

Web UI doesn't load in the browser

  • ✔ Wait for full package download and Python dependency installation (first launch may take several minutes).
  • ✔ Ensure no other app is using port 18888. Try: netstat -ano | findstr :18888
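If netstat is unavailable, the same check can be done from Python; a small sketch that probes the default port 18888:

```python
import socket

def port_in_use(port: int = 18888, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on the given port.

    A cross-platform alternative to the netstat command above.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0  # 0 means a server accepted
```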

No voice conversion / silence in output

  • ✔ Check if your input/output devices are set correctly in the UI.
  • ✔ Test your mic in mmsys.cpl or macOS "Sound" settings.
  • ✔ Set input/output channels to mono or stereo only — 5.1/7.1 not supported.

Voice is robotic, distorted, or broken

  • ✔ Lower pitch shift or f0Factor (try values between -4 and 4).
  • ✔ Choose a model that matches your natural vocal range.
  • ✔ Try rmvpe or crepe pitch detector instead of dio or harvest.

Black window flashes then closes instantly

  • ✔ Likely Python not installed or environment not set up.
  • ✔ Run the bat file via terminal to see the actual error message.

Error Codes

Most common runtime and training errors

  • RuntimeError: CUDA error: illegal memory access
    Cause: GPU overload or driver mismatch.
    Fix: Lower the model size or batch size, and update your drivers.
  • IndexError: size mismatch in tensor
    Cause: Broken or invalid training audio.
    Fix: Recreate your dataset and re-split the audio files properly.
  • ONNXRuntimeError: Load model failed
    Cause: Bad ONNX export or incompatible format.
    Fix: Export again from Voicery and test in ONNX Runtime first.
  • UnpicklingError: invalid load key
    Cause: Corrupted or wrong .pth model file.
    Fix: Redownload or retrain the model. Avoid .pth files from unknown sources.

Support Resources

Official Help & Documentation

Community Support

Contact Voicery Team

If you're using the official .voicery build and encounter issues, reach out to:

Frequently Asked Questions (FAQ)

Q: Do I need a GPU to run .voicery?

A: A dedicated GPU (preferably NVIDIA with CUDA 11.7/11.8) is recommended for real-time performance. However, .voicery can also run on CPUs or with DirectML for AMD/Intel GPUs, but with higher latency.

Q: Where can I find compatible voice models?

A: You can download pre-trained models from:

Q: Why is my converted voice robotic or broken?

A: This is often caused by an unsuitable pitch setting or incompatible model. Try adjusting the f0Factor (pitch shift), change pitch detectors to rmvpe or crepe, and ensure the voice model matches your vocal type (male/female).

Q: What model format does .voicery support?

A: You can use models in .pth (PyTorch), .onnx (ONNX - great for AMD/Intel), and optional .index files for enhanced indexing (RVC only).

Q: Can I use .voicery for live streaming or gaming?

A: Absolutely. .voicery supports virtual audio routing. We recommend installing VAC Lite on Windows or BlackHole on macOS for microphone input/output routing.

Q: Does .voicery work on macOS and Linux?

A: Yes, partial support is available:

  • macOS: Download the vcclient_mac_*.zip version from Hugging Face.
  • Linux: You need to build from source using W-Okada's GitHub project. See our Linux Installation section.

Q: How do I update .voicery without losing models?

A: Simply back up the model_dir folder, download the new version, and move your models into the new folder. Your start_http.bat or startHTTP.command shortcut can also be reused.

Q: What pitch detector should I use?

A: It depends on your system and voice type:

  • rmvpe: Highest quality (default)
  • crepe_tiny: Good balance of speed and quality
  • dio / harvest: Fastest, but lower quality

Q: Can I train my own voice model?

A: Yes! See our section "Training Custom Voice Models" for a full step-by-step guide on RVC and Beatrice model training using WebUI or Colab.

Q: I hear myself delayed when using the mic. Is that normal?

A: Some delay (latency) is expected depending on your CPU/GPU. You can reduce it by lowering inputChunkNum or increasing model speed settings. For best results, use a low-latency audio driver and a fast GPU.

Still have questions?

Check out our Support Resources section or contact [email protected].

Tutorial Videos

1. How to Set Up W-Okada Voice Changer

Beatrice v2

Resources:


RVC

Resources:

Optimizing Settings in AI Voice Changer Client

Resources:

2. Changing Your Voice for Free

How to Get AI Voice Models

Resources:

3. How to Train Voices for the Realtime AI Voice Changer

Beatrice V2

Resources:

RVC

Resources:

Community Contributions

Official GitHub Repository

Access the source code, documentation, and latest updates for the W-Okada Voice Changer.