Prerequisites

  • Windows with WSL2 installed
  • NVIDIA RTX 5070 GPU
  • NVIDIA drivers installed on Windows host

Issue Overview

RTX 50-series (Blackwell) GPUs use CUDA compute capability sm_120, which requires a PyTorch build with CUDA 12.8 support. Standard PyTorch wheels built against older CUDA releases only ship kernels up to sm_90, causing "no kernel image available" errors at runtime.
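The sm_ tags above are just the compute-capability digits concatenated, which makes the mismatch easy to spot; a minimal sketch:

```python
def sm_tag(capability):
    """Map a (major, minor) CUDA compute capability to its sm_ tag."""
    major, minor = capability
    return f"sm_{major}{minor}"

# RTX 50-series (Blackwell) GPUs report capability (12, 0):
print(sm_tag((12, 0)))  # sm_120
# The ceiling of standard wheels built against older CUDA releases:
print(sm_tag((9, 0)))   # sm_90
```

On a working install, `torch.cuda.get_device_capability()` returns the tuple, and `torch.cuda.get_arch_list()` lists the tags the installed wheel was compiled for.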

Installation Steps

1. Setup Conda Environment

# Install Miniconda (if not already installed)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

# Create new environment
conda create -n f5-tts python=3.11
conda activate f5-tts

2. Install PyTorch with CUDA 12.8 Support

# Install PyTorch with CUDA 12.8 (critical for RTX 50-series)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Verify GPU compatibility
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA version: {torch.version.cuda}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name()}')
    x = torch.randn(1000, 1000).cuda()
    y = x @ x
    print('✅ GPU computation successful!')
"

Expected output:

PyTorch version: 2.7.1+cu128
CUDA version: 12.8
CUDA available: True
GPU: NVIDIA GeForce RTX 5070
✅ GPU computation successful!

3. Install F5-TTS

# Clone F5-TTS repository
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS

# Install F5-TTS
pip install -e .

# Test imports
python -c "
import torch
from f5_tts.infer.utils_infer import load_model
print('F5-TTS imports successfully!')
print(f'GPU available: {torch.cuda.is_available()}')
"

4. Launch F5-TTS

# Start Gradio interface
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860

Access the interface at: http://localhost:7860 (WSL2 forwards localhost, so this also works from a browser on the Windows host).

Voice Cloning Setup

Recording Reference Audio

  1. Duration: 15-30 seconds
  2. Quality: Clear, natural speech
  3. Language: Any language (cross-lingual cloning supported)
  4. Format: WAV or MP3
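To sanity-check a clip against the guidelines above before uploading, a small standard-library sketch (WAV only; checking MP3 would need an extra dependency such as pydub):

```python
import wave

def check_reference(path, min_s=15.0, max_s=30.0):
    """Return (duration_s, sample_rate, in_range) for a WAV reference clip."""
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / float(w.getframerate())
        return duration, w.getframerate(), min_s <= duration <= max_s

# Example (hypothetical file name):
# duration, rate, ok = check_reference("my_voice.wav")
```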

Cross-lingual Voice Cloning

  • Record reference audio in your native language
  • Use the cloned voice to speak any supported language
  • Voice characteristics transfer while pronunciation adapts to target language

Performance Expectations on RTX 5070

  • Inference time: 2-5 seconds for typical text lengths
  • VRAM usage: ~2-3GB during inference
  • Concurrent processing: Possible due to 12GB VRAM capacity
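A rough way to reason about the concurrency claim above, assuming the worst-case ~3GB per session; the 1GB reserve for the driver/display is an assumption:

```python
def max_concurrent(vram_gb=12.0, per_session_gb=3.0, reserve_gb=1.0):
    """Rough count of simultaneous inference sessions that fit in VRAM."""
    return int((vram_gb - reserve_gb) // per_session_gb)

print(max_concurrent())  # 3 sessions on a 12 GB RTX 5070 at ~3 GB each
```

Actual usage varies with text length and model settings, so treat this as a back-of-the-envelope bound, not a guarantee.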

Environment Management

Daily Usage

# Activate environment
conda activate f5-tts

# Navigate to F5-TTS
cd ~/F5-TTS

# Start interface
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860

Create Start Script

# Create convenient startup script
cat << 'EOF' > ~/start_f5tts.sh
#!/bin/bash
# conda activate needs the shell hook when run from a non-interactive script
eval "$(conda shell.bash hook)"
conda activate f5-tts
cd ~/F5-TTS
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
EOF

chmod +x ~/start_f5tts.sh

# Usage: ~/start_f5tts.sh

Troubleshooting

Common Issues

  1. “CUDA error: no kernel image available”

    • Solution: Ensure PyTorch with CUDA 12.8 is installed
    • Verify with: python -c "import torch; print(torch.version.cuda)"
  2. Import errors

    • Ensure you’re in the correct conda environment: conda activate f5-tts
    • Reinstall F5-TTS: pip install -e .
  3. GPU not detected

    • Check NVIDIA drivers on Windows host
    • Verify WSL2 GPU passthrough is working: nvidia-smi
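The GPU-detection checks above can be wrapped in a tiny script; `gpu_visible` is a hypothetical helper name, and `nvidia-smi -L` simply lists the GPUs the driver can see:

```python
import shutil
import subprocess

def gpu_visible(binary="nvidia-smi"):
    """True if the NVIDIA driver tooling is on PATH inside WSL2."""
    return shutil.which(binary) is not None

if gpu_visible():
    subprocess.run(["nvidia-smi", "-L"], check=False)  # list detected GPUs
else:
    print("nvidia-smi not found -- check WSL2 GPU passthrough / Windows driver")
```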

Performance Optimization

  • Batch processing: RTX 5070’s 12GB VRAM supports larger batch sizes
  • Memory management: Close other GPU applications for optimal performance
  • Temperature monitoring: Ensure adequate cooling for sustained workloads

Next Steps

  • API Integration: Set up REST API for n8n workflows
  • Voice Library: Build collection of reference voices
  • Automation: Create scripts for batch processing
  • Quality Tuning: Experiment with different reference audio qualities
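As a starting point for the batch-processing idea above, a sketch that shells out to F5-TTS's command-line entry point once per text file; the flag names, the `f5-tts_infer-cli` entry point, and the `scripts/` layout are assumptions here, so confirm them with `f5-tts_infer-cli --help` before relying on this:

```python
import subprocess
from pathlib import Path

def build_cmd(ref_audio, ref_text, gen_text, out_dir="out"):
    """Assemble one CLI invocation (flag names are assumptions)."""
    return ["f5-tts_infer-cli",
            "--ref_audio", ref_audio,
            "--ref_text", ref_text,
            "--gen_text", gen_text,
            "--output_dir", out_dir]

# Render one clip per text file in ./scripts/ (hypothetical layout):
for txt in sorted(Path("scripts").glob("*.txt")):
    subprocess.run(build_cmd("voices/my_voice.wav",
                             "Transcript of the reference clip.",
                             txt.read_text().strip()),
                   check=True)
```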

Key Success Factors

  • PyTorch 2.7.1+ with CUDA 12.8 (essential for RTX 50-series)
  • Clean conda environment (avoids dependency conflicts)
  • Proper WSL2 GPU setup (nvidia-smi should work)
  • Quality reference audio (clear, natural speech samples)