Prerequisites
- Windows with WSL2 installed
- NVIDIA RTX 5070 GPU
- NVIDIA drivers installed on Windows host
Issue Overview
RTX 50-series (Blackwell) GPUs report CUDA compute capability sm_120, which requires a PyTorch build with CUDA 12.8 support. Standard PyTorch builds ship GPU kernels only up to sm_90, so inference on these cards fails at runtime with “no kernel image is available” errors.
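To see how the sm_120 tag relates to what PyTorch reports, the sketch below maps a compute-capability tuple to its architecture tag. On a live system you would pass in `torch.cuda.get_device_capability()`, which returns e.g. `(12, 0)` on an RTX 5070; `sm_tag` itself is an illustrative helper, not part of PyTorch.

```python
# Map a CUDA compute capability tuple to its sm_ architecture tag.
# `sm_tag` is an illustrative helper, not a PyTorch API.
def sm_tag(capability):
    major, minor = capability
    return f"sm_{major}{minor}"

print(sm_tag((12, 0)))  # sm_120 -- RTX 50-series (Blackwell)
print(sm_tag((9, 0)))   # sm_90  -- the ceiling of standard PyTorch builds
```

If the tag your GPU reports is higher than anything your PyTorch build was compiled for, you get exactly the “no kernel image” error described above.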
Installation Steps
1. Setup Conda Environment
# Install Miniconda (if not already installed)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
# Create new environment
conda create -n f5-tts python=3.11
conda activate f5-tts
2. Install PyTorch with CUDA 12.8 Support
# Install PyTorch with CUDA 12.8 (critical for RTX 50-series)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
# Verify GPU compatibility
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA version: {torch.version.cuda}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name()}')
    x = torch.randn(1000, 1000).cuda()
    y = x @ x
    print('✅ GPU computation successful!')
"
Expected output:
PyTorch version: 2.7.1+cu128
CUDA version: 12.8
CUDA available: True
GPU: NVIDIA GeForce RTX 5070
✅ GPU computation successful!
3. Install F5-TTS
# Clone F5-TTS repository
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# Install F5-TTS
pip install -e .
# Test imports
python -c "
import torch
from f5_tts.infer.utils_infer import load_model
print('F5-TTS imports successfully!')
print(f'GPU available: {torch.cuda.is_available()}')
"
4. Launch F5-TTS
# Start Gradio interface
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
Access the interface at: http://localhost:7860
Voice Cloning Setup
Recording Reference Audio
- Duration: 15-30 seconds
- Quality: Clear, natural speech
- Language: Any language (cross-lingual cloning supported)
- Format: WAV or MP3
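A quick stdlib check can confirm a reference clip falls in the recommended 15-30 second window before you load it into the interface. This is a minimal sketch; `is_good_reference` is an illustrative helper, not part of F5-TTS, and the `wave` module handles WAV only (for MP3 you would need a decoder such as soundfile or ffmpeg).

```python
import wave

# Duration of a WAV file in seconds, via the stdlib wave module.
def wav_duration_seconds(path):
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

# True if the clip falls inside the recommended reference window.
def is_good_reference(path, lo=15.0, hi=30.0):
    return lo <= wav_duration_seconds(path) <= hi
```

Run it on your recording first; a clip outside the window tends to produce weaker cloning results.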
Cross-lingual Voice Cloning
- Record reference audio in your native language
- Use the cloned voice to speak any supported language
- Voice characteristics transfer while pronunciation adapts to target language
Performance Expectations on RTX 5070
- Inference time: 2-5 seconds for typical text lengths
- VRAM usage: ~2-3GB during inference
- Concurrent processing: Possible due to 12GB VRAM capacity
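To verify the 2-5 second figure on your own texts and hardware, a small timing wrapper is enough. This is a generic sketch: wrap whatever inference call you actually use; the `sum` call below is only a stand-in workload.

```python
import time

# Time any callable with a monotonic high-resolution clock.
# Substitute your F5-TTS inference call for the placeholder workload.
def timed(fn, *args, **kwargs):
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

result, seconds = timed(sum, range(1_000_000))  # placeholder workload
print(f"took {seconds:.3f}s")
```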
Environment Management
Daily Usage
# Activate environment
conda activate f5-tts
# Navigate to F5-TTS
cd ~/F5-TTS
# Start interface
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
Create Start Script
# Create convenient startup script
cat << 'EOF' > ~/start_f5tts.sh
#!/bin/bash
# 'conda activate' needs the conda shell hook in non-interactive scripts
source ~/miniconda3/etc/profile.d/conda.sh
conda activate f5-tts
cd ~/F5-TTS
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
EOF
chmod +x ~/start_f5tts.sh
# Usage: ./start_f5tts.sh
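The start script binds port 7860, and a stale Gradio process produces a confusing bind error. A quick stdlib probe, run before launching, tells you whether the port is free; `port_free` is an illustrative helper under the assumption you serve on the default port.

```python
import socket

# True if nothing is listening on (host, port) -- e.g. before
# launching the Gradio interface on its default port 7860.
def port_free(host="127.0.0.1", port=7860):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) != 0
```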
Troubleshooting
Common Issues
“CUDA error: no kernel image available”
- Solution: Ensure PyTorch with CUDA 12.8 is installed
- Verify with:
python -c "import torch; print(torch.version.cuda)"
Import errors
- Ensure you’re in the correct conda environment:
conda activate f5-tts
- Reinstall F5-TTS from the repository root:
pip install -e .
GPU not detected
- Check NVIDIA drivers on Windows host
- Verify WSL2 GPU passthrough is working:
nvidia-smi
Performance Optimization
- Batch processing: RTX 5070’s 12GB VRAM supports larger batch sizes
- Memory management: Close other GPU applications for optimal performance
- Temperature monitoring: Ensure adequate cooling for sustained workloads
Next Steps
- API Integration: Set up REST API for n8n workflows
- Voice Library: Build collection of reference voices
- Automation: Create scripts for batch processing
- Quality Tuning: Experiment with different reference audio qualities
Key Success Factors
✅ PyTorch 2.7.1+ with CUDA 12.8 (essential for RTX 50-series)
✅ Clean conda environment (avoids dependency conflicts)
✅ Proper WSL2 GPU setup (nvidia-smi should work)
✅ Quality reference audio (clear, natural speech samples)