Prerequisites
- Windows with WSL2 installed
- NVIDIA RTX 5070 GPU
- NVIDIA drivers installed on Windows host
Issue Overview
RTX 50-series (Blackwell) GPUs report CUDA compute capability sm_120, which requires a PyTorch build with CUDA 12.8 support. Standard PyTorch builds ship GPU kernels only up to sm_90, so inference on these cards fails at runtime with “no kernel image is available” errors.
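To see how the sm_120 tag relates to what PyTorch reports, the sketch below maps a compute-capability tuple to its architecture tag. On a live system you would pass in `torch.cuda.get_device_capability()`, which returns e.g. `(12, 0)` on an RTX 5070; `sm_tag` itself is an illustrative helper, not part of PyTorch.

```python
# Map a CUDA compute capability tuple to its sm_ architecture tag.
# `sm_tag` is an illustrative helper, not a PyTorch API.
def sm_tag(capability):
    major, minor = capability
    return f"sm_{major}{minor}"

print(sm_tag((12, 0)))  # sm_120 -- RTX 50-series (Blackwell)
print(sm_tag((9, 0)))   # sm_90  -- the ceiling of standard PyTorch builds
```

If the tag your GPU reports is higher than anything your PyTorch build was compiled for, you get exactly the “no kernel image” error described above.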
Installation Steps
1. Setup Conda Environment
# Install Miniconda (if not already installed)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
# Create new environment
conda create -n f5-tts python=3.11
conda activate f5-tts
2. Install PyTorch with CUDA 12.8 Support
# Install PyTorch with CUDA 12.8 (critical for RTX 50-series)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
# Verify GPU compatibility
python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA version: {torch.version.cuda}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name()}')
    x = torch.randn(1000, 1000).cuda()
    y = x @ x
    print('✅ GPU computation successful!')
"
Expected output:
PyTorch version: 2.7.1+cu128
CUDA version: 12.8
CUDA available: True
GPU: NVIDIA GeForce RTX 5070
✅ GPU computation successful!
3. Install F5-TTS
# Clone F5-TTS repository
git clone https://github.com/SWivid/F5-TTS.git
cd F5-TTS
# Install F5-TTS
pip install -e .
# Test imports
python -c "
import torch
from f5_tts.infer.utils_infer import load_model
print('F5-TTS imports successfully!')
print(f'GPU available: {torch.cuda.is_available()}')
"
4. Launch F5-TTS
# Start Gradio interface
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
Access the interface at: http://localhost:7860
Voice Cloning Setup
Recording Reference Audio
- Duration: 15-30 seconds
- Quality: Clear, natural speech
- Language: Any language (cross-lingual cloning supported)
- Format: WAV or MP3
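A quick stdlib check can confirm a reference clip falls in the recommended 15-30 second window before you load it into the interface. This is a minimal sketch; `is_good_reference` is an illustrative helper, not part of F5-TTS, and the `wave` module handles WAV only (for MP3 you would need a decoder such as soundfile or ffmpeg).

```python
import wave

# Duration of a WAV file in seconds, via the stdlib wave module.
def wav_duration_seconds(path):
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

# True if the clip falls inside the recommended reference window.
def is_good_reference(path, lo=15.0, hi=30.0):
    return lo <= wav_duration_seconds(path) <= hi
```

Run it on your recording first; a clip outside the window tends to produce weaker cloning results.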
Cross-lingual Voice Cloning
- Record reference audio in your native language
- Use the cloned voice to speak any supported language
- Voice characteristics transfer while pronunciation adapts to target language
Performance Expectations on RTX 5070
- Inference time: 2-5 seconds for typical text lengths
- VRAM usage: ~2-3GB during inference
- Concurrent processing: Possible due to 12GB VRAM capacity
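To verify the 2-5 second figure on your own texts and hardware, a small timing wrapper is enough. This is a generic sketch: wrap whatever inference call you actually use; the `sum` call below is only a stand-in workload.

```python
import time

# Time any callable with a monotonic high-resolution clock.
# Substitute your F5-TTS inference call for the placeholder workload.
def timed(fn, *args, **kwargs):
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

result, seconds = timed(sum, range(1_000_000))  # placeholder workload
print(f"took {seconds:.3f}s")
```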
Environment Management
Daily Usage
# Activate environment
conda activate f5-tts
# Navigate to F5-TTS
cd ~/F5-TTS
# Start interface
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
Create Start Script
# Create convenient startup script
cat << 'EOF' > ~/start_f5tts.sh
#!/bin/bash
# 'conda activate' needs the conda shell hook in non-interactive scripts
source ~/miniconda3/etc/profile.d/conda.sh
conda activate f5-tts
cd ~/F5-TTS
python -m f5_tts.infer.infer_gradio --host 0.0.0.0 --port 7860
EOF
chmod +x ~/start_f5tts.sh
# Usage: ./start_f5tts.sh
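The start script binds port 7860, and a stale Gradio process produces a confusing bind error. A quick stdlib probe, run before launching, tells you whether the port is free; `port_free` is an illustrative helper under the assumption you serve on the default port.

```python
import socket

# True if nothing is listening on (host, port) -- e.g. before
# launching the Gradio interface on its default port 7860.
def port_free(host="127.0.0.1", port=7860):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) != 0
```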
Troubleshooting
Common Issues
“CUDA error: no kernel image available”
- Solution: Ensure PyTorch with CUDA 12.8 is installed
- Verify with:
python -c "import torch; print(torch.version.cuda)"
Import errors
- Ensure you’re in the correct conda environment:
conda activate f5-tts
- Reinstall F5-TTS from the repository root:
pip install -e .
GPU not detected
- Check NVIDIA drivers on Windows host
- Verify WSL2 GPU passthrough is working:
nvidia-smi
Performance Optimization
- Batch processing: RTX 5070’s 12GB VRAM supports larger batch sizes
- Memory management: Close other GPU applications for optimal performance
- Temperature monitoring: Ensure adequate cooling for sustained workloads
Next Steps
- API Integration: Set up REST API for n8n workflows
- Voice Library: Build collection of reference voices
- Automation: Create scripts for batch processing
- Quality Tuning: Experiment with different reference audio qualities
Key Success Factors
✅ PyTorch 2.7.1+ with CUDA 12.8 (essential for RTX 50-series)
✅ Clean conda environment (avoids dependency conflicts)
✅ Proper WSL2 GPU setup (nvidia-smi should work)
✅ Quality reference audio (clear, natural speech samples)