Building a Personal Fitness Insights Engine with Vector Search

This project makes personal fitness data searchable by meaning rather than by metrics alone. It combines Strava activity data, vector embeddings, and a semantic search interface into a personalized fitness knowledge base. The system has three main components: Strava data collection, vector storage in Supabase, and semantic activity analysis.

The workflow begins by authenticating with Strava through a Python-based OAuth flow and fetching detailed activity data. The strava-fetch.py script retrieves each activity's metrics, including heart rate, pace, elevation, and split timings. This raw data is the foundation of the personal fitness history.
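As a rough illustration of the fetch step, here is a minimal sketch using only the standard library. It assumes the usual Strava pattern of exchanging a refresh token at the /oauth/token endpoint and then paging through /api/v3/athlete/activities. The function names are hypothetical and are not taken from strava-fetch.py, which may be organized quite differently. Note that the list endpoint returns activity summaries; detailed splits would likely need a follow-up request per activity.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://www.strava.com/oauth/token"
ACTIVITIES_URL = "https://www.strava.com/api/v3/athlete/activities"


def build_refresh_payload(client_id: str, client_secret: str, refresh_token: str) -> dict:
    """Form fields for exchanging a refresh token for a short-lived access token."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }


def build_activities_url(page: int = 1, per_page: int = 50) -> str:
    """URL for one page of the authenticated athlete's activity summaries."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return f"{ACTIVITIES_URL}?{query}"


def refresh_access_token(client_id: str, client_secret: str, refresh_token: str) -> str:
    """POST the refresh payload and return the access token (makes a network call)."""
    data = urllib.parse.urlencode(
        build_refresh_payload(client_id, client_secret, refresh_token)
    ).encode()
    with urllib.request.urlopen(urllib.request.Request(TOKEN_URL, data=data)) as resp:
        return json.load(resp)["access_token"]


def fetch_activities_page(access_token: str, page: int = 1) -> list:
    """GET one page of activities using a bearer token (makes a network call)."""
    request = urllib.request.Request(
        build_activities_url(page),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

In practice you would call refresh_access_token once per run, then loop over fetch_activities_page until a page comes back empty.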

Each activity is then turned into a detailed text description by strava-embeddings.py. These descriptions are converted into vector embeddings using the nomic-embed-text model running on Ollama. Because the embeddings encode the meaning of each description, similarity searches can go beyond simple metric matching.
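To make the text-then-embed step concrete, here is a small sketch. The activity_to_text format is an assumption for illustration (the real strava-embeddings.py likely includes far more detail, such as splits and segments), and embed_text assumes a local Ollama server on its default port with the nomic-embed-text model already pulled. The dictionary keys follow Strava's activity field names.

```python
import json
import urllib.request

# Assumes Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def activity_to_text(activity: dict) -> str:
    """Build a short description of one activity (hypothetical format)."""
    km = activity["distance"] / 1000        # Strava reports distance in meters
    minutes = activity["moving_time"] / 60  # and moving time in seconds
    parts = [
        f"{activity['type']} '{activity['name']}'",
        f"{km:.1f} km in {minutes:.0f} min",
        f"{activity['total_elevation_gain']:.0f} m elevation gain",
    ]
    if activity.get("average_heartrate") is not None:
        parts.append(f"average heart rate {activity['average_heartrate']:.0f} bpm")
    return ", ".join(parts)


def embed_text(text: str, model: str = "nomic-embed-text") -> list:
    """Request an embedding from the local Ollama server (makes a network call)."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["embedding"]


sample = {
    "name": "Morning Hills",
    "type": "Run",
    "distance": 10300.0,
    "moving_time": 3120,
    "total_elevation_gain": 214.0,
    "average_heartrate": 156.4,
}
print(activity_to_text(sample))
```

The richer the description, the more the embedding has to work with, so fields like pace progression or segment efforts are worth adding if you have them.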

The system stores these embeddings in Supabase’s vector database, creating a searchable fitness knowledge base. Queries can then take the context and patterns of your training history into account. For example, you could find similar workouts based on effort level, terrain profile, or performance characteristics, not just basic parameters like distance or time.
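Conceptually, a vector search ranks stored embeddings by how close they are to a query embedding. The sketch below shows that idea in plain Python using cosine similarity on tiny made-up vectors. It is only an illustration: in the real system the ranking would happen inside the database (pgvector, for instance, offers a cosine distance operator), and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_matches(query_vec, stored, k=2):
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings", invented for this example.
stored = {
    "hilly tempo run": [0.9, 0.1, 0.3],
    "flat recovery jog": [0.1, 0.9, 0.2],
    "mountain trail race": [0.8, 0.0, 0.6],
}
query = [1.0, 0.0, 0.4]

for name, score in top_matches(query, stored):
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the two hilly activities rank above the flat recovery jog, which is the behavior the similarity search relies on.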

What makes this system powerful is its ability to understand the context of your fitness journey. Rather than just storing raw numbers, it maintains the rich context of each activity: your performance on specific segments, personal records, heart rate zones, and even the progression of your pace throughout the workout. This enables natural language queries like “Find me hard hill workouts where I maintained a strong pace in the second half” or “Show me activities similar to last week’s breakthrough run.”
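A natural language query like the ones above would first be embedded with the same model as the activities, then sent to the database for a nearest-neighbor lookup. The sketch below assumes a Postgres function named match_activities has been created in Supabase and exposed via RPC; that name and its parameters (query_embedding, match_threshold, match_count) are hypothetical and modeled on a common Supabase pattern. The client argument is expected to be a supabase-py client.

```python
def build_match_params(query_embedding, match_count=5, match_threshold=0.7):
    """Parameters for a hypothetical match_activities SQL function."""
    return {
        "query_embedding": list(query_embedding),
        "match_threshold": match_threshold,
        "match_count": match_count,
    }


def search_activities(client, query_embedding, match_count=5):
    """Call the (assumed) match_activities RPC and return the matching rows."""
    params = build_match_params(query_embedding, match_count)
    response = client.rpc("match_activities", params).execute()
    return response.data
```

Keeping the parameter construction separate from the RPC call makes it easy to check what is being sent without touching the database.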

This architecture turns raw Strava data into a fitness companion that can surface your training patterns and answer questions about them. The scripts handle errors and retry failed requests, so data processing stays reliable even with API rate limits and connection issues.
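One common way to implement such retries is exponential backoff. The sketch below is an assumption about how this could look, not the project's actual code: RateLimitError is a hypothetical exception that a fetch helper might raise on an HTTP 429 response, and the sleep function is injectable so the demo runs instantly.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error a fetch helper might raise on HTTP 429."""


def with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on transient errors with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except (RateLimitError, ConnectionError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a call that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


# Record the delays instead of actually sleeping.
result = with_retries(flaky_fetch, sleep=delays.append)
print(result, delays)
```

For a real rate-limited API you would probably use a larger base delay, since waiting a second or two rarely clears a quota window.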

Credit goes to the Strava API documentation and the Supabase vector database documentation for the foundational knowledge. The project could be extended to other types of fitness data, building toward a comprehensive personal health analytics platform.

Getting Started

1. Create a Strava API key
   - Go to https://www.strava.com/settings/api
   - Create an application to get your API credentials
2. Configure and run the authentication script
   - Add your client_id and client_secret to strava-auth.py
   - Run the script; a browser window will open for authentication
   - After authorizing, you’ll receive a refresh token
3. Set up data fetching
   - Add your client_id, client_secret, and refresh_token to strava-fetch.py
   - Run the script to download your Strava activities
4. Create the Supabase table
   - Enable the vector extension in your Supabase project
   - Create a new table named strava_activities
   - Add the necessary columns for activity data and embeddings
5. Process and store activities
   - Add your Supabase credentials to strava-embeddings.py
   - Run the script to generate embeddings and store activities
6. Your data is now ready to use
   - All activities are processed and indexed
   - You can now perform semantic searches on your fitness data
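For the authentication step in the list above, the browser window opens Strava's authorization page, and the code it returns is then exchanged for tokens. Here is a minimal sketch of the two requests involved. The redirect URI and the function names are assumptions for illustration; strava-auth.py may use different values, and the redirect must match what you configured for your Strava application.

```python
import urllib.parse

AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"


def build_authorize_url(
    client_id: str,
    redirect_uri: str = "http://localhost:8000/callback",  # assumed local callback
    scope: str = "activity:read_all",
) -> str:
    """URL to open in the browser so the user can grant access."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "approval_prompt": "auto",
        "scope": scope,
    }
    return f"{AUTHORIZE_URL}?{urllib.parse.urlencode(params)}"


def build_code_exchange_payload(client_id: str, client_secret: str, code: str) -> dict:
    """Form fields to POST to /oauth/token; the response includes the refresh token."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }


url = build_authorize_url("12345")  # "12345" is a placeholder client ID
print(url)
```

The refresh token from that exchange is the value you then carry over to the data fetching step.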

Note: Make sure all required Python packages are installed and any environment variables are configured before running the scripts.
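If you prefer environment variables over editing credentials into the scripts, a small startup check helps catch missing values early. The variable names below are hypothetical; use whatever names your copies of the scripts actually read.

```python
import os

# Hypothetical variable names, chosen for this example only.
REQUIRED_VARS = (
    "STRAVA_CLIENT_ID",
    "STRAVA_CLIENT_SECRET",
    "STRAVA_REFRESH_TOKEN",
    "SUPABASE_URL",
    "SUPABASE_KEY",
)


def load_config(env=os.environ) -> dict:
    """Return the required settings, or raise with a list of what is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_config() at the top of each script makes a misconfigured environment fail immediately, rather than partway through a long fetch.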
