Problem
Traditional portfolios are static. Visitors have to navigate manually, read through pages, and work out the technical context on their own. That creates friction, especially for recruiters or developers who want to quickly understand what I built, how I think, and how deep my knowledge goes.
Solution
A conversational AI assistant embedded in the portfolio lets anyone explore my work through natural language. It answers questions about my projects, experience, and tech stack in real time, using my own content as context.
How it works
- The user asks a question in natural language
- The frontend sends the query to the backend with streaming enabled
- The backend generates an embedding of the question and searches the vector database for the closest content chunks by cosine similarity
- The most relevant chunks are injected as context into the prompt
- Groq streams the response back to the frontend in real time
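The retrieval and prompt-assembly steps above can be sketched in plain Python. This is only an illustration under stated assumptions: in the deployed system the similarity search runs inside the vector database, while here it is done in memory on toy vectors. The function names (`cosine_similarity`, `top_k_chunks`, `build_prompt`) and the prompt layout are hypothetical, not taken from the project.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    k: int = 3,
) -> list[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Inject the retrieved chunks as context ahead of the user's question."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have far more dimensions.
    stored = [
        ("Project A uses FastAPI.", [1.0, 0.0, 0.0]),
        ("I enjoy hiking.", [0.0, 1.0, 0.0]),
        ("Project B streams responses.", [0.9, 0.1, 0.0]),
    ]
    best = top_k_chunks([1.0, 0.0, 0.0], stored, k=2)
    print(build_prompt("Which projects use Python?", best))
```

Delegating the search to the database avoids loading every stored vector into the backend, which is presumably why it lives there rather than in application code.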
Tech stack
- Frontend: Next.js, Tailwind CSS
- Backend: Python, FastAPI
- CMS: Payload CMS
- Embeddings: HuggingFace Inference API (all-MiniLM-L6-v2)
- LLM: Groq, Llama 3.3 70B
- Vector database: Neon DB with pgvector
- Deploy: Docker, Azure App Service
Technical decisions
- RAG instead of static context: dynamically retrieves the most relevant fragments based on the question
- HuggingFace Inference API: avoids loading the embedding model into the backend's memory, which keeps the service within free-tier memory limits
- Groq: low latency and token-by-token streaming for a fluid conversational experience
- Async FastAPI: non-blocking backend that handles streaming efficiently
- Azure App Service with Docker: no cold start, no artificial memory limits
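To illustrate why an async backend suits streaming, here is a minimal sketch using only the standard library. It assumes the response is framed as Server-Sent Events, which the write-up does not confirm, and `fake_llm_tokens` is a hypothetical stand-in for the real LLM client. In a FastAPI route, an async generator like `sse_events` could be passed to `StreamingResponse`.

```python
import asyncio
from typing import AsyncIterator


async def fake_llm_tokens(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the LLM client's token stream."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop, as a network read would
        yield word + " "


async def sse_events(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in an SSE frame as soon as it arrives (assumed framing)."""
    async for token in tokens:
        yield f"data: {token}\n\n"
    # End-of-stream marker; the "[DONE]" sentinel is an assumption of this sketch.
    yield "data: [DONE]\n\n"


async def collect(stream: AsyncIterator[str]) -> list[str]:
    """Drain a stream into a list; a real client would render frames incrementally."""
    return [event async for event in stream]


if __name__ == "__main__":
    for frame in asyncio.run(collect(sse_events(fake_llm_tokens("hello from the assistant")))):
        print(repr(frame))
```

Because each token is forwarded the moment it is produced, the event loop stays free to serve other requests while waiting on the model. This sketch does not handle tokens that contain newlines, which would need extra escaping under SSE.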
Challenges and learnings
- Configuring real streaming without buffering through the Azure proxy
- Handling embeddings that the HuggingFace API may return as either a flat 1D vector or a nested 2D array
- Designing the system prompt for language detection and topic restriction
- Managing database reconnections after instance idle time
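Two of these challenges lend themselves to a short sketch. The first helper flattens an embedding whether it arrives as a 1D vector or a 2D array; treating the 2D case as rows to be mean-pooled is an assumption, since the write-up does not say how it was resolved. The second retries an operation on a fresh connection after a failure. Both `to_sentence_vector` and `with_reconnect` are hypothetical names, and `connect` stands in for whatever the real database driver provides.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def to_sentence_vector(embedding: Sequence) -> list[float]:
    """Return a flat vector; a 2D input is mean-pooled across rows (assumption)."""
    if not embedding:
        raise ValueError("empty embedding")
    if isinstance(embedding[0], (int, float)):
        # Already 1D: just normalise the element type.
        return [float(x) for x in embedding]
    rows = embedding
    dim = len(rows[0])
    # Average each column; a single-row input comes back unchanged.
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def with_reconnect(
    operation: Callable[[object], T],
    connect: Callable[[], object],
    retries: int = 2,
    delay: float = 0.0,
    retry_on: tuple = (ConnectionError,),
) -> T:
    """Run operation(conn) on a fresh connection, retrying on connection errors."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            conn = connect()  # open a new connection for every attempt
            return operation(conn)
        except retry_on as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(delay)  # brief pause before reconnecting
    raise last_error


if __name__ == "__main__":
    print(to_sentence_vector([[1.0, 3.0], [3.0, 5.0]]))
    print(to_sentence_vector([0.5, 0.25]))
```

In a real deployment, `retry_on` would list the driver's own exception types, and a connection pool with health checks may remove the need for manual retries.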