Nacho Assistant

An AI assistant integrated into my portfolio, built with Groq (Llama 3.3 70B), LangChain, and FastAPI. It answers questions in real time about my projects, tech stack, experience, and professional background. Visitors can also send me messages directly and submit job descriptions to evaluate how well I fit a specific role.

Website Repository

Problem

Traditional portfolios are static. Visitors have to navigate manually, read through pages, and figure out the technical context on their own. That creates friction, especially for recruiters or developers who want to quickly understand what I built, how I think, and how deep my knowledge actually goes.

Solution

A conversational AI assistant embedded in the portfolio that lets anyone explore my work through natural language. It answers questions about projects, experience, stack, and more in real time, using my own content as context.

How it works

The user asks a question in natural language
The frontend sends the query to the backend with streaming enabled
The backend generates an embedding of the question and searches the vector database for the closest chunks by cosine similarity
The most relevant chunks are injected as context into the prompt
Groq streams the response back to the frontend in real time
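
The retrieval and prompt-assembly steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's actual code: in the real system the similarity search runs inside pgvector, and the names cosine_similarity, top_k_chunks, and build_prompt, as well as the prompt wording, are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: Vector, chunks: List[Tuple[str, Vector]], k: int = 3) -> List[str]:
    """Return the text of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Inject the retrieved chunks as context ahead of the user's question."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With pgvector the ranking would normally be pushed into SQL instead, where the `<=>` operator computes cosine distance, so only the top rows ever leave the database.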

Tech stack

Frontend: Next.js, Tailwind CSS
Backend: Python, FastAPI
CMS: Payload CMS
Embeddings: HuggingFace Inference API (all-MiniLM-L6-v2)
LLM: Groq, Llama 3.3 70B
Vector database: Neon DB with pgvector
Deploy: Docker, Azure App Service

Technical decisions

RAG instead of static context: dynamically retrieves the most relevant fragments based on the question
HuggingFace Inference API: avoids loading the embedding model into memory, which keeps the backend within free-tier memory limits
Groq: low latency and real streaming for a fluid conversational experience
Async FastAPI: non-blocking backend that handles streaming efficiently
Azure App Service with Docker: no cold start, no artificial memory limits
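
To illustrate the async streaming decision, here is a small sketch of a non-blocking token stream framed as Server-Sent Events. It assumes SSE as the wire format, which the project may or may not use; fake_llm_tokens stands in for the Groq client, and the "[DONE]" sentinel is a hypothetical end-of-stream marker.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


async def fake_llm_tokens(tokens: Iterable[str]) -> AsyncIterator[str]:
    """Stand-in for an LLM client that yields tokens as they are generated."""
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop, as real network I/O would
        yield token


def sse_frame(data: str) -> str:
    """Format one SSE message; multi-line data gets one 'data:' line per line."""
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return body + "\n"  # a blank line terminates the event


async def stream_answer(tokens: Iterable[str]) -> AsyncIterator[str]:
    """Forward each token to the client as soon as it arrives."""
    async for token in fake_llm_tokens(tokens):
        yield sse_frame(token)
    yield sse_frame("[DONE]")  # hypothetical sentinel so the frontend knows to stop


async def collect(tokens: Iterable[str]) -> List[str]:
    """Helper for trying the generator outside a web server."""
    return [frame async for frame in stream_answer(tokens)]
```

In a FastAPI route, a generator like stream_answer could be passed to StreamingResponse with media_type="text/event-stream", so each frame is flushed as it is produced instead of after the full answer is ready.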

Challenges and learnings

Configuring real streaming without buffering through the Azure proxy
Handling embeddings that the HuggingFace API returns as either nested (2D) or flat (1D) arrays
Designing the system prompt for language detection and topic restriction
Managing database reconnections after instance idle time
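
Two of the challenges above lend themselves to small helpers. The sketch below shows one possible way to normalize embedding shapes and to retry a database call after a stale connection. It assumes a 2D response is a batch containing a single vector, and it retries on ConnectionError as a placeholder for whatever the real driver raises; to_1d and with_retry are hypothetical names.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def to_1d(raw: list) -> List[float]:
    """Normalize an embedding response to a flat list of floats."""
    if not raw:
        raise ValueError("empty embedding response")
    if isinstance(raw[0], (int, float)):
        return [float(x) for x in raw]  # already 1D
    if len(raw) == 1:
        return [float(x) for x in raw[0]]  # assumed: batch of one, so unwrap it
    # Several rows could be a batch or per-token vectors; refuse to guess.
    raise ValueError(f"ambiguous embedding shape with {len(raw)} rows")


def with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the original error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("attempts must be at least 1")
```

A caller would wrap the query in a function that opens or checks the connection first, so each retry gets a fresh connection instead of reusing the one that went idle.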