AI/ML Engineer • RAG & LLM Systems

I make AI systems practical, performant, and production-ready.

Currently at Radical Squares building RAG pipelines, LLM orchestration, and real-time inference infrastructure.

Selected Work

Projects

Diagram: 5-Stage RAG Pipeline (Query, Route, Reform, Qdrant, Check, Stream Gen) • FastAPI + Qdrant • 400 ms latency
01

Brckt

Production RAG system for real-time tennis analytics.

FastAPI • Qdrant • LLM Streaming
Diagram: Query, Orchestrator, Planner, Coder, Reviewer, Executor, E2B Sandbox • 4 Agents
02

CodePilot

Multi-agent AI system for autonomous code generation.

Claude 4.5 • LangGraph • E2B
Diagram: Real-time Fraud Detection (Data Stream, FastAPI, XGBoost Model, Risk Score, MLflow) • <100 ms latency
03

ML-Monitor

Production MLOps platform for real-time fraud detection.

FastAPI • XGBoost • MLflow
Diagram: Query, DistilBERT Classifier, Qdrant, HIT, Router, Redis, MISS • 60% cost saved
04

Cascade

Intelligent LLM router with semantic caching.

DistilBERT • Qdrant • Redis
Diagram: Hybrid RAG Pipeline (Query, Hybrid, BM25, Dense, CrossEncoder, LLM, Qdrant) • 94% relevance
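To make the idea of semantic caching concrete, here is a minimal sketch in plain Python. It is not Cascade's actual code: the `SemanticCache` class, the 0.9 similarity threshold, and the toy three-dimensional embeddings are all assumptions for illustration, and an in-memory list stands in for a vector store such as Qdrant.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache keyed by embedding similarity instead of exact text."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cut-off for counting as a hit
        self.entries = []           # list of (embedding, response) pairs

    def get(self, embedding):
        # Find the stored entry most similar to the query embedding.
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        # Only return it when it clears the threshold; otherwise it is a miss.
        return best_response if best_score >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))


cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cached answer")
hit = cache.get([0.98, 0.1, 0.0])   # near-duplicate query
miss = cache.get([0.0, 1.0, 0.0])   # unrelated query
```

On a miss, a router would forward the query to a model and then call `put` with the new response, so later paraphrases of the same question can be served from the cache.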
05

VerbaQuery

Industrial RAG with hybrid retrieval (BM25 + dense embeddings) and cross-encoder re-ranking for enterprise document search.

LangChain • Qdrant • Cross-Encoder
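One common way to combine a BM25 ranking with a dense-embedding ranking is reciprocal rank fusion (RRF). The sketch below illustrates that general technique and is not necessarily how VerbaQuery merges its results: the function name, the constant `k=60`, and the document IDs are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword (BM25) search and a dense vector search.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

In a pipeline with re-ranking, the top few fused candidates would then be scored by a cross-encoder against the query, and that final order is what the LLM sees.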

About

Background

MS in Computer Science from Indiana University with a 3.9 GPA. Focused on making machine learning work in production.

My work spans RAG systems, LLM pipelines, and MLOps infrastructure. I care deeply about building AI that's reliable, fast, and actually useful.

Experience

GenAI Engineer

Jan 2026 — Present
Radical Squares

Developing production GenAI applications integrating OpenAI GPT-4o APIs with LangChain. Built full-stack AI platform with React/FastAPI, implemented RAG pipeline with vector embeddings, and deployed microservices with Docker/Redis achieving 94.6% API cost reduction.

GPT-4o • RAG Pipeline • 94.6% cost reduction

AI/ML Engineer

Dec 2024 — Present
Brckt (Peristyle Labs)

Built real-time GenAI application using Llama 3.3-70B LLM with streaming responses via Server-Sent Events (SSE). Developed scalable backend API with FastAPI and async processing, containerized with Docker and deployed with Caddy reverse proxy.

Llama 3.3-70B • SSE Streaming • Docker/Caddy

GenAI Developer

Jun 2025 — Dec 2025
Riverside Global

Architected enterprise RAG system using GPT-4 API with LangChain orchestration. Implemented 5-stage pipeline with hybrid retrieval (BM25 + semantic), built vector search with ChromaDB/FAISS for 10,000+ documents achieving 94% retrieval accuracy.

94% accuracy • ChromaDB/FAISS • GPT-4 + LangChain

Tech Stack

Python
PyTorch
TensorFlow
LangChain
Hugging Face
OpenAI
Claude
Ollama

Contact

Let's build something

Currently open to AI/ML engineering opportunities. If you're building something interesting, I'd love to hear about it.

Send a Message

© 2026 Ayush
