Mithil

01 / About Me

I am a Data Science undergraduate specializing in Retrieval-Augmented Generation (RAG), NLP systems, and backend AI infrastructure. I build retrieval pipelines, evaluation frameworks, and low-latency AI applications focused on measurable retrieval quality and production reliability.

Python • SQL • FastAPI • PyTorch • RAG • Docker • PostgreSQL

02 / Experience

Work History

Analytics Intern — RAG & Data Analytics

Star Health Allied Insurance

Dec 2025

Built analytics pipelines over FY24 to FY25 insurance data, using hybrid retrieval and FastAPI serving to support market analysis at scale
Developed a hybrid RAG pipeline with LangChain and ChromaDB that combines structured datasets with analytical documents for natural-language querying
Improved retrieval quality by combining BM25 and dense vector retrieval with reranking and metadata filtering, as measured in internal evaluation workflows
Reduced end-to-end latency from 2.4 s to 650 ms, and token usage by 42%, through query routing, context compression, and prompt optimization
Evaluated retrieval and answer quality with RAGAS and manual review on a 60-query benchmark

Python • LangChain • RAG • ChromaDB • Sentence-Transformers • Hybrid Search • Streamlit
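
The hybrid retrieval described in this role might look roughly like the following sketch. It assumes Reciprocal Rank Fusion (RRF) to merge the BM25 and dense rankings, since the entry does not say how the two were combined. Embeddings are passed in as precomputed vectors (in practice they would come from a Sentence-Transformers model), the reranking step is only marked with a comment, and the names hybrid_search, rrf_fuse, and the corpus dict keys are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def bm25_scores(query: Sequence[str], docs: List[Sequence[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for the query tokens."""
    if not docs:
        return []
    n = len(docs)
    avgdl = (sum(len(d) for d in docs) / n) or 1.0  # guard against all-empty docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document's score."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query_tokens: List[str], query_vec: List[float], corpus: List[Dict],
                  where: Optional[Dict] = None, top_k: int = 3) -> List[str]:
    """Metadata filter, then BM25 and dense rankings, then RRF fusion.

    Each corpus item is a dict with hypothetical keys "id", "tokens",
    "vec" (a precomputed embedding) and "meta" (metadata for filtering).
    """
    where = where or {}
    pool = [d for d in corpus
            if all(d["meta"].get(key) == value for key, value in where.items())]
    if not pool:
        return []
    sparse = bm25_scores(query_tokens, [d["tokens"] for d in pool])
    dense = [cosine(query_vec, d["vec"]) for d in pool]
    order = range(len(pool))
    by_sparse = [pool[i]["id"] for i in sorted(order, key=lambda i: -sparse[i])]
    by_dense = [pool[i]["id"] for i in sorted(order, key=lambda i: -dense[i])]
    # A cross-encoder reranker would rescore the fused candidates at this point.
    return rrf_fuse([by_sparse, by_dense])[:top_k]
```

RRF works on ranks rather than raw scores, which sidesteps the fact that BM25 scores and cosine similarities live on different scales. Applying the metadata filter first keeps both retrievers ranking the same candidate pool.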

03 / Projects

Featured Work

01

Aurora RAG Chatbot

RAG system for answering real-time event queries, deployed during a university event serving 400+ attendees. Reduced repeated-query latency from 4.2 s to under 20 ms using multi-tier caching with semantic cache reuse. Led a 6-member team across retrieval pipeline development, deployment, and testing.

Python • FastAPI • Redis • ChromaDB • Groq • Docker
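
A hedged sketch of how a two-tier cache like this could work: tier 1 is an exact match on the normalized query, and tier 2 reuses the answer of a previous query whose embedding is close enough by cosine similarity. Everything here is an illustrative assumption: the class TwoTierCache, the 0.9 default threshold, and the in-process dict and list that stand in for Redis. The embed and generate callables are placeholders for the embedding model and the full retrieval plus Groq generation call.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class TwoTierCache:
    """Tier 1: exact match on the normalized query string (dict lookup).
    Tier 2: semantic match, reusing the answer of the most similar cached
    query whose embedding clears `threshold` cosine similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.exact: Dict[str, str] = {}
        self.semantic: List[Tuple[Vector, str]] = []

    @staticmethod
    def normalize(query: str) -> str:
        return " ".join(query.lower().split())

    def lookup(self, query: str) -> Tuple[Optional[str], str]:
        key = self.normalize(query)
        if key in self.exact:
            return self.exact[key], "exact"
        vec = self.embed(key)
        best_answer, best_sim = None, self.threshold
        for cached_vec, answer in self.semantic:  # linear scan, fine for a sketch
            sim = cosine(vec, cached_vec)
            if sim >= best_sim:
                best_answer, best_sim = answer, sim
        if best_answer is not None:
            return best_answer, "semantic"
        return None, "miss"

    def store(self, query: str, answer: str) -> None:
        key = self.normalize(query)
        self.exact[key] = answer
        self.semantic.append((self.embed(key), answer))


def answer_query(query: str, cache: TwoTierCache,
                 generate: Callable[[str], str]) -> Tuple[str, str]:
    """Serve from cache when possible; otherwise run the full pipeline and cache it."""
    cached, tier = cache.lookup(query)
    if cached is not None:
        return cached, tier
    answer = generate(query)
    cache.store(query, answer)
    return answer, "miss"
```

A threshold set too low would hand back answers to queries that merely share vocabulary, so it would need tuning against real queries. The linear scan is adequate for a few hundred cached entries; a larger cache would call for a vector index.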

02

AI Cloud Drive

Built a self-hosted cloud storage system with an integrated RAG pipeline for querying technical PDFs. The retrieval strategy combines hybrid search, reranking, and context sufficiency checks to improve retrieval reliability and grounding. Features include asynchronous document ingestion, citation tracking, and context validation. Indexed 1,000+ document chunks for hybrid retrieval and reranking.

Python • FastAPI • Docker • Groq • ChromaDB
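
A context sufficiency check can be sketched as a gate between reranking and generation: keep only chunks above a relevance cutoff, require that together they cover most of the query's content words, and otherwise abstain. The Chunk fields, the keyword-coverage heuristic, and every threshold below are assumptions, since the project description does not specify how sufficiency was measured. Citations are built from each kept chunk's source file and page.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical minimal stopword list for the coverage heuristic.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "are", "what", "how",
             "does", "do", "to", "for", "and", "on"}


@dataclass
class Chunk:
    text: str
    source: str   # e.g. the PDF file name
    page: int
    score: float  # reranker relevance score, assumed normalized to [0, 1]


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_coverage(query: str, chunks: Sequence[Chunk]) -> float:
    """Fraction of non-stopword query terms that appear somewhere in the context."""
    terms = tokens(query) - STOPWORDS
    if not terms:
        return 1.0
    context = set().union(*(tokens(c.text) for c in chunks))
    return len(terms & context) / len(terms)


def select_context(query: str, ranked: Sequence[Chunk], min_score: float = 0.5,
                   min_coverage: float = 0.6, max_chunks: int = 4
                   ) -> Optional[Tuple[List[Chunk], List[str]]]:
    """Return (chunks, citations) when the context looks sufficient, else None."""
    kept = [c for c in ranked if c.score >= min_score][:max_chunks]
    if not kept or keyword_coverage(query, kept) < min_coverage:
        return None  # insufficient context: abstain rather than guess
    citations = [f"{c.source}, p. {c.page}" for c in kept]
    return kept, citations
```

Keyword overlap is a crude proxy for sufficiency; an LLM-based judgment of whether the context answers the question is a common alternative, at the cost of an extra model call per query.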

03

Automated News Ingestion and Analysis Pipeline

Built a fault-tolerant news ingestion pipeline using dual-stage article extraction (Newspaper3k + custom DOM parsing) with NLP-based sentiment analysis and topic classification across live news sources.

Python • Flask • NLP • VADER
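
The dual-stage extraction could be structured as a primary extractor with a DOM-parsing fallback, which is one plausible reading of "fault-tolerant" here. In this sketch the primary stage is any callable that maps HTML to text (for example, a thin wrapper around Newspaper3k, not shown), and the fallback uses only the standard library's html.parser to collect paragraph text. The function names, the skipped tags, and the 200-character minimum are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


class ParagraphExtractor(HTMLParser):
    """Fallback stage: collect text inside <p> tags, skipping boilerplate containers."""

    SKIP = {"script", "style", "nav", "footer", "aside"}

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._buf: List[str] = []
        self._in_p = False
        self._skip_depth = 0

    def flush(self) -> None:
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "p" and not self._skip_depth:
            if self._in_p:  # an unclosed <p> implicitly ends at the next <p>
                self.flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            self.flush()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)


def dom_extract(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit a trailing unclosed <p>, if any
    return "\n".join(parser.paragraphs)


def extract_article(html: str, primary: Callable[[str], Optional[str]],
                    min_chars: int = 200) -> Tuple[str, str]:
    """Stage 1: primary extractor. Stage 2: DOM fallback if stage 1 fails or is too short."""
    try:
        text = primary(html) or ""
    except Exception:  # deliberately broad: any extractor failure triggers the fallback
        text = ""
    if len(text.strip()) >= min_chars:
        return text, "primary"
    fallback = dom_extract(html)
    return (fallback, "fallback") if fallback else ("", "failed")
```

Returning the stage name alongside the text makes it easy to log how often the fallback fires per source, which helps spot sites whose layout the primary extractor no longer handles.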

04 / Open Source