Projects

A mix of shipped work, ongoing experiments, and slots reserved for future LLM and agentic AI projects.

LLMs • Embeddings • Research

Embeddings Researcher — Comparing Representation Learning Methods

A research-oriented Streamlit app to compare TF-IDF, Skip-gram, CBOW, and GloVe embeddings — exploring when each representation method works best and how they affect downstream tasks.

  • Implemented core algorithms from scratch in PyTorch / NumPy.
  • Interactive UI for neighborhoods, similarities, and analogies.
  • Designed to help users build intuition for “how embeddings think”.
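The app builds these representations from scratch; as a flavor of that, here is a minimal, dependency-free sketch of the TF-IDF piece with cosine similarity (function names and the add-one smoothing choice are illustrative, not the project's actual code):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute smoothed TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    # document frequency: how many docs each term appears in
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # idf with add-one smoothing so unseen terms never divide by zero
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

The neural methods (Skip-gram, CBOW, GloVe) replace the counting step with learned dense vectors, but plug into the same neighborhood and similarity UI.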

Agentic AI • RAG • Healthcare

Clinical QA Agent with Retrieval-Augmented Generation

Multi-agent workflow where one agent retrieves and ranks medical evidence, another synthesizes answers, and a critic agent checks for hallucinations before responses are surfaced.

  • Built with LangGraph, LangChain, and a vector database.
  • Distinct agent roles (retriever, generator, verifier).
  • Focus on traceability and “show your work” chains of thought.
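The retriever → generator → verifier split could be sketched, minus LangGraph and the vector database, as plain Python hand-offs over a shared state (all names, the keyword-overlap retrieval, and the word-level grounding check are placeholder simplifications):

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Shared state passed between agent roles (hypothetical schema)."""
    question: str
    evidence: list = field(default_factory=list)
    draft: str = ""
    verified: bool = False

def retrieve(state, corpus):
    # rank documents by naive keyword overlap (stand-in for vector search)
    q = set(state.question.split())
    state.evidence = sorted(corpus, key=lambda d: -len(set(d.split()) & q))[:2]
    return state

def generate(state):
    # synthesize a draft answer from the retrieved evidence only
    state.draft = " ".join(state.evidence)
    return state

def verify(state):
    # critic step: pass only if every draft token traces back to evidence
    context = {w for doc in state.evidence for w in doc.split()}
    state.verified = all(w in context for w in state.draft.split())
    return state
```

In the real workflow each role is a graph node with its own prompt and tools; the point here is only the separation of concerns and the explicit, inspectable state.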

Coming soon

These are intentional placeholders — as I push more projects to GitHub / Hugging Face, I’ll link them here.

Agentic AI

AgentBench: Evaluating Multi-Agent LLM Workflows

A small benchmark suite for comparing agentic LLM setups: single-agent vs multi-agent, with different planning and tool-calling strategies.

  • Task templates for retrieval, planning, tool use.
  • Evaluation metrics beyond accuracy (latency, cost, stability).
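A metrics aggregator for such a suite might look like the following sketch (the `RunRecord` schema and field names are hypothetical, not the benchmark's actual API):

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class RunRecord:
    """One benchmark run of an agent setup on a task (hypothetical schema)."""
    correct: bool
    latency_s: float
    cost_usd: float

def summarize(runs):
    """Aggregate accuracy, latency, cost, and stability over repeated runs."""
    latencies = [r.latency_s for r in runs]
    return {
        "accuracy": mean(r.correct for r in runs),
        "mean_latency_s": mean(latencies),
        "total_cost_usd": sum(r.cost_usd for r in runs),
        # stability proxy: low run-to-run latency spread = predictable setup
        "latency_stdev_s": pstdev(latencies),
    }
```

Comparing setups then means comparing these dicts, not just a single accuracy number.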

RAG • Evaluation

RAGEval: Debugger for Retrieval-Augmented Generation

A toolkit to inspect retrieval quality, context construction, and answer grounding — with side-by-side comparisons of different retrievers and rerankers.

  • Visualize query → top-k documents → final answer.
  • Hooks for LLM-as-a-judge and human evaluation.
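The query → top-k → answer trace could be captured with a small wrapper like this sketch (`retriever_fn`, `answer_fn`, and the token-overlap grounding score are illustrative placeholders for whatever retriever and generator are under inspection):

```python
def trace_rag(query, retriever_fn, answer_fn, k=3):
    """Record every stage of one RAG call so stages can be compared side by side."""
    docs = retriever_fn(query)[:k]
    answer = answer_fn(query, docs)
    # crude grounding score: fraction of answer tokens found in the context
    context_tokens = {t for d in docs for t in d.lower().split()}
    answer_tokens = answer.lower().split()
    grounded = sum(t in context_tokens for t in answer_tokens)
    return {
        "query": query,
        "top_k": docs,
        "answer": answer,
        "grounding": grounded / max(len(answer_tokens), 1),
    }
```

Running two retrievers through the same wrapper yields two directly comparable traces, which is where an LLM-as-a-judge or a human reviewer would plug in.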

Vision • Deep Learning

Vision Transformer from Scratch

A PyTorch implementation of a compact Vision Transformer, trained on a small image dataset, focusing on clarity of code and explanatory visuals.

  • Step-by-step, well-documented implementation.
  • Side-by-side with a ResNet baseline.
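The project itself is in PyTorch; as a dependency-free illustration of the first ViT step, here is a sketch of splitting an image into flattened non-overlapping patches (`patchify` is a hypothetical helper, not the repo's code — in the real model this is typically a strided convolution followed by a linear projection):

```python
def patchify(image, patch_size):
    """Split an H x W image (nested lists) into flattened, non-overlapping
    patches in row-major order — the input tokens of a Vision Transformer."""
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            # flatten one patch_size x patch_size block into a single token
            patches.append([image[top + i][left + j]
                            for i in range(patch_size)
                            for j in range(patch_size)])
    return patches
```

Each flattened patch is then linearly projected to the model dimension and given a positional embedding before entering the transformer encoder.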