Projects Portfolio
Explore my portfolio of data engineering, cloud architecture, and machine learning projects.
RESTful Shopping API: Secure E-Commerce Backend Service
Designed and built a production-ready RESTful API backend for an e-commerce platform using Node.js and MongoDB Atlas. Features include modular route architecture, JWT-based authentication, input validation, and auto-generated Swagger documentation. Follows microservices best practices with clean separation of concerns.
NYC Collision Data Pipeline: Scalable ETL on AWS with Spark and Redshift
Architected and deployed an end-to-end data pipeline on AWS. Ingests NYC collision data daily from the Socrata API, processes it with Spark on AWS EMR, and loads it into Amazon Redshift for analytics. Includes QuickSight dashboards, automated scheduling, and comprehensive logging for observability.
StockStream: Real-Time Event-Driven Data Platform
Built a real-time stock market data streaming platform using event-driven architecture. Ingests intraday stock prices via the Alpha Vantage API, streams them through Confluent Kafka producers and consumers, and persists the processed data in MySQL. Demonstrates distributed systems patterns including message queuing, fault tolerance, and low-latency processing.
Customer Ticket Management System: Full-Stack Platform with RBAC
Built a full-stack ticketing platform with role-based access control using Streamlit, MySQL (GCP Cloud SQL), and SQLAlchemy. Features real-time CRUD operations, custom analytics filters, and a normalized relational schema supporting multi-tenant workflows.
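As a rough illustration of the role-based access control idea, the sketch below maps roles to sets of allowed actions and checks a request against that map. The role names, action strings, and the `can` helper are all hypothetical and are not taken from the project itself.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical roles; the real project may define different ones.
    CUSTOMER = "customer"
    AGENT = "agent"
    ADMIN = "admin"


# Hypothetical action names, grouped so each role extends the one below it.
CUSTOMER_ACTIONS = {"ticket:create", "ticket:read_own"}
AGENT_ACTIONS = CUSTOMER_ACTIONS | {"ticket:read_all", "ticket:update"}
ADMIN_ACTIONS = AGENT_ACTIONS | {"ticket:delete", "user:manage"}

PERMISSIONS = {
    Role.CUSTOMER: CUSTOMER_ACTIONS,
    Role.AGENT: AGENT_ACTIONS,
    Role.ADMIN: ADMIN_ACTIONS,
}


def can(role: Role, action: str) -> bool:
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

A UI layer could call `can(current_role, "ticket:update")` before rendering an edit form, so that permission checks live in one place rather than being scattered across pages.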
Generative Origami AI: Creative AI with LLMs and Image Generation
A creative AI application that converts text prompts into unique origami design ideas. Integrates LLMs (GPT-4) with image generation models (DALL-E, Stable Diffusion) to visualize folding concepts. Built with Streamlit and LangChain for rapid prototyping and user interaction.
SummarAIze: AI-Powered Content Analysis Assistant
A Streamlit-based web application that transcribes, summarizes, and analyzes audio or video content. Supports file uploads and YouTube links, with action item extraction, sentiment analysis, and TL;DR summaries using AssemblyAI and GPT-4o.
Graph Database for Prime User Insights
Modeled and explored Amazon Prime user data using Neo4j. Created a graph schema linking users, devices, genres, and subscription plans. Ran Cypher queries to derive insights like popular genres, average age, and device trends.
Legal PDF Splitter: OCR-Powered Document Automation Tool
Developed a Python-based automation tool for splitting multi-document PDFs in legal workflows. Uses Tesseract OCR to detect keywords in specific page regions and splits documents accordingly. Includes configurable detection rules and ships as both a Python package and standalone executable.
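The splitting step can be sketched independently of the OCR: once each page has been reduced to its recognized text, the tool only needs to decide which pages start a new document. The function below is a minimal sketch under that assumption; `split_ranges`, the `is_start` rule, and the sample page texts are hypothetical, and the OCR call itself is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def split_ranges(
    page_texts: Sequence[str],
    is_start: Callable[[str], bool],
) -> List[Tuple[int, int]]:
    """Return inclusive (first_page, last_page) index pairs, one per document.

    page_texts holds the text recognized on each page (for example, from an
    OCR pass over a fixed page region); is_start is the detection rule.
    """
    if not page_texts:
        return []
    starts = [i for i, text in enumerate(page_texts) if is_start(text)]
    # Pages before the first detected keyword still belong to some document,
    # so page 0 is always treated as a start.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    # Each document ends on the page before the next start.
    ends = [s - 1 for s in starts[1:]] + [len(page_texts) - 1]
    return list(zip(starts, ends))


# Hypothetical recognized text for a five-page scan.
pages = ["AFFIDAVIT of service", "continued", "EXHIBIT A", "AFFIDAVIT two", "more"]
ranges = split_ranges(pages, lambda t: t.upper().startswith("AFFIDAVIT"))
```

Keeping the detection rule as a plain callable is one way to make it configurable: a different keyword, region, or regular expression can be swapped in without touching the range logic.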
Association Rule Mining and Product Recommendation
Built a retail recommendation engine on a dataset of more than 3 million orders, combining association rule mining (Apriori) with item-based and order-based collaborative filtering. Demonstrates algorithmic problem-solving and large-scale data processing.
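To make the association-rule idea concrete, here is a minimal sketch restricted to item pairs. It applies the Apriori pruning step (only items that are frequent on their own can form frequent pairs) and then keeps rules that clear support and confidence thresholds. The `pair_rules` function, the thresholds, and the toy orders are illustrative assumptions, not the project's code or data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(orders, min_support, min_confidence):
    """Return sorted (lhs, rhs, support, confidence) tuples for item pairs."""
    n = len(orders)
    baskets = [set(order) for order in orders]

    # Pass 1: count single items and keep those meeting min_support.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count only pairs built from frequent items (Apriori pruning).
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket & frequent), 2):
            pair_counts[pair] += 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy data for illustration only.
orders = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "milk"],
    ["butter", "jam"],
]
rules = pair_rules(orders, min_support=0.5, min_confidence=0.6)
```

On this toy input, the rule jam -> butter has confidence 1.0 because every order containing jam also contains butter. A full Apriori implementation would extend the same prune-then-count loop to itemsets larger than two.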