Hi, I'm Yash 👋
AI Engineer specializing in LLM systems, multi-agent architectures, and production ML pipelines.

Based in Calgary, Canada

About

I'm Yash Gupta, an AI Engineer with 4+ years of experience building production AI systems in healthcare and enterprise domains. I specialize in LLM applications, multi-agent orchestration, and real-time voice AI, with a track record of deploying scalable solutions on AWS and GCP. I hold an M.Sc. in Artificial Intelligence and have published research in federated learning and edge AI.

Work Experience

Monark

May 2025 - February 2026
AI Engineer
  • Built a real-time voice AI roleplay bot using WebRTC, Python, Gemini Live, and Deepgram, improving transcript retention by 50% and reducing token usage by 25%.
  • Containerized and deployed the system with CI/CD and AWS ECR for scalable, low-latency production use.
  • Built a serverless leadership assessment pipeline with AWS Lambda and Step Functions to evaluate multiple traits in parallel with sub-60-second latency.
  • Designed research-backed prompts with I/O psychology experts and shipped the system with automated CI/CD using AWS SAM and GitHub Actions.
  • Implemented an LLM-as-a-judge evaluation pipeline using LangChain to enforce strict adherence to system prompt rules.
  • Deployed the service on AWS Lambda using a GPT model for automated quality assurance and bias-aware scoring.
  • Built a serverless PII redaction pipeline integrating Google Cloud NLP, LangExtract, and Gemini, achieving 90% F1 on redaction quality.
  • Ensured consistent redaction across transcripts, summaries, and insights for reliable downstream analytics.
  • Developed a speaker identification pipeline using HuggingFace and pyannote.audio for accurate multi-speaker transcription at scale.
  • Migrated model serving from SageMaker to Google Cloud Run, reducing infrastructure costs by 60%.
  • Built a multi-agent chatbot for conversational access to meeting data using LangChain and FastAPI with RAG over Vertex AI Vector Search.
  • Integrated the system with a Next.js frontend to deliver a production-ready, full-stack AI feature.

Aurora Constellations

May 2023 - September 2024
AI Researcher
  • Built a custom VSCode fork and Langium-based DSL for clinical pathway modeling with conditional logic for oncology workflows.
  • Implemented a decision-tree graph generator in Scala to navigate patient-specific pathways and reduce clinician cognitive load.
  • Built a RAG pipeline using GPT-3, LangChain, Pinecone, and FastAPI to generate structured patient treatment plans from unstructured clinical guidance.
  • Integrated the service with a Scala backend to iteratively generate and validate DSL-based plans.
  • Integrated with the OpenEpic platform to ingest EHR data and map it into the company’s DSL for downstream clinical tooling.
  • Implemented secure OAuth 2.0 flows with a FHIR-compliant server and built connectors to normalize patient data from Epic.
  • Modeled patient plans from MIMIC-III/IV as heterogeneous graphs and trained GNNs for ICU outcome prediction.
  • Achieved AUROC ~0.74 for mortality prediction and ~1.2 days RMSE for length-of-stay, matching strong clinical baselines.
  • Built a Neo4j-based clinical knowledge graph from MIMIC data to represent treatments, entities, and outcomes.
  • Implemented a Graph-RAG pipeline combining graph retrieval with GPT-3 to generate structured patient plans.

Aurora Constellations

May 2021 - May 2023
Software Engineer
  • Implemented structured logging for a Scala Play Server to improve observability, enable reliable log parsing, and support production analytics and debugging.
  • Built and maintained CI/CD pipelines for Play Server deployments, added secure deployment webhooks, and wrote Bash automation to generate and manage self-signed JWT certificates, improving release reliability and developer onboarding.
  • Developed a wake-word voice assistant using Picovoice for on-device trigger detection and Google Speech-to-Text for intent classification, enabling hands-free actions such as importing treatment plans and dictating patient notes.
  • Integrated voice intent routing into automated clinical workflows and optimized the system for low-latency, robust performance across diverse acoustic environments.
LL
Assistant Researcher
  • Designed and implemented a smart airflow regulation system for remote community homes using Arduino sensors, ESP32, and Raspberry Pi to monitor wall-siding temperatures and dynamically control airflow direction and fan speeds.
  • Engineered embedded hardware integrations with environmental sensors and actuators to enable efficient, automated temperature regulation.
  • Built a remote data logging and monitoring system using Raspberry Pi and Firebase, reducing the need for manual on-site data collection by ~95% and enabling real-time environmental insights.
  • Improved comfort and energy efficiency in off-grid housing through automated control logic and a connected monitoring infrastructure.
LL
Assistant Researcher
  • Conducted research on high-frequency utility data (5-minute resolution) to build power consumption forecasting models using feed-forward networks (FFNs) and LSTMs, independently learning Python, TensorFlow, time series analysis, and hyperparameter tuning along the way.
  • Engineered models to capture temporal patterns and seasonal effects using cyclical time features and weather inputs, improving performance over baseline methods.
  • Benchmarked model performance against industry research, achieving RMSE ≈ 0.1–0.17 on short-term electricity consumption forecasting tasks, consistent with published LSTM results.
  • Tuned network depth, training window size, and feature encoding to reduce prediction error and improve generalization across summer and winter test periods.
  • Demonstrated practical, end-to-end implementation of real-world time series forecasting using LSTM architectures aligned with deep learning benchmarks for load prediction.

Skills

Languages

Python
TypeScript
JavaScript
Scala

Frontend Engineering

Next.js
React
Tailwind CSS
Shadcn UI

Backend Engineering

FastAPI
Play Framework (Scala)

Databases & Vector Stores

PostgreSQL
Supabase
Pinecone
OpenSearch
Vertex AI Vector Search

Core Machine Learning

PyTorch
TensorFlow
scikit-learn
pandas

Agentic & LLM Systems

OpenAI
Deepgram
Google Gemini
Anthropic Claude
LangChain
LangGraph
RAG (Retrieval-Augmented Generation)
MCP (Model Context Protocol)
Multi-agent System Design
Google Agent Development Kit (ADK)
Pipecat (Daily)
Evals

Cloud & Infrastructure

GitHub Actions
AWS Lambda
AWS Step Functions
AWS SAM
AWS ECS
AWS S3
AWS SageMaker
GCP Cloud Run
GCP Vertex AI (Models, Endpoints, Pipelines)
GCP Vertex AI Vector Search
GCP Cloud Storage
GCP Cloud Functions

My Projects

Check out my latest work

Research projects and applied ML work spanning federated learning, computer vision, and signal processing.

Federated Learning Framework via Distributed Mutual Learning

Developed a privacy-preserving federated learning framework that replaces weight-sharing with loss-based mutual learning, reducing bandwidth usage and model inversion attack risks. By leveraging knowledge distillation and deep mutual learning, clients share insights without exposing sensitive data, improving model generalization. The framework was evaluated on a face mask detection case study, demonstrating superior performance compared to traditional synchronous and asynchronous federated learning methods.

Federated Learning
Deep Learning
Knowledge Distillation
Mutual Learning
Privacy-Preserving Machine Learning
Computer Vision
Convolutional Neural Networks (CNN)
KL Divergence Optimization
Python
TensorFlow

Image Compression Using Fast Fourier Transform and JPEG Compression

Developed an image compression tool in MATLAB using DFT, FFT, and DCT, implementing algorithms from scratch. The project optimized Fourier-based compression, benchmarked it against JPEG, and integrated a GUI for real-time visualization. Key concepts include Fourier Transform for frequency-domain compression, matrix transformations and quantization for data reduction, benchmarking compression efficiency across techniques, and a graphical user interface for user-controlled compression.

MATLAB
Signal Processing
Matrix Algebra & Linear Algebra
Fast Fourier Transform (FFT)
Discrete Cosine Transform (DCT)
Quantization & Data Reduction
Benchmarking & Performance Analysis
Graphical User Interface (GUI)

Research

Research Publications

Peer-reviewed publications in federated learning, IoT, and healthcare AI.

  • Toward Asynchronously Weight Updating Federated Learning for AI-on-Edge IoT Systems

    Yash Gupta, Zubair Md Fadlullah, Mostafa M. Fouda
    Conference Paper

    2022 IEEE International Conference on Internet of Things and Intelligence Systems (IoTaIS)

    Designed an asynchronously weight updating federated learning algorithm for AI-on-Edge IoT systems, enhancing data privacy by eliminating the need for centralized data sharing. Applied the approach to face mask detection, traditionally a centralized computer vision task, by distributing learning tasks across users. Investigated performance trade-offs between synchronous and asynchronous weight updates, introducing a penalization mechanism to optimize model aggregation. Experimental results demonstrated comparable accuracy to centralized training while significantly reducing transmission time overhead.

  • Intelligent Real-Time Face-Mask Detection System with Hardware Acceleration for COVID-19 Mitigation

    Peter Sertic, Ayman Alahmar, Thangarajah Akilan, Yash Gupta, Marko Javorac
    Journal Article

    Healthcare, 2022

    Developed and implemented a hardware-accelerated real-time face-mask detection system using deep learning (DL), optimized for embedded platforms including Raspberry Pi 4B (Google Coral TPU, Intel NCS2 VPU) and NVIDIA Jetson Nano. Designed a custom face-mask detection model (MaskDetect), independently quantized and optimized for each hardware platform. Conducted an ablation study comparing MaskDetect to transfer-learning models (VGG16, ResNet-50V2, InceptionV3), achieving 94%+ accuracy on most platforms. Results demonstrated that Jetson Nano offers the best trade-off in accuracy (94.2%), inference speed, and cost, making it ideal for real-time deployment.

  • HELIUS: A Blockchain Based Renewable Energy Trading System

    Yash Gupta, Marko Javorac, Shaun Cyr, Abdulsalam Yassine
    Conference Paper

    2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)

    Developed a peer-to-peer (P2P) sustainable energy exchange system using Blockchain and Deep Learning to optimize energy trading during peak demand. Designed a novel framework for power system operations, enabling users to trade energy efficiently while simulating sustainable energy production based on location, time, and weather. Integrated a blind bidding mechanism and a web application to demonstrate real-world feasibility.

Contact

Get in Touch

Send me an email or connect with me on LinkedIn.