Name                 Email
Deepak Udayakumar    udayakumar.de@northeastern.edu
Amitesh Tripathi     tripathi.am@northeastern.edu
Dinesh Sai Pappuru   pappuru.d@northeastern.edu
Rohit Kumar Gaddam   gaddamsreeramulu.r@northeastern.edu
Sneha Amin           amin.sn@northeastern.edu

Table of Contents

  1. Introduction
  2. Scoping
  3. Data Pipeline
  4. Modeling
  5. Deployment
  6. AWS Deployment Setup
  7. CI/CD
  8. Monitoring

Demo Video

Introduction

1. Problem Statement and Overview

MedifyAI is a healthcare analytics system that supports medical symptom analysis and patient care with AI-powered tools. It combines medical chatbots, a Retrieval-Augmented Generation (RAG) framework, and automated data pipelines to provide accurate medical insights and treatment recommendations.

The system uses the PMC-Patients dataset, which contains 167,034 anonymized patient summaries from PubMed Central (PMC).

2. Methodology

AI Model Architecture

The system is structured into three primary phases:

2.1. Medical Chatbot (HealthcarechatLLM)

  • Model Used: GPT-3.5
  • Purpose: Dynamic symptom collection and clinical summaries.
  • Capabilities:
    • Structured symptom gathering.
    • Real-time emergency detection.
    • Clinical summary generation.
    • Bias detection for fair patient interactions.
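This README does not say how emergency detection is implemented. One simple way to back up an LLM-based check is a rule-based screen that runs on every patient message before the model is called. The sketch below is illustrative only: the function name `detect_emergency`, the category labels, and the phrase list are hypothetical, not taken from the project, and not clinically validated.

```python
import re
from typing import List

# Hypothetical red-flag phrases for illustration only.
# The project's real emergency criteria are not listed in this README.
EMERGENCY_PATTERNS = {
    "chest pain": re.compile(r"\bchest pain\b"),
    "breathing difficulty": re.compile(
        r"\b(can't|cannot) breathe\b|\bdifficulty breathing\b|\bshortness of breath\b"
    ),
    "loss of consciousness": re.compile(r"\b(unconscious|passed out|fainted)\b"),
    "severe bleeding": re.compile(r"\bsevere bleeding\b"),
}


def detect_emergency(message: str) -> List[str]:
    """Return the labels of all red-flag categories found in a patient message."""
    text = message.lower()
    return [label for label, pattern in EMERGENCY_PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    flags = detect_emergency("I have chest pain and can't breathe")
    if flags:
        print("Escalate immediately:", ", ".join(flags))
```

A non-empty result could be used to interrupt the normal symptom-collection flow and show emergency guidance right away, with the chatbot model handling anything the rules miss.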

2.2. Medical Analysis (RAG System)

  • Embedding Model: sentence-transformers/all-MiniLM-L6-v2
  • Generation Model: GPT-4
  • Purpose: Retrieval-based medical analysis.
  • Capabilities:
    • Retrieval-Augmented Generation (RAG) for case-based diagnosis.
    • Historical medical case-based recommendations.
    • Comprehensive tracking via MLflow.
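The retrieval step of a RAG system can be sketched in a few lines: embed the patient summary, rank stored cases by cosine similarity, and place the top matches in the prompt sent to the generation model. The sketch below is a self-contained illustration, not the project's code. It swaps in a toy bag-of-words embedding for sentence-transformers/all-MiniLM-L6-v2 and an in-memory dictionary for Pinecone. The function names (`embed`, `retrieve`, `build_prompt`) and the sample cases are made up for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, cases: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored cases most similar to the query, best first."""
    q = embed(query)
    scored = [(case_id, cosine(q, embed(text))) for case_id, text in cases.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


def build_prompt(query: str, cases: Dict[str, str], k: int = 2) -> str:
    """Assemble a prompt that grounds the generation model in the retrieved cases."""
    hits = retrieve(query, cases, k)
    context = "\n".join(f"[{cid}] {cases[cid]}" for cid, _ in hits)
    return f"Similar historical cases:\n{context}\n\nPatient summary:\n{query}\n\nAnalysis:"


# Made-up case summaries for illustration; not real PMC-Patients records.
CASES = {
    "case-1": "Adult with fever, productive cough and chest pain; diagnosed with pneumonia.",
    "case-2": "Child with itchy skin rash after eating peanuts; allergic reaction.",
    "case-3": "Adult with persistent dry cough and mild fever for one week.",
}

if __name__ == "__main__":
    print(build_prompt("fever and cough with chest pain", CASES))
```

In a real deployment the vector store would hold precomputed embeddings, so only the query is embedded at request time.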

2.3. Patient Report Interaction (OpenBioLLM)

  • Model Used: Llama3-OpenBioLLM-70B
  • Purpose: Lets patients ask questions about their doctors' reports.
  • Capabilities:
    • Provides clarifications and explanations about medical findings.
    • Ensures accurate, context-aware responses.

3. Goals

  1. Enhance Patient Interaction – AI-powered symptom collection chatbot.
  2. Improve Diagnosis – Retrieval-based medical case insights.
  3. Enable Patient Empowerment – AI-assisted medical report explanations.
  4. Ensure Bias-Free AI – Robust bias detection and fairness checks.
  5. Seamless MLOps Deployment – Cloud-based automation & monitoring.

The source code for our project can be found here: GitHub.

Tools Used for MLOps

Category                           Tools Used
Cloud Provider                     AWS (EKS, S3, Lambda, SageMaker)
Model Training & Tracking          MLflow
Data Pipeline                      Apache Airflow
Containerization & Orchestration   Docker, Kubernetes (EKS)
CI/CD                              GitHub Actions
Monitoring & Logging               Prometheus, CloudWatch, Grafana
Vector Database                    Pinecone

Project Architecture

[Figure: Project architecture diagram]