MedifyAI

Name	Email
Deepak Udayakumar	udayakumar.de@northeastern.edu
Amitesh Tripathi	tripathi.am@northeastern.edu
Dinesh Sai Pappuru	pappuru.d@northeastern.edu
Rohit Kumar Gaddam	gaddamsreeramulu.r@northeastern.edu
Sneha Amin	amin.sn@northeastern.edu

Demo Video

Introduction

1. Problem Statement and Overview

MedifyAI is a healthcare analytics system that enhances medical symptom analysis and patient care through AI-powered tools. The system integrates Medical Chatbots with Retrieval-Augmented Generation (RAG) framework, and automated data pipelines to provide accurate medical insights and treatment recommendations.

It uses the PMC-Patients dataset (link), which contains 167,034 anonymized patient summaries from PubMed Central (PMC).

2. Methodology

AI Model Architecture

The system is structured into three primary phases:

2.1. Medical Chatbot (HealthcarechatLLM)

Model Used: GPT-3.5
Purpose: Dynamic symptom collection and clinical summaries.
Capabilities:
- Structured symptom gathering.
- Real-time emergency detection.
- Clinical summary generation.
- Bias detection for fair patient interactions.

2.2. Medical Analysis (RAG System)

Embedding Model: sentence-transformers/all-MiniLM-L6-v2
Generation Model: GPT-4
Purpose: Retrieval-based medical analysis.
Capabilities:
- Retrieval-Augmented Generation (RAG) for case-based diagnosis.
- Historical medical case-based recommendations.
- Comprehensive tracking via MLflow.

2.3. Patient Report Interaction (OpenBioLLM)

Model Used: Llama3-OpenBioLLM-70B (More Info)
Purpose: Patients can interact with doctor reports.
Capabilities:
- Provides clarifications and explanations about medical findings.
- Ensures accurate, context-aware responses.

3. Goals

Enhance Patient Interaction – AI-powered symptom collection chatbot.
Improve Diagnosis – Retrieval-based medical case insights.
Enable Patient Empowerment – AI-assisted medical report explanations.
Ensure Bias-Free AI – Robust bias detection and fairness checks.
Seamless MLOps Deployment – Cloud-based automation & monitoring.

The source code for our project can be found here: GitHub.

Tools Used for MLOps

Category	Tools Used
Cloud Provider	AWS (EKS, S3, Lambda, SageMaker)
Model Training & Tracking	MLflow
Data Pipeline	Apache Airflow
Containerization & Orchestration	Docker, Kubernetes (EKS)
CI/CD	GitHub Actions
Monitoring & Logging	Prometheus, CloudWatch, Grafana
Vector Database	Pinecone

Project Architecture

Project Achitecture

Table of Contents