2. Scoping

1. Overview

MedifyAI is an AI-driven healthcare analytics system for medical symptom analysis and case-based diagnosis. It integrates conversational AI, retrieval-augmented generation (RAG), and cloud-based MLOps to assist healthcare professionals.

The system is built on the PMC-Patients Dataset, which includes 167,034 anonymized patient summaries extracted from PubMed Central.

2. Problem Statement

Healthcare systems often lack structured collection of patient symptoms and real-time medical assistance. Doctors frequently work from incomplete patient histories, which can delay diagnosis. MedifyAI addresses this by collecting symptoms in a structured way, retrieving comparable cases for reference, and letting patients ask questions about their own medical reports.

3. Objectives

Improve patient engagement through AI-driven conversation
Provide treatment recommendations using historical medical data
Reduce bias in patient interactions
Monitor retrieval performance and maintain data updates
Deploy a scalable AI solution in a cloud environment

4. Key Features

4.1 Medical Chatbot

Uses GPT-3.5 for patient symptom collection
Detects emergency situations in real time
Generates clinical summaries for doctors
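The emergency-detection step can be pictured as a screen that runs on every patient message while the chatbot builds its structured summary. The sketch below is a minimal illustration under stated assumptions. The `SymptomSummary` fields, the `RED_FLAGS` phrase list, and the helper names are hypothetical. Keyword matching stands in for whatever detection method MedifyAI actually uses, which the scope does not specify. Symptom extraction by GPT-3.5 is assumed to happen upstream and is passed in as a list.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical red-flag phrases for illustration only; a real system would need
# clinically validated triage criteria rather than this short keyword list.
RED_FLAGS = {
    "chest pain",
    "shortness of breath",
    "slurred speech",
    "loss of consciousness",
    "severe bleeding",
}


@dataclass
class SymptomSummary:
    """Hypothetical structured record the chatbot could hand to a clinician."""
    chief_complaint: str
    symptoms: list[str] = field(default_factory=list)
    duration_days: int | None = None
    emergency: bool = False


def screen_for_emergency(message: str) -> bool:
    """Return True if the patient's message contains any red-flag phrase."""
    text = message.lower()
    return any(flag in text for flag in RED_FLAGS)


def update_summary(summary: SymptomSummary, message: str, new_symptoms: list[str]) -> SymptomSummary:
    """Merge symptoms extracted from one chat turn and re-check for emergencies."""
    for symptom in new_symptoms:
        if symptom not in summary.symptoms:
            summary.symptoms.append(symptom)
    # Once a session is flagged as an emergency, it stays flagged.
    summary.emergency = summary.emergency or screen_for_emergency(message)
    return summary
```

Keeping the emergency flag sticky means a later, calmer message cannot clear an earlier warning before a clinician has seen it.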

4.2 Medical Case Analysis

Embeds patient summaries and symptom queries with sentence-transformers/all-MiniLM-L6-v2 for similarity search
Employs GPT-4 to analyze retrieved cases
Tracks system performance with MLflow
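Case matching ranks stored case embeddings by how similar they are to the embedding of a new symptom summary. In MedifyAI that ranking is done by a Pinecone query. The sketch below is a brute-force, in-memory stand-in that only illustrates the idea. The function names are hypothetical, and cosine similarity is an assumption about the index metric. The toy vectors are two-dimensional, whereas all-MiniLM-L6-v2 produces 384-dimensional embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cases(query: list[float], cases: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k case IDs whose embeddings are most similar to the query, best first."""
    scored = [(case_id, cosine(query, vec)) for case_id, vec in cases.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


cases = {"case-a": [1.0, 0.0], "case-b": [0.0, 1.0], "case-c": [0.7, 0.7]}
print(top_k_cases([1.0, 0.1], cases, k=2))
```

Scoring every stored vector grows linearly with the number of cases. A vector database such as Pinecone typically relies on approximate nearest-neighbor indexing to keep queries fast as the collection grows.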

4.3 Patient Report Interaction

Llama3-OpenBioLLM-70B answers patient questions about their medical reports
Explains complex medical findings in simple terms
Provides context-aware responses for better patient understanding
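One common way to keep answers context-aware is to ground each answer in the report text by placing the report inside the prompt. The sketch below is an assumption-laden illustration. The function name, the prompt wording, and the character limit are hypothetical, and the scope does not describe how Llama3-OpenBioLLM-70B is actually prompted. The returned string would be sent to whatever inference endpoint hosts the model.

```python
def build_report_qa_prompt(report_text: str, question: str, max_report_chars: int = 4000) -> str:
    """Assemble a prompt that asks the model to answer only from the patient's report."""
    # Truncate very long reports so the prompt fits the model's context window;
    # a production system might split the report and retrieve relevant sections instead.
    report = report_text[:max_report_chars]
    return (
        "You are helping a patient understand their medical report.\n"
        "Answer in plain language, using only the report below. "
        "If the report does not contain the answer, say so and suggest asking their doctor.\n\n"
        f"Report:\n{report}\n\n"
        f"Patient question: {question}\n"
        "Answer:"
    )


print(build_report_qa_prompt("Hemoglobin: 10.2 g/dL (low).", "What does low hemoglobin mean?"))
```

Instructing the model to admit when the report lacks an answer is a simple guard against confident but unsupported explanations.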

5. System Workflow

The chatbot collects symptoms and creates a structured summary
The system retrieves similar cases from the PMC-Patients dataset
GPT-4 generates a medical summary based on retrieved cases
Llama3-OpenBioLLM-70B allows patients to interact with their reports
Apache Airflow orchestrates data processing, and Pinecone stores the case embeddings
The system is deployed on AWS EKS, with monitoring via CloudWatch
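The first three workflow steps form a chain: the chatbot's summary feeds retrieval, and the retrieved cases feed GPT-4's analysis. The sketch below wires that chain together with each stage injected as a plain function, so it runs without any OpenAI or Pinecone client. The class name, method, and stage signatures are hypothetical, chosen only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CaseAnalysisPipeline:
    """Hypothetical wiring of the workflow stages; each stage is supplied as a callable."""
    summarize_chat: Callable[[list[str]], str]        # chat turns -> structured symptom summary
    retrieve_cases: Callable[[str, int], list[str]]   # summary, k -> similar PMC-Patients cases
    analyze_cases: Callable[[str, list[str]], str]    # summary + cases -> clinical summary

    def run(self, chat_turns: list[str], k: int = 5) -> str:
        summary = self.summarize_chat(chat_turns)
        cases = self.retrieve_cases(summary, k)
        return self.analyze_cases(summary, cases)


# Usage with stub stages standing in for GPT-3.5, Pinecone, and GPT-4.
pipe = CaseAnalysisPipeline(
    summarize_chat=lambda turns: " | ".join(turns),
    retrieve_cases=lambda summary, k: [f"case-{i}" for i in range(k)],
    analyze_cases=lambda summary, cases: f"{summary} -> {len(cases)} cases",
)
print(pipe.run(["fever", "cough"], k=3))
```

Injecting the stages keeps the orchestration logic testable with stubs, while the real model and vector-database calls can be swapped in behind the same signatures.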

6. Technical Components

Component	Technology Used
Cloud Provider	AWS (EKS, Lambda, SageMaker)
Data Processing	Apache Airflow
Embedding Model	sentence-transformers/all-MiniLM-L6-v2
Language Models	GPT-3.5, GPT-4, Llama3-OpenBioLLM-70B
Vector Database	Pinecone
Deployment	Docker, Kubernetes
CI/CD	GitHub Actions
Monitoring	Prometheus, CloudWatch, Grafana
Experiment Tracking	MLflow
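The embedding model and vector database together determine how much vector data the system must store, which feeds into cloud cost planning. A rough estimate follows, assuming one float32 embedding per PMC-Patients summary and ignoring metadata and index overhead. Both are assumptions; splitting long summaries into chunks would raise the vector count. The constant and function names are illustrative.

```python
NUM_SUMMARIES = 167_034   # patient summaries in the PMC-Patients dataset
EMBEDDING_DIM = 384       # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4     # assumes vectors are stored as 32-bit floats


def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of the embedding matrix, excluding metadata and index overhead."""
    return num_vectors * dim * bytes_per_value


total = raw_vector_bytes(NUM_SUMMARIES, EMBEDDING_DIM)
print(f"{total:,} bytes = {total / 1e6:.1f} MB")  # 256,564,224 bytes = 256.6 MB
```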

7. MLOps Pipeline

The MLOps pipeline automates the path from data ingestion to deployment and monitoring. It consists of:

Data ingestion from the PMC-Patients dataset
Embedding generation and storage in Pinecone
Retrieval and response generation using GPT-4
Model deployment through Amazon SageMaker
CI/CD automation via GitHub Actions
System monitoring through Prometheus and CloudWatch
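Apache Airflow expresses a pipeline like this as a directed acyclic graph of tasks (in Airflow code, edges are declared with the `>>` operator between tasks). The sketch below uses Python's standard `graphlib` module as a dependency-free stand-in to show how the stages could be ordered. The task names and edges are assumptions. In particular, it is assumed that deployment waits on both the Pinecone index refresh and a passing CI run. Request-time retrieval and GPT-4 generation are left out because they run per query rather than as batch tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks that must finish first.
PIPELINE_DEPENDENCIES = {
    "ingest_pmc_patients": set(),
    "generate_embeddings": {"ingest_pmc_patients"},
    "upsert_to_pinecone": {"generate_embeddings"},
    "build_and_test": set(),  # CI in GitHub Actions does not depend on the data tasks
    "deploy_to_sagemaker": {"upsert_to_pinecone", "build_and_test"},
    "enable_monitoring": {"deploy_to_sagemaker"},
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; graphlib raises CycleError if dependencies loop."""
    return list(TopologicalSorter(deps).static_order())


print(execution_order(PIPELINE_DEPENDENCIES))
```

Independent branches, here the data tasks and the CI run, can execute in parallel. A scheduler such as Airflow exploits that, while a single topological order simply serializes them.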

8. Challenges and Considerations

Addressing bias in chatbot responses across demographics
Improving retrieval accuracy to ensure relevant case recommendations
Managing cloud resource costs and optimizing infrastructure
Ensuring patient data security and regulatory compliance
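Retrieval accuracy and demographic bias can be monitored together by breaking a retrieval metric down by patient group. The sketch below computes recall@k per group. It assumes that labeled relevant cases and a demographic group label are available for each evaluation query. The dictionary keys, group names, and function names are hypothetical, and the scope does not say how MedifyAI measures either quantity.

```python
from collections import defaultdict


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant cases that appear among the top-k retrieved cases."""
    if not relevant:
        return 0.0  # convention chosen here: a query with no labeled cases scores 0
    hits = sum(1 for case_id in retrieved[:k] if case_id in relevant)
    return hits / len(relevant)


def recall_by_group(queries: list[dict], k: int = 5) -> dict[str, float]:
    """Average recall@k per demographic group to surface retrieval disparities.

    Each query is a dict with the hypothetical keys "group", "retrieved", "relevant".
    """
    scores: dict[str, list[float]] = defaultdict(list)
    for q in queries:
        scores[q["group"]].append(recall_at_k(q["retrieved"], q["relevant"], k))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


queries = [
    {"group": "female", "retrieved": ["c1", "c2", "c3"], "relevant": {"c1", "c9"}},
    {"group": "female", "retrieved": ["c4", "c5"], "relevant": {"c4"}},
    {"group": "male", "retrieved": ["c7", "c8"], "relevant": {"c6"}},
]
print(recall_by_group(queries, k=2))  # {'female': 0.75, 'male': 0.0}
```

A large gap between groups points to a retrieval disparity worth investigating. The per-group values could be recorded with MLflow's `mlflow.log_metric` alongside other experiment results.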

9. Expected Impact

MedifyAI aims to enhance healthcare decision-making by providing structured patient data, assisting doctors with case-based recommendations, and allowing patients to engage with their medical reports. The system is designed to be scalable, efficient, and accessible.
