
Ensuring the Safety & Effectiveness of AI Systems in Healthcare

Guardrails for Safe & Effective Use of LLM-Based Systems in Healthcare

Since OpenAI brought large language models (LLMs) into mainstream use with the release of ChatGPT in 2022, more than a dozen LLMs and large multimodal models (LMMs) have been developed and deployed by various organizations for public use, either through APIs or as locally run models.

According to the Coalition for Health AI (CHAI), generative AI, including LLM-based and LMM-based AI systems, can be applied across a wide variety of healthcare tasks, such as:

  • Clinical Decision Support: EHR information retrieval, medical report generation, medical imaging enhancement and analysis
  • Patient-Facing Functions
  • Administrative Tasks: Coding and billing automation, appointment management
  • Public Health: Social determinants of health
  • Research & Development: Drug design, genomics, clinical trial simulation

 As these capabilities expand, it’s imperative to consider how to ensure the safety and effectiveness of AI systems used within the healthcare ecosystem, regardless of whether they fall under the definition of a medical device.

Rook Quality Systems is here to support your organization in preparing for the rollout of any LLM- or LMM-based AI system by leveraging our extensive experience in the medical device space.

Why Guardrails Are Essential for AI in Healthcare

From a medical device perspective, establishing robust guardrails is critical to ensure that AI meets specified standards for the following: 

  • Accuracy (e.g., precise predictions)
  • Clinical Utility (e.g., meaningful impact on patient care)
  • Reliability and Repeatability (e.g., consistent performance across settings)
  • Safety (e.g., minimizing risks and unintended harms)
  • Ethical Considerations (e.g., addressing bias, fairness, and transparency)

To support these goals, medical device manufacturers can consider adopting the Good Machine Learning Practice (GMLP) guiding principles co-developed by global regulatory authorities; ISO 14971, which outlines best practices for AI risk evaluation; and ISO 42001, which provides governance considerations for AI systems, including those not classified as medical devices.

 

Risks of Unregulated LLM Use in Healthcare

Whether your AI solution is a regulated medical device or not, using LLMs without appropriate guardrails introduces key risk factors such as:

1. Generalization vs. Specification

LLMs are typically not task-specific unless explicitly configured and calibrated. Without proper setup, users may unknowingly engage in off-label use, leading to inconsistent outputs or hallucinations.
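
To make this concrete, one possible risk control is to gate incoming requests against the system’s declared intended use before they ever reach the model. The sketch below is a minimal illustration in Python; the keyword lists, category names, and the call_llm placeholder are assumptions made for illustration, not part of any specific framework.

```python
# Minimal sketch of an intended-use guardrail for an LLM-based assistant.
# Keyword lists, category names, and call_llm are illustrative placeholders only.

BLOCKED_PATTERNS = {
    "clinical_advice": ["diagnose", "dosage", "treatment plan", "prescribe"],
    "patient_triage": ["symptom", "emergency", "chest pain"],
}

def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call (e.g., a vendor API or a local model).
    return "LLM response"

def is_within_intended_use(user_request: str) -> bool:
    """Return False when the request looks like off-label (e.g., clinical) use."""
    text = user_request.lower()
    return not any(
        keyword in text
        for keywords in BLOCKED_PATTERNS.values()
        for keyword in keywords
    )

def handle_request(user_request: str) -> str:
    if not is_within_intended_use(user_request):
        # Refuse and redirect rather than letting the model improvise off-label.
        return ("This assistant supports administrative tasks only. "
                "Please direct medical questions to a clinician.")
    return call_llm(user_request)

print(handle_request("Reschedule my appointment to Friday"))      # allowed
print(handle_request("What dosage of metformin should I take?"))  # refused
```

Keyword matching is a deliberately crude filter; a production system would likely pair it with an intent classifier and human oversight, but the principle of constraining the model to its declared intended use is the same.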

2. Performance Metrics

Standardized performance metrics can be difficult to define due to the varied tasks LLMs can perform.

3. Automation Bias

Users may place undue trust in model outputs and accept incorrect responses without verification.

4. Prediction Drift

There may be a lack of tools to detect shifts in model behavior over time.
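
One way to approximate such tooling is to log a simple numeric signal for each response (for example, a confidence score or an output-quality grade) and periodically compare the recent distribution against a baseline captured during validation. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the choice of signal, the window contents, and the significance threshold are illustrative assumptions rather than a validated monitoring method.

```python
# Minimal sketch of prediction-drift monitoring, assuming a logged per-response
# metric (e.g., a confidence score). Thresholds and data are illustrative only.
from scipy.stats import ks_2samp

def check_drift(baseline_scores, recent_scores, alpha=0.01):
    """Flag drift when recent outputs no longer match the baseline distribution."""
    statistic, p_value = ks_2samp(baseline_scores, recent_scores)
    return {
        "ks_statistic": statistic,
        "p_value": p_value,
        "drift_detected": p_value < alpha,
    }

# Example usage with made-up score logs
baseline = [0.91, 0.88, 0.93, 0.90, 0.87, 0.92, 0.89, 0.94]
recent = [0.71, 0.65, 0.80, 0.68, 0.74, 0.62, 0.77, 0.70]
print(check_drift(baseline, recent))
```

A single statistical test is not a monitoring program on its own, but routinely comparing production behavior against a validated baseline gives teams an early, auditable signal that the model may need re-evaluation.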

5. Substantial Performance Changes

Performance variability in production environments can lead to unexpected issues.

 

Quality Assurance in AI 

Whether you’re developing a regulated AI medical device, using AI to support medical device development, or launching a non-device healthcare AI solution, it’s critical to approach quality assurance throughout the AI lifecycle.

Key Considerations:

  • Clarify intended use and mitigate off-label risks

  • Assess variability in LLM behavior and implement risk controls (e.g., restrict model responses to intended use cases)

  • Evaluate accuracy and fairness across clinical stages and user types (e.g., grade quality attributes, use expert evaluators with proper training)

  • Conduct bias assessments across user subgroups at both pre-deployment and post-deployment stages (a minimal sketch of this check follows the list)

  • Validate auxiliary features that support credibility assessment of AI outputs, such as uncertainty indicators, explanations, references, and source citations.
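
As a concrete illustration of the subgroup bias check above, a pre-deployment evaluation could compare a task-level accuracy metric across patient subgroups and flag large gaps for expert review. The record format, subgroup labels, and the acceptable-gap threshold below are illustrative assumptions, not a prescribed method.

```python
# Hypothetical sketch: compare evaluation accuracy across patient/user subgroups.
# Subgroup names, records, and the gap threshold are illustrative placeholders.
from collections import defaultdict

def subgroup_accuracy(records):
    """records: iterable of dicts like {"subgroup": str, "correct": bool}."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["subgroup"]] += 1
        correct[r["subgroup"]] += int(r["correct"])
    return {group: correct[group] / totals[group] for group in totals}

def flag_gaps(accuracies, max_gap=0.05):
    """Flag when the spread between best- and worst-served subgroups exceeds max_gap."""
    gap = max(accuracies.values()) - min(accuracies.values())
    return {"accuracies": accuracies, "gap": gap, "needs_review": gap > max_gap}

# Example with made-up expert-graded evaluation results
eval_records = [
    {"subgroup": "adult", "correct": True},
    {"subgroup": "adult", "correct": True},
    {"subgroup": "pediatric", "correct": True},
    {"subgroup": "pediatric", "correct": False},
]
print(flag_gaps(subgroup_accuracy(eval_records)))
```

The same comparison can be re-run on post-deployment logs so that gaps that only appear in real-world use are caught and escalated for review.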

 

RookQS: Your Partner in Quality Assurance Success

Whether your AI system falls under medical device regulation or not, the principles for developing, verifying, and validating your product should remain consistent. Rook Quality Systems is actively working with organizations across the healthcare ecosystem to develop consensus-driven definitions, guides, frameworks, and tooling for LLM-based and LMM-based AI systems.

Our mission is to ensure that these advanced technologies are deployed responsibly and effectively. Please don’t hesitate to reach out if you need support preparing your AI systems for regulatory compliance and real-world implementation.

 

 
