Since OpenAI brought large language models (LLMs) into mainstream use in 2022, more than a dozen LLMs and large multimodal models (LMMs) have been developed and deployed by various organizations for public use, either through APIs or as locally run models.
According to the Coalition for Health AI (CHAI), generative AI (that is, LLM-based or LMM-based AI systems) can be applied across a wide variety of healthcare tasks.
As these capabilities expand, it’s imperative to consider how to ensure the safety and effectiveness of AI systems used within the healthcare ecosystem, regardless of whether they fall under the definition of a medical device.
Rook Quality Systems is here to support your organization in preparing for the rollout of any LLM- or LMM-based AI system by leveraging our extensive experience in the medical device space.
From a medical device perspective, establishing robust guardrails is critical to ensure that AI meets specified standards for the following:
Accuracy (e.g., precise predictions)
Clinical Utility (e.g., meaningful impact on patient care)
Reliability and Repeatability (e.g., consistent performance across settings; a minimal verification sketch follows this list)
Safety (e.g., minimizing risks and unintended harms)
Ethical Considerations (e.g., addressing bias, fairness, and transparency)
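As a rough illustration of how the accuracy and repeatability guardrails above might be verified in practice, the sketch below runs a labeled test set and a repeated-call probe against a generic prediction function. The `model_predict` callable and the thresholds are hypothetical placeholders, not prescribed acceptance criteria.

```python
# Minimal sketch of automated release checks for two of the guardrails above:
# accuracy against a labeled test set and repeatability across repeated runs.
# `model_predict` stands in for your deployed LLM/LMM endpoint; the threshold
# and run count are illustrative assumptions, not regulatory values.
from typing import Callable, List, Tuple

def accuracy_check(model_predict: Callable[[str], str],
                   test_set: List[Tuple[str, str]],
                   threshold: float = 0.95) -> bool:
    """Pass only if the fraction of exact-match answers meets the threshold."""
    correct = sum(1 for prompt, expected in test_set
                  if model_predict(prompt).strip() == expected.strip())
    return (correct / len(test_set)) >= threshold

def repeatability_check(model_predict: Callable[[str], str],
                        prompts: List[str],
                        runs: int = 5) -> bool:
    """Pass only if every prompt returns an identical answer on repeated calls."""
    for prompt in prompts:
        answers = {model_predict(prompt).strip() for _ in range(runs)}
        if len(answers) > 1:  # any divergence counts as a failure
            return False
    return True
```

In a real quality system these checks would be tied to documented acceptance criteria and run as part of verification before each model release.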
To support these goals, medical device manufacturers can consider adopting the Good Machine Learning Practice (GMLP) guiding principles, co-developed by global regulatory authorities; ISO 14971, which outlines best practices for risk evaluation, including AI-related risks; and ISO/IEC 42001, which provides governance considerations for AI systems, including those not classified as medical devices.
Whether you’re developing a regulated AI medical device, using AI to support medical device development, or launching a non-device healthcare AI solution, it’s critical to approach quality assurance throughout the AI lifecycle, considering key factors such as:
LLMs are typically not task-specific unless explicitly configured and calibrated. Without proper setup, users may unknowingly engage in off-label use, leading to inconsistent outputs or hallucinations.
Standardized performance metrics can be difficult to define due to the varied tasks LLMs can perform.
Users may place undue trust in incorrect outputs.
There may be a lack of tools to detect shifts in model behavior over time (a simple drift check is sketched after this list).
Performance variability in production environments can lead to unexpected issues.
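For the monitoring concern noted above, one lightweight approach is to re-run a frozen prompt set that was captured at validation time and flag when too many answers have changed. The sketch below assumes a `model_predict` callable, a JSON baseline file mapping prompts to recorded answers, and a 10% tolerance; all three are illustrative assumptions rather than recommended settings.

```python
# Minimal sketch of an output-drift check for a deployed LLM/LMM:
# compare live answers on a fixed prompt set against a baseline recorded
# during validation, and flag when the change rate exceeds a tolerance.
import json
from typing import Callable, Dict

def detect_output_drift(model_predict: Callable[[str], str],
                        baseline_path: str,
                        tolerance: float = 0.10) -> bool:
    """Return True when the share of changed answers exceeds the tolerance."""
    with open(baseline_path) as f:
        baseline: Dict[str, str] = json.load(f)  # {prompt: answer recorded at validation}
    changed = sum(1 for prompt, expected in baseline.items()
                  if model_predict(prompt).strip() != expected.strip())
    return (changed / len(baseline)) > tolerance
```

Run on a schedule in production, a check like this gives an early, auditable signal that model behavior has shifted and that revalidation may be needed.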
Whether your AI system falls under medical device regulation or not, the principles for developing, verifying, and validating your product should remain consistent. Rook Quality Systems is actively working with organizations across the healthcare ecosystem to develop consensus-driven definitions, guides, frameworks, and tooling content for LLM-based and LMM-based AI systems.
Our mission is to ensure that these advanced technologies are deployed responsibly and effectively. Please don’t hesitate to reach out if you need support preparing your AI systems for regulatory compliance and real-world implementation.