AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation

Qingqiu Li1, Zihang Cui2, Seongsu Bae3, Jilan Xu1, Runtian Yuan1,
Yuejie Zhang1, Rui Feng3, Quanli Shen4, Xiaobo Zhang4, Junjun He5 and Shujun Wang6

1Fudan University    2Xidian University    3KAIST   4Children's Hospital of Fudan University   
5Shanghai AI Laboratory    6Hong Kong Polytechnic University   

πŸ“ TL;DR: By empowering Medical LMMs with anatomy-centric reasoning capabilities, we offer a new paradigm for interactive and explainable LMMs in medical imaging analysis.



Abstract

Chest X-rays (CXRs) are the most frequently performed imaging examinations in clinical settings. Recent advancements in Large Multimodal Models (LMMs) have enabled automated CXR interpretation, enhancing diagnostic accuracy and efficiency. However, despite their strong visual understanding, current Medical LMMs (MLMMs) still face two major challenges: (1) insufficient region-level understanding and interaction, and (2) limited accuracy and interpretability due to single-step reasoning. In this paper, we empower MLMMs with anatomy-centric reasoning capabilities to enhance their interactivity and explainability. Specifically, we first propose an Anatomical Ontology-Guided Reasoning (AOR) framework, which centers on cross-modal region-level information to facilitate multi-step reasoning. Next, under the guidance of expert physicians, we develop AOR-Instruction, a large instruction dataset for MLMM training. Our experiments demonstrate AOR's superior performance in both VQA and report generation tasks.


πŸ’‘AOR framework

(a) Our AOR framework, which flexibly accommodates both textual and optional visual prompts as input and centers on region-level information to enable multimodal, multi-step reasoning; (b) the three-stage training procedure for AOR.
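To make the region-centric, multi-step flow concrete, here is a minimal Python sketch of the pipeline described above. All names (`Region`, `ReasoningStep`, `AORPipeline`) and the three-call decomposition are our illustrative assumptions, not the released implementation.

```python
from dataclasses import dataclass

# Minimal sketch of the anatomy-centric, multi-step reasoning flow.
# All names and the three-call decomposition are illustrative
# assumptions, not the released code.

@dataclass
class Region:
    name: str                                 # e.g. "right lower lung zone"
    bbox: tuple[float, float, float, float]   # normalized (x1, y1, x2, y2)

@dataclass
class ReasoningStep:
    region: Region
    observation: str                          # region-level finding for this step

class AORPipeline:
    def localize(self, image, visual_prompt=None) -> list[Region]:
        """Ground anatomical regions; an optional visual prompt
        (e.g. a user-drawn box) restricts the regions considered."""
        ...

    def describe(self, image, region: Region) -> str:
        """Produce a region-level observation for one grounded region."""
        ...

    def conclude(self, question: str, steps: list[ReasoningStep]) -> str:
        """Reason over the chain of region-level observations
        to reach the final, explainable answer."""
        ...

    def run(self, image, question: str, visual_prompt=None) -> str:
        # Multi-step reasoning: localize -> describe each region -> conclude.
        regions = self.localize(image, visual_prompt)
        steps = [ReasoningStep(r, self.describe(image, r)) for r in regions]
        return self.conclude(question, steps)
```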

πŸ“‘ AOR-Instruction dataset

(a) The construction of AOR-VQA: anatomical ontology design → CoT construction → sample expansion; (b) the construction of AOR-RG: strict alignment between each anatomical region and its corresponding report sentence.
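As a rough illustration of what entries in the two subsets might look like, the sketch below shows one hypothetical AOR-VQA sample (CoT steps grounded in ontology regions) and one AOR-RG sample (region-to-sentence alignment). All field names and values are illustrative assumptions, not the released schema.

```python
# Hypothetical sample layouts for the two AOR-Instruction subsets.
# Field names and values are illustrative, not the released schema.

aor_vqa_sample = {
    "image": "cxr_00123.png",
    "question": "Is there evidence of basal atelectasis?",
    # CoT built over the anatomical ontology: each step ties an
    # observation to a grounded region.
    "cot": [
        {"region": "right lower lung zone",
         "bbox": [0.52, 0.55, 0.93, 0.90],
         "observation": "increased opacity at the right base"},
        {"region": "right costophrenic angle",
         "bbox": [0.78, 0.80, 0.95, 0.95],
         "observation": "angle remains sharp; no effusion"},
    ],
    "answer": "Yes; the findings are consistent with basal atelectasis.",
}

aor_rg_sample = {
    "image": "cxr_00456.png",
    # AOR-RG: each anatomical region is strictly aligned with the
    # report sentence describing it.
    "alignments": [
        {"region": "cardiac silhouette",
         "bbox": [0.35, 0.40, 0.70, 0.75],
         "sentence": "The cardiac silhouette is within normal limits."},
        {"region": "left lower lung zone",
         "bbox": [0.08, 0.55, 0.48, 0.90],
         "sentence": "There is atelectasis at the left base."},
    ],
}
```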

πŸ–ΌοΈ Results

For the VQA task, AOR generates correct and logically reasoned answers. For the report generation task, thanks to the incorporation of fine-grained anatomical regions, AOR shows a stronger grasp of details such as the ET tube, NG tube, and basal atelectasis. Moreover, it can generate the corresponding report sentences for user-specified regions.