Multimodal Deep Learning for Improved Disease Diagnosis in Ophthalmology

Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images

1The Hong Kong University of Science and Technology
2Guangdong Provincial Hospital of Integrated Traditional Chinese and Western Medicine
3Guangdong Weiren Meditech Co., Ltd.

*Corresponding author

Abstract

Optical Coherence Tomography (OCT) is an effective screening tool for ophthalmic examination. Since collecting OCT images is more expensive than fundus photographs, existing methods use multi-modal learning to complement limited OCT data with additional context from fundus images. However, the multi-modal framework requires eye-paired datasets of both modalities, which is impractical for clinical use. To address this problem, we propose a novel fundus-enhanced disease-aware distillation model (FDDM) for retinal disease classification from OCT images. Our framework enhances the OCT model during training by utilizing unpaired fundus images and does not require fundus images during testing, which greatly improves the practicality and efficiency of our method for clinical use. Specifically, we propose a novel class prototype matching to distill disease-related information from the fundus model to the OCT model, and a novel class similarity alignment to enforce consistency between the disease distributions of both modalities. Experimental results show that our proposed approach outperforms single-modal, multi-modal, and state-of-the-art distillation methods for retinal disease classification.

Framework

An overview of our framework is shown below. Our method is based on class prototype matching, which distills disease-specific features, and class similarity alignment, which distills inter-class relationships.
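The two objectives above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the specific loss forms (cosine distance for prototype matching, softened KL for similarity alignment), the temperature `tau`, and the epsilon constants are assumptions made for the sake of a runnable example.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Average the features of each class to obtain one prototype per class."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos

def prototype_matching_loss(oct_protos, fundus_protos):
    """Pull each OCT class prototype toward the fundus prototype of the
    same class (cosine distance, averaged over classes)."""
    o = oct_protos / (np.linalg.norm(oct_protos, axis=1, keepdims=True) + 1e-8)
    f = fundus_protos / (np.linalg.norm(fundus_protos, axis=1, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(o * f, axis=1)))

def class_similarity_alignment_loss(oct_protos, fundus_protos, tau=1.0):
    """Match the softened inter-class similarity distributions of the two
    modalities with a row-wise KL divergence (fundus as the target)."""
    def sim_dist(p):
        s = p @ p.T / tau
        s = s - s.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(s)
        return e / e.sum(axis=1, keepdims=True)
    q_oct = sim_dist(oct_protos)
    q_fun = sim_dist(fundus_protos)
    return float(np.mean(np.sum(
        q_fun * np.log((q_fun + 1e-8) / (q_oct + 1e-8)), axis=1)))
```

In a training loop, the two losses would be weighted and added to the OCT model's classification loss; when the OCT prototypes coincide with the fundus prototypes, both terms vanish.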

MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images

1The Hong Kong University of Science and Technology
2Guangdong Weiren Meditech Co., Ltd.
3Yunnan United Vision Innovations Technology Co., Ltd.
4Guangdong Hospital of Integrated Traditional Chinese and Western Medicine
5The Second People's Hospital of Foshan

*Corresponding author

Abstract

Existing multi-modal learning methods on fundus and OCT images mostly require both modalities to be available and strictly paired for training and testing, which is often impractical in clinical scenarios. To expand the scope of clinical applications, we formulate a novel setting, "OCT-enhanced disease recognition from fundus images", which allows unpaired multi-modal data during the training phase and relies on widely available fundus photographs for testing. To benchmark this setting, we present the first large multi-modal multi-class dataset for eye disease diagnosis, MultiEYE, and propose an OCT-assisted Conceptual Distillation Approach (OCT-CoDA), which employs semantically rich concepts to extract disease-related knowledge from OCT images and transfer it to the fundus model. Specifically, we treat the image-concept relation as a link for distilling useful knowledge from the OCT teacher model to the fundus student model, which considerably improves diagnostic performance based on fundus images and makes the cross-modal knowledge transfer explainable. Through extensive experiments on the multi-disease classification task, our proposed OCT-CoDA demonstrates remarkable performance and interpretability, showing great potential for clinical application.

Framework

The framework of the proposed OCT-CoDA method is presented below. The pre-trained OCT model serves as the teacher for training the fundus student model. Given a batch of unpaired OCT images and fundus photos, each modality is fed into a separate image encoder to extract features. To implement the conceptual distillation, we first prompt an LLM to generate a concept pool. Second, we compute the similarity between image features and concept embeddings for each modality. Finally, OCT-assisted distillation is performed based on the image-concept similarity. At inference, the similarity matrix is passed through a fully connected (FC) layer to obtain the prediction score.
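The similarity computation, distillation, and inference steps above can be sketched as follows. This is a schematic NumPy example under stated assumptions: cosine similarity between features and concept embeddings, a temperature-scaled KL divergence as the distillation objective, and a plain linear layer for prediction. None of these specifics are taken from the paper's code.

```python
import numpy as np

def image_concept_similarity(img_feats, concept_embs):
    """Cosine similarity between each image feature and each concept
    embedding, giving an (images x concepts) similarity matrix."""
    i = img_feats / (np.linalg.norm(img_feats, axis=1, keepdims=True) + 1e-8)
    c = concept_embs / (np.linalg.norm(concept_embs, axis=1, keepdims=True) + 1e-8)
    return i @ c.T

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conceptual_distillation_loss(fundus_sim, oct_sim, tau=0.5):
    """KL divergence pushing the fundus model's per-image concept
    distribution toward the OCT teacher's, averaged over the batch."""
    p_t = softmax(oct_sim / tau)
    p_s = softmax(fundus_sim / tau)
    return float(np.mean(np.sum(
        p_t * np.log((p_t + 1e-8) / (p_s + 1e-8)), axis=1)))

def predict(sim, fc_weight, fc_bias):
    """Inference: feed the image-concept similarity matrix through a
    fully connected layer to obtain per-class prediction scores."""
    return sim @ fc_weight + fc_bias
```

Note that the distillation operates on concept distributions rather than raw features, which is what allows the OCT and fundus batches to remain unpaired: the concepts, not image correspondences, carry the cross-modal signal.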

BibTeX

@inproceedings{wang2023fundus,
  title={Fundus-Enhanced Disease-Aware Distillation Model for Retinal Disease Classification from OCT Images},
  author={Wang, Lehan and Dai, Weihang and Jin, Mei and Ou, Chubin and Li, Xiaomeng},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={639--648},
  year={2023},
  organization={Springer}
}

@article{wang2024multieye,
  title={MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images},
  author={Wang, Lehan and Qi, Chongchong and Ou, Chubin and An, Lin and Jin, Mei and Kong, Xiangbin and Li, Xiaomeng},
  journal={arXiv preprint arXiv:2412.09402},
  year={2024}
}