Cross-Modal Image Translation Using Diffusion Models

Zihao Wang, Yingyu Yang, Yuzhou Chen, Tingting Yuan, Maxime Sermesant, Hervé Delingette, Ona Wu (2023). ArXiv.

Abstract

Cross-modal image translation is a challenging task due to the domain gap between different modalities. Recent work has shown the effectiveness of diffusion models in generating high-quality images. In this paper, we propose a novel cross-modal image translation method based on diffusion models, which generates realistic images in the target modality given images from the source modality. We introduce a mutual information maximization loss to align the latent spaces of the different modalities, which helps preserve content information during translation. We also propose a training strategy that stabilizes training of the diffusion model. Experimental results on several datasets demonstrate the effectiveness of our method.
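
The abstract does not spell out the training objective, so the following is only a minimal sketch, in PyTorch, of how a conditional denoising-diffusion loss for modality translation is commonly set up: the denoiser receives a noised target-modality image together with the source-modality image as conditioning and is trained to predict the added noise. All names here (denoiser, alpha_bar, T) and the channel-concatenation conditioning are illustrative assumptions, not the authors' exact formulation.

    # Sketch of conditional diffusion training for modality translation (assumed setup).
    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

    def diffusion_translation_loss(denoiser, x_target, x_source):
        """Standard DDPM noise-prediction loss, conditioned on the source-modality image."""
        b = x_target.shape[0]
        t = torch.randint(0, T, (b,), device=x_target.device)
        noise = torch.randn_like(x_target)
        a = alpha_bar.to(x_target.device)[t].view(b, 1, 1, 1)
        x_t = a.sqrt() * x_target + (1.0 - a).sqrt() * noise   # forward diffusion step
        # Conditioning by channel-wise concatenation of the source image (one common choice).
        pred_noise = denoiser(torch.cat([x_t, x_source], dim=1), t)
        return F.mse_loss(pred_noise, noise)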

Keywords: Cross-modal image translation, diffusion models, mutual information maximization, latent space alignment

Main points

  • Proposes a novel cross-modal image translation method based on diffusion models.

  • Introduces a mutual information maximization loss to align the latent spaces of different modalities (a minimal sketch of such a loss follows this list).

  • Proposes a novel training strategy to stabilize the training of the diffusion model.

  • Demonstrates the effectiveness of the proposed method through experiments on several datasets.

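The summary does not specify which mutual-information estimator the paper uses, so the sketch below shows one common way to realize an MI-maximization alignment loss: an InfoNCE lower bound computed between paired source and target latents. The names z_source and z_target (batches of latent features of shape (batch, dim) from the two modality encoders) and the temperature value are assumptions for illustration only.

    # Sketch of an InfoNCE-style mutual-information maximization loss (assumed estimator).
    import torch
    import torch.nn.functional as F

    def info_nce_alignment_loss(z_source, z_target, temperature=0.1):
        """Matched source/target latents are positives; all other pairs in the batch are negatives."""
        z_s = F.normalize(z_source, dim=1)
        z_t = F.normalize(z_target, dim=1)
        logits = z_s @ z_t.t() / temperature                      # pairwise similarities
        labels = torch.arange(z_s.shape[0], device=z_s.device)    # positives on the diagonal
        # Symmetric cross-entropy pulls corresponding latents together, which maximizes
        # a lower bound on the mutual information between the two representations.
        return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
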
Citation

@article{Wang2023,
  title={Cross-Modal Image Translation Using Diffusion Models},
  author={Zihao Wang and Yingyu Yang and Yuzhou Chen and Tingting Yuan and Maxime Sermesant and Hervé Delingette and Ona Wu},
  journal={arXiv preprint arXiv:2301.13743},
  year={2023},
  url={https://arxiv.org/pdf/2301.13743.pdf}
}