Cross-Modal Image Translation Using Diffusion Models

Zihao Wang, Yingyu Yang, Yuzhou Chen, Tingting Yuan, Maxime Sermesant, Hervé Delingette, Ona Wu (2023). ArXiv.

Abstract

Cross-modal image translation is a challenging task due to the domain gap between different modalities. Recent work has shown the effectiveness of diffusion models in generating high-quality images. In this paper, we propose a novel cross-modal image translation method based on diffusion models, which generates realistic images in the target modality given images from the source modality. We introduce a mutual information maximization loss to align the latent spaces of the different modalities, which helps preserve content information during translation. We also propose a training strategy that stabilizes training of the diffusion model. Experimental results on several datasets demonstrate the effectiveness of our method.
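
The abstract does not spell out the training objective, so the following is only a minimal sketch, in PyTorch, of how a conditional denoising-diffusion loss for modality translation is commonly set up: the denoiser receives a noised target-modality image together with the source-modality image as conditioning and is trained to predict the added noise. All names here (denoiser, alpha_bar, T) and the channel-concatenation conditioning are illustrative assumptions, not the authors' exact formulation.

    # Sketch of conditional diffusion training for modality translation (assumed setup).
    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative signal fraction

    def diffusion_translation_loss(denoiser, x_target, x_source):
        """Standard DDPM noise-prediction loss, conditioned on the source-modality image."""
        b = x_target.shape[0]
        t = torch.randint(0, T, (b,), device=x_target.device)
        noise = torch.randn_like(x_target)
        a = alpha_bar.to(x_target.device)[t].view(b, 1, 1, 1)
        x_t = a.sqrt() * x_target + (1.0 - a).sqrt() * noise   # forward diffusion step
        # Conditioning by channel-wise concatenation of the source image (one common choice).
        pred_noise = denoiser(torch.cat([x_t, x_source], dim=1), t)
        return F.mse_loss(pred_noise, noise)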

Keywords: Cross-modal image translation, diffusion models, mutual information maximization, latent space alignment

Main points

  • Proposes a novel cross-modal image translation method based on diffusion models.

  • Introduces a mutual information maximization loss to align the latent spaces of different modalities (a minimal sketch of such a loss follows this list).

  • Proposes a novel training strategy to stabilize the training of the diffusion model.

  • Demonstrates the effectiveness of the proposed method through experiments on several datasets.

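The summary does not specify which mutual-information estimator the paper uses, so the sketch below shows one common way to realize an MI-maximization alignment loss: an InfoNCE lower bound computed between paired source and target latents. The names z_source and z_target (batches of latent features of shape (batch, dim) from the two modality encoders) and the temperature value are assumptions for illustration only.

    # Sketch of an InfoNCE-style mutual-information maximization loss (assumed estimator).
    import torch
    import torch.nn.functional as F

    def info_nce_alignment_loss(z_source, z_target, temperature=0.1):
        """Matched source/target latents are positives; all other pairs in the batch are negatives."""
        z_s = F.normalize(z_source, dim=1)
        z_t = F.normalize(z_target, dim=1)
        logits = z_s @ z_t.t() / temperature                      # pairwise similarities
        labels = torch.arange(z_s.shape[0], device=z_s.device)    # positives on the diagonal
        # Symmetric cross-entropy pulls corresponding latents together, which maximizes
        # a lower bound on the mutual information between the two representations.
        return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
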
Citation

@article{Wang2023,
  title={Cross-Modal Image Translation Using Diffusion Models},
  author={Zihao Wang and Yingyu Yang and Yuzhou Chen and Tingting Yuan and Maxime Sermesant and Hervé Delingette and Ona Wu},
  journal={arXiv preprint arXiv:2301.13743},
  year={2023},
  url={https://arxiv.org/pdf/2301.13743.pdf}
}