Congratulations to Professor Sun Yuqing of Our Lab: Paper Accepted at ICLR 2026, a Flagship Conference in Artificial Intelligence
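The paper "Multimodal Aligned Semantic Knowledge for Unpaired Image-Text Matching", with Professor Sun Yuqing of our lab as corresponding author, has been accepted at ICLR 2026, a flagship conference in artificial intelligence.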
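ICLR 2026 is the 14th International Conference on Learning Representations. It is expected to be held in spring 2026 (exact dates and venue are subject to the official announcement). Hosted by the ICLR organizing committee, ICLR is one of the most influential top-tier academic conferences in deep learning and representation learning, bringing together leading researchers and industry experts in machine learning and artificial intelligence from around the world. The conference centers on advancing representation learning, deep neural networks, and related methods, and covers frontier topics such as generative models, self-supervised learning, large models, and optimization methods. It carries significant weight in both academia and industry. As one of the major flagship conferences in artificial intelligence, ICLR is also listed as a Class A international academic conference by the China Computer Federation (CCF).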
The main content of the paper is as follows:
Multimodal Aligned Semantic Knowledge for Unpaired Image-Text Matching
While existing approaches address unpaired image-text matching by constructing cross-modal aligned knowledge, they often fail to identify semantically corresponding visual representations for Out-of-Distribution (OOD) words. Moreover, the distributional variance of visual representations associated with different words varies significantly, which negatively impacts matching accuracy. To address these issues, we propose a novel method namely Multimodal Aligned Semantic Knowledge (MASK), which leverages word embeddings as bridges to associate words with their corresponding prototypes, thereby enabling semantic knowledge alignment between the image and text modalities. For OOD words, the representative prototypes are constructed by leveraging the semantic relationships encoded in word embeddings. Beyond that, we introduce a prototype consistency contrastive learning loss to structurally regularize the feature space, effectively mitigating the adverse effects of variance. Experimental results on the Flickr30K and MSCOCO datasets demonstrate that MASK achieves superior performance in unpaired matching.
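The abstract states that prototypes for OOD words are constructed from the semantic relationships encoded in word embeddings, but this announcement does not give the exact procedure. The sketch below is only one plausible reading, not the paper's actual algorithm: it assumes an OOD word's prototype can be approximated as a similarity-weighted average of the visual prototypes of its nearest in-vocabulary words. The function names, the `top_k` and `temperature` parameters, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ood_prototype(word_vec, known_word_vecs, known_prototypes,
                  top_k=2, temperature=0.1):
    """Approximate a visual prototype for an OOD word (illustrative assumption).

    word_vec:          word embedding of the OOD word
    known_word_vecs:   {word: word embedding} for in-vocabulary words
    known_prototypes:  {word: visual prototype} for the same words
    """
    # Rank in-vocabulary words by embedding similarity to the OOD word.
    sims = {w: cosine(word_vec, v) for w, v in known_word_vecs.items()}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Softmax over the neighbours' similarities (shifted for stability).
    top = max(sims[w] for w in neighbours)
    weights = [math.exp((sims[w] - top) / temperature) for w in neighbours]
    total = sum(weights)

    # Weighted average of the neighbours' visual prototypes.
    dim = len(known_prototypes[neighbours[0]])
    proto = [0.0] * dim
    for w, weight in zip(neighbours, weights):
        for i, x in enumerate(known_prototypes[w]):
            proto[i] += (weight / total) * x
    return proto, neighbours


# Toy data: 2-d word embeddings and 3-d visual prototypes (made up).
word_vecs = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "car": [0.0, 1.0]}
prototypes = {"dog": [1.0, 0.0, 0.0],
              "cat": [0.0, 1.0, 0.0],
              "car": [0.0, 0.0, 1.0]}

puppy_vec = [1.0, 0.02]  # hypothetical embedding of the OOD word "puppy"
proto, neighbours = ood_prototype(puppy_vec, word_vecs, prototypes)
print(neighbours, proto)
```

In this toy setup the OOD word borrows its prototype from "dog" and "cat" and receives no contribution from "car", which is the behaviour one would expect if word embeddings act as a bridge between the two modalities.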
