Congratulations to Dr. 郑威 of our lab on the acceptance of his paper by ICML 2026, a flagship conference in artificial intelligence
The paper "Unsupervised Process-Aware Coreset Selection for In-Context Learning" (authors: 郑威, 王子杰, 李新, 龚斌, 孙宇清), with Dr. 郑威 of our lab as first author, has been accepted by ICML 2026, a flagship conference in the field of artificial intelligence.
ICML 2026 is the 43rd International Conference on Machine Learning, expected to be held in the summer of 2026 (exact dates and venue subject to official announcement). Organized by the International Machine Learning Society (IMLS), ICML is one of the most influential top-tier academic conferences in machine learning, bringing together researchers and industry experts from around the world. It has long focused on the frontiers of machine learning theory, methods, and applications across fields, covering core directions such as deep learning, probabilistic modeling, optimization methods, and reinforcement learning, and carries broad influence in both academia and industry. As one of the flagship conferences in machine learning, ICML is also rated a CCF Class A international academic conference in artificial intelligence by the China Computer Federation (CCF).
The main content of the paper is as follows:
Unsupervised Process-Aware Coreset Selection for In-Context Learning
We address the challenge of unsupervised coreset selection for few-shot in-context learning (ICL). The goal is to select a small subset of examples under a fixed annotation budget to yield effective prompts for large language models. Existing geometry-based methods often yield coresets that suffer from a skewed distribution, due to the oversampling of peripheral examples and high local redundancy. To address these issues, we propose a process-aware framework for coreset selection. It jointly optimizes the diversity and representativeness of selected samples via a submodular objective. It ensures representativeness by selecting samples based on local density awareness, while promoting diversity by imposing a redundancy penalty relative to the evolving selected set. Thus, it performs process-aware balancing of representativeness and diversity based on the selection context. Extensive experiments on 7 NLP datasets demonstrate that our method consistently outperforms state-of-the-art coreset selection methods in downstream ICL performance. Further analysis validates that our approach better balances diversity and representativeness throughout the selection process, while retaining the theoretical guarantees of submodular optimization.
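To illustrate the general idea described in the abstract, here is a minimal sketch of density-aware, redundancy-penalized greedy selection. All function names, the Gaussian similarity, and the linear gain form `density − λ · max-similarity-to-selected` are illustrative assumptions for exposition, not the paper's actual objective or algorithm:

```python
import numpy as np

def select_coreset(X, budget, sigma=1.0, lam=0.5):
    """Greedy coreset selection sketch (illustrative, not the paper's method).

    Each step picks the point with the largest marginal gain, which
    rewards local density (representativeness) and penalizes similarity
    to the evolving selected set (a redundancy penalty for diversity).
    """
    n = X.shape[0]
    # Pairwise Gaussian similarities between all points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-d2 / (2 * sigma ** 2))
    density = sim.mean(axis=1)           # local density estimate per point
    selected = []
    max_sim = np.zeros(n)                # max similarity to the selected set
    for _ in range(budget):
        gain = density - lam * max_sim   # representativeness - redundancy
        gain[selected] = -np.inf         # never re-pick a selected point
        i = int(np.argmax(gain))
        selected.append(i)
        max_sim = np.maximum(max_sim, sim[:, i])
    return selected

# Toy usage: two tight clusters plus one far outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)),   # cluster A: indices 0-9
               rng.normal(5, 0.1, (10, 2)),   # cluster B: indices 10-19
               [[20.0, 20.0]]])               # outlier: index 20
picks = select_coreset(X, budget=2)
print(picks)  # two indices, one from each cluster; the outlier is skipped
```

Under this toy setup, the density term keeps the low-density outlier out of the coreset (avoiding the oversampling of peripheral examples the abstract mentions), while the redundancy penalty steers the second pick to the other cluster. Because the gain depends on the already-selected set, the trade-off between representativeness and diversity shifts as selection progresses.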
