Congratulations to Wei Pan of our lab on the acceptance of his paper at IJCNN 2021
The paper "Learning New Word Semantics with Conceptual Text" (authors: Wei Pan, Tianyuan Liu, Wentao Zhang, Yuqing Sun), by Wei Pan, a master's student in our lab, has been accepted by IJCNN 2021.
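The 31st International Joint Conference on Neural Networks (IJCNN 2021) will be held online from July 18 to July 22, 2021. IJCNN is the flagship conference of the IEEE Computational Intelligence Society and the International Neural Network Society. It covers a broad range of research topics, including neural embedded systems, representation learning, and deep semantic computing.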
Abstract: In this paper, we consider the problem of embedding Chinese new words with respect to their conceptual definitions or descriptions, which is especially important for understanding specialty documents. We present a two-stage model to learn Chinese new word embeddings, where the first stage encodes the information of character components and context, and the second stage aggregates the semantics of multiple texts. We perform extensive experiments to verify the proposed method, and the results outperform state-of-the-art methods on both direct semantic verification and advanced NLP tasks. Compared with previous methods that require a corpus or an elaborately designed dataset to learn a new word embedding, our method requires only a few pieces of text and supports the evolution of meanings. We also experimentally verify the effects of the different parts of the model and of the number and types of conceptual texts. Finally, we present some biology texts to illustrate whether specialty semantics are encoded in the word embedding.
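Chinese abstract (translated): This paper studies the embedding of Chinese new words from the perspective of conceptual descriptive texts, which is essential for understanding specialized literature. We propose a two-stage model to learn Chinese new word embeddings: the first stage encodes the information of a word's component characters and its context, and the second stage aggregates the semantic vectors of multiple different texts. Extensive experiments verify the proposed method, and the results show that it outperforms current state-of-the-art methods on both direct semantic verification and advanced natural language processing tasks. Compared with previous methods that require large-scale corpora to learn word embeddings, our model needs only a few pieces of text to learn a word embedding and supports updating the word's meaning. We then experimentally analyze the effects of the model's components, the number of training texts, and the type of conceptual texts. Finally, we visualize the embeddings learned for amino acids in the biology domain, showing that the model can encode the specialized semantics of that domain.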
Citation: Wei Pan, Tianyuan Liu, Wentao Zhang, and Yuqing Sun. Learning New Word Semantics with Conceptual Text. International Joint Conference on Neural Networks (IJCNN), 2021.