Unsupervised method for event trigger identification and classification

Affiliation:

1. University of Chinese Academy of Sciences, Beijing 100049, China; 2. Key Laboratory of Technology in Geospatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China

CLC number:

TP3

Funding:

Supported by the National Natural Science Foundation of China (61331017)

    Abstract:

    Event trigger identification and classification is the crucial first step in event extraction. Traditional approaches mostly rely on supervised learning methods such as conditional random fields (CRF) and SVM. Because these methods require costly manual annotation and are restricted to predefined event types, they are difficult to apply in open domains. This paper presents an unsupervised method for event trigger identification and classification. First, a topic model is run to obtain the distribution of each candidate trigger word over topics. Then, an improved two-state automaton captures the high-probability topics, which selects the true event triggers and assigns their categories. Experiments on a large unlabeled Sina News corpus demonstrate the effectiveness of the proposed method.
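The two-stage pipeline in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the improved two-state automaton, so a plain two-state Viterbi decoder is swapped in here. The per-word topic counts are assumed to come from any topic model, and the function names, the parameters `alpha`, `scale` and `switch_cost`, and the sample words are all hypothetical.

```python
import math


def topic_distribution(counts, alpha=0.1):
    """Turn one word's per-topic counts into a smoothed p(topic | word)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def salient_topics(probs, scale=3.0, switch_cost=0.5):
    """Pick high-probability topics with a two-state Viterbi pass.

    Assumption: state 0 (background) expects the uniform level 1/K and
    state 1 (salient) expects scale/K. Topics are visited in order of
    decreasing probability, and changing state costs switch_cost.
    """
    k = len(probs)
    if k == 0:
        return []
    means = (1.0 / k, min(1.0, scale / k))
    order = sorted(range(k), key=lambda i: -probs[i])

    def emit(p, state):
        # Log-space distance between observed and expected probability.
        return abs(math.log(max(p, 1e-12) / means[state]))

    best = [emit(probs[order[0]], s) for s in (0, 1)]
    back = []  # back[i][s] = best previous state when position i+1 is in s
    for idx in order[1:]:
        new, ptr = [], []
        for s in (0, 1):
            stay = best[s]
            move = best[1 - s] + switch_cost
            ptr.append(s if stay <= move else 1 - s)
            new.append(min(stay, move) + emit(probs[idx], s))
        best = new
        back.append(ptr)

    # Backtrack from the cheaper final state to recover the state sequence.
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    states.reverse()
    return sorted(t for t, s in zip(order, states) if s == 1)


def detect_triggers(word_topic_counts):
    """Map each candidate word to its salient topics; drop words with none."""
    result = {}
    for word, counts in word_topic_counts.items():
        topics = salient_topics(topic_distribution(counts))
        if topics:
            result[word] = topics
    return result
```

Under these assumptions, a candidate word whose probability mass concentrates on one or two topics is kept as a trigger and labeled with those topics, while a word spread evenly across topics is discarded. Calling `detect_triggers` on a dictionary that maps each candidate word to its list of per-topic counts returns only the retained words.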

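Cite this article

CHEN Ziyan, HUANG Yu, WANG Yang, FU Xingyu, FU Kun. Unsupervised method for event trigger identification and classification[J]. Foreign Electronic Measurement Technology (国外电子测量技术), 2016, 35(7): 91-95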

Published online: 2016-09-30