Seismology and Geology (地震地质), 2019, Vol. 41, Issue (3): 759-773. DOI: 10.3969/j.issn.0253-4967.2019.03.014

Research Paper

A Self-Media Information Mining Model for Earthquake Emergency Response

苏晓慧¹, 邹再超², 苏伟²,³, 李林², 刘峻明²,³, 张晓东²,³

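  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083;
    2. College of Land Science and Technology, China Agricultural University, Beijing 100083;
    3. Key Laboratory of Remote Sensing for Agri-Hazards, China Agricultural University, Beijing 100083
  • Received: 2018-06-28  Revised: 2018-11-05  Publication date: 2019-06-20  Release date: 2019-07-28
  • Corresponding author: ZHANG Xiao-dong, female, born in 1966, professor, mainly engaged in research on volunteered GIS technology and its applications. E-mail: zhangxd@cau.edu.cn.
  • About the author: SU Xiao-hui, female, born in 1985, received her Ph.D. in agricultural informatization technology from China Agricultural University in 2013; lecturer; main research interests: spatial information technology and its applications. Tel: 010-62338246, E-mail: suxhui@bjfu.edu.cn.
  • Funding:
    Supported by the National Key Research and Development Program of China (13th Five-Year Plan) project "Construction and Demonstration of a Precise Emergency Service System Based on Space-Air-Ground Collaborative Remote Sensing Monitoring" (2016YFB0502500).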

RESEARCH ON SELF-MEDIA INFORMATION MINING MODEL FOR EARTHQUAKE EMERGENCY RESPONSE

SU Xiao-hui¹, ZOU Zai-chao², SU Wei²,³, LI Lin², LIU Jun-ming²,³, ZHANG Xiao-dong²,³

  1. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China;
    2. College of Land Science and Technology, China Agricultural University, Beijing 100083, China;
    3. Key Laboratory of Remote Sensing for Agri-Hazards, China Agricultural University, Beijing 100083, China
  • Received: 2018-06-28  Revised: 2018-11-05  Online: 2019-06-20  Published: 2019-07-28

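Abstract: Major natural disasters in recent years show that social media platforms are increasingly becoming the primary and most convenient new channel through which the general public publishes and obtains timely disaster information, and the data collected from these platforms contain a large amount of text, images, and other content recording the current state of a disaster. This paper first performs a statistical analysis of a large volume of historical disaster data and constructs an information category system and a criticality evaluation system for earthquake emergency response. On this basis, a naive Bayes model is trained for information classification, achieving an accuracy of 73.6%. In addition, a feature-fusion classification method that combines a machine learning model with a semantic computing model is used to evaluate the criticality of disaster information; the evaluation model achieves an accuracy of 89.2%. After an earthquake, the model can crawl, classify, and evaluate the disaster information appearing in self-media in real time, mining a small amount of critical and important information from massive self-media data to support post-earthquake disaster assessment and precise rescue. Finally, taking the Jiuzhaigou earthquake of August 8, 2017 as an example, the usability of the mined data is analyzed from two perspectives: rapid earthquake intensity reporting and precise post-earthquake rescue.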

Keywords: earthquake emergency, self-media, semantic analysis, criticality
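As a rough illustration of the naive Bayes classification step summarized in the abstract, the Python sketch below trains a multinomial naive Bayes classifier with Laplace smoothing on already-segmented tokens. It is not the authors' implementation: the class `MultinomialNaiveBayes`, the category names ("building damage", "rescue request", "felt report"), and the training snippets are hypothetical placeholders rather than the paper's seven first-level categories or its data, and the paper's feature extraction (including any handling of sparse features or uneven training-set sizes) is not reproduced.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical minimal multinomial naive Bayes text classifier.

    Inputs are assumed to be pre-segmented token lists; Chinese posts would
    first need word segmentation, which is not shown here.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)   # class -> token -> count
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.token_counts[c]}
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, tokens, c):
        # Unnormalized log P(c) + sum of log P(token | c)
        score = math.log(self.doc_counts[c] / self.n_docs)
        denom = self.total_tokens[c] + self.alpha * len(self.vocab)
        for t in tokens:
            if t not in self.vocab:
                continue  # ignore tokens never seen in training
            score += math.log((self.token_counts[c][t] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


# Hypothetical toy training data (English tokens for readability).
train_docs = [
    ["house", "collapsed", "wall", "cracked"],
    ["building", "collapsed", "roof"],
    ["people", "trapped", "need", "rescue"],
    ["trapped", "under", "rubble", "rescue"],
    ["felt", "shaking", "strong"],
    ["shaking", "felt", "dizzy"],
]
train_labels = [
    "building damage", "building damage",
    "rescue request", "rescue request",
    "felt report", "felt report",
]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["roof", "collapsed"]))  # → building damage
```

Working in log space avoids numerical underflow when a post contains many tokens, and the smoothing constant keeps an unseen class/token pair from zeroing out a class.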

Abstract: Catastrophic natural disasters in recent years show that social media platforms are increasingly becoming the most important and most convenient channel for the general public to publish and obtain disaster information in a timely manner. Data from these platforms contain large amounts of text, images, and other content that record the current state of a disaster. Such information is also produced efficiently and is widely distributed in space, so it can serve rapid emergency response after an earthquake. In this paper, we first statistically analyze 32 689 historical disaster records collected from 5 earthquakes, examining salient characteristics such as post-earthquake disaster events and users' expression habits. Using cross-validation, we then construct an information classification system and a criticality evaluation system for post-earthquake emergency response. The classification system, which draws on existing classification schemes and the experiential knowledge of several emergency experts, comprises seven first-level categories and more than 50 second-level categories. The criticality evaluation system is based on five indicators (subject words, action words, degree words, time, and position measurement) and distinguishes four levels: severe, moderate, mild, and other. Given the sparse features of self-media information and the large differences in training-set size, a naive Bayes model for information classification is trained on the basis of these two systems; its accuracy is 73.6%. In addition, a feature-fusion classification method that combines a machine learning model with a semantic computing model is used to evaluate the criticality of disaster information.
The accuracy of this evaluation model is 89.2%, higher than that of the semantic computing model (85.2%) and the naive Bayes model (77%). The evaluation model combines the strength of the semantic computing method, which can evaluate all indicator features, with that of the machine learning method, which offers high classification efficiency and accuracy. The classification thresholds between mild and moderate and between moderate and severe are 15.2 and 27.39, respectively. The model implemented in this paper can crawl, classify, and evaluate disaster information from self-media in real time after an earthquake, mining a small amount of critical and important information from massive self-media data to assist rapid earthquake intensity reporting and precise rescue. Finally, taking the Jiuzhaigou earthquake of August 8, 2017 as a case study, 17 432 records were crawled in real time within 48 hours after the earthquake. The mined information is visualized in time and space using ArcGIS, and the usability of the data is evaluated from two perspectives: rapid earthquake intensity reporting and precise post-earthquake rescue. Disaster information from Jiuzhaigou County, in the earthquake-stricken area, clearly exceeds that from areas outside it in both quantity and criticality. The results show that self-media information, being widely distributed in space, can help locate areas of more severe damage after an earthquake and shorten the time needed for intensity assessment; once classified and extracted, it provides real-time information for precise rescue and improves the efficiency of post-earthquake emergency response.

Key words: earthquake emergency, self-media, semantic analysis, criticality
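The two thresholds reported in the English abstract (15.2 and 27.39) partition a criticality score into the mild, moderate, and severe levels. The sketch below shows that mapping under stated assumptions: higher scores mean greater criticality, and a score exactly on a threshold falls into the higher level. The abstract specifies neither point, and the fourth level ("other") is assumed to be decided outside this score mapping. The function name `criticality_level` is hypothetical.

```python
# Thresholds reported in the abstract (mild/moderate and moderate/severe).
MILD_MODERATE_THRESHOLD = 15.2
MODERATE_SEVERE_THRESHOLD = 27.39


def criticality_level(score: float) -> str:
    """Map a criticality score to one of the three score-based levels.

    Assumptions (not stated in the abstract): higher scores mean more critical,
    and a score equal to a threshold belongs to the higher level. The "other"
    level is assumed to be assigned elsewhere and is never returned here.
    """
    if score >= MODERATE_SEVERE_THRESHOLD:
        return "severe"
    if score >= MILD_MODERATE_THRESHOLD:
        return "moderate"
    return "mild"


for s in (8.0, 15.2, 20.5, 30.0):
    print(s, criticality_level(s))
# prints: 8.0 mild / 15.2 moderate / 20.5 moderate / 30.0 severe (one per line)
```

Checking the larger threshold first keeps the branches non-overlapping without needing explicit upper bounds.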
