
An entity mapping technology of national grid public data model integrating BERT and congestion filtering

  • LI Yufei
  • HAO Baocong
  • LOU Yiwei
  • YANG Shiyu
  • GAO Shijie
  • ZHANG Pengyu
  • 1. Big Data Center of State Grid Corporation of China, Beijing 100053, China
    2. School of Computer Science, Peking University, Beijing 100871, China
    3. Beijing Zhongdian Puhua Information Technology Co., Ltd., Beijing 100085, China

Received date: 2023-02-06
Revised date: 2023-03-09
Online published: 2023-08-30

Abstract

The current SG-CIM (State Grid Common Information Model) is difficult to update and iterate automatically, and new elements are mined from it inefficiently. To address these problems, an automatic SG-CIM mapping technology based on a BERT model and blocking filtering is proposed. First, building on the existing SG-CIM, an SG-CIM knowledge graph and a data-table knowledge graph are constructed. Second, an entity alignment method based on BERT and blocking filtering is used to establish the mapping relationship between the two knowledge graphs. Finally, the effectiveness of the proposed method is verified through an experimental analysis of text mapping performance. Results show that, after fine-tuning on a self-built dataset, the BERT model achieves an accuracy of more than 95%. This method lays a foundation for subsequently mining new elements and for automatically updating and iterating SG-CIM.
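The two-stage alignment the abstract describes (blocking to prune candidate pairs, then a learned matcher to score the survivors) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the character-bigram blocking key, the `min_shared` and `threshold` values, the function names, and the example entity and column names are all hypothetical, and `jaccard_score` is a lightweight stand-in for the fine-tuned BERT sentence-pair classifier so that the example runs without model weights.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple


def char_bigrams(text: str) -> Set[str]:
    """Distinct character bigrams of the lowercased alphanumeric characters.

    Hypothetical blocking key. CJK characters count as alphanumeric in Python,
    so this also works for Chinese entity names.
    """
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def block_candidates(left: Iterable[str], right: Iterable[str],
                     min_shared: int = 2) -> Set[Tuple[str, str]]:
    """Blocking: keep only pairs sharing at least `min_shared` bigram keys,
    instead of comparing every left entity with every right entity."""
    index: Dict[str, List[str]] = defaultdict(list)  # bigram -> right names containing it
    for col in set(right):
        for key in char_bigrams(col):
            index[key].append(col)
    candidates: Set[Tuple[str, str]] = set()
    for name in left:
        shared: Dict[str, int] = defaultdict(int)  # right name -> number of shared bigrams
        for key in char_bigrams(name):
            for col in index.get(key, []):
                shared[col] += 1
        candidates.update((name, col) for col, n in shared.items() if n >= min_shared)
    return candidates


def jaccard_score(a: str, b: str) -> float:
    """Placeholder for a fine-tuned BERT matcher: bigram Jaccard similarity in [0, 1]."""
    x, y = char_bigrams(a), char_bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def align(left: Iterable[str], right: Iterable[str],
          scorer: Callable[[str, str], float] = jaccard_score,
          threshold: float = 0.5, min_shared: int = 2) -> Dict[str, str]:
    """Score only the blocked candidates and map each left entity to its
    highest-scoring right entity whose score is at least `threshold`."""
    best: Dict[str, Tuple[str, float]] = {}
    # Sorting makes tie-breaking deterministic (first name in sort order wins).
    for name, col in sorted(block_candidates(left, right, min_shared)):
        s = scorer(name, col)
        if s >= threshold and (name not in best or s > best[name][1]):
            best[name] = (col, s)
    return {name: col for name, (col, _) in best.items()}


if __name__ == "__main__":
    cim_entities = ["customer name", "meter id", "substation voltage level"]
    table_columns = ["cust_name", "meter_id", "voltage_level", "install_date"]
    print(len(block_candidates(cim_entities, table_columns)))  # 5 of 12 possible pairs
    print(align(cim_entities, table_columns))
    # {'customer name': 'cust_name', 'meter id': 'meter_id', 'substation voltage level': 'voltage_level'}
```

Blocking trades recall for speed: a true match that shares fewer than `min_shared` keys with its counterpart is never scored, so the key and threshold would need tuning against labeled pairs. In the approach the abstract describes, the fine-tuned BERT model would presumably fill the `scorer` slot in place of bigram overlap.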

Cite this article

LI Yufei, HAO Baocong, LOU Yiwei, YANG Shiyu, GAO Shijie, ZHANG Pengyu. An entity mapping technology of national grid public data model integrating BERT and congestion filtering[J]. Science & Technology Review, 2023, 41(15): 113-123. DOI: 10.3981/j.issn.1000-7857.2023.15.012
