2025, Issue 4

Fake News Detection Model for Short-Text News Based on Aggregating External Knowledge and Internal Contextual Semantics


Abstract:

To address the sparsity of semantic features in short-text news and the neglect of the homologous association between external knowledge and short-text news semantics, this paper proposes a short-text fake news detection model that aggregates external knowledge and internal contextual semantics (EKCS-ST). A news feature information network containing three types of external knowledge (news topics, authors, and entities) is constructed to enrich the semantic features of short-text news, and graph convolution over this network generates external knowledge graph features for each news item. The news text is fed into a text encoder to capture its internal contextual semantic features. The external knowledge graph features and the internal contextual semantic features are then combined in a context-aware computation that strengthens the association between external knowledge and contextual semantics. An attention mechanism filters and reinforces key news features, and the loss on minority-class news is weighted more heavily to alleviate the data imbalance problem. Experimental results show that the proposed model achieves an F1 score (the harmonic mean of precision and recall) of 0.86, which is 18% and 17% higher than the BERT and TextGCN models, respectively, verifying the model's effectiveness.
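The class-reweighting step described in the abstract (raising the loss contribution of minority-class news) can be sketched as a weighted cross-entropy. This is a minimal illustration of the general technique only; the function name, weight values, and toy data below are assumptions for demonstration, not taken from the paper:

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy in which each sample's loss is scaled by the
    weight of its true class, so errors on the minority (fake) class
    contribute more to the total loss."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -class_weights[y] * math.log(p[y])
    return total / len(labels)

# Illustrative setup: class 0 = real news (majority), class 1 = fake news
# (minority); the 3.0 weight is an arbitrary example value.
weights = {0: 1.0, 1: 3.0}
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]  # predicted class probabilities
labels = [0, 1, 1]                            # ground-truth classes
loss = weighted_cross_entropy(probs, labels, weights)
```

With these example weights, a confident mistake on a fake-news sample is penalized three times as heavily as the same mistake on a real-news sample, which pushes the classifier toward the minority class during training.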

Keywords: short-text fake news detection; external knowledge; attention mechanism; semantic features

Foundation: National Natural Science Foundation of China (72471103); Natural Science Foundation of Shandong Province, Innovation and Development Joint Fund (ZR2022LZH016); Key R&D Program of Shandong Province, Major Innovation Project (2021CXGC010103)

Authors: 邱艳芳, 赵振宇, 孙志杰, 马坤, 纪科, 陈贞翔

DOI: 10.13349/j.cnki.jdxbn.20250507.002

References:

[1] ZHANG M D, ZHOU X, WU X H, et al. Joint fake news detection based on semantic expansion and HDGCN [J]. Computer Science, 2024, 51(4): 299.

[2] WANG T, ZHANG D W, WANG L Q, et al. Fake news detection with adaptive fusion of multimodal features [J]. Computer Engineering and Applications, 2024, 60(13): 102.

[3] TOMMASEL A, GODOY D. Short-text feature construction and selection in social media data: a survey [J]. Artificial Intelligence Review, 2018, 49(3): 301.

[4] ZHOU X Y, ZAFARANI R, SHU K, et al. Fake news: fundamental theories, detection strategies and challenges [C]//Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, February 11-15, 2019, Melbourne, VIC, Australia. New York: ACM, 2019: 836.

[5] LI H X, SONG D L, KONG J N, et al. Evaluation of hyperparameter optimization techniques for traditional machine learning models [J]. Computer Science, 2024, 51(8): 242.

[6] LIU X M, LI C Z X, WU S C, et al. A survey of text classification algorithms and their application scenarios [J]. Chinese Journal of Computers, 2024, 47(6): 1244.

[7] MINAEE S, KALCHBRENNER N, CAMBRIA E, et al. Deep learning-based text classification: a comprehensive review [J]. ACM Computing Surveys, 2021, 54(3): 62.

[8] ZHANG T Y, YOU F C. Research on short text classification based on TextCNN [J]. Journal of Physics: Conference Series, 2021, 1757(1): 012092.

[9] CHENG B W, WEI Y C, SHI H H, et al. Revisiting RCNN: on awakening the classification power of Faster RCNN [C]//FERRARI V, HEBERT M, SMINCHISESCU C, et al. Computer Vision: ECCV 2018. Cham: Springer, 2018: 473.

[10] SHI M Y, WANG K X, LI C F. A C-LSTM with word embedding model for news text classification [C]//2019 IEEE/ACIS 18th International Conference on Computer and Information Science (ICIS), June 17-19, 2019, Beijing, China. New York: IEEE, 2019: 253.

[11] LIU T Y, WANG K X, SHA L, et al. Table-to-text generation by structure-aware seq2seq learning [C]//Proceedings of the AAAI Conference on Artificial Intelligence, February 2-7, 2018, New Orleans, Louisiana, USA. Palo Alto, CA: AAAI, 2018: 4881.

[12] MASLENNIKOVA E. ELMo word representations for news protection [C]//CEUR Workshop Proceedings, September 9-12, 2019, Lugano, Switzerland. Lugano: CEUR-WS.org, 2019: 1.

[13] SHAO Y F, GENG Z C, LIU Y T, et al. CPT: a pre-trained unbalanced transformer for both Chinese language understanding and generation [J]. Science China: Information Sciences, 2024, 67(5): 152102.

[14] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2-7, 2019, Minneapolis, MN, USA. Stroudsburg: ACL, 2019: 4171.

[15] YANG Z L, DAI Z H, YANG Y M, et al. XLNet: generalized autoregressive pretraining for language understanding [C]//WALLACH H M, LAROCHELLE H, BEYGELZIMER A, et al. NIPS'19: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc, 2019: 5753.

[16] LIU Y H, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2019-07-26) [2024-05-01]. https://doi.org/10.48550/arXiv.1907.11692.

[17] YAO L, MAO C S, LUO Y. Graph convolutional networks for text classification [C]//Proceedings of the AAAI Conference on Artificial Intelligence, January 27-February 1, 2019, Honolulu, Hawaii, USA. Menlo Park: AAAI Press, 2019: 7370.

[18] ZHANG Y F, YU X L, CUI Z Y, et al. Every document owns its structure: inductive text classification via graph neural networks [C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 5-10, 2020, Online. Stroudsburg: ACL, 2020: 334.

[19] DING K, WANG J L, LI J D, et al. Be more with less: hypergraph attention networks for inductive text classification [C]//2020 Conference on Empirical Methods in Natural Language Processing, November 16-20, 2020, Online. Stroudsburg: ACL, 2020: 4927.

[20] HU L M, YANG T C, SHI C, et al. Heterogeneous graph attention networks for semi-supervised short text classification [C]//Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, November 3-7, 2019, Hong Kong, China. Stroudsburg: ACL, 2019: 4823.

[21] REN Y X, ZHANG J W. Fake news detection on news-oriented heterogeneous information networks through hierarchical graph attention [C]//2021 International Joint Conference on Neural Networks (IJCNN), July 18-22, 2021, Shenzhen, China. New York: IEEE, 2021: 1.

[22] MEHTA N, PACHECO M L, GOLDWASSER D. Tackling fake news detection by continually improving social context representations using graph neural networks [C]//Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, May 22-27, 2022, Dublin, Ireland. Stroudsburg: ACL, 2022: 1363.

[23] NAKAMURA K, LEVY S, WANG W Y. Fakeddit: a new multimodal benchmark dataset for fine-grained fake news detection [C]//Proceedings of the Twelfth Language Resources and Evaluation Conference, May 11-16, 2020, Marseille, France. Paris: European Language Resources Association, 2020: 6149.

[24] WANG S, GUO Y Z, WANG Y H, et al. SMILES-BERT: large scale unsupervised pre-training for molecular property prediction [C]//Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, September 7-10, 2019, New York, NY, USA. New York: ACM, 2019: 429.

[25] VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [C]//6th International Conference on Learning Representations, April 30-May 3, 2018, Vancouver, BC, Canada. [S.l.]: OpenReview, 2018: 339.

[26] ZHOU Y C, HUO H T, HOU Z W, et al. A deep graph convolutional neural network architecture for graph classification [J]. PLoS One, 2023, 18(3): e0279604.

[27] SUN Z J. Graph attention network for short text type news [C]//Proceedings of the 2023 6th International Conference on Big Data Technologies, September 22-24, 2023, Qingdao, China. New York: ACM, 2023: 66.