Research on News Keyword Extraction Based on TF-IDF-MP Algorithm

doi:10.16749/j.cnki.jecjtu.2021.01.019

Volume 38, Issue 1, 2021, pp. 122-130

CLC Number: TP391

Abstract:

The TF-IDF algorithm judges the importance of a word from its term frequency and inverse document frequency, but it discriminates poorly between categories. To improve classification performance, a TF-IDF-MP algorithm is proposed. First, the documents in the corpus are marked by paragraph, and the word segmentation tool jieba is used to segment the text and tag parts of speech. Then, the number of times a feature word occurs in a single document is compared with the average number of occurrences in the document, and the feature word's weight is adjusted with an improved Sigmoid function. At the same time, different position weights are assigned according to the importance of the paragraph position within the document. Finally, a Naive Bayes classifier classifies the documents according to the feature word weights. Experimental results on news classification show that the TF-IDF-MP algorithm outperforms TF-IDF and related improved algorithms on accuracy, recall, and F1 value.

Citation:

Cao Yiqin, Sheng Wuping, Zhou Huixiang. Research on News Keyword Extraction Based on TF-IDF-MP Algorithm[J]. Journal of East China Jiaotong University, 2021, 38(1): 122-130.

Online: April 23, 2021