Research on News Keyword Extraction Based on TF-IDF-MP Algorithm

CLC Number: TP391


    Abstract:

    The TF-IDF algorithm judges the importance of a word by its term frequency and inverse document frequency, but it discriminates poorly between categories. To improve classification, this paper proposes the TF-IDF-MP algorithm. First, the documents in the corpus are marked by paragraph, and the word segmentation tool jieba is used to segment the text and tag parts of speech. Then, the number of occurrences of each feature word in a single document is compared with its average number of occurrences across the documents, and the feature word's weight is adjusted with an improved Sigmoid function. At the same time, different position weights are assigned according to the importance of the paragraph in which the word appears. Finally, a Naive Bayes classifier classifies the documents based on the feature word weights. Experimental results show that, when applied to news classification, TF-IDF-MP outperforms TF-IDF and related improved algorithms on evaluation metrics including accuracy, recall and F1 score.
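    The abstract describes the weighting only in outline, so the following Python code is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes: documents are already segmented into paragraphs and tokens (for example by jieba, which is not called here); the "average number of occurrences" is the mean count of a word over all documents in the corpus; a plain logistic function scaled to the range (0, 2) stands in for the paper's improved Sigmoid, whose exact form is not given; and a word's position weight is the largest weight of any paragraph it appears in. The names `POSITION_WEIGHTS`, `paragraph_label` and `tf_idf_mp` and the weight values are hypothetical. The Naive Bayes classification step is not shown.

```python
import math
from collections import Counter

# HYPOTHETICAL position weights; the paper's actual values are not given in the abstract.
POSITION_WEIGHTS = {"first": 1.5, "middle": 1.0, "last": 1.2}


def paragraph_label(i, n):
    """Label paragraph i of n as first, last or middle (hypothetical scheme)."""
    if i == 0:
        return "first"
    if i == n - 1:
        return "last"
    return "middle"


def tf_idf_mp(docs):
    """Weight words in docs: a list of documents, each a list of paragraphs,
    each paragraph a list of already-segmented tokens."""
    n_docs = len(docs)
    flat = [[t for para in doc for t in para] for doc in docs]
    counts = [Counter(tokens) for tokens in flat]

    # Document frequency: number of documents containing each word.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    # ASSUMPTION: mean occurrences of a word over all documents in the corpus.
    mean_count = {w: sum(c[w] for c in counts) / n_docs for w in df}

    results = []
    for doc, tokens, c in zip(docs, flat, counts):
        # ASSUMPTION: a word takes the largest weight of the paragraphs it occurs in.
        pos = {}
        for i, para in enumerate(doc):
            weight = POSITION_WEIGHTS[paragraph_label(i, len(doc))]
            for t in para:
                pos[t] = max(pos.get(t, 0.0), weight)

        weights = {}
        for w, k in c.items():
            tf = k / len(tokens)
            idf = math.log(n_docs / df[w])
            # Stand-in for the improved Sigmoid: equals 1 when k equals the mean,
            # rises above 1 when the word occurs more often than average.
            sig = 2.0 / (1.0 + math.exp(-(k - mean_count[w])))
            weights[w] = tf * idf * sig * pos[w]
        results.append(weights)
    return results
```

    Usage: `sorted(tf_idf_mp(docs)[0].items(), key=lambda kv: kv[1], reverse=True)[:5]` lists the five highest-weighted candidate keywords of the first document. Scaling the logistic by 2 is a design choice of this sketch so that a word occurring exactly as often as average keeps its plain TF-IDF weight (times its position weight).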

Citation:

Cao Yiqin, Sheng Wuping, Zhou Huixiang. Research on News Keyword Extraction Based on TF-IDF-MP Algorithm[J]. JOURNAL OF EAST CHINA JIAOTONG UNIVERSITY, 2021, 38(1): 122-130.

History
  • Online: April 23, 2021