Abstract: The TF-IDF algorithm uses term frequency and inverse document frequency to judge the importance of words, but it discriminates poorly between categories. To improve classification performance, a TF-IDF-MP algorithm is proposed. First, the documents in the corpus were marked by paragraph, and the word segmentation tool jieba was used to segment the text and tag parts of speech. Then, the number of times a feature word occurs in a single document was compared with the average number of occurrences in the document, and the feature word weights were adjusted by an improved Sigmoid function. At the same time, different position weights were assigned according to the importance of the paragraph in which the feature word appears. Based on the resulting feature word weights, a Naive Bayes classifier was used to classify the documents. Experimental results on news classification show that the TF-IDF-MP algorithm outperforms TF-IDF and related improved algorithms on evaluation metrics such as accuracy, recall, and F1 score.
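The weighting scheme summarized above can be illustrated with a minimal sketch. The abstract does not give the exact form of the improved Sigmoid function or the position weight values, so everything specific here is an assumption: the logistic of the deviation from the mean count, the 1.5/1.0 paragraph weights, the smoothed IDF, and the function names `sigmoid_adjust`, `position_weight`, and `tfidf_mp_weights` are all hypothetical. Input is assumed to be already segmented (for example by jieba) into paragraphs of tokens.

```python
import math
from collections import Counter


def sigmoid_adjust(count, mean_count):
    """Assumed adjustment: logistic of (count - mean), equal to 0.5 at the mean."""
    # Clamp the deviation so math.exp cannot overflow on extreme counts.
    x = max(-50.0, min(50.0, count - mean_count))
    return 1.0 / (1.0 + math.exp(-x))


def position_weight(index, n_paragraphs):
    """Assumed position weights: first and last paragraphs count more."""
    return 1.5 if index in (0, n_paragraphs - 1) else 1.0


def tfidf_mp_weights(docs):
    """docs: list of documents, each a list of paragraphs, each a list of tokens.

    Returns one {token: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter()
    for doc in docs:
        df.update({tok for para in doc for tok in para})

    all_weights = []
    for doc in docs:
        counts = Counter(tok for para in doc for tok in para)
        if not counts:
            all_weights.append({})
            continue
        total = sum(counts.values())
        # Average number of occurrences per distinct word in this document.
        mean_count = total / len(counts)

        # A token takes the highest weight among the paragraphs it appears in.
        pos = {}
        for i, para in enumerate(doc):
            w = position_weight(i, len(doc))
            for tok in para:
                pos[tok] = max(pos.get(tok, 0.0), w)

        doc_weights = {}
        for tok, c in counts.items():
            tf = c / total
            idf = math.log((n_docs + 1) / (df[tok] + 1)) + 1.0  # smoothed IDF
            doc_weights[tok] = tf * idf * sigmoid_adjust(c, mean_count) * pos[tok]
        all_weights.append(doc_weights)
    return all_weights


docs = [
    [["stock", "market", "stock"], ["market", "rally"], ["stock", "price"]],
    [["football", "match"], ["team", "win"]],
]
weights = tfidf_mp_weights(docs)
```

In this toy corpus, `stock` occurs more often than average and appears in the first paragraph, so it receives a larger weight than `rally`, which occurs once in a middle paragraph. The resulting per-document weights would then serve as features for a Naive Bayes classifier; that step is omitted here.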