NLP Concepts

27 May 2024 · Deep-Learning · Note

Word Weighting

1 Unsupervised methods

1.1 Statistical methods (TF, TF-IDF, YAKE)

1.2 Graph-based methods (TextRank, SingleRank, TopicRank, PositionRank)

TF-IDF

In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a measure of how important a word is to a document in a collection or corpus, adjusted for the fact that some words appear more frequently in general.[1] It is often used as a weighting factor in information retrieval searches, text mining, and user modeling. A survey conducted in 2015 showed that 83% of text-based recommender systems in digital libraries used tf–idf.[2]
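As a minimal sketch of the idea, the common tf × log(N/df) variant can be computed from scratch; the toy corpus and function name below are illustrative, not from any particular library:

```python
import math

# Toy corpus: each document is a list of tokens (assumed pre-tokenized).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the dogs and the cats are pets".split(),
]

def tf_idf(term, doc, corpus):
    # Term frequency: raw count of the term in this document.
    tf = doc.count(term)
    # Document frequency: number of documents containing the term.
    df = sum(1 for d in corpus if term in d)
    # Inverse document frequency; a term in every document scores 0.
    return tf * math.log(len(corpus) / df) if df else 0.0

# "the" appears in every document, so its weight collapses to 0,
# while the rarer "cat" keeps a positive weight in the first document.
print(tf_idf("the", docs[0], docs))   # 0.0
print(tf_idf("cat", docs[0], docs))   # positive
```

Real systems typically apply smoothing and sublinear tf scaling (as in scikit-learn's TfidfVectorizer), but the contrast above is the core effect: frequent-everywhere words are discounted.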

Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query.

One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model.
