
GloVe: Global Vectors for Word Representation

date: Sep 26, 2023
slug: nlp-4
author:
status: Public
tags: NLP, DL
summary: semantic, syntactic, matrix factorization, window-based
type: Post
thumbnail: 7e8c527d-5675-460e-adf8-271178f235d0_2259.png
category: πŸ“¦ DL
updatedAt: Sep 26, 2023 05:50 AM

Title: GloVe: Global Vectors for Word Representation

Keywords: semantic, syntactic, matrix factorization, window-based

Concept

Learning vector space representations of words β†’ prior work falls into two families; good representations support vector arithmetic (e.g. king - man + woman β‰ˆ queen)
1) Matrix Factorization Methods
rows: words/terms, columns: documents (LSA) or other context words (HAL) β†’ see the sketch after this section
Advantage] Leverages global co-occurrence statistics; with COALS-style normalizations of HAL, the raw co-occurrence counts are compressed so they are distributed more evenly in a smaller interval
Problem] The most frequent words (e.g. "the", "and") contribute a disproportionate amount to the similarity measure
2) Shallow Window-Based Methods
word2vec (skip-gram & continuous bag-of-words (CBOW)): single-layer architectures based on the inner product between two word vectors
Advantage] On the word analogy task, they easily learn linguistic patterns as linear relationships between word vectors
Problem] They do NOT operate directly on the co-occurrence statistics (they scan local context windows instead of using counts from the whole corpus!)
β†’ GloVe: 1) global matrix factorization (as in LSA) + 2) local context windows (as in the skip-gram model, Mikolov et al.)
β†’ tries to combine the advantages of both!
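
As a reference for the matrix-factorization family above, a minimal sketch (not from the paper) of an LSA-style term-document count matrix factorized with a truncated SVD; the toy corpus and the choice of 2 latent dimensions are made up for illustration.

```python
import numpy as np

# Toy corpus: each "document" is a list of tokens (made-up example).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "stocks fell and the market dropped".split(),
]

vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix: rows = words/terms, columns = documents.
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        X[w2i[w], j] += 1

# LSA-style factorization: truncated SVD gives low-dimensional word vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                               # keep 2 latent dimensions (arbitrary)
word_vectors = U[:, :k] * S[:k]     # each row is a dense word representation

print(dict(zip(vocab, np.round(word_vectors, 2))))
```

Note how a frequent word like "the" dominates the raw counts, which is exactly the problem mentioned above.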

Progress

Notation: $X_{ij}$ = number of times word $j$ occurs in the context of word $i$, $X_i = \sum_k X_{ik}$, and $P_{ij} = P(j \mid i) = X_{ij}/X_i$ = probability that word $j$ appears in the context of word $i$
β†’ Def.) co-occurrence probability ratio of words $i$ and $j$ with a probe word $k$: $P_{ik}/P_{jk}$
if the ratio $\approx 1$: word $k$ is related to both $i$ and $j$, or to neither
else: word $k$ is related to only one of them ($i$ or $j$)
(e.g. for $i$ = ice, $j$ = steam: $k$ = solid gives a large ratio, $k$ = gas a small one, $k$ = water or fashion a ratio near 1)
the ratio depends on $i$, $j$, $k$, so start from $F(w_i, w_j, \tilde{w}_k) = P_{ik}/P_{jk}$
the domain should be the vector space, and the inner product should be used for the calculation: $F(w_i - w_j, \tilde{w}_k) = P_{ik}/P_{jk}$, then $F\big((w_i - w_j)^\top \tilde{w}_k\big) = P_{ik}/P_{jk}$
$F$ should be a homomorphism between $(\mathbb{R}, +)$ and $(\mathbb{R}_{>0}, \times)$: $F\big((w_i - w_j)^\top \tilde{w}_k\big) = F(w_i^\top \tilde{w}_k)\,/\,F(w_j^\top \tilde{w}_k)$, with $F(w_i^\top \tilde{w}_k) = P_{ik} = X_{ik}/X_i$
combining the two equations above, if $F$ is the exponential: $w_i^\top \tilde{w}_k = \log P_{ik} = \log X_{ik} - \log X_i$
$\log X_i$ is independent of $k$, so treat it as a bias $b_i$ (and add $\tilde{b}_k$ for symmetry): $w_i^\top \tilde{w}_k + b_i + \tilde{b}_k = \log X_{ik}$
In the last equation, if $X_{ik} = 0$, the value would be infinite. β†’ So apply a weighting function inside a least-squares cost function: $J = \sum_{i,j=1}^{V} f(X_{ij}) \big( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \big)^2$
[image]
for the weighting function: 1) $f(0) = 0$ (for continuity, $\lim_{x \to 0} f(x)\log^2 x$ must be finite), 2) non-decreasing (so rare co-occurrences are not overweighted), 3) relatively small for large $x$ (so frequent co-occurrences are not overweighted) β†’ $f(x) = (x/x_{\max})^\alpha$ if $x < x_{\max}$, else $1$ (with $x_{\max} = 100$, $\alpha = 3/4$)
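
A minimal sketch of the weighting function and the weighted least-squares cost $J$ above, assuming $x_{\max} = 100$ and $\alpha = 3/4$ as in the paper; the random co-occurrence counts, vector dimension, and variable names are placeholders for illustration, not the paper's implementation.

```python
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function: 0 at x=0, non-decreasing, capped at 1 for frequent pairs."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(X, W, W_ctx, b, b_ctx):
    """J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    # Only nonzero entries contribute, since f(0) = 0.
    i, j = np.nonzero(X)
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    err = pred - np.log(X[i, j])
    return np.sum(f(X[i, j]) * err ** 2)

# Toy setup (sizes are arbitrary): 5 words, 10-dimensional vectors.
rng = np.random.default_rng(0)
V, d = 5, 10
X = rng.integers(0, 50, size=(V, V)).astype(float)          # fake co-occurrence counts
W = rng.normal(size=(V, d)) * 0.1                            # word vectors
W_ctx = rng.normal(size=(V, d)) * 0.1                        # context word vectors
b, b_ctx = np.zeros(V), np.zeros(V)                          # biases

print("J =", glove_loss(X, W, W_ctx, b, b_ctx))
```

In training, $W$, $W_{ctx}$, and the biases would be updated to minimize $J$ (the paper uses AdaGrad); the sketch only evaluates the cost.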

Result

Complexity: scales with the number of nonzero entries of $X$ (at worst $O(|V|^2)$); modeling $X_{ij}$ with a power law gives $O(|C|^{0.8})$, better than the $O(|C|)$ of on-line window-based methods
Evaluation
1) Word Analogy: answer "a is to b as c is to ?" by vector arithmetic (questions are split into semantic and syntactic categories; see the sketch after this list)
2) Word Similarity: datasets WordSim-353, MC, RG
3) NER (Named Entity Recognition)
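A minimal sketch of how the word-analogy evaluation works: answer "a is to b as c is to ?" by taking the nearest cosine neighbour of vec(b) - vec(a) + vec(c). The vocabulary and random vectors below are placeholders, not trained GloVe vectors.

```python
import numpy as np

def analogy(a, b, c, vectors, vocab):
    """Return the word whose vector is closest (cosine) to vec(b) - vec(a) + vec(c)."""
    target = vectors[vocab[b]] - vectors[vocab[a]] + vectors[vocab[c]]
    target = target / np.linalg.norm(target)
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ target
    # Exclude the three query words from the candidates.
    for w in (a, b, c):
        sims[vocab[w]] = -np.inf
    best = int(np.argmax(sims))
    return next(w for w, i in vocab.items() if i == best)

# Placeholder embeddings; in the paper these would be trained GloVe vectors.
words = ["king", "queen", "man", "woman", "paris", "france"]
vocab = {w: i for i, w in enumerate(words)}
vectors = np.random.default_rng(0).normal(size=(len(words), 50))

print(analogy("man", "king", "woman", vectors, vocab))  # "queen" with real vectors
```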
Tokenization β†’ lowercasing β†’ take the top 400,000 most frequent words and build the co-occurrence count matrix (with a decreasing weighting function: words $d$ positions apart contribute $1/d$) β†’ 50 iterations for dimensions < 300, 100 iterations otherwise, with a window of 10 words on each side
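A minimal sketch of the co-occurrence counting step above, assuming a symmetric window and the $1/d$ decreasing weighting (a context word $d$ positions away contributes $1/d$); the toy corpus, window size default, and function name are made up for illustration.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=10):
    """Accumulate weighted co-occurrence counts X[(word, context)] over a symmetric window.

    A context word d positions away contributes 1/d, the decreasing
    weighting used when building the count matrix.
    """
    X = defaultdict(float)
    for pos, w in enumerate(tokens):
        for d in range(1, window + 1):
            if pos + d < len(tokens):
                c = tokens[pos + d]
                X[(w, c)] += 1.0 / d   # c is in the right context of w
                X[(c, w)] += 1.0 / d   # symmetrically, w is in the left context of c
    return X

# Toy corpus, already tokenized and lowercased.
tokens = "the cat sat on the mat near the other cat".split()
X = cooccurrence_counts(tokens, window=10)
print(X[("the", "cat")])
```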
β†’ State-of-the-art performance at publication (2014)!

Opinion

  • For weight function, how the performance appears the difference on current function and half-softmax function?
  • For bias , how can set these? fixed value or individual value for some condition?

Advanced

Β