- [[text similarity]], [[semantic search]]
# Idea
A word embedding maps words to vectors: it is a feature-vector representation of a word.
Embeddings are the backbone of some of the most impressive AI models we have today.
Here's an example of a simple word embedding for the word "apple":
```python
apple = [-0.7681, 0.4352, -0.3706, 0.6055, 0.1569, -0.8971]
```
This is a 6-dimensional vector representation of the word "apple". In practice, word embeddings usually have many more dimensions, such as 100, 300, or even more, to better capture the relationships among words. [[GPT-3]] uses vectors with 12,288 dimensions, and CohereAI uses 4,096.
The actual values in the word embeddings are not directly interpretable by humans. They're useful for feeding into machine learning algorithms, which can leverage the semantic information they encode to perform tasks like text classification, sentiment analysis, and machine translation.
We use [[neural networks]] to represent content as these vectors, which have a simple characteristic: related content lies close together, while content with different meanings lies far apart.
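A quick way to see the "close together / far apart" property is to compare vectors with cosine similarity. A minimal sketch: the vectors for "orange" and "car" below are made up for illustration (only "apple" comes from the example above), so only the relative scores matter.
```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 6-dimensional embeddings (made-up values, like the "apple" example above)
apple  = [-0.7681, 0.4352, -0.3706, 0.6055, 0.1569, -0.8971]
orange = [-0.7012, 0.4101, -0.2987, 0.5433, 0.2011, -0.8005]   # another fruit
car    = [ 0.6301, -0.2244, 0.8120, -0.1093, -0.5512, 0.3307]  # unrelated concept

print(cosine_similarity(apple, orange))  # high similarity -> close together
print(cosine_similarity(apple, car))     # low similarity  -> far apart
```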
![[Pasted image 20211219183308.png]]
Popular models: [[word2vec]], [[GloVe]], [[FastText]], [[Poincare embeddings]], [[BERT]]
## Example training
Training [[word2vec]]:
1. Initialization: Randomly initialize the weights of the shallow neural network. The network typically has one hidden layer.
2. Objective: The model optimizes an objective function that either maximizes the probability of predicting the context words given a target word (Skip-gram) or of predicting the target word given its context (CBOW).
3. Forward and backward pass: For each word and its context in the corpus, the model runs a forward and backward pass to adjust its weights.
4. Update: Use an optimizer such as stochastic gradient descent (SGD) to update the weights.
5. Iterate: Steps 3-4 are repeated for multiple epochs until the model performs satisfactorily (a minimal code sketch follows this list).
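A minimal training sketch using the gensim library (not mentioned in the note, assumed here for illustration); the toy corpus and hyperparameters are placeholders, and the gensim 4.x API is assumed.
```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (placeholder data)
corpus = [
    ["apple", "is", "a", "sweet", "fruit"],
    ["orange", "juice", "is", "made", "from", "fruit"],
    ["the", "car", "drove", "down", "the", "road"],
]

# sg=1 selects Skip-gram (sg=0 would be CBOW); weights are updated with SGD
model = Word2Vec(
    sentences=corpus,
    vector_size=6,   # embedding dimension (tiny, to match the example above)
    window=2,        # context window size
    min_count=1,     # keep every word in this toy vocabulary
    sg=1,            # Skip-gram objective
    epochs=50,       # iterate over the corpus multiple times
)
```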
After Training
- Lookup Table: Post-training, each word in the vocabulary corresponds to a row in the weight matrix of the hidden layer.
- Vector Extraction: When you query the word "apple," the model goes to the row corresponding to "apple" in the hidden layer's weight matrix and returns that row as the word vector.
- Think of this process like using a dictionary. You've trained yourself (the model) to understand the meaning (embedding) of each word (e.g., "apple"). When someone asks for the meaning, you quickly look it up in your mental dictionary and provide the 'definition' (the word vector).
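Continuing the hypothetical gensim sketch above, querying a word after training is just a lookup into the learned weight matrix (here `model` refers to the model trained in the previous snippet):
```python
# The trained embeddings live in model.wv, a lookup table keyed by word
vector = model.wv["apple"]   # the row of the hidden-layer weight matrix for "apple"
print(vector.shape)          # (6,) with the toy settings above

# Nearest neighbours in embedding space, i.e. words whose rows lie closest
print(model.wv.most_similar("fruit", topn=3))
```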
Example: In training, if the current word is "apple" and the context window includes "fruit," "sweet," and "juice," the model adjusts its weights either to better predict "fruit," "sweet," and "juice" given "apple" (Skip-gram), or to better predict "apple" given "fruit," "sweet," and "juice" (CBOW).
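To illustrate how such (target, context) training pairs are formed, here is a sketch in plain Python with a made-up sentence: it slides a context window over the sentence, producing the pairs Skip-gram trains on; CBOW would use the same windows in the opposite direction.
```python
sentence = ["apple", "is", "a", "sweet", "fruit", "with", "juice"]
window = 2  # how many words on each side count as context

pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))  # Skip-gram: predict context from target

print(pairs[:4])
# [('apple', 'is'), ('apple', 'a'), ('is', 'apple'), ('is', 'a')]
```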
# References
- [Neural Networks Hate Text - by Santiago - Underfitted](https://underfitted.svpino.com/p/neural-networks-hate-text)
- [Word Embedding Models — shorttext 1.5.3 documentation](https://shorttext.readthedocs.io/en/latest/tutorial_wordembed.html)
- [Short technical information about Word2Vec, GloVe and Fasttext | by Côme Cothenet | Towards Data Science](https://towardsdatascience.com/short-technical-information-about-word2vec-glove-and-fasttext-d38e4f529ca8)
- [Introduction to word embeddings – Word2Vec, Glove, FastText and ELMo – Data Science, Machine Learning, Deep Learning](https://www.alpha-quantum.com/blog/word-embeddings/introduction-to-word-embeddings-word2vec-glove-fasttext-and-elmo/)
- [GitHub - facebookresearch/fastText: Library for fast text representation and classification.](https://github.com/facebookresearch/fastText)
- [What is the difference between word2Vec and Glove ? - Machine Learning Interviews](https://machinelearninginterview.com/topics/natural-language-processing/what-is-the-difference-between-word2vec-and-glove/)
- [GloVe and fastText — Two Popular Word Vector Models in NLP - DZone AI](https://dzone.com/articles/glove-and-fasttext-two-popular-word-vector-models)
- [Word2Vec, GLOVE, FastText and Baseline Word Embeddings step by step | by Akash Deep | Analytics Vidhya | Medium](https://medium.com/analytics-vidhya/word2vec-glove-fasttext-and-baseline-word-embeddings-step-by-step-d0489c15d10b)
- [Word Embeddings in NLP | Word2Vec | GloVe | fastText | by Aravind CR | Analytics Vidhya | Medium](https://medium.com/analytics-vidhya/word-embeddings-in-nlp-word2vec-glove-fasttext-24d4d4286a73)
- [The General Ideas of Word Embeddings | by Timo Böhm | Towards Data Science](https://towardsdatascience.com/the-three-main-branches-of-word-embeddings-7b90fa36dfb9)