- [[word embedding models]], [[input embedding]], [[contextual embedding]]
# Idea
A dense numeric representation of a [[tokens|token]].
Embeddings allow us to represent large chunks of text with far fewer numbers than a sparse one-hot encoding, and the geometry of the embedding space captures semantic relationships between tokens.
Embeddings are learned from data.
A text embedding is a piece of text projected into a high-dimensional latent space. The position of the text in this space is a vector: a long sequence of numbers. Think of the two-dimensional Cartesian coordinates from algebra class, but with many more dimensions (often 768 or 1536).
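A minimal sketch of that geometry, using NumPy and toy vectors (the 4-dimensional "embeddings" here are made up for illustration, not real model outputs): semantically related texts land close together, which we can measure with cosine similarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings" (real models use hundreds of dimensions).
cat = np.array([0.8, 0.1, 0.6, 0.0])
kitten = np.array([0.7, 0.2, 0.5, 0.1])
car = np.array([0.1, 0.9, 0.0, 0.7])

print(cosine_similarity(cat, kitten))  # ~0.99: related concepts point the same way
print(cosine_similarity(cat, car))     # ~0.15: unrelated concepts diverge
```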
![[20240423225950.gif]]
An embedding is a sequence of numbers that represents the concepts within content such as natural language or code. Embeddings make it easy for machine learning models and other algorithms to understand the relationships between pieces of content and to perform tasks like clustering or retrieval. They power applications like knowledge retrieval in both ChatGPT and the Assistants API, and many retrieval-augmented generation (RAG) developer tools.
Embeddings represent words, tokens, or even larger text segments in a continuous vector space.
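A sketch of retrieval with the OpenAI embeddings API (see the references below); the model name `text-embedding-3-small` and the tiny corpus are assumptions for illustration, not a prescribed setup.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

docs = [
    "Embeddings map text into a continuous vector space.",
    "The capital of France is Paris.",
]
query = "How do text embeddings work?"

# Embed the documents and the query in one batch request.
resp = client.embeddings.create(model="text-embedding-3-small", input=docs + [query])
vectors = np.array([item.embedding for item in resp.data])
doc_vecs, query_vec = vectors[:-1], vectors[-1]

# These embeddings come back unit-normalized, so a dot product is cosine similarity.
scores = doc_vecs @ query_vec
print(docs[int(np.argmax(scores))])  # the document semantically closest to the query
```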
![[20250812134948.png]]
# References
- [New embedding models and API updates](https://openai.com/blog/new-embedding-models-and-api-updates)
- [platform.openai.com/docs/guides/embeddings/use-cases](https://platform.openai.com/docs/guides/embeddings/use-cases)
- [youtube.com/watch?v=ySus5ZS0b94](https://www.youtube.com/watch?v=ySus5ZS0b94)
- [An intuitive introduction to text embeddings - Stack Overflow](https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/)
- [The Evolution of Embeddings - by Avi Chawla](https://blog.dailydoseofds.com/p/the-evolution-of-embeddings)
- [How word vectors encode meaning - YouTube](https://www.youtube.com/shorts/FJtFZwbvkI4)