The GPT Quantum Realm
We discuss how you can use a technique called quantisation to run LLMs on your laptop
If you’d like to explore how AI can enhance your business, reply to this email or contact us at [email protected]
The OpenAI API is expensive. But did you know you can run a ChatGPT-level AI model for free on your own computer?
Quantisation is a process that shrinks a very large language model (think GPT-3.5 scale) by storing its weights at lower numerical precision, so that it fits on consumer-grade hardware. This means you can run the model for free on your very own computer.
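To make the idea concrete, here is a minimal sketch of one simple quantisation scheme (absmax rounding to 8-bit integers). It is an illustration only, not how any of the projects below are implemented: the function names and the sample weights are made up for this example, and real libraries quantise in blocks and use more careful methods.

```python
def quantise_absmax(weights, bits=8):
    """Map floats to signed integers using a single shared scale factor."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantised = [round(w / scale) for w in weights]
    return quantised, scale


def dequantise(quantised, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantised]


# Hypothetical weights, purely for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.04]
q, scale = quantise_absmax(weights, bits=8)
restored = dequantise(q, scale)

print("stored integers:", q)
print("restored floats:", [round(w, 3) for w in restored])
```

Each weight now needs 8 bits instead of 16 or 32, at the cost of a small rounding error (at most half of `scale` per weight in this scheme).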
Quantisation means GPT-style technology will soon be ubiquitous and (almost) free. Imagine a world where you can be offline and still have an LLM on your phone!
If you want to get started with quantisation, here are 3 projects to try yourself, with pros and cons:
Technique: GGML
Pros: Works when the model does not fit entirely in VRAM
Cons: Slow
Technique: Bitsandbytes
Pros: Newest framework, easy to use
Cons: Slowest of the three
Technique: GPTQ
Pros: Fast. If you can fit the model entirely in GPU VRAM, GPTQ is the faster option
Cons: ?
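Since the choice above hinges on whether the model fits in VRAM, a rough back-of-the-envelope check can help. The sketch below is an estimate under stated assumptions: the 7-billion-parameter model, the 8 GB GPU, and the 1.5 GB overhead allowance are all hypothetical placeholders, and real quantised files are somewhat larger than this because they also store scale factors and other metadata.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params, bits_per_weight, vram_gb, overhead_gb=1.5):
    """Rough check; overhead_gb is an assumed allowance for activations and cache."""
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


n_params = 7e9   # assumed: a 7-billion-parameter model
vram_gb = 8      # assumed: a consumer GPU with 8 GB of VRAM

for bits in (16, 8, 4):
    size = weight_memory_gb(n_params, bits)
    verdict = "fits" if fits_in_vram(n_params, bits, vram_gb) else "does not fit"
    print(f"{bits}-bit: ~{size:.1f} GB of weights, {verdict} in {vram_gb} GB VRAM")
```

If the quantised model fits, GPTQ is the option to try first; if it does not, GGML is the one to reach for.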
