The GPT Quantum Realm

We discuss how you can use a technique called quantisation to run LLMs on your laptop.

If you’d like to explore how AI can enhance your business, reply to this email or contact us at [email protected] 

The OpenAI API is expensive. But did you know that you can run a ChatGPT-level AI model for free on your own computer?

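Quantisation is a process that shrinks a very large language model (in the class of GPT-3.5) to fit on consumer-grade hardware. It does this by storing the model's weights at lower numerical precision, for example as 4-bit integers instead of 16-bit floats. This means that you can run the model on your very own computer without paying per-request API fees.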
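To make the idea concrete, here is a minimal sketch of one simple scheme, symmetric "absmax" quantisation, in plain Python. The function names and the sample weights are made up for illustration, and the projects listed later use more sophisticated schemes (for example, per-block scales), so treat this as a toy rather than what any of them actually does.

```python
def quantise(weights, bits=4):
    """Map floats to signed integers that fit in `bits` bits (symmetric absmax)."""
    qmax = 2 ** (bits - 1) - 1                    # largest integer level, 7 for 4-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0  # one float scale shared by all weights
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [v * scale for v in q]


# Hypothetical weights, chosen only to show the round trip.
weights = [0.12, -0.53, 0.91, -0.07, 0.33]
q, scale = quantise(weights, bits=4)
restored = dequantise(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("stored integers:", q)
print("restored floats:", restored)
print("largest rounding error:", max_error)
```

Each weight now needs only 4 bits plus a share of one scale value, at the cost of a rounding error of at most half the scale per weight.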

LLM quantisation means that GPT-style technology will soon be ubiquitous and (almost) free. Imagine a world where you can be offline and still have an LLM on your phone!

If you are looking to get started with quantisation, here are 3 projects, with pros and cons, that you can try out yourself:

Technique: GGML

Pros: Works even when the model cannot fit entirely in VRAM

Cons: Slow

Technique: Bitsandbytes

Pros: Newest framework, easy to use

Cons: Slowest

Technique: GPTQ

Pros: Fast. If the model fits entirely in GPU VRAM, GPTQ is the faster choice

Cons: ?
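The choice between GGML and GPTQ above comes down to whether the quantised model fits in your GPU's VRAM. Here is a rough sketch of that arithmetic. Both helper functions are hypothetical, and the estimate counts weight storage only, so real memory use will be somewhat higher once activations and runtime overhead are included.

```python
def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes). Ignores activations and overhead."""
    return params_billion * bits_per_weight / 8


def suggest_technique(params_billion, bits_per_weight, vram_gb):
    """Hypothetical rule of thumb based on the pros and cons listed above."""
    if model_size_gb(params_billion, bits_per_weight) <= vram_gb:
        return "GPTQ"   # the whole model fits in VRAM
    return "GGML"       # the model does not fit entirely in VRAM


# A 7-billion-parameter model at 16-bit versus 4-bit precision.
print(model_size_gb(7, 16))
print(model_size_gb(7, 4))

# A 13-billion-parameter model at 4 bits on an 8 GB and a 4 GB GPU.
print(suggest_technique(13, 4, vram_gb=8))
print(suggest_technique(13, 4, vram_gb=4))
```

Going from 16-bit to 4-bit weights cuts the weight storage to a quarter, which is what brings larger models within reach of a consumer GPU or a laptop.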
