March 26, 2026
Technology

Google develops TurboQuant compression technology for AI models


Google LLC has unveiled a technology called TurboQuant that can speed up artificial intelligence models and lower their memory requirements.

Amir Zandieh and Vahab Mirrokni, two of the researchers who worked on the project, explained how it works in a Tuesday blog post.

One way to speed up AI models is to reduce the amount of data they must process to make decisions. That can be achieved by compressing the input data that a model ingests. There are many algorithms that can compress AI models’ input data, but they often provide only limited efficiency improvements. Additionally, they can introduce errors into the data they compress, which lowers AI models’ output quality.

According to Google, TurboQuant can not only compress AI models’ data more efficiently than existing algorithms but also do so with fewer errors. It does so by changing the data’s mathematical properties.

AI models represent the data they process in the form of vectors. A vector is a geometric object that is often visualized as a simple arrow on a two-dimensional plane. Such an arrow has two main properties: a length and a direction.

In practice, advanced AI models store data using not simple two-dimensional arrows but so-called high-dimensional vectors. What sets such vectors apart from a simple arrow is that they have hundreds or thousands of coordinates rather than two, which means they live in a space with that many dimensions. A single high-dimensional vector can store a piece of data such as a sentence or an equation.
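To make the geometry concrete, here is a toy sketch in Python (not Google's code): a vector's length and direction generalize from a 2-D arrow to any number of coordinates. Four coordinates stand in for the hundreds or thousands a real model would use.

```python
import math

# A toy "high-dimensional" vector; real embeddings have hundreds or
# thousands of coordinates, but four are enough to show the idea.
v = [0.5, -1.2, 3.0, 0.7]

# Length (the Euclidean norm) is computed exactly as for a 2-D arrow,
# just with more terms under the square root.
length = math.sqrt(sum(x * x for x in v))

# Direction is the unit vector obtained by dividing out the length.
direction = [x / length for x in v]
```

The same two properties the article describes for a 2-D arrow, length and direction, are all that survive the generalization; everything else about high-dimensional vectors follows from having more coordinates.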

The fact that vectors have a direction means that they can be rotated in an abstract sense of the word. TurboQuant harnesses that property to optimize AI models’ data. According to Google, it uses an approach called random preconditioning to rotate an AI model’s vectors in a way that makes them easier to compress. It then compresses them with an algorithm called a quantizer.
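The rotate-then-quantize pipeline can be sketched in a few lines of Python. This is a deliberately tiny 2-D illustration with made-up helper names (`rotate2d`, `quantize`), not Google's implementation, which applies random orthogonal transforms in high dimensions:

```python
import math
import random

def rotate2d(v, theta):
    """Rotate a 2-D vector by angle theta -- a stand-in for the random
    orthogonal transforms used in random preconditioning."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def quantize(v, step=0.25):
    """A toy uniform scalar quantizer: snap each coordinate to a grid,
    so it can be stored with far fewer bits."""
    return [round(x / step) * step for x in v]

random.seed(0)
v = [0.9, 0.1]  # a "spiky" vector: one large coordinate, one small

# Pick a random rotation, rotate, then compress the rotated vector.
theta = random.uniform(0, 2 * math.pi)
compressed = quantize(rotate2d(v, theta))

# Decompression undoes the rotation. Because the angle can be
# regenerated from a shared random seed, storing it costs nothing.
restored = rotate2d(compressed, -theta)
```

Because rotation preserves lengths, the error introduced by the quantizer stays bounded no matter which direction the original vector pointed in, which is the intuition behind rotating before compressing.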

The primary benefit of rotating the vectors is that it spreads their values more evenly, which reduces the errors introduced during the compression process. However, a small number of errors still find their way into the vectors. TurboQuant corrects those inaccuracies using an algorithm called QJL.

“QJL uses a mathematical technique called the Johnson-Lindenstrauss Transform to shrink complex, high-dimensional data while preserving the essential distances and relationships between data points,” Zandieh and Mirrokni explained. “This algorithm essentially creates a high-speed shorthand that requires zero memory overhead.”
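The distance-preserving behavior the researchers describe can be demonstrated with a plain random projection, the classic Johnson-Lindenstrauss construction. This is a hedged sketch of the underlying mathematical idea only; the dimensions are illustrative and the code makes no claim about how QJL itself is implemented:

```python
import math
import random

random.seed(1)
d, k = 1000, 200  # original and reduced dimensions (illustrative sizes)

# Random Gaussian projection matrix scaled by 1/sqrt(k): the classic
# Johnson-Lindenstrauss construction for shrinking dimensionality.
R = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
     for _ in range(k)]

def project(v):
    """Map a d-dimensional vector down to k dimensions."""
    return [sum(r_i * v_i for r_i, v_i in zip(row, v)) for row in R]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two random high-dimensional vectors; after projection, the distance
# between them should be approximately unchanged.
u = [random.gauss(0, 1) for _ in range(d)]
v = [random.gauss(0, 1) for _ in range(d)]

ratio = dist(project(u), project(v)) / dist(u, v)
```

The ratio comes out close to 1, meaning the 200-dimensional shorthand preserves the distance that existed in 1,000 dimensions, which is the "essential distances and relationships" property the researchers cite.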

Google put TurboQuant to the test by applying it to multiple open-source large language models. The company measured the LLMs’ efficiency using benchmarks that tasked them with finding specific pieces of information in a complex dataset. According to Google, the models completed the evaluations using one-sixth the memory they would have normally required. The technology also made the LLMs better at certain other long-context tasks. 
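A quick back-of-envelope calculation shows what the reported one-sixth memory figure implies per stored value. The numbers here are illustrative assumptions, not figures from the blog post:

```python
# If a model normally stores each value as a 16-bit float, using
# one-sixth of that memory works out to under 3 bits per value.
bits_uncompressed = 16
bits_compressed = bits_uncompressed / 6  # about 2.67 bits per value

# Scaled up: a hypothetical 10 GB cache of 16-bit values would shrink
# to roughly 1.67 GB.
gb_uncompressed = 10.0
gb_compressed = gb_uncompressed / 6
```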



