TECH SPACE
Shrinking AI memory improves LLM accuracy

by Sophie Jenkins
London, UK (SPX) Dec 26, 2025
Researchers have developed a new way to compress the memory used by AI models to increase their accuracy in complex tasks or reduce the energy needed to run them.

Experts from the University of Edinburgh and NVIDIA found that large language models using memory eight times smaller than an uncompressed system scored better on maths, science, and coding tests while spending the same amount of time reasoning. The method can also be configured so that models respond to more user queries simultaneously, lowering the power required per task.

The approach focuses on the models' key-value cache, or KV cache, which stores segments of step-by-step reasoning sequences known as reasoning threads. As models generate more threads or extend them, the KV cache grows and becomes slower to retrieve, creating a bottleneck during inference when the system answers prompts.
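
To give a sense of scale, the sketch below estimates KV cache size with the standard formula for transformer decoders: one key tensor and one value tensor per layer, each holding one vector per KV head per token. The model dimensions are hypothetical and are not taken from the paper; only the eightfold compression ratio comes from the article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit floats (an assumption, not from the article).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model shape, chosen only for illustration.
full = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32768)
compressed = full // 8  # the eightfold compression quoted in the article

GIB = 1024 ** 3
print(f"uncompressed: {full / GIB:.2f} GiB")
print(f"compressed:   {compressed / GIB:.2f} GiB")
```

Under these assumed dimensions, a single 32,768-token sequence needs 4 GiB of cache uncompressed and 0.5 GiB at one eighth the size, which is why longer or more numerous reasoning threads put pressure on memory.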

To address this, the team developed Dynamic Memory Sparsification (DMS), a technique that compresses the KV cache by deciding which tokens to retain and which to delete. Instead of keeping every token, DMS selects those judged most important so the model keeps useful context while reducing memory use.
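
The general idea of keeping only the most important tokens can be illustrated with a toy example. The article does not describe how DMS judges importance, so the sketch below substitutes made-up scores and a simple top-k rule; the function name and inputs are hypothetical and this is not the authors' method.

```python
def retain_top_tokens(tokens, scores, compression_ratio):
    """Keep roughly len(tokens) / compression_ratio tokens with the highest scores.

    Stand-in selection rule for illustration only: the scores are supplied
    by the caller rather than produced by a model.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    keep = max(1, len(tokens) // compression_ratio)
    # Rank positions by score, then restore the original order of the survivors.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(ranked)]


tokens = list("abcdefgh")
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]  # made-up importance scores
print(retain_top_tokens(tokens, scores, compression_ratio=4))
```

With eight tokens and a ratio of four, two tokens survive, and they stay in their original order so the remaining context still reads in sequence.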

There is a short delay between deciding to delete tokens and actually removing them, which gives the model time to transfer valuable information from tokens that will be evicted into those that remain. By managing token eviction in this way, DMS allows the AI model to explore more possible solutions or reason in greater depth without extra compute.
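
The delay between the eviction decision and the actual removal can be pictured with a small bookkeeping sketch. It assumes a fixed delay measured in generation steps and takes the keep-or-evict decision as an input; the class and its names are hypothetical, and it models only the timing, not how information moves between tokens.

```python
from collections import deque


class DelayedEvictionCache:
    """Toy cache in which a flagged token is removed `delay` steps after it is added."""

    def __init__(self, delay):
        self.delay = delay
        self.live = {}          # position -> token that is still readable
        self.pending = deque()  # (position, step at which the token is removed)
        self.step = 0

    def append(self, token, evict):
        """Add a token; `evict` is the (externally supplied) eviction decision."""
        pos = self.step
        self.live[pos] = token
        if evict:
            self.pending.append((pos, pos + self.delay))
        # Remove tokens whose grace period has run out.
        while self.pending and self.pending[0][1] <= self.step:
            old_pos, _ = self.pending.popleft()
            del self.live[old_pos]
        self.step += 1

    def visible(self):
        """Tokens currently readable, in their original order."""
        return [self.live[p] for p in sorted(self.live)]


cache = DelayedEvictionCache(delay=2)
for token, evict in [("a", True), ("b", False), ("c", True), ("d", False), ("e", False)]:
    cache.append(token, evict)
    print(token, "->", cache.visible())
```

In this run the token "a" is flagged as soon as it arrives but stays readable while "b" is added, and disappears only when "c" arrives. That window is the period in which, according to the researchers, the model can move useful information into tokens that will remain.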

The researchers tested DMS on different versions of the Llama and Qwen model families and compared their performance with non-compressed baselines. Even when memory was compressed to one eighth of its original size, large language models maintained their accuracy on difficult tasks and produced results faster than non-compressed systems.

In the AIME 24 mathematics test, which serves as a qualifier for the United States Mathematical Olympiad, compressed models performed twelve points better on average while using the same number of KV cache reads per answer. On GPQA Diamond, a set of complex questions in biology, chemistry, and physics authored by PhD-level experts, the compressed models scored more than eight points higher.

The models were also evaluated with LiveCodeBench, which measures how well AI systems write code. In these tests, compressed models scored about ten points better on average than non-compressed models, indicating that KV cache compression can preserve and even enhance reasoning quality while operating with much smaller memory budgets.

The findings were peer reviewed and presented at the NeurIPS 2025 conference. The paper, titled "Inference-Time Hyper-Scaling with KV Cache Compression," is available at https://openreview.net/pdf?id=8ZiElzQxf1.

Dr Edoardo Ponti, GAIL Fellow and Lecturer in Natural Language Processing at the University of Edinburgh's School of Informatics, said: "In a nutshell, our models can reason faster but with the same quality. Hence, for an equivalent time budget for reasoning, they can explore more and longer reasoning threads. This improves their ability to solve complex problems in maths, science, and coding."

Dr Ponti and his team will continue to study how large AI systems represent and remember information as part of AToM-FM, a 1.5 million euro project funded by the European Research Council that aims to make such systems more efficient and sustainable.

Research Report: Inference-Time Hyper-Scaling with KV Cache Compression

