The technique aims to ease GPU memory constraints that limit how enterprises scale AI inference and long-context applications ...
Researchers at Tsinghua University and Z.ai built IndexCache to eliminate redundant computation in sparse attention models ...
Large language models (LLMs) have made significant strides in natural language generation for artificial intelligence (AI). Models such as GPT-3, Megatron-Turing, Chinchilla, PaLM-2, Falcon, and Llama 2 ...
Diffusion models are widely used in many AI applications, but research on efficient inference-time scalability, particularly for reasoning and planning (known as System 2 abilities), has been lacking.
The focus of artificial-intelligence spending has shifted from training models to using them. Here’s how to understand the difference and the implications.
A new technical paper titled “Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review” was published in “Proceedings of the IEEE” by researchers at University ...
With reported 3x speed gains and limited degradation in output quality, the method targets one of the biggest pain points in production AI systems: latency at scale. High inference latency and ...
Researchers from DeepSeek and Tsinghua University say combining two techniques improves the answers a large language model generates with computer reasoning techniques.