Technology
Researchers say they trained a foundation model from scratch for about $1,500
Image via VentureBeat
Article Summary
205 words
Training a foundation LLM from scratch costs millions and requires internet-scale data — which is why most enterprises don't bother. Sapient thinks it has a cheaper path.To overcome this brute-force scaling dogma, researchers at Sapient developed HRM-Text, which replaces standard… Training a foundation LLM from scratch costs millions and requires internet-scale data — which is why most enterprises don't bother. Sapient thinks it has a cheaper path.To overcome this brute-force scaling dogma, researchers at Sapient developed HRM-Text, which replaces standard Transformers with a highly sample-efficient Hierarchical Recurrent Model (HRM), an architecture they first introduced last year.HRM decouples computation into slow-evolving strategic and fast-evolving execution layers. Instead of brute-force autoregressive prediction on raw text, HRM-Text trains exclusively on instruction-response pairs. This is close to real-world enterprise settings, where users usually expect a targeted answer to a specific task.The researchers were able to train a 1B-parameter HRM-Text from scratch at a fraction of the cost and tokens of normal LLMs. Their model achieved performance competitive with much larger open models on key industry benchmarks.For real-world AI applications, this means foundational pretraining is no longer restricted to highly resourced institutions. With HRM-Text, organizations can affordably pretrain their own highly capable reasoning models from…
Continue Reading
Full story on VentureBeat
🔗 Clicking will take you to venturebeat.com
