News

lesswrong.com/posts/y795kpdRqC3uRmhtt/arena-9-0-call-for-applicants

ARENA 9.0: Call for Applicants - LessWrong

2+ hours, 20+ min ago (770+ words) Apply here to participate in ARENA 9.0 before 11:59pm on Sunday July 12th, 2026 (anywhere on Earth). Being situated at LISA brings several benefits to participants, such as productive discussions about AI safety research agendas, allowing participants to form a better picture of what working...

lesswrong.com/posts/qZrbhoaEALFTmyidr/perspectives-on-continual-learning-survey-results-and

Perspectives on Continual Learning: Survey Results and Forecasts - LessWrong

1+ day, 1+ hour ago (1047+ words) This is the fifth post in the sequence Implications of Continual Learning for LLM Agents. ...

lesswrong.com/posts/qZrbhoaEALFTmyidr/expert-views-on-continual-learning-survey-results-and

Expert Views on Continual Learning: Survey Results and Forecasts - LessWrong

1+ day, 1+ hour ago (1010+ words) This is the fifth post in the sequence Implications of Continual Learning for LLM Agents. ...

lesswrong.com/posts/de2qaz6G3qrFZvQqK/reasoning-and-learning-about-injected-concepts-in-language-1

Reasoning and learning about injected concepts in language models - LessWrong

1+ day, 13+ hours ago (1105+ words) This work was done as a part of SPAR, under the mentorship of Mirko Bronzi and Damiano Fornasiere. TL;DR ...

lesswrong.com/posts/xwGTp23TJB5FmdzHu/advocates-can-influence-llm-values-by-editing-wikipedia

Advocates Can Influence LLM Values By Editing Wikipedia - LessWrong

3+ days, 1+ hour ago (203+ words) This article is a summary of an original study: Brazilek, J., Navas, M., & Gnauck, A. (2026). Small edits, large models: How Wikipedia advocacy shape...

lesswrong.com/posts/d8xDGzCEYE639qqEv/a-mechanistic-explanation-of-prompt-injection-and-why-you

A Mechanistic Explanation of Prompt Injection (and why you should study roles) - LessWrong

3+ days, 3+ hours ago (1399+ words) Summary: We've been building a theory of how prompt injections work under the hood. We show it comes down to how LLMs perceive roles (the humble...

lesswrong.com/posts/d8xDGzCEYE639qqEv/a-theory-of-prompt-injection-and-why-you-should-study-roles

A Theory of Prompt Injection (and why you should study roles) - LessWrong

3+ days, 3+ hours ago (1399+ words) Summary: We've been building a theory of how prompt injections work under the hood. We show it comes down to how LLMs perceive roles (the humble...

lesswrong.com/posts/NazprRfWJ4qkwcSro/nla-explanations-can-be-shortened-without-harming

NLA explanations can be shortened without harming reconstruction - LessWrong

3+ days, 16+ hours ago (67+ words) Natural language autoencoders are a really cool mostly-unsupervised method for producing free-form text explanations of LLM activations. You should r...

lesswrong.com/posts/cmbnqdAJRoWBHmoRT/the-one-week-sprint

The one-week sprint - LessWrong

6+ days, 5+ hours ago (441+ words) Recently I've been working in one-week sprints, and I've really enjoyed it! Tl;dr: I need to do a lot of creative knowledge work, and have recently fallen into a routine which IMO is pretty good at facilitating that. Monday...

lesswrong.com/posts/2gsrxNx3QfSZBuqtj/reinforcement-learning-towards-broadly-and-persistently

Reinforcement learning towards broadly and persistently beneficial models - LessWrong

6+ days, 19+ hours ago (375+ words) This is an unofficial automated linkpost. We find that reinforcement learning on realistic scenarios targeting beneficial traits can produce broad improvements across dozens of benchmarks measuring aligned and beneficial behavior. These alignment gains generalize beyond the domains used for training...
