Can people be persuaded not to believe disinformation?:
Dr [Thomas] Costello believes chatbots work where humans fail because they offer rational responses instead of letting emotions get the better of them. What’s more, they are able to comb through their extensive training data to offer precise counter-arguments, rather than the generalised ones humans often reach for in debates.
Researchers lift the lid on how reasoning models actually “think”:
When Claude itself is asked to reason, printing out the chain of thought that it takes to answer maths questions, the microscope suggests that the way the model says it reached a conclusion, and what it actually thought, might not always be the same thing. Ask the LLM a complex maths question that it does not know how to solve and it will “bullshit” its way to an answer: rather than actually trying, it decides to spit out random numbers and move on.
Worse still, ask a leading question — suggesting, for instance, that the answer “might be 4” — and the model still secretly bullshits as part of its answer, but rather than randomly picking numbers, it will specifically insert numbers that ultimately lead it to agree with the question, even if the suggestion is wrong.
These stories were posted just a few days apart. It’s comical to me how many AI researchers act as though the hallucinations and bullshitting simply don’t exist. Also: LLMs are not rational or irrational or emotional or anything else that human beings are. They are the conduits, thanks to their corpora, of human rationality or irrationality or emotionalism.