Since LLMs essentially decide on one character at a time, I wonder if they would have better accuracy if asked to tell you the sum backwards. That's how we teach kids to add: right to left, carry the 1.
I think this is essentially what they did. The point of the paper is that they built an architecture that makes the LLM more aware of each digit's position within a number. It helped with addition, multiplication, and even sorting.
That's why it's easier. If you're going left to right, you have to figure out not only the sum at the first digit position but also whether there's a 1 to carry into it. Going right to left, you only have to focus on one single-digit addition at a time, and you already know whether there's a carry from the previous addition.
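To make that concrete, here's a rough Python sketch of the schoolbook right-to-left method. It's only an illustration of the carry argument, not anything from the paper; the function name `add_digits_reversed` is made up, and it assumes non-negative integers given as decimal strings.

```python
def add_digits_reversed(a: str, b: str) -> str:
    """Add two non-negative decimal strings.

    Returns the sum with the least-significant digit FIRST, i.e. in the
    order the digits are produced when working right to left.
    (Hypothetical helper, for illustration only.)
    """
    out = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        total = da + db + carry        # one single-digit add plus the known carry
        out.append(str(total % 10))    # this digit is final as soon as it is emitted
        carry = total // 10            # the only state carried to the next step
        i -= 1
        j -= 1
    return "".join(out) or "0"


print(add_digits_reversed("478", "256"))        # 478 + 256 = 734, emitted as "437"
print(add_digits_reversed("999", "1")[::-1])    # reversed back to normal order: "1000"
```

Each emitted digit depends only on the two current digits and a single carry bit. Going left to right, the very first output digit can depend on every position to its right: 4999 + 5000 starts with a 9, but 4999 + 5001 starts with a 1.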
It's technically true that it decides one token at a time, but it also takes the previous tokens into account.