Part of it is the same "human speech" problem that has plagued NLP work over the past few years. Nobody (except the poor postdoctoral bastard who is running the paper farm for their boss) actually speaks the way scholarly articles are written because... that should be obvious.
This combines with the decades of work by right-wing fascists to vilify intellectuals and academia. If you have ever seen (or written) a comment that boils down to "This YouTuber sounds smug" or "They are presenting their opinion as fact" then you see why people prefer "natural human speech" over actual authoritatively researched and tested statements.
And... while not all pay-to-publish journals are trash, I feel confident saying that most are. And filtering those out can be shockingly hard by design.
But the big one? Most of the owners of the various journals are REALLY fucking litigious and will go scorched earth on anyone who is using their work (because Elsevier et al own your work) to train a model.