Science: Can AI help researchers digest the flood of papers? Technical and legal obstacles remain


According to a November 21, 2023 report in the journal Science, artificial intelligence (AI) is expected to help researchers digest the enormous number of papers published each year, but the technology faces technical and legal obstacles.

Iosif Gidiotis, who started his PhD in educational technology at Sweden's KTH Royal Institute of Technology this year, was intrigued to learn that new AI-powered tools could help him "digest" the literature.

Nearly 3 million scientific papers were published worldwide last year. With the number of papers surging, AI research assistants "sound great."

Gidiotis hoped AI would find the papers most relevant to the questions he studies and surface their highlights. Things did not go as well as he expected. When he tried an AI tool called Elicit, he found that the papers it recommended were only partially relevant, and that its summaries weren't accurate enough to meet his needs. "With Elicit, your instinct is to read the original text yourself to verify that the summary is correct, so it doesn't save time."

Elicit says it is continuing to refine its algorithms for its 250,000 regular users. In one survey, users reported that the tool saved them an average of 90 minutes of reading and searching time per week. Elicit was launched in 2021 by a nonprofit research organization that aims to help scientists navigate the literature.

"These platforms have exploded," says Andrea Chiarelli, who follows AI tools as part of publishing work at the firm Research Consulting. However, the tools' generative systems are prone to fabricating content, and many of the papers they search are behind paywalls.

"It's hard to predict which AI tools will prevail, and there's a certain amount of hype, but they show great promise," Chiarelli said.

Like ChatGPT, the chatbot developed by OpenAI that has garnered global attention, some of the new tools are built on large language models (LLMs) "trained" on large samples of text, learning to recognize relationships between words. Those relationships enable the algorithms to summarize search results and to identify relevant content from context within a paper, yielding a wider range of results than keyword queries alone.
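The difference between keyword and context-based retrieval can be sketched in a few lines. The toy "embeddings" below are hand-made stand-ins for the vectors a real language model would produce; the corpus, titles, and numbers are invented for illustration, not drawn from any actual tool.

```python
# Illustrative sketch: why embedding-based search can surface papers
# that exact keyword matching misses. The tiny hand-made vectors below
# stand in for embeddings a real language model would compute.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy corpus: title -> pretend embedding (a real system would embed abstracts).
papers = {
    "Caffeine intake and sleep latency": [0.9, 0.1, 0.0],
    "Stimulants and athletic performance": [0.7, 0.0, 0.6],
    "Soil microbiomes in arid climates": [0.0, 0.9, 0.1],
}

def keyword_search(query_word):
    """Return only titles that literally contain the query word."""
    return [t for t in papers if query_word.lower() in t.lower()]

def semantic_search(query_vec, top_k=2):
    """Rank titles by embedding similarity to the query vector."""
    ranked = sorted(papers, key=lambda t: cosine(query_vec, papers[t]),
                    reverse=True)
    return ranked[:top_k]

# Keyword search finds only the title containing the word "caffeine"...
print(keyword_search("caffeine"))
# ...while a query vector near the caffeine/stimulant region of the
# embedding space also retrieves the related stimulants paper.
print(semantic_search([0.8, 0.05, 0.3]))
```

In a real system the vectors come from an embedding model and the corpus holds millions of entries in an approximate nearest-neighbor index, but the ranking principle is the same.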

Training a large language model from scratch is too expensive for most organizations, so Elicit and other AI tools build on open-source large language models, many of which are "trained" largely on non-scientific text.

Some AI tools go further. Elicit, for example, organizes papers conceptually: a query about "too much caffeine" surfaces separate sets of papers on "reducing drowsiness" and "impairing athletic performance." The premium version costs $10 per month and uses additional in-house programming to improve accuracy.

Another tool, called Scim, helps draw the reader's eye to the most relevant parts of a paper. Scim is a feature of Semantic Reader, created by the nonprofit Allen Institute for AI, and works like an automatic highlighter pen that users can customize to flag statements about novelty, goals, and other topics.
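A highlighter of this kind can be approximated very simply. The sketch below is not Scim's actual method (which relies on trained models); it is a minimal cue-phrase version, with invented categories and sentences, just to show the idea of user-selectable facets.

```python
# Hypothetical sketch of Scim-style skimming (NOT the Allen Institute's
# implementation): tag sentences with user-selected facets based on
# simple cue phrases, so a reader sees only the facets they care about.

CUES = {
    "novelty":   ["we propose", "we introduce", "for the first time"],
    "objective": ["our goal", "we aim to", "this paper seeks"],
    "result":    ["we find", "results show", "we observe"],
}

def highlight(sentences, facets):
    """Return (facet, sentence) pairs for sentences matching enabled facets."""
    hits = []
    for s in sentences:
        low = s.lower()
        for facet in facets:
            if any(cue in low for cue in CUES[facet]):
                hits.append((facet, s))
                break  # one facet per sentence is enough for skimming
    return hits

paper = [
    "We propose a new attention mechanism.",
    "Prior work focused on recurrent models.",
    "Results show a 12% error reduction.",
]

# A user who enables only "novelty" and "result" skips the background sentence.
for facet, sentence in highlight(paper, ["novelty", "result"]):
    print(f"[{facet}] {sentence}")
```

Cue phrases are brittle, which is one reason a production tool uses learned classifiers instead; the sketch only illustrates the reading experience.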

"It provides a quick diagnosis of whether a paper is worth reading, which is very valuable," says Eytan Adar, an information scientist at the University of Michigan, who tried an early version. Other tools annotate their summaries, allowing users to judge the accuracy for themselves.

To try to avoid generating false responses, the Allen Institute powers Semantic Reader with large language models "trained" on scientific papers, but the effectiveness of this approach is difficult to measure. "These are marginal technical problems," says Michael Carbin, a computer scientist at the Massachusetts Institute of Technology.

"Right now, the best standard we have is for educated people to look at the AI output and analyze it carefully," said Dan Weld, chief scientist of the Semantic Scholar paper library at the Allen Institute.

The institute has collected feedback from more than 300 paid graduate-student testers and thousands of volunteer testers. Quality tests showed that applying Scim to papers outside computer science produced glitches, so the institute currently offers Scim for only about 550,000 computer science papers.
