AI system will trawl through millions of academic journals to find links missed by scientists


A new AI system is designed to help scientists find previously undiscovered connections in the mass of already published research (Credit: phonlamai/Depositphotos)

In the age of big data, we often seem to be drowning in a constant torrent of research and information. The massive challenge we now face is how to sort through all the work that has been produced. In an exciting collaboration between computer scientists and cancer researchers at the University of Cambridge, a novel AI system has been developed to sift through millions of scientific studies and help researchers uncover previously missed connections.

Science, by its very nature, is a piecemeal process. Each tiny new discovery or development adds to our greater body of knowledge, but we are now reaching a point where there is such a giant volume of data available on every research topic that no single human mind can reasonably wade through it.

“As a cancer researcher, even if you knew what you were looking for, there are literally thousands of papers appearing every day,” says Anna Korhonen, one of the developers of the new AI system.

Called LION LBD, the system is initially focusing on cancer research because of the large volume of work on the topic, which spans a number of different scientific fields. The system incorporates machine learning, natural language processing (NLP) and text mining methods modeled on a technique called literature-based discovery (LBD).

Originally developed in the 1980s by information scientist Don Swanson, the LBD technique was designed to try to help researchers home in on data in studies that could be useful but otherwise remained buried as secondary to the study’s overall hypothesis. Swanson developed the technique after noticing how broad and fragmented scientific research had become.

“The fragmentation of science into specialities makes it likely that there exist innumerable pairs of logically related, mutually isolated literatures,” Swanson wrote in a study demonstrating the potential of LBD back in 1988.

LBD originally arose as a painstaking manual process, but in recent years it has proven well suited to automation, with 21st century technology allowing machines to help find connections or patterns across different studies that humans would never have been able to detect.

“For example, you may know that a cancer drug affects the behaviour of a certain pathway, but with LION LBD, you may find that a drug developed for a totally different disease affects the same pathway,” explains Korhonen, discussing the potential of the new AI system.
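
The idea Korhonen describes, two things linked only through a shared intermediate, can be illustrated with a toy sketch. This is not LION LBD's code; it is a minimal illustration of the general literature-based discovery pattern, assuming each abstract has already been reduced to a set of terms. The term names (drug_A, pathway_P and so on) and the helper functions are hypothetical placeholders invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbors = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def indirect_links(neighbors, start):
    """Find terms that never co-occur with `start` but share an intermediate.

    Returns {candidate: set of bridging terms}.
    """
    direct = neighbors.get(start, set())
    found = defaultdict(set)
    for bridge in direct:
        for candidate in neighbors.get(bridge, set()):
            if candidate != start and candidate not in direct:
                found[candidate].add(bridge)
    return dict(found)


# Hypothetical toy "abstracts", each reduced to the terms it mentions.
docs = [
    {"drug_A", "pathway_P"},   # a cancer drug known to affect a pathway
    {"drug_B", "pathway_P"},   # a drug for another disease, same pathway
    {"drug_B", "disease_D"},
    {"drug_A", "cancer_C"},
]

neighbors = build_cooccurrence(docs)
links = indirect_links(neighbors, "drug_A")
print(links)  # {'drug_B': {'pathway_P'}}
```

Here drug_A and drug_B never appear in the same abstract, yet both appear alongside pathway_P, so drug_B surfaces as a candidate connection. A real system would also need to rank candidates and filter out trivially common terms, which this sketch does not attempt.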

At this early stage, the LION LBD system is still relatively limited. It can only produce connections between two keywords or concepts, and has initially been built using just publicly available PubMed abstracts. However, these limitations are likely to be addressed swiftly, as the researchers behind it are making the entire system open source and freely accessible.

The LION LBD system is currently accessible to all through a web portal, and the entire software code and API are also freely available to developers keen to collaborate and improve it.

The system is described in a new paper published in the journal Bioinformatics.

Source: University of Cambridge