Linking terms from scientific texts to knowledge base entities
Article's language: Russian
Abstract
This paper proposes new algorithms for linking scientific terms to entities in Wikipedia and MeSH (Medical Subject Headings), designed to work in low-resource settings. The algorithm for linking to Wikipedia, developed for a subset of Russian-language texts, uses the Wikipedia search engine to generate candidates and the spaCy library to obtain vector representations of the text. Semantic similarity between a Wikipedia entity description and a scientific term is calculated based not only on the term itself, but also on its surrounding context. For the medical subset of the collection, which includes Russian-to-English translations, an algorithm was developed and implemented for linking terms using the MeSH vocabulary. Experimental results show F1 scores of 50.77% for Wikipedia and 40.05% for MeSH, which are promising given the limited amount of annotated data. The study highlights the need to develop specialized Russian-language knowledge bases analogous to MeSH. A promising direction for future work is the use of multilingual models for cross-lingual linking, which is particularly important for rare terms. The results can be applied in the development of intelligent systems for scientific text analysis and automated scientific assistants, which is especially relevant for specialized domains.
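The candidate-ranking idea described above (compare each candidate's description with the term plus its surrounding context, then keep the most similar one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `threshold` parameter, and the example data are hypothetical, the `candidates` dictionary stands in for results returned by the Wikipedia search engine, and a toy bag-of-words vector stands in for the spaCy vector representations.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for spaCy text vectors)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_term(term, context, candidates, threshold=0.1):
    """Return the title of the best-matching candidate, or None.

    candidates: dict mapping an entity title to its description
    (hypothetical stand-in for knowledge base search results).
    threshold: hypothetical minimum similarity required to accept a link.
    """
    # The query combines the term with its context, not the term alone.
    query = embed(term + " " + context)
    best_title, best_score = None, threshold
    for title, description in candidates.items():
        score = cosine(query, embed(description))
        if score > best_score:
            best_title, best_score = title, score
    return best_title


# Hypothetical example data for an ambiguous term.
candidates = {
    "Kernel (operating system)":
        "core component of an operating system managing hardware and processes",
    "Kernel (linear algebra)":
        "set of vectors mapped to zero by a linear map between vector spaces",
}
context = "the kernel of the linear map contains all vectors sent to zero"
linked = link_term("kernel", context, candidates)
unlinked = link_term("quasar", "", candidates)
```

In this toy example the context words are what separate the two senses of "kernel"; a term with no sufficiently similar candidate is left unlinked.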
Issue
# 26
Pages: 53-76
File
kuzovlevbaturastartsev.pdf
(673.88 KB)