About the project
The objective of the FBC TeNe project is to increase the accessibility of the digital resources of Polish public cultural and research institutions. The functionality of the Digital Libraries Federation, which aggregates these resources, will be extended by implementing search for digital objects by their text and music content. As a result, end users will be able to access information that has not been available so far and to search public digital collections much more effectively.
The key target groups of end users are researchers, educators and students, curators, artists and the general public interested in culture and science, including users with disabilities.
FBC
FBC (Digital Libraries Federation) is a service implemented by Poznan Supercomputing and Networking Center (PSNC) in 2007. It aggregates data from over 120 online repositories, libraries, archives, museums and digital galleries, offering access to over 6.5 million objects. FBC enables searching through these distributed collections in one place. Since search results are links to the objects in their home collections, users can quickly access selected items. Currently, FBC aggregates only the metadata of the content exposed by the various digital collections.
The FBC TeNe project’s objective is to extend the FBC functionality by implementing aggregation of the objects’ text and music content in addition to their metadata. This approach will make it possible to search for objects by their content, which will significantly increase their accessibility and improve search effectiveness.
Content available in FBC includes objects of various types and digital formats, from text documents through audiovisual content to musical items. In order to make these objects searchable, they need to be converted to a format that can be indexed. Furthermore, search procedures need to be implemented according to the requirements of the various content types. The content of text documents that were digitized by scanning can be retrieved with OCR (Optical Character Recognition). The content of musical items in the form of staff notation can be acquired with OMR (Optical Music Recognition), which generates a symbolic representation that can be further converted to various formats (e.g. MIDI).
The project objectives will be achieved by developing mechanisms for text representation retrieval based on OCR and for staff notation retrieval based on OMR. For music documents it will also be necessary to define new search criteria and to implement search procedures accordingly. In addition, the FBC portal functionality will be extended by upgrading its mobile version, defining filters that standardize access to resources, and facilitating access for people with disabilities, especially the blind and visually impaired (compliance with WCAG at the level required by the National Interoperability Framework).
The project will result in increased content availability and improved range and quality of the search process for the target end-user groups. Moreover, it will increase the number of digital objects visible through the FBC service. Since FBC is an official Polish content aggregator for Europeana, increased resource availability will support the promotion of cultural and scientific content at the European level.
MIRELA
MIRELA is a music digital library based on the dLibra system. It is part of a digital platform created for musicological research and for providing access to Polish folk music collections. MIRELA is equipped with tools for collecting, processing and searching for musical information. The library currently contains samples of folk music and their metadata from the Phonographic Collection of the Institute of Art of the Polish Academy of Sciences.
One of the objectives of the project is to expand the resources available through MIRELA with collections of folk music compiled and processed by the Institute of Art of the Polish Academy of Sciences and other institutions. So far these collections have not been aggregated by FBC.
Technologies
OCR
Optical Character Recognition (OCR) is a method for identifying and recognizing characters in graphical files, e.g. photographs or scans of text documents. OCR-based software can automatically retrieve text from such a graphical file. The resulting text representation can then be processed automatically.
Books, journals, articles and similar text documents constitute about 90% of the resources aggregated by FBC. Currently all items are represented and searchable only by metadata entered manually in their home collections. This data (e.g. title, author or description) may be insufficient to find an item of interest. The lack of full-text search was often reported as a problem by digital library users in surveys conducted by PSNC. We want to solve this problem with OCR-based tools.
OCR makes it possible to create a so-called “text layer” for documents, i.e. to generate additional data that can be searched with query terms provided by a user.
The OCR process will be conducted automatically with specialized software. The application of new technologies based on artificial intelligence and machine learning, specifically neural networks, continuously improves the quality of OCR results. Eventually, OCR will be performed in FBC on objects when they are added to the FBC database and indexed.
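To illustrate how a text layer can be searched by query terms, the following is a minimal sketch of an inverted index in Python. It is not the FBC implementation; the object identifiers, the sample texts and the function names (tokenize, build_index, search) are hypothetical and chosen only for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(text_layers):
    """Map each token to the set of object ids whose text layer contains it."""
    index = defaultdict(set)
    for object_id, text in text_layers.items():
        for token in tokenize(text):
            index[token].add(object_id)
    return index


def search(index, query):
    """Return the ids of objects that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical text layers, standing in for OCR output of two objects.
text_layers = {
    "obj-1": "Kronika miasta Poznania, rocznik 1935",
    "obj-2": "Polish folk songs collected in the Poznan region",
}
index = build_index(text_layers)
matches = search(index, "folk songs")
```

The sketch matches exact word forms only, so a query for "poznan" does not find "Poznania". A production search engine would typically add language-aware analysis such as stemming or lemmatization.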
OMR
OMR is a research domain dedicated to the automatic retrieval and interpretation of music staff notation from graphical objects such as photographs or document scans. The objective is to generate a machine-interpretable representation of a musical score. Such a representation can be further processed and converted to a number of formats, including MIDI for music playback or MusicXML for presentation on a webpage.
Research on optical music recognition began in the late 1960s at MIT, when the first image scanners became available to research institutes. Initially only the first few bars were processed, since the memory available in early computers was limited. In 1984 a Japanese research group from Waseda University built a robot called WABOT, which was capable of recognizing staff notation and accompanying a singer on an electric organ. Early research on OMR was conducted by Ichiro Fujinaga, Nicholas Carter, Kia Ng, David Bainbridge and Tim Bell. It resulted in a number of techniques that are still in use today. The first commercial OMR application, MIDISCAN (currently SmartScore), was developed in 1991 by Musitek Corporation. The widespread availability of smartphones equipped with good-quality cameras and sufficient computational power paved the way for mobile solutions in which the user takes a photo of a music score that is then processed locally on the device.
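As a small illustration of converting a symbolic representation to MIDI, the sketch below maps a pitch to a MIDI note number. The field names step, octave and alter mirror the MusicXML pitch element. The function name pitch_to_midi and the sample melody are hypothetical, and the code assumes the common convention in which middle C (C4) is MIDI note 60; some software numbers octaves differently.

```python
# Semitone offset of each natural note above C within one octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_to_midi(step, octave, alter=0):
    """Convert a pitch to a MIDI note number.

    step   -- note letter, "A" to "G"
    octave -- octave number, assuming C4 is middle C (MIDI note 60)
    alter  -- chromatic alteration in semitones (+1 sharp, -1 flat)
    """
    number = 12 * (octave + 1) + SEMITONES[step] + alter
    if not 0 <= number <= 127:
        raise ValueError(f"pitch outside the MIDI range: {number}")
    return number


# Hypothetical recognized melody: C4, E4, G4, F-sharp 4.
melody = [("C", 4, 0), ("E", 4, 0), ("G", 4, 0), ("F", 4, 1)]
midi_notes = [pitch_to_midi(step, octave, alter) for step, octave, alter in melody]
```

A complete conversion would also need note durations, time and key signatures and tempo, which this sketch leaves out.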
Project co-financed by the European Funds under the Operational Programme Digital Poland 2014-2020 (project no. POPC.02.04.00-00-0012/20-00)