HieroFinder: A Hieroglyphic Text Retrieval Tool

HieroFinder

A Hieroglyphic Text Retrieval Tool

STUDY AND PRESERVATION OF ANCIENT EGYPTIAN

Until recently, the development of information processing systems has mainly focused on contemporary languages. From a socio-economic point of view, this makes perfect sense since our needs, as users, are connected to our everyday tasks, which we develop in our languages. So, little attention has been paid to the so-called dead languages such as Ancient Egyptian. Nevertheless, our civilization was born in Mesopotamia and Egypt, and the culture of Pharaohs has fascinated us for decades and even centuries. Even nowadays, Egyptology continues to be one of the major branches of Archaeology and it is not unusual to find, from time to time, that new discoveries in this field open our news bulletins. Moreover, Egyptian is the longest-attested language, it thus becoming a particularly valuable object of research for Diachronic Linguistics. Finally, we should not forget either its intrinsic value as one of the most representative elements of one of the most important human civilizations of all time. Egyptian Hieroglyphic script is a major component of our cultural heritage and, for that very reason, we should put particular emphasis on its preservation and study.

COMPUTER PROCESSING OF HIEROGLYPHIC TEXTS

In Egyptology, computer processing of hieroglyphic texts has been closely linked to the development of classic-style text edition tools such as the old Glyph word processor or the modern JSesh system. The reason was that since there were no hieroglyphic typewriters, scholars had to rely on handwritten texts when writing and sharing documents. Even in the case of books, the hieroglyphic texts printed in their pages were typographical transcriptions or, more commonly, mere copies of those handwritten by their authors. Thus, the need for hieroglyphic text processing software was peremptory.

But, what about more complex tools such as Text Mining and Natural Language Processing (NLP) systems? In that respect, we can say that Ancient Egyptian is, basically, a virgin territory waiting to be explored.

THE HIEROFINDER SYSTEM

The HieroFinder hieroglyphic text retrieval system was developed as a M.Eng. Thesis by Ms. Estíbaliz Iglesias-Franjo, being her advisor Prof. Jesús Vilares from the Computer Science Department of the University of A Coruña. The aim of the project was to develop a Document Retrieval system capable of operating on Middle Egyptian texts. The system has been developed as a prototype (beta version) and we would like to extend and improve it in the future.

FEATURES OF THE SYSTEM

Simplicity: Intuitive and easy to use, with a minimum learning curve.
Flexibility: The system is able to manage both hieroglyphic and conventional text contents during indexing and querying-retrieval.
Based on MdC encoding: Hieroglyphic text, either in documents or queries, must be encoded using Manuel de Codage (MdC) standard encoding system.
Graphical user interface: The system provides an intuitive and easy to use front-end querying interface. It provides several options for handling the hieroglyphic text in the query, such as adding shadows, rearranging the text or inserting cartouches.
Sign palette: If required, the user is provided with a palette of hieroglyphic signs that can be added to the query simply by clicking on them. The palette also functions as a catalog of symbols organized according to Gardiner's List classification. This palette was adapted from JSesh system.
Exact and approximate term matching.
Relevant document retrieval.
Display of document contents: The user is able to access, from the tool itself, the content of the documents retrieved by the system. Those query term matchings found in the document are highlighted to enable the user to check why it has been retrieved.
Simple and flexible architecture.
Internationalization: The interface of the system supports internationalization. At this time the user can choose between English, French, Spanish and Galician.
Implemented in Java.
Third party libraries: HieroFinder makes use of third party libraries provided by Apache Lucene, Apache Tika and JSesh.
Source code freely available under a LGPLv3 license.

Based on MdC encoding

DOWNLOAD

Source coude freely available under a LGPLv3 license at GitHub (see below). Beware that, at this time, the system is at prototype stage (beta version v1.0b2), although and we intend to extend and improve it in the future.

Download

CITING

If you use HieroFinder during the course of your research, please notify Prof. Jesús Vilares, to keep track of its users, and cite the following work (copy of the paper available here; slides available here):

Estíbaliz Iglesias-Franjo and Jesús Vilares, Searching Four-Millennia-Old Documents: A Text Retrieval System for Egyptologists. In Proceedings of the 10th ACL SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (ACL LaTeCH 2016), pp. 22-31, Berlin, Germany, 2016. ISBN 978-1-945626-09-8. [Toolkit available at https://github.com/estibalizifranjo/hieroglyphs]

ACKNOWLEDGEMENTS

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (MINECO) through project FFI2014-51978-C2-2-R. We would also like to thank Prof. Josep Cervelló, Director of the Institut d'Estudis del Pròxim Orient Antic (IEPOA) of the Universitat Autònoma de Barcelona for introducing us to the Ancient Egyptian script; and Mr. Serge Rosmorduc, Associate of the Conservatoire National des Arts et Métiers (CNAM) for all his support when dealing with JSesh.

HieroFinder

HieroFinder

STUDY AND PRESERVATION OF ANCIENT EGYPTIAN

COMPUTER PROCESSING OF HIEROGLYPHIC TEXTS

THE HIEROFINDER SYSTEM

FEATURES OF THE SYSTEM

Based on MdC encoding

Functional and intuitive

Access to contents

DOWNLOAD

CITING

ACKNOWLEDGEMENTS