eScript Glossary

Definitions of some of the terms used in eScript.

Understanding these terms will help you to understand how eScript works.


A source document is an image of a document (usually scanned) that is to be used to supply training images, or is to be transcribed, or both.
A training image is an image of a glyph that will be used, along with many others, to train the neural network model.
A glyph is a collection of image pixels that together form what should be recognisable as single or multiple characters. The character(s) can be what we would normally recognise as one or more alphabetic, numeric, punctuation, or indeed any other symbol or shape that can be tagged with a value. It might sometimes be necessary to associate more than one character with a glyph where they are joined together in the document in such a way that they cannot be separated easily; æ (character a and character e) is an example of this.
This is the boundary value for pixels: values below it are counted as part of the recognisable content of a document, and values above it are excluded. It is most useful with old documents that have staining or other noise, but it is not perfect, because some documents have marks that are as dark as the writing on them.
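
The boundary described above can be sketched in a few lines of Python. This is a minimal illustration, not eScript's own code: it assumes a greyscale image stored as rows of integers from 0 (black) to 255 (white), and the name `content_mask` and the value 128 are hypothetical. A pixel exactly equal to the boundary is treated as excluded, which is also an assumption.

```python
def content_mask(pixels, threshold):
    """Return a mask that is True where a pixel counts as content.

    Assumes greyscale values where lower means darker, so a value
    below the threshold is kept and anything else is excluded.
    """
    return [[value < threshold for value in row] for row in pixels]


# A tiny 2 x 3 "page": pale paper, a mid-grey stain, and dark ink.
page = [
    [250, 245, 30],   # paper, paper, ink
    [180, 20, 240],   # stain, ink, paper
]

mask = content_mask(page, threshold=128)
for row in mask:
    print(row)
```

The mid-grey stain (180) is excluded here, but a stain as dark as the ink would pass the same test, which is the limitation noted above.
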
Tagging is the process of identifying an image of a glyph as corresponding to a specific character or characters with a given style, weight and case. Once an image is tagged it is used as part of the model training and keeps that tag throughout. Tagging is a user activity carried out against images of glyphs that have been derived from a source document.
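
As a rough picture of what a tag holds, the sketch below models it as a small immutable record. The class names, field names, file name and example values are all hypothetical and are not taken from eScript; the only points carried over from the definition are that a tag names one or more characters with a style, weight and case, and that a tagged image keeps its tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a tag does not change once assigned
class GlyphTag:
    characters: str  # one or more characters, e.g. "a" or "ae"
    style: str       # hypothetical values such as "italic"
    weight: str      # hypothetical values such as "regular"
    case: str        # hypothetical values such as "lower"


@dataclass(frozen=True)
class TaggedGlyph:
    image_path: str  # hypothetical location of the glyph image
    tag: GlyphTag


# A joined pair of characters tagged as a single glyph.
ligature = TaggedGlyph(
    image_path="glyph_0001.png",
    tag=GlyphTag(characters="ae", style="italic", weight="regular", case="lower"),
)
print(ligature.tag.characters)
```
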
A variant file is a file containing an adjusted glyph image: a variant of the original image. The adjustment is automatic and consists of one or more of the following: a rotation left or right, a stretch up or down, a thinning of the rotated and stretched images, and so on.
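
The adjustments listed above can be illustrated with a small sketch. It is an assumption-laden example rather than eScript's actual procedure: the image is a list of rows of greyscale values, both operations use simple nearest-neighbour sampling, the angles and stretch factors are arbitrary, and thinning is left out. The function names are hypothetical.

```python
import math


def rotate(image, degrees, background=255):
    """Rotate about the image centre; the sign of the angle picks the direction."""
    height, width = len(image), len(image[0])
    centre_y, centre_x = (height - 1) / 2, (width - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: find the source pixel for each output pixel.
            dx, dy = x - centre_x, y - centre_y
            src_x = round(centre_x + dx * cos_t + dy * sin_t)
            src_y = round(centre_y - dx * sin_t + dy * cos_t)
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = image[src_y][src_x]
    return out


def stretch_vertical(image, factor):
    """Stretch (factor > 1) or squash (factor < 1) the image vertically."""
    height = len(image)
    new_height = max(1, round(height * factor))
    return [list(image[min(height - 1, int(y / factor))]) for y in range(new_height)]


def make_variants(image):
    """Return a few adjusted copies of one glyph image."""
    return [
        rotate(image, -5),
        rotate(image, 5),
        stretch_vertical(image, 0.8),
        stretch_vertical(image, 1.2),
    ]


glyph = [
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
    [255, 0, 255],
]
variants = make_variants(glyph)
print(len(variants), "variants from one tagged image")
```

Each variant would inherit the tag of the image it was derived from, so one tagging action yields several training examples.
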
A model is a machine learning neural network model that is trained with tagged glyph images and their variants. The model is subsequently supplied with untagged glyph images, from the original or from other documents, and returns the most likely tag for each image. This returned tag value is then used as the recognised value for that glyph image. When this process is repeated across all glyph images for the document, the result is an initial transcription of that document that can then be evaluated and corrected.
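
The step from model output to an initial transcription can be sketched as follows. The example assumes, purely for illustration, that the model yields a score for each candidate tag per glyph image; the scores, the dictionary layout and the function names are hypothetical, and the neural network itself is not shown.

```python
def most_likely_tag(scores):
    """Pick the tag with the highest score for one glyph image."""
    return max(scores, key=scores.get)


def transcribe(glyph_scores):
    """Join the most likely tag of every glyph, in reading order."""
    return "".join(most_likely_tag(scores) for scores in glyph_scores)


# Hypothetical scores for three glyph images, in reading order.
glyph_scores = [
    {"c": 0.80, "e": 0.15, "o": 0.05},
    {"æ": 0.70, "a": 0.20, "e": 0.10},
    {"s": 0.55, "f": 0.45},
]

draft = transcribe(glyph_scores)
print(draft)
```

The third glyph shows why the result is only an initial transcription: the two candidates score almost equally, so a reader still has to evaluate and correct the draft.
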
This is the process of ‘walking’ through an image of a document and identifying the glyphs that it contains. There are innumerable complexities with this process, from slanting lines to angled writing to glyphs overlapping from one line to another, to say nothing of the problems caused by joined-up writing. If we add to that the fact that eScript is intended for old handwritten documents, then perhaps the task is impossible! We certainly hope that is not the case.
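
One common baseline for finding glyphs in a page is connected-component labelling: group touching content pixels and report a bounding box for each group. The sketch below shows that baseline only; it is not claimed to be how eScript walks a document. It assumes a mask of truthy values for content pixels and uses 4-way connectivity, and the function name is hypothetical.

```python
from collections import deque


def find_glyph_boxes(mask):
    """Return (top, left, bottom, right) for each group of touching pixels."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited content pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Two separate marks: one in the top-left corner, one down the right edge.
mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = find_glyph_boxes(mask)
print(boxes)
```

This approach treats any run of touching ink as one glyph, so joined-up writing and glyphs that overlap between lines come out merged, which is exactly the difficulty described above.
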
This is our normal handwriting as we usually practise it and as it has been practised since writing was invented. Almost all manuscripts, legal documents, personal communications etc. would have been written in this style of handwriting (many still are, of course!), and typically the individual characters are joined together.
