eScript App Description

This is a fairly detailed description of eScript, an app to recognise handwritten characters.


You may prefer to start with the Overview.

Overall Workflow

Diagram and chart describing the workflow:

  • Project/Profiles
  • Import Documents
  • Prepare Documents
  • Prepare Training
  • Train Model
  • Transcribe Document
  • Evaluate Results
  • Feedback Changes
  • Output Results

Note that a typical workflow can often start directly at Transcribe Document.

(Screenshot: eScript, to be added)

Create and Configure Your Projects

Work is organised into projects, and you can create as many as you like. Each project has its own settings to control the training and transcription process. Typically a project consists of a set of related documents: from the same hand or typeface, or from the same book, for example. Remember that a project's documents are transcribed using a single model, and the model is trained to recognise specific characters, so it does not make sense to collect unrelated documents together in a project for transcription.
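
Conceptually, a project bundles a set of related documents with one model and one group of settings. A minimal sketch in Python; every field name here is an illustrative assumption, not eScript's actual configuration:

    from dataclasses import dataclass, field

    @dataclass
    class ProjectSettings:
        # Illustrative options only; eScript's real settings may differ.
        target_error: float = 0.0001  # stop training below this error rate
        threshold: int = 128          # character/background cut-off (0-255)

    @dataclass
    class Project:
        name: str
        settings: ProjectSettings = field(default_factory=ProjectSettings)
        documents: list[str] = field(default_factory=list)  # scanned images
        model_path: str | None = None  # one trained model per project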

(Screenshot to be added)

Import Your Documents

Documents are imported into projects for training and transcription. You can import individual documents, or group documents into folders and import the folders. Documents must be imported before they can be used. By 'documents' we mean scanned images of documents; any common image format is supported.
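
As a rough illustration of the folder import, collecting candidate images might look like this (the helper and the extension list are assumptions; eScript's importer is built in):

    from pathlib import Path

    # "Any common image format": typical raster extensions assumed here.
    SUPPORTED = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}

    def find_documents(folder: str) -> list[Path]:
        """Collect every supported image under `folder`, ready for import."""
        return sorted(p for p in Path(folder).rglob("*")
                      if p.suffix.lower() in SUPPORTED)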

Prepare Your Documents

Individual documents can be prepared in various ways. You can define a border around the document to eliminate edge areas that contain no genuine characters but may contain what we refer to as noise: extraneous matter picked up by the scanning process, such as the edge of the document or scanner shadow. You can also adjust the threshold between characters and the background, which matters where the background is fairly dark.
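
In image-processing terms, the border crops away edge noise and the threshold turns the greyscale scan into black characters on a white background. A minimal sketch with Pillow, where the fixed margin and cut-off are assumptions (in eScript you set both interactively):

    from PIL import Image

    def prepare(path: str, margin: int = 50, cutoff: int = 128) -> Image.Image:
        """Crop a fixed border, then binarise: dark pixels -> black, rest -> white."""
        img = Image.open(path).convert("L")  # greyscale
        w, h = img.size
        img = img.crop((margin, margin, w - margin, h - margin))
        # Raise `cutoff` for scans with a fairly dark background.
        return img.point(lambda p: 0 if p < cutoff else 255, mode="1")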

(Screenshot to be added)

Prepare for Training

Identify the glyphs you want to use in training by tagging them with the computer-readable character, and any characteristics, that you want to associate with each glyph. If necessary you can associate more than one character with a glyph. When you have tagged all your glyphs (and you only need to tag one glyph per character you want to identify), you can automatically generate a series of variant image files from those glyphs: the system constructs new image files from each tagged glyph by stretching, rotating, scaling and thinning it. This provides more varied input to the training process (see below). Don't worry if you don't tag all glyphs initially; you can come back and add more later.
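
This variant generation is a standard data-augmentation step. A sketch with Pillow of what the stretch, rotate, scale and thin transforms could look like (the specific angles and factors are illustrative assumptions, not eScript's values):

    from PIL import Image, ImageFilter

    def make_variants(glyph: Image.Image) -> list[Image.Image]:
        """Produce stretched, rotated, scaled and thinned copies of one glyph.

        Assumes a greyscale glyph: dark strokes on a white background.
        """
        w, h = glyph.size
        return [
            glyph.resize((int(w * 1.2), h)),              # horizontal stretch
            glyph.rotate(5, fillcolor=255, expand=True),  # slight rotation
            glyph.rotate(-5, fillcolor=255, expand=True),
            glyph.resize((w // 2, h // 2)),               # scale down
            glyph.filter(ImageFilter.MaxFilter(3)),       # erode dark strokes (thin)
        ]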

Train Your Model

Once you are happy that you have tagged all your glyphs and created the corresponding variant image files, you can train the neural network model.

This involves submitting each image file to the model together with its tag details, and can therefore take considerable time to run. Once all image files have been submitted, the deep-learning process calculates an error rate against the model's initial parameters. If that error rate is greater than the target error (currently 0.0001, though this is adjustable), the model automatically adjusts those parameters and the evaluation is repeated, until the error falls within the permitted rate. The model is then saved and can be used repeatedly against the documents you want to transcribe.
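
eScript's network and training internals are not described here, so the following is only a generic sketch of the train-until-target-error loop, written with PyTorch; the tiny fully connected model, the optimiser and the 32x32 input size are all assumptions:

    import torch
    from torch import nn

    def train(images: torch.Tensor, labels: torch.Tensor,
              n_classes: int, target_error: float = 0.0001) -> nn.Module:
        """Repeat forward/backward passes until the loss drops below target_error."""
        model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128),
                              nn.ReLU(), nn.Linear(128, n_classes))
        optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()
        loss = torch.tensor(float("inf"))
        while loss.item() > target_error:  # may take considerable time
            optimiser.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()                # adjust parameters automatically
            optimiser.step()
        return model

Once the loop exits, saving the model (for example with torch.save) corresponds to the step above where the trained model is stored for repeated use.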

(Screenshot to be added)

Transcribe

Once you have a trained model, you can start to transcribe your documents. You can transcribe as many documents as you wish, but they need to be related (e.g. from the same hand) if the transcription is to make sense.

(Screenshot to be added)

Evaluate Results

Once the transcription is complete, you can examine and evaluate the results. This lets you see how the model has interpreted the document. You will see the following:

  • glyphs recognised as a single character
  • glyphs recognised as multiple possible characters
  • unrecognised glyphs
  • spaces between words and paragraphs

The evaluation is presented row by row against the original document, to help you focus on specific areas. You can change any of the model's decisions and enter your own identification for a character.
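
One plausible way to derive the first three outcomes above is to threshold the model's per-character probabilities; spaces are detected from gaps in the document layout rather than by the classifier. In the sketch below the 0.9 and 0.2 cut-offs are invented for illustration:

    import torch

    def classify_glyph(probs: torch.Tensor, alphabet: str) -> tuple[str, list[str]]:
        """Map a probability vector over `alphabet` to one of the outcomes."""
        if probs.max() > 0.9:                     # one clear winner
            return "single", [alphabet[int(probs.argmax())]]
        candidates = [alphabet[i] for i, p in enumerate(probs) if p > 0.2]
        if len(candidates) > 1:                   # several plausible readings
            return "multiple", candidates
        return "unrecognised", []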

(Screenshot to be added)

Feedback Results

You can feed back the results of your changes for use in a later or a repeat transcription.
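
A plausible reading of this step, sketched below with hypothetical data shapes: each manual correction becomes a newly labelled glyph that joins the training data before a retrain or a repeat transcription.

    def feed_back(corrections: dict[str, str], training_set: dict[str, str]) -> None:
        """Merge manual fixes (glyph-image path -> character) into the labelled set.

        Hypothetical data shapes; eScript handles this internally.
        """
        for glyph_path, character in corrections.items():
            training_set[glyph_path] = character  # later/repeat runs see the fix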

(Screenshot to be added)

Output Results

At any point during the evaluation you can output the results of the transcription as an RTF file, ready to incorporate into Word documents and other editors.
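
RTF is a plain-text format, so exporting a transcription is straightforward. A minimal sketch (not eScript's actual exporter; the Unicode escaping is simplified):

    def write_rtf(text: str, path: str) -> None:
        """Save `text` as a minimal RTF file that Word and most editors accept."""
        escaped = []
        for ch in text:
            if ch in "\\{}":
                escaped.append("\\" + ch)         # RTF control characters
            elif ord(ch) > 127:
                escaped.append(f"\\u{ord(ch)}?")  # non-ASCII as a Unicode escape
            elif ch == "\n":
                escaped.append("\\par ")
            else:
                escaped.append(ch)
        body = "".join(escaped)
        with open(path, "w", encoding="ascii") as f:
            f.write("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}"
                    f"\\f0\\fs24 {body}}}")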

(Screenshot to be added)

Want to discover all the features?

Take a Tour

Take a quick tour to see how it works

Ready to find out more about digital palaeography and eScript?

Or feel you can offer help to the project?

Contact us to start the dialogue

Contact Us Now