ocrd_validators.page_validator module

API for validating OcrdPage.

exception ocrd_validators.page_validator.ConsistencyError(tag, ID, actual, expected)[source]

Bases: Exception

Exception representing a consistency error between the text results at different transcription levels of a PAGE-XML file.

class ocrd_validators.page_validator.PageValidator(page, strictness, strategy)[source]

Bases: object

Validator for OcrdPage.

static validate(filename=None, ocrd_page=None, ocrd_file=None, strictness='strict', strategy='index1')[source]

Validates a PAGE file for consistency, given either a filename, an OcrdFile, or an OcrdPage instance directly.

Parameters:
  • filename (string) – Path to PAGE
  • ocrd_page (OcrdPage) – OcrdPage instance
  • ocrd_file (OcrdFile) – OcrdFile instance wrapping OcrdPage
  • strictness (string) – ‘strict’, ‘lax’, ‘fix’ or ‘off’
  • strategy (string) – Currently only ‘index1’
Returns:
  • report (ValidationReport) – Report on the validity
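
A minimal usage sketch of the signature documented above. The wrapper name `validate_page_file` is hypothetical, and the import is deferred into the function so that the snippet can be loaded even where the OCR-D packages are not installed. How to inspect the returned ValidationReport is not covered in this section, so the sketch simply returns it.

```python
def validate_page_file(path, strictness="strict"):
    """Validate the PAGE file at `path` and return the resulting report.

    Hypothetical convenience wrapper; only the keyword arguments listed
    in the parameter list above are used.
    """
    # Deferred import: requires the OCR-D core packages at call time.
    from ocrd_validators.page_validator import PageValidator

    return PageValidator.validate(
        filename=path,
        strictness=strictness,  # 'strict', 'lax', 'fix' or 'off'
        strategy="index1",      # currently the only documented strategy
    )
```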

ocrd_validators.page_validator.compare_without_whitespace(a, b)[source]

Compare two strings, ignoring all whitespace.
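
The comparison described above can be sketched in plain Python as follows. This is an illustration of the idea, not necessarily how the library implements it; the function name `equal_ignoring_whitespace` is hypothetical.

```python
import re


def equal_ignoring_whitespace(a: str, b: str) -> bool:
    """Return True if a and b are identical once all whitespace is removed."""
    def strip_ws(s: str) -> str:
        # \s matches spaces, tabs, newlines and other Unicode whitespace.
        return re.sub(r"\s+", "", s)

    return strip_ws(a) == strip_ws(b)
```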

ocrd_validators.page_validator.concatenate_children(node, concatenate_with, strategy)[source]

Concatenate children of node according to https://ocr-d.github.io/page#consistency-of-text-results-on-different-levels
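
As a rough illustration of the linked rule, the sketch below joins child texts with a separator that depends on the parent level. The separator table reflects one reading of that rule (lines joined by a newline, words by a single space, glyphs by nothing) and should be checked against the specification. The names `SEPARATORS` and `join_child_texts` are hypothetical, and the sketch works on plain strings rather than on PAGE objects.

```python
# Assumed separators, keyed by the parent element whose children are joined.
SEPARATORS = {
    "TextRegion": "\n",  # a region's text is its lines joined by newlines
    "TextLine": " ",     # a line's text is its words joined by single spaces
    "Word": "",          # a word's text is its glyphs joined without separator
}


def join_child_texts(parent_tag, child_texts):
    """Concatenate the child texts with the separator assumed for parent_tag."""
    return SEPARATORS[parent_tag].join(child_texts)
```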

ocrd_validators.page_validator.get_text(node, strategy)[source]

Get the most confident text result: the one with @index = 1 if present, otherwise the first text result, otherwise the empty string.
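
The selection order can be sketched with the standard library's ElementTree on a simplified, namespace-free fragment. The real function operates on OcrdPage objects, so the function name `most_confident_text` and the bare `TextEquiv`/`Unicode` lookups here are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def most_confident_text(element):
    """Return the text of the TextEquiv with index="1", else of the first
    TextEquiv, else the empty string."""
    equivs = element.findall("TextEquiv")  # direct children only
    if not equivs:
        return ""
    chosen = next((e for e in equivs if e.get("index") == "1"), equivs[0])
    return chosen.findtext("Unicode", default="")


word = ET.fromstring(
    '<Word id="w1">'
    '<TextEquiv index="2"><Unicode>Helio</Unicode></TextEquiv>'
    '<TextEquiv index="1"><Unicode>Hello</Unicode></TextEquiv>'
    "</Word>"
)
print(most_confident_text(word))
```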

ocrd_validators.page_validator.handle_inconsistencies(node, strictness, strategy, report)[source]

Check whether the text results of an element are consistent with the text results of its child elements.
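
This section lists the strictness levels by name only, so the sketch below shows one plausible interpretation and is an assumption throughout: 'off' skips the check, 'fix' replaces the parent text with the concatenated child text, 'lax' tolerates whitespace-only differences, and 'strict' reports any difference. The function name `check_consistency` and its return shape are hypothetical, and it works on plain strings rather than on PAGE nodes.

```python
def check_consistency(parent_text, child_text, strictness="strict"):
    """Return (text_to_keep, error_message_or_None) under the assumed rules."""
    # Nothing to compare, or check disabled, or texts already agree.
    if strictness == "off" or not parent_text or not child_text:
        return parent_text, None
    if parent_text == child_text:
        return parent_text, None
    if strictness == "fix":
        # Assumed repair: trust the children and overwrite the parent.
        return child_text, None
    if strictness == "lax":
        # Assumed leniency: ignore differences that are only whitespace.
        if "".join(parent_text.split()) == "".join(child_text.split()):
            return parent_text, None
    return parent_text, f"expected {child_text!r}, got {parent_text!r}"
```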

ocrd_validators.page_validator.set_text(node, text, strategy)[source]

Set the most confident text result: the one with @index = 1 if present, otherwise the first text result; if there is none, add a new one.
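
The corresponding write path can be sketched the same way with ElementTree on a simplified, namespace-free fragment. As before, this is only an illustration under that assumption; the function name `set_most_confident_text` is hypothetical, and the real function works on OcrdPage objects.

```python
import xml.etree.ElementTree as ET


def set_most_confident_text(element, text):
    """Overwrite the TextEquiv with index="1", else the first TextEquiv;
    create a new TextEquiv if the element has none."""
    equivs = element.findall("TextEquiv")
    if equivs:
        chosen = next((e for e in equivs if e.get("index") == "1"), equivs[0])
    else:
        chosen = ET.SubElement(element, "TextEquiv")
    unicode_el = chosen.find("Unicode")
    # Compare with None explicitly: a childless Element is falsy.
    if unicode_el is None:
        unicode_el = ET.SubElement(chosen, "Unicode")
    unicode_el.text = text


line = ET.fromstring("<TextLine/>")
set_most_confident_text(line, "Hello world")
print(ET.tostring(line, encoding="unicode"))
```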