ocrd_validators package

Validators for various OCR-D related data structures.

class ocrd_validators.ParameterValidator(ocrd_tool)[source]

Bases: ocrd_validators.json_validator.JsonValidator

JsonValidator validating parameters against ocrd-tool.json.

validate(*args, **kwargs)[source]

Validate a parameter dict against a parameter schema from an ocrd-tool.json.

Parameters:
  • obj (dict) – Parameter dict to validate
  • schema (dict) – Parameter schema from an ocrd-tool.json
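
As a rough illustration of what checking a parameter dict against the `parameters` section of an ocrd-tool.json involves, the stdlib-only sketch below checks required keys, types and `enum` values and fills in defaults. The function `check_parameters` and its error messages are hypothetical and not part of ocrd_validators; the actual class builds on JsonValidator and may behave differently in detail.

```python
# Hypothetical stand-in, NOT the ocrd_validators implementation.
# Maps the parameter types allowed by the ocrd-tool.json schema to Python types.
TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def check_parameters(obj, schema):
    """Check parameter dict `obj` against parameter descriptions `schema`.

    Returns a tuple (filled, errors): a copy of `obj` with defaults
    filled in, and a list of human-readable error strings.
    """
    errors = []
    filled = dict(obj)
    # Parameters the tool does not declare are reported as errors.
    for name in obj:
        if name not in schema:
            errors.append("unknown parameter: %s" % name)
    for name, spec in schema.items():
        if name not in filled:
            if "default" in spec:
                filled[name] = spec["default"]
            elif spec.get("required", False):
                errors.append("missing required parameter: %s" % name)
            continue
        value = filled[name]
        type_ok = isinstance(value, TYPE_MAP[spec["type"]])
        # bool is a subclass of int in Python, so reject it for numbers.
        if spec["type"] == "number" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            errors.append("parameter %s is not of type %s" % (name, spec["type"]))
        elif "enum" in spec and value not in spec["enum"]:
            errors.append("parameter %s is not one of %r" % (name, spec["enum"]))
    return filled, errors
```

Returning the errors as a list rather than raising on the first problem mirrors the report-style validation used elsewhere in this package, where all findings are collected before a verdict is given.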
class ocrd_validators.WorkspaceValidator(resolver, mets_url, src_dir=None, skip=None, download=False, page_strictness='strict')[source]

Bases: object

Validates an OCR-D/METS workspace against the specs.

static validate(*args, **kwargs)[source]

Validates the workspace of a METS URL against the specs.

Parameters:
  • resolver (ocrd.Resolver) – Resolver
  • mets_url (string) – URL of the METS file
  • src_dir (string, None) – Directory containing the METS file
  • skip (list) – Tests to skip. One or more of ‘mets_unique_identifier’, ‘mets_file_group_names’, ‘mets_files’, ‘pixel_density’, ‘dimension’, ‘url’
  • download (boolean) – Whether to download files
Returns:

report (ValidationReport) Report on the validity

class ocrd_validators.PageValidator(page, strictness, strategy)[source]

Bases: object

Validator for OcrdPage <../ocrd_models/ocrd_models.ocrd_page.html>.

static validate(filename=None, ocrd_page=None, ocrd_file=None, strictness='strict', strategy='index1')[source]

Validates a PAGE file for consistency by filename, OcrdFile or passing OcrdPage directly.

Parameters:
  • filename (string) – Path to PAGE
  • ocrd_page (OcrdPage) – OcrdPage instance
  • ocrd_file (OcrdFile) – OcrdFile instance wrapping OcrdPage
  • strictness (string) – ‘strict’, ‘lax’, ‘fix’ or ‘off’
  • strategy (string) – Currently only ‘index1’
Returns:

report (ValidationReport) Report on the validity
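
This section does not spell out what "consistency" means. One plausible reading, assumed here, is that the text of a parent element should match the text of its child elements joined together, with `strictness` controlling how exact the match must be. The sketch below illustrates that reading only; `check_consistency` and the `joiner` argument are hypothetical and not part of PageValidator.

```python
# Hypothetical illustration of the four strictness levels.
# Assumption: consistency means parent text == joined child texts.
STRICTNESS_LEVELS = ("strict", "lax", "fix", "off")


def check_consistency(parent_text, child_texts, strictness="strict", joiner=" "):
    """Return (text, ok) for one parent element and its children.

    strict: parent text must equal the joined child texts exactly
    lax:    as strict, but all whitespace is ignored in the comparison
    fix:    the parent text is replaced by the joined child texts
    off:    no comparison is made
    """
    if strictness not in STRICTNESS_LEVELS:
        raise ValueError("unknown strictness: %r" % strictness)
    joined = joiner.join(child_texts)
    if strictness == "off":
        return parent_text, True
    if strictness == "fix":
        return joined, True
    if strictness == "lax":
        # str.split() without arguments splits on any whitespace run.
        ok = "".join(parent_text.split()) == "".join(joined.split())
    else:
        ok = parent_text == joined
    return parent_text, ok
```

Under this assumption, a line reading "Hello  world" (two spaces) over the words "Hello" and "world" would fail with 'strict' and pass with 'lax'.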

class ocrd_validators.OcrdToolValidator(schema, validator_class=<class 'jsonschema.validators.create.<locals>.Validator'>)[source]

Bases: ocrd_validators.json_validator.JsonValidator

JsonValidator validating against the ocrd-tool.json schema.

static validate(obj, schema={'additionalProperties': False, 'description': 'Schema for tools by OCR-D MP', 'properties': {'version': {'description': 'Version of the tool, expressed as MAJOR.MINOR.PATCH.', 'type': 'string', 'pattern': '^[0-9]+\\.[0-9]+\\.[0-9]+$'}, 'git_url': {'description': 'Github/Gitlab URL', 'type': 'string', 'format': 'url'}, 'dockerhub': {'description': 'DockerHub image', 'type': 'string'}, 'tools': {'type': 'object', 'additionalProperties': False, 'patternProperties': {'ocrd-.*': {'type': 'object', 'additionalProperties': False, 'required': ['description', 'steps', 'executable', 'categories', 'input_file_grp'], 'properties': {'executable': {'description': 'The name of the CLI executable in $PATH', 'type': 'string'}, 'input_file_grp': {'description': 'Input fileGrp@USE this tool expects by default', 'type': 'array', 'items': {'type': 'string', 'pattern': '^OCR-D-[A-Z0-9-]+$'}}, 'output_file_grp': {'description': 'Output fileGrp@USE this tool produces by default', 'type': 'array', 'items': {'type': 'string', 'pattern': '^OCR-D-[A-Z0-9-]+$'}}, 'parameters': {'description': 'Object describing the parameters of a tool. 
Keys are parameter names, values sub-schemas.', 'type': 'object', 'patternProperties': {'.*': {'type': 'object', 'additionalProperties': False, 'required': ['description', 'type'], 'properties': {'type': {'type': 'string', 'description': 'Data type of this parameter', 'enum': ['string', 'number', 'boolean']}, 'format': {'description': 'Subtype, such as `float` for type `number` or `uri` for type `string`.'}, 'description': {'description': 'Concise description of syntax and semantics of this parameter'}, 'required': {'type': 'boolean', 'description': 'Whether this parameter is required'}, 'default': {'description': 'Default value when not provided by the user'}, 'enum': {'type': 'array', 'description': 'List the allowed values if a fixed list.'}, 'content-type': {'type': 'string', 'description': 'If parameter is reference to file: Media type of the file'}, 'cacheable': {'type': 'boolean', 'description': "If parameter is reference to file: Whether the file should be cached, e.g. because it is large and won't change.", 'default': False}}}}}, 'description': {'description': 'Concise description what the tool does'}, 'categories': {'description': 'Tools belong to this categories, representing modules within the OCR-D project structure', 'type': 'array', 'items': {'type': 'string', 'enum': ['Image preprocessing', 'Layout analysis', 'Text recognition and optimization', 'Model training', 'Long-term preservation', 'Quality assurance']}}, 'steps': {'description': 'This tool can be used at these steps in the OCR-D functional model', 'type': 'array', 'items': {'type': 'string', 'enum': ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 
'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/text-nontext', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']}}}}}}}, 'required': ['version', 'git_url', 'tools'], 'type': 'object'})[source]

Validate against ocrd-tool.json schema.
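
The default schema shown in the signature above requires the top-level keys `version`, `git_url` and `tools`, a MAJOR.MINOR.PATCH version string, and file group names matching `^OCR-D-[A-Z0-9-]+$`. The stdlib-only sketch below checks just those few constraints to make them concrete. The function `check_ocrd_tool` is hypothetical; the signature suggests the real class delegates to a jsonschema validator, which covers the full schema.

```python
import re

# Patterns and required keys copied from the default schema shown above.
VERSION_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+$")
FILE_GRP_RE = re.compile(r"^OCR-D-[A-Z0-9-]+$")
TOOL_NAME_RE = re.compile(r"ocrd-.*")
REQUIRED_TOP = ["version", "git_url", "tools"]
REQUIRED_TOOL = ["description", "steps", "executable", "categories", "input_file_grp"]


def check_ocrd_tool(obj):
    """Return a list of error strings for a parsed ocrd-tool.json dict.

    Hypothetical partial check; it covers only a subset of the schema.
    """
    errors = []
    for key in REQUIRED_TOP:
        if key not in obj:
            errors.append("missing top-level key: %s" % key)
    version = obj.get("version")
    if isinstance(version, str) and not VERSION_RE.match(version):
        errors.append("version is not MAJOR.MINOR.PATCH: %s" % version)
    for name, tool in obj.get("tools", {}).items():
        # JSON Schema patterns are unanchored, hence search() and not match().
        if not TOOL_NAME_RE.search(name):
            errors.append("tool name does not match 'ocrd-.*': %s" % name)
        for key in REQUIRED_TOOL:
            if key not in tool:
                errors.append("tool %s: missing key %s" % (name, key))
        for grp_key in ("input_file_grp", "output_file_grp"):
            for grp in tool.get(grp_key, []):
                if not FILE_GRP_RE.match(grp):
                    errors.append("tool %s: bad %s entry %s" % (name, grp_key, grp))
    return errors
```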

class ocrd_validators.OcrdZipValidator(resolver, path_to_zip)[source]

Bases: object

Validate conformance with BagIt and OCR-D bagit profile.

See:
validate(skip_checksums=False, skip_bag=False, skip_unzip=False, skip_delete=False, processes=2)[source]

Validate an OCRD-ZIP file for profile, bag and workspace conformance.

Parameters:
  • skip_bag (boolean) – Whether to skip all checks of manifests and files
  • skip_checksums (boolean) – Whether to omit checksum checks but still check basic BagIt conformance
  • skip_unzip (boolean) – Whether the OCRD-ZIP is unzipped, i.e. a directory
  • skip_delete (boolean) – Whether to skip deleting the unpacked OCRD-ZIP dir after validation
  • processes (integer) – Number of processes used for checksum validation
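
To make the checksum step concrete: a BagIt payload manifest (`manifest-<algorithm>.txt`) lists one checksum and one file path per line, and each file is re-hashed and compared against its entry. The sketch below does this with the standard library only. It is an illustration, not the OcrdZipValidator implementation: `verify_manifest` and `_check_entry` are hypothetical names, and the sketch uses a thread pool with a `workers` argument as a stand-in for the `processes` parameter.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def _check_entry(bag_dir, algorithm, expected, relpath):
    """Return an error string if the file is missing or its digest differs."""
    path = os.path.join(bag_dir, relpath)
    if not os.path.isfile(path):
        return "missing file: %s" % relpath
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected.lower():
        return "checksum mismatch: %s" % relpath
    return None


def verify_manifest(bag_dir, algorithm="sha512", workers=2):
    """Verify all entries of manifest-<algorithm>.txt in an unpacked bag.

    Hypothetical helper; returns a list of error strings (empty if all match).
    """
    manifest = os.path.join(bag_dir, "manifest-%s.txt" % algorithm)
    entries = []
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue
            expected, relpath = line.split(None, 1)
            entries.append((expected, relpath))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda e: _check_entry(bag_dir, algorithm, *e), entries)
        return [r for r in results if r is not None]
```

Hashing is dominated by file reads, so spreading entries over several workers can shorten validation of bags with many files; whether threads or processes pay off more depends on the platform and file sizes.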
class ocrd_validators.ValidationReport[source]

Bases: object

Container of notices, warnings and errors about a workspace.

add_error(msg)[source]

Add an error.

add_notice(msg)[source]

Add a notice.

add_warning(msg)[source]

Add a warning.

is_valid

Whether the report contains neither errors nor warnings.

merge_report(otherself)[source]

Merge another report into this one.

to_xml()[source]

Serialize to XML.
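
The interface above can be mimicked with a small stand-in class, which may help when reading code that consumes reports. The class `SketchReport` below is hypothetical: it mirrors only the documented method names and the documented meaning of `is_valid`, and the XML layout it produces is an assumption, not the format emitted by ValidationReport.

```python
import xml.etree.ElementTree as ET


class SketchReport:
    """Hypothetical stand-in mirroring the documented ValidationReport interface."""

    def __init__(self):
        self.notices = []
        self.warnings = []
        self.errors = []

    def add_notice(self, msg):
        self.notices.append(msg)

    def add_warning(self, msg):
        self.warnings.append(msg)

    def add_error(self, msg):
        self.errors.append(msg)

    @property
    def is_valid(self):
        # Valid means neither errors nor warnings; notices do not count.
        return not self.errors and not self.warnings

    def merge_report(self, otherself):
        # Append the other report's entries to this one.
        self.notices.extend(otherself.notices)
        self.warnings.extend(otherself.warnings)
        self.errors.extend(otherself.errors)

    def to_xml(self):
        # Assumed layout: <report valid="..."> with one child per message.
        root = ET.Element("report", valid="true" if self.is_valid else "false")
        groups = (("notice", self.notices), ("warning", self.warnings), ("error", self.errors))
        for tag, msgs in groups:
            for msg in msgs:
                ET.SubElement(root, tag).text = msg
        return ET.tostring(root, encoding="unicode")
```

A typical pattern is to give each sub-check its own report and merge the results into one overall report, so that a single `is_valid` lookup reflects every check that ran.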