ocrd_models.ocrd_mets module

API to METS

class ocrd_models.ocrd_mets.OcrdMets(**kwargs)[source]

Bases: OcrdXmlDocument

API to a single METS file

static empty_mets(now=None, cache_flag=False)[source]

Create an empty METS file from bundled template.

refresh_caches()[source]

property unique_identifier

Get the unique identifier by looking through mods:identifier. See specs for details.

property agents

List all ocrd_models.ocrd_agent.OcrdAgent entries.

add_agent(*args, **kwargs)[source]

Add an ocrd_models.ocrd_agent.OcrdAgent to the list of agents in the metsHdr.

property file_groups

List the @USE of all mets:fileGrp entries.

find_all_files(*args, **kwargs)[source]

Like find_files(), but return a list of all results. Equivalent to list(self.find_files(...)).

find_files(ID=None, fileGrp=None, pageId=None, mimetype=None, url=None, local_filename=None, local_only=False)[source]

Search mets:file entries in this METS document and yield results.

The ID, pageId, fileGrp, url and mimetype parameters can each be either a literal string, or a regular expression if the string starts with // (double slash).

If it is a regex, the leading // is removed and candidates are matched against the regex with re.fullmatch. If it is a literal string, comparison is done with string equality.

The pageId parameter supports the numeric range operator "..". For example, to find all files in pages PHYS_0001 to PHYS_0003, PHYS_0001..PHYS_0003 will be expanded to PHYS_0001,PHYS_0002,PHYS_0003.

Keyword Arguments:
  • ID (string) – @ID of the mets:file

  • fileGrp (string) – @USE of the mets:fileGrp to list files of

  • pageId (string) – @ID of the corresponding physical mets:structMap entry (physical page)

  • url (string) – @xlink:href remote/original URL of mets:FLocat of mets:file

  • local_filename (string) – @xlink:href local/cached filename of mets:FLocat of mets:file

  • mimetype (string) – @MIMETYPE of mets:file

  • local_only (boolean) – Whether to restrict results to local files in the filesystem

Yields:

ocrd_models.ocrd_file.OcrdFile instantiations
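The matching rules described for find_files() can be illustrated without the library. The following is a minimal standalone sketch, not the library's implementation: the helper names matches and expand_page_range are hypothetical, and the range expansion assumes both endpoints share the same non-numeric prefix and that the zero-padding width of the start value is kept.

```python
import re


def matches(pattern, candidate):
    """Literal comparison, or re.fullmatch if pattern starts with '//'."""
    if pattern.startswith("//"):
        # Strip the leading double slash and match the whole candidate.
        return re.fullmatch(pattern[2:], candidate) is not None
    return pattern == candidate


def expand_page_range(spec):
    """Expand 'PHYS_0001..PHYS_0003' into the individual page IDs.

    Assumption of this sketch: both endpoints consist of a common prefix
    followed by digits. A spec without '..' is returned unchanged.
    """
    if ".." not in spec:
        return [spec]
    start, end = spec.split("..", 1)
    m_start = re.fullmatch(r"(.*?)(\d+)", start)
    m_end = re.fullmatch(r"(.*?)(\d+)", end)
    if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
        raise ValueError(f"Cannot expand page range {spec!r}")
    prefix, width = m_start.group(1), len(m_start.group(2))
    first, last = int(m_start.group(2)), int(m_end.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


file_groups = ["OCR-D-IMG", "OCR-D-SEG-PAGE", "MAX"]
# A regex query selects every group whose full @USE matches.
selected = [g for g in file_groups if matches("//OCR-D-.*", g)]
pages = expand_page_range("PHYS_0001..PHYS_0003")
```

Because re.fullmatch is used, a regex such as //OCR-D does not match OCR-D-IMG; the pattern has to cover the entire value.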

add_file_group(fileGrp)[source]

Add a new mets:fileGrp.

Parameters:
  • fileGrp (string) – @USE of the new mets:fileGrp.

rename_file_group(old, new)[source]

Rename a mets:fileGrp by changing the @USE from old to new.

remove_file_group(USE, recursive=False, force=False)[source]

Remove a mets:fileGrp (single fixed @USE or multiple regex @USE).

Parameters:
  • USE (string) – @USE of the mets:fileGrp to delete. Can be a regex if prefixed with //

  • recursive (boolean) – Whether to recursively delete each mets:file in the group

  • force (boolean) – Do not raise an exception if mets:fileGrp does not exist

add_file(fileGrp, mimetype=None, url=None, ID=None, pageId=None, force=False, local_filename=None, ignore=False, **kwargs)[source]

Instantiate and add a new ocrd_models.ocrd_file.OcrdFile.

Parameters:
  • fileGrp (string) – @USE of mets:fileGrp to add to

Keyword Arguments:
  • mimetype (string) – @MIMETYPE of the mets:file to use

  • url (string) – @xlink:href (URL or path) of the mets:file to use

  • ID (string) – @ID of the mets:file to use

  • pageId (string) – @ID in the physical mets:structMap to link to

  • force (boolean) – Whether to add the file even if a mets:file with the same @ID already exists.

  • ignore (boolean) – Do not look for existing files at all. Shift responsibility for preventing errors from duplicate ID to the user.

  • local_filename (string) –
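The interaction of force and ignore can be pictured with a toy model. This sketch does not use the library: the class FileRegistry is hypothetical, and two details are assumptions that the text above does not state, namely that a duplicate @ID without force raises FileExistsError and that force replaces the existing entry.

```python
class FileRegistry:
    """Toy stand-in for the mets:file entries of one document."""

    def __init__(self):
        # List of (ID, fileGrp) tuples; duplicates are possible with ignore=True.
        self.entries = []

    def add_file(self, fileGrp, ID, force=False, ignore=False):
        if not ignore:
            # Look for an existing entry with the same ID.
            if any(entry_id == ID for entry_id, _ in self.entries):
                if not force:
                    # Assumed error type for this sketch.
                    raise FileExistsError(f"File with ID {ID!r} already exists")
                # Assumed behaviour of force: drop the old entry first.
                self.entries = [e for e in self.entries if e[0] != ID]
        # With ignore=True no lookup happens; the caller must avoid duplicates.
        self.entries.append((ID, fileGrp))


registry = FileRegistry()
registry.add_file("OCR-D-IMG", "FILE_0001")
registry.add_file("OCR-D-BIN", "FILE_0001", force=True)   # replaces the entry
registry.add_file("OCR-D-SEG", "FILE_0001", ignore=True)  # duplicate ID, unchecked
```

Skipping the lookup with ignore avoids a search per added file, at the cost of leaving duplicate detection to the caller.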

remove_file(*args, **kwargs)[source]

Delete each mets:file matching the query. Same arguments as find_files().

remove_one_file(ID, fileGrp=None)[source]

Delete an existing ocrd_models.ocrd_file.OcrdFile.

Parameters:
  • ID (string|OcrdFile) – @ID of the mets:file to delete. Can also be an ocrd_models.ocrd_file.OcrdFile to avoid search via ID.

  • fileGrp (string) – @USE of the mets:fileGrp containing the mets:file. Used only for optimization.

Returns:

The old ocrd_models.ocrd_file.OcrdFile reference.

property physical_pages

List all page IDs (the @ID of each physical mets:structMap mets:div)

get_physical_pages(for_fileIds=None)[source]

List all page IDs (the @ID of each physical mets:structMap mets:div), optionally for a subset of mets:file @ID for_fileIds.
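To show which part of the METS document these page IDs come from, here is a minimal sketch using only the standard library's xml.etree.ElementTree rather than OcrdMets itself. The sample document, the function name physical_pages, and the use of TYPE="page" on the page divs are assumptions of this illustration; the filtered variant returns pages linked to at least one of the given file IDs in document order, which may differ from the ordering the library returns.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

SAMPLE = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="IMG_0001"/>
        <mets:fptr FILEID="OCR_0001"/>
      </mets:div>
      <mets:div TYPE="page" ID="PHYS_0002">
        <mets:fptr FILEID="IMG_0002"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
""".strip()


def physical_pages(root, for_file_ids=None):
    """Return the @ID of each page div in the physical structMap."""
    pages = root.findall(
        ".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']", NS
    )
    if for_file_ids is None:
        return [page.get("ID") for page in pages]
    wanted = set(for_file_ids)
    # Keep only pages that reference at least one of the wanted files.
    return [
        page.get("ID")
        for page in pages
        if any(f.get("FILEID") in wanted for f in page.findall("mets:fptr", NS))
    ]


root = ET.fromstring(SAMPLE)
all_pages = physical_pages(root)
ocr_pages = physical_pages(root, for_file_ids=["OCR_0001"])
```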

set_physical_page_for_file(pageId, ocrd_file, order=None, orderlabel=None)[source]

Set the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file, creating all structures if necessary.

Parameters:
  • pageId (string) – @ID of the physical mets:structMap entry to use

  • ocrd_file (object) – existing ocrd_models.ocrd_file.OcrdFile object

Keyword Arguments:
  • order (string) – @ORDER to use

  • orderlabel (string) – @ORDERLABEL to use

get_physical_page_for_file(ocrd_file)[source]

Get the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file.

remove_physical_page(ID)[source]

Delete page (physical mets:structMap mets:div entry @ID) ID.

remove_physical_page_fptr(fileId)[source]

Delete all mets:fptr[@FILEID = fileId] references to mets:file[@ID = fileId] from all mets:div entries in the physical mets:structMap.

Returns:

List of pageIds that mets:fptrs were deleted from
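The effect on the document can be sketched with the standard library's xml.etree.ElementTree. This is an illustration of the described operation on a made-up sample document, not the library's code; the function name remove_fptrs is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

SAMPLE = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="IMG_0001"/>
        <mets:fptr FILEID="OCR_0001"/>
      </mets:div>
      <mets:div TYPE="page" ID="PHYS_0002">
        <mets:fptr FILEID="IMG_0002"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
""".strip()


def remove_fptrs(root, file_id):
    """Remove every mets:fptr pointing at file_id; return the affected div IDs."""
    affected = []
    divs = root.findall(".//mets:structMap[@TYPE='PHYSICAL']//mets:div", NS)
    for div in divs:
        # findall returns a list, so removing children while looping is safe.
        for fptr in div.findall("mets:fptr", NS):
            if fptr.get("FILEID") == file_id:
                div.remove(fptr)
                affected.append(div.get("ID"))
    return affected


root = ET.fromstring(SAMPLE)
touched = remove_fptrs(root, "OCR_0001")
remaining = [f.get("FILEID") for f in root.iter("{http://www.loc.gov/METS/}fptr")]
```

Only the pointers in the structMap are removed here; the mets:file entry itself is untouched, which matches the division of labour between this method and remove_one_file().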

merge(other_mets, force=False, fileGrp_mapping=None, fileId_mapping=None, pageId_mapping=None, after_add_cb=None, **kwargs)[source]

Add all files from other_mets. Accepts the same kwargs as find_files().

Keyword Arguments:
  • force (boolean) – Whether to call add_file() with force (overwriting existing mets:file entries)

  • fileGrp_mapping (dict) – Map other_mets fileGrp to fileGrp in this METS

  • fileId_mapping (dict) – Map other_mets file ID to file ID in this METS

  • pageId_mapping (dict) – Map other_mets page ID to page ID in this METS

  • after_add_cb (function) – Callback invoked after a file is added to the METS
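How the three mapping dictionaries translate identifiers during a merge can be sketched as follows. This is a standalone illustration on plain dictionaries, not the library's implementation: the function map_identifiers and the record layout are hypothetical, and it is an assumption of the sketch that identifiers without a mapping entry are carried over unchanged.

```python
def map_identifiers(files, fileGrp_mapping=None, fileId_mapping=None,
                    pageId_mapping=None):
    """Translate source identifiers to target identifiers via optional dicts."""
    fileGrp_mapping = fileGrp_mapping or {}
    fileId_mapping = fileId_mapping or {}
    pageId_mapping = pageId_mapping or {}
    mapped = []
    for f in files:
        mapped.append({
            # dict.get(key, key) falls back to the original identifier.
            "fileGrp": fileGrp_mapping.get(f["fileGrp"], f["fileGrp"]),
            "ID": fileId_mapping.get(f["ID"], f["ID"]),
            "pageId": pageId_mapping.get(f["pageId"], f["pageId"]),
        })
    return mapped


other_files = [
    {"fileGrp": "OCR-D-IMG", "ID": "FILE_0001", "pageId": "PHYS_0001"},
    {"fileGrp": "OCR-D-IMG", "ID": "FILE_0002", "pageId": "PHYS_0002"},
]
merged = map_identifiers(
    other_files,
    fileGrp_mapping={"OCR-D-IMG": "OCR-D-IMG-B"},
    pageId_mapping={"PHYS_0001": "PHYS_0101"},
)
```

Mapping identifiers on the way in is one way to avoid @ID collisions between the two documents without resorting to force.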