ocrd_models.ocrd_mets module
API to METS
- class ocrd_models.ocrd_mets.OcrdMets(**kwargs)
Bases: OcrdXmlDocument
API to a single METS file.
- static empty_mets(now=None, cache_flag=False)
Create an empty METS file from the bundled template.
- property unique_identifier
Get the unique identifier by looking through mods:identifier. See the specs for details.
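The lookup can be pictured as "try identifier types in a fixed priority order and take the first hit". The sketch below shows that with the standard library's ElementTree. The helper find_unique_identifier and the priority list IDENTIFIER_TYPES are assumptions made for illustration. They are not the ocrd_models implementation, whose exact type order should be checked against the specs.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical priority order; the order actually used by ocrd_models may differ.
IDENTIFIER_TYPES = ["purl", "urn", "doi", "url"]


def find_unique_identifier(root, types=IDENTIFIER_TYPES):
    """Hypothetical helper: return the text of the first mods:identifier whose
    @type matches, trying the types in priority order; None if nothing matches."""
    for id_type in types:
        found = root.find(f".//mods:identifier[@type='{id_type}']", NS)
        if found is not None:
            return found.text
    return None


xml = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:identifier type="url">https://example.org/doc</mods:identifier>
  <mods:identifier type="urn">urn:nbn:de:example-123</mods:identifier>
</mods:mods>"""
root = ET.fromstring(xml)
print(find_unique_identifier(root))  # urn:nbn:de:example-123 (urn outranks url here)
```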
- add_agent(*args, **kwargs)
Add an ocrd_models.ocrd_agent.OcrdAgent to the list of agents in the metsHdr.
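The sketch below shows the XML structure involved: a mets:agent element with a mets:name child, appended to mets:metsHdr. It uses plain ElementTree and a hypothetical helper add_agent_sketch rather than OcrdAgent. The default ROLE/TYPE/OTHERTYPE values are illustrative assumptions, not what ocrd_models writes.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}
ET.register_namespace("mets", METS)  # serialize with the usual "mets:" prefix


def q(tag):
    """Clark-notation name ({namespace}tag) for a METS element."""
    return f"{{{METS}}}{tag}"


def add_agent_sketch(root, name, role="CREATOR", agent_type="OTHER", othertype="SOFTWARE"):
    """Hypothetical helper: append a mets:agent to mets:metsHdr, creating the
    header as the first child of mets:mets if it is missing.
    The attribute defaults are illustrative assumptions."""
    hdr = root.find("mets:metsHdr", NS)
    if hdr is None:
        hdr = ET.Element(q("metsHdr"))
        root.insert(0, hdr)
    agent = ET.SubElement(hdr, q("agent"), ROLE=role, TYPE=agent_type, OTHERTYPE=othertype)
    ET.SubElement(agent, q("name")).text = name
    return agent


root = ET.Element(q("mets"))
add_agent_sketch(root, "ocrd-example-processor v0.1")
print(ET.tostring(root, encoding="unicode"))
```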
- find_all_files(*args, **kwargs)
Like find_files(), but return a list of all results. Equivalent to list(self.find_files(...)).
- find_files(ID=None, fileGrp=None, pageId=None, mimetype=None, url=None, local_filename=None, local_only=False)
Search mets:file entries in this METS document and yield results.
The ID, pageId, fileGrp, url and mimetype parameters can each be either a literal string, or a regular expression if the string starts with // (double slash). If it is a regex, the leading // is removed and candidates are matched against the regex with re.fullmatch. If it is a literal string, comparison is done with string equality.
The pageId parameter supports the numeric range operator "..". For example, to find all files on pages PHYS_0001 through PHYS_0003, pass PHYS_0001..PHYS_0003, which is expanded to PHYS_0001,PHYS_0002,PHYS_0003.
Keyword Arguments:
ID (string) – @ID of the mets:file
fileGrp (string) – @USE of the mets:fileGrp to list files of
pageId (string) – @ID of the corresponding physical mets:structMap entry (physical page)
url (string) – @xlink:href remote/original URL of the mets:FLocat of the mets:file
local_filename (string) – @xlink:href local/cached filename of the mets:FLocat of the mets:file
mimetype (string) – @MIMETYPE of the mets:file
local_only (boolean) – Whether to restrict results to local files in the filesystem
Yields:
ocrd_models.ocrd_file.OcrdFile instantiations
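The matching rules above can be sketched in plain Python. matches and expand_page_range are hypothetical helpers, not part of ocrd_models. The range expansion assumes that both ends share a prefix and a zero-padded numeric suffix, as in the PHYS_0001..PHYS_0003 example.

```python
import re


def matches(pattern, value):
    """Literal-or-regex rule described for find_files(): a pattern starting
    with '//' is a regex (prefix stripped, re.fullmatch); anything else
    must be equal to the value."""
    if pattern.startswith("//"):
        return re.fullmatch(pattern[2:], value) is not None
    return pattern == value


def expand_page_range(spec):
    """Hypothetical helper: expand 'PHYS_0001..PHYS_0003' into a list, assuming
    both ends share a prefix and a zero-padded numeric suffix."""
    start, end = spec.split("..")
    m_start = re.fullmatch(r"(.*?)(\d+)", start)
    m_end = re.fullmatch(r"(.*?)(\d+)", end)
    if not (m_start and m_end) or m_start.group(1) != m_end.group(1):
        raise ValueError(f"cannot expand range {spec!r}")
    prefix, width = m_start.group(1), len(m_start.group(2))
    first, last = int(m_start.group(2)), int(m_end.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


print(matches("//OCR-D-GT-.*", "OCR-D-GT-SEG-PAGE"))  # True
print(matches("OCR-D-IMG", "OCR-D-IMG-BIN"))        # False (literal, not a prefix match)
print(expand_page_range("PHYS_0001..PHYS_0003"))
# ['PHYS_0001', 'PHYS_0002', 'PHYS_0003']
```

Because the regex branch uses re.fullmatch, a pattern such as //OCR-D-GT does not match OCR-D-GT-SEG-PAGE. A trailing .* is needed for prefix-style queries.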
- add_file_group(fileGrp)
Add a new mets:fileGrp.
Parameters:
fileGrp (string) – @USE of the new mets:fileGrp
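At the XML level, adding a file group means putting a mets:fileGrp with the given @USE under mets:fileSec. The sketch below uses plain ElementTree and a hypothetical helper add_file_group_sketch. Reusing an existing group with the same @USE, instead of adding a duplicate, is a choice made for this sketch. For brevity it appends mets:fileSec at the end of the document, which is only schema-valid on an otherwise empty mets:mets.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def q(tag):
    """Clark-notation name ({namespace}tag) for a METS element."""
    return f"{{{METS}}}{tag}"


def add_file_group_sketch(root, use):
    """Hypothetical helper: return the mets:fileGrp with @USE == use,
    creating mets:fileSec and the group if they do not exist yet."""
    file_sec = root.find("mets:fileSec", NS)
    if file_sec is None:
        file_sec = ET.SubElement(root, q("fileSec"))
    for grp in file_sec.findall("mets:fileGrp", NS):
        if grp.get("USE") == use:
            return grp  # assumption: reuse instead of duplicating
    return ET.SubElement(file_sec, q("fileGrp"), USE=use)


root = ET.Element(q("mets"))
add_file_group_sketch(root, "OCR-D-IMG")
add_file_group_sketch(root, "OCR-D-IMG")  # second call reuses the group
print([g.get("USE") for g in root.iterfind("mets:fileSec/mets:fileGrp", NS)])
# ['OCR-D-IMG']
```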
- remove_file_group(USE, recursive=False, force=False)
Remove a mets:fileGrp (single fixed @USE or multiple regex @USE).
Parameters:
USE (string) – @USE of the mets:fileGrp to delete. Can be a regex if prefixed with //.
recursive (boolean) – Whether to recursively delete each mets:file in the group
force (boolean) – Do not raise an exception if the mets:fileGrp does not exist
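The interplay of the //-regex selector with the recursive and force flags can be sketched in ElementTree as follows. remove_file_group_sketch is a hypothetical helper, not the library method. Raising for a non-empty group unless recursive=True is an assumption about how the flag is enforced. Unlike the real method, the sketch does not clean up mets:structMap references.

```python
import re
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def remove_file_group_sketch(root, use, recursive=False, force=False):
    """Hypothetical helper mirroring the documented flags of remove_file_group():
    a '//'-prefixed USE is a regex (re.fullmatch), otherwise an exact match.
    Returns the @USE values of the removed groups."""
    file_sec = root.find("mets:fileSec", NS)
    groups = [] if file_sec is None else file_sec.findall("mets:fileGrp", NS)
    if use.startswith("//"):
        selected = [g for g in groups if re.fullmatch(use[2:], g.get("USE", ""))]
    else:
        selected = [g for g in groups if g.get("USE") == use]
    if not selected and not force:
        raise ValueError(f"no mets:fileGrp matching {use!r}")
    # Assumption: refuse to drop groups that still contain mets:file entries
    # unless recursive=True; check all groups before removing any.
    non_empty = [g.get("USE") for g in selected if g.find("mets:file", NS) is not None]
    if non_empty and not recursive:
        raise ValueError(f"fileGrp(s) not empty: {non_empty}")
    for grp in selected:
        file_sec.remove(grp)
    return [g.get("USE") for g in selected]


xml = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"><mets:fileSec>
  <mets:fileGrp USE="OCR-D-IMG"><mets:file ID="IMG_1"/></mets:fileGrp>
  <mets:fileGrp USE="OCR-D-BIN"/>
  <mets:fileGrp USE="OCR-D-OCR"/>
</mets:fileSec></mets:mets>"""
root = ET.fromstring(xml)
print(remove_file_group_sketch(root, "//OCR-D-(BIN|OCR)"))  # ['OCR-D-BIN', 'OCR-D-OCR']
```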
- add_file(fileGrp, mimetype=None, url=None, ID=None, pageId=None, force=False, local_filename=None, ignore=False, **kwargs)
Instantiate and add a new ocrd_models.ocrd_file.OcrdFile.
Parameters:
fileGrp (string) – @USE of the mets:fileGrp to add to
Keyword Arguments:
mimetype (string) – @MIMETYPE of the mets:file to use
url (string) – @xlink:href (URL or path) of the mets:file to use
ID (string) – @ID of the mets:file to use
pageId (string) – @ID in the physical mets:structMap to link to
force (boolean) – Whether to add the file even if a mets:file with the same @ID already exists
ignore (boolean) – Do not look for existing files at all. This shifts responsibility for preventing errors from duplicate IDs to the user.
local_filename (string) – local/cached filename of the mets:file
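The force and ignore flags differ in whether the duplicate-@ID check runs at all. The sketch below models mets:file entries as a plain list of a hypothetical FileEntry dataclass, not OcrdFile. It shows one plausible reading: ignore skips the lookup, force replaces the existing entry, and otherwise a duplicate raises. The replace-in-place behaviour of force is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical stand-in for a mets:file entry (not ocrd_models' OcrdFile)."""
    ID: str
    fileGrp: str
    mimetype: Optional[str] = None
    url: Optional[str] = None
    pageId: Optional[str] = None


def add_file_sketch(files: List[FileEntry], fileGrp, ID, force=False, ignore=False, **kwargs):
    """Hypothetical helper showing one reading of force/ignore:
    - ignore=True: skip the duplicate-@ID lookup entirely (duplicates possible)
    - force=True: replace an existing entry with the same @ID
    - otherwise: a duplicate @ID raises FileExistsError
    """
    entry = FileEntry(ID=ID, fileGrp=fileGrp, **kwargs)
    if not ignore:
        for i, existing in enumerate(files):
            if existing.ID == ID:
                if not force:
                    raise FileExistsError(f"mets:file with @ID {ID!r} already exists")
                files[i] = entry  # force: overwrite in place
                return entry
    files.append(entry)
    return entry


files = []
add_file_sketch(files, "OCR-D-IMG", "IMG_1", mimetype="image/tiff")
add_file_sketch(files, "OCR-D-IMG", "IMG_1", force=True, mimetype="image/png")
print(len(files), files[0].mimetype)  # 1 image/png
```

With ignore=True the sketch appends a second IMG_1 entry. This is the "shift responsibility to the user" case: nothing stops duplicate @IDs.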
- remove_file(*args, **kwargs)
Delete each mets:file matching the query. Same arguments as find_files().
- remove_one_file(ID, fileGrp=None)
Delete an existing ocrd_models.ocrd_file.OcrdFile.
Parameters:
ID (string|OcrdFile) – @ID of the mets:file to delete. Can also be an ocrd_models.ocrd_file.OcrdFile to avoid searching via ID.
fileGrp (string) – @USE of the mets:fileGrp containing the mets:file. Used only for optimization.
Returns:
The old ocrd_models.ocrd_file.OcrdFile reference.
- property physical_pages
List all page IDs (the @ID of each physical mets:structMap mets:div).
- get_physical_pages(for_fileIds=None)
List all page IDs (the @ID of each physical mets:structMap mets:div), optionally restricted to the mets:file @IDs given in for_fileIds.
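The following ElementTree sketch shows how page IDs can be read from the physical structMap. It assumes the common OCR-D layout structMap[@TYPE="PHYSICAL"] / div[@TYPE="physSequence"] / div[@TYPE="page"], with one mets:fptr per linked file. physical_pages_sketch is a hypothetical helper. The return shape for for_fileIds (one entry per requested file ID, in order, None when a file is not linked) is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Assumed OCR-D layout of the physical structMap.
PAGE_PATH = ("mets:structMap[@TYPE='PHYSICAL']"
             "/mets:div[@TYPE='physSequence']/mets:div[@TYPE='page']")


def physical_pages_sketch(root, for_fileIds=None):
    """Hypothetical helper: all page @IDs, or, for for_fileIds, the page @ID
    each requested file is linked to (None if it has no mets:fptr)."""
    pages = root.findall(PAGE_PATH, NS)
    if for_fileIds is None:
        return [div.get("ID") for div in pages]
    ret = [None] * len(for_fileIds)
    index = {file_id: i for i, file_id in enumerate(for_fileIds)}
    for div in pages:
        for fptr in div.findall("mets:fptr", NS):
            i = index.get(fptr.get("FILEID"))
            if i is not None:
                ret[i] = div.get("ID")
    return ret


xml = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL"><mets:div TYPE="physSequence">
    <mets:div TYPE="page" ID="PHYS_0001"><mets:fptr FILEID="IMG_1"/></mets:div>
    <mets:div TYPE="page" ID="PHYS_0002"><mets:fptr FILEID="IMG_2"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>"""
root = ET.fromstring(xml)
print(physical_pages_sketch(root))                     # ['PHYS_0001', 'PHYS_0002']
print(physical_pages_sketch(root, ["IMG_2", "NOPE"]))  # ['PHYS_0002', None]
```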
- set_physical_page_for_file(pageId, ocrd_file, order=None, orderlabel=None)
Set the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file, creating all structures if necessary.
Parameters:
pageId (string) – @ID of the physical mets:structMap entry to use
ocrd_file (object) – existing ocrd_models.ocrd_file.OcrdFile object
Keyword Arguments:
order (string) – @ORDER to use
orderlabel (string) – @ORDERLABEL to use
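"Creating all structures if necessary" can be read as: ensure a PHYSICAL structMap, a physSequence div and a page div with the given @ID exist, then add a mets:fptr to the file. The sketch below does this with ElementTree. It takes a plain file ID instead of an OcrdFile. set_page_for_file_sketch is a hypothetical helper. The physSequence/page layout and the removal of any previous fptr for the same file are assumptions based on the common OCR-D convention.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}


def q(tag):
    """Clark-notation name ({namespace}tag) for a METS element."""
    return f"{{{METS}}}{tag}"


def set_page_for_file_sketch(root, pageId, file_id, order=None, orderlabel=None):
    """Hypothetical helper: link file_id to the page div pageId, creating the
    PHYSICAL structMap, the physSequence div and the page div if needed."""
    # Assumption: a file sits on one page only, so drop existing fptrs to it.
    for div in root.findall(".//mets:div", NS):
        for fptr in div.findall("mets:fptr", NS):
            if fptr.get("FILEID") == file_id:
                div.remove(fptr)
    struct_map = root.find("mets:structMap[@TYPE='PHYSICAL']", NS)
    if struct_map is None:
        struct_map = ET.SubElement(root, q("structMap"), TYPE="PHYSICAL")
    seq = struct_map.find("mets:div[@TYPE='physSequence']", NS)
    if seq is None:
        seq = ET.SubElement(struct_map, q("div"), TYPE="physSequence")
    page = seq.find(f"mets:div[@ID='{pageId}']", NS)
    if page is None:
        page = ET.SubElement(seq, q("div"), TYPE="page", ID=pageId)
        if order:
            page.set("ORDER", order)
        if orderlabel:
            page.set("ORDERLABEL", orderlabel)
    ET.SubElement(page, q("fptr"), FILEID=file_id)
    return page


root = ET.Element(q("mets"))
set_page_for_file_sketch(root, "PHYS_0001", "IMG_1", order="1")
set_page_for_file_sketch(root, "PHYS_0002", "IMG_1")  # moves IMG_1 to the new page
print([d.get("ID") for d in root.findall("mets:structMap/mets:div/mets:div", NS)])
# ['PHYS_0001', 'PHYS_0002']
```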
- get_physical_page_for_file(ocrd_file)
Get the physical page ID (@ID of the physical mets:structMap mets:div entry) corresponding to the mets:file ocrd_file.
- remove_physical_page_fptr(fileId)
Delete all mets:fptr[@FILEID = fileId] pointers to mets:file[@ID = fileId] from all mets:div entries in the physical mets:structMap.
Returns:
List of the page IDs (mets:div @ID) that mets:fptr elements were deleted from
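The following is a minimal ElementTree sketch of the documented behaviour: remove every matching mets:fptr below the PHYSICAL structMap and report the @ID of each div it was removed from. remove_fptr_sketch is a hypothetical helper that works on raw XML, not the library method.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def remove_fptr_sketch(root, file_id):
    """Hypothetical helper: delete every mets:fptr[@FILEID=file_id] below the
    PHYSICAL structMap and return the @IDs of the divs they were removed from."""
    removed_from = []
    for div in root.findall("mets:structMap[@TYPE='PHYSICAL']//mets:div", NS):
        for fptr in div.findall(f"mets:fptr[@FILEID='{file_id}']", NS):
            div.remove(fptr)
            removed_from.append(div.get("ID"))
    return removed_from


xml = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="PHYSICAL"><mets:div TYPE="physSequence">
    <mets:div TYPE="page" ID="PHYS_0001">
      <mets:fptr FILEID="IMG_1"/><mets:fptr FILEID="ALTO_1"/>
    </mets:div>
    <mets:div TYPE="page" ID="PHYS_0002"><mets:fptr FILEID="IMG_2"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>"""
root = ET.fromstring(xml)
print(remove_fptr_sketch(root, "IMG_1"))  # ['PHYS_0001']
```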
- merge(other_mets, force=False, fileGrp_mapping=None, fileId_mapping=None, pageId_mapping=None, after_add_cb=None, **kwargs)
Add all files from other_mets. Accepts the same kwargs as find_files().
Keyword Arguments:
force (boolean) – Whether to call add_file with force (overwriting existing mets:file entries)
fileGrp_mapping (dict) – Map other_mets fileGrp to fileGrp in this METS
fileId_mapping (dict) – Map other_mets file ID to file ID in this METS
pageId_mapping (dict) – Map other_mets page ID to page ID in this METS
after_add_cb (function) – Callback called after each file is added to the METS
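The three mapping dictionaries act as renames: a value found in the mapping is replaced, anything else is kept as-is. The callback fires once per added file. The sketch below models files as plain dicts. merge_sketch and map_entry are hypothetical helpers. The sketch leaves out the find_files() filtering and the force handling of the real method, and the argument passed to after_add_cb is an assumption.

```python
def map_entry(entry, fileGrp_mapping, fileId_mapping, pageId_mapping):
    """Rename one entry from other_mets; keys absent from a mapping are kept."""
    return {
        **entry,
        "fileGrp": fileGrp_mapping.get(entry["fileGrp"], entry["fileGrp"]),
        "ID": fileId_mapping.get(entry["ID"], entry["ID"]),
        "pageId": pageId_mapping.get(entry["pageId"], entry["pageId"]),
    }


def merge_sketch(target, other_entries, fileGrp_mapping=None, fileId_mapping=None,
                 pageId_mapping=None, after_add_cb=None):
    """Hypothetical helper: copy entries into target, applying the mappings
    and calling after_add_cb(new_entry) once per added entry."""
    for entry in other_entries:
        new_entry = map_entry(entry, fileGrp_mapping or {}, fileId_mapping or {},
                              pageId_mapping or {})
        target.append(new_entry)
        if after_add_cb is not None:
            after_add_cb(new_entry)


target, added = [], []
merge_sketch(
    target,
    [{"ID": "IMG_1", "fileGrp": "OCR-D-IMG", "pageId": "PHYS_0001"}],
    fileGrp_mapping={"OCR-D-IMG": "OCR-D-IMG-OTHER"},
    after_add_cb=added.append,
)
print(target[0]["fileGrp"], len(added))  # OCR-D-IMG-OTHER 1
```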