ocrd_utils package

Utility functions and constants usable in various circumstances.

  • coordinates_of_segment, coordinates_for_segment

    These functions convert polygon outlines for PAGE elements on all hierarchy levels below page (i.e. region, line, word, glyph) between relative coordinates w.r.t. a corresponding image and absolute coordinates w.r.t. the top-level image. This includes rotation and offset correction, based on affine transformations. (Used by Workspace methods image_from_page and image_from_segment).

  • rotate_coordinates, shift_coordinates, transpose_coordinates, transform_coordinates

    These backend functions compose affine transformations for reflection, rotation and offset correction of coordinates, or apply them to a set of points. They can be used to pass down the coordinate system along with images (both invariably sharing the same operations context) when traversing the element hierarchy top to bottom. (Used by Workspace methods image_from_page and image_from_segment).

  • rotate_image, crop_image, transpose_image

    These PIL.Image functions are safe replacements for the rotate, crop, and transpose methods.

  • image_from_polygon, polygon_mask

    These functions apply polygon masks to PIL.Image objects.

  • xywh_from_points, points_from_xywh, polygon_from_points etc.

    These functions follow the naming scheme X_from_Y, where X and Y can each be one of:

    • bbox is a 4-tuple of integers x0, y0, x1, y1 of the bounding box (rectangle)

      (used by PIL.Image)

    • points is a string encoding a polygon: "0,0 100,0 100,100 0,100"

      (used by PAGE-XML)

    • polygon is a list of 2-lists of integers x, y of points forming an (implicitly closed) polygon path: [[0,0], [100,0], [100,100], [0,100]]

      (used by opencv2 and higher-level coordinate functions in ocrd_utils)

    • xywh is a dict with keys for x, y, width and height: {'x': 0, 'y': 0, 'w': 100, 'h': 100}

      (produced by tesserocr and image/coordinate recursion methods in ocrd.workspace)

    • x0y0x1y1 is a 4-list of strings x0, y0, x1, y1 of the bounding box (rectangle)

      (produced by tesserocr)

    • y0x0y1x1 is the same as x0y0x1y1 with positions of x and y in the list swapped

  • is_local_filename, safe_filename, abspath, get_local_filename

    FS-related utilities

  • is_string, membername, concat_padded, nth_url_segment, remove_non_path_from_url, parse_json_string_or_file

    String and OOP utilities

  • MIMETYPE_PAGE, EXT_TO_MIME, MIME_TO_EXT, VERSION

    Constants

  • logging, setOverrideLogLevel, getLevelName, getLogger, initLogging

    Exports of ocrd_utils.logging

ocrd_utils.abspath(url)[source]

Get a full path to a file or file URL

See os.path.abspath

ocrd_utils.adjust_canvas_to_rotation(size, angle)[source]

Calculate the enlarged image size after rotation.

Given a numpy array size of an original canvas (width and height), and a rotation angle in degrees counter-clockwise angle, calculate the new size which is necessary to encompass the full image after rotation.

Return a numpy array of the enlarged width and height.
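
As a rough illustration of the calculation described above, the enlarged canvas can be derived from the absolute sine and cosine of the rotation angle. This is a minimal sketch in plain Python (tuples instead of numpy arrays); the name enlarged_canvas_size is hypothetical and not part of ocrd_utils.

```python
import math

def enlarged_canvas_size(size, angle):
    """Return (width, height) of the smallest axis-aligned canvas that
    contains a width x height rectangle rotated by `angle` degrees."""
    width, height = size
    rad = math.radians(angle)
    sin, cos = abs(math.sin(rad)), abs(math.cos(rad))
    # Each new extent is the sum of the projections of both original sides.
    return (width * cos + height * sin, width * sin + height * cos)

# A 100 x 50 canvas rotated by 90 degrees swaps its width and height
# (up to floating point error).
new_width, new_height = enlarged_canvas_size((100, 50), 90)
```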

ocrd_utils.adjust_canvas_to_transposition(size, method)[source]

Calculate the flipped image size after transposition.

Given a numpy array size of an original canvas (width and height), and a transposition mode method (see transpose_image), calculate the new size after transposition.

Return a numpy array of the new width and height after transposition.

ocrd_utils.bbox_from_points(points)[source]

Construct a numeric list representing a bounding box from polygon coordinates in page representation.

ocrd_utils.bbox_from_xywh(xywh)[source]

Convert a bounding box from a numeric dict to a numeric list representation.

ocrd_utils.bbox_from_polygon(polygon)[source]

Construct a numeric list representing a bounding box from polygon coordinates in numeric list representation.

ocrd_utils.coordinates_for_segment(polygon, parent_image, parent_coords)[source]

Convert relative coordinates to absolute.

Given…

  • polygon, a numpy array of points relative to
  • parent_image, a PIL.Image (not used), along with
  • parent_coords, its corresponding affine transformation,

…calculate the absolute coordinates within the page.

That is, apply the given transform inversely to polygon. The transform encodes (recursively):

  1. Whenever parent_image or any of its parents was cropped, all points must be shifted by the offset in opposite direction (i.e. coordinate system gets translated by the upper left).
  2. Whenever parent_image or any of its parents was rotated, all points must be rotated around the center of that image in opposite direction (i.e. coordinate system gets translated by the center in opposite direction, rotated purely, and translated back; the latter involves an additional offset from the increase in canvas size necessary to accommodate all points).

Return the rounded numpy array of the resulting polygon.
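
The idea of applying a transform inversely can be sketched with a closed-form inverse of a 3x3 affine matrix. This is a minimal sketch under stated assumptions: it uses nested lists instead of numpy arrays, and the names invert_affine and to_absolute are hypothetical helpers, not functions of ocrd_utils.

```python
def invert_affine(matrix):
    """Invert a 3x3 affine matrix [[a, b, tx], [c, d, ty], [0, 0, 1]]."""
    (a, b, tx), (c, d, ty), _ = matrix
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # The inverse translation is the negated, inversely rotated translation.
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]

def to_absolute(polygon, transform):
    """Map points relative to an image back to page coordinates, rounded."""
    inv = invert_affine(transform)
    return [[round(inv[0][0] * x + inv[0][1] * y + inv[0][2]),
             round(inv[1][0] * x + inv[1][1] * y + inv[1][2])]
            for x, y in polygon]

# Assumed example: the image was cropped at offset (100, 200) on the page,
# so the forward transform shifts all points by (-100, -200).
crop_transform = [[1, 0, -100], [0, 1, -200], [0, 0, 1]]
absolute = to_absolute([[0, 0], [10, 5]], crop_transform)
```

Here the relative points are moved back by the crop offset, i.e. shifted in the opposite direction of the forward transform.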

ocrd_utils.coordinates_of_segment(segment, parent_image, parent_coords)[source]

Extract the coordinates of a PAGE segment element relative to its parent.

Given…

  • segment, a PAGE segment object in absolute coordinates (i.e. RegionType / TextLineType / WordType / GlyphType), and
  • parent_image, the PIL.Image of its corresponding parent object (i.e. PageType / RegionType / TextLineType / WordType), (not used), along with
  • parent_coords, its corresponding affine transformation,

…calculate the relative coordinates of the segment within the image.

That is, apply the given transform to the points annotated in segment. The transform encodes (recursively):

  1. Whenever parent_image or any of its parents was cropped, all points must be shifted by the offset (i.e. coordinate system gets translated by the upper left).
  2. Whenever parent_image or any of its parents was rotated, all points must be rotated around the center of that image (i.e. coordinate system gets translated by the center in opposite direction, rotated purely, and translated back; the latter involves an additional offset from the increase in canvas size necessary to accommodate all points).

Return the rounded numpy array of the resulting polygon.

ocrd_utils.concat_padded(base, *args)[source]

Concatenate string and zero-padded 4 digit number

ocrd_utils.crop_image(image, box=None)[source]

Crop an image to a rectangle, filling with background.

Given a PIL.Image image and a list box of the bounding rectangle relative to the image, crop at the box coordinates, filling everything outside image with the background. (This covers the case where box indexes are negative or larger than image width/height. PIL.Image.crop would fill with black.) Since image is not necessarily binarized yet, determine the background from the median color (instead of white).

Return a new PIL.Image.
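
The behaviour for boxes that extend beyond the image can be sketched without PIL. The following is a simplified illustration, assuming a grayscale image stored as a list of rows; crop_with_background is a hypothetical name, and the real function operates on PIL.Image objects.

```python
from statistics import median

def crop_with_background(pixels, box):
    """Crop a grayscale image (list of rows) to box = (x0, y0, x1, y1),
    filling everything outside the image with the median gray value."""
    height, width = len(pixels), len(pixels[0])
    background = median(value for row in pixels for value in row)
    x0, y0, x1, y1 = box
    return [
        [pixels[y][x] if 0 <= x < width and 0 <= y < height else background
         for x in range(x0, x1)]
        for y in range(y0, y1)
    ]

# A 3 x 3 image with one dark pixel in the middle; the box starts one
# pixel above and to the left of the image.
image = [[255, 255, 255],
         [255, 0, 255],
         [255, 255, 255]]
cropped = crop_with_background(image, (-1, -1, 2, 2))
```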

ocrd_utils.getLevelName(lvl)[source]

Get (numerical) python logging level for (string) spec-defined log level name.

ocrd_utils.getLogger(*args, **kwargs)[source]

Wrapper around logging.getLogger that respects overrideLogLevel.

ocrd_utils.initLogging()[source]

Sets logging defaults

ocrd_utils.is_local_filename(url)[source]

Whether a url is a local filename.

ocrd_utils.is_string(val)[source]

Return whether a value is a str.

ocrd_utils.nth_url_segment(url, n=-1)[source]

Return the n-th /-delimited segment of a URL-like string (by default the last one)

Parameters:
  • url (string) –
  • n (integer) – index of segment, default: -1
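
A minimal sketch of the idea, using only the standard library: strip everything that is not part of the path, split at slashes, and pick the requested segment. The name url_segment and the empty-string fallback for an out-of-range index are assumptions of this sketch.

```python
from urllib.parse import urlsplit

def url_segment(url, n=-1):
    """Return the n-th /-delimited segment of the path of `url`."""
    # urlsplit drops the query string and the fragment from the path.
    segments = urlsplit(url).path.split('/')
    try:
        return segments[n]
    except IndexError:
        # Assumed fallback: no such segment.
        return ''

last = url_segment('http://example.org/a/b/file.tif?x=1#frag')
```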

ocrd_utils.remove_non_path_from_url(url)[source]

Remove everything from URL after path.

ocrd_utils.membername(class_, val)[source]

Convert a member variable/constant into a member name string.

ocrd_utils.image_from_polygon(image, polygon, fill='background', transparency=False)[source]

Mask an image with a polygon.

Given a PIL.Image image and a numpy array polygon of relative coordinates into the image, fill everything outside the polygon hull to a color according to fill:

  • if background (the default), then use the median color of the image;
  • otherwise use the given color, e.g. 'white' or (255,255,255).

Moreover, if transparency is true, then add an alpha channel from the polygon mask (i.e. everything outside the polygon will be transparent, for those consumers that can interpret alpha channels). Images which already have an alpha channel will have it shrunk from the polygon mask (i.e. everything outside the polygon will be transparent, in addition to existing transparent pixels).

Return a new PIL.Image.

ocrd_utils.parse_json_string_or_file(value='{}')[source]

Parse a string as either the path to a JSON object or a literal JSON object.
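
One possible way to handle both kinds of input is sketched below. It checks for an existing file first and otherwise parses the value as literal JSON; that order, and the name load_json_argument, are assumptions of this sketch and not necessarily how ocrd_utils does it.

```python
import json
import os

def load_json_argument(value='{}'):
    """Parse `value` as a path to a JSON file or as a literal JSON string."""
    if os.path.isfile(value):
        with open(value, 'r', encoding='utf-8') as handle:
            return json.load(handle)
    # Not an existing file: treat the value as JSON text.
    return json.loads(value)

parameters = load_json_argument('{"dpi": 300}')
```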

ocrd_utils.points_from_bbox(minx, miny, maxx, maxy)[source]

Construct polygon coordinates in page representation from a numeric list representing a bounding box.

ocrd_utils.points_from_polygon(polygon)[source]

Convert polygon coordinates from a numeric list representation to a page representation.

ocrd_utils.points_from_x0y0x1y1(xyxy)[source]

Construct a polygon representation from a rectangle described as a list [x0, y0, x1, y1]

ocrd_utils.points_from_xywh(box)[source]

Construct polygon coordinates in page representation from numeric dict representing a bounding box.
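
For illustration, the conversion from the dict representation to a PAGE-XML points string might look as follows. This is a sketch, not the library code: xywh_to_points is a hypothetical name, and the clockwise corner order starting at the top left is an assumption.

```python
def xywh_to_points(box):
    """Encode a bounding box dict as a points string of its four corners."""
    x, y, w, h = box['x'], box['y'], box['w'], box['h']
    # Corners in clockwise order, starting at the top left.
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return ' '.join('%d,%d' % corner for corner in corners)

points = xywh_to_points({'x': 0, 'y': 0, 'w': 100, 'h': 100})
```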

ocrd_utils.points_from_y0x0y1x1(yxyx)[source]

Construct a polygon representation from a rectangle described as a list [y0, x0, y1, x1]

ocrd_utils.polygon_from_bbox(minx, miny, maxx, maxy)[source]

Construct polygon coordinates in numeric list representation from a numeric list representing a bounding box.

ocrd_utils.polygon_from_points(points)[source]

Convert polygon coordinates in page representation to polygon coordinates in numeric list representation.
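
The page representation is a string of space-separated x,y pairs, so the conversion amounts to splitting twice. A minimal sketch (parse_points is a hypothetical name; parsing the values as integers is an assumption):

```python
def parse_points(points):
    """Turn 'x1,y1 x2,y2 ...' into [[x1, y1], [x2, y2], ...]."""
    return [[int(value) for value in pair.split(',')]
            for pair in points.split()]

polygon = parse_points("0,0 100,0 100,100 0,100")
```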

ocrd_utils.polygon_from_x0y0x1y1(x0y0x1y1)[source]

Construct polygon coordinates in numeric list representation from a string list representing a bounding box.

ocrd_utils.polygon_from_xywh(xywh)[source]

Construct polygon coordinates in numeric list representation from numeric dict representing a bounding box.

ocrd_utils.polygon_mask(image, coordinates)[source]

Create a mask image of a polygon.

Given a PIL.Image image (merely for dimensions), and a numpy array coordinates of the polygon, relative to the image, create a new image of the same size with black background, and fill everything inside the polygon hull with white.

Return the new PIL.Image.
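
To illustrate what such a mask contains, here is a small sketch that rasterizes a polygon into a list of rows using an even-odd ray casting test on pixel centers. It is not the library implementation (which works on PIL.Image objects, and whose treatment of boundary pixels may differ); rasterize_polygon and point_in_polygon are hypothetical names.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd test: count edge crossings of a ray going right from (px, py)."""
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]  # implicitly closed path
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

def rasterize_polygon(width, height, polygon):
    """Return rows of 255 (inside) and 0 (outside), testing pixel centers."""
    return [[255 if point_in_polygon(x + 0.5, y + 0.5, polygon) else 0
             for x in range(width)]
            for y in range(height)]

mask = rasterize_polygon(6, 6, [[1, 1], [4, 1], [4, 4], [1, 4]])
```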

ocrd_utils.rotate_coordinates(transform, angle, orig=array([0, 0]))[source]

Compose an affine coordinate transformation with a passive rotation.

Given a numpy array transform of an existing transformation matrix in homogeneous (3d) coordinates, and a rotation angle in degrees counter-clockwise angle, as well as a numpy array orig of the center of rotation, calculate the affine coordinate transform corresponding to the composition of both transformations. (This entails translation to the center, followed by pure rotation, and subsequent translation back. However, since rotation necessarily increases the bounding box, and thus image size, do not translate back the same amount, but to the enlarged offset.)

Return a numpy array of the resulting affine transformation matrix.
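
The three steps named above (translate to the center, rotate purely, translate back) can be sketched for a single point. This simplified example leaves out the composition with an existing transform and the additional offset for the enlarged canvas; the name rotate_about and the sign convention (image coordinates with the y axis pointing down) are assumptions of the sketch.

```python
import math

def rotate_about(point, center, angle):
    """Rotate `point` around `center` by `angle` degrees, counter-clockwise
    as seen on screen (y axis pointing down)."""
    rad = math.radians(angle)
    cos, sin = math.cos(rad), math.sin(rad)
    # 1. translate so that the center becomes the origin
    dx, dy = point[0] - center[0], point[1] - center[1]
    # 2. pure rotation
    rx, ry = cos * dx + sin * dy, -sin * dx + cos * dy
    # 3. translate back
    return (rx + center[0], ry + center[1])

# The point right of the center moves to the top edge.
x, y = rotate_about((100, 50), (50, 50), 90)
```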

ocrd_utils.rotate_image(image, angle, fill='background', transparency=False)[source]

Rotate an image, enlarging and filling with background.

Given a PIL.Image image and a rotation angle in degrees counter-clockwise angle, rotate the image, increasing its size at the margins accordingly, and filling everything outside the original image according to fill:

  • if background (the default), then use the median color of the image;
  • otherwise use the given color, e.g. 'white' or (255,255,255).

Moreover, if transparency is true, then add an alpha channel that is fully opaque within the original image (i.e. everything outside the original image will be transparent, for those consumers that can interpret alpha channels). (Images which already have an alpha channel are always treated this way, regardless of the setting used.)

Return a new PIL.Image.

ocrd_utils.safe_filename(url)[source]

Sanitize input to be safely used as the basename of a local file.

ocrd_utils.setOverrideLogLevel(lvl)[source]

Override all logger filter levels to include lvl and above.

  • Sets the root logger level to lvl.
  • Iterates over all existing loggers and sets their log level to NOTSET.
Parameters: lvl (string) – Log level name.
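
The two steps listed above can be sketched with the standard logging module alone. This is an illustration under assumptions, not the ocrd_utils code: override_log_level is a hypothetical name, and the level name is assumed to be one that Python's logging module knows (e.g. 'DEBUG').

```python
import logging

def override_log_level(level_name):
    """Set the root level and reset all existing loggers to NOTSET,
    so that they inherit the level from the root logger."""
    logging.getLogger().setLevel(level_name)
    for logger in logging.Logger.manager.loggerDict.values():
        # The registry also holds PlaceHolder objects, which have no level.
        if isinstance(logger, logging.Logger):
            logger.setLevel(logging.NOTSET)

child = logging.getLogger('demo.child')
child.setLevel(logging.ERROR)
override_log_level('DEBUG')
```

After the call, the child logger no longer filters at ERROR but follows the root logger.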

ocrd_utils.shift_coordinates(transform, offset)[source]

Compose an affine coordinate transformation with a translation.

Given a numpy array transform of an existing transformation matrix in homogeneous (3d) coordinates, and a numpy array offset of the translation vector, calculate the affine coordinate transform corresponding to the composition of both transformations.

Return a numpy array of the resulting affine transformation matrix.
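
Composing with a translation amounts to multiplying the existing matrix with a translation matrix. The sketch below uses nested lists instead of numpy arrays and assumes that the translation is applied after the existing transform (i.e. multiplied from the left); compose_shift and matmul3 are hypothetical names.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def compose_shift(transform, offset):
    """Return the transform followed by a translation by `offset`."""
    shift = [[1, 0, offset[0]],
             [0, 1, offset[1]],
             [0, 0, 1]]
    return matmul3(shift, transform)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Cropping at (10, 20) shifts the coordinate system by (-10, -20).
cropped = compose_shift(identity, (-10, -20))
# A second shift accumulates with the first one.
recropped = compose_shift(cropped, (3, 4))
```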

ocrd_utils.transform_coordinates(polygon, transform=None)[source]

Apply an affine transformation to a set of points. Augment the 2d numpy array of points polygon with an extra column of ones (homogeneous coordinates), then multiply with the transformation matrix transform (or the identity matrix), and finally remove the extra column from the result.
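
The three steps (augment, multiply, drop the extra column) can be sketched in plain Python. This is a minimal illustration with lists instead of numpy arrays; apply_affine is a hypothetical name.

```python
def apply_affine(points, matrix=None):
    """Apply a 3x3 affine matrix to a list of (x, y) points."""
    if matrix is None:
        matrix = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity
    result = []
    for x, y in points:
        vector = (x, y, 1)  # homogeneous coordinates
        new_x, new_y, _ = (sum(m * v for m, v in zip(row, vector))
                           for row in matrix)
        result.append((new_x, new_y))  # extra component dropped again
    return result

shifted = apply_affine([(0, 0), (10, 5)], [[1, 0, -3], [0, 1, -2], [0, 0, 1]])
```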

ocrd_utils.transpose_coordinates(transform, method, orig=array([0, 0]))[source]

Compose an affine coordinate transformation with a transposition (i.e. flip or rotate in 90° multiples).

Given a numpy array transform of an existing transformation matrix in homogeneous (3d) coordinates, a transposition mode method, as well as a numpy array orig of the center of the image, calculate the affine coordinate transform corresponding to the composition of both transformations, which is respectively:

  • PIL.Image.FLIP_LEFT_RIGHT: entails translation to the center, followed by pure reflection about the y-axis, and subsequent translation back
  • PIL.Image.FLIP_TOP_BOTTOM: entails translation to the center, followed by pure reflection about the x-axis, and subsequent translation back
  • PIL.Image.ROTATE_180: entails translation to the center, followed by pure reflection about the origin, and subsequent translation back
  • PIL.Image.ROTATE_90: entails translation to the center, followed by pure rotation by 90° counter-clockwise, and subsequent translation back
  • PIL.Image.ROTATE_270: entails translation to the center, followed by pure rotation by 270° counter-clockwise, and subsequent translation back
  • PIL.Image.TRANSPOSE: entails translation to the center, followed by pure rotation by 90° counter-clockwise and pure reflection about the x-axis, and subsequent translation back
  • PIL.Image.TRANSVERSE: entails translation to the center, followed by pure rotation by 90° counter-clockwise and pure reflection about the y-axis, and subsequent translation back

Return a numpy array of the resulting affine transformation matrix.

ocrd_utils.transpose_image(image, method)[source]

Transpose (i.e. flip or rotate in 90° multiples) an image.

Given a PIL.Image image and a transposition mode method, apply the respective operation:

  • PIL.Image.FLIP_LEFT_RIGHT: all pixels get mirrored at half the width of the image
  • PIL.Image.FLIP_TOP_BOTTOM: all pixels get mirrored at half the height of the image
  • PIL.Image.ROTATE_180: all pixels get mirrored at both half the width and half the height of the image, i.e. the image gets rotated by 180° counter-clockwise
  • PIL.Image.ROTATE_90: rows become columns (but counted from the right) and columns become rows, i.e. the image gets rotated by 90° counter-clockwise; width becomes height and vice versa
  • PIL.Image.ROTATE_270: rows become columns and columns become rows (but counted from the bottom), i.e. the image gets rotated by 270° counter-clockwise; width becomes height and vice versa
  • PIL.Image.TRANSPOSE: rows become columns and vice versa, i.e. all pixels get mirrored at the main diagonal; width becomes height and vice versa
  • PIL.Image.TRANSVERSE: rows become columns (but counted from the right) and columns become rows (but counted from the bottom), i.e. all pixels get mirrored at the opposite diagonal; width becomes height and vice versa

Return a new PIL.Image.
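
The seven modes listed above can be illustrated on a tiny image stored as a list of rows. This is a sketch for building intuition only: it uses plain strings as mode names instead of the PIL.Image constants, and transpose_grid is a hypothetical name.

```python
def transpose_grid(grid, method):
    """Apply a transposition mode to an image given as a list of rows."""
    columns = [list(column) for column in zip(*grid)]  # rows become columns
    operations = {
        'FLIP_LEFT_RIGHT': [row[::-1] for row in grid],
        'FLIP_TOP_BOTTOM': [list(row) for row in grid[::-1]],
        'ROTATE_180': [row[::-1] for row in grid[::-1]],
        'ROTATE_90': columns[::-1],                      # counter-clockwise
        'ROTATE_270': [column[::-1] for column in columns],
        'TRANSPOSE': columns,                            # main diagonal
        'TRANSVERSE': [column[::-1] for column in columns][::-1],
    }
    return operations[method]

# A 3 x 2 image (width 3, height 2).
grid = [[1, 2, 3],
        [4, 5, 6]]
rotated = transpose_grid(grid, 'ROTATE_90')
```

For the rotations and the two diagonal modes, the result has width 2 and height 3, i.e. width and height are swapped.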

ocrd_utils.unzip_file_to_dir(path_to_zip, output_directory)[source]

Extract a ZIP archive to a directory

ocrd_utils.xywh_from_bbox(minx, miny, maxx, maxy)[source]

Convert a bounding box from a numeric list to a numeric dict representation.

ocrd_utils.xywh_from_points(points)[source]

Construct a numeric dict representing a bounding box from polygon coordinates in page representation.
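
As an illustration, the bounding box of a points string is given by the minimum and maximum of its x and y values. A minimal sketch (points_to_xywh is a hypothetical name; integer coordinates are assumed):

```python
def points_to_xywh(points):
    """Compute the bounding box dict of a 'x1,y1 x2,y2 ...' string."""
    pairs = [pair.split(',') for pair in points.split()]
    xs = [int(x) for x, _ in pairs]
    ys = [int(y) for _, y in pairs]
    return {'x': min(xs), 'y': min(ys),
            'w': max(xs) - min(xs), 'h': max(ys) - min(ys)}

box = points_to_xywh("10,20 110,20 110,70 10,70")
```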