Complex Type pc:TextEquivType

Namespace http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15
Diagram
[Diagram: attributes @index, @conf, @dataType, @dataTypeDetails, @comments; elements pc:PlainText, pc:Unicode]
Used by
Model Element pc:TextEquivType / pc:PlainText, Element pc:TextEquivType / pc:Unicode
Children Element pc:TextEquivType / pc:PlainText, Element pc:TextEquivType / pc:Unicode
Attributes
QName Type Use
Attribute pc:TextEquivType / @comments string optional
Attribute pc:TextEquivType / @conf Simple Type pc:ConfSimpleType optional
OCR confidence value (between 0 and 1)
Attribute pc:TextEquivType / @dataType Simple Type pc:TextDataTypeSimpleType optional
Type of text content (for instance, free text or a number).
This attribute is descriptive only; the text type is not checked during XML validation.
Attribute pc:TextEquivType / @dataTypeDetails string optional
Refinement for dataType attribute. Can be a regular expression, for instance.
Attribute pc:TextEquivType / @index restriction of integer optional
Used for sort order in case multiple TextEquivs are defined. The text content with the lowest index should be interpreted as the main text content.
Source
<complexType name="TextEquivType">
  <sequence>
    <element name="PlainText" type="string" minOccurs="0">
      <annotation>
        <documentation>Text in a "simple" form (ASCII or extended ASCII as mostly used for typing). I.e. no use of special characters for ligatures (should be stored as two separate characters) etc.</documentation>
      </annotation>
    </element>
    <element name="Unicode" type="string">
      <annotation>
        <documentation>Correct encoding of the original, always using the corresponding Unicode code point. I.e. ligatures have to be represented as one character etc.</documentation>
      </annotation>
    </element>
  </sequence>
  <attribute name="index" use="optional">
    <annotation>
      <documentation>Used for sort order in case multiple TextEquivs are defined. The text content with the lowest index should be interpreted as the main text content.</documentation>
    </annotation>
    <simpleType>
      <restriction base="integer">
        <minInclusive value="0"/>
      </restriction>
    </simpleType>
  </attribute>
  <attribute name="conf" type="pc:ConfSimpleType" use="optional">
    <annotation>
      <documentation>OCR confidence value (between 0 and 1)</documentation>
    </annotation>
  </attribute>
  <attribute name="dataType" type="pc:TextDataTypeSimpleType" use="optional">
    <annotation>
      <documentation>Type of text content (is it free text or a number, for instance) This is only a descriptive attribute, the text type is not checked during XML validation</documentation>
    </annotation>
  </attribute>
  <attribute name="dataTypeDetails" type="string" use="optional">
    <annotation>
      <documentation>Refinement for dataType attribute. Can be a regular expression, for instance.</documentation>
    </annotation>
  </attribute>
  <!-- <attribute name="mergeWithNextRule" type="pc:TextMergeRuleSimpleType" use="optional">
         <annotation>
           <documentation>Rule for merging consecutive text objects. The rule applies to the first object of a pair (i.e. 'remove-last' removes the last character of the first region, can be used to remove hyphen, for example)</documentation>
         </annotation>
       </attribute>
       <attribute name="mergeWithNextRuleData" type="string" use="optional">
         <annotation>
           <documentation>Custom data for mergeRule attribute. Can be the number of characters to be removed, for example.</documentation>
         </annotation>
       </attribute> -->
  <attribute name="comments" type="string" use="optional"/>
</complexType>
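
As an illustration of the type as a whole, an instance might look like the fragment below. This is only a sketch: the element name TextEquiv and its position inside a text region, line, word or glyph come from the wider PAGE schema and are not defined in this section, and all attribute values and text are invented.

```xml
<!-- Illustrative fragment; the element name TextEquiv and all values are assumptions. -->
<TextEquiv xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"
           index="0" conf="0.87" comments="checked manually">
  <!-- PlainText is optional (minOccurs="0") and is omitted here. -->
  <!-- Unicode has no minOccurs, so it is required exactly once. -->
  <Unicode>Example text</Unicode>
</TextEquiv>
```

The attributes carry no prefix because they are declared in no namespace, while the child elements inherit the PAGE namespace from the default namespace declaration.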

Attribute pc:TextEquivType / @index

Namespace No namespace
Annotations
Used for sort order in case multiple TextEquivs are defined. The text content with the lowest index should be interpreted as the main text content.
Type restriction of integer
Properties
use: optional
Facets
minInclusive 0
Used by
Source
<attribute name="index" use="optional">
  <annotation>
    <documentation>Used for sort order in case multiple TextEquivs are defined. The text content with the lowest index should be interpreted as the main text content.</documentation>
  </annotation>
  <simpleType>
    <restriction base="integer">
      <minInclusive value="0"/>
    </restriction>
  </simpleType>
</attribute>
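
The ordering rule can be illustrated with two competing transcriptions of the same line. The sketch below assumes a TextLine parent that accepts repeated TextEquiv children; the wrapper element, its id, the Coords values and the texts are hypothetical and are not taken from this section.

```xml
<!-- Hypothetical parent; only the TextEquiv children illustrate @index. -->
<TextLine xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15" id="l1">
  <Coords points="0,0 100,0 100,20 0,20"/>
  <!-- Lowest index: to be interpreted as the main text content. -->
  <TextEquiv index="0" conf="0.95">
    <Unicode>colour</Unicode>
  </TextEquiv>
  <!-- Higher index: an alternative reading. -->
  <TextEquiv index="1" conf="0.40">
    <Unicode>co1our</Unicode>
  </TextEquiv>
</TextLine>
```

Because of the minInclusive facet, a negative value such as index="-1" would fail validation.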

Attribute pc:TextEquivType / @conf

Namespace No namespace
Annotations
OCR confidence value (between 0 and 1)
Type Simple Type pc:ConfSimpleType
Properties
use: optional
Facets
maxInclusive 1
minInclusive 0
Used by
Source
<attribute name="conf" type="pc:ConfSimpleType" use="optional">
  <annotation>
    <documentation>OCR confidence value (between 0 and 1)</documentation>
  </annotation>
</attribute>

Attribute pc:TextEquivType / @dataType

Namespace No namespace
Annotations
Type of text content (for instance, free text or a number).
This attribute is descriptive only; the text type is not checked during XML validation.
Type Simple Type pc:TextDataTypeSimpleType
Properties
use: optional
Facets
enumeration xsd:decimal
Examples: "123.456", "+1234.456", "-1234.456", "-.456", "-456"
enumeration xsd:float
Examples: "123.456", "+1234.456", "-1.2344e56", "-.45E-6", "INF", "-INF", "NaN"
enumeration xsd:integer
Examples: "123456", "+00000012", "-1", "-456"
enumeration xsd:boolean
Examples: "true", "false", "1", "0"
enumeration xsd:date
Examples: "2001-10-26", "2001-10-26+02:00", "2001-10-26Z", "2001-10-26+00:00", "-2001-10-26", "-20000-04-01"
enumeration xsd:time
Examples: "21:32:52", "21:32:52+02:00", "19:32:52Z", "19:32:52+00:00", "21:32:52.12679"
enumeration xsd:dateTime
Examples: "2001-10-26T21:32:52", "2001-10-26T21:32:52+02:00", "2001-10-26T19:32:52Z", "2001-10-26T19:32:52+00:00", "-2001-10-26T21:32:52", "2001-10-26T21:32:52.12679"
enumeration xsd:string
Generic text string
enumeration other
An XSD type that is not listed or a custom type (use dataTypeDetails attribute)
Used by
Source
<attribute name="dataType" type="pc:TextDataTypeSimpleType" use="optional">
  <annotation>
    <documentation>Type of text content (is it free text or a number, for instance) This is only a descriptive attribute, the text type is not checked during XML validation</documentation>
  </annotation>
</attribute>
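
A short sketch of how the attribute might be used to label a transcription as a date. The element name TextEquiv and the text value are assumptions made for illustration; note that the enumeration value is the literal string "xsd:date", not a reference to a declared namespace prefix.

```xml
<!-- Illustrative fragment; the content is labelled as a date but not checked. -->
<TextEquiv xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"
           dataType="xsd:date">
  <Unicode>2001-10-26</Unicode>
</TextEquiv>
```

Since the attribute is descriptive only, a Unicode value that is not a valid date would still pass schema validation; any checking is left to the consuming application.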

Attribute pc:TextEquivType / @dataTypeDetails

Namespace No namespace
Annotations
Refinement for dataType attribute. Can be a regular expression, for instance.
Type string
Properties
use: optional
Used by
Source
<attribute name="dataTypeDetails" type="string" use="optional">
  <annotation>
    <documentation>Refinement for dataType attribute. Can be a regular expression, for instance.</documentation>
  </annotation>
</attribute>
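
As a sketch of the refinement described above, the fragment below pairs dataType="other" with a regular expression in dataTypeDetails. The pattern, the text and the element name TextEquiv are hypothetical; the schema types the attribute as a plain string, so the regular-expression reading is a convention between producer and consumer.

```xml
<!-- Illustrative fragment; the pattern describes a made-up identifier format. -->
<TextEquiv xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15"
           dataType="other"
           dataTypeDetails="[A-Z]{2}[0-9]{6}">
  <Unicode>AB123456</Unicode>
</TextEquiv>
```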

Attribute pc:TextEquivType / @comments

Namespace No namespace
Type string
Properties
use: optional
Used by
Source
<attribute name="comments" type="string" use="optional"/>

Element pc:TextEquivType / pc:PlainText

Namespace http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15
Annotations
Text in a "simple" form (ASCII or extended ASCII as mostly used for typing). I.e. no use of special characters for ligatures (should be stored as two separate characters) etc.
Type string
Properties
content: simple
minOccurs: 0
Source
<element name="PlainText" type="string" minOccurs="0">
  <annotation>
    <documentation>Text in a "simple" form (ASCII or extended ASCII as mostly used for typing). I.e. no use of special characters for ligatures (should be stored as two separate characters) etc.</documentation>
  </annotation>
</element>

Element pc:TextEquivType / pc:Unicode

Namespace http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15
Annotations
Correct encoding of the original, always using the corresponding Unicode code point. I.e. ligatures have to be represented as one character etc.
Type string
Properties
content: simple
Source
<element name="Unicode" type="string">
  <annotation>
    <documentation>Correct encoding of the original, always using the corresponding Unicode code point. I.e. ligatures have to be represented as one character etc.</documentation>
  </annotation>
</element>
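
The difference between the two elements can be sketched with a word that is printed with a ligature. The example word and the element name TextEquiv are assumptions; U+FB03 is the Unicode code point LATIN SMALL LIGATURE FFI.

```xml
<!-- Illustrative fragment; the source is assumed to print "office" with an ffi ligature. -->
<TextEquiv xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2018-07-15">
  <!-- PlainText: ligature expanded into separate ASCII letters. -->
  <PlainText>office</PlainText>
  <!-- Unicode: ligature kept as the single code point U+FB03. -->
  <Unicode>o&#xFB03;ce</Unicode>
</TextEquiv>
```

PlainText must come before Unicode because the type is declared as a sequence.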