Typeface (TextStyle)

Font (typeface) information must be documented at all three element levels (<Word>, <TextLine> and <TextRegion>) using the PAGE XML element <TextStyle>.

<TextRegion type="heading" id="r_7_1">
            <Coords points="542,306 569,306 569,342 542,342"/>
            <TextLine id="tl_4" primaryLanguage="German">
                <Coords points="543,307 568,307 568,341 543,341"/>
                <Baseline points="543,350 568,350"/>
                <Word id="w_w1aab1c13b2b1b1ab1" language="German">
                    <Coords points="543,307 568,307 568,341 543,341"/>
                    <TextEquiv>
                        <Unicode>I.</Unicode>
                    </TextEquiv>
                    <TextStyle fontFamily="fraktur" fontSize="53.0" bold="true"/>
                </Word>
                <TextEquiv>
                    <Unicode>I.</Unicode>
                </TextEquiv>
                <TextStyle fontFamily="fraktur" fontSize="53.0" bold="true"/>
            </TextLine>
 </TextRegion>
See: Complex Type pc:TextStyleType

Font Family Cluster

With originals from the 16th to 19th century, the typeface cannot always be identified unambiguously. However, the font can be assigned to a particular font family from a cluster of related font groups.

Figure 1. Font Family Example. Source: Weichselbaumer, Nikolaus; Seuret, Matthias; Limbach, Saskia et al.: New Approaches to OCR for Early Printed Books. DigItalia 2-2020. DOI: 10.36181/digitalia-00015.
We recommend using the following font families as values of @fontFamily:
  • antiqua
  • textura
  • gotico-antiqua
  • rotunda
  • italic
  • bastarda
  • greek
  • schwabacher
  • hebrew
  • fraktur
<Word>
   <TextStyle fontFamily="fraktur"/>
</Word>
Note:

This specification does not restrict the naming of font families.
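
Because the list above is a recommendation and not a restriction, a tool might flag unknown names without rejecting them. The following is a minimal sketch of such a check; the function name `is_recommended` and the case-insensitive comparison are assumptions of this example and are not part of the specification.

```python
# Font families recommended by this specification for @fontFamily.
RECOMMENDED_FONT_FAMILIES = frozenset({
    "antiqua", "textura", "gotico-antiqua", "rotunda", "italic",
    "bastarda", "greek", "schwabacher", "hebrew", "fraktur",
})


def is_recommended(font_family: str) -> bool:
    """Return True if the name is one of the recommended font families."""
    # Assumption of this sketch: names are compared case-insensitively
    # and surrounding whitespace is ignored.
    return font_family.strip().lower() in RECOMMENDED_FONT_FAMILIES


for name in ("fraktur", "Garamond"):
    if is_recommended(name):
        print(f"{name}: recommended")
    else:
        # Other names remain valid; the specification does not restrict naming.
        print(f"{name}: not in the recommended list (still allowed)")
```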

Typefaces and Probability of Recognition (Confidence)

A probability of recognition (confidence) can be appended to the name of a typeface or font family. This allows several probable typefaces or font families to be named and ranked by their probability.

The name of the typeface or font family and its confidence are joined with a colon (:). The confidence is a floating-point number between 0 (information is unlikely) and 1 (information is correct or likely).

If the confidence of a typeface or font family is not indicated, the value 1 is to be assumed.
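
The rules above can be sketched as a small parser. This is an illustration under stated assumptions, not a reference implementation: the function names `parse_font_family` and `rank_font_families` are hypothetical, and the comma as separator between several entries is inferred from the examples in this section.

```python
def parse_font_family(value: str) -> list[tuple[str, float]]:
    """Split a @fontFamily value into (name, confidence) pairs."""
    entries = []
    # Assumption: entries are comma-separated, as in the examples.
    for item in value.split(","):
        item = item.strip()  # tolerate extra whitespace around entries
        if not item:
            continue
        name, sep, conf = item.rpartition(":")
        if sep:
            name, confidence = name.strip(), float(conf)
        else:
            # No confidence given: the value 1 is to be assumed.
            name, confidence = item, 1.0
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range in {item!r}")
        entries.append((name, confidence))
    return entries


def rank_font_families(value: str) -> list[tuple[str, float]]:
    """Return the entries ordered from most to least likely."""
    # sorted() is stable, so entries with equal confidence keep their order.
    return sorted(parse_font_family(value), key=lambda e: e[1], reverse=True)


print(parse_font_family("fraktur"))
print(rank_font_families("schwabacher:0.3, fraktur:0.7"))
```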

Note:

Naming multiple typefaces or font families within the PAGE XML element <Word> does not mean that the text was set in multiple fonts or families. It means that each of the named typefaces or font families is a possible assignment for the text. The highest confidence value marks the most likely match.

Naming multiple typefaces or font families within the PAGE XML elements <TextLine> and <TextRegion> means that the text was set in different fonts or families.

        <TextRegion type="paragraph" id="TextRegion_1476719787056_252">
            <Coords points="980,2090 1529,2090 1741,2098 1741,2149 1529,2156 980,2156"/>
            <TextLine id="tl_83" primaryLanguage="German">
                <Coords points="981,2091 1528,2091 1528,2155 981,2155"/>
                <Baseline points="981,2154 1528,2154"/>
                <Word id="w_w1aab1c99b2b1b1ab1" language="German">
                    <Coords points="981,2096 1109,2096 1109,2151 981,2151"/>
                    <TextEquiv>
                        <Unicode>Troſt</Unicode>
                    </TextEquiv>
                    <TextStyle fontFamily="rotunda:0.8, bastarda:0.8" fontSize="53.0"/>
                </Word>
                <Word id="w_w1aab1c99b2b1b1ac13" language="German">
                    <Coords points="1121,2097 1189,2097 1189,2139 1121,2139"/>
                    <TextEquiv>
                        <Unicode>der</Unicode>
                    </TextEquiv>
                    <TextStyle fontFamily="rotunda:0.8, bastarda:0.8" fontSize="53.0"/>
                </Word>
                <Word id="w_w1aab1c99b2b1b1ac21" language="German">
                    <Coords points="1209,2093 1540,2093 1540,2151 1209,2151"/>
                    <TextEquiv>
                        <Unicode>Seefahrenden.</Unicode>
                    </TextEquiv>
                    <TextStyle fontFamily="rotunda:0.8, bastarda:0.8" fontSize="53.0"/>
                </Word>
                <TextEquiv>
                    <Unicode>Troſt der Seefahrenden.</Unicode>
                </TextEquiv>
                <TextStyle fontFamily="rotunda:0.8, bastarda:0.8" fontSize="53.0"/>
            </TextLine>
            <TextLine id="line_1476720742138_2">
                <Coords points="1675,2107 1742,2107 1742,2148 1675,2148"/>
                <Baseline points="1676,2149 1738,2146"/>
                <Word id="word_1476721009045_26">
                    <Coords points="1673,2103 1740,2103 1740,2151 1673,2151"/>
                    <TextEquiv>
                        <Unicode>538</Unicode>
                    </TextEquiv>
                    <TextStyle fontFamily="antiqua:0.8" fontSize="53.0"/>
                </Word>
                <TextEquiv>
                    <Unicode>538</Unicode>
                </TextEquiv>
                <TextStyle fontFamily="antiqua:0.8" fontSize="53.0"/>
            </TextLine>
            <TextEquiv>
                <Unicode>
                  Troſt der Seefahrenden. 538
                </Unicode>
            </TextEquiv>
            <TextStyle fontFamily="rotunda:0.8, bastarda:0.8, antiqua:0.8" fontSize="53.0"/>
        </TextRegion>
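
In the example above, the @fontFamily of the <TextRegion> lists every font family that occurs in its <Word> elements. The following sketch shows one way such a value could be derived with Python's standard library. The `SAMPLE` fragment is a shortened, hypothetical version of the example, the helper names are illustrative, and keeping the highest confidence per family is an assumption of this sketch, since this section does not define how confidences are combined.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: Coords and TextEquiv are omitted for brevity.
SAMPLE = """<TextRegion id="r_demo">
  <TextLine id="tl_1">
    <Word id="w_1"><TextStyle fontFamily="rotunda:0.8, bastarda:0.8"/></Word>
  </TextLine>
  <TextLine id="tl_2">
    <Word id="w_2"><TextStyle fontFamily="antiqua:0.8"/></Word>
  </TextLine>
</TextRegion>"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so the code also works on namespaced files."""
    return tag.rsplit("}", 1)[-1]


def parse_font_family(value: str) -> dict[str, float]:
    """Map each font family name to its confidence (1.0 if none is given)."""
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, conf = item.rpartition(":")
        if sep:
            result[name.strip()] = float(conf)
        else:
            result[item] = 1.0
    return result


def families_from_words(element: ET.Element) -> dict[str, float]:
    """Collect the font families of all <Word> descendants of an element."""
    merged = {}
    for node in element.iter():
        if local_name(node.tag) != "Word":
            continue
        for child in node:
            if local_name(child.tag) == "TextStyle":
                parsed = parse_font_family(child.get("fontFamily", ""))
                for name, conf in parsed.items():
                    # Assumption: keep the highest confidence seen per family.
                    merged[name] = max(conf, merged.get(name, 0.0))
    return merged


def format_font_family(families: dict[str, float]) -> str:
    """Serialise the mapping back into the @fontFamily notation."""
    return ", ".join(f"{name}:{conf}" for name, conf in families.items())


region = ET.fromstring(SAMPLE)
print(format_font_family(families_from_words(region)))
# prints: rotunda:0.8, bastarda:0.8, antiqua:0.8
```

The same function can be applied to a single <TextLine> element to obtain the value for that line.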