pdfwrite - improved handling of 'glyphshow' and similar operations

Bug #695168 "Problem converting xps to pdf"
Bug #695259 "both (incorrect) B/W and (correct) AA rendering of Libertine font in same PDF output"

Although these bugs involve different products, one being Ghostscript and the other gxps, the underlying problem is similar. The PostScript makes extensive use of the glyphshow operator, which ends up as a TEXT_FROM_SINGLE_GLYPH operation in the graphics library, and the XPS interpreter always uses TEXT_FROM_GLYPHS. In both cases the font is effectively unencoded when pdfwrite sees it.

Since we cannot construct and use an unencoded font in PDF we have no alternative but to create an encoding for the font, and write the text using that encoding. This is done using the name table, which effectively means that the character code we use is derived from the first byte of the glyph name. For limited usage this works well, but more complex usage can result in problems. For example the glyphs /o and /omicron are both encoded at index 103.

Previously this would cause us to fall back to rendering the font and embedding it as a type 3 font, or in the worst case as an inline image. This is because when encoding the text we would discover that the font already had a glyph encoded at the correct index, and would simply pass on, not realising it was the incorrect glyph. Later we would check the actual glyph index against the glyph index of the glyph encoded at that position, realise they were different and throw an error.

In this commit we check the glyph index early, at the point where we encode the text. If the font already has a glyph encoded at the given character code, we check the glyph index to see if it matches the current glyph. If it does, all is well, but if it doesn't we break out and create a new font instance, with the new glyph encoded in it.

Potentially this could result in a *lot* of font subsets being created, which would increase the size of the output PDF file, but the quality improvement is well worth it.

No differences expected.
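
The early check described above can be illustrated with a small standalone model. This is only a sketch and not the pdfwrite code: toy_font, glyph_id, NO_GLYPH, claim_slot and fonts_needed are hypothetical names invented for the illustration, and the character code is passed in directly rather than derived from a glyph name. It also assumes every conflict starts a brand new font, which may be more pessimistic than what pdfwrite actually does.

```c
#include <assert.h>
#include <stddef.h>

#define NO_GLYPH 0u             /* hypothetical marker for an empty slot */

typedef unsigned int glyph_id;  /* stand-in for gs_glyph */

/* A toy simple font: one glyph per single-byte character code. */
typedef struct {
    glyph_id encoding[256];
} toy_font;

/* Try to use 'code' for 'glyph'. Returns 1 if the slot was empty (and is
 * now claimed) or already holds the same glyph, 0 if it holds a different
 * glyph. */
static int claim_slot(toy_font *f, unsigned char code, glyph_id glyph)
{
    assert(f != NULL);
    if (f->encoding[code] == NO_GLYPH) {
        f->encoding[code] = glyph;
        return 1;
    }
    return f->encoding[code] == glyph;
}

/* Encode n (code, glyph) pairs, starting a fresh font whenever a slot is
 * already taken by a different glyph. Returns the number of fonts used. */
static int fonts_needed(const unsigned char *codes, const glyph_id *glyphs,
                        size_t n)
{
    toy_font font = {{NO_GLYPH}};
    int fonts = (n > 0) ? 1 : 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (!claim_slot(&font, codes[i], glyphs[i])) {
            toy_font fresh = {{NO_GLYPH}};

            font = fresh;       /* conflict: continue in a new font */
            ++fonts;
            claim_slot(&font, codes[i], glyphs[i]);
        }
    }
    return fonts;
}
```

In this model, two different glyphs that alternate on the same character code force a new font at every switch, which is the growth in font subsets the message warns about.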
author: Ken Sharp <ken.sharp@artifex.com> 2014-05-27 14:09:50 +0100
committer: Ken Sharp <ken.sharp@artifex.com> 2014-05-27 14:09:50 +0100
commit: 64dd281abf84ba7383aa85c99599b5aebea3998a (patch)
tree: 7074cb8a905764343a1e83aca8e51d943c3c1203
parent: 099657a962e716658f20658051c9692b439ecf2d (diff)
1 files changed, 16 insertions, 0 deletions
diff --git a/gs/devices/vector/gdevpdte.c b/gs/devices/vector/gdevpdte.c
index 21e2ad665..1532a89f6 100644
--- a/gs/devices/vector/gdevpdte.c
+++ b/gs/devices/vector/gdevpdte.c
@@ -1413,6 +1413,7 @@ process_plain_text(gs_text_enum_t *pte, void *vbuf, uint bsize)
             return_error(gs_error_unregistered); /* Must not happen. */
         count = 0;
         for (i = 0; i < size; ++i) {
+            pdf_font_resource_t *pdfont;
             gs_glyph glyph = gdata[pte->index + i];
             int char_code_length;
 
@@ -1420,6 +1421,21 @@ process_plain_text(gs_text_enum_t *pte, void *vbuf, uint bsize)
                          buf + count, size - count, &char_code_length);
             if (code < 0)
                 break;
+            /* Even if we already have a glyph encoded at this position in the font
+             * it may not be the *right* glyph. We effectively use the first byte of
+             * the glyph name as the index when using glyphshow which means that
+             * /o and /omicron would be encoded at the same index. So we need
+             * to check the actual glyph to see if they are the same. To do
+             * that we need the PDF font resource which is attached to the font (if any).
+             * cf bugs #695259 and #695168
+             */
+            code = pdf_attached_font_resource((gx_device_pdf *)penum->dev, font,
+                            &pdfont, NULL, NULL, NULL, NULL);
+            if (pdfont && pdfont->u.simple.Encoding[*(buf + count)].glyph != glyph)
+                /* the glyph doesn't match the glyph already encoded at this position.
+                 * Breaking out here will start a new PDF font resource in the code below.
+                 */
+                break;
             count += char_code_length;
             if (operation & TEXT_INTERVENE)
                 break; /* Just do one character. */