Revu Mac – Raster, Vector and Text – What’s Really in My PDF?

Summary

Why you may be unable to snap to an object in a PDF when taking a measurement, or to select text, in Revu Mac.


Problem

You’re unable to snap to an object in a PDF when taking a measurement, or you’re unable to select text.

Why does this happen?

There are two common questions that we get from people who are working with PDF files:

  1. Why can’t I snap to the content in the PDF when doing measurements and takeoffs?
  2. Why can’t I highlight, select, edit or search for text?

The answer to both questions is that PDFs aren’t all created in the same way, and some contain more “smart information” than others – even though they look exactly the same at first glance.

PDFs can contain raster, vector and text content, or a combination of all three. Our brains are excellent at recognizing patterns, so it’s easy to look at a PDF and conclude that it contains lines and text when it doesn’t. The page may show lines and characters, but what represents them in the PDF may not be vector lines and text elements, which are needed to snap to content and to search and select text.

Raster vs. Vector Content

Let’s look at the difference between raster and vector content in a PDF.

Figure: Raster PDF vs. Vector PDF

Raster

A raster image is made up of a grid of square dots, called pixels. A common example of a raster PDF is one created by scanning a paper document. Scanning makes a bitmap image of the paper page, like a JPEG or TIFF, and that image is then placed on the PDF page. This means that a scanned, or raster, PDF contains only a grid of dots that represent lines and text; it does not contain actual lines or text. So there are no lines for Snap to Content to snap to, and there is no text to select or search.
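To make the difference concrete, here is a minimal Python sketch that assumes a page can be modeled as a grid of 0 (white) and 1 (black) pixels. It converts one vector line segment into pixels using Bresenham’s line algorithm. That is roughly what happens to a drawing that is printed and then scanned. The function name `rasterize_segment` is hypothetical and is not part of Revu.

```python
def rasterize_segment(x0, y0, x1, y1, width, height):
    """Draw one line segment into a width x height pixel grid using
    Bresenham's line algorithm. Returns rows of 0 (white) / 1 (black)."""
    grid = [[0] * width for _ in range(height)]
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy  # accumulated error between the ideal line and the pixels
    while True:
        grid[y0][x0] = 1  # only a pixel value is stored, not the segment
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return grid


for row in rasterize_segment(0, 0, 4, 2, width=5, height=3):
    print(row)
# [1, 0, 0, 0, 0]
# [0, 1, 1, 0, 0]
# [0, 0, 0, 1, 1]
```

After drawing, the grid holds only pixel values. Nothing in it records that those pixels came from a single segment running from (0, 0) to (4, 2). That is why a scan gives Snap to Content no endpoints to find.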

To determine whether a PDF is a raster image, or scan, zoom in very closely. The lines and characters on the page will either break up into a grid of square dots or become fuzzy.
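The zoom test can be sketched in the same pixel-grid model. Assume nearest-neighbor enlargement, implemented by a hypothetical `zoom_nearest` helper rather than Revu’s actual renderer. Each pixel simply becomes a larger square block. By contrast, a vector segment’s endpoints, handled by the hypothetical `zoom_segment`, are multiplied exactly and stay sharp at any zoom level.

```python
def zoom_nearest(grid, factor):
    """Enlarge a pixel grid by an integer factor with nearest-neighbor
    sampling: every source pixel becomes a factor x factor block."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def zoom_segment(segment, factor):
    """Scale a vector segment's endpoints; the geometry stays exact."""
    (x0, y0), (x1, y1) = segment
    return ((x0 * factor, y0 * factor), (x1 * factor, y1 * factor))


print(zoom_nearest([[1, 0]], 2))          # [[1, 1, 0, 0], [1, 1, 0, 0]]
print(zoom_segment(((0, 0), (4, 2)), 8))  # ((0, 0), (32, 16))
```

Viewers that smooth the image with interpolation, instead of copying pixels, produce the fuzzy edges described above. Either way, no new detail appears.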

Vector

A vector-based PDF uses line segments to define all of the geometry on the page. Most PDFs created from CAD are vector-based. Vector PDFs are usually preferred to raster PDFs because they contain more smart data, which makes them easier to work with. You should always try to work with vector PDFs created from the source files instead of creating PDFs from scans. Working with a vector PDF has two benefits. First, the geometry stays sharp when you zoom in to see details of the drawing. Second, measurements and takeoffs (as well as their calibration) are more precise, because you can use Snap to Content to snap to the vector lines in the PDF.
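As a rough illustration of why snapping needs vector data, here is a minimal Python sketch of endpoint snapping. It assumes the page’s vector lines are available as a list of segments. The `snap` function and its `tolerance` parameter are hypothetical. This is not Revu’s implementation, which may snap to more than endpoints. The sketch shows only that a snap target must come from stored geometry.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def snap(cursor: Point, segments: List[Segment], tolerance: float) -> Optional[Point]:
    """Return the segment endpoint closest to the cursor, or None if no
    endpoint lies within `tolerance` (in page units)."""
    best: Optional[Point] = None
    best_distance = tolerance
    for segment in segments:
        for endpoint in segment:
            distance = math.dist(cursor, endpoint)
            if distance <= best_distance:
                best, best_distance = endpoint, distance
    return best


walls = [((0.0, 0.0), (10.0, 0.0)), ((10.0, 0.0), (10.0, 5.0))]
print(snap((9.8, 0.3), walls, tolerance=0.5))  # (10.0, 0.0)
print(snap((9.8, 0.3), [], tolerance=0.5))     # None: a scan has no segments
```

A measurement that starts from the snapped point uses the exact coordinates stored in the PDF, rather than wherever the cursor happened to land.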

Text

Text is an independent type of content in PDFs. You may see characters in the PDF, but those characters are not necessarily PDF text elements. Instead, they might be defined by raster dots or vector line segments. We can recognize and read both without any problem, but they can’t be selected, edited or searched because they are not PDF text elements.

To avoid confusion, I’m going to use “characters” when referring to text in general and “text” when referring to PDF text elements, or “real text.”

Before going into details, here is a quick test to determine whether your PDF contains text: click Edit > Select > Select All Text. All text in the PDF (both real text and OCR text, which I’ll explain below) will be highlighted in blue. If the characters aren’t highlighted, they are either raster or vector.
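At the file level, “real text” means that the page’s content stream paints characters with one of the PDF text-showing operators: `Tj`, `TJ`, `'` or `"`. The Python sketch below assumes an already decompressed content stream and uses a simplified tokenizer that does not handle nested parentheses inside strings. It reports whether a stream contains visible text, invisible text, or no text at all. Invisible text means text rendering mode 3, which OCR tools commonly use to hide their text layer behind the scanned image. The function `classify_text` is a hypothetical illustration, not how Revu’s Select All Text is implemented.

```python
import re
from typing import Set

# Operators that paint text in a PDF content stream.
TEXT_SHOWING_OPERATORS = {"Tj", "TJ", "'", '"'}

# Simplified tokenizer for content-stream syntax.
TOKEN = re.compile(r"""
    %[^\r\n]*                 # comment to end of line
  | \((?:\\.|[^\\()])*\)      # literal string (nested parentheses not handled)
  | <<|>>                     # dictionary delimiters
  | <[0-9A-Fa-f\s]*>          # hex string
  | [\[\]{}]                  # array and procedure delimiters
  | /[^\s()<>\[\]{}/%]*       # name object
  | [^\s()<>\[\]{}/%]+        # number, keyword or operator
""", re.VERBOSE | re.DOTALL)


def classify_text(content_stream: str) -> Set[str]:
    """Return a set containing "visible" and/or "invisible" for each kind
    of text painted in the stream; an empty set means no text at all."""
    found: Set[str] = set()
    render_mode = 0  # default text rendering mode 0: fill (visible)
    saved = []       # q/Q save and restore graphics state, including Tr
    previous = None
    for match in TOKEN.finditer(content_stream):
        tok = match.group(0)
        if tok.startswith("%"):  # skip comments
            continue
        if tok == "q":
            saved.append(render_mode)
        elif tok == "Q" and saved:
            render_mode = saved.pop()
        elif tok == "Tr" and previous is not None and previous.isdigit():
            render_mode = int(previous)  # operand precedes the operator
        elif tok in TEXT_SHOWING_OPERATORS:
            found.add("invisible" if render_mode == 3 else "visible")
        previous = tok
    return found


print(classify_text("BT /F1 12 Tf 72 712 Td (Hello) Tj ET"))        # {'visible'}
print(classify_text("q 612 0 0 792 0 0 cm /Im0 Do Q"))               # set()
print(classify_text("BT 3 Tr /F1 10 Tf 72 700 Td (Scanned) Tj ET"))  # {'invisible'}
```

In this model, the middle stream stands for a scanned page: it only draws an image (`Do`), so it contains nothing for text selection or search to find.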

Highlighted Text

  1. Let’s start with PDF text elements, or real text, which is always preferred because it results in a smarter PDF. Character-based programs, like Word and Excel, almost always create PDFs that contain real text. When you zoom in on real text, the edges of the characters look sharp and crisp no matter how close you zoom. The text is searchable and can always be selected.
  2. OCR text can also be included in PDFs. Running OCR, or Optical Character Recognition (available in Revu eXtreme only), on a PDF without real text adds hidden, searchable text to the PDF. The hidden OCR text can then be searched, selected and highlighted, but it is not displayed. OCR text is highlighted in blue when you use Ctrl+Shift+A to select all text, but remember that it is hidden text.
  3. Vector characters are created by line segments that are used to draw the shape of each character. This usually occurs when the PDF has been created from CAD (often AutoCAD) and a non-TrueType font is used.
    • An obvious question is: why doesn’t CAD use TrueType fonts and create real text? Because AutoCAD came first. It is older than Macintosh, Windows, and TrueType fonts, so its developers had to create their own system of text and fonts. The result was SHX fonts, which are defined using line segments. When the drawing is converted to PDF, those line segments are carried into the PDF instead of actual PDF text.
    • Using TrueType fonts in CAD is preferred when creating PDFs because TrueType fonts are converted to searchable text in PDF. The Bluebeam plugin for AutoCAD will automatically create PDFs with real, searchable text.
    • Vector characters from CAD can usually be spotted because they look “lumpy” when you zoom in close. The line segments used to draw each character can’t produce smooth curved strokes, which is what causes the lumpy appearance.
    • Graphic design programs, like Adobe Illustrator, also create vector characters. In this case, however, the vector characters have clear, sharp edges when you zoom in. These programs create vector characters in PDF because designers commonly use custom fonts and formatting. Converting the characters to vectors makes sure the PDF looks exactly like the original file.
  4. Raster characters are the same as the raster content described earlier: individual pixels define each character.
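The “lumpy” look of CAD vector characters can be put into numbers. This relies on a simplifying assumption, since SHX glyphs are not literally regular polygons: treat a curved stroke as a regular polygon of n straight segments inscribed in a circle of radius r. The largest gap between the true curve and each straight segment is then the sagitta, r * (1 - cos(pi / n)). The helper name `max_chord_gap` is hypothetical.

```python
import math


def max_chord_gap(radius: float, segments: int) -> float:
    """Largest distance between a circle of the given radius and a regular
    polygon of `segments` straight sides inscribed in it (the sagitta)."""
    return radius * (1 - math.cos(math.pi / segments))


# A curved stroke with a 1 mm radius, drawn with coarse and fine segments.
for n in (8, 64):
    print(n, round(max_chord_gap(1.0, n), 4))
# 8 0.0761
# 64 0.0012
```

With only 8 segments, the flat facets stand about 0.076 mm away from the true curve. Once you zoom in, that gap is easy to see. With 64 segments, the gap shrinks to about 0.0012 mm, and the curve looks much smoother.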

Examples of characters that are text, vector and raster, respectively.