API Reference
The APIs are organized into three modules:
- Common
- COS
- PD
The Common module provides general system access, file access and parsing APIs.
The COS module handles the low-level file format of PDF. COS stands for Carousel Object Structure, the original term used inside Adobe for the project that later became Acrobat. The COS layer contains the object structure, the object definitions and the cross-references used to access them.
The PD module is the higher-level document access layer. Accessing PDF pages, extracting their content, or understanding how a document is rendered using fonts or image objects typically happens in this layer.
These layers and their rationale are explained in detail in the Architecture and Design section.
Common
PDFIO.Common.CDTextString — Type
CDTextString
The PDF file format provides two primary string types: the hexadecimal string CosXString and the literal string CosLiteralString. However, these are mere binary representations of strings, with no encoding associated for semantic interpretation. The encoding is mostly determined by the associated fonts and character maps in the content stream. Strings are also used in descriptions and other attributes of a PDF file where no font or mapping information is provided. CDTextString represents the string type in such situations. Typically, strings in PDFs are of three types:
1. Text string
   a. PDFDocEncoding string - similar to ISO-8859-1
   b. UTF-16BE string
2. ASCII string
3. Byte string - pure binary data with no interpretation
Types 1 and 2 can be represented by CDTextString. convert methods are provided to translate a CosString to a CDTextString.
Ref: PDF Specification Section 7.9.2
Note: Internally, CDTextString is a Julia String.
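The following self-contained sketch shows how a reader might choose between the two text-string encodings. It is not PDFIO's convert implementation. The function name decode_pdf_text is hypothetical. PDFDocEncoding is approximated by ISO-8859-1 (Latin-1), which it matches for most byte values but not all. The sketch relies on the PDF specification's rule that a UTF-16BE text string starts with the byte-order mark FE FF.

```julia
# Hypothetical helper (not part of PDFIO): decode the raw bytes of a PDF text
# string into a Julia String. PDFDocEncoding is approximated by ISO-8859-1
# (Latin-1); the two differ in a few byte ranges (for example 0x18-0x1F, 0x80-0x9F).
function decode_pdf_text(bytes::AbstractVector{UInt8})
    if length(bytes) >= 2 && bytes[1] == 0xfe && bytes[2] == 0xff
        # UTF-16BE: skip the byte-order mark and combine byte pairs
        # (high byte first) into 16-bit code units; a trailing odd byte is ignored.
        body = bytes[3:end]
        units = UInt16[(UInt16(body[i]) << 8) | body[i + 1] for i in 1:2:length(body) - 1]
        return transcode(String, units)
    else
        # Latin-1 approximation: every byte maps to the code point of the same value.
        return String(map(Char, bytes))
    end
end
```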
PDFIO.Common.CDDate — Type
CDDate
Internally represented as string objects, these are timezone-aware date and time objects.
PDF files support the string format: (D:YYYYMMDDHHmmSSOHH'mm)
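The date format, and the UTC conversion that getUTCTime performs, can be sketched with the Dates standard library. This illustrates the format under stated assumptions and is not PDFIO's parser. parse_pdf_date is a hypothetical name. The sketch assumes all fields from year to seconds are present, although the specification allows trailing fields to be omitted. O must be one of +, - or Z.

```julia
using Dates

# Hypothetical helper (not part of PDFIO): parse D:YYYYMMDDHHmmSSOHH'mm and
# return (local time, UTC time). Assumes the year-to-seconds fields are present.
function parse_pdf_date(s::AbstractString)
    m = match(r"^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})([+\-Z])(?:(\d{2})'(\d{2})'?)?$", s)
    m === nothing && throw(ArgumentError("unsupported PDF date: $s"))
    y, mo, d, h, mi, sec = [parse(Int, m.captures[i]) for i in 1:6]
    local_time = DateTime(y, mo, d, h, mi, sec)
    dir = m.captures[7]
    dir == "Z" && return (local_time, local_time)   # already UTC
    hh = m.captures[8] === nothing ? 0 : parse(Int, m.captures[8])
    mm = m.captures[9] === nothing ? 0 : parse(Int, m.captures[9])
    offset = Minute(60 * hh + mm)
    # "+" means local time is ahead of UTC, so UTC = local time - offset.
    utc = dir == "+" ? local_time - offset : local_time + offset
    return (local_time, utc)
end
```

For example, parse_pdf_date("D:20190425173659+05'30") yields 2019-04-25T17:36:59 local time and 2019-04-25T12:06:59 UTC. This matches the getUTCTime example in this reference.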
PDFIO.Common.CDDate — Method
CDDate(s::CDTextString)
PDF files support the string format: (D:YYYYMMDDHHmmSSOHH'mm)
Example
julia> date = CDDate("D:20190425173659+05'30")
D:20190425173659+05'30
julia> date.d
2019-04-25T17:36:59
julia> date.tz
5 hours, 30 minutes
julia> date.ahead
true
PDFIO.Common.getUTCTime — Function
getUTCTime(d::CDDate) -> CDDate
Removes the timezone information and returns the CDDate converted to UTC.
Example
julia> getUTCTime(CDDate("D:20190425173659+05'30"))
D:20190425120659Z
PDFIO.Common.CDRect — Type
CDRect
CosArray representation of a rectangle in the lower-left and upper-right point format.
Note: CDRect maps to a Rect object in the Rectangle package.
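The PDF specification (Section 7.9.5) notes that rectangles are conventionally written as lower-left and upper-right corners, but any two diagonally opposite corners are acceptable. Readers should therefore be prepared to normalize them. The stdlib sketch below performs that normalization. normalize_rect and rect_size are hypothetical names, not PDFIO functions, and the input is a plain vector of four numbers rather than a CosArray.

```julia
# Hypothetical helper (not part of PDFIO): normalize a rectangle given by any two
# diagonally opposite corners [x1, y1, x2, y2] into [llx, lly, urx, ury] form.
function normalize_rect(r::AbstractVector{<:Real})
    length(r) == 4 || throw(ArgumentError("a PDF rectangle needs 4 numbers"))
    x1, y1, x2, y2 = Float32.(r)
    return Float32[min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]
end

# Width and height follow directly from the normalized corners.
rect_size(r) = (n = normalize_rect(r); (n[3] - n[1], n[4] - n[2]))
```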
Example
julia> CDRect(CosArray(CosObject[CosInt(0), CosInt(0), CosInt(840), CosFloat(640)]))
Rect:[0.0 0.0 840.0 640.0]
COS Objects
PDFIO.Cos.CosObject — Type
CosObject
PDF is a structured document format with many internal data structures, such as dictionaries, arrays and trees. CosObject is the interface for accessing these objects in detail and gathering additional information about them. Although defined in the COS layer, objects of this type are returned by almost all the APIs, so they matter whether or not you use the COS layer directly. The object hierarchy is shown below.
| Object | Kind |
|---|---|
| CosObject | Abstract |
| CosNull | Value (CosNullType) |
| CosString | Abstract |
| CosName | Concrete |
| CosNumeric | Abstract |
| CosInt | Concrete |
| CosFloat | Concrete |
| CosBoolean | Concrete |
| CosTrue | Value (CosBoolean) |
| CosFalse | Value (CosBoolean) |
| CosDict | Concrete |
| CosArray | Concrete |
| CosStream | Concrete (always wrapped as an indirect object) |
| CosIndirectObjectRef | Concrete (only useful when a CosDoc is available) |
Note: As a user of the reader API, you normally do not need to instantiate any CosObject types. They are populated as a result of parsing a PDF file.
PDFIO.Cos.CosNull — Constant
CosNull
PDF representation of a null object. Can be applied to a CosObject of any type.
PDFIO.Cos.CosString — Type
CosString
Abstract type that represents a PDF string. In PDF, strings are mere byte representations. They become actual text strings through the application of fonts and associated encodings.
PDFIO.Cos.CosName — Type
CosName
Name objects are symbols used in PDF documents.
PDFIO.Cos.@cn_str — Macro
@cn_str(str) -> CosName
A string macro for easier instantiation of a CosName.
Example:
julia> cn"Name"
/Name
PDFIO.Cos.CosNumeric — Type
CosNumeric
Abstract type for numeric objects. The objects can be an integer CosInt or a float CosFloat.
PDFIO.Cos.CosInt — Type
CosInt
An integer in a PDF document.
PDFIO.Cos.CosFloat — Type
CosFloat
A numeric float data type.
PDFIO.Cos.CosBoolean — Type
CosBoolean
A boolean object in PDF, which is either CosTrue or CosFalse.
PDFIO.Cos.CosDict — Type
CosDict
A dictionary of name-value pairs in a PDF object. The object is very similar to Julia's Dict object. The keys must be of type CosName.
PDFIO.Cos.set! — Method
set!(dict::CosDict, name::CosName, obj::CosObject) -> CosDict
set!(stm::CosStream, name::CosName, obj::CosObject) -> CosStream
Sets the value on a dictionary object. Setting a CosNull object deletes the key from the dictionary.
For CosStream objects, the value is set in the extent dictionary.
Example
julia> set!(catalog, cn"Version", cn"1.4")
<<
...
/Version /1.4
...
>>
PDFIO.Cos.CosArray — Type
CosArray
An array in a PDF file. The objects can be any combination of CosObject.
Base.length — Method
length(o::CosArray) -> Int
Number of elements in the CosArray.
Example
julia> a = CosArray(CosObject[CosInt(1), CosFloat(2f0),
CosInt(3), CosFloat(4f0)])
[1 2.0 3 4.0 ]
julia> length(a)
4
PDFIO.Cos.CosStream — Type
CosStream
A stream object in a PDF. Stream objects have an extent dictionary followed by binary data.
PDFIO.Cos.CosIndirectObjectRef — Type
CosIndirectObjectRef
A parsed data structure that stores a reference to an object. It has no meaning without an associated CosDoc. When a reference object is encountered, the referenced object should be looked up in the CosDoc and returned.
Base.get — Function
get(o::CosObject) -> val
get(o::CosIndirectObjectRef) -> (objnum, gennum)
get(o::CosArray, isNative=false) -> Vector{CosObject}
Returns the value held by a CosObject. For a CosIndirectObjectRef, the object number and generation number are returned. For a CosArray, the elements are returned as a vector; they can be any combination of CosObject.
isNative = true returns the underlying native objects inside the CosArray by invoking get on each of them.
Example
julia> a = CosArray(CosObject[CosInt(1), CosFloat(2f0), CosInt(3), CosFloat(4f0)])
[1 2.0 3 4.0 ]
julia> get(a)
4-element Array{CosObject,1}:
1
2.0
3
4.0
julia> get(a, true)
4-element Array{Real,1}:
1
2.0f0
3
4.0f0
get(dict::CosDict, name::CosName, defval::T = CosNull) where T -> Union{CosObject, T}
get(stm::CosStream, name::CosName, defval::T = CosNull) where T -> Union{CosObject, T}
Returns the value for the key name as a CosObject, or defval if the key is not present.
For CosStream objects, the value is looked up in the extent dictionary.
Example
julia> get(catalog, cn"Version")
null
julia> get(catalog, cn"Version", cn"1.4")
/1.4
julia> get(catalog, cn"Version", "1.4")
"1.4"
get(stm::CosStream) -> IO
Decodes the stream and provides the output as an IO.
Example
julia> stm
448 0 obj
<<
/FFilter /FlateDecode
/F (/tmp/tmpIyGPhL/tmp9hwwaG)
/Length 437
>>
stream
...
endstream
endobj
julia> io = get(stm)
IOBuffer(data=UInt8[...], readable=true, writable=true, ...)
PD
PDFIO.PD.PDDoc — Type
PDDoc
An in-memory representation of a PDF document. It is mostly used as an opaque handle to be passed to other methods.
See pdDocOpen.
PDFIO.PD.pdDocOpen — Function
pdDocOpen(filepath::AbstractString) -> PDDoc
Opens a PDF document and provides the PDDoc document object for subsequent queries into the PDF file. filepath is the path to the PDF file, in relative or absolute format.
Remember to release the document with pdDocClose once the object is no longer required. Although doc has certain members, it should normally be treated as an opaque handle.
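Every pdDocOpen must be paired with a pdDocClose. One option is to wrap the pair in a helper that releases the document even when an error is thrown. The sketch below uses only pdDocOpen, pdDocClose and pdDocGetPageCount, which are documented in this reference. with_pddoc itself is a hypothetical helper, not part of PDFIO, and the file name in the usage comment is a placeholder.

```julia
using PDFIO

# Hypothetical helper (not part of PDFIO): open the PDF at `filepath`, call
# `f(doc)`, and always release the document, even if `f` throws.
function with_pddoc(f, filepath::AbstractString)
    doc = pdDocOpen(filepath)
    try
        return f(doc)
    finally
        pdDocClose(doc)
    end
end

# Usage with Julia's do-block syntax ("sample.pdf" is a placeholder path):
# npages = with_pddoc("sample.pdf") do doc
#     pdDocGetPageCount(doc)
# end
```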
Example
julia> doc = pdDocOpen("test/PDFTest-0.0.4/stillhq/3.pdf")
PDDoc ==>
CosDoc ==>
filepath: /home/sambit/.julia/dev/PDFIO/test/PDFTest-0.0.4/stillhq/3.pdf
size: 817945
hasNativeXRefStm: false
Trailer dictionaries:
<<
/Info 146 0 R
/Prev 814755
/Size 163
/Root 154 0 R
/ID [<2ff783c9846ab546bd49f709cb7be307> <2ff783c9846ab546bd49f709cb7be307> ]
>>
<<
/Size 153
/ID [<2ff783c9846ab546bd49f709cb7be307> <2ff783c9846ab546bd49f709cb7be307> ]
>>
Catalog:
154 0 obj
<<
/Type /Catalog
/Pages 152 0 R
>>
endobj
isTagged: none
PDFIO.PD.pdDocClose — Function
pdDocClose(doc::PDDoc) -> Nothing
Reclaims the resources associated with a PDDoc object. Once it is called, the PDDoc object cannot be used further.
Example
julia> pdDocClose(doc)
PDFIO.PD.pdDocGetCatalog — Function
pdDocGetCatalog(doc::PDDoc) -> CosObject
The catalog is the topmost-level object in a PDF document. It is used to traverse the document and extract information from it. Use it to access PDF internal objects from the document structure when no direct API is available.
Example
julia> pdDocGetCatalog(doc)
154 0 obj
<<
/Pages 152 0 R
/Type /Catalog
>>
endobj
PDFIO.PD.pdDocGetNamesDict — Function
pdDocGetNamesDict(doc::PDDoc) -> CosObject
Some information in a PDF is stored as name-value pairs that are not necessarily a plain dictionary. All of it is aggregated and can be accessed from a single names dictionary object in the document catalog. This method provides access to such values in a PDF file. Not every PDF document has a names dictionary. In such cases, a CosNull object may be returned.
Please refer to the PDF specification for further details.
Example
julia> pdDocGetNamesDict(doc)
220 0 obj
<<
/IDS 123 0 R
/Dests 119 0 R
/URLS 124 0 R
>>
endobj
PDFIO.PD.pdDocGetInfo — Function
pdDocGetInfo(doc::PDDoc) -> Dict
Returns the document information available in the document Info dictionary of a PDF document. The information typically includes the creation date, modification date, author, creator application and so on. However, none of these entries is mandatory, so some information may not be available in a given document. If the document has no Info dictionary at all, this method returns nothing.
Please refer to the PDF specification for further details.
Example
julia> pdDocGetInfo(doc)
Dict{String,Union{CDDate, String, CosObject}} with 7 entries:
"Subject" => "AU-B Australian Documents"
"Producer" => "HPA image bureau 1998-1999"
"Author" => "IP Australia"
"ModDate" => D:19990527113911Z
"Keywords" => "Patents"
"Creator" => "HPA image bureau 1998-1999"
"Title" => "199479714D"
PDFIO.PD.pdDocGetCosDoc — Function
pdDocGetCosDoc(doc::PDDoc) -> CosDoc
The PDF document format is developed in two layers. Logical PDF document information is represented over a physical file structure called COS. CosDoc is an access object for the physical file structure of the PDF document. Use it to access PDF internal objects from the document structure when no direct API is available.
Any aspect of a PDF can be accessed using the COS-level APIs alone. However, they require detailed knowledge of the PDF specification and are not the most intuitive.
Example
julia> cosdoc = pdDocGetCosDoc(doc)
CosDoc ==>
filepath: /home/sambit/.julia/dev/PDFIO/test/PDFTest-0.0.4/stillhq/3.pdf
size: 817945
hasNativeXRefStm: false
Trailer dictionaries:
<<
/ID [<2ff783c9846ab546bd49f709cb7be307> <2ff783c9846ab546bd49f709cb7be307> ]
/Size 163
/Prev 814755
/Info 146 0 R
/Root 154 0 R
>>
<<
/ID [<2ff783c9846ab546bd49f709cb7be307> <2ff783c9846ab546bd49f709cb7be307> ]
/Size 153
>>
PDFIO.PD.pdDocGetPage — Function
pdDocGetPage(doc::PDDoc, num::Int) -> PDPage
pdDocGetPage(doc::PDDoc, ref::CosIndirectObjectRef) -> PDPage
Given an absolute page number in the document or an object reference, provides the associated page object.
Example
julia> page = pdDocGetPage(doc, 1)
PDFIO.PD.PDPageImpl(...)
julia> page = pdDocGetPage(doc, CosIndirectObjectRef(155, 0))
PDFIO.PD.PDPageImpl(...)
PDFIO.PD.pdDocGetPageCount — Function
pdDocGetPageCount(doc::PDDoc) -> Int
Returns the number of pages in the document.
Example
julia> pdDocGetPageCount(doc)
30
PDFIO.PD.pdDocGetPageRange — Function
pdDocGetPageRange(doc::PDDoc, nums::AbstractRange{Int}) -> Vector{PDPage}
pdDocGetPageRange(doc::PDDoc, label::AbstractString) -> Vector{PDPage}
Given a range of page numbers or a label, returns an array of the associated pages. For a detailed explanation of page labels, refer to the method pdDocHasPageLabels.
Example
julia> pages = pdDocGetPageRange(doc, 1:4);
julia> typeof(pages)
Array{PDFIO.PD.PDPageImpl,1}
julia> length(pages)
4
PDFIO.PD.pdDocHasPageLabels — Function
pdDocHasPageLabels(doc::PDDoc) -> Bool
Returns true if the document has page labels defined.
As per PDF Specification 1.7 Section 12.4.2, a document may optionally define page labels (PDF 1.3) to identify each page visually on the screen or in print. Page labels and page indices need not coincide: the indices shall be fixed, running consecutively through the document starting from 0 for the first page, but the labels may be specified in any way that is appropriate for the particular document.
Example
julia> PDFIO.PD.pdDocHasPageLabels(doc)
false
PDFIO.PD.pdDocGetPageLabel — Function
pdDocGetPageLabel(doc::PDDoc, pageno::Int) -> String
Returns the page label if the page has a page label associated with it.
As per PDF Specification 1.7 Section 12.4.2, a document may optionally define page labels (PDF 1.3) to identify each page visually on the screen or in print. Page labels and page indices need not coincide: the indices shall be fixed, running consecutively through the document starting from 0 for the first page, but the labels may be specified in any way that is appropriate for the particular document.
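Section 12.4.2 defines the numbering styles a label range can use, optionally preceded by a prefix:
- decimal (/D)
- uppercase and lowercase Roman numerals (/R, /r)
- uppercase and lowercase letters (/A, /a), which run A to Z, then AA to ZZ, and so on

The stdlib sketch below formats a single label number in each style. format_page_label, to_roman and to_letters are hypothetical names and do not reflect PDFIO's implementation. Styles are given as Julia Symbols. The number passed in is the label number within its range (which starts at the range's /St value, 1 by default), not the page index.

```julia
# Hypothetical sketch (not PDFIO's implementation): page label numbering styles
# from PDF 1.7, Section 12.4.2.
function to_roman(n::Int)
    n > 0 || throw(ArgumentError("Roman numerals need a positive number"))
    vals = (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1)
    syms = ("M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I")
    io = IOBuffer()
    for (v, s) in zip(vals, syms)
        while n >= v          # emit the largest symbol that still fits
            print(io, s)
            n -= v
        end
    end
    return String(take!(io))
end

# Letters: A..Z for 1..26, AA..ZZ for 27..52, AAA.. for 53.., and so on.
to_letters(n::Int) = repeat(string('A' + (n - 1) % 26), (n - 1) ÷ 26 + 1)

function format_page_label(n::Int, style::Symbol; prefix::AbstractString = "")
    body = style === :D ? string(n) :
           style === :R ? to_roman(n) :
           style === :r ? lowercase(to_roman(n)) :
           style === :A ? to_letters(n) :
           style === :a ? lowercase(to_letters(n)) :
           ""                 # no numbering style: the label is the prefix alone
    return prefix * body
end
```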
Example
julia> pdDocGetPageLabel(doc, 3)
"ii"
PDFIO.PD.pdDocGetOutline — Function
pdDocGetOutline(doc::PDDoc) -> PDOutline
Given a PDF document, provides the document outline (table of contents) available in the document catalog dictionary. If the document does not have an outline, this method returns nothing.
A PDF document may contain a document outline that the conforming reader may display on the screen, allowing the user to navigate interactively from one part of the document to another. The outline consists of a tree-structured hierarchy of outline items (sometimes called bookmarks), which serve as a visual table of contents to display the document’s structure to the user. The user may interactively open and close individual items by clicking them with the mouse. When an item is open, its immediate children in the hierarchy shall become visible on the screen; each child may in turn be open or closed, selectively revealing or hiding further parts of the hierarchy. When an item is closed, all of its descendants in the hierarchy shall be hidden. Clicking the text of any visible item activates the item, causing the conforming reader to jump to a destination or trigger an action associated with the item. - Section 12.3.3 - Document management — Portable document format — Part 1: PDF 1.7
Example
julia> outline = pdDocGetOutline(doc)
555 0 R
julia> iob = IOBuffer();
julia> using AbstractTrees; print_tree(iob, outline)
julia> write(stdout, iob.data)
Contents
├─ Table of Contents
├─ 1. Introduction
├─ 2. Quick Steps - Kernel Compile
│ ├─ 2.1. Precautionary Preparations
│ ├─ 2.2. Minor Upgrading of Kernel
│ ├─ 2.3. For the Impatient
│ ├─ 2.4. Building New Kernel - Explanation of Steps
│ ├─ 2.5. Troubleshooting
...
PDFIO.PD.pdDocHasSignature — Function
pdDocHasSignature(doc::PDDoc) -> Bool
Returns true when the document has at least one signature field.
This does not mean that an actual digital signature is embedded in the document. A PDF document can be signed, and its content can be approved by one or more reviewers. Signature fields are placeholders for storing and rendering such information.
Example
julia> pdDocHasSignature(doc)
true
PDFIO.PD.pdDocValidateSignatures — Function
pdDocValidateSignatures(doc::PDDoc; export_certs=false) -> Vector{Dict{Symbol, Any}}
Input
| param | Description |
|---|---|
| doc | The document for which all the signatures are to be validated. |
| export_certs | Optional keyword parameter. When set, exports all the certificates that are embedded in the PDF document. These certificates can be for end entities or for one or more certifying authorities. Certificates are exported to the file <PDF filename>.pem. |
Output
Vector of dictionary objects representing one dictionary object for each signature. The dictionary objects map the symbols to output as per the following table.
| Symbol | Description |
|---|---|
| :Name | The name of the person or authority signing the document. |
| :P | Object reference of the page in which the signature is found. |
| :M | The CDDate when the document was signed. |
| :certs | The certificates associated with every signature object. |
| :subfilter | The subfilter of PDF signature object. |
| :FQT | Fully qualified title of the signature form. |
| :chain | The certificate chain that validated the signature. |
| :passed | Validation status of the signature (true / false) |
| :error_message | Error message returned during the validation |
| :stacktrace | The stack dump of where the validation failure occurred |
Notes
- Any additional certificates needed to validate a certificate trust chain have to be added manually, in PEM format, to the trust store file at:
<Package Directory>/data/certs/cacerts.pem
Normally, certificate authorities (root as well as intermediate) are represented in the trust store.
- If an end-entity certificate is present in the trust store, chain validation does not have to be carried out for that certificate. However, this is not considered good practice, because certificate chain validation is an important safeguard against security breaches in the chain. For self-signed certificates without CA capabilities, this may be the only option.
- Validation of digital signatures is limited to approval signature validation as per Section 12.8.1 of the PDF Spec. 1.7. Signatures for permissions and usage rights are not validated by this method. This API only provides a validation report. It does not modify access to any part of the document based on the validation output. The consumer of the API needs to take appropriate action based on the validation report, as their application requires.
- Revocation - The signing time is used for validation when it is embedded in the signature as a signing-time attribute or a signed timestamp, or when the PDF signature dictionary has an M attribute. However, revocation information is not used during validation.
- PDF 2.0 support - The support is only experimental. Some subfilters, like /ETSI.CAdES.detached, are supported. Document Security Store (DSS) and Document Time Stamp (DTS) have not been implemented.
Example
julia> r = pdDocValidateSignatures(doc);
julia> r[1] # Failure case
Dict{Symbol,Any} with 8 entries:
:Name => "JAYANT KUMAR ARORA"
:P => 1 0 R
:M => D:20190425173659+05'30
:error_message => "Error in Crypto Library:
140322274480320:error:02001002:system library:..."
:subfilter => /adbe.pkcs7.sha1
:stacktrace => ["error(::String) at error.jl:33",
"openssl_error(::Int32) at PDCrypt.jl:96",
"PDFIO.PD.PDCertStore() at PDCrypt.jl:148",
...]
:FQT => "Signature1"
:passed => false
julia> r[1] # Passed case
Dict{Symbol,Any} with 8 entries:
:Name => "JAYANT KUMAR ARORA"
:P => 1 0 R
:M => D:20190425173659+05'30
:certs => Dict{Symbol,Any}[Certificate Parameters...]
:subfilter => /adbe.pkcs7.sha1
:FQT => "Signature1"
:chain => Dict{Symbol,Any}[Certificate Parameters...]
:passed => true
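Because the API only reports results, the caller decides how to act on them. The sketch below turns a report into one line per signature. It is a hypothetical helper, not part of PDFIO. It relies only on the :FQT, :passed and :error_message keys documented in the output table of pdDocValidateSignatures. In practice it would be called as summarize_signatures(pdDocValidateSignatures(doc)).

```julia
# Hypothetical helper (not part of PDFIO): one summary line per signature in a
# report returned by pdDocValidateSignatures. Uses only the :FQT, :passed and
# :error_message keys from the documented output table.
function summarize_signatures(report::AbstractVector{<:AbstractDict{Symbol}})
    lines = String[]
    for sig in report
        title = get(sig, :FQT, "<untitled>")
        if get(sig, :passed, false) === true
            push!(lines, "$title: passed")
        else
            # Keep only the first line of a possibly multi-line error message.
            msg = first(split(string(get(sig, :error_message, "no error message")), '\n'))
            push!(lines, "$title: FAILED ($msg)")
        end
    end
    return lines
end
```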
PDFIO.PD.pdPageGetContents — Function
pdPageGetContents(page::PDPage) -> CosObject
Page rendering objects are normally stored in a CosStream object in a PDF file. This method provides access to the stream object.
Please refer to the PDF specification for further details.
Example
julia> pdPageGetContents(page)
448 0 obj
<<
/Length 437
/FFilter /FlateDecode
/F (/tmp/tmpZnGGFn/tmp5J60vr)
>>
stream
...
endstream
endobj
PDFIO.PD.pdPageIsEmpty — Function
pdPageIsEmpty(page::PDPage) -> Bool
Returns true when the page has no associated content object.
Example
julia> pdPageIsEmpty(page)
false
PDFIO.PD.pdPageGetCosObject — Function
pdPageGetCosObject(page::PDPage) -> CosObject
The PDF document format is developed in two layers. Logical PDF document information is represented over a physical file structure called COS. This method provides the internal COS object associated with the page object.
PDFIO.PD.pdPageGetContentObjects — Function
pdPageGetContentObjects(page::PDPage) -> CosObject
Page rendering objects are normally stored in a CosStream object in a PDF file. This method provides access to the stream object.
PDFIO.PD.pdPageGetMediaBox — Function
pdPageGetMediaBox(page::PDPage) -> CDRect{Float32}
pdPageGetCropBox(page::PDPage) -> CDRect{Float32}
Returns the media box or the crop box associated with the page (see PDF 1.7 Spec, Section 14.11.2). The media box is typically the designated paper size of the page. When a crop box is not defined, it defaults to the media box.
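Box coordinates are in default user space units, which the PDF specification defines as 1/72 inch unless a page sets /UserUnit. The stdlib sketch below converts a [llx, lly, urx, ury] box, such as the one returned by pdPageGetMediaBox, into physical dimensions. box_size_inches and box_size_mm are hypothetical helpers that take a plain vector of four numbers. The /UserUnit case is ignored.

```julia
# Hypothetical helpers (not part of PDFIO): physical size of a box given as
# [llx, lly, urx, ury] in default user space units (1/72 inch). Ignores /UserUnit.
const POINTS_PER_INCH = 72.0
const MM_PER_INCH = 25.4

function box_size_inches(box::AbstractVector{<:Real})
    length(box) == 4 || throw(ArgumentError("expected [llx, lly, urx, ury]"))
    width = abs(box[3] - box[1]) / POINTS_PER_INCH
    height = abs(box[4] - box[2]) / POINTS_PER_INCH
    return (width, height)
end

box_size_mm(box) = map(x -> x * MM_PER_INCH, box_size_inches(box))
```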
Example
julia> pdPageGetMediaBox(page)
Rect:[0.0 0.0 595.0 792.0]
julia> pdPageGetCropBox(page)
Rect:[0.0 0.0 595.0 792.0]
PDFIO.PD.pdPageGetFonts — Function
pdPageGetFonts(page::PDPage) -> Dict{CosName, PDFont}
Returns a dictionary of the fonts in the page.
Example
julia> pdPageGetFonts(page)
Dict{CosName,PDFIO.PD.PDFont} with 4 entries:
/F0 => PDFont(…
/F4 => PDFont(…
/F8 => PDFont(…
/F9 => PDFont(…
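The /Flags integer of a PDF font descriptor records attributes such as italic, fixed pitch, all caps and small caps (PDF 1.7, Section 9.8.2). The stdlib sketch below decodes that bit field. FONT_FLAG_BITS, hasflag and font_flags are hypothetical names. This is not necessarily how PDFIO's pdFontIs* functions determine their results.

```julia
# Hypothetical sketch (not PDFIO's implementation): decode the /Flags entry of a
# PDF font descriptor. Bit positions are 1-based, as in the PDF specification.
const FONT_FLAG_BITS = Dict(
    :FixedPitch  => 1,
    :Serif       => 2,
    :Symbolic    => 3,
    :Script      => 4,
    :Nonsymbolic => 6,
    :Italic      => 7,
    :AllCap      => 17,
    :SmallCap    => 18,
    :ForceBold   => 19,
)

# True when the 1-based bit `pos` is set in `flags`.
hasflag(flags::Integer, pos::Integer) = (flags >> (pos - 1)) & 1 == 1

# Names of all flags set in `flags`, ordered by bit position.
function font_flags(flags::Integer)
    found = [name for (name, pos) in FONT_FLAG_BITS if hasflag(flags, pos)]
    return sort(found; by = name -> FONT_FLAG_BITS[name])
end
```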
PDFIO.PD.pdPageExtractText — Function
pdPageExtractText(io::IO, page::PDPage) -> IO
Extracts the text from the page. This extraction works best for tagged PDF files. For untagged PDFs, some line and word breaks may not be extracted properly.
Example
The following code extracts the text from an entire PDF file.
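using PDFIO

# Extracts the text of every page of `src` into the file `out` and
# returns the document information dictionary.
function getPDFText(src, out)
    doc = pdDocOpen(src)
    try
        docinfo = pdDocGetInfo(doc)
        open(out, "w") do io
            npage = pdDocGetPageCount(doc)
            for i = 1:npage
                page = pdDocGetPage(doc, i)
                pdPageExtractText(io, page)
            end
        end
        return docinfo
    finally
        pdDocClose(doc)   # release the document even if extraction fails
    end
end
PDFIO.PD.pdPageGetPageNumber — Function
pdPageGetPageNumber(page::PDPage)
Returns the page number of the document page.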
Example
julia> pdPageGetPageNumber(page)
1
PDFIO.PD.pdFontIsBold — Function
pdFontIsBold(pdfont::PDFont) -> Bool
Returns true if the font is bold.
PDFIO.PD.pdFontIsItalic — Function
pdFontIsItalic(pdfont::PDFont) -> Bool
Returns true if the font is italic.
PDFIO.PD.pdFontIsFixedW — Function
pdFontIsFixedW(pdfont::PDFont) -> Bool
Returns true if the font is fixed width.
PDFIO.PD.pdFontIsAllCap — Function
pdFontIsAllCap(pdfont::PDFont) -> Bool
Returns true if the font has the all-caps attribute.
PDFIO.PD.pdFontIsSmallCap — Function
pdFontIsSmallCap(pdfont::PDFont) -> Bool
Returns true if the font has the small-caps attribute.
PDFIO.PD.PDOutline — Type
PDOutline
Representation of the PDF document outline (table of contents).
Use the methods from the AbstractTrees package to traverse the elements.
PDFIO.PD.PDOutlineItem — Type
PDOutlineItem
Representation of an item in the PDF document outline.
PDFIO.PD.PDDestination — Type
PDDestination
Used for a variety of purposes to locate a rectangular region in a PDF document, in particular in outlines and actions.
The structure can also denote a location outside the document, as in remote go-to (GoToR) actions. In such cases, it is best used together with a file name. Moreover, page references have no meaning in references to remote files. Hence, the pageno attribute is an Int, unlike in PDF Spec 32000-2008 Section 12.3.2.2.
- `pageno::Int` - Page number location
- `layout::CosName` - Various view layouts are possible. Please review the PDF spec for details.
- `values::Vector{Float32}` - [left, bottom, right, top] sequence array. Not all values are used; their usage depends on the layout parameter.
- `zoom::Float32` - Zoom value for the view. Can be zero for layouts where the zoom is intrinsic and hence redundant.
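Which of the four values matter depends on the layout. The explicit destination syntax in PDF 1.7, Section 12.3.2.2, defines the following:
- /XYZ uses left and top, plus zoom
- /FitH and /FitBH use top
- /FitV and /FitBV use left
- /FitR uses all four
- /Fit and /FitB use none

The stdlib sketch below encodes that table. destination_params and the constants are hypothetical names, and layouts are given as Julia Symbols rather than CosName objects.

```julia
# Hypothetical sketch (not part of PDFIO): which entries of the
# [left, bottom, right, top] values array each destination layout uses.
const LAYOUT_PARAMS = Dict{Symbol,Tuple{Vararg{Symbol}}}(
    :XYZ   => (:left, :top),          # the zoom value is used as well
    :Fit   => (),
    :FitH  => (:top,),
    :FitV  => (:left,),
    :FitR  => (:left, :bottom, :right, :top),
    :FitB  => (),
    :FitBH => (:top,),
    :FitBV => (:left,),
)

const VALUE_INDEX = Dict(:left => 1, :bottom => 2, :right => 3, :top => 4)

# Returns a Dict of the meaningful coordinates for `layout`.
function destination_params(layout::Symbol, values::AbstractVector{<:Real})
    used = get(LAYOUT_PARAMS, layout) do
        throw(ArgumentError("unknown destination layout: $layout"))
    end
    return Dict{Symbol,Float32}(k => values[VALUE_INDEX[k]] for k in used)
end
```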
PDFIO.PD.pdOutlineItemGetAttr — Function
pdOutlineItemGetAttr(item::PDOutlineItem) -> Dict{Symbol, Any}
Returns the attributes stored with a PDOutlineItem object. The traversal parameters like Prev, Next, First, Last and Parent are stored with the structure itself.
The following keys are stored in the returned dictionary:
- :Title - The title assigned to the item (shows up in the table of contents).
- :Count - A representation of the number of items open under the outline item. Please refer to PDF Spec 32000-2008 Section 12.3.3 for details. Mostly used for rendering on a user interface.
- :Destination - A (filepath, PDDestination) value. filepath is an empty string if the destination refers to a location in the same PDF file. This parameter combines the /Dest and /A attributes of the PDF specification: the action element is analyzed, and the data is extracted and stored in the PDDestination as the final referenced location.
- :C - The color of the outline item in the DeviceRGB color space.
- :F - Flags for title text rendering: italic = 1, bold = 2.
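The stdlib sketch below illustrates how to read two of these values. It decodes :F using the bit values listed above. It interprets :Count by its sign: per PDF 1.7, Section 12.3.3, a positive count indicates an open item and a negative count a closed one. describe_outline_style is a hypothetical helper, not part of PDFIO.

```julia
# Hypothetical helper (not part of PDFIO): interpret the :F and :Count entries
# of the Dict returned by pdOutlineItemGetAttr.
function describe_outline_style(attrs::AbstractDict{Symbol})
    f = get(attrs, :F, 0x00)          # bit value 1 = italic, 2 = bold
    n = get(attrs, :Count, 0)
    return (italic = (f & 0x01) != 0,
            bold   = (f & 0x02) != 0,
            open   = n > 0)            # a negative count means the item is closed
end
```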
Example
julia> pdOutlineItemGetAttr(outlineitem)
Dict{Symbol,Any} with 5 entries:
:F => 0x00
:Title => "Table of Contents"
:Count => 0
:Destination => ("", PDDestination(2, /XYZ, Float32[0.0, 0.0, 0.0, 756.0], 0.0))
:C => Float32[0.0, 0.0, 0.0]
PDF Page objects
PDFIO.PD.PDPageObject — Type
PDPageObject
The content streams associated with PDF pages contain the objects that can be rendered. These objects are represented by PDPageObject. They use a postfix notation, in which an operator is preceded by its operands, like:
(Draw this text) Tj
As can be seen above, the string object is a CosString, which is the operand of the Tj (draw text) operator. This class of objects is represented by PDPageElement.
However, there are certain objects which only provide grouping information or begin and end markers for grouping information. For example, a text object:
BT
/F1 11 Tf %selectfont
(Draw this text) Tj
ET
These kinds of objects are represented by PDPageObjectGroup. In this case, the PDPageObjectGroup contains four PDPageElement objects, represented by the operators BT, Tf, Tj and ET.
PDPageElement and PDPageObjectGroup can be extended by composition. Hence, there are more specialized objects that can be seen as well.
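The split into elements and groups can be illustrated with a toy tokenizer. The stdlib sketch below is not PDFIO's parser. Element, tokenize_content, is_operand and parse_content are hypothetical names. It handles only names, numbers, % comments and simple literal strings without nested or escaped parentheses, and it discards elements outside BT ... ET. For the text object shown above, it yields one group with the four elements BT, Tf, Tj and ET.

```julia
# Hypothetical sketch (not PDFIO's parser): split a simple content stream into
# (operator, operands) elements and group the elements between BT and ET.
struct Element
    operator::String
    operands::Vector{String}
end

# Tokens: % comments (dropped), simple literal strings, names, and other words.
function tokenize_content(src::AbstractString)
    tokens = String[]
    for m in eachmatch(r"%[^\n]*|\([^()]*\)|/[^\s/()\[\]<>%]+|[^\s/()\[\]<>%]+", src)
        startswith(m.match, "%") || push!(tokens, m.match)
    end
    return tokens
end

# Names, strings and numbers are operands; any other token is an operator.
is_operand(t) = startswith(t, "/") || startswith(t, "(") || tryparse(Float64, t) !== nothing

function parse_content(src::AbstractString)
    groups = Vector{Vector{Element}}()   # one entry per BT ... ET block
    current = nothing                    # the group being filled, if any
    operands = String[]
    for t in tokenize_content(src)
        if is_operand(t)
            push!(operands, t)
            continue
        end
        el = Element(t, operands)        # the operator consumes pending operands
        operands = String[]
        if t == "BT"
            current = [el]
        elseif current !== nothing
            push!(current, el)
            if t == "ET"
                push!(groups, current)
                current = nothing
            end
        end                              # elements outside BT ... ET are dropped
    end
    return groups
end
```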
PDFIO.PD.PDPageElement — Type
PDPageElement
A representation of a content object with an operator and operands. See PDPageObject for more details.
PDFIO.PD.PDPageObjectGroup — Type
PDPageObjectGroup
A representation of a content object that encloses other content objects. See PDPageObject for more details.
PDFIO.PD.PDPageTextObject — Type
PDPageTextObject
A PDPageObjectGroup object that represents a block of text. See PDPageObject for more details.
PDFIO.PD.PDPageTextRun — Type
PDPageTextRun
In PDF, text may not be contiguous, because the font, style or graphics rendering parameters may change. PDPageTextRun is a unit of text that can be rendered without any change to the graphical parameters. There is no guarantee that a text run represents a meaningful word or sentence.
PDPageTextRun is a composition implementation of PDPageElement.
PDFIO.PD.PDPageMarkedContent — Type
PDPageMarkedContent
A PDPageObjectGroup object that represents a group of objects that are logically grouped together in a structured PDF document.
PDFIO.PD.PDPageInlineImage — Type
PDPageInlineImage
Most images in PDF documents are defined as separate objects in the PDF document and referenced from the page content stream. PDPageInlineImage objects are defined directly in the page content stream.
PDFIO.PD.PDPage_BeginGroup — Type
PDPage_BeginGroup
A PDPageElement that represents the beginning of a group object.
PDFIO.PD.PDPage_EndGroup — Type
PDPage_EndGroup
A PDPageElement that represents the end of a group object.
COS Methods
PDFIO.Cos.CosDoc — Type
CosDoc
The PDF file structure defines how objects are arranged in a PDF file. PDF is designed to be accessed in random order. Some objects in a PDF, like fonts, can be referred to from multiple page objects. To address this, objects are given reference identifiers, and mappings to them are provided from various locations in the PDF file. Moreover, to reduce file size, objects can be put inside stream containers and compressed. Accessing a specific object reference may therefore take several lookups before the actual object is found. All of this leads to a fairly complex arrangement of objects. CosDoc wraps all the object reference schemes and provides a simplified lookup API, cosDocGetObject. Any PDF object can be classified into the following forms, based on how it is represented in a document:
- Direct Objects: Direct objects are defined where they are referred to or used.
- Indirect Objects: Indirect objects have reference identifiers; their location in a PDF document is described through an object reference identifier.
Any aspect of a PDF can be accessed using the COS-level APIs alone. However, they require detailed knowledge of the PDF specification and are not the most intuitive.
PDFIO.Cos.cosDocOpen — Function
cosDocOpen(filepath::AbstractString) -> CosDoc
Provides access to the physical file and file structure of the PDF document. Returns a CosDoc that can subsequently be used for all queries into the PDF file. Remember to release the document with cosDocClose once it is no longer needed.
PDFIO.Cos.cosDocClose — Function
cosDocClose(doc::CosDoc)
Reclaims all system resources consumed by the CosDoc. The CosDoc should not be used after this method is called. cosDocClose only needs to be called explicitly if you opened the document with cosDocOpen. Documents opened with pdDocOpen do not need this method.
PDFIO.Cos.cosDocGetRoot — Function
cosDocGetRoot(doc::CosDoc) -> CosObject
The structural starting point of a PDF document, also known as the document catalog dictionary.
PDFIO.Cos.cosDocGetObject — Function
cosDocGetObject(doc::CosDoc, obj::CosObject) -> CosObject
PDF objects are distributed throughout the file and can be cross-referenced from one location to another. This is called indirect object referencing. However, to extract actual information, one needs access to the complete object (direct object). This method searches the document structure for the object and returns the direct object. If an indirect object reference is passed as the obj parameter, the complete indirect object (the reference as well as all the content of the object) is returned. A direct object passed to the method is returned as is, without any translation. This means the user does not have to check the type of an object before accessing its contents.
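The lookup contract described above can be modelled in a few lines. The stdlib sketch below is a toy and does not use PDFIO's data structures. IndirectRef, ObjectTable and resolve are hypothetical names. A plain dictionary keyed by (object number, generation number) stands in for the cross-reference lookups a CosDoc performs. Following PDF 1.7, Section 7.3.10, a reference to an undefined object resolves to null, represented here by nothing.

```julia
# Hypothetical toy model (not PDFIO's data structures): a table keyed by
# (object number, generation number) stands in for CosDoc's cross-reference lookup.
struct IndirectRef
    objnum::Int
    gennum::Int
end

const ObjectTable = Dict{Tuple{Int,Int},Any}

# References are looked up in the table; direct objects are returned unchanged.
function resolve(table::ObjectTable, obj)
    obj isa IndirectRef || return obj
    # PDF 1.7, Section 7.3.10: a reference to an undefined object is treated as null.
    return get(table, (obj.objnum, obj.gennum), nothing)
end
```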
Example
julia> cosDocGetObject(doc.cosDoc, CosIndirectObjectRef(555, 0))
555 0 obj
<<
/Count 18
/Last 629 0 R
/First 556 0 R
>>
endobj
julia> cosDocGetObject(doc.cosDoc, cn"DirectObject")
/DirectObject
cosDocGetObject(doc::CosDoc, dict::CosObject, key::Union{CosName, CosNullType}) -> CosObject
Returns the object referenced inside the dict dictionary.
dict can be a PDF dictionary object reference, an indirect object or a direct CosDict object. key can also be CosNull. In that case, a replicated CosDict with direct or indirect objects is returned for all the keys of the input dict.
Example
julia> catalog
652 0 obj
<<
/Outlines 555 0 R
/PageLayout /SinglePage
/PageMode /UseOutlines
/Pages 446 0 R
/Type /Catalog
/OpenAction [447 0 R /XYZ null null 0 ]
>>
endobj
julia> pages = cosDocGetObject(doc.cosDoc, catalog, cn"Pages")
446 0 obj
<<
/Kids [447 0 R 449 0 R 451 0 R]
/Count 3
/Type /Pages
>>
endobj
julia> cosDocGetObject(doc.cosDoc, catalog, cn"PageLayout")
/SinglePage