PDF · Beginner
Get Images from PDF
Robomotion • Updated 6 months ago

Overview
Extracts every embedded image from a PDF file into a target directory. Useful for downstream OCR or asset recovery.
Get Images from PDF
Apart from text content, PDF files can contain important information in the form of images. Robomotion offers a PDF node that extracts images from PDF files and enables users to access and process these images independently of the original file.
What Get Images from PDF can do
- `Core.Flow.SubFlow` downloads fixtures; a Function builds `msg.source_pdf` (`.../fixtures/with_images.pdf`) and `msg.images_dir` (`.../fixtures/images`).
- Input Dialog titled "Extract images from PDF", message "Select the PDF to extract image(s) from:", default `msg.source_pdf` → `msg.pdf_path`.
- Input Dialog titled "Extract images from PDF", message "Select to folder to save the extracted images to...", default `msg.images_dir` → `msg.destination_folder`.
- Validate (`Core.Programming.Function`, `outputs: 2`): proceed when `msg.pdf_path` ends in `.pdf` and `msg.destination_folder` is set; otherwise `Core.Flow.Stop`.
- `Core.FileSystem.Create` with `optType: 'directory'` ensures `msg.destination_folder` exists, then `Robomotion.PDFBox.ExtractImages` writes PNGs prefixed `PDF Image` to that folder.
- `Core.Dialog.MessageBox` titled "Done!" (type `info`) confirms with "Images extracted successfully."
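The Validate step is a two-output Function that routes each message either onward or to a stop. The decision it makes can be restated as a minimal Python sketch. This is not Robomotion code: `validate`, `PROCEED` and `STOP` are hypothetical names, and the sketch assumes output 0 continues the flow while output 1 leads to `Core.Flow.Stop`, applying only the rule listed above (a `.pdf` suffix plus a non-empty destination).

```python
from typing import Any

# Hypothetical port indices for a two-output validator:
# output 0 continues the flow, output 1 leads to a stop node.
PROCEED, STOP = 0, 1


def validate(msg: dict[str, Any]) -> int:
    """Return the output port a message should leave on.

    Restates the documented rule: proceed only when pdf_path ends in ".pdf"
    and destination_folder is non-empty; otherwise stop.
    Assumption: the suffix check is case-sensitive, as the rule is worded.
    """
    pdf_path = msg.get("pdf_path") or ""          # missing or None -> ""
    destination = msg.get("destination_folder") or ""
    if str(pdf_path).endswith(".pdf") and destination:
        return PROCEED
    return STOP


if __name__ == "__main__":
    print(validate({"pdf_path": "report.pdf", "destination_folder": "out"}))  # 0
    print(validate({"pdf_path": "report.pdf", "destination_folder": ""}))     # 1
```

Returning a port index rather than raising keeps the "bad input" path an ordinary branch, which matches how the template stops cleanly instead of letting the PDF node fail.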
Behind the scenes
- `Robomotion.PDFBox.ExtractImages` names its outputs `PDF Image 1.png`, `PDF Image 2.png`, …; the node owns the numeric suffix and the extension, so `optPrefix` only controls the human-readable stem.
- `optExportType: 'png'` normalises the output regardless of how each image is encoded inside the PDF, which makes downstream OCR and thumbnailing easier than handling a mix of encodings such as DCT (JPEG), JPEG 2000 and CCITT fax streams.
- `Core.FileSystem.Create` with `continueOnError: true` acts like `mkdir -p` for the common case: when the folder already exists, the resulting error is ignored, so no pre-check branch is needed. Note that it also ignores any other failure (for example, a permission error), which would then only surface at the extraction step.
- The validator rejects an empty `msg.destination_folder` before extraction, so an empty dialog response short-circuits to `Core.Flow.Stop` instead of failing inside the PDF node.
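The folder-creation and file-naming behaviour described above can be approximated in a short Python sketch. `ensure_folder` and `output_names` are hypothetical helpers, not part of Robomotion. `Path.mkdir(parents=True, exist_ok=True)` is the standard-library equivalent of `mkdir -p`, but it is stricter than `continueOnError: true`: it still raises on a permission error, or with `FileExistsError` if the path exists as a regular file. The naming helper assumes numbering starts at 1 with a single space after the prefix, as in `PDF Image 1.png`.

```python
from pathlib import Path


def ensure_folder(folder: str) -> Path:
    """Create folder and any missing parents; a no-op if it is already a directory."""
    path = Path(folder)
    # exist_ok=True tolerates an existing directory only; other errors still propagate.
    path.mkdir(parents=True, exist_ok=True)
    return path


def output_names(prefix: str, count: int) -> list[str]:
    """Mimic the documented naming: '<prefix> 1.png', '<prefix> 2.png', ...

    The caller chooses only the prefix (the stem); the number and the
    .png extension are appended here, mirroring how the node owns them.
    """
    return [f"{prefix} {i}.png" for i in range(1, count + 1)]


if __name__ == "__main__":
    print(output_names("PDF Image", 2))  # ['PDF Image 1.png', 'PDF Image 2.png']
```

Calling `ensure_folder` twice on the same path is harmless, which is the property the template relies on to skip a separate "does the folder exist?" branch.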