Scripting • Advanced
Extract Text from Word Document
Robomotion • Updated 6 months ago

Overview
Uses a VBScript bridge to pull raw text out of a .doc or .docx file for downstream NLP or search indexing.
Scripting lets Robomotion users build simpler, more efficient flows. For example, a short VBScript can extract the multi-page content of a Word document, so nobody has to launch the application manually and interact with its interface.
What Extract Text from Word Document can do
- `Core.Flow.SubFlow` downloads fixtures; a Function builds `msg.sample_docx` (`.../fixtures/sample.docx`).
- Input Dialog titled `Extract text from Word document`, message `Select the Word document to extract text from:`, default `msg.sample_docx` → `msg.word_doc_path`.
- Validate (`Core.Programming.Function`, `outputs: 2`): reject paths that do not end in `.doc`/`.docx`; on success derive `msg.script_path` as `<doc dir>\_extract.vbs`.
- Build script (`Core.Programming.Function`) injects the escaped `msg.word_doc_path` into a VBScript template that opens the document with `Word.Application`, iterates `WordDoc.Sentences`, and writes each sentence with `WScript.Echo` → `msg.vbs_body`.
- `Core.FileSystem.WriteFile` (`optMode: 'truncate'`) writes the script to `msg.script_path`; a Function sets `msg.vbs_args = ['//Nologo', msg.script_path]`.
- `Core.Process.StartProcess` runs `cscript` with `msg.vbs_args` in the foreground → `msg.vbs_output`; `Core.FileSystem.Delete` removes the temp script.
- Function trims leading/trailing whitespace into `msg.trimmed_text`; `Core.Dialog.MessageBox` titled `Extracted text:` (type `info`) displays it, then `Core.Flow.Stop`.
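
The Validate and argument-building steps above can be sketched as plain JavaScript functions. This is a minimal illustration, not the template's actual node code: the helper names (`isWordDocument`, `deriveScriptPath`, `validate`, `buildArgs`) are hypothetical, the case-insensitive extension match is an assumption, and the sketch returns an `{ ok, msg }` object instead of routing to one of the node's two outputs.

```javascript
// Hypothetical helpers; only the msg.* field names come from the flow description.

// Accept only paths ending in .doc or .docx.
// Matching case-insensitively is an assumption, not confirmed by the template.
function isWordDocument(path) {
  return typeof path === 'string' && /\.docx?$/i.test(path);
}

// Derive "<doc dir>\_extract.vbs" from a Windows path.
function deriveScriptPath(docPath) {
  const lastSep = docPath.lastIndexOf('\\');
  const dir = lastSep >= 0 ? docPath.slice(0, lastSep) : '.';
  return dir + '\\_extract.vbs';
}

// Stand-in for the two-output Validate node: ok=false models the reject branch.
function validate(msg) {
  if (!isWordDocument(msg.word_doc_path)) {
    return { ok: false, msg };
  }
  msg.script_path = deriveScriptPath(msg.word_doc_path);
  return { ok: true, msg };
}

// Arguments later handed to cscript.
function buildArgs(msg) {
  msg.vbs_args = ['//Nologo', msg.script_path];
  return msg;
}

// Example usage with a made-up path.
const checked = validate({ word_doc_path: 'C:\\Docs\\report.docx' });
if (checked.ok) {
  console.log(buildArgs(checked.msg).vbs_args);
}
```

Keeping the extension check and the path derivation in separate functions makes each easy to test without running Word or `cscript`.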
Behind the scenes
- `WordDoc.Sentences.Count` returns a sentence count, not a word count; the script walks `Sentences(i)` so each echoed line is a full sentence. The variable is named accordingly in the VBScript body.
- The script calls `WordDoc.Close False` and `Word.Quit`, explicitly discarding any incidental edits so Word does not raise a "save changes?" prompt on the next headless run.
- Quoting is the riskiest part of the bridge: the build step backslash-escapes and double-quotes the path before interpolation, which is safe for well-formed Windows paths but is not a general-purpose sanitiser. Treat `msg.word_doc_path` as trusted input.
- The VBScript file is written next to the source document (rather than in a system temp directory) so the user never has to grant write access to an unfamiliar location, and it is deleted after execution with `continueOnError: true` so a transient antivirus lock does not abort the flow.
- `cscript //Nologo` suppresses the banner so the captured stdout is only the script's `WScript.Echo` output; the trim node then strips the trailing CRLF that `cscript` always appends.
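
To make the quoting and trimming concerns concrete, here is a hedged sketch of how a build step might assemble the VBScript body and clean up the captured output. The function names (`toVbsStringLiteral`, `buildVbsBody`, `trimOutput`) and the exact VBScript lines are assumptions for illustration, not the template's actual code. The escaping also differs from the build step described above: VBScript string literals give backslashes no special meaning and escape a double quote by doubling it, so this sketch only doubles quotes and rejects line breaks.

```javascript
// Hypothetical helpers; the real template's escaping and script text may differ.

// Wrap a value in a VBScript string literal.
// Only double quotes need escaping (by doubling); a line break would end the
// literal early, so it is rejected outright.
function toVbsStringLiteral(value) {
  if (/[\r\n]/.test(value)) {
    throw new Error('path must not contain line breaks');
  }
  return '"' + value.replace(/"/g, '""') + '"';
}

// Assemble a VBScript body that echoes one sentence per line.
function buildVbsBody(docPath) {
  return [
    'Option Explicit',
    'Dim Word, WordDoc, sentenceCount, i',
    'Set Word = CreateObject("Word.Application")',
    'Set WordDoc = Word.Documents.Open(' + toVbsStringLiteral(docPath) + ')',
    'sentenceCount = WordDoc.Sentences.Count',
    'For i = 1 To sentenceCount',
    '    WScript.Echo WordDoc.Sentences(i).Text',
    'Next',
    "' Close without saving, then quit Word.",
    'WordDoc.Close False',
    'Word.Quit',
  ].join('\r\n');
}

// Strip leading/trailing whitespace, including the final CRLF in the output.
function trimOutput(vbsOutput) {
  return String(vbsOutput).trim();
}
```

Even with this escaping in place, the path still ends up inside executable script text, so the advice to treat `msg.word_doc_path` as trusted input continues to apply.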