Power Automate - Extracting Data from Files

Extract text from Word Docx files - Options

Power Automate flows can read Microsoft Word documents and extract data from them, for example processing forms and storing their field data in Dataverse.

The options below all get the document content as text and then apply matching logic. The approaches are similar but differ in how they get the content and extract the data. Details for each option are in the citation links.

1. Treat the docx file as a zip archive, get the XML inside it, then use XPath ¹ ².
2. Call the API that Tachytelic provides ³.
3. Convert the document to PDF and use text recognition. The recognition analyzes the structure of the document, and matching can use Azure Logic Apps and Power Automate expressions, which are similar in concept to XPath.

Option 1 is recommended for those who do not want to use premium services or API calls. The same approach also works in general-purpose programming languages: extract the document archive and parse its XML content.
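
As a rough illustration of option 1 outside Power Automate, the sketch below uses only the Python standard library. It assumes the main body of the document is stored at `word/document.xml` inside the archive, which is the usual location in a docx file, and it uses the limited XPath subset that `xml.etree.ElementTree` supports. The function names `extract_paragraphs` and `find_field`, and the "Label: value" form layout, are hypothetical and chosen only for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Optional

# WordprocessingML namespace used by the elements in word/document.xml
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def extract_paragraphs(docx_bytes: bytes) -> List[str]:
    """Return the text of each non-empty paragraph in a docx file."""
    # A docx file is a zip archive; the main body is assumed to be
    # stored at word/document.xml.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # ".//w:p" selects every paragraph; ".//w:t" selects its text runs.
    for paragraph in root.iterfind(".//w:p", NS):
        text = "".join(t.text or "" for t in paragraph.iterfind(".//w:t", NS))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def find_field(paragraphs: List[str], label: str) -> Optional[str]:
    """Hypothetical matching logic: return the value after 'label:'."""
    prefix = label + ":"
    for line in paragraphs:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None
```

A caller would read the file bytes, for example with `open("form.docx", "rb").read()`, pass them to `extract_paragraphs`, and then look up each form field with `find_field`. Real forms may keep values in tables or content controls, in which case the XPath expressions would need to target those elements instead.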

Option 3 also supports image processing, so it can handle a mix of scanned, PDF, and Word documents.