Information Extraction in Azure AI Solutions
Source: My personal notes from Course AI-102T00-A: Develop AI solutions in Azure - Microsoft Learn with labs from Exercises for Develop AI-powered information extraction solutions in Azure
Multi-modal Analysis, Document Intelligence
Use case: extract and store structured data from documents containing text, images, audio, and video.
A common use is recognizing data in forms for processing and getting information out of documents. Models can be trained on different form types to recognize their content and find fields in the media, such as names, identifiers, and addresses.
- Azure AI Content Understanding can analyze multiple input mediums
- Recognition works on images and video content such as graphs, charts, and other structured data. For example, in a video, the speakers, a transcription, and a summary of activity can be detected.
Q: What are differences between document intelligence and other vision services?
Document Intelligence and Content Understanding use an existing schema, such as an invoice, to extract information.
Azure AI Content Understanding
Source: Azure Content Understanding documentation | Microsoft Learn and Create a multimodal analysis solution with Azure Content Understanding - Training
Content Understanding uses AI to process and ingest different types of content, and generates output for uses such as automation, analysis, search / RAG, reporting, and classification.
Content Understanding Benefits
- Multi-modal: handles documents, images, videos, audio, custom formats, and unstructured data
- Can use multiple models and a schema to analyze, extract, and classify data and validate the extraction and provide accuracy scores
- Includes analyzers for scenarios like taxes, procurement, contracts, call centres, media analysis, and others. An analyzer is built on a schema; it can be custom built, or pre-built analyzers can be used to analyze further documents of a similar type.
Content Understanding Framework
Inputs (documents, images, video, audio) enter the analyzers, which operate in parallel/sequence and produce structured output:

- Content extraction: scale/format, orientation/de-skew, layout/structure, speaker recognition
- Segmentation: categorization, routing, splitting
- Field extraction: prebuilt/custom schemas, extractive/inferred fields, tables/complex fields
- Postprocessing: confidence scores, grounding, normalization
- Supporting models: specialized AI models (OCR, layout, transcription) and foundational models (GPT-4 family, for example GPT-4o/4.1, and embeddings)
- Output: structured data in Markdown or JSON schema format
- Targets: search, agents, databases, apps, analytics
Using AI Content Understanding in a Solution
In an Azure AI Foundry project, define a Content Understanding schema for the information to be extracted, using a content sample and an analyzer template.
An application can call the REST API to:
- Submit analyzer schema definition (API HTTP PUT)
- Check operations of schema processing (API HTTP GET)
- Send content for analysis (API HTTP POST)
- Check status of operation and retrieve results (API HTTP GET)
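The four REST calls above can be sketched with the standard library. The URL shape below follows the Content Understanding REST pattern, but the endpoint, analyzer ID, and API version are placeholders; check the current API reference before relying on them.

```python
import json
import urllib.request

# Placeholder values for illustration only.
ENDPOINT = "https://<your-resource>.services.ai.azure.com"
API_VERSION = "2025-05-01-preview"

def analyzer_url(endpoint: str, analyzer_id: str, api_version: str) -> str:
    """Build the URL used for analyzer management calls (PUT/GET)."""
    return f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}?api-version={api_version}"

def put_analyzer(analyzer_id: str, schema: dict, key: str) -> None:
    """Submit the analyzer schema definition (HTTP PUT); requires a live resource."""
    req = urllib.request.Request(
        analyzer_url(ENDPOINT, analyzer_id, API_VERSION),
        data=json.dumps(schema).encode("utf-8"),
        method="PUT",
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
    )
    # The response's Operation-Location header is then polled (HTTP GET)
    # to check schema processing, mirroring the steps listed above.
    urllib.request.urlopen(req)

if __name__ == "__main__":
    # put_analyzer("invoice-analyzer", {...}, "<key>")  # needs credentials
    print(analyzer_url(ENDPOINT, "invoice-analyzer", API_VERSION))
```

Analysis itself follows the same pattern with an HTTP POST to the analyzer's `:analyze` action, then a GET to fetch results.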
Custom Models
Custom Classification
Apply a label to a whole document, sorting documents into types. Training requires example documents for each label.
Custom Extraction
Apply labels to specific text and extract custom-labelled fields from documents.
Training methods:
- Custom template (custom form) - structure of forms, templates, other structured documents; short training time with schema, requires labelling fields
- Custom neural (custom document): structured and unstructured documents; longer training time
After training, the custom model returns a confidence score in its response (how confident the model is that each predicted label is correct).
Deployment and Use
Models require re-training across separated environments such as development and production. Alternatively, a custom model can be built directly in production, though this needs to balance the risk of making changes. Major changes can be built as a separate model in production; when the new model is ready, its model ID is given to applications to upgrade.
In SDK calls, reference the model by its model ID and retrieve the results.
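A minimal sketch of calling a trained custom model by its model ID, assuming the `azure-ai-documentintelligence` Python package (the exact call shape can vary between SDK versions, so treat this as illustrative):

```python
def analyze_with_model(endpoint: str, key: str, model_id: str, path: str):
    """Analyze a local document with a custom model identified by model_id.

    Imports are deferred so the sketch can be read without the Azure SDK
    installed; running it requires the package and a live resource.
    """
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as f:
        poller = client.begin_analyze_document(
            model_id, body=f, content_type="application/octet-stream"
        )
    return poller.result()  # fields, confidence scores, layout, etc.
```

Upgrading an application to a new model is then just a matter of passing the new model ID.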
Document Intelligence
Use case: analyze and extract information from documents.
Pre-built models are available for receipts, invoices, business cards, ID documents, government certificates, financial documents, and other common document types.
The other models are designed to extract values from documents with less specific structures:
- Read model - Extracts text and languages from documents. Used by other pre-built models for text extraction
- General document model - Extracts text, keys, values, entities, and selection marks (checkboxes, radio buttons, and others) from documents. Gets common entities like people, locations, organizations, contact information, email, URL, and others
- Layout model - Extracts text similar to general document model and structured information from documents with focus on structural layout information (header, footer, columns) rather than key-value semantics
Pre-built Model Features
- Text extraction. All the prebuilt models extract lines of text and words from handwritten and printed text.
- Key-value pairs. A label paired with a value, for example "Weight" and "30 kg".
- Entities. Entity types include people, locations, and dates.
- Selection marks. Some models extract spans of text that indicate a choice, such as radio buttons and check boxes.
- Tables. Many models can extract tables in scanned forms, including the data contained in cells, the numbers of columns and rows, and column and row headings. Tables with merged cells are supported.
- Fields. Models trained for a specific form type identify the values of a fixed set of fields. For example, the Invoice model includes CustomerName and InvoiceTotal fields.

If you have an industry-specific or unique form type, you might be able to obtain more reliable and predictable results by using a custom model, though balance the training effort required against the accuracy gained.
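The field results above come back as a nested structure per document. The fragment below mocks a slice of an invoice result to show how fields and their confidence scores can be read; the exact JSON layout is an assumption modelled on the documented analyze result shape, and the values are invented.

```python
# Mocked fragment of one entry from an analyze result's "documents" list.
sample_document = {
    "docType": "invoice",
    "fields": {
        "CustomerName": {"type": "string", "valueString": "Contoso Ltd.",
                         "confidence": 0.97},
        "InvoiceTotal": {"type": "currency", "content": "$1,250.00",
                         "confidence": 0.94},
    },
}

def list_fields(document: dict) -> dict:
    """Flatten fields to {name: (displayed value, confidence)}."""
    out = {}
    for name, field in document["fields"].items():
        value = field.get("valueString") or field.get("content")
        out[name] = (value, field["confidence"])
    return out

print(list_fields(sample_document))
```

In an application, the confidence score per field is what you would check before trusting an extracted value.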
Knowledge mining
Use case: find and search for relevant information in knowledge bases such as the web, documents, and other data. Implementations include organizational search and supporting retrieval augmented generation (RAG).
Concept: an indexer gets data from data sources. Document cracking retrieves the text content and attributes. An enrichment pipeline builds a JSON representation of each indexed document; fields for each document might be file name, date, and size. The result is an index of the indexed documents. The idea is similar to an index in a book, with keywords and page numbers.
Azure AI Search, with vector databases or regular databases extended with vectors, can be used for search. It indexes documents and data, uses AI to enrich index data, and stores insights in a knowledge store for analysis and integration.
Examples of Enrichment Pipeline features:
- Language detection
- Key phrase detection
- Translation
- Get text from images
- Image description
- PII identification
- Captions, tags
- Custom skills: other logic
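Enrichment features like these are declared as skills in a skillset attached to the indexer. A minimal sketch with two built-in skills, expressed as the REST payload; the OData type names are the real skill identifiers, but the surrounding payload is simplified and field paths are illustrative:

```python
# Simplified skillset payload: detect language, then extract key phrases.
skillset = {
    "name": "demo-skillset",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "languageCode", "targetName": "language"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            "context": "/document",
            "inputs": [
                {"name": "text", "source": "/document/content"},
                {"name": "languageCode", "source": "/document/language"},
            ],
            "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
        },
    ],
}
```

Note how the second skill consumes the first skill's output (`/document/language`), which is how pipeline steps chain.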
Search an Index
Each index field can be configured as:
- key: Fields that define a unique key for index records
- searchable: Fields that can be queried using full-text
- filterable: Fields that can be included in filter expressions to return only documents that match specified constraints
- sortable: Fields that can be used to order the results
- facetable: Fields that can be used to determine values for facets (user interface elements used to filter the results based on a list of known field values)
- retrievable: Fields that can be included in search results
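These attributes map directly onto the fields list in an index definition. A sketch of the REST payload shape with hypothetical field names, showing how the attributes above combine per field:

```python
# Illustrative index definition; field names are invented for the example.
index_definition = {
    "name": "documents-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String",
         "searchable": True, "retrievable": True},
        {"name": "size", "type": "Edm.Int64",
         "filterable": True, "sortable": True},
        {"name": "category", "type": "Edm.String",
         "filterable": True, "facetable": True},
    ],
}

# Exactly one field must be the key for index records.
key_fields = [f["name"] for f in index_definition["fields"] if f.get("key")]
print(key_fields)
```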
Q: How can indexing and information sensitivity be handled, for example classified documents and permissions?
The classification, such as a document's sensitivity, can be stored as metadata in the document. Permissions and sensitivity can be included in the index so that, when a search is run, the service can determine whether a user has access to a result.
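One common way to implement this is security trimming: index a field holding the groups allowed to see each document, then filter every query by the caller's groups using the `search.in` OData function. The field name `group_ids` below is hypothetical; the filter pattern follows the Azure AI Search security trimming approach.

```python
def security_filter(user_groups: list[str]) -> str:
    """Build an OData filter keeping only documents whose group_ids field
    (a hypothetical permissions field in the index) contains one of the
    user's groups."""
    groups = ", ".join(user_groups)
    return f"group_ids/any(g: search.in(g, '{groups}'))"

# Attach the filter to every query made on behalf of the user.
query = {"search": "quarterly report",
         "filter": security_filter(["hr", "finance"])}
print(query["filter"])
```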
Store Extracted Information in a Knowledge Store for Search and Processing
| Input type | Projection's storage |
|---|---|
| JSON documents | Objects |
| Extracted fields with a relational schema | Tables |
| Extracted images | Files |
Projections are separate from the index and could be used for additional processing.
Azure AI Search
The Azure AI Search service connects data to AI to help:
- Ground AI in accurate data and responses
- Access data
- Enrich and structure content
- Combine text search with vector search (hybrid search) for precision and recall
- Full-text, vector, hybrid, and multimodal queries over local (indexed) and remote content
- Implement search features like relevance tuning, filters, geo-spatial search, synonym mapping, and autocomplete
- Securely access protected information
- Monitor and measure activity
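A hybrid query combines a full-text `search` term with a `vectorQueries` entry in the same request. The sketch below follows the Azure AI Search REST request shape; the field name and the embedding vector are stand-ins (in practice the vector comes from an embedding model and has the index's configured dimensionality).

```python
# Illustrative hybrid search request body.
hybrid_query = {
    "search": "beach holidays",          # full-text part
    "vectorQueries": [
        {
            "kind": "vector",
            "vector": [0.01, -0.02, 0.03],  # placeholder embedding
            "fields": "content_vector",     # hypothetical vector field
            "k": 5,                         # nearest neighbours to retrieve
        }
    ],
    "select": "title,content",
    "top": 5,
}
```

The service fuses the text and vector rankings, which is what gives hybrid search its combined precision and recall.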
Deployments include classic search and retrieval-augmented generation (RAG) via agentic retrieval.
Exercise: Extract information from multimodal content
- Create an Azure AI Foundry project to use the Content Understanding service (Content Understanding - Layout)
- "AI services" in the Foundry do not require deployment of models
- Do custom tasks: invoice field detection, slide analysis, audio analysis, and meeting recording analysis to extract information from each medium
- For each task, test detection with uploaded files and set up an analyzer with a schema of fields to be extracted
Exercise: Analyze forms with prebuilt Azure AI Document Intelligence models
Set up an Azure AI Foundry project for document analysis. Use the Azure AI Foundry portal and the Python SDK to submit forms to the AI resource for analysis.
Exercise: Develop a Content Understanding client application
Use the Azure Content Understanding (ACU) Python SDK to create an analyzer that extracts information from business cards.
Create a client application that sets up the analyzer to extract contact details from scanned business cards. Cards are submitted and JSON results are returned.
```json
{
  "analyzerId": "businesscardanalyzer",
  "apiVersion": "2025-11-01",
  "createdAt": "2026-04-17T19:50:57Z",
  "stringEncoding": "codePoint",
  "warnings": [],
  "contents": ["details of the business card scans"]
}
```

Exercise: Extract data with Azure Document Intelligence
The Azure AI service Azure Document Intelligence can automate data processing, extracting text, key/value pairs, and tables from form documents using optical character recognition (OCR).
It uses pre-built models for recognizing invoices, receipts, business cards, and other common document types. The service can also train custom models to extract specific data fields from forms.
Use both prebuilt and custom Document Intelligence models to extract information from documents.
Exercise: Create a knowledge mining solution
Use AI Search to index a set of documents maintained by Margie's Travel, a fictional travel agency. The indexing process uses AI skills to extract key information, making the documents searchable, and generates a knowledge store containing data assets for further analysis.
- Upload documents for indexing and AI enrichment to an Azure storage account container with PDFs of travel information
- Connect Azure AI Search to the document container and configure Retrievable, Filterable, Sortable, and Facetable for indexed fields
- Search through documents using the index created by Azure AI Search
- Create an app that can query and retrieve specific fields from searched documents
- Search manually in the Azure portal AI Search, for example with the JSON below:
```json
{
  "search": "*",
  "count": true,
  "select": "title,locations",
  "queryType": "semantic",
  "semanticConfiguration": "margies-index-semantic-configuration",
  "captions": "extractive",
  "answers": "extractive|count-3",
  "queryLanguage": "en-us",
  "queryRewrites": "generative"
}
```

Example search and response from the index using Azure AI Search explorer
Search
```json
{
  "search": "New York",
  "count": true,
  "select": "title,keyPhrases",
  "filter": "metadata_storage_size lt 380000"
}
```

Results
```json
{
  "@odata.context": "https://aisearch2325425.search.windows.net/indexes('margies-index')/$metadata#docs(*)",
  "@odata.count": 1,
  "value": [
    {
      "@search.score": 6.8039145,
      "title": "Margies Travel Company Info.pdf",
      "keyPhrases": [
        "world-leading travel agency",
        "best travel experts",
        "international reach",
        "Currency Exchange",
        "Las Vegas",
        "New York",
        "San Francisco",
        "leadership team",
        "Marjorie Long",
        "Logan Reid",
        "Emma Luffman",
        "Deepak Nadar",
        "Strategic Director",
        "Margie",
        "local",
        "expertise",
        "Flights",
        "Accommodation",
        "Transfers",
        "Visas",
        "Excursions",
        "trips",
        "Dubai",
        "London",
        "CEO",
        "CFO",
        "website"
      ]
    }
  ]
}
```

Exercise: Build an automated RAG ingestion pipeline with Content Understanding
Retrieval-augmented generation (RAG) is a method that enhances Large Language Models (LLMs) by integrating data from external knowledge sources. In production scenarios, new documents arrive continuously and must be extracted, embedded, and indexed so they are available for search in near real-time.
- Create an analyzer for the travel PDF documents
- Build an automated RAG ingestion pipeline that uses Azure Content Understanding to extract content from multimodal documents, embeds the content using Azure OpenAI, and indexes it in Azure AI Search
- The pipeline tracks which files have already been processed and can run in watch mode to automatically detect and ingest new or updated documents as they arrive
- Create a conversational agent that answers questions grounded in the indexed data
Pipeline Details from ingest-pipeline.py:
- Tracks processed files using a manifest (processed_files.json) that records the SHA-256 hash of each file. On each run, the pipeline compares the current hash of every file in the data/ folder against the manifest, so only new or modified files are processed.
- Ensures the search index exists by calling ensure_index(), which creates or updates the Azure AI Search index with the required schema (text fields, a vector field, and HNSW vector search configuration).
- Extracts content from each new file by submitting it to the Content Understanding analyzer via begin_analyze_binary, which returns markdown content and extracted fields (summary, key topics).
- Chunks the content by splitting at paragraph boundaries with a 2000-character limit, keeping each chunk self-contained.
- Generates embeddings for each chunk using the Azure OpenAI embedding model, producing a 3072-dimension vector for semantic search.
- Indexes the chunks into Azure AI Search using deterministic document IDs (based on the file name and chunk index), so re-ingesting an updated file replaces its old chunks.
- Supports a --watch flag for continuous monitoring and a --reset flag to reprocess all files.
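The chunking and deterministic-ID steps above can be sketched in plain Python. This is a minimal reconstruction of the idea, not the actual ingest-pipeline.py code, and the exact packing rules in the lab may differ:

```python
import hashlib

def chunk_text(markdown: str, limit: int = 2000) -> list[str]:
    """Split content at paragraph boundaries, packing paragraphs into
    chunks of at most `limit` characters so each chunk stays
    self-contained."""
    chunks, current = [], ""
    for para in markdown.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single paragraph longer than the limit is hard-split.
        while len(para) > limit:
            chunks.append(para[:limit])
            para = para[limit:]
        current = para
    if current:
        chunks.append(current)
    return chunks

def doc_id(file_name: str, chunk_index: int) -> str:
    """Deterministic document ID from file name and chunk index, so
    re-ingesting an updated file overwrites its old chunks."""
    return hashlib.sha256(f"{file_name}-{chunk_index}".encode()).hexdigest()

print(chunk_text("para one.\n\npara two.", limit=12))
```

Because the IDs are deterministic, uploading the same file's chunks again performs an upsert in the index rather than creating duplicates.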
RAG client details from rag-agent.py:
- Creates an Azure AI Search client to retrieve documents.
- Creates an Azure OpenAI chat client.
- Implements a retrieval function that performs hybrid search (combining keyword and vector search) to find the most relevant content chunks.
- Constructs a prompt that includes the retrieved context and the userβs question.
- Sends the prompt to the chat model for answer generation.
- Runs a conversational loop so you can ask multiple questions.
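The prompt-construction step can be sketched as a pure function: retrieved chunks become grounding context in the system message, and the user's question follows. The system wording is illustrative, not the lab's exact prompt.

```python
def build_messages(question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Build chat messages that ground the model's answer in the
    retrieved context chunks."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system",
         "content": ("Answer using only the sources below. If the answer "
                     "is not in the sources, say you don't know.\n\n"
                     f"Sources:\n{context}")},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    "Where does Margie's Travel operate?",
    ["Margie's Travel has offices in New York and Dubai."],
)
print(messages[1]["content"])
```

The resulting list is what gets sent to the chat model; the conversational loop simply re-runs retrieval and rebuilds these messages for each new question.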