
Natural Language and Natural Language Processing (NLP) in Azure AI Solutions

Source: my personal notes from Course AI-102T00-A: Develop AI solutions in Azure (Microsoft Learn), with labs from the Azure AI Language exercises

Analyze Text and Natural Language Processing (NLP)


Use cases for NLP –> service output in Azure

  • Language detection: given documents, determines the language(s) in the text –> detected language(s), confidence scores, identifiers
  • Key phrase extraction: keywords –> key phrase array
  • Sentiment analysis: determine positive, neutral, negative –> sentiment breakdown by sentence, confidence scores
  • Named entity recognition: find and identify specific things in text like locations, people, addresses, food, actions –> entity category, location in text, confidence scores
    • In Azure, can include custom named entities like organization’s terms
  • Entity linking: understand a word (entity) in text and its meaning –> link a keyword in text to its Wikipedia article
  • Summarizing text and content: outline –> extractive summarization selects key sentences directly from the text; abstractive summarization rewrites key ideas using new phrasing
  • Personally identifiable information (PII) detection: detect and/or remove sensitive information like names, phone numbers, credit card numbers –> labeled sensitive fields, redacted text array, type of PII, confidence scores

Customizable models are available for each specific use case. Azure AI services generally return JSON responses.
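
For example, a minimal sketch of calling the service with the Python azure-ai-textanalytics SDK; the endpoint, key, and review text below are placeholders:

# Placeholders for your Language resource endpoint and key.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

documents = ["The room was spotless and the staff were wonderful."]

# Language detection: detected language, identifier, confidence score
detected = client.detect_language(documents)[0]
print(detected.primary_language.name, detected.primary_language.confidence_score)

# Sentiment analysis: overall label plus confidence scores
sentiment = client.analyze_sentiment(documents)[0]
print(sentiment.sentiment, sentiment.confidence_scores)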

Deployment options include AI Foundry, Azure AI Services, or a Language service resource.

Send data for NLP and get a response from Azure AI services.

Conversational Understanding And Question Answering


The goal is to have conversations between people and machines, with the machine completing an action. The application gets user input, determines the user’s intent (semantic meaning), and performs the appropriate action, such as calling an API or running a function.

For example, a hotel chat app to help guests with requests like room service. The conversation would need to determine the guest, room number, and food order information, and then submit the kitchen order.

Interpreting user input is called natural language understanding (NLU).

Conversational Language Understanding (CLU) Applications


CLU enables building an NLU component in a conversational application.

The NLU model understands utterances from the natural language input and maps them to intents that assign semantic meaning. It can recognize entities like quantities, dates and times, emails, and identifiers.

Examples:

  • Utterance: what is the time in Toronto
  • Intent: Get Time (function)
  • Entities: Toronto (location)

A model can be trained to understand specific intents, entities, and synonyms from utterances. Labeled utterances are used for model training and evaluation. Ensure the data is balanced across different utterances and intents and that the language used is varied.

The trained model can be published at an endpoint.
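
A minimal sketch of querying that endpoint with the Python azure-ai-language-conversations SDK; the project and deployment names here are hypothetical:

from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

client = ConversationAnalysisClient(
    "https://<your-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

# The task payload names a hypothetical CLU project and deployment.
result = client.analyze_conversation(
    task={
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "id": "1",
                "participantId": "user",
                "text": "what is the time in Toronto",
            }
        },
        "parameters": {
            "projectName": "<your-project>",
            "deploymentName": "production",
        },
    }
)

prediction = result["result"]["prediction"]
print(prediction["topIntent"])                 # e.g. Get Time
for entity in prediction["entities"]:
    print(entity["category"], entity["text"])  # e.g. Location, Toronto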

Conversational Language Understanding (CLU) supports two modes for training models:

  • Standard training uses fast machine learning algorithms to quickly train your models but has limitations (English only as of 2026-04-23)
  • Advanced training uses the latest in machine learning technology to customize models with your data. This training level is expected to show better performance scores for your models and enables you to use the multilingual capabilities of CLU

Question answering provides a knowledge base of question-and-answer pairs with NLP. Content, like an FAQ, is indexed and the service is published as a REST endpoint. Common inputs are FAQ websites, documents, and information brochures.

Custom question answering projects can use active learning, where alternate questions for relevant question-answer pairs are suggested for review, or alternate questions can be added manually. Synonyms can be defined so the service knows multiple words have the same meaning.

Synonym example

{
  "synonyms": [
    {
      "alterations": [
        "reservation",
        "booking"
      ]
    }
  ]
}

Question Answering (QA) vs Language Understanding


QA is focused on static answers from known content, with slight variations possible. For LLMs, it is similar to Retrieval Augmented Generation (RAG).

NLU focuses on understanding intents and performing actions.

Use              | Question answering                            | Language understanding
Usage pattern    | User submits a question, expecting an answer  | User gives an utterance, expecting an appropriate response or action
Query processing | Use NLP to match the question to an answer    | Use NLP to interpret the utterance, match it to an intent, and identify entities
Response         | Static response to a known question           | Response indicates the likely intent and referenced entities
Client logic     | Client presents the answer to the user        | Client performs the appropriate action based on the detected intent

Custom named entity recognition examples include recognizing specific entities such as organization or industry terms and custom locations.

Define entity labels and label documents. The custom named entities will be used to train models.

Minor changes, like new named entities, can be made in production since the solution architecture stays the same.
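
A minimal sketch of calling a deployed custom named entity recognition model with the azure-ai-textanalytics SDK; the project name, deployment name, and sample text are hypothetical:

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

documents = ["For sale: 3 bedroom house near downtown, $450,000."]

# Long-running operation against a hypothetical custom NER project.
poller = client.begin_recognize_custom_entities(
    documents,
    project_name="<your-project>",
    deployment_name="<your-deployment>",
)
for result in poller.result():
    for entity in result.entities:
        print(entity.category, entity.text, entity.confidence_score)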

Speech use cases include:

  • Speech <–> Text
  • Translation
  • Speaker recognition - who is speaking
  • Pronunciation assessment
  • Intent recognition

Azure Speech services are available in the UI, CLI, SDKs, and REST APIs. See AI Speech in Azure for a description of capabilities.

Upload speech data to train a custom model (base model + changes). The model is tested and updated, and the accepted model gets an endpoint for use.
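
A minimal sketch of speech-to-text with the Python Speech SDK, assuming a deployed custom model; the key, region, and endpoint ID are placeholders:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
speech_config.endpoint_id = "<custom-model-endpoint-id>"  # omit to use the base model

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()  # listens on the default microphone
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)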

Custom voice, combined with the speech and NLP services above, is possible.

Speech Synthesis Markup Language (SSML) is an XML-based language with customization options:

  • Speaking styles
  • Pauses and silence
  • Phonetic pronunciation
  • Prosody (pitch, range, rate)
  • Say-as (number, date, time, address)
  • Insert recorded speech, background

Example SSML

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
    <voice name='en-US-Ava:DragonHDLatestNeural'>Hello, welcome to Azure AI Foundry!</voice>
</speak>
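
A minimal sketch of synthesizing the SSML above with the Python Speech SDK; the key and region are placeholders:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# The SSML document from the example above.
ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
    <voice name='en-US-Ava:DragonHDLatestNeural'>Hello, welcome to Azure AI Foundry!</voice>
</speak>"""

result = synthesizer.speak_ssml_async(ssml).get()  # plays on the default speaker
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized.")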

Source: Translate text and speech - Training | Microsoft Learn

Use cases:

  • Detect the language and translate, which can be from one language to many languages.
  • Transliteration of a language’s character set to the character set of the target language, like Japanese characters to English phonetic words.
  • Translation can be text, speech, or between text and speech (text to speech, speech to text), in batch or to multiple target languages.

It can be applied to document translation, custom translation for a domain or industry, and single or batch translation. Options include accurate grammar and a profanity filter.

Azure Translator in Foundry Tools provides an API for translating text in supported languages:

  • Translate or transliterate text using the default translation model or a large language model (LLM).
  • Translate documents, synchronously or asynchronously, while maintaining document structure.
  • Use custom translation models to translate domain-specific terms.
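
A minimal sketch of calling the Translator text REST API (v3.0) with the Python requests package; the key, region, and sample text are placeholders:

import requests

endpoint = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "from": "en", "to": ["fr", "ja"]}
headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Ocp-Apim-Subscription-Region": "<your-region>",
    "Content-Type": "application/json",
}
body = [{"text": "Hello, welcome to Azure AI Foundry!"}]

response = requests.post(endpoint, params=params, headers=headers, json=body)
# One translation entry per target language
for translation in response.json()[0]["translations"]:
    print(translation["to"], translation["text"])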

Inputs and outputs can be text or speech.

When using speech functions, configuration through a SpeechTranslationConfig object establishes the API connection and, together with other objects, configures languages, audio streams, text, and synthesis.
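
A minimal sketch of speech translation with the Python Speech SDK, assuming microphone input; the key and region are placeholders:

import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="<your-key>", region="<your-region>"
)
translation_config.speech_recognition_language = "en-US"  # spoken input language
translation_config.add_target_language("fr")              # one or more targets

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config
)
result = recognizer.recognize_once()  # listens on the default microphone
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print(result.text)                # recognized source text
    print(result.translations["fr"])  # French translation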

Audio can be added to apps: prompts can be given with audio and text, and responses can be returned as audio. API calls will include the audio media.

Create a Language service to analyze hotel review data, using the SDK to return sentiment, entity and key phrase analysis, and entity linking.
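
A minimal sketch of the key phrase, entity, and entity linking calls with the azure-ai-textanalytics SDK; the review text is illustrative:

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

reviews = ["Great stay at the Buckingham Hotel in London last week."]

# Key phrase extraction: key phrase array
for phrase in client.extract_key_phrases(reviews)[0].key_phrases:
    print("Key phrase:", phrase)

# Named entity recognition: category, text, confidence score
for entity in client.recognize_entities(reviews)[0].entities:
    print("Entity:", entity.category, entity.text, entity.confidence_score)

# Entity linking: e.g. links "London" to its Wikipedia article
for linked in client.recognize_linked_entities(reviews)[0].entities:
    print("Linked:", linked.name, linked.url)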

Azure Language in Foundry Tools supports analysis of text, including language detection, entity recognition, and PII redaction.

  • Use the service directly in an application through its REST API or an SDK.
  • Use the Azure Language in Foundry Tools MCP server to integrate its capabilities into an AI agent. It uses an AI Foundry endpoint and key.
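
A minimal sketch of the PII detection and redaction mentioned above, using the azure-ai-textanalytics SDK; the input text is illustrative:

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

documents = ["Call Megan Bowen at 555-0100 about invoice 1234."]

result = client.recognize_pii_entities(documents)[0]
print(result.redacted_text)  # sensitive values replaced with asterisks
for entity in result.entities:
    print(entity.category, entity.text, entity.confidence_score)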

Exercise: Use speech-capable generative AI models


Use generative AI models to support two use cases:

  • Speech synthesis (text-to-speech) - generating speech output.
  • Speech recognition (speech-to-text) - transcribing speech input.

Exercise: Create a question answering solution


Create a Language service that uses Azure AI Search. Configure the knowledge base with additional questions and follow-up answers. Connect to the service and ask questions using the SDK.
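
A minimal sketch of asking a question with the Python azure-ai-language-questionanswering SDK; the project and deployment names are hypothetical:

from azure.core.credentials import AzureKeyCredential
from azure.ai.language.questionanswering import QuestionAnsweringClient

client = QuestionAnsweringClient(
    "https://<your-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

output = client.get_answers(
    question="Can I cancel a booking?",
    project_name="<your-project>",
    deployment_name="production",
)
for answer in output.answers:
    print(answer.answer, answer.confidence)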

Create a Language service for custom text analysis and a storage account for custom text documents. Label testing and training data and train a new model. Using the SDK, submit new text documents for classification and see categories and confidence scores.
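
A minimal sketch of submitting documents to a deployed single-label custom text classification model with the azure-ai-textanalytics SDK; the project and deployment names are hypothetical:

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

documents = ["A thrilling heist movie set in 1970s New York."]

# Long-running operation against a hypothetical classification project.
poller = client.begin_single_label_classify(
    documents,
    project_name="<your-project>",
    deployment_name="<your-deployment>",
)
for doc in poller.result():
    for classification in doc.classifications:
        print(classification.category, classification.confidence_score)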

Exercise: Develop an audio-enabled chat app


Deploy the Phi-4-multimodal-instruct model in Azure AI Foundry. Use Azure AI Foundry and the Azure AI Inference SDK to create a client application that uses the multimodal model to generate responses to audio. The audio is sent to the model, which acknowledges the requests.
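
A minimal sketch of sending audio to the model with the azure-ai-inference SDK, assuming the deployment accepts audio as a base64 data URL in the message content; the endpoint, key, and audio file name are placeholders:

import base64
from azure.core.credentials import AzureKeyCredential
from azure.ai.inference import ChatCompletionsClient

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<your-key>"),
)

# Encode a local audio file as a data URL (placeholder file name).
with open("request.wav", "rb") as f:
    data_url = "data:audio/wav;base64," + base64.b64encode(f.read()).decode()

response = client.complete(
    model="Phi-4-multimodal-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this voice message."},
                {"type": "audio_url", "audio_url": {"url": data_url}},
            ],
        },
    ],
)
print(response.choices[0].message.content)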