Natural Language and Natural Language Processing (NLP) in Azure AI Solutions
Source: My personal notes from Course AI-102T00-A: Develop AI solutions in Azure (Microsoft Learn), with labs from Exercises for Azure AI Language
Analyze Text and Natural Language Processing (NLP)
Use cases for NLP –> service output in Azure
- Language detection: determine the language(s) used in documents –> detected language(s), confidence scores, identifiers
- Key phrase extraction: keywords –> key phrase array
- Sentiment analysis: determine positive, neutral, negative –> sentiment breakdown by sentence, confidence scores
- Named entity recognition: find and identify specific things in text like locations, people, addresses, food, actions –> entity category, location in text, confidence scores
- In Azure, can include custom named entities like organization’s terms
- Entity linking: understand a word (entity) in text and its meaning –> link a keyword in text to its Wikipedia article
- Summarizing text and content: produce an outline –> extractive summarization selects key sentences directly from the text; abstractive summarization rewrites key ideas using new phrasing
- Personally identifiable information (PII) detection: detect and/or remove sensitive information like names, phone numbers, credit cards –> labeled sensitive fields, redacted text array, type of PII, confidence scores
Customizable models are available for specific use cases. Azure AI services generally return JSON responses.
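The use cases above map to individual SDK calls. Below is a minimal sketch using the Python azure-ai-textanalytics package; the endpoint, key, and sample text are placeholders.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Placeholder endpoint and key for a Language resource
client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

docs = ["The hotel was lovely and the staff at the front desk were very helpful."]

# Language detection -> detected language and confidence score
detected = client.detect_language(docs)[0]
print(detected.primary_language.name, detected.primary_language.confidence_score)

# Key phrase extraction -> key phrase array
print(client.extract_key_phrases(docs)[0].key_phrases)

# Sentiment analysis -> overall sentiment plus per-class confidence scores
sentiment = client.analyze_sentiment(docs)[0]
print(sentiment.sentiment, sentiment.confidence_scores)

# Named entity recognition -> entity text, category, confidence score
for entity in client.recognize_entities(docs)[0].entities:
    print(entity.text, entity.category, entity.confidence_score)

# PII detection -> redacted text with sensitive fields masked
print(client.recognize_pii_entities(docs)[0].redacted_text)
```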
Azure Deployment and Use
Deployment options include Azure AI Foundry, Azure AI services, or a Language service resource.
Send data to the Azure AI service for NLP and receive a response.
Conversational Understanding And Question Answering
The goal is having conversations between people and machines in which the machine completes an action. The application gets user input, determines the user's intent (semantic meaning), and performs the appropriate action, such as calling an API or running a function.
For example, a hotel chat app helps guests with requests like room service. The conversation would need to determine the guest, room number, and food order information, then submit the order to the kitchen.
Interpreting user input is called natural language understanding (NLU).
Conversational Language Understanding (CLU) Applications
CLU enables building an NLU component in a conversational application.
The NLU model understands utterances from the natural language input and maps them to intents that assign semantic meaning. It can recognize entities like quantities, dates and times, email addresses, and identifiers.
Examples:
- Utterance: what is the time in Toronto
- Intent: Get Time (function)
- Entities: Toronto (location)
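The example above maps to a call against a deployed CLU model. Below is a minimal sketch using the Python azure-ai-language-conversations package; the endpoint, key, project name, and deployment name are placeholders.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.conversations import ConversationAnalysisClient

client = ConversationAnalysisClient(
    "https://<your-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

# Submit an utterance to a deployed CLU project (placeholder names)
result = client.analyze_conversation(
    task={
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "id": "1",
                "participantId": "user",
                "text": "what is the time in Toronto",
            }
        },
        "parameters": {
            "projectName": "<your-project>",
            "deploymentName": "<your-deployment>",
        },
    }
)

prediction = result["result"]["prediction"]
print(prediction["topIntent"])          # e.g. GetTime
for entity in prediction["entities"]:   # e.g. Location: Toronto
    print(entity["category"], entity["text"], entity["confidenceScore"])
```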
Customizations and Configurations
A model can be trained to understand specific intents, entities, and synonyms through labelled utterances, which are used for both training and evaluation. Ensure the data is balanced across intents, with varied utterances and language.
The trained model can be published at an endpoint.
Conversational Language Understanding (CLU) supports two modes for training models:
- Standard training uses fast machine learning algorithms to quickly train your models but has limitations (English only as of 2026-04-23)
- Advanced training uses the latest in machine learning technology to customize models with your data. This training level is expected to show better performance scores for your models and enables you to use the multilingual capabilities of CLU
Question Answering (QA)
Question answering uses a knowledge base of question-and-answer pairs queried with NLP. Content, like an FAQ, is indexed and the service is published as a REST endpoint. Common inputs are FAQ websites, documents, and information brochures.
Custom question answering projects can use active learning, where suggested alternate questions for relevant question-and-answer pairs are reviewed and accepted, or added manually. Synonyms can be defined so QA knows multiple words have the same meaning.
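A minimal sketch of querying a published knowledge base with the Python azure-ai-language-questionanswering package; the endpoint, key, project name, and deployment name are placeholders.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.language.questionanswering import QuestionAnsweringClient

client = QuestionAnsweringClient(
    "https://<your-resource>.cognitiveservices.azure.com",
    AzureKeyCredential("<your-key>"),
)

# Query a deployed question answering project (placeholder names)
response = client.get_answers(
    question="Can I change my reservation?",
    project_name="<your-project>",
    deployment_name="production",
)

# Each candidate answer comes with a confidence score
for candidate in response.answers:
    print(candidate.confidence, candidate.answer)
```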
Synonym example
{ "synonyms": [ { "alterations": [ "reservation", "booking" ] } ]}Question Answering (QA) vs Language Understanding
QA is focused on static answers from known content, with slight variations possible. For LLMs, this is similar to Retrieval Augmented Generation (RAG).
NLU focuses on understanding intents and performing actions.
| Characteristic | Question answering | Language understanding |
|---|---|---|
| Usage pattern | User submits a question, expecting an answer | User gives utterance, expecting an appropriate response or action |
| Query processing | Use NLP to match question to answer | Use NLP to interpret the utterance, match it to an intent, and identify entities |
| Response | Static response to a known question | Response indicates likely intent and referenced entities |
| Client logic | Client presents answer to the user | Client responsible for performing appropriate action from detected intent |
Custom Text Classification
Examples include recognizing specific entities, organization or industry terms, and custom locations.
Define entity labels and label documents. Custom named entities are used to train models.
Minor changes, like new named entities, can be made in production since the solution architecture stays the same.
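A minimal sketch of classifying documents with a trained single-label custom model via the azure-ai-textanalytics package; the endpoint, key, project name, and deployment name are placeholders.

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Submit documents to a trained custom classification model (placeholder names)
poller = client.begin_single_label_classify(
    ["Document text to classify..."],
    project_name="<your-project>",
    deployment_name="<your-deployment>",
)

# Each document result carries its category and confidence score
for doc in poller.result():
    for classification in doc.classifications:
        print(classification.category, classification.confidence_score)
```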
Speech Recognition And Synthesis
Speech use cases include:
- Speech <–> Text
- Translation
- Speaker recognition - who is speaking
- Pronunciation assessment
- Intent recognition
Azure services are available in the UI, CLI, SDKs, and REST APIs. See AI Speech in Azure for a description of capabilities.
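A minimal sketch of one-shot speech-to-text with the Python azure-cognitiveservices-speech SDK; the key and region are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder key and region for an AI Speech resource
speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")

# Recognize a single utterance from the default microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```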
Custom Speech Models
Upload speech data to train a custom model (base model + changes). The model is tested and updated, and the accepted model gets an endpoint for use.
Custom voice combined with speech and NLP services above is possible.
Speech Synthesis Markup Language (SSML)
XML-based language with customization options:
- Speaking styles
- Pauses and silence
- Phonetic pronunciation
- Prosody (pitch, range, rate)
- Say-as (number, date, time, address)
- Insert recorded speech, background
Example SSML
```xml
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-Ava:DragonHDLatestNeural'>Hello, welcome to Azure AI Foundry!</voice>
</speak>
```
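A minimal sketch of speaking SSML like the above with the azure-cognitiveservices-speech SDK; the key, region, and voice name are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# SSML document to synthesize (placeholder voice name)
ssml = """<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
  <voice name='en-US-AvaNeural'>Hello, welcome to Azure AI Foundry!</voice>
</speak>"""

# Synthesize the SSML to the default speaker
result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized.")
```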
Translation
Source: Translate text and speech - Training | Microsoft Learn
Use cases:
- Detect the source language and translate, which can be from one language to many target languages
- Transliteration from one script to another, like Japanese characters to English phonetic spelling
- Translation can be text-to-text, speech-to-speech, text-to-speech, or speech-to-text, in single requests or batches, with multiple target languages
Can be applied to document translation, custom translation for a domain or industry, and single-task or batch translation. Options include grammar accuracy and profanity filtering.
Translate Text/Speech
Azure Translator in Foundry Tools provides an API for translating text in supported languages (a sketch follows this list):
- Translate or transliterate text using the default translation model or a large language model (LLM).
- Translate documents, synchronously or asynchronously, while maintaining document structure.
- Use custom translation models to translate domain-specific terms.
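A minimal sketch of a text translation call against the Translator REST API (v3.0); the key and region are placeholders.

```python
import requests

headers = {
    "Ocp-Apim-Subscription-Key": "<your-key>",
    "Ocp-Apim-Subscription-Region": "<your-region>",
    "Content-Type": "application/json",
}

# Translate one input text into two target languages
response = requests.post(
    "https://api.cognitive.microsofttranslator.com/translate",
    params={"api-version": "3.0", "from": "en", "to": ["fr", "ja"]},
    headers=headers,
    json=[{"text": "Hello, welcome to Azure AI Foundry!"}],
)

for item in response.json():
    for translation in item["translations"]:
        print(translation["to"], translation["text"])
```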
Inputs and outputs can be text or speech. For speech functions, a SpeechTranslationConfig object configures the API connection and, together with other objects, the languages, audio streams, text, and synthesis.
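A minimal sketch of speech-to-text translation with SpeechTranslationConfig from the azure-cognitiveservices-speech SDK; the key and region are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Configure the API connection and source/target languages
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="<your-key>", region="<your-region>"
)
translation_config.speech_recognition_language = "en-US"
translation_config.add_target_language("fr")

# Translate a single utterance from the default microphone
recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config
)
result = recognizer.recognize_once_async().get()

if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print(result.text)                # recognized source text
    print(result.translations["fr"])  # translated text
```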
Audio in Solutions
Audio can be added to apps; for example, prompts can be given as audio and text, and responses returned as audio. API calls will include audio media.
Exercise: Analyze text
Create a language service to analyze hotel review data using the SDK, returning sentiment, entity and phrase analysis, and entity linking.
Exercise: Develop a text analysis agent
Azure Language in Foundry Tools supports analysis of text, including language detection, entity recognition, and PII redaction.
- Use the service directly in an application through its REST API or an SDK.
- Use the service via the Azure Language in Foundry Tools MCP server to integrate its capabilities into an AI agent. It uses an AI Foundry endpoint and key.
Exercise: Use speech-capable generative AI models
Use generative AI models to support two use cases:
- Speech synthesis (text-to-speech) - generating speech output.
- Speech recognition (speech-to-text) - transcribing speech input.
Exercise: Create a question answering solution
Create a language service that uses Azure AI Search. Configure the knowledge base with additional questions and follow-up answers. Connect to the service and ask questions using the SDK.
Exercise: Custom Text Classification
Create a language service for custom text analysis and a storage account for custom text documents. Label testing and training data and train a new model. Using the SDK, submit new text documents for classification and see categories and confidence scores.
Exercise: Develop an audio-enabled chat app
Section titled “Exercise: Develop an audio-enabled chat app”Create a Azure AI Foundry model with Phi-4-multimodal-instruct. Use
Azure AI Foundry and the Azure AI Inference SDK to create a client
application uses a multimodal model to generate responses to audio. The
audio was sent to the model and it provided acknowledgement of the
requests.