Skip to content

Computer Vision in Azure AI solutions

Source: My personal notes from Course AI-102T00-A: Develop AI solutions in Azure - Microsoft Learn with labs from Exercises for Develop vision solutions in Azure

Use cases: create captions for image, image tagging, detect and locate objects and people

Existing models will respond with JSON with the image information. Response will include information like entity and the boundary box of the entity in the image.

Access can be using an SDK or API to submit images and get responses.

Use cases: get information from text like addresses, identifiers, numbers, photographer, license plates, and digitize notes

Optical character recognition (OCR) recognizes text and structure. Responses from the service include blocks and text and information for that block.

Example solution: A camera periodically uploads images to blob storage. On an event, the code checks for an update, extracts entities and responds.

Facial Recognition: Detect, Analyze And Recognize Faces

Section titled “Facial Recognition: Detect, Analyze And Recognize Faces”

Use cases: detect and locate faces, analyze facial features.

Using Azure AI face services requires approval due to sensitivity and are used for facial recognition solutions. For features, see Computer Vision (CV) Concepts and in Azure - Computer Vision (CV) Concepts and in Azure

Accessing Azure AI Vision Face resources can be done using Face SDK to get detection and face attributes and features responses.

Use case: classify images (categorization) and detect object(s) and classify them

Classification can be:

  • Multi class where each image is tagged with 1 class label from several, for example apple from [apple, orange, pineapple, banana].
  • Multi label where each image can be tagged with multiple classes, for example fruit bowl with apple, orange, and banana

Example solutions: Food processing where image classification checks for “good” versions of a food product and remove “bad” versions. Medical imaging checks if a disease is present or not.

See Research Data - Mendeley Data for sample data sets to play with.

Models can be trained with custom classifications.

Multimodel generative AI model respond to prompts and return created content. Prompts can include text, speech, and images and typically include a text part and media part.

Uses prompt to create images.

Image generation can have responsible AI implications like other generative AI with malicious use.

Create Azure AI vision resource and using the SDK, submit images for captions, entity location, get objects, tagging, and people identification.

Create a face service and using the SDK, submit images for face detection and boundary box location.

Use the OpenAI DALL-E generative AI model like dall-e-3 to create images and OpenAI Python SDK to create a simple app to generate images based on prompts.