
Design and Implement a Data Science Solution on Azure (DP-100)

Source: my personal notes from the course DP-100: Designing and Implementing a Data Science Solution on Azure (Microsoft Learn), with labs from MicrosoftLearning/mslearn-azure-ml on GitHub.

The course covers data science work for machine learning in Azure.

Topics include data preparation and ingestion, model training, and model deployment and consumption. Machine learning operations (MLOps) manages a model deployment from development through production use.

About half of the course is hands-on labs that use the Python SDK with Azure compute and storage to train and deploy models.

  • Day 1: Explore Azure Machine Learning (ML)
  • Day 2: MLflow, hyperparameter tuning, notebooks and scripts for custom model training
  • Day 3: Pipelines, Responsible AI, model deployment
  • Day 4: Model use and deployment, compare models, prompt engineering, exam preparation

Using patient data, the course designs an ML solution for diabetes diagnosis, from exploring and ingesting the data through training to model deployment, consumption, and operations.

To pick a model that predicts the right kind of value, you need to know the main machine learning task types:

  1. Classification - predict a category (class)
  2. Regression - predict a numeric value
  3. Time-series forecasting - predict future values from historical data
  4. Computer vision - classify images or detect objects in them
  5. Natural language processing (NLP) - extract meaning from text

For the diabetes diagnosis case study, classification is the appropriate task, with accuracy (the fraction of correct diabetic / non-diabetic predictions) as the evaluation metric.
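As a minimal sketch of the accuracy metric, assuming hypothetical labels (not course data), accuracy is just the share of predictions that match the actual classes:

```python
# Hypothetical binary labels for a diabetes classifier: 1 = diabetic, 0 = not.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"Accuracy: {accuracy:.2f}")  # 6 of 8 correct -> 0.75
```

Note that accuracy alone can mislead on imbalanced data (e.g., few diabetic cases), which is why the course also surfaces metrics like precision and recall.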

Prepare data and plan machine learning (ML)

Section titled “Prepare data and plan machine learning (ML)”

Identify the data source and its format: data can be tabular (structured), semi-structured, or unstructured.

Determine appropriate storage for the data and compute for training.
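To make the tabular case concrete, here is a small ingestion sketch using only the Python standard library; the column names and values are hypothetical patient records, not the course dataset:

```python
import csv
import io

# Hypothetical structured (tabular) patient data as it might arrive from a CSV source.
raw = """PatientID,PlasmaGlucose,BMI,Diabetic
1001,171,31.2,1
1002,92,23.5,0
1003,115,28.9,0
"""

# Ingest the rows; each record becomes a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["PlasmaGlucose"])  # "171" - values arrive as strings and still need typing
```

Semi-structured (e.g., JSON) and unstructured (e.g., images, free text) data need different parsing and storage choices, which is why identifying the format up front matters.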

Design the data ingestion and the training process. Relevant Microsoft services:

  • Azure Storage and compute such as virtual machines (VMs)
  • Azure AI Services
  • Fabric
  • Azure Databricks (Apache Spark based)
  • Azure Machine Learning (ML)

For the case study, configure appropriate training task settings and compute, such as serverless compute or a CPU cluster. Azure ML compute infrastructure runs workloads in containers.

  1. Endpoints

    Deploy the model to an endpoint for real-time or batch predictions.

    Endpoints are encrypted and can be exposed on a public network or as a private endpoint inside a private network. Access can be controlled with a managed identity (no password; the identity is managed with the ML resource).

    After creating an ML workspace, access it through the web-based Azure ML studio.

  2. Monitoring

    Monitor the model for performance and for data drift (changes in the distribution of the inputs or the target), and retrain with new data on a schedule.
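The idea behind data drift can be sketched in a few lines: compare the distribution of an input feature between the training baseline and recent production data. This toy version (hypothetical values, a simple mean-shift threshold) only illustrates the concept; real monitors such as Azure ML model monitoring use proper statistical distance tests:

```python
from statistics import mean

# Hypothetical glucose values seen at training time vs. in recent production inputs.
baseline_glucose   = [95, 110, 102, 88, 120, 105]
production_glucose = [140, 155, 132, 160, 148, 151]

# Relative shift of the feature mean; a large shift suggests the input
# distribution has drifted and the model may need retraining.
shift = abs(mean(production_glucose) - mean(baseline_glucose)) / mean(baseline_glucose)
if shift > 0.10:  # flag if the mean moved more than 10%
    print(f"Possible data drift: mean shifted by {shift:.0%}")
```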

To speed up node provisioning in compute clusters during the lab, you can optionally add --min-instances 2 to the az ml compute create ... Azure CLI command so that two nodes stay running, since cluster node scaling can take minutes. This consumes more resources and cost, and it prevents the cluster from scaling down to 0 nodes, so make sure to shut down resources when finished.
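A sketch of what that CLI command might look like with the option added; the cluster name, VM size, resource group, and workspace name are placeholders to match to your own lab setup:

```shell
# Create a compute cluster that keeps 2 nodes warm (placeholder names/size).
az ml compute create --name aml-cluster \
    --type AmlCompute \
    --size STANDARD_DS11_V2 \
    --min-instances 2 \
    --max-instances 2 \
    --resource-group my-rg \
    --workspace-name my-workspace
```

With --min-instances back at 0 (the default), the cluster scales to zero when idle and costs nothing between runs, at the price of a few minutes of startup time.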