Microsoft Dev Blogs

Engineering Document (P&ID) Digitization

Introduction:

  • Piping and instrumentation diagrams (P&IDs) are essential in the manufacturing industry, but digitizing them by hand is a roadblock in the transition to digitally connected factories.
  • The goal is to automate the identification of symbols, text, and connections within P&IDs.
  • Research papers such as "Digitization of chemical process flow diagrams using deep convolutional neural networks" and "Digitize-PID: Automatic Digitization of Piping and Instrumentation Diagrams" provide insights for the solution.

Types of Symbols in P&IDs:

  1. Equipment Symbols: Represent process equipment used in a plant or process.
  2. Instrumentation Symbols: Represent various instruments used to monitor and control process parameters.

Symbol Detection Model:

  • Azure Custom Vision was used initially, but limitations in its algorithms led to a switch to AutoML Image Object Detection with YOLOv5 for training the model.
  • The model was trained on a synthetic dataset covering 50+ symbols, made up of JPEG images with label annotations and bounding boxes.
  • New symbols can be recognized by labeling additional training data with the Azure ML Data Labeling tool and retraining the model on it.
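
To make the "label annotations and bounding boxes" concrete, here is a minimal sketch of converting a pixel-space bounding box into the normalized center/width/height form that YOLO-style text labels commonly use. The source does not say which annotation format the dataset uses, so the format and the function names `to_yolo_box` and `format_yolo_line` are assumptions for illustration only.

```python
def to_yolo_box(x_min, y_min, x_max, y_max, image_width, image_height):
    """Convert pixel corner coordinates to normalized (x_center, y_center, width, height).

    Assumption: the box is given as top-left and bottom-right pixel corners.
    """
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    if not (0 <= x_min < x_max <= image_width and 0 <= y_min < y_max <= image_height):
        raise ValueError("bounding box must lie inside the image and have positive area")

    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return x_center, y_center, width, height


def format_yolo_line(class_id, box):
    """Render one label line: '<class_id> <x_center> <y_center> <width> <height>'."""
    return f"{class_id} " + " ".join(f"{value:.6f}" for value in box)


# Hypothetical example: a symbol with class id 3 in a 1000x500 pixel image.
example_box = to_yolo_box(100, 50, 300, 150, image_width=1000, image_height=500)
example_line = format_yolo_line(3, example_box)
```

Normalizing the coordinates keeps the labels valid when images are resized for training, which is one reason this layout is widely used for object detection datasets.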

Automated Training Pipeline Workflow:

  • Data is pulled from blob storage and aggregated into a single annotation file to avoid bias in the model.
  • The training pipeline generates a new model.
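
The aggregation step above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the annotations are JSON Lines documents in which each record has an `image_url` field and a `label` list, and it works on in-memory strings rather than reading from blob storage. The function name `merge_annotation_files` is hypothetical.

```python
import json


def merge_annotation_files(jsonl_texts):
    """Merge several JSONL annotation documents into a single JSONL string.

    Records that refer to the same image are combined into one record, and
    exact duplicate labels are dropped so no annotation is counted twice.
    """
    merged = {}  # image_url -> combined record, in first-seen order
    for text in jsonl_texts:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            url = record["image_url"]
            entry = merged.setdefault(url, {"image_url": url, "label": []})
            for label in record.get("label", []):
                if label not in entry["label"]:
                    entry["label"].append(label)
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in merged.values())


# Hypothetical example: two labeling batches that overlap on one image.
batch_a = '{"image_url": "pid_001.jpg", "label": [{"label": "valve", "topX": 0.1, "topY": 0.1, "bottomX": 0.2, "bottomY": 0.2}]}'
batch_b = "\n".join([
    '{"image_url": "pid_001.jpg", "label": [{"label": "pump", "topX": 0.5, "topY": 0.5, "bottomX": 0.6, "bottomY": 0.6}]}',
    '{"image_url": "pid_002.jpg", "label": [{"label": "valve", "topX": 0.3, "topY": 0.3, "bottomX": 0.4, "bottomY": 0.4}]}',
])
combined = merge_annotation_files([batch_a, batch_b])
```

Training from one combined file means every run sees all labeled batches together instead of only the most recent one, which fits the stated aim of avoiding bias in the model.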

Text Detection Module:

  • Text detection faces challenges such as crowded symbols and poor image resolution.
  • The OCR service used is optimized for text-heavy documents and engineering diagrams like P&IDs.
  • Text is extracted in a single OCR pass.

This digitization process eliminates the manual effort of converting P&IDs into a digital format and enables more efficient onboarding of new customers and factories in the manufacturing industry.