Microsoft Dev Blogs, February 9, 2024

Engineering Document (P&ID) Digitization

Introduction:

Piping and instrumentation diagrams (P&IDs) are essential in the manufacturing industry, but digitizing them by hand is a roadblock in the transition to digitally connected factories.
The goal is to automate the identification of symbols, text, and connections within P&IDs.
Research papers such as "Digitization of chemical process flow diagrams using deep convolutional neural networks" and "Digitize-PID: Automatic Digitization of Piping and Instrumentation Diagrams" provide insights for the solution.

Types of Symbols in P&IDs:

Equipment Symbols: Represent process equipment used in a plant or process.
Instrumentation Symbols: Represent various instruments used to monitor and control process parameters.

Symbol Detection Model:

Azure Custom Vision was used initially, but limitations in its available algorithms led to training the model with AutoML Image Object Detection using YOLOv5 instead.
The model was trained on a synthetic dataset covering more than 50 symbols, made up of JPEG images with label annotations and bounding boxes.
New symbols can be recognized by labeling additional images with the Azure ML Data Labeling tool and feeding them to the model as new training data.
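
As an illustration of what a label annotation with bounding boxes might look like, here is a minimal sketch that converts pixel-space boxes into one annotation record with coordinates normalized to the image size. The field names (`image_url`, `image_details`, `topX`, `topY`, `bottomX`, `bottomY`) are modelled on the JSONL layout that Azure ML uses for image object detection, but they are an assumption here, as are the `PixelBox` type, the `to_annotation` helper, and the example file name and symbol label; the post's summary does not show the actual schema.

```python
import json
from dataclasses import dataclass


@dataclass
class PixelBox:
    """A labeled bounding box in pixel coordinates (hypothetical helper type)."""
    label: str
    left: int
    top: int
    right: int
    bottom: int


def to_annotation(image_url, width, height, boxes):
    """Build one annotation record with box coordinates normalized to [0, 1].

    The record layout is an assumption modelled on Azure ML's JSONL format
    for object detection, not the schema confirmed by the original post.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    labels = []
    for box in boxes:
        labels.append({
            "label": box.label,
            "topX": box.left / width,
            "topY": box.top / height,
            "bottomX": box.right / width,
            "bottomY": box.bottom / height,
        })
    return {
        "image_url": image_url,
        "image_details": {"format": "jpg", "width": width, "height": height},
        "label": labels,
    }


if __name__ == "__main__":
    record = to_annotation(
        "pid_001.jpg", 1000, 500,
        [PixelBox("gate_valve", 100, 50, 200, 100)],
    )
    # One JSON object per line is what a JSONL annotation file would hold.
    print(json.dumps(record))
```

Normalizing the coordinates keeps the annotations valid if the images are later resized, which matters for diagrams scanned at different resolutions.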

Automated Training Pipeline Workflow:

Data is pulled from blob storage and aggregated into a single annotation file to avoid bias in the model.
The training pipeline generates a new model.
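
The aggregation step can be sketched as follows. This is a minimal example under stated assumptions, not the pipeline's actual code: it assumes the annotation files have already been downloaded from blob storage into a local directory as JSONL files, that each record carries a `label` list as in Azure ML's object detection format, and the function name `merge_annotations` is hypothetical. It also returns per-symbol counts, which is one simple way to check whether the combined dataset is skewed toward a few symbols.

```python
import json
from collections import Counter
from pathlib import Path


def merge_annotations(input_dir, output_path):
    """Merge every *.jsonl file in input_dir into one annotation file.

    Returns a Counter of how many boxes each symbol label has, so class
    imbalance in the combined dataset can be inspected before training.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    records = []
    label_counts = Counter()

    for path in sorted(input_dir.glob("*.jsonl")):
        # Skip the merged file itself if it lives in the same directory.
        if path.resolve() == output_path.resolve():
            continue
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                records.append(record)
                for box in record.get("label", []):
                    label_counts[box["label"]] += 1

    with output_path.open("w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")

    return label_counts
```

Training from one merged file means each run sees the full set of labeled symbols rather than only the most recent batch.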

Text Detection Module:

Text detection faces challenges such as text crowded by nearby symbols and low image resolution.
The OCR service used is optimized for text-heavy documents and engineering diagrams like P&IDs.
A single-pass OCR approach is used.

This digitization process eliminates the manual effort in converting P&IDs into digital format and enables more efficient onboarding of new customers and factories in the manufacturing industry.