Engineering Document (P&ID) Digitization

Introduction:
- P&IDs (piping and instrumentation diagrams) are essential in the manufacturing industry, but digitizing them manually is a roadblock in the transition to digitally connected factories.
- The goal is to automate the identification of symbols, text, and connections within P&IDs.
- Research papers such as "Digitization of chemical process flow diagrams using deep convolutional neural networks" and "Digitize-PID: Automatic Digitization of Piping and Instrumentation Diagrams" provide insights for the solution.
Types of Symbols in P&IDs:
- Equipment Symbols: Represent process equipment used in a plant or process.
- Instrumentation Symbols: Represent various instruments used to monitor and control process parameters.
Symbol Detection Model:
- Azure Custom Vision was used initially, but limitations in its algorithms led to a switch to AutoML Image Object Detection with YOLOv5 for training the model.
- The model was trained on a synthetic dataset covering 50+ symbols, made up of JPEG images with label annotations and bounding boxes.
- New symbols can be added by labeling new training data with the Azure ML Data Labeling tool and retraining the model on it.
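As a rough illustration of what "JPEG images with label annotations and bounding boxes" can look like in practice, the sketch below turns pixel-space boxes into one annotation record per image. It assumes a JSONL layout with normalized `topX`/`topY`/`bottomX`/`bottomY` coordinates, as commonly used for AutoML image object detection; the exact field names, the function name `to_annotation_record`, and the sample symbol label are assumptions, not details from the source, and should be checked against the current service documentation.

```python
import json


def to_annotation_record(image_url, width, height, boxes):
    """Build one annotation record for a single image.

    boxes: list of (label, x_min, y_min, x_max, y_max) in pixels.
    Coordinates are normalized to the 0..1 range (assumed schema).
    """
    labels = []
    for label, x_min, y_min, x_max, y_max in boxes:
        # Reject boxes that are empty or fall outside the image.
        if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
            raise ValueError(f"invalid box for {label!r} in {image_url}")
        labels.append({
            "label": label,
            "topX": x_min / width,
            "topY": y_min / height,
            "bottomX": x_max / width,
            "bottomY": y_max / height,
        })
    return {
        "image_url": image_url,
        "image_details": {"format": "jpg", "width": width, "height": height},
        "label": labels,
    }


# Hypothetical example: one symbol on a 1000 x 500 pixel drawing.
record = to_annotation_record(
    "pid_0001.jpg", 1000, 500, [("gate_valve", 100, 50, 200, 100)]
)
jsonl_line = json.dumps(record)  # one line of the annotation file
```

Normalizing by image size keeps the annotations valid if the images are later resized for training.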
Automated Training Pipeline Workflow:
- Data is pulled from blob storage and aggregated into a single annotation file to avoid bias in the model.
- The training pipeline generates a new model.
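The aggregation step could be sketched as follows. This is a minimal example under stated assumptions, not the pipeline's actual code: it assumes the annotation files have already been downloaded from blob storage into a local directory, that each is a JSONL file with one record per image, and that records are keyed by an `image_url` field. The function name `merge_annotations` is hypothetical.

```python
import json
from pathlib import Path


def merge_annotations(input_dir, output_file):
    """Merge all *.jsonl annotation files in input_dir into output_file.

    Records are keyed by "image_url" (assumed field); if the same image
    appears in several files, the record from the last file (in sorted
    filename order) wins. Returns the number of records written.
    """
    merged = {}
    # sorted() lists the inputs up front, giving a deterministic order.
    for path in sorted(Path(input_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                merged[record["image_url"]] = record
    with open(output_file, "w", encoding="utf-8") as out:
        for record in merged.values():
            out.write(json.dumps(record) + "\n")
    return len(merged)
```

Writing the merged file outside `input_dir` avoids picking it up as an input on the next run. Training on one combined file means every run sees the full set of labeled symbols and not only the most recent batch.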
Text Detection Module:
- Text detection faces challenges such as crowded symbols and poor resolution quality.
- The OCR service used is optimized for text-heavy documents and engineering diagrams like P&IDs.
- A single OCR pass approach is used.
This digitization process eliminates the manual effort of converting P&IDs into a digital format and enables more efficient onboarding of new customers and factories in the manufacturing industry.