Intelligent Cloud — Part 2: A Deep Dive into the Architecture and Technical Details of INCL
This article explores the technical details of INCL’s architecture, which consists of three components: the INCL client, the INCL backend, and the job instance. Together, these components provide a scalable deep learning training environment.
Workflow of Running an Experiment in INCL
The INCL workflow involves three main steps. First, a user requests a job run, specifying the script and the required resources. Second, the backend provisions a job instance and syncs the experiment code from object storage to block storage. Lastly, the job runs on the job instance, and users can monitor their experiments through a web UI.
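The three steps above can be sketched in a few lines of Python. This is only an illustrative model, not INCL's actual API: the names JobRequest, JobInstance, provision, sync_code, and run are hypothetical, and plain dictionaries stand in for object storage and block storage.

```python
from dataclasses import dataclass, field


@dataclass
class JobRequest:
    """Step 1: what the user submits (all field names are hypothetical)."""
    script: str
    gpus: int
    memory_gb: int


@dataclass
class JobInstance:
    """Stand-in for the provisioned machine and its block storage."""
    gpus: int
    memory_gb: int
    block_storage: dict = field(default_factory=dict)


def provision(request: JobRequest) -> JobInstance:
    # Step 2a: the backend provisions an instance matching the request.
    return JobInstance(gpus=request.gpus, memory_gb=request.memory_gb)


def sync_code(object_storage: dict, instance: JobInstance) -> None:
    # Step 2b: copy the experiment code from object storage to block storage.
    instance.block_storage.update(object_storage)


def run(request: JobRequest, instance: JobInstance) -> str:
    # Step 3: run the requested script on the instance.
    if request.script not in instance.block_storage:
        raise FileNotFoundError(request.script)
    return f"running {request.script} on {instance.gpus} GPU(s)"


# A dictionary plays the role of object storage in this sketch.
object_storage = {"train.py": "print('training')"}
request = JobRequest(script="train.py", gpus=2, memory_gb=32)
instance = provision(request)
sync_code(object_storage, instance)
status = run(request, instance)
```

In a real deployment each function would be a call to a separate service, but the ordering (request, provision and sync, run) is the part the sketch is meant to show.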
Key Components of the INCL Architecture
INCL Client
The INCL client provides a user-friendly interface for visualizing training progress and results through metric graphs.
INCL Backend
The INCL backend consists of the API server, scheduler, job runner, and logging server components.
Scheduler and Job Runner
The scheduler selects jobs from the queue based on resource usage and quotas and sends each selected job to the job runner, which runs it on a job instance.
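One way to picture this selection step is the minimal sketch below. It assumes a first-in, first-out queue, GPUs as the only tracked resource, and a per-user GPU quota. None of these details are confirmed by INCL, and the names Job and select_job are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    user: str
    gpus: int


def select_job(queue: deque, free_gpus: int, usage: dict, quotas: dict) -> Optional[Job]:
    """Return the first queued job that fits the free GPUs and the user's quota."""
    for job in list(queue):
        fits_cluster = job.gpus <= free_gpus
        within_quota = usage.get(job.user, 0) + job.gpus <= quotas.get(job.user, 0)
        if fits_cluster and within_quota:
            queue.remove(job)  # the selected job leaves the queue
            return job
    return None  # nothing can be scheduled right now


queue = deque([
    Job("a", "alice", 4),  # too large for the 3 free GPUs
    Job("b", "bob", 2),    # would push bob over his quota
    Job("c", "alice", 1),  # fits both constraints
])
selected = select_job(
    queue,
    free_gpus=3,
    usage={"alice": 0, "bob": 3},
    quotas={"alice": 8, "bob": 4},
)
```

Here the scheduler skips jobs "a" and "b" and hands job "c" to the job runner. The skipped jobs stay in the queue for the next scheduling pass.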
Logging Server
The logging server handles log and metric data in real time, so users can follow their training runs as they progress.
Job Instance
A job instance is a virtual machine that runs the requested job, allowing users to log configuration details, experiment metrics, and model artifacts. INCL offers three types of storage: object storage, block storage, and file storage. These store all training data and are mounted to the job instance as a network file system.
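To make the three kinds of logged data concrete, here is a small sketch of what logging from inside a job might look like. The ExperimentLog class and its methods are hypothetical and are not INCL's client library. A temporary directory stands in for the mounted storage.

```python
import json
import tempfile
from pathlib import Path


class ExperimentLog:
    """Hypothetical helper that records config, metrics, and artifacts for one run."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.metrics = []

    def log_config(self, config: dict) -> None:
        # Configuration details are written once, as JSON.
        (self.root / "config.json").write_text(json.dumps(config))

    def log_metric(self, name: str, value: float, step: int) -> None:
        # Metrics accumulate as (name, value, step) records.
        self.metrics.append({"name": name, "value": value, "step": step})

    def log_artifact(self, name: str, data: bytes) -> Path:
        # Model artifacts are stored as files under the run directory.
        path = self.root / name
        path.write_bytes(data)
        return path


with tempfile.TemporaryDirectory() as tmp:
    log = ExperimentLog(Path(tmp) / "run-1")
    log.log_config({"lr": 0.001, "epochs": 3})
    for step, loss in enumerate([0.9, 0.6, 0.4]):
        log.log_metric("loss", loss, step)
    artifact = log.log_artifact("model.bin", b"\x00\x01")
    saved_config = json.loads((log.root / "config.json").read_text())
    artifact_size = artifact.stat().st_size
```

In this sketch the metric records are the kind of data a logging server could forward to the web UI for graphing, while the config and artifact files would live on the mounted storage.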
By leveraging these components, INCL provides a powerful and flexible training environment for deep learning practitioners.