Digital twins: How they can help scale up industrial robotics AI

20 Oct 2020


Members of the Siemens Digital Industries Software team discuss the role of a digital twin and how it can assist industrial robotics.

As part of the artificial intelligence and machine learning revolution, robots can now make real-time decisions based on input from sensors such as 2D or 3D cameras, force and torque sensors, and lidar.

These capabilities enable robots to perform industrial tasks that previously only humans could perform, such as part or product detection, random part grasping, assembly and wiring.

Machine learning algorithms such as deep artificial neural networks are the ‘brains’ behind these complex robotic skills. In contrast to traditional software, a machine learning algorithm is not explicitly programmed. Instead, it is trained for a specific task by providing it with good examples of the task outcome.

The challenges of commissioning autonomous industrial robots

Training the machine learning algorithms that enable these robotic skills is laborious and time consuming. It requires setting up an environment where robots, sensors and other peripheral equipment are all integrated.

Moreover, the task for which the robots are being trained must be attempted many times to generate enough training examples. Manual assistance is often required to reposition the parts after every attempt and to monitor each task the robot executes in order to provide the correct feedback: success or failure. The robot must also frequently be stopped when there is a safety issue or a risk of damaging the product or equipment.

For example, consider an automated process where a robot picks parts from a bin while a camera is positioned on top of the bin to capture images of its interior.

To make this process work, a machine learning algorithm is trained to detect the position and dimensions of the parts from images captured by the camera. The algorithm is trained by supplying it with example camera images.

For each image, accurate information about the part position and dimensions must be manually supplied by, for example, drawing a bounding rectangle around the parts. Machine learning algorithms become robust after they are trained with enough training examples – so, given the bin-picking example, parts must be manually placed inside the bin in many different configurations.

For each configuration, an image must be captured by the camera and the parts’ positions and dimensions must be annotated. This is a lengthy, cumbersome and expensive process.

The machine learning method described here is an example of supervised learning. When training a supervised learning algorithm, the training data consists of inputs (images) paired with the correct outputs, such as bounding rectangles, textual labels describing the objects in the image (‘box’, ‘can’, ‘screw’), object colour and so on.
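To make the idea of an input paired with its correct outputs concrete, the following is a minimal sketch of how one annotated training example might be represented. The class names, field names, file name and pixel values are all hypothetical and are not taken from any Siemens tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One manually supplied output: a label plus a bounding rectangle in pixels."""
    label: str              # e.g. 'box', 'can', 'screw'
    x: int                  # left edge of the bounding rectangle
    y: int                  # top edge of the bounding rectangle
    width: int
    height: int
    colour: str = "unknown"


@dataclass
class TrainingExample:
    """One supervised-learning example: an image (input) and its annotations (outputs)."""
    image_file: str
    image_width: int
    image_height: int
    annotations: list = field(default_factory=list)

    def is_valid(self):
        """Return True if every bounding rectangle lies fully inside the image."""
        return all(
            a.width > 0 and a.height > 0
            and 0 <= a.x and a.x + a.width <= self.image_width
            and 0 <= a.y and a.y + a.height <= self.image_height
            for a in self.annotations
        )


# Hypothetical example: one camera image of the bin with two annotated parts.
example = TrainingExample(
    image_file="bin_0001.png",
    image_width=640,
    image_height=480,
    annotations=[
        Annotation("screw", x=120, y=88, width=40, height=16, colour="silver"),
        Annotation("can", x=300, y=210, width=90, height=150, colour="red"),
    ],
)
```

A simple check such as `example.is_valid()` can catch annotation mistakes, for instance a rectangle drawn partly outside the image, before the example is used for training.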

However, training a machine learning algorithm to detect objects in an image is only the beginning. The complete system – robot, camera and machine learning algorithm – needs to be integrated. This means that the task must be attempted for multiple cycles to validate that the robot handles it robustly without damaging the parts or colliding with the sides of the bin.
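As a rough sketch of what validating the task over multiple cycles could look like in software, the harness below runs a pick attempt repeatedly and summarises the outcomes. The `attempt_pick` callable, the `CycleResult` fields and the probabilities in the simulated stand-in are assumptions made for illustration; a real system would call the robot controller or a simulator instead.

```python
import random
from dataclasses import dataclass


@dataclass
class CycleResult:
    grasped: bool    # the part was picked successfully
    collided: bool   # the robot touched the bin wall or damaged a part


def run_validation(attempt_pick, cycles):
    """Run the pick task `cycles` times and summarise the outcomes."""
    results = [attempt_pick() for _ in range(cycles)]
    successes = sum(1 for r in results if r.grasped and not r.collided)
    collisions = sum(1 for r in results if r.collided)
    return {
        "cycles": cycles,
        "success_rate": successes / cycles,
        "collisions": collisions,
    }


def make_simulated_pick(seed, grasp_probability=0.9, collision_probability=0.02):
    """Hypothetical stand-in for a real or simulated robot cell."""
    rng = random.Random(seed)

    def attempt_pick():
        return CycleResult(
            grasped=rng.random() < grasp_probability,
            collided=rng.random() < collision_probability,
        )

    return attempt_pick


report = run_validation(make_simulated_pick(seed=1), cycles=200)
```

The report makes it easier to decide whether the system is robust enough or whether a change, such as a different gripper, is needed before more cycles are run.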

Often an equipment modification is required, which incurs additional costs and delays. For example, a conclusion that a gripper must be replaced or customised might only be reached after all the other pieces of equipment are assembled and integrated.

Such a process can take weeks or even months to converge on a robust automation solution, depending on the task and system complexity. Moreover, some of the system components might not yet be available, or might already be in use in production, which limits or even prevents your access to them for training and integration.

The role of digital twins

A digital twin is a virtual representation of a physical product or process, used to understand and predict the physical counterpart’s performance characteristics. Digital twins are used throughout the product life cycle to simulate, predict and optimise the product and production system before investing in physical prototypes and assets.

By utilising the digital twin of the production system and the product, it is now possible to significantly shorten the time taken to set up and validate a robotic system with integrated vision and machine learning. Thus, you can achieve robust and reliable results faster and at much lower costs.

In a virtual environment, the real robot, parts and camera are replaced with virtual ones. Rather than spending time and resources on setting up the equipment, capturing many images and annotating them manually, you can now carry out these steps easily and automatically within the virtual environment.
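The reason annotation becomes automatic in a virtual environment is that the simulation already knows where it placed every part. The toy sketch below illustrates this under strong simplifications: the ‘bin’ is a small 2D grid, the parts are axis-aligned rectangles that may overlap, and the rendered ‘image’ is a grid of 0s and 1s. All names and sizes are hypothetical, and a real digital twin would render realistic 3D scenes instead.

```python
import random
from dataclasses import dataclass


@dataclass
class BoundingBox:
    x: int
    y: int
    width: int
    height: int


def generate_scene(bin_width, bin_height, num_parts, rng):
    """Place rectangular parts at random and return (image, ground-truth boxes)."""
    image = [[0] * bin_width for _ in range(bin_height)]
    boxes = []
    for _ in range(num_parts):
        width = rng.randint(2, 5)
        height = rng.randint(2, 5)
        x = rng.randint(0, bin_width - width)
        y = rng.randint(0, bin_height - height)
        # 'Render' the part into the virtual camera image.
        for row in range(y, y + height):
            for col in range(x, x + width):
                image[row][col] = 1
        # The annotation comes for free: the simulation knows the placement.
        boxes.append(BoundingBox(x, y, width, height))
    return image, boxes


def generate_dataset(num_scenes, seed=0, bin_width=32, bin_height=24):
    """Generate many annotated scenes without any manual labelling."""
    rng = random.Random(seed)
    return [
        generate_scene(bin_width, bin_height, rng.randint(1, 4), rng)
        for _ in range(num_scenes)
    ]


dataset = generate_dataset(num_scenes=100)
```

Each entry in `dataset` is an image together with its bounding rectangles, which is the same input and output pairing that would otherwise have to be produced by hand.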

The next step is to switch from virtual to physical – the real equipment is set up and integrated. The machine learning algorithm might require some additional training with images captured from the real camera.

However, since the machine learning algorithm has already been pre-trained in the digital twin, it will require significantly fewer real sample images to achieve an accurate and robust result. This reduces physical commissioning time, resources and rework.
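The effect of pre-training can be illustrated with a deliberately tiny stand-in for the real problem. In the sketch below, a two-parameter linear model takes the place of the neural network, a ‘simulated’ relationship takes the place of the digital twin, and a slightly different ‘real’ relationship represents the gap between simulation and reality. Every number here is an assumption chosen for illustration, not a result from an actual robotic system.

```python
def predict(params, x):
    w, b = params
    return w * x + b


def mean_squared_error(params, samples):
    return sum((predict(params, x) - y) ** 2 for x, y in samples) / len(samples)


def train(params, samples, steps, learning_rate=0.1):
    """Full-batch gradient descent on mean squared error; returns the new (w, b)."""
    w, b = params
    n = len(samples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return (w, b)


def simulated(x):
    """Hypothetical behaviour of the virtual environment."""
    return 2.0 * x + 1.0


def real(x):
    """Hypothetical behaviour of the physical system, close to the simulation."""
    return 2.1 * x + 0.8


# Plentiful synthetic data, but only five samples from the real equipment.
synthetic_samples = [(i / 99, simulated(i / 99)) for i in range(100)]
real_samples = [(x, real(x)) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
real_test_set = [(i / 100, real(i / 100)) for i in range(101)]

# Pre-train in 'simulation', then fine-tune briefly on the few real samples.
pretrained = train((0.0, 0.0), synthetic_samples, steps=2000)
fine_tuned = train(pretrained, real_samples, steps=20)

# Baseline: the same brief training on real samples, without pre-training.
from_scratch = train((0.0, 0.0), real_samples, steps=20)

error_fine_tuned = mean_squared_error(fine_tuned, real_test_set)
error_from_scratch = mean_squared_error(from_scratch, real_test_set)
```

In this toy setting, the fine-tuned model ends up with a lower error on the real test set than the model trained from scratch with the same small budget, because it starts close to the right answer. How much real data a production system saves depends on how closely the digital twin matches the physical equipment.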

By Zachi Mann, Albert Harounian and Shahar Zuler

Zachi Mann is an intrapreneur at Siemens Digital Industries Software, Albert Harounian is a deep learning researcher, data scientist and software engineer, and Shahar Zuler is a machine learning and software engineer at Siemens Digital Industries Software.