How can the correlation between NLP and generative AI text models like ChatGPT be leveraged in robotics, and what benefits can be derived from these advancements?
DALL-E generated "A 3D render of a mechanical robot standing over a view of a building under construction."
Fascinated by OpenAI's ChatGPT, I have recently been exploring its potential applications in robotics. For some time now I have felt that the slow progress of robotics can be attributed to the lack of critical enabling technology. With advanced technologies like ChatGPT now available, it is worth exploring how they can be leveraged to accelerate progress in this area. Fortunately, I see a strong connection between these technologies, and the possibilities for advancements in robotics are wide open.
ChatGPT is currently a super-buzzword, and most of us understand it as a generative text model that has been trained and refined using human input. It offers an incredible ability to interact through dialogue and to synthesize code from a single prompt. These models excel in various applications, including text generation, machine translation, and code analysis. With these capabilities in mind, I have been on a mission to explore how this technology can be put to use in robotics systems.
However, we know robotics applications involve real-world physics and require understanding their environment and the ability to take physical actions. Unlike text-only commands, robots need an additional interaction layer with users to comprehend and execute commands. A generative robotics model, especially in the construction environment, requires making sense of the complicated physical world while possessing a high level of common sense knowledge. This allows us to explore the correlation between robotics and generative models further.
In their book "Rebooting AI: Building Artificial Intelligence We Can Trust," Gary Marcus and Ernest Davis call for "building machines that possess common sense, cognitive models, and powerful tools for reasoning," a statement I found noteworthy and convincing. They argue that these features can facilitate deep understanding, a crucial prerequisite for creating machines that can reliably anticipate and evaluate the consequences of their actions. This presents an opportunity to move beyond language model text and convert it into blueprints for physical activities.
I have discovered that ChatGPT can be applied to robots of various physical forms, can engage in closed-loop reasoning through dialogue, and can solve a wide range of single-command tasks in robotics. Robotics has also been around for quite some time, so several open-source libraries are available for fundamental functionality in the perception and action domains, including object detection and segmentation, mapping, motion planning, controls, and grasping.
For robot reasoning and execution, the LLM (Large Language Model) can call pre-defined routines as long as it is given an appropriate prompt. It is crucial to name each application programming interface (API) function, such as those exposed by BIM authoring software like Revit and many other tools with scripting capabilities, in a way that precisely reflects its purpose and operation. Clear and concise terminology is essential for the LLM to work out how the API functions relate to one another and to produce the expected outcome.
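To make the naming point concrete, here is a minimal sketch of how a prompt might be assembled from a small set of descriptively named routines. The functions `get_position`, `move_to`, and `grasp` are hypothetical placeholders invented for illustration; they do not belong to any real robot or BIM library. No model is called here, the sketch only builds the text that would be sent to one.

```python
import inspect


# Hypothetical robot routines: names and docstrings are illustrative only.
def get_position(object_name: str) -> tuple:
    """Return the (x, y, z) position of the named object in metres."""
    raise NotImplementedError("placeholder for a real perception routine")


def move_to(x: float, y: float, z: float) -> None:
    """Move the robot's end effector to the given coordinates in metres."""
    raise NotImplementedError("placeholder for a real motion routine")


def grasp(object_name: str) -> None:
    """Close the gripper around the named object."""
    raise NotImplementedError("placeholder for a real grasping routine")


def build_prompt(task: str, functions: list) -> str:
    """List each allowed function with its signature and docstring,
    then append the task description written in plain language."""
    lines = ["You control a robot. Use only these Python functions:"]
    for fn in functions:
        signature = inspect.signature(fn)
        summary = inspect.getdoc(fn)
        lines.append(f"- {fn.__name__}{signature}: {summary}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


prompt = build_prompt("Pick up the brick", [get_position, move_to, grasp])
print(prompt)
```

Because the function names and docstrings are the only description the model sees, a vague name such as `do_step` would give it far less to work with than `move_to`.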
Image from "Microsoft's Research - ChatGPT for Robotics: Design Principles and Model Abilities"
The latest development by Microsoft's Autonomous Systems and Robotics Group researchers - released recently and reaffirming my assumptions - has demonstrated the potential of OpenAI's ChatGPT for robotics applications by illustrating how to create prompts and instruct ChatGPT to use specific robotic libraries to program tasks. Modern robotics, according to Microsoft's experts, relies on a closed-loop system in which the engineer codes the job, monitors the robot's behavior, and modifies its programming as needed.
In Microsoft's view, a language model such as ChatGPT has the potential to translate a human-language description of a task into code that robots can understand. This could enable a non-technical user to take the engineer's place in the loop. That user would only be responsible for providing the original task description in human language, observing the robot, and giving feedback on its behavior in human language. ChatGPT would then convert this feedback into code to improve the robot's behavior.
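The loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Microsoft's implementation: `generate_code` stands in for a call to a language model, `get_feedback` stands in for a person watching the robot, and `fake_model` and `make_fake_user` are hypothetical stubs so the example runs without any external service.

```python
from typing import Callable, List, Optional


def feedback_loop(task: str,
                  generate_code: Callable[[List[str]], str],
                  get_feedback: Callable[[str], Optional[str]],
                  max_rounds: int = 5) -> str:
    """Regenerate robot code until the user has no further feedback
    or the round limit is reached, then return the latest code."""
    messages = [task]                 # the conversation so far
    code = generate_code(messages)    # first attempt from the task alone
    for _ in range(max_rounds):
        feedback = get_feedback(code)
        if feedback is None:          # the user is satisfied
            break
        messages.append(feedback)     # plain-language correction
        code = generate_code(messages)
    return code


# Hypothetical stand-in for a language model call.
def fake_model(messages: List[str]) -> str:
    return f"# code revision {len(messages)} for: {messages[0]}"


# Hypothetical stand-in for a user: replays scripted comments, then approves.
def make_fake_user(comments: List[str]) -> Callable[[str], Optional[str]]:
    remaining = list(comments)

    def user(code: str) -> Optional[str]:
        return remaining.pop(0) if remaining else None

    return user


final_code = feedback_loop("Inspect the shelf", fake_model,
                           make_fake_user(["Fly lower"]))
print(final_code)
```

In a real setup the generated code would be run in simulation or reviewed before it reaches a physical robot, since the user's feedback is the only safety check in this sketch.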
Using an experimental approach, Microsoft's researchers created a range of use cases, including "zero-shot job planning" (a general command that guides a drone in examining a shelf's contents), manipulation with a robotic arm, and object identification and distance searches based on various APIs.
A recent post by Dhanshree Shripad Shenwai summarizes Microsoft's method for using ChatGPT in practical robotics applications. It describes a strategy that consists of three key components: designing prompts to direct ChatGPT, operating existing APIs, and providing human feedback via text. The author notes these components as the necessary backbone of the strategy.
In conclusion, the emergence of OpenAI's ChatGPT has opened up new possibilities for robotics. By leveraging the power of natural language processing and generative AI text models, we can potentially make significant advancements in robotics, including developing better prompts, utilizing existing APIs, and incorporating human feedback via text. With the potential to convert human language descriptions into robot-readable code, even non-technical users - such as construction workers, designers, or engineers - could take part in instructing robots.
While there is still a ton of work to be done, the correlation between NLP, LLMs, BIM, digital twins, and generative AI text models such as ChatGPT with robotics is a promising area for future exploration and progress in the field of automation and robotics. Perhaps a different model than ChatGPT could be used for this, but since it is such a developed and capable text generation model right now and will only get more advanced, there might be nothing better. For my next, more "realistic" investigation, I will analyze how Digital Twins can be critical to successfully executing automation in both the design and construction industries.