ChatGPT can now see, hear, and speak

OpenAI is introducing new voice and image capabilities in ChatGPT, letting users hold voice conversations with the assistant and show it images. These features offer more intuitive ways to interact with ChatGPT in everyday scenarios, such as discussing a landmark, planning a meal, or working through a math problem.

To start a voice conversation, users opt in through the mobile app's settings and choose from five voices. The voice capability is powered by a new text-to-speech model, with voices created in collaboration with professional voice actors. Users can also show ChatGPT one or more images and use the drawing tool to focus its attention on specific parts of an image; image understanding is powered by multimodal GPT models.

OpenAI is deploying these capabilities gradually so it can refine risk mitigations and ensure safety. Voice technology has promising creative and accessibility applications but also carries risks, so its use is limited specifically to voice chat. The vision-based models are designed to assist users in their daily lives, and their development was informed by collaboration with organizations such as Be My Eyes. OpenAI is transparent about the models' limitations and advises against certain higher-risk use cases. Plus and Enterprise users will get access to voice and image capabilities first, with plans to expand access to other user groups over time.
