Voice support

The Robotics Edge platform supports voice processing for speech-based interaction in robotic applications. It uses imx-voice-plugins running on GStreamer for audio processing, and ROS to communicate with other components.

Overview

Voice support on this platform includes:

Speech-to-text transcription using imxasr from imx-voice-plugins (released separately).
Integration with ROS through the open source ros-gst-bridge component, which is preinstalled on the platform.

This setup allows text decoded from voice input to be shared with other ROS nodes for further processing or decision-making.

Key components

imx-voice-plugins: NXP proprietary GStreamer plugins for audio and voice processing, including speech transcription.
ros-gst-bridge: Open source component, preinstalled on the Robotics Edge platform, that connects GStreamer pipelines to ROS nodes.

Access to Voice plugins

The imx-voice-plugins are released separately from the Robotics Edge platform. To obtain access to the evaluation package, contact voice@nxp.com.

Installation guides and examples are provided within the evaluation package.

First simple example

This example requires an i.MX 95 EVK or i.MX 8M Plus EVK and a microphone connected to the audio jack.

Download and install imx-voice-plugins (more detailed information is available in the README file of the package):

```
unzip imx-voice-plugins.zip
cp gst-plugin/libgstimx* /usr/lib/gstreamer-1.0/
cp models/moonshine/moonshine-base*.onnx /root/
```
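After copying the plugin libraries, it can be useful to confirm that GStreamer actually sees the new elements before building a pipeline. The following is a minimal sketch, not part of the evaluation package: `check_gst_elements` is a hypothetical helper, and `GST_INSPECT` is an illustrative override variable that defaults to the standard `gst-inspect-1.0` tool. The element names passed to it are the ones used by the pipelines on this page.

```shell
# Hypothetical helper: report whether each named GStreamer element is registered.
# GST_INSPECT is an illustrative override; it defaults to gst-inspect-1.0,
# which exits with a non-zero status when the element is unknown.
check_gst_elements() {
    inspect="${GST_INSPECT:-gst-inspect-1.0}"
    missing=0
    for element in "$@"; do
        if "$inspect" "$element" >/dev/null 2>&1; then
            echo "found: $element"
        else
            echo "missing: $element"
            missing=1
        fi
    done
    # Non-zero when at least one element could not be found
    return "$missing"
}
```

A typical call would be `check_gst_elements imxasr rostextsink`. If an element is reported as missing, the README file of the package remains the reference for the installation steps.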

Initialize the ROS environment:
```
. /opt/ros/jazzy/setup.sh
```

Start the basic GStreamer speech-to-text pipeline.

For i.MX 95 EVK:

```
gst-launch-1.0 -q --no-position alsasrc device=hw:wm8962audio,0 ! audioconvert ! queue ! imxasr silent=true onnx-nb-threads=5 ! rostextsink &
```

For i.MX 8M Plus EVK:

```
gst-launch-1.0 -q --no-position alsasrc device=hw:wm8960audio,0 ! audioconvert ! queue ! imxasr silent=true onnx-nb-threads=3 ! rostextsink &
```

This command can be easily adapted to use any audio input source. For example, use alsasrc device=hw:micfilaudio,0 to use the onboard digital microphones on i.MX 95 EVK.
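Since only the `alsasrc` device and the thread count differ between the commands above, the pipeline can be wrapped in a small function. This is a sketch under stated assumptions: `run_asr_pipeline` and the `GST_LAUNCH` override variable are hypothetical names introduced here for illustration, and the default of 3 threads simply reuses the i.MX 8M Plus value from this page.

```shell
# Hypothetical wrapper around the speech-to-text pipeline shown on this page.
#   $1: ALSA capture device, for example hw:wm8962audio,0
#   $2: value for onnx-nb-threads (defaults to 3, the i.MX 8M Plus value above)
# GST_LAUNCH is an illustrative override; setting it to "echo" prints the
# pipeline arguments instead of starting the pipeline.
run_asr_pipeline() {
    device="$1"
    threads="${2:-3}"
    launcher="${GST_LAUNCH:-gst-launch-1.0}"
    "$launcher" -q --no-position alsasrc device="$device" ! audioconvert ! queue ! \
        imxasr silent=true onnx-nb-threads="$threads" ! rostextsink
}
```

For example, `run_asr_pipeline hw:micfilaudio,0 5 &` would start the pipeline on the onboard digital microphones of the i.MX 95 EVK. The capture cards known to ALSA can be listed with `arecord -l`.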

Use the ros2 utility to monitor the gst_text_pub topic:
```
ros2 topic echo "gst_text_pub"
```
Instead of using the generic ros2 utility, you can subscribe to this topic from your application to be notified of the decoded speech.
Wait a few seconds for the speech-to-text models to load. Then, when you speak into the microphone, the ros2 utility displays the decoded text.
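As a lightweight step between the ros2 utility and a full subscriber node, the echoed output can be filtered in a script. The sketch below assumes that the topic carries `std_msgs/msg/String` messages, which `ros2 topic echo` prints as a `data: <text>` line followed by `---`; this message type is an assumption and can be checked with `ros2 topic info gst_text_pub`. `extract_transcripts` is a hypothetical helper name.

```shell
# Hypothetical filter: read `ros2 topic echo` output on stdin and print only
# the decoded text, one transcript per line.
# Assumes std_msgs/msg/String messages echoed as "data: <text>" then "---".
extract_transcripts() {
    while IFS= read -r line; do
        case "$line" in
            "data: "*)
                text="${line#data: }"
                # The YAML output may wrap some strings in single quotes;
                # strip one surrounding pair (escaped quotes are not handled).
                case "$text" in
                    \'*\')
                        text="${text#\'}"
                        text="${text%\'}"
                        ;;
                esac
                printf '%s\n' "$text"
                ;;
        esac
    done
}
```

It would be used as `ros2 topic echo gst_text_pub | extract_transcripts`. For anything beyond quick scripting, a subscriber written with a ROS client library such as rclpy or rclcpp is the more usual choice.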

The transcript of the speech captured by the microphone is now available to ROS nodes, as shown in the figure below:

[Figure: gst]