VocalNet is an open-source speech interaction model.