There we are: six hours of recordings of kids talking in German, not always in a linear fashion, and the GDPR to consider. So we thought about running a local system to transcribe the voices of the eight kids and the moderator.
There are several options that we tried to use, specifically the following:
Getting them up and running (in Python) was not a problem, but the results were rather “meh”.
We also tried f4x, but speaker identification was kind of an issue there, too.
Ultimately, we ended up using amberscript: they have good references, and our test sample seemed promising. The texts are split up by speaker (although the speakers are not always identified correctly).
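Since the speaker labels are not always right, some manual clean-up is needed afterwards. Here is a rough sketch of how that could look in Python. Everything in it is an assumption for illustration: the `Segment` structure, the `relabel` and `merge_consecutive` helpers, and the sample sentences are all made up, and this is not amberscript's actual export format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # label assigned by the transcription tool (hypothetical)
    start: float  # start time in seconds
    text: str


def relabel(segments, corrections):
    """Return new segments, overriding the speaker for the given segment indices."""
    return [
        Segment(corrections.get(i, seg.speaker), seg.start, seg.text)
        for i, seg in enumerate(segments)
    ]


def merge_consecutive(segments):
    """Join neighbouring segments that share the same speaker label."""
    merged = []
    for seg in segments:
        if merged and merged[-1].speaker == seg.speaker:
            last = merged[-1]
            merged[-1] = Segment(last.speaker, last.start, f"{last.text} {seg.text}")
        else:
            merged.append(seg)
    return merged


# Made-up sample: the third segment was attributed to the wrong speaker.
segments = [
    Segment("Speaker 1", 0.0, "Hallo zusammen."),
    Segment("Speaker 2", 2.5, "Hallo!"),
    Segment("Speaker 3", 3.1, "Ich bin Mia."),
    Segment("Speaker 1", 5.0, "Schön."),
]
corrections = {2: "Speaker 2"}  # segment index -> corrected speaker label

fixed = merge_consecutive(relabel(segments, corrections))
for seg in fixed:
    print(f"[{seg.start:.1f}s] {seg.speaker}: {seg.text}")
```

Keeping the corrections in a separate dictionary, instead of editing the transcript directly, means the original export stays untouched and the fixes can be re-applied if the transcript is regenerated.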