It's easier to browse and understand audio recordings when one can simultaneously follow as text what is being said, what was said before and what will be said after a moment. Related to this there is the WebVTT (Web Video Text Tracks Format) standard, which, despite its name, also works with audio files and can be characterised as the representation of the text used in synchronisation with other media such as audio or video.
Making a VTT file from an audio recording is a form of phonetic transcription, or more precisely, transcribing. There are several different online applications for this purpose, but not all of them recognise the Finnish language. They vary widely in pricing, some offer some free transcribing time per month and are very likely to produce different quality.
One could use Google's Speech-to-Text API, which can be accessed directly from the Google Cloud console using a graphical user interface. Basically, an audio file is given as input, a few choices affecting quality is made and then a short moment is waited through. After that download option becames available which can be used to retrieve a SRT file containing the transcribed audio. SRT files are almost identical to the WebVTT files, both being human- and machine-readable text files. Howevery a SRT file needs to be converted to a VTT file, before it can be used. This conversion requires a separate application, which can be a Windows application or, alternatively, one can use any of the many conversion services available on the web (try searching with "convert srt to vtt").
To the publishing application, user does not need to provide any other input other than the url to audio file and url to the vtt file. These are used to generate an audio player, which displays an interactive transcription below it. Here interactivity means that when clicking part of transcribe text, player changes the position where from audio continues to play. As a listener progress through the audio, part of the transcribe text indicates which part of the audio user is currently listening to.
The VTT file must be located in a place that allows distribution of such files either everywhere or to the server used by the publishing application. It is recommended that both the audio file and the vtt file are placed on the CDN Storage provided by the CDN service you are already require to use, as the relevant Cross-Origin Resource Sharing (CORS) settings are easily found in its configuration.
Enabling the transcribe feature requires turning on the experimental functions in the user-spesific settings. Created audio-like particulars remain intact even after the experimental functions are turned off meaning that they will also e.g. get put to backups. In this case, turning on/off the experimental functions will practically just show/hide some interface elements.