Amazon AWS Polly Neural Speech on RPi4 - Scargill's Tech Blog

Regular readers will know that for years I was a great fan of using Ivona speech on my Raspberry Pi. I used the free Ivona service to provide high-quality speech in Node-Red, at the heart of my home control setup. Well, Ivona, good as it was, has been defunct for some time now.

Amazon Polly is great, especially with the “neural” enhancement, and I’m now using it on RPi4 in the UK and Spain. This post was originally written for Ivona in 2017. It was then completely overhauled for the (then) new NEURAL option in POLLY and to (optionally) include the voice in the input.

Update April 2024: the neural option MUST be accompanied by a region option. (I don’t live in USA WEST-anything, but that is one of a few regions you can put in for it to work.) The Amazon Polly system is effectively a replacement for Ivona. The short, sharp answer is: Polly works, it is effectively free (<5 million characters a month, or 1 million characters with the new “neural” option), and it is actually better than Ivona.

Read on, as my simple Node-Red code caches the audio so that previously-used text does not require repeat calls to Amazon’s servers. The code also buffers separate messages so they don’t overrun each other. In case you’re wondering, I could not see a decent AWS POLLY logo/icon on their website, so I took this image from a free-use, no-attribution site. So, the Amazon system “Polly” works via an AWS account.

I have a free Amazon “developer” account, and when I tried to add Polly it said I didn’t have the right permissions. So I added user pete to my account and made him part of the Polly group, and that didn’t work either. Then I noticed something about payment and realised I’d not put any payment details in. I did that, and all of a sudden the thing came to life. This has been running for 5-6 years now and I think they’ve charged me maybe £1 sterling in all that time. The way I use Polly seems appropriate: download a phrase as a file (MP3) from your text input and save it with a meaningful file name. Next time you want that phrase, check whether the file already exists: if so, play it; if not, get a new file from Amazon.
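The check-then-fetch idea above takes only a few lines of JavaScript, the language of Node-Red function nodes. This is a minimal sketch, assuming Node.js, MP3s cached in /home/pi/audio, and the same text-to-filename rules as the process text function in the flow export further down; cacheFileFor and decide are hypothetical names, not part of the flow.

```javascript
// Minimal sketch of "check the cache, otherwise ask Polly".
// Assumptions: MP3s are cached in /home/pi/audio and file names follow the
// same rules as the flow's "process text" function. cacheFileFor and decide
// are hypothetical names, not part of the flow.
const fs = require("fs");

const AUDIO_DIR = "/home/pi/audio";

// Turn a phrase into a predictable file name, e.g. "Red alert!" -> red_alert.mp3
function cacheFileFor(text, dir = AUDIO_DIR) {
  const name = text
    .replace(/'/g, "")                            // drop apostrophes
    .toLowerCase()
    .replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "")   // strip the same punctuation as the flow
    .replace(/ /g, "_");                          // spaces become underscores
  return `${dir}/${name}.mp3`;
}

// Play the cached file if it exists, otherwise ask for a new one.
// "exists" defaults to fs.existsSync but can be swapped out for testing.
function decide(text, exists = fs.existsSync) {
  const file = cacheFileFor(text);
  return exists(file) ? { action: "play", file } : { action: "synthesize", file };
}
```

For example, decide("Hello my name is peter.") looks for /home/pi/audio/hello_my_name_is_peter.mp3 and only asks for synthesis if that file is missing, which is why a repeated phrase costs nothing.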

In a typical use case of mine, once a message has been used, it is cached in its own file, so there is NO chance of incurring significant charges. There are, no doubt, more elegant ways to do this than calling a command line from Node-Red, but this method works perfectly and, as far as I know, this approach is unique for Node-Red and Polly. I’m still working on getting this running on Node-Red in the latest PI OS (Bookworm 64 bit) in a Docker container, as user ROOT. More soon.

I’ll assume you have your AWS credentials (simple, and free); don’t worry about the location not being where you actually are. Use the command-line code below. I’ve used this on RPi2-4 without any issues and am currently using it on PI OS Bullseye 32 bit.

As user pi, I created a folder called /home/pi/audio to store the files. Then, another update: I noticed this was failing, so I updated my Python and re-installed. You should now have Python 3.8; it may work with later versions, but I didn’t try that, and installation will fail below Python v3.8. Install the AWS command line tools with:

sudo pip install awscli

I set the region to us-west-1 in the original setup, for no other reason than initially not knowing any better. No matter, as the region is set explicitly on the command line below (eu-west-2, in both the test command and the flow), which absolutely works with the NEURAL option. You might expect to enter MP3 as the output format, but no: aws configure wants a CLI output format such as json or text, so I picked json for no particular reason. The --output-format mp3 in the Polly command below is a separate Polly setting, so the json choice doesn’t matter. Once the AWS CLI was installed, I used aws configure to enter the access key ID and secret key (both of which I’d already set up on the Amazon site; for once, easy), the region and the output format.

pi@ukpi:~ $ aws configure
AWS Access Key ID [*********]:
AWS Secret Access Key [**********]:
Default region name [us-west-1]:
Default output format [json]:

That done, I tried this at the command line:

aws polly synthesize-speech --output-format mp3 --voice-id Amy --engine "neural" --region eu-west-2 --text "Hello my name is peter." /home/pi/audio/peter.mp3

The result was an MP3 file, peter.mp3, sitting in my /home/pi/audio folder and containing the phrase in the voice Amy (British female). Next step:

mpg123 /home/pi/audio/peter.mp3

Sorted. That’s good for testing, but as you’ll see, the final solution is much better. I have this the wrong way around, of course: to keep life simple, you should first make sure your MP3 player works with some standard .MP3 file before testing Polly. The Node-Red sub-flow below is about queuing messages, storing them with meaningful names, playing them back and making sure you don’t re-record a phrase you have already recorded.
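The flow further down builds this same aws command as one string for an exec node, stripping single quotes so the shell doesn’t choke; a phrase containing double quotes would still break it. If you ever call Polly from your own Node.js code instead, passing the arguments as an array to child_process.execFile avoids the shell altogether. This is a minimal sketch, assuming Node.js with the aws CLI and mpg123 on the PATH; buildPollyArgs and speak are hypothetical helper names, and the defaults simply mirror the command above.

```javascript
// Minimal sketch, assuming the aws CLI and mpg123 are installed and on the PATH.
// Arguments go to execFile as an array, so no shell is involved and quotes
// inside the text cannot break the command line.
const { execFile } = require("child_process");
const { promisify } = require("util");
const run = promisify(execFile);

// Same options as the polly synthesize-speech command above.
function buildPollyArgs(text, outFile, voice = "Amy", region = "eu-west-2") {
  return [
    "polly", "synthesize-speech",
    "--engine", "neural",
    "--region", region,
    "--output-format", "mp3",
    "--voice-id", voice,
    "--text", text,
    outFile,                  // positional argument: where the MP3 is written
  ];
}

// Create the MP3 with Polly, then play it with mpg123.
async function speak(text, outFile, voice) {
  await run("aws", buildPollyArgs(text, outFile, voice));
  await run("mpg123", [outFile]);
}
```

Usage would be along the lines of speak("Hello my name is peter.", "/home/pi/audio/peter.mp3").catch(console.error); an error from either program rejects the promise rather than failing silently.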

If you don’t like the default Amy, you can choose another voice (Brian, for example); in the flow export below, the voice is set by the line var voice = "Amy"; in the process text function. If you want to add sound effects, just put .MP3 files in the audio folder and call them by name. I have files like red_alert.mp3 and similar, using Star Trek recordings; far better to have the original than a modern voice wailing “red alert”! The flow turns spaces into underscores when it builds file names, so injecting red alert plays red_alert.mp3. The first function looks to see if the payload has something in it and, if so, pushes it onto the end of a queue.
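The flow export below hardcodes the voice in process text, although the sample injects carry msg.topic set to amy. If you wanted the topic to choose the voice, one hypothetical approach is sketched here: pickVoice and voicedFileName are illustrative names, and the "voice-phrase.mp3" naming scheme is an assumption, used so that the same phrase in two voices is cached as two separate files.

```javascript
// Hypothetical sketch: choose the Polly voice from msg.topic (e.g. "brian"),
// falling back to Amy. Not part of the flow in this post.
function pickVoice(topic, fallback = "Amy") {
  if (typeof topic !== "string" || topic.trim() === "") return fallback;
  const t = topic.trim().toLowerCase();
  return t.charAt(0).toUpperCase() + t.slice(1);   // "brian" -> "Brian"
}

// Assumed naming scheme: put the voice in front of the phrase so that
// "red alert" by Brian and by Amy are cached as different files.
// Hyphens are stripped from the phrase, so "-" is a safe separator.
function voicedFileName(text, voice, dir = "/home/pi/audio") {
  const base = text
    .replace(/'/g, "")
    .toLowerCase()
    .replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g, "")
    .replace(/ /g, "_");
  return `${dir}/${voice.toLowerCase()}-${base}.mp3`;
}
```

Two knock-on effects to bear in mind: the flow’s queue stores only msg.payload, so the voice would have to travel with each queued item (for example as an object holding text and voice), and sound-effect files would need the same prefix or be checked for under their plain name first.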

The code then checks the front of the queue. If the item is an mp3 file, it is sent to the MP3 player, unless speech is already busy, in which case it waits. If it is not an mp3, the code looks to see whether you’ve already created an mp3 for that speech: if so, it plays that file; otherwise it passes the message on to Amazon to create the file, which is then played back after a tiny delay. It would have been nice to process new speech while playing something else back, but that would get more complicated, involving more flags.
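Stripped of Node-Red specifics, the queue and the two busy flags behave roughly like the class below. It is a minimal sketch for illustration, not the flow’s code: SpeechQueue, fileFor, exists, donePlaying and doneCreating are hypothetical names, and the two flags stand in for the flow’s speech_busy and create_speech_busy globals. next() returns what should happen now: play a file, synthesize a phrase, or wait (null).

```javascript
// Sketch of the queue plus busy-flag logic. fileFor maps text to a cache path,
// exists checks whether that file is on disk; both are injected so the class
// can be tested without a Pi.
class SpeechQueue {
  constructor(fileFor, exists) {
    this.fileFor = fileFor;
    this.exists = exists;
    this.queue = [];        // first in, first out
    this.playing = false;   // like the flow's speech_busy
    this.creating = false;  // like the flow's create_speech_busy
  }

  add(item) {
    if (item !== "") this.queue.push(item);   // empty payload just means "try again"
    return this.next();
  }

  next() {
    if (this.queue.length === 0) return null;
    const item = this.queue[0];               // peek: only remove it once it can play
    const isMp3 = item.includes(".mp3");
    const file = isMp3 ? item : this.fileFor(item);
    if (isMp3 || this.exists(file)) {
      if (this.playing) return null;          // wait for the player to finish
      this.queue.shift();
      this.playing = true;
      return { action: "play", file };
    }
    if (this.creating) return null;           // one Polly request at a time
    this.creating = true;                     // item stays queued until its file exists
    return { action: "synthesize", text: item, file };
  }

  donePlaying() { this.playing = false; return this.next(); }
  doneCreating() { this.creating = false; return this.next(); }
}
```

In Node-Red terms, a "play" result would go to the mpg123 exec node and a "synthesize" result to the aws exec node, with donePlaying() and doneCreating() taking the part of the Clr playing and Clr creating functions. Like the flow, this sketch simply asks for the same phrase again if a Polly request fails and no file appears, so a real version might want a retry limit.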

As it stands, this is easy to understand. You can fire in more speech or .MP3 files while one is playing and they will simply be queued. You clearly need your free Amazon account set up and Node-Red for this; you also need the MPG123 player.

Both Node-Red and MPG123 are in my standard “The Script”, if that helps. Note that the RPi5 has no 3.5mm jack output, so we’ll have to look at a USB dongle (ordered). Calling aws and mpg123 from a Docker container will also need some thought.

Here is how the exec nodes are set up. The MPG123 exec node simply has mpg123 for the command (sudo mpg123 in the export below) with append payload ticked. The AWS exec node has aws for the command, again with append payload ticked.
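The process text function in the flow export below calls global.get("fs"), which only works if Node.js’s fs module has been made available to function nodes. If your setup doesn’t already do that, one way is the functionGlobalContext setting in Node-Red’s settings.js, sketched below as a fragment. The file’s location (often ~/.node-red/settings.js) varies between installs, and Node-Red needs restarting after the change.

```javascript
// Fragment of Node-Red's settings.js: only the relevant entry is shown,
// keep the rest of your existing settings as they are.
module.exports = {
    // ...existing settings...
    functionGlobalContext: {
        fs: require("fs"),   // lets function nodes use global.get("fs")
    },
};
```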

Here is the code for the three yellow function nodes below. I put the code in a Node-Red sub-flow (for ease of use) that can be used by simply injecting some text into the incoming payload. Here, however, I’ll just show it in a regular flow, along with a use-once-at-powerup inject to reset a couple of global variables.

[ { "id": "c75889b6.c114a8", "type": "subflow", "name": "Volume 0-100", "info": "", "in": [ { "x": 64, "y": 98, "wires": [ { "id": "228f7ec0.797752" } ] } ], "out": [] }, { "id": "bf061a13.249d78", "type": "exec", "z": "c75889b6.c114a8", "command": "amixer cset numid=1 -- ", "addpay": true, "append": "", "useSpawn": "", "name": "Volume", "x": 380, "y": 100, "wires": [ [], [], [] ] }, { "id": "228f7ec0.797752", "type": "function", "z": "c75889b6.c114a8", "name": "Non-linear", "func": "msg.payload/=3;\nif (msg.payload>0) msg.payload+=66;\nmsg.payload+=\"%\";\nreturn msg;", "outputs": 1, "noerr": 0, "x": 210, "y": 100, "wires": [ [ "bf061a13.249d78" ] ] }, { "id": "5c625079.35e59", "type": "comment", "z": "c75889b6.c114a8", "name": "Amixer control for audio", "info": "", "x": 170, "y": 40, "wires": [] }, { "id": "676931a1.872fc", "type": "subflow:c75889b6.c114a8", "z": "947f7b4d.ee6ef8", "x": 770, "y": 160, "wires": [] }, { "id": "1738bb65.3471d5", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "50%", "repeat": "", "crontab": "", "once": false, "topic": "", "payload": "50", "payloadType": "str", "x": 570, "y": 180, "wires": [ [ "676931a1.872fc" ] ] }, { "id": "10e04ec3.32a121", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "25%", "repeat": "", "crontab": "", "once": false, "topic": "", "payload": "25", "payloadType": "str", "x": 570, "y": 140, "wires": [ [ "676931a1.872fc" ] ] }, { "id": "8f283f57.3b399", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Red Alert", "repeat": "", "crontab": "", "once": false, "topic": "amy", "payload": "red alert", "payloadType": "str", "x": 980, "y": 100, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": 
"2b461c2.7bd40e4", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Comms beep", "props": [ { "p": "payload", "v": "comms beep", "vt": "str" }, { "p": "topic", "v": "amy", "vt": "string" } ], "repeat": "", "crontab": "", "once": false, "topic": "amy", "payload": "comms beep", "payloadType": "str", "x": 990, "y": 140, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "e928e8.25ad7718", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Hailing frequencies open", "repeat": "", "crontab": "", "once": false, "topic": "amy", "payload": "hailing frequencies open", "payloadType": "str", "x": 1030, "y": 180, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "2d98b3aa.d7284c", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Keypress", "props": [ { "p": "payload", "v": "keypress", "vt": "str" }, { "p": "topic", "v": "amy", "vt": "str" } ], "repeat": "", "crontab": "", "once": false, "topic": "amy", "payload": "keypress", "payloadType": "str", "x": 980, "y": 220, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "b0d13aff.e60fd8", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Unable to comply", "props": [ { "p": "payload", "v": "unable to comply", "vt": "str" }, { "p": "topic", "v": "amy", "vt": "str" } ], "repeat": "", "crontab": "", "once": false, "topic": "amy", "payload": "unable to comply", "payloadType": "str", "x": 1000, "y": 260, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "7e2cec04.b03ee4", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "0%", "repeat": "", "crontab": "", "once": false, "topic": "", "payload": "0", "payloadType": "str", "x": 570, "y": 100, "wires": [ [ "676931a1.872fc" ] ] }, { "id": "e4d915f1.5aaea8", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "100%", "repeat": "", "crontab": "", "once": false, "topic": "", "payload": "100", "payloadType": "str", "x": 570, "y": 260, "wires": [ [ "676931a1.872fc" ] ] }, { "id": "b98733e4.20171", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "75%", "repeat": "", "crontab": "", "once": false, "topic": 
"", "payload": "75", "payloadType": "str", "x": 570, "y": 220, "wires": [ [ "676931a1.872fc" ] ] }, { "id": "3134692e.250326", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Wind Chimes One", "props": [ { "p": "payload", "v": "wind chimes one", "vt": "str" }, { "p": "topic", "v": "amy", "vt": "str" } ], "repeat": "", "crontab": "", "once": false, "topic": "amy", "payload": "wind chimes one", "payloadType": "str", "x": 1011, "y": 301, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "94bb09db.a9bad8", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "100%", "repeat": "", "crontab": "", "once": true, "onceDelay": "1", "topic": "", "payload": "100", "payloadType": "str", "x": 130, "y": 80, "wires": [ [ "4e7da40d.4cef4c" ] ] }, { "id": "4e7da40d.4cef4c", "type": "subflow:c75889b6.c114a8", "z": "947f7b4d.ee6ef8", "x": 300, "y": 80, "wires": [] }, { "id": "14222a13.e6f206", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "thunder", "props": [ { "p": "payload", "v": "thunder", "vt": "str" }, { "p": "topic", "v": "amy", "vt": "string" } ], "repeat": "", "crontab": "", "once": false, "onceDelay": "", "topic": "amy", "payload": "thunder", "payloadType": "str", "x": 970, "y": 60, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "7f84915b.077af", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "", "props": [ { "p": "payload" }, { "p": "topic", "vt": "str" } ], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "amy", "payload": "dogs barking", "payloadType": "str", "x": 1010, "y": 340, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "1ec9a4f724dff8c4", "type": "exec", "z": "947f7b4d.ee6ef8", "command": "sudo mpg123", "addpay": true, "append": "", "useSpawn": "", "timer": "", "oldrc": false, "name": "Mp3 player", "x": 1610, "y": 220, "wires": [ [ "2cc43b1538cd8d17" ], [], [] ] }, { "id": "e8f1df220b9a0e3b", "type": "function", "z": "947f7b4d.ee6ef8", "name": "Clr creating", "func": "global.set(\"create_speech_busy\",0);\nmsg.payload=\"\";\nreturn msg;", 
"outputs": 1, "noerr": 0, "x": 1450, "y": 80, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "c6b58278606d97a5", "type": "function", "z": "947f7b4d.ee6ef8", "name": "process text", "func": "if (typeof context.arr == \"undefined\" || !(context.arr instanceof Array)) context.arr = [];\nif (typeof global.get(\"speech_busy\") == \"undefined\") global.set(\"speech_busy\", 0);\nif (typeof global.get(\"create_speech_busy\") == \"undefined\") global.set(\"create_speech_busy\", 0);\n\nif (msg.payload !== \"\") context.arr.push(msg.payload);\nif (context.arr.length) {\n msg.payload = context.arr.shift();\n if (msg.payload.indexOf(\".mp3\") == -1) {\n var fs = global.get(\"fs\");\n var mess = msg.payload;\n mess = mess.replace(/'/g, \"\");\n var messfile = mess.toLowerCase();\n messfile = messfile.replace(/[.,\\/#!$%\\^&\\*;:{}=\\-_`~()]/g, \"\");\n messfile = messfile.replace(/ /g, \"_\");\n messfile = \"/home/pi/audio/\" + messfile + \".mp3\";\n\n if (fs.existsSync(messfile)) {\n if (global.get(\"speech_busy\")==1) { context.arr.unshift(msg.payload); return [null, null]; }\n else { global.set(\"speech_busy\", 1); msg.payload = messfile; return [null, msg]; }\n }\n else {\n if (global.get(\"create_speech_busy\")==1) { context.arr.unshift(msg.payload); return [null, null]; } else\n {\n context.arr.unshift(msg.payload);\n global.set(\"create_speech_busy\",1);\n var voice = \"Amy\";\n msg.payload = 'polly synthesize-speech --engine \"neural\" --region eu-west-2 --output-format mp3 --voice-id ' + voice + ' --text \"' + mess + '\" ' + messfile;\n return [msg, null];\n }\n }\n }\n if (global.get(\"speech_busy\")==1) context.arr.unshift(msg.payload); \n else { global.set(\"speech_busy\", 1); return [null, msg]; } // mp3 or synth \n}\n", "outputs": "2", "timeout": "", "noerr": 0, "initialize": "", "finalize": "", "libs": [], "x": 1430, "y": 180, "wires": [ [ "f26c3a4e0246fb10" ], [ "1ec9a4f724dff8c4" ] ] }, { "id": "2cc43b1538cd8d17", "type": "function", "z": "947f7b4d.ee6ef8", 
"name": "Clr playing", "func": "global.set(\"speech_busy\",0);\nmsg.payload=\"\"; return msg;", "outputs": "1", "noerr": 0, "x": 1450, "y": 300, "wires": [ [ "c6b58278606d97a5" ] ] }, { "id": "f26c3a4e0246fb10", "type": "exec", "z": "947f7b4d.ee6ef8", "command": "aws", "addpay": "payload", "append": "", "useSpawn": "false", "timer": "", "winHide": false, "name": "aws create", "x": 1610, "y": 160, "wires": [ [ "e8f1df220b9a0e3b" ], [], [] ] }, { "id": "af813dae06f48d2e", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "Power up clear speech busy", "props": [ { "p": "payload" }, { "p": "topic", "vt": "str" } ], "repeat": "", "crontab": "", "once": true, "onceDelay": 0.1, "topic": "", "payload": "", "payloadType": "date", "x": 200, "y": 40, "wires": [ [ "dc4cc669d013eb9f" ] ] }, { "id": "dc4cc669d013eb9f", "type": "function", "z": "947f7b4d.ee6ef8", "name": "Clr creating", "func": "global.set(\"create_speech_busy\",0);\nglobal.set(\"speech_busy\",0);", "outputs": 0, "timeout": "", "noerr": 0, "initialize": "", "finalize": "", "libs": [], "x": 450, "y": 40, "wires": [] }, { "id": "a46305c0988fdf5e", "type": "inject", "z": "947f7b4d.ee6ef8", "name": "new", "props": [ { "p": "payload" }, { "p": "topic", "vt": "str" } ], "repeat": "", "crontab": "", "once": false, "onceDelay": 0.1, "topic": "amy", "payload": "new speech", "payloadType": "str", "x": 1250, "y": 340, "wires": [ [ "c6b58278606d97a5" ] ] } ]

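The export also includes the small “Volume 0-100” sub-flow, whose Non-linear function converts a 0-100 setting into a percentage for amixer cset numid=1. It divides by 3 and, for anything above zero, adds 66, so non-zero settings land between roughly 66% and 99%. The same arithmetic as a standalone function, for checking values: volumeToAmixer is a hypothetical name, and rounding to a whole percent is an addition of this sketch (the flow passes the unrounded value).

```javascript
// Same arithmetic as the flow's "Non-linear" function: divide by 3, add 66 for
// any non-zero level, append "%". Rounding is an addition of this sketch.
function volumeToAmixer(level) {
  let v = Number(level) / 3;   // the inject nodes send strings such as "50"
  if (v > 0) v += 66;          // 0 stays 0, everything else lands in ~66-99
  return Math.round(v) + "%";
}
```

The result is what gets appended to the exec node’s amixer cset numid=1 -- command, so an inject of 50 would end up as amixer cset numid=1 -- 83%.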