Octavian's Internet Things: 2018

Tuesday, 13 November 2018

Actions On Google

Well, after some frustration, I have a chat bot working on Google Actions. But now that the frustration is over, the flood gate is open!

The first challenge: using my existing code. Safety Moments, my first skill, uses JSON files. I needed to upload them to a Google bucket. And then, for whatever reason, the promise loading these files took > 5s to execute (to download a file < 1K), so my conversation was over before my content loaded. I found no synchronous hack. Then, I figured I'd load them on invocation of the cloud function. It worked for a while, and then it just stopped working. No idea why. I received connection resets downloading the files. So, second solution? Embed all the JSON in the webhook itself. (What is a webhook? It is the node code that fulfills the request to do something after language understanding (LU) digestion.)
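A minimal sketch of the "embed the JSON in the webhook" approach, assuming the content is small enough to ship with the code. The `SAFETY_MOMENTS` data and the `pickMoment` helper are hypothetical stand-ins, not the actual Safety Moments content:

```javascript
// Content lives in the webhook itself, so it is available synchronously on
// the first request: no bucket download and no promise to wait on.
// (A JSON file deployed next to the code could be loaded the same way with
// require('./moments.json'); that file name is hypothetical.)
const SAFETY_MOMENTS = [
  { title: 'Ladders', text: 'Keep three points of contact when climbing.' },
  { title: 'Lifting', text: 'Lift with your legs, not your back.' }
];

// Pick an entry by index, wrapping around so any non-negative integer works.
function pickMoment(index) {
  return SAFETY_MOMENTS[index % SAFETY_MOMENTS.length];
}
```

The trade-off is that changing the content means redeploying the webhook.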

The second challenge: competing SDKs. It looked like Dialogflow and Actions on Google were inseparable. But they are separate. And if you follow the code examples for Dialogflow (which I did, because it looked like a really cool tool compared to LUIS), you will discover mismatches with how Actions on Google (for assistants) implements bot behavior. So... I couldn't immediately figure out what was Dialogflow SDK (agent.add) and what was Actions SDK (conversation.ask). It manifested itself in my conversation data not persisting and me scratching my head.

The third challenge: moving targets. The Google tech is migrating from V1 to V2, and Dialogflow seems not to have caught up to Actions. Part of my confusion was that the out-of-the-box webhooks were V1. Dialogflow examples integrating with Actions were V1. But V2 is pretty much where all Actions documentation and examples sit. Ugh. Now I know what to look for.
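For reference, a sketch of an intent handler written purely in the V2 `actions-on-google` client library style, where responses go through `conv.ask` and per-conversation state lives in `conv.data`. The handler name and the counter are made up for illustration, and the registration lines are left as comments so the snippet runs without the library installed:

```javascript
// Handler in the actions-on-google V2 style: it receives a conversation object.
function welcome(conv) {
  // conv.data is kept for the duration of the conversation.
  conv.data.visits = (conv.data.visits || 0) + 1;
  // End on a question so the dialog stays open for the next turn.
  conv.ask('Welcome, turn ' + conv.data.visits + '. What would you like to hear?');
}

// Registration with the library (not executed here):
//   const { dialogflow } = require('actions-on-google');
//   const functions = require('firebase-functions');
//   const app = dialogflow();
//   app.intent('Default Welcome Intent', welcome);
//   exports.fulfillment = functions.https.onRequest(app);
```

Mixing `agent.add` from the Dialogflow fulfillment samples into a handler like this is the kind of mismatch that had me scratching my head.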

What I Like

International availability.
Excellent TTS for female and male voice.
Dialogflow (the tool) and how it builds intents (utterances) and actions (entities).
You can build dev without charge.
Events get fired when media stops playing.

What I Dislike

You can't have a Card before a SimpleResponse. I use cards as banners.
Cards can have one button, though the JSON supports multiple buttons. Why?
You can't segment long conversations naturally. You have to make them turn based and granular. Not so good for my experiments. That will, in my opinion, seriously cripple enterprise capability.
Your action will not pass certification if it does not end the dialog on a question. This forces dialogs to be very transactional.

References

Actions on Google and Dialogflow.

Useful documentation

Sunday, 26 August 2018

Enabling Colossal Cave on Cortana

I am back at it. The first step was getting a version of Colossal Cave running. I found Eric Raymond's official public port:

https://gitlab.com/esr/open-adventure
http://www.catb.org/~esr/open-adventure/

I compiled it for Windows (using MinGW-64) and made a few changes (removing unix curses and flushing input appropriately so it worked inside a pipe.)

The next step was putting together a Cortana dialog-based shell calling an external process to invoke it. The fun here is the node.js event loop. Reading and writing to the pipe did not happen when I expected, so the dialog was out of sync until I forced the Cortana dialog to happen after pipe write.
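One way to force that ordering is to wrap each pipe write in a promise that resolves only when the child answers, and to send the dialog response from there. This is a sketch, not the code used for the skill: `sendCommand` is a hypothetical helper, and it assumes the game's whole reply arrives in a single chunk.

```javascript
const { spawn } = require('child_process');

// Write one line to the child's stdin and resolve with the next chunk it
// prints on stdout. Assumes the whole reply arrives as one chunk.
function sendCommand(child, line) {
  return new Promise((resolve, reject) => {
    const onData = (chunk) => {
      child.removeListener('error', onError);
      resolve(chunk.toString());
    };
    const onError = (err) => {
      child.stdout.removeListener('data', onData);
      reject(err);
    };
    child.stdout.once('data', onData);
    child.once('error', onError);
    child.stdin.write(line + '\n');
  });
}

// Usage inside a dialog step (names are illustrative):
//   const game = spawn('advent.exe');
//   sendCommand(game, 'north').then((reply) => session.send(reply));
```

Because the response is sent from the `then` callback, it cannot run before the pipe has produced output.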

So I now have a demo-able version of Colossal Cave on an Invoke with Cortana.

What is next before publishing?

Support multiple, concurrent users (yay fork).
Support save game.
Improve the Windows 10 prompt to repeat the text.

Sunday, 12 August 2018

Recovering my files....

So... I bought this lovely ROG portable (Strix 17 GL703GE) and started developing on it. Near the 90 day mark, I noticed the machine would not sleep. I found out, slowly, that the left arrow key would fire whenever it liked. It was pretty random at first, and then got persistently worse. Imagine trying to type when some invisible monkey keeps hitting backspace...

In my wisdom, I thought "heck I'll just make an image of the two disks" with disk2vhd. Smart right? And I'll wipe the computer and restore the disks on my replacement.

No. Windows does not have any tools out of the box that do a low-level device copy for exact cloning. And although Windows can mount a vhd via diskmgmt or diskpart, there are some issues...

I tried to make a bootable partition on a USB drive. First problem, my partition was out-of-range for an MBR style boot. I needed EFI/GPT, but for whatever reason my ROG's boot menu would not recognize my cloned partition.

I thought I'd use Linux and gparted. Nope. After trying to find the right drivers for the ROG, I could not get the vhd mounted, as guestfish uses a "FUSE" device that doesn't support block-level reads (no cat, no dd from /dev/fuse...). And... most of the live USB tools would crash loading gparted. I spent hours getting to a point where I had a bootable Linux with the right video and track drivers, just to discover I couldn't copy from a "file system in user space".

So... I bought a new USB drive and made a partition of the same size as my former boot disk. I used Cygwin to copy from a mounted vhd to this partition. Then, I had to fix up the partition to be bootable. And then, I had to use bcdedit, bootmgr, and msconfig to let me boot from my clone... so that I could repeat the process back to the new machine's boot partition.

And then, I discovered that my Windows login credentials didn't work. For whatever reason, my login account didn't have permission to access my own home directory. So I got to figure out takeown, and regrant permissions to myself.

What a pain...

Monday, 23 July 2018

Why doesn't printf / fprintf output anything...

In my build of Colossal Cave to Windows, I had to get rid of getline (as this is a linux function found in <editline/readline.h> and it allocates a string dynamically) by replacing it with fgets. I compiled (under mingw) with no other modifications. And, I played advent.exe for an hour. I was happy.

And then I kicked up my IDE and wrote some node / JavaScript to spawn a child_process running my new executable. It should work...

It didn't. My output handler, instead of asking me if I would like instructions, sat there. Did I fail in launching the process? I added logging the PID. It looked like it launched. I looked up the PID in taskmgr. I didn't see it. I looked up the PID in tasklist. There it was. But there was no output.

I searched the internet (using the wrong terms) and found nothing. But then I figured it out. I changed the executable from advent.exe to tasklist.exe. And magic; node dumped out some output... but truncated the last bit. That was the hint.
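The swap described above can be written as a small diagnostic: run a command you know prints something through the same kind of pipe and look at what comes back. A sketch using `spawnSync`; `captureOutput` is a hypothetical helper name.

```javascript
const { spawnSync } = require('child_process');

// Run a command with its stdout connected to a pipe (not a console) and
// return everything it wrote before exiting.
function captureOutput(command, args) {
  const result = spawnSync(command, args || [], { encoding: 'utf8' });
  if (result.error) {
    throw result.error; // for example, the executable was not found
  }
  return result.stdout;
}

// On Windows, captureOutput('tasklist.exe') should return the process table.
```

If a known-good command returns text here while the interactive program shows nothing until it exits, the child's stdout buffering is the likely culprit.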

Programs with pipes work differently than console ttys. Most times you won't notice because a console terminal will by default flush stdout when it hits a newline (on unix at least.) Windows, and as it turns out a few stdio implementations, will buffer any stdout unless fflush is explicitly called or the buffer limit is reached. This is good. I/O writes are typically expensive, and if you have a process in the background (or without a terminal) this is a good optimization.

But it is not good if you are expecting a dialog or stream that would flush stdout on the next stdin call. A newline will not flush stdout on Windows (with the stdio implementation under mingw/cygwin). On unix, you can use stdbuf or unbuffer to make a process not buffer its output. I saw no way to do this in Windows. (There might be, I just didn't see it.)

The solution? Fortunately the getline removal is also my savior. If I flush stdout before calling fgets, advent.exe works the way I expect. But... this won't help me with any and all other Windows commands. I won't always have the code of the command I'm executing (to add a flush call or setbuf(stdout, NULL)). When I find a general answer, I'll post...

Wednesday, 18 July 2018

How to invoke Cortana channel actions in node.js

Ugh. It took me a while to figure it out, and I had to reverse engineer some code to do it. There were breadcrumbs, but of course being a newb it wasn't immediately obvious.

Botframework allows you to connect to channels. You are likely invoked from a channel. That means you can send meta-data back to that channel. In the previous version of the framework, you called a method off the Message called channelData to set the JSON response. But channelData is deprecated in favor of the new sourceEvent method. So forget the "Launching apps or websites from a Cortana skill" example (that is in C# and the old way)...

The next hint is in the documentation. sourceEvent takes a map. The old version just took some action in a JSON with a type : LaunchUri and uri : "http://whatever.com/"

What is this map? Well, you can have your bot connected to one or more channels (or "*" for all.) The source code implementing sourceEvent was the final piece. I found examples setting the facebook channel, or using directline... But what is the channel name for Cortana? Well, it is cortana.

Wrap the JSON up underneath the channel name and, magically, Cortana will launch the app.

    msg.sourceEvent({
        cortana : {
            action : {
                type : "LaunchUri",
                uri : "https://www.octavianit.com"
            }
        }
    });

But be careful as Cortana will close the channel after invoking the app (that the protocol maps to). And you should consider that speakers (screenless devices) will not support this behavior and should have a UX alternative.
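One way to cover both cases is to decide up front whether the device can launch anything. A sketch: `buildLaunchResponse` and the `hasScreen` flag are hypothetical, and how you detect a screenless device is left to the caller.

```javascript
// Returns either the Cortana channel payload that launches a URI, or a
// spoken fallback for devices that cannot show anything.
function buildLaunchResponse(uri, hasScreen) {
  if (!hasScreen) {
    return { speak: "I can't open that on this device. Try it from your PC or phone." };
  }
  return {
    sourceEvent: {
      cortana: {
        action: { type: 'LaunchUri', uri: uri }
      }
    }
  };
}

// Usage with a botbuilder V3 message (illustrative):
//   const r = buildLaunchResponse('https://www.octavianit.com', true);
//   if (r.sourceEvent) { msg.sourceEvent(r.sourceEvent); } else { msg.speak(r.speak); }
```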

Tuesday, 17 July 2018

Building your own Cortana Music Player

Ever since Amazon killed the cloud Music Storage subscription, I've been annoyed. I uploaded my 2000+ CD collection to the cloud to preserve it and allow me to access it from anywhere.

I've been looking for an alternative, but didn't find anything as cost effective as Amazon's dead program ($25 for the year? A steal). Google had a Google Play Music plan that included YouTube Red for $15/mo. But that cost made me uncomfortable (that, and being on a monthly plan).

But I believe I found a solution! OneDrive... you get 1TB of storage with an Office 365 account for $99/yr. That is value, and you get Office too.

But here is what I find more exciting.

I got my Invoke today and had to (like really) see if I could build my own music player... Out of the box Cortana will only let you hook up to streaming services. But you can't play music from your PC or OneDrive (unless you Bluetooth to the device).

How hard can it be to build a skill to play your music from OneDrive? As it turns out - not hard at all.

Amazon Music continually griefed me because it never kept my songs together in their albums when imported (as they were imported via iTunes). But when I synced my library to OneDrive, the directory structure stayed intact. And as it happens, the OneDrive REST API will let you retrieve your directories and walk the files... and you can use your MSA authentication to keep it all personal or share those files...
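A sketch of the directory walk, assuming the Microsoft Graph flavor of the OneDrive API, where the children of a folder are listed at `/me/drive/root:/{path}:/children` and each item carries a `name` plus a `folder` or `file` facet. The helper functions are hypothetical, and the authenticated HTTP call itself is omitted:

```javascript
// Build the Graph URL that lists the children of a OneDrive folder.
// Each path segment is encoded separately so the slashes survive.
function childrenUrl(folderPath) {
  const encoded = folderPath.split('/').map(encodeURIComponent).join('/');
  return 'https://graph.microsoft.com/v1.0/me/drive/root:/' + encoded + ':/children';
}

// From a children listing (the parsed JSON response), keep only MP3 files.
function mp3Names(listing) {
  return listing.value
    .filter((item) => item.file && /\.mp3$/i.test(item.name))
    .map((item) => item.name);
}
```

Items with a `folder` facet are the albums; calling `childrenUrl` again on their path walks one level deeper.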

So, how hard is it to get Cortana and botframework to play an MP3 you have stored on OneDrive? This easy.

    // Assumes builder = require('botbuilder') (V3) and a dialog session.
    var audioCard = new builder.AudioCard(session)
        .media([
            { url : 'https://onedrive.live.com/download?cid=00E75C36F57E8A5B&resid=E75C36F57E8A5B%216254&authkey=AEAEHi1WUjheHj4' }
        ]);
    var msg = new builder.Message(session)
        .addAttachment(audioCard)
        .text('Now playing Nephatiti by 808 State')
        .speak('<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Playing Nephatiti by <say-as interpret-as="number_digit">808</say-as> State.</speak>');
    session.send(msg).endConversation();

Edit: It is extremely important to end the conversation after sending an audio card on Windows because if Cortana has a dialog going, regardless of any input hint, the volume will be set low. Ending the conversation keeps the volume at 100% at the expense of disconnecting Cortana from the bot. Also, Cortana will ignore every field in the audio card (like title).

What will be even more fun will be using the language recognition services to solve another Alexa pet peeve... searching for my classic and Latin music titles!

Monday, 16 July 2018

End to End Developer example video of basic Cortana skill

For those wanting a visual walk through of the process to create a Cortana skill with botframework!

Uses node.js.

Friday, 13 July 2018

Get your Azure QnA Bot to Speak with Cortana

Building a QnA bot and hooking it up to Cortana is simple in node.js under BotFramework V3.

If you have a FAQ that is in Q: A: format you can import it via the Azure QnA Maker tools and auto-create a "knowledge base". No coding required. You can do this here https://www.qnamaker.ai.

The next step is creating your bot. Microsoft has standardized on their botframework to do this. https://dev.botframework.com is your gateway.

If you are like me and like the simplicity of node.js then pick the QnA template.

Go back to qnamaker and view the code to extract the QnA keys and host.

Then go back to the Azure portal and update your bot for the Application Settings blade sections shown here.

The test web app will now be successfully linked to your QnA bot! But don't forget the last step...

Go to your Channels blade and set up Cortana. Then, go to the Build blade and open the online editor. In the app.js code, you will see that the template uses the standard QnA dialog builder, which does not speak the resulting answers back on the Cortana speech channel. Add an override like this.

(See GitHub QnAMaker patch for a V4 node example.)
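A sketch of what such an override can look like in the V3 node template, assuming the dialog comes from `botbuilder-cognitiveservices` and exposes `respondFromQnAMakerResult` as its hook. This is not the original patch, and `speakQnAResult` is a hypothetical name. It uses `session.say(text, speak)` so the answer is both shown and spoken:

```javascript
// Send the top answer as text and as speech.
function speakQnAResult(session, qnaMakerResult) {
  const answer = qnaMakerResult.answers[0].answer;
  session.say(answer, answer);
}

// In app.js, after the template creates its QnA dialog (not executed here):
//   basicQnAMakerDialog.respondFromQnAMakerResult = speakQnAResult;
```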

There you have it. Now "Hey Cortana, ask Bernie Question Bot Test what is a dwarf planet?"

For C#, it is slightly more complicated. You need to subclass the BasicQnAMakerDialog with something that sends a message with speak attached. This is done as an inner class of RootDialog (from the V3 C# template that comes with Azure Web App Bots).


    // Dialog for QnAMaker GA service
    [Serializable]
    public class BasicQnAMakerDialog : QnAMakerDialog
    {
        // Go to https://qnamaker.ai and feed data, train & publish your QnA Knowledgebase.
        // Parameters to QnAMakerService are:
        // Required: qnaAuthKey, knowledgebaseId, endpointHostName
        // Optional: defaultMessage, scoreThreshold[Range 0.0 – 1.0]
        public BasicQnAMakerDialog() : base(new QnAMakerService(new QnAMakerAttribute(RootDialog.qnaAuthKey, RootDialog.qnaKBId, "No good match in FAQ.", 0.5, 1, RootDialog.endpointHostName)))
        { }

        // Override to also include the knowledgebase question with the answer on confident matches
        protected override async Task RespondFromQnAMakerResultAsync(IDialogContext context, IMessageActivity message, QnAMakerResults results)
        {
            if (results.Answers.Count > 0)
            {
                IMessageActivity response = context.MakeMessage();
                response.Text = "Here is the match from FAQ:  \r\n  Q: " + results.Answers[0].Questions[0] + "  \r\n A: " + results.Answers[0].Answer;
                response.Speak = response.Text;
                response.InputHint = "acceptingInput";
                await context.PostAsync(response);
            }
        }
    }

(See GitHub QnAMaker patch for a V4 C# example.)

Getting the most out of your free Azure subscription for development.

I discovered why my Azure free trial was eating cash. It seems that the free trial defaults to reasonable paid service tiers, and not the free development tiers. So when you set up your Microsoft account (MSA) for development purposes, make sure you double and triple check that the plan you pick starts with an F (for free) and not an S.

When you use bot builder or QnA builder, you might need to go back after you create a project from template and change the "App Service Plan" to something free. If you see your $ decreasing and you're not doing anything - you've configured something incorrectly.

Thursday, 12 July 2018

Building Bots in Azure

I am starting to build Cortana bots. So far it's been interesting.

I had trouble building the Azure function bots from template. The node.js example for the simple "echo" bot... First crack, the Azure "Test in Web Chat" didn't work. Errors implied there was a permission issue.

I tried again, and had a deployment error on the bot function template.

I tried again, and on the third try it deployed. To my knowledge, I did nothing differently.

The advantage of a function bot over a web app bot is supposedly pay per invocation (that should be cheaper, right?). The issue with the example code for node.js on the function bot is that every potentially used library is embedded in the index.js code, with the two-line "echo" functionality buried in the middle!

WHY? Why? Well, javascript as a language doesn't have a '#include' statement. Client side, you do the includes on the document for your browser to take care of.

In node, we have requires... which allow us to load modules. But why is this not used in the function bot example? I figure it's a workaround. But the downside is this: every time I edit the function in the portal for this example, I am touching a 200K line file!