Tuesday, 22 May 2012

Speech Input to Browser App

Capturing Speech in the Web Browser

Recently I have been looking at integrating audio into the web application, and as I understand it there is a very simple speech-to-text input element available for HTML5 pages.

I've found an example here

And the full specification here

Specific Uses

This speech-to-text input element can be used to allow voice input to the Kinect Kiosk.

Users shall be able to ask their questions, and the system shall interpret their speech and attempt to infer an answer using Sitepal's AIML.


The speech-to-text element requires users to press a button to start a recording, because part of the specification requires that users know they are being recorded.

It also requires Google Chrome at the moment, and is not a W3C standard supported across web browsers.  In other browsers such as IE and Firefox, the speech element appears as a normal text box.
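
As a rough sketch of how a page could tell these two cases apart, the function below checks whether an input element exposes a speech property.  The property name webkitSpeech is an assumption based on Chrome's prefixed implementation, speech is a guess at an unprefixed name, and supportsSpeechInput is a hypothetical helper of my own.

```javascript
// Returns true when the given input element exposes a speech-input property.
// ASSUMPTION: Chrome's prefixed implementation exposes `webkitSpeech`;
// `speech` is only a guess at what an unprefixed version might be called.
function supportsSpeechInput(inputElement) {
  if (inputElement === null || typeof inputElement !== 'object') {
    return false;
  }
  return 'webkitSpeech' in inputElement || 'speech' in inputElement;
}

// Browser usage (skipped when there is no DOM, for example under Node).
if (typeof document !== 'undefined') {
  var probe = document.createElement('input');
  if (!supportsSpeechInput(probe)) {
    // Browsers without the element show a normal text box, so typing still works.
    console.log('Speech input unavailable; using a normal text box.');
  }
}
```

Because unsupported browsers fall back to a plain text box anyway, a check like this is only needed if the page wants to show or hide a hint about the microphone button.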

Other Work

I've also been making progress on face/motion detection using the Kinect and OpenCV, though the classifiers are still not working.  I'll make another post on my progress with that when it is running, and then I'll provide some sample code.

Sunday, 13 May 2012

Progress with Integration

Since the last post I have done some work to set up the Kiosk with existing software, and have made progress with integration between C++ applications and Sitepal's JavaScript API.

I will cover three things in this post:

Setting up Windows 7 as a Kiosk

I have found a kiosk mode for the Google Chrome web browser, which allows me to display the webpage and prevent users from browsing other websites.  One problem, however, is that users can still press Alt+F4 to exit the browser and access the desktop.  There are ways to disable Alt+F4 which I may explore, but a temporary solution is to provide the on-screen keyboard.

Existing Sitepal software

To enable the Sitepal software to send and receive HTTP requests for my initial experimentation, I set up a local Apache server.  The webpage uses JavaScript, AJAX and HTML5 to allow for several features.  AJAX is used to poll a data source at set intervals and update the Sitepal.  I am currently using JSON to define the data.
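
The polling described above could be sketched roughly as follows.  This is not the code running on the Kiosk: the function names (pollOnce, startPolling, fetchViaXhr) are hypothetical, and the URL passed to fetchViaXhr would be whatever the local server exposes.

```javascript
// Performs one poll: loads the JSON text and passes the parsed data on.
// fetchText(callback) is a hypothetical loader that calls callback(error, text).
function pollOnce(fetchText, onData) {
  fetchText(function (error, text) {
    if (error) {
      return; // Skip this tick; the next poll will try again.
    }
    var data;
    try {
      data = JSON.parse(text);
    } catch (parseError) {
      return; // Ignore malformed JSON rather than break the page.
    }
    onData(data);
  });
}

// Repeats pollOnce at a set interval and returns a function that stops it.
function startPolling(fetchText, onData, intervalMs) {
  var timer = setInterval(function () {
    pollOnce(fetchText, onData);
  }, intervalMs);
  return function stop() {
    clearInterval(timer);
  };
}

// Browser-side loader built on XMLHttpRequest (the "AJAX" part).
function fetchViaXhr(url) {
  return function (callback) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, true);
    xhr.onreadystatechange = function () {
      if (xhr.readyState !== 4) {
        return; // Request not finished yet.
      }
      if (xhr.status === 200) {
        callback(null, xhr.responseText);
      } else {
        callback(new Error('HTTP ' + xhr.status));
      }
    };
    xhr.send();
  };
}
```

Keeping the loader separate from the timer means the same polling loop can be tried out with a stub loader before it is pointed at the real server.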

At the moment users can type phrases and questions into text boxes.  The avatar currently has a limited knowledge base for questions, which can be improved with the Artificial Intelligence Markup Language (AIML).

For future work, HTML5 allows recording devices on the system to be detected via the web browser, and this may be combined with the Kinect microphone array to feed audio into Sitepal.

Integration between C++ and Javascript

A C++ application runs on the local server and captures and processes information from the Kinect sensor.  To pass data to the server and on to the JavaScript, I am writing information to a file in JSON format.  The file is read using JavaScript, which sends an HTTP request to Sitepal every 500 milliseconds.
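
To illustrate the last step, here is a minimal sketch of turning the contents of the JSON file into the request that goes out on each 500 millisecond tick.  The field names (userPresent, gesture) and the base URL are placeholders of my own; the real file layout and the Sitepal endpoint are not shown in this post.

```javascript
// Turns the parsed sensor file into a query string for the outgoing request.
// ASSUMPTION: the field names and base URL used below are made up for
// illustration and do not reflect the actual file or the Sitepal API.
function buildRequestUrl(baseUrl, sensorState) {
  var parts = [];
  for (var key in sensorState) {
    if (Object.prototype.hasOwnProperty.call(sensorState, key)) {
      // Encode both sides so spaces and symbols survive the trip.
      parts.push(encodeURIComponent(key) + '=' +
                 encodeURIComponent(String(sensorState[key])));
    }
  }
  return parts.length ? baseUrl + '?' + parts.join('&') : baseUrl;
}

// Example with a made-up file body of the kind the C++ side might write.
var exampleFile = '{"userPresent": true, "gesture": "wave"}';
var exampleUrl = buildRequestUrl('/sitepal-update', JSON.parse(exampleFile));
```

Encoding every key and value keeps the request valid even if the C++ side later writes free text, such as a recognised phrase, into the file.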