Research on speech CAPTCHA systems

I am preparing the following presentation for the WIT/SP meeting.

Title: The comparison between the deletion-based methods and the mixing-based methods for safe speech CAPTCHA systems

Authors: Takuya NISHIMOTO, Hitomi MATSUMURA and Takayuki WATANABE

Abstract: Speech-based CAPTCHA systems, which distinguish between software agents and human beings, are especially important for people with visual disabilities. The popular approach is based on mixing-based methods, which present the target speech mixed with noise. We have proposed a deletion-based method that uses the phonemic restoration effect. Our approach can control the difficulty of a task simply by changing the masking ratio. Our design principle for CAPTCHA holds that tasks should be chosen so that the difference in performance between machines and human beings is as large as possible. In this paper, we give some hypotheses on the differences between the deletion-based method and the mixing-based methods. We also present a plan for experiments that compare the automatic speech recognition performance, speech intelligibility, and mental workload of these two approaches.

Date:
Thu, Oct 29 – Fri, Oct 30, 2009

Place:
ASPAM (Aomori City, Japan)

Language:
Japanese
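
The masking-ratio idea in the abstract can be illustrated with a rough sketch. The abstract does not say how the deletion is actually performed, so everything here is an assumption: the method name `delete_segments`, the fixed period, and the choice to replace deleted samples with zeros are hypothetical and only show how a single ratio could set the amount of deleted speech.

```ruby
# Hypothetical sketch: delete the tail of every period of samples.
# masking_ratio is the fraction (0.0..1.0) of each period that is deleted.
# Replacing deleted samples with 0 is an assumption made for illustration.
def delete_segments(samples, period, masking_ratio)
  raise ArgumentError, "period must be positive" unless period.positive?
  unless (0.0..1.0).cover?(masking_ratio)
    raise ArgumentError, "masking_ratio must be in 0.0..1.0"
  end

  kept = (period * (1.0 - masking_ratio)).round # samples kept per period
  samples.each_with_index.map do |sample, i|
    (i % period) < kept ? sample : 0
  end
end

signal = (1..10).to_a
p delete_segments(signal, 5, 0.4) # => [1, 2, 3, 0, 0, 6, 7, 8, 0, 0]
```

A larger ratio deletes more of each period, which is one simple way a task could be made harder for both listeners and recognizers.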

Galatea release announcement

The latest Galatea Toolkit (beta version) has been released at:

http://en.sourceforge.jp/projects/galatea/releases/

Please note that the current version is for Japanese conversations. I would like to discuss plans for internationalizing these tools. The English documents have not been fully checked. Please send comments and suggestions to the galatea-i18n mailing list, which is hosted at sourceforge.jp.

P.S. Galatea Toolkit video demos are now available on YouTube.

A multimodal interactive system based on hierarchical Model-View-Controller architecture

Multimodal interactive systems are expected to be used widely. To realize life-like agents or humanoid robots, a flexible architecture for integrating software modules is necessary. Many frameworks have been proposed:

  • Joseph Polifroni, Stephanie Seneff. 2000. Galaxy-II as an Architecture for Spoken Dialogue Evaluation. Proceedings of Second International Conference on Language Resources and Evaluation, pp.42-50.
  • Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi. 2003. Design and Implementation of Data Sharing Architecture for Multi-Functional Robot Development. Trans. of IEICE, Vol.J86-D1, No.5, pp.318-329 (in Japanese).
  • SRI International, The Open Agent Architecture. http://www.ai.sri.com/~oaa/

This post discusses the following requirements related to Galatea Toolkit:

  1. A developer should be able to easily customize a parameter that influences many modules within the system.
  2. A developer without knowledge of speech technology should be able to develop spoken dialog applications efficiently.

O’ra-be : A support system for broadcasting radio programs

Radio is a familiar and important medium, especially for elderly people and people with visual impairments. The number of community broadcasting stations is also increasing, because they are useful in times of disaster.
Producing regional information programs, however, takes considerable time and effort for news gathering.
For this reason, we developed “O’ra-be,” a system for broadcasting voice messages posted by telephone, which is useful for small radio broadcasting stations.

The development of O’ra-be was initially supported by the Mitou Software Development Program of IPA (Information-technology Promotion Agency), Japan, in 2005.

The first public demonstration of the O’ra-be system will be given at the IPAX 2009 conference, held in Tokyo on May 27-28, 2009.

Our project did not pursue the patent in Japan, and the right was abandoned. The source code and a public application server are going to be prepared for open use (please see my page on github later).

We expect development to progress further as we introduce Ruby on Rails into the database and Web application parts of the implementation. We will also invite participation in development by shifting to an open-source project.

Given current economic conditions, the “down-sizing” of broadcasting is expected to advance in every setting, not only among individuals and small broadcasting stations.
Personal computers and the Internet have created a world in which anyone can enjoy a high-quality computing environment at low cost. In the same way, broadcast support systems are expected to become important infrastructure for society and daily life.
O’ra-be specializes in a user interface for editing contributed files during live broadcasting, and it can be extended to handle video sources. Given the recent spread of podcasts and video sharing sites, cooperation with existing services may come into view in the future.

CastStudio

Using JMC from Java

I read the following post and understood the pattern for using JavaFX classes from the Java language.

I tried to use Java Media Components (JMC) from Java to play MP3 files, and I finally succeeded.
The environment is Windows XP, with JavaFX SDK 1.1 installed at c:\Program Files\JavaFX\javafx-sdk.
I agree with the claim that the functions of JavaFX are important but the script language is unnecessary. However, the following writing style is redundant. I want to use this technique to reuse resources that I implemented in Java in the past.

Japanese TTS for NVDA

Objectives
At present, many of the tools that visually impaired people use to access a PC and the Web are commercial software. This causes the following problems:

  • The cost is a financial burden.
  • It is difficult to respond flexibly and rapidly to changes in user needs and in the OS environment.
  • Needs cannot be shared between Web developers and people with visual disabilities. Speech-enabled tools are useful for verifying Web accessibility, but many Web developers do not use them because such software is not free.

In recent years, NVDA, an open-source screen reader for Windows, has been attracting attention.

Galatea English Technical Notes

An English technical notes page for Galatea Linux is available:

(continued from the previous post)
Before I tried Ruby on Rails, I had implemented a simple template engine for Java and had used template engines for PHP and Perl. However, installation could be troublesome, and I was dissatisfied with having to use engine-dependent description languages.
Ruby comes with a template engine called ERB. ERB templates embed the Ruby language itself, and ERB is easy to call from our own Ruby scripts. I like both of these points.
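
As a minimal sketch of calling ERB from an ordinary Ruby script: the template text, the method names `greeting_template` and `render_greeting`, and the sample data are made up for illustration. The loop inside the template is plain Ruby, which is the point made above.

```ruby
require "erb"

# A template with embedded Ruby: <%= %> inserts a value, <% %> runs code.
# trim_mode "-" lets <%- and -%> suppress the extra blank lines.
def greeting_template
  ERB.new(<<~TEMPLATE, trim_mode: "-")
    Hello, <%= name %>!
    <%- items.each do |item| -%>
    * <%= item %>
    <%- end -%>
  TEMPLATE
end

# Render the template with local variables taken from a hash.
def render_greeting(name, items)
  greeting_template.result_with_hash(name: name, items: items)
end

puts render_greeting("Galatea", ["speech synthesis", "speech recognition"])
```

No description language other than Ruby is needed here; the template is just text plus Ruby expressions.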
A VoiceXML browser and an HTML browser do not occupy exactly equivalent positions. In addition, further consideration is necessary when modalities are combined: what should be composed, and at which layer of the hierarchy? We want to make proposals on this, based on our implementation experience.

Galatea English Tutorial

A new English tutorial page for Galatea Linux is available:

An English version of the Release Notes page has also been created:

We have been involved in the voice interaction toolkit project “Galatea” and in the standardization of multimodal dialog description and architecture, and we have considered implementing VoiceXML applications with Ruby on Rails. As a result, we came to think that “the implementation of a hierarchical system becomes a hierarchy of template engines.”

Many Web application frameworks offer a template engine. Standardizing the description at each of many layers has merit, but it also has the drawback that descriptions become redundant. A template engine is one expedient for solving this problem.

The Interactive Speech Technology Consortium (ISTC) investigated standardizing the interaction description specification at each layer of the hierarchy for interface systems that have voice input/output and GUI input/output. It refines the so-called Model/View/Controller structure in more detail, and six classes are proposed.
When I consider Galatea Dialog Studio, the dialog control engine I continue to develop, some of these layers correspond to the MVC framework of Ruby on Rails.
A reasonable way to implement a voice interaction system is as follows: first implement an ordinary Web application, then replace only the HTML-dependent layer with VoiceXML.
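
The idea of swapping only the view layer can be sketched with two ERB templates fed by the same data. This is an illustration under assumptions, not the Galatea Dialog Studio or Rails implementation: the method names, the `/order` action, and the sample question are hypothetical, and the VoiceXML is generic VoiceXML 2.0 markup.

```ruby
require "erb"

# HTML view: the only layer that depends on HTML.
def html_view
  ERB.new(<<~HTML, trim_mode: "-")
    <form action="/order" method="post">
      <p><%= ERB::Util.html_escape(question) %></p>
      <select name="drink">
      <%- choices.each do |choice| -%>
        <option><%= ERB::Util.html_escape(choice) %></option>
      <%- end -%>
      </select>
    </form>
  HTML
end

# VoiceXML view: same data, different markup for a voice browser.
def voicexml_view
  ERB.new(<<~VXML, trim_mode: "-")
    <?xml version="1.0" encoding="UTF-8"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <form id="order">
        <field name="drink">
          <prompt><%= ERB::Util.html_escape(question) %></prompt>
          <%- choices.each do |choice| -%>
          <option><%= ERB::Util.html_escape(choice) %></option>
          <%- end -%>
        </field>
      </form>
    </vxml>
  VXML
end

# The model/controller side stays the same; only the view changes.
def render(view, data)
  view.result_with_hash(data)
end

data = { question: "Which drink would you like?", choices: %w[coffee tea] }
puts render(html_view, data)
puts render(voicexml_view, data)
```

The `data` hash stands in for whatever the controller would pass to the view; neither template needs to know about the other.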

In the Linux version of Galatea Toolkit, the difficulty of installation and configuration remained a problem. We succeeded in integrating the modules through our original design. At present, however, customization or device setup requires changing many places consistently.

Each layer of the hierarchy needs its own parameters and configuration information to operate, and these are not part of the interaction description itself. For example, speech synthesis needs language processing information and speaker models. Speech recognition has many parameters, including audio input, speech detection, and acoustic models. We want Galatea Toolkit to handle these settings together in one place.
In the end, this too seems to become “the hierarchy of the template.”
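
One way to picture settings handled in one place is to render each module's configuration from a single shared hash. This is only a sketch under assumptions: the file names, configuration keys, and values below are invented for illustration and are not actual Galatea Toolkit or recognizer options.

```ruby
require "erb"

# Shared settings edited in one place. Every key and value is hypothetical.
def shared_settings
  { sample_rate: 16_000, audio_device: "default", speaker_model: "speaker_a" }
end

# One template per module; file names and config keys are invented.
def module_templates
  {
    "recognizer.conf" =>
      "input_device = <%= audio_device %>\nsample_rate = <%= sample_rate %>\n",
    "synthesizer.conf" =>
      "speaker = <%= speaker_model %>\nsample_rate = <%= sample_rate %>\n"
  }
end

# Render every module's configuration text from the same settings hash.
def render_configs(settings, templates)
  templates.transform_values do |text|
    ERB.new(text).result_with_hash(settings)
  end
end

render_configs(shared_settings, module_templates).each do |name, body|
  puts "# #{name}", body
end
```

Changing `sample_rate` once in the shared hash changes it consistently in every rendered file, which is the kind of change that currently has to be made by hand in many places.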