ICCHP 2010 talk

July 17th, 2010 by nishimotz 1 comment »

My talk on audio CAPTCHA at ICCHP finished yesterday in Vienna.

I enjoyed using Twitter during the conference. Thank you.

Dear new friends,
Usually I tweet in Japanese with the @nishimotz account. I prefer using Facebook for English conversation. So, please feel free to unfollow me.
Both Facebook and the @nishimotz account on Twitter can be used for English conversation.
I will tweet in Japanese with the @nishimtz account.

My slides and my tweets from ICCHP are as follows:

Takuya Nishimoto, Takayuki Watanabe: The Evaluations of Deletion-Based Method and Mixing-Based Method for Audio CAPTCHAs.


DMCPP: development of another dialog manager

June 16th, 2010 by nishimotz No comments »

An experimental dialog manager for Galatea on Linux, written in C++ and using lib-julius and OpenCV, is under development.
The project focuses on low-level multimodal event handling, while Galatea Dialog Studio focuses on higher-level dialog management using VoiceXML.
Although the current version is just a skeleton of an application, I would like to ask developers for feedback.

DMCPP in English
DMCPP in Japanese

Please join the mailing lists for discussion.

pyAA

February 18th, 2010 by nishimotz No comments »

To test the MSAA-related features of the Microsoft Japanese Input Method Editor, MS-IME 2002, on the Japanese version of Windows, I am working with pyAA.
This is preliminary work toward localizing NVDA for Japanese users.

I tried to adapt the original pyAA to Python 2.6.x.
First I obtained the source code (of the simpler branch) from the CVS repository, then I built it with Visual Studio 2008 and SWIG. I also modified the code so that the Value property can be accessed correctly in multibyte character encoding environments.

The work is proceeding well with it at the moment.
At the previous meeting of the NVDAjp project, we added some code to NVDA and verified that the WinEvent of MS-IME 2002 can be captured and that the Value property can be accessed using the IAccessible interface.

Related pages in Japanese (not yet translated): pyaa and nvdajp

Note on Feb 27: I created a GitHub repository for pyaa.

Voice interface and effectiveness

December 6th, 2009 by nishimotz 1 comment »

One of my colleagues gave a presentation at the Human-Agent Interaction Symposium in Tokyo yesterday.

The assumption is that human-like spoken dialogs are highly effective. Our proposal is to use reinforcement learning to acquire a strategy for responding quickly to overlapped utterances, interruptions, or gestures during spoken dialogs between human and machine. Although the research is still at an early stage, we hope something like mind-reading will become possible; in other words, users of spoken dialog systems would not need to say an utterance from beginning to end.

orpheus_tw

November 28th, 2009 by nishimotz No comments »

I am developing a service called orpheus_tw.
Japanese songs composed by the automatic composition system “Orpheus” (a research project at the University of Tokyo) can be shared with the followers of a Twitter account @orpheus_tw.

This service is built with Ruby on Rails and hosted on Heroku. The additional “delayed job” option is also used.

Research on Spoken Dialogue Agent

November 28th, 2009 by nishimotz No comments »

The upcoming publications at Human Agent Interaction Symposium (HAI2009) are as follows:

  • Masayuki Nakazawa, Takuya Nishimoto, Shigeki Sagayama:
    Title: Behavior Generation for Spoken Dialogue Agent by Dynamical Model
    Abstract: For spoken dialog systems with anthropomorphic agents, it is important to give humans a natural impression and a sense of real presence. For this purpose, head and gaze control of the agent that is consistent with the spoken dialog is expected to be effective. Our approach is based on the following hypotheses: 1) An agent conducts the dialog concurrently with intentional control of the head and gaze to retrieve information and to give signals. 2) The movement of the head and eyeballs is based on mathematical models. To achieve these purposes, we have adopted a mathematical model for the movements of the agent.
    Formulating the movements with a mathematical model has several merits: a) the parameters can reflect subjectivity, so various movements can be generated from the model, b) the movements of the agent can reflect personality, c) the continuous movements of the agent can be controlled mathematically. In this paper, we propose a mathematical model based on a second-order system, compare it with a linear model, and show its superiority.
  • Di Lu, Masayuki Nakazawa, Takuya Nishimoto, Shigeki Sagayama:
    Title: Barge-in Control with Reinforcement Learning for Efficient Multi-modal Spoken Dialogue Agent
    Abstract: To make the dialogue between the agent and the user smoother, we propose a multi-modal user simulator that could be widely used in real-time agent control for multi-modal dialog agents with reinforcement learning. We also implemented a prototype system that utilizes the result of the reinforcement learning.

Date: Fri, Dec 4 – Sat, Dec 5, 2009

Place: Tokyo Institute of Technology

Language: Japanese
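
The first abstract above compares a second-order system with a linear model for head movement. The following is only a minimal sketch of that general idea, not the model from the paper: it assumes a standard damped second-order system driven toward a target angle and compares it with a constant-velocity (linear) movement. All names and parameter values (omega, zeta, the 30-degree target) are hypothetical.

```python
def second_order_trajectory(target, omega, zeta, dt, steps):
    """Simulate x'' = omega^2 * (target - x) - 2 * zeta * omega * x'.

    omega is the natural frequency and zeta the damping ratio
    (both are illustrative assumptions, not values from the paper).
    """
    x, v = 0.0, 0.0
    path = []
    for _ in range(steps):
        a = omega * omega * (target - x) - 2.0 * zeta * omega * v
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then position, using the new velocity
        path.append(x)
    return path


def linear_trajectory(target, duration, dt, steps):
    """Constant-velocity movement that reaches the target after `duration`."""
    path = []
    for k in range(1, steps + 1):
        t = k * dt
        path.append(target * min(t / duration, 1.0))
    return path


# Hypothetical scenario: turn the head by 30 degrees.
smooth = second_order_trajectory(target=30.0, omega=8.0, zeta=1.0, dt=0.01, steps=200)
lively = second_order_trajectory(target=30.0, omega=8.0, zeta=0.3, dt=0.01, steps=200)
linear = linear_trajectory(target=30.0, duration=0.5, dt=0.01, steps=200)
```

In this sketch the two parameters play the role the abstract describes: changing zeta or omega yields visibly different movements from the same model (a smooth approach versus an overshooting one), whereas the linear movement starts and stops abruptly.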


Research on speech CAPTCHA systems

October 8th, 2009 by nishimotz No comments »

I am preparing the following presentation for the WIT/SP meeting.

Title: The comparison between the deletion-based methods and the mixing-based methods for safe speech CAPTCHA systems

Authors: Takuya NISHIMOTO, Hitomi MATSUMURA and Takayuki WATANABE

Abstract: Speech-based CAPTCHA systems, which distinguish between software agents and human beings, are especially important for persons with visual disabilities. The popular approach is based on mixing-based methods, which use mixed sounds of target speech and noise. We have proposed a deletion-based method which uses the phonemic restoration effect. Our approach can control the difficulty of tasks simply through the masking ratio. Our design principle for CAPTCHAs holds that tasks should be chosen so that the difference in performance between machines and human beings is as large as possible. In this paper, we give some hypotheses on the differences between the deletion-based method and the mixing-based methods. We also show a plan of experiments which compare the automatic speech recognition performance, speech intelligibility, and mental workload of these two approaches.

Date: Thu, Oct 29 – Fri, Oct 30, 2009

Place: ASPAM (Aomori City, Japan)

Language: Japanese
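
To make the difference between the two approaches in the abstract concrete, here is a minimal sketch. It is an illustration under my own assumptions, not the procedure used in the paper: deletion is modeled as periodically silencing part of each fixed-length segment, and mixing as adding uniform random noise. The function names, the segment length, and the sine tone standing in for speech are all hypothetical.

```python
import math
import random


def delete_segments(samples, masking_ratio, segment_len):
    """Deletion-based sketch: silence the first part of every segment.

    masking_ratio is the fraction of each segment replaced by zeros.
    """
    if not 0.0 <= masking_ratio <= 1.0:
        raise ValueError("masking_ratio must be between 0 and 1")
    if segment_len < 1:
        raise ValueError("segment_len must be at least 1")
    masked_len = int(round(segment_len * masking_ratio))
    out = list(samples)
    for start in range(0, len(out), segment_len):
        end = min(start + masked_len, len(out))
        for i in range(start, end):
            out[i] = 0.0
    return out


def mix_with_noise(samples, noise_level, seed=0):
    """Mixing-based sketch: add uniform noise to every sample."""
    rng = random.Random(seed)
    return [s + rng.uniform(-noise_level, noise_level) for s in samples]


# A 440 Hz tone stands in for the target speech (hypothetical input).
rate = 8000
speech = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

deleted = delete_segments(speech, masking_ratio=0.5, segment_len=800)
mixed = mix_with_noise(speech, noise_level=0.3)
```

In this sketch, raising masking_ratio removes more of each segment, which corresponds to the single difficulty control mentioned in the abstract; the mixing variant instead leaves every sample present but degraded.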

Galatea release announcement

October 7th, 2009 by nishimotz No comments »

The latest Galatea Toolkit (beta version) has been released at:

http://en.sourceforge.jp/projects/galatea/releases/

Please note that the current version is for Japanese conversations. I would like to discuss plans for internationalization of these tools. The English documents are not fully checked. Please send comments and suggestions to the galatea-i18n mailing list, which is hosted at sourceforge.jp.

P.S. Galatea Toolkit video demos are now available on YouTube.

A multimodal interactive system based on hierarchical Model-View-Controller architecture

July 29th, 2009 by nishimotz No comments »

Multimodal interactive systems are expected to be used widely. To realize life-like agents or humanoid robots, a flexible architecture for integrating software modules is necessary. Many frameworks have been proposed:

  • Joseph Polifroni, Stephanie Seneff. 2000. Galaxy-II as an Architecture for Spoken Dialogue Evaluation. Proceedings of Second International Conference on Language Resources and Evaluation, pp.42-50.
  • Yosuke Matsusaka, Kentaro Oku, Tetsunori Kobayashi. 2003. Design and Implementation of Data Sharing Architecture for Multi-Functional Robot Development. Trans. of IEICE, Vol.J86-D1, No.5, pp.318-329 (in Japanese).
  • SRI International, The Open Agent Architecture. http://www.ai.sri.com/~oaa/

In this post, the following topics related to Galatea Toolkit are discussed:

  1. A developer should be able to easily customize a parameter that influences many modules within the system.
  2. A developer who has no knowledge of speech technology should be able to develop spoken dialog applications efficiently.


O’ra-be: A support system for broadcasting radio programs

May 20th, 2009 by nishimotz No comments »

Radio is a familiar and important medium, especially for elderly people and persons with visual impairments. The number of community broadcasting stations is also increasing, because they are useful in times of disaster.
Producing regional information programs, however, takes considerable time and effort for news gathering.
From this viewpoint, we developed O’ra-be, a system for broadcasting voice messages posted by telephone, which is useful for small radio broadcasting stations.

The development of O’ra-be was initially supported by the Mitou Software Development Program of IPA (Information-technology Promotion Agency), Japan, in 2005.

At the IPAX 2009 conference, to be held in Tokyo on May 27-28, 2009, the first public demonstration of the O’ra-be system will be given.

Our project did not pursue the patent in Japan, and the right was abandoned. The source code and a public application server are going to be prepared for open use (please see my page on GitHub later).

Development of the system is expected to progress further as we introduce Ruby on Rails into the implementation of the database and Web application. Moreover, by shifting to an open-source project, we will invite others to participate in development.

Given current economic conditions, the “down-sizing” of broadcasting is expected to advance in every setting, including individual and small broadcasting stations.
Personal computers and the Internet have created a world in which anyone can enjoy a high-quality computing environment at low cost. In the same way, a broadcast support system is expected to become important infrastructure for society and daily life.
O’ra-be specializes in a user interface for editing contributed files during live broadcasting, and it can be extended to handle video sources. Given the recent spread of podcasts and video sharing sites, cooperation with existing services may come into view in the future.

CastStudio
