Japanese TTS for NVDA

Objectives
Many of the tools that visually impaired people use to access a PC and the Web are currently commercial software. This causes the following problems.

  • Cost: the tools are a financial burden.
  • It is difficult for the tools to follow changes in users' needs and OS environments flexibly and rapidly.
  • Needs cannot be shared between Web developers and people with visual disabilities. Speech-enabled tools are useful for verifying Web accessibility, but many Web developers do not use them because the software must be paid for.

In recent years, NVDA, an open-source screen reader for Windows, has been attracting attention.

NVDA is released under GPL2. Volunteers are carrying out its internationalization. Support for Windows Vista was added quickly. Although it is free software, it is easy to use, and it shows much consideration for its users. The major functions are implemented in Python, so it is easy to take part in development and improvement.
An open-source Japanese speech synthesis engine that is compatible with GPL2 is therefore desirable. With such an engine, Japanese speech synthesis could be distributed together with the original NVDA, and the benefits and impact would be very large. Since 2007 we have been investigating whether GalateaTalk (gtalk), an open-source Japanese TTS, can be used for this purpose.

Investigation of basic policy
Screen readers place particular demands on speech output, so sophisticated audio device handling is necessary. The following happens frequently: speech output is stopped partway through, and reading of another character string starts immediately.
A Win32 DLL version would be desirable, but it would take time to build.
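The interruption pattern described here (speech stopped partway, then another string started at once) can be illustrated with a small Python sketch. This is not NVDA or GalateaTalk code. The class name InterruptibleSpeaker, the play_chunk callback, and the idea of playing text in fixed-size chunks are all assumptions made for illustration; a real engine would check for cancellation between audio buffers rather than between text chunks.

```python
import threading


class InterruptibleSpeaker:
    """Hypothetical sketch: play text chunk by chunk, stopping when superseded."""

    def __init__(self, play_chunk):
        # play_chunk is a caller-supplied function standing in for audio output.
        self._play_chunk = play_chunk
        self._lock = threading.Lock()
        self._generation = 0  # incremented by every cancel() or new speak()

    def cancel(self):
        """Invalidate whatever utterance is currently being played."""
        with self._lock:
            self._generation += 1

    def speak(self, text, chunk_size=4):
        """Play text; return False if it was interrupted before finishing."""
        with self._lock:
            self._generation += 1  # starting to speak supersedes older speech
            my_generation = self._generation
        for start in range(0, len(text), chunk_size):
            with self._lock:
                if my_generation != self._generation:
                    return False  # cancelled or superseded: stop partway
            self._play_chunk(text[start:start + chunk_size])
        return True
```

Because every call to speak() bumps the generation counter, a new utterance started from another thread makes the older one stop at its next chunk boundary, which is the behavior a screen reader needs when the user moves focus quickly.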

Examination 1
Using GalateaTalk for SAPI5, which was developed experimentally at ISTC, we verified operation with NVDA. SAPI5 has an advantage for implementation: the engine does not need to control the audio device itself. The present GalateaTalk for SAPI does not fully conform to SAPI, and some functions in the engine need to be added or revised. For the time being, however, these gaps are to be covered on the Python side.
sapi5gtalk.py was written based on synthDrivers/sapi5.py in NVDA. We added processing such as modified handling of XML markup and conversion of character strings. In this way a provisional speech driver was made.
We aimed to make use of both the Python scripts of NVDA and the VC++ implementation of GalateaTalk for SAPI, and to produce a stable version.
We tested reading with Firefox and found strings that could not be read and points where the engine crashed. At first we dealt with these on the Python side. We decided to settle on a policy and to revise GalateaTalk itself in the future, and we investigated the policy for revising GalateaTalk at the same time.
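As an illustration of the kind of string conversion that can be done on the Python side before text reaches the engine, here is a minimal sketch. It is not the actual sapi5gtalk.py code. It assumes that the engine parses its input as XML, so that literal markup characters must be escaped, and that control characters are among the inputs that cause trouble; the function name prepare_text is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Control characters other than tab, newline and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def prepare_text(text):
    """Hypothetical pre-processing step applied before text is sent to the engine."""
    text = _CONTROL_CHARS.sub(" ", text)  # replace characters the engine may not accept
    text = escape(text)                   # turn &, < and > into XML entities
    return " ".join(text.split())         # collapse runs of whitespace


print(prepare_text("a < b & c"))
```

Keeping this step in Python makes it easy to add a new rule whenever another problematic string is found, before deciding which rules belong in the engine itself.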

Verification of stability
Log file handling was added to the speech engine so that memory leaks and API-related crashes can be investigated. The Yahoo! Japan page has become readable, but NVDA crashes (together with Firefox?) within a few minutes. There seems to be a problem with memory management on the GalateaTalk side. Logging of events such as the start of speech has been implemented, and logging could also be added to memory allocations.
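The logging described here was added to the speech engine itself. A comparable event log on the Python driver side could be sketched with the standard logging module as follows. The logger name "sapi5gtalk", the message texts, and the helper names configure_log and logged_speak are assumptions for illustration, not part of the actual driver.

```python
import logging

logger = logging.getLogger("sapi5gtalk")  # hypothetical logger name


def configure_log(path):
    """Send debug-level events to a log file and return the handler."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


def logged_speak(speak, text):
    """Call speak(text), logging the start, the end, and any exception."""
    logger.debug("speak start: %d chars", len(text))
    try:
        speak(text)
    except Exception:
        logger.exception("speak failed")  # records the traceback as well
        raise
    logger.debug("speak end")
```

If the last entry in the file is a "speak start" line with no matching "speak end", the crash most likely happened inside that call, which narrows down the string that triggered it.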

Functions
GalateaTalk's markup does not conform to SAPI5 for control of speech rate and voice type. Conversion to the corresponding GalateaTalk functions can be done in Python.
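A conversion of this kind might look like the sketch below, which maps a SAPI5 rate value (an integer from -10 to 10) to a speed multiplier and wraps the text in engine markup. The exponential mapping (10 steps corresponding to a factor of three) is an assumption, and the RATE tag in the default template is only a placeholder: the exact markup that GalateaTalk expects should be checked against its documentation. Both function names are hypothetical.

```python
def sapi_rate_to_speed(rate):
    """Map a SAPI5 rate (-10..10) to a speed multiplier (assumed mapping)."""
    rate = max(-10, min(10, int(rate)))  # clamp to the SAPI5 range
    return 3.0 ** (rate / 10.0)          # 0 -> 1.0, 10 -> 3.0, -10 -> about 0.33


def wrap_with_rate(text, rate,
                   template='<RATE SPEED="{speed:.2f}">{text}</RATE>'):
    """Wrap text in rate markup; the default template is a placeholder."""
    return template.format(speed=sapi_rate_to_speed(rate), text=text)


print(wrap_with_rate("abc", 0))
```

Passing the template as a parameter keeps the rate arithmetic separate from the markup, so only the template needs to change once the engine's actual tag syntax is confirmed.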

Performance
There are malfunctions peculiar to older versions of gtalk. The start of an utterance is late; that is, the beginning of the speech contains a needlessly long silence. When there is an unknown word, the reading is dropped character by character.

Examination 2
We have begun to investigate a new implementation that uses the Win32 command-line version and socket communication.
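To show the general shape of such a design, the sketch below sends one line of text over a TCP socket and reads back a one-line reply. Everything about the protocol is an assumption made for illustration: the real command-line engine may use a different framing, encoding, and reply format. The function start_stub_engine is only a stand-in server so that the example can run without the engine; send_text is the hypothetical client side that a Python driver might contain.

```python
import socket
import threading


def start_stub_engine(host="127.0.0.1"):
    """Start a stand-in engine that acknowledges one line, then exits."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once():
        conn, _ = server.accept()
        with conn, conn.makefile("r", encoding="utf-8") as reader:
            line = reader.readline().rstrip("\n")
            # Reply with the number of characters received (made-up protocol).
            conn.sendall(f"OK {len(line)}\n".encode("utf-8"))
        server.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


def send_text(host, port, text):
    """Send one UTF-8 line to the engine and return its one-line reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((text + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()
```

Running the engine as a separate process means that a crash in the engine does not bring down the screen reader, at the cost of defining and maintaining a small protocol like the one assumed here.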

Installation
Consistency with the installer of the original NVDA should be considered.

Future plan
We will work on the Python side in cooperation with a Python expert. We will prepare a set of files so that others can try out the present state.

License
It may become a special version of the TTS intended for use in NVDA. For the parts that cannot be released under GPL2, we are considering alternative means.

Personal impression
At first, it felt strange that a screen reader would be implemented in Python. As the work described above progressed, Python came to seem a very good way to implement a screen reader. For example, string processing can be implemented in Python first and later moved to C++ code if performance requires it.

Published by nishimotz

A freelance consultant. doctor of engineering. speech interface, open-source software, accessibility, #nvdajp. Facebook: http://bit.ly/ckUk20