Tuesday, February 19, 2008

Some possible Kannada problems on the internet

1. Linguistic search for keywords
1.1 Subproblem - identifying proper nouns
1.2 Subproblem - transliteration ambiguity
2. Font rendering technology - improvement and standardization
2.1 Encoding standardization

What already exists -
- Searching Kannada documents for English terms (by transliteration) -> Google (http://www.google.com/search?hl=kn)
- Searching Kannada documents for Kannada terms (in Unicode) -> Google, Wikipedia (kn.wikipedia.org)
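
Searching in Unicode only works if both the query and the document really are Unicode Kannada. A minimal sketch of how one might check that, assuming Python and its standard `unicodedata` module; the function names are made up for illustration. The Kannada block in Unicode is U+0C80 to U+0CFF.

```python
import unicodedata

# Kannada Unicode block: U+0C80 to U+0CFF
KANNADA_START, KANNADA_END = 0x0C80, 0x0CFF


def is_kannada_char(ch: str) -> bool:
    """True if the character's code point lies in the Kannada block."""
    return KANNADA_START <= ord(ch) <= KANNADA_END


def kannada_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are in the Kannada block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_kannada_char(c) for c in chars) / len(chars)


def normalize_query(text: str) -> str:
    """Trim and apply NFC normalization so equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", text.strip())
```

A page with a low `kannada_ratio` that still displays as Kannada is probably not Unicode Kannada, and a Unicode search would miss it.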

Friday, February 15, 2008

Machine translation applications - first findings

Users
Home users
  1. Automatically translate web pages
  2. Translate chat
Organizational users (companies, government)
  1. Classifying documents as “needing human translation” or “not”; estimating effort needed for translation
  2. Localization support [e.g. for instruction manuals]
  3. Translation of email, documents, reports etc.
Professional users (translators)
  1. Support tools for translators who do post-editing
Unclassified
  1. Spoken language translation (where is it used?)
Languages
  1. From European languages to Chinese/Japanese/Arabic and vice versa.
  2. From one European language to another
  3. Other languages, e.g. Korean
Comment
In general, even the best MT systems in use today are mainly useful for getting the general idea/gist of a text. Correct grammar and preservation of meaning cannot be guaranteed. The main use cases for such limited functionality could be:
  • Automatic translation of websites, but only where the objective is to do something on the website [e.g. booking tickets/hotel rooms, shopping for goods which shoppers already know about] or to get some information. It is not suited to reading articles or literary works. The sentences should be short (and hence easier to translate). [e.g. titles of menus, small descriptions of the services offered by the site, news snippets]
  • Tools that assist human translators [e.g. localization support tools].
  • Chatting
  • Online service for non-expert users – for applications similar to the above, where the text sits in some other system that provides no translation feature [e.g. a chat client that does not provide translation].
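
The point about short sentences can be turned into a rough filter. The sketch below, in Python, splits text into sentences with a simple regular expression and keeps only the short ones. The 12-word threshold and the function names are assumptions for illustration, not something measured; real sentence splitting would need more care.

```python
import re

MAX_WORDS = 12  # assumed threshold, chosen only for illustration


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def mt_friendly(text, max_words=MAX_WORDS):
    """Return the sentences short enough to be plausible MT candidates."""
    return [s for s in split_sentences(text) if len(s.split()) <= max_words]


def short_sentence_fraction(text, max_words=MAX_WORDS):
    """Share of sentences under the threshold; 0.0 for empty input."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return len(mt_friendly(text, max_words)) / len(sentences)
```

A low `short_sentence_fraction` could be one crude signal that a document belongs in the "needing human translation" pile, though that is only a guess at a heuristic.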
Applications of MT system components
  1. Spell-check, grammar-check
  2. Dictionary/thesaurus – mono and bi-lingual
  3. Multi-lingual search (thematic search, query translation)
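
Query translation in item 3 can be pictured as a word-by-word dictionary lookup. This is a toy sketch in Python, assuming a tiny hand-made English to Kannada lexicon (`TOY_EN_KN` and `translate_query` are hypothetical names); a real system would need morphology, multi-word terms and disambiguation.

```python
# Toy English -> Kannada lexicon, for illustration only
TOY_EN_KN = {
    "water": "ನೀರು",
    "house": "ಮನೆ",
    "book": "ಪುಸ್ತಕ",
}


def translate_query(query, lexicon=TOY_EN_KN):
    """Replace each known word with its translation; keep unknown words as-is."""
    return " ".join(lexicon.get(word.lower(), word) for word in query.split())
```

Unknown words pass through unchanged, so the translated query can still match documents that contain the original term (for example a proper noun).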
Applications where MT is a component
  1. Speech translation
  2. OCR