Human Auditory Processing and Speech Recognition – tcworld India 2016

I’m attending tcworld India 2016 in Bangalore. Pavithra Garre gave a presentation entitled “Human Auditory Processing and Speech Recognition—Potential Latencies and Benefits for Documentation”. These are my notes from the session. All credit goes to Pavithra, and any mistakes are my own.

Pavithra Garre is an engineer in design technology at Samsung Electronics in South Korea. She started by showing us a video clip about communication as an innate human ability, and about the vision of interacting with computers via speech recognition, and the evolution of speech recognition technology.

Pavithra’s presentation was very interactive. She asked questions and chatted to the audience throughout. The presentation covered the layers of speech recognition architecture, the modes of speech recognition, speech identifiers and tagging, CMS interpretation and custom delivery.

Pavithra described a three-layer architecture:

  • Speech recognition: There are several modes of speech recognition: converting digital audio into simpler acoustic forms; matching units of speech; a complex lexical decoding system based on pattern matching; applying grammar, as in predictive typing; and phoneme identification. The technology also faces challenges, such as reducing background noise, the size of the data gathered (and the data compression needed to reduce that size), and energy consumption.
  • Tagging the different elements of speech to present to the CMS: The software needs to identify what the person is talking about, and tag each element appropriately. Once the speech is tagged, it becomes data. Examples of tags may be a form of XML, or VTML, or a more complex tagging format like ID3 or ID3V2Easy.
  • Documentation in a database: Content is information plus data. The indexed tag and associated content are combined to form or retrieve a document, in what Pavithra calls “CMS interpretation”.
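To make the second and third layers concrete, here is a minimal sketch of tagging recognised speech and then indexing it for retrieval. This is my own illustration, not code from the presentation: the element names (`utterance`, `speaker`, `topic`) and the tiny in-memory index standing in for a CMS are assumptions.

```python
# Sketch of the "tagging" and "CMS interpretation" layers described above.
# All element names and the in-memory index are illustrative assumptions.
import xml.etree.ElementTree as ET

def tag_utterance(speaker, topic, text):
    """Wrap a recognised utterance in a simple XML tag structure.
    Once tagged like this, the speech 'becomes data'."""
    utterance = ET.Element("utterance",
                           attrib={"speaker": speaker, "topic": topic})
    utterance.text = text
    return utterance

def index_utterances(utterances):
    """'CMS interpretation': index tagged speech by topic, so the
    indexed tag plus its content can be combined into a document."""
    index = {}
    for u in utterances:
        index.setdefault(u.get("topic"), []).append(u.text)
    return index

tagged = [
    tag_utterance("agent", "billing", "Your invoice was sent on Monday."),
    tag_utterance("customer", "billing", "I have not received it yet."),
]
index = index_utterances(tagged)
print(index["billing"])
```

Retrieving `index["billing"]` pulls back both tagged utterances, which is the kind of lookup that would support the help-desk dispute-resolution use case mentioned later in the session.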

Some well-known examples of speech recognition software:

  • Siri by Apple
  • Genie by Microsoft
  • Google Speech
  • Dragon
  • and more

Where can we use this technology and the voice bank containing the derived content?

  • Marketing agility
  • Big data and analytics
  • Resolving disputes about customer interactions in a help desk (this suggestion came from the audience)
  • Better performance

Pavithra also described things you need to take into account, such as data volume and data migration.

There was a lively, engaged discussion at the close of the session. Thanks, Pavithra, for an interesting presentation!

About Sarah Maddox

Technical writer, author and blogger in Sydney

Posted on 25 February 2016, in Tekom tcworld.

  1. Pavithra Garre

    Hi Sarah, thank you very much for this wonderful brief-up. Sorry for noticing this late.
    Looking forward to more interactions and discussions.
    Pavithra G
