VOL. 5, ISSUE 4 (2019)
Spoken content metadata extraction using speech and speaker recognition approaches
Authors
Pardeep Sangwan
Abstract
The collection of information plays an important role today in processing massive amounts of data for different purposes. Automatic extraction of information from audio streams is one of the open challenges in this area. The present work describes a method for extracting metadata by combining speech transcription and speaker identification, using Hidden Markov Model (HMM) acoustic models with tied-state cross-word triphones, Mel-Frequency Cepstral Coefficients (MFCC), and N-gram language modelling. The system transcribes speech with a Catalan-language recognizer. In addition, speaker diarization is performed using HMM-based segmentation and Perceptual Linear Prediction (PLP) features. Both the speech-to-text transcripts and the speaker diarization output can serve as descriptive metadata for multimedia content. The metadata is stored using MPEG-7 to make indexing and retrieval more versatile and effective. The system was successfully tested on recordings of the plenary sessions of the Catalan Parliament.
Pages: 43-46
How to cite this article:
Pardeep Sangwan. "Spoken content metadata extraction using speech and speaker recognition approaches". International Journal of Research in Advanced Engineering and Technology, Vol 5, Issue 4, 2019, Pages 43-46.