VOL. 5, ISSUE 4 (2019)

Spoken content metadata extraction using speech and speaker recognition approaches

Authors

Pardeep Sangwan

Abstract

Collecting information plays an important role in processing massive amounts of data for different purposes, and automatic extraction of information from audio streams remains one of the open challenges in this area. The present work describes a method for extracting metadata that combines speech transcription with speaker identification, using Hidden Markov Model (HMM) tied-state cross-word triphone acoustic models, Mel-Frequency Cepstral Coefficients (MFCC), and N-gram language modelling. The system transcribes speech with a Catalan-language recognizer. In addition, speaker diarization is performed using HMM-based segmentation and Perceptual Linear Prediction (PLP) feature extraction. For multimedia content, both the speech-to-text transcription and the speaker diarization can serve as descriptive information. The metadata is stored using MPEG-7 to make indexing and retrieval more versatile and effective. The system was successfully tested on recordings of the Catalan Parliament's plenary sessions.
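As a rough illustration of how a transcript and a diarization result might be combined into descriptive metadata, the sketch below labels each recognized word with the speaker turn that overlaps it most and then groups consecutive words by speaker. It is not the article's implementation: the `Word` and `Turn` types, the function names, the timestamps, and the speaker labels are all hypothetical, and the MPEG-7 serialization step is left out.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """One diarization segment attributed to a speaker label (hypothetical type)."""
    speaker: str
    start: float
    end: float


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most."""
    labelled = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            overlap = min(w.end, t.end) - max(w.start, t.start)
            if overlap > best_overlap:
                best, best_overlap = t.speaker, overlap
        labelled.append((best, w.text))  # best stays None if no turn overlaps
    return labelled


def group_by_turn(labelled):
    """Collapse consecutive words from the same speaker into one segment."""
    segments = []
    for speaker, text in labelled:
        if segments and segments[-1][0] == speaker:
            segments[-1][1].append(text)
        else:
            segments.append((speaker, [text]))
    return [(speaker, " ".join(texts)) for speaker, texts in segments]


# Made-up example data, for illustration only.
words = [Word("bon", 0.0, 0.3), Word("dia", 0.3, 0.6), Word("gracies", 1.2, 1.7)]
turns = [Turn("spk1", 0.0, 1.0), Turn("spk2", 1.0, 2.0)]
segments = group_by_turn(assign_speakers(words, turns))
print(segments)
```

Each resulting `(speaker, text)` pair could then be written out as one entry in whatever metadata format is used for indexing and retrieval.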

Pages: 43-46

How to cite this article:

Pardeep Sangwan. "Spoken content metadata extraction using speech and speaker recognition approaches". International Journal of Research in Advanced Engineering and Technology, Vol 5, Issue 4, 2019, Pages 43-46.
