Feature Extraction

Investigations on Inter-Speaker Variability in the Feature Space

Authors:

Reinhold Haeb-Umbach

Volume: 1, Page (NA) Paper number 1513

Abstract:

We apply Fisher variate analysis to measure the effectiveness of speaker normalization techniques. A trace criterion, which measures the ratio of the variation due to different phonemes to the variation due to different speakers, serves as a first assessment of a feature set without the need for recognition experiments. Using this measure and recognition experiments, we demonstrate that cepstral mean normalization also has a speaker normalization effect, in addition to the well-known channel normalization effect. Similarly, vocal tract normalization (VTN) is shown to remove inter-speaker variability. For VTN we show that normalization on a per-sentence basis performs better than normalization on a per-speaker basis. Recognition results are given on the Wall Street Journal and Hub-4 databases.
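
The idea of scoring a feature set by a scatter ratio can be illustrated with a small sketch. The abstract does not give the exact form of the trace criterion, so the code below assumes a diagonal-covariance simplification: for each feature dimension it divides the between-phoneme scatter by the within-phoneme scatter (which, after pooling frames of the same phoneme across speakers, reflects speaker variation) and sums the ratios. The function names and the toy data layout are illustrative, not taken from the paper.

```python
from collections import defaultdict


def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean from every feature vector."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def trace_criterion_diag(features, phoneme_labels):
    """Sum over dimensions of between-class / within-class scatter.

    Assumption: diagonal scatter matrices, so the trace of
    inv(S_within) * S_between reduces to a sum of per-dimension ratios.
    The within-class scatter must be non-zero in every dimension.
    """
    dim = len(features[0])
    n = len(features)
    by_class = defaultdict(list)
    for x, label in zip(features, phoneme_labels):
        by_class[label].append(x)
    global_mean = [sum(x[d] for x in features) / n for d in range(dim)]
    between = [0.0] * dim
    within = [0.0] * dim
    for vecs in by_class.values():
        mean = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for d in range(dim):
            between[d] += len(vecs) * (mean[d] - global_mean[d]) ** 2
            within[d] += sum((v[d] - mean[d]) ** 2 for v in vecs)
    return sum(b / w for b, w in zip(between, within))
```

A larger value means phoneme classes are better separated relative to the remaining spread, so comparing the score before and after a normalization step gives a quick, recognizer-free indication of whether that step removed speaker-dependent variation.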

IC991513.PDF (From Author) IC991513.PDF (Rasterized)



LSP Weighting Functions Based on Spectral Sensitivity and Mel-Frequency Warping for Speech Recognition in Digital Communication

Authors:

Seung Ho Choi, Dept. of Electrical Eng., Korea Advanced Institute of Science and Technology, 373-1 Kusong-Dong, Yusong-Ku, Taejon 305-701, Korea (Korea)
Hong Kook Kim, AT&T Labs Research, Rm. E148, 180 Park Avenue, Florham Park NJ 07932, USA (USA)
Hwang Soo Lee, Central Research Laboratory, SK Telecom, 58-4 Hwaam-Dong, Yusong-Gu, Taejon 305-348, Korea (Korea)

Volume: 1, Page (NA) Paper number 1331

Abstract:

In digital communication networks, a speech recognition system extracts feature parameters after reconstructing speech signals. In this paper, we consider an approach that incorporates speech coding parameters directly into a speech recognizer. Most speech coders employ line spectrum pairs (LSPs) to represent spectral parameters. We introduce weighted distance measures to improve the recognition performance of an LSP-based speech recognizer. Experiments on speaker-independent connected-digit recognition showed that weighted distance measures provide better recognition accuracy than unweighted distance measures. Compared with a conventional method employing mel-frequency cepstral coefficients, the proposed method achieved higher recognition accuracy.
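
A weighted distance between LSP vectors can be sketched as follows. The abstract does not specify the weighting functions (the title indicates they are derived from spectral sensitivity and mel-frequency warping), so this sketch substitutes a different, well-known heuristic as a stand-in: inverse-harmonic-mean weights, which emphasize LSPs that lie close to their neighbours. All function names are hypothetical.

```python
import math


def ihm_weights(lsp):
    """Stand-in weights (not the paper's): inverse harmonic mean of the
    gaps to the neighbouring LSP frequencies, in radians on (0, pi)."""
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_distance(test, ref, weights):
    """Weighted squared Euclidean distance between two LSP vectors."""
    return sum(w * (t - r) ** 2 for w, t, r in zip(weights, test, ref))
```

With all weights set to 1.0 the measure reduces to the unweighted squared Euclidean distance, which is the baseline the weighted measures are compared against.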

IC991331.PDF (From Author)



Two-Dimensional Multi-Resolution Analysis of Speech Signals and Its Application to Speech Recognition

Authors:

Chun-Ping Chan, Department of Electronic Engineering, The Chinese University of Hong Kong (Hong Kong)
Yiu-Wing Wong, Department of Electronic Engineering, The Chinese University of Hong Kong (Hong Kong)
Tan Lee, Department of Electronic Engineering, The Chinese University of Hong Kong (Hong Kong)
Pak-Chung Ching, Department of Electronic Engineering, The Chinese University of Hong Kong (Hong Kong)

Volume: 1, Page (NA) Paper number 2261

Abstract:

This paper describes a novel approach that uses multi-resolution analysis (MRA) for automatic speech recognition. Two-dimensional MRA is applied to the short-time log spectrum of the speech signal to extract the slowly varying spectral envelope that contains the most important articulatory and phonetic information. After passing through a standard cepstral analysis process, the MRA features are used for speech recognition in the same way as conventional short-time features such as MFCCs and PLPs. Preliminary experiments on both clean connected speech and noisy telephone conversational speech show that the use of MRA cepstra results in a significant reduction in insertion errors when compared with MFCCs.
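
As a rough illustration of extracting a slowly varying envelope from a time-frequency grid, the sketch below keeps only the low-pass (approximation) branch of a two-dimensional decomposition. The abstract does not name the wavelet or the number of levels, so the Haar-style 2×2 block average, the function names, and the grid layout (rows as frequency bins, columns as frames) are all assumptions.

```python
def haar_approx_2d(grid):
    """One level of a scaled 2-D Haar approximation: average each 2x2 block.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def mra_envelope(log_spectrogram, levels):
    """Apply the approximation step repeatedly to smooth in time and frequency."""
    out = log_spectrogram
    for _ in range(levels):
        out = haar_approx_2d(out)
    return out
```

Each level halves the resolution along both axes, so fine detail in time and in frequency is discarded together; the smoothed grid would then go through the usual cepstral analysis.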

IC992261.PDF (From Author) IC992261.PDF (Rasterized)



Hierarchical Subband Linear Predictive Cepstral (HSLPC) Features for HMM-Based Speech Recognition

Authors:

Rathinavelu Chengalvarayan

Volume: 1, Page (NA) Paper number 2257

Abstract:

In this paper, a new approach to linear prediction (LP) analysis is explored, in which the predictor is computed from mel-warped, subband-based autocorrelation functions obtained from the power spectrum. For spectral representation, a set of multi-resolution cepstral features is proposed. The general idea is to divide the full frequency band into several subbands and perform the IDFT on the mel power spectrum for each subband, followed by Durbin's algorithm and the standard conversion from LP to cepstral coefficients. This approach can be extended to several levels of different resolutions. Multi-resolution feature vectors, formed by concatenating the subband cepstral features into an extended feature vector, are shown to yield better performance than conventional mel-warped LPCCs over the full voice bandwidth on a connected-digit recognition task.
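
The processing chain described above (subband power spectrum, inverse transform to autocorrelation, Durbin's recursion, LP-to-cepstrum conversion, concatenation across levels) can be sketched in a few functions. This is a simplified illustration under stated assumptions, not the author's implementation: the input is taken to be an already mel-warped power spectrum, subbands are equal-width, each subband is treated as if it spanned the full band when computing its autocorrelation, and the default orders are arbitrary.

```python
import math


def subband_autocorr(power, order):
    """Cosine transform of one subband's power samples -> lags 0..order.

    Assumption: the subband is mapped onto [0, pi] before the transform.
    """
    m = len(power)
    return [
        sum(p * math.cos(math.pi * k * (i + 0.5) / m) for i, p in enumerate(power)) / m
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Durbin's recursion: predictor coefficients a[1..order], residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Standard recursion from predictor coefficients to LP cepstrum c[1..n_ceps]."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def hslpc_features(power_spectrum, levels=(1, 2), order=4, n_ceps=6):
    """Concatenate subband LP cepstra; levels lists the subband count per level."""
    features = []
    for n_bands in levels:
        width = len(power_spectrum) // n_bands
        for b in range(n_bands):
            band = power_spectrum[b * width:(b + 1) * width]
            a, _ = levinson_durbin(subband_autocorr(band, order), order)
            features.extend(lpc_to_cepstrum(a, n_ceps))
    return features
```

With `levels=(1, 2)` the feature vector holds one full-band set of cepstra followed by two half-band sets, which mirrors the concatenation into an extended feature vector described in the abstract.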

IC992257.PDF (From Author) IC992257.PDF (Rasterized)



Towards a Robust/Fast Continuous Speech Recognition System Using a Voiced-Unvoiced Decision

Authors:

Douglas O'Shaughnessy
Hesham Tolba

Volume: 1, Page (NA) Paper number 1672

Abstract:

In this paper, we show that voiced-unvoiced (V-U) classification of speech sounds is useful not only in speech analysis and speech enhancement, but also in recognition. Incorporating such a classification in a continuous speech recognition (CSR) system not only improves its performance in low-SNR environments, but also reduces the time and memory needed for recognition. The proposed V-U classification has two principal functions: (1) it allows the voiced and unvoiced parts of speech to be enhanced separately; (2) it limits the Viterbi search space, so that recognition can be carried out in real time without degrading the performance of the system. We show experimentally that such a system outperforms the baseline HTK when a V-U decision is included in both the front- and far-end of the HTK-based recognizer.
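
The abstract does not say how the V-U decision itself is made. For orientation only, the sketch below shows a classic textbook heuristic, not the authors' classifier: a frame is labelled voiced when its short-time energy is high and its zero-crossing rate is low. The threshold values are hypothetical and would need tuning on real data.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def is_voiced(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Heuristic V-U decision; both thresholds are illustrative assumptions."""
    return (
        short_time_energy(frame) > energy_threshold
        and zero_crossing_rate(frame) < zcr_threshold
    )
```

A per-frame flag of this kind is what a recognizer could use to choose a separate enhancement path for each class or to prune hypotheses whose models disagree with the decision.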

IC991672.PDF (From Author) IC991672.PDF (Rasterized)



A C/V Segmentation Algorithm For Mandarin Speech Signal Based On Wavelet Transforms

Authors:

Jhing-Fa Wang, Department of Electrical Engineering & Department of Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C. (Taiwan)
Shi-Huang Chen, Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C. (Taiwan)

Volume: 1, Page (NA) Paper number 1261

Abstract:

This paper proposes a new consonant/vowel (C/V) segmentation algorithm for Mandarin speech signals. Since the Mandarin phoneme structure is a combination of a consonant (which may be null) followed by a vowel, C/V segmentation is an important part of a Mandarin speech recognition system. Based on the wavelet transform, the proposed method searches directly for the C/V segmentation point by using a product function and an energy profile. The product function is generated from the appropriate wavelet and scaling coefficients of the input speech signal and is used to indicate the C/V segmentation point. With this product function and the additional verification provided by the energy profile, the C/V segmentation point can be located accurately with low computational complexity. Experiments demonstrate the superior performance of the proposed algorithm, which achieves an overall accuracy rate of 97.2%. The algorithm is suitable for Mandarin speech recognition tasks.

IC991261.PDF (From Author) IC991261.PDF (Rasterized)



Feature Extraction for Speech Recognition Based on Orthogonal Acoustic Feature Planes and LDA

Authors:

Tsuneo Nitta

Volume: 1, Page (NA) Paper number 2298

Abstract:

This paper describes an attempt to extract multiple topological structures, hidden in time-spectrum (TS) patterns, by using multiple mapping functions, and to incorporate the functions into the feature extractor of a speech recognition system. In previous work, the author proposed a novel feature extraction method based on MAFP/KLT (MAFP: multiple acoustic feature planes), in which 3×3 derivative operators were used as mapping functions, and showed that the method achieved significant improvement in preliminary experiments. In this paper, the mapping functions are first extracted directly from a speech database in the form of a 3×3 orthogonal basis. Next, the functions are evaluated, together with 3×3 simplified operators modeled on the orthogonal basis. Finally, after comparing the experimental results, the author proposes an effective feature extraction method based on MAFP/LDA, in which a Sobel operator is used as the mapping function.
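
To make the notion of a 3×3 mapping function concrete, the sketch below slides the two standard Sobel kernels over a time-spectrum pattern, producing one derivative plane along time and one along frequency. The layout (rows as frequency channels, columns as time frames), the function name, and the handling of borders are assumptions; the subsequent LDA stage described in the abstract is not shown.

```python
# Standard 3x3 Sobel kernels. With rows = frequency channels and
# columns = time frames (an assumed layout), the first responds to
# change along time and the second to change along frequency.
SOBEL_TIME = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_FREQ = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_3x3(pattern, kernel):
    """Slide a 3x3 kernel over the pattern (no padding, no kernel flip)."""
    rows, cols = len(pattern), len(pattern[0])
    return [
        [
            sum(
                kernel[i][j] * pattern[r + i][c + j]
                for i in range(3)
                for j in range(3)
            )
            for c in range(cols - 2)
        ]
        for r in range(rows - 2)
    ]
```

Each kernel yields one feature plane that is two rows and two columns smaller than the input pattern; stacking the planes gives the multiple acoustic feature planes that a later projection would reduce in dimension.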

IC992298.PDF (Scanned)



Distinctive Feature Detection Using Support Vector Machines

Authors:

Partha Niyogi, Bell Labs, Lucent Technologies, USA. (USA)
Chris Burges, Bell Labs, Lucent Technologies, USA. (USA)
Padma Ramesh, Bell Labs, Lucent Technologies, USA. (USA)

Volume: 1, Page (NA) Paper number 1995

Abstract:

An important aspect of distinctive-feature-based approaches to automatic speech recognition is the formulation of a framework for robust detection of these features. We discuss the application of support vector machines (SVMs), which arise when the structural risk minimization principle is applied to such feature detection problems. In particular, we describe the problem of detecting stop consonants in continuous speech and discuss an SVM framework for detecting these sounds. We use both linear and nonlinear SVMs for stop detection and present experimental results showing that they perform better than a cepstral-feature-based hidden Markov model (HMM) system on the same task.
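
For readers unfamiliar with the linear case, the sketch below trains a two-class linear SVM by subgradient descent on the regularized hinge loss. It is a minimal illustration only: the abstract does not describe the training procedure, the input features, or the kernels used for the nonlinear case, and the learning rate, regularization weight, and epoch count here are arbitrary assumptions.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=50):
    """Subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (e.g. stop present) or -1 (stop absent).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Margin violated: move toward the sample and shrink the weights.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: apply only the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1
```

In a detection setting, each frame (or short window of frames) would be turned into a feature vector and passed to `predict`, with the positive class marking the presence of the distinctive feature.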

IC991995.PDF (Scanned)
