BIOwulf, Inc Technology

SVMs

Support Vector Machines (SVMs) are multivariate methods that use subsets of features which together identify important patterns embedded in data. One such approach is our method of data analysis using feature selection. We have developed proprietary and patented algorithms for locating combinations of features that indicate the presence or absence of a given state. Our feature selection finds features that have strong predictive power on unseen data, and it inherits all the desirable properties of SVMs: generalization guarantees, scalability, speed, and ease of parameter choice.
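As a rough illustration of how a linear SVM can drive feature selection, the sketch below trains a small linear SVM and ranks features by the magnitude of their learned weights. It is a generic textbook-style example, not the proprietary algorithms mentioned above. The function names (train_linear_svm, rank_features), the toy data, and the parameter values are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    A deliberately simple trainer for illustration; production SVM
    solvers use far more efficient optimization methods.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_features(w):
    """Return feature indices ordered from largest to smallest |weight|."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


# Hypothetical toy data: column 0 is noise, column 1 carries the class signal.
X = [(1, 2), (-1, 2), (1, 3), (-1, 3),
     (1, -2), (-1, -2), (1, -3), (-1, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
ranking = rank_features(w)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1
               for xi in X]
print("feature ranking:", ranking)
```

On this toy data the informative column should come first in the ranking. Note that ranking by raw weight magnitude depends on how the features are scaled, so this simple criterion is only a starting point.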

Other analysts typically employ descriptive statistics or clustering approaches to solve these problems. However, descriptive statistics (correlation coefficients, t-tests, PCA) do not aim to build predictors and often make incorrect and prohibitively restrictive assumptions: for example, that the data are normally distributed, that the variables are independent, and that the variance is equal in both classes. Our methods make no such assumptions. Clustering methods, on the other hand, are ad hoc methods with no well-defined objective. Analysts often validate them by measuring their predictive power, but they perform worse at this task than our method, which has the direct objective of obtaining good prediction. Furthermore, clustering algorithms are sensitive to feature scaling, and investigators often discard the lowest-contrast features before beginning analysis, potentially losing valuable information. In contrast, our method is largely insensitive to feature scaling, and we work with all of the data. As a result, we can characterize pertinent data that less sophisticated techniques would have rejected.

SVM methods rely on support vectors (borderline cases). This allows us to build good classifiers and select meaningful features under very adverse conditions: a very large number of features, few training examples, and poorly distributed training examples. Other methods base their decisions on the typical or average case, which can lead to qualitatively different results.
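The difference between a borderline-case decision and an average-case decision can be seen in one dimension. The sketch below is a minimal illustration under simplifying assumptions: the data are separable, with every negative example below every positive one, so the maximum-margin threshold is the midpoint between the two closest opposing points (the support vectors). It is compared with a threshold placed midway between the class means. The function names and the data are hypothetical.

```python
def max_margin_threshold(neg, pos):
    """Hard-margin threshold for separable 1-D data with max(neg) < min(pos).

    Only the two borderline points (the support vectors) matter.
    """
    sv_neg, sv_pos = max(neg), min(pos)
    if sv_neg >= sv_pos:
        raise ValueError("classes are not separable in this orientation")
    return (sv_neg + sv_pos) / 2.0, (sv_neg, sv_pos)


def centroid_threshold(neg, pos):
    """Threshold midway between the class means (an average-case rule)."""
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0


neg = [1.0, 2.0, 3.0]
pos = [5.0, 6.0, 7.0]
svm_t, support = max_margin_threshold(neg, pos)
avg_t = centroid_threshold(neg, pos)

# Add one poorly distributed positive example far from the boundary.
pos_skewed = pos + [30.0]
svm_t2, _ = max_margin_threshold(neg, pos_skewed)
avg_t2 = centroid_threshold(neg, pos_skewed)

print("max-margin:", svm_t, "->", svm_t2)  # 4.0 -> 4.0
print("centroid:  ", avg_t, "->", avg_t2)  # 4.0 -> 7.0
```

The distant example leaves the maximum-margin threshold unchanged because it is not a borderline case, while the mean-based threshold moves to 7.0 and would then place the positive examples 5.0 and 6.0 on the negative side.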