Joshua Jones

7001 Project 3: Machine Learning with Professor Charles Isbell

Project Report

11/26/2002



Introduction


This project began with a survey of machine learning techniques. The survey uncovered a significant area of unfamiliarity, giving rise to several weeks of deep exploration into Independent Component Analysis (ICA), including foundational study in linear algebra and information theory. This exploration consumed the majority of the project, and an overview of ICA and related topics will form the basis of this report. Finally, a motivation for further study in the realm of machine learning is discussed. Rather than pursuing advances in these techniques in a vacuum, for their own sake, it is desirable to choose an area where the application of ML methods can contribute to our understanding, and allow the particular demands of the application to drive advances. One area where the thoughtful application of ML techniques could likely furnish significant insight is macroeconomics. The final section of this paper discusses a possible approach to such an undertaking.



Information Theory


Fundamentally, information theory deals with the amount of variability (conversely, the degree of regularity) in the values taken on by a random variable. The quantitative measure of this disorder is entropy; a higher entropic value indicates a greater degree of chaos within the values taken on by the variable. Entropy also directly describes the average length of a message needed to communicate a sample of the related variable. Thus, entropy captures the expected compressibility of a sample. Presumably, data compression has been viewed in the light of information theory. An interesting tangent could involve analyzing the relationships between these areas. However, such an analysis is outside the domain of this project, where information theory is a means to the end of understanding ICA.


To this end, it is necessary to describe the relationship between the entropies of associated random variables. If two random variables are not independent, then it is possible to discern some information about the value of one of the variables given the value of the other for a given trial. This means that the information that needs to be transmitted to describe the state of each variable is decreased in the presence of knowledge about the other. That is to say, the entropy of one of the variables conditioned on knowledge of the other is less than the prior entropy of that variable. The degree to which this relationship applies to a given pair of variables is also given a quantitative value under information theory, mutual information. This value is simply the prior entropy of a variable minus the entropy of the variable conditioned on knowledge of another. Producing values that satisfy certain relationships with respect to mutual information is a meaningful way to view the process of ICA.

Independent Component Analysis


As the name suggests, ICA is used to find the independent components of data. Semantically, these independent components can be viewed as the sources or underlying causes of the manifestation of patterns in the data --- these sources take on particular values during each trial and are mixed in some fashion to produce the observed values of random variables constituting the data. Given the observed data, ICA attempts to recover these statistically independent sources. From an information theoretic perspective, for a given input vector ICA aims to produce an output vector such that mutual information between the input and output is maximized while mutual information between components of the output vector is minimized (hopefully, is nonexistent, leading to the production of truly independent components). One natural and exciting application of ICA is blind source separation (BSS). In this problem area, various mixtures of a number of sources such as auditory signals are given as data. In the case of auditory signals, for instance, these mixtures can come from a number of microphones in various locations that record a combination of the sources. ICA is able to recover many of the characteristics of the original sources, though source amplitude and ordering cannot be recovered. In addition, Gaussian sources cannot be separated, as their combination is another Gaussian. This is a powerful application of ICA, but it should be noted that the success of the technique in this case can be attributed to the fact that the underlying assumptions made by ICA match the domain of BSS exactly. The observed signals used as input are the result of combining independent sources that are non-Gaussian (excepting noise). Another simplifying assumption commonly made is that the number of sources is known a priori. In other domains, it is quite possible that many of these assumptions do not hold. Also, in the case of blind source separation, the meaning of the recovered sources is quite clear. There is a great promise that using ICA and other ML techniques to analyze observable characteristics of complex systems may offer deep insight into the fundamental underlying dynamics of studied systems. However, if the assumptions of ICA are violated by the system and the formulation of observable parameters chosen as ICA inputs, the output may not be useful, and may in fact be misleading to the extent that it is not clear that ICA has failed. Further, even in cases where the assumptions of ICA are satisfied, the interpretation of the output will generally be much less clear cut, and therefore much more difficult than in the case of BSS.



Application to Economic Data


Despite these limitations, it may prove valuable to apply ICA and other related machine learning techniques to complex and poorly understood systems. As long as this endeavor is guided by a good understanding of the limitations and assumptions of the methods employed, pitfalls can be avoided and useful insight may be developed. With this understanding, it could prove useful to apply ICA to the stock market. The market fits the description of a complex system with poorly understood internal dynamics, and a wealth of statistical data is available for analysis. Because of the difficulty involved in interpreting the significance of the independent components recovered, if any truly independent components can indeed be recovered at all, an iterative approach to developing functionally generated independent characterizations of the statistical data is suggested. First, functions combining various market statistics should be estimated with the intent of describing the underlying forces that drive the market. The intent is that this estimation should be informed by current theories of macroeconomic systems. ICA should then be applied to the results of passing market data through these functions across a series of time steps, in order to determine whether the functions truly capture a fundamental description of the market’s observed behavior. If the resulting projection found by ICA projects the output of each function onto a separate axis, the theory is that this indicates that the set of functions describe the underlying sources of market behavior. It is likely that the first attempt will not achieve this goal, so subsequent estimations and analyses should be made based on the information gleaned from prior trials. The advantage of this iterative technique is that when a suitable set of functions is found, a separation of independent components has been achieved in such a way that the significance of the components is well understood --- since the generation of the components was controlled. Hopefully, this means that inspection of the final set of functions can yield insight into the internal dynamics of the market system. Notice that a problem here is that the proper number of functions is unknown. Experimentation and/or some extension of existing techniques may be useful in overcoming this additional difficulty. A goal of this analysis could be as lofty as the development of a model with predictive capability, possibly weakened to the point of probabilistic prediction (with probability significantly differing from the prior probability of a predicted movement). Such a goal will allow for evaluation and validation of the knowledge gained through this procedure, as the predictions can easily be compared to actual developments. The benefit of correctly developing such a model is quite clear!



Conclusion


Macroeconomics is only one example of a field studying a complex, partially understood system that could benefit from ML based data analysis techniques such as ICA. The application of these techniques to various fields should bring the benefit of not only advancing the understanding of the complex systems being studied, but also of placing new demands upon the techniques themselves. Such demands will drive further enhancements, as well as the discovery of new techniques. By allowing the applications to drive the enhancements, research is guided in directions that when successful are virtually ensured to result in practically useful developments.


Bibliography


Back AD, Weigend AS (1998) “A first application of independent component analysis to extracting structure from stock returns,” Int. J. on Neural Systems, 8(4):473-484.

http://140.113.216.56/course/nn2001_fall/Download/first-ica.pdf


Bell AJ, Sejnowski TJ (1995) "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, 7: 1129-1159. http://citeseer.nj.nec.com/bell95informationmaximization.html


Isbell CL, Viola P (1999) “Restructuring sparse high-dimensional data for effective retreval,” In Advances in Neural Information Processing Systems, volume 11, MIT Press.


Leach S “Singular Value Decomposition – A primer,” Unpublished Manuscript, Department of Computer Science, Brown University. http://www.cs.brown.edu/research/ai/dynamics/tutorial/Postscript/

SingularValueDecomposition.ps


Smith LI (2002) “A Tutorial on Prinicpal Components Analysis,” Unpublished Manuscript. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf


Will T (1999) “Introduction to the Singular Value Decomposition,” Online Tutorial, Davidson College.

http://www.iam.dvo.ru/document/svd/index.html