I am an Assistant Professor in the School of Interactive Computing within the College of Computing at Georgia Tech. I direct the Georgia Tech Robot Learning Lab, which is affiliated with both the Center for Machine Learning and the Institute for Robotics and Intelligent Machines.
My group performs fundamental and applied research in machine learning, artificial intelligence, and robotics with a focus on developing theory and systems that tightly integrate perception, learning, and control. Our work touches on a range of problems including computer vision, system identification, forecasting, simultaneous localization & mapping, motion planning, and optimal control. The algorithms that we develop often use and extend theory from nonparametric statistics, neural networks, nonconvex optimization, matrix algebra, and dynamic programming. (See Google Scholar and Publications below).
Prior to joining Georgia Tech, I was a postdoc in the Robotics and State Estimation Lab directed by Dieter Fox in the Computer Science and Engineering Department at the University of Washington. I received my Ph.D. from the Machine Learning Department in the School of Computer Science at Carnegie Mellon University, where I was a member of the Sense, Learn, Act (SELECT) Lab co-directed by Carlos Guestrin and my advisor Geoff Gordon.
I have co-organized the following workshops and tutorials:
Authors  Title  Year  Journal/Proceedings 

B. Dai, N. He, Y. Pan, B. Boots, & L. Song.  Learning from Conditional Distributions via Dual Kernel Embeddings. (31% Acceptance Rate) [PDF Coming Soon] 
2017  Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS-2017) 
M. Mukadam, C. Cheng, X. Yan, & B. Boots.  Approximately Optimal Continuous-Time Motion Planning and Control via Probabilistic Inference. (41% Acceptance Rate) [PDF Coming Soon] 
2017  Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017) 
G. Williams, N. Wagener, B. Goldfain, P. Drews, J. Rehg, B. Boots, & E. Theodorou.  Information Theoretic MPC for Model-Based Reinforcement Learning. (41% Acceptance Rate) [PDF Coming Soon] 
2017  Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017) 
E. Huang, M. Mukadam, Z. Liu, & B. Boots.  Motion Planning with Graph-Based Trajectories and Gaussian Process Inference. (41% Acceptance Rate) [PDF Coming Soon] 
2017  Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017) 
J. Dong, J. Burnham, B. Boots, G. Rains, & F. Dellaert.  4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture. (41% Acceptance Rate) [PDF Coming Soon] 
2017  Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA-2017) 
B. Hrolenok, B. Boots, & T. Balch.  Sampling Beats Fixed Estimate Predictors for Cloning Stochastic Behavior in Multiagent Systems. (24% Acceptance Rate) 
2017  Proceedings of the 31st Conference on Artificial Intelligence (AAAI-2017) 
Abstract: Modeling stochastic multiagent behavior such as fish schooling is challenging for fixed-estimate prediction techniques because they fail to reliably reproduce the stochastic aspects of the agents' behavior. We show how standard fixed-estimate predictors fit within a probabilistic framework, and suggest the reason they work for certain classes of behaviors and not others. We quantify the degree of mismatch and offer alternative sampling-based modeling techniques. We are specifically interested in building executable models (as opposed to statistical or descriptive models) because we want to reproduce and study multiagent behavior in simulation. Such models can be used by biologists, sociologists, and economists to explain and predict individual and group behavior in novel scenarios, and to test hypotheses regarding group behavior. Developing models from observation of real systems is an obvious application of machine learning. Learning directly from data eliminates expensive hand processing and tuning, but introduces unique challenges that violate certain assumptions common in standard machine learning approaches. Our framework suggests a new class of sampling-based methods, which we implement and apply to simulated deterministic and stochastic schooling behaviors, as well as the observed schooling behavior of real fish. Experimental results show that our implementation performs comparably with standard learning techniques for deterministic behaviors, and better on stochastic behaviors.
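A toy illustration of the paper's central point (my own sketch, not the authors' code): for a bimodal behavior such as turning left or right, a fixed-estimate (mean) predictor outputs a value the agents never produce, while a sampling-based predictor always reproduces a plausible behavior.

```python
import random

# Hypothetical example: a fish turns left (-30 deg) or right (+30 deg) with
# equal probability. The fixed-estimate (mean) predictor outputs 0 deg, a
# turn that is never actually observed; the sampling-based predictor draws
# from the empirical distribution, so it always returns a real behavior.
observed_turns = [-30.0, 30.0, -30.0, 30.0, -30.0, 30.0]

def fixed_estimate_predictor(history):
    """Predict the mean of past turns (a fixed estimate)."""
    return sum(history) / len(history)

def sampling_predictor(history, rng):
    """Predict by sampling from the empirical distribution of past turns."""
    return rng.choice(history)

rng = random.Random(0)
assert fixed_estimate_predictor(observed_turns) == 0.0       # never observed
assert sampling_predictor(observed_turns, rng) in observed_turns
```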


BibTeX:
@inproceedings{HrolenokAAAI17, 

X. Yan, V. Indelman, & B. Boots.  Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping. 
2017  Robotics and Autonomous Systems 
Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has used Gaussian processes (GPs) to efficiently represent the robot's trajectory through its environment. GPs have several advantages over discrete-time trajectory representations: they can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of the GP approach to STEAM is that it is formulated as a batch trajectory estimation problem. In this paper we provide the critical extensions necessary to transform the existing GP-based batch algorithm for STEAM into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets. 

BibTeX:
@article{Yan17ras, Author = "Xinyan Yan and Vadim Indelman and Byron Boots", Journal = "Robotics and Autonomous Systems", Title = "Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation and Mapping", Year = {2017}, pages = "120-132", volume = {87} } 

C. Cheng & B. Boots.  Incremental Variational Sparse Gaussian Process Regression. (22% Acceptance Rate)  2016  Proceedings of Advances in Neural Information Processing Systems 30 (NIPS-2016) 
Abstract: Recent work on scaling up Gaussian process regression (GPR) to large datasets has primarily focused on sparse GPR, which leverages a small set of basis functions to approximate the full Gaussian process during inference. However, the majority of these approaches are batch methods that operate on the entire training dataset at once, precluding the use of datasets that are streaming or too large to fit into memory. Although previous work has considered incrementally solving variational sparse GPR, most algorithms fail to update the basis functions and therefore perform suboptimally. We propose a novel incremental learning algorithm for variational sparse GPR based on stochastic mirror ascent of probability densities in reproducing kernel Hilbert space. This new formulation allows our algorithm to update basis functions online in accordance with the manifold structure of probability densities for fast convergence. We conduct several experiments and show that our proposed approach achieves better empirical performance in terms of prediction error than the recent state-of-the-art incremental solutions to sparse GPR. 

BibTeX:
@inproceedings{Cheng16, Author = "Ching-An Cheng and Byron Boots", Booktitle = "Proceedings of Advances in Neural Information Processing Systems 30 (NIPS)", Title = "Incremental Variational Sparse Gaussian Process Regression", Year = {2016} } 

J. Tan, Z. Xie, B. Boots, & K. Liu.  Simulation-Based Design of Dynamic Controllers for Humanoid Balancing. (Selected for oral presentation) (48% Acceptance Rate)  2016  Proceedings of the 2016 International Conference on Intelligent Robots and Systems (IROS-2016) 
Abstract: Model-based trajectory optimization often fails to find a reference trajectory for underactuated bipedal robots performing highly dynamic, contact-rich tasks in the real world due to inaccurate physical models. In this paper, we propose a complete system that automatically designs a reference trajectory that succeeds on tasks in the real world with a very small number of real-world experiments. We adopt existing system identification techniques and show that, with appropriate model parameterization and control optimization, an iterative system identification framework can be effective for designing reference trajectories. We focus on a set of tasks that leverage the momentum transfer strategy to rapidly change the whole body from an initial configuration to a target configuration by generating large accelerations at the center of mass and switching contacts. 

BibTeX:
@inproceedings{TanIROS16, 

W. Sun, R. Capobianco, G. J. Gordon, J. A. Bagnell, & B. Boots.  Learning to Smooth with Bidirectional Predictive State Inference Machines. (31% Acceptance Rate)  2016  Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI-2016) 
Abstract: We present the Smoothing Machine (SMACH, pronounced "smash"), a dynamical system learning algorithm based on chain Conditional Random Fields (CRFs) with latent states. Unlike previous methods, SMACH is designed to optimize prediction performance when we have information from both past and future observations. By leveraging Predictive State Representations (PSRs), we model beliefs about latent states through predictive states, an alternative but equivalent representation that depends directly on observable quantities. Predictive states enable the use of well-developed supervised learning approaches in place of local-optimum-prone methods like EM: we learn regressors or classifiers that can approximate message passing and marginalization in the space of predictive states. We provide theoretical guarantees on smoothing performance and we empirically verify the efficacy of SMACH on several dynamical system benchmarks. 

BibTeX:
@inproceedings{SunUAI16, 

J. Dong, M. Mukadam, F. Dellaert, & B. Boots.  Motion Planning as Probabilistic Inference using Gaussian Processes and Factor Graphs. (20% Acceptance Rate) 
2016  Proceedings of Robotics: Science and Systems XII (RSS-2016) 
Abstract: With the increased use of high degree-of-freedom robots that must perform tasks in real-time, there is a need for fast algorithms for motion planning. In this work, we view motion planning from a probabilistic perspective. We consider smooth continuous-time trajectories as samples from a Gaussian process (GP) and formulate the planning problem as probabilistic inference. We use factor graphs and numerical optimization to perform inference quickly, and we show how GP interpolation can further increase the speed of the algorithm. Our framework also allows us to incrementally update the solution of the planning problem to contend with changing conditions. We benchmark our algorithm against several recent trajectory optimization algorithms on planning problems in multiple environments. Our evaluation reveals that our approach is several times faster than previous algorithms while retaining robustness. Finally, we demonstrate the incremental version of our algorithm on replanning problems, and show that it often can find successful solutions in a fraction of the time required to replan from scratch. 
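A minimal sketch in the spirit of this line of work (my own toy, not the authors' implementation): a discretized 2-D trajectory is improved by gradient descent on a cost combining a finite-difference smoothness term, standing in for the GP prior over trajectories, and a soft obstacle penalty. The obstacle location and all weights are illustrative choices.

```python
import math

OBSTACLE = (0.5, 0.0)   # hypothetical obstacle center in configuration space

def cost(traj):
    # Smoothness: squared second differences (discrete acceleration).
    smooth = 0.0
    for i in range(1, len(traj) - 1):
        ax = traj[i + 1][0] - 2 * traj[i][0] + traj[i - 1][0]
        ay = traj[i + 1][1] - 2 * traj[i][1] + traj[i - 1][1]
        smooth += ax * ax + ay * ay
    # Obstacle: a soft Gaussian penalty around OBSTACLE.
    hit = sum(math.exp(-((x - OBSTACLE[0]) ** 2 + (y - OBSTACLE[1]) ** 2) / 0.01)
              for x, y in traj)
    return smooth + hit

def optimize(traj, steps=300, lr=0.002, eps=1e-5):
    """Numeric gradient descent on interior waypoints; endpoints stay fixed."""
    traj = [list(p) for p in traj]
    for _ in range(steps):
        for i in range(1, len(traj) - 1):
            for d in (0, 1):
                traj[i][d] += eps
                hi = cost(traj)
                traj[i][d] -= 2 * eps
                lo = cost(traj)
                traj[i][d] += eps                 # restore coordinate
                traj[i][d] -= lr * (hi - lo) / (2 * eps)
    return traj

straight = [[i / 10, 0.0] for i in range(11)]     # passes through the obstacle
assert cost(optimize(straight)) < cost(straight)  # optimization reduces cost
```

The real planner replaces this dense waypoint gradient with GP interpolation and sparse factor-graph inference, which is where its speed comes from.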

BibTeX:
@inproceedings{DongRSS16, 

Z. Marinho, B. Boots, A. Dragan, A. Byravan, G. J. Gordon, & S. Srinivasa.  Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces. (20% Acceptance Rate) 
2016  Proceedings of Robotics: Science and Systems XII (RSS-2016) 
Abstract: We introduce a functional gradient descent trajectory optimization algorithm for robot motion planning in Reproducing Kernel Hilbert Spaces (RKHSs). Functional gradient algorithms are a popular choice for motion planning in complex many-degree-of-freedom robots, since they (in theory) work by directly optimizing within a space of continuous trajectories to avoid obstacles while maintaining geometric properties such as smoothness. However, in practice, implementations such as CHOMP and TrajOpt typically commit to a fixed, finite parametrization of trajectories, often as a sequence of waypoints. Such a parameterization can lose much of the benefit of reasoning in a continuous trajectory space: e.g., it can require taking an inconveniently small step size and large number of iterations to maintain smoothness. Our work generalizes functional gradient trajectory optimization by formulating it as minimization of a cost functional in an RKHS. This generalization lets us represent trajectories as linear combinations of kernel functions, without any need for waypoints. As a result, we are able to take larger steps and achieve a locally optimal trajectory in just a few iterations. Depending on the selection of kernel, we can directly optimize in spaces of trajectories that are inherently smooth in velocity, jerk, curvature, etc., and that have a low-dimensional, adaptively chosen parameterization. Our experiments illustrate the effectiveness of the planner for different kernels, including Gaussian RBFs, Laplacian RBFs, and B-splines, as compared to the standard discretized waypoint representation. 
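A toy sketch of the representation (mine, not the paper's planner): a 1-D trajectory written as a linear combination of Gaussian RBF kernel functions, so it can be queried at any continuous time without committing to a sequence of waypoints. The centers and weights are hypothetical.

```python
import math

centers = [0.0, 0.5, 1.0]     # kernel centers in time (illustrative)
weights = [0.0, 1.0, 0.0]     # hypothetical coefficients found by the optimizer

def rbf(t, c, width=0.2):
    """Gaussian RBF kernel evaluated at time t for center c."""
    return math.exp(-(t - c) ** 2 / (2 * width ** 2))

def traj(t):
    """Trajectory value at any continuous time t: a weighted sum of kernels."""
    return sum(w * rbf(t, c) for w, c in zip(weights, centers))

# The trajectory is defined for every t, not just at discrete waypoints:
assert abs(traj(0.5) - 1.0) < 1e-9
assert 0.0 < traj(0.25) < 1.0
```

The smoothness of the trajectory is inherited from the kernel, which is the paper's point: choosing the kernel chooses the space of smooth trajectories.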

BibTeX:
@inproceedings{MarinhoRSS16, 

A. Venkatraman, W. Sun, M. Hebert, B. Boots, & J. A. Bagnell.  Inference Machines for Nonparametric Filter Learning. (24% Acceptance Rate) 
2016  Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-2016) 
Abstract: Data-driven approaches for learning dynamic models for Bayesian filtering often try to maximize the data likelihood given parametric forms for the transition and observation models. However, this objective is usually nonconvex in the parametrization and can only be locally optimized. Furthermore, learning algorithms typically do not provide performance guarantees on the desired Bayesian filtering task. In this work, we propose using inference machines to directly optimize the filtering performance. Our procedure is capable of learning partially observable systems when the state space is either unknown or known in advance. To accomplish this, we adapt Predictive State Inference Machines (PSIMs) by introducing the concept of hints, which incorporate prior knowledge of the state space to accompany the predictive state representation. This allows PSIM to be applied to the larger class of filtering problems which require prediction of a specific parameter or partial component of state. Our PSIM+HINTS adaptation enjoys theoretical advantages similar to the original PSIM algorithm, and we showcase its performance on a variety of robotics filtering problems. 

BibTeX:
@inproceedings{VenkatramanIJCAI16, 

W. Sun, A. Venkatraman, B. Boots, & J. A. Bagnell.  Learning to Filter With Predictive State Inference Machines. (24% Acceptance Rate)  2016  Proceedings of the 33rd International Conference on Machine Learning (ICML-2016) 
Abstract: Latent state space models are a fundamental and widely used tool for modeling dynamical systems. However, they are difficult to learn from data and learned models often lack performance guarantees on inference tasks such as filtering and prediction. In this work, we present the Predictive State Inference Machine (PSIM), a datadriven method that considers the inference procedure on a dynamical system as a composition of predictors. The key idea is that rather than first learning a latent state space model, and then using the learned model for inference, PSIM directly learns predictors for inference in predictive state space. We provide theoretical guarantees for inference, in both realizable and agnostic settings, and showcase practical performance on a variety of simulated and real world robotics benchmarks. 
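A drastically simplified sketch of the key idea (mine, not PSIM itself): rather than fitting a latent-state model and then deriving a filter, directly fit the predictor used at filtering time. Here the predictive state is just the forecast of the next observation of an AR(1) series, and the predictor is learned by closed-form least squares on observed transitions.

```python
import random

# Generate a scalar AR(1) series: x_{t+1} = a_true * x_t + noise.
rng = random.Random(1)
a_true = 0.8
xs = [1.0]
for _ in range(5000):
    xs.append(a_true * xs[-1] + rng.gauss(0, 0.1))

# Directly learn the one-step predictor x_{t+1} ~ a * x_t by least squares,
# operating entirely on observable quantities (no latent state is estimated).
num = sum(xs[t] * xs[t + 1] for t in range(len(xs) - 1))
den = sum(xs[t] ** 2 for t in range(len(xs) - 1))
a_hat = num / den

assert abs(a_hat - a_true) < 0.05   # the learned predictor tracks the dynamics
```

PSIM generalizes this idea to nonlinear predictors over predictive states and supplies the filtering-performance guarantees the abstract describes.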

BibTeX:
@inproceedings{SunICML16, 

M. Mukadam, X. Yan, & B. Boots.  Gaussian Process Motion Planning. (34% Acceptance Rate) 
2016  Proceedings of the 2016 IEEE Conference on Robotics and Automation (ICRA-2016) 
Abstract: Motion planning is a fundamental tool in robotics, used to generate collision-free, smooth trajectories while satisfying task-dependent constraints. In this paper, we present a novel approach to motion planning using Gaussian processes. In contrast to most existing trajectory optimization algorithms, which rely on a discrete waypoint parameterization in practice, we represent the continuous-time trajectory as a sample from a Gaussian process (GP) generated by a linear time-varying stochastic differential equation. We then provide a gradient-based optimization technique that optimizes continuous-time trajectories with respect to a cost functional. By exploiting GP interpolation, we develop the Gaussian Process Motion Planner (GPMP), which finds optimal trajectories parameterized by a small number of waypoints. We benchmark our algorithm against recent trajectory optimization algorithms by solving 7-DOF robotic arm planning problems in simulation and validate our approach on a real 7-DOF WAM arm. 

BibTeX:
@inproceedings{MukadamICRA16, 

Y. Nishiyama, A. Afsharinejad, S. Naruse, B. Boots, & L. Song.  The Nonparametric Kernel Bayes Smoother. (30% Acceptance Rate) 
2016  Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS-2016) 
Abstract: Recently, significant progress has been made on developing kernel mean expressions for Bayesian inference. An important success in this domain is the nonparametric kernel Bayes filter (nKB-filter), which can be used for sequential inference in state space models. We expand upon this work by introducing a smoothing algorithm, the nonparametric kernel Bayes smoother (nKB-smoother), which relies on kernel Bayesian inference through the kernel sum rule and kernel Bayes rule. We derive the smoothing equations, analyze the computational cost, and show smoothing consistency. We summarize the algorithm, which is simple to implement, requiring only matrix multiplications and the output of the nKB-filter. Finally, we report experimental results that compare the nKB-smoother to previous parametric and nonparametric approaches to Bayesian filtering and smoothing. In the supplementary materials, we show that the combination of the nKB-filter and nKB-smoother allows marginal kernel mean computation, which gives an alternative to kernel belief propagation. 

BibTeX:
@inproceedings{NishiyamaAISTATS16, 

A. Venkatraman, W. Sun, M. Hebert, J. A. Bagnell, & B. Boots.  Online Instrumental Variable Regression with Applications to Online Linear System Identification. (26% Acceptance Rate) 
2016  Proceedings of the 30th Conference on Artificial Intelligence (AAAI-2016) 
Abstract: Instrumental variable regression (IVR) is a statistical technique utilized for recovering unbiased estimators when there are errors in the independent variables. Estimator bias in learned time series models can yield poor performance in applications such as long-term prediction and filtering, where the recursive use of the model results in the accumulation of propagated error. However, prior work addressed the IVR objective in the batch setting, where it is necessary to store the entire dataset in memory, an infeasible requirement in large dataset scenarios. In this work, we develop Online Instrumental Variable Regression (OIVR), an algorithm that is capable of updating the learned estimator with streaming data. We show that the online adaptation of IVR enjoys a no-regret performance guarantee with respect to the original batch setting by taking advantage of any no-regret online learning algorithm inside OIVR for the underlying update steps. We experimentally demonstrate the efficacy of our algorithm in combination with popular no-regret online algorithms for the task of learning predictive dynamical system models and on a prototypical econometrics instrumental variable regression problem. 
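A 1-D toy version of the underlying statistical idea (my sketch, not the OIVR algorithm): the regressor x is correlated with the noise e, which biases ordinary least squares, while the instrument z is correlated with x but independent of e, so the IV estimator recovers the true coefficient.

```python
import random

rng = random.Random(7)
beta = 2.0
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]    # instrument
e = [rng.gauss(0, 1) for _ in range(n)]    # noise, also entering x below
x = [zi + ei for zi, ei in zip(z, e)]      # endogenous regressor: cov(x, e) != 0
y = [beta * xi + ei for xi, ei in zip(x, e)]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

beta_ols = cov(x, y) / cov(x, x)   # biased upward (about beta + 0.5 here)
beta_iv = cov(z, y) / cov(z, x)    # consistent IV estimate

assert abs(beta_iv - beta) < 0.15
assert beta_ols - beta > 0.3       # OLS visibly overestimates
```

The paper's contribution is running this kind of estimator online over streaming data with no-regret guarantees, rather than from batch covariances as above.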

BibTeX:
@inproceedings{VenkatramanAAAI16, 

X. Yan, V. Indelman, & B. Boots.  Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping. 
2015  Proceedings of the 17th International Symposium on Robotics Research (ISRR-2015) 
Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has found success by representing the trajectory as a Gaussian process. Gaussian processes can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of this approach is that STEAM is formulated as a batch estimation problem. In this paper we provide the critical extensions necessary to transform the existing batch algorithm into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets. 

BibTeX:
@inproceedings{YanISRR15, 

A. Shaban, M. Farajtabar, B. Xie, L. Song, & B. Boots.  Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods. (34% Acceptance Rate) 
2015  Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI-2015) 
Abstract: Probabilistic latent-variable models are a fundamental tool in statistics and machine learning. Despite their widespread use, identifying the parameters of basic latent variable models continues to be an extremely challenging problem. Traditional maximum likelihood-based learning algorithms find valid parameters, but suffer from high computational cost, slow convergence, and local optima. In contrast, recently developed method-of-moments-based algorithms are computationally efficient and provide strong statistical guarantees, but are not guaranteed to find valid parameters. In this work, we introduce a two-stage learning algorithm for latent variable models. We first use the method of moments to find a solution that is close to the optimal solution but not necessarily in the valid set of model parameters. We then incrementally refine the solution via exterior point optimization until a local optimum that is arbitrarily near the valid set of parameters is found. We perform several experiments on synthetic and real-world data and show that our approach is more accurate than previous work, especially when training data is limited. 

BibTeX:
@inproceedings{ShabanUAI15, 

A. Byravan, M. Monfort, B. Ziebart, B. Boots, & D. Fox.  Graph-based Inverse Optimal Control for Robot Manipulation. (28% Acceptance Rate)  2015  Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI-2015) 
Abstract: Inverse optimal control (IOC) is a powerful approach for learning robotic controllers from demonstration that estimates a cost function which rationalizes demonstrated control trajectories. Unfortunately, it is difficult to apply in settings where optimal control can only be solved approximately. While local IOC approaches have been shown to successfully learn cost functions in such settings, they rely on the availability of good reference trajectories, which might not be available at test time. We address the problem of using IOC in these computationally challenging control tasks by using a graph-based discretization of the trajectory space. Our approach projects continuous demonstrations onto this discrete graph, where a cost function can be tractably learned via IOC. Discrete control trajectories from the graph are then projected back to the original space and locally optimized using the learned cost function. We demonstrate the effectiveness of the approach with experiments conducted on two 7-degree-of-freedom robotic arms. 

BibTeX:
@inproceedings{ByravanIJCAI15, 

B. Boots, A. Byravan, & D. Fox.  Learning Predictive Models of a Depth Camera & Manipulator from Raw Execution Traces. (48% Acceptance Rate) 
2014  Proceedings of the 2014 IEEE Conference on Robotics and Automation (ICRA-2014) 
Abstract: We attack the problem of learning a predictive model of a depth camera and manipulator directly from raw execution traces. While the problem of learning manipulator models from visual and proprioceptive data has been addressed before, existing techniques often rely on assumptions about the structure of the robot or tracked features in observation space. We make no such assumptions. Instead, we formulate the problem as that of learning a high-dimensional controlled stochastic process. We leverage recent work on nonparametric predictive state representations to learn a generative model of the depth camera and robotic arm from sequences of uninterpreted actions and observations. We perform several experiments in which we demonstrate that our learned model can accurately predict future observations in response to sequences of motor commands. 

BibTeX:
@inproceedings{BootsDepthcamManipulator, 

A. Byravan, B. Boots, S. Srinivasa, & D. Fox.  Space-Time Functional Gradient Optimization for Motion Planning. (48% Acceptance Rate)  2014  Proceedings of the 2014 IEEE Conference on Robotics and Automation (ICRA-2014) 
Abstract: Functional gradient algorithms (e.g. CHOMP) have recently shown great promise for producing optimal motion for complex many-degree-of-freedom robots. A key limitation of such algorithms is the difficulty in incorporating constraints and cost functions that explicitly depend on time. We present T-CHOMP, a functional gradient algorithm that overcomes this limitation by directly optimizing in space-time. We outline a framework for joint space-time optimization, derive an efficient trajectory-wide update for maintaining time monotonicity, and demonstrate the significance of T-CHOMP over CHOMP in several scenarios. By manipulating time, T-CHOMP produces lower-cost trajectories leading to behavior that is meaningfully different from CHOMP. 

BibTeX:
@inproceedings{ByravanSpaceTime, 

B. Boots, A. Gretton, & G. J. Gordon.  Hilbert Space Embeddings of Predictive State Representations. (Selected for plenary presentation: 11% Acceptance Rate) 
2013  Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-2013) 
Abstract: Predictive State Representations (PSRs) are an expressive class of models for controlled stochastic processes. PSRs represent state as a set of predictions of future observable events. Because PSRs are defined entirely in terms of observable data, statistically consistent estimates of PSR parameters can be learned efficiently by manipulating moments of observed training data. Most learning algorithms for PSRs have assumed that actions and observations are finite with low cardinality. In this paper, we generalize PSRs to infinite sets of observations and actions, using the recent concept of Hilbert space embeddings of distributions. The essence is to represent the state as a nonparametric conditional embedding operator in a Reproducing Kernel Hilbert Space (RKHS) and leverage recent work in kernel methods to estimate, predict, and update the representation. We show that these Hilbert space embeddings of PSRs are able to gracefully handle continuous actions and observations, and that our learned models outperform competing system identification algorithms on several prediction benchmarks. 

BibTeX:
@inproceedings{BootsHSEPSRs, 

B. Boots & G. J. Gordon.  A Spectral Learning Approach to Range-Only SLAM. (24% Acceptance Rate) 
2013  Proceedings of the 30th International Conference on Machine Learning (ICML-2013) 
Abstract: We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences. This algorithm is an instance of a general spectral system identification framework, from which it inherits several desirable properties, including statistical consistency and no local optima. Compared with popular batch optimization or multiple-hypothesis tracking (MHT) methods for range-only SLAM, our spectral approach offers guaranteed low computational requirements and good tracking performance. Compared with popular extended Kalman filter (EKF) or extended information filter (EIF) approaches, and many MHT ones, our approach does not need to linearize a transition or measurement model; such linearizations can cause severe errors in EKFs and EIFs, and to a lesser extent MHT, particularly for the highly non-Gaussian posteriors encountered in range-only SLAM. We provide a theoretical analysis of our method, including finite-sample error bounds. Finally, we demonstrate on a real-world robotic SLAM problem that our algorithm is not only theoretically justified, but works well in practice: in a comparison of multiple methods, the lowest errors come from a combination of our algorithm with batch optimization, but our method alone produces nearly as good a result at far lower computational cost. 

BibTeX:
@inproceedings{BootsspectralROSLAM, Author = "Byron Boots and Geoffrey Gordon", Booktitle = "Proceedings of the 30th International Conference on Machine Learning (ICML-2013)", Title = "A Spectral Learning Approach to Range-Only {SLAM}", Year = {2013} } 

B. Boots & G. J. Gordon.  Two-Manifold Problems with Applications to Nonlinear System Identification. (27% Acceptance Rate)  2012  Proceedings of the 29th International Conference on Machine Learning (ICML-2012) 
Abstract: Recently, there has been much interest in spectral approaches to learning manifolds, so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space, and discuss when two-manifold problems are useful. Finally, we demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data. 

BibTeX:
@inproceedings{Boots2Manifold, Author = "Byron Boots and Geoffrey Gordon", Booktitle = "Proceedings of the 29th International Conference on Machine Learning (ICML-2012)", Title = "Two-Manifold Problems with Applications to Nonlinear System Identification", Year = {2012} } 

B. Boots & G. J. Gordon.  An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems. (25% Acceptance Rate) 
2011  Proceedings of the 25th Conference on Artificial Intelligence (AAAI-2011) 
Abstract: Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems, for example, Hidden Markov Models (HMMs), Partially Observable Markov Decision Processes (POMDPs), and Transformed Predictive State Representations (TPSRs). These algorithms are attractive since they are statistically consistent and not subject to local optima. However, they are batch methods: they need to store their entire training data set in memory at once and operate on it as a large matrix, and so they cannot scale to extremely large data sets (either many examples or many features per example). In turn, this restriction limits their ability to learn accurate models of complex systems. To overcome these limitations, we propose a new online spectral algorithm, which uses tricks such as incremental SVD updates and random projections to scale to much larger data sets and more complex systems than previous methods. We demonstrate the new method on a high-bandwidth video mapping task, and illustrate desirable behaviors such as "closing the loop," where the latent state representation changes suddenly as the learner recognizes that it has returned to a previously known place. 
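A sketch of the random-projection ingredient (my toy, not the paper's learner): a random Gaussian map from R^d to R^k approximately preserves pairwise distances, which is what lets downstream spectral updates operate on the much smaller projected features. The dimensions below are illustrative.

```python
import math
import random

rng = random.Random(0)
d, k = 1000, 200   # original and projected dimensions (illustrative)

# Random Gaussian projection matrix, scaled so lengths are preserved in
# expectation (Johnson-Lindenstrauss style).
proj = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def project(v):
    return [sum(row[i] * v[i] for i in range(d)) for row in proj]

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

u = [rng.gauss(0, 1) for _ in range(d)]
v = [rng.gauss(0, 1) for _ in range(d)]
ratio = dist(project(u), project(v)) / dist(u, v)
assert 0.7 < ratio < 1.3   # distances preserved up to small distortion
```

In the paper this compression is paired with incremental SVD updates so the spectral model can be revised one observation at a time instead of from a stored batch.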

BibTeX:
@inproceedings{Bootsonlinepsr, Author = "Byron Boots and Geoffrey Gordon", Booktitle = "Proceedings of the 25th National Conference on Artificial Intelligence (AAAI-2011)", Title = "An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems", Year = "2011" } 
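The incremental SVD trick mentioned in the abstract above can be illustrated with a standard rank-one column update (in the spirit of Brand's incremental SVD). This is a generic numerical sketch of the idea, not code from the paper:

```python
import numpy as np

def incremental_svd(U, s, c, k):
    """Fold a new column c into a rank-limited factorization tracked as
    left singular vectors U and singular values s, truncating to rank k."""
    p = U.T @ c                    # component of c inside the current subspace
    r = c - U @ p                  # residual orthogonal to the subspace
    r_norm = np.linalg.norm(r)
    # Small core matrix whose SVD re-diagonalizes the extended factorization.
    K = np.zeros((len(s) + 1, len(s) + 1))
    K[:len(s), :len(s)] = np.diag(s)
    K[:len(s), -1] = p
    K[-1, -1] = r_norm
    Uk, sk, _ = np.linalg.svd(K)
    # Rotate the extended basis; the guard only matters for degenerate input.
    r_dir = r / r_norm if r_norm > 1e-12 else np.zeros_like(r)
    U_new = np.hstack([U, r_dir[:, None]]) @ Uk
    return U_new[:, :k], sk[:k]

# Feed a small random matrix in one column at a time.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U = A[:, :1] / np.linalg.norm(A[:, 0])
s = np.array([np.linalg.norm(A[:, 0])])
for j in range(1, 4):
    U, s = incremental_svd(U, s, A[:, j], k=4)
```

When the retained rank k is at least the matrix rank, the update is exact and the streamed singular values match a batch SVD; the memory savings in an online setting come from truncating k below the number of columns.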

B. Boots, S. M. Siddiqi & G. J. Gordon.  Closing the Learning-Planning Loop with Predictive State Representations. (Invited Journal Paper)  2011  The International Journal of Robotics Research (IJRR) 
Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learned model, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned PSR. This experiment shows that the algorithm learns a state space which captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test. 

BibTeX:
@article{Boots2011b, Author = "Byron Boots and Sajid Siddiqi and Geoffrey Gordon", Title = "Closing the Learning-Planning Loop with Predictive State Representations", Journal = "International Journal of Robotics Research (IJRR)", Volume = "30", Year = "2011", Pages = "954-956" } 

B. Boots & G. J. Gordon.  Predictive State Temporal Difference Learning. (24% Acceptance Rate) 
2011  Proceedings of Advances in Neural Information Processing Systems 24 (NIPS-2010) 
Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 

BibTeX:
@inproceedings{Boots11a, Author = "Byron Boots and Geoffrey J. Gordon ", Booktitle = "Proceedings of Advances in Neural Information Processing Systems 24 (NIPS)", Title = "Predictive State Temporal Difference Learning", Year = {2011} } 

L. Song, B. Boots, S. M. Siddiqi, G. J. Gordon & A. J. Smola  Hilbert Space Embeddings of Hidden Markov Models. (Best Paper Award) 
2010  Proceedings of the 27th International Conference on Machine Learning (ICML-2010) 
Abstract: Hidden Markov Models (HMMs) are important tools for modeling sequence data. However, they are restricted to discrete latent states, and are largely restricted to Gaussian and discrete observations. And, learning algorithms for HMMs have predominantly relied on local search heuristics, with the exception of spectral methods such as those described below. We propose a Hilbert space embedding of HMMs that extends traditional HMMs to structured and non-Gaussian continuous distributions. Furthermore, we derive a local-minimum-free kernel spectral algorithm for learning these HMMs. We apply our method to robot vision data, slot car inertial sensor data and audio event classification data, and show that in these applications, embedded HMMs exceed the previous state-of-the-art performance. 

BibTeX:
@inproceedings{Song:2010fk, Author = "L. Song and B. Boots and S. M. Siddiqi and G. J. Gordon and A. J. Smola", Booktitle = "Proceedings of the 27th International Conference on Machine Learning (ICML-2010)", Title = "Hilbert Space Embeddings of Hidden {M}arkov Models", Year = {2010} } 

B. Boots, S. M. Siddiqi & G. J. Gordon.  Closing the Learning-Planning Loop with Predictive State Representations. (Selected for plenary presentation: 8% acceptance rate) 
2010  Proceedings of Robotics: Science and Systems VI (RSS-2010) 
Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learned model, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned PSR. This experiment shows that the algorithm learns a state space which captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test. 

BibTeX:
@inproceedings{BootsRSS10, Author = "B. Boots and S. Siddiqi and G. Gordon", Title = "Closing the Learning-Planning Loop with Predictive State Representations", Booktitle = "Proceedings of Robotics: Science and Systems VI (RSS-2010)", Year = "2010", Address = "Zaragoza, Spain", Month = "June" } 

S. M. Siddiqi, B. Boots & G. J. Gordon.  Reduced-Rank Hidden Markov Models. (Selected for plenary presentation: 8% acceptance rate) 
2010  Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS-2010) 
Abstract: Hsu et al. (2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-k RR-HMMs are a larger class of models than k-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm's scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy. 

BibTeX:
@inproceedings{Siddiqi10a, author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon", title = "Reduced-Rank Hidden {Markov} Models", booktitle = "Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS-2010)", year = "2010" } 
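The Hsu et al. spectral algorithm that this paper relaxes can be sketched end-to-end on a toy HMM. Here the low-order moment matrices are computed exactly from hypothetical model parameters; in practice they are estimated from data, which is where finite-sample bounds like the one above apply:

```python
import numpy as np

# A toy HMM: m hidden states, n discrete observations (illustrative values).
m, n = 2, 3
T = np.array([[0.7, 0.4],
              [0.3, 0.6]])        # T[i, j] = Pr[next state i | state j]
O = np.array([[0.5, 0.1],
              [0.3, 0.3],
              [0.2, 0.6]])        # O[x, j] = Pr[observation x | state j]
pi = np.array([0.6, 0.4])         # initial state distribution

# Exact low-order observable moments.
P1 = O @ pi                                            # Pr[x1]
P21 = O @ T @ np.diag(pi) @ O.T                        # Pr[x2, x1]
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T  # Pr[x3, x2=x, x1]
        for x in range(n)]

# Spectral learning: project through the top-m left singular vectors of
# P21 and form one observable operator B_x per output symbol.
U = np.linalg.svd(P21)[0][:, :m]
pinv = np.linalg.pinv(U.T @ P21)
b1 = U.T @ P1
binf = pinv.T @ P1
B = [U.T @ P3x1[x] @ pinv for x in range(n)]

def prob(seq):
    """Pr[x_1, ..., x_t] from the learned observable operators."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

def prob_forward(seq):
    """Reference value from the standard HMM forward recursion."""
    a = O[seq[0]] * pi
    for x in seq[1:]:
        a = O[x] * (T @ a)
    return float(a.sum())
```

With exact moments the two computations agree to machine precision; the RR-HMM analysis is about relaxing the rank conditions these operators require.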

B. Boots, S. M. Siddiqi & G. J. Gordon.  Closing the Learning-Planning Loop with Predictive State Representations. (Short paper: for longer version see paper accepted at RSS above)  2010  Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2010) 
Abstract: A central problem in artificial intelligence is to plan to maximize future reward under uncertainty in a partially observable environment. Models of such environments include Partially Observable Markov Decision Processes (POMDPs) as well as their generalizations, Predictive State Representations (PSRs) and Observable Operator Models (OOMs). POMDPs model the state of the world as a latent variable; in contrast, PSRs and OOMs represent state by tracking occurrence probabilities of a set of future events (called tests or characteristic events) conditioned on past events (called histories or indicative events). Unfortunately, exact planning algorithms such as value iteration are intractable for most realistic POMDPs due to the curse of history and the curse of dimensionality. However, PSRs and OOMs hold the promise of mitigating both of these curses: first, many successful approximate planning techniques designed to address these problems in POMDPs can easily be adapted to PSRs and OOMs. Second, PSRs and OOMs are often more compact than their corresponding POMDPs (i.e., need fewer state dimensions), mitigating the curse of dimensionality. Finally, since tests and histories are observable quantities, it has been suggested that PSRs and OOMs should be easier to learn than POMDPs; with a successful learning algorithm, we can look for a model which ignores all but the most important components of state, reducing dimensionality still further. 

BibTeX:
@inproceedings{Boots10a, author = "Byron Boots and Sajid Siddiqi and Geoffrey J. Gordon", title = "Closing the Learning-Planning Loop with Predictive State Representations", booktitle = "Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS-2010)", year = "2010" } 

S. M. Siddiqi, B. Boots & G. J. Gordon.  A Constraint Generation Approach to Learning Stable Linear Dynamical Systems. (Selected for plenary presentation: 2.5% acceptance rate) 
2008  Proceedings of Advances in Neural Information Processing Systems 21 (NIPS-2007) 
Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply our algorithm to the task of learning dynamic textures from image sequences as well as to modeling biosurveillance drug-sales data. The constraint generation approach leads to noticeable improvement in the quality of simulated sequences. We compare our method to those of Lacy and Bernstein, with positive results in terms of accuracy, quality of simulated sequences, and efficiency. 

BibTeX:
@inproceedings{Siddiqi07b, author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon", title = "A Constraint Generation Approach to Learning Stable Linear Dynamical Systems", booktitle = "Proceedings of Advances in Neural Information Processing Systems 20 (NIPS07)", year = "2007" } 
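The stability issue the abstract describes is easy to reproduce: an unconstrained least-squares fit of a near-unstable system can land outside the stable region. The sketch below uses uniform shrinkage of the estimate as a deliberately crude stand-in for the paper's constraint-generation procedure, just to illustrate the spectral-radius test; the system parameters are made up:

```python
import numpy as np

def fit_stable_lds(X, eps=1e-6):
    """Fit x_{t+1} ~ A x_t by least squares; if the estimate is unstable
    (spectral radius > 1), shrink it uniformly back to the stable region.
    NOTE: shrinkage is a crude surrogate for constraint generation."""
    X0, X1 = X[:, :-1], X[:, 1:]
    A = X1 @ np.linalg.pinv(X0)                    # unconstrained estimate
    rho = float(np.max(np.abs(np.linalg.eigvals(A))))
    if rho > 1.0:
        A = A * ((1.0 - eps) / rho)
    return A

# Simulate a system whose true dynamics are slightly unstable.
rng = np.random.default_rng(1)
A_true = np.array([[1.05, 0.1],
                   [0.0, 0.9]])                    # eigenvalues 1.05 and 0.9
x = np.zeros((2, 50))
x[:, 0] = [1.0, 1.0]
for t in range(49):
    x[:, t + 1] = A_true @ x[:, t] + 0.01 * rng.standard_normal(2)

A_hat = fit_stable_lds(x)
rho_hat = float(np.max(np.abs(np.linalg.eigvals(A_hat))))
```

The paper's argument is that one-shot fixes like this shrinkage degrade simulation quality, whereas incrementally generated constraints with early stopping steer the estimate toward a nearby stable solution of higher quality.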

S. M. Siddiqi, B. Boots, G. J. Gordon, & A. W. Dubrawski  Learning Stable Multivariate Baseline Models for Outbreak Detection.  2007  Advances in Disease Surveillance 
Abstract: We propose a novel technique for building generative models of real-valued multivariate time series data streams. Such models are of considerable utility as baseline simulators in anomaly detection systems. The proposed algorithm, based on Linear Dynamical Systems (LDS), learns stable parameters efficiently while yielding more accurate results than previously known methods. The resulting model can be used to generate infinitely long sequences of realistic baselines using small samples of training data. 

BibTeX:
@article{Siddiqi07c, author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon and Artur W. Dubrawski", title = "Learning Stable Multivariate Baseline Models for Outbreak Detection", journal = "Advances in Disease Surveillance", year = "2007", volume = {4}, pages = {266} } 

B. Boots, S. Nundy & D. Purves.  Evolution of Visually Guided Behavior in Artificial Agents.  2007  Network: Computation in Neural Systems 
Abstract: Recent work on brightness, color, and form has suggested that human visual percepts represent the probable sources of retinal images rather than stimulus features as such. Here we investigate the plausibility of this empirical concept of vision by allowing autonomous agents to evolve in virtual environments based solely on the relative success of their behavior. The responses of evolved agents to visual stimuli indicate that fitness improves as the neural network control systems gradually incorporate the statistical relationship between projected images and behavior appropriate to the sources of the inherently ambiguous images. These results: (1) demonstrate the merits of a wholly empirical strategy of animal vision as a means of contending with the inverse optics problem; (2) argue that the information incorporated into biological visual processing circuitry is the relationship between images and their probable sources; and (3) suggest why human percepts do not map neatly onto physical reality. 

BibTeX:
@article{Boots2007a, author = "Byron Boots and Surajit Nundy and Dale Purves", title = "Evolution of Visually Guided Behavior in Artificial Agents", journal = "Network: Computation in Neural Systems", year = "2007", volume = {18}, number = {1}, pages = {11-34} } 

S. Majercik & B. Boots  DC-SSAT: A Divide-and-Conquer Approach to Solving Stochastic Satisfiability Problems Efficiently. (18% acceptance rate) 
2005  Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-2005) 
Abstract: We present DC-SSAT, a sound and complete divide-and-conquer algorithm for solving stochastic satisfiability (SSAT) problems that outperforms the best existing algorithm for solving such problems (ZANDER) by several orders of magnitude with respect to both time and space. DC-SSAT achieves this performance by dividing the SSAT problem into subproblems based on the structure of the original instance, caching the viable partial assignments (VPAs) generated by solving these subproblems, and using these VPAs to construct the solution to the original problem. DC-SSAT does not save redundant VPAs and each VPA saved is necessary to construct the solution. Furthermore, DC-SSAT builds a solution that is already human-comprehensible, allowing it to avoid the costly solution rebuilding phase in ZANDER. As a result, DC-SSAT is able to solve problems using, typically, 1-2 orders of magnitude less space than ZANDER, allowing DC-SSAT to solve problems ZANDER cannot solve due to space constraints. And, in spite of its more parsimonious use of space, DC-SSAT is typically 1-2 orders of magnitude faster than ZANDER. We describe the DC-SSAT algorithm and present empirical results comparing its performance to that of ZANDER on a set of SSAT problems. 

BibTeX:
@inproceedings{Majercik2005, author = "Stephen M. Majercik and Byron Boots", title = "DC-SSAT: A Divide-and-Conquer Approach to Solving Stochastic Satisfiability Problems Efficiently", booktitle = "Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05)", year = "2005" } 
Refereed Abstracts & Workshop Publications
Authors  Title  Year  Venue 

G. Williams, N. Wagener, B. Goldfain, P. Drews, J. Rehg, B. Boots, & E. Theodorou.  Information Theoretic MPC Using Neural Network Dynamics. [PDF Coming Soon] 
2016  The NIPS Deep Reinforcement Learning Workshop 
Y. Pan, X. Yan, E. Theodorou, & B. Boots.  Solving the Linear Bellman Equation via Kernel Embeddings and Stochastic Gradient Descent. (Selected for oral presentation) [PDF Coming Soon] 
2016  NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning 
C. Cheng & B. Boots.  Incremental Variational Sparse Gaussian Process Regression. [PDF Coming Soon] 
2016  NIPS Workshop on Adaptive and Scalable Nonparametric Methods in Machine Learning 
Y. Pan, X. Yan, E. Theodorou, & B. Boots.  Scalable Reinforcement Learning via Trajectory Optimization and Approximate Gaussian Process Regression.  2015  NIPS Workshop on Advances in Approximate Bayesian Inference 
Abstract: In order to design an efficient RL algorithm, we combine the attractive characteristics of two approaches: local trajectory optimization and random feature approximations. Local trajectory optimization methods, such as Differential Dynamic Programming (DDP), are a class of approaches for solving nonlinear optimal control problems. Compared to global approaches, DDP shows superior computational efficiency and scalability to high-dimensional problems. The principal limitation of DDP is that it relies on an accurate and explicit representation of the dynamics. In this work we take a nonparametric approach and learn the dynamics using Gaussian processes (GPs). GPs have demonstrated encouraging performance in modeling dynamical systems, but are also computationally expensive and do not scale to moderate or large datasets. While a number of approximation methods exist, sparse spectrum Gaussian process regression (SSGPR) stands out with a superior combination of efficiency and accuracy. By combining the benefits of both DDP and SSGPR, we show that our approach is able to scale to high-dimensional dynamical systems and large datasets. 

BibTeX:
@article{PanNIPSWS15, author = "Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots", title = "Scalable Reinforcement Learning via Trajectory Optimization and Approximate {G}aussian Process Regression", journal = "NIPS Workshop on Advances in Approximate Bayesian Inference", year = "2015" }  
A. Shaban, M. Farajtabar, B. Xie, L. Song, & B. Boots.  Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods.  2015  NIPS Workshop on Nonconvex Optimization for Machine Learning: Theory and Practice 
Abstract: Probabilistic latent-variable models are a fundamental tool in statistics and machine learning. Despite their widespread use, identifying the parameters of basic latent variable models continues to be an extremely challenging problem. Traditional maximum-likelihood-based learning algorithms find valid parameters, but suffer from high computational cost, slow convergence, and local optima. In contrast, recently developed method-of-moments-based algorithms are computationally efficient and provide strong statistical guarantees, but are not guaranteed to find valid parameters. In this work, we introduce a two-stage learning algorithm for latent variable models. We first use the method of moments to find a solution that is close to the optimal solution but not necessarily in the valid set of model parameters. We then incrementally refine the solution via exterior point optimization until a local optimum that is arbitrarily near the valid set of parameters is found. We perform several experiments on synthetic and real-world data and show that our approach is more accurate than previous work, especially when training data is limited. 

BibTeX:
@article{ShabanNIPSWS15, author = "Amirreza Shaban and Mehrdad Farajtabar and Bo Xie and Le Song and Byron Boots", title = "Learning Latent Variable Models by Improving Spectral Solutions with Exterior Point Methods", journal = "NIPS Workshop on Nonconvex Optimization for Machine Learning: Theory and Practice", year = "2015" }  
X. Yan, V. Indelman, & B. Boots.  Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping. (Best Poster Award) 
2015  RSS Workshop on the Problem of Mobile Sensors: Setting future goals and indicators of progress for SLAM. 
Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has found success by representing the trajectory as a Gaussian process. Gaussian processes can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of this approach is that STEAM is formulated as a batch estimation problem. In this paper we provide the critical extensions necessary to transform the existing batch algorithm into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets. 

BibTeX:
@article{YanRSSWS15, author = "Xinyan Yan and Vadim Indelman and Byron Boots", title = "Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation \& Mapping", journal = "The Problem of Mobile Sensors: Setting future goals and indicators of progress for SLAM", year = "2015" }  
X. Yan, B. Xie, L. Song, & B. Boots.  Large-Scale Gaussian Process Regression via Doubly Stochastic Gradient Descent.  2015  The ICML Workshop on Large-Scale Kernel Learning: Challenges and New Opportunities 
Abstract: Gaussian process regression (GPR) is a popular tool for nonlinear function approximation. Unfortunately, GPR can be difficult to use in practice due to the O(n^2) memory and O(n^3) processing requirements for n training data points. We propose a novel approach to scaling up GPR to handle large datasets using the recent concept of doubly stochastic functional gradients. Our approach relies on the fact that GPR can be expressed as a convex optimization problem that can be solved by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random features, and then descending using this noisy functional gradient. The effectiveness of the resulting algorithm is evaluated on the well-known problem of learning the inverse dynamics of a robot manipulator. 

BibTeX:
@article{YanICMLWS15, author = "Xinyan Yan and Bo Xie and Le Song and Byron Boots", title = "Large-Scale Gaussian Process Regression via Doubly Stochastic Gradient Descent", journal = "The ICML Workshop on Large-Scale Kernel Learning", year = "2015" }  
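The two stochastic approximations in the abstract (random training points and random features) can be sketched with random Fourier features and minibatch SGD on a toy 1-D regression problem. All constants below (feature count, kernel lengthscale, learning rate) are illustrative, and the random features are drawn once up front rather than resampled at every step as in the doubly stochastic scheme:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features approximating an RBF kernel with lengthscale 1/3:
# k(x, x') ~ phi(x) . phi(x')  (Rahimi-Recht construction).
d, D = 1, 200
W = 3.0 * rng.standard_normal((D, d))   # frequencies ~ N(0, lengthscale^-2)
b = rng.uniform(0.0, 2.0 * np.pi, D)

def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Toy data: y = sin(3x) + noise.
X = rng.uniform(-2.0, 2.0, (500, d))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(500)

# Minibatch SGD on the regularized least-squares objective in feature space.
w = np.zeros(D)
lr, lam = 0.3, 1e-4
for step in range(4000):
    idx = rng.integers(0, 500, 32)      # stochasticity over training points
    F = phi(X[idx])
    w -= lr * (F.T @ (F @ w - y[idx]) / 32 + lam * w)

train_mse = float(np.mean((phi(X) @ w - y) ** 2))
```

The functional-gradient view in the workshop paper additionally treats the feature sampling itself as part of each stochastic gradient, which avoids committing to a fixed D in advance.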
X. Yan, V. Indelman, & B. Boots.  Incremental Sparse Gaussian Process Regression for Continuous-time Trajectory Estimation & Mapping.  2014  NIPS Workshop on Autonomously Learning Robots 
Abstract: Recent work has investigated the problem of continuous-time trajectory estimation and mapping for mobile robots by formulating the problem as sparse Gaussian process regression. Gaussian processes provide a continuous-time representation of the robot trajectory, which elegantly handles asynchronous and sparse measurements, and allows the robot to query the trajectory to recover its estimated position at any time of interest. One of the major drawbacks of this approach is that Gaussian process regression formulates continuous-time trajectory estimation as a batch estimation problem. In this work, we provide the critical extensions necessary to transform this existing batch approach into an extremely efficient incremental approach. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets. 

BibTeX:
@article{BootsNIPSWS14, author = "Xinyan Yan and Vadim Indelman and Byron Boots", title = "Incremental Sparse GP Regression for Continuous-time Trajectory Estimation \& Mapping", journal = "NIPS Workshop on Autonomously Learning Robots", year = "2014" } 

Z. Marinho, A. Dragan, A. Byravan, S. Srinivasa, G. Gordon, & B. Boots.  Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces.  2014  NIPS Workshop on Autonomously Learning Robots 
Abstract: We introduce a functional gradient descent based trajectory optimization algorithm for robot motion planning in arbitrary Reproducing Kernel Hilbert Spaces (RKHSs). Functional gradient algorithms are a popular choice for motion planning in complex many-degree-of-freedom robots. In theory, these algorithms work by directly optimizing continuous trajectories to avoid obstacles while maintaining smoothness. However, in practice, functional gradient algorithms commit to a finite parametrization of the trajectories, often as a finite set of waypoints. Such a parametrization limits expressiveness, and can fail to produce smooth trajectories despite the inclusion of smoothness in the objective. As a result, we often observe practical problems such as slow convergence and the requirement to choose an inconveniently small step size. Our work generalizes the waypoint parametrization to arbitrary RKHSs by formulating trajectory optimization as minimization of a cost functional. We derive a gradient update method that is able to take larger steps and achieve a locally minimum trajectory in just a few iterations. Depending on the selection of a kernel, we can directly optimize in spaces of continuous trajectories that are inherently smooth, and that have a low-dimensional, adaptively chosen parametrization. Our experiments illustrate the effectiveness of the planner for two different kernels, RBFs and B-splines, as compared to the standard discretized waypoint representation. 

BibTeX:
@article{Marinho14, author = "Zita Marinho and Anca Dragan and Arun Byravan and Siddhartha Srinivasa and Geoffrey J. Gordon and Byron Boots", title = "Functional Gradient Motion Planning in Reproducing Kernel Hilbert Spaces", journal = "NIPS Workshop on Autonomously Learning Robots", year = "2014" } 

A. Venkatraman, B. Boots, M. Hebert, & J. A. Bagnell.  Data as Demonstrator with Applications to System Identification.  2014  NIPS Workshop on Autonomously Learning Robots 
Abstract: Machine learning techniques for system identification and time series modeling often phrase the problem as the optimization of a loss function over a single time-step prediction. However, in many applications, the learned model is recursively applied in order to make a multiple-step prediction, resulting in compounding prediction errors. We present DATA AS DEMONSTRATOR, an approach that reuses training data to make a no-regret learner robust to errors made during multi-step prediction. We present results on the task of linear system identification applied to a simulated system and to a real-world dataset. 

BibTeX:
@article{Venkatraman14, author = "Arun Venkatraman and Byron Boots and Martial Hebert and James Bagnell", title = "Data as Demonstrator with Applications to System Identification", journal = "NIPS Workshop on Autonomously Learning Robots", year = "2014" } 
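The data-reuse idea can be sketched for linear system identification: roll the current model along the training trajectory and add (model prediction, true next observation) pairs to the regression. This is a one-step-deep, hypothetical variant for illustration, not the paper's full no-regret algorithm; the system matrix and noise level are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth linear system, observed with noise.
A_true = np.array([[0.99, 0.2],
                   [-0.2, 0.95]])
x = np.zeros((2, 120))
x[:, 0] = [1.0, 0.0]
for t in range(119):
    x[:, t + 1] = A_true @ x[:, t]
X = x + 0.05 * rng.standard_normal(x.shape)       # noisy observations

def fit(inputs, outputs):
    """Least-squares estimate of A from input/output column pairs."""
    return outputs @ np.linalg.pinv(inputs)

# Round 0: ordinary one-step regression on observed pairs.
P_in, P_out = X[:, :-1], X[:, 1:]
A_hat = fit(P_in, P_out)

# DaD-style rounds: pair the model's own predicted states with the true
# next observations, so training covers states that multi-step rollouts
# of the model actually visit.
for _ in range(5):
    preds = A_hat @ X[:, :-1]                     # model's state at t+1
    P_in = np.hstack([P_in, preds[:, :-1]])
    P_out = np.hstack([P_out, X[:, 2:]])          # true observation at t+2
    A_hat = fit(P_in, P_out)
```

Each round makes the regression correct the model at inputs the model generates itself, which is the mechanism the paper formalizes as a no-regret reduction.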

A. Byravan, M. Montfort, B. Ziebart, B. Boots, & D. Fox.  Layered Hybrid Inverse Optimal Control for Learning Robot Manipulation from Demonstration.  2014  NIPS Workshop on Autonomously Learning Robots 
Abstract: Inverse optimal control (IOC) is a powerful approach for learning robotic controllers from demonstration that estimates a cost function which rationalizes demonstrated control trajectories. Unfortunately, it is difficult to apply in settings where optimal control can only be solved approximately. Local IOC approaches rationalize demonstrated trajectories based on a linear-quadratic approximation around a good reference trajectory (i.e., the demonstrated trajectory itself). Without this same reference trajectory, however, dissimilar control results. We address the complementary problem of using IOC to find appropriate reference trajectories in these computationally challenging control tasks. After discussing the inherent difficulties of learning within this setting, we present a projection technique from the original trajectory space to a discrete and tractable trajectory space and perform IOC on this space. Control trajectories are projected back to the original space and locally optimized. We demonstrate the effectiveness of the approach with experiments conducted on a 7-degree-of-freedom robotic arm. 

BibTeX:
@article{Byravan14, author = "Arunkumar Byravan and Matthew Montfort and Brian Ziebart and Byron Boots and Dieter Fox", title = "Layered Hybrid Inverse Optimal Control for Learning Robot Manipulation from Demonstration", journal = "NIPS Workshop on Autonomously Learning Robots", year = "2014" } 

B. Boots, A. Byravan, & D. Fox.  Learning Predictive Models of a Depth Camera & Manipulator from Raw Execution Traces.  2013  NIPS Workshop on Advances in Machine Learning for Sensorimotor Control 
Abstract: We attack the problem of learning a predictive model of a depth camera and manipulator directly from raw execution traces. While the problem of learning manipulator models from visual and proprioceptive data has been addressed before, existing techniques often rely on assumptions about the structure of the robot or tracked features in observation space. We make no such assumptions. Instead, we formulate the problem as that of learning a high-dimensional controlled stochastic process. We leverage recent work on nonparametric predictive state representations to learn a generative model of the depth camera and robotic arm from sequences of uninterpreted actions and observations. We perform several experiments in which we demonstrate that our learned model can accurately predict future observations in response to sequences of motor commands. 

BibTeX:
@article{Boots13NIPSa, author = "Byron Boots and Arunkumar Byravan and Dieter Fox", title = "Learning Predictive Models of a Depth Camera & Manipulator from Raw Execution Traces", journal = "NIPS Workshop on Advances in Machine Learning for Sensorimotor Control", year = "2013" } 

B. Boots & D. Fox.  Learning Dynamic Policies from Demonstration.  2013  NIPS Workshop on Advances in Machine Learning for Sensorimotor Control 
Abstract: We address the problem of learning a policy directly from expert demonstrations. Typically, this problem is solved with a supervised learning approach such as regression or classification resulting in a reactive policy. Unfortunately, reactive policies cannot model long-range dependencies, and this omission can result in suboptimal performance. We take a different approach. We observe that policies and dynamical systems are mathematical duals, and we use this fact to leverage the rich literature on system identification to learn policies from demonstration. System identification algorithms often have desirable properties like the ability to model long-range dependencies, statistical consistency, and efficient off-the-shelf implementations. We show that by applying system identification to learning from demonstration problems, all of these properties can be carried over to the learning from demonstration domain, resulting in improved practical performance. 

BibTeX:
@article{Boots13NIPSb, author = "Byron Boots and Dieter Fox", title = "Learning Dynamic Policies from Demonstration", journal = "NIPS Workshop on Advances in Machine Learning for Sensorimotor Control", year = "2013" } 

B. Boots, A. Gretton, & G. J. Gordon.  Hilbert Space Embeddings of PSRs.  2013  ICML Workshop on Machine Learning and System Identification 
Abstract: We fully generalize PSRs to continuous observations and actions using a recent concept called Hilbert space embeddings of distributions. The essence of our method is to represent distributions of tests, histories, observations, and actions, as points in (possibly) infinite-dimensional Reproducing Kernel Hilbert Spaces (RKHSs). During filtering we update these distributions using a kernel version of Bayes' rule. To improve computational tractability, we develop a spectral system identification method to learn a succinct parameterization of the target system. 

BibTeX:
@article{Boots13ICML, author = "Byron Boots and Arthur Gretton and Geoffrey J. Gordon", title = "Hilbert Space Embeddings of PSRs", journal = "ICML Workshop on Machine Learning and System Identification (MLSYSID)", year = "2013" } 

B. Boots, A. Gretton, & G. J. Gordon.  Hilbert Space Embeddings of PSRs.  2012  NIPS Workshop on Spectral Algorithms for Latent Variable Models 
Abstract: We fully generalize PSRs to continuous observations and actions using a recent concept called Hilbert space embeddings of distributions. The essence of our method is to represent distributions of tests, histories, observations, and actions, as points in (possibly) infinite-dimensional Reproducing Kernel Hilbert Spaces (RKHSs). During filtering we update these distributions using a kernel version of Bayes' rule. To improve computational tractability, we develop a spectral system identification method to learn a succinct parameterization of the target system. 

BibTeX:
@article{Boots12NIPSb, author = "Byron Boots and Arthur Gretton and Geoffrey J. Gordon", title = "Hilbert Space Embeddings of PSRs", journal = "NIPS Workshop on Spectral Algorithms for Latent Variable Models", year = "2012" } 
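The basic computational object behind these Hilbert space embedding methods is the conditional mean embedding, which can be estimated entirely from Gram matrices. The sketch below is a minimal illustration of that estimator, not the paper's PSR filtering algorithm; the RBF kernel, bandwidth, and regularizer are illustrative assumptions:

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    # Gaussian RBF Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def conditional_embedding_weights(X, x_query, lam=1e-3):
    # Weights beta such that the embedding of P(Y | X = x_query) is
    # sum_i beta_i * phi(y_i); beta = (K_XX + n * lam * I)^{-1} k_X(x_query)
    n = X.shape[0]
    K = rbf_gram(X, X)
    k = rbf_gram(X, x_query[None, :]).ravel()
    return np.linalg.solve(K + n * lam * np.eye(n), k)

# Predict E[f(Y) | X = x] for f(y) = y, i.e., a kernel regression estimate
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
Y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
beta = conditional_embedding_weights(X, np.array([1.0]))
print(beta @ Y)  # roughly sin(1.0) ≈ 0.84
```

The same weights can be applied to any feature of Y, which is what lets filtering updates be carried out without ever representing densities explicitly.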

B. Boots & G. J. Gordon.  A Spectral Learning Approach to Range-Only SLAM.  2012  NIPS Workshop on Spectral Algorithms for Latent Variable Models 
Abstract: We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences. This algorithm is an instance of a general spectral system identification framework, from which it inherits several desirable properties, including statistical consistency and no local optima. Compared with popular batch optimization or multiple-hypothesis tracking methods for range-only SLAM, our spectral approach offers guaranteed low computational requirements and good tracking performance. 

BibTeX:
@article{Boots12NIPSa, author = "Byron Boots and Geoffrey J. Gordon", title = "A Spectral Learning Approach to Range-Only SLAM", journal = "NIPS Workshop on Spectral Algorithms for Latent Variable Models", year = "2012" } 

B. Boots & G. J. Gordon.  Online Spectral Identification of Dynamical Systems.  2011  NIPS Workshop on Sparse Representation and Low-rank Approximation 
Abstract: Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems, for example, Hidden Markov Models (HMMs), Partially Observable Markov Decision Processes (POMDPs), and Transformed Predictive State Representations (TPSRs). These algorithms are attractive since they are statistically consistent and not subject to local optima. However, they are batch methods: they need to store their entire training data set in memory at once and operate on it as a large matrix, and so they cannot scale to extremely large data sets (either many examples or many features per example). In turn, this restriction limits their ability to learn accurate models of complex systems. To overcome these limitations, we propose a new online spectral algorithm, which uses tricks such as incremental SVD updates and random projections to scale to much larger data sets and more complex systems than previous methods. We demonstrate the new method on a high-bandwidth video mapping task, and illustrate desirable behaviors such as "closing the loop," where the latent state representation changes suddenly as the learner recognizes that it has returned to a previously known place. 

BibTeX:
@article{Boots11nips, author = "Byron Boots and Geoffrey J. Gordon", title = "Online Spectral Identification of Dynamical Systems", journal = "NIPS Workshop on Sparse Representation and Low-rank Approximation", year = "2011" } 
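The incremental SVD trick mentioned in the abstract can be sketched compactly: when a new data column arrives, a small core matrix absorbs the update, so the full data matrix never needs to be stored. This is a minimal Brand-style rank-k column update, not the paper's complete algorithm; the rank, synthetic data, and tolerance are illustrative assumptions:

```python
import numpy as np

def incremental_svd_update(U, S, c, k):
    # Rank-k thin-SVD update when a new column c arrives (Brand-style).
    p = U.T @ c                      # component of c inside the current basis
    r = c - U @ p                    # residual orthogonal to the basis
    rho = np.linalg.norm(r)
    j = r / rho if rho > 1e-12 else np.zeros_like(c)
    # A small (k+1) x (k+1) core matrix absorbs the whole update
    K = np.zeros((len(S) + 1, len(S) + 1))
    K[:len(S), :len(S)] = np.diag(S)
    K[:len(S), -1] = p
    K[-1, -1] = rho
    Uk, Sk, _ = np.linalg.svd(K)
    U_new = np.hstack([U, j[:, None]]) @ Uk
    return U_new[:, :k], Sk[:k]      # truncate back to rank k

# Stream the columns of an exactly rank-2 matrix and track its column space
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 100))
U, S, _ = np.linalg.svd(A[:, :2], full_matrices=False)
for t in range(2, 100):
    U, S = incremental_svd_update(U, S, A[:, t], k=2)
U_true, _, _ = np.linalg.svd(A, full_matrices=False)
overlap = np.linalg.svd(U.T @ U_true[:, :2], compute_uv=False)
print(overlap.min())  # ≈ 1.0: the streamed basis matches the true column space
```

Each update costs O(mk^2) rather than the O(mn^2) of a batch SVD, which is what makes the online setting tractable.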

B. Boots, S. M. Siddiqi & G. J. Gordon.  Closing the Learning-Planning Loop with Predictive State Representations.  2009  NIPS Workshop on Probabilistic Approaches for Robotics and Control 
Abstract: We propose a principled and provably statistically consistent model-learning algorithm, and demonstrate positive results on a challenging high-dimensional problem with continuous observations. In particular, we propose a novel, consistent spectral algorithm for learning a variant of PSRs called Transformed PSRs (TPSRs) directly from execution traces. 

BibTeX:
@article{Boots09nips, author = "Byron Boots and Sajid M. Siddiqi and Geoffrey J. Gordon", title = "Closing the Learning-Planning Loop with Predictive State Representations", journal = "NIPS Workshop on Probabilistic Approaches for Robotics and Control", year = "2009" } 

D. Purves & B. Boots.  Evolution of Visually Guided Behavior in Artificial Agents.  2006  Vision Science Society Annual Meeting/Journal of Vision 
Abstract: Recent work on brightness, color and form has suggested that human visual percepts represent the probable sources of retinal images rather than stimulus features as such. We have investigated this empirical concept of vision by asking whether agents using neural network control systems evolve successful visually guided behavior based solely on the statistical relationship of images on their sensor arrays and the probable sources of the images in a simulated environment. A virtual environment was created with OpenGL consisting of an arena with a central obstacle, similar to arenas used in evolutionary robotics experiments. The neural control system for each agent comprised a single-layer, feedforward network that connected all 256 inputs from a sensor array to two output nodes that encoded rotation and translation responses. Each agent's behavioral actions in the environment were evaluated, and the fittest individuals selected to produce a new population according to a standard genetic algorithm. This process was repeated until the average fitness of subsequent generations reached a plateau. Analysis of the actions of evolved agents in response to visual input showed their neural network control systems had incorporated the statistical relationship between projected images and their possible sources, and that this information was used to produce increasingly successful visually guided behavior. The simplicity of this paradigm notwithstanding, these results support the idea that biological vision has evolved to solve the inverse problem on a wholly empirical basis, and provide a novel way of exploring visual processing. 

BibTeX:
@article{Purves2006, author = "Dale Purves and Byron Boots", title = "Evolution of Visually Guided Behavior in Artificial Agents", journal = "Journal of Vision", year = {2006}, volume = {6(6)}, pages = {356a} } 
Technical Reports & Book Chapters
Authors  Title  Year  Tech. Report/Book 

J. Dong, J. Burnham, B. Boots, G. Rains, & F. Dellaert.  4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture.  2016  Technical Report arXiv:1610.02482 
Abstract: Autonomous crop monitoring at high spatial and temporal resolution is a critical problem in precision agriculture. While Structure from Motion and Multi-View Stereo algorithms can finely reconstruct the 3D structure of a field with low-cost image sensors, these algorithms fail to capture the dynamic nature of continuously growing crops. In this paper we propose a 4D reconstruction approach to crop monitoring, which employs a spatio-temporal model of dynamic scenes that is useful for precision agriculture applications. Additionally, we provide a robust data association algorithm to address the problem of large appearance changes due to scenes being viewed from different angles at different points in time, which is critical to achieving 4D reconstruction. Finally, we collected a high-quality dataset with ground truth statistics to evaluate the performance of our method. We demonstrate that our 4D reconstruction approach provides models that are qualitatively correct with respect to visual appearance and quantitatively accurate when measured against the ground truth geometric properties of the monitored crops. 

BibTeX:
@article{Dong16arxiv, author = {Jing Dong and John Gary Burnham and Byron Boots and Glen C. Rains and Frank Dellaert}, title = {4D Crop Monitoring: Spatio-Temporal Reconstruction for Agriculture}, journal = {CoRR}, volume = {abs/1610.02482}, year = {2016}, url = {http://arxiv.org/abs/1610.02482} } 

Y. Pan, X. Yan, E. Theodorou, & B. Boots  Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference.  2016  Technical Report arXiv:1608.06235 
Abstract: Robotic systems must be able to quickly and robustly make decisions when operating in uncertain and dynamic environments. While Reinforcement Learning (RL) can be used to compute optimal policies with little prior knowledge about the environment, it suffers from slow convergence. An alternative approach is Model Predictive Control (MPC), which optimizes policies quickly, but also requires accurate models of the system dynamics and environment. In this paper we propose a new approach, adaptive probabilistic trajectory optimization, that combines the benefits of RL and MPC. Our method uses scalable approximate inference to learn and update probabilistic models in an online, incremental fashion while also computing optimal control policies via successive local approximations. We present two variations of our algorithm based on the Sparse Spectrum Gaussian Process (SSGP) model, and we test our algorithm on three learning tasks, demonstrating the effectiveness and efficiency of our approach. 

BibTeX:
@article{Pan16arxiv, author = {Yunpeng Pan and Xinyan Yan and Evangelos Theodorou and Byron Boots}, title = {Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference}, journal = {CoRR}, volume = {abs/1608.06235}, year = {2016}, url = {http://arxiv.org/abs/1608.06235} } 
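The Sparse Spectrum Gaussian Process underlying both algorithm variants approximates an RBF kernel with a finite set of random frequencies, reducing GP regression to Bayesian linear regression in a trigonometric feature space. The sketch below computes only the posterior mean; the frequencies, lengthscale, and regularizer are illustrative assumptions, not values from the paper:

```python
import numpy as np

def ssgp_features(X, Omega):
    # Trigonometric features whose inner product approximates an RBF kernel
    Z = X @ Omega.T
    m = Omega.shape[0]
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(m)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (300, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(300)

m = 100                                      # number of spectral points
Omega = 2.0 * rng.standard_normal((m, 1))    # frequencies ~ N(0, 1/lengthscale^2)
Phi = ssgp_features(X, Omega)
lam = 1e-2
# Regularized linear regression in feature space = the SSGP posterior mean
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2 * m), Phi.T @ y)
pred = ssgp_features(np.array([[1.0]]), Omega) @ w
print(pred[0])  # should be close to sin(2.0) ≈ 0.91
```

Because the model is now linear in a fixed finite feature set, the posterior can be updated incrementally as new data arrives, which is what makes the online, MPC-style setting practical.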

B. Dai, N. He, Y. Pan, B. Boots, & L. Song  Learning from Conditional Distributions via Dual Kernel Embeddings.  2016  Technical Report arXiv:1607.04579 
Abstract: In many machine learning problems, such as policy evaluation in reinforcement learning and learning with invariance, each data point x itself is a conditional distribution p(z|x), and we want to learn a function f which links these conditional distributions to target values y. The learning problem becomes very challenging when we only have limited samples, or in the extreme case only one sample, from each conditional distribution p(z|x). Commonly used approaches either assume that z is independent of x, or require an overwhelmingly large sample size from each conditional distribution. To address these challenges, we propose a novel approach which reformulates the original problem into a min-max optimization problem. In the new view, we only need to deal with the kernel embedding of the joint distribution p(z,x), which is easy to estimate. Furthermore, we design an efficient learning algorithm based on mirror descent stochastic approximation, and establish the sample complexity for learning from conditional distributions. Finally, numerical experiments on both synthetic and real data show that our method can significantly improve over the previous state of the art. 

BibTeX:
@article{Dai16arxiv, author = {Bo Dai and Niao He and Yunpeng Pan and Byron Boots and Le Song}, title = {Learning from Conditional Distributions via Dual Kernel Embeddings}, journal = {CoRR}, volume = {abs/1607.04579}, year = {2016}, url = {http://arxiv.org/abs/1607.04579} } 

Z. Marinho, A. Dragan, A. Byravan, B. Boots, S. Srinivasa, & G. J. Gordon.  Functional Gradient Motion Planning in Reproducing Kernel Hilbert Space.  2016  Technical Report arXiv:1601.03648 
Abstract: We introduce a functional gradient descent trajectory optimization algorithm for robot motion planning in Reproducing Kernel Hilbert Spaces (RKHSs). Functional gradient algorithms are a popular choice for motion planning in complex many-degree-of-freedom robots, since they (in theory) work by directly optimizing within a space of continuous trajectories to avoid obstacles while maintaining geometric properties such as smoothness. However, in practice, functional gradient algorithms typically commit to a fixed, finite parameterization of trajectories, often as a list of waypoints. Such a parameterization can lose much of the benefit of reasoning in a continuous trajectory space: e.g., it can require taking an inconveniently small step size and a large number of iterations to maintain smoothness. Our work generalizes functional gradient trajectory optimization by formulating it as minimization of a cost functional in an RKHS. This generalization lets us represent trajectories as linear combinations of kernel functions, without any need for waypoints. As a result, we are able to take larger steps and achieve a locally optimal trajectory in just a few iterations. Depending on the selection of kernel, we can directly optimize in spaces of trajectories that are inherently smooth in velocity, jerk, curvature, etc., and that have a low-dimensional, adaptively chosen parameterization. Our experiments illustrate the effectiveness of the planner for different kernels, including Gaussian RBFs, Laplacian RBFs, and B-splines, as compared to the standard discretized waypoint representation. 

BibTeX:
@article{Marinho16arxiv, author = {Zita Marinho and Anca Dragan and Arun Byravan and Byron Boots and Siddhartha Srinivasa and Geoffrey J. Gordon}, title = {Functional Gradient Motion Planning in Reproducing Kernel Hilbert Space}, journal = {CoRR}, volume = {abs/1601.03648}, year = {2016}, url = {http://arxiv.org/abs/1601.03648} } 
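The central representational idea, a trajectory expressed as a linear combination of kernel functions rather than a waypoint list, can be sketched directly. The via point, kernel width, and least-squares fit below are illustrative assumptions; the paper's actual method updates the weights via functional gradient steps on an obstacle-cost functional:

```python
import numpy as np

def rbf_features(t, centers, sigma=0.1):
    # Gaussian RBF evaluations k(t, t_i) at each kernel center t_i
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * sigma**2))

# Trajectory xi(t) = sum_i w_i k(t, t_i): smooth by construction, no waypoints
centers = np.linspace(0.0, 1.0, 8)
t_constraints = np.array([0.0, 0.5, 1.0])    # start, via point, goal times
targets = np.array([0.0, 0.8, 1.0])          # hypothetical 1-D joint positions
K = rbf_features(t_constraints, centers)
w, *_ = np.linalg.lstsq(K, targets, rcond=None)  # min-norm interpolating weights

t = np.linspace(0.0, 1.0, 101)
xi = rbf_features(t, centers) @ w            # query the trajectory at any time
print(xi[0], xi[50], xi[100])  # ≈ 0.0, 0.8, 1.0
```

Because xi is a finite sum of smooth kernels, derivatives like velocity and jerk are available in closed form, which is what lets the planner penalize them without discretization.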

W. Sun, A. Venkatraman, B. Boots, & J. A. Bagnell.  Learning to Filter with Predictive State Inference Machines.  2015  Technical Report arXiv:1512.08836 
Abstract: Latent state space models are one of the most fundamental and widely used tools for modeling dynamical systems. Traditional Maximum Likelihood Estimation (MLE) based approaches aim to maximize the likelihood objective, which is nonconvex due to latent states. While nonconvex optimization methods like EM can learn models that locally optimize the likelihood objective, using the locally optimal model for an inference task such as Bayesian filtering usually does not have performance guarantees. In this work, we propose a method that considers the inference procedure on the dynamical system as a composition of predictors. Instead of optimizing a given parametrization of latent states, we learn predictors for inference in predictive belief space, where we can use sufficient features of observations for supervision of our learning algorithm. We further show that our algorithm, the Predictive State Inference Machine, has theoretical performance guarantees on the inference task. Empirical verification across several dynamical system benchmarks, ranging from a simulated helicopter to recorded telemetry traces from a robot, showcases the capabilities of trained Inference Machines. 

BibTeX:
@article{Sun15arxiv, author = {Wen Sun and Arun Venkatraman and Byron Boots and J. Andrew Bagnell}, title = {Learning to Filter with Predictive State Inference Machines}, journal = {CoRR}, volume = {abs/1512.08836}, year = {2015}, url = {http://arxiv.org/abs/1512.08836} } 

X. Yan, V. Indelman, & B. Boots.  Incremental Sparse GP Regression for Continuous-time Trajectory Estimation & Mapping.  2015  Technical Report arXiv:1504.02696 
Abstract: Recent work on simultaneous trajectory estimation and mapping (STEAM) for mobile robots has found success by representing the trajectory as a Gaussian process. Gaussian processes can represent a continuous-time trajectory, elegantly handle asynchronous and sparse measurements, and allow the robot to query the trajectory to recover its estimated position at any time of interest. A major drawback of this approach is that STEAM is formulated as a batch estimation problem. In this paper we provide the critical extensions necessary to transform the existing batch algorithm into an extremely efficient incremental algorithm. In particular, we are able to vastly speed up the solution time through efficient variable reordering and incremental sparse updates, which we believe will greatly increase the practicality of Gaussian process methods for robot mapping and localization. Finally, we demonstrate the approach and its advantages on both synthetic and real datasets. 

BibTeX:
@article{Yan15arxiv, author = {Xinyan Yan and Vadim Indelman and Byron Boots}, title = {Incremental Sparse {GP} Regression for Continuous-time Trajectory Estimation {\&} Mapping}, journal = {CoRR}, volume = {abs/1504.02696}, year = {2015}, url = {http://arxiv.org/abs/1504.02696} } 

B. Boots & G. J. Gordon.  A Spectral Learning Approach to Range-Only SLAM.  2012  Technical Report arXiv:1207.2491 
Abstract: We present a novel spectral learning algorithm for simultaneous localization and mapping (SLAM) from range data with known correspondences. This algorithm is an instance of a general spectral system identification framework, from which it inherits several desirable properties, including statistical consistency and no local optima. Compared with popular batch optimization or multiple-hypothesis tracking (MHT) methods for range-only SLAM, our spectral approach offers guaranteed low computational requirements and good tracking performance. Compared with popular extended Kalman filter (EKF) or extended information filter (EIF) approaches, and many MHT ones, our approach does not need to linearize a transition or measurement model; such linearizations can cause severe errors in EKFs and EIFs, and to a lesser extent MHT, particularly for the highly non-Gaussian posteriors encountered in range-only SLAM. We provide a theoretical analysis of our method, including finite-sample error bounds. Finally, we demonstrate on a real-world robotic SLAM problem that our algorithm is not only theoretically justified, but works well in practice: in a comparison of multiple methods, the lowest errors come from a combination of our algorithm with batch optimization, but our method alone produces nearly as good a result at far lower computational cost. 

BibTeX:
@article{Boots12arxiv, author = "Byron Boots and Geoffrey J. Gordon", title = "A Spectral Learning Approach to Range-Only SLAM", journal = {CoRR}, volume = {abs/1207.2491}, year = {2012}, url = {http://arxiv.org/abs/1207.2491} } 

B. Boots & G. J. Gordon.  Two-Manifold Problems.  2011  Technical Report arXiv:1112.6399 
Abstract: Recently, there has been much interest in spectral approaches to learning manifolds, so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together and allowing information to flow between them, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias in the same way that an instrumental variable allows us to remove bias in a linear dimensionality reduction problem. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space. Finally, we discuss situations where two-manifold problems are useful, and demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data. 

BibTeX:
@article{Boots11arxiv, author = "Byron Boots and Geoffrey J. Gordon", title = "Two-Manifold Problems", journal = {CoRR}, volume = {abs/1112.6399}, year = {2011}, url = {http://arxiv.org/abs/1112.6399} } 

B. Boots & G. J. Gordon.  Predictive State Temporal Difference Learning.  2010  Technical Report arXiv:1011.0041 
Abstract: We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem. 

BibTeX:
@article{Boots10arxiv, author = "Byron Boots and Geoffrey J. Gordon", title = "Predictive State Temporal Difference Learning", journal = {CoRR}, volume = {abs/1011.0041}, year = {2010}, url = {http://arxiv.org/abs/1011.0041} } 
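The compression step at the heart of PSTD, finding the directions of a large feature set most predictive of future features, reduces to an SVD of an empirical cross-covariance matrix. The sketch below uses a noiseless synthetic linear feature model (an assumption for illustration); the full algorithm additionally runs a Bellman recursion on the compressed features:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, k = 1000, 20, 3
latent = rng.standard_normal((T, k))             # low-dimensional predictive state
W_now = rng.standard_normal((k, d))              # features of the present
W_next = rng.standard_normal((k, d))             # features of the future
phi_now, phi_next = latent @ W_now, latent @ W_next

# Empirical cross-covariance between present and future features
C = phi_now.T @ phi_next / T
U, s, _ = np.linalg.svd(C)
compress = U[:, :k].T                            # d -> k compression operator
print(s[3] / s[0])  # ≈ 0: only k directions carry predictive information
```

Since only k directions of the d-dimensional feature space are predictive of the future, value function estimation can proceed in the compressed k-dimensional space without losing information.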

B. Boots, S. M. Siddiqi & G. J. Gordon.  Closing the Learning-Planning Loop with Predictive State Representations. (Updated: 2010)  2009  Technical Report arXiv:0912.2385 
Abstract: A central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. In this paper we propose and demonstrate a novel algorithm which accurately learns a model of such an environment directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. Specifically, we present an efficient and statistically consistent spectral algorithm for learning the parameters of a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then perform approximate point-based planning in the learned PSR. Analysis of our results shows that the algorithm learns a state space which efficiently captures the essential features of the environment. This representation allows accurate prediction with a small number of parameters, and enables successful and efficient planning. 

BibTeX:
@article{Boots09arxiv, author = "Byron Boots and Sajid M. Siddiqi and Geoffrey J. Gordon", year = {2009}, title = "Closing the Learning-Planning Loop with Predictive State Representations", journal = {CoRR}, volume = {abs/0912.2385}, url = {http://arxiv.org/abs/0912.2385} } 

S. M. Siddiqi, B. Boots & G. J. Gordon.  Reduced-Rank Hidden Markov Models.  2009  Technical Report arXiv:0910.0902 
Abstract: We introduce the Reduced-Rank Hidden Markov Model (RR-HMM), a generalization of HMMs that can model smooth state evolution as in Linear Dynamical Systems (LDSs) as well as non-log-concave predictive distributions as in continuous-observation HMMs. RR-HMMs assume an m-dimensional latent state and n discrete observations, with a transition matrix of rank k <= m. This implies the dynamics evolve in a k-dimensional subspace, while the shape of the set of predictive distributions is determined by m. Latent state belief is represented with a k-dimensional state vector and inference is carried out entirely in R^k, making RR-HMMs as computationally efficient as k-state HMMs yet more expressive. To learn RR-HMMs, we relax the assumptions of a recently proposed spectral learning algorithm for HMMs and apply it to learn k-dimensional observable representations of rank-k RR-HMMs. The algorithm is consistent and free of local optima, and we extend its performance guarantees to cover the RR-HMM case. We show how this algorithm can be used in conjunction with a kernel density estimator to efficiently model high-dimensional multivariate continuous data. We also relax the assumption that single observations are sufficient to disambiguate state, and extend the algorithm accordingly. Experiments on synthetic data and a toy video, as well as on a difficult robot vision modeling problem, yield accurate models that compare favorably with standard alternatives in simulation quality and prediction capability. 

BibTeX:
@article{siddiqi09arxiv, author = "Sajid M. Siddiqi and Byron Boots and Geoffrey J. Gordon", year = "2009", title = "Reduced-Rank Hidden {M}arkov Models", journal = "http://arxiv.org/abs/0910.0902" } 
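The key structural fact exploited by spectral learning of RR-HMMs is that a rank-k transition matrix forces the observable bigram matrix P_21 = O T diag(pi) O^T to have rank at most k, so a k-dimensional subspace suffices for inference. This is easy to verify numerically; the dimensions below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 8, 2                        # latent states, observations, rank

A, B = rng.random((m, k)), rng.random((k, m))
T = A @ B
T /= T.sum(axis=0, keepdims=True)        # column-stochastic, rank-k transitions
O = rng.random((n, m))
O /= O.sum(axis=0, keepdims=True)        # observation probabilities P(x | state)
pi = np.full(m, 1.0 / m)                 # initial state distribution

# Joint probability of two consecutive observations: P_21 = O T diag(pi) O^T
P21 = O @ T @ np.diag(pi) @ O.T
s = np.linalg.svd(P21, compute_uv=False)
print((s > 1e-10).sum())  # → 2: the bigram matrix has rank k, not m
```

The learning algorithm estimates P_21 from data and uses its top-k singular vectors to build the observable representation, which is why its cost scales with k rather than m.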

S. M. Siddiqi, B. Boots & G. J. Gordon.  A Constraint Generation Approach to Learning Stable Linear Dynamical Systems.  2008  Technical Report CMU-ML-08-101 
Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply our algorithm to the task of learning dynamic textures from image sequences as well as to modeling biosurveillance drug-sales data. The constraint generation approach leads to noticeable improvement in the quality of simulated sequences. We compare our method to those of Lacy and Bernstein, with positive results in terms of accuracy, quality of simulated sequences, and efficiency. 

BibTeX:
@techreport{Siddiqi08, author = "Sajid Siddiqi and Byron Boots and Geoffrey J. Gordon", title = "A Constraint Generation Approach to Learning Stable Linear Dynamical Systems", institution = {Carnegie Mellon University}, number = {CMU-ML-08-101}, year = {2008} } 
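For intuition about the problem being solved, the sketch below fits dynamics by unconstrained least squares and then applies a crude stabilization step. Note that shrinking the spectral radius is NOT the paper's constraint-generation method, which instead solves a convex program with incrementally added constraints; this is only a baseline-style illustration with assumed dimensions and noise:

```python
import numpy as np

def least_squares_dynamics(X):
    # Fit x_{t+1} ≈ A x_t by least squares over a state sequence X (d x T)
    X0, X1 = X[:, :-1], X[:, 1:]
    return X1 @ np.linalg.pinv(X0)

def stabilize(A, margin=1e-3):
    # Crude fix (not the paper's method): shrink A until spectral radius < 1
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if rho >= 1.0:
        A = A * (1.0 - margin) / rho
    return A

# Simulate a marginally stable system; least squares may estimate an
# unstable A from noisy data even though the true system is stable
rng = np.random.default_rng(0)
A_true = np.array([[0.99, 0.3], [0.0, 0.97]])
X = np.zeros((2, 200))
X[:, 0] = [1.0, 1.0]
for t in range(199):
    X[:, t + 1] = A_true @ X[:, t] + 0.05 * rng.standard_normal(2)

A_hat = stabilize(least_squares_dynamics(X))
print(np.max(np.abs(np.linalg.eigvals(A_hat))) < 1.0)  # → True
```

The paper's contribution is doing this projection in a principled way: the constraint-generation solution stays much closer to the least-squares fit than uniform shrinkage, so simulated sequences degrade far less.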

E. Chown & B. Boots  Learning in Cognitive Maps: Finding Useful Structure in an Uncertain World.  2008  Robot and Cognitive Approaches to Spatial Mapping 
Abstract: In this chapter we will describe the central mechanisms that influence how people learn about large-scale space. We will focus particularly on how these mechanisms enable people to effectively cope with both the uncertainty inherent in a constantly changing world and also with the high information content of natural environments. The major lessons are that humans get by with a “less is more” approach to building structure, and that they are able to quickly adapt to environmental changes thanks to a range of general purpose mechanisms. By looking at abstract principles, instead of concrete implementation details, it is shown that the study of human learning can provide valuable lessons for robotics. Finally, these issues are discussed in the context of an implementation on a mobile robot. 

BibTeX:
@incollection{Chown2008, author = "Eric Chown and Byron Boots", title = "Learning in Cognitive Maps: Finding Useful Structure in an Uncertain World", booktitle = "Robot and Cognitive Approaches to Spatial Mapping", editor = "Margaret E. Jefferies and WaiKiang Yeap", publisher = {Springer Verlag}, year = {2008}, chapter = {10}, pages = {215--236} } 
Theses
Author  Title  Year  Degree & Institution 

B. Boots  Spectral Approaches to Learning Predictive Representations.  2012  Doctoral Thesis Carnegie Mellon University 
Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms which provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: spectral methods for classical models like Kalman filters and hidden Markov models, batch algorithms and online algorithms, and kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces. All of these approaches share a common framework: the model's belief space is represented as predictions of observable quantities and spectral algorithms are applied to learn the model parameters. Unlike the popular EM algorithm, spectral learning algorithms are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. We evaluate our learning algorithms on a series of prediction and planning tasks involving simulated data and real robotic systems. 

BibTeX:
@phdthesis{BootsThesis2012, author = "Byron Boots", title = "Spectral Approaches to Learning Predictive Representations", school = "Carnegie Mellon University", year = {2012}, month = {December}, } 

B. Boots  Spectral Approaches to Learning Predictive Representations: Thesis Proposal.  2011  Thesis Proposal Carnegie Mellon University 
Abstract: A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must obtain an accurate environment model, and then plan to maximize reward. However, for complex domains, specifying a model by hand can be a time-consuming process. This motivates an alternative approach: learning a model directly from observations. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning or too large and complex for planning to succeed; or, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms that provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: batch and online algorithms, kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces, and manifold-based identification algorithms. All of these approaches share a common framework: they are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. Additionally, we show that our framework generalizes a variety of successful spectral learning algorithms in diverse areas, including the identification of Hidden Markov Models, recovering structure from motion, and discovering manifold embeddings. We will evaluate our learning algorithms on a series of prediction and planning tasks involving simulated data and real robotic systems. We anticipate several difficulties in moving from small, synthetic problems to larger practical applications. The first is the challenge of scaling learning algorithms up to the higher-dimensional state spaces that more complex tasks require. The second is the problem of integrating expert knowledge into the learning procedure. The third is the problem of properly accounting for actions and exploration in controlled systems. We believe that overcoming these remaining difficulties will allow our models to capture the essential features of an environment, predict future observations well, and enable successful planning.
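The spectral recipe the abstract describes can be made concrete for the simplest case it mentions, identifying a Hidden Markov Model. The sketch below is not the proposal's algorithms; it is a minimal uncontrolled variant in the spirit of spectral HMM learning: an SVD of the pairwise observation co-occurrence matrix yields observable operators whose products reproduce sequence probabilities. All numbers are illustrative, and for clarity the low-order moments `P1`, `P21`, `P3x1` are computed exactly from a known toy HMM rather than estimated from data, as they would be in practice.

```python
import numpy as np

# A tiny 2-state, 3-observation HMM (all numbers illustrative).
pi = np.array([0.6, 0.4])                           # initial state distribution
T = np.array([[0.7, 0.2], [0.3, 0.8]])              # T[i, j] = P(h' = i | h = j)
O = np.array([[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]])  # O[x, h] = P(obs = x | h)
n, k = O.shape

# Exact low-order moments (in practice, estimated from observation sequences).
P1 = O @ pi                                   # P1[i]    = P(x1 = i)
P21 = O @ T @ np.diag(pi) @ O.T               # P21[i,j] = P(x2 = i, x1 = j)
P3x1 = [O @ T @ np.diag(O[a]) @ T @ np.diag(pi) @ O.T for a in range(n)]

# Spectral identification: the top-k left singular vectors of P21
# define the subspace in which the observable operators live.
U = np.linalg.svd(P21)[0][:, :k]
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[a] @ np.linalg.pinv(U.T @ P21) for a in range(n)]

def joint_prob(seq):
    """P(x1, ..., xt) via products of the learned observable operators."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

def forward_prob(seq):
    """Ground-truth probability from the forward algorithm."""
    alpha = O[seq[0]] * pi
    for x in seq[1:]:
        alpha = O[x] * (T @ alpha)
    return float(alpha.sum())

seq = [0, 2, 1, 2]
print(joint_prob(seq), forward_prob(seq))  # the two agree with exact moments
```

Note that the learning step never recovers `T` or `O` themselves, only matrix products that suffice for prediction; this is what makes the approach statistically consistent without a latent-variable search.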

B. Boots  Learning Stable Linear Dynamical Systems.  2009  M.S. Machine Learning Carnegie Mellon University 
Abstract: Stability is a desirable characteristic for linear dynamical systems, but it is often ignored by algorithms that learn these systems from data. We propose a novel method for learning stable linear dynamical systems: we formulate an approximation of the problem as a convex program, start with a solution to a relaxed version of the program, and incrementally add constraints to improve stability. Rather than continuing to generate constraints until we reach a feasible solution, we test stability at each step; because the convex program is only an approximation of the desired problem, this early stopping rule can yield a higher-quality solution. We apply both maximum likelihood and subspace ID methods to the problem of learning dynamical systems with exogenous inputs directly from data. Our algorithm is applied to a variety of problems, including the tasks of learning dynamic textures from image sequences, learning a model of laser and vision sensor data from a mobile robot, learning stable baseline models for drug-sales data in the biosurveillance domain, and learning a model to predict sunspot data over time. We compare the constraint-generation approach to learning stable dynamical systems against the best previous stable algorithms (Lacy and Bernstein, 2002, 2003), with positive results in terms of prediction accuracy, quality of simulated sequences, and computational efficiency.
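The algorithm rests on a concrete stability test: a linear system x_{t+1} = A x_t is stable when the spectral radius of A is below one, and least-squares estimates of A can violate this even when the true system is stable. The sketch below is not the thesis's convex-program algorithm; it only illustrates the stability test on a least-squares estimate, with a naive eigenvalue rescaling standing in for the constraint-generation step (the matrices and constants are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stable system x_{t+1} = A x_t + noise (A_true illustrative;
# its eigenvalues have magnitude ~0.99, i.e. barely stable).
A_true = np.array([[0.99, 0.20], [-0.20, 0.95]])
X = np.zeros((2, 300))
X[:, 0] = [1.0, 0.0]
for t in range(299):
    X[:, t + 1] = A_true @ X[:, t] + 0.1 * rng.standard_normal(2)

# Least-squares estimate: A_hat = argmin_A ||X1 - A X0||_F.
X0, X1 = X[:, :-1], X[:, 1:]
A_hat = X1 @ np.linalg.pinv(X0)

def spectral_radius(A):
    """Largest eigenvalue magnitude; < 1 means the system is stable."""
    return max(abs(np.linalg.eigvals(A)))

# The thesis's method adds linear constraints to a convex program and
# re-solves until this test passes; here we just rescale the estimate,
# a much cruder stabilizer used only to show the test in action.
if spectral_radius(A_hat) >= 1.0:
    A_hat *= (1.0 - 1e-6) / spectral_radius(A_hat)

print("spectral radius of estimate:", spectral_radius(A_hat))
```

The appeal of constraint generation over blunt fixes like rescaling is that each added constraint perturbs the least-squares solution as little as possible, preserving prediction accuracy while enforcing stability.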

BibTeX:
@MastersThesis{Boots:Thesis:2009, author = "Byron Boots", title = "Learning Stable Linear Dynamical Systems", school = "Carnegie Mellon University", year = {2009}, month = {May}, } 

B. Boots  Robot Localization and Abstract Mapping in Dynamic Environments.  2003  B.A. Computer Science Bowdoin College 
Abstract: In this chapter we will describe the central mechanisms that influence how people learn about large-scale space. We will focus particularly on how these mechanisms enable people to effectively cope with both the uncertainty inherent in a constantly changing world and the high information content of natural environments. The major lessons are that humans get by with a “less is more” approach to building structure, and that they are able to quickly adapt to environmental changes thanks to a range of general-purpose mechanisms. By looking at abstract principles, instead of concrete implementation details, it is shown that the study of human learning can provide valuable lessons for robotics. Finally, these issues are discussed in the context of an implementation on a mobile robot.

BibTeX:
@techreport{Boots2003b, author = "Byron Boots", title = "Robot Localization and Abstract Mapping in Dynamic Environments", institution = {Bowdoin College}, year = {2003} }

B. Boots  Chunking: A Modified Dynamic Programming Approach to Solving Stochastic Satisfiability Problems Efficiently.  2003  B.A. Computer Science Bowdoin College 
Abstract: The best general stochastic satisfiability solvers systematically evaluate all possible variable assignments, using heuristics to prune assignments whenever possible. The idea of chunking differs in that it breaks the solution to stochastic satisfiability problems into pieces, amounting to a modified dynamic programming approach. The benefit of this approach, as compared with the straightforward application of dynamic programming, is that the saved solutions to the problem pieces are partial solutions and thus reusable in multiple situations. 
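The reuse of saved partial solutions that the abstract credits to chunking can be illustrated with a memoized solver for a toy SSAT instance. To be clear, this is ordinary dynamic programming with memoization, not the thesis's chunking scheme, and the instance and all names are made up: existential variables take the better branch, randomized variables average their two branches, and the cache lets identical residual subproblems be solved once.

```python
from functools import lru_cache

# A toy SSAT instance (illustrative). Literals: +v is v, -v is NOT v.
# Variable 1 is existential (the solver picks its value), variable 2 is
# randomized (true with probability 0.5), variable 3 is existential.
quantifiers = {1: 'exists', 2: 'random', 3: 'exists'}
clauses = ((1, 2), (-1, 3), (-2, -3))

def simplify(clauses, lit):
    """Assign lit True: drop satisfied clauses, shrink the rest."""
    out = []
    for c in clauses:
        if lit in c:
            continue                      # clause satisfied, drop it
        reduced = tuple(l for l in c if l != -lit)
        if not reduced:
            return None                   # empty clause: unsatisfiable
        out.append(reduced)
    return tuple(out)

@lru_cache(maxsize=None)
def value(var, clauses):
    """Max probability of satisfaction from variable `var` onward.
    Caching on (var, clauses) reuses solved subproblems -- the same
    payoff the abstract attributes to chunking."""
    if not clauses:
        return 1.0
    if var not in quantifiers:
        return 0.0                        # clauses remain but no variables
    branches = []
    for lit in (var, -var):
        sub = simplify(clauses, lit)
        branches.append(0.0 if sub is None else value(var + 1, sub))
    if quantifiers[var] == 'exists':
        return max(branches)
    return 0.5 * (branches[0] + branches[1])

print(value(1, clauses))  # → 0.5
```

On this instance the optimal policy satisfies the formula with probability 0.5: whatever value variable 1 takes, exactly one outcome of the random variable 2 leaves a satisfiable residual formula for variable 3.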

BibTeX:
@techreport{Boots2003a, author = "Byron Boots", title = "Chunking: A Modified Dynamic Programming Approach to Solving Stochastic Satisfiability Problems Efficiently", institution = {Bowdoin College}, year = {2003} }