Title  Reinforcement Learning Feedback Control Using Reduced Output Measurements 
Speaker  F. L. Lewis 
University of Texas at Arlington Riverbend  
Abstract  In this talk we present new results in H-infinity control using reinforcement learning (RL) techniques, which use observed system responses to update the control policy in real time in an optimal fashion. We show how to implement RL feedback controllers for continuous-time systems using newly developed techniques. Traditional RL methods require full state-variable feedback; we present a new method for RL control that requires only output measurements.

Optimal control design techniques have provided very effective feedback controllers for modern systems in aerospace, vehicle systems, industrial process control, robotics, mobile robots, wireless sensor networks, and elsewhere. Optimal control design is fundamentally a backwards-in-time procedure based on dynamic programming, specifically on Bellman's Optimality Principle. This means that most existing optimal control design methods must be carried out offline. Moreover, the full system dynamical description must generally be known to compute optimal controllers using well-known techniques such as Riccati equation design.

In this talk we show how to implement optimal controllers online, forward in time, for systems whose dynamical description is not known or is only partially known. A family of online Optimal Adaptive Controllers is provided, whereby adaptive learning techniques are used to learn the optimal control strategy in real time from measured data along the system trajectories. In the linear time-invariant case, this amounts to solving the Riccati equation online, in real time, without knowing the system plant matrix. These Optimal Adaptive Controllers are based on Approximate Dynamic Programming (ADP) and Q-learning. Reinforcement learning is a method for online learning of control policies based on stimuli from the environment in response to current control policies; such methods were famously studied by I. P. Pavlov in his conditioning experiments with dogs.
Particularly interesting are the actor-critic structures, including those based on policy iteration and those based on value iteration; the ADP structures are a special case of value iteration. Q-learning is a form of actor-critic reinforcement learning that requires no knowledge of the system dynamics, yet finds optimal control policies online in real time. ADP and Q-learning have been well developed by the computational intelligence community, primarily for Markov decision processes, and have not been fully explored for feedback control purposes within the control systems community.
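To make the linear time-invariant case concrete, the following is a minimal offline sketch of policy iteration for the continuous-time LQR problem (Kleinman's algorithm): each step evaluates the current policy by solving a Lyapunov equation, then improves the policy, converging to the solution of the algebraic Riccati equation. The plant matrices and the initial gain below are hypothetical illustrations, not from the talk; the online, model-free methods described above replace the model-based Lyapunov solve with regression on data measured along the system trajectories.

```python
import numpy as np

def lyap(Ac, Qk):
    """Solve the Lyapunov equation Ac.T P + P Ac + Qk = 0
    via Kronecker-product vectorization."""
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    P = np.linalg.solve(M, -Qk.reshape(-1, order='F'))
    P = P.reshape(n, n, order='F')
    return (P + P.T) / 2  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, iters=15):
    """Kleinman's algorithm: alternate policy evaluation
    (Lyapunov solve) and policy improvement."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                      # closed-loop matrix
        P = lyap(Ac, Q + K.T @ R @ K)       # policy evaluation
        K = np.linalg.solve(R, B.T @ P)     # policy improvement
    return P, K

# Hypothetical second-order plant; A is stable, so K0 = 0 is admissible.
A = np.array([[0., 1.], [-1., -2.]])
B = np.array([[0.], [1.]])
Q = np.eye(2)
R = np.array([[1.]])
P, K = policy_iteration(A, B, Q, R, np.zeros((1, 2)))

# Residual of the continuous-time algebraic Riccati equation,
# A.T P + P A - P B R^{-1} B.T P + Q = 0, should be near zero.
res = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(np.max(np.abs(res)))
```

Each iteration requires knowledge of A and B; the contribution highlighted in the abstract is precisely how to carry out the evaluation and improvement steps online without the plant matrix.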
When  Friday, 1 October 2010, 13:30 - 14:30 
Where  117 Electrical Engineering Building 
Title  Statistical Fault Detection and Analysis 
Speaker  Greg Bronevetsky 
Lawrence Livermore National Laboratory  
When  Monday, 15 November 2010, 13:15 - 14:15 
Where  117 Electrical Engineering Building 