Recent years have witnessed spectacular successes in applying machine learning tools to sequential decision making and the control of complex dynamical systems. These tools combine Reinforcement Learning (RL) with high-capacity function approximators, typically large neural networks, and hence require a tremendous amount of training data, even to learn simple tasks. Their applications have been mainly limited to specific scenarios, such as board and video games, where generating and gathering data is inexpensive and effectively unlimited.
Controlling real-world systems, such as self-driving vehicles or complex industrial processing plants, requires high-fidelity models, which in turn necessitate a large amount of data. Many real-world systems can, however, only be probed a limited number of times. Collecting data can be very expensive and time consuming, and it is sometimes impossible to probe the system without compromising it.
This project develops fundamental theory and tools for learning to control complex dynamical systems from a limited number of data samples. Results will be applied in three application domains: bioprocessing, communication systems, and robotics. Strategically, the collaboration between TechLab and the Competence Centre for Advanced BioProduction aims at establishing a strong, visible and sustainable activity at the intersection of digitalization and life science at KTH.
Research Objective and Approach: The overall research objective is to develop new techniques, methods, and tools to learn and control complex dynamical systems reliably, using a limited number of data samples together with structural information. Our approach combines simulated data generation with the limited real data available. The combination will be achieved through active learning, in which a gray-box simulator generates data for reliably learning a controller/agent in a complex dynamical system.
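As a rough illustration of the idea of pairing a gray-box simulator with a small real-data budget, the toy sketch below may help. It is not the project's method: the scalar model x_next = a*x + b*u, the "true" parameters, the probe budget of six, the noise-free measurements, and all function names are assumptions made only for this example. The known model structure plays the role of the gray box, a determinant-based rule picks each scarce real probe, and the fitted model then produces synthetic transitions cheaply.

```python
import random

def real_system(x, u):
    # Stand-in for the expensive real plant; a=0.9 and b=0.5 are hypothetical.
    return 0.9 * x + 0.5 * u

def moments(samples):
    # Entries of the 2x2 information matrix for the regressors (x, u).
    sxx = sum(x * x for x, u, _ in samples)
    sxu = sum(x * u for x, u, _ in samples)
    suu = sum(u * u for x, u, _ in samples)
    return sxx, sxu, suu

def fit_gray_box(samples):
    # Least-squares estimate of (a, b) from (x, u, x_next) triples,
    # obtained by solving the 2x2 normal equations in closed form.
    sxx, sxu, suu = moments(samples)
    sxy = sum(x * y for x, _, y in samples)
    suy = sum(u * y for _, u, y in samples)
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("samples do not identify both parameters")
    return (sxy * suu - suy * sxu) / det, (suy * sxx - sxy * sxu) / det

def pick_probe(samples, x, candidates):
    # Simple active-learning rule: choose the input that maximizes the
    # determinant of the information matrix after the new sample is added.
    def info_det(u):
        sxx, sxu, suu = moments(samples + [(x, u, 0.0)])
        return sxx * suu - sxu * sxu
    return max(candidates, key=info_det)

def simulate(a, b, n, seed=0):
    # Gray-box simulator: cheap synthetic transitions from the fitted model.
    rng = random.Random(seed)
    x, data = 1.0, []
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        x_next = a * x + b * u
        data.append((x, u, x_next))
        x = x_next
    return data

# Spend a small budget of real probes, choosing each input actively.
real_samples, x = [], 1.0
for _ in range(6):
    u = pick_probe(real_samples, x, [-1.0, 0.0, 1.0])
    x_next = real_system(x, u)
    real_samples.append((x, u, x_next))
    x = x_next

a_hat, b_hat = fit_gray_box(real_samples)

# Generate as much simulated data as needed for downstream controller learning.
synthetic = simulate(a_hat, b_hat, 1000)
```

In a realistic setting the measurements would be noisy and the model far richer, so the fit would carry uncertainty that the probe-selection rule and the downstream learner would need to account for.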
Test-beds and Demonstrators: The project includes three application demonstrators, carefully chosen for their potential for societal and industrial impact and for the theoretical challenges they pose. The demonstrators, detailed in Section 4, also cover a wide variety of model classes: continuous bioprocessing involves large-scale ODEs, reinforcement learning in mobile networks involves discrete-event models, and self-learning physical robots involve hybrid differential-algebraic models.
Project team
This is a collaborative project involving four different groups at KTH, covering expertise in software techniques, signal processing, system identification, automatic control, reinforcement learning and bioproduction.
- Assoc. Prof. David Broman (PI)
- Ass. Prof. Saikat Chatterjee
- Docent Véronique Chotteau
- Prof. Håkan Hjalmarsson
- Prof. Alexandre Proutiere
Project funding and duration
The project is funded by KTH.