How Long Until Reinforcement Learning is Applied in the Physical World?
As one of the most popular technologies in the field of machine learning in recent years, reinforcement learning (RL) has made significant progress in areas such as games, robotic control, and large model training. It feels like RL is back!!!

However, when we talk about its practical applications in the physical world, there are still many challenges to overcome. Beyond the classic issues of learning efficiency and generalization, real-world deployment raises even more practical problems. Much as with traffic prediction in spatiotemporal data mining, we have seen a growing number of RL models for traffic light control in recent years. Some of these seem focused mainly on outperforming state-of-the-art (SOTA) benchmarks, which at least indicates that the problem is gaining attention. What I want to emphasize, however, are the bigger challenges that remain unsolved in RL-based traffic light control. Without addressing these, this valuable problem risks being reduced to a mere “toy problem.”
As a researcher dedicated to intelligent transportation systems, I often ask myself: why are we investing so much effort into the problem of “traffic light control”?
I believe this question has two dimensions:
- In intelligent transportation systems, autonomous driving is already a prominent research problem, so why focus on traffic light control?
- Traffic light control with RL seems straightforward and does not call for complex technology, so why is it worth studying?
To address the first question: autonomous driving complicates RL by adding a perception component. For research and publishing this is fertile ground, much like vision-based robotics, where papers are abundant. Traffic light control, by contrast, allows a clean separation between perception and control, with observations defined as vectors rather than images or videos.
For the second question, traffic light control is far from simple. In fact, it presents an ideal dynamic, multi-agent environment that is almost a natural application scenario for reinforcement learning.
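To make that second point concrete, here is a minimal sketch of what such an environment can look like. Everything below is illustrative: the class name, the Bernoulli arrivals, and the fixed discharge rate are simplifying assumptions of my own, not a real traffic simulator.

```python
import random

class TinyIntersectionEnv:
    """A deliberately simplified single-intersection environment (illustrative only).

    State: queue lengths on the four approaches (N, S, E, W) plus the phase,
           a plain vector rather than an image.
    Action: 0 = keep the current phase, 1 = switch (NS-green <-> EW-green).
    Reward: negative total queue length, a common proxy for delay.
    """

    def __init__(self, arrival_prob=0.3, discharge_rate=2, seed=0):
        self.arrival_prob = arrival_prob      # chance a car arrives per approach per step
        self.discharge_rate = discharge_rate  # cars released per green approach per step
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.queues = [0, 0, 0, 0]  # N, S, E, W
        self.phase = 0              # 0 = NS green, 1 = EW green
        return self._obs()

    def _obs(self):
        return self.queues + [self.phase]  # 5-dimensional observation vector

    def step(self, action):
        if action == 1:
            self.phase = 1 - self.phase
        green = (0, 1) if self.phase == 0 else (2, 3)
        for i in range(4):
            if self.rng.random() < self.arrival_prob:
                self.queues[i] += 1    # stochastic arrival
            if i in green:
                self.queues[i] = max(0, self.queues[i] - self.discharge_rate)
        reward = -sum(self.queues)
        return self._obs(), reward
```

Even this toy version exposes the core structure: vector observation, discrete phase action, delay-based reward. Real setups add yellow phases, minimum green times, and, crucially, many coordinated intersections, which is exactly what makes the problem a natural multi-agent RL testbed.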

Behind RL-based traffic light control lie profound academic challenges, making it an ideal testing ground for RL’s deployment in real-world applications. In this problem, RL needs to solve a series of complex issues, including optimality, efficiency, generalization, safety, and sim-to-real transfer. These are the critical obstacles RL must overcome before it can be effectively deployed in the physical world.
- Optimality
The primary goal of RL in traffic light control is to find an optimal strategy that ensures smooth traffic flow. This not only involves minimizing wait times at individual intersections but also optimizing the overall efficiency of the traffic network. To achieve this, RL algorithms must explore and optimize strategies within limited time and resource constraints, identifying the best control scheme under varying traffic conditions. This is a significant challenge for any dynamic control system, particularly given the constantly changing traffic flow and random external factors. RL needs to be able to find global optima in such complex environments, rather than just local ones.
In many RL applications outside of traffic, optimality is rarely a key discussion point. However, in traffic light control, which has been studied for years, optimization methods have provided numerous classical optimal solutions under given assumptions. Demonstrating that RL can achieve the same optimal solutions would make RL more trustworthy.
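As a concrete instance of such a classical result, Webster's fixed-time formula gives a closed-form cycle length that is optimal under its stated assumptions (steady demand, an undersaturated intersection). The sketch below transcribes that well-known formula; the example numbers are made up for illustration.

```python
def webster_cycle(lost_time_s, flow_ratios):
    """Webster's classical cycle length for a fixed-time signal.

    lost_time_s: total lost time per cycle L in seconds (startup + clearance).
    flow_ratios: critical flow ratio y_i = demand / saturation flow per phase.
    Valid only while sum(flow_ratios) < 1, i.e. the intersection is undersaturated.
    """
    Y = sum(flow_ratios)
    assert Y < 1, "Webster's formula assumes an undersaturated intersection"
    cycle = (1.5 * lost_time_s + 5) / (1 - Y)
    # Effective green time is split in proportion to each phase's flow ratio.
    greens = [(y / Y) * (cycle - lost_time_s) for y in flow_ratios]
    return cycle, greens

# Two-phase example: L = 10 s, flow ratios 0.3 and 0.25.
cycle, greens = webster_cycle(10, [0.3, 0.25])  # cycle ≈ 44.4 s
```

A learned policy evaluated on steady, undersaturated demand should at least recover the performance of this baseline; showing that it does is one way to make optimality claims about RL controllers credible.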

- Efficiency
RL typically requires large amounts of data and computational resources for training, and traffic light control offers an ideal scenario for testing the efficiency of RL algorithms. In a real urban traffic network, signal decisions must be made within short cycles, so RL algorithms must decide in real time under limited computing resources. Improving sampling efficiency so that algorithms converge faster, while respecting hardware limitations, is a significant challenge in RL-based traffic light control. Additionally, RL algorithms must quickly adapt to new traffic patterns with limited data to ensure effective signal control.
- Generalization
The complexity and diversity of traffic systems make generalization another critical issue for RL algorithms. Traffic conditions vary significantly across intersections, cities, time periods, and weather conditions, and RL algorithms must adapt to these changes. Traffic light control requires RL not only to perform well in specific simulation environments but also to operate stably in diverse real-world settings. A model with strong generalization can provide solutions for traffic signal control and inspire applications in other dynamic systems, such as intelligent vehicle scheduling or logistics optimization.
- Safety
Safety is always a top priority in real-world traffic control applications. RL in traffic light control must ensure the safety of both pedestrians and vehicles, avoiding traffic accidents caused by incorrect decisions. This requires RL models to avoid unsafe strategies during training and to be fault-tolerant in real-world applications, capable of handling unexpected situations—such as when sensor data is incorrect or controllers fail. The safety challenges in traffic light control further complicate the design of RL algorithms, which must not only optimize efficiency but also ensure that every decision is safe.
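One common way to encode such hard requirements is a safety layer (action masking, sometimes called "shielding") that sits between the learned policy and the signal controller: whatever the agent proposes, only provably safe actions reach the hardware. The sketch below is illustrative; the minimum-green and all-red rules are stand-ins for a real controller's interlocks.

```python
def mask_unsafe_actions(phase_elapsed_s, min_green_s=10, in_all_red=False):
    """Return the set of permitted actions: 0 = keep phase, 1 = switch phase.

    A hard constraint layer that runs outside the learned policy: switching
    is only allowed once the current phase has been green for at least
    `min_green_s` seconds, and never during an all-red clearance interval.
    """
    allowed = {0}
    if phase_elapsed_s >= min_green_s and not in_all_red:
        allowed.add(1)
    return allowed

def safe_action(proposed, phase_elapsed_s, min_green_s=10, in_all_red=False):
    """Project the agent's proposed action onto the permitted set."""
    allowed = mask_unsafe_actions(phase_elapsed_s, min_green_s, in_all_red)
    return proposed if proposed in allowed else 0  # fall back to 'keep phase'
```

The appeal of this pattern is that safety no longer depends on what the policy happened to learn: even an untrained or faulty agent cannot command an unsafe switch, because the mask vetoes it deterministically.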


- Sim-to-real transfer
RL applications in traffic light control typically begin with testing in simulation environments. However, transferring strategies from simulation to the real world remains a major challenge. Differences between simulation and reality—such as weather changes, vehicle behavior variations, traffic flow fluctuations, and sensor data discrepancies—can cause algorithms that perform well in simulations to fail in real-world settings. To overcome this, RL algorithms need strong transferability, allowing them to seamlessly apply strategies learned in simulations to real-world environments. This not only involves improving the robustness of algorithms but also requires continuous strategy optimization using real-world data.
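One standard technique for narrowing this gap is domain randomization: instead of training in a single simulator configuration, each episode draws arrival rates, discharge capacities, and sensor noise from ranges intended to cover the mismatch with reality. A minimal sketch follows; all parameter names and ranges are placeholders chosen for illustration, not calibrated values.

```python
import random

def randomized_env_params(rng):
    """Sample one set of environment parameters for a training episode.

    Each episode sees a different simulator configuration, so the learned
    policy cannot overfit any single one.
    """
    return {
        "arrival_rate": rng.uniform(0.05, 0.6),    # vehicles/s per approach
        "saturation_flow": rng.uniform(0.4, 0.6),  # discharge per second of green
        "sensor_dropout": rng.uniform(0.0, 0.1),   # fraction of missed detections
        "count_noise_std": rng.uniform(0.0, 2.0),  # noise on queue counts
    }

def noisy_queue_observation(true_queues, params, rng):
    """Corrupt the true queue lengths the way a real detector might."""
    obs = []
    for q in true_queues:
        if rng.random() < params["sensor_dropout"]:
            obs.append(0.0)  # detector missed this approach entirely
        else:
            obs.append(max(0.0, q + rng.gauss(0.0, params["count_noise_std"])))
    return obs
```

A policy trained only under such perturbations is more likely to tolerate the weather changes, behavior variations, and sensor discrepancies listed above, though randomization alone is rarely sufficient and is usually combined with fine-tuning on real-world data.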


In summary, traffic light control is not only an ideal testbed for RL but also offers rich research potential and practical applications. Those who consider traffic light control a mere “toy problem” often overlook its challenges in complex system control, data uncertainty, and strategy transfer. I hope more researchers will engage with this field, using innovative approaches to advance the real-world deployment of RL. As for how far we are from real-world applications, our team is actively exploring, and I look forward to sharing some exciting news with you in the future.
