DaRL Group / Hua Wei
Old-School Deep RL Tricks for Modern LLM Training
How to port n-step returns, TD(λ), uncertainty, safety, and friends from deep RL into RLHF/RLAIF and tool-using LLM agents — with equations. This note collects well-known deep RL techniques (pre-LLM era) and adapts them to modern LLM training/inference. The patterns below are not the only way to do it — treat them as practical starting…
NSF CAREER Resources: From Confusion to Submission
When I was preparing my CAREER proposal, especially the first time, I often felt completely overwhelmed. Every day was a tug-of-war between “I cannot look at this anymore” and “Wait, maybe I can still improve this part…” But even then, I was never quite sure where to focus, or what the reviewers were really looking…
KDD’18: IntelliLight
This post is migrated from the original page. [paper] [code] [video]
How Long Until Reinforcement Learning is Applied in the Physical World?
As one of the most popular technologies in the field of machine learning in recent years, reinforcement learning (RL) has made significant progress in areas such as games, robotic control, and large model training. It feels like RL is back!!! However, when we talk about its practical applications in the physical world, there are still…
On the Future of Spatio-Temporal Data Mining for Traffic Prediction
Starting as a researcher in spatio-temporal data mining, I still vividly remember the excitement of my first paper being accepted by CIKM 2016 on predicting passenger demand for car-hailing apps. In the last 7-8 years, the field of traffic prediction has gained substantial attention in prominent conferences like KDD, AAAI, and IJCAI. However, as time…
DaRL Lab Handbook
https://darl-lab.gitbook.io/handbook/ or chat with AI Hua Bot here (ASU Login Required).