Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning

The commonly used action matching principle may lead to irrelevant or misplaced feature attribution when different DNNs’ outputs lead to the same rewards or different rewards result from the same outputs.

Qisen Yang, Huanqian Wang, Mukun Tong, Wenjie Shi, Gao Huang, Shiji Song

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without any interactions with the …

Qisen Yang, Shenzhi Wang, Qihang Zhang, Gao Huang, Shiji Song