Qisen Yang's Homepage
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with …
Shenzhi Wang, Qisen Yang, Jiawei Gao, Matthieu Gaetan Lin, Hao Chen, Liwei Wu, Ning Jia, Shiji Song, Gao Huang
Boosting Offline Reinforcement Learning with Action Preference Query
Compared to online fine-tuning, querying preferences between pre-collected and learned actions can be equally or even more helpful in addressing the erroneous estimate problem.
Qisen Yang, Shenzhi Wang, Matthieu Gaetan Lin, Shiji Song, Gao Huang
Efficient Knowledge Distillation from Model Checkpoints
We observe that an intermediate model often serves as a better teacher than the fully converged model, even though the former has much lower accuracy. This phenomenon can be partially explained by the information bottleneck principle.
Chaofei Wang, Qisen Yang, Rui Huang, Shiji Song, Gao Huang