
Online DRQN-Based E-Commerce Recommendation

Xiaoyi Cai

Abstract


E-commerce recommendation systems have the advantage of dynamic interaction over traditional recommendation algorithms: they use reinforcement learning to model user interests through simulated user–e-commerce interaction. In this paper, we first embed users and items and then pass them through an LSTM network. Unlike DRQNs, we do not require image recognition, so we omit the convolutional neural network and instead train these RL-based recommender systems via an LSTM in a RecoGym environment [1]. We use RecoGym to generate artificial data rather than real data, both to protect user privacy and for easier compatibility with a reinforcement learning environment. A combination of Long Short-Term Memory (LSTM) and DQN is deployed: the LSTM compensates for the fact that a DQN alone cannot handle longer sequences of actions, enabling "relaxed Markovian" learning across sequences. Similar modelling effects were demonstrated in [2][3][4], but by combining a reinforcement learning algorithm with a recurrent neural network, this paper achieves better error convergence rates and recommends items to users more efficiently and accurately.
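The pipeline the abstract describes (item embeddings fed through an LSTM, with a linear head producing one Q-value per candidate item) can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the class name, layer sizes, and single-cell LSTM are all assumptions made for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DRQNSketch:
    """Toy DRQN-style recommender head: item embeddings passed through an
    LSTM cell, followed by a linear layer producing one Q-value per item."""

    def __init__(self, n_items, embed_dim=8, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.embed = rng.normal(0, 0.1, (n_items, embed_dim))
        # Single LSTM cell: all four gates computed jointly from [input, hidden].
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, embed_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.Wq = rng.normal(0, 0.1, (n_items, hidden_dim))  # Q-value head
        self.hidden_dim = hidden_dim

    def q_values(self, item_sequence):
        """Run a sequence of observed item ids through the LSTM and return
        Q-values over all candidate items at the final step."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        d = self.hidden_dim
        for item in item_sequence:
            x = self.embed[item]
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o = sigmoid(z[:d]), sigmoid(z[d:2*d]), sigmoid(z[2*d:3*d])
            g = np.tanh(z[3*d:])
            c = f * c + i * g      # cell state carries long-term context
            h = o * np.tanh(c)     # hidden state summarises the session so far
        return self.Wq @ h         # one Q-value per candidate item

model = DRQNSketch(n_items=20)
q = model.q_values([3, 7, 1])       # a short session of observed items
recommended = int(np.argmax(q))     # greedy action = item to recommend
```

Because the LSTM's hidden state summarises the whole interaction sequence, the Q-function no longer depends only on the last observation, which is what the abstract means by "relaxed Markovian" learning; training the weights would proceed with a standard DQN loss over such sequences.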


Keywords


Recommender System; DQN; LSTM; Online E-Commerce



References


Rohde, D., Bonner, S., Dunlop, T., Vasile, F. & Karatzoglou, A. (2018). RecoGym: A Reinforcement Learning Environment for the Problem of Product Recommendation in Online Advertising.

Yan, A., Cheng, S., Kang, W.-C., Wan, M. & McAuley, J. (2019). CosRec: 2D Convolutional Neural Networks for Sequential Recommendation. Proceedings of the 28th ACM International Conference on Information and Knowledge Management.

Smirnova, E. & Vasile, F. (2017). Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks.

Hausknecht, M. & Stone, P. (2015). Deep Recurrent Q-Learning for Partially Observable MDPs.

Rendle, S., Freudenthaler, C. & Schmidt-Thieme, L. (2010). Factorizing Personalized Markov Chains for Next-Basket Recommendation. Proceedings of the 19th International Conference on World Wide Web (WWW '10). Association for Computing Machinery, New York, NY, USA, 811–820.




DOI: http://dx.doi.org/10.18686/fm.v7i6.5663
