Learning While Repositioning in On-demand Vehicle Sharing Networks

Authors: Hansheng Jiang, Shunan Jiang, Zuo-Jun Max Shen

Publication date: 2022

Publisher: SSRN

Number of pages: Not available

Abstract

We consider the vehicle repositioning problem for a vehicle sharing network with a fixed number of vehicles distributed across multiple locations. The vehicle sharing service is on-demand and one-way: customers can pick up a vehicle from any location without prior reservation and drop it off at any location in the network after use. Due to uncertainty in both customer arrivals and vehicle returns, the service provider needs to periodically reposition the vehicles to match supply with demand while minimizing the total costs of repositioning labor and lost sales. The repositioning problem is critical to the successful management of on-demand one-way vehicle sharing services, and it is challenging both analytically and computationally. The optimal repositioning policy for a general $n$-location network is inaccessible without knowledge of the optimal value function. Existing literature on computing repositioning policies assumes that the demand distribution is either known or can be simulated effectively from samples collected beforehand. In contrast, we develop an online learning method that dynamically repositions vehicles without knowing the demand distribution in advance. We study the performance of our algorithm by comparing it with the best base-stock policy, a generalization of the popular inventory control policy to the vehicle repositioning problem. We prove that the $T$-period aggregate regret of our algorithm is bounded by $O(T^{\frac{n}{n+1}}(n \log T)^{\frac{1}{n+1}})$. For the special case of two locations, the optimal repositioning policy has an explicit structure known as a ``two-threshold'' policy. We further show that our online learning algorithm can be applied to learn the optimal ``two-threshold'' policy online, and we prove a regret bound of $O(T^{3/4} (\log T)^{1/4})$ when compared to the optimal repositioning policy. Numerical experiments on synthetic data demonstrate the effectiveness of our method.
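The ``two-threshold'' policy mentioned for the two-location case has a simple band structure: reposition only when the vehicle count at one location falls outside an interval. A minimal sketch of that structure, assuming the thresholds act on the count at location 1 (the function name and exact semantics are illustrative, not taken from the paper):

```python
def two_threshold_reposition(x: int, lower: int, upper: int) -> int:
    """Post-repositioning vehicle count at location 1 under a
    two-threshold policy with band [lower, upper], lower <= upper.

    Illustrative sketch: if the count drifts below the band, ship
    vehicles in up to `lower`; if it drifts above, ship vehicles
    out down to `upper`; inside the band, do nothing.
    """
    if lower > upper:
        raise ValueError("thresholds must satisfy lower <= upper")
    if x < lower:
        return lower   # reposition vehicles into location 1
    if x > upper:
        return upper   # reposition vehicles out of location 1
    return x           # within the band: no repositioning cost incurred
```

With thresholds (5, 8), a count of 2 is raised to 5, a count of 10 is lowered to 8, and a count of 6 is left untouched; the online learning question is how to pick the band from observed demand alone.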
