Autonomous Charging of Electric Vehicle Fleets to Enhance Renewable Generation Dispatchability
Reza Bayani, Saeed D. Manshadi, Guangyi Liu, Yawei Wang, Renchang Dai
San Diego State University, San Diego, CA, 92182 USA
GEIRI North America, San Jose, CA, 95134 USA
Abstract
A total of 19% of the generation capacity in California is provided by PV units, and in some months more than 10% of this energy is curtailed. In this research, a novel approach to reducing renewable generation curtailment and increasing system flexibility by means of coordinated electric vehicle charging is presented. The presented problem is a sequential decision-making process and is solved with a fitted Q-iteration algorithm which, unlike other reinforcement learning methods, requires fewer episodes of learning. Three case studies are presented to validate the effectiveness of the proposed approach: aggregator load following, ramp service, and utilization of non-deterministic PV generation. The results suggest that, through this framework, EVs successfully learn how to adjust their charging schedules in stochastic scenarios where their trip times, as well as the solar power generation, are unknown beforehand.
References
[1] E. I. Administration. (2020, May). Annual energy outlook 2020 with projections to 2050. [Online]. Available: https://www.eia.gov/outlooks/aeo/.
[2] H. Kikusato, Y. Fujimoto, S. I. Hanada, D. Isogawa, S. Yoshizawa, H. Ohashi, and Y. Hayashi, “Electric vehicle charging management using auction mechanism for reducing PV curtailment in distribution systems,” IEEE Transactions on Sustainable Energy, vol. 11, no. 3, pp. 1394–1403, Jul. 2020.
CAISO. (2020, May). California independent system operator website. [Online]. Available: https://www.caiso.com/.
[5] J. Cochran, P. Denholm, B. Speer, and M. Miller, “Grid integration and the carrying capacity of the U.S. grid to incorporate variable renewable energy,” National Renewable Energy Lab. (NREL), Golden, CO (United States), Tech. Rep. NREL/TP-6A20-62607, Apr. 2015.
J. Cochran, L. Bird, J. Heeter, and D. J. Arent, “Integrating variable renewable energy in electric power markets: Best practices from international experience,” National Renewable Energy Lab. (NREL), Golden, CO (United States), Tech. Rep. NREL/TP-6A00-53732, Apr. 2012.
L. Bird, J. Cochran, and X. Wang, “Wind and solar energy curtailment: Experience and practices in the United States,” National Renewable Energy Lab. (NREL), Golden, CO (United States), Tech. Rep. NREL/TP-6A20-60983, Mar. 2014.
C. B. Li, H. Q. Shi, Y. J. Cao, J. H. Wang, Y. H. Kuang, Y. Tan, and J. Wei, “Comprehensive review of renewable energy curtailment and avoidance: A specific example in China,” Renewable and Sustainable Energy Reviews, vol. 41, pp. 1067–1079, Jan. 2015.
E. Ela, “Using economics to determine the efficient curtailment of wind energy,” National Renewable Energy Lab. (NREL), Golden, CO (United States), Tech. Rep. NREL/TP-550-45071, Feb. 2009.
J. Zou, S. Rahman, and X. Lai, “Mitigation of wind output curtailment by coordinating with pumped storage and increasing transmission capacity,” in 2015 IEEE Power & Energy Society General Meeting, Denver, 2015, pp. 1–5.
B. Cleary, A. Duffy, A. O'Connor, M. Conlon, and V. Fthenakis, “Assessing the economic benefits of compressed air energy storage for mitigating wind curtailment,” IEEE Transactions on Sustainable Energy, vol. 6, no. 3, pp. 1021–1028, Jul. 2015.
M. Moradzadeh, B. Zwaenepoel, J. Van de Vyver, and L. Vandevelde, “Congestion-induced wind curtailment mitigation using energy storage,” in 2014 IEEE International Energy Conference (ENERGYCON), Cavtat, 2014, pp. 572–576.
P. Denholm and T. Mai, “Timescales of energy storage needed for reducing renewable energy curtailment,” Renewable Energy, vol. 130, pp. 388–399, Jan. 2019.
M. A. Hozouri, A. Abbaspour, M. Fotuhi-Firuzabad, and M. Moeini-Aghtaie, “On the use of pumped storage for wind energy maximization in transmission-constrained power systems,” IEEE Transactions on Power Systems, vol. 30, no. 2, pp. 1017–1025, Mar. 2015.
D. X. Zhang, X. Q. Han, and C. Y. Deng, “Review on the research and practice of deep learning and reinforcement learning in smart grids,” CSEE Journal of Power and Energy Systems, vol. 4, no. 3, pp. 362–370, Sep. 2018.
E. Mocanu, D. C. Mocanu, P. H. Nguyen, A. Liotta, M. E. Webber, M. Gibescu, and J. G. Slootweg, “On-line building energy optimization using deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 3698–3708, Jul. 2019.
J. K. Szinai, C. J. R. Sheppard, N. Abhyankar, and A. R. Gopal, “Reduced grid operating costs and renewable energy curtailment with electric vehicle charge management,” Energy Policy, vol. 136, p. 111051, Jan. 2020.
A. Chiş, J. Lundén, and V. Koivunen, “Reinforcement learning-based plug-in electric vehicle charging with forecasted price,” IEEE Transactions on Vehicular Technology, vol. 66, no. 5, pp. 3674–3684, May 2017.
T. Chen and W. C. Su, “Indirect customer-to-customer energy trading with reinforcement learning,” IEEE Transactions on Smart Grid, vol. 10, no. 4, pp. 4338–4348, Jul. 2019.
B. J. Claessens, P. Vrancx, and F. Ruelens, “Convolutional neural networks for automatic state-time feature extraction in reinforcement learning applied to residential load control,” IEEE Transactions on Smart Grid, vol. 9, no. 4, pp. 3259–3269, Jul. 2018.
V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
Z. D. Zhang, D. X. Zhang, and R. C. Qiu, “Deep reinforcement learning for power system applications: An overview,” CSEE Journal of Power and Energy Systems, vol. 6, no. 1, pp. 213–225, Mar. 2020.
F. L. Da Silva, C. E. H. Nishida, D. M. Roijers, and A. H. R. Costa, “Coordination of electric vehicle charging through multiagent reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2347–2356, May 2020.
T. Qian, C. C. Shao, X. L. Wang, and M. Shahidehpour, “Deep reinforcement learning for EV charging navigation by coordinating smart grid and intelligent transportation system,” IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1714–1723, Mar. 2020.
H. Ko, S. Pack, and V. C. M. Leung, “Mobility-aware vehicle-to-grid control algorithm in microgrids,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 7, pp. 2165–2174, Jul. 2018.
H. P. Li, Z. Q. Wan, and H. B. He, “Constrained EV charging scheduling based on safe deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2427–2439, May 2020.
Z. Q. Wan, H. P. Li, H. B. He, and D. Prokhorov, “Model-free real-time EV charging scheduling based on deep reinforcement learning,” IEEE Transactions on Smart Grid, vol. 10, no. 5, pp. 5246–5257, Sep. 2019.
M. Shin, D. H. Choi, and J. Kim, “Cooperative management for PV/ESS-enabled electric vehicle charging stations: A multiagent deep reinforcement learning approach,” IEEE Transactions on Industrial Informatics, vol. 16, no. 5, pp. 3493–3503, May 2020.
Z. Wei, Y. Li, and L. Cai, “Electric vehicle charging scheme for a park-and-charge system considering battery degradation costs,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 3, pp. 361–373, Sep. 2018.
S. Vandael, B. Claessens, D. Ernst, T. Holvoet, and G. Deconinck, “Reinforcement learning of heuristic EV fleet charging in a day-ahead electricity market,” IEEE Transactions on Smart Grid, vol. 6, no. 4, pp. 1795–1805, Jul. 2015.
N. Sadeghianpourhamami, J. Deleu, and C. Develder, “Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning,” IEEE Transactions on Smart Grid, vol. 11, no. 1, pp. 203–214, Jan. 2020.
B. V. Mbuwir, F. Ruelens, F. Spiessens, and G. Deconinck, “Battery energy management in a microgrid using batch reinforcement learning,” Energies, vol. 10, no. 11, p. 1846, Nov. 2017.
S. D. Manshadi, M. E. Khodayar, K. Abdelghany, and H. Üster, “Wireless charging of electric vehicles in electricity and transportation networks,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4503–4512, Sep. 2018.
S. D. Manshadi and M. E. Khodayar, “Strategic behavior of in-motion wireless charging aggregators in the electricity and transportation networks,” IEEE Transactions on Vehicular Technology, vol. 69, no. 12, pp. 14780–14792, Dec. 2020.
X. N. Wang, J. H. Wang, and J. Z. Liu, “Vehicle to grid frequency regulation capacity optimal scheduling for battery swapping station using deep Q-network,” IEEE Transactions on Industrial Informatics, vol. 17, no. 2, pp. 1341–1351, Feb. 2021.
E. M. P. Walraven and M. T. J. Spaan, “Planning under uncertainty for aggregated electric vehicle charging with renewable energy supply,” in Proceedings of the Twenty-second European Conference on Artificial Intelligence, 2016, pp. 904–912.
Fig. 1. Comparison of California PV output before and after curtailments.
Fig. 2. The agent-environment interaction in MDPs.
III. Solution Methodology
As suggested earlier, fitted Q-iteration is chosen as the approach to solve the RL problem in this study. The optimal solution of an MDP is acquired by maximizing the discounted sum of rewards. The discount rate, $\gamma$, determines the present value of future rewards: a reward received $k$ time steps in the future is worth only $\gamma^{k-1}$ times what it would be worth if it were received immediately. At time $t$, the value of taking action $a_t$ at state $s_t$ is denoted by $Q(s_t, a_t)$ and is calculated according to Bellman's optimality equation as stated in (10).
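Equation (10) itself does not survive in this extract; the following is a minimal LaTeX sketch of the standard Bellman optimality recursion that the surrounding description implies, written with generic symbols that may differ from the paper's notation.

```latex
% Generic Bellman optimality recursion consistent with the description of (10);
% Q, s_t, a_t, r, gamma, and the feasible-action set A(.) are generic symbols.
\[
  Q(s_t, a_t) = r(s_t, a_t) + \gamma \max_{a_{t+1} \in \mathcal{A}(s_{t+1})} Q(s_{t+1}, a_{t+1})
\]
```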
Based on (10), the action value function at each state is the sum of the immediate reward at that state and the maximum achievable action value at the subsequent state over all feasible actions at time $t+1$. The procedure of fitted Q-iteration used for solving the problem is presented in Algorithm 1. This algorithm takes the discount factor $\gamma$, the maximum number of simulation days $D$, the maximum number of iterations, the action and control step sizes, the initial policy, and the final exploration target as inputs, and returns the action value function as output. At the initialization step, the day and time indices are set to zero, and the sets containing the batch samples for each day and for the whole simulation period are pre-allocated.
Line 2 of Algorithm 1 asserts that for the initial day, the actions are acquired based on the initial policy. This preset policy is used only for the first day, and we have implemented a random policy for this purpose, where the action at each state is chosen randomly from the feasible set of actions. For the remaining days, actions are executed at every simulation time step, but new actions are chosen only at the control time steps. According to line 5, the best action is first obtained from the optimal action value function. Then, based on line 6, the action at each time step is chosen by the explorer. To this end, we have implemented the $\epsilon$-greedy exploration method, where the action is acquired as shown in (11) and (12).
A linear $\epsilon$-greedy method is selected here: according to (11), $\epsilon$ decreases linearly from 1 to its final target value at a constant rate. The action at each step is then extracted based on (12) by drawing a random number between 0 and 1: with probability $1-\epsilon$, the action at each time step is set to the optimal action returned by the action value function, and with probability $\epsilon$, the action is chosen randomly from the feasible action set. In lines 7 and 8 of Algorithm 1, the chosen action is forwarded to the environment, which returns the next state and the reward for taking that action. Then, the set of batch experiences is updated.
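For concreteness, the Python sketch below illustrates one way the linear decay of (11) and the selection rule of (12) could be implemented; the function names, the decay slope, and the use of NumPy's random generator are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def epsilon_schedule(day, total_days, eps_final):
    """Linear decay assumed for (11): epsilon starts at 1 on the first day
    and falls to eps_final over the simulation horizon."""
    slope = (1.0 - eps_final) / total_days
    return max(eps_final, 1.0 - slope * day)

def epsilon_greedy_action(greedy_action, feasible_actions, epsilon, rng):
    """Rule assumed for (12): with probability 1 - epsilon take the greedy
    action suggested by the fitted Q-function, otherwise explore randomly."""
    if rng.random() < epsilon:
        return rng.choice(feasible_actions)
    return greedy_action

# Example: rng = np.random.default_rng(0); eps = epsilon_schedule(42, 365, 0.05)
```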
At the end of each day, the set containing all simulation experiences collected so far is updated. Then, through the iterative procedure indicated in lines 17–24, the input and output data are fed to the regression function. Finally, the optimal action value function estimator, which is used at line 8 for calculating the optimal action choice, is set equal to the approximator function of the last iteration. To obtain an estimate of convergence, a convergence rate factor is introduced in (13); it denotes the learning convergence for a given day at a given iteration and indicates how the best action-value function at each iteration performs compared to the previous iteration, based on the mean of the best action values.
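The batch re-fitting of lines 17–24 follows the standard fitted Q-iteration recipe; the sketch below shows that recipe for a discrete action set and a generic scikit-learn-style regressor. The function name, the tuple layout, and the choice to seed the first pass with the immediate reward are assumptions for illustration, not the authors' code.

```python
import numpy as np
from copy import deepcopy

def fitted_q_iteration(batch, actions, regressor, gamma, n_iters):
    """Batch fitted Q-iteration over (state, action, reward, next_state)
    tuples. Each pass fits a fresh copy of `regressor` on inputs (state,
    action) and targets r + gamma * max_a' Q_hat(next_state, a')."""
    n = len(batch)
    s = np.array([t[0] for t in batch], dtype=float).reshape(n, -1)
    a = np.array([t[1] for t in batch], dtype=float).reshape(n, 1)
    r = np.array([t[2] for t in batch], dtype=float)
    s1 = np.array([t[3] for t in batch], dtype=float).reshape(n, -1)
    X = np.hstack([s, a])

    q_hat = None
    for _ in range(n_iters):
        if q_hat is None:
            y = r  # first pass: approximate Q with the immediate reward
        else:
            # evaluate the previous estimate for every feasible next action
            q_next = np.column_stack([
                q_hat.predict(np.hstack([s1, np.full((n, 1), act)]))
                for act in actions
            ])
            y = r + gamma * q_next.max(axis=1)
        q_hat = deepcopy(regressor)  # re-fit from scratch each iteration
        q_hat.fit(X, y)
    return q_hat
```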
The most important element of the fitted Q-iteration algorithm is the regression function. At the end of each epoch, the data gathered from the environment are fed to the regression function to obtain an approximation of the true Q values. The agent also depends on the regression function for choosing actions at each step. Regression methods used by different authors include, but are not limited to, neural network approximators, multilayer perceptrons, support vector machines, decision trees, and random forests. The fitted Q-iteration algorithm allows fitting any arbitrary (parametric or non-parametric) approximation architecture to the Q-function, with no bias toward a particular regression function.
In this work, we follow the path of the original fitted Q-iteration algorithm, which made use of an ensemble of decision trees [23]. In the search for a desirable regression algorithm able to model any Q-function, tree-based models are found to offer great flexibility, meaning they can be used for predicting any type of Q-function. They are non-parametric, i.e., they do not need repeated rounds of trial and error to tune parameters. Tree-based models are also computationally efficient, and unlike some other methods, their computational burden does not grow exponentially with the problem dimension. Among tree-based models, we opted for random forests, which are built by combining multiple decision trees. At each split, the random forest algorithm selects a random subset of the features, which makes the method robust to outliers. The convergence of tree-based regression for fitted Q-iteration is investigated in [23]. All in all, random forests are a suitable tool for estimating Q-functions of a priori unknown shape with satisfactory performance.
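As a usage note, a random forest regressor can be dropped directly into the fitted_q_iteration sketch above; the scikit-learn hyperparameters and the synthetic batch below are purely illustrative and not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy (s, a, r, s') batch with 3-dimensional states and three charging levels.
rng = np.random.default_rng(0)
actions = [0.0, 0.5, 1.0]
batch = [(rng.random(3), rng.choice(actions), rng.random(), rng.random(3))
         for _ in range(200)]

rf = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, n_jobs=-1)
q_hat = fitted_q_iteration(batch, actions, regressor=rf, gamma=0.95, n_iters=10)
print(q_hat.predict(np.array([[0.2, 0.4, 0.6, 1.0]])))  # Q(s, a=1.0) estimate
```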
Fig. 3. The reward and convergence in Case 0.
Fig. 4. The charging power curve in Case 0 for day 103.
Fig. 5. The specifications of … in Case 0 for day 103.
Fig. 6. The charging power curve in Case 1 for day 133.
Fig. 7. The specifications of … in Case 1 for day 87.
Fig. 8. The charging power curve in Case 2 for days 86–88.
Fig. 9. The charging power curve in Case 2 for day 86.
Fig. 10. SOC of … in Case 2 for days 86–88.
Fig. 11. The average rewards of the DRL algorithm in Case 0 for a one-year simulation.