BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20250712T092211EDT-4328Wku2hF@132.216.98.100
DTSTAMP:20250712T132211Z
DESCRIPTION:Safe Policy Improvement with Baseline Bootstrapping\n\nA common goal in Reinforcement Learning is to derive a good strategy given a limited batch of data. In this paper\, we propose a new strategy to compute a safe policy\, guaranteed to perform at least as well as a given baseline strategy. We advocate that the assumptions made in previous work are too strong for real-world applications and propose new algorithms allowing those assumptions to be satisfied only in a subset of the state-action pairs. While significantly relaxing the assumptions\, our algorithms achieve the same accuracy guarantees as the previous work\, and are also much more computationally efficient. We also show that the algorithms can be adapted to model-free Reinforcement Learning.\n
DTSTART:20171201T153000Z
DTEND:20171201T163000Z
LOCATION:Room AA6214\, CA\, Pav. André-Aisenstadt
SUMMARY:Romain Laroche\, researcher from MS Maluuba
URL:/mathstat/channels/event/romain-laroche-researcher-ms-maluuba-283119
END:VEVENT
END:VCALENDAR