ActorCritic Class

This class implements a GBT-based Actor-Critic learner for reinforcement learning. The ActorCritic class can be used with a shared tree structure or with separate tree structures for the actor and the critic. Usage examples: GBT-based PPO/AWR implementations.

class gbrl.ac_gbrl.ActorCritic(tree_struct: Dict, output_dim: int, policy_optimizer: Dict, value_optimizer: Dict = None, shared_tree_struct: bool = True, gbrl_params: Dict = {}, bias: ndarray = None, verbose: int = 0, device: str = 'cpu')[source]

Bases: GBRL
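
A minimal construction sketch is shown below. The dictionary keys used for tree_struct and the optimizers (e.g. 'max_depth', 'algo', 'lr') are illustrative assumptions and may differ across GBRL versions; only the keyword arguments follow the signature above.

    import numpy as np
    from gbrl.ac_gbrl import ActorCritic

    # Illustrative configuration; the exact dictionary keys are assumptions,
    # not guaranteed by this reference.
    tree_struct = {'max_depth': 4, 'n_bins': 256}
    policy_optimizer = {'algo': 'SGD', 'lr': 1e-2}
    value_optimizer = {'algo': 'SGD', 'lr': 1e-2}

    model = ActorCritic(
        tree_struct=tree_struct,
        output_dim=3,                      # e.g. action dimension (+1 for the value head when shared)
        policy_optimizer=policy_optimizer,
        value_optimizer=value_optimizer,
        shared_tree_struct=True,           # one ensemble for both actor and critic
        device='cpu',
    )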

actor_step(observations: ndarray | Tensor, policy_grad_clip: float = None, policy_grad: ndarray | Tensor | None = None) None[source]

Performs a single boosting step for the actor (should only be used if the actor and critic use separate models).

Parameters:
  • observations (Union[np.ndarray, th.Tensor])

  • policy_grad_clip (float, optional) – gradient clipping value for the policy gradient. Defaults to None.

  • policy_grad (Optional[Union[np.ndarray, th.Tensor]], optional) – manually calculated gradients. Defaults to None.

Returns:

policy gradient

Return type:

np.ndarray
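
A usage sketch, assuming the model was created with shared_tree_struct=False and that the policy gradient was computed externally (all shapes and values below are placeholders):

    import numpy as np

    # Placeholder batch: 32 observations with 8 features, 2-dimensional action space.
    observations = np.random.randn(32, 8).astype(np.float32)
    policy_grad = np.random.randn(32, 2).astype(np.float32)  # externally computed gradients

    # Fit one boosting round of the actor ensemble on these gradients.
    model.actor_step(observations, policy_grad_clip=1.0, policy_grad=policy_grad)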

copy() ActorCritic[source]

Copy class instance

Returns:

copy of current instance

Return type:

ActorCritic

critic_step(observations: ndarray | Tensor, value_grad_clip: float = None, value_grad: ndarray | Tensor | None = None) None[source]

Performs a single boosting step for the critic (should only be used if the actor and critic use separate models).

Parameters:
  • observations (Union[np.ndarray, th.Tensor])

  • value_grad_clip (float, optional) – gradient clipping value for the value gradient. Defaults to None.

  • value_grad (Optional[Union[np.ndarray, th.Tensor]], optional) – manually calculated gradients. Defaults to None.

Returns:

value gradient

Return type:

np.ndarray
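
An analogous sketch for the critic, again assuming separate actor and critic ensembles and an externally computed value gradient (placeholder shapes):

    import numpy as np

    observations = np.random.randn(32, 8).astype(np.float32)
    value_grad = np.random.randn(32, 1).astype(np.float32)  # externally computed gradients

    # Fit one boosting round of the critic ensemble on these gradients.
    model.critic_step(observations, value_grad_clip=10.0, value_grad=value_grad)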

get_num_trees() int | Tuple[int, int][source]

Returns the number of trees in the ensemble. If the actor and critic use separate ensembles, returns the number of trees per ensemble.

Returns:

number of trees (or a tuple of per-ensemble tree counts)

Return type:

Union[int, Tuple[int, int]]
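
For example (sketch; model as constructed above):

    # A single int for a shared ensemble; a (actor_trees, critic_trees) tuple
    # when the actor and critic use separate ensembles.
    n_trees = model.get_num_trees()
    print(n_trees)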

get_params() Tuple[ndarray, ndarray][source]

Returns predicted actor and critic parameters and their respective gradients

Returns:

actor and critic parameters

Return type:

Tuple[np.ndarray, np.ndarray]
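
A short retrieval sketch; the variable names theta and values are illustrative labels for the two returned arrays:

    # Current actor (policy) and critic (value) parameters as NumPy arrays.
    theta, values = model.get_params()
    print(theta.shape, values.shape)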

classmethod load_model(load_name: str) ActorCritic[source]

Loads GBRL model from a file

Parameters:

load_name (str) – full path to file name

Returns:

loaded ActorCritic model

Return type:

ActorCritic
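
A loading sketch (the path is a placeholder for a previously saved model):

    from gbrl.ac_gbrl import ActorCritic

    # Restore a previously saved ActorCritic model from disk.
    model = ActorCritic.load_model('/path/to/actor_critic_model')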

predict_values(observations: ndarray | Tensor, requires_grad: bool = True, start_idx: int = 0, stop_idx: int = None, tensor: bool = True) ndarray | Tensor[source]

Predict only values. If requires_grad=True, stores differentiable parameters in self.params. The return type and device are identical to the input type and device.

Parameters:
  • observations (Union[np.ndarray, th.Tensor])

  • requires_grad (bool, optional) – if True, stores differentiable parameters in self.params. Defaults to True.

  • start_idx (int, optional) – start tree index for prediction. Defaults to 0.

  • stop_idx (int, optional) – stop tree index for prediction (uses all trees in the ensemble if set to 0). Defaults to None.

  • tensor (bool, optional) – if True, returns a PyTorch Tensor; if False, returns a NumPy array. Defaults to True.

Returns:

values

Return type:

Union[np.ndarray, th.Tensor]
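
A prediction sketch, assuming model and the observation shape from the examples above:

    import numpy as np

    observations = np.random.randn(32, 8).astype(np.float32)

    # Plain NumPy values, without caching differentiable parameters.
    values = model.predict_values(observations, requires_grad=False, tensor=False)
    print(values.shape)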

step(observations: ndarray | Tensor, policy_grad_clip: float = None, value_grad_clip: float = None, policy_grad: ndarray | Tensor | None = None, value_grad: ndarray | Tensor | None = None) None[source]

Performs a boosting step for both the actor and the critic.

Parameters:
  • observations (Union[np.ndarray, th.Tensor])

  • policy_grad_clip (float, optional) – gradient clipping value for the policy gradient. Defaults to None.

  • value_grad_clip (float, optional) – gradient clipping value for the value gradient. Defaults to None.

  • policy_grad (Optional[Union[np.ndarray, th.Tensor]], optional) – manually calculated gradients. Defaults to None.

  • value_grad (Optional[Union[np.ndarray, th.Tensor]], optional) – manually calculated gradients. Defaults to None.
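
A joint-update sketch, e.g. one boosting round inside a PPO-style update loop; all arrays are placeholders, and in practice the gradients come from the RL loss:

    import numpy as np

    observations = np.random.randn(32, 8).astype(np.float32)
    policy_grad = np.random.randn(32, 2).astype(np.float32)
    value_grad = np.random.randn(32, 1).astype(np.float32)

    # One boosting round that updates both the actor and the critic.
    model.step(observations,
               policy_grad_clip=1.0,
               value_grad_clip=10.0,
               policy_grad=policy_grad,
               value_grad=value_grad)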