ActorCritic

GBRL model implementing an actor-critic architecture with gradient-boosted decision trees. Supports both shared and separate ensembles for the actor and critic, enabling flexible RL algorithm design, such as PPO or A2C, with tree-based models.

class gbrl.models.actor_critic.ActorCritic(tree_struct: Dict, input_dim: int, output_dim: int, policy_optimizer: Dict, value_optimizer: Dict, shared_tree_struct: bool = True, params: Dict = {}, bias: float | numpy.ndarray | torch.Tensor | List[numpy.ndarray | torch.Tensor | float] | None = None, verbose: int = 0, device: str = 'cpu')[source]

Bases: BaseGBT

GBRL model for a shared Actor and Critic ensemble.

Supports both shared and separate actor-critic tree structures.
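
A minimal construction sketch. The tree_struct and optimizer dictionary keys below ('max_depth', 'grow_policy', 'algo', 'lr') follow common GBRL examples and are assumptions; verify them against your installed version::

    from gbrl.models.actor_critic import ActorCritic

    # Hypothetical configuration; key names follow common GBRL examples.
    tree_struct = {'max_depth': 4, 'grow_policy': 'greedy'}

    model = ActorCritic(
        tree_struct=tree_struct,
        input_dim=8,    # observation dimension
        output_dim=2,   # policy output dimension
        policy_optimizer={'algo': 'SGD', 'lr': 1e-2},
        value_optimizer={'algo': 'SGD', 'lr': 1e-2},
        shared_tree_struct=True,  # single ensemble shared by actor and critic
        device='cpu',
    )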

actor_step(observations: numpy.ndarray | torch.Tensor | None = None, policy_grads: numpy.ndarray | torch.Tensor | None = None, policy_grad_clip: float | None = None) None[source]

Performs a single boosting step for the actor. Should only be used when the actor and critic use separate ensembles; see the usage sketch following critic_step below.

Parameters:
  • observations (Optional[NumericalData], optional) – input observations. Defaults to None.

  • policy_grads (Optional[NumericalData], optional) – manually computed gradients. Defaults to None.

  • policy_grad_clip (Optional[float], optional) – gradient clipping value. Defaults to None.


copy() ActorCritic[source]

Copies the class instance.

Returns:

copy of current instance

Return type:

ActorCritic

critic_step(observations: numpy.ndarray | torch.Tensor | None = None, value_grads: numpy.ndarray | torch.Tensor | None = None, value_grad_clip: float | None = None) None[source]

Performs a single boosting step for the critic. Should only be used when the actor and critic use separate ensembles; see the usage sketch below.

Parameters:
  • observations (Optional[NumericalData], optional) – input observations. Defaults to None.

  • value_grads (Optional[NumericalData], optional) – manually computed gradients. Defaults to None.

  • value_grad_clip (Optional[float], optional) – gradient clipping value. Defaults to None.

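A sketch of separate actor and critic updates with manually supplied gradients. It assumes the model was built with shared_tree_struct=False and reuses the model from the construction sketch above; shapes and clip values are illustrative::

    import numpy as np

    obs = np.random.randn(32, 8).astype(np.float32)

    # Gradients of the loss w.r.t. the actor / critic outputs,
    # computed externally (shapes are illustrative).
    policy_grads = np.random.randn(32, 2).astype(np.float32)
    value_grads = np.random.randn(32, 1).astype(np.float32)

    model.actor_step(observations=obs, policy_grads=policy_grads,
                     policy_grad_clip=0.5)
    model.critic_step(observations=obs, value_grads=value_grads,
                      value_grad_clip=0.5)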

get_grads() numpy.ndarray | torch.Tensor | Tuple[numpy.ndarray | torch.Tensor, ...] | None[source]

Gets a copy of the gradients from the last backward pass.

Returns:

Cloned/copied gradients, or None if no backward pass has been performed.

Return type:

Optional[Union[NumericalData, Tuple[NumericalData, …]]]
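
A sketch of retrieving gradients after a differentiable prediction, assuming the tensor returned by predict_policy participates in PyTorch autograd as described above; the loss is a placeholder::

    import numpy as np

    obs = np.random.randn(32, 8).astype(np.float32)

    # Differentiable forward pass.
    pi = model.predict_policy(obs, requires_grad=True, tensor=True)
    loss = (pi ** 2).mean()   # placeholder loss
    loss.backward()

    grads = model.get_grads()  # copy of the gradients from the backward pass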

classmethod load_learner(load_name: str, device: str) ActorCritic[source]

Loads a GBRL model from a file.

Parameters:
  • load_name (str) – full path to the saved model file.

  • device (str) – device to load the model onto.

Returns:

loaded ActorCritic model

Return type:

ActorCritic

predict_policy(observations: numpy.ndarray | torch.Tensor, requires_grad: bool = True, start_idx: int = 0, stop_idx: int | None = None, tensor: bool = True) numpy.ndarray | torch.Tensor[source]
Predicts the policy only. If requires_grad=True, stores differentiable parameters in self.params.

Return type/device is identical to the input type/device.

Parameters:
  • observations (NumericalData) – input observations.

  • requires_grad (bool, optional) – whether to store differentiable parameters. Defaults to True.

  • start_idx (int, optional) – start tree index for prediction. Defaults to 0.

  • stop_idx (Optional[int], optional) – stop tree index for prediction (uses all trees in the ensemble if set to 0). Defaults to None.

  • tensor (bool, optional) – if True, returns a PyTorch tensor; if False, returns a NumPy array. Defaults to True.

Returns:

policy

Return type:

NumericalData
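
A minimal inference sketch, including prediction from a slice of the ensemble; the tree indices are illustrative. predict_values (below) follows the same pattern::

    import numpy as np

    obs = np.random.randn(16, 8).astype(np.float32)

    # Full-ensemble prediction, returned as a NumPy array.
    pi = model.predict_policy(obs, requires_grad=False, tensor=False)

    # Prediction using only the first 100 trees.
    pi_partial = model.predict_policy(obs, requires_grad=False,
                                      start_idx=0, stop_idx=100, tensor=False)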

predict_values(observations: numpy.ndarray | torch.Tensor, requires_grad: bool = True, start_idx: int = 0, stop_idx: int | None = None, tensor: bool = True) numpy.ndarray | torch.Tensor[source]
Predicts the values only. If requires_grad=True, stores differentiable parameters in self.params.

Return type/device is identical to the input type/device.

Parameters:
  • observations (NumericalData) – input observations.

  • requires_grad (bool, optional) – whether to store differentiable parameters. Defaults to True.

  • start_idx (int, optional) – start tree index for prediction. Defaults to 0.

  • stop_idx (Optional[int], optional) – stop tree index for prediction (uses all trees in the ensemble if set to 0). Defaults to None.

  • tensor (bool, optional) – if True, returns a PyTorch tensor; if False, returns a NumPy array. Defaults to True.

Returns:

values

Return type:

NumericalData

save_learner(save_path: str) None[source]

Saves the model to a file.

Parameters:

save_path (str) – absolute path and name of the save file.
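
A save/load round-trip sketch; the path is illustrative, and GBRL may apply its own file extension::

    model.save_learner('/tmp/actor_critic_model')
    restored = ActorCritic.load_learner('/tmp/actor_critic_model', device='cpu')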

step(observations: numpy.ndarray | torch.Tensor | None = None, policy_grads: numpy.ndarray | torch.Tensor | None = None, value_grads: numpy.ndarray | torch.Tensor | None = None, policy_grad_clip: float | None = None, value_grad_clip: float | None = None) None[source]

Performs a boosting step for both the actor and critic.

If observations is not provided, the stored input from the last forward pass is used.

Parameters:
  • observations (Optional[NumericalData], optional) – Input observations.

  • policy_grads (Optional[NumericalData], optional) – Manually computed gradients for the policy.

  • value_grads (Optional[NumericalData], optional) – Manually computed gradients for the value function.

  • policy_grad_clip (Optional[float], optional) – Gradient clipping value for policy updates.

  • value_grad_clip (Optional[float], optional) – Gradient clipping value for value updates.
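
A sketch of one shared boosting step driven by autograd. The losses are placeholders for a real PPO/A2C objective, and the model is assumed to be the instance from the construction sketch above::

    import numpy as np

    obs = np.random.randn(64, 8).astype(np.float32)

    # Differentiable forward passes for both heads.
    pi = model.predict_policy(obs, requires_grad=True, tensor=True)
    values = model.predict_values(obs, requires_grad=True, tensor=True)

    policy_loss = (pi ** 2).mean()     # placeholder for a policy objective
    value_loss = (values ** 2).mean()  # placeholder for a value objective
    (policy_loss + value_loss).backward()

    # observations may be omitted: the stored input from the last
    # forward pass is used.
    model.step(policy_grad_clip=0.5, value_grad_clip=0.5)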