ParametricActor

GBRL model that outputs a deterministic or parametric policy, typically used for discrete action spaces. Each output dimension corresponds to a learned policy parameter, and the model can be integrated into actor-critic frameworks or used on its own for policy learning.

class gbrl.models.actor.ParametricActor(tree_struct: Dict, input_dim: int, output_dim: int, policy_optimizer: Dict, params: Dict = {}, bias: ndarray = None, verbose: int = 0, device: str = 'cpu')[source]

Bases: BaseGBT

GBRL model for a ParametricActor ensemble. ParametricActor outputs a single parameter per action dimension, allowing deterministic or stochastic behavior (e.g., for discrete action spaces).

step(observations: ndarray | Tensor | None = None, policy_grad: ndarray | Tensor | None = None, policy_grad_clip: float | None = None) None[source]

Performs a single boosting iteration.

Parameters:
  • observations (NumericalData) – batch of input observations.

  • policy_grad (Optional[NumericalData], optional) – manually calculated gradients. Defaults to None.

  • policy_grad_clip (float, optional) – value used to clip policy gradients. Defaults to None.
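A minimal sketch of one boosting iteration: since step accepts manually calculated gradients via policy_grad, a common pattern for discrete actions is to compute the softmax policy-gradient of the logits externally and pass it in. The gradient math below is illustrative, not part of the GBRL API; the ParametricActor call mirrors only the signatures documented above, and the tree_struct and policy_optimizer keys are placeholder assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def policy_logit_grad(logits, actions, advantages):
    # Gradient of -log pi(a|s) * A w.r.t. logits for a softmax policy:
    # (softmax(logits) - one_hot(a)) * A.
    probs = softmax(logits)
    one_hot = np.eye(logits.shape[-1])[actions]
    return (probs - one_hot) * advantages[:, None]

observations = np.random.randn(4, 8).astype(np.float32)
logits = np.zeros((4, 3), dtype=np.float32)  # e.g., the actor's current outputs
actions = np.array([0, 2, 1, 0])
advantages = np.array([1.0, -0.5, 0.3, 2.0], dtype=np.float32)

grads = policy_logit_grad(logits, actions, advantages)

# Hedged usage: constructor/step arguments follow the signatures documented
# above; the dict contents ("max_depth", "algo", "lr") are assumed examples.
try:
    from gbrl.models.actor import ParametricActor
    actor = ParametricActor(
        tree_struct={"max_depth": 4},                   # assumed key
        input_dim=8,
        output_dim=3,
        policy_optimizer={"algo": "SGD", "lr": 0.01},   # assumed keys
    )
    actor.step(observations=observations,
               policy_grad=grads,
               policy_grad_clip=1.0)
except ImportError:
    pass  # gbrl not installed; the gradient computation above still runs
```

Because each `step` call fits one boosting iteration to the supplied gradients, the caller typically recomputes `policy_grad` from fresh rollouts between iterations.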