GaussianActor

GBRL model that produces the parameters of a Gaussian policy distribution (mean and log standard deviation). Used for continuous control tasks, especially in algorithms like SAC, with support for fixed or learnable standard deviations.

class gbrl.models.actor.GaussianActor(tree_struct: Dict, input_dim: int, output_dim: int, mu_optimizer: Dict, std_optimizer: Dict = None, log_std_init: float = -2, params: Dict = {}, bias: ndarray = None, verbose: int = 0, device: str = 'cpu')

Bases: BaseGBT

GBRL model for an actor ensemble used in algorithms such as SAC. This model outputs the mean (mu) and log standard deviation (log_std) of a Gaussian distribution, allowing stochastic action selection.
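To illustrate how the two outputs define a stochastic policy, the following is a minimal NumPy sketch of sampling an action from a Gaussian parameterized by a mean and a log standard deviation, and of evaluating its log-density. It does not call GBRL. The function names, batch size, and action dimension are assumptions made for illustration, and any additional transformation an algorithm such as SAC applies to the sampled action is left out.

```python
import numpy as np


def sample_gaussian_action(mu, log_std, rng):
    """Draw one action per row from N(mu, exp(log_std)^2)."""
    std = np.exp(log_std)  # log_std is unconstrained; exp keeps std positive
    noise = rng.standard_normal(mu.shape)
    return mu + std * noise  # reparameterized sample


def gaussian_log_prob(action, mu, log_std):
    """Log-density of `action`, summed over the action dimensions."""
    z = (action - mu) / np.exp(log_std)
    per_dim = -0.5 * z**2 - log_std - 0.5 * np.log(2.0 * np.pi)
    return per_dim.sum(axis=-1)


rng = np.random.default_rng(0)
mu = np.zeros((4, 2))            # hypothetical batch of 4 observations, 2 action dims
log_std = np.full((4, 2), -2.0)  # same value as the default log_std_init
actions = sample_gaussian_action(mu, log_std, rng)
log_probs = gaussian_log_prob(actions, mu, log_std)
```

Parameterizing the spread through log_std rather than the standard deviation itself lets the model output any real number while the resulting standard deviation stays positive.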

step(observations: ndarray | Tensor | None = None, mu_grad: ndarray | Tensor | None = None, log_std_grad: ndarray | Tensor | None = None, mu_grad_clip: float | None = None, log_std_grad_clip: float | None = None) → None

Performs a single boosting iteration.

Parameters:
  • observations (NumericalData) – Input observations.

  • mu_grad (Optional[NumericalData], optional) – Manually computed gradients for the mean. Defaults to None.

  • log_std_grad (Optional[NumericalData], optional) – Manually computed gradients for the log standard deviation. Defaults to None.

  • mu_grad_clip (Optional[float], optional) – Gradient clipping for the mean. Defaults to None.

  • log_std_grad_clip (Optional[float], optional) – Gradient clipping for the log standard deviation. Defaults to None.
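As an illustration of what manually computed gradients could look like, the sketch below derives per-sample gradients of a Gaussian negative log-likelihood with respect to the mean and the log standard deviation using NumPy. The loss is a stand-in chosen for clarity, not the objective of any particular algorithm. The function name is hypothetical, and the (batch, action_dim) gradient shape and per-sample scaling are assumptions that the source does not confirm.

```python
import numpy as np


def gaussian_nll_grads(action, mu, log_std):
    """Gradients of -log N(action | mu, exp(log_std)^2) w.r.t. mu and log_std."""
    var = np.exp(2.0 * log_std)
    diff = action - mu
    mu_grad = -diff / var               # d(-log p) / d mu
    log_std_grad = 1.0 - diff**2 / var  # d(-log p) / d log_std
    return mu_grad, log_std_grad


action = np.array([[0.5]])
mu = np.array([[0.0]])
log_std = np.array([[0.0]])
mu_grad, log_std_grad = gaussian_nll_grads(action, mu, log_std)

# With an already constructed actor (hypothetical `actor` and `obs`), these
# arrays would be supplied through the documented keyword arguments:
# actor.step(observations=obs, mu_grad=mu_grad, log_std_grad=log_std_grad)
```

The commented call uses only the keyword names from the signature above. How the clipping arguments bound the gradients is not stated in this section, so they are left at their defaults here.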