ActorCritic
GBRL model implementing an actor-critic architecture using gradient-boosted decision trees. Supports both shared and separate ensembles for the actor and critic, enabling tree-based implementations of RL algorithms such as PPO or A2C.
- class gbrl.models.actor_critic.ActorCritic(tree_struct: Dict, input_dim: int, output_dim: int, policy_optimizer: Dict, value_optimizer: Dict = None, shared_tree_struct: bool = True, params: Dict = {}, bias: ndarray = None, verbose: int = 0, device: str = 'cpu')[source]
Bases:
BaseGBT
GBRL model for a shared Actor and Critic ensemble.
Supports both shared and separate actor-critic tree structures.
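A minimal construction sketch is shown below. The keys used in the tree_struct and optimizer dictionaries, as well as the chosen dimensions, are illustrative assumptions rather than GBRL's documented configuration schema:

```python
from gbrl.models.actor_critic import ActorCritic

# Illustrative configuration dictionaries; the exact keys accepted by GBRL
# may differ -- consult the library's configuration reference.
tree_struct = {"max_depth": 4}                    # assumed key
policy_optimizer = {"algo": "SGD", "lr": 1e-2}    # assumed keys
value_optimizer = {"algo": "SGD", "lr": 1e-2}     # assumed keys

model = ActorCritic(
    tree_struct=tree_struct,
    input_dim=8,                   # observation dimension (illustrative)
    output_dim=4,                  # policy output dimension (illustrative)
    policy_optimizer=policy_optimizer,
    value_optimizer=value_optimizer,
    shared_tree_struct=True,       # one ensemble shared by actor and critic
    device="cpu",
)
```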
- actor_step(observations: ndarray | Tensor | None = None, policy_grad: ndarray | Tensor | None = None, policy_grad_clip: float | None = None) None [source]
Performs a single boosting step for the actor (should only be used if the actor and critic use separate ensembles).
- Parameters:
observations (Optional[NumericalData], optional) – Input observations. Defaults to None.
policy_grad (Optional[NumericalData], optional) – Manually computed policy gradients. Defaults to None.
policy_grad_clip (float, optional) – Gradient clipping value for the policy gradient. Defaults to None.
- Returns:
policy gradient
- Return type:
np.ndarray
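A hedged usage sketch, assuming the model was built with shared_tree_struct=False and that observations and externally computed policy gradients are available as NumPy arrays:

```python
import numpy as np

# Illustrative shapes; gradients are assumed to come from the RL objective.
observations = np.random.randn(32, 8).astype(np.float32)
policy_grad = np.random.randn(32, 4).astype(np.float32)

model.actor_step(
    observations=observations,
    policy_grad=policy_grad,     # manually computed policy gradients
    policy_grad_clip=0.5,        # optional clipping of the policy gradient
)
```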
- copy() ActorCritic [source]
Copies the current class instance.
- Returns:
Copy of the current instance.
- Return type:
ActorCritic
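For example, copying can be used to snapshot the current ensemble (e.g., to keep an old policy around):

```python
# Independent copy of the current actor-critic ensemble.
old_model = model.copy()
```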
- critic_step(observations: ndarray | Tensor | None = None, value_grad: ndarray | Tensor | None = None, value_grad_clip: float | None = None) None [source]
Performs a single boosting step for the critic (should only be used if the actor and critic use separate ensembles).
- Parameters:
observations (Optional[NumericalData], optional) – Input observations. Defaults to None.
value_grad (Optional[NumericalData], optional) – Manually computed value-function gradients. Defaults to None.
value_grad_clip (float, optional) – Gradient clipping value for the value gradient. Defaults to None.
- Returns:
value gradient
- Return type:
np.ndarray
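The critic-side counterpart to actor_step; again a sketch that assumes separate ensembles and externally computed value-function gradients:

```python
import numpy as np

# Illustrative shapes; the value gradient is assumed to be computed by the
# surrounding algorithm (e.g., from a TD or Monte Carlo value loss).
observations = np.random.randn(32, 8).astype(np.float32)
value_grad = np.random.randn(32, 1).astype(np.float32)

model.critic_step(
    observations=observations,
    value_grad=value_grad,
    value_grad_clip=1.0,     # optional clipping of the value gradient
)
```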
- get_params() Tuple[ndarray, ndarray] [source]
Returns the predicted actor and critic parameters along with their gradients.
- Returns:
Predicted actor and critic outputs.
Corresponding policy and value gradients.
- Return type:
Tuple[np.ndarray, np.ndarray]
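A sketch of retrieving the current outputs after a forward pass; the unpacking order shown (actor output first, critic output second) is an assumption based on the Tuple[np.ndarray, np.ndarray] return type:

```python
# Assumed unpacking order: actor (policy) output, then critic (value) output.
policy_params, value_params = model.get_params()
print(policy_params.shape, value_params.shape)
```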
- classmethod load_learner(load_name: str, device: str) ActorCritic [source]
Loads a GBRL model from a file.
- Parameters:
load_name (str) – Full path to the saved model file.
device (str) – Device to load the model onto (e.g., 'cpu').
- Returns:
Loaded ActorCritic model.
- Return type:
ActorCritic
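A loading sketch; the file path is illustrative:

```python
from gbrl.models.actor_critic import ActorCritic

# Load a previously saved model onto the CPU (path is illustrative).
model = ActorCritic.load_learner("/path/to/actor_critic_model", device="cpu")
```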
- predict_values(observations: ndarray | Tensor, requires_grad: bool = True, start_idx: int = 0, stop_idx: int = None, tensor: bool = True) ndarray | Tensor [source]
- Predicts only values. If requires_grad=True, differentiable parameters are stored in self.params.
The return type/device is identical to the input type/device.
- Parameters:
observations (NumericalData)
requires_grad (bool, optional) – Whether to store differentiable parameters in self.params. Defaults to True.
start_idx (int, optional) – start tree index for prediction. Defaults to 0.
stop_idx (int, optional) – Stop tree index for prediction (uses all trees in the ensemble if set to 0). Defaults to None.
tensor (bool, optional) – If True, returns a PyTorch Tensor; if False, returns a NumPy array. Defaults to True.
- Returns:
Predicted values.
- Return type:
NumericalData
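A value-prediction sketch, assuming PyTorch inputs and the illustrative input_dim=8 used above:

```python
import torch

# Batch of 32 observations with input_dim=8 (illustrative shapes).
observations = torch.randn(32, 8)

# With tensor=True and requires_grad=True the result is a differentiable
# torch.Tensor on the same device as the input.
values = model.predict_values(observations, requires_grad=True, tensor=True)
```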
- save_learner(save_path: str) None [source]
Saves the model to a file.
- Parameters:
save_path (str) – Absolute path and filename to save the model to.
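A saving sketch; the path is illustrative:

```python
# Persist the current ensembles to disk (path is illustrative).
model.save_learner("/path/to/actor_critic_model")
```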
- step(observations: ndarray | Tensor | None = None, policy_grad: ndarray | Tensor | None = None, value_grad: ndarray | Tensor | None = None, policy_grad_clip: float | None = None, value_grad_clip: float | None = None) None [source]
Performs a boosting step for both the actor and critic.
If observations is not provided, the stored input from the last forward pass is used.
- Parameters:
observations (Optional[NumericalData], optional) – Input observations.
policy_grad (Optional[NumericalData], optional) – Manually computed gradients for the policy.
value_grad (Optional[NumericalData], optional) – Manually computed gradients for the value function.
policy_grad_clip (Optional[float], optional) – Gradient clipping value for policy updates.
value_grad_clip (Optional[float], optional) – Gradient clipping value for value updates.
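A combined boosting-step sketch, assuming a shared ensemble and gradients computed by the surrounding RL algorithm (e.g., a PPO loss); shapes are illustrative:

```python
import numpy as np

observations = np.random.randn(32, 8).astype(np.float32)
policy_grad = np.random.randn(32, 4).astype(np.float32)   # from the policy loss
value_grad = np.random.randn(32, 1).astype(np.float32)    # from the value loss

model.step(
    observations=observations,
    policy_grad=policy_grad,
    value_grad=value_grad,
    policy_grad_clip=0.5,
    value_grad_clip=1.0,
)
```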