quarticgym.envs package
Subpackages
Submodules
quarticgym.envs.atropineenv module
AtropineEnv simulates an atropine production environment.
- class quarticgym.envs.atropineenv.AtropineEnvGym(normalize=True, max_steps: int = 60, x0_loc='quarticgym/datasets/atropineenv/x0.txt', z0_loc='quarticgym/datasets/atropineenv/z0.txt', model_loc='quarticgym/datasets/atropineenv/model.npy', uss_subtracted=False, reward_on_ess_subtracted=False, reward_on_steady=True, reward_on_absolute_efactor=False, reward_on_actions_penalty=0.0, reward_on_reject_actions=True, relaxed_max_min_actions=False, observation_include_t=True, observation_include_action=False, observation_include_uss=True, observation_include_ess=True, observation_include_e=True, observation_include_kf=True, observation_include_z=True, observation_include_x=False)[source]
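A minimal usage sketch (hypothetical, not taken from the package docs): it assumes the environment follows the standard gym reset()/step() interface of QuarticGymEnvBase, exposes an action_dim attribute, and that the dataset paths above resolve from the working directory; the zero-action policy is purely illustrative.

    import numpy as np
    from quarticgym.envs.atropineenv import AtropineEnvGym

    env = AtropineEnvGym(normalize=True, max_steps=60)
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        # illustrative placeholder policy: the center of the normalized action space
        action = np.zeros(env.action_dim, dtype=np.float32)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("episode return:", total_reward)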
quarticgym.envs.beerfmtenv module
BeerFMT simulates the Beer Fermentation process.
- class quarticgym.envs.beerfmtenv.BeerFMTEnvGym(dense_reward=True, normalize=True, observation_relaxation=1.0, action_dim=1, observation_dim=8)[source]
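A shorter sketch for the fermentation environment, under the same gym-interface assumption; the dimensions are simply the constructor defaults.

    from quarticgym.envs.beerfmtenv import BeerFMTEnvGym

    env = BeerFMTEnvGym(dense_reward=True, normalize=True)
    obs = env.reset()
    # observation_dim=8 and action_dim=1 mirror the constructor arguments above;
    # stepping then proceeds exactly as in the AtropineEnvGym sketch
    assert len(obs) == 8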
quarticgym.envs.pensimenv module
quarticgym.envs.reactorenv module
ReactorEnv simulates a general reactor environment. It is intended as a template environment: the documentation in that file is more detailed, and comment lines (# --- standard --- and # /--- standard ---) enclose pieces of code that should be reused by most QuarticGym environments. I will extend some of them into a base class in the future.
- class quarticgym.envs.reactorenv.ReactorEnvGym(dense_reward=True, normalize=True, debug_mode=False, action_dim=2, observation_dim=3, reward_function=None, done_calculator=None, max_observations=[1.0, 100.0, 1.0], min_observations=[1e-08, 1e-08, 1e-08], max_actions=[35.0, 0.2], min_actions=[15.0, 0.05], error_reward=-1000.0, initial_state_deviation_ratio=0.3, compute_diffs_on_reward=False, np_dtype=<class 'numpy.float32'>, sampling_time=0.1, max_steps=100)[source]
Bases:
quarticgym.envs.utils.QuarticGymEnvBase
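A hypothetical instantiation sketch for the template environment; the bounds shown are just the constructor defaults, and evaluate_observation is documented below.

    from quarticgym.envs.reactorenv import ReactorEnvGym

    env = ReactorEnvGym(
        dense_reward=True,
        normalize=True,
        max_observations=[1.0, 100.0, 1.0],      # per-dimension state upper bounds
        min_observations=[1e-08, 1e-08, 1e-08],
        max_actions=[35.0, 0.2],
        min_actions=[15.0, 0.05],
        sampling_time=0.1,
        max_steps=100,
    )
    obs = env.reset()
    score = env.evaluate_observation(obs)        # a reward-like evaluation of the state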
- evaluate_observation(observation)[source]
observation: numpy array of shape (self.observation_dim). Returns: an evaluation of the observation (a reward, in a sense).
- evaluate_rewards_mean_std_over_episodes(algorithms, num_episodes=1, error_reward=-1000.0, initial_states=None, to_plt=True, plot_dir='./plt_results', computer_on_episodes=False)[source]
Returns: mean and std of rewards over all episodes. Since the rewards_list is not aligned (e.g. some trajectories are shorter than others), it cannot be converted directly to a numpy array; the nested list has to be unwrapped first. If computer_on_episodes is True, the rewards_list is first averaged over each episode and the mean and std are computed over those episode averages; otherwise, the mean and std are computed directly over every step.
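The two aggregation modes can be pictured on a toy ragged rewards_list; this is a standalone numpy illustration of the described logic, not the method's actual source.

    import numpy as np

    # one inner list per episode; lengths differ, so this is not a rectangular array
    rewards_list = [[1.0, 2.0, 3.0], [2.0, 2.0]]

    # computer_on_episodes=True: average each episode first, then mean/std over episodes
    per_episode = [np.mean(ep) for ep in rewards_list]
    mean_ep, std_ep = np.mean(per_episode), np.std(per_episode)

    # computer_on_episodes=False: unwrap the nested list and use every step directly
    flat = [r for ep in rewards_list for r in ep]
    mean_step, std_step = np.mean(flat), np.std(flat)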
- evalute_algorithms(algorithms, num_episodes=1, error_reward=-1000.0, initial_states=None, to_plt=True, plot_dir='./plt_results')[source]
When executing evalute_algorithms, self.normalize should be False. algorithms: list of (algorithm, algorithm_name, normalize) tuples; each algorithm must provide a method predict(observation) -> action: np.ndarray. num_episodes: number of episodes to run. error_reward: reward assigned when the environment encounters an error. initial_states: None, the location of a numpy file of initial states, or a (numpy) list of initial states. to_plt: whether to generate plots. plot_dir: None, or a directory to save plots. Returns: a list of average rewards over each episode, and the number of episodes.
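A hypothetical call sketch: the ConstantPolicy class is invented for illustration, and only the predict(observation) -> np.ndarray contract and the (algorithm, algorithm_name, normalize) tuple format come from the docstring.

    import numpy as np
    from quarticgym.envs.reactorenv import ReactorEnvGym

    class ConstantPolicy:
        """Illustrative policy satisfying the predict(observation) -> action contract."""
        def predict(self, observation):
            # a constant action inside the default action bounds [15.0, 0.05]..[35.0, 0.2]
            return np.array([25.0, 0.1], dtype=np.float32)

    env = ReactorEnvGym(normalize=False)               # self.normalize should be False here
    algorithms = [(ConstantPolicy(), "constant", False)]
    results = env.evalute_algorithms(algorithms, num_episodes=3, to_plt=False, plot_dir=None)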
- find_outperformances(algorithms, rewards_list, initial_states, threshold=0.05, top_k=10)[source]
This function computes the outperformances of the last algorithm in algorithms. There are three criteria: if, in a trajectory, the algorithm's reward is >= that of all other algorithms, the corresponding initial_state is stored in always_better; if, in a trajectory, the algorithm's mean reward is >= threshold + every other algorithm's mean reward, the corresponding initial_state is stored in averagely_better; and for the top_k most outperformed reward means, the corresponding initial_states are stored in top_k_better, in ascending order.
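One plausible reading of the three criteria, sketched with numpy on the rewards of a single trajectory (a standalone illustration, not the function's internals; the algorithm names are made up).

    import numpy as np

    # per-step rewards of each algorithm on one trajectory; "rl" is the last algorithm
    rewards = {"pid": np.array([1.0, 1.2]), "mpc": np.array([1.1, 1.1]), "rl": np.array([1.3, 1.4])}
    threshold = 0.05
    others = [v for k, v in rewards.items() if k != "rl"]

    always_better = all((rewards["rl"] >= other).all() for other in others)
    averagely_better = all(rewards["rl"].mean() >= threshold + other.mean() for other in others)
    # top_k_better would then rank initial states by this outperformance margin
    margin = rewards["rl"].mean() - max(other.mean() for other in others)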
- generate_dataset_with_algorithm(algorithm, normalize=None, num_episodes=1, error_reward=-1000.0, initial_states=None, format='d4rl')[source]
This function creates a dataset for offline reinforcement learning, in either d4rl or pytorch format. The trajectories are generated by the algorithm, which interacts with this env initialized by initial_states. algorithm: an instance that has a method predict(observation) -> action: np.ndarray. If format == 'd4rl', returns a dictionary in d4rl format; else if format == 'torch', returns an object of type torch.utils.data.Dataset.
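A hypothetical call sketch; the ConstantPolicy class is invented for illustration, and the key names in the comment follow the usual d4rl convention rather than something spelled out in the docstring.

    import numpy as np
    from quarticgym.envs.reactorenv import ReactorEnvGym

    class ConstantPolicy:
        def predict(self, observation):
            return np.array([25.0, 0.1], dtype=np.float32)   # inside the default action bounds

    env = ReactorEnvGym(normalize=False)
    dataset = env.generate_dataset_with_algorithm(ConstantPolicy(), normalize=False,
                                                  num_episodes=10, format='d4rl')
    # a d4rl-style dict typically holds parallel arrays such as
    # 'observations', 'actions', 'rewards', and 'terminals'
    print(list(dataset.keys()))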
quarticgym.envs.utils module
- class quarticgym.envs.utils.QuarticGymEnvBase(dense_reward=True, normalize=True, debug_mode=False, action_dim=2, observation_dim=3, reward_function=None, done_calculator=None, max_observations=[1.0, 1.0], min_observations=[-1.0, -1.0], max_actions=[1.0, 1.0], min_actions=[-1.0, -1.0], error_reward=-100.0)[source]
Bases:
gym.core.Env
- done_calculator_standard(current_observation, step_count, reward, done=None, done_info=None)[source]
Check whether the current episode is considered finished. Returns a boolean value indicating whether the episode is done, and a dictionary with information. Here in done_calculator_standard, done_info looks like {"terminal": boolean, "timeout": boolean}, where "timeout" is True when the episode ends due to reaching the maximum episode length, and "terminal" is True when "timeout" is True or the episode ends due to termination conditions such as an env error being encountered (basically, done).
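A small sketch of the returned pair, assuming a concrete subclass such as ReactorEnvGym inherits this method unchanged; the argument values are illustrative.

    from quarticgym.envs.reactorenv import ReactorEnvGym

    env = ReactorEnvGym()
    obs = env.reset()
    done, done_info = env.done_calculator_standard(obs, step_count=0, reward=0.0)
    # done_info has the form {"terminal": bool, "timeout": bool}: "timeout" flags hitting the
    # maximum episode length, while "terminal" is True whenever the episode ends at all
    print(done, done_info)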
- evalute_algorithms(algorithms, num_episodes=1, error_reward=-1000.0, initial_states=None, to_plt=True, plot_dir='./plt_results')[source]
When executing evalute_algorithms, self.normalize should be False. algorithms: list of (algorithm, algorithm_name, normalize) tuples; each algorithm must provide a method predict(observation) -> action: np.ndarray. num_episodes: number of episodes to run. error_reward: reward assigned when the environment encounters an error. initial_states: None, the location of a numpy file of initial states, or a (numpy) list of initial states. to_plt: whether to generate plots. plot_dir: None, or a directory to save plots. Returns: a list of average rewards over each episode, and the number of episodes.
- generate_dataset_with_algorithm(algorithm, normalize=None, num_episodes=1, error_reward=-1000.0, initial_states=None, format='d4rl')[source]
This function creates a dataset for offline reinforcement learning, in either d4rl or pytorch format. The trajectories are generated by the algorithm, which interacts with this env initialized by initial_states. algorithm: an instance that has a method predict(observation) -> action: np.ndarray. If format == 'd4rl', returns a dictionary in d4rl format; else if format == 'torch', returns an object of type torch.utils.data.Dataset.
- observation_beyond_box(observation)[source]
Check whether the observation is beyond the box (outside [min_observations, max_observations]), which is what we don't want.
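The check can be pictured as a bound comparison against the constructor's min_observations/max_observations; this is a standalone numpy sketch of the idea, not the method's actual source.

    import numpy as np

    max_observations = np.array([1.0, 1.0])
    min_observations = np.array([-1.0, -1.0])

    def beyond_box(observation):
        # True if any component leaves the allowed [min, max] box
        return bool(np.any(observation > max_observations) or np.any(observation < min_observations))

    beyond_box(np.array([0.5, -0.2]))   # False: inside the box
    beyond_box(np.array([1.5, 0.0]))    # True: first component exceeds its upper bound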