Provider
Llama Cpp Server
llama_cpp_agent.providers.llama_cpp_server
LlamaCppSamplingSettings
dataclass
Bases: LlmSamplingSettings
Settings for generating completions using the Llama.cpp server.
Parameters:
- temperature (float, default: 0.8) – Controls the randomness of the generated text. Higher values make the output more random.
- top_k (int, default: 40) – Limits sampling to the k most probable tokens. Higher values allow more diverse completions.
- top_p (float, default: 0.95) – Nucleus sampling threshold: sampling is limited to the smallest set of tokens whose cumulative probability reaches top_p. Higher values allow more diverse completions.
- min_p (float, default: 0.05) – Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
- n_predict (int, default: -1) – Maximum number of tokens to predict. Set to -1 for no fixed limit.
- n_keep (int, default: 0) – Number of tokens from the prompt to keep when the context size is exceeded and tokens must be discarded. With 0, no prompt tokens are kept.
- stream (bool, default: True) – Stream tokens as they are generated instead of returning the full completion at once.
- additional_stop_sequences (List[str], default: None) – Additional stop sequences that end generation. The model's official stop sequences are added automatically.
- tfs_z (float, default: 1.0) – Parameter z for tail-free sampling. 1.0 disables it.
- typical_p (float, default: 1.0) – Parameter p for locally typical sampling. 1.0 disables it.
- repeat_penalty (float, default: 1.1) – Penalty applied to repeated tokens. Values above 1.0 discourage repetition.
- repeat_last_n (int, default: -1) – Number of most recent tokens considered for the repeat penalty. In llama.cpp, 0 disables the penalty and -1 uses the context size.
- penalize_nl (bool, default: False) – Penalize newline tokens when applying the repeat penalty.
- presence_penalty (float, default: 0.0) – Penalty applied to tokens that have already appeared, regardless of how often. 0.0 disables it.
- frequency_penalty (float, default: 0.0) – Penalty that grows with how often a token has already appeared. 0.0 disables it.
- penalty_prompt (Union[None, str, List[int]], default: None) – Prompt used in place of the actual prompt when evaluating penalties, given as a string or a list of token IDs. None uses the actual prompt.
- mirostat_mode (int, default: 0) – Mirostat sampling mode: 0 disables it, 1 selects Mirostat, 2 selects Mirostat 2.0.
- mirostat_tau (float, default: 5.0) – Mirostat target entropy (tau).
- mirostat_eta (float, default: 0.1) – Mirostat learning rate (eta).
- seed (int, default: -1) – Seed for the random number generator. Set to -1 for a random seed.
- ignore_eos (bool, default: False) – Ignore the end-of-sequence token and continue generating.
Attributes:
- temperature (float) – Controls the randomness of the generated text. Higher values make the output more random.
- top_k (int) – Limits sampling to the k most probable tokens. Higher values allow more diverse completions.
- top_p (float) – Nucleus sampling threshold: sampling is limited to the smallest set of tokens whose cumulative probability reaches top_p. Higher values allow more diverse completions.
- min_p (float) – Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
- n_predict (int) – Maximum number of tokens to predict. Set to -1 for no fixed limit.
- n_keep (int) – Number of tokens from the prompt to keep when the context size is exceeded and tokens must be discarded. With 0, no prompt tokens are kept.
- stream (bool) – Stream tokens as they are generated instead of returning the full completion at once.
- additional_stop_sequences (List[str]) – Additional stop sequences that end generation. The model's official stop sequences are added automatically.
- tfs_z (float) – Parameter z for tail-free sampling. 1.0 disables it.
- typical_p (float) – Parameter p for locally typical sampling. 1.0 disables it.
- repeat_penalty (float) – Penalty applied to repeated tokens. Values above 1.0 discourage repetition.
- repeat_last_n (int) – Number of most recent tokens considered for the repeat penalty. In llama.cpp, 0 disables the penalty and -1 uses the context size.
- penalize_nl (bool) – Penalize newline tokens when applying the repeat penalty.
- presence_penalty (float) – Penalty applied to tokens that have already appeared, regardless of how often. 0.0 disables it.
- frequency_penalty (float) – Penalty that grows with how often a token has already appeared. 0.0 disables it.
- penalty_prompt (Union[None, str, List[int]]) – Prompt used in place of the actual prompt when evaluating penalties, given as a string or a list of token IDs. None uses the actual prompt.
- mirostat_mode (int) – Mirostat sampling mode: 0 disables it, 1 selects Mirostat, 2 selects Mirostat 2.0.
- mirostat_tau (float) – Mirostat target entropy (tau).
- mirostat_eta (float) – Mirostat learning rate (eta).
- seed (int) – Seed for the random number generator. Set to -1 for a random seed.
- ignore_eos (bool) – Ignore the end-of-sequence token and continue generating.
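To make the interaction of temperature, top_k, top_p and min_p concrete, here is a minimal sketch of how such filters can narrow a token distribution. It is a simplified illustration, not the sampler used by llama.cpp: the function name `filter_distribution` is hypothetical, and the order of the filters and the renormalization between them are assumptions.

```python
import math


def _renormalize(pairs):
    """Rescale (probability, token_index) pairs so the probabilities sum to 1."""
    total = sum(p for p, _ in pairs)
    return [(p / total, i) for p, i in pairs]


def filter_distribution(logits, temperature=0.8, top_k=40, top_p=0.95, min_p=0.05):
    """Hypothetical helper: return {token_index: probability} after filtering."""
    # Temperature: divide the logits before the softmax; lower values sharpen the distribution.
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    pairs = _renormalize([(w, i) for i, w in enumerate(weights)])
    pairs.sort(reverse=True)  # most probable token first

    # Top-k: keep only the k most probable tokens (assumed: 0 or less disables the filter).
    if top_k > 0:
        pairs = _renormalize(pairs[:top_k])

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in pairs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break
    kept = _renormalize(kept)

    # Min-p: drop tokens whose probability is below min_p times the top probability.
    threshold = min_p * kept[0][0]
    kept = _renormalize([(p, i) for p, i in kept if p >= threshold])

    return {i: p for p, i in kept}


candidates = filter_distribution([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.9)
```

With these inputs, top_k removes the least likely token and top_p then removes the third one, so only the two most probable tokens remain as sampling candidates.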
Methods:
- save(file_path: str) – Save the settings to a file.
- load_from_file(file_path: str) -> LlamaCppSamplingSettings – Load the settings from a file.
- load_from_dict(settings: dict) -> LlamaCppSamplingSettings – Load the settings from a dictionary.
- as_dict() -> dict – Convert the settings to a dictionary.
Source code in llama_cpp_agent/providers/llama_cpp_server.py
load_from_dict(settings)
staticmethod
Load the settings from a dictionary.
Parameters:
- settings (dict) – The dictionary containing the settings.
Returns:
- LlamaCppSamplingSettings – The loaded settings.
as_dict()
Convert the settings to a dictionary.
Returns:
- dict – The dictionary representation of the settings.
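The save, load_from_file, load_from_dict and as_dict methods together form a round trip between a settings object, a dictionary and a file. The sketch below illustrates that pattern with a stand-in dataclass. `SamplingSettingsSketch` is hypothetical and carries only a few of the documented fields, and the JSON file format is an assumption rather than something this page confirms.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SamplingSettingsSketch:
    """Hypothetical stand-in for a provider settings dataclass (not the library class)."""

    temperature: float = 0.8
    top_k: int = 40
    top_p: float = 0.95
    min_p: float = 0.05
    n_predict: int = -1
    stream: bool = True
    additional_stop_sequences: Optional[List[str]] = None

    def as_dict(self) -> dict:
        # Convert every dataclass field into a plain dictionary entry.
        return asdict(self)

    @staticmethod
    def load_from_dict(settings: dict) -> "SamplingSettingsSketch":
        # Unknown keys raise a TypeError, which surfaces typos early.
        return SamplingSettingsSketch(**settings)

    def save(self, file_path: str) -> None:
        # Assumption: settings are stored as JSON.
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(self.as_dict(), f, indent=2)

    @staticmethod
    def load_from_file(file_path: str) -> "SamplingSettingsSketch":
        with open(file_path, "r", encoding="utf-8") as f:
            return SamplingSettingsSketch.load_from_dict(json.load(f))


settings = SamplingSettingsSketch(temperature=0.2, additional_stop_sequences=["</s>"])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sampling.json")
    settings.save(path)
    restored = SamplingSettingsSketch.load_from_file(path)
```

Because dataclasses compare field by field, `restored == settings` holds after the round trip in this sketch.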
Llama Cpp Python
llama_cpp_agent.providers.llama_cpp_python
LlamaCppPythonSamplingSettings
dataclass
Bases: LlmSamplingSettings
Settings for generating completions using llama-cpp-python.
Parameters:
- temperature (float, default: 0.8) – Controls the randomness of the generated text. Higher values make the output more random.
- top_k (int, default: 40) – Limits sampling to the k most probable tokens. Higher values allow more diverse completions.
- top_p (float, default: 0.95) – Nucleus sampling threshold: sampling is limited to the smallest set of tokens whose cumulative probability reaches top_p. Higher values allow more diverse completions.
- min_p (float, default: 0.05) – Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
- max_tokens (int, default: -1) – Maximum number of tokens to generate. Set to -1 for no fixed limit.
- stream (bool, default: False) – Stream tokens as they are generated instead of returning the full completion at once.
- additional_stop_sequences (List[str], default: None) – Additional stop sequences that end generation. The model's official stop sequences are added automatically.
- tfs_z (float, default: 1.0) – Parameter z for tail-free sampling. 1.0 disables it.
- typical_p (float, default: 1.0) – Parameter p for locally typical sampling. 1.0 disables it.
- repeat_penalty (float, default: 1.1) – Penalty applied to repeated tokens. Values above 1.0 discourage repetition.
- presence_penalty (float, default: 0.0) – Penalty applied to tokens that have already appeared, regardless of how often. 0.0 disables it.
- frequency_penalty (float, default: 0.0) – Penalty that grows with how often a token has already appeared. 0.0 disables it.
- mirostat_mode (int, default: 0) – Mirostat sampling mode: 0 disables it, 1 selects Mirostat, 2 selects Mirostat 2.0.
- mirostat_tau (float, default: 5.0) – Mirostat target entropy (tau).
- mirostat_eta (float, default: 0.1) – Mirostat learning rate (eta).
- seed (int, default: -1) – Seed for the random number generator. Set to -1 for a random seed.
Attributes:
- temperature (float) – Controls the randomness of the generated text. Higher values make the output more random.
- top_k (int) – Limits sampling to the k most probable tokens. Higher values allow more diverse completions.
- top_p (float) – Nucleus sampling threshold: sampling is limited to the smallest set of tokens whose cumulative probability reaches top_p. Higher values allow more diverse completions.
- min_p (float) – Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
- max_tokens (int) – Maximum number of tokens to generate. Set to -1 for no fixed limit.
- stream (bool) – Stream tokens as they are generated instead of returning the full completion at once.
- additional_stop_sequences (List[str]) – Additional stop sequences that end generation. The model's official stop sequences are added automatically.
- tfs_z (float) – Parameter z for tail-free sampling. 1.0 disables it.
- typical_p (float) – Parameter p for locally typical sampling. 1.0 disables it.
- repeat_penalty (float) – Penalty applied to repeated tokens. Values above 1.0 discourage repetition.
- presence_penalty (float) – Penalty applied to tokens that have already appeared, regardless of how often. 0.0 disables it.
- frequency_penalty (float) – Penalty that grows with how often a token has already appeared. 0.0 disables it.
- mirostat_mode (int) – Mirostat sampling mode: 0 disables it, 1 selects Mirostat, 2 selects Mirostat 2.0.
- mirostat_tau (float) – Mirostat target entropy (tau).
- mirostat_eta (float) – Mirostat learning rate (eta).
- seed (int) – Seed for the random number generator. Set to -1 for a random seed.
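The three penalty settings (repeat_penalty, presence_penalty, frequency_penalty) all lower the score of tokens that were generated recently, but in different ways. The sketch below shows one common formulation, modeled on how llama.cpp-style samplers are usually described. The function name `apply_penalties` and the dictionary-based logits are illustrative assumptions, not part of this library.

```python
from collections import Counter
from typing import Dict, List


def apply_penalties(
    logits: Dict[int, float],
    recent_tokens: List[int],
    repeat_penalty: float = 1.1,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
) -> Dict[int, float]:
    """Hypothetical helper: return a copy of logits with penalties applied."""
    counts = Counter(recent_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token not in adjusted:
            continue
        logit = adjusted[token]
        # Repeat penalty: divide positive logits and multiply negative ones,
        # so a penalty above 1.0 always makes the token less likely.
        logit = logit / repeat_penalty if logit > 0 else logit * repeat_penalty
        # Frequency penalty scales with the number of occurrences;
        # presence penalty is applied once if the token occurred at all.
        logit -= count * frequency_penalty + presence_penalty
        adjusted[token] = logit
    return adjusted


penalized = apply_penalties(
    {0: 2.2, 1: -1.0, 2: 0.5},
    recent_tokens=[0, 0, 1],
    repeat_penalty=1.1,
    presence_penalty=0.5,
    frequency_penalty=0.25,
)
```

Token 2 never appeared in `recent_tokens`, so its logit is unchanged, while tokens 0 and 1 are both pushed down.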
Methods:
- save(file_path: str) – Save the settings to a file.
- load_from_file(file_path: str) -> LlamaCppPythonSamplingSettings – Load the settings from a file.
- load_from_dict(settings: dict) -> LlamaCppPythonSamplingSettings – Load the settings from a dictionary.
- as_dict() -> dict – Convert the settings to a dictionary.
Source code in llama_cpp_agent/providers/llama_cpp_python.py
load_from_dict(settings)
staticmethod
Load the settings from a dictionary.
Parameters:
- settings (dict) – The dictionary containing the settings.
Returns:
- LlamaCppPythonSamplingSettings – The loaded settings.
as_dict()
Convert the settings to a dictionary.
Returns:
- dict – The dictionary representation of the settings.
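The additional_stop_sequences setting is described as being combined with the model's official stop sequences. The following sketch shows one way such a merge and the resulting truncation could work. Both helper names are hypothetical, and the de-duplication and ordering are assumptions rather than documented behavior.

```python
from typing import List, Optional


def merge_stop_sequences(
    model_stop_sequences: List[str],
    additional_stop_sequences: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: model stop sequences first, then any new extras."""
    merged = list(model_stop_sequences)
    for sequence in additional_stop_sequences or []:
        if sequence not in merged:
            merged.append(sequence)
    return merged


def truncate_at_stop(text: str, stop_sequences: List[str]) -> str:
    """Hypothetical helper: cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for sequence in stop_sequences:
        if not sequence:
            continue  # an empty string would match at index 0, so skip it
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


stops = merge_stop_sequences(["</s>"], ["Observation:", "</s>"])
answer = truncate_at_stop("Answer: 42\nObservation: x</s>", stops)
```

Here `stops` contains each sequence once, and `answer` ends just before the first stop sequence found in the text.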
TGI - Server
llama_cpp_agent.providers.tgi_server
TGIServerSamplingSettings
dataclass
Bases: LlmSamplingSettings
Sampling settings for the TGI server provider.
Source code in llama_cpp_agent/providers/tgi_server.py
load_from_dict(settings)
staticmethod
Load the settings from a dictionary.
Parameters:
- settings (dict) – The dictionary containing the settings.
Returns:
- TGIServerSamplingSettings – The loaded settings.
as_dict()
Convert the settings to a dictionary.
Returns:
- dict – The dictionary representation of the settings.
vllm - Server
llama_cpp_agent.providers.vllm_server
VLLMServerSamplingSettings
dataclass
Bases: LlmSamplingSettings
Sampling settings for the vLLM server provider.
Source code in llama_cpp_agent/providers/vllm_server.py
load_from_dict(settings)
staticmethod
Load the settings from a dictionary.
Parameters:
- settings (dict) – The dictionary containing the settings.
Returns:
- VLLMServerSamplingSettings – The loaded settings.
as_dict()
Convert the settings to a dictionary.
Returns:
- dict – The dictionary representation of the settings.
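All four settings classes derive from LlmSamplingSettings and expose the same load_from_dict and as_dict pair, so code that handles settings does not need to know which provider is in use. The sketch below illustrates that shared-interface pattern with stand-in classes. Every name in it (`SamplingSettingsBase`, `ServerSettingsSketch`, `PythonSettingsSketch`, `describe`) is hypothetical, and the fields are reduced to a minimum.

```python
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


class SamplingSettingsBase(ABC):
    """Hypothetical stand-in for a common settings base class."""

    @abstractmethod
    def as_dict(self) -> dict:
        """Return the settings as a plain dictionary."""

    @staticmethod
    @abstractmethod
    def load_from_dict(settings: dict) -> "SamplingSettingsBase":
        """Build a settings object from a dictionary."""


@dataclass
class ServerSettingsSketch(SamplingSettingsBase):
    temperature: float = 0.8
    n_predict: int = -1  # server-style name for the token limit

    def as_dict(self) -> dict:
        return asdict(self)

    @staticmethod
    def load_from_dict(settings: dict) -> "ServerSettingsSketch":
        return ServerSettingsSketch(**settings)


@dataclass
class PythonSettingsSketch(SamplingSettingsBase):
    temperature: float = 0.8
    max_tokens: int = -1  # binding-style name for the token limit

    def as_dict(self) -> dict:
        return asdict(self)

    @staticmethod
    def load_from_dict(settings: dict) -> "PythonSettingsSketch":
        return PythonSettingsSketch(**settings)


def describe(settings: SamplingSettingsBase) -> str:
    """Format any provider's settings using only the shared interface."""
    return ", ".join(f"{key}={value}" for key, value in sorted(settings.as_dict().items()))


summary = describe(ServerSettingsSketch(temperature=0.5))
loaded = PythonSettingsSketch.load_from_dict({"max_tokens": 256})
```

The `describe` function relies only on `as_dict`, so it works unchanged for either stand-in class.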