Provider

Llama Cpp Server

llama_cpp_agent.providers.llama_cpp_server

LlamaCppSamplingSettings dataclass

Bases: LlmSamplingSettings

Settings for generating completions using the Llama.cpp server.

Parameters:

  • temperature (float, default: 0.8 ) –

    Controls the randomness of the generated completions. Higher values make the output more random.

  • top_k (int, default: 40 ) –

    Controls the diversity of the top-k sampling. Higher values result in more diverse completions.

  • top_p (float, default: 0.95 ) –

    Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.

  • min_p (float, default: 0.05 ) –

    Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.

  • n_predict (int, default: -1 ) –

    Maximum number of tokens to predict. Set to -1 for no limit.

  • n_keep (int, default: 0 ) –

    Number of prompt tokens to keep when the context size is exceeded. Set to 0 to keep none, or -1 to keep all prompt tokens.

  • stream (bool, default: True ) –

    Enable streaming for long completions.

  • additional_stop_sequences (List[str], default: None ) –

    List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.

  • tfs_z (float, default: 1.0 ) –

    Tail-free sampling parameter z. Set to 1.0 to disable.

  • typical_p (float, default: 1.0 ) –

    Locally typical sampling parameter p. Set to 1.0 to disable.

  • repeat_penalty (float, default: 1.1 ) –

    Penalty for repeating tokens in completions.

  • repeat_last_n (int, default: -1 ) –

    Number of most recent tokens to consider for the repeat penalty. Set to 0 to disable, or -1 to use the context size.

  • penalize_nl (bool, default: False ) –

    Enable penalizing newlines in completions.

  • presence_penalty (float, default: 0.0 ) –

    Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.

  • frequency_penalty (float, default: 0.0 ) –

    Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.

  • penalty_prompt (Union[None, str, List[int]], default: None ) –

    Prompt used for penalty evaluation, given as a string or a list of token IDs. Set to None to use the original prompt.

  • mirostat_mode (int, default: 0 ) –

    Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.

  • mirostat_tau (float, default: 5.0 ) –

    Mirostat target entropy.

  • mirostat_eta (float, default: 0.1 ) –

    Mirostat learning rate.

  • seed (int, default: -1 ) –

    Seed for the random number generator. Set to -1 for a random seed.

  • ignore_eos (bool, default: False ) –

    Ignore end-of-sequence token.
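
To make the interaction of top_k, top_p and min_p concrete, here is a minimal sketch that filters a toy token distribution. The function name filter_candidates, the toy probabilities and the fixed order of the three filters are assumptions made for illustration; the server applies its own sampler chain, which may order or implement these steps differently.

```python
from typing import Dict


def filter_candidates(probs: Dict[str, float], top_k: int = 40,
                      top_p: float = 0.95, min_p: float = 0.05) -> Dict[str, float]:
    """Apply top-k, then top-p, then min-p to a toy distribution and renormalize."""
    # Sort tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # top_k: keep only the k most probable tokens (k <= 0 disables the filter here).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # min_p: drop tokens less likely than min_p times the most likely token.
    threshold = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= threshold]

    # Renormalize the survivors so they sum to 1.
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}


toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.04, "qux": 0.01}
result = filter_candidates(toy, top_k=4, top_p=0.9, min_p=0.4)
print(result)
```

With these toy values, top_k removes the least likely token, top_p stops once the running total reaches 0.9, and the raised min_p then removes "an" because it falls below 40% of the top token's probability.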

Attributes:

  • temperature (float) –

    Controls the randomness of the generated completions. Higher values make the output more random.

  • top_k (int) –

    Controls the diversity of the top-k sampling. Higher values result in more diverse completions.

  • top_p (float) –

    Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.

  • min_p (float) –

    Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.

  • n_predict (int) –

    Maximum number of tokens to predict. Set to -1 for no limit.

  • n_keep (int) –

    Number of prompt tokens to keep when the context size is exceeded. Set to 0 to keep none, or -1 to keep all prompt tokens.

  • stream (bool) –

    Enable streaming for long completions.

  • additional_stop_sequences (List[str]) –

    List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.

  • tfs_z (float) –

    Tail-free sampling parameter z. Set to 1.0 to disable.

  • typical_p (float) –

    Locally typical sampling parameter p. Set to 1.0 to disable.

  • repeat_penalty (float) –

    Penalty for repeating tokens in completions.

  • repeat_last_n (int) –

    Number of most recent tokens to consider for the repeat penalty. Set to 0 to disable, or -1 to use the context size.

  • penalize_nl (bool) –

    Enable penalizing newlines in completions.

  • presence_penalty (float) –

    Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.

  • frequency_penalty (float) –

    Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.

  • penalty_prompt (Union[None, str, List[int]]) –

    Prompt used for penalty evaluation, given as a string or a list of token IDs. Set to None to use the original prompt.

  • mirostat_mode (int) –

    Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.

  • mirostat_tau (float) –

    Mirostat target entropy.

  • mirostat_eta (float) –

    Mirostat learning rate.

  • seed (int) –

    Seed for the random number generator. Set to -1 for a random seed.

  • ignore_eos (bool) –

    Ignore end-of-sequence token.

Methods:

  • save(file_path: str) –

    Save the settings to a file.

  • load_from_file(file_path: str) -> LlamaCppSamplingSettings –

    Load the settings from a file.

  • load_from_dict(settings: dict) -> LlamaCppSamplingSettings –

    Load the settings from a dictionary.

  • as_dict() -> dict –

    Convert the settings to a dictionary.

Source code in llama_cpp_agent/providers/llama_cpp_server.py
@dataclass
class LlamaCppSamplingSettings(LlmSamplingSettings):
    """
    Settings for generating completions using the Llama.cpp server.

    Args:
        temperature (float): Controls the randomness of the generated completions. Higher values make the output more random.
        top_k (int): Controls the diversity of the top-k sampling. Higher values result in more diverse completions.
        top_p (float): Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.
        min_p (float): Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
        n_predict (int): Maximum number of tokens to predict. Set to -1 for no limit.
        n_keep (int): Number of prompt tokens to keep when the context size is exceeded. Set to 0 to keep none, or -1 to keep all prompt tokens.
        stream (bool): Enable streaming for long completions.
        additional_stop_sequences (List[str]): List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.
        tfs_z (float): Tail-free sampling parameter z. Set to 1.0 to disable.
        typical_p (float): Locally typical sampling parameter p. Set to 1.0 to disable.
        repeat_penalty (float): Penalty for repeating tokens in completions.
        repeat_last_n (int): Number of most recent tokens to consider for the repeat penalty. Set to 0 to disable, or -1 to use the context size.
        penalize_nl (bool): Enable penalizing newlines in completions.
        presence_penalty (float): Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.
        frequency_penalty (float): Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.
        penalty_prompt (Union[None, str, List[int]]): Prompt used for penalty evaluation, given as a string or a list of token IDs. Set to None to use the original prompt.
        mirostat_mode (int): Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.
        mirostat_tau (float): Mirostat target entropy.
        mirostat_eta (float): Mirostat learning rate.
        seed (int): Seed for the random number generator. Set to -1 for a random seed.
        ignore_eos (bool): Ignore end-of-sequence token.

    Attributes:
        temperature (float): Controls the randomness of the generated completions. Higher values make the output more random.
        top_k (int): Controls the diversity of the top-k sampling. Higher values result in more diverse completions.
        top_p (float): Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.
        min_p (float): Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
        n_predict (int): Maximum number of tokens to predict. Set to -1 for no limit.
        n_keep (int): Number of prompt tokens to keep when the context size is exceeded. Set to 0 to keep none, or -1 to keep all prompt tokens.
        stream (bool): Enable streaming for long completions.
        additional_stop_sequences (List[str]): List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.
        tfs_z (float): Tail-free sampling parameter z. Set to 1.0 to disable.
        typical_p (float): Locally typical sampling parameter p. Set to 1.0 to disable.
        repeat_penalty (float): Penalty for repeating tokens in completions.
        repeat_last_n (int): Number of most recent tokens to consider for the repeat penalty. Set to 0 to disable, or -1 to use the context size.
        penalize_nl (bool): Enable penalizing newlines in completions.
        presence_penalty (float): Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.
        frequency_penalty (float): Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.
        penalty_prompt (Union[None, str, List[int]]): Prompt used for penalty evaluation, given as a string or a list of token IDs. Set to None to use the original prompt.
        mirostat_mode (int): Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.
        mirostat_tau (float): Mirostat target entropy.
        mirostat_eta (float): Mirostat learning rate.
        seed (int): Seed for the random number generator. Set to -1 for a random seed.
        ignore_eos (bool): Ignore end-of-sequence token.
    Methods:
        save(file_path: str): Save the settings to a file.
        load_from_file(file_path: str) -> LlamaCppSamplingSettings: Load the settings from a file.
        load_from_dict(settings: dict) -> LlamaCppSamplingSettings: Load the settings from a dictionary.
        as_dict() -> dict: Convert the settings to a dictionary.

    """

    temperature: float = 0.8
    top_k: int = 40
    top_p: float = 0.95
    min_p: float = 0.05
    n_predict: int = -1
    n_keep: int = 0
    stream: bool = True
    additional_stop_sequences: List[str] = None
    tfs_z: float = 1.0
    typical_p: float = 1.0
    repeat_penalty: float = 1.1
    repeat_last_n: int = -1
    penalize_nl: bool = False
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    penalty_prompt: Union[None, str, List[int]] = None
    mirostat_mode: int = 0
    mirostat_tau: float = 5.0
    mirostat_eta: float = 0.1
    cache_prompt: bool = True
    seed: int = -1
    ignore_eos: bool = False
    samplers: List[str] = None

    def get_provider_identifier(self) -> LlmProviderId:
        return LlmProviderId.llama_cpp_server

    def get_additional_stop_sequences(self) -> List[str]:
        if self.additional_stop_sequences is None:
            self.additional_stop_sequences = []
        return self.additional_stop_sequences

    def add_additional_stop_sequences(self, sequences: List[str]):
        if self.additional_stop_sequences is None:
            self.additional_stop_sequences = []
        self.additional_stop_sequences.extend(sequences)

    def is_streaming(self):
        return self.stream

    @staticmethod
    def load_from_dict(settings: dict) -> "LlamaCppSamplingSettings":
        """
        Load the settings from a dictionary.

        Args:
            settings (dict): The dictionary containing the settings.

        Returns:
            LlamaCppSamplingSettings: The loaded settings.
        """
        return LlamaCppSamplingSettings(**settings)

    def as_dict(self) -> dict:
        """
        Convert the settings to a dictionary.

        Returns:
            dict: The dictionary representation of the settings.
        """
        return self.__dict__
load_from_dict(settings) staticmethod

Load the settings from a dictionary.

Parameters:

  • settings (dict) –

    The dictionary containing the settings.

Returns:

  • LlamaCppSamplingSettings –

    The loaded settings.

Source code in llama_cpp_agent/providers/llama_cpp_server.py
@staticmethod
def load_from_dict(settings: dict) -> "LlamaCppSamplingSettings":
    """
    Load the settings from a dictionary.

    Args:
        settings (dict): The dictionary containing the settings.

    Returns:
        LlamaCppSamplingSettings: The loaded settings.
    """
    return LlamaCppSamplingSettings(**settings)
as_dict()

Convert the settings to a dictionary.

Returns:

  • dict ( dict ) –

    The dictionary representation of the settings.

Source code in llama_cpp_agent/providers/llama_cpp_server.py
def as_dict(self) -> dict:
    """
    Convert the settings to a dictionary.

    Returns:
        dict: The dictionary representation of the settings.
    """
    return self.__dict__
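
Because as_dict() returns self.__dict__, the caller receives the instance's own attribute dictionary rather than a copy. The sketch below illustrates that behavior with a hypothetical stand-in class, SamplingSettingsSketch, which copies only a few fields from the listing above so that it runs without the library installed. It contrasts the live dictionary with the independent copy produced by the standard library's dataclasses.asdict.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class SamplingSettingsSketch:
    """Hypothetical stand-in with a few fields from LlamaCppSamplingSettings."""
    temperature: float = 0.8
    top_k: int = 40
    additional_stop_sequences: Optional[List[str]] = None

    @staticmethod
    def load_from_dict(settings: dict) -> "SamplingSettingsSketch":
        return SamplingSettingsSketch(**settings)

    def as_dict(self) -> dict:
        # Same approach as the listing above: return the instance dictionary itself.
        return self.__dict__


settings = SamplingSettingsSketch.load_from_dict({"temperature": 0.2, "top_k": 10})

live = settings.as_dict()
live["temperature"] = 1.5      # writes through to the instance
print(settings.temperature)

snapshot = asdict(settings)    # dataclasses.asdict builds an independent copy
snapshot["temperature"] = 0.0  # the instance is unaffected
print(settings.temperature)
```

If the dictionary is going to be modified before being sent or saved, taking a copy first avoids changing the settings object by accident.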

Llama Cpp Python

llama_cpp_agent.providers.llama_cpp_python

LlamaCppPythonSamplingSettings dataclass

Bases: LlmSamplingSettings

Settings for generating completions using llama-cpp-python.

Parameters:

  • temperature (float, default: 0.8 ) –

    Controls the randomness of the generated completions. Higher values make the output more random.

  • top_k (int, default: 40 ) –

    Controls the diversity of the top-k sampling. Higher values result in more diverse completions.

  • top_p (float, default: 0.95 ) –

    Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.

  • min_p (float, default: 0.05 ) –

    Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.

  • max_tokens (int, default: -1 ) –

    Maximum number of tokens to generate.

  • stream (bool, default: False ) –

    Enable streaming for long completions.

  • additional_stop_sequences (List[str], default: None ) –

    List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.

  • tfs_z (float, default: 1.0 ) –

    Tail-free sampling parameter z. Set to 1.0 to disable.

  • typical_p (float, default: 1.0 ) –

    Locally typical sampling parameter p. Set to 1.0 to disable.

  • repeat_penalty (float, default: 1.1 ) –

    Penalty for repeating tokens in completions.

  • presence_penalty (float, default: 0.0 ) –

    Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.

  • frequency_penalty (float, default: 0.0 ) –

    Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.

  • mirostat_mode (int, default: 0 ) –

    Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.

  • mirostat_tau (float, default: 5.0 ) –

    Mirostat target entropy.

  • mirostat_eta (float, default: 0.1 ) –

    Mirostat learning rate.

  • seed (int, default: -1 ) –

    Seed for the random number generator. Set to -1 for a random seed.

Attributes:

  • temperature (float) –

    Controls the randomness of the generated completions. Higher values make the output more random.

  • top_k (int) –

    Controls the diversity of the top-k sampling. Higher values result in more diverse completions.

  • top_p (float) –

    Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.

  • min_p (float) –

    Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.

  • max_tokens (int) –

    Maximum number of tokens to generate.

  • stream (bool) –

    Enable streaming for long completions.

  • additional_stop_sequences (List[str]) –

    List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.

  • tfs_z (float) –

    Tail-free sampling parameter z. Set to 1.0 to disable.

  • typical_p (float) –

    Locally typical sampling parameter p. Set to 1.0 to disable.

  • repeat_penalty (float) –

    Penalty for repeating tokens in completions.

  • presence_penalty (float) –

    Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.

  • frequency_penalty (float) –

    Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.

  • mirostat_mode (int) –

    Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.

  • mirostat_tau (float) –

    Mirostat target entropy.

  • mirostat_eta (float) –

    Mirostat learning rate.

  • seed (int) –

    Seed for the random number generator. Set to -1 for a random seed.

Methods:

  • save(file_path: str) –

    Save the settings to a file.

  • load_from_file(file_path: str) -> LlamaCppPythonSamplingSettings –

    Load the settings from a file.

  • load_from_dict(settings: dict) -> LlamaCppPythonSamplingSettings –

    Load the settings from a dictionary.

  • as_dict() -> dict –

    Convert the settings to a dictionary.

Source code in llama_cpp_agent/providers/llama_cpp_python.py
@dataclass
class LlamaCppPythonSamplingSettings(LlmSamplingSettings):
    """
    Settings for generating completions using llama-cpp-python.

    Args:
        temperature (float): Controls the randomness of the generated completions. Higher values make the output more random.
        top_k (int): Controls the diversity of the top-k sampling. Higher values result in more diverse completions.
        top_p (float): Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.
        min_p (float): Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
        max_tokens (int): Maximum number of tokens to generate.
        stream (bool): Enable streaming for long completions.
        additional_stop_sequences (List[str]): List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.
        tfs_z (float): Tail-free sampling parameter z. Set to 1.0 to disable.
        typical_p (float): Locally typical sampling parameter p. Set to 1.0 to disable.
        repeat_penalty (float): Penalty for repeating tokens in completions.
        presence_penalty (float): Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.
        frequency_penalty (float): Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.
        mirostat_mode (int): Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.
        mirostat_tau (float): Mirostat target entropy.
        mirostat_eta (float): Mirostat learning rate.
        seed (int): Seed for the random number generator. Set to -1 for a random seed.


    Attributes:
        temperature (float): Controls the randomness of the generated completions. Higher values make the output more random.
        top_k (int): Controls the diversity of the top-k sampling. Higher values result in more diverse completions.
        top_p (float): Controls the diversity of the nucleus sampling. Higher values result in more diverse completions.
        min_p (float): Minimum probability for a token to be considered, relative to the probability of the most likely token. Higher values result in more focused completions.
        max_tokens (int): Maximum number of tokens to generate.
        stream (bool): Enable streaming for long completions.
        additional_stop_sequences (List[str]): List of stop sequences to finish completion generation. The official stop sequences of the model get added automatically.
        tfs_z (float): Tail-free sampling parameter z. Set to 1.0 to disable.
        typical_p (float): Locally typical sampling parameter p. Set to 1.0 to disable.
        repeat_penalty (float): Penalty for repeating tokens in completions.
        presence_penalty (float): Penalty applied to tokens that have already appeared in the text. Set to 0.0 to disable.
        frequency_penalty (float): Penalty that scales with how often a token has already appeared. Set to 0.0 to disable.
        mirostat_mode (int): Mirostat sampling mode: 0 is disabled, 1 is Mirostat, 2 is Mirostat 2.0.
        mirostat_tau (float): Mirostat target entropy.
        mirostat_eta (float): Mirostat learning rate.
        seed (int): Seed for the random number generator. Set to -1 for a random seed.
    Methods:
        save(file_path: str): Save the settings to a file.
        load_from_file(file_path: str) -> LlamaCppPythonSamplingSettings: Load the settings from a file.
        load_from_dict(settings: dict) -> LlamaCppPythonSamplingSettings: Load the settings from a dictionary.
        as_dict() -> dict: Convert the settings to a dictionary.

    """

    temperature: float = 0.8
    top_k: int = 40
    top_p: float = 0.95
    min_p: float = 0.05
    max_tokens: int = -1
    stream: bool = False
    additional_stop_sequences: List[str] = None
    tfs_z: float = 1.0
    typical_p: float = 1.0
    repeat_penalty: float = 1.1
    presence_penalty: float = 0.0
    frequency_penalty: float = 0.0
    mirostat_mode: int = 0
    mirostat_tau: float = 5.0
    mirostat_eta: float = 0.1
    seed: int = -1

    def get_provider_identifier(self) -> LlmProviderId:
        return LlmProviderId.llama_cpp_server

    def get_additional_stop_sequences(self) -> List[str]:
        if self.additional_stop_sequences is None:
            self.additional_stop_sequences = []
        return self.additional_stop_sequences

    def add_additional_stop_sequences(self, sequences: List[str]):
        if self.additional_stop_sequences is None:
            self.additional_stop_sequences = []
        self.additional_stop_sequences.extend(sequences)

    def is_streaming(self):
        return self.stream

    @staticmethod
    def load_from_dict(settings: dict) -> "LlamaCppPythonSamplingSettings":
        """
        Load the settings from a dictionary.

        Args:
            settings (dict): The dictionary containing the settings.

        Returns:
            LlamaCppPythonSamplingSettings: The loaded settings.
        """
        return LlamaCppPythonSamplingSettings(**settings)

    def as_dict(self) -> dict:
        """
        Convert the settings to a dictionary.

        Returns:
            dict: The dictionary representation of the settings.
        """
        return self.__dict__
load_from_dict(settings) staticmethod

Load the settings from a dictionary.

Parameters:

  • settings (dict) –

    The dictionary containing the settings.

Returns:

  • LlamaCppPythonSamplingSettings –

    The loaded settings.

Source code in llama_cpp_agent/providers/llama_cpp_python.py
@staticmethod
def load_from_dict(settings: dict) -> "LlamaCppPythonSamplingSettings":
    """
    Load the settings from a dictionary.

    Args:
        settings (dict): The dictionary containing the settings.

    Returns:
        LlamaCppPythonSamplingSettings: The loaded settings.
    """
    return LlamaCppPythonSamplingSettings(**settings)
as_dict()

Convert the settings to a dictionary.

Returns:

  • dict ( dict ) –

    The dictionary representation of the settings.

Source code in llama_cpp_agent/providers/llama_cpp_python.py
def as_dict(self) -> dict:
    """
    Convert the settings to a dictionary.

    Returns:
        dict: The dictionary representation of the settings.
    """
    return self.__dict__
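
Since load_from_dict unpacks the dictionary straight into the constructor, any key that is not a declared field raises a TypeError. This matters when a dictionary written for one provider is reused for another, for example one carrying n_predict where this class expects max_tokens. The following sketch uses a hypothetical stand-in class, PythonSettingsSketch, and a hypothetical helper, load_known_fields, to show one way to drop unknown keys first; neither is part of the library.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class PythonSettingsSketch:
    """Hypothetical stand-in with a few fields from LlamaCppPythonSamplingSettings."""
    temperature: float = 0.8
    max_tokens: int = -1
    stream: bool = False


def load_known_fields(settings: Dict[str, Any]) -> PythonSettingsSketch:
    """Build the settings object from only the keys that match declared fields."""
    known = {f.name for f in fields(PythonSettingsSketch)}
    return PythonSettingsSketch(**{k: v for k, v in settings.items() if k in known})


# n_predict is not a field of this class.
raw = {"temperature": 0.3, "max_tokens": 256, "n_predict": 256}

try:
    PythonSettingsSketch(**raw)  # same unpacking that load_from_dict performs
except TypeError as exc:
    print("rejected:", exc)

loaded = load_known_fields(raw)
print(loaded)
```

Silently dropping keys can hide typos, so a stricter variant might log or raise on the ignored names instead.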

TGI - Server

llama_cpp_agent.providers.tgi_server

TGIServerSamplingSettings dataclass

Bases: LlmSamplingSettings

Sampling settings for a Text Generation Inference (TGI) server.

Source code in llama_cpp_agent/providers/tgi_server.py
@dataclass
class TGIServerSamplingSettings(LlmSamplingSettings):
    """
    Sampling settings for a Text Generation Inference (TGI) server.
    """

    best_of: Optional[int] = field(default=None, metadata={"minimum": 0})
    decoder_input_details: bool = False
    details: bool = True
    do_sample: bool = False
    frequency_penalty: Optional[float] = field(
        default=None, metadata={"exclusiveMinimum": -2}
    )
    grammar: Optional[dict] = None
    max_new_tokens: Optional[int] = field(default=None, metadata={"minimum": 0})
    repetition_penalty: Optional[float] = field(
        default=None, metadata={"exclusiveMinimum": 0}
    )
    return_full_text: Optional[bool] = field(default=None)
    seed: Optional[int] = field(default=None, metadata={"minimum": 0})
    stop: Optional[List[str]] = field(default_factory=list)
    temperature: Optional[float] = field(default=None, metadata={"exclusiveMinimum": 0})
    top_k: Optional[int] = field(default=None, metadata={"exclusiveMinimum": 0})
    top_n_tokens: Optional[int] = field(
        default=None, metadata={"minimum": 0, "exclusiveMinimum": 0}
    )
    top_p: Optional[float] = field(
        default=None, metadata={"maximum": 1, "exclusiveMinimum": 0}
    )
    truncate: Optional[int] = field(default=None, metadata={"minimum": 0})
    typical_p: Optional[float] = field(
        default=None, metadata={"maximum": 1, "exclusiveMinimum": 0}
    )
    watermark: bool = False
    stream: bool = False

    def get_provider_identifier(self) -> LlmProviderId:
        return LlmProviderId.tgi_server

    def get_additional_stop_sequences(self) -> Union[List[str], None]:
        return self.stop

    def add_additional_stop_sequences(self, sequences: List[str]):
        self.stop.extend(sequences)

    def is_streaming(self):
        return self.stream

    @staticmethod
    def load_from_dict(settings: dict) -> "TGIServerSamplingSettings":
        """
        Load the settings from a dictionary.

        Args:
            settings (dict): The dictionary containing the settings.

        Returns:
            TGIServerSamplingSettings: The loaded settings.
        """
        return TGIServerSamplingSettings(**settings)

    def as_dict(self) -> dict:
        """
        Convert the settings to a dictionary.

        Returns:
            dict: The dictionary representation of the settings.
        """
        return self.__dict__
load_from_dict(settings) staticmethod

Load the settings from a dictionary.

Parameters:

  • settings (dict) –

    The dictionary containing the settings.

Returns:

  • TGIServerSamplingSettings –

    The loaded settings.

Source code in llama_cpp_agent/providers/tgi_server.py
@staticmethod
def load_from_dict(settings: dict) -> "TGIServerSamplingSettings":
    """
    Load the settings from a dictionary.

    Args:
        settings (dict): The dictionary containing the settings.

    Returns:
        TGIServerSamplingSettings: The loaded settings.
    """
    return TGIServerSamplingSettings(**settings)
as_dict()

Convert the settings to a dictionary.

Returns:

  • dict ( dict ) –

    The dictionary representation of the settings.

Source code in llama_cpp_agent/providers/tgi_server.py
def as_dict(self) -> dict:
    """
    Convert the settings to a dictionary.

    Returns:
        dict: The dictionary representation of the settings.
    """
    return self.__dict__
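
The metadata dictionaries on the fields above (minimum, exclusiveMinimum, maximum) describe allowed ranges, but a standard dataclass does not check them on its own. The sketch below shows one way such bounds could be read and checked. TGISettingsSketch and find_violations are hypothetical names used only for illustration, and whether the library validates these bounds elsewhere is not shown in this page.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TGISettingsSketch:
    """Hypothetical stand-in with a few fields copied from TGIServerSamplingSettings."""
    temperature: Optional[float] = field(default=None, metadata={"exclusiveMinimum": 0})
    top_p: Optional[float] = field(
        default=None, metadata={"maximum": 1, "exclusiveMinimum": 0}
    )
    max_new_tokens: Optional[int] = field(default=None, metadata={"minimum": 0})


def find_violations(settings) -> List[str]:
    """Return one message per field whose value breaks its metadata bounds."""
    problems = []
    for f in fields(settings):
        value = getattr(settings, f.name)
        if value is None:  # unset fields have nothing to check
            continue
        meta = f.metadata
        if "minimum" in meta and value < meta["minimum"]:
            problems.append(f"{f.name} must be >= {meta['minimum']}")
        if "exclusiveMinimum" in meta and value <= meta["exclusiveMinimum"]:
            problems.append(f"{f.name} must be > {meta['exclusiveMinimum']}")
        if "maximum" in meta and value > meta["maximum"]:
            problems.append(f"{f.name} must be <= {meta['maximum']}")
    return problems


# The dataclass itself accepts out-of-range values without complaint,
# so the bounds only take effect when something reads the metadata.
bad = TGISettingsSketch(temperature=0.0, top_p=1.5, max_new_tokens=128)
violations = find_violations(bad)
print(violations)
```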

vllm - Server

llama_cpp_agent.providers.vllm_server

VLLMServerSamplingSettings dataclass

Bases: LlmSamplingSettings

Sampling settings for a vLLM server.

Source code in llama_cpp_agent/providers/vllm_server.py
@dataclass
class VLLMServerSamplingSettings(LlmSamplingSettings):
    """
    Sampling settings for a vLLM server.
    """

    best_of: Optional[int] = None
    use_beam_search: bool = False
    top_k: int = -1
    top_p: float = 1
    min_p: float = 0.0
    temperature: float = 0.7
    max_tokens: int = 16
    repetition_penalty: Optional[float] = 1.0
    length_penalty: Optional[float] = 1.0
    early_stopping: Optional[bool] = False
    ignore_eos: Optional[bool] = False
    min_tokens: Optional[int] = 0
    stop_token_ids: Optional[List[int]] = field(default_factory=list)
    skip_special_tokens: Optional[bool] = True
    spaces_between_special_tokens: Optional[bool] = True
    stream: bool = False

    def get_provider_identifier(self) -> LlmProviderId:
        return LlmProviderId.vllm_server

    def get_additional_stop_sequences(self) -> Union[List[str], None]:
        return None

    def add_additional_stop_sequences(self, sequences: List[str]):
        pass

    def is_streaming(self):
        return self.stream

    @staticmethod
    def load_from_dict(settings: dict) -> "VLLMServerSamplingSettings":
        """
        Load the settings from a dictionary.

        Args:
            settings (dict): The dictionary containing the settings.

        Returns:
            VLLMServerSamplingSettings: The loaded settings.
        """
        return VLLMServerSamplingSettings(**settings)

    def as_dict(self) -> dict:
        """
        Convert the settings to a dictionary.

        Returns:
            dict: The dictionary representation of the settings.
        """
        return self.__dict__
load_from_dict(settings) staticmethod

Load the settings from a dictionary.

Parameters:

  • settings (dict) –

    The dictionary containing the settings.

Returns:

  • VLLMServerSamplingSettings –

    The loaded settings.

Source code in llama_cpp_agent/providers/vllm_server.py
@staticmethod
def load_from_dict(settings: dict) -> "VLLMServerSamplingSettings":
    """
    Load the settings from a dictionary.

    Args:
        settings (dict): The dictionary containing the settings.

    Returns:
        VLLMServerSamplingSettings: The loaded settings.
    """
    return VLLMServerSamplingSettings(**settings)
as_dict()

Convert the settings to a dictionary.

Returns:

  • dict ( dict ) –

    The dictionary representation of the settings.

Source code in llama_cpp_agent/providers/vllm_server.py
def as_dict(self) -> dict:
    """
    Convert the settings to a dictionary.

    Returns:
        dict: The dictionary representation of the settings.
    """
    return self.__dict__
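
The stop_token_ids field above is declared with field(default_factory=list) rather than a bare list default. A short sketch of why that matters, using a hypothetical stand-in class, VLLMSettingsSketch, with two fields copied from the listing: each instance gets its own list, and the dataclass decorator rejects a bare mutable default outright.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VLLMSettingsSketch:
    """Hypothetical stand-in with two fields from VLLMServerSamplingSettings."""
    max_tokens: int = 16
    stop_token_ids: List[int] = field(default_factory=list)


first = VLLMSettingsSketch()
second = VLLMSettingsSketch()
first.stop_token_ids.append(2)

# Each instance received its own list from the factory.
print(first.stop_token_ids, second.stop_token_ids)

# A bare mutable default is rejected when the class is defined.
try:
    @dataclass
    class Broken:
        stop_token_ids: List[int] = []
except ValueError as exc:
    print("rejected:", exc)
```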