llama-cpp-python Usage


Every Functionary model release also comes in GGUF format. Thus, Functionary can be loaded and run on a much wider variety of hardware using llama.cpp. Currently, we provide the following quantizations (see the download sketch after the list):

  • 4-bit
  • 8-bit
  • FP16 (except for functionary-medium-v2.* due to file size)
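
For example, a quantized GGUF file can be fetched from the Hugging Face Hub with `hf_hub_download`. The repo id and filename below are illustrative assumptions; check the model's Hub page for the exact file names:

```python
from huggingface_hub import hf_hub_download

# Download one quantized GGUF file (repo id and filename are illustrative).
model_path = hf_hub_download(
    repo_id="meetkai/functionary-small-v2.2-GGUF",
    filename="functionary-small-v2.2.q4_0.gguf",
)
print(model_path)  # local cache path to pass to llama-cpp-python
```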

Setup

Make sure that llama-cpp-python is successfully installed on your system (for example, via `pip install llama-cpp-python`). The following is a sample of loading a Functionary GGUF model and generating a response:

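This is a minimal sketch rather than the canonical snippet: the GGUF path, the `meetkai/functionary-small-v2.2-GGUF` tokenizer repo, and the `get_current_weather` tool are assumptions for illustration, and it assumes a transformers version whose `apply_chat_template` accepts a `tools` argument.

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# Load a locally downloaded Functionary GGUF file (path is illustrative).
llm = Llama(
    model_path="functionary-small-v2.2.q4_0.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # offload all layers to GPU if available
)

# Build the prompt with the Hugging Face tokenizer (see the note below).
tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-small-v2.2-GGUF")

messages = [{"role": "user", "content": "What is the weather in Istanbul?"}]
tools = [  # example tool definition, for illustration only
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"],
            },
        },
    }
]

# Render the chat template directly to token ids with the HF tokenizer.
token_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True
)

# Generate greedily, breaking on EOS (a simplification; Functionary also
# defines its own stop tokens), then decode with the same HF tokenizer.
output_ids = []
for token in llm.generate(token_ids, temp=0.0):
    if token == llm.token_eos():
        break
    output_ids.append(token)

print(tokenizer.decode(output_ids))
```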
The output would be along these lines:
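The text below is illustrative only; with the sketch above, a Functionary v2 model typically answers with a tool call in its `<|from|>` / `<|recipient|>` / `<|content|>` format, and the exact text depends on the model version and sampling:

```
<|from|>assistant
<|recipient|>get_current_weather
<|content|>{"location": "Istanbul"}
```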
Note: we should use the tokenizer from Hugging Face to convert the prompt into token ids instead of the tokenizer from llama.cpp, because we found that llama.cpp's tokenizer does not give the same results as Hugging Face's. The likely reason is that we added new tokens to the tokenizer during training, and llama.cpp does not handle this successfully.
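
To see the mismatch concretely, the two tokenizers can be compared directly. This sketch continues from the sample above (`llm` and `tokenizer` already created); the prompt string is illustrative:

```python
# Compare llama.cpp's built-in tokenizer with the Hugging Face one.
prompt = "<|from|>user\n<|content|>How are you?"
hf_ids = tokenizer.encode(prompt, add_special_tokens=False)
cpp_ids = llm.tokenize(prompt.encode("utf-8"), add_bos=False)

# Added special tokens such as <|from|> may be split differently,
# so the two id sequences may not match.
print(hf_ids == cpp_ids)
```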
