llama-cpp-python Usage
Every Functionary model release comes in GGUF file format, so Functionary can be loaded and run on a much wider variety of hardware using llama.cpp. Currently, we provide the following quantizations:
- 4-bit
- 8-bit
- FP16 (except for functionary-medium-v2.* due to file size)
Make sure that llama-cpp-python is successfully installed on your system (e.g., via `pip install llama-cpp-python`). The following is the sample code:
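Below is a minimal sketch of such a script. The repo id, GGUF filename, maximum token count, and the `get_weather` tool schema are placeholder assumptions, stopping on `eos_token_id` alone is a simplification (a release may define its own stop tokens), and a recent `transformers` version that supports the `tools` argument of `apply_chat_template` is assumed:

```python
# A minimal sketch. The repo id, GGUF filename, and get_weather tool schema
# below are placeholder assumptions, not the official example.
from llama_cpp import Llama
from transformers import AutoTokenizer

# Use the HuggingFace tokenizer, not llama.cpp's (see the note below).
tokenizer = AutoTokenizer.from_pretrained("meetkai/functionary-small-v2.4-GGUF")

# Point model_path at a GGUF file downloaded from the release (placeholder path).
llm = Llama(model_path="./functionary-small-v2.4.Q4_0.gguf", n_ctx=4096)

tools = [  # hypothetical tool schema for illustration
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What is the weather like in Hanoi?"}]

# Render the prompt with the model's chat template, then convert it to
# token ids with the HuggingFace tokenizer and feed those ids to llama.cpp.
prompt = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, tokenize=False
)
token_ids = tokenizer.encode(prompt)

# Llama.generate is an endless token generator, so we stop manually.
# Stopping on eos_token_id alone is a simplifying assumption.
gen_tokens = []
for i, token_id in enumerate(llm.generate(token_ids, temp=0.0)):
    if token_id == tokenizer.eos_token_id or i >= 256:
        break
    gen_tokens.append(token_id)

print(tokenizer.decode(gen_tokens))
```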
Running this prints the decoded model response; for a tool-use prompt like the example above, this should be a generated call to the provided function.
Note: use the tokenizer from HuggingFace to convert the prompt into token ids instead of the tokenizer from llama-cpp-python, because we found that the llama.cpp tokenizer does not give the same result as the HuggingFace one. The likely reason is that we added new tokens to the tokenizer during training, and llama.cpp does not handle this correctly.
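As a quick illustration (a hypothetical check, reusing the `llm`, `tokenizer`, and `prompt` objects from the sample above), the two tokenizations can be compared directly:

```python
# Hypothetical sanity check, reusing `llm`, `tokenizer`, and `prompt` above.
hf_ids = tokenizer.encode(prompt)
# llama.cpp tokenizes raw bytes; special=True allows special tokens to match.
lcpp_ids = llm.tokenize(prompt.encode("utf-8"), special=True)
# The lists can differ around tokens added to the tokenizer during training.
print("tokenizers agree:", hf_ids == lcpp_ids)
```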