Functionary
Functionary is a language model that can interpret and execute functions/plugins.
The model determines when to execute functions, whether in parallel or serially, and can understand their outputs. It only triggers functions as needed. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls.
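For example, a function can be exposed to the model as an OpenAI-style tool whose parameters are described by a JSON Schema object. The function name and fields below are purely illustrative and are not shipped with Functionary:

```python
# An illustrative OpenAI-style tool definition; "get_current_weather" and its
# parameters are hypothetical examples, not part of Functionary itself.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {  # the parameters field is a JSON Schema object
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
```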
- Intelligent parallel function calls / tool use
- Analyzes function/tool outputs and provides responses grounded in those outputs
- Decides when not to call functions/tools and responds with a normal chat message instead
- Grammar sampling that ensures 100% accuracy in generating function and argument names
- Automatic execution of Python functions called by Functionary
We evaluated Functionary's function-call prediction capabilities on MeetKai's in-house benchmark dataset. The accuracy metric measures the overall correctness of predicted function calls, including function name prediction and argument extraction.
Dataset | Model Name | Function Calling Accuracy (Name & Arguments) |
---|---|---|
In-house data | MeetKai-functionary-small-v2.2 | 0.546 |
In-house data | MeetKai-functionary-medium-v2.2 | 0.664 |
In-house data | OpenAI-gpt-3.5-turbo-1106 | 0.531 |
In-house data | OpenAI-gpt-4-1106-preview | 0.737 |
Make sure you have PyTorch installed. Then to install the required dependencies, run:
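Assuming the repository follows the usual requirements.txt convention (an assumption, since the file isn't shown here):

```sh
pip install -r requirements.txt
```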
Now you can start a blazing fast vLLM server:
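A minimal launch command might look like the sketch below; the `server_vllm.py` entry point and the small-model path are assumptions, while the `--model` and `--host` flags are documented further down:

```sh
# Sketch only: the script name and model path are assumptions; flags are described below.
python3 server_vllm.py --model "meetkai/functionary-small-v2.2" --host 0.0.0.0 --port 8000
```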
Since the server is running on vLLM, you can pass in various arguments for the vLLM engine; the full list is available in the vLLM documentation. The most important arguments are listed below, and an example launch command follows these lists:
- --model : the name of the model (e.g. "meetkai/functionary-medium-v2.2")
- --tensor-parallel-size (default = 1) : the number of GPUs to use for the server. Set this to a value greater than 1 for larger models such as "meetkai/functionary-medium-v2.2", depending on the type of GPU that you are using
- --max-model-len : the context window (in tokens) to set for your server
We also implement a few options of our own, independent of vLLM:
- --host (default="0.0.0.0") : the host IP address for the server
- --port (default="8000") : the port to be exposed for the server
- --grammar_sampling (default=True) : when enabled, the server uses our implementation of grammar sampling that ensures 100% accuracy for function and argument names. It works by constraining function and argument names to only the list of functions provided in the API call or no function call at all.
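Putting these arguments together, a multi-GPU launch for the medium model might look like the following sketch (again assuming `server_vllm.py` is the entry point; the 8k context length matches the model table below):

```sh
# Sketch only: the script name is an assumption; the flags are described above.
python3 server_vllm.py \
  --model "meetkai/functionary-medium-v2.2" \
  --tensor-parallel-size 2 \
  --max-model-len 8192 \
  --host 0.0.0.0 \
  --port 8000
# Grammar sampling (--grammar_sampling) is enabled by default, so no extra flag is needed here.
```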
If you're having trouble with dependencies and you have nvidia-container-toolkit installed, you can start your environment like this:
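One common approach is to run a CUDA-enabled PyTorch container with the GPUs exposed; this is only a sketch, and the image tag and mount path below are examples rather than the project's prescribed setup:

```sh
# Example only: any recent CUDA-enabled PyTorch image should work.
docker run --gpus all -it --rm \
  -p 8000:8000 \
  -v "$(pwd)":/workspace \
  nvcr.io/nvidia/pytorch:23.10-py3
```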
| Model | Description | Compute Requirements (for FP16 HF model weights) |
|---|---|---|
| | 8k context | Any GPU with 24GB VRAM |
| | 8k context, better accuracy | 2 x A100-80GB or equivalent |
| | 8k context | Any GPU with 24GB VRAM |
| | Parallel function call support | Any GPU with 24GB VRAM |
| | 4k context, better accuracy (deprecated) | Any GPU with 24GB VRAM |
| | 4k context (deprecated) | Any GPU with 24GB VRAM |
| functionary-7b-v0.1 | 2k context (deprecated). Not recommended, use v2.1 onwards | Any GPU with 24GB VRAM |
- v1 models are compatible with both OpenAI-python v0 and v1.
- v2 models are designed for compatibility with OpenAI-python v1.
You may refer to the official OpenAI documentation for the differences between OpenAI-python v0 and v1. A v1-style example against a locally running Functionary server is sketched below.
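The base_url, api_key value, and model path in this sketch are assumptions for illustration; the port matches the server defaults described above:

```python
from openai import OpenAI

# Point the OpenAI-python v1 client at the local Functionary server.
# The api_key is a placeholder; the address assumes the default host/port above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

# A minimal illustrative tool; the name and schema are hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.2",  # assumed model path
    messages=[{"role": "user", "content": "What is the weather in Istanbul?"}],
    tools=tools,
    tool_choice="auto",
)
print(response.choices[0].message)
```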
We convert function definitions into text similar to TypeScript definitions and inject these definitions as system prompts. After that, we inject the default system prompt and then start the conversation messages.
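Roughly, a JSON Schema definition like the weather example above would be rendered into TypeScript-like text along these lines; the exact formatting is internal to Functionary, so this is only an illustrative sketch:

```typescript
// Illustrative only: an approximation of how a tool might be presented to the model.
namespace functions {
  // Get the current weather for a city
  type get_current_weather = (_: {
    // The city and state, e.g. San Francisco, CA
    location: string,
  }) => any;
}
```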
We use specially designed prompt templates, which we call "v2PromptTemplate" and "v1PromptTemplate". "v2PromptTemplate" breaks each turn down into from, recipient, and content portions. "v1PromptTemplate" uses a variety of special tokens in each turn, including a start-of-function-call token to indicate an assistant turn with a function call, and role-specific stop tokens for each turn.
We don't change the logit probabilities to conform to a certain schema, but the model itself knows how to conform. This allows us to use existing tools and caching systems with ease.
This project is licensed under the terms of the MIT license.