Functionary

Functionary is a language model that can interpret and execute functions/plugins.

The model determines when to execute functions, whether in parallel or serially, and can understand their outputs. It only triggers functions as needed. Function definitions are given as JSON Schema Objects, similar to OpenAI GPT function calls.
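
For example, a function definition follows the familiar JSON Schema shape used for OpenAI function calling; the get_current_weather function below is purely illustrative:

```json
{
  "name": "get_current_weather",
  "description": "Get the current weather in a given location",
  "parameters": {
    "type": "object",
    "properties": {
      "location": {
        "type": "string",
        "description": "The city and state, e.g. San Francisco, CA"
      }
    },
    "required": ["location"]
  }
}
```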

Key Features

Model Features

  • Intelligent parallel function calling/tool use
  • Able to analyze the outputs of functions/tools and provide responses grounded in those outputs
  • Able to decide when not to call functions/use tools and instead provide a normal chat response

Server/Inference Features

  • OpenAI-compatible server based on the blazing fast vLLM
  • Grammar sampling, which ensures 100% accuracy in generating function and argument names
  • Automatic execution of Python functions called by Functionary

Us vs the Industry

Functionary is compared against other function-calling projects (e.g. Gorilla, Glaive) on the following capabilities:

  • Single Function Call
  • Parallel Function Calls
  • Nested Function Calls
  • Following up on Missing Function Arguments
  • Multi-turn
  • Generate Model Responses Grounded in Tools Execution Results
  • Chit-chat

Performance

Function Prediction Evaluation

We evaluated Functionary's function call prediction capabilities on MeetKai's in-house benchmark dataset. The accuracy metric measures the overall correctness of predicted function calls, including function name prediction and arguments extraction.

| Dataset | Model Name | Function Calling Accuracy (Name & Arguments) |
|---|---|---|
| In-house data | MeetKai-functionary-small-v2.2 | 0.546 |
| In-house data | MeetKai-functionary-medium-v2.2 | 0.664 |
| In-house data | OpenAI-gpt-3.5-turbo-1106 | 0.531 |
| In-house data | OpenAI-gpt-4-1106-preview | 0.737 |

Get Started

Installation

Make sure you have PyTorch installed. Then to install the required dependencies, run:

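A typical setup, assuming you are installing from the project's GitHub repository and that its dependencies are listed in requirements.txt:

```shell
# Clone the Functionary repository and install its Python dependencies
git clone https://github.com/MeetKai/functionary.git
cd functionary
pip install -r requirements.txt
```
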
Running the server

Now you can start a blazing fast vLLM server:

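A minimal launch, assuming the repository's vLLM server entry point is server_vllm.py and using the small model as an example:

```shell
# Start the OpenAI-compatible vLLM server with a Functionary model
python3 server_vllm.py --model "meetkai/functionary-small-v2.2" --host 0.0.0.0 --port 8000
```
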
Server Options

Since the server is running on vLLM, you can pass in various arguments for the vLLM Engine. They are listed here. The more important arguments are:

  • --model : the name of the model (e.g. "meetkai/functionary-medium-v2.2")
  • --tensor-parallel-size (default = 1) : the number of GPUs to use for the server. Set this to greater than 1 for larger models like "meetkai/functionary-medium-v2.2", depending on the type of GPU you are using
  • --max-model-len : the context window to set for your server

There are also some options that we implement independently of vLLM (a combined launch example follows the list):

  • --host (default="0.0.0.0") : the host IP address for the server
  • --port (default="8000") : the port to be exposed for the server
  • --grammar_sampling (default=True) : when enabled, the server uses our implementation of grammar sampling, which ensures 100% accuracy for function and argument names. It works by constraining generation so that function and argument names come only from the list of functions provided in the API call, or no function call at all.
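
For example, serving the medium model across two GPUs with an 8k context window might look like this (a sketch; the script name and values are illustrative and should be adjusted to your hardware):

```shell
# Grammar sampling is enabled by default (--grammar_sampling)
python3 server_vllm.py \
  --model "meetkai/functionary-medium-v2.2" \
  --tensor-parallel-size 2 \
  --max-model-len 8192 \
  --host 0.0.0.0 \
  --port 8000
```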

Running the client

OpenAI Compatible Usage

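A minimal sketch using the OpenAI-python v1 client against the local server; the base URL, API key and the get_current_weather tool are illustrative:

```python
from openai import OpenAI

# Point the OpenAI client at the local Functionary server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")

response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.2",
    messages=[{"role": "user", "content": "What is the weather like in Istanbul?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    tool_choice="auto",
)

# The model either answers directly or returns tool_calls to execute
print(response.choices[0].message)
```
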
Raw Usage

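Without the OpenAI client, you can call the OpenAI-compatible endpoint directly over HTTP; a minimal sketch using requests (the payload follows the standard chat-completions format):

```python
import requests

payload = {
    "model": "meetkai/functionary-small-v2.2",
    "messages": [{"role": "user", "content": "What is the weather like in Istanbul?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        }
                    },
                    "required": ["location"],
                },
            },
        }
    ],
}

# POST directly to the OpenAI-compatible chat completions route
response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(response.json()["choices"][0]["message"])
```
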
If you're having trouble with dependencies and you have the NVIDIA Container Toolkit (nvidia-container-toolkit) installed, you can start your environment like this:

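A sketch of such an environment, assuming Docker with the NVIDIA Container Toolkit; the base image tag is only an example:

```shell
# Start a CUDA-enabled PyTorch container, mount the current directory,
# and expose the server port
docker run --gpus all -it --shm-size=8g \
  -v "$(pwd)":/workspace -w /workspace \
  -p 8000:8000 \
  nvcr.io/nvidia/pytorch:23.10-py3
```
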
Models Available

| Model | Description | Compute Requirements (for FP16 HF model weights) |
|---|---|---|
| meetkai/functionary-small-v2.2 | 8k context | Any GPU with 24GB VRAM |
| meetkai/functionary-medium-v2.2 | 8k context, better accuracy | 2 x A100-80GB or equivalent |
|  | 8k context | Any GPU with 24GB VRAM |
|  | Parallel function call support | Any GPU with 24GB VRAM |
|  | 4k context, better accuracy (deprecated) | Any GPU with 24GB VRAM |
|  | 4k context (deprecated) | Any GPU with 24GB VRAM |
| functionary-7b-v0.1 | 2k context (deprecated); not recommended, use v2.1 onwards | Any GPU with 24GB VRAM |

Compatibility information

  • v1 models are compatible with both OpenAI-python v0 and v1.
  • v2 models are designed for compatibility with OpenAI-python v1.

You may refer to the official OpenAI documentation here for the differences between OpenAI-python v0 and v1.
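
As a sketch of the difference, with OpenAI-python v1 the client object and method names change (the equivalent v0 calls are shown in comments; the base URL and API key are illustrative):

```python
# OpenAI-python v1 style (use this with Functionary v2 models)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="functionary")
response = client.chat.completions.create(
    model="meetkai/functionary-small-v2.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message)

# OpenAI-python v0 style (legacy, works with Functionary v1 models):
#   import openai
#   openai.api_base = "http://localhost:8000/v1"
#   openai.api_key = "functionary"
#   openai.ChatCompletion.create(model=..., messages=[...])
```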

How it works

We convert function definitions into text similar to TypeScript type definitions and inject these definitions as a system prompt. After that, we inject the default system prompt, and then the conversation messages begin.
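
For instance, a get_current_weather function defined via JSON Schema might be rendered roughly like this (an illustration of the style, not the exact template text):

```text
// Supported function definitions that should be called when necessary.
namespace functions {

// Get the current weather in a given location
type get_current_weather = (_: {
// The city and state, e.g. San Francisco, CA
location: string,
}) => any;

} // namespace functions
```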

We use specially designed prompt templates, which we call "v2PromptTemplate" and "v1PromptTemplate". v2PromptTemplate breaks each turn down into from, recipient and content portions. v1PromptTemplate uses a variety of special tokens in each turn, including a start-of-function-call token to indicate an assistant turn with a function call, and a role-specific stop token for each turn.

Prompt examples can be found here: V1 and V2

We don't change the logit probabilities to conform to a certain schema, but the model itself knows how to conform. This allows us to use existing tools and caching systems with ease.

License

This project is licensed under the terms of the MIT license.
