Advanced Tool Integration

Before starting this advanced guide, make sure you have a basic understanding of the tool integration process in Dify. See Quick Integration for a quick run-through.

Tool Interface

We have defined a series of helper methods in the Tool class to help developers quickly build more complex tools.

Message Return

Dify supports various message types such as text, link, image, and file BLOB. You can return different types of messages to the LLM and users through the following interfaces.

Please note, some parameters in the following interfaces will be introduced in later sections.

Image URL

You only need to pass the URL of the image, and Dify will automatically download the image and return it to the user.

    def create_image_message(self, image: str, save_as: str = '') -> ToolInvokeMessage:
        """
            create an image message

            :param image: the url of the image
            :return: the image message
        """

Link

If you need to return a link, you can use the following interface.
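The interface itself is not reproduced here. As a minimal sketch, assuming the link helper mirrors `create_image_message` above, it might look like the following. The method name `create_link_message`, its parameters, and the stand-in `ToolInvokeMessage` and `SketchTool` classes are assumptions for illustration, not Dify's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ToolInvokeMessage:
    """Hypothetical stand-in for Dify's message type (assumption)."""
    type: str
    message: str
    save_as: str = ''


class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    def create_link_message(self, link: str, save_as: str = '') -> ToolInvokeMessage:
        """
            create a link message

            :param link: the url of the link
            :return: the link message
        """
        return ToolInvokeMessage(type='link', message=link, save_as=save_as)


msg = SketchTool().create_link_message('https://example.com/report')
print(msg.type, msg.message)
```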

Text

If you need to return a text message, you can use the following interface.
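The interface itself is not reproduced here. The sketch below assumes a `create_text_message` helper shaped like `create_image_message` above, and shows how a tool might call it. `ToolInvokeMessage`, `SketchTool`, and `WordCountTool` are hypothetical stand-ins so the example runs on its own; they are not Dify's classes.

```python
from dataclasses import dataclass


@dataclass
class ToolInvokeMessage:
    """Hypothetical stand-in for Dify's message type (assumption)."""
    type: str
    message: str
    save_as: str = ''


class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    def create_text_message(self, text: str, save_as: str = '') -> ToolInvokeMessage:
        return ToolInvokeMessage(type='text', message=text, save_as=save_as)


class WordCountTool(SketchTool):
    """Hypothetical example tool that reports a word count as text."""

    def _invoke(self, user_id: str, tool_parameters: dict) -> ToolInvokeMessage:
        words = tool_parameters.get('text', '').split()
        return self.create_text_message(f'The input contains {len(words)} words.')


reply = WordCountTool()._invoke('user-123', {'text': 'a logo for a coffee shop'})
print(reply.message)
```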

File BLOB

If you need to return the raw data of a file, such as images, audio, video, PPT, Word, Excel, etc., you can use the following interface.

  • blob: The raw data of the file, of type bytes

  • meta: The metadata of the file. If you know the file type, it is best to pass a mime_type; otherwise Dify will use octet/stream as the default type
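The two parameters above can be illustrated with a small sketch. It assumes a helper named `create_blob_message` that takes `blob`, `meta`, and `save_as`; the stand-in classes and the place where the default type is applied are assumptions made so the example runs offline, not a copy of Dify's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolInvokeMessage:
    """Hypothetical stand-in for Dify's message type (assumption)."""
    type: str
    message: bytes
    meta: dict = field(default_factory=dict)
    save_as: str = ''


class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    def create_blob_message(self, blob: bytes, meta: Optional[dict] = None,
                            save_as: str = '') -> ToolInvokeMessage:
        meta = dict(meta or {})
        # Fall back to the default type named in the text when no mime_type is given.
        meta.setdefault('mime_type', 'octet/stream')
        return ToolInvokeMessage(type='blob', message=blob, meta=meta, save_as=save_as)


tool = SketchTool()
png = tool.create_blob_message(b'\x89PNG\r\n\x1a\n', meta={'mime_type': 'image/png'})
raw = tool.create_blob_message(b'\x00\x01')
print(png.meta['mime_type'], raw.meta['mime_type'])
```

Passing an explicit mime_type lets the receiving side treat the bytes as an image, audio clip, or document instead of an opaque download.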

Shortcut Tools

In large model applications, two needs come up often:

  • Summarizing a long text in advance and passing the summary to the LLM, so that the original text is not too long for the LLM to handle

  • Crawling the web page behind a link obtained by the tool, so that the page content, rather than the bare link, can be returned to the LLM

To help developers quickly implement these two needs, we provide the following two shortcut tools.

Text Summary Tool

This tool takes a user_id and the text to be summarized, and returns the summarized text. Dify uses the default model of the current workspace to summarize the long text.
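A minimal sketch of how a tool might use this shortcut, assuming it is exposed as a method named `summary(user_id, content)`. The stub below does not call any model: it keeps only the first sentence so the example runs offline. `SketchTool`, `invoke`, and `max_chars` are hypothetical names introduced for illustration.

```python
class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    def summary(self, user_id: str, content: str) -> str:
        # Placeholder only: Dify would call the workspace's default model here.
        # This stub keeps the first sentence so the example runs offline.
        first, _, _ = content.partition('. ')
        return first.rstrip('.') + '.'

    def invoke(self, user_id: str, text: str, max_chars: int = 80) -> str:
        # Summarize only when the text is too long to hand to the LLM directly.
        if len(text) > max_chars:
            return self.summary(user_id=user_id, content=text)
        return text


long_text = 'Dify tools can return several message types. ' * 5
result = SketchTool().invoke(user_id='user-123', text=long_text)
print(result)
```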

Web Page Crawling Tool

This tool takes the link of the web page to be crawled and a user_agent (which can be empty), and returns a string containing the information of the web page. The user_agent is an optional parameter that can be used to identify the tool. If it is not passed, Dify uses the default user_agent.
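The optional user_agent behaviour can be sketched as follows, assuming the shortcut is a method named `get_url(url, user_agent)`. The HTTP request is replaced by an injected `fetch` function so the example needs no network access; `SketchTool`, `fake_fetch`, and `DEFAULT_USER_AGENT` are hypothetical, and the default value shown is not Dify's.

```python
from typing import Callable, Optional

DEFAULT_USER_AGENT = 'sketch-agent/0.1'  # hypothetical default, not Dify's value


class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    def __init__(self, fetch: Callable[[str, dict], str]):
        # `fetch` is injected so the example runs without network access.
        self._fetch = fetch

    def get_url(self, url: str, user_agent: Optional[str] = None) -> str:
        # Use the caller's user_agent when given, otherwise the default one.
        headers = {'User-Agent': user_agent or DEFAULT_USER_AGENT}
        return self._fetch(url, headers)


def fake_fetch(url: str, headers: dict) -> str:
    # Offline stand-in for a real HTTP request.
    return f"fetched {url} as {headers['User-Agent']}"


tool = SketchTool(fetch=fake_fetch)
default_page = tool.get_url('https://example.com')
custom_page = tool.get_url('https://example.com', user_agent='my-tool/1.0')
print(default_page)
print(custom_page)
```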

Variable Pool

We have introduced a variable pool in Tool to store variables, files, etc. generated during the tool's operation. These variables can be used by other tools during the tool's operation.

Next, we will use DallE3 and Vectorizer.AI as examples to introduce how to use the variable pool.

  • DallE3 is an image generation tool that can generate images based on text. Here, we will let DallE3 generate a logo for a coffee shop

  • Vectorizer.AI is a vector image conversion tool that can convert images into vector images, so that the images can be infinitely enlarged without distortion. Here, we will convert the PNG icon generated by DallE3 into a vector image, so that it can be truly used by designers.

DallE3

First, we use DallE3. After creating the image, we save the image to the variable pool. The code is as follows:
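The original listing is not reproduced here. The sketch below shows the idea under stated assumptions: a blob message created with a `save_as` name is also stored in the variable pool under that name. The image API call is replaced by placeholder bytes, and `SketchTool`, `DallE3Sketch`, the dictionary-based pool, and the key value `'image'` are stand-ins for illustration, not Dify's implementation.

```python
from base64 import b64decode, b64encode
from enum import Enum


class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    class VARIABLE_KEY(Enum):
        IMAGE = 'image'  # assumed key value

    def __init__(self):
        self.variable_pool = {}  # stand-in for the variable pool

    def create_blob_message(self, blob: bytes, meta: dict = None, save_as: str = '') -> dict:
        # When save_as is set, keep the blob in the pool under that name.
        if save_as:
            self.variable_pool[save_as] = blob
        return {'type': 'blob', 'message': blob, 'meta': meta or {}, 'save_as': save_as}


class DallE3Sketch(SketchTool):
    def _invoke(self, user_id: str, tool_parameters: dict) -> list:
        # Placeholder for the image API response (base64-encoded PNG data).
        fake_b64 = b64encode(b'\x89PNG fake logo bytes').decode()
        return [
            self.create_blob_message(
                blob=b64decode(fake_b64),
                meta={'mime_type': 'image/png'},
                save_as=self.VARIABLE_KEY.IMAGE.value,
            )
        ]


tool = DallE3Sketch()
messages = tool._invoke('user-123', {'prompt': 'a logo for a coffee shop'})
print(list(tool.variable_pool))
```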

Note that we used self.VARIABLE_KEY.IMAGE.value as the variable name of the image. In order for developers' tools to cooperate with each other, we defined this KEY. You can use it freely, or you can choose not to use this KEY. Passing a custom KEY is also acceptable.

Vectorizer.AI

Next, we use Vectorizer.AI to convert the PNG icon generated by DallE3 into a vector image. Let's go through the functions we defined here. The code is as follows:

Next, let's implement these three functions.

It's worth noting that we didn't actually use image_id here. We assumed that there must be an image in the default variable pool when this tool is called, so we directly used image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE) to get the image. When the model's capabilities are weak, we recommend that developers do the same: it improves fault tolerance and avoids failures caused by the model passing incorrect parameters.
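The fallback described above can be sketched as follows. The tool accepts image_id but reads the image from the variable pool under the shared key, and returns a readable message when the pool is empty. The Vectorizer.AI request is replaced by a placeholder, and `SketchTool`, `VectorizerSketch`, and the dictionary-based pool are hypothetical stand-ins, not Dify's implementation.

```python
from enum import Enum
from typing import Optional


class SketchTool:
    """Hypothetical stand-in for the Tool base class (assumption)."""

    class VARIABLE_KEY(Enum):
        IMAGE = 'image'  # assumed key value

    def __init__(self, variable_pool: Optional[dict] = None):
        self.variable_pool = variable_pool or {}

    def get_variable_file(self, name) -> Optional[bytes]:
        # Accept either the enum member or a plain string key.
        key = name.value if isinstance(name, Enum) else name
        return self.variable_pool.get(key)


class VectorizerSketch(SketchTool):
    def _invoke(self, user_id: str, tool_parameters: dict) -> str:
        # image_id is accepted but deliberately not used to look up the image.
        _ = tool_parameters.get('image_id', '')
        image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE)
        if not image_binary:
            return 'Image not found, please ask the user to generate an image first.'
        # Placeholder for the Vectorizer.AI request.
        return f'vectorized {len(image_binary)} bytes'


with_image = VectorizerSketch({'image': b'abcd'})._invoke('user-123', {'image_id': 'wrong-id'})
without_image = VectorizerSketch()._invoke('user-123', {'image_id': 'wrong-id'})
print(with_image)
print(without_image)
```

Because the lookup ignores image_id, an incorrect identifier from the model does not stop the conversion as long as an image was saved earlier.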
