
LLM Function Calling and Tool Use


Large language models are powerful reasoning engines, but they are confined to the knowledge in their training data. Function calling—the ability for an LLM to request and utilize external tools—breaks this confinement, transforming the model from a conversational agent into an orchestrator of real-world actions. By enabling an LLM to reliably call APIs, query databases, or run code, you can build applications that are not just knowledgeable, but actionable and context-aware. This paradigm shift is central to creating truly intelligent assistants, automated workflows, and dynamic systems that interact with the live world.

Defining the Interface: Schemas and Parameter Extraction

At its core, function calling is a structured conversation between the LLM and your application code. You, the developer, define the tools available to the model. This is done by providing function schemas, which are machine-readable descriptions of what each tool does, what inputs it requires, and what it returns.

A schema acts as a contract. It tells the LLM: "Here is a tool you can use. It's called get_weather. To use it, you must provide a location (a string) and a unit (either 'celsius' or 'fahrenheit')." The model's job is to analyze the user's natural language request, decide if this tool is needed, and if so, extract the required parameters in the correct format. This process is parameter extraction. For example, if a user asks, "What's the temperature like in Tokyo?", the model should reason that the get_weather function is relevant and output a structured object like {"location": "Tokyo", "unit": "celsius"}. This structured data is then passed by your application to the actual weather API.

The most common format for these schemas is the OpenAI function calling format. It is a JSON-based standard that has been widely adopted. A typical tool definition includes a name, a description (crucial for the LLM to understand when to use it), and a parameters object following JSON Schema rules to define the type, properties, and any requirements for each input. This structured approach replaces error-prone attempts to parse natural language responses with a reliable, programmatic handshake.
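As a concrete illustration, here is a sketch of the get_weather tool described above, written as a Python dict in the OpenAI function calling format. The exact field values (description wording, city format) are illustrative, not prescribed:

```python
# Sketch of a tool definition in the OpenAI function calling format.
# The description doubles as an instruction telling the model when
# (and when not) to use the tool.
get_weather_schema = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": (
            "Retrieve the current weather conditions for a specific city. "
            "Do not use this for historical weather or forecasts."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'Tokyo, Japan'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit for the result",
                },
            },
            "required": ["location"],
        },
    },
}
```

The `enum` constraint on `unit` and the `required` list are what make the contract enforceable: the model cannot invent a third unit or omit the location without producing an invalid call.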

Executing the Call: Integration, Error Handling, and Reasoning

Once the LLM outputs a structured function call request, your application's responsibility begins. This result integration phase has three critical steps: execution, error handling, and feeding the result back to the model.

First, your code executes the call. This might involve calling an external API, running a database query, or performing a calculation. The result, whether successful data or an error, must then be formatted and sent back to the LLM in a subsequent message. This is where robust error handling for failed tool calls becomes essential. Tools can fail for many reasons: an API is down, a database query times out, or the extracted parameters are invalid (e.g., a non-existent city name). Your application should catch these errors and return a descriptive, natural-language error message to the LLM, such as "Error: The weather service could not find data for 'Tokio'. Did you mean 'Tokyo, Japan'?" This allows the model to understand the failure, adjust its reasoning, and potentially try a different approach or ask the user for clarification.
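A minimal execution layer with this kind of error handling might look like the following sketch. The dispatch structure and the `get_weather` stub are assumptions for illustration; the key point is that failures become descriptive messages rather than uncaught exceptions:

```python
import json

def execute_tool_call(name, arguments, tools):
    """Run the requested tool and return a result the LLM can read.

    `tools` maps tool names to plain Python callables. Errors are caught
    and converted into descriptive messages instead of being raised, so
    the model can reason about the failure and recover.
    """
    try:
        func = tools[name]
        result = func(**json.loads(arguments))
        return json.dumps({"result": result})
    except KeyError:
        return json.dumps({"error": f"Unknown tool: {name!r}."})
    except Exception as exc:  # API down, invalid parameters, timeouts, ...
        return json.dumps({"error": f"Tool {name!r} failed: {exc}"})

# Hypothetical weather lookup standing in for a real API client:
def get_weather(location, unit="celsius"):
    if location == "Tokio":
        raise ValueError(f"could not find data for {location!r}")
    return {"temp": 22, "condition": "sunny"}

tools = {"get_weather": get_weather}
```

Whatever string this returns is appended to the conversation as a tool message, so even the error path produces something the model can act on.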

Finally, the LLM receives the tool's result or error message. It synthesizes this new information with the original conversation context and generates a final, helpful response for the user. For instance, after receiving the raw JSON {"temp": 22, "condition": "sunny"}, the model might say, "It's currently a sunny 22°C in Tokyo—perfect weather for a walk." This seamless integration makes the tool's capability appear as a natural extension of the model's own knowledge.

Advanced Patterns: Sequential and Parallel Tool Use

Simple applications might use one tool per user query. Sophisticated agents, however, must plan and execute multi-step operations. This involves sequential tool use and parallel tool use.

Sequential tool use is a chain of thought made actionable. The model must break down a complex goal into steps, where the output of one tool becomes the input for the next. Consider a user asking, "Email me a summary of the top 3 news stories about AI." The model's reasoning might be:

  1. Call search_news(query="artificial intelligence", limit=5).
  2. Receive a list of articles. For each, call get_article_details(article_id) to fetch the full text.
  3. Call summarize_text(text) on each detailed article.
  4. Finally, call send_email(subject, body) with the compiled summaries.

Each step depends on the last. The model must manage this state, knowing when it has the necessary information to proceed to the next function call.
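The loop that drives a sequence like this can be sketched independently of any particular SDK. Here, `client.chat(messages)` is a stand-in for your LLM API call, assumed to return either a final text answer or a single tool call request; every real provider's interface differs in the details:

```python
def run_agent(client, messages, tools, max_steps=8):
    """Minimal sequential tool-use loop (sketch, not a specific SDK).

    Each tool result is appended to the message history, so the model's
    next step can build on the previous tool's output. The step limit
    guards against runaway chains.
    """
    for _ in range(max_steps):
        reply = client.chat(messages)
        if reply.get("tool_call") is None:
            return reply["content"]  # model produced the final answer
        call = reply["tool_call"]
        result = tools[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": str(result)})
    raise RuntimeError("Agent exceeded step limit without finishing")
```

The state management described above lives entirely in `messages`: the model sees every prior tool result and decides whether to call another tool or answer.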

Parallel tool use involves calling multiple, independent tools simultaneously to gather information efficiently. For a query like "Compare the weather in London and the stock price of Tesla," the model can—and should—request both the get_weather and get_stock_price tools at the same time. Your application should execute these independent calls in parallel where possible to minimize latency before sending all results back to the LLM for a cohesive comparison. This requires the model to identify tasks that lack dependencies and orchestrate them efficiently.
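One straightforward way to run independent calls concurrently is a thread pool. The tool implementations below are hypothetical stubs; the point is the execution pattern, which keeps results in request order so they can be matched back to the calls the model made:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel_calls(calls, tools):
    """Execute independent tool calls concurrently (sketch).

    `calls` is a list of (name, kwargs) pairs the model requested in one
    turn. Results are returned in the same order as the requests.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(tools[name], **kwargs) for name, kwargs in calls]
        return [f.result() for f in futures]

# Hypothetical independent lookups for the comparison query:
tools = {
    "get_weather": lambda location: {"temp": 14, "condition": "cloudy"},
    "get_stock_price": lambda symbol: {"symbol": symbol, "price": 242.5},
}
calls = [("get_weather", {"location": "London"}),
         ("get_stock_price", {"symbol": "TSLA"})]
results = run_parallel_calls(calls, tools)
```

Threads suit I/O-bound tool calls like HTTP requests; an async stack would use `asyncio.gather` to the same effect.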

Building Reliable Tool-Augmented Applications

Building a tool-augmented LLM application that interacts reliably with databases, APIs, and external services moves beyond simple demos into engineering for production. Reliability hinges on several key practices.

First, tool schemas must be meticulously designed. Clear, unambiguous description fields guide the LLM's choice. Strict parameter typing (e.g., "type": "string", "enum": ["c", "f"]) prevents invalid calls. You should implement validation logic in your application before calling the external service to catch issues early.
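That pre-call validation can be a few lines of schema checking. This is a minimal sketch covering required fields and enum membership; a production system might use the `jsonschema` library for full JSON Schema support instead:

```python
def validate_arguments(arguments, schema):
    """Check extracted arguments against a parameters schema before
    calling the external service. Returns a list of error strings
    (empty when the arguments are valid).
    """
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required parameter {name!r}")
    for name, value in arguments.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unexpected parameter {name!r}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name!r} must be one of {spec['enum']}")
    return errors

# The example schema from the text, with a strictly typed unit:
weather_params = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["c", "f"]},
    },
    "required": ["location"],
}
```

Validation failures can be fed back to the model exactly like tool errors, giving it a chance to correct the call before anything external is touched.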

Second, your application architecture must manage context windows and state. A long conversation with multiple tool calls can generate a large history. You need strategies to summarize or prune this context to stay within model limits while retaining necessary information.
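The simplest such strategy is truncation that preserves the system prompt. This sketch shows only that baseline; real systems often summarize the dropped middle of the conversation instead of deleting it outright:

```python
def prune_history(messages, max_messages=20):
    """Keep the system prompt plus the most recent turns (sketch).

    Assumes each message is a dict with a "role" key, as in typical
    chat-completion APIs.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```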

Third, implement comprehensive observability. Log every function call request, the extracted parameters, the tool response, and the final model output. This is critical for debugging instances where the model calls the wrong tool or mis-extracts parameters. Furthermore, consider implementing user confirmation flows for tools with real-world consequences (e.g., sending an email, making a purchase). The pattern is: LLM requests a tool -> application asks user for confirmation -> upon approval, execution proceeds.
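That confirmation pattern can be expressed as a gate in front of the execution layer. The tool names and the `confirm` callback here are assumptions for illustration; `confirm` could be a CLI prompt, a UI dialog, or an approval queue:

```python
# Tools whose side effects warrant explicit user approval (assumed names).
REQUIRES_CONFIRMATION = {"send_email", "make_purchase"}

def handle_tool_request(name, arguments, tools, confirm):
    """Gate consequential tools behind a confirmation callback (sketch).

    `confirm(name, arguments)` shows the pending action to the user and
    returns True on approval. Declines are reported back to the model
    as errors rather than silently dropped.
    """
    if name in REQUIRES_CONFIRMATION and not confirm(name, arguments):
        return {"error": f"User declined to run {name!r}."}
    return tools[name](**arguments)
```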

Finally, design for safety and security. Never expose tools that can perform destructive actions without safeguards. Validate and sanitize all inputs extracted from the LLM before passing them to external systems to prevent injection attacks. The LLM is a powerful but unpredictable reasoning layer; your surrounding application code must provide the guardrails.

Common Pitfalls

  1. Vague or Misleading Tool Descriptions: A schema description that says "Gets data" will lead to poor model performance. The description must explicitly state the tool's purpose and when it should be used. Correction: Write descriptions as instructions for the model, e.g., "Use this function to retrieve the current weather conditions for a specific city. Do not use this for historical weather or forecasts."
  2. Poor Error Handling Propagation: Silently failing tool calls leave the model confused, often causing it to hallucinate an answer or get stuck. Correction: Always catch exceptions at the tool execution layer and return a structured, informative error message to the LLM so it can recover gracefully, for example: {"error": "Database query failed: Connection timeout."}
  3. Assuming Single-Turn Perfection: Expecting the model to always choose the perfect tool with perfectly extracted parameters in one turn leads to fragile systems. Correction: Design your application loop to handle multiple turns of interaction. If a tool call fails or returns an empty result, allow the model to try a different parameter or ask the user a clarifying question.
  4. Neglecting Cost and Latency: Blindly allowing an agent to chain many tool calls in sequence can make an application slow and expensive. Correction: Implement reasoning limits, timeouts, and design tools to be coarse-grained when possible. For parallelizable queries, always execute tools in parallel to improve user experience.

Summary

  • Function calling is the structured mechanism that allows LLMs to interact with external systems by requesting tools based on defined schemas and extracting necessary parameters from natural language.
  • The OpenAI function calling format provides a standardized JSON Schema approach for defining tools, making the interaction between the LLM and your application code reliable and predictable.
  • Effective implementation requires robust result integration and error handling, where your application executes calls, manages failures, and feeds outcomes back to the model for contextual response generation.
  • Advanced applications employ sequential tool use for multi-step reasoning and parallel tool use for gathering independent information efficiently, requiring careful state and context management.
  • Building reliable tool-augmented applications demands meticulous schema design, validation, observability, and safety guardrails to ensure interactions with databases and APIs are secure, accurate, and user-friendly.