NousResearch Hermes Complete Guide — Hermes 4.3 36B, Function Calling & Hermes Agent [2026]
Complete guide to NousResearch Hermes 4.3 36B (512K context) and the Hermes Agent framework. Covers Function Calling implementation, Ollama setup, hardware requirements, and benchmarks including RefusalBench dominance — updated for 2026.
What Is the Hermes Series? — The Function Calling-Optimized Open-Source LLM
NousResearch Hermes is a series of open-source LLM fine-tunes built on the philosophy of "user-aligned, minimally filtered, and highly steerable." From Hermes 1 (2023) through Hermes 4.3 (late 2025), the series has consistently led open-source LLMs in Function Calling reliability and agentic use cases. In March 2026, NousResearch also released "Hermes Agent," an open-source agent framework that has already surpassed 40,000 GitHub stars.
Hermes Series Evolution Timeline
| Version | Base Model | Release | Key Feature |
|---|---|---|---|
| Hermes 1 | LLaMA 1 13B | Early 2023 | First high-quality Nous fine-tune |
| OpenHermes 2.5 | Mistral 7B | 2023 | 1M-sample dataset |
| Hermes 2 Pro | Llama 3 8B/70B | 2024 | Dedicated Function Calling tokens |
| Hermes 3 | Llama 3.1 8B/70B/405B | August 2024 | 128K context, agent capabilities |
| DeepHermes 3 | Llama 3 / Mistral 24B | February 2025 | Switchable reasoning mode |
| Hermes 4 | Llama 3.1 70B/405B | August 2025 | Hybrid reasoning, RefusalBench #1 |
| Hermes 4.3 | ByteDance Seed 36B | December 2025 | 512K context, distributed training (Solana) |
Hermes Model Family Lineage
Hermes 4.3 36B — The Latest Flagship
Hermes 4.3 is the first Hermes fine-tune based on a non-Meta model — ByteDance Seed 36B. It delivers 70B-class performance in a 36B Dense architecture with a remarkable 512K token context window. Training was conducted on the Psyche distributed network (Solana-based), and the model achieves the highest score of any model on RefusalBench.
| Spec | Hermes 4.3 36B |
|---|---|
| Base Model | ByteDance Seed 36B |
| Parameters | 36B (Dense) |
| Context | 512K tokens |
| License | ByteDance Seed License |
| Training Data | ~5M samples / ~60B tokens |
| Training Infra | Psyche distributed network (Solana) |
| VRAM (Q4) | 24–32 GB |
| RefusalBench | Highest score of any model (GPT-4o and Claude: ~17%) |
What Is RefusalBench? — Why Hermes Dominates
RefusalBench measures how an LLM handles legitimate but sensitive-sounding user requests; a higher score means more of those requests are answered rather than unnecessarily refused. GPT-4o and Claude both score around 17%, while Hermes 4 reaches 57% and Hermes 4.3 exceeds that further. This is not about removing safety guardrails — it reflects NousResearch's philosophy of responding to legitimate requests without over-filtering. For AI agents and business automation, where over-refusal breaks workflows, Hermes's high RefusalBench score is a meaningful practical advantage.
Function Calling — Hermes's Greatest Strength
Since Hermes 2 Pro, the series has used dedicated tokens (`<tools>`, `<tool_call>`, `<tool_response>`) enabling streaming-compatible, highly reliable Function Calling. It is widely regarded as the most dependable open-source LLM for FC implementations and also supports structured JSON output conforming to JSON Schema. Below is an example in ChatML prompt format:
```
<|im_start|>system
You are a helpful assistant. You have access to the following tools:
<tools>
[{"name": "get_weather", "description": "Get current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}]
</tools>
<|im_end|>
<|im_start|>user
What is the weather in Tokyo?
<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"location": "Tokyo"}}
</tool_call>
<|im_end|>
<|im_start|>tool
<tool_response>
{"temperature": 18, "condition": "Partly cloudy"}
</tool_response>
<|im_end|>
```
What Is Hermes Agent? — The Agent Framework Released in March 2026
Hermes Agent is an open-source AI agent framework released in March 2026, already garnering 40,000+ GitHub stars. Built around the concept of a "growing agent," it automatically generates and memorizes new skills after completing tasks. Key features include persistent cross-session memory, natural-language Cron scheduling, multi-platform messaging (Telegram, Discord, Slack, LINE, WhatsApp, and more), official Ollama integration, and sub-agent support for parallel task execution.
Hermes Agent Architecture
Hermes Agent vs OpenClaw — Comparison
| Feature | Hermes Agent | OpenClaw |
|---|---|---|
| Release | March 2026 | 2025 |
| GitHub Stars | 40,000+ | — |
| Persistent Memory | Yes | Yes |
| Auto Skill Generation | Yes | No |
| Messaging Integrations | Telegram/Discord/Slack/LINE/WhatsApp etc. | Slack/Discord/LINE etc. |
| Ollama Integration | Official | Official |
| MCP Support | Server mode | Yes |
| Natural Language Cron | Yes | Limited |
| License | MIT | MIT |
Choose Hermes Agent if auto skill generation and broad messaging integrations are priorities. Choose OpenClaw if you need a mature MCP ecosystem or a proven production workflow.
Ollama Support and Installation
| Model | Ollama Command | Notes |
|---|---|---|
| Hermes 3 8B | `ollama run hermes3` | Official library |
| Hermes 3 70B | `ollama run hermes3:70b` | Official library |
| Hermes 4.3 36B | `ollama run HammerAI/hermes-4.3` | Community Modelfile |
| DeepHermes 3 8B | Manual GGUF download | via bartowski |
Hermes 3 is available in Ollama's official library, enabling one-command setup. Hermes 4.3 requires a community-provided Modelfile.
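Once a Hermes model is running locally, the `<tool_call>` blocks it emits in the format shown in the Function Calling section can be extracted with a few lines of Python. A minimal sketch — the regex and function name here are illustrative, not part of any official SDK:

```python
import json
import re

# Matches the dedicated <tool_call> tags Hermes has used since Hermes 2 Pro.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(completion: str) -> list[dict]:
    """Return each emitted call as {'name': ..., 'arguments': {...}}."""
    return [json.loads(block) for block in TOOL_CALL_RE.findall(completion)]

reply = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Tokyo"}}\n</tool_call>'
calls = parse_tool_calls(reply)
print(calls[0]["name"])  # get_weather
```

In production you would dispatch each parsed call to the matching tool and feed the result back as a `tool` role message, as in the ChatML example above.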
Hardware Requirements
| Model | VRAM (Q4) | Recommended GPU |
|---|---|---|
| Hermes 3 3B | 4–6 GB | RTX 3060 |
| Hermes 3 8B | 8–10 GB | RTX 3080 / RTX 4070 |
| DeepHermes 3 Mistral 24B | 16–20 GB | RTX 3090 / RTX 4090 |
| Hermes 4.3 36B | 24–32 GB | RTX 3090 / RTX 4090 / M3 Max |
| Hermes 4 70B | 40–48 GB | A100 / Dual RTX 3090 |
Hermes 3 3B at Q4 quantization can run on CPU only, though response speed is significantly reduced.
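The Q4 figures above follow roughly from parameter count: weights take about parameters × bits ÷ 8 bytes, plus headroom for the KV cache and runtime. A back-of-the-envelope sketch — the 20% overhead factor is an assumption, and long contexts (especially 512K) need considerably more:

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Weight memory in GB: params (billions) * bits per weight / 8,
    times an assumed ~20% margin for KV cache and runtime buffers."""
    return params_b * bits / 8 * overhead

# Hermes 4.3 36B at Q4: ~21.6 GB before context, which is why the
# table lists 24-32 GB once real context lengths come into play.
print(round(estimate_vram_gb(36, 4), 1))  # 21.6
```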
Top 5 Use Cases for Hermes
1. AI Agent Backend LLM — Hermes's reliable Function Calling makes it the go-to choice for autonomous agent core engines.
2. Chatbot Development — Low filtering enables natural, useful responses in business chatbot deployments.
3. Creative Writing — Hermes 3 is particularly well-regarded for creative text generation tasks.
4. Structured Data Extraction — JSON Schema-compliant output makes it easy to integrate into data pipelines.
5. Multi-Step Reasoning — Hermes 4's hybrid reasoning mode handles complex logical and analytical tasks.
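For the structured-extraction use case, Ollama's `/api/chat` endpoint accepts a JSON Schema in its `format` field, constraining the model to emit schema-valid JSON. A minimal payload sketch, assuming a local Ollama instance serving `hermes3`; the schema itself is illustrative:

```python
import json

# Illustrative schema: extract a contact record from free text.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["name", "email"],
}

payload = {
    "model": "hermes3",
    "messages": [{"role": "user", "content": "Extract: Jane Doe <jane@example.com>"}],
    "format": schema,  # constrains decoding to schema-valid JSON
    "stream": False,
}

# POST this to http://localhost:11434/api/chat with any HTTP client;
# the response's message content should then parse as schema-valid JSON.
print(json.dumps(payload)[:40])
```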
Japanese Language Support — What to Expect
Hermes models are primarily trained on English data. For Japanese natural language generation, Qwen 3.5 or Gemma 4 are recommended instead. However, since Function Calling and JSON output are largely language-agnostic, Hermes remains fully usable for tool integration and data extraction workflows — even when user inputs are in Japanese. Structuring prompts so that tool calls operate in English while accepting Japanese inputs is a practical approach.
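That split — English tool schemas, Japanese user input — can be expressed directly in the message list. A minimal sketch reusing the illustrative `get_weather` tool from the Function Calling section; the system prompt wording is an assumption, not an official recommendation:

```python
# System prompt and tool schema stay in English; the user turn is Japanese.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant. Call tools with English argument "
            "values. Reply to the user in their own language."
        ),
    },
    {"role": "user", "content": "東京の天気を教えて"},  # "What's the weather in Tokyo?"
]

tools = [{
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

print(messages[1]["content"])
```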
DeepHermes 3's Switchable Reasoning Mode
DeepHermes 3 was the first model to allow reasoning mode to be toggled via the system prompt. When enabled, it generates extended reasoning chains enclosed in `<think>...</think>` tags, significantly improving scores on math and logic tasks. Disabling it for conversational tasks keeps latency low. This dual-mode capability means a single model can cover both routine and high-precision tasks — a practical advantage for production deployments.
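When reasoning mode is enabled, the `<think>...</think>` trace usually should not be shown to end users. A small helper to strip it before display; the tag format is from the model card, while the function itself is just a sketch:

```python
import re

def strip_think(text: str) -> str:
    """Drop <think>...</think> reasoning traces, keeping the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>18 C, mild...</think>It's 18 degrees and partly cloudy."))
```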
The Psyche Network — Decentralized Training on Solana
Hermes 4.3 was trained on the Psyche distributed network. Using the DisTrO optimizer, training was distributed across multiple data centers over the internet, with consensus on compute contributions and rewards secured by the Solana blockchain. This represents a significant innovation: large-scale model training without reliance on any single centralized compute provider, demonstrating that decentralized AI training is viable at frontier scale.
Frequently Asked Questions (FAQ)
Q: What is the difference between Hermes and Llama?
A: Llama is a foundation model released by Meta. Hermes is a fine-tune of Llama (and other base models) using NousResearch's proprietary high-quality datasets, optimized for Function Calling and agentic tasks.

Q: Can Hermes be used commercially?
A: It depends on the version. Hermes 3 follows the Meta Llama License; Hermes 4.3 follows the ByteDance Seed License. Review each license before commercial deployment.

Q: Is Function Calling better than other open-source models?
A: Yes — Hermes is considered the most reliable open-source LLM for Function Calling, thanks to its dedicated token system introduced in Hermes 2 Pro and continuously refined since.

Q: Should I use Hermes Agent or OpenClaw?
A: Choose Hermes Agent if you need automatic skill generation and broad messaging platform support. Choose OpenClaw if you want a mature MCP ecosystem and a proven production track record.

Q: Can Hermes handle Japanese?
A: Natural Japanese text generation is limited. However, for Function Calling and JSON output use cases, it is practical. For high-quality Japanese generation, Qwen 3.5 or Gemma 4 are recommended.

Q: Is the 512K context of Hermes 4.3 actually useful?
A: Yes — it enables whole-codebase comprehension, long-document Q&A, and extended conversation memory, all of which are valuable in real-world deployments.

Q: Can I run Hermes without a GPU?
A: Hermes 3 3B at Q4 quantization can run on CPU only, but response speed will be significantly reduced. A GPU is recommended for practical use.

Q: Which model is recommended?
A: With 24 GB VRAM, Hermes 4.3 36B (Q4) is the best choice. With 8–10 GB, Hermes 3 8B offers the best balance of performance and accessibility.
Oflight's Generative AI Integration Support
Oflight provides end-to-end support for deploying Hermes and other local LLMs in production — from Function Calling implementation and Hermes Agent integration to on-premises AI agent architecture. Whether you're evaluating models or building a full agentic system, our team covers model selection through deployment. Learn more about our AI consulting services