NousResearch Hermes Complete Guide — Hermes 4.3 36B, Function Calling & Hermes Agent [2026]
Complete guide to NousResearch Hermes 4.3 36B (512K context) and the Hermes Agent framework. Covers Function Calling implementation, Ollama setup, hardware requirements, and benchmarks including RefusalBench dominance — updated for 2026.
What Is the Hermes Series? — The Function Calling-Optimized Open-Source LLM
NousResearch Hermes is a series of open-source LLM fine-tunes built on the philosophy of "user-aligned, minimally filtered, and highly steerable." From Hermes 1 (2023) through Hermes 4.3 (late 2025), the series has consistently led open-source LLMs in Function Calling reliability and agentic use cases. In March 2026, NousResearch also released "Hermes Agent," an open-source agent framework that has already surpassed 40,000 GitHub stars.
Hermes Series Evolution Timeline
| Version | Base Model | Release | Key Feature |
|---|---|---|---|
| Hermes 1 | LLaMA 1 13B | Early 2023 | First high-quality Nous fine-tune |
| OpenHermes 2.5 | Mistral 7B | 2023 | 1M-sample dataset |
| Hermes 2 Pro | Llama 3 8B/70B | 2024 | Dedicated Function Calling tokens |
| Hermes 3 | Llama 3.1 8B/70B/405B | August 2024 | 128K context, agent capabilities |
| DeepHermes 3 | Llama 3 / Mistral 24B | February 2025 | Switchable reasoning mode |
| Hermes 4 | Llama 3.1 70B/405B | August 2025 | Hybrid reasoning, RefusalBench #1 |
| Hermes 4.3 | ByteDance Seed 36B | December 2025 | 512K context, distributed training (Solana) |
Hermes Model Family Lineage
Hermes 4.3 36B — The Latest Flagship
Hermes 4.3 is the first Hermes fine-tune based on a non-Meta model — ByteDance Seed 36B. It delivers 70B-class performance in a 36B Dense architecture with a remarkable 512K token context window. Training was conducted on the Psyche distributed network (Solana-based), and the model achieves the highest score of any model on RefusalBench.
| Spec | Hermes 4.3 36B |
|---|---|
| Base Model | ByteDance Seed 36B |
| Parameters | 36B (Dense) |
| Context | 512K tokens |
| License | ByteDance Seed License |
| Training Data | ~5M samples / ~60B tokens |
| Training Infra | Psyche distributed network (Solana) |
| VRAM (Q4) | 24–32 GB |
| RefusalBench | Highest score of any model (GPT-4o and Claude: ~17%) |
What Is RefusalBench? — Why Hermes Dominates
RefusalBench measures how an LLM handles legitimate but sensitive-sounding user requests; a higher score means more of those requests are answered rather than unnecessarily refused. GPT-4o and Claude both score around 17%, while Hermes 4 reaches 57% and Hermes 4.3 exceeds that further. This is not about removing safety guardrails — it reflects NousResearch's philosophy of responding to legitimate requests without over-filtering. For AI agents and business automation, where over-refusal breaks workflows, Hermes's high RefusalBench score is a meaningful practical advantage.
Function Calling — Hermes's Greatest Strength
Since Hermes 2 Pro, the series has used dedicated tokens (`<tools>`, `<tool_call>`, `<tool_response>`) enabling streaming-compatible, highly reliable Function Calling. It is widely regarded as the most dependable open-source LLM for FC implementations and also supports structured JSON output conforming to JSON Schema. Below is an example in ChatML prompt format:
```
<|im_start|>system
You are a helpful assistant. You have access to the following tools:
<tools>
[{"name": "get_weather", "description": "Get current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}]
</tools>
<|im_end|>
<|im_start|>user
What is the weather in Tokyo?
<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"location": "Tokyo"}}
</tool_call>
<|im_end|>
<|im_start|>tool
<tool_response>
{"temperature": 18, "condition": "Partly cloudy"}
</tool_response>
<|im_end|>
```
What Is Hermes Agent? — The Agent Framework Released in March 2026
Hermes Agent is an open-source AI agent framework released in March 2026, already garnering 40,000+ GitHub stars. Built around the concept of a "growing agent," it automatically generates and memorizes new skills after completing tasks. Key features include persistent cross-session memory, natural-language Cron scheduling, multi-platform messaging (Telegram, Discord, Slack, LINE, WhatsApp, and more), official Ollama integration, and sub-agent support for parallel task execution.
Hermes Agent Architecture
Hermes Agent vs OpenClaw — Comparison
| Feature | Hermes Agent | OpenClaw |
|---|---|---|
| Release | March 2026 | 2025 |
| GitHub Stars | 40,000+ | — |
| Persistent Memory | Yes | Yes |
| Auto Skill Generation | Yes | No |
| Messaging Integrations | Telegram/Discord/Slack/LINE/WhatsApp etc. | Slack/Discord/LINE etc. |
| Ollama Integration | Official | Official |
| MCP Support | Server mode | Yes |
| Natural Language Cron | Yes | Limited |
| License | MIT | MIT |
Choose Hermes Agent if auto skill generation and broad messaging integrations are priorities. Choose OpenClaw if you need a mature MCP ecosystem or a proven production workflow.
Ollama Support and Installation
| Model | Ollama Command | Notes |
|---|---|---|
| Hermes 3 8B | `ollama run hermes3` | Official library |
| Hermes 3 70B | `ollama run hermes3:70b` | Official library |
| Hermes 4.3 36B | `ollama run HammerAI/hermes-4.3` | Community Modelfile |
| DeepHermes 3 8B | Manual GGUF download | via bartowski |
Hermes 3 is available in Ollama's official library, enabling one-command setup. Hermes 4.3 requires a community-provided Modelfile.
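Once a Hermes model is running locally, the `<tool_call>` blocks it emits in the format shown in the Function Calling section can be extracted with a few lines of Python. A minimal sketch — the regex and function name here are illustrative, not part of any official SDK:

```python
import json
import re

# Matches the dedicated <tool_call> tags Hermes has used since Hermes 2 Pro.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(completion: str) -> list[dict]:
    """Return each emitted call as {'name': ..., 'arguments': {...}}."""
    return [json.loads(block) for block in TOOL_CALL_RE.findall(completion)]

reply = '<tool_call>\n{"name": "get_weather", "arguments": {"location": "Tokyo"}}\n</tool_call>'
calls = parse_tool_calls(reply)
print(calls[0]["name"])  # get_weather
```

In production you would dispatch each parsed call to the matching tool and feed the result back as a `tool` role message, as in the ChatML example above.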
Hardware Requirements
| Model | VRAM (Q4) | Recommended GPU |
|---|---|---|
| Hermes 3 3B | 4–6 GB | RTX 3060 |
| Hermes 3 8B | 8–10 GB | RTX 3080 / RTX 4070 |
| DeepHermes 3 Mistral 24B | 16–20 GB | RTX 3090 / RTX 4090 |
| Hermes 4.3 36B | 24–32 GB | RTX 3090 / RTX 4090 / M3 Max |
| Hermes 4 70B | 40–48 GB | A100 / Dual RTX 3090 |
Hermes 3 3B at Q4 quantization can run on CPU only, though response speed is significantly reduced.
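The Q4 figures above follow roughly from parameter count: weights take about parameters × bits ÷ 8 bytes, plus headroom for the KV cache and runtime. A back-of-the-envelope sketch — the 20% overhead factor is an assumption, and long contexts (especially 512K) need considerably more:

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Weight memory in GB: params (billions) * bits per weight / 8,
    times an assumed ~20% margin for KV cache and runtime buffers."""
    return params_b * bits / 8 * overhead

# Hermes 4.3 36B at Q4: ~21.6 GB before context, which is why the
# table lists 24-32 GB once real context lengths come into play.
print(round(estimate_vram_gb(36, 4), 1))  # 21.6
```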
Top 5 Use Cases for Hermes
1. AI Agent Backend LLM — Hermes's reliable Function Calling makes it the go-to choice for autonomous agent core engines.
2. Chatbot Development — Low filtering enables natural, useful responses in business chatbot deployments.
3. Creative Writing — Hermes 3 is particularly well-regarded for creative text generation tasks.
4. Structured Data Extraction — JSON Schema-compliant output makes it easy to integrate into data pipelines.
5. Multi-Step Reasoning — Hermes 4's hybrid reasoning mode handles complex logical and analytical tasks.
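For the structured-extraction use case, Ollama's `/api/chat` endpoint accepts a JSON Schema in its `format` field, constraining the model to emit schema-valid JSON. A minimal payload sketch, assuming a local Ollama instance serving `hermes3`; the schema itself is illustrative:

```python
import json

# Illustrative schema: extract a contact record from free text.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["name", "email"],
}

payload = {
    "model": "hermes3",
    "messages": [{"role": "user", "content": "Extract: Jane Doe <jane@example.com>"}],
    "format": schema,  # constrains decoding to schema-valid JSON
    "stream": False,
}

# POST this to http://localhost:11434/api/chat with any HTTP client;
# the response's message content should then parse as schema-valid JSON.
print(json.dumps(payload)[:40])
```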
Japanese Language Support — What to Expect
Hermes models are primarily trained on English data. For Japanese natural language generation, Qwen 3.5 or Gemma 4 are recommended instead. However, since Function Calling and JSON output are largely language-agnostic, Hermes remains fully usable for tool integration and data extraction workflows — even when user inputs are in Japanese. Structuring prompts so that tool calls operate in English while accepting Japanese inputs is a practical approach.
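That split — English tool schemas, Japanese user input — can be expressed directly in the message list. A minimal sketch reusing the illustrative `get_weather` tool from the Function Calling section; the system prompt wording is an assumption, not an official recommendation:

```python
# System prompt and tool schema stay in English; the user turn is Japanese.
messages = [
    {
        "role": "system",
        "content": (
            "You are a helpful assistant. Call tools with English argument "
            "values. Reply to the user in their own language."
        ),
    },
    {"role": "user", "content": "東京の天気を教えて"},  # "What's the weather in Tokyo?"
]

tools = [{
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

print(messages[1]["content"])
```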
DeepHermes 3's Switchable Reasoning Mode
DeepHermes 3 was the first model to allow reasoning mode to be toggled via the system prompt. When enabled, it generates extended reasoning chains enclosed in `<think>...</think>` tags, significantly improving scores on math and logic tasks. Disabling it for conversational tasks keeps latency low. This dual-mode capability means a single model can cover both routine and high-precision tasks — a practical advantage for production deployments.
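When reasoning mode is enabled, the `<think>...</think>` trace usually should not be shown to end users. A small helper to strip it before display; the tag format is from the model card, while the function itself is just a sketch:

```python
import re

def strip_think(text: str) -> str:
    """Drop <think>...</think> reasoning traces, keeping the final answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think("<think>18 C, mild...</think>It's 18 degrees and partly cloudy."))
```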
The Psyche Network — Decentralized Training on Solana
Hermes 4.3 was trained on the Psyche distributed network. Using the DisTrO optimizer, training was distributed across multiple data centers over the internet, with consensus on compute contributions and rewards secured by the Solana blockchain. This represents a significant innovation: large-scale model training without reliance on any single centralized compute provider, demonstrating that decentralized AI training is viable at frontier scale.
Frequently Asked Questions (FAQ)
Q: What is the difference between Hermes and Llama?
A: Llama is a foundation model released by Meta. Hermes is a fine-tune of Llama (and other base models) using NousResearch's proprietary high-quality datasets, optimized for Function Calling and agentic tasks.

Q: Can Hermes be used commercially?
A: It depends on the version. Hermes 3 follows the Meta Llama License; Hermes 4.3 follows the ByteDance Seed License. Review each license before commercial deployment.

Q: Is Function Calling better than other open-source models?
A: Yes — Hermes is considered the most reliable open-source LLM for Function Calling, thanks to its dedicated token system introduced in Hermes 2 Pro and continuously refined since.

Q: Should I use Hermes Agent or OpenClaw?
A: Choose Hermes Agent if you need automatic skill generation and broad messaging platform support. Choose OpenClaw if you want a mature MCP ecosystem and a proven production track record.

Q: Can Hermes handle Japanese?
A: Natural Japanese text generation is limited. However, for Function Calling and JSON output use cases, it is practical. For high-quality Japanese generation, Qwen 3.5 or Gemma 4 are recommended.

Q: Is the 512K context of Hermes 4.3 actually useful?
A: Yes — it enables whole-codebase comprehension, long-document Q&A, and extended conversation memory, all of which are valuable in real-world deployments.

Q: Can I run Hermes without a GPU?
A: Hermes 3 3B at Q4 quantization can run on CPU only, but response speed will be significantly reduced. A GPU is recommended for practical use.

Q: Which model is recommended?
A: With 24 GB VRAM, Hermes 4.3 36B (Q4) is the best choice. With 8–10 GB, Hermes 3 8B offers the best balance of performance and accessibility.
Oflight's Generative AI Integration Support
Oflight provides end-to-end support for deploying Hermes and other local LLMs in production — from Function Calling implementation and Hermes Agent integration to on-premises AI agent architecture. Whether you're evaluating models or building a full agentic system, our team covers model selection through deployment. Learn more about our AI consulting services