株式会社オブライト
AI | 2026-04-13

NousResearch Hermes Complete Guide — Hermes 4.3 36B, Function Calling & Hermes Agent [2026]

Complete guide to NousResearch Hermes 4.3 36B (512K context) and the Hermes Agent framework. Covers Function Calling implementation, Ollama setup, hardware requirements, and benchmarks including RefusalBench dominance — updated for 2026.


What Is the Hermes Series? — The Function Calling-Optimized Open-Source LLM

NousResearch Hermes is a series of open-source LLM fine-tunes built on the philosophy of "user-aligned, minimally filtered, and highly steerable." From Hermes 1 (2023) through Hermes 4.3 (December 2025), the series has been a leading open-source option for reliable Function Calling and agentic use cases. In March 2026, NousResearch also released "Hermes Agent," an open-source agent framework that has already surpassed 40,000 GitHub stars.

Hermes Series Evolution Timeline

| Version | Base Model | Release | Key Feature |
| --- | --- | --- | --- |
| Hermes 1 | LLaMA 1 13B | Early 2023 | First high-quality Nous fine-tune |
| OpenHermes 2.5 | Mistral 7B | 2023 | 1M-sample dataset |
| Hermes 2 Pro | Llama 3 8B/70B | 2024 | Dedicated Function Calling tokens |
| Hermes 3 | Llama 3.1 8B/70B/405B | August 2024 | 128K context, agent capabilities |
| DeepHermes 3 | Llama 3 / Mistral 24B | February 2025 | Switchable reasoning mode |
| Hermes 4 | Llama 3.1 70B/405B | August 2025 | Hybrid reasoning, RefusalBench #1 |
| Hermes 4.3 | ByteDance Seed 36B | December 2025 | 512K context, distributed training (Solana) |

Hermes Model Family Lineage

[Diagram: Hermes model family lineage]

Hermes 4.3 36B — The Latest Flagship

Unlike the Llama-based Hermes 3 and Hermes 4 flagships, Hermes 4.3 is fine-tuned from ByteDance Seed 36B. It delivers 70B-class performance in a 36B dense architecture with a 512K-token context window. Training was conducted on the Psyche distributed network (Solana-based), and the model achieves the highest score of any model on RefusalBench.

| Spec | Hermes 4.3 36B |
| --- | --- |
| Base Model | ByteDance Seed 36B |
| Parameters | 36B (Dense) |
| Context | 512K tokens |
| License | ByteDance Seed License |
| Training Data | ~5M samples / ~60B tokens |
| Training Infra | Psyche distributed network (Solana) |
| VRAM (Q4) | 24–32 GB |
| RefusalBench | Highest of all models (vs. GPT-4o at 17%, Claude at 17%) |

What Is RefusalBench? — Why Hermes Dominates

RefusalBench measures how readily an LLM answers requests that models often refuse unnecessarily. The score is the share of prompts the model answers rather than refuses, so a higher score means fewer refusals. GPT-4o and Claude both score around 17%, while Hermes 4 reaches 57% and Hermes 4.3 exceeds that further. This is not about removing safety guardrails; it reflects NousResearch's philosophy of responding to legitimate requests without over-filtering. For AI agents and business automation, where an unnecessary refusal in the middle of a workflow breaks the whole chain, a high RefusalBench score is a meaningful practical advantage.

Function Calling — Hermes's Greatest Strength

Since Hermes 2 Pro, the series has used dedicated tokens (`<tools>`, `<tool_call>`, `<tool_response>`) to delimit tool definitions, tool calls, and tool results. Because the delimiters are explicit, a tool call can be detected in a streamed response and parsed reliably. Hermes is often cited as one of the most dependable open-source LLMs for Function Calling, and it also supports structured JSON output conforming to a JSON Schema. Below is an example in ChatML prompt format:

```text
<|im_start|>system
You are a helpful assistant. You have access to the following tools:
<tools>
[{"name": "get_weather", "description": "Get current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}}]
</tools>
<|im_end|>
<|im_start|>user
What is the weather in Tokyo?
<|im_end|>
<|im_start|>assistant
<tool_call>
{"name": "get_weather", "arguments": {"location": "Tokyo"}}
</tool_call>
<|im_end|>
<|im_start|>tool
<tool_response>
{"temperature": 18, "condition": "Partly cloudy"}
</tool_response>
<|im_end|>
```
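
On the application side, the exchange above amounts to three steps: embed the tool definitions in the system prompt, pull the `<tool_call>` JSON out of the model's reply, and wrap each tool result in `<tool_response>` tags. The following is a minimal sketch of those steps, assuming the raw model output is available as a string. The helper names (`build_system_prompt`, `extract_tool_calls`, `run_tool_calls`) and the stub `get_weather` are hypothetical and not part of any Hermes library.

```python
import json
import re

# Matches each <tool_call>...</tool_call> block in raw model output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def build_system_prompt(tools):
    """Embed tool definitions inside <tools> tags, as in the prompt above."""
    return (
        "You are a helpful assistant. You have access to the following tools:\n"
        f"<tools>\n{json.dumps(tools)}\n</tools>"
    )


def extract_tool_calls(model_output):
    """Return the parsed JSON object of every well-formed <tool_call> block."""
    calls = []
    for raw in TOOL_CALL_RE.findall(model_output):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the agent loop
        if isinstance(parsed, dict):
            calls.append(parsed)
    return calls


def get_weather(location):
    """Stub tool for illustration; a real tool would query a weather service."""
    return {"temperature": 18, "condition": "Partly cloudy"}


# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(model_output):
    """Run each requested tool and wrap its result for the `tool` turn."""
    responses = []
    for call in extract_tool_calls(model_output):
        func = TOOLS.get(call.get("name"))
        args = call.get("arguments") or {}
        if func is None or not isinstance(args, dict):
            continue  # unknown tool or malformed arguments: ignored in this sketch
        result = func(**args)
        responses.append(f"<tool_response>\n{json.dumps(result)}\n</tool_response>")
    return responses
```

Each string returned by `run_tool_calls` would be sent back to the model as the content of a `tool` turn. A production loop would also report unknown tools and argument errors back to the model rather than silently skipping them.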

What Is Hermes Agent? — The Agent Framework Released in March 2026

Hermes Agent is an open-source AI agent framework released in March 2026 that has already garnered 40,000+ GitHub stars. Built around the concept of a "growing agent," it automatically generates and stores new skills after completing tasks. Key features include persistent cross-session memory, natural-language Cron scheduling, multi-platform messaging (Telegram, Discord, Slack, LINE, WhatsApp, and more), official Ollama integration, and sub-agent support for parallel task execution.

Hermes Agent Architecture

[Diagram: Hermes Agent architecture]

Hermes Agent vs OpenClaw — Comparison

| Feature | Hermes Agent | OpenClaw |
| --- | --- | --- |
| Release | March 2026 | 2025 |
| GitHub Stars | 40,000+ | (not listed) |
| Persistent Memory | Yes | Yes |
| Auto Skill Generation | Yes | No |
| Messaging Integrations | Telegram/Discord/Slack/LINE/WhatsApp etc. | Slack/Discord/LINE etc. |
| Ollama Integration | Official | Official |
| MCP Support | Server mode | Yes |
| Natural Language Cron | Yes | Limited |
| License | MIT | MIT |

Choose Hermes Agent if auto skill generation and broad messaging integrations are priorities. Choose OpenClaw if you need a mature MCP ecosystem or a proven production workflow.

Ollama Support and Installation

| Model | Ollama Command | Notes |
| --- | --- | --- |
| Hermes 3 8B | `ollama run hermes3` | Official library |
| Hermes 3 70B | `ollama run hermes3:70b` | Official library |
| Hermes 4.3 36B | `ollama run HammerAI/hermes-4.3` | Community Modelfile |
| DeepHermes 3 8B | Manual GGUF download | via bartowski |

Hermes 3 is available in Ollama's official library, enabling one-command setup. Hermes 4.3 requires a community-provided Modelfile.
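
Once a model has been pulled, it can also be called from code through Ollama's local REST API. The sketch below uses only the Python standard library and assumes Ollama is running on its default port (11434) with `hermes3` already pulled. The helper names and the `WEATHER_TOOL` definition are illustrative, and whether a given community build supports tool calling through this endpoint is something to verify per model.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

# Illustrative tool definition in the function-style schema Ollama accepts.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}


def build_chat_request(model, user_text, tools=None):
    """Build the JSON payload for a single non-streaming chat turn."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }
    if tools:
        payload["tools"] = tools
    return payload


def chat(payload, url=OLLAMA_CHAT_URL, timeout=120):
    """POST the payload to a running Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Needs a local Ollama server, so it is defined but not called here."""
    reply = chat(build_chat_request("hermes3", "What is the weather in Tokyo?", [WEATHER_TOOL]))
    message = reply.get("message", {})
    # Print the requested tool calls if any, otherwise the plain text answer.
    print(message.get("tool_calls") or message.get("content"))
```

Calling `demo()` with the server running prints either the tool calls the model requested or its text answer. Keeping payload construction separate from the network call makes the request shape easy to inspect and test offline.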

Hardware Requirements

| Model | VRAM (Q4) | Recommended GPU |
| --- | --- | --- |
| Hermes 3 3B | 4–6 GB | RTX 3060 |
| Hermes 3 8B | 8–10 GB | RTX 3080 / RTX 4070 |
| DeepHermes 3 Mistral 24B | 16–20 GB | RTX 3090 / RTX 4090 |
| Hermes 4.3 36B | 24–32 GB | RTX 3090 / RTX 4090 / M3 Max |
| Hermes 4 70B | 40–48 GB | A100 / Dual RTX 3090 |

Hermes 3 3B at Q4 quantization can run on CPU only, though response speed is significantly reduced.
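
The VRAM figures above can be sanity-checked with a rule of thumb: quantized weight size is roughly parameter count times bits per weight divided by 8, plus headroom for the KV cache and runtime buffers. The sketch below assumes about 4.5 bits per weight for a Q4 quantization and a flat 4 GB of overhead. Both values are illustrative assumptions, not measurements, and real usage grows with context length.

```python
def estimate_weight_gb(params_billion, bits_per_weight=4.5):
    """Approximate quantized weight size in GB (1 GB = 10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def estimate_vram_gb(params_billion, bits_per_weight=4.5, overhead_gb=4.0):
    """Weights plus a flat allowance for KV cache and runtime buffers (assumed)."""
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb


if __name__ == "__main__":
    for name, size in [("Hermes 3 8B", 8), ("Hermes 4.3 36B", 36), ("Hermes 4 70B", 70)]:
        print(f"{name}: ~{estimate_vram_gb(size):.1f} GB")
```

With these assumed values the estimates fall inside the ranges in the table. Long-context workloads need a much larger allowance than 4 GB, so treat the result as a lower bound when planning to use a large share of the 512K window.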

Top 5 Use Cases for Hermes

1. AI Agent Backend LLM: Hermes's reliable Function Calling makes it the go-to choice for autonomous agent core engines.
2. Chatbot Development: Low filtering enables natural, useful responses in business chatbot deployments.
3. Creative Writing: Hermes 3 is particularly well-regarded for creative text generation tasks.
4. Structured Data Extraction: JSON Schema-compliant output makes it easy to integrate into data pipelines.
5. Multi-Step Reasoning: Hermes 4's hybrid reasoning mode handles complex logical and analytical tasks.

Japanese Language Support — What to Expect

Hermes models are primarily trained on English data. For Japanese natural language generation, Qwen 3.5 or Gemma 4 are recommended instead. However, since Function Calling and JSON output are largely language-agnostic, Hermes remains fully usable for tool integration and data extraction workflows — even when user inputs are in Japanese. Structuring prompts so that tool calls operate in English while accepting Japanese inputs is a practical approach.
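
One way to apply this approach is to keep the system prompt and tool schema in English while passing the user's Japanese text through unchanged. The following is a minimal sketch. The system prompt wording and the helper name `build_tool_messages` are illustrative assumptions, not a prompt published by NousResearch.

```python
import json


def build_tool_messages(user_text, tools):
    """Pair an English system prompt and tool schema with untranslated user input."""
    system = (
        "You are a tool-calling assistant. The user may write in Japanese. "
        "Always emit tool names and argument keys in English, exactly as defined "
        "in the tool schema, and translate argument values to English when the "
        "tool expects English input.\n"
        # ensure_ascii=False keeps any non-ASCII text in the schema readable.
        f"<tools>\n{json.dumps(tools, ensure_ascii=False)}\n</tools>"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]


# Illustrative schema, matching the weather tool used earlier in this guide.
weather_tool = {
    "name": "get_weather",
    "description": "Get current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

# The user asks in Japanese: "Please tell me the weather in Tokyo."
messages = build_tool_messages("東京の天気を教えてください", [weather_tool])
```

The resulting `messages` list can be sent to any chat endpoint that accepts role/content pairs. The final user-facing answer can still be requested in Japanese in a later turn if the generation quality is sufficient for the use case.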

DeepHermes 3's Switchable Reasoning Mode

DeepHermes 3 was among the first models to let reasoning mode be toggled via the system prompt. When enabled, it generates extended reasoning chains enclosed in `<think>...</think>` tags before the final answer, which significantly improves scores on math and logic tasks. Disabling it for conversational tasks keeps latency low. This dual-mode capability means a single model can cover both routine and high-precision tasks, a practical advantage for production deployments.
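
In application code, the two modes usually mean two things: including the reasoning system prompt only when needed, and stripping the `<think>` block before showing the answer to a user. The following is a minimal sketch of both. The helper names are hypothetical, and the `reasoning_system_prompt` argument stands in for the prompt text given on the model card, which is not reproduced here.

```python
import re

# Matches every <think>...</think> block, including multi-line reasoning.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def build_reasoning_messages(user_text, reasoning_system_prompt=None):
    """Include the reasoning system prompt only when reasoning mode is wanted."""
    messages = []
    if reasoning_system_prompt:
        messages.append({"role": "system", "content": reasoning_system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages


def split_reasoning(model_output):
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(model_output))
    answer = THINK_RE.sub("", model_output).strip()
    return reasoning, answer
```

Logging the reasoning separately while returning only the answer keeps user-facing output short without losing the trace for debugging.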

The Psyche Network — Decentralized Training on Solana

Hermes 4.3 was trained on the Psyche distributed network. Using the DisTrO optimizer, training was distributed across multiple data centers over the internet, with consensus on compute contributions and rewards secured by the Solana blockchain. This is a significant innovation: it shows that a 36B-parameter model can be trained without relying on any single centralized compute provider, which supports the case that decentralized AI training is viable for large models.

Frequently Asked Questions (FAQ)

Q: What is the difference between Hermes and Llama?
A: Llama is a foundation model released by Meta. Hermes is a fine-tune of Llama (and other base models) using NousResearch's proprietary high-quality datasets, optimized for Function Calling and agentic tasks.

Q: Can Hermes be used commercially?
A: It depends on the version. Hermes 3 follows the Meta Llama License; Hermes 4.3 follows the ByteDance Seed License. Review each license before commercial deployment.

Q: Is Hermes's Function Calling better than that of other open-source models?
A: Yes. Hermes is considered the most reliable open-source LLM for Function Calling, thanks to its dedicated token system introduced in Hermes 2 Pro and continuously refined since.

Q: Should I use Hermes Agent or OpenClaw?
A: Choose Hermes Agent if you need automatic skill generation and broad messaging platform support. Choose OpenClaw if you want a mature MCP ecosystem and a proven production track record.

Q: Can Hermes handle Japanese?
A: Natural Japanese text generation is limited. However, it is practical for Function Calling and JSON output use cases. For high-quality Japanese generation, Qwen 3.5 or Gemma 4 are recommended.

Q: Is the 512K context of Hermes 4.3 actually useful?
A: Yes. It enables whole-codebase comprehension, long-document Q&A, and extended conversation memory, all of which are valuable in real-world deployments.

Q: Can I run Hermes without a GPU?
A: Hermes 3 3B at Q4 quantization can run on CPU only, but response speed will be significantly reduced. A GPU is recommended for practical use.

Q: Which model is recommended?
A: With 24 GB VRAM, Hermes 4.3 36B (Q4) is the best choice. With 8–10 GB, Hermes 3 8B offers the best balance of performance and accessibility.

Oflight's Generative AI Integration Support

Oflight provides end-to-end support for deploying Hermes and other local LLMs in production — from Function Calling implementation and Hermes Agent integration to on-premises AI agent architecture. Whether you're evaluating models or building a full agentic system, our team covers model selection through deployment. Learn more about our AI consulting services
