Tulu‑2‑DPO‑13B

Alignment First, Preference‑Tuned Chat Intelligence

What is Tulu‑2‑DPO‑13B?

Tulu‑2‑DPO‑13B is a 13‑billion‑parameter LLaMA‑2 model from the Allen Institute for AI (AI2), fine‑tuned with Direct Preference Optimization (DPO) for robust, preference-aligned instruction following. It builds on a supervised fine-tuned (SFT) model trained on a wide mix of public and synthetic instruction datasets including Alpaca, Baize, FLAN, GPTeacher, and Code‑Alpaca, then enhanced via DPO on human preference data to improve reasoning, multi-turn dialogue, and instruction coherence (Hugging Face).

This model is part of the Tulu‑2 family and is released under the AI2 ImpACT Low-Risk license, making it one of the most openly accessible yet high-performance chat models in its class.
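
The model can be loaded directly with Hugging Face transformers. The snippet below is a minimal sketch, not an official recipe: it assumes the allenai/tulu-2-dpo-13b repository ID, enough GPU memory for 13B weights in fp16 (roughly 26 GB), and the <|user|>/<|assistant|> prompt format documented on the model card.

```python
# Minimal sketch: loading Tulu-2-DPO-13B with Hugging Face transformers.
# Assumptions: the "allenai/tulu-2-dpo-13b" repo ID and a GPU large enough
# for fp16 13B weights; adjust torch_dtype/device_map for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/tulu-2-dpo-13b"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Tulu models use a simple "<|user|> ... <|assistant|>" chat format.
prompt = "<|user|>\nExplain Direct Preference Optimization in two sentences.\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```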

Key Features of Tulu‑2‑DPO‑13B

LLaMA‑2 13B Instruction Core

  • Based on Meta’s LLaMA‑2, with strong performance across reasoning, dialogue, and instructional tasks.

Supervised + DPO Tuning

  • Trained first with supervised learning (SFT) and then preference-tuned using Direct Preference Optimization, mimicking human-aligned choices for improved answer quality.

Benchmark Excellence

  • Scores 7.00 on MT‑Bench and achieves an 89.5% win rate on AlpacaEval, outperforming many comparable models in preference-aligned instruction following.

Compatible with GGUF & GPTQ

  • Available in community-built formats (e.g., GGUF, GPTQ) for use with llama.cpp, AutoGPTQ, and vLLM, enabling fast, low‑RAM, offline deployment (via TheBloke & others); a quantized-inference sketch follows this list.

Licensed for Research & Application

  • Released under the AI2 ImpACT Low-Risk License, ideal for academic, research, and enterprise experimentation.
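
For the quantized route mentioned above, a GGUF build can run fully on CPU through llama-cpp-python. This is a minimal sketch under assumptions: the local file name and the Q4_K_M quantization level are placeholders for whichever community build (e.g., one of TheBloke's releases) you actually download.

```python
# Minimal sketch of offline, CPU-friendly inference from a community GGUF build.
# The model_path below is a hypothetical local file name; use the quantized
# file you downloaded (quantization level affects RAM use and quality).
from llama_cpp import Llama

llm = Llama(
    model_path="./tulu-2-dpo-13b.Q4_K_M.gguf",  # placeholder file name
    n_ctx=4096,    # context window
    n_threads=8,   # CPU threads to use
)

# Same Tulu chat format as the full-precision model.
prompt = "<|user|>\nSummarize the benefits of DPO over RLHF.\n<|assistant|>\n"
result = llm(prompt, max_tokens=256, stop=["<|user|>"])
print(result["choices"][0]["text"].strip())
```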

Use Cases of Tulu‑2‑DPO‑13B

Instruction‑Following Agents

  • Perfect for assistants that follow step-by-step commands or respond helpfully to complex instructions.

Chatbots & Reasoning Assistants

  • Use in helpdesks, internal tools, or educational chat systems for accurate, preference-tuned dialogue; a self-hosted serving sketch follows this list.

Code, Math, and Multi‑Tasking

  • Handles chain-of-thought reasoning, simple coding problems, and structured decision-making.

Private AI Deployments

  • Run fully offline using quantized models for security-sensitive applications; no internet connection or external API is required.

Open Research & Fine-Tuning

  • Suitable for labs testing fine-tuning workflows, instruction tuning methods, or building instruction-aligned variants.
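
For self-hosted chatbots and internal tools, vLLM can serve the model without any external API. The sketch below shows batched offline generation; the repository ID is an assumption, and a GPU with enough memory for the 13B weights (or a community GPTQ/AWQ build) is required.

```python
# Minimal sketch of batched generation with vLLM, assuming the
# "allenai/tulu-2-dpo-13b" repo ID and a sufficiently large GPU.
from vllm import LLM, SamplingParams

llm = LLM(model="allenai/tulu-2-dpo-13b")  # assumed Hugging Face repo ID
params = SamplingParams(temperature=0.7, max_tokens=256)

# Prompts follow the Tulu "<|user|> ... <|assistant|>" chat format.
prompts = [
    "<|user|>\nDraft a short onboarding checklist for new engineers.\n<|assistant|>\n",
    "<|user|>\nWhat is 17 * 24? Show your reasoning.\n<|assistant|>\n",
]

for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())
```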

Tulu‑2‑DPO‑13B vs Comparable 13B Chat Models

| Feature | Tulu-2-SFT-13B | Tulu-2-DPO-13B | LLaMA-2-Chat-13B |
| --- | --- | --- | --- |
| Base Model | LLaMA-2-13B | LLaMA-2-13B | LLaMA-2-13B |
| Tuning Type | Supervised FT (SFT) | SFT + DPO | RLHF (Meta) |
| MT-Bench Score | 6.70 | 7.00 | ~6.5 |
| AlpacaEval Win Rate | 78.9% | 89.5% | ~70–75% |
| Quantization Support | GGUF, GPTQ | GGUF, GPTQ | GGUF, GPTQ |
| License | AI2 ImpACT | AI2 ImpACT | Llama 2 Community |

The Future

Your Aligned 13B Open Chat Assistant

Tulu‑2‑DPO‑13B delivers one of the most capable instruction-tuned experiences among 13B models. It’s ideal for users seeking full transparency in datasets and training methodology, reliable offline usage through GGUF or GPTQ formats, and preference-aligned behavior, all without the complexity of RLHF. With a clear license for research and internal deployment, Tulu‑2‑DPO‑13B is a trustworthy choice for aligned, open AI development.

Get Started with Tulu‑2‑DPO‑13B

Ready to deploy an aligned 13B assistant for your organization or research project? Contact Zignuts to integrate, fine-tune, or quantize Tulu‑2‑DPO‑13B for your infrastructure: secure, scalable, and open.
