# HyperCLOVAX-SEED-Think-32B-heretic

HyperCLOVAX-SEED-Think-32B-heretic is a variant of naver-hyperclovax/HyperCLOVAX-SEED-Think-32B modified with post-hoc weight editing to reduce its tendency to over-refuse.


## Model Summary

- Base model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
- Weights: BF16 (safetensors)
- Method: targeted post-hoc weight editing
- Goal: reduce over-refusal on benign/borderline prompts while keeping the output distribution close to the base model
- Observed drift: small (see the KL metric below)

## What’s Changed

This variant applies focused edits to the attention output (`attn.o_proj`) and MLP down (`mlp.down_proj`) projections to suppress refusal-related behavior. The as-run parameters are listed below, followed by a sketch of how edits of this family are typically applied.

## Editing Parameters (as-run)

- direction_index = 42.77
- attn.o_proj.max_weight = 1.13
- attn.o_proj.max_weight_position = 67.44
- attn.o_proj.min_weight = 0.46
- attn.o_proj.min_weight_distance = 25.36
- mlp.down_proj.max_weight = 1.49
- mlp.down_proj.max_weight_position = 43.36
- mlp.down_proj.min_weight = 0.97
- mlp.down_proj.min_weight_distance = 26.08
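
The exact semantics of these parameters are defined by the editing tool, but they are consistent with the directional-ablation family of weight edits: a "refusal direction" is estimated from activations, and each projection that writes into the residual stream has its output component along that direction scaled down, with per-layer strength peaking at `max_weight_position`. A minimal sketch under those assumptions (the linear-taper kernel and helper names are illustrative, not the tool's documented behavior):

```python
# Hedged sketch of directional ablation; NOT the exact procedure used here.
# Assumptions: a unit "refusal direction" r estimated from activations
# (direction_index plausibly selects/interpolates per-layer candidates),
# and a per-layer strength that peaks at max_weight_position and tapers
# linearly to min_weight at min_weight_distance layers away.
import torch

def layer_strength(layer: int, max_w: float, max_pos: float,
                   min_w: float, min_dist: float) -> float:
    """Assumed linear taper from max_w (at layer max_pos) down to min_w."""
    t = min(abs(layer - max_pos) / min_dist, 1.0)
    return (1.0 - t) * max_w + t * min_w

def ablate(W: torch.Tensor, r: torch.Tensor, alpha: float) -> torch.Tensor:
    """W' = (I - alpha * r r^T) W for a projection W that maps into the
    residual stream (e.g. attn.o_proj / mlp.down_proj, shape [d_model, d_in])."""
    r = r / r.norm()
    return W - alpha * torch.outer(r, r @ W)
```

For example, under this reading `attn.o_proj` at layer `i` would be edited with `alpha = layer_strength(i, 1.13, 67.44, 0.46, 25.36)` using the values listed above.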

## Behavior / Drift Indicators

- Refusals: 10 / 100
- KL divergence vs. base model: 0.1123

Note: these values depend on the prompt set and the refusal criterion; the evaluation setup is sketched in the notes below.
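
A drift metric of this kind can be estimated by comparing token-level output distributions of the edited and base models on a shared prompt set. A minimal sketch (not the exact script behind the numbers above; the prompt list is a placeholder, and loading both 32B models requires substantial memory):

```python
# Estimate mean token-level KL(edited || base) over a prompt set.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

BASE_ID = "naver-hyperclovax/HyperCLOVAX-SEED-Think-32B"
EDIT_ID = "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic"

tok = AutoTokenizer.from_pretrained(EDIT_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16, device_map="auto")
edit = AutoModelForCausalLM.from_pretrained(EDIT_ID, torch_dtype=torch.bfloat16, device_map="auto")

@torch.no_grad()
def mean_kl(prompts: list[str]) -> float:
    kls = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        # Compare next-token distributions position by position (fp32, on CPU).
        logp_b = F.log_softmax(base(**ids.to(base.device)).logits.float().cpu(), dim=-1)
        logp_e = F.log_softmax(edit(**ids.to(edit.device)).logits.float().cpu(), dim=-1)
        kl = (logp_e.exp() * (logp_e - logp_b)).sum(-1)  # per-position KL(edited || base)
        kls.append(kl.mean().item())
    return sum(kls) / len(kls)
```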


## Evaluation Notes

### Refusal Definition (edit as needed)

A completion is counted as a refusal if it:

- explicitly denies compliance (e.g., “I can’t / I won’t”), and
- does not provide a meaningful safe alternative or partial completion.
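
As an illustration only, a crude string-matching heuristic for this definition might look like the following; real evaluations typically use an LLM judge or a trained classifier, and the marker lists here are hypothetical:

```python
# Crude refusal detector matching the definition above (illustrative only).
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable", "i am unable")
ALTERNATIVE_MARKERS = ("instead", "alternatively", "what i can do", "a safer way")

def is_refusal(completion: str) -> bool:
    text = completion.lower()
    denies = any(m in text for m in REFUSAL_MARKERS)
    offers_alternative = any(m in text for m in ALTERNATIVE_MARKERS)
    return denies and not offers_alternative
```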

### Prompt Set

- prompt mix: [benign / borderline / policy-sensitive]
- sample size: 100
- source: [private/internal, or describe here if it can be disclosed]

## Intended Use

### Recommended

- General chat
- Creative writing / brainstorming
- Everyday Q&A where over-refusal hurts usability
- Research on refusal behavior, steering, and drift tradeoffs

### Not Recommended (without extra guardrails)

- Public-facing deployment without moderation/filters
- High-stakes domains (medical/legal/financial)
- Any use that requires strict compliance guarantees

## Safety & Risks

Reducing refusals can increase the chance that the model responds in situations where the base model would refuse. For real deployments, consider:

- input filtering / output moderation (a minimal hook is sketched below)
- rate limits & logging
- a clear acceptable-use policy and enforcement
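
As a placeholder for where such a filter hooks in (a real deployment should use a dedicated moderation model or API, not a keyword list):

```python
# Trivial keyword-based input filter; the blocklist entries are hypothetical.
BLOCKLIST = ("<banned phrase 1>", "<banned phrase 2>")

def check_input(user_message: str) -> bool:
    """Return False to block the request before it reaches the model."""
    lowered = user_message.lower()
    return not any(term in lowered for term in BLOCKLIST)
```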

Known limitations:

- side effects may exist (tone shift, verbosity changes, occasional riskier completions)
- evaluation is not exhaustive; additional red-teaming is recommended

## GGUF (llama.cpp) Inference

This repository also provides an F16 GGUF build under `gguf/`, intended for use with llama.cpp.

### Run with llama-server (Thinking ON)

The command below enables the model's "thinking" mode via `--chat-template-kwargs`.

Linux / macOS:

```bash
./llama-server \
  -m {PATH}/HyperCLOVAX-SEED-Think-32B-heretic2.f16.gguf \
  --host 0.0.0.0 --port 10000 \
  --jinja \
  --chat-template-kwargs '{"thinking":true,"enable_thinking":true}' \
  -cb -fa on
```
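
Once the server is running, it exposes an OpenAI-compatible HTTP API. A minimal Python client (the port matches the command above; payload parameters are illustrative):

```python
# Query llama-server's OpenAI-compatible chat endpoint.
import requests

resp = requests.post(
    "http://localhost:10000/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain KL divergence in simple terms."}
        ],
        "max_tokens": 512,
        "temperature": 0.7,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```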

---

## How to Use

### Transformers (example)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic"  # <- your repo id

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain KL divergence in simple terms."},
]

# Build the generation prompt with the model's chat template:
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tok.decode(out[0], skip_special_tokens=True))
```