# HyperCLOVAX-SEED-Think-32B-heretic

HyperCLOVAX-SEED-Think-32B-heretic is a variant of naver-hyperclovax/HyperCLOVAX-SEED-Think-32B in which post-hoc weight editing has been applied to reduce the base model's tendency toward over-refusal.
## Model Summary

- Base model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
- Weights: BF16 (safetensors)
- Method: targeted post-hoc weight editing
- Goal: reduce over-refusal on benign/borderline prompts while keeping the output distribution close to the base model
- Observed drift: small (see the KL metric below)
## What’s Changed

This variant applies focused edits to the attention and MLP projection layers (attn.o_proj, mlp.down_proj) to shift refusal-related behavior.
## Editing Parameters (as-run)

- direction_index = 42.77
- attn.o_proj.max_weight = 1.13
- attn.o_proj.max_weight_position = 67.44
- attn.o_proj.min_weight = 0.46
- attn.o_proj.min_weight_distance = 25.36
- mlp.down_proj.max_weight = 1.49
- mlp.down_proj.max_weight_position = 43.36
- mlp.down_proj.min_weight = 0.97
- mlp.down_proj.min_weight_distance = 26.08
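These parameters come from the editing tool's run configuration. For orientation only, here is a minimal sketch of the general directional-ablation technique such refusal edits are typically based on; the `refusal_dir` vector, the `scale` argument, and the function itself are illustrative assumptions, not the exact procedure used for this model:

```python
import torch

def ablate_direction(weight: torch.Tensor, refusal_dir: torch.Tensor, scale: float) -> torch.Tensor:
    """Project a refusal direction out of a weight matrix's output space.

    weight:      (d_out, d_in), e.g. attn.o_proj or mlp.down_proj
    refusal_dir: (d_out,) direction estimated from activation differences
    scale:       per-layer strength (0 = no edit, 1 = full projection)
    """
    r = refusal_dir / refusal_dir.norm()  # unit-normalize the direction
    # W' = W - scale * r (r^T W): remove the component that writes along r
    return weight - scale * torch.outer(r, r @ weight)
```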
## Behavior / Drift Indicators
- Refusals: 10 / 100
- KL divergence: 0.1123
Note: these values depend on your prompt set and refusal criterion.
If you can disclose the evaluation slice (even roughly), add it in the section below.
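For reference, a KL figure like the one above can be produced by averaging the per-token KL between the base and edited models' next-token distributions. A minimal sketch, assuming both models fit in memory, share a tokenizer, and are evaluated on the same `input_ids` (the prompt handling and averaging scheme here are assumptions):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_kl(base_model, edited_model, input_ids: torch.Tensor) -> float:
    """Average KL(base || edited) over next-token distributions for one prompt."""
    logp_base = F.log_softmax(base_model(input_ids).logits, dim=-1)
    logp_edit = F.log_softmax(edited_model(input_ids).logits, dim=-1)
    # KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x)), averaged over positions
    kl = (logp_base.exp() * (logp_base - logp_edit)).sum(dim=-1)
    return kl.mean().item()
```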
## Evaluation Notes

### Refusal Definition (edit as needed)
A completion is counted as a refusal if it:
- explicitly denies compliance (e.g., “I can’t / I won’t”), and
- does not provide a meaningful safe alternative or partial completion.
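A crude keyword heuristic along the following lines could implement the first clause; the phrase list and the 200-character window are illustrative assumptions, not the criterion actually used for the 10/100 figure above:

```python
import re

# Hypothetical phrase list; tune to your own refusal criterion.
REFUSAL_PATTERNS = [
    r"\bI can(?:'|no)t\b",
    r"\bI won'?t\b",
    r"\bI'?m (?:not able|unable) to\b",
    r"\bI must decline\b",
]

def looks_like_refusal(completion: str) -> bool:
    """Flag completions that open with an explicit denial of compliance."""
    head = completion.strip()[:200]  # refusals usually appear up front
    return any(re.search(p, head, flags=re.IGNORECASE) for p in REFUSAL_PATTERNS)
```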
### Prompt Set

- Prompt mix: [benign / borderline / policy-sensitive]
- Sample size: 100
- Source: [private/internal, or describe here if it can be disclosed]
## Intended Use

### Recommended
- General chat
- Creative writing / brainstorming
- Everyday Q&A where over-refusal hurts usability
- Research on refusal behavior, steering, and drift tradeoffs
### Not Recommended (without extra guardrails)
- Public-facing deployment without moderation/filters
- High-stakes domains (medical/legal/financial)
- Any use that requires strict compliance guarantees
## Safety & Risks
Reducing refusals can increase the chance that the model responds in situations where the base model would refuse. For real deployments, consider:
- input filtering / output moderation (see the sketch after this list)
- rate limits & logging
- clear acceptable-use policy and enforcement
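As an illustration of the output-moderation point, a minimal wrapper sketch; the blocklist and fallback message are placeholders, and a real deployment would use a dedicated moderation model or API instead:

```python
from typing import Callable

BLOCKED_TERMS = ["example-banned-term"]  # placeholder; use a real moderation signal

def moderated_generate(generate_fn: Callable[[str], str], prompt: str) -> str:
    """Run generation, then withhold outputs that trip the moderation check."""
    text = generate_fn(prompt)
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return "[response withheld by output filter]"
    return text
```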
Known limitations:
- side effects may exist (tone shift, verbosity changes, occasional riskier completions)
- evaluation is not exhaustive; additional red-teaming is recommended
## GGUF (llama.cpp) Inference

This repository also provides an F16 GGUF build under gguf/, intended for use with llama.cpp.

### Run with llama-server (Thinking ON)

This command enables the model's "thinking" behavior via --chat-template-kwargs.

**Linux / macOS**
```bash
./llama-server \
  -m {PATH}/HyperCLOVAX-SEED-Think-32B-heretic2.f16.gguf \
  --host 0.0.0.0 --port 10000 \
  --jinja \
  --chat-template-kwargs '{"thinking":true,"enable_thinking":true}' \
  -cb -fa on
```
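Once the server is running, it can be queried through llama-server's OpenAI-compatible chat endpoint. A minimal sketch, assuming the host/port from the command above:

```python
import requests

# Assumes llama-server from the command above is listening on localhost:10000.
resp = requests.post(
    "http://localhost:10000/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain KL divergence in simple terms."},
        ],
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```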
---
## How to Use
### Transformers (example)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic"  # <- your repo id

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain KL divergence in simple terms."},
]

# If the tokenizer provides a chat template:
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
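If the model's chat template honors the same switches passed via --chat-template-kwargs in the GGUF section (an assumption; inspect tok.chat_template to confirm), thinking can likely be toggled when building the prompt:

```python
# Assumption: the template accepts the same kwargs as --chat-template-kwargs above.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking=True,
    enable_thinking=True,
)
```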