# HyperCLOVAX-SEED-Think-32B-heretic

HyperCLOVAX-SEED-Think-32B-heretic is a variant of naver-hyperclovax/HyperCLOVAX-SEED-Think-32B in which post-hoc weight editing has been applied to reduce the base model's tendency toward over-refusal.
## Model Summary

- Base model: naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
- Weights: BF16 (safetensors)
- Method: targeted post-hoc weight editing
- Goal: reduce over-refusal on benign/borderline prompts while keeping the output distribution close to the base model
- Observed drift: small (see the KL metric below)
## What’s Changed

This variant applies focused edits to the attention and MLP projection layers (attn.o_proj, mlp.down_proj) to shift refusal-related behavior.
## Editing Parameters (as-run)

- direction_index = 42.77
- attn.o_proj.max_weight = 1.13
- attn.o_proj.max_weight_position = 67.44
- attn.o_proj.min_weight = 0.46
- attn.o_proj.min_weight_distance = 25.36
- mlp.down_proj.max_weight = 1.49
- mlp.down_proj.max_weight_position = 43.36
- mlp.down_proj.min_weight = 0.97
- mlp.down_proj.min_weight_distance = 26.08
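These parameters come from the editing tool's run configuration. For orientation only, here is a minimal sketch of the general directional-ablation technique such refusal edits are typically based on; the `refusal_dir` vector, the `scale` argument, and the function itself are illustrative assumptions, not the exact procedure used for this model:

```python
import torch

def ablate_direction(weight: torch.Tensor, refusal_dir: torch.Tensor, scale: float) -> torch.Tensor:
    """Project a refusal direction out of a weight matrix's output space.

    weight:      (d_out, d_in), e.g. attn.o_proj or mlp.down_proj
    refusal_dir: (d_out,) direction estimated from activation differences
    scale:       per-layer strength (0 = no edit, 1 = full projection)
    """
    r = refusal_dir / refusal_dir.norm()  # unit-normalize the direction
    # W' = W - scale * r (r^T W): remove the component that writes along r
    return weight - scale * torch.outer(r, r @ weight)
```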
## Behavior / Drift Indicators
- Refusals: 10 / 100
- KL divergence: 0.1123
Note: these values depend on your prompt set and refusal criterion.
If you can disclose the evaluation slice (even roughly), add it in the section below.
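For reference, a KL figure like the one above can be produced by averaging the per-token KL between the base and edited models' next-token distributions. A minimal sketch, assuming both models fit in memory, share a tokenizer, and are evaluated on the same `input_ids` (the prompt handling and averaging scheme here are assumptions):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_kl(base_model, edited_model, input_ids: torch.Tensor) -> float:
    """Average KL(base || edited) over next-token distributions for one prompt."""
    logp_base = F.log_softmax(base_model(input_ids).logits, dim=-1)
    logp_edit = F.log_softmax(edited_model(input_ids).logits, dim=-1)
    # KL(P || Q) = sum_x P(x) * (log P(x) - log Q(x)), averaged over positions
    kl = (logp_base.exp() * (logp_base - logp_edit)).sum(dim=-1)
    return kl.mean().item()
```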
## Evaluation Notes

### Refusal Definition (edit as needed)
A completion is counted as a refusal if it:
- explicitly denies compliance (e.g., “I can’t / I won’t”), and
- does not provide a meaningful safe alternative or partial completion.
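A crude keyword heuristic along the following lines could implement the first clause; the phrase list and the 200-character window are illustrative assumptions, not the criterion actually used for the 10/100 figure above:

```python
import re

# Hypothetical phrase list; tune to your own refusal criterion.
REFUSAL_PATTERNS = [
    r"\bI can(?:'|no)t\b",
    r"\bI won'?t\b",
    r"\bI'?m (?:not able|unable) to\b",
    r"\bI must decline\b",
]

def looks_like_refusal(completion: str) -> bool:
    """Flag completions that open with an explicit denial of compliance."""
    head = completion.strip()[:200]  # refusals usually appear up front
    return any(re.search(p, head, flags=re.IGNORECASE) for p in REFUSAL_PATTERNS)
```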
### Prompt Set

- Prompt mix: [benign / borderline / policy-sensitive]
- Sample size: 100
- Source: [private/internal, or describe here if it can be disclosed]
## Intended Use

### Recommended
- General chat
- Creative writing / brainstorming
- Everyday Q&A where over-refusal hurts usability
- Research on refusal behavior, steering, and drift tradeoffs
### Not Recommended (without extra guardrails)
- Public-facing deployment without moderation/filters
- High-stakes domains (medical/legal/financial)
- Any use that requires strict compliance guarantees
## Safety & Risks
Reducing refusals can increase the chance that the model responds in situations where the base model would refuse. For real deployments, consider:
- input filtering / output moderation (see the sketch after this list)
- rate limits & logging
- clear acceptable-use policy and enforcement
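As an illustration of the output-moderation point, a minimal wrapper sketch; the blocklist and fallback message are placeholders, and a real deployment would use a dedicated moderation model or API instead:

```python
from typing import Callable

BLOCKED_TERMS = ["example-banned-term"]  # placeholder; use a real moderation signal

def moderated_generate(generate_fn: Callable[[str], str], prompt: str) -> str:
    """Run generation, then withhold outputs that trip the moderation check."""
    text = generate_fn(prompt)
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return "[response withheld by output filter]"
    return text
```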
Known limitations:
- side effects may exist (tone shift, verbosity changes, occasional riskier completions)
- evaluation is not exhaustive; additional red-teaming is recommended
## GGUF (llama.cpp) Inference

This repository also provides an F16 GGUF build under gguf/, intended for use with llama.cpp.

### Run with llama-server (Thinking ON)

This command enables the model's "thinking" behavior via --chat-template-kwargs.

**Linux / macOS**
```bash
./llama-server \
  -m {PATH}/HyperCLOVAX-SEED-Think-32B-heretic2.f16.gguf \
  --host 0.0.0.0 --port 10000 \
  --jinja \
  --chat-template-kwargs '{"thinking":true,"enable_thinking":true}' \
  -cb -fa on
```
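Once the server is running, it can be queried through llama-server's OpenAI-compatible chat endpoint. A minimal sketch, assuming the host/port from the command above:

```python
import requests

# Assumes llama-server from the command above is listening on localhost:10000.
resp = requests.post(
    "http://localhost:10000/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Explain KL divergence in simple terms."},
        ],
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```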
---
## How to Use
### Transformers (example)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "hostkimjang/HyperCLOVAX-SEED-Think-32B-heretic"  # <- your repo id

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain KL divergence in simple terms."},
]

# If the tokenizer provides a chat template:
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tok.decode(out[0], skip_special_tokens=True))
```
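If the model's chat template honors the same switches passed via --chat-template-kwargs in the GGUF section (an assumption; inspect tok.chat_template to confirm), thinking can likely be toggled when building the prompt:

```python
# Assumption: the template accepts the same kwargs as --chat-template-kwargs above.
prompt = tok.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    thinking=True,
    enable_thinking=True,
)
```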