Nemotron-70B → Gemma-3 27B (Text) SVD-LoRA Adapter (Adaptive Rank)

For the Chinese version, see **README_ZH.md**

This repository provides a PEFT LoRA adapter for Changgil/google-gemma-3-27b-it-text, distilled from nvidia/Llama-3.1-Nemotron-70B-Instruct-HF using weight-delta SVD-LoRA distillation (cross-architecture).

  • Base model (student / required): Changgil/google-gemma-3-27b-it-text
  • Teacher model (reference): nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
  • Artifact: LoRA adapter (PEFT) — not a full merged model
  • Scope: Applies to attention + MLP modules (self_attn|mlp)

What is this?

This adapter approximates the teacher→student weight delta (Δ) with low-rank factors, and stores them as LoRA matrices. It is designed for cross-architecture distillation where teacher/student differ in layer count and hidden size.

Key build characteristics (as used for this adapter):

  • SVD backend: aurora (AURORA-SVD)
  • Adaptive rank: enabled via an energy threshold (see the sketch after this list)
  • Teacher mixing: lsq (per-matrix least-squares mixing)
  • Calibration: RMS-based calibration from Alpaca-format samples
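
For intuition, the sketch below shows the general idea behind adaptive-rank delta factorization: a weight delta Δ is decomposed with a truncated SVD, and the rank is the smallest value whose singular values retain the configured energy fraction (0.95 for this adapter). This is a minimal illustration of the technique, not the actual distillation script; the function name delta_to_lora and the toy dimensions are made up for the example.

import torch

def delta_to_lora(delta: torch.Tensor, energy_threshold: float = 0.95,
                  min_rank: int = 8, max_rank: int = 512):
    """Factor delta ≈ B @ A at the smallest rank whose singular values
    keep `energy_threshold` of the total squared spectral energy."""
    U, S, Vh = torch.linalg.svd(delta.float(), full_matrices=False)
    energy = torch.cumsum(S**2, dim=0) / torch.sum(S**2)
    rank = int((energy < energy_threshold).sum().item()) + 1
    rank = max(min_rank, min(rank, max_rank, S.numel()))
    sqrt_s = torch.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s           # (out_features, rank), like lora_B
    A = sqrt_s[:, None] * Vh[:rank]    # (rank, in_features),  like lora_A
    return A, B, rank

# Toy example: a mostly low-rank 512x512 delta
delta = torch.randn(512, 16) @ torch.randn(16, 512) + 0.01 * torch.randn(512, 512)
A, B, r = delta_to_lora(delta)
print(r, torch.dist(B @ A, delta).item())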

Quickstart (Transformers + PEFT)

This is an adapter. You must load the base model first.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_id = "Changgil/google-gemma-3-27b-it-text"
adapter_id = "win10/Nemotron2Gemma-AURORA-LoRA-27B-IT-0p95"

tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)

# Load the base model first; the LoRA adapter is applied on top of it.
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the SVD-LoRA adapter.
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain knowledge distillation in 5 bullet points."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

with torch.no_grad():
    out = model.generate(
        inputs.to(model.device),
        max_new_tokens=512,
        do_sample=False,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
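
The decode above includes the echoed prompt. If you only want the model's reply, slice off the prompt tokens first, for example:

new_tokens = out[0, inputs.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))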

Optional: Merge the adapter into the base weights

If you need a single merged checkpoint for inference:

# Continuing from the Quickstart above (`model` is the loaded PeftModel).
merged = model.merge_and_unload()
merged.save_pretrained("./merged_model", safe_serialization=True)
tokenizer.save_pretrained("./merged_model")
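
The merged checkpoint then loads like any regular Transformers model (no PEFT required). A minimal example, reusing the imports from the Quickstart:

# Reload the merged checkpoint for plain Transformers inference.
merged_model = AutoModelForCausalLM.from_pretrained(
    "./merged_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged_tokenizer = AutoTokenizer.from_pretrained("./merged_model")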

Reproducibility (build command)

The adapter was produced with a command equivalent to:

python universal_distill_v4_1_0_aurora_svd_innovations.py \
  --teacher E:\text-generation-webui-1.14\user_data\models\Llama-3.1-Nemotron-70B-Instruct-HF \
  --student E:\text-generation-webui-1.14\user_data\models\google-gemma-3-27b-it-text \
  --output  ./Llama-3.1-Nemotron-70B-Instruct-HF-gemma-3-27b-it-text-lora-adaptive \
  --svd-mode aurora \
  --energy-threshold 0.95 \
  --min-rank 256 \
  --max-rank 5376 \
  --interp-mode lsq \
  --svd-rand-iter 2 \
  --svd-rand-oversamples 8 \
  --svd-aurora-steps 100 \
  --svd-aurora-order 2 \
  --calib-format alpaca \
  --calib-alpaca-template classic \
  --calib-max-samples 128 \
  --calib-max-length 65536 \
  --calib-batch-size 2 \
  --calib-save .\calib_stats_Yi-70B-200k_alpaca-taiwan-dataset.safetensors \
  --calib-mode rms \
  --include "self_attn|mlp"

Observed run summary (example log):

  • Teacher tensors: 723
  • Student tensors: 808
  • Teacher: GQA + SwiGLU, 80 layers, hidden 8192
  • Student: GQA + standard FFN, 62 layers, hidden 5376
  • TIES: enabled (density=0.3)
  • DARE: disabled
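
Because the rank is chosen per matrix by the energy threshold, the saved adapter is expected to carry per-module rank information in its PEFT config (standard LoRA configs store this in a rank_pattern field; whether this particular build populates it is an assumption). A quick way to inspect what was produced:

from peft import PeftConfig

cfg = PeftConfig.from_pretrained("win10/Nemotron2Gemma-AURORA-LoRA-27B-IT-0p95")
print(cfg.r)                              # default rank
print(cfg.target_modules)                 # modules the adapter touches
print(getattr(cfg, "rank_pattern", {}))   # per-module rank overrides, if present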

Compatibility notes

  • This adapter targets the exact module naming / shapes of Changgil/google-gemma-3-27b-it-text.
  • If you use a different Gemma-3 27B variant, it must be shape-compatible (otherwise the adapter will fail to load); a quick sanity check is sketched below.
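
A minimal way to sanity-check another base checkpoint before loading the adapter is to compare its text config against the shapes this adapter was built for (hidden size 5376, 62 layers). The model ID below is a placeholder:

from transformers import AutoConfig

candidate_id = "some-org/another-gemma-3-27b-it-variant"   # placeholder ID
cfg = AutoConfig.from_pretrained(candidate_id)
text_cfg = getattr(cfg, "text_config", cfg)   # multimodal Gemma-3 configs nest the text config
assert text_cfg.hidden_size == 5376
assert text_cfg.num_hidden_layers == 62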

Limitations

  • This is weight-space distillation (delta approximation). It can transfer behavior/style partially, but it is not guaranteed to fully match the teacher across all tasks.
  • Output quality depends on base model prompting/chat template and decoding settings.

Source models

  • Base model (student): Changgil/google-gemma-3-27b-it-text
  • Teacher model: nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

License

Please follow the license and usage terms of the base model and teacher model as listed on their Hugging Face pages. This repository only provides an adapter; downstream usage must remain compliant with upstream terms.
