
K-EXAONE-236B-A23B-GGUF

Introduction

We introduce K-EXAONE, a large-scale multilingual language model developed by LG AI Research. Built using a Mixture-of-Experts architecture, K-EXAONE features 236 billion total parameters, with 23 billion active during inference. Performance evaluations across various benchmarks demonstrate that K-EXAONE excels in reasoning, agentic capabilities, general knowledge, multilingual understanding, and long-context processing.

Key Features

  • Architecture & Efficiency: Features a 236B fine-grained MoE design (23B active) optimized with Multi-Token Prediction (MTP), enabling self-speculative decoding that boosts inference throughput by approximately 1.5x.
  • Long-Context Capabilities: Natively supports a 256K context window, using a 3:1 hybrid attention scheme with a 128-token sliding window to significantly reduce memory usage during long-document processing.
  • Multilingual Support: Covers 6 languages: Korean, English, Spanish, German, Japanese, and Vietnamese. Features a redesigned 150k vocabulary with SuperBPE, improving token efficiency by ~30%.
  • Agentic Capabilities: Demonstrates superior tool-use and search capabilities via multi-agent strategies.
  • Safety & Ethics: Aligned with universal human values, the model uniquely incorporates Korean cultural and historical contexts to address regional sensitivities often overlooked by other models. It demonstrates high reliability across diverse risk categories.

For more details, please refer to the technical report and GitHub.


Model Configuration

  • Number of Parameters: 236B in total and 23B activated
  • Number of Parameters (without embeddings): 234B
  • Hidden Dimension: 6,144
  • Number of Layers: 48 main layers + 1 MTP layer
    • Hybrid Attention Pattern: 12 x (3 sliding window attention + 1 global attention); see the layout sketch after this list
  • Sliding Window Attention
    • Number of Attention Heads: 64 Q-heads and 8 KV-heads
    • Head Dimension: 128 for both Q/KV
    • Sliding Window Size: 128
  • Global Attention
    • Number of Attention Heads: 64 Q-heads and 8 KV-heads
    • Head Dimension: 128 for both Q/KV
    • No Rotary Positional Embedding Used (NoPE)
  • Mixture of Experts:
    • Number of Experts: 128
    • Number of Activated Experts: 8
    • Number of Shared Experts: 1
    • MoE Intermediate Size: 2,048
  • Vocab Size: 153,600
  • Context Length: 262,144 tokens
  • Knowledge Cutoff: December 2024
  • Quantization: Q8_0, Q6_K, Q5_K_M, Q4_K_M, IQ4_XS in GGUF format (also includes BF16 weights)
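
As an illustration of the hybrid attention pattern above, here is a toy Python sketch (ours, not taken from the release code) of how the 12 groups of layers are laid out:

# A toy layout sketch of the 3:1 hybrid attention pattern: each group is
# 3 sliding-window layers (128-token window) followed by 1 global NoPE layer.
SLIDING, GLOBAL = "sliding_window_128", "global_nope"

pattern = [SLIDING, SLIDING, SLIDING, GLOBAL] * 12

assert len(pattern) == 48           # 48 main layers
assert pattern.count(GLOBAL) == 12  # one global layer per group of four

Only the global layers attend over the full context; the rest see a 128-token window, which is what keeps memory usage low during long-document processing.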

Evaluation Results

Evaluation results comparing the original model against other models are available on the GitHub page and in the original model's card. Detailed evaluation configurations and results can be found in the technical report.

Requirements

Until the libraries officially support K-EXAONE, you need to install our versions that include the EXAONE-MoE implementation. We will announce when these libraries are updated to support the K-EXAONE model.

Transformers

You can install the latest version of Transformers with support for the EXAONE-MoE architecture from this repository. The base version of Transformers is 5.0.0rc1, so it may be helpful to check the migration guide from the Transformers library.
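
Once the forked build is installed, generation works through the standard Transformers API. The following is a minimal sketch, assuming the fork registers the EXAONE-MoE architecture under the Auto classes; the model ID and sampling settings are taken from this card:

# A minimal generation sketch, assuming the forked Transformers build
# registers EXAONE-MoE under the standard Auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/K-EXAONE-236B-A23B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="bfloat16",   # BF16 weights are released alongside the GGUF files
    device_map="auto",  # shard the 236B parameters across available GPUs
)

messages = [{"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,  # sampling settings recommended in the Usage Guideline below
    top_p=0.95,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))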

llama.cpp

You can install the latest version of llama.cpp with support for EXAONE-MoE architecture from this repository. Please refer to the official build guide for details.

Quickstart

llama.cpp

You should install the llama.cpp library with the EXAONE-MoE implementations. Please refer to the requirements section.

After you install the library, you need to prepare a model file in GGUF format as below:

# Download GGUF model weights (e.g. Q4_K_M)
hf download LGAI-EXAONE/K-EXAONE-236B-A23B-GGUF --include "*Q4_K_M*" --local-dir .

# Or convert the Hugging Face model into GGUF format yourself
hf download LGAI-EXAONE/K-EXAONE-236B-A23B --local-dir $YOUR_MODEL_DIR
python convert_hf_to_gguf.py $YOUR_MODEL_DIR --outtype bf16 --outfile K-EXAONE-236B-A23B-BF16.gguf

# If you want lower precision than BF16, you need to quantize the model
./llama-quantize K-EXAONE-236B-A23B-BF16.gguf K-EXAONE-236B-A23B-Q4_K_M.gguf Q4_K_M

You can test the model with a simple chat CLI by running the command below:

./llama-cli -m K-EXAONE-236B-A23B-Q4_K_M.gguf \
    -ngl 99 \
    -fa on -sm row \
    --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0 \
    -c 131072 -n 32768 \
    --no-context-shift \
    --jinja

You can also launch a server by running the command below:

./llama-server -m K-EXAONE-236B-A23B-Q4_K_M.gguf \
    -ngl 99 \
    -fa on -sm row \
    --temp 1.0 --top-k 20 --top-p 0.95 --min-p 0 \
    -c 131072 -n 32768 \
    --no-context-shift \
    --jinja \
    --host 0.0.0.0 --port 8080

When the server is ready, you can test the model using the chat-style UI at http://localhost:8080, and access the OpenAI-compatible API at http://localhost:8080/v1.
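
For example, you can query the server with the official openai Python package. This is a minimal sketch; the base URL matches the server command above, and the API key is a placeholder since llama-server does not check it by default:

# A minimal sketch of calling the OpenAI-compatible endpoint of llama-server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="K-EXAONE-236B-A23B-Q4_K_M",  # llama-server serves the single loaded model
    messages=[{"role": "user", "content": "Summarize the key features of K-EXAONE."}],
    temperature=1.0,  # sampling settings recommended in the Usage Guideline below
    top_p=0.95,
)
print(response.choices[0].message.content)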

Ollama / LM-Studio

Ollama and LM-Studio are powered by llama.cpp, so they should gain K-EXAONE support once llama.cpp officially adds it. We will update this section when each tool is updated.

Usage Guideline

To achieve the expected performance, we recommend using the following configurations:

  • We strongly recommend using temperature=1.0, top_p=0.95, presence_penalty=0.0 for best performance.
  • Unlike EXAONE-4.0, K-EXAONE uses enable_thinking=True by default, so you need to set enable_thinking=False when you want non-reasoning mode (see the sketch after this list).
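
A minimal sketch of switching to non-reasoning mode, assuming the chat template accepts the enable_thinking keyword as described above:

# A minimal sketch; reasoning mode is enabled by default, so pass
# enable_thinking=False to the chat template to turn it off.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/K-EXAONE-236B-A23B")

messages = [{"role": "user", "content": "What is 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=False,  # switch to non-reasoning mode
)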

Limitation

The K-EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The model generates responses based on token output probabilities, which are learned from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that text generated by the K-EXAONE language model does not reflect the views of LG AI Research.

  • Inappropriate answers containing personal, harmful, or otherwise objectionable information may be generated.
  • Biased responses associated with age, gender, race, and so on may be generated.
  • Generated responses rely heavily on statistics from the training data, which can result in semantically or syntactically incorrect sentences.
  • Since the model does not reflect the latest information, responses may be false or contradictory.

LG AI Research strives to reduce potential risks that may arise from the K-EXAONE language model. When using K-EXAONE, users may not engage in any malicious activities (e.g., entering illegal information) that could induce the creation of outputs violating LG AI's ethical principles.

License

The model is licensed under the K-EXAONE AI Model License Agreement.

Citation

@article{k-exaone,
  title={K-EXAONE Technical Report},
  author={{LG AI Research}},
  journal={arXiv preprint arXiv:2601.01739},
  year={2025}
}

Contact

LG AI Research Technical Support: contact_us@lgresearch.ai
