DeepSignal-4B-V1 (GGUF)

This repository provides a GGUF model file for local inference (e.g., llama.cpp or LM Studio). The model is intended for traffic-signal-control analysis and related text-generation workflows. For details, see our repository at AIMSLaboratory/DeepSignal.

Files

  • DeepSignal-4B_V1.F16.gguf
  • config.json

Quickstart (llama.cpp)

```
llama-cli -m DeepSignal-4B_V1.F16.gguf -p "You are a traffic management expert. You can use your traffic knowledge to solve the traffic signal control task.
Based on the given traffic {scene} and {state}, predict the next signal phase and its duration.
You must answer directly, the format must be: next signal phase: {number}, duration: {seconds} seconds
where the number is the phase index (starting from 0) and the seconds is the duration (usually between 20-90 seconds)."
```

You need to fill in {scene} (the total number of phases, which phase controls which lanes/directions, the current phase ID/number, etc.) and {state} (the number of queueing vehicles per lane, the throughput vehicles per lane during the current phase, etc.). A hypothetical filled-in example is shown below.
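For scripted use, the same prompt can be assembled and the model's reply parsed programmatically. Below is a minimal sketch using llama-cpp-python; the scene/state values, sampling parameters, and parsing pattern are illustrative assumptions, not part of the official pipeline.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The scene/state values below are made up for illustration.
import re

from llama_cpp import Llama

llm = Llama(model_path="DeepSignal-4B_V1.F16.gguf", n_ctx=4096, verbose=False)

# Hypothetical {scene}: phase layout and the currently active phase.
scene = ("4 phases; phase 0: N-S through, phase 1: N-S left, "
         "phase 2: E-W through, phase 3: E-W left; current phase: 0")
# Hypothetical {state}: per-lane queue and throughput counts.
state = ("queueing vehicles per lane: N 6, S 4, E 9, W 7; "
         "throughput vehicles per lane during current phase: N 3, S 2, E 0, W 0")

prompt = (
    "You are a traffic management expert. You can use your traffic knowledge "
    "to solve the traffic signal control task.\n"
    f"Based on the given traffic {scene} and {state}, predict the next signal "
    "phase and its duration.\n"
    "You must answer directly, the format must be: "
    "next signal phase: {number}, duration: {seconds} seconds\n"
    "where the number is the phase index (starting from 0) and the seconds is "
    "the duration (usually between 20-90 seconds)."
)

out = llm(prompt, max_tokens=64, temperature=0.0)
text = out["choices"][0]["text"]

# Expected reply format: "next signal phase: 2, duration: 35 seconds"
m = re.search(r"next signal phase:\s*(\d+),\s*duration:\s*(\d+)\s*seconds", text)
if m:
    phase, duration = int(m.group(1)), int(m.group(2))
    print(f"phase={phase}, duration={duration} s")
```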

Evaluation (Traffic Simulation)

Performance Metrics Comparison by Model *

| Model | Avg Saturation | Avg Cumulative Queue Length (veh⋅min) | Avg Throughput (veh/5 min) | Avg Response Time (s) |
|---|---|---|---|---|
| GPT-OSS-20B (thinking) | 0.380 | 14.088 | 77.910 | 6.768 |
| DeepSignal-4B (Ours) | 0.422 | 15.703 | 79.883 | 2.131 |
| Qwen3-30B-A3B | 0.431 | 17.046 | 79.059 | 2.727 |
| Qwen3-4B | 0.466 | 57.699 | 75.712 | 1.994 |
| Max Pressure | 0.465 | 23.022 | 77.236 | ** |
| LightGPT-8B-Llama3 | 0.523 | 54.384 | 75.512 | 3.025 *** |

*: Each simulation scenario runs for 60 minutes. We discard the first 5 minutes as warm-up, then compute metrics over the next 20 minutes (minute 5 to 25). We cap the evaluation window because, when an LLM controls signal timing for only a single intersection, spillback from neighboring intersections may occur after ~20+ minutes and destabilize the scenario. All evaluations are conducted on a Mac Studio M3 Ultra.
**: Max Pressure is a fixed signal-timing optimization algorithm (not an LLM), so we omit its Avg Response Time; this metric is only defined for LLM-based signal-timing optimization. (A minimal sketch of the Max Pressure rule follows these notes.)
***: For LightGPT-8B-Llama3, Avg Response Time is computed using only the successful responses.
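
For context on the Max Pressure baseline in note **: it activates the phase whose permitted movements carry the largest total pressure (upstream queue minus downstream queue). The sketch below illustrates the rule under assumed data structures; the lane names and queue counts are made up, and this is not the exact baseline implementation used in our evaluation.

```python
# Illustrative sketch of the Max Pressure rule.
# pressure(phase) = sum over movements served by that phase of
#                   (queue on incoming lane - queue on outgoing lane);
# the controller activates the phase with the highest pressure.

def max_pressure_phase(phases, queue):
    """phases: list of movement lists [(in_lane, out_lane), ...];
    queue: dict mapping lane id -> number of queued vehicles."""
    def pressure(movements):
        return sum(queue[i] - queue[o] for i, o in movements)
    return max(range(len(phases)), key=lambda p: pressure(phases[p]))

# Hypothetical two-phase intersection:
phases = [[("N_in", "S_out"), ("S_in", "N_out")],   # phase 0: N-S through
          [("E_in", "W_out"), ("W_in", "E_out")]]   # phase 1: E-W through
queue = {"N_in": 6, "S_in": 4, "E_in": 9, "W_in": 7,
         "N_out": 1, "S_out": 0, "E_out": 2, "W_out": 1}
print(max_pressure_phase(phases, queue))  # -> 1 (E-W pressure is higher)
```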

Model Details

  • Format: GGUF (F16, 16-bit)
  • Model size: 4B params
  • Architecture: qwen3
