J C

dark-pen

AI & ML interests

None yet

Recent Activity

liked a dataset about 4 hours ago

natolambert/rlhf-library

liked a dataset about 4 hours ago

allenai/RLVR-GSM

liked a dataset about 4 hours ago

allenai/olmo-mix-1124

Organizations

upvoted a collection about 4 hours ago

DeepPrune

Parallel Scaling without Inter-trace Redundancy • 3 items • Updated Oct 10, 2025 • 2

upvoted a paper about 4 hours ago

Rethinking Table Instruction Tuning

Paper • 2501.14693 • Published Jan 24, 2025 • 1

upvoted 2 papers about 5 hours ago

Quantifying the Carbon Emissions of Machine Learning

Paper • 1910.09700 • Published Oct 21, 2019 • 24

Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series

Paper • 2301.11308 • Published Jan 26, 2023 • 2

upvoted a collection about 5 hours ago

vfa

1 item • Updated Dec 30, 2024 • 1

upvoted a paper about 5 hours ago

VFA: Vision Frequency Analysis of Foundation Models and Human

Paper • 2409.05817 • Published Sep 9, 2024 • 3

upvoted 2 collections about 7 hours ago

Aranizer | Arabic Tokenization with SentencePiece & PBE

Collection of Arabic Tokenizers with different sizes based on SentencePiece & PBE Encodings suitable for training LLMs • 6 items • Updated Aug 25, 2024 • 3

SARD: Synthetic Arabic Recognition Dataset

A large-scale synthetic Arabic OCR dataset comprising 843,622 book-style document images across 10 fonts, designed to advance VLM for Arabic Texts • 2 items • Updated May 19, 2025 • 6

upvoted a collection about 9 hours ago

MasriSpeech-Dataset

6 items • Updated Aug 2, 2025 • 1

upvoted a paper 1 day ago

EfficientLLM: Efficiency in Large Language Models

Paper • 2505.13840 • Published May 20, 2025 • 25

upvoted a collection 2 days ago

Nemotron Speech

Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 13 items • Updated 1 day ago • 14

upvoted a paper 2 days ago

Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement

Paper • 2601.01562 • Published 7 days ago • 24

upvoted a collection 2 days ago

SDNQ

Models quantized with SDNQ • 26 items • Updated 1 day ago • 20

upvoted 4 collections 7 days ago

SAM Audio

The SAM Audio model licenses allow for redistribution so long as the original license files are included • 9 items • Updated 17 days ago • 4

neucodec

We introduce NeuCodec, a 0.8kbps audio codec that outputs audio at 24kHz. • 6 items • Updated Oct 9, 2025 • 5

neutts-air

NeuTTS Air is a speech foundation model that runs on CPU in real-time, with instant voice cloning. • 3 items • Updated Oct 9, 2025 • 16

Mem-Agent

Small sized agents from Dria trained on interacting with an obsidian-like memory system using python tools. Trained on Qwen3-4B-Thinking-2507. • 4 items • Updated Sep 5, 2025 • 4

upvoted an article 7 days ago

DABStep: Data Agent Benchmark for Multi-step Reasoning

Article • Feb 4, 2025 • 122

upvoted 2 collections 8 days ago

World models

4 items • Updated 20 days ago • 2

Computer use

2 items • Updated Nov 23, 2025 • 1