CellRep

A model for generating embeddings from cellular microscopy images.

Requirements

This Hugging Face model and its associated pipeline depend on the following Python packages: transformers, pillow, torchvision and matplotlib. Please install them before using the model. For example, using the uv package manager:

uv init my-proj
cd my-proj
uv add transformers pillow torchvision matplotlib

Quickstart - Embedding Pipeline

The easiest way to use this model for generating embeddings is via the image-feature-extraction pipeline:

from torch import Tensor
from PIL import Image
from transformers import pipeline

cellrep_pipeline = pipeline(
    task="image-feature-extraction",
    model="novonordisk-red/cellrep-base",
    revision="2.0.0",
    trust_remote_code=True
)

images = [
    Image.open(PATH_TO_MY_PNG_IMAGE_A),
    Image.open(PATH_TO_MY_PNG_IMAGE_B)
]

outputs = cellrep_pipeline(images)

for output in outputs:
    assert isinstance(output.embedding, Tensor)
    assert output.attention_map is None

The above pipeline invocation returns a list of named tuples, one per input image. Each tuple contains two fields: output.embedding, a tensor holding the embedding vector for the image, and output.attention_map, which is None by default.
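
For downstream use you will typically want all embeddings as a single matrix. A minimal sketch, continuing from the pipeline call above and assuming each output.embedding is a 1-D vector:

import torch

# Stack the per-image embedding vectors into one (num_images, embedding_dim) matrix.
embedding_matrix = torch.stack([output.embedding for output in outputs])
print(embedding_matrix.shape)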

The pipeline can also be instantiated with the optional flag attn_map=True to generate attention maps:

from pathlib import Path
from torch import Tensor
from PIL import Image
from transformers import pipeline

cellrep_pipeline = pipeline(
    task="image-feature-extraction",
    model="novonordisk-red/cellrep-base",
    revision="2.0.0",
    trust_remote_code=True,
    attn_map=True
)

images = [
    Image.open(PATH_TO_MY_PNG_IMAGE_A),
    Image.open(PATH_TO_MY_PNG_IMAGE_B)
]

outputs = cellrep_pipeline(images)

for output in outputs:
    assert isinstance(output.embedding, Tensor)
    assert output.attention_map is not None

# Visualise attention map for the first image
png_to_be_created = Path("/path/to/attn_map.png")
cellrep_pipeline.visualise_attention_map(
    images[0], outputs[0].attention_map, png_to_be_created
)

As before, the pipeline returns a list of named tuples and output.embedding is a tensor with the embedding vector. In addition, output.attention_map now contains a payload with the attention weights from the CLS token to all other tokens of the image. This can then be passed to the pipeline's auxiliary visualise_attention_map function for visualisation.

Note: In order to extract the attention weights from the model, the attn_map=True flag replaces the default optimised implementation of the attention mechanism with a less efficient (but mathematically equivalent) one. For large-scale inference you should therefore keep the default setting and run the pipeline without attention maps.
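
If you work with the underlying transformers model directly (see the next section), the analogous trade-off is typically exposed through the attn_implementation argument of from_pretrained and the output_attentions flag of the forward pass. The sketch below uses only standard transformers arguments and may not match exactly what the pipeline does internally:

import torch
from transformers import Dinov2WithRegistersModel

# Eager (non-fused) attention is needed to return attention weights;
# it is slower than the default fused implementation.
model = Dinov2WithRegistersModel.from_pretrained(
    "novonordisk-red/cellrep-base",
    revision="2.0.0",
    attn_implementation="eager",
)

pixel_values = torch.randn(1, 3, 518, 518)  # placeholder batch; see the pre-processing steps below
outputs = model(pixel_values, output_attentions=True)
last_layer_attention = outputs.attentions[-1]  # shape: (batch, heads, tokens, tokens)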

How to use this Model in Full

To work at a lower level than the pipeline, start by loading the model:

from transformers import Dinov2WithRegistersModel

model = Dinov2WithRegistersModel.from_pretrained(
    "novonordisk-red/cellrep-base",
    revision="2.0.0"
)

Then load a PNG image and pre-process it:

  • Convert the image to a PyTorch tensor.
  • Resize height and width to a multiple of 14.
  • Normalise the pixel values using ImageNet params.
  • Add a leading batch dimension to the final tensor.

from PIL import Image
import torchvision.transforms.functional as f

IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)

image = Image.open(PATH_TO_MY_PNG_IMAGE)
image_tensor = f.to_tensor(image)
image_resized = f.resize(image_tensor, [518, 518])
image_tensor_norm = f.normalize(
    image_resized,
    mean=IMAGENET_DEFAULT_MEAN,
    std=IMAGENET_DEFAULT_STD,
)
image_input = image_tensor_norm.unsqueeze(0)

Then generate the embedding:

image_embedding = model(image_input).pooler_output

The pooler_output attribute contains the class (CLS) token embedding; only this embedding should be used for downstream tasks.
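
For inference it is good practice to put the model in eval mode and disable gradient tracking. A minimal sketch, continuing from the pre-processing example above:

import torch

model.eval()  # disable dropout etc.
with torch.no_grad():
    outputs = model(image_input)

cls_embedding = outputs.pooler_output      # shape: (1, hidden_size); the class token
patch_tokens = outputs.last_hidden_state   # per-token states; not intended for downstream use
print(cls_embedding.shape)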

How was the Model Trained?

The training code for this model can be found at nn-research-early-development/cellrep.

Data

Our training data is composed of two large-scale cell painting datasets from the Broad Institute: CDRP-BBBC047-Bray and LINCS-Pilot, both of which can be downloaded from the Broad Institute's Cell Painting Gallery. Together, these constitute ~1.2 million five-channel microscopy images of cancer cells (U2OS and A549 cells, respectively). The cell painting assay used in these datasets captures distinct cellular components through the following channels:

  • RNA/nucleoli and cytoplasmic RNA (SYTO 14)
  • ER/endoplasmic reticulum (concanavalin A)
  • AGP/actin, Golgi and plasma membrane (phalloidin and WGA)
  • Mito/mitochondria (MitoTracker Deep Red)
  • DNA/nucleus (Hoechst 33342)

These datasets contain images of cells treated with diverse chemical compounds, providing a rich set of morphological phenotypes for model training. For both the training and testing datasets of all models, we applied our full normalization and PNG-conversion pipeline to ensure consistent processing across all experiments.

Training Run

The training run that yielded these model weights was logged to Weights & Biases at:

https://nn-red.wandb.io/cellular-foundation-model/cellrep-benchmark-runs/runs/rv393vct

The precise checkpoint used was:

cellular-foundation-model/cellrepv2-testing/cellrepv2-53715-teacher-624999:v0

Evaluation

Our primary benchmark uses CDRP-bio-BBBC036-Bray, a held-out subset of 124,416 images from CDRP-BBBC047-Bray containing known bioactive compounds. Each compound in this dataset has been annotated with its mechanism of action (MoA). As multiple compounds can share the same MoA, this benchmark tests whether models learn biologically meaningful features within the same assay rather than memorizing compound-specific artifacts or batch effects. To ensure statistical reliability, we restrict our evaluation to the 23 most frequent MoA classes in CDRP-bio-BBBC036-Bray.
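
A classification report like the one below can be produced with a simple probe over the frozen embeddings. A minimal sketch, assuming embeddings (an (N, D) array of CellRep embeddings) and moa_labels (a length-N array of integer MoA class ids) have already been computed; the choice of logistic regression and the train/test split here are illustrative, not necessarily the exact evaluation protocol used:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Split the held-out images into probe-training and probe-testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    embeddings, moa_labels, test_size=0.2, stratify=moa_labels, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(classification_report(y_test, probe.predict(X_test)))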

Results

              precision    recall  f1-score   support

           0       0.16      0.13      0.14       323
           1       0.12      0.18      0.14       132
           2       0.24      0.18      0.21       312
           3       0.25      0.14      0.18       478
           4       0.81      0.83      0.82       101
           5       0.11      0.19      0.14       148
           6       0.26      0.40      0.32       175
           7       0.15      0.25      0.19       102
           8       0.26      0.34      0.29        91
           9       0.15      0.11      0.12       351
          10       0.13      0.22      0.16        94
          11       0.18      0.18      0.18       206
          12       0.38      0.22      0.28       436
          13       0.17      0.26      0.20       132
          14       0.13      0.24      0.17       122
          15       0.38      0.45      0.41       221
          16       0.14      0.14      0.14       248
          17       0.14      0.11      0.12       271
          18       0.17      0.21      0.18       203
          19       0.13      0.13      0.13       271
          20       0.26      0.16      0.20       344
          21       0.15      0.21      0.17       186
          22       0.12      0.16      0.14        56

    accuracy                           0.21      5003
   macro avg       0.22      0.24      0.22      5003
weighted avg       0.22      0.21      0.20      5003
