ray-zerocopy

Zero-copy model sharing for PyTorch inference in Ray

ray-zerocopy shares model weights across Ray workers through Ray's shared-memory object store, so workers on the same node can run inference from a single copy of a large model's weights instead of each loading its own duplicate.
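The general idea behind this kind of sharing is to separate a model's structure from its weights. The weights are kept as NumPy arrays, which Ray's object store can return to readers on the same node without copying. Each worker then re-attaches them to an empty copy of the model with `torch.from_numpy`, which wraps the array's memory instead of copying it. The following is a minimal single-process sketch of that idea. It makes no claims about ray-zerocopy's internals: `extract_tensors` and `replace_tensors` are hypothetical helpers, and the `ray.put`/`ray.get` step is only indicated in a comment.

```python
import copy

import torch
from torch import nn


def extract_tensors(model: nn.Module):
    """Split a model into a weightless skeleton and a list of NumPy weight dicts."""
    skeleton = copy.deepcopy(model)
    weights = []
    for module in skeleton.modules():
        # recurse=False: only this module's own tensors; children are visited by the loop.
        params = {n: p.detach().numpy() for n, p in module.named_parameters(recurse=False)}
        buffers = {n: b.detach().numpy() for n, b in module.named_buffers(recurse=False)}
        weights.append({"params": params, "buffers": buffers})
        # Strip the tensors so the skeleton itself is tiny to serialize.
        for n in params:
            setattr(module, n, None)
        for n in buffers:
            setattr(module, n, None)
    return skeleton, weights


def replace_tensors(skeleton: nn.Module, weights):
    """Re-attach weights to a skeleton without copying them."""
    for module, w in zip(skeleton.modules(), weights):
        for n, arr in w["params"].items():
            # torch.from_numpy shares memory with arr (no copy).
            module.register_parameter(n, nn.Parameter(torch.from_numpy(arr), requires_grad=False))
        for n, arr in w["buffers"].items():
            module.register_buffer(n, torch.from_numpy(arr))
    return skeleton


model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2)).eval()
skeleton, weights = extract_tensors(model)
# With Ray, `weights` would be stored once via ray.put and fetched by each worker via ray.get.
worker_model = replace_tensors(copy.deepcopy(skeleton), weights)
```

Two workers built from the same `weights` end up pointing at the same memory, which is the property that keeps total memory at roughly one copy.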

Features

🚀 Zero-copy sharing - Share model weights across Ray workers without duplication
🎯 Flexible inference - Use with Ray Tasks, Ray Actors, or Ray Data Actor UDFs
💾 Memory efficient - 4 actors sharing a 5GB model use ~5GB of weight memory on a node (not 20GB)
⚡ High throughput - Actors run inference directly on the shared weights, without each one paying the cost of loading its own copy
🔧 Pipeline support - Share entire pipelines (classes with nn.Module attributes)
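As a rough illustration of what "a class with nn.Module attributes" means, here is a hypothetical pipeline and a hypothetical helper, `find_module_attributes`, that lists the attributes holding model weights. This sketch only shows which attributes would need sharing. How ray-zerocopy actually discovers and wraps them is not described here and may differ.

```python
from torch import nn


def find_module_attributes(pipeline: object) -> dict:
    """Return the instance attributes of `pipeline` that are nn.Module objects."""
    return {name: value for name, value in vars(pipeline).items() if isinstance(value, nn.Module)}


class MyPipeline:
    """Hypothetical pipeline: two models plus plain (non-weight) configuration."""

    def __init__(self):
        self.encoder = nn.Linear(16, 8)
        self.decoder = nn.Linear(8, 16)
        self.threshold = 0.5  # plain attribute, nothing to share

    def __call__(self, x):
        return self.decoder(self.encoder(x))


modules = find_module_attributes(MyPipeline())
```

Here `modules` holds `encoder` and `decoder` but not `threshold`.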

Quick Example

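import torch
from ray.data import ActorPoolStrategy
from ray_zerocopy import ModelWrapper

# 1. Wrap your model (YourModel is any torch.nn.Module)
model = YourModel()
model.eval()
model_wrapper = ModelWrapper.from_model(model, mode="actor")

# 2. Define actor
class InferenceActor:
    def __init__(self, model_wrapper):
        self.model = model_wrapper.load()

    def __call__(self, batch):
        # Ray Data passes batches as a dict of NumPy arrays by default
        inputs = torch.as_tensor(batch["data"])
        with torch.no_grad():
            outputs = self.model(inputs)
        # map_batches expects a dict of arrays (or a DataFrame) back
        return {"output": outputs.cpu().numpy()}

# 3. Use with Ray Data (ds is an existing ray.data.Dataset with a "data" column)
results = ds.map_batches(
    InferenceActor,
    fn_constructor_kwargs={"model_wrapper": model_wrapper},
    compute=ActorPoolStrategy(size=4),  # 4 actors share the model
)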
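Because Ray Data hands each batch to the actor as a dict of NumPy arrays by default and expects a dict of arrays (or a DataFrame) back, it can help to exercise the actor's `__call__` locally before scaling out. A minimal sketch, assuming a hypothetical `_StubWrapper` whose `load()` simply returns an in-process model. It is not part of ray_zerocopy and does no zero-copy sharing.

```python
import numpy as np
import torch
from torch import nn


class _StubWrapper:
    """Hypothetical stand-in for a model wrapper: load() returns the model as-is."""

    def __init__(self, model):
        self._model = model

    def load(self):
        return self._model


class InferenceActor:
    def __init__(self, model_wrapper):
        self.model = model_wrapper.load()

    def __call__(self, batch):
        # Convert the NumPy column to a tensor, run the model, return NumPy again.
        inputs = torch.as_tensor(batch["data"], dtype=torch.float32)
        with torch.no_grad():
            outputs = self.model(inputs)
        return {"output": outputs.cpu().numpy()}


actor = InferenceActor(_StubWrapper(nn.Linear(4, 2).eval()))
out = actor({"data": np.ones((3, 4), dtype=np.float32)})
```

A batch of 3 rows with 4 features produces an `"output"` array of shape (3, 2).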

Memory Savings

Without zero-copy:

Actor 1: 5GB model
Actor 2: 5GB model
Actor 3: 5GB model
Actor 4: 5GB model
Total: 20GB

With zero-copy:

Ray Object Store: 5GB (one shared copy per node)
Actors 1-4: read the weights from the object store without copying
Total: ~5GB
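The arithmetic above can be written as a small estimate. This is a rough sketch that counts model weights only. It assumes the actors on each node read one copy from that node's object store (Ray's object store is per node, so a second node holds its own copy) and ignores activations, runtime overhead, and object store bookkeeping. `weight_memory_gb` is a hypothetical helper, not part of ray_zerocopy.

```python
def weight_memory_gb(model_gb: float, actors_per_node: int, nodes: int = 1, zero_copy: bool = True) -> float:
    """Rough memory used by model weights alone across a cluster."""
    # Without sharing, every actor holds its own copy; with sharing, one copy per node.
    copies_per_node = 1 if zero_copy else actors_per_node
    return model_gb * copies_per_node * nodes


print(weight_memory_gb(5, 4, zero_copy=False))  # 20
print(weight_memory_gb(5, 4))  # 5
```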

Origin

Based on project-codeflare/zero-copy-model-loading

License

Apache License 2.0