ray-zerocopy
Zero-copy model sharing for PyTorch inference in Ray
ray-zerocopy enables efficient model sharing across Ray workers using zero-copy mechanisms, eliminating the need to duplicate large model weights in memory when performing inference.
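The idea behind zero-copy sharing can be illustrated without Ray at all. The sketch below uses only Python's standard `multiprocessing.shared_memory` module: the data is written into a shared segment once, and a second handle attaches to it by name instead of receiving its own copy. This is an illustration of the general concept, not ray-zerocopy's implementation; the helper name `share_and_attach` is hypothetical.

```python
from multiprocessing import shared_memory


def share_and_attach(payload: bytes) -> tuple[bytes, bool]:
    """Write payload into a shared segment once, then read it via a second handle.

    Hypothetical helper for illustration only; payload must be non-empty.
    """
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[:len(payload)] = payload
        # A second handle attaches by name; attaching does not copy the data.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            # Copy out only so we have a value to return after cleanup.
            seen = bytes(reader.buf[:len(payload)])
            # A write through one handle is visible through the other,
            # which shows that both refer to the same memory.
            owner.buf[0] = (payload[0] + 1) % 256
            aliased = reader.buf[0] == owner.buf[0]
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # free the segment
    return seen, aliased


seen, aliased = share_and_attach(b"weights")
```

In a real deployment the second handle would live in another worker process, so each worker reads the same physical memory rather than holding a private copy of the weights.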
Features
🚀 Zero-copy sharing - Share model weights across Ray workers without duplication
🎯 Flexible inference - Use with Ray Tasks, Ray Actors, or Ray Data Actor UDFs
💾 Memory efficient - 4 actors sharing a 5GB model use ~5GB total (not 20GB)
⚡ High throughput - Direct inference without model loading overhead
🔧 Pipeline support - Share entire pipelines (classes with nn.Module attributes)
Quick Example
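    import torch
    from ray.data import ActorPoolStrategy
    from ray_zerocopy import ModelWrapper

    # 1. Wrap your model (YourModel is a placeholder for any nn.Module)
    model = YourModel()
    model.eval()
    model_wrapper = ModelWrapper.from_model(model, mode="actor")

    # 2. Define actor
    class InferenceActor:
        def __init__(self, model_wrapper):
            # Load the shared model once per actor
            self.model = model_wrapper.load()

        def __call__(self, batch):
            # Ray Data passes batches as dicts of NumPy arrays by default,
            # so convert the input to a tensor before calling the model
            inputs = torch.as_tensor(batch["data"])
            with torch.no_grad():
                outputs = self.model(inputs)
            # Assumes the model returns a single CPU tensor; the
            # "predictions" key is illustrative
            return {"predictions": outputs.numpy()}

    # 3. Use with Ray Data (ds is an existing ray.data.Dataset)
    results = ds.map_batches(
        InferenceActor,
        fn_constructor_kwargs={"model_wrapper": model_wrapper},
        compute=ActorPoolStrategy(size=4),  # 4 actors share the model
    )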
Memory Savings
Without zero-copy:
    Actor 1: 5GB model
    Actor 2: 5GB model
    Actor 3: 5GB model
    Actor 4: 5GB model
    Total: 20GB
With zero-copy:
    Ray Object Store: 5GB (shared)
    Actors 1-4: reference the object store
    Total: ~5GB
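The arithmetic behind these totals can be written as a small sketch. The function name `total_model_memory_gb` is hypothetical, and the calculation counts model weights only: per-actor overhead such as activations and the Python runtime is ignored, which is why the shared total is approximate.

```python
def total_model_memory_gb(model_gb: float, num_actors: int, shared: bool) -> float:
    """Estimate memory used by model weights across actors (weights only)."""
    if num_actors < 1:
        raise ValueError("num_actors must be at least 1")
    if shared:
        # One copy lives in the shared store; actors only hold references.
        return model_gb
    # Each actor holds its own private copy of the weights.
    return model_gb * num_actors


without_sharing = total_model_memory_gb(5, 4, shared=False)
with_sharing = total_model_memory_gb(5, 4, shared=True)
```

Under this simplified model, the unshared cost grows linearly with the number of actors, while the shared cost stays at the size of one model.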
Documentation
Origin
License
Apache License 2.0