# ray-zerocopy

**Zero-copy model sharing for PyTorch inference in Ray**

ray-zerocopy lets Ray workers share one copy of a model's weights through the Ray object store, so large weights are not duplicated in each worker's memory during inference.

## Features

- 🚀 **Zero-copy sharing** - Share model weights across Ray workers without duplication
- 🎯 **Flexible inference** - Use with Ray Tasks, Ray Actors, or Ray Data Actor UDFs
- 💾 **Memory efficient** - Four actors sharing a 5 GB model use about 5 GB in total, not 20 GB
- ⚡ **High throughput** - Actors run inference directly, without each one loading its own copy of the model
- 🔧 **Pipeline support** - Share entire pipelines (classes with `nn.Module` attributes)
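
The idea behind zero-copy sharing can be illustrated with a small, standard-library-only sketch. This is an in-process analogy, not how ray-zerocopy is implemented: the `weights` buffer stands in for model weights held in shared memory, and `worker_views` is a hypothetical name for the read-only handles that each worker would hold.

```python
# Stand-in for model weights: 1 MiB of bytes in a single buffer.
weights = bytearray(1024 * 1024)

# Each "worker" gets a read-only view of the buffer, not a copy.
worker_views = [memoryview(weights).toreadonly() for _ in range(4)]

# A change to the underlying buffer is visible through every view,
# which shows that all four views alias the same memory.
weights[0] = 42
shared = all(view[0] == 42 for view in worker_views)

# By contrast, bytes(weights) makes an independent copy.
copied = bytes(weights)
weights[0] = 7
copy_is_independent = copied[0] == 42
```

Four views over one buffer cost roughly the size of the buffer, whereas four copies would cost four times as much.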

## Quick Example

```python
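import torch
from ray.data import ActorPoolStrategy
from ray_zerocopy import ModelWrapper

# 1. Wrap your model (YourModel is a placeholder for any nn.Module)
model = YourModel()
model.eval()
model_wrapper = ModelWrapper.from_model(model, mode="actor")

# 2. Define actor
class InferenceActor:
    def __init__(self, model_wrapper):
        # Load the model from the wrapper when the actor starts
        self.model = model_wrapper.load()

    def __call__(self, batch):
        # Ray Data passes batches as dicts of NumPy arrays by default
        inputs = torch.as_tensor(batch["data"])
        with torch.no_grad():
            outputs = self.model(inputs)
        # "predictions" is an illustrative output column name
        return {"predictions": outputs.numpy()}

# 3. Use with Ray Data (ds is an existing Ray Dataset with a "data" column)
results = ds.map_batches(
    InferenceActor,
    fn_constructor_kwargs={"model_wrapper": model_wrapper},
    compute=ActorPoolStrategy(size=4),  # 4 actors share the model
)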
```

## Memory Savings

**Without zero-copy:**
```
Actor 1: 5GB model
Actor 2: 5GB model
Actor 3: 5GB model
Actor 4: 5GB model
Total: 20GB
```

**With zero-copy:**
```
Ray Object Store: 5GB (shared)
Actors 1-4: reference the object store
Total: ~5GB
```
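
The arithmetic behind these two totals can be written as a short sketch. The function name `total_model_memory_gb` is hypothetical, and the estimate counts only model weights, ignoring per-actor overhead such as activations and the Python runtime.

```python
def total_model_memory_gb(model_gb: float, num_actors: int, zero_copy: bool) -> float:
    """Approximate memory used by model weights across all actors."""
    if zero_copy:
        # One shared copy, regardless of how many actors reference it.
        return model_gb
    # Each actor holds its own private copy of the weights.
    return model_gb * num_actors


without_zero_copy = total_model_memory_gb(5, 4, zero_copy=False)
with_zero_copy = total_model_memory_gb(5, 4, zero_copy=True)
print(f"Without zero-copy: {without_zero_copy} GB")
print(f"With zero-copy: ~{with_zero_copy} GB")
```

Under these assumptions the saving grows linearly with the number of actors: the shared case stays near the size of one model while the unshared case scales with the pool size.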

## Documentation

```{toctree}
:maxdepth: 2
:caption: Contents

getting_started
model_wrapper_guide
jit_wrappers
api_reference/index
```

## Origin

Based on [project-codeflare/zero-copy-model-loading](https://github.com/project-codeflare/zero-copy-model-loading).

## License

Apache License 2.0