Numpy ndarray support
Once you have your data represented as a numpy array, you can easily (and
often without copying) convert it to torch, tensorflow or other common
numeric libraries' objects.
To include numpy arrays in a pydantic model, Chains provides the field type
NumpyArrayField.
NumpyArrayField is a wrapper around the actual numpy array. Inside your
Python code, you can work with its array attribute.
Binary
As a JSON alternative that supports byte data, Chains uses msgpack (with
msgpack_numpy) to serialize the dict representation.
For Chainlet-Chainlet RPCs, this is done automatically for you once binary
mode is enabled for the dependency Chainlets (see all options).
Binary client
If you want to send such data as input to a chain or parse binary output from a chain, you have to add the msgpack serialization client-side.
The implementation of NumpyArrayField only needs pydantic, no other Chains
dependencies. So you can take that implementation code in isolation and
integrate it in your client code.
Some version combinations of msgpack and msgpack_numpy give errors; we
know that msgpack = ">=1.0.2" and msgpack-numpy = ">=0.4.8" work.
JSON
The JSON schema to represent the array is a dict of shape (tuple[int]), dtype (str), data_b64 (str).
The base64 data corresponds to np.ndarray.tobytes().
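To make the schema concrete, here is a minimal sketch that builds and parses such a dict by hand with numpy and the standard library. The helper names array_to_json_dict and json_dict_to_array are hypothetical and not part of Chains, and the row-major (C-order) byte layout is an assumption.

```python
import base64
import json

import numpy as np


def array_to_json_dict(arr: np.ndarray) -> dict:
    """Hypothetical helper: encode an array as shape, dtype and data_b64."""
    return {
        "shape": list(arr.shape),
        "dtype": str(arr.dtype),
        "data_b64": base64.b64encode(arr.tobytes()).decode("ascii"),
    }


def json_dict_to_array(obj: dict) -> np.ndarray:
    """Hypothetical helper: inverse of array_to_json_dict."""
    raw = base64.b64decode(obj["data_b64"])
    # frombuffer yields a read-only view; copy() makes the result writable.
    return np.frombuffer(raw, dtype=obj["dtype"]).reshape(obj["shape"]).copy()


original = np.arange(6, dtype=np.int32).reshape(3, 2)
json_str = json.dumps(array_to_json_dict(original))
restored = json_dict_to_array(json.loads(json_str))
```

In practice the pydantic model handles this conversion for you; the sketch only shows what the three fields carry.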
To get back to the array from the JSON string, use the model’s
model_validate_json method.
As discussed in the beginning, this schema is not performant for numeric data
and is only offered as a compatibility layer (JSON does not allow bytes).
Generally prefer the binary format.
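One measurable part of that cost is size: base64 encodes every 3 bytes as 4 ASCII characters, before any JSON parsing overhead is counted. The sketch below uses only the standard library and arbitrary sample bytes; it illustrates the encoding overhead in general, not Chains internals.

```python
import base64

raw = bytes(range(256)) * 12  # 3072 bytes of sample data
encoded = base64.b64encode(raw)

# base64 maps each 3-byte group to 4 characters, so the ratio is about 4/3.
overhead = len(encoded) / len(raw)
print(f"raw={len(raw)} bytes, base64={len(encoded)} bytes, ratio={overhead:.2f}")
```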
Simple bytes fields
It is possible to add a bytes field to a pydantic model used in a chain,
or as a plain argument to run_remote. This can be useful to include
non-numpy data formats such as images or audio/video snippets.
In this case, the “normal” JSON representation does not work and all
involved requests or Chainlet-Chainlet invocations must use binary mode.
The same steps as for arrays above apply: construct dicts
with bytes values and keys corresponding to the run_remote argument
names or the field names in the pydantic model. Then use msgpack to
serialize and deserialize those dicts.
Don’t forget to add Content-type headers, and note that response.json() will
not work on a binary response.
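The steps above can be sketched as follows, assuming msgpack is installed. The keys audio and label are hypothetical, and the application/octet-stream content type is an assumption; check your deployment for the value it expects. No request is sent here: the packed body stands in for the bytes of a response.

```python
import msgpack

# Keys correspond to run_remote argument names or pydantic field names
# (hypothetical here); bytes values are carried natively by msgpack.
payload = {"audio": b"\x00\x01\x02", "label": "snippet"}
body = msgpack.packb(payload, use_bin_type=True)

# Assumed content type for a msgpack request body.
headers = {"Content-Type": "application/octet-stream"}

# With an HTTP client you would now post `body` with `headers`, then
# decode the raw response bytes with msgpack rather than a JSON parser.
decoded = msgpack.unpackb(body, raw=False)
```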