Accelerate cold starts by loading in previous compilation artifacts.
torch.compile can decrease inference time by up to 40%. However, it increases your cold start time, because the model must be compiled before it can serve its first inference. To decrease this time, caching previous compilation artifacts is a must-implement strategy (read more here). This new API exposes that caching functionality to our users.

In practice, having the cache significantly reduces compilation latencies, by up to 5x.

Both `load_compile_cache()` and `save_compile_cache()` return an `OperationStatus` object that helps you control the flow of the program based on the result.
`load_compile_cache()` can return:

- `OperationStatus.SUCCESS` → successful load
- `OperationStatus.SKIPPED` → skipped because the cache already exists in b10fs
- `OperationStatus.ERROR` → general catch-all for errors
- `OperationStatus.DOES_NOT_EXIST` → no cache file was found

`save_compile_cache()` can return:

- `OperationStatus.SUCCESS` → successful save
- `OperationStatus.SKIPPED` → skipped because the compile cache already exists in the shared directory
- `OperationStatus.ERROR` → general catch-all for errors

You can use the status returned by `load_compile_cache` to decide whether to call `save_compile_cache`. In other implementations, you can also fall back to skipping compilation altogether in the off chance you fail to load the cache.
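A minimal sketch of that control flow, assuming the functions are importable from a `b10_transfer` module; the import path, toy model, and warm-up input are placeholders rather than part of the API described above:

```python
import torch
from b10_transfer import OperationStatus, load_compile_cache, save_compile_cache

model = torch.nn.Linear(4, 4)        # stand-in for your real model
example_input = torch.randn(1, 4)    # stand-in for a representative input

# Attempt to restore artifacts written by a previous deployment.
load_status = load_compile_cache()

if load_status == OperationStatus.ERROR:
    # Optional fallback: if the cache cannot be loaded, skip torch.compile
    # entirely rather than paying the full compilation cost on this replica.
    compiled = model
else:
    compiled = torch.compile(model)
    compiled(example_input)  # torch.compile is lazy; warm up to trigger compilation

    # Only persist the cache when loading did not already find one.
    if load_status == OperationStatus.DOES_NOT_EXIST:
        save_compile_cache()
```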
There are two files to change.

In `config.yaml`, add `b10-transfer` to your requirements.
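A sketch of the relevant section; the other entries are placeholders for whatever your config already lists:

```yaml
requirements:
  - torch
  - b10-transfer
```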
In `model.py`, load the compile cache before calling `torch.compile` and save it once the model has been compiled.
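A minimal sketch of that placement, assuming a Truss-style `Model` class and a `b10_transfer` import path; the linear layer and warm-up tensor stand in for your real model and inputs:

```python
import torch
from b10_transfer import load_compile_cache, save_compile_cache


class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Stand-in model; construct and load your real weights here.
        self._model = torch.nn.Linear(8, 8)

        # Restore compilation artifacts from a previous deployment, if any.
        load_compile_cache()

        self._model = torch.compile(self._model)
        # torch.compile is lazy, so warm up once to trigger compilation.
        self._model(torch.randn(1, 8))

        # Safe to call unconditionally: it returns OperationStatus.SKIPPED
        # when a cache already exists in the shared directory.
        save_compile_cache()

    def predict(self, model_input):
        # Assumes model_input is a list of 8 floats, matching the stand-in model.
        with torch.no_grad():
            return self._model(torch.tensor(model_input)).tolist()
```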