Accelerate cold starts by loading in previous compilation artifacts.
`torch.compile` can decrease inference time by up to 40%. However, it increases your cold start time, because the model must be compiled before its first inference. To reduce this overhead, caching previous compilation artifacts is a must-implement strategy (read more here). This new API exposes that caching functionality to our users. In practice, having the cache significantly reduces compilation latencies, by up to 5x.
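For context, enabling compilation in PyTorch is a one-line change; here is a minimal, self-contained sketch (the tiny linear layer is just a stand-in for your model):

```python
import torch
import torch.nn as nn

# Any nn.Module works; a tiny linear layer keeps the sketch self-contained.
model = nn.Linear(128, 64).eval()

# torch.compile returns an optimized callable. The first call triggers
# compilation, which is exactly the cold-start cost that caching targets.
compiled_model = torch.compile(model)

with torch.inference_mode():
    output = compiled_model(torch.randn(1, 128))
```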
`save_compile_cache()` returns `True` if the cache was successfully saved, else `False`.
`load_compile_cache()` returns `True` if a cache file was successfully found and loaded, else `False`.
Use the return value of `load_compile_cache` to decide whether to call `save_compile_cache`, as in the sketch below. In other implementations, you can fall back to skipping compilation entirely in the off chance the cache fails to load, serving the uncompiled model rather than paying the compile cost on that cold start.
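A minimal sketch of that pattern, assuming the cache helpers are importable from your serving framework (the `compile_cache` import path is an assumption, not part of this API):

```python
import torch
import torch.nn as nn

# Assumption: the cache helpers live in your serving framework; adjust
# this import path to wherever the API is exposed in your project.
from compile_cache import load_compile_cache, save_compile_cache

model = nn.Linear(128, 64).eval()
example_input = torch.randn(1, 128)

# Try to restore previous compilation artifacts before compiling.
cache_hit = load_compile_cache()

model = torch.compile(model)

# Run one representative input so compilation happens during startup,
# not on the first user-facing request. On a cache hit this is fast.
with torch.inference_mode():
    model(example_input)

# The load result tells us whether saving is worthwhile: on a cache hit
# the artifacts already exist, so only save after a fresh compile.
if not cache_hit:
    save_compile_cache()
```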
There are two files to change: `config.yaml` and `model.py`.
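Assuming a Truss-style packaging format, where `config.yaml` declares dependencies (e.g. a `torch` requirement) and `model.py` defines a `Model` class with `load()` and `predict()` hooks, the `model.py` side might look like the sketch below. The class shape, the `compile_cache` import path, and the stand-in linear layer are illustrative assumptions, not the actual diff:

```python
import torch
import torch.nn as nn

# Assumed import path for the cache helpers; adjust to your project.
from compile_cache import load_compile_cache, save_compile_cache


class Model:
    """Truss-style model class; load() runs once per cold start."""

    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Restore artifacts first so the compile below can reuse them.
        cache_hit = load_compile_cache()

        self._model = torch.compile(nn.Linear(128, 64).eval())

        # Warm up: pay the compile cost here, not on the first request.
        with torch.inference_mode():
            self._model(torch.randn(1, 128))

        # Persist artifacts only when we compiled from scratch.
        if not cache_hit:
            save_compile_cache()

    def predict(self, model_input):
        with torch.inference_mode():
            return self._model(torch.as_tensor(model_input, dtype=torch.float32))
```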