• This allows an online inference service for the 7B-parameter model to be deployed on a single GPU with only ~4GB of memory (see the sketch below).
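
A 7B-parameter model fits in roughly 4GB only if its weights are stored at about 4 bits each (7B × 0.5 bytes ≈ 3.5GB), which leaves a little headroom for activations and the KV cache. The document does not name a specific loader, so the following is a minimal sketch assuming a Hugging Face-compatible checkpoint served with the `transformers` + `bitsandbytes` 4-bit path; the model ID and prompt are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical checkpoint ID -- substitute the actual 7B model.
model_id = "your-org/your-7b-model"

# 4-bit weight quantization keeps the 7B weights near 3.5 GB,
# so the model can fit on a single ~4 GB GPU.
quant_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the single available GPU
)

# Single-request inference, as an online service handler might run it.
inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In practice the usable headroom depends on sequence length and batch size, since the KV cache grows with both; a ~4GB budget generally implies short contexts or small batches alongside the 4-bit weights.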