Currently, onnxruntime-gpu is a hard dependency, even for users who aren't using GPU acceleration or ONNX quantized models. This blocks the conda-forge build of the package, since onnxruntime-gpu is not on conda-forge. It is also undesirable more generally: we're roping in a big dependency that only a subset of users need.
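
One possible direction, as a rough sketch rather than a description of how this package is actually structured: defer the import until an ONNX code path is hit, and fail with an actionable message if the runtime is missing. The helper name `require_optional`, the extra name `onnx`, and the `load_quantized_session` function are all hypothetical and only here for illustration.

```python
import importlib
import importlib.util
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, or raise a clear error.

    `extra` is the (hypothetical) name of the packaging extra that would
    pull the dependency in, e.g. `pip install "<package>[onnx]"`.
    """
    # find_spec returns None for a top-level module that is not installed,
    # without importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"Install it via the '{extra}' extra."
        )
    return importlib.import_module(module_name)


def load_quantized_session(model_path: str):
    """Hypothetical ONNX entry point: the runtime is imported only here."""
    # Both onnxruntime and onnxruntime-gpu are imported as `onnxruntime`.
    ort = require_optional("onnxruntime", extra="onnx")
    return ort.InferenceSession(model_path)
```

With something along these lines, users who never touch the ONNX paths would not need onnxruntime installed at all, and the base package could be built without it.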