Suppose a model stored in FP16 precision is quantized to a lower precision such as INT8. Does this reduce the model's accuracy? As I understand it, quantization is designed to reduce model size and the amount of RAM required to run it.
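To make the premise concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization (a common scheme, not tied to any particular framework; the weight values are made up). It shows where a small numerical error enters: each weight is snapped to one of 255 integer steps, and the round trip back to floating point does not recover the original values exactly.

```python
# Hypothetical weight values; a real model tensor would have millions.
weights = [0.12, -0.53, 0.91, -0.07, 0.33]

# Symmetric quantization: one scale maps the largest magnitude
# onto the signed INT8 range [-127, 127].
scale = max(abs(w) for w in weights) / 127

# Quantize: round each weight to the nearest integer step.
q = [round(w / scale) for w in weights]

# Dequantize: recover approximate floating-point values.
recovered = [v * scale for v in q]

# The rounding error per weight is bounded by half a step (scale / 2).
errors = [abs(w - r) for w, r in zip(weights, recovered)]
print(max(errors) <= scale / 2)
```

Each INT8 weight takes 1 byte instead of 2 for FP16, which is where the size and RAM savings come from; the bounded rounding error is the mechanism by which accuracy can change.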