ChatGPT Prompts for Optimizing AI Model Performance for Production — prompt 31


Added May 19, 2026

Prompt

You are a machine learning optimization engineer. I have a {model_type} model running on {framework} that currently takes {current_latency} for inference on {hardware}. I need to reduce this to {target_latency} while keeping accuracy loss under {accuracy_tolerance}. Provide a step-by-step quantization and compression strategy, including specific techniques (INT8, pruning, distillation), implementation code snippets, and expected performance improvements for my setup.

Replace each placeholder in {curly_braces} with your own values before pasting.
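If you reuse the prompt often, the substitution can be scripted. The following is a minimal sketch, assuming Python 3.9+ and only the standard library. The helper names (`placeholder_names`, `fill_prompt`) and the example values are hypothetical and are not part of the original prompt; swap in the details of your own setup.

```python
from string import Formatter

# The prompt text from above, with its six {placeholders} left intact.
PROMPT_TEMPLATE = (
    "You are a machine learning optimization engineer. "
    "I have a {model_type} model running on {framework} that currently takes "
    "{current_latency} for inference on {hardware}. "
    "I need to reduce this to {target_latency} while keeping accuracy loss "
    "under {accuracy_tolerance}. "
    "Provide a step-by-step quantization and compression strategy, including "
    "specific techniques (INT8, pruning, distillation), implementation code "
    "snippets, and expected performance improvements for my setup."
)


def placeholder_names(template: str) -> set[str]:
    """Return the names of all {placeholders} found in the template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute values into the template, refusing to leave any gap."""
    missing = placeholder_names(template) - set(values)
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    return template.format(**values)


# Illustrative values only (assumptions, not measurements).
example_values = {
    "model_type": "ResNet-50",
    "framework": "PyTorch",
    "current_latency": "45 ms",
    "hardware": "an NVIDIA T4 GPU",
    "target_latency": "15 ms",
    "accuracy_tolerance": "1%",
}

if __name__ == "__main__":
    print(fill_prompt(PROMPT_TEMPLATE, example_values))
```

Checking for missing values before formatting means a forgotten placeholder raises an error instead of reaching the model as a literal `{hardware}`.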