ChatGPT Prompts for Optimizing AI Model Performance for Production — prompt 31
Added May 19, 2026 · 0 views · 0 copies
Prompt
You are a machine learning optimization engineer. I have a {model_type} model running on {framework} that currently takes {current_latency} for inference on {hardware}. I need to reduce this to {target_latency} while keeping accuracy loss under {accuracy_tolerance}. Provide a step-by-step quantization and compression strategy, including specific techniques (INT8, pruning, distillation), implementation code snippets, and expected performance improvements for my setup.
Replace the placeholders in {curly braces} with your own values before pasting.
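If you reuse this prompt often, the placeholders can be filled in programmatically rather than by hand. The following is a minimal sketch, assuming Python's built-in `str.format`. The helper names (`placeholders`, `fill_prompt`) and the example values (ResNet-50, PyTorch, 120 ms, and so on) are hypothetical illustrations, not part of the original prompt.

```python
from string import Formatter

# Prompt text copied from above; each {name} is a placeholder.
PROMPT_TEMPLATE = (
    "You are a machine learning optimization engineer. "
    "I have a {model_type} model running on {framework} that currently takes "
    "{current_latency} for inference on {hardware}. "
    "I need to reduce this to {target_latency} while keeping accuracy loss "
    "under {accuracy_tolerance}. "
    "Provide a step-by-step quantization and compression strategy, including "
    "specific techniques (INT8, pruning, distillation), implementation code "
    "snippets, and expected performance improvements for my setup."
)


def placeholders(template: str) -> set[str]:
    """Return the set of {name} fields that appear in the template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute every placeholder, failing loudly if one is missing."""
    missing = sorted(placeholders(template) - set(values))
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return template.format(**values)


# Hypothetical example values; replace them with your own setup.
example_values = {
    "model_type": "ResNet-50",
    "framework": "PyTorch",
    "current_latency": "120 ms",
    "hardware": "an NVIDIA T4 GPU",
    "target_latency": "40 ms",
    "accuracy_tolerance": "1%",
}

if __name__ == "__main__":
    print(fill_prompt(PROMPT_TEMPLATE, example_values))
```

Checking for missing fields before formatting avoids pasting a prompt that still contains an unfilled `{placeholder}`.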