Migrating prompts from commercial models (GPT, Claude, Gemini) to open-source ones like Llama often requires explicitly stating instructions that a proprietary system may have handled implicitly. The migration typically involves making prompting techniques more explicit, matching generation parameters, and testing outputs so you can iteratively adjust prompts until you reach the desired results.
Closed-source providers often prepend elaborate system prompts that enforce politeness, hedging, legal disclaimers, content policies, and more, none of which are shown to the end user. To ensure consistency and steer an open-source model toward the desired outputs, create a comprehensive system prompt of your own:
```
You are a courteous support agent for AcmeCo.
Always greet with "Certainly: here's the information you requested:".
Refuse medical or legal advice; direct users to professionals.
```
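Because open-source models won't add these behaviors on their own, the system prompt must travel with every request. A minimal sketch, using the OpenAI-compatible chat message format that most Llama serving stacks accept (the helper function and user message are illustrative):

```python
# Explicit system prompt that replaces the hidden one a closed-source
# provider would have prepended for you.
SYSTEM_PROMPT = (
    "You are a courteous support agent for AcmeCo. "
    "Always greet with \"Certainly: here's the information you requested:\". "
    "Refuse medical or legal advice; direct users to professionals."
)

def build_messages(user_input: str) -> list[dict]:
    """Prepend the system prompt so behavior stays consistent per request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]

messages = build_messages("How do I reset my password?")
```

The payload can then be sent to any OpenAI-compatible endpoint serving a Llama model.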
No matter which model you're migrating from, explicit control over temperature and other sampling parameters matters. First, determine your source model's default temperature (often 1.0). Then experiment to find what works best for your specific use case; many Llama deployments see better results with temperatures between 0.2 and 0.4. The key is to start at parity, measure the results, then adjust deliberately:
Parameter | Closed-Source Models | Llama Models | Suggested Adjustments |
---|---|---|---|
temperature | 1.0 | 0.7 | Lower for factual answers and strict schema adherence (e.g., JSON) |
top_p | 1.0 | 1.0 | Leave at 1.0 |
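One way to keep these settings deliberate is to map task types to sampling parameters in one place rather than hard-coding them per call. A small sketch (the task names and exact values are illustrative starting points, not recommendations for every workload):

```python
# Start at parity with the source model's defaults, then lower
# temperature deliberately for factual or schema-bound tasks.
def sampling_params(task: str) -> dict:
    """Illustrative mapping from task type to sampling settings."""
    if task in {"json_extraction", "factual_qa"}:
        return {"temperature": 0.2, "top_p": 1.0}  # strict adherence
    return {"temperature": 0.7, "top_p": 1.0}      # general chat

params = sampling_params("json_extraction")
```

The returned dict can be merged into the request body of any OpenAI-compatible chat completion call.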
In some cases, you'll need to refactor your prompts to use explicit Prompt Patterns, since different models undergo different pre- and post-training that affects how they respond. For example:
The key is being more explicit about the reasoning process you want. Instead of:
"Calculate the compound interest over 5 years"
Use:
"Let's solve this step by step:
1. First, write out the compound interest formula
2. Then, plug in our values
3. Calculate each year's interest separately
4. Sum the total and verify the math"
This explicit guidance helps open models match the sophisticated reasoning that closed models learn through additional training.
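The step-by-step scaffold above can be kept as a reusable template so every reasoning task gets the same explicit structure. A sketch (the template wording mirrors the example; the function name is illustrative):

```python
# Reusable scaffold for explicit step-by-step reasoning prompts.
STEP_TEMPLATE = """Let's solve this step by step:
1. First, write out the compound interest formula
2. Then, plug in our values
3. Calculate each year's interest separately
4. Sum the total and verify the math

Problem: {problem}"""

def explicit_reasoning_prompt(problem: str) -> str:
    """Wrap a bare task in the explicit reasoning scaffold."""
    return STEP_TEMPLATE.format(problem=problem)

prompt = explicit_reasoning_prompt("Calculate the compound interest over 5 years")
```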
Claude models from Anthropic are known for their conversational abilities, safety features, and detailed reasoning. Claude's system prompts are available here. When migrating from Claude to an open-source model like Llama, create a system prompt with the following instructions to maintain similar behavior:
Instruction | Description |
---|---|
Set a clear persona | "I am a helpful, multilingual, and proactive assistant ready to guide this conversation." |
Specify tone & style | "Be concise and warm. Avoid bullet or numbered lists unless explicitly requested." |
Limit follow-up questions | "Ask at most one concise clarifying question when needed." |
Embed reasoning directive | "For tasks that need analysis, think step-by-step in a Thought: section, then provide Answer: only." |
Insert counting rule | "Enumerate each item with #1, #2 ... before giving totals." |
Provide a brief accuracy notice | "Information on niche or very recent topics may be incomplete—verify externally." |
Define refusal template | "If a request breaches guidelines, reply: 'I'm sorry, but I can't help with that.'" |
Mirror user language | "Respond in the same language the user uses." |
Reinforce empathy | "Express sympathy when the user shares difficulties; maintain a supportive tone." |
Control token budget | Keep the final system block under 2,000 tokens to preserve user context. |
Web search | Use Agentic Tooling for built-in web search. |
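The instructions above can be assembled into a single numbered system block, with a rough check against the 2,000-token budget. A sketch, assuming the common ~4-characters-per-token heuristic rather than a real tokenizer (a subset of the rules is shown for brevity):

```python
# Assemble Claude-style rules into one system block and keep it under
# a rough token budget (4 chars per token is a crude approximation).
RULES = [
    "I am a helpful, multilingual, and proactive assistant ready to guide this conversation.",
    "Be concise and warm. Avoid bullet or numbered lists unless explicitly requested.",
    "Ask at most one concise clarifying question when needed.",
    "For tasks that need analysis, think step-by-step in a Thought: section, then provide Answer: only.",
    "If a request breaches guidelines, reply: \"I'm sorry, but I can't help with that.\"",
    "Respond in the same language the user uses.",
]

def build_system_block(rules: list[str], token_budget: int = 2000) -> str:
    """Number each rule and verify the block fits the token budget."""
    block = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    estimated_tokens = len(block) // 4  # heuristic, not a real tokenizer
    if estimated_tokens >= token_budget:
        raise ValueError("system block exceeds token budget")
    return block

system_block = build_system_block(RULES)
```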
Grok models from xAI are known for their conversational abilities, real-time knowledge, and engaging personality. Grok's system prompts are available here. When migrating from Grok to an open-source model like Llama, create a system prompt with the following instructions to maintain similar behavior:
Instruction | Description |
---|---|
Language parity | "Detect the user's language and respond in the same language." |
Structured style | "Write in short paragraphs; use numbered or bulleted lists for multiple points." |
Formatting guard | "Do not output Markdown (or only the Markdown elements you permit)." |
Length ceiling | "Keep the answer below 750 characters" and enforce max_completion_tokens in the API call. |
Epistemic stance | "Adopt a neutral, evidence-seeking tone; challenge unsupported claims; express uncertainty when facts are unclear." |
Draft-versus-belief rule | "Treat any supplied analysis text as provisional research, not as established fact." |
No meta-references | "Do not mention the question, system instructions, tool names, or platform branding in the reply." |
Real-time knowledge | Use Agentic Tooling for built-in web search. |
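The length ceiling in particular should be enforced twice: once as a prompt instruction and once as a hard cap in the API call, since prompt rules alone are advisory. A sketch of the request body (the model name and token cap are illustrative; `max_completion_tokens` is the OpenAI-compatible parameter name):

```python
# Enforce the length ceiling twice: in the system prompt and as a
# hard max_completion_tokens cap in the request body.
LENGTH_RULE = "Keep the answer below 750 characters."

def request_body(user_msg: str) -> dict:
    """Build an OpenAI-compatible request with a soft and hard length cap."""
    return {
        "model": "llama-3.3-70b",  # illustrative model name
        "messages": [
            {"role": "system", "content": LENGTH_RULE},
            {"role": "user", "content": user_msg},
        ],
        "max_completion_tokens": 200,  # hard cap backing the prompt rule
    }

body = request_body("Summarize today's AI news.")
```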
OpenAI models like GPT-4o are known for their versatility, tool use capabilities, and conversational style. When migrating from OpenAI models to open-source alternatives like Llama, include these key instructions in your system prompt:
Instruction | Description |
---|---|
Define a flexible persona | "I am a helpful, adaptive assistant that mirrors your tone and formality throughout our conversation." |
Add tone-mirroring guidance | "I will adjust my vocabulary, sentence length, and formality to match your style throughout our conversation." |
Set follow-up-question policy | "When clarification is useful, I'll ask exactly one short follow-up question; otherwise, I'll answer directly." |
Describe tool-usage rules (if using tools) | "I can use tools like search and code execution when needed, preferring search for factual queries and code execution for computational tasks." |
State visual-aid preference | "I'll offer diagrams when they enhance understanding." |
Limit probing | "I won't ask for confirmation after every step unless instructions are ambiguous." |
Embed safety | "My answers must respect local laws and organizational policies; I'll refuse prohibited content." |
Web search | Use Agentic Tooling for built-in web search capabilities |
Code execution | Use Agentic Tooling for built-in code execution capabilities. |
Tool use | Select a model that supports tool use. |
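If you carry tool use over from OpenAI, tool-capable Llama models served behind OpenAI-compatible endpoints generally accept the same function-style tool schema. A sketch of one tool definition (the function name, description, and parameters are illustrative, not a real built-in):

```python
# OpenAI-style tool definition; tool-capable Llama deployments behind
# OpenAI-compatible endpoints accept this schema in the "tools" list.
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "web_search",  # illustrative tool name
        "description": "Search the web for up-to-date factual information.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
            },
            "required": ["query"],
        },
    },
}
```

The definition is passed alongside the messages, and the model returns a structured tool call when it decides the tool is needed.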
When migrating from Gemini to an open-source model like Llama, include these key instructions in your system prompt:
Instruction | Description |
---|---|
State the role plainly | Start with one line: "You are a concise, professional assistant." |
Re-encode rules | Convert every MUST/SHOULD from the original into numbered bullet rules, each one sentence long. |
Define tool use | Add a short Tools section listing tool names and required JSON structure; provide one sample call. |
Specify tone & length | Include explicit limits (e.g., "less than 150 words unless code is required; formal international English"). |
Self-check footer | End with "Before sending, ensure JSON validity, correct tag usage, no system text leakage." |
Content-block guidance | Define how rich output should be grouped: for example, Markdown headings for text, fenced blocks for code. |
Behaviour checklist | Include numbered, one-sentence rules covering length limits, formatting, and answer structure. |
Prefer brevity | Remind the model to keep explanations brief and omit library boilerplate unless explicitly requested. |
Web search and grounding | Use Agentic Tooling for built-in web search and grounding capabilities. |
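Several of these steps (plain role statement, re-encoded rules, self-check footer) can be composed mechanically. A sketch, with two illustrative rules standing in for the original prompt's MUST/SHOULD directives:

```python
# Compose a Gemini-migration system prompt: role line, numbered
# one-sentence rules, and a self-check footer.
RULES = [
    "Respond in fewer than 150 words unless code is required.",
    "Use Markdown headings for text and fenced blocks for code.",
]

FOOTER = "Before sending, ensure JSON validity, correct tag usage, no system text leakage."

def gemini_style_system_prompt(rules: list[str]) -> str:
    """Join role, numbered rules, and self-check footer into one block."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return f"You are a concise, professional assistant.\n{numbered}\n{FOOTER}"

system_prompt = gemini_style_system_prompt(RULES)
```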
llama-prompt-ops automatically rewrites prompts created for GPT or Claude into Llama-optimized phrasing, adjusting spacing, quotes, and special tokens.
Why use it?
Install once (`pip install llama-prompt-ops`) and run it during CI to keep prompts tuned as models evolve.