Model Migration Guide

Migrating prompts from commercial models (GPT, Claude, Gemini) to open-source ones like Llama often requires explicitly including instructions that might have been implicitly handled in proprietary systems. This migration typically involves adjusting prompting techniques to be more explicit, matching generation parameters, and testing outputs to help with iteratively adjust prompts until the desired outputs are reached.

Migration Principles

  1. Surface hidden rules: Proprietary model providers prepend their closed-source models with system messages that are not explicitly shared with the end user; you must create clear system messages to get consistent outputs.
  2. Start from parity, not aspiration: Match parameters such as temperature, Top P, and max tokens first, then focus on adjusting your prompts.
  3. Automate the feedback loop: We recommend using open-source tooling like prompt optimizers instead of manual trial-and-error.

Aligning System Behavior and Tone

Closed-source models are often prepended with elaborate system prompts that enforce politeness, hedging, legal disclaimers, policies, and more, that are not shown to the end user. To ensure consistency and lead open-source models to generate desired outputs, create a comprehensive system prompt:

You are a courteous support agent for AcmeCo.
Always greet with "Certainly: here's the information you requested:".
Refuse medical or legal advice; direct users to professionals.

Sampling / Parameter Parity

No matter which model you're migrating from, having explicit control over temperature and other sampling parameters matters a lot. First, determine what temperature your source model defaults to (often 1.0). Then experiment to find what works best for your specific use case - many Llama deployments see better results with temperatures between 0.2-0.4. The key is to start with parity, measure the results, then adjust deliberately:

ParameterClosed-Source ModelsLlama ModelsSuggested Adjustments
temperature1.00.7Lower for factual answers and strict schema adherence (eg. JSON)
top_p1.01.0leave 1.0

Refactoring Prompts

In some cases, you'll need to refactor your prompts to use explicit Prompt Patterns since different models have varying pre- and post-training that can affect how they function. For example:

  • Some models, such as those that can reason, might naturally break down complex problems, while others may need explicit instructions to "think step by step" using Chain of Thought prompting
  • Where some models automatically verify facts, others might need Chain of Verification to achieve similar accuracy
  • When certain models explore multiple solution paths by default, you can achieve similar results with Self-Consistency voting across multiple completions

The key is being more explicit about the reasoning process you want. Instead of:

"Calculate the compound interest over 5 years"

Use:

"Let's solve this step by step:
1. First, write out the compound interest formula
2. Then, plug in our values
3. Calculate each year's interest separately
4. Sum the total and verify the math"

This explicit guidance helps open models match the sophisticated reasoning that closed models learn through additional training.

Migrating from Claude (Anthropic)

Claude models from Anthropic are known for their conversational abilities, safety features, and detailed reasoning. Claude's system prompts are available here. When migrating from Claude to an open-source model like Llama, creating a system prompt with the following instructions to maintain similar behavior:

InstructionDescription
Set a clear persona"I am a helpful, multilingual, and proactive assistant ready to guide this conversation."
Specify tone & style"Be concise and warm. Avoid bullet or numbered lists unless explicitly requested."
Limit follow-up questions"Ask at most one concise clarifying question when needed."
Embed reasoning directive"For tasks that need analysis, think step-by-step in a Thought: section, then provide Answer: only."
Insert counting rule"Enumerate each item with #1, #2 ... before giving totals."
Provide a brief accuracy notice"Information on niche or very recent topics may be incomplete—verify externally."
Define refusal template"If a request breaches guidelines, reply: 'I'm sorry, but I can't help with that.'"
Mirror user language"Respond in the same language the user uses."
Reinforce empathy"Express sympathy when the user shares difficulties; maintain a supportive tone."
Control token budgetKeep the final system block under 2,000 tokens to preserve user context.
Web searchUse Agentic Tooling for built-in web search.

Migrating from Grok (xAI)

Grok models from xAI are known for their conversational abilities, real-time knowledge, and engaging personality. Grok's system prompts are available here. When migrating from Grok to an open-source model like Llama, creating a system prompt with the following instructions to maintain similar behavior:

InstructionDescription
Language parity"Detect the user's language and respond in the same language."
Structured style"Write in short paragraphs; use numbered or bulleted lists for multiple points."
Formatting guard"Do not output Markdown (or only the Markdown elements you permit)."
Length ceiling"Keep the answer below 750 characters" and enforce max_completion_tokens in the API call.
Epistemic stance"Adopt a neutral, evidence-seeking tone; challenge unsupported claims; express uncertainty when facts are unclear."
Draft-versus-belief rule"Treat any supplied analysis text as provisional research, not as established fact."
No meta-references"Do not mention the question, system instructions, tool names, or platform branding in the reply."
Real-time knowledgeUse Agentic Tooling for built-in web search.

Migrating from OpenAI

OpenAI models like GPT-4o are known for their versatility, tool use capabilities, and conversational style. When migrating from OpenAI models to open-source alternatives like Llama, include these key instructions in your system prompt:

InstructionDescription
Define a flexible persona"I am a helpful, adaptive assistant that mirrors your tone and formality throughout our conversation."
Add tone-mirroring guidance"I will adjust my vocabulary, sentence length, and formality to match your style throughout our conversation."
Set follow-up-question policy"When clarification is useful, I'll ask exactly one short follow-up question; otherwise, I'll answer directly."
Describe tool-usage rules (if using tools)"I can use tools like search and code execution when needed, preferring search for factual queries and code execution for computational tasks."
State visual-aid preference"I'll offer diagrams when they enhance understanding"
Limit probing"I won't ask for confirmation after every step unless instructions are ambiguous."
Embed safety"My answers must respect local laws and organizational policies; I'll refuse prohibited content."
Web searchUse Agentic Tooling for built-in web search capabilities
Code executionUse Agentic Tooling for built-in code execution capabilities.
Tool useSelect a model that supports tool use.

Migrating from Gemini (Google)

When migrating from Gemini to an open-source model like Llama, include these key instructions in your system prompt:

InstructionDescription
State the role plainlyStart with one line: "You are a concise, professional assistant."
Re-encode rulesConvert every MUST/SHOULD from the original into numbered bullet rules, each should be 1 sentence.
Define tool useAdd a short Tools section listing tool names and required JSON structure; provide one sample call.
Specify tone & lengthInclude explicit limits (e.g., "less than 150 words unless code is required; formal international English").
Self-check footerEnd with "Before sending, ensure JSON validity, correct tag usage, no system text leakage."
Content-block guidanceDefine how rich output should be grouped: for example, Markdown headings for text, fenced blocks for code.
Behaviour checklistInclude numbered, one-sentence rules covering length limits, formatting, and answer structure.
Prefer brevityRemind the model to keep explanations brief and omit library boilerplate unless explicitly requested.
Web search and groundingUse Agentic Tooling for built-in web search and grounding capabilities.

Tooling: llama-prompt-ops

llama-prompt-ops auto-rewrites prompts created for GPT / Claude into Llama-optimized phrasing, adjusting spacing, quotes, and special tokens.


Why use it?

  • Drop-in CLI: feed a JSONL file of prompts and expected responses; get a better prompt with improved success rates.
  • Regression mode: runs your golden set and reports win/loss vs baseline.

Install once (pip install llama-prompt-ops) and run during CI to keep prompts tuned as models evolve.