When to Use AI vs. Standard Automation

Not every automation needs AI. Adding AI where it’s not needed adds cost, latency, and unpredictability.

Use AI when:

  • Classifying or categorizing unstructured input (emails, tickets, form responses)
  • Extracting structured data from unstructured text (invoices, contracts, emails)
  • Generating personalized text (follow-up emails, summaries, reports)
  • Making decisions that require understanding context, not just matching rules
  • Processing natural language where simple keyword matching would fail

Do not use AI when:

  • A simple if/else rule handles it reliably
  • Input is already structured data — use data transformation instead
  • Output must be 100% deterministic — AI introduces variance
  • Cost per execution would make the automation uneconomical at scale
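The decision above can be enforced as a pre-check that routes a record to a plain rule, an AI step, or human review before any model is called. A minimal sketch; the field names (`amount`, `free_text`) are hypothetical examples, not a shared schema:

```python
def route(record: dict) -> str:
    """Decide whether a record needs an AI step or a deterministic rule."""
    # Structured numeric field: a simple if/else rule handles it reliably.
    if "amount" in record and record["amount"] is not None:
        return "rule"
    # Unstructured free text that needs context to interpret: use AI.
    if record.get("free_text"):
        return "ai"
    # Neither applies cleanly: send to a human.
    return "human_review"
```

Running the pre-check costs nothing; calling a model on input a rule could have handled costs money on every execution.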

Default Models (March 2026 — reviewed quarterly)

  Use Case                            Model                          Notes
  General generation, summarization   Claude 3.5 Sonnet              Strong reasoning, cost-effective
  Complex multi-step reasoning        Claude 3 Opus                  Higher cost; use only when needed
  Fast classification / simple tasks  GPT-4o Mini                    Low cost, high speed
  Structured JSON extraction          GPT-4o or Claude 3.5 Sonnet    Both handle JSON output well
  Image / document processing         GPT-4o Vision                  When visual input is required

Note: Mahmoud reviews and updates this table every quarter.


Prompt Engineering Standards

Prompt quality determines AI output quality. Every production prompt must be written, reviewed, and version-controlled.

Prompt Structure

SYSTEM PROMPT:
  You are [specific role relevant to the task].
  Your job is to [one clear objective].

  Rules:
  - [Explicit constraint 1]
  - [Explicit constraint 2]
  - Always respond in [JSON / plain text / structured list]
  - If [edge case condition], [specific instruction]
  - Never [explicit prohibition]

USER PROMPT:
  [Use XML tags to delimit variable content from instructions]

  <email>
  {{email_body}}
  </email>

  Classify this email and respond with only valid JSON:
  {"category": "...", "priority": "...", "summary": "..."}
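A sketch of assembling that user prompt in code, so the XML delimiters are applied consistently rather than by hand. The function name and template are illustrative, not an existing shared helper:

```python
def build_user_prompt(email_body: str) -> str:
    """Wrap dynamic content in XML tags so the model can distinguish
    data from instructions, matching the template above."""
    return (
        "<email>\n"
        f"{email_body}\n"
        "</email>\n\n"
        "Classify this email and respond with only valid JSON:\n"
        '{"category": "...", "priority": "...", "summary": "..."}'
    )
```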

Prompt Writing Rules

  1. One objective per prompt. Two tasks = two prompts.
  2. Specify output format explicitly. Need JSON? Say “respond with only valid JSON” and provide the exact schema.
  3. Delimit variable content. Use XML tags (<email>, <document>, <input>) to separate dynamic data from instructions.
  4. Define edge cases. Tell the AI what to do if input is blank, in a different language, or in an unexpected format.
  5. Test adversarially. What happens with blank input? Gibberish? A different language?
  6. Set temperature correctly. Classification/extraction: 0. Creative generation: 0.3–0.7.
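Rule 5 can be made repeatable with a small harness that runs the standard edge cases through a prompt and checks the result stays inside the expected label set. A sketch, assuming a `classify()` callable that returns a label; the test inputs are illustrative:

```python
# Standard adversarial inputs: blank, gibberish, another language,
# and a malformed / unexpected format.
ADVERSARIAL_CASES = [
    "",
    "asdf qwer zxcv",
    "Bonjour, j'ai un probleme de facturation",
    "{" * 500,
]

def run_adversarial_suite(classify, expected_classes):
    """Run each edge case through classify() and record whether the
    returned label stays inside the expected set."""
    results = {}
    for case in ADVERSARIAL_CASES:
        results[case] = classify(case) in expected_classes
    return results
```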

Version Control

  • Prompts are stored in GitHub under /workflows as .txt or .md files
  • Every change is a commit: docs: update classification prompt v[X] — [reason]
  • Never change a production prompt without testing on 10+ real examples first

Handling AI Output

AI output must never be passed directly into downstream systems without validation.

Validation Rules

  • JSON output: Parse and validate schema before using any field. If parsing fails → error path.
  • Classification: Validate returned class is in the expected set. Unexpected value → fallback or human review queue.
  • Generated text: Check minimum length, that key fields are populated, and that obvious failure text (e.g. refusals, apologies, placeholder tokens) is absent before sending to a client or end user.
  • Extracted data: Validate field formats (email, date, numeric range) before writing to destination systems.
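For a JSON classification output, the first three rules collapse into a single gate function that either returns parsed data or an error reason for the error path. A sketch; the field names and category set are examples, not a fixed schema:

```python
import json

# Example label set; replace with the classes your prompt defines.
EXPECTED_CATEGORIES = {"billing", "support", "sales", "other"}

def validate_ai_output(raw: str):
    """Parse and validate AI output before any downstream use.
    Returns (parsed_dict, None) on success or (None, reason) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, "parse_error"          # parsing fails -> error path
    for field in ("category", "priority", "summary"):
        if field not in data:
            return None, f"missing_{field}"
    if data["category"] not in EXPECTED_CATEGORIES:
        return None, "unexpected_category"  # -> fallback or human review
    return data, None
```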

Required Fallback Strategy for Every AI Step

AI succeeds + output valid      → proceed to next step
AI succeeds + output invalid    → [retry / default value / human queue / stop + alert]
AI fails (timeout / API error)  → retry once after 30 seconds
                                   → still failing: alert Discord #monitoring + stop automation
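The fallback ladder above can be sketched as a wrapper around any AI step. The `ai_call`, `validate`, and `alert` callables are placeholders for the real workflow pieces (the actual Discord alert is out of scope here), and this sketch picks "human queue" from the bracketed options for invalid output:

```python
import time

def call_with_fallback(ai_call, validate, alert, retry_delay=30):
    """Initial call plus one retry on API failure; valid output proceeds,
    invalid output goes to the human queue, double failure alerts and stops."""
    for attempt in range(2):                 # initial call + one retry
        try:
            raw = ai_call()
        except Exception:                    # timeout / API error
            if attempt == 0:
                time.sleep(retry_delay)      # retry once after the delay
                continue
            alert("AI step failed twice; stopping automation")
            return {"status": "stopped"}
        data, err = validate(raw)
        if err is None:
            return {"status": "ok", "data": data}
        # Invalid output: one of [retry / default / human queue / stop].
        return {"status": "human_queue", "reason": err}
```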

Cost Monitoring

  • Log token usage for every AI call in execution logs
  • Mahmoud reviews API cost per client monthly
  • If a client’s AI usage significantly exceeds estimate → flag to PMO before next invoice cycle
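A sketch of the per-call log entry with an estimated cost attached. The per-1K-token rates below are placeholders, not real pricing; look up the current provider rates before using anything like this for billing:

```python
from datetime import datetime, timezone

# Placeholder rates per 1K tokens; NOT real provider pricing.
RATES_PER_1K = {"input": 0.003, "output": 0.015}

def log_ai_call(client: str, model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build a structured execution-log entry with estimated cost per call."""
    cost = (input_tokens / 1000) * RATES_PER_1K["input"] \
         + (output_tokens / 1000) * RATES_PER_1K["output"]
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "client": client,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": round(cost, 6),
    }
```

Summing `estimated_cost_usd` per client over a month gives the figure for the monthly review.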