
How We Gave a 30-Person Marketing Agency an AI Teammate That Talks to All Their Platforms at Once

22 min read
GammaEdge Team


Client: Under NDA — mid-size performance marketing agency | Industry: Digital Marketing / Advertising | Clients Managed: ~55 active accounts | Monthly Ad Spend Under Management: ~$2.8M | Timeline: ~9 months | Region: India (managing global campaigns)


At a Glance

Metric | Result
Reporting Time | ~18 hrs/week → under 2 hrs
Budget Pacing Errors | Down ~80%
Avg. Campaign Response Time | Hours → minutes
Client Churn | ~22% → under 10%

What Was Going Wrong

The agency runs performance marketing for 55 clients — e-commerce brands, SaaS companies, D2C startups, a couple of B2B firms. Each client has campaigns running on some combination of Google Ads, Meta, TikTok, LinkedIn, and a few on Amazon, Pinterest, or Snapchat. Total ad spend under management: about $2.8 million a month. Team of 30 — media buyers, account managers, analysts, and a creative team.

They were good at the actual marketing. Strategy, creative, audience targeting — that was the skill. What was killing them was everything around it.

The tab problem. A single media buyer managing 8 client accounts would have, on any given morning: Google Ads open for three clients, Meta Ads Manager for five, TikTok Ads for two, LinkedIn Campaign Manager for one, GA4 for all of them, plus Google Sheets for reporting. We counted once — 34 browser tabs. And that's before Slack, email, and the project management tool. The actual thinking part — "should we shift budget from this campaign to that one?" — took maybe 20% of their day. The rest was clicking between dashboards, copying numbers, and trying to remember which client had which issue.

Reporting was eating everyone alive. Every client gets a weekly performance update and a monthly detailed report. For 55 clients, that's 55 reports every week plus another 55 every month. Each one required pulling data from 3–5 platforms, dumping it into a Google Sheet template, calculating metrics, building charts, writing commentary, and sending it. The three-person analytics team spent roughly 18 hours a week just on reporting. The Monday morning scramble to get all weekly reports out by noon was a recurring crisis. They'd missed the deadline so many times that two clients cited "lack of communication" as the reason for leaving.

Junior staff made expensive mistakes. One media buyer accidentally set a daily budget of $5,000 instead of $500 on a Google Ads campaign. It ran for 14 hours before anyone caught it. Cost: $2,917 in wasted spend that the agency had to eat because the client's budget cap was $500/day. Another time, someone pushed a Meta campaign live targeting the US when the client was India-only. Each slip looks small on its own, but with $2.8M/month in ad spend, misdirecting even 1% of it costs $28,000/month.
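The $2,917 figure is consistent with 14 hours at a $5,000 daily rate, assuming spend accrues evenly through the day (real delivery is rarely that uniform). The quick check below also shows how much of that spend exceeds what the $500/day cap would have allowed over the same 14 hours. The function name is illustrative.

```python
def spend_at_daily_budget(daily_budget: float, hours: float) -> float:
    """Spend accrued after `hours`, assuming even delivery across a 24-hour day."""
    return daily_budget * hours / 24

wrong = spend_at_daily_budget(5_000, 14)    # budget typed with an extra zero
intended = spend_at_daily_budget(500, 14)   # what the client's cap allows in the same window

print(round(wrong, 2))             # 2916.67
print(round(wrong - intended, 2))  # 2625.0
```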

Nobody saw problems until clients complained. A campaign would stop spending (hit a budget limit, ad disapproved, payment method expired), and nobody would notice for 2–3 days. The client would notice first — either from their own dashboard access or from the drop in leads — and call in frustrated. By then, the agency was on the defensive. Reactive, not proactive.

Budget pacing was manual guesswork. Clients have monthly budgets. $15,000/month on Google, $8,000/month on Meta, whatever the agreement is. Staying on pace means spending roughly 1/30th per day, adjusted for weekdays vs. weekends and campaign performance. Some months they'd underspend by 15% and the client would ask why. Other months they'd overshoot by 10% and have to explain the overage. The pacing spreadsheet was a Google Sheet with conditional formatting that turned red when things were off — but someone had to remember to check it. During busy weeks, nobody did.
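One way to turn "1/30th per day, adjusted for weekdays vs. weekends" into a number is to give each day a weight, then compare actual spend with the weighted share of the budget that should be gone by now. This is a minimal sketch that assumes a weekend day gets 70% of a weekday's share. `WEEKEND_WEIGHT` and the function names are hypothetical and do not represent the agency's actual formula.

```python
import calendar
from datetime import date

# Hypothetical weighting: a weekend day gets 70% of a weekday's share of the budget.
WEEKDAY_WEIGHT = 1.0
WEEKEND_WEIGHT = 0.7

def day_weight(d: date) -> float:
    return WEEKEND_WEIGHT if d.weekday() >= 5 else WEEKDAY_WEIGHT  # 5 = Sat, 6 = Sun

def expected_spend_to_date(monthly_budget: float, today: date) -> float:
    """Budget that should be spent by the end of `today` on a weighted pace."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days = [date(today.year, today.month, n) for n in range(1, days_in_month + 1)]
    total = sum(day_weight(d) for d in days)
    elapsed = sum(day_weight(d) for d in days if d <= today)
    return monthly_budget * elapsed / total

def pacing_ratio(actual_spend: float, monthly_budget: float, today: date) -> float:
    """1.0 = on pace, 0.85 = 15% behind, 1.10 = 10% ahead."""
    return actual_spend / expected_spend_to_date(monthly_budget, today)
```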

Attribution was a fight. "How many leads did we actually generate?" should be a simple question. It wasn't. Google Ads claims 200 conversions. Meta claims 150. GA4 says total conversions from all sources were 280. The numbers don't add up because every platform counts differently — different attribution windows, different models, different definitions of a "conversion." The agency spent hours every month reconciling attribution for clients who cared about the numbers (which was most of them). They never fully resolved it — they just presented "platform-reported" numbers alongside "GA4 numbers" and let the client pick which story they preferred.

The founder told us: "We're a marketing agency that spends more time on spreadsheets than on marketing. Something's fundamentally broken."


What We Built

An AI operations platform where every marketing tool the agency uses is connected through MCP (Model Context Protocol), an open protocol that lets an AI agent discover and call tools exposed by external services. Here, each tool wraps one platform's API, so the agent can both read data and take actions. The agency's team interacts with the AI through a chat interface. They ask questions and give commands, and the AI carries them out across whichever platforms are relevant.

The key idea: instead of the team logging into 15 different platforms, the AI logs into all of them. The team talks to the AI.

The MCP Architecture (Why This Works)

MCP is what makes this different from "we built a dashboard that shows data from multiple sources." Dashboards are read-only. This system reads and writes. The AI doesn't just show you that a campaign is overspending — it can pause it. It doesn't just tell you the weekly metrics — it can generate the report, format it, and draft the client email.

Each platform integration is an MCP server — a small service that exposes the platform's capabilities as tools the AI can use. When someone asks the AI "what's Client X's Google Ads spend this week?", the AI calls the Google Ads MCP server's get_campaign_metrics tool. When someone says "pause all Meta ad sets under 1.2 ROAS for Client Y", the AI calls the Meta Ads MCP server's update_adset_status tool for each qualifying ad set.
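To make the request/response shape concrete, here is a stdlib-only sketch of a server dispatching MCP `tools/call` requests. These travel as JSON-RPC 2.0 messages that carry a tool `name` and its `arguments`. The handlers are hypothetical and return canned data. A production server would more likely be built on an official MCP SDK, which handles this framing and the transport itself.

```python
import json
from typing import Any, Callable

# Hypothetical tool handlers: a real MCP server would call the ad platform's API here.
def get_campaign_metrics(client_id: str, date_range: str) -> dict[str, Any]:
    return {"client_id": client_id, "date_range": date_range, "spend": 1234.5}

def update_adset_status(adset_id: str, status: str) -> dict[str, Any]:
    if status not in {"ACTIVE", "PAUSED"}:
        raise ValueError(f"unsupported status: {status}")
    return {"adset_id": adset_id, "status": status}

TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "get_campaign_metrics": get_campaign_metrics,
    "update_adset_status": update_adset_status,
}

def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 'tools/call' request to a registered tool."""
    req = json.loads(raw)
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name", ""))
    if req.get("method") != "tools/call" or tool is None:
        # Unknown methods or tools are protocol errors, returned in the JSON-RPC "error" field.
        code = -32601 if req.get("method") != "tools/call" else -32602
        return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                           "error": {"code": code, "message": "unknown method or tool"}})
    try:
        data = tool(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(data)}], "isError": False}
    except (TypeError, ValueError) as exc:
        # Execution failures go inside the result so the model can read them and adjust.
        result = {"content": [{"type": "text", "text": str(exc)}], "isError": True}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), "result": result})
```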

We built MCP servers for 16 platforms:

Ad Platforms (8):

  • Google Ads — campaign management, bid adjustments, budget changes, keyword management, ad creation
  • Meta Ads (Facebook + Instagram) — campaign/adset/ad management, audience creation, budget pacing
  • TikTok Ads — campaign management, spend monitoring, creative performance
  • LinkedIn Campaign Manager — B2B campaign management, audience targeting, lead gen forms
  • Microsoft Ads (Bing) — campaign mirroring from Google, bid management
  • Pinterest Ads — pin promotion, audience management
  • Amazon Ads (Sponsored Products/Brands) — for e-commerce clients
  • Snapchat Ads — story ads, collection ads, spend monitoring

Analytics & SEO (5):

  • Google Analytics 4 — traffic, conversions, audience data, funnel analysis
  • Google Search Console — organic search performance, keyword rankings, indexing issues
  • Ahrefs — backlink monitoring, keyword tracking, competitor analysis
  • SEMrush — keyword research, site audit data, position tracking
  • Hotjar — heatmaps, session recordings, user behaviour summaries

CRM & Email (3):

  • HubSpot — contact management, deal pipeline, email campaign performance, lead attribution
  • Klaviyo — e-commerce email/SMS campaign data, flow performance, revenue attribution
  • Mailchimp — email campaign metrics, audience segments, A/B test results

Reporting & Output:

  • Google Sheets — read/write for report templates, data dumps, client-facing dashboards
  • Looker Studio — programmatic report generation, data source management

Each MCP server handles authentication (OAuth tokens, API keys), rate limiting, error handling, and data normalization. The AI doesn't know or care about the underlying API differences — it works with normalized concepts like "campaign," "ad set," "spend," "conversions," regardless of whether the source is Google, Meta, or TikTok.
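Here is a minimal sketch of that normalization step. It assumes Google Ads rows arrive in the REST API's JSON shape, with camelCase keys and cost in micros. It assumes Meta rows come from the Insights API, where numbers arrive as strings and conversions sit inside an `actions` list. `CampaignMetrics` and its field names are illustrative. Treating `purchase` as the conversion action is also an assumption, and the right choice would vary by client.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CampaignMetrics:
    """Platform-neutral record the agent reasons over (field names are illustrative)."""
    platform: str
    campaign_id: str
    spend: float        # account currency units, never micros
    conversions: float

def from_google_ads(row: dict) -> CampaignMetrics:
    # Assumes REST-style JSON: camelCase keys, int64 values encoded as strings.
    # Cost is reported in micros (1 currency unit = 1,000,000 micros).
    return CampaignMetrics(
        platform="google_ads",
        campaign_id=str(row["campaign"]["id"]),
        spend=int(row["metrics"]["costMicros"]) / 1_000_000,
        conversions=float(row["metrics"]["conversions"]),
    )

def from_meta(row: dict, conversion_action: str = "purchase") -> CampaignMetrics:
    # Insights rows carry numbers as strings; conversions are entries in "actions"
    # keyed by action_type. Which action_type counts is a per-client assumption.
    conversions = sum(
        (float(a["value"]) for a in row.get("actions", []) if a["action_type"] == conversion_action),
        0.0,
    )
    return CampaignMetrics(
        platform="meta",
        campaign_id=str(row["campaign_id"]),
        spend=float(row["spend"]),
        conversions=conversions,
    )

def total_spend(records: list[CampaignMetrics]) -> float:
    """Once normalized, cross-platform rollups are plain arithmetic."""
    return sum(r.spend for r in records)
```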

What the Team Actually Does With It

This isn't a theoretical architecture — it's a tool people use 40+ times a day. Here's what the actual usage looks like:

Morning health check. The account manager opens the chat and types: "Morning check: flag anything that needs attention across all clients." The AI scans every active campaign across every platform and checks for:

  • Campaigns that stopped spending (disapproved ads, budget limits, payment issues)
  • Spend pacing anomalies (over or under by more than 15%)
  • Significant performance drops (CPA up more than 25% day-over-day)
  • New leads or conversions that need follow-up in HubSpot
  • SEO alerts from Ahrefs (lost backlinks, ranking drops)

It returns a prioritized list. Most mornings, 40 out of 55 clients are fine. The 15 that need attention are flagged with specific issues and suggested actions. What used to take 2 hours of dashboard-hopping takes 3 minutes.
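The checks listed above reduce to a handful of threshold rules per campaign. This sketch assumes each campaign has already been summarized into a hypothetical `CampaignSnapshot`, with the pacing ratio computed elsewhere. The 15% pacing and 25% CPA thresholds come from the list. The ranking is deliberately naive and simply puts the campaigns with the most issues first.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the morning-check rules listed above.
PACING_TOLERANCE = 0.15   # flag if more than 15% over or under pace
CPA_JUMP = 0.25           # flag if CPA rose more than 25% day-over-day

@dataclass
class CampaignSnapshot:
    client: str
    campaign: str
    spend_yesterday: float
    pacing_ratio: float               # actual / expected spend to date; 1.0 = on pace
    cpa_yesterday: Optional[float]
    cpa_day_before: Optional[float]

def morning_flags(s: CampaignSnapshot) -> list[str]:
    """Return human-readable issues for one campaign (empty list = healthy)."""
    flags = []
    if s.spend_yesterday == 0:
        flags.append("stopped spending")
    if abs(s.pacing_ratio - 1.0) > PACING_TOLERANCE:
        flags.append(f"pacing at {s.pacing_ratio:.0%} of target")
    if s.cpa_yesterday and s.cpa_day_before and s.cpa_yesterday > s.cpa_day_before * (1 + CPA_JUMP):
        change = s.cpa_yesterday / s.cpa_day_before - 1
        flags.append(f"CPA up {change:.0%} day-over-day")
    return flags

def prioritized(snapshots: list[CampaignSnapshot]) -> list[tuple[CampaignSnapshot, list[str]]]:
    """Keep only flagged campaigns, most issues first (a deliberately naive ranking)."""
    flagged = [(s, morning_flags(s)) for s in snapshots]
    return sorted([p for p in flagged if p[1]], key=lambda p: len(p[1]), reverse=True)
```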

On-demand campaign management. "Increase the daily budget on Client Z's Google Search campaign 'Brand Terms' from $200 to $350." The AI confirms the change, makes it via the Google Ads MCP server, and logs it. "Duplicate Client A's best-performing Meta ad set into a new campaign targeting lookalike audiences." Done. The AI handles the API calls, the media buyer handles the strategy.

Real-time client questions. Client calls and asks "how are we doing this month?" Account manager types: "Give me a performance summary for Client Q — all channels, month to date." Within 15 seconds, the AI pulls spend, impressions, clicks, conversions, CPA, and ROAS from Google Ads, Meta, and TikTok, compares to last month, and formats a response the account manager can read to the client on the phone. Before this, they'd say "let me pull that together and send you an email" — and the email would come 3 hours later.

Weekly reporting. "Generate weekly reports for all active clients." The AI pulls data from every relevant platform for each client, populates the Google Sheets templates, calculates week-over-week changes, writes performance commentary (not generic — it references specific campaigns and changes), and drafts the email. A human reviews each one before sending, but the 18-hour weekly reporting task now takes about 90 minutes of review time.
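The week-over-week arithmetic behind the commentary is simple, but one wrinkle is worth encoding: for cost metrics such as CPA, a decrease is an improvement. This is a minimal sketch. The metric names and the `LOWER_IS_BETTER` set are assumptions.

```python
from typing import Optional

LOWER_IS_BETTER = {"cpa", "cpc", "cpm"}   # cost metrics: a drop is good news

def wow_change(this_week: float, last_week: float) -> Optional[float]:
    """Fractional week-over-week change, or None when there is no baseline."""
    if last_week == 0:
        return None
    return (this_week - last_week) / last_week

def describe(metric: str, this_week: float, last_week: float) -> str:
    change = wow_change(this_week, last_week)
    if change is None:
        return f"{metric}: no prior-week baseline"
    if change == 0:
        return f"{metric}: flat week-over-week"
    improved = change < 0 if metric in LOWER_IS_BETTER else change > 0
    verdict = "improved" if improved else "worsened"
    return f"{metric}: {change:+.0%} week-over-week ({verdict})"
```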

Budget pacing. The AI monitors spend pacing daily. If a client's Google Ads account is on track to underspend by more than 10% this month, it flags it and suggests: "Client R is pacing at 78% of monthly Google budget. Recommend increasing daily budget from $400 to $480 for the remaining 12 days." The account manager approves or modifies. Overspend alerts work the same way — "Client S is pacing 18% over monthly Meta budget. Recommend reducing ad set budgets by 15% or pausing lowest-ROAS ad sets."
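Here is a sketch of the pacing arithmetic. It assumes that "pacing at X%" means projected month-end spend divided by the monthly budget, where the projection assumes the current daily budget is fully delivered for the remaining days. The function names and the figures in the usage note are hypothetical and are not Client R's numbers.

```python
def projected_month_spend(spent_to_date: float, daily_budget: float, days_left: int) -> float:
    """Month-end spend if the current daily budget is fully delivered for the rest of the month."""
    return spent_to_date + daily_budget * days_left

def pace(spent_to_date: float, daily_budget: float, days_left: int, monthly_budget: float) -> float:
    """Projected spend as a share of the monthly budget (1.0 = on target)."""
    return projected_month_spend(spent_to_date, daily_budget, days_left) / monthly_budget

def needs_flag(pace_value: float, tolerance: float = 0.10) -> bool:
    """Flag when projected spend is more than `tolerance` away from budget, in either direction."""
    return abs(pace_value - 1.0) > tolerance

def daily_budget_to_land(monthly_budget: float, spent_to_date: float, days_left: int) -> float:
    """Daily budget that would spend exactly the remaining budget by month-end."""
    if days_left <= 0:
        raise ValueError("no days left in the month")
    return max(0.0, (monthly_budget - spent_to_date) / days_left)
```

For example, with a $12,000 monthly budget, $5,400 already spent, and $400/day for the last 12 days, the projection lands at 85% of budget. A daily budget of $550 would spend the remainder exactly.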

Cross-platform attribution. The AI pulls conversion data from the ad platforms and from GA4. For each client, it presents: platform-reported conversions (what Google/Meta/TikTok each claim), GA4-reported conversions (unified view), and the delta. It doesn't solve the attribution problem — nobody can, it's inherently messy — but it makes the mess visible and consistent. Clients get the same attribution view every week instead of whatever the account manager happened to pull that morning.
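The attribution view is a small aggregation. This sketch uses the example figures from earlier (Google Ads 200, Meta 150, GA4 280), and the names are hypothetical. GA4's 280 counts conversions from all sources, including ones no ad platform claims. So if GA4's total is taken as the true count, the 70-conversion gap is a lower bound on double-counting, not an exact measure of it.

```python
from typing import Optional

def attribution_view(platform_reported: dict[str, int], ga4_total: int) -> dict:
    """Side-by-side view: what each platform claims vs. GA4's unified count."""
    claimed = sum(platform_reported.values())
    ratio: Optional[float] = claimed / ga4_total if ga4_total else None
    return {
        "platform_reported": dict(platform_reported),
        "platform_sum": claimed,
        "ga4_total": ga4_total,
        "delta": claimed - ga4_total,   # > 0 means platforms claim more than GA4 saw in total
        "overclaim_ratio": ratio,
    }

view = attribution_view({"google_ads": 200, "meta": 150}, ga4_total=280)
# platform_sum 350, delta 70, overclaim_ratio 1.25
```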

Anomaly detection and proactive alerts. This runs continuously, not just during morning checks. When a campaign's CPA jumps 40% or more within 4 hours, the team gets a notification: "Client T's Meta retargeting campaign CPA jumped from $12 to $19 since 2pm. Ad fatigue likely: frequency is at 4.2 (was 2.1 last week). Suggest refreshing creative or narrowing audience." The team decides what to do, but they know about the problem in real time instead of finding out three days later.

Guardrails and approval flows. Not everything is automated. The system has configurable guardrails:

  • Budget changes above a threshold require human approval
  • Campaign creation always requires human approval
  • Ad creative changes require human approval
  • Pausing campaigns with spend above $X/day requires approval
  • Any action on "tier 1" clients (top 10 by spend) requires senior approval
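Here is a sketch of how those rules could route an action to the right approval level. The article says the thresholds are configurable and leaves the pause threshold unspecified, so `BUDGET_CHANGE_LIMIT`, `PAUSE_SPEND_LIMIT`, and the tier-1 client IDs below are placeholder values. The $100 figure borrows the junior-buyer limit from the stack notes. Reading "budget changes above a threshold" as the size of the change is an assumption.

```python
from dataclasses import dataclass
from enum import Enum

class Approval(Enum):
    AUTO = "auto"        # execute immediately, still logged
    HUMAN = "human"      # any authorized team member
    SENIOR = "senior"    # senior sign-off

@dataclass
class Action:
    kind: str                          # "budget_change", "create_campaign", "edit_creative", "pause"
    client_id: str
    budget_delta: float = 0.0          # change in daily budget, account currency
    current_daily_spend: float = 0.0   # spend of the campaign being paused

# Placeholder values: the article says these are configurable and leaves $X unspecified.
BUDGET_CHANGE_LIMIT = 100.0
PAUSE_SPEND_LIMIT = 1_000.0
TIER_1_CLIENTS = {"client-001", "client-002"}   # in practice, the top 10 clients by spend

def required_approval(action: Action) -> Approval:
    """Most restrictive rule wins, so tier-1 clients are checked first."""
    if action.client_id in TIER_1_CLIENTS:
        return Approval.SENIOR
    if action.kind in {"create_campaign", "edit_creative"}:
        return Approval.HUMAN
    if action.kind == "budget_change" and abs(action.budget_delta) > BUDGET_CHANGE_LIMIT:
        return Approval.HUMAN
    if action.kind == "pause" and action.current_daily_spend > PAUSE_SPEND_LIMIT:
        return Approval.HUMAN
    return Approval.AUTO
```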

Everything is logged — who asked, what the AI did, which platform, which client, timestamp. Full audit trail. When a client asks "who changed my budget on Tuesday?", there's an answer.
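An audit entry needs only the fields the security notes list: user, client, platform, tool, parameters, result, and timestamp. This sketch serializes entries as JSON lines and answers "who changed my budget on Tuesday?" by filtering them. The article stores the trail in PostgreSQL, so the JSON-lines format and the `update_campaign_budget` tool name are illustrative only.

```python
import json
from datetime import datetime, timezone
from typing import Optional

def audit_record(user_id: str, client_id: str, platform: str, tool: str,
                 params: dict, result: str, ts: Optional[datetime] = None) -> str:
    """Serialize one action as a JSON line with the fields the article lists."""
    entry = {
        "timestamp": (ts or datetime.now(timezone.utc)).isoformat(),
        "user_id": user_id,
        "client_id": client_id,
        "platform": platform,
        "tool": tool,
        "params": params,
        "result": result,
    }
    return json.dumps(entry, sort_keys=True)

def changes_on(lines: list[str], client_id: str, tool: str, day: str) -> list[dict]:
    """Filter the log for one client, tool and UTC day (YYYY-MM-DD)."""
    entries = (json.loads(line) for line in lines)
    return [e for e in entries
            if e["client_id"] == client_id and e["tool"] == tool and e["timestamp"].startswith(day)]
```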

The AI agent sits between the team and 16 platforms. Every interaction goes through the guardrails engine. Every action is logged. The team works in natural language; the AI handles the API calls.

Technical stack (for the engineering-minded)
  • AI Agent: Claude (Anthropic) as the primary LLM for reasoning and tool selection. Chosen over GPT-4 for more reliable tool-use — the agent often needs to chain 5–8 MCP tool calls in a single user request ("generate weekly report for Client X" requires pulling from Google Ads, Meta, GA4, and writing to Google Sheets). Claude's tool-use was more consistent in our testing, especially for multi-step chains. Falls back to GPT-4o for specific tasks where we needed structured JSON output matching platform-specific schemas.
  • MCP Layer: 16 MCP servers, each running as a lightweight Python service (FastAPI). Each server wraps a platform's API and exposes it as MCP-compatible tools. Authentication handled per-server — Google platforms use OAuth 2.0 with refresh tokens, Meta uses long-lived access tokens, Ahrefs/SEMrush use API keys. Token refresh runs on a cron schedule; expired tokens trigger alerts.
  • Orchestration: Custom agent loop built on LangGraph. Handles multi-turn reasoning, tool selection, result aggregation, and error recovery. If a tool call fails (API rate limit, auth expired), the agent retries with backoff or reports the failure to the user with context.
  • Backend: FastAPI (Python 3.12), async throughout. PostgreSQL for client data, campaign metadata, action logs, and audit trail. Redis for caching frequently-accessed metrics (campaign spend is polled every 15 minutes; cached to avoid hammering platform APIs). Celery for background jobs (daily health checks, budget pacing calculations, weekly report generation).
  • Frontend: React 18 + TypeScript. Chat interface built with a custom streaming UI — the AI's responses stream in real time, including tool call status ("Querying Google Ads... Querying Meta Ads... Calculating..."). Approval queue is a separate view — pending actions with one-click approve/reject.
  • Anomaly Detection: Lightweight statistical model (Z-score based) running on hourly metric snapshots. Flags metrics that deviate more than 2 standard deviations from the 14-day rolling mean. Not a neural network — overkill for this. Simple stats, run frequently, with sensible thresholds per metric type. CPA spikes matter more than impression dips.
  • Reporting Pipeline: Jinja2 templates for Google Sheets population. The AI generates the commentary text; the pipeline handles formatting, chart generation (via the Sheets API), and email drafting (via Gmail API). Monthly reports use Looker Studio templates with programmatic data source updates.
  • Infrastructure: AWS — ECS for the MCP servers and backend, RDS for PostgreSQL, ElastiCache for Redis, S3 for report archives and exported files. Total infra cost: ~$2,200/month. The MCP servers are the main cost driver — 16 services, each with its own container, though most are lightweight.
  • Security: All platform credentials stored in AWS Secrets Manager. OAuth tokens encrypted at rest. Every action is logged with user ID, client ID, platform, tool called, parameters, result, and timestamp. Role-based access — junior media buyers can query data but can't execute budget changes above $100/day without approval. Client data is isolated — the AI only accesses platforms for the client the user is asking about.
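The anomaly rule described in the stack notes flags a value more than 2 standard deviations from a 14-day rolling mean of hourly snapshots. It takes only a few lines of standard-library Python. The per-metric `RULES` table is a hypothetical way to express "CPA spikes matter more than impression dips", and so is the idea of flagging only CPA increases and impression drops.

```python
from statistics import mean, stdev
from typing import Optional

WINDOW = 14 * 24      # 14 days of hourly snapshots
DEFAULT_Z = 2.0       # the 2-standard-deviation threshold described above

# Hypothetical per-metric rules: (z threshold, direction); +1 flags rises only,
# -1 flags drops only, 0 flags both.
RULES = {"cpa": (2.0, +1), "impressions": (3.0, -1)}

def zscore(history: list[float], latest: float) -> Optional[float]:
    """Z-score of `latest` against the trailing window, or None if undefined."""
    window = history[-WINDOW:]
    if len(window) < 2:
        return None
    sd = stdev(window)
    if sd == 0:
        return None
    return (latest - mean(window)) / sd

def is_anomalous(metric: str, history: list[float], latest: float) -> bool:
    threshold, direction = RULES.get(metric, (DEFAULT_Z, 0))
    z = zscore(history, latest)
    if z is None:
        return False
    if direction == 0:
        return abs(z) > threshold
    return z * direction > threshold
```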

How We Rolled It Out

Months 1–2: Platform integration marathon. We started with the four platforms that covered 80% of the agency's spend: Google Ads, Meta Ads, GA4, and Google Sheets. Each MCP server took about a week to build and test — not because the API integration was hard, but because each platform has its own authentication dance, rate limits, error formats, and data quirks.

Google Ads was the most complex: its API has hundreds of endpoints, and campaign structures are deeply nested (account → campaign → ad group, with ads and keywords both living at the ad-group level). We exposed 23 tools from the Google Ads MCP server. Meta was second: its Ads API changes frequently and the documentation sometimes contradicts itself. The Meta MCP server has 18 tools.

The remaining 12 platforms came in batches over months 2–4. Some were straightforward (Ahrefs has a clean API, took 3 days). Some were painful (Snapchat's Ads API documentation was incomplete, Pinterest's rate limits were surprisingly aggressive). TikTok's API required a separate business center verification process that took 2 weeks of back-and-forth.
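Aggressive rate limits are where the orchestration layer's retry-with-backoff logic earns its keep. A common pattern is exponential backoff with "full jitter", where each retry sleeps a random time up to a doubling cap. This sketch assumes the MCP server raises a hypothetical `RateLimited` exception on HTTP 429. It also accepts an injectable `sleep` so it can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Hypothetical error an MCP server raises when a platform answers HTTP 429."""

def call_with_backoff(fn: Callable[[], T], *, max_attempts: int = 5, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn on RateLimited with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the failure to the user with context
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```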

Month 3: Agent brain surgery. Building MCP servers is plumbing. The hard part was making the AI agent actually useful. Early versions were frustrating: the agent would make a Google Ads query, get a 500-row response, and produce a poor summary of it. Or it would chain 4 tool calls when 1 would do. Or it would confidently misread a metric, reporting "CPA is $12" when the actual CPA was $12.47 and the $12 figure was the CPM.

We spent almost a month on prompt engineering, tool descriptions, and result formatting. The tool descriptions matter more than people think — a well-described tool with clear parameter definitions and example outputs is the difference between an agent that works and one that hallucinates. We also added a "planning" step where the agent outlines what it's going to do before doing it, so the user can catch mistakes before they happen.
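In MCP, each tool is advertised with a `name`, a free-text `description`, and an `inputSchema` written as JSON Schema. That is where clear parameter definitions and example outputs live. Below is a hypothetical definition for the `get_campaign_metrics` tool mentioned earlier, paired with a deliberately minimal argument check. The enum values and wording are illustrative. A real server would use a full JSON Schema validator.

```python
GET_CAMPAIGN_METRICS = {
    "name": "get_campaign_metrics",
    "description": (
        "Return spend, impressions, clicks, conversions, CPA and ROAS for one client's "
        "Google Ads campaigns over a date range. Money is in the account currency, not micros. "
        "CPA is spend / conversions; it is NOT the same as CPM (cost per 1,000 impressions). "
        'Example output: [{"campaign_id": "123", "name": "Brand Terms", "spend": 412.5, "cpa": 12.47}]'
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "client_id": {"type": "string", "description": "Internal client ID, not the display name."},
            "date_range": {
                "type": "string",
                "enum": ["TODAY", "YESTERDAY", "LAST_7_DAYS", "THIS_MONTH"],
                "description": "Reporting window.",
            },
        },
        "required": ["client_id", "date_range"],
    },
}

def validate_arguments(tool: dict, args: dict) -> list[str]:
    """Minimal check of required keys, unknown keys and enums (not a full JSON Schema validator)."""
    schema = tool["inputSchema"]
    errors = [f"missing required argument: {key}" for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            errors.append(f"unexpected argument: {key}")
        elif "enum" in prop and value not in prop["enum"]:
            errors.append(f"{key} must be one of {prop['enum']}")
    return errors
```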

Months 4–5: Shadow mode with 5 clients. Picked 5 clients across different verticals (e-commerce, SaaS, D2C, B2B, local services). The team used the AI alongside their normal workflow. The AI's recommendations were logged, but humans carried out every change: the AI would say "I recommend pausing these 3 ad sets", and the media buyer would go do it manually in Meta.

Main findings from shadow mode:

  • The morning health check was immediately valuable. Every media buyer started their day with it by week 2.
  • Report generation worked well for standard metrics but the AI's commentary was too generic. We added client context profiles — the AI knows that Client X cares about ROAS, Client Y cares about lead volume, Client Z is in a brand awareness phase. Commentary improved significantly.
  • Budget pacing caught 3 overspend situations in the first month that would have been missed.
  • The AI occasionally misidentified which campaign to act on when clients had similarly named campaigns across platforms. We added campaign ID confirmation before any write action.
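The fix for look-alike campaign names amounts to refusing to guess: resolve a name to exactly one campaign ID, or ask the user. This is a sketch with hypothetical types. A simple case-insensitive substring match stands in for whatever matching the production agent actually uses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Campaign:
    platform: str
    campaign_id: str
    name: str

class AmbiguousCampaign(Exception):
    """Raised so the agent asks the user to pick a campaign ID instead of guessing."""
    def __init__(self, candidates: list[Campaign]):
        super().__init__(f"{len(candidates)} campaigns match; confirm by ID")
        self.candidates = candidates

def resolve_campaign(campaigns: list[Campaign], name: str, platform: Optional[str] = None) -> Campaign:
    """Return exactly one matching campaign, or raise so a write action never targets a guess."""
    needle = name.casefold()
    matches = [c for c in campaigns
               if needle in c.name.casefold() and (platform is None or c.platform == platform)]
    if not matches:
        raise LookupError(f"no campaign matching {name!r}")
    if len(matches) > 1:
        raise AmbiguousCampaign(matches)
    return matches[0]
```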

Months 6–7: Expanding to all clients + write actions. Enabled the platform for all 55 clients. Turned on write capabilities (budget changes, pausing/enabling campaigns) with the guardrail system. The first week was tense — the founder was watching the audit log like a hawk. Nothing went wrong. By week 3, the team was making 40+ AI-assisted actions per day across platforms.

The reporting automation rolled out during this phase. The first batch of AI-generated weekly reports got mixed reviews — 80% were good enough to send with minor edits, 15% needed significant rewriting, and 5% had errors (wrong date range, pulled data for wrong campaign). By month 7, after tuning, it was closer to 90% good, 8% minor edits, 2% errors.

Months 8–9: Anomaly detection, CRM integration, and stabilization. Added the real-time anomaly detection layer. Connected HubSpot, Klaviyo, and Mailchimp — this was the agency's most-requested feature after reporting. Account managers wanted to see "Client X's Google Ads generated 45 leads this week, 38 are in HubSpot, 12 have progressed to 'qualified'" without logging into both platforms.
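Once both sides share a key, joining ad-platform leads to CRM records is a set lookup. This sketch assumes such a key exists, for example a lead ID captured on the form and stored in HubSpot. The `qualified` stage name is hypothetical.

```python
def lead_funnel(ad_lead_ids: list[str], crm_stage_by_id: dict[str, str],
                qualified_stages: frozenset[str] = frozenset({"qualified"})) -> dict[str, int]:
    """Count ad-platform leads, how many reached the CRM, and how many progressed."""
    unique_leads = set(ad_lead_ids)                      # guard against duplicate submissions
    in_crm = {lead for lead in unique_leads if lead in crm_stage_by_id}
    qualified = {lead for lead in in_crm if crm_stage_by_id[lead] in qualified_stages}
    return {"leads": len(unique_leads), "in_crm": len(in_crm), "qualified": len(qualified)}
```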

The last month was stabilization — fixing edge cases, improving the agent's handling of ambiguous requests ("pause the campaign" — which one? on which platform?), and building the approval queue UI for actions that need senior sign-off.


What Changed

Six months after full rollout, across all 55 client accounts:

What we measured | Before | After | Change
Time spent on weekly reporting | ~18 hrs/week | ~90 min/week of review | 90%+ reduction
Budget pacing errors per month | 8–12 incidents | 1–2 incidents | Down ~80%
Time to detect campaign issues | 1–3 days | Under 30 min (real-time alerts) | Essentially instant
Accidental overspend incidents | 3–4/quarter | 0 in 6 months | Eliminated
Client-facing response time | "Let me pull that together" (3+ hrs) | 15 seconds, live on call | Immediate
Monthly reports delivered on time | ~70% | 98% | Reliable
Client churn (annualized) | ~22% | Under 10% | More than halved
AI-assisted actions per day | 0 | 40–60 | New capability

The team dynamic shifted. Media buyers used to spend their mornings opening tabs and checking dashboards. Now they spend their mornings asking the AI "what needs my attention?" and then actually working on strategy — testing new audiences, reviewing creative performance, planning campaign expansions. One senior media buyer told us she "feels like she has a research assistant who never sleeps and never forgets to check something."

The reporting impact was the most visible. Three analysts used to spend Monday mornings in a reporting sprint. Now one analyst reviews AI-generated reports in about 90 minutes, and the other two were moved to deeper analytics work — cohort analysis, LTV modeling, attribution studies — stuff the agency always wanted to do but never had bandwidth for.

The biggest surprise: client retention. They lost 12 clients the year before we started (22% churn). In the 6 months after full deployment, they lost 2. The founder attributes it mostly to responsiveness — clients feel like the agency is on top of things because answers come in real time, reports are always on schedule, and problems get flagged before the client notices. "We didn't get better at marketing," he said. "We got better at communicating that we're good at marketing."
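The churn figures hang together arithmetically if the client base stays near 55: 12 of 55 is about 22%, and 2 losses in 6 months annualize to roughly 7%. This sketch uses simple linear annualization, which assumes the 6-month loss rate continues unchanged.

```python
def churn_rate(lost: int, client_base: int) -> float:
    return lost / client_base

def annualize(rate: float, months: float) -> float:
    """Linear annualization: assumes losses continue at the same monthly pace."""
    return rate * 12 / months

before = churn_rate(12, 55)               # the year before rollout
after = annualize(churn_rate(2, 55), 6)   # 6 months after rollout, annualized
print(f"{before:.1%} -> {after:.1%}")     # 21.8% -> 7.3%
```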

One thing that didn't work as well as we hoped: the AI's ability to recommend strategy changes. It can tell you that a campaign's CPA increased 30%. It can even tell you that the increase correlates with a frequency spike. But when it tries to recommend "refresh creative" or "narrow targeting" — it's right about 60% of the time, which isn't reliable enough for the team to trust without verification. We're working on this, but strategic recommendations require context that's hard to encode — the client's brand guidelines, what they've tried before, competitive dynamics. For now, the AI flags problems; humans decide solutions.

"I was skeptical. I've seen 'AI marketing tools' before — they're usually dashboards with a chatbot bolted on. This is different because it actually does things. I can say 'pause all underperforming ad sets for Client X' and it does it. I can say 'how are all my clients pacing this month' and I get an answer in 10 seconds that used to take me an hour to compile. My team went from being data janitors to being actual strategists. That's the shift."

Founder & CEO (name withheld at client's request)


What's Next

The platform handles the operational layer well. But there are gaps, and some of them are fundamental:

  1. Creative intelligence — the system knows which ads perform well but can't tell you why. "Ad variant B has 40% higher CTR than variant A" is useful. "Variant B works better because the product is shown in-use rather than on a white background" would be transformative. We're exploring multimodal AI (vision models analyzing ad creatives) to bridge this gap. Early experiments are promising but not production-ready.

  2. Cross-client insights — the agency manages 55 clients, some in the same verticals. There should be learnings that transfer — "this audience targeting approach worked for 3 other e-commerce clients, try it for Client Y." Right now, that knowledge lives in the media buyers' heads. Encoding it into the system without leaking client data across accounts is the design challenge.

  3. Automated A/B testing — the AI can set up tests and monitor results, but the team still manually decides when to call a winner and scale it. They want the AI to handle the statistical significance calculation and auto-scale winners — with human approval, of course. Straightforward conceptually, but the edge cases are tricky (what if both variants are underperforming? what if sample size is too small but the client wants results now?).

  4. Client-facing portal — right now, only agency staff interact with the AI. Several clients have asked for read-only access to a version of the morning health check for their own accounts. We're designing a client-facing dashboard powered by the same MCP infrastructure but with restricted permissions and simplified language. The tricky part: the AI's internal commentary sometimes includes agency-internal notes ("Client Z is likely to churn if ROAS doesn't improve" — not something you want the client to see).
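For the significance calculation in the automated A/B testing item above, the textbook way to compare CTRs is a pooled two-proportion z-test. This sketch rests on stated assumptions. The 10-per-cell minimum is a common rule of thumb for the normal approximation, not a value from the article. Returning `None` is one way to surface the "sample too small" case. A significant result shows which variant is better, not whether either is good enough, so the case where both variants underperform still needs a separate check.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

MIN_PER_CELL = 10   # rule-of-thumb floor for the normal approximation (an assumption)

def ctr_z_test(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int) -> Optional[tuple[float, float]]:
    """Two-sided pooled two-proportion z-test on CTR. Returns (z, p) or None if too small."""
    for clicks, imps in ((clicks_a, imps_a), (clicks_b, imps_b)):
        if clicks < MIN_PER_CELL or imps - clicks < MIN_PER_CELL:
            return None
    p_a, p_b = clicks_a / imps_a, clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```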


Built by GammaEdge. If your agency or marketing team is spending more time on platform management than on actual marketing — we should talk.


Authored by:

GammaEdge Team

We build and ship production-grade AI systems that drive measurable outcomes. No demos, no slides — just systems that run.

