Automation Pipeline

Everything that happens from the moment a consultant submits the form to leads landing in HeyReach (LinkedIn) or Instantly (Email).

💬

LinkedIn Path (HeyReach)

Steps 1–4 → Push to HeyReach
No email needed — leads are pushed by LinkedIn URL with personalized messages as custom variables.

📧

Email Path (Instantly)

Steps 1–6 → Push to Instantly
Same as LinkedIn path + 2 extra steps: find work email from LinkedIn, verify it's deliverable via MillionVerifier.

Entry Point

📋

Google Form

Consultant fills out their name, LinkedIn, ICP, areas of expertise, case studies, tool choice (HeyReach or Instantly), and uploads a CSV of their LinkedIn connections.

⚡

Google Apps Script

On form submit, a script fires automatically — converts the CSV file to base64 and POSTs everything to /api/webhook on Vercel.

🔗

Webhook — /api/webhook

Vercel · Next.js API Route

Instant

Step 1

Validates the webhook secret so no one else can trigger it

Step 2

Parses the base64 CSV → extracts first name, last name, LinkedIn URL, company, position per row

Step 3

Saves the submission + all leads to Supabase, then fires the Inngest background pipeline
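The row extraction in Step 2 could look roughly like the sketch below. It assumes the base64 payload has already been decoded to text and that the CSV uses LinkedIn export-style headers (First Name, Last Name, URL, Company, Position). The ParsedLead type and both function names are illustrative, not taken from the real route.

```typescript
// Shape of one parsed row. Field names are illustrative, not the real schema.
interface ParsedLead {
  firstName: string;
  lastName: string;
  linkedinUrl: string;
  company: string;
  position: string;
}

// Split one CSV line into cells, honouring double-quoted fields ("Acme, Inc.").
// Quoted fields that span several lines are not handled in this sketch.
function splitCsvLine(line: string): string[] {
  const cells: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      cells.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  cells.push(current);
  return cells.map((c) => c.trim());
}

function parseConnectionsCsv(csvText: string): ParsedLead[] {
  const lines = csvText.split(/\r?\n/).filter((l) => l.trim() !== "");
  // Look for the header row instead of assuming it is the first line.
  const headerIndex = lines.findIndex((l) => l.includes("First Name"));
  if (headerIndex === -1) return [];
  const header = splitCsvLine(lines[headerIndex] ?? "");
  const col = (label: string): number => header.indexOf(label);
  const idx = {
    first: col("First Name"),
    last: col("Last Name"),
    url: col("URL"),
    company: col("Company"),
    position: col("Position"),
  };
  return lines
    .slice(headerIndex + 1)
    .map((line) => {
      const cells = splitCsvLine(line);
      const at = (i: number): string => (i >= 0 ? cells[i] ?? "" : "");
      return {
        firstName: at(idx.first),
        lastName: at(idx.last),
        linkedinUrl: at(idx.url),
        company: at(idx.company),
        position: at(idx.position),
      };
    })
    .filter((lead) => lead.linkedinUrl !== ""); // a lead without a URL cannot be scraped
}
```

Rows without a LinkedIn URL are dropped here because every later step keys off that URL; whether the real route drops or stores them is not stated above.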

Background Pipeline — Inngest

Why Inngest? Vercel serverless functions time out after 60 seconds. Inngest runs the pipeline as a fan-out of parallel background jobs with no time limit. Leads are split into batches of 20, up to 5 batches run simultaneously, and each step is checkpointed so if anything fails it retries from where it stopped. Steps 1–4 run for both paths. Steps 5–6 only run for Instantly.

LinkedIn Profile Scraping

Apify · apimaestro/linkedin-profile-detail

Both paths

Each batch of 20 leads fires off 20 Apify runs simultaneously (fire-and-forget). Inngest then sleeps for 2 minutes while Apify scrapes the profiles. After the sleep, results are fetched and saved.

Extracts: headline, about section, full experience array, current company
Fire-and-forget start → 2 minute sleep → fetch results — no Vercel function held open
If position or company was missing from the CSV, it gets filled from the scraped profile data
1 Apify run per lead — with 5 concurrent batches, up to 100 Apify runs can be active at once

IF: Scrape fails → lead is marked scrape_status: failed → auto-disqualified in Step 3 (no AI call wasted)

Each batch of 20 takes ~3–4 minutes (start + 2-minute sleep + processing). With 5 batches in parallel, 3,000 leads = 150 batches ÷ 5 = 30 rounds × 3–4 minutes ≈ 1.5–2 hours.
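The batching arithmetic above can be checked with a small sketch. The batch size (20), parallelism (5) and 3–4 minutes per batch come from this page; the function names and the assumption that batches complete in neat rounds are illustrative.

```typescript
const BATCH_SIZE = 20; // leads per batch
const MAX_PARALLEL_BATCHES = 5; // batches running at the same time

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Rough duration estimate, assuming batches finish in rounds of 5
// and each round takes 3 to 4 minutes.
function estimateScrapeMinutes(leadCount: number): {
  batches: number;
  rounds: number;
  minMinutes: number;
  maxMinutes: number;
} {
  const batches = Math.ceil(leadCount / BATCH_SIZE);
  const rounds = Math.ceil(batches / MAX_PARALLEL_BATCHES);
  return { batches, rounds, minMinutes: rounds * 3, maxMinutes: rounds * 4 };
}

const estimate = estimateScrapeMinutes(3000);
console.log(estimate); // { batches: 150, rounds: 30, minMinutes: 90, maxMinutes: 120 }
```

In practice a new batch can start as soon as any slot frees up, so real runs may finish somewhat faster than the round-based estimate.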

Experience Summarization

OpenAI · gpt-5-mini

Both paths

The raw experience array from Apify is a deeply nested JSON object. We compress it into a clean ~150-word plain text summary before passing it to the qualification step.

Input: raw scraped_experiences JSON array from Apify
Output: one line per role — "Title at Company (dates)"
Reduces token cost by ~80% for qualification, makes the model's job easier
3x retry with backoff on failure — if all 3 fail, returns empty string

IF: Only runs for leads where scrape_status = done AND scraped_experiences exists. If the scrape failed → this step is skipped for that lead.

Lead Qualification

OpenAI · gpt-5-mini · per lead

Both paths

gpt-5-mini gets the full LinkedIn profile data (headline, about, work history) alongside the consultant's ICP and areas of expertise. It decides if this lead is worth reaching out to. No default rules — the model judges purely on what it sees vs the ICP. If the data doesn't support a match, it disqualifies.

👤

Title Check

Decision-maker title relevant to the consultant's expertise? VP+, Director+, C-suite, SVP, Partner, MD, Head of [function]. ICs, analysts, associates → disqualified.

🏢

Company Check

Does the company match the consultant's ICP? The model reads the ICP field and applies it. If the profile suggests a small consultancy but ICP says Fortune 500 → disqualified.

✅

Active Check

Currently employed in this role? Checks headline, about, and experience for current employment signals. If they left the role → disqualified.

All 3 pass → Qualified

Lead moves to name cleaning → message generation → push.

Any 1 fails → Disqualified

Lead is skipped. No cleaning, no messages, never pushed.

IF: Only runs for leads with summarized_experience. If there is no summary (scrape failed) → auto-disqualified with reason 'No experience data found'.

FAIL: If the OpenAI API still errors after 3 retries → lead goes to the Failed tab (not disqualified). It is never pushed.
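The three outcomes described above (qualified, disqualified, failed) can be sketched as one decision function. The model call itself is left out: `verdict` stands in for whatever gpt-5-mini returned, and every type and field name here is hypothetical.

```typescript
// Hypothetical shape of the model's verdict on the three checks.
interface ModelVerdict {
  titleOk: boolean;
  companyOk: boolean;
  activeOk: boolean;
}

type Outcome =
  | { status: "qualified" }
  | { status: "disqualified"; reason: string }
  | { status: "failed"; reason: string };

function resolveOutcome(
  summarizedExperience: string,
  verdict: ModelVerdict | "api_error",
): Outcome {
  // No summary means the scrape failed, so the lead is disqualified without a model call.
  if (summarizedExperience.trim() === "") {
    return { status: "disqualified", reason: "No experience data found" };
  }
  // API errors are kept separate from disqualification so they land in the Failed tab.
  if (verdict === "api_error") {
    return { status: "failed", reason: "OpenAI error after 3 retries" };
  }
  const failedChecks: string[] = [];
  if (!verdict.titleOk) failedChecks.push("title");
  if (!verdict.companyOk) failedChecks.push("company");
  if (!verdict.activeOk) failedChecks.push("active");
  // All 3 checks must pass; a single failure disqualifies the lead.
  return failedChecks.length === 0
    ? { status: "qualified" }
    : { status: "disqualified", reason: `Failed checks: ${failedChecks.join(", ")}` };
}
```

Keeping "failed" distinct from "disqualified" means a lead that hit an API error can be retried later without being mistaken for a poor fit.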

Clean Names + Generate Messages

OpenAI gpt-5-mini (clean) · Claude Sonnet 4.6 (messages)

Both paths

A. Name + Company Cleaning · OpenAI gpt-5-mini

First Name → clean_first_name

Removes middle initials: "Michael J." → "Michael"

Removes suffixes: "Robert Jr." → "Robert"

Removes handles, special characters, extra formatting

Company → clean_company_name

Strips legal suffixes: "Acme Corp LLC" → "Acme"

Removes Inc, Ltd, Holdings, Group, Solutions, etc.

Normalizes casing and removes social handles

IF: Only runs for qualified leads. If disqualified → skipped entirely, no API call wasted.

B. Message Template Generation · Claude Sonnet 4.6 · once per submission

Claude writes 3 message templates using the consultant's name, expertise, ICP, and case studies. This runs once per consultant — not once per lead. The templates contain {clean_first_name} and {clean_company_name} as placeholders.

Touch 1 · Warm reconnect

Opens with "Hey {clean_first_name}, it's been a while...". Update about joining VAI Consulting. ONE specific case study result with a real number. Closes with "Thought of you. Would love to catch up." — never a pitch, never asks for a call.

Touch 2 · 50–70 words · Follow-up

References a second result or rescue/turnaround story from the case studies. Connects it to what the lead's company might be dealing with. Soft close: "if any of that is on your radar at {clean_company_name}, happy to chat."

Touch 3 · 25–40 words · Close-file

"{clean_first_name}, last note from me —" format. Keeps the door open with zero pressure. Ends with "Hope things are going well at {clean_company_name}!"

Substitution — per lead

For each qualified lead, the placeholders are replaced with their actual cleaned values. The full final messages (with real names) are saved to the lead record and pushed as 1st_Message, 2nd_Message, 3rd_Message custom variables to HeyReach or Instantly.

IF: Substitution only runs for leads with clean_first_name. If the company name is missing, the template gracefully strips 'at {clean_company_name}' from the messages.
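A minimal sketch of the per-lead substitution, including the missing-company case. The placeholder names come from this page; the function name and the exact regular expression used to strip the phrase are assumptions.

```typescript
// Returns the final message, or null when the lead has no clean first name
// (in which case substitution does not run at all).
function fillTemplate(
  template: string,
  cleanFirstName: string,
  cleanCompanyName: string | null,
): string | null {
  if (cleanFirstName.trim() === "") return null;

  let text = template;
  const company = cleanCompanyName === null ? "" : cleanCompanyName.trim();
  if (company === "") {
    // Drop the whole " at {clean_company_name}" phrase so no dangling "at" is left.
    text = text.replace(/\s+at \{clean_company_name\}/g, "");
  }
  // Function replacers avoid special handling of "$" sequences in names.
  return text
    .replace(/\{clean_first_name\}/g, () => cleanFirstName)
    .replace(/\{clean_company_name\}/g, () => company);
}

console.log(fillTemplate("Hope things are going well at {clean_company_name}!", "Sarah", null));
// Hope things are going well!
```

Any placeholder that does not follow the " at {clean_company_name}" pattern is simply replaced with an empty string when the company is missing, so templates should keep the company inside that phrase.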

Find Work Emails from LinkedIn

Apify · x_guru/linkedin-email-scraper-no-cookies

Instantly only

All qualified LinkedIn URLs from the batch are sent in one API call to the x_guru email scraper. It returns work emails and personal emails for each profile. All found emails are saved — verification in Step 6 picks the best valid one.

Batch input: sends all 20 URLs (or however many qualified) in a single Apify run
Returns per lead: work_email + personal_emails array (e.g. corporate, gmail, old company emails)
ALL found emails saved to the all_emails column — not just one
Typical hit rate: 50–70% of profiles will have at least one email found
If zero emails found → lead stays qualified but cannot be pushed (missing email)

IF: This entire step is SKIPPED if tool_choice = heyreach. Only runs for Instantly submissions, only for qualified leads.

Example output per lead

work_email: "nathan.bell.81@gmail.com"

personal_emails: ["nathan.bell.81@gmail.com", "pecosbell@aol.com", "nathan@digitaltrends.com"]

Verify Emails

MillionVerifier · API v3

Instantly only

Every email found in Step 5 is sent to MillionVerifier — not just the "best guess." This way, even if the primary email is invalid, we can still find a valid alternative. After verification, the system picks the best valid email using a priority system.

Email selection priority (from valid emails only)

1st

Corporate email matching the lead's current company domain (e.g. sarah@pepsi.com if company is PepsiCo)

2nd

Any corporate email on a non-personal domain (e.g. sarah@oldcorp.com: corporate, but not the current employer)

3rd

Any valid email, including personal (e.g. sarah.j@gmail.com: last resort, but verified deliverable)

MillionVerifier returns: "ok" (valid), "catch_all", "unknown", "invalid", "disposable"
Only "ok" results are accepted — everything else is treated as invalid
3x retry with backoff per email — if all retries fail, email marked as "error"
Rate limit: 160 requests/second — even 5,000 emails verified in ~30 seconds

IF: Skipped if tool_choice = heyreach. Only runs for leads where all_emails is not null (emails were found in Step 5). If zero emails are valid → email_verified = false, lead cannot be pushed.

Why verify ALL emails? An old corporate email might be invalid. A personal Gmail might be the only one that works. By checking everything, we maximize the chance of finding a working email for each lead.
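The priority order above can be sketched as a small selection function over the verifier results. The personal-domain list and the way a domain is matched to the company name are assumptions made for illustration; the real matching rule is not described on this page.

```typescript
type VerifierResult = "ok" | "catch_all" | "unknown" | "invalid" | "disposable" | "error";

// Assumption: a short illustrative list of personal mail providers.
const PERSONAL_DOMAINS = new Set([
  "gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "aol.com", "icloud.com",
]);

function domainOf(email: string): string {
  return email.slice(email.lastIndexOf("@") + 1).toLowerCase();
}

// Hypothetical rule: compare the first domain label with the company name,
// letters and digits only, so "pepsi.com" matches "PepsiCo".
function matchesCompany(domain: string, company: string): boolean {
  const label = (domain.split(".")[0] ?? "").replace(/[^a-z0-9]/g, "");
  const name = company.toLowerCase().replace(/[^a-z0-9]/g, "");
  if (label === "" || name === "") return false;
  return name.startsWith(label) || label.startsWith(name);
}

function pickBestEmail(
  results: Record<string, VerifierResult>,
  company: string,
): string | null {
  // Only "ok" counts as valid; everything else is treated as invalid.
  const valid = Object.keys(results).filter((e) => results[e] === "ok");
  const corporate = valid.filter((e) => !PERSONAL_DOMAINS.has(domainOf(e)));
  const currentEmployer = corporate.find((e) => matchesCompany(domainOf(e), company));
  // 1st: current-company domain, 2nd: any corporate, 3rd: any valid email.
  return currentEmployer ?? corporate[0] ?? valid[0] ?? null;
}
```

A prefix match like this is deliberately loose and can produce false positives for very short domain labels, so a production rule would likely need a stricter comparison.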

💬

Push to HeyReach (LinkedIn)

Manual trigger · Dashboard → Push page

LinkedIn path

Manually triggered from the dashboard. You need to duplicate the master campaign in HeyReach, assign a sender account, and set send times first.

✅

Lead gets pushed if ALL true

→ qualified = true
→ clean_first_name exists
→ All 3 messages generated
→ push_status = pending (not already pushed)

⏭

Lead is skipped if ANY missing

→ Disqualified leads
→ Missing clean first name
→ Missing any of the 3 messages
→ Already pushed

HeyReach payload per lead

profileUrl, firstName, lastName, companyName

+ customUserFields:

1st_Message, 2nd_Message, 3rd_Message

Rate limit: 200ms delay between batches of 20 leads · API: POST /list/AddLeadsToListV2
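The HeyReach eligibility rules and payload above can be sketched as follows. The LeadRecord fields are illustrative rather than the real Supabase schema, and sending customUserFields as name/value pairs is an assumption; confirm the exact request shape against the HeyReach API documentation.

```typescript
// Illustrative lead record; field names are assumptions.
interface LeadRecord {
  linkedinUrl: string;
  firstName: string;
  lastName: string;
  companyName: string;
  qualified: boolean;
  cleanFirstName: string | null;
  messages: string[]; // final per-lead messages, in send order
  pushStatus: "pending" | "pushed";
}

// A lead is pushed only if every condition holds.
function isPushableToHeyReach(lead: LeadRecord): boolean {
  const hasAllMessages =
    lead.messages.length === 3 && lead.messages.every((m) => m.trim() !== "");
  return (
    lead.qualified &&
    lead.cleanFirstName !== null &&
    lead.cleanFirstName.trim() !== "" &&
    hasAllMessages &&
    lead.pushStatus === "pending"
  );
}

const MESSAGE_FIELDS = ["1st_Message", "2nd_Message", "3rd_Message"];

function toHeyReachLead(lead: LeadRecord) {
  return {
    profileUrl: lead.linkedinUrl,
    firstName: lead.firstName,
    lastName: lead.lastName,
    companyName: lead.companyName,
    // Assumption: custom fields travel as name/value pairs.
    customUserFields: MESSAGE_FIELDS.map((name, i) => ({
      name,
      value: lead.messages[i] ?? "",
    })),
  };
}
```

Filtering with isPushableToHeyReach before building payloads keeps skipped leads out of the request entirely, which matches the "never pushed" behaviour described for disqualified leads.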

📧

Push to Instantly (Email)

Manual trigger · Dashboard → Push page

Email path

Manually triggered from the dashboard. Select an Instantly campaign, then push. Only leads with a verified email address are included.

✅

Lead gets pushed if ALL true

→ qualified = true
→ clean_first_name exists
→ All 3 messages generated
→ push_status = pending
→ email_verified = true ← extra
→ work_email exists ← extra

⏭

Lead is skipped if ANY missing

→ Disqualified leads
→ Missing clean first name
→ Missing any of the 3 messages
→ Already pushed
→ No email found
→ Email found but failed verification

Instantly payload per lead

email, first_name, last_name, company_name

+ custom_variables:

1st_Message, 2nd_Message, 3rd_Message

linkedin_url, clean_first_name, clean_company_name

Rate limit: 500 leads per batch · 6,000 req/min · API: POST /api/v2/lead/add
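A sketch of the Instantly-side filter and batching. It returns the first matching skip reason from the list above (or null when the lead can be pushed) and groups pushable leads into batches of 500. The EmailLead type and function names are hypothetical.

```typescript
// Illustrative record; field names are assumptions.
interface EmailLead {
  qualified: boolean;
  cleanFirstName: string | null;
  messageCount: number; // how many of the 3 messages were generated
  pushStatus: "pending" | "pushed";
  workEmail: string | null;
  emailVerified: boolean;
}

// Returns why a lead is skipped, or null if it can be pushed.
function instantlySkipReason(lead: EmailLead): string | null {
  if (!lead.qualified) return "Disqualified";
  if (!lead.cleanFirstName) return "Missing clean first name";
  if (lead.messageCount < 3) return "Missing any of the 3 messages";
  if (lead.pushStatus !== "pending") return "Already pushed";
  if (!lead.workEmail) return "No email found";
  if (!lead.emailVerified) return "Email found but failed verification";
  return null;
}

const INSTANTLY_BATCH_SIZE = 500; // leads per request, per the limit above

// Keep only pushable leads and split them into batches of at most 500.
function toInstantlyBatches(leads: EmailLead[]): EmailLead[][] {
  const pushable = leads.filter((l) => instantlySkipReason(l) === null);
  const batches: EmailLead[][] = [];
  for (let i = 0; i < pushable.length; i += INSTANTLY_BATCH_SIZE) {
    batches.push(pushable.slice(i, i + INSTANTLY_BATCH_SIZE));
  }
  return batches;
}
```

Returning a reason string instead of a plain boolean makes it easy to show on the dashboard why a given lead was left out of a push.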

Dependency Chain — What Blocks What

Scrape fails → no summary → auto-disqualified → no clean names → no messages → can't push

If a lead fails at any step, it drops out of the pipeline from that step onward, but it does not block other leads. Each lead is processed independently within its batch.

Full pipeline at a glance

Form (Google Forms) → Webhook (Vercel) → Scrape (Apify) → Qualify (gpt-5-mini) → Clean (gpt-5-mini) → Emails (Apify x_guru, email only) → Verify (MillionVerifier, email only) → Push (HeyReach / Instantly)

Costs & Rate Limits

Service	API / Actor	Rate Limit	Cost	Path
Apify (scrape)	apimaestro/linkedin-profile-detail	Free: 25 concurrent, Scale: 128	~$0.005/lead	Both
Apify (email)	x_guru/linkedin-email-scraper-no-cookies	Same account limit	~$0.005/lead	Instantly
OpenAI	gpt-5-mini	Tier 4: ~10K RPM	~$0.01/lead	Both
Claude	claude-sonnet-4-6	Standard	~$0.02/submission	Both
MillionVerifier	v3 API	160 req/sec	~$0.004/email	Instantly
HeyReach	AddLeadsToListV2	10 req/2sec	Included	LinkedIn
Instantly	/api/v2/lead/add	6,000 req/min	Included	Email
Inngest	Background jobs	50K runs/month (free)	Free	Both
Supabase	PostgreSQL	500MB (free)	Free	Both

Cost estimates are approximate. 5,000 leads (LinkedIn path) ≈ $75. 5,000 leads (Email path) ≈ $100. Claude cost is per consultant, not per lead.
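The totals above follow from the per-unit costs in the table. The sketch below reproduces that arithmetic. The unit costs are the approximate figures from the table, and the qualified-lead and email counts used for the email path are illustrative inputs, not measured rates.

```typescript
// Approximate unit costs in USD, taken from the table above.
const UNIT_COST = {
  apifyScrapePerLead: 0.005,
  apifyEmailPerLead: 0.005,
  openAiPerLead: 0.01,
  claudePerSubmission: 0.02,
  verifierPerEmail: 0.004,
};

function estimateCost(
  leads: number,
  path: "linkedin" | "email",
  qualifiedLeads: number = leads, // email lookup only runs for qualified leads
  emailsToVerify: number = 0, // every found email is verified
): number {
  let total =
    leads * (UNIT_COST.apifyScrapePerLead + UNIT_COST.openAiPerLead) +
    UNIT_COST.claudePerSubmission;
  if (path === "email") {
    total +=
      qualifiedLeads * UNIT_COST.apifyEmailPerLead +
      emailsToVerify * UNIT_COST.verifierPerEmail;
  }
  return Math.round(total * 100) / 100; // round to cents
}

console.log(estimateCost(5000, "linkedin")); // 75.02
// Illustrative inputs: 3,000 qualified leads and 2,500 emails to verify.
console.log(estimateCost(5000, "email", 3000, 2500)); // 100.02
```

The email-path total depends heavily on how many leads qualify and how many emails are found per lead, so it is worth re-running the estimate with figures from a real submission.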

Things to Watch

Apify concurrent run limit

5 batches × 20 leads = up to 100 concurrent Apify runs at peak. On the free Apify plan (25 concurrent limit), reduce batch concurrency from 5 to 1 in the code: a single batch already starts 20 runs, and 2 batches (40 runs) would exceed the limit. The Scale plan (128 limit) handles 5 batches fine.
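A quick way to pick a safe batch concurrency for a given Apify plan is to divide the plan's concurrent-run limit by the batch size. The helper names below are hypothetical; the limits (25 and 128) and the batch size of 20 are the figures quoted on this page.

```typescript
// Peak Apify runs when `batchConcurrency` batches of `batchSize` leads start together.
function peakApifyRuns(batchConcurrency: number, batchSize: number): number {
  return batchConcurrency * batchSize;
}

// Largest number of parallel batches that stays within the plan's concurrent-run limit.
function maxBatchConcurrency(concurrentRunLimit: number, batchSize: number): number {
  return Math.floor(concurrentRunLimit / batchSize);
}

console.log(peakApifyRuns(5, 20)); // 100
console.log(maxBatchConcurrency(25, 20)); // free plan limit
console.log(maxBatchConcurrency(128, 20)); // Scale plan limit
```

This ignores any other Apify runs on the same account (for example the email scraper), which share the same limit and reduce the headroom further.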

OpenAI billing

Each lead uses up to 4 API calls (summarize + qualify + clean name + clean company); disqualified leads skip the two cleaning calls. 5,000 leads ≈ up to 20,000 API calls ≈ $50–100 depending on response lengths. Monitor usage at platform.openai.com.

Supabase storage

Free tier = 500MB. The csv_base64 column stores the full original CSV file. Multiple large batches will eat storage. Monitor in Supabase dashboard → Database → Usage.

Inngest monthly runs

Free tier = 50,000 runs/month. Each consultant with 5K leads uses ~252 runs (1 coordinator + 250 batches + 1 finalizer). At that rate, 50,000 ÷ 252 ≈ 198 consultants with 5K leads each fit within the limit, assuming each batch counts as a single run.
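The run count per submission can be derived from the lead count, which makes it easy to re-check capacity for other list sizes. This sketch assumes one run per batch of 20 plus one coordinator and one finalizer, as described above; the function names are illustrative, and it does not account for plans that meter individual steps instead of runs.

```typescript
// Runs used by one submission: 1 coordinator + one run per batch + 1 finalizer.
function runsPerSubmission(leadCount: number, batchSize: number = 20): number {
  return 1 + Math.ceil(leadCount / batchSize) + 1;
}

// How many submissions of a given size fit in a monthly run allowance.
function submissionsPerMonth(monthlyRunLimit: number, leadCount: number): number {
  return Math.floor(monthlyRunLimit / runsPerSubmission(leadCount));
}

console.log(runsPerSubmission(5000)); // 252
console.log(submissionsPerMonth(50000, 5000));
```

Retries also consume runs, so the practical ceiling is somewhat below the computed figure.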

MillionVerifier credits

Credits never expire. Current balance shown in the MillionVerifier dashboard. Each email verification uses 1 credit. With the "verify all" approach, a lead with 3 emails uses 3 credits.

Questions? Open the dashboard to check a live run — each submission shows status: scraping → qualifying → generating → complete.