Back to Blog

From Inbox to Database Without a Human in the Middle

Weekend experiment is over. This is what happens when you put it to work. Invoices and purchase orders arriving by email, dozens a day, and someone spending half their day typing fields into a system. Email to S3 to Lambda to done.

Weekend experiment is over. This is what happens when you put it to work.

A company I work with gets invoices and purchase orders by email. Dozens a day. Different vendors, different formats. Some are clean PDFs from accounting software. Some are photos of handwritten POs taken on someone's phone. Some are scans of faxes.

Yes. Faxes. In 2026.

The old process: someone opens each email, reads the attachment, types the relevant fields into the internal system. Vendor name, PO number, line items, quantities, totals. A few minutes per document if it's clean. Longer if it isn't. At volume, that's someone spending half their day on data entry.

The goal: emails arrive, documents get processed, structured data shows up in the app. Nobody opens an attachment.

The Pipeline

Email → S3 → Lambda → Vision API → App API

Five steps. Each one simple. Here's how they connect.

Email to S3

SES (Simple Email Service) receives the emails. Point an MX record at it, set up a receipt rule. When an email arrives, SES dumps the raw MIME message into an S3 bucket.

I don't process anything in the receipt rule. SES stores the raw email. That's it. If something breaks downstream, I've still got the original sitting in S3 exactly as it arrived.

Two buckets. One for raw emails. One for extracted attachments. The raw bucket is the source of truth.

S3 to Lambda

S3 event notification fires when something new lands in the raw email bucket. A Lambda pulls the MIME message apart and saves each attachment as its own file in the attachment bucket.

PDF, JPEG, PNG, TIFF - whatever. Each one gets metadata tags: original sender, subject line, timestamp. Housekeeping. But it matters when you're debugging why an invoice processed wrong three weeks later.

Second S3 trigger fires when files land in the attachment bucket. That kicks off the processing Lambda.

The Processing Lambda

This is where the vision API call from the earlier posts lives. Lambda gets the S3 event, pulls the attachment, sends it to GPT-4o.

The prompt is different from the recipe one. It's looking for vendor name and address, PO or invoice number, date, line items with descriptions and quantities and unit prices, subtotal, tax, total, payment terms.

Same principle though. Image in, structured JSON out.

One thing I learned: the prompt matters more than I expected. First version was too generic. The model would put the vendor address in the wrong field, or split a two-line item description into two line items. Tightening the prompt with specific field definitions and a couple of rules ("if an item description wraps to a second line, keep it as one item") fixed most of that.

I also added validation. Lambda checks that required fields exist, that line item totals roughly add up to the invoice total, that dates parse correctly. If something's off, it flags the document for review instead of pushing bad data.

Lambda to App API

Documents that pass validation get POST'd to the internal API. Structured data goes into the system. The team can search it, filter it, reconcile it.

Flagged documents go to a review queue. Someone looks at them, fixes whatever's wrong (usually a digit in a dollar amount or an ambiguous vendor name), and approves. Maybe 10-15% of the time. Takes seconds, not minutes.

Why Two Lambdas

Could have done the MIME parsing and the vision API call in one function. Split them because they fail for different reasons.

MIME parser fails on weird encodings or corrupt attachments. Email problem. Vision API fails on blurry documents or timeouts. Processing problem. Separate, I can retry processing without re-parsing the email. I can see at a glance where things broke.

Also means I can reprocess documents without the original email. Updated the prompt? Re-run processing against the attachment bucket. Fresh results.

Why Not Process Inline in SES

Tried this first. SES Lambda actions have a 30-second timeout and a 30MB payload limit. Vision API calls on large documents can take 10-15 seconds. Multiple attachments blow past the timeout. S3 and async avoids both limits.

Error Handling

Failed documents (API timeout, bad response, validation failure) go to an SQS dead-letter queue. CloudWatch alarm if the queue depth gets too high.

This has been useful exactly twice. Once during an OpenAI outage. Once when a vendor started sending password-protected PDFs the model couldn't read. Both times I knew within minutes instead of finding out days later when someone noticed missing invoices.

What I'd Do Differently

Start with the prompt. I set up the whole pipeline before the prompt was working well. Should have iterated on the prompt against a batch of real documents first. Nail the output format, then build around it.

Version the prompts. I've updated mine four or five times. Didn't track which version processed which document early on. Had to reprocess a batch when I realized an old prompt was mishandling multi-line descriptions.

Test with the ugliest documents first. I tested with clean PDFs. Everything looked great. Then the first batch of phone-photo POs came through and my validation was too strict. The model was correct but dollar amounts were off by a penny from rounding, which tripped the total-matching check. Start with the worst stuff you've got.

The Numbers

After a few months running:

  • 85-90% of documents process with no human touch
  • 10-15% flagged for review. Average review: 30 seconds
  • Less than 1% fail completely (password-protected files, documents so blurry a human couldn't read them either)
  • Under a penny per document for the API call. Lambda and S3 costs are negligible
  • The person who spent half their day on data entry now spends about 20 minutes on flagged items

If you've got documents arriving by email and someone is reading them and typing fields into a system - invoices, POs, receipts, work orders, inspection reports, shipping manifests - this pattern works. SES, S3, Lambda, and an API call. You could build the whole thing in a day if you already have the receiving app.

Most of my time went to the prompt and the validation, not the infrastructure.

Share this article

Want to Work Together?

Let's discuss how I can help with your project.

Get in Touch