Data retention: what to ask your AI vendor and which tier you need

Why retention is a separate question from training

It's easy to lump every privacy question about AI together into "do they use my data?" In practice it splits into two distinct questions, and a vendor can answer them differently. The first is whether your inputs are used to train, fine-tune, or evaluate the underlying model: does a sentence you type about a client matter end up shaping the model someone else queries tomorrow? The second is whether your inputs are stored at all, and if so for how long, where, and who can see them.

Most reputable vendors have landed in roughly the same place on training: their commercial API tiers do not train on customer inputs. Retention is the messier question. Even when no training happens, the inputs may still sit in a logging system for a window of time so the vendor can investigate abuse, debug performance, and comply with law-enforcement requests. That logging window is what the rest of this article is about.

You need a contract, not a privacy policy

The free-tier or consumer privacy policy you can read on a vendor's website is a one-size-fits-all document aimed at a mass audience. It is not the document that governs your data when you sign up as a paying customer. The relevant documents are:

The commercial Terms of Service or Master Subscription Agreement you click through when you create the paid account — this is the baseline.
A Data Processing Addendum (DPA), which most major vendors offer on request and which sets the specific retention windows, processing roles, and sub-processor list.
Any optional addendum that changes the default — most importantly the Zero Data Retention addendum, which has to be separately requested and signed.
A Business Associate Agreement (BAA) if you're processing Protected Health Information.

If your only document is the public-facing privacy policy, you don't have an enterprise relationship — you have a consumer relationship dressed up in a paid skin. That is fine for some uses and wrong for others. Knowing which is which is the whole game.

The three tiers, plainly

Tier 1 — No special data protection

This is the consumer or free-tier experience. Inputs may be retained for long periods, may be reviewed by humans for safety purposes, and on some products may be used for training unless you've affirmatively opted out. It is the right tier for genuinely non-sensitive work: drafting marketing copy, brainstorming names, summarizing already-public articles, anything where the inputs would be fine on a billboard.

It is not the right tier for anything that touches a client matter. Treat the consumer tier as the equivalent of dictating into a microphone in a crowded coffee shop — fine for ideas, wrong for confidences.

Tier 2 — Commercial API (no training, short retention)

This is the standard paid posture from the major model vendors. Inputs are not used for training. Inputs are retained for a short window, typically somewhere between a week and thirty days depending on the vendor, for abuse detection and debugging, then automatically deleted. The DPA spells out who counts as a processor and what subcontracting is allowed.

Specifics as of mid-2026: Anthropic's commercial API retains inputs and outputs for seven days by default before automatic deletion, and never uses them for training. OpenAI's standard API is similar, with a thirty-day default retention window on most endpoints. Both offer DPAs on request. This tier is appropriate for most operational work in a law practice — internal tooling, document drafts, classification, matter-type triage — use cases where the work product is something a colleague at your firm might already see.
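To make the retention window concrete, here is a minimal sketch that works out when the default window would close for a request sent at a given time. The day counts are the figures quoted in this article, not values read from any vendor API, and the dictionary keys and function name are hypothetical labels chosen for illustration. It models only the default window; check your own DPA for the terms that actually apply to you.

```python
from datetime import datetime, timedelta, timezone

# Default retention windows in days, copied from the figures quoted in this article.
# These are assumptions for illustration; confirm them against your signed DPA.
DEFAULT_RETENTION_DAYS = {
    "anthropic_commercial_api": 7,
    "openai_standard_api": 30,
}


def latest_expected_deletion(sent_at: datetime, vendor: str) -> datetime:
    """Return the moment the default retention window closes for a request."""
    return sent_at + timedelta(days=DEFAULT_RETENTION_DAYS[vendor])


# Example: a request sent on the morning of 1 June 2026 (UTC).
sent = datetime(2026, 6, 1, 9, 30, tzinfo=timezone.utc)
for vendor in DEFAULT_RETENTION_DAYS:
    print(vendor, latest_expected_deletion(sent, vendor).date().isoformat())
```

A record like this is useful mainly for your own files: it lets you say, for any given matter, roughly how long a prompt could have sat in a vendor's logs under the default terms.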

Tier 3 — Zero Data Retention (ZDR)

Under a ZDR arrangement, the vendor does not store your inputs or outputs at rest after the API response is returned. The data is processed in memory, the response is sent back, and then it is gone. Anthropic and OpenAI both offer ZDR contractual addenda, gated by a sales conversation and a separately signed addendum on top of the standard commercial agreement.

ZDR does not mean the vendor stops caring about abuse — they still run real-time safety classifiers on inputs, and they may retain content if a request is flagged for malicious use or if compelled by law. But the default-on logging window of the standard commercial tier disappears. This is the tier that lets you write a sentence like "prompts and outputs are not retained at rest by our model provider" in your own privacy policy without overpromising.

An emerging alternative worth knowing about: serverless inference platforms — DigitalOcean's Gradient Platform is the cleanest example as of 2026 — that act as a pass-through to Anthropic and OpenAI and apply ZDR-equivalent posture by default, without a separately negotiated addendum. If you don't want to run a procurement conversation just to get ZDR, this is a real option.

Picking the right tier for a given tool

The tier you need is a function of the data that will flow through the tool, not the title of the lawyer using it. A partner using a tool that only ever sees published case names doesn't need ZDR. A junior associate using a tool that processes a client's unfiled affidavit does. Match the protection to the data.

Data the tool will process	Minimum reasonable tier
Public information only — case names, published statutes, marketing copy, generic templates.	Tier 1 is acceptable; Tier 2 is preferable for consistency across your firm's tooling.
Internal firm operations with no identifiable client information — time codes, matter types, billing summaries with names redacted.	Tier 2 (commercial API with DPA on file).
Identifiable but non-privileged client information — contact details, matter status, deadline tracking, intake fields.	Tier 2, with client disclosure consistent with your engagement letter.
Privileged communications, attorney work product, sensitive personal facts, or anything under a protective order.	Tier 3 (ZDR), and an internal pause to consider whether the use case is justified at all.
Protected Health Information under HIPAA.	Tier 3 plus a signed BAA. Most vendors gate the BAA behind ZDR or equivalent.
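If you want the table above to be something a tool can check rather than something a person has to remember, it can be sketched as a small lookup. The category labels and function names below are hypothetical; only the tier assignments come from the table. The sketch takes the strictest tier across every category of data a tool will see, and it fails on an unknown category rather than guessing.

```python
from enum import IntEnum


class Tier(IntEnum):
    CONSUMER = 1        # Tier 1: no special data protection
    COMMERCIAL_API = 2  # Tier 2: no training, short retention, DPA on file
    ZDR = 3             # Tier 3: Zero Data Retention addendum


# Hypothetical category labels; the minimum tiers mirror the table above.
MINIMUM_TIER = {
    "public": Tier.CONSUMER,
    "internal_no_client_identifiers": Tier.COMMERCIAL_API,
    "identifiable_non_privileged": Tier.COMMERCIAL_API,
    "privileged_or_sensitive": Tier.ZDR,
    "phi": Tier.ZDR,
}


def required_tier(categories) -> Tier:
    """Strictest minimum tier across every data category the tool will process."""
    categories = list(categories)
    if not categories:
        raise ValueError("list at least one data category")
    # An unrecognized category raises KeyError, so the check fails closed.
    return max(MINIMUM_TIER[c] for c in categories)


def needs_baa(categories) -> bool:
    """Protected Health Information calls for a signed BAA on top of Tier 3."""
    return "phi" in categories


def tier_is_sufficient(configured: Tier, categories) -> bool:
    """True if the tier you actually have covers the data you plan to send."""
    return configured >= required_tier(categories)


# Example: an intake tool that sees public templates and client contact details.
print(required_tier(["public", "identifiable_non_privileged"]).name)
print(tier_is_sufficient(Tier.COMMERCIAL_API, ["privileged_or_sensitive"]))
```

The point of taking the maximum is the same as the point of the table: one privileged field in an otherwise harmless form moves the whole tool to Tier 3.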

What BuildLegal does on its own side

BuildLegal handles the build-side conversation — the scoping interview, the spec, the code-generation back-and-forth — under the commercial API tier with our model provider. That means your inputs and outputs from those conversations are never used to train any model, are retained on the provider side only for a short abuse-detection window, and are then automatically deleted. On our own infrastructure, integration API keys you connect are encrypted at rest with AES-256-GCM using a key derived via HKDF, every project's runtime is isolated, and we do not log the contents of your scoping conversations beyond what's needed to render the chat back to you.

That posture is appropriate for the build-side activity — describing the shape of the tool you want to build is not, by itself, a privileged communication about a specific client matter. If you find yourself wanting to scope a tool by pasting in a sealed brief or an unredacted client memo, stop and just describe the shape of what you need. You don't have to share the data to build the form.
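For readers curious what "a key derived via HKDF" means in practice, the following is a minimal sketch of the HKDF construction from RFC 5869 using only Python's standard library. It is not BuildLegal's implementation. The master secret, the empty salt, and the per-project `info` labels are assumptions made up for illustration, and the sketch covers only the key-derivation step: the AES-256-GCM encryption itself is not in the standard library and would need a vetted cryptography package.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Derive `length` bytes from input key material with HKDF-SHA256 (RFC 5869)."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length is too large for HKDF-SHA256")
    # Extract: condense the input key material into a fixed-size pseudorandom key.
    # RFC 5869 substitutes a block of zero bytes when no salt is supplied.
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, each bound to `info` and a one-byte counter.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical master secret; a real one would come from a secrets manager.
master_secret = bytes(range(32))

# Assumed labelling scheme: a distinct `info` string per project yields
# a distinct 32-byte key, the size AES-256 expects.
key_a = hkdf_sha256(master_secret, salt=b"", info=b"integration-key:project-a")
key_b = hkdf_sha256(master_secret, salt=b"", info=b"integration-key:project-b")
print(len(key_a), key_a != key_b)
```

The design idea worth noticing is the `info` parameter: one well-protected master secret can yield many independent keys, so a key used for one purpose never has to be reused for another.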

The downstream decision is yours

The tool you build, once it is live, is a separate question from the tool that built it. When your published intake form, summarizer, or document drafter starts processing real client data, you become the one responsible for picking the right tier for that data flow. Three honest scenarios:

Many tools you'll build don't call out to an AI vendor at runtime at all. A static intake form that posts to a database is just a form. Retention questions for it are about your hosting and storage, not about an AI vendor.
Some tools call out to an AI vendor for a specific operation — summarizing an uploaded document, classifying a matter type, drafting a paragraph. Those calls flow under the AI vendor's terms, at whichever tier you selected when you plugged in your integration key.
A few tools will sit in front of clients or counterparties directly. For those, you should be at Tier 3 if any of the inputs could be privileged or sensitive, and you should be transparent in your engagement letter and on the tool itself about what AI is doing and who sees what.

When you connect an integration API key to a tool you've built here, BuildLegal stores it encrypted at rest and does not log the contents of the requests your tool makes. But we don't override your vendor's terms — if you plug in a consumer-tier key, the tool runs under consumer-tier terms. Pick on purpose.
