Combine sources into a dataset
Pick documents, notes, and labels (labels expand at retrieval time) to ground a Focus or workflow run.
A dataset is the set of sources a Focus or workflow is grounded in. Datasets are the unit that Focus chats and workflows operate on, and they're more flexible than they look. A dataset can be a single document, dozens of documents pulled in through a label, a handful of hand-picked notes, or any mix of these.
Sources
A source is the unit that goes into a dataset. Two kinds:
- Documents: anything you've uploaded (PDF, Office, EPUB, spreadsheet, plain text, etc.).
- Notes: markdown notes you've written or saved from chat. Notes are first-class sources, not metadata.
Labels are not sources themselves. When you add a label to a dataset, the label expands at retrieval time to every source it holds at that moment, so adding a label is equivalent to adding everything in it.
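The relationship between sources, labels, and datasets can be pictured with a small model. This is only an illustrative sketch, not docAnalyzer's actual data model or API: the class names `Source`, `Label`, and `Dataset`, the `resolve` method, and the IDs are all hypothetical, and de-duplicating by source ID is an assumption. It reads "expands at retrieval time" as late binding: the dataset keeps a reference to the label and looks up its contents each time sources are retrieved.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """A document or a note; both kinds count as sources."""
    id: str
    kind: str  # "document" or "note"


@dataclass
class Label:
    """A durable, named collection whose membership can change over time."""
    name: str
    sources: list[Source] = field(default_factory=list)


@dataclass
class Dataset:
    """Direct sources plus references to labels (not copies of their contents)."""
    sources: list[Source] = field(default_factory=list)
    labels: list[Label] = field(default_factory=list)

    def resolve(self) -> list[Source]:
        """Expand labels at call time. De-duplication by id is an assumption."""
        seen: dict[str, Source] = {}
        for src in self.sources:
            seen.setdefault(src.id, src)
        for label in self.labels:
            for src in label.sources:
                seen.setdefault(src.id, src)
        return list(seen.values())


# A mixed dataset: one hand-picked note plus a label of contracts.
contracts = Label("contracts", [Source("doc-1", "document")])
memo = Source("note-1", "note")
dataset = Dataset(sources=[memo], labels=[contracts])

print([s.id for s in dataset.resolve()])  # ['note-1', 'doc-1']

# A document added to the label later appears at the next retrieval.
contracts.sources.append(Source("doc-2", "document"))
print([s.id for s in dataset.resolve()])  # ['note-1', 'doc-1', 'doc-2']
```

Under this reading, the dataset never stores a snapshot of the label, which is why changes to the label are visible the next time the dataset is resolved.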
Build a dataset
You can build a dataset in several ways:
1. From the documents page: multi-select rows, then click "Open Focus" or "Run workflow" from the bulk-action menu. The selection becomes the dataset.
2. From a label: open a Focus on the label, or run a workflow against it. The label expands.
3. From the dataset picker inside a chat: once a Focus is open, add or remove sources from its compose chrome. Datasets aren't frozen at the start.
4. From Pick documents with natural language: describe what you want, then apply the result to your selection.
Mixed datasets are common: a label of contracts plus a single supplementary memo, or a label of research papers plus three of your own notes summarizing prior reading. The chat sees all of it.
Datasets vs labels
These two come up together but mean different things:
| Labels | Datasets |
|---|---|
| Durable, named, persist across sessions | Live, ephemeral, scoped to one chat or workflow run |
| Where you organize | Where you act |
| Grow and shrink slowly | Change between turns |
The pattern is: organize once with labels; act repeatedly with datasets that pull from them.
When does a dataset show up?
Datasets are used by Focus and workflows. Ask docAnalyzer, by contrast, operates over the whole workspace, with no explicit dataset.
In other words: Ask sees everything; Focus and workflows see only what's in their dataset.
Constraints
Multi-source datasets are subject to per-plan caps on total source count and total page volume. Check your plan settings to see your current limits.
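To make the two caps concrete, here is a minimal sketch of how a dataset could be checked against them. Everything in it is hypothetical: `PlanCaps`, `check_caps`, and the limit values are placeholders chosen for illustration, not real plan limits or a docAnalyzer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCaps:
    """Placeholder limits; real values come from your plan settings."""
    max_sources: int
    max_pages: int


def check_caps(page_counts: list[int], caps: PlanCaps) -> list[str]:
    """Return a description of each cap exceeded (empty list if within limits).

    page_counts holds one entry per source in the dataset.
    """
    problems = []
    if len(page_counts) > caps.max_sources:
        problems.append(
            f"{len(page_counts)} sources exceeds the cap of {caps.max_sources}"
        )
    total_pages = sum(page_counts)
    if total_pages > caps.max_pages:
        problems.append(
            f"{total_pages} pages exceeds the cap of {caps.max_pages}"
        )
    return problems


caps = PlanCaps(max_sources=3, max_pages=100)  # made-up numbers

print(check_caps([40, 30, 20], caps))      # []
print(check_caps([40, 30, 20, 20], caps))  # both caps exceeded
```

The point of the sketch is that the two caps are independent: a dataset with few sources can still exceed the page cap, and a dataset of many short sources can exceed the source cap.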
What's next
- Three ways to chat: Focus is the chat mode that takes a dataset.
- Run a workflow across many docs: the batch-mode alternative.
- Pick documents with natural language: natural-language selection feeds straight into a dataset.