What comes out of Whistle Enterprise: from audio to document

If you are evaluating Whistle Enterprise, the question that matters most is what comes out the other end. Marketing material can describe a meeting tool in fairly abstract terms; the only thing that decides whether the tool fits your work is the actual document it produces. This article walks through what Whistle Enterprise gives you when a meeting goes in, and how the pieces fit together.

This article is for anyone who has not downloaded the trial yet. The trial is the same product, free for thirty days, with no card and no account required. Reading this and then trying the trial on a recording you already have is the cheapest way to find out whether the tool suits you.

The three artefacts

A meeting that runs through Whistle Enterprise produces three things, all kept in a local workspace on your own computer:

The audio recording. The original audio of the meeting, either recorded live from a microphone or imported from an existing file. Whistle Enterprise accepts MP3, WAV, FLAC, M4A and OGG audio as well as video files; for a video, the rest of the pipeline works on its audio track.
The transcript. Speech-to-text output of the recording, with speaker labels attached to each segment. Thirteen languages are supported and the language of the recording is detected automatically.
The generated document. A structured write-up of the meeting produced by Whistle Enterprise’s custom fine-tuned AI model, which is built specifically for meeting document generation and designed to run on a normal laptop CPU. The model reads the full transcript and writes the document on your own machine, with no audio or text sent to a vendor.

These three artefacts are deliberately separate. The recording is the source of truth. The transcript is what the AI reads to write the document. The document is the thing you would normally hand to a colleague.

The separation matters because it is what makes the document checkable. Any sentence in the document can be traced back to the part of the transcript that supports it; any part of the transcript can be linked to the part of the document where it appears. More on that below.

What the document actually contains

The document is not a summary in the way a one-paragraph executive summary is a summary. It is also not a transcript with the pauses and filler words taken out. It sits in the middle.

What you can expect from a typical meeting:

Headings and sections. The model reads the whole transcript and breaks the meeting into the topics it actually covered. A board meeting with five agenda items comes out with five sections, in the order the meeting hit them.
Decisions. Where a decision was made, the document records what it was, who proposed it, who agreed and any conditions attached. Where a decision was deferred, the document records that too rather than dropping the item.
Actions. Action points are pulled out into named follow-ups, with the owner and the timing wherever these were stated. If “we need to come back to this” is the only outcome, that is what the document records.
Speaker attribution. The discussion is reported with attribution where the attribution matters. Not “the meeting agreed”, but “the chair proposed and the others agreed”. Where a speaker raised an objection, the document records the objection and who raised it.
Narrative where prose is the right register. The document is in plain prose, not bullet-only telegraph English. A discussion that flowed as a discussion reads as a discussion in the document.

This is the part Whistle’s founder, Tobias, has talked about most. The line he uses is “the document you would write yourself if you had had the time”. That is what the model is asked to produce.

Speaker labels and how they hold the rest together

Speaker labels look like a small feature, but they are load-bearing. The transcript carries them per segment. The document inherits them in the prose. The recording can be played back from any labelled segment.

For meeting types where the speaker matters (board minutes, client interviews, witness records, anything where attribution is part of the record), this is the property that makes the document defensible. A document that says “the chair raised the question of indemnity” is a record. A document that says “the question of indemnity was raised” is a paragraph in a memo. The first is what Whistle Enterprise produces by default.

Speaker labels in Whistle Enterprise are anonymous unless you rename them (“Speaker 1” → “Sarah Chen”). Renaming is local: the change does not flow back to any vendor service, because there is no vendor service. It is simply a change to the workspace files on your machine.

Source traceability

The single most useful feature for sensitive work is one that does not show up in marketing screenshots. Highlight any sentence in the generated document and Whistle Enterprise shows you the exact passage in the transcript it came from. Highlight a passage in the transcript and you see what was written about it.

The mapping is bidirectional, and it is not a fuzzy match. Each sentence in the document has a known origin in the transcript, and the application can show it.

This is the property that makes the document something more than a generated artefact. If a colleague pushes back on a line in the document with “are you sure that’s what was said”, the answer is to highlight the line and show the source. If a regulator asks for the basis of a particular finding, you show the source. The recording is still on your machine, so the next step from there (playing the audio for the moment in question) is also under your control.

The four export formats

When you are ready to share the document, four export options are available:

Format	When you’d reach for it
PDF	The most common choice. A formatted PDF that is ready to email, attach to a matter file, or include in a board pack.
Word (.docx)	When the recipient needs to edit the document or merge it into a longer document.
Markdown (.md)	For internal systems that store records in plain text, or when the document is going into a knowledge base.
Plain text	The minimum-viable export. Same content, no formatting.

PDF and Word exports come in three styled themes (Standard, Formatted and Professional) so the same document can be dressed for an internal share or a client deliverable without re-typing.

The original recording and the transcript stay on your machine when you export. The export is a copy of the document at the moment you exported it, in the format you chose. If you change the document afterwards, the next export reflects the change.

What to do next

If the output here matches what your work needs, the trial is the next step. The trial is the full product for thirty days. You can run it on a recording you already have, or do a fresh recording in the application; the document that comes out is the one you would receive as a paying customer.

The pricing page has the licence tiers and the FAQ. The security notes cover what the application does and does not do at a system level. The free 30-day trial is on the product page.

