Extract Audio from Video Online

What this tool does

The app takes a local video file and extracts one audio track into a new audio file. It supports these input containers: MKV, MP4, MOV, and WebM. The app does not convert the audio to a new codec. Instead, it extracts the track directly with stream copy, for every source codec it can map to an output file type.

Most videos contain one audio track, but some contain several: for example the original audio, a commentary, a second language, or a microphone mix. The app detects the audio tracks and lets you choose which one to extract. If a file contains more than 4 audio tracks, the app says so and lets you select only from the first 4.

Because the app is now intentionally simpler, it does not offer bitrate controls, encoder choices, conversion presets, or re-encoding options. This reduces complexity and improves reliability in a browser environment.

Key terms: container, codec, track, demux, remux, stream copy

Container format

A container is the outer file structure that holds one or more media streams plus metadata. Example containers are MKV, MP4, MOV, WebM, Ogg, and WAV. A container can hold multiple tracks, for example video, multiple audio streams, subtitles, chapters, and attachments.

Codec

A codec is the compression method used inside a track. Common audio codecs include AAC, MP3, Opus, FLAC, Vorbis, AC-3, and PCM. Common video codecs include H.264, HEVC, VP9, and AV1.

Track and stream

In everyday language, people say “audio track”. Technically, FFmpeg often uses the word stream. In practical use for this app, an audio track and an audio stream mean the same thing: one separate audio signal inside the video container.

Demux

Demuxing means reading a container and separating its internal streams. A demuxer reads the file structure and identifies where the audio, video, subtitles, and metadata are stored.

Remux

Remuxing means taking an existing stream and writing it into a new container without changing the encoded data. For example, copying AAC audio from an MP4 video into an M4A audio-only file is a remuxing-style operation.

Stream copy

Stream copy means copying the compressed audio bitstream without decoding and without re-encoding. In FFmpeg terms, this is done with -c:a copy. This is the core behavior of the current app.

How FFmpeg extracts audio in the current app

The app uses direct extraction only

The current version of the app no longer offers conversion to new audio codecs. It only extracts an existing audio track exactly as it already exists inside the source file. That means:

no decode and re-encode step,
no new bitrate selection,
no lossy recompression from the app itself,
faster processing,
better reliability in the browser.

Conceptual FFmpeg command

The exact command depends on the selected track, but the idea is:

ffmpeg -i input.mp4 -map 0:a:0 -vn -c:a copy output.m4a

Important parts:

-i input.mp4 opens the source file.
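-map 0:a:0 selects the first audio stream (audio index 0) of the first input (input 0). A different audio index selects a different track.
-vn keeps video out of the output. With an audio-only -map it is redundant, but harmless.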
-c:a copy copies the audio stream without re-encoding.
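
As a minimal sketch of how the argument list could be assembled for whichever track the user picked: the helper name buildExtractArgs and the file names are hypothetical, not taken from the app's source.

```typescript
// Hypothetical helper: builds the argument list for a stream-copy extraction.
// `audioIndex` counts audio streams only (0 = first audio track).
function buildExtractArgs(
  inputName: string,
  audioIndex: number,
  outputName: string,
): string[] {
  if (!Number.isInteger(audioIndex) || audioIndex < 0) {
    throw new RangeError(`invalid audio index: ${audioIndex}`);
  }
  return [
    "-i", inputName,             // open the source file
    "-map", `0:a:${audioIndex}`, // input 0, audio stream N
    "-vn",                       // keep video out of the output
    "-c:a", "copy",              // copy the bitstream, no re-encode
    outputName,
  ];
}

// Second audio track of an MKV, written as an Opus file.
const args = buildExtractArgs("input.mkv", 1, "commentary.opus");
```

The arguments are kept as an array rather than one string because browser FFmpeg wrappers generally take the arguments as a list, which also avoids quoting problems with file names.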

Why the current app is simpler than the earlier design

Earlier planning included both direct extraction and conversion to formats such as MP3, AAC, Opus, WAV, FLAC, and OGG. That approach is useful in theory, but it adds more browser-side complexity:

different encoder options per codec,
bitrate UI decisions,
variable versus constant bitrate handling,
more memory usage,
slower performance on large files and mobile devices,
more chances for confusing failures in Wasm environments.

For that reason, the released app is intentionally focused on the safest browser-first use case: detect and extract one existing audio track as-is.

Why this preserves quality

Because the app uses stream copy, it does not recompress the selected audio. If the source track is AAC, the copied output is still AAC. If the source track is Opus, the copied output is still Opus. This means the app does not add another generation of lossy compression.

How FFmpeg runs in the browser with WebAssembly

WebAssembly changes the runtime, not the media logic

FFmpeg is traditionally a native program compiled for Windows, Linux, or macOS. In this app, FFmpeg is compiled to WebAssembly so it can run inside the browser sandbox. The media processing logic still comes from FFmpeg. What changes is the execution environment.

What the browser app actually does

It reads the selected local file through browser file APIs.
It loads a self-hosted FFmpeg WebAssembly runtime.
It writes the input file into FFmpeg’s virtual in-memory filesystem.
It analyzes the file by opening it and parsing FFmpeg’s stream information output.
It runs a direct extraction command for the selected audio track.
It reads the output file back from FFmpeg’s virtual filesystem and offers it for download.
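
The write, run, and read-back steps above can be sketched as follows. This is an illustration under stated assumptions, not the app's code: the FfmpegLike interface is declared here and only mirrors the method names of the ffmpeg.wasm 0.12 FFmpeg class, the virtual file name input.bin is hypothetical, and reading the local file and loading the runtime are assumed to have happened already.

```typescript
// Assumed minimal shape of the runtime, modelled on ffmpeg.wasm 0.12.
interface FfmpegLike {
  writeFile(path: string, data: Uint8Array): Promise<unknown>;
  exec(args: string[]): Promise<number>;
  readFile(path: string): Promise<Uint8Array | string>;
}

async function extractTrack(
  ffmpeg: FfmpegLike,
  source: Uint8Array,
  audioIndex: number,
  outputName: string,
): Promise<Uint8Array> {
  const inputName = "input.bin"; // hypothetical name in the virtual filesystem

  // Copy the local file's bytes into the in-memory filesystem.
  await ffmpeg.writeFile(inputName, source);

  // Run the direct extraction for the chosen audio track.
  const exitCode = await ffmpeg.exec([
    "-i", inputName,
    "-map", `0:a:${audioIndex}`,
    "-vn",
    "-c:a", "copy",
    outputName,
  ]);
  if (exitCode !== 0) {
    throw new Error(`ffmpeg exited with code ${exitCode}`);
  }

  // Read the result back so it can be offered for download.
  const output = await ffmpeg.readFile(outputName);
  if (typeof output === "string") {
    throw new Error("expected binary output");
  }
  return output;
}
```

Passing the runtime in as a parameter keeps the sketch independent of how the engine is loaded and makes it easy to exercise with a stand-in object.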

Why the current app uses the single-thread core

The current app intentionally uses the single-thread FFmpeg WebAssembly core only. Multi-thread builds can be faster, but they also require more browser and hosting complexity, especially around SharedArrayBuffer and cross-origin isolation headers. Since the current app only performs direct extraction and not heavy re-encoding, single-thread mode is a better fit for simplicity and reliability.
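
To make concrete what the single-thread choice avoids, here is a small sketch of the check a multi-thread build would need. The function and interface names are hypothetical; crossOriginIsolated and SharedArrayBuffer are the standard browser globals, and the two headers are the usual pair for enabling cross-origin isolation.

```typescript
// The subset of browser globals this check reads. It is passed in as a
// parameter so the function can also be exercised outside a browser.
interface IsolationEnv {
  crossOriginIsolated?: boolean;
  SharedArrayBuffer?: unknown;
}

// Response headers a host typically has to send before a page becomes
// cross-origin isolated and SharedArrayBuffer is usable.
const ISOLATION_HEADERS = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Returns true only when a multi-thread Wasm build could run at all.
function canUseThreads(env: IsolationEnv): boolean {
  return (
    env.crossOriginIsolated === true &&
    typeof env.SharedArrayBuffer === "function"
  );
}

// In a browser, the page's own globals would be passed in:
//   canUseThreads(globalThis)
```

With the single-thread core, neither the headers nor the runtime check is required, which is one less thing to get wrong on shared or static hosting.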

Privacy model

The selected video file stays in the user’s browser session. The tool is designed to process the file locally and does not need to upload it to a server. This is the key privacy benefit of a browser-side FFmpeg approach.

Input video containers: MKV, MP4, MOV, WebM

Why containers matter for extraction

Audio extraction is primarily a container and stream selection task. The container defines how streams are declared, indexed, timestamped, and stored. The codec defines how the audio itself is compressed. This tool supports multiple input containers because FFmpeg can demux them in the browser.

MKV: Matroska

Matroska is an open multimedia container based on EBML. It is very flexible and commonly used for files with multiple audio tracks, subtitle tracks, chapters, and rich metadata. MKV is especially common in archival and downloaded media workflows.

For extraction, MKV is useful because it often stores:

multiple language tracks,
commentary tracks,
surround mixes and stereo mixes in one file.

MP4: ISO Base Media File Format family

MP4 is the dominant delivery container for web video and device playback. It is based on the ISO Base Media File Format and stores media inside typed boxes such as ftyp, moov, and mdat. MP4 most commonly carries AAC audio, but it can contain other supported audio codecs as well.
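
The box structure can be illustrated with a short sketch that walks the top-level boxes of a byte buffer. In the app this parsing is done by FFmpeg's demuxer; the function below is only an illustration, and its name is hypothetical. It relies on the box header layout of the ISO Base Media File Format: a 4-byte big-endian size, a 4-byte type, a size of 1 meaning a 64-bit size follows, and a size of 0 meaning the box runs to the end of the file.

```typescript
interface BoxInfo {
  type: string;   // four-character box type, e.g. "ftyp"
  offset: number; // byte offset of the box header
  size: number;   // total box size in bytes, header included
}

function listTopLevelBoxes(bytes: Uint8Array): BoxInfo[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes: BoxInfo[] = [];
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset); // DataView reads big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7],
    );
    if (size === 1) {
      // A 64-bit size follows the type field.
      if (offset + 16 > bytes.byteLength) break;
      size = view.getUint32(offset + 8) * 4294967296 + view.getUint32(offset + 12);
    } else if (size === 0) {
      size = bytes.byteLength - offset; // box extends to the end of the data
    }
    if (size < 8) break; // malformed header: stop instead of looping forever
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// A 16-byte ftyp box followed by an empty 8-byte mdat box.
const sample = new Uint8Array([
  0, 0, 0, 16, 0x66, 0x74, 0x79, 0x70, 0x69, 0x73, 0x6f, 0x6d, 0, 0, 0, 0,
  0, 0, 0, 8, 0x6d, 0x64, 0x61, 0x74,
]);
const boxes = listTopLevelBoxes(sample);
```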

MOV: QuickTime File Format

MOV is the QuickTime File Format and is closely related to MP4. It is common in Apple and production workflows and often appears in camera originals, intermediate exports, and edited source files. From the app’s point of view, MOV is another container that FFmpeg can inspect and extract from.

WebM

WebM is a web-oriented container format defined as a restricted subset of Matroska. It is commonly used with VP8, VP9, or AV1 video and Opus or Vorbis audio. WebM is especially relevant for browser-native video workflows and web publishing.

In practice, WebM is the input container most likely to contain:

Opus audio,
Vorbis audio,
modern web-first media combinations.

Why native browser playback and tool processing are different

A browser may not natively play every container and codec combination in a regular <video> or <audio> element. That does not automatically mean the app cannot process it. FFmpeg WebAssembly can still demux and inspect many files that browsers do not play natively.

Current output behavior and extracted audio file mapping

The app does not create arbitrary new output formats

The current app only extracts the selected audio track in its original encoded form. That means the exact output file type depends on the detected source audio codec.

The current mapping logic is:

Detected source codec	Output file type used by the app	Typical extension	Notes
AAC	AAC in MP4 audio-only container	.m4a	Good compatibility for direct extraction.
ALAC	ALAC in MP4 audio-only container	.m4a	Lossless Apple audio in M4A container.
MP3	MP3	.mp3	Very compatible output.
Opus	Opus	.opus	Typically Ogg Opus style output.
Vorbis	Ogg Vorbis	.ogg	Open format output.
FLAC	FLAC	.flac	Lossless compressed output.
AC-3	AC-3	.ac3	Direct extraction only.
E-AC-3	E-AC-3	.eac3	Direct extraction only.
PCM variants	WAV PCM	.wav	Used when the source is PCM.
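
The table above can be expressed as a lookup, sketched here under assumptions: the keys are FFmpeg's codec names as they appear in its stream information output, the function name is hypothetical, and treating every name that starts with pcm_ as WAV-compatible is a simplification that a real implementation would need to verify per variant.

```typescript
// Output extension per detected codec, following the mapping table.
const EXTENSION_BY_CODEC = new Map<string, string>([
  ["aac", ".m4a"],
  ["alac", ".m4a"],
  ["mp3", ".mp3"],
  ["opus", ".opus"],
  ["vorbis", ".ogg"],
  ["flac", ".flac"],
  ["ac3", ".ac3"],
  ["eac3", ".eac3"],
]);

// Returns the output extension for a codec, or undefined when the codec
// has no mapping and the track should not be offered for extraction.
function extensionForCodec(codec: string): string | undefined {
  const key = codec.trim().toLowerCase();
  if (key.startsWith("pcm_")) {
    return ".wav"; // assumption: all PCM variants are routed to WAV
  }
  return EXTENSION_BY_CODEC.get(key);
}
```

Returning undefined for anything outside the table keeps the "not offered" case explicit, so the UI can disable a track instead of failing during extraction.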

Why some codecs are not supported for direct export

Not every possible audio codec is currently mapped to a clean output choice in this simplified app. If the selected track uses an unsupported codec for direct export, the app will not offer extraction for that track. This is an intentional limitation to keep behavior predictable and avoid browser-side edge cases.

Browser format support in 2026

Two separate support questions

Can the browser play the resulting audio file natively?
Can FFmpeg WebAssembly process the source file internally?

The app mainly depends on the second question. However, native playback still matters if the user wants to preview the extracted file directly in the browser after extraction.

Practical playback expectations for extracted outputs

Output	General browser compatibility in 2026	Practical note
MP3	Very strong	Most compatible result for playback.
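M4A (AAC or ALAC)	AAC: very strong in modern browsers. ALAC: typically limited to Safari and other Apple platforms	AAC in M4A previews almost everywhere. ALAC extracts correctly but may not preview outside Apple browsers.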
Opus	Strong in modern browsers	Good modern support, but exact handling can vary by container expectations.
OGG Vorbis	Generally good in modern browsers, historically weaker in Safari	Good open format option, but not the safest universal choice.
FLAC	Strong in modern browsers	Lossless playback is broadly better than it used to be.
WAV	Very strong	Simple and compatible, but large.
AC3 / EAC3	More limited	Useful as direct extraction results, but not ideal for universal browser playback.
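
Whether an inline preview is worth offering can be probed at runtime instead of assumed. The sketch below uses the standard canPlayType method of media elements, which returns an empty string when the type cannot be played. The MIME strings chosen per extension and the function name are assumptions for illustration.

```typescript
// Anything with a canPlayType method, such as an <audio> element.
interface PlaybackProbe {
  canPlayType(type: string): string;
}

// Assumed MIME type per output extension.
const MIME_BY_EXTENSION = new Map<string, string>([
  [".mp3", "audio/mpeg"],
  [".m4a", "audio/mp4"],
  [".opus", 'audio/ogg; codecs="opus"'],
  [".ogg", 'audio/ogg; codecs="vorbis"'],
  [".flac", "audio/flac"],
  [".wav", "audio/wav"],
  [".ac3", "audio/ac3"],
  [".eac3", "audio/eac3"],
]);

// True when the browser reports "maybe" or "probably" for the output type.
function canPreview(probe: PlaybackProbe, extension: string): boolean {
  const mime = MIME_BY_EXTENSION.get(extension.toLowerCase());
  return mime !== undefined && probe.canPlayType(mime) !== "";
}

// In a browser:
//   canPreview(document.createElement("audio"), ".opus")
```

A negative answer only hides the preview player. The download is still valid, because extraction does not depend on native playback support.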

What this means in practice

The app’s job is extraction, not playback normalization. If a user needs a universally playable consumer format, they may later convert the extracted file with a separate tool. The current browser app focuses only on safe direct extraction.

App design notes for developers

Current user flow

User selects a local video file.
The app checks the file size first.
If accepted, the app automatically loads the single-thread FFmpeg WebAssembly core.
The app automatically analyzes the selected file.
The app detects and lists up to 4 audio tracks.
User selects one audio track.
The app extracts that track as-is with -c:a copy.
The browser offers the result for download and inline preview when supported.

Why the app no longer has separate “Load engine” and “Analyze file” buttons

In a browser utility like this, extra steps make the UI more confusing than helpful. The simplified version automatically handles engine loading and file analysis after file selection. This gives a much cleaner workflow for normal users.

How track detection works

The app opens the selected file with FFmpeg and parses the printed stream information lines. It only treats lines containing both Stream #0: and Audio: as audio tracks. This matters because earlier parsing approaches could incorrectly interpret other stream lines as extractable audio.
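
A minimal sketch of that parsing rule follows. The regular expression, the function name, and the sample log are illustrative assumptions rather than the app's actual code; the sample lines follow the general shape of FFmpeg's stream information output.

```typescript
interface AudioTrack {
  streamIndex: number; // absolute index from "Stream #0:N"
  audioIndex: number;  // position among audio streams, as used in -map 0:a:N
  language?: string;   // language tag in parentheses, when present
  codec: string;       // first word after "Audio:"
}

// Matches only lines that declare a stream of input 0 AND mark it as audio.
const AUDIO_STREAM_LINE =
  /Stream #0:(\d+)(?:\[[^\]]*\])?(?:\(([^)]*)\))?: Audio: ([\w-]+)/;

function parseAudioTracks(log: string): AudioTrack[] {
  const tracks: AudioTrack[] = [];
  for (const line of log.split(/\r?\n/)) {
    const match = AUDIO_STREAM_LINE.exec(line);
    if (!match) continue; // video, subtitle, and other lines are skipped
    tracks.push({
      streamIndex: Number(match[1]),
      audioIndex: tracks.length,
      language: match[2],
      codec: match[3],
    });
  }
  return tracks;
}

const sampleLog = [
  "  Stream #0:0(und): Video: h264 (High), yuv420p, 1920x1080, 24 fps",
  "  Stream #0:1(eng): Audio: aac (LC), 48000 Hz, stereo, fltp, 128 kb/s (default)",
  "  Stream #0:2(ger): Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s",
  "  Stream #0:3(eng): Subtitle: subrip",
].join("\n");
const tracks = parseAudioTracks(sampleLog);
```

Note the two indices: the absolute stream index (1 and 2 in the sample) differs from the audio-relative index (0 and 1), and it is the audio-relative one that belongs in -map 0:a:N.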

Why the app supports only the first 4 audio tracks

Most real-world files users will handle have 1 to 4 audio tracks. Supporting more is possible in theory, but it increases UI clutter and is unnecessary for the common browser use case.
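
The cap itself is a small piece of logic. A sketch, with hypothetical names, that keeps the first 4 tracks and reports how many were left out so the UI can mention them:

```typescript
const MAX_SELECTABLE_TRACKS = 4;

// Splits a detected track list into the selectable part and a count of
// the tracks that exist in the file but are not offered.
function limitTracks<T>(tracks: T[]): { selectable: T[]; hiddenCount: number } {
  const selectable = tracks.slice(0, MAX_SELECTABLE_TRACKS);
  return { selectable, hiddenCount: tracks.length - selectable.length };
}

// Six detected tracks: four are selectable, two are only mentioned.
const limited = limitTracks(["a0", "a1", "a2", "a3", "a4", "a5"]);
```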

Why the app uses direct copy only

A browser is a far more constrained environment for FFmpeg than a native desktop install. Direct extraction avoids:

large re-encoding memory costs,
codec-specific encoder failures,
bitrate and quality mapping complexity,
unnecessary recompression.

Practical limits: memory, file size, speed, mobile

1.8 GB application limit

The current app intentionally rejects files larger than 1.8 GB. This is a conservative browser-side limit chosen to reduce failures with large local files. If a file is above that threshold, the app does not try to analyze it. Instead, it informs the user that very large files are not supported and suggests compressing the video first.
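
A sketch of such a pre-check is shown below. Two things are assumptions: "1.8 GB" is read here as 1.8 × 1024³ bytes (the app may use 1.8 × 10⁹ instead), and the function name and message wording are hypothetical. In a browser, the size would come from the size property of the selected File.

```typescript
const BYTES_PER_GB = 1024 * 1024 * 1024;

// Assumption: the limit is interpreted in binary gigabytes.
const MAX_INPUT_BYTES = Math.floor(1.8 * BYTES_PER_GB);

// Decides before any analysis whether the file is accepted.
function checkFileSize(sizeBytes: number): { ok: boolean; message?: string } {
  if (sizeBytes <= MAX_INPUT_BYTES) {
    return { ok: true };
  }
  const sizeGb = (sizeBytes / BYTES_PER_GB).toFixed(2);
  return {
    ok: false,
    message:
      `This file is ${sizeGb} GB. Files above 1.8 GB are not supported. ` +
      "Compress the video first or use desktop FFmpeg.",
  };
}
```

Running the check before the engine is loaded means an oversized file costs the user nothing beyond reading the message.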

Why very large files are a problem in browser-based FFmpeg

FFmpeg WebAssembly commonly works with an in-memory virtual filesystem. That means the browser may need memory for:

the input file,
the extracted output file,
runtime overhead,
WebAssembly memory growth.

In practical terms, very large files can fail due to WebAssembly or browser memory limits even before the processing logic itself becomes the issue.
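
A rough back-of-the-envelope estimate shows why. The sketch below assumes a constant-bitrate audio track and a hypothetical fixed runtime overhead. It is a lower bound, since a browser may hold additional copies of the input while it is being read.

```typescript
// Approximate size in bytes of an audio track at a constant bitrate.
function estimateAudioBytes(bitrateKbps: number, durationSeconds: number): number {
  return (bitrateKbps * 1000 * durationSeconds) / 8;
}

// Lower bound on memory held at once: input and output both live in the
// in-memory filesystem, plus whatever the runtime itself needs.
function estimatePeakBytes(
  inputBytes: number,
  outputBytes: number,
  overheadBytes: number,
): number {
  return inputBytes + outputBytes + overheadBytes;
}

// One hour of 128 kb/s audio: 128000 * 3600 / 8 = 57,600,000 bytes.
const audioBytes = estimateAudioBytes(128, 3600);

// A 1.5 GB video plus that track plus an assumed 64 MiB of overhead.
const peakBytes = estimatePeakBytes(1_500_000_000, audioBytes, 64 * 1024 * 1024);
```

The output is small compared with the input, so the input file dominates. That is why the limit is expressed as an input file size.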

Speed considerations

Direct extraction is much faster than re-encoding.
Single-thread Wasm is slower than native desktop FFmpeg, but acceptable for this kind of direct copy task.
Mobile devices may still be noticeably slower than desktop systems.

Mobile behavior

On mobile devices, browsers can suspend background tabs or reduce resources for long-running tasks. Users should keep the tab open and active during processing.

Licensing and compliance checklist

This section exists so the project can be published and discussed responsibly. It is not legal advice. If you distribute browser builds commercially, review licensing with a qualified lawyer.

Project layers involved

FFmpeg: the underlying multimedia engine.
ffmpeg.wasm: the browser-oriented wrapper and compiled runtime packaging.
Emscripten: the toolchain typically used to compile C and C++ code to WebAssembly.
Your wrapper UI code: the app logic and interface layer around the Wasm runtime.

Why this tool is best understood as a wrapper

The app itself does not implement media extraction algorithms from scratch. It wraps FFmpeg WebAssembly and provides:

file selection,
automatic engine loading,
track detection UI,
stream selection,
download flow,
browser safety checks such as the 1.8 GB limit.

This distinction is important both for open-source publication and for licensing clarity.

Self-hosting assets

The app relies on self-hosted Wasm, JS glue, and worker assets inside the site’s own asset folder. This improves reliability and makes dependency ownership clearer.

Codec and library notes

Since the current app performs direct extraction only, it avoids many of the licensing and packaging questions that come with browser-side re-encoding. Even so, the FFmpeg build and its enabled components still matter. Anyone redistributing the app should document:

which FFmpeg build was used,
which optional codecs or libraries are compiled in,
which license notices must be shipped alongside it.

Technical FAQ

Can I convert the audio to MP3 or another codec in the current app?

No. The current app does not re-encode. It only extracts the selected audio track as-is from the source file.

Why was conversion removed?

Conversion added too much browser-side complexity and created more potential failure points. The simplified direct extraction workflow is more reliable and easier to understand.

Can I extract audio without quality loss?

Yes, that is the main goal of the current app. Because it uses stream copy, it does not add another re-encoding step.

Why does the app reject files above 1.8 GB?

Very large files are unreliable in browser-based FFmpeg workflows because of WebAssembly and browser memory limits. The app uses a conservative size limit to avoid confusing failures.

What should I do if my file is too large?

Compress the video first, or use a native desktop FFmpeg workflow for very large media files.

Does the app upload my file?

No. The tool is designed to process locally in the browser. Your file stays on your device.

Conclusion

Extracting audio from video is fundamentally a container and stream selection task. FFmpeg remains the most reliable engine for that job, and a WebAssembly build makes it possible to do it directly in the browser with strong privacy benefits. The current app intentionally focuses on the most practical and reliable browser-side version of that workflow: detect the audio tracks, let the user choose one, and extract it as-is without re-encoding.