C# .NET 10 console app for scraping and converting audiobooks (M4B/MP3) from multiple websites.
- .NET 10, C# with nullable reference types and primary constructors
- Spectre.Console: all console output (markup with `[green]...[/]` etc., status spinners, tables, figlet)
- Xabe.FFmpeg: audio conversion and segment merging; binaries via `Xabe.FFmpeg.Downloader`
- Newtonsoft.Json: JSON parsing of API responses
- Microsoft.Extensions.DependencyInjection: DI container
- Microsoft.Extensions.Configuration: configuration via `appsettings.json` with binder
- Microsoft.Extensions.Http: `IHttpClientFactory` for typed HttpClients
IScraperStrategy (Scraper/Abstractions/)
└── BaseScraperStrategy (Scraper/Base/)
├── TokybookStrategy (Scraper/Strategies/) — tokybook.com
├── DropboxTracksStrategy (Scraper/Strategies/) — sites with JS `tracks = [{ chapter_link_dropbox }]` structure:
│ zaudiobooks.com, freeaudiobooks.top
├── AudioSourceTagStrategy (Scraper/Strategies/) — sites with `<source type="audio/mpeg">` or `<a href="*.mp3">` structure:
│ goldenaudiobook.net, fulllengthaudiobooks.net, bigaudiobooks.net,
│ findaudiobook.com, bookaudiobook.net, hotaudiobooks.com, audiozaic.com,
│ appaudiobooks.com
├── PlaylistAudiobookStrategy (Scraper/Strategies/) — all sites with `data-playlist` JSON attribute:
│ hdaudiobooks.net
└── AudioAzStrategy (Scraper/Strategies/) — Next.js site with tracks JSON in streaming data:
audioaz.com
- `ScraperFactory` selects the matching strategy via `CanHandle(url)`
- `ScraperConfig` controls parallelism parameters (defaults: 3 parallel downloads, 2 conversions, 5 segments/track)
- Runtime overrides in `Program.cs`: MaxParallelDownloads=5, MaxParallelConversions=3, MaxSegmentsPerTrack=8
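The selection step can be pictured roughly as follows. This is a minimal sketch, not the project's code: `IExampleStrategy`, `HostStrategy`, and `ExampleFactory` are hypothetical stand-ins, and it assumes the factory simply returns the first strategy whose `CanHandle` accepts the URL.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for the project's IScraperStrategy.
public interface IExampleStrategy
{
    bool CanHandle(string url);
}

// Illustrative strategy that decides purely by host name.
public sealed class HostStrategy(string name, IReadOnlyList<string> hosts) : IExampleStrategy
{
    public string Name { get; } = name;

    public bool CanHandle(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri)) return false;

        // Accept the bare domain and any subdomain such as "www.".
        var host = uri.Host;
        return hosts.Any(h =>
            host.Equals(h, StringComparison.OrdinalIgnoreCase) ||
            host.EndsWith("." + h, StringComparison.OrdinalIgnoreCase));
    }
}

// Assumed behavior: the first strategy that accepts the URL wins.
public sealed class ExampleFactory(IEnumerable<IExampleStrategy> strategies)
{
    public IExampleStrategy? Find(string url) =>
        strategies.FirstOrDefault(s => s.CanHandle(url));
}
```

With this shape, registration order matters when two strategies could accept the same URL, so the more specific one would have to come first.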
Downloads and conversions run decoupled via Channel<T> + SemaphoreSlim:
- Download tasks write completed tracks into a bounded channel
- Conversion tasks read from the channel and invoke FFmpeg
- Channel is closed after `Task.WhenAll(downloadTasks)`
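The decoupling described above might look like the sketch below. It is an illustration under assumptions: `PipelineSketch`, the `download` and `convert` delegates, and the channel capacity of 8 are invented for the example, and the conversion limit is modeled as a fixed number of reader tasks rather than a second `SemaphoreSlim`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // download and convert are placeholders for the real HTTP and FFmpeg work.
    public static async Task<IReadOnlyCollection<string>> RunAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> download,
        Func<string, Task<string>> convert,
        int maxDownloads = 3,
        int maxConversions = 2)
    {
        // Bounded channel: writers wait when conversions fall behind (capacity is an assumption).
        var channel = Channel.CreateBounded<string>(8);
        using var downloadGate = new SemaphoreSlim(maxDownloads);
        var results = new ConcurrentBag<string>();

        // Producers: each download waits for a slot, then writes its result to the channel.
        var downloadTasks = urls.Select(async url =>
        {
            await downloadGate.WaitAsync();
            try
            {
                var file = await download(url);
                await channel.Writer.WriteAsync(file);
            }
            finally
            {
                downloadGate.Release();
            }
        }).ToList();

        // Consumers: a fixed number of workers drain the channel until it is completed.
        var conversionTasks = Enumerable.Range(0, maxConversions).Select(async _ =>
        {
            await foreach (var file in channel.Reader.ReadAllAsync())
            {
                results.Add(await convert(file));
            }
        }).ToList();

        // Close the channel only after every download has finished, even if one failed.
        try
        {
            await Task.WhenAll(downloadTasks);
        }
        finally
        {
            channel.Writer.Complete();
        }

        await Task.WhenAll(conversionTasks);
        return results;
    }
}
```

Completing the writer is what lets the `await foreach` loops end; without it the conversion tasks would wait forever.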
- `SegmentedTrackData`: for HLS streams (`.m3u8` → `.ts` segments → merge via FFmpeg concat)
- `DirectFileTrackData`: for direct MP3/audio downloads (zaudiobooks, goldenaudiobook); when the source is already in the target format, a copy-conversion runs to embed metadata without re-encoding
BaseScraperStrategy provides shared metadata infrastructure used by all strategies.
For all non-Tokybook strategies, metadata is collected in two stages before downloading:
Stage 1 — MP3 tag enrichment (EnrichFromFirstTrackTagsAsync):
- Runs `ffprobe` on the first chapter URL to read existing ID3 tags without downloading the file
- Maps: `artist` → `Author`, `date` → `Year`, `comment` → `Description` (skips chapter references like "Chapter 1")
- Only fills empty fields; never overwrites
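The "fill only what is empty" rule of Stage 1 can be sketched like this. The code assumes the ffprobe tags have already been parsed into a dictionary; `ExampleMetadata`, `TagEnrichmentSketch`, and the regex used to spot chapter references are hypothetical and may differ from what `EnrichFromFirstTrackTagsAsync` does.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, reduced stand-in for the project's metadata type.
public sealed class ExampleMetadata
{
    public string? Author { get; set; }
    public string? Year { get; set; }
    public string? Description { get; set; }
}

public static class TagEnrichmentSketch
{
    // Assumed heuristic for comments that are only a chapter reference, e.g. "Chapter 1".
    private static readonly Regex ChapterReference =
        new(@"^\s*chapter\s+\d+\s*$", RegexOptions.IgnoreCase);

    public static void Apply(ExampleMetadata metadata, IReadOnlyDictionary<string, string> tags)
    {
        metadata.Author = Pick(metadata.Author, Lookup(tags, "artist"));
        metadata.Year = Pick(metadata.Year, Lookup(tags, "date"));

        // Skip comments that merely name the chapter.
        var comment = Lookup(tags, "comment");
        if (comment is not null && !ChapterReference.IsMatch(comment))
        {
            metadata.Description = Pick(metadata.Description, comment);
        }
    }

    private static string? Lookup(IReadOnlyDictionary<string, string> tags, string key) =>
        tags.TryGetValue(key, out var value) ? value : null;

    // Keeps the current value unless it is empty and the candidate is not.
    private static string? Pick(string? current, string? candidate) =>
        string.IsNullOrWhiteSpace(current) && !string.IsNullOrWhiteSpace(candidate)
            ? candidate
            : current;
}
```

Because `Pick` never replaces a non-empty value, the same helper would work unchanged for the later HTML stage.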
Stage 2 — HTML extraction (ExtractCommonMetadata):
- Only fills fields still empty after Stage 1
- `og:image` → `CoverArtUrl`
- `og:description` → `Description`
- `<script type="application/ld+json">` with `@type: "Audiobook"` → all fields (AudioAZ)
- ld+json `headline` field → author via `ExtractAuthorFromHeadline()` (WordPress/Yoast sites)
- `<link rel="preload" as="image">` → `CoverArtUrl` fallback (fulllengthaudiobooks, appaudiobooks)
- H1 title → author as last resort
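As an illustration of the Open Graph part of Stage 2, the sketch below pulls one `og:` value out of raw HTML with a regular expression. `OpenGraphSketch` and `ReadMetaProperty` are hypothetical names; the real `ExtractCommonMetadata` may use a different parsing approach, and this version assumes the `property` attribute appears before `content`.

```csharp
#nullable enable
using System.Net;
using System.Text.RegularExpressions;

public static class OpenGraphSketch
{
    // Returns the decoded content of <meta property="..." content="...">, or null if absent.
    public static string? ReadMetaProperty(string html, string property)
    {
        // Group 1 captures the opening quote so the value can contain the other quote character.
        var pattern =
            $@"<meta\s+property=[""']{Regex.Escape(property)}[""']\s+content=([""'])(.*?)\1";

        var match = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? WebUtility.HtmlDecode(match.Groups[2].Value) : null;
    }
}
```

A caller would apply the result only to fields that are still empty, for example assigning `ReadMetaProperty(html, "og:image")` to `CoverArtUrl` when no cover URL has been found yet.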
Other shared helpers:
- `DownloadCoverArtAsync(url, folder)`: downloads cover once to `_cover.{ext}`, returns temp path
- `BuildMetadataParams(bookMetadata, trackData, hasCoverArt)`: returns FFmpeg `-metadata` flags (title, album, artist, album_artist, track, genre, comment, publisher, date, cover art)
- Cover art is passed to FFmpeg as a second input (`-map 0:a -map 1:v -c:v copy -disposition:v attached_pic`)
- Cover art temp file is always cleaned up via `finally` after the conversion pipeline completes
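To make the argument layout concrete, here is a rough sketch of how such an FFmpeg command line could be assembled. `FfmpegArgsSketch.Build` is a hypothetical helper and not the project's `BuildMetadataParams`; it returns a plain argument list, whereas the real code passes its parameters through Xabe.FFmpeg. Only the flags quoted above plus `-i` and `-metadata key=value` are used.

```csharp
#nullable enable
using System.Collections.Generic;

public static class FfmpegArgsSketch
{
    // Builds: -i <audio> [-i <cover> cover-mapping flags] [-metadata k=v ...] <output>
    public static List<string> Build(
        string audioPath,
        string? coverPath,
        IReadOnlyDictionary<string, string> tags,
        string outputPath)
    {
        var args = new List<string> { "-i", audioPath };

        if (coverPath is not null)
        {
            // The cover becomes input 1 and is attached as a picture stream.
            args.AddRange(new[]
            {
                "-i", coverPath,
                "-map", "0:a", "-map", "1:v",
                "-c:v", "copy",
                "-disposition:v", "attached_pic"
            });
        }

        foreach (var (key, value) in tags)
        {
            // Empty values are skipped so no blank tags get written.
            if (!string.IsNullOrWhiteSpace(value))
            {
                args.Add("-metadata");
                args.Add($"{key}={value}");
            }
        }

        args.Add(outputPath);
        return args;
    }
}
```

Keeping the arguments as separate list entries avoids manual quoting of values that contain spaces, for instance when they are handed to `ProcessStartInfo.ArgumentList`.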
Tokybook gets richer metadata directly from the post-details API response (authors, narrators, coverImage, description, publisher) — no HTML scraping or ffprobe needed.
AudiobookMetadata (abstract)
│ Title, FolderPath
│ Author, Narrator, CoverArtUrl, Description, Publisher, Year ← populated by ffprobe tags, HTML, or API response
├── SimpleAudiobookMetadata — ChapterUrls: List<string>
└── StreamingAudiobookMetadata — Tracks: List<TrackInfo>, StreamToken, AudioBookId
TrackData (abstract)
│ TrackTitle, SanitizedTitle, TrackNumber, TotalTracks
├── SegmentedTrackData — TempFolder, FolderPath, TsSegments: List<string>
└── DirectFileTrackData — FilePath, FolderPath
TrackInfo — Src, TrackTitle
UserSettings — DownloadPath, FFmpegDirectory, ConvertToMp3, ConvertToM4b
- `HttpService` (`IHttpService`): GET/POST via `HttpClient` (10 min timeout, DI via `IHttpClientFactory`)
- `DownloadService`: orchestrates strategy selection and execution, provides supported domains
- `SettingsService` (`ISettingsService`): loads/saves `UserSettings` in `appsettings.json`, auto-downloads FFmpeg
- `IpifyService` (`IIpifyService`): fetches IP address via `api.ipify.org` (required for Tokybook API)
- `PageService` (`IPageService`): Spectre.Console UI wrapper with figlet banner (`Bulbhead.flf`), ESC cancellation
Application.RunAsync(args)
→ InitializeAsync() — load settings, ensure FFmpeg
→ Parse CLI args (-d / --directory)
→ MainPage.ShowAsync() — main menu loop
├── "Search book on Tokybook.com" → SearchTokybookPage (API: /api/v1/search, pagination)
├── "Download from URL" → DownloadPage (shows supported domains, prompts for URL)
├── "Settings" → SettingsPage (download path, FFmpeg path, MP3/M4B toggles)
└── "Exit"
Registration in Program.cs → ConfigureServices() and ScraperServiceExtensions.AddScraperServices():
- Singleton: `IConfiguration`, `IAnsiConsole`, `EscapeCancellableConsole`, `IIpifyService`, `IPageService`, `ISettingsService`, `ScraperFactory`, `DownloadService`
- Typed HttpClient: `IHttpService` → `HttpService`
- Transient: all `IScraperStrategy` implementations
{
"UserSettings": {
"DownloadPath": "",
"FFmpegDirectory": "",
"ConvertToMp3": true,
"ConvertToM4b": false
}
}
- `-d` / `--directory <path>`: sets download directory
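How the settings file and the `-d` flag could combine is sketched below. This is illustrative only: `UserSettingsSketch` and `SettingsLoaderSketch` are hypothetical, the sketch uses `System.Text.Json` to stay self-contained instead of the configuration binder the project uses, and it assumes the command-line value takes precedence over the value in `appsettings.json`.

```csharp
#nullable enable
using System.Text.Json;

// Mirrors the "UserSettings" section shown above.
public sealed class UserSettingsSketch
{
    public string DownloadPath { get; set; } = "";
    public string FFmpegDirectory { get; set; } = "";
    public bool ConvertToMp3 { get; set; } = true;
    public bool ConvertToM4b { get; set; }
}

public static class SettingsLoaderSketch
{
    public static UserSettingsSketch Load(string json, string[] args)
    {
        // Read the "UserSettings" section; fall back to defaults if it is missing.
        using var document = JsonDocument.Parse(json);
        var settings = document.RootElement.TryGetProperty("UserSettings", out var section)
            ? section.Deserialize<UserSettingsSketch>() ?? new UserSettingsSketch()
            : new UserSettingsSketch();

        // Assumed precedence: -d / --directory overrides the stored download path.
        for (var i = 0; i < args.Length - 1; i++)
        {
            if (args[i] is "-d" or "--directory")
            {
                settings.DownloadPath = args[i + 1];
            }
        }

        return settings;
    }
}
```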
- Create a new class in `TokyBay/Scraper/Strategies/` extending `BaseScraperStrategy`
- Implement `CanHandle(string url)`: URL-based detection
- Implement `DownloadBookAsync(string url)`: fetch metadata, then call `ProcessTracksInParallelAsync` or `ProcessDirectFilesInParallelAsync`
- In the metadata fetch method, call `ExtractCommonMetadata(html, metadata)` after building the `SimpleAudiobookMetadata` object; this fills cover art, author, and description automatically from og-tags and ld+json
- Register in `ScraperServiceExtensions.cs`: `services.AddTransient<IScraperStrategy, NewStrategy>();`
dotnet build
dotnet run --project TokyBay -- -d "C:\Path\To\Downloads"
dotnet publish -c Release -r win-x64 --self-contained
dotnet publish -c Release -r linux-x64 --self-contained
dotnet publish -c Release -r linux-arm64 --self-contained
dotnet publish -c Release -r osx-x64 --self-contained
dotnet publish -c Release -r osx-arm64 --self-contained
Publish profiles are in `TokyBay/Properties/PublishProfiles/` (WinX64, LinuxX64, LinuxArm64).
Flags: PublishSingleFile=true, PublishTrimmed=false, SelfContained=true.
Workflow: .github/workflows/release.yml
- Trigger: Pull request merged into `master`
- Constraint: Org `z00mable` only allows actions owned by the org, so no third-party actions like `actions/checkout`, `actions/setup-dotnet`, `softprops/action-gh-release`. Use `git clone` / `run`-based alternatives instead (e.g. `dotnet-install.sh` for SDK setup, `gh release create` instead of `softprops/action-gh-release`).
- Steps: git clone → install .NET 10 SDK (via `dotnet-install.sh`) → extract version from `.csproj` → publish for 5 platforms (win-x64, linux-x64, linux-arm64, osx-x64, osx-arm64) → ZIP → git tag `v{version}` → GitHub Release via `gh release create`
- Release notes: From PR body
- Console output always via `_console.MarkupLine(...)` (Spectre), never `Console.WriteLine`
- Error messages in `[red]`, success in `[green]`, info in `[blue]`, conversions in `[cyan]`, warnings in `[yellow]`, secondary in `[grey]`
- Filenames are sanitized via `SanitizeName()` (`[^A-Za-z0-9]+` → `_`)
- Retry logic: use `RetryAsync<T>()` from `BaseScraperStrategy` (exponential delay)
- Temp directories always cleaned up via `SafeDeleteDirectory()`
- ESC cancellation in UI prompts via `EscapeCancellableConsole`
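Two of these conventions are easy to picture in code. The sketch below is a guess at their shape, not the project's implementation: the regex comes from the rule above, while the `RetryAsync<T>` signature, the attempt count, and the base delay are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class ConventionsSketch
{
    // Replaces every run of characters outside A-Z, a-z, 0-9 with a single underscore.
    public static string SanitizeName(string name) =>
        Regex.Replace(name, "[^A-Za-z0-9]+", "_");

    // Retries the action, doubling the delay after each failed attempt.
    // The last failure is rethrown to the caller.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> action,
        int maxAttempts = 3,
        int baseDelayMs = 500)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await action();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Delays grow as baseDelayMs, 2x, 4x, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}
```

With this version, `SanitizeName("Chapter 1: The Beginning!")` yields `Chapter_1_The_Beginning_`; whether the real helper also trims leading or trailing underscores is not stated in this document.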