The next volume after [BOORU CHARS 2024](https://nyaa.si/view/1927862), built from the image streams of several imageboards:
based on **danbooru** (safe + questionable, **ID 8200000..9100000 = 24.09.2024..23.04.2025**),
with added "best of" furry-related **e621** and loli-enabled **gelbooru** for nearly the same interval,
and also unique **zerochan** content for **ID 3960000..4430000 = 13.06.2023..06.03.2025**.
As usual:
- images initially filtered by Mpixels >= 0.48, shorter side >= 600 px, file size >= 60000 bytes, no animations;
stripes dropped or cropped to aspect ratio 0.4..2.1
- PNG/WEBP/AVIF converted to JPG using **cjpegli 96% quality** (2000000 bytes limit);
modest downsampling done to longer side 2560 px (landscape), 1920 px (1x1), 2480 px (portrait)
- verbose file naming used **"%website% - %id% - %up_to_3_copyrights% ~ %up_to_5_characters% (%up_to_2_artists%).jpg"**
files uniquely identified by "%website%+%id%"
- some general image statistics obtained with ExifTool and [ImageMagick](https://imagemagick.org)
- content analysis was mostly the same as in BC2023, with up-to-date software and models
- [CRAFT text detector](https://github.com/fcakyon/craft-text-detector) used to estimate total size and number of text pieces
- torso components detected with a [custom PyTorch model](https://github.com/aperveyev/booru_yolo/tree/main/models)
built on [Ultralytics YOLOv11](https://github.com/ultralytics/ultralytics)
- clustering and within-cluster sorting implemented to place compositionally and visually similar pictures together;
see the "readme" for details
- images deduplicated using [AntiDupl](https://github.com/ermig1979/AntiDupl) up to 3-4% similarity, also against BOORU CHARS 2024, 2023 and 2022
- semi-automated quality check done as follows:
- real-life photos, no-character landscapes, food and macro shots discarded
- most comics and N-koma, text-heavy images and line art filtered out
- too "questionable" images (uncensored nipples or vulva, obvious adult actions) excluded
- some background crops, gamma correction, rotation, denoising and other nontrivial improvements applied
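
The initial size filter and the stripe check from the list above can be sketched in a few lines. This is only an illustration of the stated thresholds, not the script actually used for the release. The function names are hypothetical, and two points are assumptions: "Mpixels" is taken as width * height / 1,000,000, and "aspect ratio" as width / height.

```python
def passes_initial_filter(width, height, size_bytes, animated=False):
    """Apply the stated thresholds: Mpixels >= 0.48, shorter side >= 600 px,
    file size >= 60000 bytes, no animations."""
    megapixels = width * height / 1_000_000  # assumption: decimal megapixels
    return (
        not animated
        and megapixels >= 0.48
        and min(width, height) >= 600
        and size_bytes >= 60_000
    )


def is_stripe(width, height, lo=0.4, hi=2.1):
    """True when the image falls outside the 0.4..2.1 aspect ratio range.
    Assumption: aspect ratio means width / height."""
    ratio = width / height
    return ratio < lo or ratio > hi
```

An image flagged by `is_stripe` would then be either dropped or cropped back into the allowed range, as the list says. How that choice is made is not described here.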
Besides images, the release contains tab-separated text files:
- **BC_2025.tsv** file/image-related metadata, **896,142 rows**
- **BC_2025_tags.tsv** tags list with enrichment
- **BC_2025_yolo.tsv** detailed results for torso components detection
- **BC_2025_yolov11m_aa22.pt** PyTorch YOLOv11 model
plus an additional "readme".
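
Since files are uniquely identified by "%website%+%id%", a file name can be reduced to that key for batch processing, for example before matching it against other data. Below is a minimal sketch that assumes every file name follows the verbose naming pattern exactly, with all separators present. The function name and the sample file names are hypothetical, and names that deviate from the pattern simply return `None`.

```python
import re

# Pattern: "%website% - %id% - %copyrights% ~ %characters% (%artists%).jpg"
# The last parenthesized group before ".jpg" is taken as the artists field,
# so character names that themselves contain parentheses still parse.
NAME_RE = re.compile(
    r"^(?P<website>\S+) - (?P<id>\d+) - (?P<copyrights>.*?) ~ "
    r"(?P<characters>.*) \((?P<artists>[^()]*)\)\.jpg$"
)


def parse_name(filename):
    """Split a file name into its fields and add the 'website+id' key."""
    match = NAME_RE.match(filename)
    if match is None:
        return None  # name does not follow the assumed pattern
    info = match.groupdict()
    info["key"] = f"{info['website']}+{info['id']}"
    return info


# Hypothetical file name, for illustration only
example = parse_name(
    "danbooru - 8200001 - example series ~ example character (example artist).jpg"
)
```

The copyright, character and artist fields are returned as raw strings, because the separator used between multiple entries inside one field is not stated here.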
Keep in mind this release is first of all
**a dataset of character-centric art in effective local format suited for batch processing**
and then
**a representative catalog of anime/game/cartoon copyrights, characters and artists for visual estimation**
but it
**does not offer high image resolution and does not claim to be complete.**
NOTE: the content is a little more NSFW than in its predecessors. Such themes weren't allowed before.
