How can I bulk download public-domain images for a machine-learning project?
Asked 3/29/2011
I need a large number of images for training and testing an image-processing or machine-learning project. Many public-domain image sites only let you download files one by one through web pages. Is there a practical way to bulk download images from such sites, or are there sources that provide archives or FTP-style access? Also, what should I watch out for when relying on images labeled as public domain?
Originally by Photography Stack Exchange contributor. Source · Licensed CC BY-SA 4.0
2 Answers
You can use wget to easily retrieve images from any web page or recursively from a hierarchy of web pages. So you need not restrict yourself to FTP sites, although wget works fine with FTP sites as well.
See the wget manual.
The manual also explains how to restrict a download to only the file types you want (see its section on recursive accept/reject options).
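As a rough illustration, the Python sketch below assembles a wget command for a recursive, image-only crawl. The starting URL, extension list, depth, output folder, and wait time are placeholder assumptions, not values from this answer. The flags used (-r, -l, -np, -nd, -A, -P, -w, --random-wait) are standard wget options, but check the manual for your version before relying on them.

```python
import shlex
from typing import List, Sequence


def build_wget_command(
    url: str,
    extensions: Sequence[str] = ("jpg", "jpeg", "png", "gif"),
    depth: int = 2,
    out_dir: str = "images",
    wait_seconds: int = 1,
) -> List[str]:
    """Return an argv list for a recursive, image-only wget crawl.

    All defaults are illustrative assumptions; adjust them per site.
    """
    return [
        "wget",
        "-r",                        # recurse through linked pages
        "-l", str(depth),            # limit how deep the recursion goes
        "-np",                       # never ascend to the parent directory
        "-nd",                       # save everything into one flat folder
        "-A", ",".join(extensions),  # accept only these file suffixes
        "-P", out_dir,               # folder to save files into
        "-w", str(wait_seconds),     # pause between requests
        "--random-wait",             # vary the pause to go easy on the server
        url,
    ]


if __name__ == "__main__":
    # Hypothetical starting URL, used only to show the resulting command.
    cmd = build_wget_command("https://example.org/gallery/")
    print(shlex.join(cmd))
    # To actually run it (requires wget on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

An ftp:// URL can be passed the same way, since wget also crawls FTP sources. Keeping the arguments as a list rather than one shell string avoids quoting problems if you later hand it to subprocess.run.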
You will probably also pick up a few graphics used for page decoration (logos, icons, buttons and the like), which you will have to screen out manually.
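Some of that screening can be automated with simple heuristics before you look through the rest by hand. The sketch below assumes that decoration graphics tend to be small files or have telltale names; the 20,000-byte threshold and the name hints are hypothetical values you would tune per site. It moves suspects into a "rejected" subfolder instead of deleting them, so you can still review what it caught.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical heuristics, not taken from the original answer: tune per site.
MIN_BYTES = 20_000
DECORATION_HINTS = ("icon", "logo", "button", "spacer", "banner")


def screen_out_decorations(folder: str, min_bytes: int = MIN_BYTES) -> List[Path]:
    """Move likely decoration graphics into a 'rejected' subfolder.

    Returns the moved files at their new locations.
    """
    src = Path(folder)
    rejected = src / "rejected"
    rejected.mkdir(exist_ok=True)
    moved: List[Path] = []
    # sorted() materializes the listing before we start moving files.
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue  # skip 'rejected' and any other subdirectories
        name = path.name.lower()
        too_small = path.stat().st_size < min_bytes
        looks_decorative = any(hint in name for hint in DECORATION_HINTS)
        if too_small or looks_decorative:
            target = rejected / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Heuristics like these will misfire occasionally (a small but legitimate photo, or a real image with "logo" in its name), which is why the rejected files are kept for a final look.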
On a more general note, if you only use the images to train your system and don't publish, redistribute, or create derivative works from them, then you can use pretty well any image on the Internet under a fair-use interpretation of copyright law. Merely viewing an image already creates a copy of it on your hard disk. This note is sure to draw a few dissenting voices, and we will have an interesting debate. Use your own judgment.
Originally by user1368. Source · Licensed CC BY-SA 4.0
Generated from our catalog & community — verify before relying on it.
A practical approach is to use a command-line downloader such as wget. It can fetch image files from regular web pages, crawl through linked page hierarchies recursively, and also works with FTP sources if a site offers them. With filtering, you can target image file types, though you may still pick up some site graphics that need to be removed afterward.
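wget handles this on its own, but if you want to preview which image files a single page links to before starting a crawl (for example, to judge how much page decoration you would pick up), a small standard-library sketch like the following can list them. It is a substitute illustration built on Python's html.parser, not a description of how wget works internally, and the suffix list is an assumption.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# Assumed list of image suffixes; extend it for the formats you need.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff")


class ImageLinkCollector(HTMLParser):
    """Collect absolute image URLs from <img src> and <a href> attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        if tag == "img":
            link = attr_map.get("src")
        elif tag == "a":
            link = attr_map.get("href")
        else:
            return
        if not link:
            return
        absolute = urljoin(self.base_url, link)  # resolve relative links
        # Match on the URL path only, so query strings don't hide the suffix.
        if urlsplit(absolute).path.lower().endswith(IMAGE_SUFFIXES):
            if absolute not in self.urls:  # skip repeats, keep page order
                self.urls.append(absolute)


def image_links(html: str, base_url: str) -> List[str]:
    """Return image URLs referenced by an HTML document, in page order."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

To try it on a live page, you could fetch the HTML with urllib.request.urlopen and decode it before calling image_links. It only sees links present in the raw HTML, not images inserted later by JavaScript.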
The bigger issue is verification: a site claiming images are public domain does not guarantee that every file actually is. Community experience suggests that some “free” collections include misattributed, copied, or trademarked material. So if licensing certainty matters, you should verify the source carefully rather than assuming a bulk collection is safe.
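One way to keep that verification manageable is to record where every file came from as you collect it. The sketch below is a hypothetical helper, not something from either answer: it writes a CSV manifest with each file's SHA-256 hash, source URL, and the license the site claims, plus a "verified" column that starts as "no" and is meant to be filled in by hand. Hashing cannot tell you whether a license claim is true; it only lets you trace each file back to its source and spot byte-identical copies turning up under different sources, which can hint at copied material.

```python
import csv
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(files: Dict[Path, str], manifest_path: Path,
                   claimed_license: str) -> List[List[str]]:
    """Write a provenance manifest for files (local path -> source URL).

    Returns groups of file names whose contents are byte-identical.
    Column names are illustrative assumptions.
    """
    by_hash: Dict[str, List[str]] = {}
    with manifest_path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "sha256", "source_url", "claimed_license", "verified"])
        for path, source_url in sorted(files.items()):
            digest = sha256_of(path)
            by_hash.setdefault(digest, []).append(path.name)
            # "verified" stays "no" until someone checks the license by hand.
            writer.writerow([path.name, digest, source_url, claimed_license, "no"])
    return [names for names in by_hash.values() if len(names) > 1]
```

Exact hashes only catch identical files; a resized or recompressed copy of the same photo will hash differently, so treat an empty duplicate list as "no exact copies found", not as proof of originality.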
If you only need images for private testing, one answer noted that some people rely on a fair-use interpretation, but that is not a reliable substitute for confirmed rights. A safer route is to get permission directly from photographers or use sources with clearly documented licensing.
UniqueBot (AI)
Related Questions
Is there a way to bulk download multiple public domain images from Flickr at original size?
Where can I find Creative Commons 360° panorama photos for a project?
Where can I download sample HDR or exposure-bracketed images to practice tone mapping?
How should Creative Commons photo attribution be handled in a printed book?
Can I use an image from wallpaper sites if I can't find the original source or license?