Earlier this year the Internet Archive began extracting over 14 million images from its public domain ebooks and uploading them to its Flickr account. This means the historic images are now easy to search and download, something that previously meant downloading each ebook, finding the images yourself and then exporting them.
The ebooks were already easy to search thanks to the Optical Character Recognition (OCR) software used when they were added to the archive, but that only made the text searchable, not the images. Now, with the help of Flickr, those looking for historic text and images can get the best of both worlds.
“The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book,” said the Internet Archive’s Communications Technology Scholar Kalev Leetaru when speaking with the BBC.
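For the curious, the idea Leetaru describes can be sketched in a few lines of Python. This is only an illustration and not the Internet Archive's actual software: the `Block` type, its `kind` labels and the `image_context` function are hypothetical names, and the sketch assumes a book has already been split into an ordered list of paragraphs and images.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One piece of a book page (hypothetical structure for illustration)."""
    kind: str       # assumed labels: "paragraph" or "image"
    text: str = ""  # paragraph text, or the caption when kind == "image"


def image_context(blocks):
    """For each image, collect its caption plus the nearest paragraph
    before it and the nearest paragraph after it."""
    records = []
    for i, block in enumerate(blocks):
        if block.kind != "image":
            continue
        # Walk backwards from the image to the closest preceding paragraph.
        before = next(
            (b.text for b in reversed(blocks[:i]) if b.kind == "paragraph"), ""
        )
        # Walk forwards from the image to the closest following paragraph.
        after = next(
            (b.text for b in blocks[i + 1:] if b.kind == "paragraph"), ""
        )
        records.append(
            {"caption": block.text, "before": before, "after": after}
        )
    return records
```

Each record could then be attached to the uploaded image as searchable text. If an image has no paragraph on one side, for example at the very start of a book, that field is simply left empty.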
The software isn’t perfect, so some of the tags on the images will be imprecise. Even so, such a vast library of easily searchable content is great for learning, and the team are hoping that libraries around the world will one day follow suit and digitize their own books and images.
Thank you Ars Technica for providing us with this information.
Image courtesy of Ars Technica.