Is a picture really worth 1000 words? Taking pictures has gotten easier than ever, knocking the word/picture exchange rate way out of balance. But Ross Goodwin, an NYU graduate student, recently made it a lot easier to get your picture’s worth, by automatically generating long, wordy, and sometimes eerily strange descriptions of your photos.
Goodwin created word.camera, an app that creates “lexographs,” or text documents generated from photographs. It takes images, either directly from your camera or from uploaded files, and uses Clarifai, an image recognition tool, to extract tags. It then feeds those tags into ConceptNet, a semantic network that links words to their meanings and to real-world knowledge. Finally, it runs the result through sentence templates that weave the information together, and returns a description that might be given by a Martian who’s read an encyclopedia and is looking at the picture through a foggy telescope. The result can be fun, and often weirdly poetic.
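To make that pipeline concrete, here is a minimal sketch of the three stages (tags, related concepts, sentence templates). It is not Goodwin’s code and calls no real service: `extract_tags` is a hypothetical stand-in for an image tagger like Clarifai, `TOY_RELATIONS` is a tiny hand-written table standing in for ConceptNet lookups, and the templates and transition words are invented for illustration.

```python
import random

# Hypothetical stand-in for an image tagger such as Clarifai.
# A real tagger would analyze the image; this one returns fixed tags.
def extract_tags(image_path):
    return ["singer", "stringed instrument", "european"]

# Toy stand-in for ConceptNet: each tag maps to (relation, related phrase) pairs.
# The entries are hand-written for this sketch, not fetched from ConceptNet.
TOY_RELATIONS = {
    "singer": [("IsA", "a person who sings"), ("RelatedTo", "crooner")],
    "stringed instrument": [("UsedFor", "jam session")],
    "european": [("IsA", "a native or inhabitant of europe")],
}

# One sentence template per relation type (assumed wording).
TEMPLATES = {
    "IsA": "the {subject} is {object}",
    "UsedFor": "the {subject} is for {object}",
    "RelatedTo": "the {subject} evokes {object}",
}

# Transition words chosen at random to open each sentence.
TRANSITIONS = ["Accordingly", "Now", "Nonetheless", "To sum up", "Also"]

def describe(tags, rng):
    """Turn tags into sentences by filling a template for each known relation."""
    sentences = []
    for tag in tags:
        for relation, obj in TOY_RELATIONS.get(tag, []):
            clause = TEMPLATES[relation].format(subject=tag, object=obj)
            sentences.append(f"{rng.choice(TRANSITIONS)}, {clause}.")
    return " ".join(sentences)

def lexograph(image_path, seed=0):
    """Run the whole toy pipeline: image -> tags -> relations -> text."""
    rng = random.Random(seed)  # seeded so repeated runs give the same text
    return describe(extract_tags(image_path), rng)

if __name__ == "__main__":
    print(lexograph("photo.jpg"))
```

Because the transition words are picked at random and have nothing to do with the clauses they introduce, even this toy version produces the slightly unmoored, list-like voice of the real thing.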
For example, here are the first few paragraphs from a lexograph of this photo:
Accordingly, a singer, a stringed instrument, and a european. Now, the singer is made from a person who singing. The stringed instrument is for jam session, and the european is a native or inhabitant of europe. For this purpose, the singer yearns for smoking marijuiana. There, it is a person who singing. To put it another way, it evokes crooner. In conclusion, it evokes performer. To sum up, a ballet dancer and a group: the ballet dancer belongs to a company that producing ballets, and the group evokes accession. Never, the group may include individual. Nonetheless, it is a set with composition operation. Once, it is a collecting in one place. Also, it is a set that being closed, associative, has an identity element and every element has an inverse. Immediately, the ballet dancer is known to some as ballerina. However, it may perfect his posture. Though, it is also known as a ballerina. Doubtedly, it is a one who dancing ballet. Or a skeleton, which is for teach anatomy to student. To repeat, it is an organisation. Surely, it evokes bone. For example, it evokes skeleton in closet. Nonetheless, it evokes endoskeleton. Nevertheless, it evokes secretting. To this end, a dalmatian, a musical performance, and a theater. By all means the dalmatian evokes romance. The musical performance may happen in auditorium, and the theater appertains to a building where theatrical performances or motion-picture shows can be presented. And yet, the dalmatian is a video game. Nearly, it is known to some as далматинец. Hence, it is a video game. Nonetheless, it is known to some as dalmát.
Yes, there is a performance happening, and there is crooning and dancing. Are they Europeans? Yes. Are they Dalmatians? No, they are supposed to be cats, but yes, they do sort of look like Dalmatians. Do they yearn for marijuana? Do they have skeletons in their closets? Well, they just might. The lexograph may know more than we do.
When I ran a bland picture of a parachute harness, the lexograph circled, semi-sensibly, around the ideas of harnesses, buckles, leg holes, and sports, but it also went on a strange tangent about animals only dying once and children buttering bread, making the photo weirdly interesting. When someone else ran a picture of a young Vladimir Putin, the lexograph led with “Meanwhile, a history, a group, and an outfit. Undoubtedly, the history may repeat itself” before veering off to watercraft and war.
If you’re interested in artificial intelligence, playing with word.camera may give you interesting insights into the limits and strengths of image recognition and semantic networks. If you’re not, it may still give you serendipitous insights into the strange meanings that lurk in your photos and the world around you.