Smart image and video search
Given a sample image, piXserve can search for the appearance of objects in other images. It can find pictures that contain objects with a similar appearance (in this example, other arrows). Interestingly, it can also spot arrows in other pictures even when their appearance varies (color, direction, size, slant, etc.). PiXserve simply understands what an arrow is. In fact, piXserve can identify almost any type of general shape or object in an image.
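PiXlogic's matching algorithms are proprietary, but the kind of appearance-invariant matching described above can be sketched with off-the-shelf tools. The snippet below is a minimal illustration, not piXserve's method: OpenCV's Hu-moment shape comparison is invariant to scale, rotation and translation, so a query arrow can match arrows of different sizes and slants. The file names are placeholders.

```python
# Conceptual sketch of appearance-invariant shape matching with OpenCV.
# Not piXserve's algorithm -- cv2.matchShapes compares contours via Hu
# moments, which are invariant to scale, rotation and translation.
import cv2

def largest_contour(path):
    """Return the largest contour in the image (assumes a dark object
    on a light background, hence the inverted threshold)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, thresh = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def similar_shapes(query_path, candidate_paths, max_distance=0.1):
    """Rank candidate images by shape similarity to the query object."""
    query = largest_contour(query_path)
    hits = []
    for path in candidate_paths:
        d = cv2.matchShapes(query, largest_contour(path),
                            cv2.CONTOURS_MATCH_I1, 0.0)
        if d < max_distance:  # smaller distance means more similar
            hits.append((d, path))
    return sorted(hits)

# e.g. similar_shapes("arrow.png", ["photo1.png", "photo2.png"])
```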
The company has focused on refining the product's ability to recognize text within images and to extract detailed information about human faces. It might seem trivial to recognize text within images, since that's what optical character recognition (OCR) programs have done for the last 10 or 15 years. However, there is a huge difference between recognizing relatively uniform text on a white background and recognizing text in all its varieties and languages within a complex visual image such as a color photograph or a stream of video. PiXlogic has solved many of those complex and difficult problems.
In Figure 5 (Page 9, KMWorld, Volume 16, Issue 6), piXserve takes the text string “Enron,” entered by the user in its search screen (left) in lieu of an image, and then searches a corpus of streaming video content to find frames in the video where the word Enron appears in the field of view. As can be seen from the results set (right), it finds a significant number of instances where the term “Enron” is shown in the video or in other pictures.
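The Figure 5 workflow can be illustrated with a short sketch: sample frames from a video and check each for a query string. This is only the pipeline's shape, under loose assumptions; piXserve handles text "in the wild," whereas the off-the-shelf pytesseract library used here works best on clean text. The file name "news.mp4" is a placeholder.

```python
# Minimal sketch of the Figure 5 workflow: scan video frames for a
# query string. Off-the-shelf OCR stands in for piXserve's far more
# capable scene-text recognition.
import cv2
import pytesseract

def find_text_in_video(video_path, query, every_n_frames=30):
    """Yield (frame_number, timestamp_sec) wherever `query` appears."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS unknown
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_no % every_n_frames == 0:  # sample; don't OCR every frame
            text = pytesseract.image_to_string(frame)
            if query.lower() in text.lower():
                yield frame_no, frame_no / fps
        frame_no += 1
    cap.release()

for frame_no, t in find_text_in_video("news.mp4", "Enron"):
    print(f"'Enron' visible at frame {frame_no} (~{t:.1f}s)")
```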
Figure 6 (Page 9, KMWorld, Volume 16, Issue 6) shows a complex image that piXserve understands in contextual detail. Using its underlying intelligence, it grasps the concepts of the sky, a tree and a human represented by a body, face, head, arms and legs. It also understands the concepts of foreground and background, and can pick out the details that distinguish them. Finally, it can compare those details to grasp relationships and context.
As can be seen in Figure 6, one of the most important capabilities of piXserve is its ability to help organize and categorize image and video information without manual intervention. Because it can synthesize descriptions from an image, piXserve offers comprehensive tagging capabilities that provide extensive detail about an image or video. As piXserve's indexer ingests the images and video it will later search, it automatically synthesizes information about the data it is indexing, including details about the size, shape, position, color and relationships of objects within each image and/or video frame. That information is recorded in piXserve's database management system.
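PiXserve's actual detectors and database schema are not public, but the index-time idea, detect object regions and record their size, position and color in a database, can be sketched as follows. The schema, the threshold-based "detector" and the database file name are all illustrative assumptions.

```python
# Sketch of index-time metadata capture: for each detected object
# region, record size, position and average color. The schema and the
# simple contour "detector" are stand-ins, not piXserve's internals.
import sqlite3
import cv2

conn = sqlite3.connect("image_index.db")
conn.execute("""CREATE TABLE IF NOT EXISTS objects (
                  image_path TEXT, x INT, y INT, w INT, h INT,
                  area REAL, mean_b REAL, mean_g REAL, mean_r REAL)""")

def index_image(path):
    """Detect object regions and record their metadata."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)       # position and size
        region = img[y:y+h, x:x+w]
        b, g, r = region.reshape(-1, 3).mean(axis=0)  # average color
        conn.execute("INSERT INTO objects VALUES (?,?,?,?,?,?,?,?,?)",
                     (path, x, y, w, h, cv2.contourArea(c), b, g, r))
    conn.commit()
```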
Understanding objects
PiXserve's engine matches that metadata against a library of known information about specific "standard" kinds of objects. For example, it understands a picture of a face, a tree, an arrow or the sky, and can consequently classify and contextualize that information. Of course, the system has to understand what an object is or it can't possibly classify it. To facilitate that system-level learning, the piXlogic product has a significant library of standard objects and a set of tools that allow customers to extend the database of known objects.
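One simple way to picture that matching step, purely as an assumption about the general technique rather than piXserve's design, is nearest-neighbor comparison of an object's feature vector against a customer-extensible library of reference vectors. The features and labels below are hypothetical.

```python
# Hypothetical sketch of classifying a detected object against a
# library of "standard" objects via nearest-neighbor matching on
# feature vectors. Features and distances are illustrative stand-ins.
import numpy as np

# Customer-extensible library: label -> reference feature vector
# (e.g. [aspect_ratio, solidity, corner_count]).
known_objects = {
    "arrow": np.array([2.5, 0.55, 7.0]),
    "face":  np.array([0.8, 0.90, 0.0]),
    "tree":  np.array([0.6, 0.40, 0.0]),
}

def classify(features, max_distance=1.0):
    """Return the closest known label, or None if nothing is near enough."""
    best_label, best_dist = None, max_distance
    for label, ref in known_objects.items():
        d = np.linalg.norm(features - ref)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def learn(label, features):
    """Customers extend the library by registering new standard objects."""
    known_objects[label] = np.asarray(features, dtype=float)
```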
PiXlogic has been working to extend the capabilities of its image search engine to provide new and improved business processing capabilities. Most recently, piXlogic has added an event alerting and notification feature that sends out notices when the engine detects a particular object, text string or concept within an image or video. That means customers can set up piXlogic servers to automatically search incoming streams of images and video for specific actions or objects, and then respond to that information automatically. The response might be a simple e-mail notification or a more complex event-processing activity, such as carrying out a database operation or calling the police or fire department.
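The alerting pattern itself is straightforward to sketch: when a watched term or object turns up in an indexed frame, fire a handler. The version below sends a simple e-mail notice; the SMTP host, the addresses and the watchlist contents are assumptions for illustration, not piXserve configuration.

```python
# Sketch of the alerting idea: a detection hook fires a handler (here,
# an e-mail notice) for watchlist hits. Host and addresses are made up.
import smtplib
from email.message import EmailMessage

WATCHLIST = {"Enron", "fire"}  # terms/objects to alert on

def send_alert(term, source):
    msg = EmailMessage()
    msg["Subject"] = f"Alert: '{term}' detected in {source}"
    msg["From"] = "alerts@example.com"
    msg["To"] = "analyst@example.com"
    msg.set_content(f"The watched item '{term}' appeared in {source}.")
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

def process_detection(detected_labels, source):
    """Called for each indexed frame; fires alerts for watchlist hits."""
    for term in WATCHLIST & set(detected_labels):
        send_alert(term, source)
```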
As we've seen, multimedia search of images and video is technologically challenging. Fortunately, extensive research, modeling and simulation into how the human brain and its various components actually work is helping us improve our computer systems, making them more intelligent and contextually aware. Though I hate adding yet another definition to Web 2.0, it does seem plausible to me that breakthroughs in image and video search, which provide enhancements for automatic context and meaning, are driving our search solutions toward a new singularity for Web 2.0: intelligent computer systems that understand what we want and deliver what we need when we ask for it. Now I have to go ... my son is interested in kite surfing (whatever that is), and I have to search YouTube to find a video explaining it. Surf's up ... I think.