Leading technology in multimedia queries

As the price of storage media falls, more and more companies are keeping their audio and video material, such as the contents of audio tapes and videotapes, as stored files. Once the material is stored this way, finding a specific piece of content later is easier and faster than hunting for the right file.

Text search engine technology has been around for a decade, and image processing technology for a few years. But how do you query images and text that change 30 times per second? New search tools solve this problem. The key is to divide each audio or video file into many small segments and assign a pointer to each segment, which makes the content easy to search.
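
The segment-and-pointer idea can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the fixed 10-second segment length, the file name, the keywords, and every function name are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One short slice of a media file; (media_id, start_s) acts as its pointer."""
    media_id: str
    start_s: float
    end_s: float
    keywords: set = field(default_factory=set)


def split_into_segments(media_id, duration_s, segment_s=10.0):
    """Cut a recording of `duration_s` seconds into consecutive segments."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append(Segment(media_id, start, end))
        start = end
    return segments


def build_index(segments):
    """Map each keyword to the positions of the segments that carry it."""
    index = {}
    for position, segment in enumerate(segments):
        for word in segment.keywords:
            index.setdefault(word.lower(), []).append(position)
    return index


# Hypothetical 35-second clip; the keywords stand in for automatically extracted tags.
segments = split_into_segments("game_042.mpg", duration_s=35.0)
segments[1].keywords.update({"goal", "replay"})
segments[3].keywords.add("goal")

index = build_index(segments)
hits = [(segments[i].start_s, segments[i].end_s) for i in index["goal"]]
```

A query for "goal" then resolves directly to the time ranges of the matching segments, so a player can jump to those offsets instead of scanning the whole file.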

So how do you build an index of hundreds of hours of video? In the past, this might have required the participation of every employee in the company, and it could only be done by hand. Fortunately, vendors now provide technology that indexes audio and video information automatically, which reduces staff workload by at least half. (Their research on multimedia resource management systems has, however, only just begun.)

The automatic indexing software scans audio and video streams and generates searchable tags for the stream content (these can also take the form of storyboards or thumbnail images). Most software does this by detecting as much metadata as possible, that is, information that describes the actual data. For audio and video, metadata includes time code, transcripts, closed captions, and keyframes. Keyframes are produced as follows: the software scans the video signal for changes caused by cuts, fades, wipes, and other transitions, and then takes a “snapshot” of the new scene as the keyframe of that video segment.
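
The keyframe step described above can be approximated with a brightness-histogram comparison between consecutive frames. The sketch below is only an illustration under stated assumptions: frames are flat lists of 8-bit grayscale pixel values, the bin count and threshold are arbitrary, and the approach catches abrupt cuts only. Gradual fades and wipes would need a comparison over a longer window. None of the function names come from a real product.

```python
def histogram(frame, bins=8, max_value=256):
    """Count how many pixels of a grayscale frame fall into each brightness bin."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def histogram_distance(a, b):
    """Half the L1 distance between two histograms, scaled to the range 0.0-1.0."""
    total = sum(a)
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * total)


def find_keyframes(frames, threshold=0.5):
    """Return indices of frames that start a new shot (frame 0 always counts)."""
    if not frames:
        return []
    keyframes = [0]
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        if histogram_distance(previous, current) > threshold:
            keyframes.append(i)  # snapshot of the new scene
        previous = current
    return keyframes


# Toy "video": three dark frames, two bright frames, one dark frame.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright, dark]
keyframes = find_keyframes(frames)
```

Here the detector marks frames 0, 3, and 5, the first frame of each shot in the toy sequence.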

Many foreign companies provide automatic audio and video indexing software of this kind. These products use content-based processing technology, which analyzes the audio stream not only to distinguish speech, music, and cheering, but also to identify significant changes in volume. Virage's multimedia cataloguing and analysis software (such as Video Cataloguer) uses advanced multimedia analysis algorithms to extract metadata from video. IBM's Query By Image Content (QBIC) technology recognizes changes in color and texture. All of these products rely on back-end database systems, including Oracle and Informix; IBM also produces end-to-end automatic indexing software built on its own DB2 database. Each company's related products together form its multimedia management software suite. Each product costs up to 25,000 US dollars.
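
The "significant changes in volume" part of that analysis can be illustrated with a short sketch. It assumes the audio is already decoded into a list of samples between -1.0 and 1.0, and it flags points where the root-mean-square level of one window is at least twice or at most half that of the previous window. The window size and ratio are arbitrary choices for the example, and classifying speech, music, or cheering is outside its scope.

```python
import math


def rms(window):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in window) / len(window))


def volume_changes(samples, window=4, ratio=2.0):
    """Return sample offsets where loudness rises or falls by at least `ratio`."""
    changes = []
    previous = None
    for start in range(0, len(samples) - window + 1, window):
        level = rms(samples[start:start + window])
        if previous is not None and max(level, previous) > 0:
            louder = level >= ratio * previous
            quieter = previous >= ratio * level
            if louder or quieter:
                changes.append(start)
        previous = level
    return changes


# Toy signal: two quiet windows, two loud windows, one quiet window.
quiet = [0.1, -0.1, 0.1, -0.1]
loud = [0.8, -0.8, 0.8, -0.8]
samples = quiet + quiet + loud + loud + quiet
offsets = volume_changes(samples)
```

The offsets returned (8 and 16 in the toy signal) mark where the crowd noise, so to speak, starts and stops, and could be stored as time-coded metadata alongside the keyframes.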

Existing Technology

US SportsLine uses Magnifi's Enterprise Server to automatically index sports videos and the company's other content on the CBS SportsLine website. Sports fans only need to search by keyword to watch short game videos. Depending on how each company configures it, Magnifi's software indexes new content posted on the site every hour and re-indexes the entire site every two weeks. Because the website contains a very large amount of data and the content changes quickly, indexing must be fast enough to cover all of the content within the allotted time.

Magnifi's software has made SportsLine a lot of money. The reason is simple: easier searching means that more people visit the site and browse more pages. As a result, advertisers are willing to pay higher rates, and the site attracts more advertisers.

Searching video by metadata finds the relevant clips on the company's website, but it also returns unrelated video. For example, if a user who wants skiing videos enters a keyword such as “mountain”, the software returns mountain scenery videos along with the ski videos. The same applies to audio retrieval, where the problem is even more serious because the company has its own radio station and a large amount of audio data.
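
The false-match problem is easy to reproduce with a toy tag search. In the sketch below the clip names and tags are invented for illustration; a single broad keyword matches every clip that happens to carry it, while requiring all of several terms narrows the result.

```python
# Hypothetical clips and their metadata tags.
clips = {
    "ski_race.mpg": {"ski", "mountain", "snow", "race"},
    "alpine_scenery.mpg": {"mountain", "scenery", "sunrise"},
    "tennis_final.mpg": {"tennis", "court", "final"},
}


def search(clips, *terms):
    """Return the names of clips whose tags contain every search term."""
    wanted = {term.lower() for term in terms}
    return sorted(name for name, tags in clips.items() if wanted <= tags)


broad = search(clips, "mountain")          # scenery comes back with the ski race
narrow = search(clips, "mountain", "ski")  # only the ski race remains
```

Combining terms is only one possible remedy; richer metadata per clip helps for the same reason.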

PBS Video sells educational recordings drawn from American history archives to schools. Its biggest problem is helping customers find what they need in 300 hours of video. The short-term solution was a hand-made, and very inaccurate, index on paper. Teachers used this loose-leaf index to find the right videotape and the order in which to play the material. However, producing and updating the loose-leaf index costs a great deal of money, and customers are not very satisfied with it.

To make up for this deficiency, PBS Video deployed Excalibur's Screening Room system. The system (currently in beta) automatically connects users to the PBS website. It allows PBS to provide a more powerful, up-to-date index covering more information, and it adds synchronized online transcripts, chapter directories, and schemas to the index so that teachers can find the exact location of a video segment.

Automatic audio and video retrieval software can also help users save money. Moving analog mass media into the digital world often yields a negative economic return: even after investing in digitizing and managing the material, you may not achieve the goals of lower costs, higher quality, shorter production cycles, and wider distribution. Digitizing audio and video, however, is a different matter. Analog audio and video can only be recorded and played on tape recorders and VCRs, whereas digital audio and video can be recorded and played on computers, which means it can be distributed over the Internet. Moreover, as the variety of information on the Internet grows, audio and video retrieval will become more and more important. And as the technology for extracting metadata continues to advance, searching digital audio and video will become easier.

This does not mean that organizations should give up on applying audio and video retrieval to analog mass media. Companies generally achieve a return on investment by creating digital indexes for large offline archives (such as collections of analog video). Indexing a file costs relatively little, especially compared with digitizing or modifying the file itself. Although the content of the archive remains offline, the digital index makes the material in it easy to locate.

For frequently used material, an audio and video index should be built as soon as possible. Ready-made tools exist for creating a card catalog of the company's audio and video data and for connecting users to it through the databases and search engines that employees are already accustomed to. If you invest 50,000 US dollars in building a company information index this way, the audio and video data in the system is very likely to be regarded as attractive content, which vindicates your original choice. In addition, because most products on the market today are built from standard components and are compatible with most query engines and databases, your investment in audio and video retrieval is relatively secure.

As server processing power and network bandwidth increase, video will gradually move toward DVD and on-demand video systems. Once you have a proper index, everything becomes searchable regardless of the medium or format.

The latest technology

Through automatic retrieval, we can query the audio files stored on the company's disk drives. Using speech recognition technology, search software can convert audio files into text. For example, if you want to find a recording of a speech your boss gave last year, you only need to enter "first quarter sales forecast for 1997" as the keyword query, because the speech has already been converted to text.
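
Once speech has been converted to text, finding a spoken phrase becomes an ordinary text search that also returns a playback position. The sketch below assumes a hypothetical recognizer output in the form of (start time in seconds, word) pairs. The transcript content and the function name are invented for illustration and do not reflect any particular product's format.

```python
def find_phrase(transcript, phrase):
    """Return the start times (seconds) at which `phrase` occurs in a timed transcript."""
    words = [word.lower() for _, word in transcript]
    target = phrase.lower().split()
    if not target:
        return []
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][0])
    return hits


# Hypothetical recognizer output for the opening of a speech.
transcript = [
    (0.0, "our"), (0.4, "first"), (0.8, "quarter"), (1.3, "sales"),
    (1.7, "forecast"), (2.2, "for"), (2.5, "1997"), (3.1, "is"), (3.3, "strong"),
]
hits = find_phrase(transcript, "first quarter sales forecast for 1997")
```

The returned offset lets a player start the recording at the moment the phrase is spoken rather than at the beginning of the file.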

This new technology will be very useful for retrieving video files as well. A recording of the CEO's shareholder meeting, for example, has no closed captions and contains no product introductions at all, but it does have a soundtrack. Current speech recognition technology is reasonably accurate only on a single, consistent type of speech data; its accuracy on other formats and on varied recording material is poor.

In practice, however, searching does not require 100% accuracy. In a two-minute video recording, recognizing as few as six or seven words, such as “Clinton”, “Lewinsky”, “confession”, and “crime”, is enough to characterize the video. Many vendors, including Virage Inc., now list real-time speech recognition as a key research area.
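
Why partial recognition is enough can be shown with a small ranking sketch: each clip is scored by how many query words the recognizer managed to spot, and clips below a minimum number of hits are dropped. The clip names, the recognized word lists, and the threshold of two hits are assumptions for the example.

```python
def rank_by_spotted_words(recognized, query, min_hits=2):
    """Rank clips by how many query words appear among their recognized words."""
    query_words = {word.lower() for word in query}
    scored = []
    for clip, words in recognized.items():
        hits = len(query_words & {word.lower() for word in words})
        if hits >= min_hits:
            scored.append((hits, clip))
    # Most hits first; ties broken alphabetically by clip name.
    return [clip for hits, clip in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


# Hypothetical, incomplete recognizer output for three clips.
recognized = {
    "news_a.mpg": ["Clinton", "tonight", "confession", "crime"],
    "news_b.mpg": ["weather", "Clinton", "rain"],
    "sports_c.mpg": ["goal", "crime"],
}
query = ["Clinton", "Lewinsky", "confession", "crime"]

ranked = rank_by_spotted_words(recognized, query)
loose = rank_by_spotted_words(recognized, query, min_hits=1)
```

Even though one query word was never recognized in any clip, the clip that matches three of the four words still ranks first.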

The biggest advantage of video retrieval is the ability to accurately pick out footage of many different objects. Although basic shape recognition algorithms have existed for a long time, useful object recognition technology has appeared only in the last few years, less than a decade ago. The standard way to do object recognition is to create a model for each object. This is a huge undertaking, because the number of objects is almost infinite and each object has multiple attributes.

There is also pattern recognition technology, which is more advanced than object recognition. Chroma Graphics has invented an image recognition technology that can identify an object from a sample, without programming for each object. For example, if you want to find video of apples, you only need to give the program some images of apples. Just as a child knows that a real fruit is the same thing as the fruit in a picture book, the program can find video of apples from the samples provided. However, the technology has not yet been put into practical use.