In this article, a media storm indexing mechanism is presented, where media storms are defined as fast incoming batches of images. We propose an approximate media storm indexing mechanism to index and store massive image collections that arrive at a varying rate. To evaluate the proposed indexing mechanism, two architectures are used: i) a baseline architecture, which utilizes a disk-based processing strategy, and ii) an in-memory architecture, which uses the Flink distributed stream processing framework. This study is the first in the literature to utilize an in-memory processing strategy to provide a media storm indexing mechanism. In the experimental evaluation, conducted on two image datasets that are among the largest publicly available, with 80M and 1B images, a media storm generator is implemented to evaluate the proposed mechanism on different indexing workloads, that is, images that arrive with high volume and different velocity, at the scale of 10^5 and 10^6 incoming images per second. Using the approximate media storm indexing mechanism, a significant speedup factor, equal to 26.32 on average, is achieved compared with conventional indexing techniques, while high search accuracy is maintained after the media storms have been indexed. Finally, the implementations of both architectures and of the media storm indexing mechanisms are made publicly available.
Several frameworks have been proposed for similarity search over descriptions of multimedia content, also known as image descriptors, in distributed databases. However, these strategies are designed so that a collection of descriptors is processed once and indexed in a distributed database, while search queries are executed over the already processed descriptors. When new descriptors need to be inserted into the framework, they are indexed sequentially, as the indexing process is not parallelized.
i) An approximate indexing mechanism is proposed using the MapReduce paradigm to address the high-latency and high-CPU issues;
ii) Two different architectures, a disk-based and an in-memory architecture, are presented in order to evaluate the benefit of in-memory processing in the proposed mechanism, while the performance of the proposed media storm indexing mechanism is examined at different burst incoming rates within several “storming time frames”, that is, the duration that the storm lasts; and
iii) We verify that the media storms have been correctly indexed in the DVC-based data structure by performing search queries, outperforming state-of-the-art methods in terms of indexing, search time and accuracy.
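To make the notions of incoming rate, storming time frame, and MapReduce-style indexing concrete, the following is a minimal, single-process sketch. It is not the proposed mechanism: the function names, the random descriptors, and the bucket key (the indices of the largest descriptor dimensions) are all hypothetical choices made for illustration, and no Flink or distributed execution is involved.

```python
import random
from collections import defaultdict


def media_storm(rate, seconds, dim=8, seed=0):
    """Yield one batch per simulated second of the storming time frame.

    Each batch holds `rate` (image_id, descriptor) pairs. The descriptors
    are random vectors standing in for real image descriptors (assumption).
    """
    rng = random.Random(seed)
    next_id = 0
    for _ in range(seconds):
        batch = []
        for _ in range(rate):
            batch.append((next_id, [rng.random() for _ in range(dim)]))
            next_id += 1
        yield batch


def map_phase(batch, top_k=2):
    """Map step: emit (bucket_key, image_id) for every descriptor.

    The key is the tuple of indices of the top_k largest dimensions,
    a hypothetical approximate key, not the one used in the article.
    """
    for image_id, vec in batch:
        order = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
        yield tuple(order[:top_k]), image_id


def reduce_phase(pairs, index):
    """Reduce step: group image ids that share the same bucket key."""
    for key, image_id in pairs:
        index[key].append(image_id)
    return index


def index_storm(rate, seconds):
    """Index a whole storm, batch by batch, into an in-memory dictionary."""
    index = defaultdict(list)
    for batch in media_storm(rate, seconds):
        reduce_phase(map_phase(batch), index)
    return index


if __name__ == "__main__":
    idx = index_storm(rate=1000, seconds=3)
    print(len(idx), "buckets,", sum(len(v) for v in idx.values()), "images")
```

Because the map step treats each descriptor independently, batches could in principle be mapped in parallel and merged in the reduce step, which is the property that sequential insertion lacks.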