Headline: Metadata Inserter/Extractor into Compressed Video

In digital environments, metadata can be hidden effectively within the multimedia content itself for general-purpose metadata management. One of the most important applications of metadata in the broadcast industry is audio and video archiving. Broadcasters' archives contain thousands or even billions of valuable content items, so metadata plays an essential role in searching for and finding the required archived materials.

In the archive automation workflow, besides the original video content, high-resolution (Hi-Res) and low-resolution (Low-Res) versions of the original content are generally generated and stored using the H.264/AVC codec. Users can then access the high- or low-resolution content according to their network access levels.

For simple access to the desired archived content, the search engines of the automation system are usually implemented as web applications. Content security and metadata confidentiality are therefore key challenges for these systems.

Applying data hiding in compressed video is an agile and reliable solution that can guarantee content security and metadata confidentiality.

The main goal of the project is to insert metadata into compressed video without significantly increasing its bit rate or degrading its quality.



Metadata Insertion into Compressed Video

The system is developed to insert metadata into archived compressed video and is used in the automation system as two API modules: a metadata inserter and a metadata extractor.

The video files are archived in compressed form, so it is necessary to insert metadata in the compressed domain to avoid any decompression process. The proposed metadata insertion system has two important advantages over others: quality transparency, and minimal bit-rate or capacity change after metadata insertion into the compressed video.

Generally, there are two methods for inserting metadata into compressed video:

      (I) metadata insertion during video compression process. 

      (II) direct metadata insertion into compressed video without full decompression.

In the first approach, if the video is already in compressed form, it must be decompressed and, after metadata insertion, re-compressed. This means that metadata insertion requires video decoding and encoding, and quality loss is inevitable. This undesired outcome limits how many times metadata can be inserted into the same compressed video. In the second approach, there are no compression/decompression artefacts. One of the unique advantages of the proposed approach is therefore its capability for fast, repeated metadata insertion and extraction with minimal video quality degradation.







Bit-rate Increase, Transparency and Metadata Confidentiality

Adding any foreign data to a video can damage its statistical properties, making it less efficient to compress. As a consequence, the compressed bit rate increases and the quality deteriorates. The art of data hiding in video is to insert the data in such a way that the bit rate and quality of the compressed video are not significantly altered by the hidden data. To this end, we have developed a novel method for H.264/AVC-coded video in which metadata are hidden in the level of the last non-zero (LNZ) coefficient of each quantized DCT block in scanning order. The unique advantage of this method is that, since the data is hidden at the position of the last non-zero high-frequency DCT coefficient, it does not contribute to the video distortion. Moreover, since the data is hidden at the highest possible frequency, its visual impact is minimal. Furthermore, altering the last non-zero coefficient hardly affects the run-length bits, so the bit-rate increase is minimal. Experimental results on several test video sequences show that the proposed approach achieves blind extraction with real-time performance, delivers very high capacity and low distortion, and keeps the bit-rate increase at a negligible level.
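As a minimal illustration of the LNZ idea, the sketch below hides one bit per 4x4 block by adjusting the parity of the last non-zero quantized level in zig-zag scan order. The parity rule and the example block are illustrative assumptions, not the exact scheme used in the product.

```python
def embed_bit(zigzag_levels, bit):
    """Hide one metadata bit in the last non-zero (LNZ) level of a block."""
    levels = list(zigzag_levels)
    # Index of the last non-zero coefficient in zig-zag scan order.
    lnz = max((i for i, v in enumerate(levels) if v != 0), default=None)
    if lnz is None:
        return levels  # all-zero block: nothing to embed here
    v = levels[lnz]
    if abs(v) % 2 != bit:        # parity mismatch: nudge the level
        v += 1 if v > 0 else -1  # move away from zero so the LNZ survives
    levels[lnz] = v
    return levels


def extract_bit(zigzag_levels):
    """Recover the hidden bit from the parity of the LNZ level."""
    lnz = max((i for i, v in enumerate(zigzag_levels) if v != 0), default=None)
    return None if lnz is None else abs(zigzag_levels[lnz]) % 2


# A 4x4 block of quantized levels, already in zig-zag scan order.
block = [12, -3, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
stego = embed_bit(block, 1)
```

Because only the magnitude of the final level changes (and never to zero), the run-length structure of the block is preserved, which is why the bit-rate impact stays small.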

As for data confidentiality, although for video this is not as critical as the bit-rate increase and quality degradation, the metadata is first encrypted with a simple key and then inserted into the LNZ of the H.264/AVC coefficients.

This novel method is then adapted for data insertion into the compressed video stream. To do this, the compressed video is partially decoded and the encrypted metadata is inserted in the LNZ. The performance of this method has been tested on various types of compressed I, P and B pictures, showing that while data can be inserted several times, the multiple inserted payloads do not interfere with each other. For simple real-time operation, the data insertion and extraction parts are designed as two separate modules, each with the following characteristics.


Metadata Insertion Module

The metadata inserter is a small part of the decoder: after entropy decoding of the run-length DCT coefficients, the encrypted metadata is inserted into the LNZ of each 4x4 coefficient block whose high-frequency coefficient exceeds a certain threshold. If the entropy coding is CABAC, some transition probabilities of the H.264/AVC encoder need to be added to the inserter module, but for Huffman-style entropy coding the decoder itself has the required tables. In both types of entropy coding, since the compressed data is only partially decoded, the video quality is not degraded and, moreover, the system is extremely fast: a large volume of metadata can easily be inserted into hundreds of frames in a fraction of a minute.

Metadata Extraction Module


The inserted metadata bits are extracted during H.264/AVC decoding, where the quantized DCT levels of each macroblock are entropy decoded. Then, according to the metadata insertion algorithm, the inserted bits in each macroblock are identified and extracted to reconstruct the inserted metadata.


To obtain the raw metadata stream, the extracted encrypted metadata is decrypted with the encryption key. The data extractor is even faster than the data inserter, since it does not apply any entropy encoder. Both modules are extremely fast: a second of video comprises 25 frames, while metadata can be inserted into or extracted from these frames in a fraction of a video frame's duration.





The new analytical football tools provide instantaneous statistical and analytical information about current events, allowing audiences to make their own analysis and enjoy events more than before. These technologies are based on image processing, machine vision and computer graphics systems.

Pixball is a complete solution for adding realistic-looking virtual 3D overlays to the final TV programme stream of a football match: overlays that can visualize team tactics, player positioning, critical referee decisions, and important moments of the match such as goals and free kicks.

In addition to 3D overlays, Pixball is supported by a rich database of statistics for players and teams to generate CG for live broadcast.

Pixball is based on two powerful engines:

1) A machine vision engine that performs image-based calibration tracking to extract the broadcast camera parameters (pan, tilt, zoom and position), tracks players and extracts the chroma key of the field;

2) A rendering engine that provides augmented reality for 3D overlays, and generates and renders graphics and virtual advertisements with a tied-to-field view.
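The tied-to-field idea can be sketched in a few lines: once the machine vision engine has estimated a mapping from pitch coordinates (metres) to image pixels, any graphic anchored on the field can be projected into the broadcast frame. The homography matrix below is an invented example; in the real system it would come from the camera calibration tracking.

```python
def project(H, x, y):
    """Map a pitch point (x, y) through homography H to pixel coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


# Illustrative homography (in practice estimated per frame from pan/tilt/zoom).
H = [[20.0,   0.0, 960.0],
     [ 0.0, -12.0, 900.0],
     [ 0.0,   0.0,   1.0]]

# Project both ends of a virtual offside line drawn across a 68 m wide pitch.
p1 = project(H, 30.0, 0.0)
p2 = project(H, 30.0, 68.0)
```

Drawing the line between the two projected endpoints (and chroma-keying it under the players) yields an overlay that appears painted on the grass.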

Pixball offers the following services:

  • Generating attractive videos for television broadcasting by overlaying analytical graphics
  • Providing content for second screen services based on OTT and IPTV platforms
  • Providing content for sport programs
  • Creating income by adding virtual advertisement on the football pitch

Generating Analytical Graphics

The broadcast television programme stream is fed into the system, and a graphic-inserted video stream (e.g. offside line, player marking, etc.) is generated. The user can easily access the sequence of video frames and add the desired graphical information at any point on the pitch. The graphic-inserted video can be previewed and edited. Some of the system features are as follows:

  • Magnifying capability
  • Players list
  • Virtual scoreboard
  • Throw-in arrow
  • Graphical data view
  • Offside line
  • Defense and attack arrangement
  • Player marking
  • Distance to the goal
  • Hatching an area
  • Beam on player
  • Free-kick circle
  • Intermediate frame interpolation during camera viewing angle changes
  • Lines and arrows
  • Player indicator
  • Player track




Creating Virtual Advertisements

Virtual advertisements can be inserted on the football pitch, looking just like their real counterparts. The user can place a 3D model of any desired advertisement at any point on the field.





Intelligent Analytical News Dashboard


For any broadcaster, being the first to broadcast breaking news, monitoring national and international news 24/7, and covering news that is trending and sensitive to its public is of utmost importance. Considering that with today's infrastructure thousands upon thousands of articles are posted on the web every 24 hours, the volume has become too large for manual, human processing. Today's advances in artificial intelligence give the media industry the opportunity to analyze big data relating to its assets, in this case its news content as well as its audiences. IRIB's intelligent analytical news dashboard, funded by IRIB, is being developed with leaders in data analytics across academia and industry to intelligently analyze news within social media (including websites, social networks and social messaging apps) using state-of-the-art AI techniques. The dashboard is designed for journalists, scriptwriters and others involved in news programmes, so that they are informed instantaneously of up-to-date categorized, ranked, summarized and analyzed news articles, and are thereby encouraged to produce creative content important to their audiences.

Micro-service architecture

The Intelligent Analytical News Dashboard consists of six main modules and is presented through a responsive, user-friendly UI/UX: 1) gathering of news articles from 400 news websites and hundreds of social media channels; 2) storage (SQL & NoSQL); 3) pre-processing to unify Farsi text (normalization, tokenization, stemming, segmentation, …); 4) processing of information; 5) visualization tools; and 6) access to the defined services.

The dashboard will be available to its users through a private cloud and will be designed in a scalable manner. As the system entails text mining modules that can be used and extended within other AI projects, the system is based on a micro-service architecture.

The main services of the dashboard, i.e. the micro-services of the system (processing modules), include intelligent identification of the origin of news, breaking news, news stories and trending news, as well as categorization of news articles, recommendations, extractive news summarization, copy detection, keyword extraction, named entity recognition (NER: highlighting and tagging jobs, locations, organizations and the like within articles), sentiment analysis, and search and reporting facilities.

Farsi Text Mining

The complexity of the dashboard lies in the analysis and mining of Farsi textual information. Although academic research has been published in this field over the past decade, due to the complexities of the Persian language and the limited available datasets, accuracy results for the different modules are a work in progress and hence not available in commercial products. The Intelligent Analytical News Dashboard attempts to overcome these challenges using the latest natural language processing and deep learning techniques.

The Persian language requires complex pre-processing to provide a unified format supporting all its different writing styles, which include using or eliminating spaces within or between words, words with different spellings, transliterations, Unicode ambiguities and so forth.
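A toy illustration of this normalization step: the sketch below maps a few common Arabic letter variants to their Persian Unicode forms and collapses whitespace. The character map is a small illustrative subset; the real pre-processing chain (tokenization, stemming, ZWNJ handling, segmentation) is far richer.

```python
import re

# A few common Arabic-vs-Persian codepoint ambiguities (illustrative subset).
CHAR_MAP = {
    "\u064A": "\u06CC",  # Arabic Yeh      -> Persian Yeh
    "\u0643": "\u06A9",  # Arabic Kaf      -> Persian Keheh
    "\u0629": "\u0647",  # Teh Marbuta     -> Heh
}


def normalize(text):
    """Unify letter variants, then collapse runs of whitespace."""
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()
```

Running every incoming article through such a normalizer before tokenization ensures that the same word spelled with different codepoints is indexed and counted as one token.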

Named entity recognition (NER) is used in many NLP applications; it classifies and tags named entities present in a text into pre-defined categories including "people", "places", "organizations", "professions", "time", "date", "currency" and "events". NER can thus reveal which major people, organizations, events and places are discussed within the input sources (news articles). Knowing the relevant tags for each article aids in automatically categorizing the articles and enables content discovery.
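As a toy sketch of the tagging idea, the snippet below uses a tiny hand-made gazetteer to find known entity names and emit (name, category, position) tags. Production NER uses trained sequence models, not lookup tables, and the gazetteer entries here are invented examples.

```python
# Invented lookup table standing in for a trained NER model.
GAZETTEER = {
    "IRIB": "organization",
    "Tehran": "place",
    "journalist": "profession",
}


def tag_entities(text):
    """Return (name, category, offset) tags for known entities, in text order."""
    tags = []
    for name, category in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            tags.append((name, category, start))
    return sorted(tags, key=lambda t: t[2])


tags = tag_entities("An IRIB journalist reported from Tehran.")
```

The emitted tags are exactly what downstream categorization and content discovery consume: a list of entities with their types and positions in the article.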

Topic modeling can be used to automatically classify news articles into trained topics. Furthermore, each news article can be classified into more than one topic; e.g. an article can relate both to sports and to the economy.

Keywords or key-phrases are important components of news articles, as they provide a compact representation of the article's content and are commonly used for search engine optimization. Keyword extraction involves the automatic identification of a sequence of one or more words that best describe the subject of a document.
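One standard way to realize this is TF-IDF scoring, sketched below over an invented toy corpus: words that are frequent in one article but rare across the corpus score highest. This is a simplified stand-in, not necessarily the method the dashboard uses.

```python
import math
from collections import Counter


def tfidf_keywords(doc, corpus, k=3):
    """Return the k highest TF-IDF-scored words of `doc` against `corpus`."""
    tf = Counter(doc.lower().split())
    n = len(corpus)

    def idf(word):
        df = sum(1 for d in corpus if word in d.lower().split())
        return math.log((1 + n) / (1 + df)) + 1  # smoothed IDF

    scored = {w: tf[w] * idf(w) for w in tf}
    return [w for w, _ in sorted(scored.items(), key=lambda x: -x[1])[:k]]


corpus = [
    "the match ended in a draw",
    "the bank raised interest rates",
    "the team won the league title",
]
top = tfidf_keywords("title title match rates", corpus, k=2)
```

A word such as "the", which appears in every corpus document, gets a near-minimal IDF and is pushed below genuinely distinctive terms.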

Automatic extractive text summarization is the process of condensing textual information while preserving the important concepts; it may be based on a combination of statistical, semantic and heuristic methodologies.
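The statistical side of such a summarizer can be sketched in a few lines: score each sentence by the document-wide frequency of its words, then keep the top-scoring sentences in their original order. This frequency heuristic is an illustrative simplification of the combined approach described above.

```python
from collections import Counter


def summarize(text, n=1):
    """Extract the n highest-scoring sentences, preserving original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Word frequencies over the whole text, with punctuation stripped.
    freq = Counter(w.strip(".,").lower() for w in text.split())

    def score(sentence):
        words = sentence.lower().split()
        return sum(freq[w] for w in words) / len(words)

    chosen = set(sorted(sentences, key=score, reverse=True)[:n])
    return ". ".join(s for s in sentences if s in chosen) + "."


text = ("The team won the match. The coach praised the team. "
        "Rain delayed play.")
```

Normalizing each sentence score by its length prevents long sentences from winning purely by having more words.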

Recommendation algorithms can be used to identify and suggest content that is most likely to interest a particular user.

Sentiment analysis of social media comments can be used to monitor user insights on topics of interest (based on NER: people, events, organizations and so forth); it entails defining and categorizing opinions in a given piece of text as positive, negative, or neutral.

A search engine indexes and searches the incoming data based on specified keywords and returns a list of matching news articles or social media content to the user.
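At its core, such a search service is an inverted index, sketched below over invented example articles: each word maps to the set of documents containing it, and a multi-word query intersects the posting sets.

```python
from collections import defaultdict


def build_index(articles):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(articles):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query word, in order."""
    postings = [index.get(w, set()) for w in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


articles = [
    "election results announced today",
    "football match results",
    "election debate tonight",
]
idx = build_index(articles)
hits = search(idx, "election results")
```

Production systems layer ranking, stemming and Farsi normalization on top of this core, but the index-then-intersect structure is the same.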



Title: HbbTV
Headline: HbbTV playout kit & HbbTV application test suite

Are you looking for a cost-effective solution for testing your HbbTV applications on all types of HbbTV televisions & set-top boxes?

HbbTV presents broadcasters and operators with a business-model-neutral method to combine broadcast and broadband delivery and provide unified, enhanced and compelling interactive services. Catch-up services, VOD, gaming, DRM-protected content, operator portals and branded EPGs are all deployed on HbbTV platforms today.

During HbbTV application testing, ensuring that applications work on a wide and diverse range of HbbTV devices is a major interoperability challenge. Even if a device has been certified with the test suite, there is no guarantee that it will correctly present an application that uses HTML and JavaScript features not included in the HbbTV specification.

In this regard, we present a dedicated device that is simple to set up and control and that allows the most efficient testing on all types of HbbTV televisions & set-top boxes, with no exceptions. After plugging it into the power supply and connecting the antenna, the HbbTV Playout Kit starts broadcasting a local DVB-T multiplex and allows HbbTV applications to be launched and fine-tuned on the end devices without any limitations and without the need to build a costly infrastructure.

In our solution, the URL of the HbbTV application server is transmitted to the HbbTV receiver, with a return channel for interaction with the server. Alternatively, the application payload can be transmitted to the receiver via the DSM-CC object carousel. DSM-CC is one of the standards used to broadcast files over transport streams; its most common use is broadcasting decoder software upgrades (DVB-SSU).


EPG Intelligent Correction System


One of the most widely used services among broadcasters' audiences is the Electronic Program Guide (EPG). Undoubtedly, the accuracy of this system is one of the most important issues and challenges for audio and video content distributors. This project was designed to increase the accuracy and efficiency of such systems, so that their effectiveness is ensured intelligently and without human intervention.


Basis of work

Briefly, this system works by comparing and measuring the match between fingerprints extracted from videos in the archive and fingerprints received from the antenna (TS) or network stream.

First, frames and spots that are robust to compression and to added noise, up to acceptable thresholds, are selected. From these points, if there are any, a required number of points is selected to form a feature vector for that frame. These vectors, each pointing to a specific frame of a video, are stored in a NoSQL database as a series of text files.

Frames read from the antenna are divided and sent to the feature extractor in parallel threads; the resulting feature vectors are then compared against the vectors in the bank and measured for compliance.

For this purpose, the vectors in the data bank are categorized and indexed according to their structure, using a combination of a mixture model and Euclidean distance, and the vector extracted from the antenna is matched one by one against the vectors in its own class in the database. The highest compliance, subject to a threshold, determines the output and, as a result, the type and frame number of the video.
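The core comparison step can be sketched as below: the query vector from the broadcast stream is compared against the archived vectors by squared Euclidean distance, and the best match is accepted only if it falls below the threshold. The mixture-model class indexing is omitted here for brevity, and the bank entries are invented examples.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(query, bank, threshold):
    """bank: {(video_id, frame_no): vector}. Return best key, or None."""
    best_key, best_d = None, float("inf")
    for key, vec in bank.items():
        d = sq_dist(query, vec)
        if d < best_d:
            best_key, best_d = key, d
    return best_key if best_d <= threshold else None


bank = {
    ("news_01", 120): [0.1, 0.9, 0.3],
    ("film_07", 4501): [0.8, 0.2, 0.5],
}
# A slightly noisy query, as produced by re-compression of the stream.
hit = best_match([0.12, 0.88, 0.31], bank, threshold=0.05)
```

Returning None when no archived vector is close enough is what keeps accidental look-alike frames from producing false EPG corrections.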

The following figure shows the process.

Practical Results

The feature extractor engine is implemented in C++ and the search engine in Java on Spark II. During a trial at the Islamic Republic of Iran Broadcasting organization, the system performed well, with over 98% accuracy. At most 2% of errors occur when randomly generated frames in two different videos happen to be similar above the threshold; this, of course, does not persist in subsequent frames of the video. Therefore, with a macroscopic examination of the string produced over a delay of a few minutes, a chromosome-like string can be created and unrelated frames replaced with the correct ones. Also, using other data and metadata available in the broadcast department, the accuracy can approach 100%.

Note that the videos read from the antenna (TS) or network stream average about 800 kbps, which sometimes corresponds to a compression factor of 50 to 70. This is an indication of the power of the feature extraction algorithm, which is robust against this type of attack.


