What is transcoding and what is it for?


Adaptive transcoding: what is it?


This term refers to a form of individual language mediation carried out by a translation specialist. In adaptive transcoding, information is translated from one language into another and simultaneously transformed according to the laws of interlingual interaction.
Adaptive transcoding typically requires attention to the language group involved and to the particular form of informative change present in the context. This makes it possible to select a translation that matches the content of the original text, although the translated text can never fully replace the original.
Translation has always been at the heart of linguistic mediation. The source and target texts must be equivalent and identical in meaning; this similarity is what makes mutual understanding possible, as determined by the linguistic features of communication.
Adaptive transcoding is paratranslational in nature and allows a complex transformation of the text that includes not only ordinary translation but also adaptation. Its essence is a method of composing texts in various forms, guided by an appropriate style, the nature of the information, and the required volume; the main information contained in the texts is carefully selected and regrouped.
These communicative text formats differ in their permissible volume and in the rules for presenting material. Translating in accordance with them makes the text easier to perceive.

Need for video transcoding

Today, digital video compression technology is essential in almost all types of video applications, and parameters such as compression ratio and data compatibility matter all the more given the growing convergence of communication media.
Some of the best-known digital video applications are DVD, high-definition television (HDTV), video telephony/teleconferencing and, more recently, video surveillance. Each of these technologies has its own development history and, accordingly, its own compression algorithms.
Transcoding plays two important roles. First, it enables communication between existing and newly emerging devices. For example, many existing video conferencing systems are based on the H.263 video coding standard, while newer systems use the H.264/AVC Baseline profile, so real-time video transcoding is required for these systems to interoperate. Second, information networks, and especially the Internet, have limited bandwidth for video transmission. Most video is currently stored on DVD in MPEG-2 format, and the bandwidth limits of video-on-demand and streaming video over IP networks require this video to be converted to a more compressed format, which is achieved by transcoding the video in real time before transmission. In general, transcoding can free up to 50% of network bandwidth without loss of video quality.
Transcoding in video conferencing

So, one of the applications of transcoding is video conferencing systems. Consider a typical transcoding scheme used in such systems (Fig. 1). One signal processor (DSP2) decodes the input video stream and generates a reconstructed video frame that is sent to another digital signal processor (DSP1 in this example) via the RapidIO serial interface (sRIO). DSP1 encodes the reconstructed video frame into the desired format. Typically, one side of a videoconference uses H.263-based equipment while the other side uses H.264-based equipment.
The host processor that manages network traffic communicates with several DSPs (four in this case) via a PCI bus connection.
The key feature of the interaction of processors in this example is their connection through the sRIO interface. Since the data transferred between DSPs is uncompressed video, typically at 30 fps, the bandwidth requirements for the communication link between devices are very high.
If we take video in standard resolution NTSC (720 by 480 pixels) YUV 4:2:0, then the size of each frame will be 720x480x1.5 = 518400 bytes. Accordingly, at a frequency of 30 frames per second, the line throughput should be approximately 124 Mbps.
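This link budget is easy to verify; a minimal sketch of the arithmetic:

```python
# Link budget for uncompressed NTSC video over the inter-DSP link.
# YUV 4:2:0 stores one byte of luma per pixel plus half a byte of
# chroma, i.e. 1.5 bytes per pixel on average.
WIDTH, HEIGHT = 720, 480    # NTSC standard definition
BYTES_PER_PIXEL = 1.5       # YUV 4:2:0
FPS = 30

frame_bytes = int(WIDTH * HEIGHT * BYTES_PER_PIXEL)
link_mbps = frame_bytes * 8 * FPS / 1e6

print(frame_bytes)          # 518400 bytes per frame
print(round(link_mbps, 1))  # 124.4 Mbit/s
```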
The choice of the sRIO interface is dictated by the required video data transfer rate and by its support for a flexible switching structure. sRIO supports three line rates: 1.25, 2.5, and 3.125 Gbps. The interface uses SerDes technology to recover the clock from the data stream and employs 8b/10b encoding. The specification supports single-lane (1X) and four-lane (4X) ports. The physical layer of the sRIO interface defines the handshaking mechanism used when establishing communication between devices and the error detection procedure based on a cyclic redundancy code; it also sets the packet priority used for routing within the switching fabric.
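The 8b/10b line coding mentioned above means only 80% of the raw line rate carries payload; a quick sketch:

```python
# 8b/10b coding transmits every 8 data bits as 10 line bits,
# so the usable payload is 80% of the raw line rate.
def payload_gbps(line_rate_gbps):
    return line_rate_gbps * 8 / 10

for rate in (1.25, 2.5, 3.125):   # per-lane sRIO line rates
    print(f"{rate} Gbit/s line -> {payload_gbps(rate)} Gbit/s payload")
```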
To take full advantage of sRIO bandwidth, processors must have these interfaces. Such processors are offered by Texas Instruments. For example, the TMS320C6455 signal processor has a built-in sRIO interface that provides four simultaneous connections and has a peak data transfer rate of 20 Gbps in both directions.
Processor TMS320C6455

In addition to the sRIO interface, the C6455 has an additional set of important features that make it ideal for transcoding. These functional features can be combined into four main blocks.
Availability of a large number of high-speed input-output interfaces. System designers use different solutions, so a digital signal processor for video processing applications must provide I/O ports for connecting system modules at the board level. As mentioned earlier, the C6455 has a built-in sRIO port for communication between devices.
Other I/O options on the C6455 include a 1-Gbps Ethernet Media Access Controller (EMAC), a 32-bit DDR2-500 memory controller, and a 66-MHz PCI bus for connecting peripheral devices. The built-in ATM interface (UTOPIA 2) also allows the C6455 to be used in telecommunications infrastructure.
Efficient movement of data within the chip. The single-chip architecture for efficient data movement is one of the main advantages of the C6455 over its predecessors. In video processing applications, DSPs work as slaves to a host processor, so high throughput, low latency and parallel data transfer between master and slave devices are important. These requirements shaped the architecture of the device: peripherals, internal memory and the processor core communicate through an efficient switch fabric, the switched central resource (SCR).
Also important is the optimal organization of the data flow. It was improved by using 256-bit memory buses and internal direct memory access (IDMA). IDMA provides background data movement between two levels of internal memory, as well as to and from the peripheral bus.
Large amount of on-chip memory. On-chip SRAM is much faster than external dynamic SDRAM and, because of its high manufacturing cost, much smaller. In typical video applications the on-chip memory serves two purposes: it stores frequently used code and data, and it stages temporary data before and after processing. In general, the more on-chip memory is available, the better the application performs. The C6455 DSP has a whopping two megabytes of static RAM.
Software compatibility. Backward software compatibility is important because many video applications were developed long before transcoding came into wide use. To run existing software on new processors, it is better to improve DSP performance through the processor core architecture rather than by changing the instruction set. The C6455 introduces two such architectural innovations. The first is a circular buffer, which potentially increases the efficiency of software pipelining for code with short loops. The second is the use of 16-bit versions of natively 32-bit instructions, which significantly reduces code size and thus the cache miss rate.
Prototype transcoding system

Transcoding is also necessary for transferring data from DVDs over an IP network, such as in a company training system, video-on-demand applications, and video broadcasting. In this case, the source video format is MPEG2 and the target format is mainly WMV9. Note that the programmability of the DSPs makes it easy to support virtually any combination of source/target video format.
To transcode video data, it is necessary to solve many technical issues, such as format conversion, reduction of the video stream bitrate and its temporal and spatial resolution. Therefore, various intelligent video data transcoding schemes have been developed. Their main principle is the maximum possible reuse of the information contained in the input video stream.
This section discusses a prototype video transcoding system that is suitable for any transcoding scheme due to the use of an architecture based on a flexible hardware/software infrastructure. To satisfy various target scenarios of video transcoding, the simplest transcoding scheme was chosen, in which the video stream is completely decoded and then re-encoded in accordance with new restrictions.
The flow of data in the system starts on the left side of the diagram (Figure 2), with an MPEG2-compressed video file stored on the hard drive, and ends on a flat panel display where the video is played by Windows Media Player. In this demo, the video is in standard NTSC resolution (720 by 480 pixels) and is transcoded at 30 frames per second.
The stream sink module, running on DSP1, buffers the MPEG2 stream and organizes the input to the MPEG2 decoder module. The receive operation is controlled using TI's Network Development Kit (NDK) library, which is essentially a TCP/IP stack. The ASF packetizer module, running on the DSP2 processor, generates ASF packets from the data compressed in the WMV9 module. The DSP2 also has an NDK-based http server that handles streaming requests from Windows Media Player and passes ASF packets to it. Windows Media Player decodes the ASF packets and displays the video on the screen.
One of the most interesting and complex aspects of the data streaming is the interaction of the two digital signal processors over the sRIO interface. For each video frame, the following occurs. After DSP1 completes the transmission of a video frame, it sends a special packet called a DOORBELL in the sRIO protocol specification. The DOORBELL packet generates a system interrupt in DSP2, notifying it that a frame is present. In response, DSP2 starts encoding the frame to WMV9 format. When the frame has been encoded, DSP2 sends a DOORBELL packet back to DSP1, generating an interrupt that indicates DSP1 may transmit the next frame. In practice, a ping-pong buffer scheme is used so that encoding/decoding and data transmission proceed in parallel.
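The handshake can be modelled in a few lines; this is a toy sketch of the ping-pong scheme (queues stand in for DOORBELL interrupts, strings stand in for frames), not the actual DSP firmware:

```python
import threading, queue

# Toy model of the DSP1 <-> DSP2 handshake described above.
# DOORBELLs are modelled as queue messages; the two buffers form a
# ping-pong pair so transfer and encoding can overlap.
NUM_FRAMES = 4
buffers = [None, None]            # ping-pong frame buffers
doorbell_to_dsp2 = queue.Queue()  # "frame ready" notifications
doorbell_to_dsp1 = queue.Queue()  # "buffer free" notifications
encoded = []

def dsp1():  # decodes and pushes raw frames over the "sRIO link"
    doorbell_to_dsp1.put(0)       # both buffers start out free
    doorbell_to_dsp1.put(1)
    for frame in range(NUM_FRAMES):
        slot = doorbell_to_dsp1.get()         # wait for a free buffer
        buffers[slot] = f"raw-frame-{frame}"  # "transfer" the frame
        doorbell_to_dsp2.put(slot)            # ring DSP2's doorbell
    doorbell_to_dsp2.put(None)                # end of stream

def dsp2():  # encodes the raw frames to the target format
    while (slot := doorbell_to_dsp2.get()) is not None:
        encoded.append(buffers[slot].replace("raw", "wmv9"))
        doorbell_to_dsp1.put(slot)            # buffer free: ring DSP1

t1, t2 = threading.Thread(target=dsp1), threading.Thread(target=dsp2)
t1.start(); t2.start(); t1.join(); t2.join()
print(encoded)  # four encoded frames, in order
```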
The graphical user interface (GUI) block provides the control and monitoring functions built into the system; sRIO and Gigabit MAC (GMAC) link activity is displayed in real time. The MPEG-2 input stream averages 8 Mbps, which is typical for standard-definition encoding at 30 frames per second, while the ASF output packets average 4 Mbps. This shows that the WMV9 format frees up approximately 50% of the bandwidth while providing similar video quality. On the sRIO link, the average data rate is 124 Mbps.

Thus, the capabilities of the TI C6455 digital signal processor combined with the sRIO interface, and the prototype transcoding system built on C6455 processors described above, show that the complex task of video transmission over IP networks can be solved successfully both now and in the future.

From the satellite, video is transmitted either in the MPEG-2 codec or in H.264 (also known as AVC or MPEG-4 Part 10). For simplicity, MPEG-4 Part 10 is usually shortened to MPEG-4, but it is important not to confuse it with MPEG-4 Part 2, a completely incompatible codec that was used in old IP cameras.

Audio is transmitted in MPEG Audio Layer II (MP2) or in AC-3 (A/52).

It is also important to understand that today H.264 from the satellite is usually compressed with intra-refresh, i.e. the video stream contains no keyframes (IDR frames). This compression mode smooths out bitrate spikes.

As a result, none of the audio or video formats transmitted from the satellite can be played on an iPhone, and only H.264 plays in the browser.

When transmitting over the Internet, as a rule, you can safely compress video from MPEG-2 to H.264 with a roughly threefold reduction in traffic.

When transmitting HD channels over the Internet today, you have to compress the stream into several different qualities, from full-quality HD down to standard SD, to compensate for overloaded channels.

As a result, video from the satellite must be transcoded into other codecs and qualities to provide a high-quality OTT service.

It is important not to confuse transcoding with repackaging. Transcoding is an extremely resource-intensive operation that includes:

  • unpacking the transport stream into encoded video/audio (demuxing)
  • decoding to raw video/audio
  • resizing and changing other parameters
  • encoding back
  • packing into a transport stream for delivery
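The steps above can be sketched as a toy pipeline; every stage here is a stub that merely records what it would do, and all names are illustrative placeholders rather than a real API:

```python
from dataclasses import dataclass

@dataclass
class Target:
    codec: str = "h264"
    width: int = 1280
    height: int = 720
    bitrate: str = "2000k"
    container: str = "mpegts"

# Stub stages: each just tags the data to show the flow.
def demux(pkt):              return {"es": pkt["payload"]}
def decode(es):              return {"raw": es["es"], "fmt": "yuv420p"}
def scale(raw, w, h):        return {**raw, "size": (w, h)}
def encode(raw, codec, br):  return {"es": raw, "codec": codec, "bitrate": br}
def mux(es, container):      return {"container": container, "stream": es}

def transcode_packet(pkt, target):
    es  = demux(pkt)                                 # 1. unpack transport
    raw = decode(es)                                 # 2. decode to raw (CPU-heavy)
    raw = scale(raw, target.width, target.height)    # 3. resize / filter
    out = encode(raw, target.codec, target.bitrate)  # 4. encode back (heaviest step)
    return mux(out, target.container)                # 5. repack for delivery

result = transcode_packet({"payload": b"mpeg2-data"}, Target())
print(result["container"], result["stream"]["codec"])  # mpegts h264
```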

Packing and unpacking are comparatively light operations: a streaming server can handle up to 1000 channels on one computer, whereas one computer can transcode only 1 to 30 channels, depending on the picture sizes and the machine's power.

For transcoding you can use specialized dedicated devices, a central processor, or a video card, either discrete or built into the processor.

We will not consider specialized devices: for the most part they are either a computer with some program on it, or extremely expensive and highly specialized equipment, or simply an unreasonably overpriced device sold purely through the manufacturer's marketing efforts without delivering correspondingly significant results.

H.264

There are several programs for processing video on the CPU, but by and large only two libraries make sense today for compressing to the H.264 codec on the CPU: the free libx264 and the commercial MainConcept. Everything else is worse or much worse, both in output quality and in resource usage.

Working with MainConcept is not covered in this article; only libx264 is discussed.

The H.264 codec is the de facto standard for video today, because it is supported in all modern devices, with the exception of some devices from Google.

There are practically no alternatives to it. Today, H.265 has emerged and is developing, it already has a lot of support, but for now, working with it is an investment in the future.

Google's VP8 and VP9 codecs are more an attempt by Google to pull the market toward itself than something genuinely useful: the resulting quality is worse, and the lack of hardware decoding support drives up device prices.

When encoding video, you need to understand that you have to balance between the following parameters:

  • delay inside the encoder in frames
  • CPU usage (how many milliseconds it takes to compress one frame)
  • output image quality (how pixelated and what colors)
  • output bitrate

For any kind of live broadcast, CPU usage is absolutely critical. If the encoder settings demand the full CPU or more, the video cannot be encoded in real time and the stream will stall.
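The real-time constraint is simple arithmetic: at a given frame rate the encoder must finish each frame within one frame interval. A sketch, with illustrative per-frame encode times rather than benchmarks of any real encoder:

```python
# A live encoder at `fps` has 1000/fps milliseconds per frame; if the
# average encode time exceeds that budget, the stream falls behind.
def realtime_ok(ms_per_frame, fps=30):
    budget_ms = 1000.0 / fps   # ~33.3 ms per frame at 30 fps
    return ms_per_frame <= budget_ms

# Hypothetical presets: slower presets trade CPU time for bitrate.
presets = {"ultrafast": 8.0, "medium": 30.0, "veryslow": 120.0}
for name, ms in presets.items():
    print(name, realtime_ok(ms))   # the slowest preset cannot keep up
```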

For VOD there is no such hard limit: an hour-long movie can take three hours to encode if that helps lower the bitrate. Even for live video, though, operators usually try not to use all the processor power, so that one computer can handle not 4 channels but 10.

As for delay inside the encoder, it is critical for video conferencing but entirely uncritical for IPTV: even 5 seconds of delay in a television broadcast does not degrade the quality of the service.

The connection between bitrate and quality is clear enough: the more information about the picture we transmit, the better it looks. As a rule, you can keep quality while lowering the bitrate by choosing more efficient compression tools, which in turn require more delay and more CPU cycles.

Understanding this trade-off is necessary to evaluate claims that "our encoder is the best encoder in the world." You have to compare at least four parameters, and in the end it all comes down to one question: how much does it cost per month to transcode one channel at the desired quality and output bitrate?

Flussonic Media Server for transcoding

The transcoder for Flussonic Media Server ships as a separate package.

Flussonic Media Server can decode video from UDP/HTTP MPEG-TS, RTMP sources and encode it in several qualities and sizes.
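As an illustration, a transcoding stream in the classic flussonic.conf might look roughly like this (directive names and syntax vary between Flussonic versions, so treat this as a sketch and check the official documentation):

```
# flussonic.conf fragment (illustrative)
stream channel1 {
  input udp://239.0.0.1:1234;                  # MPEG-TS source
  transcoder vb=2048k size=1280x720 ab=128k;   # one output quality
}
```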

This feature becomes necessary when video must be shown not only on set-top boxes but also on tablets, where the choice of available codecs is much narrower than on a set-top box.

It is important to note that to play video on an iPhone you must transcode even the H.264 coming from the satellite: satellites typically use the intra-refresh encoding mode for a smooth bitrate, and such streams do not play on the iPhone.

Flussonic Media Server is more convenient than VLC or other options for organizing transcoding, because it is controlled by a single configuration file and automatically monitors the transcoding status. VLC, on the other hand, requires writing a large number of monitoring scripts to track the status of transcoding.

Another important Flussonic Media Server feature for transcoding is automatic rebalancing of streams when one of the servers fails. If one of 20 transcoders breaks down at night, the remaining transcoders can be configured to pick up its streams automatically, and the streamer will then take the streams from the backup transcoders.



Different IP camera manufacturers equip their cameras with different video compression schemes. Typically, these schemes only partially overlap with the requirements of CCTV projects, and when users start working with the video data they run into shortcomings in functionality, flexibility, and convenience. The only exceptions are compression schemes specially adapted for CCTV systems.

The camera's built-in compression settings do not constrain transcoding, so it can be used to convert the camera's formats into other formats that better suit your requirements. Examples of adapted formats include special codecs that are both optimized for CCTV users and compliant with well-known standards.

The arguments for using transcoding technology include:

  • functional homogenization of a CCTV system that combines cameras from different manufacturers: whatever the camera vendor, all transcoder functions remain available;
  • the possibility of integrating image processing into the transcoder;
  • the use of functions such as dynamic real-time data streaming (DLS), which automatically matches the stream resolution to the size of the operator's monitor window and can thereby significantly reduce the bandwidth used for real-time multichannel transmission.
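The bandwidth effect of matching resolution to the window size is easy to estimate if one assumes bitrate scales roughly with pixel count (the figures below are illustrative, not measurements):

```python
# Approximate bitrate needed when a stream is downscaled to fit the
# operator's window, assuming bitrate scales with pixel count.
def scaled_bitrate_kbps(src_kbps, src_px, dst_px):
    return src_kbps * dst_px / src_px

full = 1920 * 1080   # camera's native resolution
tile = 480 * 270     # one tile of a 4x4 multiview wall
print(scaled_bitrate_kbps(4000, full, tile))  # 250.0 kbit/s per tile
```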


Summary

While more and more logic and information processing appears in IP camera firmware, transcoding technology is developing in a different direction: the camera is increasingly treated simply as a source of high-quality images. Every year less logic is needed in the camera itself; its integration becomes simpler and its functionality more homogeneous. For many common CCTV problems, the centralized approach taken by transcoding has more advantages than the decentralized approach dictated by the capabilities of individual cameras, and this is especially important in large systems with hundreds of channels.

Transcoding is not a panacea. The specific requirements of a system determine its form and feasibility, its functional advantages, and the achievable cost savings. Transcoding solves some problems more efficiently than the camera's own capabilities allow; other problems, on the contrary, are easier to solve in the camera itself, which shows the value of decentralized logic. In fact, there is no conflict between centralized and decentralized capabilities: each is effective in its own domain.