Content security refers both to the protection of information content and to the compliance of that content with political, legal and moral requirements. With the rapid development of the mobile Internet, digital media, artificial intelligence (AI) and related technologies and applications, content security is becoming increasingly important. Artificial intelligence, and in particular technologies such as deep learning and knowledge graphs, can effectively improve content identification, protection and violation review, and make content security governance more automated, intelligent, efficient and accurate.
Content security has two levels of meaning. The first is the protection of information content, for example against theft and tampering, which covers confidentiality, intellectual property protection, information hiding and privacy protection. The second is that information content must meet political, legal and moral requirements. Content security affects the state and society in many ways: from high-risk content that threatens national security, such as reactionary and violent terrorist material, to risky content that harms society and people's livelihoods, such as pornography and gambling, to content that disrupts business and personal life, such as spam advertising. At the same time, content fraud such as fake news and online rumors has become one of the biggest problems for content security governance.
From the perspective of information review, digital information comes from diverse sources, is complex in content, huge in volume and spreads quickly, so the demand for efficient, fast and accurate fact verification keeps rising. In the early days of the COVID-19 outbreak, fake news and online rumors seriously hampered epidemic prevention and control. As live streaming has become popular, some criminals have begun to use it for various illegal and criminal activities, and real-time supervision of massive volumes of live-streaming data is also an urgent problem.
In recent years, the rapid development of artificial intelligence has had a profound impact on content security. On the one hand, it brings new risks. Content security algorithms based on artificial intelligence may suffer from training-data poisoning and adversarial attacks, resulting in wrong decisions. Deep-learning-based content deception such as forged images, fake news and voice fraud can now pass for the real thing. Intelligent recommendation algorithms are exploited by criminals to spread harmful information in a more targeted and covert way. On the other hand, the development of artificial intelligence also brings new opportunities to content security. Artificial intelligence, and in particular technologies such as deep learning and knowledge graphs, can effectively improve content identification, protection and violation review, and make content security governance more automated, intelligent, efficient and accurate.
2、 Key technologies and applications of content security based on artificial intelligence
（1） Key technologies of content security based on Artificial Intelligence
1.Content forgery and protection based on Artificial Intelligence
The development of artificial intelligence, especially deep learning, has made content forgery far easier. Deepfake technology uses artificial intelligence programs and deep learning algorithms to simulate and forge video and audio. The techniques involved are mainly autoencoders and generative adversarial networks (GANs). At present, deepfake technology can not only forge human faces but also imitate human voices and create images of people who do not exist. Combined with natural language generation based on artificial intelligence and dissemination through social networks, deepfakes have greatly fueled the spread of fake news. Because it is digital and automated and spreads false information through many kinds of media, deepfake technology has strong dissemination potential and can enable large-scale, covert political manipulation, which greatly increases the impact and adversarial complexity of political security threats originating in cyberspace.
In response to content forgery, a large number of fake content detection techniques have emerged recently. For deepfake detection based on artificial intelligence, feature extraction mainly relies on GAN pipeline features, deep learning, steganalysis and similar techniques, while the classifiers are mainly support vector machines (SVMs), convolutional neural networks (CNNs) and the like. For fake news detection based on artificial intelligence, there are methods based on knowledge bases, writing style, propagation characteristics and the source of the news, drawing on deep learning, knowledge bases, graph data mining and other techniques. Major challenges remain in detecting new types of fake news, early detection, cross-domain detection and interpretable detection.
2.Security of artificial intelligence models and algorithms for content analysis
Content analysis based on artificial intelligence involves various machine learning models and algorithms for text, image, video and audio processing. The security of these models and algorithms has a vital impact on content security, and mainly involves the following aspects.
(1) Poisoning attacks and defenses. A poisoning attack deliberately contaminates the training data during model training in order to undermine the availability and integrity of the model. The contamination is generally achieved by injecting carefully crafted malicious samples.
(2) Backdoor attacks and defenses. A backdoor attack implants a backdoor in a neural network model in one of two ways: through the data or through the model itself. The backdoor is triggered only when the model receives a specific input, which then causes the network to produce a wrong output. Such attacks are therefore highly covert and hard to detect.
(3) Adversarial attacks and defenses. Machine learning models, including neural networks, are vulnerable to adversarial examples: by adding a specific perturbation to an original sample, an attacker can make a classification model misclassify the newly constructed sample.
(4) Model stealing and defenses. Model stealing refers to stealing a model, or recovering members of its training data, through black-box probing. Examples include stealing stock market prediction models and spam filtering models.
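As a minimal sketch of item (3), the following pure-Python example applies a one-step sign-gradient perturbation to a toy logistic-regression classifier. The perturbation rule is the fast gradient sign method, chosen here for illustration and not named in the text above; the weights, bias, feature values and epsilon are all hypothetical. Real content classifiers are far larger, but the mechanism is the same.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """Shift every feature by epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y_true) * weights.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical "harmful content" classifier with hand-picked parameters.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5

original = [1.0, 0.2, 0.4]  # features of a sample whose true label is 1
adversarial = fgsm_perturb(WEIGHTS, BIAS, original, y_true=1, epsilon=0.5)

p_original = predict_proba(WEIGHTS, BIAS, original)
p_adversarial = predict_proba(WEIGHTS, BIAS, adversarial)
print(f"original:    p(harmful) = {p_original:.2f}")
print(f"adversarial: p(harmful) = {p_adversarial:.2f}")
```

With these assumed numbers the predicted probability of the harmful class falls from about 0.82 to about 0.44, so a sample the model originally flagged is no longer flagged, although no feature moved by more than epsilon.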
At present, a large number of new machine learning algorithms appear every year, and their security has become a common concern. Model and algorithm security can be viewed as a game in which defender and attacker each try to model the other's techniques under incomplete information. New robust models and training algorithms, the security of multi-learner systems, and system modeling and reasoning under incomplete information all need further study.
3.Interpretable artificial intelligence for content analysis
Artificial intelligence technology represented by deep learning faces the problem of interpretability. When it is applied to content analysis in sensitive fields, a “black box” algorithm that lacks transparency and comprehensibility can hardly earn people’s sense of security and trust.
Research on the interpretability of artificial intelligence models mainly follows three directions:
① Deep explanation. A new deep learning model is used to learn features that can serve as explanations. Much related work combines this with visualization techniques to give more intuitive explanations.
② Interpretable models. Traditional Bayesian and decision tree models are highly interpretable. Many researchers are now modifying deep learning models to make them more interpretable.
③ Model induction. This approach treats the machine learning model as a black box and builds a new interpretable model outside it through a large number of experiments. A newer line of research is to construct a set of machine learning techniques that automatically produce interpretable models while maintaining a high level of learning performance.
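The third direction, treating the model as a black box and building an interpretable model outside it from many experiments, can be sketched as follows. This is only an illustration under stated assumptions: `black_box` is a hypothetical stand-in for an opaque content classifier, its two input features (`toxicity`, `length`) are invented for the example, and the interpretable surrogate is the simplest possible one, a single-threshold rule (a decision stump).

```python
import random


def black_box(x):
    """Hypothetical opaque model: flags content when a weighted score is high."""
    toxicity, length = x
    return 1 if 3.0 * toxicity + 0.1 * length > 1.6 else 0


def fit_stump(samples, labels):
    """Find the single-feature threshold rule that best matches the labels.

    Returns (feature_index, threshold, agreement) for the rule
    "predict 1 if x[feature_index] > threshold else 0".
    """
    best = (0, 0.0, -1.0)
    for f in range(len(samples[0])):
        for t in sorted({s[f] for s in samples}):
            hits = sum((1 if s[f] > t else 0) == y
                       for s, y in zip(samples, labels))
            agreement = hits / len(samples)
            if agreement > best[2]:
                best = (f, t, agreement)
    return best


# Probe the black box with random inputs and record its answers.
rng = random.Random(0)
probes = [(rng.random(), rng.random()) for _ in range(500)]
answers = [black_box(p) for p in probes]

# Fit the interpretable surrogate to the recorded behaviour.
feature, threshold, agreement = fit_stump(probes, answers)
print(f"surrogate rule: flag if feature {feature} > {threshold:.3f} "
      f"(agrees with the black box on {agreement:.1%} of probes)")
```

The agreement rate is a measure of how faithfully the simple rule reproduces the black box on the probed inputs. A low rate would signal exactly the inconsistency between explanation and real model behaviour that makes interpretation methods hard to trust.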
Although research on model interpretability has produced some notable results, it is still at an early stage and many key problems remain unsolved. One challenge is how to design more accurate and user-friendly interpretation methods that eliminate the inconsistency between the interpretation results and the real behavior of the model. Another is how to design more rigorous and unified evaluation metrics for assessing the performance and security of interpretation methods.
（2） Important applications of content security based on artificial intelligence
1.Network public opinion analysis and supervision
Public opinion is the “skin of society” and a barometer of social conditions. Big data and artificial intelligence provide new resources, methods and paradigms for analyzing and assessing public opinion. Online public opinion is complex in content, and its analysis needs to be close to real time. Artificial intelligence can make online public opinion analysis more efficient and accurate and can greatly reduce the cost of manual work.
2.Multimedia content analysis and review
Even in countries such as Norway, Japan and Italy that pride themselves on freedom, Internet content censorship is increasing. Using Censored Planet, an automated censorship tracking system launched in 2018 and developed at the University of Michigan, a team collected more than 21 billion censorship measurements from 221 countries over the preceding 20 months. Recently, multimedia data, especially video data, has grown at an unprecedented rate, and this growth is expected to continue. The volume of multimedia data is far beyond what humans can process, so content analysis and review based on artificial intelligence are now widely used.
Multimedia content analysis based on artificial intelligence mainly includes intelligent review, content understanding, copyright protection and intelligent editing. Content review functions include identification of pornography, of political, violent and terrorist content, of advertising QR codes and of live streams with no meaningful content; these identification capabilities are used to find and handle meaningless and unhealthy content online. Content understanding functions include content classification, labeling, person recognition and speech recognition, as well as text recognition in images and videos. Copyright protection functions include content similarity measurement, retrieval of content from the same source, and audio and video fingerprinting. Intelligent editing functions can generate video headers, video summaries and video highlights, and support news segmentation.
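To make the idea of content similarity and fingerprinting concrete, here is a minimal sketch using an average hash, a simple perceptual hash for images. It is a stand-in chosen for brevity, not the fingerprinting method of any product mentioned in this article, and the 4x4 grayscale "images" are hypothetical; practical systems typically downscale real frames to a small grid first and use more robust features for audio and video.

```python
def average_hash(pixels):
    """Fingerprint a grayscale image given as rows of 0-255 values.

    Each bit is 1 when the pixel is brighter than the image mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming_distance(a, b):
    """Number of positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


original = [
    [200, 190, 30, 20],
    [210, 180, 40, 10],
    [50, 60, 220, 230],
    [40, 70, 210, 240],
]
# The same picture, uniformly brightened (as after re-encoding or filtering).
brightened = [[min(255, p + 15) for p in row] for row in original]
# An unrelated picture: a left-to-right gradient.
unrelated = [[10, 80, 160, 240] for _ in range(4)]

h_original = average_hash(original)
d_same = hamming_distance(h_original, average_hash(brightened))
d_other = hamming_distance(h_original, average_hash(unrelated))
print("distance to brightened copy:", d_same)
print("distance to unrelated image:", d_other)
```

Here the brightened copy has distance 0 from the original while the unrelated image differs in 8 of 16 bits, so a small distance threshold would match the copy and reject the unrelated image. Because the fingerprint depends on brightness relative to the mean, it survives uniform brightness changes that would defeat a byte-for-byte comparison.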
At present, short videos and pictures are the main objects of multimedia review. Based on massive annotated data and deep learning algorithms, review systems can accurately identify prohibited content such as pornography, violence and terrorism from multiple dimensions. In 2019, Alibaba Group launched the “artificial intelligence rumor shredder” to support intelligent judgment of the credibility of news content, with an accuracy rate of 81% in specific scenarios. The China Academy of Information and Communications Technology has built a preliminary harmful-information detection capability based on artificial intelligence, which supports the identification of obscene and pornographic content, content related to terrorism and violence, and other illegal information. Its identification accuracy is 17% higher than that of the traditional method, exceeding 97%, and its identification speed is 110 times that of the traditional method. In February 2021, Baidu released its 2020 annual report on comprehensive information security governance. In 2020, the Baidu content security center used artificial intelligence to uncover more than 51.54 billion pieces of harmful information, and cracked down on more than 80 million further pieces through proactive manual inspection. Review speed has improved greatly, and six review dimensions have been defined: violence and terrorism, political sensitivity, watermarks, labels, public figures and malicious images.