Artificial intelligence rests on three elements: data, the ability to process data, and a scenario for commercial realization. Solving financial problems with artificial intelligence maps well onto all three, which makes finance arguably the most suitable scenario for disruption by artificial intelligence.
In the back end of financial applications, information security, investment risk control, and asset management have become new problems. When customers sit behind touch-screen mobile phones, or have no central bank credit record, a bank cannot tell by looking at them whether they are honest borrowers or fraudsters. In this part of the financial back end, traditional risk control methods provide no coverage and cannot reach these customers.
Internet plus finance therefore combines a broader range of Internet data with AI to handle the problems of a wider range of financial customers. From this perspective, artificial intelligence and machine learning really are set to “subvert” traditional financial risk control in the new financial era. The open question is whether techniques built to process tens of thousands of data dimensions in the Internet industry will, when turned on the hundreds or thousands of dimensions typical of financial data, amount to an effective “dimensionality reduction attack”.
New financial risk control business
Generally, a risk control business covers user application submission and data collection on the front-end page, anti-fraud, compliance, logic verification, and core decision-making and credit extension (including application scoring and telephone verification), through to final debt collection. Across this set of business processes, the data pain points in the new financial risk control field generally fall into several categories:
Within the business process, machine learning plays a role at every risk control node. Take anti-fraud: in the wider Internet environment, the individual fraud that financial risk control traditionally faced has rapidly evolved into organized, large-scale group fraud and the associated network risks. Traditional anti-fraud still relies on simple rules that identify first-degree risk, such as the number of borrowers among an applicant's contacts, and has no good answer for second-degree, third-degree, or wider network risk. Graph-based semi-supervised algorithms in machine learning meet this need well. A large network graph is built from information nodes such as applicants, mobile phone numbers, devices, and IP addresses, and anti-fraud models based on rules and machine learning then identify risk on it in real time.
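As a rough illustration of the graph idea described above, the sketch below builds a small applicant/phone/device/IP graph and spreads risk from known labels to unlabeled applicants with a basic label-propagation loop, one simple member of the graph-based semi-supervised family. The function names, the record layout, and the toy applications are all hypothetical; this is not the production method of any particular institution.

```python
from collections import defaultdict


def build_graph(applications):
    """Link every applicant to the phone, device and IP nodes it used."""
    graph = defaultdict(set)
    for app in applications:
        applicant = ("applicant", app["applicant"])
        for kind in ("phone", "device", "ip"):
            value = app.get(kind)
            if value is None:
                continue
            node = (kind, value)
            graph[applicant].add(node)
            graph[node].add(applicant)
    return graph


def propagate_risk(graph, labels, sweeps=200, prior=0.0):
    """Basic label propagation (hypothetical helper).

    Labeled applicants keep their score (1.0 = known fraud, 0.0 = known good).
    Every other node is repeatedly set to the mean score of its neighbours,
    so risk flows across second-degree, third-degree and wider links.
    """
    scores = {node: prior for node in graph}
    clamped = set()
    for applicant_id, label in labels.items():
        node = ("applicant", applicant_id)
        scores[node] = float(label)
        clamped.add(node)
    for _ in range(sweeps):
        for node, neighbours in graph.items():
            if node in clamped or not neighbours:
                continue
            scores[node] = sum(scores[n] for n in neighbours) / len(neighbours)
    return {node[1]: s for node, s in scores.items() if node[0] == "applicant"}


# Toy data (invented): A shares a device with B, B shares an IP with C,
# and C shares a phone number with D.
applications = [
    {"applicant": "A", "phone": "p1", "device": "d1", "ip": "i1"},
    {"applicant": "B", "phone": "p2", "device": "d1", "ip": "i2"},
    {"applicant": "C", "phone": "p3", "device": "d3", "ip": "i2"},
    {"applicant": "D", "phone": "p3", "device": "d4", "ip": "i4"},
]
labels = {"A": 1.0, "D": 0.0}  # A is a known fraudster, D a known good customer
scores = propagate_risk(build_graph(applications), labels)
```

On this toy graph the score of B settles near 0.67 and that of C near 0.33: B is one hop from the known fraudster, C is two hops away and also linked to a known good customer. A single-degree rule would see nothing unusual about C.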
Core credit decision
In application scoring, traditional financial risk control usually builds scorecard models on strong credit data such as bank lending records. In the new financial business the customer base “sinks” further and covers more income groups. Strong credit data for these new groups is often largely missing, so financial institutions have to rely on weaker financial data such as consumption data, telecom operator data, and Internet behavior data. This change in the underlying data creates serious difficulties for traditional credit scorecards, specifically:
Much Internet behavior data and telecom operator data is unstructured and complex, so the feature engineering that precedes modeling is hard to do in the traditional manual way.
Because the types and scope of data have expanded so much, new models often face thousands of processed weak-variable features, which a scorecard system cannot integrate and absorb.
The risk environment of online new financial business changes frequently. Manually iterated models cannot keep pace with the changing risk, and iterative optimization is too slow.
Artificial intelligence and machine learning offer distinct solutions to these problems. For complex data, feature generation frameworks based on deep learning have matured and are applied in large-scale risk control scenarios. They perform deep feature processing and extraction on Internet behavior data such as time series, text, and images, and on unstructured telecom operator data, and have markedly improved model performance.
For data that is hard to harness, extensive practice shows that appropriate models can extract the most value from each kind of data. Machine learning as applied in Internet advertising, search, and recommendation has long processed different types of data with different models. Transplanted into the financial scenario, this approach uses complex ensemble models to handle thousands of weak variables with ease and tie them to accurate prediction of default risk.
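One very simple way to picture "different models for different data, combined into one" is a weighted blend of per-source sub-model outputs in log-odds space. The sketch below assumes three hypothetical sub-models (consumption, operator, behavior) that each already output a default probability; the names, weights, and scores are invented for illustration, and a real ensemble would learn the weights from data and be far more elaborate.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def blend_default_probability(sub_scores, weights, bias=0.0):
    """Combine per-source default probabilities into one estimate.

    sub_scores: {source name: probability from that source's sub-model}
    weights:    {source name: blending weight} (hypothetical, normally learned)
    """
    z = bias + sum(weights[name] * logit(p) for name, p in sub_scores.items())
    return sigmoid(z)


# Invented outputs of three sub-models for a single applicant.
sub_scores = {"consumption": 0.10, "operator": 0.30, "behavior": 0.20}
weights = {"consumption": 1 / 3, "operator": 1 / 3, "behavior": 1 / 3}
blended = blend_default_probability(sub_scores, weights)
```

Blending in log-odds space rather than averaging probabilities directly lets each weak source shift the estimate up or down additively, which is the same form a logistic stacking layer would take.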
Fixing slow model iteration is also something machine learning does well. Internet companies generate large amounts of user data every day, which requires continuous, frequent online optimization of search and recommendation models. That iteration is faster and more precise than in the financial field, and is almost impossible to achieve by hand. In financial risk control, by monitoring model features, lending groups, model performance, and business feedback, machine learning models can now iterate themselves online and quickly.
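The monitoring step can be sketched with the Population Stability Index (PSI), a drift measure widely used in credit risk. The text does not name PSI; it is used here as an assumed example of feature monitoring. The bin edges, the 0.25 threshold (a common rule of thumb, not a standard), and the sample data are likewise illustrative assumptions.

```python
import math
from bisect import bisect_right


def population_stability_index(baseline, recent, edges, floor=1e-6):
    """PSI between a baseline sample and a recent sample of one feature.

    edges: sorted bin boundaries; values fall into len(edges) + 1 bins.
    floor: minimum bin share, avoids log(0) when a bin is empty.
    """
    if not baseline or not recent:
        raise ValueError("both samples must be non-empty")

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), floor) for c in counts]

    base, new = shares(baseline), shares(recent)
    return sum((n - b) * math.log(n / b) for b, n in zip(base, new))


def needs_retraining(psi_value, threshold=0.25):
    """Hypothetical trigger: flag the model for a retraining run on large drift."""
    return psi_value > threshold


# Invented feature values: the recent lending group skews towards higher values.
edges = [0.25, 0.5, 0.75]
baseline = [0.1] * 25 + [0.3] * 25 + [0.6] * 25 + [0.9] * 25
recent = [0.1] * 10 + [0.3] * 10 + [0.6] * 30 + [0.9] * 50
drift = population_stability_index(baseline, recent, edges)
retrain = needs_retraining(drift)
```

In an automated pipeline a check like this would run on every monitored feature and on the model score itself, and a flagged drift would start a retraining or online update job instead of waiting for a manual review cycle.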
AI-driven big data risk control
In fact, the problems machine learning has to solve are very clear. All of these mechanisms (data adaptation and fusion, group anti-fraud, feature engineering, model construction and training, performance monitoring, and self-iteration), along with techniques such as deep learning, semi-supervised learning, and online learning, come down to one thing: bringing Internet-scale machine learning “down a dimension” into the financial field to handle the distinctive data of the new financial scenario. On the one hand, less data is available than on the Internet; on the other, the data is far more high-dimensional, sparse, and hard to interpret than what traditional scorecard systems work with.