Chapter 113 - Reborn Godfather of AI - CloseAI - Hiatus

Chapter 113 – Unexpected invitation

Compared with the remarkably effective Tiger algorithm, the mobile-optimization ranking algorithm performs slightly worse.

Therefore, Meng Fanqi did not rush it into online testing. Instead, he waited until the update to the AI language-understanding model was ready so that the two could be rolled out together.

At present, the methods most commonly used for language tasks are Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), both of which date back to the end of the last century.

These two methods are simple and easy to use, so they remained popular until around 2017.

Then the Transformer appeared: the T method, the "T" in ChatGPT.

Generally speaking, it is believed that the Transformer replaced RNNs and LSTMs so quickly mainly because it is much easier to parallelize.
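The parallelism argument comes down to dependencies between positions in a sequence. The following is a minimal Python sketch under simplifying assumptions: scalar inputs, hand-picked weights `w_x` and `w_h`, and no learned query/key/value projections. All names are hypothetical and do not come from any library. In the RNN, each hidden state needs the previous one. In the simplified self-attention, each position's output depends only on the inputs, so all positions can be computed at the same time.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy contrast between a recurrence and a simplified self-attention.
# Assumptions: scalar inputs, hand-picked weights w_x and w_h, and no learned
# query/key/value projections. Real models use vectors and weight matrices.

def rnn_states(xs, w_x=0.5, w_h=0.8):
    """h_t = tanh(w_x * x_t + w_h * h_{t-1}): each step needs the previous one."""
    h = 0.0
    states = []
    for x in xs:  # strictly sequential: step t waits for step t-1
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def attention_output(xs, i):
    """Output at position i depends only on the inputs, never on other outputs."""
    scores = [xs[i] * xj for xj in xs]            # dot-product scores (scalars)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(weights)
    return sum(w / total * xj for w, xj in zip(weights, xs))

def attention_states(xs):
    # Positions are independent, so they can be computed concurrently. The thread
    # pool only illustrates this; it will not speed up CPU-bound Python code.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: attention_output(xs, i), range(len(xs))))

if __name__ == "__main__":
    seq = [1.0, -0.5, 2.0, 0.3]
    print(rnn_states(seq))
    print(attention_states(seq))
```

On a GPU, the per-position attention computations become one batched matrix operation. The RNN loop cannot be flattened this way without changing the recurrence itself.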

It is easy to run in parallel across multiple devices and, most importantly, this made large-scale versions possible, which laid the foundation for later giant models like ChatGPT.

"In fact, the old RNN can also be parallelized very well. The field has a big misunderstanding about this," Meng Fanqi said, frowning in thought.

In the original timeline, once the Transformer came out, everyone abandoned their research on the older methods and embraced the T method.

In 2018, someone did build a highly parallel RNN, but unfortunately it came too late.

If this discovery had been made a year earlier, RNNs might have remained a long-term competitor to the T method, and we might have seen the emergence of ChatRNN.

"Early T methods need a lot of data, their parameters are hard to tune, and they also need a lot of computing power," Meng Fanqi said. Even though in his previous life he had built an improved version of the T method on top of many mature techniques, the T method was still quite troublesome in its early days.

"Fortunately, Google has no shortage of data or computing power, and I know the classic parameter settings well." Meng Fanqi first wrote a prototype of the T method and ran some tests.

"However, with the limited memory of current graphics cards, the model can't be very large unless I develop advanced parallelism tools like DeepSpeed myself."

Models are trained on multiple cards either for speed or because a single card cannot hold the whole model.

The simplest approach is data parallelism: every card does the same job and stores a full copy of the model.

Each card, however, receives different input data, and after all the cards finish their calculations, the results are combined to update the model together. The other approach, model parallelism, splits a single model across several cards.

Clearly, the former is much easier than the latter. It only needs to copy the model onto each card and read in data for computation.

The latter, by contrast, requires splitting and merging the model according to the specific situation and settings, which makes mistakes easy.
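The data-parallel scheme described above can be sketched in plain Python: a full model copy on every card, a different slice of data per card, and the results combined into one shared update. This is a toy under stated assumptions. It uses a one-parameter linear model, simulated cards, and equal-sized shards. The function names are hypothetical and are not taken from DeepSpeed or any Google system.

```python
# Toy data-parallel SGD. Assumptions: a 1-D linear model y = w * x with
# mean-squared-error loss, "cards" simulated as list entries, and equal-sized
# shards. Real systems put each replica on its own GPU and average via all-reduce.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def split(seq, k):
    """Split seq into k equal contiguous shards (assumes len(seq) % k == 0)."""
    size = len(seq) // k
    return [seq[i * size:(i + 1) * size] for i in range(k)]

def data_parallel_step(replicas, xs, ys, lr=0.01):
    k = len(replicas)
    x_shards, y_shards = split(xs, k), split(ys, k)
    # Same computation on every card, different slice of the batch.
    grads = [grad_mse(w, xb, yb) for w, xb, yb in zip(replicas, x_shards, y_shards)]
    # Integrate: average the gradients (the result an all-reduce would give) ...
    avg = sum(grads) / k
    # ... then apply one shared update so every copy of the model stays identical.
    return [w - lr * avg for w in replicas]

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.0 * x for x in xs]          # true weight is 2.0
    replicas = [0.0] * 4                # four simulated cards
    for _ in range(50):
        replicas = data_parallel_step(replicas, xs, ys)
    print(replicas)                     # all four copies end up close to 2.0
```

The shards are equal in size, so the averaged gradient equals the full-batch gradient. As a result, the replicas follow the same trajectory a single card would, only with the work divided.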

Google Brain's servers include several batches of the 2013 GTX Titan, which are quite valuable.

Compared with the 4GB flagship card Meng Fanqi bought himself, the Titan's extra 2GB of memory is enough to make many other things possible.

Trading speed for memory, Meng Fanqi repeatedly shuttled parameters and other data back and forth between the CPU and GPU.

Even before he officially joined, Google Brain had allocated 16 Titans to him, which he could use at any time.

In addition, 32 more GPUs on different nodes were available to him.

"Google doesn't have that many graphics cards at this point, so this setup is already quite generous."

There was not only a unified configuration system and environment, but also solid multi-card parallel tools and examples.

In another two years, thousands or even tens of thousands of TPUs will be standard.

If Meng Fanqi wants to integrate AI into the search system, there are three main directions.

The first is to split the query into keywords and use a language model to capture their real-world meanings, so that results can be ranked better.

The second is to scale up the model so that it gains a degree of broad understanding, expanding the range of content that can be searched.

The third is to help the search engine better understand how changes in word order alter the intent of a query.


The second one is currently more difficult, but Meng Fanqi is very confident in the first and third.

The recurrent, step-by-step design of traditional RNNs and LSTMs makes longer sentences hard to handle properly, and their grasp of how changes in word order affect meaning is limited.

Meng Fanqi's prototype T method has a unique advantage in this regard.

Admittedly, the T method learns poorly from small datasets, its many parameters are hard to fine-tune, and overall it is difficult to train.

But for an old alchemist like Meng Fanqi, none of this is difficult at all. With the massive data Google had already prepared, he was very confident the method would work.

After committing all the graphics card resources to training, Meng Fanqi wrapped up his roughly ten-day stint at Google Brain just before Christmas 2013.

Training the model would take some time, and the advertising algorithm might be delayed by another two weeks, until after New Year's Day.

Having finally completed the early-career technology that would bring in the most money, Meng Fanqi felt a sense of relief.

Just as he was planning to start a company and looking for office space and equipment, an unexpected phone call disrupted his plans.

"Mr. Meng, hello, I am Li Kaifu's secretary from the Innovation Factory. He really wants to talk to you in person, but due to health reasons, it's not very convenient for him to travel. Are you available to come over?"

Li Kaifu? He, too, was a senior figure in the Google system, having risen to global vice president and head of the Greater China region.

Not only that, he has also held high positions at Apple and Microsoft.

However, when his four-year contract expired in 2009, he resigned to pursue his dream of running an angel fund that invests in college students.

"Where is Mr. Li Kaifu now?" Meng Fanqi knew Li Kaifu's history well. Around this time, Li should be in the early stages of cancer, but Meng Fanqi did not know where he was being treated.

"Mr. Li Kaifu is currently receiving treatment in the northern city of Baodao. If it's convenient for you, could we arrange a time? The treatment hasn't been going particularly well lately, so Mr. Li has basically stopped attending meetings or doing company work. Even so, he insists on setting aside a day to talk with you."

"I've just finished my current work, so I can apply for an entry permit tomorrow." Meng Fanqi found it a little strange. Although he had made a name for himself in the AI industry, there seemed to be nothing so urgent that a senior figure like Li Kaifu had to see him in person, especially given his current health.

"But once I apply, it will take another two weeks."

Meng Fanqi asked the secretary why Li wanted to meet, but she did not know the specific reason. Suppressing his curiosity, he arranged to meet in mid-January.

The flight from Shanghai to Taoyuan, near the northern city, takes only two to three hours, which is actually shorter than the trip to Yanjing. He had never been to Baodao in either of his two lifetimes, so it would be nice to see the place while visiting Li Kaifu.

However, the entry permit is as annoying to process as a visa.
