Chapter 86 – A New Paradigm for Vision

The host of the meeting smiled and announced the grand entrance of Meng Fanqi.

Meng Fanqi was immersed in his calculations and was envisioning a bright future, completely unaware of how quickly time had passed.

He quickly closed Google's financial report and walked onto the stage.

Looking down at the room full of AI scholars, all watching him expectantly, Meng Fanqi realized that he was not as nervous as he had imagined.

"Hello everyone, the main theme of this report is the application of deep residual learning in the field of image recognition. I know that most of you are here because of its achievements in image recognition."

At this point, Meng Fanqi paused for a second, and laughter could be heard from the audience.

"However, on an occasion like this, I personally don't want to spend too much time on specific details. The relevant papers and code will be gradually open-sourced in the near future. Those who are interested in the details can explore them on their own."

After signing the contract with Google, Meng Fanqi no longer needed to hold onto all the papers he had on hand.

With the momentum of the press conference and this international conference, he was ready to release all of his previous work to the public.

Of course, this part of the content needed to be approved and reviewed internally by Google before it could be released, to avoid any potential harm to Google's interests.

However, most of this technical content was in the field of image algorithms and did not involve Google's core interests. Google was unlikely to object, especially since Meng Fanqi would reveal a series of strategies for Google's recommendation and advertising algorithms a few weeks later.

In situations that did not involve core interests, Google had always been very generous in these technical aspects. After being reviewed by Jeff and Hinton, Meng Fanqi was ready to release the accumulated work.

Before the conference started, some of the papers centered on DreamNet were already available for reading on arXiv.

While other teams were introducing their methods earlier, many people had been secretly reading the DreamNet papers, much as Meng Fanqi had been reading the financial report.

"Although the theme of the competition is image recognition and classification, it is rare for so many scholars to gather together. I hope to showcase some other aspects as well."

There are three top computer vision conferences. One is the International Conference on Computer Vision (ICCV), which Meng Fanqi was attending this time.

Another is the European Conference on Computer Vision (ECCV), a conference that is more focused on Europe.

Each of these two is held only every two years, in alternating years, so in any given year there are only two such top international conferences.

The only one held every year is CVPR, the Conference on Computer Vision and Pattern Recognition.

However, it is usually held in the United States, and there are often visa issues.

This opportunity is quite rare, so it is necessary to promote one's own work.

"Based on the residual thinking of DreamNet, not only has there been a breakthrough in image recognition and classification, but I have also derived some variations of it, such as generative networks, detection networks, and segmentation networks."

The results of the classification competition are available, and the paper on generative networks has already been released. As for the detection network, everyone has already understood its power through Baidu's press conference.

As for the segmentation network, it was released along with the DreamNet paper these past few days. With this, it can be said that the basic paradigm of several major visual tasks has been established by Meng Fanqi.

In the future, whether it is recognition and classification, segmentation and detection, or transfer and generation, it will be difficult to bypass these lightweight and practical methods.

"As you can see, after this kind of thinking swept through the field of vision, it has brought revolutionary breakthroughs to the main research directions."

Meng Fanqi placed the main experimental conclusions of these papers on the second page of the slideshow, aiming to shock everyone with the results.

"Obviously, these algorithms have opened up a huge gap with the second place in many fields, and a considerable part of the credit should be attributed to the revolutionary impact of the residual thinking on the depth of the network."

"In 2010 and 2011, we were still using manually designed SIFT, HOG, and SVM. In 2012, AlexNet, an eight-layer network designed by Alex, made a huge breakthrough."

"And this year, the deep revolution triggered by the residual thinking has made it possible to train neural networks with 150+ layers."

"Deep neural networks are the basic engines and backbones for many task scenarios, especially in visual tasks. That's why they can quickly influence several mainstream tasks."

"Structurally, DreamNet is nothing special. Compared to networks where each layer was individually designed before, I intentionally wanted it to be simple and repetitive."

Behind Meng Fanqi, the slideshow displayed an extremely long and detailed pattern, which was the structure diagram of the deep DreamNet with over a hundred layers.

When zoomed in to show its basic design, everyone found that its single-layer design was very simple and plain, using only the most conventional operators.

The lengthy structure diagram began to scroll, and the audience discovered that there was no difference between the layers; they simply repeated.

Due to the long scrolling of over a hundred layers, it seemed somewhat comical in this serious setting, causing bursts of laughter in the venue.

"So, this brings up a question, is it always possible to achieve better results by simply enlarging and deepening the network?"

This question posed by Meng Fanqi still had no theoretical answer even by 2023, but it was clear that massive models were continuously creating miracles.

Whether it was painting, dialogue, or image manipulation, none had reached their limits yet.

"I really hope I could clearly explain this answer theoretically, but due to my limited ability, I can only give my own guess, which is 'Yes'."

"I believe that with more and better GPUs, more and better data, larger models, and better optimization methods, we can continue to create miracles."

"The obstacles that previous networks encountered in terms of depth, I believe, are not due to the network's capability, but because we haven't found the right way to optimize it."

Stacking the same layers several more times was something many people had tried, and the results were clearly worse than those of the original network.

In traditional methods, this was not a strange phenomenon. Many people attributed it to the curse of dimensionality or overfitting, without exploring it in sufficient depth.

"If you think about it, this is obviously counter-intuitive. For a deeper network, we could simply copy all the parameters of the smaller network into it, and as long as the extra layers do nothing, the deeper model should at least be no worse."

"But in reality, it is not the case. I believe many people have observed this common phenomenon, that is, the deeper the model, the worse it becomes."
