The Transformer architecture is the model behind the significant advances in NLP over the last few years.
Its main component is the self-attention mechanism, which learns how important each part of the sentence is in relation to each word. For example, in "The animal didn't cross the street because it was too tired", self-attention associates "it" with "animal".
However, self-attention is expensive to compute: its cost scales quadratically with the sentence length.
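To make that cost concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention; the n × n score matrix is what makes it quadratic in the sequence length. The function and variable names here are illustrative, not taken from any particular implementation.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    x: (n, d) sequence of n token embeddings of size d.
    The score matrix is (n, n), so time and memory grow quadratically with n.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv              # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (n, n) pairwise token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over each row
    return weights @ v                             # each output is a weighted mix of all values

n, d = 128, 64
x = np.random.randn(n, d)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)                # out.shape == (128, 64)
```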
A new solution proposed by Google, FNet, replaces self-attention with a Fourier transform. It is only 8% less accurate than BERT on the GLUE benchmark but 7 times faster to compute on a GPU.
The Fourier transform is a mathematical operation that decomposes a complex temporal signal into simpler subcomponents, each defined by a frequency. This operation can be applied to a sentence inside the Transformer architecture as a replacement for the self-attention mechanism. Intuitively, applying a Fourier transform is just encoding the input as a linear combination of the text embeddings.
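A rough sketch of that idea: the paper's mixing sublayer applies a 2D discrete Fourier transform, once along the sequence dimension and once along the hidden dimension, and keeps only the real part. The function name and shapes below are illustrative, not from the official implementation.

```python
import numpy as np

def fourier_mixing(x):
    """Replace self-attention with a parameter-free 2D FFT.

    x: (n, d) token embeddings. fft2 transforms along both the sequence
    and hidden dimensions; keeping only the real part leaves the output
    real-valued and the same shape as the input.
    """
    return np.fft.fft2(x).real

mixed = fourier_mixing(np.random.randn(128, 64))  # shape stays (128, 64)
```

Note that there is nothing to learn in this sublayer: unlike self-attention, it has no weights at all.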
These linear combinations are mixed with simple non-linearities in the feed-forward network. The result is a faster Transformer that can competitively capture the semantic relationships in text across several NLP tasks.
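Putting the two pieces together, one FNet-style encoder block could look roughly like this. It is a simplified sketch: the layer norms, residual connections, and GELU feed-forward follow the standard Transformer layout, and all weights here are random placeholders rather than trained parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def gelu(x):
    # tanh approximation of the GELU non-linearity
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def fnet_block(x, W1, b1, W2, b2):
    """One FNet-style encoder block: Fourier mixing, then a feed-forward network."""
    x = layer_norm(x + np.fft.fft2(x).real)   # mixing sublayer + residual
    hidden = gelu(x @ W1 + b1)                # simple non-linearity in the feed-forward
    return layer_norm(x + hidden @ W2 + b2)   # feed-forward sublayer + residual

n, d, d_ff = 128, 64, 256
x = np.random.randn(n, d)
W1, b1 = np.random.randn(d, d_ff) * 0.02, np.zeros(d_ff)
W2, b2 = np.random.randn(d_ff, d) * 0.02, np.zeros(d)
out = fnet_block(x, W1, b1, W2, b2)           # out.shape == (128, 64)
```

In this sketch, the only non-linearity left in the block is the GELU inside the feed-forward network; all the token mixing is done by the fixed Fourier transform.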
The question this paper raises is: do we really need such complex non-linearities?