Showing 1-3 of 1234 results

  1. arXiv:1706.03762

    Attention Is All You Need

    Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar

    The dominant sequence transduction models are based on complex...

    Submitted 12 June, 2017; originally announced June 2017.

  2. arXiv:2106.13083

    Attention Is Not All You Need: Pure Attention Loses Rank

    Authors: Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas

    Attention-based architectures have become ubiquitous...

    Submitted 23 June, 2021.

  3. arXiv:2009.06732

    Efficient Transformers: A Survey

    Authors: Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

    Transformer model architectures have garnered immense interest...

    Submitted 14 September, 2020.