Going deeper with Image Transformers

Paper: https://arxiv.org/abs/2103.17239 Implementation: https://github.com/facebookresearch/deit What is it? Introducing LayerScale and class-attention layers into ViT substantially improves the performance of ViTs with many layers. LayerScale: learnable…

2021-04-18

TransFG: A Transformer Architecture for Fine-grained Recognition

Paper: https://arxiv.org/abs/2103.07976 Implementation: https://github.com/TACJu/TransFG What is it? Fine-grained Visual Classification (FGVC: multi-class image classification at fine granularity within a specific domain, for example identifying animal or plant species)…

2021-04-18

Training data-efficient image transformers & distillation through attention (DeiT)

Vision Transformer DeepLearning Paper Reading

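What is it? Vision Transformer (ViT) achieves SOTA on image classification tasks, but it requires large-scale training data (JFT-300M, about 300 million images) and substantial compute (680 to 2,500 TPUv3-days). DeiT uses token-based distillation and a refined training procedure; compared with ViT, its accuracy and throughput…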

2021-04-15

DeepViT: Towards Deeper Vision Transformer

DeepLearning Paper Reading Vision Transformer

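Paper: https://arxiv.org/abs/2103.11886 What is it? Vision Transformer (ViT) saturates in performance more readily than CNNs as the number of layers increases. In ViT, attention maps tend to become more similar as the layers get deeper, and past a certain layer they become almost identical.…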
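
The DeepViT excerpt above says that attention maps in deeper layers become nearly identical. The sketch below shows one way to quantify that. It assumes row-wise cosine similarity as the measure, and the function names and toy attention values are hypothetical, chosen for illustration rather than taken from the paper's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attention_map_similarity(map_p, map_q):
    """Mean cosine similarity between matching rows of two attention maps.

    Each map is a list of rows; row t holds the attention weights that
    token t assigns to every token in the sequence.
    """
    sims = [cosine_similarity(row_p, row_q) for row_p, row_q in zip(map_p, map_q)]
    return sum(sims) / len(sims)


# Toy 3-token attention maps (hypothetical values; each row sums to 1).
shallow = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
deep_a = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
deep_b = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]

sim_shallow_deep = attention_map_similarity(shallow, deep_a)
sim_deep_deep = attention_map_similarity(deep_a, deep_b)
print(f"shallow vs deep: {sim_shallow_deep:.3f}")
print(f"deep vs deep:    {sim_deep_deep:.3f}")
```

In a real model the maps would come from each block's softmax attention output, averaged or compared per head. A similarity close to 1 between adjacent deep layers would match the collapse described in the excerpt.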

Sleep like a pillow

Notes on deep learning.

Articles for the one month from 2021-04-01
