Yuxin Fang<sup>2,1</sup>, Quan Sun<sup>1</sup>, Xinggang Wang<sup>2</sup>, Tiejun Huang<sup>1</sup>, Xinlong Wang<sup>1</sup>, Yue Cao<sup>1</sup>
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.
With an updated plain Transformer architecture as well as extensive pre-training from an open & accessible giant CLIP vision encoder, EVA-02 demonstrates superior performance compared to prior state-of-the-art approaches across various representative vision tasks, while utilizing significantly fewer parameters and compute budgets.
Notably, using exclusively publicly accessible training data, EVA-02 with only 304M parameters achieves 90.0% fine-tuning top-1 accuracy on the ImageNet-1K val set.
Additionally, EVA-02-CLIP reaches up to 80.4% zero-shot top-1 accuracy on ImageNet-1K, outperforming the previous largest & best open-sourced CLIP while using only ~1/6 of the parameters and ~1/6 of the image-text training data.
We offer four EVA-02 variants in various model sizes, ranging from 6M to 304M parameters, all with impressive performance.
We hope our efforts enable a broader range of the research community to advance the field in a more efficient, affordable and equitable manner.
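The pre-training objective described above — predicting language-aligned CLIP vision features at masked patch positions — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (patch grid size, mask ratio, and a negative cosine-similarity loss are all illustrative choices, not EVA-02's actual training code):

```python
import numpy as np

# Sketch of masked image modeling against a frozen CLIP teacher:
# a student network predicts the teacher's patch features at masked
# positions, trained to maximize cosine similarity with them.
rng = np.random.default_rng(0)

num_patches, dim = 196, 768          # e.g. a 14x14 patch grid (assumed)
mask_ratio = 0.4                     # fraction of patches masked (assumed)

# Stand-ins for frozen CLIP teacher features and student predictions.
teacher_feats = rng.normal(size=(num_patches, dim))
student_preds = rng.normal(size=(num_patches, dim))

# Randomly choose masked patches; the loss is computed only there.
num_masked = int(mask_ratio * num_patches)
masked_idx = rng.choice(num_patches, size=num_masked, replace=False)

def cosine_reconstruction_loss(pred, target):
    """Negative mean cosine similarity between predictions and targets."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    return -np.mean(np.sum(pred * target, axis=-1))

loss = cosine_reconstruction_loss(student_preds[masked_idx],
                                  teacher_feats[masked_idx])
print(f"masked-patch reconstruction loss: {loss:.4f}")
```

Minimizing this loss drives the student's masked-patch predictions toward the teacher's features; a perfect student would reach a loss of -1.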
```bibtex
@article{EVA02,
  title={EVA-02: A Visual Representation for Neon Genesis},
  author={Fang, Yuxin and Sun, Quan and Wang, Xinggang and Huang, Tiejun and Wang, Xinlong and Cao, Yue},
  journal={arXiv preprint arXiv:2303.11331},
  year={2023}
}
```
EVA-02 is built upon these awesome projects: EVA-01, BEiT, BEiTv2, CLIP, MAE, timm, DeepSpeed, Apex, xFormer, detectron2, mmcv, mmdet, mmseg, ViT-Adapter, detrex, and rotary-embedding-torch.
For help with EVA-02 or to report a bug, please open a GitHub issue with the label EVA-02.
Let's build a better & stronger EVA-02 together :)
We are hiring at all levels at BAAI Vision Team, including full-time researchers, engineers and interns.
If you are interested in working with us on foundation models, self-supervised learning and multimodal learning, please contact Yue Cao (caoyue@baai.ac.cn) and Xinlong Wang (wangxinlong@baai.ac.cn).