TimeSformer-pytorch

Phil Wang 9d22e8b183 add ability to mask out frames, for variable lengthed videos in the same batch		3 years ago
TimeSformer - Pytorch

Implementation of TimeSformer, from Facebook AI: a pure and simple attention-based approach to video classification that reaches state-of-the-art (SOTA) results. This repository houses only the best-performing variant, 'Divided Space-Time Attention', which applies attention along the time axis first and then along the spatial axes.
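
To make the "time first, then space" ordering concrete, here is a minimal sketch of one divided space-time attention step. It is an illustration under simplifying assumptions, not this repository's implementation: the helper names `self_attend` and `divided_space_time_block` are hypothetical, and the sketch omits the learned query/key/value projections, multiple heads, normalization, and feed-forward layers that a full model would have.

```python
import torch

def self_attend(tokens):
    # tokens: (..., n, dim). Plain softmax self-attention over the
    # second-to-last axis, with no learned projections (a simplification).
    scale = tokens.shape[-1] ** -0.5
    scores = tokens @ tokens.transpose(-1, -2) * scale
    return torch.softmax(scores, dim=-1) @ tokens

def divided_space_time_block(x):
    # x: (batch, frames, patches, dim)
    # 1) Time attention: move frames onto the sequence axis so that each
    #    patch position attends only to the same position in other frames.
    xt = x.transpose(1, 2)                   # (batch, patches, frames, dim)
    x = x + self_attend(xt).transpose(1, 2)  # residual connection
    # 2) Space attention: patches are already the sequence axis, so each
    #    patch attends only to patches within its own frame.
    x = x + self_attend(x)
    return x

# 8 frames, (224 / 16) ** 2 = 196 patches per frame, 64-dim tokens
tokens = torch.randn(2, 8, 196, 64)
out = divided_space_time_block(tokens)
print(out.shape)  # torch.Size([2, 8, 196, 64])
```

Each token attends to 8 frames and then 196 patches, rather than to all 8 * 196 tokens at once, which is what keeps the divided scheme cheaper than joint space-time attention.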

Press release

Install

$ pip install timesformer-pytorch

Usage

import torch
from timesformer_pytorch import TimeSformer

model = TimeSformer(
    dim = 512,
    image_size = 224,
    patch_size = 16,
    num_frames = 8,
    num_classes = 10,
    depth = 12,
    heads = 8,
    dim_head = 64,
    attn_dropout = 0.1,
    ff_dropout = 0.1
)

video = torch.randn(2, 8, 3, 224, 224) # (batch x frames x channels x height x width)
mask = torch.ones(2, 8).bool() # (batch x frames) - use a mask if there are variable-length videos in the same batch

pred = model(video, mask = mask) # (2, 10)
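
For a batch that really does contain clips of different lengths, the video tensor and mask could be built along the following lines. This is a sketch only: `pad_videos` is a hypothetical helper, not part of `timesformer_pytorch`, and it assumes that `True` in the mask marks a real frame and `False` marks padding, which matches the all-ones mask used above for full-length clips.

```python
import torch

def pad_videos(videos, num_frames):
    # videos: list of tensors shaped (frames_i, channels, height, width),
    # each with frames_i <= num_frames.
    # Assumption: True in the mask means "real frame", False means padding.
    batch = torch.zeros(len(videos), num_frames, *videos[0].shape[1:])
    mask = torch.zeros(len(videos), num_frames, dtype=torch.bool)
    for i, clip in enumerate(videos):
        n = clip.shape[0]
        batch[i, :n] = clip  # copy real frames; the rest stays zero-padded
        mask[i, :n] = True   # mark only the real frames as valid
    return batch, mask

clips = [torch.randn(8, 3, 224, 224), torch.randn(5, 3, 224, 224)]
video, mask = pad_videos(clips, num_frames=8)
print(video.shape, mask.shape)  # torch.Size([2, 8, 3, 224, 224]) torch.Size([2, 8])
```

The resulting `video` and `mask` have the same shapes as in the usage example, so they can be passed as `model(video, mask = mask)`.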

Citations

@misc{bertasius2021spacetime,
    title   = {Is Space-Time Attention All You Need for Video Understanding?}, 
    author  = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
    year    = {2021},
    eprint  = {2102.05095},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
