DeepMAR description
In real video surveillance scenarios, visual pedestrian attributes such as gender, backpack, and clothing type are very important for pedestrian retrieval and person re-identification. Existing methods for attribute recognition have two drawbacks: (a) handcrafted features (e.g. color histograms, local binary patterns) cannot cope well with the difficulty of real video surveillance scenarios; (b) the relationships among pedestrian attributes are ignored. To address these two drawbacks, we propose two deep learning based models to recognize pedestrian attributes. On the one hand, each attribute is treated as an independent component, and a deep learning based single-attribute recognition model (DeepSAR) is proposed to recognize each attribute one by one. On the other hand, to exploit the relationships among attributes, a deep learning framework that recognizes multiple attributes jointly (DeepMAR) is proposed. In DeepMAR, one attribute can contribute to the representations of other attributes. For example, the gender attribute "woman" can contribute to the representations of "long hair" and "wearing skirt". Experiments on recent popular pedestrian attribute datasets show that our proposed models achieve state-of-the-art results.
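To make "recognizes multiple attributes jointly" concrete, here is a minimal NumPy sketch of a joint multi-label objective: a single forward pass scores all 35 attributes, and a sigmoid cross-entropy is averaged over them, so the shared representation is shaped by every label at once (unlike DeepSAR, which trains one binary classifier per attribute). This is an illustration only, not the exact weighted loss used in the paper or in loss.py.

```python
import numpy as np

def multi_label_bce(logits, labels):
    """Sigmoid cross-entropy averaged over all attributes.

    logits, labels: arrays of shape [batch, 35]; labels are 0/1.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-attribute probability
    eps = 1e-7                             # numerical stability
    loss = -(labels * np.log(probs + eps)
             + (1 - labels) * np.log(1 - probs + eps))
    return loss.mean()

# one image, 35 attribute logits vs. binary ground-truth labels
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 35))
labels = rng.integers(0, 2, size=(1, 35)).astype(float)
print(multi_label_bce(logits, labels))
```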
model architecture
An illustration of the architecture of our network. (a) is the proposed DeepSAR method, which consists of an input image, a shared network (c), and 2 output nodes. (b) is the proposed DeepMAR method, which consists of an input image, a shared network (c), and 35 output nodes. (c) is the sub-network shared between DeepSAR and DeepMAR. Given an image, DeepSAR outputs a single label indicating whether the image has a given attribute, while DeepMAR outputs a label vector indicating whether the image has each of the attributes.
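For orientation, below is a hypothetical MindSpore sketch of panel (b): a small shared convolutional backbone standing in for (c), feeding 35 sigmoid output nodes. The actual implementation lives in model_deepmar.py; the layer shapes here are placeholders, not the paper's configuration.

```python
import mindspore.nn as nn
import mindspore.ops as ops

class DeepMARSketch(nn.Cell):
    """Hypothetical stand-in for model_deepmar.py: a shared backbone (c)
    plus 35 sigmoid output nodes, one per attribute."""
    def __init__(self, num_attrs=35):
        super().__init__()
        self.features = nn.SequentialCell([  # shared sub-network (c)
            nn.Conv2d(3, 64, 7, stride=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 128, 3), nn.ReLU(),
        ])
        self.pool = ops.ReduceMean(keep_dims=False)  # global average pooling
        self.fc = nn.Dense(128, num_attrs)           # 35 output nodes
        self.sigmoid = ops.Sigmoid()

    def construct(self, x):
        x = self.features(x)
        x = self.pool(x, (2, 3))            # collapse H and W
        return self.sigmoid(self.fc(x))     # per-attribute probabilities
```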
dataset
The preprocessed PETA dataset is included in the repository directory as two pickle files:
peta_dataset.pkl and peta_partition.pkl
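The pickles are plain Python objects, so they can be inspected directly. A minimal sketch (the internal key layout is produced by create_pkl_ofpeta.py and is not documented here, hence only the top-level types are printed):

```python
import pickle

with open('peta_dataset.pkl', 'rb') as f:
    dataset = pickle.load(f)
with open('peta_partition.pkl', 'rb') as f:
    partition = pickle.load(f)

print(type(dataset), type(partition))  # inspect before assuming a schema
```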
feature
Mixed precision training accelerates the training of deep neural networks by using both single-precision (FP32) and half-precision (FP16) data formats, while maintaining the accuracy achieved with single-precision training. It speeds up computation, reduces memory usage, and enables larger models or batch sizes to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for 'reduce precision'.
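As a hedged sketch of what enabling this looks like in MindSpore 1.x: the Model wrapper accepts an amp_level argument ("O0" disables mixed precision; "O2"/"O3" cast progressively more operators to FP16). The tiny network, loss, and optimizer below are placeholders for what train.py actually builds.

```python
import mindspore.nn as nn
from mindspore import Model

# Placeholders for the real DeepMAR network, loss, and optimizer.
net = nn.SequentialCell([nn.Dense(2048, 35), nn.Sigmoid()])
loss = nn.BCELoss(reduction='mean')
opt = nn.SGD(net.trainable_params(), learning_rate=0.01, momentum=0.9)

# amp_level="O2" keeps precision-sensitive layers (e.g. BatchNorm) in FP32
# and runs most other operators in FP16.
model = Model(net, loss_fn=loss, optimizer=opt, amp_level="O2")
```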
environment
- hardware
  - Prepare a hardware environment with an Ascend or GPU processor. To apply for access to Ascend resources, send the application form to ascend@huawei.com; once approved, you can get the resources.
- framework
  - MindSpore (version 1.2.0, per the performance table below)
quick start
python train.py
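train.py runs both training and the final test pass (see Script Description below). To re-score a saved checkpoint such as deepmar_best.ckpt, evaluate.py can presumably be run the same way, though its exact arguments are not documented here:

python evaluate.py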
Script Description
│  52epoch的结果.txt           // results after 52 epochs (filename in Chinese)
│  create_pkl_ofpeta.py       // creates the PETA .pkl files from PETA.mat
│  dataset.py
│  deepmar_best.ckpt          // the checkpoint
│  evaluate.py
│  list.txt
│  loss.py
│  main.py
│  model_deepmar.py           // the DeepMAR model
│  PETA.mat
│  peta_dataset.pkl           // dataset
│  peta_partition.pkl         // dataset partition
│  train.py                   // training as well as testing
│  train_with_newBceloss.py
│  try.py
model description
performance
DeepMAR on PETA
| Parameters                 | Ascend                                            |
| -------------------------- | ------------------------------------------------- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755 GB |
| MindSpore Version          | 1.2.0                                             |
| Dataset                    | PETA                                              |
| Optimizer                  | SGD                                               |
| Loss Function              | BCELoss                                           |
| Checkpoint for Fine tuning | 90.1 MB                                           |
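The "Checkpoint for Fine tuning" row refers to deepmar_best.ckpt. A minimal sketch of loading it for fine-tuning, assuming `net` is rebuilt from model_deepmar.py so its parameter names match the checkpoint:

```python
from mindspore import load_checkpoint, load_param_into_net
import mindspore.nn as nn

# Placeholder; fine-tuning would instantiate the real DeepMAR model
# from model_deepmar.py instead.
net = nn.Dense(2048, 35)

param_dict = load_checkpoint("deepmar_best.ckpt")  # read the saved weights
load_param_into_net(net, param_dict)               # copy them into the net
```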
result
Raw evaluation output on the PETA test set. Each array holds per-attribute values for the 35 attributes; the scalars at the end are the instance-level summary (all_acac 0.9252, instance_acc 0.7567, instance_precision 0.8569, instance_recall 0.8270, instance_F1 0.8417).
{'label_pos_acc': array([0.86713652, 0.75788611, 0.70218228, 0.90455531, 0.750167 ,
0.60548661, 0.97082188, 0.96735951, 0.73154362, 0.75372393,
0.83586207, 0.58614232, 0.79395248, 0.77968397, 0.3277027 ,
0.84368071, 0.90957319, 0.75472527, 0.87183544, 0.93921978,
0.70157819, 0.77832512, 0.68717949, 0.40277778, 0.70800288,
0.61685824, 0.7706422 , 0.56695157, 0.64656716, 0.36507937,
0.32568807, 0.8498354 , 0.584375 , 0.79166667, 0.2375 ]), 'label_neg_acc': array([0.85539931, 0.88815662, 0.98270048, 0.99467713, 0.94641979,
0.9164607 , 0.72011385, 0.7239819 , 0.97300595, 0.97254664,
0.99054545, 0.98315879, 0.91731315, 0.94670381, 0.99219606,
0.95824707, 0.92093831, 0.90197183, 0.99799082, 0.75672766,
0.93574151, 0.99418683, 0.9870278 , 0.98699034, 0.86448404,
0.98923559, 0.96635945, 0.99020555, 0.92590717, 0.99732406,
0.98780818, 0.83511367, 0.98060345, 0.93001931, 0.99654255]), 'label_acc': array([0.86126791, 0.82302137, 0.84244138, 0.94961622, 0.8482934 ,
0.76097366, 0.84546786, 0.8456707 , 0.85227479, 0.86313529,
0.91320376, 0.78465056, 0.85563282, 0.86319389, 0.65994938,
0.90096389, 0.91525575, 0.82834855, 0.93491313, 0.84797372,
0.81865985, 0.88625598, 0.83710364, 0.69488406, 0.78624346,
0.80304691, 0.86850082, 0.77857856, 0.78623717, 0.68120171,
0.65674813, 0.84247453, 0.78248922, 0.86084299, 0.61702128]), 'avg_acc': array([0.86131579, 0.84631579, 0.95394737, 0.98921053, 0.90776316,
0.85381579, 0.93605263, 0.93197368, 0.93986842, 0.94355263,
0.97578947, 0.95526316, 0.87973684, 0.89802632, 0.96631579,
0.93105263, 0.91473684, 0.85789474, 0.9875 , 0.89460526,
0.87131579, 0.98842105, 0.96394737, 0.97592105, 0.80736842,
0.97644737, 0.93828947, 0.97065789, 0.86434211, 0.98684211,
0.96881579, 0.84276316, 0.94723684, 0.86710526, 0.98855263]), 'all_acac': 0.9252218045112782, 'instance_acc': 0.7567285432897856, 'instance_precision': 0.8569174907694643, 'instance_recall': 0.8270195802005013, 'instance_F1': 0.8417031202649606}
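For reference, the label-based numbers above follow the usual pedestrian-attribute conventions; the dump itself confirms that label_acc is the average of the positive and negative accuracies (e.g. (0.8671 + 0.8554) / 2 = 0.8613 for the first attribute). A NumPy sketch, with the avg_acc and instance-level definitions assumed to match evaluate.py:

```python
import numpy as np

def label_based_metrics(gt, pred):
    """gt, pred: [N, 35] binary arrays (ground truth / predictions)."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    pos_acc = (gt & pred).sum(0) / np.maximum(gt.sum(0), 1)       # TP / P
    neg_acc = (~gt & ~pred).sum(0) / np.maximum((~gt).sum(0), 1)  # TN / N
    label_acc = (pos_acc + neg_acc) / 2   # matches the dump above
    avg_acc = (gt == pred).mean(0)        # assumed: plain per-label accuracy
    return pos_acc, neg_acc, label_acc, avg_acc

def instance_metrics(gt, pred):
    """Example-based accuracy/precision/recall/F1 over label sets."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = (gt & pred).sum(1).astype(float)
    acc = (inter / np.maximum((gt | pred).sum(1), 1)).mean()
    prec = (inter / np.maximum(pred.sum(1), 1)).mean()
    rec = (inter / np.maximum(gt.sum(1), 1)).mean()
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1
```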