DeepMAR description
In real video surveillance scenarios, visual pedestrian attributes such as gender, backpack, and clothing type are very important for pedestrian retrieval and person re-identification. Existing methods for attribute recognition have two drawbacks: (a) handcrafted features (e.g. color histograms, local binary patterns) cannot cope well with the difficulty of real video surveillance scenarios; (b) the relationships among pedestrian attributes are ignored. To address these two drawbacks, we propose two deep learning based models to recognize pedestrian attributes. On the one hand, each attribute is treated as an independent component, and a deep learning based single-attribute recognition model (DeepSAR) is proposed to recognize each attribute one by one. On the other hand, to exploit the relationships among attributes, a deep learning framework that recognizes multiple attributes jointly (DeepMAR) is proposed. In DeepMAR, one attribute can contribute to the representations of other attributes. For example, the gender attribute "woman" can contribute to the representations of "long hair" and "wearing skirt". Experiments on recent popular pedestrian attribute datasets show that our proposed models achieve state-of-the-art results.
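To make "recognizes multiple attributes jointly" concrete, here is a minimal NumPy sketch of a joint multi-label objective: a single forward pass scores all 35 attributes, and a sigmoid cross-entropy is averaged over them, so the shared representation is shaped by every label at once (unlike DeepSAR, which trains one binary classifier per attribute). This is an illustration only, not the exact weighted loss used in the paper or in loss.py.

```python
import numpy as np

def multi_label_bce(logits, labels):
    """Sigmoid cross-entropy averaged over all attributes.

    logits, labels: arrays of shape [batch, 35]; labels are 0/1.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-attribute probability
    eps = 1e-7                             # numerical stability
    loss = -(labels * np.log(probs + eps)
             + (1 - labels) * np.log(1 - probs + eps))
    return loss.mean()

# one image, 35 attribute logits vs. binary ground-truth labels
rng = np.random.default_rng(0)
logits = rng.normal(size=(1, 35))
labels = rng.integers(0, 2, size=(1, 35)).astype(float)
print(multi_label_bce(logits, labels))
```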
model architecture
An illustration of the architecture of our network. (a) is the proposed DeepSAR method, which consists of an input image, a shared network (c), and 2 output nodes. (b) is the proposed DeepMAR method, which consists of an input image, a shared network (c), and 35 output nodes. (c) is the sub-network shared between DeepSAR and DeepMAR. Given an image, DeepSAR outputs a single label indicating whether the image has a given attribute, while DeepMAR outputs a label vector indicating whether the image has each of the attributes.
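For orientation, below is a hypothetical MindSpore sketch of panel (b): a small shared convolutional backbone standing in for (c), feeding 35 sigmoid output nodes. The actual implementation lives in model_deepmar.py; the layer shapes here are placeholders, not the paper's configuration.

```python
import mindspore.nn as nn
import mindspore.ops as ops

class DeepMARSketch(nn.Cell):
    """Hypothetical stand-in for model_deepmar.py: a shared backbone (c)
    plus 35 sigmoid output nodes, one per attribute."""
    def __init__(self, num_attrs=35):
        super().__init__()
        self.features = nn.SequentialCell([  # shared sub-network (c)
            nn.Conv2d(3, 64, 7, stride=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 128, 3), nn.ReLU(),
        ])
        self.pool = ops.ReduceMean(keep_dims=False)  # global average pooling
        self.fc = nn.Dense(128, num_attrs)           # 35 output nodes
        self.sigmoid = ops.Sigmoid()

    def construct(self, x):
        x = self.features(x)
        x = self.pool(x, (2, 3))            # collapse H and W
        return self.sigmoid(self.fc(x))     # per-attribute probabilities
```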
dataset
The preprocessed PETA dataset is included in the repository directory as two pickle files:
peta_dataset.pkl and peta_partition.pkl
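The pickles are plain Python objects, so they can be inspected directly. A minimal sketch (the internal key layout is produced by create_pkl_ofpeta.py and is not documented here, hence only the top-level types are printed):

```python
import pickle

with open('peta_dataset.pkl', 'rb') as f:
    dataset = pickle.load(f)
with open('peta_partition.pkl', 'rb') as f:
    partition = pickle.load(f)

print(type(dataset), type(partition))  # inspect before assuming a schema
```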
feature
Mixed precision training accelerates the training of deep neural networks by using both single-precision (FP32) and half-precision (FP16) data formats, while maintaining the accuracy achieved with single-precision training. It speeds up computation, reduces memory usage, and enables larger models or batch sizes to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for 'reduce precision'.
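As a hedged sketch of what enabling this looks like in MindSpore 1.x: the Model wrapper accepts an amp_level argument ("O0" disables mixed precision; "O2"/"O3" cast progressively more operators to FP16). The tiny network, loss, and optimizer below are placeholders for what train.py actually builds.

```python
import mindspore.nn as nn
from mindspore import Model

# Placeholders for the real DeepMAR network, loss, and optimizer.
net = nn.SequentialCell([nn.Dense(2048, 35), nn.Sigmoid()])
loss = nn.BCELoss(reduction='mean')
opt = nn.SGD(net.trainable_params(), learning_rate=0.01, momentum=0.9)

# amp_level="O2" keeps precision-sensitive layers (e.g. BatchNorm) in FP32
# and runs most other operators in FP16.
model = Model(net, loss_fn=loss, optimizer=opt, amp_level="O2")
```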
environment
- hardware
  - Prepare a hardware environment with an Ascend or GPU processor. To apply for access to Ascend resources, send the application form to ascend@huawei.com; once approved, you can get the resources.
- framework
  - MindSpore (version 1.2.0, per the performance table below)
quick start
python train.py
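train.py runs both training and the final test pass (see Script Description below). To re-score a saved checkpoint such as deepmar_best.ckpt, evaluate.py can presumably be run the same way, though its exact arguments are not documented here:

python evaluate.py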
Script Description
│  52epoch的结果.txt           // results after 52 epochs (filename in Chinese)
│  create_pkl_ofpeta.py       // creates the PETA .pkl files from PETA.mat
│  dataset.py
│  deepmar_best.ckpt          // the checkpoint
│  evaluate.py
│  list.txt
│  loss.py
│  main.py
│  model_deepmar.py           // the DeepMAR model
│  PETA.mat
│  peta_dataset.pkl           // dataset
│  peta_partition.pkl         // dataset partition
│  train.py                   // training as well as testing
│  train_with_newBceloss.py
│  try.py
model description
performance
DeepMAR on PETA
| Parameters                 | Ascend                                            |
| -------------------------- | ------------------------------------------------- |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755 GB |
| MindSpore Version          | 1.2.0                                             |
| Dataset                    | PETA                                              |
| Optimizer                  | SGD                                               |
| Loss Function              | BCELoss                                           |
| Checkpoint for Fine tuning | 90.1 MB                                           |
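The "Checkpoint for Fine tuning" row refers to deepmar_best.ckpt. A minimal sketch of loading it for fine-tuning, assuming `net` is rebuilt from model_deepmar.py so its parameter names match the checkpoint:

```python
from mindspore import load_checkpoint, load_param_into_net
import mindspore.nn as nn

# Placeholder; fine-tuning would instantiate the real DeepMAR model
# from model_deepmar.py instead.
net = nn.Dense(2048, 35)

param_dict = load_checkpoint("deepmar_best.ckpt")  # read the saved weights
load_param_into_net(net, param_dict)               # copy them into the net
```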
result
Raw evaluation output on the PETA test set. Each array holds per-attribute values for the 35 attributes; the scalars at the end are the instance-level summary (all_acac 0.9252, instance_acc 0.7567, instance_precision 0.8569, instance_recall 0.8270, instance_F1 0.8417).
{'label_pos_acc': array([0.86713652, 0.75788611, 0.70218228, 0.90455531, 0.750167 ,
0.60548661, 0.97082188, 0.96735951, 0.73154362, 0.75372393,
0.83586207, 0.58614232, 0.79395248, 0.77968397, 0.3277027 ,
0.84368071, 0.90957319, 0.75472527, 0.87183544, 0.93921978,
0.70157819, 0.77832512, 0.68717949, 0.40277778, 0.70800288,
0.61685824, 0.7706422 , 0.56695157, 0.64656716, 0.36507937,
0.32568807, 0.8498354 , 0.584375 , 0.79166667, 0.2375 ]), 'label_neg_acc': array([0.85539931, 0.88815662, 0.98270048, 0.99467713, 0.94641979,
0.9164607 , 0.72011385, 0.7239819 , 0.97300595, 0.97254664,
0.99054545, 0.98315879, 0.91731315, 0.94670381, 0.99219606,
0.95824707, 0.92093831, 0.90197183, 0.99799082, 0.75672766,
0.93574151, 0.99418683, 0.9870278 , 0.98699034, 0.86448404,
0.98923559, 0.96635945, 0.99020555, 0.92590717, 0.99732406,
0.98780818, 0.83511367, 0.98060345, 0.93001931, 0.99654255]), 'label_acc': array([0.86126791, 0.82302137, 0.84244138, 0.94961622, 0.8482934 ,
0.76097366, 0.84546786, 0.8456707 , 0.85227479, 0.86313529,
0.91320376, 0.78465056, 0.85563282, 0.86319389, 0.65994938,
0.90096389, 0.91525575, 0.82834855, 0.93491313, 0.84797372,
0.81865985, 0.88625598, 0.83710364, 0.69488406, 0.78624346,
0.80304691, 0.86850082, 0.77857856, 0.78623717, 0.68120171,
0.65674813, 0.84247453, 0.78248922, 0.86084299, 0.61702128]), 'avg_acc': array([0.86131579, 0.84631579, 0.95394737, 0.98921053, 0.90776316,
0.85381579, 0.93605263, 0.93197368, 0.93986842, 0.94355263,
0.97578947, 0.95526316, 0.87973684, 0.89802632, 0.96631579,
0.93105263, 0.91473684, 0.85789474, 0.9875 , 0.89460526,
0.87131579, 0.98842105, 0.96394737, 0.97592105, 0.80736842,
0.97644737, 0.93828947, 0.97065789, 0.86434211, 0.98684211,
0.96881579, 0.84276316, 0.94723684, 0.86710526, 0.98855263]), 'all_acac': 0.9252218045112782, 'instance_acc': 0.7567285432897856, 'instance_precision': 0.8569174907694643, 'instance_recall': 0.8270195802005013, 'instance_F1': 0.8417031202649606}
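For reference, the label-based numbers above follow the usual pedestrian-attribute conventions; the dump itself confirms that label_acc is the average of the positive and negative accuracies (e.g. (0.8671 + 0.8554) / 2 = 0.8613 for the first attribute). A NumPy sketch, with the avg_acc and instance-level definitions assumed to match evaluate.py:

```python
import numpy as np

def label_based_metrics(gt, pred):
    """gt, pred: [N, 35] binary arrays (ground truth / predictions)."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    pos_acc = (gt & pred).sum(0) / np.maximum(gt.sum(0), 1)       # TP / P
    neg_acc = (~gt & ~pred).sum(0) / np.maximum((~gt).sum(0), 1)  # TN / N
    label_acc = (pos_acc + neg_acc) / 2   # matches the dump above
    avg_acc = (gt == pred).mean(0)        # assumed: plain per-label accuracy
    return pos_acc, neg_acc, label_acc, avg_acc

def instance_metrics(gt, pred):
    """Example-based accuracy/precision/recall/F1 over label sets."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    inter = (gt & pred).sum(1).astype(float)
    acc = (inter / np.maximum((gt | pred).sum(1), 1)).mean()
    prec = (inter / np.maximum(pred.sum(1), 1)).mean()
    rec = (inter / np.maximum(gt.sum(1), 1)).mean()
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1
```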