The implementation includes multi-processing training with CPU and mixed training with CPU and multi-GPU.
Currently, two built-in datasets are supported: youtube and blog. Use --data_file youtube to select the youtube dataset and --data_file blog to select the blog dataset.
The data is available at https://data.dgl.ai/dataset/DeepWalk/youtube.zip and https://data.dgl.ai/dataset/DeepWalk/blog.zip
youtube.zip contains youtube-net.txt, youtube-vocab.txt and youtube-label.txt; blog.zip contains blog-net.txt, blog-vocab.txt and blog-label.txt.
For other datasets, pass the full path of the network file to the trainer through --data_file. The network file should list one edge per line in the following format:
1(node id) 2(node id)
1 3
1 4
2 4
...
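As a quick sanity check before training on a custom dataset, a network file in the format above can be parsed with a few lines of Python. This is only a minimal sketch and not the loader used by deepwalk.py: the function name `read_edge_list` is hypothetical, node ids are assumed to be integers, only the first two fields of each line are read, and edges are assumed to be undirected.

```python
from collections import defaultdict


def read_edge_list(path):
    """Read a whitespace-separated edge list into an adjacency dict.

    Hypothetical helper for illustration; not part of this repository.
    """
    adjacency = defaultdict(set)
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) < 2:
                raise ValueError(
                    f"line {line_no}: expected two node ids, got {line!r}"
                )
            # Assumption: node ids are integers.
            src, dst = int(parts[0]), int(parts[1])
            # Assumption: the graph is undirected, so store both directions.
            adjacency[src].add(dst)
            adjacency[dst].add(src)
    return adjacency
```

Calling `read_edge_list("my-net.txt")` (an example path) and checking `len(adjacency)` against the expected number of nodes is a cheap way to catch malformed lines early.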
To run the code:
python3 deepwalk.py --data_file youtube --output_emb_file emb.txt --mix --lr 0.2 --gpus 0 1 2 3 --batch_size 100 --negative 5
By default, the trained embedding is saved to the file given by --output_emb_file FILE_NAME as a numpy object.
To save the trained embedding in raw text (txt) format, use the --save_in_txt argument.
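The saved numpy object can then be inspected from Python. The sketch below assumes that the file is readable with `np.load` and holds a 2-D array of shape (number of nodes, embedding dimension), with row i belonging to node index i; the helper names `load_embedding` and `cosine_similarity` are hypothetical and not part of this repository.

```python
import numpy as np


def load_embedding(path):
    """Load an embedding matrix saved as a numpy object."""
    emb = np.load(path)
    # Assumption: one row per node, one column per embedding dimension.
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {emb.shape}")
    return emb


def cosine_similarity(emb, i, j):
    """Cosine similarity between the embeddings of rows i and j."""
    a, b = emb[i], emb[j]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Return 0.0 for zero vectors instead of dividing by zero.
    return float(a @ b / denom) if denom > 0 else 0.0
```

If the row-to-node mapping differs in your setup (for example, when a vocabulary file remaps node ids), translate ids to row indices before calling `cosine_similarity`.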
To evaluate the embedding on multi-label classification, please refer to here
YouTube (1M nodes).
Implementation | Macro-F1 (%) 1% 3% 5% 7% 9% | Micro-F1 (%) 1% 3% 5% 7% 9% |
---|---|---|
gensim.word2vec(hs) | 28.73 32.51 33.67 34.28 34.79 | 35.73 38.34 39.37 40.08 40.77 |
gensim.word2vec(ns) | 28.18 32.25 33.56 34.60 35.22 | 35.35 37.69 38.08 40.24 41.09 |
ours | 24.58 31.23 33.97 35.41 36.48 | 38.93 43.17 44.73 45.42 45.92 |
The running-time comparison is shown below; the numbers in parentheses denote the time spent on random walks.
Implementation | gensim.word2vec(hs) | gensim.word2vec(ns) | Ours |
---|---|---|---|
Time (s) | 27119.6(1759.8) | 10580.3(1704.3) | 428.89 |
Parameters.
Speeding-up with mixed CPU & multi-GPU. The used parameters are the same as above.
#GPUs | 1 | 2 | 4 |
---|---|---|---|
Time (s) | 1419.64 | 952.04 | 428.89 |
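To read the table above in terms of scaling, the speedup relative to one GPU and the parallel efficiency (speedup divided by the number of GPUs) can be computed directly from the reported times. The function name `gpu_scaling` is introduced here only for illustration.

```python
def gpu_scaling(times):
    """Return {n_gpus: (speedup, efficiency)} relative to the 1-GPU time."""
    baseline = times[1]
    result = {}
    for n_gpus, seconds in times.items():
        speedup = baseline / seconds
        efficiency = speedup / n_gpus
        result[n_gpus] = (speedup, efficiency)
    return result


# Running times in seconds, copied from the table above.
reported = {1: 1419.64, 2: 952.04, 4: 428.89}
for n_gpus, (speedup, efficiency) in gpu_scaling(reported).items():
    print(f"{n_gpus} GPU(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With the reported numbers this gives a speedup of about 1.49x on 2 GPUs and about 3.31x on 4 GPUs.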
You can run the code directly with:
python3 deepwalk.py --ogbl_name xxx --load_from_ogbl
However, ogb.linkproppred might not be compatible with mixed training on multiple GPUs. If you want to use mixed training, use no more than one GPU with the command above.
For evaluation we follow the code mlp.py provided by ogb here.
ogbl-collab
python3 deepwalk.py --ogbl_name ogbl-collab --load_from_ogbl --save_in_pt --output_emb_file collab-embedding.pt --num_walks 50 --window_size 2 --walk_length 40 --lr 0.1 --negative 1 --neg_weight 1 --lap_norm 0.01 --mix --gpus 0 --num_threads 4 --print_interval 2000 --print_loss --batch_size 128 --use_context_weight
cd ./ogb/blob/master/examples/linkproppred/collab/
cp embedding_pt_file_path ./
python3 mlp.py --device 0 --runs 10 --use_node_embedding
ogbl-ddi
python3 deepwalk.py --ogbl_name ogbl-ddi --load_from_ogbl --save_in_pt --output_emb_file ddi-embedding.pt --num_walks 50 --window_size 2 --walk_length 80 --lr 0.1 --negative 1 --neg_weight 1 --lap_norm 0.05 --only_gpu --gpus 0 --num_threads 4 --print_interval 2000 --print_loss --batch_size 16 --use_context_weight
cd ./ogb/blob/master/examples/linkproppred/ddi/
cp embedding_pt_file_path ./
python3 mlp.py --device 0 --runs 10 --epochs 100
ogbl-ppa
python3 deepwalk.py --ogbl_name ogbl-ppa --load_from_ogbl --save_in_pt --output_emb_file ppa-embedding.pt --negative 1 --neg_weight 1 --batch_size 64 --print_interval 2000 --print_loss --window_size 1 --num_walks 30 --walk_length 80 --lr 0.1 --lap_norm 0.02 --mix --gpus 0 --num_threads 4
cp embedding_pt_file_path ./
python3 mlp.py --device 2 --runs 10
ogbl-citation
python3 deepwalk.py --ogbl_name ogbl-citation --load_from_ogbl --save_in_pt --output_emb_file embedding.pt --window_size 2 --num_walks 10 --negative 1 --neg_weight 1 --walk_length 80 --batch_size 128 --print_loss --print_interval 1000 --mix --gpus 0 --use_context_weight --num_threads 4 --lap_norm 0.01 --lr 0.1
cp embedding_pt_file_path ./
python3 mlp.py --device 2 --runs 10 --use_node_embedding
ogbl-collab
#params: 61258346(model) + 131841(mlp) = 61390187
Hits@10
Highest Train: 74.83 ± 4.79
Highest Valid: 40.03 ± 2.98
Final Train: 74.51 ± 4.92
Final Test: 31.13 ± 2.47
Hits@50
Highest Train: 98.83 ± 0.15
Highest Valid: 60.61 ± 0.32
Final Train: 98.74 ± 0.17
Final Test: 50.37 ± 0.34
Hits@100
Highest Train: 99.86 ± 0.04
Highest Valid: 66.64 ± 0.32
Final Train: 99.84 ± 0.06
Final Test: 56.88 ± 0.37
ogbl-ddi
#params: 1444840(model) + 99073(mlp) = 1543913
Hits@10
Highest Train: 33.91 ± 2.01
Highest Valid: 30.96 ± 1.89
Final Train: 33.90 ± 2.00
Final Test: 15.16 ± 4.28
Hits@20
Highest Train: 44.64 ± 1.71
Highest Valid: 41.32 ± 1.69
Final Train: 44.62 ± 1.69
Final Test: 26.42 ± 6.10
Hits@30
Highest Train: 51.01 ± 1.72
Highest Valid: 47.64 ± 1.71
Final Train: 50.99 ± 1.72
Final Test: 33.56 ± 3.95
ogbl-ppa
#params: 150024820(model) + 113921(mlp) = 150138741
Hits@10
Highest Train: 4.78 ± 0.73
Highest Valid: 4.30 ± 0.68
Final Train: 4.77 ± 0.73
Final Test: 2.67 ± 0.42
Hits@50
Highest Train: 18.82 ± 1.07
Highest Valid: 17.26 ± 1.01
Final Train: 18.82 ± 1.07
Final Test: 17.34 ± 2.09
Hits@100
Highest Train: 31.29 ± 2.11
Highest Valid: 28.97 ± 1.92
Final Train: 31.28 ± 2.12
Final Test: 28.88 ± 1.53
ogbl-citation
#params: 757811178(model) + 131841(mlp) = 757943019
MRR
Highest Train: 0.9381 ± 0.0003
Highest Valid: 0.8469 ± 0.0003
Final Train: 0.9377 ± 0.0004
Final Test: 0.8479 ± 0.0003
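For readers unfamiliar with the Hits@K figures reported above for ogbl-collab, ogbl-ddi and ogbl-ppa, the metric can be sketched as follows. This is an illustrative re-implementation under the assumption that Hits@K is the fraction of positive edges scored strictly higher than the K-th highest-scoring negative edge; the numbers above come from the ogb evaluation code, not from this snippet, and the function name `hits_at_k` is hypothetical.

```python
import numpy as np


def hits_at_k(pos_scores, neg_scores, k):
    """Fraction of positive scores above the k-th largest negative score."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Assumption: with fewer than k negatives, every positive counts as a hit.
    if len(neg) < k:
        return 1.0
    kth_neg = np.sort(neg)[-k]  # k-th largest negative score
    return float(np.mean(pos > kth_neg))


# Toy scores, made up purely for illustration.
pos = [0.9, 0.5, 0.1]
neg = [0.8, 0.6, 0.4, 0.2]
print(hits_at_k(pos, neg, k=2))
```

In the toy example, the second-highest negative score is 0.6, so only the positive edge scored 0.9 counts as a hit.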
For efficiency, the results on ogbl-collab, ogbl-ppa and ogbl-ddi were obtained with multi-GPU training. Because ogb appears to be incompatible with our multi-GPU implementation, the dataset needs to be preprocessed first. The command is:
python3 load_dataset.py --name dataset_name
It writes a data file to the local directory. For example, if dataset_name is ogbl-collab, a file ogbl-collab-net.txt is generated. Then run
python3 deepwalk.py --data_file data_file_path
where the other parameters are the same as in the configurations above, but without --load_from_ogbl and --ogbl_name.
The performance on ogbl-ddi and ogbl-ppa can be unstable.