PyGCL is a PyTorch-based open-source Graph Contrastive Learning (GCL) library, which features modularized GCL components from published papers, standardized evaluation, and experiment management.
Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. A typical GCL algorithm firstly constructs multiple graph views via stochastic augmentation of the input and then learns representations by contrasting positive samples against negative ones.
👉 For a general introduction to GCL, please refer to our paper and blog. This repo also tracks newly published GCL papers.
PyGCL needs the following packages to be installed beforehand:
To install PyGCL with `pip`, simply run:

```
pip install PyGCL
```

Then, you can import `GCL` from your current environment.
A note regarding DGL

Currently the DGL team maintains two versions, `dgl` for CPU support and `dgl-cu***` for CUDA support. Since `pip` treats them as different packages, it is hard for PyGCL to check the version requirement of `dgl`. We have removed such dependency checks for `dgl` in our setup configuration and require users to install a proper version themselves.
PyGCL implements four main components of graph contrastive learning algorithms:
We also implement utilities for training models, evaluating model performance, and managing experiments.
For a quick start, please check out the `examples` folder. We currently implement the following methods:

Besides trying the above examples for node and graph classification tasks, you can also straightforwardly build your own graph contrastive learning algorithms.
In `GCL.augmentors`, PyGCL provides the `Augmentor` base class, which offers a universal interface for graph augmentation functions. Specifically, PyGCL implements the following augmentation functions:
| Augmentation | Class name |
|---|---|
| Edge Adding (EA) | `EdgeAdding` |
| Edge Removing (ER) | `EdgeRemoving` |
| Feature Masking (FM) | `FeatureMasking` |
| Feature Dropout (FD) | `FeatureDropout` |
| Edge Attribute Masking (EAR) | `EdgeAttrMasking` |
| Personalized PageRank (PPR) | `PPRDiffusion` |
| Markov Diffusion Kernel (MDK) | `MarkovDiffusion` |
| Node Dropping (ND) | `NodeDropping` |
| Node Shuffling (NS) | `NodeShuffling` |
| Subgraphs induced by Random Walks (RWS) | `RWSampling` |
| Ego-net Sampling (ES) | `Identity` |
Calling these augmentation functions with a `Graph` in a tuple form of node features, edge index, and edge features `(x, edge_index, edge_attrs)` will produce the corresponding augmented graphs.
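As a minimal standalone sketch of this call convention, the following stub implements an edge-removing augmentor as a function over a `(x, edge_index, edge_attr)` triple. The real `GCL.augmentors` classes operate on PyTorch tensors; plain Python lists and the `edge_removing` name are assumptions of this sketch, used only to keep it self-contained.

```python
import random

def edge_removing(x, edge_index, edge_attr, pe=0.3, seed=0):
    """Drop each edge independently with probability pe (sketch)."""
    rng = random.Random(seed)
    keep = [i for i in range(len(edge_index)) if rng.random() >= pe]
    new_edge_index = [edge_index[i] for i in keep]
    new_edge_attr = [edge_attr[i] for i in keep] if edge_attr else edge_attr
    # Node features pass through unchanged for this augmentor.
    return x, new_edge_index, new_edge_attr

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # node features
edge_index = [(0, 1), (1, 2), (2, 0), (0, 2)]  # edge list
edge_attr = [[0.5], [0.1], [0.9], [0.2]]       # edge features

x2, ei2, ea2 = edge_removing(x, edge_index, edge_attr, pe=0.3)
assert x2 == x               # node features untouched by this augmentor
assert len(ei2) == len(ea2)  # edge features stay aligned with edges
assert set(ei2) <= set(edge_index)
```

The key point is that an augmentor maps one graph triple to another of the same shape, so augmentors can be chained freely.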
PyGCL supports composing arbitrary numbers of augmentations together. To compose a list of augmentation instances `augmentors`, you need to use the `Compose` class:
```python
import GCL.augmentors as A

aug = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])
```
You can also use the `RandomChoice` class to randomly draw a few augmentations each time:
```python
import GCL.augmentors as A

aug = A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10),
                      A.NodeDropping(pn=0.1),
                      A.FeatureMasking(pf=0.1),
                      A.EdgeRemoving(pe=0.1)],
                     num_choices=1)
```
You can write your own augmentation functions by inheriting the base `Augmentor` class and defining the `augment` function.
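A minimal sketch of that pattern is shown below. A tiny stand-in base class is used here instead of the real `GCL.augmentors.Augmentor` (which works on a `Graph` of tensors), and the `NodeFeatureScaling` augmentor is a hypothetical example, not part of PyGCL.

```python
class Augmentor:
    """Stand-in for GCL.augmentors.Augmentor: subclasses implement augment()."""
    def augment(self, g):
        raise NotImplementedError

    def __call__(self, g):
        return self.augment(g)

class NodeFeatureScaling(Augmentor):
    """Hypothetical custom augmentor: scales all node features."""
    def __init__(self, scale):
        self.scale = scale

    def augment(self, g):
        x, edge_index, edge_attr = g
        x = [[v * self.scale for v in row] for row in x]
        return x, edge_index, edge_attr

aug = NodeFeatureScaling(scale=2.0)
x2, ei2, ea2 = aug(([[1.0, 2.0]], [(0, 0)], None))
assert x2 == [[2.0, 4.0]]
assert ei2 == [(0, 0)]
```

Because the instance is callable, a custom augmentor defined this way can be composed with built-in ones exactly like the `Compose` and `RandomChoice` examples above.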
Existing GCL architectures can be grouped into two lines: negative-sample-based methods and negative-sample-free ones.
| Contrastive architectures | Supported contrastive modes | Need negative samples | Class name | Examples |
|---|---|---|---|---|
| Single-branch contrasting | G2L only | ✅ | `SingleBranchContrast` | DGI, InfoGraph |
| Dual-branch contrasting | L2L, G2G, and G2L | ✅ | `DualBranchContrast` | GRACE |
| Bootstrapped contrasting | L2L, G2G, and G2L | ❎ | `BootstrapContrast` | BGRL |
| Within-embedding contrasting | L2L and G2G | ❎ | `WithinEmbedContrast` | GBT |
Moreover, you can use `add_extra_mask` if you want to add positives or remove negatives. This function performs a bitwise ADD with the extra positive masks specified by `extra_pos_mask` and a bitwise OR with the extra negative masks specified by `extra_neg_mask`. It is helpful, for example, when you have supervision signals from labels and want to train the model in a semi-supervised manner.
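As a sketch of the semi-supervised use case described above, the following builds an extra positive mask from labels: two distinct samples with the same known label become additional positives. The real `add_extra_mask` expects boolean tensors; plain Python lists and the convention that `-1` marks an unlabeled sample are assumptions of this sketch.

```python
def label_pos_mask(labels):
    """Extra positive mask: True where two distinct samples share a known label."""
    n = len(labels)
    return [[labels[i] == labels[j] and labels[i] != -1 and i != j
             for j in range(n)] for i in range(n)]

labels = [0, 1, 0, -1]  # sample 3 is unlabeled
mask = label_pos_mask(labels)
assert mask[0][2] and mask[2][0]  # same label -> extra positives
assert not mask[0][1]             # different labels -> not positives
assert not any(mask[3])           # unlabeled sample contributes no positives
```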
Internally, PyGCL calls `Sampler` classes in `GCL.models` that receive embeddings and produce positive/negative masks. PyGCL implements three contrasting modes: (a) Local-Local (L2L), (b) Global-Global (G2G), and (c) Global-Local (G2L). The L2L and G2G modes contrast embeddings at the same scale, while G2L performs cross-scale contrasting. To implement your own GCL model, you may also use these provided sampler models:
| Contrastive modes | Class name |
|---|---|
| Same-scale contrasting (L2L and G2G) | `SameScaleSampler` |
| Cross-scale contrasting (G2L) | `CrossScaleSampler` |
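The two modes can be sketched as mask constructions over toy inputs. For same-scale (L2L/G2G) contrasting, node *i* in one view is the positive of node *i* in the other view, so the positive mask is the identity; for cross-scale (G2L) contrasting, each node is the positive of its own graph's summary embedding. The real `Sampler` classes return tensor masks; plain lists and these helper names are assumptions of this sketch.

```python
def same_scale_pos_mask(n):
    """L2L/G2G: positive mask is the identity over the batch."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def cross_scale_pos_mask(node_to_graph, num_graphs):
    """G2L: rows are nodes, columns are graph-level embeddings."""
    return [[1 if node_to_graph[i] == g else 0 for g in range(num_graphs)]
            for i in range(len(node_to_graph))]

assert same_scale_pos_mask(3) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Two graphs: nodes 0 and 1 belong to graph 0; node 2 belongs to graph 1.
assert cross_scale_pos_mask([0, 0, 1], 2) == [[1, 0], [1, 0], [0, 1]]
```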
Additionally, you can use `sampler.add_intraview_negs` to enlarge the negative sample set.

In `GCL.losses`, PyGCL implements the following contrastive objectives:
| Contrastive objectives | Class name |
|---|---|
| InfoNCE loss | `InfoNCE` |
| Jensen-Shannon Divergence (JSD) loss | `JSD` |
| Triplet Margin (TM) loss | `Triplet` |
| Bootstrapping Latent (BL) loss | `BootstrapLatent` |
| Barlow Twins (BT) loss | `BarlowTwins` |
| VICReg loss | `VICReg` |
All these objectives can contrast arbitrary positive and negative pairs, except for the Barlow Twins and VICReg losses, which perform contrastive learning within embeddings. Moreover, for the InfoNCE and Triplet losses, we further provide `SP` variants that compute the contrastive objective given only one positive pair per sample, to speed up computation and avoid excessive memory consumption.
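The single-positive InfoNCE setting can be sketched as follows: for anchor *i* in one view, the positive is its counterpart in the other view and every other sample is a negative. Plain floats are used for self-containment; the real `GCL.losses` classes operate on tensor batches, and the function name here is an assumption of this sketch.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def infonce_sp(anchors, others, tau=0.2):
    """InfoNCE with one positive per sample: others[i] is anchor i's positive."""
    loss = 0.0
    for i, a in enumerate(anchors):
        sims = [math.exp(dot(a, o) / tau) for o in others]
        loss += -math.log(sims[i] / sum(sims))  # softmax over all candidates
    return loss / len(anchors)

z1 = [[1.0, 0.0], [0.0, 1.0]]
z2 = [[0.9, 0.1], [0.1, 0.9]]
loss = infonce_sp(z1, z2)
assert loss > 0.0
# Well-aligned views give a lower loss than views with swapped counterparts.
assert infonce_sp(z1, z1) < infonce_sp(z1, [[0.0, 1.0], [1.0, 0.0]])
```

Because only one positive similarity is kept per anchor, the computation needs a single softmax row per sample rather than a full positive mask, which is where the speed and memory savings come from.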
PyGCL further implements several negative sampling strategies:
| Negative sampling strategies | Class name |
|---|---|
| Subsampling | `GCL.models.SubSampler` |
| Hard negative mixing | `GCL.models.HardMixing` |
| Conditional negative sampling | `GCL.models.Ring` |
| Debiased contrastive objective | `GCL.losses.DebiasedInfoNCE`, `GCL.losses.DebiasedJSD` |
| Hardness-biased negative sampling | `GCL.losses.HardnessInfoNCE`, `GCL.losses.HardnessJSD` |
The former three serve as an additional sampling step, similar to the existing `Sampler` classes, and can be used in conjunction with any objective. The last two are only available for the InfoNCE and JSD losses.
PyGCL provides a variety of evaluator functions to evaluate the embedding quality:
| Evaluator | Class name |
|---|---|
| Logistic regression | `LREvaluator` |
| Support vector machine | `SVMEvaluator` |
| Random forest | `RFEvaluator` |
To use these evaluators, you first need to generate dataset splits by `get_split` (random split) or by `from_predefined_split` (according to preset splits).
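A random split in the spirit of `get_split` can be sketched as below. The ratio values and the dict-of-index-lists return shape are assumptions of this sketch, not the exact PyGCL API.

```python
import random

def random_split(num_samples, train_ratio=0.1, test_ratio=0.8, seed=0):
    """Shuffle indices and partition them into train/test/valid sets."""
    idx = list(range(num_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(num_samples * train_ratio)
    n_test = int(num_samples * test_ratio)
    return {
        'train': idx[:n_train],
        'test': idx[n_train:n_train + n_test],
        'valid': idx[n_train + n_test:],
    }

split = random_split(100)
assert len(split['train']) == 10 and len(split['test']) == 80
# Every sample lands in exactly one partition.
assert sorted(split['train'] + split['test'] + split['valid']) == list(range(100))
```

The evaluator then fits on the train indices, selects hyperparameters on the valid indices, and reports scores on the test indices.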
Feel free to open an issue should you find anything unexpected or create pull requests to add your own work! We are motivated to continuously make PyGCL even better.
Please cite our paper if you use this code in your own work:
```
@article{Zhu:2021tu,
  author       = {Zhu, Yanqiao and Xu, Yichen and Liu, Qiang and Wu, Shu},
  title        = {{An Empirical Study of Graph Contrastive Learning}},
  journal      = {arXiv.org},
  year         = {2021},
  eprint       = {2109.01116v1},
  eprinttype   = {arxiv},
  eprintclass  = {cs.LG},
  month        = sep,
}
```