Chunxiang Xu avadesian

avadesian synced commits to master at avadesian/skypilot from mirror

  • cabefc4692 [ssh] better error for missing identity file (#6003)
  • 42d0daf594 [Dashboard] Jobs/Clusters: filter by name. (#5997) * [Dashboard] Jobs page: add filter by name. * lint * Add to clusters page
  • Compare 2 commits »

1 day ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 660d2bde85 [Dashboard] Add user filtering for clusters and jump to user's cluster/jobs (#5936) * Add user filtering for clusters * allow 0 click * Add ref link for filtering * format
  • 6fd0ad541d Add `Hyperbolic` cloud (#5517) * Update fetch_hyperbolic.py for catalog generation improvements * feat: Add Hyperbolic cloud provider integration - Add Hyperbolic cloud provider implementation with support for GPU instances - Update API endpoint from /v2/marketplace/instances/create/cheapest to /v2/marketplace/instances/create-cheapest - Ensure payload handling uses correct casing (instanceId) - Add Hyperbolic to supported clouds list - Add Ray cluster configuration template for Hyperbolic - Add instance provisioning and management utilities - Add service catalog integration for Hyperbolic instances * chore(hyperbolic): explicitly list region, zone, and spot as unsupported features * style: fix all lint and type issues for Hyperbolic provider integration - Break long lines in sky/clouds/hyperbolic.py to comply with 80-char limit - Remove useless __init__ and unused imports; resolve parameter shadowing - Use forward reference and TYPE_CHECKING for Resources type annotation - Fix inconsistent string quotes in fetch_hyperbolic.py - Remove unused imports and improve exception chaining in utils.py - Ensure vms.csv is never committed - Achieve full pylint and mypy compliance for Hyperbolic provider code * feat(hyperbolic): implement get_credential_file_mounts to support API key mounting and fix all style/type issues * fix(hyperbolic): implement _get_feasible_launchable_resources to avoid NotImplementedError in tests and dryrun * fix: provisioner, add missing stubs, standardize imports, and ensure full consistency with other providers * fix: provisioner issues * fix: instance provisioning logic * fix: provisioning params * fix: timeout * feat: support custom user_metadata for skypilot clusters * fix: linting issues * fix: add AuthorizedKey to node_config in hyperbolic-ray.yml.j2 for SSH compatibility * chore: cleanup * fix: linting issues * fix: improve tests * fix: test failures * fix: address pylint warnings in hyperbolic_catalog.py - Remove trailing whitespace - Add proper handling for use_spot and instance_type parameters - Improve error messages formatting * fix: allow None and empty string for zone parameter in list_accelerators * fix: update list_accelerators to match pattern used by other providers * chore: updating endpoint for `dev` and adding `restarting` status * chore: remove SSH public key injection * fix: provisioning issues * chore: streamlining statuses * feat: setting up auth for hyperbolic * fix(hyperbolic): use publicKeys field for SSH key injection as per new API * fix(hyperbolic): return empty dict in query_instances when no instances exist, matching delete-only provider pattern * refactor(hyperbolic): standardize instance metadata filtering to use 'metadata' everywhere and ensure robust filtering in launch flow * refactor: clean up hyperbolic ray template for single-node setup * fix: improve hyperbolic catalog generation and type hints - Add proper type hints, fix GpuInfo formatting, add SpotPrice field, and improve code formatting * refactor: improve GpuInfo formatting * fix: status value as expected by skypilot * chore: simplifying create_catalog() method and removing unwanted test * refactor: simplify hyperbolic API URL configuration * fix: use correct catalog import in hyperbolic cloud implementation * fix: update hyperbolic cloud implementation to use correct catalog imports and API endpoints * fix: improve error handling and logging in hyperbolic API client * refactor: remove redundant sky/catalog/constants.py as it duplicates sky/skylet/constants.py * fix: make fetch_hyperbolic.py compatible with both GH workflow and normal usage * refactor: improvements and move fetch_hyperbolic.py to catalog/data_fetchers * chore: adding better error handling * fix: improve API key path handling and fix test failures * fix: update test_hyperbolic_check_credentials_present to properly handle API key path * fix: test_invalid_instance_type * fix: update Hyperbolic region test to use 'default' region * fix: refactoring instance launch and disabling non-needed smoke tests * Add Hyperbolic cloud to extras_require in dependencies.py * test: skip test_jobs_launch_and_logs for Hyperbolic cloud since it doesn't support autostop and host controllers * test: skip test_multiple_resources for Hyperbolic cloud since it requires multiple cloud providers * fix: mark no run for tests not supported by hyperbolic
  • 9e9bd97754 fix permission service init race condition (#5965)
  • 4dfa63417c Fix `release-publish` pipeline failure (#5978) fix
  • Compare 4 commits »

2 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 8f23cfb0f3 Test private docker registry (#5865) * docker io * registry test * robust test helm * update test case * empty password for GCP * render empty string * env variable * skip if
  • 2367ea9227 Fix Typos in Documentation and Comments (#5984) * Update README.md * Update fetch_vast.py
  • ceedf5638e [UX] Increased robustness for rsync when estimating size of a folder (#5956)
  • 7873bc3ebd [k8s] Better GPU Label Formatter Support for CoreWeave (#5926) * prelim * simplify things * simple fix * nit * add unit test * test * debug attempt
  • Compare 4 commits »

2 days ago

avadesian synced commits to docker-on-vm at avadesian/skypilot from mirror

2 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 7617956543 [Core] removes lstrip('ssh-') (#5985) Removes `lstrip('ssh-')` and replaces it with `removeprefix('ssh-')`. This fixes unintended cases where names starting with s or h might get those letters removed by lstrip.
  • 60a6c930fe [Docs] Instructions to debug api server on helm chart (#5774) * init * lint * upd dockerfile
  • dcb8f8dc82 Fix typos (#5830)
  • afbaa1412c [Examples] Fix GPT2 Example (#5973) fix
  • e11ad0b022 Fix: accept non-k8s TPU names on GKE (e.g. tpu-v6e-8) (#5024) (#5480) * Fix: accept non-k8s TPU names on GKE (e.g. tpu-v6e-8) (#5024) * Fix: Normalize TPU names and counts for GKE compatibility * Fix: Normalize TPU names and counts for GKE compatibility * working optimizer selection --------- Co-authored-by: Yekta Kocaoğullar <yektakocaogullar@Yektas-MacBook-Pro.local> Co-authored-by: Seung Jin Yang <seungjin219@gmail.com>
  • Compare 9 commits »
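
The `lstrip` pitfall described in commit 7617956543 can be reproduced in a few lines. This is a minimal sketch: the sample strings are made up for illustration and are not taken from the SkyPilot code base. `str.lstrip` treats its argument as a set of characters to strip, while `str.removeprefix` (Python 3.9+) removes one exact leading substring.

```python
# Hypothetical sample names, chosen only to show the difference.
prefixed = "ssh-host"
unprefixed = "shell-box"

# lstrip("ssh-") strips any leading run of the characters 's', 'h', '-'.
lstrip_prefixed = prefixed.lstrip("ssh-")
lstrip_unprefixed = unprefixed.lstrip("ssh-")

# removeprefix("ssh-") removes the literal prefix "ssh-" at most once.
removeprefix_prefixed = prefixed.removeprefix("ssh-")
removeprefix_unprefixed = unprefixed.removeprefix("ssh-")

print(lstrip_prefixed)          # ost
print(lstrip_unprefixed)        # ell-box
print(removeprefix_prefixed)    # host
print(removeprefix_unprefixed)  # shell-box
```

With `lstrip`, the leading `h` of `host` and the leading `sh` of `shell-box` are lost; `removeprefix` leaves both names intact apart from the intended prefix.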

3 days ago

avadesian synced commits to docker-on-vm at avadesian/skypilot from mirror

  • 038851c2d2 jobs logs
  • cea8791369 jobs queue
  • 0faf98388e [DOV] `ssh up` on clusters (#5974) * mark as infra * turn a cluster into a SSH Node Pool * fix optimizer without gpu * Fix for CPUs * fix resources in dashboard * fix * fix jobs launch * skip ssh setup for sky.jobs.launch * docker pull first * more ux --------- Co-authored-by: cblmemo <cblmemo@gmail.com>
  • 0d89be5c6b fix docker ports
  • 41b9a8baff get docker user from container
  • Compare 6 commits »

3 days ago

avadesian synced commits to add-metrics-to-server-exp at avadesian/skypilot from mirror

3 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 8f6be73b19 [config] re-check clouds if list of allowed clouds changed (#5590) re-check clouds if list of allowed clouds changed only invalidate if a new cloud is added workspaces / sqlalchemy fix fix fix2 check update cloud tempfix testfix2

3 days ago

avadesian synced commits to docker-on-vm at avadesian/skypilot from mirror

3 days ago

avadesian synced commits to add-metrics-to-server-exp at avadesian/skypilot from mirror

3 days ago

avadesian synced commits to speedup-api at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • a582bda995 [Jobs] Remove jobs controller side dashboard (#5917) * remove jobs controller side dashboard * format
  • b85fa52145 Support restful admin policy (#5940) * Support restful admin policy Signed-off-by: Aylei <rayingecho@gmail.com> * Lint Signed-off-by: Aylei <rayingecho@gmail.com> * Fix test Signed-off-by: Aylei <rayingecho@gmail.com> * Lint Signed-off-by: Aylei <rayingecho@gmail.com> * initial docs update * Update docs Signed-off-by: Aylei <rayingecho@gmail.com> * Address review comments Signed-off-by: Aylei <rayingecho@gmail.com> * Fix test Signed-off-by: Aylei <rayingecho@gmail.com> --------- Signed-off-by: Aylei <rayingecho@gmail.com> Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  • 63b00fb931 Enable workflow trigger in release PR (#5844) change secret
  • Compare 3 commits »

4 days ago

avadesian synced commits to docker-on-vm at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to add-metrics-to-server at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to ssh-node-pools-ux at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to speedup-api at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to restful-admin-policy at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 3edc0e117a lazy initialization for jobs.db (#5957) * lazy initialization for jobs.db * ???
  • 71ac633a54 lazy initialization of permissions service (#5942) * lazy init of permissions service * testfix * textfix2
  • b9ba30ba1c Robust test helm deploy on GKE (#5943) robust test helm
  • 4c1fdaf3cb [docs] example using kueue (#5498) * example using kueue * address review comments * add images, rewrite sections * bugfix * add more docs on multi API server deployment within a cluster * ez pz fixes * mention fair scheduling * all good changes * add note on flavors/resources * proper list formatting * reorder sections, add overview * separate out section on patching kueue * update docs for just one api server / namespace * Apply suggestions from code review Co-authored-by: Christopher Cooper <cooperc@assemblesys.com> * remove create ns * remove api server deployment step * snippet on default resource flavor * Update docs/source/reference/kubernetes/examples/kueue-example.rst Co-authored-by: Christopher Cooper <cooperc@assemblesys.com> --------- Co-authored-by: Christopher Cooper <cooperc@assemblesys.com>
  • c93e01ebfb [jobs] invert priority value (#5954) * [jobs] invert priority value Now, a higher priority value indicates that the job is higher priority. * lint
  • Compare 9 commits »

4 days ago

avadesian synced commits to jobs-ha-controller-doc at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to ha-job-controller-recovery at avadesian/skypilot from mirror

4 days ago