Chunxiang Xu avadesian

avadesian synced commits to master at avadesian/skypilot from mirror

  • b79953ed30 Use TTL instead of LRU cache to store AWS session (#8000) * ttl instead of lru * comment
  • 3713a1fbd9 fix threadlocal AWS cache (#7998) * fix threadlocal AWS cache This partially reverts #5229, which incorrectly simplified the thread-local cache. * make it actually work * expose cache functions * add comment
  • 25221d1876 [SSH Node Pools] Support custom metadata (#7913) * support custom metadata for ssh nodepools * fix unit test * move custom_metadata in schemas * make into function
  • 32038d7934 Improved GitHub actions example (#7932) * improved github actions example * format * typo fix * feedback1 * feedback2 * gh refactoring and simplification * screenshots * Apply suggestions from code review Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com> * feedback * update screenshot, lightweight client, launch step comments * also remove azure download --------- Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
  • 92e33333cc [Examples] Revamp Jupyter Lab example with SDK (#7992) * revamp jupyter lab example with SDK * Apply suggestions from code review Co-authored-by: Romil Bhardwaj <romil.bhardwaj@berkeley.edu> * bash, screenshot * typo --------- Co-authored-by: Romil Bhardwaj <romil.bhardwaj@berkeley.edu>
  • Compare 9 commits »

7 hours ago

avadesian synced commits to lloyd/improve-concurrent-job-launch at avadesian/skypilot from mirror

7 hours ago

avadesian synced commits to lloyd/fix-rsync-not-found at avadesian/skypilot from mirror

7 hours ago

avadesian synced commits to kimi-k2-thinking at avadesian/skypilot from mirror

7 hours ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • a2fb6bd1ac Fix gpu label failure for "Too large resource version" (#7779) * retry for gpu label * poll mechanism * revert * set resource_version = 0
  • abb377d769 [Templates] Pin click to <8.3.0 in Ray start_cluster script (#7985) * print stderr * pin click to <8.3.0
  • Compare 2 commits »

1 day ago

avadesian synced commits to kimi-k2-thinking at avadesian/skypilot from mirror

1 day ago

avadesian synced commits to gpu-launched-metrics at avadesian/skypilot from mirror

1 day ago

avadesian synced commits to gpu-launched-metrics at avadesian/skypilot from mirror

1 day ago

avadesian synced commits to improved-gh-actions at avadesian/skypilot from mirror

2 days ago

avadesian synced commits to gpu-launched-metrics at avadesian/skypilot from mirror

2 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 5b71998893 Allow job controller specifying server config for local API server (#7966) * Allow specifying server config for local API server Signed-off-by: Aylei <rayingecho@gmail.com> * Refine Signed-off-by: Aylei <rayingecho@gmail.com> * Better comments Signed-off-by: Aylei <rayingecho@gmail.com> * Refine Signed-off-by: Aylei <rayingecho@gmail.com> --------- Signed-off-by: Aylei <rayingecho@gmail.com>
  • b466dda68e [Pool][Serve] Stash pool/serve yaml to database. (#7876) * fix * lint * stash to db * alembic * resolve comments
  • Compare 2 commits »

3 days ago

avadesian synced commits to lloyd/improve-concurrent-job-launch at avadesian/skypilot from mirror

  • e2ae0aa0f2 Fix test.
  • 16c57b08e9 Merge branch 'master' into lloyd/improve-concurrent-job-launch
  • 702828931d Format.
  • 019d106ba2 Check skylet in core.
  • 82d57659da [k8s] Display reason for pending pods in provision logs and launch spinner (#7959) * add smoke test for pending pods * Display reason for pending pods in provision logs and launch spinner * also capture image pull events * reset launching spinner msg properly * fix for multi node case * run pod status check in parallel * shorten spinner msg
  • Compare 38 commits »

3 days ago

avadesian synced commits to grpc-default-true at avadesian/skypilot from mirror

  • 0862a048c5 Merge branch 'master' into grpc-default-true
  • 5b71998893 Allow job controller specifying server config for local API server (#7966) * Allow specifying server config for local API server Signed-off-by: Aylei <rayingecho@gmail.com> * Refine Signed-off-by: Aylei <rayingecho@gmail.com> * Better comments Signed-off-by: Aylei <rayingecho@gmail.com> * Refine Signed-off-by: Aylei <rayingecho@gmail.com> --------- Signed-off-by: Aylei <rayingecho@gmail.com>
  • b466dda68e [Pool][Serve] Stash pool/serve yaml to database. (#7876) * fix * lint * stash to db * alembic * resolve comments
  • 82d57659da [k8s] Display reason for pending pods in provision logs and launch spinner (#7959) * add smoke test for pending pods * Display reason for pending pods in provision logs and launch spinner * also capture image pull events * reset launching spinner msg properly * fix for multi node case * run pod status check in parallel * shorten spinner msg
  • 188cb6bfc6 [Requests] Fix cost_report JSON encoding (#7974) * add cost-report call to test_minimal smoke test * add ut * convert to python float
  • Compare 45 commits »

3 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • 82d57659da [k8s] Display reason for pending pods in provision logs and launch spinner (#7959) * add smoke test for pending pods * Display reason for pending pods in provision logs and launch spinner * also capture image pull events * reset launching spinner msg properly * fix for multi node case * run pod status check in parallel * shorten spinner msg
  • 188cb6bfc6 [Requests] Fix cost_report JSON encoding (#7974) * add cost-report call to test_minimal smoke test * add ut * convert to python float
  • a6ced08a02 [Docs] Add new badge to pools (#7975) Add new to Pools docs
  • 22761f7b98 [SCP] multi-node support (#7288) * new provisioner * new provisioner * new provisioner * new provisioner * new provisioner * revert * New provisioner * Delete sky/skylet/providers/scp/__init__.py * Delete sky/skylet/providers/scp/config.py * Delete sky/skylet/providers/scp/node_provider.py * Update cloud_vm_ray_backend.py * Update scp-ray.yml.j2 * Update scp.py * Update scp.py * Create __init__.py * Create config.py * Create instance.py * Update __init__.py * Update scp_utils.py * Update scp.py * Update __init__.py * Update __init__.py * Update config.py * Update config.py * Update config.py * Update instance.py * Update __init__.py * Update instance.py * Update instance.py * Update config.py * Update config.py * Update scp.py * Update scp.py * Update cloud_vm_ray_backend.py * Update instance.py * Update config.py * Update config.py * Update config.py * Update config.py * Update scp.py * Update scp.py * refactoring (#21) * Update config.py * Update scp_utils.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * feature 1 * Feature 2 (#24) * Update scp_utils.py * Update config.py * Update config.py * Update config.py * refactoring (#25) * Update instance.py * Update scp_utils.py * Update scp_utils.py * Update instance.py * Update scp_utils.py * rollback * refactoring (#26) * Update instance.py * Update scp_utils.py * Update scp_utils.py * Update instance.py * Update instance.py * Update instance.py * Update scp_utils.py * Update scp_utils.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update scp_utils.py * Update instance.py * Update scp_utils.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update test_cluster_job.py * refactoring * refactoring * refactoring * refactoring * refactoring * refactoring * refactoring * refactoring * Update scp.py * Update scp.py * Multi-node support * Update scp.py * Update scp_utils.py * Update scp_utils.py * Update instance.py * Update scp.py * Update scp.py * Update instance.py * Update instance.py * Update scp_utils.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update test_mount_and_storage.py * Update test_basic.py * Update test_cluster_job.py * Update test_cluster_job.py * Update test_cluster_job.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update test_cluster_job.py * Update test_cluster_job.py * Update instance.py * Update test_cluster_job.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update instance.py * Update scp_utils.py * Update scp_utils.py * backward compatibility * backward compatibility * backward compatibility
  • 18f9b04cc6 [Docs] Fix Ray start_cluster script name (#7965)
  • Compare 6 commits »

3 days ago

avadesian synced commits to simple-ray at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to serve-files-stash-to-db at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • b47481caa3 Add type annotation to ha_recovery_for_consolidation_mode (#7953) Signed-off-by: Aylei <rayingecho@gmail.com>
  • de3f763ccb Fix release body creation failure (#7079) * fix release body creation failure * previous tag * fix * test
  • b765b334de [jobs] record process timestamp to protect against reboot/pid reuse (#7847) * [jobs] record process timestamp to protect against reboot/pid reuse * add test * lint * bump SKYLET_VERSION * add migration * address review comments, clean up * lint * fix tests and backwards compatibility for really old jobs * lint * Apply suggestion from @cblmemo Co-authored-by: Tian Xia <cblmemo@gmail.com> --------- Co-authored-by: Tian Xia <cblmemo@gmail.com>
  • 1a8f65d905 [Docs] Add SSH Node Pool GPU Dependency (#7947)
  • 6c4a92817d [Pool] Naming fix: cluster pool / worker pool -> pool. (#7963) fix
  • Compare 7 commits »

4 days ago

avadesian synced commits to lloyd/improve-concurrent-job-launch at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to simple-ray at avadesian/skypilot from mirror

4 days ago

avadesian synced commits to master at avadesian/skypilot from mirror

  • e5b71de1f1 Support image pull secrets (#7955) * Support image pull secrets Signed-off-by: Aylei <rayingecho@gmail.com> * Update schema Signed-off-by: Aylei <rayingecho@gmail.com> --------- Signed-off-by: Aylei <rayingecho@gmail.com>
  • bb2511b9cb [Volume] Support creating volume with existing resources for k8s and runpod (#7915) * support creating volume with existing resources for k8s and runpod * add ut * update smoke test * update api version * update case * update case * update case * update case * update docs
  • aa906bfc80 Adapt existing HTTP middlewares to handle websocket connection (#7863) * Adapt HTTP middleware to handle websocket Signed-off-by: Aylei <rayingecho@gmail.com> * Also adapt auth user init middleware Signed-off-by: Aylei <rayingecho@gmail.com> * Fix header capture Signed-off-by: Aylei <rayingecho@gmail.com> * More logs Signed-off-by: Aylei <rayingecho@gmail.com> * Fix Signed-off-by: Aylei <rayingecho@gmail.com> * Refine error message Signed-off-by: Aylei <rayingecho@gmail.com> * Instrument Signed-off-by: Aylei <rayingecho@gmail.com> * Fix service account token auth at proxy client Signed-off-by: Aylei <rayingecho@gmail.com> * Fix UT Signed-off-by: Aylei <rayingecho@gmail.com> * More unit test Signed-off-by: Aylei <rayingecho@gmail.com> * Apply suggestions from code review Co-authored-by: Christopher Cooper <christopher@cg505.com> * Fix review comments Signed-off-by: Aylei <rayingecho@gmail.com> * Refine Signed-off-by: Aylei <rayingecho@gmail.com> * More types Signed-off-by: Aylei <rayingecho@gmail.com> * Add TODO Signed-off-by: Aylei <rayingecho@gmail.com> --------- Signed-off-by: Aylei <rayingecho@gmail.com> Co-authored-by: Christopher Cooper <christopher@cg505.com>
  • 0023b31866 timeout based flush + batch flush in `passthrough_stream_handler` (#7951) * timeout based flush + batch flush * remove try * unit test for flush streaming * new unit test file * lint
  • fab3ceedb7 Properly display infra string in managed jobs when job is completed (#7941) undo regression
  • Compare 8 commits »

4 days ago