[Bug]: Error processing job assigned to imported fleet with no capacity #3651

@jvstme

Steps to reproduce

  1. Create an SSH fleet with one instance.
  2. Export it to a project.
  3. Run a two-replica service in that project.

Actual behaviour

The first replica is assigned to the imported instance. While processing the second replica, the server starts raising sqlalchemy.exc.MissingGreenlet, and that replica never leaves the submitted status.

Expected behaviour

The second replica fails to start with FAILED_TO_START_DUE_TO_NO_CAPACITY.

dstack version

master

Server logs

src/dstack/_internal/server/background/scheduled_tasks/submitted_jobs.py:154: in process_submitted_jobs
    await asyncio.gather(*tasks)
src/dstack/_internal/server/utils/sentry_utils.py:14: in wrapper
    return await f(*args, **kwargs)
src/dstack/_internal/server/background/scheduled_tasks/submitted_jobs.py:205: in _process_next_submitted_job
    await _process_submitted_job(
src/dstack/_internal/server/background/scheduled_tasks/submitted_jobs.py:413: in _process_submitted_job
    await _fetch_fleet_with_master_instance_provisioning_data(
src/dstack/_internal/server/background/scheduled_tasks/submitted_jobs.py:583: in _fetch_fleet_with_master_instance_provisioning_data
    fleet = fleet_model_to_fleet(fleet_model)
src/dstack/_internal/server/services/fleets.py:855: in fleet_model_to_fleet
    instances = [instances_services.instance_model_to_instance(i) for i in instance_models]
src/dstack/_internal/server/services/fleets.py:855: in <listcomp>
    instances = [instances_services.instance_model_to_instance(i) for i in instance_models]
src/dstack/_internal/server/services/instances.py:228: in instance_model_to_instance
    project_name=instance_model.project.name,
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/attributes.py:566: in __get__
    return self.impl.get(state, dict_)  # type: ignore[no-any-return]
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/attributes.py:1086: in get
    value = self._fire_loader_callables(state, key, passive)
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/attributes.py:1121: in _fire_loader_callables
    return self.callable_(state, passive)
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/strategies.py:978: in _load_for_state
    return self._emit_lazyload(
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/strategies.py:1079: in _emit_lazyload
    return loading.load_on_pk_identity(
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/loading.py:694: in load_on_pk_identity
    session.execute(
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/session.py:2365: in execute
    return self._execute_internal(
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/session.py:2251: in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
.venv/lib64/python3.10/site-packages/sqlalchemy/orm/context.py:306: in orm_execute_statement
    result = conn.execute(
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/base.py:1416: in execute
    return meth(
.venv/lib64/python3.10/site-packages/sqlalchemy/sql/elements.py:523: in _execute_on_connection
    return connection._execute_clauseelement(
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/base.py:1638: in _execute_clauseelement
    ret = self._execute_context(
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/base.py:1843: in _execute_context
    return self._exec_single_context(
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/base.py:1983: in _exec_single_context
    self._handle_dbapi_exception(
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/base.py:2355: in _handle_dbapi_exception
    raise exc_info[1].with_traceback(exc_info[2])
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/base.py:1964: in _exec_single_context
    self.dialect.do_execute(
.venv/lib64/python3.10/site-packages/sqlalchemy/engine/default.py:945: in do_execute
    cursor.execute(statement, parameters)
.venv/lib64/python3.10/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py:172: in execute
    self._adapt_connection._handle_exception(error)
.venv/lib64/python3.10/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py:323: in _handle_exception
    raise error
.venv/lib64/python3.10/site-packages/sqlalchemy/dialects/sqlite/aiosqlite.py:149: in execute
    _cursor = self.await_(self._connection.cursor())
sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

Additional information

No response
