CLOUDSTACK-8324: Added migration changes for config drive #375
Closed
jayapalu wants to merge 2 commits into apache:master from
cloudstack-pull-requests #344 FAILURE
cloudstack-pull-requests #357 SUCCESS
yadvr added a commit that referenced this pull request on Jan 20, 2021
Fixes #375 Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
shwstppr pushed a commit to shapeblue/cloudstack that referenced this pull request on Apr 11, 2024
* Fix host stuck in connecting state (apache#8502)

There are a lot of test failures in multiple PRs caused by test_vm_life_cycle.py, because no host is available for migration of VMs:
apache#8438 (comment)
apache#8433 (comment)
apache#7344 (comment)

While debugging I noticed that the hosts get stuck in the Connecting state because the MS is waiting for a response to the ReadyCommand from the agent. Since we take a lock on connection and disconnection, restarting the agent doesn't work. To fix this, we have to restart the MS or wait for ~1 hour (the default timeout). On the agent side, it gets stuck waiting for a response from the Script execution.

To reproduce, run smoke/test_vm_life_cycle.py (the TestSecuredVmMigration test class, to be specific). Once the tests are complete, you will notice that some hosts are stuck in the Connecting state, and restarting the agent fails due to the named lock. Locks on the DB can be checked using the query below.

SELECT * FROM performance_schema.metadata_locks INNER JOIN performance_schema.threads ON THREAD_ID = OWNER_THREAD_ID WHERE PROCESSLIST_ID <> CONNECTION_ID() \G;

This PR adds a wait for the ready command and a timeout to the Script execution to ensure that the thread doesn't get stuck and the named lock in the database is released.

* Externalise a few timeouts & fix timeout for hostSupportsUefi in libvirt ready command wrapper (apache#8547)

This PR fixes a bug introduced in apache#8502. The timeout for script execution was set to 60 ms instead of 60 s, which resulted in the host not getting UEFI enabled. This is a blocker for the 4.19 release.

We do this by introducing a new agent parameter `agent.script.timeout` (default: 60 seconds) to use as the timeout for the script that checks the host's UEFI status. We also externalize the timeout for the ReadyCommand by introducing a new global setting `ready.command.wait` (default: 60 seconds).

For ModifyStoragePoolCommand, we don't externalize the timeout, to avoid confusing the user, since the required timeout can vary depending on the provider in use and we are only setting the wait for the default host listener for now. Instead, we reuse the global `wait` setting, dividing it by `5`, which makes the default value 6 minutes (1800/5 = 360s) for ModifyStoragePoolCommand.

Note: the actual time the MS waits is twice the wait set for a Command. See the reference code below.
https://github.com/apache/cloudstack/blob/19250403e645c76f60b17aa4aeb4dc915f5ca206/engine/orchestration/src/main/java/com/cloud/agent/manager/AgentAttache.java#L406-L442

* fixup
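The timeout arithmetic in the commit message above can be illustrated with a minimal Java sketch. It is not taken from the CloudStack code base: the class `TimeoutSketch` and its method names are hypothetical. Only the numbers come from the commit message (60 s script timeout, global `wait` of 1800 s divided by 5, and the MS waiting twice the wait set for a Command). Carrying the unit in a `java.time.Duration` is one possible way to avoid the kind of 60 ms versus 60 s mix-up described there.

```java
import java.time.Duration;

// Hypothetical illustration only; these names are not CloudStack APIs.
final class TimeoutSketch {
    // Default of the global `wait` setting as quoted in the commit message (seconds).
    static final int DEFAULT_GLOBAL_WAIT_SECONDS = 1800;

    // Keeping the unit in the type makes it harder to pass 60 where 60_000 ms is meant.
    static Duration scriptTimeout(long seconds) {
        return Duration.ofSeconds(seconds);
    }

    // Wait for ModifyStoragePoolCommand: the global `wait` divided by 5.
    static int modifyStoragePoolWaitSeconds(int globalWaitSeconds) {
        return globalWaitSeconds / 5;
    }

    // Per the commit message, the MS actually waits twice the wait set on a Command.
    static int effectiveManagementServerWaitSeconds(int commandWaitSeconds) {
        return commandWaitSeconds * 2;
    }

    public static void main(String[] args) {
        System.out.println(scriptTimeout(60).toMillis()); // 60000
        int wait = modifyStoragePoolWaitSeconds(DEFAULT_GLOBAL_WAIT_SECONDS);
        System.out.println(wait); // 360
        System.out.println(effectiveManagementServerWaitSeconds(wait)); // 720
    }
}
```

Under these assumptions, a ModifyStoragePoolCommand with the default settings would have a wait of 360 s, and the MS would wait up to 720 s for it.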
Added VM migration changes for config drive.