**Process Migration in DEMOS/MP**
=================================

# Take away
- what/why process migration
- support from DEMOS/MP to ease process migration (links, message passing)
  + links: context-independent
  + DELIVERTOKERNEL interface
- system process vs. user process
  + what is the difference when one is moved
- mechanism to migrate a process
- update out-of-date links: per usage (ask why?)
- policy: when/where to move? what information can be relied on when making the decision?

Xen VM migration
----------------
- VM migration vs. process migration (from Xen paper)
  + narrow interface with VMM, hence fewer residual dependencies
- Live migration
  + minimize downtime
  + minimize total migration time
  + do not disrupt active services through resource contention
- What are residual dependencies?
  + open file descriptors
  + shared memory segments
  + other local resources...

# Why bother with process migration?
- load balancing, leverage parallelism, hence improve performance
- reduce inter-machine communication by moving the process closer to the resource
- good for a single process: if its resource usage pattern changes during its lifetime, it may need to move to the node that fits best
- fault recovery:
  + if the process is saved on stable storage
  + if the processor degrades gradually
- consolidation: when we want to shut down a machine for maintenance

Note: need to consider the cost/overhead of process migration too

# What are the challenges?
- transparency
  > perform migration without affecting other operations in progress
- state of a process may be dispersed across system data structures
  -> hard to extract on source and recreate on destination
- sometimes the machine ID is part of the process ID
- some parts of the OS interact with processes in a location-dependent way, hence the process is hard to move around
- residual dependencies
- downtime: during migration, how can we still provide service, or at least minimize the downtime?
- shared channels, communication with other processes
  --> if a process moves, other processes need some way to become aware of it

# Why is process migration in DEMOS/MP feasible?
- DEMOS/MP:
  + message-based OS
  + system services are implemented as server processes
  + access to a service by message passing
  + communication-oriented kernel calls
- Each process has a link table
- Links:
  + specify the receiver of a message
  + addresses in links are *context-independent* (this is important: think about a process doing I/O that you have to migrate)
  + manipulated like capabilities (i.e. can be duplicated, passed to other processes, destroyed)
  + address in a link contains:
    > creating machine (unchangeable)
    > local unique ID (unchangeable)
    > last known machine (changeable)
    ==> hence a process in the system is identified by (creating machine ID, local unique ID)
  + special kernel communication: DELIVERTOKERNEL flag
    > causes the message to be handled by the kernel
      e.g. to suspend a process, the process manager sends a message to the kernel
  + also provide access to memory
    > the creating process specifies read/write access to its address space in the link
    > another process holding that link can read/write that location (somewhat similar to a capability)
    > mechanism for large data transfer
      - sender kernel sends data messages to receiver kernel
      - receiver kernel uses the link's info to place the data correctly
      - sender kernel does not care about the receiver process's location
- process state:
  + image: code, data, stack
  + page table
  + *link table*
  + message queue
  + execution status
  + dispatch info (what is it?)
- 2 types of processes
  + system processes: present all the time
    > process manager/memory scheduler
      - handles process scheduling
      - decides when/where to migrate a process
    > file system
    > command interpreter
    > switchboard: distributes links by name
  + user processes: created dynamically on user demand
- 2 types of links
  + long link:
    > request link: represents a service
    > resource link: represents an object (like an open file)
  + short link:
    > reply link: used to respond to a request

Note: links are the only connections a process has to the OS, system resources, and other processes
==> hence the link table completely encapsulates the execution environment of the process
Hence, the state of a process is not scattered across system structures; links are the only resource state that a process maintains

- Where are the links located?
  + in processes' link tables
  + in messages
- What if migrating a user process?
  + simple: the processes likely to hold links to a user process are system processes, and those links are mostly reply links, which are few in number
    ==> simply forward those messages to the new location
- What if migrating a system process?
  + more difficult: many processes hold long links to a system process, so there are more links to update
  ==> Problem: out-of-date links when a process migrates. Need some mechanism to update these links that is *fast* (links can sit in any link table or message, so searching the entire system for links to a particular process is far too inefficient)

# Policy question: Where and When?
- what is needed:
  + resource usage pattern of a process ==> performance monitoring
  + load on a processor
  + communication cost of migration (is it worth it?)

# The mechanism: migrating a process
This paper does not address the policy issues of:
- where to move a process to?
- when is the right time to move?

It only deals with the mechanism of *how* to move a process.
It is simple because of support from DEMOS/MP.
It involves 8 steps:
1) source kernel: remove process from execution
   + queue all incoming messages
2) source kernel: ask destination kernel to move process
   + send the destination info about the process to be migrated
3) destination: allocate a process state
   + new process ID is the same as that of the migrating process (*Question*: what if this process ID is not available?)
4) destination: transfer process state (*Question*: how to handle page table?)
   (Note: resident and swappable data are transferred)
5) destination: transfer the program
   ==> changes to the page table may be made here
6) source: forward pending messages
   + location part of the process address is changed
7) source: clean up, just leave a *forwarding address*
8) destination: restart the process

# What if the process does not want to migrate?
# And what if the destination kernel refuses the migration?
==> checks can be made during the migration of a process

# Message forwarding: 3 cases
- message sent but not received before the process finished moving:
  + forwarded immediately as part of migration
- message sent after the process has moved, using an old link:
  + forwarded as it arrives (and the update-link procedure may take place)
- message sent after migration with a new link: trivial

Problem: forwarding messages may incur overhead
- alternative to message forwarding: return an error
  ==> sender may do a system-wide search to find the new location (costly)
- When to delete the forwarding address:
  + when the process dies
  + when all links are up-to-date. How to know? Counter

# Updating links
- once a process has migrated, links to it are out-of-date
- a message to the *old* place will be forwarded to the *new* place
  ==> but this incurs communication overhead
- how to update the links:
  + Alternative 1: system-wide search, and update all at once ==> costly
  + Alternative 2: update as they are used
    i.e. when a message is forwarded, an update-link message is sent back to the sender, asking the sender to update the link
- Why is Alternative 2 good?
  + for user processes, there are only a few links to them (mostly from system processes)
  + system processes are rarely moved

# Cost of migration
- actual cost of moving the process and its state
  + state transfer cost:
    > code and data
    > resident state
    > swapping state
    > message forwarding, if any arrive during migration
  + administrative cost: setting up the connection, acks of messages, etc.
- incremental cost: updating message paths
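
To make the forwarding-address and update-on-use ideas concrete, here is a minimal Python sketch. It is a toy model, not DEMOS/MP code: the names `Link`, `Kernel`, `migrate`, and `send` are hypothetical, kernels are plain in-memory objects, and the update-link message is reduced to a direct field assignment. It only illustrates that a stale link costs one extra hop the first time it is used and none afterwards, which is the incremental cost listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Hypothetical link address with the three fields listed in the notes."""
    creating_machine: int     # unchangeable
    local_id: int             # unchangeable
    last_known_machine: int   # changeable location hint

    @property
    def pid(self):
        # A process is identified by the two unchangeable fields.
        return (self.creating_machine, self.local_id)


@dataclass
class Kernel:
    """Toy kernel: tracks resident processes and forwarding addresses."""
    machine: int
    resident: set = field(default_factory=set)       # pids running here
    forwarding: dict = field(default_factory=dict)   # pid -> machine it moved to


def migrate(kernels, pid, src, dst):
    """Move pid from src to dst, leaving a forwarding address at src."""
    kernels[src].resident.remove(pid)
    kernels[dst].resident.add(pid)
    kernels[src].forwarding[pid] = dst


def send(kernels, link):
    """Deliver one message over link and return the number of kernel hops."""
    machine = link.last_known_machine
    hops = 1
    while link.pid not in kernels[machine].resident:
        # Process is gone: follow the forwarding address left behind.
        machine = kernels[machine].forwarding[link.pid]
        hops += 1
    if machine != link.last_known_machine:
        # Assumption: stands in for the update-link message sent to the sender.
        link.last_known_machine = machine
    return hops


kernels = {m: Kernel(m) for m in (0, 1, 2)}
pid = (0, 7)
kernels[0].resident.add(pid)
link = Link(creating_machine=0, local_id=7, last_known_machine=0)

before = send(kernels, link)        # link is up to date
migrate(kernels, pid, src=0, dst=2)
first_after = send(kernels, link)   # stale link: forwarded, then updated
second_after = send(kernels, link)  # link now points at the new machine
print(before, first_after, second_after)  # prints: 1 2 1
```

The sketch leaves out everything else in the eight-step procedure (state transfer, queued messages, clean-up) and never deletes forwarding addresses; a counter of outstanding stale links, as suggested in the notes, would be one way to decide when an entry can be dropped.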