54
Full system emulation. In this mode, QEMU emulates a full system
55
(usually a PC), including a processor and various peripherals. It can
56
be used to launch an different Operating System without rebooting the
57
PC or to debug system code.
55
Full system emulation. In this mode (full platform virtualization),
56
QEMU emulates a full system (usually a PC), including a processor and
57
various peripherals. It can be used to launch several different
58
Operating Systems at once without rebooting the host machine or to
60
User mode emulation (Linux host only). In this mode, QEMU can launch
61
Linux processes compiled for one CPU on another CPU. It can be used to
62
launch the Wine Windows API emulator (@url{http://www.winehq.org}) or
63
to ease cross-compilation and cross-debugging.
62
User mode emulation. In this mode (application level virtualization),
63
QEMU can launch processes compiled for one CPU on another CPU; however,
64
the Operating Systems must match. This can be used for example to ease
65
cross-compilation and cross-debugging.
67
68
As QEMU requires no host kernel driver to run, it is very safe and
96
104
@item Accurate signal handling by remapping host signals to target signals.
107
Linux user emulator (Linux host only) can be used to launch the Wine
108
Windows API emulator (@url{http://www.winehq.org}). A Darwin user
109
emulator (Darwin hosts only) exists and a BSD user emulator for BSD
110
hosts is under development. It would also be possible to develop a
111
similar user emulator for Solaris.
99
113
QEMU full system emulation features:
101
@item QEMU can either use a full software MMU for maximum portability or use the host system call mmap() to simulate the target MMU.
116
QEMU uses a full software MMU for maximum portability.
119
QEMU can optionally use an in-kernel accelerator, like kqemu and
120
kvm. The accelerators execute some of the guest code natively, while
121
continuing to emulate the rest of the machine.
124
Various hardware devices can be emulated and in some cases, host
125
devices (e.g. serial and parallel ports, USB, drives) can be used
126
transparently by the guest Operating System. Host device passthrough
127
can be used for talking to external physical peripherals (e.g. a
128
webcam, modem or tape drive).
131
Symmetric multiprocessing (SMP) even on a host with a single CPU. On an
132
SMP host system, QEMU can use only one CPU fully due to difficulty in
133
implementing atomic memory accesses efficiently.
104
137
@node intro_x86_emulation
105
@section x86 emulation
138
@section x86 and x86-64 emulation
107
140
QEMU x86 target features:
111
144
@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
112
LDT/GDT and IDT are emulated. VM86 mode is also supported to run DOSEMU.
145
LDT/GDT and IDT are emulated. VM86 mode is also supported to run
146
DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3,
147
and SSE4 as well as x86-64 SVM.
114
149
@item Support of host page sizes bigger than 4KB in user mode emulation.
273
320
QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is
276
The new Plex86 [8] PC virtualizer is done in the same spirit as the
277
qemu-fast system emulator. It requires a patched Linux kernel to work
278
(you cannot launch the same kernel on your PC), but the patches are
279
really small. As it is a PC virtualizer (no emulation is done except
280
for some priveledged instructions), it has the potential of being
281
faster than QEMU. The downside is that a complicated (and potentially
282
unsafe) host kernel patch is needed.
323
The Plex86 [8] PC virtualizer is done in the same spirit as the now
324
obsolete qemu-fast system emulator. It requires a patched Linux kernel
325
to work (you cannot launch the same kernel on your PC), but the
326
patches are really small. As it is a PC virtualizer (no emulation is
327
done except for some privileged instructions), it has the potential of
328
being faster than QEMU. The downside is that a complicated (and
329
potentially unsafe) host kernel patch is needed.
284
331
The commercial PC Virtualizers (VMWare [9], VirtualPC [10], TwoOStwo
285
332
[11]) are faster than QEMU, but they all need specific, proprietary
286
333
and potentially unsafe host drivers. Moreover, they are unable to
287
334
provide cycle exact simulation as an emulator can.
336
VirtualBox [12], Xen [13] and KVM [14] are based on QEMU. QEMU-SystemC
337
[15] uses QEMU to simulate a system where some hardware devices are
338
developed in SystemC.
289
340
@node Portable dynamic translation
290
341
@section Portable dynamic translation
295
346
which make it relatively easily portable and simple while achieving good
298
The basic idea is to split every x86 instruction into fewer simpler
299
instructions. Each simple instruction is implemented by a piece of C
300
code (see @file{target-i386/op.c}). Then a compile time tool
301
(@file{dyngen}) takes the corresponding object file (@file{op.o})
302
to generate a dynamic code generator which concatenates the simple
303
instructions to build a function (see @file{op.h:dyngen_code()}).
305
In essence, the process is similar to [1], but more work is done at
308
A key idea to get optimal performances is that constant parameters can
309
be passed to the simple operations. For that purpose, dummy ELF
310
relocations are generated with gcc for each constant parameter. Then,
311
the tool (@file{dyngen}) can locate the relocations and generate the
312
appropriate C code to resolve them when building the dynamic code.
314
That way, QEMU is no more difficult to port than a dynamic linker.
316
To go even faster, GCC static register variables are used to keep the
317
state of the virtual CPU.
319
@node Register allocation
320
@section Register allocation
322
Since QEMU uses fixed simple instructions, no efficient register
323
allocation can be done. However, because RISC CPUs have a lot of
324
register, most of the virtual CPU state can be put in registers without
325
doing complicated register allocation.
349
After the release of version 0.9.1, QEMU switched to a new method of
350
generating code, Tiny Code Generator or TCG. TCG relaxes the
351
dependency on the exact version of the compiler used. The basic idea
352
is to split every target instruction into a couple of RISC-like TCG
353
ops (see @code{target-i386/translate.c}). Some optimizations can be
354
performed at this stage, including liveness analysis and trivial
355
constant expression evaluation. TCG ops are then implemented in the
356
host CPU back end, also known as TCG target (see
357
@code{tcg/i386/tcg-target.c}). For more information, please take a
358
look at @code{tcg/README}.
327
360
@node Condition code optimisations
328
361
@section Condition code optimisations
330
Good CPU condition codes emulation (@code{EFLAGS} register on x86) is a
331
critical point to get good performances. QEMU uses lazy condition code
332
evaluation: instead of computing the condition codes after each x86
333
instruction, it just stores one operand (called @code{CC_SRC}), the
334
result (called @code{CC_DST}) and the type of operation (called
337
@code{CC_OP} is almost never explicitely set in the generated code
363
Lazy evaluation of CPU condition codes (@code{EFLAGS} register on x86)
364
is important for CPUs where every instruction sets the condition
365
codes. It tends to be less important on conventional RISC systems
366
where condition codes are only updated when explicitly requested.
368
Instead of computing the condition codes after each x86 instruction,
369
QEMU just stores one operand (called @code{CC_SRC}), the result
370
(called @code{CC_DST}) and the type of operation (called
371
@code{CC_OP}). When the condition codes are needed, the condition
372
codes can be calculated using this information. In addition, an
373
optimized calculation can be performed for some instruction types like
374
conditional branches.
376
@code{CC_OP} is almost never explicitly set in the generated code
338
377
because it is known at translation time.
340
In order to increase performances, a backward pass is performed on the
341
generated simple instructions (see
342
@code{target-i386/translate.c:optimize_flags()}). When it can be proved that
343
the condition codes are not needed by the next instructions, no
344
condition codes are computed at all.
379
The lazy condition code evaluation is used on x86, m68k and cris. ARM
380
uses a simplified variant for the N and Z flags.
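
The lazy scheme described above can be illustrated with a minimal C sketch. It is not QEMU's implementation: the @code{struct lazy_cc} type, the @code{emu_*} and @code{cc_*} helpers and the three operation tags are hypothetical names chosen for this example, and only 32 bit add, subtract and logic operations are modelled. Each emulated instruction only records an operand, the result and the operation type; a flag is derived from them when something actually asks for it.

```c
#include <stdint.h>

/* Hypothetical operation tags for this sketch; QEMU's real CC_OP
   values are more numerous and are not reproduced here. */
enum cc_op { CC_OP_ADD32, CC_OP_SUB32, CC_OP_LOGIC32 };

struct lazy_cc {
    uint32_t cc_src;   /* one operand of the last flag-setting operation */
    uint32_t cc_dst;   /* its result */
    enum cc_op cc_op;  /* which kind of operation it was */
};

/* Emulated instructions: do the arithmetic, record three values,
   compute no flags. */
static uint32_t emu_add32(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    uint32_t res = a + b;
    cc->cc_src = b;
    cc->cc_dst = res;
    cc->cc_op = CC_OP_ADD32;
    return res;
}

static uint32_t emu_sub32(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    uint32_t res = a - b;
    cc->cc_src = b;
    cc->cc_dst = res;
    cc->cc_op = CC_OP_SUB32;
    return res;
}

static uint32_t emu_and32(struct lazy_cc *cc, uint32_t a, uint32_t b)
{
    uint32_t res = a & b;
    cc->cc_src = 0;
    cc->cc_dst = res;
    cc->cc_op = CC_OP_LOGIC32;
    return res;
}

/* Flags are derived on demand from the recorded values. */
static int cc_zero(const struct lazy_cc *cc)
{
    return cc->cc_dst == 0;
}

static int cc_sign(const struct lazy_cc *cc)
{
    return (int)((cc->cc_dst >> 31) & 1u);
}

static int cc_carry(const struct lazy_cc *cc)
{
    switch (cc->cc_op) {
    case CC_OP_ADD32:
        /* a + b wrapped around iff the result is below the operand b */
        return cc->cc_dst < cc->cc_src;
    case CC_OP_SUB32: {
        /* recover a = result + b; a borrow occurred iff a < b */
        uint32_t a = cc->cc_dst + cc->cc_src;
        return a < cc->cc_src;
    }
    default:
        /* logic operations clear the carry flag */
        return 0;
    }
}
```

In this sketch a run of arithmetic instructions costs three stores each, and the switch on the operation type is only executed by the instruction that finally consumes a flag.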
346
382
@node CPU state optimisations
347
383
@section CPU state optimisations
349
The x86 CPU has many internal states which change the way it evaluates
350
instructions. In order to achieve a good speed, the translation phase
351
considers that some state information of the virtual x86 CPU cannot
352
change in it. For example, if the SS, DS and ES segments have a zero
353
base, then the translator does not even generate an addition for the
385
The target CPUs have many internal states which change the way they
386
evaluate instructions. In order to achieve a good speed, the
387
translation phase considers that some state information of the virtual
388
CPU cannot change in it. The state is recorded in the Translation
389
Block (TB). If the state changes (e.g. privilege level), a new TB will
390
be generated and the previous TB won't be used anymore until the state
391
matches the state recorded in the previous TB. For example, if the SS,
392
DS and ES segments have a zero base, then the translator does not even
393
generate an addition for the segment base.
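
One way to picture the role of the recorded state is a block cache whose lookup key includes the state bits as well as the guest program counter. The sketch below is an illustration under that assumption, not QEMU's actual data structure: @code{struct toy_tb}, @code{struct toy_tb_cache}, the @code{TOY_FLAG_*} bits and the linear search are all hypothetical simplifications.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_TB_CACHE_SIZE 64

/* Hypothetical state bits baked into a translation. */
#define TOY_FLAG_CPL_MASK 0x3u  /* privilege level */
#define TOY_FLAG_ADDSEG   0x4u  /* set when a segment base is non-zero */

struct toy_tb {
    uint32_t pc;     /* guest address the block was translated from */
    uint32_t flags;  /* CPU state assumed constant inside the block */
    int valid;
    unsigned id;     /* stands in for the generated host code */
};

struct toy_tb_cache {
    struct toy_tb slots[TOY_TB_CACHE_SIZE];
    unsigned translations;  /* number of blocks translated so far */
};

/* A block matches only if both the address and the state agree. */
static struct toy_tb *toy_tb_find(struct toy_tb_cache *c,
                                  uint32_t pc, uint32_t flags)
{
    for (size_t i = 0; i < TOY_TB_CACHE_SIZE; i++) {
        struct toy_tb *t = &c->slots[i];
        if (t->valid && t->pc == pc && t->flags == flags)
            return t;
    }
    return NULL;
}

/* Reuse a matching block, otherwise "translate" a new one. Slots are
   recycled round robin, which is a simplification for this sketch. */
static struct toy_tb *toy_tb_lookup_or_translate(struct toy_tb_cache *c,
                                                 uint32_t pc, uint32_t flags)
{
    struct toy_tb *t = toy_tb_find(c, pc, flags);
    if (t)
        return t;
    t = &c->slots[c->translations % TOY_TB_CACHE_SIZE];
    t->pc = pc;
    t->flags = flags;
    t->valid = 1;
    t->id = c->translations++;
    return t;
}
```

With this keying, a change of privilege level does not destroy the earlier block: it is simply not selected until the CPU state again matches the flags stored in it.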
356
395
[The FPU stack pointer register is not handled that way yet].
390
429
When translated code is generated for a basic block, the corresponding
391
host page is write protected if it is not already read-only (with the
392
system call @code{mprotect()}). Then, if a write access is done to the
393
page, Linux raises a SEGV signal. QEMU then invalidates all the
394
translated code in the page and enables write accesses to the page.
430
host page is write protected if it is not already read-only. Then, if
431
a write access is done to the page, Linux raises a SEGV signal. QEMU
432
then invalidates all the translated code in the page and enables write
433
accesses to the page.
396
435
Correct translated code invalidation is done efficiently by maintaining
397
436
a linked list of every translated block contained in a given page. Other
398
437
linked lists are also maintained to undo direct block chaining.
400
Although the overhead of doing @code{mprotect()} calls is important,
401
most MSDOS programs can be emulated at reasonable speed with QEMU and
404
Note that QEMU also invalidates pages of translated code when it detects
405
that memory mappings are modified with @code{mmap()} or @code{munmap()}.
407
When using a software MMU, the code invalidation is more efficient: if
408
a given code page is invalidated too often because of write accesses,
409
then a bitmap representing all the code inside the page is
410
built. Every store into that page checks the bitmap to see if the code
411
really needs to be invalidated. It avoids invalidating the code when
412
only data is modified in the page.
439
On RISC targets, correctly written software uses memory barriers and
440
cache flushes, so some of the protection above would not be
441
necessary. However, QEMU still requires that the generated code always
442
matches the target instructions in memory in order to handle
443
exceptions correctly.
414
445
@node Exception support
415
446
@section Exception support
448
473
When MMU mappings change, only the chaining of the basic blocks is
449
474
reset (i.e. a basic block can no longer jump directly to another one).
476
@node Device emulation
477
@section Device emulation
479
Systems emulated by QEMU are organized by boards. At initialization
480
phase, each board instantiates a number of CPUs, devices, RAM and
481
ROM. Each device in turn can assign I/O ports or memory areas (for
482
MMIO) to its handlers. When the emulation starts, an access to the
483
ports or MMIO memory areas assigned to the device causes the
484
corresponding handler to be called.
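
The handler mechanism can be sketched as a table indexed by port number. The names below (@code{toy_register_ioport}, @code{toy_cpu_in}, @code{toy_cpu_out} and the scratch register device) are hypothetical and are not QEMU's API; the value returned for an unassigned port is also an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_IOPORTS 65536u

typedef uint32_t (*toy_io_read_fn)(void *opaque, uint32_t addr);
typedef void (*toy_io_write_fn)(void *opaque, uint32_t addr, uint32_t val);

struct toy_ioport_slot {
    toy_io_read_fn read;
    toy_io_write_fn write;
    void *opaque;  /* device instance passed back to the handlers */
};

static struct toy_ioport_slot toy_ioport_table[TOY_MAX_IOPORTS];

/* Called by a device at board initialization to claim a port range. */
static void toy_register_ioport(uint32_t start, uint32_t len,
                                toy_io_read_fn read, toy_io_write_fn write,
                                void *opaque)
{
    assert(start <= TOY_MAX_IOPORTS && len <= TOY_MAX_IOPORTS - start);
    for (uint32_t p = start; p < start + len; p++) {
        toy_ioport_table[p].read = read;
        toy_ioport_table[p].write = write;
        toy_ioport_table[p].opaque = opaque;
    }
}

/* Called by the CPU emulation when the guest executes an I/O access. */
static uint32_t toy_cpu_in(uint32_t addr)
{
    const struct toy_ioport_slot *s = &toy_ioport_table[addr % TOY_MAX_IOPORTS];
    return s->read ? s->read(s->opaque, addr) : 0xffffffffu; /* unassigned */
}

static void toy_cpu_out(uint32_t addr, uint32_t val)
{
    const struct toy_ioport_slot *s = &toy_ioport_table[addr % TOY_MAX_IOPORTS];
    if (s->write)
        s->write(s->opaque, addr, val);
}

/* A trivial device: one register that remembers the last value written. */
struct toy_scratch_dev {
    uint32_t reg;
};

static uint32_t toy_scratch_read(void *opaque, uint32_t addr)
{
    (void)addr;
    return ((struct toy_scratch_dev *)opaque)->reg;
}

static void toy_scratch_write(void *opaque, uint32_t addr, uint32_t val)
{
    (void)addr;
    ((struct toy_scratch_dev *)opaque)->reg = val;
}
```

A board would call the registration function once per device during initialization; afterwards the CPU emulation only needs the two dispatch functions and never refers to a device by name.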
486
RAM and ROM are handled more optimally: only the offset to the host
487
memory needs to be added to the guest address.
489
The video RAM of VGA and other display cards is special: it can be
490
read or written directly like RAM, but write accesses cause the memory
491
to be marked with the VGA_DIRTY flag as well.
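
The two memory paths described above can be contrasted in a small sketch: a plain RAM access is only an offset addition, while a video RAM write additionally marks the page it touches. Everything here is a simplifying assumption for illustration, including the 4 KB page size, the 16 page memory, the one byte of flags per page and the @code{TOY_VGA_DIRTY} bit value; none of it is taken from QEMU's source.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12
#define TOY_RAM_PAGES 16u
#define TOY_RAM_SIZE  (TOY_RAM_PAGES << TOY_PAGE_BITS)
#define TOY_VGA_DIRTY 0x01u  /* hypothetical bit value for this sketch */

static uint8_t toy_host_ram[TOY_RAM_SIZE];  /* host memory backing the guest */
static uint8_t toy_dirty[TOY_RAM_PAGES];    /* one byte of flags per page */

/* RAM path: translating a guest address is just adding an offset. */
static uint8_t *toy_guest_to_host(uint32_t guest_addr)
{
    assert(guest_addr < TOY_RAM_SIZE);
    return toy_host_ram + guest_addr;
}

/* Video RAM reads behave like ordinary RAM reads. */
static uint8_t toy_vram_read8(uint32_t guest_addr)
{
    return *toy_guest_to_host(guest_addr);
}

/* Video RAM writes store the value and also mark the page as dirty. */
static void toy_vram_write8(uint32_t guest_addr, uint8_t val)
{
    *toy_guest_to_host(guest_addr) = val;
    toy_dirty[guest_addr >> TOY_PAGE_BITS] |= TOY_VGA_DIRTY;
}

/* The display code can ask which pages changed since the last refresh. */
static int toy_test_and_clear_dirty(uint32_t page)
{
    assert(page < TOY_RAM_PAGES);
    int was_dirty = (toy_dirty[page] & TOY_VGA_DIRTY) != 0;
    toy_dirty[page] &= (uint8_t)~TOY_VGA_DIRTY;
    return was_dirty;
}
```

Under these assumptions a display refresh only has to redraw the pages for which the test-and-clear call reports a write.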
493
QEMU supports some device classes like serial and parallel ports, USB,
494
drives and network devices, by providing APIs for easier connection to
495
the generic, higher level implementations. The API hides the
496
implementation details from the devices, like native device use or
497
advanced block device formats like QCOW.
499
Usually the devices implement a reset method and register support for
500
saving and loading of the device state. The devices can also use
501
timers, especially together with the use of bottom halves (BHs).
451
503
@node Hardware interrupts
452
504
@section Hardware interrupts