
(VMM), or ``hypervisor'', for a variety of processor architectures including x86. Xen can securely execute multiple virtual machines on a single physical system with near native performance. Xen facilitates enterprise-grade functionality, including:

\begin{itemize}
\item Virtual machines with performance close to native hardware.
\item Live migration of running virtual machines between physical hosts.
\item Up to 32\footnote{IA64 supports up to 64 virtual CPUs per guest virtual machine.} virtual CPUs per guest virtual machine, with VCPU hotplug.
\item x86/32 with PAE, x86/64, and IA64 platform support.
\item Intel and AMD Virtualization Technology for unmodified guest operating systems (including Microsoft Windows).
\item Excellent hardware support (supports almost all Linux device
  drivers).
\end{itemize}

\section{Usage Scenarios}

Usage scenarios for Xen include:

\begin{description}
\item [Server Consolidation.] Move multiple servers onto a single
  physical host with performance and fault isolation provided at the
  virtual machine boundaries.
\item [Hardware Independence.] Allow legacy applications and operating
  systems to exploit new hardware.
\item [Multiple OS configurations.] Run multiple operating systems
  simultaneously, for development or testing purposes.
\item [Kernel Development.] Test and debug kernel modifications in a
  sand-boxed virtual machine, with no need for a separate test machine.
\item [Cluster Computing.] Management at VM granularity provides more
  flexibility than separately managing each physical host, but better
  control and isolation than single-system image solutions,
  particularly by using live migration for load balancing.
\item [Hardware support for custom OSes.] Allow development of new
  OSes while benefiting from the wide-ranging hardware support of
  existing OSes such as Linux.
\end{description}

\section{Operating System Support}

Para-virtualization permits very high performance virtualization, even
on architectures like x86 that are traditionally very hard to
virtualize.

This approach requires operating systems to be \emph{ported} to run on
Xen. Porting an OS to run on Xen is similar to supporting a new
hardware platform; however, the process is simplified because the
para-virtual machine architecture is very similar to the underlying
native hardware. Even though operating system kernels must explicitly
support Xen, a key feature is that user space applications and
libraries \emph{do not} require modification.

With hardware CPU virtualization as provided by Intel VT and AMD
SVM technology, the ability to run an unmodified guest OS kernel
is available. No porting of the OS is required, although some
additional driver support is necessary within Xen itself. Unlike
traditional full virtualization hypervisors, which suffer a tremendous
performance overhead, Xen combined with Intel VT or AMD SVM
(code-named Pacifica) offers superb performance for para-virtualized
guest operating systems and full support for unmodified guests running
natively on the processor.

Paravirtualized Xen support is available for increasingly many
operating systems: currently, mature Linux support is available and
included in the standard distribution. Other OS ports, including
NetBSD, FreeBSD and Solaris, are also complete.

\section{Hardware Support}

Xen currently runs on the IA64 and x86 architectures. Multiprocessor
machines are supported, and there is support for HyperThreading (SMT).

The default 32-bit Xen requires processor support for Physical
Addressing Extensions (PAE), which enables the hypervisor to address
up to 16GB of physical memory. Xen also supports x86/64 platforms
such as Intel EM64T and AMD Opteron which can currently address up to
1TB of physical memory.

Xen offloads most of the hardware support issues to the guest OS
running in the \emph{Domain~0} management virtual machine. Xen itself
contains only the code required to detect and start secondary
processors, set up interrupt routing, and perform PCI bus
enumeration. Device drivers run within a privileged guest OS rather
than within Xen itself. This approach provides compatibility with the
majority of device hardware supported by Linux. The default XenLinux
build contains support for most server-class network and disk
hardware, but you can add support for other hardware by configuring
your XenLinux kernel in the normal way.

\section{Structure of a Xen-Based System}

A Xen system has multiple layers, the lowest and most privileged of
which is Xen itself.

Xen may host multiple \emph{guest} operating systems, each of which is
executed within a secure virtual machine (in Xen terminology, a
\emph{domain}). Domains are scheduled by Xen to make effective use of the
available physical CPUs. Each guest OS manages its own applications.
This management includes the responsibility of scheduling each
application within the time allotted to the VM by Xen.

The first domain, \emph{domain~0}, is created automatically when the
system boots and has special management privileges. Domain~0 builds
other domains and manages their virtual devices. It also performs
administrative tasks such as suspending, resuming and migrating other
virtual machines.

Within domain~0, a process called \emph{xend} runs to manage the system.
\Xend\ is responsible for managing virtual machines and providing access
to their consoles. Commands are issued to \xend\ over an HTTP interface,
via a command-line tool.

\section{History}

Xen was originally developed by the Systems Research Group at the
University of Cambridge Computer Laboratory as part of the XenoServers
project, funded by the UK-EPSRC\@.

XenoServers aim to provide a ``public infrastructure for global
distributed computing''. Xen plays a key part in that, allowing one to
efficiently partition a single machine to enable multiple independent
clients to run their operating systems and applications in an
environment. This environment provides protection, resource isolation
and accounting. The project web page contains further information along
with pointers to papers and technical reports:
\path{http://www.cl.cam.ac.uk/xeno}

Xen has grown into a fully-fledged project in its own right, enabling us
to investigate interesting research issues regarding the best techniques
for virtualizing resources such as the CPU, memory, disk and network.
Project contributors now include Citrix, Intel, IBM, HP, AMD, Novell,
RedHat, Sun, Fujitsu, and Samsung.

Xen was first described in a paper presented at SOSP in
2003\footnote{\tt
  http://www.cl.cam.ac.uk/netos/papers/2003-xensosp.pdf}, and the first
public release (1.0) was made that October. Since then, Xen has
significantly matured and is now used in production scenarios on many
sites.

\section{What's New}

Xen 3.3.0 offers:

\begin{itemize}
\item IO Emulation (stub domains) for HVM IO performance and scalability
\item Replacement of Intel VT vmxassist by new 16b emulation code
\item Improved VT-d device pass-through e.g.\ for graphics devices
\item Enhanced C and P state power management
\item Exploitation of multi-queue support on modern NICs
\item Removal of domain lock for improved PV guest scalability
\item 2MB page support for HVM and PV guests
\item CPU Portability
\end{itemize}

Xen 3.3 delivers the capabilities needed by enterprise customers and gives computing industry leaders a solid, secure platform to build upon for their virtualization solutions. This latest release establishes Xen as the definitive open source solution for virtualization.

\part{Installation}

%% Chapter Basic Installation
\chapter{Basic Installation}

The Xen distribution includes three main components: Xen itself, ports
of Linux and NetBSD to run on Xen, and the userspace tools required to
manage a Xen-based system. This chapter describes how to install the
Xen~3.3 distribution from source. Alternatively, there may be pre-built
packages available as part of your operating system distribution.

\section{Prerequisites}
\label{sec:prerequisites}

The following is a full list of prerequisites. Items marked `$\dag$' are
required by the \xend\ control tools, and hence required if you want to
run more than one virtual machine; items marked `$*$' are only required
if you wish to build from source.
\begin{itemize}
\item A working Linux distribution using the GRUB bootloader and running
  on a P6-class or newer CPU\@.
\item [$\dag$] The \path{iproute2} package.
\item [$\dag$] The Linux bridge-utils\footnote{Available from {\tt
    http://bridge.sourceforge.net}} (e.g., \path{/sbin/brctl})
\item [$\dag$] The Linux hotplug system\footnote{Available from {\tt
    http://linux-hotplug.sourceforge.net/}} (e.g.,
  \path{/sbin/hotplug} and related scripts). On newer distributions,
  this is included alongside the Linux udev system\footnote{See {\tt
    http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html/}}.
\item [$*$] Build tools (gcc v3.2.x or v3.3.x, binutils, GNU make).
\item [$*$] Development installation of zlib (e.g.,\ zlib-dev).
\item [$*$] Development installation of Python v2.2 or later (e.g.,\
  python-dev).
\item [$*$] \LaTeX\ and transfig are required to build the
  documentation.
\end{itemize}

Once you have satisfied these prerequisites, you can now install either
a binary or source distribution of Xen.

\section{Installing from Binary Tarball}

Pre-built tarballs are available for download from the XenSource downloads
page:
\begin{quote} {\tt http://www.xensource.com/downloads/}
\end{quote}

Once you've downloaded the tarball, simply unpack and install:
\begin{verbatim}
# tar zxvf xen-3.0-install.tgz
# cd xen-3.0-install
# sh ./install.sh
\end{verbatim}

Once you've installed the binaries you need to configure your system as
described in Section~\ref{s:configure}.

\section{Installing from RPMs}
Pre-built RPMs are available for download from the XenSource downloads
page:
\begin{quote} {\tt http://www.xensource.com/downloads/}
\end{quote}

Once you've downloaded the RPMs, you typically install them via the
RPM commands:

\verb|# rpm -iv rpmname|

See the instructions and the Release Notes for each RPM set referenced at:
\begin{quote}
  {\tt http://www.xensource.com/downloads/}.
\end{quote}

\section{Installing from Source}

This section describes how to obtain, build and install Xen from source.

\subsection{Obtaining the Source}

The Xen source tree is available as either a compressed source tarball
or as a clone of our master Mercurial repository.

\begin{description}
\item[Obtaining the Source Tarball]\mbox{} \\
  Stable versions and daily snapshots of the Xen source tree are
  available from the Xen download page:
  \begin{quote} {\tt http://www.xensource.com/downloads/}
  \end{quote}
\item[Obtaining the source via Mercurial]\mbox{} \\
  The source tree may also be obtained via the public Mercurial
  repository at:
  \begin{quote}{\tt http://xenbits.xensource.com}
  \end{quote} See the instructions and the Getting Started Guide
  referenced at:
  \begin{quote}
    {\tt http://www.xensource.com/downloads/}
  \end{quote}
\end{description}

% \section{The distribution}

% The Xen source code repository is structured as follows:

% \begin{description}
% \item[\path{tools/}] Xen node controller daemon (Xend), command line
%   tools, control libraries
% \item[\path{xen/}] The Xen VMM.
% \item[\path{buildconfigs/}] Build configuration files
% \item[\path{linux-*-xen-sparse/}] Xen support for Linux.
% \item[\path{patches/}] Experimental patches for Linux.
% \item[\path{docs/}] Various documentation files for users and
%   developers.
% \item[\path{extras/}] Bonus extras.
% \end{description}

\subsection{Building from Source}

The top-level Xen Makefile includes a target ``world'' that will do the
following:

\begin{itemize}
\item Build Xen.
\item Build the control tools, including \xend.
\item Download (if necessary) and unpack the Linux 2.6 source code, and
  patch it for use with Xen.
\item Build a Linux kernel to use in domain~0 and a smaller unprivileged
  kernel, which can be used for unprivileged virtual machines.
\end{itemize}
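
As a minimal illustration of the steps listed above, the target is
invoked from the top of the source tree. This sketch assumes that you
are already in the unpacked Xen source directory and that the machine
can download the Linux source if it is not present locally:
\begin{quote}
\begin{verbatim}
# make world
\end{verbatim}
\end{quote}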

After the build has completed you should have a top-level directory
called \path{dist/} in which all resulting targets will be placed. Of
particular interest are the two XenLinux kernel images, one with a
``-xen0'' extension which contains hardware device drivers and drivers
for Xen's virtual devices, and one with a ``-xenU'' extension that
just contains the virtual ones. These are found in
\path{dist/install/boot/} along with the image for Xen itself and the
configuration files used during the build.

%The NetBSD port can be built using:
%\begin{quote}
%\begin{verbatim}
%# make netbsd20
%\end{verbatim}\end{quote}
%NetBSD port is built using a snapshot of the netbsd-2-0 cvs branch.
%The snapshot is downloaded as part of the build process if it is not
%yet present in the \path{NETBSD\_SRC\_PATH} search path. The build
%process also downloads a toolchain which includes all of the tools
%necessary to build the NetBSD kernel under Linux.

To customize the set of kernels built you need to edit the top-level
Makefile. Look for the line:
\begin{quote}
\begin{verbatim}
KERNELS ?= linux-2.6-xen0 linux-2.6-xenU
\end{verbatim}
\end{quote}

You can edit this line to include any set of operating system kernels
which have configurations in the top-level \path{buildconfigs/}
directory.
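
Because the assignment shown above uses GNU make's conditional
\verb|?=| operator, a value supplied on the \path{make} command line or
in the environment takes precedence over the default. It should
therefore also be possible to select kernels without editing the
Makefile. The following is a sketch under that assumption, building
only the domain~0 kernel; check it against the Makefile in your own
source tree before relying on it:
\begin{quote}
\begin{verbatim}
# make KERNELS="linux-2.6-xen0" world
\end{verbatim}
\end{quote}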

%% Inspect the Makefile if you want to see what goes on during a
%% build. Building Xen and the tools is straightforward, but XenLinux
%% is more complicated. The makefile needs a `pristine' Linux kernel
%% tree to which it will then add the Xen architecture files. You can
%% tell the makefile the location of the appropriate Linux compressed
%% tar file by
%% setting the LINUX\_SRC environment variable, e.g. \\
%% \verb!# LINUX_SRC=/tmp/linux-2.6.11.tar.bz2 make world! \\ or by
%% placing the tar file somewhere in the search path of {\tt
%%   LINUX\_SRC\_PATH} which defaults to `{\tt .:..}'. If the
%% makefile can't find a suitable kernel tar file it attempts to
%% download it from kernel.org (this won't work if you're behind a
%% firewall).

%% After untaring the pristine kernel tree, the makefile uses the {\tt
%%   mkbuildtree} script to add the Xen patches to the kernel.

%% \framebox{\parbox{5in}{
%%     {\bf Distro specific:} \\
%%     {\it Gentoo} --- if not using udev (most installations,
%%     currently), you'll need to enable devfs and devfs mount at boot
%%     time in the xen0 config. }}

\subsection{Custom Kernels}

% If you have an SMP machine you may wish to give the {\tt '-j4'}
% argument to make to get a parallel build.

If you wish to build a customized XenLinux kernel (e.g.\ to support
additional devices or enable distribution-required features), you can
use the standard Linux configuration mechanisms, specifying that the
architecture being built for is \path{xen}, e.g.:
\begin{quote}
\begin{verbatim}
# cd linux-2.6.12-xen0
# make ARCH=xen xconfig
# cd ..
# make
\end{verbatim}
\end{quote}

You can also copy an existing Linux configuration (\path{.config}) into
e.g.\ \path{linux-2.6.12-xen0} and execute:
\begin{quote}
\begin{verbatim}
# make ARCH=xen oldconfig
\end{verbatim}
\end{quote}

You may be prompted with some Xen-specific options. We advise accepting
the defaults for these options.

Note that the only difference between the two types of Linux kernels
that are built is the configuration file used for each. The ``U''
suffixed (unprivileged) versions don't contain any of the physical
hardware device drivers, leading to a 30\% reduction in size; hence you
may prefer these for your non-privileged domains. The ``0'' suffixed
privileged versions can be used to boot the system, as well as in driver
domains and unprivileged domains.

\subsection{Installing Generated Binaries}

The files produced by the build process are stored under the
\path{dist/install/} directory. To install them in their default
locations, do:
\begin{quote}
\begin{verbatim}
# make install
\end{verbatim}
\end{quote}

Alternatively, users with special installation requirements may wish to
install them manually by copying the files to their appropriate
destinations.

%% Files in \path{install/boot/} include:
%% \begin{itemize}
%% \item \path{install/boot/xen-3.0.gz} Link to the Xen 'kernel'
%% \item \path{install/boot/vmlinuz-2.6-xen0} Link to domain 0
%%   XenLinux kernel
%% \item \path{install/boot/vmlinuz-2.6-xenU} Link to unprivileged
%%   XenLinux kernel
%% \end{itemize}

The \path{dist/install/boot} directory will also contain the config
files used for building the XenLinux kernels, and also versions of the
Xen and XenLinux kernels that contain debug symbols (such as
\path{xen-syms-3.0.0} and \path{vmlinux-syms-2.6.12.6-xen0}), which are
essential for interpreting crash dumps. Retain these files as the
developers may wish to see them if you post on the mailing list.

\section{Configuration}
\label{s:configure}

Once you have built and installed the Xen distribution, it is simple to
prepare the machine for booting and running Xen.

\subsection{GRUB Configuration}

An entry should be added to \path{grub.conf} (often found under
\path{/boot/} or \path{/boot/grub/}) to allow Xen / XenLinux to boot.
This file is sometimes called \path{menu.lst}, depending on your
distribution. The entry should look something like the following:

%% KMSelf Thu Dec 1 19:06:13 PST 2005 262144 is useful for RHEL/RH and
%% related Dom0s.
{\small
\begin{verbatim}
title Xen 3.0 / XenLinux 2.6
  kernel /boot/xen-3.0.gz dom0_mem=262144
  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
\end{verbatim}
}

The kernel line tells GRUB where to find Xen itself and what boot
parameters should be passed to it (in this case, setting the domain~0
memory allocation in kilobytes).
For more details on the various Xen boot parameters see
Section~\ref{s:xboot}.

The module line of the configuration describes the location of the
XenLinux kernel that Xen should start and the parameters that should be
passed to it. These are standard Linux parameters, identifying the root
device and specifying it be initially mounted read only and instructing
that console output be sent to the screen. Some distributions such as
SuSE do not require the \path{ro} parameter.

%% \framebox{\parbox{5in}{
%%     {\bf Distro specific:} \\
%%     {\it SuSE} --- Omit the {\tt ro} option from the XenLinux
%%     kernel command line, since the partition won't be remounted rw
%%     during boot. }}

To use an initrd, add another \path{module} line to the configuration,
like: {\small
\begin{verbatim}
  module /boot/my_initrd.gz
\end{verbatim}
}
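
Putting the pieces above together, a complete entry that also loads an
initrd might look like the following. This is only a sketch that reuses
the example paths and parameters from this section; the title text is
arbitrary, and the root device and file names must match your own
system:
{\small
\begin{verbatim}
title Xen 3.0 / XenLinux 2.6 (with initrd)
  kernel /boot/xen-3.0.gz dom0_mem=262144
  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=tty0
  module /boot/my_initrd.gz
\end{verbatim}
}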

%% KMSelf Thu Dec 1 19:05:30 PST 2005 Other configs as an appendix?

When installing a new kernel, it is recommended that you do not delete
existing menu options from \path{menu.lst}, as you may wish to boot your
old Linux kernel in future, particularly if you have problems.

\subsection{Serial Console (optional)}

Serial console access allows you to manage, monitor, and interact with
your system over a serial console. This can allow access from another
nearby system via a null-modem (``LapLink'') cable or remotely via a serial
concentrator.

Your system's BIOS, bootloader (GRUB), Xen, Linux, and login access must
each be individually configured for serial console access. It is
\emph{not} strictly necessary to have each component fully functional,
but it can be quite useful.

For general information on serial console configuration under Linux,
refer to the ``Remote Serial Console HOWTO'' at The Linux Documentation
Project: \url{http://www.tldp.org}

\subsubsection{Serial Console BIOS configuration}

Enabling system serial console output neither enables nor disables
serial capabilities in GRUB, Xen, or Linux, but may make remote
management of your system more convenient by displaying POST and other
boot messages over serial port and allowing remote BIOS configuration.

Refer to your hardware vendor's documentation for capabilities and
procedures to enable BIOS serial redirection.

\subsubsection{Serial Console GRUB configuration}

Enabling GRUB serial console output neither enables nor disables Xen or
Linux serial capabilities, but may make remote management of your system
more convenient by displaying GRUB prompts, menus, and actions over
serial port and allowing remote GRUB management.

Adding the following two lines to your GRUB configuration file,
typically either \path{/boot/grub/menu.lst} or \path{/boot/grub/grub.conf}
depending on your distro, will enable GRUB serial output.

\begin{quote}
{\small \begin{verbatim}
  serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
  terminal --timeout=10 serial console
\end{verbatim}}
\end{quote}

Note that when both the serial port and the local monitor and keyboard
are enabled, the text ``\emph{Press any key to continue}'' will appear
at both. Pressing a key on one device will cause GRUB to display to
that device. The other device will see no output. If no key is
pressed before the timeout period expires, the system will boot to the
default GRUB boot entry.

Please refer to the GRUB documentation for further information.

\subsubsection{Serial Console Xen configuration}

Enabling Xen serial console output neither enables nor disables Linux
kernel output or logging in to Linux over serial port. It does however
allow you to monitor and log the Xen boot process via serial console and
can be very useful in debugging.

%% kernel /boot/xen-2.0.gz dom0_mem=131072 console=com1,vga com1=115200,8n1
%% module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro

In order to configure Xen serial console output, it is necessary to
add a boot option to your GRUB config; e.g.\ replace the previous
example kernel line with:
\begin{quote} {\small \begin{verbatim}
   kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1 console=com1,vga
\end{verbatim}}
\end{quote}

This configures Xen to output on COM1 at 115,200 baud, 8 data bits, no
parity and 1 stop bit. Modify these parameters for your environment.
See Section~\ref{s:xboot} for an explanation of all boot parameters.

One can also configure XenLinux to share the serial console; to achieve
this append ``\path{console=ttyS0}'' to your module line.
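
As a sketch of how the two settings above combine, the following entry
sends both Xen and XenLinux output to the first serial port. The title
text is arbitrary, and the kernel path, root device and memory size are
simply the example values used earlier in this chapter, so adjust them
for your system:
\begin{quote} {\small \begin{verbatim}
title Xen 3.0 / XenLinux 2.6 (serial console)
  kernel /boot/xen.gz dom0_mem=131072 com1=115200,8n1 console=com1,vga
  module /boot/vmlinuz-2.6-xen0 root=/dev/sda4 ro console=ttyS0
\end{verbatim}}
\end{quote}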

\subsubsection{Serial Console Linux configuration}

Enabling Linux serial console output at boot neither enables nor
disables logging in to Linux over serial port. It does however allow
you to monitor and log the Linux boot process via serial console and can be
very useful in debugging.

To enable Linux output at boot time, add the parameter
\path{console=ttyS0} (or ttyS1, ttyS2, etc.) to your kernel GRUB line.
Under Xen, this might be:
\begin{quote}
{\footnotesize \begin{verbatim}
  module /vmlinuz-2.6-xen0 ro root=/dev/VolGroup00/LogVol00 \
  console=ttyS0,115200
\end{verbatim}}
\end{quote}
to enable output over ttyS0 at 115200 baud.

\subsubsection{Serial Console Login configuration}

Logging in to Linux via serial console, under Xen or otherwise, requires
specifying a login prompt be started on the serial port. To permit root
logins over serial console, the serial port must be added to
\path{/etc/securetty}.

\newpage
To automatically start a login prompt over the serial port,
add the line: \begin{quote} {\small {\tt c:2345:respawn:/sbin/mingetty
      ttyS0}} \end{quote} to \path{/etc/inittab}. Run \path{init q} to force
a reload of your inittab and start getty.

To enable root logins, add \path{ttyS0} to \path{/etc/securetty} if not
already present.

Your distribution may use an alternate getty; options include getty,
mgetty and agetty. Consult your distribution's documentation
for further information.
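
As an illustration only, on a distribution that ships agetty the
corresponding \path{/etc/inittab} entry might look like the line below.
The identifier \path{S0}, the runlevels, the baud rate and the
\path{vt100} terminal type are assumptions rather than values taken
from this manual; check the agetty manual page on your system for the
exact argument order it expects:
\begin{quote}
{\small \begin{verbatim}
S0:2345:respawn:/sbin/agetty -L 115200 ttyS0 vt100
\end{verbatim}}
\end{quote}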

\subsection{TLS Libraries}

Users of the XenLinux 2.6 kernel should disable Thread Local Storage
(TLS) (e.g.\ by doing a \path{mv /lib/tls /lib/tls.disabled}) before
attempting to boot a XenLinux kernel\footnote{If you boot without first
  disabling TLS, you will get a warning message during the boot process.
  In this case, simply perform the rename after the machine is up and
  then run \path{/sbin/ldconfig} to make it take effect.}. You can
always reenable TLS by restoring the directory to its original location
(i.e.\ \path{mv /lib/tls.disabled /lib/tls}).

The reason for this is that the current TLS implementation uses
segmentation in a way that is not permissible under Xen. If TLS is not
disabled, an emulation mode is used within Xen which reduces performance
substantially. To ensure full performance you should install a
`Xen-friendly' (nosegneg) version of the library.
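
For reference, the rename described above can be carried out as
follows. The second command is only needed when the rename is done
after a XenLinux kernel has already been booted, as the footnote
explains:
\begin{quote}
\begin{verbatim}
# mv /lib/tls /lib/tls.disabled
# /sbin/ldconfig
\end{verbatim}
\end{quote}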

\section{Booting Xen}

It should now be possible to restart the system and use Xen. Reboot and
choose the new Xen option when the GRUB screen appears.

What follows should look much like a conventional Linux boot. The first
portion of the output comes from Xen itself, supplying low level
information about itself and the underlying hardware. The last portion
of the output comes from XenLinux.

You may see some error messages during the XenLinux boot. These are not
necessarily anything to worry about; they may result from kernel
configuration differences between your XenLinux kernel and the one you
usually use.

When the boot completes, you should be able to log into your system as
usual. If you are unable to log in, you should still be able to reboot
with your normal Linux kernel by selecting it at the GRUB prompt.

% Booting Xen
\chapter{Booting a Xen System}

Booting the system into Xen will bring you up into the privileged
management domain, Domain0. At that point you are ready to create
guest domains and ``boot'' them using the \texttt{xm create} command.

\section{Booting Domain0}

After installation and configuration is complete, reboot the system
and choose the new Xen option when the GRUB screen appears.

What follows should look much like a conventional Linux boot. The
first portion of the output comes from Xen itself, supplying low level
information about itself and the underlying hardware. The last
portion of the output comes from XenLinux.

%% KMSelf Wed Nov 30 18:09:37 PST 2005: We should specify what these are.

When the boot completes, you should be able to log into your system as
usual. If you are unable to log in, you should still be able to
reboot with your normal Linux kernel by selecting it at the GRUB prompt.

The first step in creating a new domain is to prepare a root
filesystem for it to boot. Typically, this might be stored in a normal
partition, an LVM or other volume manager partition, a disk file or on
an NFS server. A simple way to do this is to boot from your
standard OS install CD and install the distribution into another
partition on your hard drive.

To start the \xend\ control daemon, type
\begin{quote}
  \verb!# xend start!
\end{quote}

If you wish the daemon to start automatically, see the instructions in
Section~\ref{s:xend}. Once the daemon is running, you can use the
\path{xm} tool to monitor and maintain the domains running on your
system. This chapter provides only a brief tutorial. We provide full
details of the \path{xm} tool in the next chapter.

787

788

% \section{From the web interface}

789

790

% Boot the Xen machine and start Xensv (see Chapter~\ref{cha:xensv}

791

% for more details) using the command: \\

792

% \verb_# xensv start_ \\

793

% This will also start Xend (see Chapter~\ref{cha:xend} for more

794

% information).

795

796

% The domain management interface will then be available at {\tt

797

% http://your\_machine:8080/}. This provides a user friendly wizard

798

% for starting domains and functions for managing running domains.

799

800

% \section{From the command line}

801

\section{Booting Guest Domains}

802

803

\subsection{Creating a Domain Configuration File}

804

805

Before you can start an additional domain, you must create a

806

configuration file. We provide two example files which you can use as

807

a starting point:

808

\begin{itemize}

\item \path{/etc/xen/xmexample1} is a simple template configuration
  file for describing a single VM\@.
\item \path{/etc/xen/xmexample2} is a template description that
  is intended to be reused for multiple virtual machines. Setting the
  value of the \path{vmid} variable on the \path{xm} command line
  fills in parts of this template.

815

\end{itemize}

816

817

There are also a number of other examples which you may find useful.

818

Copy one of these files and edit it as appropriate. Typical values

819

you may wish to edit include:

820

821

\begin{quote}

822

\begin{description}

823

\item[kernel] Set this to the path of the kernel you compiled for use

824

with Xen (e.g.\ \path{kernel = ``/boot/vmlinuz-2.6-xenU''})

825

\item[memory] Set this to the size of the domain's memory in megabytes

826

(e.g.\ \path{memory = 64})

\item[disk] Set the first entry in this list to calculate the offset
  of the domain's root partition, based on the domain ID\@. Set the
  second to the location of \path{/usr} if you are sharing it between
  domains (e.g.\ \path{disk = ['phy:your\_hard\_drive\%d,sda1,w' \%
    (base\_partition\_number + vmid),
    'phy:your\_usr\_partition,sda6,r' ]})

833

\item[dhcp] Uncomment the dhcp variable, so that the domain will

834

receive its IP address from a DHCP server (e.g.\ \path{dhcp=``dhcp''})

835

\end{description}

836

\end{quote}
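
As a rough illustration, the values above might be combined as in the
following sketch. Domain configuration files are read as Python, so the
\texttt{disk} expression is ordinary string formatting. The device names,
the value of \texttt{base\_partition\_number}, and the default
\texttt{vmid} are placeholder assumptions, not values taken from the
shipped example files.

```python
# Hypothetical values: adjust every path and device name for your system.
vmid = 1                     # normally supplied as vmid=1 on the xm command line
base_partition_number = 4    # assumed layout: domain N's root is partition 4 + N

kernel = "/boot/vmlinuz-2.6-xenU"
memory = 64                  # megabytes
disk = [
    "phy:sda%d,sda1,w" % (base_partition_number + vmid),  # per-domain root
    "phy:sda6,sda6,r",                                     # shared, read-only /usr
]
dhcp = "dhcp"
```

With these assumed values, the first disk entry expands to
\texttt{phy:sda5,sda1,w}.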

837

838

You may also want to edit the {\bf vif} variable in order to choose

839

the MAC address of the virtual ethernet interface yourself. For

840

example:

841

842

\begin{quote}

843

\verb_vif = ['mac=00:16:3E:F6:BB:B3']_

844

\end{quote}

845

If you do not set this variable, \xend\ will automatically generate a

846

random MAC address from the range 00:16:3E:xx:xx:xx, assigned by IEEE to

847

XenSource as an OUI (organizationally unique identifier). XenSource

848

Inc. gives permission for anyone to use addresses randomly allocated

849

from this range for use by their Xen domains.
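
If you prefer to pick an address from this range yourself, a small
script can generate one. The sketch below is not \xend's own generator;
it only illustrates the address format described above, and the function
name \texttt{random\_xen\_mac} is made up for this example.

```python
import random

XEN_OUI = (0x00, 0x16, 0x3E)  # prefix stated in the text above

def random_xen_mac(rng=random):
    """Return a MAC address with the Xen OUI and three random low bytes."""
    tail = [rng.randint(0x00, 0xFF) for _ in range(3)]
    return ":".join("%02X" % b for b in XEN_OUI + tuple(tail))

# Print a line that could be pasted into a domain configuration file.
print("vif = ['mac=%s']" % random_xen_mac())
```

Nothing here checks whether the address is already in use on your
network; that remains your responsibility.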

850

851

For a list of IEEE OUI assignments, see

852

\url{http://standards.ieee.org/regauth/oui/oui.txt}

853

854

855

\subsection{Booting the Guest Domain}

856

857

The \path{xm} tool provides a variety of commands for managing

858

domains. Use the \path{create} command to start new domains. Assuming

859

you've created a configuration file \path{myvmconf} based around

860

\path{/etc/xen/xmexample2}, to start a domain with virtual machine

861

ID~1 you should type:

862

863

\begin{quote}

\begin{verbatim}
# xm create -c myvmconf vmid=1
\end{verbatim}

867

\end{quote}

868

The \path{-c} switch causes \path{xm} to attach to the domain's
console after creation. The \path{vmid=1} argument sets the \path{vmid}
variable used in the \path{myvmconf} file.

872

873

You should see the console boot messages from the new domain appearing

874

in the terminal in which you typed the command, culminating in a login

875

prompt.

876

877

878

\section{Starting / Stopping Domains Automatically}

879

It is possible to have certain domains start automatically at boot
time and to have dom0 wait for all running domains to shut down before
it shuts down the system.

To specify that a domain should start at boot time, place its configuration
file (or a link to it) under \path{/etc/xen/auto/}.

886

887

A Sys-V style init script for Red Hat and LSB-compliant systems is

888

provided and will be automatically copied to \path{/etc/init.d/}

889

during install. You can then enable it in the appropriate way for

890

your distribution.

891

892

For instance, on Red Hat:

893

894

\begin{quote}

895

\verb_# chkconfig --add xendomains_

896

\end{quote}

897

898

By default, this will start the boot-time domains in runlevels 3, 4

899

and 5.

900

You can also use the \path{service} command to run this script
manually, e.g.:

903

904

\begin{quote}

905

\verb_# service xendomains start_

906

907

Starts all the domains with config files under /etc/xen/auto/.

908

\end{quote}

909

910

\begin{quote}

911

\verb_# service xendomains stop_

912

913

Shuts down all running Xen domains.

914

\end{quote}

915

916

917

918

\part{Configuration and Management}

919

920

%% Chapter Domain Management Tools and Daemons

921

\chapter{Domain Management Tools}

922

923

This chapter summarizes the management software and tools available.

924

925

926

\section{\Xend\ }

927

\label{s:xend}

928

929

930

The \Xend\ node control daemon performs system management functions

931

related to virtual machines. It forms a central point of control of

932

virtualized resources, and must be running in order to start and manage

933

virtual machines. \Xend\ must be run as root because it needs access to

934

privileged system management functions.

935

An initialization script named \texttt{/etc/init.d/xend} is provided to
start \Xend\ at boot time. Use the appropriate tool for your Linux
distribution (e.g.\ \texttt{chkconfig}) to specify the runlevels at which
this script should be executed, or manually create symbolic links in the
correct runlevel directories.

941

942

\Xend\ can be started on the command line as well, and supports the

943

following set of parameters:

944

\begin{tabular}{ll}
  \verb!# xend start!   & start \xend, if not already running \\
  \verb!# xend stop!    & stop \xend\ if already running       \\
  \verb!# xend restart! & restart \xend\ if running, otherwise start it \\
  % \verb!# xend trace_start! & start \xend, with very detailed debug logging \\
  \verb!# xend status!  & indicates \xend\ status by its return code
\end{tabular}

952

Running {\tt make install} places the {\tt xend} init script in
\path{/etc/init.d}. Once \xend\ is running, administration can be done
using the \texttt{xm} tool.

959

960

\subsection{Logging}

961

962

As \xend\ runs, events will be logged to \path{/var/log/xen/xend.log} and

963

(less frequently) to \path{/var/log/xen/xend-debug.log}. These, along with

964

the standard syslog files, are useful when troubleshooting problems.

965

966

\subsection{Configuring \Xend\ }

967

968

\Xend\ is written in Python. At startup, it reads its configuration

969

information from the file \path{/etc/xen/xend-config.sxp}. The Xen

970

installation places an example \texttt{xend-config.sxp} file in the

971

\texttt{/etc/xen} subdirectory which should work for most installations.

972

973

See the example configuration file \texttt{xend-debug.sxp} and the

974

section 5 man page \texttt{xend-config.sxp} for a full list of

975

parameters and more detailed information. Some of the most important

976

parameters are discussed below.

977

An HTTP interface and a Unix domain socket API are available to
communicate with \Xend. This allows remote users to pass commands to the
daemon. By default, \Xend\ does not start an HTTP server. It does start a
Unix domain socket management server, as the low-level utility
\texttt{xm} requires it. For support of cross-machine migration, \Xend\
can start a relocation server. This support is not enabled by default
for security reasons.

985

986

Note: the example \texttt{xend} configuration file modifies the defaults and

987

starts up \Xend\ as an HTTP server as well as a relocation server.

988

989

From the file:

990

\begin{verbatim}
#(xend-http-server no)
(xend-http-server yes)
#(xend-unix-server yes)
#(xend-relocation-server no)
(xend-relocation-server yes)
\end{verbatim}

998

999

Comment or uncomment lines in that file to disable or enable features

1000

that you require.
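
To check quickly which of these settings are active, the uncommented
lines can be extracted with a few lines of Python. This is only a sketch:
it handles flat, single-line entries such as those shown above, not the
full SXP syntax, and the function name \texttt{enabled\_settings} is an
invention for this example.

```python
import re

def enabled_settings(text):
    """Map each uncommented (name value) line to its value; '#' lines are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.fullmatch(r"\((\S+)\s+(\S+)\)", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings

# The excerpt quoted above, used here as sample input.
sample = """
#(xend-http-server no)
(xend-http-server yes)
#(xend-unix-server yes)
#(xend-relocation-server no)
(xend-relocation-server yes)
"""
print(enabled_settings(sample))
```

On a real system you would pass in the contents of
\path{/etc/xen/xend-config.sxp} instead of the sample string.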

1001

1002

Connections from remote hosts are disabled by default:

1003

\begin{verbatim}
# Address xend should listen on for HTTP connections, if xend-http-server is
# set.
# Specifying 'localhost' prevents remote connections.
# Specifying the empty string '' (the default) allows all connections.
#(xend-address '')
(xend-address localhost)
\end{verbatim}

1012

1013

It is recommended that if migration support is not needed, the

1014

\texttt{xend-relocation-server} parameter value be changed to

1015

``\texttt{no}'' or commented out.

1016

1017

\section{Xm}

1018

\label{s:xm}

1019

1020

The xm tool is the primary tool for managing Xen from the console. The

1021

general format of an xm command line is:

1022

\begin{verbatim}
# xm command [switches] [arguments] [variables]
\end{verbatim}

1026

The available \emph{switches} and \emph{arguments} depend on the
\emph{command} chosen. The \emph{variables} may be set using
declarations of the form {\tt variable=value}. Command-line
declarations override any of the values in the configuration file being
used, including the standard variables described above and any custom
variables (for instance, the \path{xmdefconfig} file uses a {\tt vmid}
variable).
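
The override rule can be pictured as a dictionary update, as in the
sketch below. It does not reproduce how \path{xm} itself processes its
arguments; the function name \texttt{apply\_overrides} and the sample
values are hypothetical, and values are kept as plain strings.

```python
def apply_overrides(config, declarations):
    """Return a copy of config updated with variable=value declarations."""
    merged = dict(config)
    for decl in declarations:
        name, sep, value = decl.partition("=")
        if not sep or not name:
            raise ValueError("expected variable=value, got %r" % decl)
        merged[name] = value      # a command-line value replaces the file value
    return merged

file_values = {"memory": "64", "vmid": "0"}   # hypothetical file contents
print(apply_overrides(file_values, ["vmid=1"]))
```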

1034

1035

For online help for the commands available, type:

1036

1037

\begin{quote}

\begin{verbatim}
# xm help
\end{verbatim}

1041

\end{quote}

1042

1043

This will list the most commonly used commands. The full list can be obtained

1044

using \verb_xm help --long_. You can also type \path{xm help $<$command$>$}

1045

for more information on a given command.

1046

1047

\subsection{Basic Management Commands}

1048

1049

One useful command is \verb_# xm list_ which lists all domains running in rows

1050

of the following format:

1051

\begin{center} {\tt name domid memory vcpus state cputime}

1052

\end{center}

1053

1054

The meaning of each field is as follows:

1055

\begin{quote}

1056

\begin{description}

1057

\item[name] The descriptive name of the virtual machine.

1058

\item[domid] The number of the domain ID this virtual machine is

1059

running in.

1060

\item[memory] Memory size in megabytes.

1061

\item[vcpus] The number of virtual CPUs this domain has.

1062

\item[state] Domain state consists of 5 fields:

1063

\begin{description}

1064

\item[r] running

1065

\item[b] blocked

1066

\item[p] paused

1067

\item[s] shutdown

1068

\item[c] crashed

1069

\end{description}

1070

\item[cputime] How much CPU time (in seconds) the domain has used so

1071

far.

1072

\end{description}

1073

\end{quote}
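
For scripting, a row in this format can be split into typed fields. The
sketch below assumes whitespace-separated columns in the order given
above and a `-' in each unset state position; the sample row is invented,
and domain names containing spaces are not handled.

```python
from collections import namedtuple

Row = namedtuple("Row", "name domid memory vcpus state cputime")
STATE_FLAGS = {"r": "running", "b": "blocked", "p": "paused",
               "s": "shutdown", "c": "crashed"}

def parse_row(line):
    """Split one whitespace-separated row into typed fields."""
    name, domid, memory, vcpus, state, cputime = line.split()
    return Row(name, int(domid), int(memory), int(vcpus), state, float(cputime))

def describe_state(state):
    """List the flags that are set; any other character is ignored."""
    return [STATE_FLAGS[ch] for ch in state if ch in STATE_FLAGS]

row = parse_row("myVM 1 64 1 -b--- 12.5")   # hypothetical sample row
print(row.name, describe_state(row.state))
```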

1074

1075

The \path{xm list} command also supports a long output format when the

1076

\path{-l} switch is used. This outputs the full details of the

1077

running domains in \xend's SXP configuration format.

1078

1079

If you want to know how long your domains have been running for, then

1080

you can use the \verb_# xm uptime_ command.

1081

1082

1083

You can get access to the console of a particular domain using

1084

the \verb_# xm console_ command (e.g.\ \verb_# xm console myVM_).

1085

1086

\subsection{Domain Scheduling Management Commands}

1087

1088

The credit CPU scheduler automatically load balances guest VCPUs

1089

across all available physical CPUs on an SMP host. The user need

1090

not manually pin VCPUs to load balance the system. However, she

1091

can restrict which CPUs a particular VCPU may run on using

1092

the \path{xm vcpu-pin} command.

1093

1094

Each guest domain is assigned a \path{weight} and a \path{cap}.

1095

1096

A domain with a weight of 512 will get twice as much CPU as a

1097

domain with a weight of 256 on a contended host. Legal weights

1098

range from 1 to 65535 and the default is 256.

1099

The cap optionally fixes the maximum amount of CPU a guest will
be able to consume, even if the host system has idle CPU cycles.
The cap is expressed as a percentage of one physical CPU: 100 is
1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc. The
default, 0, means there is no upper cap.

1105

1106

When you are running with the credit scheduler, you can check and

1107

modify your domains' weights and caps using the \path{xm sched-credit}

1108

command:

1109

\begin{tabular}{ll}
  \verb!xm sched-credit -d <domain>! & lists weight and cap \\
  \verb!xm sched-credit -d <domain> -w <weight>! & sets the weight \\
  \verb!xm sched-credit -d <domain> -c <cap>! & sets the cap
\end{tabular}
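
To get a feel for how weights and caps interact, the arithmetic can be
sketched as follows. This is a deliberately simplified model, not the
scheduler's algorithm: it assumes every domain is busy, divides CPU time
strictly in proportion to weight, and does not hand time freed by a cap
to other domains. The domain names are made up.

```python
def contended_shares(weights, physical_cpus=1):
    """CPU (in units of one physical CPU) each domain gets when all are busy."""
    total = sum(weights.values())
    return {dom: physical_cpus * w / total for dom, w in weights.items()}

def apply_cap(share, cap):
    """cap is a percentage of one physical CPU; 0 means uncapped."""
    return share if cap == 0 else min(share, cap / 100.0)

shares = contended_shares({"web": 512, "batch": 256})
print(shares)                          # web gets twice batch's share
print(apply_cap(shares["web"], 50))    # limited to half a CPU by the cap
```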

1115

1116

1117

1118

%% Chapter Domain Configuration

1119

\chapter{Domain Configuration}

1120

\label{cha:config}

1121

This chapter describes the syntax of the domain configuration files
and explains how to further specify networking, driver domain,
and general scheduling behavior.

1125

1126

1127

\section{Configuration Files}

1128

\label{s:cfiles}

1129

1130

Xen configuration files contain the following standard variables.

1131

Unless otherwise stated, configuration items should be enclosed in

1132

quotes: see the configuration scripts in \path{/etc/xen/}

1133

for concrete examples.

1134

1135

\begin{description}

1136

\item[kernel] Path to the kernel image.

1137

\item[ramdisk] Path to a ramdisk image (optional).

1138

% \item[builder] The name of the domain build function (e.g.

1139

% {\tt'linux'} or {\tt'netbsd'}.

1140

\item[memory] Memory size in megabytes.

1141

\item[vcpus] The number of virtual CPUs.

1142

\item[console] Port to export the domain console on (default 9600 +

1143

domain ID).

1144

\item[vif] Network interface configuration. This may simply contain

1145

an empty string for each desired interface, or may override various

1146

settings, e.g.\

\begin{verbatim}
vif = [ 'mac=00:16:3E:00:00:11, bridge=xen-br0',
        'bridge=xen-br1' ]
\end{verbatim}

1151

to assign a MAC address and bridge to the first interface and assign

1152

a different bridge to the second interface, leaving \xend\ to choose

1153

the MAC address. The settings that may be overridden in this way are

1154

type, mac, bridge, ip, script, backend, and vifname.

\item[disk] List of block devices to export to the domain, e.g.\
  \verb_disk = [ 'phy:hda1,sda1,r' ]_
  exports physical device \path{/dev/hda1} to the domain as
  \path{/dev/sda1} with read-only access. Exporting a disk read-write
  which is currently mounted is dangerous; if you are \emph{certain}
  you wish to do this, you can specify \path{w!} as the mode.

1161

\item[dhcp] Set to {\tt `dhcp'} if you want to use DHCP to configure

1162

networking.

1163

\item[netmask] Manually configured IP netmask.

1164

\item[gateway] Manually configured IP gateway.

1165

\item[hostname] Set the hostname for the virtual machine.

1166

\item[root] Specify the root device parameter on the kernel command

1167

line.

1168

\item[nfs\_server] IP address for the NFS server (if any).

1169

\item[nfs\_root] Path of the root filesystem on the NFS server (if

1170

any).

1171

\item[extra] Extra string to append to the kernel command line (if

1172

any)

1173

\end{description}
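
The \texttt{vif} and \texttt{disk} entries are comma-separated strings,
so they are easy to take apart when checking a configuration by script.
The helpers below are a sketch written for this illustration
(\texttt{parse\_vif} and \texttt{parse\_disk} are not part of the Xen
tools) and do no validation beyond splitting the fields.

```python
def parse_vif(spec):
    """Turn 'mac=..., bridge=...' into a dict; '' gives an empty dict."""
    pairs = (item.split("=", 1) for item in spec.split(",") if item.strip())
    return {key.strip(): value.strip() for key, value in pairs}

def parse_disk(spec):
    """Turn 'phy:hda1,sda1,r' into (backend_device, guest_device, mode)."""
    backend, guest, mode = (part.strip() for part in spec.split(","))
    return backend, guest, mode

print(parse_vif("mac=00:16:3E:00:00:11, bridge=xen-br0"))
print(parse_disk("phy:hda1,sda1,r"))
```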

1174

1175

Additional fields are documented in the example configuration files

1176

(e.g. to configure virtual TPM functionality).

1177

1178

For additional flexibility, it is also possible to include Python

1179

scripting commands in configuration files. An example of this is the

1180

\path{xmexample2} file, which uses Python code to handle the

1181

\path{vmid} variable.

1182

1183

1184

%\part{Advanced Topics}

1185

1186

1187

\section{Network Configuration}

1188

1189

For many users, the default installation should work ``out of the

1190

box''. More complicated network setups, for instance with multiple

1191

Ethernet interfaces and/or existing bridging setups will require some

1192

special configuration.

1193

1194

The purpose of this section is to describe the mechanisms provided by

1195

\xend\ to allow a flexible configuration for Xen's virtual networking.

1196

1197

\subsection{Xen virtual network topology}

1198

1199

Each domain network interface is connected to a virtual network

1200

interface in dom0 by a point to point link (effectively a ``virtual

1201

crossover cable''). These devices are named {\tt

1202

vif$<$domid$>$.$<$vifid$>$} (e.g.\ {\tt vif1.0} for the first

1203

interface in domain~1, {\tt vif3.1} for the second interface in

1204

domain~3).
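
When writing scripts in domain~0 it can be handy to convert between
device names and their (domain ID, interface index) pairs. A minimal
sketch of the naming rule above, with hypothetical helper names:

```python
import re

def vif_name(domid, vifid):
    """Build the dom0 device name for interface vifid of domain domid."""
    return "vif%d.%d" % (domid, vifid)

def parse_vif_name(name):
    """Return (domid, vifid) for a name such as 'vif3.1'."""
    match = re.fullmatch(r"vif(\d+)\.(\d+)", name)
    if match is None:
        raise ValueError("not a vif device name: %r" % name)
    return int(match.group(1)), int(match.group(2))

print(vif_name(1, 0), parse_vif_name("vif3.1"))
```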

1205

1206

Traffic on these virtual interfaces is handled in domain~0 using

1207

standard Linux mechanisms for bridging, routing, rate limiting, etc.

1208

Xend calls on two shell scripts to perform initial configuration of

1209

the network and configuration of new virtual interfaces. By default,

1210

these scripts configure a single bridge for all the virtual

1211

interfaces. Arbitrary routing / bridging configurations can be

1212

configured by customizing the scripts, as described in the following

1213

section.

1214

1215

\subsection{Xen networking scripts}

1216

1217

Xen's virtual networking is configured by two shell scripts (by

1218

default \path{network-bridge} and \path{vif-bridge}). These are called

1219

automatically by \xend\ when certain events occur, with arguments to

1220

the scripts providing further contextual information. These scripts

1221

are found by default in \path{/etc/xen/scripts}. The names and

1222

locations of the scripts can be configured in

1223

\path{/etc/xen/xend-config.sxp}.

1224

1225

\begin{description}

1226

\item[network-bridge:] This script is called whenever \xend\ is started or

1227

stopped to respectively initialize or tear down the Xen virtual

1228

network. In the default configuration initialization creates the

1229

bridge `xen-br0' and moves eth0 onto that bridge, modifying the

1230

routing accordingly. When \xend\ exits, it deletes the Xen bridge

1231

and removes eth0, restoring the normal IP and routing configuration.

1232

1233

%% In configurations where the bridge already exists, this script

1234

%% could be replaced with a link to \path{/bin/true} (for instance).

1235

1236

\item[vif-bridge:] This script is called for every domain virtual

1237

interface and can configure firewalling rules and add the vif to the

1238

appropriate bridge. By default, this adds and removes VIFs on the

1239

default Xen bridge.

1240

\end{description}

1241

Other example scripts are available (\path{network-route} and
\path{vif-route}, \path{network-nat} and \path{vif-nat}).
For more complex network setups (e.g.\ where routing is required or
where Xen must integrate with existing bridges) these scripts may be
replaced with customized variants for your site's preferred configuration.

1247

1248

\section{Driver Domain Configuration}

1249

\label{s:ddconf}

1250

1251

\subsection{PCI}

1252

\label{ss:pcidd}

1253

1254

Individual PCI devices can be assigned to a given domain (a PCI driver domain)

1255

to allow that domain direct access to the PCI hardware.

1256

While PCI Driver Domains can increase the stability and security of a system
by addressing a number of security concerns, some security issues
remain; these are discussed in Section~\ref{s:ddsecurity}.

1260

1261

\subsubsection{Compile-Time Setup}

To use this functionality, ensure
that the PCI Backend is compiled in to a privileged domain (e.g.\ domain~0)
and that the domains which will be assigned PCI devices have the PCI Frontend
compiled in. In XenLinux, the PCI Backend is available under the Xen
configuration section while the PCI Frontend is under the
architecture-specific ``Bus Options'' section. You may compile both the backend
and the frontend into the same kernel; they will not affect each other.

1269

1270

\subsubsection{PCI Backend Configuration - Binding at Boot}

The PCI devices you wish to assign to unprivileged domains must be ``hidden''
from your backend domain (usually domain~0) so that it does not load a driver
for them. Use the \path{pciback.hide} kernel parameter which is specified on
the kernel command-line and is configurable through GRUB (see
Section~\ref{s:configure}). Note that devices are not really hidden from the
backend domain. The PCI Backend appears to the Linux kernel as a regular PCI
device driver. The PCI Backend ensures that no other device driver loads
for the devices by binding itself as the device driver for those devices.

1279

PCI devices are identified by hexadecimal slot/function numbers (on Linux,

1280

use \path{lspci} to determine slot/function numbers of your devices) and

1281

can be specified with or without the PCI domain: \\

1282

\centerline{ {\tt ({\em bus}:{\em slot}.{\em func})} example {\tt (02:1d.3)}} \\

1283

\centerline{ {\tt ({\em domain}:{\em bus}:{\em slot}.{\em func})} example {\tt (0000:02:1d.3)}} \\

1284

1285

An example kernel command-line which hides two PCI devices might be: \\

1286

\centerline{ {\tt root=/dev/sda4 ro console=tty0 pciback.hide=(02:01.f)(0000:04:1d.0) } } \\
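
If you hide more than a couple of devices, it may be convenient to
generate the parameter rather than type it. The sketch below accepts
both forms shown above and always emits the long form. It assumes that a
slot written without a PCI domain belongs to domain \texttt{0000}, and
it accepts any single hexadecimal digit as the function number so that
the examples in this section pass. The helper names are hypothetical.

```python
import re

_SLOT = re.compile(
    r"(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-9a-fA-F])")

def normalize_slot(slot):
    """Return domain:bus:slot.func, filling in PCI domain 0000 if omitted."""
    match = _SLOT.fullmatch(slot)
    if match is None:
        raise ValueError("bad PCI slot name: %r" % slot)
    domain, bus, dev, func = match.groups()
    return "%s:%s:%s.%s" % (domain or "0000", bus, dev, func)

def hide_parameter(slots):
    """Build the pciback.hide=(...)(...) kernel command-line fragment."""
    return "pciback.hide=" + "".join("(%s)" % normalize_slot(s) for s in slots)

print(hide_parameter(["02:01.f", "0000:04:1d.0"]))
```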

1287

1288

\subsubsection{PCI Backend Configuration - Late Binding}

PCI devices can also be bound to the PCI Backend after boot through the manual
binding/unbinding facilities provided by the Linux kernel in sysfs. This allows
a Xen user to give driver domains PCI devices that were not specified
on the kernel command-line. There are several attributes within the PCI
Backend's sysfs directory (\path{/sys/bus/pci/drivers/pciback}) that can be
used to bind/unbind devices:

1295

1296

\begin{description}

\item[slots] lists all of the PCI slots that the PCI Backend will try to seize
  (or ``hide'' from Domain~0). A PCI slot must appear in this list before it can
  be bound to the PCI Backend through the \path{bind} attribute.

1300

\item[new\_slot] write the name of a slot here (in 0000:00:00.0 format) to

1301

have the PCI Backend seize the device in this slot.

1302

\item[remove\_slot] write the name of a slot here (same format as

1303

\path{new\_slot}) to have the PCI Backend no longer try to seize devices in

1304

this slot. Note that this does not unbind the driver from a device it has

1305

already seized.

1306

\item[bind] write the name of a slot here (in 0000:00:00.0 format) to have

1307

the Linux kernel attempt to bind the device in that slot to the PCI Backend

1308

driver.

\item[unbind] write the name of a slot here (same format as \path{bind}) to have
  the Linux kernel unbind the device from the PCI Backend. DO NOT unbind a
  device while it is currently given to a PCI driver domain!

1312

\end{description}

1313

1314

Some examples:

1315

1316

Bind a device to the PCI Backend which is not bound to any other driver.

\begin{verbatim}
# # Add a new slot to the PCI Backend's list
# echo -n 0000:01:04.d > /sys/bus/pci/drivers/pciback/new_slot
# # Now that the backend is watching for the slot, bind to it
# echo -n 0000:01:04.d > /sys/bus/pci/drivers/pciback/bind
\end{verbatim}

1323

1324

Unbind a device from its driver and bind to the PCI Backend.

\begin{verbatim}
# # Unbind a PCI network card from its network driver
# echo -n 0000:05:02.0 > /sys/bus/pci/drivers/3c905/unbind
# # And now bind it to the PCI Backend
# echo -n 0000:05:02.0 > /sys/bus/pci/drivers/pciback/new_slot
# echo -n 0000:05:02.0 > /sys/bus/pci/drivers/pciback/bind
\end{verbatim}

1332

Note that the \verb_-n_ option in the examples is important: it prevents
\path{echo} from appending a newline to the slot name.
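
The same two writes can be made from a script. The following sketch
mirrors the first example above and, like \verb_echo -n_, writes the
slot name without a trailing newline. The function names are invented
for this illustration. Calling \texttt{seize} with its default directory
touches the real sysfs tree and needs root privileges, so the
\texttt{dry\_run} helper exercises the same code against a scratch
directory instead.

```python
import tempfile
from pathlib import Path

PCIBACK_DIR = Path("/sys/bus/pci/drivers/pciback")

def write_attribute(driver_dir, attribute, slot):
    """Write the slot name with no trailing newline, like `echo -n`."""
    (Path(driver_dir) / attribute).write_text(slot)

def seize(slot, driver_dir=PCIBACK_DIR):
    """Add the slot to the backend's list, then bind the device to it."""
    write_attribute(driver_dir, "new_slot", slot)
    write_attribute(driver_dir, "bind", slot)

def dry_run(slot):
    """Exercise seize() against a scratch directory instead of sysfs."""
    with tempfile.TemporaryDirectory() as scratch:
        seize(slot, scratch)
        return {name: (Path(scratch) / name).read_text()
                for name in ("new_slot", "bind")}

print(dry_run("0000:01:04.d"))
```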

1335

1336

\subsubsection{PCI Backend Configuration - User-space Quirks}

1337

Quirky devices (such as the Broadcom Tigon 3) may need write access to their

1338

configuration space registers. Xen can be instructed to allow specified PCI

1339

devices write access to specific configuration space registers. The policy may

1340

be found in:

1341

1342

\centerline{ \path{/etc/xen/xend-pci-quirks.sxp} }

1343

1344

The policy file is heavily commented and is intended to provide enough

1345

documentation for developers to extend it.

1346

1347

\subsubsection{PCI Backend Configuration - Permissive Flag}

1348

If the user-space quirks approach doesn't meet your needs you may want to enable

1349

the permissive flag for that device. To do so, first get the PCI domain, bus,

1350

slot, and function information from dom0 via \path{lspci}. Then augment the

1351

user-space policy for permissive devices. The permissive policy can be found

1352

in:

1353

1354

\centerline{ \path{/etc/xen/xend-pci-permissive.sxp} }

1355

1356

Currently, the only way to reset the permissive flag is to unbind the device

1357

from the PCI Backend driver.

1358

1359

\subsubsection{PCI Backend - Checking Status}

There are two important sysfs nodes that provide a mechanism to view specifics on
quirks and permissive devices:
\begin{description}
\item \path{/sys/bus/pci/drivers/pciback/permissive} \\
  Use \path{cat} on this file to view a list of permissive slots.
\item \path{/sys/bus/pci/drivers/pciback/quirks} \\
  Use \path{cat} on this file to view a hierarchical listing of devices bound to the
  PCI backend, their PCI vendor/device ID, and any quirks that are associated with
  that particular slot.
\end{description}

1370

You may notice that every device bound to the PCI backend has 17 standard
``quirks'' regardless of \path{xend-pci-quirks.sxp}. These default entries are
necessary to support interactions between the PCI bus manager and the device bound
to it. Even non-quirky devices should have these standard entries.

In this case, preference was given to accuracy over aesthetics by choosing to
show the standard quirks in the quirks list rather than hide them from the
inquiring user.

1379

1380

\subsubsection{PCI Frontend Configuration}

1381

To configure a domU to receive a PCI device:

1382

1383

\begin{description}

1384

\item[Command-line:]

1385

Use the {\em pci} command-line flag. For multiple devices, use the option

1386

multiple times. \\

1387

\centerline{ {\tt xm create netcard-dd pci=01:00.0 pci=02:03.0 }} \\

1388

1389

\item[Flat Format configuration file:]

1390

Specify all of your PCI devices in a python list named {\em pci}. \\

1391

\centerline{ {\tt pci=['01:00.0','02:03.0'] }} \\

1392

1393

\item[SXP Format configuration file:]

1394

Use a single PCI device section for all of your devices (specify the numbers

1395

in hexadecimal with the preceding '0x'). Note that {\em domain} here refers

1396

to the PCI domain, not a virtual machine within Xen.

{\small
\begin{verbatim}
(device (pci
    (dev (domain 0x0)(bus 0x3)(slot 0x1a)(func 0x1))
    (dev (domain 0x0)(bus 0x1)(slot 0x5)(func 0x0))
))
\end{verbatim}
}

1405

\end{description}
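
The flat and SXP forms carry the same information, so one can be derived
from the other. The sketch below renders a list of
{\em bus}:{\em slot}.{\em func} names as an SXP device section with
balanced parentheses. It assumes PCI domain 0 for every device, and the
function name \texttt{pci\_sxp} is hypothetical.

```python
def pci_sxp(slots):
    """Render 'bus:slot.func' names as an SXP (device (pci ...)) section."""
    devs = []
    for name in slots:
        bus, rest = name.split(":")
        slot, func = rest.split(".")
        # Fields are hexadecimal in both notations; SXP adds the 0x prefix.
        devs.append("(dev (domain 0x0)(bus 0x%x)(slot 0x%x)(func 0x%x))"
                    % (int(bus, 16), int(slot, 16), int(func, 16)))
    return "(device (pci\n    " + "\n    ".join(devs) + "\n))"

print(pci_sxp(["03:1a.1", "01:05.0"]))
```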

1406

1407

%% There are two possible types of privileges: IO privileges and

1408

%% administration privileges.

1409

1410

\section{Support for virtual Trusted Platform Module (vTPM)}

1411

\label{ss:vtpm}

1412

Paravirtualized domains can be given access to a virtualized version
of a TPM. This enables applications in these domains to use the services
of the TPM device, for example through a TSS stack
\footnote{Trousers TSS stack: http://sourceforge.net/projects/trousers}.
The Xen source repository provides the necessary software components to
enable virtual TPM access. Support is provided through several
different pieces. First, a TPM emulator has been modified to provide the TPM's
functionality for the virtual TPM subsystem. Second, a virtual TPM Manager
coordinates the virtual TPMs' efforts, manages their creation, and provides
protected key storage using the TPM. Third, a device driver pair providing
a TPM front- and backend is available for XenLinux to deliver TPM commands
from the domain to the virtual TPM manager, which dispatches them to a
software TPM. Because the TPM Manager relies on a hardware TPM for protected key
storage, this subsystem requires a Linux-supported hardware TPM.
For development purposes, a TPM emulator is available for use on
platforms without a TPM.

1429

1430

\subsubsection{Compile-Time Setup}

1431

To enable access to the virtual TPM, the virtual TPM backend driver must

1432

be compiled for a privileged domain (e.g. domain 0). Using the XenLinux

1433

configuration, the necessary driver can be selected in the Xen configuration

1434

section. Unless the driver has been compiled into the kernel, its module

1435

must be activated using the following command:

1436

\begin{verbatim}
  modprobe tpmbk
\end{verbatim}

1440

Similarly, the TPM frontend driver must be compiled for the kernel trying
to use TPM functionality. Its driver can be selected in the kernel
configuration section Device Driver / Character Devices / TPM Devices.
In addition, the TPM driver for the built-in TPM must be selected.
If the virtual TPM driver has been compiled as a module, it
must be activated using the following command:

\begin{verbatim}
  modprobe tpm_xenu
\end{verbatim}

1451

1452

Furthermore, it is necessary to build the virtual TPM manager and software

1453

TPM by making changes to entries in Xen build configuration files.

1454

The following entry in the file Config.mk in the Xen root source

1455

directory must be made:

1456

\begin{verbatim}
  VTPM_TOOLS ?= y
\end{verbatim}

1460

After a build of the Xen tree and a reboot of the machine, the TPM backend
driver must be loaded. Once loaded, the virtual TPM manager daemon
must be started before TPM-enabled guest domains may be launched.
For the machine to act as the destination of a virtual TPM migration, the
virtual TPM migration daemon must also be started.

\begin{verbatim}
  vtpm_managerd
\end{verbatim}
\begin{verbatim}
  vtpm_migratord
\end{verbatim}

1473

1474

Once the VTPM manager is running, the VTPM can be accessed by loading the

1475

front end driver in a guest domain.

1476

1477

\subsubsection{Development and Testing TPM Emulator}

For development and testing on non-TPM enabled platforms, a TPM emulator
can be used in place of a platform TPM. First, the entry in the file
tools/vtpm/Rules.mk must look as follows:

\begin{verbatim}
  BUILD_EMULATOR = y
\end{verbatim}

1485

Second, the entry in the file tools/vtpm\_manager/Rules.mk must be uncommented
as follows:

\begin{verbatim}
  # TCS talks to fifo's rather than /dev/tpm. TPM Emulator assumed on fifos
  CFLAGS += -DDUMMY_TPM
\end{verbatim}

1493

1494

Before starting the virtual TPM Manager, start the emulator by executing

1495

the following in dom0:

1496

\begin{verbatim}
  tpm_emulator clear
\end{verbatim}

1500

1501

\subsubsection{vTPM Frontend Configuration}

1502

To provide TPM functionality to a user domain, a line must be added to

1503

the virtual TPM configuration file using the following format:

1504

\begin{verbatim}
 vtpm = ['instance=<instance number>, backend=<domain id>']
\end{verbatim}

1508

The {\it instance number} reflects the preferred virtual TPM instance
to associate with the domain. If the selected instance is
already associated with another domain, the system will automatically
select the next available instance. If an instance number is given, it
must be greater than zero. The instance parameter may also be omitted
from the configuration file.

The {\it domain id} is the ID of the domain in which the
virtual TPM backend driver and virtual TPM are running. It should
currently always be set to '0'.

1519

1520

1521

Examples of valid vtpm entries in the configuration file are

\begin{verbatim}
 vtpm = ['instance=1, backend=0']
\end{verbatim}
and
\begin{verbatim}
 vtpm = ['backend=0']
\end{verbatim}
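
For illustration, a guest configuration file carrying such an entry might
look like the following sketch. Only the \verb_vtpm_ line comes from this
section; the kernel path, memory size, domain name and disk entry are
placeholder values assumed for the example and must be adapted to your
installation.

\begin{verbatim}
 # hypothetical guest configuration; all values except vtpm are placeholders
 kernel = "/boot/vmlinuz-2.6-xenU"
 memory = 256
 name   = "vtpm-guest"
 disk   = ['phy:hda3,sda1,w']
 root   = "/dev/sda1 ro"
 # request virtual TPM instance 1, served by the backend in domain 0
 vtpm   = ['instance=1, backend=0']
\end{verbatim}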

1530

1531

\subsubsection{Using the virtual TPM}

1532

1533

Access to TPM functionality is provided by the virtual TPM frontend driver.

1534

Similar to existing hardware TPM drivers, this driver provides basic TPM

1535

status information through the {\it sysfs} filesystem. In a Xen user domain

1536

the sysfs entries can be found in /sys/devices/xen/vtpm-0.

1537

1538

Commands can be sent to the virtual TPM instance using the character

1539

device /dev/tpm0 (major 10, minor 224).
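
As a quick check from inside the user domain, the paths named above can be
inspected with standard tools. This is only a sketch: the attribute files
present under the sysfs directory depend on the kernel version, so the
example lists the directory rather than assuming particular file names.

\begin{verbatim}
# ls /sys/devices/xen/vtpm-0
# ls -l /dev/tpm0
\end{verbatim}

If the device node is missing, it can be created with
\verb_mknod /dev/tpm0 c 10 224_, using the major and minor numbers given
above.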

1540

1541

% Chapter Storage and File System Management

1542

\chapter{Storage and File System Management}

1543

1544

Storage can be made available to virtual machines in a number of

1545

different ways. This chapter covers some possible configurations.

1546

1547

The most straightforward method is to export a physical block device (a

1548

hard drive or partition) from dom0 directly to the guest domain as a

1549

virtual block device (VBD).

1550

1551

Storage may also be exported from a filesystem image or a partitioned

1552

filesystem image as a \emph{file-backed VBD}.

1553

1554

Finally, standard network storage protocols such as NBD, iSCSI, NFS,

1555

etc., can be used to provide storage to virtual machines.

1556

1557

1558

\section{Exporting Physical Devices as VBDs}

1559

\label{s:exporting-physical-devices-as-vbds}

1560

1561

One of the simplest configurations is to directly export individual
partitions from domain~0 to other domains. To achieve this, use the
\path{phy:} specifier in your domain configuration file. For example, a
line like
\begin{quote}
  \verb_disk = ['phy:hda3,sda1,w']_
\end{quote}
specifies that the partition \path{/dev/hda3} in domain~0 should be
exported read-write to the new domain as \path{/dev/sda1}; one could
equally well export it as \path{/dev/hda} or \path{/dev/sdb5} should
one wish.

1572

1573

In addition to local disks and partitions, it is possible to export

1574

any device that Linux considers to be ``a disk'' in the same manner.

1575

For example, if you have iSCSI disks or GNBD volumes imported into

1576

domain~0 you can export these to other domains using the \path{phy:}

1577

disk syntax. E.g.:

1578

\begin{quote}

1579

\verb_disk = ['phy:vg/lvm1,sda2,w']_

1580

\end{quote}

1581

1582

\begin{center}

1583

\framebox{\bf Warning: Block device sharing}

1584

\end{center}

1585

\begin{quote}

1586

  Block devices should typically only be shared between domains in a
  read-only fashion; otherwise the Linux kernel's file systems will get
  very confused as the file system structure may change underneath
  them (having the same ext3 partition mounted \path{rw} twice is a
  sure-fire way to cause irreparable damage)!  \Xend\ will attempt to
  prevent you from doing this by checking that the device is not
  mounted read-write in domain~0, and hasn't already been exported
  read-write to another domain.  If you want read-write sharing,
  export the directory to other domains via NFS from domain~0 (or use
  a cluster file system such as GFS or ocfs2).

1596

\end{quote}
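
As a sketch of safe sharing, the same partition can be exported read-only
to two domains by using the mode \verb_r_ instead of \verb_w_ in each
domain's configuration file. The device names below are reused from the
earlier example and are illustrative only.

\begin{verbatim}
# in the configuration file of the first domain
disk = ['phy:hda3,sda1,r']

# in the configuration file of the second domain
disk = ['phy:hda3,sda1,r']
\end{verbatim}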

1597

1598

1599

\section{Using File-backed VBDs}

1600

1601

It is also possible to use a file in Domain~0 as the primary storage

1602

for a virtual machine. As well as being convenient, this also has the

1603

advantage that the virtual block device will be \emph{sparse} ---

1604

space will only really be allocated as parts of the file are used. So

1605

if a virtual machine uses only half of its disk space then the file

1606

really takes up half of the size allocated.

1607

1608

For example, to create a 2GB sparse file-backed virtual block device
(which initially consumes no disk space at all):
\begin{quote}
  \verb_# dd if=/dev/zero of=vm1disk bs=1k seek=2048k count=0_
\end{quote}
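
To confirm that the file is sparse, compare its apparent size with the
space actually allocated on disk. This is a minimal check using standard
tools; the file name matches the \verb_dd_ example above.

\begin{verbatim}
# ls -lh vm1disk
# du -h vm1disk
\end{verbatim}

\verb_ls_ reports the apparent size of the file, while \verb_du_ reports
the blocks allocated so far, which grows as the guest writes data.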

1613

1614

Make a file system in the disk file:

1615

\begin{quote}

1616

\verb_# mkfs -t ext3 vm1disk_

1617

\end{quote}

1618

1619

(when the tool asks for confirmation, answer `y')

1620

1621

Populate the file system, e.g.\ by copying from the current root:
\begin{quote}
\begin{verbatim}
# mount -o loop vm1disk /mnt
# cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt
# mkdir /mnt/{proc,sys,home,tmp}
\end{verbatim}
\end{quote}

1629

1630

Tailor the file system by editing \path{/etc/fstab},
\path{/etc/hostname}, etc. Remember to edit the files in the
mounted file system rather than those in your domain~0 filesystem; for
example, edit \path{/mnt/etc/fstab} instead of \path{/etc/fstab}. For
this example, set \path{/dev/sda1} as the root device in fstab.

1635

1636

Now unmount (this is important!):

1637

\begin{quote}

1638

\verb_# umount /mnt_

1639

\end{quote}

1640

1641

In the configuration file set:

1642

\begin{quote}

1643

\verb_disk = ['tap:aio:/full/path/to/vm1disk,sda1,w']_

1644

\end{quote}

1645

1646

As the virtual machine writes to its `disk', the sparse file will be

1647

filled in and consume more space up to the original 2GB.

1648

1649

{\em{Note:}} Users that have worked with file-backed VBDs on Xen in previous

1650

versions will be interested to know that this support is now provided through

1651

the blktap driver instead of the loopback driver. This change results in

1652

file-based block devices that are higher-performance, more scalable, and which

1653

provide better safety properties for VBD data. All that is required to update

1654

your existing file-backed VM configurations is to change VBD configuration

1655

lines from:

1656

\begin{quote}

1657

\verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_

1658

\end{quote}

1659

to:

1660

\begin{quote}

1661

\verb_disk = ['tap:aio:/full/path/to/vm1disk,sda1,w']_

1662

\end{quote}

1663

1664

1665

\subsection{Loopback-mounted file-backed VBDs (deprecated)}

1666

1667

{\em{{\bf{Note:}} Loopback mounted VBDs have now been replaced with

1668

blktap-based support for raw image files, as described above. This

1669

section remains to detail a configuration that was used by older Xen

1670

versions.}}

1671

1672

Raw image file-backed VBDs may also be attached to VMs using the
Linux loopback driver. The only required change to the raw file
instructions above is to specify the configuration entry as:
\begin{quote}
  \verb_disk = ['file:/full/path/to/vm1disk,sda1,w']_
\end{quote}

1678

1679

{\bf Note that loopback file-backed VBDs may not be appropriate for backing

1680

I/O-intensive domains.} This approach is known to experience

1681

substantial slowdowns under heavy I/O workloads, due to the I/O

1682

handling by the loopback block device used to support file-backed VBDs

1683

in dom0. Loopback support remains for old Xen installations, and users

1684

are strongly encouraged to use the blktap-based file support (using

1685

``{\tt{tap:aio}}'' as described above).

1686

1687

Additionally, Linux supports a maximum of eight loopback file-backed
VBDs across all domains by default. This limit can be statically
increased by using the \emph{max\_loop} module parameter if
CONFIG\_BLK\_DEV\_LOOP is compiled as a module in the dom0 kernel, or
by using the \emph{max\_loop=n} boot option if CONFIG\_BLK\_DEV\_LOOP
is compiled directly into the dom0 kernel. Again, users are encouraged
to use the blktap-based file support described above, which scales to a much
larger number of active VBDs.
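
For example, the limit could be raised to 64 loop devices as sketched
below; the value 64 is an arbitrary choice for illustration.

\begin{verbatim}
# loop driver built as a module: reload it with a larger limit
# (all loop devices must be unused before the module can be removed)
rmmod loop
modprobe loop max_loop=64

# loop driver built into the dom0 kernel: append to the dom0 kernel
# command line in the boot loader configuration instead
max_loop=64
\end{verbatim}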

1695

1696

1697

\section{Using LVM-backed VBDs}

1698

\label{s:using-lvm-backed-vbds}

1699

1700

A particularly appealing solution is to use LVM volumes as backing for

1701

domain file-systems since this allows dynamic growing/shrinking of

1702

volumes as well as snapshot and other features.

1703

1704

To initialize a partition to support LVM volumes:

1705

\begin{quote}

1706

\begin{verbatim}

1707

# pvcreate /dev/sda10

1708

\end{verbatim}

1709

\end{quote}

1710

1711

Create a volume group named `vg' on the physical partition:

1712

\begin{quote}

1713

\begin{verbatim}

1714

# vgcreate vg /dev/sda10

1715

\end{verbatim}

1716

\end{quote}

1717

1718

Create a logical volume of size 4GB named `myvmdisk1':

1719

\begin{quote}

1720

\begin{verbatim}

1721

# lvcreate -L4096M -n myvmdisk1 vg

1722

\end{verbatim}

1723

\end{quote}

1724

1725

You should now see that you have a \path{/dev/vg/myvmdisk1} device.
Make a filesystem, mount it and populate it, e.g.:
\begin{quote}
\begin{verbatim}
# mkfs -t ext3 /dev/vg/myvmdisk1
# mount /dev/vg/myvmdisk1 /mnt
# cp -ax / /mnt
# umount /mnt
\end{verbatim}
\end{quote}

1735

1736

Now configure your VM with the following disk configuration:

1737

\begin{quote}

1738

\begin{verbatim}

1739

disk = [ 'phy:vg/myvmdisk1,sda1,w' ]

1740

\end{verbatim}

1741

\end{quote}

1742

1743

LVM enables you to grow the size of logical volumes, but you'll need

1744

to resize the corresponding file system to make use of the new space.

1745

Some file systems (e.g.\ ext3) now support online resize. See the LVM

1746

manuals for more details.
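
As a sketch, growing the logical volume from the earlier example by 1GB and
then enlarging an ext3 file system to fill it could look as follows. The
sequence assumes the domain using the volume has been shut down, so the
file system is resized offline; \verb_resize2fs_ expects a recently
checked file system, hence the \verb_e2fsck_ run.

\begin{verbatim}
# lvextend -L+1G /dev/vg/myvmdisk1
# e2fsck -f /dev/vg/myvmdisk1
# resize2fs /dev/vg/myvmdisk1
\end{verbatim}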

1747

1748

You can also use LVM for creating copy-on-write (CoW) clones of LVM

1749

volumes (known as writable persistent snapshots in LVM terminology).

1750

This facility is new in Linux 2.6.8, so isn't as stable as one might

1751

hope. In particular, using lots of CoW LVM disks consumes a lot of

1752

dom0 memory, and error conditions such as running out of disk space

1753

are not handled well. Hopefully this will improve in future.

1754

1755

To create two copy-on-write clones of the above file system you would

1756

use the following commands:

1757

1758

\begin{quote}

1759

\begin{verbatim}

1760

# lvcreate -s -L1024M -n myclonedisk1 /dev/vg/myvmdisk1

1761

# lvcreate -s -L1024M -n myclonedisk2 /dev/vg/myvmdisk1

1762

\end{verbatim}

1763

\end{quote}

1764

1765

Each of these can grow to have 1GB of differences from the master
volume. You can grow the amount of space for storing the differences
using the \path{lvextend} command, e.g.:
\begin{quote}
\begin{verbatim}
# lvextend -L+100M /dev/vg/myclonedisk1
\end{verbatim}
\end{quote}

1773

1774

Don't let the `differences volume' ever fill up; otherwise LVM gets
rather confused. It may be possible to automate the growing process by
using \path{dmsetup wait} to spot the volume getting full and then
issuing an \path{lvextend}.

1778

1779

In principle, it is possible to continue writing to the volume that

1780

has been cloned (the changes will not be visible to the clones), but

1781

we wouldn't recommend this: have the cloned volume as a `pristine'

1782

file system install that isn't mounted directly by any of the virtual

1783

machines.

1784

1785

1786

\section{Using NFS Root}

1787

1788

First, populate a root filesystem in a directory on the server

1789

machine. This can be on a distinct physical machine, or simply run

1790

within a virtual machine on the same node.

1791

1792

Now configure the NFS server to export this filesystem over the

1793

network by adding a line to \path{/etc/exports}, for instance:

1794

1795

\begin{quote}
  \begin{small}
\begin{verbatim}
/export/vm1root      192.0.2.4/24(rw,sync,no_root_squash)
\end{verbatim}
  \end{small}
\end{quote}

1802

1803

Finally, configure the domain to use NFS root. In addition to the

1804

normal variables, you should make sure to set the following values in

1805

the domain's configuration file:

1806

1807

\begin{quote}
  \begin{small}
\begin{verbatim}
root       = '/dev/nfs'
nfs_server = '2.3.4.5'       # substitute IP address of server
nfs_root   = '/path/to/root' # path to root FS on the server
\end{verbatim}
  \end{small}
\end{quote}

1816

1817

The domain will need network access at boot time, so either statically

1818

configure an IP address using the config variables \path{ip},

1819

\path{netmask}, \path{gateway}, \path{hostname}; or enable DHCP

1820

(\path{dhcp='dhcp'}).
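
A static configuration might look like the following sketch. The addresses
are placeholders from a documentation range and the host name is
hypothetical; substitute values appropriate to your network.

\begin{verbatim}
ip       = '192.0.2.20'
netmask  = '255.255.255.0'
gateway  = '192.0.2.1'
hostname = 'vm1'
\end{verbatim}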

1821

1822

Note that the Linux NFS root implementation is known to have stability

1823

problems under high load (this is not a Xen-specific problem), so this

1824

configuration may not be appropriate for critical servers.

1825

1826

1827

\chapter{CPU Management}

1828

1829

%% KMS Something sage about CPU / processor management.

1830

1831

Xen allows a domain's virtual CPU(s) to be associated with one or more

1832

host CPUs. This can be used to allocate real resources among one or

1833

more guests, or to make optimal use of processor resources when

1834

utilizing dual-core, hyperthreading, or other advanced CPU technologies.

1835

1836

Xen enumerates physical CPUs in a `depth first' fashion. For a system
with both hyperthreading and multiple cores, this would be all the
hyperthreads on a given core, then all the cores on a given socket,
and then all sockets. For example, if you had a two-socket, dual-core,
hyperthreaded Xeon, the CPU order would be:

1841

1842

1843

\begin{center}
\begin{tabular}{l|l|l|l|l|l|l|r}
\multicolumn{4}{c|}{socket0}     &  \multicolumn{4}{c}{socket1} \\ \hline
\multicolumn{2}{c|}{core0}  & \multicolumn{2}{c|}{core1}  &
\multicolumn{2}{c|}{core0}  & \multicolumn{2}{c}{core1} \\ \hline
ht0 & ht1 & ht0 & ht1 & ht0 & ht1 & ht0 & ht1 \\
\#0 & \#1 & \#2 & \#3 & \#4 & \#5 & \#6 & \#7 \\
\end{tabular}
\end{center}

1852

1853

1854

Having multiple vcpus belonging to the same domain mapped to the same
physical CPU is very likely to lead to poor performance. It's better to
use \path{xm vcpu-set} to hot-unplug one of the vcpus and ensure the others are
pinned on different CPUs.

If you are running I/O-intensive tasks, it is typically better to dedicate
either a hyperthread or a whole core to running domain~0, and hence pin
other domains so that they can't use CPU~0. If your workload is mostly
compute-intensive, you may want to pin vcpus such that all physical CPU
threads are available for guest domains.
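
The \path{xm} subcommands below sketch how such a layout could be set up
for a guest. The domain name \verb_VM1_ and the CPU numbers are
assumptions chosen to match the table above.

\begin{verbatim}
# show the current placement of each vcpu
xm vcpu-list VM1

# pin vcpu 0 to physical CPU 2 and vcpu 1 to physical CPU 4,
# which lie on different cores
xm vcpu-pin VM1 0 2
xm vcpu-pin VM1 1 4

# alternatively, reduce VM1 to a single active vcpu
xm vcpu-set VM1 1
\end{verbatim}

A line such as \verb_cpus = "2-7"_ in the domain configuration file
restricts the domain to physical CPUs 2 to 7, leaving the first core to
domain~0.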

1864

1865

\chapter{Migrating Domains}

1866

1867

\section{Domain Save and Restore}

1868

1869

The administrator of a Xen system may suspend a virtual machine's

1870

current state into a disk file in domain~0, allowing it to be resumed at

1871

a later time.

1872

1873

For example, you can suspend a domain called ``VM1'' to disk using the
command:
\begin{verbatim}
# xm save VM1 VM1.chk
\end{verbatim}

This will stop the domain named ``VM1'' and save its current state
into a file called \path{VM1.chk}.

To resume execution of this domain, use the \path{xm restore} command:
\begin{verbatim}
# xm restore VM1.chk
\end{verbatim}

This will restore the state of the domain and resume its execution.
The domain will carry on as before and the console may be reconnected
using the \path{xm console} command, as described earlier.

1890

1891

\section{Migration and Live Migration}

1892

1893

Migration is used to transfer a domain between physical hosts. There

1894

are two varieties: regular and live migration. The former moves a

1895

virtual machine from one host to another by pausing it, copying its

1896

memory contents, and then resuming it on the destination. The latter

1897

performs the same logical functionality but without needing to pause

1898

the domain for the duration. In general when performing live migration

1899

the domain continues its usual activities and---from the user's

1900

perspective---the migration should be imperceptible.

1901

1902

To perform a live migration, both hosts must be running Xen / \xend\ and

1903

the destination host must have sufficient resources (e.g.\ memory

1904

capacity) to accommodate the domain after the move. Furthermore we

1905

currently require both source and destination machines to be on the same

1906

L2 subnet.
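
On the destination host, \xend\ must also be configured to accept
incoming relocation requests. The fragment below is a sketch of the
relevant entries in \path{/etc/xen/xend-config.sxp}; the host-name
pattern is a hypothetical example, and the option names should be checked
against the comments in the file shipped with your installation.

\begin{verbatim}
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-hosts-allow '^localhost$ ^.*\\.ournetwork\\.com$')
\end{verbatim}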

1907

1908

Currently, there is no support for providing automatic remote access
to filesystems stored on local disk when a domain is migrated.
Administrators should choose an appropriate storage solution (e.g.\
SAN or NAS) to ensure that domain filesystems are also available
on their destination node. GNBD is a good method for exporting a
volume from one machine to another. iSCSI can do a similar job, but is
more complex to set up.

1915

1916

When a domain migrates, its MAC and IP addresses move with it, so it is
only possible to migrate VMs within the same layer-2 network and IP
subnet. If the destination node is on a different subnet, the
administrator would need to manually configure a suitable etherip or IP
tunnel in the domain~0 of the remote node.

1921

1922

A domain may be migrated using the \path{xm migrate} command. To live
migrate a domain to another machine, we would use the command:

\begin{verbatim}
# xm migrate --live mydomain destination.ournetwork.com
\end{verbatim}

1928

1929

Without the \path{--live} flag, \xend\ simply stops the domain and

1930

copies the memory image over to the new node and restarts it. Since

1931

domains can have large allocations this can be quite time consuming,

1932

even on a Gigabit network. With the \path{--live} flag \xend\ attempts

1933

to keep the domain running while the migration is in progress, resulting

1934

in typical down times of just 60--300ms.

1935

1936

For now it will be necessary to reconnect to the domain's console on the

1937

new machine using the \path{xm console} command. If a migrated domain

1938

has any open network connections then they will be preserved, so SSH

1939

connections do not have this limitation.

1940

1941

1942

%% Chapter Securing Xen

1943

\chapter{Securing Xen}

1944

1945

This chapter describes how to secure a Xen system. It describes a number

1946

of scenarios and provides a corresponding set of best practices. It

1947

begins with a section devoted to understanding the security implications

1948

of a Xen system.

1949

1950

1951

\section{Xen Security Considerations}

1952

1953

When deploying a Xen system, one must be sure to secure the management

1954

domain (Domain-0) as much as possible. If the management domain is

1955

compromised, all other domains are also vulnerable. The following are a

1956

set of best practices for Domain-0:

1957

1958

\begin{enumerate}

1959

\item \textbf{Run the smallest number of necessary services.} The fewer
things that are present in a management partition, the better.
Remember, a service running as root in the management domain has full
access to all other domains on the system.

1963

\item \textbf{Use a firewall to restrict the traffic to the management

1964

domain.} A firewall with default-reject rules will help prevent

1965

attacks on the management domain.

1966

\item \textbf{Do not allow users to access Domain-0.} The Linux kernel

1967

has been known to have local-user root exploits. If you allow normal

1968

users to access Domain-0 (even as unprivileged users) you run the risk

1969

of a kernel exploit making all of your domains vulnerable.

1970

\end{enumerate}

1971

1972

\section{Driver Domain Security Considerations}

1973

\label{s:ddsecurity}

1974

1975

Driver domains address a range of security problems that exist regarding
the use of device drivers and hardware. On many operating systems in common
use today, device drivers run within the kernel with the same privileges as
the kernel. Few or no mechanisms exist to protect the integrity of the kernel
from a misbehaving (read ``buggy'') or malicious device driver. Driver
domains exist to aid in isolating a device driver within its own virtual
machine where it cannot affect the stability and integrity of other
domains. If a driver crashes, the driver domain can be restarted rather than
having the entire machine crash (and restart) with it. Drivers written by
unknown or untrusted third parties can be confined to an isolated space.
Driver domains thus address a number of security and stability issues with
device drivers.

1987

1988

However, due to limitations in current hardware, a number of security

1989

concerns remain that need to be considered when setting up driver domains (it

1990

should be noted that the following list is not intended to be exhaustive).

1991

1992

\begin{enumerate}

1993

\item \textbf{Without an IOMMU, a hardware device can DMA to memory regions
outside of its controlling domain.} Architectures which do not have an
IOMMU (e.g.\ most x86-based platforms) to restrict DMA usage by hardware
are vulnerable. A hardware device which can perform arbitrary memory reads
and writes can read/write outside of the memory of its controlling domain.
A malicious or misbehaving domain could use a hardware device it controls
to send data overwriting memory in another domain or to read arbitrary
regions of memory in another domain.
\item \textbf{Shared buses are vulnerable to sniffing.} Devices that share
a data bus can sniff (and possibly spoof) each other's data. Device A that
is assigned to Domain A could eavesdrop on data being transmitted by
Domain B to Device B and then relay that data back to Domain A.
\item \textbf{Devices which share interrupt lines can either prevent the
reception of that interrupt by the driver domain or can trigger the
interrupt service routine of that guest needlessly.} A device which shares
a level-triggered interrupt (e.g.\ PCI devices) with another device can
raise an interrupt and never clear it. This effectively blocks other devices
which share that interrupt line from notifying their controlling driver
domains that they need to be serviced. A device which shares
any type of interrupt line can trigger its interrupt continually, which
forces execution time to be spent (in multiple guests) in the interrupt
service routine (potentially denying time to other processes within that
guest). System architectures which allow each device to have its own
interrupt line (e.g.\ PCI's Message Signaled Interrupts) are less
vulnerable to this denial-of-service problem.

2018

\item \textbf{Devices may share the use of I/O memory address space.} Xen can

2019

only restrict access to a device's physical I/O resources at a certain

2020

granularity. For interrupt lines and I/O port address space, that

2021

granularity is very fine (per interrupt line and per I/O port). However,

2022

Xen can only restrict access to I/O memory address space on a page size

2023

basis. If more than one device shares use of a page in I/O memory address

2024

space, the domains to which those devices are assigned will be able to

2025

access the I/O memory address space of each other's devices.

2026

\end{enumerate}

2027

2028

2029

\section{Security Scenarios}

2030

2031

2032

\subsection{The Isolated Management Network}

2033

2034

In this scenario, each node in the cluster has two network cards. One
network card is connected to the outside world and the other is connected
to a physically isolated management network reserved for Xen instances.

As long as all of the management partitions are trusted equally, this is
the most secure scenario. No additional configuration is needed other
than forcing Xend to bind to the management interface for relocation.
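
Binding the relocation server to the management interface could be
sketched as follows in \path{/etc/xen/xend-config.sxp}. The address is a
placeholder for the node's address on the management network, and the
file location is assumed to be the default one.

\begin{verbatim}
(xend-relocation-server yes)
(xend-relocation-address '198.51.100.10')
\end{verbatim}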

2042

2043

2044

\subsection{A Subnet Behind a Firewall}

2045

2046

In this scenario, each node has only one network card but the entire

2047

cluster sits behind a firewall. This firewall should do at least the

2048

following:

2049

2050

\begin{enumerate}

2051

\item Prevent IP spoofing from outside of the subnet.

2052

\item Prevent access to the relocation port of any of the nodes in the

2053

cluster except from within the cluster.

2054

\end{enumerate}

2055

2056

The following iptables rules can be used on each node to prevent
migrations to that node from outside the subnet, assuming the main
firewall does not do this for you:

\begin{verbatim}
# this command disables all access to the Xen relocation
# port:
iptables -A INPUT -p tcp --destination-port 8002 -j REJECT

# this command enables Xen relocations only from the specific
# subnet:
iptables -I INPUT -p tcp --source 192.0.2.0/24 \
    --destination-port 8002 -j ACCEPT
\end{verbatim}

2070

2071

\subsection{Nodes on an Untrusted Subnet}

2072

2073

Migration on an untrusted subnet is not safe in current versions of Xen.
It may be possible to perform migrations through a secure tunnel via a
VPN or SSH. The only safe option in the absence of a secure tunnel is to
disable migration completely. The easiest way to do this is with
iptables:

\begin{verbatim}
# this command disables all access to the Xen relocation port
iptables -A INPUT -p tcp --destination-port 8002 -j REJECT
\end{verbatim}
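
As an additional safeguard, the relocation server itself can be switched
off so that \xend\ does not listen for migration requests at all. A
minimal sketch, assuming the default configuration file location
\path{/etc/xen/xend-config.sxp}; \xend\ must be restarted for the change
to take effect:

\begin{verbatim}
(xend-relocation-server no)
\end{verbatim}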

2083

2084

%% Chapter Xen Mandatory Access Control Framework

2085

\chapter{sHype/Xen Access Control}

2086

The Xen mandatory access control framework is an implementation of the
sHype Hypervisor Security Architecture
(www.research.ibm.com/ssd\_shype). It permits or denies communication
and resource access of domains based on a security policy. The
mandatory access controls are enforced in addition to the Xen core
controls, such as memory protection. They are designed to remain
transparent during normal operation of domains (policy-conforming
behavior) but to intervene when domains move outside their intended
sharing behavior. This chapter describes how the sHype access
controls in Xen can be configured to prevent viruses from spilling
over from one workload type into another and secrets from leaking from
one workload type to another. sHype/Xen depends on the correct
behavior of Domain-0 (cf.\ previous chapter).

2099

2100

Benefits of configuring sHype/ACM in Xen include:

2101

\begin{itemize}

2102

\item robust workload and resource protection effective against rogue

2103

user domains

2104

\item simple, platform- and operating system-independent security

2105

policies (ideal for heterogeneous distributed environments)

2106

\item safety net with minimal performance overhead in case operating

2107

system security is missing, does not scale, or fails

2108

\end{itemize}

2109

2110

These benefits are very valuable because today's operating systems
are becoming increasingly complex and often have no or insufficient
mandatory access controls. (Discretionary access controls, supported
by most operating systems, are not effective against viruses or
misbehaving programs.) Where mandatory access controls exist (e.g.,
SELinux), they usually deploy platform-specific, complex, and
difficult-to-understand security policies. Multi-tier applications in business
environments typically require different operating systems
(e.g., AIX, Windows, Linux) in different tiers. Related distributed
transactions and workloads cannot be easily protected on the OS level.
The Xen access control framework steps in to offer a coarse-grained
but very robust and consistent security layer and safety net across
different platforms and operating systems.

2123

2124

To control sharing between domains, Xen mediates all inter-domain
communication (shared memory, events) as well as the access of domains
to resources such as storage disks. Thus, Xen can confine distributed
workloads (domain payloads) by permitting sharing among domains
running the same type of workload and denying sharing between pairs of
domains that run different workload types. We assume that, from a Xen
perspective, only one workload type is running per user domain. To
enable Xen to associate domains and resources with workload types,
security labels including the workload types are attached to domains
and resources. These labels and the hypervisor sHype controls cannot
be manipulated or bypassed by user domains and are effective even
against compromised or rogue domains.

2136

2137

\section{Overview}

2138

This section gives an overview of how workloads can be protected using

2139

the sHype mandatory access control framework in Xen.

2140

Figure~\ref{fig:acmoverview} shows the necessary steps in activating

2141

the Xen workload protection. These steps are described in detail in

2142

Section~\ref{section:acmexample}.

2143

2144

\begin{figure}

2145

\centering

2146

\includegraphics[width=13cm]{figs/acm_overview.eps}

2147

\caption{Overview of activating sHype workload protection in Xen.

2148

Section numbers point to representative examples.}

2149

\label{fig:acmoverview}

2150

\end{figure}

2151

2152

First, the sHype/ACM access control must be enabled in the Xen
distribution and the distribution must be built and installed (cf.\
Subsection~\ref{subsection:acmexampleconfigure}). Before we can
enforce security, a Xen security policy must be created (cf.\
Subsection~\ref{subsection:acmexamplecreate}) and deployed (cf.\
Subsection~\ref{subsection:acmexampleinstall}). This policy defines
the workload types differentiated during access control. It also
defines the rules that compare workload types of domains and resources
to decide about access requests. Workload types are represented by
security labels that can be securely associated with domains and resources (cf.\
Subsections~\ref{subsection:acmexamplelabeldomains}
and~\ref{subsection:acmexamplelabelresources}). The functioning of
the active sHype/Xen workload protection is demonstrated using simple
resource assignment and domain creation tests in
Subsection~\ref{subsection:acmexampletest}.
Section~\ref{section:acmpolicy} describes the syntax and semantics of
the sHype/Xen security policy in detail and briefly introduces the
tools that are available to help you create your own sHype security policies.

2170

2171

The next section describes all the necessary steps to create, deploy,

2172

and test a simple workload protection policy. It is meant to enable

2173

Xen users and developers to quickly try out the sHype/Xen workload

2174

protection. Those readers who are interested in learning more about

2175

how the sHype access control in Xen works and how it is configured

2176

using the XML security policy should read Section~\ref{section:acmpolicy}

2177

as well. Section~\ref{section:acmlimitations} concludes this chapter with

2178

current limitations of the sHype implementation for Xen.


\section{Xen Workload Protection Step-by-Step}
\label{section:acmexample}

You are about to configure and deploy the Xen sHype workload protection
by following 5 simple steps:
\begin{itemize}
\item configure and install sHype/Xen
\item create a simple workload protection security policy
\item deploy the sHype/Xen security policy
\item associate domains and resources with workload labels
\item test the workload protection
\end{itemize}
The essential commands to create and deploy an sHype/Xen security
policy are numbered throughout the following sections. If you want a
quick guide, or return later to run through this demonstration
quickly, simply look for the numbered commands and apply them in
order.


\subsection{Configuring/Building sHype Support into Xen}
\label{subsection:acmexampleconfigure}
First, we need to configure the access control module in Xen and
install the ACM-enabled Xen hypervisor. This step installs security
tools and compiles sHype/ACM controls into the Xen hypervisor.

To enable sHype/ACM in Xen, please edit the Config.mk file in the top
Xen directory.

\begin{verbatim}
(1) In Config.mk
    Change: XSM_ENABLE ?= n
    To:     XSM_ENABLE ?= y

    Change: ACM_SECURITY ?= n
    To:     ACM_SECURITY ?= y

\end{verbatim}
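
If you prefer to script step (1), the edit can be sketched in Python. This is a minimal sketch, assuming both variables appear in Config.mk exactly in the form \verb|NAME ?= n|; the helper name \verb|enable_flags| is hypothetical and not part of the Xen tools.

```python
import re


def enable_flags(config_text, flags=("XSM_ENABLE", "ACM_SECURITY")):
    """Return config_text with 'FLAG ?= n' switched to 'FLAG ?= y'.

    Hypothetical helper for illustration; only lines that assign
    exactly 'n' to one of the given flags are changed.
    """
    for flag in flags:
        # [ \t]* instead of \s* so that the match never swallows newlines.
        pattern = rf"^({re.escape(flag)}[ \t]*\?=[ \t]*)n[ \t]*$"
        config_text = re.sub(pattern, r"\1y", config_text, flags=re.MULTILINE)
    return config_text


sample = "XSM_ENABLE ?= n\n\nACM_SECURITY ?= n\nOTHER_OPTION ?= n\n"
print(enable_flags(sample))
```

To apply it, read Config.mk into a string, pass it through the helper, and write the result back before running step (2). Review the result by hand, since your Config.mk may differ from the form assumed here.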


Then install the security-enabled Xen environment as follows:

\begin{verbatim}
(2) # make world
    # make install
\end{verbatim}

Reboot into the security-enabled Xen hypervisor.

\begin{verbatim}
(3) # reboot
\end{verbatim}

Xen will boot into the default security policy. After reboot,
you can explore the simple DEFAULT policy.
\begin{scriptsize}
\begin{verbatim}
# xm getpolicy
Supported security subsystems   : ACM
Policy name           : DEFAULT
Policy type           : ACM
Version of XML policy : 1.0
Policy configuration  : loaded

# xm labels
SystemManagement

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   941     1     r-----      38.1 ACM:DEFAULT:SystemManagement
\end{verbatim}
\end{scriptsize}

In this state, no domains can be started.
Now, a policy can be created and loaded into the hypervisor.


\subsection{Creating A WLP Policy in 3 Simple Steps with ezPolicy}
\label{subsection:acmexamplecreate}

We will use the ezPolicy tool to quickly create a policy that protects
workloads. You will need both the Python and wxPython packages to run
this tool. To run the tool in Domain-0, you can download the wxPython
package from www.wxpython.org or use the command \verb|yum install wxPython|
on Red Hat/Fedora. To run the tool on MS Windows, you also need to download
the Python package from www.python.org. After these packages are installed,
start the ezPolicy tool with the following command:

\begin{verbatim}
(4) # xensec_ezpolicy
\end{verbatim}

Figure~\ref{fig:acmezpolicy} shows a screenshot of the tool. The
following steps illustrate how you can create the workload definition
shown in Figure~\ref{fig:acmezpolicy}. You can use \verb|<CTRL>-h| to
pop up a help window at any time. The indicators (a), (b), and (c) in
Figure~\ref{fig:acmezpolicy} show the buttons that are used during the
3 steps of creating a policy:
\begin{enumerate}
\item defining workloads
\item defining run-time conflicts
\item translating the workload definition into an sHype/Xen access
control policy
\end{enumerate}

\paragraph{Defining workloads.} Workloads are defined for each
organization and department that you enter in the left panel.

To ease the transition from an unlabeled to a fully labeled workload-protection
environment, sHype/Xen supports running unlabeled domains that access
unlabeled resources alongside labeled domains that access labeled resources.

Support for running unlabeled domains on sHype/Xen is enabled by adding the
predefined workload type and label \verb|__UNLABELED__| to the security
policy. (This is a double underscore
followed by the string ``\verb|UNLABELED|'' followed by a double underscore.)
The ezPolicy tool automatically adds this organization-level workload type
to a new workload definition (cf.\ Figure~\ref{fig:acmezpolicy}). It can simply be
deleted from the workload definition if no such support is desired. If unlabeled domains
are supported in the policy, then any domain or resource that has no label implicitly
inherits this label when access control decisions are made. In effect, unlabeled
domains and resources define a new workload type \verb|__UNLABELED__|, which is

confined from any other labeled workload.
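
The implicit inheritance of \verb|__UNLABELED__| can be illustrated with a small Python model. This is only a sketch of the rule described above, not the sHype/Xen implementation; the function names \verb|effective_label| and \verb|may_access| are hypothetical, and the check is deliberately simplified to "both sides resolve to the same workload label".

```python
# Illustrative model only; this is not the sHype/Xen implementation.
UNLABELED = "__UNLABELED__"


def effective_label(label, policy_labels):
    """Return the label used for access decisions, or None if unusable."""
    if label is not None:
        return label
    # A subject without a label inherits __UNLABELED__ only if the
    # policy defines that workload type.
    return UNLABELED if UNLABELED in policy_labels else None


def may_access(domain_label, resource_label, policy_labels):
    """Simplified check: both sides must resolve to the same workload label."""
    domain = effective_label(domain_label, policy_labels)
    resource = effective_label(resource_label, policy_labels)
    return domain is not None and domain == resource


mytest = {"A-Bank", "B-Bank", "AutoCorp", UNLABELED}
print(may_access(None, None, mytest))      # unlabeled domain, unlabeled disk
print(may_access("A-Bank", None, mytest))  # labeled domain, unlabeled disk
```

In this model, deleting \verb|__UNLABELED__| from the policy makes every unlabeled domain and resource unusable, which mirrors the effect of removing the workload type from the workload definition.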


Now use the ``New Org'' button to add the organization workload types
``A-Bank'', ``B-Bank'', and ``AutoCorp''.

You can refine an organization to differentiate between multiple
department workloads by right-clicking the organization and selecting
\verb|Add Department| (or selecting an organization and pressing
\verb|<CTRL>-a|). Create the department workloads ``SecurityUnderwriting''
and ``MarketAnalysis'' for ``A-Bank''. The resulting layout of the
tool should be similar to the left panel shown in
Figure~\ref{fig:acmezpolicy}.

\begin{figure}[htb]
\centering
\includegraphics[width=13cm]{figs/acm_ezpolicy_gui.eps}
\caption{Final layout including workload definition and Run-time Exclusion rules.}
\label{fig:acmezpolicy}
\end{figure}


\paragraph{Defining run-time conflicts.} Workloads that shall be
prohibited from running concurrently on the same hypervisor platform
are grouped into ``Run-time Exclusion rules'' on the right panel of
the window. Cautious users should include the \verb|__UNLABELED__|
workload type in all run-time exclusion rules because any workload
could run inside unlabeled domains.

To prevent A-Bank and B-Bank workloads (including their
departmental workloads) from running simultaneously on the same
hypervisor system, select the organization ``A-Bank'' and, while
pressing the \verb|<CTRL>| key, select the organization ``B-Bank''.
Being cautious, we also prevent unlabeled workloads from running with
any of those workloads by pressing the \verb|<CTRL>| key and selecting
``\_\_UNLABELED\_\_''. Now press the button named ``Create run-time exclusion
rule from selection''. A popup window will ask for a name for this run-time
exclusion rule (enter a name or just hit \verb|<ENTER>|). A rule will
appear on the right panel. The name is used for reference only and does
not affect access control decisions.

Please repeat this process to create another run-time exclusion rule
for the department workloads ``A-Bank.SecurityUnderwriting'' and
``A-Bank.MarketAnalysis''. Also add the ``\_\_UNLABELED\_\_''
workload type to this conflict set.

The resulting layout of your window should be similar to
Figure~\ref{fig:acmezpolicy}. Save this workload definition by
selecting ``Save Workload Definition as ...'' in the ``File'' menu.
This workload definition can be refined later if required.

\paragraph{Translating the workload definition into an sHype/Xen access
control policy.} To translate the workload definition into an access
control policy understood by Xen, please select ``Save as Xen ACM
Security Policy'' in the ``File'' menu. Enter the following policy
name in the popup window: \verb|mytest|. If you are running ezPolicy in
Domain-0, the resulting policy file mytest\_security-policy.xml will
automatically be placed into the right directory (/etc/xen/acm-security/policies/).
If you run the tool on another system, then you need to copy the
resulting policy file into Domain-0 before continuing. See
Section~\ref{subsection:acmnaming} for naming conventions of security
policies.

\begin{scriptsize}
\textbf{Note:} The support for \verb|__UNLABELED__| domains and
resources is meant to ease the transition from an uncontrolled
environment to a workload-protected environment by starting with
unlabeled domains and resources and then labeling domains
and resources step by step. Once all workloads are labeled, the \verb|__UNLABELED__|
type can simply be removed from the Domain-0 label or from the policy
through a policy update. Section~\ref{subsection:acmpolicymanagement} will
show how unlabeled domains can be disabled by updating the
\verb|mytest| policy at run-time.
\end{scriptsize}


\subsection{Deploying a WLP Policy}
\label{subsection:acmexampleinstall}
To deploy the workload protection policy we created in
Section~\ref{subsection:acmexamplecreate}, we create a policy
representation (mytest.bin), load it into the Xen
hypervisor, and configure Xen to also load this policy during
reboot.

The following command translates the source policy representation
into a format that can be loaded into Xen with sHype/ACM support,
activates the policy, and configures the boot sequence to load this
policy in future boot cycles. Please refer to the \verb|xm|
man page for further details:

\begin{verbatim}
(5) # xm setpolicy ACM mytest
    Successfully set the new policy.
    Supported security subsystems   : ACM
    Policy name           : mytest
    Policy type           : ACM
    Version of XML policy : 1.0
    Policy configuration  : loaded, activated for boot
\end{verbatim}

Alternatively, if installing the policy fails (e.g., because the
command cannot identify the Xen boot entry), you can manually install
the policy in 3 steps, a to c.

(\textit{Alternative to 5, step a}) Manually copy the policy binary
file into the boot directory:

\begin{scriptsize}
\begin{verbatim}
# cp /etc/xen/acm-security/policies/mytest.bin /boot/mytest.bin
\end{verbatim}
\end{scriptsize}

(\textit{Alternative to 5, step b}) Manually add a module line to your
Xen boot entry so that grub loads this policy file during startup:

\begin{scriptsize}
\begin{verbatim}
title XEN Devel with 2.6.18.8
    kernel /xen.gz
    module /vmlinuz-2.6.18.8-xen root=/dev/sda3 ro console=tty0
    module /initrd-2.6.18.8-xen.img
    module /mytest.bin
\end{verbatim}
\end{scriptsize}

(\textit{Alternative to 5, step c}) Reboot. Xen will choose the
bootstrap label defined in the policy as the Domain-0 label during reboot.
After reboot, you can relabel Domain-0 at run-time
(cf.\ Section~\ref{subsection:acmlabeldom0}).

Assuming that command (5) succeeded or you followed the alternative
instructions above, you should see the new policy and label appear
when listing domains:

\begin{scriptsize}
\begin{verbatim}
# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   941     1     r-----      81.5 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

If the security label at the end of the line says ``INACTIVE'', then
security is not enabled. Verify the previous steps. Note: Domain-0 is
assigned a default label (see the \verb|bootstrap| policy attribute
explained in Section~\ref{section:acmpolicy}). All other domains must
be explicitly labeled, which we describe in detail below.


\subsection{Labeling Unmanaged User Domains}
\label{subsection:acmexamplelabeldomains}

Unmanaged domains are started in Xen by using a configuration
file. Please refer to Section~\ref{subsection:acmlabelmanageddomains}
if you are using managed domains.

The following configuration file defines \verb|domain1|:

\begin{scriptsize}
\begin{verbatim}
# cat domain1.xm
kernel = "/boot/vmlinuz-2.6.18.8-xen"
memory = 128
name = "domain1"
vif = ['']
dhcp = "dhcp"
disk = ['file:/home/xen/dom_fc5/fedora.fc5.img,sda1,w', \
        'file:/home/xen/dom_fc5/fedora.fc5.swap,sda2,w']
root = "/dev/sda1 ro xencons=tty"
\end{verbatim}
\end{scriptsize}

Every domain must be associated with a security label before it can start
on sHype/Xen. Otherwise, sHype/Xen would not be able to enforce the policy
consistently. Our \verb|mytest| policy is configured so that Xen
assigns the default label \verb|__UNLABELED__| to domains and resources that
have no label and supports them in a controlled manner. Since neither the domain
nor its resources are labeled yet, this domain can start under the \verb|mytest|
policy:

\begin{scriptsize}
\begin{verbatim}
# xm create domain1.xm
Using config file "./domain1.xm".
Started domain domain1

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       1   128     1     -b----       0.7 ACM:mytest:__UNLABELED__
Domain-0      0   875     1     r-----      84.6 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

Please shut down \verb|domain1| so that we can move it into the protection
domain of workload \verb|A-Bank|.

\begin{scriptsize}
\begin{verbatim}
# xm shutdown domain1
(wait some seconds until the domain has shut down)

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   875     1     r-----      86.4 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

We assume that the processing in \verb|domain1| contributes to the \verb|A-Bank| workload.
We now explore how to move this domain into the ``A-Bank'' workload protection.
The following command prints all domain labels available in the active policy:

\begin{scriptsize}
\begin{verbatim}
# xm labels
A-Bank
A-Bank.MarketAnalysis
A-Bank.SecurityUnderwriting
AutoCorp
B-Bank
SystemManagement
__UNLABELED__
\end{verbatim}
\end{scriptsize}

Now label \verb|domain1| with the A-Bank label and another domain, \verb|domain2|,
with the B-Bank label. Please refer to the \verb|xm| man page for
further information.

\begin{verbatim}
(6) # xm addlabel A-Bank dom domain1.xm
    # xm addlabel B-Bank dom domain2.xm
\end{verbatim}

Let us try to start the domain again:

\begin{scriptsize}
\begin{verbatim}
# xm create domain1.xm
Using config file "./domain1.xm".
Error: VM's access to block device 'file:/home/xen/dom_fc5/fedora.fc5.img' denied
\end{verbatim}
\end{scriptsize}

This error indicates that \verb|domain1|, if started, would not be able to
access its image and swap files because they are not labeled. This
makes sense: to confine workloads, the access of domains to
resources must be controlled. Otherwise, domains that are not allowed
to communicate or run simultaneously could share data through storage
resources.


\subsection{Labeling Resources}
\label{subsection:acmexamplelabelresources}
You can use the \verb|xm labels type=res| command to list available
resource labels. Let us assign the A-Bank resource label to the
\verb|domain1| image file representing \verb|/dev/sda1| and to its swap file:

\begin{verbatim}
(7) # xm addlabel A-Bank res \
      file:/home/xen/dom_fc5/fedora.fc5.img

    # xm addlabel A-Bank res \
      file:/home/xen/dom_fc5/fedora.fc5.swap
\end{verbatim}

The following command lists all labeled resources on the system, e.g.,
to look up or verify the labeling:

\begin{scriptsize}
\begin{verbatim}
# xm resources
file:/home/xen/dom_fc5/fedora.fc5.swap
    type: ACM
    policy: mytest
    label: A-Bank
file:/home/xen/dom_fc5/fedora.fc5.img
    type: ACM
    policy: mytest
    label: A-Bank
\end{verbatim}
\end{scriptsize}

Starting \verb|domain1| will now succeed:

\begin{scriptsize}
\begin{verbatim}
# xm create domain1.xm
Using config file "./domain1.xm".
Started domain domain1

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       3   128     1     -b----       0.8 ACM:mytest:A-Bank
Domain-0      0   875     1     r-----      90.9 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

Currently, if a labeled resource is moved to another location, the
label must first be removed manually and then re-attached after the move,
using the \verb|xm| commands \verb|rmlabel| and \verb|addlabel|,
respectively. Please see Section~\ref{section:acmlimitations} for
further details.

\begin{verbatim}
(8) Label the resources of domain2 as B-Bank
    but please do not start this domain yet.
\end{verbatim}


\subsection{Testing The Xen Workload Protection}
\label{subsection:acmexampletest}

We are about to demonstrate the sHype/Xen workload protection by verifying
\begin{itemize}
\item that user domains with conflicting workloads cannot run
simultaneously
\item that user domains cannot access resources of workloads other than the
one they are associated with
\item that user domains cannot exchange network packets if they are not
associated with the same workload type (not yet supported in Xen)
\end{itemize}

\paragraph{Test 1: Run-time exclusion rules.} We assume that \verb|domain1|
with the A-Bank label is still running. While \verb|domain1| is running,
the run-time exclusion set of our policy implies that \verb|domain2| cannot
start because the label of \verb|domain1| includes the CHWALL type A-Bank
and the label of \verb|domain2| includes the CHWALL type B-Bank. The
run-time exclusion rule of our policy enforces that A-Bank and
B-Bank cannot run at the same time on the same hypervisor platform.
Once \verb|domain1| is stopped, saved, or migrated to another platform,
\verb|domain2| can start. Once \verb|domain2| is started, however,
\verb|domain1| can no longer start or resume on this system. When creating the
Chinese Wall types for the workload labels, the policy translation
component of the ezPolicy tool ensures that department workloads inherit all the
organization types (and with them any organization exclusions).

\begin{scriptsize}
\begin{verbatim}
# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       3   128     1     -b----       0.8 ACM:mytest:A-Bank
Domain-0      0   875     1     r-----      90.9 ACM:mytest:SystemManagement

# xm create domain2.xm
Using config file "./domain2.xm".
Error: 'Domain in conflict set with running domains'

# xm shutdown domain1
(wait some seconds until domain1 is shut down)

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   873     1     r-----      95.3 ACM:mytest:SystemManagement

# xm create domain2.xm
Using config file "./domain2.xm".
Started domain domain2

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain2       5   164     1     -b----       0.3 ACM:mytest:B-Bank
Domain-0      0   839     1     r-----      96.4 ACM:mytest:SystemManagement

# xm create domain1.xm
Using config file "domain1.xm".
Error: 'Domain in conflict with running domains'

# xm shutdown domain2
# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   839     1     r-----      97.8 ACM:mytest:SystemManagement
\end{verbatim}

\end{scriptsize}
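
The run-time exclusion behavior of Test 1 can be modeled in a few lines of Python. This is a sketch of the rule as described in this section, not the hypervisor's actual Chinese Wall code. The helper names \verb|chwall_types| and \verb|can_start| are hypothetical, and the assumption that a department label carries exactly its own type plus its organization type is taken from the inheritance described above.

```python
# Illustrative model only; this is not the sHype/Xen implementation.

def chwall_types(label):
    """Return the CHWALL types of a label.

    Assumption: a department label such as 'A-Bank.MarketAnalysis'
    carries its own type plus the type of its organization ('A-Bank').
    """
    organization = label.split(".", 1)[0]
    return {label, organization}


def can_start(new_label, running_labels, exclusion_rules):
    """Return True if starting a domain with new_label violates no rule.

    exclusion_rules is a list of sets of CHWALL types. Within one set,
    domains of different types must not run at the same time.
    """
    new_types = chwall_types(new_label)
    running_types = set()
    for label in running_labels:
        running_types |= chwall_types(label)
    for rule in exclusion_rules:
        new_in_rule = new_types & rule
        running_in_rule = running_types & rule
        # Conflict: the rule covers the new domain and also a different
        # type that is already running.
        if new_in_rule and running_in_rule - new_in_rule:
            return False
    return True


# The two rules created with ezPolicy in this chapter.
RULES = [
    {"A-Bank", "B-Bank", "__UNLABELED__"},
    {"A-Bank.SecurityUnderwriting", "A-Bank.MarketAnalysis", "__UNLABELED__"},
]

print(can_start("B-Bank", ["A-Bank"], RULES))    # domain2 while domain1 runs
print(can_start("AutoCorp", ["A-Bank"], RULES))  # AutoCorp is in no rule
```

Because AutoCorp appears in no exclusion rule, the model lets it run next to either bank, which matches the behavior you can verify on the real system.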


You can verify that domains with the AutoCorp label can run together with
domains labeled A-Bank or B-Bank.

\paragraph{Test 2: Resource access.} In this test, we will relabel the
swap file for \verb|domain1| with the \verb|B-Bank| resource label. In a
real environment, the swap file must be sanitized (scrubbed/zeroed) before
it is reassigned to prevent data leaks from the A-Bank to the B-Bank workload
through the swap file.

We expect that \verb|domain1| will no longer start because it cannot access
this resource. This test checks the sharing abilities of domains, which are
defined by the Simple Type Enforcement policy component.

\begin{scriptsize}
\begin{verbatim}
# xm rmlabel res file:/home/xen/dom_fc5/fedora.fc5.swap

# xm addlabel B-Bank res file:/home/xen/dom_fc5/fedora.fc5.swap

# xm resources
file:/home/xen/dom_fc5/fedora.fc5.swap
    type: ACM
    policy: mytest
    label: B-Bank
file:/home/xen/dom_fc5/fedora.fc5.img
    type: ACM
    policy: mytest
    label: A-Bank

# xm create domain1.xm
Using config file "./domain1.xm".
Error:
VM's access to block device 'file:/home/xen/dom_fc5/fedora.fc5.swap' denied
\end{verbatim}
\end{scriptsize}

The resource authorization checks are performed before the domain is actually started
so that failures during startup are prevented. A domain is only started if all

the resources specified in its configuration are accessible.
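
The all-or-nothing check before startup can be sketched as follows. This Python model is an illustration under stated assumptions, not the code that xend runs: the mapping \verb|STE_TYPES| and the helpers \verb|denied_resources| and \verb|may_start| are hypothetical, and access is assumed to be granted when the domain label and the resource label share at least one STE type.

```python
# Illustrative model only; label names follow the mytest example.
# Hypothetical mapping from a label to the STE types it carries.
STE_TYPES = {
    "A-Bank": {"A-Bank"},
    "B-Bank": {"B-Bank"},
    "AutoCorp": {"AutoCorp"},
}


def denied_resources(domain_label, resource_labels):
    """Return the resources that the domain may not access.

    resource_labels maps a resource name to its label. Access is
    granted when both labels share at least one STE type.
    """
    domain_types = STE_TYPES[domain_label]
    return [
        resource
        for resource, label in resource_labels.items()
        if not domain_types & STE_TYPES[label]
    ]


def may_start(domain_label, resource_labels):
    """A domain starts only if every configured resource is accessible."""
    return not denied_resources(domain_label, resource_labels)


# The state of Test 2: the swap file has been relabeled to B-Bank.
disks = {
    "file:/home/xen/dom_fc5/fedora.fc5.img": "A-Bank",
    "file:/home/xen/dom_fc5/fedora.fc5.swap": "B-Bank",
}
print(denied_resources("A-Bank", disks))
print(may_start("A-Bank", disks))
```

Collecting every denied resource first, instead of stopping at the first one, is a design choice of this sketch; it makes it easy to report all labeling problems of a configuration at once.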


\paragraph{Test 3: Communication.} In this test, we would verify that
two domains with labels A-Bank and B-Bank cannot exchange network packets
by using the \verb|ping| connectivity test. This test also relates to the STE
policy. \textbf{Note:} sHype/Xen does control direct communication between
domains. However, domains associated with different workloads can
currently still communicate through the Domain-0 virtual network. We
are working on the sHype/ACM controls for local and remote network
traffic through Domain-0. Please monitor the xen-devel mailing list
for updated information.


\subsection{Labeling Domain-0 --or-- Restricting System Authorization}
\label{subsection:acmlabeldom0}
The major use case for explicitly labeling or relabeling Domain-0 is to restrict
or extend which workload types can run on a virtualized Xen system. This enables
flexible partitioning of the physical infrastructure as well as the workloads
running on it in a multi-platform environment.

If no Domain-0 label is explicitly stated, Domain-0 is automatically assigned
the \verb|SystemManagement| label, which includes all STE (workload) types that
are known to the policy. In effect, the Domain-0 label authorizes the Xen system
to run only those workload types whose STE types are included in the Domain-0
label. Hence, choosing the \verb|SystemManagement| label for Domain-0 permits any
labeled domain to run. Resetting the label for Domain-0 at boot or run-time to
a label with a subset of the known STE workload types restricts which user domains
can run on this system. If Domain-0 is relabeled at run-time, then the new label
must at least include all STE types of those domains that are currently running.
The operation fails otherwise. This requirement ensures that the system remains

in a valid security configuration after re-labelling.
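
The two rules in this paragraph, authorization of a domain by the Domain-0 label and the validity check when Domain-0 is relabeled, can be sketched in Python. This is a simplified model for illustration, not the sHype/Xen implementation; the function names \verb|is_authorized| and \verb|can_relabel_dom0| are hypothetical.

```python
# Illustrative model only; this is not the sHype/Xen implementation.

def is_authorized(domain_types, dom0_types):
    """A domain may run only if all its STE types are in the Domain-0 label."""
    return set(domain_types) <= set(dom0_types)


def can_relabel_dom0(new_dom0_types, running_domains):
    """Return True if the new Domain-0 label covers all running domains.

    running_domains maps a domain name to the STE types of its label.
    """
    return all(
        is_authorized(types, new_dom0_types)
        for types in running_domains.values()
    )


# STE types of the mytest policy; SystemManagement includes all of them.
SYSTEM_MANAGEMENT = {
    "SystemManagement", "__UNLABELED__", "A-Bank",
    "A-Bank.SecurityUnderwriting", "A-Bank.MarketAnalysis",
    "B-Bank", "AutoCorp",
}
A_BANK = {"A-Bank"}

running = {"domain1": {"A-Bank"}}
print(can_relabel_dom0(A_BANK, running))   # only A-Bank domains are running
print(is_authorized({"AutoCorp"}, A_BANK))  # AutoCorp under an A-Bank Domain-0
```

The same subset test serves both purposes: it decides whether a new domain may start and whether a relabeling request would leave a running domain unauthorized.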


Restricting the Domain-0 authorization through the label creates a flexible
policy-driven way to strongly partition the physical infrastructure and the
workloads running on it. This partitioning is automatically enforced during
migration, start, or resume of domains and simplifies security management
considerably. Strongly competing workloads can be forced to run on separate physical
infrastructure and become less dependent on the domain isolation capabilities
of the hypervisor.

First, we relabel the swap image back to A-Bank and then start up \verb|domain1|:
\begin{scriptsize}
\begin{verbatim}
# xm rmlabel res file:/home/xen/dom_fc5/fedora.fc5.swap

# xm addlabel A-Bank res file:/home/xen/dom_fc5/fedora.fc5.swap

# xm create domain1.xm
Using config file "./domain1.xm".
Started domain domain1

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       7   128     1     -b----       0.7 ACM:mytest:A-Bank
Domain-0      0   839     1     r-----     103.1 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

The following command will restrict the Xen system to only run STE types
included in the A-Bank label.

\begin{scriptsize}
\begin{verbatim}
# xm addlabel A-Bank mgt Domain-0
Successfully set the label of domain 'Domain-0' to 'A-Bank'.

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   839     1     r-----     103.7 ACM:mytest:A-Bank
domain1       7   128     1     -b----       0.7 ACM:mytest:A-Bank
\end{verbatim}
\end{scriptsize}

In our example policy in Figure~\ref{fig:acmxmlfileb}, this means that
only \verb|A-Bank| domains and workloads (types) can run after the
successful completion of this command because the \verb|A-Bank| label
includes only a single STE type, namely \verb|A-Bank|. This command
fails if any running domain has an STE type in its label that is not
included in the A-Bank label.

If we now label a domain \verb|domain3| with AutoCorp, it cannot start because Domain-0 is
no longer authorized to run the workload type \verb|AutoCorp|.
\begin{scriptsize}
\begin{verbatim}
# xm addlabel AutoCorp dom domain3.xm
(remember to label its resources, too)

# xm create domain3.xm
Using config file "./domain3.xm".
Error: VM is not authorized to run.

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
Domain-0      0   839     1     r-----     104.7 ACM:mytest:A-Bank
domain1       7   128     1     -b----       0.7 ACM:mytest:A-Bank
\end{verbatim}
\end{scriptsize}

At this point, unlabeled domains cannot start either. Let \verb|domain4.xm|
describe an unlabeled domain; trying to start \verb|domain4|
will then fail:
\begin{scriptsize}
\begin{verbatim}
# xm getlabel dom domain4.xm
Error: 'Domain not labeled'

# xm create domain4.xm
Using config file "./domain4.xm".
Error: VM is not authorized to run.
\end{verbatim}
\end{scriptsize}

Relabeling Domain-0 with the SystemManagement label will enable \verb|domain3| to start.
\begin{scriptsize}
\begin{verbatim}
# xm addlabel SystemManagement mgt Domain-0
Successfully set the label of domain 'Domain-0' to 'SystemManagement'.

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       7   128     1     -b----       0.8 ACM:mytest:A-Bank
Domain-0      0   839     1     r-----     106.6 ACM:mytest:SystemManagement

# xm create domain3.xm
Using config file "./domain3.xm".
Started domain domain3

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       7   128     1     -b----       0.8 ACM:mytest:A-Bank
domain3       8   164     1     -b----       0.3 ACM:mytest:AutoCorp
Domain-0      0   711     1     r-----     107.6 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}



\subsection{Labeling Managed User Domains}
\label{subsection:acmlabelmanageddomains}

Xend has been extended with functionality to manage domains along with their
configuration information. Such domains are configured and started via Xen-API
calls. Since managed domains do not have an associated \verb|xm| configuration file,
the existing \verb|addlabel| command, which adds the security label to a
domain's configuration file, will not work for such managed domains.

Therefore, we have extended the \verb|xm addlabel| and \verb|xm rmlabel|
subcommands to enable adding security labels to and removing security
labels from managed domain configurations. The following example shows how
the \verb|A-Bank| label can be assigned to the xend-managed
domain configuration of \verb|domain1|. Removing labels from managed user
domain configurations works similarly.

Below, we show a dormant configuration of the managed \verb|domain1|
with ID \verb|"-1"| and state \verb|"------"| before labeling:
\begin{scriptsize}
\begin{verbatim}
# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1      -1   128     1     ------       0.0 ACM:mytest:__UNLABELED__
Domain-0      0   711     1     r-----     128.4 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

Now we label the managed domain:
\begin{scriptsize}
\begin{verbatim}
# xm addlabel A-Bank mgt domain1
Successfully set the label of the dormant domain 'domain1' to 'A-Bank'.
\end{verbatim}
\end{scriptsize}

After labeling, you can see that the security label is part of the
domain configuration:
\begin{scriptsize}
\begin{verbatim}
# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1      -1   128     1     ------       0.0 ACM:mytest:A-Bank
Domain-0      0   711     1     r-----     129.7 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

This command extension does not support relabeling of individual running user domains
for several reasons. One reason is the difficulty of revoking resources
in cases where a running domain's new label does not permit access to resources
that were accessible under the old label. Another reason is that changing the
label of a single domain of a workload is rarely a good choice and will affect
the isolation properties of the overall workload.

However, the name and contents of the label associated with running domains can
be changed indirectly through a global policy change, which will update the whole
workload consistently (domains and resources), cf.\
Section~\ref{subsection:acmpolicymanagement}.


\section{Xen Access Control Policy}
\label{section:acmpolicy}

This section describes the sHype/Xen access control policy in detail.
It gives enough information to enable the reader to write custom
access control policies and to use the available Xen policy tools. The
policy language is expressive enough to specify most symmetric access
relationships between domains and resources efficiently.

The Xen access control policy consists of two policy components. The
first component, called Simple Type Enforcement (STE) policy, controls
the sharing between running domains, i.e., communication or access to
shared resources. The second component, called Chinese Wall (CHWALL)
policy, controls which domains can run simultaneously on the same
virtualized platform. The CHWALL and STE policy components complement
each other. The XML policy file includes all information
needed by Xen to enforce those policies.

Figures~\ref{fig:acmxmlfilea} and \ref{fig:acmxmlfileb} show the fully
functional but very simple example Xen security policy that is created
by ezPolicy as shown in Figure~\ref{fig:acmezpolicy}. The policy can
distinguish the 7 workload types shown in lines 11-17 in
Fig.~\ref{fig:acmxmlfilea}. The whole XML Security Policy consists of
four parts:
\begin{enumerate}
\item Policy header including the policy name
\item Simple Type Enforcement block
\item Chinese Wall Policy block
\item Label definition block

\end{enumerate}

2929

2930

\begin{figure}

2931

\begin{scriptsize}

2932

\begin{verbatim}

2933

01 <?xml version="1.0" ?>

2934

02

2935

03 <SecurityPolicyDefinition ...">

2936

04 <PolicyHeader>

2937

05 <PolicyName>mytest</PolicyName>

2938

06 <Date>Mon Nov 19 22:51:56 2007</Date>

2939

07 <Version>1.0</Version>

2940

08 </PolicyHeader>

2941

09 <SimpleTypeEnforcement>

2942

10 <SimpleTypeEnforcementTypes>

2943

11 <Type>SystemManagement</Type>

2944

12 <Type>__UNLABELED__</Type>

2945

13 <Type>A-Bank</Type>

2946

14 <Type>A-Bank.SecurityUnderwriting</Type>

2947

15 <Type>A-Bank.MarketAnalysis</Type>

2948

16 <Type>B-Bank</Type>

2949

17 <Type>AutoCorp</Type>

2950

18 </SimpleTypeEnforcementTypes>

2951

19 </SimpleTypeEnforcement>

2952

20 <ChineseWall priority="PrimaryPolicyComponent">

2953

21 <ChineseWallTypes>

2954

22 <Type>SystemManagement</Type>

2955

23 <Type>__UNLABELED__</Type>

2956

24 <Type>A-Bank</Type>

2957

25 <Type>A-Bank.SecurityUnderwriting</Type>

2958

26 <Type>A-Bank.MarketAnalysis</Type>

2959

27 <Type>B-Bank</Type>

2960

28 <Type>AutoCorp</Type>

2961

29 </ChineseWallTypes>

2962

30 <ConflictSets>

2963

31 <Conflict name="RER">

2964

32 <Type>A-Bank</Type>

2965

33 <Type>B-Bank</Type>

2966

34 <Type>__UNLABELED__</Type>

2967

35 </Conflict>

2968

36 <Conflict name="RER">

2969

37 <Type>A-Bank.MarketAnalysis</Type>

2970

38 <Type>A-Bank.SecurityUnderwriting</Type>

2971

39 <Type>__UNLABELED__</Type>

2972

40 </Conflict>

2973

41 </ConflictSets>

2974

42 </ChineseWall>

2975

\end{verbatim}

2976

\end{scriptsize}

2977

\caption{Example XML security policy file -- Part I: Types and Rules Definition.}

2978

\label{fig:acmxmlfilea}

2979

\end{figure}

2980

2981

\subsection{Policy Header and Policy Name}

2982

\label{subsection:acmnaming}

2983

Lines 1-2 (cf.~Figure~\ref{fig:acmxmlfilea}) include the usual XML
header. The security policy definition starts in line 3 and refers to
the policy schema. The XML Schema definition for the Xen policy can be
found in the file
\textit{/etc/xen/acm-security/policies/security-policy.xsd}. Example
security policies can be found in the example subdirectory. The
acm-security directory is only installed if ACM security is configured
during installation (cf.~Section~\ref{subsection:acmexampleconfigure}).

2991

2992

The \verb|Policy Header| spans lines 4-8. It includes a date field and

2993

defines the policy name \verb|mytest| as well

2994

as the version of the XML. It can also include optional fields that are

2995

not shown and are for future use (see schema definition).

2996

2997

The policy name serves two purposes. First, it provides a unique name
for the security policy. This name is also exported by the Xen
hypervisor to the Xen management tools in order to ensure that both
the Xen hypervisor and Domain-0 enforce the same policy.
We plan to extend the policy name with a
digital fingerprint of the policy contents to better protect this
correlation. Second, it implicitly points the xm tools to the
location where the XML policy file is stored on the Xen system.
Replacing the dots in the policy name with slashes yields the local
path to the policy file, starting from the global policy directory
\verb|/etc/xen/acm-security/policies|. The last part of the policy
name is the prefix for the XML policy file name, completed by
\verb|-security_policy.xml|. Our example policy with the name
\verb|mytest| can be found in the XML policy file named
\verb|mytest-security_policy.xml|, which is stored under the global
policy directory. Another, preinstalled example policy named
\verb|example.test| can be found in the file \verb|test-security_policy.xml|
under \verb|/etc/xen/acm-security/policies/example|.
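
The mapping from policy name to policy file can be illustrated with a
short Python sketch. It is not taken from the xm tools; the function name
\verb|policy_file_path| is hypothetical, and the sketch assumes, as the
\verb|example.test| example suggests, that the components of a policy name
are separated by dots.

\begin{scriptsize}
\begin{verbatim}
import posixpath

# Global policy directory as named in the text.
POLICY_DIR = "/etc/xen/acm-security/policies"

def policy_file_path(policy_name, policy_dir=POLICY_DIR):
    """Map a policy name to the expected location of its XML file."""
    # Assumption: name components are separated by dots ("example.test").
    parts = policy_name.split(".")
    file_name = parts[-1] + "-security_policy.xml"
    return posixpath.join(policy_dir, *parts[:-1], file_name)

print(policy_file_path("mytest"))
print(policy_file_path("example.test"))
\end{verbatim}
\end{scriptsize}

Under these assumptions the two calls yield the file locations stated
above for the \verb|mytest| and \verb|example.test| policies.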

3015

3016

\subsection{Simple Type Enforcement Policy Component}

3017

3018

The Simple Type Enforcement (STE) policy controls which domains can
communicate or share resources. This way, Xen can enforce confinement
of workload types by confining the domains running those workload
types and their resources. The mandatory access control framework
enforces its policy when
domains use the intended means of communication or cooperation (shared
memory, events, shared resources such as block devices). It builds on
top of the core hypervisor isolation, which restricts
inter-domain communication to those intended means. STE does not protect,
and is not intended to protect, against covert channels in the hypervisor or hardware;
this is an orthogonal problem that can be mitigated by using the
Run-time Exclusion rules described above or by fixing the problem leading
to those covert channels in the core hypervisor or hardware platform.

3031

3032

Xen controls sharing between domains on the resource and domain level

3033

because this is the abstraction the hypervisor and its management

3034

understand naturally. While this is coarse-grained, it is also very

3035

reliable and robust and it requires minimal changes to implement

3036

mandatory access controls in the hypervisor. It enables platform- and

3037

operating system-independent policies as part of a layered security

3038

approach.

3039

3040

Lines 11-17 (cf.~Figure~\ref{fig:acmxmlfilea}) define the Simple Type
Enforcement policy component. Essentially, they define the workload
type names (\verb|SystemManagement|, \verb|A-Bank|,
\verb|AutoCorp|, etc.) that are available in the STE policy component. The
policy rules are implicit: Xen permits two domains to communicate with
each other if and only if their security labels have at least one STE type in
common. Similarly, Xen permits a user domain to access a
resource if and only if the labels of the domain and the resource
have at least one STE workload type in common.
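
The implicit STE rule amounts to a set intersection test. The following
Python sketch only illustrates that rule and is not the hypervisor's
implementation; the function name \verb|ste_permits| and the representation
of a label as a set of type names are assumptions made for this example.

\begin{scriptsize}
\begin{verbatim}
def ste_permits(types_a, types_b):
    """Return True if two labels share at least one STE type."""
    return not set(types_a).isdisjoint(types_b)

# STE types of some labels, written as plain sets of type names.
a_bank = {"A-Bank"}
b_bank = {"B-Bank"}
a_bank_ma = {"A-Bank.MarketAnalysis"}
multi = {"A-Bank", "A-Bank.MarketAnalysis"}   # label with two STE types

print(ste_permits(a_bank, b_bank))     # no common type
print(ste_permits(multi, a_bank_ma))   # common type A-Bank.MarketAnalysis
\end{verbatim}
\end{scriptsize}

The same test applies whether the second label belongs to a domain or to
a resource.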

3049

3050

\subsection{Chinese Wall Policy Component}

3051

3052

The Chinese Wall security policy interpretation of sHype enables users
to prevent certain workloads from running simultaneously on the same
hypervisor platform. Run-time Exclusion rules (RER), also called
Conflict Sets or Anti-Collocation rules, define a set of workload types
that are not permitted to run simultaneously on the same virtualized
platform. Of all the workload types specified in a Run-time
Exclusion rule, at most one type can run on the same hypervisor
platform at a time. Run-time Exclusion rules implement a less
rigorous variant of the original Chinese Wall security component. They
do not implement the *-property of the policy, which would also require
restricting types that are not part of an exclusion rule once they
are running together with a type in an exclusion rule
(http://www.gammassl.co.uk/topics/chinesewall.html provides more information
on the original Chinese Wall policy).

3066

3067

Xen considers the \verb|ChineseWallTypes| part of the label for the

3068

enforcement of the Run-time Exclusion rules. It is illegal to define

3069

labels including conflicting Chinese Wall types.

3070

3071

Lines 20-41 (cf.~Figure~\ref{fig:acmxmlfilea}) define the Chinese Wall
policy component. Lines 22-28 define the known Chinese Wall types,
which coincide here with the STE types defined above. This usually
holds if the criteria for sharing among domains and sharing of the
hardware platform are the same. Lines 30-41 define two Run-time
Exclusion rules, the first of which is depicted below:

3077

3078

\begin{scriptsize}

3079

\begin{verbatim}

3080

31 <Conflict name="RER">

3081

32 <Type>A-Bank</Type>

3082

33 <Type>B-Bank</Type>

3083

34 <Type>__UNLABELED__</Type>

3084

35 </Conflict>

3085

\end{verbatim}

3086

\end{scriptsize}

3087

3088

Based on this rule, Xen enforces that only one of the types
\verb|A-Bank|, \verb|B-Bank|, or \verb|__UNLABELED__| runs
on a single hypervisor platform at a time. For example, once a domain assigned the
\verb|A-Bank| workload type is started, domains with the
\verb|B-Bank| type or unlabeled domains are not permitted to start.
When the former domain stops and no other domains with the \verb|A-Bank|
type are running, then domains with the \verb|B-Bank| type or unlabeled domains
can start.

3096

3097

Xen maintains reference counts on each running workload type to keep
track of which workload types are running. Every time a domain starts
or resumes, the reference counts of those Chinese Wall types that are
referenced in the domain's label are incremented. Every time a domain
is destroyed or saved, the reference counts of its Chinese Wall types
are decremented. sHype in Xen fully supports migration and live migration,
which are subject to access control in the same way as saving a domain on
the source platform and resuming it on the destination platform.
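
The interplay of conflict sets and reference counts can be sketched in
Python as follows. This is a simplified model written for illustration,
not the sHype code in the hypervisor; the class \verb|ChineseWallSketch|
and its method names are hypothetical.

\begin{scriptsize}
\begin{verbatim}
from collections import Counter

class ChineseWallSketch:
    """Illustrative model of Run-time Exclusion rule enforcement."""

    def __init__(self, conflict_sets):
        self.conflict_sets = [frozenset(s) for s in conflict_sets]
        self.running = Counter()  # reference count per running CHWALL type

    def may_start(self, chwall_types):
        types = set(chwall_types)
        for conflict in self.conflict_sets:
            if not types & conflict:
                continue  # this rule does not concern the domain
            # A different type of the same conflict set must not be running.
            if any(self.running[t] > 0 for t in conflict - types):
                return False
        return True

    def start(self, chwall_types):
        """Domain starts or resumes: increment counts if permitted."""
        if not self.may_start(chwall_types):
            return False
        self.running.update(chwall_types)
        return True

    def stop(self, chwall_types):
        """Domain is destroyed or saved: decrement counts."""
        self.running.subtract(chwall_types)

wall = ChineseWallSketch([{"A-Bank", "B-Bank", "__UNLABELED__"}])
print(wall.start({"A-Bank"}))   # True: nothing conflicting is running
print(wall.start({"B-Bank"}))   # False: A-Bank is running
wall.stop({"A-Bank"})
print(wall.start({"B-Bank"}))   # True: A-Bank count dropped to zero
\end{verbatim}
\end{scriptsize}

In this model, a migration corresponds to \verb|stop| on the source
platform followed by \verb|start| on the destination platform.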

3105

3106

Here are some reasons why users might want to restrict workloads or domains

3107

from sharing the system hardware simultaneously:

3108

3109

\begin{itemize}

3110

\item Imperfect resource management or control might enable a compromised

3111

user domain to starve other domains and the workload running in them.

3112

\item Redundant user domains might run the same workload to increase

3113

availability; such domains should not run on the same hardware to

3114

avoid single points of failure.

3115

\item Imperfect Xen core domain isolation might enable two rogue

3116

domains running different workload types to use unintended and

3117

unknown ways (covert channels) to exchange some bits of information.

3118

This way, they bypass the policed Xen access control mechanisms. Such

3119

imperfections cannot be completely eliminated and are a result of

3120

trade-offs between security and other design requirements. For a

3121

simple example of a covert channel see

3122

http://www.multicians.org/timing-chn.html. Such covert channels

3123

exist also between workloads running on different platforms if they

3124

are connected through networks. The Xen Chinese Wall policy provides

3125

an approximated ``air-gap'' between selected workload types.

3126

\end{itemize}

3127

3128

\subsection{Security Labels}

3129

3130

To enable Xen to associate domains with workload types running in

3131

them, each domain is assigned a security label that includes the

3132

workload types of the domain.

3133

3134

\begin{figure}[htb]

3135

\begin{tabular*}{\textwidth}{@{\extracolsep{\fill}}l|l}

3136

\begin{minipage}{0.475\textwidth}

3137

\begin{tiny}

3138

\begin{verbatim}

3139

3140

3141

3142

<Name>SystemManagement</Name>

3143

3144

<Type>SystemManagement</Type>

3145

<Type>__UNLABELED__</Type>

3146

3147

<Type>A-Bank.SecurityUnderwriting</Type>

3148

<Type>A-Bank.MarketAnalysis</Type>

3149

3150

<Type>AutoCorp</Type>

3151

</SimpleTypeEnforcementTypes>

3152

3153

<Type>SystemManagement</Type>

3154

</ChineseWallTypes>

3155

</VirtualMachineLabel>

3156

3157

<Name>__UNLABELED__</Name>

3158

3159

<Type>__UNLABELED__</Type>

3160

</SimpleTypeEnforcementTypes>

3161

3162

<Type>__UNLABELED__</Type>

3163

</ChineseWallTypes>

3164

</VirtualMachineLabel>

3165

3166

3167

3168

3169

</SimpleTypeEnforcementTypes>

3170

3171

3172

</ChineseWallTypes>

3173

</VirtualMachineLabel>

3174

3175

<Name>A-Bank.SecurityUnderwriting</Name>

3176

3177

<Type>A-Bank.SecurityUnderwriting</Type>

3178

</SimpleTypeEnforcementTypes>

3179

3180

3181

<Type>A-Bank.SecurityUnderwriting</Type>

3182

</ChineseWallTypes>

3183

</VirtualMachineLabel>

3184

3185

<Name>A-Bank.MarketAnalysis</Name>

3186

3187

<Type>A-Bank.MarketAnalysis</Type>

3188

</SimpleTypeEnforcementTypes>

3189

3190

3191

<Type>A-Bank.MarketAnalysis</Type>

3192

</ChineseWallTypes>

3193

</VirtualMachineLabel>

3194

3195

3196

3197

3198

</SimpleTypeEnforcementTypes>

3199

3200

3201

</ChineseWallTypes>

3202

</VirtualMachineLabel>

3203

\end{verbatim}

3204

\end{tiny}

3205

\end{minipage} &

3206

\begin{minipage}{0.475\textwidth}

3207

\begin{tiny}

3208

\begin{verbatim}

3209

3210

<Name>AutoCorp</Name>

3211

3212

<Type>AutoCorp</Type>

3213

</SimpleTypeEnforcementTypes>

3214

3215

<Type>AutoCorp</Type>

3216

</ChineseWallTypes>

3217

</VirtualMachineLabel>

3218

</SubjectLabels>

3219

3220

3221

<Name>SystemManagement</Name>

3222

3223

<Type>SystemManagement</Type>

3224

</SimpleTypeEnforcementTypes>

3225

</ResourceLabel>

3226

3227

<Name>__UNLABELED__</Name>

3228

3229

<Type>__UNLABELED__</Type>

3230

</SimpleTypeEnforcementTypes>

3231

</ResourceLabel>

3232

3233

3234

3235

3236

</SimpleTypeEnforcementTypes>

3237

</ResourceLabel>

3238

3239

<Name>A-Bank.SecurityUnderwriting</Name>

3240

3241

<Type>A-Bank.SecurityUnderwriting</Type>

3242

</SimpleTypeEnforcementTypes>

3243

</ResourceLabel>

3244

3245

<Name>A-Bank.MarketAnalysis</Name>

3246

3247

<Type>A-Bank.MarketAnalysis</Type>

3248

</SimpleTypeEnforcementTypes>

3249

</ResourceLabel>

3250

3251

3252

3253

3254

</SimpleTypeEnforcementTypes>

3255

</ResourceLabel>

3256

3257

<Name>AutoCorp</Name>

3258

3259

<Type>AutoCorp</Type>

3260

</SimpleTypeEnforcementTypes>

3261

</ResourceLabel>

3262

</ObjectLabels>

3263

</SecurityLabelTemplate>

3264

</SecurityPolicyDefinition>

3265

3266

3267

3268

3269

3270

3271

3272

3273

\end{verbatim}

3274

\end{tiny}

3275

\end{minipage}

3276

\end{tabular*}

3277

\caption{Example XML security policy file -- Part II: Label Definition.}

3278

\label{fig:acmxmlfileb}

3279

\end{figure}

3280

% DO NOT MODIFY WHITESPACE ABOVE, it balances the columns

3281

The \verb|SecurityLabelTemplate| (cf Figure~\ref{fig:acmxmlfileb}) defines

3282

the security labels that can be associated with domains and resources when

3283

this policy is active (use the \verb|xm labels type=any| command described in

3284

Section~\ref{subsection:acmexamplelabeldomains} to list all available labels).

3285

3286

The domain labels include
Chinese Wall types, while resource labels do not.
The \verb|SubjectLabels| policy section defines the labels that can be
assigned to domains. The VM label
\verb|A-Bank.SecurityUnderwriting| (cf.~Figure~\ref{fig:acmxmlfileb})
associates the domain that carries it with the workload STE type
\verb|A-Bank.SecurityUnderwriting| and with the CHWALL types \verb|A-Bank|
and \verb|A-Bank.SecurityUnderwriting|. The ezPolicy tool
assumes that any department workload will inherit any conflict set that
is specified for its organization, i.e., if \verb|B-Bank| is running, not
only \verb|A-Bank| but also all its departmental workloads are prevented
from running by this first run-time exclusion set. The separation of STE
and CHWALL types in the label definition ensures that
all departmental workloads are isolated from each other and from their generic
organization workloads, while they share CHWALL types to
simplify the formulation of run-time exclusion sets.

3302

3303

The \verb|bootstrap| attribute of the \verb|<SubjectLabels>| XML node

3304

in our example policy shown in Figure~\ref{fig:acmxmlfileb} names

3305

the label \verb|SystemManagement| as the label that Xen will assign

3306

to Domain-0 at boot time (if this policy is installed as boot policy). The

3307

label of Domain-0 can be persistently changed at run-time with the

3308

\verb|addlabel| command, which adds an overriding option to the grub.conf

3309

boot entry (cf Section~\ref{subsection:acmlabeldom0}).

3310

All user domains are assigned labels according to their domain configuration

3311

(see Section~\ref{subsection:acmexamplelabeldomains} for examples of

3312

how to label domains).

3313

3314

The \verb|ObjectLabels| depicted in Figure~\ref{fig:acmxmlfileb} can be

3315

assigned to resources when this policy is active.

3316

3317

In general, user domains should be assigned labels that have only a
single SimpleTypeEnforcement workload type. This way, workloads remain
confined even if user domains become rogue. Any domain that is
assigned a label with multiple STE types must be trusted to keep
information belonging to the different STE types separate (confined).
For example, Domain-0 is assigned the bootstrap label
\verb|SystemManagement|, which includes all existing STE types.
Therefore, Domain-0 must take care not to enable unauthorized
information flow (e.g., through block devices or virtual networking)
between domains or resources that are assigned different STE types.

3327

3328

Security administrators simply use the name of a label (specified in
the \verb|<Name>| field) to associate a label with a domain (cf.
Section~\ref{subsection:acmexamplelabeldomains}). The types inside the
label are used by the Xen access control enforcement. While the name
can be arbitrarily chosen (as long as it is unique), it is advisable
to choose the label name in accordance with the security types included.
Similarly, the STE and CHWALL types should be named according to the
workloads they represent. While the XML representation of the label
in the above example seems unnecessarily flexible, labels in general
must be able to include multiple types.

3338

3339

In the following example, we assume that \verb|A-Bank.SecurityUnderwriting| and
\verb|A-Bank.MarketAnalysis| workloads use virtual disks that are provided
by a virtual I/O domain hosting a physical storage device and carrying
the following label:

3343

3344

\begin{scriptsize}

3345

\begin{verbatim}

<VirtualMachineLabel>
  <Name>VIOServer</Name>
  <SimpleTypeEnforcementTypes>
    <Type>A-Bank</Type>
    <Type>A-Bank.SecurityUnderwriting</Type>
    <Type>A-Bank.MarketAnalysis</Type>
    <Type>VIOServer</Type>
  </SimpleTypeEnforcementTypes>
  <ChineseWallTypes>
    <Type>VIOServer</Type>
  </ChineseWallTypes>
</VirtualMachineLabel>

\end{verbatim}

3359

\end{scriptsize}

3360

3361

This Virtual I/O domain (VIO) exports its virtualized disks by
communicating with all domains labeled with the
\verb|A-Bank.SecurityUnderwriting|, the \verb|A-Bank|, or the
\verb|A-Bank.MarketAnalysis| label. This requires the
VIO domain to carry those STE types. In addition, this label includes a
new \verb|VIOServer| type that can be used to restrict direct access to the
physical storage resource to the VIO domain.

3368

3369

In this example, the confinement of these A-Bank workloads depends on the
VIO domain, which must keep the data of those different workloads separate.
The virtual disks are labeled as well to keep track of their assignments
to workload types (see Section~\ref{subsection:acmexamplelabelresources}
for labeling resources). Enforcement functions inside the VIO
domain must ensure that the label of the domain mounting a virtual
disk and the label of the virtual disk share a common STE type. The VIO label
carries its own \verb|VIOServer| CHWALL type, which introduces the flexibility to
permit the trusted VIO server to run together with \verb|A-Bank.SecurityUnderwriting|
or \verb|A-Bank.MarketAnalysis| workloads.

3379

3380

Alternatively, a system that has two hard-drives does not need a VIO

3381

domain but can directly assign one hardware storage device to each of

3382

the workloads if the platform offers an IO-MMU, cf

3383

Section~\ref{s:ddsecurity}. Sharing hardware through virtualized devices

3384

is a trade-off between the amount of trusted code (size of the trusted

3385

computing base) and the amount of acceptable over-provisioning. This

3386

holds both for peripherals and for system platforms.

3387

3388

3389

\subsection{Managing sHype/Xen Security Policies at Run-time}

3390

\label{subsection:acmpolicymanagement}

3391

3392

\subsubsection{Removing the sHype/Xen Security Policy}

3393

When resetting the policy, no labeled domains can be running.
Please stop or shut down all running labeled domains. Then you can reset
the policy to the default policy using the \verb|resetpolicy| command:

3396

3397

\begin{scriptsize}

3398

\begin{verbatim}

# xm getpolicy
Supported security subsystems : ACM
Policy name : mytest
Policy type : ACM
Version of XML policy : 1.0
Policy configuration : loaded, activated for boot

# xm resetpolicy
Successfully reset the system's policy.

# xm getpolicy
Supported security subsystems : ACM
Policy name : DEFAULT
Policy type : ACM
Version of XML policy : 1.0
Policy configuration : loaded

# xm resources
file:/home/xen/dom_fc5/fedora.fc5.swap
    type: INV_ACM
    policy: mytest
    label: A-Bank
file:/home/xen/dom_fc5/fedora.fc5.img
    type: INV_ACM
    policy: mytest
    label: A-Bank

\end{verbatim}

3426

\end{scriptsize}

3427

3428

As the \verb|xm resources| output shows, all resource labels have

3429

invalidated type information but their semantics remain associated

3430

with the resources so that they can later on either be relabeled

3431

with semantically equivalent labels or sanitized and reused

3432

(storage resources).

3433

3434

At this point, the system is in the same initial state as after

3435

configuring XSM and sHype/ACM and rebooting the system without

3436

a specific policy. No user domains can run.

3437

3438

\subsubsection{Changing to a Different sHype/Xen Security Policy}

3439

The easiest way to change to a different, unrelated policy is to reset the system

3440

policy and then set the new policy. Please consider that the existing

3441

domain and resource labels become invalid at this point. Please refer

3442

to the next section for an example of how to seamlessly update an

3443

active policy at run-time without invalidating labels.

3444

3445

\begin{scriptsize}

3446

\begin{verbatim}

# xm resetpolicy
Successfully reset the system's policy.

# xm setpolicy ACM example.test
Successfully set the new policy.
Supported security subsystems : ACM
Policy name : example.test
Policy type : ACM
Version of XML policy : 1.0
Policy configuration : loaded, activated for boot

# xm labels
CocaCola
PepsiCo
SystemManagement
VIO
# xm list --label
Name         ID   Mem VCPUs      State   Time(s)  Label
Domain-0      0   873     1     r-----      56.3  ACM:example.test:SystemManagement

# xm resetpolicy
Successfully reset the system's policy.

# xm getpolicy
Supported security subsystems : ACM
Policy name : DEFAULT
Policy type : ACM
Version of XML policy : 1.0
Policy configuration : loaded

# xm list --label
Name         ID   Mem VCPUs      State   Time(s)  Label
Domain-0      0   873     1     r-----      57.2  ACM:DEFAULT:SystemManagement

# xm setpolicy ACM mytest
Successfully set the new policy.
Supported security subsystems : ACM
Policy name : mytest
Policy type : ACM
Version of XML policy : 1.0
Policy configuration : loaded, activated for boot

# xm labels
A-Bank
A-Bank.MarketAnalysis
A-Bank.SecurityUnderwriting
AutoCorp
B-Bank
SystemManagement
__UNLABELED__

# xm list --label
Name         ID   Mem VCPUs      State   Time(s)  Label
Domain-0      0   873     1     r-----      58.0  ACM:mytest:SystemManagement

\end{verbatim}

3502

\end{scriptsize}

3503

3504

The described way of changing policies by resetting the existing

3505

policy is useful for testing different policies. For real deployment

3506

environments, a policy update as described in the following section

3507

is more appropriate and can be applied seamlessly at run-time while

3508

user domains are running.

3509

3510

\subsubsection{Update an sHype/Xen Security Policy at Run-time}

3511

3512

Once an ACM security policy is activated (loaded into the Xen
hypervisor), the policy may be updated at run-time without the
need to reboot the system. The XML update-policy contains several
additional fields that are required to safely link the
new policy contents to the old policy and ensure a consistent
transformation of the system security state from the old to the
new policy. These additional fields are required for policies that
update an existing policy at run-time.

3520

3521

The major benefit of policy updates is the ability to add, delete,

3522

or rename workload types, labels, and conflict sets (run-time

3523

exclusion rules) to accommodate changes in the managed virtual

3524

environment without the need to reboot the Xen system. When a

3525

new policy renames labels of the current policy, the labels

3526

attached to resources and domains are automatically updated

3527

during a successful policy update.

3528

3529

We have manually crafted an update policy for the \verb|mytest|

3530

security policy and stored it in the file mytest\_update-security\_policy.xml

3531

in the policies directory. We will discuss this policy in detail before

3532

using it to update a running sHype/Xen system. The following figures contain

3533

the whole contents of the update policy file.

3534

3535

Figure~\ref{fig:acmupdateheader} shows the policy

3536

header of an update-policy and the new \verb|FromPolicy| XML

3537

node. For the policy update to succeed, the policy name and the

3538

policy version fields of the \verb|FromPolicy| XML node must

3539

exactly match those of the currently enforced policy. This

3540

ensures a controlled update path of the policy.

3541

3542

\begin{figure}[htb]

3543

\begin{scriptsize}

3544

\begin{verbatim}

3545

<?xml version="1.0" encoding="UTF-8"?>

3546

3547

<SecurityPolicyDefinition xmlns="http://www.ibm.com"

3548

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

3549

xsi:schemaLocation="http://www.ibm.com ../../security_policy.xsd ">

3550

3551

<PolicyName>mytest</PolicyName>

3552

3553

3554

3555

<PolicyName>mytest</PolicyName>

3556

3557

</FromPolicy>

3558

</PolicyHeader>

3559

\end{verbatim}

3560

\end{scriptsize}

3561

\caption{XML security policy update -- Part I: Updated Policy Header.}

3562

\label{fig:acmupdateheader}

3563

\end{figure}

3564

3565

The version number of the new policy, which is shown in the
node following the \verb|Date| node, must be a logical increment
to the current policy's version. Therefore, at least the minor
number of the policy version must be incremented. This ensures
that a policy update is applied only to exactly the policy for
which this update was created and minimizes unforeseen side effects
of policy updates.
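
The two preconditions for an update (matching \verb|FromPolicy| fields and
an incremented version) can be summarized in a short Python sketch. It is
an illustration only and not the check performed by the Xen tools; the
function names are hypothetical, and the sketch assumes that versions have
the form \verb|major.minor| and that a ``logical increment'' means a
strictly greater version.

\begin{scriptsize}
\begin{verbatim}
def parse_version(text):
    """Split a "major.minor" version string into a tuple of integers."""
    major, minor = text.split(".")
    return int(major), int(minor)

def update_is_acceptable(current_name, current_version,
                         from_name, from_version, new_version):
    """Check an update-policy against the currently enforced policy."""
    # The FromPolicy fields must exactly match the enforced policy.
    matches_current = (from_name == current_name and
                       from_version == current_version)
    # Assumption: the new version must be strictly greater.
    is_increment = (parse_version(new_version) >
                    parse_version(current_version))
    return matches_current and is_increment

print(update_is_acceptable("mytest", "1.0", "mytest", "1.0", "1.1"))
print(update_is_acceptable("mytest", "1.0", "mytest", "1.0", "1.0"))
\end{verbatim}
\end{scriptsize}

The first call models an update from version 1.0 to 1.1 and is accepted;
the second is rejected because the version was not incremented.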

3572

3573

\paragraph{Types and Conflict Sets}
The type names and the assignment of types to labels or conflict
sets (run-time exclusion rules) can
simply be changed consistently throughout the policy. Types,
as opposed to labels, are not directly associated or referenced
outside the policy, so they do not need to carry their history
in a ``From'' field. Figure~\ref{fig:acmupdatetypesnrules} shows the update for the
types and conflict sets. The \verb|__UNLABELED__| type is removed
to disable support for running unlabeled domains. Additionally,
we have renamed the two \verb|A-Bank| department types to the
abbreviated names \verb|A-Bank.SU| and \verb|A-Bank.MA|. You
can also see how those type names are
consistently changed within the conflict set definition.

3586

3587

\begin{figure}[htb]

3588

\begin{scriptsize}

3589

\begin{verbatim}

3590

3591

3592

<Type>SystemManagement</Type>

3593

3594

3595

3596

3597

<Type>AutoCorp</Type>

3598

</SimpleTypeEnforcementTypes>

3599

</SimpleTypeEnforcement>

3600

3601

3602

3603

<Type>SystemManagement</Type>

3604

3605

3606

3607

3608

<Type>AutoCorp</Type>

3609

</ChineseWallTypes>

3610

3611

3612

3613

3614

3615

</Conflict>

3616

3617

3618

3619

</Conflict>

3620

</ConflictSets>

3621

</ChineseWall>

3622

\end{verbatim}

3623

\end{scriptsize}

3624

\caption{XML security policy update -- Part II: Updated Types and Conflict Sets.}

3625

\label{fig:acmupdatetypesnrules}

3626

\end{figure}

3627

3628

In the same way, new types can be introduced and new conflict sets

3629

can be defined by simply adding the types or conflict sets to the

3630

update policy.

3631

3632

\paragraph{Labels} Virtual machine and resource labels of an existing policy can be

3633

deleted through a policy update simply by omitting them in the

3634

update-policy. However, if a currently running virtual machine

3635

or a currently used resource is labeled with a label not stated

3636

in the update-policy, then the policy update is rejected. This

3637

ensures that a policy update leaves the system in a consistent

3638

security state.

3639

3640

A policy update also enables the renaming of virtual machine and
resource labels. Linking the old label name with the new label
name is achieved through the \verb|from| attribute in the
\verb|VirtualMachineLabel| or \verb|ResourceLabel| nodes in the
update-policy. Figure~\ref{fig:acmupdatelabels} shows how subject
and resource labels
are updated from their old name \verb|A-Bank.SecurityUnderwriting|
to their new name \verb|A-Bank.SU| using the \verb|from| attribute.
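
The effect of deleting and renaming labels on labels that are currently in
use can be modeled with the following Python sketch. It is a simplified
illustration, not the update logic of the Xen tools; the function
\verb|check_label_update| and the dictionary representation of the
update-policy labels are assumptions made for this example.

\begin{scriptsize}
\begin{verbatim}
def check_label_update(labels_in_use, new_labels):
    """Map each label in use to its name under the update-policy.

    new_labels maps each label name of the update-policy to the old
    name given in its `from` attribute, or to None if it has none.
    Raises ValueError if a label in use would disappear.
    """
    renamed = {old: new for new, old in new_labels.items()
               if old is not None}
    result = {}
    for label in labels_in_use:
        if label in renamed:
            result[label] = renamed[label]   # renamed via `from`
        elif label in new_labels:
            result[label] = label            # kept unchanged
        else:
            raise ValueError("label in use is missing: " + label)
    return result

update = {
    "SystemManagement": None,
    "A-Bank.SU": "A-Bank.SecurityUnderwriting",
    "A-Bank.MA": "A-Bank.MarketAnalysis",
    "AutoCorp": None,
}
in_use = ["SystemManagement", "A-Bank.SecurityUnderwriting"]
print(check_label_update(in_use, update))
\end{verbatim}
\end{scriptsize}

In this model, a domain or resource that still carries a label absent from
the update-policy, such as \verb|__UNLABELED__| here, causes the update to
be rejected.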

3648

3649

\begin{figure}[htb]

3650

\begin{tabular*}{\textwidth}{@{\extracolsep{\fill}}l|l}

3651

\begin{minipage}{0.475\textwidth}

3652

\begin{tiny}

3653

\begin{verbatim}

3654

3655

3656

3657

<Name>SystemManagement</Name>

3658

3659

<Type>SystemManagement</Type>

3660

3661

3662

3663

3664

<Type>AutoCorp</Type>

3665

</SimpleTypeEnforcementTypes>

3666

3667

<Type>SystemManagement</Type>

3668

</ChineseWallTypes>

3669

</VirtualMachineLabel>

3670

3671

3672

3673

<Type>SystemManagement</Type>

3674

3675

3676

3677

</SimpleTypeEnforcementTypes>

3678

3679

<Type>SystemManagement</Type>

3680

</ChineseWallTypes>

3681

</VirtualMachineLabel>

3682

3683

3684

3685

3686

</SimpleTypeEnforcementTypes>

3687

3688

3689

</ChineseWallTypes>

3690

</VirtualMachineLabel>

3691

3692

3693

A-Bank.SU</Name>

3694

3695

3696

</SimpleTypeEnforcementTypes>

3697

3698

3699

3700

</ChineseWallTypes>

3701

</VirtualMachineLabel>

3702

3703

3704

A-Bank.MA</Name>

3705

3706

3707

</SimpleTypeEnforcementTypes>

3708

3709

3710

3711

</ChineseWallTypes>

3712

</VirtualMachineLabel>

3713

\end{verbatim}

3714

\end{tiny}

3715

\end{minipage} &

3716

\begin{minipage}{0.475\textwidth}

3717

\begin{tiny}

3718

\begin{verbatim}

3719

3720

3721

3722

3723

</SimpleTypeEnforcementTypes>

3724

3725

3726

</ChineseWallTypes>

3727

</VirtualMachineLabel>

3728

3729

<Name>AutoCorp</Name>

3730

3731

<Type>AutoCorp</Type>

3732

</SimpleTypeEnforcementTypes>

3733

3734

<Type>AutoCorp</Type>

3735

</ChineseWallTypes>

3736

</VirtualMachineLabel>

3737

</SubjectLabels>

3738

3739

3740

3741

<Name>SystemManagement</Name>

3742

3743

<Type>SystemManagement</Type>

3744

</SimpleTypeEnforcementTypes>

3745

</ResourceLabel>

3746

3747

3748

3749

3750

</SimpleTypeEnforcementTypes>

3751

</ResourceLabel>

3752

3753

3754

A-Bank.SU</Name>

3755

3756

3757

</SimpleTypeEnforcementTypes>

3758

</ResourceLabel>

3759

3760

3761

A-Bank.MA</Name>

3762

3763

3764

</SimpleTypeEnforcementTypes>

3765

</ResourceLabel>

3766

3767

3768

3769

3770

</SimpleTypeEnforcementTypes>

3771

</ResourceLabel>

3772

3773

<Name>AutoCorp</Name>

3774

3775

<Type>AutoCorp</Type>

3776

</SimpleTypeEnforcementTypes>

3777

</ResourceLabel>

3778

</ObjectLabels>

3779

</SecurityLabelTemplate>

3780

</SecurityPolicyDefinition>

3781

\end{verbatim}

3782

\end{tiny}

3783

\end{minipage}

3784

\end{tabular*}

3785

\caption{XML security policy update -- Part III: Updated Label Definition.}

3786

\label{fig:acmupdatelabels}

3787

\end{figure}

3788

% DO NOT MODIFY WHITESPACE ABOVE, it balances the columns

The updated label definition also includes a new label \verb|A-Bank-WL|
that includes all STE types related to A-Bank. Its CHWALL type
is \verb|SystemManagement|, which indicates that this label is designed
as a Domain-0 label. A Xen system can be restricted to run only A-Bank
related workloads by relabeling Domain-0 with the \verb|A-Bank-WL| label.

We assume that the update-policy shown in
Figures~\ref{fig:acmupdateheader}, \ref{fig:acmupdatetypesnrules},
and \ref{fig:acmupdatelabels}
is stored in the XML file mytest\_update-security\_policy.xml located
in the ACM policy directory. See Section~\ref{subsection:acmnaming}
for information about policy names and locations.

The following \verb|xm setpolicy| command updates the active ACM
security policy at run-time.

\begin{scriptsize}
\begin{verbatim}
# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       2   128     1     -b----       0.6 ACM:mytest:A-Bank
domain4       3   164     1     -b----       0.3 ACM:mytest:A-Bank.SecurityUnderwriting
Domain-0      0   711     1     r-----      71.8 ACM:mytest:SystemManagement

# xm resources
file:/home/xen/dom_fc5/fedora.fc5.swap
      type: ACM
    policy: mytest
     label: A-Bank
file:/home/xen/dom_fc5/fedora.fc5.img
      type: ACM
    policy: mytest
     label: A-Bank

# xm setpolicy ACM mytest_update
Successfully set the new policy.
Supported security subsystems : ACM
Policy name                   : mytest
Policy type                   : ACM
Version of XML policy         : 1.1
Policy configuration          : loaded, activated for boot

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       2   128     1     -b----       0.7 ACM:mytest:A-Bank
domain4       3   164     1     -b----       0.3 ACM:mytest:A-Bank.SU
Domain-0      0   711     1     r-----      72.8 ACM:mytest:SystemManagement

# xm labels
A-Bank
A-Bank-WL
A-Bank.MA
A-Bank.SU
AutoCorp
B-Bank

# xm resources
file:/home/xen/dom_fc5/fedora.fc5.swap
      type: ACM
    policy: mytest
     label: A-Bank
file:/home/xen/dom_fc5/fedora.fc5.img
      type: ACM
    policy: mytest
     label: A-Bank
\end{verbatim}
\end{scriptsize}

After successful completion of this command, \verb|xm list --label|
shows that the labels of running domains changed to their new names.
The \verb|xm labels| output shows that the new labels \verb|A-Bank.SU|
and \verb|A-Bank-WL|
are now available in the policy. The resource labels remain valid after
the successful update, as \verb|xm resources| confirms.

The \verb|setpolicy| command fails if the new policy is inconsistent
with the current one or the policy is inconsistent internally (e.g., types
are renamed in the type definition but not in the label definition part of
the policy). In this case, the old policy remains active.

After relabeling Domain-0 with the new \verb|A-Bank-WL| label, we can no
longer run domains labeled \verb|B-Bank| or \verb|AutoCorp| since their
STE types are not a subset of the STE types of the new Domain-0 label.

\begin{scriptsize}
\begin{verbatim}
# xm addlabel A-Bank-WL mgt Domain-0
Successfully set the label of domain 'Domain-0' to 'A-Bank-WL'.

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       2   128     1     -b----       0.8 ACM:mytest:A-Bank
Domain-0      0   711     1     r-----      74.5 ACM:mytest:A-Bank-WL
domain4       3   164     1     -b----       0.3 ACM:mytest:A-Bank.SU

# xm getlabel dom domain3.xm
policytype=ACM,policy=mytest,label=AutoCorp

# xm create domain3.xm
Using config file "./domain3.xm".
Error: VM is not authorized to run.

# xm addlabel SystemManagement mgt Domain-0
Successfully set the label of domain 'Domain-0' to 'SystemManagement'.

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       2   128     1     -b----       0.8 ACM:mytest:A-Bank
domain4       3   164     1     -b----       0.3 ACM:mytest:A-Bank.SU
Domain-0      0   709     1     r-----      76.4 ACM:mytest:SystemManagement

# xm create domain3.xm
Using config file "./domain3.xm".
Started domain domain3

# xm list --label
Name         ID   Mem VCPUs      State   Time(s) Label
domain1       2   128     1     -b----       0.8 ACM:mytest:A-Bank
domain4       3   164     1     -b----       0.3 ACM:mytest:A-Bank.SU
domain3       4   164     1     -b----       0.3 ACM:mytest:AutoCorp
Domain-0      0   547     1     r-----      77.5 ACM:mytest:SystemManagement
\end{verbatim}
\end{scriptsize}

In the same manner, you can add new labels to support new workloads and
add, delete, or rename workload types (STE and/or CHWALL types) simply
by changing the composition of labels. Another use case is to add new
workload types to the current Domain-0 label to enable them to run.
Conflict sets (run-time exclusion rules) can be simply omitted or added.
The policy and label changes become active at once and new workloads
can be run in protected mode without rebooting the Xen system.

In all these cases, if, under the new policy, any running user domain
would not be allowed to run or would not be allowed to access any of the
resources it currently uses, then the policy update is rejected. In this
case, you can stop the domains that conflict with the new policy and
update the policy afterwards. The old policy remains active until a
policy update succeeds or Xen is rebooted into a new policy.
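
The recovery sequence described above can be sketched as follows. This is
an illustration rather than captured output: \verb|domain3| is a
hypothetical name for the domain that conflicts with the update-policy,
and we assume that it can be stopped with \verb|xm shutdown|. Since a
shutdown may take some time, the second \verb|xm list| is there to
confirm that the domain is gone before the update is retried.

\begin{scriptsize}
\begin{verbatim}
# xm list --label                  # identify the conflicting domain
# xm shutdown domain3              # domain3 is a hypothetical name
# xm list --label                  # confirm that it is no longer running
# xm setpolicy ACM mytest_update   # retry the policy update
\end{verbatim}
\end{scriptsize}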

\subsection{Tools For Creating sHype/Xen Security Policies}
To create a security policy for Xen, you can use one of the following
tools:
\begin{itemize}
\item \verb|ezPolicy| GUI tool -- start writing policies
\item \verb|xensec_gen| tool -- refine policies created with \verb|ezPolicy|
\item text or XML editor
\end{itemize}

We use the \verb|ezPolicy| tool in
Section~\ref{subsection:acmexamplecreate} to quickly create a workload
protection policy. If desired, the resulting XML policy file can be
loaded into the \verb|xensec_gen| tool to refine it. It can also be
directly edited using an XML editor. Any XML policy file is verified
against the security policy schema when it is translated (see
Subsection~\ref{subsection:acmexampleinstall}).

\section{Current Limitations}
\label{section:acmlimitations}

The sHype/ACM configuration for Xen is a work in progress. There is
ongoing work for protecting virtualized resources and planned and
ongoing work for protecting access to remote resources and domains.
The following sections describe limitations of some of the areas into
which access control is being extended.

\subsection{Network Traffic}
Local and remote network traffic is currently not controlled.
Solutions to add sHype/ACM policy enforcement to the virtual network
exist but need to be discussed before they can become part of Xen.
Subjecting external network traffic to the ACM security policy is a work
in progress. Manually setting up filters in domain 0 is required for
now but does not scale well.

\subsection{Resource Access and Usage Control}

Enforcing the security policy across multiple hypervisor systems and
on access to remote shared resources is a work in progress. Extending
access control to new types of resources (e.g., network storage) is
ongoing work.

On a single Xen system, information about the association of resources
and security labels is stored in
\verb|/var/lib/xend/security/policies/resource_labels|. This file relates
a full resource path with a security label. This association is weak
and will break if resources are moved or renamed without adapting the
label file. Improving the protection of label-resource relationships
is ongoing work.

Controlling resource usage and enforcing resource limits in general is
ongoing work in the Xen community.

\subsection{Domain Migration}

Labels on domains are enforced during domain migration and the
destination hypervisor will ensure that the domain label is valid and
the domain is permitted to run (considering the Chinese Wall policy
rules) before it accepts the migration. However, the network between
the source and destination hypervisor as well as both hypervisors must
be trusted. Architectures and prototypes exist that both protect the
network connection and ensure that the hypervisors enforce access
control consistently, but patches are not yet available for the
mainline.

\subsection{Covert Channels}

The sHype access control aims at system-independent security policies.
It builds on top of the core hypervisor isolation. Any covert channels
that exist in the core hypervisor or in the hardware (e.g., shared
processor cache) will be inherited. If those covert channels are not
the result of trade-offs between security and other system properties,
then they are most effectively minimized or eliminated where they are
caused. However, sHype offers some means to mitigate their impact, e.g.,
run-time exclusion rules (cf.\ Section~\ref{subsection:acmexamplecreate})
or limiting the system authorization (cf.\ Section~\ref{subsection:acmlabeldom0}).


\part{Reference}

%% Chapter Build and Boot Options
\chapter{Build and Boot Options}

This chapter describes the build- and boot-time options which may be
used to tailor your Xen system.

\section{Top-level Configuration Options}

Top-level configuration is achieved by editing one of two
files: \path{Config.mk} and \path{Makefile}.

The former allows the overall build target architecture to be
specified. You will typically not need to modify this unless
you are cross-compiling. Additional configuration options are
documented in the \path{Config.mk} file.

The top-level \path{Makefile} is chiefly used to customize the set of
kernels built. Look for the line:
\begin{quote}
\begin{verbatim}
KERNELS ?= linux-2.6-xen0 linux-2.6-xenU
\end{verbatim}
\end{quote}

Allowable options here are any kernels which have a corresponding
build configuration file in the \path{buildconfigs/} directory.
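
Because the assignment uses \verb|?=|, GNU make applies it only when
\verb|KERNELS| is not already defined, so the kernel set can presumably
also be overridden without editing the \path{Makefile}. The following is
a minimal sketch, assuming that a \path{buildconfigs/} entry exists for
the named kernel and that the default build target is the one you want:

\begin{quote}
\begin{verbatim}
# Build only the domain 0 kernel (assumes its build configuration exists)
make KERNELS="linux-2.6-xen0"
\end{verbatim}
\end{quote}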


\section{Xen Build Options}

Xen provides a number of build-time options which should be set as
environment variables or passed on the \verb|make| command line.

\begin{description}
\item[verbose=y] Enable debugging messages when Xen detects an
  unexpected condition. Also enables console output from all domains.
\item[debug=y] Enable debug assertions. Implies {\bf verbose=y}.
  (Primarily useful for tracing bugs in Xen).
\item[debugger=y] Enable the in-Xen debugger. This can be used to
  debug Xen, guest OSes, and applications.
\item[perfc=y] Enable performance counters for significant events
  within Xen. The counts can be reset or displayed on Xen's console
  via console control keys.
\end{description}
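
For example, the two ways of supplying these options might look as
follows. This is a sketch only: no build target is named, so
\verb|make| builds its default target, and you would append whichever
target you normally use.

\begin{quote}
\begin{verbatim}
# Pass the options on the make command line ...
make debug=y perfc=y

# ... or set them as environment variables for a single invocation
debug=y perfc=y make
\end{verbatim}
\end{quote}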


\section{Xen Boot Options}
\label{s:xboot}

These options are used to configure Xen's behaviour at runtime. They
should be appended to Xen's command line, either manually or by
editing \path{grub.conf}.

\begin{description}
\item [ noreboot ] Don't reboot the machine automatically on errors.
  This is useful to catch debug output if you aren't catching console
  messages via the serial line.
\item [ nosmp ] Disable SMP support. This option is implied by
  `ignorebiostables'.
\item [ watchdog ] Enable NMI watchdog which can report certain
  failures.
\item [ noirqbalance ] Disable software IRQ balancing and affinity.
  This can be used on systems such as Dell 1850/2850 that have
  workarounds in hardware for IRQ-routing issues.
\item [ badpage=$<$page number$>$,$<$page number$>$, \ldots ] Specify
  a list of pages not to be allocated for use because they contain bad
  bytes. For example, if your memory tester says that byte 0x12345678
  is bad, you would place `badpage=0x12345' on Xen's command line.
\item [ serial\_tx\_buffer=$<$size$>$ ] Size of serial transmit
  buffers. Default is 16kB.
\item [ com1=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$
  com2=$<$baud$>$,DPS,$<$io\_base$>$,$<$irq$>$ ] \mbox{}\\
  Xen supports up to two 16550-compatible serial ports. For example:
  `com1=9600, 8n1, 0x408, 5' maps COM1 to a 9600-baud port, 8 data
  bits, no parity, 1 stop bit, I/O port base 0x408, IRQ 5. If some
  configuration options are standard (e.g., I/O base and IRQ), then
  only a prefix of the full configuration string need be specified. If
  the baud rate is pre-configured (e.g., by the bootloader) then you
  can specify `auto' in place of a numeric baud rate.

\item [ console=$<$specifier list$>$ ] Specify the destination for Xen
  console I/O. This is a comma-separated list of, for example:
  \begin{description}
  \item[ vga ] Use VGA console (until domain 0 boots, unless {\bf
    vga=...keep } is specified).
  \item[ com1 ] Use serial port com1.
  \item[ com2H ] Use serial port com2. Transmitted chars will have the
    MSB set. Received chars must have MSB set.
  \item[ com2L ] Use serial port com2. Transmitted chars will have the
    MSB cleared. Received chars must have MSB cleared.
  \end{description}
  The latter two examples allow a single port to be shared by two
  subsystems (e.g.\ console and debugger). Sharing is controlled by
  MSB of each transmitted/received character. [NB. Default for this
  option is `com1,vga']

\item [ vga=$<$mode$>$(,keep) ] The mode is one of the following options:
  \begin{description}
  \item[ ask ] Display a vga menu allowing manual selection of video
    mode.
  \item[ current ] Use existing vga mode without modification.
  \item[ text-$<$mode$>$ ] Select text-mode resolution, where mode is
    one of 80x25, 80x28, 80x30, 80x34, 80x43, 80x50, 80x60.
  \item[ gfx-$<$mode$>$ ] Select VESA graphics mode
    $<$width$>$x$<$height$>$x$<$depth$>$ (e.g., `vga=gfx-1024x768x32').
  \item[ mode-$<$mode$>$ ] Specify a mode number as discovered by `vga
    ask'. Note that the numbers are displayed in hex and hence must be
    prefixed by `0x' here (e.g., `vga=mode-0x0335').
  \end{description}
  The mode may optionally be followed by `{\bf,keep}' to cause Xen to keep
  writing to the VGA console after domain 0 starts booting (e.g., `vga=text-80x50,keep').
\item [ no-real-mode ] (x86 only) Do not execute real-mode bootstrap
  code when booting Xen. This option should not be used except for
  debugging. It will effectively disable the {\bf vga} option, which
  relies on real mode to set the video mode.
\item [ edid=no,force ] (x86 only) Either force retrieval of monitor
  EDID information via VESA DDC (edid=force), or disable it (edid=no).
  This option should not normally be required except for debugging
  purposes.
\item [ edd=off,on,skipmbr ] (x86 only) Control retrieval of Enhanced
  Disk Drive (EDD) information from the BIOS during boot.

\item [ console\_to\_ring ] Place guest console output into the
  hypervisor console ring buffer. This is disabled by default.
  When enabled, both hypervisor output and guest console output
  are available from the ring buffer. This can be useful for logging
  and/or remote presentation of console data.
\item [ sync\_console ] Force synchronous console output. This is
  useful if your system fails unexpectedly before it has sent all
  available output to the console. In most cases Xen will
  automatically enter synchronous mode when an exceptional event
  occurs, but this option provides a manual fallback.
\item [ conswitch=$<$switch-char$><$auto-switch-char$>$ ] Specify how
  to switch serial-console input between Xen and DOM0. The required
  sequence is CTRL-$<$switch-char$>$ pressed three times. Specifying
  the backtick character disables switching. The
  $<$auto-switch-char$>$ specifies whether Xen should auto-switch
  input to DOM0 when it boots; if it is `x' then auto-switching is
  disabled. Any other value, or omitting the character, enables
  auto-switching. [NB. Default switch-char is `a'.]
\item [ loglvl=$<$level$>/<$level$>$ ]
  Specify logging level. Messages of the specified severity level (and
  higher) will be printed to the Xen console. Valid levels are `none',
  `error', `warning', `info', `debug', and `all'. The second level
  specifier is optional: it is used to specify message severities
  which are to be rate limited. Default is `loglvl=warning'.
\item [ guest\_loglvl=$<$level$>/<$level$>$ ] As for loglvl, but
  applies to messages relating to guests. Default is
  `guest\_loglvl=none/warning'.
\item [ console\_timestamps ]
  Adds a timestamp prefix to each line of Xen console output.
\item [ nmi=xxx ]
  Specify what to do with an NMI parity or I/O error. \\
  `nmi=fatal':  Xen prints a diagnostic and then hangs. \\
  `nmi=dom0':   Inform DOM0 of the NMI. \\
  `nmi=ignore': Ignore the NMI.

\item [ mem=xxx ] Set the physical RAM address limit. Any RAM
  appearing beyond this physical address in the memory map will be
  ignored. This parameter may be specified with a B, K, M or G suffix,
  representing bytes, kilobytes, megabytes and gigabytes respectively.
  The default unit, if no suffix is specified, is kilobytes.
\item [ dom0\_mem=$<$specifier list$>$ ] Set the amount of memory to
  be allocated to domain 0. This is a comma-separated list containing
  the following optional components:
  \begin{description}
  \item[ min:$<$min\_amt$>$ ] Minimum amount to allocate to domain 0
  \item[ max:$<$max\_amt$>$ ] Maximum amount to allocate to domain 0
  \item[ $<$amt$>$ ] Precise amount to allocate to domain 0
  \end{description}
  Each numeric parameter may be specified with a B, K, M or
  G suffix, representing bytes, kilobytes, megabytes and gigabytes
  respectively; if no suffix is specified, the parameter defaults to
  kilobytes. Negative values are subtracted from total available
  memory. If $<$amt$>$ is not specified, it defaults to all available
  memory less a small amount (clamped to 128MB) for uses such as DMA
  buffers.
\item [ dom0\_vcpus\_pin ] Pins domain 0 VCPUs on their respective
  physical CPUs (default=false).
\item [ tbuf\_size=xxx ] Set the size of the per-cpu trace buffers, in
  pages (default 0).
\item [ sched=xxx ] Select the CPU scheduler Xen should use. The
  current possibilities are `credit' (default), and `sedf'.
\item [ apic\_verbosity=debug,verbose ] Print more detailed
  information about local APIC and IOAPIC configuration.
\item [ lapic ] Force use of local APIC even when left disabled by
  uniprocessor BIOS.
\item [ nolapic ] Ignore local APIC in a uniprocessor system, even if
  enabled by the BIOS.
\item [ apic=bigsmp,default,es7000,summit ] Specify NUMA platform.
  This can usually be probed automatically.
\item [ dma\_bits=xxx ] Specify width of DMA addresses in bits. This
  is used in NUMA systems to prevent this special DMA memory from
  being exhausted in one node when remote nodes have available memory.
\item [ vcpu\_migration\_delay=$<$minimum\_time$>$ ] Set the minimum time
  between VCPU migrations, in microseconds (default 0). This parameter
  avoids aggressive VCPU migration. For comparison, the Linux kernel
  uses 0.5ms by default.
\end{description}
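
As an illustration of how several of these options combine, the following
\path{grub.conf} entry is a minimal sketch. The title, the disk and the
file names are hypothetical and must be adapted to your installation;
only the options after \path{xen.gz} are taken from the list above. The
\verb|badpage| value follows the example given for that option, assuming
4kB pages: the page number is the byte address divided by the page size,
\[
  \left\lfloor \mathtt{0x12345678} / \mathtt{0x1000} \right\rfloor
  = \mathtt{0x12345}.
\]

\begin{scriptsize}
\begin{verbatim}
# Hypothetical boot entry; adapt the title, disk and file names
title Xen (serial and VGA console)
    root (hd0,0)
    kernel /boot/xen.gz com1=9600,8n1 console=com1,vga dom0_mem=512M badpage=0x12345 noreboot
    module /boot/vmlinuz-2.6-xen0 root=/dev/sda1 ro
    module /boot/initrd-2.6-xen0.img
\end{verbatim}
\end{scriptsize}

Here \verb|com1=9600,8n1| relies on the prefix rule described above: the
I/O base and IRQ are omitted on the assumption that the port uses the
standard values.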


In addition, the following options may be specified on the Xen command
line. Since domain 0 shares responsibility for booting the platform,
Xen will automatically propagate these options to the domain 0 command
line. These options are taken from Linux's command-line syntax with
unchanged semantics.

\begin{description}
\item [ acpi=off,force,strict,ht,noirq,\ldots ] Modify how Xen (and
  domain 0) parses the BIOS ACPI tables.
\item [ acpi\_skip\_timer\_override ] Instruct Xen (and domain~0) to
  ignore timer-interrupt override instructions specified by the BIOS
  ACPI tables.
\item [ noapic ] Instruct Xen (and domain~0) to ignore any IOAPICs
  that are present in the system, and instead continue to use the
  legacy PIC.
\end{description}


\section{XenLinux Boot Options}

In addition to the standard Linux kernel boot options, we support:
\begin{description}
\item[ xencons=xxx ] Specify the device node to which the Xen virtual
  console driver is attached. The following options are supported:
  \begin{center}
    \begin{tabular}{l}
      `xencons=off': disable virtual console \\
      `xencons=tty': attach console to /dev/tty1 (tty0 at boot-time) \\
      `xencons=ttyS': attach console to /dev/ttyS0 \\
      `xencons=xvc': attach console to /dev/xvc0
    \end{tabular}
  \end{center}
  The default is ttyS for dom0 and xvc for all other domains.
\end{description}
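
As a sketch of how this option might be supplied to a guest, the kernel
command line of a paravirtualized domain can be extended through the
\verb|extra| setting of its \verb|xm| configuration file. We assume here
that the guest should present its console on \path{/dev/tty1} rather
than on the default device:

\begin{quote}
\begin{verbatim}
# Fragment of a guest configuration file (illustrative value)
extra = "xencons=tty"
\end{verbatim}
\end{quote}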


%% Chapter Further Support
\chapter{Further Support}

If you have questions that are not answered by this manual, the
sources of information listed below may be of interest to you. Note
that bug reports, suggestions and contributions related to the
software (or the documentation) should be sent to the Xen developers'
mailing list (address below).

\section{Other Documentation}

For developers interested in porting operating systems to Xen, the
\emph{Xen Interface Manual} is distributed in the \path{docs/}
directory of the Xen source distribution.

\section{Online References}

The official Xen web site can be found at:
\begin{quote} {\tt http://www.xen.org}
\end{quote}

This contains links to the latest versions of all online
documentation, including the latest version of the FAQ.

Information regarding Xen is also available at the Xen Wiki at
\begin{quote} {\tt http://wiki.xensource.com/xenwiki/}\end{quote}
The Xen project uses Bugzilla as its bug tracking system. You'll find
the Xen Bugzilla at {\tt http://bugzilla.xensource.com/bugzilla/}.


\section{Mailing Lists}

There are several mailing lists that are used to discuss Xen-related
topics. The most widely relevant are listed below. An official page of
mailing lists and subscription information can be found at \begin{quote}
  {\tt http://lists.xensource.com/} \end{quote}

\begin{description}
\item[xen-devel@lists.xensource.com] Used for development
  discussions and bug reports.  Subscribe at: \\
  {\small {\tt http://lists.xensource.com/xen-devel}}
\item[xen-users@lists.xensource.com] Used for installation and usage
  discussions and requests for help.  Subscribe at: \\
  {\small {\tt http://lists.xensource.com/xen-users}}
\item[xen-announce@lists.xensource.com] Used for announcements only.
  Subscribe at: \\
  {\small {\tt http://lists.xensource.com/xen-announce}}
\item[xen-changelog@lists.xensource.com] Changelog feed
  from the unstable and 3.x trees; developer oriented.  Subscribe at: \\
  {\small {\tt http://lists.xensource.com/xen-changelog}}
\end{description}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\appendix

\chapter{Unmodified (HVM) guest domains in Xen with Hardware support for Virtualization}

Xen supports guest domains running unmodified guest operating systems using
virtualization extensions available on recent processors. Currently processors
featuring the Intel Virtualization Extension (Intel-VT) or the AMD extension
(AMD-V) are supported. The technology covering both implementations is
called HVM (for Hardware Virtual Machine) in Xen. More information about the
virtualization extensions is available on the respective websites:

{\small {\tt http://www.intel.com/technology/computing/vptech}}

{\small {\tt http://www.amd.com/us-en/assets/content\_type/white\_papers\_and\_tech\_docs/24593.pdf}}


\section{Building Xen with HVM support}

The following packages need to be installed in order to build Xen with HVM support. Some Linux distributions do not provide these packages by default.

\begin{tabular}{lp{11.0cm}}
{\bfseries Package} & {\bfseries Description} \\
dev86 & The dev86 package provides an assembler and linker for real mode 80x86 instructions. You need to have this package installed in order to build the BIOS code which runs in (virtual) real mode.

If the dev86 package is not available on the x86\_64 distribution, you can install the i386 version of it. The dev86 rpm package for various distributions can be found at {\scriptsize {\tt http://www.rpmfind.net/linux/rpm2html/search.php?query=dev86\&submit=Search}} \\
SDL-devel, SDL & Simple DirectMedia Layer (SDL) is another way of virtualizing the unmodified guest console. It provides an X window for the guest console.

If the SDL and SDL-devel packages are not installed by default on the build system, they can be obtained from {\scriptsize {\tt http://www.rpmfind.net/linux/rpm2html/search.php?query=SDL\&submit=Search}}

{\scriptsize {\tt http://www.rpmfind.net/linux/rpm2html/search.php?query=SDL-devel\&submit=Search}} \\
\end{tabular}


\section{Configuration file for unmodified HVM guests}

The Xen installation includes a sample configuration file, {\small {\tt /etc/xen/xmexample.hvm}}. There are comments describing all the options. In addition to the common options that are the same as those for paravirtualized guest configurations, HVM guest configurations have the following settings:

\begin{tabular}{lp{11.0cm}}
{\bfseries Parameter} & {\bfseries Description} \\
kernel & The HVM firmware loader, {\small {\tt /usr/lib/xen/boot/hvmloader}}\\
builder & The domain build function. The HVM domain uses the 'hvm' builder.\\
acpi & Enable HVM guest ACPI, default=1 (enabled)\\
apic & Enable HVM guest APIC, default=1 (enabled)\\
pae & Enable HVM guest PAE, default=1 (enabled)\\
hap & Enable hardware-assisted paging support, such as AMD-V's nested paging
or Intel\textregistered{} VT's extended paging. If available, Xen will
use hardware-assisted paging instead of shadow paging for this guest's memory
management.\\
vif & Optionally defines MAC address and/or bridge for the network interfaces. Random MACs are assigned if not given. {\small {\tt type=ioemu}} means ioemu is used to virtualize the HVM NIC. If no type is specified, vbd is used, as with paravirtualized guests.\\
disk & Defines the disk devices you want the domain to have access to, and what you want them accessible as. If using a physical device as the HVM guest's disk, each disk entry is of the form

{\small {\tt phy:UNAME,ioemu:DEV,MODE}}

where UNAME is the host device file, DEV is the device name the domain will see, and MODE is r for read-only or w for read-write. The ioemu prefix means that ioemu is used to virtualize the HVM disk. Without it, the disk uses vbd as with paravirtualized guests.

If using a disk image file, each disk entry is of the form

{\small {\tt file:FILEPATH,ioemu:DEV,MODE}}

Optical devices can be emulated by appending cdrom to the device type

{\small {\tt ',hdc:cdrom,r'}}

If using more than one disk, there should be a comma between each disk entry. For example:

{\scriptsize {\tt disk = ['file:/var/images/image1.img,ioemu:hda,w', 'phy:hda1,hdb1,w', 'file:/var/images/install1.iso,hdc:cdrom,r']}}\\
boot & Boot from floppy (a), hard disk (c) or CD-ROM (d). For example, to boot from CD-ROM and fall back to the hard disk, the entry should be:

boot='dc'\\
device\_model & The device emulation tool for HVM guests. This parameter should not be changed.\\
sdl & Enable SDL library for graphics, default = 0 (disabled)\\
vnc & Enable VNC library for graphics, default = 1 (enabled)\\
vncconsole & Enable spawning of the vncviewer (only valid when vnc=1), default = 0 (disabled)

If vnc=1 and vncconsole=0, you can use vncviewer to connect to the HVM guest manually from a remote machine. For example:

{\small {\tt vncviewer domain0\_IP\_address:HVM\_domain\_id}} \\
serial & Enable redirection of HVM serial output to pty device\\
\end{tabular}

\begin{tabular}{lp{10cm}}

usb & Enable USB support without defining a specific USB device.
This option defaults to 0 (disabled) unless the option usbdevice is
specified in which case this option then defaults to 1 (enabled).\\

usbdevice & Enable USB support and also enable support for the given
device. Devices that can be specified are {\small {\tt mouse}} (a PS/2 style
mouse), {\small {\tt tablet}} (an absolute pointing device) and
{\small {\tt host:id1:id2}} (a physical USB device on the host machine whose
ids are {\small {\tt id1}} and {\small {\tt id2}}). The advantage
of {\small {\tt tablet}} is that Windows guests will automatically recognize
and support this device so specifying the config line

{\small
\begin{verbatim}
usbdevice='tablet'
\end{verbatim}
}

will create a mouse that works transparently with Windows guests under VNC.
Linux doesn't recognize the USB tablet yet so Linux guests under VNC will
still need the Summagraphics emulation.
Details about mouse emulation are provided in section \textbf{A.4.3}.\\

localtime & Set the real time clock to local time [default=0, that is, set to UTC].\\

soundhw & Enable sound card support and specify the hardware to emulate. Values can be sb16, es1370 or all. Default is none.\\

full-screen & Start in full screen.\\

nographic & Another way to redirect serial output. If enabled, no 'sdl' or 'vnc' can work. Not recommended.\\

\end{tabular}
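
\medskip
As a rough illustration of how the options in the tables above combine, the following configuration fragment is a sketch only. The image path, the bridge name {\small {\tt xenbr0}} and the chosen values are assumptions made for this example, not defaults taken from this manual.

{\small
\begin{verbatim}
# Illustrative HVM guest settings; adjust every value for your system.
vif        = [ 'type=ioemu, bridge=xenbr0' ]   # bridge name is an assumption
disk       = [ 'file:/var/images/hd.img,ioemu:hda,w', ',hdc:cdrom,r' ]
boot       = 'dc'        # try CD-ROM first, then the hard disk
sdl        = 0           # no SDL window
vnc        = 1           # export the display over VNC
vncconsole = 0           # do not spawn vncviewer automatically
localtime  = 0           # real time clock set to UTC
\end{verbatim}
}

Leaving {\small {\tt device\_model}} at the value found in the example configuration file is the safest choice, since the table above advises against changing it.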

\section{Creating virtual disks from scratch}
\subsection{Using physical disks}
If you are using a physical disk or physical disk partition, you need to install a Linux OS on the disk first. Then the boot loader should be installed in the correct place: for example, {\small {\tt /dev/sda}} for booting from the whole disk, or {\small {\tt /dev/sda1}} for booting from partition 1.

\subsection{Using disk image files}
You need to create a large empty disk image file first; then, you need to install a Linux OS onto it. There are two methods to choose from. One is to install directly using a HVM guest booted from the OS installation CD-ROM. The other is to copy an installed OS into the image. In both cases the boot loader also needs to be installed.

\subsubsection*{To create the image file:}
The image size should be big enough to accommodate the entire OS. This example assumes the size is 1G (which is probably too small for most OSes).

{\small {\tt \# dd if=/dev/zero of=hd.img bs=1M count=0 seek=1024}}
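
Because {\small {\tt count=0}} writes no data and {\small {\tt seek=1024}} skips 1024 blocks of 1M, the command above should produce a sparse file whose apparent size is $1024 \times 1048576 = 1073741824$ bytes. As a quick check (a sketch, assuming the file was created in the current directory), compare the apparent size with the space actually allocated:

{\small
\begin{verbatim}
# ls -l hd.img
# du -k hd.img
\end{verbatim}
}

The first command reports the apparent size; the second reports the disk space in use, which stays small until the guest writes data into the image.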

\subsubsection*{To directly install Linux OS into an image file using a HVM guest:}

Install Xen and create a HVM guest that uses the image file and boots from CD-ROM. The rest is just like a normal Linux OS installation. The HVM configuration file should have a stanza for the CD-ROM as well as a boot device specification:

{\small {\tt disk=['file:/var/images/your-hd.img,hda,w', ',hdc:cdrom,r' ]\\
boot='d'}}

If this method does not succeed, you can choose the following method of copying an installed Linux OS into an image file.

\subsubsection*{To copy an installed OS into an image file:}
Installing directly is the easier way to make partitions and install an OS in a disk image file. But if you want to create a specific OS in your disk image, then you will most likely want to use this method.

\begin{enumerate}
\item {\bfseries Install a normal Linux OS on the host machine}\\
You can choose any way to install Linux, such as using yum to install Red Hat Linux or YAST to install Novell SuSE Linux. The rest of this example assumes the Linux OS is installed in {\small {\tt /var/guestos/}}.

\item {\bfseries Make the partition table}\\
The image file will be treated as a hard disk, so you should make the partition table in the image file. For example:

{\scriptsize {\tt \# losetup /dev/loop0 hd.img\\
\# fdisk -b 512 -C 4096 -H 16 -S 32 /dev/loop0\\
press 'n' to add new partition\\
press 'p' to choose primary partition\\
press '1' to set partition number\\
press "Enter" to choose the default value of the "First Cylinder" parameter.\\
press "Enter" to choose the default value of the "Last Cylinder" parameter.\\
press 'w' to write partition table and exit\\
\# losetup -d /dev/loop0}}

\item {\bfseries Make the file system and install grub}\\
{\scriptsize {\tt \# ln -s /dev/loop0 /dev/loop\\
\# losetup /dev/loop0 hd.img\\
\# losetup -o 16384 /dev/loop1 hd.img\\
\# mkfs.ext3 /dev/loop1\\
\# mount /dev/loop1 /mnt\\
\# mkdir -p /mnt/boot/grub\\
\# cp /boot/grub/stage* /boot/grub/e2fs\_stage1\_5 /mnt/boot/grub\\
\# umount /mnt\\
\# grub\\
grub> device (hd0) /dev/loop\\
grub> root (hd0,0)\\
grub> setup (hd0)\\
grub> quit\\
\# rm /dev/loop\\
\# losetup -d /dev/loop0\\
\# losetup -d /dev/loop1}}

The {\small {\tt losetup}} option {\small {\tt -o 16384}} skips the partition table in the image file. The offset is the number of sectors per track (32, as given to {\small {\tt fdisk}} above) times the sector size of 512 bytes. We need {\small {\tt /dev/loop}} because grub is expecting a disk device \emph{name}, where \emph{name} represents the entire disk and \emph{name1} represents the first partition.

\item {\bfseries Copy the OS files to the image}\\
If you have Xen installed, you can use {\small {\tt lomount}} instead of {\small {\tt losetup}} and {\small {\tt mount}} when copying files to a partition. {\small {\tt lomount}} just needs the partition information.

{\scriptsize {\tt \# lomount -t ext3 -diskimage hd.img -partition 1 /mnt/guest\\
\# cp -ax /var/guestos/\{root,dev,var,etc,usr,bin,sbin,lib\} /mnt/guest\\
\# mkdir /mnt/guest/\{proc,sys,home,tmp\}}}

\item {\bfseries Edit the {\small {\tt /etc/fstab}} of the guest image}\\
The fstab should look like this:

{\scriptsize {\tt \# vim /mnt/guest/etc/fstab\\
/dev/hda1 / ext3 defaults 1 1\\
none /dev/pts devpts gid=5,mode=620 0 0\\
none /dev/shm tmpfs defaults 0 0\\
none /proc proc defaults 0 0\\
none /sys sysfs defaults 0 0}}

\item {\bfseries Unmount the image file}\\
{\small {\tt \# umount /mnt/guest}}
\end{enumerate}
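
The offset and geometry used in the steps above can be checked by hand. This is a worked calculation, assuming 512-byte sectors and the {\small {\tt fdisk}} geometry given in the example (4096 cylinders, 16 heads, 32 sectors per track):
\begin{eqnarray*}
\mbox{partition offset} & = & 32 \times 512 = 16384 \mbox{ bytes} \\
\mbox{image size} & = & 4096 \times 16 \times 32 \times 512 = 1073741824 \mbox{ bytes}
\end{eqnarray*}
The second result matches the 1G image created with {\small {\tt dd}} earlier; an image of a different size would need a correspondingly different cylinder count.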

Now, the guest OS image {\small {\tt hd.img}} is ready. You can also refer to {\small {\tt http://free.oszoo.org}} for quickstart images, but make sure the boot loader is installed.

\section{HVM Guests}
\subsection{Editing the Xen HVM config file}
Make a copy of the example HVM configuration file {\small {\tt /etc/xen/xmexample.hvm}} and edit the line that reads

{\small {\tt disk = [ 'file:/var/images/\emph{min-el3-i386.img},hda,w' ]}}

replacing \emph{min-el3-i386.img} with the name of the guest OS image file you just made.

\subsection{Creating HVM guests}
Simply follow the usual method of creating the guest, providing the filename of your HVM configuration file:

{\small {\tt \# xend start\\
\# xm create /etc/xen/hvmguest.hvm}}

In the default configuration, VNC is on and SDL is off. Therefore VNC windows will open when HVM guests are created. If you want to use SDL for the guest display instead, set {\small {\tt sdl=1}} in your HVM configuration file. You can also turn off VNC by setting {\small {\tt vnc=0}}.
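
For example, a guest that should use an SDL window instead of VNC would carry the following two lines in its configuration file. This is a minimal sketch; it assumes the rest of the file is unchanged from the copied example.

{\small
\begin{verbatim}
sdl = 1    # open an SDL window for the guest display
vnc = 0    # turn off the default VNC display
\end{verbatim}
}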

\subsection{Mouse issues, especially under VNC}
Mouse handling when using VNC is a little problematic.
The problem is that the VNC viewer provides a virtual pointer which is
located at an absolute location in the VNC window and only absolute
coordinates are provided.
The HVM device model converts these absolute mouse coordinates
into the relative motion deltas that are expected by the PS/2
mouse driver running in the guest.
Unfortunately,
it is impossible to keep these generated mouse deltas
accurate enough for the guest cursor to exactly match
the VNC pointer.
This can lead to situations where the guest's cursor
is in the center of the screen and there's no way to
move that cursor to the left
(it can happen that the VNC pointer is at the left
edge of the screen and,
therefore,
there are no longer any left mouse deltas that
can be provided by the device model emulation code).

To deal with these mouse issues there are 4 different
mouse emulations available from the HVM device model:

\begin{description}
\item[PS/2 mouse over the PS/2 port.]
This is the default mouse
that works perfectly well under SDL.
Under VNC the guest cursor will get
out of sync with the VNC pointer.
When this happens you can re-synchronize
the guest cursor to the VNC pointer by
holding down the
\textbf{left-ctl}
and
\textbf{left-alt}
keys together.
While these keys are down VNC pointer motions
will not be reported to the guest so
that the VNC pointer can be moved
to a place where it is possible
to move the guest cursor again.

\item[Summagraphics mouse over the serial port.]
The device model also provides emulation
for a Summagraphics tablet,
an absolute pointer device.
This emulation is provided over the second
serial port,
\textbf{/dev/ttyS1}
for Linux guests and
\textbf{COM2}
for Windows guests.
Unfortunately,
neither Linux nor Windows provides
default support for the Summagraphics
tablet so the guest will have to be
manually configured for this mouse.

\textbf{Linux configuration.}

First,
configure the GPM service to use the Summagraphics tablet.
This can vary between distributions but,
typically,
all that needs to be done is modify the file
\path{/etc/sysconfig/mouse} to contain the lines:

{\small
\begin{verbatim}
MOUSETYPE="summa"
XMOUSETYPE="SUMMA"
DEVICE=/dev/ttyS1
\end{verbatim}
}

and then restart the GPM daemon.

Next,
modify the X11 config
\path{/etc/X11/xorg.conf}
to support the Summagraphics tablet by replacing
the input device stanza with the following:

{\small
\begin{verbatim}
Section "InputDevice"
    Identifier "Mouse0"
    Driver "summa"
    Option "Device" "/dev/ttyS1"
    Option "InputFashion" "Tablet"
    Option "Mode" "Absolute"
    Option "Name" "EasyPen"
    Option "Compatible" "True"
    Option "Protocol" "Auto"
    Option "SendCoreEvents" "on"
    Option "Vendor" "GENIUS"
EndSection
\end{verbatim}
}

Restart X and the X cursor should now properly
track the VNC pointer.


\textbf{Windows configuration.}

Get the file
\path{http://www.cad-plan.de/files/download/tw2k.exe}
and execute that file on the guest,
answering the questions as follows:

\begin{enumerate}
\item When the program asks for \textbf{model},
scroll down and select \textbf{SummaSketch (MM Compatible)}.

\item When the program asks for \textbf{COM Port} specify \textbf{com2}.

\item When the program asks for a \textbf{Cursor Type} specify
\textbf{4 button cursor/puck}.

\item The guest system will then reboot and,
when it comes back up,
the guest cursor will now properly track
the VNC pointer.
\end{enumerate}

\item[PS/2 mouse over USB port.]
This is just the same PS/2 emulation except it is
provided over a USB port.
This emulation is enabled by the configuration flag:
{\small
\begin{verbatim}
usbdevice='mouse'
\end{verbatim}
}

\item[USB tablet over USB port.]
The USB tablet is an absolute pointing device
that has the advantage that it is automatically
supported under Windows guests,
although Linux guests still require some
manual configuration.
This mouse emulation is enabled by the
configuration flag:
{\small
\begin{verbatim}
usbdevice='tablet'
\end{verbatim}
}

\textbf{Linux configuration.}

Unfortunately,
there is no GPM support for the
USB tablet at this point in time.
If you intend to use a GPM pointing
device under VNC you should
configure the guest for Summagraphics
emulation.

Support for X11 is available by following
the instructions at\\
\verb+http://stz-softwaretechnik.com/~ke/touchscreen/evtouch.html+\\
with one minor change.
The
\path{xorg.conf}
given in those instructions
uses the wrong values for the X \& Y minimums and maximums;
use the following config stanza instead:

{\small
\begin{verbatim}
Section "InputDevice"
    Identifier "Tablet"
    Driver "evtouch"
    Option "Device" "/dev/input/event2"
    Option "DeviceName" "touchscreen"
    Option "MinX" "0"
    Option "MinY" "0"
    Option "MaxX" "32256"
    Option "MaxY" "32256"
    Option "ReportingMode" "Raw"
    Option "Emulate3Buttons"
    Option "Emulate3Timeout" "50"
    Option "SendCoreEvents" "On"
EndSection
\end{verbatim}
}

\textbf{Windows configuration.}

Just enabling the USB tablet in the
guest's configuration file is sufficient;
Windows will automatically recognize and
configure device drivers for this
pointing device.

\end{description}
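
The {\small {\tt evtouch}} stanza for Linux guests names the device {\small {\tt /dev/input/event2}}. Which event device the emulated tablet actually receives is an assumption that may differ from guest to guest, so it is worth checking inside the Linux guest before editing \path{xorg.conf}. A minimal sketch:

{\small
\begin{verbatim}
# cat /proc/bus/input/devices
\end{verbatim}
}

Each device is listed as a block of lines; the {\small {\tt H: Handlers=}} line of the tablet's block names the {\small {\tt event}} device to put into the {\small {\tt "Device"}} option.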

\subsection{USB Support}
There is support for an emulated USB mouse,
an emulated USB tablet
and physical low speed USB devices
(support for high speed USB 2.0 devices is
still under development).

\begin{description}
\item[USB PS/2 style mouse.]
Details on the USB mouse emulation are
given in sections
\textbf{A.2}
and
\textbf{A.4.3}.
Enabling USB PS/2 style mouse emulation
is just a matter of adding the line

{\small
\begin{verbatim}
usbdevice='mouse'
\end{verbatim}
}

to the configuration file.
\item[USB tablet.]
Details on the USB tablet emulation are
given in sections
\textbf{A.2}
and
\textbf{A.4.3}.
Enabling USB tablet emulation
is just a matter of adding the line

{\small
\begin{verbatim}
usbdevice='tablet'
\end{verbatim}
}

to the configuration file.

\item[USB physical devices.]
Access to a physical (low speed) USB device
is enabled by adding a line of the form

{\small
\begin{verbatim}
usbdevice='host:vid:pid'
\end{verbatim}
}

into the configuration file.\footnote{
There is an alternate
way of specifying a USB device that
uses the syntax
\textbf{host:bus.addr}
but this syntax suffers from
a major problem that makes
it effectively useless.
The problem is that the
\textbf{addr}
portion of this address
changes every time the USB device
is plugged into the system.
For this reason this addressing
scheme is not recommended and
will not be documented further.
}
\textbf{vid}
and
\textbf{pid}
are the
vendor id and
product id
that uniquely identify
the USB device.
These ids can be identified
in two ways:

\begin{enumerate}
\item Through the control window.
As described in section
\textbf{A.4.6}
the control window
is activated by pressing
\textbf{ctl-alt-2}
in the guest VGA window.
As long as USB support is
enabled in the guest by including
the config file line
{\small
\begin{verbatim}
usb=1
\end{verbatim}
}
then executing the command
{\small
\begin{verbatim}
info usbhost
\end{verbatim}
}
in the control window
will display a list of all
usb devices and their ids.
For example,
this output:
{\small
\begin{verbatim}
Device 1.3, speed 1.5 Mb/s
Class 00: USB device 04b3:310b
\end{verbatim}
}
was created from a USB mouse with
vendor id
\textbf{04b3}
and product id
\textbf{310b}.
This device could be made available
to the HVM guest by including the
config file entry
{\small
\begin{verbatim}
usbdevice='host:04b3:310b'
\end{verbatim}
}

It is also possible to
enable access to a USB
device dynamically through
the control window.
The control window command
{\small
\begin{verbatim}
usb_add host:vid:pid
\end{verbatim}
}
will also allow access to a
USB device with vendor id
\textbf{vid}
and product id
\textbf{pid}.
\item Through the
\path{/proc} file system.
The contents of the pseudo file
\path{/proc/bus/usb/devices}
can also be used to identify
vendor and product ids.
Looking at this file,
the line starting with
\textbf{P:}
has a field
\textbf{Vendor}
giving the vendor id and
another field
\textbf{ProdID}
giving the product id.
The contents of
\path{/proc/bus/usb/devices}
for the example mouse are as
follows:
{\small
\begin{verbatim}
T: Bus=01 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#= 3 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=04b3 ProdID=310b Rev= 1.60
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=(none)
E: Ad=81(I) Atr=03(Int.) MxPS= 4 Ivl=10ms
\end{verbatim}
}
Note that the
\textbf{P:}
line correctly identifies the
vendor id and product id
for this mouse as
\textbf{04b3:310b}.
\end{enumerate}

There is one other issue to
be aware of when accessing a
physical USB device from the guest.
The Dom0 kernel must not have
a device driver loaded for
the device that the guest wishes
to access.
This means that the Dom0
kernel must not have that
device driver compiled into
the kernel or,
if using modules,
that driver module must
not be loaded.
Note that it is only the device
specific USB driver that must
not be loaded;
either the
\textbf{UHCI}
or
\textbf{OHCI}
USB controller driver must
still be loaded.

Going back to the USB mouse
as an example,
if \textbf{lsmod}
gives the output:

{\small
\begin{verbatim}
Module                  Size  Used by
usbmouse                4128  0
usbhid                 28996  0
uhci_hcd               35409  0
\end{verbatim}
}

then the USB mouse is being
used by the Dom0 kernel and is
not available to the guest.
Executing the command
\textbf{rmmod usbhid}\footnote{
It turns out that the
\textbf{usbhid}
driver is the significant
one for the USB mouse;
the presence or absence of
the module
\textbf{usbmouse}
has no effect on whether or
not the guest can see a USB mouse.}
will remove the USB mouse
driver from the Dom0 kernel
and the mouse will now be
accessible by the HVM guest.

Be aware that the Linux USB
hotplug system will reload
the drivers if a USB device
is removed and plugged back
in.
This means that just unloading
the driver module might not
be sufficient if the USB device
is removed and added back.
A more reliable technique is
to first
\textbf{rmmod}
the driver and then rename the
driver file in the
\path{/lib/modules}
directory,
just to make sure it doesn't get
reloaded.
\end{description}
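
The identification and driver checks described above might be carried out from Domain0 roughly as follows. This is a sketch that reuses the example mouse ({\small {\tt 04b3:310b}}) and the {\small {\tt usbhid}} module from the text; substitute the ids and the driver of your own device.

{\small
\begin{verbatim}
# grep '^P:' /proc/bus/usb/devices
# lsmod | grep usbhid
# rmmod usbhid
\end{verbatim}
}

The first command prints the vendor and product id line of every USB device, the second shows whether the device specific driver is still loaded, and the third unloads it. The guest configuration file would then contain

{\small
\begin{verbatim}
usbdevice='host:04b3:310b'
\end{verbatim}
}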

\subsection{Destroy HVM guests}
HVM guests can be destroyed in the same way as paravirtualized guests. To prevent data loss, we recommend that you first shut down the guest using the method provided by the guest OS. For Linux, type the command

{\small {\tt poweroff}}

in the HVM guest's console; for Windows, use Start $\rightarrow$ Shutdown.
Depending on the configuration, the guest will then be destroyed
automatically; otherwise, execute the command

{\small {\tt xm destroy \emph{vmx\_guest\_id} }}

at the Domain0 console.
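
For instance, if the guest is still listed after it has shut down, its domain id can be looked up with {\small {\tt xm list}} and passed to {\small {\tt xm destroy}}. The id 5 below is purely hypothetical.

{\small
\begin{verbatim}
# xm list
# xm destroy 5
\end{verbatim}
}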

\subsection{HVM window (X or VNC) Hot Key}
If you are running in the X environment after creating a HVM guest, an X window is created. There are several hot keys for control of the HVM guest that can be used in the window.

{\bfseries Ctrl+Alt+2} switches from guest VGA window to the control window. Typing {\small {\tt help }} shows the control commands help. For example, 'q' is the command to destroy the HVM guest.\\
{\bfseries Ctrl+Alt+1} switches back to HVM guest's VGA.\\
{\bfseries Ctrl+Alt+3} switches to serial port output. It captures serial output from the HVM guest. It works only if the HVM guest was configured to use the serial port. \\

\chapter{Vnets - Domain Virtual Networking}

Xen optionally supports virtual networking for domains using {\em vnets}.
These emulate private LANs that domains can use. Domains on the same
vnet can be hosted on the same machine or on separate machines, and the
vnets remain connected if domains are migrated. Ethernet traffic
on a vnet is tunneled inside IP packets on the physical network. A vnet is a virtual
network and addressing within it need have no relation to addressing on
the underlying physical network. Separate vnets, or vnets and the physical network,
can be connected using domains with more than one network interface and
enabling IP forwarding or bridging in the usual way.

Vnet support is included in \texttt{xm} and \xend:
\begin{verbatim}
# xm vnet-create <config>
\end{verbatim}
creates a vnet using the configuration in the file \verb|<config>|.
When a vnet is created its configuration is stored by \xend and the vnet persists until it is
deleted using
\begin{verbatim}
# xm vnet-delete <vnetid>
\end{verbatim}
The vnets \xend knows about are listed by
\begin{verbatim}
# xm vnet-list
\end{verbatim}
More vnet management commands are available using the
\texttt{vn} tool included in the vnet distribution.

The format of a vnet configuration file is
\begin{verbatim}
(vnet (id       <vnetid>)
      (bridge   <bridge>)
      (vnetif   <vnet interface>)
      (security <level>))
\end{verbatim}
White space is not significant. The parameters are:
\begin{itemize}
\item \verb|<vnetid>|: vnet id, the 128-bit vnet identifier. This can be given
as 8 4-digit hex numbers separated by colons, or in short form as a single 4-digit hex number.
The short form is the same as the long form with the first 7 fields zero.
Vnet ids must be non-zero and id 1 is reserved.

\item \verb|<bridge>|: the name of a bridge interface to create for the vnet. Domains
are connected to the vnet by connecting their virtual interfaces to the bridge.
Bridge names are limited to 14 characters by the kernel.

\item \verb|<vnetif>|: the name of the virtual interface onto the vnet (optional). The
interface encapsulates and decapsulates vnet traffic for the network and is attached
to the vnet bridge. Interface names are limited to 14 characters by the kernel.

\item \verb|<level>|: security level for the vnet (optional). The level may be one of
\begin{itemize}
\item \verb|none|: no security (default). Vnet traffic is in clear on the network.
\item \verb|auth|: authentication. Vnet traffic is authenticated using IPSEC
ESP with hmac96.
\item \verb|conf|: confidentiality. Vnet traffic is authenticated and encrypted
using IPSEC ESP with hmac96 and AES-128.
\end{itemize}
Authentication and confidentiality are experimental and use hard-wired keys at present.
\end{itemize}
The interfaces and bridges used by vnets
are visible in the output of \texttt{ifconfig} and \texttt{brctl show}.

\section{Example}
If the file \path{vnet97.sxp} contains
\begin{verbatim}
(vnet (id 97) (bridge vnet97) (vnetif vnif97)
      (security none))
\end{verbatim}
then \texttt{xm vnet-create vnet97.sxp} will define a vnet with id 97 and no security.
The bridge for the vnet is called vnet97 and the virtual interface for it is vnif97.
To add an interface on a domain to this vnet set its bridge to vnet97
in its configuration. In Python:
\begin{verbatim}
vif="bridge=vnet97"
\end{verbatim}
In sxp:
\begin{verbatim}
(dev (vif (mac aa:00:00:01:02:03) (bridge vnet97)))
\end{verbatim}
Once the domain is started you should see its interface in the output of \texttt{brctl show}
under the ports for \texttt{vnet97}.
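
The same vnet can also be written with its id in long form. The following is a sketch that assumes the short id 97 is read as the last of the eight 4-digit hex fields, as the description of the short form suggests:
\begin{verbatim}
(vnet (id 0000:0000:0000:0000:0000:0000:0000:0097)
      (bridge vnet97) (vnetif vnif97)
      (security none))
\end{verbatim}
The long form only becomes necessary when any of the first seven fields is non-zero.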

To get best performance it is a good idea to reduce the MTU of a domain's interface
onto a vnet to 1400, for example by using \texttt{ifconfig eth0 mtu 1400} or by putting
\texttt{MTU=1400} in \texttt{ifcfg-eth0}.
You may also have to change or remove cached config files for eth0 under
\texttt{/etc/sysconfig/networking}. Vnets work anyway, but performance can be reduced
by IP fragmentation caused by the vnet encapsulation exceeding the hardware MTU.
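
As an illustration, the persistent setting might look like the fragment below. The location of \texttt{ifcfg-eth0} and the {\tt DEVICE} line are assumptions based on common Red Hat style layouts; only the {\tt MTU} line comes from the text above.
\begin{verbatim}
# ifcfg-eth0 in the guest (location varies between distributions)
DEVICE=eth0
MTU=1400
\end{verbatim}
After restarting the interface, \texttt{ifconfig eth0} in the guest should report the reduced MTU.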

\section{Installing vnet support}
Vnets are implemented using a kernel module, which needs to be loaded before
they can be used. You can either do this manually before starting \xend, using the
command \texttt{vn insmod}, or configure \xend to use the \path{network-vnet}
script in the xend configuration file \texttt{/etc/xen/xend-config.sxp}:
\begin{verbatim}
(network-script network-vnet)
\end{verbatim}
This script insmods the module and calls the \path{network-bridge} script.

The vnet code is not compiled and installed by default.
To compile the code and install on the current system
use \texttt{make install} in the root of the vnet source tree,
\path{tools/vnet}. It is also possible to install to an installation
directory using \texttt{make dist}. See the \path{Makefile} in
the source for details.

The vnet module creates vnet interfaces \texttt{vnif0002},
\texttt{vnif0003} and \texttt{vnif0004} by default. You can test that
vnets are working by configuring IP addresses on these interfaces
and trying to ping them across the network. For example, using machines
hostA and hostB:
\begin{verbatim}
hostA# ifconfig vnif0004 192.0.2.100 up
hostB# ifconfig vnif0004 192.0.2.101 up
hostB# ping 192.0.2.100
\end{verbatim}

The vnet implementation uses IP multicast to discover vnet interfaces, so
all machines hosting vnets must be reachable by multicast. Network switches
are often configured not to forward multicast packets, so this often
means that all machines using a vnet must be on the same LAN segment,
unless you configure vnet forwarding.

You can test multicast coverage by pinging the vnet multicast address:
\begin{verbatim}
# ping -b 224.10.0.1
\end{verbatim}
You should see replies from all machines with the vnet module running.
You can see if vnet packets are being sent or received by dumping traffic
on the vnet UDP port:
\begin{verbatim}
# tcpdump udp port 1798
\end{verbatim}

If multicast is not being forwarded between machines you can configure
multicast forwarding using vn. Suppose we have machines hostA on 192.0.2.200
and hostB on 192.0.2.211 and that multicast is not forwarded between them.
We use vn to configure each machine to forward to the other:
\begin{verbatim}
hostA# vn peer-add hostB
hostB# vn peer-add hostA
\end{verbatim}
Multicast forwarding needs to be used carefully: you must avoid creating forwarding
loops. Typically only one machine on a subnet needs to be configured to forward,
as it will forward multicasts received from other machines on the subnet.

5151

5152

%% Chapter Glossary of Terms moved to glossary.tex

5153

\chapter{Glossary of Terms}

5154

5155

\begin{description}

5156

5157

\item[Domain] A domain is the execution context that contains a

5158

running {\bf virtual machine}. The relationship between virtual

5159

machines and domains on Xen is similar to that between programs and

5160

processes in an operating system: a virtual machine is a persistent

5161

entity that resides on disk (somewhat like a program). When it is

5162

loaded for execution, it runs in a domain. Each domain has a {\bf

5163

domain ID}.

5164

5165

\item[Domain 0] The first domain to be started on a Xen machine.

5166

Domain 0 is responsible for managing the system.

5167

5168

\item[Domain ID] A unique identifier for a {\bf domain}, analogous to

5169

a process ID in an operating system.

5170

5171

\item[Full virtualization] An approach to virtualization which

5172

requires no modifications to the hosted operating system, providing

5173

the illusion of a complete system of real hardware devices.

5174

5175

\item[Hypervisor] An alternative term for {\bf VMM}, used because it

5176

means `beyond supervisor', since it is responsible for managing

5177

multiple `supervisor' kernels.

5178

5179

\item[Live migration] A technique for moving a running virtual machine

5180

to another physical host, without stopping it or the services

5181

running on it.

5182

5183

\item[Paravirtualization] An approach to virtualization which requires

5184

modifications to the operating system in order to run in a virtual

5185

machine. Xen uses paravirtualization but preserves binary

5186

compatibility for user space applications.

5187

5188

\item[Shadow pagetables] A technique for hiding the layout of machine
  memory from a virtual machine's operating system. Used in some {\bf
    VMMs} to provide the illusion of contiguous physical memory; in
  Xen this is used during {\bf live migration}.

\item[Virtual Block Device] Persistent storage available to a virtual
  machine, providing the abstraction of an actual block storage device.
  {\bf VBD}s may be actual block devices, filesystem images, or
  remote/network storage.

\item[Virtual Machine] The environment in which a hosted operating
  system runs, providing the abstraction of a dedicated machine. A
  virtual machine may be identical to the underlying hardware (as in
  {\bf full virtualization}), or it may differ (as in {\bf
    paravirtualization}).

\item[VMM] Virtual Machine Monitor: the software that allows multiple
  virtual machines to be multiplexed on a single physical machine.

\item[Xen] Xen is a paravirtualizing virtual machine monitor,
  developed primarily by the Systems Research Group at the University
  of Cambridge Computer Laboratory.

\item[XenLinux] A name for the port of the Linux kernel that
  runs on Xen.

\end{description}


\end{document}


%% Other stuff without a home

%% Instructions Re Python API

%% Other Control Tasks using Python
%% ================================

%% A Python module 'Xc' is installed as part of the tools-install
%% process. This can be imported, and an 'xc object' instantiated, to
%% provide access to privileged command operations:

%% # import Xc
%% # xc = Xc.new()
%% # dir(xc)
%% # help(xc.domain_create)

%% In this way you can see that the class 'xc' contains useful
%% documentation for you to consult.

%% A further package of useful routines (xenctl) is also installed:

%% # import xenctl.utils
%% # help(xenctl.utils)

%% You can use these modules to write your own custom scripts or you
%% can customise the scripts supplied in the Xen distribution.

% Explain about AGP GART

%% If you're not intending to configure the new domain with an IP
%% address on your LAN, then you'll probably want to use NAT. The
%% 'xen_nat_enable' installs a few useful iptables rules into domain0
%% to enable NAT. [NB: We plan to support RSIP in future]

%% Installing the file systems from the CD
%% =======================================

%% If you haven't got an existing Linux installation onto which you
%% can just drop down the Xen and Xenlinux images, then the file
%% systems on the CD provide a quick way of doing an install. However,
%% you would be better off in the long run doing a proper install of
%% your preferred distro and installing Xen onto that, rather than
%% just doing the hack described below:

%% Choose one or two partitions, depending on whether you want a
%% separate /usr or not. Make file systems on it/them e.g.:
%% mkfs -t ext3 /dev/hda3
%% [or mkfs -t ext2 /dev/hda3 && tune2fs -j /dev/hda3 if using an old
%% version of mkfs]

%% Next, mount the file system(s) e.g.:
%% mkdir /mnt/root && mount /dev/hda3 /mnt/root
%% [mkdir /mnt/usr && mount /dev/hda4 /mnt/usr]

%% To install the root file system, simply untar /usr/XenDemoCD/root.tar.gz:
%% cd /mnt/root && tar -zxpf /usr/XenDemoCD/root.tar.gz

%% You'll need to edit /mnt/root/etc/fstab to reflect your file system
%% configuration. Changing the password file (etc/shadow) is probably a
%% good idea too.

%% To install the usr file system, copy the file system from CD on
%% /usr, though leaving out the "XenDemoCD" and "boot" directories:
%% cd /usr && cp -a X11R6 etc java libexec root src bin dict kerberos
%% local sbin tmp doc include lib man share /mnt/usr

%% If you intend to boot off these file systems (i.e. use them for
%% domain 0), then you probably want to copy the /usr/boot
%% directory on the cd over the top of the current symlink to /boot
%% on your root filesystem (after deleting the current symlink)
%% i.e.:
%% cd /mnt/root ; rm boot ; cp -a /usr/boot .
