~hazmat/pyjuju/security-specification


Viewing changes to docs/source/drafts/security.rst

Committer: kapil.thangavelu at canonical
Date: 2011-06-08 20:04:43 UTC
Revision ID: kapil.thangavelu@canonical.com-20110608200443-bbbgmtalp31s1ef5

additional escalation scenarios, outline next steps broadly

files modified:
docs/source/drafts/security.rst


docs/source/drafts/security.rst

-----------------

Ensemble is committed to providing a reliable secure mechanism for

deploying services. To that end it utilizes shared storage ACLs,

encrypted communications, as well as process isolation to help

reach that goal.

What follows is an overview of these different mechanisms and

how they contribute to keeping an ensemble environment secure.

deploying services. What follows is an overview of these different

mechanisms and how they contribute to keeping an ensemble environment

secure.

Glossary

========

First a glossary of terms used in this document.

Zookeeper ACLs

--------------

Ensemble relies on the security facilities provided by the zookeeper's

coordination storage, whereby zookeeper automatically restricts access

to each node, based on the ACL permission map on each node. This ACL

facility maps permissions to principal identity tokens. Zookeeper

provides permissions for read, write, delete, create, and admin access

to each node. Every zookeeper connection can associate principal credentials

to its connection, and all access by that connection is validated against

the per node ACL mapping.

Principal

---------

+++++++++

A principal in the context of ensemble can represent any actor or

group of actors within the system. Each principal is authenticated via

ACL mapping explicitly giving access to the node.

Token Database

--------------

++++++++++++++

A mapping of principal id to their acl identity token. The identity

token is a md5 checksum of username/password prefixed of the form

world readable but only writable by the security (ticket granting)

agent, which is responsible for creating principals.

Zookeeper ACLs

++++++++++++++

Ensemble relies on the security facilities provided by the zookeeper's

coordination storage, whereby zookeeper automatically restricts access

to each node, based on the ACL permission map on each node. This ACL

facility maps permissions to principal identity tokens. Zookeeper

provides permissions for read, write, delete, create, and admin access

to each node. Every zookeeper connection can associate principal

credentials to its connection, and all access by that connection is

validated against the per node ACL mapping.
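As a rough illustration of the ACL model described above, the following sketch builds an identity token and checks a permission against a per-node ACL map. It assumes the token follows ZooKeeper's built-in digest scheme (the username plus a Base64-encoded SHA-1 of ``username:password``); the token format intended by this draft may differ. The function names and the dict-based ACL map are hypothetical and not part of Ensemble.

```python
import base64
import hashlib

# Permission bits as defined by ZooKeeper's ACL model.
PERM_READ = 1
PERM_WRITE = 2
PERM_CREATE = 4
PERM_DELETE = 8
PERM_ADMIN = 16


def make_identity_token(username, password):
    """Return "username:base64(sha1(username:password))".

    Assumption: this mirrors ZooKeeper's digest scheme; the draft's own
    token format may differ.
    """
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return "%s:%s" % (username, digest)


def is_allowed(acl_map, token, permission):
    """Hypothetical helper: check one permission bit in a node's ACL map."""
    return bool(acl_map.get(token, 0) & permission)


# Example principal name and password are made up for illustration.
unit_token = make_identity_token("unit-wordpress-0", "secret")
node_acl = {unit_token: PERM_READ | PERM_WRITE}
print(is_allowed(node_acl, unit_token, PERM_WRITE))
print(is_allowed(node_acl, unit_token, PERM_DELETE))
```

In a real deployment zookeeper itself performs this check on every request; the sketch only shows the shape of the data involved.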

Security Agent

--------------

++++++++++++++

An additional zk connected actor responsible for creating principals

An additional zookeeper connected actor responsible for creating principals

and providing an up to date token database.

The security agent manages a token database (definition to follow),

and provides for the creation of new principals and handing out their

hash tokens to inquiring parties.

Passing credentials to clients

--------------------------------------

How the system passes credentials is a critical aspect to managing

principal access. Instead of passing a principal's credentials directly

via insecure channels, an actor creating another actor also

establishes a principal creation token via the security agent. The

principal creation token is a one time use string to create a

principal and its password, and update the token database. If a

malicious user intercepts the token, its one time use limits the time

that a third party has to exploit it, compared with passing credentials

directly. Moreover, invalid use of a token can be logged as

forensic information.

Creating the initial principals. During bootstrap there are

Clients interact with the tgs to obtain principal, a principal.

Global read access to token by name.

OTP for principal creation.

The clients are handed an initial token (separate from the auth token)

which will be consumed by the TGS when creating a principal. This is

to allow for

Access for provisioning agents

OTP for principal access.

The token database will need to resolve services to service ids as service names are reusable.

Hook protocols

Privilege Escalation Scenarios

---------------------------------------

We currently have 5 different levels of escalation:

- container escalation (service unit environment)
- machine escalation (virtual machine)
- agent escalation (a malicious zk connected actor)

Beyond that we have escalations which are effectively fatal, as they have access to sensitive data:

- ensemble environment zookeeper on disk data (i.e. bootstrap machine)
- trusted agent escalation (provisioning agent, bootstrap machine agent)

The system is comprised of a number of actors connecting to and
communicating via a shared storage. The shared storage provides ACLs,
and we outline the communication channels between actors as
possible attack vectors.

Security Policy

+++++++++++++++

Each actor employs a security policy to determine the ACL map for a given

node path that it may create. The policy simply takes the path to the node

to be created, and returns an ACL map that can be set on the node.
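A minimal sketch of such a path-based policy follows. The ``SecurityPolicy`` class, the prefix-matching rules, the example paths, and the choice to give the creating actor full access are all assumptions made for illustration; the draft only specifies that a policy takes a path and returns an ACL map.

```python
# Permission bits as defined by ZooKeeper's ACL model.
PERM_READ, PERM_WRITE, PERM_CREATE, PERM_DELETE, PERM_ADMIN = 1, 2, 4, 8, 16
PERM_ALL = PERM_READ | PERM_WRITE | PERM_CREATE | PERM_DELETE | PERM_ADMIN


class SecurityPolicy(object):
    """Hypothetical policy: maps a node path to the ACL map for that node."""

    def __init__(self, owner, rules=()):
        self.owner = owner        # identity token of the actor creating nodes
        self.rules = list(rules)  # [(path_prefix, {identity: permissions})]

    def __call__(self, path):
        # Assumption: the creating actor keeps full access to its own nodes.
        acl = {self.owner: PERM_ALL}
        for prefix, grants in self.rules:
            # Match the prefix itself or any node below it.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                for identity, perms in grants.items():
                    acl[identity] = acl.get(identity, 0) | perms
        return acl


# Identities and paths below are invented for the example.
policy = SecurityPolicy(
    owner="unit-mysql-0:token",
    rules=[("/relations/relation-1", {"unit-wordpress-0:token": PERM_READ})],
)
print(policy("/relations/relation-1/settings/unit-mysql-0"))
print(policy("/machines/machine-3"))
```

Keeping the policy a plain callable means an actor can apply it at every node creation without knowing how the rules were assembled.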

Creating principals for actors

------------------------------

How the system passes credentials to an actor is a critical aspect of

managing principals securely. Every actor in the system needs its own

unique principal to provide an auth identity. The credentials for a

principal are known only to the actor utilizing them and, transiently, to

the security agent when they are created.

Instead of passing a principal's credentials directly via insecure

channels, an actor creating another actor also establishes a principal

creation token via the security agent. The principal creation token is

a one time use string which can be used to create a principal and its

password, and update the token database.

The security agent has a simple policy in place regarding principal

names and which actors can create them, e.g. a provisioning agent can

create machine principals, but not service unit principals.

If a malicious user intercepts the token, its one time use limits the

time that a third party has to exploit it, compared with passing

credentials directly. Moreover, invalid use of a token can be logged as

forensic information.
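The flow described in this section can be sketched roughly as below. Everything here is hypothetical: the ``SecurityAgent`` class, its method names, the ``CREATION_POLICY`` table (in particular, which actor may create unit principals is not stated in this draft), and the digest format stored in the token database, which is assumed to follow ZooKeeper's digest scheme.

```python
import base64
import hashlib
import logging
import secrets

log = logging.getLogger("security-agent")

# Hypothetical policy: which kind of actor may create which principal names.
CREATION_POLICY = {
    "provisioning-agent": ("machine-",),
    "machine-agent": ("unit-",),  # assumption, not stated in the draft
}


class SecurityAgent(object):
    def __init__(self):
        self._pending = {}  # one time creation token -> principal name
        self.token_db = {}  # principal name -> identity token

    def issue_creation_token(self, creator_kind, principal_name):
        """Called by an actor that is about to create another actor."""
        allowed = CREATION_POLICY.get(creator_kind, ())
        if not principal_name.startswith(allowed):
            raise PermissionError(
                "%s may not create %s" % (creator_kind, principal_name))
        token = secrets.token_hex(16)
        self._pending[token] = principal_name
        return token

    def create_principal(self, token, password):
        """Consume a creation token and record the new principal."""
        # pop() removes the token, so a second use is rejected.
        principal_name = self._pending.pop(token, None)
        if principal_name is None:
            log.warning("invalid or reused principal creation token")
            raise ValueError("invalid principal creation token")
        raw = ("%s:%s" % (principal_name, password)).encode("utf-8")
        digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
        # The password itself is not kept; only the identity token is stored.
        self.token_db[principal_name] = "%s:%s" % (principal_name, digest)
        return principal_name


agent = SecurityAgent()
creation_token = agent.issue_creation_token("provisioning-agent", "machine-3")
agent.create_principal(creation_token, "password-chosen-by-machine-3")
```

Only the creation token travels over the insecure channel; the new actor picks its own password when it redeems the token, so the creating actor never learns it.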

One question that emerges with the use of a separate agent for creating

identities is how the agents needed for bootstrap receive their credentials.

- The bootstrap can utilize a specialized OTP interface with a precreated

known value, which it can use to initialize the tree.

Encrypted zookeeper communications

----------------------------------

As zookeeper does not currently support SSL/TLS transport level
security, Ensemble utilizes SSH port forwarding to ensure encrypted
communications to zookeeper. One significant shortcoming of this approach
is that any process on the set of ensemble machines can attempt to
connect to zookeeper and brute force principal passwords.

Privileged Data
---------------

Certain data stored within zookeeper is by its nature privileged and
should only be shared with agents requiring it for their function. For
example, the Ensemble provider credentials should only be exposed to
the provisioning agent, as they are required for it to function; any
additional access to the data would be regarded as a data escalation
vulnerability.

Additionally, services utilize relations to communicate with each
other. Every service unit of the services participating within a
relation gets write access only to its own node within the relation,
and has read access to all service unit relation settings. An
unrelated service unit from a different service is not allowed to
read any settings from the relation.
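The relation access rule (write access to a unit's own node, read access to every participant's settings, nothing for unrelated units) could be expressed along these lines. The function name, the unit names, and the identity tokens are made up for the example; permission bits follow ZooKeeper's ACL model.

```python
PERM_READ, PERM_WRITE = 1, 2


def relation_settings_acls(members):
    """Build the ACL map for each unit's settings node in one relation.

    members: {unit name: identity token} for every service unit of the
    services participating in the relation (hypothetical structure).
    """
    acls = {}
    for owner in members:
        acl = {}
        for unit, identity in members.items():
            # Every participant may read; only the owning unit may write.
            if unit == owner:
                acl[identity] = PERM_READ | PERM_WRITE
            else:
                acl[identity] = PERM_READ
        acls[owner] = acl
    return acls


members = {
    "wordpress/0": "unit-wordpress-0:token",
    "mysql/0": "unit-mysql-0:token",
}
acls = relation_settings_acls(members)
print(acls["mysql/0"])
```

A unit of an unrelated service never appears in ``members``, so its identity is absent from every ACL map and it has no access at all.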

Relations attacks
-----------------

Ensemble is comprised of a number of actors connecting to and
communicating via a shared storage. When two services enter into a
relation, a private bidirectional channel is created for them to
exchange data.

Ensemble ensures that the zookeeper nodes used for this communication
are subject to the proper ACL constraints such that unrelated services
are unable to access them.

But these relations represent ad hoc inter-machine communication
protocols, which are formula defined. A malicious agent could possibly
abuse one of these protocols to further compromise additional agents.
Unlike other attack vectors in ensemble, this is one for which ensemble
can make only minimal safety guarantees, outside of perhaps a simple
validation of relation data (currently treated as a binary blob) with
relation type associated schemas.

The formulas executed by the unit agent provide for user executed code
done within an lxc container (with root privileges). LXC provides
limited support for security against root in a container, so a
those of the other units on a machine.

Port Access to services
-----------------------

If we don’t have static information, how can we prevent port conflicts
when doing unit placement? Short answer: we can’t. Now we need a way
for services to interrogate information on open ports on their machine
so they can select a non-conflicting port (the container network is
separate from the machine's, so there is no way of identifying them
from within the container). So let’s say that's fine for app servers.
Now we connect a proxy service to them, and we have a defined traffic
port. Ideally we’d just assign a dns entry to the proxy service, but now
we have a problem in that we have a port offset on the url.


Additional Todo List
-------------------------

Privilege Escalation Scenarios
------------------------------

We have several different levels of escalation within ensemble for
malicious code that need to be considered.

container escalation
++++++++++++++++++++

All formula hooks are executed within an lxc container to give a
minimally isolated environment. This lxc container is rather trivially
exploitable to gain root access on the machine, as formulas execute
as root within the container and lxc currently provides minimal
security guarantees, which leads to the next escalation level.

Future work is needed to provide better security around lxc
integration, perhaps via integration of apparmor and ongoing lxc
isolation work.

Machine escalation
++++++++++++++++++

A machine is considered compromised if malicious code has root access
on the machine. All service units colocated on the machine are also
considered compromised if this occurs.

Agent Escalation
++++++++++++++++

An agent is considered compromised if malicious code has an open zookeeper
connection with a valid actor principal identity. The malicious code
has access to all data exposed via ACL to the compromised identity.

Beyond these generic scenarios we have particular escalations which
are effectively fatal, as they entail access to sensitive data that
spans the ensemble environment or machine provider.

A bootstrap machine compromise which allows for disk access could be
considered fatal, as the Ensemble shared state (zookeeper) data is
resident on disk.

Certain agents, like the provisioning agent, hold identities whose
compromise would allow malicious code to utilize the machine provider
credentials.

Access to Deployed services
----------------------------

A plan for controlled public access to deployed services is provided
separately by the expose-services specification.

Currently all internal access within a machine provider environment
like ec2 is unfiltered.

In future we should have machine level firewalling to allow access
between services based on their relations.

Next Steps
----------

SSH Host Identity Checks
- we should pull the ssh key of the machine into zk, so connections to a given machine can verify against valid keys of environment machines

Formula Storage must be referenced by

we should pull the ssh key of the machine into zk, so connections to a
given machine can verify against valid keys of environment machines
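One way the host identity check could look is sketched below. This is only an illustration of the idea: the ``known_keys`` dict stands in for whatever would be stored in zookeeper, and the function names, machine ids, key bytes, and the use of a SHA-256 fingerprint are all assumptions rather than a proposed design.

```python
import base64
import hashlib
import hmac


def fingerprint(public_key_blob):
    """Return a Base64 SHA-256 fingerprint of a raw public key blob."""
    return base64.b64encode(hashlib.sha256(public_key_blob).digest())


def verify_host_key(known_keys, machine_id, presented_key_blob):
    """Check a presented host key against the fingerprint on record.

    known_keys: {machine id: fingerprint bytes}, standing in for data
    that would be read from zookeeper (hypothetical layout).
    """
    expected = known_keys.get(machine_id)
    if expected is None:
        return False  # unknown machine: refuse rather than trust on first use
    return hmac.compare_digest(expected, fingerprint(presented_key_blob))


# Made-up key material for the example.
machine_key = b"ssh-rsa-public-key-bytes-of-machine-3"
known_keys = {"machine-3": fingerprint(machine_key)}
print(verify_host_key(known_keys, "machine-3", machine_key))
print(verify_host_key(known_keys, "machine-3", b"some-other-key"))
```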

Formula Storage URLs

Currently the formula storage access is referenced by a storage key
which is retrieved via the machine provider storage interface. This
requires machine agents to have access to the machine provider
credentials in order to reach formula storage, which they shouldn't need.

- Security Agent & Token Database
- Security Policy (Path Based ACL generator)
- Connections w/ Principal
