2
* Copyright (C) 2015 Canonical Ltd.
4
* This program is free software: you can redistribute it and/or modify
5
* it under the terms of the GNU Lesser General Public License version 3 as
6
* published by the Free Software Foundation.
8
* This program is distributed in the hope that it will be useful,
9
* but WITHOUT ANY WARRANTY; without even the implied warranty of
10
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
11
* GNU Lesser General Public License for more details.
13
* You should have received a copy of the GNU Lesser General Public License
14
* along with this program. If not, see <http://www.gnu.org/licenses/>.
16
* Authored by: Michi Henning <michi.henning@canonical.com>
21
#include <core/cache_discard_policy.h>
22
#include <core/cache_events.h>
23
#include <core/optional.h>
24
#include <core/persistent_cache_stats.h>
32
class PersistentStringCacheImpl;
34
} // namespace internal
37
\brief A cache of key-value pairs with persistent storage.
39
PersistentStringCache provides a cache of key-value pairs with a backing store.
40
It is intended for caching arbitrary (possibly large) amounts of data,
41
such as might be needed by a web browser cache. The cache scales to large
42
numbers (hundreds of thousands) of entries and is very fast. (Typically,
43
the performance-limiting factor is the I/O bandwidth to disk.)
45
The cache is robust in the face of crashes and power loss. After a
46
re-start, it is guaranteed to be in a consistent state with correct
47
data. Some number of updates that were made just prior to a power loss
48
or kernel crash can be lost; however, if just the calling process
49
crashes, no updates are lost.
51
A cache has a maximum size (which can be changed at any time). Once
52
the cache reaches its maximum size, when adding an entry,
53
the cache automatically discards enough entries to make room for the new entry.
55
Keys can be (possibly binary) strings of size > 0. Values can be
56
(possibly binary) strings including the empty string.
58
Entries maintain an access time, which is used to keep them in
59
least-recently-used (LRU) order. In addition, entries can have
60
an optional expiry time. (If no expiry time is specified, infinite
61
expiry time is assumed.)
63
\note This class is thread-safe; you can call member functions from
64
different threads without any synchronization. Thread-safety is
65
provided for convenience, not performance. Calling concurrently
66
into the cache from multiple threads will not yield improved performance.
70
The cache provides two different discard policies, `lru_ttl`
73
For `lru_ttl`, the discard policy of the cache is to first delete all
74
entries that have expired. If this does not free sufficient space
75
to make room for a new entry, the cache then deletes entries in oldest
76
to newest (LRU) order until sufficient space is available. This
77
deletion in LRU order may delete entries that have an
78
expiry time, but have not expired yet, as well as entries with
81
For `lru_only`, entries do not maintain an expiry time and
82
are therefore discarded strictly in LRU order.
84
Access and expiry times are recorded with millisecond granularity.
85
To indicate infinite expiry time, use the defaulted parameter value or
86
`chrono::system_clock::time_point()`.
90
Methods throw `std::runtime_error` if the underlying database
91
(leveldb) reports an error. If leveldb detects database corruption, the code
92
throws `std::system_error` with with a 666 error code. To recover
93
from this error, remove all files in the cache directory.
94
Other errors are indicated by throwing `std::logic_error` or
95
`std::invalid_argument` as appropriate.
99
Besides storing key-value pairs, the cache allows you to add arbitrary
100
extra data to each entry. This is useful, for example, to maintain
101
metadata (such as HTTP header details) for the entries in the cache.
103
\warning It is not possible to distinguish between "no metadata was ever added"
104
and "empty metadata was added and retrieved". Do not use the metadata in such
105
a way that you rely the difference between "metadata not there" and
106
"metadata is the empty string".
110
Some rough performance figures, taken on an Intel Ivy Bridge i7-3770K 3.5 GHz
111
with 16 GB RAM, appear below. Records are filled with random data to make them non-compressible.
113
After filling the cache, the code performs cache lookups using random
114
keys, with an 80% hit probability. On a miss, it inserts a new
115
random record. This measures the typical steady-state behavior:
116
whenever a cache miss happens, the caller fetches the data and
117
inserts a new record into the cache.
123
Record size | 20 kB, normal distribution, stddev = 7000
125
Running the test With a 7200 rpm spinning disk produces:
128
-------------|------------
133
Running the test With an Intel 256 GB SSD produces:
136
-------------|------------
141
\note When benchmarking, make sure to compile in release mode. In debug
142
mode, a number of expensive assertions are turned on.
144
\note Also be aware that leveldb uses Snappy compression beneath the
145
covers. This means that, if test data is simply filled with
146
a fixed byte pattern, you will measure artificially high performance.
149
class PersistentStringCache
153
Convenience typedef for the return type of open().
155
typedef std::unique_ptr<PersistentStringCache> UPtr;
158
\brief Simple pair of value and metadata.
163
\brief Stores the value of an entry.
168
\brief Stores the metadata of an entry. If no metadata exists for an entry,
169
`metadata` is returned as the empty string when it is retrieved.
171
std::string metadata;
174
/** @name Copy and Assignment
175
Cache instances are not copyable, but can be moved.
176
\note The constructors are private. Use one of the open()
177
static member functions to create or open a cache.
181
PersistentStringCache(PersistentStringCache const&) = delete;
182
PersistentStringCache& operator=(PersistentStringCache const&) = delete;
184
PersistentStringCache(PersistentStringCache&&);
185
PersistentStringCache& operator=(PersistentStringCache&&);
189
Destroys the instance.
191
~PersistentStringCache();
193
/** @name Creation Methods
199
\brief Creates or opens a PersistentStringCache.
201
If no cache exists on disk, it will be created;
202
otherwise, the pre-existing cache contents are used.
204
An existing cache can be opened only if `max_size_in_bytes` and `policy` have
205
the same values they had when the cache was last closed.
207
\param cache_path The path to a directory in which to store the cache. The contents
208
of this directory are exlusively owned by the cache; do not create additional files
209
or directories there. The directory need not exist when creating a new cache.
210
\param max_size_in_bytes The maximum size in bytes for the cache.
211
\param policy The discard policy for the cache (`lru_only` or `lru_ttl`). The discard
212
policy cannot be changed once a cache has been created.
214
The size of an entry is the sum of the sizes of its key, value, and metadata.
215
The maximum size of the cache is the sum of the sizes of all its entries.
217
\return A <code>unique_ptr</code> to the instance.
218
\throws invalid_argument `max_size_in_bytes` is < 1.
219
\throws logic_error `max_size_in_bytes` or `policy` do not match the settings of a pre-existing cache.
221
static UPtr open(std::string const& cache_path, int64_t max_size_in_bytes, CacheDiscardPolicy policy);
224
\brief Opens an existing PersistentStringCache.
225
\param cache_path The path to a directory containing the existing cache.
226
\return A <code>unique_ptr</code> to the instance.
228
static UPtr open(std::string const& cache_path);
238
\brief Returns the value of an entry in the cache, provided the entry has not expired.
239
\param key The key for the entry.
240
\return A null value if the entry could not be retrieved; the value of the entry, otherwise.
241
\throws invalid_argument `key` is the empty string.
242
\note This operation updates the access time of the entry.
244
Optional<std::string> get(std::string const& key) const;
247
\brief Returns the data for an entry in the cache, provided the entry has not expired.
248
\param key The key for the entry.
249
\return A null value if the entry could not be retrieved; the data of the entry, otherwise.
250
If no metadata exists, `Data::metadata` is set to the empty string.
251
\throws invalid_argument `key` is the empty string.
252
\note This operation updates the access time of the entry.
254
Optional<Data> get_data(std::string const& key) const;
257
\brief Returns the metadata for an entry in the cache, provided the entry has not expired.
258
\param key The key for the entry.
259
\return A null value if the entry could not be retrieved; the metadata of the entry, otherwise.
260
\throws invalid_argument `key` is the empty string.
261
\note This operation does _not_ update the access time of the entry.
264
Optional<std::string> get_metadata(std::string const& key) const;
267
\brief Tests if an (unexpired) entry is in the cache.
268
\param key The key for the entry.
269
\return `true` if the entry is in the cache; `false` otherwise.
270
\throws invalid_argument `key` is the empty string.
271
\note This operation does _not_ update the access time of the entry.
274
bool contains_key(std::string const& key) const;
277
\brief Returns the number of entries in the cache.
278
\return The total number of entries in the cache.
279
\note The returned count includes possibly expired entries.
281
int64_t size() const noexcept;
284
\brief Returns the number of bytes consumed by entries in the cache.
285
\return The total number of bytes in the cache.
286
\note The returned size includes possibly expired entries.
288
int64_t size_in_bytes() const noexcept;
291
\brief Returns the maximum size of the cache in bytes.
292
\return The maximum number of bytes that can be stored in the cache.
295
int64_t max_size_in_bytes() const noexcept;
298
\brief Returns an estimate of the disk space consumed by the cache.
299
\return The approximate number of bytes used by the cache on disk.
300
\note The returned size may be smaller than the eventual size
301
if there are updates to the cache that have not yet been written to disk.
303
int64_t disk_size_in_bytes() const;
306
\brief Returns the discard policy of the cache.
307
\return The discard policy (`lru_only` or `lru_ttl`).
309
CacheDiscardPolicy discard_policy() const noexcept;
312
\brief Returns statistics for the cache.
314
The returned statistics are persistent and are restored the next
315
time an existing cache is opened. Call clear_stats() to explicitly
316
reset the statistics counters and time stamps to zero.
317
\return An object that provides accessors to statistics and settings.
320
PersistentCacheStats stats() const;
330
\brief Adds or updates an entry.
332
If an entry with the given key does not exist in the cache, it is added
333
(possibly evicting a number of expired and/or older entries). If the entry
334
still exists (whether expired or not), it is updated with the new value
335
(and possibly expiry time).
337
This operation deletes any metadata associated with the entry.
339
\return `true` if the entry was added or updated. `false` if the policy
340
is `lru_ttl` and `expiry_time` is in the past.
342
\throws invalid_argument `key` is the empty string.
343
\throws logic_error The size of the entry exceeds the maximum cache size.
344
\throws logic_error The cache policy is `lru_only` and a non-infinite expiry time was provided.
346
bool put(std::string const& key,
347
std::string const& value,
348
std::chrono::time_point<std::chrono::system_clock> expiry_time = std::chrono::system_clock::time_point());
351
\brief Adds or updates an entry.
353
\note This overload is provided to avoid the need to construct a string value.
355
If an entry with the given key does not exist in the cache, it is added
356
(possibly evicting a number of expired and/or older entries). If the entry
357
still exists (whether expired or not), it is updated with the new value
358
(and possibly expiry time).
360
This operation deletes any metadata associated with the entry.
362
\return `true` if the entry was added or updated. `false` if the policy
363
is `lru_ttl` and `expiry_time` is in the past.
365
\param key The key of the entry.
366
\param value A pointer to the first byte of the value.
367
\param size The size of the value in bytes.
368
\param expiry_time The time at which the entry expires.
370
\throws invalid_argument `key` is the empty string.
371
\throws invalid_argument `value` is `nullptr`.
372
\throws invalid_argument `size` is negative.
373
\throws logic_error The size of the entry exceeds the maximum cache size.
374
\throws logic_error The cache policy is `lru_only` and a non-infinite expiry time was provided.
376
bool put(std::string const& key,
379
std::chrono::time_point<std::chrono::system_clock> expiry_time = std::chrono::system_clock::time_point());
382
\brief Adds or updates an entry and its metadata.
384
If an entry with the given key does not exist in the cache, it is added
385
(possibly evicting a number of expired and/or older entries). If the entry
386
still exists (whether expired or not), it is updated with the new value
387
and metadata (and possibly expiry time).
389
\return `true` if the entry was added or updated. `false` if the policy
390
is `lru_ttl` and `expiry_time` is in the past.
392
\throws invalid_argument `key` is the empty string.
393
\throws logic_error The sum of sizes of the entry and metadata exceeds the maximum cache size.
394
\throws logic_error The cache policy is `lru_only` and a non-infinite expiry time was provided.
396
bool put(std::string const& key,
397
std::string const& value,
398
std::string const& metadata,
399
std::chrono::time_point<std::chrono::system_clock> expiry_time = std::chrono::system_clock::time_point());
402
\brief Adds or updates an entry and its metadata.
404
\note This overload is provided to avoid the need to construct strings
405
for the value and metadata.
407
If an entry with the given key does not exist in the cache, it is added
408
(possibly evicting a number of expired and/or older entries). If the entry
409
still exists (whether expired or not), it is updated with the new value
410
and metadata (and possibly expiry time).
412
\return `true` if the entry was added or updated. `false` if the policy
413
is `lru_ttl` and `expiry_time` is in the past.
415
\param key The key of the entry.
416
\param value A pointer to the first byte of the value.
417
\param value_size The size of the value in bytes.
418
\param metadata A pointer to the first byte of the metadata.
419
\param metadata_size The size of the metadata in bytes.
420
\param expiry_time The time at which the entry expires.
422
\throws invalid_argument `key` is the empty string.
423
\throws logic_error The sum of sizes of the entry and metadata exceeds the maximum cache size.
424
\throws logic_error The cache policy is `lru_only` and a non-infinite expiry time was provided.
426
bool put(std::string const& key,
429
char const* metadata,
430
int64_t metadata_size,
431
std::chrono::time_point<std::chrono::system_clock> expiry_time = std::chrono::system_clock::time_point());
434
\brief Function called by the cache to load an entry after a cache miss.
436
typedef std::function<void(std::string const& key, PersistentStringCache& cache)> Loader;
439
\brief Atomically retrieves or stores a cache entry.
441
`get_or_put` attempts to retrieve the value of a (non-expired) entry.
442
If the entry can be found, it returns its value. Otherwise, it calls
443
`load_func`, which is expected to add the entry to the cache. If the
444
load function succeeds in adding the entry, the value added by the load
445
function is returned. The load function is called by the application thread.
447
\return A null value if the entry could not be retrieved or loaded; the value of the entry, otherwise.
448
\throws runtime_error The load function threw an exception.
450
\note The load function must (synchronously) call one of the overloaded `put` methods to add a
451
new entry for the provided key. Calling any other method on the cache from within the load
452
function causes undefined behavior.
454
\warning This operation holds a lock on the cache while the load function runs.
455
This means that, if multiple threads call into the cache, they will be blocked
456
for the duration of the load function.
458
Optional<std::string> get_or_put(std::string const& key, Loader const& load_func);
461
\brief Atomically retrieves or stores a cache entry.
463
`get_or_put` attempts to retrieve the value and metadata of a (non-expired) entry.
464
If the entry can be found, it returns its data. Otherwise, it calls
465
`load_func`, which is expected to add the entry to the cache. If the
466
load function succeeds in adding the entry, the data added by the load
467
function is returned. The load function is called by the application thread.
469
\return A null value if the entry could not be retrieved or loaded; the value and metadata of the entry, otherwise.
470
\throws runtime_error The load function threw an exception.
472
\note The load function must (synchronously) call one of the overloaded `put` methods to add a
473
new entry for the provided key. Calling any other method on the cache from within the load
474
function causes undefined behavior.
476
\warning This operation holds a lock on the cache while the load function runs.
477
This means that, if multiple threads call into the cache, they will be blocked
478
for the duration of the load function.
480
Optional<Data> get_or_put_data(std::string const& key, Loader const& load_func);
483
\brief Adds or replaces the metadata for an entry.
485
If a (non-expired) entry with the given key exists in the cache, its metadata
486
is set to the provided value, replacing any previous metadata.
488
\return `true` if the metadata was added or updated. `false` if the
489
entry could not be found or was expired.
491
\throws invalid_argument `key` is the empty string.
492
\throws logic_error The new size of the entry would exceed the maximum cache size.
494
\note This operation does _not_ update the access time of the entry.
497
bool put_metadata(std::string const& key, std::string const& metadata);
500
\brief Adds or replaces the metadata for an entry.
502
\note This overload is provided to avoid the need to construct a string
505
If a (non-expired) entry with the given key exists in the cache, its metadata
506
is set to the provided value, replacing any previous metadata.
508
\param key The key of the entry.
509
\param metadata A pointer to the first byte of the metadata.
510
\param size The size of the metadata in bytes.
512
\return `true` if the metadata was added or updated. `false` if the
513
entry could not be found or was expired.
515
\throws invalid_argument `key` is the empty string.
516
\throws invalid_argument `metadata` is `nullptr`.
517
\throws invalid_argument `size` is negative.
518
\throws logic_error The new size of the entry would exceed the maximum cache size.
520
\note This operation does _not_ update the access time of the entry.
523
bool put_metadata(std::string const& key, char const* metadata, int64_t size);
526
\brief Removes an entry and returns its value.
528
If a (non-expired) entry with the given key can be found, it is removed
529
from the cache and its value returned.
531
\return A null value if the entry could not be found; the value of the entry, otherwise.
532
\throws invalid_argument `key` is the empty string.
534
Optional<std::string> take(std::string const& key);
537
\brief Removes an entry and returns its value and metadata.
539
If a (non-expired) entry with the given key can be found, it is removed
540
from the cache and its data returned.
541
If no metadata exists, `Data::metadata` is set to the empty string.
543
\return A null value if the entry could not be retrieved; the value and metadata of the entry, otherwise.
544
\throws invalid_argument `key` is the empty string.
546
Optional<Data> take_data(std::string const& key);
549
\brief Removes an entry and its associated metadata (if any).
551
If a (non-expired) entry with the given key can be found, it is removed
554
\return `true` if the entry was removed; `false` if the entry could not be found or was expired.
555
\throws invalid_argument `key` is the empty string.
557
bool invalidate(std::string const& key);
560
\brief Atomically removes the specified entries from the cache.
562
\param keys A vector of keys for the entries to be removed. If the vector is empty, this operation is
563
a no-op. If one or more keys are empty or specify non-existent entries, they are ignored.
565
void invalidate(std::vector<std::string> const& keys);
568
\brief Atomically removes the specified entries from the cache.
570
\param begin Iterator to the first key for the entries to be removed.
571
\param end Iterator to the one-beyond-the-last key for the entries to be removed.
573
If the iterator range is empty, this operation is a no-op.
574
If one or more keys are empty or specify non-existent entries, they are ignored.
576
template<typename It>
577
void invalidate(It begin, It end)
579
std::vector<std::string> keys;
582
keys.push_back(*begin++);
588
\brief Atomically removes the specified entries from the cache.
590
\param keys The keys for the entries to be removed. If `keys` is empty, this operation is
591
a no-op. If one or more keys are empty or specify non-existent entries, they are ignored.
593
void invalidate(std::initializer_list<std::string> const& keys);
596
\brief Deletes all entries from the cache.
598
This operation completely empties the cache.
599
\note Clearing the cache also resets the statistics counters.
605
\brief Updates the access time of an entry.
607
If the entry specified by `key` is still in the cache (whether expired or not),
608
it is marked as the most-recently used entry. If the policy is `lru_ttl`, the
609
entry's expiry time is updated with the specified time (infinite expiry by default).
611
\return `true` if the entry was updated; `false` if the entry could not be found or
612
`expiry_time` is in the past.
613
\throws invalid_argument `key` is the empty string.
614
\throws logic_error `key` is the empty string.
615
\throws logic_error The cache policy is `lru_only` and a non-infinite expiry time was provided.
618
std::string const& key,
619
std::chrono::time_point<std::chrono::system_clock> expiry_time = std::chrono::system_clock::time_point());
622
\brief Resets all statistics counters.
627
\brief Changes the maximum size of the cache.
629
If `size_in_bytes` is greater or equal to max_size_in_bytes(), the cache size is set to `size_in_bytes`.
631
If `size_in_bytes` is less than max_size_in_bytes(), the cache discards existing entries until
632
the size falls to (or below) `size_in_bytes` and sets the cache size to the new value.
634
\throws invalid_argument `size_in_bytes` is < 1
636
\note If the new size is less than the current size, this operation compacts the database
637
to use the smallest possible amount of disk space.
639
void resize(int64_t size_in_bytes);
642
\brief Expires entries.
644
Expires entries using the cache's expiration policy until the cache size falls to or below
645
`used_size_in_bytes`. If `used_size_in_bytes` is equal to or greater than the current cache size,
646
this operation is a no-op.
648
\throws invalid_argument `used_size_in_bytes` is < 0
649
\throws logic_error `used_size_in_bytes` is > max_size_in_bytes().
651
void trim_to(int64_t used_size_in_bytes);
654
\brief Compacts the database.
656
This operation compacts the database to consume as little disk space as possible.
657
Note that this operation can be slow. (Compacting a 100 MB cache can take around
658
ten seconds on a machine with a spinning-platter disk.)
664
/** @name Monitoring cache activity
666
The cache allows you to register one or more callback functions that are called when
667
the cache contents change.
669
\note Callback functions are called by the application thread that triggered the
672
\warning Do not invoke operations on the cache from within a callback function. Doing
673
so has undefined behavior.
679
\brief The type of a handler function.
681
\note Callback functions are called by the application thread that triggered the
684
\warning Do not invoke operations on the cache from within a callback function. Doing
685
so has undefined behavior.
687
\param key The key of the entry.
688
\param ev The event type.
689
\param stats The cache statistics. Note that the `stats` parameter reflects the state
690
of the cache _after_ the corresponding event. For example, for a `Put` event, `stats.size_in_bytes()`
691
_includes_ the size of the added entry.
693
typedef std::function<void(std::string const& key, CacheEvent ev, PersistentCacheStats const& stats)> EventCallback;
696
\brief Installs a handler for one or more events.
698
\param events A bitwise OR of the event types for which to install the handler. To install
699
a handler for all events, you can use core::AllCacheEvents.
701
\param cb The handler to install. To cancel an existing handler, pass `nullptr`.
703
For example, to install a handler for `get` and `put` events, you could use:
706
auto cache = PersistentStringCache::open("my_cache");
708
auto handler = [](string const& key, CacheEvent event, PersistentCacheStats const& stats)
712
cache->set_handler(CacheEvent::get | CacheEvent::put, handler);
717
void set_handler(CacheEvent events, EventCallback cb);
723
PersistentStringCache(std::string const& cache_path, int64_t max_size_in_bytes, CacheDiscardPolicy policy);
724
PersistentStringCache(std::string const& cache_path);
726
std::unique_ptr<internal::PersistentStringCacheImpl> p_;