### The Subset Assignment Problem for Data Placement in Caches

**Abstract:**
We introduce the subset assignment problem, in which items of varying sizes
are placed on a set of bins with limited capacity.
Items can be replicated and placed on any subset of the bins, possibly
including the empty set. Each (item, subset) pair has an associated cost.
The goal is to minimize the total cost of assigning items to subsets
subject to the constraint that the bin capacities are not exceeded.
This problem is motivated by the design of caching systems composed of
banks of memory with varying cost/performance specifications.
The ability to replicate a data item in more than one memory bank can
improve the overall performance of the system by shortening the recovery
time in the event of a memory failure.
In this setting, the number n of data objects (items) is very large and
the number d of memory banks (bins) is a small constant (on the order of 3
or 4).
Therefore, the goal is to determine an optimal assignment in time that
depends on n as weakly as possible. The integral version of this problem is
NP-hard since it generalizes the knapsack problem. We focus on solving the
LP relaxation efficiently because the number of fractionally assigned items
will be at most d. If the data objects are small relative to the size of
the memory banks, the effect of dropping the fractionally assigned data
items will be small.
We give an algorithm that solves the LP relaxation in time
O(3^{d(d+1)} poly(d) n log(n) log(nC) log(Z)),
where Z is the maximum size of any object and C is the maximum cost for
storing an item.
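
For concreteness, the LP relaxation described above can be written roughly as follows. The notation is assumed here for illustration and is not taken from the talk: x_{i,S} is the fraction of item i assigned to the subset S of bins, c_{i,S} is the cost of that (item, subset) pair, s_i is the size of item i, and B_j is the capacity of bin j, so that Z = max_i s_i and C = max_{i,S} c_{i,S}.

```latex
% A sketch of the LP relaxation; all symbols are illustrative assumptions.
\begin{align*}
\min_{x}\quad & \sum_{i=1}^{n} \sum_{S \subseteq [d]} c_{i,S}\, x_{i,S} \\
\text{s.t.}\quad
  & \sum_{S \subseteq [d]} x_{i,S} = 1
      && i = 1,\dots,n
      && \text{(each item gets one subset, possibly } \emptyset\text{)} \\
  & \sum_{i=1}^{n} s_i \sum_{S \ni j} x_{i,S} \le B_j
      && j = 1,\dots,d
      && \text{(capacity of bin } j\text{)} \\
  & x_{i,S} \ge 0
      && \text{for all } i,\ S \subseteq [d]
\end{align*}
```

Under this formulation there are n assignment constraints and d capacity constraints, so a basic feasible solution has at most n + d nonzero variables. Every item needs at least one nonzero variable, which leaves at most d items with more than one, consistent with the bound of d fractionally assigned items stated above.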

Joint work with Shahram Ghandeharizadeh and Sandy Irani.