Quantcast
Channel: dBforums – Everything on Databases, Design, Developers and Administrators
Viewing all articles
Browse latest Browse all 13329

Picking a Representative Subset

$
0
0
This is an old problem, but Ia m looking for new solutions for it. The problem is to find a subset of rows such that each value in each of two columns (animals and food brands in this example) appears in at least one row. The purpose is to produce a set of samples from a large table. The table has a animal_name column and an food_brand column; I want a set of samples that contains at least one of each animal_name and at least one of each food_brand, but no more than necessary.

CREATE TABLE Feeding_Options
(license_nbr INTEGER NOT NULL PRIMARY KEY,
animal_name VARCHAR (10) NOT NULL,
food_brand VARCHAR(15) NOT NULL);

INSERT INTO Feeding_Options
VALUES
(6401715, 'rat', 'IAMS'),
(1058337, 'rat', 'IAMS'),
(459443, 'dog', 'Sci Diet'),
(4018210, 'cat', 'Purina'),
(2430656, 'cat', 'Purina'),
(6802081, 'fish', 'Alpo'),
(4236511, 'fish', 'Alpo'),
(2162104, 'cat', 'Purina'),
(2073679, 'fish', 'Alpo'),
(8148891, 'cat', 'Purina'),
(1868445, 'cat', 'Purina'),
(6749213, 'cat', 'Purina'),
(8363621, 'fox', 'Alpo'),
(9999, 'fox', 'IAMS');

To help frame the problem better, her are the values in each column:
animals = {'rat', 'dog', 'cat', 'fish', 'fox'}
food = {'IAMS', 'Sci Diet', 'Purina', 'Alpo'}

In this data, (6401715, 'rat', 'IAMS') and (1058337, 'rat', 'IAMS') are interchangeable, as are some of the other rows. There can be more than one minimal solution whcihmight be the whole set. But I would be happy to simply find a near-minimal solution.

Viewing all articles
Browse latest Browse all 13329

Trending Articles