
New to DB2

Hello Everyone.

I'm (somewhat) new to DB2. I work as a data analyst for a consulting group. Most of the time I work with Microsoft's SQL Server, but a particular client requires that we do the programming on their system, which happens to be DB2. These people also have a large customer base.

I've got a few different questions, so I'll break them into separate posts as I come across each challenge.

First question:

I have 4 massive tables that I need to join together. Each table has about 7 million rows.
* One table has transaction date information.
* One table has customer information.
* One table has account information (a customer can have multiple accounts).
* One table has summary information (that I have created).

I want to join all this together into one table. That table is fed into a proprietary data analysis tool, and for speed reasons it's best to feed the tool one massive table. (The tool can handle it... it's not the bottleneck.)
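For context, the four tables are shaped roughly like this (heavily simplified placeholder DDL, not the client's real schema):

    -- Heavily simplified placeholder shapes, not the real schema
    CREATE TABLE transactions (
        trans_start_date DATE,
        trans_end_date   DATE,
        customer_id      INTEGER
        -- ... many more transaction columns
    );

    CREATE TABLE customers (
        customer_id   INTEGER,
        customer_name VARCHAR(100)
        -- ... many more customer columns
    );

    CREATE TABLE accounts (
        account_no  INTEGER,
        customer_id INTEGER  -- a customer can have multiple accounts
        -- ... many more account columns
    );

    CREATE TABLE summaries (
        account_no     INTEGER,
        summary_amount DECIMAL(15, 2)  -- summary data I built earlier
    );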

When building the query to do this, we've run into a lot of speed issues.

So, what is the best game plan for doing this efficiently?

I've tried putting clustered indices on the tables' keys and doing the joins one at a time.
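For concreteness, the pattern I used looks roughly like this (table and index names are simplified placeholders; one thing I've gathered from the DB2 docs is that a CLUSTER index only physically orders the rows after a REORG, unlike a SQL Server clustered index):

    -- Placeholder names, not our real schema.
    -- DB2 declares a clustering index with the CLUSTER keyword;
    -- the rows are only physically reordered by a table reorg.
    CREATE INDEX ix_trans_cust
        ON transactions (trans_start_date, trans_end_date, customer_id)
        CLUSTER;

    REORG TABLE transactions INDEX ix_trans_cust;

    -- Refresh optimizer statistics after the reorg
    RUNSTATS ON TABLE transactions WITH DISTRIBUTION AND INDEXES ALL;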

So to walk you through it...

The transaction table has the transaction start and stop dates, along with the customer ID number. I set a clustered index on those fields, then join the customer information onto the table using the transaction dates and the customer name. On the resulting table, I add an index on the transaction dates, customer number, and account number, and then join on the account data using those fields as the key.
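In rough SQL, the staged process looks something like this (all names are simplified placeholders, and the real join keys are messier; I gather CREATE TABLE ... AS ... WITH DATA needs a reasonably recent DB2, with older versions using WITH NO DATA plus INSERT ... SELECT):

    -- Stage 1: transactions joined to customers (placeholder names)
    CREATE TABLE stage1 AS (
        SELECT t.trans_start_date, t.trans_end_date, t.customer_id,
               c.customer_name
        FROM   transactions t
        JOIN   customers    c ON c.customer_id = t.customer_id
    ) WITH DATA;

    -- Index the intermediate table for the next join
    CREATE INDEX ix_stage1
        ON stage1 (trans_start_date, trans_end_date, customer_id);

    -- Stage 2: bring in the account rows (this is where the row count
    -- grows, since a customer can have multiple accounts)
    CREATE TABLE stage2 AS (
        SELECT s.*, a.account_no, a.account_type
        FROM   stage1   s
        JOIN   accounts a ON a.customer_id = s.customer_id
    ) WITH DATA;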

We end up with a pretty massive table with about 70 columns and 8 million rows (the extra million rows exist entirely because some customers have multiple accounts).

However, this whole join process takes a good 20 hours or so. I'd like to know whether that sounds reasonable, or whether anyone has advice on building mega-tables like this one.

Does this seem slow to you? When we've built data sets like this before, we've typically had *much* smaller tables and better hardware, so I'm wondering if I'm going about writing this query all wrong.

Would writing one massive join perform better? I made an initial stab at that, but it seemed to be even slower than this process.
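For reference, my one-shot attempt was shaped roughly like this (placeholder names and simplified join keys again):

    -- Single-statement version (placeholder names); this actually ran
    -- slower for us than the staged approach above
    CREATE TABLE analysis_feed AS (
        SELECT t.trans_start_date, t.trans_end_date,
               c.customer_id, c.customer_name,
               a.account_no, a.account_type,
               m.summary_amount
        FROM      transactions t
        JOIN      customers    c ON c.customer_id = t.customer_id
        JOIN      accounts     a ON a.customer_id = c.customer_id
        LEFT JOIN summaries    m ON m.account_no  = a.account_no
    ) WITH DATA;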

Thanks in advance.
