Monday, March 19, 2012

calculate a sum

hello,
i have a large table with invoices ~ 8 Mio entries,
with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
example
TInvoices
100001 22 23,22 19991129
100001 14 23,22 20051222
100001 22 23,22 20061230 *
100001 15 23,22 20051229
100001 11 23,22 20060101
100001 15 23,22 20060101
in a separat table i have a list of products for wich i must calculate the
sum 4 SellPrice
example - only for 2 Products (11 and 25)
TProducts4Sum
11
22
The period for which the sum must be calculated is in a other table
TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
20051220 20051231
The result must be :23,22
because only this line maches the 2 condition
100001 22 23,22 20061230 *
What is the best way to have a good performance - in calculating this.
best regardsI'm ; first you say you want to calculate the sum of SellPrice for
products 11 and 25, then you flag one row for product 22 and say that's the
row that should be returned. Why? What do you really want to do?
Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
> the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards|||Xavier
> TProducts4Sum
> 11
> 22
I don't understand , do you want 11-25 or 11-22 in output?
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
> the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards|||Just join the 3 tables with inner joins.
TInvoices.ProductNr = TProducts4Sum.ProductNr
TInvoices.InvoiceDate between TPeriod4Sum.BeginDate and TPeriod4Sum.EndDate
Then select everything from table TLInvoices, just to make sure you are
returning the corrrect rows.
You can also use a subquery to select the list of products:
i.e. :
Where TInvoices.ProductNr in (select ProductNr from TProducts4Sum)
Depending on how much data you have, and your system setup, either one could
be faster. I suggest testing both approaches out and seeing what happens.
Try it each way for various numbers of products in the TProducts4Sum table.
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards|||hello sorry for the confusions.
i try to explain the purpose.
All invoices for over 15 years are stored in the table TInvoices (~8Mio
records).
I want to know what Conversion(Sum of sellprice) was made with some products
in a defined period for each customer.
example - only for 2 Products (11 and 22)
TProducts4Sum
11
22
example the period
TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005) - the
table has 2 VarChar colums with the values
20051220 20051231
if TInvoices has entries like
100001 22 20,00 19991129
100002 34 40,00 20051122
100003 22 20,00 20051226 * ok(ProdNr 22 and corect period)
100003 21 10,00 20051227
100003 13 30,00 20051228
100003 11 10,00 20051228 * ok(ProdNr 11 and corect period)
100004 22 20,00 20051230 * ok(ProdNr 22 and corect period)
100010 15 50,00 20060101
in the table we have more customersNr
100001 ... 100010
I want to calculate for the products which are listed in the table
TProducts4Sum (normal with ~ 250ProductNr) the Sum for SellPrice for each
CustomerNr
result
cutomer
100003 -> 30,00 (10,00 + 20,00)
100004-> 20,00
thanks|||hello Jim,
i worked with with joins .... but it is not performant!
Now i tryed somthing like (attention - the table names are not exactly the
same ...)
SELECT SoldTo, SUM(SellPrice) AS SumSellPrice
FROM Invoices
WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
GROUP BY Material, SoldTo
HAVING (Material IN
(SELECT ProdNr
FROM ProdNr2ActionId
WHERE actionid = 46))
and for this calculation it takes ~ 12 seconds (output 877 lines)
of the form
SoldTo SumSellPrice
100001 23456,55
....
in the period are aprox 250000 records.
Is this time of 12 seconds normal?, or could it be better?
thanks|||I meant to add in my original post, after you confirm that you are getting
all the correct rows, just select product and sum of price.
select TProducts4Sum.ProductNr, sum(TInvoices.SellPrice)
From
[tables, joins, and where clause here]
group by TProducts4Sum.ProductNr
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards|||Xavier,
Can you post the code you used with the joins? with 250000 records you
should get very fast performance with either result set, although this is
also dependent on your indexes. In my original post I assumed that you had
indexes on your date and/or product fields. If not, this could be the
source of your problem.
Also, try changing your having clause below to be part of the where clause.
Having is generally used to evaluate agregate values AFTER the group by is
processed, meaning your query will sum all products before filtering the
list. You probably want to make sure the list is filtered prior to
summarizing in order to speed things up.
After you have tried this, there are some other things you can look at to
try to speed things up, namely with indexes. If neither your date field nor
your product field are indexed in your invoices table, then the query will
run slowly no matter what.
Depending on your data, you want to make sure you are using the correct
index to minimize the result set.
I'm not certain how the indexes work with date ranges, but a between might
work better than two seperate ranges as you have now. I would see what
happens with indexes built on:
Date, Product
Product, Date
try one index, drop it, and try the other. See which one gives the best
results. Remember to try the results for different ranges and different
lists of products. One will work much better with small date ranges and the
other will work better with low numbers of products.
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:4A425FCD-7E92-4FBD-A7AD-4DFFEAB2B283@.microsoft.com...
> hello Jim,
> i worked with with joins .... but it is not performant!
> Now i tryed somthing like (attention - the table names are not exactly the
> same ...)
> SELECT SoldTo, SUM(SellPrice) AS SumSellPrice
> FROM Invoices
> WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
> GROUP BY Material, SoldTo
> HAVING (Material IN
> (SELECT ProdNr
> FROM ProdNr2ActionId
> WHERE actionid = 46))
> and for this calculation it takes ~ 12 seconds (output 877 lines)
> of the form
> SoldTo SumSellPrice
> 100001 23456,55
> ....
> in the period are aprox 250000 records.
> Is this time of 12 seconds normal?, or could it be better?
> thanks
>|||for clarifications .....
in the table are ~ 8Mio records.
the condition
WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
is valid for 250000 records
I have index on SoldTo,CreatedOn,ProdNr and Material.
I have to display that value SumSellPrice for each customer on the webpage.
Because it takes to much time to make this on the fly - i decided to make
this calculation for all customers ~70000 two times the day and write the
result in a suplimentary table, so that if over the web-frontend i get a
request i display the value from that suplimentary table.
I am not sure what is the best way to have best performances to calculate
each SumSellPrice for each customer for the predefined period of time and
product list
It seams that the needed time for calculation grows very much as i add the
date interval .. (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
I will try to use between.
The database from my customer - so i must make a bakup and restore on a
other system to check / drop indexes.
What do you mean with very fast performance ... how much is a usual time
for such a calculation - the server is a (Dual Xeon 900MHz 1GB Ram)
thanks
"Jim Underwood" wrote:

> Xavier,
> Can you post the code you used with the joins? with 250000 records you
> should get very fast performance with either result set, although this is
> also dependent on your indexes. In my original post I assumed that you ha
d
> indexes on your date and/or product fields. If not, this could be the
> source of your problem.
> Also, try changing your having clause below to be part of the where clause
.
> Having is generally used to evaluate agregate values AFTER the group by is
> processed, meaning your query will sum all products before filtering the
> list. You probably want to make sure the list is filtered prior to
> summarizing in order to speed things up.
> After you have tried this, there are some other things you can look at to
> try to speed things up, namely with indexes. If neither your date field n
or
> your product field are indexed in your invoices table, then the query will
> run slowly no matter what.
> Depending on your data, you want to make sure you are using the correct
> index to minimize the result set.
> I'm not certain how the indexes work with date ranges, but a between might
> work better than two seperate ranges as you have now. I would see what
> happens with indexes built on:
> Date, Product
> Product, Date
> try one index, drop it, and try the other. See which one gives the best
> results. Remember to try the results for different ranges and different
> lists of products. One will work much better with small date ranges and t
he
> other will work better with low numbers of products.
>
> "Xavier" <Xavier@.discussions.microsoft.com> wrote in message
> news:4A425FCD-7E92-4FBD-A7AD-4DFFEAB2B283@.microsoft.com...
>
>|||Xavier,
Retrieving 250 thousand of 8 million records is much, much different than
what I thought you had. It is possible that 12 seconds may be reasonable,
but I would think you could get it to return the data much faster.
Read through and try to reply to my questions at the end.
First, try creating an extra index on just:
CreatedOn,ProdNr
It is worth the time it takes to create a backup if this will fix your
problem.
I believe if the first part of your index is SoldTo, and you are not using
this field as a filter, then the index cannot be used and you are doing a
full table scan through all 8 million records. Adding this index should,
hopefully, allow you to process only 3 percent of the records, which should
give you a performance boost.
The increase that you are seeing when you increase the date interval is
probably the time it takes the database to summarize the data. Summarizing
data can be very processor and memory intensive.
Now, a few questions...
1. Did you move your HAVING clause to your WHERE clause and see what the
effects are? I would expect this to give you significant improvements, as
the database would filter the data before summarizing it, thus cutting down
on memory and CPU usage.
2. If you use only the product as a filter, how many records would be
returned? If this will typically be less than the 250000 that the date
filter returns, then you will want your index to be set up as:
ProdNr, CreatedOn
3. Please post the SQL you used with the join. Although I suspect the
problem is with your indexes, I would like to see how the SQL differs
between the two versions. It might help me to better inderstand how SQL
Server optimizes queries.
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:F50649FB-C631-4F15-9797-00DBE7CD9FE1@.microsoft.com...
> for clarifications .....
> in the table are ~ 8Mio records.
> the condition
> WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
> is valid for 250000 records
> I have index on SoldTo,CreatedOn,ProdNr and Material.
> I have to display that value SumSellPrice for each customer on the
webpage.
> Because it takes to much time to make this on the fly - i decided to make
> this calculation for all customers ~70000 two times the day and write the
> result in a suplimentary table, so that if over the web-frontend i get a
> request i display the value from that suplimentary table.
> I am not sure what is the best way to have best performances to calculate
> each SumSellPrice for each customer for the predefined period of time and
> product list
> It seams that the needed time for calculation grows very much as i add the
> date interval .. (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
> I will try to use between.
> The database from my customer - so i must make a bakup and restore on a
> other system to check / drop indexes.
> What do you mean with very fast performance ... how much is a usual time
> for such a calculation - the server is a (Dual Xeon 900MHz 1GB Ram)
> thanks
>
> "Jim Underwood" wrote:
>
is
had
clause.
is
to
nor
will
might
the
the

No comments:

Post a Comment