Showing posts with label sum. Show all posts

Sunday, March 25, 2012

Calculated field question: sum of a product rather than the product of sums

Here is my calculated field. I want to sum “effective duration” * “portfolio weight”, but what I get back is the sum of “effective duration” multiplied by the sum of “portfolio weight”.

How would I write that code, or should I do that calculation when I am importing the data into the fact table?

CREATE MEMBER CURRENTCUBE.[MEASURES].[Weighted Duration]
AS Case
// Return Null when either underlying measure is empty.
When IsEmpty
(
[Measures].[Effective Duration]
)
Then Null
When IsEmpty
(
[Measures].[Portfolio Weight]
)
Then Null
Else (
[Measures].[Effective Duration] * [Measures].[Portfolio Weight]
)
End,
FORMAT_STRING = "#,#.0000",
VISIBLE = 1 ;

I believe the problem is that a calculated measure does not have an aggregation function of its own.

At each level of a user hierarchy, it computes its value from the already-aggregated values of the regular measures you supplied in the expression. Since, in all likelihood, 'Effective Duration' and 'Portfolio Weight' are aggregated with Sum, that explains the result you are getting.
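The mismatch is easy to check with toy numbers. A minimal Python sketch (durations and weights invented, not the poster's data) showing that the sum of row-level products differs from the product of the sums:

```python
# Hypothetical holdings: (effective duration, portfolio weight) per row.
holdings = [(4.0, 0.25), (6.0, 0.50), (2.0, 0.25)]

# What the poster wants: aggregate the row-level products.
sum_of_products = sum(d * w for d, w in holdings)   # 1.0 + 3.0 + 0.5 = 4.5

# What the cube computes: multiply after each measure is summed.
product_of_sums = (sum(d for d, _ in holdings)
                   * sum(w for _, w in holdings))   # 12.0 * 1.0 = 12.0

print(sum_of_products, product_of_sums)
```
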

You may want to specify how to compute the calculated measure for each level of the 'target hierarchy'.

The following example does that with a recursive definition of Weighted Duration: it checks whether the current member is a leaf of the 'target hierarchy' or not.

CREATE MEMBER CURRENTCUBE.[MEASURES].[Weighted Duration] AS
IIF ( IsLeaf([Target Dim].[Target Hierarchy].CurrentMember),
[Measures].[Effective Duration] * [Measures].[Portfolio Weight],
Sum([Target Dim].[Target Hierarchy].CurrentMember.Children, [MEASURES].[Weighted Duration]) ) ;
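To make the recursion concrete, here is a small Python sketch of the same leaf-versus-parent logic over a toy two-level hierarchy (all member names and numbers are invented, not taken from the poster's cube):

```python
# Leaves carry (duration, weight); a non-leaf member's weighted duration
# is the sum of its children's values, mirroring the IIF/IsLeaf MDX above.
tree = {
    "All": ["Portfolio A", "Portfolio B"],
    "Portfolio A": [("leaf", 4.0, 0.25), ("leaf", 6.0, 0.50)],
    "Portfolio B": [("leaf", 2.0, 0.25)],
}

def weighted_duration(member):
    total = 0.0
    for child in tree[member]:
        if isinstance(child, tuple):        # IsLeaf branch: multiply
            _, duration, weight = child
            total += duration * weight
        else:                               # non-leaf branch: recurse and sum
            total += weighted_duration(child)
    return total

print(weighted_duration("All"))  # 4.5
```
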

|||

I think this is the right concept, but I need to go all the way down to the fact table. Either that or I do it in the fact table itself, which feels like a shortcut. If I precalculate everything in the fact table during the data import, I feel like I'm cheating a little. I thought OLAP was made for this sort of thing; weightings/ratios are rather basic.

The other part is pulling out a measure for a specific level in the hierarchy. No matter what dimensions I am using, if I am using DimPortfolio, I want the total of a given measure (market value) for the "All" level of the portfolio. I'm having trouble with that too.

|||

Precalculating is the best solution, given the limitations that calculated members/measures have.

You can submit your wish list to the SQL Server group, though; I believe there is a website dedicated to that: https://connect.microsoft.com

Thursday, March 22, 2012

Calculated field crashes VS 2005 SP1?

Hi!

When I try something a little more complex than string manipulation in a calculated field, e.g. =RunningValue(Fields!SALES.Value, Sum),

it just crashes Visual Studio when I try to run the report. Could this count as a bug in RS?

RunningValue requires 3 parameters (expression, aggregate function, and scope).

The Edit Expression dialog may offer a 2-parameter signature; that is a bug.
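For what it's worth, the running sum RunningValue is meant to produce can be sketched in plain Python with itertools.accumulate (the sales figures here are invented):

```python
from itertools import accumulate

# Sales per row, in report order; accumulate yields the running total.
sales = [100, 70, 60, 50, 40, 30]
running = list(accumulate(sales))
print(running)  # [100, 170, 230, 280, 320, 350]
```
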

|||

Hi!

Thanks for the quick reply, but this didn't solve it.

Maybe I'll explain a little bit.
I want a cumulative sum over the data. So I open the data source, add a new field, and mark it as a calculated field with the expression =RunningValue(Fields!SALES.Value, Sum, "DataSource1").

Then I click the preview panel, and Visual Studio waits some seconds and crashes!

I really need this because the customer needs cumulative sales groups:


Product     Sales   Cumulative   %
A group
Product1    100     100          28%
Product2    70      170          48%
B group             170          48%
Product3    60      230          65%
Product4    50      280          80%
C group             280          80%
Product5    40      320          91%
Product6    30      350          100%


And the grouping is dynamic, e.g. all cumulative sales below 50% belong to group A, sales below 80% belong to group B, and the others are in group C.
At this moment I'm feeling pretty desperate about this case, and it seems it is not possible to do this with Reporting Services!
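The grouping logic itself is simple to express outside the report. A Python sketch, where the 50% and 80% thresholds are assumptions read off the sample table:

```python
# Rank products by sales, take each product's cumulative share of the
# total, and assign a group by threshold.
sales = {"Product1": 100, "Product2": 70, "Product3": 60,
         "Product4": 50, "Product5": 40, "Product6": 30}

total = sum(sales.values())
running, groups = 0, {}
for product, amount in sorted(sales.items(), key=lambda kv: -kv[1]):
    running += amount
    share = running / total
    groups[product] = "A" if share < 0.5 else ("B" if share <= 0.8 else "C")

print(groups)
```
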

|||

Aggregate functions are not allowed in calculated fields.

|||

It seems I found a solution.
Because this cannot be done with RS, I had to load the data into a data table, do the inner calculations there, and then
pass that data to RS.

I don't understand the design decisions here. Why is it not possible for RS headers to accept data from inner fields as a criterion? This report creation process seems to be a very waterfall approach, with no precalculations :(


Calculated column in a table

Hi,
I have a table with a calculated column.
There is also a footer which calculates the sum in a column.
How can I calculate the sum of a calculated column?
Ex: 2 10
5 25 <- calculated column
--
7 35 <- calculated sum of column
(SRS 2000 SP2)
Thanks
Eric

|||

The best way to do this is to create a calculated field. On the field list,
do a right mouse click, add, click on calculated field. After that it is
treated like any other field.
Bruce Loehle-Conger
MVP SQL Server Reporting Services
"Eric" <jug@.nospam.nospam> wrote in message
news:uHCxZsEaFHA.1448@.TK2MSFTNGP09.phx.gbl...
> Hi,
> I have a table with a calculated column.
> There is also a footer which calculates the sum in a column.
> How I can calculate the sum from a caculated column?
>
> Ex: 2 10
> 5 25 <- calculated column
> --
> 7 35 <- calculated sum of column
> (SRS 2000 SP2)
>
> Thanks
> Eric
>

|||

Thanks for answering.
But now there's another problem: I can't use an aggregate function in the column. The column calculates the percentage of the first column relative to the total.
Any idea how I can do that in the report?
Thanks
Eric
"Bruce L-C [MVP]" <bruce_lcNOSPAM@.hotmail.com> schrieb im Newsbeitrag
news:eyqGY%23EaFHA.3712@.TK2MSFTNGP09.phx.gbl...
> The best way to do this is to created a calculated field. On the field
> list, do a right mouse click, add, click on calculated field. After that
> it is treated like any other field.
>
> --
> Bruce Loehle-Conger
> MVP SQL Server Reporting Services
> "Eric" <jug@.nospam.nospam> wrote in message
> news:uHCxZsEaFHA.1448@.TK2MSFTNGP09.phx.gbl...
>> Hi,
>> I have a table with a calculated column.
>> There is also a footer which calculates the sum in a column.
>> How I can calculate the sum from a caculated column?
>>
>> Ex: 2 10
>> 5 25 <- calculated column
>> --
>> 7 35 <- calculated sum of column
>> (SRS 2000 SP2)
>>
>> Thanks
>> Eric
>

|||

Hi Eric,
What aggregate function? You may refer to the sample report Company Sales.rdl for more information about how to sum the columns.
Sincerely yours,
Michael Cheng
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.

|||

Thanks for answering.
That is too simple an example. I have a column with values (data-bound). A second column should show the
percentage of the first column relative to the sum.
Example:
20 20%
80 80%
-- --
100 100%
To calculate the second column, I need to know the sum of the first
column.
How do I do that?
If the second column is a 'calculated field', I can't use an aggregate function
like 'Sum(column1)'.
thanks
Eric
"Michael Cheng [MSFT]" <v-mingqc@.online.microsoft.com> schrieb im
Newsbeitrag news:MDOIi7LbFHA.2476@.TK2MSFTNGXA01.phx.gbl...
> Hi Eric,
> What aggreate function? You may refer sample report Company Sales.rdl for
> more information about how to sum the columns.
>
> Sincerely yours,
> Michael Cheng
> Microsoft Online Partner Support
> When responding to posts, please "Reply to Group" via your newsreader so
> that others may learn and benefit from your issue.
> =====================================================
> This posting is provided "AS IS" with no warranties, and confers no
> rights.
>

|||

Hi Eric,
Thanks for your patience.
Aggregates of aggregates are not currently supported in Reporting Services, so you will need to find another way to accomplish this. For example, you could generate this data in the DataSet.
Sincerely yours,
Michael Cheng
Microsoft Online Partner Support
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
=====================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
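Generating the percentage in the DataSet, as suggested, is straightforward; a minimal Python sketch of the idea with invented values:

```python
# Compute the grand total once, then derive each row's percentage
# before the data ever reaches the report.
rows = [("A", 20), ("B", 80)]
total = sum(value for _, value in rows)
with_pct = [(key, value, 100.0 * value / total) for key, value in rows]
print(with_pct)  # [('A', 20, 20.0), ('B', 80, 80.0)]
```
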

Calculate Total Column

I need to calculate the Average by Location and then sum the Average for a Total that is displayed on a report. Following is my stored procedure so far:

SELECT tblLocn.LocnCode, tblLocn.LocnDesc,
#work.StartDate, #work.ZValue,
J1.ZAvg ............. Average by locn -- want the sum of all locns here
FROM tblLocn
LEFT JOIN #work ON tblLocn.LocnCode = #work.LocnCode
LEFT JOIN (SELECT #work.LocnCode,
ZAvg = Avg(#work.ZValue)
FROM #work
GROUP BY #work.LocnCode) J1 ON tblLocn.LocnCode = J1.LocnCode

How would I change this procedure to display either a) the total average on each row, or b) the location average on one row only, so that when I sum this field it is the correct value?
I am aware that I could create a work file with the total value and then join to that. Is there another alternative? Thanks

|||

J1.ZAvg is your average by location, so when you sum it on a report you will get the sum of all averages.

Bigger question: are you sure this is the operation you want? Your statistics will be skewed if your locations don't have an equal number of records. Consider:
Sum(Avg(1,2,3,4,5), Avg(6,7,8)) = 10

But the same values distributed across different locations yields this:
Sum(Avg(1,2,3), Avg(4,5,6,7,8)) = 8

If you just want the average per location, this value is constant across all locations since it applies to the entire enterprise, and can be calculated separately:
Sum(#work.ZValue)/Count(Locations)

blindman

|||

I do want the average by location as I have it coded. The problem with just summing the J1.ZAvg field is that the SELECT statement is not grouped by location but rather by date, so the J1.ZAvg value appears on every row for the location. If I sum that up, I get an inflated value. Make sense?

|||

If I change J1.ZAvg to Sum(J1.ZAvg), that would give me the total I want, except that it seems I cannot use Sum without a GROUP BY clause. In Access I used the DSUM aggregate function, which is not available in an Access Project. I may be missing something, as I'm fairly new to SQL Server.

|||

So calculate the value once, store it in a variable, and include it in your output select statement:

set @SumLocationAvgs = sum((select avg(ZValue) from #work group by LocnCode))

Note: the above is just pseudo-code. You'll have to write for your specification.

blindman

|||

Makes total sense! Thanks a bunch.

|||

Oops... I get the following message:

ERROR 130: Cannot perform an aggregate function on an expression containing an aggregate or a subquery

Set @SumLocationAvgs = Sum((SELECT Avg(#work.ZValue) FROM #work GROUP BY LocnCode))

|||

Try this instead:

Set @SumLocationAvgs =
(select Sum(SubQuery.LocnCodeAvg)
from (SELECT Avg(#work.ZValue) LocnCodeAvg FROM #work GROUP BY LocnCode) SubQuery)

|||

Exactly what I needed. Thank you.
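Both blindman's caveat and the final query can be checked numerically; a Python sketch using the example values from this thread:

```python
from statistics import mean

# The same values, split differently across locations, give different
# sums of per-location averages -- the skew blindman warned about.
assert mean([1, 2, 3, 4, 5]) + mean([6, 7, 8]) == 10
assert mean([1, 2, 3]) + mean([4, 5, 6, 7, 8]) == 8

# The accepted fix: average within each location, then sum the averages.
work = {"L1": [1, 2, 3, 4, 5], "L2": [6, 7, 8]}
sum_location_avgs = sum(mean(values) for values in work.values())
print(sum_location_avgs)
```
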

Tuesday, March 20, 2012

Calculate sum of SQL Top 10

I have an SQL statement which returns the Top 10 states with the number of visitors

SELECT TOP 10 Customer.State States, COUNT(Customer.state) Visitors
FROM [Customer] WHERE Customer.year = '2006'
GROUP BY Customer.state
ORDER BY COUNT(Customer.state) DESC

So far this is what I have

State | Visitors | %

MD  3415  27.2
PA  2154  17.2
NJ  1275  10.2
NY  1025   8.2
VA   813   6.5
MA   292   2.3
FL   256   2.0
DE   243   1.9
OH   241   1.9
CA   238   1.9

But what I need is to also calculate totals for the Visitors column in my SQL, like so:

MD  3415  27.2
PA  2154  17.2
NJ  1275  10.2
NY  1025   8.2
VA   813   6.5
MA   292   2.3
FL   256   2.0
DE   243   1.9
OH   241   1.9
CA   238   1.9
Total Top 10         9952  79.3
Total for All Years 12555  100

I tried using SUM but I was only getting one value and not the rest... So how can I accomplish this?

Thank you

You can play with ROLLUP and CUBE to get your result. Here is a sample to get you started:

SELECT ISNULL(state, 'top10'), SUM(mycount) AS top10Sum, SUM(myavg) AS top10avg
FROM tab1
GROUP BY state WITH ROLLUP

UNION ALL

SELECT 'all states', SUM(mycount) AS top10Sum, SUM(myavg) AS top10avg
FROM tab1

|||

Where would I place this query?

SELECT TOP 10 Customer.State States, COUNT(Customer.state) Visitors
FROM [Customer] WHERE Customer.year = '2006'
GROUP BY Customer.state
ORDER BY COUNT(Customer.state) DESC

|||

Something like this:

SELECT t3.state, t3.mycount1
FROM (
    SELECT TOP 10 State, COUNT(*) AS mycount1
    FROM Customer
    GROUP BY state
    ORDER BY COUNT(*) DESC
) t3
UNION
SELECT 'top10' AS state, SUM(t1.mycount1) AS mycount1
FROM (
    SELECT TOP 10 State, COUNT(*) AS mycount1
    FROM Customer
    GROUP BY state
    ORDER BY COUNT(*) DESC
) AS t1
UNION ALL
SELECT 'ALL', COUNT(customer) AS mycount1
FROM Customer

|||

This example you gave me is not working...

Basically I need a way to combine the following two SQL statements into one final result:

SELECT TOP 10 Customer.State States, COUNT(Customer.state) Visitors
FROM [Customer] WHERE Customer.year = '2006'
GROUP BY Customer.state
ORDER BY COUNT(Customer.state) DESC

SELECT 'Total Top 10', SUM(t1.Visitors)
FROM
(SELECT TOP 10 Customer.State States, COUNT(Customer.state) Visitors
FROM [Customer] WHERE Customer.year = '2006'
GROUP BY Customer.state
ORDER BY COUNT(Customer.state) DESC) t1

|||

declare @result table
(
States varchar(100),
Visitors int
)

insert into @result (States, Visitors)
SELECT TOP 10 Customer.State States, COUNT(Customer.state) Visitors
FROM [Customer] WHERE Customer.year = '2006'
GROUP BY Customer.state
ORDER BY COUNT(Customer.state) DESC

select *
from @result

union all

select 'Total Top 10', sum(Visitors) from @result
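The same top-N-plus-totals pattern, sketched in Python with invented visitor counts, to show what the UNION ALL produces:

```python
from collections import Counter

# Top-N states by visitors, then a "Total Top N" row and a grand total.
visits = Counter({"MD": 3415, "PA": 2154, "NJ": 1275, "NY": 1025, "VA": 813})
top3 = visits.most_common(3)

rows = list(top3)
rows.append(("Total Top 3", sum(n for _, n in top3)))
rows.append(("Total All", sum(visits.values())))
print(rows)
```
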

|||

Thanks, you're a life saver.

|||

Hello,

I don't know why it is not working for you since I don't have any data from you to test.

Here is something I used to test the script.

CREATE TABLE [dbo].[tab1$] ([state] [nvarchar](50), [customer] [nvarchar](50))

Sample data:

INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c1')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c2')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c3')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c4')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c5')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c6')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c7')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c8')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c9')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a1','c10')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b1')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b2')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b3')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b4')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b5')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b6')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b7')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b8')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b9')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b10')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b11')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b12')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b13')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a2','b14')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a3','c1')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a3','c2')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a3','c3')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a3','c4')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a3','c5')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d1')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d2')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d3')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d4')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d5')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d6')
INSERT INTO [tab1$] ([state],[customer]) VALUES ('a4','d7')

Here is the SQL script (I chose TOP 2 instead):

SELECT t3.state, t3.mycount1
FROM (
    SELECT TOP 2 State, COUNT(*) AS mycount1
    FROM tab1$
    GROUP BY state
    ORDER BY COUNT(*) DESC
) t3
UNION
SELECT 'top 10' AS state, SUM(t1.mycount1) AS mycount1
FROM (
    SELECT TOP 2 State, COUNT(*) AS mycount1
    FROM tab1$
    GROUP BY state
    ORDER BY COUNT(*) DESC
) AS t1
UNION ALL
SELECT 'ALL', COUNT(customer) AS mycount1
FROM tab1$

|||

When I try to use this @result table on the query below, I get the following error:

SELECT TOP (10) t1.City City,t1.State State,SUM(t1.Population) Population , SUM(t1.Visitors) Visitors
FROM (
SELECT Customer.zip Zipcode, COUNT(Zipcode) Visitors,Census.city,Census.State,Census.Population
FROM [Customer] JOIN [Census Test Data] Census ON Customer.zip = Census.zipcode
WHERE Customer.month = '8' AND Customer.year = '2006'
GROUP BY Customer.zip, Census.city,Census.State,Census.Population
) t1
GROUP BY t1.city,t1.State
ORDER BY Visitors DESC

ERROR: The select list for the INSERT statement contains more items than the insert list. The number of SELECT values must match the number of INSERT columns.

How do I make the values match?

Calculate percent based on SUM

I'd like to calculate the percent of the total for column DataValue. So if
the sum of column DataValue is 200 and the value for record A is 25, I'd
like to return 12.5%.
Based on my sample data I'd like to return:
A,12.5
B,25
C,12.5
D,50
CREATE TABLE TEST
(ID varchar(1),DataValue int)
INSERT INTO TEST (ID,DataValue)VALUES('A',25)
INSERT INTO TEST (ID,DataValue)VALUES('B',50)
INSERT INTO TEST (ID,DataValue)VALUES('C',25)
INSERT INTO TEST (ID,DataValue)VALUES('D',100)
Thanks...

|||

SELECT id, datavalue /
    (SELECT CAST(SUM(datavalue) AS REAL)
     FROM Test) * 100
FROM Test

David Portas
SQL Server MVP
--

|||

Try,
use northwind
go
CREATE TABLE TEST
(ID varchar(1),DataValue int)
INSERT INTO TEST (ID,DataValue)VALUES('A',25)
INSERT INTO TEST (ID,DataValue)VALUES('B',50)
INSERT INTO TEST (ID,DataValue)VALUES('C',25)
INSERT INTO TEST (ID,DataValue)VALUES('D',100)
go
select
a.[id],
(a.datavalue * 100.00) / nullif(b.sum_datavalue, 0) as col_percent
from
test as a
inner join
(
select sum(datavalue) from test
) as b(sum_datavalue)
on 1 = 1
go
drop table test
go
AMB
"Terri" wrote:

> I'd like to calculate the percent of the total for column DataValue. So if
> the sum of column DataValue is 200 and the values for record A is 25 I'd
> like to return 12.5%
> Based on my sample data I'd like to return:
> A,12.5
> B,25
> C,12.5
> D,50
> CREATE TABLE TEST
> (ID varchar(1),DataValue int)
> INSERT INTO TEST (ID,DataValue)VALUES('A',25)
> INSERT INTO TEST (ID,DataValue)VALUES('B',50)
> INSERT INTO TEST (ID,DataValue)VALUES('C',25)
> INSERT INTO TEST (ID,DataValue)VALUES('D',100)
> Thanks...
>
>

Calculate integral of time series

Hello,
I want to compute an integral of values in a table that has 2 columns:
Column A: Time-stamp (T)
Column B: value (KWH)
I need to compute the sum of (KWH*(T(n)-T(n-1)))
Could you suggest a way to do it in T-SQL?
thanks
Ami

|||

Ami Einav wrote:
>I need to compute the sum of (KWH*(T(n)-T(n-1)))
Add an autoincrementing id to your table, named idTbl.
Assuming Tbl is your table name:

SELECT Tbl.T, Tbl2.T, Tbl.KWH, Tbl.KWH * (Tbl.T - Tbl2.T) AS KWHt
FROM Tbl
INNER JOIN Tbl AS Tbl2 ON Tbl2.idTbl + 1 = Tbl.idTbl

You may need a datetime conversion to get the calculation to work.
In the next step, group/sum it together as you like.
Greetings,
J.C.

|||

Thanks Joe - that's exactly what I was looking for.
Ami
"Joe Care" <u18810@.uwe> wrote in message news:5c0163d5b2f02@.uwe...
> Ami Einav wrote:
> Ad an autoincrementing id to your Table named idTbl
> Assuming Tbl as Your Tablename
> SELECT Tbl.T, Tbl2.T, Tbl.KWH , Tbl.KWH*(Tbl.T-Tbl2.T) as KWHt
> FROM Tbl
> INNER JOIN Tbl as Tbl2 on Tbl2.idTbl+1 = Tbl.idTbl
> Maybe you a datetime - conversion to get the calculation work.
> in the step Goup/Sum it together as you like
> greetings,
> J.C.
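To make the pairwise T(n) - T(n-1) computation concrete, a Python sketch with invented timestamps (in hours) and readings:

```python
# Pair each timestamp with its predecessor, as the self-join does, and
# sum value * interval length. Each reading is attributed to the
# interval that ends at its timestamp.
times = [0.0, 1.0, 2.5, 4.0]
kwh = [5.0, 4.0, 6.0]      # one reading per interval between timestamps

integral = sum(k * (t1 - t0)
               for (t0, t1), k in zip(zip(times, times[1:]), kwh))
print(integral)  # 5*1.0 + 4*1.5 + 6*1.5 = 20.0
```
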

Monday, March 19, 2012

calculate a sum

Hello,
I have a large table with invoices, ~8 million entries,
with the structure customNr, ProductNr, SellPrice, InvoiceDate.
example
TInvoices
100001 22 23,22 19991129
100001 14 23,22 20051222
100001 22 23,22 20061230 *
100001 15 23,22 20051229
100001 11 23,22 20060101
100001 15 23,22 20060101
In a separate table I have a list of products for which I must calculate the
sum of SellPrice.
Example - only for 2 products (11 and 25):
TProducts4Sum
11
22
The period for which the sum must be calculated is in another table,
TPeriod4Sum (in this case the period 20.12.2005 - 31.12.2005):
20051220 20051231
The result must be: 23,22
because only this line matches the 2 conditions:
100001 22 23,22 20061230 *
What is the best way to get good performance in calculating this?
best regards

|||

I'm confused; first you say you want to calculate the sum of SellPrice for
products 11 and 25, then you flag one row for product 22 and say that's the
row that should be returned. Why? What do you really want to do?
Adam Machanic
Pro SQL Server 2005, available now
http://www.apress.com/book/bookDisplay.html?bID=457
--
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
> the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards

|||

Xavier,
> TProducts4Sum
> 11
> 22
I don't understand; do you want 11-25 or 11-22 in the output?
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
> the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards

|||

Just join the 3 tables with inner joins.
TInvoices.ProductNr = TProducts4Sum.ProductNr
TInvoices.InvoiceDate between TPeriod4Sum.BeginDate and TPeriod4Sum.EndDate
Then select everything from table TInvoices, just to make sure you are
returning the correct rows.
You can also use a subquery to select the list of products:
i.e. :
Where TInvoices.ProductNr in (select ProductNr from TProducts4Sum)
Depending on how much data you have, and your system setup, either one could
be faster. I suggest testing both approaches out and seeing what happens.
Try it each way for various numbers of products in the TProducts4Sum table.
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to have a good performance - in calculating this.
> best regards

|||

Hello, sorry for the confusion.
I'll try to explain the purpose.
All invoices for over 15 years are stored in the table TInvoices (~8 million
records).
I want to know what turnover (sum of SellPrice) was made with certain products
in a defined period, for each customer.
example - only for 2 Products (11 and 22)
TProducts4Sum
11
22
Example period:
TPeriod4Sum (in this case the period 20.12.2005 - 31.12.2005) - the
table has 2 VarChar columns with the values
20051220 20051231
if TInvoices has entries like
100001 22 20,00 19991129
100002 34 40,00 20051122
100003 22 20,00 20051226 * ok (ProdNr 22 and correct period)
100003 21 10,00 20051227
100003 13 30,00 20051228
100003 11 10,00 20051228 * ok (ProdNr 11 and correct period)
100004 22 20,00 20051230 * ok (ProdNr 22 and correct period)
100010 15 50,00 20060101
In the table we have more customer numbers:
100001 ... 100010
I want to calculate, for the products listed in the table
TProducts4Sum (normally ~250 product numbers), the sum of SellPrice for each
CustomerNr.
Result:
customer
100003 -> 30,00 (10,00 + 20,00)
100004 -> 20,00
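The expected result can be checked outside the database; a Python sketch over the example rows from this post:

```python
# Keep invoices whose product is in the list and whose date falls in
# the period, then sum SellPrice per customer. Dates compare correctly
# as strings because they are in YYYYMMDD form.
invoices = [
    (100001, 22, 20.00, "19991129"),
    (100002, 34, 40.00, "20051122"),
    (100003, 22, 20.00, "20051226"),
    (100003, 21, 10.00, "20051227"),
    (100003, 13, 30.00, "20051228"),
    (100003, 11, 10.00, "20051228"),
    (100004, 22, 20.00, "20051230"),
    (100010, 15, 50.00, "20060101"),
]
products = {11, 22}
start, end = "20051220", "20051231"

totals = {}
for customer, product, price, date in invoices:
    if product in products and start <= date <= end:
        totals[customer] = totals.get(customer, 0.0) + price

print(totals)  # {100003: 30.0, 100004: 20.0}
```
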
thanks

|||

Hello Jim,
I worked with joins, but it is not performant!
Now I tried something like this (note: the table names are not exactly the
same):
SELECT SoldTo, SUM(SellPrice) AS SumSellPrice
FROM Invoices
WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050630')
GROUP BY Material, SoldTo
HAVING (Material IN
(SELECT ProdNr
FROM ProdNr2ActionId
WHERE actionid = 46))
and for this calculation it takes ~12 seconds (output: 877 lines)
of the form
SoldTo SumSellPrice
100001 23456,55
....
In the period there are approx. 250000 records.
Is this time of 12 seconds normal, or could it be better?
thanks

|||

I meant to add in my original post: after you confirm that you are getting
all the correct rows, just select product and sum of price.
select TProducts4Sum.ProductNr, sum(TInvoices.SellPrice)
From
[tables, joins, and where clause here]
group by TProducts4Sum.ProductNr
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:68F484A6-D53A-4AF2-90C1-F3C66E3243B9@.microsoft.com...
> hello,
> i have a large table with invoices ~ 8 Mio entries,
> with a structure -> customNr,ProductNr,SellPrice,InvoiceDate
> example
> TInvoices
> 100001 22 23,22 19991129
> 100001 14 23,22 20051222
> 100001 22 23,22 20061230 *
> 100001 15 23,22 20051229
> 100001 11 23,22 20060101
> 100001 15 23,22 20060101
> in a separat table i have a list of products for wich i must calculate
the
> sum 4 SellPrice
> example - only for 2 Products (11 and 25)
> TProducts4Sum
> 11
> 22
> The period for which the sum must be calculated is in a other table
> TPeriod4Sum (in this case for the period 20.12.2005 - 31.12.2005)
> 20051220 20051231
> The result must be :23,22
> because only this line maches the 2 condition
> 100001 22 23,22 20061230 *
> What is the best way to get good performance in calculating this?
> best regards|||Xavier,
Can you post the code you used with the joins? With 250000 records you
should get very fast performance with either result set, although this is
also dependent on your indexes. In my original post I assumed that you had
indexes on your date and/or product fields. If not, this could be the
source of your problem.
Also, try changing your having clause below to be part of the where clause.
Having is generally used to evaluate aggregate values AFTER the group by is
processed, meaning your query will sum all products before filtering the
list. You probably want to make sure the list is filtered prior to
summarizing in order to speed things up.
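Applied to the query above, that change might look like the following sketch (note the end date is also corrected to '20050630', since June has only 30 days):

```sql
SELECT SoldTo, SUM(SellPrice) AS SumSellPrice
FROM Invoices
WHERE CreatedOn >= '20050601' AND CreatedOn <= '20050630'
  -- filter the product list BEFORE grouping, instead of in HAVING
  AND Material IN (SELECT ProdNr
                   FROM ProdNr2ActionId
                   WHERE actionid = 46)
GROUP BY SoldTo;
```

Dropping Material from the GROUP BY also gives one row per customer rather than one per customer/product pair, which appears to be what was wanted.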
After you have tried this, there are some other things you can look at to
try to speed things up, namely with indexes. If neither your date field nor
your product field are indexed in your invoices table, then the query will
run slowly no matter what.
Depending on your data, you want to make sure you are using the correct
index to minimize the result set.
I'm not certain how the indexes work with date ranges, but a between might
work better than two separate comparisons as you have now. I would see what
happens with indexes built on:
Date, Product
Product, Date
try one index, drop it, and try the other. See which one gives the best
results. Remember to try the results for different ranges and different
lists of products. One will work much better with small date ranges and the
other will work better with low numbers of products.
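The two candidate indexes might be created like this (a sketch; the index names are invented, and the column names follow the query earlier in the thread):

```sql
-- date-first: works best when the date range filters out the most rows
CREATE INDEX IX_Invoices_CreatedOn_Material ON Invoices (CreatedOn, Material);

-- product-first: works best when the product list filters out the most rows
CREATE INDEX IX_Invoices_Material_CreatedOn ON Invoices (Material, CreatedOn);
```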
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:4A425FCD-7E92-4FBD-A7AD-4DFFEAB2B283@.microsoft.com...
> hello Jim,
> i worked with with joins .... but it is not performant!
> Now i tryed somthing like (attention - the table names are not exactly the
> same ...)
> SELECT SoldTo, SUM(SellPrice) AS SumSellPrice
> FROM Invoices
> WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
> GROUP BY Material, SoldTo
> HAVING (Material IN
> (SELECT ProdNr
> FROM ProdNr2ActionId
> WHERE actionid = 46))
> and for this calculation it takes ~ 12 seconds (output 877 lines)
> of the form
> SoldTo SumSellPrice
> 100001 23456,55
> ....
> in the period are aprox 250000 records.
> Is this time of 12 seconds normal?, or could it be better?
> thanks
>|||for clarifications .....
In the table there are ~8 million records.
The condition
WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
matches approx. 250000 records.
I have indexes on SoldTo, CreatedOn, ProdNr and Material.
I have to display the SumSellPrice value for each customer on the webpage.
Because it takes too much time to do this on the fly, I decided to run this
calculation for all ~70000 customers twice a day and write the result to a
supplementary table, so that when a request comes in over the web frontend
I display the value from that supplementary table.
I am not sure what the best way is, performance-wise, to calculate
SumSellPrice for each customer for the predefined period of time and
product list.
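The twice-daily precomputation described above might be sketched like this (illustrative only; the cache table name is invented, and the period/product filters follow the earlier query):

```sql
-- Refresh the precomputed per-customer sums (run e.g. twice a day from a job).
-- SumSellPriceCache is an assumed table name, not from the thread.
TRUNCATE TABLE SumSellPriceCache;

INSERT INTO SumSellPriceCache (SoldTo, SumSellPrice)
SELECT SoldTo, SUM(SellPrice)
FROM Invoices
WHERE CreatedOn >= '20050601' AND CreatedOn <= '20050630'
  AND Material IN (SELECT ProdNr FROM ProdNr2ActionId WHERE actionid = 46)
GROUP BY SoldTo;
```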
It seems that the time needed for the calculation grows very quickly as I widen the
date interval: (CreatedOn >= '20050601') AND (CreatedOn <= '20050631').
I will try to use between.
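For reference, the BETWEEN form is equivalent to the two comparisons, with both bounds inclusive. Note that '20050631' is not a valid calendar date - June has 30 days - which matters if CreatedOn is a datetime rather than a character column:

```sql
WHERE CreatedOn BETWEEN '20050601' AND '20050630'
```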
The database is from my customer, so I must make a backup and restore it on
another system to check / drop indexes.
What do you mean by very fast performance ... what is a usual time
for such a calculation? The server is a Dual Xeon 900 MHz with 1 GB RAM.
thanks
"Jim Underwood" wrote:

> Xavier,
> Can you post the code you used with the joins? with 250000 records you
> should get very fast performance with either result set, although this is
> also dependent on your indexes. In my original post I assumed that you had
> indexes on your date and/or product fields. If not, this could be the
> source of your problem.
> Also, try changing your having clause below to be part of the where clause.
> Having is generally used to evaluate agregate values AFTER the group by is
> processed, meaning your query will sum all products before filtering the
> list. You probably want to make sure the list is filtered prior to
> summarizing in order to speed things up.
> After you have tried this, there are some other things you can look at to
> try to speed things up, namely with indexes. If neither your date field nor
> your product field are indexed in your invoices table, then the query will
> run slowly no matter what.
> Depending on your data, you want to make sure you are using the correct
> index to minimize the result set.
> I'm not certain how the indexes work with date ranges, but a between might
> work better than two seperate ranges as you have now. I would see what
> happens with indexes built on:
> Date, Product
> Product, Date
> try one index, drop it, and try the other. See which one gives the best
> results. Remember to try the results for different ranges and different
> lists of products. One will work much better with small date ranges and the
> other will work better with low numbers of products.
>
> "Xavier" <Xavier@.discussions.microsoft.com> wrote in message
> news:4A425FCD-7E92-4FBD-A7AD-4DFFEAB2B283@.microsoft.com...
>
>|||Xavier,
Retrieving 250 thousand of 8 million records is much, much different than
what I thought you had. It is possible that 12 seconds may be reasonable,
but I would think you could get it to return the data much faster.
Read through and try to reply to my questions at the end.
First, try creating an extra index on just:
CreatedOn,ProdNr
It is worth the time it takes to create a backup if this will fix your
problem.
I believe if the first part of your index is SoldTo, and you are not using
this field as a filter, then the index cannot be used and you are doing a
full table scan through all 8 million records. Adding this index should,
hopefully, allow you to process only 3 percent of the records, which should
give you a performance boost.
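To check whether the new index is actually being used, one option is to compare I/O and timing before and after creating it - for example:

```sql
-- show per-table logical reads and elapsed/CPU time for each statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the summary query here, then check the Messages output;
-- a large drop in logical reads on Invoices suggests the index is being used
```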
The increase that you are seeing when you increase the date interval is
probably the time it takes the database to summarize the data. Summarizing
data can be very processor and memory intensive.
Now, a few questions...
1. Did you move your HAVING clause to your WHERE clause and see what the
effects are? I would expect this to give you significant improvements, as
the database would filter the data before summarizing it, thus cutting down
on memory and CPU usage.
2. If you use only the product as a filter, how many records would be
returned? If this will typically be less than the 250000 that the date
filter returns, then you will want your index to be set up as:
ProdNr, CreatedOn
3. Please post the SQL you used with the join. Although I suspect the
problem is with your indexes, I would like to see how the SQL differs
between the two versions. It might help me to better understand how SQL
Server optimizes queries.
"Xavier" <Xavier@.discussions.microsoft.com> wrote in message
news:F50649FB-C631-4F15-9797-00DBE7CD9FE1@.microsoft.com...
> for clarifications .....
> in the table are ~ 8Mio records.
> the condition
> WHERE (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
> is valid for 250000 records
> I have index on SoldTo,CreatedOn,ProdNr and Material.
> I have to display that value SumSellPrice for each customer on the
webpage.
> Because it takes to much time to make this on the fly - i decided to make
> this calculation for all customers ~70000 two times the day and write the
> result in a suplimentary table, so that if over the web-frontend i get a
> request i display the value from that suplimentary table.
> I am not sure what is the best way to have best performances to calculate
> each SumSellPrice for each customer for the predefined period of time and
> product list
> It seams that the needed time for calculation grows very much as i add the
> date interval .. (CreatedOn >= '20050601') AND (CreatedOn <= '20050631')
> I will try to use between.
> The database from my customer - so i must make a bakup and restore on a
> other system to check / drop indexes.
> What do you mean with very fast performance ... how much is a usual time
> for such a calculation - the server is a (Dual Xeon 900MHz 1GB Ram)
> thanks
>
> "Jim Underwood" wrote:
>

Thursday, March 8, 2012

Caching doesn't seem to be working

I have a small Cube (about 100MB size) with about 10 dimensions.

I have created YTD calculated members (few of them) like this:

SUM
( PeriodsToDate (
[end_date].[Financial].[FYear],
TAIL (EXISTING [end_date].[Financial].[FYYYYMM].Members, 1).item(0)
),
[Measures].[LOS]
)

Now if I do a simple MDX like this:

SELECT
NON EMPTY ([end_date].[FYYYYMM Attribute].[200708]) ON COLUMNS,
NON EMPTY ([specialty].[SubSpecialtyDesc Attribute].allmembers,
           [purchase_unit].[PUC].allmembers) ON ROWS
FROM [cube_contract_reporting]
WHERE (
  [Measures].[ActualRawVolumeYTD],
  [purchase_unit].[WIES_PU].[WIES PU].&[No],
  [end_date].[Financial].[FYear].&[2007],
  [purchaser].[PHFundedFlag Attribute].&[Yes]
)

The first run returns the result in 8 minutes, which is long; however, the second run takes 50 seconds, which is okay.

If I run something like this:

SELECT
NON EMPTY ([end_date].[FYYYYMM Attribute].allmembers) ON COLUMNS,
NON EMPTY ([specialty].[SubSpecialtyDesc Attribute].allmembers,
           [purchase_unit].[PUC].allmembers) ON ROWS
FROM [cube_contract_reporting]
WHERE (
  [Measures].[ActualRawVolumeYTD],
  [purchase_unit].[WIES_PU].[WIES PU].&[No],
  [end_date].[Financial].[FYear].&[2007],
  [purchaser].[PHFundedFlag Attribute].&[Yes]
)

it takes a few hours for the first run and about an hour for the second run...

is this a caching problem? or am I doing something wrong?

I have created many aggregations and tried every trick I could think of or find online ... nothing changed the performance!

Actually, the caching does appear to be working, as the second runs are all noticeably faster than the initial run. It appears to be the calculation that is slowing things down. There are some things that can be done to tune YTD-style calculations, but they usually rely on having a natural date hierarchy. We need to know a few things about your cube to see how best to tune it.

Is [LOS] a base measure or a calculation?

How fast do the queries run if you use a base measure instead of the YTD calculation?

Does the [end_date].[FYYYYMM Attribute] attribute have a relationship to the [end_date].[FYear] attribute?

Do the [purchase_unit].[WIES_PU] and [purchase_unit].[PUC] attributes have a relationship defined between them?

|||

Darren Gosbell wrote:

Actually, the caching does appear to be working, as the second runs are all noticeably faster than the initial run. It appears to be the calculation that is slowing things down. There are some things that can be done to tune YTD-style calculations, but they usually rely on having a natural date hierarchy. We need to know a few things about your cube to see how best to tune it.

Is [LOS] a base measure or a calculation?

How fast do the queries run if you use a base measure instead of the YTD calculation?

Does the [end_date].[FYYYYMM Attribute] attribute have a relationship to the [end_date].[FYear] attribute?

Do the [purchase_unit].[WIES_PU] and [purchase_unit].[PUC] attributes have a relationship defined between them?

I didn't receive the email notification, so I didn't know of your reply - sorry for the late response.

[LOS] is a base measure; however, LOSYTD is a calculated member (one of about 20 YTD fields).

How fast on base measures: very fast - 6 seconds for the very same query with LOS instead of LOSYTD.

There is a Financial Hierarchy that holds the fiscal year dates FYear-> FQTR->FYYYYMM->Date

No relationship exists between WIES_PU and PUC.

Also, one "interesting" note: I noticed that if I have only ONE YTD calculation in the cube, then the first run takes long (say 50 min.) and the second run takes about 50 seconds (which is long but bearable).

Thank you for helping me out.

|||

Yes, there seems to have been an issue with the notifications. I got none for a number of days, and then over 20 of them came through today.

The number of YTD calculations in the cube should not really make a difference; I can't think why this should impact performance, as calculations are only executed as they are referenced. This is a bit of a concern, but I can't think what could be causing this behaviour.

Seeing that you do have relationships (and presumably a user hierarchy) defined on the Financial date, you might be able to optimize it using the following technique http://cwebbbi.spaces.live.com/Blog/cns!1pi7ETChsJ1un_2s41jm9Iyg!111.entry. This basically "walks" up the date hierarchy, so that if you are on month 11, rather than adding up 11 months, it adds up 3 quarters and 2 months, which can significantly speed up your calculation. (although SP2 might do this without a code change)

Are you on SP2? There were some performance enhancements particularly aimed at running sum and YTD style calculations that may help significantly (see: http://www.sqljunkies.com/WebLog/mosha/archive/2006/11/17/rsum_performance.aspx)

It "feels" as if there are not appropriate aggregations at the month level and that the code is being forced to go down to the leaf level. The SP2 fix mentioned above may help here, as may running the usage-based optimisation wizard.

Also, just looking back at your calculation. Do you really need the EXISTING operator in there? Neither of your sample queries appeared to require it and it would probably run much faster with a reference to currentmember.

Code Snippet

SUM
( PeriodsToDate (
[end_date].[Financial].[FYear],
[end_date].[Financial].[FYYYYMM].CurrentMember
),
[Measures].[LOS]
)

This of course gives you a relative YTD, but if you want to replicate the fixed year-to-date that I think you probably get with the Existing function, then you could do something like the following. I don't think this would handle the ALL member properly, but you could fix that with an appropriate SCOPE statement.

Code Snippet

SUM
( PeriodsToDate (
[end_date].[Financial].[FYear],
Tail([end_date].[Financial].[FYYYYMM].CurrentMember.Siblings,1).item(0)
),
[Measures].[LOS]
)

|||

Hi,

Thank you for your help. I will certainly push for SP2 as it seems to be a good solution.

The reason behind using "existing" is to allow the user to multi-select. In a common scenario, a business user would ask what the YTD was for a particular month in a particular year, and then try another month, and another... etc.

The first snippet you suggested would fail on multi-selects because it can't determine the current member, and I suppose the second will too, but I'm not sure.

I don't know what is happening BUT I did the following:

1. Removed a dimension with a huge set of records that we used for testing only (that is not used in the MDX AT ALL)

2. Partitioned the data based on financial year.

The same MDX (no change) now does the first run in 30 min. and the second in 1 min.

I will see if SP2 further brings this down and let you know. Thanks for your help!

|||

No, neither of the calcs I suggested in the previous post will work in a multi-select scenario. They will work with multiple members on the axis, but not if you have multiple months in a set in the where clause. It was just that your sample query showed selection by year, so I thought maybe the multi-select capability might not be strictly required.

Adding partitioning makes some sense, as your queries were only hitting a single year, so this would cause the storage engine to scan less data. Even though SSAS does not require you to set the data slice for a partition, there has been some evidence recently that doing so can improve performance, so you might want to make sure you are doing this.
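For example, for the partition holding financial year 2007, the partition's Slice property could be set to an MDX member expression like the following (assuming the partition really contains only that year's data):

```mdx
[end_date].[Financial].[FYear].&[2007]
```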

Removing a large dimension that was not part of the query would also reduce the overall size of the cube, but it may also point to the fact that the query might not be finding appropriate aggregations. The updated samples with SP2 include an aggregation designer, and while you would want to be really careful about manually designing your own aggregations, this tool does a good job of presenting the existing aggregations in a GUI, which helps you easily see which attributes are included in the designed aggregations.

One final suggestion that I just thought of: sometimes breaking your calculation into smaller pieces can help, as the individual bits can be cached separately by the formula engine. So breaking out the finding of the last month member from the actual PeriodsToDate sum might speed things up too. It's hard to know for sure whether this will have much of an impact, but it should be fairly easy for you to test.

Code Snippet

CREATE MEMBER CURRENTCUBE.[end_date].[Financial].[CurrentEndMonth]

AS TAIL (EXISTING [end_date].[Financial].[FYYYYMM].Members, 1).Item(0);

CREATE MEMBER CURRENTCUBE.[Measures].[LOSYTD]

AS SUM
   ( PeriodsToDate (
       [end_date].[Financial].[FYear],
       [end_date].[Financial].[CurrentEndMonth]
     ),
     [Measures].[LOS]
   );