Difference in results of queries on same data

All posts relating to Oracle PL/SQL development.

Moderator: Tim...

Difference in results of queries on same data

Postby Salman » Tue Jun 11, 2013 9:37 am

Hi Tim,
I faced this problem on SQLServer, so i though it might be some bug (because I could not find out why I am facing this problem), but, then I moved data to Oracle, but I see same problem here, which means there is some problem with my data somewhere or with the queries which I have.
I have a table D_CUSTOMER which contains data of my customers based on a membership card. So a single customer may have multiple rows here if he has multiple membership cards.
Then I have a FURNITURE table which contains transactions by the customers.
I execute following query to fetch the data from y FURNITURE table to find out the number of unique customers (removed duplication), number of visits done by these customers (distinct RECEIPTNO will return us this because each visit will have a distinct receipt number) and then total amount spent by all customers.
Code: Select all
select  COUNT(DISTINCT IDNO) AS UNIQ, COUNT(distinct RECEIPTNO) AS VISIT, SUM(AMT_TXN) AS SPENT
 from FURNITURE
where trandate_period_skey between 946 and 1126
UNIQ         VISIT        SPENT
148269   265907   5551854.18


Now I want same information based on gender of the customers, so i used following query
Code: Select all
select GENDER, COUNT(DISTINCT A.IDNO) AS UNQI, COUNT(distinct RECEIPTNO) AS VISIT, SUM(AMT_TXN) AS SPEND
from FURNITURE A,
(SELECT IDNO, MAX(GENDER) AS GENDER FROM D_CUSTOMER WHERE MEMBERTYPE = 'PA' GROUP BY IDNO) b
where TRANDATE_PERIOD_SKEY between 946 and 1126
and a.IDNO = b.IDNO
group by gender

GENDER  UNIQ    VISITS  SPENT
1   14   19   448.61
F   101009   183629   3770651.08
M   43988   76702   1652281.4
P99   3055   5237   121099.35
~   203   344   7373.74

FURNITURE table contains data only from members with MEMBERTYPE = 'PA'. Here we are using MAX(GENDER) because there are multiple entries for the same customer. So it is supposed to pick only one entry against a CUSTOMER using MAX function.

Now the problem is that if we sum the values under VISIT column, it should be 265907 as returned by the first query, but, after adding values for VISIT column, the actual value is 265931.
Can you please help me to understand that why I am having this difference?
Thanks

Salman
Salman
Senior Advisor
 
Posts: 752
Joined: Tue Nov 02, 2004 8:08 am
Location: Pakistan

Re: Difference in results of queries on same data

Postby Tim... » Tue Jun 11, 2013 10:22 am

Hi.

If I'm understanding your issue correctly, the cause of the problem is very simple.

Here is your select list:

Code: Select all
select GENDER, COUNT(DISTINCT A.IDNO) AS UNQI, COUNT(distinct RECEIPTNO) AS VISIT, SUM(AMT_TXN) AS SPEND


Col1 : You probably only have two genders M/F, unless you have unknown.
Col2 : Count of the number of distinct IDNO values within the gender.
Col3 : Count of the number of distinct RECEIPTNO values within the gender.
Col4 : Sum of all AMT_TXN values within the gender. Not just the sum for distinct receipts.

I believe you are expecting just the sum for distinct reciepts, but you are actually asking for the sum of all receipts. I believe this is your problem, assuming I've not misunderstood your question.

Cheers

Tim...
Tim...
Oracle ACE Director
Oracle ACE of the Year 2006 - Oracle Magazine Editors Choice Awards
OakTable Member
OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
OCP Advanced PL/SQL Developer
Oracle Database: SQL Certified Expert
My website: http://www.oracle-base.com
My blog: http://www.oracle-base.com/blog
Tim...
Site Admin
 
Posts: 17933
Joined: Mon Nov 01, 2004 5:56 pm
Location: England, UK

Re: Difference in results of queries on same data

Postby Salman » Wed Jun 12, 2013 1:58 am

Thanks for your reply Tim,
Let me explain my problem in even simpler way. Let me post my two queries once againg.

First query
Code: Select all
select  COUNT(DISTINCT IDNO) AS UNIQ, COUNT(distinct RECEIPTNO) AS VISIT
 from FURNITURE
where trandate_period_skey between 946 and 1126
UNIQ         VISIT
148269       265907


Here we see that there are total 148269 distinct customer who visited total 265907 times for shoping of furniture. Each distinct receiptno shows a single visit.

Second query is simply involving D_CUSTOMER table to find out that how many visits were from males and how many were from females
Code: Select all
select GENDER, COUNT(DISTINCT A.IDNO),COUNT(distinct RECEIPTNO) AS VISIT
from FURNITURE A,
(SELECT IDNO, MAX(GENDER) AS GENDER FROM D_CUSTOMER WHERE MEMBERTYPE = 'PA' GROUP BY IDNO) b
where TRANDATE_PERIOD_SKEY between 946 and 1126
and a.IDNO = b.IDNO
group by gender

GENDER     UNIQ    VISIT

F   103876   188573
M   44393   77357


Now here comes the problem. If you sum values under UNIQ column, total is 148269 (103876 Females and 44393 Males), which is same as the value in the result of first query. But, if we sum the values under VISIT column, 188573 visits by Females and 77357 visits by Males and total is 265930 which is different from the value of COUNT(DISTINCT RECEIPT) in first query because first query says total visits are 265907.
I belive now I am able to explain it correctly. When we do grouping by GENDER, value of total visits is different then if we count total visits without having involved GENDER from D_CUsTOMER table.

Col1 : You probably only have two genders M/F, unless you have unknown.
I have either F or M as gender.

Col2 : Count of the number of distinct IDNO values within the gender.
In first query, just count all distinct IDNO. This will tell us how many unique customers did shopping.
In second query, it counts based on gender.

Col3 : Count of the number of distinct RECEIPTNO values within the gender.
In first query, just count all distinct RECEIPTNO which will tell us total visits by all Males and Females (without gender specified in the query.
In second query, it counts visits based on gender.


Col4 : Sum of all AMT_TXN values within the gender. Not just the sum for distinct receipts.
This is no longer relavant as i have removed this part from my queries for simplicity

Salman
Salman
Senior Advisor
 
Posts: 752
Joined: Tue Nov 02, 2004 8:08 am
Location: Pakistan

Re: Difference in results of queries on same data

Postby Salman » Wed Jun 12, 2013 3:34 am

Hi
I have got the solution. Actually I had same receipt number issued to two different customers, one make and one female and hence while grouping, this type of receipts were being counted for both genders and hence total was greater during grouping.

Case is closed.

Salman
Salman
Senior Advisor
 
Posts: 752
Joined: Tue Nov 02, 2004 8:08 am
Location: Pakistan

Re: Difference in results of queries on same data

Postby Tim... » Wed Jun 12, 2013 7:29 am

:)
Tim...
Oracle ACE Director
Oracle ACE of the Year 2006 - Oracle Magazine Editors Choice Awards
OakTable Member
OCP DBA 7.3, 8, 8i, 9i, 10g, 11g
OCP Advanced PL/SQL Developer
Oracle Database: SQL Certified Expert
My website: http://www.oracle-base.com
My blog: http://www.oracle-base.com/blog
Tim...
Site Admin
 
Posts: 17933
Joined: Mon Nov 01, 2004 5:56 pm
Location: England, UK


Return to Oracle SQL and PL/SQL Development

Who is online

Users browsing this forum: No registered users and 5 guests