Home » SQL & PL/SQL » SQL & PL/SQL » Grouping with the previous one if identical
Grouping with the previous one if identical [message #687314] Tue, 21 February 2023 06:46 Go to next message
questvba
Messages: 15
Registered: July 2022
Junior Member
Hi Forum,

With the following query

WITH MYDATA as
	(SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
	SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
	SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
	SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
	SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
	
	SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
	SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
	SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
	SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
	SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
	SELECT 3017479, 60914, 202012, 184.25 FROM dual 	
	)	

SELECT *
FROM MYDATA
;


I have this result (left side)
In file 3017479, I have two children (40818 and 60914) who receive amounts that may be identical or different depending on certain circumstances.

/forum/fa/14692/0/


The aim would be to be able to group the contiguous identical amounts by file, by child. This would look like this (right side):

/forum/fa/14692/0/

For child 40818, the amount is the same in 202004 and 202008 but as there was a break in 202007, they are two different groups.
DATES_TART is the MIN of the grouping and DATE_END is the MAX of the grouping.

At the moment I have tried with LAG, LEAD, FIRST_VALUE but I can't get the desired result.

If you have any questions...
If you have a solution, I'm interested.

Re: Grouping with the previous one if identical [message #687315 is a reply to message #687314] Tue, 21 February 2023 08:05 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
SQL> WITH MYDATA as
  2     (SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
  3     SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
  4     SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
  5     SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
  6     SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
  7     SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
  8     SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
  9     SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
 10     SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
 11     SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
 12     SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
 13     SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
 14     SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
 15     SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
 16     SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
 17     SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
 18     SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
 19     SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
 20     SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
 21     SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
 22     SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
 23     SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
 24     SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
 25     SELECT 3017479, 60914, 202012, 184.25 FROM dual
 26     ),
 27    t as (
 28      select filenb, child, date_start, amount,
 29             case
 30               when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
 31                 then row_number() over (partition by filenb, child order by date_start)
 32             end rn
 33      from mydata
 34    ),
 35    grouping as (
 36      select filenb, child, date_start, amount,
 37             max(rn) over (partition by filenb, child order by date_start) grp
 38      from t
 39    )
 40  select filenb, child, min(date_start) date_start, max(date_start) date_end, amount
 41  from grouping
 42  group by filenb, child, amount, grp
 43  order by filenb, child, date_start
 44  /
    FILENB      CHILD DATE_START   DATE_END     AMOUNT
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     202002     204.66
   3017479      40818     202003     202003     208.75
   3017479      40818     202004     202006     123.64
   3017479      40818     202007     202007     174.64
   3017479      40818     202008     202012     123.64
   3017479      60914     202001     202003     210.29
   3017479      60914     202004     202006     163.23
   3017479      60914     202007     202007     184.25
   3017479      60914     202008     202011     163.23
   3017479      60914     202012     202012     184.25

10 rows selected.
Re: Grouping with the previous one if identical [message #687316 is a reply to message #687315] Tue, 21 February 2023 08:18 Go to previous messageGo to next message
questvba
Messages: 15
Registered: July 2022
Junior Member
Merci Michel pour la solution.
J'examine pour comprendre et je reviens si jamais j'ai besoin d'un éclaircissement.
Re: Grouping with the previous one if identical [message #687317 is a reply to message #687315] Tue, 21 February 2023 08:45 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
WITH MYDATA AS (
                SELECT 3017479 FILENB, 40818 CHILD, 202001 DATE_START, 204.66 AMOUNT FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202002, 204.66 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202003, 208.75 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202004, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202005, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202006, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202007, 174.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202008, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202009, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202010, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202011, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 40818, 202012, 123.64 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202001, 210.29 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202002, 210.29 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202003, 210.29 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202004, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202005, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202006, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202007, 184.25 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202008, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202009, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202010, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202011, 163.23 FROM DUAL UNION ALL
                SELECT 3017479, 60914, 202012, 184.25 FROM DUAL
               ),
          T AS (
                SELECT  FILENB,
                        CHILD,
                        DATE_START,
                        AMOUNT,
                        DATE_START - DENSE_RANK() OVER(PARTITION BY FILENB,CHILD,AMOUNT ORDER BY DATE_START) GRP
                  FROM  MYDATA
               )
SELECT  FILENB,
        CHILD,
        MIN(DATE_START) DATE_START,
        MAX(DATE_START) DATE_END,
        AMOUNT
  FROM  T
  GROUP BY FILENB,
           CHILD,
           AMOUNT,
           GRP
  ORDER BY FILENB,
           CHILD,
           DATE_START
/

    FILENB      CHILD DATE_START   DATE_END     AMOUNT
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     202002     204.66
   3017479      40818     202003     202003     208.75
   3017479      40818     202004     202006     123.64
   3017479      40818     202007     202007     174.64
   3017479      40818     202008     202012     123.64
   3017479      60914     202001     202003     210.29
   3017479      60914     202004     202006     163.23
   3017479      60914     202007     202007     184.25
   3017479      60914     202008     202011     163.23
   3017479      60914     202012     202012     184.25

10 rows selected.

SQL>
SY.
Re: Grouping with the previous one if identical [message #687318 is a reply to message #687317] Tue, 21 February 2023 09:04 Go to previous messageGo to next message
questvba
Messages: 15
Registered: July 2022
Junior Member
Waouw, Sy.
Another solution. Thank you. I am looking into it.
Re: Grouping with the previous one if identical [message #687319 is a reply to message #687318] Tue, 21 February 2023 09:58 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

The first subquery "t" adds a tag to each when a new "group" starts:
SQL> WITH MYDATA as
  2     (SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
  3     SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
  4     SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
  5     SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
  6     SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
  7     SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
  8     SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
  9     SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
 10     SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
 11     SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
 12     SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
 13     SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
 14     SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
 15     SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
 16     SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
 17     SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
 18     SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
 19     SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
 20     SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
 21     SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
 22     SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
 23     SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
 24     SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
 25     SELECT 3017479, 60914, 202012, 184.25 FROM dual
 26     )
 27      select filenb, child, date_start, amount,
 28             case
 29               when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
 30                 then row_number() over (partition by filenb, child order by date_start)
 31             end rn
 32      from mydata
 33  order by filenb, child, date_start
 34  /
    FILENB      CHILD DATE_START     AMOUNT         RN
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     204.66          1
   3017479      40818     202002     204.66
   3017479      40818     202003     208.75          3
   3017479      40818     202004     123.64          4
   3017479      40818     202005     123.64
   3017479      40818     202006     123.64
   3017479      40818     202007     174.64          7
   3017479      40818     202008     123.64          8
   3017479      40818     202009     123.64
   3017479      40818     202010     123.64
   3017479      40818     202011     123.64
   3017479      40818     202012     123.64
   3017479      60914     202001     210.29          1
   3017479      60914     202002     210.29
   3017479      60914     202003     210.29
   3017479      60914     202004     163.23          4
   3017479      60914     202005     163.23
   3017479      60914     202006     163.23
   3017479      60914     202007     184.25          7
   3017479      60914     202008     163.23          8
   3017479      60914     202009     163.23
   3017479      60914     202010     163.23
   3017479      60914     202011     163.23
   3017479      60914     202012     184.25         12

24 rows selected.
The second one "grouping" spreads this tag to each member of the group:
SQL> break on grp skip 1 dup
SQL> WITH MYDATA as
  2     (SELECT 3017479 AS filenb, 40818 AS child, 202001 AS date_start, 204.66 AS amount FROM dual UNION ALL
  3     SELECT 3017479, 40818, 202002, 204.66 FROM dual UNION ALL
  4     SELECT 3017479, 40818, 202003, 208.75 FROM dual UNION ALL
  5     SELECT 3017479, 40818, 202004, 123.64 FROM dual UNION ALL
  6     SELECT 3017479, 40818, 202005, 123.64 FROM dual UNION ALL
  7     SELECT 3017479, 40818, 202006, 123.64 FROM dual UNION ALL
  8     SELECT 3017479, 40818, 202007, 174.64 FROM dual UNION ALL
  9     SELECT 3017479, 40818, 202008, 123.64 FROM dual UNION ALL
 10     SELECT 3017479, 40818, 202009, 123.64 FROM dual UNION ALL
 11     SELECT 3017479, 40818, 202010, 123.64 FROM dual UNION ALL
 12     SELECT 3017479, 40818, 202011, 123.64 FROM dual UNION ALL
 13     SELECT 3017479, 40818, 202012, 123.64 FROM dual UNION ALL
 14     SELECT 3017479, 60914, 202001, 210.29 FROM dual UNION ALL
 15     SELECT 3017479, 60914, 202002, 210.29 FROM dual UNION ALL
 16     SELECT 3017479, 60914, 202003, 210.29 FROM dual UNION ALL
 17     SELECT 3017479, 60914, 202004, 163.23 FROM dual UNION ALL
 18     SELECT 3017479, 60914, 202005, 163.23 FROM dual UNION ALL
 19     SELECT 3017479, 60914, 202006, 163.23 FROM dual UNION ALL
 20     SELECT 3017479, 60914, 202007, 184.25 FROM dual UNION ALL
 21     SELECT 3017479, 60914, 202008, 163.23 FROM dual UNION ALL
 22     SELECT 3017479, 60914, 202009, 163.23 FROM dual UNION ALL
 23     SELECT 3017479, 60914, 202010, 163.23 FROM dual UNION ALL
 24     SELECT 3017479, 60914, 202011, 163.23 FROM dual UNION ALL
 25     SELECT 3017479, 60914, 202012, 184.25 FROM dual
 26     ),
 27    t as (
 28      select filenb, child, date_start, amount,
 29             case
 30               when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
 31                 then row_number() over (partition by filenb, child order by date_start)
 32             end rn
 33      from mydata
 34    )
 35      select filenb, child, date_start, amount,
 36             max(rn) over (partition by filenb, child order by date_start) grp
 37      from t
 38  order by filenb, child, date_start
 39  /
    FILENB      CHILD DATE_START     AMOUNT        GRP
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     204.66          1
   3017479      40818     202002     204.66          1

   3017479      40818     202003     208.75          3

   3017479      40818     202004     123.64          4
   3017479      40818     202005     123.64          4
   3017479      40818     202006     123.64          4

   3017479      40818     202007     174.64          7

   3017479      40818     202008     123.64          8
   3017479      40818     202009     123.64          8
   3017479      40818     202010     123.64          8
   3017479      40818     202011     123.64          8
   3017479      40818     202012     123.64          8

   3017479      60914     202001     210.29          1
   3017479      60914     202002     210.29          1
   3017479      60914     202003     210.29          1

   3017479      60914     202004     163.23          4
   3017479      60914     202005     163.23          4
   3017479      60914     202006     163.23          4

   3017479      60914     202007     184.25          7

   3017479      60914     202008     163.23          8
   3017479      60914     202009     163.23          8
   3017479      60914     202010     163.23          8
   3017479      60914     202011     163.23          8

   3017479      60914     202012     184.25         12


24 rows selected.
Then the last one to get each group with their min and max.
Re: Grouping with the previous one if identical [message #687320 is a reply to message #687318] Tue, 21 February 2023 11:26 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Note that Solomon's solution works only if all dates are filled...
... unless you specify that if some dates are missing this is counted as a "break" even if amount does not change, in this later case mine does not work.

For instance, I modified the dates of last 4 rows:
SQL> WITH MYDATA AS (
  2                  SELECT 3017479 FILENB, 40818 CHILD, 202001 DATE_START, 204.66 AMOUNT FROM DUAL UNION ALL
  3                  SELECT 3017479, 40818, 202002, 204.66 FROM DUAL UNION ALL
  4                  SELECT 3017479, 40818, 202003, 208.75 FROM DUAL UNION ALL
  5                  SELECT 3017479, 40818, 202004, 123.64 FROM DUAL UNION ALL
  6                  SELECT 3017479, 40818, 202005, 123.64 FROM DUAL UNION ALL
  7                  SELECT 3017479, 40818, 202006, 123.64 FROM DUAL UNION ALL
  8                  SELECT 3017479, 40818, 202007, 174.64 FROM DUAL UNION ALL
  9                  SELECT 3017479, 40818, 202008, 123.64 FROM DUAL UNION ALL
 10                  SELECT 3017479, 40818, 202009, 123.64 FROM DUAL UNION ALL
 11                  SELECT 3017479, 40818, 202010, 123.64 FROM DUAL UNION ALL
 12                  SELECT 3017479, 40818, 202011, 123.64 FROM DUAL UNION ALL
 13                  SELECT 3017479, 40818, 202012, 123.64 FROM DUAL UNION ALL
 14                  SELECT 3017479, 60914, 202001, 210.29 FROM DUAL UNION ALL
 15                  SELECT 3017479, 60914, 202002, 210.29 FROM DUAL UNION ALL
 16                  SELECT 3017479, 60914, 202003, 210.29 FROM DUAL UNION ALL
 17                  SELECT 3017479, 60914, 202004, 163.23 FROM DUAL UNION ALL
 18                  SELECT 3017479, 60914, 202005, 163.23 FROM DUAL UNION ALL
 19                  SELECT 3017479, 60914, 202006, 163.23 FROM DUAL UNION ALL
 20                  SELECT 3017479, 60914, 202007, 184.25 FROM DUAL UNION ALL
 21                  SELECT 3017479, 60914, 202008, 163.23 FROM DUAL UNION ALL
 22                  SELECT 3017479, 60914, 202010, 163.23 FROM DUAL UNION ALL
 23                  SELECT 3017479, 60914, 202012, 163.23 FROM DUAL UNION ALL
 24                  SELECT 3017479, 60914, 202013, 163.23 FROM DUAL UNION ALL
 25                  SELECT 3017479, 60914, 202014, 184.25 FROM DUAL
 26                 ),
 27            T AS (
 28                  SELECT  FILENB,
 29                          CHILD,
 30                          DATE_START,
 31                          AMOUNT,
 32                          DATE_START - DENSE_RANK() OVER(PARTITION BY FILENB,CHILD,AMOUNT ORDER BY DATE_START) GRP
 33                    FROM  MYDATA
 34                 )
 35  SELECT  FILENB,
 36          CHILD,
 37          MIN(DATE_START) DATE_START,
 38          MAX(DATE_START) DATE_END,
 39          AMOUNT
 40    FROM  T
 41    GROUP BY FILENB,
 42             CHILD,
 43             AMOUNT,
 44             GRP
 45    ORDER BY FILENB,
 46             CHILD,
 47             DATE_START
 48  /
    FILENB      CHILD DATE_START   DATE_END     AMOUNT
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     202002     204.66
   3017479      40818     202003     202003     208.75
   3017479      40818     202004     202006     123.64
   3017479      40818     202007     202007     174.64
   3017479      40818     202008     202012     123.64
   3017479      60914     202001     202003     210.29
   3017479      60914     202004     202006     163.23
   3017479      60914     202007     202007     184.25
   3017479      60914     202008     202008     163.23
   3017479      60914     202010     202010     163.23
   3017479      60914     202012     202013     163.23
   3017479      60914     202014     202014     184.25

12 rows selected.
In the lines with 162.23 there are breaks in dates but not in amount.
SQL> WITH MYDATA AS (
  2                  SELECT 3017479 FILENB, 40818 CHILD, 202001 DATE_START, 204.66 AMOUNT FROM DUAL UNION ALL
  3                  SELECT 3017479, 40818, 202002, 204.66 FROM DUAL UNION ALL
  4                  SELECT 3017479, 40818, 202003, 208.75 FROM DUAL UNION ALL
  5                  SELECT 3017479, 40818, 202004, 123.64 FROM DUAL UNION ALL
  6                  SELECT 3017479, 40818, 202005, 123.64 FROM DUAL UNION ALL
  7                  SELECT 3017479, 40818, 202006, 123.64 FROM DUAL UNION ALL
  8                  SELECT 3017479, 40818, 202007, 174.64 FROM DUAL UNION ALL
  9                  SELECT 3017479, 40818, 202008, 123.64 FROM DUAL UNION ALL
 10                  SELECT 3017479, 40818, 202009, 123.64 FROM DUAL UNION ALL
 11                  SELECT 3017479, 40818, 202010, 123.64 FROM DUAL UNION ALL
 12                  SELECT 3017479, 40818, 202011, 123.64 FROM DUAL UNION ALL
 13                  SELECT 3017479, 40818, 202012, 123.64 FROM DUAL UNION ALL
 14                  SELECT 3017479, 60914, 202001, 210.29 FROM DUAL UNION ALL
 15                  SELECT 3017479, 60914, 202002, 210.29 FROM DUAL UNION ALL
 16                  SELECT 3017479, 60914, 202003, 210.29 FROM DUAL UNION ALL
 17                  SELECT 3017479, 60914, 202004, 163.23 FROM DUAL UNION ALL
 18                  SELECT 3017479, 60914, 202005, 163.23 FROM DUAL UNION ALL
 19                  SELECT 3017479, 60914, 202006, 163.23 FROM DUAL UNION ALL
 20                  SELECT 3017479, 60914, 202007, 184.25 FROM DUAL UNION ALL
 21                  SELECT 3017479, 60914, 202008, 163.23 FROM DUAL UNION ALL
 22                  SELECT 3017479, 60914, 202010, 163.23 FROM DUAL UNION ALL
 23                  SELECT 3017479, 60914, 202012, 163.23 FROM DUAL UNION ALL
 24                  SELECT 3017479, 60914, 202013, 163.23 FROM DUAL UNION ALL
 25                  SELECT 3017479, 60914, 202014, 184.25 FROM DUAL
 26                 ),
 27    t as (
 28      select filenb, child, date_start, amount,
 29             case
 30               when lag(amount,1,-1) over (partition by filenb, child order by date_start) != amount
 31                 then row_number() over (partition by filenb, child order by date_start)
 32             end rn
 33      from mydata
 34    ),
 35    grouping as (
 36      select filenb, child, date_start, amount,
 37             max(rn) over (partition by filenb, child order by date_start) grp
 38      from t
 39    )
 40  select filenb, child, min(date_start) date_start, max(date_start) date_end, amount
 41  from grouping
 42  group by filenb, child, amount, grp
 43  order by filenb, child, date_start
 44  /
    FILENB      CHILD DATE_START   DATE_END     AMOUNT
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     202002     204.66
   3017479      40818     202003     202003     208.75
   3017479      40818     202004     202006     123.64
   3017479      40818     202007     202007     174.64
   3017479      40818     202008     202012     123.64
   3017479      60914     202001     202003     210.29
   3017479      60914     202004     202006     163.23
   3017479      60914     202007     202007     184.25
   3017479      60914     202008     202013     163.23
   3017479      60914     202014     202014     184.25

10 rows selected.
So this is to you to know which one fits your need.

Re: Grouping with the previous one if identical [message #687321 is a reply to message #687320] Tue, 21 February 2023 12:30 Go to previous messageGo to next message
questvba
Messages: 15
Registered: July 2022
Junior Member
Yep Michel.
When we ask the question, we always think that everything is clear and...
Indeed, as you say, when there is a break in the amount and/or in the date, it is a new group.
Re: Grouping with the previous one if identical [message #687382 is a reply to message #687317] Sun, 05 March 2023 21:28 Go to previous messageGo to next message
mathguy
Messages: 106
Registered: January 2023
Senior Member
The Tabibitosan solution will fail if the data is not all from a single calendar year. If a "child" has the same amount in 202012 and in 202101, the Tabibitosan method (as written) will find a gap there, because "dates" are stored as integers, and consecutive months like Dec. 2020 and Jan. 2021 are represented by non-consecutive integers.

I haven't checked yet, but it is possible that other solutions in this thread suffer from the same defect. ("Defect" unless the OP states that all dates are from the same calendar year; he didn't say that, although the sample given for testing does satisfy that restriction.)
Re: Grouping with the previous one if identical [message #687383 is a reply to message #687382] Sun, 05 March 2023 21:37 Go to previous messageGo to next message
mathguy
Messages: 106
Registered: January 2023
Senior Member
The problem has a trivial solution using MATCH_RECOGNIZE. I added more sample data to test for the following cases:

- Multiple file numbers (added 4000000)
- Child with gap in the data (child 80000 has data only for Jan-Feb and Aug 2020) - even with the same amount, there should be separate rows in the output
- Child (81000) with data in Dec of one year and Jan of the following year: there is no gap in the dates, even though there is gap in their numerical representation.
- Child (83002) with data for exactly one month (to make sure the query works correctly for such input too).

with
  mydata (filenb, child, date_start, amount) as (
    select 3017479, 40818, 202001, 204.66 from dual union all
    select 3017479, 40818, 202002, 204.66 from dual union all
    select 3017479, 40818, 202003, 208.75 from dual union all
    select 3017479, 40818, 202004, 123.64 from dual union all
    select 3017479, 40818, 202005, 123.64 from dual union all
    select 3017479, 40818, 202006, 123.64 from dual union all
    select 3017479, 40818, 202007, 174.64 from dual union all
    select 3017479, 40818, 202008, 123.64 from dual union all
    select 3017479, 40818, 202009, 123.64 from dual union all
    select 3017479, 40818, 202010, 123.64 from dual union all
    select 3017479, 40818, 202011, 123.64 from dual union all
    select 3017479, 40818, 202012, 123.64 from dual union all
    select 3017479, 60914, 202001, 210.29 from dual union all
    select 3017479, 60914, 202002, 210.29 from dual union all
    select 3017479, 60914, 202003, 210.29 from dual union all
    select 3017479, 60914, 202004, 163.23 from dual union all
    select 3017479, 60914, 202005, 163.23 from dual union all
    select 3017479, 60914, 202006, 163.23 from dual union all
    select 3017479, 60914, 202007, 184.25 from dual union all
    select 3017479, 60914, 202008, 163.23 from dual union all
    select 3017479, 60914, 202009, 163.23 from dual union all
    select 3017479, 60914, 202010, 163.23 from dual union all
    select 3017479, 60914, 202011, 163.23 from dual union all
    select 3017479, 60914, 202012, 184.25 from dual union all
    select 4000000, 80000, 202001, 123.45 from dual union all
    select 4000000, 80000, 202002, 123.45 from dual union all
    select 4000000, 80000, 202008, 123.45 from dual union all
    select 4000000, 81000, 202012, 100.01 from dual union all
    select 4000000, 81000, 202101, 100.01 from dual union all
    select 4000000, 83002, 202008, 403.92 from dual
  )
select filenb, child, date_start, date_end, amount
from   mydata
match_recognize(
  partition by filenb, child
  order     by date_start
  measures  f.date_start as date_start, last(date_start) as date_end,
            f.amount as amount
  pattern   ( f s* )
  define    s as amount = f.amount
                 and to_date(to_char(date_start), 'yyyymm') =
                     add_months(to_date(to_char(prev(date_start)), 'yyyymm'), 1)
);


    FILENB      CHILD DATE_START   DATE_END     AMOUNT
---------- ---------- ---------- ---------- ----------
   3017479      40818     202001     202002     204.66
   3017479      40818     202003     202003     208.75
   3017479      40818     202004     202006     123.64
   3017479      40818     202007     202007     174.64
   3017479      40818     202008     202012     123.64
   3017479      60914     202001     202003     210.29
   3017479      60914     202004     202006     163.23
   3017479      60914     202007     202007     184.25
   3017479      60914     202008     202011     163.23
   3017479      60914     202012     202012     184.25
   4000000      80000     202001     202002     123.45
   4000000      80000     202008     202008     123.45
   4000000      81000     202012     202101     100.01
   4000000      83002     202008     202008     403.92
Re: Grouping with the previous one if identical [message #687389 is a reply to message #687382] Mon, 06 March 2023 00:52 Go to previous messageGo to next message
Michel Cadot
Messages: 68625
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
The Tabibitosan solution will fail if the data is not all from a single calendar year. If a "child" has the same amount in 202012 and in 202101, the Tabibitosan method (as written) will find a gap there, because "dates" are stored as integers, and consecutive months like Dec. 2020 and Jan. 2021 are represented by non-consecutive integers.

The solution is then to insert a previous step that numbers the rows.


Quote:
I haven't checked yet, but it is possible that other solutions in this thread suffer from the same defect.

Too bad you didn't. Your post would be more interesting if you did.
I didn't checked but it is possible that other solutions don't suffer from the same defect.
Easy, doesn't it?

Re: Grouping with the previous one if identical [message #687404 is a reply to message #687389] Mon, 06 March 2023 09:22 Go to previous messageGo to next message
mathguy
Messages: 106
Registered: January 2023
Senior Member
Michel Cadot wrote on Mon, 06 March 2023 00:52

Quote:

The Tabibitosan solution will fail if the data is not all from a single calendar year. If a "child" has the same amount in 202012 and in 202101, the Tabibitosan method (as written) will find a gap there, because "dates" are stored as integers, and consecutive months like Dec. 2020 and Jan. 2021 are represented by non-consecutive integers.
The solution is then to insert a previous step that numbers the rows.
I don't see how numbering the rows can fix the solution. If you think it is possible, please show us how.

The tabibitosan method can be used here, but it's a bit more delicate than the usual approach. To create the grouping flags, one would need to convert the numbers to dates and use ADD_MONTHS, similar to what I did in the MATCH_RECOGNIZE solution. (Alternatively, one can devise some direct computation on integers that would be equivalent to the date calculation - but besides being tedious and therefore prone to introducing errors, it would obfuscate the real meaning of the computation, making it harder on future maintainers of the code. In any case, it would not be "numbering the rows".)

Something like this:

with
  mydata (filenb, child, date_start, amount) as (
    ....................
  )
, prep (filenb, child, date_start, amount, grp) as (
    select filenb, child, date_start, amount,
           add_months(to_date(to_char(date_start), 'yyyymm'),
                     - dense_rank() over (partition by filenb, child, amount order by date_start))
    from   mydata
  )
select filenb, child, min(date_start) as date_start, max(date_start) as date_end, amount
from   prep
group  by filenb, child, amount, grp
order  by filenb, child, date_start  --  if needed
;


Re: Grouping with the previous one if identical [message #687406 is a reply to message #687389] Mon, 06 March 2023 09:53 Go to previous message
mathguy
Messages: 106
Registered: January 2023
Senior Member
Michel Cadot wrote on Mon, 06 March 2023 00:52


Quote:

I haven't checked yet, but it is possible that other solutions in this thread suffer from the same defect.
Too bad you didn't. Your post would be more interesting if you did.
I didn't check other solutions, because I didn't know enough about the OP's problem; waiting for the OP to clarify.

On the one hand, it is possible that all the inputs are for one calendar year, always. Perhaps that is what the OP needs: even with raw data for multiple years, he may need to report on one calendar year - filtering by date in a WHERE clause. Then the question is moot.

On the other hand, it is also possible that the OP must report over more than one calendar year, but show different groups when one year ends and another begins, even when the months are consecutive (no gaps) and the amount is the same. Then, in fact, the way Solomon wrote the solution would be correct. However, if this was part of the requirement, I would expect the sample data to include such cases, and it didn't; and/or the OP to say something to that effect, which he did not.

My speculation that other solutions may suffer from the same "defect" was based on what I didn't see in the code (only having glanced quickly at the other answers). If dates may be from different years, and January following December with the same amount should be considered as part of the same group, I would have to see something like the ADD_MONTHS trick I used in MATCH_RECOGNIZE (and in the tabibitosan solution I just showed), or a corresponding, direct arithmetic computation. I saw nothing of the sort, whence my speculation.

I am still not going to look closely at other solutions - I don't see the point. I will wait for the OP to clarify. From a very quick scanning, it seems that so far there are only two solutions (other than what I proposed) - you gave a solution that doesn't start a new group when there is a gap in dates, and the OP already explained that that isn't his requirement, and Solomon's tabibitosan solution. Was there anything else?
Previous Topic: how to incrementally get all numbers incremented by n between 2 columns values
Next Topic: Identify Groups where 2 or more Clients have bought the same Products - SQL Oracle
Goto Forum:
  


Current Time: Thu Mar 28 19:30:54 CDT 2024