Finding duplicate values in column and then Increment a pesudo COUNT column by 1

ACECORP · August 1, 2017, 4:15pm

PROBLEM DESCRIPTION

I am hoping someone can provide some insight on the following problem, specifically if it can be solved using an AD-HOC Query or similar structure.

If it can be solved via an AD-HOC Query or similar structure, I would appreciate any advice, suggestions, or recommendations on how I might approach writing the query.

Here is the problem:

I am using DB2 for iSeries version V7R2M0

I have a table with approximately 100K records.

I have a query that combines one column from one table (we will call this table FOLSOM.DYNACRM002) with two columns from another table (we will call this table FOLSOM.BFP002) via a JOIN.

The columns I am concerned with are as follows:

TABLENAME                     FIELD NAME          TYPE
FOLSOM.DYNACRM002             IDNUM               VARCHAR
FOLSOM.BFP002                 PDATE               NUMERIC
FOLSOM.BFP002                 PTIME               NUMERIC

Many times, there will only be one row where a given IDNUM exists.

However, the more likely result is that there can be anywhere from 2 to several thousand rows that contain the SAME IDNUM value, but the corresponding columns of PDATE and PTIME are different.

“Somehow” and this is the problem – somehow, I need to build into the query itself, an extra ‘fake column’ named IDNUM_SEQ that stores a sequence number that ‘appears’ based on the ‘rules’ described below.

The way this sequence number needs to work is that if there is only one row in the returned data that has a given IDNUM the IDNUM_SEQ column will contain a 1

However, when multiple rows are returned that contain the same IDNUM (but different PDATE and PTIME values obviously), the topmost row gets a 1 in the IDNUM_SEQ column, followed by a 2 in the next one, and a 3 in the next one, and so forth.

If I was able to achieve my goal and solve this problem, an example of what the returned data would look like complete with sequence numbers can be seen in the image attached below named DB2_Data_Sample_Annotated.png. (pardon my inability to center properly right justify the numbers I manually added using an art/drawing app).

The ad-query I am using currently that returns the data I am discussing here is listed below

DELETE FROM FOLSOM.DYNACRM004;

INSERT INTO FOLSOM.DYNACRM004 (
SELECT DISTINCT
RTRIM(AACPCMS) AS AACPCMS
,AACSCD
,AACSYR
,AACSSQ
,TRIM(AAMATTR) AS AAMATTR
FROM XTERM.CPP100 CRIMAS);

SELECT
TRIM(CRIMAS.AAMATTR) AS IDNUM
, ' ' AS IDNUM_SEQ
,FOFMAS.ABPYDT AS PDATE
,FOFMAS.ABADTM AS PTIME
FROM FOLSOM.BFP002 FOFMAS
INNER JOIN FOLSOM.DYNACRM004 CRIMAS ON CRIMAS.AACSCD = FOFMAS.ABCSCC
AND CRIMAS.AACSYR = FOFMAS.ABCSYR
AND CRIMAS.AACSSQ = FOFMAS.ABCSSQ
WHERE FOFMAS.ABCSCC <> 9
AND FOFMAS.ABCSSQ <> 99999
AND (FOFMAS.ABCSSQ <> 0 OR FOFMAS.ABFSQ# <> 0)
ORDER BY TRIM(CRIMAS.AAMATTR) ASC, FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC
;

Hopefully what I am looking to do makes sense.

My questions are:

• Can this be done via an AD-HOC query or some other AD-HOC structure I can run in the i5 native ‘run a sql script or query – tool’ or via DBArtisan or some other query tool. Even if this query must be broken into two, queries, and occur across two steps, making use of a ‘staging or holding table’, that is acceptable.

• If the answer to the above question is YES, how might I approach writing the query or set of queries w/ staging or holding table to do this?

I am a beginner/almost-intermediate DB2 user so I can create tables and insert/update data, write queries without a problem. I just need some help getting determining how to approach this problem.

Any guidance, advice, or suggestions anyone can provide on how to approach this would be greatly appreciated.

gbritton · August 1, 2017, 5:30pm

see https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_olapspecification.html

for information on windowing functions, in particular:

numbering-specification

-ROW_NUMBER--(--)--OVER--(--+-----------------------------+--+-------------------------+--)-><
'-| window-partition-clause |-' '-| window-order-clause |-'

ACECORP · August 1, 2017, 5:55pm

Awesome! I thanks! I got it working based on your recommendation!

The line below was the magic hammer!

ROW_NUMBER() OVER (PARTITION BY CRIMAS.AAMATTR ORDER BY FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC) AS ROWNUM

It fit into the larger solution as shown below

DELETE FROM FOLSOM.DYNACRM004
;

INSERT INTO FOLSOM.DYNACRM004 (

SELECT DISTINCT
RTRIM(AACPCMS) AS AACPCMS
,AACSCD
,AACSYR
,AACSSQ
,TRIM(AAMATTR) AS AAMATTR
FROM XTERM.CPP100 CRIMAS

)
;

SELECT
TRIM(CRIMAS.AAMATTR) AS IDNUM
,ROW_NUMBER() OVER (PARTITION BY CRIMAS.AAMATTR ORDER BY FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC) AS ROWNUM
,FOFMAS.ABPYDT AS PDATE
,FOFMAS.ABADTM AS PTIME
FROM FOLSOM.BFP002 FOFMAS
INNER JOIN FOLSOM.DYNACRM004 CRIMAS ON CRIMAS.AACSCD = FOFMAS.ABCSCC
AND CRIMAS.AACSYR = FOFMAS.ABCSYR
AND CRIMAS.AACSSQ = FOFMAS.ABCSSQ
WHERE FOFMAS.ABCSCC <> 9
AND FOFMAS.ABCSSQ <> 99999
AND (FOFMAS.ABCSSQ <> 0 OR FOFMAS.ABFSQ# <> 0)
ORDER BY TRIM(CRIMAS.AAMATTR) ASC, FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC
;