SQLTeam.com | Weblogs | Forums

Need help with SQL query, Microsoft SQL 2000


#1

Hello everyone,

I am not an SQL pro, and last time did queries 10 years ago. Now I need to make one, and just can't seem to finalize.
Very hard to explain and define the problem in right words, thus I can't find solution in Google myself, but hopefully for you, experts, it will be an easy one.

There are 3 tables, and tons of files in Windows filesystem with different random names (like 1ox45z0.doc)
From the database I need to get metadata about records, which are related to some of the files mentioned above.

So, how the tables look like?
1. Main one - with almost all the data I need in output
2. Relation one - with related item IDs from the main table.
3. Path table - with just record system IDs and their appropriate filenames in Windows filesystem.

The 1. table contains records, which are directly connected to 3. table records, so it would be no problem to get out filenames (from 3.) and corresponding data (from 1.) The problem is - I need data (from 1.) of other records, which are related to previously mentioned data through the relation table (2.)

Basically that means, I need to get A (data from 1.), which are related to B (another data from 1.) through table 2., and additionally one column - path from 3., which refers to B, not to A. Is it even possible? Because A also has their own path in 3.

Currently my structure looks like this:

Select 1.ID, 1.something, 1.something, 3.path
from 1 inner join 3
where
(1.ID IN
select 2.ID
from 2 inner join 1 inner join 3
where ...
)

First select takes data I need; where and second select help me to extract particular related records I need.
Everything works well, except the marked out part, because query, of course, returns 3.path from A data, not B data
If you could only give a direction where to dig, it would help a lot.


#2

kinda vague. Please post your table schemas as CREATE TABLE statements along with some sample data for the tables as INSERT INTO statements and the desired output from the query using the sample data


#3

The following is pseudocode as we don't know the table/fieldnames:

select main.id
      ,main.something
      ,main.something_else
      ,path.path
  from main_table as main
       inner join relation_table as rel
               on rel.item_id=main.item_id
       inner join mail_table as prev_main
               on prev_main.item_id=rel.prev_item_id
       inner join path_table as path
               on path.system_id=prev_main.system_id

#4

Thank you for effort, but it still doesn't work for me.
I will try to give approximate table schemes and some example data.

CREATE TABLE main
(
docnumber int(4),
docname varchar(240),
key_number varchar(30),
application varchar(128)
)

INSERT INTO main
VALUES ('12345', 'Real document name', '24', 'DOCX')

INSERT INTO main
VALUES ('12346', 'No name because this is attached PDF to real document', 'NULL', 'PDF')

INSERT INTO main
VALUES ('12347', 'Second real document name', '25', 'DOCX')

INSERT INTO main
VALUES ('12348', 'Third real document name', '26', 'DOCX')

INSERT INTO main
VALUES ('12349', 'Fourth real document name', '27', 'DOCX')

INSERT INTO main
VALUES ('12350', 'No name because this is another attached PDF to real document', 'NULL', 'PDF')

Main table with 6 records - 4 of them real documents with useful metadata, 2 of them - attachments with no useful metadata (but their records from database perspective in main table are the same as real document records). Application field contains file extensions, and for this example we assume that all real documents are DOCX, and all attachments are PDF format.

CREATE TABLE relation
(
item1 int(4),
item2 int(4)
)

INSERT INTO relation
VALUES ('12345', '12346')

INSERT INTO relation
VALUES ('12349', '12350')

Relation table with all related docnumbers (they are identical with main table docnumbers). 2 example data are added to show relation between (12345 and 12346), (12349 and 12350) documents.

CREATE TABLE path
(
path varchar(128),
docnumber int(4),
)

INSERT INTO path
VALUES ('File12345.docx', '12345')

INSERT INTO path
VALUES ('File12346.pdf', '12346')

INSERT INTO path
VALUES ('File12347.docx', '12347')

INSERT INTO path
VALUES ('File12348.docx', '12348')

INSERT INTO path
VALUES ('File12349.docx', '12349')

INSERT INTO path
VALUES ('File12350.pdf', '12350')

Path table connects database with Windows filesystem, where files of documents are kept. Example data are created for each corresponding record from main table.

What I need in the end is to identify metadata for all PDF files in Windows filesystem (corresponding records will all have application value of 'PDF' in main table). But - since their records don't have real business values, I need to extract their related real document metadata.

Desired outcome with sample data:
12345, Real document name, 24, File12346.pdf (! - important - I need real document data combined with filename of attachment - related one !)
12349, Fourth real document name, 27, File12350.pdf

I don't need 12347, 12348 document data in output, because they don't have relations with records, which applications are PDF.

In my current SQL query I try to identify all data (buisness - attachments) from main table with 'PDF' values in application field, then go to relation table and find their related records (business - documents), and extract these document metadata. The problem is I can't get real document metadata with corresponding related attachment filenames out together.

SELECT main.DOCNUMBER, main.key_number, main.DOCNAME, path.PATH
FROM main INNER JOIN
path ON path.DOCNUMBER = main.DOCNUMBER
WHERE (main.DOCNUMBER IN
(SELECT relation.ITEM1
FROM path INNER JOIN
main ON path.DOCNUMBER = main.DOCNUMBER INNER JOIN
relation ON relation.ITEM2 = main.DOCNUMBER
WHERE main.APPLICATION = 'PDF')


#5

Umm... int(4) - I do not think that was ever in sql server.
Try something simple like:

SELECT P.docnumber, M.docname, M.key_number, P.[path]
FROM relation R
	JOIN main M
		ON R.item1 = M.docnumber
	JOIN path P
		ON R.item2 = P.docnumber

#6

Nope, srry; joins work well, but that way I get info about attachments, not real document metadata...
With my example data it would look like:
12346, No name because this is attached PDF to real document, NULL, File12346.pdf
(only 'File12346.pdf' is desired output; it needs to be combined with data from record 12345)


#7

What do you mean by:

Can you give some examples?


#8

Did you even bother to look at the query?
The join I gave gives you access to the DOCX row in main and the PDF row in path.


#9

gbritton - see below; from the example I gave - I bolded 'real documents' and put 'attachments' in italic.

As written before - in output I need to get all data about 'real documents' (only those, which are related to 'attachments') + filename (from path table) of attachment (not the real document)


#10

Thank you, after some modifying it really worked; I get the principle now.
Tried it before as well, just messed up something.

Theme closed, thank you all for responsiveness.