Select a single row based on conditions in multiple rows

Hello dear SQL experts,

I am working on a project related to arthropod species identification.

Here is a simplified table where we list various arthropod species with their properties:

i_spID | t_property | t_species            | i_property_val
-----------------------------------------------------------
1      | nmb_legs   | Blatta germanica     |  6
1      | nmb_wings  | Blatta germanica     |  4
2      | nmb_legs   | Musca domestica      |  6
2      | nmb_wings  | Musca domestica      |  2
3      | nmb_legs   | Latrodectus hesperus |  8
--------------------------------------------------------------

i_spID and t_property comprise the primary key.

Each species may have multiple properties (unlimited number) in the table. The ultimate goal is giving a set of properties to identify (ideally) a single species.
I've come up with a query having multiple subqueries, each for each pair of property/value:

SELECT t1.i_spID,t1.t_species
FROM (SELECT * FROM DBO.t_species WHERE t_property='nmb_legs'  AND i_property_val=6) AS t1,
     (SELECT * FROM DBO.t_species WHERE t_property='nmb_wings' AND i_property_val=4) AS t2
WHERE t1.i_spID=t2.i_spID

(this one identifies the cocroach 'Blatta germanica')

The problem is that as the number of properties grows the query becomes too combersome especially the "where" part in which we need to connect each subquery with all the others through i_spID.

So is there a better way to write the query?

Thank you,
Alex

-- create and fill in table SQL:

CREATE table DBO.t_species
 (
  i_spID                     int NOT NULL,
  t_property                 nvarchar(20) NOT NULL,
  t_species                  nvarchar(50),
  i_property_val             int
 );
ALTER TABLE DBO.t_species
ADD CONSTRAINT pkID_sp01 PRIMARY KEY (i_spID,t_property);
-------------------
INSERT INTO DBO.t_species (i_spID, t_property, t_species, i_property_val)
VALUES (1, 'nmb_legs', 'Blatta germanica', 6),
       (2, 'nmb_legs', 'Musca domestica', 6),
       (3, 'nmb_legs', 'Latrodectus hesperus', 8),
       (1, 'nmb_wings', 'Blatta germanica', 4),
       (2, 'nmb_wings', 'Musca domestica', 2)

Normalize the data more and use temporary tables for a forum post:

CREATE TABLE #Species
(
	spID int NOT NULL
		PRIMARY KEY
	,species nvarchar(50) NOT NULL
);
GO
INSERT INTO #Species
VALUES (1, 'Blatta germanica')
	,(2, 'Musca domestica')
	,(3, 'Latrodectus hesperus');
GO
CREATE TABLE #SpeciesProperties
(
	spID int NOT NULL
	,property varchar(20) NOT NULL
	,property_val int NOT NULL
	,PRIMARY KEY (spID, property)
);
GO
INSERT INTO #SpeciesProperties
VALUES (1, 'nmb_legs', 6)
	,(2, 'nmb_legs', 6)
	,(3, 'nmb_legs', 8)
	,(1, 'nmb_wings', 4)
	,(2, 'nmb_wings', 2);
GO

This will remove the subqueries:

WITH Properties
AS
(
	SELECT spID
	FROM #SpeciesProperties
	GROUP BY spID
	HAVING MAX(CASE WHEN property='nmb_legs' THEN property_val END) = 6
		AND MAX(CASE WHEN property='nmb_wings' THEN property_val END) = 4
)
SELECT S.spID, S.species
FROM #Species S
	JOIN Properties P
		ON S.spID = P.spID;

You could also have a Properties table and just have numbers in #SpeciesProperties.

1 Like

I didn't know if you wanted to list just the id or all the data. I initially listed all the data: if you want just the id / species(assuming that's unique?), then remove the outer query and just use the subquery.


DROP TABLE IF EXISTS #properties;

CREATE TABLE #properties (
    t_property nvarchar(20) NOT NULL,
    i_propery_val int NULL
    );
INSERT INTO #properties VALUES
    ('nmb_legs', 6),
    ('nmb_wings', 4)

SELECT s.*
FROM (
    SELECT i_spID
    FROM dbo.t_species s
    INNER JOIN #properties p ON p.t_property = s.t_property AND p.i_propery_val = s.i_property_val
    GROUP BY i_spID
    HAVING COUNT(*) = (SELECT COUNT(*) FROM #properties)
) AS matches
INNER JOIN dbo.t_species s ON s.i_spID = matches.i_spID

You could also have a more general solution with a stored procedure passing the properties and values as JSON, XML, table variable etc

eg With JSON

-- This would be a parameter
DECLARE @json nvarchar(MAX) =
'[
    {
        "property": "nmb_legs"
        ,"property_val": 6
    },
    {
        "property": "nmb_wings"
        ,"property_val": 4
    }
]';

CREATE TABLE #Parameters
(
	property varchar(20) COLLATE DATABASE_DEFAULT NOT NULL
		PRIMARY KEY
	,property_val int NOT NULL
);

DECLARE @rcount int;

INSERT INTO #Parameters
SELECT property, property_val
FROM OPENJSON(@json)
WITH (
    property varchar(20) '$.property'
    ,property_val int '$.property_val'
);

SET @rcount = @@ROWCOUNT;

WITH Properties
AS
(
	SELECT spID
	FROM #SpeciesProperties SP
		JOIN #Parameters A
			ON SP.property = A.property
				AND SP.property_val = A.property_val
	GROUP BY spID
	HAVING COUNT(*) = @rcount
)
SELECT S.spID, S.species
FROM #Species S
	JOIN Properties P
		ON S.spID = P.spID;
1 Like

Hi Ifor,

Works perfect! You are absolutely right about two tables, I really have two, but for some (probably stupid) reason decided to present a view in the question instead of the underlying tables.

Thanks for you help!

Hi ScottPletcher,

Thank you for your input. Yes, it works, however only for this particular case with two properties. Honestly I don't completely get the purpose of this #properties table. If this is a reference table to contain unlimited number of all possible properties then I don't know how we can make it work. Any thoughts?

Yes Ifor, most probably I will need a stored procedure to get the results. The query is to be dynamically assembled since the number of involved properties will vary, so it's a good reason to put all the logic to generate a needed statement into a SP.

There is no need for Dynamic SQL - just put the properties in a temp table, as Scott and I have shown, and join. I would also normalize out Properties allowing SpeciesProperties to only contain numbers; this will be more efficient with high volumes of data.

You are right, Ifor, I used inapropriate term, I didn't mean to compile a SQL string and execute it with sp_ExecuteSql :slight_smile:

Hi 

Hope this helps 

Different 

CREATE TABLE DBO.t_species
(
  i_spID                     int NOT NULL,
  t_property                 nvarchar(20) NOT NULL,
  t_species                  nvarchar(50),
  i_property_val             int
);
ALTER TABLE DBO.t_species
ADD CONSTRAINT pkID_sp01 PRIMARY KEY (i_spID, t_property);

INSERT INTO DBO.t_species (i_spID, t_property, t_species, i_property_val)
VALUES (1, 'nmb_legs', 'Blatta germanica', 6),
       (2, 'nmb_legs', 'Musca domestica', 6),
       (3, 'nmb_legs', 'Latrodectus hesperus', 8),
       (1, 'nmb_wings', 'Blatta germanica', 4),
       (2, 'nmb_wings', 'Musca domestica', 2);

WITH PropertyCriteria AS (
    SELECT i_spID
    FROM DBO.t_species
    WHERE (t_property = 'nmb_legs' AND i_property_val = 6)
       OR (t_property = 'nmb_wings' AND i_property_val =

Hi harishgg1,

Yes, got it, thank you!