Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Databases Querying Relational Databases Joining Table Data with SQL Review and Practice

SQL self join

I have perhaps more of a career advice question: I have a graduate school colleague working at Sony PlayStation, who advised me that the proper use of self join querying is valuable in his work.

Is there a video, or another good resource, that covers self joins?

When I attempted learning the self join, last year on w3schools, I found it was quite difficult, and so I would love a great resource.

Thank you very much!

2 Answers

Brendan Whiting
seal-mask
.a{fill-rule:evenodd;}techdegree seal-36
Brendan Whiting
Front End Web Development Techdegree Graduate 84,738 Points

In case this is helpful, let me try and explain what's going on in the W3Schools example:

SELECT A.CustomerName AS CustomerName1, B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City 
ORDER BY A.City;

So, the question they're trying to answer is "What customers live in the same city as each other?". They have a Customers table that has a CustomerName and a City column. So all the information they need is in this one table, they don't need any other tables. In order to do. this, they need to alias the table as two different aliases - FROM Customers A, Customers B - otherwise there's no way for us to put the same table on both the left and the right side of this comparison.

What this is going to output is a list of all possible pairs of customers who live in the same city as each other. So if Seth, Mark, and Brendan all lived in the same city, the possible combinations that it would generate would be:

Seth Mark
Seth Brendan
Mark Seth
Mark Brendan
Brendan Seth
Brendan Mark

Some of these are redundant. Flip them around and it's the same pair. But solving that problem is maybe for later...

Notice that in this part of the query WHERE A.CustomerID <> B.CustomerID it's basically using <> to mean "not equals". So I don't want to pair the customer with themselves, I want to pair them with someone else in the same city.

So Brendan, the number of rows is looking like a permutation calculation. This means if a city had ten rows of people, there would be 90 different pairs (45 unique ones). Interesting.

Here's something that may help https://stackoverflow.com/questions/3362038/what-is-self-join-and-when-would-you-use-it

To me this example is odd in that maybe you wouldn't structure the database this way. It seems that it would be more appropriate to have an employee table with employee subtype tables such as a supervisor table, officer level table and then a self join would not be necessary. So perhaps your db design should be such that Self Joins shouldn't be necessary?? Perhaps that is why Treehouse doesn't address Self Joins?? But I'm very new to database design concepts and best practices so take with many large grains of salt.

Thanks Seth, I appreciate you putting that thought forward. That's a very good hypothesis indeed. Yes, I'd love to hear any other database people chime in.