Inner Join vs Natural Join vs USING: are there any advantages?

Suppose I have two simple tables, for example:

 CREATE TABLE departments(dept INT PRIMARY KEY, name);
 CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT REFERENCES departments(dept));

(simplified, of course).

I could have any of the following statements:

 SELECT * FROM employees e INNER JOIN departments d ON e.dept=d.dept;
 SELECT * FROM employees e NATURAL JOIN departments d;
 SELECT * FROM employees e JOIN departments d USING(dept);

A working example is on SQL Fiddle: http://sqlfiddle.com/#!15/864a5/13/10

All of them give almost the same results: the same rows, of course, though not quite the same columns.
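To make "same rows, different columns" concrete, here is a minimal sketch that runs the three statements against SQLite via Python's standard sqlite3 module. SQLite as the engine and the sample rows are assumptions for illustration (the fiddle above uses a different engine); the schema is the one from the question.

```python
import sqlite3

# Schema from the question; the rows are made-up sample data.
SETUP = """
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT REFERENCES departments(dept));
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2), (12, 'Cy', 'Kim', 1);
"""

QUERIES = {
    "on": "SELECT * FROM employees e INNER JOIN departments d ON e.dept=d.dept",
    "natural": "SELECT * FROM employees e NATURAL JOIN departments d",
    "using": "SELECT * FROM employees e JOIN departments d USING(dept)",
}


def run(conn, sql):
    """Return (column names, rows) for one query."""
    cur = conn.execute(sql)
    return [col[0] for col in cur.description], cur.fetchall()


def as_records(cols, rows):
    """Turn each row into a set of (column, value) pairs.

    The two copies of dept in the ON result hold equal values, so they
    collapse into a single pair and the three forms become comparable.
    """
    return {frozenset(zip(cols, row)) for row in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
results = {form: run(conn, sql) for form, sql in QUERIES.items()}

for form, (cols, _) in results.items():
    print(form, cols)  # ON lists dept twice; NATURAL and USING list it once

records = [as_records(cols, rows) for cols, rows in results.values()]
print(all(r == records[0] for r in records))  # True: same rows in every form
```

Comparing rows as column/value sets rather than tuples sidesteps the fact that engines may order the merged column differently for NATURAL and USING.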

I have always preferred the first form because of its flexibility, readability and predictability: you clearly define what is joined to what.

Now, besides the fact that the first form has a duplicated column, is there a real advantage for the other two forms? Or is it just syntactic sugar?

I see that the drawback of the latter forms is that you are expected to name your primary and foreign keys the same, which is not always practical.

+5

sql join

Manngo Jan 27 '16 at 8:55


4 answers

Writing just JOIN defaults to INNER JOIN. So:

 SELECT * FROM employees e INNER JOIN departments d USING(dept);

is equivalent to

 SELECT * FROM employees e JOIN departments d USING(dept);

and you get only one dept column as a result.

In the same way

 SELECT * FROM employees e INNER JOIN departments d ON e.dept=d.dept;

is equivalent to

 SELECT * FROM employees e JOIN departments d ON e.dept=d.dept;

but here the dept column appears twice in the result.
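A quick way to check both equivalences, sketched with Python's sqlite3 module (SQLite as the engine and the sample rows are assumptions): each pair runs the explicit INNER JOIN and the bare JOIN and compares column names and rows.

```python
import sqlite3

# Schema from the question; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT REFERENCES departments(dept));
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2);
""")


def fetch(sql):
    """Return (column names, sorted rows) so two results can be compared directly."""
    cur = conn.execute(sql)
    return [col[0] for col in cur.description], sorted(cur.fetchall())


# Each pair: the explicit INNER JOIN form, then the bare JOIN form.
PAIRS = [
    ("SELECT * FROM employees e INNER JOIN departments d USING(dept)",
     "SELECT * FROM employees e JOIN departments d USING(dept)"),
    ("SELECT * FROM employees e INNER JOIN departments d ON e.dept=d.dept",
     "SELECT * FROM employees e JOIN departments d ON e.dept=d.dept"),
]

checks = []
for explicit, bare in PAIRS:
    cols, rows = fetch(explicit)
    # (number of dept columns, identical to the bare-JOIN version?)
    checks.append((cols.count("dept"), (cols, rows) == fetch(bare)))
print(checks)  # [(1, True), (2, True)]
```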

Writing INNER JOIN explicitly is easier to read, especially if your query also contains other kinds of joins (LEFT, RIGHT, ...).

A NATURAL JOIN only matches columns that have the same name in both tables. So if, for example, the join column is called "department" in your employees table and "dept" in your departments table, NATURAL JOIN will not pair them; if the two tables share no column names at all, it quietly becomes a cross join.
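A small sketch of what happens when the names differ, using a hypothetical variant of the question's schema (employees.department instead of employees.dept) and SQLite as the assumed engine: NATURAL JOIN finds no shared names and returns every combination, USING(dept) is rejected, and an explicit ON clause still works.

```python
import sqlite3

# Hypothetical variant of the question's schema: the employees column is
# called "department" rather than "dept". Rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, department INT);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2);
""")

# No shared column names, so NATURAL JOIN adds no join condition at all.
natural_rows = conn.execute(
    "SELECT * FROM employees NATURAL JOIN departments").fetchall()
print(len(natural_rows))  # 4: 2 employees x 2 departments

# USING needs the named column on both sides, so SQLite refuses this query.
try:
    conn.execute("SELECT * FROM employees JOIN departments USING(dept)")
    using_error = None
except sqlite3.OperationalError as exc:
    using_error = str(exc)
print(using_error)

# An explicit ON clause handles differing names without any renaming.
on_rows = conn.execute(
    "SELECT * FROM employees e JOIN departments d ON e.department = d.dept").fetchall()
print(len(on_rows))  # 2
```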

+1

Thibault clement Jan 27 '16 at 9:13


From the Oracle documentation:

NATURAL JOIN is a JOIN operation that creates an implicit join clause for you based on the common columns in the two tables being joined. Common columns are columns that have the same name in both tables.
A NATURAL JOIN can be an INNER join, a LEFT OUTER join, or a RIGHT OUTER join. The default is an INNER join.
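As an illustration of the outer variant, here is a sketch with SQLite (an assumption; the quote above is about Oracle's documentation) and made-up rows in which one employee has no matching department. NATURAL LEFT JOIN keeps that employee with a NULL department name, while the default inner form drops it.

```python
import sqlite3

# Question's schema minus the REFERENCES clause; rows are made up so that
# employee 11 points at a department that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT);
INSERT INTO departments VALUES (1, 'Sales');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2);
""")

# Default (inner) natural join: unmatched employee 11 disappears.
inner = conn.execute(
    "SELECT id, name FROM employees NATURAL JOIN departments ORDER BY id").fetchall()
# Natural left outer join: employee 11 is kept, with NULL (None) for name.
left = conn.execute(
    "SELECT id, name FROM employees NATURAL LEFT JOIN departments ORDER BY id").fetchall()
print(inner)  # [(10, 'Sales')]
print(left)   # [(10, 'Sales'), (11, None)]
```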

The clause

 tableA JOIN tableB USING(column)

is, as you noted, just syntactic sugar for

 tableA JOIN tableB ON tableA.column = tableB.column
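One practical consequence of the shorthand, sketched in SQLite (the engine and the sample rows are assumptions): because USING merges the join column, a bare dept can be selected without a table qualifier, whereas with the ON form both e.dept and d.dept are in scope and a bare dept is rejected as ambiguous.

```python
import sqlite3

# Schema from the question; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT REFERENCES departments(dept));
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2), (12, 'Cy', 'Kim', 1);
""")

# USING merges the join column, so an unqualified "dept" is unambiguous.
using_depts = conn.execute(
    "SELECT DISTINCT dept FROM employees e JOIN departments d USING(dept) ORDER BY dept"
).fetchall()
print(using_depts)  # [(1,), (2,)]

# With ON, e.dept and d.dept are both visible, so a bare "dept" is an error.
try:
    conn.execute("SELECT dept FROM employees e JOIN departments d ON e.dept=d.dept")
    on_error = None
except sqlite3.OperationalError as exc:
    on_error = str(exc)
print(on_error)
```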

0

Pieter geerkens Jan 27 '16 at 9:12


NATURAL JOIN is not as widely supported, and neither is JOIN USING (for example, SQL Server supports neither).

There are many arguments for why NATURAL JOIN is a bad idea. Personally, I think that not being explicit about things like join columns is asking for disaster.

For example, if you add a column to a table without realizing that it matches a column on the other side of a natural join, you may get unexpected failures when the natural join suddenly does something completely different. You think adding a column will not break anything, but it can break poorly written views and natural joins.
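The failure mode described above can be reproduced in a few lines. This is a sketch with SQLite and made-up rows; the schema change (employees gaining its own name column) is hypothetical. The same NATURAL JOIN text silently starts matching on name as well as dept, and because the new column is NULL, every row disappears.

```python
import sqlite3

# Schema from the question; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT REFERENCES departments(dept));
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2), (12, 'Cy', 'Kim', 1);
""")

query = "SELECT id, dept FROM employees NATURAL JOIN departments ORDER BY id"
before = conn.execute(query).fetchall()

# Hypothetical later schema change: employees gets its own "name" column,
# which now coincides with departments.name and joins the implicit condition.
conn.execute("ALTER TABLE employees ADD COLUMN name")
after = conn.execute(query).fetchall()

print(len(before), len(after))  # 3 0: NULL = 'Sales' is never true
```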

When you build a system, you should never let these kinds of risks into it. It is the same as creating views over multiple tables without a table alias on each column, or using INSERT without a column list.

For these reasons, if you are just learning SQL now, get out of the habit of using them.

0

Nick.McDermaid Jan 27 '16 at 9:40


philipxy · Accepted Answer · 2016-01-27T10:19:07+0000

Now, besides the fact that the first form has a duplicated column, is there a real advantage for the other two forms? Or is it just syntactic sugar?

TL;DR NATURAL JOIN is used in a certain style of relational programming that is simpler than the usual SQL style. (Although, when embedded in SQL, it is burdened with the rest of the SQL query syntax.) That is because 1. it directly uses the simple operators of predicate logic, the language of precision in engineering (including software engineering), science (including computer science) and mathematics, and moreover 2. simultaneously and alternatively, it directly uses the simple operators of relational algebra.

A common complaint about NATURAL JOIN is that, since the shared columns are not explicit, inappropriate column pairing may occur after a schema change. And that may be the case in a particular development environment. But in that case there was a requirement that only certain columns be joined, and NATURAL JOIN without PROJECT was not appropriate. So these arguments assume that NATURAL JOIN is being used inappropriately. Moreover, the people making them are not even aware that they are ignoring requirements. Such complaints are specious. (Moreover, sound software design principles lead to not having interfaces with such specifications.)
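The "NATURAL JOIN without PROJECT" point can be sketched as follows, with SQLite and made-up data. The assumption here is a hypothetical employees table that also has its own name column, unrelated to department names. Joining the raw table pairs on both dept and name; projecting employees down to the columns that are actually meant to take part in the join first makes the requirement explicit.

```python
import sqlite3

# Hypothetical schema: employees has its own "name" column that must NOT
# take part in the join. Rows are made up; employee names are left NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, dept INT, name);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees(id, fname, gname, dept)
    VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2), (12, 'Cy', 'Kim', 1);
""")

# Unprojected: matches on dept AND name, so nothing survives.
naive_rows = conn.execute(
    "SELECT id, dept FROM employees NATURAL JOIN departments").fetchall()

# PROJECT first: only id and dept remain, so only dept is shared.
projected_rows = conn.execute("""
    SELECT id, dept
    FROM (SELECT id, dept FROM employees) AS e
    NATURAL JOIN departments
    ORDER BY id
""").fetchall()

print(naive_rows)      # []
print(projected_rows)  # [(10, 1), (11, 2), (12, 1)]
```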

Another related, misconceived complaint from the same camp is that "NATURAL JOIN does not even take foreign key relationships into account". But any join is there because of the table meanings, not the constraints. Constraints are not needed to query. If a constraint is added, a query remains correct. If a constraint is dropped, a query relying on it becomes wrong and must be changed to a phrasing that does not rely on it, a phrasing that would never have had to change. This has nothing to do with NATURAL JOIN.

You have described the difference in effect: only one copy of each common column is returned.

From "Is there any rule of thumb to construct an SQL query from a human-readable description?":

It turns out that natural language expressions, logical expressions, relational algebra expressions and SQL expressions (a hybrid of the last two) correspond in a rather direct way.

For example, from Codd 1970:

The relation in question is called component. [...] The meaning of component(x, y, z) is that part x is an immediate component (or subassembly) of part y, and z units of part x are needed to assemble one unit of part y.

From this answer:

Every base table has a statement template, aka predicate, parameterized by column names, by which we put a row in or leave it out.

Plugging a row into a predicate gives a statement, aka proposition. Rows that make a true proposition go in the table, while rows that make a false proposition stay out. (So the table states the proposition of each present row and states NOT the proposition of each absent row.)

But every table expression value has a predicate per its expression. The relational model is designed so that if tables T and U hold rows where T (...) and U (...) (respectively), then:

T NATURAL JOIN U holds rows where T (...) AND U (...)
T WHERE condition holds rows where T (...) AND condition
T UNION CORRESPONDING U holds rows where T (...) OR U (...)
T EXCEPT CORRESPONDING U holds rows where T (...) AND NOT U (...)
SELECT DISTINCT columns to keep FROM T holds rows where THERE EXISTS columns to drop SUCH THAT T (...)
etc.
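The correspondence in the list above can be modelled with plain Python sets. This is a toy sketch, not any library's API: a relation is a set of rows, a row is a frozenset of (column, value) pairs so that rows line up by column name (the CORRESPONDING idea), and every helper name is hypothetical. Each operator is written as the logical connective it corresponds to.

```python
from itertools import product

# Toy relational algebra. A relation is a set of rows; a row is a frozenset
# of (column, value) pairs. All helper names are hypothetical.


def row(**cols):
    return frozenset(cols.items())


def natural_join(t, u):
    """Rows where T(...) AND U(...): combine rows that agree on shared columns."""
    out = set()
    for r, s in product(t, u):
        rd, sd = dict(r), dict(s)
        if all(rd[c] == sd[c] for c in rd.keys() & sd.keys()):
            out.add(frozenset({**rd, **sd}.items()))
    return out


def where(t, condition):
    """Rows where T(...) AND condition."""
    return {r for r in t if condition(dict(r))}


def union(t, u):
    """Rows where T(...) OR U(...); both sides assumed to have the same columns."""
    return t | u


def minus(t, u):
    """Rows where T(...) AND NOT U(...); same-columns assumption as above."""
    return t - u


def project(t, keep):
    """Rows where THERE EXISTS values for the dropped columns SUCH THAT T(...)."""
    return {frozenset((c, v) for c, v in r if c in keep) for r in t}


employees = {row(id=10, dept=1), row(id=11, dept=2), row(id=12, dept=1)}
departments = {row(dept=1, name="Sales"), row(dept=2, name="IT")}

# "employee id works in dept AND dept is named name"
staff = natural_join(employees, departments)
# "... AND name = 'Sales'", then "THERE EXISTS dept, name SUCH THAT ..."
sales_ids = project(where(staff, lambda r: r["name"] == "Sales"), {"id"})
print(sorted(dict(r)["id"] for r in sales_ids))  # [10, 12]
```

Note that natural_join with no shared columns checks an empty condition, which is always true, so it degenerates to a cross product, just as NATURAL JOIN does in SQL.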

Whereas reasoning about SQL otherwise is... not "natural":

An SQL SELECT statement can be thought of algebraically as 1. implicitly RENAMEing each column C of a table with (possibly implicit) correlation name T to T.C, then 2. CROSS JOINing, then 3. RESTRICTing per INNER ON, then 4. RESTRICTing per WHERE, then 5. PROJECTing per SELECT, then 6. RENAMEing per SELECT, dropping T.s, then 7. implicitly RENAMEing to drop the remaining T.s. Between the T.-RENAMEings, the algebra operators can also be thought of as logic operators and table names as their predicates: T JOIN ... vs Employee T.EMPLOYEE has name T.NAME ... AND ... But conceptually, inside a SELECT statement there is a double-RENAME-induced CROSS JOIN table with T.Cs for column names, while outside, tables have Cs for column names.
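The algebraic reading above can be traced step by step in plain Python. This sketch uses made-up in-memory tables shaped like a trimmed version of the question's schema, and the query in the comment is a hypothetical example; each list comprehension is labelled with the step it stands for.

```python
from itertools import product

# Made-up in-memory tables shaped like a trimmed version of the question's schema.
employees = [{"id": 10, "fname": "Ann", "dept": 1}, {"id": 11, "fname": "Bob", "dept": 2}]
departments = [{"dept": 1, "name": "Sales"}, {"dept": 2, "name": "IT"}]


def rename_with(corr, table):
    """Step 1: rename each column C to T.C for correlation name T."""
    return [{f"{corr}.{c}": v for c, v in r.items()} for r in table]


# Hypothetical query being modelled:
#   SELECT e.fname AS fname, d.name AS dept_name
#   FROM employees e INNER JOIN departments d ON e.dept = d.dept
#   WHERE d.name <> 'IT'
crossed = [{**a, **b} for a, b in product(rename_with("e", employees),
                                            rename_with("d", departments))]  # step 2: CROSS JOIN
joined = [r for r in crossed if r["e.dept"] == r["d.dept"]]   # step 3: RESTRICT per ON
kept = [r for r in joined if r["d.name"] != "IT"]             # step 4: RESTRICT per WHERE
# Steps 5-7: PROJECT per SELECT and RENAME per AS, so no T. prefixes remain.
result = [{"fname": r["e.fname"], "dept_name": r["d.name"]} for r in kept]
print(result)  # [{'fname': 'Ann', 'dept_name': 'Sales'}]
```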

Alternatively, an SQL SELECT statement can be thought of logically as 1. introducing FORSOME T IN E around the entire statement for each correlation name T and base name or subquery E, then 2. referring to the value of quantified T by using T.C to refer to its C part, then 3. building result rows from T.Cs per FROM etc., then 4. naming the result row columns per the SELECT clause, then 5. leaving the scope of the FORSOMEs. Again, the algebra operators are thought of as logic operators and table names as their predicates. Again, though, this conceptually has T.C inside SELECTs but C outside, with correlation names coming and going.
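The logical reading can be sketched the same way. With the same made-up tables and hypothetical query as a plain-Python model, membership in the result is an existential test over the correlation variables e and d, and building the result is a comprehension whose for-clauses play the role of the FORSOME scopes.

```python
# Made-up in-memory tables; the modelled (hypothetical) query is
#   SELECT e.fname AS fname, d.name AS dept_name
#   FROM employees e INNER JOIN departments d ON e.dept = d.dept
#   WHERE d.name <> 'IT'
employees = [{"id": 10, "fname": "Ann", "dept": 1}, {"id": 11, "fname": "Bob", "dept": 2}]
departments = [{"dept": 1, "name": "Sales"}, {"dept": 2, "name": "IT"}]


def in_result(fname, dept_name):
    """FORSOME e IN employees, FORSOME d IN departments:
    e.dept = d.dept AND d.name <> 'IT' AND fname = e.fname AND dept_name = d.name."""
    return any(
        e["dept"] == d["dept"] and d["name"] != "IT"
        and e["fname"] == fname and d["name"] == dept_name
        for e in employees
        for d in departments
    )


# Building result rows: e and d are in scope only inside the comprehension,
# and each output row is named by position (fname, dept_name).
result = {
    (e["fname"], d["name"])
    for e in employees
    for d in departments
    if e["dept"] == d["dept"] and d["name"] != "IT"
}
print(result)  # {('Ann', 'Sales')}
print(in_result("Ann", "Sales"), in_result("Bob", "IT"))  # True False
```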

These two interpretations of SQL are nowhere near as straightforward as just using JOIN or AND, etc., interchangeably. (You do not have to agree that it is simpler, but that perception is why NATURAL JOIN and UNION/EXCEPT CORRESPONDING exist.) (Arguments criticizing this style outside the context of its intended use are specious.)

USING is a sort of middle-ground hybrid with one foot in the NATURAL JOIN camp and one in the CROSS JOIN camp. It has no real role in the former, because there are no duplicate column names there. In the latter, it more or less just abbreviates JOIN conditions and SELECT clauses.

I see that the drawback of the latter forms is that you are expected to name your primary and foreign keys the same, which is not always practical.

PKs (primary keys), FKs (foreign keys) and other constraints are not needed for querying. (Knowing that a column is a function of others allows scalar subqueries, but you can always phrase a query without them.) Moreover, any two tables can be meaningfully joined. If you need two columns to have the same name for NATURAL JOIN, you rename them via SELECT AS.
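A minimal sketch of the rename-then-join approach, assuming SQLite and a hypothetical schema in which the names differ (employees.department vs departments.dept); the rows are made up. A derived table renames department to dept with AS, after which NATURAL JOIN pairs the columns as intended.

```python
import sqlite3

# Hypothetical schema with mismatched names; rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments(dept INT PRIMARY KEY, name);
CREATE TABLE employees(id PRIMARY KEY, fname, gname, department INT);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employees VALUES (10, 'Ann', 'Lee', 1), (11, 'Bob', 'Ray', 2);
""")

# Rename (and project) in a derived table so that exactly one name is shared.
rows = conn.execute("""
    SELECT id, name
    FROM (SELECT id, department AS dept FROM employees) AS e
    NATURAL JOIN departments
    ORDER BY id
""").fetchall()
print(rows)  # [(10, 'Sales'), (11, 'IT')]
```

Projecting to only the needed columns inside the derived table also guards against the accidental extra matches discussed earlier in the thread.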