Join (programming term)

honggarae 17/10/2021 981

Inner join

An inner join is the most common join operation used in applications and is generally regarded as the default join type. An inner join combines the columns of two tables (say A and B) based on a join predicate to produce a new result table. The query compares each row of A with each row of B and finds every pair that satisfies the join predicate. When the predicate is satisfied, the matching rows of A and B are combined column-wise (side by side) into one row of the result set. Conceptually, the result can be defined as first taking the Cartesian product (cross join) of the two tables, pairing every row of A with every row of B, and then returning only the rows that satisfy the join predicate. In practice, SQL implementations avoid this wherever possible, because computing the full Cartesian product is very inefficient.
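The conceptual definition above (Cartesian product first, then a filter on the join predicate) can be sketched in a few lines of Python. This only illustrates the definition, not how a database engine evaluates a join; the sample rows and the helper names inner_join and same_department are made up for the sketch.

```python
from itertools import product

# Hypothetical sample rows: (LastName, DepartmentID) and (DepartmentID, DepartmentName).
employee = [("Rafferty", 31), ("Jones", 33), ("Jasper", None)]
department = [(31, "Sales"), (33, "Engineering"), (35, "Marketing")]


def inner_join(left, right, predicate):
    """Conceptual inner join: build every pair, keep the pairs that satisfy the predicate."""
    return [l + r for l, r in product(left, right) if predicate(l, r)]


def same_department(emp, dept):
    # Mimic SQL semantics: a NULL (None here) key never matches anything.
    return emp[1] is not None and emp[1] == dept[0]


pairs_examined = len(list(product(employee, department)))  # size of the Cartesian product
rows = inner_join(employee, department, same_department)
for row in rows:
    print(row)
```

The sketch examines every pair of rows, which is exactly the cost that real implementations try to avoid.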

SQL defines two different syntactic forms for expressing joins. The first is explicit join notation, which uses the JOIN keyword; the second is implicit join notation. Implicit notation lists the tables to be joined in the FROM clause of the SELECT statement, separated by commas. This forms a cross join, to which the WHERE clause may then apply filter predicates. Those filter predicates are functionally equivalent to the join predicate of the explicit notation. The SQL-89 standard supported only inner joins and cross joins, so implicit notation was the only way to express a join; the SQL-92 standard added support for outer joins, and with it the JOIN expression.

Inner joins can be further classified as equi-joins, natural joins, and cross joins (see below).

Programmers should pay special attention when the join columns may contain NULL values. NULL does not match any value, not even another NULL, unless predicates such as IS NULL or IS NOT NULL are used explicitly in the join condition.
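The NULL behaviour can be checked with a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory database; the tables a and b and their column k are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), not true; IS NULL is the explicit test.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

conn.executescript("""
    CREATE TABLE a (k INTEGER);
    CREATE TABLE b (k INTEGER);
    INSERT INTO a VALUES (1), (NULL);
    INSERT INTO b VALUES (1), (NULL);
""")

# A plain equality predicate never pairs the two NULL rows.
eq_rows = conn.execute("SELECT a.k, b.k FROM a JOIN b ON a.k = b.k").fetchall()

# Adding an explicit IS NULL test makes the NULL rows match each other.
null_safe_rows = conn.execute(
    "SELECT a.k, b.k FROM a JOIN b "
    "ON a.k = b.k OR (a.k IS NULL AND b.k IS NULL)"
).fetchall()

print(null_equals_null, null_is_null)
print(eq_rows)
print(null_safe_rows)
```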

For example, the following query joins the Employee table and the Department table on their shared attribute DepartmentID. Where the DepartmentID values of the two tables match (that is, the join predicate is satisfied), the query combines the LastName, DepartmentID, DepartmentName and other columns of the two tables into one row (one record) of the result table. Where the DepartmentID values do not match, no row is produced.

Explicit join example:

SELECT * FROM employee INNER JOIN department ON employee.DepartmentID = department.DepartmentID

Equivalent to:

SELECT * FROM employee, department WHERE employee.DepartmentID = department.DepartmentID

Result of the explicit inner join:

| Employee.LastName | Employee.DepartmentID | Department.DepartmentName | Department.DepartmentID |
|---|---|---|---|
| Robinson | 34 | Clerical | 34 |
| Jones | 33 | Engineering | 33 |
| Smith | 34 | Clerical | 34 |
| Steinberg | 33 | Engineering | 33 |
| Rafferty | 31 | Sales | 31 |

Note: neither the employee "Jasper" nor the department "Marketing" appears in the result. Neither has a matching row in the other table: Jasper has no associated department, and department 35 has no employees. As a result, the joined table contains no information about Jasper or the Marketing department. Depending on what result is expected, this behavior can be a subtle bug. Outer joins can avoid it.
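A minimal sketch of the two notations, assuming Python's built-in sqlite3 module and an in-memory database (the article does not prescribe an engine). The sample rows follow the employee and department tables used throughout this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

# Explicit notation: JOIN ... ON.
explicit = conn.execute(
    "SELECT * FROM employee INNER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
).fetchall()

# Implicit notation: comma-separated tables plus a WHERE filter.
implicit = conn.execute(
    "SELECT * FROM employee, department "
    "WHERE employee.DepartmentID = department.DepartmentID"
).fetchall()

last_names = {row[0] for row in explicit}
department_names = {row[3] for row in explicit}
print(sorted(explicit) == sorted(implicit))
print(last_names, department_names)
```

Both notations return the same five rows, and neither Jasper nor Marketing is among them.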

Equi-join

An equi-join is a special case of a comparison-based join (θ-join) whose join predicate uses only equality comparisons. A join that uses other comparison operators (such as <) is not an equi-join. The query shown earlier is an example of an equi-join:

SELECT * FROM employee INNER JOIN department ON employee.DepartmentID = department.DepartmentID

SQL provides an optional shorthand notation for equi-joins, using the USING keyword (Feature ID F402):

SELECT * FROM employee JOIN department USING (DepartmentID)

The USING construct is more than syntactic sugar: the result of the query above differs from that of the query with an explicit predicate. Specifically, any column listed in the USING clause appears only once in the joined result, with an unqualified name. In the example above, the result has a single column named DepartmentID rather than both employee.DepartmentID and department.DepartmentID.

The USING clause is supported by MySQL, Oracle, PostgreSQL, SQLite, DB2/400 and other products.
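The difference in result columns can be observed with a small sketch. It assumes Python's built-in sqlite3 module with an in-memory database; the exact column names and their order are what SQLite reports and may differ in other products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33);
""")

# cursor.description holds one entry per result column; item 0 is the column name.
on_cursor = conn.execute(
    "SELECT * FROM employee JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
)
on_columns = [col[0] for col in on_cursor.description]

using_cursor = conn.execute("SELECT * FROM employee JOIN department USING (DepartmentID)")
using_columns = [col[0] for col in using_cursor.description]

print(len(on_columns), on_columns)
print(len(using_columns), using_columns)
```

With ON, the join column is returned once per table; with USING, it is returned once in total.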

Natural join

A natural join is a further specialization of the equi-join. When two tables are joined with a natural join, all columns that have the same name in both tables are compared implicitly. In the result of a natural join, each column shared by the two tables appears only once.

The inner join query used above can be expressed as a natural join as follows:

SELECT * FROM employee NATURAL JOIN department

As with the USING clause, the DepartmentID column appears only once in the joined table, without a table name as prefix:

| DepartmentID | Employee.LastName | Department.DepartmentName |
|---|---|---|
| 34 | Smith | Clerical |
| 33 | Jones | Engineering |
| 34 | Robinson | Clerical |
| 33 | Steinberg | Engineering |
| 31 | Rafferty | Sales |
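As a quick check that a natural join on these tables is the same as an equi-join with USING (DepartmentID), here is a minimal sketch assuming Python's built-in sqlite3 module and an in-memory database. The column order is SQLite's and need not match the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

# DepartmentID is the only column name the two tables share,
# so NATURAL JOIN compares exactly that column.
natural_rows = conn.execute("SELECT * FROM employee NATURAL JOIN department").fetchall()
using_rows = conn.execute(
    "SELECT * FROM employee JOIN department USING (DepartmentID)"
).fetchall()

print(sorted(natural_rows) == sorted(using_rows))
```

Because the comparison is implicit, adding a second same-named column to both tables later would silently change the join condition.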

When JOIN USING or NATURAL JOIN is used in Oracle, qualifying the name of a column shared by the two tables with a table name causes a compile-time error: "ORA-25154: column part of USING clause cannot have qualifier" or "ORA-25155: column used in NATURAL join cannot have qualifier".

Cross join

Cross join, also known as Cartesian join or cross product, is the basis of all types of inner join. Treating each table as a set of rows, a cross join returns the Cartesian product of the two sets. This is equivalent to an inner join whose join condition is always true, or one with no join condition at all.

If A and B are two sets, their cross join is written A × B.

The SQL for a cross join lists the tables in the FROM clause without any filtering join predicate.

Explicit cross join example:

SELECT * FROM employee CROSS JOIN department

Implicit cross join example:

SELECT * FROM employee, department;
| Employee.LastName | Employee.DepartmentID | Department.DepartmentName | Department.DepartmentID |
|---|---|---|---|
| Rafferty | 31 | Sales | 31 |
| Jones | 33 | Sales | 31 |
| Steinberg | 33 | Sales | 31 |
| Smith | 34 | Sales | 31 |
| Robinson | 34 | Sales | 31 |
| Jasper | NULL | Sales | 31 |
| Rafferty | 31 | Engineering | 33 |
| Jones | 33 | Engineering | 33 |
| Steinberg | 33 | Engineering | 33 |
| Smith | 34 | Engineering | 33 |
| Robinson | 34 | Engineering | 33 |
| Jasper | NULL | Engineering | 33 |
| Rafferty | 31 | Clerical | 34 |
| Jones | 33 | Clerical | 34 |
| Steinberg | 33 | Clerical | 34 |
| Smith | 34 | Clerical | 34 |
| Robinson | 34 | Clerical | 34 |
| Jasper | NULL | Clerical | 34 |
| Rafferty | 31 | Marketing | 35 |
| Jones | 33 | Marketing | 35 |
| Steinberg | 33 | Marketing | 35 |
| Smith | 34 | Marketing | 35 |
| Robinson | 34 | Marketing | 35 |
| Jasper | NULL | Marketing | 35 |

A cross join applies no predicate to filter the rows of the result table. Programmers can use a WHERE clause to filter the result set further.
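A minimal sketch of the row counts involved, assuming Python's built-in sqlite3 module, an in-memory database, and the six employees and four departments used in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

# Every employee row is paired with every department row.
cross_rows = conn.execute("SELECT * FROM employee CROSS JOIN department").fetchall()

# Filtering the cross join with WHERE yields the rows of the inner join.
filtered_rows = conn.execute(
    "SELECT * FROM employee CROSS JOIN department "
    "WHERE employee.DepartmentID = department.DepartmentID"
).fetchall()

print(len(cross_rows), len(filtered_rows))
```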

Outer join

An outer join does not require every record in the two joined tables to have a matching record in the other table. The table whose records are all kept (even those without a match) is called the preserved table. Outer joins are further divided into left outer joins, right outer joins and full outer joins, depending on whether the rows of the left table, the right table, or both tables are preserved.

(Here, left and right refer to the two sides of the JOIN keyword.)

In standard SQL, there is no implicit join notation for outer joins.

When an outer join has both an ON clause and a WHERE clause, put only the join condition between the tables in the ON clause, and put conditions that filter the data in the WHERE clause. For inner joins, a condition can be placed in either clause with the same effect. The difference arises because, in an outer join, rows of the preserved table that are rejected by the ON clause are added back to the result (with NULLs); only after that is the WHERE clause applied to filter the rows of the join result.
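The effect of placing a filter in ON versus WHERE can be sketched as follows. The sketch assumes Python's built-in sqlite3 module and an in-memory database; the filter on DepartmentName = 'Sales' is a made-up condition chosen only to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

# Filter inside ON: employees rejected by the ON clause are added back with NULLs.
filter_in_on = conn.execute(
    "SELECT employee.LastName, department.DepartmentName "
    "FROM employee LEFT OUTER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID "
    "AND department.DepartmentName = 'Sales'"
).fetchall()

# Filter inside WHERE: applied after the join, so the NULL-extended rows are removed.
filter_in_where = conn.execute(
    "SELECT employee.LastName, department.DepartmentName "
    "FROM employee LEFT OUTER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID "
    "WHERE department.DepartmentName = 'Sales'"
).fetchall()

print(len(filter_in_on), len(filter_in_where))
```

The first query still lists all six employees, five of them with a NULL department name; the second returns only the Sales employee.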

Left outer join

Left outer join, also called left join: if tables A and B are joined with a left outer join, the result always contains all the records of the "left" table (A), even where the join condition finds no matching record in the "right" table (B). That is, even if the ON clause matches 0 rows in B, the join still returns a row, but with NULL in every column that comes from B. In other words, a left outer join returns all rows of the left table combined with the matching rows of the right table (or NULL in all right-table columns where there is no match). If a row of the left table matches multiple rows of the right table, the left row is repeated once for each matching right row in the result.

For example, this allows us to find each employee's department while still listing every employee, even those with no associated department. (The inner join section above shows the opposite case: employees without an associated department do not appear in the result.)

Left outer join example (the row added relative to the inner join is the Jasper row, whose department columns are all NULL):

SELECT * FROM employee LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID

| Employee.LastName | Employee.DepartmentID | Department.DepartmentName | Department.DepartmentID |
|---|---|---|---|
| Jones | 33 | Engineering | 33 |
| Rafferty | 31 | Sales | 31 |
| Robinson | 34 | Clerical | 34 |
| Smith | 34 | Clerical | 34 |
| Jasper | NULL | NULL | NULL |
| Steinberg | 33 | Engineering | 33 |
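A minimal sketch of the left outer join, assuming Python's built-in sqlite3 module and an in-memory database. SQL NULL is returned to Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

left_rows = conn.execute(
    "SELECT * FROM employee LEFT OUTER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
).fetchall()

# The employee without a department is kept, padded with None for the department columns.
unmatched = [row for row in left_rows if row[2] is None]
print(len(left_rows), unmatched)
```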

Right outer join

Right outer join, also called right join, is exactly like the left outer join except that the roles of the two tables are reversed. If table A is right-joined to table B, every row of the "right" table B appears at least once in the joined table. If a row of B has no matching row in the "left" table A, the columns from A are set to NULL in the result.

A right join returns all rows of the right table together with the matching rows of the left table (where there is no match, the left-table columns are set to NULL).

For example, this allows us to look up each employee and their department while still listing departments that have no employees.

Right outer join example (the row added relative to the inner join is the Marketing row, whose employee columns are NULL):

SELECT * FROM employee RIGHT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID

| Employee.LastName | Employee.DepartmentID | Department.DepartmentName | Department.DepartmentID |
|---|---|---|---|
| Smith | 34 | Clerical | 34 |
| Jones | 33 | Engineering | 33 |
| Robinson | 34 | Clerical | 34 |
| Steinberg | 33 | Engineering | 33 |
| Rafferty | 31 | Sales | 31 |
| NULL | NULL | Marketing | 35 |

In practice, explicit right joins are rarely used, because they can always be replaced by a left join with the table order swapped, and a right join offers no functionality beyond that of a left join. The same rows as in the table above (with the columns in a different order) can be obtained with a left join:

SELECT * FROM department LEFT OUTER JOIN employee ON employee.DepartmentID = department.DepartmentID
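The left join form can be tried with a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. Only the LEFT JOIN rewrite is executed here, because older SQLite builds do not accept RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

# department is now the preserved (left) table, so every department is listed.
rows = conn.execute(
    "SELECT department.DepartmentName, employee.LastName "
    "FROM department LEFT OUTER JOIN employee "
    "ON employee.DepartmentID = department.DepartmentID"
).fetchall()

last_names = {row[1] for row in rows}
print(len(rows), ("Marketing", None) in rows)
```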

Full outer join

A full outer join is the union of the left and right outer joins. The joined table contains all records of both tables, with NULL filling the columns where there is no matching record.

For example, this allows us to see every employee who is in a department and every department that has employees, and also the employees who are in no department and the departments that have no employees.

Full outer join example:

SELECT * FROM employee FULL OUTER JOIN department ON employee.DepartmentID = department.DepartmentID

| Employee.LastName | Employee.DepartmentID | Department.DepartmentName | Department.DepartmentID |
|---|---|---|---|
| Smith | 34 | Clerical | 34 |
| Jones | 33 | Engineering | 33 |
| Robinson | 34 | Clerical | 34 |
| Jasper | NULL | NULL | NULL |
| Steinberg | 33 | Engineering | 33 |
| Rafferty | 31 | Sales | 31 |
| NULL | NULL | Marketing | 35 |

Some database systems (such as MySQL) do not support full outer joins directly, but they can be emulated with the union of a left and a right outer join (see UNION). An equivalent of the example above:

SELECT * FROM employee LEFT JOIN department ON employee.DepartmentID = department.DepartmentID
UNION
SELECT * FROM employee RIGHT JOIN department ON employee.DepartmentID = department.DepartmentID
WHERE employee.DepartmentID IS NULL

SQLite versions before 3.39.0 do not support right joins; there, a full outer join can be emulated as follows:

SELECT employee.*, department.* FROM employee LEFT JOIN department ON employee.DepartmentID = department.DepartmentID
UNION ALL
SELECT employee.*, department.* FROM department LEFT JOIN employee ON employee.DepartmentID = department.DepartmentID
WHERE employee.DepartmentID IS NULL
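The emulation built from two left joins can be run as a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. It uses only LEFT JOIN and UNION ALL, so it does not depend on native FULL OUTER JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

full_rows = conn.execute("""
    SELECT employee.*, department.*
    FROM employee LEFT JOIN department
      ON employee.DepartmentID = department.DepartmentID
    UNION ALL
    -- Second half: only the departments that matched no employee.
    SELECT employee.*, department.*
    FROM department LEFT JOIN employee
      ON employee.DepartmentID = department.DepartmentID
    WHERE employee.DepartmentID IS NULL
""").fetchall()

for row in full_rows:
    print(row)
```

The first half contributes every employee (including Jasper), and the second half adds the one department without employees.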

Self-join

A self-join joins a table with itself. The following example illustrates it.

Example

Suppose we want a query that finds records like this: each record contains two employees who are from the same country. If there were two separate employee tables, and we wanted the pairs in which the employee from the first table and the employee from the second are in the same country, an ordinary join (equi-join) would produce this table. Here, however, all the employee information is in a single large table.

The following is the modified Employee table:

Employee table (Employee)

| EmployeeID | LastName | Country | DepartmentID |
|---|---|---|---|
| 123 | Rafferty | Australia | 31 |
| 124 | Jones | Australia | 33 |
| 145 | Steinberg | Australia | 33 |
| 201 | Robinson | United States | 34 |
| 305 | Smith | United Kingdom | 34 |
| 306 | Jasper | United Kingdom | NULL |

A query for the sample solution can be written as follows:

SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country FROM Employee F, Employee S WHERE F.Country = S.Country AND F.EmployeeID < S.EmployeeID

Executing it produces the following table:

Employee table (Employee) self-joined on Country

| EmployeeID | LastName | EmployeeID | LastName | Country |
|---|---|---|---|---|
| 123 | Rafferty | 124 | Jones | Australia |
| 123 | Rafferty | 145 | Steinberg | Australia |
| 124 | Jones | 145 | Steinberg | Australia |
| 305 | Smith | 306 | Jasper | United Kingdom |

About this example, please note:

  • F and S are aliases for the first and second copies of the employee table (Employee).

  • The condition F.Country = S.Country excludes pairs of employees from different countries. This example only wants pairs of employees from the same country.

  • The condition F.EmployeeID < S.EmployeeID excludes pairs in which both sides have the same EmployeeID, that is, an employee paired with themselves.

  • F.EmployeeID < S.EmployeeID also excludes duplicate pairs. Without this condition, the query would generate useless data like the following table (shown for the United Kingdom only):

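| EmployeeID | LastName | EmployeeID | LastName | Country |
|---|---|---|---|---|
| 305 | Smith | 305 | Smith | United Kingdom |
| 305 | Smith | 306 | Jasper | United Kingdom |
| 306 | Jasper | 305 | Smith | United Kingdom |
| 306 | Jasper | 306 | Jasper | United Kingdom |

Only the two middle rows pair different employees, and they describe the same pair, so one of them is enough to answer the original question. The first and last rows pair an employee with themselves and are useless for this example.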
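The self-join can be reproduced with a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database. The ORDER BY clause is an addition of this sketch, used only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmployeeID INTEGER, LastName TEXT,
                           Country TEXT, DepartmentID INTEGER);
    INSERT INTO Employee VALUES
        (123, 'Rafferty', 'Australia', 31),
        (124, 'Jones', 'Australia', 33),
        (145, 'Steinberg', 'Australia', 33),
        (201, 'Robinson', 'United States', 34),
        (305, 'Smith', 'United Kingdom', 34),
        (306, 'Jasper', 'United Kingdom', NULL);
""")

# F and S are two aliases for the same table.
pairs = conn.execute("""
    SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country
    FROM Employee F, Employee S
    WHERE F.Country = S.Country
      AND F.EmployeeID < S.EmployeeID   -- drops self-pairs and mirrored duplicates
    ORDER BY F.EmployeeID, S.EmployeeID
""").fetchall()

# Without the inequality, each country with n employees yields n * n pairs.
unfiltered_count = conn.execute(
    "SELECT COUNT(*) FROM Employee F, Employee S WHERE F.Country = S.Country"
).fetchone()[0]

print(pairs)
print(unfiltered_count)
```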

Alternatives

The result of an outer join query can also be obtained with a correlated subquery. For example:

SELECT employee.LastName, employee.DepartmentID, department.DepartmentName FROM employee LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID

It can also be written as follows:

SELECT employee.LastName, employee.DepartmentID, (SELECT department.DepartmentName FROM department WHERE employee.DepartmentID = department.DepartmentID) AS DepartmentName FROM employee
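A minimal sketch comparing the two forms, assuming Python's built-in sqlite3 module and an in-memory database. The equivalence relies on each employee matching at most one department, which holds for this sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
    CREATE TABLE employee (LastName TEXT, DepartmentID INTEGER);
    INSERT INTO department VALUES (31, 'Sales'), (33, 'Engineering'),
                                  (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
                                ('Robinson', 34), ('Smith', 34), ('Jasper', NULL);
""")

join_rows = conn.execute("""
    SELECT employee.LastName, employee.DepartmentID, department.DepartmentName
    FROM employee LEFT OUTER JOIN department
      ON employee.DepartmentID = department.DepartmentID
""").fetchall()

# The scalar subquery is evaluated once per employee row and
# yields NULL when no department matches.
subquery_rows = conn.execute("""
    SELECT employee.LastName, employee.DepartmentID,
           (SELECT department.DepartmentName
            FROM department
            WHERE employee.DepartmentID = department.DepartmentID) AS DepartmentName
    FROM employee
""").fetchall()

print(set(join_rows) == set(subquery_rows))
```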

Join algorithms

There are three basic algorithms for performing a join operation.

Nested loop join (LOOP JOIN)

Similar to a doubly nested loop in C. The table scanned row by row in the outer loop is called the outer input; for each row of the outer input, the other table, which is scanned and checked for matches, is called the inner input (it corresponds to the inner loop). This algorithm suits the case where the outer input has few rows and the inner input has an index on the join column.
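The double loop can be sketched in Python as follows. This is an illustrative in-memory version, not a database engine's implementation: the function name nested_loop_join and the tuple-based rows are assumptions of the sketch, and the inner input is scanned linearly rather than probed through an index.

```python
def nested_loop_join(outer, inner, outer_key, inner_key):
    """Equi-join two lists of tuples with a double loop.

    outer_key and inner_key are the positions of the join column in each row.
    """
    result = []
    for outer_row in outer:          # outer loop: one pass over the outer input
        key = outer_row[outer_key]
        if key is None:              # a NULL key matches nothing
            continue
        for inner_row in inner:      # inner loop: full scan of the inner input
            if inner_row[inner_key] == key:
                result.append(outer_row + inner_row)
    return result


departments = [(31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")]
employees = [("Rafferty", 31), ("Jones", 33), ("Steinberg", 33),
             ("Robinson", 34), ("Smith", 34), ("Jasper", None)]

# The smaller table is used as the outer input.
joined = nested_loop_join(departments, employees, 0, 1)
for row in joined:
    print(row)
```

With an index on the inner input, the inner scan would be replaced by an index lookup, which is what makes this algorithm attractive for a small outer input.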

Merge join (MERGE JOIN)

Similar to merging two sorted arrays. Both inputs are sorted on the join column; the two tables are then scanned in order, and rows are joined or discarded as the scan advances. If indexes already provide the sort order, the computational complexity of a merge join is linear.
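A sketch of the merge step in Python, under stated assumptions: the function name merge_join is made up, rows are tuples, and the function sorts its inputs itself (a real engine would rely on an index or a prior sort). Rows with a None key are dropped up front, since a NULL key matches nothing.

```python
def merge_join(left, right, left_key, right_key):
    """Equi-join two lists of tuples by merging them in key order."""
    left = sorted((r for r in left if r[left_key] is not None), key=lambda r: r[left_key])
    right = sorted((r for r in right if r[right_key] is not None), key=lambda r: r[right_key])

    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1                   # left row has no partner: discard it
        elif lk > rk:
            j += 1                   # right row has no partner: discard it
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left row with this key against the whole run.
            while i < len(left) and left[i][left_key] == lk:
                for right_row in right[j:j_end]:
                    result.append(left[i] + right_row)
                i += 1
            j = j_end
    return result


employees = [("Rafferty", 31), ("Jones", 33), ("Steinberg", 33),
             ("Robinson", 34), ("Smith", 34), ("Jasper", None)]
departments = [(31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")]

merged = merge_join(employees, departments, 1, 0)
for row in merged:
    print(row)
```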

Hash join (HASH JOIN)

Suitable for intermediate query results, which are usually temporary tables without indexes, especially when they have a large number of rows. A hash join selects the input with fewer rows as the build input, applies a hash function to the values of its join column, and places its rows (or their storage locations) into hash buckets.
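The build phase described above can be sketched in Python with a dictionary acting as the hash buckets. The probe phase (looking up each row of the other input in the buckets) is not spelled out in the text and is added here as the usual second half of the algorithm. The function name hash_join and the tuple-based rows are assumptions of the sketch.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Equi-join two lists of tuples; output rows are always left + right."""
    # The input with fewer rows becomes the build input.
    build_is_left = len(left) <= len(right)
    if build_is_left:
        build, build_key, probe, probe_key = left, left_key, right, right_key
    else:
        build, build_key, probe, probe_key = right, right_key, left, left_key

    # Build phase: hash the join column of the build input into buckets.
    buckets = defaultdict(list)
    for row in build:
        if row[build_key] is not None:   # a NULL key matches nothing
            buckets[row[build_key]].append(row)

    # Probe phase: look up each probe row's key in the buckets.
    result = []
    for row in probe:
        for match in buckets.get(row[probe_key], []):
            result.append(match + row if build_is_left else row + match)
    return result


employees = [("Rafferty", 31), ("Jones", 33), ("Steinberg", 33),
             ("Robinson", 34), ("Smith", 34), ("Jasper", None)]
departments = [(31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")]

hashed = hash_join(employees, departments, 1, 0)
for row in hashed:
    print(row)
```

Here the four-row department list is the build input, and the six employee rows are probed against it one by one.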
