The relationship between parallel execution and partitioning, which speeds up processing of large data volumes (1/4): CodeZine
Introduction
In Part 3, we looked at partitioning, a data structure that limits the range of data that must be read when accessing a large amount of data. This time, I will explain parallel execution (also called parallel processing), which is often used for workloads that access large amounts of data, such as analysis and aggregation. Parallel execution is closely related to partitioning.
Note that the database used as the example in this series is Oracle Database, provided by Oracle Corporation, but the explanations are written so that readers who do not use Oracle products can also refer to them.
Target readers
This series is intended for the following readers.
What is parallel execution?
The tens of millions, or even billions, of records generated by transaction systems are later used for aggregation and analysis. In other words, a single aggregation or analysis process must access a large amount of data. Because such processing is generally time-consuming, a common way to make it finish quickly is to combine partitioning, which limits the range of data accessed (covered in Part 3), with parallel execution.
Database servers generally have multiple CPU cores and multiple storage devices. When there is a large amount of data to process, parallel execution aims to reduce the overall processing time by dividing the work, assigning the pieces to multiple CPU cores, and running them at the same time. Recent CPUs have dozens of cores per socket, so parallel execution on even a single server can be expected to reduce processing time significantly.
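As a minimal sketch of the divide-and-assign idea, assume an in-memory list stands in for a table and a simple SUM is the aggregation: the data is split into chunks, each chunk is summed by a separate worker, and the partial sums are combined. The names `split_into_chunks` and `parallel_sum` are hypothetical. Threads are used only to keep the example self-contained; in CPython, CPU-bound work needs separate processes (as a DBMS uses) to actually run on multiple cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_workers):
    """Divide rows into n_workers contiguous chunks of nearly equal size."""
    size, extra = divmod(len(rows), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks receive one additional row.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def parallel_sum(rows, n_workers=4):
    """Sum rows by giving each chunk to a worker and combining partial sums."""
    chunks = split_into_chunks(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Combining the partial results is a small serial step.
    return sum(partial_sums)


if __name__ == "__main__":
    amounts = list(range(1, 1_000_001))
    print(parallel_sum(amounts, n_workers=8))  # 500000500000
```

Splitting into contiguous, nearly equal chunks keeps each worker's share balanced, which matters because the parallel phase lasts as long as the slowest worker.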
A primitive way to implement parallel execution in a system is for a person to divide the data to be processed and feed each portion to a separate instance of the processing program. However, many DBMSs can automatically parallelize the processing of a single SQL statement. This time, we will look at how this automatic parallelization of a single statement is implemented.
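The manual approach might look like the following sketch, which divides a numeric key range into disjoint sub-ranges and builds one WHERE-clause predicate per program instance. The helper `make_range_predicates` and the column name `order_id` are hypothetical, and the sketch assumes rows are spread evenly across the key range; in practice that has to be checked, or some instances will take much longer than others.

```python
def make_range_predicates(low, high, n_parts, column="order_id"):
    """Split the half-open key range [low, high) into n_parts disjoint ranges
    and return one SQL predicate string per range."""
    if n_parts <= 0 or high <= low:
        raise ValueError("need n_parts > 0 and high > low")
    width, extra = divmod(high - low, n_parts)
    predicates, start = [], low
    for i in range(n_parts):
        # The first `extra` ranges are one key wider so the whole range is covered.
        end = start + width + (1 if i < extra else 0)
        predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


if __name__ == "__main__":
    # Each predicate would be handed to a separate copy of the batch program,
    # e.g. appended to "SELECT ... FROM sales WHERE " (table name hypothetical).
    for p in make_range_predicates(1, 1001, 4):
        print(p)
```

Using half-open ranges guarantees that no key is processed twice and none is skipped, but the person running the programs remains responsible for launching them and combining their outputs.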
Process architecture for parallel execution
In parallel execution, the DBMS launches multiple OS processes or threads and assigns them to multiple CPU cores (or hardware threads) so that the work runs simultaneously. Because the results produced by these processes must be returned to a single database client, they are gathered into one process, which then sends them back to the client. The process that the database client is connected to supervises the parallel execution and is generally called the coordinator process.
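The following sketch models this architecture with threads, assuming a GROUP BY-style count as the workload: each worker aggregates its share of rows and puts a partial result on a queue, and the coordinator collects one message per worker, merges them, and returns a single result, just as only the coordinator answers the client. The function names `worker` and `coordinator` are illustrative and are not any DBMS's API.

```python
import queue
import threading
from collections import Counter


def worker(rows, results):
    """Parallel execution worker: count rows per key in its share and send
    the partial result to the coordinator."""
    results.put(Counter(rows))


def coordinator(rows, n_workers=4):
    """Start workers, gather their partial results, and merge them into the
    single result that would be returned to the database client."""
    results = queue.Queue()
    # Round-robin assignment of rows to workers.
    shares = [rows[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(share, results))
               for share in shares]
    for t in threads:
        t.start()
    merged = Counter()
    for _ in range(n_workers):
        # Only the coordinator combines results; this step runs serially.
        merged.update(results.get())
    for t in threads:
        t.join()
    return dict(merged)


if __name__ == "__main__":
    regions = ["east", "west", "east", "north", "west", "east"]
    print(coordinator(regions, n_workers=3))
```

The merge loop is the part that cannot be parallelized: the more partial results the workers send back, the more serial work the coordinator does.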
In Oracle Database, when a database client connects to the Oracle server, the connection is served by one Oracle server process. When a statement is executed in parallel, this server process takes on the role of the coordinator process.
The data to be processed is divided among multiple parallel execution processes, and this is where partitioning comes in. If the data is split and assigned inappropriately, the processing times of the parallel execution processes become uneven. Furthermore, if the parallel execution processes cannot exchange data with one another, an inappropriate split increases the work the coordinator process must do to combine the results returned by each parallel execution process; in other words, the portion of the work that cannot be parallelized grows.
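The effect of an uneven split can be estimated with a simple model, assuming each row costs one unit of time: the parallel phase ends only when the slowest worker finishes, so elapsed time follows the largest share rather than the average. Amdahl's law (a standard formula, not something stated in this article) then shows how a serial part, such as the coordinator's merge, caps the achievable speedup. `elapsed_units` and `amdahl_speedup` are hypothetical helpers.

```python
def elapsed_units(rows_per_worker, serial_units=0):
    """Elapsed time in row-units: the slowest worker plus any serial work
    (for example, the coordinator combining results)."""
    return max(rows_per_worker) + serial_units


def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_workers)


if __name__ == "__main__":
    total = 1_000_000
    balanced = [total // 4] * 4                     # 250,000 rows each
    skewed = [700_000, 100_000, 100_000, 100_000]   # same total, one hot worker
    print(elapsed_units(balanced))  # 250000
    print(elapsed_units(skewed))    # 700000
    # Even with 32 workers, a 5% serial part limits the speedup to about 12.5x.
    print(round(amdahl_speedup(0.95, 32), 1))  # 12.5
```

Both splits process the same million rows, yet the skewed one takes 2.8 times as long under this model, which is why the way data is partitioned and assigned matters as much as the number of workers.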