Database System Best Practices Series -- Using Prepared Statement
Database System Best Practices Series -- Using Prepared Statement
Prepared statement is a feature supported by all mainstream relational databases, including Klustron.
Today I will talk about the benefits of using prepared statements. There are two main advantages: first, it can greatly improve the security of information systems and prevent SQL injection; second, it can improve query performance to a certain extent.
Therefore, it is strongly recommended that you use the prepared statement feature. As for the specific programming and API usage of prepared statements, this article will not elaborate on it. You can refer to the documentation of the client API of the relevant database. The following will explain the working principles of the two huge advantages of the prepared statement feature in detail.
Why Prepared Statement Can Prevent SQL Injection
The Principle of SQL Injection
Most information systems today use a software or webpage as the front-end to receive user input, and then use these inputs in the application layer of the server (i.e. using java, php, python, CGI, etc. to develop server application software systems) to implement application logic, and finally use user input as parameters to operate the backend database for data manipulation.
Many application developers directly use the front-end user's input string as part of the SQL query, concatenating it into an SQL statement. This is a very poor programming practice. If there is a system that has a function to delete products on the front-end, it uses a text box to receive the user input of a product code, and then the back-end uses this code as a parameter to delete the product. The pseudo-code for concatenating SQL strings is:
SQLstr = 'delete from products where pcode=\'' + textbox.string_value + '\'';
If the user inputs a string like this in the frontend: xx' or '1, then the resulting SQL statement sent to the database is
delete from products where pcode=’xx’ or ’1‘;
And this statement will delete all data in the system products table.
Typically in such cases, if the user input is used as a non-string parameter value, such as a numerical value, experienced developers will convert the user input to a constant of the required type and then concatenate it, so as to avoid injection. Using the above query as an example, if the pcode column is of the int type, then the user input xx‘ or ‘1
would be converted to 0 after numerical conversion, and then concatenated with the SQL statement, resulting in the following statement sent to the backend: delete from products where pcode=0
Although this may result in the deletion of another row, it will not result in the loss of all table data. However, some inexperienced developers do not even perform the above numerical type conversions, making it impossible to avoid such pitfalls in simple cases.
And if the pcode in the above example is indeed of string type, there are some application-level tools and development examples to prevent SQL injection, such as prohibiting various operator characters (such as +=-/|&) or prohibiting certain SQL keywords (such as select, or, and, etc.), but it is impossible to completely eliminate SQL injection.
It can be said that as long as the user input string is still directly parsed by the syntax parser of the relational database, SQL injection can always be achieved. In simpler terms: as long as you are still directly concatenating SQL statements with the user input string (partially or entirely) at the application layer, SQL injection will always occur.
So how can you avoid SQL injection once and for all? This is where prepared statements come into play.
Working Principle of Prepared Statement
Taking the SQL statement mentioned earlier as an example, the prepared statement is: delete from products where pcode=?;
The SQL parser of the relational database parses and optimizes this statement to form a query plan. The client-side user program obtains a handle to this prepared query statement, which is valid for the session's lifetime. This query plan is the method for executing this query statement, but the query plan cannot be executed without specific parameters.
When the user binds parameters to this prepared statement using its handle, the user can execute this query plan. At this point, regardless of what parameters the user binds, the parameter is not re-parsed by the SQL parser. It is only used as a specified type of parameter value by this query plan to complete the query execution. So, even if the parameter bound by the user is xx‘ or '1
the SQL statement executed is equivalent to: delete from products where pcode=‘xx\' or \'1';
This way, there is no risk of any data loss.
Moreover, users can repeatedly bind different parameters to this prepared statement. If there are a large number of SQL statements with this structure, the database system avoids parsing and optimizing the same SQL statements repeatedly, thereby improving system performance.