DuckDB-NSQL is a 7-billion-parameter text-to-SQL model created by MotherDuck and Numbers Station. It generates SQL queries from natural language prompts and is aimed at developers and data analysts who work with DuckDB.
DuckDB-NSQL is built on Meta’s original Llama-2 7B model. It was further pre-trained on a dataset of general SQL queries and then fine-tuned on text-to-SQL pairs specific to DuckDB.
This model stands out due to the following features:
To generate SQL with DuckDB-NSQL, provide a table schema followed by a natural language instruction. Here’s a basic example of a prompt and the query the model returns:
Provided this schema:

CREATE TABLE orders (
    OrderID bigint,
    CustomerID bigint,
    OrderDate timestamp,
    ProductName varchar(255),
    Quantity int,
    PricePerUnit double,
    TotalAmount double
);

Give me orders placed after January 1, 2023

SELECT * FROM orders WHERE OrderDate > '2023-01-01';
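The prompt in the example above can be assembled programmatically. The following is a minimal sketch, assuming the plain "Provided this schema: ... question" layout shown here. The helper name `build_prompt` and the blank-line separators are illustrative assumptions, not part of any DuckDB-NSQL API, and the call that sends the prompt to the model is deliberately left out.

```python
def build_prompt(schema: str, question: str) -> str:
    """Assemble a text-to-SQL prompt from a schema and a question.

    Hypothetical helper: the layout mirrors the example above, and the
    blank lines between the parts are an assumption.
    """
    schema = schema.strip()
    question = question.strip()
    # Terminate the DDL so the schema and the question stay clearly separated.
    if not schema.endswith(";"):
        schema += ";"
    return f"Provided this schema:\n\n{schema}\n\n{question}"


ORDERS_SCHEMA = """
CREATE TABLE orders (
    OrderID bigint,
    CustomerID bigint,
    OrderDate timestamp,
    ProductName varchar(255),
    Quantity int,
    PricePerUnit double,
    TotalAmount double
)
"""

prompt = build_prompt(ORDERS_SCHEMA, "Give me orders placed after January 1, 2023")
print(prompt)
```

Keeping prompt construction in one function makes it easier to hold the format constant across requests, which matters because the model is sensitive to the prompt layout it was trained on.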
The model has been fine-tuned on a dataset consisting of:
The model was trained with cross-entropy loss to maximize the likelihood of sequential inputs. When fine-tuning on text-to-SQL pairs, the loss was computed only over the SQL portion of each pair.
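A common way to focus cross-entropy training on the SQL side of a text-to-SQL pair is to mask the prompt tokens out of the loss. The sketch below illustrates that idea in plain Python under stated assumptions: the function name, the mask convention, and the toy probabilities are all hypothetical, and a real training run would use a tensor library rather than Python lists.

```python
import math


def masked_cross_entropy(token_probs, loss_mask):
    """Mean negative log-likelihood over the positions where loss_mask is 1.

    token_probs: probability the model assigned to each correct next token.
    loss_mask:   1 for tokens that count toward the loss (e.g. SQL tokens),
                 0 for tokens that are ignored (e.g. prompt tokens).
    """
    if len(token_probs) != len(loss_mask):
        raise ValueError("token_probs and loss_mask must have the same length")
    counted = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    if not counted:
        raise ValueError("loss_mask selects no tokens")
    return sum(counted) / len(counted)


# Toy example: 3 prompt tokens followed by 2 SQL tokens (made-up probabilities).
probs = [0.10, 0.20, 0.30, 0.50, 0.25]
mask = [0, 0, 0, 1, 1]
loss = masked_cross_entropy(probs, mask)
print(f"loss over SQL tokens only: {loss:.4f}")
```

Because the first three positions are masked, the low probabilities assigned to the prompt tokens have no effect on the result; only the two SQL positions contribute.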
DuckDB-NSQL is intended for generating SQL queries from a given table schema and a natural language instruction. It works best with the prompt format it was trained on, and its output is not limited to SELECT queries: it can produce other kinds of SQL statements as well.
For those who want to query databases with a locally run language model, DuckDB-NSQL can be added to existing workflows.