
Testing SQL queries without a database

I recently devised a way to test SQL queries without the benefit (or headache) of an actual database. This works on the simple cases I need.

I’m big on code testing and test-driven development in general. I also write a lot of small utilities for my company’s products that perform SQL queries. So naturally I went looking for a simple solution to test the results of the queries.

The tests generally consist of an input set of data, a query, and a result set of data or a result status. There tend to be lots of little queries to test. The cases (or datasets) of interest don’t need to be complicated or large; they just need to represent the problem at hand, usually in a few rows. Additionally, the queries all operate on the same tables but shouldn’t interfere with each other. So the need is for a query to be tested against different datasets representing the same database, and to do so in isolation.

A regular database has the benefit of parsing and performing the query as intended, but also has the baggage of the system. In order to change datasets, you either need to inject the data into the database via a query or you need to detach a database file and attach a file with a different dataset (we won’t consider running multiple database instances). It is also difficult to capture the initial dataset for each test case. The system has to be maintained and kept running, etc.

It would be nice to have an easier method of generating and querying datasets, preferably text-based. Getting rid of the baggage of a running database system would be nice as well.

The solution I found does this, but it has limitations.

As for the data format, CSV works fine. The tabular form used by databases lends itself naturally to CSV, which also opens in Excel. As Excel is one of my favorite general-purpose tools, CSV is both powerful and accessible.

As for how to access the data, Microsoft provides a built-in ODBC driver for text files. By using ODBC to run the query, you can run a real Microsoft SQL query against a text file as if it were a database, provided you do some setup and only need to read the data. Updates and deletes are not supported by the Text driver.
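Connecting through the Text driver from Python might look like the following sketch. The driver name and paths here are assumptions — check what your ODBC Data Sources tool actually lists (older systems ship “Microsoft Text Driver (*.txt; *.csv)”, while the Access Database Engine installs “Microsoft Access Text Driver (*.txt, *.csv)”):

```python
# Sketch: building a connection string for the ODBC Text driver.
# The driver name below is an assumption; adjust it to whatever your
# ODBC Data Sources tool shows.

def text_driver_conn_str(data_dir):
    """Build a connection string pointing the Text driver at a directory
    of CSV files; each file in the directory acts as a table."""
    return (
        "Driver={Microsoft Text Driver (*.txt; *.csv)};"
        "Dbq=%s;Extensions=asc,csv,tab,txt;" % data_dir
    )

# Usage (requires pyodbc and the driver, so Windows only):
#   import pyodbc
#   conn = pyodbc.connect(text_driver_conn_str(r"C:\tests\data"))
#   rows = conn.cursor().execute("SELECT * FROM [mytable.csv]").fetchall()
```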

I need to re-edit this post to provide the exact details, but here are the broad strokes:

  1. Using the ODBC Data Sources tool, create a File DSN that specifies the format of your database. Specify the file that represents your table and make sure the format (CSV) is set and the column headings are defined as fields. This generates a DSN file (I named it txt.dsn) and a schema.ini file. I put both in the test directory so they get checked into version control and propagated to the build server.
  2. Put the data for a query into the CSV file.
  3. Generate the SQL query that you are testing and, if the test requires a result set, the expected result set.
  4. Use a tool that provides ODBC access (in my case, Python with pyodbc and py.test) to write a test which compares the actual result of the query to the expected result.
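For reference, a generated schema.ini entry looks roughly like this — the table name and columns below are made up for illustration; the ODBC Data Sources tool writes the real entries for you:

```ini
[mytable.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=id Integer
Col2=name Char Width 50
```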

The SQL query which you are testing will need to refer to the table by the actual filename, “.csv” extension included. Since I’m testing real queries, I have to rewrite the table references in the query to use the proper filename.
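That rewrite step can be sketched as follows. The helper name is mine, and it assumes the table name appears in the query as a plain whole-word identifier:

```python
import re

def point_query_at_csv(sql, table, csv_name):
    """Replace whole-word references to `table` with the bracketed CSV
    filename the Text driver expects, e.g. Orders -> [orders.csv]."""
    return re.sub(r"\b%s\b" % re.escape(table), "[%s]" % csv_name, sql)

# point_query_at_csv("SELECT id FROM Orders", "Orders", "orders.csv")
# yields "SELECT id FROM [orders.csv]"
```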

Because of the limitations of the Text driver, you can test simple SELECT queries (I haven’t tested joins), but I don’t think you can handle more complex things like stored procedures, selecting into tables, variables, and so on.

For my purposes, I only need to be able to have and switch between multiple small datasets, which is cake with this system. Put your datasets into CSV files, make sure the schema.ini file gets updated with field definitions (using the ODBC data source tool) and direct your queries to the filenames. Beats a real database hands down, while letting you test SQL queries with features like subqueries and aliasing.

You can also keep result sets in CSV files and use Python’s csv module to compare result rows. To be independent of ordering issues with result sets, I convert the actual and expected result sets to Python sets before comparing. Converting to sets also requires that the rows themselves (both actual and expected) be cast to tuples so they can be hashed as set members.
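That comparison can be sketched like so (the function names are mine, not part of any library):

```python
import csv

def as_set(rows):
    """Cast each row to a tuple (hashable) and collect the tuples into a
    set, making the comparison independent of row order."""
    return set(tuple(row) for row in rows)

def load_expected(path):
    """Read an expected result set from a CSV file as a set of tuples."""
    with open(path, newline="") as f:
        return as_set(csv.reader(f))

# In a test:
#   assert as_set(cursor.fetchall()) == load_expected("expected.csv")
```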

Wrinkles

You can’t update records and you can’t delete them. This is essentially read-only. I test these kinds of queries by converting them to SELECT statements where possible. INSERTS are supposedly supported.

When comparing results from ODBC against rows read with Python’s csv module, blank fields come across as None from ODBC and as ‘’ from csv. A quick list comprehension converts one to the other.
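A sketch of that conversion — whichever side produces None in your setup, normalizing both toward empty strings makes the rows comparable:

```python
def blanks_to_empty(row):
    """Normalize blank fields: map None to '' so ODBC rows and csv-module
    rows compare equal."""
    return tuple("" if field is None else field for field in row)
```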

Whitespace in fields is also a comparison issue. Either make sure fields are stripped of whitespace in your CSV, or do whitespace stripping in your code.
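If you strip in code, a minimal sketch might be:

```python
def stripped(row):
    """Strip leading/trailing whitespace from string fields before
    comparing actual and expected rows; leave other types alone."""
    return tuple(f.strip() if isinstance(f, str) else f for f in row)
```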