Looking for a Better Database Solution for Trading Systems

Tags: database, csharp, sql-server, sqlite
Rene Rose 2021 (#1):

I develop trading systems for stocks and other securities. To manage price data from various sources, I use a custom program written in VB.Net. This program maintains one LiteDB database per underlying, resulting in several dozen database files. I've found that LiteDB is very slow when inserting large amounts of data, and since I import a significant amount of data, I'm looking for a more efficient solution. Currently I'm considering switching to a relational DB system such as SQLite or MS SQL Server.

My data model includes a class for price data from CSV files, which contains the following fields:

  • "Identifier" (text)
  • "TimeStamp" (date/time)
  • "Open", "High", "Low", "Close" (single-precision floating-point numbers)
  • "Size" (integer)

The issue is that the identifier is identical for several hundred thousand data points, and the timestamps and price values often repeat. Since this data is spread across various instruments, duplicate entries occur. One possible solution would be to split this data class across multiple tables to avoid the duplicated values. However, I'm concerned that this may cause performance issues when inserting several hundred thousand data points. Has anyone had similar experiences or advice to share? Thanks in advance!
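
For context, here is a minimal sketch of what a batched SQLite import of this model could look like from .NET, assuming the Microsoft.Data.Sqlite package; the table, column, and parameter names are illustrative, not taken from the original program. The two levers that matter most for insert speed are a single transaction around the whole batch and one prepared command reused for every row.

    ' Minimal sketch only: assumes the Microsoft.Data.Sqlite NuGet package,
    ' and all table, column, and parameter names are illustrative.
    Imports Microsoft.Data.Sqlite

    Module PriceImportSketch
        Sub Main()
            Using conn As New SqliteConnection("Data Source=prices.db")
                conn.Open()

                ' Factor the repeating identifier into its own table so the
                ' text key is stored once and price rows carry an integer.
                Dim ddl As String =
                    "CREATE TABLE IF NOT EXISTS Instrument (" &
                    " Id INTEGER PRIMARY KEY," &
                    " Identifier TEXT NOT NULL UNIQUE);" &
                    "CREATE TABLE IF NOT EXISTS Price (" &
                    " InstrumentId INTEGER NOT NULL REFERENCES Instrument(Id)," &
                    " TimeStamp TEXT NOT NULL," &
                    " Open REAL, High REAL, Low REAL, Close REAL," &
                    " Size INTEGER);"
                Using cmd As New SqliteCommand(ddl, conn)
                    cmd.ExecuteNonQuery()
                End Using

                ' One transaction around the whole batch and one prepared
                ' command reused per row: this is where the speedup comes from.
                Using tx = conn.BeginTransaction()
                    Using ins As New SqliteCommand(
                        "INSERT INTO Price VALUES ($id, $ts, $o, $h, $l, $c, $sz)",
                        conn, tx)
                        ins.Parameters.Add("$id", SqliteType.Integer)
                        ins.Parameters.Add("$ts", SqliteType.Text)
                        ins.Parameters.Add("$o", SqliteType.Real)
                        ins.Parameters.Add("$h", SqliteType.Real)
                        ins.Parameters.Add("$l", SqliteType.Real)
                        ins.Parameters.Add("$c", SqliteType.Real)
                        ins.Parameters.Add("$sz", SqliteType.Integer)
                        ' For each parsed CSV row: set ins.Parameters("$id").Value
                        ' and so on, then call ins.ExecuteNonQuery().
                    End Using
                    tx.Commit()
                End Using
            End Using
        End Sub
    End Module

With this shape, batches of several hundred thousand rows tend to be fast in SQLite; PRAGMA journal_mode = WAL and PRAGMA synchronous = NORMAL are common additional knobs for bulk loads.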

Jeremy Falcon (#2):

Are there no other NoSQL databases you can try besides LiteDB? If you go the relational route and attempt to normalize the flattened data you're inserting, it's not going to get faster. I'm not sure whether your import is meant to be real-time or not, so these are just general-purpose ideas. Your options would be:

1) Are you using connection pooling, or opening a new connection for every insert? Perhaps that's the bottleneck. LiteDB may not have the concept of connections at all, but if it does, that's the first place to check.

2) Can you throw more threads at this import process? Will LiteDB even handle concurrency, or will it choke?

3) Determine why LiteDB is currently choking. Is the bottleneck in your code or in the DB? Is there a locked transaction not completing? Is it thread safe? Are multiple threads blocking on a lock? Etc.

4) If the above doesn't work, find a different NoSQL DB that doesn't choke. Get a real one that was built for concurrency, not one shipped as a DLL, as it'll have the best throughput even for a single user. I've used MongoDB; I'm sure there are others.

5) If none of that works, download MariaDB (a MySQL fork), but make sure the import table uses the MyISAM storage engine. SQL Server won't be anywhere near as fast, since it doesn't let you choose storage engines. MyISAM is for unnormalized data only and supports nothing fancy like triggers, constraints, or foreign keys, but it is fast precisely for that reason. You can always have another process/ETL transform the data later, outside of real time.

6) Since it's just an import, the fastest (but last-resort) solution would be to write the data out to an append-only binary file; see the sketch after this list. You'd still need a process to import it into an actual DB, but that can be offloaded so the original process isn't bottlenecked.
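
A rough sketch of option 6, under stated assumptions: fixed-width binary records, with a record layout invented here to mirror the poster's data class (it is not from the original program).

    ' Sketch of option 6: append fixed-width binary records to a spool file
    ' that a separate, offline process later loads into a real database.
    Imports System.IO

    Module BinarySpoolSketch
        Sub Main()
            ' Keep one stream open for the whole import; reopening the file
            ' per record would throw away the speed advantage.
            Using bw As New BinaryWriter(
                New FileStream("prices.spool", FileMode.Append, FileAccess.Write))
                WriteBar(bw, instrumentId:=42, ticks:=Date.UtcNow.Ticks,
                         openPx:=1.0F, highPx:=1.5F, lowPx:=0.9F,
                         closePx:=1.2F, size:=1000)
            End Using
        End Sub

        Sub WriteBar(bw As BinaryWriter, instrumentId As Integer, ticks As Long,
                     openPx As Single, highPx As Single, lowPx As Single,
                     closePx As Single, size As Integer)
            bw.Write(instrumentId)
            bw.Write(ticks)     ' DateTime.Ticks of the bar's timestamp
            bw.Write(openPx)
            bw.Write(highPx)
            bw.Write(lowPx)
            bw.Write(closePx)
            bw.Write(size)
        End Sub
    End Module

A single appender thread like this also sidesteps any database concurrency limits while data is streaming in.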

      Rene Rose 2021 wrote:

      One possible solution would be to distribute this data class across multiple tables to avoid duplicate entries. However, I'm concerned that this may cause performance issues when inserting several hundred thousand data points.

For writing, duplication isn't an issue as long as your tables aren't indexed. Most NoSQL databases don't index by default, so chances are you're good. Reading is a different story, however.
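
To make that write-versus-read trade-off concrete, a common pattern (sketched here for SQLite, with illustrative index and table names) is to drop the read indexes before a bulk load and rebuild them once afterwards:

    ' Illustrative only: drop the read index before a bulk import and
    ' rebuild it afterwards, so each insert skips index maintenance.
    Imports Microsoft.Data.Sqlite

    Module IndexAfterLoadSketch
        Sub BulkLoadWithoutIndex(conn As SqliteConnection, bulkLoad As Action)
            Using cmd As New SqliteCommand("DROP INDEX IF EXISTS IdxPriceTs", conn)
                cmd.ExecuteNonQuery()
            End Using
            bulkLoad()   ' e.g. the transactional insert loop shown earlier
            Using cmd As New SqliteCommand(
                "CREATE INDEX IdxPriceTs ON Price(InstrumentId, TimeStamp)", conn)
                cmd.ExecuteNonQuery()
            End Using
        End Sub
    End Module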

      Jeremy Falcon

RedDk (#3):

Cutting to the chase: both SQLite (always) and MS SQL Server (for development, certainly) are free. Assuming you've got your experience writing queries against LiteDB and all your code in place, it should be rather easy to convert to either one and, without much ado, find out for yourself whether they solve the problems you have.

jschell (#4):

          Rene Rose 2021 wrote:

          The issue is that the identifier is identical for several hundred thousand data points,

Err... that is not a 'database' problem. It is an architecture/design problem. Can you store duplicate rows in a database? Yes. Should you? That is an architecture/design question. Certainly, if the rows are exactly the same, there is no point in storing them. But even in that case, you can.
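
If the design decision is that exact duplicates should not be stored, SQLite can enforce that at the schema level; a sketch with illustrative names:

    ' Illustrative sketch: a unique key over (InstrumentId, TimeStamp) plus
    ' INSERT OR IGNORE makes SQLite silently skip exact duplicate bars.
    Imports Microsoft.Data.Sqlite

    Module DedupSketch
        Sub CreateDedupTable(conn As SqliteConnection)
            Dim ddl As String =
                "CREATE TABLE IF NOT EXISTS Price (" &
                " InstrumentId INTEGER NOT NULL," &
                " TimeStamp TEXT NOT NULL," &
                " Open REAL, High REAL, Low REAL, Close REAL," &
                " Size INTEGER," &
                " UNIQUE (InstrumentId, TimeStamp));"
            Using cmd As New SqliteCommand(ddl, conn)
                cmd.ExecuteNonQuery()
            End Using
            ' Inserts then use: INSERT OR IGNORE INTO Price VALUES (...)
        End Sub
    End Module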

          Rene Rose 2021 wrote:

          when inserting several hundred thousand data points.

Over what period of time? One day? Easy; any database can handle that. One second? Then yes, that is going to be a problem. But even then, if it only happens once a day, it is not a problem. If it is continuous, then yes, it is a problem. Let's say it is continuous, you do 100,000 rows a second, and each row is 100 bytes. Then Size = (24 * 60 * 60) * 100,000 * 100. If I did the math right, that means you are storing roughly 800 GB a day, which adds up to several hundred terabytes a year, heading toward a petabyte within a few years. So you might want to check your cloud pricing options before you dive in. Not to mention that 100 bytes of data for one row is a very small estimate.
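
Writing that arithmetic out, using the assumed 100,000 rows/s and 100 bytes/row:

    (24 * 60 * 60) s/day * 100,000 rows/s * 100 bytes/row
        = 8.64 * 10^11 bytes/day  (roughly 864 GB/day)
    864 GB/day * 365 days ≈ 315 TB/year  (a petabyte roughly every three years)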

RichardInToronto (#5):

Hi Rene, I have no experience with kdb+, and I don't know whether it works with your development tools. I went to a couple of live demonstrations of it a few years ago. It has a programming language called "Q" that is similar to SQL, but less verbose. I can't remember if it was aggregation being demonstrated, but the speed was incredible. You might want to investigate it: kdb+ | KX. Hope that helps.

Richard Rogers (LinkedIn: https://www.linkedin.com/in/rrcdn/)
