It is my first question on Database Administration Stack Exchange. Please be kind if something is not done the nice way..
I would like to be advised concerning the way I’m working on my database, in order to make it as ACID as possible.
I am working on a Web Application using Flask Framework (Python) and a PostgreSQL (9.4) database. I am using SQLAlchemy as ORM.
I am doing Financial Analysis on Stocks handled by my Hedge Fund.
My work consists in 3 parts:
- updating the database daily
- doing animal mathematics on it
- looking for best results
- presenting the results on the web app
Here is my database:
security
table (the column active
only means whether the stock should be updated or not):
id | ticker | active
----+----------------+--------
1 | 3993 HK Equity | t
2 | A2A IM Equity | t
3 | AA UN Equity | f
4 | AA/ LN Equity | f
5 | AAL LN Equity | t
6 | AALB NA Equity | t
valuation
table
security_id | date | px_close | px_open | px_high | px_low | volume
-------------+------------+----------+----------+----------+---------+--------
42 | 2012-11-06 | 44.757 | 45.099 | 45.336 | 43.918 | 157295
42 | 2012-11-07 | 45.099 | 44.791 | 45.099 | 44.226 | 162671
42 | 2012-11-08 | 44.245 | 45.051 | 45.08 | 44.241 | 56763
42 | 2012-11-09 | 43.994 | 44.307 | 44.672 | 43.771 | 123424
43 | 2012-11-06 | 143.994 | 144.255 | 144.601 | 143.785 | 54598
43 | 2012-11-07 | 143.885 | 143.629 | 144.108 | 143.524 | 39380
43 | 2012-11-08 | 144.392 | 144.056 | 144.928 | 143.866 | 60509
43 | 2012-11-09 | 144.008 | 144.563 | 144.61 | 143.913 | 62784
regression
table (I have a lot of other tables like this one, e.g polynomial_regression
, learning_clustering
, multiple_regression
, etc. but it is for the example):
security_id | driver_id | security_field | driver_field | lag | percentile | percentage | 1w_change | r2
-------------+-----------+----------------+--------------+-----+------------------+--------------------+--------------------+-------------------
10260 | 622 | px_open | px_low | 0 | 2.39085239085239 | 0.233283156908672 | -1.65479076267458 | 0.76137019793836
10260 | 622 | px_open | best_eps | 0 | 25.8333333333333 | 0.070876900435926 | 4.62480252764613 | 0.593456515505473
10260 | 622 | px_high | px_close | 0 | 2.18295218295218 | 0.211301222947739 | -7.67181960128018 | 0.756561105159073
10260 | 622 | px_high | px_open | 0 | 3.01455301455301 | 0.173091398167856 | -2.06843453731421 | 0.755248026072118
10260 | 622 | px_high | px_high | 0 | 2.5987525987526 | 0.189682461742673 | -10.5755212601686 | 0.748422363395702
10260 | 622 | px_high | px_low | 0 | 2.5987525987526 | 0.206663840524837 | -3.72904823112292 | 0.758691421914131
10260 | 622 | px_high | best_eps | 0 | 26.7857142857143 | 0.0642618785351929 | 0.600880162491535 | 0.586932757156747
10260 | 622 | px_low | px_close | 0 | 2.18295218295218 | 0.241074941757577 | -3.72991088758724 | 0.758602680862882
10260 | 622 | px_low | px_open | 0 | 3.01455301455301 | 0.193588738372212 | -0.616152379637857 | 0.753488068588711
10260 | 622 | px_low | px_high | 0 | 2.80665280665281 | 0.21139288442443 | -4.97343018089906 | 0.743185969928886
Main of the operations done on the database are made via the Flask App, but not always. It’s why I would like to have my database as ‘strong’ as possible (I mean here ACID). What’s more, I obviously trust more Postgres that Flask.
The update of my database is made everyday at 3am. New values of the day before are inserted, and new active
stocks are updated from the beginning of the times (and other stocks, e.g the ones just after dividends, etc.).
Then, 2h later, when the update is done, a Flask script is computing all the new regressions pairwise, and inserting the results.
The thing is that between the update of the values, and the update of the regressions, the database is not in a ‘stable’ state: new values have been inserted, but the corresponding regressions doesn’t exist yet.
Would you think I should translate all my mathematical analysis from Python to SQL, and create Triggers launching it whenever a new value is inserted?
- My problem is that I don’t know the ‘speed’ of SQL when it deals with stats and maths, compared to other language. I know that basic functions (correlation, R^2, …) are already implemented in SQL, but I am using far more ‘advanced’ (I mean ‘complex’ here..) functions (even if I have not represented it here).
- If you think it is not a good idea, would you have other tips I could use to stress the ‘well-behaviour’ of my database, despite of using external tools as the Python App?
Thank you very much for your advises,
Edouard