I created a table to store all the documents of my application. It is a simple table (call it DOC_DATA) with three columns: DOC_ID, FileSize, and Data. Data is varbinary(max).
Then I have many tables (CUSTOMERS_DOCUMENTS, EMPLOYEES_DOCUMENTS, ...) that hold the other data (for example "document description", "Created", "Customer ID", ...). My real case is not exactly this, but the example lets me explain myself better. All these tables have a foreign key to DOC_DATA.DOC_ID.
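For concreteness, the schema described above might look like this. This is only a sketch: the column types, the IDENTITY key, and the CUSTOMERS_DOCUMENTS columns other than DOC_ID are my assumptions, not part of the original description.

```sql
-- Central blob table: one row per stored document (sketch, types assumed)
CREATE TABLE DOC_DATA (
    DOC_ID   int IDENTITY PRIMARY KEY,   -- assumed key type
    FileSize bigint NOT NULL,            -- assumed type
    Data     varbinary(max) NOT NULL     -- the potentially huge column
);

-- One of the many "main" tables referencing DOC_DATA (columns assumed)
CREATE TABLE CUSTOMERS_DOCUMENTS (
    DOC_ID      int NOT NULL REFERENCES DOC_DATA (DOC_ID),
    CustomerID  int NOT NULL,
    Description nvarchar(200) NULL,
    Created     datetime2 NOT NULL
);
```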
When a user searches for a customer document, the application runs a query similar to this:
select CD.*, DD.FileSize
from DOC_DATA DD
join CUSTOMERS_DOCUMENTS CD ON CD.DOC_ID = DD.DOC_ID
My question is: will this query perform badly because it also reads a column from a potentially huge table (DOC_DATA may hold many gigabytes of data), or is that not a problem?
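Some context on why this is usually not a problem: in SQL Server, varbinary(max) values larger than roughly 8 KB are stored on separate LOB pages, and a query that never references the Data column does not read those pages. If you want to guarantee the join never touches the blob storage at all, one option is a covering index so the join is answered entirely from the index (a sketch; the index name is mine):

```sql
-- Covering index: the join on DOC_ID plus the FileSize lookup can be
-- satisfied from this index alone, never touching the LOB pages.
CREATE NONCLUSTERED INDEX IX_DOC_DATA_DocId_FileSize
    ON DOC_DATA (DOC_ID)
    INCLUDE (FileSize);
```

You can confirm the index is used by checking the execution plan for an Index Seek on IX_DOC_DATA_DocId_FileSize instead of a clustered index lookup.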
An alternative solution is to duplicate the FileSize column into all the main tables (CUSTOMERS_DOCUMENTS, EMPLOYEES_DOCUMENTS, ...). Of course, any join has some small cost, but I am not asking whether to join in general; I am asking whether joining the HUGE table hurts when I am not interested in the HUGE column.
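If you do go the denormalization route, the one-time backfill could be sketched like this (assuming the table and column names from the example; keeping the copies in sync afterwards would be up to the application or a trigger, which is the usual downside of this approach):

```sql
-- Add the duplicated column, then backfill it from DOC_DATA (T-SQL sketch)
ALTER TABLE CUSTOMERS_DOCUMENTS ADD FileSize bigint NULL;

UPDATE CD
SET    CD.FileSize = DD.FileSize
FROM   CUSTOMERS_DOCUMENTS AS CD
JOIN   DOC_DATA AS DD ON DD.DOC_ID = CD.DOC_ID;
```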
Please note: I am not developing a new system; I am maintaining an inherited one. So I am not asking what the best design would be in general, only which of these options is best in this case.