Not a direct answer, but three possible options:
Use the built-in caching mechanism of RavenDB
My initial guess is that your caching mechanism is actually hurting performance. The RavenDB client has built-in caching (see here how to configure it: https://ravendb.net/docs/article-page/3.5/csharp/client-api/how-to/setup-aggressive-caching )
The problem is that this cache is local to each server. If server A fetched and processed a file earlier, server B still has to fetch and deserialize it the next time that file comes up for processing.
One option that you could implement is to split the workload. For instance:
- Server A => fetch files starting with A–D
- Server B => fetch files starting with E–H
- Server C => ...
That way each server's local cache stays hot for its own subset of files.
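A minimal sketch of such prefix-based routing (the server names and the three-way split are assumptions for illustration; the same idea carries over directly to the C# side):

```python
def server_for(filename, servers):
    """Route a file to a server by the first letter of its name,
    so each server repeatedly sees (and caches) the same subset."""
    first = filename[0].upper()
    # Split the alphabet into equal ranges, one per server.
    span = 26 / len(servers)
    index = min(int((ord(first) - ord("A")) / span), len(servers) - 1)
    return servers[index]

servers = ["server-a", "server-b", "server-c"]
print(server_for("Annual.json", servers))   # → server-a
print(server_for("Monthly.json", servers))  # → server-b
print(server_for("Zebra.json", servers))    # → server-c
```

A hash of the full file name (modulo the server count) works just as well and spreads load more evenly when names cluster around a few letters.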
Get a bigger machine
If you still want to use your own caching mechanism, there are two things that I think might be the bottleneck:
- Disk access
- JSON deserialization
For these problems, the only thing I can imagine is to get more resources:
- If it's disk access, use premium storage backed by SSDs.
- If it's deserialization, get a VM with a faster CPU (or more cores).
Cache files in RAM
Alternatively, instead of writing files to disk, keep them in memory and get a virtual machine with a large amount of RAM. You don't even need that much RAM, since 1,000 files × 10 MB is still only 10 GB. If you cache the deserialized objects rather than the raw files, this eliminates both the disk access and the repeated deserialization.
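A sketch of such an in-memory cache of already-deserialized objects (the `fetch` callable is a stand-in for your real download-and-parse step):

```python
import json

class DeserializedCache:
    """Keep parsed objects in RAM so a repeated request pays
    neither the disk/network read nor the JSON parse again."""
    def __init__(self, fetch):
        self._fetch = fetch   # callable: name -> raw JSON string
        self._cache = {}      # name -> parsed object

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = json.loads(self._fetch(name))
        return self._cache[name]

calls = []
def fake_fetch(name):
    calls.append(name)                  # count the expensive fetches
    return '{"file": "%s"}' % name

cache = DeserializedCache(fake_fetch)
cache.get("a.json")
cache.get("a.json")                     # second call served from RAM
print(len(calls))                       # → 1: the fetch ran only once
```

In production you would add an eviction policy (e.g. LRU with a size cap) so the cache cannot grow without bound.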
But in the end, it's best to first determine where the bottleneck is and see if it can be mitigated using the built-in RavenDB caching mechanism.