I just made some improvements to the Hough transform line detector within the framework, which should help with this, but you'll need to do some additional preprocessing on your image to pick out that blue box.
Let me explain how this operation works. First, it detects edges in the image (right now, I'm using a Canny edge detector for this). For each pixel identified as an edge, the coordinate of that pixel is extracted. Each of these coordinates is then used to draw a pair of lines in a parallel coordinate space (based on the process described in "Real-time detection of lines using parallel coordinates and OpenGL" by Dubská et al.).
Pixels in parallel coordinate space where these lines intersect accumulate intensity, and the points of greatest intensity in that space indicate the presence of a line in the real scene.
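To make the accumulation step concrete, here's a toy CPU sketch in Python/NumPy (not the framework's GPU implementation, and covering only the "straight" half of the PClines space for simplicity): each edge point (x, y) is drawn as a segment between the two parallel axes, and segments from collinear points cross in a single accumulator cell.

```python
import numpy as np

def pclines_accumulate(points, d=4, v_max=8):
    """Vote in the 'straight' (S) half of the PClines parallel-coordinate
    space. Each image point (x, y) becomes the segment from (u=0, v=x)
    to (u=d, v=y); segments from collinear points cross at one cell."""
    acc = np.zeros((d + 1, v_max + 1), dtype=np.int32)
    for x, y in points:
        for u in range(d + 1):
            v = int(x + (y - x) * u / d + 0.5)  # interpolate between the axes
            if 0 <= v <= v_max:
                acc[u, v] += 1
    return acc

# Three points on the image-space line y = -x + 6. Their segments all
# cross at the cell (u, v) = (2, 3), which recovers the line via
# m = 1 - d/u = -1 and b = v * d / u = 6.
acc = pclines_accumulate([(1, 5), (2, 4), (3, 3)])
```

The peak-finding stage then just looks for the brightest cells in this accumulator.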
However, only pixels that are local intensity maxima indicate real lines, so the task is to find those local maxima in order to suppress noise from busy scenes. This is something I haven't fully worked out in this operation. In your image above, the huge number of detected lines comes from a mess of points in parallel coordinate space that sit above the detection threshold but weren't properly rejected for not being local maxima.
However, I've made some improvements, so I now get a cleaner result from the operation (I just ran this quickly against live video of my screen):
I fixed a bug in the local non-maximum suppression filter and expanded the area it operates over from 3x3 to 5x5. It still lets through a bunch of non-maximum points that contribute noise, but it's much better.
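For reference, the non-maximum suppression step amounts to something like the following CPU sketch (a simplified stand-in for the actual filter; the `threshold` parameter here is illustrative): a cell survives only if it's above the detection threshold and is the unique maximum of its 5x5 neighborhood.

```python
import numpy as np

def non_max_suppress(acc, radius=2, threshold=1):
    """Keep only cells that are strict local maxima within a
    (2*radius+1) x (2*radius+1) window (5x5 for radius=2) and at or
    above the detection threshold; zero out everything else."""
    h, w = acc.shape
    padded = np.pad(acc, radius, mode='constant')  # zeros outside the edges
    out = np.zeros_like(acc)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            center = acc[i, j]
            if (center >= threshold and center == window.max()
                    and (window == center).sum() == 1):
                out[i, j] = center
    return out

acc = np.zeros((7, 7), dtype=np.int32)
acc[3, 3] = 5   # a real peak
acc[3, 4] = 3   # shoulder of the peak: suppressed by the 5 nearby
acc[0, 0] = 2   # an isolated, weaker but still local maximum
clean = non_max_suppress(acc, radius=2, threshold=2)
```

Requiring the maximum to be unique in its window is what kills plateaus of equal-intensity noise; without that check, flat regions above the threshold all survive.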
You'll notice that this still isn't quite what you want. It picks up lines in the text, but not your selection box. That's because black text on a white background produces very strong, very sharp edges at the edge detection stage, while a light blue selection box on a white background requires an extremely low threshold to even register in any edge detection process.
If it's always a blue box you'll be picking out, I would recommend performing a preprocessing operation to uniquely identify the blue objects in the scene. An easy way to do this is to define a custom filter that, for each pixel, subtracts the red component from the blue component, clamps negative values to zero, and uses the result of that calculation as the output for the red, green, and blue channels. You might even want to multiply the result by 2.0-3.0 to amplify the difference.
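As a CPU sketch of that filter (in practice you'd do this per-pixel in a fragment shader; the 2.5 gain is just an illustrative value), assuming RGB values normalized to [0, 1]:

```python
import numpy as np

def blue_highlight(rgb, gain=2.5):
    """For each pixel: out = clamp((blue - red) * gain, 0, 1), written
    to all three output channels. Blue regions become bright; black text
    and the white background both go to black, since for those pixels
    blue ~= red and the difference is ~0."""
    rgb = rgb.astype(np.float32)
    diff = np.clip((rgb[..., 2] - rgb[..., 0]) * gain, 0.0, 1.0)
    return np.stack([diff, diff, diff], axis=-1)

# White background, black text, and a light blue selection-box pixel:
pixels = np.array([[[1.0, 1.0, 1.0],
                    [0.0, 0.0, 0.0],
                    [0.6, 0.8, 1.0]]], dtype=np.float32)
out = blue_highlight(pixels)
```

Here the white and black pixels both map to 0, while the light blue pixel saturates to full white, which is exactly the contrast boost the line detector needs.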
The result should be an image where the blue areas appear as white and everything else appears as black. This will dramatically boost the contrast around your selection box and make the text easier to ignore. You'll need to experiment with the right parameters to make this as reliable as possible for your case.