Is there a common Java library that will handle URL encoding / decoding for a collection of strings?

I often have to encode or decode a large collection or array of strings. Besides iterating over them and calling the static URLDecoder.decode(string, "UTF-8"), are there libraries that will make this type of operation more efficient?

A colleague claims that using the static method to decode strings in place is not thread safe. Why would that be?

+6
4 answers

The JDK's URLDecoder is not implemented efficiently. In particular, it relies internally on StringBuffer, which introduces synchronization that URLDecoder does not need. Apache Commons provides URLCodec, but it reportedly has similar performance problems; I have not confirmed whether that is still true in the latest version.

Mark A. Ziesemer wrote a post about the problems and performance of URLDecoder. He filed some bug reports and eventually wrote a complete replacement. I will quote some key passages here, but you really should read the entire source article: http://blogger.ziesemer.com/2009/05/improving-url-coder-performance-java.html

Selected quotes:

Java provides a default implementation of this functionality in java.net.URLEncoder and java.net.URLDecoder. Unfortunately, it is not the best performing, due to both how the API was written and details within the implementation. A number of performance-related bugs have been filed at sun.com regarding URLEncoder.

There is an alternative: org.apache.commons.codec.net.URLCodec from the Apache Commons Codec. (Commons Codec also provides useful implementations for Base64 encoding.) Unfortunately, Commons' URLCodec suffers from some of the same problems as Java URLEncoder / URLDecoder.

...

Recommendations for JDK and Commons:

When constructing any of the "buffer" classes, e.g. ByteArrayOutputStream, CharArrayWriter, StringBuilder, or StringBuffer, estimate the required capacity and pass it in. The JDK's URLEncoder currently does this for its StringBuffer, but should also do so for its CharArrayWriter instance. Commons' URLCodec should do this for its ByteArrayOutputStream instance. If the classes' default buffer sizes are too small, they may have to resize by copying into new, larger buffers, which is not exactly a "cheap" operation. If the default buffer sizes are too large, memory may be unnecessarily wasted.

Both implementations depend on Charsets, but only accept them by their String name. Charset provides a simple and small cache for name lookups, storing only the last 2 Charsets used. This should not be relied upon, and both should accept Charset instances for other interoperability reasons as well.

Both implementations only handle fixed-size inputs and outputs. The JDK's URLEncoder only works with String instances. Commons' URLCodec is also String based, but also works with byte[] arrays. This is a design-level constraint that essentially prevents efficient processing of larger or variable-length inputs. Instead, the "stream-supporting" interfaces such as CharSequence, Appendable, and the java.nio Buffer implementations ByteBuffer and CharBuffer should be supported.

...

Please note that com.ziesemer.utils.urlCodec is 3 times faster than JDK URLEncoder and more than 1.5 times faster than JDK URLDecoder. (The JDK URLDecoder was faster than URLEncoder, so there wasn't much room for improvement.)
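As a rough illustration of the capacity and Charset recommendations quoted above: since Java 10 the JDK offers URLEncoder.encode(String, Charset) and URLDecoder.decode(String, Charset), which skip the charset name lookup and do not throw UnsupportedEncodingException. The sketch below assumes Java 10 or later; the class and method names (CharsetCodecSketch, encodeAll, decodeAll) are made up for this example and are not part of the JDK or of the quoted article.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named only for this illustration.
final class CharsetCodecSketch {
    // Hold a Charset instance instead of passing the name "UTF-8" on every call.
    private static final Charset UTF8 = StandardCharsets.UTF_8;

    static List<String> encodeAll(List<String> raw) {
        // The result size is known up front, so pass it as the initial capacity.
        List<String> out = new ArrayList<>(raw.size());
        for (String s : raw) {
            out.add(URLEncoder.encode(s, UTF8)); // Charset overload, Java 10+
        }
        return out;
    }

    static List<String> decodeAll(List<String> encoded) {
        List<String> out = new ArrayList<>(encoded.size());
        for (String s : encoded) {
            out.add(URLDecoder.decode(s, UTF8)); // Charset overload, Java 10+
        }
        return out;
    }
}
```

This only addresses the Charset lookup and the sizing of the result list; the buffers used inside URLEncoder and URLDecoder themselves are not affected by it.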

I think your colleague is mistaken in suggesting that URLDecoder is not thread safe. The other answers here explain this in detail.
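To make the thread-safety point concrete, here is a minimal sketch that calls the static decode method from several threads at once through a parallel stream. Each call works only on its own argument and local state, so no locking is needed. It assumes Java 10 or later for the Charset overload; ParallelDecodeSketch and decodeInParallel are names invented for this example.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this illustration.
final class ParallelDecodeSketch {
    static List<String> decodeInParallel(List<String> encoded) {
        // Each element is decoded independently, possibly on different threads.
        // Collecting a list-backed parallel stream keeps the original order.
        return encoded.parallelStream()
                .map(s -> URLDecoder.decode(s, StandardCharsets.UTF_8))
                .collect(Collectors.toList());
    }
}
```

Whether the parallel version is actually faster depends on the size of the collection and the length of the strings; for small inputs a plain loop is likely to be just as good.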

EDIT [2012-07-03] - In response to a later comment posted by the OP

Not sure if you were looking for more ideas or not? You are right that if you intend to work on a list as an atomic collection, you will have to synchronize all access to the list, including references to it outside of your method. However, if you are okay with the contents of the returned list potentially differing from the original list, then a brute-force approach for working in "batch" mode on a collection that can be modified by other threads might look something like this:

    /**
     * @param origList will be copied by this method so that origList can continue
     *                 to be read/write by other threads.
     * @return list containing decoded strings for each entry that was in origList at time of copy.
     */
    public List<String> decodeListOfStringSafely(List<String> origList) throws UnsupportedEncodingException {
        List<String> snapshotList = new ArrayList<String>(origList);
        List<String> newList = new ArrayList<String>();
        for (String urlStr : snapshotList) {
            String decodedUrlStr = URLDecoder.decode(urlStr, "UTF8");
            newList.add(decodedUrlStr);
        }
        return newList;
    }

If this does not help, I'm still not sure what you need, and it would be better for you to create a new, more concise question. If this is what you asked for, be careful, because this example, taken out of context, is not a good idea for many reasons.

+7

Apache has a URLCodec that can be used for URL encoding and decoding.

If your static method only works with local variables or final, initialized variables, it is completely thread safe.

Parameters live on the stack, so they are completely thread safe, and final constants cannot be reassigned.

The following code is completely thread safe:

    public static String encodeMyValue(String value) {
        // do encoding here
    }

Care must be taken if a final variable refers to a mutable object: you cannot reassign the variable, but you can still change the object's internal state (its properties).
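A small sketch of that distinction, with names invented for the example (FinalFieldSketch, PREFIX, HISTORY) and assuming Java 10 or later for the Charset overload of URLEncoder.encode: the first method touches only its parameter, a local, and an immutable constant; the second also modifies a list that is final but mutable and shared by all callers.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, named only for this illustration.
final class FinalFieldSketch {
    // Immutable constant: safe to read from any thread.
    private static final String PREFIX = "q=";

    // final reference to a mutable object: it cannot be reassigned,
    // but the list's contents can still be changed by any caller.
    private static final List<String> HISTORY = new ArrayList<>();

    // Thread safe: uses only its parameter, a local, and an immutable constant.
    static String withPrefix(String value) {
        String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
        return PREFIX + encoded;
    }

    // Not thread safe without synchronization: all callers mutate the same ArrayList.
    static String withPrefixAndHistory(String value) {
        String result = withPrefix(value);
        HISTORY.add(result);
        return result;
    }

    static int historySize() {
        return HISTORY.size();
    }
}
```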

0

Thread safety is never really a concern with static functions (or else it is a design failure). Especially not if they don't even access static variables in the class.

I would suggest using the function you used before and iterating through the collection.

0

Basically, there is no magic thread safety for static methods, instance methods, or constructors. They can be called simultaneously by several threads if no synchronization is applied. If they do not read or modify any shared data, they will usually be safe; if they access shared data, you need to be more careful.

Therefore, in your case, you can write a synchronized method on top of the URL decoding or encoding call to enforce thread safety.
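One way to read this suggestion, sketched under the assumption that the shared data is a list of results that several threads append to: the decode call itself uses only local state, so the lock is taken only around the shared list. SharedResultsSketch, decodeAndStore, and snapshot are hypothetical names, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, named only for this illustration.
final class SharedResultsSketch {
    // Shared mutable state; every access goes through the same lock.
    private final List<String> decoded = new ArrayList<>();

    void decodeAndStore(String encoded) {
        // Decoding uses only the argument and locals, so it runs outside the lock.
        String value = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        synchronized (decoded) {
            decoded.add(value);
        }
    }

    List<String> snapshot() {
        // Copy under the lock so callers never see a list that is being modified.
        synchronized (decoded) {
            return new ArrayList<>(decoded);
        }
    }
}
```

Keeping the decode call outside the synchronized block means threads only wait on each other for the brief list update, not for the decoding work.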

0

Source: https://habr.com/ru/post/914657/

