Is there an efficient way to parallelize a large number of GET requests in Java? I have a file with 200,000 lines, each of which requires a GET request to Wikimedia. I then have to write part of each response to a shared file. I have pasted the main part of my code below.
while ((line = br.readLine()) != null) {
    count++;
    if ((count % 1000) == 0) {
        System.out.println(count + " tags parsed");
        fbw.flush();
        bw.flush();
    }
    //System.out.println(line);
    String target = new String(line);
    if (target.startsWith("\"") && (target.endsWith("\""))) {
        target = target.replaceAll("\"", "");
    }
    String url = "http://en.wikipedia.org/w/api.php?action=query&prop=revisions&format=xml&rvprop=timestamp&rvlimit=1&rvdir=newer&titles=";
    url = url + URLEncoder.encode(target, "UTF-8");
    URL obj = new URL(url);
    HttpURLConnection con = (HttpURLConnection) obj.openConnection();
    // optional default is GET
    con.setRequestMethod("GET");
    //add request header
    //con.setRequestProperty("User-Agent", USER_AGENT);
    int responsecode = con.getResponseCode();
    //System.out.println("Sending 'Get' request to URL: " + url);
    BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
    String inputLine;
    StringBuffer response = new StringBuffer();
    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
    }
    Document doc = loadXMLFromString(response.toString());
    NodeList x = doc.getElementsByTagName("revisions");
    if (x.getLength() == 1) {
        String time = x.item(0).getFirstChild().getAttributes().item(0).getTextContent().substring(0, 10).replaceAll("-", "");
        bw.write(line + "\t" + time + "\n");
    } else if (x.getLength() == 2) {
        String time = x.item(1).getFirstChild().getAttributes().item(0).getTextContent().substring(0, 10).replaceAll("-", "");
        bw.write(line + "\t" + time + "\n");
    } else {
        fbw.write(line + "\t" + "NULL" + "\n");
    }
}
I googled, and there seem to be two options. One is to create threads myself, and the other is to use something called an Executor. Could someone give a little guidance on which would be more appropriate for this task?
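For illustration, the Executor option might look roughly like the sketch below. It is only a minimal outline under stated assumptions: `ParallelFetch`, `fetchAll`, and the `lookup` parameter are hypothetical names, the pool size of 4 is an arbitrary guess, and the `lookup` function stands in for the HTTP request and XML parsing done inside the loop above. The worker threads only compute results; a single thread collects them in input order, so writing to the shared file would need no extra synchronization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelFetch {

    // Runs `lookup` for every title on a fixed-size thread pool and returns
    // "title<TAB>result" lines in the same order as the input list.
    // `lookup` is a placeholder for the real GET request + XML parsing.
    static List<String> fetchAll(List<String> titles,
                                 Function<String, String> lookup,
                                 int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per title; each Future will hold one output line.
            List<Future<String>> futures = new ArrayList<>();
            for (String title : titles) {
                futures.add(pool.submit(() -> title + "\t" + lookup.apply(title)));
            }
            // Collect results on the calling thread, preserving input order.
            List<String> lines = new ArrayList<>();
            for (Future<String> f : futures) {
                lines.add(f.get()); // blocks until that task has finished
            }
            return lines;
        } finally {
            pool.shutdown(); // stop accepting tasks and let the workers exit
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in lookup (assumption): a real one would query the API.
        List<String> titles = Arrays.asList("Java", "Python");
        for (String line : fetchAll(titles, t -> "len" + t.length(), 4)) {
            System.out.println(line);
        }
    }
}
```

In real use, the lines returned by `fetchAll` would be written with the existing `bw` writer from one thread. A small pool is probably the safer choice here, since a large one mostly increases load on the remote server.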