My goal is to create a hash map that uses a String as the key and a HashSet of strings as the value.
OUTPUT
Here is what I currently get:
Hudson+(surname)=[Q2720681], Hudson,+Quebec=[Q141445], Hudson+(given+name)=[Q5928530], Hudson,+Colorado=[Q2272323], Hudson,+Illinois=[Q2672022], Hudson,+Indiana=[Q2710584], Hudson,+Ontario=[Q5928505], Hudson,+Buenos+Aires+Province=[Q10298710], Hudson,+Florida=[Q768903]]
What I want it to look like is this:
[Hudson+(surname)=[Q2720681,Q141445,Q5928530,Q2272323,Q2672022]]
The goal is to store a specific name from Wikidata as the key, together with all the Q values associated with it, as follows:
This is the page for Bush.
I want Bush to be the key. Then, for all the different points of origin, that is, all the different ways that Bush could be connected to a terminal Wikidata page, I want to save the corresponding Q value (the unique alphanumeric identifier).
For example, the Wikipedia page for Bush links to the following pages, each of which has its own Wikidata page:
George H. W. Bush
George W. Bush
Jeb Bush
Bush family
Bush (surname)
The corresponding Q values are:
George H. W. Bush (Q23505)
George W. Bush (Q207)
Jeb Bush (Q221997)
Bush family (Q2743830)
Bush (Q1484464)
So the entry I want is:
Key: Bush
Values: Q23505, Q207, Q221997, Q2743830, Q1484464
A name that is not ambiguous would map to its single Q value:
Key: Jeb Bush
Value: Q221997
Key: George W. Bush
Value: Q207
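To make the target structure concrete, here is a minimal sketch of the map I am aiming for, filled with the Bush values above. The class name `QValueMapSketch`, the field `qValues`, and the helper `addQ` are made up for this illustration; only the keys and Q values come from the example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class QValueMapSketch {
    // Hypothetical stand-in for q_valMap: one name maps to a set of Q values.
    static final Map<String, Set<String>> qValues = new HashMap<>();

    // Adds a Q value under the key, creating the set the first time the key is seen.
    static void addQ(String key, String q) {
        Set<String> set = qValues.get(key);
        if (set == null) {
            set = new LinkedHashSet<>(); // keeps insertion order for readable output
            qValues.put(key, set);
        }
        set.add(q);
    }

    public static void main(String[] args) {
        String[] bushValues = {"Q23505", "Q207", "Q221997", "Q2743830", "Q1484464"};
        for (String q : bushValues) {
            addQ("Bush", q);
        }
        addQ("Jeb Bush", "Q221997");
        addQ("George W. Bush", "Q207");

        System.out.println(qValues.get("Bush"));
        // prints [Q23505, Q207, Q221997, Q2743830, Q1484464]
    }
}
```

The important part is that every Q value found through the disambiguation page is added under the same key, Bush, rather than under the name of the page it came from.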
And so on.
The code is on GitHub.
Here is the method that adds a value to the map:
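    public static HashSet<String> put_to_hash(String key, String value)
    {
        // Look up the set for this key and create it on first use.
        // Previously a new key was stored with an empty set and the value was dropped.
        HashSet<String> set = q_valMap.get(key);
        if (set == null)
        {
            set = new HashSet<String>();
            q_valMap.put(key, set);
        }
        // The set is already in the map, so adding to it is enough; no second put is needed.
        set.add(value);
        return set;
    }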
It is called from this loop, which reads the Wikidata page line by line:
    while ((line_by_line = wiki_data_pagecontent.readLine()) != null)
    {
        // if we can determine it is a disambig page we need to send it off to get all
        // the possible senses in which it can be used.
        Pattern disambig_pattern = Pattern.compile("<div class=\"wikibase-entitytermsview-heading-description \">Wikipedia disambiguation page</div>");
        Matcher disambig_indicator = disambig_pattern.matcher(line_by_line);
        if (disambig_indicator.matches())
        {
            // off to get the different usages
            Wikipedia_Disambig_Fetcher.all_possibilities( variable_entity );
        }
        else
        {
            // get the Q value off the page by matching
            Pattern q_page_pattern = Pattern.compile("<span class=\"wikibase-toolbar-container\"><span class=\"wikibase-toolbar-item " +
                "wikibase-toolbar \">\\[<span class=\"wikibase-toolbar-item wikibase-toolbar-button wikibase-toolbar-button-edit\"><a " +
                "href=\"/wiki/Special:SetSiteLink/(.*?)\">edit</a></span>\\]</span></span>");
            Matcher match_Q_component = q_page_pattern.matcher(line_by_line);
            if ( match_Q_component.matches() )
            {
                String Q = match_Q_component.group(1);
                // 'Q' should be added to a set, since each entity can hold multiple
                // Q values on the basis of disambiguation
                put_to_hash( variable_entity, Q );
            }
        }
    }
And this is the method that handles disambiguation pages:
    public static void all_possibilities( String variable_entity ) throws Exception
    {
        System.out.println("this is a disambig page");
        // fetch the Wikipedia disambiguation page and take the first link of each list item
        Document docx = Jsoup.connect( "https://en.wikipedia.org/wiki/" + variable_entity ).get();
        Elements linx = docx.select( "p:contains(" + variable_entity + ") ~ ul a:eq(0)" );
        for (Element linq : linx)
        {
            System.out.println(linq.text());
            // each linked page is looked up on Wikidata under its own name
            String linq_nospace = linq.text().replace(' ', '+');
            Wikidata_Q_Reader.getQ( linq_nospace );
        }
    }
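One thing I suspect is that only the link text is passed on to `getQ`, so each linked page ends up under its own key. Below is a rough sketch of keeping the original entity as the key while looking up each link. It is not my actual code: `DisambigKeySketch`, `collect`, and the `lookupQ` function are hypothetical, and `lookupQ` is fed canned values from the Bush example instead of scraping Wikidata.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class DisambigKeySketch {
    // Collects the Q value of every linked page under the ORIGINAL entity name.
    // lookupQ is a hypothetical stand-in for the Wikidata page scrape.
    static Map<String, Set<String>> collect(String entity, List<String> linkTexts,
                                            Function<String, String> lookupQ) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String linkText : linkTexts) {
            String q = lookupQ.apply(linkText.replace(' ', '+'));
            if (q == null) {
                continue; // no Q value found for this link
            }
            Set<String> set = result.get(entity);
            if (set == null) {
                set = new HashSet<>();
                result.put(entity, set);
            }
            set.add(q); // stored under the original key, not under the link text
        }
        return result;
    }

    public static void main(String[] args) {
        // Canned lookups taken from the Bush example; no network access.
        Map<String, String> canned = new HashMap<>();
        canned.put("Jeb+Bush", "Q221997");
        canned.put("George+W.+Bush", "Q207");

        Map<String, Set<String>> map =
            collect("Bush", Arrays.asList("Jeb Bush", "George W. Bush"), canned::get);
        System.out.println(map.keySet()); // prints [Bush]
    }
}
```

With this shape, both Q221997 and Q207 end up in the single set stored under Bush.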