Metadata collection

I am trying to use the pyoai package (https://pypi.python.org/pypi/pyoai) to harvest metadata from this repository: https://www.duo.uio.no/oai/request?verb=Identify

I tried the example from the pyoai website, but it did not work: running it raises an error. The code:

    from oaipmh.client import Client
    from oaipmh.metadata import MetadataRegistry, oai_dc_reader

    URL = 'http://uni.edu/ir/oaipmh'

    registry = MetadataRegistry()
    registry.registerReader('oai_dc', oai_dc_reader)
    client = Client(URL, registry)

    for record in client.listRecords(metadataPrefix='oai_dc'):
        print record

This is the stack trace:

    Traceback (most recent call last):
      File "/Users/arashsaidi/PycharmProjects/get-new-DUO/get-files.py", line 8, in <module>
        for record in client.listRecords(metadataPrefix='oai_dc'):
      File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/common.py", line 115, in method
        return obj(self, **kw)
      File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/common.py", line 110, in __call__
        return bound_self.handleVerb(self._verb, kw)
      File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/client.py", line 65, in handleVerb
        kw, self.makeRequestErrorHandling(verb=verb, **kw))
      File "/Users/arashsaidi/.virtualenvs/lbk/lib/python2.7/site-packages/oaipmh/client.py", line 273, in makeRequestErrorHandling
        raise error.XMLSyntaxError(kw)
    oaipmh.error.XMLSyntaxError: {'verb': 'ListRecords', 'metadataPrefix': 'oai_dc'}

I need to access all the files on the page that I linked to above and create an additional file with some metadata.

Any suggestions?

2 answers

In the end I used the Sickle package, which I found to be better documented and easier to use.

The code below lists all the sets and then retrieves the records of each set in turn. With more than 30,000 records in the repository, harvesting set by set seemed like the best approach, because it gives you more control. I hope this helps others. I have no idea why libraries use OAI; it does not seem to me like a good way to organize data.

    # Excerpt from a method of a larger class (Python 2): self.spec_list and
    # self.write_file_and_metadata() are defined elsewhere on that class.
    from sickle import Sickle

    # Create the Sickle client for the OAI endpoint
    sickle = Sickle('http://www.duo.uio.no/oai/request')
    sets = sickle.ListSets()  # gets all sets

    for recs in sets:
        for rec in recs:
            if rec[0] == 'setSpec':
                try:
                    print rec[1][0], self.spec_list[rec[1][0]]
                    records = sickle.ListRecords(metadataPrefix='xoai', set=rec[1][0], ignore_deleted=True)
                    self.write_file_and_metadata()
                except Exception as e:
                    # simple exception handling if not possible to retrieve record
                    print('Exception: {}'.format(e))
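The per-set approach described above can also be written independently of any particular client library. The following is only a minimal sketch in Python 3, not the answer's actual code: `harvest_by_set`, `fetch_records` and `handle_record` are hypothetical names introduced here, and the Sickle wiring in the trailing comments is an untested assumption based on Sickle's documented `ListSets` and `ListRecords` calls.

```python
def harvest_by_set(set_specs, fetch_records, handle_record):
    """Harvest each set separately so one failing set does not stop the rest.

    set_specs     -- iterable of setSpec strings
    fetch_records -- callable(spec) returning an iterable of records
    handle_record -- callable(spec, record) that stores or processes a record

    Returns (counts, failures): counts maps each fully harvested setSpec to
    the number of records handled; failures maps each failed setSpec to the
    error message. A set that fails part-way appears only in failures.
    """
    counts = {}
    failures = {}
    for spec in set_specs:
        try:
            handled = 0
            for record in fetch_records(spec):
                handle_record(spec, record)
                handled += 1
            counts[spec] = handled
        except Exception as exc:  # keep going with the next set
            failures[spec] = str(exc)
    return counts, failures


# Possible wiring with Sickle (assumption, not run here):
#
#   from sickle import Sickle
#   sickle = Sickle('http://www.duo.uio.no/oai/request')
#   specs = [s.setSpec for s in sickle.ListSets()]
#   counts, failures = harvest_by_set(
#       specs,
#       lambda spec: sickle.ListRecords(metadataPrefix='xoai', set=spec,
#                                       ignore_deleted=True),
#       lambda spec, record: print(spec, record.header.identifier),
#   )
```

Collecting the failures in a dictionary instead of only printing them makes it possible to retry just the sets that failed.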

The example URL from the pyoai website ( http://uni.edu/ir/oaipmh ) appears to be dead: it returns a 404.
However, you should be able to retrieve data from your repository as follows:

    from oaipmh.client import Client
    from oaipmh.metadata import MetadataRegistry, oai_dc_reader

    URL = 'https://www.duo.uio.no/oai/request'

    registry = MetadataRegistry()
    registry.registerReader('oai_dc', oai_dc_reader)
    client = Client(URL, registry)

    # identify info
    identify = client.identify()
    print "Repository name: {0}".format(identify.repositoryName())
    print "Base URL: {0}".format(identify.baseURL())
    print "Protocol version: {0}".format(identify.protocolVersion())
    print "Granularity: {0}".format(identify.granularity())
    print "Compression: {0}".format(identify.compression())
    print "Deleted record: {0}".format(identify.deletedRecord())

    # list records
    records = client.listRecords(metadataPrefix='oai_dc')
    for record in records:
        # do something with the record
        pass

    # list metadata formats
    formats = client.listMetadataFormats()
    for f in formats:
        # do something with f
        pass
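Because the XMLSyntaxError in the question suggests that the response could not be parsed as XML, it can help to check an endpoint before pointing a harvester at it. The sketch below uses only the Python 3 standard library; `build_request_url` and `is_oai_pmh_response` are hypothetical helper names introduced for illustration and are not part of pyoai or Sickle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the OAI-PMH 2.0 response envelope
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def build_request_url(base_url, verb, **params):
    """Return base_url plus the OAI-PMH verb and its arguments as a query string."""
    query = [("verb", verb)] + sorted(params.items())
    return base_url + "?" + urlencode(query)


def is_oai_pmh_response(body):
    """Return True if body is well-formed XML whose root element is <OAI-PMH>."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, e.g. an HTML error page
        return False
    return root.tag == "{%s}OAI-PMH" % OAI_NS
```

To try it against a live endpoint, the body can be fetched with `urllib.request.urlopen(build_request_url(URL, 'Identify')).read()` and passed to `is_oai_pmh_response`; a `False` result or an HTTP error at that point indicates that the base URL is wrong rather than the harvesting code.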

Source: https://habr.com/ru/post/1209711/

