Summary of all bq jobs

Is there a way to list all job IDs using the bq command-line tool for a given time interval? I need to loop through all the job IDs and check whether any of them failed with an error.

I use the web interface to find the job IDs, and then run:

bq show -j --format=prettyjson job_id 

Then I manually copy out the "error" part of the output. Producing a summary of the jobs for a given day this way takes a long time.
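For a single job, that manual step amounts to pulling status.errorResult out of the JSON. A minimal Python 3 sketch of the extraction (job_error is a hypothetical helper; the status.errorResult layout is the one documented for BigQuery job resources):

```python
import json

def job_error(prettyjson_text):
    """Return the errorResult dict from `bq show --format=prettyjson -j <job_id>`
    output, or None when the job succeeded."""
    job = json.loads(prettyjson_text)
    return job.get('status', {}).get('errorResult')

# A made-up failed job resource, trimmed to the relevant fields:
sample = '''{"jobReference": {"jobId": "job_123"},
             "status": {"state": "DONE",
                        "errorResult": {"reason": "invalidQuery",
                                        "message": "Syntax error"}}}'''
print(job_error(sample)['reason'])  # prints invalidQuery
```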

2 answers

Sure: you can list up to the last 1,000 jobs for a project you have access to by running:

 bq ls -j --max_results=1000 project_number 
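If 1,000 jobs is enough, that listing can be filtered directly for failures. A Python 3 sketch; note that the column layout of `bq ls -j` output (jobId first, State third) is an assumption here, not something the tool guarantees:

```python
def failed_job_ids(listing):
    """Return the jobId column for rows of `bq ls -j` output whose assumed
    State column (third field) reads FAILURE."""
    ids = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == 'FAILURE':
            ids.append(fields[0])
    return ids

# Intended usage, with a configured bq client:
#   import subprocess
#   listing = subprocess.check_output(
#       ['bq', 'ls', '-j', '--max_results=1000', 'project_number']).decode()
#   print(failed_job_ids(listing))
```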

If you have more than 1,000 jobs, you can write a Python script that lists all of them by paging through the results in batches of 1,000. For example:

    import httplib2
    import pprint

    from apiclient.discovery import build
    from apiclient.errors import HttpError
    from oauth2client.client import flow_from_clientsecrets
    from oauth2client.file import Storage
    from oauth2client.tools import run

    # Enter your Google Developer Project number
    PROJECT_NUMBER = 'XXXXXXXXXXXX'

    FLOW = flow_from_clientsecrets(
        'client_secrets.json',
        scope='https://www.googleapis.com/auth/bigquery')


    def main():
      storage = Storage('bigquery_credentials.dat')
      credentials = storage.get()
      if credentials is None or credentials.invalid:
        credentials = run(FLOW, storage)

      http = credentials.authorize(httplib2.Http())
      bigquery_service = build('bigquery', 'v2', http=http)
      jobs = bigquery_service.jobs()

      page_token = None
      count = 0
      while True:
        response = list_jobs_page(jobs, page_token)
        if response is None:
          break
        for job in response.get('jobs', []):
          count += 1
          print '%d. %s\t%s\t%s' % (
              count,
              job['jobReference']['jobId'],
              job['state'],
              job['errorResult']['reason'] if job.get('errorResult') else '')
        page_token = response.get('nextPageToken')
        if not page_token:
          break


    def list_jobs_page(jobs, page_token=None):
      try:
        return jobs.list(projectId=PROJECT_NUMBER,
                         projection='minimal',
                         allUsers=True,
                         maxResults=1000,
                         pageToken=page_token).execute()
      except HttpError as err:
        print 'Error:', pprint.pprint(err.content)


    if __name__ == '__main__':
      main()

The following shell script is close to the report I need.

    #!/bin/sh
    bq ls -j `bq show | grep ^Project | awk '{print $2}'` | grep "`date +'%d %b'`" | awk '{print $1}' > tosave.txt
    for myjob in `cat tosave.txt`
    do
      bq ls -j `bq show | grep ^Project | awk '{print $2}'` | grep $myjob
      bq show --format=prettyjson -j $myjob | grep -C2 "message" | head
    done
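The grep -C2 "message" step can be replaced by structured parsing once the prettyjson output is loaded. A Python 3 sketch of the summary step (daily_error_summary is a hypothetical helper; it assumes job resources shaped like the status.errorResult layout of the BigQuery API):

```python
def daily_error_summary(job_resources):
    """Given job resources (dicts parsed from `bq show --format=prettyjson -j`
    output), return one 'jobId: reason: message' line per failed job."""
    lines = []
    for job in job_resources:
        err = job.get('status', {}).get('errorResult')
        if err:
            lines.append('%s: %s: %s' % (job['jobReference']['jobId'],
                                         err.get('reason', ''),
                                         err.get('message', '')))
    return lines

# Example with two made-up job resources:
jobs = [
    {'jobReference': {'jobId': 'job_ok'},
     'status': {'state': 'DONE'}},
    {'jobReference': {'jobId': 'job_bad'},
     'status': {'state': 'DONE',
                'errorResult': {'reason': 'invalid', 'message': 'Bad input'}}},
]
print(daily_error_summary(jobs))  # prints ['job_bad: invalid: Bad input']
```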

Source: https://habr.com/ru/post/1435201/

