How to measure the popularity of a language through Github Archive data?

I am trying to gauge the popularity of a programming language with:

  • The number of stars in the repositories combined with ...
  • The programming languages used in each repo and ...
  • Total bytes of code in each language (recognizing that some languages are more or less verbose)

Conveniently, there is a massive amount of Github data provided by Github Archive and hosted on BigQuery. The only problem is that I don't see "language" available in any of the payloads for the various event types in Github Archive.

Here's the BigQuery query that I ran, trying to find whether, and where, the language is populated in the Github Archive data:

SELECT *
FROM [githubarchive:month.201612]
WHERE JSON_EXTRACT(payload, "$.repository.language") is null
LIMIT 100

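Can anyone tell me whether, and where, the language of a repo can be found in the Github Archive data? I am aware of the BigQuery github_repos dataset, but it does not appear to cover everything listed above.

Thanks!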


You can use BigQuery with both the GitHub Archive and GHTorrent data to get this.

Languages by pull request (based on http://mads-hartmann.com/2015/02/05/github-archive.html):

SELECT COUNT(*) c, JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') lang
FROM [githubarchive:month.201612]
WHERE JSON_EXTRACT_SCALAR(payload, '$.pull_request.base.repo.language') IS NOT NULL
GROUP BY 2
ORDER BY 1 DESC
LIMIT 10

http://i.imgur.com/PmDxoEX.png
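
To make the JSON path in the query above concrete, here is a minimal Python sketch of the same lookup done on raw payload strings. It assumes payloads shaped like the pull request events the query targets (`pull_request.base.repo.language`); the function names and the sample payloads used to try it out are hypothetical, not part of GitHub Archive.

```python
import json
from collections import Counter


def pr_language(payload_json):
    """Return pull_request.base.repo.language from a payload string, or None.

    Roughly mirrors the JSON path used in the query: a missing key at any
    level yields None instead of raising.
    """
    node = json.loads(payload_json)
    for key in ("pull_request", "base", "repo", "language"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def count_languages(payloads):
    """Count payloads per language, most frequent first, skipping missing ones."""
    counts = Counter()
    for payload in payloads:
        language = pr_language(payload)
        if language is not None:
            counts[language] += 1
    return counts.most_common()
```

Payloads of other event types simply return None here, which corresponds to the rows the query filters out with IS NOT NULL.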

The most-starred repos (a WatchEvent is recorded when a user stars a repo):

SELECT COUNT(*) c, repo.name 
FROM [githubarchive:month.201612]
WHERE type='WatchEvent'
GROUP BY 2
ORDER BY 1 DESC
LIMIT 10

http://i.imgur.com/yXDHUlB.png

Bytes of code per language, using GHTorrent:

SELECT language, SUM(bytes) bytes
FROM [ghtorrent-bq:ght.project_languages]
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10

http://i.imgur.com/8RvrVBA.png

From here, these queries can be combined as needed. All of this GitHub data is available in BigQuery.
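
One possible way to combine the three signals from the question (stars per repo, languages per repo, bytes per language) is sketched below in Python, working on query results that have already been downloaded. The weighting scheme, which splits each repo's stars across its languages in proportion to bytes, is an assumption made for illustration and not something the answer prescribes. The function name and input shapes are hypothetical, and matching repos between GitHub Archive and GHTorrent is left out.

```python
from collections import defaultdict


def language_scores(repo_stars, repo_language_bytes):
    """Rank languages by stars, weighted by each language's byte share per repo.

    repo_stars:          {repo_name: star_count}
    repo_language_bytes: {repo_name: {language: bytes_of_code}}
    Repos with no language data contribute nothing.
    """
    scores = defaultdict(float)
    for repo, stars in repo_stars.items():
        language_bytes = repo_language_bytes.get(repo, {})
        total_bytes = sum(language_bytes.values())
        if total_bytes == 0:
            continue
        for language, byte_count in language_bytes.items():
            # A language owning 75% of the bytes receives 75% of the stars.
            scores[language] += stars * byte_count / total_bytes
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Using a byte share rather than raw byte totals is one way to soften the verbosity concern raised in the question, since a verbose language only gains weight relative to the other languages in the same repo.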

Alternatively, count pull request events by the language of the head repository:

SELECT
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.head.repo.language') AS language,
  COUNT(1) AS usage
FROM [githubarchive:month.201601]
GROUP BY language
HAVING language IS NOT NULL
ORDER BY usage DESC

Source: https://habr.com/ru/post/1665808/

