The code shown in this notebook is a simplified version of the code from a coding class at the University of Copenhagen Library. It shows how to retrieve total citations per paper and citations per year for a set of DOIs.
For this example, I picked five random papers from my reading library and hard-coded their DOIs in a variable.
Ideally, the DOIs would be obtained from a different source, e.g. a research information system, a database or similar.
import requests
import matplotlib.pyplot as plt
dois = ['10.1371/journal.pone.0073381',
        '10.1016/j.joi.2010.09.003',
        '10.1353/pla.2006.0026',
        '10.1002/asi.22797',
        '10.1016/j.jclinepi.2009.09.012']
# Example of how to read DOIs from a plain text file, one DOI per line. Uncomment (remove #) to use
#with open("doi.txt") as file:
#    dois = [line.strip() for line in file if line.strip()]
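If the DOIs come from a file or an export from another system, they often arrive with URL prefixes, mixed case or duplicates. The following is a minimal sketch of a clean-up step. The helper name clean_dois is my own, and the list of prefixes is an assumption about what typical exports contain, not an exhaustive rule.

```python
def clean_dois(raw_dois):
    """Normalize DOI strings and drop blanks and duplicates, keeping the original order."""
    # Prefixes that commonly precede a DOI (an assumption, not an exhaustive list)
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    seen = set()
    cleaned = []
    for raw in raw_dois:
        doi = raw.strip().lower()  # DOIs are case-insensitive, so lower-casing is safe
        for prefix in prefixes:
            if doi.startswith(prefix):
                doi = doi[len(prefix):].strip()
                break
        if doi and doi not in seen:
            seen.add(doi)
            cleaned.append(doi)
    return cleaned


example = [
    "https://doi.org/10.1002/ASI.22797\n",
    "doi:10.1002/asi.22797",
    "",
    "10.1353/pla.2006.0026 ",
]
print(clean_dois(example))  # ['10.1002/asi.22797', '10.1353/pla.2006.0026']
```

The result can be assigned to dois in place of the hard-coded list.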
We are interested in citations per paper (saved in cites) and citations per year (total for all papers, saved in cites_by_year). We use the single-entity retrieval method from OpenAlex, using the DOI as ID. If a DOI does not exist in OpenAlex, the requests query returns a 404 response, which we could use to report better on the missing coverage. For this simple example, however, we just stick to a try-except solution and report the number of errors, e.
cites_by_year = {}  # year -> citations summed over all papers
cites = []          # total citations per paper
e = 0               # number of DOIs that could not be processed
for doi in dois:
    try:
        response = requests.get("https://api.openalex.org/works/https://doi.org/" + doi)
        response.raise_for_status()  # a 404 (DOI not in OpenAlex) raises here
        result = response.json()
        cites.append(result["cited_by_count"])
        cbys = result["counts_by_year"]
        for cby in cbys:
            y = cby["year"]
            c = cby["cited_by_count"]
            if y in cites_by_year:
                cites_by_year[y] = cites_by_year[y] + c
            else:
                cites_by_year[y] = c
    except Exception:
        e = e + 1
        continue
print("DOIs with errors: " + str(e))
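The loop above counts every failure the same way. As a sketch of how the 404 response could be used to report missing coverage separately, the functions below check the status code before parsing the result. The names fetch_work and split_by_coverage are my own, and the session parameter is assumed to be any object with a requests-style get method, e.g. requests.Session().

```python
def fetch_work(doi, session, timeout=30):
    """Return (work, status); status is 'ok', 'missing' or 'error'."""
    url = "https://api.openalex.org/works/https://doi.org/" + doi
    try:
        response = session.get(url, timeout=timeout)
    except Exception:
        # Network problems, timeouts and similar
        return None, "error"
    if response.status_code == 404:
        # The DOI is not covered by OpenAlex
        return None, "missing"
    if response.status_code != 200:
        return None, "error"
    return response.json(), "ok"


def split_by_coverage(dois, session):
    """Group DOIs into found works, DOIs missing from OpenAlex, and other failures."""
    found, missing, failed = {}, [], []
    for doi in dois:
        work, status = fetch_work(doi, session)
        if status == "ok":
            found[doi] = work
        elif status == "missing":
            missing.append(doi)
        else:
            failed.append(doi)
    return found, missing, failed
```

Against the real API, this would be called as found, missing, failed = split_by_coverage(dois, requests.Session()), and len(missing) then reports how many DOIs OpenAlex does not cover.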
A quick check of the results, to see whether they make sense:
print(cites)
print(cites_by_year)
Looks like everything works as intended. We could end here, but let's also plot the results.
First, citations per year:
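# The seaborn styles were renamed in matplotlib 3.6; use whichever name this version provides
style = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'
plt.style.use(style)
cby = dict(sorted(cites_by_year.items()))
x = list(cby.keys())
y = list(cby.values())
plt.plot(x, y, '-o', color='#276FBF');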
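A line plot connects neighbouring points, so a year that is absent from cites_by_year is drawn as if the line passed straight through it. Below is a minimal sketch of filling such gaps with zeros before plotting. The helper name fill_missing_years is hypothetical, and the sample numbers are made up.

```python
def fill_missing_years(cites_by_year):
    """Return a year-sorted dict with a 0 entry for every year missing from the range."""
    if not cites_by_year:
        return {}
    first, last = min(cites_by_year), max(cites_by_year)
    return {year: cites_by_year.get(year, 0) for year in range(first, last + 1)}


sample = {2021: 4, 2018: 7, 2019: 2}  # made-up counts; 2020 is missing
filled = fill_missing_years(sample)
print(filled)  # {2018: 7, 2019: 2, 2020: 0, 2021: 4}
```

The keys and values of the filled dictionary can then be passed to plt.plot in the same way as x and y above.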
And now citations per paper, ranked by total citations:
cpp = sorted(cites, reverse=True)
x = list(range(1, len(cpp) + 1))
plt.bar(x, cpp)
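The cites list holds only the counts, so the bars cannot be traced back to individual papers. Here is a sketch of keeping the DOI alongside each count. The helper rank_papers and the dictionary cites_by_doi are hypothetical names, and the counts in the example are made up rather than retrieved from OpenAlex.

```python
def rank_papers(cites_by_doi):
    """Return (doi, citations) pairs sorted by citations, highest first."""
    return sorted(cites_by_doi.items(), key=lambda item: item[1], reverse=True)


# Made-up counts for illustration, keyed by DOI
cites_by_doi = {
    "10.1002/asi.22797": 12,
    "10.1353/pla.2006.0026": 40,
    "10.1016/j.joi.2010.09.003": 25,
}
for rank, (doi, count) in enumerate(rank_papers(cites_by_doi), start=1):
    print(rank, doi, count)
```

In the retrieval loop, an assignment such as cites_by_doi[doi] = result["cited_by_count"] would fill such a dictionary, and the DOIs could then serve as labels for the bars.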