Grouping Multiple Text Representations of the Same Entity

Datasets: Wellcome Trust Article Processing Charges Paid in 2012-2013 (used below) / Code for Kenya Health Facilities in Kenya (commented-out alternative)

In [14]:
import pandas as pd
import re

fi = pd.read_csv("University returns_for_figshare_FINAL.csv")
# fi = pd.read_csv("eHealth-Kenya-Facilities-Download-21102015.csv")

fi
Out[14]:
PMID/PMCID Publisher Journal title Article title COST (£) charged to Wellcome (inc VAT when charged)
0 PMC3378987\r Elsevier Academy of Nutrition and Dietetics Parent support and parent mediated behaviours ... �2,379.54
1 PMCID: PMC3780468 ACS (Amercian Chemical Society) Publications ACS Chemical Biology A Novel Allosteric Inhibitor of the Uridine Di... �1,294.59
2 PMCID: PMC3621575 ACS (Amercian Chemical Society) Publications ACS Chemical Biology Chemical proteomic analysis reveals the drugab... �1,294.78
3 NaN American Chemical Society ACS Chemical Biology Discovery of _2 Adrenergic Receptor Ligands Us... �947.07
4 PMID: 24015914 PMC3833349 American Chemical Society ACS Chemical Biology Discovery of an allosteric inhibitor binding s... �1,267.76
5 : PMC3805332 American Chemical Society ACS Chemical Biology Synthesis of alpha-glucan in mycobacteria invo... �2,286.73
6 PMCID:\r PMC3656742\r American Chemical Society ACS Chemical Neuroscience Continuous online microdialysis using microflu... �1,186.80
7 PMCID: 3584654 AMERICAN CHEMICAL SOCIETY ACS NANO HYDROXY-TERMINATED CONJUGATED POLYMER NANOPART... �642.89
8 23373658 American Chemical Society Publications ACS Nano Skin dendritic cell targeting via microneedle ... �693.39
9 PMCID:\r PMC3727331\r International Union of Crystallography Acta Crystallographica Section D, Biological ... Clustering procedures for the optimal selectio... �771.42
10 PMCID: PMC3565438 International Union of Crystallography Acta Crystallographica Section D: Biological C... Intensity statistics in the presence of transl... �773.74
11 PMCID: PMC3668577 International Union of Crystallography (iucr) Acta Crystallographica Section F: Structural B... Structure of diaminohydroxyphosphoribosylamino... �785.60
12 PMCID: PMC3606566 International Union of Crystallography (iucr) Acta Crystallographica Section F: Structural B... Structure of Pseudomonas aeruginosa inosine 5'... �807.67
13 \r PMC3498934 International Union of Crystallography Acta Crystallographica, Section D Nearest-cell: a fast and easy tool for locatin... �757.18
14 PMID:22993091 PMC3447403 International Union of Crystallography Acta Crystallography D Crystallization, dehydration and experimental ... �774.19
15 PMC3087623 International Union of Crystallography Acta D Structure of HLA-A*0301 in complex with a pept... �750.16
16 PMC3808818 Society for Publication of Acta Dermato-Venere... Acta Dermato Venereologica The Importance of a Full Clinical Examination:... �653.96
17 PMID: 23828613 (July 2013 Epub) Springer Acta Diabetologica A rare SNP in pre-miR-34a is associated with i... �2,336.28
18 PMC3374517 International Union of Crystallography ACTA F Crystallization and preliminary crystallograph... �754.90
19 PMC3549237 Springer Acta Neuropathol Overexpression of human wild-type FUS causes p... �1,901.04
20 NaN Springer Acta Neuropathologica �-Synucleinopathy associated with G51D SNCA mu... �1,884.01
21 PMC3661931 Springer Acta Neuropathologica Insufficient OPC migration into demyelinated l... �2,250.97
22 3535376 Springer Acta Neuropathologica Unravelling the enigma o selective vulnerabili... �2,348.21
23 PMC3798121 Wiley-Blackwell Acta Opthalmologica Visual and psychological morbidity among patie... �2,270.16
24 21624095 PMCID: PMC3734623 Wiley Acta Physiol Integration of transient receptor potential ca... �1,991.50
25 In Process Wiley Addiction Acute alcohol-related dysfunction as a predict... �1,919.51
26 23734913 Wiley Addiction Childhood conduct disorder trajectories, prior... �2,352.94
27 NaN Springer Advances in Experimental Medicine and Biology Wavelet cross-correlation to investigate regio... �1,928.45
28 NaN Springer Advances in Experimental Medicine and Biology Modelling Cerebrovascular Reactivity: A Novel ... �1,928.46
29 NaN Springer Advances in Experimental Medicine and Biology Normobaric hyperoxia does not change optical s... �1,928.46
... ... ... ... ... ...
2098 PMC3717178 John Wiley & Sons Ltd Tropical Medicine and International Health Nutritional supplementation: the additional co... �1,530.77
2099 PMC3775257 John Wiley & Sons Ltd Tropical Medicine and International Health Disengagement from care in a decentralised pri... �1,836.92
2100 PMC3558801 Wiley Tropical Medicine and International Health Preparing for national school-based deworming ... �1,870.32
2101 NaN Wiley Tropical Medicine and International Health Meningococcal carriage in the African meningit... �1,949.32
2102 PMC3770928 Wiley Tropical Medicine and International Health Epidemiology and control of trachoma: systamat... �1,974.72
2103 PMC3627817 Wiley Tropical Medicine and International Health Maternal recall of birth weight and birth size... �1,896.93
2104 PMCID:\r PMC3759846 Elsevier Tuberculosis Pathways of IL-1_ secretion by macrophages inf... �1,999.94
2105 PMCID:\r PMC3608034 Elsevier Tuberculosis A novel assay of antimycobacterial activity an... �2,322.57
2106 NaN Cambridge University Press Urban History Leisure, economy and colonial urbanism: Darjee... �2,034.00
2107 PMCID: PMC3599165 Springer Urolithiasis Accuracy of urine pH testing in a regional met... �1,112.40
2108 PMC3763375 Elsevier Vaccine Human papillomavirus (HPV) vaccine implementat... �1,433.34
2109 PMID: 24035434 Elsevier Vaccine Cattle immunized against the pathogenic l-_-gl... �1,448.38
2110 PMC3763374 Elsevier Vaccine Protection against avian necrotic enteritis af... �1,477.73
2111 PMC3404461 Elsevier Vaccine Tailoring subunit vaccine immunogenicity:�maxi... �2,328.84
2112 PMCID: PMC3740234 Elsevier Ltd Vaccine Increased IgG but normal IgA anti-pneumococcal... �1,428.68
2113 23117109 Elsevier Vascular Pharmacology Signal transduction and modulating pathways in... �2,352.53
2114 PMC3757156 Elsevier Veterinary Microbiology Isolation of canine Anaplasma phagocytophilum ... �2,451.49
2115 NaN Elsevier Veterinary Parasitology Persistence of the efficacy of copper oxide wi... �1,811.23
2116 PMC3611597 Elsevier Veterinary Parasitology Toxocara canis: molecular basis of immune reco... �2,488.17
2117 PMC3786614 BMJ Publishing Group Veterinary Record Proactive dairy cattle disease control in the ... �2,040.00
2118 PMC3716626 BioMed Central Ltd Veterinary Research Understanding foot-and-mouth disease virus tra... �993.30
2119 PMCID: PMC3791421 Elsevier Virology An essential fifth coding ORF in the sobemovir... �1,435.05
2120 23562481 The Boulevard Virology Metagenomic study of the viruses of African st... �2,421.96
2121 PMC3190389 BioMed Central Virology Journal Label-free quantitative proteomics reveals reg... �1,242.00
2122 23201205 Elsevier Virus Research Prostratin exhibits both replication enhancing... �1,947.09
2123 pub Aug 2013 Elsevier Vision Research Sensitivity to numerosity is not a unique visu... �1,456.18
2124 23200744 PMC3552157 Elsevier Vision Research Perceptual learning of second order cues for l... �2,385.25
2125 PMC3472342\r\r Cambridge University Press Visual Neuroscience Masking within and across visual dimensions: P... �2,034.00
2126 PMCID: PMC3600532 Wiley-Blackwell Zoonoses and Public Health Ecology of zoonotic infectious diseases in bat... �2,272.15
2127 NaN NaN NaN NaN NaN

2128 rows × 5 columns

Finding Groups

Define column to operate on and drop rows with NULL values

In [15]:
col = "Publisher"
# col = "Sub Location"

fi = fi[pd.notnull(fi[col])]
vals = fi[col]

Initialize lists for the fingerprints (fps), the groups of values that share each fingerprint (cls), and the group sizes (cts)

In [16]:
fps = []
cls = []
cts = []

Define a dictionary that maps phrases to be removed or replaced to their replacements

In [17]:
replacements = {
    '&': 'and',
    ' ltd': '',
    ' limited': '',
    ' incorporated': '',      # Must come before ' inc', otherwise "incorporated" becomes "orporated"
    ' inc': '',
    ' publications': '',
    ' publishing': '',
    ' group': '',
}
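Plain substring replacement can also match inside longer words (for example, ' inc' inside ' income', or ' group' inside ' groupe'), and its result depends on the order of the dictionary entries. A minimal alternative sketch, assuming the goal is to drop these suffixes only when they appear as whole words, uses one regular expression with \b word boundaries. SUFFIXES, SUFFIX_RE and normalize_name are hypothetical names, not part of the original notebook.

```python
import re

# Hypothetical whole-word variant of the `replacements` dictionary
SUFFIXES = ["ltd", "limited", "inc", "incorporated", "publications", "publishing", "group"]
# Longest alternatives first; the \b anchors prevent partial matches such as "inc" in "income"
SUFFIX_RE = re.compile(r"\b(?:" + "|".join(sorted(SUFFIXES, key=len, reverse=True)) + r")\b")


def normalize_name(s):
    s = s.lower()
    s = s.replace("&", " and ")   # Spell out ampersands, padded with spaces
    s = SUFFIX_RE.sub(" ", s)     # Remove suffixes only where they are whole words
    return " ".join(s.split())    # Collapse runs of whitespace


print(normalize_name("BMJ Publishing Group Ltd"))  # bmj
print(normalize_name("Taylor & Francis"))          # taylor and francis
print(normalize_name("Income Group"))              # income
```

The output of normalize_name could then be passed through the punctuation removal and chunking steps in place of the dictionary loop.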

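Define an n-gram fingerprint function. Note that it cuts the normalized string into consecutive, non-overlapping chunks of n characters (e.g. "elsevier" becomes "el", "se", "vi", "er") rather than sliding an overlapping window over it.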

In [18]:
def fingerprint_ngram(n, s):
    s = s.lower()                                        # Make all letters lowercase
    for phrase, replacement in replacements.items():     # Make replacements specified in dictionary
        s = s.replace(phrase, replacement)
    s = re.sub(r'\s+', '', s)                            # Remove all whitespace
    s = re.sub(r'[^\w\s]', '', s)                        # Remove all punctuation
    chunks = [s[i:i + n] for i in range(0, len(s), n)]   # Split into consecutive n-character chunks
    return ''.join(sorted(set(chunks)))                  # Remove duplicates, sort, and concatenate
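The function above cuts the string into consecutive, non-overlapping chunks. OpenRefine's n-gram fingerprint instead uses every overlapping n-gram (a sliding window). A minimal sketch of that variant follows; fingerprint_ngram_overlapping is a hypothetical name, and it skips the replacements step and any further normalization OpenRefine applies.

```python
import re


def fingerprint_ngram_overlapping(n, s):
    """Sliding-window n-gram fingerprint (hypothetical variant, not the notebook's function)."""
    s = s.lower()
    s = re.sub(r"[\W_]", "", s)                            # Drop whitespace, punctuation and underscores
    grams = {s[i:i + n] for i in range(len(s) - n + 1)}    # Every overlapping n-gram, deduplicated
    return "".join(sorted(grams))                          # Sort and concatenate


print(fingerprint_ngram_overlapping(2, "Elsevier"))   # elerevielssevi
print(fingerprint_ngram_overlapping(2, "ELSEVIER "))  # elerevielssevi
```

With overlapping grams, inserting one character into a word changes only the grams around the insertion point, whereas with non-overlapping chunks every chunk after it shifts. That makes the overlapping sets more useful if fingerprints are later compared by similarity rather than by exact equality.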

Define a whitespace (token) fingerprint function, which sorts and deduplicates words instead of character chunks

In [19]:
def fingerprint_wspace(s):
    s = s.lower()                                        # Make all letters lowercase
    for phrase, replacement in replacements.items():     # Make replacements specified in dictionary
        s = s.replace(phrase, replacement)
    s = re.sub(r'[^\w\s]', '', s)                        # Remove all punctuation
    return ' '.join(sorted(set(s.split())))              # Split on whitespace, deduplicate, sort, rejoin with spaces

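Define the generic fingerprint function used in the rest of the notebook (here the 2-gram fingerprint; fingerprint_wspace can be swapped in to cluster by words instead)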

In [20]:
def fingerprint_funk(s):
    return fingerprint_ngram(2, s)

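For each value in the column, compute its fingerprint. If the fingerprint has not been seen before, record it and start a new group containing the value; otherwise, add the value to the existing group unless it is already there. The group size therefore counts distinct spellings, not rows.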

In [21]:
for val in vals:
    fp = fingerprint_funk(val)
    if fp not in fps:
        fps.append(fp)
        cls.append([val])
        cts.append(1)
    else:
        fp_id = fps.index(fp)
        if val not in cls[fp_id]:
            cls[fp_id].append(val)
            cts[fp_id] += 1
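fps.index(fp) scans the fingerprint list for every value, so the loop does work proportional to (number of values) × (number of fingerprints). A dictionary keyed by fingerprint gives constant-time lookups on average. A minimal sketch, assuming any fingerprint function that maps a string to a string; group_by_fingerprint is a hypothetical helper, not part of the original notebook.

```python
from collections import defaultdict


def group_by_fingerprint(values, fingerprint):
    """Map each fingerprint to the distinct values that produce it, in first-seen order."""
    groups = defaultdict(list)
    for val in values:
        members = groups[fingerprint(val)]   # Dictionary lookup instead of list.index()
        if val not in members:               # Keep each distinct spelling only once
            members.append(val)
    return dict(groups)


groups = group_by_fingerprint(["BMJ", "BMJ ", "bmj", "Sage"], lambda v: v.strip().lower())
print(groups)  # {'bmj': ['BMJ', 'BMJ ', 'bmj'], 'sage': ['Sage']}
```

With the notebook's objects this would be group_by_fingerprint(vals, fingerprint_funk); the group sizes are then the lengths of the lists.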

Build a DataFrame of the groups and display the 20 largest groups that contain more than one distinct value.

In [22]:
dfdict = {"Group Fingerprint": fps, "Values in Group": cls, "Group Size": cts}
df = pd.DataFrame(dfdict)
df[df["Group Size"] > 1].sort_values("Group Size", ascending=False).head(20)
Out[22]:
Group Fingerprint Group Size Values in Group
36 bmj 8 [BMJ Group, BMJ, BMJ , BMJ Publishing Group, B...
78 gesa 7 [SAGE, Sage, Sage Publications, Sage Publicati...
55 hnjolendnssowiya 7 [John Wiley & Sons Ltd, JOHN WILEY & SONS, Joh...
56 naretu 7 [Nature, Nature , Nature Publishing Group, Nat...
17 biceedlntomra 6 [BioMed Central Limited, BioMed Central, BioMe...
0 elersevi 4 [Elsevier, Elsevier , ELSEVIER, Elsevier Ltd]
46 biecisnyofogolompathts 4 [THE COMPANY OF BIOLOGISTS, The company of Bio...
39 eslandpoprrts 4 [Portland Press, PORTLAND PRESS LTD, Portland ...
11 eresfoivoxprrdssityun 4 [Oxford University Press, Oxford University Pr...
9 agbhergmngrirlspve 3 [SPRINGER-VERLAG GMBH, Springer-Verlag GmbH, S...
193 ospl 3 [PLOS, PLoS, Plos]
2 alamanchciemereticsoy 3 [American Chemical Society, AMERICAN CHEMICAL ...
7 cklalellwewiyb 3 [Wiley-Blackwell, Wiley Blackwell, Wiley/Black...
33 amancicreretgyiciolomioborsoyf 3 [American Society for Microbiology, American S...
76 ceeriolensptrisscubviwiys 3 [Wiley Subscription Services Inc., Wiley Subs...
16 andfisncorratayl 3 [Taylor & Francis, Taylor and Francis, Taylor ...
113 cadmus 2 [Cadmus , Cadmus]
123 amanarbicidmeceretheicmiocogolorrysostulyyf 2 [American Society for Biochemistry and Molecul...
115 agaiasatcudieeriniololunandngninsonrnsasesitettub 2 [International Union Against Tuberculosis and ...
128 cachcueaemfogyieioislalelomemondnsocrbrithtrtyya 2 [The American Society for Biochemistry and Mol...

Number of distinct values before grouping, and number of distinct fingerprints after

In [23]:
len(fi[col].unique())
Out[23]:
299
In [24]:
len(df["Group Fingerprint"].unique())
Out[24]:
214

Number of clusters (groups containing more than one distinct value)

In [25]:
len(df[df["Group Size"] > 1])
Out[25]:
45

Largest cluster

In [26]:
df.loc[df["Group Size"].idxmax(), "Values in Group"]
Out[26]:
['BMJ Group',
 'BMJ',
 'BMJ ',
 'BMJ Publishing Group',
 'BMJ Publishing Group Ltd',
 'BMJ PUBLISHING GROUP',
 'BMJ Group ',
 'BMJ group']

Group and Replace in DataFrame

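Automatically find the most frequent value in a group and replace the group's other values with it. Because the correct spelling is usually the most common one, this typically fixes misspellings and inconsistent capitalization or suffixes, although the result should still be checked by hand.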

Add fingerprint column to DataFrame

In [27]:
fifp = fi.copy()
fifp['fingerprint'] = fifp[col].map(fingerprint_funk)    # Fingerprint of each value in the column
fifp.head(5)
Out[27]:
PMID/PMCID Publisher Journal title Article title COST (£) charged to Wellcome (inc VAT when charged) fingerprint
0 PMC3378987\r Elsevier Academy of Nutrition and Dietetics Parent support and parent mediated behaviours ... £2,379.54 elersevi
1 PMCID: PMC3780468 ACS (Amercian Chemical Society) Publications ACS Chemical Biology A Novel Allosteric Inhibitor of the Uridine Di... £1,294.59 accaheiaielsmemincocrcsaty
2 PMCID: PMC3621575 ACS (Amercian Chemical Society) Publications ACS Chemical Biology Chemical proteomic analysis reveals the drugab... £1,294.78 accaheiaielsmemincocrcsaty
3 NaN American Chemical Society ACS Chemical Biology Discovery of _2 Adrenergic Receptor Ligands Us... £947.07 alamanchciemereticsoy
4 PMID: 24015914 PMC3833349 American Chemical Society ACS Chemical Biology Discovery of an allosteric inhibitor binding s... £1,267.76 alamanchciemereticsoy

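For a given fingerprint, list the values that produce it together with their frequencies (here the fingerprint of the most frequent publisher is used as the example), then pick the most frequent one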

In [28]:
# replace_from = fingerprint_funk(fi.sample()[col].iloc[0])              # Use random value as example
replace_from = fingerprint_funk(fi[col].value_counts().idxmax())         # Use most frequent value as example
fifp[fifp["fingerprint"] == replace_from][col].value_counts().head(10)
Out[28]:
Elsevier        387
Elsevier          8
ELSEVIER          4
Elsevier Ltd      1
Name: Publisher, dtype: int64
In [29]:
replace_with = fifp[fifp["fingerprint"] == replace_from][col].value_counts().idxmax()
replace_with
Out[29]:
'Elsevier'

Replace every entry whose fingerprint matches replace_from with this value

In [30]:
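fim = fi.copy()
mask = fim[col].map(fingerprint_funk) == replace_from    # Rows whose value belongs to the group
fim.loc[mask, col] = replace_with                        # Assign via .loc; editing rows yielded by iterrows() is not guaranteed to update the DataFrame
fim[fim[col] == replace_with].head(10)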
Out[30]:
PMID/PMCID Publisher Journal title Article title COST (£) charged to Wellcome (inc VAT when charged)
0 PMC3378987\r Elsevier Academy of Nutrition and Dietetics Parent support and parent mediated behaviours ... £2,379.54
64 PMID: 23907068 PMC3837358 Elsevier American Journal of Geriatric Psychiatry The epidemiology of delirium: challenges and o... £2,404.53
66 PMC3516598 Elsevier American Journal of Human Genetics Mutations in ANO3 cause dominant cranio-cervic... £2,296.94
67 PMC3769921 Elsevier American Journal of Human Genetics Mutations in FBXL4 cause mitochondrial encepha... £2,334.41
68 PMC3591859 Elsevier American Journal of Human Genetics Constitutional mutations in RTEL1 cause severe... £2,434.04
69 3567269 Elsevier American Journal of Human Genetics LRIG2 Mutations Cause Urofacial Syndrome £3,938.82
73 PMID: 22898127 PMC3830178 Elsevier American Journal of Preventive Medicine Physical activity and transitioning to retirem... £2,377.65
74 PMID: 23159264 PMC3834139 Elsevier American Journal of Preventive Medicine Financial incentives to promote active travel:... £2,377.65
75 3708126 Elsevier American Journal of Preventive Medicine Sickle Cell Disease in Africa; a neglected cau... £1,834.77
86 PMCID: PMC3740237 Elsevier Analytical Biochemistry An expression system for screening of proteins... £2,381.62

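Put everything in a function. replace_from can be any member of the target group; if replace_with is omitted, the group's most frequent value is used.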

In [31]:
def cluster_merge(df, col, replace_from, replace_with=None):
    """Return a copy of df in which every value of df[col] whose fingerprint matches
    that of replace_from is set to replace_with (default: the group's most frequent value)."""
    target_fp = fingerprint_funk(replace_from)
    mask = df[col].map(fingerprint_funk) == target_fp             # Rows belonging to the group
    if replace_with is None:
        replace_with = df.loc[mask, col].value_counts().idxmax()  # Most frequent spelling in the group
    fim = df.copy()
    fim.loc[mask, col] = replace_with                             # Write back via .loc, not through iterrows()
    return fim
In [32]:
cluster_merge(fi, "Publisher", "Elsevier")
Out[32]:
PMID/PMCID Publisher Journal title Article title COST (£) charged to Wellcome (inc VAT when charged)
0 PMC3378987\r Elsevier Academy of Nutrition and Dietetics Parent support and parent mediated behaviours ... �2,379.54
1 PMCID: PMC3780468 ACS (Amercian Chemical Society) Publications ACS Chemical Biology A Novel Allosteric Inhibitor of the Uridine Di... �1,294.59
2 PMCID: PMC3621575 ACS (Amercian Chemical Society) Publications ACS Chemical Biology Chemical proteomic analysis reveals the drugab... �1,294.78
3 NaN American Chemical Society ACS Chemical Biology Discovery of _2 Adrenergic Receptor Ligands Us... �947.07
4 PMID: 24015914 PMC3833349 American Chemical Society ACS Chemical Biology Discovery of an allosteric inhibitor binding s... �1,267.76
5 : PMC3805332 American Chemical Society ACS Chemical Biology Synthesis of alpha-glucan in mycobacteria invo... �2,286.73
6 PMCID:\r PMC3656742\r American Chemical Society ACS Chemical Neuroscience Continuous online microdialysis using microflu... �1,186.80
7 PMCID: 3584654 AMERICAN CHEMICAL SOCIETY ACS NANO HYDROXY-TERMINATED CONJUGATED POLYMER NANOPART... �642.89
8 23373658 American Chemical Society Publications ACS Nano Skin dendritic cell targeting via microneedle ... �693.39
9 PMCID:\r PMC3727331\r International Union of Crystallography Acta Crystallographica Section D, Biological ... Clustering procedures for the optimal selectio... �771.42
10 PMCID: PMC3565438 International Union of Crystallography Acta Crystallographica Section D: Biological C... Intensity statistics in the presence of transl... �773.74
11 PMCID: PMC3668577 International Union of Crystallography (iucr) Acta Crystallographica Section F: Structural B... Structure of diaminohydroxyphosphoribosylamino... �785.60
12 PMCID: PMC3606566 International Union of Crystallography (iucr) Acta Crystallographica Section F: Structural B... Structure of Pseudomonas aeruginosa inosine 5'... �807.67
13 \r PMC3498934 International Union of Crystallography Acta Crystallographica, Section D Nearest-cell: a fast and easy tool for locatin... �757.18
14 PMID:22993091 PMC3447403 International Union of Crystallography Acta Crystallography D Crystallization, dehydration and experimental ... �774.19
15 PMC3087623 International Union of Crystallography Acta D Structure of HLA-A*0301 in complex with a pept... �750.16
16 PMC3808818 Society for Publication of Acta Dermato-Venere... Acta Dermato Venereologica The Importance of a Full Clinical Examination:... �653.96
17 PMID: 23828613 (July 2013 Epub) Springer Acta Diabetologica A rare SNP in pre-miR-34a is associated with i... �2,336.28
18 PMC3374517 International Union of Crystallography ACTA F Crystallization and preliminary crystallograph... �754.90
19 PMC3549237 Springer Acta Neuropathol Overexpression of human wild-type FUS causes p... �1,901.04
20 NaN Springer Acta Neuropathologica �-Synucleinopathy associated with G51D SNCA mu... �1,884.01
21 PMC3661931 Springer Acta Neuropathologica Insufficient OPC migration into demyelinated l... �2,250.97
22 3535376 Springer Acta Neuropathologica Unravelling the enigma o selective vulnerabili... �2,348.21
23 PMC3798121 Wiley-Blackwell Acta Opthalmologica Visual and psychological morbidity among patie... �2,270.16
24 21624095 PMCID: PMC3734623 Wiley Acta Physiol Integration of transient receptor potential ca... �1,991.50
25 In Process Wiley Addiction Acute alcohol-related dysfunction as a predict... �1,919.51
26 23734913 Wiley Addiction Childhood conduct disorder trajectories, prior... �2,352.94
27 NaN Springer Advances in Experimental Medicine and Biology Wavelet cross-correlation to investigate regio... �1,928.45
28 NaN Springer Advances in Experimental Medicine and Biology Modelling Cerebrovascular Reactivity: A Novel ... �1,928.46
29 NaN Springer Advances in Experimental Medicine and Biology Normobaric hyperoxia does not change optical s... �1,928.46
... ... ... ... ... ...
2097 PMCID:\r PMC3508281\r\r Springer Tropical Animal Health & Production Low prevalence of bovine tuberculosis in Somal... �2,054.78
2098 PMC3717178 John Wiley & Sons Ltd Tropical Medicine and International Health Nutritional supplementation: the additional co... �1,530.77
2099 PMC3775257 John Wiley & Sons Ltd Tropical Medicine and International Health Disengagement from care in a decentralised pri... �1,836.92
2100 PMC3558801 Wiley Tropical Medicine and International Health Preparing for national school-based deworming ... �1,870.32
2101 NaN Wiley Tropical Medicine and International Health Meningococcal carriage in the African meningit... �1,949.32
2102 PMC3770928 Wiley Tropical Medicine and International Health Epidemiology and control of trachoma: systamat... �1,974.72
2103 PMC3627817 Wiley Tropical Medicine and International Health Maternal recall of birth weight and birth size... �1,896.93
2104 PMCID:\r PMC3759846 Elsevier Tuberculosis Pathways of IL-1_ secretion by macrophages inf... �1,999.94
2105 PMCID:\r PMC3608034 Elsevier Tuberculosis A novel assay of antimycobacterial activity an... �2,322.57
2106 NaN Cambridge University Press Urban History Leisure, economy and colonial urbanism: Darjee... �2,034.00
2107 PMCID: PMC3599165 Springer Urolithiasis Accuracy of urine pH testing in a regional met... �1,112.40
2108 PMC3763375 Elsevier Vaccine Human papillomavirus (HPV) vaccine implementat... �1,433.34
2109 PMID: 24035434 Elsevier Vaccine Cattle immunized against the pathogenic l-_-gl... �1,448.38
2110 PMC3763374 Elsevier Vaccine Protection against avian necrotic enteritis af... �1,477.73
2111 PMC3404461 Elsevier Vaccine Tailoring subunit vaccine immunogenicity:�maxi... �2,328.84
2112 PMCID: PMC3740234 Elsevier Vaccine Increased IgG but normal IgA anti-pneumococcal... �1,428.68
2113 23117109 Elsevier Vascular Pharmacology Signal transduction and modulating pathways in... �2,352.53
2114 PMC3757156 Elsevier Veterinary Microbiology Isolation of canine Anaplasma phagocytophilum ... �2,451.49
2115 NaN Elsevier Veterinary Parasitology Persistence of the efficacy of copper oxide wi... �1,811.23
2116 PMC3611597 Elsevier Veterinary Parasitology Toxocara canis: molecular basis of immune reco... �2,488.17
2117 PMC3786614 BMJ Publishing Group Veterinary Record Proactive dairy cattle disease control in the ... �2,040.00
2118 PMC3716626 BioMed Central Ltd Veterinary Research Understanding foot-and-mouth disease virus tra... �993.30
2119 PMCID: PMC3791421 Elsevier Virology An essential fifth coding ORF in the sobemovir... �1,435.05
2120 23562481 The Boulevard Virology Metagenomic study of the viruses of African st... �2,421.96
2121 PMC3190389 BioMed Central Virology Journal Label-free quantitative proteomics reveals reg... �1,242.00
2122 23201205 Elsevier Virus Research Prostratin exhibits both replication enhancing... �1,947.09
2123 pub Aug 2013 Elsevier Vision Research Sensitivity to numerosity is not a unique visu... �1,456.18
2124 23200744 PMC3552157 Elsevier Vision Research Perceptual learning of second order cues for l... �2,385.25
2125 PMC3472342\r\r Cambridge University Press Visual Neuroscience Masking within and across visual dimensions: P... �2,034.00
2126 PMCID: PMC3600532 Wiley-Blackwell Zoonoses and Public Health Ecology of zoonotic infectious diseases in bat... �2,272.15

2127 rows × 5 columns

In [33]:
cluster_merge(fi, "Publisher", "Elsevier", replace_with="Alternative")
Out[33]:
PMID/PMCID Publisher Journal title Article title COST (£) charged to Wellcome (inc VAT when charged)
0 PMC3378987\r Alternative Academy of Nutrition and Dietetics Parent support and parent mediated behaviours ... �2,379.54
1 PMCID: PMC3780468 ACS (Amercian Chemical Society) Publications ACS Chemical Biology A Novel Allosteric Inhibitor of the Uridine Di... �1,294.59
2 PMCID: PMC3621575 ACS (Amercian Chemical Society) Publications ACS Chemical Biology Chemical proteomic analysis reveals the drugab... �1,294.78
3 NaN American Chemical Society ACS Chemical Biology Discovery of _2 Adrenergic Receptor Ligands Us... �947.07
4 PMID: 24015914 PMC3833349 American Chemical Society ACS Chemical Biology Discovery of an allosteric inhibitor binding s... �1,267.76
5 : PMC3805332 American Chemical Society ACS Chemical Biology Synthesis of alpha-glucan in mycobacteria invo... �2,286.73
6 PMCID:\r PMC3656742\r American Chemical Society ACS Chemical Neuroscience Continuous online microdialysis using microflu... �1,186.80
7 PMCID: 3584654 AMERICAN CHEMICAL SOCIETY ACS NANO HYDROXY-TERMINATED CONJUGATED POLYMER NANOPART... �642.89
8 23373658 American Chemical Society Publications ACS Nano Skin dendritic cell targeting via microneedle ... �693.39
9 PMCID:\r PMC3727331\r International Union of Crystallography Acta Crystallographica Section D, Biological ... Clustering procedures for the optimal selectio... �771.42
10 PMCID: PMC3565438 International Union of Crystallography Acta Crystallographica Section D: Biological C... Intensity statistics in the presence of transl... �773.74
11 PMCID: PMC3668577 International Union of Crystallography (iucr) Acta Crystallographica Section F: Structural B... Structure of diaminohydroxyphosphoribosylamino... �785.60
12 PMCID: PMC3606566 International Union of Crystallography (iucr) Acta Crystallographica Section F: Structural B... Structure of Pseudomonas aeruginosa inosine 5'... �807.67
13 \r PMC3498934 International Union of Crystallography Acta Crystallographica, Section D Nearest-cell: a fast and easy tool for locatin... �757.18
14 PMID:22993091 PMC3447403 International Union of Crystallography Acta Crystallography D Crystallization, dehydration and experimental ... �774.19
15 PMC3087623 International Union of Crystallography Acta D Structure of HLA-A*0301 in complex with a pept... �750.16
16 PMC3808818 Society for Publication of Acta Dermato-Venere... Acta Dermato Venereologica The Importance of a Full Clinical Examination:... �653.96
17 PMID: 23828613 (July 2013 Epub) Springer Acta Diabetologica A rare SNP in pre-miR-34a is associated with i... �2,336.28
18 PMC3374517 International Union of Crystallography ACTA F Crystallization and preliminary crystallograph... �754.90
19 PMC3549237 Springer Acta Neuropathol Overexpression of human wild-type FUS causes p... �1,901.04
20 NaN Springer Acta Neuropathologica �-Synucleinopathy associated with G51D SNCA mu... �1,884.01
21 PMC3661931 Springer Acta Neuropathologica Insufficient OPC migration into demyelinated l... �2,250.97
22 3535376 Springer Acta Neuropathologica Unravelling the enigma o selective vulnerabili... �2,348.21
23 PMC3798121 Wiley-Blackwell Acta Opthalmologica Visual and psychological morbidity among patie... �2,270.16
24 21624095 PMCID: PMC3734623 Wiley Acta Physiol Integration of transient receptor potential ca... �1,991.50
25 In Process Wiley Addiction Acute alcohol-related dysfunction as a predict... �1,919.51
26 23734913 Wiley Addiction Childhood conduct disorder trajectories, prior... �2,352.94
27 NaN Springer Advances in Experimental Medicine and Biology Wavelet cross-correlation to investigate regio... �1,928.45
28 NaN Springer Advances in Experimental Medicine and Biology Modelling Cerebrovascular Reactivity: A Novel ... �1,928.46
29 NaN Springer Advances in Experimental Medicine and Biology Normobaric hyperoxia does not change optical s... �1,928.46
... ... ... ... ... ...
2097 PMCID:\r PMC3508281\r\r Springer Tropical Animal Health & Production Low prevalence of bovine tuberculosis in Somal... �2,054.78
2098 PMC3717178 John Wiley & Sons Ltd Tropical Medicine and International Health Nutritional supplementation: the additional co... �1,530.77
2099 PMC3775257 John Wiley & Sons Ltd Tropical Medicine and International Health Disengagement from care in a decentralised pri... �1,836.92
2100 PMC3558801 Wiley Tropical Medicine and International Health Preparing for national school-based deworming ... �1,870.32
2101 NaN Wiley Tropical Medicine and International Health Meningococcal carriage in the African meningit... �1,949.32
2102 PMC3770928 Wiley Tropical Medicine and International Health Epidemiology and control of trachoma: systamat... �1,974.72
2103 PMC3627817 Wiley Tropical Medicine and International Health Maternal recall of birth weight and birth size... �1,896.93
2104 PMCID:\r PMC3759846 Alternative Tuberculosis Pathways of IL-1_ secretion by macrophages inf... �1,999.94
2105 PMCID:\r PMC3608034 Alternative Tuberculosis A novel assay of antimycobacterial activity an... �2,322.57
2106 NaN Cambridge University Press Urban History Leisure, economy and colonial urbanism: Darjee... �2,034.00
2107 PMCID: PMC3599165 Springer Urolithiasis Accuracy of urine pH testing in a regional met... �1,112.40
2108 PMC3763375 Alternative Vaccine Human papillomavirus (HPV) vaccine implementat... �1,433.34
2109 PMID: 24035434 Alternative Vaccine Cattle immunized against the pathogenic l-_-gl... �1,448.38
2110 PMC3763374 Alternative Vaccine Protection against avian necrotic enteritis af... �1,477.73
2111 PMC3404461 Alternative Vaccine Tailoring subunit vaccine immunogenicity:�maxi... �2,328.84
2112 PMCID: PMC3740234 Alternative Vaccine Increased IgG but normal IgA anti-pneumococcal... �1,428.68
2113 23117109 Alternative Vascular Pharmacology Signal transduction and modulating pathways in... �2,352.53
2114 PMC3757156 Alternative Veterinary Microbiology Isolation of canine Anaplasma phagocytophilum ... �2,451.49
2115 NaN Alternative Veterinary Parasitology Persistence of the efficacy of copper oxide wi... �1,811.23
2116 PMC3611597 Alternative Veterinary Parasitology Toxocara canis: molecular basis of immune reco... �2,488.17
2117 PMC3786614 BMJ Publishing Group Veterinary Record Proactive dairy cattle disease control in the ... �2,040.00
2118 PMC3716626 BioMed Central Ltd Veterinary Research Understanding foot-and-mouth disease virus tra... �993.30
2119 PMCID: PMC3791421 Alternative Virology An essential fifth coding ORF in the sobemovir... �1,435.05
2120 23562481 The Boulevard Virology Metagenomic study of the viruses of African st... �2,421.96
2121 PMC3190389 BioMed Central Virology Journal Label-free quantitative proteomics reveals reg... �1,242.00
2122 23201205 Alternative Virus Research Prostratin exhibits both replication enhancing... �1,947.09
2123 pub Aug 2013 Alternative Vision Research Sensitivity to numerosity is not a unique visu... �1,456.18
2124 23200744 PMC3552157 Alternative Vision Research Perceptual learning of second order cues for l... �2,385.25
2125 PMC3472342\r\r Cambridge University Press Visual Neuroscience Masking within and across visual dimensions: P... �2,034.00
2126 PMCID: PMC3600532 Wiley-Blackwell Zoonoses and Public Health Ecology of zoonotic infectious diseases in bat... �2,272.15

2127 rows × 5 columns
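cluster_merge fixes one cluster per call. To normalize every cluster in a single pass, a minimal sketch groups the rows by fingerprint and maps each group to its most frequent value. merge_all_clusters is a hypothetical helper, not part of the original notebook, and the sample data below is made up for illustration. Key collisions can occasionally merge names that are genuinely different, so the groups deserve a manual review before the column is overwritten.

```python
import pandas as pd


def merge_all_clusters(df, col, fingerprint):
    """Return a copy of df where each value in df[col] is replaced by the most
    frequent value sharing its fingerprint. Missing values are left as they are."""
    out = df.copy()
    present = out[col].notna()
    values = out.loc[present, col]
    keys = values.map(fingerprint)                    # Fingerprint of each row
    # Most frequent spelling per fingerprint (ties go to whichever value value_counts lists first)
    canonical = values.groupby(keys).agg(lambda s: s.value_counts().idxmax())
    out.loc[present, col] = keys.map(canonical)       # Replace every row with its group's canonical value
    return out


publishers = pd.DataFrame({"Publisher": ["Elsevier", "ELSEVIER", "Elsevier", "Elsevier Ltd", "BMJ", None]})
merged = merge_all_clusters(publishers, "Publisher", lambda v: v.lower().replace(" ltd", "").strip())
print(merged["Publisher"].tolist())
```

With the notebook's objects this would be merge_all_clusters(fi, col, fingerprint_funk); fi has already had its missing values in col removed.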