├── Latex_template.zip ├── ProcessedData ├── CHEMPROT │ ├── dev.tsv │ ├── test.tsv │ ├── test_normal_final.tsv │ ├── test_overlapping_final.tsv │ └── train.tsv └── DDIExtraction2013 │ ├── dev.tsv │ ├── test.tsv │ ├── test_normal.tsv │ ├── test_overlapping.tsv │ └── train.tsv ├── README.md ├── Records └── record_76.56% │ ├── bert_config.json │ ├── eval_results.txt │ └── test_results.txt ├── Resources └── NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 │ ├── bert_config.json │ └── vocab.txt ├── SourceCode ├── BioRE.py ├── BioRE_BG.py ├── file_utils.py ├── modeling.py └── modeling_BG.py └── model.jpg /Latex_template.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CongSun-dlut/CPI_extraction/cda9f2b7f6fb757b999744cc9a95be205e2cc265/Latex_template.zip -------------------------------------------------------------------------------- /ProcessedData/DDIExtraction2013/test_normal.tsv: -------------------------------------------------------------------------------- 1 | index sentence label 2 | DDI-DrugBank.d572.s1.p0 The incremental LDL-C reduction due to adding @DRUG$ to @DRUG$ may be reduced by this interaction. DDI-effect 3 | DDI-DrugBank.d572.s10.p0 Patients who take both @DRUG$ and @DRUG$ should be carefully monitored. DDI-advise 4 | DDI-DrugBank.d572.s11.p0 Carcinogenesis, Mutagenesis, Impairment of Fertility A 104-week dietary carcinogenicity study with @DRUG$ was conducted in rats at doses up to 1500 mg/kg/day (males) and 500 mg/kg/day (females) (~20 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 5 | DDI-DrugBank.d572.s12.p0 A 104-week dietary carcinogenicity study with @DRUG$ was also conducted in mice at doses up to 500 mg/kg/day (>150 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 6 | DDI-DrugBank.d572.s17.p0 In oral (gavage) fertility studies of @DRUG$ conducted in rats, there was no evidence of reproductive toxicity at doses up to 1000 mg/kg/day in male or female rats (~7 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 7 | DDI-DrugBank.d572.s22.p0 In rabbits treated with @DRUG$, an increased incidence of extra thoracic ribs was observed at 1000 mg/kg/day (150 times the human exposure at 10 mg daily based on AUC0-24hr for total @DRUG$). DDI-false 8 | DDI-DrugBank.d572.s5.p0 Co-administration of @DRUG$ with @DRUG$ is not recommended until use in patients is studied. DDI-advise 9 | DDI-DrugBank.d573.s1.p0 If at all possible @DRUG$ should be discontinued well before @DRUG$ is begun. DDI-advise 10 | DDI-DrugBank.d575.s2.p0 Therefore, caution should be used in administering @DRUG$ (@DRUG$ injection) to patients receiving these other agents. DDI-false 11 | DDI-DrugBank.d575.s3.p0 There was no change in the plasma kinetics of @DRUG$ when coadministered with @DRUG$. DDI-false 12 | DDI-DrugBank.d575.s5.p0 An increase in intracellular levels of @DRUG$ was observed in vitro in the presence of @DRUG$. DDI-mechanism 13 | DDI-DrugBank.d576.s4.p0 @DRUG$ given concomitantly with @DRUG$ may predispose to systemic bleeding. DDI-effect 14 | DDI-DrugBank.d576.s5.p0 @DRUG$ may enhance the hypoglycemic effect of oral @DRUG$ of the sulfonylurea class. DDI-effect 15 | DDI-DrugBank.d577.s0.p0 In elderly patients concurrently receiving certain @DRUG$, primarily @DRUG$, an increased incidence of thrombopenia with purpura has been reported. DDI-false 16 | DDI-DrugBank.d577.s3.p0 @DRUG$ may inhibit the hepatic metabolism of @DRUG$. 
DDI-mechanism 17 | DDI-DrugBank.d577.s4.p0 At a 1.6-g dose, @DRUG$ produced a slight but significant increase in the half-life of @DRUG$ but did not produce a corresponding decrease in the metabolic clearance rate. DDI-mechanism 18 | DDI-DrugBank.d578.s1.p0 Mutual inhibition of metabolism occurs with concurrent use of @DRUG$ and @DRUG$; DDI-mechanism 19 | DDI-DrugBank.d578.s10.p0 The effect of @DRUG$ on oral @DRUG$ is variable. DDI-false 20 | DDI-DrugBank.d578.s11.p0 There are reports of enhanced as well as diminished effects of @DRUG$ when given concurrently with @DRUG$. DDI-effect 21 | DDI-DrugBank.d578.s3.p0 convulsions have been reported with concurrent use of @DRUG$ and @DRUG$. DDI-effect 22 | DDI-DrugBank.d578.s7.p0 @DRUG$ may increase the clearance of chronic high dose @DRUG$. DDI-mechanism 23 | DDI-DrugBank.d578.s9.p0 @DRUG$ should be used cautiously in conjunction with @DRUG$ in patients suffering from hypoprothrombinemia. DDI-advise 24 | DDI-DrugBank.d580.s1.p0 However, @DRUG$, a potent inhibitor of cytochrome P450 3A4, may increase plasma levels of @DRUG$ during concomitant dosing. DDI-mechanism 25 | DDI-DrugBank.d581.s2.p0 patients receiving @DRUG$ and @DRUG$ should have more frequent monitoring of neutrophil counts. DDI-advise 26 | DDI-DrugBank.d584.s5.p0 A 5.5-fold decrease in the mean @DRUG$ dose from 6.13 mg/day to 1.13 mg/day (approximately 80-85% reduction of @DRUG$ dose), was necessary to maintain a target INR of 1.5. DDI-false 27 | DDI-DrugBank.d584.s7.p0 Furthermore, in patients receiving both drugs, careful monitoring of the INR or PT, and adjustment of the @DRUG$ dosage if indicated are recommended when the @DRUG$ dose is changed or discontinued. DDI-advise 28 | DDI-DrugBank.d585.s1.p0 @DRUG$ antagonize the effects of @DRUG$. DDI-effect 29 | DDI-DrugBank.d585.s2.p0 @DRUG$ in the presence of increased intraocular pressure may be hazardous when taken concurrently with agents such as @DRUG$. DDI-effect 30 | DDI-DrugBank.d585.s3.p0 @DRUG$ may affect gastrointestinal absorption of various drugs, such as slowly dissolving dosage forms of @DRUG$; DDI-mechanism 31 | DDI-DrugBank.d585.s5.p0 @DRUG$ may antagonize the effects of drugs that alter gastrointestinal motility, such as @DRUG$. DDI-effect 32 | DDI-DrugBank.d585.s6.p0 Because @DRUG$ may interfere with the absorption of @DRUG$, simultaneous use of these drugs should be avoided. DDI-mechanism 33 | DDI-DrugBank.d588.s12.p0 @DRUG$: Efficacy may be decreased due to urinary alkalizing effect of @DRUG$. DDI-effect 34 | DDI-DrugBank.d588.s6.p0 @DRUG$ or @DRUG$: May increase the risk of hypokalemia and increase salt and water retention. DDI-false 35 | DDI-DrugBank.d588.s7.p0 @DRUG$: Serum @DRUG$ levels may increase. DDI-false 36 | DDI-DrugBank.d590.s1.p0 Coadministration with @DRUG$ results in a slight decrease in the rate of @DRUG$ absorption that is clinically unimportant. DDI-false 37 | DDI-DrugBank.d590.s4.p0 For information on drug interactions associated with @DRUG$, refer to the @DRUG$ package insert. DDI-false 38 | DDI-DrugBank.d591.s2.p0 In vivo, the plasma clearance of @DRUG$ was reduced by 70% during coadministration of @DRUG$ (25 mg bid for 2 days), a selective and potent CYP1A2 inhibitor. DDI-mechanism 39 | DDI-DrugBank.d591.s5.p0 Possible interactions with drugs known to be metabolized by CYP1A2 via competitive inhibition such as @DRUG$ and @DRUG$ may also occur. 
DDI-false 40 | DDI-DrugBank.d595.s2.p0 The addition of @DRUG$ to parenteral solutions containing @DRUG$ should be avoided, except where compatibility has been previously established. DDI-false 41 | DDI-DrugBank.d596.s4.p0 The elevation of prothrombin time induced by @DRUG$ is not affected by administration of @DRUG$. DDI-false 42 | DDI-DrugBank.d596.s5.p0 Effect of Other Drugs on the Metabolism of @DRUG$: Drugs that induce or inhibit CYP450 metabolism are not expected to alter the metabolism of @DRUG$. DDI-false 43 | DDI-DrugBank.d597.s1.p0 This interaction should be given consideration in patients taking @DRUG$ concomitantly with @DRUG$. DDI-advise 44 | DDI-DrugBank.d597.s13.p0 In vitro testing found no protein binding drug interaction between @DRUG$ and @DRUG$. DDI-false 45 | DDI-DrugBank.d597.s16.p0 Studies with @DRUG$ agents and @DRUG$ have not demonstrated a reduction in natriuretic effect. DDI-false 46 | DDI-DrugBank.d597.s17.p0 @DRUG$: single and multiple dose pharmacodynamics and pharmacokinetics are not affected by multiple doses of @DRUG$. DDI-false 47 | DDI-DrugBank.d597.s18.p0 Nevertheless, during concomitant therapy with @DRUG$ and @DRUG$, patients should be observed closely for signs of declining renal function, as well as to assure diuretic efficacy. DDI-advise 48 | DDI-DrugBank.d597.s22.p0 Patients on @DRUG$ treatment should be closely monitored when @DRUG$ is introduced or withdrawn. DDI-advise 49 | DDI-DrugBank.d597.s24.p0 @DRUG$ did not have a significant effect on the pharmacokinetics of single doses of @DRUG$. DDI-false 50 | DDI-DrugBank.d597.s25.p0 In vitro, @DRUG$ did not displace @DRUG$ from its human serum binding sites. DDI-false 51 | DDI-DrugBank.d597.s6.p0 @DRUG$ is not a substitute for @DRUG$ for cardiovascular prophylaxis. DDI-false 52 | DDI-DrugBank.d598.s0.p0 @DRUG$ may inhibit the hepatic metabolism of @DRUG$. DDI-mechanism 53 | DDI-DrugBank.d599.s4.p0 If @DRUG$ replacement is required, the @DRUG$ dose should be carefully adjusted. DDI-false 54 | DDI-DrugBank.d599.s6.p0 Limited published data indicate that @DRUG$ treatment increases cytochrome P450 (CP450) mediated @DRUG$ clearance in man. DDI-mechanism 55 | DDI-DrugBank.d602.s0.p0 Caution is recommended when administering @DRUG$ with compounds that are metabolized/eliminated predominantly by the UGT1A1 pathway (e.g. @DRUG$). DDI-advise 56 | DDI-DrugBank.d602.s1.p0 Concomitant treatment with @DRUG$ resulted in a 21% increase in the AUC of @DRUG$. DDI-mechanism 57 | DDI-DrugBank.d602.s2.p0 Caution is recommended when administering @DRUG$ with @DRUG$. DDI-advise 58 | DDI-DrugBank.d603.s0.p0 When administered concomitantly with @DRUG$ , @DRUG$ may enhance or precipitate bradycardia, A.V. DDI-effect 59 | DDI-DrugBank.d603.s3.p0 @DRUG$ has been used in patients concomitantly treated with salt-retaining steroid therapy (i.e., @DRUG$), with or without salt supplementation. DDI-false 60 | DDI-DrugBank.d604.s5.p0 @DRUG$ with @DRUG$ may reduce the effectiveness of the contraception and up to one month after discontinuation of these concomitant therapies. DDI-effect 61 | DDI-DrugBank.d608.s1.p0 However, increased prothrombin time and bleeding have been reported in patients on concomitant @DRUG$ and @DRUG$ therapy. DDI-effect 62 | DDI-DrugBank.d608.s2.p0 Therefore, caution should be exercised when administering @DRUG$ to patients on @DRUG$. DDI-advise 63 | DDI-DrugBank.d608.s4.p0 Caution should be used if @DRUG$ is administered concomitantly with @DRUG$. 
DDI-advise 64 | DDI-DrugBank.d610.s2.p0 @DRUG$ has no effect on the pharmacokinetic properties of @DRUG$. DDI-false 65 | DDI-DrugBank.d613.s0.p0 A study published in 2002 found that @DRUG$ causes a statistically significant increase in plasma clearance of @DRUG$. DDI-mechanism 66 | DDI-DrugBank.d616.s1.p0 Caution should be exercised when administering @DRUG$ therapy in combination with other @DRUG$. DDI-advise 67 | DDI-DrugBank.d619.s0.p0 The occurrence of stupor, muscular rigidity, severe agitation, and elevated temperature has been reported in some patients receiving the combination of @DRUG$ and @DRUG$. DDI-effect 68 | DDI-DrugBank.d619.s2.p0 This is typical of the interaction of @DRUG$ and @DRUG$. DDI-int 69 | DDI-DrugBank.d622.s1.p0 This is especially true if the total dose of @DRUG$ has been large and the administration of @DRUG$ has been delayed. DDI-false 70 | DDI-DrugBank.d625.s1.p0 Impaired renal function has been described in bone marrow transplant patients who were conditioned with high-dose intravenous @DRUG$ and who subsequently received @DRUG$ to prevent graft-versus-host disease DDI-effect 71 | DDI-DrugBank.d627.s0.p0 @DRUG$ should be administered with caution to patients taking @DRUG$ because of the possibility of conduction disturbances. DDI-advise 72 | DDI-DrugBank.d627.s3.p0 These effects should be considered when anticholinergic properties may be contributing to the therapeutic effect of concomitant medication (e.g., @DRUG$, inhaled @DRUG$). DDI-false 73 | DDI-DrugBank.d631.s2.p0 @DRUG$ (@DRUG$) should be administered until normal hematopoiesis is restored. DDI-false 74 | DDI-DrugBank.d631.s3.p0 Mild hepatotoxicity has been reported in some patients when @DRUG$ and @DRUG$ were administered concomitantly. DDI-effect 75 | DDI-DrugBank.d632.s3.p0 When the rapid onset of a concomitant orally administered agent is a critical determinant of effectiveness (such as @DRUG$), the agent should be administered at least 1 hour prior to or 2 hours after @DRUG$ injection. DDI-advise 76 | DDI-DrugBank.d632.s5.p0 No formal interaction studies have been performed to assess the effect of @DRUG$ on the kinetics of oral @DRUG$. DDI-false 77 | DDI-DrugBank.d632.s7.p0 Thus, @DRUG$ and @DRUG$ should not be mixed and must be administered separately. DDI-false 78 | DDI-DrugBank.d633.s0.p0 Since @DRUG$ is a substrate for the metabolic pathways involving CYP2D6 and CYP1A2 enzymes, inhibition or induction of either of these enzymes would be expected to alter @DRUG$ plasma concentrations. DDI-false 79 | DDI-DrugBank.d633.s1.p0 In a formal, single-dose interaction study (n = 6 males) the clearance of @DRUG$ was decreased by 38% following the coadministration of @DRUG$, an inhibitor of CYP1A2. DDI-mechanism 80 | DDI-DrugBank.d633.s11.p0 In a formal study, @DRUG$ were shown not to affect @DRUG$ plasma concentrations. DDI-false 81 | DDI-DrugBank.d633.s2.p0 In another formal study (n = 8 extensive and n = 7 poor metabolizers of CYP2D6), coadministration of @DRUG$ did not alter the kinetics of @DRUG$ in the poor CYP2D6 metabolizer group. DDI-false 82 | DDI-DrugBank.d633.s20.p0 Additionally, in one controlled study in five normal subjects and seven patients, the clearance of @DRUG$ was decreased 50% following the administration of @DRUG$ . DDI-mechanism 83 | DDI-DrugBank.d633.s4.p0 In this crossover steady state study, the pharmacokinetics of @DRUG$ were unaffected in either phenotype by the coadministration of @DRUG$. 
DDI-false 84 | DDI-DrugBank.d633.s6.p0 When concomitant administration of either of these two drugs with @DRUG$ is initiated, the dose of @DRUG$ should be slowly titrated to desired effect. DDI-false 85 | DDI-DrugBank.d634.s3.p0 @DRUG$: Serum digitalis levels may be increased when hyperthyroid patients on a stable @DRUG$ regimen become euthyroid; DDI-false 86 | DDI-DrugBank.d635.s0.p0 Other @DRUG$ should not be used concomitantly with @DRUG$ because they may have additive effects. DDI-advise 87 | DDI-DrugBank.d636.s0.p0 Some @DRUG$ may interact with @DRUG$. DDI-int 88 | DDI-DrugBank.d637.s1.p0 Use of @DRUG$ in conjunction with oral @DRUG$ may increase the risk of respiratory depression, hypotension and profound sedation or coma. DDI-effect 89 | DDI-DrugBank.d638.s0.p0 @DRUG$ may decrease the hypotensive effect of @DRUG$. DDI-effect 90 | DDI-DrugBank.d640.s0.p0 @DRUG$: Excessive reductions in blood pressure may occur in patients on diuretic therapy when @DRUG$ are started. DDI-false 91 | DDI-DrugBank.d640.s8.p0 If a @DRUG$ is also used, the risk of @DRUG$ toxicity may be increased. DDI-effect 92 | DDI-DrugBank.d641.s2.p0 Concomitant administration of substances that are also tubularly secreted (e.g., @DRUG$) could potentially result in delayed clearance of @DRUG$. DDI-mechanism 93 | DDI-DrugBank.d641.s4.p0 Patients with mild to moderate renal insufficiency should avoid taking @DRUG$ with short elimination half-lives for a period of 2 days before, the day of, and 2 days following administration of @DRUG$. DDI-advise 94 | DDI-DrugBank.d642.s0.p0 Prothrombin time or other suitable anticoagulation test should be monitored if @DRUG$ is administered with @DRUG$. DDI-advise 95 | DDI-DrugBank.d645.s4.p0 @DRUG$ decreases the clearance of @DRUG$ by 19%. DDI-mechanism 96 | DDI-DrugBank.d645.s5.p0 @DRUG$ increases the clearance of @DRUG$ by 15%. DDI-mechanism 97 | DDI-DrugBank.d645.s8.p0 @DRUG$ clearance is unaffected by @DRUG$. DDI-false 98 | DDI-DrugBank.d647.s1.p0 @DRUG$ has been reported to prolong the elimination half-life of @DRUG$ and may lead to severe bone marrow suppression; DDI-mechanism 99 | DDI-DrugBank.d647.s2.p0 a reduction of the @DRUG$ dosage should be considered in patients receiving @DRUG$ concurrently. DDI-advise 100 | DDI-DrugBank.d648.s3.p0 If @DRUG$ is given concurrently with @DRUG$, prothrombin time should be monitored. DDI-advise 101 | DDI-DrugBank.d650.s2.p0 Because @DRUG$ is approximately 90% bound to plasma proteins, caution should be exercised if @DRUG$ is coadministered with other drugs known to affect protein binding. DDI-false 102 | DDI-DrugBank.d651.s0.p0 Drug Interactions: Women on oral @DRUG$ have shown a significant increase in plasma @DRUG$ levels. DDI-mechanism 103 | DDI-DrugBank.d652.s2.p0 The finding that @DRUG$ releases catecholamines in patients with essential hypertension suggests that it should be used cautiously, if at all, in patients receiving @DRUG$. DDI-advise 104 | DDI-DrugBank.d652.s6.p0 Because the action of @DRUG$ will influence the delivery of food to the intestines and thus the rate of absorption, @DRUG$ dosage or timing of dosage may require adjustment. DDI-advise 105 | DDI-DrugBank.d653.s4.p0 @DRUG$ may increase the clearance of chronic high dose @DRUG$. DDI-mechanism 106 | DDI-DrugBank.d653.s6.p0 @DRUG$ should be used cautiously in conjunction with @DRUG$ in patients suffering from hypoprothrombinemia. DDI-advise 107 | DDI-DrugBank.d653.s7.p0 The effect of @DRUG$ on oral @DRUG$ is variable. 
DDI-effect 108 | DDI-DrugBank.d653.s8.p0 There are reports of enhanced as well as diminished effects of @DRUG$ when given concurrently with @DRUG$. DDI-effect 109 | DDI-DrugBank.d654.s2.p0 Based on in vitro P450 inhibition studies and the low systemic exposure observed following topical application of @DRUG$, @DRUG$ is unlikely to affect the metabolism of other P450 substrates. DDI-false 110 | DDI-DrugBank.d656.s3.p0 It is recommended that plasma @DRUG$ levels be monitored when initiating, adjusting and discontinuing @DRUG$. DDI-advise 111 | DDI-DrugBank.d658.s0.p0 When administered concurrently, @DRUG$ may increase the effects of oral @DRUG$; DDI-effect 112 | DDI-DrugBank.d661.s2.p0 The median duration of action of @DRUG$ 0.6 mg/kg administered after a 1 mg/kg dose of @DRUG$ when T 1 returned to 75% of control was 36 minutes (range 14-57, n=12) vs. DDI-false 113 | DDI-DrugBank.d661.s4.p0 There are no controlled studies documenting the use of @DRUG$ before or after other @DRUG$. DDI-false 114 | DDI-DrugBank.d665.s7.p0 Displacement of @DRUG$ from plasma protein binding sites will elevate the level of free @DRUG$ in the serum. DDI-false 115 | DDI-DrugBank.d666.s0.p0 Catecholamine-depleting drugs (e.g., @DRUG$) may have an additive effect when given with @DRUG$. DDI-effect 116 | DDI-DrugBank.d666.s3.p0 @DRUG$ has been shown to increase serum @DRUG$ levels when both drugs are co-administered. DDI-mechanism 117 | DDI-DrugBank.d668.s1.p0 Furthermore, @DRUG$, @DRUG$, St. DDI-false 118 | DDI-DrugBank.d669.s0.p0 Concomitant administration of @DRUG$ and @DRUG$ has been associated with erythema and histamine-like flushing and anaphylactoid reactions. DDI-effect 119 | DDI-DrugBank.d670.s0.p0 Potential for @DRUG$ to Affect Other Drugs @DRUG$ is not expected to cause clinically important pharmacokinetic interactions with drugs that are metabolized by cytochrome P450 isozymes. DDI-false 120 | DDI-DrugBank.d670.s9.p0 Potential for Other Drugs to Affect @DRUG$ @DRUG$ is not a substrate of CYP1A2, CYP2A6, CYP2C9, and CYP2C19, so that an interaction with inhibitors or inducers of these isozymes is unlikely. DDI-false 121 | DDI-DrugBank.d671.s0.p0 The interaction of @DRUG$ (@DRUG$) with other drugs has not been studied in humans. DDI-false 122 | DDI-DrugBank.d671.s2.p0 Decreased seizure threshold has been reported in patients receiving @DRUG$ concomitantly with @DRUG$. DDI-effect 123 | DDI-DrugBank.d676.s4.p0 No cross-resistance with other @DRUG$, radiotherapy or @DRUG$ has been demonstrated. DDI-false 124 | DDI-DrugBank.d677.s1.p0 as well as @DRUG$ and @DRUG$. DDI-false 125 | DDI-DrugBank.d678.s1.p0 The metabolism of @DRUG$ is accelerated by @DRUG$; DDI-mechanism 126 | DDI-DrugBank.d680.s3.p0 Therefore, when coadministered with @DRUG$, adjustment of the dosage of @DRUG$ or of such drugs may not be necessary. DDI-false 127 | DDI-DrugBank.d680.s7.p0 Patients treated with @DRUG$ and @DRUG$ concomitantly should be monitored for increases in INR and prothrombin time. DDI-advise 128 | DDI-DrugBank.d686.s0.p0 @DRUG$ may compete with other drugs, such as @DRUG$, for sites of metabolism in the liver, thus elevating the serum levels of such compounds to potentially toxic levels. DDI-mechanism 129 | DDI-DrugBank.d686.s1.p0 Therefore, when concomitant use of @DRUG$ and @DRUG$ is anticipated, it may be necessary to monitor blood levels and/or reduce the dosage of such compounds. DDI-advise 130 | DDI-DrugBank.d687.s0.p0 Administration of @DRUG$ decreases oral clearance of @DRUG$ by about 5%. 
DDI-mechanism 131 | DDI-DrugBank.d688.s2.p0 Administration of a higher dose of @DRUG$ should be considered when coadministering with @DRUG$. DDI-advise 132 | DDI-DrugBank.d689.s3.p0 The pressor response of @DRUG$ may also be potentiated by @DRUG$. DDI-effect 133 | DDI-DrugBank.d693.s1.p0 When daily doses of @DRUG$ 2 g and weekly doses of @DRUG$ 7.5 mg were coadministered to 15 rheumatoid arthritis patients in a drug-drug interaction study, the pharmacokinetic disposition of the drugs was not altered. DDI-false 134 | DDI-DrugBank.d693.s2.p0 Daily doses of @DRUG$ 2 g (maximum 3 g) and weekly doses of @DRUG$ 7.5 mg (maximum 15 mg) were administered alone or in combination to 310 rheumatoid arthritis patients in two controlled 52-week clinical studies. DDI-false 135 | DDI-DrugBank.d694.s2.p0 No dosage adjustment of @DRUG$ or @DRUG$ is recommended. DDI-false 136 | DDI-DrugBank.d696.s0.p0 @DRUG$ may reverse the analgesic activity of @DRUG$. DDI-effect 137 | DDI-DrugBank.d698.s1.p0 @DRUG$ - concurrent use decreases gastrointestinal absorption of @DRUG$; DDI-mechanism 138 | DDI-DrugBank.d698.s2.p0 requirements for @DRUG$ may be increased in patients receiving @DRUG$. DDI-effect 139 | DDI-DrugBank.d700.s1.p0 It is recommended that @DRUG$ and @DRUG$ not be administered simultaneously. DDI-advise 140 | DDI-DrugBank.d700.s10.p0 However, in a drug-drug interaction study, mean @DRUG$ AUC was decreased by 15% when coadministered with @DRUG$. DDI-mechanism 141 | DDI-DrugBank.d700.s11.p0 Therefore, it is recommended that oral @DRUG$ are co- administered with @DRUG$ with caution and additional birth control methods be considered. DDI-advise 142 | DDI-DrugBank.d700.s16.p0 Interference of @DRUG$ hydrolysis may lead to less @DRUG$ available for absorption. DDI-false 143 | DDI-DrugBank.d704.s0.p0 Drug-Drug Interactions: The pharmacokinetic and pharmacodynamic interactions between @DRUG$ capsules and other @DRUG$ have not been determined. DDI-false 144 | DDI-DrugBank.d704.s1.p0 However, interactions may be expected and @DRUG$ capsules should NOT be used in combination with other @DRUG$. DDI-advise 145 | DDI-DrugBank.d704.s2.p0 The pharmacokinetic interaction between @DRUG$ and @DRUG$ capsules was investigated. DDI-false 146 | DDI-DrugBank.d704.s4.p0 Therefore, @DRUG$ capsules should be used with caution in combination with @DRUG$, particularly at doses higher than 0.4 mg. DDI-advise 147 | DDI-DrugBank.d704.s5.p0 Results from limited in vitro and in vivo drug-drug interaction studies between @DRUG$ and @DRUG$ are inconclusive. DDI-false 148 | DDI-DrugBank.d704.s6.p0 Therefore, caution should be exercised with concomitant administration of @DRUG$ and @DRUG$ capsules. DDI-advise 149 | DDI-DrugBank.d706.s0.p0 Catecholamine-depleting drugs (e.g., @DRUG$) may have an additive effect when given with @DRUG$. DDI-effect 150 | DDI-DrugBank.d709.s5.p0 Although such a risk is not verified for @DRUG$, combination of @DRUG$ with such drugs is not recommended. DDI-false 151 | DDI-DrugBank.d710.s2.p0 @DRUG$: @DRUG$, 100 mg, was given twice daily for 13 days to 12 healthy volunteers. DDI-false 152 | DDI-DrugBank.d710.s5.p0 Coadministration with @DRUG$ reduced the peak concentration and AUC values for @DRUG$ by approximately 11%. DDI-mechanism 153 | DDI-DrugBank.d710.s6.p0 @DRUG$: @DRUG$, 100 mg, was given twice daily fro 13 days to 12 healthy volunteers. DDI-false 154 | DDI-DrugBank.d710.s9.p0 Peak plasma concentrations and AUC of @DRUG$ were reduced approximately 10% in the presence of @DRUG$. 
DDI-mechanism 155 | DDI-DrugBank.d711.s3.p0 The concurrent use of @DRUG$ and @DRUG$ has been reported to result in fatal renal toxicity. DDI-effect 156 | DDI-DrugBank.d713.s0.p0 The concomitant use of @DRUG$ with other @DRUG$ that produce dry mouth, constipation, and other anticholinergic pharmacological effects may increase the frequency and/or severity of such effects. DDI-effect 157 | DDI-DrugBank.d713.s3.p0 Coadministration of @DRUG$ with drugs that are eliminated by active renal tubular secretion may increase the serum concentration of @DRUG$ and/or the coadministered drug due to competition for this elimination pathway. DDI-false 158 | DDI-DrugBank.d716.s1.p0 Changes in @DRUG$ and other diabetes drug therapies may be necessary during treatment with @DRUG$. DDI-advise 159 | DDI-DrugBank.d720.s2.p0 @DRUG$ reduce @DRUG$ clearance, significantly increasing systemic exposure. DDI-mechanism 160 | DDI-DrugBank.d720.s4.p0 If concomitant treatment with @DRUG$ and an @DRUG$ is clinically warranted, appropriate observation of the patient is advised. DDI-advise 161 | DDI-DrugBank.d721.s3.p0 The pharmacokinetic parameters of @DRUG$ are not significantly modified by @DRUG$ coadministration. DDI-false 162 | DDI-DrugBank.d721.s4.p0 In vitro studies show that @DRUG$ does not displace @DRUG$ from its binding site on protein. DDI-false 163 | DDI-DrugBank.d725.s3.p0 The pressor response of @DRUG$ may also be potentiated by @DRUG$. DDI-effect 164 | DDI-DrugBank.d726.s2.p0 Dose adjustment of @DRUG$ Capsules may be required, and iPTH and serum calcium concentrations should be closely monitored if a patient initiates or discontinues therapy with a strong CYP3A4 inhibitor such as @DRUG$. DDI-advise 165 | DDI-DrugBank.d727.s0.p0 @DRUG$ competitively inhibits the intracellular phosphorylation of @DRUG$. DDI-effect 166 | DDI-DrugBank.d727.s1.p0 Therefore, use of @DRUG$ in combination with @DRUG$ should be avoided. DDI-advise 167 | DDI-DrugBank.d728.s1.p0 No pharmacokinetic interaction between 85 mg/m2 @DRUG$ and infusional @DRUG$ has been observed in patients treated every 2 weeks. DDI-false 168 | DDI-DrugBank.d728.s2.p0 Increases of @DRUG$ plasma concentrations by approximately 20% have been observed with doses of 130 mg/m2 @DRUG$ dosed every 3 weeks. DDI-mechanism 169 | DDI-DrugBank.d732.s0.p0 @DRUG$ is not known to interact with other drugs including @DRUG$ supplements; DDI-false 170 | DDI-DrugBank.d732.s2.p0 Concomitant administration of @DRUG$ with other chelation therapy, such as @DRUG$ is not recommended. DDI-advise 171 | DDI-DrugBank.d734.s0.p0 @DRUG$ may accentuate the orthostatic hypotension that may occur with @DRUG$. DDI-effect 172 | DDI-DrugBank.d734.s1.p0 Antihypertensive effects of @DRUG$ and related compounds may be counteracted when @DRUG$ are used concomitantly. DDI-effect 173 | DDI-DrugBank.d734.s2.p0 Concomitant administration of @DRUG$ with @DRUG$ results in increased plasma levels of both drugs. DDI-mechanism 174 | DDI-DrugBank.d736.s1.p0 Hypokalemia may develop during concomitant use of @DRUG$ or @DRUG$. DDI-false 175 | DDI-DrugBank.d736.s3.p0 @DRUG$ may decrease arterial responsiveness to @DRUG$. DDI-effect 176 | DDI-DrugBank.d736.s5.p0 @DRUG$ may increase the responsiveness of @DRUG$. DDI-effect 177 | DDI-DrugBank.d736.s7.p0 @DRUG$ may add to or potentiate the action of other @DRUG$. DDI-effect 178 | DDI-DrugBank.d736.s8.p0 Potentiation occurs with @DRUG-DRUG$. 
DDI-false 179 | DDI-DrugBank.d737.s7.p0 CYP interactions: Inhibitors of cytochrome P450 enzymes would not be expected to affect @DRUG$ elimination because @DRUG$ is not appreciably metabolized by these enzymes in vivo or in vitro. DDI-false 180 | DDI-DrugBank.d738.s0.p0 Do not exceed a 5 mg daily dose of @DRUG$ when administered with therapeutic doses of @DRUG$ or other potent CYP3A4 inhibitors. DDI-advise 181 | DDI-DrugBank.d739.s0.p0 @DRUG$ at a Dose of 10 mg Once Daily Dosed to Pharmacokinetic Steady State - did not cause clinically significant changes in the kinetics of a single intravenous dose of @DRUG$ (predominantly a cytochrome P450 1A2 substrate). DDI-false 182 | DDI-DrugBank.d739.s1.p0 - did not change the pharmacokinetic profile of @DRUG$ (a substrate of cytochromes P450 2A6 and 2C9) or influence the effect of a single 30-mg oral dose of @DRUG$ on prothrombin time or the INR (International Normalized Ratio). DDI-false 183 | DDI-DrugBank.d742.s10.p0 However, the impairment of motor skills produced by @DRUG$ has been shown to be additive with those caused by @DRUG$. DDI-effect 184 | DDI-DrugBank.d742.s11.p0 Accordingly, patients should be advised to avoid @DRUG$ and other similar drugs while taking @DRUG$. DDI-advise 185 | DDI-DrugBank.d742.s2.p0 Drugs Affecting Hepatic Metabolism The metabolism and pharmacokinetics of @DRUG$ (@DRUG$) Orally Disintegrating Tablets may be affected by the induction or inhibition of drug-metab-olizing enzymes. DDI-false 186 | DDI-DrugBank.d742.s8.p0 However, the impairment of cognitive and motor skills produced by @DRUG$ were shown to be additive with those produced by @DRUG$. DDI-effect 187 | DDI-DrugBank.d743.s11.p0 A few spontaneous accounts of QT interval prolongation with ventricular arrhythmia including torsades de pointes, have been reported in patients receiving @DRUG$ or @DRUG$. DDI-false 188 | DDI-DrugBank.d743.s13.p0 Studies to evaluate potential interactions of @DRUG$ with @DRUG$ are in progress. DDI-false 189 | DDI-DrugBank.d743.s4.p0 Concomitant administration of @DRUG$ and @DRUG$ is contraindicated. DDI-advise 190 | DDI-DrugBank.d743.s7.p0 Concomitant administration of @DRUG$ and @DRUG$ is contraindicated. DDI-advise 191 | DDI-DrugBank.d750.s0.p0 Concomitant use of @DRUG$ with @DRUG$ may result in an adverse drug interaction. DDI-int 192 | DDI-DrugBank.d752.s9.p0 The difficulty in achieving adequate sedation may have been the result of decreased absorption of the @DRUG$ due to both the gastrointestinal effects and stimulant effects of @DRUG$. DDI-false 193 | DDI-DrugBank.d753.s2.p0 @DRUG$ or other medications with anticholinergic activity when used concurrently with @DRUG$ may result in increased risk of urinary retention and/or severe constipation, which may lead to paralytic ileus. DDI-effect 194 | DDI-DrugBank.d753.s3.p0 It has been reported that the incidence of bradycardia was increased when @DRUG$ was combined with @DRUG$ for induction of anesthesia. DDI-effect 195 | DDI-DrugBank.d753.s4.p0 In addition, CNS toxicity has been reported (confusion, disorientation, respiratory depression, apnea, seizures) following coadministration of @DRUG$ with @DRUG$; DDI-effect 196 | DDI-DrugBank.d756.s1.p0 Combination with @DRUG$ (@DRUG$) is associated with increased risk of seizures. DDI-false 197 | DDI-DrugBank.d758.s0.p0 The CNS depressant effects of @DRUG$ may be additive with that of other @DRUG$.. DDI-effect 198 | DDI-DrugBank.d759.s0.p0 Limited evidence suggests that @DRUG$ may influence the intensity and duration of action of @DRUG$. 
DDI-mechanism 199 | DDI-DrugBank.d761.s12.p0 Although undercarboxylated osteocalcin, a marker of vitamin K nutritional status, was unaltered with @DRUG$ administration, vitamin K levels tended to decline in subjects taking @DRUG$. DDI-false 200 | DDI-DrugBank.d761.s4.p0 @DRUG$ inhibited absorption of a @DRUG$ supplement by approximately 60%. DDI-mechanism 201 | DDI-DrugBank.d762.s2.p0 Therefore, the coadministration of @DRUG$ with @DRUG$ is not recommended. DDI-advise 202 | DDI-DrugBank.d763.s1.p0 Because @DRUG$ is metabolized by hepatic cytochrome P-450 drug-metabolizing enzymes (CYP3A4, CYP2D6, CYP1A2), inducers or inhibitors of these enzymes may change the clearance and, hence, the half-life of @DRUG$. DDI-false 203 | DDI-DrugBank.d763.s7.p0 @DRUG$ did not increase blood levels of high-dose @DRUG$. DDI-false 204 | DDI-DrugBank.d765.s0.p0 CYP3A4 Inhibitors: @DRUG$, an inhibitor of the drug metabolizing enzyme CYP3A4, significantly increased plasma concentrations of @DRUG$ when coadministered to subjects who were poor metabolizers (see CLINICAL PHARMACOLOGY, Variability in Metabolism and Drug-Drug Interactions). DDI-mechanism 205 | DDI-DrugBank.d766.s0.p0 Since @DRUG$ is metabolized by a P450 enzyme system, drugs that induce or inhibit this drug metabolizing enzyme system may elicit important drug-drug interactions that may alter @DRUG$ plasma concentrations. DDI-false 206 | DDI-DrugBank.d769.s1.p0 Physicians should carefully monitor PT and INR in patients concurrently administered @DRUG$ and @DRUG$. DDI-advise 207 | DDI-DrugBank.d770.s4.p0 however, in patients with Paget's Disease prior @DRUG$ use appears to reduce the anti-resorptive response to @DRUG$ nasal spray. DDI-effect 208 | DDI-DrugBank.d775.s1.p0 Prior administration of @DRUG$ can potentiate the neuromuscular blocking effects of @DRUG$. DDI-effect 209 | DDI-DrugBank.d775.s2.p0 Evidence of spontaneous recovery from @DRUG$ should be observed before the administration of @DRUG$. DDI-advise 210 | DDI-DrugBank.d775.s4.p0 There are no clinical data on the use of @DRUG$ with other @DRUG$. DDI-false 211 | DDI-DrugBank.d776.s17.p0 Depending on clinical circumstances, consideration should be given to increasing the dose of @DRUG$ when it is administered with @DRUG$. DDI-advise 212 | DDI-DrugBank.d776.s23.p0 The combined use of @DRUG$ at doses of 400 mg or greater with @DRUG$ is contraindicated. DDI-advise 213 | DDI-DrugBank.d776.s24.p0 The coadministration of @DRUG$ at doses lower than 400 mg/day with @DRUG$ should be carefully monitored. DDI-advise 214 | DDI-DrugBank.d776.s27.p0 The combined use of @DRUG$ with @DRUG$ is contraindicated. DDI-advise 215 | DDI-DrugBank.d776.s3.p0 one fatality has been reported from hypoglycemia in association with combined @DRUG$ and @DRUG$ use. DDI-effect 216 | DDI-DrugBank.d776.s32.p0 Patients receiving @DRUG$ and @DRUG$ concomitantly should be carefully monitored. DDI-advise 217 | DDI-DrugBank.d776.s34.p0 Patients receiving @DRUG$ and @DRUG$ concomitantly should be carefully monitored. DDI-advise 218 | DDI-DrugBank.d776.s39.p0 however, in some patients there were decreases up to 47% and 33% of @DRUG$ and @DRUG$ levels. DDI-false 219 | DDI-DrugBank.d776.s8.p0 Careful monitoring of prothrombin time in patients receiving @DRUG$ and @DRUG$ is recommended. DDI-advise 220 | DDI-DrugBank.d777.s0.p0 The interaction of @DRUG$, @DRUG$, with other drugs has not been well studied. 
DDI-false 221 | DDI-DrugBank.d777.s3.p0 In the treatment of acute MI, @DRUG$, when not otherwise contraindicated, should be administered with @DRUG$ ( see below ). DDI-false 222 | DDI-DrugBank.d777.s4.p0 Anticoagulation and @DRUG$ After Treatment for Myocardial Infarction -- In the treatment of acute myocardial infarction, the use of @DRUG$ has been shown to reduce the incidence of reinfarction and stroke. DDI-false 223 | DDI-DrugBank.d777.s5.p0 The addition of @DRUG$ to @DRUG$ causes a minimal increase in the risk of minor bleeding (3.9% vs. 3.1%), but does not appear to increase the incidence of major bleeding (see DDI-effect 224 | DDI-DrugBank.d779.s0.p0 When @DRUG$ is used with other @DRUG$, potentiation of antihypertensive effect may occur. DDI-effect 225 | DDI-DrugBank.d779.s2.p0 Patients may require reduced doses of @DRUG$ when on @DRUG$. DDI-advise 226 | DDI-DrugBank.d781.s1.p0 The possibility of hypotensive effects can be minimized by either discontinuing the @DRUG$ or increasing the salt intake prior to initiation of treatment with @DRUG$. DDI-effect 227 | DDI-DrugBank.d781.s10.p0 Use of a @DRUG$ may further increase the risk of @DRUG$ toxicity. DDI-false 228 | DDI-DrugBank.d781.s15.p0 Food Interaction: Oral administration of @DRUG$ Tablets with food does not significantly lower the rate or extent of @DRUG$ absorption relative to the fasted state. DDI-false 229 | DDI-DrugBank.d781.s16.p0 However, the extent of biotransformation of @DRUG$ to the active metabolite, @DRUG$, is reduced approximately 43%, resulting in a reduction in the plasma ACE inhibition curve of approximately 20%, probably clinically insignificant. DDI-false 230 | DDI-DrugBank.d781.s2.p0 If @DRUG$ cannot be interrupted, close medical supervision should be provided with the first dose of @DRUG$ Tablets, for at least two hours and until blood pressure has stabilized for another hour. DDI-advise 231 | DDI-DrugBank.d781.s3.p0 The rate and extent of @DRUG$ absorption and elimination are not affected by concomitant @DRUG$. DDI-false 232 | DDI-DrugBank.d781.s4.p0 The bioavailability of @DRUG$ was reduced by @DRUG$, however, and this was associated with a decrease in plasma ACE inhibition. DDI-false 233 | DDI-DrugBank.d784.s0.p0 The concomitant use of @DRUG$ with other @DRUG$ or with other agents which produce dry mouth, constipation, somnolence (drowsiness), and/or other anticholinergic-like effects may increase the frequency and/or severity of such effects. DDI-effect 234 | DDI-MedLine.d143.s0.p0 [Interaction between @DRUG$ and @DRUG$]. DDI-false 235 | DDI-MedLine.d143.s1.p0 The drug interaction between @DRUG$ and @DRUG$ has been the subject of much study in recent years. DDI-false 236 | DDI-MedLine.d143.s2.p0 Contradictory results regarding the effect of @DRUG$ on platelet reactivity and on clinical outcome in @DRUG$-treated patients have been reported in literature. DDI-false 237 | DDI-MedLine.d143.s4.p0 @DRUG$ has a much weaker effect on @DRUG$'s pharmacokinetics and on platelet reactivity during concomitant use. DDI-mechanism 238 | DDI-MedLine.d143.s5.p0 The influence of the other @DRUG$ when used simultaneously with @DRUG$ has not yet been investigated in adequately randomized studies. DDI-false 239 | DDI-MedLine.d143.s7.p0 To date, there is no conclusive evidence of a clinically-relevant interaction between any of the @DRUG$ and @DRUG$. DDI-false 240 | DDI-MedLine.d147.s0.p0 Interaction of @DRUG$ and @DRUG$, in vitro and in vivo, in human A375 melanoma cells. 
DDI-false 241 | DDI-MedLine.d147.s2.p0 The effect of @DRUG$ and @DRUG$, alone and in combination, was evaluated for growth inhibition (MTT), radiolabeled drug uptake, cellular thiol content (HPLC), and DNA strand breaks (Comet assay). DDI-false 242 | DDI-MedLine.d147.s6.p0 There was a >75% reduction in cellular glutathione and cysteine with @DRUG$ but not @DRUG$. DDI-false 243 | DDI-MedLine.d147.s8.p0 @DRUG$ and @DRUG$ show additive effects in vitro but not in vivo in human A375 melanoma cells. DDI-effect 244 | DDI-MedLine.d154.s0.p0 Enhancement of humoral immune responses to inactivated Newcastle disease and @DRUG$ by oral administration of @DRUG$ in chickens. DDI-false 245 | DDI-MedLine.d154.s3.p0 In experiment 1, oral administration of @DRUG$ at a dose of 5 mg/kg of BW for 7 d on the immune response in chickens intramuscularly injected with @DRUG$ was evaluated. DDI-false 246 | DDI-MedLine.d154.s5.p0 In experiment 2, the same regimen of @DRUG$ was administered to chickens inoculated with @DRUG$, and an enhanced serum antibody response to AI vaccination was also observed. DDI-effect 247 | DDI-MedLine.d154.s6.p0 Considering the safety of @DRUG$, because no adverse effect was found throughout the experiments, @DRUG$ may be a promising oral adjuvant to improve immunization in poultry. DDI-false 248 | DDI-MedLine.d159.s0.p0 Protective effect of @DRUG$ and @DRUG$ against the acute toxicity of diepoxybutane to human lymphocytes. DDI-false 249 | DDI-MedLine.d159.s6.p0 (ii) @DRUG$ elicits a significant protective effect on DEB induced toxicity, which was potentiated by @DRUG$. DDI-effect 250 | DDI-MedLine.d161.s3.p0 In the present study, we tested whether the highly specific metabotropic glutamate receptor 5 antagonist, @DRUG$, reduces dopaminergic and noradrenergic neuronal loss in monkeys rendered parkinsonian by chronic treatment with low doses of @DRUG$. DDI-false 251 | DDI-MedLine.d161.s7.p0 The @DRUG$ treatment also had a significant effect on the @DRUG$-induced loss of norepinephrine neurons in the locus coeruleus and adjoining A5 and A7 noradrenaline cell groups. DDI-effect 252 | DDI-MedLine.d161.s9.p0 Our data demonstrate that chronic treatment with the metabotropic glutamate receptor 5 antagonist, @DRUG$, significantly reduces @DRUG$ toxicity towards dopaminergic and noradrenergic cell groups in non-human primates. DDI-effect 253 | DDI-MedLine.d163.s4.p0 The susceptibility results for @DRUG$ and @DRUG$ were interpreted according to the most relevant criteria. DDI-false 254 | DDI-MedLine.d163.s7.p0 The combination of @DRUG$ and @DRUG$ can be synergistic against MRSA. DDI-effect 255 | DDI-MedLine.d164.s0.p0 @DRUG$ enhances @DRUG$-mediated antibody-dependent cellular cytotoxicity by up-regulation of cell surface HER2 expression. DDI-effect 256 | DDI-MedLine.d164.s3.p0 In an in vitro assay, @DRUG$ induced HER2 expression at the cell surface of HER2-positive breast cancer cell lines, leading to the enhancement of @DRUG$-mediated ADCC. DDI-effect 257 | DDI-MedLine.d164.s4.p0 Furthermore, we present a case report in which a second @DRUG$ treatment following @DRUG$ resulted in the marked shrinkage of multiple metastatic tumors in HER2-positive breast cancer. DDI-effect 258 | DDI-MedLine.d166.s0.p0 Improved parathyroid hormone control by @DRUG$ is associated with reduction in @DRUG$ requirement in patients with end-stage renal disease. DDI-false 259 | DDI-MedLine.d166.s1.p0 Uncontrolled hy-per-parathyroidism causes bone marrow fibrosis, leading to @DRUG$ (@DRUG$) resistance. 
DDI-false 260 | DDI-MedLine.d166.s17.p0 Reduction of PTH by @DRUG$ is associated with a decrease in @DRUG$ requirement. DDI-effect 261 | DDI-MedLine.d166.s2.p0 Medical treatment with @DRUG$ is effective in reducing plasma parathyroid hormone (PTH) levels, but its effect on @DRUG$ dosing is unknown. DDI-false 262 | DDI-MedLine.d166.s6.p0 The primary objective of the study was to ascertain the difference in @DRUG$ responsiveness before and after 12 months of @DRUG$ therapy. DDI-false 263 | DDI-MedLine.d169.s6.p0 We also evaluated the effects of @DRUG$ doses between 200 and 800 mg on absorption of 5 mg @DRUG$ [as concentrated RBC (CRBC)]. DDI-false 264 | DDI-MedLine.d169.s8.p0 @DRUG$ doses 1000 mg diminished @DRUG$ absorption by an average of 49.6%. DDI-mechanism 265 | DDI-MedLine.d169.s9.p0 A @DRUG$ dose of 800 mg diminished absorption of 5 mg @DRUG$ by 37.7%. DDI-mechanism 266 | DDI-MedLine.d179.s0.p0 Synergistic interaction between @DRUG$ and @DRUG$ is sequence dependent in human non-small lung cancer with EGFR TKIs-resistant mutation. DDI-int 267 | DDI-MedLine.d179.s3.p0 NSCLC cell lines with EGFR T790M mutation and K-ras mutation were exposed to either @DRUG$ or @DRUG$ or both based on various sequential administrations. DDI-false 268 | DDI-MedLine.d179.s5.p0 @DRUG$ exhibited dose-dependent growth inhibition in NSCLC cell lines and arrested cell cycle at G1 phase, whereas @DRUG$ arrested at S phase. DDI-false 269 | DDI-MedLine.d179.s6.p0 Although single or concurrent use of @DRUG$ and @DRUG$ has some anti-proliferative effects, the sequential administrations of both drugs remarkably enhanced anti-tumor activity. DDI-effect 270 | DDI-MedLine.d179.s7.p0 When cells were exposed to @DRUG$ followed by @DRUG$, synergism was observed. DDI-effect 271 | DDI-MedLine.d179.s8.p0 The molecular basis of this synergism is that the signaling pathways that were initially activated by @DRUG$ exposure were efficiently suppressed by the subsequent exposure to @DRUG$. DDI-effect 272 | DDI-MedLine.d180.s1.p0 The @DRUG$ (@DRUG$), one of S-adenosylhomocysteine (AdoHcy) hydrolase inhibitors, has shown antitumor activities in a broad range of solid tumors and acute myeloid leukemia. DDI-false 273 | DDI-MedLine.d181.s4.p0 The authors report the case of an infant with confirmed congenital hypothyroidism on @DRUG$ who experienced a possible drug interaction with @DRUG$. DDI-int 274 | DDI-MedLine.d181.s6.p0 Questioning revealed the child was taking @DRUG$ drops before feeds while on @DRUG$. DDI-false 275 | DDI-MedLine.d181.s8.p0 Drug interaction of @DRUG$ with @DRUG$ has not been reported previously and is not listed in the British National Formulary for Children. DDI-false 276 | DDI-MedLine.d184.s0.p0 @DRUG$ locomotor sensitization and conditioned place preference in adolescent male and female rats neonatally treated with @DRUG$. DDI-false 277 | DDI-MedLine.d184.s2.p0 The objective was to analyze the effects of neonatal @DRUG$ treatment on effects of @DRUG$ in adolescent rats using locomotor sensitization and conditioned place preference procedures. DDI-false 278 | DDI-MedLine.d184.s6.p0 Male rats demonstrated sensitization to @DRUG$, although this was muted compared with female rats, and were unaffected by neonatal @DRUG$. DDI-false 279 | DDI-MedLine.d185.s12.p0 Co-administration of @DRUG$ with CYP3A inhibitors is feasible as the observed increase in @DRUG$ PK parameters was not considered clinically relevant. 
DDI-false 280 | DDI-MedLine.d185.s13.p0 Considering the variability in exposure following enzyme inhibition and the fact that chronic dosing of @DRUG$ was not studied with CYP3A inhibitors, close monitoring of @DRUG$-related adverse events is necessary. DDI-false 281 | DDI-MedLine.d185.s2.p0 This study evaluated the effect of a potent CYP3A inhibitor, @DRUG$, on the pharmacokinetics and safety of @DRUG$. DDI-false 282 | DDI-MedLine.d185.s5.p0 On day 8, a single @DRUG$ dose was co-administered with @DRUG$. DDI-false 283 | DDI-MedLine.d185.s7.p0 In the presence of @DRUG$, there was 1.6- and 1.8-fold increase in C (max) and AUC of @DRUG$, respectively. DDI-mechanism 284 | DDI-MedLine.d189.s11.p0 These results suggest that rat liver slices do not reproduce the reported in vivo biotransformation of @DRUG$ and therefore may not be an appropriate model for @DRUG$ metabolism. DDI-false 285 | DDI-MedLine.d189.s3.p0 The incubations were performed in the absence and presence of the non-specific CYP inhibitor, @DRUG$ (@DRUG$) and isoform-specific inhibitors. DDI-false 286 | DDI-MedLine.d189.s7.p0 @DRUG$ decreased the toxicity of @DRUG$, increased exposure to parent compound, and decreased metabolite levels in a dose-dependent manner. DDI-effect 287 | DDI-MedLine.d189.s8.p0 Of the isoform-specific CYP inhibitors tested for an effect on the @DRUG$ metabolite profile, only @DRUG$ was noticeably effective, indicating a role of CYPs 2A6, 2C9, 2Cl9, and 2E1. DDI-false 288 | DDI-MedLine.d190.s1.p0 To determine whether the co-injection of extracellular matrix degrading enzymes improves retinal transduction following intravitreal delivery of @DRUG$ (@DRUG$). DDI-false 289 | DDI-MedLine.d190.s5.p0 The addition of @DRUG$ or @DRUG$ greatly enhanced transduction of the retinal ganglion cell layer and increased the depth of transduction into the outer retina. DDI-false 290 | DDI-MedLine.d190.s6.p0 @DRUG$ had a limited effect and @DRUG$ was ineffective. DDI-false 291 | DDI-MedLine.d190.s7.p0 Electroretinograms survived with higher concentrations of @DRUG$ and @DRUG$ than were required for optimal retinal transduction. DDI-false 292 | DDI-MedLine.d194.s0.p0 The role of p27(Kip1) in @DRUG$-enhanced @DRUG$ cytotoxicity in human ovarian cancer cells. DDI-effect 293 | DDI-MedLine.d194.s11.p0 @DRUG$ + @DRUG$ vs. DDI-false 294 | DDI-MedLine.d194.s15.p0 The siRNA knockdown of p27(Kip1) decreased @DRUG$- and @DRUG$-induced apoptosis compared with a negative control siRNA (sub-G1 fraction, control siRNA vs. p27(Kip1) siRNA: 42.5% vs. 20.1%, difference = 22.4%, 95% CI = 20.1% to 24.7%, P = .017). DDI-false 295 | DDI-MedLine.d194.s16.p0 Studies with forced expression and siRNA knockdown of Bcl-2 and Cdk1 suggest that @DRUG$-mediated induction of p27(Kip1) enhanced @DRUG$-induced apoptosis by negatively regulating Bcl-2 and Cdk1 expression. DDI-effect 296 | DDI-MedLine.d194.s17.p0 Inhibition of Src family and Abl kinases with either siRNAs or @DRUG$ enhances @DRUG$ sensitivity of ovarian cancer cells through p27(Kip1)-mediated suppression of Bcl-2 and Cdk1 expression. DDI-effect 297 | DDI-MedLine.d194.s4.p0 The effect of @DRUG$, an inhibitor of Src and Abl kinases, on @DRUG$ sensitivity was measured in ovarian cancer cells and HEY xenografts. 
DDI-false 298 | DDI-MedLine.d194.s5.p0 The roles of p27(Kip1), Bcl-2, and Cdk1 in apoptosis induced by @DRUG$ and @DRUG$ were assessed using a terminal deoxynucleotidyl transferase-mediated dUTP nick-end labeling (TUNEL) assay, siRNA knockdown of gene expression, transfection with Bcl-2 and Cdk1 expression vectors, and flow cytometry. DDI-false 299 | DDI-MedLine.d194.s9.p0 HEY cells treated with @DRUG$ plus @DRUG$ formed fewer colonies than did cells treated with either agent alone. DDI-effect 300 | DDI-MedLine.d195.s0.p0 Influence of @DRUG$ on @DRUG$ induced antinociception and its pharmacokinetics. DDI-false 301 | DDI-MedLine.d195.s5.p0 @DRUG$ plasma concentration was also increased when it was administered with @DRUG$. DDI-mechanism 302 | DDI-MedLine.d195.s7.p0 From this study it can be concluded that @DRUG$ can be used as a bioenhancer along with @DRUG$. DDI-effect 303 | DDI-MedLine.d199.s0.p0 Prevention of emergence agitation in seven children receiving low-dose @DRUG$ and @DRUG$ total intravenous anesthesia. DDI-false 304 | DDI-MedLine.d199.s2.p0 We retrospectively reviewed the records of 7 pediatric oncology patients who received low-dose @DRUG$ in conjunction with @DRUG$ for total intravenous anesthesia (TIVA) repeatedly for radiation therapy. DDI-false 305 | DDI-MedLine.d200.s0.p0 [Efficacy of fixed combination @DRUG$/@DRUG$ in hospitalized patients with hypertensive disease] DDI-false 306 | DDI-MedLine.d200.s1.p0 Efficacy and tolerability of fixed @DRUG$/@DRUG$ combination was studied in 86 patients with hypertensive disease hospitalized in departments of general internal medicine or cardiology. DDI-false 307 | DDI-MedLine.d200.s2.p0 All patients had indications for antihypertensive therapy and were randomized either to fixed combination @DRUG$/@DRUG$ (n=43) or to therapy which corresponded to the hospital formulary (n=43). DDI-false 308 | DDI-MedLine.d200.s7.p0 Rate of achievement of target BP with fixed combination @DRUG$/@DRUG$ (93%) was comparable with that on traditional therapy (90%). DDI-false 309 | DDI-MedLine.d201.s4.p0 The aim of the present study was to investigate whether HB-EGF is a therapeutic target for T-ALL, and to further elucidate the antitumor effects of a specific inhibitor of HB-EGF, @DRUG$ (@DRUG$). DDI-false 310 | DDI-MedLine.d201.s5.p0 We elucidated the expression of HB-EGF in T-ALL cell lines, and evaluated the effect of @DRUG$ on these cells alone or in combination with @DRUG$. DDI-false 311 | DDI-MedLine.d204.s0.p0 Distinct synergistic action of @DRUG$ and @DRUG$ against Pseudomonas aeruginosa. DDI-false 312 | DDI-MedLine.d204.s9.p0 Distinct and statistically significant synergism was observed between @DRUG$ and @DRUG$ by disc diffusion tests when compared with their individual effects. DDI-effect 313 | DDI-MedLine.d208.s0.p0 @DRUG$ failure in an HIV-positive woman on @DRUG$ therapy resulting in two ectopic pregnancies. DDI-effect 314 | DDI-MedLine.d208.s3.p0 We present an interesting case of an HIV-positive woman on @DRUG$ therapy having tubal pregnancies on two separate occasions with @DRUG$ in place. DDI-effect 315 | DDI-MedLine.d209.s1.p0 @DRUG$, an @DRUG$, is the standard hormone treatment for breast cancer. DDI-false 316 | DDI-MedLine.d209.s10.p0 Depression does not always require @DRUG$ therapy, and @DRUG$ have no proven preventive impact on hot flushes linked to the menopause. 
DDI-false 317 | DDI-MedLine.d211.s2.p0 The preclinical combination of @DRUG$ with the mTOR inhibitor @DRUG$ has displayed synergy in vitro and represents a novel combination in MM. DDI-effect 318 | DDI-MedLine.d213.s0.p0 In vivo CYP3A activity is significantly lower in @DRUG$-treated as compared with @DRUG$-treated renal allograft recipients. DDI-false 319 | DDI-MedLine.d213.s1.p0 In vitro studies have identified @DRUG$ and @DRUG$ as CYP3A inhibitors. DDI-false 320 | DDI-MedLine.d214.s0.p0 [Influence of @DRUG$ and @DRUG$ on MPTP-evoked behavior violations in C57BL/6 mice]. DDI-false 321 | DDI-MedLine.d214.s5.p0 Acute administration of @DRUG$ or @DRUG$ failed to influence locomotion in mice, while their combination normalized motor activity. DDI-effect 322 | DDI-MedLine.d216.s5.p0 In the @DRUG$ and @DRUG$ phases, the areas under the effect-time curves (self-reported drowsiness and performance) were significantly higher than those in the placebo phase (P < 0.05). DDI-false 323 | DDI-MedLine.d216.s6.p0 The findings suggest that the dosage of @DRUG$ should be reduced in patients receiving @DRUG$. DDI-advise 324 | DDI-MedLine.d217.s0.p0 Interaction of @DRUG$ with different @DRUG$ is antagonistic in breast but not in other cancer cells. DDI-effect 325 | DDI-MedLine.d217.s11.p0 These results, if confirmed in-vivo, indicate that @DRUG$ is not a suitable chemosensitizer for breast cancer or with @DRUG$ for other cancers. DDI-false 326 | DDI-MedLine.d217.s2.p0 This study investigates the ability of @DRUG$ to sensitize cells from different origins to several @DRUG$. DDI-false 327 | DDI-MedLine.d218.s0.p0 @DRUG$-@DRUG$ interaction and associated gastrointestinal bleeding risk in a case-control study. DDI-false 328 | DDI-MedLine.d218.s3.p0 We sought to evaluate whether initiation of an @DRUG$ increases the risk of hospitalization for gastrointestinal bleeding in @DRUG$ users. DDI-false 329 | DDI-MedLine.d218.s5.p0 In total, 430,455 @DRUG$ users contributed 407,370 person-years of @DRUG$ use. DDI-false 330 | DDI-MedLine.d218.s9.p0 Also @DRUG$, which is not believed to interact with @DRUG$, increased the risk of GI bleeding (OR = 1.75 [95% CI, 1.30-2.35]). DDI-effect 331 | DDI-MedLine.d227.s0.p0 @DRUG$ enhances the antinociceptive effect of @DRUG$ in mice. DDI-effect 332 | DDI-MedLine.d227.s2.p0 The purpose of this study was to evaluate whether the psychostimulant drug @DRUG$ has any effect on @DRUG$-induced antinociception and locomotor inhibitor activity in mice in acute application. DDI-false 333 | DDI-MedLine.d227.s8.p0 At the same time, the locomotor inhibitory effect of @DRUG$ was counteracted by @DRUG$. DDI-effect 334 | DDI-MedLine.d231.s0.p0 Interaction study of @DRUG$ and @DRUG$ with co-administered drugs. DDI-false 335 | DDI-MedLine.d231.s6.p0 The response of @DRUG$ and @DRUG$ after interaction with co-administered drugs at different conditions and temperature were noted using a Shimadzu HPLC system with PDA detector. DDI-false 336 | DrugDDI.d21928724.s8.p0 @DRUG$ can be a useful @DRUG$ especially in cases which respond to drugs having mixed effects on serotonin and catecholamines levels in the brain. 
DDI-false 337 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge 2 | 3 | 4 | ## Paper ## 5 | [Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge](https://arxiv.org/abs/1911.09487) 6 | 7 | ## Model structure ## 8 | 9 | ![model structure](model.jpg) 10 | 11 | ## Codes ## 12 | In this repository, we provide the code of our proposed model. \ 13 | The 'pytorch_model.bin' files in Resources and Records can be obtained from [pytorch_models](https://drive.google.com/drive/folders/15o_h-_YQUgccvc9202hTGrSyZfrzOPFX?usp=sharing). 14 | ``` 15 | CPI Extraction 16 | -Resources 17 | -NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 18 | -vocab.txt 19 | -pytorch_model.bin 20 | -bert_config.json 21 | -Records 22 | -record_76.56% 23 | -pytorch_model.bin 24 | -bert_config.json 25 | -eval_results.txt 26 | -test_results.txt 27 | -SourceCode 28 | -BioRE.py 29 | -BioRE_BG.py 30 | -modeling.py 31 | -modeling_BG.py 32 | -file_utils.py 33 | -ProcessedData 34 | -CHEMPROT 35 | -train.tsv 36 | -dev.tsv 37 | -test.tsv 38 | -test_overlapping.tsv 39 | -test_normal.tsv 40 | -DDIExtraction2013 41 | -train.tsv 42 | -dev.tsv 43 | -test.tsv 44 | -test_overlapping.tsv 45 | -test_normal.tsv 46 | ``` 47 | 48 | * Resources: provides the BERT model pre-trained on PubMed. 49 | * Records: provides a saved record of our proposed model. 50 | * SourceCode: provides the source code. BioRE.py implements our proposed model, and BioRE_BG.py implements the 'BERT+Gaussian' model. 51 | * ProcessedData: provides the CHEMPROT and DDIExtraction2013 datasets, including the overlapping and normal test instances. 52 | 53 | 54 | 55 | ## Tested environments ## 56 | 57 | * Ubuntu 16.04 58 | * python 3.6.9 59 | * pytorch-pretrained-bert 0.6.1 60 | * torch 1.1.0 61 | * numpy 1.16.4 62 | * pandas 0.25.0 63 | * scipy 1.3.1 64 | 65 | 66 | ## Run models ## 67 | Since the model contains multiple layers, it generally needs some time to train. If you do not have time to train the model, the saved model in Records can be loaded for testing. \ 68 | Some examples of execution instructions are listed below.
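As a quick preliminary check, the DDIExtraction2013 .tsv files shown above are tab-separated with index, sentence and label columns, and entity mentions are masked in the sentences (e.g. @DRUG$). Below is a minimal sketch for inspecting a split before training, assuming pandas (listed under Tested environments) is available and the repository root is the working directory; the chosen split is only an example.

```
import pandas as pd

# Inspect one processed split; the path is an example, adjust to the split you need.
df = pd.read_csv("ProcessedData/DDIExtraction2013/test_normal.tsv", sep="\t")

print(df.columns.tolist())         # expected: ['index', 'sentence', 'label']
print(df["label"].value_counts())  # label distribution, e.g. DDI-false, DDI-advise, ...
print(df.head(3))
```

Running the same check on the directory you pass as --data_dir is an easy way to confirm the paths in the commands below point at the intended files.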
69 | 70 | 71 | #### Run our proposed model #### 72 | ``` 73 | python BioRE.py \ 74 | --task_name cpi \ 75 | --do_train \ 76 | --do_eval \ 77 | --do_predict \ 78 | --do_lower_case \ 79 | --data_dir /$YourPath/CPI_extraction/ProcessedData/CHEMPROT \ 80 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 81 | --max_seq_length 128 \ 82 | --train_batch_size 16 \ 83 | --eval_batch_size 8 \ 84 | --predict_batch_size 8 \ 85 | --learning_rate 2e-5 \ 86 | --num_train_epochs 2.0 \ 87 | --seed 47 \ 88 | --output_dir /$YourOutputPath 89 | ``` 90 | #### Load the record of our proposed model #### 91 | ``` 92 | python BioRE.py \ 93 | --task_name cpi \ 94 | --do_eval \ 95 | --do_predict \ 96 | --do_lower_case \ 97 | --data_dir /$YourPath/CPI_extraction/ProcessedData/CHEMPROT \ 98 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 99 | --saved_model /$YourSavedmodelPath \ 100 | --max_seq_length 128 \ 101 | --train_batch_size 16 \ 102 | --eval_batch_size 8 \ 103 | --predict_batch_size 8 \ 104 | --learning_rate 2e-5 \ 105 | --num_train_epochs 2.0 \ 106 | --output_dir /$YourOutputPath 107 | ``` 108 | #### Run the 'BERT+Gaussian' model on the CHEMPROT dataset #### 109 | ``` 110 | python BioRE_BG.py \ 111 | --task_name cpi \ 112 | --do_train \ 113 | --do_eval \ 114 | --do_predict \ 115 | --do_lower_case \ 116 | --data_dir /$YourPath/CPI_extraction/ProcessedData/CHEMPROT \ 117 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 118 | --max_seq_length 128 \ 119 | --train_batch_size 16 \ 120 | --eval_batch_size 8 \ 121 | --predict_batch_size 8 \ 122 | --learning_rate 2e-5 \ 123 | --num_train_epochs 2.0 \ 124 | --seed 47 \ 125 | --output_dir /$YourOutputPath 126 | ``` 127 | #### Run the 'BERT+Gaussian' model on the DDIExtraction dataset #### 128 | ``` 129 | python BioRE_BG.py \ 130 | --task_name ddi \ 131 | --do_train \ 132 | --do_eval \ 133 | --do_predict \ 134 | --do_lower_case \ 135 | --data_dir /$YourPath/CPI_extraction/ProcessedData/DDIExtraction2013 \ 136 | --bert_model /$YourPath/CPI_extraction/Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12 \ 137 | --max_seq_length 128 \ 138 | --train_batch_size 16 \ 139 | --eval_batch_size 8 \ 140 | --predict_batch_size 8 \ 141 | --learning_rate 2e-5 \ 142 | --num_train_epochs 3.0 \ 143 | --seed 17 \ 144 | --output_dir /$YourOutputPath 145 | ``` 146 | 147 | ## Cite ## 148 | 149 | ``` 150 | @article{10.1093/bioinformatics/btaa491, 151 | author = {Sun, Cong and Yang, Zhihao and Su, Leilei and Wang, Lei and Zhang, Yin and Lin, Hongfei and Wang, Jian}, 152 | title = "{Chemical-protein Interaction Extraction via Gaussian Probability Distribution and External Biomedical Knowledge}", 153 | journal = {Bioinformatics}, 154 | year = {2020}, 155 | month = {05}, 156 | issn = {1367-4803}, 157 | doi = {10.1093/bioinformatics/btaa491}, 158 | url = {https://doi.org/10.1093/bioinformatics/btaa491}, 159 | note = {btaa491}, 160 | } 161 | ``` 162 | 163 | -------------------------------------------------------------------------------- /Records/record_76.56%/bert_config.json: -------------------------------------------------------------------------------- 1 | { 2 | "attention_probs_dropout_prob": 0.1, 3 | "hidden_act": "gelu", 4 | "hidden_dropout_prob": 0.1, 5 | "hidden_size": 768, 6 | "initializer_range": 0.02, 7 | "intermediate_size": 3072, 8 | "max_position_embeddings": 512, 9 | "num_attention_heads": 12, 10 | "num_hidden_layers": 12, 11 | "type_vocab_size": 2, 12 
| "vocab_size": 30522 13 | } 14 | -------------------------------------------------------------------------------- /Records/record_76.56%/eval_results.txt: -------------------------------------------------------------------------------- 1 | F1 = 0.7709061023217587 2 | P = 0.7654018767849857 3 | R = 0.7764900662251656 4 | eval_loss = 0.29068153011782727 5 | global_step = 2434 6 | loss = 0.13145572502983333 7 | -------------------------------------------------------------------------------- /Records/record_76.56%/test_results.txt: -------------------------------------------------------------------------------- 1 | F1 = 0.7656477438136826 2 | P = 0.7708089097303634 3 | R = 0.7605552342394447 4 | global_step = 2434 5 | loss = 0.13145572502983333 6 | test_loss = 0.3190204159669293 7 | -------------------------------------------------------------------------------- /Resources/NCBI_BERT_pubmed_uncased_L-12_H-768_A-12/bert_config.json: -------------------------------------------------------------------------------- 1 | { 2 | "attention_probs_dropout_prob": 0.1, 3 | "hidden_act": "gelu", 4 | "hidden_dropout_prob": 0.1, 5 | "hidden_size": 768, 6 | "initializer_range": 0.02, 7 | "intermediate_size": 3072, 8 | "max_position_embeddings": 512, 9 | "num_attention_heads": 12, 10 | "num_hidden_layers": 12, 11 | "type_vocab_size": 2, 12 | "vocab_size": 30522 13 | } 14 | -------------------------------------------------------------------------------- /SourceCode/BioRE.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Author: Cong Sun 3 | # Email: suncong132@mail.dlut.edu.cn 4 | # 5 | # BioRE 6 | # 7 | #-------------------------------------------------------------------------- 8 | 9 | 10 | 11 | from __future__ import absolute_import, division, print_function 12 | import argparse 13 | import csv 14 | import logging 15 | import os 16 | import random 17 | import sys 18 | import numpy as np 19 | import torch 20 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,TensorDataset) 21 | from torch.utils.data.distributed import DistributedSampler 22 | from tqdm import tqdm, trange 23 | from torch.nn import CrossEntropyLoss, MSELoss 24 | import scipy 25 | from scipy.stats import pearsonr, spearmanr 26 | from sklearn.metrics import matthews_corrcoef, f1_score, precision_recall_fscore_support 27 | 28 | import modeling 29 | from modeling import BertForSequenceClassification, BertConfig, WEIGHTS_NAME, CONFIG_NAME 30 | from pytorch_pretrained_bert.file_utils import PYTORCH_PRETRAINED_BERT_CACHE 31 | from pytorch_pretrained_bert.tokenization import BertTokenizer 32 | from pytorch_pretrained_bert.optimization import BertAdam, warmup_linear 33 | import os 34 | 35 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 36 | datefmt = '%m/%d/%Y %H:%M:%S', 37 | level = logging.INFO) 38 | logger = logging.getLogger(__name__) 39 | 40 | sigma = 2.5 41 | mu = 0.0 42 | 43 | norm_0=scipy.stats.norm(mu,sigma).cdf(1)-scipy.stats.norm(mu,sigma).cdf(0) 44 | norm_1=scipy.stats.norm(mu,sigma).cdf(2)-scipy.stats.norm(mu,sigma).cdf(1) 45 | norm_2=scipy.stats.norm(mu,sigma).cdf(3)-scipy.stats.norm(mu,sigma).cdf(2) 46 | norm_3=scipy.stats.norm(mu,sigma).cdf(4)-scipy.stats.norm(mu,sigma).cdf(3) 47 | norm_4=scipy.stats.norm(mu,sigma).cdf(5)-scipy.stats.norm(mu,sigma).cdf(4) 48 | norm_5=scipy.stats.norm(mu,sigma).cdf(6)-scipy.stats.norm(mu,sigma).cdf(5) 49 | norm_6=scipy.stats.norm(mu,sigma).cdf(7)-scipy.stats.norm(mu,sigma).cdf(6) 50 
| 51 | 52 | 53 | class InputExample(object): 54 | """A single training/test example for simple sequence classification.""" 55 | 56 | def __init__(self, guid, title, text_a, text_b=None, sdp=None, label=None): 57 | """Constructs a InputExample.""" 58 | self.guid = guid 59 | self.title = title 60 | self.text_a = text_a 61 | self.text_b = text_b 62 | self.sdp = sdp 63 | self.label = label 64 | 65 | 66 | class InputFeatures(object): 67 | """A single set of features of data.""" 68 | 69 | def __init__(self, title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, \ 70 | sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list, label_id): 71 | 72 | self.title_ids=title_ids 73 | self.title_mask=title_mask 74 | self.title_segment = title_segment 75 | 76 | self.input_ids = input_ids 77 | self.input_mask = input_mask 78 | self.segment_ids = segment_ids 79 | 80 | self.sdp_ids = sdp_ids 81 | self.sdp_mask = sdp_mask 82 | self.sdp_segment = sdp_segment 83 | 84 | self.P_gauss1_list = P_gauss1_list 85 | self.P_gauss2_list = P_gauss2_list 86 | self.label_id = label_id 87 | 88 | 89 | class DataProcessor(object): 90 | """Base class for data converters for sequence classification data sets.""" 91 | 92 | def get_train_examples(self, data_dir): 93 | """Gets a collection of `InputExample`s for the train set.""" 94 | raise NotImplementedError() 95 | 96 | def get_dev_examples(self, data_dir): 97 | """Gets a collection of `InputExample`s for the dev set.""" 98 | raise NotImplementedError() 99 | 100 | def get_labels(self): 101 | """Gets the list of labels for this data set.""" 102 | raise NotImplementedError() 103 | 104 | @classmethod 105 | def _read_tsv(cls, input_file, quotechar=None): 106 | """Reads a tab separated value file.""" 107 | with open(input_file, "r") as f: 108 | reader = csv.reader(f, delimiter="\t", quotechar=quotechar) 109 | lines = [] 110 | for line in reader: 111 | if sys.version_info[0] == 2: 112 | line = list(unicode(cell, 'utf-8') for cell in line) 113 | lines.append(line) 114 | return lines 115 | 116 | 117 | class BioBERTChemprotProcessor(DataProcessor): 118 | """Processor for the BioBERT data set (GLUE version).""" 119 | 120 | def get_train_examples(self, data_dir): 121 | """See base class.""" 122 | return self._create_examples( 123 | self._read_tsv(os.path.join(data_dir, "train.tsv")), "train") 124 | 125 | def get_dev_examples(self, data_dir): 126 | """See base class.""" 127 | return self._create_examples( 128 | self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev") 129 | 130 | def get_test_examples(self, data_dir): 131 | """See base class.""" 132 | return self._create_examples( 133 | self._read_tsv(os.path.join(data_dir, "test.tsv")), "test") 134 | 135 | 136 | def get_labels(self): 137 | """See base class.""" 138 | return ["CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9", "false"] 139 | 140 | def _create_examples(self, lines, set_type): 141 | """Creates examples for the training and dev sets.""" 142 | examples = [] 143 | for (i, line) in enumerate(lines): 144 | if set_type == "train" and i == 0: 145 | continue 146 | if set_type == "dev" and i == 0: 147 | continue 148 | if set_type == "test" and i == 0: 149 | continue 150 | guid = "%s-%s" % (set_type, i) 151 | title = line[1] 152 | text_a = line[2] 153 | sdp = line[3] 154 | label = line[4] 155 | examples.append( 156 | InputExample(guid=guid, title=title, text_a=text_a, text_b=None, sdp=sdp, label=label)) 157 | return examples 158 | 159 | 160 | 161 | def convert_examples_to_features(examples, label_list, max_seq_length, 162 
| tokenizer, output_mode): 163 | """Loads a data file into a list of `InputBatch`s.""" 164 | 165 | label_map = {label : i for i, label in enumerate(label_list)} 166 | 167 | features = [] 168 | for (ex_index, example) in enumerate(examples): 169 | if ex_index % 10000 == 0: 170 | logger.info("Writing example %d of %d" % (ex_index, len(examples))) 171 | 172 | title = tokenizer.tokenize(example.title) 173 | if len(title) > max_seq_length - 2: 174 | title = title[:(max_seq_length - 2)] 175 | 176 | tokens_a = tokenizer.tokenize(example.text_a) 177 | 178 | tokens_b = None 179 | if example.text_b: 180 | tokens_b = tokenizer.tokenize(example.text_b) 181 | # Modifies `tokens_a` and `tokens_b` in place so that the total 182 | # length is less than the specified length. 183 | # Account for [CLS], [SEP], [SEP] with "- 3" 184 | _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3) 185 | else: 186 | # Account for [CLS] and [SEP] with "- 2" 187 | if len(tokens_a) > max_seq_length - 2: 188 | tokens_a = tokens_a[:(max_seq_length - 2)] 189 | 190 | sdp = tokenizer.tokenize(example.sdp) 191 | if len(sdp) > max_seq_length - 2: 192 | sdp = sdp[:(max_seq_length - 2)] 193 | 194 | # The convention in BERT is: 195 | # (a) For sequence pairs: 196 | # tokens: [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP] 197 | # type_ids: 0 0 0 0 0 0 0 0 1 1 1 1 1 1 198 | # (b) For single sequences: 199 | # tokens: [CLS] the dog is hairy . [SEP] 200 | # type_ids: 0 0 0 0 0 0 0 201 | # 202 | # Where "type_ids" are used to indicate whether this is the first 203 | # sequence or the second sequence. The embedding vectors for `type=0` and 204 | # `type=1` were learned during pre-training and are added to the wordpiece 205 | # embedding vector (and position vector). This is not *strictly* necessary 206 | # since the [SEP] token unambiguously separates the sequences, but it makes 207 | # it easier for the model to learn the concept of sequences. 208 | # 209 | # For classification tasks, the first vector (corresponding to [CLS]) is 210 | # used as as the "sentence vector". Note that this only makes sense because 211 | # the entire model is fine-tuned. 
212 | 213 | title = ["[CLS]"] + title + ["[SEP]"] 214 | tokens = ["[CLS]"] + tokens_a + ["[SEP]"] 215 | sdp = ["[CLS]"] + sdp + ["[SEP]"] 216 | 217 | title_segment = [0] * len(title) 218 | segment_ids = [0] * len(tokens) 219 | sdp_segment = [0] * len(sdp) 220 | 221 | if tokens_b: 222 | tokens += tokens_b + ["[SEP]"] 223 | segment_ids += [1] * (len(tokens_b) + 1) 224 | 225 | if '@' not in tokens: 226 | P_gauss1_array = np.zeros(max_seq_length) 227 | P_gauss2_array = np.zeros(max_seq_length) 228 | else: 229 | ann1_index=tokens.index('@') 230 | if '@' not in tokens[ann1_index+1:]: 231 | P_array = np.zeros(max_seq_length) 232 | P_array[ann1_index-1] = norm_1 233 | P_array[ann1_index] = norm_0 234 | P_array[ann1_index+1] = norm_0 235 | if ann1_index+2 < max_seq_length: 236 | P_array[ann1_index+2] = norm_1 237 | if ann1_index+3 < max_seq_length: 238 | P_array[ann1_index+3] = norm_2 239 | if ann1_index+4 < max_seq_length: 240 | P_array[ann1_index+4] = norm_3 241 | if ann1_index+5 < max_seq_length: 242 | P_array[ann1_index+5] = norm_4 243 | if ann1_index+6 < max_seq_length: 244 | P_array[ann1_index+6] = norm_5 245 | if ann1_index+7 < max_seq_length: 246 | P_array[ann1_index+7] = norm_6 247 | if ann1_index-2 >= 0: 248 | P_array[ann1_index-2] = norm_2 249 | if ann1_index-3 >= 0: 250 | P_array[ann1_index-3] = norm_3 251 | if ann1_index-4 >= 0: 252 | P_array[ann1_index-4] = norm_4 253 | if ann1_index-5 >= 0: 254 | P_array[ann1_index-5] = norm_5 255 | if ann1_index-6 >= 0: 256 | P_array[ann1_index-6] = norm_6 257 | P_gauss1_array = P_array 258 | P_gauss2_array = P_array 259 | else: 260 | ann2_index=tokens.index('@',ann1_index+1) 261 | P1_array = np.zeros(max_seq_length) 262 | P2_array = np.zeros(max_seq_length) 263 | 264 | P1_array[ann1_index-1] = norm_1 265 | P1_array[ann1_index] = norm_0 266 | P1_array[ann1_index+1] = norm_0 267 | if ann1_index+2 < max_seq_length: 268 | P1_array[ann1_index+2] = norm_1 269 | if ann1_index+3 < max_seq_length: 270 | P1_array[ann1_index+3] = norm_2 271 | if ann1_index+4 < max_seq_length: 272 | P1_array[ann1_index+4] = norm_3 273 | if ann1_index+5 < max_seq_length: 274 | P1_array[ann1_index+5] = norm_4 275 | if ann1_index+6 < max_seq_length: 276 | P1_array[ann1_index+6] = norm_5 277 | if ann1_index+7 < max_seq_length: 278 | P1_array[ann1_index+7] = norm_6 279 | if ann1_index-2 >=0: 280 | P1_array[ann1_index-2] = norm_2 281 | if ann1_index-3 >=0: 282 | P1_array[ann1_index-3] = norm_3 283 | if ann1_index-4 >=0: 284 | P1_array[ann1_index-4] = norm_4 285 | if ann1_index-5 >=0: 286 | P1_array[ann1_index-5] = norm_5 287 | if ann1_index-6 >=0: 288 | P1_array[ann1_index-6] = norm_6 289 | 290 | P2_array[ann2_index-1] = norm_1 291 | P2_array[ann2_index] = norm_0 292 | P2_array[ann2_index+1] = norm_0 293 | if ann2_index+2 < max_seq_length: 294 | P2_array[ann2_index+2] = norm_1 295 | if ann2_index+3 < max_seq_length: 296 | P2_array[ann2_index+3] = norm_2 297 | if ann2_index+4 < max_seq_length: 298 | P2_array[ann2_index+4] = norm_3 299 | if ann2_index+5 < max_seq_length: 300 | P2_array[ann2_index+5] = norm_4 301 | if ann2_index+6 < max_seq_length: 302 | P2_array[ann2_index+6] = norm_5 303 | if ann2_index+7 < max_seq_length: 304 | P2_array[ann2_index+7] = norm_6 305 | if ann2_index-2 >=0: 306 | P2_array[ann2_index-2] = norm_2 307 | if ann2_index-3 >= 0: 308 | P2_array[ann2_index-3] = norm_3 309 | if ann2_index-4 >= 0: 310 | P2_array[ann2_index-4] = norm_4 311 | if ann2_index-5 >= 0: 312 | P2_array[ann2_index-5] = norm_5 313 | if ann2_index-6 >= 0: 314 | P2_array[ann2_index-6] = norm_6 
315 | P_gauss1_array = P1_array 316 | P_gauss2_array = P2_array 317 | 318 | P_gauss1_list= list(P_gauss1_array) 319 | P_gauss2_list= list(P_gauss2_array) 320 | 321 | title_ids = tokenizer.convert_tokens_to_ids(title) 322 | input_ids = tokenizer.convert_tokens_to_ids(tokens) 323 | sdp_ids = tokenizer.convert_tokens_to_ids(sdp) 324 | 325 | # The mask has 1 for real tokens and 0 for padding tokens. Only real 326 | # tokens are attended to. 327 | title_mask = [1] * len(title_ids) 328 | input_mask = [1] * len(input_ids) 329 | sdp_mask = [1] * len(sdp_ids) 330 | 331 | # Zero-pad up to the sequence length. 332 | padding_title = [0] * (max_seq_length - len(title_ids)) 333 | title_ids += padding_title 334 | 335 | padding = [0] * (max_seq_length - len(input_ids)) 336 | input_ids += padding 337 | 338 | padding_sdp = [0] * (max_seq_length - len(sdp_ids)) 339 | sdp_ids += padding_sdp 340 | 341 | title_mask += padding_title 342 | input_mask += padding 343 | sdp_mask += padding_sdp 344 | 345 | title_segment += padding_title 346 | segment_ids += padding 347 | sdp_segment += padding_sdp 348 | 349 | assert len(title_ids) == max_seq_length 350 | assert len(title_mask) == max_seq_length 351 | assert len(title_segment) == max_seq_length 352 | assert len(input_ids) == max_seq_length 353 | assert len(input_mask) == max_seq_length 354 | assert len(segment_ids) == max_seq_length 355 | assert len(sdp_ids) == max_seq_length 356 | assert len(sdp_mask) == max_seq_length 357 | assert len(sdp_segment) == max_seq_length 358 | assert len(P_gauss1_list) == max_seq_length 359 | assert len(P_gauss2_list) == max_seq_length 360 | 361 | if output_mode == "classification": 362 | label_id = label_map[example.label] 363 | elif output_mode == "regression": 364 | label_id = float(example.label) 365 | else: 366 | raise KeyError(output_mode) 367 | 368 | if ex_index < 5: 369 | logger.info("*** Example ***") 370 | logger.info("guid: %s" % (example.guid)) 371 | logger.info("title: %s" % " ".join([str(x) for x in title])) 372 | logger.info("title_ids: %s" % " ".join([str(x) for x in title_ids])) 373 | logger.info("title_mask: %s" % " ".join([str(x) for x in title_mask])) 374 | logger.info("title_segment: %s" % " ".join([str(x) for x in title_segment])) 375 | 376 | logger.info("tokens: %s" % " ".join([str(x) for x in tokens])) 377 | logger.info("input_ids: %s" % " ".join([str(x) for x in input_ids])) 378 | logger.info("input_mask: %s" % " ".join([str(x) for x in input_mask])) 379 | logger.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids])) 380 | 381 | logger.info("sdp: %s" % " ".join([str(x) for x in sdp])) 382 | logger.info("sdp_ids: %s" % " ".join([str(x) for x in sdp_ids])) 383 | logger.info("sdp_mask: %s" % " ".join([str(x) for x in sdp_mask])) 384 | logger.info("sdp_segment: %s" % " ".join([str(x) for x in sdp_segment])) 385 | 386 | logger.info("P_gauss1_list: %s" % " ".join([str(x) for x in P_gauss1_list])) 387 | logger.info("P_gauss2_list: %s" % " ".join([str(x) for x in P_gauss2_list])) 388 | logger.info("label: %s (id = %d)" % (example.label, label_id)) 389 | 390 | features.append( 391 | InputFeatures(title_ids=title_ids, 392 | title_mask=title_mask, 393 | title_segment=title_segment, 394 | input_ids=input_ids, 395 | input_mask=input_mask, 396 | segment_ids=segment_ids, 397 | sdp_ids=sdp_ids, 398 | sdp_mask=sdp_mask, 399 | sdp_segment=sdp_segment, 400 | P_gauss1_list=P_gauss1_list, 401 | P_gauss2_list=P_gauss2_list, 402 | label_id=label_id)) 403 | return features 404 | 405 | 406 | def _truncate_seq_pair(tokens_a, 
tokens_b, max_length): 407 | """Truncates a sequence pair in place to the maximum length.""" 408 | 409 | # This is a simple heuristic which will always truncate the longer sequence 410 | # one token at a time. This makes more sense than truncating an equal percent 411 | # of tokens from each, since if one sequence is very short then each token 412 | # that's truncated likely contains more information than a longer sequence. 413 | while True: 414 | total_length = len(tokens_a) + len(tokens_b) 415 | if total_length <= max_length: 416 | break 417 | if len(tokens_a) > len(tokens_b): 418 | tokens_a.pop() 419 | else: 420 | tokens_b.pop() 421 | 422 | 423 | def f1_chemprot(preds, labels): 424 | p,r,f,s = precision_recall_fscore_support(y_pred=preds, y_true=labels, labels=[0,1,2,3,4], average="micro") 425 | return { 426 | "P": p, 427 | "R": r, 428 | "F1": f, 429 | } 430 | 431 | 432 | def compute_metrics(task_name, preds, labels): 433 | assert len(preds) == len(labels) 434 | if task_name == "cpi": 435 | return f1_chemprot(preds, labels) 436 | else: 437 | raise KeyError(task_name) 438 | 439 | def main(): 440 | parser = argparse.ArgumentParser() 441 | 442 | ## Required parameters 443 | parser.add_argument("--data_dir", 444 | default=None, 445 | type=str, 446 | required=True, 447 | help="The input data dir. Should contain the .tsv files (or other data files) for the task.") 448 | parser.add_argument("--bert_model", default=None, type=str, required=True, 449 | help="Bert pre-trained model selected in the list: bert-base-uncased, " 450 | "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, " 451 | "bert-base-multilingual-cased, bert-base-chinese.") 452 | parser.add_argument("--task_name", 453 | default=None, 454 | type=str, 455 | required=True, 456 | help="The name of the task to train.") 457 | parser.add_argument("--output_dir", 458 | default=None, 459 | type=str, 460 | required=True, 461 | help="The output directory where the model predictions and checkpoints will be written.") 462 | 463 | ## Other parameters 464 | parser.add_argument("--cache_dir", 465 | default="", 466 | type=str, 467 | help="Where do you want to store the pre-trained models downloaded from s3") 468 | parser.add_argument("--max_seq_length", 469 | default=128, 470 | type=int, 471 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 472 | "Sequences longer than this will be truncated, and sequences shorter \n" 473 | "than this will be padded.") 474 | parser.add_argument("--saved_model", default=None, type=str, 475 | help="saved model") 476 | parser.add_argument("--do_train", 477 | action='store_true', 478 | help="Whether to run training.") 479 | parser.add_argument("--do_eval", 480 | action='store_true', 481 | help="Whether to run eval on the dev set.") 482 | parser.add_argument("--do_predict", 483 | action='store_true', 484 | help="Whether to run test on the test set.") 485 | parser.add_argument("--do_lower_case", 486 | action='store_true', 487 | help="Set this flag if you are using an uncased model.") 488 | parser.add_argument("--train_batch_size", 489 | default=16, 490 | type=int, 491 | help="Total batch size for training.") 492 | parser.add_argument("--eval_batch_size", 493 | default=8, 494 | type=int, 495 | help="Total batch size for eval.") 496 | parser.add_argument("--predict_batch_size", 497 | default=8, 498 | type=int, 499 | help="Total batch size for predict.") 500 | parser.add_argument("--learning_rate", 501 | default=5e-5, 502 | type=float, 503 | help="The initial learning rate for Adam.") 504 | parser.add_argument("--num_train_epochs", 505 | default=2.0, 506 | type=float, 507 | help="Total number of training epochs to perform.") 508 | parser.add_argument("--warmup_proportion", 509 | default=0.1, 510 | type=float, 511 | help="Proportion of training to perform linear learning rate warmup for. " 512 | "E.g., 0.1 = 10%% of training.") 513 | parser.add_argument("--no_cuda", 514 | action='store_true', 515 | help="Whether not to use CUDA when available") 516 | parser.add_argument("--local_rank", 517 | type=int, 518 | default=-1, 519 | help="local_rank for distributed training on gpus") 520 | parser.add_argument('--seed', 521 | type=int, 522 | default=47, 523 | help="random seed for initialization") 524 | parser.add_argument('--gradient_accumulation_steps', 525 | type=int, 526 | default=1, 527 | help="Number of updates steps to accumulate before performing a backward/update pass.") 528 | parser.add_argument('--fp16', 529 | action='store_true', 530 | help="Whether to use 16-bit float precision instead of 32-bit") 531 | parser.add_argument('--loss_scale', 532 | type=float, default=0, 533 | help="Loss scaling to improve fp16 numeric stability. 
Only used when fp16 set to True.\n" 534 | "0 (default value): dynamic loss scaling.\n" 535 | "Positive power of 2: static loss scaling value.\n") 536 | parser.add_argument('--server_ip', type=str, default='', help="Can be used for distant debugging.") 537 | parser.add_argument('--server_port', type=str, default='', help="Can be used for distant debugging.") 538 | 539 | args = parser.parse_args() 540 | 541 | if args.server_ip and args.server_port: 542 | # Distant debugging - see https://code.visualstudio.com/docs/python/debugging#_attach-to-a-local-script 543 | import ptvsd 544 | print("Waiting for debugger attach") 545 | ptvsd.enable_attach(address=(args.server_ip, args.server_port), redirect_output=True) 546 | ptvsd.wait_for_attach() 547 | 548 | processors = { 549 | "cpi": BioBERTChemprotProcessor, 550 | } 551 | 552 | output_modes = { 553 | "cpi": "classification", 554 | } 555 | 556 | if args.local_rank == -1 or args.no_cuda: 557 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 558 | n_gpu = torch.cuda.device_count() 559 | else: 560 | torch.cuda.set_device(args.local_rank) 561 | device = torch.device("cuda", args.local_rank) 562 | n_gpu = 1 563 | # Initializes the distributed backend which will take care of sychronizing nodes/GPUs 564 | torch.distributed.init_process_group(backend='nccl') 565 | logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format( 566 | device, n_gpu, bool(args.local_rank != -1), args.fp16)) 567 | 568 | if args.gradient_accumulation_steps < 1: 569 | raise ValueError("Invalid gradient_accumulation_steps parameter: {}, should be >= 1".format( 570 | args.gradient_accumulation_steps)) 571 | 572 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 573 | 574 | random.seed(args.seed) 575 | np.random.seed(args.seed) 576 | torch.manual_seed(args.seed) 577 | if n_gpu > 0: 578 | torch.cuda.manual_seed_all(args.seed) 579 | 580 | if not args.do_train and not args.do_eval: 581 | raise ValueError("At least one of `do_train` or `do_eval` must be True.") 582 | 583 | if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train: 584 | raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir)) 585 | if not os.path.exists(args.output_dir): 586 | os.makedirs(args.output_dir) 587 | 588 | task_name = args.task_name.lower() 589 | 590 | if task_name not in processors: 591 | raise ValueError("Task not found: %s" % (task_name)) 592 | 593 | processor = processors[task_name]() 594 | output_mode = output_modes[task_name] 595 | 596 | label_list = processor.get_labels() 597 | num_labels = len(label_list) 598 | 599 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case, 600 | never_split = ("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]", "CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9")) 601 | 602 | tokenizer.vocab["CPR:3"] = 3 603 | tokenizer.vocab["CPR:4"] = 4 604 | tokenizer.vocab["CPR:5"] = 5 605 | tokenizer.vocab["CPR:6"] = 6 606 | tokenizer.vocab["CPR:9"] = 9 607 | train_examples = None 608 | num_train_optimization_steps = None 609 | 610 | 611 | if args.do_train: 612 | train_examples = processor.get_train_examples(args.data_dir) 613 | num_train_optimization_steps = int( 614 | len(train_examples) / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs 615 | if args.local_rank != -1: 616 | num_train_optimization_steps = num_train_optimization_steps // 
torch.distributed.get_world_size() 617 | 618 | # Prepare model 619 | cache_dir = args.cache_dir if args.cache_dir else os.path.join(str(PYTORCH_PRETRAINED_BERT_CACHE), 'distributed_{}'.format(args.local_rank)) 620 | model = BertForSequenceClassification.from_pretrained(args.bert_model, 621 | cache_dir=cache_dir, 622 | num_labels=num_labels) 623 | if args.fp16: 624 | model.half() 625 | model.to(device) 626 | if args.local_rank != -1: 627 | try: 628 | from apex.parallel import DistributedDataParallel as DDP 629 | except ImportError: 630 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 631 | 632 | model = DDP(model) 633 | elif n_gpu > 1: 634 | model = torch.nn.DataParallel(model) 635 | 636 | # Prepare optimizer 637 | param_optimizer = list(model.named_parameters()) 638 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 639 | optimizer_grouped_parameters = [ 640 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 641 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0} 642 | ] 643 | if args.fp16: 644 | try: 645 | from apex.optimizers import FP16_Optimizer 646 | from apex.optimizers import FusedAdam 647 | except ImportError: 648 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 649 | 650 | optimizer = FusedAdam(optimizer_grouped_parameters, 651 | lr=args.learning_rate, 652 | bias_correction=False, 653 | max_grad_norm=1.0) 654 | if args.loss_scale == 0: 655 | optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True) 656 | else: 657 | optimizer = FP16_Optimizer(optimizer, static_loss_scale=args.loss_scale) 658 | 659 | else: 660 | optimizer = BertAdam(optimizer_grouped_parameters, 661 | lr=args.learning_rate, 662 | warmup=args.warmup_proportion, 663 | t_total=num_train_optimization_steps, 664 | e=1e-08) 665 | 666 | global_step = 0 667 | nb_tr_steps = 0 668 | tr_loss = 0 669 | if args.do_train: 670 | train_features = convert_examples_to_features( 671 | train_examples, label_list, args.max_seq_length, tokenizer, output_mode) 672 | logger.info("***** Running training *****") 673 | logger.info(" Num examples = %d", len(train_examples)) 674 | logger.info(" Batch size = %d", args.train_batch_size) 675 | logger.info(" Num steps = %d", num_train_optimization_steps) 676 | all_title_ids = torch.tensor([f.title_ids for f in train_features], dtype=torch.long) 677 | all_title_mask = torch.tensor([f.title_mask for f in train_features], dtype=torch.long) 678 | all_title_segment = torch.tensor([f.title_segment for f in train_features], dtype=torch.long) 679 | 680 | all_input_ids = torch.tensor([f.input_ids for f in train_features], dtype=torch.long) 681 | all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long) 682 | all_segment_ids = torch.tensor([f.segment_ids for f in train_features], dtype=torch.long) 683 | 684 | all_sdp_ids = torch.tensor([f.sdp_ids for f in train_features], dtype=torch.long) 685 | all_sdp_mask = torch.tensor([f.sdp_mask for f in train_features], dtype=torch.long) 686 | all_sdp_segment = torch.tensor([f.sdp_segment for f in train_features], dtype=torch.long) 687 | 688 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in train_features], dtype=torch.float) 689 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in train_features], dtype=torch.float) 690 | 691 | if output_mode == 
"classification": 692 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.long) 693 | elif output_mode == "regression": 694 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.float) 695 | 696 | train_data = TensorDataset(all_title_ids, all_title_mask, all_title_segment, all_input_ids, all_input_mask, all_segment_ids, \ 697 | all_sdp_ids, all_sdp_mask, all_sdp_segment, all_P_gauss1_list, all_P_gauss2_list,all_label_ids) 698 | 699 | if args.local_rank == -1: 700 | train_sampler = RandomSampler(train_data) 701 | else: 702 | train_sampler = DistributedSampler(train_data) 703 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 704 | 705 | model.train() 706 | for _ in trange(int(args.num_train_epochs), desc="Epoch"): 707 | tr_loss = 0 708 | nb_tr_examples, nb_tr_steps = 0, 0 709 | for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration")): 710 | batch = tuple(t.to(device) for t in batch) 711 | title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, \ 712 | sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list, label_ids = batch 713 | 714 | # define a new function to compute loss values for both output_modes 715 | logits = model(title_ids, title_segment, title_mask, input_ids, segment_ids, input_mask, \ 716 | sdp_ids, sdp_segment, sdp_mask, P_gauss1_list, P_gauss2_list, labels=None) 717 | 718 | if output_mode == "classification": 719 | loss_fct = CrossEntropyLoss() 720 | loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 721 | elif output_mode == "regression": 722 | loss_fct = MSELoss() 723 | loss = loss_fct(logits.view(-1), label_ids.view(-1)) 724 | 725 | if n_gpu > 1: 726 | loss = loss.mean() # mean() to average on multi-gpu. 
727 | if args.gradient_accumulation_steps > 1: 728 | loss = loss / args.gradient_accumulation_steps 729 | 730 | if args.fp16: 731 | optimizer.backward(loss) 732 | else: 733 | loss.backward() 734 | 735 | tr_loss += loss.item() 736 | nb_tr_examples += input_ids.size(0) 737 | nb_tr_steps += 1 738 | if (step + 1) % args.gradient_accumulation_steps == 0: 739 | if args.fp16: 740 | # modify learning rate with special warm up BERT uses 741 | # if args.fp16 is False, BertAdam is used that handles this automatically 742 | lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion) 743 | for param_group in optimizer.param_groups: 744 | param_group['lr'] = lr_this_step 745 | optimizer.step() 746 | optimizer.zero_grad() 747 | global_step += 1 748 | 749 | # Save a trained model and the associated configuration 750 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model it-self 751 | output_model_file = os.path.join(args.output_dir, WEIGHTS_NAME) 752 | torch.save(model_to_save.state_dict(), output_model_file) 753 | output_config_file = os.path.join(args.output_dir, CONFIG_NAME) 754 | with open(output_config_file, 'w') as f: 755 | f.write(model_to_save.config.to_json_string()) 756 | 757 | else: 758 | output_model_file = os.path.join(args.saved_model, 'pytorch_model.bin') 759 | output_config_file = os.path.join(args.saved_model, 'bert_config.json') 760 | config = BertConfig(output_config_file) 761 | model = BertForSequenceClassification(config, num_labels=num_labels) 762 | model.load_state_dict(torch.load(output_model_file)) 763 | model.to(device) 764 | 765 | 766 | if args.do_eval and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 767 | eval_examples = processor.get_dev_examples(args.data_dir) 768 | eval_features = convert_examples_to_features( 769 | eval_examples, label_list, args.max_seq_length, tokenizer, output_mode) 770 | logger.info("***** Running evaluation *****") 771 | logger.info(" Num examples = %d", len(eval_examples)) 772 | logger.info(" Batch size = %d", args.eval_batch_size) 773 | all_title_ids = torch.tensor([f.title_ids for f in eval_features], dtype=torch.long) 774 | all_title_mask = torch.tensor([f.title_mask for f in eval_features], dtype=torch.long) 775 | all_title_segment = torch.tensor([f.title_segment for f in eval_features], dtype=torch.long) 776 | 777 | all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long) 778 | all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long) 779 | all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long) 780 | 781 | all_sdp_ids = torch.tensor([f.sdp_ids for f in eval_features], dtype=torch.long) 782 | all_sdp_mask = torch.tensor([f.sdp_mask for f in eval_features], dtype=torch.long) 783 | all_sdp_segment = torch.tensor([f.sdp_segment for f in eval_features], dtype=torch.long) 784 | 785 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in eval_features], dtype=torch.float) 786 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in eval_features], dtype=torch.float) 787 | 788 | if output_mode == "classification": 789 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long) 790 | elif output_mode == "regression": 791 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.float) 792 | 793 | eval_data = TensorDataset(all_title_ids, all_title_mask, all_title_segment, all_input_ids, all_input_mask, 
all_segment_ids, \ 794 | all_sdp_ids, all_sdp_mask, all_sdp_segment, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 795 | 796 | # Run prediction for full data 797 | eval_sampler = SequentialSampler(eval_data) 798 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 799 | 800 | model.eval() 801 | eval_loss = 0 802 | nb_eval_steps = 0 803 | preds = [] 804 | 805 | for title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list, label_ids in tqdm(eval_dataloader, desc="Evaluating"): 806 | title_ids = title_ids.to(device) 807 | title_mask = title_mask.to(device) 808 | title_segment = title_segment.to(device) 809 | input_ids = input_ids.to(device) 810 | input_mask = input_mask.to(device) 811 | segment_ids = segment_ids.to(device) 812 | sdp_ids = sdp_ids.to(device) 813 | sdp_mask = sdp_mask.to(device) 814 | sdp_segment = sdp_segment.to(device) 815 | P_gauss1_list = P_gauss1_list.to(device) 816 | P_gauss2_list = P_gauss2_list.to(device) 817 | label_ids = label_ids.to(device) 818 | 819 | with torch.no_grad(): 820 | logits = model(title_ids, title_segment, title_mask, input_ids, segment_ids, input_mask, sdp_ids, sdp_segment, sdp_mask, P_gauss1_list, P_gauss2_list, labels=None) 821 | 822 | # create eval loss and other metric required by the task 823 | if output_mode == "classification": 824 | loss_fct = CrossEntropyLoss() 825 | tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 826 | elif output_mode == "regression": 827 | loss_fct = MSELoss() 828 | tmp_eval_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 829 | 830 | eval_loss += tmp_eval_loss.mean().item() 831 | nb_eval_steps += 1 832 | if len(preds) == 0: 833 | preds.append(logits.detach().cpu().numpy()) 834 | else: 835 | preds[0] = np.append( 836 | preds[0], logits.detach().cpu().numpy(), axis=0) 837 | 838 | eval_loss = eval_loss / nb_eval_steps 839 | preds = preds[0] 840 | if output_mode == "classification": 841 | preds = np.argmax(preds, axis=1) 842 | elif output_mode == "regression": 843 | preds = np.squeeze(preds) 844 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 845 | loss = tr_loss/nb_tr_steps if args.do_train else None 846 | 847 | result['eval_loss'] = eval_loss 848 | result['global_step'] = global_step 849 | result['loss'] = loss 850 | 851 | output_eval_file = os.path.join(args.output_dir, "eval_results.txt") 852 | with open(output_eval_file, "w") as writer: 853 | logger.info("***** Eval results *****") 854 | for key in sorted(result.keys()): 855 | logger.info(" %s = %s", key, str(result[key])) 856 | writer.write("%s = %s\n" % (key, str(result[key]))) 857 | 858 | 859 | if args.do_predict and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 860 | test_examples = processor.get_test_examples(args.data_dir) 861 | test_features = convert_examples_to_features( 862 | test_examples, label_list, args.max_seq_length, tokenizer, output_mode) 863 | logger.info("***** Running prediction *****") 864 | logger.info(" Num examples = %d", len(test_examples)) 865 | logger.info(" Batch size = %d", args.predict_batch_size) 866 | all_title_ids = torch.tensor([f.title_ids for f in test_features], dtype=torch.long) 867 | all_title_mask = torch.tensor([f.title_mask for f in test_features], dtype=torch.long) 868 | all_title_segment = torch.tensor([f.title_segment for f in test_features], dtype=torch.long) 869 | 870 | all_input_ids = torch.tensor([f.input_ids for f in 
test_features], dtype=torch.long) 871 | all_input_mask = torch.tensor([f.input_mask for f in test_features], dtype=torch.long) 872 | all_segment_ids = torch.tensor([f.segment_ids for f in test_features], dtype=torch.long) 873 | 874 | all_sdp_ids = torch.tensor([f.sdp_ids for f in test_features], dtype=torch.long) 875 | all_sdp_mask = torch.tensor([f.sdp_mask for f in test_features], dtype=torch.long) 876 | all_sdp_segment = torch.tensor([f.sdp_segment for f in test_features], dtype=torch.long) 877 | 878 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in test_features], dtype=torch.float) 879 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in test_features], dtype=torch.float) 880 | 881 | if output_mode == "classification": 882 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.long) 883 | elif output_mode == "regression": 884 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.float) 885 | 886 | predict_data = TensorDataset(all_title_ids, all_title_mask, all_title_segment, all_input_ids, all_input_mask, all_segment_ids, all_sdp_ids, all_sdp_mask, all_sdp_segment, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 887 | 888 | # Run prediction for full data 889 | predict_sampler = SequentialSampler(predict_data) 890 | predict_dataloader = DataLoader(predict_data, sampler=predict_sampler, batch_size=args.predict_batch_size) 891 | 892 | model.eval() 893 | test_loss = 0 894 | nb_test_steps = 0 895 | preds = [] 896 | 897 | for title_ids, title_mask, title_segment, input_ids, input_mask, segment_ids, sdp_ids, sdp_mask, sdp_segment, P_gauss1_list, P_gauss2_list,label_ids in tqdm(predict_dataloader, desc="Predicting"): 898 | title_ids = title_ids.to(device) 899 | title_mask = title_mask.to(device) 900 | title_segment = title_segment.to(device) 901 | input_ids = input_ids.to(device) 902 | input_mask = input_mask.to(device) 903 | segment_ids = segment_ids.to(device) 904 | sdp_ids = sdp_ids.to(device) 905 | sdp_mask = sdp_mask.to(device) 906 | sdp_segment = sdp_segment.to(device) 907 | P_gauss1_list = P_gauss1_list.to(device) 908 | P_gauss2_list = P_gauss2_list.to(device) 909 | label_ids = label_ids.to(device) 910 | 911 | 912 | with torch.no_grad(): 913 | logits = model(title_ids, title_segment, title_mask, input_ids, segment_ids, input_mask,\ 914 | sdp_ids, sdp_segment, sdp_mask, P_gauss1_list, P_gauss2_list, labels=None) 915 | 916 | # create eval loss and other metric required by the task 917 | if output_mode == "classification": 918 | loss_fct = CrossEntropyLoss() 919 | tmp_test_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 920 | 921 | elif output_mode == "regression": 922 | loss_fct = MSELoss() 923 | tmp_test_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 924 | 925 | test_loss += tmp_test_loss.mean().item() 926 | nb_test_steps += 1 927 | if len(preds) == 0: 928 | preds.append(logits.detach().cpu().numpy()) 929 | else: 930 | preds[0] = np.append( 931 | preds[0], logits.detach().cpu().numpy(), axis=0) 932 | 933 | test_loss = test_loss / nb_test_steps 934 | preds = preds[0] 935 | if output_mode == "classification": 936 | preds = np.argmax(preds, axis=1) 937 | elif output_mode == "regression": 938 | preds = np.squeeze(preds) 939 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 940 | loss = tr_loss/nb_tr_steps if args.do_train else None 941 | 942 | result['test_loss'] = test_loss 943 | result['global_step'] = global_step 944 | result['loss'] = loss 945 | 946 | output_test_file = 
os.path.join(args.output_dir, "test_results.txt") 947 | with open(output_test_file, "w") as writer: 948 | logger.info("***** Predict results *****") 949 | for key in sorted(result.keys()): 950 | logger.info(" %s = %s", key, str(result[key])) 951 | writer.write("%s = %s\n" % (key, str(result[key]))) 952 | 953 | 954 | if __name__ == "__main__": 955 | main() -------------------------------------------------------------------------------- /SourceCode/BioRE_BG.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Author: Cong Sun 3 | # Email: suncong132@mail.dlut.edu.cn 4 | # 5 | # BioRE_BG 6 | # 7 | #-------------------------------------------------------------------------- 8 | 9 | 10 | 11 | from __future__ import absolute_import, division, print_function 12 | import argparse 13 | import csv 14 | import logging 15 | import os 16 | import random 17 | import sys 18 | import numpy as np 19 | import torch 20 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler,TensorDataset) 21 | from torch.utils.data.distributed import DistributedSampler 22 | from tqdm import tqdm, trange 23 | from torch.nn import CrossEntropyLoss, MSELoss 24 | import scipy 25 | from scipy.stats import pearsonr, spearmanr 26 | from sklearn.metrics import matthews_corrcoef, f1_score, precision_recall_fscore_support 27 | 28 | import modeling_BG 29 | from modeling_BG import BertForSequenceClassification, BertConfig, WEIGHTS_NAME, CONFIG_NAME 30 | from pytorch_pretrained_bert.file_utils import PYTORCH_PRETRAINED_BERT_CACHE 31 | from pytorch_pretrained_bert.tokenization import BertTokenizer 32 | from pytorch_pretrained_bert.optimization import BertAdam, warmup_linear 33 | import os 34 | 35 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 36 | datefmt = '%m/%d/%Y %H:%M:%S', 37 | level = logging.INFO) 38 | logger = logging.getLogger(__name__) 39 | 40 | sigma = 2.5 41 | mu = 0.0 42 | 43 | norm_0=scipy.stats.norm(mu,sigma).cdf(1)-scipy.stats.norm(mu,sigma).cdf(0) 44 | norm_1=scipy.stats.norm(mu,sigma).cdf(2)-scipy.stats.norm(mu,sigma).cdf(1) 45 | norm_2=scipy.stats.norm(mu,sigma).cdf(3)-scipy.stats.norm(mu,sigma).cdf(2) 46 | norm_3=scipy.stats.norm(mu,sigma).cdf(4)-scipy.stats.norm(mu,sigma).cdf(3) 47 | norm_4=scipy.stats.norm(mu,sigma).cdf(5)-scipy.stats.norm(mu,sigma).cdf(4) 48 | norm_5=scipy.stats.norm(mu,sigma).cdf(6)-scipy.stats.norm(mu,sigma).cdf(5) 49 | norm_6=scipy.stats.norm(mu,sigma).cdf(7)-scipy.stats.norm(mu,sigma).cdf(6) 50 | 51 | 52 | 53 | class InputExample(object): 54 | """A single training/test example for simple sequence classification.""" 55 | 56 | def __init__(self, guid, text_a, text_b=None, label=None): 57 | """Constructs a InputExample.""" 58 | self.guid = guid 59 | self.text_a = text_a 60 | self.text_b = text_b 61 | self.label = label 62 | 63 | 64 | class InputFeatures(object): 65 | """A single set of features of data.""" 66 | 67 | def __init__(self, input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list, label_id): 68 | 69 | self.input_ids = input_ids 70 | self.input_mask = input_mask 71 | self.segment_ids = segment_ids 72 | 73 | self.P_gauss1_list = P_gauss1_list 74 | self.P_gauss2_list = P_gauss2_list 75 | self.label_id = label_id 76 | 77 | 78 | class DataProcessor(object): 79 | """Base class for data converters for sequence classification data sets.""" 80 | 81 | def get_train_examples(self, data_dir): 82 | """Gets a collection of `InputExample`s for the train set.""" 83 | raise 
NotImplementedError() 84 | 85 | def get_dev_examples(self, data_dir): 86 | """Gets a collection of `InputExample`s for the dev set.""" 87 | raise NotImplementedError() 88 | 89 | def get_labels(self): 90 | """Gets the list of labels for this data set.""" 91 | raise NotImplementedError() 92 | 93 | @classmethod 94 | def _read_tsv(cls, input_file, quotechar=None): 95 | """Reads a tab separated value file.""" 96 | with open(input_file, "r") as f: 97 | reader = csv.reader(f, delimiter="\t", quotechar=quotechar) 98 | lines = [] 99 | for line in reader: 100 | if sys.version_info[0] == 2: 101 | line = list(unicode(cell, 'utf-8') for cell in line) 102 | lines.append(line) 103 | return lines 104 | 105 | 106 | class BioBERTChemprotProcessor(DataProcessor): 107 | """Processor for the BioBERT data set (GLUE version).""" 108 | 109 | def get_train_examples(self, data_dir): 110 | """See base class.""" 111 | return self._create_examples( 112 | self._read_tsv(os.path.join(data_dir, "train.tsv")), "train") 113 | 114 | def get_dev_examples(self, data_dir): 115 | """See base class.""" 116 | return self._create_examples( 117 | self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev") 118 | 119 | def get_test_examples(self, data_dir): 120 | """See base class.""" 121 | return self._create_examples( 122 | self._read_tsv(os.path.join(data_dir, "test.tsv")), "test") 123 | 124 | 125 | def get_labels(self): 126 | """See base class.""" 127 | return ["CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9", "false"] 128 | 129 | def _create_examples(self, lines, set_type): 130 | """Creates examples for the training and dev sets.""" 131 | examples = [] 132 | for (i, line) in enumerate(lines): 133 | if set_type == "train" and i == 0: 134 | continue 135 | if set_type == "dev" and i == 0: 136 | continue 137 | if set_type == "test" and i == 0: 138 | continue 139 | guid = "%s-%s" % (set_type, i) 140 | # title = line[1] 141 | text_a = line[2] 142 | # sdp = line[3] 143 | label = line[4] 144 | examples.append( 145 | InputExample(guid=guid, text_a=text_a, text_b=None, label=label)) 146 | return examples 147 | 148 | 149 | class BioBERT_DDIProcessor(DataProcessor): 150 | """Processor for the BioBERT data set (GLUE version).""" 151 | 152 | def get_train_examples(self, data_dir): 153 | """See base class.""" 154 | return self._create_examples( 155 | self._read_tsv(os.path.join(data_dir, "train.tsv")), "train") 156 | 157 | def get_dev_examples(self, data_dir): 158 | """See base class.""" 159 | return self._create_examples( 160 | self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev") 161 | 162 | def get_test_examples(self, data_dir): 163 | """See base class.""" 164 | return self._create_examples( 165 | self._read_tsv(os.path.join(data_dir, "test.tsv")), "test") 166 | 167 | def get_labels(self): 168 | """See base class.""" 169 | return ["DDI-advise", "DDI-effect", "DDI-mechanism", "DDI-int", "DDI-false"] 170 | 171 | def _create_examples(self, lines, set_type): 172 | """Creates examples for the training and dev sets.""" 173 | examples = [] 174 | for (i, line) in enumerate(lines): 175 | if set_type == "train" and i == 0: 176 | continue 177 | if set_type == "dev" and i == 0: 178 | continue 179 | if set_type == "test" and i == 0: 180 | continue 181 | guid = "%s-%s" % (set_type, i) 182 | text_a = line[1] 183 | label = line[2] 184 | examples.append( 185 | InputExample(guid=guid, text_a=text_a, text_b=None, label=label)) 186 | return examples 187 | 188 | 189 | def convert_examples_to_features(examples, label_list, max_seq_length, 190 | tokenizer, 
output_mode): 191 | """Loads a data file into a list of `InputBatch`s.""" 192 | 193 | label_map = {label : i for i, label in enumerate(label_list)} 194 | 195 | features = [] 196 | for (ex_index, example) in enumerate(examples): 197 | if ex_index % 10000 == 0: 198 | logger.info("Writing example %d of %d" % (ex_index, len(examples))) 199 | 200 | tokens_a = tokenizer.tokenize(example.text_a) 201 | 202 | tokens_b = None 203 | if example.text_b: 204 | tokens_b = tokenizer.tokenize(example.text_b) 205 | # Modifies `tokens_a` and `tokens_b` in place so that the total 206 | # length is less than the specified length. 207 | # Account for [CLS], [SEP], [SEP] with "- 3" 208 | _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3) 209 | else: 210 | # Account for [CLS] and [SEP] with "- 2" 211 | if len(tokens_a) > max_seq_length - 2: 212 | tokens_a = tokens_a[:(max_seq_length - 2)] 213 | 214 | tokens = ["[CLS]"] + tokens_a + ["[SEP]"] 215 | segment_ids = [0] * len(tokens) 216 | 217 | if tokens_b: 218 | tokens += tokens_b + ["[SEP]"] 219 | segment_ids += [1] * (len(tokens_b) + 1) 220 | 221 | if '@' not in tokens: 222 | P_gauss1_array = np.zeros(max_seq_length) 223 | P_gauss2_array = np.zeros(max_seq_length) 224 | else: 225 | ann1_index=tokens.index('@') 226 | if '@' not in tokens[ann1_index+1:]: 227 | P_array = np.zeros(max_seq_length) 228 | P_array[ann1_index-1] = norm_1 229 | P_array[ann1_index] = norm_0 230 | P_array[ann1_index+1] = norm_0 231 | if ann1_index+2 < max_seq_length: 232 | P_array[ann1_index+2] = norm_1 233 | if ann1_index+3 < max_seq_length: 234 | P_array[ann1_index+3] = norm_2 235 | if ann1_index+4 < max_seq_length: 236 | P_array[ann1_index+4] = norm_3 237 | if ann1_index+5 < max_seq_length: 238 | P_array[ann1_index+5] = norm_4 239 | if ann1_index+6 < max_seq_length: 240 | P_array[ann1_index+6] = norm_5 241 | if ann1_index+7 < max_seq_length: 242 | P_array[ann1_index+7] = norm_6 243 | if ann1_index-2 >= 0: 244 | P_array[ann1_index-2] = norm_2 245 | if ann1_index-3 >= 0: 246 | P_array[ann1_index-3] = norm_3 247 | if ann1_index-4 >= 0: 248 | P_array[ann1_index-4] = norm_4 249 | if ann1_index-5 >= 0: 250 | P_array[ann1_index-5] = norm_5 251 | if ann1_index-6 >= 0: 252 | P_array[ann1_index-6] = norm_6 253 | P_gauss1_array = P_array 254 | P_gauss2_array = P_array 255 | else: 256 | ann2_index=tokens.index('@',ann1_index+1) 257 | P1_array = np.zeros(max_seq_length) 258 | P2_array = np.zeros(max_seq_length) 259 | 260 | P1_array[ann1_index-1] = norm_1 261 | P1_array[ann1_index] = norm_0 262 | P1_array[ann1_index+1] = norm_0 263 | if ann1_index+2 < max_seq_length: 264 | P1_array[ann1_index+2] = norm_1 265 | if ann1_index+3 < max_seq_length: 266 | P1_array[ann1_index+3] = norm_2 267 | if ann1_index+4 < max_seq_length: 268 | P1_array[ann1_index+4] = norm_3 269 | if ann1_index+5 < max_seq_length: 270 | P1_array[ann1_index+5] = norm_4 271 | if ann1_index+6 < max_seq_length: 272 | P1_array[ann1_index+6] = norm_5 273 | if ann1_index+7 < max_seq_length: 274 | P1_array[ann1_index+7] = norm_6 275 | if ann1_index-2 >=0: 276 | P1_array[ann1_index-2] = norm_2 277 | if ann1_index-3 >=0: 278 | P1_array[ann1_index-3] = norm_3 279 | if ann1_index-4 >=0: 280 | P1_array[ann1_index-4] = norm_4 281 | if ann1_index-5 >=0: 282 | P1_array[ann1_index-5] = norm_5 283 | if ann1_index-6 >=0: 284 | P1_array[ann1_index-6] = norm_6 285 | 286 | P2_array[ann2_index-1] = norm_1 287 | P2_array[ann2_index] = norm_0 288 | P2_array[ann2_index+1] = norm_0 289 | if ann2_index+2 < max_seq_length: 290 | P2_array[ann2_index+2] = 
norm_1 291 | if ann2_index+3 < max_seq_length: 292 | P2_array[ann2_index+3] = norm_2 293 | if ann2_index+4 < max_seq_length: 294 | P2_array[ann2_index+4] = norm_3 295 | if ann2_index+5 < max_seq_length: 296 | P2_array[ann2_index+5] = norm_4 297 | if ann2_index+6 < max_seq_length: 298 | P2_array[ann2_index+6] = norm_5 299 | if ann2_index+7 < max_seq_length: 300 | P2_array[ann2_index+7] = norm_6 301 | if ann2_index-2 >=0: 302 | P2_array[ann2_index-2] = norm_2 303 | if ann2_index-3 >= 0: 304 | P2_array[ann2_index-3] = norm_3 305 | if ann2_index-4 >= 0: 306 | P2_array[ann2_index-4] = norm_4 307 | if ann2_index-5 >= 0: 308 | P2_array[ann2_index-5] = norm_5 309 | if ann2_index-6 >= 0: 310 | P2_array[ann2_index-6] = norm_6 311 | P_gauss1_array = P1_array 312 | P_gauss2_array = P2_array 313 | 314 | P_gauss1_list= list(P_gauss1_array) 315 | P_gauss2_list= list(P_gauss2_array) 316 | 317 | 318 | input_ids = tokenizer.convert_tokens_to_ids(tokens) 319 | # The mask has 1 for real tokens and 0 for padding tokens. Only real 320 | # tokens are attended to. 321 | input_mask = [1] * len(input_ids) 322 | 323 | 324 | # Zero-pad up to the sequence length. 325 | padding = [0] * (max_seq_length - len(input_ids)) 326 | input_ids += padding 327 | input_mask += padding 328 | segment_ids += padding 329 | 330 | assert len(input_ids) == max_seq_length 331 | assert len(input_mask) == max_seq_length 332 | assert len(segment_ids) == max_seq_length 333 | assert len(P_gauss1_list) == max_seq_length 334 | assert len(P_gauss2_list) == max_seq_length 335 | 336 | if output_mode == "classification": 337 | label_id = label_map[example.label] 338 | elif output_mode == "regression": 339 | label_id = float(example.label) 340 | else: 341 | raise KeyError(output_mode) 342 | 343 | if ex_index < 5: 344 | logger.info("*** Example ***") 345 | logger.info("guid: %s" % (example.guid)) 346 | logger.info("tokens: %s" % " ".join([str(x) for x in tokens])) 347 | logger.info("input_ids: %s" % " ".join([str(x) for x in input_ids])) 348 | logger.info("input_mask: %s" % " ".join([str(x) for x in input_mask])) 349 | logger.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids])) 350 | logger.info("P_gauss1_list: %s" % " ".join([str(x) for x in P_gauss1_list])) 351 | logger.info("P_gauss2_list: %s" % " ".join([str(x) for x in P_gauss2_list])) 352 | logger.info("label: %s (id = %d)" % (example.label, label_id)) 353 | 354 | features.append( 355 | InputFeatures(input_ids=input_ids, 356 | input_mask=input_mask, 357 | segment_ids=segment_ids, 358 | P_gauss1_list=P_gauss1_list, 359 | P_gauss2_list=P_gauss2_list, 360 | label_id=label_id)) 361 | return features 362 | 363 | 364 | def _truncate_seq_pair(tokens_a, tokens_b, max_length): 365 | """Truncates a sequence pair in place to the maximum length.""" 366 | 367 | # This is a simple heuristic which will always truncate the longer sequence 368 | # one token at a time. This makes more sense than truncating an equal percent 369 | # of tokens from each, since if one sequence is very short then each token 370 | # that's truncated likely contains more information than a longer sequence. 
371 | while True: 372 | total_length = len(tokens_a) + len(tokens_b) 373 | if total_length <= max_length: 374 | break 375 | if len(tokens_a) > len(tokens_b): 376 | tokens_a.pop() 377 | else: 378 | tokens_b.pop() 379 | 380 | def f1_chemprot(preds, labels): 381 | p,r,f,s = precision_recall_fscore_support(y_pred=preds, y_true=labels, labels=[0,1,2,3,4], average="micro") 382 | return { 383 | "P": p, 384 | "R": r, 385 | "F1": f, 386 | } 387 | 388 | def f1_ddi(preds, labels): 389 | p,r,f,s = precision_recall_fscore_support(y_pred=preds, y_true=labels, labels=[0,1,2,3], average="micro") 390 | return { 391 | "P": p, 392 | "R": r, 393 | "F1": f, 394 | } 395 | 396 | def compute_metrics(task_name, preds, labels): 397 | assert len(preds) == len(labels) 398 | if task_name == "cpi": 399 | return f1_chemprot(preds, labels) 400 | elif task_name == "ddi": 401 | return f1_ddi(preds, labels) 402 | else: 403 | raise KeyError(task_name) 404 | 405 | def main(): 406 | parser = argparse.ArgumentParser() 407 | 408 | ## Required parameters 409 | parser.add_argument("--data_dir", 410 | default=None, 411 | type=str, 412 | required=True, 413 | help="The input data dir. Should contain the .tsv files (or other data files) for the task.") 414 | parser.add_argument("--bert_model", default=None, type=str, required=True, 415 | help="Bert pre-trained model selected in the list: bert-base-uncased, " 416 | "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, " 417 | "bert-base-multilingual-cased, bert-base-chinese.") 418 | parser.add_argument("--task_name", 419 | default=None, 420 | type=str, 421 | required=True, 422 | help="The name of the task to train.") 423 | parser.add_argument("--output_dir", 424 | default=None, 425 | type=str, 426 | required=True, 427 | help="The output directory where the model predictions and checkpoints will be written.") 428 | 429 | ## Other parameters 430 | parser.add_argument("--cache_dir", 431 | default="", 432 | type=str, 433 | help="Where do you want to store the pre-trained models downloaded from s3") 434 | parser.add_argument("--max_seq_length", 435 | default=128, 436 | type=int, 437 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 438 | "Sequences longer than this will be truncated, and sequences shorter \n" 439 | "than this will be padded.") 440 | parser.add_argument("--saved_model", default=None, type=str, 441 | help="saved model") 442 | parser.add_argument("--do_train", 443 | action='store_true', 444 | help="Whether to run training.") 445 | parser.add_argument("--do_eval", 446 | action='store_true', 447 | help="Whether to run eval on the dev set.") 448 | parser.add_argument("--do_predict", 449 | action='store_true', 450 | help="Whether to run test on the test set.") 451 | parser.add_argument("--do_lower_case", 452 | action='store_true', 453 | help="Set this flag if you are using an uncased model.") 454 | parser.add_argument("--train_batch_size", 455 | default=16, 456 | type=int, 457 | help="Total batch size for training.") 458 | parser.add_argument("--eval_batch_size", 459 | default=8, 460 | type=int, 461 | help="Total batch size for eval.") 462 | parser.add_argument("--predict_batch_size", 463 | default=8, 464 | type=int, 465 | help="Total batch size for predict.") 466 | parser.add_argument("--learning_rate", 467 | default=5e-5, 468 | type=float, 469 | help="The initial learning rate for Adam.") 470 | parser.add_argument("--num_train_epochs", 471 | default=2.0, 472 | type=float, 473 | help="Total number of training epochs to perform.") 474 | parser.add_argument("--warmup_proportion", 475 | default=0.1, 476 | type=float, 477 | help="Proportion of training to perform linear learning rate warmup for. " 478 | "E.g., 0.1 = 10%% of training.") 479 | parser.add_argument("--no_cuda", 480 | action='store_true', 481 | help="Whether not to use CUDA when available") 482 | parser.add_argument("--local_rank", 483 | type=int, 484 | default=-1, 485 | help="local_rank for distributed training on gpus") 486 | parser.add_argument('--seed', 487 | type=int, 488 | default=47, 489 | help="random seed for initialization") 490 | parser.add_argument('--gradient_accumulation_steps', 491 | type=int, 492 | default=1, 493 | help="Number of updates steps to accumulate before performing a backward/update pass.") 494 | parser.add_argument('--fp16', 495 | action='store_true', 496 | help="Whether to use 16-bit float precision instead of 32-bit") 497 | parser.add_argument('--loss_scale', 498 | type=float, default=0, 499 | help="Loss scaling to improve fp16 numeric stability. 
Only used when fp16 set to True.\n" 500 | "0 (default value): dynamic loss scaling.\n" 501 | "Positive power of 2: static loss scaling value.\n") 502 | parser.add_argument('--server_ip', type=str, default='', help="Can be used for distant debugging.") 503 | parser.add_argument('--server_port', type=str, default='', help="Can be used for distant debugging.") 504 | 505 | args = parser.parse_args() 506 | 507 | if args.server_ip and args.server_port: 508 | # Distant debugging - see https://code.visualstudio.com/docs/python/debugging#_attach-to-a-local-script 509 | import ptvsd 510 | print("Waiting for debugger attach") 511 | ptvsd.enable_attach(address=(args.server_ip, args.server_port), redirect_output=True) 512 | ptvsd.wait_for_attach() 513 | 514 | processors = { 515 | "cpi": BioBERTChemprotProcessor, 516 | "ddi": BioBERT_DDIProcessor, 517 | } 518 | 519 | output_modes = { 520 | "cpi": "classification", 521 | "ddi": "classification", 522 | } 523 | 524 | if args.local_rank == -1 or args.no_cuda: 525 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 526 | n_gpu = torch.cuda.device_count() 527 | else: 528 | torch.cuda.set_device(args.local_rank) 529 | device = torch.device("cuda", args.local_rank) 530 | n_gpu = 1 531 | # Initializes the distributed backend which will take care of sychronizing nodes/GPUs 532 | torch.distributed.init_process_group(backend='nccl') 533 | logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format( 534 | device, n_gpu, bool(args.local_rank != -1), args.fp16)) 535 | 536 | if args.gradient_accumulation_steps < 1: 537 | raise ValueError("Invalid gradient_accumulation_steps parameter: {}, should be >= 1".format( 538 | args.gradient_accumulation_steps)) 539 | 540 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 541 | 542 | random.seed(args.seed) 543 | np.random.seed(args.seed) 544 | torch.manual_seed(args.seed) 545 | if n_gpu > 0: 546 | torch.cuda.manual_seed_all(args.seed) 547 | 548 | if not args.do_train and not args.do_eval: 549 | raise ValueError("At least one of `do_train` or `do_eval` must be True.") 550 | 551 | if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train: 552 | raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir)) 553 | if not os.path.exists(args.output_dir): 554 | os.makedirs(args.output_dir) 555 | 556 | task_name = args.task_name.lower() 557 | 558 | if task_name not in processors: 559 | raise ValueError("Task not found: %s" % (task_name)) 560 | 561 | processor = processors[task_name]() 562 | output_mode = output_modes[task_name] 563 | 564 | label_list = processor.get_labels() 565 | num_labels = len(label_list) 566 | 567 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case, 568 | never_split = ("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]", "CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9")) 569 | 570 | tokenizer.vocab["CPR:3"] = 3 571 | tokenizer.vocab["CPR:4"] = 4 572 | tokenizer.vocab["CPR:5"] = 5 573 | tokenizer.vocab["CPR:6"] = 6 574 | tokenizer.vocab["CPR:9"] = 9 575 | train_examples = None 576 | num_train_optimization_steps = None 577 | 578 | 579 | if args.do_train: 580 | train_examples = processor.get_train_examples(args.data_dir) 581 | num_train_optimization_steps = int( 582 | len(train_examples) / args.train_batch_size / args.gradient_accumulation_steps) * args.num_train_epochs 583 | if args.local_rank != -1: 584 | 
num_train_optimization_steps = num_train_optimization_steps // torch.distributed.get_world_size() 585 | 586 | # Prepare model 587 | cache_dir = args.cache_dir if args.cache_dir else os.path.join(str(PYTORCH_PRETRAINED_BERT_CACHE), 'distributed_{}'.format(args.local_rank)) 588 | model = BertForSequenceClassification.from_pretrained(args.bert_model, 589 | cache_dir=cache_dir, 590 | num_labels=num_labels) 591 | if args.fp16: 592 | model.half() 593 | model.to(device) 594 | if args.local_rank != -1: 595 | try: 596 | from apex.parallel import DistributedDataParallel as DDP 597 | except ImportError: 598 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 599 | 600 | model = DDP(model) 601 | elif n_gpu > 1: 602 | model = torch.nn.DataParallel(model) 603 | 604 | # Prepare optimizer 605 | param_optimizer = list(model.named_parameters()) 606 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 607 | optimizer_grouped_parameters = [ 608 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 609 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0} 610 | ] 611 | if args.fp16: 612 | try: 613 | from apex.optimizers import FP16_Optimizer 614 | from apex.optimizers import FusedAdam 615 | except ImportError: 616 | raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.") 617 | 618 | optimizer = FusedAdam(optimizer_grouped_parameters, 619 | lr=args.learning_rate, 620 | bias_correction=False, 621 | max_grad_norm=1.0) 622 | if args.loss_scale == 0: 623 | optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True) 624 | else: 625 | optimizer = FP16_Optimizer(optimizer, static_loss_scale=args.loss_scale) 626 | 627 | else: 628 | optimizer = BertAdam(optimizer_grouped_parameters, 629 | lr=args.learning_rate, 630 | warmup=args.warmup_proportion, 631 | t_total=num_train_optimization_steps, 632 | e=1e-08) 633 | 634 | global_step = 0 635 | nb_tr_steps = 0 636 | tr_loss = 0 637 | if args.do_train: 638 | train_features = convert_examples_to_features( 639 | train_examples, label_list, args.max_seq_length, tokenizer, output_mode) 640 | logger.info("***** Running training *****") 641 | logger.info(" Num examples = %d", len(train_examples)) 642 | logger.info(" Batch size = %d", args.train_batch_size) 643 | logger.info(" Num steps = %d", num_train_optimization_steps) 644 | 645 | all_input_ids = torch.tensor([f.input_ids for f in train_features], dtype=torch.long) 646 | all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long) 647 | all_segment_ids = torch.tensor([f.segment_ids for f in train_features], dtype=torch.long) 648 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in train_features], dtype=torch.float) 649 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in train_features], dtype=torch.float) 650 | 651 | if output_mode == "classification": 652 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.long) 653 | elif output_mode == "regression": 654 | all_label_ids = torch.tensor([f.label_id for f in train_features], dtype=torch.float) 655 | 656 | train_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_P_gauss1_list, all_P_gauss2_list,all_label_ids) 657 | 658 | if args.local_rank == -1: 659 | train_sampler = RandomSampler(train_data) 660 | else: 661 | train_sampler = 
DistributedSampler(train_data) 662 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 663 | 664 | model.train() 665 | for _ in trange(int(args.num_train_epochs), desc="Epoch"): 666 | tr_loss = 0 667 | nb_tr_examples, nb_tr_steps = 0, 0 668 | for step, batch in enumerate(tqdm(train_dataloader, desc="Iteration")): 669 | batch = tuple(t.to(device) for t in batch) 670 | input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list, label_ids = batch 671 | 672 | # define a new function to compute loss values for both output_modes 673 | logits = model(input_ids, segment_ids, input_mask, P_gauss1_list, P_gauss2_list, labels=None) 674 | 675 | if output_mode == "classification": 676 | loss_fct = CrossEntropyLoss() 677 | loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 678 | elif output_mode == "regression": 679 | loss_fct = MSELoss() 680 | loss = loss_fct(logits.view(-1), label_ids.view(-1)) 681 | 682 | if n_gpu > 1: 683 | loss = loss.mean() # mean() to average on multi-gpu. 684 | if args.gradient_accumulation_steps > 1: 685 | loss = loss / args.gradient_accumulation_steps 686 | 687 | if args.fp16: 688 | optimizer.backward(loss) 689 | else: 690 | loss.backward() 691 | 692 | tr_loss += loss.item() 693 | nb_tr_examples += input_ids.size(0) 694 | nb_tr_steps += 1 695 | if (step + 1) % args.gradient_accumulation_steps == 0: 696 | if args.fp16: 697 | # modify learning rate with special warm up BERT uses 698 | # if args.fp16 is False, BertAdam is used that handles this automatically 699 | lr_this_step = args.learning_rate * warmup_linear(global_step/num_train_optimization_steps, args.warmup_proportion) 700 | for param_group in optimizer.param_groups: 701 | param_group['lr'] = lr_this_step 702 | optimizer.step() 703 | optimizer.zero_grad() 704 | global_step += 1 705 | 706 | # Save a trained model and the associated configuration 707 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model it-self 708 | output_model_file = os.path.join(args.output_dir, WEIGHTS_NAME) 709 | torch.save(model_to_save.state_dict(), output_model_file) 710 | output_config_file = os.path.join(args.output_dir, CONFIG_NAME) 711 | with open(output_config_file, 'w') as f: 712 | f.write(model_to_save.config.to_json_string()) 713 | 714 | else: 715 | output_model_file = os.path.join(args.saved_model, 'pytorch_model.bin') 716 | output_config_file = os.path.join(args.saved_model, 'bert_config.json') 717 | config = BertConfig(output_config_file) 718 | model = BertForSequenceClassification(config, num_labels=num_labels) 719 | model.load_state_dict(torch.load(output_model_file)) 720 | model.to(device) 721 | 722 | 723 | if args.do_eval and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 724 | eval_examples = processor.get_dev_examples(args.data_dir) 725 | eval_features = convert_examples_to_features( 726 | eval_examples, label_list, args.max_seq_length, tokenizer, output_mode) 727 | logger.info("***** Running evaluation *****") 728 | logger.info(" Num examples = %d", len(eval_examples)) 729 | logger.info(" Batch size = %d", args.eval_batch_size) 730 | 731 | all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long) 732 | all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long) 733 | all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long) 734 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in eval_features], dtype=torch.float) 735 
| all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in eval_features], dtype=torch.float) 736 | 737 | if output_mode == "classification": 738 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.long) 739 | elif output_mode == "regression": 740 | all_label_ids = torch.tensor([f.label_id for f in eval_features], dtype=torch.float) 741 | 742 | eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 743 | 744 | # Run prediction for full data 745 | eval_sampler = SequentialSampler(eval_data) 746 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 747 | 748 | model.eval() 749 | eval_loss = 0 750 | nb_eval_steps = 0 751 | preds = [] 752 | 753 | for input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list, label_ids in tqdm(eval_dataloader, desc="Evaluating"): 754 | input_ids = input_ids.to(device) 755 | input_mask = input_mask.to(device) 756 | segment_ids = segment_ids.to(device) 757 | P_gauss1_list = P_gauss1_list.to(device) 758 | P_gauss2_list = P_gauss2_list.to(device) 759 | label_ids = label_ids.to(device) 760 | 761 | with torch.no_grad(): 762 | logits = model(input_ids, segment_ids, input_mask, P_gauss1_list, P_gauss2_list, labels=None) 763 | 764 | # create eval loss and other metric required by the task 765 | if output_mode == "classification": 766 | loss_fct = CrossEntropyLoss() 767 | tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 768 | elif output_mode == "regression": 769 | loss_fct = MSELoss() 770 | tmp_eval_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 771 | 772 | eval_loss += tmp_eval_loss.mean().item() 773 | nb_eval_steps += 1 774 | if len(preds) == 0: 775 | preds.append(logits.detach().cpu().numpy()) 776 | else: 777 | preds[0] = np.append( 778 | preds[0], logits.detach().cpu().numpy(), axis=0) 779 | 780 | eval_loss = eval_loss / nb_eval_steps 781 | preds = preds[0] 782 | if output_mode == "classification": 783 | preds = np.argmax(preds, axis=1) 784 | elif output_mode == "regression": 785 | preds = np.squeeze(preds) 786 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 787 | loss = tr_loss/nb_tr_steps if args.do_train else None 788 | 789 | result['eval_loss'] = eval_loss 790 | result['global_step'] = global_step 791 | result['loss'] = loss 792 | 793 | output_eval_file = os.path.join(args.output_dir, "eval_results.txt") 794 | with open(output_eval_file, "w") as writer: 795 | logger.info("***** Eval results *****") 796 | for key in sorted(result.keys()): 797 | logger.info(" %s = %s", key, str(result[key])) 798 | writer.write("%s = %s\n" % (key, str(result[key]))) 799 | 800 | 801 | if args.do_predict and (args.local_rank == -1 or torch.distributed.get_rank() == 0): 802 | test_examples = processor.get_test_examples(args.data_dir) 803 | test_features = convert_examples_to_features( 804 | test_examples, label_list, args.max_seq_length, tokenizer, output_mode) 805 | logger.info("***** Running prediction *****") 806 | logger.info(" Num examples = %d", len(test_examples)) 807 | logger.info(" Batch size = %d", args.predict_batch_size) 808 | 809 | all_input_ids = torch.tensor([f.input_ids for f in test_features], dtype=torch.long) 810 | all_input_mask = torch.tensor([f.input_mask for f in test_features], dtype=torch.long) 811 | all_segment_ids = torch.tensor([f.segment_ids for f in test_features], dtype=torch.long) 812 | all_P_gauss1_list = torch.tensor([f.P_gauss1_list for f in 
test_features], dtype=torch.float) 813 | all_P_gauss2_list = torch.tensor([f.P_gauss2_list for f in test_features], dtype=torch.float) 814 | 815 | if output_mode == "classification": 816 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.long) 817 | elif output_mode == "regression": 818 | all_label_ids = torch.tensor([f.label_id for f in test_features], dtype=torch.float) 819 | 820 | predict_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_P_gauss1_list, all_P_gauss2_list, all_label_ids) 821 | 822 | # Run prediction for full data 823 | predict_sampler = SequentialSampler(predict_data) 824 | predict_dataloader = DataLoader(predict_data, sampler=predict_sampler, batch_size=args.predict_batch_size) 825 | 826 | model.eval() 827 | test_loss = 0 828 | nb_test_steps = 0 829 | preds = [] 830 | 831 | for input_ids, input_mask, segment_ids, P_gauss1_list, P_gauss2_list,label_ids in tqdm(predict_dataloader, desc="Predicting"): 832 | input_ids = input_ids.to(device) 833 | input_mask = input_mask.to(device) 834 | segment_ids = segment_ids.to(device) 835 | P_gauss1_list = P_gauss1_list.to(device) 836 | P_gauss2_list = P_gauss2_list.to(device) 837 | label_ids = label_ids.to(device) 838 | 839 | 840 | with torch.no_grad(): 841 | logits = model(input_ids, segment_ids, input_mask, P_gauss1_list, P_gauss2_list, labels=None) 842 | 843 | # create eval loss and other metric required by the task 844 | if output_mode == "classification": 845 | loss_fct = CrossEntropyLoss() 846 | tmp_test_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1)) 847 | 848 | elif output_mode == "regression": 849 | loss_fct = MSELoss() 850 | tmp_test_loss = loss_fct(logits.view(-1), label_ids.view(-1)) 851 | 852 | test_loss += tmp_test_loss.mean().item() 853 | nb_test_steps += 1 854 | if len(preds) == 0: 855 | preds.append(logits.detach().cpu().numpy()) 856 | else: 857 | preds[0] = np.append( 858 | preds[0], logits.detach().cpu().numpy(), axis=0) 859 | 860 | test_loss = test_loss / nb_test_steps 861 | preds = preds[0] 862 | if output_mode == "classification": 863 | preds = np.argmax(preds, axis=1) 864 | elif output_mode == "regression": 865 | preds = np.squeeze(preds) 866 | result = compute_metrics(task_name, preds, all_label_ids.numpy()) 867 | loss = tr_loss/nb_tr_steps if args.do_train else None 868 | 869 | result['test_loss'] = test_loss 870 | result['global_step'] = global_step 871 | result['loss'] = loss 872 | 873 | output_test_file = os.path.join(args.output_dir, "test_results.txt") 874 | with open(output_test_file, "w") as writer: 875 | logger.info("***** Predict results *****") 876 | for key in sorted(result.keys()): 877 | logger.info(" %s = %s", key, str(result[key])) 878 | writer.write("%s = %s\n" % (key, str(result[key]))) 879 | 880 | 881 | if __name__ == "__main__": 882 | main() -------------------------------------------------------------------------------- /SourceCode/file_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Utilities for working with the local dataset cache. 3 | This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp 4 | Copyright by the AllenNLP authors. 
5 | """ 6 | from __future__ import (absolute_import, division, print_function, unicode_literals) 7 | 8 | import json 9 | import logging 10 | import os 11 | import shutil 12 | import tempfile 13 | from functools import wraps 14 | from hashlib import sha256 15 | import sys 16 | from io import open 17 | 18 | import boto3 19 | import requests 20 | from botocore.exceptions import ClientError 21 | from tqdm import tqdm 22 | 23 | try: 24 | from urllib.parse import urlparse 25 | except ImportError: 26 | from urlparse import urlparse 27 | 28 | try: 29 | from pathlib import Path 30 | PYTORCH_PRETRAINED_BERT_CACHE = Path(os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 31 | Path.home() / '.pytorch_pretrained_bert')) 32 | except (AttributeError, ImportError): 33 | PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 34 | os.path.join(os.path.expanduser("~"), '.pytorch_pretrained_bert')) 35 | 36 | logger = logging.getLogger(__name__) # pylint: disable=invalid-name 37 | 38 | 39 | def url_to_filename(url, etag=None): 40 | """ 41 | Convert `url` into a hashed filename in a repeatable way. 42 | If `etag` is specified, append its hash to the url's, delimited 43 | by a period. 44 | """ 45 | url_bytes = url.encode('utf-8') 46 | url_hash = sha256(url_bytes) 47 | filename = url_hash.hexdigest() 48 | 49 | if etag: 50 | etag_bytes = etag.encode('utf-8') 51 | etag_hash = sha256(etag_bytes) 52 | filename += '.' + etag_hash.hexdigest() 53 | 54 | return filename 55 | 56 | 57 | def filename_to_url(filename, cache_dir=None): 58 | """ 59 | Return the url and etag (which may be ``None``) stored for `filename`. 60 | Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 61 | """ 62 | if cache_dir is None: 63 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 64 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 65 | cache_dir = str(cache_dir) 66 | 67 | cache_path = os.path.join(cache_dir, filename) 68 | if not os.path.exists(cache_path): 69 | raise EnvironmentError("file {} not found".format(cache_path)) 70 | 71 | meta_path = cache_path + '.json' 72 | if not os.path.exists(meta_path): 73 | raise EnvironmentError("file {} not found".format(meta_path)) 74 | 75 | with open(meta_path, encoding="utf-8") as meta_file: 76 | metadata = json.load(meta_file) 77 | url = metadata['url'] 78 | etag = metadata['etag'] 79 | 80 | return url, etag 81 | 82 | 83 | def cached_path(url_or_filename, cache_dir=None): 84 | """ 85 | Given something that might be a URL (or might be a local path), 86 | determine which. If it's a URL, download the file and cache it, and 87 | return the path to the cached file. If it's already a local path, 88 | make sure the file exists and then return the path. 89 | """ 90 | if cache_dir is None: 91 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 92 | if sys.version_info[0] == 3 and isinstance(url_or_filename, Path): 93 | url_or_filename = str(url_or_filename) 94 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 95 | cache_dir = str(cache_dir) 96 | 97 | parsed = urlparse(url_or_filename) 98 | 99 | if parsed.scheme in ('http', 'https', 's3'): 100 | # URL, so get it from the cache (downloading if necessary) 101 | return get_from_cache(url_or_filename, cache_dir) 102 | elif os.path.exists(url_or_filename): 103 | # File, and it exists. 104 | return url_or_filename 105 | elif parsed.scheme == '': 106 | # File, but it doesn't exist. 
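            # (e.g. a bare name such as "my_model.bin" with no URL scheme that
            # is not present on disk ends up in this branch.)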
107 | raise EnvironmentError("file {} not found".format(url_or_filename)) 108 | else: 109 | # Something unknown 110 | raise ValueError("unable to parse {} as a URL or as a local path".format(url_or_filename)) 111 | 112 | 113 | def split_s3_path(url): 114 | """Split a full s3 path into the bucket name and path.""" 115 | parsed = urlparse(url) 116 | if not parsed.netloc or not parsed.path: 117 | raise ValueError("bad s3 path {}".format(url)) 118 | bucket_name = parsed.netloc 119 | s3_path = parsed.path 120 | # Remove '/' at beginning of path. 121 | if s3_path.startswith("/"): 122 | s3_path = s3_path[1:] 123 | return bucket_name, s3_path 124 | 125 | 126 | def s3_request(func): 127 | """ 128 | Wrapper function for s3 requests in order to create more helpful error 129 | messages. 130 | """ 131 | 132 | @wraps(func) 133 | def wrapper(url, *args, **kwargs): 134 | try: 135 | return func(url, *args, **kwargs) 136 | except ClientError as exc: 137 | if int(exc.response["Error"]["Code"]) == 404: 138 | raise EnvironmentError("file {} not found".format(url)) 139 | else: 140 | raise 141 | 142 | return wrapper 143 | 144 | 145 | @s3_request 146 | def s3_etag(url): 147 | """Check ETag on S3 object.""" 148 | s3_resource = boto3.resource("s3") 149 | bucket_name, s3_path = split_s3_path(url) 150 | s3_object = s3_resource.Object(bucket_name, s3_path) 151 | return s3_object.e_tag 152 | 153 | 154 | @s3_request 155 | def s3_get(url, temp_file): 156 | """Pull a file directly from S3.""" 157 | s3_resource = boto3.resource("s3") 158 | bucket_name, s3_path = split_s3_path(url) 159 | s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) 160 | 161 | 162 | def http_get(url, temp_file): 163 | req = requests.get(url, stream=True) 164 | content_length = req.headers.get('Content-Length') 165 | total = int(content_length) if content_length is not None else None 166 | progress = tqdm(unit="B", total=total) 167 | for chunk in req.iter_content(chunk_size=1024): 168 | if chunk: # filter out keep-alive new chunks 169 | progress.update(len(chunk)) 170 | temp_file.write(chunk) 171 | progress.close() 172 | 173 | 174 | def get_from_cache(url, cache_dir=None): 175 | """ 176 | Given a URL, look for the corresponding dataset in the local cache. 177 | If it's not there, download it. Then return the path to the cached file. 178 | """ 179 | if cache_dir is None: 180 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 181 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 182 | cache_dir = str(cache_dir) 183 | 184 | if not os.path.exists(cache_dir): 185 | os.makedirs(cache_dir) 186 | 187 | # Get eTag to add to filename, if it exists. 188 | if url.startswith("s3://"): 189 | etag = s3_etag(url) 190 | else: 191 | response = requests.head(url, allow_redirects=True) 192 | if response.status_code != 200: 193 | raise IOError("HEAD request failed for url {} with status code {}" 194 | .format(url, response.status_code)) 195 | etag = response.headers.get("ETag") 196 | 197 | filename = url_to_filename(url, etag) 198 | 199 | # get cache path to put the file 200 | cache_path = os.path.join(cache_dir, filename) 201 | 202 | if not os.path.exists(cache_path): 203 | # Download to temporary file, then copy to cache dir once finished. 204 | # Otherwise you get corrupt cache entries if the download gets interrupted. 
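        # NamedTemporaryFile is deleted automatically when the `with` block
        # exits, so after a successful download only the cached file and its
        # small "<cache_path>.json" metadata sidecar (url + etag) remain.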
205 | with tempfile.NamedTemporaryFile() as temp_file: 206 | logger.info("%s not found in cache, downloading to %s", url, temp_file.name) 207 | 208 | # GET file object 209 | if url.startswith("s3://"): 210 | s3_get(url, temp_file) 211 | else: 212 | http_get(url, temp_file) 213 | 214 | # we are copying the file before closing it, so flush to avoid truncation 215 | temp_file.flush() 216 | # shutil.copyfileobj() starts at the current position, so go to the start 217 | temp_file.seek(0) 218 | 219 | logger.info("copying %s to cache at %s", temp_file.name, cache_path) 220 | with open(cache_path, 'wb') as cache_file: 221 | shutil.copyfileobj(temp_file, cache_file) 222 | 223 | logger.info("creating metadata file for %s", cache_path) 224 | meta = {'url': url, 'etag': etag} 225 | meta_path = cache_path + '.json' 226 | with open(meta_path, 'w', encoding="utf-8") as meta_file: 227 | json.dump(meta, meta_file) 228 | 229 | logger.info("removing temp file %s", temp_file.name) 230 | 231 | return cache_path 232 | 233 | 234 | def read_set_from_file(filename): 235 | ''' 236 | Extract a de-duped collection (set) of text from a file. 237 | Expected file format is one item per line. 238 | ''' 239 | collection = set() 240 | with open(filename, 'r', encoding='utf-8') as file_: 241 | for line in file_: 242 | collection.add(line.rstrip()) 243 | return collection 244 | 245 | 246 | def get_file_extension(path, dot=True, lower=True): 247 | ext = os.path.splitext(path)[1] 248 | ext = ext if dot else ext[1:] 249 | return ext.lower() if lower else ext 250 | -------------------------------------------------------------------------------- /SourceCode/modeling.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | """PyTorch BERT model.""" 17 | 18 | from __future__ import absolute_import, division, print_function, unicode_literals 19 | 20 | import copy 21 | import json 22 | import logging 23 | import math 24 | import os 25 | import shutil 26 | import tarfile 27 | import tempfile 28 | import sys 29 | from io import open 30 | 31 | import torch 32 | from torch import nn 33 | from torch.nn import CrossEntropyLoss 34 | 35 | import file_utils 36 | from file_utils import cached_path 37 | import numpy as np 38 | import math 39 | import scipy.stats 40 | 41 | 42 | logger = logging.getLogger(__name__) 43 | 44 | PRETRAINED_MODEL_ARCHIVE_MAP = { 45 | 'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz", 46 | 'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz", 47 | 'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz", 48 | 'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz", 49 | 'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased.tar.gz", 50 | 'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased.tar.gz", 51 | 'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz", 52 | } 53 | CONFIG_NAME = 'bert_config.json' 54 | WEIGHTS_NAME = 'pytorch_model.bin' 55 | TF_WEIGHTS_NAME = 'model.ckpt' 56 | 57 | def load_tf_weights_in_bert(model, tf_checkpoint_path): 58 | """ Load tf checkpoints in a pytorch model 59 | """ 60 | try: 61 | import re 62 | import numpy as np 63 | import tensorflow as tf 64 | except ImportError: 65 | print("Loading a TensorFlow models in PyTorch, requires TensorFlow to be installed. 
Please see " 66 | "https://www.tensorflow.org/install/ for installation instructions.") 67 | raise 68 | tf_path = os.path.abspath(tf_checkpoint_path) 69 | print("Converting TensorFlow checkpoint from {}".format(tf_path)) 70 | # Load weights from TF model 71 | init_vars = tf.train.list_variables(tf_path) 72 | names = [] 73 | arrays = [] 74 | for name, shape in init_vars: 75 | print("Loading TF weight {} with shape {}".format(name, shape)) 76 | array = tf.train.load_variable(tf_path, name) 77 | names.append(name) 78 | arrays.append(array) 79 | 80 | for name, array in zip(names, arrays): 81 | name = name.split('/') 82 | # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculated m and v 83 | # which are not required for using pretrained model 84 | if any(n in ["adam_v", "adam_m", "global_step"] for n in name): 85 | print("Skipping {}".format("/".join(name))) 86 | continue 87 | pointer = model 88 | for m_name in name: 89 | if re.fullmatch(r'[A-Za-z]+_\d+', m_name): 90 | l = re.split(r'_(\d+)', m_name) 91 | else: 92 | l = [m_name] 93 | if l[0] == 'kernel' or l[0] == 'gamma': 94 | pointer = getattr(pointer, 'weight') 95 | elif l[0] == 'output_bias' or l[0] == 'beta': 96 | pointer = getattr(pointer, 'bias') 97 | elif l[0] == 'output_weights': 98 | pointer = getattr(pointer, 'weight') 99 | elif l[0] == 'squad': 100 | pointer = getattr(pointer, 'classifier') 101 | else: 102 | try: 103 | pointer = getattr(pointer, l[0]) 104 | except AttributeError: 105 | print("Skipping {}".format("/".join(name))) 106 | continue 107 | if len(l) >= 2: 108 | num = int(l[1]) 109 | pointer = pointer[num] 110 | if m_name[-11:] == '_embeddings': 111 | pointer = getattr(pointer, 'weight') 112 | elif m_name == 'kernel': 113 | array = np.transpose(array) 114 | try: 115 | assert pointer.shape == array.shape 116 | except AssertionError as e: 117 | e.args += (pointer.shape, array.shape) 118 | raise 119 | print("Initialize PyTorch weight {}".format(name)) 120 | pointer.data = torch.from_numpy(array) 121 | return model 122 | 123 | 124 | def gelu(x): 125 | """Implementation of the gelu activation function. 126 | For information: OpenAI GPT's gelu is slightly different (and gives slightly different results): 127 | 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) 128 | Also see https://arxiv.org/abs/1606.08415 129 | """ 130 | return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0))) 131 | 132 | 133 | def swish(x): 134 | return x * torch.sigmoid(x) 135 | 136 | 137 | ACT2FN = {"gelu": gelu, "relu": torch.nn.functional.relu, "swish": swish} 138 | 139 | 140 | class BertConfig(object): 141 | """Configuration class to store the configuration of a `BertModel`. 142 | """ 143 | def __init__(self, 144 | vocab_size_or_config_json_file, 145 | hidden_size=768, 146 | num_hidden_layers=12, 147 | num_attention_heads=12, 148 | intermediate_size=3072, 149 | hidden_act="gelu", 150 | hidden_dropout_prob=0.1, 151 | attention_probs_dropout_prob=0.1, 152 | max_position_embeddings=512, 153 | type_vocab_size=2, 154 | initializer_range=0.02): 155 | """Constructs BertConfig. 156 | 157 | Args: 158 | vocab_size_or_config_json_file: Vocabulary size of `inputs_ids` in `BertModel`. 159 | hidden_size: Size of the encoder layers and the pooler layer. 160 | num_hidden_layers: Number of hidden layers in the Transformer encoder. 161 | num_attention_heads: Number of attention heads for each attention layer in 162 | the Transformer encoder. 
163 | intermediate_size: The size of the "intermediate" (i.e., feed-forward) 164 | layer in the Transformer encoder. 165 | hidden_act: The non-linear activation function (function or string) in the 166 | encoder and pooler. If string, "gelu", "relu" and "swish" are supported. 167 | hidden_dropout_prob: The dropout probabilitiy for all fully connected 168 | layers in the embeddings, encoder, and pooler. 169 | attention_probs_dropout_prob: The dropout ratio for the attention 170 | probabilities. 171 | max_position_embeddings: The maximum sequence length that this model might 172 | ever be used with. Typically set this to something large just in case 173 | (e.g., 512 or 1024 or 2048). 174 | type_vocab_size: The vocabulary size of the `token_type_ids` passed into 175 | `BertModel`. 176 | initializer_range: The sttdev of the truncated_normal_initializer for 177 | initializing all weight matrices. 178 | """ 179 | if isinstance(vocab_size_or_config_json_file, str) or (sys.version_info[0] == 2 180 | and isinstance(vocab_size_or_config_json_file, unicode)): 181 | with open(vocab_size_or_config_json_file, "r", encoding='utf-8') as reader: 182 | json_config = json.loads(reader.read()) 183 | for key, value in json_config.items(): 184 | self.__dict__[key] = value 185 | elif isinstance(vocab_size_or_config_json_file, int): 186 | self.vocab_size = vocab_size_or_config_json_file 187 | self.hidden_size = hidden_size 188 | self.num_hidden_layers = num_hidden_layers 189 | self.num_attention_heads = num_attention_heads 190 | self.hidden_act = hidden_act 191 | self.intermediate_size = intermediate_size 192 | self.hidden_dropout_prob = hidden_dropout_prob 193 | self.attention_probs_dropout_prob = attention_probs_dropout_prob 194 | self.max_position_embeddings = max_position_embeddings 195 | self.type_vocab_size = type_vocab_size 196 | self.initializer_range = initializer_range 197 | else: 198 | raise ValueError("First argument must be either a vocabulary size (int)" 199 | "or the path to a pretrained model config file (str)") 200 | 201 | @classmethod 202 | def from_dict(cls, json_object): 203 | """Constructs a `BertConfig` from a Python dictionary of parameters.""" 204 | config = BertConfig(vocab_size_or_config_json_file=-1) 205 | for key, value in json_object.items(): 206 | config.__dict__[key] = value 207 | return config 208 | 209 | @classmethod 210 | def from_json_file(cls, json_file): 211 | """Constructs a `BertConfig` from a json file of parameters.""" 212 | with open(json_file, "r", encoding='utf-8') as reader: 213 | text = reader.read() 214 | return cls.from_dict(json.loads(text)) 215 | 216 | def __repr__(self): 217 | return str(self.to_json_string()) 218 | 219 | def to_dict(self): 220 | """Serializes this instance to a Python dictionary.""" 221 | output = copy.deepcopy(self.__dict__) 222 | return output 223 | 224 | def to_json_string(self): 225 | """Serializes this instance to a JSON string.""" 226 | return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n" 227 | 228 | try: 229 | from apex.normalization.fused_layer_norm import FusedLayerNorm as BertLayerNorm 230 | except ImportError: 231 | logger.info("Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .") 232 | class BertLayerNorm(nn.Module): 233 | def __init__(self, hidden_size, eps=1e-12): 234 | """Construct a layernorm module in the TF style (epsilon inside the square root). 
235 | """ 236 | super(BertLayerNorm, self).__init__() 237 | self.weight = nn.Parameter(torch.ones(hidden_size)) 238 | self.bias = nn.Parameter(torch.zeros(hidden_size)) 239 | self.variance_epsilon = eps 240 | 241 | def forward(self, x): 242 | u = x.mean(-1, keepdim=True) 243 | s = (x - u).pow(2).mean(-1, keepdim=True) 244 | x = (x - u) / torch.sqrt(s + self.variance_epsilon) 245 | return self.weight * x + self.bias 246 | 247 | class BertEmbeddings(nn.Module): 248 | """Construct the embeddings from word, position and token_type embeddings. 249 | """ 250 | def __init__(self, config): 251 | super(BertEmbeddings, self).__init__() 252 | self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=0) 253 | self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size) 254 | self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size) 255 | 256 | # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load 257 | # any TensorFlow checkpoint file 258 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 259 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 260 | 261 | def forward(self, input_ids, token_type_ids=None): 262 | seq_length = input_ids.size(1) 263 | position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device) 264 | position_ids = position_ids.unsqueeze(0).expand_as(input_ids) 265 | if token_type_ids is None: 266 | token_type_ids = torch.zeros_like(input_ids) 267 | 268 | words_embeddings = self.word_embeddings(input_ids) 269 | position_embeddings = self.position_embeddings(position_ids) 270 | token_type_embeddings = self.token_type_embeddings(token_type_ids) 271 | 272 | embeddings = words_embeddings + position_embeddings + token_type_embeddings 273 | embeddings = self.LayerNorm(embeddings) 274 | embeddings = self.dropout(embeddings) 275 | return embeddings 276 | 277 | 278 | class BertSelfAttention(nn.Module): 279 | def __init__(self, config): 280 | super(BertSelfAttention, self).__init__() 281 | if config.hidden_size % config.num_attention_heads != 0: 282 | raise ValueError( 283 | "The hidden size (%d) is not a multiple of the number of attention " 284 | "heads (%d)" % (config.hidden_size, config.num_attention_heads)) 285 | self.num_attention_heads = config.num_attention_heads 286 | self.attention_head_size = int(config.hidden_size / config.num_attention_heads) 287 | self.all_head_size = self.num_attention_heads * self.attention_head_size 288 | 289 | self.query = nn.Linear(config.hidden_size, self.all_head_size) 290 | self.key = nn.Linear(config.hidden_size, self.all_head_size) 291 | self.value = nn.Linear(config.hidden_size, self.all_head_size) 292 | 293 | self.dropout = nn.Dropout(config.attention_probs_dropout_prob) 294 | 295 | def transpose_for_scores(self, x): 296 | new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) 297 | x = x.view(*new_x_shape) 298 | return x.permute(0, 2, 1, 3) 299 | 300 | def forward(self, hidden_states, attention_mask): 301 | mixed_query_layer = self.query(hidden_states) 302 | mixed_key_layer = self.key(hidden_states) 303 | mixed_value_layer = self.value(hidden_states) 304 | 305 | query_layer = self.transpose_for_scores(mixed_query_layer) 306 | key_layer = self.transpose_for_scores(mixed_key_layer) 307 | value_layer = self.transpose_for_scores(mixed_value_layer) 308 | 309 | # Take the dot product between "query" and "key" to get the raw attention scores. 
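        # query_layer / key_layer / value_layer have shape
        # [batch_size, num_heads, seq_len, head_size], so the matmul below
        # produces raw scores of shape [batch_size, num_heads, seq_len, seq_len],
        # which are then scaled by 1 / sqrt(head_size).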
310 | attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2)) 311 | attention_scores = attention_scores / math.sqrt(self.attention_head_size) 312 | # Apply the attention mask is (precomputed for all layers in BertModel forward() function) 313 | attention_scores = attention_scores + attention_mask 314 | 315 | # Normalize the attention scores to probabilities. 316 | attention_probs = nn.Softmax(dim=-1)(attention_scores) 317 | 318 | # This is actually dropping out entire tokens to attend to, which might 319 | # seem a bit unusual, but is taken from the original Transformer paper. 320 | attention_probs = self.dropout(attention_probs) 321 | 322 | context_layer = torch.matmul(attention_probs, value_layer) 323 | context_layer = context_layer.permute(0, 2, 1, 3).contiguous() 324 | new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,) 325 | context_layer = context_layer.view(*new_context_layer_shape) 326 | return context_layer 327 | 328 | 329 | class BertSelfOutput(nn.Module): 330 | def __init__(self, config): 331 | super(BertSelfOutput, self).__init__() 332 | self.dense = nn.Linear(config.hidden_size, config.hidden_size) 333 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 334 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 335 | 336 | def forward(self, hidden_states, input_tensor): 337 | hidden_states = self.dense(hidden_states) 338 | hidden_states = self.dropout(hidden_states) 339 | hidden_states = self.LayerNorm(hidden_states + input_tensor) 340 | return hidden_states 341 | 342 | 343 | class BertAttention(nn.Module): 344 | def __init__(self, config): 345 | super(BertAttention, self).__init__() 346 | self.self = BertSelfAttention(config) 347 | self.output = BertSelfOutput(config) 348 | 349 | def forward(self, input_tensor, attention_mask): 350 | self_output = self.self(input_tensor, attention_mask) 351 | attention_output = self.output(self_output, input_tensor) 352 | return attention_output 353 | 354 | 355 | class BertIntermediate(nn.Module): 356 | def __init__(self, config): 357 | super(BertIntermediate, self).__init__() 358 | self.dense = nn.Linear(config.hidden_size, config.intermediate_size) 359 | if isinstance(config.hidden_act, str) or (sys.version_info[0] == 2 and isinstance(config.hidden_act, unicode)): 360 | self.intermediate_act_fn = ACT2FN[config.hidden_act] 361 | else: 362 | self.intermediate_act_fn = config.hidden_act 363 | 364 | def forward(self, hidden_states): 365 | hidden_states = self.dense(hidden_states) 366 | hidden_states = self.intermediate_act_fn(hidden_states) 367 | return hidden_states 368 | 369 | 370 | class BertOutput(nn.Module): 371 | def __init__(self, config): 372 | super(BertOutput, self).__init__() 373 | self.dense = nn.Linear(config.intermediate_size, config.hidden_size) 374 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 375 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 376 | 377 | def forward(self, hidden_states, input_tensor): 378 | hidden_states = self.dense(hidden_states) 379 | hidden_states = self.dropout(hidden_states) 380 | hidden_states = self.LayerNorm(hidden_states + input_tensor) 381 | return hidden_states 382 | 383 | 384 | class BertLayer(nn.Module): 385 | def __init__(self, config): 386 | super(BertLayer, self).__init__() 387 | self.attention = BertAttention(config) 388 | self.intermediate = BertIntermediate(config) 389 | self.output = BertOutput(config) 390 | 391 | def forward(self, hidden_states, attention_mask): 392 | attention_output = 
self.attention(hidden_states, attention_mask) 393 | intermediate_output = self.intermediate(attention_output) 394 | layer_output = self.output(intermediate_output, attention_output) 395 | return layer_output 396 | 397 | 398 | class BertEncoder(nn.Module): 399 | def __init__(self, config): 400 | super(BertEncoder, self).__init__() 401 | layer = BertLayer(config) 402 | self.layer = nn.ModuleList([copy.deepcopy(layer) for _ in range(config.num_hidden_layers)]) 403 | 404 | def forward(self, hidden_states, attention_mask, output_all_encoded_layers=True): 405 | all_encoder_layers = [] 406 | for layer_module in self.layer: 407 | hidden_states = layer_module(hidden_states, attention_mask) 408 | if output_all_encoded_layers: 409 | all_encoder_layers.append(hidden_states) 410 | if not output_all_encoded_layers: 411 | all_encoder_layers.append(hidden_states) 412 | return all_encoder_layers 413 | 414 | 415 | class BertPooler(nn.Module): 416 | def __init__(self, config): 417 | super(BertPooler, self).__init__() 418 | self.dense = nn.Linear(config.hidden_size, config.hidden_size) 419 | self.activation = nn.Tanh() 420 | 421 | def forward(self, hidden_states): 422 | # We "pool" the model by simply taking the hidden state corresponding 423 | # to the first token. 424 | first_token_tensor = hidden_states[:, 0] 425 | pooled_output = self.dense(first_token_tensor) 426 | pooled_output = self.activation(pooled_output) 427 | return pooled_output 428 | 429 | 430 | class BertPredictionHeadTransform(nn.Module): 431 | def __init__(self, config): 432 | super(BertPredictionHeadTransform, self).__init__() 433 | self.dense = nn.Linear(config.hidden_size, config.hidden_size) 434 | if isinstance(config.hidden_act, str) or (sys.version_info[0] == 2 and isinstance(config.hidden_act, unicode)): 435 | self.transform_act_fn = ACT2FN[config.hidden_act] 436 | else: 437 | self.transform_act_fn = config.hidden_act 438 | self.LayerNorm = BertLayerNorm(config.hidden_size, eps=1e-12) 439 | 440 | def forward(self, hidden_states): 441 | hidden_states = self.dense(hidden_states) 442 | hidden_states = self.transform_act_fn(hidden_states) 443 | hidden_states = self.LayerNorm(hidden_states) 444 | return hidden_states 445 | 446 | 447 | class BertLMPredictionHead(nn.Module): 448 | def __init__(self, config, bert_model_embedding_weights): 449 | super(BertLMPredictionHead, self).__init__() 450 | self.transform = BertPredictionHeadTransform(config) 451 | 452 | # The output weights are the same as the input embeddings, but there is 453 | # an output-only bias for each token. 
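        # bert_model_embedding_weights has shape [vocab_size, hidden_size], so
        # the decoder below maps hidden_size -> vocab_size and ties its weight
        # matrix to the input word embeddings; only `self.bias` is a new
        # parameter.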
454 | self.decoder = nn.Linear(bert_model_embedding_weights.size(1), 455 | bert_model_embedding_weights.size(0), 456 | bias=False) 457 | self.decoder.weight = bert_model_embedding_weights 458 | self.bias = nn.Parameter(torch.zeros(bert_model_embedding_weights.size(0))) 459 | 460 | def forward(self, hidden_states): 461 | hidden_states = self.transform(hidden_states) 462 | hidden_states = self.decoder(hidden_states) + self.bias 463 | return hidden_states 464 | 465 | 466 | class BertOnlyMLMHead(nn.Module): 467 | def __init__(self, config, bert_model_embedding_weights): 468 | super(BertOnlyMLMHead, self).__init__() 469 | self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights) 470 | 471 | def forward(self, sequence_output): 472 | prediction_scores = self.predictions(sequence_output) 473 | return prediction_scores 474 | 475 | 476 | class BertOnlyNSPHead(nn.Module): 477 | def __init__(self, config): 478 | super(BertOnlyNSPHead, self).__init__() 479 | self.seq_relationship = nn.Linear(config.hidden_size, 2) 480 | 481 | def forward(self, pooled_output): 482 | seq_relationship_score = self.seq_relationship(pooled_output) 483 | return seq_relationship_score 484 | 485 | 486 | class BertPreTrainingHeads(nn.Module): 487 | def __init__(self, config, bert_model_embedding_weights): 488 | super(BertPreTrainingHeads, self).__init__() 489 | self.predictions = BertLMPredictionHead(config, bert_model_embedding_weights) 490 | self.seq_relationship = nn.Linear(config.hidden_size, 2) 491 | 492 | def forward(self, sequence_output, pooled_output): 493 | prediction_scores = self.predictions(sequence_output) 494 | seq_relationship_score = self.seq_relationship(pooled_output) 495 | return prediction_scores, seq_relationship_score 496 | 497 | 498 | class BertPreTrainedModel(nn.Module): 499 | """ An abstract class to handle weights initialization and 500 | a simple interface for dowloading and loading pretrained models. 501 | """ 502 | def __init__(self, config, *inputs, **kwargs): 503 | super(BertPreTrainedModel, self).__init__() 504 | if not isinstance(config, BertConfig): 505 | raise ValueError( 506 | "Parameter config in `{}(config)` should be an instance of class `BertConfig`. " 507 | "To create a model from a Google pretrained model use " 508 | "`model = {}.from_pretrained(PRETRAINED_MODEL_NAME)`".format( 509 | self.__class__.__name__, self.__class__.__name__ 510 | )) 511 | self.config = config 512 | 513 | def init_bert_weights(self, module): 514 | """ Initialize the weights. 515 | """ 516 | if isinstance(module, (nn.Linear, nn.Embedding)): 517 | # Slightly different from the TF version which uses truncated_normal for initialization 518 | # cf https://github.com/pytorch/pytorch/pull/5617 519 | module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) 520 | elif isinstance(module, BertLayerNorm): 521 | module.bias.data.zero_() 522 | module.weight.data.fill_(1.0) 523 | if isinstance(module, nn.Linear) and module.bias is not None: 524 | module.bias.data.zero_() 525 | 526 | @classmethod 527 | def from_pretrained(cls, pretrained_model_name_or_path, state_dict=None, cache_dir=None, 528 | from_tf=False, *inputs, **kwargs): 529 | """ 530 | Instantiate a BertPreTrainedModel from a pre-trained model file or a pytorch state dict. 531 | Download and cache the pre-trained model file if needed. 532 | 533 | Params: 534 | pretrained_model_name_or_path: either: 535 | - a str with the name of a pre-trained model to load selected in the list of: 536 | . `bert-base-uncased` 537 | . 
`bert-large-uncased` 538 | . `bert-base-cased` 539 | . `bert-large-cased` 540 | . `bert-base-multilingual-uncased` 541 | . `bert-base-multilingual-cased` 542 | . `bert-base-chinese` 543 | - a path or url to a pretrained model archive containing: 544 | . `bert_config.json` a configuration file for the model 545 | . `pytorch_model.bin` a PyTorch dump of a BertForPreTraining instance 546 | - a path or url to a pretrained model archive containing: 547 | . `bert_config.json` a configuration file for the model 548 | . `model.chkpt` a TensorFlow checkpoint 549 | from_tf: should we load the weights from a locally saved TensorFlow checkpoint 550 | cache_dir: an optional path to a folder in which the pre-trained models will be cached. 551 | state_dict: an optional state dictionnary (collections.OrderedDict object) to use instead of Google pre-trained models 552 | *inputs, **kwargs: additional input for the specific Bert class 553 | (ex: num_labels for BertForSequenceClassification) 554 | """ 555 | if pretrained_model_name_or_path in PRETRAINED_MODEL_ARCHIVE_MAP: 556 | archive_file = PRETRAINED_MODEL_ARCHIVE_MAP[pretrained_model_name_or_path] 557 | else: 558 | archive_file = pretrained_model_name_or_path 559 | # redirect to the cache, if necessary 560 | try: 561 | resolved_archive_file = cached_path(archive_file, cache_dir=cache_dir) 562 | except EnvironmentError: 563 | logger.error( 564 | "Model name '{}' was not found in model name list ({}). " 565 | "We assumed '{}' was a path or url but couldn't find any file " 566 | "associated to this path or url.".format( 567 | pretrained_model_name_or_path, 568 | ', '.join(PRETRAINED_MODEL_ARCHIVE_MAP.keys()), 569 | archive_file)) 570 | return None 571 | if resolved_archive_file == archive_file: 572 | logger.info("loading archive file {}".format(archive_file)) 573 | else: 574 | logger.info("loading archive file {} from cache at {}".format( 575 | archive_file, resolved_archive_file)) 576 | tempdir = None 577 | if os.path.isdir(resolved_archive_file) or from_tf: 578 | serialization_dir = resolved_archive_file 579 | else: 580 | # Extract archive to temp dir 581 | tempdir = tempfile.mkdtemp() 582 | logger.info("extracting archive file {} to temp dir {}".format( 583 | resolved_archive_file, tempdir)) 584 | with tarfile.open(resolved_archive_file, 'r:gz') as archive: 585 | archive.extractall(tempdir) 586 | serialization_dir = tempdir 587 | # Load config 588 | config_file = os.path.join(serialization_dir, CONFIG_NAME) 589 | config = BertConfig.from_json_file(config_file) 590 | logger.info("Model config {}".format(config)) 591 | # Instantiate model. 
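        # cls(config, *inputs, **kwargs) builds a randomly initialised model;
        # the pretrained weights are only copied in further below -- via
        # load_tf_weights_in_bert for TF checkpoints, or via load() after the
        # TF-style gamma/beta keys are renamed to weight/bias for PyTorch
        # state dicts.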
592 | model = cls(config, *inputs, **kwargs) 593 | if state_dict is None and not from_tf: 594 | weights_path = os.path.join(serialization_dir, WEIGHTS_NAME) 595 | state_dict = torch.load(weights_path, map_location='cpu' if not torch.cuda.is_available() else None) 596 | if tempdir: 597 | # Clean up temp dir 598 | shutil.rmtree(tempdir) 599 | if from_tf: 600 | # Directly load from a TensorFlow checkpoint 601 | weights_path = os.path.join(serialization_dir, TF_WEIGHTS_NAME) 602 | return load_tf_weights_in_bert(model, weights_path) 603 | # Load from a PyTorch state_dict 604 | old_keys = [] 605 | new_keys = [] 606 | for key in state_dict.keys(): 607 | new_key = None 608 | if 'gamma' in key: 609 | new_key = key.replace('gamma', 'weight') 610 | if 'beta' in key: 611 | new_key = key.replace('beta', 'bias') 612 | if new_key: 613 | old_keys.append(key) 614 | new_keys.append(new_key) 615 | for old_key, new_key in zip(old_keys, new_keys): 616 | state_dict[new_key] = state_dict.pop(old_key) 617 | 618 | missing_keys = [] 619 | unexpected_keys = [] 620 | error_msgs = [] 621 | # copy state_dict so _load_from_state_dict can modify it 622 | metadata = getattr(state_dict, '_metadata', None) 623 | state_dict = state_dict.copy() 624 | if metadata is not None: 625 | state_dict._metadata = metadata 626 | 627 | def load(module, prefix=''): 628 | local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {}) 629 | module._load_from_state_dict( 630 | state_dict, prefix, local_metadata, True, missing_keys, unexpected_keys, error_msgs) 631 | for name, child in module._modules.items(): 632 | if child is not None: 633 | load(child, prefix + name + '.') 634 | start_prefix = '' 635 | if not hasattr(model, 'bert') and any(s.startswith('bert.') for s in state_dict.keys()): 636 | start_prefix = 'bert.' 637 | load(model, prefix=start_prefix) 638 | if len(missing_keys) > 0: 639 | logger.info("Weights of {} not initialized from pretrained model: {}".format( 640 | model.__class__.__name__, missing_keys)) 641 | if len(unexpected_keys) > 0: 642 | logger.info("Weights from pretrained model not used in {}: {}".format( 643 | model.__class__.__name__, unexpected_keys)) 644 | if len(error_msgs) > 0: 645 | raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( 646 | model.__class__.__name__, "\n\t".join(error_msgs))) 647 | return model 648 | 649 | 650 | class BertModel(BertPreTrainedModel): 651 | """BERT model ("Bidirectional Embedding Representations from a Transformer"). 652 | 653 | Params: 654 | config: a BertConfig class instance with the configuration to build a new model 655 | 656 | Inputs: 657 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 658 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 659 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 660 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 661 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 662 | a `sentence B` token (see BERT paper for more details). 663 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 664 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 665 | input sequence length in the current batch. It's the mask that we typically use for attention when 666 | a batch has varying length sentences. 
667 |         `output_all_encoded_layers`: boolean which controls the content of the `encoded_layers` output as described below. Default: `True`.
668 | 
669 |     Outputs: Tuple of (encoded_layers, pooled_output)
670 |         `encoded_layers`: controlled by the `output_all_encoded_layers` argument:
671 |             - `output_all_encoded_layers=True`: outputs a list of the full sequences of encoded hidden states at the end
672 |                 of each attention block (i.e. 12 full sequences for BERT-base, 24 for BERT-large), each
673 |                 encoded hidden state is a torch.FloatTensor of size [batch_size, sequence_length, hidden_size],
674 |             - `output_all_encoded_layers=False`: outputs only the full sequence of hidden states corresponding
675 |                 to the last attention block of shape [batch_size, sequence_length, hidden_size],
676 |         `pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a
677 |             classifier pretrained on top of the hidden state associated with the first token of the
678 |             input (`[CLS]`) to train on the Next-Sentence task (see BERT's paper).
679 | 
680 |     Example usage:
681 |     ```python
682 |     # Already been converted into WordPiece token ids
683 |     input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
684 |     input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
685 |     token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
686 | 
687 |     config = modeling.BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
688 |         num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
689 | 
690 |     model = modeling.BertModel(config=config)
691 |     all_encoder_layers, pooled_output = model(input_ids, token_type_ids, input_mask)
692 |     ```
693 |     """
694 |     def __init__(self, config):
695 |         super(BertModel, self).__init__(config)
696 |         self.embeddings = BertEmbeddings(config)
697 |         self.encoder = BertEncoder(config)
698 |         self.pooler = BertPooler(config)
699 |         self.apply(self.init_bert_weights)
700 | 
701 |     def forward(self, input_ids, token_type_ids=None, attention_mask=None, output_all_encoded_layers=True):
702 |         if attention_mask is None:
703 |             attention_mask = torch.ones_like(input_ids)
704 |         if token_type_ids is None:
705 |             token_type_ids = torch.zeros_like(input_ids)
706 | 
707 |         # We create a 3D attention mask from a 2D tensor mask.
708 |         # Sizes are [batch_size, 1, 1, to_seq_length]
709 |         # So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]
710 |         # This attention mask is simpler than the triangular masking of causal attention
711 |         # used in OpenAI GPT, we just need to prepare the broadcast dimension here.
712 |         extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)
713 | 
714 |         # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
715 |         # masked positions, this operation will create a tensor which is 0.0 for
716 |         # positions we want to attend and -10000.0 for masked positions.
717 |         # Since we are adding it to the raw scores before the softmax, this is
718 |         # effectively the same as removing these entirely.
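        # As a small worked illustration (not code from this repository): a mask row [1, 1, 0]
        # is broadcast to shape [batch_size, 1, 1, seq_len] by the two unsqueeze calls above,
        # and the transformation below maps it to [0.0, 0.0, -10000.0] along the last dimension;
        # added to the attention scores, the padded position's softmax weight is driven to
        # (effectively) zero.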
719 | extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility 720 | extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0 721 | 722 | embedding_output = self.embeddings(input_ids, token_type_ids) 723 | encoded_layers = self.encoder(embedding_output, 724 | extended_attention_mask, 725 | output_all_encoded_layers=output_all_encoded_layers) 726 | sequence_output = encoded_layers[-1] 727 | pooled_output = self.pooler(sequence_output) 728 | if not output_all_encoded_layers: 729 | encoded_layers = encoded_layers[-1] 730 | return encoded_layers, pooled_output 731 | 732 | 733 | class BertForPreTraining(BertPreTrainedModel): 734 | """BERT model with pre-training heads. 735 | This module comprises the BERT model followed by the two pre-training heads: 736 | - the masked language modeling head, and 737 | - the next sentence classification head. 738 | 739 | Params: 740 | config: a BertConfig class instance with the configuration to build a new model. 741 | 742 | Inputs: 743 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 744 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 745 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 746 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 747 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 748 | a `sentence B` token (see BERT paper for more details). 749 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 750 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 751 | input sequence length in the current batch. It's the mask that we typically use for attention when 752 | a batch has varying length sentences. 753 | `masked_lm_labels`: optional masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length] 754 | with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked), the loss 755 | is only computed for the labels set in [0, ..., vocab_size] 756 | `next_sentence_label`: optional next sentence classification loss: torch.LongTensor of shape [batch_size] 757 | with indices selected in [0, 1]. 758 | 0 => next sentence is the continuation, 1 => next sentence is a random sentence. 759 | 760 | Outputs: 761 | if `masked_lm_labels` and `next_sentence_label` are not `None`: 762 | Outputs the total_loss which is the sum of the masked language modeling loss and the next 763 | sentence classification loss. 764 | if `masked_lm_labels` or `next_sentence_label` is `None`: 765 | Outputs a tuple comprising 766 | - the masked language modeling logits of shape [batch_size, sequence_length, vocab_size], and 767 | - the next sentence classification logits of shape [batch_size, 2]. 
768 | 769 | Example usage: 770 | ```python 771 | # Already been converted into WordPiece token ids 772 | input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) 773 | input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) 774 | token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) 775 | 776 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 777 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 778 | 779 | model = BertForPreTraining(config) 780 | masked_lm_logits_scores, seq_relationship_logits = model(input_ids, token_type_ids, input_mask) 781 | ``` 782 | """ 783 | def __init__(self, config): 784 | super(BertForPreTraining, self).__init__(config) 785 | self.bert = BertModel(config) 786 | self.cls = BertPreTrainingHeads(config, self.bert.embeddings.word_embeddings.weight) 787 | self.apply(self.init_bert_weights) 788 | 789 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, masked_lm_labels=None, next_sentence_label=None): 790 | sequence_output, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, 791 | output_all_encoded_layers=False) 792 | prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output) 793 | 794 | if masked_lm_labels is not None and next_sentence_label is not None: 795 | loss_fct = CrossEntropyLoss(ignore_index=-1) 796 | masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1)) 797 | next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1)) 798 | total_loss = masked_lm_loss + next_sentence_loss 799 | return total_loss 800 | else: 801 | return prediction_scores, seq_relationship_score 802 | 803 | 804 | class BertForMaskedLM(BertPreTrainedModel): 805 | """BERT model with the masked language modeling head. 806 | This module comprises the BERT model followed by the masked language modeling head. 807 | 808 | Params: 809 | config: a BertConfig class instance with the configuration to build a new model. 810 | 811 | Inputs: 812 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 813 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 814 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 815 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 816 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 817 | a `sentence B` token (see BERT paper for more details). 818 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 819 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 820 | input sequence length in the current batch. It's the mask that we typically use for attention when 821 | a batch has varying length sentences. 822 | `masked_lm_labels`: masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length] 823 | with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked), the loss 824 | is only computed for the labels set in [0, ..., vocab_size] 825 | 826 | Outputs: 827 | if `masked_lm_labels` is not `None`: 828 | Outputs the masked language modeling loss. 829 | if `masked_lm_labels` is `None`: 830 | Outputs the masked language modeling logits of shape [batch_size, sequence_length, vocab_size]. 
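
    For the loss path described above, a minimal sketch reusing the tensors from the
    `Example usage` block below (the label tensor is illustrative; positions set to -1 are
    ignored by the CrossEntropyLoss):
    ```python
    masked_lm_labels = torch.LongTensor([[-1, -1, 99], [-1, -1, -1]])
    masked_lm_loss = model(input_ids, token_type_ids, input_mask, masked_lm_labels)
    ```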
831 | 
832 |     Example usage:
833 |     ```python
834 |     # Already been converted into WordPiece token ids
835 |     input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
836 |     input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
837 |     token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
838 | 
839 |     config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
840 |         num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
841 | 
842 |     model = BertForMaskedLM(config)
843 |     masked_lm_logits_scores = model(input_ids, token_type_ids, input_mask)
844 |     ```
845 |     """
846 |     def __init__(self, config):
847 |         super(BertForMaskedLM, self).__init__(config)
848 |         self.bert = BertModel(config)
849 |         self.cls = BertOnlyMLMHead(config, self.bert.embeddings.word_embeddings.weight)
850 |         self.apply(self.init_bert_weights)
851 | 
852 |     def forward(self, input_ids, token_type_ids=None, attention_mask=None, masked_lm_labels=None):
853 |         sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask,
854 |                                        output_all_encoded_layers=False)
855 |         prediction_scores = self.cls(sequence_output)
856 | 
857 |         if masked_lm_labels is not None:
858 |             loss_fct = CrossEntropyLoss(ignore_index=-1)
859 |             masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), masked_lm_labels.view(-1))
860 |             return masked_lm_loss
861 |         else:
862 |             return prediction_scores
863 | 
864 | 
865 | class BertForNextSentencePrediction(BertPreTrainedModel):
866 |     """BERT model with the next sentence prediction head.
867 |     This module comprises the BERT model followed by the next sentence classification head.
868 | 
869 |     Params:
870 |         config: a BertConfig class instance with the configuration to build a new model.
871 | 
872 |     Inputs:
873 |         `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length]
874 |             with the word token indices in the vocabulary (see the token preprocessing logic in the scripts
875 |             `extract_features.py`, `run_classifier.py` and `run_squad.py`)
876 |         `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token
877 |             type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to
878 |             a `sentence B` token (see BERT paper for more details).
879 |         `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices
880 |             selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
881 |             input sequence length in the current batch. It's the mask that we typically use for attention when
882 |             a batch has varying length sentences.
883 |         `next_sentence_label`: next sentence classification labels: torch.LongTensor of shape [batch_size]
884 |             with indices selected in [0, 1].
885 |             0 => next sentence is the continuation, 1 => next sentence is a random sentence.
886 | 
887 |     Outputs:
888 |         if `next_sentence_label` is not `None`:
889 |             Outputs the next sentence classification loss (a CrossEntropy loss over the
890 |             two `is-next` / `random` classes).
891 |         if `next_sentence_label` is `None`:
892 |             Outputs the next sentence classification logits of shape [batch_size, 2].
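
    For the loss path described above, a minimal sketch reusing the tensors from the
    `Example usage` block below (labels are illustrative; 0 = the second segment really follows
    the first, 1 = it is a random sentence):
    ```python
    next_sentence_label = torch.LongTensor([0, 1])
    next_sentence_loss = model(input_ids, token_type_ids, input_mask, next_sentence_label)
    ```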
893 | 
894 |     Example usage:
895 |     ```python
896 |     # Already been converted into WordPiece token ids
897 |     input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]])
898 |     input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]])
899 |     token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]])
900 | 
901 |     config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
902 |         num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
903 | 
904 |     model = BertForNextSentencePrediction(config)
905 |     seq_relationship_logits = model(input_ids, token_type_ids, input_mask)
906 |     ```
907 |     """
908 |     def __init__(self, config):
909 |         super(BertForNextSentencePrediction, self).__init__(config)
910 |         self.bert = BertModel(config)
911 |         self.cls = BertOnlyNSPHead(config)
912 |         self.apply(self.init_bert_weights)
913 | 
914 |     def forward(self, input_ids, token_type_ids=None, attention_mask=None, next_sentence_label=None):
915 |         _, pooled_output = self.bert(input_ids, token_type_ids, attention_mask,
916 |                                      output_all_encoded_layers=False)
917 |         seq_relationship_score = self.cls(pooled_output)
918 | 
919 |         if next_sentence_label is not None:
920 |             loss_fct = CrossEntropyLoss(ignore_index=-1)
921 |             next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
922 |             return next_sentence_loss
923 |         else:
924 |             return seq_relationship_score
925 | 
926 | 
927 | class BertForSequenceClassification(BertPreTrainedModel):
928 |     """BERT model for classification, customized for this repository: besides the standard pooled `[CLS]` output, the forward pass builds a title-guided attention vector, an attention vector guided by the pooled `sdp_ids` sequence, and two entity vectors weighted by the Gaussian position distributions `P_gauss1_bs`/`P_gauss2_bs`, then concatenates these five [batch_size, hidden_size] vectors for a linear classifier (a usage sketch with dummy tensors appears further below).
929 |     """
930 |     def __init__(self, config, num_labels):
931 |         super(BertForSequenceClassification, self).__init__(config)
932 |         self.num_labels = num_labels
933 |         self.bert = BertModel(config)
934 |         self.dropout = nn.Dropout(config.hidden_dropout_prob)
935 |         self.activation = ACT2FN[config.hidden_act]
936 |         self.activation_final = nn.Tanh()
937 |         self.classifier = nn.Linear(5 * config.hidden_size, num_labels)  # five concatenated [hidden_size] vectors
938 |         self.apply(self.init_bert_weights)
939 | 
940 | 
941 |     def forward(self, title_ids, title_type_ids=None, title_mask=None, input_ids=None, token_type_ids=None, attention_mask=None, sdp_ids=None, sdp_type_ids=None, sdp_mask=None, P_gauss1_bs=None, P_gauss2_bs=None, labels=None):
942 |         # Token-level and pooled representations of the candidate sentence
943 |         input_token_output, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
944 |         input_token_output = self.dropout(input_token_output)
945 |         input_token_output = self.activation(input_token_output)
946 |         # Attention over the sentence tokens, using the pooled title representation as the query
947 |         _, title_output = self.bert(title_ids, title_type_ids, title_mask, output_all_encoded_layers=False)
948 |         title_query = title_output.unsqueeze(dim=1)
949 |         t_u = torch.matmul(title_query, input_token_output.transpose(-1, -2))
950 |         t_alpha = nn.Softmax(dim=-1)(t_u)
951 |         t_v = torch.matmul(t_alpha, input_token_output)
952 |         output_t = t_v.squeeze(dim=1)
953 |         output_t = self.activation_final(output_t)
954 |         output_t = self.dropout(output_t)
955 |         # Attention over the sentence tokens, using the pooled representation of the `sdp_ids` sequence as the query
956 |         _, sdp_output = self.bert(sdp_ids, sdp_type_ids, sdp_mask, output_all_encoded_layers=False)
957 |         sdp_query = sdp_output.unsqueeze(dim=1)
958 |         u = torch.matmul(sdp_query, input_token_output.transpose(-1, -2))
959 |         alpha = nn.Softmax(dim=-1)(u)
960 |         v = torch.matmul(alpha, input_token_output)
961 |         output_g = v.squeeze(dim=1)
962 |         output_g = self.activation_final(output_g)
963 |         output_g = self.dropout(output_g)
964 |         # Entity vectors: Gaussian position weights applied to the sentence token representations
965 |         P_gauss1_bs = P_gauss1_bs.unsqueeze(dim=1)
966 |         P_gauss2_bs = P_gauss2_bs.unsqueeze(dim=1)
967 |         gauss_entity1 = torch.matmul(P_gauss1_bs, input_token_output)
968 |         gauss_entity2 = torch.matmul(P_gauss2_bs, input_token_output)
969 |         gauss_entity1 = gauss_entity1.squeeze(dim=1)
970 |         gauss_entity2 = gauss_entity2.squeeze(dim=1)
971 |         gauss_entity1 = self.activation_final(gauss_entity1)
972 |         gauss_entity2 = self.activation_final(gauss_entity2)
973 |         gauss_entity1 = self.dropout(gauss_entity1)
974 |         gauss_entity2 = self.dropout(gauss_entity2)
975 |         # Concatenate the pooled [CLS] output with the four attended vectors and classify
976 |         pooled_output = self.dropout(pooled_output)
977 |         output = torch.cat((output_t, pooled_output, gauss_entity1, gauss_entity2, output_g), -1)
978 |         output = self.dropout(output)
979 |         logits = self.classifier(output)
980 | 
981 |         if labels is not None:
982 |             loss_fct = CrossEntropyLoss()
983 |             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
984 |             return loss
985 |         else:
986 |             return logits
987 | 
988 | 
989 | class BertForMultipleChoice(BertPreTrainedModel):
990 |     """BERT model for multiple choice tasks.
991 |     This module is composed of the BERT model with a linear layer on top of
992 |     the pooled output.
993 | 
994 |     Params:
995 |         `config`: a BertConfig class instance with the configuration to build a new model.
996 |         `num_choices`: the number of choices for the classifier. Default = 2.
997 | 
998 |     Inputs:
999 |         `input_ids`: a torch.LongTensor of shape [batch_size, num_choices, sequence_length]
1000 |             with the word token indices in the vocabulary (see the token preprocessing logic in the scripts
1001 |             `extract_features.py`, `run_classifier.py` and `run_squad.py`)
1002 |         `token_type_ids`: an optional torch.LongTensor of shape [batch_size, num_choices, sequence_length]
1003 |             with the token type indices selected in [0, 1]. Type 0 corresponds to a `sentence A`
1004 |             and type 1 corresponds to a `sentence B` token (see BERT paper for more details).
1005 |         `attention_mask`: an optional torch.LongTensor of shape [batch_size, num_choices, sequence_length] with indices
1006 |             selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max
1007 |             input sequence length in the current batch. It's the mask that we typically use for attention when
1008 |             a batch has varying length sentences.
1009 |         `labels`: labels for the classification output: torch.LongTensor of shape [batch_size]
1010 |             with indices selected in [0, ..., num_choices].
1011 | 
1012 |     Outputs:
1013 |         if `labels` is not `None`:
1014 |             Outputs the CrossEntropy classification loss of the output with the labels.
1015 |         if `labels` is `None`:
1016 |             Outputs the classification logits of shape [batch_size, num_choices].
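
    A minimal usage sketch for the customized `BertForSequenceClassification` defined above,
    since its own docstring carries no example. All tensors are dummies; the sequence lengths,
    the six-way label set and the Gaussian construction of `P_gauss1_bs`/`P_gauss2_bs` are
    illustrative assumptions, not values taken from this repository's preprocessing:
    ```python
    import torch

    batch_size, seq_len, title_len, sdp_len = 2, 16, 8, 6
    config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
        num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072)
    model = BertForSequenceClassification(config, num_labels=6)

    input_ids = torch.randint(1, 32000, (batch_size, seq_len))
    attention_mask = torch.ones_like(input_ids)
    token_type_ids = torch.zeros_like(input_ids)
    title_ids = torch.randint(1, 32000, (batch_size, title_len))
    sdp_ids = torch.randint(1, 32000, (batch_size, sdp_len))

    # Gaussian position weights centred on two (hypothetical) entity positions, normalised
    # over the sentence so each entity vector is a weighted average of token states
    positions = torch.arange(seq_len, dtype=torch.float)
    P_gauss1_bs = torch.softmax(-(positions - 3.0) ** 2 / (2 * 2.0 ** 2), dim=-1).expand(batch_size, seq_len)
    P_gauss2_bs = torch.softmax(-(positions - 10.0) ** 2 / (2 * 2.0 ** 2), dim=-1).expand(batch_size, seq_len)

    labels = torch.LongTensor([0, 3])  # illustrative relation label ids in [0, num_labels)
    loss = model(title_ids, None, None, input_ids, token_type_ids, attention_mask,
        sdp_ids, None, None, P_gauss1_bs, P_gauss2_bs, labels)
    logits = model(title_ids, None, None, input_ids, token_type_ids, attention_mask,
        sdp_ids, None, None, P_gauss1_bs, P_gauss2_bs)
    ```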
1017 | 1018 | Example usage: 1019 | ```python 1020 | # Already been converted into WordPiece token ids 1021 | input_ids = torch.LongTensor([[[31, 51, 99], [15, 5, 0]], [[12, 16, 42], [14, 28, 57]]]) 1022 | input_mask = torch.LongTensor([[[1, 1, 1], [1, 1, 0]],[[1,1,0], [1, 0, 0]]]) 1023 | token_type_ids = torch.LongTensor([[[0, 0, 1], [0, 1, 0]],[[0, 1, 1], [0, 0, 1]]]) 1024 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 1025 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 1026 | 1027 | num_choices = 2 1028 | 1029 | model = BertForMultipleChoice(config, num_choices) 1030 | logits = model(input_ids, token_type_ids, input_mask) 1031 | ``` 1032 | """ 1033 | def __init__(self, config, num_choices): 1034 | super(BertForMultipleChoice, self).__init__(config) 1035 | self.num_choices = num_choices 1036 | self.bert = BertModel(config) 1037 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 1038 | self.classifier = nn.Linear(config.hidden_size, 1) 1039 | self.apply(self.init_bert_weights) 1040 | 1041 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None): 1042 | flat_input_ids = input_ids.view(-1, input_ids.size(-1)) 1043 | flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) 1044 | flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) 1045 | _, pooled_output = self.bert(flat_input_ids, flat_token_type_ids, flat_attention_mask, output_all_encoded_layers=False) 1046 | pooled_output = self.dropout(pooled_output) 1047 | logits = self.classifier(pooled_output) 1048 | reshaped_logits = logits.view(-1, self.num_choices) 1049 | 1050 | if labels is not None: 1051 | loss_fct = CrossEntropyLoss() 1052 | loss = loss_fct(reshaped_logits, labels) 1053 | return loss 1054 | else: 1055 | return reshaped_logits 1056 | 1057 | 1058 | class BertForTokenClassification(BertPreTrainedModel): 1059 | """BERT model for token-level classification. 1060 | This module is composed of the BERT model with a linear layer on top of 1061 | the full hidden state of the last layer. 1062 | 1063 | Params: 1064 | `config`: a BertConfig class instance with the configuration to build a new model. 1065 | `num_labels`: the number of classes for the classifier. Default = 2. 1066 | 1067 | Inputs: 1068 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 1069 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 1070 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 1071 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 1072 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 1073 | a `sentence B` token (see BERT paper for more details). 1074 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 1075 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 1076 | input sequence length in the current batch. It's the mask that we typically use for attention when 1077 | a batch has varying length sentences. 1078 | `labels`: labels for the classification output: torch.LongTensor of shape [batch_size, sequence_length] 1079 | with indices selected in [0, ..., num_labels]. 1080 | 1081 | Outputs: 1082 | if `labels` is not `None`: 1083 | Outputs the CrossEntropy classification loss of the output with the labels. 
1084 | if `labels` is `None`: 1085 | Outputs the classification logits of shape [batch_size, sequence_length, num_labels]. 1086 | 1087 | Example usage: 1088 | ```python 1089 | # Already been converted into WordPiece token ids 1090 | input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) 1091 | input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) 1092 | token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) 1093 | 1094 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 1095 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 1096 | 1097 | num_labels = 2 1098 | 1099 | model = BertForTokenClassification(config, num_labels) 1100 | logits = model(input_ids, token_type_ids, input_mask) 1101 | ``` 1102 | """ 1103 | def __init__(self, config, num_labels): 1104 | super(BertForTokenClassification, self).__init__(config) 1105 | self.num_labels = num_labels 1106 | self.bert = BertModel(config) 1107 | self.dropout = nn.Dropout(config.hidden_dropout_prob) 1108 | self.classifier = nn.Linear(config.hidden_size, num_labels) 1109 | self.apply(self.init_bert_weights) 1110 | 1111 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None): 1112 | sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False) 1113 | sequence_output = self.dropout(sequence_output) 1114 | logits = self.classifier(sequence_output) 1115 | 1116 | if labels is not None: 1117 | loss_fct = CrossEntropyLoss() 1118 | # Only keep active parts of the loss 1119 | if attention_mask is not None: 1120 | active_loss = attention_mask.view(-1) == 1 1121 | active_logits = logits.view(-1, self.num_labels)[active_loss] 1122 | active_labels = labels.view(-1)[active_loss] 1123 | loss = loss_fct(active_logits, active_labels) 1124 | else: 1125 | loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) 1126 | return loss 1127 | else: 1128 | return logits 1129 | 1130 | 1131 | class BertForQuestionAnswering(BertPreTrainedModel): 1132 | """BERT model for Question Answering (span extraction). 1133 | This module is composed of the BERT model with a linear layer on top of 1134 | the sequence output that computes start_logits and end_logits 1135 | 1136 | Params: 1137 | `config`: a BertConfig class instance with the configuration to build a new model. 1138 | 1139 | Inputs: 1140 | `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] 1141 | with the word token indices in the vocabulary(see the tokens preprocessing logic in the scripts 1142 | `extract_features.py`, `run_classifier.py` and `run_squad.py`) 1143 | `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token 1144 | types indices selected in [0, 1]. Type 0 corresponds to a `sentence A` and type 1 corresponds to 1145 | a `sentence B` token (see BERT paper for more details). 1146 | `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices 1147 | selected in [0, 1]. It's a mask to be used if the input sequence length is smaller than the max 1148 | input sequence length in the current batch. It's the mask that we typically use for attention when 1149 | a batch has varying length sentences. 1150 | `start_positions`: position of the first token for the labeled span: torch.LongTensor of shape [batch_size]. 1151 | Positions are clamped to the length of the sequence and position outside of the sequence are not taken 1152 | into account for computing the loss. 
1153 | `end_positions`: position of the last token for the labeled span: torch.LongTensor of shape [batch_size]. 1154 | Positions are clamped to the length of the sequence and position outside of the sequence are not taken 1155 | into account for computing the loss. 1156 | 1157 | Outputs: 1158 | if `start_positions` and `end_positions` are not `None`: 1159 | Outputs the total_loss which is the sum of the CrossEntropy loss for the start and end token positions. 1160 | if `start_positions` or `end_positions` is `None`: 1161 | Outputs a tuple of start_logits, end_logits which are the logits respectively for the start and end 1162 | position tokens of shape [batch_size, sequence_length]. 1163 | 1164 | Example usage: 1165 | ```python 1166 | # Already been converted into WordPiece token ids 1167 | input_ids = torch.LongTensor([[31, 51, 99], [15, 5, 0]]) 1168 | input_mask = torch.LongTensor([[1, 1, 1], [1, 1, 0]]) 1169 | token_type_ids = torch.LongTensor([[0, 0, 1], [0, 1, 0]]) 1170 | 1171 | config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, 1172 | num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072) 1173 | 1174 | model = BertForQuestionAnswering(config) 1175 | start_logits, end_logits = model(input_ids, token_type_ids, input_mask) 1176 | ``` 1177 | """ 1178 | def __init__(self, config): 1179 | super(BertForQuestionAnswering, self).__init__(config) 1180 | self.bert = BertModel(config) 1181 | # TODO check with Google if it's normal there is no dropout on the token classifier of SQuAD in the TF version 1182 | # self.dropout = nn.Dropout(config.hidden_dropout_prob) 1183 | self.qa_outputs = nn.Linear(config.hidden_size, 2) 1184 | self.apply(self.init_bert_weights) 1185 | 1186 | def forward(self, input_ids, token_type_ids=None, attention_mask=None, start_positions=None, end_positions=None): 1187 | sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False) 1188 | logits = self.qa_outputs(sequence_output) 1189 | start_logits, end_logits = logits.split(1, dim=-1) 1190 | start_logits = start_logits.squeeze(-1) 1191 | end_logits = end_logits.squeeze(-1) 1192 | 1193 | if start_positions is not None and end_positions is not None: 1194 | # If we are on multi-GPU, split add a dimension 1195 | if len(start_positions.size()) > 1: 1196 | start_positions = start_positions.squeeze(-1) 1197 | if len(end_positions.size()) > 1: 1198 | end_positions = end_positions.squeeze(-1) 1199 | # sometimes the start/end positions are outside our model inputs, we ignore these terms 1200 | ignored_index = start_logits.size(1) 1201 | start_positions.clamp_(0, ignored_index) 1202 | end_positions.clamp_(0, ignored_index) 1203 | 1204 | loss_fct = CrossEntropyLoss(ignore_index=ignored_index) 1205 | start_loss = loss_fct(start_logits, start_positions) 1206 | end_loss = loss_fct(end_logits, end_positions) 1207 | total_loss = (start_loss + end_loss) / 2 1208 | return total_loss 1209 | else: 1210 | return start_logits, end_logits 1211 | -------------------------------------------------------------------------------- /model.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/CongSun-dlut/CPI_extraction/cda9f2b7f6fb757b999744cc9a95be205e2cc265/model.jpg --------------------------------------------------------------------------------